Calculating conditional cumulative time


Calculating conditional cumulative time



Following the pointers from this question.



I'd like to calculate the cumulative time for all the Cats, by considering their respective last toggle status.


Cat



EDIT:
I'd also want to check if the FIRST Toggle status of a Cat is Off and if it is so, for that specific cat, the time from midnight 00:00:00 till this first FIRST Off time should be added to its total conditional cumulative ontime.


EDIT:


Toggle


Cat


Off


cat


00:00:00



Sample data:


Time Cat Toggle
1 05:12:09 36 On
2 05:12:12 26R Off # First Toggle of this Cat happens to be Off, Condition met
3 05:12:15 26R On
4 05:12:16 26R Off
5 05:12:18 99 Off # Condition met
6 05:12:18 99 On
7 05:12:24 36 Off
8 05:12:26 36 On
9 05:12:29 80 Off # Condition met
10 05:12:30 99 Off
11 05:12:31 95 Off # Condition met
12 05:12:32 36 Off



Desired sample output:


Cat Time(Secs)
1 36 21
2 26R 18733 # (=1+18732), 18732 secs to be added = total Sec from midnight till 05:12:12
3 99 18750 # (=12+18738), 18738 secs to be added = total Sec from midnight till 05:12:18
4 .. ..



Any sort of help is appreciated.





What have you tried so far?
– camille
Jun 30 at 21:37





@camille Created a list of lists (dataframes) for all the cats, but as there's no specific sequence of toggles, things didn't work out.
– Keyshov Borate
Jun 30 at 21:47




3 Answers
3



A possible solution using data.table:


# load the 'data.table'-package, convert 'df' to a 'data.table'
# and 'Time'-column to a time-format
library(data.table)
setDT(df)[, Time := as.ITime(Time)]

# calculate the time-difference
df[, .(time.diff = sum((shift(Time, type = 'lead') - Time) * (Toggle == 'On'), na.rm = TRUE))
, by = Cat]



which gives:


Cat time.diff
1: 36 21
2: 26R 1
3: 99 12
4: 80 0
5: 95 0



In respons to your question in the comments, you could do:


# create a new data.table with midnigth times for the categories where
# the first 'Toggle' is on "Off"
df0 <- df[, .I[first(Toggle) == "Off"], by = Cat
][, .(Time = as.ITime("00:00:00"), Cat = unique(Cat), Toggle = "On")]

# bind that to the original data.table; order on 'Cat' and 'Time'
# and then do the same calculation
rbind(df, df0)[order(Cat, Time)
][, .(time.diff = sum((shift(Time, type = 'lead') - Time) * (Toggle == 'On'), na.rm = TRUE))
, by = Cat]



which gives:


Cat time.diff
1: 26R 18733
2: 36 21
3: 80 18749
4: 95 18751
5: 99 18750



An alternative with base R (only original question):


df$Time <- as.POSIXct(df$Time, format = "%H:%M:%S")

stack(sapply(split(df, df$Cat),
function(x) sum(diff(x[["Time"]]) * (head(x[["Toggle"]],-1) == 'On'))))



which gives:


values ind
1 1 26R
2 21 36
3 0 80
4 0 95
5 12 99



Or with the tidyverse (only original question):


library(dplyr)
library(lubridate)

df %>%
mutate(Time = lubridate::hms(Time)) %>%
group_by(Cat) %>%
summarise(time.diff = sum(diff(Time) * (head(Toggle, -1) == 'On'),
na.rm = TRUE))





Thanks Jaap. While creating time.diff, how could I check if the First Toggle for the Cat is Off and if so how could I add 'time from midnight to its First Off time'.
– Keyshov Borate
Jul 2 at 12:47



time.diff


Toggle


Cat


Off





@KeyshovBorate Do you mean that for Cat == 26R the first Toggle has to be set to On and the corresponding Time to zero? Could you include the desired output based on above example data?
– Jaap
Jul 2 at 13:43



Cat == 26R


Toggle


On


Time





Sure let me add. No. It's basically adding the time from midnight for all the Cats which has its Very First toggle as Off - Because that's going to be the first time it has been off since midnight - the time which we must consider as its total ontime.
– Keyshov Borate
Jul 2 at 14:37


Cat


Off





@KeyshovBorate see the update, HTH
– Jaap
Jul 2 at 15:23





Great to know Jaap :)
– Keyshov Borate
Jul 2 at 16:54



using base R:


df$Time=as.POSIXct(df$Time,,"%H:%M:%S")

stack(by(df,df$Cat,function(x)sum(c(0,diff(x$Time))*(x$Toggle=="Off"))))

values ind
1 1 26R
2 21 36
3 0 80
4 0 95
5 12 99



One can use as.difftime function to convert time from H:M:S format to seconds. Then for each On statue find the lead record in order to calculate interval of time lapsed from On.


as.difftime


H:M:S


On


lead


On


library(dplyr)

# Convert Time in seconds.
df %>% mutate(Time = as.difftime(Time, units = "secs")) %>%
group_by(Cat) %>%
mutate(TimeInterVal = ifelse(Toggle == "On", (lead(Time) - Time), 0)) %>%
summarise(TimeInterVal = sum(TimeInterVal))


# # A tibble: 5 x 2
# Cat TimeInterVal
# <chr> <dbl>
# 1 26R 1.00
# 2 36 21.0
# 3 80 0
# 4 95 0
# 5 99 12.0



Note: On can consider arranging data on Time ensure rows are ordered on time.


Time



Data:


df <- read.table(text ="
Time Cat Toggle
1 05:12:09 36 On
2 05:12:12 26R Off
3 05:12:15 26R On
4 05:12:16 26R Off
5 05:12:18 99 Off
6 05:12:18 99 On
7 05:12:24 36 Off
8 05:12:26 36 On
9 05:12:29 80 Off
10 05:12:30 99 Off
11 05:12:31 95 Off
12 05:12:32 36 Off",
header = TRUE, stringsAsFactors = FALSE)





Thanks MKR for quick solution, +1. Till the point of TimeInterVal creation, there weren't issues although in summarise it produced NAs too often, tried handling those but to avail hard luck
– Keyshov Borate
Jul 1 at 8:45


TimeInterVal


summarise


NA





@KeyshovBorate Just change the summarise as summarise(TimeInterVal = sum(TimeInterVal, na.rm = TRUE))
– MKR
Jul 1 at 9:03


summarise


summarise(TimeInterVal = sum(TimeInterVal, na.rm = TRUE))





Yes, did that, still the same..
– Keyshov Borate
Jul 1 at 10:32






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

PySpark - SparkContext: Error initializing SparkContext File does not exist

django NoReverseMatch Exception

List of Kim Possible characters