Calculating conditional cumulative time
Calculating conditional cumulative time
Following the pointers from this question.
I'd like to calculate the cumulative time for all the Cat
s, by considering their respective last toggle status.
Cat
EDIT:
I'd also want to check if the FIRST Toggle
status of a Cat
is Off
and if it is so, for that specific cat
, the time from midnight 00:00:00
till this first FIRST Off time should be added to its total conditional cumulative ontime.
EDIT:
Toggle
Cat
Off
cat
00:00:00
Sample data:
Time Cat Toggle
1 05:12:09 36 On
2 05:12:12 26R Off # First Toggle of this Cat happens to be Off, Condition met
3 05:12:15 26R On
4 05:12:16 26R Off
5 05:12:18 99 Off # Condition met
6 05:12:18 99 On
7 05:12:24 36 Off
8 05:12:26 36 On
9 05:12:29 80 Off # Condition met
10 05:12:30 99 Off
11 05:12:31 95 Off # Condition met
12 05:12:32 36 Off
Desired sample output:
Cat Time(Secs)
1 36 21
2 26R 18733 # (=1+18732), 18732 secs to be added = total Sec from midnight till 05:12:12
3 99 18750 # (=12+18738), 18738 secs to be added = total Sec from midnight till 05:12:18
4 .. ..
Any sort of help is appreciated.
@camille Created a list of lists (dataframes) for all the cats, but as there's no specific sequence of toggles, things didn't work out.
– Keyshov Borate
Jun 30 at 21:47
3 Answers
3
A possible solution using data.table:
# load the 'data.table'-package, convert 'df' to a 'data.table'
# and 'Time'-column to a time-format
library(data.table)
setDT(df)[, Time := as.ITime(Time)]
# calculate the time-difference
df[, .(time.diff = sum((shift(Time, type = 'lead') - Time) * (Toggle == 'On'), na.rm = TRUE))
, by = Cat]
which gives:
Cat time.diff
1: 36 21
2: 26R 1
3: 99 12
4: 80 0
5: 95 0
In respons to your question in the comments, you could do:
# create a new data.table with midnigth times for the categories where
# the first 'Toggle' is on "Off"
df0 <- df[, .I[first(Toggle) == "Off"], by = Cat
][, .(Time = as.ITime("00:00:00"), Cat = unique(Cat), Toggle = "On")]
# bind that to the original data.table; order on 'Cat' and 'Time'
# and then do the same calculation
rbind(df, df0)[order(Cat, Time)
][, .(time.diff = sum((shift(Time, type = 'lead') - Time) * (Toggle == 'On'), na.rm = TRUE))
, by = Cat]
which gives:
Cat time.diff
1: 26R 18733
2: 36 21
3: 80 18749
4: 95 18751
5: 99 18750
An alternative with base R (only original question):
df$Time <- as.POSIXct(df$Time, format = "%H:%M:%S")
stack(sapply(split(df, df$Cat),
function(x) sum(diff(x[["Time"]]) * (head(x[["Toggle"]],-1) == 'On'))))
which gives:
values ind
1 1 26R
2 21 36
3 0 80
4 0 95
5 12 99
Or with the tidyverse (only original question):
library(dplyr)
library(lubridate)
df %>%
mutate(Time = lubridate::hms(Time)) %>%
group_by(Cat) %>%
summarise(time.diff = sum(diff(Time) * (head(Toggle, -1) == 'On'),
na.rm = TRUE))
Thanks Jaap. While creating
time.diff
, how could I check if the First Toggle
for the Cat
is Off
and if so how could I add 'time from midnight to its First Off time'.– Keyshov Borate
Jul 2 at 12:47
time.diff
Toggle
Cat
Off
@KeyshovBorate Do you mean that for
Cat == 26R
the first Toggle
has to be set to On
and the corresponding Time
to zero? Could you include the desired output based on above example data?– Jaap
Jul 2 at 13:43
Cat == 26R
Toggle
On
Time
Sure let me add. No. It's basically adding the time from midnight for all the
Cat
s which has its Very First toggle as Off
- Because that's going to be the first time it has been off since midnight - the time which we must consider as its total ontime.– Keyshov Borate
Jul 2 at 14:37
Cat
Off
@KeyshovBorate see the update, HTH
– Jaap
Jul 2 at 15:23
Great to know Jaap :)
– Keyshov Borate
Jul 2 at 16:54
using base R:
df$Time=as.POSIXct(df$Time,,"%H:%M:%S")
stack(by(df,df$Cat,function(x)sum(c(0,diff(x$Time))*(x$Toggle=="Off"))))
values ind
1 1 26R
2 21 36
3 0 80
4 0 95
5 12 99
One can use as.difftime
function to convert time from H:M:S
format to seconds. Then for each On
statue find the lead
record in order to calculate interval of time lapsed from On
.
as.difftime
H:M:S
On
lead
On
library(dplyr)
# Convert Time in seconds.
df %>% mutate(Time = as.difftime(Time, units = "secs")) %>%
group_by(Cat) %>%
mutate(TimeInterVal = ifelse(Toggle == "On", (lead(Time) - Time), 0)) %>%
summarise(TimeInterVal = sum(TimeInterVal))
# # A tibble: 5 x 2
# Cat TimeInterVal
# <chr> <dbl>
# 1 26R 1.00
# 2 36 21.0
# 3 80 0
# 4 95 0
# 5 99 12.0
Note: On can consider arranging data on Time
ensure rows are ordered on time.
Time
Data:
df <- read.table(text ="
Time Cat Toggle
1 05:12:09 36 On
2 05:12:12 26R Off
3 05:12:15 26R On
4 05:12:16 26R Off
5 05:12:18 99 Off
6 05:12:18 99 On
7 05:12:24 36 Off
8 05:12:26 36 On
9 05:12:29 80 Off
10 05:12:30 99 Off
11 05:12:31 95 Off
12 05:12:32 36 Off",
header = TRUE, stringsAsFactors = FALSE)
Thanks MKR for quick solution, +1. Till the point of
TimeInterVal
creation, there weren't issues although in summarise
it produced NA
s too often, tried handling those but to avail hard luck– Keyshov Borate
Jul 1 at 8:45
TimeInterVal
summarise
NA
@KeyshovBorate Just change the
summarise
as summarise(TimeInterVal = sum(TimeInterVal, na.rm = TRUE))
– MKR
Jul 1 at 9:03
summarise
summarise(TimeInterVal = sum(TimeInterVal, na.rm = TRUE))
Yes, did that, still the same..
– Keyshov Borate
Jul 1 at 10:32
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
What have you tried so far?
– camille
Jun 30 at 21:37