Pandas - Check adjacent column for value
Pandas - Check adjacent column for value
I have a df that tracks the status of an issue. From 'Open', 'In Progress' to 'Closed' like such:
T1 T2 T3 T4 T5
1 Open In Progress Closed
2 In Progress Closed
3 Open In Progress Open Closed
4 Open In Progress Closed Open Closed
5 Open In Progress Closed
Basically I want to find all issues that get reopened. This can be noted by any row that has a Closed
value that then has a followup transition. For example, index 4
has a closed value in T3
but then T4
contains something to indicate it has been reopened.
Closed
4
T3
T4
The output would be:
T1 T2 T3 T4 T5 Reopened
1 Open In Progress Closed 0
2 In Progress Closed 0
3 Open In Progress Open Closed 0
4 Open In Progress Closed Open Closed 1
5 Open In Progress Closed 0
In the real df, the columns range from T1 to T25 and has 50k rows.
So basically I need to check each column, find if closed exists and then check the next column to see if it is not empty.
Thanks
1 Answer
1
I think need:
df['Reopened'] = ((df == 'Open') & ((df.shift(axis=1)) == 'Closed')).any(axis=1).astype(int)
print (df)
T1 T2 T3 T4 T5 Reopened
1 Open In Progress Closed NaN NaN 0
2 In Progress Closed NaN NaN NaN 0
3 Open In Progress Open Closed NaN 0
4 Open In Progress Closed Open Closed 1
5 Open In Progress Closed NaN NaN 0
Detail:
Check Open
value per df
:
Open
df
print ((df == 'Open'))
T1 T2 T3 T4 T5
1 True False False False False
2 False False False False False
3 True False True False False
4 True False False True False
5 True False False False False
With shifted DataFrame check Closed
:
Closed
print (df.shift(axis=1))
T1 T2 T3 T4 T5
1 NaN Open In Progress Closed NaN
2 NaN In Progress Closed NaN NaN
3 NaN Open In Progress Open Closed
4 NaN Open In Progress Closed Open
5 NaN Open In Progress Closed NaN
print ((df.shift(axis=1)) == 'Closed')
T1 T2 T3 T4 T5
1 False False False True False
2 False False True False False
3 False False False False True
4 False False False True False
5 False False False True False
Then chain together by &
to AND
and get at least one True
per rows by any
:
&
AND
True
any
print (((df == 'Open') & ((df.shift(axis=1)) == 'Closed')))
T1 T2 T3 T4 T5
1 False False False False False
2 False False False False False
3 False False False False False
4 False False False True False
5 False False False False False
print (((df == 'Open') & ((df.shift(axis=1)) == 'Closed')).any(axis=1))
1 False
2 False
3 False
4 True
5 False
dtype: bool
And last convert boolean mask to integers by astype
and assign to new column:
astype
df['Reopened'] = ((df == 'Open') & ((df.shift(axis=1)) == 'Closed')).any(axis=1).astype(int)
print (df)
T1 T2 T3 T4 T5 Reopened
1 Open In Progress Closed NaN NaN 0
2 In Progress Closed NaN NaN NaN 0
3 Open In Progress Open Closed NaN 0
4 Open In Progress Closed Open Closed 1
5 Open In Progress Closed NaN NaN 0
In Progress Closed Open In Progress Closed
You are right..
– Anton vBR
Jul 1 at 10:26
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
@AntonvBR - Why think
In Progress Closed Open In Progress Closed
shoud not work? because Open is matched with first Closed. I understand question need check Open and previous Close (if understand it well, sure ;))– jezrael
Jul 1 at 10:23