Pandas - Check adjacent column for value


Pandas - Check adjacent column for value



I have a df that tracks the status of an issue. From 'Open', 'In Progress' to 'Closed' like such:


T1 T2 T3 T4 T5
1 Open In Progress Closed
2 In Progress Closed
3 Open In Progress Open Closed
4 Open In Progress Closed Open Closed
5 Open In Progress Closed



Basically I want to find all issues that get reopened. This can be noted by any row that has a Closedvalue that then has a followup transition. For example, index 4 has a closed value in T3 but then T4 contains something to indicate it has been reopened.


Closed


4


T3


T4



The output would be:


T1 T2 T3 T4 T5       Reopened
1 Open In Progress Closed                       0
2 In Progress Closed                                   0  
3 Open In Progress Open Closed               0
4 Open In Progress Closed Open Closed        1
5 Open In Progress Closed                       0



In the real df, the columns range from T1 to T25 and has 50k rows.



So basically I need to check each column, find if closed exists and then check the next column to see if it is not empty.



Thanks




1 Answer
1



I think need:


df['Reopened'] = ((df == 'Open') & ((df.shift(axis=1)) == 'Closed')).any(axis=1).astype(int)
print (df)
T1 T2 T3 T4 T5 Reopened
1 Open In Progress Closed NaN NaN 0
2 In Progress Closed NaN NaN NaN 0
3 Open In Progress Open Closed NaN 0
4 Open In Progress Closed Open Closed 1
5 Open In Progress Closed NaN NaN 0



Detail:



Check Open value per df:


Open


df


print ((df == 'Open'))
T1 T2 T3 T4 T5
1 True False False False False
2 False False False False False
3 True False True False False
4 True False False True False
5 True False False False False



With shifted DataFrame check Closed:


Closed


print (df.shift(axis=1))
T1 T2 T3 T4 T5
1 NaN Open In Progress Closed NaN
2 NaN In Progress Closed NaN NaN
3 NaN Open In Progress Open Closed
4 NaN Open In Progress Closed Open
5 NaN Open In Progress Closed NaN

print ((df.shift(axis=1)) == 'Closed')
T1 T2 T3 T4 T5
1 False False False True False
2 False False True False False
3 False False False False True
4 False False False True False
5 False False False True False



Then chain together by & to AND and get at least one True per rows by any:


&


AND


True


any


print (((df == 'Open') & ((df.shift(axis=1)) == 'Closed')))
T1 T2 T3 T4 T5
1 False False False False False
2 False False False False False
3 False False False False False
4 False False False True False
5 False False False False False

print (((df == 'Open') & ((df.shift(axis=1)) == 'Closed')).any(axis=1))
1 False
2 False
3 False
4 True
5 False
dtype: bool



And last convert boolean mask to integers by astype and assign to new column:


astype


df['Reopened'] = ((df == 'Open') & ((df.shift(axis=1)) == 'Closed')).any(axis=1).astype(int)
print (df)
T1 T2 T3 T4 T5 Reopened
1 Open In Progress Closed NaN NaN 0
2 In Progress Closed NaN NaN NaN 0
3 Open In Progress Open Closed NaN 0
4 Open In Progress Closed Open Closed 1
5 Open In Progress Closed NaN NaN 0





@AntonvBR - Why think In Progress Closed Open In Progress Closed shoud not work? because Open is matched with first Closed. I understand question need check Open and previous Close (if understand it well, sure ;))
– jezrael
Jul 1 at 10:23


In Progress Closed Open In Progress Closed





You are right..
– Anton vBR
Jul 1 at 10:26






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

PySpark - SparkContext: Error initializing SparkContext File does not exist

django NoReverseMatch Exception

List of Kim Possible characters