How to extract the first 2 digits of all numbers in a column of a dataframe?




I am completely new at Python (this is my first assignment) and I am trying to take the first two digits of the D-column of the following dataframe and put those two digits in a new column F:


import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A' : [1, 1, 1, 4, 5, 3, 3, 4, 1, 4],
'B' : [8, 4, 3, 1, 1, 6, 4, 6, 9, 8],
'C' : [69,82,8,25,56,79,98,68,49,82],
'D' : [1663, 8818, 9232, 9643, 4900, 8568, 4975, 8938, 7513, 1515],
'E' : ['Married','Single','Single','Divorced','Widow(er)','Single','Married','Divorced','Married','Widow(er)']})



I found several possible solutions here on Stack Overflow and tried to apply them, but none of them works for me. Either I get an error message (a different one depending on which solution I try) or I do not get the result I expect.





Welcome to StackOverflow! Please elaborate on what the issue is and what you've tried already to solve the issue. Also, please have a look at this help article. Cheers :)
– vatbub
Jun 30 at 23:55




3 Answers



Try this:


import math

def first_two(d):
    # int(log10(d)) is the digit count minus one (1663 -> 3); log10 is exact for powers of ten
    return d // 10 ** (int(math.log10(d)) - 1)

df1['F'] = df1.D.apply(first_two)



output:


In [212]: df1
Out[212]:
A B C D E F
0 1 8 69 1663 Married 16
1 1 4 82 8818 Single 88
2 1 3 8 9232 Single 92
3 4 1 25 9643 Divorced 96
4 5 1 56 4900 Widow(er) 49
5 3 6 79 8568 Single 85
6 3 4 98 4975 Married 49
7 4 6 68 8938 Divorced 89
8 1 9 49 7513 Married 75
9 4 8 82 1515 Widow(er) 15



Most of the SO solutions use string slicing; this one uses math to do the "slice". The same thing as a one-line lambda:


df1['F'] = df1.D.apply(lambda d: d // 10 ** (int(math.log10(d)) - 1))
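A caveat with the logarithm step: `math.log(d, 10)` divides two natural logarithms, so for exact powers of ten it can land just below the integer (in CPython, `math.log(1000, 10)` is `2.9999999999999996`), which would make a `math.log`-based version return 100 instead of 10 for `d = 1000`. `math.log10` is exact for powers of ten; an alternative that avoids floating point entirely is a small integer loop. This is a sketch, not the answer's method: the helper name `first_two_int` is hypothetical and it assumes non-negative integers.

```python
import math


def first_two_int(d: int) -> int:
    """Return the leading two digits of a non-negative integer.

    Uses only integer division, so there is no floating-point rounding.
    Values below 100 (including 1-digit numbers) are returned unchanged.
    """
    if d < 0:
        raise ValueError("expected a non-negative integer")
    while d >= 100:
        d //= 10
    return d


# The float-based digit count can undershoot for exact powers of ten:
print(math.log(1000, 10))   # slightly below 3 in CPython
print(math.log10(1000))     # 3.0
print(first_two_int(1000))  # 10
```

With the question's data it would be used the same way as `first_two`, e.g. `df1['F'] = df1.D.apply(first_two_int)`.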



Timings (I didn't include the setup, but it is as described above):


#string slice method
In [255]: print(t.timeit(100))
3.3840187825262547e-06

#'first_two' method
In [252]: print(t.timeit(100))
1.8120044842362404e-06

#'lambda' method
In [249]: print(t.timeit(100))
1.9049621187150478e-06



It is odd that calling the named function is faster than the lambda; I'm not sure why.







Thank you very much to everyone. I tried all the methods proposed and all of them solved my problem. I decided to implement the fastest one. Thank you again for your help!
– SamR
Jul 1 at 11:11



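Here's a solution using NumPy. It requires the numbers in D to have at least 2 digits: for a 1-digit number the exponent becomes -1, and NumPy refuses to raise integers to negative integer powers.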


df = pd.DataFrame({'D': [1663, 8818, 9232, 9643, 31, 455, 43153, 45]})

df['F'] = df['D'] // np.power(10, np.log10(df['D']).astype(int) - 1)

print(df)

D F
0 1663 16
1 8818 88
2 9232 92
3 9643 96
4 31 31
5 455 45
6 43153 43
7 45 45
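If some values in D might have a single digit, one hedged variant is to clamp the number of dropped digits at zero so the exponent never goes negative. This is a sketch, assuming all values are positive (`log10` of 0 or of a negative number is not finite) and small enough that float64 `log10` counts their digits correctly; the sample data is made up.

```python
import numpy as np
import pandas as pd

# Made-up sample data, including a 1-digit value and an exact power of ten
df = pd.DataFrame({'D': [1663, 8818, 7, 31, 455, 43153, 1000]})

# Number of decimal digits in each (positive) value
digits = np.log10(df['D']).astype(int) + 1

# How many trailing digits to drop; clamp at 0 so 1-digit values are kept as-is
shift = (digits - 2).clip(lower=0)

df['F'] = df['D'] // np.power(10, shift)
print(df)
```

Clamping the shift keeps the integer power non-negative, which is exactly the condition NumPy needs for integer `np.power`.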



If all your numbers have 4 digits, you can simply use:


df['F'] = df['D'] // 100
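That shortcut silently gives wrong answers for other lengths (for example, 43153 // 100 is 431 and 45 // 100 is 0), so a cheap guard can make the 4-digit assumption explicit. A minimal sketch with made-up sample values:

```python
import pandas as pd

df = pd.DataFrame({'D': [1663, 8818, 9232, 9643]})

# D // 100 only yields the first two digits when every value has exactly
# 4 digits, so check that assumption explicitly before relying on it.
if not df['D'].between(1000, 9999).all():
    raise ValueError("column D contains values that are not 4-digit integers")

df['F'] = df['D'] // 100
print(df['F'].tolist())
```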



For larger dataframes, these numeric methods will be more efficient than converting integers to strings, extracting the first 2 characters and converting back to int.
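A rough way to check this efficiency claim on your own machine is a `timeit` comparison on a larger, randomly generated column. This is a sketch: the data size, value range, and helper names (`string_slice`, `numeric`) are hypothetical, and the printed timings will depend on your hardware and pandas/NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 100,000 random integers with 2 to 6 digits
rng = np.random.default_rng(0)
df = pd.DataFrame({'D': rng.integers(10, 1_000_000, size=100_000)})


def string_slice(s: pd.Series) -> pd.Series:
    # int -> str, keep the first 2 characters, str -> int
    return s.astype(str).str[:2].astype(int)


def numeric(s: pd.Series) -> pd.Series:
    # Keep the leading 2 digits with integer division (needs >= 2 digits)
    return s // np.power(10, np.log10(s).astype(int) - 1)


# Make sure both approaches agree before comparing their speed
assert (string_slice(df['D']) == numeric(df['D'])).all()

runs = 5
for name, fn in [('string slice', string_slice), ('numeric', numeric)]:
    secs = timeit.timeit(lambda: fn(df['D']), number=runs)
    print(f'{name}: {secs / runs * 1000:.1f} ms per call')
```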







Agree: the power and log functions (math-based) are notably faster, especially with really large data sets.
– Bill Armstrong
Jul 1 at 2:22







Now I'll have to work through, with paper and pencil, why one of these two solutions can operate on 1-digit numbers while the other requires at least 2.
– Bill Armstrong
Jul 1 at 2:27






Good to know the difference between the easy way and the best way.
– Kumar
Jul 1 at 8:08



You could use something like:


df1['F'] = df1.D.astype(str).str[:2].astype(int)
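One caveat with the string approach: it slices characters, so for a negative value the minus sign counts as one of the two characters (-8818 becomes -8). The question's data has no negatives, but if they can occur, a hedged sketch is to slice the absolute value and restore the sign afterwards; the sample Series is made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1663, -8818, 7, -45])

# Plain slicing keeps the minus sign as one of the two characters
naive = s.astype(str).str[:2].astype(int)

# Slice the absolute value, then restore the sign
signed = s.abs().astype(str).str[:2].astype(int) * np.sign(s)

print(naive.tolist())
print(signed.tolist())
```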





