How to extract the first 2 digits of all numbers in a column of a dataframe?

I am completely new at Python (this is my first assignment) and I am trying to take the first two digits of the D-column of the following dataframe and put those two digits in a new column F:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'A': [1, 1, 1, 4, 5, 3, 3, 4, 1, 4],
    'B': [8, 4, 3, 1, 1, 6, 4, 6, 9, 8],
    'C': [69, 82, 8, 25, 56, 79, 98, 68, 49, 82],
    'D': [1663, 8818, 9232, 9643, 4900, 8568, 4975, 8938, 7513, 1515],
    'E': ['Married', 'Single', 'Single', 'Divorced', 'Widow(er)',
          'Single', 'Married', 'Divorced', 'Married', 'Widow(er)'],
})

I found several possible solutions here on Stack Overflow and tried to apply them, but none of them works for me. Either I get an error message (a different one depending on which solution I tried), or I do not get the result that I am expecting.

Welcome to StackOverflow! Please elaborate on what the issue is and what you've tried already to solve the issue. Also, please have a look at this help article. Cheers :)
– vatbub
Jun 30 at 23:55

3 Answers

Try this:

import math

def first_two(d):
    # Divide away everything except the two leading digits.
    return d // 10 ** (int(math.log(d, 10)) - 1)

df1['F'] = df1.D.apply(first_two)

output:

In [212]: df1
Out[212]:
   A  B   C     D          E   F
0  1  8  69  1663    Married  16
1  1  4  82  8818     Single  88
2  1  3   8  9232     Single  92
3  4  1  25  9643   Divorced  96
4  5  1  56  4900  Widow(er)  49
5  3  6  79  8568     Single  85
6  3  4  98  4975    Married  49
7  4  6  68  8938   Divorced  89
8  1  9  49  7513    Married  75
9  4  8  82  1515  Widow(er)  15

Most of the SO solutions use string slicing - this will use math to do the "slice".


df1['F'] = df1.D.apply(lambda d: d // 10 ** (int(math.log(d, 10)) - 1))
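One caveat for both versions above: math.log(d, 10) is computed in floating point, and for some exact powers of ten it can land just below the whole number (on typical CPython builds, math.log(1000, 10) evaluates to 2.9999999999999996), so int() truncates one too low. A minimal sketch that avoids logarithms altogether is below; first_two_int is a hypothetical name, not part of the original answer, and it assumes the values are integers.

```python
def first_two_int(d):
    """Return the two leading digits of an integer (hypothetical helper)."""
    d = abs(int(d))
    # Strip trailing digits until at most two remain.
    while d >= 100:
        d //= 10
    return d

values = [1663, 8818, 1000, 45, 7]
print([first_two_int(v) for v in values])  # [16, 88, 10, 45, 7]
```

It can be applied the same way as the function above, e.g. df1['F'] = df1.D.apply(first_two_int). Numbers with fewer than two digits are returned unchanged.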

Timings (I didn't include the setup, but it is as described above):

#string slice method
In [255]: print(t.timeit(100))
3.3840187825262547e-06

#'first_two' method
In [252]: print(t.timeit(100))
1.8120044842362404e-06

#'lambda' method
In [249]: print(t.timeit(100))
1.9049621187150478e-06

It is odd that calling the method is faster than the lambda (?)
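Since the setup was not posted, here is a rough sketch of how the three variants might be timed. This is an assumption about the setup, not the code that produced the numbers above; the candidates dictionary and its labels are made up for illustration. A lambda and a def both create ordinary function objects, so a gap as small as the one above may well be measurement noise; taking the best of several repeats helps reduce that.

```python
import math
import timeit

import pandas as pd

# Same D column as in the question.
df1 = pd.DataFrame({'D': [1663, 8818, 9232, 9643, 4900, 8568, 4975, 8938, 7513, 1515]})

def first_two(d):
    return d // 10 ** (int(math.log(d, 10)) - 1)

# Each candidate is a zero-argument callable that timeit can run directly.
candidates = {
    'string slice': lambda: df1.D.astype(str).str[:2].astype(int),
    'first_two': lambda: df1.D.apply(first_two),
    'lambda': lambda: df1.D.apply(lambda d: d // 10 ** (int(math.log(d, 10)) - 1)),
}

# Best of 5 repeats, 100 calls each, to dampen noise from other processes.
timings = {
    name: min(timeit.repeat(fn, number=100, repeat=5))
    for name, fn in candidates.items()
}
for name, seconds in timings.items():
    print(f'{name}: {seconds:.6f} s per 100 runs')
```

The absolute numbers will differ by machine and pandas version, and with only ten rows the per-call overhead dominates, so it is worth rerunning this on a frame of realistic size.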


Thank you very much to everyone. I tried all the methods proposed and all of them solved my problem. I decided to implement the fastest one. Thank you again for your help!
– SamR
Jul 1 at 11:11

Here's a solution using NumPy. It requires numbers in D to have at least 2 digits.


df = pd.DataFrame({'D': [1663, 8818, 9232, 9643, 31, 455, 43153, 45]})
df['F'] = df['D'] // np.power(10, np.log10(df['D']).astype(int) - 1)
print(df)

       D   F
0   1663  16
1   8818  88
2   9232  92
3   9643  96
4     31  31
5    455  45
6  43153  43
7     45  45

If all your numbers have 4 digits, you can simply use df['F'] = df['D'] // 100.


For larger dataframes, these numeric methods will be more efficient than converting integers to strings, extracting the first 2 characters and converting back to int.
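To spell out the arithmetic: for 43153, log10 is about 4.63, which truncates to 4; subtracting 1 gives 3, and 43153 // 10**3 is 43. For a one-digit number the exponent becomes -1, and NumPy typically refuses to raise integers to negative integer powers, which is why the version above needs at least 2 digits. One possible workaround, sketched here under the assumption that all values are positive integers, is to clip the exponent at 0 (the name drop is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'D': [1663, 31, 455, 43153, 7]})

# Number of trailing digits to drop: floor(log10(d)) - 1, but never below 0,
# so one- and two-digit values are left unchanged.
drop = (np.log10(df['D']).astype(int) - 1).clip(lower=0)
df['F'] = df['D'] // 10 ** drop
print(df['F'].tolist())  # [16, 31, 45, 43, 7]
```

Zero and negative values would still need separate handling, since log10 is not defined for them.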


agree - the power and log functions (math based) are notably faster, especially with really large data sets.
– Bill Armstrong
Jul 1 at 2:22


Now I'll have to work through with some paper and pencil the difference between these two solutions as to why one can operate with 1 digit and the other requires at least 2. (here)
– Bill Armstrong
Jul 1 at 2:27

good to know the difference between the easy way & the best way.
– Kumar
Jul 1 at 8:08

You could use something like:

df1['f'] = df1.D.astype(str).str[:2].astype(int)
