Selecting iloc based on a condition - python

My problem is hard to explain but easy to understand with an example.
From this DataFrame:
pd.DataFrame([[2,"1523974569"],[3,"3214569871"],[0,"9384927512"]])
I would like to obtain:
pd.DataFrame(["15","321",""])
The first column tells me how many characters to take from the start of the second column.
Thanks

You can do this with apply and a lambda, working row by row (axis=1):
import pandas as pd
df = pd.DataFrame([[2, "1523974569"], [3, "3214569871"], [0, "9384927512"]])
# For each row, slice the string in column 1 to the length given in column 0
df[2] = df.apply(lambda x: x[1][:x[0]], axis=1)
print(df)
Output:
0 1 2
0 2 1523974569 15
1 3 3214569871 321
2 0 9384927512
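If you would rather avoid apply, a plain list comprehension over the two columns gives the same column. This is a sketch of an alternative, not part of the answer above; it skips building a Series for every row, which is usually cheaper on large frames.

```python
import pandas as pd

df = pd.DataFrame([[2, "1523974569"], [3, "3214569871"], [0, "9384927512"]])

# Pair each length (column 0) with its string (column 1) and keep that prefix.
df[2] = [s[:n] for n, s in zip(df[0], df[1])]
print(df)
```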

Related

Pandas - How to extract values from a large DF without any 'keys' using another DF's values?

I have one large matrix as a pandas DataFrame without any named keys, just the default integer labels. A smaller version to demonstrate the problem:
M=pd.DataFrame(np.random.rand(4,5))
I want to use another DataFrame as a reference, with a structure like this:
N=pd.DataFrame({'A':[2,2,2],'B':[2,3,4]})
...to extract values from the large DataFrame, where the values in 'A' are the ROW numbers and the values in 'B' are the COLUMN numbers of the large DataFrame. The expected output would look like this:
Large DF
0 1 2 3 4
0 0.766275 0.910825 0.378541 0.775416 0.639854
1 0.505877 0.992284 0.720390 0.181921 0.501062
2 0.439243 0.416820 0.285719 0.100537 0.429576
3 0.243298 0.560427 0.162422 0.631224 0.033927
Small DF
A B
0 2 2
1 2 3
2 2 4
Expected Output:
A B extracted values
0 2 2 0.285719
1 2 3 0.100537
2 2 4 0.429576
So far I've tried different versions of something like this:
N['extracted'] = M.iloc[N['A'].astype(int):,N['B'].astype(int)]
...but it keeps failing with this error:
TypeError: cannot do positional indexing on RangeIndex with these indexers
[0 2
1 2
2 2
Which approach would be best?
Would this job be easier if I converted the DataFrames to NumPy arrays?
Thanks for your help!
I think you want the apply function with axis=1, which goes through your data row by row:
N['extracted'] = N.apply(lambda row: M.iloc[row['A'], row['B']], axis=1)
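On the second question (whether NumPy arrays would help): a vectorized alternative, sketched here as an assumption rather than the answer above, is to convert M to a NumPy array and use integer-array indexing. It treats A and B as positional row and column numbers, the same way the iloc attempt does.

```python
import numpy as np
import pandas as pd

M = pd.DataFrame(np.random.rand(4, 5))
N = pd.DataFrame({'A': [2, 2, 2], 'B': [2, 3, 4]})

# Integer-array ("fancy") indexing pairs the two arrays element-wise,
# picking M[A[0], B[0]], M[A[1], B[1]], ... in a single step.
N['extracted'] = M.to_numpy()[N['A'].to_numpy(), N['B'].to_numpy()]
print(N)
```

Because the lookup happens in one NumPy call, it avoids a Python-level function call per row of N.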

Changing values in dataframe based on cell and column name

I have a dataframe
df = pd.DataFrame([[0, 1, 2]], columns=['3m3a', '1z6n', '11p66d'])
Now I would like to compute 2 * value * (the last number in the column name), e.g. for the last column 2 * 2 * 66.
df.apply(lambda x: 2*x) handles step 1.
Step 2 is the hardest part.
I could build a new DataFrame like df2 = df.stack().reset_index().apply(lambda x: x[re.search('[a-zA-Z]+', x).end():]) and then multiply by 2.
What’s a more pythonic way?
For DataFrame:
3m3a 1z6n 11p66d
0 0 1 2
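You can use .columns.str.extract and then DataFrame.multiply:
import pandas as pd
df = pd.DataFrame([[0, 1, 2]], columns=['3m3a', '1z6n', '11p66d'])
# Trailing number of each column name as a 1x3 row of ints: 3, 6, 66
vals = df.columns.str.extract(r"(\d+)[a-z]*?$").T.astype(int)
df = df.multiply(2 * vals.values, axis=1)
print(df)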
Prints:
3m3a 1z6n 11p66d
0 0 12 264
Late to the party with almost the same answer, but using a negative lookbehind regex:
newdf = df.multiply(
2 * df.columns.str.extract(r'.*(?<!\d)(\d+)\D*').astype(int).values.ravel(),
axis=1)
>>> newdf
3m3a 1z6n 11p66d
0 0 12 264
Thank you, both work.
What if I want to split each column name into two parts: one up to and including the first letter, and the rest?
df.columns.str.split(r"(\d+\D+)",n=1,expand=True)
This works, but gives me three parts, the first of which is blank.
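The blank first part appears because the capturing group in the split pattern matches from the very start of the name. One way around it (a suggestion, not from the thread) is str.extract with two capture groups, one for each part you want:

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2]], columns=['3m3a', '1z6n', '11p66d'])

# Group 1: everything up to and including the first letter.
# Group 2: the remainder of the name.
parts = df.columns.str.extract(r'^([^a-zA-Z]*[a-zA-Z])(.*)$')
print(parts)
```

Each row of parts holds the two pieces of one column name, with no empty column.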

How can I rename NaN columns in python pandas?

Good day everyone! I had trouble expanding a nested dictionary into separate columns. I fixed that using concat and json_normalize, but for some reason the code replaced all the column names with NaN.
Does anyone know how to fix this?
Code I used:
import pandas as pd
from pandas import json_normalize
c = ['photo.photo_replace', 'photo.photo_remove', 'photo.photo_add', 'photo.photo_effect', 'photo.photo_brightness',
'photo.background_color', 'photo.photo_resize', 'photo.photo_rotate', 'photo.photo_mirror', 'photo.photo_layer_rearrange',
'photo.photo_move', 'text.text_remove', 'text.text_add', 'text.text_edit', 'text.font_select', 'text.text_color', 'text.text_style',
'text.background_color', 'text.text_align', 'text.text_resize', 'text.text_rotate', 'text.text_move', 'text.text_layer_rearrange']
df_edit = pd.concat([json_normalize(x)[c] for x in df['editables']], ignore_index=True)
df.columns = df.columns.str.split('.').str[1]
Current problem: (screenshot not available)
Result I want: (screenshot not available)
import pandas as pd
df = pd.DataFrame({
'A':[1,2,3],
'B':[3,3,3]
})
print(df)
A B
0 1 3
1 2 3
2 3 3
c=['new_name1','new_name2']
df.columns=c
print(df)
new_name1 new_name2
0 1 3
1 2 3
2 3 3
Remember: the length of the list of names (c) must equal the number of columns.
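As for where the NaN names come from: `.str.split('.').str[1]` returns NaN for any name that contains no '.', because there is no second piece to take. In the question the rename is applied to df rather than df_edit, which is one plausible way to hit names without a dot; that is a guess, since the screenshots are missing. A small sketch, with made-up column names:

```python
import pandas as pd

# Hypothetical column names: two dotted, one without a dot.
cols = pd.Index(['photo.photo_add', 'text.text_edit', 'editables'])

# .str[1] takes the second piece; 'editables' has none, so it becomes NaN.
second = cols.str.split('.').str[1]
print(second)

# .str[-1] takes the last piece, so names without a '.' are kept whole.
last = cols.str.split('.').str[-1]
print(last)
```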

How to find the number of an element in a column of a dataframe

For example, I have a DataFrame A like this:
a b c
x 0 2 1
y 1 3 2
z 0 2 4
I want to get the number of 0s in column 'a', which should be 2 (rows x and z).
Is there a simple way, or a function that does this directly?
I've Googled for it, but there are only articles like this.
count the frequency that a value occurs in a dataframe column
That approach builds a new DataFrame, which is more than I need.
Use sum on a boolean mask. True values are treated as 1, so the output is the count of 0 values:
out = A.a.eq(0).sum()
print (out)
2
Try pandas' value_counts:
A['a'].value_counts()[0]
More generally, use df[column_name].value_counts()[searched_value]. This raises a KeyError if searched_value never occurs in the column.
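Both answers, side by side on the example frame A. The `.get` call is an addition of mine so that a value that never occurs counts as 0 instead of raising a KeyError; the value 7 is just an arbitrary absent value.

```python
import pandas as pd

A = pd.DataFrame({'a': [0, 1, 0], 'b': [2, 3, 2], 'c': [1, 2, 4]},
                 index=['x', 'y', 'z'])

# Boolean mask: each True counts as 1 when summed.
n_mask = int(A['a'].eq(0).sum())

# value_counts maps each distinct value to its count.
counts = A['a'].value_counts()
n_vc = counts.get(0, 0)        # count of 0s
n_missing = counts.get(7, 0)   # 7 never occurs, so the default 0 is returned
print(n_mask, n_vc, n_missing)
```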

Subtract 2 values from one another within 1 column after groupby

Apologies if this is a basic question, but I'm failing to figure out the solution.
After grouping my pandas DataFrame by id, I need to subtract the first value of a column (column 8 in my df) from the last value and divide the result by a number (e.g. 60), giving one value per id. The final output would ideally look something like this:
id
1 1523
2 1644
I have the calculation itself, which works when applied to the entire column of the df:
(df.iloc[-1,8] - df.iloc[0,8])/60
However, I can't combine it with groupby. Among other things, I tried apply, which doesn't work:
df.groupby(['id']).apply((df.iloc[-1,8] - df.iloc[0,8])/60)
I also tried putting the calculation in a function and calling apply(func), but none of my attempts have worked so far. Any help is much appreciated, thank you!
Demo:
In [204]: df
Out[204]:
id val
0 1 12
1 1 13
2 1 19
3 2 20
4 2 30
5 2 40
In [205]: df.groupby(['id'])['val'].agg(lambda x: (x.iloc[-1] - x.iloc[0])/60)
Out[205]:
id
1 0.116667
2 0.333333
Name: val, dtype: float64
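An alternative sketch, not from the answer above: the built-in groupby first() and last() give the same numbers here without a lambda. One difference to keep in mind is that first() and last() skip NaN, whereas x.iloc[0] and x.iloc[-1] take the literal first and last rows. With the question's real data you would select the column by its name (e.g. df.columns[8]) instead of 'val'.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'val': [12, 13, 19, 20, 30, 40]})

g = df.groupby('id')['val']
# Last minus first value per id, divided by 60 (first/last skip NaN).
result = (g.last() - g.first()) / 60
print(result)
```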
