What is the difference between using `['data2']` and `[['data2']]` with `groupby` - python

I am working through a Python for data analysis tutorial and want some clarification on the output I get from using `['data2']` and `[['data2']]` with groupby.

If you use `['data2']`, you get a Series with a MultiIndex.
If you use the subset `[['data2']]`, you get a DataFrame with a MultiIndex.
And if you use:
df.groupby(['key1','key2'], as_index=False)['data2'].mean()
you get a DataFrame with 3 columns and no MultiIndex.
Maybe it is clearer with another form:
import pandas as pd
df = pd.DataFrame({'key1':[1,2,2,1,2,2],
                   'key2':[4,4,4,4,5,5],
                   'data2':[7,8,9,1,3,5],
                   'D':[1,3,5,7,9,5]})
print(df)
   D  data2  key1  key2
0  1      7     1     4
1  3      8     2     4
2  5      9     2     4
3  7      1     1     4
4  9      3     2     5
5  5      5     2     5
print(df['data2'].groupby([df.key1,df.key2]).mean())
key1  key2
1     4       4.0
2     4       8.5
      5       4.0
Name: data2, dtype: float64
print(df[['data2']].groupby([df.key1,df.key2]).mean())
           data2
key1 key2
1    4       4.0
2    4       8.5
     5       4.0
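And the third form, with as_index=False, run on the same df (a quick runnable sketch to confirm the shape):

```python
import pandas as pd

df = pd.DataFrame({'key1': [1, 2, 2, 1, 2, 2],
                   'key2': [4, 4, 4, 4, 5, 5],
                   'data2': [7, 8, 9, 1, 3, 5],
                   'D': [1, 3, 5, 7, 9, 5]})

# as_index=False keeps key1/key2 as ordinary columns instead of a MultiIndex
out = df.groupby(['key1', 'key2'], as_index=False)['data2'].mean()
print(out)
#    key1  key2  data2
# 0     1     4    4.0
# 1     2     4    8.5
# 2     2     5    4.0
```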

Related

How to replace rows by row with condition

I want to replace all rows that have "A" in the name column
with a single row from another df.
I have this:
data = {"col1": [2,3,4,5,7],
        "col2": [4,2,4,6,4],
        "col3": [7,6,9,11,2],
        "col4": [14,11,22,8,5],
        "name": ["A","A","V","A","B"],
        "n_roll": [8,2,1,3,9]}
df = pd.DataFrame.from_dict(data)
df
This is my single row (the other df):
data2 = {"col1": [0],
         "col2": [1],
         "col3": [5],
         "col4": [6]}
df2 = pd.DataFrame.from_dict(data2)
df2
This is how I want it to look:
data = {"col1": [0,0,4,0,7],
        "col2": [1,1,4,1,4],
        "col3": [5,5,9,5,2],
        "col4": [6,6,22,6,5],
        "name": ["A","A","V","A","B"],
        "n_roll": [8,2,1,3,9]}
df = pd.DataFrame.from_dict(data)
df
I tried df.loc[df["name"]=="A"][df2.columns] = df2, but it did not work.
We can try mask + combine_first:
df = df.mask(df['name'].eq('A'), df2.loc[0], axis=1).combine_first(df)
df
   col1  col2  col3  col4 name  n_roll
0     0     1     5     6    A     8.0
1     0     1     5     6    A     2.0
2     4     4     9    22    V     1.0
3     0     1     5     6    A     3.0
4     7     4     2     5    B     9.0
df.loc[df["name"]=="A"][df2.columns]=df2 is chained indexing and is not expected to work. For details, see the pandas docs on returning a view versus a copy.
You can also use boolean indexing like this:
df.loc[df['name']=='A', df2.columns] = df2.values
Output:
   col1  col2  col3  col4 name  n_roll
0     0     1     5     6    A       8
1     0     1     5     6    A       2
2     4     4     9    22    V       1
3     0     1     5     6    A       3
4     7     4     2     5    B       9
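For reference, the boolean-indexing approach as a self-contained sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 3, 4, 5, 7],
                   "col2": [4, 2, 4, 6, 4],
                   "col3": [7, 6, 9, 11, 2],
                   "col4": [14, 11, 22, 8, 5],
                   "name": ["A", "A", "V", "A", "B"],
                   "n_roll": [8, 2, 1, 3, 9]})
df2 = pd.DataFrame({"col1": [0], "col2": [1], "col3": [5], "col4": [6]})

# .values strips df2's index so its single row broadcasts across all "A" rows;
# name and n_roll are untouched because only df2.columns are assigned
df.loc[df["name"] == "A", df2.columns] = df2.values
print(df)
```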

Replace NaN values with values from another table

Please help.
My first table looks like:
id  val1  val2
0      4    30
1      5   NaN
2      3    10
3      2     8
4      3   NaN
My second table looks like:
id  val1  val2_estimate
0      1              8
1      2             12
2      3             13
3      4             16
4      5             22
I want to replace NaN in the 1st table with estimated values from the column val2_estimate in the 2nd table where val1 matches. val1 in the 2nd table is unique. The end result needs to look like this:
id  val1  val2
0      4    30
1      5    22
2      3    10
3      2     8
4      3    13
I want to replace NaN values only.
Use merge to get the corresponding df2's estimate for df1, then use fillna:
df['val2'] = df['val2'].fillna(
    df.merge(df2, on=['val1'], how='left')['val2_estimate'])
df
   id  val1  val2
0   0     4  30.0
1   1     5  22.0
2   2     3  10.0
3   3     2   8.0
4   4     3  13.0
Many ways to skin a cat, this is one of them.
Use fillna with map from a pd.Series created using set_index:
df['val2'] = df['val2'].fillna(df['val1'].map(df2.set_index('val1')['val2_estimate']))
df
Output:
    val1  val2
id
0      4  30.0
1      5  22.0
2      3  10.0
3      2   8.0
4      3  13.0
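The map approach, assembled into a runnable sketch (the id column is omitted here since it matches the default index):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val1': [4, 5, 3, 2, 3],
                   'val2': [30, np.nan, 10, 8, np.nan]})
df2 = pd.DataFrame({'val1': [1, 2, 3, 4, 5],
                    'val2_estimate': [8, 12, 13, 16, 22]})

# Build a val1 -> estimate lookup Series, then fill only the NaNs in val2
lookup = df2.set_index('val1')['val2_estimate']
df['val2'] = df['val2'].fillna(df['val1'].map(lookup))
print(df)
```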

Pandas how to get this?

I am using Python3 pandas to read a CSV file which contains 4 columns, named {a,b,c,d}.
Now I want to add a new column e whose values are given by (d - last.d) / last.d, where last.d is the last value in column d.
How can I do it?
Use sub with div, and iat to select the last value:
df = pd.DataFrame({
    'a': [4,5,4,5,5,4],
    'b': [7,8,9,4,2,3],
    'c': [1,3,5,7,1,0],
    'd': [5,3,6,9,2,10],
})

df['e'] = df['d'].sub(df['d'].iat[-1]).div(df['d'].iat[-1])
print(df)
   a  b  c   d    e
0  4  7  1   5 -0.5
1  5  8  3   3 -0.7
2  4  9  5   6 -0.4
3  5  4  7   9 -0.1
4  5  2  1   2 -0.8
5  4  3  0  10  0.0
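Equivalently, with plain operators instead of sub/div (a minimal sketch on just column d):

```python
import pandas as pd

df = pd.DataFrame({'d': [5, 3, 6, 9, 2, 10]})

# (d - last d) / last d, where iat[-1] grabs the last value by position
last = df['d'].iat[-1]
df['e'] = (df['d'] - last) / last
print(df)
```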

Pandas - Filter data frame with another data frame

I have two data frames of the same length, like this.
df1:
    density
1      1,45
2      3,87
3      4,35
4      2,87
5      0.74
6      9.34
7     3.087
8      0.28
9      6,47
10     5,59
The second data frame looks like this:
df2:
    State
1       1
2       1
3       1
4       1
5       1
6       1
7       0
8       0
9       0
10      0
I want an output that looks like this, i.e. filter df1 to keep only the rows where df2 is equal to 1:
output:
   density
1     1,45
2     3,87
3     4,35
4     2,87
5     0.74
6     9.34
How can I do that?
Can you help me please.
Let's use a boolean index:
df1[df2.eq(1).values]
Output:
   density
1     1,45
2     3,87
3     4,35
4     2,87
5     0.74
6     9.34
This should work:
df1[df2.State.astype(bool)]
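Both answers reduce df2 to a 1-D boolean mask over the rows. A runnable sketch, with the comma decimals replaced by plain floats for simplicity:

```python
import pandas as pd

df1 = pd.DataFrame({'density': [1.45, 3.87, 4.35, 2.87, 0.74,
                                9.34, 3.087, 0.28, 6.47, 5.59]})
df2 = pd.DataFrame({'State': [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]})

# Take the State column as a Series so .values yields a 1-D boolean mask,
# then use it to filter the rows of df1
out = df1[df2['State'].eq(1).values]
print(out)
```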

Pandas conditionally copying of cell value

Working with a Pandas DataFrame, I am trying to copy data from one cell into another cell only if the recipient cell contains a specific value. The transfer should go from:
   Col1 Col2
0     4    X
1     2    5
2     1    X
3     7    8
4    12   20
5     3    X
And the result should be:
   Col1 Col2
0     4    4
1     2    5
2     1    1
3     7    8
4    12   20
5     3    3
Is there an elegant or simple solution I am missing?
df.Col2 = df.Col1.where(df.Col2 == 'X', df.Col2)

Or with numpy.where:
import pandas as pd
import numpy as np
df.Col2 = np.where(df.Col2 == 'X', df.Col1, df.Col2)
Using pandas.DataFrame.ffill:
>>> df.replace('X', np.nan, inplace=True)
>>> df.ffill(axis=1)
   Col1 Col2
0     4    4
1     2    5
2     1    1
3     7    8
4    12   20
5     3    3
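The where approach as a self-contained sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({'Col1': [4, 2, 1, 7, 12, 3],
                   'Col2': ['X', 5, 'X', 8, 20, 'X']})

# Where Col2 == 'X', take the value from Col1; elsewhere keep Col2 as is
df['Col2'] = df['Col1'].where(df['Col2'] == 'X', df['Col2'])
print(df)
```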
