Pandas get column value based on row value [duplicate] - python

This question already has answers here:
Lookup Values by Corresponding Column Header in Pandas 1.2.0 or newer
(4 answers)
Closed 1 year ago.
I have the following dataframe:
df = pd.DataFrame(data={'flag': ['col3', 'col2', 'col2'],
'col1': [1, 3, 2],
'col2': [5, 2, 4],
'col3': [6, 3, 6],
'col4': [0, 4, 4]},
index=pd.Series(['A', 'B', 'C'], name='index'))
index
flag
col1
col2
col3
col4
A
col3
1
5
6
0
B
col2
3
2
3
4
C
col2
2
4
6
4
For each row, I want to get the value when column name is equal to the flag.
index
flag
col1
col2
col3
col4
col_val
A
col3
1
5
6
0
6
B
col2
3
2
3
4
2
C
col2
2
4
6
4
4
– Index A has a flag of col3. So col_val should be 6 because df['col3'] for that row is 6.
– Index B has a flag of col2. So col_val should be 2 because df['col2'] for that row is 2.
– Index C has a flag of col2. So col_val should be 4 because df['col2'] for that row is 3.

Per this page:
idx, cols = pd.factorize(df['flag'])
df['COl_VAL'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
Output:
>>> df
flag col1 col2 col3 col4 COl_VAL
index
A col3 1 5 6 0 6
B col2 3 2 3 4 2
C col2 2 4 6 4 4

The docs has an example that you can adapt; the solution is below is just another option.
What it does is flip the dataframe into a MultiIndex dataframe, select the relevant columns and trim it to non nulls::
cols = [(ent, ent) for ent in df.flag.unique()]
(df.assign(col_val = df.pivot(index = None, columns = 'flag')
.loc(axis = 1)[cols].sum(1)
)
flag col1 col2 col3 col4 col_val
index
A col3 1 5 6 0 6.0
B col2 3 2 3 4 2.0
C col2 2 4 6 4 4.0

try this:
cond = ([df.columns.values[1:]] * df.shape[0]) == df.flag.values.reshape(-1,1)
df1 = df.set_index('flag', append=True)
df1.join(df1.where(cond).ffill(axis=1).col4.rename('res')).reset_index('flag')

Related

Python: Transform or Melt and Groupby a Dataframe

I have something like this:
df =
col1 col2 col3
0 B C A
1 E D G
2 NaN F B
EDIT : I need to convert it into this:
result =
Name location
0 B col1,col2
1 C col1
2 A col1
3 E col2
4 D col2
5 G col2
6 F col3
Essentially getting a "location" telling me which column an "Name" is in. Thank you in advance.
Try melt and dropna:
>>> df.melt(var_name='location').dropna().groupby('value', sort=False, as_index=False).agg(', '.join)
value location
0 B col1, col3
1 E col1
2 C col2
3 D col2
4 F col2
5 A col3
6 G col3
>>>
Also groupby and agg.
Or an alternative with stack():
new = df.stack().reset_index().drop('level_0',axis=1).dropna()
new.columns = ['name','location']
prints:
name location
0 col1 B
1 col2 C
2 col3 A
3 col1 E
4 col2 D
5 col3 G
6 col2 F
EDIT:
To get your updated output you could use a groupby along with join():
new.groupby('location').agg({'name':lambda x: ', '.join(list(x))}).reset_index()
Which gives you:
location name
0 A col3
1 B col1, col3
2 C col2
3 D col2
4 E col1
5 F col2
6 G col3
Try using melt to convert columns to rows. And give the rows a column name.
Then dropna to remove the NaN values in rows.
df = df.melt(var_name="location", value_name="Name").dropna()
You can use pandas.melt and pandas.groupby.agg:
df = df.melt(var_name="location", value_name="Name").dropna()
new_df = df.groupby("Name", as_index=False).agg(",".join)
print(new_df)
Output:
Name location
0 A col3
1 B col1,col3
2 C col2
3 D col2
4 E col1
5 F col2
6 G col3

Fetch row data from known index in pandas

df1:
col1 col2
0 a 5
1 b 2
2 c 1
df2:
col1
0 qa0
1 qa1
2 qa2
3 qa3
4 qa4
5 qa5
final output:
col1 col2 col3
0 a 5 qa5
1 b 2 qa2
2 c 1 qa1
Basically , in df1, I have index stored for another df data. I have to fetch data from df2 and append it in df1.
I don't know how to fetch data via index number.
Use Series.map by another Series:
df1['col3'] = df1['col2'].map(df2['col1'])
Or use DataFrame.join with rename column:
df1 = df1.join(df2.rename(columns={'col1':'col3'})['col3'], on='col2')
print (df1)
col1 col2 col3
0 a 5 qa5
1 b 2 qa2
2 c 1 qa1
You can use iloc to get data and then to_numpy for values
df1["col3"] = df2.iloc[df1.col2].to_numpy()
df1
col1 col2 col3
0 a 5 qa5
1 b 2 qa2
2 c 1 qa1

Pandas: Get mean of different rows when columns are equal

I'm trying to find the mean of values in different rows, grouped by similarities in other columns. Example:
In [14]: pd.DataFrame({'col1':[1,2,1,2], 'col2':['A','C','A','B'], 'col3':[1, 5, 6, 9]})
Out[14]:
col1 col2 col3
0 1 A 1
1 2 C 5
2 1 A 6
3 2 B 9
What I would like is to add a column with the means of col3, for all rows where the combination of col1 and col2 match. Desired output:
Out[14]:
col1 col2 col3 mean
0 1 A 1 3.5
1 2 C 5 5
2 1 A 6 3.5
3 2 B 9 9
I have tried several things with groupby in combination with apply but couldn't get proper results.
its a transform my man
df['mean'] = df.groupby(['col1','col2']).col3.transform('mean')

How to do group by and take count of unique and count of some value as aggregate on same column in python pandas?

My question is related to my previous Question but it's different. So I am asking the new question.
In above question see the answer of #jezrael.
df = pd.DataFrame({'col1':[1,1,1],
'col2':[4,4,6],
'col3':[7,7,9],
'col4':[3,3,5]})
print (df)
col1 col2 col3 col4
0 1 4 7 3
1 1 4 7 3
2 1 6 9 5
df1 = df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'})
df1['result_col'] = df1['col3'].div(df1['col4'])
print (df1)
col4 col3 result_col
col1 col2
1 4 1 2 2.0
6 1 1 1.0
Now here I want to take count for the specific value of col4 . Say I also want to take count of col4 == 3 in the same query.
df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'}) ... + count(col4=='3')
How to do this in same above query I have tried bellow but not getting solution.
df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique','col4':'x: lambda x[x == 7].count()'})
Do some preprocessing by including the col4==3 as a column ahead of time. Then use aggregate
df.assign(result_col=df.col4.eq(3).astype(int)).groupby(
['col1', 'col2']
).agg(dict(col3='size', col4='nunique', result_col='sum'))
col3 result_col col4
col1 col2
1 4 2 2 1
6 1 0 1
old answers
g = df.groupby(['col1', 'col2'])
g.agg({'col3':'size','col4': 'nunique'}).assign(
result_col=g.col4.apply(lambda x: x.eq(3).sum()))
col3 col4 result_col
col1 col2
1 4 2 1 2
6 1 1 0
slightly rearranged
g = df.groupby(['col1', 'col2'])
final_df = g.agg({'col3':'size','col4': 'nunique'})
final_df.insert(1, 'result_col', g.col4.apply(lambda x: x.eq(3).sum()))
final_df
col3 result_col col4
col1 col2
1 4 2 2 1
6 1 0 1
I think you need aggregate with list of function in dict for column col4.
If need count 3 values the simpliest is sum True values in x == 3:
df1 = df.groupby(['col1','col2'])
.agg({'col3':'size','col4': ['nunique', lambda x: (x == 3).sum()]})
df1 = df1.rename(columns={'<lambda>':'count_3'})
df1.columns = ['{}_{}'.format(x[0], x[1]) for x in df1.columns]
print (df1)
col4_nunique col4_count_3 col3_size
col1 col2
1 4 1 2 2
6 1 0 1

Rearrange Python Pandas DataFrame Rows into a Single Row

I have a Pandas dataframe that looks something like:
df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}, index=['A', 'B', 'C', 'D'])
col1 col2
A 1 50
B 2 60
C 3 70
D 4 80
However, I want to automatically rearrange it so that it looks like:
col1 A col1 B col1 C col1 D col2 A col2 B col2 C col2 D
0 1 2 3 4 50 60 70 80
I want to combine the row name with the column name
I want to end up with only one row
df2 = df.unstack()
df2.index = [' '.join(x) for x in df2.index.values]
df2 = pd.DataFrame(df2).T
df2
col1 A col1 B col1 C col1 D col2 A col2 B col2 C col2 D
0 1 2 3 4 5 6 7 8
If you want to have the orignal x axis labels in front of the column names ("A col1"...) just change .join(x) by .join(x[::-1]):
df2 = df.unstack()
df2.index = [' '.join(x[::-1]) for x in df2.index.values]
df2 = pd.DataFrame(df2).T
df2
A col1 B col1 C col1 D col1 A col2 B col2 C col2 D col2
0 1 2 3 4 5 6 7 8
Here's one way to do it, there could be a simpler way
In [562]: df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [50, 60, 70, 80]},
index=['A', 'B', 'C', 'D'])
In [563]: pd.DataFrame([df.values.T.ravel()],
columns=[y+x for y in df.columns for x in df.index])
Out[563]:
col1A col1B col1C col1D col2A col2B col2C col2D
0 1 2 3 4 50 60 70 80

Categories