I have a dataframe df, where Col1, Col2 and Col3 are the column names:
Col1 Col2 Col3
a b
B 2 3
C 10 6
The first row above, with values a and b, holds subcategories, so Col1 is empty for that row.
I am trying to get the following:
B Col2 a 2
B Col3 b 3
C Col2 a 10
C Col3 b 6
I am not sure how to approach this.
Edit:
df.to_dict()
Out[16]:
{'Unnamed: 0': {0: nan, 1: 'B', 2: 'C'},
'Col2': {0: 'a', 1: '2', 2: '10'},
'Col3': {0: 'b', 1: '3', 2: '6'}}
Use stack and join
df_final = (df.iloc[1:].set_index('Col1').stack().reset_index(0)
.join(df.iloc[0,1:].rename('1')).sort_values('Col1'))
Out[345]:
Col1 0 1
Col2 B 2 a
Col3 B 3 b
Col2 C 10 a
Col3 C 6 b
You can try this, replacing the NaN with a blank (or any string you want the column to be named):
df.fillna('').set_index('Col1').T\
.set_index('',append=True).stack().reset_index()
Output:
level_0 Col1 0
0 Col2 a B 2
1 Col2 a C 10
2 Col3 b B 3
3 Col3 b C 6
df.fillna('Col0').set_index('Col1').T\
.set_index('Col0',append=True).stack().reset_index(level=[1,2])
Output:
Col0 Col1 0
Col2 a B 2
Col2 a C 10
Col3 b B 3
Col3 b C 6
df = pd.DataFrame.from_dict({'Col1': {0: np.nan, 1: 'B', 2: 'C'},
'Col2': {0: 'a', 1: '2', 2: '10'},
'Col3': {0: 'b', 1: '3', 2: '6'}})
# set index as a multi-index from the first row
df.index = pd.MultiIndex.from_product([df.iloc[0,:]])
# get rid of the empty row and reset the index
df = df.iloc[1:,:].reset_index()
answer = pd.melt(df, id_vars=['Col1',0], value_vars=['Col2','Col3'],value_name='vals')
answer[['Col1','variable',0,'vals']]
Col1 variable 0 vals
0 B Col2 a 2
1 C Col2 b 10
2 B Col3 a 3
3 C Col3 b 6
You can do the following:
df = pd.DataFrame({'Col1': {0: np.nan, 1: 'B', 2: 'C'},
'Col2': {0: 'a', 1: '2', 2: '10'},
'Col3': {0: 'b', 1: '3', 2: '6'}})
melted = pd.melt(df, id_vars=['Col1'], value_vars=['Col3',
'Col2']).dropna().reset_index(drop=True)
subframe = pd.DataFrame({'Col2': ['a'], 'Col3': ['b']}).melt()
melted.merge(subframe, on='variable')
Out[1]:
Col1 variable value_x value_y
0 B Col3 3 b
1 C Col3 6 b
2 B Col2 2 a
3 C Col2 10 a
Then you can rename your columns as you want
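As a rough sketch of that last step, the merged frame could be renamed with DataFrame.rename. The target names 'column', 'value' and 'subcategory' are only placeholders chosen for illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': {0: np.nan, 1: 'B', 2: 'C'},
                   'Col2': {0: 'a', 1: '2', 2: '10'},
                   'Col3': {0: 'b', 1: '3', 2: '6'}})

# Long format without the subcategory row (its Col1 is NaN, so dropna removes it)
melted = pd.melt(df, id_vars=['Col1'],
                 value_vars=['Col3', 'Col2']).dropna().reset_index(drop=True)

# One row per column name with its subcategory
subframe = pd.DataFrame({'Col2': ['a'], 'Col3': ['b']}).melt()

merged = melted.merge(subframe, on='variable')

# Hypothetical, more descriptive names for the merged columns
renamed = merged.rename(columns={'variable': 'column',
                                 'value_x': 'value',
                                 'value_y': 'subcategory'})
print(renamed)
```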
You can melt the dataframe, create a temporary column holding the subcategory (taken from the rows where Col1 is null and forward-filled), and then filter out the rows whose value is the subcategory itself (a or b):
(
df.melt("Col1")
.assign(temp=lambda x: np.where(x.Col1.isna(), x.value, np.nan))
.ffill()
.query("value != temp")
)
Col1 variable value temp
1 B Col2 2 a
2 C Col2 10 a
4 B Col3 3 b
5 C Col3 6 b
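Another possible sketch, assuming the subcategory row is always the first row: pull that row out as a lookup Series, melt the remaining rows, and map each column name to its subcategory. The names sub, long_df, 'column' and 'sub' are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [np.nan, 'B', 'C'],
                   'Col2': ['a', '2', '10'],
                   'Col3': ['b', '3', '6']})

# Lookup Series: index is the column name, value is the subcategory
sub = df.iloc[0, 1:]

# Melt only the data rows (everything after the subcategory row)
long_df = df.iloc[1:].melt(id_vars='Col1', var_name='column')

# Attach the subcategory that belongs to each column name
long_df['sub'] = long_df['column'].map(sub)

result = (long_df.sort_values(['Col1', 'column'])
                 [['Col1', 'column', 'sub', 'value']]
                 .reset_index(drop=True))
print(result)
```

Note that the values stay strings here, because the subcategory row forces the columns to object dtype; converting them with pd.to_numeric afterwards is one option if numbers are needed.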
I have the following dataframe:
df = pd.DataFrame(data={'flag': ['col3', 'col2', 'col2'],
'col1': [1, 3, 2],
'col2': [5, 2, 4],
'col3': [6, 3, 6],
'col4': [0, 4, 4]},
index=pd.Series(['A', 'B', 'C'], name='index'))
index  flag  col1  col2  col3  col4
A      col3     1     5     6     0
B      col2     3     2     3     4
C      col2     2     4     6     4
For each row, I want to get the value when column name is equal to the flag.
index  flag  col1  col2  col3  col4  col_val
A      col3     1     5     6     0        6
B      col2     3     2     3     4        2
C      col2     2     4     6     4        4
– Index A has a flag of col3. So col_val should be 6 because df['col3'] for that row is 6.
– Index B has a flag of col2. So col_val should be 2 because df['col2'] for that row is 2.
– Index C has a flag of col2. So col_val should be 4 because df['col2'] for that row is 4.
Per this page:
idx, cols = pd.factorize(df['flag'])
df['COl_VAL'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
Output:
>>> df
flag col1 col2 col3 col4 COl_VAL
index
A col3 1 5 6 0 6
B col2 3 2 3 4 2
C col2 2 4 6 4 4
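To make the two lines above easier to follow, here is a step-by-step sketch of the same idea with the intermediate objects named. The variable names values and picked are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'flag': ['col3', 'col2', 'col2'],
                        'col1': [1, 3, 2],
                        'col2': [5, 2, 4],
                        'col3': [6, 3, 6],
                        'col4': [0, 4, 4]},
                  index=pd.Series(['A', 'B', 'C'], name='index'))

# idx: integer code per row; cols: the distinct flags in order of appearance
idx, cols = pd.factorize(df['flag'])

# Keep only the flagged columns, ordered to match the codes in idx
values = df.reindex(cols, axis=1).to_numpy()

# Fancy indexing: for row i, take the column at position idx[i]
picked = values[np.arange(len(df)), idx]

df['col_val'] = picked
print(df)
```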
The docs have an example that you can adapt; the solution below is just another option.
What it does is flip the dataframe into a MultiIndex dataframe, select the relevant columns and trim it to non-nulls:
cols = [(ent, ent) for ent in df.flag.unique()]
(df.assign(col_val = df.pivot(index = None, columns = 'flag')
.loc(axis = 1)[cols].sum(1)
)
)
flag col1 col2 col3 col4 col_val
index
A col3 1 5 6 0 6.0
B col2 3 2 3 4 2.0
C col2 2 4 6 4 4.0
Try this:
cond = ([df.columns.values[1:]] * df.shape[0]) == df.flag.values.reshape(-1,1)
df1 = df.set_index('flag', append=True)
df1.join(df1.where(cond).ffill(axis=1).col4.rename('res')).reset_index('flag')
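If readability matters more than speed, a plainer sketch is to look up each row's flagged column one at a time with DataFrame.at. This is a slower, explicit alternative and not the approach used above.

```python
import pandas as pd

df = pd.DataFrame(data={'flag': ['col3', 'col2', 'col2'],
                        'col1': [1, 3, 2],
                        'col2': [5, 2, 4],
                        'col3': [6, 3, 6],
                        'col4': [0, 4, 4]},
                  index=pd.Series(['A', 'B', 'C'], name='index'))

# For every row label, read the cell in the column named by that row's flag
df['col_val'] = [df.at[row, col] for row, col in zip(df.index, df['flag'])]
print(df)
```

This assumes the index labels are unique; with duplicate labels, df.at would not return a single scalar.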
I have something like this:
df =
col1 col2 col3
0 B C A
1 E D G
2 NaN F B
EDIT : I need to convert it into this:
result =
Name location
0 B col1,col2
1 C col1
2 A col1
3 E col2
4 D col2
5 G col2
6 F col3
Essentially, I want a "location" telling me which column(s) a "Name" is in. Thank you in advance.
Try melt and dropna:
>>> df.melt(var_name='location').dropna().groupby('value', sort=False, as_index=False).agg(', '.join)
value location
0 B col1, col3
1 E col1
2 C col2
3 D col2
4 F col2
5 A col3
6 G col3
>>>
This also uses groupby and agg to join the column names for each value.
Or an alternative with stack():
new = df.stack().reset_index().drop('level_0',axis=1).dropna()
new.columns = ['name','location']
prints:
name location
0 col1 B
1 col2 C
2 col3 A
3 col1 E
4 col2 D
5 col3 G
6 col2 F
EDIT:
To get your updated output you could use a groupby along with join():
new.groupby('location').agg({'name':lambda x: ', '.join(list(x))}).reset_index()
Which gives you:
location name
0 A col3
1 B col1, col3
2 C col2
3 D col2
4 E col1
5 F col2
6 G col3
Try using melt to convert the columns to rows, naming the resulting columns with var_name and value_name.
Then use dropna to remove the rows that contain NaN.
df = df.melt(var_name="location", value_name="Name").dropna()
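As a small sketch of what that line produces, the frame from the question can be rebuilt and melted; the result is named long_df here only so that df is not overwritten.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['B', 'E', np.nan],
                   'col2': ['C', 'D', 'F'],
                   'col3': ['A', 'G', 'B']})

# One row per (column, cell) pair; the NaN cell is dropped
long_df = df.melt(var_name="location", value_name="Name").dropna()
print(long_df)

# Each Name with all of the columns it appears in, joined by commas
joined = long_df.groupby("Name")["location"].agg(",".join)
print(joined)
```

At this point B still appears twice (once for col1 and once for col3), which is why a groupby step is needed to collapse the locations into one row per name.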
You can use DataFrame.melt and DataFrame.groupby(...).agg:
df = df.melt(var_name="location", value_name="Name").dropna()
new_df = df.groupby("Name", as_index=False).agg(",".join)
print(new_df)
Output:
Name location
0 A col3
1 B col1,col3
2 C col2
3 D col2
4 E col1
5 F col2
6 G col3
I have a data frame like this:
df
col1 col2 col3 col4
A B C 12
A B C 8
A B C 10
P Q R 12
P Q R 11
K L S 1
K L S 15
U V R 20
I want the rows where the col4 value is the maximum within each combination of col1, col2 and col3.
For example, the result I am looking for is:
col1 col2 col3 col4
A B C 12
P Q R 12
K L S 15
U V R 20
How can I do this most efficiently using pandas?
Try this:
>>> import pandas as pd
>>> df = pd.read_csv("t.csv")
>>> df
col1 col2 col3 col4
0 A B C 12
1 A B C 8
2 A B C 10
3 P Q R 12
4 P Q R 11
5 K L S 1
6 K L S 15
7 U V R 20
>>> df.groupby(['col1']).max()
col2 col3 col4
col1
A B C 12
K L S 15
P Q R 12
U V R 20
You can use the groupby function with max():
df = pd.DataFrame({'col1' : ['A','A','A','P','P'], 'col2' : ['B','B','B','Q','Q'],
'col3':['C','C','C','R','R'], 'col4':[12,8,10,12,11]})
df.groupby(['col1', 'col2']).max()
Out :
col1 col2 col3 col4
A B C 12
P Q R 12
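By default the grouping keys end up in the index of the result. If ordinary columns are preferred, passing as_index=False to groupby is one way to get them, sketched here on the same five-row frame:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'A', 'P', 'P'],
                   'col2': ['B', 'B', 'B', 'Q', 'Q'],
                   'col3': ['C', 'C', 'C', 'R', 'R'],
                   'col4': [12, 8, 10, 12, 11]})

# as_index=False keeps col1 and col2 as regular columns instead of index levels
out = df.groupby(['col1', 'col2'], as_index=False).max()
print(out)
```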
You need to use groupby:
import pandas as pd
# setup test data
data = {'col1': ['A', 'A', 'A', 'P', 'P', 'K', 'K', 'U'], 'col2': ['B', 'B', 'B', 'Q', 'Q', 'L', 'L', 'V'],
'col3': ['C', 'C', 'C', 'R', 'R', 'S', 'S', 'R'], 'col4': [12, 8, 10, 12,11,1,15,20]}
data = pd.DataFrame(data=data)
# get max values
out_data = data.groupby(['col1', 'col2', 'col3']).max()
Output:
col1 col2 col3 col4
A B C 12
K L S 15
P Q R 12
U V R 20
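One caveat: max() computes the maximum of each column independently, which is fine here because col4 is the only non-key column. If the frame had further columns and whole rows had to be kept, a possible sketch is to use idxmax to find the row label of the largest col4 per group and select those rows:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'A', 'P', 'P', 'K', 'K', 'U'],
                   'col2': ['B', 'B', 'B', 'Q', 'Q', 'L', 'L', 'V'],
                   'col3': ['C', 'C', 'C', 'R', 'R', 'S', 'S', 'R'],
                   'col4': [12, 8, 10, 12, 11, 1, 15, 20]})

# Row label of the maximum col4 within each (col1, col2, col3) group;
# sort=False keeps the groups in their order of first appearance
best = df.groupby(['col1', 'col2', 'col3'], sort=False)['col4'].idxmax()

# Select the complete original rows
rows = df.loc[best]
print(rows)
```

If several rows tie for the maximum, idxmax returns only the first of them.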
I have a Pandas dataframe that looks something like:
df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}, index=['A', 'B', 'C', 'D'])
col1 col2
A 1 50
B 2 60
C 3 70
D 4 80
However, I want to automatically rearrange it so that it looks like:
col1 A col1 B col1 C col1 D col2 A col2 B col2 C col2 D
0 1 2 3 4 50 60 70 80
I want to combine the row name with the column name
I want to end up with only one row
df2 = df.unstack()
df2.index = [' '.join(x) for x in df2.index.values]
df2 = pd.DataFrame(df2).T
df2
col1 A col1 B col1 C col1 D col2 A col2 B col2 C col2 D
0 1 2 3 4 5 6 7 8
If you want the original row labels in front of the column names ("A col1"...), just replace .join(x) with .join(x[::-1]):
df2 = df.unstack()
df2.index = [' '.join(x[::-1]) for x in df2.index.values]
df2 = pd.DataFrame(df2).T
df2
A col1 B col1 C col1 D col1 A col2 B col2 C col2 D col2
0 1 2 3 4 5 6 7 8
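To see why the join works, it may help to look at the intermediate object. The sketch below uses the frame exactly as constructed in the question; s, labels and wide are names picked for this example.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]},
                  index=['A', 'B', 'C', 'D'])

# Unstacking a frame with a flat index gives a Series whose index is a
# MultiIndex of (column name, row label) pairs
s = df.unstack()
print(s.index.tolist()[:2])

# Each index entry is a tuple of two strings, so it can be joined directly
labels = [' '.join(pair) for pair in s.index]

# Turn the Series into a single-row frame and apply the flat labels
wide = s.to_frame().T
wide.columns = labels
print(wide)
```

The join only works because both the column names and the row labels are strings; with numeric labels they would need converting with str first.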
Here's one way to do it; there may be a simpler way:
In [562]: df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [50, 60, 70, 80]},
index=['A', 'B', 'C', 'D'])
In [563]: pd.DataFrame([df.values.T.ravel()],
columns=[y+x for y in df.columns for x in df.index])
Out[563]:
col1A col1B col1C col1D col2A col2B col2C col2D
0 1 2 3 4 50 60 70 80
Using the DataFrame below as an example:
import pandas as pd
df = pd.DataFrame({'col1':[1, 2, 3, 2, 1] , 'col2':['A', 'A', 'B', 'B','C']})
col1 col2
0 1 A
1 2 A
2 3 B
3 2 B
4 1 C
how can I get
col1 col2
0 1 A,C
1 2 A,B
2 3 B
You can groupby on 'col1' and then apply a lambda that joins the values:
In [88]:
df = pd.DataFrame({'col1':[1, 2, 3, 2, 1] , 'col2':['A', 'A', 'B', 'B','C']})
df.groupby('col1')['col2'].apply(lambda x: ','.join(x)).reset_index()
Out[88]:
col1 col2
0 1 A,C
1 2 A,B
2 3 B
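A slightly shorter variant, sketched under the assumption that col2 contains only strings, passes the bound method ','.join straight to agg instead of wrapping it in a lambda:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 2, 1],
                   'col2': ['A', 'A', 'B', 'B', 'C']})

# ','.join receives each group's col2 values and concatenates them
result = df.groupby('col1')['col2'].agg(','.join).reset_index()
print(result)
```

If col2 could contain non-string values, they would need converting first, for example with df['col2'].astype(str).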