I have a dataframe of type:
a = ['a','b','c','a','b','c','a','b','c']
b = [0,1,2,3,4,5,6,7,8]
df = pd.DataFrame({'key':a,'values':b})
key values
0 a 0
1 b 1
2 c 2
3 a 3
4 b 4
5 c 5
6 a 6
7 b 7
8 c 8
I want to move the values in the "values" column to new columns where they have the same "key".
So result:
key values0 values1 values2
0 a 0 3 6
1 b 1 4 7
2 c 2 5 8
From this question How can I pivot a dataframe?
I've tried:
a=d1.pivot_table(index='key',values='values',aggfunc=list).squeeze()
pd.DataFrame(a.tolist(),index=a.index)
Which gives
0 1 2
key
a 0 3 6
b 1 4 7
c 2 5 8
But I don't want the index to be 'key', I want the index to stay the same.
You can use reset_index.
a = df.pivot_table(index='key',values='values',aggfunc=list).squeeze()
out = pd.DataFrame(a.tolist(),index=a.index).add_prefix('values').reset_index()
print(out)
# Output
key values0 values1 values2
0 a 0 3 6
1 b 1 4 7
2 c 2 5 8
Another way to do it:
out = (df.pivot_table('values', 'key', df.index // 3)
.add_prefix('values').reset_index())
print(out)
# Output
key values0 values1 values2
0 a 0 3 6
1 b 1 4 7
2 c 2 5 8
df["id"] = df.groupby("key").cumcount()
df.pivot(columns="id", index="key").reset_index()
# key values
# id 0 1 2
# 0 a 0 3 6
# 1 b 1 4 7
# 2 c 2 5 8
Related
I would like to obtain the 'Value' column below, from the original df:
A B C Column_To_Use
0 2 3 4 A
1 5 6 7 C
2 8 0 9 B
A B C Column_To_Use Value
0 2 3 4 A 2
1 5 6 7 C 7
2 8 0 9 B 0
Use DataFrame.lookup:
df['Value'] = df.lookup(df.index, df['Column_To_Use'])
print (df)
A B C Column_To_Use Value
0 2 3 4 A 2
1 5 6 7 C 7
2 8 0 9 B 0
I have a dataframe with lots of data and 1 column that is structured like this:
index var_1
1 a=3:b=4:c=5:d=6:e=3
2 b=3:a=4:c=5:d=6:e=3
3 e=3:a=4:c=5:d=6
4 c=3:a=4:b=5:d=6:f=3
I am trying to structure the data in that column to look like this:
index a b c d e f
1 3 4 5 6 3 0
2 4 3 5 6 3 0
3 4 0 5 6 3 0
4 4 5 3 6 0 3
I have done the following thus far:
df1 = df['var1'].str.split(':', expand=True)
I can then loop through the cols of df1 and do another split on '=', but then I'll just have loads of disorganised label cols and value cols.
Use list comprehension with dictionaries for each value and pass to DataFrame constructor:
comp = [dict([y.split('=') for y in x.split(':')]) for x in df['var_1']]
df = pd.DataFrame(comp).fillna(0).astype(int)
print (df)
a b c d e f
0 3 4 5 6 3 0
1 4 3 5 6 3 0
2 4 0 5 6 3 0
3 4 5 3 6 0 3
Or use Series.str.split with expand=True for DataFrame, reshape by DataFrame.stack, again split, remove first level of MultiIndex and add new level by 0 column, last reshape by Series.unstack:
df = (df['var_1'].str.split(':', expand=True)
.stack()
.str.split('=', expand=True)
.reset_index(level=1, drop=True)
.set_index(0, append=True)[1]
.unstack(fill_value=0)
.rename_axis(None, axis=1))
print (df)
a b c d e f
1 3 4 5 6 3 0
2 4 3 5 6 3 0
3 4 0 5 6 3 0
4 4 5 3 6 0 3
Here's one approach using str.get_dummies:
out = df.var_1.str.get_dummies(sep=':')
out = out * out.columns.str[2:].astype(int).values
out.columns = pd.MultiIndex.from_arrays([out.columns.str[0], out.columns])
print(out.max(axis=1, level=0))
a b c d e f
index
1 3 4 5 6 3 0
2 4 3 5 6 3 0
3 4 0 5 6 3 0
4 4 5 3 6 0 3
You can apply "extractall" and "pivot".
After "extractall" you get:
0 1
index match
1 0 a 3
1 b 4
2 c 5
3 d 6
4 e 3
2 0 b 3
1 a 4
2 c 5
3 d 6
4 e 3
3 0 e 3
1 a 4
2 c 5
3 d 6
4 0 c 3
1 a 4
2 b 5
3 d 6
4 f 3
And in one step:
rslt= df.var_1.str.extractall(r"([a-z])=(\d+)") \
.reset_index(level="match",drop=True) \
.pivot(columns=0).fillna(0)
1
0 a b c d e f
index
1 3 4 5 6 3 0
2 4 3 5 6 3 0
3 4 0 5 6 3 0
4 4 5 3 6 0 3
#rslt.columns= rslt.columns.levels[1].values
So I have the following pandas dataframe:
import pandas as pd
sample_df = pd.DataFrame({'note': ['D','C','D','C'], 'time': [1,1,4,6], 'val': [6,4,7,9]})
which gives the result
note time val
0 D 1 6
1 C 1 4
2 D 4 7
3 C 6 9
What I want is
note index time val
C 1 1 4
3 6 9
D 0 1 6
2 4 7
I tried sample_df.set_index('note',append=True) and it didn't work.
Add DataFrame.swaplevel with DataFrame.sort_index by first level:
df = sample_df.set_index('note', append=True).swaplevel(1,0).sort_index(level=0)
print (df)
time val
note
C 1 1 4
3 6 9
D 0 1 6
2 4 7
If need set level name add DataFrame.rename_axis:
df = (sample_df.rename_axis('idx')
.set_index('note',append=True)
.swaplevel(1,0)
.sort_index(level=0))
print (df)
time val
note idx
C 1 1 4
3 6 9
D 0 1 6
2 4 7
Alternatively:
sample_df.index.rename('old_index', inplace=True)
sample_df.reset_index(inplace=True)
sample_df.set_index(['note','old_index'], inplace=True)
sample_df.sort_index(level=0, inplace=True)
print (sample_df)
time val
note old_index
C 1 1 4
3 6 9
D 0 1 6
2 4 7
I am using MultiIndex create the target index
sample_df.index=pd.MultiIndex.from_arrays([sample_df.note,sample_df.index])
sample_df.drop('note',1,inplace=True)
sample_df=sample_df.sort_index(level=0)
sample_df
time val
note
C 1 1 4
3 6 9
D 0 1 6
2 4 7
I would use set_index and pop to simultaneously discard column 'note' and set new index
df.set_index([df.pop('note'), df.index]).sort_index(level=0)
Out[380]:
time val
note
C 1 1 4
3 6 9
D 0 1 6
2 4 7
I have a data frame like this:
df1 = pd.DataFrame({'a': [1,2],
'b': [3,4],
'c': [6,5]})
df1
Out[150]:
a b c
0 1 3 6
1 2 4 5
Now I want to create a df that repeats each row based on difference between col b and c plus 1. So diff between b and c for first row is 6-3 = 3. I want to repeat that row 3+1=4 times. Similarly for second row the difference is 5-4 = 1, so I want to repeat it 1+1=2 times. The column d is added to have value from min(b) to diff between b and c (i.e.6-3 = 3. So it goes from 3->6). So I want to get this df:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
Do it with reindex + repeat, then using groupby cumcount assign the new value d
df1.reindex(df1.index.repeat(df1.eval('c-b').add(1))).\
assign(d=lambda x : x.c-x.groupby('a').cumcount(ascending=False))
Out[572]:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
I have two df,
First df
A B C
1 1 3
1 1 2
1 2 5
2 2 7
2 3 7
Second df
B D
1 5
2 6
3 4
The column Bhas the same meaning in the both dfs. What is the most easy way add column D to the corresponding values in the first df? Output should be:
A B C D
1 1 3 5
1 1 2 5
1 2 5 6
2 2 7 6
2 3 7 4
Perform a 'left' merge in your case on column 'B':
In [206]:
df.merge(df1, how='left', on='B')
Out[206]:
A B C D
0 1 1 3 5
1 1 1 2 5
2 1 2 5 6
3 2 2 7 6
4 2 3 7 4
Another method would be to set 'B' on your second df as the index and then call map:
In [215]:
df1 = df1.set_index('B')
df['D'] = df['B'].map(df1['D'])
df
Out[215]:
A B C D
0 1 1 3 5
1 1 1 2 5
2 1 2 5 6
3 2 2 7 6
4 2 3 7 4