I have a dataframe with columns A, B, C, D and the index is a time series.
I want to create a new dataframe with the same index, but many more columns in a multi index. A, B, C, D are the first level of the multi index. I want every column in the new dataframe to have the same value that A, B, C, D did, according to its multi index level.
In other words, if I have a data frame like this:
A B C D
0 2 3 4 5
1 X Y Z 1
I want to make a new dataframe that looks like this
A B C D
0 1 2 3 4 5 6 7
0 2 2 2 3 3 4 5 5
1 X X X Y Y Z 1 1
In other words - I want to do the equivalent of an "HLOOKUP" in Excel, using the first level of the multi-index as the lookup key against the original dataframe.
The new multi-index is pre-determined.
As suggested by cᴏʟᴅsᴘᴇᴇᴅ in the comments, you can use DataFrame.reindex with the columns and level arguments:
In [35]: mi
Out[35]:
MultiIndex(levels=[['A', 'B', 'C', 'D'], ['0', '1', '2', '3', '4', '5', '6', '7']],
labels=[[0, 0, 0, 1, 1, 2, 3, 3], [0, 1, 2, 3, 4, 5, 6, 7]])
In [36]: df
Out[36]:
A B C D
0 2 3 4 5
1 X Y Z 1
In [37]: df.reindex(columns=mi, level=0)
Out[37]:
A B C D
0 1 2 3 4 5 6 7
0 2 2 2 3 3 4 5 5
1 X X X Y Y Z 1 1
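A self-contained version of the approach above. The MultiIndex here is built with `MultiIndex.from_arrays` rather than the raw `levels=`/`labels=` constructor shown in the transcript (`labels=` is the pre-0.24 name for what is now `codes=`):

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 'X'], 'B': [3, 'Y'], 'C': [4, 'Z'], 'D': [5, 1]})

# The pre-determined MultiIndex: the first level maps back to the
# original columns of df.
mi = pd.MultiIndex.from_arrays([
    ['A', 'A', 'A', 'B', 'B', 'C', 'D', 'D'],
    ['0', '1', '2', '3', '4', '5', '6', '7'],
])

# reindex with level=0 broadcasts each original column across every
# second-level label that sits under it -- the "HLOOKUP" step.
wide = df.reindex(columns=mi, level=0)
```

Each first-level group simply repeats the source column, so `wide[('A', '0')]`, `wide[('A', '1')]`, and `wide[('A', '2')]` all equal the original `df['A']`.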
How to split a column into rows if the values are separated with a comma? I am stuck here. I have used the following code:
xd = df.assign(var1=df['var1'].str.split(',')).explode('var1')
xd = xd.assign(var2=xd['var2'].str.split(',')).explode('var2')
xd
But the above code generates multiple irrelevant rows. I am stuck here. Please suggest an answer.
DataFrame.explode
For multiple columns, pass a non-empty list of column names (each element a str or tuple); the list-like values in all specified columns must have matching lengths on each row.
From the docs:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[0, 1, 2], 'foo', [], [3, 4]],
                   'B': 1,
                   'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})
df
A B C
0 [0, 1, 2] 1 [a, b, c]
1 foo 1 NaN
2 [] 1 []
3 [3, 4] 1 [d, e]
Multi-column explode.
df.explode(list('AC'))
A B C
0 0 1 a
0 1 1 b
0 2 1 c
1 foo 1 NaN
2 NaN 1 NaN
3 3 1 d
3 4 1 e
For your specific question:
xd = df.assign(
    var1=df['var1'].str.split(','),
    var2=df['var2'].str.split(',')
).explode(['var1', 'var2'])
xd
var1 var2 var3
0 a e 1
0 b f 1
0 c g 1
0 d h 1
1 p s 2
1 q t 2
1 r u 2
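A runnable version of the fix. The input frame is reconstructed from the desired output above (an assumption - the original data was not shown in full); exploding both columns in a single `explode` call requires pandas >= 1.3 and matching split lengths per row:

```python
import pandas as pd

# Input reconstructed from the expected output (assumed, not from the
# original question).
df = pd.DataFrame({'var1': ['a,b,c,d', 'p,q,r'],
                   'var2': ['e,f,g,h', 's,t,u'],
                   'var3': [1, 2]})

# Split both columns first, then explode them together so the rows stay
# paired up instead of forming a cross product.
xd = df.assign(
    var1=df['var1'].str.split(','),
    var2=df['var2'].str.split(','),
).explode(['var1', 'var2'])
```

Chaining two separate `.explode()` calls, as in the question, explodes the second column once per already-exploded row, which is where the irrelevant extra rows come from.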
I have a DataFrame in which I want to switch the order of the values across columns in only the third row, while keeping the other rows the same.
In my project I have to switch the order under certain conditions; here is an example that probably has no real meaning on its own.
Suppose the dataset is
df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, 6, 7, 8, 9],
                   'C': ['a', 'b', 'c', 'd', 'e']})
df
out[1]:
A B C
0 0 5 a
1 1 6 b
2 2 7 c
3 3 8 d
4 4 9 e
I want to have the output:
A B C
0 0 5 a
1 1 6 b
2 **7 2** c
3 3 8 d
4 4 9 e
How do I do it?
I have tried:
new_order = [1, 0, 2] # specify new order of the third row
i = 2 # specify row number
df.iloc[i] = df[df.columns[new_order]].loc[i] # reorder the third row only and assign new values to df
I observed from the output of the right-hand side that the columns are reordering as I wanted:
df[df.columns[new_order]].loc[i]
Out[2]:
B 7
A 2
C c
Name: 2, dtype: object
But when I assign it back to df, nothing changes. I guess it's because the assignment aligns on the column labels.
Can someone help me? Thanks in advance!
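The guess about label matching is right: assigning a Series back into a row aligns on column names, which silently undoes the reorder. One way around it (a sketch) is to strip the labels with `.to_numpy()` so the assignment is purely positional:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, 6, 7, 8, 9],
                   'C': ['a', 'b', 'c', 'd', 'e']})

new_order = [1, 0, 2]  # new positional order for the third row
i = 2

# df.iloc[i].iloc[new_order] reorders the row's values positionally;
# .to_numpy() drops the index labels so pandas cannot re-align them
# back to their original columns during assignment.
df.iloc[i] = df.iloc[i].iloc[new_order].to_numpy()
```

After this, row 2 reads `7 2 c` while every other row is untouched.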
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'B': ['X', 'Y', 'Z'] * 3,
                   'C': [1, 2, 3, 1, 2, 3, 1, 2, 3]})
>>> df
A B C
0 1 X 1
1 1 Y 2
2 1 Z 3
3 2 X 1
4 2 Y 2
5 2 Z 3
6 3 X 1
7 3 Y 2
8 3 Z 3
result = df.pivot_table(index=['B'], values='C', aggfunc=sum)
>>> result
B
X 3
Y 6
Z 9
Name: C, dtype: int64
How can I have the column name for C show up above the sums, and how can I sort result either ascending or descending? result is a series, not a dataframe, and seems non-sortable.
Python: 2.7.11 and Pandas: 0.17.1
You were very close. Note that brackets around the values argument coerce the result into a dataframe instead of a series (i.e. values=['C'] instead of values='C').
result = df.pivot_table(index = ['B'], values=['C'], aggfunc=sum)
>>> result
C
B
X 3
Y 6
Z 9
As result is now a dataframe, you can use sort_values on it:
>>> result.sort_values('C', ascending=False)
C
B
Z 9
Y 6
X 3
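The same pattern on a current pandas version (a sketch - the transcript above is from pandas 0.17, where the string spelling `aggfunc='sum'` was not yet the preferred form):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'B': ['X', 'Y', 'Z'] * 3,
                   'C': [1, 2, 3, 1, 2, 3, 1, 2, 3]})

# values=['C'] (a list) keeps the result a DataFrame with 'C' as a
# column header, so sort_values can target that column by name.
result = (df.pivot_table(index='B', values=['C'], aggfunc='sum')
            .sort_values('C', ascending=False))
```

With `values='C'` (a bare string) some pandas versions return a Series instead, which is exactly the situation the question ran into.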
Suppose we have a Pandas DataFrame f defined as follows. I am trying to create a mask to select all rows with value 'a' or 'b' in column 'xx' (I would like to select rows 0, 1, 3, 4).
f = pd.DataFrame([['a', 'b','c','a', 'b','c'],['1', '2','3', '4', '5','6', ]])
f = f.transpose()
f.columns = ['xx', 'yy']
f
xx yy
0 a 1
1 b 2
2 c 3
3 a 4
4 b 5
5 c 6
Is there any elegant way to do this in pandas?
I know that to select all rows with f.xx == 'a' we can do f[f.xx == 'a'], but I have not figured out how to select rows where f.xx is either 'a' or 'b'. Thanks.
You could use isin:
print(f[(f["xx"].isin(("a","b")))])
Which will give you:
xx yy
0 a 1
1 b 2
3 a 4
4 b 5
If you really wanted an explicit mask, you could combine the two comparisons with the OR operator |:
mask = (f["xx"] == "a") | (f["xx"] == "b")
print(f[mask])
Which will give you the same output:
xx yy
0 a 1
1 b 2
3 a 4
4 b 5
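A self-contained sketch confirming that the two approaches build the same boolean mask (the frame is rebuilt directly from a dict rather than via the transpose dance in the question):

```python
import pandas as pd

f = pd.DataFrame({'xx': ['a', 'b', 'c', 'a', 'b', 'c'],
                  'yy': ['1', '2', '3', '4', '5', '6']})

# isin is equivalent to OR-ing one equality comparison per wanted value,
# and scales better as the list of values grows.
mask_isin = f['xx'].isin(['a', 'b'])
mask_or = (f['xx'] == 'a') | (f['xx'] == 'b')

subset = f[mask_isin]
```

Both masks select rows 0, 1, 3, and 4, as the question asked.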
Suppose I create a pandas DataFrame with two columns, one of which contains some numbers and the other contains letters. Like this:
import pandas as pd
from pprint import pprint
df = pd.DataFrame({'a': [1,2,3,4,5,6], 'b': ['y','x','y','x','y', 'y']})
pprint(df)
a b
0 1 y
1 2 x
2 3 y
3 4 x
4 5 y
5 6 y
Now say that I want to make a third column (c) whose value is equal to the last value of a when b was equal to x. In the cases where a value of x was not encountered in b yet, the value in c should default to 0.
The procedure should produce pretty much the following result:
last_a = 0
c = []
for i, b in enumerate(df['b']):
    if b == 'x':
        last_a = df.iloc[i]['a']
    c += [last_a]
df['c'] = c
pprint(df)
a b c
0 1 y 0
1 2 x 2
2 3 y 2
3 4 x 4
4 5 y 4
5 6 y 4
Is there a more elegant way to accomplish this either with or without pandas?
In [140]: df = pd.DataFrame({'a': [1,2,3,4,5,6], 'b': ['y','x','y','x','y', 'y']})
In [141]: df
Out[141]:
a b
0 1 y
1 2 x
2 3 y
3 4 x
4 5 y
5 6 y
Find the rows where column 'b' == 'x' and take the values of column 'a' for those rows; everywhere else the new column starts out as NaN:
In [142]: df['c'] = df.loc[df['b']=='x','a']
Fill the rest of the values forward, then fill holes with 0
In [143]: df['c'] = df['c'].ffill().fillna(0)
In [144]: df
Out[144]:
a b c
0 1 y 0
1 2 x 2
2 3 y 2
3 4 x 4
4 5 y 4
5 6 y 4
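The transcript above can be condensed into one chained expression. This is a compact variant (using `Series.where` instead of `.loc` masking, plus a final integer cast so the zeros print as ints - both are cosmetic choices, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': ['y', 'x', 'y', 'x', 'y', 'y']})

# Keep 'a' only where b == 'x' (other rows become NaN), carry the last
# seen value forward, then fill the leading gap with the 0 default.
df['c'] = (df['a'].where(df['b'] == 'x')
                  .ffill()
                  .fillna(0)
                  .astype(int))
```

This avoids the explicit Python loop entirely while producing the same `c` column.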