Is there an easy way to reformat the columns from
2000-01-03 Location1 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-04 Location2 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-05 Location3 A1 B1 C1 A2 B2 C2 A3 B3 C3
to
2000-01-03 Location1 A1 A2 A3 B1 B2 B3 C1 C2 C3
2000-01-04 Location2 A1 A2 A3 B1 B2 B3 C1 C2 C3
2000-01-05 Location3 A1 A2 A3 B1 B2 B3 C1 C2 C3
Thanks
To reorder columns dynamically based on value, here is a way to do it:
df.sort_values(df.index[0], axis=1)
This returns a new DataFrame whose columns (axis=1) are ordered by the sorted values of the first row.
Here is a full example using your data sample:
import pandas as pd
from io import StringIO
sample=StringIO('''date location x1 y1 z1 x2 y2 z2 x3 y3 z3
2000-01-03 Location1 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-04 Location2 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-05 Location3 A1 B1 C1 A2 B2 C2 A3 B3 C3''')
df = pd.read_csv(sample, sep=' ')
print(df)
df2 = df.sort_values(df.index[0], axis=1)
initialCols = ['date','location']
restCols = [col for col in df2.columns if col not in initialCols]
dfFinal = df2[initialCols + restCols]
print(dfFinal)
One of the easiest ways to reorder the columns of a pandas DataFrame is to index it with a list of column names in the order you want:
reordered_col_list = ['Date', 'Location', 'A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']
df = df[reordered_col_list]
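When there are too many columns to list by hand, the order can also be derived from the names. This is only a sketch: it assumes the column labels are literally Date, Location, A1 ... C3 as in the list above, uses made-up integer values, and relies on plain string sorting (which would misplace labels such as A10).

```python
import pandas as pd

# Hypothetical frame with the column labels used in the answer above
cols = ["Date", "Location", "A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3"]
rows = [
    ["2000-01-03", "Location1", 1, 2, 3, 4, 5, 6, 7, 8, 9],
    ["2000-01-04", "Location2", 1, 2, 3, 4, 5, 6, 7, 8, 9],
]
df = pd.DataFrame(rows, columns=cols)

# Keep the identifying columns first, then sort the remaining labels
fixed = ["Date", "Location"]
rest = sorted(c for c in df.columns if c not in fixed)
reordered = df[fixed + rest]

print(list(reordered.columns))
```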
I have two dfs that I want to concat
(sorry I don't know how to properly recreate a df here)
A B
a1 b1
a2 b2
a3 b3
A C
a1 c1
a4 c4
Result:
A B C
a1 b1 c1
a2 b2 NaN
a3 b3 NaN
a4 NaN c4
I have tried:
merge = pd.concat([df1,df2],axis = 0,ignore_index= True)
but this seems to just append the second df to the first df
Thank you!
I believe you need an outer join:
>>> pd.merge(df1,df2,how='outer')
A B C
0 a1 b1 c1
1 a2 b2 NaN
2 a3 b3 NaN
3 a4 NaN c4
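For reference, here is a self-contained version of the outer join, with the two frames rebuilt from the question. Passing on='A' is my addition; without it, merge falls back to the columns the two frames share, which here is also just A.

```python
import pandas as pd

# The two frames from the question
df1 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]})
df2 = pd.DataFrame({"A": ["a1", "a4"], "C": ["c1", "c4"]})

# An outer join keeps every key found in either frame and fills
# the missing side with NaN
merged = pd.merge(df1, df2, on="A", how="outer")
print(merged)
```

pd.concat with axis=0 only stacks rows, which is why the original attempt appended df2 underneath df1 instead of lining the rows up on A.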
Let's say I have the dataframe below:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
I am trying to write something that would essentially say: if column A contains A1, A2, or A4, then add a column E populated with 'xx' in the rows where any of those three values appears.
Then create a df2 which contains only the flagged rows, and a df3 which is the original with the flagged rows and column E removed. This should result in df2:
A B C D E
0 A1 B1 C1 D1 xx
1 A2 B2 C2 D2 xx
2 A4 B4 C4 D4 xx
and df3:
A B C D
0 A0 B0 C0 D0
1 A3 B3 C3 D3
Python/pandas beginner here, so any and all help is much appreciated!
You can use boolean indexing:
mask = df["A"].isin(["A1", "A2", "A4"])
df_a = df[mask].copy()
df_a["E"] = "xx"
df_b = df[~mask].copy()  # .copy() so later changes to df_b do not touch df
print(df_a)
print(df_b)
Prints:
A B C D E
1 A1 B1 C1 D1 xx
2 A2 B2 C2 D2 xx
4 A4 B4 C4 D4 xx
A B C D
0 A0 B0 C0 D0
3 A3 B3 C3 D3
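The printed frames keep the original row labels (1, 2, 4 and 0, 3), while the question shows them renumbered from 0. A possible variation, with the names df2 and df3 taken from the question, resets the index after filtering:

```python
import pandas as pd

# The frame from the question
df = pd.DataFrame({
    "A": ["A0", "A1", "A2", "A3", "A4"],
    "B": ["B0", "B1", "B2", "B3", "B4"],
    "C": ["C0", "C1", "C2", "C3", "C4"],
    "D": ["D0", "D1", "D2", "D3", "D4"],
})

mask = df["A"].isin(["A1", "A2", "A4"])

# Flagged rows, with the extra column, renumbered from 0
df2 = df[mask].assign(E="xx").reset_index(drop=True)
# Remaining rows, without column E, renumbered from 0
df3 = df[~mask].reset_index(drop=True)

print(df2)
print(df3)
```

assign returns a new frame, so df itself never gets the E column.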
I have two dataframes:
df1 :
A B C
0 a0 b0 c0
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
df2 :
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a70 b2 c20
3 a3 b9 c9
In df1, for every row, whenever Column A and Column B values are equal to values in df2, column C should be updated with value from df2.
Output:
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
I tried the following, but it did not work.
df1.set_index(['A', 'B'])
df2.set_index(['A', 'B'])
df1.update(df2)
df1.reset_index()
df2.reset_index()
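One option is np.where, comparing the first four rows of df1 against df2 by position (this needs import numpy as np); assigning through .loc avoids chained indexing:
df1.loc[:3, "C"] = np.where((df1.loc[:3, "A"] == df2["A"]) & (df1.loc[:3, "B"] == df2["B"]), df2["C"], df1.loc[:3, "C"])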
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
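If the rows of the two frames are not guaranteed to line up by position, matching on the (A, B) pair itself may be safer. The following is a sketch of that idea using a left merge, not the approach shown above; it assumes each (A, B) pair occurs at most once in df2, and the "_new" suffix is just a name chosen for the illustration.

```python
import pandas as pd

# The two frames from the question
df1 = pd.DataFrame({
    "A": ["a0", "a1", "a2", "a3", "a4"],
    "B": ["b0", "b1", "b2", "b3", "b4"],
    "C": ["c0", "c1", "c2", "c3", "c4"],
})
df2 = pd.DataFrame({
    "A": ["a0", "a1", "a70", "a3"],
    "B": ["b0", "b1", "b2", "b9"],
    "C": ["c11", "c5", "c20", "c9"],
})

# Bring df2's C alongside df1's C wherever both A and B match;
# rows of df1 without a match get NaN in C_new
merged = df1.merge(df2, on=["A", "B"], how="left", suffixes=("", "_new"))

# Prefer the df2 value when there is one, otherwise keep the original
df1["C"] = merged["C_new"].fillna(merged["C"])
print(df1)
```

As for the attempt in the question: set_index and reset_index return new frames rather than modifying df1 and df2 in place, so the update call there still aligned on the default row numbers.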
I have a large data-frame with the format as below:
If any cell is "NaN", I want to copy the value from the cell immediately above it. So, my dataframe should look like:
In case the first row has "NaN", then I'll have to let it be.
Can someone please help me with this?
This looks like pandas; if so, you need to call ffill:
In [72]:
df = pd.DataFrame({'A':['A0','A1','A2',np.nan,np.nan, 'A3'], 'B':['B0','B1','B2',np.nan,np.nan, 'B3'], 'C':['C0','C1','C2',np.nan,np.nan, 'C3']})
df
Out[72]:
A B C
0 A0 B0 C0
1 A1 B1 C1
2 A2 B2 C2
3 NaN NaN NaN
4 NaN NaN NaN
5 A3 B3 C3
In [73]:
df.ffill()
Out[73]:
A B C
0 A0 B0 C0
1 A1 B1 C1
2 A2 B2 C2
3 A2 B2 C2
4 A2 B2 C2
5 A3 B3 C3
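On the first-row case mentioned in the question: ffill only propagates values downward, so a NaN at the very top has nothing above it and stays NaN. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column A starts with a missing value
df = pd.DataFrame({
    "A": [np.nan, "A1", np.nan, "A3"],
    "B": ["B0", np.nan, np.nan, "B3"],
})

# Forward-fill each column; returns a new frame, df is unchanged
filled = df.ffill()
print(filled)
```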
Let's say I have a pandas dataframe as follows:
A B C D
0 a0 b0 c0 d0
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3
I would like to know how I can convert it to this.
A B
0 C c0 a0 b0
D d0 a0 b0
1 C c1 a1 b1
D d1 a1 b1
2 C c2 a2 b2
D d2 a2 b2
3 C c3 a3 b3
D d3 a3 b3
Basically, I want to turn a few columns into rows and create a MultiIndex.
Well, melt will pretty much get it in the form you want and then you can set the index as desired:
print(df)
A B C D
0 a0 b0 c0 d0
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3
Now use melt to stack (note, I reset the index and use that column as an id_var because it looks like you want the [0,1,2,3] index included in the stacking):
new = pd.melt(df.reset_index(),value_vars=['C','D'],id_vars=['index','A','B'])
print(new)
index A B variable value
0 0 a0 b0 C c0
1 1 a1 b1 C c1
2 2 a2 b2 C c2
3 3 a3 b3 C c3
4 0 a0 b0 D d0
5 1 a1 b1 D d1
6 2 a2 b2 D d2
7 3 a3 b3 D d3
Now just set the index (well, sort it first and then set the index, to make it look like your desired output):
new = new.sort_values(['index','variable']).set_index(['index','variable','value'])
print(new)
A B
index variable value
0 C c0 a0 b0
D d0 a0 b0
1 C c1 a1 b1
D d1 a1 b1
2 C c2 a2 b2
D d2 a2 b2
3 C c3 a3 b3
D d3 a3 b3
If you don't need the [0,1,2,3] as part of the stack, the melt command is a bit cleaner:
print(pd.melt(df,value_vars=['C','D'],id_vars=['A','B']))
A B variable value
0 a0 b0 C c0
1 a1 b1 C c1
2 a2 b2 C c2
3 a3 b3 C c3
4 a0 b0 D d0
5 a1 b1 D d1
6 a2 b2 D d2
7 a3 b3 D d3
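Put together, the whole reshaping fits in a few lines. This is a self-contained sketch with the frame rebuilt from the question; the names long_form and result are just for illustration. Sorting on both index and variable is my choice, so that C comes before D within each original row.

```python
import pandas as pd

# The frame from the question
df = pd.DataFrame({
    "A": ["a0", "a1", "a2", "a3"],
    "B": ["b0", "b1", "b2", "b3"],
    "C": ["c0", "c1", "c2", "c3"],
    "D": ["d0", "d1", "d2", "d3"],
})

# Turn columns C and D into rows, carrying the old row number along
long_form = df.reset_index().melt(id_vars=["index", "A", "B"], value_vars=["C", "D"])

# Order by original row, then by source column, and build the MultiIndex
result = (
    long_form.sort_values(["index", "variable"])
    .set_index(["index", "variable", "value"])
)
print(result)
```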