Python Pandas Column Formation

Is there an easy way to reorder the columns from
2000-01-03 Location1 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-04 Location2 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-05 Location3 A1 B1 C1 A2 B2 C2 A3 B3 C3
to
2000-01-03 Location1 A1 A2 A3 B1 B2 B3 C1 C2 C3
2000-01-04 Location2 A1 A2 A3 B1 B2 B3 C1 C2 C3
2000-01-05 Location3 A1 A2 A3 B1 B2 B3 C1 C2 C3
Thanks

To reorder the columns dynamically based on their values, here is one way to do it:
df.sort_values(df.index[0], axis=1)
This returns a DataFrame whose columns (axis=1) are ordered by the sorted values of the first row.
Here is a full example using your sample data:
import pandas as pd
from io import StringIO
sample = StringIO('''date location x1 y1 z1 x2 y2 z2 x3 y3 z3
2000-01-03 Location1 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-04 Location2 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-05 Location3 A1 B1 C1 A2 B2 C2 A3 B3 C3''')
df = pd.read_csv(sample, sep=' ')
print(df)
# Sort the columns by the values in the first row
df2 = df.sort_values(df.index[0], axis=1)
# The sort also moves 'date' and 'location', so put them back in front
initialCols = ['date', 'location']
restCols = [col for col in df2.columns if col not in initialCols]
dfFinal = df2[initialCols + restCols]
print(dfFinal)

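One of the easiest ways to reorder the columns of a pandas DataFrame is to index it with a list of the column names in the desired order (the names must match the DataFrame's columns exactly):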
reordered_col_list = ['Date', 'Location', 'A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']
df = df[reordered_col_list]
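If there are many value columns, the list does not have to be typed by hand. The sketch below assumes the columns are literally labelled Date, Location, A1, B1, C1, ... as in the list above; the single-row frame is made up for illustration. It keeps the two leading columns fixed and sorts the rest as strings.

```python
import pandas as pd

# Hypothetical frame whose value columns are labelled A1, B1, C1, A2, ...
# (the labels used in the indexing answer above; adjust to your own names).
df = pd.DataFrame(
    [["2000-01-03", "Location1"] + list(range(9))],
    columns=["Date", "Location", "A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3"],
)

fixed = ["Date", "Location"]
# sorted() orders the remaining labels as plain strings: A1, A2, A3, B1, ...
rest = sorted(col for col in df.columns if col not in fixed)
df = df[fixed + rest]
print(df.columns.tolist())
```

Because this is a plain string sort, a label such as A10 would sort before A2; pass a `key` to `sorted` if your labels have multi-digit numbers.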

Related

Pandas concat with different columns

I have two dfs that I want to concat
(sorry I don't know how to properly recreate a df here)
df1:
A B
a1 b1
a2 b2
a3 b3
df2:
A C
a1 c1
a4 c4
Desired result:
A B C
a1 b1 c1
a2 b2 NaN
a3 b3 NaN
a4 NaN c4
I have tried:
merge = pd.concat([df1,df2],axis = 0,ignore_index= True)
but this seems to just append the second df to the first df
Thank you!
I believe you need an outer join on the shared column A (merge uses the common columns as join keys by default):
>>> pd.merge(df1, df2, how='outer')
A B C
0 a1 b1 c1
1 a2 b2 NaN
2 a3 b3 NaN
3 a4 NaN c4
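A self-contained sketch of the difference, rebuilding df1 and df2 from the question: `pd.concat` along axis 0 only stacks rows (so a1 appears twice), while an outer `merge` on A lines rows up by key and fills the gaps with NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]})
df2 = pd.DataFrame({"A": ["a1", "a4"], "C": ["c1", "c4"]})

# concat just appends df2's rows below df1's: 5 rows, a1 twice.
stacked = pd.concat([df1, df2], ignore_index=True)

# Outer join on the key column A: keys from both frames are kept,
# and missing B/C values become NaN.
result = pd.merge(df1, df2, on="A", how="outer")
print(result)
```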

Adding and subtracting dataframe rows conditionally

Let's say I have the DataFrame below:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
I am trying to write something that would essentially say: if column A contains A1, A2, or A4, then add a column E populated with 'xx' in the rows where any of those three values appear.
Then create a df2 that contains only the flagged rows, and a df3 that has the flagged rows and column E removed. The result should be df2:
A B C D E
0 A1 B1 C1 D1 xx
1 A2 B2 C2 D2 xx
2 A4 B4 C4 D4 xx
and df3:
A B C D
0 A0 B0 C0 D0
1 A3 B3 C3 D3
Python/pandas beginner here, so any and all help is much appreciated!
You can use boolean indexing:
mask = df["A"].isin(["A1", "A2", "A4"])
df_a = df[mask].copy()
df_a["E"] = "xx"
df_b = df[~mask].copy()  # .copy() so later edits to df_b don't trigger SettingWithCopyWarning
print(df_a)
print(df_b)
Prints:
A B C D E
1 A1 B1 C1 D1 xx
2 A2 B2 C2 D2 xx
4 A4 B4 C4 D4 xx
A B C D
0 A0 B0 C0 D0
3 A3 B3 C3 D3
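The question's expected df2 and df3 are numbered from 0, whereas boolean indexing keeps the original labels. A minimal sketch that also renumbers the rows with `reset_index(drop=True)`, rebuilding the example frame from the question:

```python
import pandas as pd

# Rebuild the question's frame: column X holds X0 .. X4.
df = pd.DataFrame({c: [f"{c}{i}" for i in range(5)] for c in "ABCD"})

mask = df["A"].isin(["A1", "A2", "A4"])

# Flagged rows plus the new column E, renumbered 0, 1, 2.
df2 = df[mask].copy()
df2["E"] = "xx"
df2 = df2.reset_index(drop=True)

# Remaining rows, without column E, renumbered 0, 1.
df3 = df[~mask].reset_index(drop=True)

print(df2)
print(df3)
```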

Pandas combine two dataframes to update values of a particular column in 1st dataframe

I have two dataframes:
df1 :
A B C
0 a0 b0 c0
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
df2 :
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a70 b2 c20
3 a3 b9 c9
In df1, for every row whose values in columns A and B both match a row in df2, column C should be updated with the value from df2.
Output:
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
I tried the following, but it did not work.
df1.set_index(['A', 'B'])
df2.set_index(['A', 'B'])
df1.update(df2)
df1.reset_index()
df2.reset_index()
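A positional comparison with np.where gives the desired output here (it assumes the rows of df2 line up with the first rows of df1):
import numpy as np
n = len(df2)
# Compare df1's first n rows with df2 row by row; keep df1's C where they differ
df1.loc[:n - 1, "C"] = np.where((df1["A"].iloc[:n] == df2["A"]) & (df1["B"].iloc[:n] == df2["B"]), df2["C"], df1["C"].iloc[:n])
print(df1)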
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
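The `set_index`/`update` attempt in the question can also work, and it matches on the (A, B) key rather than on row position. A sketch, assuming the frames from the question: `set_index` returns a new DataFrame instead of modifying in place, so its results have to be assigned before calling `update`.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [f"a{i}" for i in range(5)],
                    "B": [f"b{i}" for i in range(5)],
                    "C": [f"c{i}" for i in range(5)]})
df2 = pd.DataFrame({"A": ["a0", "a1", "a70", "a3"],
                    "B": ["b0", "b1", "b2", "b9"],
                    "C": ["c11", "c5", "c20", "c9"]})

# set_index returns a new frame, so keep the results.
left = df1.set_index(["A", "B"])
right = df2.set_index(["A", "B"])

# update() aligns on the (A, B) index and overwrites C in place
# only where df2 has a matching key.
left.update(right)
df1 = left.reset_index()
print(df1)
```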

Copying values into NaN fields

I have a large DataFrame with the format below:
If any cell is NaN, I want to copy the value from the cell immediately above it, so my DataFrame should look like:
If the first row has NaN, I'll have to leave it as it is.
Can someone please help me with this?
This looks like pandas; if so, you need to call ffill:
In [72]:
df = pd.DataFrame({'A':['A0','A1','A2',np.nan,np.nan,'A3'], 'B':['B0','B1','B2',np.nan,np.nan,'B3'], 'C':['C0','C1','C2',np.nan,np.nan,'C3']})
df
Out[72]:
A B C
0 A0 B0 C0
1 A1 B1 C1
2 A2 B2 C2
3 NaN NaN NaN
4 NaN NaN NaN
5 A3 B3 C3
In [73]:
df.ffill()
Out[73]:
A B C
0 A0 B0 C0
1 A1 B1 C1
2 A2 B2 C2
3 A2 B2 C2
4 A2 B2 C2
5 A3 B3 C3
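The question also asks about a NaN in the first row. A small sketch with made-up data: `ffill` only copies a value downwards from an earlier valid cell in the same column, so a NaN in the first row has nothing above it and is left as NaN.

```python
import numpy as np
import pandas as pd

# Made-up data: both columns start with NaN in the first row.
df = pd.DataFrame({"A": [np.nan, "A1", np.nan, "A3"],
                   "B": [np.nan, "B1", "B2", np.nan]})

# ffill copies the last valid value downwards, column by column.
filled = df.ffill()
print(filled)
```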

Convert some columns into rows of a MultiIndex in a pandas DataFrame

Let's say I have a pandas DataFrame as follows:
A B C D
0 a0 b0 c0 d0
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3
I would like to know how I can convert it to this.
A B
0 C c0 a0 b0
D d0 a0 b0
1 C c1 a1 b1
D d1 a1 b1
2 C c2 a2 b2
D d2 a2 b2
3 C c3 a3 b3
D d3 a3 b3
Basically, I want to turn a few columns into rows and create a MultiIndex.
Well, melt will pretty much get it in the form you want and then you can set the index as desired:
print(df)
A B C D
0 a0 b0 c0 d0
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3
Now use melt to stack (note that I reset the index and use that column as an id_var, because it looks like you want the [0,1,2,3] index included in the stacking):
new = pd.melt(df.reset_index(),value_vars=['C','D'],id_vars=['index','A','B'])
print(new)
index A B variable value
0 0 a0 b0 C c0
1 1 a1 b1 C c1
2 2 a2 b2 C c2
3 3 a3 b3 C c3
4 0 a0 b0 D d0
5 1 a1 b1 D d1
6 2 a2 b2 D d2
7 3 a3 b3 D d3
Now sort the rows and set the index to match your desired output:
new = new.sort_values(['index', 'variable']).set_index(['index', 'variable', 'value'])
print(new)
A B
index variable value
0 C c0 a0 b0
D d0 a0 b0
1 C c1 a1 b1
D d1 a1 b1
2 C c2 a2 b2
D d2 a2 b2
3 C c3 a3 b3
D d3 a3 b3
If you don't need the [0,1,2,3] as part of the stack, the melt command is a bit cleaner:
print(pd.melt(df,value_vars=['C','D'],id_vars=['A','B']))
A B variable value
0 a0 b0 C c0
1 a1 b1 C c1
2 a2 b2 C c2
3 a3 b3 C c3
4 a0 b0 D d0
5 a1 b1 D d1
6 a2 b2 D d2
7 a3 b3 D d3
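Putting the index-preserving steps together as one runnable sketch (the frame is rebuilt from the question). Sorting on both `index` and `variable` makes the C row come before the D row for each original row, rather than relying on the sort being stable.

```python
import pandas as pd

df = pd.DataFrame({"A": [f"a{i}" for i in range(4)],
                   "B": [f"b{i}" for i in range(4)],
                   "C": [f"c{i}" for i in range(4)],
                   "D": [f"d{i}" for i in range(4)]})

# reset_index() exposes the original row labels as a column named "index".
long = pd.melt(df.reset_index(), id_vars=["index", "A", "B"], value_vars=["C", "D"])

# Sort on both keys, then move them (plus the melted values) into the index.
result = long.sort_values(["index", "variable"]).set_index(["index", "variable", "value"])
print(result)
```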
