I have a large dataframe in the format below:
If any cell is NaN, I want to copy the value from the cell immediately above it. So my dataframe should look like:
If the first row has NaN, then I'll have to let it be.
Can someone please help me with this?
This looks like pandas; if so, you need to call ffill:
In [72]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1', 'A2', np.nan, np.nan, 'A3'],
                   'B': ['B0', 'B1', 'B2', np.nan, np.nan, 'B3'],
                   'C': ['C0', 'C1', 'C2', np.nan, np.nan, 'C3']})
df
Out[72]:
A B C
0 A0 B0 C0
1 A1 B1 C1
2 A2 B2 C2
3 NaN NaN NaN
4 NaN NaN NaN
5 A3 B3 C3
In [73]:
df.ffill()
Out[73]:
A B C
0 A0 B0 C0
1 A1 B1 C1
2 A2 B2 C2
3 A2 B2 C2
4 A2 B2 C2
5 A3 B3 C3
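Note that ffill returns a new DataFrame rather than modifying df in place, so assign the result back if you want to keep the filled values (fillna(method='ffill') does the same thing, but the method argument is deprecated in recent pandas):
df = df.ffill()  # ffill does not modify df in place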
I have a dataset:
name val
a a1
a a2
b b1
b b2
b b3
c c1
I want to make all possible pairings of rows whose "name" values are not the same. So the desired result is:
name1 val1 name2 val2
a a1 b b1
a a1 b b2
a a1 b b3
a a2 b b1
a a2 b b2
a a2 b b3
a a1 c c1
a a2 c c1
b b1 c c1
b b2 c c1
b b3 c c1
How can I do that? I'd like to write a function that performs the same operation on a bigger table with the same structure.
I would like it to be efficient, since the original data has several thousand rows.
Easiest is to cross merge and query, if you have enough memory for a few million rows, which is not too bad:
df.merge(df, how='cross', suffixes=['1','2']).query('name1 < name2')
Output:
name1 val1 name2 val2
2 a a1 b b1
3 a a1 b b2
4 a a1 b b3
5 a a1 c c1
8 a a2 b b1
9 a a2 b b2
10 a a2 b b3
11 a a2 c c1
17 b b1 c c1
23 b b2 c c1
29 b b3 c c1
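If the single big cross merge is too heavy on memory, here is a sketch of a group-wise alternative (the function name pair_rows is just illustrative; how='cross' needs pandas 1.2+). It cross merges one pair of name groups at a time, so same-name and reversed pairs are never built:

from itertools import combinations
import pandas as pd

def pair_rows(df):
    # one sub-frame per distinct name
    groups = {name: g for name, g in df.groupby('name')}
    parts = [
        groups[n1].merge(groups[n2], how='cross', suffixes=['1', '2'])
        for n1, n2 in combinations(sorted(groups), 2)
    ]
    return pd.concat(parts, ignore_index=True)

result = pair_rows(df)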
I have two dfs that I want to concat
(sorry I don't know how to properly recreate a df here)
A B
a1 b1
a2 b2
a3 b3
A C
a1 c1
a4 c4
Result:
A B C
a1 b1 c1
a2 b2 NaN
a3 b3 NaN
a4 NaN c4
I have tried:
merge = pd.concat([df1, df2], axis=0, ignore_index=True)
but this seems to just append the second df to the first df
Thank you!
I believe you need an outer join:
>>> pd.merge(df1, df2, how='outer')
A B C
0 a1 b1 c1
1 a2 b2 NaN
2 a3 b3 NaN
3 a4 NaN c4
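For a reproducible setup (column and value names assumed from your example), pd.merge joins on the shared column A by default; spelling out on='A' makes that explicit:

import pandas as pd

df1 = pd.DataFrame({'A': ['a1', 'a2', 'a3'], 'B': ['b1', 'b2', 'b3']})
df2 = pd.DataFrame({'A': ['a1', 'a4'], 'C': ['c1', 'c4']})

# outer join keeps keys from both frames and fills the gaps with NaN
result = pd.merge(df1, df2, on='A', how='outer')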
Let's say I have the dataframe below:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
I am trying to write something that essentially says: if column A contains A1, A2, or A4, then add a column E populated with 'xx' in the rows where any of those three values appear.
Then create a df2 which contains only the flagged rows, and a df3 which has the flagged rows and column E removed. Resulting in df2:
A B C D E
0 A1 B1 C1 D1 xx
1 A2 B2 C2 D2 xx
2 A4 B4 C4 D4 xx
and df3:
A B C D
0 A0 B0 C0 D0
1 A3 B3 C3 D3
Python/pandas beginner here, so any and all help is much appreciated!
You can use boolean indexing:
mask = df["A"].isin(["A1", "A2", "A4"])
df_a = df[mask].copy()
df_a["E"] = "xx"
df_b = df[~mask].copy()  # .copy() avoids SettingWithCopyWarning if you modify df_b later
print(df_a)
print(df_b)
Prints:
A B C D E
1 A1 B1 C1 D1 xx
2 A2 B2 C2 D2 xx
4 A4 B4 C4 D4 xx
A B C D
0 A0 B0 C0 D0
3 A3 B3 C3 D3
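If you also want the 0-based index shown in the question, a small variation of the same mask (just a sketch) resets the index and keeps column E out of the remaining rows:

mask = df["A"].isin(["A1", "A2", "A4"])
df2 = df[mask].assign(E="xx").reset_index(drop=True)   # flagged rows, renumbered from 0
df3 = df[~mask].reset_index(drop=True)                 # remaining rows, column E never added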
I have the following type of dataframe, with values grouped by 3 different categories A, B, and C:
import pandas as pd
A = ['A1', 'A2', 'A3', 'A2', 'A1']
B = ['B3', 'B2', 'B2', 'B1', 'B3']
C = ['C2', 'C2', 'C3', 'C1', 'C3']
value = ['6','2','3','3','5']
df = pd.DataFrame({'categA': A,'categB': B, 'categC': C, 'value': value})
df
Which looks like:
categA categB categC value
0 A1 B3 C2 6
1 A2 B2 C2 2
2 A3 B2 C3 3
3 A2 B1 C1 3
4 A1 B3 C3 5
Now, when I want to unstack this df by the C category, .unstack() returns some multi-indexed dataframe with 'value' at the first level and my categories of interest C1, C2 & C3 at the second level:
df = df.set_index(['categA','categB','categC']).unstack('categC')
df
Output:
value
categC C1 C2 C3
categA categB
A1 B3 NaN 6 5
A2 B1 3 NaN NaN
B2 NaN 2 NaN
A3 B2 NaN NaN 3
Is there a quick and clean way to get rid of the multi-index by reducing it to the highest available level? This is what I'd like to have as output:
categA categB C1 C2 C3
A1 B3 NaN 6 5
A2 B1 3 NaN NaN
B2 NaN 2 NaN
A3 B2 NaN NaN 3
Many thanks in advance!
Edit:
print(df.reset_index())
gives:
categA categB value
categC C1 C2 C3
0 A1 B3 NaN 6 5
1 A2 B1 3 NaN NaN
2 A2 B2 NaN 2 NaN
3 A3 B2 NaN NaN 3
Select the value column so you unstack a Series rather than the whole DataFrame, then add reset_index:
df.set_index(['categA','categB','categC']).value.unstack('categC').reset_index()
Out[875]:
categC categA categB C1 C2 C3
0 A1 B3 None 6 5
1 A2 B1 3 None None
2 A2 B2 None 2 None
3 A3 B2 None None 3
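The columns still carry the leftover categC label from the unstack, and because value was built from strings the result holds text rather than numbers. A small follow-up (same approach, just cleaned up) fixes both:

df['value'] = pd.to_numeric(df['value'])   # values were strings in the example
out = (df.set_index(['categA', 'categB', 'categC'])['value']
         .unstack('categC')
         .reset_index())
out.columns.name = None                    # drop the leftover "categC" columns label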
I have two dataframes that look like this:
df1=
A B
1 A1 B1
2 A2 B2
3 A3 B3
df2 =
A C
4 A4 C4
5 A5 C5
I would like to append df2 to df1, like so:
A B
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 NaN
5 A5 NaN
(Note: I've edited the dataframes so that not all the columns in df1 are necessarily in df2)
Whether I use concat or append, the resulting dataframe has a column called "C" with the first three rows filled with NaN. I just want to keep the two original columns of df1, with the new values appended. Is there a way to concatenate the dataframes without having to drop the extra column afterwards?
You can first filter the columns to append by selecting a subset:
print (df2[['A']])
A
4 A4
5 A5
print (pd.concat([df1, df2[['A']]]))
A B
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 NaN
5 A5 NaN
print (df1.append(df2[['A']]))
A B
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 NaN
5 A5 NaN
If df2 also had a B column, you could keep both columns:
print (df2[['A','B']])
A B
4 A4 B4
5 A5 B5
print (pd.concat([df1, df2[['A','B']]]))
A B
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 B4
5 A5 B5
Or:
print (df1.append(df2[['A','B']]))
A B
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 B4
5 A5 B5
EDIT by comment:
If df1 and df2 have different columns, use the intersection:
print (df1)
A B D
1 A1 B1 R
2 A2 B2 T
3 A3 B3 E
print (df2)
A B C
4 A4 B4 C4
5 A5 B5 C5
print (df1.columns.intersection(df2.columns))
Index(['A', 'B'], dtype='object')
print (pd.concat([df1, df2[df1.columns.intersection(df2.columns)]]))
A B D
1 A1 B1 R
2 A2 B2 T
3 A3 B3 E
4 A4 B4 NaN
5 A5 B5 NaN
Actually the solution is in an obscure corner of this page. Here's the code to use:
pd.concat([df1,df2],join_axes=[df1.columns])
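Note that join_axes was removed in pandas 1.0 (and DataFrame.append in pandas 2.0), so on recent versions one equivalent is to concatenate and then reindex to df1's columns:

pd.concat([df1, df2]).reindex(columns=df1.columns)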