Generate a dataframe from lists of different lengths [duplicate] - python

This question already has answers here:
Creating dataframe from a dictionary where entries have different lengths
(8 answers)
Closed 4 years ago.
I have several lists of different lengths, for example a=[1,2,3] and b=[2,3].
I would like to build a pd.DataFrame from them, padding the shorter lists with NaN at the end, like this:
   a    b
1  1    2
2  2    3
3  3  nan
Is there a good way to do this?

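Wrap each list in a pd.Series, so that pandas aligns them on the index and fills the gaps with NaN:
In [9]: pd.DataFrame({'a': pd.Series(a), 'b': pd.Series(b)})
Out[9]:
   a    b
0  1  2.0
1  2  3.0
2  3  NaN
Or build the frame with one row per list and transpose it:
In [10]: pd.DataFrame.from_dict({'a': a, 'b': b}, orient='index').T
Out[10]:
     a    b
0  1.0  2.0
1  2.0  3.0
2  3.0  NaN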
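For more than two lists, the pd.Series approach extends naturally to a dict comprehension. The following is a minimal sketch, assuming the lists are kept in a dict; the name `lists` and the third list `c` are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical input: column name -> list, with lists of different lengths
lists = {'a': [1, 2, 3], 'b': [2, 3], 'c': [5]}

# Each list becomes a Series with a default 0..n-1 index; the DataFrame
# constructor aligns the Series on that index and fills the gaps with NaN
df = pd.DataFrame({name: pd.Series(values) for name, values in lists.items()})
print(df)
```

Columns that receive a NaN are upcast to float, which is why b shows 2.0 and 3.0 in the output above while a stays integer.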


Combining partially overlapping data frames on both axes pandas [duplicate]

This question already has answers here:
Merging two data frames and keeping the extra rows from first df
(2 answers)
Closed 1 year ago.
What is the best way to combine multiple data frames that partially overlap on both axes?
I already have a workable solution, but I'm not sure it is the best way, or whether I should be doing this at all.
I have the following frames:
df1 = pd.DataFrame([[6,6],[7,7]],columns=['a','b'])
print(df1)
   a  b
0  6  6
1  7  7
df2 = pd.DataFrame([[7,7],[8,8]],columns=['b','c'], index=[1,2])
print(df2)
   b  c
1  7  7
2  8  8
The only overlapping data point is column b at index 1 ("b1"), and I'd like to obtain the following:
     a  b    c
0  6.0  6  NaN
1  7.0  7  7.0
2  NaN  8  8.0
If I do a regular concat, I end up with duplicates either in the index or in the columns. The workaround I found is the following:
dfc = pd.concat([df1,df2],axis=0)
dfc = dfc.groupby(dfc.index).mean()
print(dfc)
     a  b    c
0  6.0  6  NaN
1  7.0  7  7.0
2  NaN  8  8.0
I wonder whether there is a better way to do this and, more generally, whether this is good practice when handling data.
I should add that in my datasets the overlapping data is always an exact duplicate, so I should never have different values for the same index and column.
Thank you!
You want to use combine_first:
df1.combine_first(df2)
Output:
     a  b    c
0  6.0  6  NaN
1  7.0  7  7.0
2  NaN  8  8.0
NB: if the value of b at index 1 differs between the two dataframes, combine_first keeps the value from df1.
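The precedence rule in the note can be checked with a small experiment. This sketch reuses the frames from the question but deliberately changes the overlapping cell in df2 to 99 (an invented value) so that the two frames disagree.

```python
import pandas as pd

df1 = pd.DataFrame([[6, 6], [7, 7]], columns=['a', 'b'])
# Same frame as in the question, except b at index 1 is 99 instead of 7
df2 = pd.DataFrame([[99, 7], [8, 8]], columns=['b', 'c'], index=[1, 2])

left_wins = df1.combine_first(df2)   # df1's values take priority
right_wins = df2.combine_first(df1)  # df2's values take priority

print(left_wins.loc[1, 'b'], right_wins.loc[1, 'b'])
```

The calling frame always wins where both have a non-null value, so the order of the operands matters whenever the overlap is not an exact duplicate.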

How to fill a dataframe with the previous value? [duplicate]

This question already has answers here:
How to replace NaNs by preceding or next values in pandas DataFrame?
(10 answers)
Closed 3 years ago.
I import data from an Excel file, but merged cells in the Excel file do not carry over to Python, so I have to fix the data after loading it.
For example, the imported data looks like this:
0 aa
1 NaN
2 NaN
3 NaN
4 b
5 NaN
6 NaN
7 NaN
8 NaN
9 ccc
10 NaN
11 NaN
12 NaN
13 dd
14 NaN
15 NaN
16 NaN
The result I want is:
0 aa
1 aa
2 aa
3 aa
4 b
5 b
6 b
7 b
8 b
9 ccc
10 ccc
11 ccc
12 ccc
13 dd
14 dd
15 dd
16 dd
I tried to fix this with a for loop, but it takes a long time because the dataset is huge. Is there a faster way to do it?
Looks like .fillna() is your friend. Quoting the documentation:
We can also propagate non-null values forward or backward.
>>> df
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4
>>> df.fillna(method='ffill')
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  3.0  4.0 NaN  5
3  3.0  3.0 NaN  4
This is exactly what forward filling with .fillna() (or the equivalent .ffill() method) is for.
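Applied to a column like the one in the question, forward filling might look like the sketch below. It uses a shortened copy of the data and calls `ffill()` directly, since recent pandas releases (2.1 and later) emit a deprecation warning for the `method` argument of `fillna`.

```python
import numpy as np
import pandas as pd

# A shortened version of the column from the question
s = pd.Series(['aa', np.nan, np.nan, 'b', np.nan, 'ccc', np.nan])

# Forward fill: each NaN takes the last non-null value above it
filled = s.ffill()
print(filled.tolist())
```

Because the fill is vectorized, it avoids the Python-level loop that made the original attempt slow. A NaN at the very start of the column stays NaN, as there is nothing above it to copy.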
You can also get the desired result with the apply and fillna methods:
import pandas as pd
import numpy as np
df = pd.DataFrame(data = {'A':['a', np.nan, np.nan, 'b', np.nan]})
l = []  # real (non-placeholder) values seen so far
def change(value):
    # A placeholder marks a former NaN: reuse the last real value
    # (this assumes the first value in the column is not NaN)
    if value == "bhale":
        return l[-1]
    # Otherwise remember the value and return it unchanged
    l.append(value)
    return value
# First replace NaN values with a placeholder string such as `bhale`
df['A'] = df['A'].fillna('bhale')
df["A"] = df['A'].apply(change) # Using apply method.
df
I hope this helps.

Pandas combining dataframes based on column value

I am trying to turn multiple dataframes into a single one based on the values in the first column, but not every dataframe has the same values in the first column. Take this example:
df1:
A 4
B 6
C 8
df2:
A 7
B 4
F 3
full_df:
A  4  7
B  6  4
C  8
F     3
How do I do this using python and pandas?
You can use pandas merge with an outer join (replace 'first_column' with the name of your key column):
df1.merge(df2, on=['first_column'], how='outer')
You can use pd.concat, remembering to align indices:
res = pd.concat([df1.set_index(0), df2.set_index(0)], axis=1)
print(res)
     1    1
A  4.0  7.0
B  6.0  4.0
C  8.0  NaN
F  NaN  3.0
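The question does not give column names, so here is a self-contained sketch of the outer merge with invented ones; `key`, `x` and `y` are placeholders, not names from the original post.

```python
import pandas as pd

# Column names 'key', 'x' and 'y' are placeholders for illustration
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'x': [4, 6, 8]})
df2 = pd.DataFrame({'key': ['A', 'B', 'F'], 'y': [7, 4, 3]})

# how='outer' keeps keys found in either frame; missing cells become NaN
full_df = df1.merge(df2, on='key', how='outer')
print(full_df)
```

Giving the value columns distinct names also avoids the duplicate column label (1 and 1) that appears in the concat output above.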

Pandas: how to join two dataframes combinatorially [duplicate]

This question already has answers here:
cartesian product in pandas
(13 answers)
Closed 4 years ago.
I have two dataframes that I would like to combine combinatorially (i.e. join each row of one df to each row of the other). I can do this by merging on a 'key' column, but my solution is clearly cumbersome. I'm looking for a more straightforward, more Pythonic way of handling this operation. Any suggestions?
MWE:
fred = pd.DataFrame({'A':[1., 4.],'B':[2., 5.], 'C':[3., 6.]})
print(fred)
     A    B    C
0  1.0  2.0  3.0
1  4.0  5.0  6.0
jim = pd.DataFrame({'one':['a', 'c'],'two':['b', 'd']})
print(jim)
  one two
0   a   b
1   c   d
fred['key'] = [1,2]
jim1 = jim.copy()
jim1['key'] = 1
jim2 = jim.copy()
jim2['key'] = 2
jim3 = pd.concat([jim1, jim2])  # DataFrame.append was removed in pandas 2.0
jack = pd.merge(fred, jim3, on='key').drop(['key'], axis=1)
print(jack)
     A    B    C one two
0  1.0  2.0  3.0   a   b
1  1.0  2.0  3.0   c   d
2  4.0  5.0  6.0   a   b
3  4.0  5.0  6.0   c   d
You can join every row of fred with every row of jim by merging on a key column which is equal to the same value (say, 1) for every row:
In [16]: pd.merge(fred.assign(key=1), jim.assign(key=1), on='key').drop('key', axis=1)
Out[16]:
     A    B    C one two
0  1.0  2.0  3.0   a   b
1  1.0  2.0  3.0   c   d
2  4.0  5.0  6.0   a   b
3  4.0  5.0  6.0   c   d
Are you looking for the cartesian product of the two dataframes, like a cross join?
It is answered here.
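If the installed pandas is version 1.2 or later, the cross join can be requested directly, without a helper key column. A minimal sketch using the frames from the question:

```python
import pandas as pd

fred = pd.DataFrame({'A': [1., 4.], 'B': [2., 5.], 'C': [3., 6.]})
jim = pd.DataFrame({'one': ['a', 'c'], 'two': ['b', 'd']})

# how='cross' pairs every row of fred with every row of jim
jack = pd.merge(fred, jim, how='cross')
print(jack)
```

The result has len(fred) * len(jim) rows, so this is only practical when at least one of the frames is small.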

Merge dataframes without duplicating rows in python pandas [duplicate]

This question already has answers here:
Pandas left join on duplicate keys but without increasing the number of columns
(2 answers)
Closed 4 years ago.
I'd like to combine two dataframes on their common column 'A':
>>> df1
    A  B
0   I  1
1   I  2
2  II  3
>>> df2
     A  C
0    I  4
1   II  5
2  III  6
To do so I tried using:
merged = pd.merge(df1, df2, on='A', how='outer')
Which returned:
>>> merged
     A    B  C
0    I  1.0  4
1    I  2.0  4
2   II  3.0  5
3  III  NaN  6
However, since df2 only contained one value for A == 'I', I do not want this value to be duplicated in the merged dataframe. Instead I would like the following output:
>>> merged
     A    B    C
0    I  1.0    4
1    I  2.0  NaN
2   II  3.0    5
3  III  NaN    6
What is the best way to do this? I am new to Python and still slightly confused by all the join/merge/concatenate/append operations.
Create a helper column g with cumcount, which numbers the rows within each value of A, so that the merge matches on both A and g:
df1['g']=df1.groupby('A').cumcount()
df2['g']=df2.groupby('A').cumcount()
df1.merge(df2,how='outer').drop(columns='g')
Out[62]:
     A    B    C
0    I  1.0  4.0
1    I  2.0  NaN
2   II  3.0  5.0
3  III  NaN  6.0
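The same idea as a self-contained sketch that leaves the input frames untouched. It uses `assign` instead of adding the helper column in place and names the merge keys explicitly; both are stylistic choices here, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['I', 'I', 'II'], 'B': [1, 2, 3]})
df2 = pd.DataFrame({'A': ['I', 'II', 'III'], 'C': [4, 5, 6]})

# g numbers the rows within each value of A: 0 for the first 'I', 1 for the second
left = df1.assign(g=df1.groupby('A').cumcount())
right = df2.assign(g=df2.groupby('A').cumcount())

# Merging on both A and g pairs the n-th 'I' on the left only with the
# n-th 'I' on the right, so the value of C is not repeated
merged = left.merge(right, on=['A', 'g'], how='outer').drop(columns='g')
print(merged)
```

Since the second 'I' row in df1 has g equal to 1 and df2 has no row with that combination, its C value comes out as NaN rather than a repeated 4.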
