Summing Two DataFrames by Index - python

I have the following:
df1 = pd.DataFrame([1, 1, 1, 1, 1], index=[ 1, 2, 3, 4 ,5 ], columns=['A'])
df2 = pd.DataFrame([ 1, 1, 1, 1, 1], index=[ 2, 3, 4, 5, 6], columns=['A'])
I want to return the DataFrame which will be the sum of the two for each row:
df = pd.DataFrame([ 1, 2, 2, 2, 2, 1], index=[1, 2, 3, 4, 5, 6], columns=['A'])
Of course, I don't know the actual indices in advance, so the intersection could be empty, in which case I'd get a concatenation of both DataFrames.

You can concatenate the two frames side by side (axis=1, which aligns them on the index), fill the missing values with 0, and sum across each row:
>>> pd.concat([df1, df2], axis=1).fillna(0).sum(axis=1)
1    1.0
2    2.0
3    2.0
4    2.0
5    2.0
6    1.0
dtype: float64
If you want it as a DataFrame, simply do:
pd.DataFrame({
'A': pd.concat([df1, df2], axis=1).fillna(0).sum(axis=1)})
(Also note that if you only need this for the specific Series A, just use
pd.concat([df1.A, df2.A], axis=1).fillna(0).sum(axis=1)
)
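As an alternative to the concat-based approach above, here is a minimal sketch using DataFrame.add with fill_value=0, which aligns on the index and treats a label missing from one frame as 0. The frame df3 and the names total and disjoint_total are illustrative and not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame([1, 1, 1, 1, 1], index=[1, 2, 3, 4, 5], columns=['A'])
df2 = pd.DataFrame([1, 1, 1, 1, 1], index=[2, 3, 4, 5, 6], columns=['A'])

# Align on the index; a label present in only one frame is filled with 0
# before adding, so no NaN appears in the result.
total = df1.add(df2, fill_value=0)

# Hypothetical frame with no overlapping labels: the result is then
# effectively the concatenation of both frames.
df3 = pd.DataFrame([1, 1], index=[10, 11], columns=['A'])
disjoint_total = df1.add(df3, fill_value=0)

print(total)
print(disjoint_total)
```

The result keeps the DataFrame shape and column name, so no extra pd.DataFrame(...) wrapping is needed. The values come back as floats because the alignment step introduces missing values internally.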

Related

Pandas DataFrame: Manipulate value in one column based on value in other column

I have two pandas DataFrames, df and df_0:
df = pd.DataFrame({'condition': [1, 2, 3],
'value': [1, 2, 3]})
df_0 = pd.DataFrame({'condition': [1, 2],
'value': [1, 3]})
   condition  value
0          1      1
1          2      2
2          3      3
   condition  value
0          1      1
1          2      3
I want to subtract (-=) the value in df_0 from the value column in df wherever the corresponding condition values match.
Pseudo code:
df_desired_result = pd.DataFrame({'condition': [1, 2, 3],
'value': [0, -1, 3]})
   condition  value
0          1      0
1          2     -1
2          3      3
How can I achieve this?
Thanks a lot in advance for your reply!
Lucy
Pandas performs arithmetic operations index-wise.
Just set 'condition' as the index, and fill the missing values with 0 so that conditions absent from df_0 leave the value unchanged:
import pandas as pd
df = pd.DataFrame({'condition': [1, 2, 3],
'value': [1, 2, 3]})
df_0 = pd.DataFrame({'condition': [1, 2],
'value': [1, 3]})
# Index both frames by 'condition'; conditions missing from df_0 are reindexed in as 0.
df_desired_result = (df.set_index('condition')
- df_0.set_index('condition').reindex(df.condition, fill_value=0)).reset_index()
print(df_desired_result)
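The same index-wise subtraction can also be sketched with DataFrame.sub and fill_value=0 instead of an explicit reindex. One assumption to be aware of: sub works on the union of both indices, so a condition that exists only in df_0 would also show up in the result (as 0 minus its value). With the data from the question that case does not occur. The name result is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'condition': [1, 2, 3],
                   'value': [1, 2, 3]})
df_0 = pd.DataFrame({'condition': [1, 2],
                     'value': [1, 3]})

# Subtract index-wise on 'condition'; a condition missing from one side
# is treated as 0. The intermediate result is float, so cast back to int.
result = (
    df.set_index('condition')
      .sub(df_0.set_index('condition'), fill_value=0)
      .astype(int)
      .reset_index()
)
print(result)
```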

Pandas: map unique values back to a column, in order

I'm not sure how to proceed in this case.
Consider a df like the one below. When I call df.A.unique(), it gives me an array like [1, 2, 3, 4].
But I also want the indices of these values, as numpy.unique() can return them.
df = pd.DataFrame({'A': [1,1,1,2,2,2,3,3,4], 'B':[9,8,7,6,5,4,3,2,1]})
df.A.unique()
>>> array([1, 2, 3, 4])
And
np.unique([1,1,1,2,2,2,3,3,4], return_inverse=True)
>>> (array([1, 2, 3, 4]), array([0, 0, 0, 1, 1, 1, 2, 2, 3]))
How can I do this in pandas, i.e. get the unique values together with their indices?
In pandas there is drop_duplicates:
df.A.drop_duplicates()
Out[22]:
0    1
3    2
6    3
8    4
Name: A, dtype: int64
To match the np.unique output, use factorize (note that it returns the codes first and the unique values second):
pd.factorize(df.A)
Out[21]: (array([0, 0, 0, 1, 1, 1, 2, 2, 3]), Int64Index([1, 2, 3, 4], dtype='int64'))
You can also build a dict from the result of .unique() and pass it to .map():
df.A.map({i:e for e,i in enumerate(df.A.unique())})
0    0
1    0
2    0
3    1
4    1
5    1
6    2
7    2
8    3
Name: A, dtype: int64
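To make the correspondence with np.unique concrete, here is a small sketch comparing the two. By default pd.factorize numbers the values in order of first appearance, while np.unique sorts them; passing sort=True to factorize makes the two agree. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 3, 3, 4],
                   'B': [9, 8, 7, 6, 5, 4, 3, 2, 1]})

# factorize returns (codes, uniques): the reverse order of np.unique's
# (uniques, inverse). sort=True orders the unique values like np.unique.
codes, uniques = pd.factorize(df['A'], sort=True)

np_uniques, np_inverse = np.unique(df['A'].to_numpy(), return_inverse=True)

print(list(uniques), list(codes))
print(list(np_uniques), list(np_inverse))
```

For this data the column is already sorted, so sort=True makes no difference; it matters when the values appear in a different order than their sorted order.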

Merging pandas dataframes with different size on column with non-unique elements

I have two pandas dataframes that I want to combine based on the value of a common column. However, in one of the dataframes the values in that column are not unique:
df1 = pd.DataFrame(
{'SimId:': [1, 1, 1, 2, 2],
'RunId': [1, 2, 3, 1, 2],
'Velocity': [5, 6, 7, 8, 9]})
df2 = pd.DataFrame(
{'SimId': [1, 2],
'weather': ['sun', 'snow']})
As a result I would like to get a dataframe like this:
df3 = pd.DataFrame(
{'SimId:': [1, 1, 1, 2, 2],
'RunId': [1, 2, 3, 1, 2],
'Velocity': [5, 6, 7, 8, 9],
'weather': ['sun', 'sun', 'sun', 'snow', 'snow']})
When trying to merge like this:
df3 = pd.merge(df1, df2, on='SimId', how='right')
I get a "KeyError".
Can anyone help me with what is the most pythonic way to solve this?
Your code works:
df3 = pd.merge(df1, df2, on='SimId', how='right')
You just need to fix a typo in df1: not 'SimId:', but 'SimId'.
Your code works, as Andrey said; just fix the typo in df1:
df1 = pd.DataFrame(
{'SimId': [1, 1, 1, 2, 2],
'RunId': [1, 2, 3, 1, 2],
'Velocity': [5, 6, 7, 8, 9]})
df2 = pd.DataFrame(
{'SimId': [1, 2],
'weather': ['sun', 'snow']})
df3 = pd.merge(df1, df2, on='SimId', how='right')
print(df3)
#    RunId  SimId  Velocity weather
# 0      1      1         5     sun
# 1      2      1         6     sun
# 2      3      1         7     sun
# 3      1      2         8    snow
# 4      2      2         9    snow
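If the misspelled column name comes from an external source and cannot simply be retyped, a minimal sketch of a cleanup step is shown below. Stripping the trailing colon with rename is an assumption about how one might repair the name; validate='many_to_one' is optional and only asks pandas to check that SimId is unique in df2. how='left' is used here so that the row order of df1 is kept.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'SimId:': [1, 1, 1, 2, 2],   # note the stray colon from the question
     'RunId': [1, 2, 3, 1, 2],
     'Velocity': [5, 6, 7, 8, 9]})
df2 = pd.DataFrame(
    {'SimId': [1, 2],
     'weather': ['sun', 'snow']})

# Hypothetical cleanup: drop a trailing ':' from every column name.
df1 = df1.rename(columns=lambda name: name.rstrip(':'))

# Many rows in df1 map to one row in df2; validate raises if df2 has
# duplicate SimId values.
df3 = pd.merge(df1, df2, on='SimId', how='left', validate='many_to_one')
print(df3)
```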

change column values (and type) to a pandas Dataframe

I am trying to rename a column in pandas dataframes, but different dataframes have different column types, and I need some help. A simple example will clarify my issue.
import pandas as pd
dic1 = {'a': [4, 1, 3, 1], 'b': [4, 2, 1, 4], 'c': [5, 7, 9, 1]}
dic2 = {1: [4, 1, 3, 1], 2: [4, 2, 1, 4], 3: [5, 7, 9, 1]}
df1 = pd.DataFrame(dic1)
df2 = pd.DataFrame(dic2)
Now if I type
df1.columns.values[-1] = 'newName'
I can easily change the last column name of the first dataframe, but if I type
df2.columns.values[-1] = 'newName'
I get an error message from Python, as the columns in the second dataframe are of a different type. Is there a way to change the type of those columns and/or make Python understand that the last column of df2 should also be named 'newName'?
This isn't the normal way to rename a column. You should use rename instead:
In [95]:
df2.rename(columns={df2.columns[-1]:'newName'}, inplace=True)
df2
Out[95]:
   1  2  newName
0  4  4        5
1  1  2        7
2  3  1        9
3  1  4        1
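A short sketch of two ways to rename the last column without touching columns.values. The likely reason the original assignment fails for df2 is that its integer column labels are backed by an integer array, which cannot hold a string; this is stated as an assumption, not verified against a specific pandas version. The names renamed and df2_alt are illustrative.

```python
import pandas as pd

dic2 = {1: [4, 1, 3, 1], 2: [4, 2, 1, 4], 3: [5, 7, 9, 1]}
df2 = pd.DataFrame(dic2)

# Option 1: rename returns a new frame with the last label replaced;
# it works regardless of the type of the existing labels.
renamed = df2.rename(columns={df2.columns[-1]: 'newName'})

# Option 2: assign a complete new list of labels to a copy.
df2_alt = df2.copy()
df2_alt.columns = list(df2_alt.columns[:-1]) + ['newName']

print(list(renamed.columns))
print(list(df2_alt.columns))
```

Both options leave df2 itself unchanged, unlike the inplace=True call shown above.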

Dataframe after reindex does not show all items in multiindex

Main goal: reindex a DataFrame with a new MultiIndex that contains new values.
In[34]: df = pd.DataFrame([[1,2,3], [1,4,2],[2,3,4], [2,2,1]], columns=['a', 'b', 'c'])
In[35]: df = df.set_index(['a', 'b'])
In[36]: df.index
Out[36]:
MultiIndex(levels=[[1, 2], [2, 3, 4]],
labels=[[0, 0, 1, 1], [0, 2, 1, 0]],
names=[u'a', u'b'])
In[37]: df_ri = df.reindex_axis([1,2,3,4], level='b', axis=0)
In[39]: df_ri
Out[39]:
     c
a b
1 2  3
  4  2
2 3  4
  2  1
In[40]: df_ri.index
Out[40]:
MultiIndex(levels=[[1, 2], [1, 2, 3, 4]], #all new values are stored here but are not visible in df
labels=[[0, 0, 1, 1], [1, 3, 2, 1]],
names=[u'a', u'b'])
In the end, the df output is unchanged. When I look at the new index, it contains the new values, but they are not shown. I can work around this by creating a new df with the new index and then merging the old df into it, but that is not the best approach. Any suggestions?
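One possible approach, offered as a sketch rather than a confirmed answer: build the full MultiIndex explicitly with MultiIndex.from_product and pass it to reindex, so that every (a, b) combination gets a row and the combinations without data are filled with NaN. This assumes that the goal is to see all four b values under each a. It also uses reindex rather than reindex_axis, since reindex_axis is no longer available in recent pandas versions. The name full_index is illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [1, 4, 2], [2, 3, 4], [2, 2, 1]],
                  columns=['a', 'b', 'c'])
df = df.set_index(['a', 'b'])

# Every existing value of level 'a' combined with the wanted 'b' values.
full_index = pd.MultiIndex.from_product(
    [df.index.levels[0], [1, 2, 3, 4]], names=['a', 'b'])

# Rows for combinations that had no data appear with NaN in column 'c'.
df_ri = df.reindex(full_index)
print(df_ri)
```

Because of the NaN entries, column c becomes float; passing fill_value to reindex is one way to keep an integer column if a default value makes sense for the data.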
