I have two dataframes with identical structures df and df_a. df_a is a subset of df that I need to reintegrate into df. Essentially, df_a has various rows (with varying indices) from df that have been manipulated.
Below is an example of indices of each df and df_a. These both have the same column structure so all the columns are the same, it's only the rows and idex of the rows that differ.
>> df
index .. other_columns ..
0
1
2
3
. .
9999
10000
10001
[10001 rows x 20 columns]
>> df_a
index .. other_columns ..
5
12
105
712
. .
9824
9901
9997
[782 rows x 20 columns]
So, I want to overwrite only the rows in df that have the indices of df_a with the corresponding rows in df_a. I checked out Replace rows in a Pandas df with rows from another df and replace rows in a pandas data frame but neither of those tell how to use the indices of another dataframe to replace the values in the rows.
Something along the lines of:
df.loc[df_a.index, :] = df_a[:]
I don't know if this wants you meant, for that you would need to be more specific, but if the first data frame was modified to be a new data frame with different indexes, then you can use this code to reset back the indexes:
import pandas as pd
df_a = pd.DataFrame({'a':[1,2,3,4],'b':[5,4,2,7]}, index=[2,55,62,74])
df_a.reset_index(inplace=True, drop=True)
print(df_a)
PRINTS:
a b
0 1 5
1 2 4
2 3 2
3 4 7
Related
I have a DataFrame with 100 columns (however I provide only three columns here) and I want to build a new DataFrame with two columns. Here is the DataFrame:
import pandas as pd
df = pd.DataFrame()
df ['id'] = [1,2,3]
df ['c1'] = [1,5,1]
df ['c2'] = [-1,6,5]
df
I want to stick the values of all columns for each id and put them in one columns. For example, for id=1 I want to stick 2, 3 in one column. Here is the DataFrame that I want.
Note: df.melt does not solve my question. Since I want to have the ids also.
Note2: I already use the stack and reset_index, and it can not help.
df = df.stack().reset_index()
df.columns = ['id','c']
df
You could first set_index with "id"; then stack + reset_index:
out = (df.set_index('id').stack()
.droplevel(1).reset_index(name='c'))
Output:
id c
0 1 1
1 1 -1
2 2 5
3 2 6
4 3 1
5 3 5
I am trying to fill a dataframe, containing repeated elements, based on an identifier.
My Dataframe is as follows:
Code Value
0 SJHV
1 SJIO 96B
2 SJHV 33C
3 CPO3 22A
4 CPO3 22A
5 SJHV 33C #< -- Numbers stored as strings
6 TOY
7 TOY #< -- These aren't NaN, they are empty strings
I would like to remove the empty 'Value' rows only if a non-empty 'Value' row exists. To be clear, I would want my output to look like:
Code Value
0 SJHV 33C
1 SJIO 96B
2 CPO3 22A
3 TOY
My attempt was as follows:
df['Value'].replace('', np.nan, inplace=True)
df2 = df.dropna(subset=['Value']).drop_duplicates('Code')
As expected, this code also drops the 'TOY' Code. Any suggestions?
The empty strings should go to the bottom if you sort them, then you can just drop duplicates.
import pandas as pd
df = pd.DataFrame({'Code':['SJHV','SJIO','SJHV','CPO3','CPO3','SJHV','TOY','TOY'],'Value':['','96B','33C','22A','22A','33C','','']})
df = (
df.sort_values(by=['Value'], ascending=False)
.drop_duplicates(subset=['Code'], keep='first')
.sort_index()
)
Output
Code Value
1 SJIO 96B
2 SJHV 33C
3 CPO3 22A
6 TOY
I have 2 df
In first df there are 3 columns, 10 rows, 3rd column is output column
In second Df there are 3 columns 1000 rows
If my first df 2 column matches with 2 columns of second df then 3rd column from first df has to append second df.
both df is below
df1
,A,B,output
1,abc,CCE,out1
2,def,CCE,out2
3,ghi,CCE,out3
4,hij,CCE,out4
5,klm,,out5
df2
,A,B
1,abc,CCE
2,def,CCE
3,lmn,CCE
4,opq,CCE
5,abc,CCE
6,klm,,
df2_expected
1,abc,CCE,out1
2,def,CCE,out2
3,lmn,CCE,
4,opq,CCE,
5,abc,CCE,out1
6,klm,,out5
As example i am giving 3 column actually in first df its n column and df2 its n-1 column means output column wont present in df2
Please try this
import pandas as pd
data1={'nu':[1,2,3,4,5], 'A':['abc','def','ghi','hij','klm'], 'B':['CCE','CCE','CCE','CCE','CCE'], 'output':['out1','out2','out3','out4','out5',]}
data2={'nu':[1,2,3,4,5], 'A':['abc','def','lmn','opq','abc'], 'B':['CCE','CCE','CCE','CCE','CCE'], 'output':[]}
df1=pd.DataFrame(data1,columns=['A','B','output'], index=data1['nu'])
df2=pd.DataFrame(data2,columns=['A','B'], index=data2['nu'])
df2.merge(df1, on=['A','B'],how='left').fillna('')
A B output
0 abc CCE out1
1 def CCE out2
2 lmn CCE
3 opq CCE
4 abc CCE out1
I have a csv file that I get from a specific software. In the csv file there are 196 rows, each row has a different amount of values. The values are seperated by a semicolon.
I want to have all values of the dataframe in one column, how to do it?
dftest = pd.read_csv("test.csv", sep=';', header=None)
dftest
0
0 14,0;14,0;13,9;13,9;13,8;14,0;13,9;13,9;13,8;1...
1 14,0;14,0;13,9;14,0;14,0;13,9;14,0;14,0;13,8;1...
2 13,8;13,9;14,0;13,9;13,9;14,6;14,0;14,0;13,9;1...
3 14,5;14,4;14,2;14,1;13,9;14,1;14,1;14,2;14,1;1...
4 14,1;14,0;14,1;14,2;14,0;14,3;13,9;14,2;13,7;1...
5 14,5;14,1;14,1;14,1;14,5;14,1;13,9;14,0;14,1;1...
6 14,1;14,7;14,0;13,9;14,2;13,8;13,8;13,9;14,8;1...
7 14,7;13,9;14,2;14,7;15,0;14,5;14,0;14,3;14,0;1...
8 13,9;13,8;15,1;14,1;13,8;14,3;14,1;14,8;14,0;1...
9 15,0;14,4;14,4;13,7;15,0;13,8;14,1;15,0;15,0;1...
10 14,3;13,8;13,9;14,8;14,3;14,0;14,5;14,1;14,0;1...
11 14,5;15,5;14,0;14,1;14,0;13,8;14,2;14,0;15,9;1...
The output looks like this, I want to have all values in one column
I would like to make it look like this:
0 14,0
1 14,0
2 13,9
.
.
.
If there is only one column 0 with values splitted by ; use Series.str.split with DataFrame.stack:
df = dftest[0].str.split(';', expand=True).stack().reset_index(drop=True)
you can also use numpy ravel and convert this to 1D Array.
df = pd.read_csv("test.csv", sep=';', header=None)
df = pd.DataFrame(df.values.ravel(), columns=['Name'])
I have an index in a pandas dataframe which repeats the index value. I want to re-index as multi-index where repeated indexes are grouped.
The indexing looks like such:
so I would like all the 112335586 index values would be grouped under the same in index.
I have looked at this question Create pandas dataframe by repeating one row with new multiindex but here the value can be index can be pre-defined but this is not possible as my dataframe is far too large to hard code this.
I also looked at at the multi-index documentation but this also pre-defines the value for the index.
I believe you need:
s = pd.Series([1,2,3,4], index=[10,10,20,20])
s.index.name = 'EVENT_ID'
print (s)
EVENT_ID
10 1
10 2
20 3
20 4
dtype: int64
s1 = s.index.to_series()
s2 = s1.groupby(s1).cumcount()
s.index = [s.index, s2]
print (s)
EVENT_ID
10 0 1
1 2
20 0 3
1 4
dtype: int64
Try this:
df.reset_index(inplace=True)
df['sub_idx'] = df.groupby('EVENT_ID').cumcount()
df.set_index(['EVENT_ID','sub_idx'], inplace=True)