Concat dataframes on different columns - python

I have 3 different CSV files and I want to concatenate their values. The only condition is that the data from the first CSV must go in column A of the new CSV, the data from the second in column B, and the data from the third in column C. All three files have the same number of rows.
I also need to change the three headers to ['año_pasado','mes_pasado','este_mes']
import pandas as pd
df1 = pd.read_csv('año_pasado_subastas2.csv', sep=',')
df2 = pd.read_csv('mes_pasado_subastas2.csv', sep=',')
df3 = pd.read_csv('este_mes_subastas2.csv', sep=',')
df1
>>>
Subastas
166665859
237944547
260106086
276599496
251813654
223790056
179340698
177500866
239884764
234813107
df2
>>>
Subastas
212003586
161813617
172179313
209185016
203804433
198207783
179410798
156375658
130228140
124964988
df3
>>>
Subastas
142552750
227514418
222635042
216263925
196209965
140984000
139712089
215588302
229478041
222211457
The output that I need is:
año_pasado,mes_pasado,este_mes
166665859,212003586,142552750
237944547,161813617,227514418
260106086,172179313,222635042
276599496,209185016,216263925
251813654,203804433,196209965
223790056,198207783,140984000
179340698,179410798,139712089
177500866,156375658,215588302
239884764,130228140,229478041
234813107,124964988,222211457

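I think you need concat on Series. If each file holds only one column, squeeze each DataFrame into a Series; otherwise select the column you need. Use the keys parameter for the new column names. (The squeeze=True argument of read_csv was removed in pandas 2.0, so call .squeeze("columns") on the result instead.)
df = pd.read_csv('año_pasado_subastas2.csv').squeeze("columns")
df1 = pd.read_csv('mes_pasado_subastas2.csv').squeeze("columns")
df2 = pd.read_csv('este_mes_subastas2.csv').squeeze("columns")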
cols = ['año_pasado','mes_pasado','este_mes']
df = pd.concat([df, df1, df2], keys = cols, axis=1)
Or:
df = pd.read_csv('año_pasado_subastas2.csv')
df1 = pd.read_csv('mes_pasado_subastas2.csv')
df2 = pd.read_csv('este_mes_subastas2.csv')
cols = ['año_pasado','mes_pasado','este_mes']
df = pd.concat([df['Subastas'], df1['Subastas'], df2['Subastas']], keys = cols, axis=1)
print (df)
   año_pasado  mes_pasado   este_mes
0   166665859   212003586  142552750
1   237944547   161813617  227514418
2   260106086   172179313  222635042
3   276599496   209185016  216263925
4   251813654   203804433  196209965
5   223790056   198207783  140984000
6   179340698   179410798  139712089
7   177500866   156375658  215588302
8   239884764   130228140  229478041
9   234813107   124964988  222211457
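The question also asks for a CSV with the new headers, so here is a minimal sketch of the last step. It assumes each file has a single Subastas column, and it uses short in-memory strings (made up from the first two rows of the sample data) in place of the real files:

```python
import io
import pandas as pd

# In-memory stand-ins for the three CSV files (assumption: one "Subastas" column each).
csv_last_year = "Subastas\n166665859\n237944547\n"
csv_last_month = "Subastas\n212003586\n161813617\n"
csv_this_month = "Subastas\n142552750\n227514418\n"

frames = [
    pd.read_csv(io.StringIO(text))
    for text in (csv_last_year, csv_last_month, csv_this_month)
]
cols = ["año_pasado", "mes_pasado", "este_mes"]

# Put the single column of each frame side by side; keys= supplies the new names.
combined = pd.concat([frame["Subastas"] for frame in frames], axis=1, keys=cols)

# index=False keeps the row index out of the CSV text.
csv_text = combined.to_csv(index=False)
print(csv_text)
```

Passing a file path to to_csv instead of leaving it empty would write the same text to disk.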

Related

Pandas: modify multiple dataframes (in a loop)

I have multiple dataframes and I want to apply the same operation to each of them, so I need to iterate over them.
# read text files (on_bad_lines="skip" replaces error_bad_lines=False, which was removed in pandas 2.0)
import pandas as pd
df1 = pd.read_csv("df1.txt", sep="\t", on_bad_lines="skip", index_col=None)
df2 = pd.read_csv("df2.txt", sep="\t", on_bad_lines="skip", index_col=None)
df3 = pd.read_csv("df3.txt", sep="\t", on_bad_lines="skip", index_col=None)
I used the following code, but it does not work: all the dataframes stay the same, and the changes do not affect them:
for df in [df1, df2, df3]:
    df = df[df["Time"] >= 600.0].reset_index(drop=True)
    df.head()
How can I iterate over them, and how can I overwrite the dataframes?
The problem is that you're not changing the data frames in place, but rather creating new ones. Here's a piece of code that changes things in-place. I don't have your data, so I create fake data for the sake of this example:
import pandas as pd
df1 = pd.DataFrame(range(10))
df2 = pd.DataFrame(range(20))
df3 = pd.DataFrame(range(30))
df_list = [df1, df2, df3]
for df in df_list:
    # use whatever condition you need in the following line
    # for example, df.drop(df[df["Time"] < 600].index, inplace=True)
    # in your case.
    df.drop(df[df[0] % 2 == 0].index, inplace=True)
    df.reset_index(inplace=True)
print(df2)  # for example
The result for df2 is:
   index   0
0      1   1
1      3   3
2      5   5
3      7   7
4      9   9
5     11  11
6     13  13
7     15  15
8     17  17
9     19  19
This might work:
df_list = [df1, df2, df3]
for i in range(len(df_list)):
    df = df_list[i]
    # write the filtered frame back into the list
    df_list[i] = df[df["Time"] >= 600.0].reset_index(drop=True)
If you store the new dataframes in another list (or back in the same list), you are all set.
newdf_list = []  # create new list to store df
for df in [df1, df2, df3]:
    df = df[df["Time"] >= 600.0].reset_index(drop=True)
    df.head()
    newdf_list.append(df)  # append changed df to new list
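If the frames need to stay reachable by name after filtering, one option is to keep them in a dict rather than in separate variables. The sketch below uses made-up Time values in place of the text files, and the dict name frames is only illustrative:

```python
import pandas as pd

# Made-up data with a "Time" column, standing in for df1.txt, df2.txt, df3.txt.
frames = {
    "df1": pd.DataFrame({"Time": [100.0, 600.0, 900.0]}),
    "df2": pd.DataFrame({"Time": [599.9, 700.0]}),
    "df3": pd.DataFrame({"Time": [50.0, 60.0]}),
}

# Assigning to a loop variable only rebinds that local name, so build a new
# dict that maps each key to its filtered frame instead.
frames = {
    name: frame[frame["Time"] >= 600.0].reset_index(drop=True)
    for name, frame in frames.items()
}

print(frames["df1"])
```

Each filtered frame is then available as frames["df1"], frames["df2"], and so on.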

concat by taking the values from column

I have a list ['df1', 'df2'] in which I have stored some dataframes that were filtered on a few conditions. I then converted this list to a dataframe using
df = pd.DataFrame(list1)
Now df has only one column:
0
df1
df2
Sometimes it may also have:
0
df1
df2
df3
I want to concatenate all of these. My static code is
df_new = pd.concat([df1,df2],axis=1)
or
df_new = pd.concat([df1,df2,df3],axis=1)
How can I make it dynamic (without specifying df1, df2 myself) so that it takes the values from the column and concatenates them?
Use a list to collect the dataframes, then concatenate the list:
import pandas as pd
lists = [[1,2,3],[4,5,6]]
arr = []
for l in lists:
    new_df = pd.DataFrame(l)
    arr.append(new_df)
df = pd.concat(arr, axis=1)
df
Result :
   0  0
0  1  4
1  2  5
2  3  6
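When the list holds names rather than the frames themselves, a lookup table from name to DataFrame makes the concat dynamic. This is only a sketch: the dict available and its contents are hypothetical and stand in for however the filtered frames are actually produced:

```python
import pandas as pd

# Hypothetical registry mapping each name to its DataFrame.
available = {
    "df1": pd.DataFrame({"a": [1, 2]}),
    "df2": pd.DataFrame({"b": [3, 4]}),
    "df3": pd.DataFrame({"c": [5, 6]}),
}

list1 = ["df1", "df3"]  # names chosen at run time

# Look up each name and concatenate whatever was selected, column-wise.
df_new = pd.concat([available[name] for name in list1], axis=1)
print(df_new)
```

The same line works unchanged whether list1 has two names or three.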

Issues with append, merge and join for 3 different dataframe outputs from pandas with 1 index

I have 10000 data points that I'm sorting into dictionaries and then exporting to a CSV using pandas. I'm sorting temperatures, pressures and flows, each associated with a key. But when doing this I get: https://imgur.com/a/aNX7RHf
but I want something like this: https://imgur.com/a/ZxJgPv4
I'm transposing my dataframe so that the index becomes the rows, but in this case I want only 3 rows (1, 2 and 3), with all the data populating those rows.
flow_dictionary = {'200:P1F1':[5.5, 5.5, 5.5]}
pres_dictionary = {'200:PT02':[200,200,200],
                   '200:PT03':[200,200,200],
                   '200:PT06':[66,66,66],
                   '200:PT07':[66,66,66]}
temp_dictionary = {'200:TE02':[27,27,27],
                   '200:TE03':[79,79,79],
                   '200:TE06':[113,113,113],
                   '200:TE07':[32,32,32]}
df = pd.DataFrame.from_dict(temp_dictionary, orient='index').T
df2 = pd.DataFrame.from_dict(pres_dictionary, orient='index').T
df3 = pd.DataFrame.from_dict(flow_dictionary, orient='index').T
df = df.append(df2, ignore_index=False, sort=True)
df = df.append(df3, ignore_index=False, sort=True)
df.to_csv('processedSegmentedData.csv')
SOLUTION:
df1 = pd.DataFrame.from_dict(temp_dictionary, orient='index').T
df2 = pd.DataFrame.from_dict(pres_dictionary, orient='index').T
df3 = pd.DataFrame.from_dict(flow_dictionary, orient='index').T
df4 = pd.concat([df1,df2,df3], axis=1)
df4.to_csv('processedSegmentedData.csv')
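The difference between the two approaches comes down to the concatenation axis. The sketch below uses a trimmed-down version of the dictionaries above (fewer keys, same values) to compare the resulting shapes:

```python
import pandas as pd

# Trimmed-down versions of the temperature and pressure data from the question.
temp = pd.DataFrame({"200:TE02": [27, 27, 27], "200:TE03": [79, 79, 79]})
pres = pd.DataFrame({"200:PT02": [200, 200, 200]})

# Row-wise: frames are stacked, and columns missing from a frame become NaN.
stacked = pd.concat([temp, pres], axis=0, sort=True)

# Column-wise: frames are aligned on the shared index 0..2 and placed side by side.
side_by_side = pd.concat([temp, pres], axis=1)

print(stacked.shape)       # (6, 3)
print(side_by_side.shape)  # (3, 3)
```

Row-wise stacking is what produces the extra rows padded with empty cells, while axis=1 keeps the three rows.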

Mapping to dataframes based on one column

I have a dataframe (df1) with 5 columns (a, b, c, d, e) and 6 rows, and another dataframe (df2) with 2 columns (a, z) and 20000 rows.
How do I map and merge those dataframes on the 'a' value?
Each row of df1 should be matched to the row of df2 with the same 'a' value, returning a new dataframe with 6 columns (5 from df1 plus the mapped column from df2) and 6 rows.
By using pd.concat:
import pandas as pd
import numpy as np
columns_df1 = ['a','b','c','d']
columns_df2 = ['a','z']
data_df1 = [['abc','def','ghi','xyz'],['abc2','def2','ghi2','xyz2'],['abc3','def3','ghi3','xyz3'],['abc4','def4','ghi4','xyz4']]
data_df2 = [['a','z'],['a2','z2']]
df_1 = pd.DataFrame(data_df1, columns=columns_df1)
df_2 = pd.DataFrame(data_df2, columns=columns_df2)
print(df_1)
print(df_2)
frames = [df_1, df_2]
print (pd.concat(frames))
Edit:
To replace NaN values you could use pandas.DataFrame.fillna:
print (pd.concat(frames).fillna("NULL"))
Replace "NULL" with anything you want, e.g. 0.
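Since the question asks to match rows on the 'a' value, a left merge may be closer to what is wanted than concat. The following is a sketch with made-up data (fewer columns and rows than in the question), assuming the 'a' values in df2 are unique:

```python
import pandas as pd

# Made-up stand-ins: df1 is the small frame, df2 the large lookup table.
df1 = pd.DataFrame({"a": ["k1", "k2", "k3"], "b": [1, 2, 3], "c": [4, 5, 6]})
df2 = pd.DataFrame({"a": ["k3", "k1", "k9"], "z": ["z3", "z1", "z9"]})

# how="left" keeps every row of df1 and adds z where the 'a' values match;
# rows of df1 with no match in df2 get NaN in z.
merged = df1.merge(df2, on="a", how="left")
print(merged)
```

If df2 contained repeated 'a' values, the merge would return one row per match, so the result could have more rows than df1.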

Nested merges in pandas with suffixes

I'm trying to merge multiple dataframes in pandas and keep the column labels straight in the resulting dataframe. Here's my test case:
import pandas as pd
df1 = pd.DataFrame(data = [[1,1],[3,1],[5,1]], columns = ['key','val'])
df2 = pd.DataFrame(data = [[1,2],[3,2],[7,2]], columns = ['key','val'])
df3 = pd.DataFrame(data = [[1,3],[2,3],[4,3]], columns = ['key','val'])
df = pd.merge(pd.merge(df1,df2,on='key', suffixes=['_1','_2']),df3,on='key',suffixes=[None,'_3'])
I'm getting this:
df =
   key  val_1  val_2  val
0    1      1      2    3
I'd like to see this:
df =
   key  val_1  val_2  val_3
0    1      1      2      3
The last pair of suffixes that I've specified is: [None,'_3'], the logic being that the pair ['_1','_2'] has created unique column names for the previous merge.
The suffix is needed only when the merged dataframe has two columns with same name. When you merge df3, your dataframe has column names val_1 and val_2 so there is no overlap.
You can handle that by renaming val to val_3 like this
df = df1.merge(df2, on = 'key', suffixes=['_1','_2']).merge(df3, on = 'key').rename(columns = {'val': 'val_3'})
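Try this instead: leave val unsuffixed in the first merge, so that it overlaps with val from df3 in the second merge and both receive suffixes:
df = pd.merge(pd.merge(df1,df2,on='key', suffixes=[None,'_2']),df3,on='key',suffixes=['_1','_3'])
It works for me.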
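For more than a handful of frames, another option is to rename the value column before merging, so that no merge ever sees overlapping names and suffixes are not needed at all. This is a sketch of that idea using functools.reduce; the val_1, val_2, val_3 naming follows the output wanted in the question:

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame([[1, 1], [3, 1], [5, 1]], columns=["key", "val"])
df2 = pd.DataFrame([[1, 2], [3, 2], [7, 2]], columns=["key", "val"])
df3 = pd.DataFrame([[1, 3], [2, 3], [4, 3]], columns=["key", "val"])

# Rename "val" to val_1, val_2, val_3 up front so the column names never clash.
renamed = [
    frame.rename(columns={"val": f"val_{i}"})
    for i, frame in enumerate([df1, df2, df3], start=1)
]

# Merge the frames pairwise on "key" (inner join, the default for merge).
df = reduce(lambda left, right: left.merge(right, on="key"), renamed)
print(df)
```

Only key 1 appears in all three frames, so the result has a single row.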
