Create single pandas dataframe from a list of dataframes - python

I have a list of about 25 dfs and all of the columns are the same. Although the row count is different, I am only interested in the first row of each df.
How can I iterate through the list of dfs, copy the first row from each and concatenate them all into a single df?

Select the first row by position with DataFrame.iloc, using [[0]] so that each result is a one-row DataFrame, and join them together with concat:
df = pd.concat([x.iloc[[0]] for x in dfs], ignore_index=True)
Or use DataFrame.head(1), which also returns one-row DataFrames:
df = pd.concat([x.head(1) for x in dfs], ignore_index=True)
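A minimal sketch of both variants on made-up data; the three sample frames and the names first_rows and same are assumptions for illustration, not part of the original question:

```python
import pandas as pd

# Hypothetical sample data: same columns, different row counts.
dfs = [
    pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]}),
    pd.DataFrame({"a": [10, 20], "b": ["p", "q"]}),
    pd.DataFrame({"a": [100], "b": ["m"]}),
]

# iloc[[0]] (double brackets) keeps each first row as a one-row DataFrame;
# iloc[0] would return a Series instead.
first_rows = pd.concat([x.iloc[[0]] for x in dfs], ignore_index=True)

# head(1) selects the same rows.
same = pd.concat([x.head(1) for x in dfs], ignore_index=True)

print(first_rows)
```

One difference worth knowing: head(1) returns an empty frame when a DataFrame has no rows, whereas iloc[[0]] raises an IndexError in that case.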

Related

How to drop columns, do operation on remaining columns, then insert back the dropped columns?

I have a large dataset. I want to apply something on all the columns except for 2.
I dropped the 2 columns and created a separate dataframe, then tried merging the dataframes after the operation is applied.
I tried appending, merging, joining the two dataframes but they all created duplicate rows. Appending doubled the row count, and changed the dropped columns.
I just want to add back the 2 columns to the initial dataframe unchanged. Any help?
df= col1 col2 col3... col100
1 2 3 100
df2=df.loc[:,['col2', 'col3']]
df.drop(columns=['col2', 'col3'], inplace=True)
Then do what I needed to do to df.
Now I want to merge df and df2.
Like this:
cols = ['col2', 'col3']
df2 = df[cols]
df.drop(columns=cols, inplace=True)
# do something
df = pd.concat([df, df2], axis=1)
This will work as long as you didn't remove rows from either dataframe or change their index, because pd.concat with axis=1 aligns rows on the index.
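As a rough end-to-end sketch of the same idea, assuming a small made-up frame and a stand-in operation (multiplying by 10); the names kept, rest and result are hypothetical:

```python
import pandas as pd

# Hypothetical data standing in for the large dataset.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6], "col4": [7, 8]})

cols = ["col2", "col3"]
kept = df[cols].copy()        # columns to leave untouched
rest = df.drop(columns=cols)  # columns to operate on

rest = rest * 10              # stand-in for the real operation

# axis=1 places the frames side by side, aligned on the index.
result = pd.concat([rest, kept], axis=1)

# Optional: restore the original column order.
result = result[df.columns]
print(result)
```

If the operation resets or filters the index, the side-by-side concat no longer lines up row for row, which is where the duplicate or NaN rows described in the question tend to come from.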

How to multiply matching rows from two dataframes

I have two dataframes that look like this:
df1:
df2:
I want to multiply the Ratio column of df1 by the columns Total, Hombres and Mujeres in df2 wherever the 'Estado' column of df1 matches the 'Entidad Federativa' column of df2; when the match ends, it should move on to the next row of df1 and do the same for its matching rows. Does anyone have an idea how to do this? I would appreciate it a lot.
Use DataFrame.div with level=0 to match the first level of the MultiIndex in df2 against the Estado index values of df1 (DataFrame.mul takes the same arguments if you need to multiply instead):
#if necessary
#df1 = df1.set_index('Estado')
df1 = df1.rename(index={'Aguascal':'Aguascalientes'})
# set the index first so that astype(int) only touches the numeric columns
df2 = (df2.set_index(['Entidad Federativa','Grupo quinquenal de edad'])
          .replace(',','', regex=True)
          .astype(int))
df = df2.div(df1['Ratio'], level=0, axis=0)
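Since the original tables are not shown, here is a small sketch on invented data; every value below is an assumption, and only the column names are taken from the question. It shows level-based alignment for both division and multiplication:

```python
import pandas as pd

# Hypothetical ratios, indexed by state name.
df1 = pd.DataFrame(
    {"Estado": ["Aguascalientes", "Baja California"], "Ratio": [2.0, 4.0]}
).set_index("Estado")

# Hypothetical counts stored as strings with thousands separators.
df2 = pd.DataFrame({
    "Entidad Federativa": ["Aguascalientes", "Aguascalientes", "Baja California"],
    "Grupo quinquenal de edad": ["0-4", "5-9", "0-4"],
    "Total": ["1,000", "2,000", "4,000"],
    "Hombres": ["400", "1,200", "1,000"],
    "Mujeres": ["600", "800", "3,000"],
})

# Index first, then strip the commas and convert the remaining columns.
df2 = df2.set_index(["Entidad Federativa", "Grupo quinquenal de edad"])
df2 = df2.replace(",", "", regex=True).astype(int)

# level=0 aligns df1's index with the first level of df2's MultiIndex.
divided = df2.div(df1["Ratio"], level=0, axis=0)
multiplied = df2.mul(df1["Ratio"], level=0, axis=0)
print(multiplied)
```

Each state's ratio is applied to all of that state's age-group rows, which is the "move to the next row when the match ends" behaviour asked for.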

Pandas merge two data frame only to first occurrence

I have two dataframes, and I am able to merge them with pd.merge(df1, df2, on='column_name'). But I only want to merge on the first occurrence in df1. Any pointer or solution? It's a many-to-one relationship, and I only want the first occurrence merged. Thanks in advance!
Since you want to merge two dataframes of different lengths, you'll have to have NaN values in the merged dataframe cells where there are no corresponding indices in df2. So let's try this. Merge left. This will duplicate df2 values for duplicated column_name rows in df1. Have a mask ready to filter those rows and assign NaN for them in the columns from df2.
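import numpy as np
# positional mask of repeated keys; assumes the keys in df2 are unique, so the left merge keeps df1's row order and length
mask = df1['column_name'].duplicated().to_numpy()
new_df = df1.merge(df2, how='left', on='column_name')
new_df.loc[mask, df2.columns[df2.columns!='column_name']] = np.nan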
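A small worked sketch of the masking approach, using invented frames; the columns x and y and the name value_cols are assumptions for illustration. It also assumes the keys in df2 are unique, so the left merge keeps df1's length and row order:

```python
import numpy as np
import pandas as pd

# Hypothetical many-to-one data: key "a" appears twice in df1.
df1 = pd.DataFrame({"column_name": ["a", "a", "b", "c"], "x": [1, 2, 3, 4]})
df2 = pd.DataFrame({"column_name": ["a", "b"], "y": [10.0, 20.0]})

# True for every repeat of a key after its first occurrence.
mask = df1["column_name"].duplicated().to_numpy()

new_df = df1.merge(df2, how="left", on="column_name")

# Blank out the df2 columns on the repeated rows.
value_cols = df2.columns.difference(["column_name"])
new_df.loc[mask, value_cols] = np.nan
print(new_df)
```

Here only the first "a" row keeps y = 10.0; the second "a" row and the unmatched "c" row end up with NaN.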

Efficient way to drop a row from Dataframe A if an element equals an element in Dataframe B

I have a column in Dataframe B that contains elements I wish to drop from Dataframe A, should A contain them. I wish to drop the entire row from A.
I'm not new to programming but I am learning the extensive pandas library. From what I've seen, this can't be in any way efficient or proper.
for i in range(0,106):
    for j in range(0,171):
        if dfB.iloc[i,2] == dfA.iloc[j,0]:
            dfA.drop(j, inplace=True)
IIUC:
dfA = dfA.loc[~dfA["ColumnNameInA"].isin(dfB["ColumnNameInB"])]
You would need to substitute the appropriate column names.
In this case, dfA["ColumnNameInA"].isin(dfB["ColumnNameInB"]) creates a series that is True whenever the value in dfA's column is in dfB's column. We pass that to .loc, and reassign to dfA.
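A minimal sketch on made-up data, keeping the placeholder column names from the answer; the values and the intermediate name in_b are assumptions:

```python
import pandas as pd

# Hypothetical frames; only the placeholder column names come from the answer.
dfA = pd.DataFrame({"ColumnNameInA": [1, 2, 3, 4], "other": list("wxyz")})
dfB = pd.DataFrame({"ColumnNameInB": [2, 4, 9]})

# True where dfA's value appears anywhere in dfB's column.
in_b = dfA["ColumnNameInA"].isin(dfB["ColumnNameInB"])

# ~ inverts the mask, so only rows NOT found in dfB are kept.
dfA = dfA.loc[~in_b]
print(dfA)
```

This replaces the nested loops with a single vectorized membership test, and the surviving rows keep their original index labels.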
This should also work:
df = df[df['A'] == df2['B']]
Assumption: df and df2 are the same length, and you are comparing row x of df with row x of df2. Note that == keeps the rows whose values match; use != instead if you want to drop them.
Example Dataset:
df = pd.DataFrame({'A': [1,2,3]})
df2 = pd.DataFrame({'B': [1,4,3]})
Output:
df
   A
0  1
2  3

How to select dataframes that only have 6 columns from a list

I have a list of dataframes with varying numbers of columns. I want to select only the dataframes that have exactly 6 columns and concatenate them.
df=pd.DataFrame(list, columns=['0'])
This gives me a column from every dataframe in the list.
IIUC
l = [df for df in dfs if df.columns.size == 6]
Then
pd.concat(l)
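A short sketch of the filter-then-concat idea, assuming three invented frames; the names six_col and combined are hypothetical, and shape[1] is used here as an equivalent way to count columns:

```python
import pandas as pd

# Hypothetical list: two frames with 6 columns, one with 4.
dfs = [
    pd.DataFrame([[0] * 6] * 2, columns=list("abcdef")),
    pd.DataFrame([[1] * 4], columns=list("abcd")),
    pd.DataFrame([[2] * 6], columns=list("abcdef")),
]

# Keep only the frames that have exactly 6 columns.
six_col = [d for d in dfs if d.shape[1] == 6]

# ignore_index=True gives the stacked result a fresh 0..n-1 index.
combined = pd.concat(six_col, ignore_index=True)
print(combined)
```

If the six-column frames do not share the same column names, concat takes the union of the names and fills the gaps with NaN, so it may be worth checking the names as well as the count.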
