Python iterate through multiple dataframes - python

I am trying to rename columns in multiple dataframes and convert those columns to an integer. This is the code I have:
def clean_col(df, col_name):
    df.reset_index(inplace=True)
    df.rename(columns={df.columns[0]: 'Date', df.columns[1]: col_name}, inplace=True)
    df[col_name] = df[col_name].apply(lambda x: int(x))
I have a dictionary of the dataframe names and the new name of the columns:
d = {
    all_df: "all",
    coal_df: "coal",
    liquids_df: "liquids",
    coke_df: "coke",
    natural_gas_df: "natural_gas",
    nuclear_df: "nuclear",
    hydro_electricity_df: "hydro",
    wind_df: "wind",
    utility_solar_df: "utility_solar",
    geothermal_df: "geo_thermal",
    wood_biomass_df: "biomass_wood",
    biomass_other_df: "biomass_other",
    other_df: "other",
    solar_all_df: "all_solar",
}
for i, (key, value) in enumerate(d.items()):
    clean_col(key, value)
And this is the error I am getting:
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
Any help would be appreciated

You are on the right track in using a dictionary to link your old and new column names. Loop over a list of your dataframes and, inside that loop, over the dictionary of new column names, renaming each column that is present:
import pandas as pd
# display() is available in Jupyter/IPython; use print() elsewhere
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [1, 2, 3], "D": [4, 5, 6], "F": [4, 5, 6]})
all_dfs = [df1, df2]
display(df1)
display(df2)
d = {
    "A": "aaaaa",
    "D": "ddddd",
}
for df in all_dfs:
    for col in d:
        if col in df.columns:
            df.rename(columns={col: d.get(col)}, inplace=True)
display(df1)
display(df2)
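The inner loop and the membership check above can likely be dropped, because DataFrame.rename ignores mapping keys that are not present in the dataframe by default. A minimal sketch of that simplification, reusing the same sample data:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [1, 2, 3], "D": [4, 5, 6], "F": [4, 5, 6]})

d = {"A": "aaaaa", "D": "ddddd"}

# rename() skips keys that are missing from a dataframe (errors="ignore" is the default),
# so the whole mapping can be passed to every dataframe
for df in [df1, df2]:
    df.rename(columns=d, inplace=True)

print(list(df1.columns))
print(list(df2.columns))
```

Passing errors="raise" to rename instead would surface any mapping key that a dataframe lacks.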

You can look the dataframes up by name using globals() (or locals()), so that the dictionary keys are plain strings:
import pandas as pd
import io
data1 = '''id,name
1,A
2,B
3,C
4,D
'''
data2 = '''id,name
1,W
2,X
3,Y
4,Z
'''
df1 = pd.read_csv(io.StringIO(data1))
df2 = pd.read_csv(io.StringIO(data2))
def clean_function(dfname, col_name):
    df = globals()[dfname]  # also see locals()
    df.rename(columns={df.columns[0]: 'NewID', df.columns[1]: col_name}, inplace=True)
    return df
mydict = {'df1': 'NewName', 'df2': 'AnotherName'}
for k, v in mydict.items():
    df = clean_function(k, v)
    print(df)
Output:
   NewID NewName
0      1       A
1      2       B
2      3       C
3      4       D
   NewID AnotherName
0      1           W
1      2           X
2      3           Y
3      4           Z

I just created two lists, one of the dataframes and one of the new column names, and iterated through both together with zip:
def clean_col(df, col_name):
    df.reset_index(inplace=True)
    df.rename(columns={df.columns[0]: 'Date', df.columns[1]: col_name}, inplace=True)
    df[col_name] = df[col_name].apply(lambda x: int(x))
list_df = [all_df, coal_df, liquids_df, coke_df, natural_gas_df, nuclear_df, hydro_electricity_df, wind_df, utility_solar_df, geothermal_df, wood_biomass_df, biomass_other_df, other_df, solar_all_df]
list_col = ['total', 'coal', 'liquids', 'coke', 'natural_gas', 'nuclear', 'hydro', 'wind', 'utility_solar', 'geo_thermal', 'biomass_wood', 'biomass_other', 'other', 'all_solar']
for a, b in zip(list_df, list_col):
    clean_col(a, b)
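For context, the TypeError in the question comes from using DataFrames as dictionary keys: keys must be hashable, and DataFrames are not. One way around it that keeps a single dictionary is to swap the roles, with the new column name as the key and the dataframe as the value. The sketch below assumes small stand-in dataframes (coal_df and wind_df here are made up for illustration) and returns cleaned copies rather than modifying in place:

```python
import pandas as pd

def clean_col(df, col_name):
    """Return a copy with the index moved to a 'Date' column and the value column renamed and cast to int."""
    out = df.reset_index()
    out = out.rename(columns={out.columns[0]: "Date", out.columns[1]: col_name})
    out[col_name] = out[col_name].astype(int)
    return out

# Hypothetical stand-ins for the question's dataframes
coal_df = pd.DataFrame({"value": [10.0, 20.0]}, index=["2020-01", "2020-02"])
wind_df = pd.DataFrame({"value": [3.0, 4.0]}, index=["2020-01", "2020-02"])

# Strings are hashable, so the new column names go in the key position
frames = {"coal": coal_df, "wind": wind_df}
cleaned = {name: clean_col(df, name) for name, df in frames.items()}

print(cleaned["coal"])
```

Keeping the results in a dictionary also means each cleaned dataframe can be retrieved by name afterwards, for example cleaned["wind"].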

Related

Match column to another column containing array

I have a very junior question in Python. I have a dataframe with a column containing some IDs, and a separate dataframe that contains 2 columns, one of which holds arrays:
df1 = pd.DataFrame({"some_id": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame([["A", [1, 2]], ["B", [3, 4]], ["C", [5]]], columns=['letter', 'some_ids'])
I want to add to df1 a new column "letter" that, for a given "some_id", looks up df2, checks whether this id is in df2['some_ids'], and returns df2['letter'].
I tried this:
df1['letter'] = df2[df1[some_id].isin(df2['some_ids')].letter
and get NaNs. Any suggestions on where I am making a mistake?
Create a dictionary by flattening the nested lists in a dict comprehension, then use Series.map:
d = {x: a for a,b in zip(df2['letter'], df2['some_ids']) for x in b}
df1['letter'] = df1['some_id'].map(d)
Or map with a Series created by DataFrame.explode and DataFrame.set_index:
df1['letter'] = df1['some_id'].map(df2.explode('some_ids').set_index('some_ids')['letter'])
Or use a left join after renaming the column:
df1 = df1.merge(df2.explode('some_ids').rename(columns={'some_ids':'some_id'}), how='left')
print(df1)
   some_id letter
0        1      A
1        2      A
2        3      B
3        4      B
4        5      C
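As a self-contained run of the dictionary approach, the sketch below uses the question's data plus one extra id (6, added here as an assumption) that appears in no list, to show that unmatched ids come out as NaN rather than raising:

```python
import pandas as pd

# Question's data, with a hypothetical extra id 6 that has no matching letter
df1 = pd.DataFrame({"some_id": [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame([["A", [1, 2]], ["B", [3, 4]], ["C", [5]]],
                   columns=["letter", "some_ids"])

# Flatten the lists into an id -> letter lookup table
lookup = {i: letter
          for letter, ids in zip(df2["letter"], df2["some_ids"])
          for i in ids}

# map() returns NaN for ids that are missing from the lookup
df1["letter"] = df1["some_id"].map(lookup)
print(df1)
```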

How can I use values from a Pandas data frame in a calculation?

I have a Pandas data frame. Column A contains ints like [1, 5, 3], and column B contains strings like ["abcdef", "ghijklmno", "qwertyuiop"].
I want to create a column C containing the first characters of column B, with the number of characters taken from column A. In my example, column C should be ["a", "ghijk", "qwe"].
I tried:
data_frame['C'] = data_frame.B.str[:data_frame["A"]]
but it doesn't work.
You can use a lambda function with axis=1, so that the value in column A sets the slice length applied to column B in the same row:
df = pd.DataFrame({
    'A': [1, 5, 3],
    'B': ["abcdef", "ghijklmno", "qwertyuiop"]
})
df['C'] = df.apply(lambda x: x['B'][:x['A']], axis=1)
df
Maybe try looping over the columns:
df['C'] = [b[:a] for a, b in zip(df['A'], df['B'])]
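As far as I know, the .str[...] slice applies one scalar bound to the whole column rather than a per-row Series, which is why the attempt in the question fails and both answers go row by row. A minimal sketch comparing the two approaches on the question's data (the column names C_apply and C_zip are made up for the comparison):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 5, 3],
    "B": ["abcdef", "ghijklmno", "qwertyuiop"],
})

# Row-wise apply: each row supplies its own slice length
df["C_apply"] = df.apply(lambda row: row["B"][:row["A"]], axis=1)

# List comprehension over the paired columns: same result without apply
df["C_zip"] = [b[:a] for a, b in zip(df["A"], df["B"])]

print(df["C_zip"].tolist())  # ['a', 'ghijk', 'qwe']
```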

Change column names of Pandas dataframes contained in a list

I have a list of Pandas dataframes:
df_list = [df1, df2, df3]
The dataframes have the same column names; let's call them "col1", "col2" and "col3".
How can I change the column names to "colnew1", "colnew2" and "colnew3", without using a loop?
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df.rename(columns={"A": "a", "B": "c"})
   a  c
0  1  4
1  2  5
2  3  6
This is taken right from the pandas website.
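The snippet above renames a single dataframe. To apply it to every dataframe in a list, one option is a list comprehension, which still iterates but avoids an explicit for statement. This is a sketch under the assumption that a comprehension counts as "without a loop" for the asker; the sample dataframes are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-ins for df1, df2, df3 from the question
df_list = [
    pd.DataFrame({"col1": [i], "col2": [i + 1], "col3": [i + 2]})
    for i in range(3)
]

mapping = {"col1": "colnew1", "col2": "colnew2", "col3": "colnew3"}

# rename() returns a new dataframe, so the originals are left untouched
renamed = [df.rename(columns=mapping) for df in df_list]

print([list(df.columns) for df in renamed])
```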

How to break/pop a nested Dictionary inside a list, inside a pandas dataframe?

I have a dataframe which has a dictionary inside a nested list, and I want to split the column 'C':
A  B  C
1  a  [{"id":2,"Col":{"x":3,"y":4}}]
2  b  [{"id":5,"Col":{"x":6,"y":7}}]
Expected output:
A  B  C_id  Col_x  Col_y
1  a  2     3      4
2  b  5     6      7
From the comments, json_normalize might help you.
After extracting the id and Col columns (listed in the same order as the dictionary keys) with:
df[["id", "Col"]] = df["C"].apply(lambda x: pd.Series(x[0]))
you can expand the dictionary in Col with pd.json_normalize and use concat to merge the result with the existing dataframe:
df = pd.concat([df, pd.json_normalize(df.Col)], axis=1)
Also, use drop to remove the old columns.
Full code:
# Import modules
import pandas as pd
# json_normalize is called as pd.json_normalize (pandas >= 1.0);
# the older "from pandas.io.json import json_normalize" import is deprecated
# from flatten_json import flatten
# Create dataframe
df = pd.DataFrame([[1, "a", [{"id": 2, "Col": {"x": 3, "y": 4}}]],
                   [2, "b", [{"id": 5, "Col": {"x": 6, "y": 7}}]]],
                  columns=["A", "B", "C"])
# Add "id" and "Col" columns + remove old "C" column
df = pd.concat([df, df["C"].apply(lambda x: pd.Series(x[0]))], axis=1) \
       .drop("C", axis=1)
print(df)
#    A  B  id               Col
# 0  1  a   2  {'x': 3, 'y': 4}
# 1  2  b   5  {'x': 6, 'y': 7}
# Show json_normalize behavior
print(pd.json_normalize(df.Col))
#    x  y
# 0  3  4
# 1  6  7
# Expand the dict in the "Col" column + remove the "Col" column
df = pd.concat([df, pd.json_normalize(df.Col)], axis=1) \
       .drop(["Col"], axis=1)
print(df)
#    A  B  id  x  y
# 0  1  a   2  3  4
# 1  2  b   5  6  7
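A shorter variant of the same idea is to hand the inner dictionaries to pd.json_normalize in one step and let its sep parameter build the Col_x / Col_y names from the question's expected output. This is a sketch that assumes pandas 1.0 or later and exactly one dictionary per list in column C:

```python
import pandas as pd

df = pd.DataFrame([[1, "a", [{"id": 2, "Col": {"x": 3, "y": 4}}]],
                   [2, "b", [{"id": 5, "Col": {"x": 6, "y": 7}}]]],
                  columns=["A", "B", "C"])

# Take the single dict out of each one-element list
records = [cell[0] for cell in df["C"]]

# Flatten nested keys with "_" as separator, then rename id to match the expected output
flat = pd.json_normalize(records, sep="_").rename(columns={"id": "C_id"})
flat.index = df.index  # keep rows aligned if df has a non-default index

result = df.drop(columns="C").join(flat)
print(result)
```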
You can try the .apply method:
df['C_id'] = df['C'].apply(lambda x: x[0]['id'])
df['C_x'] = df['C'].apply(lambda x: x[0]['Col']['x'])
df['C_y'] = df['C'].apply(lambda x: x[0]['Col']['y'])
Code
import pandas as pd
A = [1, 2]
B = ['a', 'b']
C = [{"id": 2, "Col": {"x": 3, "y": 4}}, {"id": 5, "Col": {"x": 6, "y": 7}}]
df = pd.DataFrame({"A": A, "B": B, "C_id": [element["id"] for element in C],
                   "Col_x": [element["Col"]["x"] for element in C],
                   "Col_y": [element["Col"]["y"] for element in C]})
Output:

Build a dataframe from a dict with specified labels from a txt

I want to make a dataframe with defined labels, but I don't know how to tell pandas to take the labels from the list. I hope someone can help.
import numpy as np
import pandas as pd
df = []
thislist = []
thislist = ["A","D"]
thisdict = {
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [7, 8, 9]
}
df = pd.DataFrame(data= thisdict[thislist]) # <- here is my problem
I want to get this:
df = A D
     1 7
     2 8
     3 9
Use:
df = pd.DataFrame(thisdict)[thislist]
print(df)
   A  D
0  1  7
1  2  8
2  3  9
We could also use DataFrame.drop
df = pd.DataFrame(thisdict).drop(columns = ['B','C'])
or DataFrame.reindex
df = pd.DataFrame(thisdict).reindex(columns = thislist)
or DataFrame.filter
df = pd.DataFrame(thisdict).filter(items=thislist)
We can also use filter to filter thisdict.items()
df = pd.DataFrame(dict(filter(lambda item: item[0] in thislist, thisdict.items())))
print(df)
A D
0 1 7
1 2 8
2 3 9
I think this answer is completed by the solution from @anky_91.
Finally, I recommend reading up on how to index.
IIUC, use .loc[] with the dataframe constructor:
df = pd.DataFrame(thisdict).loc[:,thislist]
print(df)
A D
0 1 7
1 2 8
2 3 9
Use a dict comprehension to create a new dictionary that is a subset of your original so you only construct the DataFrame you care about.
pd.DataFrame({x: thisdict[x] for x in thislist})
A D
0 1 7
1 2 8
2 3 9
If you want to deal with the possibility of missing Keys, add some logic so it's similar to reindex
pd.DataFrame({x: thisdict[x] if x in thisdict else np.nan for x in thislist})
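For comparison, the reindex route mentioned earlier handles missing keys without any extra logic. In this sketch the label "Z" is a hypothetical key that is absent from thisdict:

```python
import pandas as pd

thisdict = {
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [7, 8, 9],
}
wanted = ["A", "D", "Z"]  # "Z" is not a key of thisdict

# reindex keeps the requested order and fills unknown columns with NaN
df = pd.DataFrame(thisdict).reindex(columns=wanted)
print(df)
```

Plain indexing with the same list, pd.DataFrame(thisdict)[wanted], raises a KeyError for the missing label instead.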
df = pd.DataFrame(thisdict)
df[['A', 'D']]
Another alternative for your input:
thislist = ["A","D"]
thisdict = {
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [7, 8, 9]
}
df = pd.DataFrame(thisdict)
and then simply collect the columns that are not in thislist (you can drop them directly from the df or gather them first):
remove_columns = []
for c in df.columns:
    if c not in thislist:
        remove_columns.append(c)
and remove them:
df.drop(columns=remove_columns, inplace=True)
