Apply function to multiple dataframes and create multiple dataframes from that - python

I have a list of multiple data frames on cryptocurrency. So I want to apply a function to all of these data frames, which should convert all the data frames so that I am only left with data from 2021.
The function looks like this:
dataframe_list = [bitcoin, aave, binance, Cardano, chainlink, cosmos, cryptocom, dogecoin, eos, Ethereum, iota, litecoin, monero, nem, Polkadot, Solana, stellar, tether, uniswap, usdcoin, wrapped, xrp]
def date_func(i):
    i['Date'] = pd.to_datetime(i['Date'])
    i = i.set_index(i['Date'])
    i = i.sort_index()
    i = i['2021-01-01':]
    return i
for dataframe in dataframe_list:
    dataframe = date_func(dataframe)
However, I am only left with one data frame called 'dataframe', which only contains values of the xrp dataframe.
I would like to have a new dataframe from each dataframe, called aave21, bitcoin21 .... which only contains values from 2021 onwards.
What am I doing wrong?
Best regards and thanks in advance.

You are overwriting dataframe on every iteration over dataframe_list, so you only keep the last dataframe.
You can either try:
dataframe = pd.DataFrame()
for df in dataframe_list:
    dataframe = pd.concat([dataframe, date_func(df)])
Or shorter:
dataframe = pd.concat([date_func(df) for df in dataframe_list])

You are overwriting the dataframe variable in your for loop when iterating over dataframe_list. You need to accumulate the results in a new variable.
final_df = pd.DataFrame()
for dataframe in dataframe_list:
    final_df = pd.concat([final_df, date_func(dataframe)])
print(final_df)
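If the goal is one 2021-only dataframe per coin (aave21, bitcoin21, ...) rather than a single combined one, a dictionary keyed by name may be a better fit than separate variables. A minimal sketch, assuming two made-up frames in place of the real data; the names frames and frames_2021 are hypothetical, and this date_func is a slightly tidied variant that copies its input and drops the Date column when setting the index:

```python
import pandas as pd

def date_func(df):
    """Return a copy of df indexed by date, keeping rows from 2021 onwards."""
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    out = out.set_index('Date').sort_index()
    return out.loc['2021-01-01':]

# Hypothetical stand-ins for the real price dataframes.
bitcoin = pd.DataFrame({'Date': ['2020-12-31', '2021-01-02'],
                        'Close': [29000.0, 32000.0]})
aave = pd.DataFrame({'Date': ['2021-03-01', '2020-06-01'],
                     'Close': [380.0, 50.0]})

# Map each name to its dataframe, then build a second dict of filtered copies.
frames = {'bitcoin': bitcoin, 'aave': aave}
frames_2021 = {name + '21': date_func(df) for name, df in frames.items()}

print(frames_2021['aave21'])
```

Each result is then reachable as frames_2021['bitcoin21'], frames_2021['aave21'] and so on, and the original dataframes are left untouched.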

Related

What is the best way to combine dataframes that have been created through a for loop?

I am trying to combine dataframes with 2 columns into a single dataframe. The initial dataframes are generated through a for loop and stored in a list. I am having trouble getting the data from the list of dataframes into a single dataframe. Right now when I run my code, it treats each full dataframe as a row.
def linear_reg_function(category):
    df = pd.read_csv(file)
    df = df[df['category_column'] == category]
    df1 = df[['category_column', 'value_column']]
    df_export.append(df1)
df_export = []
for category in category_list:
    linear_reg_function(category)
When I run this block of code, I get a list of dataframes that each have 2 columns. When I try to convert df_export to a dataframe, it ends up with 12 rows (the number of categories in category_list). I tried:
df_export = pd.DataFrame()
but the result was:
_
I would like to have a single dataframe with 2 columns, [Category, Value] that includes the values of all 12 categories generated in the for loop.
You can use pd.concat to merge a list of DataFrames into a single big DataFrame.
import glob
import pandas as pd
appended_data = []
for infile in glob.glob("*.xlsx"):
    data = pd.read_excel(infile)
    # store DataFrame in list
    appended_data.append(data)
# see pd.concat documentation for more info
appended_data = pd.concat(appended_data)
# write DataFrame to an excel sheet
appended_data.to_excel('appended.xlsx')
You can adapt this to your needs.
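Applied to the category loop from the question, the same idea might look like the sketch below. The data is made up, and select_category, source and pieces are hypothetical names; the function returns its result instead of appending to a global list.

```python
import pandas as pd

# Hypothetical data standing in for the CSV file from the question.
source = pd.DataFrame({
    'category_column': ['a', 'b', 'a', 'c'],
    'value_column': [1, 2, 3, 4],
    'other_column': ['w', 'x', 'y', 'z'],
})
category_list = ['a', 'b', 'c']

def select_category(df, category):
    """Return the two columns of interest for a single category."""
    subset = df[df['category_column'] == category]
    return subset[['category_column', 'value_column']]

# Collect one small dataframe per category, then stack them row-wise.
pieces = [select_category(source, category) for category in category_list]
df_export = pd.concat(pieces, ignore_index=True)

print(df_export)
```

Passing ignore_index=True gives the combined dataframe a fresh 0..n-1 index instead of repeating the row labels of the pieces.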

Summary Row for a pd.DataFrame with multiindex

I have a multiIndex dataframe created with pandas similar to this one:
nest = {'A1': dfx[['aa', 'bb', 'cc']],
        'B1': dfx[['dd']],
        'C1': dfx[['ee', 'ff']]}
reform = {(outerKey, innerKey): values
          for outerKey, innerDict in nest.items()
          for innerKey, values in innerDict.items()}
dfzx = pd.DataFrame(reform)
What I am trying to achieve is to add a new row at the end of the dataframe that contains a summary of the total for the three categories represented by the new index (A1, B1, C1).
I have tried with df.loc (what I would normally use in this case), but I get an error. The same happens with iloc.
a1sum = dfzx['A1'].sum().to_list()
a1sum = sum(a1sum)
b1sum = dfzx['B1'].sum().to_list()
b1sum = sum(b1sum)
c1sum = dfzx['C1'].sum().to_list()
c1sum = sum(c1sum)
totalcat = a1sum, b1sum, c1sum
newrow = ['Total', totalcat]
newrow
dfzx.loc[len(dfzx)] = newrow
ValueError: cannot set a row with mismatched columns
#Alternatively
newrow2 = ['Total', a1sum, b1sum, c1sum]
newrow2
dfzx.loc[len(dfzx)] = newrow2
ValueError: cannot set a row with mismatched columns
How can I fix the mistake? Or else is there any other function that would allow me to proceed?
Note: the DF is destined to be moved on an Excel file (I use ExcelWriter).
The type of result I want to achieve in the end is this one (gray row "SUM").
I came up with a sort of solution on my own.
I created a separate DataFrame in Pandas that contains the summary.
I used ExcelWriter to have both dataframes on the same excel worksheet.
Technically, it would then be possible to style and format the data in Excel (XlsxWriter and StyleFrame seem to be popular modules for that). Alternatively, one could do that manually.
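For reference, the totals can also be computed directly in pandas. The sketch below assumes a small made-up dfx; column_totals, with_total and category_totals are hypothetical names. The row assignment in the question fails because the new row must supply one value per column (six here), not one per outer category.

```python
import pandas as pd

# Made-up input with the six columns used in the question.
dfx = pd.DataFrame({'aa': [1, 2], 'bb': [3, 4], 'cc': [5, 6],
                    'dd': [7, 8], 'ee': [9, 10], 'ff': [11, 12]})
nest = {'A1': dfx[['aa', 'bb', 'cc']],
        'B1': dfx[['dd']],
        'C1': dfx[['ee', 'ff']]}
reform = {(outer, inner): values
          for outer, inner_df in nest.items()
          for inner, values in inner_df.items()}
dfzx = pd.DataFrame(reform)

# One total per column, indexed by the same (outer, inner) MultiIndex.
column_totals = dfzx.sum()

# Append the totals as an extra row labelled 'Total'.
with_total = pd.concat([dfzx, column_totals.to_frame('Total').T])

# Collapse the column totals to one value per outer key (A1, B1, C1).
category_totals = column_totals.groupby(level=0).sum()

print(with_total)
print(category_totals)
```

category_totals corresponds to the separate summary described above; it could be written to the same worksheet as the main dataframe.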

Create multiple dataframes by changing one column with a for loop?

I am calculating heat decay from spent fuel rods using variable cooling times. How can I create multiple dataframes, by varying the cooling time column with a for loop, then write them to a file?
Using datetime objects, I am creating multiple columns of cooling time values by subtracting the date the fuel rod was discharged from a future date.
I then tried to use a for loop to index these columns into a new dataframe, with the intent to streamline multiple files by using newly created dataframes in a new function.
df = pd.read_excel('data')
df.columns = ['ID','Enr','Dis','Mtu']
# Discharge Dates
_0 = dt.datetime(2020,12,1)
_1 = dt.datetime(2021,6,1)
_2 = dt.datetime(2021,12,1)
_3 = dt.datetime(2022,6,1)
# Variable Cooling Time Columns
df['Ct_0[Years]'] = df['Dis'].apply(lambda x: (((_0 - x).days)/365))
df['Ct_1[Years]'] = df['Dis'].apply(lambda x: (((_1 - x).days)/365))
df['Ct_2[Years]'] = df['Dis'].apply(lambda x: (((_2 - x).days)/365))
df['Ct_3[Years]'] = df['Dis'].apply(lambda x: (((_3 - x).days)/365))
# Attempting to index columns into new data frame
for i in range(4):
    df = df[['ID','Mtu','Enr','Ct_%i[Years]'%i]]
    tfile = open('Inventory_FA_%s.prn'%i,'w')
    ### Apply conditions for flagging
    tfile.close()
I was expecting the created cooling time columns to be indexed into the newly defined dataframe df. Instead I received the following error:
KeyError: "['Ct_1[Years]'] not in index"
Thank you for the help.
You are overwriting your dataframe in each iteration of your loop with the line:
df = df[['ID','Mtu','Enr','Ct_%i[Years]'%i]]
which is why you are fine on your first iteration (error doesn't say anything about 'Ct_0[Years]' not being in the index), and then die on your second iteration. You've dropped everything but the columns you selected in your first iteration. Select your columns into a temporary df instead:
for i in range(4):
    df_temp = df[['ID','Mtu','Enr','Ct_%i[Years]'%i]]
    tfile = open('Inventory_FA_%s.prn'%i,'w')
    ### Apply conditions for flagging using df_temp
    tfile.close()
Depending on what your conditions are, there might be a better way to do this that doesn't require making a temporary view into the dataframe, but this should help.
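One such alternative is to skip the Ct_0 ... Ct_3 columns and build a fresh dataframe per reference date inside the loop. This is only a sketch under assumptions: the inventory data is made up, and reference_dates, frames and the column name 'Ct[Years]' are hypothetical.

```python
import datetime as dt
import pandas as pd

# Hypothetical inventory standing in for the Excel file in the question.
df = pd.DataFrame({
    'ID': ['FA1', 'FA2'],
    'Enr': [3.2, 4.1],
    'Dis': pd.to_datetime(['2010-12-01', '2015-06-01']),
    'Mtu': [0.45, 0.46],
})

reference_dates = [dt.datetime(2020, 12, 1), dt.datetime(2021, 6, 1),
                   dt.datetime(2021, 12, 1), dt.datetime(2022, 6, 1)]

frames = {}
for i, ref in enumerate(reference_dates):
    # Vectorised cooling time in years for this reference date.
    cooling_years = (pd.Timestamp(ref) - df['Dis']).dt.days / 365
    # assign returns a new dataframe, so df itself is never overwritten.
    frames[i] = df[['ID', 'Mtu', 'Enr']].assign(**{'Ct[Years]': cooling_years})

print(frames[0])
```

Each frames[i] can then be flagged and written to its own file inside the same loop, for example with DataFrame.to_csv.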
Why are you creating a new dataframe? Is it only to reorganize or drop columns? Engineero is right: you are effectively overwriting df on each iteration.
Anyway, you could try:
dfnew = df[['ID','Mtu','Enr']]
for i in range(4):
    dftemp = df[['Ct_%i[Years]'%i]]
    dfnew = dfnew.join(dftemp)

How to loop over multiple Pandas DataFrames using the filter function?

So I have multiple data frames that I am attempting to loop over.
I have created a list using the following code:
data_list = [df1, df2, df3]
After that I would like to filter out a predefined range of numbers in the column 'Firm_Code' in each data frame.
So far, I am able to filter out firms with a respective code between 6000 and 6999 for a single data frame as follows:
FFirms = range(6000,7000)
Non_FFirms = [b for b in df1['Firm_Code'] if b not in FFirms]
df1 = df1.loc[df1['Firm_Code'].isin(Non_FFirms)]
Now I would like to loop over the data_list. My first try looks like the following:
for i in data_list:
    i = i.loc[i.Firm_Code.isin(Non_FFirms)]
Appreciate any suggestions!
Instead of making a list of dataframes, you can concat all the data frames into a single dataframe.
data_df = pd.concat([df1,df2,df3],ignore_index=True)
If you need to know which dataframe each row came from, you can add a new column, say 'Df_number'.
Using data_df, you can filter the data:
FFirms = range(6000,7000)
Non_FFirms = [b for b in data_df['Firm_Code'] if b not in FFirms]
filtered_data_df = data_df.loc[data_df['Firm_Code'].isin(Non_FFirms)]
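If the dataframes should stay separate, the loop in the question has no effect because assigning to the loop variable i only rebinds that name; the list still holds the original objects. A minimal sketch that rebuilds the list instead, assuming made-up Firm_Code values; drop_code_range is a hypothetical helper:

```python
import pandas as pd

# Made-up frames standing in for df1, df2 and df3.
df1 = pd.DataFrame({'Firm_Code': [1000, 6000, 6999, 7000]})
df2 = pd.DataFrame({'Firm_Code': [6500, 2000]})
df3 = pd.DataFrame({'Firm_Code': [5999, 6001]})
data_list = [df1, df2, df3]

def drop_code_range(df, low=6000, high=6999):
    """Return the rows whose Firm_Code lies outside [low, high]."""
    in_range = df['Firm_Code'].between(low, high)  # inclusive on both ends
    return df.loc[~in_range]

# Build a new list of filtered frames; the names df1..df3 still point
# at the unfiltered originals.
data_list = [drop_code_range(df) for df in data_list]

for df in data_list:
    print(df)
```

The boolean mask avoids building the intermediate Non_FFirms list for each dataframe.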

Adding columns to a dataframe from another dataframe shows NaN instead of values

I have two dictionaries of pandas data frames. I am looping through the dictionaries and adding columns from one data frame to the other. Finally, I concatenate all the updated data frames into a single data frame, but the columns which were added or updated show NaN instead of values. Here is how I am doing it. Thanks
old_id_header = list(l[2][ID].columns)
new_id_header = list(new_dfs_dict[ID].columns)
for i in range(2, len(new_id_header)):
    new_dfs_dict[ID][new_id_header[i]] = new_dfs_dict[ID][new_id_header[i]].astype(float)
    l[2][ID][new_id_header[i]] = new_dfs_dict[ID][new_id_header[i]]
Then I am concatenating this way.
ready_to_set_df = pd.DataFrame()
for ID in l[2]:
    ready_to_set_df = pd.concat([ready_to_set_df, l[2][ID]], sort=False)
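The question does not show the indexes of the two sets of data frames, so the following is only a guess at the cause: when a Series is assigned as a column, pandas aligns it on index labels, and labels that do not match produce NaN. A small sketch with made-up frames (old, new, aligned and positional are hypothetical names) shows the effect and one possible workaround:

```python
import pandas as pd

# Two frames with the same number of rows but different index labels.
old = pd.DataFrame({'id': [1, 2, 3]}, index=[0, 1, 2])
new = pd.DataFrame({'score': [0.1, 0.2, 0.3]}, index=[10, 11, 12])

# Assigning a Series aligns on index labels; none match, so every value is NaN.
aligned = old.copy()
aligned['score'] = new['score']

# Assigning the underlying array ignores the index and copies by position.
positional = old.copy()
positional['score'] = new['score'].to_numpy()

print(aligned)
print(positional)
```

Copying by position is only safe if both frames have the same number of rows in the same order; otherwise giving both frames a shared index before assigning is the safer route.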
