No changes to original dataframe after applying loop - python

I have a list of dataframes such that
df_lst = [df1, df2]
I also created a function that removes the rows where column 'x' is 0:
def dropzeros(df):
    newdf = df[df['x'] != 0.0]
    return newdf
I tried applying this in a loop, assigning the result back to df inside the loop, but the original dataframe remained unchanged even after running the loop.
for df in df_lst:
    df = dropzeros(df)
I also tried using a list comprehension:
df_lst = [dropzeros(df) for df in df_lst]
I know the function works, since when I print len(df) before and after calling dropzeros(df), the length drops. How can I make my original dataframes change after running the loop?

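That's because assigning to df inside your for loop only rebinds the loop variable. On each iteration df starts out referring to an element of the list, but df = dropzeros(df) points the name df at the new, filtered dataframe and leaves the list element untouched. Your list comprehension does produce a new df_lst of filtered dataframes, but df1 and df2 still refer to the originals, because boolean indexing returns a new dataframe instead of changing the old one.
You can assign back by index with enumerate and pipe your function: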
for idx, df in enumerate(df_lst):
    df_lst[idx] = df.pipe(dropzeros)
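A minimal sketch with made-up data (only the column name 'x' comes from the question) showing the enumerate approach end to end. It also shows that the list is updated while df1 still refers to the unfiltered frame until you rebind the name, for example by unpacking the list.

```python
import pandas as pd

def dropzeros(df):
    # Keep only the rows whose 'x' value is not zero (returns a new dataframe)
    return df[df['x'] != 0.0]

# Toy data: the values are invented for illustration
df1 = pd.DataFrame({'x': [0.0, 1.0, 2.0]})
df2 = pd.DataFrame({'x': [3.0, 0.0, 0.0]})
df_lst = [df1, df2]

# Replace each list element by index instead of rebinding the loop variable
for idx, df in enumerate(df_lst):
    df_lst[idx] = df.pipe(dropzeros)

print([len(df) for df in df_lst])  # [2, 1]
print(len(df1))                    # 3: the name df1 still points at the original frame

# If you want df1 and df2 to refer to the filtered frames, rebind the names
df1, df2 = df_lst
```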

Related

Yield multiple empty dataframes by a function in Python

I would like to yield multiple empty dataframes by a function in Python.
import pandas as pd
df_list = []
def create_multiple_df(num):
    for i in range(num):
        df = pd.DataFrame()
        df_name = "df_" + str(num)
        exec(df_name + " = df ")
        df_list.append(eval(df_name))
    for i in df_list:
        yield i
e.g. when I call create_multiple_df(3), I would like to get df_1, df_2 and df_3 back.
However, it didn't work.
I have two questions:
How to store multiple dataframes in a list (i.e. without evaluating the contents of the dataframes)?
How to yield multiple variable elements from a list?
Thanks!
It's very likely that you do not actually want separate variables df_1, df_2, df_3, etc. Beginners often pursue this design, but trust me: a dictionary or simply a list will do the trick without the need to hold lots of different variables.
Here, it sounds like you simply want a list comprehension:
dfs = [pd.DataFrame() for _ in range(n)]
This creates n empty dataframes and stores them in a list. To retrieve or modify one, you simply access its position. So instead of keeping a dataframe in a variable df_1, you keep it in the list dfs and use dfs[0] to get or edit it (lists are zero-indexed, so df_2 becomes dfs[1], and so on).
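A short sketch of the list approach, including editing an element by position. The generator create_multiple_df below is a hypothetical rewrite of the question's function (it is not part of this answer) that uses yield directly instead of exec/eval, which speaks to the second question; the column 'a' is made up.

```python
import pandas as pd

n = 3
# n separate empty dataframes; dfs[0] plays the role of df_1
dfs = [pd.DataFrame() for _ in range(n)]

# Edit by position: replace the first dataframe with one that holds data
dfs[0] = pd.DataFrame({'a': [1, 2]})


def create_multiple_df(num):
    # Hypothetical generator version: yields num new empty dataframes one at a time
    for _ in range(num):
        yield pd.DataFrame()


# Unpacking the generator gives separate names without exec/eval
df_1, df_2, df_3 = create_multiple_df(3)
```

Unpacking only works when you know the count in advance; otherwise keep the results in a list, e.g. list(create_multiple_df(num)).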
Another option is a dictionary comprehension:
dfs = {i: pd.DataFrame() for i in range(n)}
It works in a similar fashion: you access the entries with dfs[0] or dfs[1]. You can even use real names as keys, e.g. {genre: pd.DataFrame() for genre in ['romance', 'action', 'thriller']}; then dfs['romance'] or dfs['thriller'] retrieves the corresponding df.
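A small sketch of the named-key dictionary, assuming the genre names from the answer; the 'title' column and movie titles are invented placeholders.

```python
import pandas as pd

genres = ['romance', 'action', 'thriller']
# Key each empty dataframe by a meaningful name instead of df_1, df_2, ...
dfs = {genre: pd.DataFrame() for genre in genres}

# Replace one entry with data (placeholder titles)
dfs['romance'] = pd.DataFrame({'title': ['Movie A', 'Movie B']})

# Iterate over names and dataframes together
for genre, df in dfs.items():
    print(genre, len(df))
```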

Python - Function that have a dataframe to store the results from others functions

I'm trying to write a function, maybe a very simple one, that stores the results from other functions and prints all the results at the end (like a logger function).
For that I have the following code:
import pandas as pd
def append_rows(id, result):
    df = pd.DataFrame([])
    df = df.append(pd.DataFrame(
        {'id': id,
         'result': result}, index=[0]), ignore_index=False, sort=False)
    return df
def calculator_1():
    for i in range(5):
        print(append_rows(i, 'Draft' + str(i + 1)))
def calculator_2():
    for i in range(2):
        print(append_rows(i, 'Draft1'))
print(append_rows('', ''))
My expected result is:
1,Draft2
2,Draft3
3,Draft4
4,Draft5
5,Draft6
1,Draft1
2,Draft1
But the actual result is:
"",""
My requirement is to have a single function that stores the results from the other functions, instead of having a separate dataframe for each function and concatenating all of them at the end.
Does anyone know how I can do that?
Thanks!
With append_rows as written, you create a new, empty dataframe on every call, so nothing accumulates between calls. It's not entirely clear what you want to achieve, but I imagine you want to add new rows to a single dataframe on each iteration?
In that case I would recommend the following steps:
1. create a dataframe outside of the function
2. create a list_of_lists outside of the function
3. add each newly created list from your loop to the list of lists
4. append the list of lists to the dataframe after the loop
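A minimal sketch of these steps, with one variation: instead of appending to an existing dataframe, it builds the dataframe directly from the list of lists once all loops have run. The names rows, id_ and calculator_2 are assumptions, and the loop ranges were chosen to reproduce the expected output from the question.

```python
import pandas as pd

# Module-level list of lists that every calculator writes into
rows = []

def append_rows(id_, result):
    # Record one result as an [id, result] list; no dataframe is built here
    rows.append([id_, result])

def calculator_1():
    for i in range(1, 6):
        append_rows(i, 'Draft' + str(i + 1))

def calculator_2():
    for i in range(1, 3):
        append_rows(i, 'Draft1')

calculator_1()
calculator_2()

# Build the dataframe once, after all the loops have run
df = pd.DataFrame(rows, columns=['id', 'result'])
print(df.to_csv(index=False, header=False))
```

Appending to a plain list is cheap, while growing a dataframe row by row copies it each time, which is why the dataframe is created only at the end.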
If you are simply interested in creating a log of your iterations, then I don't see why you would need a dataframe at all; you can simply print the values in the loop.

How to cycle through a list of pandas dataframes

I'm trying to work out the correct method for cycling through a number of pandas dataframes using a 'for loop'. All of them contain 'year' columns from 1960 to 2016, and from each df I want to remove the columns '1960' to '1995'.
I created a list of dfs and also a list of str values for the years.
dflist = [apass,rtrack,gdp,pop]
dfnewlist =[]
for i in range(1960, 1996):
    dfnewlist.append(str(i))
for df in dflist:
    df = df.drop(dfnewlist, axis = 1)
My for loop runs without error, but it does not remove the columns.
Edit - Just to add, when I do this manually without the for loop, such as below, it works fine:
gdp = gdp.drop(dfnewlist, axis = 1)
This is a common issue with for loops. When you write
for df in dflist:
and then reassign df, the change does not happen to the actual object in the list, only to the name df.
Use enumerate to fix it:
for i,df in enumerate(dflist):
    dflist[i] = df.drop(dfnewlist, axis=1)
For some robustness, you can also pass errors='ignore', so that drop won't raise an error if one of the columns doesn't exist.
However, your real problem is that when you loop, df starts by referring to the item in the list. But then you overwrite the name df by assigning to it the result of df.drop(dfnewlist, axis=1). This does not replace the dataframe in your list as you'd hoped; it just rebinds the name df to a new dataframe that is no longer the item in the list.
Instead, you can use the inplace=True flag, which modifies each dataframe object itself (so apass, rtrack, gdp and pop change too):
drop_these = [*map(str, range(1960, 1996))]
for df in dflist:
    df.drop(drop_these, axis=1, errors='ignore', inplace=True)
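A runnable sketch of the inplace approach with toy frames; the zero-filled values are made up, and pop deliberately lacks the '1960' column to show what errors='ignore' is for.

```python
import pandas as pd

# Toy frames with string year columns 1960..2016, as in the question
years = [str(y) for y in range(1960, 2017)]
gdp = pd.DataFrame(0, index=[0], columns=years)
# pop is missing '1960'; without errors='ignore' drop would raise a KeyError
pop = pd.DataFrame(0, index=[0], columns=years[1:])

dflist = [gdp, pop]
drop_these = [*map(str, range(1960, 1996))]

for df in dflist:
    # inplace=True mutates the object df refers to, i.e. gdp and pop themselves;
    # errors='ignore' skips labels that are not present
    df.drop(drop_these, axis=1, errors='ignore', inplace=True)

print(list(gdp.columns[:3]))  # ['1996', '1997', '1998']
print(len(pop.columns))       # 21
```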

append to empty dataframe iteratively

I have a for loop. At each iteration a dataframe is created. I want this dataframe to be appended to an overall result dataframe.
Currently I tried to do it with this code:
resultDf = pd.DataFrame()
for name in list:
    iterationresult = calculatesomething(name)
    resultDf.append(iterationresult)
print(resultDf)
However, resultDf is still empty.
How can this be done?
UPDATE
I think changing
resultDf.append(iterationresult)
to
resultDf = resultDf.append(iterationresult)
does the trick
Not iterative, but how about simply:
df = pd.concat([calculatesomething(name) for name in list])
This is much more straightforward, and faster as well.
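A sketch of collecting the per-iteration dataframes and concatenating them once. calculatesomething here is a hypothetical stand-in that returns a one-row dataframe, and the names are made up. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is also the way to go on current versions.

```python
import pandas as pd

def calculatesomething(name):
    # Hypothetical stand-in: returns a one-row dataframe per name
    return pd.DataFrame({'name': [name], 'length': [len(name)]})

names = ['alice', 'bob', 'carol']  # avoids shadowing the built-in `list`

# Build all the pieces first, then concatenate once;
# ignore_index=True gives a fresh 0..n-1 index instead of repeated zeros
resultDf = pd.concat([calculatesomething(name) for name in names], ignore_index=True)
print(resultDf)
```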
Another idiomatic idea could be to do this:
df = pd.DataFrame(list, columns = ["name"])
df["calc"] = df.name.map(calculatesomething)
By the way, it's bad practice to name a list list, because that shadows the built-in type.

List Comprehension Over Pandas Dataframe Rows

I can't understand why this snippet of code:
df = PA.DataFrame()
[df.append(aFunction(x)) for x in aPandaSeries]
does not give me the same DataFrame (df) as:
df = PA.DataFrame()
for x in xrange(len(aPandaSeries)):
    df = df.append(aFunction(aPandaSeries[x]))
I am trying to pythonise the second section by using the first section, but df has far fewer rows in the former than the latter.
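A couple of things...
DataFrame.append() does not modify df in place; it returns a new DataFrame. In your list comprehension those returned DataFrames are collected into a list that is then thrown away, so df itself never changes and stays empty. The loop version works because df = df.append(...) rebinds df to the new, larger DataFrame on every iteration. (You may be thinking of list.append(), which does modify the list in place and returns None.)
List comprehensions are useful for building a new list from a sequence of values, so you generally wouldn't call a method only for its side effect inside one. If you want a loop, iterate over the series directly instead of over indices:
for x in aPandaSeries:
    df = df.append(aFunction(x))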
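A small sketch of why the comprehension leaves df unchanged. It uses pd.concat because DataFrame.append no longer exists in pandas 2.x; like the old append, concat returns a new object. aFunction is a hypothetical stand-in that turns one value into a one-row dataframe.

```python
import pandas as pd

def aFunction(x):
    # Hypothetical stand-in: one value -> one-row dataframe
    return pd.DataFrame({'value': [x]})

aPandaSeries = pd.Series([1, 2, 3])
df = aFunction(0)  # start with one row

# Each pd.concat call returns a new dataframe; the comprehension collects
# those results into a list that is immediately discarded
[pd.concat([df, aFunction(x)]) for x in aPandaSeries]
print(len(df))  # 1: df was never rebound

# Rebind df instead, concatenating once to avoid repeated copying
df = pd.concat([df] + [aFunction(x) for x in aPandaSeries], ignore_index=True)
print(df['value'].tolist())  # [0, 1, 2, 3]
```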
