Copy Pandas DataFrame using '=' trick [duplicate] - python

This question already has an answer here:
pandas dataframe, copy by value
(1 answer)
Closed 5 years ago.
I have a pandas DataFrame, sdm. I want to create a copy of that DataFrame and work on it, and later create another copy from sdm for a different analysis. However, when I create a new DataFrame like this,
new_df = sdm
It appears to create a copy; however, when I alter new_df, the changes also show up in my old DataFrame sdm. How can I handle this without using =?

Assignment in Python does not copy the object; it binds a second name to the same object. Try this:
new_df = sdm.copy()
I think you should have searched more; I am sure there are lots of questions on this topic!

You need to use new_df = sdm.copy() instead, which is described here in the official documentation. new_df = sdm doesn't work because this assignment copies the reference, not the value, which means, in a nutshell, that both new_df and sdm refer to the same data in memory.
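The difference between the two statements can be seen in a minimal sketch. The column name and values below are made up for illustration; only sdm and the .copy() call come from the answers above.

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame.
sdm = pd.DataFrame({"a": [1, 2, 3]})

alias = sdm                # a second name for the same object
independent = sdm.copy()   # a separate object with its own data

alias.loc[0, "a"] = 100        # also visible through sdm
independent.loc[1, "a"] = -1   # sdm is not affected

print(alias is sdm)        # True
print(independent is sdm)  # False
print(sdm["a"].tolist())   # [100, 2, 3]
```

Because sdm.copy() can be called any number of times, a fresh working copy can be taken from sdm for each analysis.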

Related

Python Pandas .str.split() creates an extra column that can't be dropped [duplicate]

This question already has answers here:
How to avoid pandas creating an index in a saved csv
(6 answers)
Closed 5 months ago.
I'm using the pandas split function to create new columns from an existing one. All of that works fine and I get my expected columns created. The issue is that it is creating an additional column in the exported csv file. So, it says there are 3 columns when there are actually 4.
I've tried various functions to drop that column, but it isn't recognized as part of the data frame so it can't be successfully removed.
Hopefully someone has had this issue and can offer a possible solution.
[example of the csv data frame output with the unnecessary column added]
Column A doesn't come from split; it is the index of your DataFrame, which to_csv writes by default. You can change that by setting index=False in df.to_csv:
df.to_csv('{PATH}.csv', index=False)
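A small sketch of the effect, assuming a made-up full_name column that is split into two new columns. Calling to_csv without a path returns the CSV text as a string, which makes the header line easy to compare.

```python
import pandas as pd

# Hypothetical data: one column that gets split into two new ones.
df = pd.DataFrame({"full_name": ["Ada Lovelace", "Alan Turing"]})
df[["first", "last"]] = df["full_name"].str.split(" ", expand=True)

with_index = df.to_csv()                # index written as an unnamed first column
without_index = df.to_csv(index=False)  # only the real columns

print(with_index.splitlines()[0])     # ,full_name,first,last
print(without_index.splitlines()[0])  # full_name,first,last
```

The leading comma in the first header is the unnamed index column. It is not one of the DataFrame's columns, which is why it cannot be dropped with column operations.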

How to transpose a dataframe in pandas [duplicate]

This question already has answers here:
How do I melt a pandas dataframe?
(3 answers)
Closed 7 months ago.
I have a table in csv that looks like this:
I want to transpose it to look like this, where the columns are now rows of a new column called ACCOUNTLABEL, and the values are in a corresponding column called VALUE:
Any help? Thanks!
You might want to look at the pandas.melt function: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
I wouldn't call that a 'transposition' but rather 'un-pivoting' a table.
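Since the original table is not shown, here is a rough sketch with an invented wide table. The ID, CASH and DEBT columns are assumptions; only the ACCOUNTLABEL and VALUE names come from the question.

```python
import pandas as pd

# Hypothetical wide table: one row per ID, one column per account.
wide = pd.DataFrame({
    "ID": [1, 2],
    "CASH": [10.0, 20.0],
    "DEBT": [5.0, 7.5],
})

# Un-pivot: former column names go into ACCOUNTLABEL, their cells into VALUE.
long = wide.melt(id_vars="ID", var_name="ACCOUNTLABEL", value_name="VALUE")
print(long)
```

Each non-identifier column contributes one row per original row, so the two-row table above becomes four rows.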
Edit: I just noticed that your question has nothing to do with transposing a DataFrame, but I will leave this here, in case it helps.
Use df.T for this. It uses the linked method.
I didn't downvote your question, but someone did because the provided link is the first search result if you google 'transpose pandas dataframe'.

Can't merge Pandas Dataframe [duplicate]

This question already has answers here:
Append to Series in python/pandas not working
(2 answers)
Closed 2 years ago.
I have around 8 .csv files in a given directory. When I run this code, I get an empty DataFrame (new_df, which I create below).
I have already seen how to use the concat function to get the job done, but I am wondering what I am doing wrong in my approach, since I read the documentation on DataFrame.append() and it should have worked.
path = Path("/content/Sales_data/")
new_df = pd.DataFrame()
for file in path.glob("*.csv"):
    df = pd.read_csv(file)
    new_df.append(df, ignore_index=True)
new_df
Appreciate any recommendation.
Try setting new_df to the DataFrame with appended data:
new_df = new_df.append(df, ignore_index=True)
The problem with your code is that append returns a new object; it does not modify the existing DataFrame in place.
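Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the concat approach the asker mentions is the one that still works. A minimal sketch follows; the temporary directory and the two small CSV files are invented here so the example is self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp)
    # Hypothetical input files standing in for the sales data.
    pd.DataFrame({"sales": [1, 2]}).to_csv(path / "jan.csv", index=False)
    pd.DataFrame({"sales": [3]}).to_csv(path / "feb.csv", index=False)

    # Read every file first, then concatenate once.
    frames = [pd.read_csv(file) for file in sorted(path.glob("*.csv"))]
    new_df = pd.concat(frames, ignore_index=True)

print(new_df)
```

Collecting the frames in a list and concatenating once also avoids re-copying the accumulated data on every loop iteration.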

Assigning dataframe to dataframe in Pandas Python [duplicate]

This question already has answers here:
Copy Pandas DataFrame using '=' trick [duplicate]
(2 answers)
Closed 4 years ago.
When I assign a DataFrame to another DataFrame, making changes to one DataFrame affects the other.
Code:
interest_margin_data = initial_margin_data
interest_margin_data['spanReq'] = (interest_margin_data['spanReq']*interest_margin_data['currency'].map(interestrate_dict))/(360*100*interest_margin_data['currency'].map(currency_dict))
initial_margin_data['spanReq'] /= initial_margin_data['currency'].map(currency_dict)
The second line changes the values in initial_margin_data as well.
Why is this so? How can I avoid this?
Use .copy to create a separate dataframe in memory:
interest_margin_data = initial_margin_data.copy()
It creates a different object in memory, rather than just pointing to the same place.
Plain assignment behaves this way so that a second name for, or a "view" of, the DataFrame does not require substantial extra memory; it can index into the source and calculate from it.
In your case, however, you do not want this.
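As a rough sketch of the fix applied to the question's variables: the currency values and the doubling step below are placeholders, not the asker's real data or formula.

```python
import pandas as pd

# Hypothetical conversion table and data.
currency_dict = {"USD": 1.0, "EUR": 2.0}
initial_margin_data = pd.DataFrame(
    {"currency": ["USD", "EUR"], "spanReq": [100.0, 200.0]}
)

# Work on an independent copy instead of a second name for the same object.
interest_margin_data = initial_margin_data.copy()
interest_margin_data["spanReq"] = interest_margin_data["spanReq"] * 2  # placeholder calculation

# This update no longer sees the change made to interest_margin_data.
initial_margin_data["spanReq"] /= initial_margin_data["currency"].map(currency_dict)

print(initial_margin_data["spanReq"].tolist())   # [100.0, 100.0]
print(interest_margin_data["spanReq"].tolist())  # [200.0, 400.0]
```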

Pandas: Dictionary of Dataframes [duplicate]

This question already has answers here:
How can you dynamically create variables? [duplicate]
(8 answers)
Closed 5 years ago.
I have a function that I made to analyze experimental data (all individual .txt files)
This function outputs a dictionary ({}) of Pandas Dataframes
Is there an efficient way to iterate over this dictionary and output individual DataFrames?
Let's say my dictionary is called analysisdict
for key in analysisdict.keys():
    dfx = pd.concat([analysisdict[key]['X'], analysisdict[key]['Y']], axis=1)
Where dfx would be an individual dataframe. (I'm guessing a second loop might be required? Perhaps I should iterate through a list of df names?)
The output would be df1...dfn
EDIT: I initially misread your question, and thought you wanted to concatenate all the DataFrames into one. This does that:
dfx = pd.concat([df for df in analysisdict.values()], ignore_index=True)
(Thanks to #paul-h for the ignore_index=True tip)
I read your question more carefully and realized that you're asking how to assign each DataFrame in your dictionary to its own variable, resulting in separate DataFrames named df1, df2, ..., dfn. Everything in my experience says that dynamically creating variables in this way is an anti-pattern, and best left to dictionaries. Check out the discussion here: How can you dynamically create variables via a while loop?
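One way to follow that advice is to keep the per-key results in a second dictionary rather than in variables named df1...dfn. This is only a sketch: the keys and the contents of analysisdict are invented, and xy_frames is a hypothetical name.

```python
import pandas as pd

# Hypothetical stand-in for the function's output.
analysisdict = {
    "run1": pd.DataFrame({"X": [1, 2], "Y": [3, 4], "Z": [0, 0]}),
    "run2": pd.DataFrame({"X": [5, 6], "Y": [7, 8], "Z": [0, 0]}),
}

# One X/Y DataFrame per key, stored under the same key.
xy_frames = {
    key: pd.concat([df["X"], df["Y"]], axis=1)
    for key, df in analysisdict.items()
}

for key, dfx in xy_frames.items():
    print(key, dfx.shape)
```

Each result is then available as xy_frames["run1"], xy_frames["run2"], and so on, without creating variables dynamically.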
