This question already has answers here:
How can you dynamically create variables? [duplicate]
(8 answers)
Closed 5 years ago.
I have a function that I made to analyze experimental data (all individual .txt files).
This function outputs a dictionary ({}) of Pandas DataFrames.
Is there an efficient way to iterate over this dictionary and output individual dataframes?
Let's say my dictionary is called analysisdict:
for key in analysisdict.keys():
    dfx=pd.concat([analysisdict[key]['X'], analysisdict[key]['Y']], axis=1)
Where dfx would be an individual dataframe. (I'm guessing a second loop might be required? Perhaps I should iterate through a list of df names?)
The output would be df1...dfn
EDIT: I initially misread your question, and thought you wanted to concatenate all the DataFrames into one. This does that:
dfx = pd.concat([df for df in analysisdict.values()], ignore_index=True)
(Thanks to @paul-h for the ignore_index=True tip)
I read your question more carefully and realized that you're asking how to assign each DataFrame in your dictionary to its own variable, resulting in separate DataFrames named df1, df2, ..., dfn. Everything in my experience says that dynamically creating variables in this way is an anti-pattern, and best left to dictionaries. Check out the discussion here: How can you dynamically create variables via a while loop?
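As a minimal sketch of the dictionary-based alternative, the loop from the question can be turned into a dict comprehension that keeps one X/Y DataFrame per key. The sample data and the names run1, run2, and xy_frames are assumptions made up for illustration; only analysisdict and the X and Y columns come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary produced by the analysis function.
analysisdict = {
    "run1": pd.DataFrame({"X": [1, 2], "Y": [3, 4], "Z": [5, 6]}),
    "run2": pd.DataFrame({"X": [7, 8], "Y": [9, 10], "Z": [11, 12]}),
}

# One X/Y DataFrame per key, stored in a dictionary instead of df1...dfn.
xy_frames = {
    key: pd.concat([frame["X"], frame["Y"]], axis=1)
    for key, frame in analysisdict.items()
}

# Each result is looked up by its key rather than by a generated variable name.
print(xy_frames["run1"])
```

Looking up xy_frames["run1"] plays the role that df1 would have played, without creating variables dynamically.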
This question already has answers here:
How to avoid pandas creating an index in a saved csv
(6 answers)
Closed 5 months ago.
I'm using the pandas split function to create new columns from an existing one. All of that works fine and I get my expected columns created. The issue is that it is creating an additional column in the exported csv file. So, it says there are 3 columns when there are actually 4.
I've tried various functions to drop that column, but it isn't recognized as part of the data frame so it can't be successfully removed.
Hopefully someone has had this issue and can offer a possible solution.
[example of the csv data frame output with the unnecessary column added]
Column A doesn't come from split. It is the index of your dataframe, which to_csv writes by default. You can turn that off by setting index=False in df.to_csv:
df.to_csv('{PATH}.csv', index=False)
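The difference can be seen without touching the disk by writing to an in-memory buffer. This is only a sketch: the column names full, left, and right and the "-" separator are invented for illustration and are not taken from the question.

```python
import io

import pandas as pd

# Hypothetical data: one column split into two new ones.
df = pd.DataFrame({"full": ["a-b", "c-d"]})
df[["left", "right"]] = df["full"].str.split("-", expand=True)

# Default behaviour: the index is written as an unnamed leading column.
with_index = io.StringIO()
df.to_csv(with_index)

# index=False leaves the index out of the file.
without_index = io.StringIO()
df.to_csv(without_index, index=False)

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]
print(repr(header_with))
print(repr(header_without))
```

The first header starts with an empty field (the unnamed index), which is what shows up as the extra column when the file is opened in a spreadsheet.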
This question already has answers here:
How can repetitive rows of data be collected in a single row in pandas?
(3 answers)
pandas group by and find first non null value for all columns
(3 answers)
Closed 7 months ago.
Implementing this logic with iterrows takes a lot of time. Can someone suggest how I could optimize the code with a vectorized approach or apply()?
Below is the input table. Within each partition of (ITEMSALE, ITEMID), I need to keep the row with rank=1. If any column value is null in the rank=1 row, I need to fill it with the next available value in that column. This has to be done for all columns in the dataset.
Below is the expected output format.
I have tried the logic below using iterrows, accessing values row by row. Performance is too low with this method.
This should get you what you need
df.loc[df.loc[df['Item_ID'].isna()].groupby('Item_Sale')['Date'].idxmin()]
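The linked duplicate points at another possible route: groupby(...).first() returns the first non-null value of each column within a group, so sorting by rank beforehand gives "rank 1, else the next available value". The sketch below assumes columns named ITEMSALE, ITEMID, and RANK plus two made-up value columns (PRICE, COLOR); the real table was not included in the post, so these names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical input; the actual table from the question is not available.
df = pd.DataFrame({
    "ITEMSALE": ["A", "A", "B", "B"],
    "ITEMID": [1, 1, 2, 2],
    "RANK": [1, 2, 1, 2],
    "PRICE": [np.nan, 10.0, 5.0, 7.0],
    "COLOR": ["red", "blue", None, "green"],
})

# Sort so rank 1 comes first, then take the first non-null value per column
# within each (ITEMSALE, ITEMID) partition.
result = (
    df.sort_values("RANK")
      .groupby(["ITEMSALE", "ITEMID"], as_index=False)
      .first()
)
print(result)
```

This avoids iterrows entirely, since the per-group work is done inside pandas.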
This question already has answers here:
What's the most efficient way to export multiple pandas dataframes to csv files?
(3 answers)
Closed 1 year ago.
I am new to python. I am sure there is simple way to do this but I am struggling a bit.
I have 100 dataframes with names Lens1,Lens2, ..., Lens100.
I want to write each dataframe to a csv file.
Lens1.to_csv(path+"lens 1.csv", index=False) is the command for one dataframe. I need the same for Lens2 saved as Lens2.csv, and so on until Lens100 is saved as Lens100.csv, so 100 times in total.
I have tried the following:
for key,j in range(101):
    x='Lens%s'%(j)
    x.to_csv(path+x+".csv")
It does not seem to work and the error is
'str' object has no attribute 'to_csv'.
Any help will be much appreciated.
You are getting this error because the x in your for loop is not a dataframe but a string. Building a string that matches a variable's name does not give you the object the variable refers to; Python treats it as an ordinary string.
Instead, store the dataframes in a list called dataframes and their names in a list called name, in the same order. Then proceed with the following code:
for j in range(len(dataframes)):
    x = dataframes[j]
    x.to_csv(path + name[j] + ".csv", index=False)
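A dictionary keyed by name keeps each dataframe and its file name together, which avoids maintaining two parallel lists. This is a rough sketch under stated assumptions: the three small frames, the value column, and the temporary output directory are placeholders standing in for the real Lens1 to Lens100 data and path.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for Lens1, Lens2, ... (three frames instead of 100).
lenses = {f"Lens{i}": pd.DataFrame({"value": [i, i * 2]}) for i in range(1, 4)}

# A temporary directory plays the role of `path` from the question.
out_dir = Path(tempfile.mkdtemp())

# Write each DataFrame to <name>.csv without the index column.
for lens_name, frame in lenses.items():
    frame.to_csv(out_dir / f"{lens_name}.csv", index=False)

written = sorted(p.name for p in out_dir.glob("*.csv"))
print(written)
```

If the dataframes are created in a loop in the first place, putting them straight into such a dictionary removes the need for 100 separate variables.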
This question already has answers here:
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 3 years ago.
I am fairly new to Python. Coming from SQL, I have been using pandas to build reports from CSV files with reasonable success. I have been able to answer most of my questions thanks mainly to this site, but I can't seem to find an answer to this one:
I have a dataframe with 2 columns. I want to group on the first column and display the lowest and highest alphabetical values from the second column, concatenated into a third column. I could do this fairly easily in SQL, but I am struggling to get my head around it in Python/pandas.
example:
source data:
LINK_NAME,CITY_NAME
Linka,Citya
Linka,Cityz
Linkb,Cityx
Linkb,Cityc
Desired output:
LINK_NAME,LINKID
Linka,CityaCityz
Linkb,CitycCityx
Edit:
Sorry for missing part of your question.
To sort the strings within each group alphabetically, you could define a function to apply to the grouped items:
def first_and_last_alpha(series):
    sorted_series = series.sort_values()
    return "".join([sorted_series.iloc[0], sorted_series.iloc[-1]])

df.groupby("LINK_NAME")["CITY_NAME"].apply(first_and_last_alpha)
Original:
Your question seems to be a duplicate of this one.
The same effect, with your data, is achieved by:
df.groupby("LINK_NAME")["CITY_NAME"].apply(lambda x: "".join(x))
where df is your pandas.DataFrame object.
In future, it's good to provide a reproducible example, including anything you've attempted before posting. For example, the output from df.to_dict() would allow me to recreate your example data instantly.
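For reference, here is a self-contained sketch built on the sample data from the question. It swaps the sort for min() and max(), which on strings return the alphabetically lowest and highest values; that substitution, and the link_ids name, are choices made for this example rather than part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "LINK_NAME": ["Linka", "Linka", "Linkb", "Linkb"],
    "CITY_NAME": ["Citya", "Cityz", "Cityx", "Cityc"],
})

# Per group, concatenate the alphabetically lowest and highest city names.
link_ids = (
    df.groupby("LINK_NAME")["CITY_NAME"]
      .agg(lambda cities: cities.min() + cities.max())
      .reset_index(name="LINKID")
)
print(link_ids)
```

The reset_index(name="LINKID") call turns the grouped result back into a two-column DataFrame matching the desired output layout.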
This question already has answers here:
Copy Pandas DataFrame using '=' trick [duplicate]
(2 answers)
Closed 4 years ago.
When I assign one dataframe to another variable, making changes through one name also affects the other.
Code:
interest_margin_data = initial_margin_data
interest_margin_data['spanReq'] = (interest_margin_data['spanReq']*interest_margin_data['currency'].map(interestrate_dict))/(360*100*interest_margin_data['currency'].map(currency_dict))
initial_margin_data['spanReq'] /= initial_margin_data['currency'].map(currency_dict)
The second line changes the values in initial_margin_data as well.
Why is this so? How can I avoid it?
Use .copy to create a separate dataframe in memory:
interest_margin_data = initial_margin_data.copy()
.copy() creates a different object in memory, whereas plain assignment with = only makes a second name point to the same object.
Assignment works this way so that referring to a dataframe under another name, or creating a "view" of it, does not require substantial extra memory. Indexing and calculations then operate on the source data.
In your case, however, that sharing is exactly what you do not want.
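The difference between the two can be checked directly. This is a small sketch with made-up values; only the spanReq column name is borrowed from the question, and original, alias, and independent are illustrative names.

```python
import pandas as pd

original = pd.DataFrame({"spanReq": [100.0, 200.0]})

alias = original               # second name for the same object
independent = original.copy()  # separate object with its own data

# Changing the alias changes the original, because they are one object.
alias["spanReq"] = alias["spanReq"] / 2

# Changing the copy leaves the original untouched.
independent["spanReq"] = independent["spanReq"] * 10

same_object = alias is original
print(same_object)
print(original["spanReq"].tolist())
print(independent["spanReq"].tolist())
```

The `is` check is a quick way to tell whether two names refer to the same dataframe before modifying either of them.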