I have an original dataframe with 4 columns (for the example, let's call them product_id, year_month, week, order_amount) and more than 50,000 rows. There are 240 individual product_id values, and each one behaves differently in the data, so I wanted to create individual dataframes from the original one, split by product_id. I was able to do this with:
dict_of_productid = {k: v for k, v in df.groupby('product_id')}
This created a dictionary whose keys are the product_id values and whose values are dataframes with the columns product_id, year_month, week, and order_amount. Each dataframe in the dictionary also keeps the index from the original df: for example, if product_id 'dvvd56' was on row 4035, it ends up in the dataframe created for 'dvvd56' but with its index still being 4035.
What I'm stuck with now is a dictionary with dataframes as values, but I can't find a way to turn those values into individual dataframes I can use and manipulate. If there is a way to do this, please let me know! I'll be very grateful. Thank you.
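For reference, the values in dict_of_productid are already ordinary DataFrames, so each one can be used directly. A minimal sketch ('dvvd56' is just the example product_id from above):

# Each dict value is a regular DataFrame and can be used as-is.
df_dvvd56 = dict_of_productid['dvvd56']

# Optionally drop the index carried over from the original df.
df_dvvd56 = df_dvvd56.reset_index(drop=True)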
I found a way to go about this. I don't know if it is the most appropriate way, but it might help further answers by clarifying what I want to do.
The first step was to collect the unique values into a list and then sort it:
product_id_list = df['product_id'].value_counts().index.to_list()
product_id_list = sorted(product_id_list)
After this was done, I created a function and then iterated over it with the individual values of product_id_list:
def get_df(key):
    for k in key:
        df_productid = dict_of_productid[k]
    return df_productid
for c, i in enumerate(product_id_list):
    globals()[f'df_{c}'] = get_df([f'{i}'])
This lets me separate all the values of the created dictionary into individual dataframes that I can call without explicitly stating the product ID: I can just write df_1 and get that dataframe.
(I don't know if this is the most efficient way to go about this.)
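For comparison, a shorter sketch that builds the same numbered globals without the helper function (reusing dict_of_productid from above; whether globals are a good idea at all is a separate question):

# Sorting the dict keys makes df_0, df_1, ... follow sorted product_id order.
for c, key in enumerate(sorted(dict_of_productid)):
    globals()[f'df_{c}'] = dict_of_productid[key]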
I'm new to Python, so please forgive me if this is a stupid question.
I'm trying to split a bigger dataset into smaller dataframes based on a unique row value (station ID). I've done the following, which made a dict and did split them into smaller dataframes, but they're still stuck inside this dict:
dfs = dict(list(df.groupby('Station')))
When I open it in Jupyter, it only shows the station ID next to a number series (0-20).
Is there a way to name these smaller dataframes after the station ID? I'm used to R/tidyverse, so there has to be a way to do this easily?
Thank you!
I tried the following too:
dct = {}
for idx, v in enumerate(df['Station'].unique()):
    dct[f'df{idx}'] = df.loc[df['Station'] == v]
print(dct)
but that just names them df0, df1, df2, etc.
If you need a dict specifically, you can use
dfs = {name: group for name, group in df.groupby('Station')}
but that creates copies of the data; try iterating over the names and groups directly with
for name, group in df.groupby('Station'):
    # logic goes here
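For instance, a small usage sketch (the CSV output is just an illustration, not something from the question):

for name, group in df.groupby('Station'):
    # one file per station; the file-name pattern here is hypothetical
    group.to_csv(f'station_{name}.csv', index=False)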
I have a dictionary like the following:
dict = {"df_text": df1, "df_logo": df2, "df_person": df3}
Each of the values in the dictionary is a dataframe.
My actual dictionary is larger, so I want to write a loop that generates multiple dataframes from all the entries of this dict, in a way that each key becomes the name of a dataframe and the corresponding value becomes that dataframe's contents.
For example:
df_text = pd.DataFrame(df1)
How can I do this?
You can add the contents of your dict as variables via vars():
for k, v in dict.items():
    vars()[k] = v
After that you can access them simply as df_text, df_logo, etc.
(As you wrote in your question, the values of your dict are already dataframes, so I assume you don't want to wrap them in a dataframe once more.)
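A minimal self-contained demo of that, with toy dataframes standing in for df1 and df2 (note that writing to vars() is only reliable at module scope, where it is the module namespace):

import pandas as pd

dfs = {"df_text": pd.DataFrame({"a": [1]}),
       "df_logo": pd.DataFrame({"b": [2]})}

# At module scope this creates top-level names df_text and df_logo;
# inside a function, writing to vars() is not supported.
for k, v in dfs.items():
    vars()[k] = v

print(df_text)  # the DataFrame stored under "df_text"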
I am currently working with dataframes in pandas. In sum, I have a dataframe called "Claims" filled with customer claims data, and I want to parse all the rows in the dataframe based on the unique values found in the field 'Part ID.' I would then like to take each set of rows and append it one at a time to an empty dataframe called "emptydf." This dataframe has the same column headings as the "Claims" dataframe. Since the values in the 'Part ID' column change from week to week, I would like to find some way to do this dynamically, rather than comb through the dataframe each week manually. I was thinking of somehow incorporating the df.where() expression and a For Loop, but am at a loss as to how to put it all together. Any insight into how to go about this, or even some better methods, would be great! The code I have thus far is divided into two steps as follows:
# 1. Create an empty dataframe with the same column headings as Claims
emptydf = Claims[0:0]

# 2. Parse the dataframe by one Part ID and append it to the empty dataframe
Parse_Claims = Claims.query('Part_ID == 1009')
emptydf = emptydf.append(Parse_Claims)

As you can see, I can only hard-code one Part ID number at a time so far. This would take hours to complete manually, so I would love to figure out a way to iterate through the Part ID column and append the data dynamically.
Needless to say, I am super new to Python, so I definitely appreciate your patience in advance!
empty_df = list(Claims.groupby(Claims['Part_ID']))
This creates a list of tuples, one for each Part ID; each tuple has two elements, the first being the Part ID and the second the sub-dataframe for that Part ID.
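A minimal sketch of consuming that list of tuples; pd.concat here stands in for the row-by-row append from the question:

import pandas as pd

# each element unpacks as (part_id, sub_df)
for part_id, sub_df in empty_df:
    print(part_id, len(sub_df))

# rebuild a single dataframe from all the pieces in one go
all_parts = pd.concat([sub_df for _, sub_df in empty_df])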
I was wondering how to make cleaner code, so I started to pay attention to some of my daily code routines. I frequently have to iterate over a dataframe to update a list of dicts:
foo = []
for index, row in df.iterrows():
    bar = {}
    bar['foobar0'] = row['foobar0']
    bar['foobar1'] = row['foobar1']
    foo.append(bar)
I think it is hard to maintain, because if the df keys are changed, the loop will break. Besides that, writing the same key into two data structures is a kind of code duplication.
The context is, I frequently make api calls to a specific endpoint that receives a list of dicts.
I'm looking for improvements to that routine: how can I change the index assignments to some map and lambda tricks, in order to avoid errors caused by key changes in a given dataframe (frequently resulting from some query in a database)?
In other words, if a column name in the database is changed, the dataframe keys will change too, so I'd like to create a dict on the fly with the same keys as a given dataframe and fill each dict entry with the dataframe's corresponding values.
How can I do that?
The simple way to do this is to_dict, which takes an orient argument that you can use to specify how you want the result structured.
In particular, orient='records' gives you a list of records, each one a dict in {col1name: col1value, col2name: col2value, ...} format.
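For example, with toy data matching the column names from your loop:

import pandas as pd

df = pd.DataFrame({'foobar0': [1, 2], 'foobar1': ['a', 'b']})
records = df.to_dict(orient='records')
# -> [{'foobar0': 1, 'foobar1': 'a'}, {'foobar0': 2, 'foobar1': 'b'}]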
(Your question is a bit confusing. At the very end, you say, "I'd like to create a dict on the fly with same keys of a given dataframe and fill each dict entry with dataframe corresponding values." This makes it sound like you want a dict of lists (that's to_dict(orient='list')) or maybe a dict of dicts (that's to_dict(orient='dict'), or just to_dict(), since that's the default), not a list of dicts.)
If you want to know how to do this manually (which you don't want to actually do, but it's worth understanding): a DataFrame acts like a dict, with the column names as the keys and the Series as the values. So you can get a list of the column names the same way you do with a normal dict:
columns = list(df)
Then:
foo = []
for index, row in df.iterrows():
    bar = {}
    for key in columns:
        bar[key] = row[key]
    foo.append(bar)
Or, more compactly:
foo = [{key: row[key] for key in columns} for _, row in df.iterrows()]
I have a dataframe something like this. One of the things I want as the final outcome is a dataframe with ID and the sum of the values based on the keys.
I can get all the keys in App_column with the following:
apps = []
for index, row in sub.iterrows():
    apps += row["App_column"].keys()
apps = list(set(apps))
But now I need to sum each key's values across all the rows, and I'm not able to crack this. Any help will be great.
Thanks
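In case it helps, a minimal sketch of one way to do the per-key sums, assuming App_column holds plain dicts of numeric values and the grouping column is literally named "ID" (both assumptions taken from the question's description):

from collections import Counter

# Overall totals: Counter.update() adds values per key (unlike the
# '+' operator, it keeps zero and negative totals).
totals = Counter()
for d in sub["App_column"]:
    totals.update(d)

# Per-ID sums, one summed dict per ID.
def sum_dicts(dicts):
    total = Counter()
    for d in dicts:
        total.update(d)
    return dict(total)

per_id = sub.groupby("ID")["App_column"].apply(sum_dicts)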