How to generate multiple dataframes from a dictionary? - python

I have a dictionary like the following
dict = {"df_text": df1, "df_logo": df2, "df_person": df3}
Each of the values in the dictionary is a dataframe.
My actual dictionary is larger, though, so I want to write a loop that generates a separate dataframe from each entry of this dict, with the key becoming the dataframe's name and the corresponding value its contents.
ex.
df_text=pd.DataFrame(df1)
How can I do this?

You can add the contents of your dict as variables via vars():
for k, v in dict.items():
    vars()[k] = v
After that you can access them simply as df_text, df_logo etc.
(As you wrote in your question, the values of your dict are already dataframes, so I assume you don't want to wrap them in a dataframe again.)
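A minimal runnable sketch of that approach, with toy dataframes standing in for df1 and df2 (the column names here are made up):

```python
import pandas as pd

# toy dataframes standing in for df1 and df2
d = {"df_text": pd.DataFrame({"a": [1, 2]}),
     "df_logo": pd.DataFrame({"b": [3, 4]})}

# at module level, vars() is the global namespace, so this
# creates top-level names df_text and df_logo
for k, v in d.items():
    vars()[k] = v

print(df_text)
```

Note that this only works reliably at module scope; inside a function, writing to vars() or locals() does not create usable local variables. A plain dict lookup such as d["df_text"] is usually the safer pattern.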

Related

Pandas: get values from columns

I have the following dataframe in pandas:
US46434V7617 US3160928731 US4642865251
2021-07-20 13.741297 53.793367 104.151499
How can I convert this to a dict with the columns as keys and the column values as values? For example:
[{'US46434V7617': 13.741297048948578, 'US3160928731': 53.7933674972021, 'US4642865251': 104.15149908700006}]
You can use df.to_dict with orient='records':
df.to_dict(orient='records')
This will give you a list of dictionaries, one for each row. By the way, the structure you provided in the question is not valid on its own; it must be either a list of dictionaries or a single dictionary of key: value pairs.
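A quick check of that call, rebuilding a one-row frame like the one in the question:

```python
import pandas as pd

# one-row frame shaped like the question's example
df = pd.DataFrame(
    {"US46434V7617": [13.741297], "US3160928731": [53.793367],
     "US4642865251": [104.151499]},
    index=["2021-07-20"],
)

# one dict per row, keyed by column name
records = df.to_dict(orient="records")
print(records)
# → [{'US46434V7617': 13.741297, 'US3160928731': 53.793367, 'US4642865251': 104.151499}]
```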

How to convert dictionary of DataFrames into individual DataFrames (Python, Pandas)

I have an original dataframe with 4 columns (for the example let's call them product_id, year_month, week, order_amount) and > 50,000 rows. There are 240 individual product_id values, and each one behaves differently in the data, so I wanted to create individual dataframes from the original one based on product_id. I was able to do this with:
dict_of_productid = {k: v for k, v in df.groupby('product_id')}
This created a dictionary with the product_id as key and, as values, the columns product_id, year_month, week, order_amount. Each item in the dictionary also keeps the index from the original df: for example, if product_id = dvvd56 was on row 4035, then in the dictionary it appears in the dataframe created for product_id dvvd56, still with index 4035.
What I'm stuck with now is a dictionary with dataframes as values, but I can't find a way to turn those values into individual dataframes I can use and manipulate. If there is a way to do this, please let me know! I'd be very grateful. Thank you.
I found a way to go about this. I don't know if it is the most appropriate way, but it might help further answers clarify what I want to do.
The first step was to collect the unique values into a list and sort them:
product_id_list = df['product_id'].value_counts().index.to_list()
product_id_list = sorted(product_id_list)
After this was done I defined a function and then iterated over the individual values of product_id_list:
def get_df(key):
    for k in key:
        df_productid = dict_of_productid[k]
    return df_productid
for c, i in enumerate(product_id_list):
    globals()[f'df_{c}'] = get_df([f'{i}'])
This lets me split all the values of the dictionary into separate dataframes I can call without explicitly stating the product_id: I can just write df_1 and get the dataframe.
(I don't know if this is the most efficient way to go about this.)
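For what it's worth, the get_df helper isn't needed: the same df_0, df_1, ... names can be created directly from the sorted dictionary keys. A sketch on a toy frame (the product_id values here are invented):

```python
import pandas as pd

# toy stand-in for the original > 50,000-row frame
df = pd.DataFrame({"product_id": ["a1", "a1", "b2"],
                   "order_amount": [10, 20, 30]})

dict_of_productid = {k: v for k, v in df.groupby("product_id")}

# expose each group as df_0, df_1, ... in sorted key order
for c, key in enumerate(sorted(dict_of_productid)):
    globals()[f"df_{c}"] = dict_of_productid[key]

print(df_1)  # the group for product_id == "b2", original index preserved
```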

How to Append a Column to a Dataframe which is inside a Dictionary

dict = {k: v for k, v in df.groupby(['Year','Month','Day'])} is my first dictionary. I'm working on a data task: I'm producing some anomaly scores and assigning these scores to empty dictionaries. I also have more than one dataframe in another dictionary.
I want to append the new column that I've produced as a result of the anomaly detection to each dataframe in my dictionary. Is there any way to do this?
Thanks.
Below are my empty dictionaries:
result_anomaly_clustering={}
result_hbos_scoring={}
result_mad_based_outlier={}
for key in list(df_dict):
    result_anomaly_clustering[key] = anomaly_clustering(2, df_dict[key])
    result_hbos_scoring[key] = hbos_scoring(0.05, 5, df_dict[key])
    result_mad_based_outlier[key] = mad_based_outlier(3.5, df_dict[key])
The code above calls my anomaly algorithm functions.
I found that the main problem was the indexing inside the functions. When I fixed the indexing, the problem was solved.
Thanks for your help.
# sample dictionary with anomaly scores
anomalies = {1: 2, 2:8}
# sample dictionary with dataframes
data_frames = {1: pd.DataFrame(data={'a': [4,4]}), 2:pd.DataFrame(data={'b':[6,6,7]})}
# iterate over key, value pairs of the dictionary of dataframes
# and create column score taken from anomalies dict by key
# with key matching anomalies dict key.
for k, v in data_frames.items():
    v['score'] = anomalies[k]

Dataframe iteration better practices for values assignment [duplicate]

This question already has answers here:
Pandas DataFrame to List of Dictionaries
(5 answers)
Closed 4 years ago.
I was wondering how to write cleaner code, so I started paying attention to some of my daily routines. I frequently have to iterate over a dataframe to build a list of dicts:
foo = []
for index, row in df.iterrows():
    bar = {}
    bar['foobar0'] = row['foobar0']
    bar['foobar1'] = row['foobar1']
    foo.append(bar)
I think this is hard to maintain, because if the df keys change, the loop breaks. Besides that, writing the same key into two data structures is a kind of code duplication.
The context is, I frequently make api calls to a specific endpoint that receives a list of dicts.
I'm looking for improvements to that routine: how can I replace the explicit key assignments with something like map and lambda, to avoid errors caused by key changes in a given dataframe (frequently the result of some query against the database)?
In other words, if a column name in the database changes, the dataframe keys change too. So I'd like to build each dict on the fly with the same keys as a given dataframe, and fill each entry with the corresponding dataframe values.
How can I do that?
The simple way to do this is to_dict, which takes an orient argument that you can use to specify how you want the result structured.
In particular, orient='records' gives you a list of records, each one a dict in {col1name: col1value, col2name: col2value, ...} format.
(Your question is a bit confusing. At the very end, you say, "I'd like to create a dict on the fly with same keys of a given dataframe and fill each dict entry with dataframe corresponding values." This makes it sound like you want a dict of lists (that's to_dict(orient='list')) or maybe a dict of dicts (that's to_dict(orient='dict'), or just to_dict(), because that's the default), not a list of dicts.)
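A small side-by-side of the two orients mentioned above, on a toy frame reusing the question's column names:

```python
import pandas as pd

df = pd.DataFrame({"foobar0": [1, 2], "foobar1": ["x", "y"]})

# list of dicts: one dict per row
as_records = df.to_dict(orient="records")
# → [{'foobar0': 1, 'foobar1': 'x'}, {'foobar0': 2, 'foobar1': 'y'}]

# dict of lists: one list per column
as_lists = df.to_dict(orient="list")
# → {'foobar0': [1, 2], 'foobar1': ['x', 'y']}
```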
If you want to know how to do this manually (which you don't want to actually do, but it's worth understanding): a DataFrame acts like a dict, with the column names as the keys and the Series as the values. So you can get a list of the column names the same way you do with a normal dict:
columns = list(df)
Then:
foo = []
for index, row in df.iterrows():
    bar = {}
    for key in columns:
        bar[key] = row[key]
    foo.append(bar)
Or, more compactly:
foo = [{key: row[key] for key in columns} for _, row in df.iterrows()]

Updating excel rows with data in the form of python dict_items

I have a list of dictionaries whose values need to be written to an excel sheet under the corresponding column headers:
new = [{"slno": "1", "region": "2", "customer": "3"}]
I am not sure about data types in Python as I am a beginner.
All I want to do is update an excel sheet with the data from the above dict using a for loop, but I always end up with unordered data.
In the excel file there are column headers named exactly like the keys of the dict, so I was hoping to insert the respective value into the matching excel column.
Note: I was able to write to excel using a for loop, but the dict came back in a seemingly random order, so the values were messed up when written to the sheet.
xfile = openpyxl.load_workbook('D:\\LoginLibrary\\test.xlsx')
sheet = xfile.get_sheet_by_name('OE')
charcounter = "A"
i = i
for key in g:
    sheet[charcounter + str(i)] = key
    charcounter = chr(ord(charcounter[0]) + 1)
xfile.save('D:\\LoginLibrary\\test.xlsx')
One of the difficulties of dictionaries is that when you iterate over one in a loop, the keys can come back in any order. However, something you can do is get the whole list of keys and then sort that list. For example:
xfile = openpyxl.load_workbook('D:\\LoginLibrary\\test.xlsx')
sheet = xfile.get_sheet_by_name('OE')
charcounter = "A"
i = 1  # the row to write into (i was undefined in the original snippet)
new = {"slno": "1", "region": "2", "customer": "3"}  # the outer brackets made it a list, unneeded
print(sorted(new.keys()))  # prints all the keys in alphabetical order
list_of_sorted_keys = sorted(new.keys())
for key in list_of_sorted_keys:
    sheet[charcounter + str(i)] = key
    charcounter = chr(ord(charcounter[0]) + 1)
xfile.save('D:\\LoginLibrary\\test.xlsx')
Note: I don't know much about writing to excel, so I'm assuming you have that part right. My additions just sort the dictionary keys so the output is ordered.
If alphabetical order of the keys doesn't do the job, you can order by the values as well, although it's harder to get keys back from values because dictionaries aren't designed to work that way.
Another way could be to just make the original data set as a list of tuples, as so:
new=[("slno","1"),("region","2"),("customer","3")]
That will keep all your data in order that you put it in the list, because lists are accessed by integer indices.
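The header-order idea can also be done without sorting at all, by driving the loop from a fixed list of headers. A sketch of just the ordering logic, with no excel I/O (the header names come from the question's dict):

```python
# the sheet's existing column order, written out explicitly
headers = ["slno", "region", "customer"]

new = [{"slno": "1", "region": "2", "customer": "3"}]

# look each value up by header name, so dict iteration order never matters
rows = [[record.get(h, "") for h in headers] for record in new]
print(rows)  # → [['1', '2', '3']]
```

Each inner list is already in column order, so with openpyxl you could write it in one call with sheet.append(row) instead of tracking charcounter by hand.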
I hope one of these ideas meets your needs!