I've searched for a similar question but couldn't find the answer I need.
I have a dictionary of DataFrame objects, where the key is the DataFrame's name and the value is the actual DataFrame:
table_names_dict = {'name_1': dataframe_1, 'name_2': dataframe_2}
I am trying to loop over the dictionary and dynamically create separate dataframes, using the keys as their names:
name_1 = dataframe_1
name_2 = dataframe_2
I tried something of the sort
for key, value in table_names_dict.items():
    key = value
This simply created one dataframe named value
I've also tried
locals().update(table_names_dict)
This did create the necessary variables, but they are not accessible in Spyder's Variable Explorer, and from what I've read, the use of locals() is frowned upon.
What am I doing wrong?
You can use globals() for this:
for i in table_names_dict:
    globals()[i] = table_names_dict[i]
If I have a dataframe df and want to access the unique values of ID, I can do something like this.
UniqueFactor = df.ID.unique()
But how can I convert this into a function in order to access different variables? I tried this, but it doesn't work
def separate_by_factor(df, factor):
    # Separating the readings by the given factor
    UniqueFactor = df.factor.unique()

separate_by_factor('ID')
And it shouldn't, because I'm passing a string as a variable name. How can I get around this?
I don't know how I can better word the question, sorry for being too vague.
When you create a DataFrame, every column whose name is a valid identifier is also exposed as an attribute. To access a column by a name stored in a variable (as in your example), you need to use df[factor].unique().
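For example, the function works once it uses bracket indexing (a minimal sketch with made-up data):

```python
import pandas as pd

def separate_by_factor(df, factor):
    # Bracket indexing lets `factor` be any column name passed as a string
    return df[factor].unique()

df = pd.DataFrame({"ID": [1, 2, 2, 3]})
print(separate_by_factor(df, "ID"))  # → [1 2 3]
```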
I am trying to create multiple dataframes inside a for loop using the below code:
for i in range(len(columns)):
    f'df_v{i+1}' = df.pivot(index="no", columns=list1[i], values=list2[i])
But I get the error "Cannot assign to literal". Not sure whether there is a way to create the dataframes dynamically in pandas?
This syntax
f'df_v{i+1}' = df.pivot(index="no", columns=list1[i], values=list2[i])
means that you are trying to assign a DataFrame to a string literal, which is not possible. You might try using a dictionary instead:
my_dfs = {}
for i in range(len(columns)):
    my_dfs[f'df_v{i+1}'] = df.pivot(index="no", columns=list1[i], values=list2[i])
A dictionary allows the use of named keys, which seems to be what you want. This way you can access your dataframes with my_dfs['df_v1'], for example.
I have a dictionary like this.
The inside of dictionary is ...
When I clicked the {top: 'Dataframe', rising: 'Dataframe'} entry, I got to two different DataFrames, top and rising.
My question is: how can I access these DataFrames directly?
I tried dict-to-DataFrame examples, but they did not work. Any help would be appreciated.
Use the dictionary access operator [] twice:
related_queries_dict['Big Data']['rising']
I don't have any code right now, but if possible, where would I start after inputting my CSV file?
Maybe there's an easier way of doing this, but once I can assign each cell its own variable, I'd like to use pyad to validate whether the variable is disabled or enabled in Python against Active Directory.
Decided to turn my comment into an answer.
You need to "assign every row in that column a separate variable", so I assume you want every cell in that column to be its own variable.
I am not exactly sure how you use pyad, but logically, if you need to access each row by name, I would recommend using a Python dictionary to accomplish this task.
Before you begin reading through your CSV file, create an empty dictionary; as you read through your column, create new entries in the dictionary with the key being the variable name you wanted and the value being the value of the cell. You can then access the values as usual, except that instead of variable_name you write dict_name["variable_name"].
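A minimal sketch of that approach; the column name ("username") and the CSV contents are made up, and the CSV is simulated with a string so the example is self-contained:

```python
import csv
import io

# Simulated CSV contents; in practice you'd open your real file instead
csv_text = "username,status\nalice,enabled\nbob,disabled\n"

cell_values = {}  # maps a generated name to each cell's value
reader = csv.DictReader(io.StringIO(csv_text))
for i, row in enumerate(reader):
    # Key each cell by row number so every cell gets its own "variable name"
    cell_values[f"username_{i}"] = row["username"]

print(cell_values["username_0"])  # → alice
```

Each `cell_values["username_<n>"]` lookup then stands in for the separate variable you wanted, and can be passed to pyad for the lookup.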
I want to read and prepare data from an Excel spreadsheet containing many sheets of data.
I first read the data from the Excel file using pd.read_excel with sheetname=None so that all the sheets are read into the price_data object.
price_data = pd.read_excel('price_data.xlsx', sheetname=None)
This gives me an OrderedDict object with 5 dataframes.
Afterwards I need to obtain the different dataframes that make up the price_data object. I thought of using a for loop for this, which also gives me the opportunity to do other needed iterative operations, such as setting the index of each dataframe.
This is the approach I tried
for key, df in price_data.items():
    df.set_index('DeliveryStart', inplace=True)
    key = df
With this code I would expect each dataframe to be written into an object named by the key iterator, so that at the end I would have as many dataframes as there are inside my original price_data object. However, I end up with two identical dataframes, one named key and one named value.
Suggestions?
Reason for current behaviour:
In your example, the variables key and df will be created (if not already existing) and overwritten in each iteration of the loop. In each iteration, you are setting key to point towards the object df (which also remains set in df, as Python allows multiple pointers to the same object). However, the key object is then overwritten in the next loop and set to the new value of df. At the end of the loop, the variables will remain in their last state.
To illustrate:
from collections import OrderedDict

od = OrderedDict()
od["first"] = "foo"
od["second"] = "bar"

# I've added an extra layer of `enumerate` just to display the loop progress.
# This isn't required in your actual code.
for loop, (key, val) in enumerate(od.items()):
    print("Iteration: {}".format(loop))
    print(key, val)
    key = val
    print(key, val)

print("Final output:", key, val)
Output:
Iteration: 0
first foo
foo foo
Iteration: 1
second bar
bar bar
Final output: bar bar
Solution:
It looks like you want to dynamically set the variables to be named the same as the value of key, which isn't considered a good idea (even though it can be done). See Dynamically set local variable for more discussion.
It sounds like a dict, or OrderedDict is actually a good format for you to store the DataFrames alongside the name of the sheet it originated from. Essentially, you have a container with the named attributes you want to use. You can then iterate over the items to do work like concatenation, filtering or similar.
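For instance, keeping the sheets in a dict still lets you operate on all of them at once; a sketch with made-up stand-in frames in place of the ones pd.read_excel would return:

```python
import pandas as pd

# Stand-ins for the sheets read by pd.read_excel(..., sheetname=None)
price_data = {
    "sheet1": pd.DataFrame({"DeliveryStart": [1, 2], "price": [10.0, 11.0]}),
    "sheet2": pd.DataFrame({"DeliveryStart": [3, 4], "price": [12.0, 13.0]}),
}

# Concatenate the named frames; the dict keys become the outer index level,
# so each sheet stays addressable after the merge
combined = pd.concat(price_data)
print(combined.loc["sheet1"])
```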
If there's a different reason you wanted the DataFrames to be in standalone objects, leave a comment and I will try and make a follow-up suggestion.
If you are happy to set index of the DataFrames in-place, you could try this:
for key in price_data:
    price_data[key].set_index('DeliveryStart', inplace=True)