I am new to python and am trying to improve a code's readability as well as speed by removing the recurrent use of exec() and eval(). However it is not obvious to me how I need to alter to code to obtain this.
I want the program to make dataframes and arrays with names based on input. Let's say that the input is like this:
A=[Red, Blue]
B=[Banana, Apple]
C=[Pie, Cake]
Then the code will then make a dataframe with a name based on each combination of input:
Red_Banana_Pie, Red_Banana_Cake, Red_Apple_Pie, Red_Apple_Cake, etc. by looping through the three lists.
for color in A[0:len(A)]:
for fruit in B[0:len(B)]:
for type in C[0:len(C)]:
And then in each loop:
exec('DataFr_'+color+'_'+fruit+'_'+type+'=pd.DataFrame((Data),columns=[\'Title1\',\'Title2\'])')
How can I do this without the exec command?
When you run exec('DataFr_'+color+'_'+fruit+'_'+type+'=pd.DataFrame((Data),columns=[\'Title1\',\'Title2\'])'), you will get 8 DataFrames that has different names. But I don't recommend this because you have to use eval() every time you want to access to your DataFrame.(otherwise you can hardcode that, but that's really bad thing)
I think you need multi-dimensional dictionary for dataframe.
When the input is
A=["Red", "Blue"]
B=["Banana", "Apple"]
C=["Pie", "Cake"]
[+] additionally, you'll basically get user input as string in python.(like "hello, world!")
data_set = {}
for color in A:
data_set.update({color:{}})
for fruit in B:
data_set[color].update({fruit:{}})
for type in C:
data_set[color][fruit].update({type:pd.DataFrame((Data),columns=['Title1','Title2'])})
# I think you have some Data in other place, right?
[+] moreover, you can iterate List without [0:len(A)] in python.
Then you can use each DataFrame by data_set['Red']['Banana']['Cake'].(your implementation will be data_set[A[0]][B[0]][C[1]])
Then you can dynamically create DataFrame for each color, fruit, type without eval, and access to them without hardcoded value.
Related
I have dataframes that follow name syntax of 'df#' and I would like to be able to loop through these dataframes in a function. In the code below, if function "testing" is removed, the loop works as expected. When I add the function, it gets stuck on the "test" variable with keyerror = "iris1".
import statistics
iris1 = sns.load_dataset('iris')
iris2 = sns.load_dataset('iris')
def testing():
rows = []
for i in range(2):
test=vars()['iris'+str(i+1)]
rows.append([
statistics.mean(test['sepal_length']),
statistics.mean(test['sepal_width'])
])
testing()
The reason this will be valuable is because I am subsetting my dataframe df multiple times to create quick visualizations. So in Jupyter, I have one cell where I create visualizations off of df1,df2,df3. In the next cell, I overwrite df1,df2,df3 based on different subsetting rules. This is advantageous because I can quickly do this by calling a function each time, so the code stays quite uniform.
Store the datasets in a dictionary and pass that to the function.
import statistics
import seaborn as sns
datasets = {'iris1': sns.load_dataset('iris'), 'iris2': sns.load_dataset('iris')}
def testing(data):
rows = []
for i in range(1,3):
test=data[f'iris{i}']
rows.append([
statistics.mean(test['sepal_length']),
statistics.mean(test['sepal_width'])
])
testing(datasets)
No...
You should NEVER make a sentence like I have dataframes that follow name syntax of 'df#'
Then you have a list of dataframes, or a dict of dataframe, depending how you want to index them...
Here I would say a list
Then you can forget about vars(), trust me you don't need it... :)
EDIT :
And use list comprehensions, your code could hold in three lines :
import statistics
list_iris = [sns.load_dataset('iris'), sns.load_dataset('iris')]
rows = [
(statistics.mean(test['sepal_length']), statistics.mean(test['sepal_width']))
for test in list_iris
]
Storing as a list or dictionary allowed me to create the function. There is still a problem of the nubmer of dataframes in the list varies. It would be nice to be able to just input n argument specifying how many objects are in the list (I guess I could just add a bunch of if statements to define the list based off such an argument). **EDIT: Changing my code so that I don't use df# syntax, instead just putting it directly into a list
The problem I was experiencing is still perplexing. I can't for the life of me figure out why the "test" variable performs as expected outside of a function, but inside of a function it fails. I'm going to go the route of creating a list of dataframes, but am still curious to understand why it fails inside of the function.
I agree with #Icarwiz that it might not be the best way to go about it but you can make it work with.
test=eval('iris'+str(i+1))
I have a function that returns specific country-currency pairs that are used in the following step.
The return is something like:
lst_dolar = ['USA_dolar','Canada_dolar','Australia_dolar']
lst_eur = ['France_euro','Germany_euro','Italy_euro']
lst_pound=['England_pound','Scotland_pound','Wales_pound']
I then use a function that returns a dataframe.
One of the parameters of this function is country-currency pair and the other is the period, from a list of periods:
period_lst = ['1y','2y','3y','4y','5y']
What I would like to do is to then get a list of dataframes, that will be then saved, each single one of them, to a different table, using SQLite3.
My question is how do I apply my function to each element of the lists of country-currency pairs and for each element of the period_lst and then obtain differently named dataframes as a result?
Ex: USA_dolar_1y
I then would like to be able to take each one of these dataframes and saved them to a table, in a database, that has the same name as each dataframe.
Thank you!
Whenever you think you need to dynamically name variables in Python, you probably want a dictionary:
def my_func(df, period):
# do something with period and dataframe and return the result
return df
period_lst = ['1y', '2y', '3y', '4y', '5y']
usa_dollar= {}
for p in period_lst:
usa_dollar[p] = my_func(df, p)
You can then access the various resulting dataframes (or whatever your function returns) by their period:
use_data(usa_dollar['3y'])
By the way: don't use capitals in your variable names, you should reserve CamelCase for class names and write function names and variable names in lowercase, separated by underscores for readability. So, usa_dollar, not USAdollar, for example.
This helps editors spot problems in your code and makes the code easier to read for other programmers, as well as future you. Look up PEP8 for more of these style rules.
Another by the way: if the only reason you want to keep the resulting dataframes in separate variables is to then write them to a file, you could just write the dataframe to the file once you've created it, and reuse the variable for the next one, if you have no immediate other need for the data you're about to overwrite.
my code is on the bottom
"parse_xml" function can transfer a xml file to a df, for example, "df=parse_XML("example.xml", lst_level2_tags)" works
but as I want to save to several dfs so I want to have names like df_ first_level_tag, etc
when I run the bottom code, I get an error "f'df_{first_level_tag}'=parse_XML("example.xml", lst_level2_tags)
^
SyntaxError: can't assign to literal"
I also tried .format method instead of f-string but it also hasn't worked
there are at least 30 dfs to save and I don't want to do it one by one. always succeeded with f-string in Python outside pandas though
Is the problem here about f-string/format method or my code has other logic problem?
if necessary for you, the parse_xml function is directly from this link
the function definition
for first_level_tag in first_level_tags:
lst_level2_tags = []
for subchild in root[0]:
lst_level2_tags.append(subchild.tag)
f'df_{first_level_tag}'=parse_XML("example.xml", lst_level2_tags)
This seems like a situation where you'd be best served by putting them into a dictionary:
dfs = {}
for first_level_tag in first_level_tags:
lst_level2_tags = []
for subchild in root[0]:
lst_level2_tags.append(subchild.tag)
dfs[first_level_tag] = parse_XML("example.xml", lst_level2_tags)
There's nothing structurally wrong with your f-string, but you generally can't get dynamic variable names in Python without doing ugly things. In general, storing the values in a dictionary ends up being a much cleaner solution when you want something like that.
One advantage of working with them this way is that you can then just iterate over the dictionary later on if you want to do something to each of them. For example, if you wanted to write each of them to disk as a CSV with a name matching the tag, you could do something like:
for key, df in dfs.items():
df.to_csv(f'{key}.csv')
You can also just refer to them individually (so if there was a tag named a, you could refer to dfs['a'] to access it in your code later).
I have a dictionary created from a json file. This dictionary has a nested structure and every few weeks additional parameters are added.
I use a script to generate additional copies of the existing parameters when I want multiple "legs" added. So I first add the additional legs. So say I start with 1 leg as my template and I want 10 legs, I will just clone that leg 9 more times and add it to the list.
Then I loop through each of the parameters (called attributes) and have to clone certain elements for each leg that was added so that it has a 1:1 match. I don't care about the content so cloning the first leg value is fine.
So I do the following:
while len(data['attributes']['groupA']['params']['weights']) < legCount:
data['attributes']['groupA']['params']['weights'].append(data['attributes']['groupA']['params']['weights'][0])
while len(data['attributes']['groupB']['paramsGroup']['factors']) < legCount:
data['attributes']['groupB']['paramsGroup']['factors'].append(data['attributes']['groupB']['paramsGroup']['factors'][0])
while len(data['attributes']['groupC']['items']['delta']) < legCount:
data['attributes']['groupC']['items']['delta'].append(data['attributes']['groupC']['items']['delta'][0])
What I'd like to do is make these attributes all strings and just loop through them dynamically so that when I need to add additional ones, I can just paste one string into my list and it works without having another while loop.
So I converted it to this:
attribs = [
"data['attributes']['groupA']['params']['weights']",
"data['attributes']['groupB']['paramsGroup']['factors']",
"data['attributes']['groupC']['items']['delta']",
"data['attributes']['groupD']['xxxx']['yyyy']"
]
for attrib in attribs:
while len(eval(attrib)) < legCount:
eval(attrib).append(eval(attrib)[0])
In this case eval is safe because there is no user input, just a defined list of entries. Tho I wouldn't mind finding an alternative to eval either.
It works up until the last line. I don't think the .append is working on the eval() result. It's not throwing an error.. just not appending to the element.
Any ideas on the best way to handle this?
Not 100% sure this will fix it, but I do notice one thing.
In your above code in your while condition you are accessing:
data['attributes']['groupA']['params']['weights']
then you are appending to
data['attributes']['groupA']['params']['legs']
In your below code it looks like you are appending to 'weights' on the first iteration. However, this doesn't explain the other attributes you are evaluating... just one red flag I noticed.
Actually my code was working. I was just checking the wrong variable. Thanks Me! :)
If I want to assign to an element of a list only one value I use always a dictionary. For example:
{'Monday':1, 'Tuesday':2,...'Friday':5,..}
But I want to assign to one element of a list many values, like for example:
Monday: Jogging, Swimming, Skating
Tuesday: School, Work, Dinner, Cinema
...
Friday: Doctor
Is any built-in structure or a simple way to make something like this in python?
My idea: I was thinking about something like: a dictionary which as a key holds a day and as a value holds a list, but maybe there is a better solution.
A dictionary whose values are lists is perfectly fine, and in fact very common.
In fact, you might want to consider an extension to that: a collections.defaultdict(list). This will create a new empty list the first time you access any key, so you can write code like this:
d[day].append(activity)
… instead of this:
if not day in d:
d[day] = []
d[day].append(activity)
The down-side of a defaultdict is that you no longer have a way to detect that a key is missing in your lookup code, because it will automatically create a new one. If that matters, use a regular dict together with the setdefault method:
d.setdefault(day, []).append(activity)
You could wrap either of these solutions up in a "MultiDict" class that encapsulates the fact that it's a dictionary of lists, but the dictionary-of-lists idea is such a common idiom that it really isn't necessary to hide it.