global variables and .format() - python

I wrote a python script that generates PDF reports. I had to do some data manipulation to change column names in each of the data sets I used.
My question is: is there a way to set a global variable and then use .format() inside the Target_Hours_All.rename() call?
I have hardcoded each column name.
For example, Target_Hours_All.rename(columns = {'VP_x':'VP', '2018 Q1 Target Hours':'hourTarget18Q1'}, inplace = True)
However, I want to be able to run this each quarter without having to update every df.rename. Instead, I would like to have global variables at the top of the script and change those.
Any help would be greatly appreciated!!!

Easiest way? Move the strings you want to change out of the function call into variables, and then use those variables within Target_Hours_All.rename().
This makes the code a lot easier to read.
I can only guess what Target_Hours_All.rename does, but my guess would be that it takes the hash (dict) in "columns" and replaces each key with its value. Correct?
So you could write your columns line as:
columns = {}
vpl = 'VP_x'
vpr = 'VP'
columns[vpl] = vpr
target_hours_l = '20{year} {quarter} Target Hours'.format(year='18', quarter='Q1')
target_hours_r = 'hourTarget{year}{quarter}'.format(year='18',quarter='Q1')
columns[target_hours_l] = target_hours_r
Target_Hours_All.rename(columns = columns, ... )
Yes, this is more code, and I should not have named my hash columns but something else instead, so there is room for improvement. But it shows how you can use .format() for your call.
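One way to get the "variables at the top of the script" the question asks for is to keep the year and quarter in two module-level constants and build the rename mapping from them. This is only a sketch: YEAR, QUARTER, and build_rename_map are names made up for illustration, and the column patterns are copied from the example above.

```python
# Hypothetical module-level settings: change these two lines each quarter.
YEAR = '18'
QUARTER = 'Q1'


def build_rename_map(year, quarter):
    """Return an {old_name: new_name} mapping for the given year and quarter."""
    old_name = '20{year} {quarter} Target Hours'.format(year=year, quarter=quarter)
    new_name = 'hourTarget{year}{quarter}'.format(year=year, quarter=quarter)
    return {'VP_x': 'VP', old_name: new_name}


columns = build_rename_map(YEAR, QUARTER)
print(columns)
```

The resulting dict could then be passed as Target_Hours_All.rename(columns=columns, inplace=True), assuming Target_Hours_All is a pandas DataFrame as in the question.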

Related

variable dataframe name - loop works by itself, but not inside of function

I have dataframes that follow name syntax of 'df#' and I would like to be able to loop through these dataframes in a function. In the code below, if function "testing" is removed, the loop works as expected. When I add the function, it gets stuck on the "test" variable with keyerror = "iris1".
import statistics
import seaborn as sns
iris1 = sns.load_dataset('iris')
iris2 = sns.load_dataset('iris')
def testing():
    rows = []
    for i in range(2):
        test=vars()['iris'+str(i+1)]
        rows.append([
            statistics.mean(test['sepal_length']),
            statistics.mean(test['sepal_width'])
        ])
testing()
This would be valuable because I subset my dataframe df multiple times to create quick visualizations. In Jupyter, I have one cell where I create visualizations from df1, df2, df3. In the next cell, I overwrite df1, df2, df3 based on different subsetting rules. This is advantageous because I can quickly do this by calling a function each time, so the code stays quite uniform.
Store the datasets in a dictionary and pass that to the function.
import statistics
import seaborn as sns
datasets = {'iris1': sns.load_dataset('iris'), 'iris2': sns.load_dataset('iris')}
def testing(data):
    rows = []
    for i in range(1,3):
        test=data[f'iris{i}']
        rows.append([
            statistics.mean(test['sepal_length']),
            statistics.mean(test['sepal_width'])
        ])
    return rows
testing(datasets)
No...
You should NEVER have to write a sentence like "I have dataframes that follow name syntax of 'df#'".
What you have then is a list of dataframes, or a dict of dataframes, depending on how you want to index them...
Here I would say a list.
Then you can forget about vars(). Trust me, you don't need it... :)
EDIT:
And if you use list comprehensions, your code could fit in three lines:
import statistics
import seaborn as sns
list_iris = [sns.load_dataset('iris'), sns.load_dataset('iris')]
rows = [
    (statistics.mean(test['sepal_length']), statistics.mean(test['sepal_width']))
    for test in list_iris
]
Storing them as a list or dictionary allowed me to create the function. There is still the problem that the number of dataframes in the list varies. It would be nice to be able to just pass an argument n specifying how many objects are in the list (I guess I could just add a bunch of if statements to define the list based on such an argument). EDIT: I am changing my code so that I don't use the df# syntax and instead put the dataframes directly into a list.
The problem I was experiencing is still perplexing. I can't for the life of me figure out why the "test" variable performs as expected outside of a function, but inside of a function it fails. I'm going to go the route of creating a list of dataframes, but am still curious to understand why it fails inside of the function.
I agree with @Icarwiz that it might not be the best way to go about it, but you can make it work with eval:
test=eval('iris'+str(i+1))
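As for why the lookup fails only inside the function: called with no arguments, vars() behaves like locals(), so inside testing() it only sees the function's own local names (rows, i) and not the module-level iris1 and iris2. At module level the local and global namespaces are the same thing, which is why the loop works there. Here is a small sketch, with plain lists standing in for the dataframes and lookup_local / lookup_global as hypothetical names:

```python
# Plain lists stand in for the dataframes so the sketch has no dependencies.
iris1 = [5.1, 4.9]
iris2 = [6.3, 5.8]


def lookup_local(name):
    # vars() with no argument is locals(): only 'name' is visible here.
    return vars().get(name)


def lookup_global(name):
    # globals() is the module namespace, where iris1 and iris2 live.
    return globals().get(name)


print(lookup_local('iris1'))   # None: iris1 is not a local variable
print(lookup_global('iris1'))  # [5.1, 4.9]
```

eval works inside the function because it resolves names against both the local and the global namespace, but a list or dict of dataframes is still the cleaner option.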

Apply function to each element of multiple lists; return differently named dataframes

I have a function that returns specific country-currency pairs that are used in the following step.
The return is something like:
lst_dolar = ['USA_dolar','Canada_dolar','Australia_dolar']
lst_eur = ['France_euro','Germany_euro','Italy_euro']
lst_pound=['England_pound','Scotland_pound','Wales_pound']
I then use a function that returns a dataframe.
One of the parameters of this function is country-currency pair and the other is the period, from a list of periods:
period_lst = ['1y','2y','3y','4y','5y']
What I would like to do is then get a list of dataframes, each of which will be saved to a different table using SQLite3.
My question is: how do I apply my function to each element of the lists of country-currency pairs and to each element of period_lst, and then obtain differently named dataframes as a result?
Ex: USA_dolar_1y
I would then like to take each of these dataframes and save it to a table in a database that has the same name as the dataframe.
Thank you!
Whenever you think you need to dynamically name variables in Python, you probably want a dictionary:
def my_func(df, period):
    # do something with period and dataframe and return the result
    return df
period_lst = ['1y', '2y', '3y', '4y', '5y']
usa_dollar = {}
for p in period_lst:
    usa_dollar[p] = my_func(df, p)
You can then access the various resulting dataframes (or whatever your function returns) by their period:
use_data(usa_dollar['3y'])
By the way: don't use capitals in your variable names, you should reserve CamelCase for class names and write function names and variable names in lowercase, separated by underscores for readability. So, usa_dollar, not USAdollar, for example.
This helps editors spot problems in your code and makes the code easier to read for other programmers, as well as future you. Look up PEP8 for more of these style rules.
Another by the way: if the only reason you want to keep the resulting dataframes in separate variables is to then write them to a file, you could just write the dataframe to the file once you've created it, and reuse the variable for the next one, if you have no immediate other need for the data you're about to overwrite.
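To cover every pair and every period at once, the same dictionary idea can be keyed by the combined name (for example USA_dolar_1y) and then written out table by table. The following is a rough sketch rather than the asker's actual code: fetch_rows is a hypothetical stand-in for the function that returns a dataframe, it returns plain tuples so that the example runs with only the standard library, and the table schema is invented for illustration.

```python
import itertools
import sqlite3

pairs = ['USA_dolar', 'Canada_dolar']   # shortened list from the question
period_lst = ['1y', '2y']


def fetch_rows(pair, period):
    """Hypothetical stand-in for the function that returns a dataframe."""
    return [(pair, period, 1.0)]


# One dict entry per (pair, period) combination, keyed by the desired name.
results = {
    f'{pair}_{period}': fetch_rows(pair, period)
    for pair, period in itertools.product(pairs, period_lst)
}

conn = sqlite3.connect(':memory:')
for name, rows in results.items():
    # Table names cannot be bound as parameters, so they are quoted explicitly.
    conn.execute(f'CREATE TABLE "{name}" (pair TEXT, period TEXT, value REAL)')
    conn.executemany(f'INSERT INTO "{name}" VALUES (?, ?, ?)', rows)
conn.commit()

tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"))
print(tables)
```

With real dataframes, the body of the loop could instead call df.to_sql(name, conn), which pandas provides for writing a DataFrame to a SQL table.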

when converting XML to SEVERAL dataframes, how to name these dfs in a dynamic way?

My code is at the bottom.
The "parse_XML" function can convert an XML file to a df; for example, "df=parse_XML("example.xml", lst_level2_tags)" works.
But I want to save to several dfs, so I want to have names like df_first_level_tag, etc.
When I run the code at the bottom, I get an error: "f'df_{first_level_tag}'=parse_XML("example.xml", lst_level2_tags)
^
SyntaxError: can't assign to literal"
I also tried the .format method instead of an f-string, but that hasn't worked either.
There are at least 30 dfs to save and I don't want to do it one by one. I have always succeeded with f-strings in Python outside pandas, though.
Is the problem here about the f-string/format method, or does my code have some other logic problem?
If necessary, the parse_XML function is taken directly from this link
the function definition
for first_level_tag in first_level_tags:
    lst_level2_tags = []
    for subchild in root[0]:
        lst_level2_tags.append(subchild.tag)
    f'df_{first_level_tag}'=parse_XML("example.xml", lst_level2_tags)
This seems like a situation where you'd be best served by putting them into a dictionary:
dfs = {}
for first_level_tag in first_level_tags:
    lst_level2_tags = []
    for subchild in root[0]:
        lst_level2_tags.append(subchild.tag)
    dfs[first_level_tag] = parse_XML("example.xml", lst_level2_tags)
There's nothing structurally wrong with your f-string, but you generally can't get dynamic variable names in Python without doing ugly things. In general, storing the values in a dictionary ends up being a much cleaner solution when you want something like that.
One advantage of working with them this way is that you can then just iterate over the dictionary later on if you want to do something to each of them. For example, if you wanted to write each of them to disk as a CSV with a name matching the tag, you could do something like:
for key, df in dfs.items():
    df.to_csv(f'{key}.csv')
You can also just refer to them individually (so if there was a tag named a, you could refer to dfs['a'] to access it in your code later).

Assigning a variable to part of a value

So I am writing a program in python3 and I'm stuck on one part. My problem is that I have a variable. Example:
variable = 12345
I want to assign a new value to one part of the value of that variable.
I've tried, for example:
variable[2] = 55
(this is an example, but hopefully I'm coming across clearly enough)
So I want to take the 3rd digit in "12345": 12[3]45
That is, I want to assign to the "3", basically to a place inside another variable.
What sucks is that it has to be inside that variable. I've thought about passing it as a string instead, but I think that will change the whole rest of the script, since it would then be a string.
I also thought that each time I need that variable I could pass it as a string to another variable, but I'm trying not to get swamped with a million str and int swaps. I hope I've made this understandable... I get that it's not very clear, lol, but I've confused myself with this script. Any help will be much appreciated :)
In any case we have to perform conversions to strings and lists. We can achieve this in a way that is compatible with both python2.x and python3.x, as follows:
# input variable
variable = 12345
# convert that variable to list
variable_lst = [int(i) for i in str(variable)]
# change necessary digit by list index
variable_lst[2] = 55
# reverse conversion then assign updated data to variable
variable = int(''.join(str(j) for j in variable_lst)) # 125545
You should convert your variable to a list, so that you can change any index of it:
var = 12345
var_list = list(str(var))
var_list[2] = 55
var = int(''.join(str(i) for i in var_list))
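If the swap is needed in several places, the str/int round trip can be hidden in one small helper so the rest of the script keeps working with integers. This is a sketch under the assumption that the number is a non-negative integer; replace_digit is a made-up name, and it uses string slicing instead of a list.

```python
def replace_digit(number, index, new_digits):
    """Return number with the digit at position index replaced by new_digits."""
    digits = str(number)
    # Keep everything before and after the target position, swap the middle.
    return int(digits[:index] + str(new_digits) + digits[index + 1:])


variable = 12345
variable = replace_digit(variable, 2, 55)
print(variable)  # 125545
```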

Using the value of a string as a variable name

Let's say I have a string like this.
string = "someString"
I now want to create a new instance of say, a dict() object using the variable stored in string. Can I do this?
string = dict()
I am hoping it becomes "someString = dict()". Is this right? If not, how do I do it? I'm still learning Python. Any help would be greatly appreciated.
Yes, it is possible to do this, though it is considered a bad thing to do:
string = 'someString'
globals()[string] = dict()
Instead you should do something like:
my_dynamic_vars = dict()
string = 'someString'
my_dynamic_vars.update({string: dict()})
then my_dynamic_vars[string] is a dict()
You really shouldn't do this, but if you really want to, you can use exec()
For your example, you would use this:
exec(string + " = dict()")
And this would assign a new dictionary to a variable by the name of whatever string is.
Using black magic, the kind that sends you to Python hell, it's possible.
The globals() and locals() functions, for example, will give you the dictionaries that contain the variables currently in scope as (key, value) entries. While you can try to edit these dictionaries, the results are sometimes unpredictable, sometimes incorrect, and always undesirable.
So no: there is no reliable way of creating a variable with a non-explicit name.
If the variable you want to set is inside an object, you can use setattr(instance,'variable_name',value)
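A minimal sketch of the setattr route, using the string from the question as the attribute name. Settings is a hypothetical container class invented for this example.

```python
class Settings:
    """Hypothetical container object used only for this illustration."""


string = 'someString'

settings = Settings()
setattr(settings, string, dict())    # same effect as settings.someString = dict()
getattr(settings, string)['a'] = 1   # look the attribute up by name again

print(settings.someString)  # {'a': 1}
```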
