Using a Variable in a .query function - python

I'm trying to write a function that takes a value from a column as input and then uses that value in a df.query call. However, I can't figure out how to make query recognize the variable that holds the input.
This is what I have right now:
def gettingWeeks(stateAbbr, stateName):
    stateCases = cases.query('state == stateName')
But it does not recognize stateName. Is there a way to do this?
Thanks!

The pandas DataFrame.query method expects an expression string written in its own syntax. To use a variable from the enclosing namespace, put an @ symbol before the variable name:
stateCases = cases.query("state == @stateName")
This should work fine.
Here is the doc.
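For anyone who wants to try this out, here is a minimal sketch of the @ prefix in action. The contents of `cases` and the `getting_weeks` name are made up for illustration; only the `state` column name comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `cases` DataFrame.
cases = pd.DataFrame({
    "state": ["Ohio", "Texas", "Ohio"],
    "cases": [10, 20, 30],
})

def getting_weeks(state_name):
    # "@state_name" tells query() to look up the local Python variable
    # instead of searching for a column called state_name.
    return cases.query("state == @state_name")

ohio = getting_weeks("Ohio")
print(ohio)
```

Without the @, query() treats the bare name as a column label, which is why the original call fails.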

Related

Use eval function with 'mean()' and 'median()'

I want to calculate the mean and median of a dataframe, so I put the method names in a list as follows:
comb_methods = ['median','mean']
I loop over the list and use eval to turn each name into a callable, then compute the result and add it to the dataframe as a new column:
for combin in comb_methods:
    combination = eval(combin)
    heartdata[combin] = heartdata.combination(axis=1)
I get the following error.
name 'median' is not defined
I've been trying for hours to understand why this happens, but I can't figure it out!
You need to use getattr instead of eval:
for combin in comb_methods:
    heartdata[combin] = getattr(heartdata, combin)(axis=1)
getattr looks up an attribute of an object by its name, given as a string. Writing
getattr(heartdata, 'median')
returns heartdata.median (a method which we then call with the axis=1 argument).
eval on the other hand simply evaluates whatever string you pass onto it. So
eval('median')
is the same as simply typing median (without quotes) in a Python script. Python treats median as a variable name and raises the NameError you see when it can't find a variable with that name.
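Here is a small runnable sketch of the difference, assuming a toy `heartdata` frame with made-up numbers. The `value_cols` list is my own addition: it restricts each statistic to the original columns, so that the second pass does not also average over the column added by the first.

```python
import pandas as pd

# Made-up data standing in for the asker's heartdata.
heartdata = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [8.0, 9.0]})
value_cols = ["a", "b", "c"]  # assumption: only these columns are combined

# eval("median") is just a variable lookup, so it fails with NameError.
try:
    eval("median")
except NameError as exc:
    error_message = str(exc)

# getattr fetches the DataFrame method by name; we then call it.
for name in ["median", "mean"]:
    method = getattr(heartdata[value_cols], name)
    heartdata[name] = method(axis=1)

print(error_message)
print(heartdata)
```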

Calling a function in Pycharm

I am new to Python and I am currently learning OOP in PyCharm.
When I type a simple call like type(mylist), I don't see the result in the console unless I wrap it in print. The same happens with any other function, although in the tutorials I am following they just call the function by typing its name and passing an argument.
Same with my first attribute (please see screenshots)
Please help me if you know how to get around it.
You need to separate the object instantiation from the print()
my_dog = Dog(mybreed='lab')
print(my_dog)
Instead of:
print(my_dog=Dog(mybreed='lab'))
You could either split it to two lines:
my_dog = Dog(mybreed='lab')
print(my_dog)
Or, if you don't need the my_dog variable:
print(Dog(mybreed='lab'))
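In Python, an assignment such as variable_name = expression is a statement, not an expression, so it can't be passed as an argument. Inside a call, my_dog=... is parsed as a keyword argument, so print(my_dog=Dog(mybreed='lab')) raises a TypeError because print() has no parameter named my_dog.
You can get the job done this way: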
my_dog = Dog(mybreed='lab') # assign the variable my_dog
print(my_dog) # print the variable my_dog
If you don't need the variable my_dog, you can just use print(Dog(mybreed='lab')).
If you prefer to assign a variable and pass it as an argument in one step (as C++ allows), you can use an assignment expression (the "walrus operator", :=) in Python 3.8 or later:
print(my_dog:=Dog(mybreed='lab'))
Just keep in mind that this operator may not be as convenient as you think!
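To make the three variants concrete, here is a minimal sketch. The `Dog` class below is a hypothetical stand-in, since the original one was only shown in the screenshots; the `__repr__` is added just so the printed output is readable.

```python
class Dog:
    # Hypothetical class; the asker's real definition is not shown.
    def __init__(self, mybreed):
        self.mybreed = mybreed

    def __repr__(self):
        return f"Dog(mybreed={self.mybreed!r})"

# my_dog=... inside a call is a keyword argument, which print() rejects.
try:
    print(my_dog=Dog(mybreed="lab"))
except TypeError as exc:
    keyword_error = exc

# Assign first, then print.
my_dog = Dog(mybreed="lab")
print(my_dog)

# Python 3.8+: assign and pass in one expression.
print(other_dog := Dog(mybreed="pug"))
```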

How to use strings as name to define a function?

What I want:
I want to use a string as the name of a function I am defining. Code to reproduce the result:
def creator(string):
    def string():
        return 0
    return string
So the creator function takes an input string, say 'test1' and the creator function should create a global function named test1. So whenever I call test1() (as a normal function call) it should return 0.
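Leaving aside doubts about whether you really need this: a def statement cannot take its name from a string, so the code above just defines a local function that is literally called string.
What you are looking for is a way to set a global from a local context. Plain assignment inside a function only binds local names, but functions are ordinary objects, so you can bind one to a dynamically chosen global name through the globals() dictionary, for example globals()[name] = func. A dictionary that maps names to functions is usually the cleaner design.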
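For what it's worth, here is a sketch of one workaround that is often suggested for this: build the function normally, then bind it under the requested name in the module's globals() dictionary. The inner name `func` and the `__name__` assignment are my own choices, and whether this is a good idea for the actual use case is a separate question.

```python
def creator(name):
    def func():
        return 0

    # Make tracebacks and repr() show the requested name.
    func.__name__ = name
    # Bind the function object to a global variable with that name.
    globals()[name] = func
    return func

creator("test1")
result = test1()  # test1 now exists as a global function
print(result)
```

Note that editors and linters cannot see names created this way, so a plain dictionary of functions may be easier to maintain.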

pandas DataFrame.query expression that returns all rows by default

I have discovered the pandas DataFrame.query method and it almost does exactly what I needed it to (and implemented my own parser for, since I hadn't realized it existed but really I should be using the standard method).
I would like my users to be able to specify the query in a configuration file. The syntax seems intuitive enough that I can expect my non-programmer (but engineer) users to figure it out.
There's just one thing missing: a way to select everything in the dataframe. Sometimes what my users want to use is every row, so they would put 'All' or something into that configuration option. In fact, that will be the default option.
I tried df.query('True') but that raised a KeyError. I tried df.query('1') but that returned the row with index 1. The empty string raised a ValueError.
The only things I can think of are 1) put an if clause every time I need to do this type of query (probably 3 or 4 times in the code) or 2) subclass DataFrame and either reimplement query, or add a query_with_all method:
import pandas as pd
class MyDataFrame(pd.DataFrame):
    def query_with_all(self, query_string):
        if query_string.lower() == 'all':
            return self
        else:
            return self.query(query_string)
And then use my own class every time instead of the pandas one. Is this the only way to do this?
Keep things simple, and use a function:
def query_with_all(data_frame, query_string):
    if query_string == "all":
        return data_frame
    return data_frame.query(query_string)
Whenever you need this type of query, just call the function with the data frame and the query string. There's no need for extra if statements or for subclassing pd.DataFrame.
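As a rough usage sketch, here is how the helper might be called with values read from a config file. The sample frame and query strings are invented, and the `strip().lower()` normalization is my own assumption so that 'All', 'all' and ' ALL ' are treated alike, as in the question's version.

```python
import pandas as pd

def query_with_all(data_frame, query_string):
    # Assumption: treat any casing of "all" as "select every row".
    if query_string.strip().lower() == "all":
        return data_frame
    return data_frame.query(query_string)

# Invented sample data.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

everything = query_with_all(df, "All")  # default config value
subset = query_with_all(df, "x > 1")    # user-supplied expression

print(everything)
print(subset)
```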
If you're restricted to using df.query, you can use a global variable
ALL = slice(None)
df.query('@ALL', engine='python')
If you're not allowed to use global variables, and if your DataFrame isn't MultiIndexed, you can use
df.query('tuple()')
All of these will properly handle NaN values.
df.query('ilevel_0 in ilevel_0') will always return the full dataframe, also when the index contains NaN values or even when the dataframe is completely empty.
In your particular case you could then define a global variable all_true = 'ilevel_0 in ilevel_0' (as suggested in the comments by Zero) so that your engineers could use the name of the global variable in their config file instead.
This statement is just a roundabout way to express the always-true query you already tried to write. ilevel_0 is a more formal way of making sure you are referring to the index. See the docs here for more details on using in and ilevel_0: https://pandas.pydata.org/pandas-docs/stable/indexing.html#the-query-method

Python: Convert Name of a Variable or Variable's Property into a String

I'm pretty sure there is a way to do this in Python, but I couldn't find it. Thanks for the help.
Customer.Name (Variable.Property)
I want to use the property's name directly in code rather than hard-coding it.
colName = Customer.Name.*toString()*
dataframe.drop(colName, axis=1)
If you only need the value as a string, just use the str() function. Note that str() converts the value a variable holds, not the variable's name.
For example:
vbl = True
print(str(vbl))
True
Customer.Name(str(Variable.Property))
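To show what str() actually returns, here is a tiny sketch contrasting the value with the name. The f-string trick is my own suggestion, not part of the answer above; it needs Python 3.8+ and only works for an expression written literally in the source.

```python
vbl = True

# str() converts the value held by the variable, not the variable's name.
value_text = str(vbl)

# The f-string "=" specifier renders "expression=value", so the part
# before "=" is the expression exactly as written in the source.
name_text = f"{vbl=}".split("=")[0]

print(value_text)
print(name_text)
```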
