I've been scratching my head all morning trying to understand why the following doesn't seem to work. The idea here is that if there is no column provided, run 1 set of code and if there is, then run another.
Let's say the value of df just holds 1 value = 'A'
def function_name(df, col):
    if col == None:
        df = df.str.lower()
    else:
        df[col] = df[col].str.lower()

function_name(df, None)
Expected Results: 'a'
Current Results: 'A'
If I was to run function_name(df, 'A'):
Expected Results: 'a'
Current Results: 'a'
Ideally, since None was passed in, the function should apply the first branch, but currently it behaves as if nothing happened. When I debug by printing inside the function, I can see that the code is doing the 'stuff', but the change is not visible once the function has finished. Any thoughts?
Since I can't comment, I'll post this as an answer.
df is acting as two data types: in the if block it is not an array, whereas in the else block it is acting as an array, so please check that. You can use df[0] in the first block.
Moreover, you mentioned that the function isn't returning anything. This is due to the missing return statement: the assignment df = ... inside the function only rebinds the local name and leaves the caller's object unchanged. So if you intend to return the new df, add return df at the end of the function.
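A minimal sketch of the return-based fix, assuming the input is a pandas Series when no column is given and a DataFrame otherwise. The name lower_values and the sample data are illustrative and do not come from the question.

```python
import pandas as pd


def lower_values(data, col=None):
    """Return a lower-cased copy instead of relying on in-place changes."""
    if col is None:
        # Rebinding a local name would be invisible to the caller,
        # so hand the new Series back explicitly.
        return data.str.lower()
    result = data.copy()  # leave the caller's DataFrame untouched
    result[col] = result[col].str.lower()
    return result


series = pd.Series(["A"])             # hypothetical sample data
frame = pd.DataFrame({"A": ["A"]})    # hypothetical sample data

lowered_series = lower_values(series)
lowered_frame = lower_values(frame, "A")
print(lowered_series.tolist())
print(lowered_frame["A"].tolist())
```

Returning a value in both branches keeps the two code paths consistent: the caller always writes result = lower_values(...), whichever branch runs.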
Let's say you're passing a df that just holds one value, 'A'. With the code
def function_name(df, col):
    if col is None:
        df = str(df).lower()
    else:
        df[col] = str(df).lower()
    print(df)

function_name('A', None)
you're expecting the output a when you pass 'A', but the function does not return any value. I tried printing the value of df inside the function, and it matches the expected output. Note that .str is an accessor that exists on pandas Series, not on plain Python strings, so calling df.str.lower() on the string 'A' raises AttributeError: 'str' object has no attribute 'str'. I hope this answer helps.
Related
I'm new to python and pandas but I have a problem I cannot wrap my head around.
I'm trying to add a new column to my DataFrame. To achieve that I use the assign() function.
Most of the examples on the internet are painfully trivial and I cannot find a solution for my problem.
What works:
my_dataset.assign(new_col=lambda x: my_custom_long_function(x['long_column']))

def my_custom_long_function(input):
    return input * 2
What doesn't work:
my_dataset.assign(new_col=lambda x: my_custom_string_function(x['string_column']))

def my_custom_string_function(input):
    return input.upper()
What confuses me is that in the debugger I can see that even for my_custom_long_function the parameter is a Series, not a long.
I just want to use the lambda function and pass a value of the column to my already written complicated functions. How do I do this?
Edit: The example here is just for demonstration purposes; the real code is basically an existing complex function that does not care about pandas types and needs a str as a parameter.
The column is a Series, and a Series doesn't have an upper method. To upper-case its values, you need to go through the .str accessor and call str.upper:
my_dataset.assign(new_col=lambda x: my_custom_string_function(x['string_column']))

def my_custom_string_function(input):
    return input.str.upper()
That said, I would use:
my_dataset['new column'] = my_dataset['string_column'].str.upper()
For efficiency.
Edit:
my_dataset['new column'] = my_dataset['string_column'].apply(lambda x: my_custom_string_function(x))

def my_custom_string_function(input):
    return input.upper()
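To combine this with assign() as in the question, a sketch along the following lines may help. It assumes the existing function takes one plain str and returns one value; the sample DataFrame is made up for illustration.

```python
import pandas as pd


def my_custom_string_function(value):
    # Stand-in for an existing function that expects a plain str.
    return value.upper()


# Hypothetical sample data.
my_dataset = pd.DataFrame({"string_column": ["abc", "Def"]})

# The lambda given to assign() receives the whole DataFrame. Series.apply
# then calls the function once per element, so it sees a str each time.
with_apply = my_dataset.assign(
    new_col=lambda frame: frame["string_column"].apply(my_custom_string_function)
)
print(with_apply["new_col"].tolist())
```

assign() returns a new DataFrame and leaves my_dataset as it was, so the result has to be stored in a variable.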
I believe this is a simple question but still want to get a quick and clear answer to my case:
def get_query_history(idx, url, archive_location):
    idx = idx + 1
    return idx  # I meant to return the idx's value (end up 1000 for every call) and used it in the next loop in main

# main:
idx = 1
while current <= end_date:
    with open(archive_location, 'a') as the_archive:
        get_query_history(idx, url, archive_location)  # I want to increase the idx every time I call the function
Apparently this is not the way I should take in python, can anyone enlighten me?
Here, I'll post it as an answer but I'll expand a bit.
Since you're returning idx increased value, just store it back in the 'main' scope:
idx = 1
while current <= end_date:
    with open(archive_location, 'a') as the_archive:
        idx = get_query_history(idx, url, archive_location)
    # make sure you update your `current` ;)
In some languages you can pass a variable to a function by reference, so that the function can change its value and you don't need to return anything. Python passes references to objects, but assigning to a parameter name inside a function only rebinds that local name. Since simple values such as integers are immutable, there is no way to change them in place, so the caller's variable keeps its old value.
This doesn't apply to mutable containers, though, so you could wrap your idx in a list and pass the list. In that case you wouldn't need return at all:
def get_query_history(idx, url, archive_location):
    idx[0] += 1
    # do whatever else

# in your main:
idx = [1]  # encapsulate the value in a list
while current <= end_date:
    with open(archive_location, 'a') as the_archive:
        get_query_history(idx, url, archive_location)  # notice, no return capture
    # make sure you update your `current` ;)
But generally, if you can return the value there is no need for these shenanigans, it's just to demonstrate that a function can modify the passed arguments under certain conditions.
And, finally, if you really want to force pass-by-reference behavior, you can totally hack Python to do even that, check this (and never use it in production!) ;)
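The difference between rebinding a name and mutating an object can be seen in isolation with a small sketch. The function names rebind and mutate are invented for this illustration.

```python
def rebind(counter):
    # Assignment creates a new local binding; the caller's name is untouched.
    counter = counter + 1
    return counter


def mutate(box):
    # Item assignment changes the list object that the caller also holds.
    box[0] += 1


idx = 1
rebind(idx)        # return value discarded, so nothing changes outside
unchanged = idx    # still 1

idx = rebind(idx)  # capturing the return value updates the caller's name

box = [1]
mutate(box)        # no return needed; the shared list is modified in place
print(unchanged, idx, box)
```

Capturing the return value is usually the clearer option, because the call site shows that idx changes.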
So basically I am writing some code to cross-check whether my data is consistent.
I have written the code below, but it keeps raising TypeError: argument of type 'NoneType' is not iterable. I have tried changing the code quite a few times, but the same error still comes up. Many thanks.
def checkdata(sex, school):
    if (sex == 'F') and ('boys school' in school):
        return 'inconsistent'
    if (sex == 'M') and ('girls school' in school):
        return 'inconsistent'
    return

def Dif():
    with arcpy.da.UpdateCursor(DATA_SET,
                               [sex, school]) as Cursor:
        for Cols in Cursor:
            Data = checkdata(Cols[0], Cols[1])
            if Data is not None:
                print(Data, " ", Cols)
In this instance the 'Cursor' variable is None; you can check this by printing it before it is used in the loop.
When the loop tries to iterate over None it raises the error shown.
UPDATE:
In that case I would suggest that school is None and the reasoning above holds. Please include the full error message when asking questions like this.
Ah. For one of your data records, you must be getting a None as the value of school. The TypeError is being thrown by the in operator, which expects a sequence type as the second operand. None isn't a sequence type - it's None ;-)
Try adding print(sex, school) as the first line of checkdata() to confirm the parameters are what you expect.
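Once a None value is confirmed, one option is to handle it explicitly before using the in operator. The sketch below is only one possible approach: returning the label 'missing school' for such records is an assumption, not something the question asks for.

```python
def checkdata(sex, school):
    # Assumption: records without a school value get their own label
    # instead of crashing the membership test below.
    if school is None:
        return 'missing school'
    if sex == 'F' and 'boys school' in school:
        return 'inconsistent'
    if sex == 'M' and 'girls school' in school:
        return 'inconsistent'
    return None


# The unguarded membership test is what raises the TypeError.
try:
    'boys school' in None
    raised = False
except TypeError:
    raised = True

print(checkdata('F', None), raised)
```

Whether missing values should be reported, skipped, or treated as consistent depends on the data set, so the first branch is the part to adapt.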
Rather than explicitly specifying the DataFrame columns in the code below, I'm trying to give an option of passing the name of the data frame in itself, without much success.
The code below gives a "ValueError: Wrong number of dimensions" error.
I've tried another couple of ideas but they all lead to errors of one form or another.
Apart from this issue, when the parameters are passed as explicit DataFrame columns, p as a single column, and q as a list of columns, the code works as desired. Is there a clever (or indeed any) way of passing in the data frame so the columns can be assigned to it implicitly?
def cdf(p, q=[], datafr=None):
    if datafr!=None:
        p = datafr[p]
        for i in range(len(q)):
            q[i]=datafr[q[i]]
    ...
    # (calculate conditional probability tables for p|q)
to summarize:
current usage:
cdf(df['var1'], [df['var2'], df['var3']])
desired usage:
cdf('var1', ['var2', 'var3'], datafr=df)
Change if datafr != None: to if datafr is not None:
With != pandas compares the DataFrame to None element-wise instead of producing a single True or False, so the if statement cannot evaluate the result and an error is raised. is checks whether datafr and None are the same object, which is a stricter identity check and always yields a plain boolean. See this explanation.
Additional tips:
Python iterates over lists directly, and enumerate gives you the index as well:
# change this
for i in range(len(q)):
    q[i] = datafr[q[i]]
# to this:
for i, name in enumerate(q):
    q[i] = datafr[name]
If q is a required parameter don't do q = [ ] when defining your function. If it is an optional parameter, ignore me.
Python can use position to match the arguments passed in the function call with the parameters in the definition.
cdf('var1', ['var2', 'var3'], datafr=df)
#can be written as:
cdf('var1', ['var2', 'var3'], df)
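Putting these tips together, the lookup part of the function could be sketched as follows. The name resolve_columns is hypothetical, and a plain dict stands in for the DataFrame so the example has no dependencies; indexing a real DataFrame with a column name works the same way and returns a Series.

```python
def resolve_columns(p, q=None, datafr=None):
    """Replace column names with column data when a table is supplied."""
    # Copy q so the caller's list is not modified, and avoid a shared
    # mutable default by using None as the placeholder.
    q = [] if q is None else list(q)
    if datafr is not None:  # identity check, safe for any object
        p = datafr[p]
        q = [datafr[name] for name in q]
    return p, q


# A dict of lists standing in for a DataFrame (assumption for illustration).
table = {"var1": [1, 2], "var2": [3, 4], "var3": [5, 6]}
names = ["var2", "var3"]

p_col, q_cols = resolve_columns("var1", names, datafr=table)
print(p_col, q_cols, names)
```

Building a new list rather than assigning into q[i] means the list of names passed by the caller can be reused afterwards.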
I am still new to Python and have been reviewing the following code not written by me.
Could someone please explain how the first instance of the variable "clean" is able to be called in the check_arguments function? It seems to me as though it is calling an as yet undefined variable. The code works, but shouldn't that call to "clean" produce an error?
To be clear the bit I am referring to is this.
def check_arguments(ages):
    clean, ages_list = parse_ages_argument(ages)
The full code is as follows...
def check_arguments(ages):
    clean, ages_list = parse_ages_argument(ages)
    if clean != True:
        print('invalid ages: %s' % ages)
    return ages_list

def parse_ages_argument(ages):
    clean = True
    ages_list = []
    ages_string_list = ages.split(',')
    for age_string in ages_string_list:
        if age_string.isdigit() != True:
            clean = False
            break
    for age_string in ages_string_list:
        try:
            ages_list.append(int(age_string))
        except ValueError:
            clean = False
            break
    ages_list.sort(reverse=True)
    return clean, ages_list

ages_list = check_arguments('1,2,3')
print(ages_list)
Python doesn't have a comma operator. What you are seeing is sequence unpacking.
>>> a, b = 1, 2
>>> print(a, b)
1 2
"how the first instance of the variable 'clean' is able to be called in the check_arguments function?"
This is a nonsensical thing to ask in the first place, since variables aren't called; functions are. Further, "instance" normally means "a value that is of some class type", not "occurrence of the thing in question in the code listing".
That said: the line of code in question does not use an undefined variable clean. It defines the variable clean (and ages_list at the same time). parse_ages_argument returns two values (as you can see by examining its return statement). The two returned values are assigned to the two variables, respectively.
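A stripped-down sketch of the same mechanism, with an invented helper parse_pair in place of parse_ages_argument, shows the tuple being built and then unpacked.

```python
def parse_pair():
    clean = True
    values = [3, 2, 1]
    # "return a, b" packs both values into a single tuple.
    return clean, values


result = parse_pair()       # one object: a tuple with two items
clean, ages_list = result   # unpacking creates both names in one step
print(type(result).__name__, clean, ages_list)
```

The names clean and ages_list on the left are new variables in the calling scope. They only happen to share their names with the locals inside the function.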