I have a list of different names. I want to take one name at a time and match it with values in a particular column in a data frame. If the conditions are met, the following calculation will be performed:
orderno == orderno + 1
Unfortunately, the code does not seem to work. Is there anything that I can do to make sure it works?
DfCustomers['orderno'] = 0
for i in uniquecustomer:
    if i == "DfCustomers['EntityName']":
        orderno == orderno + 1
Remove the quotes (""). By writing
if i == "DfCustomers['EntityName']":
you compare the variable i with the actual string "DfCustomers['EntityName']" instead of the variable DfCustomers['EntityName']. Try to remove the quotes and print out the variable to get a feeling for it, e.g.
print("DfCustomers['EntityName']")
vs
print(DfCustomers['EntityName'])
First remove the quotes around "DfCustomers['EntityName']" so that you are not comparing against that literal string. Then, within your logic, the orderno variable should be incremented by 1 (using =), not compared with its value + 1 (using ==). The new code could look something like this:
DfCustomers['orderno'] = 0
for i in uniquecustomer:
    if i == DfCustomers['EntityName']:
        orderno = orderno + 1
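One caveat with the loop above: if DfCustomers is a pandas DataFrame, i == DfCustomers['EntityName'] produces a boolean Series (one value per row), and using a Series directly as an if condition raises a ValueError. Below is a minimal sketch of one way around that, assuming the goal is to add 1 to an orderno column on every row whose EntityName equals the current name. The sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
DfCustomers = pd.DataFrame({'EntityName': ['Acme', 'Bolt', 'Acme', 'Cord']})
uniquecustomer = ['Acme', 'Bolt']

DfCustomers['orderno'] = 0
for name in uniquecustomer:
    # Comparing a column with a scalar gives one True/False per row,
    # so the result is used as a row mask instead of an `if` condition.
    mask = DfCustomers['EntityName'] == name
    DfCustomers.loc[mask, 'orderno'] += 1

print(DfCustomers)
```

Rows for names that are not in uniquecustomer keep orderno at 0.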
I am trying to make my code look better and create functions that do all the work from running just one line, but it is not working as intended. I am currently pulling data from a table in a PDF into a pandas dataframe. From there I have 4 functions, all calling each other and finally returning the updated dataframe. I can see that it is fully updated when I print it in the last method. However, I am unable to access and use that updated dataframe, even after I return it.
My code is as follows
def data_cleaner(dataFrame):
    #removing random rows
    removed = dataFrame.drop(columns=['Unnamed: 1','Unnamed: 2','Unnamed: 4','Unnamed: 5','Unnamed: 7','Unnamed: 9','Unnamed: 11','Unnamed: 13','Unnamed: 15','Unnamed: 17','Unnamed: 19'])
    #call next method
    col_combiner(removed)

def col_combiner(dataFrame):
    #Grabbing first and second row of table to combine
    first_row = dataFrame.iloc[0]
    second_row = dataFrame.iloc[1]
    #List to combine columns
    newColNames = []
    #Run through each row and combine them into one name
    for i,j in zip(first_row,second_row):
        #Check to see if they are not strings, if they are not convert it
        if not isinstance(i,str):
            i = str(i)
        if not isinstance(j,str):
            j = str(j)
        newString = ''
        #Check for double NAN case and change it to Expenses
        if i == 'nan' and j == 'nan':
            i = 'Expenses'
            newString = newString + i
        #Check for leading NAN and remove it
        elif i == 'nan':
            newString = newString + j
        else:
            newString = newString + i + ' ' + j
        newColNames.append(newString)
    #Now update the dataframes column names
    dataFrame.columns = newColNames
    #Remove the name rows since they are now the column names
    dataFrame = dataFrame.iloc[2:,:]
    #Going to clean the values in the DF
    clean_numbers(dataFrame)

def clean_numbers(dataFrame):
    #Fill NAN values with 0
    noNan = dataFrame.fillna(0)
    #Pull each column, clean the values, then put it back
    for i in range(noNan.shape[1]):
        colList = noNan.iloc[:,i].tolist()
        #calling to clean the column so that it is all ints
        col_checker(colList)
        noNan.iloc[:,i] = colList
    return noNan

def col_checker(col):
    #Going through, checking and cleaning
    for i in range(len(col)):
        #print(type(colList[i]))
        if isinstance(col[i],str):
            col[i] = col[i].replace(',','')
            if col[i].isdigit():
                #print('not here')
                col[i] = int(col[i])
            #If it is not a number then make it 0
            else:
                col[i] = 0
Then when I run this:
doesThisWork = data_cleaner(cleaner)
type(doesThisWork)
I get NoneType. I might be doing this the long way as I am new to this, so any advice is much appreciated!
You are getting NoneType because data_cleaner does not have a return statement, so when it finishes executing it automatically returns None. It is the return value of a function that is assigned to the variable var in a statement like this:
var = fun(x)
Whether your dataframe cleaner is changed by the function data_cleaner is a different matter entirely. That can happen because dataframes are mutable objects in Python.
In other words, your function can read your dataframe and change it, so that after the function call cleaner is different from before. At the same time, your function can return a value (which it currently doesn't), and this value will be assigned to doesThisWork.
Usually you should prefer that a function does only one thing, so writing a function that both changes its argument and returns a value is usually bad practice.
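As a minimal sketch of the return-value point, using plain lists instead of a dataframe and hypothetical function names: when functions call each other in a chain, every function in the chain has to return the result it gets back from the next one, otherwise the outermost call evaluates to None.

```python
def strip_commas(values):
    # Return a new list rather than modifying the argument in place.
    return [v.replace(',', '') for v in values]


def to_ints(values):
    # Non-numeric strings become 0, mirroring the question's col_checker.
    return [int(v) if v.isdigit() else 0 for v in values]


def clean(values):
    # The result of the inner calls is passed back to the caller.
    return to_ints(strip_commas(values))


def clean_without_return(values):
    # Same work, but the result is discarded, so the caller receives None.
    to_ints(strip_commas(values))


result = clean(['1,200', 'n/a', '35'])
lost = clean_without_return(['1,200', 'n/a', '35'])
print(result, lost)
```

Applied to the question's code, this would mean writing return col_combiner(removed) in data_cleaner and return clean_numbers(dataFrame) in col_combiner.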
I am cleaning a CSV file using Python. My goal is to find any numbers that do not start with 0 and put a 0 in front of them.
Example existing data:
Expected output:
A 0 will be prepended to each number that does not start with 0.
My current code:
The code below filters the numbers that start with 1 and then puts a 0 in front of them.
I managed to put a zero in front of each number that does not start with zero, but I cannot write the result back into the data frame.
for i in eg1['MOBILENO']:
    if re.findall(r'^["1"]+', i):
        z = "0"+ i
        print(z)
You can try the following example
df['MOBILENO'] = df['MOBILENO'].apply(lambda x: "0" + x if re.findall(r'^["1"]+', x) else x)
I have tried this and it worked, check this once.
The indentation may not have come through properly, so check it when you paste the code.
for i in range(len(eg1)):
    if eg1.loc[i, 'MOBILENO'][0] != '0':
        eg1.loc[i, 'MOBILENO'] = '0' + eg1.loc[i, 'MOBILENO']
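The same update can also be written without an explicit loop. This is a minimal sketch, assuming MOBILENO holds strings. The sample numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'MOBILENO': ['0123456789', '123456789', '198765432']})

# True for every row whose number does not already start with '0'.
needs_zero = ~df['MOBILENO'].str.startswith('0')

# Prepend '0' only on those rows and write the result back into the frame.
df.loc[needs_zero, 'MOBILENO'] = '0' + df.loc[needs_zero, 'MOBILENO']

print(df)
```

Assigning through df.loc is what stores the change in the data frame, which is the step the loop in the question was missing.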
I am trying to make the switch from STATA to Python for data analysis and I'm running into some hiccups that I'd like some help with. I am attempting to create a secondary variable based on some values in an original variable. I want to create a binary variable which identifies fall accidents (E-codes E880.xx-E888.xx) with a value of 1, and all other E-codes with a value of 0. The original variable is a list of ICD-9 codes with over 10,000 rows, so manual imputation isn't possible.
In STATA the code would look something like this:
newvar= 0
replace newvar = 1 if ecode_variable == "E880"
replace newvar = 1 if ecode_variable == "E881"
etc
I tried a similar statement in Python, but it's not working:
data['ecode_fall'] = 1 if data['ecode'] == 'E880'
Is this type of work possible in Python? Is there a function in the numpy or pandas packages that could help with this?
I've also tried creating a dictionary variable which calls the fall injury codes 1 and applying it to the variable to no avail.
Put the if first.
if data['ecode'] == 'E880': data['ecode_fall'] = 1
You can break it out into two lines like this:
if data['ecode'] == 'E880':
    data['ecode_fall'] = 1
Or, if you include an else clause, you can have it in one line, with syntax similar to your STATA code:
data['ecode_fall'] = 1 if data['ecode'] == 'E880' else None
Following from the other answers, you can also check multiple values at once like so:
if data['ecode'] in ('E880', 'E881', ...):
    data['ecode_fall'] = 1
This way you only need one if statement per distinct value of data['ecode_fall'].
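The if statements above work when data['ecode'] is a single value. If data is a pandas DataFrame (an assumption: the question mentions pandas but does not show how data is loaded), data['ecode'] is a whole column, and using a column comparison as an if condition raises a ValueError. Here is a minimal sketch of a column-wise version under that assumption, with invented sample codes:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({'ecode': ['E880.1', 'E885.9', 'E812.0', 'E888', 'E889.0']})

# str.match anchors at the start of each string, so this is True for
# codes beginning with E880 through E888.
is_fall = data['ecode'].str.match(r'E88[0-8]')

# Convert True/False to 1/0 to get the binary variable.
data['ecode_fall'] = is_fall.astype(int)

print(data)
```

This replaces the chain of replace statements with a single pattern. If the column can contain missing values, they would need to be handled before the conversion to int.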
I'm working on a homework assignment where I have to figure out how to slice a string starting at the index of the first occurrence of a sub string to the index of the second occurrence of that sub string. The problem is specifically trying to slice the string "juxtaposition" from the first "t" to the second "t" and we are supposed to use .find() to do so, but I haven't been able to figure it out.
I've tried to use WHILE to create a loop to find the index of the different occurrences of the sub string and then use that to slice the string, but I haven't been able to make it work.
This is what I've been able to come up with so far:
long_word = "juxtaposition"
location = long_word.find("t")
start_index = location
stop_index = 0
while location != -1:
    stop_index + long_word.find("t",location + 1)
print(long_word[int(start_index),int(stop_index)])
When I ran this it didn't show an error message but it doesn't show an output either, and in order to edit the cell again I have to interrupt the kernel.
There are a million ways to approach this. One, which is a bit ugly but interesting for learning is to slice your string such as: mystring[start:stop] where you specify the start point as the first .find() and the stop point as the second .find().
The stop point is interesting, because you're passing .find()+1 as the start point of .find() so it skips the first instance of the letter. The final +1 is to include the 't' in the output if you want it.
Typically in python this would be frowned upon because it's unnecessarily unreadable, but I thought I'd post it to give you an idea of how flexible you can be in solving these problems
long_word[long_word.find('t'):long_word.find('t',long_word.find('t')+1)+1]
Output
'taposit'
def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

long_word = "juxtaposition"
location = "t"
locations = (list(find_all(long_word, location)))
substr = long_word[locations[0]:locations[1]+1]
print (substr)
output:
taposit
The find method on strings in Python accepts a second parameter for the index in the string to begin searching. In order to find the second occurrence of the substring's index, provide the first occurrence's index + 1 as the second parameter to find:
def get_substring(long_word, search):
    first_occurence_idx = long_word.find(search)
    if first_occurence_idx == -1:
        return
    # for second call of `find`, provide start param so that it only searches
    # after the first occurence
    second_occurence_idx = long_word.find(search, first_occurence_idx + 1)
    if second_occurence_idx == -1:
        return
    return long_word[first_occurence_idx:second_occurence_idx + len(search)]

# example provided
assert get_substring('juxtaposition', 't') == 'taposit'
# case where search occurs once in long_word
assert get_substring('juxtaposition', 'x') is None
# case where search is not in long_word
assert get_substring('juxtaposition', 'z') is None
# case where len(search) > 1 and search occurs twice
assert get_substring('juxtaposition justice', 'ju') == 'juxtaposition ju'
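For reference, the while loop in the question never ends because location is never reassigned inside the loop. The expression stop_index + long_word.find(...) computes a value but does not store it, and the final slice uses a comma where a colon is needed. Here is a minimal sketch of a corrected loop-based version, keeping the question's variable names where possible (indexes is an added name):

```python
long_word = "juxtaposition"

indexes = []
location = long_word.find("t")
# Stop when there are no more matches or once two have been collected.
while location != -1 and len(indexes) < 2:
    indexes.append(location)
    # Reassign location so the loop advances past the current match.
    location = long_word.find("t", location + 1)

if len(indexes) == 2:
    start_index, stop_index = indexes
    # The +1 includes the second "t" in the slice.
    print(long_word[start_index:stop_index + 1])
```

The len(indexes) == 2 check covers the case where the substring occurs fewer than twice, in which case nothing is printed.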
I have some python code to change the severities of incoming SNMP traps on the NMS I am using.
The incoming SNMP traps contain objects that are ranges of numbers for a given severity level. The code below works if the incoming object numbers are singular (1, 2, 3, 4, 5, etc.), but it doesn't work when trying to match a regex number range.
## This gets the alarmTrapSeverity function and creates a variable called Severity to hold the value
if getattr(evt, 'alarmTrapSeverity', None) is not None:
    Severity = getattr(evt, 'alarmTrapSeverity')
    ## This part runs through the Severity to assign the correct value
    if str(Severity) == '0':
        evt.severity = 0
    elif str(Severity) == '([1-9]|1[0-9])':
        evt.severity = 1
Please could you advise the correct way to do this. My regex skills are still developing.
If I am understanding this correctly, in the else-if statement you would like to perform a regular expression search in order to confirm a match. My approach would look like this,
import re

## This gets the alarmTrapSeverity function and creates a variable called Severity to hold the value
if getattr(evt, 'alarmTrapSeverity', None) is not None:
    Severity = getattr(evt, 'alarmTrapSeverity')
    regex = re.compile(r'[1-9]|1[0-9]')
    ## This part runs through the Severity to assign the correct value
    if str(Severity) == '0':
        evt.severity = 0
    ## fullmatch requires the whole string to match, so '25' is not accepted
    elif regex.fullmatch(str(Severity)) is not None:
        evt.severity = 1
This checks whether str(Severity), taken as a whole, matches the pattern, which here means a number from 1 to 19. If it does, evt.severity is set to 1. Using regex.search instead would accept any string that merely contains a matching substring, such as '25'.
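To illustrate the difference between matching a substring and matching the whole string with the question's pattern, here is a small standalone check. It is only a sketch: accepted is a hypothetical helper and does not touch the NMS evt object.

```python
import re

# The pattern from the question, without the capturing group.
pattern = re.compile(r'[1-9]|1[0-9]')


def accepted(match_method, upper=100):
    """Return the integers in [0, upper) accepted by the given match method."""
    return [n for n in range(upper) if match_method(str(n)) is not None]


whole = accepted(pattern.fullmatch)    # the entire string must match
substring = accepted(pattern.search)   # any matching substring is enough

print(whole == list(range(1, 20)))
print(25 in substring)
```

With fullmatch only 1 through 19 are accepted, whereas search also accepts values such as 25 because they contain a digit from 1 to 9.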
Also, looking back at your question, if you were having issues with finding numbers between 1-19 with that regex, another example which works might be,
"10|1?[1-9]"