I am cleaning a CSV file using Python. My goal is to find any numbers that do not start with 0 and prepend a 0 to them.
Example of the existing data:
Expected output:
A 0 will be prepended to each number that does not start with 0.
My current code:
The logic of the code below is to find the numbers that start with 1 and then prepend a 0 to them.
I managed to prepend a zero to each number that does not start with zero, but I cannot write the result back into the data frame.
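import re

for i in eg1['MOBILENO']:
    # '^1' matches a leading "1"; the character class in '^["1"]+' also matched a leading double quote
    if re.findall(r'^1', i):
        z = "0" + i
        print(z)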
You can try the following:
import re
df['MOBILENO'] = df['MOBILENO'].apply(lambda x: "0" + x if re.match(r'1', x) else x)
I have tried this and it worked.
The indentation in your code was lost; check it when you paste code.
# Assumes eg1 has the default 0..n-1 index and MOBILENO holds strings
for i in range(len(eg1)):
    if eg1.loc[i, 'MOBILENO'][0] != '0':
        eg1.loc[i, 'MOBILENO'] = '0' + eg1.loc[i, 'MOBILENO']
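A vectorized alternative, sketched under the assumption that the phone numbers can be handled as text: instead of looping row by row, build the new column in one step with the pandas string methods and Series.where. The sample values are made up for illustration. If the file is read with pd.read_csv, passing dtype={'MOBILENO': str} keeps pandas from parsing the numbers as integers and dropping existing leading zeros.

```python
import pandas as pd

# Hypothetical sample; in the question this DataFrame is eg1, loaded from the CSV
eg1 = pd.DataFrame({"MOBILENO": ["0123456789", "123456789", "198765432"]})

# Work on the column as text so that "0" + value is string concatenation
numbers = eg1["MOBILENO"].astype(str)

# Keep values that already start with "0"; prepend "0" to all others
eg1["MOBILENO"] = numbers.where(numbers.str.startswith("0"), "0" + numbers)

print(eg1["MOBILENO"].tolist())
# ['0123456789', '0123456789', '0198765432']
```

Because the whole column is assigned at once, the result is stored in the DataFrame directly, which was the part the loop version could not do.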
take = randint(0, len(teacherClass[teacher])-1)
print(take)
print(teacherClass)
print(teacher)
print(teacherClass[teacher])
triesDone = 0
while triesDone < len(teacherClass[teacher]):
    cp = teacherClass[teacher][take]
    if (cp not in (blocks[teacher][day])) and (blocksS[cp][day][block] == ""):
        blocks[teacher][day][block] = cp
        blocksS[cp][day][block] = teacherSub[teacher]
    take +=1
    triesDone += 1
    if take == len(teacherClass[teacher])-1:
        take = 0
When I run the program, this part is eventually reached and works as intended at first, but then line 8 (cp = teacherClass[teacher][take]) raises "IndexError: list index out of range".
To understand the problem, I printed the entire dictionary (teacherClass) and the indices used (teacher and take), but even so it seems that line 8 should work.
Output I am getting:
Output with list and index
Please help me understand the problem and find a solution. Thanks.
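randint includes both endpoints, so the first line can set take to len(teacherClass[teacher])-1. Later, take += 1 makes it equal to len(teacherClass[teacher]), which is past the last valid index. The check take == len(teacherClass[teacher])-1 is then false, so take = 0 is never executed and the next teacherClass[teacher][take] raises IndexError.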
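Did you mean to wrap around once take runs past the last index? Comparing against len(teacherClass[teacher]) instead of len(teacherClass[teacher])-1 fixes the crash and also stops the last element from being skipped:
if take >= len(teacherClass[teacher]):
    take = 0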
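Another way to avoid the off-by-one entirely is to let the modulo operator do the wrap-around, so the index can never leave the valid range. The sketch below is hypothetical: try_candidates and is_free are illustrative names, with is_free standing in for the question's check on blocks and blocksS.

```python
from random import randint


def try_candidates(candidates, is_free):
    """Visit each candidate once, starting at a random index and wrapping around.

    Returns the first candidate for which is_free(candidate) is true, else None.
    """
    n = len(candidates)
    if n == 0:
        return None
    start = randint(0, n - 1)  # randint includes both endpoints
    for offset in range(n):
        # (start + offset) % n is always in 0..n-1, so no IndexError
        cp = candidates[(start + offset) % n]
        if is_free(cp):
            return cp
    return None


print(try_candidates(["7A", "7B", "7C"], lambda cp: cp == "7C"))  # 7C
```

Every candidate is visited exactly once regardless of the random starting point, so no separate triesDone counter or reset check is needed.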
I have a list of different names. I want to take one name at a time and match it with values in a particular column in a data frame. If the conditions are met, the following calculation will be performed:
orderno == orderno + 1
Unfortunately, the code does not seem to work. What can I do to make it work?
DfCustomers['orderno'] = 0
for i in uniquecustomer:
    if i == "DfCustomers['EntityName']":
        orderno == orderno + 1
Remove the quotes (""). By writing
if i == "DfCustomers['EntityName']":
you compare the variable i with the actual string "DfCustomers['EntityName']" instead of the variable DfCustomers['EntityName']. Try to remove the quotes and print out the variable to get a feeling for it, e.g.
print("DfCustomers['EntityName']")
vs
print(DfCustomers['EntityName'])
Try first removing the quotes around the "DfCustomers['EntityName']" so as to not just compare directly to that string. Then, within your logic the orderno variable should be incremented by 1, not compared to its value + 1. The new code could look something like this:
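DfCustomers['orderno'] = 0
for i in uniquecustomer:
    # Comparing a name with the whole column gives one True/False per row,
    # so use it as a mask rather than in a plain if statement
    mask = DfCustomers['EntityName'] == i
    DfCustomers.loc[mask, 'orderno'] += 1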
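To see why the comparison needs care once the quotes are removed, the small example below (with made-up names) compares one string against a whole column. The result is a boolean Series with one value per row; using it directly in an if raises ValueError, while .any() or .sum() reduce it to a single answer.

```python
import pandas as pd

# Made-up data standing in for DfCustomers
customers = pd.DataFrame({"EntityName": ["Acme", "Bolt", "Acme"]})

mask = customers["EntityName"] == "Acme"  # element-wise: one bool per row
print(mask.tolist())  # [True, False, True]

try:
    if mask:  # a Series has no single truth value
        pass
except ValueError as exc:
    print("ValueError:", exc)

print(mask.any())  # True: at least one row matches
print(int(mask.sum()))  # 2: number of matching rows
```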
I'm working on a homework assignment where I have to slice a string from the index of the first occurrence of a substring to the index of the second occurrence of that substring. Specifically, the task is to slice the string "juxtaposition" from the first "t" to the second "t", and we are supposed to use .find() to do so, but I haven't been able to figure it out.
I've tried using a while loop to find the indexes of the different occurrences of the substring and then use them to slice the string, but I haven't been able to make it work.
This is what I've been able to come up with so far:
long_word = "juxtaposition"
location = long_word.find("t")
start_index = location
stop_index = 0
while location != -1:
    stop_index + long_word.find("t",location + 1)
print(long_word[int(start_index),int(stop_index)])
When I ran this, it didn't show an error message, but it didn't show any output either, and in order to edit the cell again I had to interrupt the kernel.
There are a million ways to approach this. One, which is a bit ugly but interesting for learning, is to slice your string as mystring[start:stop], where start is the result of the first .find() and stop is the result of a second .find().
The stop point is the interesting part: you pass the first .find() result + 1 as the start argument of the second .find(), so it skips the first occurrence of the letter. The final +1 includes the second 't' in the output, if you want it.
Typically in Python this would be frowned upon because it's unnecessarily unreadable, but I thought I'd post it to give you an idea of how flexible you can be in solving these problems.
long_word[long_word.find('t'):long_word.find('t',long_word.find('t')+1)+1]
Output
'taposit'
def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += len(sub)  # use start += 1 to find overlapping matches
long_word = "juxtaposition"
location = "t"
locations = list(find_all(long_word, location))
substr = long_word[locations[0]:locations[1]+1]
print(substr)
output:
taposit
The find method on strings in Python accepts a second parameter for the index in the string to begin searching. In order to find the second occurrence of the substring's index, provide the first occurrence's index + 1 as the second parameter to find:
def get_substring(long_word, search):
    first_occurrence_idx = long_word.find(search)
    if first_occurrence_idx == -1:
        return None
    # for the second call of `find`, provide the start param so that it only searches
    # after the first occurrence
    second_occurrence_idx = long_word.find(search, first_occurrence_idx + 1)
    if second_occurrence_idx == -1:
        return None
    return long_word[first_occurrence_idx:second_occurrence_idx + len(search)]
# example provided
assert get_substring('juxtaposition', 't') == 'taposit'
# case where search occurs once in long_word
assert get_substring('juxtaposition', 'x') is None
# case where search is not in long_word
assert get_substring('juxtaposition', 'z') is None
# case where len(search) > 1 and search occurs twice
assert get_substring('juxtaposition justice', 'ju') == 'juxtaposition ju'
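The while-loop idea from the question can also be made to work. The original loop never finishes because location is never reassigned, and long_word[start, stop] uses a comma where slicing needs a colon. A minimal sketch that fixes both (the function name is just illustrative):

```python
def slice_between_first_two(word, sub):
    """Return word from the first to the second occurrence of sub (inclusive), or None."""
    positions = []
    location = word.find(sub)
    while location != -1 and len(positions) < 2:
        positions.append(location)
        # Reassign location so the loop makes progress (the original never did)
        location = word.find(sub, location + 1)
    if len(positions) < 2:
        return None
    start_index, stop_index = positions
    # Slicing uses a colon, not a comma: word[start:stop]
    return word[start_index:stop_index + len(sub)]


print(slice_between_first_two("juxtaposition", "t"))  # taposit
```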
I'm using the NumPy Python library to run large-scale edits on a .csv file, with this Python code:
import numpy as np
def main():
    try:
        e,a,ad,c,s,z,ca,fn,ln,p,p2,g,ssn,cn,com,dob,doh,em = np.loadtxt('c:\wamp\www\_quac\carryover_data\SI\Employees.csv',delimiter=',',unpack=True,dtype='str')
        x=0
        dob = dob.split('/')
        for digit in dob:
            if len(digit) == 1:
                digit = str('0'+digit)
        dob = str(dob[2]+'-'+dob[0]+'-'+dob[1])
        doh = doh.split('/')
        for digit in doh:
            if len(digit) == 1:
                digit = str('0'+digit)
        doh = str(doh[2]+'-'+doh[0]+'-'+doh[1])
        for eID in e:
            saveLine=eID+','+a[x]+','+ad[x]+','+c[x]+','+s[x]+','+z[x]+','+ca[x]+','+fn[x]+','+ln[x]+','+p[x]+','+p2[x]+','+g[x]+','+ssn[x]+','+cn[x]+','+com[x]+','+dob[x]+','+doh[x]+','+em[x]+'\n'
            saveFile = open('fixedEmployees.csv','a')
            saveFile.write(saveLine)
            saveFile.close()
            x+=1
    except Exception, e:
        print str(e)
main()
dob and doh contain date strings such as 4/26/2012, and I'm trying to convert them to MySQL-friendly DATE values such as 2012-04-26. The error printed when I run this script is
cannot set an array element with a sequence
It does not specify a line, so I don't know what it really means. I'm pretty new to Python; I've checked other questions with this same error, but I can't make sense of their code. Any help is much appreciated.
Try using zfill to reformat the date string so you can have a '0' before your '4'. (zfill pads a string on the left with zeros to fill the width.)
doh = '4/26/2012'
doh = doh.split('/')
for i, s in enumerate(doh):
    doh[i] = s.zfill(2)
doh = doh[2]+'-'+doh[0]+'-'+doh[1]
# result: '2012-04-26'
As for the "cannot set an array element with a sequence" error, it would help to know where it occurs. My guess is that something is wrong with the structure of the array.
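Instead of splitting and padding by hand, the standard library's datetime module can parse and reformat the date in one step, and it also rejects impossible dates such as 2/30/2012. This sketch assumes every value is in month/day/year order; to_mysql_date is a hypothetical helper name.

```python
from datetime import datetime


def to_mysql_date(us_date):
    """Convert an M/D/YYYY string such as '4/26/2012' to 'YYYY-MM-DD'."""
    # %m and %d also accept single-digit values like '4'
    return datetime.strptime(us_date, "%m/%d/%Y").strftime("%Y-%m-%d")


print(to_mysql_date("4/26/2012"))  # 2012-04-26
```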
OK, to solve it I had to do a couple of things. After removing the try/except block, I found that the error was on the np.loadtxt line (the one with e,a,ad,c,s etc.). I couldn't get rid of the problem until I copied only the two columns I wanted to work on into a new file and wrote a separate program for those.
Then I had to create a .txt instead of a .csv, because Excel auto-formats the dates and changes the values before I can even touch them. As far as I could find, there is no way to turn the date auto-formatting off when Excel opens a .csv, which is a serious problem with Excel. So here's my solution for this NumPy script (it changes the first column and keeps the second the same):
import numpy as np
def main():
    # Read both columns as strings so NumPy leaves the dates untouched
    dob, doh = np.loadtxt('temp.csv',
                          delimiter=',',
                          unpack=True,
                          dtype='str')
    # Append mode, as before: rerunning the script adds the lines again
    with open('__newTemp.txt', 'a') as saveFile:
        for x, eachDate in enumerate(dob):
            if any(c.isalpha() for c in eachDate):
                # Text such as a header row is copied unchanged
                newDate = eachDate
            elif eachDate == '':
                newDate = ''
            else:
                # M/D/YYYY -> YYYY-MM-DD, padding single-digit parts with a 0
                sp = ['0' + d if len(d) == 1 else d for d in eachDate.split('/')]
                newDate = sp[2] + '-' + sp[0] + '-' + sp[1]
            print(eachDate + '--->' + newDate)
            # Creates a .txt file with the edited dates next to the unchanged second column
            saveFile.write(newDate + ',' + doh[x] + '\n')
main()
I then used Data->Import from text with "TEXT" format option in Excel to get the column into my .csv. I realize this is probably bulky and noobish but it got the job done :3
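For comparison, the standard csv module can do the whole job without NumPy and without going through Excel: it handles quoted fields that contain commas (one possible, unconfirmed reason a file might not split into the expected number of columns) and writes the result back as a .csv. This is a sketch; fix_dates is a hypothetical helper, and the column names dob and doh are assumed to appear in the header row.

```python
import csv
import io
from datetime import datetime


def fix_dates(in_file, out_file, date_columns):
    """Copy CSV rows, rewriting M/D/YYYY values in date_columns as YYYY-MM-DD."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in date_columns:
            if row[col]:  # leave empty cells empty
                row[col] = datetime.strptime(row[col], "%m/%d/%Y").strftime("%Y-%m-%d")
        writer.writerow(row)


# Made-up sample with a quoted comma inside the address field
sample = io.StringIO('name,address,dob,doh\nAnn,"1 Main St, Springfield",4/26/2012,12/1/2010\n')
out = io.StringIO()
fix_dates(sample, out, ["dob", "doh"])
print(out.getvalue())
```

With real files, open both with newline='' as the csv module documentation recommends, and pass the open file objects in place of the StringIO instances.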
I have a dataset and am trying to work out where the peaks are in the data: data points with a higher value than the points immediately before and after them.
I have code that works for one dataset, but when I run it on another dataset it raises an index out of range error for certain lines.
The code I have is:
for line in file.readlines():
    peaks.append(0)
    line = line.split(',')
    time.append(float(line[0]))
    TP.append(float(line[3]))
    level.append(float(line[5]))
for i in range(len(level)-1):
    i = i + 1
    if (level[i] > level[i-1]) and (level[i] > level[i+1]):
        peaks[i] = 1
        noPeaks = noPeaks +1
print noPeaks
Yet for one line (so far) it raises an index out of range error, even though visually inspecting the data doesn't suggest a problem: the value is higher than the previous value but lower than the next, so it is on a rising limb of the graph.
Any help would be great!
I cannot see your loop, but the (level[i] > level[i+1]) suggests that you may have forgotten to write
for i in range(1,len(list)-1)
The key point is the -1: you access i+1, and range already stops one short of its end value, so the last i is len(list)-2.
Starting your loop at 0 would not throw an out-of-bounds error, since list[-1] is perfectly legal in Python. However, I don't think you want your first comparison to be list[0] > list[-1] (the first element against the last).
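After your edit:
You do not need the
i = i + 1
line in your code. With it, i runs from 1 up to len(level)-1, so on the last pass level[i+1] is level[len(level)], which is out of bounds. (Reassigning i inside the body does not change what the for loop yields next; it only shifts every index by one.) Remove that line, loop over range(1, len(level)-1) instead, and it should work.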
If you're looping over a list l using i, then you should take care to handle both the first and last points specially:
for i in range(1, len(l) - 1):
    # your check
When i is referring to the last element of level, level[i+1] will not exist and will raise IndexError.
I've rewritten this taking into account other people's answers:
for line in file:
    line = line.split(',')
    time.append(float(line[0]))
    TP.append(float(line[3]))
    level.append(float(line[5]))
peaks = [0]*len(level)
numPeaks = 0
for i in range(1, len(level)-1):
    if level[i-1] < level[i] and level[i+1] < level[i]:
        peaks[i] = 1
        numPeaks += 1
print(numPeaks)
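The same peak test can be packaged as a small self-contained function, which makes it easy to try on a short list before running it on the real file. This is a sketch; find_peaks is an illustrative name (not a library function) and the sample values are made up.

```python
def find_peaks(values):
    """Return indices i where values[i] is strictly greater than both neighbours.

    The first and last points are skipped because each has only one neighbour.
    """
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


level = [1.0, 3.0, 2.0, 2.5, 4.0, 1.5]
peak_indices = set(find_peaks(level))
peaks = [1 if i in peak_indices else 0 for i in range(len(level))]
print(sorted(peak_indices))  # [1, 4]
print(peaks)  # [0, 1, 0, 0, 1, 0]
```

Because the range stops at len(values) - 2, values[i + 1] always exists, and lists with fewer than three points simply produce no peaks.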