Regular Expressions assignment - python

I am taking an online class for python. I've been at this for 2 weeks. I've written the following code to find numbers in a sample text document. The problem I'm having is when I move from line to line and run the regex, it finds the first set of numbers, then skips any remaining numbers on the line and moves to the next line where it matches only the first number on the line. My code is below:
#!/usr/bin/python
import re
try:
    fname = raw_input("Enter file name: ")
    fh = open(fname)
except:
    print 'Invalid Input'
    quit()
numlist = list()
for line in fh:
    nums = re.findall('[0-9]+',line)
    if len(nums) < 1 : continue
    num = int(nums[0])
    numlist.append(num)
print (numlist)

You are explicitly telling it to skip every number but the first:
num = int(nums[0])
Instead, use a list comprehension to convert each match to int, and add the whole list using extend():
numlist.extend([int(x) for x in nums])

As others already noted, you're discarding all other numbers in the list and taking only the first element. You can use the map function to convert the numbers to int and then extend the list
for line in fh:
    nums = re.findall('[0-9]+',line)
    if len(nums) < 1 : continue
    nums = map(int, nums)
    numlist.extend(nums)

The problem is that you're not looping on nums, but only appending the first item in the nums list.
To solve this, you should iterate on nums and append each item.
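The fix described in these answers can be sketched as follows. This is a minimal Python 3 illustration, not the asker's script: the helper name extract_numbers and the in-memory sample lines are assumptions made so the example runs without a file.

```python
import re

def extract_numbers(lines):
    """Collect every run of digits from every line, converted to int."""
    numbers = []
    for line in lines:
        # findall returns all non-overlapping matches on the line,
        # so extend() keeps every number instead of only nums[0].
        numbers.extend(int(n) for n in re.findall(r'[0-9]+', line))
    return numbers

# Hypothetical sample data standing in for the lines of a text file.
sample = ["abc 12 def 7", "no digits here", "3 and 45 and 6"]
print(extract_numbers(sample))  # [12, 7, 3, 45, 6]
```

Lines without digits need no special case here, because extending a list with zero matches leaves it unchanged.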

Related

How to process words and numbers in a file in python

Hello, I'm a few weeks into Python and now learning files. My program can sum the numbers in the file if it contains only numbers, but now there are numbers as well as words. How do I make it ignore the words so that it sums to 186?
def sum_numbers_in_file(filename):
    """reads all the numbers in a file and returns the sum of the numbers"""
    filename = open(filename)
    lines = filename.readlines()
    result = 0
    for num in lines:
        result = result + int(num)
        num.rstrip()
    filename.close()
    return result
answer = sum_numbers_in_file('sum_nums_test_01.txt')
print(answer)
This is in the file:
1
Pango
2
Whero
3
4
10
Kikorangi
20
40
100
-3
4
5
You can easily add a try-except statement inside the function to make it work only on numbers:
def sum_numbers_in_file(filename):
    """reads all the numbers in a file and returns the sum of the numbers"""
    filename = open(filename)
    lines = filename.readlines()
    result = 0
    for num in lines:
        try:
            result = result + int(num)
            num.rstrip()
        except ValueError:
            pass
    filename.close()
    return result
answer = sum_numbers_in_file('sum_nums_test_01.txt')
print(answer)
Or you can use the isalpha method:
def sum_numbers_in_file(filename):
    """reads all the numbers in a file and returns the sum of the numbers"""
    filename = open(filename)
    lines = filename.readlines()
    result = 0
    for num in lines:
        num = num.rstrip()
        if not num.isalpha():
            result = result + int(num)
    filename.close()
    return result
answer = sum_numbers_in_file('sum_nums_test_01.txt')
print(answer)
isalpha() returns True only if the string is non-empty and consists entirely of letters, so not num.isalpha() is used here as a guess that the line is a number. Negative integers such as -3 pass the check and convert without problems.
Note that the check also lets through any line that contains a symbol or a digit, such as 1.5, abc1, or an empty line, and int() will then raise a ValueError on it!
You can use a try-except block, a simple yet effective way of handling errors. Add this in your for loop:
try:
    result += int(num)
except ValueError:
    pass
Normally it's good practice to do something in the except clause, but here we only want to ignore the line, so we just pass. The try means we attempt the conversion, and if it fails we go to the except part.
I would suggest using a try/except block:
with open("words.txt") as f:
    nums = []
    for l in f:
        try:
            nums.append(float(l))
        except ValueError:
            pass
result = sum(nums)
A simple one-liner that you could use as an alternative to get all numerical values would be:
with open("words.txt") as f:
    nums = [float(l.strip()) for l in f if not l.strip().isalpha()]
result = sum(nums)
In the try/except version, each line is converted to a float and appended to the nums list. If the line is not a numerical value, the ValueError is caught and the line is simply passed over, hence pass.
You cannot use .isnumeric() because it is only True for strings made up entirely of numeric characters. This means no decimal points and no minus signs.
Here are a couple of ways you can try using isdigit. Note that isdigit() is False for "-3", so negative numbers are skipped and the sample file sums to 189 rather than 186:
value = 0
with open("sum_nums_test_01.txt") as f:
    for l in f.readlines():
        if l.strip().isdigit():
            value += int(l)
with open("sum_nums_test_01.txt") as f:
    value = sum(int(line) for line in f.readlines() if line.strip().isdigit())
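To make the difference between the try/except and isdigit approaches concrete, here is a small Python 3 sketch. The function names are hypothetical, and the file contents from the question are supplied as an in-memory list rather than read from sum_nums_test_01.txt.

```python
def sum_integer_lines(lines):
    """Sum lines that int() can parse; skip everything else."""
    total = 0
    for line in lines:
        try:
            total += int(line.strip())
        except ValueError:
            # Words, blank lines and decimals such as "1.5" land here.
            continue
    return total

def sum_isdigit_lines(lines):
    """Sum lines made only of digits; "-3" is rejected by isdigit()."""
    return sum(int(line) for line in lines if line.strip().isdigit())

# The file contents listed in the question, one entry per line.
sample = ["1", "Pango", "2", "Whero", "3", "4", "10", "Kikorangi",
          "20", "40", "100", "-3", "4", "5"]
print(sum_integer_lines(sample))  # 186
print(sum_isdigit_lines(sample))  # 189
```

Only the try/except version reaches the expected 186, because it lets int() decide what counts as a number instead of approximating that with a string test.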

Finding numbers in txt by regular expression, small problem

I wrote the code below; however, it finds only the first number in each line, and I am kind of stuck. If there are 2 or more numbers in a line, I get only 1. What am I doing wrong? I am a beginner.
import re
fhand = open('text2.txt','r')
numlist = list()
total = 0
for line in fhand:
    line = line.rstrip()
    numbers = re.findall(r'[0-9]+', line)
    if len(numbers) < 1: continue
    for element in numbers :
        num = float(numbers[0])
        if num not in numlist:
            numlist.append(num)
        else : continue
sumlist = sum(numlist)
print(numlist)
print(sumlist)
http://py4e-data.dr-chuck.net/regex_sum_228867.txt is the text file I am using. My sum is 191882, and the result should be much bigger, because my code is reading only the first number from each line. Cheers guys, I will be grateful.
melpomene already answered in the comments, but in case you need to see it, change your code to
    for element in numbers :
        num = float(element)
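How about reading the whole file and running findall once? No flag such as re.M is needed: re.M only changes how ^ and $ match, and this pattern uses neither.
import re
with open('text2.txt') as f:
    s = sum(map(float, re.findall(r'[0-9]+', f.read())))
print(s)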
Returns:
425922.0
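As a quick check of the point about re.M, the following Python 3 sketch compares three ways of collecting the matches. The multi-line string is made-up sample data, not the contents of text2.txt.

```python
import re

# Made-up text standing in for the contents of a file.
text = "a 10 b 20\nc 3\nno numbers\n4"
pattern = r'[0-9]+'

whole = re.findall(pattern, text)            # one pass over the whole text
with_flag = re.findall(pattern, text, re.M)  # same pass with re.M
per_line = [n for line in text.splitlines()  # line-by-line, as in the question
            for n in re.findall(pattern, line)]

print(whole)                            # ['10', '20', '3', '4']
print(whole == with_flag == per_line)   # True
print(sum(map(float, whole)))           # 37.0
```

All three give the same matches because the pattern has no ^ or $ anchors and a run of digits never spans a newline.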

How to extract all the numbers from a text file using re.findall() and compute the sum using a for-loop?

The basic outline of this problem is to read the file, look for integers using re.findall() with the regular expression [0-9]+, convert the extracted strings to integers, and sum them. I'm getting a different result; it is supposed to end with 209. Also, how can I simplify my code? Thanks (here is the txt file http://py4e-data.dr-chuck.net/regex_sum_167791.txt)
import re
hand = open("regex_sum_167791.txt")
total = 0
count = 0
for line in hand:
    count = count+1
    line = line.rstrip()
    x = re.findall("[0-9]+", line)
    if len(x)!= 1 : continue
    num = int(x[0])
    total = num + total
print(total)
Assuming that you need to sum all the numbers in your txt:
import re
total = 0
with open("regex_sum_167791.txt") as f:
    for line in f:
        total += sum(map(int, re.findall(r"\d+", line)))
print(total)
# 417209
Logic
To start with, use with when you open the file so that it is closed automatically once the block finishes.
The following lines were removed because they are redundant:
count = count+1: Not used.
line = line.rstrip(): re.findall extracts only the digits, so there is no need to strip the line first.
if len(x)!= 1 : continue: It looks like you wanted to skip lines with no digits, but this check also skips every line that contains two or more numbers. Since sum(map(int, re.findall(r"\d+", line))) returns zero for a line with no matches, the check is unnecessary.
num = int(x[0]): Finally, this grabs only the first number on the line. If a line contains two or more numbers, the rest are lost. And since int cannot be applied directly to a list, I used map(int, ...).
You were almost there:
import re
hand = open("regex_sum_167791.txt")
total = 0
for line in hand:
    line = line.rstrip()
    x = re.findall("[0-9]+", line)
    for i in x:
        total += int(i)
print(total)
Answer: 417209
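The effect of the len(x) != 1 check in the question can be seen in a small Python 3 comparison. Both function names and the sample lines are assumptions made for illustration; the real input is the linked text file.

```python
import re

def sum_single_number_lines(lines):
    """Mimics the question's logic: only lines with exactly one number count."""
    total = 0
    for line in lines:
        found = re.findall(r'[0-9]+', line)
        if len(found) != 1:
            continue  # skips empty lines AND lines with several numbers
        total += int(found[0])
    return total

def sum_all_numbers(lines):
    """Sums every number on every line, as the answers suggest."""
    return sum(int(n) for line in lines for n in re.findall(r'[0-9]+', line))

# Hypothetical sample lines: one number, two numbers, no numbers.
sample = ["one 5 here", "two 10 and 20", "none"]
print(sum_single_number_lines(sample))  # 5
print(sum_all_numbers(sample))          # 35
```

The second line of the sample is dropped entirely by the first function, which is why the question's total comes out too small.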

Python list comprehensions with regular expressions on a text

I have a text file which includes integers in the text. There are one or more integers in a line or none. I want to find these integers with regular expressions and compute the sum.
I have managed to write the code:
import re
doc = raw_input("File Name:")
text = open(doc)
lst = list()
total = 0
for line in text:
    nums = re.findall("[0-9]+", line)
    if len(nums) == 0:
        continue
    for num in nums:
        num = int(num)
        total += num
print total
But I also want to know the list comprehension version. Can someone help?
Since you want to calculate the sum of the numbers after you find them, it's better to use a generator expression with re.finditer() inside sum(). Also, if the file is not very large, it is simpler to read it all at once rather than one line at a time.
import re
doc = raw_input("File Name:")
with open(doc) as f:
    text = f.read()
total = sum(int(g.group(0)) for g in re.finditer(r'\d+', text))
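For the list comprehension the question asks about, one possible shape is a nested comprehension over lines and matches. This is a Python 3 sketch; the helper name numbers_in and the sample lines are assumptions, and the lines stand in for an open file object.

```python
import re

def numbers_in(lines):
    """Return every integer found in the lines, in order of appearance."""
    # The outer loop walks the lines; the inner loop walks the matches on each line.
    return [int(n) for line in lines for n in re.findall(r'[0-9]+', line)]

# Hypothetical sample lines; a file object can be passed the same way.
sample = ["7 apples and 12 pears", "nothing", "100"]
nums = numbers_in(sample)
print(nums)       # [7, 12, 100]
print(sum(nums))  # 119
```

If only the total is needed, dropping the square brackets and passing the expression straight to sum() avoids building the intermediate list.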

Why is this not correct? (codeeval challenge)PYTHON

This is what I have to do https://www.codeeval.com/open_challenges/140/
I've been on this challenge for three days, please help. It is 85-90 partially solved, but not 100% solved... why?
This is my code:
import sys
test_cases = open(sys.argv[1], 'r')
for test in test_cases:
    saver=[]
    text=""
    textList=[]
    positionList=[]
    num=0
    exists=int()
    counter=0
    for l in test.strip().split(";"):
        saver.append(l)
    for i in saver[0].split(" "):
        textList.append(i)
    for j in saver[1].split(" "):
        positionList.append(j)
    for i in range(0,len(positionList)):
        positionList[i]=int(positionList[i])
    accomodator=[None]*len(textList)
    for n in range(1,len(textList)):
        if n not in positionList:
            accomodator[n]=textList[len(textList)-1]
            exists=n
    for item in positionList:
        accomodator[item-1]=textList[counter]
        counter+=1
        if counter>item:
            accomodator[exists-1]=textList[counter]
    for word in accomodator:
        text+=str(word) + " "
    print text
test_cases.close()
This code works for me:
import sys
def main(name_file):
    _file = open(name_file, 'r')
    text = ""
    while True:
        try:
            line = _file.next()
            disordered_line, numbers_string = line.split(';')
            numbers_list = map(int, numbers_string.strip().split(' '))
            missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
            if missing_number == 0:
                missing_number = len(disordered_line)
            numbers_list.append(missing_number)
            disordered_list = disordered_line.split(' ')
            string_position = zip(disordered_list, numbers_list)
            ordered = sorted(string_position, key = lambda x: x[1])
            text += " ".join([x[0] for x in ordered])
            text += "\n"
        except StopIteration:
            break
    _file.close()
    print text.strip()
if __name__ == '__main__':
    main(sys.argv[1])
I'll try to explain my code step by step, so maybe you can see the difference between your code and mine:
while True
A loop that breaks when there are no more lines.
try:
I put the code inside a try and catch the StopIteration exception, because it is raised when an iterator has no more items.
line = _file.next()
Read the file one line at a time through its iterator, so that you do not load all the lines into memory at once.
disordered_line, numbers_string = line.split(';')
Get the unordered phrase and the string of numbers that gives each word's position.
numbers_list = map(int, numbers_string.strip().split(' '))
Convert every number from string to int.
missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
Get the number missing from the series of numbers; that missing number is the position of the last string in the phrase.
if missing_number == 0:
missing_number = len(disordered_line)
Check whether the missing number is equal to 0. If so, no number is missing inside the range, and the missing position is taken to be the last one. Here len(disordered_line) is the length of the line in characters, which is at least the number of words, so the word still sorts last.
numbers_list.append(missing_number)
Append the missing number to the list of numbers.
disordered_list = disordered_line.split(' ')
Convert the disordered phrase into a list.
string_position = zip(disordered_list, numbers_list)
Combine every string with its respective position.
ordered = sorted(string_position, key = lambda x: x[1])
Order the combined list by the position of the string.
text += " ".join([x[0] for x in ordered])
Concatenate the ordered phrase. The remaining code is easy to understand.
UPDATE
Looking at your code, here is my opinion on what might solve your problem.
split already returns a list, so you do not have to loop over the split content to add it to another list.
So these six lines:
for l in test.strip().split(";"):
    saver.append(l)
for i in saver[0].split(" "):
    textList.append(i)
for j in saver[1].split(" "):
    positionList.append(j)
can be converted into three:
splitted_test = test.strip().split(';')
textList = splitted_test[0].split(" ")
positionList = map(int, splitted_test[1].split(" "))
The line positionList = map(int, splitted_test[1].split(" ")) already converts the numbers to int, so you save these two lines:
for i in range(0,len(positionList)):
    positionList[i]=int(positionList[i])
The next lines:
accomodator=[None]*len(textList)
for n in range(1,len(textList)):
    if n not in positionList:
        accomodator[n]=textList[len(textList)-1]
        exists=n
can be converted into the next four:
missing_number = sum(xrange(sorted(positionList)[0],sorted(positionList)[-1]+1)) - sum(positionList)
if missing_number == 0:
    missing_number = len(textList)
positionList.append(missing_number)
Basically, these lines calculate the number missing from the series of positions, so that the series has the same length as textList.
The next lines:
for item in positionList:
    accomodator[item-1]=textList[counter]
    counter+=1
    if counter>item:
        accomodator[exists-1]=textList[counter]
for word in accomodator:
    text+=str(word) + " "
Can be replaced by these ones:
string_position = zip(textList, positionList)
ordered = sorted(string_position, key = lambda x: x[1])
text += " ".join([x[0] for x in ordered])
text += "\n"
This way you save lines and memory. Also use xrange instead of range.
The factors that make your code pass only partially could be:
The number of lines in the script.
The time your script takes.
The amount of memory your script uses.
What you could do is:
Use generators. #You save memory
Reduce the number of for loops; this way you save lines of code and time.
If you think something could be made easier, do it.
Do not reinvent the wheel; if something has already been made, use it.
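The reordering idea from this answer can also be sketched in Python 3 with a set difference to find the missing position. This is an illustration under assumptions, not the answerer's code: the function name reorder is hypothetical, and the input format (words, a semicolon, then one position per word except the last word) is inferred from the code in this thread.

```python
def reorder(line):
    """Rebuild a phrase from 'words;positions', where the last word has no position."""
    words_part, positions_part = line.strip().split(';')
    words = words_part.split()
    positions = [int(p) for p in positions_part.split()]

    # Assumption: positions are 1..len(words) with exactly one value left out.
    missing = set(range(1, len(words) + 1)) - set(positions)
    positions.append(missing.pop())

    # Pair each word with its position and sort by position.
    ordered = sorted(zip(positions, words))
    return " ".join(word for _, word in ordered)

# Made-up inputs: the missing position is in the middle, then at the start.
print(reorder("c a b;3 1"))  # a b c
print(reorder("b c a;2 3"))  # a b c
```

The set difference identifies the missing position wherever it falls, whereas the sum-of-range calculation returns 0 both when the last position is missing and when the first one is.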
