I'm just learning about regular expressions and I need to read in a text file and find every instance of a number and find the sum of all the numbers.
import re
sum = 0
list_of_numbers = list()
working_file = open("sample.txt", 'r')
for line in working_file:
    line = line.rstrip()
    working_list = re.findall('[0-9]+', line)
    if len(working_list) != 1:
        continue
    print(working_list)
    for number in working_list:
        num = int(number)
        list_of_numbers.append(num)
for number in list_of_numbers:
    sum += number
print(sum)
I added print(working_list) to debug it and check whether all the numbers are being found. By manually scanning the text file, I can see that some numbers are being skipped while others are not. I'm confused as to why, since I thought my regular expression guaranteed that any run of digits would be added to the list.
Here is the file.
You're only accepting lines that contain exactly one number, so a line with two numbers is skipped because of if len(working_list) != 1: continue. That condition says "if there isn't EXACTLY one number on this line, skip it". You may have meant something like if len(working_list) < 1: continue.
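To see the effect of that condition, here is a small sketch that applies both checks to a few made-up lines. The sample lines and the names kept_strict and kept_relaxed are hypothetical, not taken from the asker's file.

```python
import re

# Hypothetical sample lines, not from the original file.
lines = ["no numbers here", "one number: 42", "two numbers: 7 and 8"]

kept_strict = []   # mirrors `if len(working_list) != 1: continue`
kept_relaxed = []  # mirrors `if len(working_list) < 1: continue`
for line in lines:
    found = re.findall(r'[0-9]+', line)
    if len(found) == 1:
        kept_strict.extend(int(n) for n in found)
    if len(found) >= 1:
        kept_relaxed.extend(int(n) for n in found)

print(sum(kept_strict))   # only the line with exactly one number contributes
print(sum(kept_relaxed))  # every number on every line contributes
```

With the strict check, the 7 and 8 on the last line never reach the sum, which matches the "some numbers are skipped" symptom.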
I would do it like:
import re
digits_re = re.compile(r'(\d+(\.\d+)?)')
with open("sample.txt", 'r') as fh:
    # With two groups, findall returns tuples; match[0] is the whole number
    numbers = [float(match[0]) for match in digits_re.findall(fh.read())]
print(sum(numbers))
or, if you only need ints as in your code, just:
import re
digits_re = re.compile(r'(\d+)')
with open("sample.txt", 'r') as fh:
    # With a single group, findall returns plain strings, so convert each match directly
    numbers = [int(match) for match in digits_re.findall(fh.read())]
print(sum(numbers))
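The shape of what re.findall returns depends on how many capture groups the pattern has, which is why int(match[0]) would keep only the first digit with the one-group pattern. A short sketch on a made-up string (the variable names are illustrative only):

```python
import re

text = "a 12 b 3.5"  # hypothetical input

no_group = re.findall(r'\d+', text)              # list of whole matches
one_group = re.findall(r'(\d+)', text)           # list of strings (the single group)
two_groups = re.findall(r'(\d+(\.\d+)?)', text)  # list of tuples, one item per group

print(no_group)
print(one_group)
print(two_groups)
print(one_group[0][0])  # indexing a string yields its first character
```

So match[0] picks the full number out of a tuple in the two-group case, but picks a single character in the one-group case.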
import re
h = open('file.txt')
nos = list()
for ln in h:
    # Collect every run of digits on the line, not just the first
    fi = re.findall('[0-9]+', ln)
    for i in fi:
        nos.append(int(i))
print('Sum:', sum(nos))
Basically, this is my task: extract numbers from a text file and then calculate their sum.
I wrote the code and it runs, but it doesn't work properly with numbers of two or more digits or with negative numbers. What should I do?
f = open('file6.txt', 'r')
suma = 0
file = f.readlines()
for line in file:
    for i in line:
        if i.isdigit() == True:
            suma += int(i)
print("The sum is ", suma)
file6.txt:
1
10
Output:
The sum is 2
In your code, the outer loop goes line by line and the inner loop looks at every single character, so each digit is added on its own: 10 is counted as 1 + 0.
Testing the whole line instead needs one extra step, because the \n at the end of each line makes .isdigit() return False; strip the line first.
So your updated code should look like this:
f = open('file6.txt', 'r')
suma = 0
file = f.readlines()
for line in file:
    if line.strip().isdigit():
        suma += int(line)
print("The sum is ", suma)
Hope it helps!
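The isdigit() check still skips a line such as -5, because '-' is not a digit. A minimal sketch that also handles a leading minus sign, assuming the file contains only integers: the function name sum_integers and the sample text are hypothetical.

```python
import re

def sum_integers(text):
    """Sum every integer in text, honouring an optional leading minus sign."""
    # Assumption: a '-' directly before digits always means a negative number.
    return sum(int(m) for m in re.findall(r'-?\d+', text))

sample = "1\n10\n-5\n"  # hypothetical file contents
print("The sum is", sum_integers(sample))
```

One caveat of this pattern is that text like 3-2 would be read as 3 and -2, so it suits files where numbers are separated by whitespace.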
Use re.split to split the input into words on anything that is not part of a number. Try to convert the words into numbers, silently skip if this fails.
import re
sum_nums_in_file = 0
with open('file6.txt') as f:
    for line in f:
        for word in re.split(r'[^-+\dEe.]+', line):
            try:
                num = float(word)
                sum_nums_in_file += num
            except ValueError:
                # Not a number (e.g. an empty string or a stray 'e'): skip it
                pass
print(f"The sum is {sum_nums_in_file}")
This works for example on files such as this:
-1 2.0e0
+3.0
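To make the splitting step visible, this sketch runs the same pattern on the two example lines held in memory instead of a file. The helper name sum_floats is made up for illustration.

```python
import re

NUMBER_CHARS = r'[^-+\dEe.]+'  # same pattern as above: split on anything not part of a number

def sum_floats(lines):
    """Sum every piece of each line that float() accepts."""
    total = 0.0
    for line in lines:
        for word in re.split(NUMBER_CHARS, line):
            try:
                total += float(word)
            except ValueError:
                pass  # empty strings and non-numeric leftovers are skipped
    return total

sample = ["-1 2.0e0\n", "+3.0\n"]
pieces = re.split(NUMBER_CHARS, sample[0])
print(pieces)              # the trailing newline leaves an empty string at the end
print(sum_floats(sample))
```

The empty string produced by the trailing newline is one of the failures the try/except silently absorbs.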
Hello, I'm a few weeks into Python and am now learning files. My program can sum the numbers in the file when it contains only numbers, but now the file contains words as well as numbers. How do I make it ignore the words so that it sums to 186?
def sum_numbers_in_file(filename):
    """reads all the numbers in a file and returns the sum of the numbers"""
    filename = open(filename)
    lines = filename.readlines()
    result = 0
    for num in lines:
        result = result + int(num)
        num.rstrip()
    filename.close()
    return result
answer = sum_numbers_in_file('sum_nums_test_01.txt')
print(answer)
This is in the file:
1
Pango
2
Whero
3
4
10
Kikorangi
20
40
100
-3
4
5
You can easily add a try-except statement inside the function to make it work only on numbers:
def sum_numbers_in_file(filename):
    """reads all the numbers in a file and returns the sum of the numbers"""
    filename = open(filename)
    lines = filename.readlines()
    result = 0
    for num in lines:
        try:
            # int() ignores surrounding whitespace, so no rstrip() is needed
            result = result + int(num)
        except ValueError:
            # The line is not an integer (e.g. a word): skip it
            pass
    filename.close()
    return result
answer = sum_numbers_in_file('sum_nums_test_01.txt')
print(answer)
Or you can use the isalpha method:
def sum_numbers_in_file(filename):
    """reads all the numbers in a file and returns the sum of the numbers"""
    filename = open(filename)
    lines = filename.readlines()
    result = 0
    for num in lines:
        num = num.rstrip()
        if not num.isalpha():
            result = result + int(num)
    filename.close()
    return result
answer = sum_numbers_in_file('sum_nums_test_01.txt')
print(answer)
isalpha() returns True only if the string is non-empty and every character is a letter, so not num.isalpha() lets through any line that contains a digit or a symbol. It does not prove that the line is a number, and decimal values such as 2.5 would need float() instead of int().
Note that lines containing symbols, or a mix of letters and digits, also pass this check, and int() will then raise a ValueError!
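A quick sketch of which strings isalpha() accepts, next to what int() does with them. The sample strings and the helper to_int_or_none are made up for illustration.

```python
samples = ["Pango", "10", "-3", "2.5", "$", "abc123"]  # hypothetical lines

def to_int_or_none(s):
    """Return int(s), or None when s is not a valid integer literal."""
    try:
        return int(s)
    except ValueError:
        return None

for s in samples:
    # Only "Pango" is purely alphabetic; everything else passes `not s.isalpha()`
    print(repr(s), s.isalpha(), to_int_or_none(s))
```

Of the five strings that pass the not isalpha() test, only "10" and "-3" convert with int(); the other three would raise without the try/except.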
You can use a try-except block, an effective way of handling errors. Add this inside your for loop:
try:
    result += int(num)
except ValueError:
    pass
Normally it's good practice to handle the error in the except clause, but here we want to do nothing, so we just pass. The try means we attempt the conversion; if it fails, execution jumps to the except part.
I would suggest using a try/except block:
with open("words.txt") as f:
    nums = []
    for l in f:
        try:
            nums.append(float(l))
        except ValueError:
            pass
result = sum(nums)
If you want an alternative, a simple one-liner that collects the numerical values would be:
with open("words.txt") as f:
    nums = [float(l.strip()) for l in f if not l.strip().isalpha()]
result = sum(nums)
In the try/except version, I convert each line to a float and append that value to the nums list. If the line is not a numerical value, the ValueError is caught and the line is simply skipped, hence pass. The one-liner instead filters out purely alphabetic lines, so it would still raise a ValueError on a blank line or on a line that mixes letters and digits.
You cannot use .isnumeric() here, as it returns True only for strings made up entirely of numeric characters. This rules out decimals and negative numbers, since '.' and '-' are not numeric characters.
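The difference between .isnumeric() and simply attempting the conversion can be checked directly. This is a small sketch; is_float is a hypothetical helper, not a built-in.

```python
def is_float(s):
    """Return True if float(s) succeeds."""
    try:
        float(s)
        return True
    except ValueError:
        return False

for s in ["10", "-3", "2.5", "Pango"]:  # hypothetical lines
    print(repr(s), s.isnumeric(), is_float(s))
```

Be aware that float() also accepts strings such as "nan" and "inf", so the try/except approach is slightly more permissive than it may look.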
Here are a couple of ways you can try using isdigit. Note that isdigit() is False for -3, so both versions skip negative numbers:
# Option 1: explicit loop
value = 0
with open("sum_nums_test_01.txt") as f:
    for l in f.readlines():
        if l.strip().isdigit():
            value += int(l)
# Option 2: generator expression
with open("sum_nums_test_01.txt") as f:
    value = sum(int(line) for line in f if line.strip().isdigit())
I wrote the code, but it finds only the first number in each line, and I am kind of stuck. If there are two or more numbers in a line, I get only one. What am I doing wrong? I am a beginner.
import re
fhand = open('text2.txt','r')
numlist = list()
total = 0
for line in fhand:
    line = line.rstrip()
    numbers = re.findall(r'[0-9]+', line)
    if len(numbers) < 1: continue
    for element in numbers :
        num = float(numbers[0])
        if num not in numlist:
            numlist.append(num)
        else : continue
sumlist = sum(numlist)
print(numlist)
print(sumlist)
http://py4e-data.dr-chuck.net/regex_sum_228867.txt is the text file I am using. My sum is 191882, but the result should be much bigger, because my code is reading only the first number from each line. Cheers guys, I will be grateful.
In the comments, melpomene already answered, but in case you need to see it, change your code to:
    for element in numbers :
        num = float(element)
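Note also that the if num not in numlist check drops any value that has already been seen, so a number that appears twice in the file is counted once. If every occurrence should count (an assumption about the assignment), a sketch without that check could look like this; sum_all_numbers and the sample lines are hypothetical.

```python
import re

def sum_all_numbers(lines):
    """Add up every run of digits on every line, including repeated values."""
    total = 0
    for line in lines:
        for element in re.findall(r'[0-9]+', line):
            total += int(element)
    return total

sample = ["a 3 b 4", "no digits", "3 again"]  # hypothetical lines
print(sum_all_numbers(sample))
```

Passing the open file object instead of a list works the same way, since iterating a file yields its lines.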
How about reading the whole file at once? The re.M flag is not needed here: it only changes how ^ and $ match, and this pattern uses neither.
import re
with open('text2.txt') as f:
    s = sum(map(float, re.findall(r'[0-9]+', f.read())))
print(s)
Returns:
425922.0
The basic outline of this problem is to read the file, look for integers using re.findall() with the regular expression [0-9]+, convert the extracted strings to integers, and sum them. I'm getting a different result: the sum is supposed to end with 209. Also, how can I simplify my code? Thanks (here is the txt file http://py4e-data.dr-chuck.net/regex_sum_167791.txt)
import re
hand = open("regex_sum_167791.txt")
total = 0
count = 0
for line in hand:
    count = count+1
    line = line.rstrip()
    x = re.findall("[0-9]+", line)
    if len(x)!= 1 : continue
    num = int(x[0])
    total = num + total
print(total)
Assuming that you need to sum all the numbers in your txt:
import re
total = 0
with open("regex_sum_167791.txt") as f:
    for line in f:
        # Raw string so that \d reaches the regex engine unchanged
        total += sum(map(int, re.findall(r"\d+", line)))
print(total)
# 417209
Logics
To start with, try using with when you open a file, so that the file is closed automatically once the block is done.
The following lines were removed because they are redundant or wrong:
count = count+1: Not used.
line = line.rstrip(): re.findall takes care of extraction, so you don't have to worry about stripping lines.
if len(x)!= 1 : continue: It seems you wanted to skip lines with no digits, but this also skips every line with two or more numbers. Since sum(map(int, re.findall(r"\d+", line))) returns zero for a line without numbers, the check is unnecessary anyway.
num = int(x[0]): Finally, this grabs only the first number on the line. If two or more numbers are found on a single line, the rest are lost. And since int cannot be applied directly to a list, I used map(int, ...).
You were almost there:
import re
hand = open("regex_sum_167791.txt")
total = 0
for line in hand:
    line = line.rstrip()
    x = re.findall("[0-9]+", line)
    # Add every number found on the line, not just the first
    for i in x:
        total += int(i)
print(total)
Answer: 417209
I have a text file which includes integers in the text. There are one or more integers in a line or none. I want to find these integers with regular expressions and compute the sum.
I have managed to write the code:
import re
doc = raw_input("File Name:")
text = open(doc)
lst = list()
total = 0
for line in text:
    nums = re.findall("[0-9]+", line)
    if len(nums) == 0:
        continue
    for num in nums:
        num = int(num)
        total += num
print total
But I also want to know the list comprehension version. Can someone help?
Since you want the sum of the numbers once you have found them, it's better to use a generator expression with re.finditer() inside sum(). Also, if the file is not very large, you may as well read it all at once rather than one line at a time.
import re
doc = raw_input("File Name:")
with open(doc) as f:
    text = f.read()
total = sum(int(g.group(0)) for g in re.finditer(r'\d+', text))
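For the list comprehension that was asked about, here is a sketch in Python 3 (the thread's code uses Python 2's raw_input and print statement). It works on a string held in memory; the function names and the sample text are made up for illustration.

```python
import re

def sum_with_list_comprehension(text):
    """Build the full list of integers first, then sum it."""
    nums = [int(n) for n in re.findall(r'[0-9]+', text)]
    return sum(nums)

def sum_with_generator(text):
    """Same result without building an intermediate list."""
    return sum(int(m.group(0)) for m in re.finditer(r'[0-9]+', text))

sample = "Line with 12 and 30\nno numbers\n8\n"  # hypothetical file contents
print(sum_with_list_comprehension(sample))
print(sum_with_generator(sample))
```

The list version is handy when the individual numbers are needed afterwards; the generator version avoids holding them all in memory.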