The problem is to read the file, extract the integers with re.findall() using the regular expression '[0-9]+', convert the extracted strings to integers, and sum them.
My code, where sample.txt is my text file:
import re
hand = open('sample.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('[0-9]+',line)
    print x
    x = [int(i) for i in x]
    add = sum(x)
    print add
OUTPUT:
You need to collect the results of each findall() call in another list, so that the numbers found on the current line are kept when the loop moves on to the next line.
import re
hand = open('sample.txt')
l = []
for line in hand:
    x = re.findall('[0-9]+',line)
    l.extend(x)
j = [int(i) for i in l]
add = sum(j)
print add
or
with open('sample.txt') as f:
    print sum(map(int, re.findall(r'\d+', f.read())))
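For readers on Python 3, the same idea can be sketched as a small function that works on a string instead of a file. The name sum_integers and the sample text are made up for illustration; to use it on a file you would pass in the result of f.read().

```python
import re

def sum_integers(text):
    """Return the sum of every run of digits found in text."""
    # findall returns the matches as strings, so each one is converted first
    return sum(int(match) for match in re.findall(r'[0-9]+', text))

# Hypothetical sample standing in for the contents of sample.txt
sample_text = "the cat in the 42 hat\nand the cow 1772 jumps\nno digits here\n"
total = sum_integers(sample_text)
print(total)
```

Because the whole text is searched at once, no intermediate list has to be carried from one line to the next.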
Try this:
import re
hand = open("a.txt")
x = list()
for line in hand:
    y = re.findall('[0-9]+',line)
    x = x + y
# named total rather than sum, so the built-in sum() is not shadowed
total = 0
for z in x:
    total = total + int(z)
print(total)
I wrote this code to extract only the digits from a text file and then calculate the sum of the extracted values. But I get 0 as the answer, when it should be 285701. I don't understand what I am doing wrong, even after working on it for a long time. I am not very experienced in programming and have just started learning.
import re
fname = open("http://py4e-data.dr-chuck.net/regex_sum_1501185.txt")
sum = 0
value = list()
for line in fname:
    line = re.findall("[0-9]+", line)
    value = value + line
for x in value:
    sum = sum + int(x)
print(sum)
You can't open web URLs with open(); you need to use urllib.request.urlopen():
import urllib.request
import re
fname = urllib.request.urlopen("http://py4e-data.dr-chuck.net/regex_sum_1501185.txt")
data = fname.read().decode()
data = data.split('\n')
sum = 0
value = list()
for line in data:
    nums = re.findall("[0-9]+", line)
    value = value + nums
for x in value:
    sum = sum + int(x)
print(sum)
Output:
285701
Be careful with your variable names: naming a variable sum shadows the built-in function sum(), so you can no longer call it.
It would be better if your code looked like this:
import urllib.request
import re
fname = urllib.request.urlopen("http://py4e-data.dr-chuck.net/regex_sum_1501185.txt")
data = fname.read(50000).decode()
data = data.split('\n')
value = list()
for line in data:
    line = re.findall("[0-9]+", line)
    value = value + [int(i) for i in line]
print(sum(value))
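To see why the variable name matters, here is a small sketch with no network access. Both function names are invented for the demonstration.

```python
def call_sum_after_shadowing():
    sum = 0  # this local name now hides the built-in sum()
    try:
        return sum([1, 2, 3])  # tries to call the integer 0
    except TypeError as exc:
        return "TypeError: " + str(exc)

def call_sum_with_other_name():
    total = 0  # a different name leaves the built-in available
    total = total + sum([1, 2, 3])
    return total

print(call_sum_after_shadowing())
print(call_sum_with_other_name())
```

The first call reports that an int object is not callable, while the second returns 6 as expected.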
I have a list of numbers like so;
7072624 through 7072631
7072672 through 7072687
7072752 through 7072759
7072768 through 7072783
The code below is what I have so far. I've removed the word "through" and it now prints a list of numbers.
import os
def file_read(fname):
    content_array = []
    with open (fname) as f:
        for line in f:
            content_array.append(line)
    #print(content_array[33])
    #new_content_array = [word for line in content_array[33:175] for word in line.split()]
    new_content_array = [word for line in content_array[33:37] for word in line.split()]
    while 'through' in new_content_array: new_content_array.remove('through')
    print(new_content_array)
file_read('numbersfile.txt')
This gives me the following output.
['7072624', '7072631', '7072672', '7072687', '7072752', '7072759', '7072768', '7072783']
What I want to do, but am struggling to work out, is how to split new_content_array into two arrays so that the output is as follows:
array1 = [7072624, 7072672, 7072752, 7072768]
array2 = [7072631, 7072687, 7072759, 7072783]
I then want to subtract each value in array1 from the corresponding value in array2:
7072631 - 7072624
7072687 - 7072672
7072759 - 7072752
7072783 - 7072768
I've been having a search but can't find anything similar to my situation.
Thanks in advance!
Try this below:
list_data = ['7072624', '7072631', '7072672', '7072687', '7072752', '7072759', '7072768', '7072783']
array1 = [int(list_data[i]) for i in range(len(list_data)) if i % 2 == 0]
array2 = [int(list_data[i]) for i in range(len(list_data)) if i % 2 != 0]
l = ['7072624', '7072631', '7072672', '7072687', '7072752', '7072759','7072768', '7072783']
l1 = [l[i] for i in range(len(l)) if i % 2 == 0]
l2 = [l[i] for i in range(len(l)) if i % 2 == 1]
print(l1) # ['7072624', '7072672', '7072752', '7072768']
print(l2) # ['7072631', '7072687', '7072759', '7072783']
result = list(zip(l1,l2))
As a result you will get:
[('7072624', '7072631'),
('7072672', '7072687'),
('7072752', '7072759'),
('7072768', '7072783')]
I wrote this as a list comprehension, but you could also use filter.
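As an alternative to testing the index parity, slicing with a step of 2 gives the same split, and zip then pairs the values up for the subtraction the question asks about. This is only a sketch; the names tokens, starts, ends and differences are chosen here for illustration.

```python
tokens = ['7072624', '7072631', '7072672', '7072687',
          '7072752', '7072759', '7072768', '7072783']

# Even positions hold the start of each range, odd positions the end
starts = [int(t) for t in tokens[0::2]]
ends = [int(t) for t in tokens[1::2]]

# Subtract each start from the matching end
differences = [end - start for start, end in zip(starts, ends)]

print(starts)
print(ends)
print(differences)  # [7, 15, 7, 15]
```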
You could split each line on the keyword through, then remove all non-numeric characters (such as newlines or spaces) with a lambda function and a regex inside a list comprehension:
import os
import re
def file_read(fname):
    new_content_array = []
    with open (fname) as f:
        for line in f:
            line_array = line.split('through')
            new_content_array.append([(lambda x: re.sub(r'[^0-9]', "", x))(element) for element in line_array])
    print(new_content_array)
file_read('numbersfile.txt')
Output looks like this:
[['7072624', '7072631'], ['7072672', '7072687'], ['7072752', '7072759'], ['7072768', '7072783']]
Then you can extract the first element of each nested list into one variable, and do the same with the second element.
Good luck
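One possible way to do that last step is to transpose the nested list with zip(*...). The sketch below starts from the output shown above; array1 and array2 follow the names used in the question.

```python
pairs = [['7072624', '7072631'], ['7072672', '7072687'],
         ['7072752', '7072759'], ['7072768', '7072783']]

# zip(*pairs) regroups the data by position: all first items, then all second items
firsts, seconds = zip(*pairs)

array1 = [int(value) for value in firsts]
array2 = [int(value) for value in seconds]

print(array1)
print(array2)
```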
I want to open a file, get the numbers after the = sign, and put the results into a list. I did the first steps, but I'm stuck on storing the results in a list.
I tried to create a list and assign the results to it, but when I print my list it shows only the last result:
import cv2 as cv
import time
import numpy
from math import log
import csv
import re
statList = []
with open("C:\\ProgramData\\OutilTestObjets3D\\MaquetteCB-2019\\DataSet\\DEFAULT\\terrain\\3DObjects\\building\\house01.ive.stat.txt", 'r') as f:
    #
    statList = f.readlines()
    statList = [x.strip() for x in statList]
    for line in statList :
        if (re.search("=" ,str(line))):
            if (re.search('#IND',str(line))):
                print("ERREUR")
            else:
                results = re.findall("=\s*?(\d+\.\d+|\d+)", str(line))
                print ("result="+str(results))
                statList.append(log(float(results[0])))
floatList = [str(results)]
print(floatList)
It's because you are overwriting the results variable each time through your loop.
Try:
    #
    results = []
    statList = f.readlines()
    statList = [x.strip() for x in statList]
    for line in statList :
        if (re.search("=" ,str(line))):
            if (re.search('#IND',str(line))):
                print("ERREUR")
            else:
                results.extend(re.findall("=\s*?(\d+\.\d+|\d+)", str(line)))
                print ("result="+str(results))
                statList.append(log(float(results[0])))
floatList = [str(results)]
print(floatList)
The problem with your program is that it defines an empty list statList, then redefines it as statList = f.readlines() and appends the results to that same list. So give the empty list a different name; you can then use extend, since each result is itself a list. Finally, use the built-in map function to apply a function to every item of your list:
from math import log
import re
final_result = []
with open("file.txt", 'r') as f:
    #
    statList = f.readlines()
    statList = [x.strip() for x in statList]
    for line in statList :
        if (re.search("=" ,str(line))):
            if (re.search('#IND',str(line))):
                print("ERREUR")
            else:
                result = re.findall(r"=\s*?(\d+\.\d+|\d+)", str(line))
                print("result=" + result[0])
                final_result.extend(result)
                # final_result.append(result[0])
floats_list = list(map(float, final_result))
logs_list = list(map(log, floats_list))
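The same approach can be sketched as a function that takes any iterable of lines, which makes it easy to try without the original file. The function name logs_of_values, the sample lines, and the decision to skip non-positive values (math.log would raise ValueError on them) are assumptions made for this example, not part of the original program.

```python
import re
from math import log

# A number (integer or decimal) that follows an equals sign
VALUE_AFTER_EQUALS = re.compile(r"=\s*(\d+\.\d+|\d+)")

def logs_of_values(lines):
    """Collect every number found after '=' and return the natural log of each."""
    values = []
    for line in lines:
        if '#IND' in line:
            continue  # skip lines flagged as indeterminate, as in the original
        values.extend(float(match) for match in VALUE_AFTER_EQUALS.findall(line))
    # Assumption: zero cannot be passed to log, so it is left out
    return [log(value) for value in values if value > 0]

# Hypothetical lines standing in for the statistics file
sample_lines = ["width = 1", "height=2.5", "ratio = -1.#IND", "no value here"]
result = logs_of_values(sample_lines)
print(result)
```

Keeping the accumulated values in their own list, separate from the lines being read, is what prevents each iteration from overwriting the previous results.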
I have a file with lots of lines, each containing numbers separated by commas.
Some lines contain floating-point numbers, e.g. ['9.3'].
I have been trying to delete those numbers from the list for about an hour, but with no luck. Whenever the code calls int on one of these float values it raises an error. I'm not sure how to remove the floating-point numbers from the lines.
The numbers: https://pastebin.com/7vveTxjW
This is what I've done so far:
with open('planets.txt','r') as f:
    lst = []
    temp = []
    for line in f:
        l = line.strip()
        l = l.split(',')
        for x in l:
            if x == '':
                l[l.index(x)] = 0
            elif x == '\n' or x == '0':
                print "removed value", l[l.index(x)]
                del l[l.index(x)]
        try:
            temp.append([int(y) for y in l])
        except ValueError:
            pass
First off, modifying the list you are iterating over is a bad idea. A better idea might be to construct a new list from the elements that can be converted to an int.
def is_int(element):
    try:
        int(element)
    except ValueError:
        return False
    return True
with open('planets.txt','r') as f:
    lst = []
    temp = []
    for line in f:
        l = line.strip().split(',')
        temp.append([int(y) for y in l if is_int(y)])
If you want to include the float values' integral component, you can do:
def is_float(element):
    try:
        float(element)
    except ValueError:
        return False
    return True
with open('planets.txt','r') as f:
    lst = []
    temp = []
    for line in f:
        l = line.strip().split(',')
        temp.append([int(float(y)) for y in l if is_float(y)])
It looks like you're overcomplicating it. Once you have read all your numbers from the file into a list, just run this:
numbers=["8","9","10.5","11.1"]
intList=[]
for num in numbers:
    try:
        int_test=int(num)
        intList.append(num)
    except ValueError:
        pass
Just replace my numbers list with the name of your own list.
You can just match digits followed by a point followed by more digits:
import re
output_list = []
input = open('input.txt', 'r')
for line in input:
    if '.' not in line:
        output_list.append(line)
    else:
        output_list.append(re.sub(r"(\d+\.\d+)", '', line))
        print("Removed", re.search(r"(\d+\.\d+)", line).group(1))
I would keep the numbers as string and simply check if there is a 'dot' in each 'splitted item' to identify floats.
with open('planets.txt','r') as f:
    for line in f:
        line = ','.join([item for item in line.strip().split(',') if not '.' in item])
        print(line)
        # ... write to file ...
When you get a list that contains digits (for instance, ['1','2.3','4.5','1.0']) you can use the following
intDigits = [ i for i in digits if float(i) - int(float(i)) == 0]
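Note that this comprehension keeps '1.0', because its value is a whole number. A short sketch of the difference between testing the value and testing the text follows; the names whole_valued and digits_only are made up here.

```python
digits = ['1', '2.3', '4.5', '1.0']

# Keeps strings whose numeric value is whole, so '1.0' survives
whole_valued = [d for d in digits if float(d).is_integer()]

# Keeps only strings made entirely of digit characters, so '1.0' is dropped
digits_only = [d for d in digits if d.isdigit()]

print(whole_valued)  # ['1', '1.0']
print(digits_only)   # ['1']
```

str.isdigit() also rejects a leading minus sign, so this second form is only suitable when the data contains no negative numbers.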
Doing int() on a float shouldn't give an error. Maybe your 'float' is actually a string as you're reading from the file, because
int('9.3')
doesn't work but
int(9.3)
does.
Edit:
How about applying this function to every number
def intify(n):
    if n == '':
        return n
    try:
        return int(n)
    except ValueError:
        return int(float(n))
If you just need to remove floats this was the simplest method for me.
mixed = [1,float(1),2,3,4.3,5]
ints = [i for i in mixed if type(i) is not float]
Results in: [1, 2, 3, 5]
I am extracting numbers from sentences within a large file. Example sentence (in file finalsum.txt):
the cat in the 42 hat and the cow 1772 jumps over the moon.
I then use regular expressions to build a list of the numbers, and I want to add them together (1772 + 42 in this example) across the entire file to get a final sum of all the numbers.
Here is my current code :
import re
import string
fname = raw_input('Enter file name: ')
try:
    if len(fname) < 1: fname = "finalsum.txt"
    handle = open(fname, 'r')
except:
    print 'Cannot open file:', fname
counts = dict()
numlist = list()
for line in handle:
    line = line.rstrip()
    x = re.findall('([0-9]+)', line)
    if len(x) > 0:
        result = map(int, x)
        numlist.append(result)
print numlist
I know this code can be written in two lines using a list comprehension, but I am just learning. Thank you!
You should not append the list; instead, join the lists using the + operator.
for line in handle:
    line = line.rstrip()
    x = re.findall('([0-9]+)', line)
    if len(x) > 0:
        result = map(int, x)
        numlist+=result
        #numlist.append(result)
print numlist
print sum(numlist)
Illustration:
>>> a = [1,2]
>>> b = [2,3]
>>> c = [4,5]
>>> a.append(b)
>>> a
[1, 2, [2, 3]]
>>> print b + c
[2, 3, 4, 5]
As you can see, append() adds the list object itself as a single element at the end, whereas + joins the two lists.
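The effect on the final sum can be sketched in Python 3 as follows. The two list names and the sample matches are invented for the comparison.

```python
nested = []  # built with append()
flat = []    # built with extend(), equivalent to += here

# Hypothetical findall() results for two lines of a file
for found in (['42', '1772'], ['11']):
    numbers = [int(s) for s in found]
    nested.append(numbers)  # adds the whole list as one element
    flat.extend(numbers)    # adds each number individually

print(nested)     # [[42, 1772], [11]]
print(flat)       # [42, 1772, 11]
print(sum(flat))  # 1825

try:
    sum(nested)   # adding an int to a list is not defined
except TypeError as exc:
    print("sum(nested) failed:", exc)
```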
Check this:
import re
line = 'the cat in the 42 hat and the cow 1772 jumps over the moon 11 11 11'
y = re.findall('([0-9]+)', line)
print sum([int(z) for z in y])
This does not create nested lists; it converts every value in the list to an integer and adds them all up. You can keep a running total in a variable and add the sum of the integers on each line to it.
Like this:
answer = 0
for line in handle:
    line = line.rstrip()
    x = re.findall('([0-9]+)', line)
    if len(x) > 0:
        answer += sum([int(z) for z in x])
print answer