I wrote this code to extract only the digits from a text file and then calculate the sum of the extracted values. But I am getting 0 as the answer, when it should actually be 285701. I don't understand what I am doing wrong, even after working on it for a long time. I am not very experienced in programming; I have just started learning.
import re
fname = open("http://py4e-data.dr-chuck.net/regex_sum_1501185.txt")
sum = 0
value = list()
for line in fname:
    line = re.findall("[0-9]+", line)
    value = value + line
for x in value:
    sum = sum + int(x)
print(sum)
You can't open web URLs with open(); you need to use urllib.request.urlopen():
import urllib.request
import re
fname = urllib.request.urlopen("http://py4e-data.dr-chuck.net/regex_sum_1501185.txt")
data = fname.read().decode()
data = data.split('\n')
sum = 0
value = list()
for line in data:
    nums = re.findall("[0-9]+", line)
    value = value + nums
for x in value:
    sum = sum + int(x)
print(sum)
Output:
285701
You need to be careful with your variable names: naming a variable sum shadows the builtin function sum(), so you won't be able to call it afterwards.
It would be better if your code looked like this:
import urllib.request
import re
fname = urllib.request.urlopen("http://py4e-data.dr-chuck.net/regex_sum_1501185.txt")
data = fname.read(50000).decode()
data = data.split('\n')
value = list()
for line in data:
    line = re.findall("[0-9]+", line)
    value = value + [int(i) for i in line]
print(sum(value))
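To see why the variable name matters, here is a small sketch. The helper name shadowing_demo is made up for illustration; it only shows what happens once a local variable called sum hides the builtin.

```python
def shadowing_demo():
    # Hypothetical helper: rebinding the name `sum` hides the builtin
    # for the rest of this function.
    sum = 0
    try:
        sum([1, 2, 3])  # `sum` is now the int 0, so calling it fails
    except TypeError as exc:
        return str(exc)
    return "no error"

print(shadowing_demo())  # 'int' object is not callable
```

Using a different name for the running total (for example total) avoids the problem.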
I've been trying to figure this out for about a year now and I'm really burnt out on it so please excuse me if this explanation is a bit rough.
I cannot include job data, but it would be accurate to imagine 2 csv files both with the first column populated with values (Serial numbers/phone numbers/names, doesn't matter - just values). Between both csv files, some values would match while other values would only be contained in one or the other (Timmy is in both files and is a match, Robert is only in file 1 and does not match any name in file 2).
I can successfully output a csv value ONCE that exists in both csv files (i.e. both files contain "Value78", output file will contain "Value78" only once).
When I try to tack on an else statement to my if condition, to handle non-matching items, the program will output 1 entry for every item it does not match with (makes 100% sense, matches happen once but every other comparison result besides the match is a non-match).
I cannot envision a structure or method to hold the fields that don't match back so that they can be output once and not overrun my terminal or output file.
My goal is to output two csv files, matches and non-matches, with the non-matches having only one entry per value.
Anyways, onto the code:
import csv
MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'
with open(MYUNITS,mode='r') as MFile, \
     open(VENDORUNITS,mode='r') as VFile, \
     open(MATCHES,mode='w') as OFile, \
     open(NONMATCHES,mode='w') as NFile:
    MyReader = csv.reader(MFile,delimiter=',',quotechar='"')
    MyList = list(MyReader)
    VendorReader = csv.reader(VFile,delimiter=',',quotechar='"')
    VList = list(VendorReader)
    for x in range(len(MyList)):
        for y in range(len(VList)):
            if str(MyList[x][0]) == str(VList[y][0]):
                OFile.write(MyList[x][0] + '\n')
            else:
                pass
The "else: pass" is where the logic of filtering out non-matches is escaping me. Outputting from this else statement will write the non-matching value (len(VList) - 1) times for an iteration that DOES produce 1 match, the entire len(VList) for an iteration with no match. I've tried using a counter and only outputting if the counter equals the len(VList), (incrementing in the else statement, writing output under the scope of the second for loop), but received the same output as if I tried outputting non-matches.
Below is one way you might go about deduplicating and then writing to a file:
import csv
MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'
list_of_non_matches = []
with open(MYUNITS,mode='r') as MFile, \
     open(VENDORUNITS,mode='r') as VFile, \
     open(MATCHES,mode='w') as OFile, \
     open(NONMATCHES,mode='w') as NFile:
    MyReader = csv.reader(MFile,delimiter=',',quotechar='"')
    MyList = list(MyReader)
    VendorReader = csv.reader(VFile,delimiter=',',quotechar='"')
    VList = list(VendorReader)
    for x in range(len(MyList)):
        for y in range(len(VList)):
            if str(MyList[x][0]) == str(VList[y][0]):
                OFile.write(MyList[x][0] + '\n')
                break
        else:
            # for/else: runs only when the inner loop found no match
            list_of_non_matches.append(MyList[x][0])
    # Remove duplicates from the non matches
    new_list = []
    for x in list_of_non_matches:
        if x not in new_list:
            new_list.append(x)
    # Write the new list to a file
    for i in new_list:
        NFile.write(i + '\n')
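As a side note on the deduplication step, here is a minimal sketch of an order-preserving alternative. The function name dedupe_keep_order and the sample names are made up for illustration; it relies on dict keys keeping insertion order, which is guaranteed from Python 3.7 on.

```python
def dedupe_keep_order(items):
    # dict keys are unique and remember insertion order (Python 3.7+),
    # so this keeps the first occurrence of each value.
    return list(dict.fromkeys(items))

print(dedupe_keep_order(["Robert", "Ann", "Robert", "Ann", "Zed"]))
# ['Robert', 'Ann', 'Zed']
```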
Does this work?
import csv
MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'
with open(MYUNITS,'r') as MFile, \
     open(VENDORUNITS,'r') as VFile, \
     open(MATCHES,'w') as OFile, \
     open(NONMATCHES,'w') as NFile:
    MyReader = csv.reader(MFile,delimiter=',',quotechar='"')
    MyList = list(MyReader)
    # Keep only the first column of each row
    MyVals = [x[0] for x in MyList]
    VendorReader = csv.reader(VFile,delimiter=',',quotechar='"')
    VList = list(VendorReader)
    vVals = [x[0] for x in VList]
    for val in MyVals:
        if val in vVals:
            OFile.write(val + '\n')
        else:
            NFile.write(val + '\n')
#for x in range(len(MyList)):
# for y in range(len(VList)):
# if str(MyList[x][0]) == str(VList[y][0]):
# OFile.write(MyList[x][0] + '\n')
# else:
# pass
Sorry, I had some issues with my PC. I was able to solve my own question the night I posted. The solution I used is so simple I'm kicking myself for not figuring it out way sooner:
import csv
MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'
with open(MYUNITS,mode='r') as MFile, \
     open(VENDORUNITS,mode='r') as VFile, \
     open(MATCHES,mode='w') as OFile, \
     open(NONMATCHES,mode='w') as NFile:
    MyReader = csv.reader(MFile,delimiter=',',quotechar='"')
    MyList = list(MyReader)
    VendorReader = csv.reader(VFile,delimiter=',',quotechar='"')
    VList = list(VendorReader)
    for x in range(len(MyList)):
        tmpStr = ''
        for y in range(len(VList)):
            if str(MyList[x][0]) == str(VList[y][0]):
                tmpStr = '' #Sets to blank so comparison fails, works because break
                OFile.write(MyList[x][0] + '\n')
                break
            else:
                tmpStr = str(MyList[x][0])
        if tmpStr != '':
            NFile.write(tmpStr + '\n')
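For larger files, the same split can probably be done without the nested loop by using a set for the vendor values. This is only a sketch: the helper names first_column and split_matches are made up, the sample data is invented, and io.StringIO stands in for the real csv files. Unlike the loops above, it also writes each matching value only once.

```python
import csv
import io

def first_column(csv_file):
    # Read the first column of every non-empty row.
    return [row[0] for row in csv.reader(csv_file) if row]

def split_matches(my_values, vendor_values):
    vendor_set = set(vendor_values)  # fast membership tests
    matches, non_matches = [], []
    seen = set()
    for value in my_values:
        if value in seen:
            continue  # report each value only once
        seen.add(value)
        if value in vendor_set:
            matches.append(value)
        else:
            non_matches.append(value)
    return matches, non_matches

# In-memory stand-ins for MyUnits.csv and VendorUnits.csv (made-up data)
my_file = io.StringIO("Timmy\nRobert\nTimmy\n")
vendor_file = io.StringIO("Timmy\nAlice\n")
matches, non_matches = split_matches(first_column(my_file), first_column(vendor_file))
print(matches)      # ['Timmy']
print(non_matches)  # ['Robert']
```

Like the original loops, this only reports values from the first file; values that appear only in the vendor file (Alice here) are not listed.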
I have a .txt file called ecc.txt. It contains more than 8000 lines of numbers. I want to compute the average of every 360 lines in that file.
Here is the code:
import math
f = open(r'ecc.txt').read()
data = []
for line in data:
    sum = 0
    for i in range (len(data)):
        if i % 360 != 0:
            sum = sum + ecc[i]
        else:
            average = sum / 360
            print(average)
            sum = 0
When I run it, nothing happens. I don't get any results; the code just runs and ends without printing anything.
Is there something wrong with this code?
Thank you.
avg_dict = {}
with open('ecc.txt') as f:
    data = f.read().split(' ')
sum = 0
i = 0
for str_number in data:
    sum += int(str_number)
    i += 1
    if i % 360 == 0:
        avg_dict[i] = sum/360
        sum = 0
I've assumed that the numbers in your file are separated by a single space. Otherwise, you can change the separator passed to the split method. If there is no separator, change data to:
data = list(f.read())
Your code would work with some changes:
import math
data=[]
with open(r'ecc.txt') as f:
    for i in f:
        data.append(int(i))
sum = 0
for i in range (len(data)):
    if i%360 !=0:
        sum = sum + data[i]  # data holds the numbers; ecc was never defined
    else:
        average = sum/360
        print(average)
        sum=0
Be aware, though, that this code doesn't add every 360th element to the sum (I guess that's not a problem for an average), and you also don't get an average for the trailing elements.
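If those two gaps matter, one option is to slice the list into groups instead of testing i % 360. A minimal sketch, assuming the numbers are already in a list; the function name chunk_averages and the sample values are made up:

```python
def chunk_averages(values, size=360):
    # Average each consecutive group of `size` values; the last group
    # may be shorter and is divided by its own length.
    averages = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        averages.append(sum(chunk) / len(chunk))
    return averages

# Small group size so the result is easy to check by hand
print(chunk_averages([1, 2, 3, 4, 5, 6, 7], size=3))  # [2.0, 5.0, 7.0]
```

Note that this needs sum to still be the builtin, so the running total should not be called sum in the same scope.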
I am doing an exercise that involves finding all the floating-point values in a text file and computing their average.
I have managed to extract all the necessary values, but they are being stored in a list of lists and I don't know how to extract them as floats in order to do the calculations.
Here is my code:
import re
fname = input("Enter file name: ")
fhandle = open(fname)
x = []
count = 0
for line in fhandle:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    s = re.findall(r"[-+]?\d*\.\d+|\d+", line)
    x.append(s)
    count = count + 1
print(x)
print("Done")
and this is the output of x :
[['0.8475'], ['0.6178'], ['0.6961'], ['0.7565'], ['0.7626'], ['0.7556'], ['0.7002'], ['0.7615'], ['0.7601'], ['0.7605'], ['0.6959'], ['0.7606'], ['0.7559'], ['0.7605'], ['0.6932'], ['0.7558'], ['0.6526'], ['0.6948'], ['0.6528'], ['0.7002'], ['0.7554'], ['0.6956'], ['0.6959'], ['0.7556'], ['0.9846'], ['0.8509'], ['0.9907']]
Done
You can make x a flat list of floats from the start:
# ...
for line in fhandle:
    # ...
    s = re.findall(r"[-+]?\d*\.\d+|\d+", line)
    x.extend(map(float, s))
Note that re.findall returns a list, so we extend x by it while applying float to all the strings in it.
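With a flat list of floats, the average is just sum over len. Here is a small self-contained sketch of the whole exercise; the function name average_confidence and the sample lines are made up, and a list of strings stands in for the open file:

```python
import re

def average_confidence(lines):
    # Collect every number found on the X-DSPAM-Confidence lines as a float.
    values = []
    for line in lines:
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        values.extend(map(float, re.findall(r"[-+]?\d*\.\d+|\d+", line)))
    # Avoid dividing by zero when no matching line was found.
    return sum(values) / len(values) if values else None

sample = [
    "X-DSPAM-Confidence: 0.8475\n",
    "Subject: hello 42\n",
    "X-DSPAM-Confidence: 0.6178\n",
]
print(average_confidence(sample))
```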
so -----2-----3----5----2----3----- would become -----4-----5----7----4----5-----
if the constant was 2 and etc. for every individual line in the text file.
This would involve recognising the numbers in between the dashes and adding a constant to each whole number, e.g. ---15--- becomes ---17--- not ---35---.
(basically getting a guitar tab and adding a constant to every fret number)
Thanks. Realised this started out vague and confusing so sorry about that.
Let's say the file is:
-2--3--5---7--1/n-6---3--5-1---5
and I'm adding 2, it should become:
-4--5--7---9--3/n-8---5--7-3---7
Change the filename to something relevant and this code will work. Anything below new_string needs to be changed to suit what you need, e.g. writing to a file.
def addXToAllNum(delta: int, line: str):
    # Keep only the pieces between dashes that are pure digits
    values = [x for x in line.split('-') if x.isdigit()]
    values = [str(int(x) + delta) for x in values]
    return '--'.join(values)
delta = 2 # the constant to add
new_string = '' # change this section to save to new file
for line in open('tabfile.txt', 'r'):
    new_string += addXToAllNum(delta, line.rstrip('\n')) + '\n'
## general principle
s = '-4--5--7---9--3 -8---5--7-3---7'
addXToAllNum(2, s) #6--7--9--11--10--7--9--5--9
This takes all numbers and increments by the shift regardless of the type of separating characters.
import re
shift = 2
numStr = "---1----9---15---"
print("Input: " + numStr)
resStr = ""
m = re.search("[0-9]+", numStr)
while (m):
    resStr += numStr[:m.start(0)]
    resStr += str(int(m.group(0)) + shift)
    numStr = numStr[m.end(0):]
    m = re.search("[0-9]+", numStr)
resStr += numStr
print("Result:" + resStr)
Hi, you can use this to add a - between every line of the text file:
rt = ''
f = open('a.txt','r')
app = f.readlines()
for i in app:
    rt+=str(i)+'-'
print " ".join(rt.split())
import re
c = 2 # in this example, the increment constant value is 2
with open('<your file path here>', 'r+') as file:
    new_content = re.sub(r'\d+', lambda m: str(int(m.group(0)) + c), file.read())
    file.seek(0)
    file.write(new_content)
    file.truncate() # drop leftover bytes if the new content is shorter (e.g. negative c)
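The same substitution can be tried on a plain string before touching a file. A minimal sketch, using the examples from the question; the function name shift_frets is made up:

```python
import re

def shift_frets(tab, shift):
    # Replace every run of digits with that number plus `shift`;
    # dashes and any other characters are left untouched.
    return re.sub(r"\d+", lambda m: str(int(m.group(0)) + shift), tab)

print(shift_frets("-2--3--5---7--1", 2))  # -4--5--7---9--3
print(shift_frets("---15---", 2))         # ---17---
```

One thing to keep in mind: when a number gains a digit (9 becoming 11, say), the line gets one character longer, so the columns of a tab may no longer line up.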
The problem is to read the file, look for integers using re.findall() with the regular expression '[0-9]+', convert the extracted strings to integers, and sum them up.
MY CODE: in which sample.txt is my text file
import re
hand = open('sample.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('[0-9]+',line)
    print x
x = [int(i) for i in x]
add = sum(x)
print add
OUTPUT:
You need to append the results found on each line to another list, so that the numbers found on the current line are kept when the loop moves on to the next line.
import re
hand = open('sample.txt')
l = []
for line in hand:
    x = re.findall('[0-9]+',line)
    l.extend(x)
j = [int(i) for i in l]
add = sum(j)
print add
or
with open('sample.txt') as f:
    print sum(map(int, re.findall(r'\d+', f.read())))
Try this:
import re
hand = open("a.txt")
x=list()
for line in hand:
    y = re.findall('[0-9]+',line)
    x = x+y
sum=0
for z in x:
    sum = sum + int(z)
print(sum)
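For comparison, here is a Python 3 sketch that keeps a running total instead of building a list, and avoids reusing the name sum. The function name sum_numbers and the sample lines are made up; a list of strings stands in for the open file:

```python
import re

def sum_numbers(lines):
    # Add up every run of digits found on every line.
    total = 0
    for line in lines:
        total += sum(int(n) for n in re.findall(r"[0-9]+", line))
    return total

sample = ["Why 12 should you learn", "to write 7 programs 30?", "no digits here"]
print(sum_numbers(sample))  # 49
```

With a real file this could be called as sum_numbers(open('sample.txt')), since iterating over a file yields its lines.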