I am new at this and couldn't find my exact problem by searching. Though it may be simple to many of you, it is certainly a bowl full of aggravation to me. So, here it is...
I have a text file with columns and rows. The column headers are strings and the rows are numeric.
EXAMPLE: text file.
Line 1: a.1 2.g 2.2 b.3
Line 2: --------------------
Line 3: 1 2 4 1
Line 4: 3 3 1 1
Line 5: 2 1 5 8
I need to read the text file and do some simple math.
1. sum rows 3,4,5.
The final result should look like the following:
FINAL
Line 1:
Line 2:
Line 3: 8
Line 4: 8
Line 5: 16
Here is what I have so far...
files = open("exam-grades.txt", "r")
line_number = 1
for line in files:
    if (line_number == 1 or line_number == 2):
        continue
    else:
        sum = 0
        numbers = line.split("\t")
        for n in numbers:
            sum = sum + float(n)
        print "Line #: %d / Sum is %d ."%(line_number,sum)
    line_number = line_number + 1
line_number is 1 initially, so it hits the first if statement and continues without incrementing line_number, and thus does the same thing for every line, and doesn't process any of them.
You could correct for this by replacing the top of your loop like so:
for line in files:
    if line_number>=3:
        sum = 0
        ...
with open("exam-grades.txt", "r") as f:
    for i, line in enumerate(f, start=1):
        if i >= 3:
            l = line.split()
            print l[0], l[1], sum((int(i) for i in l[2:] if i.isdigit()))
The logic here is pretty simple: you enumerate the lines of the file, starting from 1.
If the line number is 3 or greater, you split the line into a list, check whether each list element can be converted to int, and if it can, add it to the sum and print the result.
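For anyone who wants to try the enumerate idea without the file, here is a minimal Python 3 sketch. The helper name `sum_rows` and its `first_data_line` parameter are made up for illustration, and it assumes every data line holds only whitespace-separated numbers:

```python
def sum_rows(lines, first_data_line=3):
    """Return (line_number, total) pairs for every line from first_data_line on.

    Hypothetical helper: assumes each data line holds whitespace-separated numbers.
    """
    results = []
    for line_number, line in enumerate(lines, start=1):
        if line_number < first_data_line:
            continue  # enumerate still advances the counter, unlike a manual one
        total = sum(float(field) for field in line.split())
        results.append((line_number, total))
    return results


# The sample data from the question, as an in-memory list of lines.
sample = [
    "a.1 2.g 2.2 b.3\n",
    "--------------------\n",
    "1 2 4 1\n",
    "3 3 1 1\n",
    "2 1 5 8\n",
]
for line_number, total in sum_rows(sample):
    print("Line %d: %d" % (line_number, total))
```

Because the counter comes from enumerate, using continue for the header lines is safe here: the line number advances regardless.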
I would try to make it a bit more flexible, i.e. we won't sum up numbers for lines where at least one value can't be converted to float/number:
def to_number(s):
    try:
        return float(s)
    except ValueError:
        return None
def sum_in_line(s):
    sm = 0
    cols = s.split()
    try:
        return sum(to_number(x) for x in cols)
    except TypeError:
        return ''
with open("exam-grades.txt", "r") as f:
    i = 1
    for line in f:
        print('%4d:\t%s' % (i, sum_in_line(line)))
        i += 1
Input data:
a.1 2.g 2.2 b.3
--------------------
1 2 4 1
3 3 1 1
2 1 5 8
1 a Z 12
Output:
1:
2:
3: 8.0
4: 8.0
5: 16.0
6:
1) Use pass instead of continue; if you use continue, line_number will not be updated.
2) To get the numbers, use first_part, numbers = line.split(':'); numbers = numbers.split(),
because if you try to get the numbers like you did, it will fail:
>>> sum = 0
>>> line="Line 3: 1 2 4 1"
>>> numbers = line.split("\t")
>>> numbers
['Line 3:', '1', '2', '4', '1']
>>> for n in numbers:
...     sum = sum + float(n)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: could not convert string to float: Line 3:
You should skip the first element 'Line 3:'.
Try this:
files = open("exam-grades.txt", "r")
line_number = 1
for line in files:
    # only the two header lines are skipped; line 3 already holds numbers
    if (line_number == 1 or line_number == 2):
        print "Line %d:" % line_number
        pass
    else:
        sum = 0
        first_part, numbers = line.split(":")
        for n in numbers.split():
            sum = sum + float(n)
        print "Line %d: Sum is %d" % (line_number,sum)
    line_number = line_number + 1
Seems a bit like a natural for regular expressions to me, depending on the complexity of your numbers... in this example, I have "data" as a generic iterator for your input data:
import re
valid = re.compile(r'^(?:\s*(\d+\s*))+\s*$')
numbers = re.compile(r'(\d+)')
for line,row in enumerate(data):
    if valid.match(row):
        print "Line #: %d / Sum is %d ."%(line+1,sum(map(int, numbers.findall(row))))
Related
After reading a text, I need to add 1 to a sum if I find a ( character, and subtract 1 if I find a ) character in the text. I can't figure out what I'm doing wrong.
This is what I tried at first:
file = open("day12015.txt")
sum = 0
up = "("
for item in file:
    if item is up:
        sum += 1
    else:
        sum -= 1
print(sum)
I have this long text like the following example (((())))((((( .... If I find a ), I need to subtract 1, if I find a (, I need to add 1. How can I solve it? I'm always getting 0 as output even if I change my file manually.
Your for loop iterates over the lines of the file, so each item is a whole line (one string), not a single character. You have to loop through that string as well to get your desired output.
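To see the difference between iterating over a file and iterating over a string, here is a small sketch. It uses io.StringIO as a stand-in for the opened file (the contents are made up to match the example), so it runs without any file on disk:

```python
import io

# io.StringIO stands in for the opened file; the contents are made up.
fake_file = io.StringIO("(((())))(((((\n")

# Iterating over a file object yields one item per line, not per character.
items = [item for item in fake_file]
print(items)

# Iterating over the line itself yields the individual characters.
line = items[0]
total = 0
for character in line:
    if character == "(":
        total += 1
    elif character == ")":
        total -= 1
print(total)
```

The comparison uses == rather than is, since is tests object identity and only happens to work for some strings.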
Example .txt
(((())))(((((
Full Code
file = open("Data.txt")
sum = 0
up = "("
for string in file:
    for item in string:
        # compare with ==, and only subtract for ")" so the newline is ignored
        if item == up:
            sum += 1
        elif item == ")":
            sum -= 1
print(sum)
Output
5
Hope this helps. Happy Coding :)
So you need to add 1 for each "(" character and subtract 1 for each ")".
Do it by specifying directly what should happen when you encounter each character. You also need to go through the characters of each line you read from the file. In your code, you are subtracting one for every case that is not "(".
file = open("day12015.txt")
total = 0
for line in file:
    for character in line:
        if character == "(":
            total += 1
        elif character == ")":
            total -= 1
print(total)
That's simply a matter of counting each character in the text. The sum is the difference between those counts. Look:
from pathlib import Path
file = Path('day12015.txt')
text = file.read_text()
total = text.count('(') - text.count(')')
For the string you posted, for example, we have this:
>>> p = '(((())))((((('
>>> p.count('(') - p.count(')')
5
>>>
Just for comparison and out of curiosity, I timed the str.count() approach and a loop approach, 1,000 times each, using a string composed of 1,000,000 random ( and ) characters. Here is what I found:
import random
from timeit import timeit
random.seed(0)
p = ''.join(random.choice('()') for _ in range(1_000_000))
def f():
    return p.count('(') - p.count(')')
def g():
    a, b = 0, 0
    for c in p:
        if c == '(':
            a = a + 1
        else:
            b = b + 1
    return a - b
print('f: %5.2f s' % timeit(f, number=1_000))
print('g: %5.2f s' % timeit(g, number=1_000))
f: 8.19 s
g: 49.34 s
It means the loop approach is 6 times slower, even though the str.count() one is iterating over p two times to compute the result.
I am writing code that extracts specific lines from my file and then looks for the maximum number, more specifically for its position (index).
So I start my code by looking for the lines:
with open (filename,'r') as f:
    lines = f.readlines()
    for index, line in enumerate(lines):
        if 'a ' in line:
            x=(lines[index])
            print(x)
So here from my code I got the lines I was looking for:
a 3 4 5
a 6 3 2
Then the rest of my code is looking for the maximum between the numbers and prints the index:
y = [float(item) for item in x.split()]
z=y.index(max(y[1:3]))
print(z)
Now the code finds the index of the largest number in each line (so for 5 in the first line and 6 in the second):
3
1
But I also want my code to compare the numbers across the two lines (so the largest number among 3, 4, 5, 6, 3, 2) and output the index of the line in the file that contains the largest number (for example line 300) and its position in that line (1).
Can you suggest to me some possible solutions?
You can try something like this.
max_value is a list that holds the max number, its line and its position:
max_value = [0, 0, 0]  # value, line, position
with open(filename, 'r') as f:
    lines = f.readlines()
    for index, line in enumerate(lines):
        if 'a ' in line:
            # get line data with digits (split() also drops the trailing newline)
            line_data = line.split()[1:]
            # check if element is a digit and bigger than max value - save it
            for el_index, element in enumerate(line_data):
                if element.isdigit() and int(element) > max_value[0]:
                    max_value = [int(element), index, el_index]
print(max_value)
Input data
a 3 4 5
a 6 3 2
Output data
# 6 - is max, 1 - line, 0 - position
[6, 1, 0]
You should iterate over every single line and keep track of the line number as well as the position of the items in that line. By the way, the f-string in the last line requires Python 3.6+ (.startswith() itself works in any version).
with open(filename) as f:
    lines = [line.rstrip() for line in f]
max_ = 0
line_and_position = (0, 0)
for i, line in enumerate(lines):
    if line.startswith('a '):
        # building list of integers for finding the maximum
        list_ = [int(i) for i in line.split()[1:]]
        for item in list_:
            if item > max_:
                max_ = item
                # setting the line number and position in that line
                line_and_position = i, line.find(str(item))
print(f'maximum number {max_} is in line {line_and_position[0] + 1} at index {line_and_position[1]}')
Input :
a 3 4 5
a 6 3 2
a 1 31 4
b 2 3 2
a 7 1 8
Output:
maximum number 31 is in line 3 at index 4
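One thing to watch with line.find(str(item)): it returns the first place the digits occur as a substring, so for a line like `a 13 3`, looking up `3` would point into `13`. Here is a sketch of a variant that reports the position of the number among the fields instead of a character offset. The function name `find_max`, the `prefix` parameter, and the layout of the returned tuple are my own choices for illustration:

```python
def find_max(lines, prefix="a "):
    """Return (value, line_number, field_index) of the largest number.

    line_number is 1-based; field_index is 0-based among the numbers
    following the prefix. Returns None if no matching line has numbers.
    """
    best = None
    for line_number, line in enumerate(lines, start=1):
        if not line.startswith(prefix):
            continue
        # Skip the leading "a" token and walk the remaining fields in order.
        for field_index, field in enumerate(line.split()[1:]):
            value = int(field)
            if best is None or value > best[0]:
                best = (value, line_number, field_index)
    return best


sample = ["a 3 4 5\n", "a 6 3 2\n", "a 1 31 4\n", "b 2 3 2\n", "a 7 1 8\n"]
print(find_max(sample))
```

Starting from None rather than 0 also means negative numbers would be handled, assuming the fields are always valid integers.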
You can do it like below. I commented each line for explanation. This method differs from the others in that, using a regex, we get the current number and its character position from one source. In other words, there is no going back into the line to find data after the fact; everything we need comes on every iteration of the loop. Also, all the lines are filtered as they are received. Between the two, a stack of conditions is eliminated: we end up with 2 loops that get directly to the point and one condition to see if the stored data needs to be updated.
import re
with open(filename, 'r') as f:
    #prime data
    data = (0, 0, 0)
    #store every line that starts with 'a' or blank line if it doesn't
    for L, ln in enumerate([ln if ln.startswith('a') else '' for ln in f.readlines()]):
        #get number and line properties
        for res in [(int(m.group('n')), L, m.span()[0]) for m in re.compile(r'(?P<n>\d+)').finditer(ln)]:
            #compare new number with current max
            if res[0] > data[0]:
                #store new properties if greater
                data = res
#print final
print('Max: {}, Line: {}, Position: {}'.format(*data))
My code doesn't do what it's supposed to do - finding max/min and printing which line contains each of those values.
It does find the max/min, but it doesn't print the expected line. Here is my code:
eqlCounter = 0
octals = []
with open("D:\matura\Matura2017\Dane_PR2\liczby.txt", "r") as f:
    for x in f:
        lines = f.readline()
        splited = lines.split()
        toInt = int(splited[1], 8) #oct to int(dec)
        octals.append(toInt)
        if int(splited[0]) == toInt:
            eqlCounter += 1
low = min(octals)
maxx = max(octals)
print("same nmbrs: ", eqlCounter) #a
print("min: ", min(octals),"at: ",octals.index(low))
print("max: ", max(octals),"at: ",octals.index(maxx))
Each line contains a decimal number (1st column) and an octal number (2nd column). My code finds the smallest and the biggest octal numbers and then prints them out as decimals. It works fine except for displaying the lines that contain those values.
40829 134773
28592 31652
15105 123071
18227 36440
51074 122407
23893 117256
30785 100453
39396 11072
50492 105177
36134 32555
OUTPUT:
same nmbrs: 0
min: 4666 at: 3
max: 40622 at: 2
The values were found correctly, but not in the 3rd line. 8 is supposed to be the correct output, since it's the line that contains that exact value.
Here is the correct version of your code. The issue is with the way you iterate over lines of the file. Also you need +1 if you want to see row 8 instead of row 7.
eqlCounter = 0
octals = []
# raw string, so the backslash in the Windows path is taken literally
with open(r"D:\liczby.txt", "r") as f:
    for line in f.readlines():
        splited = line.split()
        toInt = int(splited[1], 8) #oct to int(dec)
        octals.append(toInt)
        if int(splited[0]) == toInt:
            eqlCounter += 1
        # print(splited[0],splited[1],toInt)
low = min(octals)
maxx = max(octals)
print("same nmbrs: ", eqlCounter) #a
print("min: ", min(octals),"at: ",octals.index(low)+1)
print("max: ", max(octals),"at: ",octals.index(maxx)+1)
result:
same nmbrs: 0
min: 4666 at: 8
max: 47611 at: 1
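As a side note, the value and its 1-based line number can also be obtained together by running min/max over enumerate, which avoids the separate index() lookup and the manual +1. A minimal sketch, with the ten rows from the question pasted in as a string instead of being read from the file:

```python
# The ten rows from the question, inlined so the sketch runs without the file.
rows = """40829 134773
28592 31652
15105 123071
18227 36440
51074 122407
23893 117256
30785 100453
39396 11072
50492 105177
36134 32555""".splitlines()

# Second column is octal; int(text, 8) converts it to a decimal integer.
values = [int(row.split()[1], 8) for row in rows]

# enumerate(..., start=1) pairs every value with its 1-based line number.
low_line, low = min(enumerate(values, start=1), key=lambda pair: pair[1])
high_line, high = max(enumerate(values, start=1), key=lambda pair: pair[1])

print("min:", low, "at line:", low_line)
print("max:", high, "at line:", high_line)
```

If the same value occurs on several lines, min and max return the first such line, just like index() does.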
When executing your code, I get an error at runtime:
Traceback (most recent call last):
  File "app.py", line 5, in <module>
    lines = f.readline()
ValueError: Mixing iteration and read methods would lose data
This is because you loop over the lines of the input file with for while also calling f.readline() inside the loop. Each readline() call consumes the next line, so every iteration skips one line.
Here is your code fixed:
eqlCounter = 0
octals = []
with open("D:\matura\Matura2017\Dane_PR2\liczby.txt", "r") as f:
    lines = f.readlines()
    for line in lines:
        splited = line.split()
        toInt = int(splited[1], 8) #oct to int(dec)
        octals.append(toInt)
        if int(splited[0]) == toInt:
            eqlCounter += 1
low = min(octals)
maxx = max(octals)
print("same nmbrs: ", eqlCounter) #a
print("min: ", min(octals),"at: ",octals.index(low))
print("max: ", max(octals),"at: ",octals.index(maxx))
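The line skipping is easy to reproduce in isolation. In the sketch below, io.StringIO stands in for the real file and the five lines are made up; in this setup no error is raised and the loop quietly handles only every other line, which would explain the index 3 reported in the question for a value that sits on line 8:

```python
import io

# io.StringIO stands in for the real file; the five lines are made up.
f = io.StringIO("line 1\nline 2\nline 3\nline 4\nline 5\n")

seen_by_loop = []
seen_by_readline = []
for x in f:
    # The for loop has already consumed one line into x ...
    seen_by_loop.append(x.strip())
    # ... and readline() consumes the following one.
    seen_by_readline.append(f.readline().strip())

print(seen_by_loop)
print(seen_by_readline)
```

The last readline() call returns an empty string because the input is exhausted, which is also why the original code could fail on a file with an odd number of lines.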
I have a .txt file and I would like to print lines 3, 7, 11, 15,...
So, after printing the third line, I would like to print every 4th line afterward.
I began by looking at the modulus operator:
#Open the file
with open('file.txt') as file:
    #Iterate through lines
    for i, line in enumerate(file):
        #Choose every third line in a file
        if i % 3 == 0:
            print(line)
#Close the file when you're done
file.close()
but that approach prints every third line. If i % 3 == 1 that prints lines 1, 4, 7, 10, 13 etc.
Instead of using modulo, you can simply use addition: start with the first line you want to show, and then keep adding 4 to it.
next_line = 2 # Line 3 is index 2
with open('file.txt') as file:
    for i, line in enumerate(file):
        if i == next_line:
            print(line)
            next_line = next_line + 4
Your code is almost fine, except for the modulo: you want the remainder of the division by 4 to be 3, with the lines counted from 1 (hence start=1, since enumerate counts from 0 by default).
with open('file.txt') as file:
    for i, line in enumerate(file, start=1):
        if i % 4 == 3:
            print(line)
Note that you don't need to explicitly close your file at the end: that's what with is intended for, it makes sure that your file gets closed whatever happens.
So you want something to happen every fourth time, which means modulo 4. Try changing your if to if i % 4 == N: with a good number for N.
By the way, when using the with statement you don't have to call close(); it does so automatically.
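Another option, sketched here as an alternative to the modulo test, is itertools.islice, which takes a start position and a step directly. The function name `every_fourth_from_third` and the sample lines are made up for illustration; with a real file you would pass the open file object instead of the list:

```python
from itertools import islice


def every_fourth_from_third(lines):
    """Return an iterator over lines 3, 7, 11, ... (1-based) of any iterable."""
    # islice(iterable, start, stop, step) uses 0-based positions: 2, 6, 10, ...
    return islice(lines, 2, None, 4)


# Made-up stand-in for the file: fifteen numbered lines.
sample = ["line %d\n" % n for n in range(1, 16)]
for line in every_fourth_from_third(sample):
    print(line, end="")
```

Passing end="" avoids the blank lines you would otherwise get, since each line already ends with a newline.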
How about:
# Fetch all lines from the file
lines = open('20 - Modular OS - lang_en_vs2.srt').readlines()
# Print the 3rd line
print(lines[2])
# throw away the first 3 lines, so the modulo (below) works ok
del lines[:3]
# print every 4th line after that (index 3 is now the original 7th line)
for i in range(len(lines)):
    if i % 4 == 3:
        print(lines[i])
Read every line into an array.
Output the 3rd line.
We then need every fourth line, so by deleting the first 3 elements, it's easy to simply test against modulo 4 (the "% 4") and output the line.
x = 1  # start at 1 so the counter reaches 4 on the third line
with open('file.txt') as file:
    #Iterate through lines
    for i, line in enumerate(file):
        x += 1
        #Choose every fourth line in a file, starting with the third
        if x == 4:
            print(line)
            x = 0
Result
>>> i = 0
>>> for x in range(0, 100):
...     i += 1
...     if i == 4:
...         print(x)
...         i = 0
3
7
11
15
19
23
27
31
35
39
43
47
51
55
59
63
67
71
75
79
83
87
91
95
99
lines = open('file.txt').readlines()
print(lines[2])
#Iterate through the line indices after the third line
for i in range(3, len(lines)):
    #Choose every fourth line in a file, counting from the third
    if (i - 2) % 4 == 0:
        print(lines[i])
This works, but isn't super elegant.
My data looks like:
1 1.45
1 1.153
2 2.179
2 2.206
2 2.59
2 2.111
3 3.201
3 3.175
4 4.228
4 4.161
4 4.213
The output I want is :
1 2 (1 occurs 2 times)
2 4
3 2
4 3
For this I run the following code:
SubPatent2count = {}
for line in data.split('\n'):
    for num in line.split('\t'):
        Mapper_data = ["%s\t%d" % (num[0], 1) ]
        for line in Mapper_data:
            Sub_Patent,count = line.strip().split('\t',1)
            try:
                count = int(count)
            except ValueError:
                continue
            try:
                SubPatent2count[Sub_Patent] = SubPatent2count[Sub_Patent]+count
            except:
                SubPatent2count[Sub_Patent] = count
for Sub_Patent in SubPatent2count.keys():
    print ('%s\t%s'% ( Sub_Patent, SubPatent2count[Sub_Patent] ))
At the end I get this error :
3 for num in line.split('\t'):
4 #print(num[0])
----> 5 Mapper_data = ["%s\t%d" % (num[0], 1) ]
6 #print(Mapper_data)
7 for line in Mapper_data:
IndexError: string index out of range
If you have any idea how I can deal with this error, please help.
Thank you!
Just suggesting another approach: have you tried a list comprehension + groupby from itertools?
from itertools import groupby
print([(key, len(list(group))) for key, group in groupby([x.split(' ')[0] for x in data.split('\n')])])
# where [x.split(' ')[0] for x in data.split('\n')] generates a list of all starting numbers
# and groupby counts them
Or if you want that exact output:
from itertools import groupby
mylist = [(key, len(list(group))) for key, group in groupby([x.split(' ')[0] for x in data.split('\n')])]
for key, repetition in mylist:
    print(key, repetition)
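One caveat worth keeping in mind: groupby only groups consecutive equal keys, so this works for the data in the question because equal numbers sit next to each other. If the same key can reappear later in the input, sorting first is one way around it. A small sketch with made-up keys:

```python
from itertools import groupby

keys = ["1", "1", "2", "1"]  # made-up data where "1" reappears later

# Without sorting, the second run of "1" becomes its own group.
unsorted_counts = [(key, len(list(group))) for key, group in groupby(keys)]

# After sorting, equal keys are adjacent and are counted together.
sorted_counts = [(key, len(list(group))) for key, group in groupby(sorted(keys))]

print(unsorted_counts)
print(sorted_counts)
```

Note that sorted() on strings orders them lexicographically, which only matters for the order of the output, not for the counts.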
Thank you everybody, your suggestions really helped me. I changed my code as follows:
SubPatent2count = {}
for line in data.split('\n'):
    Mapper_data = ["%s\o%d" % (line.split(' ')[0], 1) ]
    for line in Mapper_data:
        Sub_Patent,count = line.strip().split('\o',1)
        try:
            count = int(count)
        except ValueError:
            continue
        try:
            SubPatent2count[Sub_Patent] = SubPatent2count[Sub_Patent]+count
        except:
            SubPatent2count[Sub_Patent] = count
for Sub_Patent in SubPatent2count.keys():
    print ('%s\t%s'% ( Sub_Patent, SubPatent2count[Sub_Patent] ))
And it gives the following result:
1 2 (1 occurs 2 times)
2 4
3 2
4 3
num is probably an empty string, which is why num[0] raises an index out of range error. Another possibility is that you are in fact separating the numbers in each line with spaces, not with tabs.
Anyway, your code seems a little strange. For example, you encode the data as a string in a list of one element (Mapper_data) and then decode it to process it. That is really not necessary and you should avoid it.
Try this code:
from collections import Counter
decoded_data = [ int(l.split(' ', 1)[0]) for l in data.split('\n') if len(l)>0]
SubPatent2count = Counter(decoded_data)
for k in SubPatent2count:
    print(k, SubPatent2count[k])