#if(len(results) != 0)
fr = (open("new_file.txt","r"))
fr1 = (open("results.txt","w"))
for j in range (len(line_list)):
    for i, line in enumerate(fr):
        if(i == line_list[j]):  # find the line in the file
            fr1.write(FAILURE_STRING+line)  # mark the failure
        else:
            fr1.write(line)
fr.close()
fr1.close()
In the above example my j loop executes only once. I am trying to mark the failures in the results file. Even though my line_list has about 7 elements (the line numbers where I am supposed to mark a failure in case of a mismatch), it marks the failure for only 1 element. If I move the j for loop inside, it marks all the failures, but then there are duplicates in the results file (each line is duplicated as many times as there are elements in line_list).
If I understood correctly, you have a list of line numbers that do not match the ones in a file (new_file.txt), and you want to prepend an error string to those lines. For that, you can read the file once with fr.readlines() in the loop, which results in something like this:
line_list = [0, 2, 2, 4] # Example list of lines
FAILURE_STRING = "NO"
fr = open("new_file.txt", "r")
fr1 = open("results.txt", "w")
for i, line in enumerate(fr.readlines()):
    if i in line_list:  # membership test; indexing line_list by i would be wrong
        fr1.write(FAILURE_STRING+line)
    else:
        fr1.write(line)
fr.close()
fr1.close()
open returns a file object, which is an iterator over the file's lines, and you can only iterate over it once (unless you rewind it).
You have two options:
1. Reverse the for loops so you only iterate over the file once.
for i, line in enumerate(fr):
    for j in range (len(line_list)):
        if(i == line_list[j]):  # find the line in the file
            fr1.write(FAILURE_STRING+line)  # mark the failure
            break
    else:  # no entry in line_list matched: copy the line unchanged
        fr1.write(line)
2. Convert your file to a list, which can be iterated over as often as you like.
fr = [i for i in fr]
Thank you for all your answers. @NightShadeQueen, the second point in your answer helped me get there.
The following is the solution which worked:
if(len(results) != 0):
    fr1 = (open("results.txt","w"))
    fr = (open("new_file.txt","r"))
    fr = [i for i in fr]
    for i in range (len(fr)):
        if i in line_list:
            fr1.write(FAILURE_STRING+fr[i])
        else:
            fr1.write(fr[i])
    fr1.close()
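The same idea can be written so that the file is consumed in a single pass and the line numbers are looked up in a set. This is only a sketch: the helper name mark_failures and the marker text are assumptions, not part of the original post, and line_list is assumed to hold zero-based line numbers.

```python
FAILURE_STRING = "FAILURE: "  # assumed marker text; the original value is not shown


def mark_failures(lines, line_list, marker=FAILURE_STRING):
    """Yield each line, prefixing those whose zero-based index is in line_list."""
    failed = set(line_list)  # set lookup avoids rescanning the list for every line
    for i, line in enumerate(lines):
        yield marker + line if i in failed else line


# In-memory demonstration; a real run would pass an open file object instead.
sample = ["first\n", "second\n", "third\n"]
print("".join(mark_failures(sample, [0, 2])), end="")
```

With real files, the open input file can be passed as lines and the generator handed to writelines on the output file, so nothing is iterated twice.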
I have a file that puts out lines that have two values each. I need to compare the second value in every line to make sure those values are not repeated more than once. I'm very new to coding so any help is appreciated.
My thinking was to turn each line into a list with two items each, and then I could compare the same position from a couple lists.
This is a sample of what my file contains:
20:19:18 -1.234567890
17:16:15 -1.098765432
14:13:12 -1.696969696
11:10:09 -1.696969696
08:07:06 -1.696969696
Here's the code I'm trying to use. Basically I want it to ignore those first two lines and print out the third line, since it gets repeated more than once:
with open('my_file') as txt:
    for line in txt: #this section turns the file into lists
        linelist = '%s' % (line)
        lista = linelist.split(' ')
    n = 1
    for line in lista:
        listn = line[n]
        listo = line[n + 1]
        listp = line[n + 2]
        if listn[1] == listo[1] and listn[1] == listp[1]:
            print line
        else:
            pass
        n += 1
What I want to see is:
14:13:12 -1.696969696
But I keep getting an error on the long if statement of "string index out of range"
You would be a lot better off using a dictionary type structure. Dictionary allows you to quickly check for existence.
Basically check if the 2nd value is a key in your dict. If a key then print the line. Else just add the 2nd value as a key for later.
myDict = {}
with open('/home/dmoraine/pylearn/%s' % (file)) as txt:
    for line in txt:
        key = line.split()[1]
        if key in myDict:
            print(line)
        else:
            myDict[key] = None #value doesn't matter
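Note that the loop above prints each later occurrence of a repeated value. If the goal is the first line of every value that occurs more than once (as in the expected output in the question), a two-pass variant with collections.Counter is one way to sketch it. The function name first_lines_of_repeats is hypothetical, and every non-blank line is assumed to have at least two whitespace-separated fields.

```python
from collections import Counter


def first_lines_of_repeats(lines):
    """Return the first line for each second-column value that occurs more than once."""
    rows = [line.split() for line in lines if line.strip()]  # skip blank lines
    counts = Counter(fields[1] for fields in rows)           # occurrences per value
    seen = set()
    result = []
    for fields in rows:
        value = fields[1]
        if counts[value] > 1 and value not in seen:
            seen.add(value)
            result.append(" ".join(fields))
    return result


sample = [
    "20:19:18 -1.234567890\n",
    "17:16:15 -1.098765432\n",
    "14:13:12 -1.696969696\n",
    "11:10:09 -1.696969696\n",
    "08:07:06 -1.696969696\n",
]
print(first_lines_of_repeats(sample))  # ['14:13:12 -1.696969696']
```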
Some simple debugging highlights the functional problem:
with open('my_file.txt') as txt:
    for line in txt: #this section turns the file into lists
        linelist = '%s' % (line)
        lista = linelist.split(' ')
        print(linelist, lista)
    n = 1
    for line in lista:
        print("line", n, ":\t", line)
        listn = line[n]
        listo = line[n + 1]
        listp = line[n + 2]
        print(listn, '|',listo, '|',listp)
        if listn[1] == listo[1] and listn[1] == listp[1]:
            print(line)
        n += 1
Output:
20:19:18 -1.234567890
['20:19:18', '-1.234567890\n']
17:16:15 -1.098765432
['17:16:15', '-1.098765432\n']
14:13:12 -1.696969696
['14:13:12', '-1.696969696\n']
11:10:09 -1.696969696
['11:10:09', '-1.696969696\n']
08:07:06 -1.696969696
['08:07:06', '-1.696969696\n']
line 1 : 08:07:06
8 | : | 0
In short, you've mis-handled the variables. When you get to the second loop, lista is the "words" of the final line; you've read and discarded all of the others. line iterates through these individual words. Your listn/o/p variables are, therefore, individual characters. Thus, there is no such thing as listn[1], and you get an error.
Instead, you need to build some sort of list of the floating-point numbers. For instance, using your top loop as a starting point:
float_list = []  # a list, so that append() works
for line in txt: #this section turns the file into lists
    lista = line.split(' ')
    my_float = float(lista[1]) # Convert the second field into a float
    float_list.append(my_float)
Now you need to write code that will find duplicates in float_list. Can you take it from there?
Ended up turning each line into a list, and then making a dictionary of all the lists. Thank you all for your help.
I am trying to figure out if it is possible to access the elements of a list around the element you are currently at. I have a list that is large (20k+ lines) and I want to find every instance of the string 'Name'. Additionally, I also want to get +/- 5 elements around each 'Name' element. So 5 lines before and 5 lines after. The code I am using is below.
search_string = 'Name'
with open('test.txt', 'r') as infile, open ('textOut.txt','w') as outfile:
    for line in infile:
        if search_string in line:
            outfile.writelines([line, next(infile), next(infile),
                                next(infile), next(infile), next(infile)])
Getting the lines after the occurrence of 'Name' is pretty straightforward, but figuring out how to access the elements before it has me stumped. Does anyone have any ideas?
20k lines isn't that much. If it's OK to read all of them into a list, we can take slices around the index where a match is found, like this:
with open('test.txt', 'r') as infile, open('textOut.txt','w') as outfile:
    lines = infile.readlines()  # keep the trailing newlines; writelines does not add any
    n = len(lines)
    for i in range(n):
        if search_string in lines[i]:
            start = max(0, i - 5)
            end = min(n, i + 6)
            outfile.writelines(lines[start:end])
You can use the function enumerate that allows you to iterate through both elements and indexes.
Example to access elements 5 indexes before and after your current element :
n = len(l)
for i, x in enumerate(l):
    print(l[max(i-5, 0)]) # Prevent picking last elements of iterable by using negative indexes
    print(x)
    print(l[min(i+5, n-1)]) # Prevent overflow
You need to keep track of the index of where in the list you currently are
So something like:
# Read the file into list_of_lines
index = 0
while index < len(list_of_lines):
    if list_of_lines[index] == 'Name':
        print(list_of_lines[index - 1]) # This is the previous line
        print(list_of_lines[index + 1]) # This is the next line
        # And so on...
    index += 1
Let's say you have your lines stored in your list:
lines = ['line1', 'line2', 'line3', 'line4', 'line5', 'line6', 'line7', 'line8', 'line9']
You could define a method returning elements grouped by n consecutives, as a generator:
def each_cons(iterable, n = 2):
    if n < 2: n = 1
    i, size = 0, len(iterable)
    while i < size-n+1:
        yield iterable[i:i+n]
        i += 1
Then, just call the method. To show the content I'm calling list on it, but you can iterate over it:
lines_by_3_cons = each_cons(lines, 3) # or any number of lines, 5 in your case
print(list(lines_by_3_cons))
#=> [['line1', 'line2', 'line3'], ['line2', 'line3', 'line4'], ['line3', 'line4', 'line5'], ['line4', 'line5', 'line6'], ['line5', 'line6', 'line7'], ['line6', 'line7', 'line8'], ['line7', 'line8', 'line9']]
I personally loved that problem. All the other answers here read the whole file into memory. I think I wrote memory-efficient code.
Here, check this out!
myfile = open('infile.txt')
stack_print_moments = []
expression = 'MYEXPRESSION'
neighbourhood_size = 5
def print_stack(stack):
    for line in stack:
        print(line.strip())
    print('-----')
current_stack = []
for index, line in enumerate(myfile):
    current_stack.append(line)
    if len(current_stack) > 2 * neighbourhood_size + 1:
        current_stack.pop(0)
    if expression in line:
        stack_print_moments.append(index + neighbourhood_size)
    if index in stack_print_moments:
        print_stack(current_stack)
last_index = index
for index in range(last_index, last_index + neighbourhood_size + 1):
    if index in stack_print_moments:
        print_stack(current_stack)
    current_stack.pop(0)
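For comparison, here is a different streaming sketch built on collections.deque. It is not the code above: the generator name grep_context is hypothetical, and it merges overlapping windows instead of printing each one separately. It keeps at most n lines of history in memory, so it also works on files that do not fit in RAM.

```python
from collections import deque


def grep_context(lines, needle, n=5):
    """Yield every line that lies within n lines of a line containing needle."""
    before = deque(maxlen=n)  # most recent lines that have not been emitted yet
    after = 0                 # number of trailing context lines still owed
    for line in lines:
        if needle in line:
            yield from before  # flush the leading context
            before.clear()
            yield line
            after = n
        elif after > 0:
            yield line
            after -= 1
        else:
            before.append(line)


sample = ["a\n", "b\n", "c\n", "Name: x\n", "d\n", "e\n", "f\n"]
print(list(grep_context(sample, "Name", n=2)))
# ['b\n', 'c\n', 'Name: x\n', 'd\n', 'e\n']
```

Because emitted lines never go back into the deque, no line is written twice even when two matches are closer than n lines apart.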
More advanced code is here: Github link
I am trying to read a file, collect some lines, batch process them and then post process the result.
Example:
with open('foo') as input:
    line_list = []
    for line in input:
        line_list.append(line)
        if len(line_list) == 10:
            result = batch_process(line_list)
            # something to do with result here
            line_list = []
    if len(line_list) > 0: # very probably the total number of lines is not a multiple of 10, e.g. 11
        result = batch_process(line_list)
        # something to do with result here
I do not want to duplicate the batch invocation and the post-processing, so I want to know if I could dynamically add some content to input, e.g.
with open('foo') as input:
    line_list = []
    # input.append("THE END")
    for line in input:
        if line != 'THE END':
            line_list.append(line)
        if len(line_list) == 10 or line == 'THE END':
            result = batch_process(line_list)
            # something to do with result here
            line_list = []
In that case I would not have to duplicate the code in the if branch. Or is there a better way to know that I have reached the last line?
If your input is not too large and fits comfortably in memory, you can read everything into a list, slice the list into sub-list of length 10 and loop over them.
k = 10
with open('foo') as input:
    lines = input.readlines()
    slices = [lines[i:i+k] for i in range(0, len(lines), k)]
    for slice in slices:
        batch_process(slice)
If you want to append a mark to the input lines, you also have to read all lines first.
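If the file is too large to read at once, the batching can also be done lazily with itertools.islice. The following is a sketch under that assumption; the generator name batches is hypothetical, and batch_process stands for the function from the question.

```python
from itertools import islice


def batches(iterable, size=10):
    """Yield lists of up to `size` items; the last batch may be shorter."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))  # take the next `size` items, or fewer at the end
        if not batch:
            return
        yield batch


# 11 items in batches of 10 give one full batch and one leftover batch.
print([len(b) for b in batches(range(11), 10)])  # [10, 1]
```

Looping with "for batch in batches(input_file): result = batch_process(batch)" then keeps the call and the post-processing in one place, with no sentinel line and no duplicated branch for the final partial batch.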
I recently started programming and I wanted to sort a file, but in the end, this code only returns one line, even though the text file has 65 lines...
f = open(".\\test.txt")
g, u = [], []
a = 0
for i, line in enumerate(f):
    a += 1
    if i%2 == 0:
        g.append(f.readlines()[i])
        print(i),
    elif i%2 == 1:
        u.append(f.readlines()[i])
        print(i),
print(u),
print(g)
Your for loop starts reading the file line by line. But then its contents go and read the rest of the file in a single readlines call; after that, there's nothing more to be read! So you end up with the first line in line and the second line in g, since you only kept one of the lines that readlines() read.
open(filename) gives you an iterator over the lines of a file. This iterator is exhausted once all the lines have been read, so any call to readlines after the first one will give you an empty list.
Demo:
>>> with open('testfile.txt') as f:
...     a = f.readlines()
...     b = f.readlines()
...     a
...     b
...
['hello\n', 'stack\n', 'overflow\n']
[]
You have to do
lines = f.readlines()
for i, line in enumerate(lines):
    a += 1
    if i%2 == 0:
        g.append(lines[i])
        print(i),
    elif i%2 == 1:
        u.append(lines[i])
        print(i),
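Once the lines are in a list, the even/odd split can also be expressed with slicing instead of an index test. This is a sketch only; the helper name split_even_odd is hypothetical and positions are zero-based, as in the loop above.

```python
def split_even_odd(lines):
    """Return (even_indexed, odd_indexed) lines, using zero-based positions."""
    lines = list(lines)  # read everything once; a file object cannot be re-read without seeking
    return lines[0::2], lines[1::2]


g, u = split_even_odd(["l0\n", "l1\n", "l2\n"])
print(g)  # ['l0\n', 'l2\n']
print(u)  # ['l1\n']
```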
I need to write a Python program to read the values in a file, one per line, such as file: test.txt
1
2
3
4
5
6
7
8
9
10
Denoting these as j1, j2, j3, ... jn,
I need to sum the differences of consecutive values:
a=(j2-j1)+(j3-j2)+...+(jn-j[n-1])
I have example source code
a=0
for(j=2;j<=n;j++){
    a=a+(j-(j-1))
}
print a
and the output is
9
If I understand correctly, you want to evaluate the following equation:
a = (j2-j1) + (j3-j2) + ... + (jn-j[n-1])
As you iterate over the file, it will subtract the value in the previous line from the value in the current line and then add all those differences.
a = 0
with open("test.txt", "r") as f:
    previous = next(f).strip()
    for line in f:
        line = line.strip()
        if not line: continue
        a = a + (int(line) - int(previous))
        previous = line
print(a)
Solution (Python 3)
res = 0
with open("test.txt","r") as fp:
    lines = list(map(int,fp.readlines()))
    for i in range(1,len(lines)):
        res += lines[i]-lines[i-1]
print(res)
Output: 9
test.txt contains:
1
2
3
4
5
6
7
8
9
10
I'm not even sure if I understand the question, but here's my best attempt at solving what I think is your problem:
To read values from a file, use "with open()" in read mode ('r'):
with open('test.txt', 'r') as f:
    -your code here-
"as f" means that "f" will now represent your file if you use it anywhere in that block
So, to read all the lines and store them into a list, do this:
all_lines = f.readlines()
You can now do whatever you want with the data.
If you look at the function you're trying to solve, a=(j2-j1)+(j3-j2)+...+(jn-j[n-1]), you'll notice that many of the values cancel out, e.g. (j2-j1)+(j3-j2) = j3-j1. Thus, the entire function boils down to jn-j1, so all you need is the first and last number.
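The cancellation can be checked in a few lines. This is a minimal sketch: the function name sum_of_differences is hypothetical, and returning 0 for fewer than two values is an assumption about what the sum should mean in that case.

```python
def sum_of_differences(values):
    """Sum of consecutive differences, which telescopes to last - first."""
    values = list(values)
    if len(values) < 2:
        return 0  # assumption: with fewer than two values there are no differences
    return values[-1] - values[0]


numbers = [int(s) for s in "1 2 3 4 5 6 7 8 9 10".split()]
explicit = sum(b - a for a, b in zip(numbers, numbers[1:]))  # term-by-term sum
print(sum_of_differences(numbers), explicit)  # 9 9
```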
Edit: That being said, please try and search this forum first before asking any questions. As someone who's been in your shoes before, I decided to help you out, but you should learn to reference other people's questions that are identical to your own.
The correct answer is 9 :
with open("data.txt") as f:
    # set prev to first number in the file
    prev = int(next(f))
    sm = 0
    # iterate over the remaining numbers
    for j in f:
        j = int(j)
        sm += j - prev
        # update prev
        prev = j
print(sm)
Or using itertools.tee and zip:
from itertools import tee
with open("data.txt") as f:
    a,b = tee(f)
    next(b)
    print(sum(int(j) - int(i) for i,j in zip(a, b)))