Reading compressed file line by line - python

This is my code:
f = gzip.open('nome_file.gz','r')
line = f.readline()
for line in f:
    line = f.readline()
    line = line.strip('\n')
    if not line: break
    elements = line.split(" ")
    print elements[0]," ",elements[1]," ",elements[44]," ",elements[45]
f.close()
I really don't know why only every other line is read.

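for line in f: already reads one line per iteration. The following line = f.readline() then reads the next line and stores it in the same variable, overwriting the one the loop just fetched.
So every line is read, but every second one is discarded before it is processed. The line = f.readline() before the loop is overwritten in the same way.
Simply dropping the line = f.readline() inside the loop should solve the problem.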
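A minimal sketch of the corrected loop, assuming Python 3 (where the file has to be opened in text mode, "rt", to get str lines). The function name read_columns and its columns parameter are hypothetical; they are not part of the original code.

```python
import gzip


def read_columns(path, columns=(0, 1, 44, 45)):
    """Yield the selected space-separated fields of each line in a gzip file.

    Hypothetical helper: the name and the `columns` parameter are
    illustrative, not part of the original question.
    """
    # "rt" opens the compressed file in text mode, so lines are str, not bytes.
    with gzip.open(path, "rt") as f:
        # The for loop is the only thing that advances the file;
        # there is no extra readline() call inside it.
        for line in f:
            line = line.rstrip("\n")
            if not line:
                break  # same behaviour as the original: stop at the first empty line
            elements = line.split(" ")
            yield [elements[i] for i in columns]
```

Called as list(read_columns('nome_file.gz')), this processes every line rather than every second one.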

Related

How to move the cursor to a specific line in python

I am reading data from a .txt file. I need to read lines starting from a certain line, so that I don't have to read the whole file (using .readlines()). Since I know which line I should start reading from, I came up with this (it does not work, though):
def create_list(pos):
    list_created = []
    with open('text_file.txt', 'r') as f:
        f.seek(pos) #Here I want to put the cursor at the beginning of the line that I need to read from
        line = f.readline() #And here I read the first line
        while line != '<end>\n':
            line = line.rstrip('\n')
            list_created.append(line.split(' '))
            line = f.readline()
        f.close()
    return list_created
print(create_list(2)) #Here I need to create a list starting from the 3rd line of my file
And my text file looks something like this:
Something #line in pos= 0
<start> #line in pos= 1
MY FIRST LINE #line in pos= 2
MY SECOND LINE #line in pos= 3
<end>
And the result should be something like:
[['MY', 'FIRST', 'LINE'], ['MY', 'SECOND', 'LINE']]
Basically, I need to start my readline() from a specific line.
Does this work? Note that f.seek() takes a position within the file (a byte offset), not a line number, which is why f.seek(2) does not land on the third line. If you don't want to read the entire file with .readlines(), you can skip a single line by calling .readline(). You can call readline() as many times as needed to move the cursor down, then read the line you want. Also, I don't recommend using line != '<end>\n' unless you're absolutely sure that there will be a newline after <end>. Instead, use something like '<end>' not in line:
def create_list(pos):
    list_created = []
    with open('text_file.txt', 'r') as f:
        for i in range(pos):
            f.readline()  # skip the first pos lines
        line = f.readline()  # And here I read the first line
        # Stop at the <end> marker, or at end of file if the marker is missing
        while line and '<end>' not in line:
            line = line.rstrip('\n')
            list_created.append(line.split(' '))
            line = f.readline()
    # No f.close() needed: the with block closes the file
    return list_created
print(create_list(2))  # Create a list starting from the 3rd line of the file
text_file.txt:
Something
<start>
MY FIRST LINE
MY SECOND LINE
<end>
Output:
[['MY', 'FIRST', 'LINE'], ['MY', 'SECOND', 'LINE']]
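As an alternative sketch, itertools.islice can skip the leading lines without an explicit readline() loop. The function name read_block and the end_marker parameter are hypothetical, chosen only for this illustration. The file is still read sequentially from the top; the skipped lines are simply thrown away.

```python
from itertools import islice


def read_block(path, start, end_marker="<end>"):
    """Return the split lines from line index `start` up to the end marker.

    Hypothetical helper for illustration; `start` is 0-based, like `pos`
    in the question.
    """
    rows = []
    with open(path, "r") as f:
        # islice(f, start, None) consumes and discards the first `start` lines.
        for line in islice(f, start, None):
            if end_marker in line:
                break
            rows.append(line.rstrip("\n").split(" "))
    return rows
```

With the text_file.txt shown above, read_block('text_file.txt', 2) should give the same nested list as create_list(2).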

Why does the value passed to the end keyword argument of Python's print function not work as expected in the context below?

I ran the following code,
with open('test.txt', 'r') as f:
    for line in f:
        print(line, end=' ')
I expected to get,
This is the first line This is the second line This is the third line
as output.
Instead I got,
This is the first line
This is the second line
This is the third line
Can someone tell me why this behavior occurs?
The content in the .txt file is as below,
This is the first line
This is the second line
This is the third line
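Each line in the file ends with a line break, and iterating over the file keeps it, so print outputs that newline before the end=' ' separator. You can remove it with the strip() method (which also removes any other leading and trailing whitespace).
Example:
with open("test.txt", "r") as f:
    for line in f:
        print(line.strip(), end=" ")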
Every line of the text file ends with \n, so I suggest replacing the '\n' with '' by adding the following line:
line = line.replace('\n', '')
So the code would look like this:
with open('test.txt', 'r') as f:
    for line in f:
        line = line.replace('\n', '')
        print(line, end=' ')
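To see the trailing newline directly, here is a small sketch that uses io.StringIO as a stand-in for test.txt (an assumption made so the example runs without a file on disk). repr() makes the \n visible, and rstrip('\n') removes only the newline, leaving other whitespace alone.

```python
import io

# In-memory stand-in for test.txt (assumed to have the content shown in the question).
sample = io.StringIO(
    "This is the first line\n"
    "This is the second line\n"
    "This is the third line\n"
)

raw_lines = list(sample)

# repr() shows that each line yielded by iteration still ends with '\n'.
print(repr(raw_lines[0]))

# Removing only the trailing newline, then joining with a single space.
joined = " ".join(line.rstrip("\n") for line in raw_lines)
print(joined)
```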

How to modify a line in a file using Python

I am trying to do what for many will be a very straightforward thing, but for me it is just infuriatingly difficult.
I am trying to search for a line in a file that contains certain words or phrases and modify that line. That's it.
I have been through the forum and the suggested similar questions and have found many hints, but none do quite what I want, or they are beyond my current ability to grasp.
This is the test file:
# 1st_word 2nd_word
# 3rd_word 4th_word
And this is my script so far:
############################################################
file = 'C:\lpthw\\text'
f1 = open(file, "r+")
f2 = open(file, "r+")
############################################################
def wrline():
    lines = f1.readlines()
    for line in lines:
        if "1st_word" in line and "2nd_word" in line:
            #f2.write(line.replace('#\t', '\t'))
            f2.write((line.replace('#\t', '\t')).rstrip())
    f1.seek(0)
wrline()
My problem is that the line below inserts a \n after the line every time and adds a blank line to the file.
f2.write(line.replace('#\t', '\t'))
The file becomes:
1st_word 2nd_word
# 3rd_word 4th_word
An extra blank line between the lines of text.
If I use the following:
f2.write((line.replace('#\t', '\t')).rstrip())
I get this:
1st_word 2nd_wordd
# 3rd_word 4th_word
No new blank line is inserted, but there is an extra "d" at the end instead.
What am I doing wrong?
Thanks
Your blank line is coming from the original blank line in the file. Writing a line with nothing in it writes a newline to the file. Instead of not putting anything into the written line, you have to completely skip the iteration, so it does not write that newline. Here's what I suggest:
def wrline():
    with open('file.txt', 'r') as f1:
        lines = f1.readlines()
    with open('file.txt', 'w') as f2:
        for line in lines:
            if '1st_word' in line and '2nd_word' in line:
                # Keep the trailing newline so the next line is not glued on
                f2.write(line.replace('# ', ' '))
            elif line != '\n':
                # Skip blank lines; copy everything else unchanged
                f2.write(line)
I would keep read and write operations separate.
#read
with open(file, 'r') as f:
    lines = f.readlines()
#parse, change and write back
with open(file, 'w') as f:
    for line in lines:
        if line.startswith('#\t'):
            line = line[1:]
        f.write(line)
You have not closed the files, and there is no need for the \t.
Also get rid of the rstrip().
Read in the file, replace the data and write it back, opening and closing the file each time.
fn = 'example.txt'
new_data = []
# Read in the file
with open(fn, 'r+') as file:
    filedata = file.readlines()
# Replace the target string
for line in filedata:
    if "1st_word" in line and "2nd_word" in line:
        line = line.replace('#', '')
    new_data.append(line)
# Write the file out again
with open(fn, 'w+') as file:
    for line in new_data:
        file.write(line)
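One plausible reading of the symptoms in the question, assuming the test file really is tab-separated as the script's '#\t' implies: writing through a second "r+" handle overwrites the file in place from offset 0, and a replacement that is shorter than the original line leaves the original's last characters behind. The sketch below reproduces both outcomes on a temporary file; the helper name overwrite_in_place is hypothetical.

```python
import os
import tempfile

# Assumed content of the test file (tab-separated, as the script's '#\t' suggests).
original = "#\t1st_word\t2nd_word\n#\t3rd_word\t4th_word\n"


def overwrite_in_place(new_first_line):
    """Write new_first_line at offset 0 of a copy of `original`; return the result."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline="" disables newline translation so offsets match on every platform.
        with open(path, "w", newline="") as f:
            f.write(original)
        with open(path, "r+", newline="") as f:
            f.write(new_first_line)  # overwrites in place; nothing is shifted or truncated
        with open(path, "r", newline="") as f:
            return f.read()
    finally:
        os.remove(path)


first = original.splitlines(keepends=True)[0]

# One character shorter than the original line: the old '\n' survives as a blank line.
with_newline = overwrite_in_place(first.replace("#\t", "\t"))

# Two characters shorter: the old 'd' and '\n' survive, giving "2nd_wordd".
stripped = overwrite_in_place(first.replace("#\t", "\t").rstrip())

print(repr(with_newline))
print(repr(stripped))
```

This is why the answers above read everything first and then rewrite the whole file in "w" mode: rewriting avoids leftover bytes from the old, longer line.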

Replace a line in a file with Python

I want to write a program that changes an integer value in a file. I have a file with a value in the first line. How can I change the value of a line (for example, to 12)? This is my code. It gets a value, and I want to go to line 2 and add m to the number on that line, but it doesn't work.
t=open('pash.txt', 'r')
g=[]
for i in range(3):
    g.append(t.readline())
t.close()
g[o-1]=(int(g[o-1]))+m # o is the number of line in file
print(g[o-1])
t=open("pash.txt","w")
for i in range(3):
    t.write(str(g[i]))
    t.write('\n')
t.close()
You can open the file, read all of its lines with readlines(), modify the content, and rewrite the file:
with open('pash.txt', 'r') as f:
    lines = f.readlines()
m = 5 # value you need to add to a line.
o = 2 # line number (1-based) of the line to modify.
with open('pash.txt', 'w') as f:
    for x, line in enumerate(lines, start=1):
        if x == o:
            # int() ignores the trailing newline; write() needs a str
            line = str(int(line) + m) + '\n'
        f.write(line)

If the next line of a file contains a string, append it to the end of the current one

I have a CSV with 13 million lines. The data is not quote encapsulated and it contains newlines, which is causing a row of data to have line breaks. The data does not have multiple breaks per line, only one.
How would I take data like this?
Line of data
Line of data
continuation of previous line of data
Line of data
Line of data
continuation of previous line
Line of data
And turn it into this:
Line of data
Line of data continuation of previous line of data
Line of data
Line of data continuation of previous line
Line of data
I've tested this by storing the line in a variable and processing the next one, looking for the first character to be anything but 'L', and appending it. I've also tried using f.tell() and f.seek() to move around in the file, but I haven't been able to get it to work.
Assuming every time a line starts with a space it should be concatenated with the preceding line, this should work:
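with open(data) as infile:  # data is the path to the CSV file
    previous_line = None
    for line in infile:
        if previous_line is not None and line.startswith(' '):
            # Continuation: glue it onto the buffered line instead of printing it
            previous_line = previous_line.rstrip() + line.rstrip('\n')
        else:
            if previous_line is not None:
                print(previous_line.rstrip('\n'))
            previous_line = line
    if previous_line is not None:
        print(previous_line.rstrip('\n'))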
Here's a cheap, reasonably efficient continuation line joiner for you.
def cont_lines(source):
    last_line = ''
    for line in source:
        if line.startswith(' '):
            # Append a continuation, dropping the newline that ended last_line
            last_line = last_line.rstrip('\n') + ' ' + line.lstrip()
        else:
            if last_line:
                yield last_line
            last_line = line
    if last_line:  # The one remaining as the source has ended.
        yield last_line
Use like this:
with open("tile.csv") as f:
    for line in cont_lines(f):
        # do something with line
        pass
It only uses as much memory as the longest set of continuation lines in your file.
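For a quick check of the idea on a small in-memory sample, here is a self-contained sketch. It assumes, as the answers above do, that continuation lines start with a space; the name join_continuations is hypothetical, and unlike cont_lines it yields lines without their trailing newline.

```python
import io


def join_continuations(source):
    """Yield logical lines, merging any line that starts with a space
    into the line before it. Hypothetical helper for illustration."""
    buffered = None
    for line in source:
        if buffered is not None and line.startswith(" "):
            # Continuation: drop the newline of the buffered line and append.
            buffered = buffered.rstrip("\n") + " " + line.strip()
        else:
            if buffered is not None:
                yield buffered.rstrip("\n")
            buffered = line
    if buffered is not None:
        # Flush the last logical line once the source is exhausted.
        yield buffered.rstrip("\n")


# Small sample in the shape described in the question (leading space assumed).
sample = io.StringIO(
    "Line of data\n"
    "Line of data\n"
    " continuation of previous line of data\n"
    "Line of data\n"
)
joined = list(join_continuations(sample))
print(joined)
```

Because it is a generator, only one logical line is held in memory at a time, which matters for a 13-million-line file.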
I was able to work out something.
infile = "test.txt"
def peek_line(f):
    pos = f.tell()
    line = f.readline()
    f.seek(pos)
    return line
f = open(infile, 'r')
while True:
    line = f.readline()
    if not line:
        break
    peek = peek_line(f)
    if not peek.startswith('T'):
        line = (line.strip() + f.readline())
    print line,
I'm open to feedback on this method.
