I am reading data from a .txt file. I need to read lines starting from a certain line, so that I don't have to read the whole file (using .readlines()). Since I know which line I should start reading from, I came up with this (it does not work, though):
def create_list(pos):
    list_created = []
    with open('text_file.txt', 'r') as f:
        f.seek(pos)  # Here I want to put the cursor at the beginning of the line I need to read from
        line = f.readline()  # And here I read the first line
        while line != '<end>\n':
            line = line.rstrip('\n')
            list_created.append(line.split(' '))
            line = f.readline()
        f.close()
    return list_created

print(create_list(2))  # Here I need to create a list starting from the 3rd line of my file
And my text file looks something like this:
Something #line in pos= 0
<start> #line in pos= 1
MY FIRST LINE #line in pos= 2
MY SECOND LINE #line in pos= 3
<end>
And the result should be something like:
[['MY', 'FIRST', 'LINE'], ['MY', 'SECOND', 'LINE']]
Basically, I need to start my readline() from a specific line.
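Does this work? Note that f.seek() moves the cursor by offset (for text files, a position previously returned by f.tell()), not by line number, so f.seek(2) does not jump to the third line. If you don't want to read the entire file with .readlines(), you can skip a single line by calling .readline(). Call readline() as many times as you need to move the cursor down, then read the next line. Also, I don't recommend using line != '<end>\n' unless you're absolutely sure that there will be a newline after <end>. Instead, check '<end>' not in line, and also stop at the end of the file (where readline() returns an empty string) so that a missing <end> marker cannot cause an infinite loop: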
def create_list(pos):
    list_created = []
    with open('text_file.txt', 'r') as f:
        for _ in range(pos):
            f.readline()  # each call skips one line
        line = f.readline()  # And here I read the first line
        while line and '<end>' not in line:  # '' means end of file
            line = line.rstrip('\n')
            list_created.append(line.split(' '))
            line = f.readline()
    return list_created

print(create_list(2))  # Here I need to create a list starting from the 3rd line of my file
text_file.txt:
Something
<start>
MY FIRST LINE
MY SECOND LINE
<end>
Output:
[['MY', 'FIRST', 'LINE'], ['MY', 'SECOND', 'LINE']]
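A more compact variant of the same idea, as a sketch: itertools.islice skips the first pos lines one at a time without building a list, and itertools.takewhile stops at the <end> marker (or at end of file). The function name read_block and its parameters are illustrative, not part of the answer above.

```python
from itertools import islice, takewhile


def read_block(lines, pos, end_marker='<end>'):
    """Return split lines from index `pos` up to (not including) `end_marker`.

    `lines` can be any iterable of lines, for example an open text file.
    """
    # islice discards the first `pos` lines lazily; nothing else is read ahead
    remaining = islice(lines, pos, None)
    # takewhile stops at the first line containing the marker, or at EOF
    body = takewhile(lambda line: end_marker not in line, remaining)
    return [line.rstrip('\n').split(' ') for line in body]
```

With the file above, opening text_file.txt and calling read_block(f, 2) on the file object should give the same list as create_list(2).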
Related
Suppose I have a text file that goes like this:
AAAAAAAAAAAAAAAAAAAAA #<--- line 1
BBBBBBBBBBBBBBBBBBBBB #<--- line 2
CCCCCCCCCCCCCCCCCCCCC #<--- line 3
DDDDDDDDDDDDDDDDDDDDD #<--- line 4
EEEEEEEEEEEEEEEEEEEEE #<--- line 5
FFFFFFFFFFFFFFFFFFFFF #<--- line 6
GGGGGGGGGGGGGGGGGGGGG #<--- line 7
HHHHHHHHHHHHHHHHHHHHH #<--- line 8
Ignore "#<--- line...", it's just for demonstration
Assumptions
I don't know what line 3 is going to contain (because it changes all the time)...
The first 2 lines have to be deleted...
After the first 2 lines, I want to keep 3 lines...
Then, I want to delete all lines after those 3 lines.
End Result
The end result should look like this:
CCCCCCCCCCCCCCCCCCCCC #<--- line 3
DDDDDDDDDDDDDDDDDDDDD #<--- line 4
EEEEEEEEEEEEEEEEEEEEE #<--- line 5
Lines deleted: First 2 + Everything after the next 3 (i.e. after line 5)
Required
All Pythonic suggestions are welcome! Thanks!
Reference Material
https://thispointer.com/python-how-to-delete-specific-lines-in-a-file-in-a-memory-efficient-way/
import os

def delete_multiple_lines(original_file, line_numbers):
    """In a file, delete the lines at the (0-based) line numbers in the given list"""
    is_skipped = False
    counter = 0
    # Create name of dummy / temporary file
    dummy_file = original_file + '.bak'
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # If current line number exists in list then skip copying that line
            if counter not in line_numbers:
                write_obj.write(line)
            else:
                is_skipped = True
            counter += 1
    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)
Then...
delete_multiple_lines('sample.txt', [0,1,2])
The problem with this method might be that, if I had to delete the first 100 lines, I would have to specify [0, 1, 2, ..., 99]. Right?
Answer
Courtesy of #sandes
The following code will:
delete the first 63
get you the next 95
ignore the rest
create a new file
with open("sample.txt", "r") as f:
    lines = f.readlines()

new_lines = []
# delete first 63, then get the next 95 (indices 63 to 157)
idx_lines_wanted = range(63, 63 + 95)
for i, line in enumerate(lines):
    if i >= idx_lines_wanted.stop:
        break
    if i in idx_lines_wanted:
        new_lines.append(line)

with open("sample2.txt", "w") as f:
    for line in new_lines:
        f.write(line)
EDIT: iterating directly over f
based on #Kenny's comment and #chepner's suggestion
with open("your_file.txt", "r") as f:
    new_lines = []
    for idx, line in enumerate(f):
        if idx >= 5:
            break  # nothing after line index 4 is needed
        if idx >= 2:  # keep indices 2, 3, 4
            new_lines.append(line)

with open("your_new_file.txt", "w") as f:
    for line in new_lines:
        f.write(line)
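Since the question asks for Pythonic suggestions, here is a sketch that rewrites the file in place instead of creating a second file. itertools.islice(src, start, stop) yields only lines start to stop - 1 and stops reading once it reaches stop, so the rest of the file is never touched. The function name keep_lines and the ".tmp" suffix are hypothetical choices.

```python
import os
from itertools import islice


def keep_lines(path, start, stop):
    """Keep only lines with 0-based index start <= i < stop, rewriting `path`."""
    tmp_path = path + '.tmp'  # hypothetical name for the temporary file
    with open(path, 'r') as src, open(tmp_path, 'w') as dst:
        # islice reads lazily and stops after line index stop - 1
        dst.writelines(islice(src, start, stop))
    # os.replace overwrites `path` with the new file (atomic on POSIX)
    os.replace(tmp_path, path)
```

For the question, keep_lines('sample.txt', 2, 5) drops the first 2 lines and everything after line 5; the 63/95 case would be keep_lines('sample.txt', 63, 63 + 95).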
This is really something that's better handled by an actual text editor.
import subprocess

original_file = 'sample.txt'  # path of the file to edit in place
subprocess.run(['ed', original_file], input=b'1,2d\n+3,$d\nwq\n')
A crash course in ed, the POSIX standard text editor.
ed opens the file named by its argument. It then proceeds to read commands from its standard input. Each command is a single character, with some commands taking one or two "addresses" to indicate which lines to operate on.
After each command, ed updates the "current" line number, which relative addresses are computed from, as we'll see in a moment. For d, the current line becomes the line that followed the deleted range.
1,2d means to delete lines 1 through 2; the old line 3 is now line 1 and becomes the current line
+3,$d deletes all the lines from line 4 (current line is 1, so 1 + 3 == 4, which was originally line 6) through the end of the file ($ is a special address indicating the last line of the file)
wq writes all changes to disk and quits the editor.
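To check the address arithmetic without running ed, here is a toy Python model (not ed itself) of the two commands on a list. It assumes the POSIX rule that after d the current line is the line that followed the deleted range; the function name simulate_ed is hypothetical.

```python
def simulate_ed(lines):
    """Mimic `1,2d` followed by `+3,$d` on a list, tracking ed's current line."""
    buf = list(lines)
    # 1,2d: delete lines 1..2 (1-based); the old line 3 is now line 1 and current
    del buf[0:2]
    current = 1
    # +3,$d: delete from line current + 3 through the last line ($)
    first = current + 3
    del buf[first - 1:]
    return buf
```

Applied to lines A through H, this leaves C, D and E, matching the desired result in the question.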
I understand that it skips the first row in the file because of the header, but how can I avoid it? The syntax must be exactly the same as it is below.
File contains: Rabbit, Pig, Dog, Horse, Bird
try:
    file = open("file.txt")
    line = file.readline()
    animals = []
    for line in file:
        animals.append(line.rstrip())
    animals.sort()
    print(animals)
finally:
    file.close()
Output is ['Bird', 'Dog', 'Horse', 'Pig']
Every time readline() is called, it reads a line from the file and moves the cursor to the beginning of the next line.
In your code, line = file.readline() reads the first line and moves the cursor to the second line. As a result, your for loop starts from the second line of the file. If there is no particular reason you need it, just delete it. If you do need it, append the line variable to the list and then do the sorting.
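A small demonstration of the cursor behavior, assuming the file holds one animal per line. io.StringIO stands in for the open file so nothing needs to exist on disk.

```python
import io

# io.StringIO behaves like an open text file
f = io.StringIO("Rabbit\nPig\nDog\nHorse\nBird\n")

first = f.readline()                   # consumes "Rabbit\n"; cursor is now at "Pig"
rest = [line.rstrip() for line in f]   # iteration resumes from the cursor

print(repr(first))  # 'Rabbit\n'
print(rest)         # ['Pig', 'Dog', 'Horse', 'Bird']
```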
The problem is the line variable: it holds the first line, which is never used. You should also consider using a name other than file for the variable, because file was a built-in type in Python 2 (it is not a keyword). Also, it is good practice to open files in this form:
with open("file.txt") as f:
    f.read()
This should work.
try:
    f = open("file.txt")
    animals = []
    for line in f:
        animals.append(line.rstrip())
    animals.sort()
    print(animals)
finally:
    f.close()
You read the first line but never do anything with it. You can include it in your output list by starting with animals = [line.rstrip()]
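A sketch of that suggestion, keeping the readline() call and the try/finally shape from the question. It moves open() before try so that close() is only reached for a file that was actually opened, and guards against an empty file; the function name read_animals is hypothetical.

```python
def read_animals(path):
    """Return the sorted lines of `path`, including the one read by readline()."""
    file = open(path)
    try:
        line = file.readline()
        # Start the list with the first line instead of discarding it
        animals = [line.rstrip()] if line else []
        for line in file:
            animals.append(line.rstrip())
        animals.sort()
        return animals
    finally:
        file.close()
```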
or you can use the context manager:
with open('file.txt') as fp:
    animals = sorted(l.rstrip() for l in fp)
I have a CSV with 13 million lines. The data is not quote encapsulated and it contains newlines, which is causing a row of data to have line breaks. The data does not have multiple breaks per line, only one.
How would I take data like this?
Line of data
Line of data
continuation of previous line of data
Line of data
Line of data
continuation of previous line
Line of data
And turn it into this:
Line of data
Line of data continuation of previous line of data
Line of data
Line of data continuation of previous line
Line of data
I've tested this by storing the line in a variable and processing the next one, looking for the first character to be anything but 'L', and appending it. I've also tried using f.tell() and f.seek() to move around in the file, but I haven't been able to get it to work.
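The approach described in the question (hold the current line, and glue on the next one when it does not start with 'L') can be written as a streaming generator, so only one record is in memory at a time, which matters for 13 million lines. This is a sketch assuming every real row starts with row_start ('L' in the sample above); the name join_broken_rows is hypothetical, and real data probably needs a stricter test than the first character.

```python
def join_broken_rows(lines, row_start='L'):
    """Yield records, appending any line that does not start with `row_start`
    to the record before it, separated by a single space."""
    pending = None
    for line in lines:
        line = line.rstrip('\n')
        if pending is not None and not line.startswith(row_start):
            pending += ' ' + line.strip()  # continuation of the previous row
        else:
            if pending is not None:
                yield pending
            pending = line
    if pending is not None:
        yield pending  # flush the final record
```

To write the result, iterate over join_broken_rows(src) with src an open input file, and write each record plus '\n' to the output file.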
Assuming every time a line starts with a space it should be concatenated with the preceding line, this should work:
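with open(data) as infile:  # data: path to the CSV file
    previous_line = None
    for line in infile:
        if line.startswith(' ') and previous_line is not None:
            # continuation: glue it onto the pending line instead of printing it
            previous_line = previous_line.rstrip('\n') + line
        else:
            if previous_line is not None:
                print(previous_line.strip())
            previous_line = line
    if previous_line is not None:
        print(previous_line.strip())  # flush the last pending line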
Here's a cheap, reasonably efficient continuation line joiner for you.
def cont_lines(source):
    last_line = ''
    for line in source:
        if line.startswith(' '):
            # append a continuation, replacing the previous newline with one space
            last_line = last_line.rstrip('\n') + ' ' + line.lstrip()
        else:
            if last_line:
                yield last_line
            last_line = line
    if last_line:  # The one remaining as the source has ended.
        yield last_line
Use like this:
with open("tile.csv") as f:
    for line in cont_lines(f):
        print(line, end='')  # do something with line
It only uses as much memory as the longest set of continuation lines in your file.
I was able to work out something.
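infile = "test.txt"

def peek_line(f):
    """Return the next line without moving the cursor."""
    pos = f.tell()
    line = f.readline()
    f.seek(pos)
    return line

with open(infile, 'r') as f:
    while True:
        line = f.readline()
        if not line:
            break
        peek = peek_line(f)
        # a line that doesn't start with 'T' is treated as a continuation
        if peek and not peek.startswith('T'):
            line = line.strip() + f.readline()
        print(line, end='')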
I'm open to feedback on this method.
I must re-order an input file and then print the output to a new file.
This is the input file:
The first line never changes.
The second line was a bit much longer.
The third line was short.
The fourth line was nearly the longer line.
The fifth was tiny.
The sixth line is just one line more.
The seventh line was the last line of the original file.
This is what the output file should look like:
The first line never changes.
The seventh line was the last line of the original file.
The second line was a bit much longer.
The sixth line is just one line more.
The third line was short.
The fifth was tiny.
The fourth line was nearly the longer line.
I already have code that reverses the input file and prints it to the output file, which looks like this:
ifile_name = open(ifile_name, 'r')
lines = ifile_name.readlines()
ofile_name = open(ofile_name, "w")
lines[-1] = lines[-1].rstrip() + '\n'
for line in reversed(lines):
    ofile_name.write(line)
ifile_name.close()
ofile_name.close()
Is there any way I can get the desired format in the text file while keeping my reverse code?
For example: print the first line of the input file, then print its counterpart from the reversed order, then print the second line of the input file, then its counterpart from the reversed order, etc.
Sorry if this seems unclear; I am very new to Python and Stack Overflow.
Thanks in advance.
I believe this is a more elegant solution, if you don't mind that the list is emptied as it is written out.
with open("ifile_name", "r") as f:
    init_list = f.read().strip().splitlines()
with open("result.txt", "w") as f1:
    while True:
        try:
            # take from the front, then from the back, until the list is empty
            f1.write(init_list.pop(0) + "\n")
            f1.write(init_list.pop() + "\n")
        except IndexError:
            break
ifile_name = "hello/input.txt"
ofile_name = "hello/output.txt"
ifile_name = open(ifile_name, 'r')
lines = ifile_name.readlines()
ofile_name = open(ofile_name, "w")
lines[-1] = lines[-1].rstrip() + '\n'
start = 0
end = len(lines) - 1
while start < end:
    ofile_name.write(lines[start])
    ofile_name.write(lines[end])
    start += 1
    end -= 1
if start == end:
    ofile_name.write(lines[start])
ifile_name.close()
ofile_name.close()
Use two pointers, start and end, to track which lines to write next: the front line, then the back line, moving both inward.
Once start == end (an odd number of lines), write the middle line to the file.
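Since the question asks to keep the reverse idea, here is a sketch that builds the same order from lines and reversed(lines): pairing each line with its mirror and keeping the first len(lines) items of the flattened pairs gives first, last, second, second-to-last, and so on. The function name outside_in is hypothetical.

```python
from itertools import chain, islice


def outside_in(lines):
    """Return lines ordered first, last, second, second-to-last, ..., middle."""
    # zip pairs index i with index len - 1 - i; the second half of the pairs
    # only repeats the first half, so islice cuts the output at len(lines)
    pairs = zip(lines, reversed(lines))
    return list(islice(chain.from_iterable(pairs), len(lines)))
```

After the lines[-1] fix above, ofile_name.writelines(outside_in(lines)) would write the desired output.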
This is my code:
import gzip

f = gzip.open('nome_file.gz', 'rt')
line = f.readline()
for line in f:
    line = f.readline()
    line = line.strip('\n')
    if not line: break
    elements = line.split(" ")
    print(elements[0], " ", elements[1], " ", elements[44], " ", elements[45])
f.close()
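I really don't know why only every other line is read.
for line in f: reads a line. The next statement, line = f.readline(), reads the following line and stores it in the same variable, overwriting the first one.
So you read every line, but only process every second one.
Simply dropping the line = f.readline() inside the loop should solve the problem. The line = f.readline() before the loop also discards the first line, so remove it too unless you want to skip a header.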
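A sketch of the corrected loop for Python 3, assuming space-separated lines with at least 46 columns. Opening with 'rt' makes the gzip stream yield str instead of bytes; blank lines are skipped rather than ending the loop. The generator name selected_columns and its columns parameter are hypothetical.

```python
import gzip


def selected_columns(path, columns=(0, 1, 44, 45)):
    """Yield the chosen space-separated columns from each line of a gzip file."""
    with gzip.open(path, 'rt') as f:
        for line in f:  # the for loop alone reads each line exactly once
            line = line.rstrip('\n')
            if not line:
                continue  # skip blank lines instead of stopping at the first one
            elements = line.split(' ')
            yield tuple(elements[i] for i in columns)
```

To print as in the question: for row in selected_columns('nome_file.gz'): print('   '.join(row)).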