Add line number when adding line to a file - python

From my main program I am calling a script in a loop; as output, it appends a line of data to a txt file. What is the easiest way to also include the line number?
Here is the code I am using:
if area > 1000:
    f = open(output_file, "a")
    f.write("%s %s\n" % (a, b))
    f.close()

You'll first need to count the number of lines in the file already, before adding a new line with the counter incremented:
if area > 1000:
    with open(output_file, "r+") as f:
        linecount = sum(1 for _ in f)
        f.write("%s %s %s\n" % (linecount + 1, a, b))
This is the simple approach: it reads the whole file and counts the lines. For larger files, you'd instead read a chunk at the end of the file, find the last line, and parse the previous counter from it, to avoid reading through the whole file on every append.
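For reference, here is a minimal sketch of that tail-reading idea. It assumes the output format written above (the counter is the first field of each line); the function name and chunk size are just placeholders:
import os

def next_counter(path, chunk_size=1024):
    # Read only the last chunk of the file and parse the counter from its final line.
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - chunk_size))
        tail = f.read().decode()
    lines = tail.strip().splitlines()
    if not lines:
        return 1  # empty file: start counting at 1
    return int(lines[-1].split()[0]) + 1  # first field of the last line is the previous counter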

You should not be opening the output file every time. As for the counter, you can just maintain it yourself.
with open(input_file, 'r') as i, open(output_file, 'w') as o:
    count = 1
    for line in i:
        # do some computation
        if area > 1000:
            o.write('%d: %s %s\n' % (count, a, b))
            count += 1
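If you would rather not manage the counter by hand, itertools.count can hand out the numbers on demand; this is just a sketch of the same loop, still assuming area, a and b come from your computation:
from itertools import count

with open(input_file, 'r') as i, open(output_file, 'w') as o:
    counter = count(1)  # yields 1, 2, 3, ... each time next() is called
    for line in i:
        # do some computation that sets area, a, b
        if area > 1000:
            o.write('%d: %s %s\n' % (next(counter), a, b))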


How to read data corresponding to specific line numbers from a 60GB text file in Python?

I have a 60GB text file (1 billion lines). I have to extract the data corresponding to specified line numbers, which are read from another text file (e.g. 1, 4, 70, 100, etc.). Due to the size I can't load the data into memory and then extract the lines. Also, line-by-line matching and extraction would take many days. Is there any solution to this problem?
Two methods which I tried:
1. first method
f = open('line_numbers.txt')
lines = f.readlines()
numbers = [int(e.strip()) for e in lines]
r = max(numbers)
file = open('OUTPUT_RESULT.txt', 'w')
with open('Large_File.txt') as infile:
    for num, line in enumerate(infile, 1):
        if num <= r:
            if num in numbers:
                file.write(line)
            else:
                pass
            print(num)
It would take many days to get the result.
2. second method
import pandas as pd
data = pd.read_csv('Large_File.txt', header=None)
file = open('OUTPUT_RESULT.txt','w')
f = open('line_numbers.txt')
lines = f.readlines()
numbers =[int(e.strip()) for e in lines]
x = data.loc[numbers,:]
file.write(x)
It fails to load the file into memory.
Is there any solution available to resolve this?
Your issue is probably with the if (num in numbers) line. Not only does it not need the parentheses, it also performs that membership check on every iteration, even though your code goes through the file in order (first line 1, then line 2, etc.).
That can easily be optimised; with that change, the code below ran in only 12 seconds on a test file of about 50 million lines, so it should process your file in a few minutes.
import random

numbers = sorted([random.randint(1, 50000000) for _ in range(1000)])

outfile = open('specific_lines.txt', 'w')
with open('archive_list.txt', 'r', encoding='cp437') as infile:
    for num, line in enumerate(infile, 1):
        if numbers:
            if num == numbers[0]:
                outfile.write(line)
                print(num)
                del numbers[0]
        else:
            pass
Note: this generates 1,000 random line numbers; replace them with your loaded numbers as in your example. If your list of numbers is far larger, the write time for the output file will increase execution time somewhat.
Your code would be like:
with open('line_numbers.txt') as f:
    lines = f.readlines()
numbers = sorted([int(e.strip()) for e in lines])

outfile = open('specific_lines.txt', 'w')
with open('archive_list.txt', 'r', encoding='cp437') as infile:
    for num, line in enumerate(infile, 1):
        if numbers:
            if num == numbers[0]:
                outfile.write(line)
                print(num)
                del numbers[0]
        else:
            pass
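If your line numbers might be unsorted or contain duplicates, a set gives the same cheap per-line check without relying on order; a sketch using the same file names as above:
with open('line_numbers.txt') as f:
    wanted = {int(e.strip()) for e in f}  # set membership tests are O(1)

remaining = len(wanted)
with open('archive_list.txt', 'r', encoding='cp437') as infile, \
        open('specific_lines.txt', 'w') as outfile:
    for num, line in enumerate(infile, 1):
        if num in wanted:
            outfile.write(line)
            remaining -= 1
            if remaining == 0:  # stop as soon as the last wanted line has been copied
                break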

I need to format a write statement in Python so it writes to a text file correctly. I am also not sure how to keep the updated list when writing the file

I know it's not completely finished, but I'm very confused as to how to format the save_inventory function so it prints like the original text file. During the add_item function, it shows that the item has been added to the lists, but when it comes to writing, nothing is there or updated.
Example of how the text file needs to look
def save_inventory(inventoryFile, descriptionArray, quantityArray, priceArray, intrecords):
    outfile = open(inventoryFile, "w")
    with open('inventory1.txt', 'r') as f:
        count = -1
        for line in f:
            count += 1
            if count % 3 == 0:  # this is the remainder operator
                outfile.write(descriptionArray)
                print(descriptionArray)
    with open('inventory1.txt', 'r') as f:
        count = -2
        for line in f:
            count += 1
            if count % 3 == 0:  # this is the remainder operator
                outfile.write(str(quantityArray))
                print(quantityArray)
    with open('inventory1.txt', 'r') as f:
        count = -3
        for line in f:
            count += 1
            if count % 3 == 0:  # this is the remainder operator
                outfile.write(str(priceArray))
                print(priceArray)
    outfile.close()
You are only writing to the file when you have read a line. If your text file is empty you will never write to the file.
What I would do is zip the lists together and loop through them, writing three lines to the file on each pass through the loop. You can write a line break with '\n'.
with open(inventoryFile, 'w') as f:
    for d, q, p in zip(descriptionArray, quantityArray, priceArray):
        f.write('%s\n%s\n%s\n' % (d, q, p))
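For example, with some made-up placeholder data (not the asker's real inventory), the loop produces the three-lines-per-item layout:
descriptionArray = ['hammer', 'nails']
quantityArray = [12, 500]
priceArray = [9.99, 3.49]

with open('inventory1.txt', 'w') as f:
    for d, q, p in zip(descriptionArray, quantityArray, priceArray):
        f.write('%s\n%s\n%s\n' % (d, q, p))

# inventory1.txt now contains:
# hammer
# 12
# 9.99
# nails
# 500
# 3.49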

Indexing lines in a Python file

I want to open a file, and simply return the contents of said file with each line beginning with the line number.
So, hypothetically, if the contents of a file are
a
b
c
I would like the result to be
1: a
2: b
3: c
I'm kind of stuck; I tried enumerate but it doesn't give me the desired format.
This is for uni, but it's only a practice test.
A couple of bits of trial code, to show I have no idea what I'm doing or where to start:
def print_numbered_lines(filename):
    """returns the infile data with a line number infront of the contents"""
    in_file = open(filename, 'r').readlines()
    list_1 = []
    for line in in_file:
        for item in line:
            item.index(item)
            list_1.append(item)
    return list_1

def print_numbered_lines(filename):
    """returns the infile data with a line number infront of the contents"""
    in_file = open(filename, 'r').readlines()
    result = []
    for i in in_file:
        result.append(enumerate(i))
    return result
A file handle can be treated as an iterable.
with open('tree_game2.txt') as f:
    for i, line in enumerate(f):
        print("{0}: {1}".format(i + 1, line.rstrip('\n')))  # rstrip avoids a doubled newline from print
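If the function is supposed to return the numbered lines rather than print them (as the docstring in the question suggests), the same enumerate idea works; a small sketch:
def print_numbered_lines(filename):
    """Return the file's lines, each prefixed with its 1-based line number."""
    with open(filename, 'r') as in_file:
        return ["{0}: {1}".format(i, line.rstrip('\n'))
                for i, line in enumerate(in_file, 1)]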
There seems to be no need to write a Python script; awk would solve your problem:
awk '{print NR": "$1}' your_file > new_file
What about using an OrderedDict?
from collections import OrderedDict

c = OrderedDict()
n = 1
with open('file.txt', 'r') as f:
    for line in f:
        c.update({n: line})
        # if you just want to print it, skip the dict part and just do:
        print n, line
        n += 1
Then you can print it out with:
for n, line in c.iteritems():  # .items() if Python 3
    print n, line
The simple way to do it: first open the file, then use a counting mechanism. For example:
with open('file.txt') as f:
    data = f.read()
lines = data.split("\n")
count = 0
for line in lines:
    print("line " + str(count) + "> " + line)
    count += 1

"list index out of range" when try to output lines from a text file using python

I was trying to extract the even-numbered lines from a text file and output them to a new file, but with my code Python gives me "list index out of range". Can anyone help me? Thanks!
Code:
f = open('input.txt', 'r')
i = 0
j = 0
num_lines = sum(1 for line in f)
newline = [0] * num_lines
print (num_lines)
for i in range(1, num_lines):
    if i % 2 == 0:
        newline[i] = f.readlines()[i]
        print i, newline[i]
    i = i + 1
f.close()

f = open('output.txt', 'w')
for j in range(0, num_lines):
    if j % 2 == 0:
        f.write(newline[j] + '\n')
    j = j + 1
f.close()
Output:
17
Traceback (most recent call last):
File "./5", line 10, in <module>
a = f.readlines()[1]
IndexError: list index out of range
After
num_lines = sum(1 for line in f)
The file pointer in f is at the end of the file, so any subsequent call to f.readlines() returns an empty list. The minimal fix is to use f.seek(0) to return to the start of the file.
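A minimal illustration of that fix, keeping the rest of the original approach unchanged:
f = open('input.txt', 'r')
num_lines = sum(1 for line in f)  # this consumes the file ...
f.seek(0)                         # ... so rewind to the beginning
lines = f.readlines()             # now readlines() actually returns the lines
f.close()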
However, a better solution would be to read through the file only once, e.g. using enumerate to get the line and its index i:
newline = []
for i, line in enumerate(f):
    if i % 2 == 0:
        newline.append(line)
In your original script you read the file once just to count the lines, then you (try to) read the lines into memory, you needlessly pre-create a full-size list instead of just extending one with list.append, you initialize the list with zeroes, which does not make sense for a list that will contain strings, and so on.
Thus, this script does what you originally intended, but better, simpler and faster:
with open('input.txt', 'r') as inf, open('output.txt', 'w') as outf:
    for lineno, line in enumerate(inf, 1):
        if lineno % 2 == 0:
            outf.write(line)
Specifically: open the files with a with statement so that they are automatically closed when the block is exited; write the lines as they are read; and since lines are numbered 1-based, use enumerate with a start value of 1 so that you truly get the even-numbered lines.
You've also got the itertools.islice approach available:
from itertools import islice

with open('input') as fin, open('output', 'w') as fout:
    fout.writelines(islice(fin, 1, None, 2))  # skip line 1, then take every second line (2, 4, 6, ...)
This avoids the modulus check and pushes the line writing down to the writelines call.

How to get certain lines from a file using Python

I need to get a certain part of my file and write it to a new file, keeping the rest in another new file. So I will have 3 files: 1) the original file, 2) the selected lines, and 3) the rest. I have code that works for taking the first selection, but I'm having problems getting the next selection and so on. Here's my code:
counter = 0
with open('1', 'r') as file1:  # open raw data
    with open('2', 'w') as file3:
        with open('3', 'w') as file_out:
            for i in file1:
                if counter < 10:  # next I need to get lines 10 to 20, followed by 20 to 30
                    file_out.write(i)
                else:
                    file3.write(i)
                counter += 1
How can I change my code so that I can get the next selection?
Does this do what you want?
def split_on_crosses(infile, chunk_size):
    head_num = 1                          # counter for chunks
    head_file = open('1-head.txt', 'w')   # outport to first head file
    tails = []                            # outports to tail files
    with open(infile, 'r') as inport:     # open raw data
        for i, line in enumerate(inport, start=1):
            head_file.write(line)
            for t in tails:               # write to all tail files
                t.write(line)
            if i % chunk_size == 0:       # boundary of chunk is reached
                tails.append(open('%s-tail.txt' % head_num, 'w'))  # add one tail file
                head_num += 1
                head_file = open('%s-head.txt' % head_num, 'w')    # switch to next head file

split_on_crosses('infile.txt', 10)
This should do what you want, written in Python 3.x.
# read file alpha, get its lines as a list and their count, then close it
alpha = open('alpha.txt', 'r')
alphaLine = alpha.readlines()
alphaLength = len(alphaLine)
alpha.close()

# lines before line 10 and after line 20 are sent to beta, while lines 10 to 20 are sent to gamma
beta = open('beta.txt', 'w')
gamma = open('gamma.txt', 'w')
for i in range(alphaLength):
    if i < 9:
        beta.write(alphaLine[i])
    elif i < 20:
        gamma.write(alphaLine[i])
    else:
        beta.write(alphaLine[i])
beta.close()
gamma.close()
For speed, I will assume the file is small enough to hold in memory (rather than re-reading the file each time):
from itertools import islice

BLOCKSZ = 10  # lines per chunk

# file names
INPUT = "raw_data.txt"
OUTPUT_LINES  = lambda a, b: "data_lines_{}_to_{}.txt".format(a, b - 1)
OUTPUT_EXCEPT = lambda a, b: "data_except_{}_to_{}.txt".format(a, b - 1)

def main():
    # read file as list of lines
    with open(INPUT) as inf:
        data = list(inf)
    num_blocks = (len(data) + BLOCKSZ - 1) // BLOCKSZ
    for block in range(num_blocks):
        # calculate start and end lines for this chunk
        start = block * BLOCKSZ
        end = (block + 1) * BLOCKSZ
        # write out [start:end]
        with open(OUTPUT_LINES(start, end), "w") as outf:
            for line in islice(data, start, end):
                outf.write(line)
        # write out [:start] + [end:]
        with open(OUTPUT_EXCEPT(start, end), "w") as outf:
            for line in islice(data, start):
                outf.write(line)
            for line in islice(data, end, None):
                outf.write(line)

if __name__ == "__main__":
    main()
Edit: I just realized I made a mistake in my line-slicing for OUTPUT_EXCEPT (thinking of islice offsets as absolute not relative); this is now fixed.
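Since the whole file is already held in data as a list, the two output files can also be written with plain slicing; a sketch of the loop body under the same names as above:
# write out [start:end]
with open(OUTPUT_LINES(start, end), "w") as outf:
    outf.writelines(data[start:end])
# write out [:start] + [end:]
with open(OUTPUT_EXCEPT(start, end), "w") as outf:
    outf.writelines(data[:start])
    outf.writelines(data[end:])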
