Dividing a .txt file in multiple parts in Python - python

I'm a beginner in Python, and I have a question about file reading:
I need to process the information in a file and write it to another one. I know how to do that, but it is really resource-consuming for my computer because the file is very big. However, I know how it is formatted!
The file follows this format:
4 13
9 3 4 7
3 3 3 3
3 5 2 1
I won't explain what it is for, as it would take ages and would not be very useful, but the file is essentially made of four-line blocks like this one, repeated again and again. For now, I use this to read the file and convert it into one very long chain:
inputfile = open("input.txt", "r")
output = open("output.txt", "w")
Chain = inputfile.read()
Chain = Chain.split("\n")
Chained = ' '.join(Chain)
Chain = Chained.split(" ")
Chain = list(map(int, Chain))
Afterwards, I just treat it with "task IDs", but I feel like it's really not efficient.
So do you know how I could divide the chain into multiple ones knowing how they are formatted?
Thanks for reading !

How about:
res = []
with open('file', 'r') as f:
    for line in f:
        for num in line.split(' '):
            res.append(int(num))
Instead of reading the whole file into memory, you go line by line.
Does this help?
If you need to go 4 lines at a time, just add an internal loop.
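A minimal sketch of the "4 lines at a time" idea, assuming every block in the file is exactly four lines of space-separated integers. The function name read_blocks and its lines_per_block parameter are made up for illustration; itertools.islice pulls the next few lines from the open file without loading the rest.

```python
from itertools import islice

def read_blocks(path, lines_per_block=4):
    """Yield one flat list of ints per block of `lines_per_block` lines."""
    with open(path, "r") as f:
        while True:
            # Take the next few lines from the file; empty list means EOF
            block = list(islice(f, lines_per_block))
            if not block:
                break
            # Flatten the block into a single list of integers
            yield [int(tok) for line in block for tok in line.split()]
```

Because this is a generator, only one block is held in memory at a time, so you can process each block and write its result before reading the next one.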
Regarding output, I'm assuming you want to do some computation on the input, so I wouldn't necessarily do this in the same loop. Either process the input once reading is done, or instead of using a list, use a queue and have another thread read from the queue while this thread is writing to it.
A generator expression might tidy this up a bit as well (I doubt it will make a performance impact):
res = []
with open('file', 'r') as f:
    for line in f:
        # extend() adds each int; append() would store the generator object itself
        res.extend(int(num) for num in line.split())

Hmm, I believe there is a way to write to a file without reading all of it first. See "Add text to end of line without loading file" and the print() documentation:
https://docs.python.org/2.7/library/functions.html#print
from __future__ import print_function
# only needed if you are using Python 2.7
i = open("input", "r")
f = open("output.txt", "w")
for line in i:
    # iterate over the lines in the input file
    line = line.strip()
    # strip() returns a new string without the trailing \n, so keep the result
    print(line, end=" ", file=f)
    # this writes the line to the output file, followed by a space
i.close()
f.close()
This might help. I'm a newbie too, but with better Google-fu XD

Maybe do it line by line. This way it consumes less memory.
inputfile = open("input.txt", "r")
output = open("output.txt", "a")
while True:
    line = inputfile.readline()
    if not line:
        # readline() returns an empty string at end of file
        break
    numbers = line.strip().split(" ")
    integers = list(map(int, numbers))
inputfile.close()
output.close()
There is a newline character \n at the end of each line. strip() removes it before splitting (int() would tolerate it, but it is cleaner to drop it).

If you don't want to consume memory (you can run out of it if the file is very large), you need to read line by line.
with open('input.txt', 'r') as inputfile, open('output.txt', 'w') as output:
    for line in inputfile:
        chain = line.split()
        # do some calculations or whatever you need
        # and write those numbers to the new file
        numbers = list(map(int, chain))
        for number in numbers:
            output.write("%d " % number)

Related

Reading strings and decreasing them

I just have a simple question when dealing with text files:
I have a text file and want to write a Python program that reads it and replaces every number it finds with the number preceding it; for example, if it finds 4, it replaces it with 3. How can I do that?
The problem for me is that Python reads the numbers as strings, not integers, so it can't decrease or increase them.
out = open("out.txt", "w")
with open("Spider-Man.Homecoming.2017.1080p.BluRay.x264-[YTS.AG].txt", "r") as file:
    lines = file.readlines()
    for line in lines:
        if line.isdigit():
            out.write(str(int(line - 1)))
        else:
            out.write(line)
This code doesn't detect the numbers as numbers and I don't know why.
Putting @Samwise's comment together with your code:
with open("Spider-Man.Homecoming.2017.1080p.BluRay.x264-[YTS.AG].txt", "r") as file:
    lines = file.readlines()
new_lines = []
for line in lines:
    decreased = ''.join(str(int(c)-1) if c.isdigit() else c for c in line)
    new_lines.append(decreased)
with open('out.txt', 'w') as out:
    out.writelines(new_lines)
You should also close the file after writing to it, so I switched to with open at the end, which is a better way to write to a file.
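One thing to be aware of: working character by character treats every digit separately, so "10" becomes "0-1" rather than "9". If whole numbers should be decremented instead, a regular expression can match each run of digits. This is only a sketch; the function name decrement_numbers is made up here, and it assumes plain non-negative integers in the text.

```python
import re

def decrement_numbers(text):
    """Replace every run of digits in `text` with that number minus one."""
    # \d+ matches a whole number such as "10", not just a single digit
    return re.sub(r"\d+", lambda m: str(int(m.group()) - 1), text)
```

You would call it on each line in place of the ''.join(...) expression, for example new_lines.append(decrement_numbers(line)).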

Deleting a line if it starts with a specific number [duplicate]

Hey, right now I am trying to make a small program that can delete lines based on the number in front of the question (just so you don't have to retype the whole question again).
with open("DailyQuestions.txt", "r") as f:
    lines = f.readlines()
with open("DailyQuestions.txt", "w") as w:
    for line in lines:
        Num, A = line.split(" - ")
        if not line.startswith(Num):
            w.write(line)
Textfile:
1 - Q1
2 - Q2
3 - Q3
4 - Q4
5 - Q5
The problem with this is that it either deletes the whole file or complains that it expects 2 values (Num, A = line.split(" - ")). I still can't figure out a way for it to delete the whole line based on the number in front of it. Any tips or suggestions would help a lot!
Two things:
First, you can use a single open by passing in r+, which can tidy up your code.
Secondly, w overwrites the entire file; it doesn't append. If you would like to append, you can pass a instead of w to open. Assuming you aren't dealing with a large amount of data, I recommend storing what you want to write in a string variable (with newlines) and then writing that to the file.
Here is a solution:
toWrite = ""
with open("DailyQuestions.txt", "r") as f:
    lines = f.readlines()
for line in lines:
    Num, A = line.split(" - ")
    if not Num == "Number to Replace":
        toWrite += line
with open("DailyQuestions.txt", "w") as w:
    w.write(toWrite)
As an option, read all the lines from the file, filter them with a list comprehension, and rewrite the file.
For example, to delete every line that does not contain a - (note that this also strips the number prefix from the lines it keeps):
with open('test_file.txt', 'r') as f:
    # Keep only lines that contain '-', taking the text after it
    lines = [value[value.find('-')+1:].strip() for value in f.readlines() if value.find('-') != -1]
with open('test_file.txt', 'w') as f:
    if lines:
        f.write('\n'.join(lines))
Your primary problem lies in the construction of your Boolean Expression. Based on the current design of the program, the expression will always be False. Therefore nothing is outputted.
Before you search through all the lines of text, you need to create a variable to store the numbers that you want to exclude. Use a List or Set:
numbers_to_exclude = {'2', '4', '6', '5'}
Then utilize this variable in your Boolean Expression:
if first_character_of_line not in numbers_to_exclude:
Your final code should look something like:
numbers_to_exclude = {'2', '4', '6', '5'}
with open("DailyQuestions.txt", "r") as f:
    lines = f.readlines()
with open("output.txt", "w") as w:
    for line in lines:
        first_char_in_line = line[0]
        if first_char_in_line not in numbers_to_exclude:  # the line does not begin with an excluded number
            w.write(line)
I don't really know how to code it, but I can maybe give you an idea.
My idea is to build everything again: save all the lines in a variable such as a list, then rewrite the file (with a "for i in range(len()):" loop, for example), using an "if" that takes the first number and checks whether it is the number you want to delete, so that you simply don't write that line.
The code will not be very optimized, but I think it can work.
I hope this helps :)
If you are just trying to remove the lines from the file that starts with a specific number you could use this code:
rewrite = ""
excluded_num = 4  # Place your excluded number here
with open("DailyQuestions.txt", "r") as f:
    for line in f:
        Num, A = line.split(" - ")
        if int(Num) != excluded_num:
            rewrite += line
with open("DailyQuestions.txt", "w") as w:
    w.write(rewrite)
The problem with the code you posted is its condition: every line starts with its own Num, so not line.startswith(Num) is never true and nothing is written back to the file.
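The "expects 2 values" error from the question comes from lines that do not contain " - " (a blank line, for instance), which makes the two-name unpacking fail. Here is a minimal sketch that tolerates such lines by using str.partition, which always returns three parts. The function name remove_numbered_lines and the excluded set of strings are assumptions for illustration, not something from the posts above.

```python
def remove_numbered_lines(lines, excluded):
    """Return the lines whose leading number is not in `excluded`."""
    kept = []
    for line in lines:
        prefix, sep, _rest = line.partition(" - ")
        # Lines without " - " have sep == "" and are kept unchanged
        if sep and prefix.strip() in excluded:
            continue
        kept.append(line)
    return kept
```

Since the whole prefix is compared rather than only the first character, excluding "1" does not also remove line 12. You would read the file with readlines(), pass the result through this function, and write the returned list back with writelines().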

Python- how to use while loop to return longest line of code

I just started learning python 3 weeks ago, I apologize if this is really basic. I needed to open a .txt file and print the length of the longest line of code in the file. I just made a random file named it myfile and saved it to my desktop.
myfile= open('myfile', 'r')
line= myfile.readlines()
len(max(line))-1
# (the "-1" is to remove the \n)
Is this code correct? I put it in interpreter and it seemed to work OK.
But I got it wrong because apparently I was supposed to use a while loop. Now I am trying to figure out how to put it in a while loop. I've read what it says on python.org, watched videos on youtube and looked through this site. I just am not getting it. The example to follow that was given is this:
import os
du=os.popen('du/urs/local')
while 1:
    line= du.readline()
    if not line:
        break
    if list(line).count('/')==3:
        print line,
print max([len(line) for line in file(filename).readlines()])
Taking what you have and stripping out the parts you don't need:
myfile = open('myfile', 'r')
max_len = 0
while 1:
    line = myfile.readline()
    if not line:
        break
    if len(line) # ... something
        # something
Note that this is a crappy way to loop over a file. It relies on readline() returning an empty string at the end of the file (a blank line comes back as "\n", so it does not end the loop). But homework is homework...
max(['b','aaa']) is 'b'
This lexicographic order isn't what you want to maximise. You can use the key argument to choose a different function to maximise, like len.
max(['b','aaa'], key=len) is 'aaa'
So the solution could be: len(max(lines, key=len)), where lines is the list returned by readlines().
A more elegant solution would be to use a generator expression:
max(len(line)-1 for line in myfile.readlines())
As an aside, you should open the file using a with statement, which takes care of closing the file at the end of the indented block:
with open('myfile', 'r') as mf:
    print(max(len(line)-1 for line in mf.readlines()))
As others have mentioned, you need to find the line with the maximum length, which means giving the max() function a key= argument to extract that from each of the lines in the list you pass it.
Likewise, in a while loop you'd need to read each line and see if its length was greater than the longest one you had seen so far, which you could store in a separate variable and initialize to 0 before the loop.
BTW, you would not want to open the file with os.popen() as shown in your second example.
I think it will be easier to understand if we keep it simple:
max_len = -1 # Nothing was read so far
with open("filename.txt", "r") as f: # Opens the file and closes it automatically at the end
    for line in f:
        max_len = max(max_len, len(line))
print(max_len)
As this is homework... I would ask myself whether I should count the line feed character or not. If you need to chop the last character, replace len(line) with len(line[:-1]).
If you have to use while, try this:
max_len = -1 # Nothing was read
with open("t.txt", "r") as f: # Opens the file
    while True:
        line = f.readline()
        if(len(line)==0):
            break
        max_len = max(max_len, len(line[:-1]))
print(max_len)
For those still in need, here is a little function that does what you need:
def get_longest_line(filename):
    # Read every line, then return the first one that has the maximum length
    with open(filename, "r") as open_file:
        all_text = open_file.readlines()
    length_lines_list = []
    for line in all_text:
        length_lines_list.append(len(line))
    max_length_line = max(length_lines_list)
    for line in all_text:
        if len(line) == max_length_line:
            return line.strip()

Python- How to Remove Columns from a File

I'd like to remove the first column from a file. The file contains 3 columns separated by spaces, and the columns have the following titles:
'X', 'Displacement' and 'Force' (please see the image).
I have come up with the following code, but to my disappointment it doesn't work!
f = open("datafile.txt", 'w')
for line in f:
    line = line.split()
    del x[0]
f.close()
Any help is much appreciated !
Esan
First of all, you're attempting to read from a file (by iterating through the file contents) that is open for writing. This will give you an IOError, and opening the file in 'w' mode also truncates it, so the existing data is lost.
Second, there is no variable named x in existence (you have not declared/set one in the script). This will generate a NameError.
Thirdly and finally, once you have finished (correctly) reading and editing the columns in your file, you will need to write the data back into the file.
To avoid loading a (potentially large) file into memory all at once, it is probably a good idea to read from one file (line by line) and write to a new file simultaneously.
Something like this might work:
f = open("datafile.txt", "r")
g = open("datafile_fixed.txt", "w")
for line in f:
    if line.strip():
        g.write("\t".join(line.split()[1:]) + "\n")
f.close()
g.close()
Some reading about python i/o might be helpful, but something like the following should get you on your feet:
with open("datafile.txt", "r") as fin:
    with open("outputfile.txt", "w") as fout:
        for line in fin:
            line = line.split(' ')
            if len(line) == 3:
                del line[0]
                fout.write(line[0] + ' ' + line[1])
            else:
                fout.write('\n')
EDIT: fixed to work with blank lines
print('\n'.join(' '.join(l.split()[1:]) for l in open('datafile.txt')))
or, if you want to preserve spaces and you know that the second column always starts at, say, the 12th character (index 11):
print(''.join(l[11:] for l in open('datafile.txt')))

Fastest Way to Delete a Line from Large File in Python

I am working with a very large (~11GB) text file on a Linux system. I am running it through a program which is checking the file for errors. Once an error is found, I need to either fix the line or remove the line entirely. And then repeat...
Eventually once I'm comfortable with the process, I'll automate it entirely. For now however, let's assume I'm running this by hand.
What would be the fastest (in terms of execution time) way to remove a specific line from this large file? I thought of doing it in Python...but would be open to other examples. The line might be anywhere in the file.
If Python, assume the following interface:
def removeLine(filename, lineno):
Thanks,
-aj
You can have two file objects for the same file at the same time (one for reading, one for writing):
def removeLine(filename, lineno):
    # lineno is zero-based: lineno lines are skipped before the one removed
    fro = open(filename, "rb")
    current_line = 0
    while current_line < lineno:
        fro.readline()
        current_line += 1
    seekpoint = fro.tell()
    frw = open(filename, "r+b")
    frw.seek(seekpoint, 0)
    # read the line we want to discard
    fro.readline()
    # now move the rest of the lines in the file
    # one line back
    chars = fro.readline()
    while chars:
        # write(), not writelines(): iterating bytes yields ints in Python 3
        frw.write(chars)
        chars = fro.readline()
    fro.close()
    frw.truncate()
    frw.close()
Modify the file in place: the offending line is replaced with spaces, so the remainder of the file does not need to be shuffled around on disk. You can also "fix" the line in place if the fix is not longer than the line you are replacing.
import os
from mmap import mmap
def removeLine(filename, lineno):
    # lineno is one-based; mmap works on bytes, hence the b'...' literals
    f = os.open(filename, os.O_RDWR)
    m = mmap(f, 0)
    p = 0
    for i in range(lineno-1):
        p = m.find(b'\n', p) + 1
    q = m.find(b'\n', p)
    m[p:q] = b' ' * (q-p)
    m.close()
    os.close(f)
If the other program can be changed to output the file offset instead of the line number, you can assign the offset to p directly and do without the for loop.
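For the offset variant, a rough sketch could look like the following. It assumes the checker reports the byte offset at which the bad line starts; the function name blank_line_at_offset is made up for illustration. It uses the standard mmap module and, unlike the version above, also handles a last line that has no trailing newline.

```python
import mmap

def blank_line_at_offset(filename, offset):
    """Overwrite the line starting at byte `offset` with spaces, in place."""
    with open(filename, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            end = m.find(b"\n", offset)
            if end == -1:
                # The last line has no trailing newline: blank to end of file
                end = len(m)
            # Same-length replacement, so the file size does not change
            m[offset:end] = b" " * (end - offset)
            m.flush()
```

No scanning from the start of the file is needed, so the cost depends only on the length of the line being blanked, not on the size of the file.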
As far as I know, you can't just open a txt file with python and remove a line. You have to make a new file and move everything but that line to it. If you know the specific line, then you would do something like this:
f = open('in.txt')
fo = open('out.txt','w')
ind = 1
for line in f:
    if ind != linenumtoremove:
        fo.write(line)
    ind += 1
f.close()
fo.close()
You could of course check the contents of the line instead to determine if you want to keep it or not. I also recommend that if you have a whole list of lines to be removed/changed to do all those changes in one pass through the file.
If the lines are variable length then I don't believe that there is a better algorithm than reading the file line by line and writing out all lines, except for the one(s) that you do not want.
You can identify these lines by checking some criteria, or by keeping a running tally of lines read and suppressing the writing of the line(s) that you do not want.
If the lines are fixed length and you want to delete specific line numbers, then you may be able to use seek to move the file pointer... I doubt you're that lucky though.
Update: solution using sed as requested by poster in comment.
To delete for example the second line of file:
sed '2d' input.txt
Use the -i switch to edit in place. Warning: this is a destructive operation. Read the help for this command for information on how to make a backup automatically.
import os
def removeLine(filename, lineno):
    # "in" is a reserved word in Python, so the handles are named fin and fout
    fin = open(filename)
    fout = open(filename + ".new", "w")
    for i, l in enumerate(fin, 1):
        if i != lineno:
            fout.write(l)
    fin.close()
    fout.close()
    os.rename(filename + ".new", filename)
I think there was a somewhat similar if not exactly the same type of question asked here. Reading (and writing) line by line is slow, but you can read a bigger chunk into memory at once, go through that line by line skipping lines you don't want, then writing this as a single chunk to a new file. Repeat until done. Finally replace the original file with the new file.
The thing to watch out for is when you read in a chunk, you need to deal with the last, potentially partial line you read, and prepend that into the next chunk you read.
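The chunked approach could be sketched roughly as follows. This is not a tuned implementation; the function name remove_line_chunked and the 1 MiB default for chunk_size are arbitrary choices for illustration. The leftover variable carries the partial last line of one chunk over to the next, as described above.

```python
import os

def remove_line_chunked(filename, lineno, chunk_size=1 << 20):
    """Rewrite `filename` without its 1-based line `lineno`, chunk by chunk."""
    tmp_name = filename + ".new"
    current = 0      # number of complete lines seen so far
    leftover = b""   # partial last line carried over from the previous chunk
    with open(filename, "rb") as src, open(tmp_name, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            lines = (leftover + chunk).split(b"\n")
            leftover = lines.pop()  # text after the last newline, may be empty
            kept = []
            for line in lines:
                current += 1
                if current != lineno:
                    kept.append(line + b"\n")
            dst.write(b"".join(kept))  # one write per chunk
        # A final line without a trailing newline, if there is one
        if leftover and current + 1 != lineno:
            dst.write(leftover)
    os.replace(tmp_name, filename)
```

Whether this beats plain line-by-line copying depends on the system, since Python's file objects already buffer reads, so it is worth timing both on the real file.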
@OP, if you can use awk, e.g. assuming the line number is 10:
$ awk 'NR!=10' file > newfile
I will provide two alternatives based on the look-up factor (line number or a search string):
Line number
def removeLine2(filename, lineNumber):
    with open(filename, 'r+') as outputFile:
        with open(filename, 'r') as inputFile:
            currentLineNumber = 0
            while currentLineNumber < lineNumber:
                inputFile.readline()
                currentLineNumber += 1
            seekPosition = inputFile.tell()
            outputFile.seek(seekPosition, 0)
            inputFile.readline()
            currentLine = inputFile.readline()
            while currentLine:
                outputFile.write(currentLine)
                currentLine = inputFile.readline()
        outputFile.truncate()
String
def removeLine(filename, key):
    with open(filename, 'r+') as outputFile:
        with open(filename, 'r') as inputFile:
            seekPosition = 0
            currentLine = inputFile.readline()
            # also stop at end of file, otherwise a missing key loops forever
            while currentLine and not currentLine.strip().startswith('"%s"' % key):
                seekPosition = inputFile.tell()
                currentLine = inputFile.readline()
            outputFile.seek(seekPosition, 0)
            currentLine = inputFile.readline()
            while currentLine:
                outputFile.write(currentLine)
                currentLine = inputFile.readline()
        outputFile.truncate()
