The last line of my file is:
29-dez,40,
How can I modify that line so that it reads:
29-Dez,40,90,100,50
Note: I don't want to write a new line. I want to take the same line and put new values after 29-Dez,40,
I'm new at python. I'm having a lot of trouble manipulating files and for me every example I look at seems difficult.
Unless the file is huge, you'll probably find it easier to read the entire file into a data structure (which might just be a list of lines), and then modify the data structure in memory, and finally write it back to the file.
On the other hand maybe your file is really huge - multiple GBs at least. In which case: the last line is probably terminated with a new line character, if you seek to that position you can overwrite it with the new text at the end of the last line.
So perhaps:
f = open("foo.file", "wb")
f.seek(-len(os.linesep), os.SEEK_END)
f.write("new text at end of last line" + os.linesep)
f.close()
(Modulo line endings on different platforms)
To expand on what Doug said, in order to read the file contents into a data structure you can use the readlines() method of the file object.
The below code sample reads the file into a list of "lines", edits the last line, then writes it back out to the file:
#!/usr/bin/python
MYFILE="file.txt"
# read the file into a list of lines
lines = open(MYFILE, 'r').readlines()
# now edit the last line of the list of lines
new_last_line = (lines[-1].rstrip() + ",90,100,50")
lines[-1] = new_last_line
# now write the modified list back out to the file
open(MYFILE, 'w').writelines(lines)
If the file is very large then this approach will not work well, because this reads all the file lines into memory each time and writes them back out to the file, which is very inefficient. For a small file however this will work fine.
Don't work with files directly, make a data structure that fits your needs in form of a class and make read from/write to file methods.
I recently wrote a script to do something very similar to this. It would traverse a project, find all module dependencies and add any missing import statements. I won't clutter this post up with the entire script, but I'll show how I went about modifying my files.
import os
from mmap import mmap
def insert_import(filename, text):
if len(text) < 1:
return
f = open(filename, 'r+')
m = mmap(f.fileno(), os.path.getsize(filename))
origSize = m.size()
m.resize(origSize + len(text))
pos = 0
while True:
l = m.readline()
if l.startswith(('import', 'from')):
continue
else:
pos = m.tell() - len(l)
break
m[pos+len(text):] = m[pos:origSize]
m[pos:pos+len(text)] = text
m.close()
f.close()
Summary: This snippet takes a filename and a blob of text to insert. It finds the last import statement already present, and sticks the text in at that location.
The part I suggest paying most attention to is the use of mmap. It lets you work with files in the same manner you may work with a string. Very handy.
Related
I have a text file, which has the following:
20
15
10
And I have the following code:
test_file = open("test.txt","r")
n = 21
line1 = test_file.readline(1)
line2 = test_file.readline(2)
line3 = test_file.readline(3)
test_file.close()
line1 = int(line1)
line2 = int(line2)
line3 = int(line3)
test_file = open("test.txt","a")
if n > line1:
test_file.write("\n")
n = str(n)
test_file.write(n)
test_file.close()
This code checks if the variable 'n' is bigger than line 1. What I wanted it to do is if it is bigger than line 1, it should be written in a line before the previous line 1. However this code will write it at the bottom of the file. Is there anything I can do to write something where I want to and not at the bottom of the file?
Any help is appreciated.
You can put your whole data in a variable, edit that variable then overwrite the information in the file.
with open('test.txt', 'r') as file:
# read a list of lines into data
data = file.readlines()
# now change the 2nd line, note that you have to add a newline
data[1] = "42\t\n"
# and write everything back
with open('test.txt', 'w') as file:
file.writelines( data )
This is a short answer, implement your own algorithm to solve your own problem.
As correctly pointed out by Amadan in a comment, the only way to obtain this result is a complete rewrite of the file.
This, clearly depending on how strict your requirements are, is fairly inefficient.
If you want to understand more about inefficiency just imagine the actions you would have to manually take to write a new 1st line in a physical notebook page.
Since the 1st line is already written you would have to turn the page, write the new first line, then copy again all the lines from the old page and, finally, tear the 1st page out and have your perfect notebook with a perfect page again.
You are writing with pen so there is no possibility to delete, only a new page will do the trick.
That is quite some work!
This is - well, more or less - what Python is doing behind the scenes when it is opening for reading (the 'r' part in my examples below) and then opening for writing (the 'w' part) the same file again.
As a general idea imagine that when you see for loops there is a lot of work to do.
I will clumsily over-simplify saying that the more the for loops the slower the code (countless pages of paper have been written by brilliant minds on performances, I suggest you diving dive deeper and searching for "Big O notation" using your preferred search engine. Here's an example: https://www.freecodecamp.org/news/all-you-need-to-know-about-big-o-notation-to-crack-your-next-coding-interview-9d575e7eec4/).
A better solution would be to change your data file and make sure that the last value is also the most recent one.
Rewriting the file is as easy as writing an empty file, code and result are identical.
The trick here is that we have in memory (in the variables data and new_data) everything we need.
In data we store the whole content of the file before the change.
In new_data we can easily apply the needed modification because it is just a list containing a number and a newline (\n) for each list item.
Once new_data contains the data in the desired order all we need to do is write that list into a file.
Here's a possible solution, as close as possible to your code:
n = 21
with open('test.txt', 'r') as file:
data = file.readlines()
first_entry = int(data[0])
if (n > first_entry):
new_data = []
new_value = str(n) + "\n"
new_data.append(new_value)
for item in data:
new_data.append(item)
with open('test.txt', 'w') as file:
file.writelines(new_data)
Here's a more portable one:
def prepend_to_file_if_bigger_than_first_line(filename, value):
"""Checks if value is bigger than the one found in the 1st line of the specified file,
if true prepends it to the file
Args:
filename (str): The file name to check.
value (str): The value to check.
"""
with open(filename, 'r') as file:
data = file.readlines()
first_entry = int(data[0])
if (value > first_entry):
new_value = "{}\n".format(value)
new_data = []
new_data.append(new_value)
for old_value in data:
new_data.append(old_value)
with open(filename, 'w') as file:
file.writelines(new_data)
prepend_to_file_if_bigger_than_first_line("test.txt", 301)
As bonus some food for thought and exercises to learn:
What if instead of rewriting everything you just add a new line to the end of the page? Wouldn't it be more efficient and effective?
How would you re-implement my function above just to check the last line in file and append a new value?
Try bench-marking the prepend and the append solution, which one is best?
I am using python to make a template updater for html. I read a line and compare it with the template file to see if there are any changes that needs to be updated. Then I want to write any changes (if there are any) back to the same line I just read from.
Reading the file, my file pointer is positioned now on the next line after a readline(). Is there anyway I can write back to the same line without having to open two file handles for reading and writing?
Here is a code snippet of what I want to do:
cLine = fp.readline()
if cLine != templateLine:
# Here is where I would like to write back to the line I read from
# in cLine
Updating lines in place in text file - very difficult
Many questions in SO are trying to read the file and update it at once.
While this is technically possible, it is very difficult.
(text) files are not organized on disk by lines, but by bytes.
The problem is, that read number of bytes on old lines is very often different from new one, and this mess up the resulting file.
Update by creating a new file
While it sounds inefficient, it is the most effective way from programming point of view.
Just read from file on one side, write to another file on the other side, close the files and copy the content from newly created over the old one.
Or create the file in memory and finally do the writing over the old one after you close the old one.
At the OS level the things are a bit different from how it looks from Python - from Python a file looks almost like a list of strings, with each string having arbitrary length, so it seems to be easy to swap a line for something else without affecting the rest of the lines:
l = ["Hello", "world"]
l[0] = "Good bye"
In reality, though, any file is just a stream of bytes, with strings following each other without any "padding". So you can only overwrite the data in-place if the resulting string has exactly the same length as the source string - otherwise it'll simply overwrite the following lines.
If that is the case (your processing guarantees not to change the length of strings), you can "rewind" the file to the start of the line and overwrite the line with new data. The below script converts all lines in file to uppercase in-place:
def eof(f):
cur_loc = f.tell()
f.seek(0,2)
eof_loc = f.tell()
f.seek(cur_loc, 0)
if cur_loc >= eof_loc:
return True
return False
with open('testfile.txt', 'r+t') as fp:
while True:
last_pos = fp.tell()
line = fp.readline()
new_line = line.upper()
fp.seek(last_pos)
fp.write(new_line)
print "Read %s, Wrote %s" % (line, new_line)
if eof(fp):
break
Somewhat related: Undo a Python file readline() operation so file pointer is back in original state
This approach is only justified when your output lines are guaranteed to have the same length, and when, say, the file you're working with is really huge so you have to modify it in place.
In all other cases it would be much easier and more performant to just build the output in memory and write it back at once. Another option is to write to a temporary file, then delete the original and rename the temporary file so it replaces the original file.
I want to append a new line in the starting of 2GB+ file. I tried following code but code OUT of MEMORY
error.
myfile = open(tableTempFile, "r+")
myfile.read() # read everything in the file
myfile.seek(0) # rewind
myfile.write("WRITE IN THE FIRST LINE ")
myfile.close();
What is the way to write in a file file without getting the entire file in memory?
How to append a new line at starting of the file?
Please note, there's no way to do this with any built-in functions in Python.
You can do this easily in LINUX using tail / cat etc.
For doing it via Python we must use an auxiliary file and for doing this with very large files, I think this method is the possibility:
def add_line_at_start(filename,line_to_be_added):
f = fileinput.input(filename,inplace=1)
for xline in f:
if f.isfirstline():
print line_to_be_added.rstrip('\r\n') + '\n' + xline,
else:
print xline
NOTE:
Never try to use read() / readlines() functions when you are dealing with big files. These methods tried load the complete file into your memory
In your given code, seek function is going to take you the starting point but then everything you write would overwrite the current content
If you can afford having the entire file in memory at once:
first_line_update = "WRITE IN THE FIRST LINE \n"
with open(tableTempFile, 'r+') as f:
lines = f.readlines()
lines[0] = first_line_update
f.writelines(lines)
otherwise:
from shutil import copy
from itertools import islice, chain
# TODO: use a NamedTemporaryFile from the tempfile module
first_line_update = "WRITE IN THE FIRST LINE \n"
with open("inputfile", 'r') as infile, open("tmpfile", 'w+') as outfile:
# replace the first line with the string provided:
outfile.writelines(
(line for line in chain((first_line_update,), islice(infile,1,None)))
# if you don't want to replace the first line but to insert another line before
# this simplifies to:
#outfile.writelines(line for line in chain((first_line_update,), infile))
copy("tmpfile", "infile")
# TODO: remove temporary file
Generally, you can't do that. A file is a sequence of bytes, not a sequence of lines. This data model doesn't allow for insertions in arbitrary points - you can either replace a byte by another or append bytes at the end.
You can either:
Replace the first X bytes in the file. This could work for you if you can make sure that the first line's length will never vary.
Truncate the file, write the first line, then rewrite all the rest after it. If you can't fit all your file into the memory, then:
create a temporary file (the tempfile module will help you)
write your line to it
open your base file in r and copy its contents after the first line to the temporary file, piece-wise
close both files, then replace the input file by the temporary file
(Note that appending to the end of a file is much easier - all you need to do is open the file in the append a mode.)
I can do this using a separate file, but how do I append a line to the beginning of a file?
f=open('log.txt','a')
f.seek(0) #get to the first position
f.write("text")
f.close()
This starts writing from the end of the file since the file is opened in append mode.
In modes 'a' or 'a+', any writing is done at the end of the file, even if at the current moment when the write() function is triggered the file's pointer is not at the end of the file: the pointer is moved to the end of file before any writing. You can do what you want in two manners.
1st way, can be used if there are no issues to load the file into memory:
def line_prepender(filename, line):
with open(filename, 'r+') as f:
content = f.read()
f.seek(0, 0)
f.write(line.rstrip('\r\n') + '\n' + content)
2nd way:
def line_pre_adder(filename, line_to_prepend):
f = fileinput.input(filename, inplace=1)
for xline in f:
if f.isfirstline():
print line_to_prepend.rstrip('\r\n') + '\n' + xline,
else:
print xline,
I don't know how this method works under the hood and if it can be employed on big big file. The argument 1 passed to input is what allows to rewrite a line in place; the following lines must be moved forwards or backwards in order that the inplace operation takes place, but I don't know the mechanism
In all filesystems that I am familiar with, you can't do this in-place. You have to use an auxiliary file (which you can then rename to take the name of the original file).
To put code to NPE's answer, I think the most efficient way to do this is:
def insert(originalfile,string):
with open(originalfile,'r') as f:
with open('newfile.txt','w') as f2:
f2.write(string)
f2.write(f.read())
os.remove(originalfile)
os.rename('newfile.txt',originalfile)
Different Idea:
(1) You save the original file as a variable.
(2) You overwrite the original file with new information.
(3) You append the original file in the data below the new information.
Code:
with open(<filename>,'r') as contents:
save = contents.read()
with open(<filename>,'w') as contents:
contents.write(< New Information >)
with open(<filename>,'a') as contents:
contents.write(save)
The clear way to do this is as follows if you do not mind writing the file again
with open("a.txt", 'r+') as fp:
lines = fp.readlines() # lines is list of line, each element '...\n'
lines.insert(0, one_line) # you can use any index if you know the line index
fp.seek(0) # file pointer locates at the beginning to write the whole file again
fp.writelines(lines) # write whole lists again to the same file
Note that this is not in-place replacement. It's writing a file again.
In summary, you read a file and save it to a list and modify the list and write the list again to a new file with the same filename.
num = [1, 2, 3] #List containing Integers
with open("ex3.txt", 'r+') as file:
readcontent = file.read() # store the read value of exe.txt into
# readcontent
file.seek(0, 0) #Takes the cursor to top line
for i in num: # writing content of list One by One.
file.write(str(i) + "\n") #convert int to str since write() deals
# with str
file.write(readcontent) #after content of string are written, I return
#back content that were in the file
There's no way to do this with any built-in functions, because it would be terribly inefficient. You'd need to shift the existing contents of the file down each time you add a line at the front.
There's a Unix/Linux utility tail which can read from the end of a file. Perhaps you can find that useful in your application.
If the file is the too big to use as a list, and you simply want to reverse the file, you can initially write the file in reversed order and then read one line at the time from the file's end (and write it to another file) with file-read-backwards module
An improvement over the existing solution provided by #eyquem is as below:
def prepend_text(filename: Union[str, Path], text: str):
with fileinput.input(filename, inplace=True) as file:
for line in file:
if file.isfirstline():
print(text)
print(line, end="")
It is typed, neat, more readable, and uses some improvements python got in recent years like context managers :)
I tried a different approach:
I wrote first line into a header.csv file. body.csv was the second file. Used Windows type command to concatenate them one by one into final.csv.
import os
os.system('type c:\\\header.csv c:\\\body.csv > c:\\\final.csv')
with open("fruits.txt", "r+") as file:
file.write("bab111y")
file.seek(0)
content = file.read()
print(content)
I'm a beginner in python. I have a huge text file (hundreds of GB) and I want to convert the file into csv file. In my text file, I know the row delimiter is a string "<><><><><><><>". If a line contains that string, I want to replace it with ". Is there a way to do it without having to read the old file and rewriting a new file.
Normally I thought I need to do something like this:
fin = open("input", "r")
fout = open("outpout", "w")
line = f.readline
while line != "":
if line.contains("<><><><><><><>"):
fout.writeline("\"")
else:
fout.writeline(line)
line = f.readline
but copying hundreds of GB is wasteful. Also I don't know if open will eat lots of memory (does it treat file handler as a stream?)
Any help is greatly appreciated.
Note: an example of the file would be
file.txt
<><><><><><><>
abcdefeghsduai
asdjliwa
1231214 ""
<><><><><><><>
would be one row and one column in csv.
#richard-levasseur
I agree, sed seems like the right way to go. Here's a rough cut at what the OP describes:
sed -i -e's/<><><><><><><>/"/g' foo.txt
This will do the replacement in-place in the existing foo.txt. For that reason, I recommend having the original file under some sort of version control; any of the DVCS should fit the bill.
Yes, open() treats the file as a stream, as does readline(). It'll only read the next line. If you call read(), however, it'll read everything into memory.
Your example code looks ok at first glance. Almost every solution will require you to copy the file elsewhere. Its not exactly easy to modify the contents of a file inplace without a 1:1 replacement.
It may be faster to use some standard unix utilities (awk and sed most likely), but I lack the unix and bash-fu necessary to provide a full solution.
It's only wasteful if you don't have disk to spare. That is, fix it when it's a problem. Your solution looks ok as a first attempt.
It's not wasteful of memory because a file handler is a stream.
Reading lines is simply done using a file iterator:
for line in fin:
if line.contains("<><><><><><><>"):
fout.writeline("\"")
Also consider the CSV writer object to write CSV files, e.g:
import csv
writer = csv.writer(open("some.csv", "wb"))
writer.writerows(someiterable)
With python you will have to create a new file for safety sake, it will cause alot less headaches than trying to write in place.
The below listed reads your input 1 line at a time and buffers the columns (from what I understood of your test input file was 1 row) and then once the end of row delimiter is hit it will write that buffer to disk, flushing manually every 1000 lines of the original file. This will save some IO as well instead of writing every segment, 1000 writes of 32 bytes each will be faster than 4000 writes of 8 bytes.
fin = open(input_fn, "rb")
fout = open(output_fn, "wb")
row_delim = "<><><><><><><>"
write_buffer = []
for i, line in enumerate(fin):
if not i % 1000:
fout.flush()
if row_delim in line and i:
fout.write('"%s"\r\n'%'","'.join(write_buffer))
write_buffer = []
else:
write_buffer.append(line.strip())
Hope that helps.
EDIT: Forgot to mention, while using .readline() is not a bad thing don't use .readlines() which will go and read the entire content of the file into a list containing each line which is incredibly inefficient. Using the built in iterator that comes with a file object is the best memory usage and speed.
#Constatin suggests that if you would be satisfied with replacing '<><><><><><><>\n' by '" \n'
then the replacement string is the same length, and in that case you can craft a solution to in-place editing with mmap. You will need python 2.6. It's vital that the file is opened in the right mode!
import mmap, os
CHUNK = 2**20
oldStr = ''
newStr = '" '
strLen = len(oldStr)
assert strLen==len(newStr)
f = open("myfilename", "r+")
size = os.fstat(f.fileno()).st_size
for offset in range(0,size,CHUNK):
map = mmap.mmap(f.fileno(),
length=min(CHUNK+strLen,size-offset), # not beyond EOF
offset=offset)
index = 0 # start at beginning
while 1:
index = map.find(oldStr,index) # find next match
if index == -1: # no more matches in this map
break
map[index:index+strLen] = newStr
f.close()
This code is not debugged! It works for me on a 3 MB test case, but it may not work on a large ( > 2GB) file - the mmap module still seems a bit immature, so I wouldn't rely on it too much.
Looking at the bigger picture, from what you've posted it isn't clear that your file will end up as valid CSV. Also be aware that the tool you're planning to use to actually process the CSV may be flexible enough to deal with the file as it stands.
If you're delimiting fields with double quotes, it looks like you need to escape the double quotes you have occurring in your elements (for example 1231214 "" will need to be \n1231214 \"\").
Something like
fin = open("input", "r")
fout = open("output", "w")
for line in fin:
if line.contains("<><><><><><><>"):
fout.writeline("\"")
else:
fout.writeline(line.replace('"',r'\"')
fin.close()
fout.close()
[For the problem exactly as stated] There's no way that this can be done without copying the data, in python or any other language. If your processing always replaced substrings with new substrings of equal length, maybe you could do it in-place. But whenever you replace <><><><><><><> with " you are changing the position of all subsequent characters in the file. Copying from one place to another is the only way to handle this.
EDIT:
Note that the use of sed won't actually save any copying...sed doesn't really edit in-place either. From the GNU sed manual:
-i[SUFFIX]
--in-place[=SUFFIX]
This option specifies that files are to be edited in-place. GNU sed does this by creating a temporary file and sending output to this file rather than to the standard output.
(emphasis mine.)