Python: newline issue in text written with ZipFile.writestr

I need your help.
I have a line feed (LF) problem when I use 'writestr'.
The text is written into the zip file correctly,
but it comes out on one line, without line breaks.
The only delimiter I could find looks like a square with a circle in the middle of it, maybe the hex code for a newline.
If anyone knows about this problem, please help!
fp = StringIO(line)
value = fp.getvalue()
filename1 = 'D:/re/m/11.txt'
filename2 = 'D:/re/m/dd.zip'
archive = zipfile.ZipFile(filename2, 'w', zipfile.ZIP_DEFLATED)
finfo = zipfile.ZipInfo(filename1)
archive.writestr(finfo, value)

The ZipFile.writestr method writes a Python string to the archive in binary mode, without any newline translation. All text files added with this method must therefore have explicit '\r\n' line endings for Windows programs to read them correctly afterwards.
Your original content used Python's 'universal' line ending ('\n'), which usually only turns into CRLF ('\r\n') when going through a text-mode output file.
That seems to be fixed in Python 3.x.
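As a minimal sketch of the explicit conversion described above (the helper name to_crlf and the in-memory archive are assumptions made for illustration, not part of the original code), the text can be normalized to CRLF before it is handed to writestr:

```python
import io
import zipfile


def to_crlf(text):
    """Return text with every line ending converted to CRLF (hypothetical helper)."""
    # Normalize existing CRLF and lone CR to LF first, so nothing is doubled
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "\r\n")


# An in-memory archive keeps the example self-contained
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("11.txt", to_crlf("first line\nsecond line\n"))

# Read the member back as raw bytes to inspect the stored line endings
with zipfile.ZipFile(buffer) as archive:
    stored = archive.read("11.txt")
```

Normalizing to LF first means text that already contains CRLF is not turned into '\r\r\n'.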

Related

Adding text at the beginning of multiple txt files into a folder. Problem of overwriting the text inside

I'm trying to add the same text at the beginning of all the txt files that are in a folder.
With this code I can do it, but there is a problem: I don't know why it overwrites part of the text that is at the beginning of each txt file.
output_dir = "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
for f in glob.glob("*.txt"):
    with open(f, 'r', encoding="utf8") as inputfile:
        with open('%s/%s' % (output_dir, ntpath.basename(f)), 'w', encoding="utf8") as outputfile:
            for line in inputfile:
                outputfile.write(line.replace(line,"more_text"+line+"text_that_is_overwrited"))
            outputfile.seek(0,io.SEEK_SET)
            outputfile.write('text_that_overwrite')
            outputfile.seek(0, io.SEEK_END)
            outputfile.write("more_text")
The content of the txt files that I'm trying to edit starts with this:
here 4 spaces text_line_1
here 4 spaces text_line_2
The result is:
On file1.txt: text_that_overwriteited
On file1.txt: text_that_overwriterited
Your mental model of how writing a file works seems to be at odds with what's actually happening here.
If you seek back to the beginning of the file, you will start overwriting all of the file. There is no such thing as writing into the middle of a file. A file - at the level of abstraction where you have open and write calls - is just a stream; seeking back to the beginning of the stream (or generally, seeking to a specific position in the stream) and writing replaces everything which was at that place in the stream before.
Granted, there is a lower level where you could actually write new bytes into a block on the disk whilst that block still remains the storage for a file which can then be read as a stream. With most modern file systems, the only way to make this work is to replace that block with exactly the same amount of data, which is very rarely feasible. In other words, you can't replace a block containing 1024 bytes with data which isn't also exactly 1024 bytes. This is so marginally useful that it's simply not an operation which is exposed to the higher level of the file system.
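To see the overwriting behaviour in isolation, here is a small sketch (the temporary file and its contents are made up for illustration): seeking to the start and writing replaces the characters already there instead of pushing them to the right.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf8") as f:
    f.write("0123456789")
    f.seek(0)        # go back to the start of the stream
    f.write("abc")   # replaces "012"; nothing is inserted or shifted

with open(path, encoding="utf8") as f:
    content = f.read()

print(content)
```

The file keeps its original length: the three new characters take the place of the first three old ones.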
With that out of the way, the proper way to "replace lines" is to not write those lines at all. Instead, write the replacement, followed by whichever lines were in the original file.
It's not clear from your question what exactly you want overwritten, so this is just a sketch with some guesses around that part.
import glob
import os

output_dir = "output"
# prefer exist_ok=True over if not os.path.exists()
os.makedirs(output_dir, exist_ok=True)
for f in glob.glob("*.txt"):
    # use a single with statement
    # prefer os.path.basename over ntpath.basename; use os.path.join
    with open(f, 'r', encoding="utf8") as inputfile, \
            open(os.path.join(output_dir, os.path.basename(f)), 'w', encoding="utf8") as outputfile:
        for idx, line in enumerate(inputfile):
            if idx == 0:
                outputfile.write("more text")
                outputfile.write(line.rstrip('\n'))
                outputfile.write("text that is overwritten\n")
                continue
            # else:
            outputfile.write(line)
        outputfile.write("more_text\n")
Given an input file like
here is some text
here is some more text
this will create an output file like
more texthere is some texttext that is overwritten
here is some more text
more_text
where the first line is a modified version of the original first line, and a new line is appended after the original file's contents.
I found this elsewhere on StackOverflow. Why does my text file keep overwriting the data on it?
Essentially, the w mode is meant to overwrite text.
Also, you seem to be writing a sitemap manually. If you are using a web framework like Flask or Django, they have plugin or built-in support for auto-generated sitemaps — you should use that instead. Alternatively, you could create an XML template for the sitemap using Jinja or DTL. Templates are not just for HTML files.

Automate notepad++ editing csv file using script

So I have this code that generates a .csv file of data, however the formatting is off due to the escapechar (can't fix this). I need to make all the double spaces into single spaces. I can do this in notepad++ with replace all, so I've written a python script using a notepad++ plugin that does this. Now I'd like to automate opening the file and running the script; is this possible using a batch file? Is there a better way to do this?
Example of before and after format needed:
"_time","location"
"2018-04-03T08:32:45.565000-0400","(0 , 3)"
"2018-04-03T08:32:45.565000-0400","(2 , 5)"
"_time","location"
"2018-04-03T08:32:45.565000-0400","(0,3)"
"2018-04-03T08:32:45.565000-0400","(2,5)"
You can do it all with Python.
Just read the file and use the string replace method. Probably you will create a temporary file with the adjustments and then rename it. Something like:
with open(fname) as f:
    lines = f.readlines()
for line in lines:
    newline = line.replace("  ", " ")  # two spaces become one space
    # ... write newline to temp file, etc.
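A fuller sketch of the temporary-file approach might look like the following. The function name collapse_double_spaces is hypothetical, and it assumes each double space should become a single space exactly once per pass, as in the snippet above.

```python
import os
import tempfile


def collapse_double_spaces(fname):
    """Rewrite fname so that every pair of spaces becomes one space (hypothetical helper)."""
    dir_name = os.path.dirname(os.path.abspath(fname))
    # newline="" leaves the line endings of the CSV file untouched
    with open(fname, newline="") as src, tempfile.NamedTemporaryFile(
        "w", dir=dir_name, delete=False, newline=""
    ) as tmp:
        for line in src:
            tmp.write(line.replace("  ", " "))
    # Both files are closed here; swap the temporary file in for the original
    os.replace(tmp.name, fname)
```

Creating the temporary file in the same directory as the original keeps the final os.replace on one file system. A script like this can be started from a batch file with the Python interpreter, so the editor is not needed at all.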

Read() function erases text in file [duplicate]

Started Python a week ago and I have some questions to ask about reading and writing to the same files. I've gone through some tutorials online but I am still confused about it. I can understand simple read and write files.
openFile = open("filepath", "r")
readFile = openFile.read()
print readFile
openFile = open("filepath", "a")
appendFile = openFile.write("\nTest 123")
openFile.close()
But, if I try the following I get a bunch of unknown text in the text file I am writing to. Can anyone explain why I am getting such errors and why I cannot use the same openFile object the way shown below.
# I get an error when I use the codes below:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
readFile = openFile.read()
print readFile
openFile.close()
I will try to clarify my problem. In the example above, openFile is the object used to open the file. I have no problems if I write to it the first time. But if I then use the same openFile object to read the file or append something to it, either nothing happens or an error is raised. I have to create another open file object before I can perform a further read/write action on the same file.
#I have no problems if I do this:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
openFile2 = open("filepath", "r+")
readFile = openFile2.read()
print readFile
openFile.close()
I will be grateful if anyone can tell me what I did wrong here, or whether it is just a Python thing. I am using Python 2.7. Thanks!
Updated Response:
This seems like a bug specific to Windows - http://bugs.python.org/issue1521491.
Quoting from the workaround explained at http://mail.python.org/pipermail/python-bugs-list/2005-August/029886.html
the effect of mixing reads with writes on a file open for update is
entirely undefined unless a file-positioning operation occurs between
them (for example, a seek()). I can't guess what
you expect to happen, but seems most likely that what you
intend could be obtained reliably by inserting
fp.seek(fp.tell())
between read() and your write().
My original response demonstrates how reading/writing on the same file opened for appending works. It is apparently not true if you are using Windows.
Original Response:
In 'r+' mode, the write method writes the string to the file at the current pointer position. In your case, it writes the string "Test abc" at the start of the file, overwriting whatever was there. See an example below:
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\n'
>>> f.write("foooooooooooooo")
>>> f.close()
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\nfoooooooooooooo'
The string "foooooooooooooo" got appended at the end of the file since the pointer was already at the end of the file.
Are you on a system that differentiates between binary and text files? You might want to use 'rb+' as a mode in that case.
Append 'b' to the mode to open the file in binary mode, on systems
that differentiate between binary and text files; on systems that
don’t have this distinction, adding the 'b' has no effect.
http://docs.python.org/2/library/functions.html#open
Every open file has an implicit pointer which indicates where data will be read and written. Normally this defaults to the start of the file, but if you use a mode of a (append) then it defaults to the end of the file. It's also worth noting that the w mode will truncate your file (i.e. delete all the contents) even if you add + to the mode.
Whenever you read or write N characters, the read/write pointer will move forward that amount within the file. I find it helps to think of this like an old cassette tape, if you remember those. So, if you executed the following code:
fd = open("testfile.txt", "w+")
fd.write("This is a test file.\n")
fd.close()
fd = open("testfile.txt", "r+")
print fd.read(4)
fd.write(" IS")
fd.close()
... It should end up printing This and then leaving the file content as This IS a test file.. This is because the initial read(4) returns the first 4 characters of the file, because the pointer is at the start of the file. It leaves the pointer at the space character just after This, so the following write(" IS") overwrites the next three characters with a space (the same as is already there) followed by IS, replacing the existing is.
You can use the seek() method of the file to jump to a specific point. After the example above, if you executed the following:
fd = open("testfile.txt", "r+")
fd.seek(10)
fd.write("TEST")
fd.close()
... Then you'll find that the file now contains This IS a TEST file..
All this applies on Unix systems, and you can test those examples to make sure. However, I've had problems mixing read() and write() on Windows systems. For example, when I execute that first example on my Windows machine then it correctly prints This, but when I check the file afterwards the write() has been completely ignored. However, the second example (using seek()) seems to work fine on Windows.
In summary, if you want to read/write from the middle of a file in Windows I'd suggest always using an explicit seek() instead of relying on the position of the read/write pointer. If you're doing only reads or only writes then it's pretty safe.
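The two cassette-tape examples can be combined into one sketch with an explicit seek() between the read and the write. This version is written for Python 3 and uses binary mode so that the byte offsets are exact; the temporary path is an assumption for illustration.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")

with open(path, "wb") as fd:
    fd.write(b"This is a test file.\n")

with open(path, "rb+") as fd:
    first = fd.read(4)    # pointer now sits just after b"This"
    fd.seek(fd.tell())    # explicit positioning between the read and the write
    fd.write(b" IS")      # overwrites b" is"
    fd.seek(10)           # jump to the start of b"test"
    fd.write(b"TEST")     # overwrites b"test"

with open(path, "rb") as fd:
    result = fd.read()
```

The fd.seek(fd.tell()) call does not move the pointer; it only makes the switch from reading to writing explicit, which is the workaround quoted earlier for the Windows behaviour.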
One final point - if you're specifying paths on Windows as literal strings, remember to escape your backslashes:
fd = open("C:\\Users\\johndoe\\Desktop\\testfile.txt", "r+")
Or you can use raw strings by putting an r at the start:
fd = open(r"C:\Users\johndoe\Desktop\testfile.txt", "r+")
Or the most portable option is to use os.path.join():
fd = open(os.path.join("C:\\", "Users", "johndoe", "Desktop", "testfile.txt"), "r+")
You can find more information about file IO in the official Python docs.
Reading and writing happen at the current file pointer, which advances with each read/write.
In your particular case, writing to openFile moves the file pointer past the text just written, so a following read() does not start at the beginning of the file (and returns nothing if the pointer is already at the end of the file).
You need to reset the file pointer to the beginning of the file with seek(0) before reading from it.
You can read, modify and save to the same file in Python, but you actually have to replace the whole content of the file, and before updating the file content you have to call:
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
I needed a function to go through all subdirectories of folder and edit content of the files based on some criteria, if it helps:
new_file_content = ""
for directories, subdirectories, files in os.walk(folder_path):
    for file_name in files:
        file_path = os.path.join(directories, file_name)
        # open file for reading and writing
        with io.open(file_path, "r+", encoding="utf-8") as edit_file:
            for current_line in edit_file:
                if condition in current_line:
                    # update current line
                    current_line = current_line.replace('john', 'jack')
                new_file_content += current_line
            # set the pointer to the beginning of the file in order to rewrite the content
            edit_file.seek(0)
            # delete actual file content
            edit_file.truncate()
            # rewrite updated file content
            edit_file.write(new_file_content)
            # empties new content in order to set for next iteration
            new_file_content = ""
            edit_file.close()
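For a single file, the same read, seek(0), truncate, write sequence can be sketched as a small function. The name replace_in_file and its parameters are hypothetical, and it assumes the file is small enough to hold in memory.

```python
import io


def replace_in_file(file_path, old, new):
    """Replace every occurrence of old with new inside one file (hypothetical helper)."""
    with io.open(file_path, "r+", encoding="utf-8") as edit_file:
        content = edit_file.read()
        # go back to the beginning of the file before rewriting it
        edit_file.seek(0)
        # drop the old content so that no stale tail is left behind
        edit_file.truncate()
        edit_file.write(content.replace(old, new))
```

The truncate() call matters when the new content is shorter than the old one; without it, the end of the old content would remain after the rewritten text.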

In place replacement of text in a file in Python

I am using the following code to upload a file on server using FTP after editing it:
import fileinput
file = open('example.php','rb+')
for line in fileinput.input('example.php'):
    if 'Original' in line :
        file.write( line.replace('Original', 'Replacement'))
file.close()
There is one thing, instead of replacing the text in its original place, the code adds the replaced text at the end and the text in original place is unchanged.
Also, instead of just the replaced text, it prints out the whole line. Could anyone please tell me how to resolve these two errors?
1) The code adds the replaced text at the end and the text in original place is unchanged.
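You can't replace text in the body of the file this way. The + in the mode only opens the file for both reading and writing; each write() lands wherever the file position currently is, not at the line you matched.
file = open('example.php','rb+')
Writing through this handle therefore only works if the position is already where the new text belongs, for example at the end of the document when you want to append.
To get around this, you can use seek() to move to the position of the specific line and overwrite it there, or use 2 files: an input_file and an output_file.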
2) Also, instead of just the replaced text, it prints out the whole line.
It's because you're using:
file.write( line.replace('Original', 'Replacement'))
Free Code:
I've segregated into 2 files, an inputfile and an outputfile.
First it'll open the ifile and save all lines in a list called lines.
Second, it'll read all these lines, and if 'Original' is present, it'll replace it.
After replacement, it'll save into ofile.
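ifile = 'example.php'
ofile = 'example_edited.php'
with open(ifile, 'rb') as f:
    lines = f.readlines()
with open(ofile, 'wb') as g:
    for line in lines:
        # the file is read as bytes, so the patterns are bytes literals;
        # replace() returns lines without b'Original' unchanged, so every
        # line is written and none are dropped
        g.write(line.replace(b'Original', b'Replacement'))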
Then, if you want to, you can delete the non-edited file with os.remove(ifile).
More Info: Tutorials Point: Python Files I/O
The second error is how the replace() method works.
It returns the entire input string, with only the specified substring replaced. See example here.
To write to a specific place in the file, you should seek() to the right position first.
I think this issue has been asked before in several places, I would do a quick search of StackOverflow.
Maybe this would help?
Replacing stuff in a file only works well if original and replacement have the same size (in bytes). Then you can do:
with open('example.php','rb+') as f:
    pos=f.tell()
    line=f.readline()
    if b'Original' in line:
        f.seek(pos)
        f.write(line.replace(b'Original',b'Replacement'))
(In this case b'Original' and b'Replacement' do not have the same size so your file will look funny after this)
Edit:
If original and replacement are not the same size, there are different possibilities like adding bytes to fill the hole or moving everything after the line.
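A sketch of the same-size case extended to every line of the file could look like this. The function name replace_same_size is hypothetical, and it deliberately refuses patterns of different lengths rather than attempting to pad or shift data.

```python
def replace_same_size(path, old, new):
    """Overwrite old with new in place; both must be bytes of equal length (hypothetical helper)."""
    if len(old) != len(new):
        raise ValueError("old and new must have the same length in bytes")
    with open(path, "rb+") as f:
        while True:
            pos = f.tell()          # remember where this line starts
            line = f.readline()
            if not line:            # empty bytes object means end of file
                break
            if old in line:
                f.seek(pos)         # go back to the start of the line
                f.write(line.replace(old, new))
                # reposition explicitly before the next read
                f.seek(pos + len(line))
```

Because the line keeps its length, the rest of the file does not move, which is exactly why this only works for equal-sized replacements.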

Why can I not open pdf files that have been copied with this code

I need to do some manipulation of a number of pdf files. As a first step I wanted to copy them from a single directory into a tree that supports my needs. I used the following code
for doc in docList:
    # these steps just create the directory structure I need from the file name
    fileName = doc.split('\\')[-1]
    ID = fileName.split('_')[0]
    basedate = fileName.split('.')[0].split('_')[-1].strip()
    rdate = '\\R' + basedate + '-' +'C' + basedate
    newID = str(cikDict[ID])
    newpath = basePath + newID + rdate
    # check existence of the new path
    if not os.path.isdir(newpath):
        os.makedirs(newpath)
    # reads the file in and then writes it to the new directory
    fstring = open(doc).read()
    outref = open(newpath +'\\' + fileName, 'wb')
    outref.write(fstring)
    outref.close()
When I run this code, the directories are created and there are files with the correct name in each directory. However, when I click to open a file, I get an error from Acrobat informing me that the file was damaged and could not be repaired.
I was able to copy the files using
shutil.copy(doc,newpath)
To replace the last four lines - but I have not been able to figure out why I can't read the file as a string and then write it in a new location.
One thing I did was compare what was read from the source to what the file content was after a read after it had been written:
>>> newstring = open(newpath + '\\' +fileName).read()
>>> newstring == fstring
True
So it does not appear the content was changed?
I have not been able to figure out why I can't read the file as a string and then write it in a new location.
Please be aware that PDF is a binary file format, not a text file format. Methods treating files (or data in general) as text may change it in different ways, especially:
Reading data as text interprets bytes and byte sequences as characters according to some character encoding. Writing text back as data again transforms it according to some character encoding, too.
If the applied encodings differ, the result obviously differs from the original file. But even if the same encoding was used, differences can creep in: If the original file contains bytes which have no meaning in the applied encoding, some replacement character is used instead and the final result file contains the encoding of that replacement character, not the original byte sequence. Furthermore some encodings have multiple possible encodings for the same character. Thus, some input byte sequence may be replaced by some other sequence representing the same character in the output.
End-of-line sequences may be changed according to the preferences of the platform.
Binary files may contain different byte sequences used as end-of-line marker on one or the other platform, e.g. CR, LF, CRLF, ... Methods treating the data as text may replace all of them by the one sequence favored on the local platform. But as these bytes in binary files may have a different meaning than end-of-line, this replacement may be destructive.
Control characters in general may be ignored
In many encodings the bytes 0..31 have meanings as control characters. Methods treating binary data as text may interpret them somehow which may result in a changed output again.
All these changes can utterly destroy binary data, e.g. compressed streams inside PDFs.
You could try using binary mode for reading files by also opening them with a b in the mode string. Using binary mode both while reading and writing may solve your issue.
One thing I did was compare what was read from the source to what the file content was after a read after it had been written:
>>> newstring = open(newpath + '\\' +fileName).read()
>>> newstring == fstring
True
So it does not appear the content was changed?
Your comparison also reads the files as text. Thus, you do not compare the actual byte contents of the original and the copied file but their interpretations according to the encoding assumed while reading them. So damage has already been done on both sides of your comparison.
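A byte-level comparison avoids that trap. As a minimal sketch (the function name files_identical is an assumption for illustration), the standard library's filecmp module can compare the actual contents of two files:

```python
import filecmp


def files_identical(path_a, path_b):
    """Return True if both files contain exactly the same bytes (hypothetical helper)."""
    # shallow=False forces a comparison of the contents,
    # not only of the os.stat() signatures (file type, size, modification time)
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Opening both files with 'rb' and comparing the results of read() would work as well for files that fit in memory.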
You should use shutil to copy files. It is platform aware and you avoid problems like this.
But you already discovered that.
You would be better served using with to open and close files. Then the files are opened and closed automatically. It is more idiomatic:
with open(doc, 'rb') as fin, open(fn_out, 'wb') as fout:
    fout.write(fin.read()) # the ENTIRE file is read with .read()
If potentially you are dealing with a large file, read and write in chunks:
with open(doc, 'rb') as fin, open(fn_out, 'wb') as fout:
    while True:
        chunk=fin.read(1024)
        if chunk:
            fout.write(chunk)
        else:
            break
Note the 'rb' and 'wb' arguments to open. Since you are clearly opening this file under Windows, binary mode prevents the text-mode translation (such as line-ending conversion) that would otherwise alter the bytes of the file.
You should also use os.path.join rather than newpath + '\\' +fileName type operation.
