Writing filenames with special characters in Python

I need to save file names with certain special characters, this is an example: IMAGE - Topčider.jpeg. I manage to save the file correctly with this code:
image_req = requests.get(image)
with open(title_for_file, "wb") as f:
    f.write(image_req.content)
However, when I open the file with the pyexiv2 module it raises this error:
IMAGE - Topčider.jpeg: Failed to open the data source: No such file or directory (errno = 2)
If I look in the directory, the file is there only as IMAGE - .jpeg, so my question is: how can I resolve this error? I think the issue comes from writing the file name rather than from opening the file; it is just "noticed" there. Giving the file a different name is not an option.
EDIT
I have tried to create the variable title_for_file as a unicode string, i.e. u"{}".format(title), but this did not work.

Thanks to Kelly Bundy in the comments, it turns out that the file can't actually be given this name.
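One way to check which name actually ends up on disk is to write the file and then list the directory. The following is only a minimal sketch: the helper name write_and_list and the sample bytes are made up for illustration, and the result depends on the filesystem and locale in use.

```python
import os
import tempfile
import unicodedata

def write_and_list(directory, name, data=b"example bytes"):
    """Write data to directory/name and return the names found on disk."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    # Normalize to NFC: some filesystems store decomposed characters,
    # so comparing the raw strings could fail even when the name is intact.
    return [unicodedata.normalize("NFC", n) for n in os.listdir(directory)]

with tempfile.TemporaryDirectory() as tmp:
    wanted = "IMAGE - Top\u010dider.jpeg"  # same name as in the question
    on_disk = write_and_list(tmp, wanted)
    print(on_disk)
    print(unicodedata.normalize("NFC", wanted) in on_disk)
```

If the listed name differs from the requested one, the problem is on the writing side (filesystem or encoding) rather than in the library that later opens the file.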

Related

How to create a file in a different directory? (python)

I am currently stuck with the following error:
IOError: [Errno 2] No such file or directory: '/home/pi/V1.9/Storage/Logs/Date_2018-08-02_12:51.txt'
My code to open the file is the following:
nameDir = os.path.join('/home/pi/V1.9/Storage/Logs', "Date_" + now.strftime("%Y-%m-%d_%H:%M") + ".txt")
f = open(nameDir, 'a')
I am trying to save a file to a certain path, which is /home/pi/V1.9/Storage/Logs. I am not sure why it can't find it, since I have already created the folder Logs in that location. The only thing being created is the text file. I am not sure if it's supposed to join up like that, but I generally tried to follow the steps in this thread: Telling Python to save a .txt file to a certain directory on Windows and Mac
The problem seems to be here:
f = open(nameDir, 'a')
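'a' stands for append. Note that append mode also creates the file if it does not exist, so the mode alone does not normally cause this error; Errno 2 here more likely means that one of the directories in the path is missing. Using 'w' (write) instead also creates the file, but it truncates any existing content.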
If you are creating the file, use write mode 'w' or use 'a+':
f = open(nameDir, 'w')
f = open(nameDir, 'a+')
Use plain 'a' (append) when the file already exists and you want to keep its contents.
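As a minimal sketch of how the modes behave when a directory in the path is missing: the helper open_for_append is hypothetical, and the file name below uses "-" instead of ":" in the time, as an assumption, because some filesystems reject colons.

```python
import os
import tempfile

def open_for_append(path):
    """Open path in append mode, creating missing parent directories first."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return open(path, "a")

with tempfile.TemporaryDirectory() as base:
    log_path = os.path.join(base, "Storage", "Logs", "Date_2018-08-02_12-51.txt")
    try:
        open(log_path, "a")  # the parent directories do not exist yet
    except FileNotFoundError as exc:
        print("plain open failed with errno", exc.errno)
    with open_for_append(log_path) as f:
        f.write("first entry\n")
    print(os.path.exists(log_path))
```

Here the plain open call fails even in append mode because the directories are missing, while the same call succeeds once they have been created.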
Not really an answer to your question, but similar error. I had:
with open("/Users//jacobivanov/Desktop/NITL/Data Analysis/Completed Regressions/{0} Temperature Regression Parameters.txt".format(data_filename), mode = 'w+') as output:
Because data_filename was in reality a full file path, it was inserted into the string as-is and Python looked for a non-existent directory. If you are getting this error and use the path of an external file in the name of the generated file, check that this isn't happening.
Might help someone.
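A possible way to avoid that pitfall is to keep only the last component of the input path when building the output name. This is a sketch only: output_name and the example paths are hypothetical, and stripping the extension is an assumption about what the output name should look like.

```python
import os

def output_name(data_filename, out_dir):
    """Build an output path from only the final component of data_filename."""
    # basename drops the directories; splitext drops the extension
    stem = os.path.splitext(os.path.basename(data_filename))[0]
    return os.path.join(out_dir, "{0} Temperature Regression Parameters.txt".format(stem))

print(output_name("/some/other/place/run1.csv", "Completed Regressions"))
```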

EOF marker not found when using PyPDF2 to merge PDF files in Python

When I use the following code
from PyPDF2 import PdfFileMerger
merge = PdfFileMerger()
for newFile in nlst:
    merge.append(newFile)
merge.write("newFile.pdf")
The following error occurred:
raise utils.PdfReadError("EOF marker not found")
PyPDF2.utils.PdfReadError: EOF marker not found
Could anybody tell me what happened?
After encountering this problem using camelot and PyPDF2, I did some digging and have solved the problem.
The end of file marker '%%EOF' is meant to be the very last line, but some PDF files put a huge chunk of javascript after this line, and the reader cannot find the EOF.
Illustration of what the EOF plus javascript looks like if you open it:
b'>>\r\n',
b'startxref\r\n',
b'275824\r\n',
b'%%EOF\r\n',
b'\n',
b'\n',
b'<script type="text/javascript">\n',
b'\twindow.parent.focus();\n',
b'</script><!DOCTYPE html>\n',
b'\n',
b'\n',
b'\n',
So you just need to truncate the file before the javascript begins.
Solution:
import PyPDF2

def reset_eof_of_pdf_return_stream(pdf_stream_in: list):
    # find the line position of the last EOF marker, searching from the end
    actual_line = len(pdf_stream_in)  # keep the whole file if no marker is found
    for i, x in enumerate(pdf_stream_in[::-1]):
        if b'%%EOF' in x:
            actual_line = len(pdf_stream_in) - i
            print(f'EOF found at line position {-i} = actual {actual_line}, with value {x}')
            break
    # return the list up to and including that line
    return pdf_stream_in[:actual_line]

# open the file for reading
with open('data/XXX.pdf', 'rb') as p:
    txt = p.readlines()

# get the new list terminating correctly
txtx = reset_eof_of_pdf_return_stream(txt)

# write to a new pdf
with open('data/XXX_fixed.pdf', 'wb') as f:
    f.writelines(txtx)

fixed_pdf = PyPDF2.PdfFileReader('data/XXX_fixed.pdf')
PDF is a file format in which a parser normally starts by reading some global information located at the end of the file. At the very end of the document there needs to be a line with the content
%%EOF
This marker tells the PDF parser that the document ends here and that the global information it needs (a startxref section) should appear before it.
I guess the error message you see means that one of the input documents was truncated and is missing this %%EOF marker.
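To find out which input file is affected before merging, a rough check is to look for the marker in the last bytes of each file. This is a sketch under assumptions: has_eof_marker is a hypothetical helper, the 1024-byte window is an arbitrary choice, and the two sample files are synthetic stand-ins rather than real PDFs.

```python
import os
import tempfile

def has_eof_marker(path, tail_bytes=1024):
    """Return True if b'%%EOF' appears within the last tail_bytes of the file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))
        return b"%%EOF" in f.read()

with tempfile.TemporaryDirectory() as tmp:
    complete = os.path.join(tmp, "complete.pdf")
    truncated = os.path.join(tmp, "truncated.pdf")
    with open(complete, "wb") as f:
        f.write(b"%PDF-1.4\nstartxref\n0\n%%EOF\n")
    with open(truncated, "wb") as f:
        f.write(b"%PDF-1.4\nstartxref\n")
    for path in (complete, truncated):
        print(os.path.basename(path), has_eof_marker(path))
```

Running the same check over every entry of nlst from the question would show which file to inspect or skip.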
One simple solution for this problem (EOF marker not found): open your .pdf file in another application (I used LibreOffice Draw on Ubuntu 18.04), then export the file as .pdf. With the exported .pdf file, the problem does not persist.
PyPDF2 cannot find the EOF marker in a PDF that is encrypted.
I came across the same error while I was working through the (excellent) Automate The Boring Stuff. Chapter 15, 2nd edition, page 355, project Combining Select Pages from Many PDFs.
I chose to combine all the PDFs I had made during this chapter into one document and one of them was an encrypted PDF and the project failed when it got to the end of the encrypted document with the error message:
PyPDF2.utils.PdfReadError: EOF marker not found
I moved the encrypted file to a different folder (so it would not be merged with the other PDFs) and the project worked fine.
So, it seems PyPDF2 cannot find the EOF marker in a PDF that is encrypted.
I've also got that problem and got a solution.
First, Python reads and writes PDFs in binary mode ('rb' or 'wb').
END OF FILE
This occurs when there was an open parenthesis somewhere on a line but no matching closing parenthesis: Python reached the end of the file while looking for the closing parenthesis.
Here is one solution:
Close the file that you opened earlier using this command:
newfile.close()
Check whether that PDF is also open under another variable and close it as well:
Same_file_with_another_variable.close()
Now open it only once and use it, and you are good to go.
I wanted to add my hacky solution to this issue.
I had the same error with python requests (application/pdf).
In my case the provider (a shipping labeling service) did return a 200 and a bytes string (b'...') representing the PDF, but in some random cases the EOF marker was missing.
Because it was random, I came up with the following solution:
for obj in label_objects:
    get_label = api.get_label(label_id=obj.label_id)
    while 'EOF' not in str(get_label.content):
        get_label = api.get_label(label_id=obj.label_id)
After a few tries it returns the bytes string with EOF and we're good to proceed.
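A retry loop like this can spin forever if the marker never arrives, so a bounded variant may be safer. The sketch below is hypothetical throughout: fetch stands in for whatever call returns the PDF bytes, and the limit of 5 tries and the 1024-byte window are arbitrary.

```python
def fetch_until_complete(fetch, max_tries=5):
    """Call fetch() until its bytes result contains b'%%EOF' near the end."""
    for _ in range(max_tries):
        content = fetch()
        if b"%%EOF" in content[-1024:]:
            return content
    raise RuntimeError(f"no complete PDF after {max_tries} tries")

# Simulated provider: the first response is truncated, the second is complete.
responses = iter([b"%PDF-1.4 truncated", b"%PDF-1.4\n%%EOF\n"])
pdf_bytes = fetch_until_complete(lambda: next(responses))
print(len(pdf_bytes))
```

Checking the raw bytes avoids converting the whole response to a string on every attempt.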
I had the same problem.
For me the solution was to close the previously opened file before working with it again.

How does one read a .dif file with Python

I am working on a project that requires me to read a file with a .dif extension. DIF stands for Data Interchange Format. The file opens nicely in OpenOffice Calc, and from there you can easily save it as a CSV file. However, when I open it in Python, all I get are random characters that don't make sense. Here is the last code that I tried, just to see if I could read it.
txt = open(r'C:\myfile.dif', 'rb').read()
print txt
I would even be open to programmatically converting the file to CSV before opening it, if someone knows how to do that. As always, any help is much appreciated. Below is a partial screenshot of what I get when I run the code.
Hadn't heard of this file format. Went and got a sample here.
I tested your method and it works fine:
>>> content = open(r"E:\sample.dif", 'rb').read()
>>> print (content)
b'TABLE\r\n0,1\r\n"EXCEL"\r\nVECTORS\r\n0,8\r\n""\r\nTUPLES\r\n0,3\r\n""\r\nDATA\r\n0,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"Welcome to File Extension FYI Center!"\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n""\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"ID"\r\n1,0\r\n"Type"\r\n1,0\r\n"Description"\r\n-1,0\r\nBOT\r\n0,1\r\nV\r\n1,0\r\n"ASP"\r\n1,0\r\n"Active Server Pages"\r\n-1,0\r\nBOT\r\n0,2\r\nV\r\n1,0\r\n"JSP"\r\n1,0\r\n"JavaServer Pages"\r\n-1,0\r\nBOT\r\n0,3\r\nV\r\n1,0\r\n"PNG"\r\n1,0\r\n"Portable Network Graphics"\r\n-1,0\r\nBOT\r\n0,4\r\nV\r\n1,0\r\n"GIF"\r\n1,0\r\n"Graphics Interchange Format"\r\n-1,0\r\nBOT\r\n0,5\r\nV\r\n1,0\r\n"WMV"\r\n1,0\r\n"Windows Media Video"\r\n-1,0\r\nEOD\r\n'
>>>
The question is what is in the file and how do you want to handle it. Personally I liked:
with open(r"E:\sample.dif", 'rb') as f:
    for line in f:
        print (line)
In the first code block, that long line that has a b'' (for bytes!) in front of it can be iterated on \r\n:
b'TABLE\r\n'
b'0,1\r\n'
b'"EXCEL"\r\n'
b'VECTORS\r\n'
b'0,8\r\n'
b'""\r\n'
b'TUPLES\r\n'
b'0,3\r\n'
b'""\r\n'
b'DATA\r\n'
b'0,0\r\n'
.
.
.
b'"Windows Media Video"\r\n'
b'-1,0\r\n'
b'EOD\r\n'
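Going one step further toward the CSV conversion the question asks about, the line pairs in the data section can be grouped into rows. This is a minimal sketch that assumes the layout seen in the sample above (a DATA header item followed by "type,number" and value line pairs, with BOT starting a row and EOD ending the data); parse_dif and dif_to_csv are hypothetical helpers, and quotes doubled inside strings are not handled.

```python
import csv
import io

def parse_dif(text):
    """Return the data rows of a DIF document as lists of strings."""
    lines = text.splitlines()
    # The data section starts after the three-line DATA header item.
    start = lines.index("DATA") + 3
    rows = []
    # Each value occupies two lines: "type,number" and a string or keyword.
    for i in range(start, len(lines) - 1, 2):
        kind, _, number = lines[i].partition(",")
        value = lines[i + 1]
        if kind == "-1":
            if value == "EOD":      # end of data
                break
            if value == "BOT":      # beginning of a new row
                rows.append([])
        elif kind == "0":           # numeric value; "V" marks it as valid
            rows[-1].append(number if value == "V" else value)
        elif kind == "1":           # string value wrapped in double quotes
            rows[-1].append(value[1:-1] if value.startswith('"') else value)
    return rows

def dif_to_csv(text):
    """Convert DIF text to CSV text using the csv module."""
    out = io.StringIO()
    csv.writer(out).writerows(parse_dif(text))
    return out.getvalue()

sample = ('TABLE\r\n0,1\r\n"EXCEL"\r\nDATA\r\n0,0\r\n""\r\n'
          '-1,0\r\nBOT\r\n1,0\r\n"ID"\r\n1,0\r\n"Type"\r\n'
          '-1,0\r\nBOT\r\n0,1\r\nV\r\n1,0\r\n"ASP"\r\n-1,0\r\nEOD\r\n')
print(dif_to_csv(sample))
```

For a real file, the bytes read with open(..., 'rb') would first need to be decoded to text, for example with .decode("ascii") if the file contains only ASCII.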

python 2.7: Testing text file in idle

I'm using the command testFile = open("test.txt") to open a simple text file and received the following error. Do such errors occur due to the version of Python one uses?
IOError: [Errno 2] No such file or directory: 'Test.txt'
Add an "a+" mode argument to your open call; this will create the file if it doesn't exist:
testFile = open("test.txt", "a+")
The syntax for opening a file is:
file object = open(file_name [, access_mode][, buffering])
As you have not specified access_mode (which is optional), the default is read mode. If the file 'test.txt' does not exist in the folder where you are executing the script, it will throw an error like the one you got.
To correct it, either set access_mode to "a+" or give the full file path, e.g. C:\test.txt (assuming a Windows system).
The error is independent of the version, but it is not clear what you want to do with your file.
If you want to read from it and you get such an error, it means that your file is not where you think it is. In any case you should write a line like testFile = open("test.txt","r").
If you want to create a new file and write to it, you will have a line like testFile = open("test.txt","w"). Finally, if your file already exists and you want to add things to it, use testFile = open("test.txt","a") (after having moved the file to the correct place). If your file is not in the directory of the script, you will need to locate it and open it by its path.
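The behaviour of the three modes can be tried out safely in a temporary directory. This is only an illustrative sketch: demo_modes is a hypothetical helper and the file contents are made up.

```python
import os
import tempfile

def demo_modes(directory):
    """Exercise 'r', 'w' and 'a' on a file in directory; return its final text."""
    path = os.path.join(directory, "test.txt")
    try:
        open(path, "r").close()       # read mode requires an existing file
    except FileNotFoundError:
        print("'r' failed: no such file yet")
    with open(path, "w") as f:        # creates the file, or truncates an existing one
        f.write("one\n")
    with open(path, "a") as f:        # appends; would also create a missing file
        f.write("two\n")
    with open(path, "r") as f:
        return f.read()

# A relative name such as "test.txt" is looked up in this directory:
print("relative names resolve against:", os.getcwd())
with tempfile.TemporaryDirectory() as tmp:
    print(demo_modes(tmp))
```

Printing os.getcwd() is often the quickest way to see why a relative file name is not found, since IDLE and the script may start in different directories.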

python xlrd errors related to file extension changes

I am trying to organize a very large number of .DTA files using the xlrd library.
The first thing I found out was that .DTA files could be opened as Excel files just by changing the extension to .xls and opening them in Excel. Excel gives a warning about a possibly corrupted file when you open it, but opens the file normally otherwise:
The file you are trying to open is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source before opening the file. Do you want to open the file now?
In Python, however, when I try to open the file, all I get is an error with no helpful information, which I'm pretty sure is caused by the file extension issue.
  File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1323, in getbof
    raise XLRDError('Expected BOF record; found 0x%04x' % opcode)
XLRDError: Expected BOF record; found 0x5845
I tried my code by cutting and pasting the data into a new excel file and naming it the same thing and it worked, so I'm pretty sure this is the issue, but I have too many files to be able to do this for each one individually.
Is there a better way to solve this? Suppressing the error or actually changing the file type and not just its extension somehow?
I think there is a byte order mark (BOM) at the beginning of the file that is not visible but exists. This answer describes how to remove it: "converting utf-16 -> utf-8 AND remove BOM".
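The number in the error message may itself hint at what the file contains. Assuming xlrd reports the first two bytes of the file as a little-endian 16-bit value (an assumption about its internals), those bytes can be turned back into text, and the start of the file can be compared with the OLE2 signature that legacy .xls files written by modern Excel begin with. The helper names and the sample bytes below are hypothetical.

```python
import struct

# Signature at the start of OLE2 compound files, the usual container for .xls
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

def opcode_as_text(opcode):
    """Show the two bytes behind a 16-bit little-endian opcode as ASCII."""
    return struct.pack("<H", opcode).decode("ascii", errors="replace")

def looks_like_xls(first_bytes):
    """Return True if the data starts with the OLE2 compound-file signature."""
    return first_bytes.startswith(OLE2_SIGNATURE)

print(opcode_as_text(0x5845))
print(looks_like_xls(b"EXPLAIN\r\n"))  # made-up plain-text header
```

If the first bytes turn out to be readable text, the file is probably a plain-text table, and reading it as delimited text may work better than renaming it to .xls.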
