Python zipfile: file name with new line characters

Somebody somehow managed to put a line break (\r\n) into the name of a file inside a zip archive, and that makes ZipFile fail when it tries to extract the archive:
2019-07-23 14:05:12,285 - __main__ - ERROR - Error desconocido: [Errno 22] Invalid argument: 'descargados\\03_26298_19\\ANEXO\r\n.pdf'. Saliendo.
Traceback (most recent call last):
  File "motor.py", line 51, in main
    procesar_descarga(zip_object, ruta_temp, ruta_final)
  File "C:\Users\david\pycharmProjects\descargueitor2\volcado.py", line 90, in procesar_descarga
    zip_object.extractall(str(ruta_temp))
  File "C:\Users\david\Anaconda3\lib\zipfile.py", line 1616, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "C:\Users\david\Anaconda3\lib\zipfile.py", line 1670, in _extract_member
    open(targetpath, "wb") as target:
OSError: [Errno 22] Invalid argument: 'descargados\\03_26298_19\\ANEXO\r\n.pdf'
I tried the same file with several programs:
The built-in compressed files reader in Windows explorer just ignores the file: it is not listed nor extracted.
WinZip lists the file, but throws an error when opening or extracting the file.
7-Zip can read and extract the file: it simply replaces the bad characters with underscores.
Is there any way to deal with this in Python? It looks like files in a zip cannot be renamed using the library.
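One possible workaround, sketched here rather than taken from the thread: skip extractall and copy each member out by hand under a cleaned-up name, much as 7-Zip does. The helper names sanitize_member_name and extract_sanitized are made up for this example, and the set of replaced characters (ASCII control characters plus the ones Windows rejects) is an assumption you may want to adjust.

```python
import os
import re
import shutil
import zipfile

# Assumption: replace ASCII control characters (including \r and \n)
# and the characters Windows does not allow in a file name.
_BAD_CHARS = re.compile(r'[\x00-\x1f<>:"|?*]')


def sanitize_member_name(name):
    """Return the member name with bad characters replaced by underscores.

    Empty, "." and ".." components are dropped so the result always
    stays inside the destination directory.
    """
    parts = [p for p in name.replace("\\", "/").split("/")
             if p not in ("", ".", "..")]
    return "/".join(_BAD_CHARS.sub("_", p) for p in parts)


def extract_sanitized(zf, dest_dir):
    """Extract every member of an open ZipFile under a sanitized name.

    Note: two members whose names differ only in bad characters would
    end up with the same sanitized name, and the later one would win.
    """
    extracted = []
    for info in zf.infolist():
        clean = sanitize_member_name(info.filename)
        if not clean:
            continue
        target = os.path.join(dest_dir, *clean.split("/"))
        if info.is_dir():
            os.makedirs(target, exist_ok=True)
            continue
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        # Read the member by its ZipInfo, write it under the clean name.
        with zf.open(info) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        extracted.append(target)
    return extracted
```

Calling extract_sanitized(zip_object, str(ruta_temp)) in place of zip_object.extractall(str(ruta_temp)) would then write the offending member as ANEXO__.pdf.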

Related

Python OSError: [Errno 9] Bad file descriptor after opening big json file

I just tried to read in a big json file (the Wikipedia json dump) in Python line by line and got the Error:
Traceback (most recent call last):
  File "C:/.../test_json_wiki_file.py", line 19, in <module>
    test_fct()
  File "C:/.../test_json_wiki_file.py", line 12, in test_fct
    for line in f:
OSError: [Errno 9] Bad file descriptor
Here is my code:
import json
def test_fct():
    data = []
    i = 0
    with open('E:/.../20200713.json/20200713.json') as f:
        for line in f:
            data.append(json.loads(line))
            i = i + 1
            if i > 1:
                input_file.close()
                return data
test_data = test_fct()
The file size is around 700GB and the description (https://www.wikidata.org/wiki/Wikidata:Database_download) of the file states that it can be read line by line. I don't know if this is important but the E:/ hard drive is an external one.
Thank you for your help in advance :)
I don't have any firsthand knowledge of opening large files in Python, but did you mean the path to be 20200713.json/20200713.json? Is the first part actually a directory with a .json extension? I'd also suggest first trying to load a smaller sample of the file (opening it might be hard, so maybe just use the more command in a terminal?).
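To follow the suggestion of trying a small sample first, here is a minimal sketch that reads only the first few records. It assumes the layout described on the Wikidata download page: the dump is a single JSON array with one entity per line, so the first and last lines are bare brackets and each entity line ends with a comma. The function names are hypothetical, and this does not by itself explain the Errno 9.

```python
import json
from itertools import islice


def iter_json_lines(path, encoding="utf-8"):
    """Yield one parsed JSON object per line without loading the whole file."""
    with open(path, encoding=encoding) as f:
        for line in f:
            # Drop surrounding whitespace and the trailing comma that
            # separates array elements.
            line = line.strip().rstrip(",")
            # Skip blank lines and the array brackets.
            if line in ("", "[", "]"):
                continue
            yield json.loads(line)


def first_entities(path, limit):
    """Return at most `limit` objects from the start of the file."""
    return list(islice(iter_json_lines(path), limit))
```

Because the generator only reads as far as it is asked to, first_entities(path, 2) touches just the beginning of the file, whatever its total size.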

gensim file not found error

I am executing the following line:
id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt')
This code comes from "https://radimrehurek.com/gensim/wiki.html". I downloaded the Wikipedia corpus and generated the required files; wiki_en_wordids.txt is one of them. The file is in the following location:
~/gensim/results/wiki_en
When I execute the code mentioned above, I get the following error:
Traceback (most recent call last):
  File "~\Python\Python36-32\temp.py", line 5, in <module>
    id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt')
  File "~\Python\Python36-32\lib\site-packages\gensim\corpora\dictionary.py", line 344, in load_from_text
    with utils.smart_open(fname) as f:
  File "~\Python\Python36-32\lib\site-packages\smart_open\smart_open_lib.py", line 129, in smart_open
    return file_smart_open(parsed_uri.uri_path, mode)
  File "~\Python\Python36-32\lib\site-packages\smart_open\smart_open_lib.py", line 613, in file_smart_open
    return open(fname, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'wiki_en_wordids.txt'
Even though the file is available in the required location I get that error. Should I place the file in any other location? How do I determine what the right location is?
The code needs an absolute path here. A relative path only works when the script's current working directory is the directory that contains the file; in this case the file lives in a different directory from the one the script runs in, and the bare file name is handed on to functions in other modules.
One way to handle this situation is to use abspath:
import os
id2word = gensim.corpora.Dictionary.load_from_text(os.path.abspath('wiki_en_wordids.txt'))
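Note that a relative name, including one passed through os.path.abspath, is resolved against the current working directory of the process. A hedged alternative is to build the path from the directory where the files were generated (~/gensim/results/wiki_en in the question). The helper data_file below is hypothetical and uses only the standard library; its result would be passed to load_from_text.

```python
import os


def data_file(directory, filename):
    """Return the absolute path of `filename` inside `directory`.

    `directory` may start with "~". Raises FileNotFoundError with the
    current working directory in the message, which helps when a
    relative path was resolved somewhere unexpected.
    """
    path = os.path.abspath(
        os.path.join(os.path.expanduser(directory), filename)
    )
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "{} not found (current working directory is {})".format(
                path, os.getcwd()
            )
        )
    return path
```

With this, the call from the question would become gensim.corpora.Dictionary.load_from_text(data_file('~/gensim/results/wiki_en', 'wiki_en_wordids.txt')).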

Python: TarFile.open No such file or directory

I'm fairly new to Python and I'm writing a script that needs to untar a file. I use this simple function:
def untar(source_filename, dest_dir):
    for f in os.listdir():
        print(f)
    if(source_filename.endswith("tar.gz") or source_filename.endswith(".tar")):
        tar = tarfile.open(source_filename)
        tar.extractall(dest_dir)
        tar.close()
    else:
        raise Exception("Could not retrieve .depends for that file.")
I added the initial for loop for debugging purposes. When I invoke the function, it prints the name of the file I need, which is in the current working directory, so the file does exist. Here is the whole output:
dep.tar.gz
Traceback (most recent call last):
  File "init.py", line 70, in <module>
    untar('dep.tar.gz', ".")
  File "init.py", line 17, in untar
    tar = tarfile.open(source_filename)
  File "/usr/lib/python3.4/tarfile.py", line 1548, in open
    return func(name, "r", fileobj, **kwargs)
  File "/usr/lib/python3.4/tarfile.py", line 1646, in bz2open
    compresslevel=compresslevel)
  File "/usr/lib/python3.4/bz2.py", line 102, in __init__
    self._fp = _builtin_open(filename, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'dep.tar.gz'
Can someone tell me how it can see the file in the working directory, and then suddenly not be able to see the file in the working directory?
The program I was using to create the tar placed a space at the beginning of the filename, so Python was looking for 'dep.tar.gz' while the actual filename was ' dep.tar.gz'. Thanks, @Ben.
TIL - filenames can start with spaces.
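A leading space is invisible in plain print output, so a small check like the following can help spot it. This is only a sketch; the function name find_whitespace_variants is made up for illustration.

```python
import os


def find_whitespace_variants(expected_name, directory="."):
    """Return entries that match `expected_name` only after stripping.

    An entry such as ' dep.tar.gz' is reported when the expected name
    is 'dep.tar.gz'; an exact match is not reported.
    """
    return [
        entry
        for entry in os.listdir(directory)
        if entry != expected_name and entry.strip() == expected_name
    ]
```

Printing repr(f) instead of f in the debugging loop has a similar effect, because the quotes make the stray space visible.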

Python BZ2 IOError: invalid data stream

Traceback (most recent call last):
  File "TTRC_main.py", line 309, in <module>
    updater.start()
  File "TTRC_main.py", line 36, in start
    newFileData = bz2.BZ2File("C:/Program Files (x86)/Toontown Rewritten/temp/phase_7.mf.bz2"," rb").read()
IOError: invalid data stream
The code that retrieves the file which gives me this error is:
newFileComp = urllib.URLopener()
newFileComp.retrieve("http://kcmo-1.download.toontownrewritten.com/content/phase_7.mf.bz2", "C:/Program Files (x86)/Toontown Rewritten/temp/phase_7.mf.bz2")
What do I do to fix this error? It's not very descriptive (to me).
Could the issue be occurring because of the extra space in the file mode?
newFileData = bz2.BZ2File("C:/Program Files (x86)/Toontown Rewritten/temp/phase_7.mf.bz2"," rb").read()
Try this:
newFileData = bz2.BZ2File("C:/Program Files (x86)/Toontown Rewritten/temp/phase_7.mf.bz2","rb").read()
For me the issue was that the files were not in .bz2 format.
Make sure the file is in bz2 format.
Make sure the read and write modes match: "r" with "w", or "rb" with "wb".
Like Anand said, no space in "rb".
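A quick way to act on the "make sure the file is in bz2 format" point: bzip2 streams begin with the three bytes BZh, so a downloaded error page or a partial file usually fails this check. The helper name looks_like_bz2 is made up for this sketch, and a matching header does not prove that the rest of the file is intact.

```python
def looks_like_bz2(path):
    """Return True if the file starts with the bzip2 magic bytes 'BZh'."""
    with open(path, "rb") as f:  # binary mode, no stray space in the mode
        return f.read(3) == b"BZh"
```

If this returns False for the downloaded file, inspecting its first bytes (for example an HTML error message) is likely more informative than the "invalid data stream" error.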

IO error in savetxt while using numpy

I'm trying to read a dataset and collect meta-features from it.
I get the following error when I run the Python file:
Traceback (most recent call last):
  File "runmeta.py", line 79, in <module>
    np.savetxt('datasets/'+str(i)+'/metafeatures',meta[i],delimiter=',')
  File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 940, in savetxt
    fh = open(fname, 'w')
IOError: [Errno 2] No such file or directory: 'datasets/2/metafeatures'
The error you're getting simply tells you that the path could not be found. Because the file is being opened for writing, the missing piece is most likely the directory datasets/2 rather than the file itself. I would suggest looking into absolute and relative file paths.
Some advice on error handling:
The error is triggered on this line:
fh = open(fname, 'w')
So as you debug your program, look at the line Python shows you. Maybe change the variable fname; that is where I would start.
Currently:
fname = 'datasets/2/metafeatures'
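If the directory really is what is missing, one hedged fix is to create it before writing. The helper ensure_parent_dir below is a made-up name; it uses only the standard library and avoids the exist_ok argument so that it also runs on the Python 2.7 shown in the traceback.

```python
import os


def ensure_parent_dir(fname):
    """Create the directory that will contain `fname` if it is missing.

    Returns `fname` unchanged so the call can wrap the path argument
    of a function that writes the file.
    """
    parent = os.path.dirname(fname)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)
    return fname
```

The call from the question could then be written as np.savetxt(ensure_parent_dir('datasets/' + str(i) + '/metafeatures'), meta[i], delimiter=',').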
