Extracting file from corrupted GZ - python

My code snippet can extract a file from a GZ archive and save it as a .txt file, but sometimes the file contains some weird text that crashes the extraction.
Some Gibberish from file:
Method I use:
import gzip

def unpackgz(name, path):
    file = path + '\\' + name
    outfilename = file[:-3] + ".txt"
    # Decompress the whole archive and write the raw bytes to a .txt file
    inF = gzip.open(file, 'rb')
    outF = open(outfilename, 'wb')
    outF.write(inF.read())
    inF.close()
    outF.close()
My question is: how can I work around this? Maybe something similar to with open(file, errors='ignore') as fil: would help, because with my current method I can only extract healthy files.
EDIT to First question
def read_corrupted_file(filename):
    with gzip.open(filename, 'r') as f:
        for line in f:
            try:
                string+=line
            except Exception as e:
                print(e)
    return string
newfile = open("corrupted.txt", 'a+')
cwd = os.getcwd()
srtNameb="service"+str(46)+"b.gz"
localfilename3 = cwd +'\\'+srtNameb
newfile.write(read_corrupted_file(localfilename3))
Results in multiple errors:
Like This
Fixed to working state:
def read_corrupted_file(filename):
    string=''
    newfile = open("corrupted.txt", 'a+')
    try:
        with gzip.open(filename, 'rb') as f:
            for line in f:
                try:
                    newfile.write(line.decode('ascii'))
                except Exception as e:
                    print(e)
    except Exception as e:
        print(e)
cwd = os.getcwd()
srtNameb="service"+str(46)+"b.gz"
localfilename3 = cwd +'\\'+srtNameb
read_corrupted_file(localfilename3)
print('done')

Generally, if the file is corrupt, trying to unzip it will throw an error, and there is not much you can easily do to recover the remaining data. If you just want to stop it from crashing, you can use try/except.
try:
    pass
except Exception as error:
    print(error)
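Applying this logic, you could read the file line by line with gzip inside a try/except. The lines before a corrupted section are still read, although the reader usually cannot continue past the point where decompression fails.
import gzip
try:
    with gzip.open('input.gz', 'r') as f:
        for line in f:
            print('got line', line)
except Exception as error:
    # Decompression failed; the lines printed so far were read successfully
    print(error)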
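The line-by-line idea above can also be applied to fixed-size chunks, which avoids depending on where the newlines fall. The following is only a sketch: the function name salvage_gzip, its parameters, and the chunk size are hypothetical, and it assumes the damage shows up as EOFError, zlib.error, or OSError (which includes gzip.BadGzipFile) while reading. It keeps whatever was decompressed before the failure and does not try to repair the archive.

```python
import gzip
import zlib


def salvage_gzip(src_path, dst_path, chunk_size=64 * 1024):
    """Copy as much decompressed data as possible from src_path to dst_path.

    Returns (bytes_written, error), where error is None for a healthy file.
    The name and signature are illustrative, not part of the gzip module.
    """
    written = 0
    error = None
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            try:
                chunk = src.read(chunk_size)
            except (EOFError, OSError, zlib.error) as exc:
                # Stop at the first decompression failure and keep what we have
                error = exc
                break
            if not chunk:
                break  # clean end of stream
            dst.write(chunk)
            written += len(chunk)
    return written, error
```

A caller could check the returned error to decide whether the output file is complete or only a recovered prefix.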

Related

what is an exception handler for

I have a script which wants to load integers from a text file. If the file does not exist I want the user to be able to browse for a different file (or the same file in a different location, I have UI implementation for that).
What I don't get is the purpose of exception handling, or catching exceptions. From what I have read, it seems to be something you can use to log errors, but if an input is needed, catching the exception won't fix that. I am wondering whether a while loop in the except block is the approach to use (or whether I should not use try/except for loading a file at all).
try:
    with open(myfile, 'r') as f:
        contents = f.read()
        print("From text file : ", contents)
except FileNotFoundError as Ex:
    print(Ex)
You need to use a while loop and a variable that tracks whether the file was found. If it was not found, ask for the file name again via input and try to read again, and so on:
filenotfound = True
file_path = myfile
while filenotfound:
    try:
        with open(file_path, 'r') as f:
            contents = f.read()
            print("From text file : ", contents)
            filenotfound = False
    except FileNotFoundError as Ex:
        file_path = str(input())
        filenotfound = True
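The same retry loop can be wrapped in a function so that the way a new path is obtained (console input, a file browser dialog) is passed in from outside. This is a sketch under assumptions: load_integers and ask_for_path are hypothetical names, and the file is assumed to contain whitespace-separated integers, as the question describes loading integers from a text file.

```python
def load_integers(path, ask_for_path=input):
    """Return the integers stored in path, asking again while the file is missing.

    ask_for_path is any callable that takes a prompt string and returns a new
    path; it defaults to the built-in input().
    """
    while True:
        try:
            with open(path, "r") as f:
                # Assumes whitespace-separated integers in the file
                return [int(token) for token in f.read().split()]
        except FileNotFoundError:
            path = ask_for_path(f"{path!r} not found, enter another path: ")
```

Here the except block does exactly one thing, recover by asking for a different path, and the loop repeats the attempt. That is the purpose of the handler in this case: it turns a crash into a second chance.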

How can I suppress the output of file errors when using tqdm?

I am loading a bunch of files and want to show a corresponding progress bar using tqdm.
for file_path in tqdm(file_paths, position=0, desc='files loaded'):
    if is_binary(file_path):
        continue
    try:
        with open(file_path, 'r', encoding='utf8', errors='ignore') as input_file:
            file_content = input_file.read()
        processing_queue.put(file_content)
    except FileNotFoundError as e:
        main_logger.error(f'Encountered exception while opening {file_path}: {e}')
Even though I am handling files that could not be found with an exception, I still get error messages printed to the console which interfere with the output of tqdm:
files loaded: 99%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 257398/260429 [6:17:16<07:16, 6.95it/s]
[Errno 2] No such file or directory: '\\\\server\\path\\to\\file'██████ | 112570/260429 [6:17:11<7:05:21, 5.79it/s]
files loaded: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 260429/260429 [6:21:29<00:00, 11.38it/s]
Why are these messages still printed to the console, and what can be done to suppress them?
OK, I fixed it by opening the file in a separate try/except block first!
for file_path in tqdm(file_paths, position=0, desc='files loaded'):
    try:
        open(file_path).close()
    except EnvironmentError as e:
        main_logger.error(f'Encountered exception while opening {file_path}: {e}')
        continue
    if is_binary(file_path):
        continue
    with open(file_path, 'r', encoding='utf8', errors='ignore') as input_file:
        file_content = input_file.read()
    processing_queue.put(file_content)
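Another way to keep the console quiet while the bar is running is to report nothing inside the loop and collect the failures for later. The sketch below uses only the standard library; read_text_files is a hypothetical helper, and the iterable passed to it could be wrapped in tqdm by the caller. It catches OSError, of which FileNotFoundError is a subclass.

```python
def read_text_files(file_paths):
    """Read every readable file and collect failures instead of printing them.

    Returns (contents, failures), where failures is a list of
    (file_path, exception) pairs that can be logged after the loop ends.
    """
    contents = []
    failures = []
    for file_path in file_paths:
        try:
            with open(file_path, "r", encoding="utf8", errors="ignore") as f:
                contents.append(f.read())
        except OSError as exc:
            # Nothing is written to the console here, so a progress bar
            # drawn by the caller is not interrupted
            failures.append((file_path, exc))
    return contents, failures
```

After the loop, the failures list can be written to a log in one go.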

Python: FileNotFoundError not caught by try-except block

I recently started learning Python and encountered a problem I can't find an answer to.
The idea of the program is to ask for a username, load a dictionary from a JSON file, and, if the name is in the dictionary, print the user's favourite number.
The code that loads the JSON file looks like this:
import json
fav_numbers = {}
filename = 'numbers.JSON'
name = input('Hi, what`s your name? ')
try:
    with open(filename) as f_obj:
        fav_numbers = json.load(f_obj)
except FileNotFoundError:
    pass
if name in fav_numbers.keys():
    print('Hi {}, your fav number is {}, right?'.format(name, fav_numbers[name]))
else:
    number = input('Hi {}, what`s your favourte number? '.format(name))
    fav_numbers[name] = number
with open(filename, 'w') as f_obj:
    json.dump(fav_numbers, filename)
Still, when I try to run it, it crashes, telling me:
Exception has occurred: FileNotFoundError
[Errno 2] No such file or directory: 'numbers.JSON'
File "/home/niedzwiedx/Dokumenty/Python/ulubionejson.py", line 22, in <module>
with open(filename) as f_obj:
What am I doing wrong when catching the exception? (I already tried changing FileNotFoundError to OSError or IOError.)
The error comes from your last line, outside of your try/except:
with open(filename, 'w') as f_obj:
    json.dump(fav_numbers, filename)
filename is a string, not a file.
You have to use
with open(filename, 'w') as f_obj:
    json.dump(fav_numbers, f_obj)
For additional safety, you can surround this part with try/except too
try:
    with open(filename, 'w') as f_obj:
        json.dump(fav_numbers, f_obj)
except (FileNotFoundError, PermissionError):
    print("Impossible to create JSON file to save data")
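Putting the two halves together, the load and save steps can be sketched as a pair of small functions. The names load_numbers and save_numbers are hypothetical; the point is that json.load and json.dump both receive the open file object, never the filename string.

```python
import json


def load_numbers(filename):
    """Return the stored dictionary, or an empty one if the file is missing."""
    try:
        with open(filename) as f_obj:
            return json.load(f_obj)
    except FileNotFoundError:
        return {}


def save_numbers(filename, fav_numbers):
    """Write the dictionary to filename as JSON."""
    with open(filename, "w") as f_obj:
        # Pass the file object, not the filename string
        json.dump(fav_numbers, f_obj)
```

On the first run load_numbers returns an empty dictionary, so the program falls through to asking for a number and saving it.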

Python:The process cannot access the file because it is being used by another process

try:
    masterpath = os.path.join(path, "master.txt")
    with open(masterpath, 'r') as f:
        s = f.read()
        f.close()
    exec(s)
    with open(masterpath, 'w') as g:
        g.truncate()
        g.close()
    os.remove(masterpath)
Here I want to read something from a .txt file, then erase its content and delete it. But it always reports that it cannot delete the file: 'The process cannot access the file because it is being used by another process'.
What I actually need is to delete the .txt file, but sometimes it cannot be deleted immediately, so I erase the content first in case the file gets read again. Is there a good way to read something from a .txt file and then delete the file as quickly and reliably as possible?
You should NOT call f.close() or g.close(). The with statement does that automatically.
Remove the unnecessary close() statements to start, like #grapes mentioned. Why are you truncating what you are deleting? Just delete it...
try:
    masterpath = os.path.join(path, "master.txt")
    with open(masterpath, 'r') as f:
        s = f.read()
    exec(s)
except Exception as e:
    print(e)
else:
    os.remove(masterpath)
FYI, it is bad form to execute the contents of a file if you do not control the contents of said file.
Another option:
masterpath = os.path.join(path, "master.txt")
with open(masterpath, 'r') as f:
    try:
        s = f.read()
    except Exception as e:
        print(e)
    else:
        exec(s)
os.remove(masterpath)
Try a short sleep in the exception handler and retry in a loop:
import time
while True:
    try:
        masterpath = os.path.join(path, "master.txt")
        with open(masterpath, 'r') as f:
            s = f.read()
            f.close()
        exec(s)
        with open(masterpath, 'w') as g:
            g.truncate()
            g.close()
        os.remove(masterpath)
    except WindowsError:
        # sleep is assumed to hold the delay in seconds between attempts
        time.sleep(sleep)
    else:
        break
Another way is to use:
os.remove(masterpath)
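The suggestions above can be combined: read inside a with block so the handle is closed before deleting, skip the truncate step, and retry only the delete if another process still holds the file. This is a sketch, not a definitive fix; read_and_delete, retries, and delay are hypothetical names and values, and it assumes the lock surfaces as PermissionError, as it does on Windows.

```python
import os
import time


def read_and_delete(path, retries=5, delay=0.1):
    """Read the whole file, then delete it, retrying the delete if it is locked."""
    with open(path, "r") as f:
        contents = f.read()
    # The with block has closed our own handle at this point
    for attempt in range(retries):
        try:
            os.remove(path)
            break
        except PermissionError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
    return contents
```

Only the delete is retried, so the file contents are read (and could be executed by the caller) exactly once.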

File write and file read in utf-16 in python

I have this file write function:
def filewrite(folderpath, filename, strdata, encmode):
    try:
        path = os.path.join(folderpath, filename)
        if not path:
            return
        create_dir_path(folderpath)
        #path = os.path.join(folderpath, filepath)
        with codecs.open(path, mode='w', encoding=encmode) as fp:
            fp.write(unicode(strdata))
    except Exception, e:
        raise Exception(e)
which I am using to write data to a file:
filewrite(folderpath, filename, strdata, 'utf-16')
But when I try to read this file, I get the exception:
Exception: UTF-16 stream does not start with BOM
My file read function is shown below:
def read_in_chunks(file_object, chunk_size=4096):
    try:
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data
    except Exception, ex:
        raise ex
def fileread(folderPath, fileName, encmode):
    try:
        path = os.path.join(folderPath, fileName)
        fileData = ''
        if os.access(path, os.R_OK):
            with codecs.open(path, mode='r', encoding=encmode) as fp:
                for block in read_in_chunks(fp):
                    fileData = fileData + block
            return fileData
        return ''
    except Exception, ex:
        raise ex
Please let me know what I am doing wrong here.
Thanks
There doesn't appear to be anything wrong with your code. Running it on my machine creates the proper BOM at the start of the file automatically.
BOM is a sequence of bytes at the start of the file that indicates which order multi-byte encodings (UTF-16) should be read - you can read about system endianness if you're interested.
If you're running on a mac/linux you should be able to hd your_utf16file or hexdump your_utf16file to check the raw bytes inside the file. Running your code I saw the correct bytes 0xff 0xfe at the beginning of mine.
Try replacing the reading portion of your fileread function with
with codecs.open(path, mode='r', encoding=encmode) as fp:
    for block in fp:
        print block
to ensure you can still read the file after eliminating external factors (your read_in_chunks function).
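The BOM check described above can also be done from Python instead of hd or hexdump. Note that this sketch is written for Python 3, while the code in the question is Python 2; the helper names write_utf16, has_utf16_bom, and read_utf16 are hypothetical. It relies on the 'utf-16' codec writing a byte order mark at the start of the file, and on the codecs module constants for the two possible marks.

```python
import codecs


def write_utf16(path, text):
    """Write text using the 'utf-16' codec, which prepends a BOM."""
    with open(path, "w", encoding="utf-16") as fp:
        fp.write(text)


def has_utf16_bom(path):
    """Return True if the file starts with a UTF-16 byte order mark."""
    with open(path, "rb") as fp:
        head = fp.read(2)
    return head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def read_utf16(path):
    """Read the file back as text, letting the codec consume the BOM."""
    with open(path, "r", encoding="utf-16") as fp:
        return fp.read()
```

If has_utf16_bom returns False for a file that is supposed to be UTF-16, the file was probably written by something other than the 'utf-16' codec, for example with 'utf-16-le', which does not add a BOM.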
