How to extract Bzip2 file in python

How to extract Bzip2 file in python - python

I am writing program which fetch extension of filename from string than download it and extract it, I am able to do it with Tar.gz and zip type of compression but while extracting Bz2 type of file i am able to read file and transfer data to new file in same directory but what i would like is to create folder like zipfile and tarfile using something like 'ExtractAll" and extract file in it, but i am unable can anyone help
def bz2_download(*args):
mystring_zip = ' '.join(args)
print('{} Bz2 file Download Started!!!'.format(mystring_zip.split('/')[-1].split('.')[-2]))
r = requests.get(mystring_zip)
filename = mystring_zip.split('/')[-1] # this will take only -1 splitted part of the url
with open(filename, 'wb') as output_file:
output_file.write(r.content)
print('{} Bz2 file Download Completed!!!'.format(mystring_zip.split('/')[-1].split('.')[-2]))
path_zip = mystring_zip.split('/')[-1].split('.')[0]
print(path_zip, '+++++++++++++++++++++++++++++++++++++++++++++++++++++++')
extract_dir = path_zip
print('--------------------------{} extraction Started----------------------------------'.format(extract_dir))
os.mkdir(path_zip)
with open(path_zip, 'wb') as new_file, open(filename, 'rb') as file:
decompressor = _bz2.BZ2Decompressor()
for data in iter(lambda: file.read(100 * 1024), b''):
a = new_file.write(decompressor.decompress(data))
print(a, '1111111111111111111111111111111111111111111')
if extract_dir:
print('---------------------------{} extraction completed----------------------------------'.format(extract_dir))
else:
print('---------------------------{} extraction Failed----------------------------------'.format(extract_dir))
Here is my Tar.gz code which download the file and extract it to a folder, want to do something similar with bzip2 file
def tar_download(*args):
mystring_tar = ' '.join(args)
print('{} Tar.gz file Download Started!!!'.format(mystring_tar.split('/')[-1].split('.')[0]))
r = requests.get(mystring_tar)
filename = mystring_tar.split('/')[-1]
with open(filename, 'wb') as output_file:
output_file.write(r.content)
print(' {} Tar.gz file Download Completed!!!'.format(mystring_tar.split('/')[-1].split('.')[0]))
path_tar = mystring_tar.split('/')[-1].split('.')[0]
extract_dir = path_tar
print('---------------------------{} extraction Started----------------------------------'.format(extract_dir))
thetarfile = tarfile.open(filename, mode="r|gz")
thetarfile.extractall(extract_dir)
if extract_dir:
print('---------------------------{} extraction Complete----------------------------------'.format(extract_dir))
else:
print('---------------------------{} extraction Failed----------------------------------'.format(extract_dir))

Related

Encoding any file type with python

I am trying to go through all files within a folder, read the file data encoded using utf-8, then rewriting that data to a new file which should create a copy of that file. However when doing so the new copy of the file gets corrupted.
-Should i be using utf-8 text encoding to encode all file types (.py, .txt, .docx, .jpg)?
-Is there one standard text encoding format that works for all file types?
def read_files():
files = ["program.py", "letter.docx", "cat.jpg", "hello_world.py"]
for file in files:
#open exsting file
f = open(file, encoding="utf-8")
file_content = f.read()
#get file name info
file_extension = file.split(".")[1]
file_name = file.split(".")[0]
#write encoded data to new file
f = open(file_name + "_converted." + file_extension , "wb")
f.write(bytes(file_content, encoding="utf-8"))
f.close()
read_files()

proper way to copy files with shutil:
import shutil
source = file
destination = file_name + "_converted." + file_extension
shutil.copy(source, destination)
bad and slow way to copy files:
def read_files():
files = ["program.py", "letter.docx", "cat.jpg", "hello_world.py"]
for file in files:
#open exsting file
f = open(file,'rb') # read file in binary mode
file_content = f.read()
f.close() # don't forget to close the file !
#get file name info
file_extension = file.split(".")[1]
file_name = file.split(".")[0]
#write raw data to new file
f = open(file_name + "_converted." + file_extension , "wb")
f.write(file_content)
f.close()
read_files()
if you don't need to decode them to text then you should only open them in binary mode, as things like jpg and docx will break in text mode and should be opened in binary mode.
alternatively if you actually need to do some work on the docx or jpg files then you should use the proper modules to do so like Pillow for jpg and docx module for docx files.

extracting compressed files from .gz extension while downloading them from ftp server [duplicate]

This question already has an answer here:
Retrieve data from gz file on FTP server without writing it locally
(1 answer)
Closed 2 years ago.
I've created a function which download .gz files from given ftp server and I want to extract them on the fly while downloading and delete compressed files afterwards. How can I do that?
sinex_domain = "ftp://cddis.gsfc.nasa.gov/gnss/products/bias/2013"
def download(sinex_domain):
user = getpass.getuser()
sinex_parse = urlparse(sinex_domain)
sinex_connetion = FTP(sinex_parse.netloc)
sinex_connetion.login()
sinex_connetion.cwd(sinex_parse.path)
sinex_files = sinex_connetion.nlst()
sinex_userpath = "C:\\Users\\" + user + "\\DCBviz\\sinex"
pathlib.Path(sinex_userpath).mkdir(parents=True, exist_ok=True)
for fileName in sinex_files:
local_filename = os.path.join(sinex_userpath, fileName)
file = open(local_filename, 'wb')
sinex_connetion.retrbinary('RETR '+ fileName, file.write, 1024)
#want to extract files in this loop
file.close()
sinex_connetion.quit()
download(sinex_domain)

Although there is probably a cleverer way that avoids storing the whole data in memory for each file, these appear to be quite small files (a few tens of kilobytes uncompressed), so it would be sufficient to read the compressed data into a BytesIO buffer, then decompress it in memory before writing it to the output file. (The compressed data is never saved to disk.)
You would add these imports:
import gzip
from io import BytesIO
and then your main loop becomes:
for fileName in sinex_files:
local_filename = os.path.join(sinex_userpath, fileName)
if local_filename.endswith('.gz'):
local_filename = local_filename[:-3]
data = BytesIO()
sinex_connetion.retrbinary('RETR '+ fileName, data.write, 1024)
data.seek(0)
uncompressed = gzip.decompress(data.read())
with open(local_filename, 'wb') as file:
file.write(uncompressed)
(Note that the file.close() is not needed.)

Error reading data from the old files, writing it into the new files and then deleting the old files in Python

I'm trying to use below code to read 5 files from source, write them in destination and then deleting the files in source. I get the following error: [Errno 13] Permission denied: 'c:\\data\\AM\\Desktop\\tester1. The file by the way look like this:
import os
import time
source = r'c:\data\AM\Desktop\tester'
destination = r'c:\data\AM\Desktop\tester1'
for file in os.listdir(source):
file_path = os.path.join(source, file)
if not os.path.isfile:
continue
print(file_path)
with open (file_path, 'r') as IN, open (destination, 'w') as OUT:
data ={
'Power': None,
}
for line in IN:
splitter = (ID, Item, Content, Status) = line.strip().split()
if Item in data == "Power":
Content = str(int(Content) * 10)
os.remove(IN)

I have re-written your entire code. I assume you want to update the value of Power by a multiple of 10 and write the updated content into a new file. The below code will do just that.
Your code had multiple issues, first and foremost, most of what you wanted in your head did not get written in the code (like writing into a new file, providing what and where to write, etc.). The original issue of the permission was because you were trying to open a directory to write instead of a file.
source = r'c:\data\AM\Desktop\tester'
destination = r'c:\data\AM\Desktop\tester1'
for file in os.listdir(source):
source_file = os.path.join(source, file)
destination_file=os.path.join(destination, file)
if not os.path.isfile:
continue
print(source_file)
with open (source_file, 'r') as IN , open (destination_file, 'w') as OUT:
data={
'Power': None,
}
for line in IN:
splitter = (ID, Item, Content, Status) = line.strip().split()
if Item in data:# == "Power": #Changed
Content = str(int(Content) * 10)
OUT.write(ID+'\t'+Item+'\t'+Content+'\t'+Status+'\n') #Added to write the content into destination file.
else:
OUT.write(line) #Added to write the content into destination file.
os.remove(source_file)
Hope this works for you.

I'm not sure what you're going for here, but here's what I could come up with the question put into the title.
import os
# Takes the text from the old file
with open('old file path.txt', 'r') as f:
text = f.read()
# Takes text from old file and writes it to the new file
with open('new file path.txt', 'w') as f:
f.write(text)
# Removes the old text file
os.remove('old file path.txt')

Sounds from your description like this line fails:
with open (file_path, 'r') as IN, open (destination, 'w') as OUT:
Because of this operation:
open (destination, 'w')
So, you might not have write-access to
c:\data\AM\Desktop\tester1
Set file permission on Windows systems:
https://www.online-tech-tips.com/computer-tips/set-file-folder-permissions-windows/

#Sherin Jayanand
One more question bro, I wanted to try something out with some pieces of your code. I made this of it:
import os
import time
from datetime import datetime
#Make source, destination and archive paths.
source = r'c:\data\AM\Desktop\Source'
destination = r'c:\data\AM\Desktop\Destination'
archive = r'c:\data\AM\Desktop\Archive'
for root, dirs, files in os.walk(source):
for f in files:
pads = (root + '\\' + f)
# print(pads)
for file in os.listdir(source):
dst_path=os.path.join(destination, file)
print(dst_path)
with open(pads, 'r') as IN, open(dst_path, 'w') as OUT:
data={'Power': None,
}
for line in IN:
(ID, Item, Content, Status) = line.strip().split()
if Item in data:
Content = str(int(Content) * 10)
OUT.write(ID+'\t'+Item+'\t'+Content+'\t'+Status+'\n')
else:
OUT.write(line)
But again I received the same error: Permission denied: 'c:\\data\\AM\\Desktop\\Destination\\C'
How comes? Thank you very much!

the downloaded file is corrupt

I write a script to download certain files from multiple pages from the web. The downloads seems to work but all the files corrupted. I tried different way to download the files but always give me corrupted files and all the files size only 4 kb.
Where do I need to change or revise my code to fix download's problem?
while pageCounter < 3:
soup_level1 = BeautifulSoup(driver.page_source, 'lxml')
for div in soup_level1.findAll('div', attrs ={'class':'financial-report-download ng-scope'}):
links = div.findAll('a', attrs = {'class':'ng-binding'}, href=re.compile("FinancialStatement"))
for a in links:
driver.find_element_by_xpath("//div[#ng-repeat = 'attachments in res.Attachments']").click()
files = [url + a['href']]
for file in files:
file_name = file.split('/')[-1]
print ("Downloading file:%s"%file_name)
# create response object
r = requests.get(file, stream = True)
# download started
with open(file_name, 'wb') as f:
for chunk in r.iter_content(chunk_size=1024*1024):
if chunk:
f.write(chunk)
print ("%s downloaded!\n"%file_name)

Files added to tarfile come back as empty files

I am trying to add a file to a gzipped tarfile in python
import tarfile
# create test file
with open("testfile.txt", "w") as f:
f.write("TESTTESTTEST")
# create archive
with tarfile.open("archfile.tar.gz", "x:gz") as archive:
with open("testfile.txt", 'rb') as f:
archive.addfile(tarfile.TarInfo("testfile.txt"), f)
# read test file out of archive
with tarfile.open("archfile.tar.gz", "r:gz") as archive:
print(archive.extractfile("testfile.txt").read())
The result is b'' - an empty bytestring.
The file is not empty - if I try to read the file using the following code:
with open("testfile.txt", 'rb') as f:
print(f.read())
... I get b'TESTTESTTEST'
Is there something obvious I am missing? My end goal is to add the string in memory using f = io.StringIO('TESTTESTTEST')
I also tried removing the :gz and I see the same problem with a raw tar archive.
For additional info - I'm using Python 3 in a jupyter session on Windows 10. I see the same problem in Windows/Python 3.5.2/PyCharm.

I hit a similar problem. The documentation says that when you call tar.addfile it will write TarInfo.size bytes from the given file. That means that you have to either create the TarInfo with the file size or use tar.add() instead of tar.addfile:
# create archive V1
with tarfile.open("archfile.tar.gz", "x:gz") as archive:
with open("testfile.txt", 'rb') as f:
info = archive.gettarinfo("testfile.txt")
archive.addfile(info, f)
# create archive V2
with tarfile.open("archfile.tar.gz", "x:gz") as archive:
archive.add("testfile.txt")
# create archive V3
with tarfile.open("archfile.tar.gz", "w:gz") as archive:
with io.BytesIO(b"TESTTESTTEST") as f:
info = tarfile.TarInfo("testfile.txt")
f.seek(0, io.SEEK_END)
info.size = f.tell()
f.seek(0, io.SEEK_SET)
archive.addfile(info, f)

You can us the StringIO module to write the content as a file object to the tar file.
Sample:
import tarfile
import StringIO
tar = tarfile.TarFile("archfile.tar.gz","w")
with open("testfile.txt", 'rb') as f:
s = StringIO.StringIO(f.read())
info = tarfile.TarInfo(name="testfile.txt")
info.size = len(s.buf)
tar.addfile(tarinfo=info, fileobj=s)
tar.close()

Not a perfect answer but I managed to work around this with zipfile.
import zipfile
import io
# create archive
with zipfile.ZipFile("archfile.zip", "w") as archive:
with io.StringIO("TESTTESTTEST") as f:
archive.writestr("1234.txt", f.read())
# read test file out of archive
with zipfile.ZipFile("archfile.zip", "r") as archive:
print(archive.read("1234.txt"))
produces b'TESTTESTTEST'

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to extract Bzip2 file in python - python

Related

Encoding any file type with python

extracting compressed files from .gz extension while downloading them from ftp server [duplicate]

Error reading data from the old files, writing it into the new files and then deleting the old files in Python

the downloaded file is corrupt

Files added to tarfile come back as empty files

Categories

Resources