How to work with CSV files inside a zipped folder?

I'm working with zipped files in python for the first time, and I'm stumped.
I read the documentation for zipfile, but I'm not sure what would be the best way to do what I'm trying to do. I have a zipped folder with CSV files inside, and I'd like to be able to open the zip file, and retrieve certain values from the csv files inside.
Do I use zipfile.extract(file name here) to bring it to the current working directory? And if I do that, do I just use the file name to work with the file, or does this index or list them differently?
Currently, I manually extract all files in the zipped folder to the current working directory for my project, and then use the csv module to read them. All I'm really trying to do is remove that step.
Any and all help would be greatly appreciated!

You are looking to avoid extracting to disk. The Python zipfile docs describe ZipFile.open(), which gives you a file-like object: an object that mostly behaves like a regular file on disk, but is read straight from the archive without being extracted. Reading it returns bytes, at least in Python 3.
Something like this...
import csv
import io
from zipfile import ZipFile

with ZipFile('abc.zip') as myzip:
    print(myzip.infolist())
    for mf in myzip.infolist():
        with myzip.open(mf.filename) as myfile:
            # read() returns bytes, so decode before handing the text to csv
            mc = myfile.read()
            c = io.StringIO(mc.decode())
            for row in csv.reader(c):
                print(row)
Python's documentation is actually quite good once you have learned how to find things and picked up some of the basic programming terms used in it.
The csv module works on text rather than bytes, hence the extra step of decoding the data and wrapping it in io.StringIO.
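
As an alternative sketch, the archive member can be wrapped in io.TextIOWrapper so that csv.reader decodes it line by line instead of reading the whole member into memory first. The function name read_zipped_csvs and the UTF-8 default are assumptions made for illustration, and only members whose names end in .csv are read.

```python
import csv
import io
from zipfile import ZipFile


def read_zipped_csvs(zip_path, encoding="utf-8"):
    """Return {member name: list of rows} for every CSV file in the archive."""
    tables = {}
    with ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip folders and non-CSV members
            with archive.open(name) as raw:
                # ZipFile.open() returns a binary stream; TextIOWrapper
                # decodes it lazily so csv.reader receives text lines.
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                tables[name] = list(csv.reader(text))
    return tables
```

Calling read_zipped_csvs('abc.zip') would then give a dictionary keyed by the file names inside the archive, with nothing written to the working directory.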

Related

Merging Multiple (Ideally) JSON Files Into One

Simple enough situation; I'm working from within a directory which contains a script, and a subdirectory at the same level which contains many JSON files.
Using ideally Python, I'd like to combine all of the JSON files into one. Depending on your suggestion, this may leave behind redundant headers, but I can pop those off the JSON as I convert that file into a python dictionary object. Not a problem.
The problem is that I have been unable to combine the files into one. I'm practicing on text files for a start, to no avail. I'm using the python "os" module, but no luck. Keenly;
path = "/Users/me/ScriptsAndData/BagOfJSON"
...
for filename in os.listdir(path):
    with open(filename, 'rb') as readfile:
        ....
Results in the error;
with open(filename, 'rb') as readfile:
FileNotFoundError: [Errno 2] No such file or directory: 'firstFile.JSON'
And this finds and names the first file from within the directory, but doesn't operate on it like a file.
tldr;
I'm trying to merge multiple JSON files, all located within a single directory, into a single JSON file. If you know how to do this for any filetype, I'd be happy to know how you do it, then build from there.
Cheers!
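
The FileNotFoundError above arises because os.listdir() returns bare file names, so open(filename) looks in the current working directory rather than in path. Below is a minimal sketch of one way to merge the files, assuming each file holds one JSON document and that collecting them into a single list is acceptable. The function name merge_json_files and the output layout are illustrative assumptions.

```python
import json
import os


def merge_json_files(directory, output_path):
    """Load every .json file in `directory` and write them out as one JSON list."""
    merged = []
    for filename in sorted(os.listdir(directory)):
        if not filename.lower().endswith(".json"):
            continue  # ignore anything that is not a JSON file
        # os.listdir() yields bare names, so join them with the directory
        full_path = os.path.join(directory, filename)
        with open(full_path, "r", encoding="utf-8") as readfile:
            merged.append(json.load(readfile))
    with open(output_path, "w", encoding="utf-8") as outfile:
        json.dump(merged, outfile, indent=2)
    return merged
```

It is probably safest to write the output outside the source directory, since a merged .json file placed inside it would be picked up again on the next run.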

Reading gzipped data in Python

I have a *.tar.gz compressed file that I would like to read in with Python 2.7. The file contains multiple h5 formatted files as well as a few text files. I'm a novice with Python. Here is the code I'm trying to adapt:
subset_path='c:\data\grant\files'
f=gzip.open(filename,'subset_full.tar.gz')
subset_data_path=os.path.join(subset_path,'f')
The first statement identifies the path to the folder with the data. The second statement tells Python to open a specific compressed file and the third statement (hopefully) executes a join of the prior two statements.
Several lines below this code I get an error when Python tries to use the 'subset_data_path' assignment.
What's going on?

The gzip module will only open a single file that has been compressed, i.e. my_file.gz. You have a tar archive of multiple files that are also compressed. This needs to be both untarred and uncompressed.
Try using the tarfile module instead, see https://docs.python.org/2/library/tarfile.html#examples
edit: To add a bit more information on what has happened: you have opened the gzipped tarball as a gzip file object, which works almost the same as a standard file object. For instance, you could call f.readlines() as if f were a normal file object, and it would return the uncompressed lines of the tar stream.
However, this did not actually unpack the archive into new files in the filesystem. You did not create a subdirectory 'c:\data\grant\files\f', and so when you try to use the path subset_data_path you are looking for a directory that does not exist.
The following ought to work:
import os
import tarfile

# raw string, so that '\f' in '\files' is not read as a form feed
subset_path = r'c:\data\grant\files'
with tarfile.open("subset_full.tar.gz") as tar:
    tar.extractall(subset_path)
subset_data_path = os.path.join(subset_path, 'subset_full')
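
If the goal is only to read the archive's contents, tarfile can also hand back file-like objects without unpacking anything to disk. The sketch below is written for Python 3 and uses a hypothetical helper name, read_text_members; it assumes the text members end in .txt and are UTF-8 encoded.

```python
import tarfile


def read_text_members(archive_path, suffix=".txt", encoding="utf-8"):
    """Return {member name: decoded text} for regular files ending in `suffix`."""
    contents = {}
    # mode "r:gz" undoes both layers: gzip compression and the tar container
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(suffix)):
                continue  # skip directories, links and other file types
            fileobj = tar.extractfile(member)  # file-like object backed by the archive
            contents[member.name] = fileobj.read().decode(encoding)
    return contents
```

The h5 files in the archive would most likely still need to be extracted, or read into a buffer, before an HDF5 library can open them.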

Opening/reading a list of unknown files using I/O methods

So I'm a newb :) Python question
I have a list of files and I'm looking to open/read these files using an I/O method
I understand that explicitly opening each test file I've created, one by one, would work, but what if I don't know the file names in advance? How would I open and read them then?
Logically thinking, it sounds like I need to create a variable and assign it to a list of files and from there tell it open all the files in the list. So a for loop perhaps?

You can do it as follows:
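import os
for fl in os.listdir(os.getcwd()):
    with open(fl) as f:
        pass  # do stuff
Alternatively, if your files are not in the current working directory, you can do:
for fl in os.listdir('custom/path/to/files'):
    # listdir() returns bare names, so join them with the directory
    with open(os.path.join('custom/path/to/files', fl)) as f:
        pass  # do stuff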
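
Note that os.listdir() also returns subdirectories, which open() cannot read. Here is a minimal sketch that skips them, using pathlib. The function name read_all_files and the choice to return the contents as a dictionary are assumptions for illustration, and the files are assumed to be UTF-8 text.

```python
from pathlib import Path


def read_all_files(directory, encoding="utf-8"):
    """Return {file name: text content} for every regular file in `directory`."""
    contents = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories and other non-file entries
        contents[path.name] = path.read_text(encoding=encoding)
    return contents
```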

Using Python, getting the name of files in a zip archive

I have several very large zip files available to download on a website. I am using Flask microframework (based on Werkzeug) which uses Python.
Is there a way to show the contents of a zip file (i.e. file and folder names) - to someone on a webpage - without actually downloading it? As in doing the working out server side.
Assume that I do not know what are in the zip archives myself.
I apologize that this post does not include code.
Thank you for helping.

Sure, have a look at zipfile.ZipFile.namelist(). Usage is pretty simple, as you'd expect: you just create a ZipFile object for the file you want, and then namelist() gives you a list of the paths of files stored in the archive.
from zipfile import ZipFile

with ZipFile('foo.zip', 'r') as f:
    names = f.namelist()
print(names)
# ['file1', 'folder1/file2', ...]

http://docs.python.org/library/zipfile.html
Specifically, try using the ZipFile.namelist() method.
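
Opening a ZipFile only reads the archive's directory of entries, so listing names should stay cheap even for very large archives. Below is a minimal sketch of a helper whose result a Flask view could pass to a template. The name list_zip_contents and the dictionary layout are assumptions, and ZipInfo.is_dir() requires Python 3.6 or later.

```python
from zipfile import ZipFile


def list_zip_contents(zip_path):
    """Return one dict per archive entry without extracting any data."""
    entries = []
    with ZipFile(zip_path) as archive:
        for info in archive.infolist():
            entries.append({
                "name": info.filename,
                "is_dir": info.is_dir(),  # folder entries end with "/"
                "size": info.file_size,   # uncompressed size in bytes
            })
    return entries
```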

Delete file from zipfile with the ZipFile Module

The only way I came up with for deleting a file from a zipfile was to create a temporary zipfile without the file to be deleted and then rename it to the original filename.
In Python 2.4 the ZipInfo class had an attribute file_offset, so it was possible to create a second zip file and copy the data to the other file without decompressing and recompressing.
This file_offset is missing in Python 2.6, so is there another option besides creating another zipfile by uncompressing every file and then recompressing it again?
Is there maybe a direct way of deleting a file in the zipfile? I searched and didn't find anything.

The following snippet worked for me (deletes all *.exe files from a Zip archive):
import zipfile

zin = zipfile.ZipFile('archive.zip', 'r')
zout = zipfile.ZipFile('archive_new.zip', 'w')
for item in zin.infolist():
    buffer = zin.read(item.filename)
    if item.filename[-4:] != '.exe':
        zout.writestr(item, buffer)
zout.close()
zin.close()
If you read everything into memory, you can eliminate the need for a second file. However, this snippet recompresses everything.
After closer inspection, ZipInfo.header_offset is the offset from the start of the file. The name is misleading, because the main Zip header (the central directory) is actually stored at the end of the file. My hex editor confirms this.
So the problem you'll run into is the following: You need to delete the directory entry in the main header as well or it will point to a file that doesn't exist anymore. Leaving the main header intact might work if you keep the local header of the file you're deleting as well, but I'm not sure about that. How did you do it with the old module?
Without modifying the main header I get an error "missing X bytes in zipfile" when I open it. This might help you to find out how to modify the main header.
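
The in-memory variant mentioned above can be sketched as follows. The helper name delete_from_zip and its predicate argument are assumptions for illustration. Like the snippet above, it decompresses and recompresses every kept member, and it holds the whole new archive in memory, so it is probably only suitable for small archives.

```python
import io
import zipfile


def delete_from_zip(zip_path, should_delete):
    """Rewrite `zip_path` without the members for which should_delete(name) is true."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(zip_path, "r") as zin, \
            zipfile.ZipFile(buffer, "w") as zout:
        for item in zin.infolist():
            if should_delete(item.filename):
                continue  # leave this member out of the new archive
            # passing the ZipInfo keeps name, timestamp and compression type
            zout.writestr(item, zin.read(item.filename))
    # only overwrite the original once the new archive is complete
    with open(zip_path, "wb") as target:
        target.write(buffer.getvalue())
```

For example, delete_from_zip('archive.zip', lambda name: name.endswith('.exe')) would drop all .exe members.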

Not very elegant but this is how I did it:
import subprocess
import zipfile

z = zipfile.ZipFile(zip_filename)
files_to_del = [f for f in z.namelist() if f.endswith('exe')]
z.close()  # release the file before the external tool rewrites it
cmd = ['zip', '-d', zip_filename] + files_to_del
subprocess.check_call(cmd)
# reload the modified archive
z = zipfile.ZipFile(zip_filename)

The routine delete_from_zip_file from ruamel.std.zipfile¹ allows you to delete a file based on its full path within the ZIP, or based on (re) patterns. E.g. you can delete all of the .exe files from test.zip using
from ruamel.std.zipfile import delete_from_zip_file
delete_from_zip_file('test.zip', pattern='.*.exe')
(please note the dot before the *).
This works similarly to mdm's solution (including the need for recompression), but recreates the ZIP file in memory (using the class InMemZipFile()), overwriting the old file after it is fully read.
¹ Disclaimer: I am the author of that package.

Based on Elias Zamaria's comment on the question.
Having read through Python-Issue #51067, I want to give an update on it.
As of today a solution already exists, though it has not been accepted into Python because the author has not signed the Contributor Agreement.
Nevertheless, you can take the code from https://github.com/python/cpython/blob/659eb048cc9cac73c46349eb29845bc5cd630f09/Lib/zipfile.py and save it as a separate file in your project. After that, import it instead of the built-in Python library: import myproject.zipfile as zipfile.
Usage:
with zipfile.ZipFile("archive.zip", "a") as z:
    z.remove("firstfile.txt")
I believe it will be included in future Python versions. For me it works like a charm for the given use case.
