This question already has answers here:
Python append multiple files in given order to one big file
(12 answers)
Closed 6 years ago.
I have a directory on my system that contains ten zip files. Each zip file contains 1 text file. I want to write a Python script that unzips all of the files in the directory, and then concatenates all of the resulting (unzipped) files into a single file. How can I do this? So far, I have a script that is unzipping all of the files, but I am not sure how to go about adding the concatenation. Below is what I have.
import os, zipfile
dir_name = '/path/to/dir'
pattern = "my-pattern*.gz"
os.chdir(dir_name) # change directory from working dir to dir with files
for item in os.listdir(dir_name): # loop through items in dir
if item == pattern: # check for my pattern extension
file_name = os.path.abspath(item) # get full path of files
zip_ref = zipfile.ZipFile(file_name) # create zipfile object
zip_ref.extractall(dir_name) # extract file to dir
zip_ref.close() # close file
You don't have to write the files to disk when you unzip them, Python can read the file directly from the zip. So, assuming you don't need anything except the concatenated result, replace your last two lines with:
for zipfile in zip_ref.namelist():
with open('targetfile', 'a') as target:
target.write(zip_ref.read(zipfile))
Related
This question already has answers here:
Extract files from zip file and retain mod date?
(4 answers)
Closed last year.
I'm unzipping hundreds of zipped files with python as explained here.
import os
import zipfile
base_dir = '/users/me/myFile' # absolute path to the data folder
extension = ".zip"
os.chdir(base_dir) # change directory from working dir to dir with files
def unpack_all_in_dir(_dir):
for item in os.listdir(_dir): # loop through items in dir
abs_path = os.path.join(_dir, item) # absolute path of dir or file
if item.endswith(extension): # check for ".zip" extension
file_name = os.path.abspath(abs_path) # get full path of file
zip_ref = zipfile.ZipFile(file_name) # create zipfile object
zip_ref.extractall(_dir) # extract file to dir
zip_ref.close() # close file
elif os.path.isdir(abs_path):
unpack_all_in_dir(abs_path) # recurse this function with inner folder
unpack_all_in_dir(base_dir)
When I unzip a file manually it will get its original modification date, whereas when doing it with code I loose this - the modification date turns into now's date.
Any idea of a way the preserve the original creation date?
I don't know zipfile very well, but according to this thread about modification date, preserving metadata is a pain.
You can hack around with calling some CLI archiving program as a subprocess, but you need to make sure that it's installed on the target system. I actually had to bundle 7zip with my script once, because of some issue with Python libraries, even third-party ones
Actually opening multiple zip files manually keeps the dates, at least on macOS.
Search for the files you need to unzip, select -> open.
This question already has an answer here:
ZIp only contents of directory, exclude parent
(1 answer)
Closed 1 year ago.
I am trying to use zipfile to zip several files together into a single .zip file. The files I need to zip are not in the root folder from which the python script runs, so i have to specify the path when adding a file to the zip. The problem is that I end up with a folder structure in the zip file, i really only want each file and not the folders. so..
zip = zipfile.ZipFile('./tmp/afile.zip', 'w')
zip.write('./tmp/file1.txt')
zip.write('./tmp/items/file2.txt')
results in a zip files that extracts to:
.
|
|-tmp
| |file.txt
| |-items
| | file2.txt
Is there any way to just add the files to the "root" of the zip file and not creature the folders?
try with Pathlib which returns a posix object with various attributes.
Also I would caution you about using zip as a variable as that's a key function in python
from pathlib import Path
from zipfile import ZipFile
zipper = ZipFile('afile.zip', 'w')
files = Path(root).rglob('*.txt') #get all files.
for file in files:
zipper.write(file,file.name)
This question already has answers here:
Python error os.walk IOError
(2 answers)
Closed 4 years ago.
I am trying to create a list of paths for multiple files with the same name and format from different folders. I tried doing this with os.walk with the following code:
import os
list_raster = []
for (path, dirs, files) in os.walk(r"C:\Users\Douglas\Rasters\Testing folder"):
for file in files:
if "woody02.tif" in file:
list_raster.append(files)
print (list_raster)
However, this only gives me two things
the file name
All file names in each folder
I need the the full location of only the specified 'woody02.txt' in each folder.
What am I doing wrong here?
The full path name is the first item in the tuples in the list returned by os.walk, so it is assigned to your path variable already.
Change:
list_raster.append(files)
to:
list_raster.append(os.path.join(path, file))
In the example code you posted you are appending files to your list instead of just the current file, in order to get the full path and file name for the current file you would need to change your code to something like this:
import os
list_raster = []
for (path, dirs, files) in os.walk(r"C:\Users\Douglas\Rasters\Testing folder"):
for file in files:
if "woody02.tif" in file:
# path will hold the current directory path where os.walk
# is currently looking and file would be the matching
# woody02.tif
list_raster.append(os.path.join(path, file))
# wait until all files are found before printing the list
print(list_raster)
This question already has answers here:
Rename multiple files in a directory in Python
(15 answers)
Closed 4 years ago.
Looking to change the file extension from .txt to .csv
import os, shutil
for filename in os.listdir(directory):
# if the last four characters are “.txt” (ignoring case)
# (converting to lowercase will include files ending in “.TXT”, etc)
if filename.lower().endswidth(“.txt”):
# generate a new filename using everything before the “.txt”, plus “.csv”
newfilename = filename[:-4] + “.csv”
shutil.move(filename, newfilename)
You can use os and rename.
But let me give you a small advice. When you do these kind of operations as (copy, delete, move or rename) I'd suggest you first print the thing you are trying to achieve. This would normally be the startpath and endpath.
Consider this example below where the action os.rename() is commented out in favor of print():
import os
for f in os.listdir(directory):
if f.endswith('.txt'):
print(f, f[:-4]+'.csv')
#os.rename(f, f[:-4]+'.csv')
By doing this we could be certain things look ok. And if your directory is somewhere else than . You would probably need to do this:
import os
for f in os.listdir(directory):
if f.endswith('.txt'):
fullpath = os.path.join(directory,f)
print(fullpath, fullpath[:-4]+'.csv')
#os.rename(fullpath, fullpath[:-4]+'.csv')
The os.path.join() will make sure the directory path is added too.
This question already has answers here:
How to create a zip archive of a directory?
(28 answers)
Closed 1 year ago.
I've got a folder called: 'files' which contains lots of jpg photographs. I've also got a file called 'temp.kml'. I want to create a KMZ file (basically a zip file) which contains the temp.kml file AND the files directory which has the photographs sitting inside it.
Here is my code:
zfName = 'simonsZip.kmz'
foo = zipfile.ZipFile(zfName, 'w')
foo.write("temp.kml")
foo.close()
os.remove("temp.kml")
This creates the kmz file and puts the temp.kml inside. But I also want to put the folder called 'files' in as well. How do I do this?
I read here on StackOverflow that some people have used shutil to make zip files. Can anyone offer a solution?
You can use shutil
import shutil
shutil.make_archive("simonsZip", "zip", "files")
The zipfile module in python has no support for adding a directory with file so you need to add the files one by one.
This is an (untested) example of how that can be achieved by modifying your code example:
import os
zfName = 'simonsZip.kmz'
foo = zipfile.ZipFile(zfName, 'w')
foo.write("temp.kml")
# Adding files from directory 'files'
for root, dirs, files in os.walk('files'):
for f in files:
foo.write(os.path.join(root, f))
foo.close()
os.remove("temp.kml")