Unzip files using Python to one folder - python

I want to unzip files with Python 2.7.8. When I try to extract zip files that contain files with the same names into one folder, some files get lost because of the duplicate names. I tried this:
import zipfile, fnmatch, os
rootPath = r"C:\zip"
pattern = '*.zip'
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        print(os.path.join(root, filename))
        outpath = r"C:\Project\new"
        zipfile.ZipFile(os.path.join(root, filename)).extractall(r"C:\Project\new")
UPDATE:
I want to extract all the files located inside the zip files into a single folder, without creating any new subfolders. If there are files with the same name, I need to keep all of them.

The ZipFile.extractall() method simply extracts the files and stores them one by one in the target path, so a later file overwrites an earlier one with the same name. If you want to preserve files with duplicate names, you will have to iterate over the members using ZipFile.namelist() and take appropriate action when you detect duplicates. ZipFile.read() lets you read a member's contents, which you can then write wherever (and under whatever name) you want.
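The approach described above can be sketched as follows. This is a minimal illustration, not the only way to do it: the helper names unique_target and extract_flat and the "_1", "_2" suffix scheme are assumptions made for the example. Each member is read with ZipFile.read() and written straight into the output folder, and a counter is appended whenever the name is already taken.

```python
import os
import zipfile


def unique_target(folder, name):
    """Return a path in `folder` for `name` that does not exist yet."""
    base, ext = os.path.splitext(name)
    candidate = os.path.join(folder, name)
    counter = 1
    while os.path.exists(candidate):
        # Hypothetical naming scheme: report.txt, report_1.txt, report_2.txt, ...
        candidate = os.path.join(folder, "{}_{}{}".format(base, counter, ext))
        counter += 1
    return candidate


def extract_flat(zip_path, out_dir):
    """Write every file in `zip_path` directly into `out_dir`, renaming duplicates."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            name = os.path.basename(member)
            if not name:
                # Directory entries end with "/" and have an empty basename
                continue
            target = unique_target(out_dir, name)
            with open(target, "wb") as handle:
                handle.write(archive.read(member))
            written.append(target)
    return written
```

Calling extract_flat() once per archive found by os.walk() would place everything in one folder without subfolders. Note that read() loads each member fully into memory, which may matter for very large files.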

Related

Using glob to find all zip files recursively in three sub folder

I am trying to look only in three specific subfolders and recursively build a list of all zip files within them. I can easily do this with a single folder, recursively searching every subfolder of the input path, but other folders get created there that we cannot use, and we do not know what their names will be. This is where I am at, and I am not sure how to pass three subfolders to glob correctly.
# using glob, create a list of all the zip files in specified sub directories COMM, NMR, and NMH inside of input_path
zip_file = glob.glob(os.path.join(inputpath, "/comm/*.zip,/nmr/*.zip,/nmh/*.zip"), recursive=True)
#print(zip_file)
print(f"Found {len(zip_file)} zip files")
The string with commas in it is ... just a string. If you want to perform three globs, you need something like
zip_file = []
for dir in {"comm", "nmr", "nmh"}:
    zip_file.extend(glob.glob(os.path.join(inputpath, dir, "*.zip"), recursive=True))
As noted by @Barmar in the comments, if you want to look for zip files anywhere within these folders, the pattern needs to be os.path.join(inputpath, dir, "**/*.zip"). If not, perhaps edit your question to provide an example of the structure you want to traverse.
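Putting the two points of this answer together, a minimal sketch might look like the following. The function name find_zips and its default folder names are illustrative assumptions. With recursive=True, the "**" component matches zero or more directory levels, so zip files directly inside each folder and in any nested folder are both found.

```python
import glob
import os


def find_zips(input_path, subfolders=("comm", "nmr", "nmh")):
    """Collect .zip files at any depth below each named subfolder."""
    found = []
    for name in subfolders:
        # "**" only spans directory levels when recursive=True is passed
        pattern = os.path.join(input_path, name, "**", "*.zip")
        found.extend(glob.glob(pattern, recursive=True))
    return sorted(found)
```

On a case-sensitive file system the folder names must match exactly, so "comm" would not match a folder called "COMM".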

How to unzip a zip file and copy the files into different different folders?

I want to unzip a zip file and copy the individual files into different folders.
For example, I have a zip file named "Feeds.zip" that contains 3 files named A, B and C. I want to copy these 3 files into the folders A1, B1 and C1 respectively.
I have written the code below to unzip the file, and I know how to extract all the files, but as mentioned my requirement is a bit different.
with zipfile.ZipFile('C:/Feeds.zip', "r") as z:
    z.extractall("C:/Desktop/")
Please help.
Instead of using extractall, use namelist to get the names of all the members, then iterate over them and use extract(member, path) to extract them to whatever path you want.
In your example: (where the folders are literally A1,B1,C1)
import os.path
import zipfile
with zipfile.ZipFile('C:/Feeds.zip', "r") as z:
    for member in z.namelist():
        dirname = os.path.basename(member) + "1"
        z.extract(member, dirname)
If the files have extensions and you don't want them to appear in the folder names, use dirname = os.path.basename(member).split('.')[0] + "1".
Of course you can replace dirname with any other folder/path you want for each file. For instance, if you already have an array of paths to which you want to save the files, you can do
for member, path in zip(z.namelist(), paths):
    z.extract(member, path)
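Pairing namelist() with a list of paths relies on the two being in the same order. As an alternative sketch, an explicit mapping from member name to destination folder avoids that dependency. The function name extract_to_folders and the destinations argument are assumptions made for this example. ZipFile.extract() creates the destination folder if needed, recreates any directories that are part of the member's name inside it, and returns the path it wrote.

```python
import zipfile


def extract_to_folders(zip_path, destinations):
    """Extract each member listed in `destinations` into its mapped folder.

    `destinations` maps a member name to the folder it should land in.
    Members without an entry are skipped.
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            folder = destinations.get(member)
            if folder is None:
                continue
            # extract() returns the path of the file it created
            extracted.append(archive.extract(member, folder))
    return extracted
```

For the example in the question, the mapping could be something like {"A": "A1", "B": "B1", "C": "C1"}.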

Create a zip with only .pdf and .xml files from one directory

I would love to know how I can zip only the PDFs from the main directory without including the subfolders.
I've tried changing the code several times, without any success in achieving what I want.
import os
import zipfile
fantasy_zip = zipfile.ZipFile('/home/rob/Desktop/projects/zenjobv2/archivetest.zip', 'w')
for folder, subfolders, files in os.walk('/home/rob/Desktop/projects/zenjobv2/'):
    for file in files:
        if file.endswith('.pdf'):
            fantasy_zip.write(os.path.join(folder, file), os.path.relpath(os.path.join(folder,file), '/home/rob/Desktop/projects/zenjobv2/'), compress_type = zipfile.ZIP_DEFLATED)
        elif file.endswith('.xml'):
            fantasy_zip.write(os.path.join(folder, file), os.path.relpath(os.path.join(folder,file), '/home/rob/Desktop/projects/zenjobv2/'), compress_type = zipfile.ZIP_DEFLATED)
fantasy_zip.close()
I expect that a zip is created only with the .pdfs and .xml files from the zenjobv2 folder/directory without including any other folders/subfolders.
You are looping through the entire directory tree with os.walk(). It sounds like you want to just look at the files in a given directory. For that, consider os.scandir(), which returns an iterator of all files and subdirectories in a given directory. You will just have to filter out elements that are directories:
import os
root = "/home/rob/Desktop/projects/zenjobv2"
for entry in os.scandir(root):
    if entry.is_dir():
        continue  # Just in case there are strangely-named directories
    if entry.path.endswith(".pdf") or entry.path.endswith(".xml"):
        # Process the file at entry.path as you see fit
        print(entry.path)  # placeholder action
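To connect this back to the original goal, here is a minimal sketch that writes the matching top-level files into an archive. The function name zip_top_level and its parameters are assumptions for illustration. It uses os.scandir() so subfolders are never visited, and passes arcname so each file is stored under its bare name rather than its full path.

```python
import os
import zipfile


def zip_top_level(source_dir, archive_path, extensions=(".pdf", ".xml")):
    """Zip files directly inside `source_dir` whose names end with `extensions`."""
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for entry in os.scandir(source_dir):
            # is_file() skips directories; endswith() accepts a tuple of suffixes
            if entry.is_file() and entry.name.lower().endswith(extensions):
                archive.write(entry.path, arcname=entry.name)
                added.append(entry.name)
    return sorted(added)
```

The lower() call makes the match case-insensitive, which is a design choice for the example and can be dropped if only lowercase extensions should match.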

Reading all files that start with a certain string in a directory

Say I have a directory.
In this directory there are single files as well as folders.
Some of those folders could also have subfolders, etc.
What I am trying to do is find all of the files in this directory that start with "Incidences" and read each csv into a pandas data frame.
I am able to loop through all the files and get the names, but cannot read them into data frames.
I am getting the error that "___.csv" does not exist, as it might not be directly in the directory, but rather in a folder in another folder in that directory.
I have been trying the attached code.
inc_files2 = []
pop_files2 = []
for root, dirs, files in os.walk(directory):
    for f in files:
        if f.startswith('Incidence'):
            inc_files2.append(f)
        elif f.startswith('Population Count'):
            pop_files2.append(f)
for file in inc_files2:
    inc_frames2 = map(pd.read_csv, inc_files2)
for file in pop_files2:
    pop_frames2 = map(pd.read_csv, pop_files2)
You are adding only the file names to the lists, not their paths. You can use something like this to add the paths instead:
inc_files2.append(os.path.join(root, f))
You have to include the path from the root directory that os.walk() is currently visiting.
Append the entire pathname, not just the bare filename, to inc_files2.
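You can use os.path.abspath() to get the full path of a file. Note that it resolves a bare name against the current working directory, not against the folder os.walk() is visiting, so join the name with root first.
You can make use of this by making the following changes to your code.
for root, dirs, files in os.walk(directory):
    for f in files:
        f_abs = os.path.abspath(os.path.join(root, f))
        if f.startswith('Incidence'):
            inc_files2.append(f_abs)
        elif f.startswith('Population Count'):
            pop_files2.append(f_abs)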
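The answers above agree on one point: store the joined path, not the bare name. A small sketch of that idea as a reusable helper follows. The name find_by_prefix and its extension parameter are assumptions for illustration.

```python
import os


def find_by_prefix(directory, prefix, extension=".csv"):
    """Return full paths of files under `directory` whose names start with `prefix`."""
    matches = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            if name.startswith(prefix) and name.endswith(extension):
                # Join with root so the path is valid from any working directory
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

Each returned path can then be passed to pd.read_csv, for example with a list comprehension over find_by_prefix(directory, "Incidence").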

Unable to use getsize method with os.walk() returned files

I am trying to make a small program that looks through a directory (as I want to find recursively all the files in the sub directories I use os.walk()).
Here is my code:
import os
import os.path
filesList=[]
path = "C:\\Users\Robin\Documents"
for(root,dirs,files) in os.walk(path):
    for file in files:
        filesList+=file
Then I try to apply the os.path.getsize() method to the elements of filesList, but it doesn't work.
Indeed, I realize that this code fills the list filesList with single characters. I don't know what to do. I have tried several other things, such as:
for(root,dirs,files) in os.walk(path):
    filesList+=[file for file in os.listdir(root) if os.path.isfile(file)]
This does give me files, but only one, which isn't even visible when looking in the directory.
Can someone explain to me how to use os.walk to obtain files that we can work with (that is to say, get their size, hash them, or modify them)?
I am new to Python, and I don't really understand how to use os.walk().
The issue I suspect you're running into is that file contains only the filename itself, not any directories you have to navigate through from your starting folder. You should use os.path.join to combine the file name with the folder it is in, which is the root value yielded by os.walk:
for(root,dirs,files) in os.walk(path):
    for file in files:
        filesList.append(os.path.join(root, file))
Now all the filenames in filesList will be acceptable to os.path.getsize and other functions (like open).
I also fixed a secondary issue, which is that your use of += to extend a list wouldn't work the way you intended. You'd need to wrap the new file path in a list for that to work. Using append is more appropriate for adding a single value to the end of a list.
If you want to get a list of files including their paths, use:
for(root, dirs, files) in os.walk(path):
    fullpaths = [os.path.join(root, fil) for fil in files]
    filesList+=fullpaths
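Once the list holds full paths, os.path.getsize() works on every entry. As a minimal sketch of the whole flow, the hypothetical helper file_sizes below (the name is an assumption for illustration) walks a tree and records each file's size in bytes.

```python
import os


def file_sizes(path):
    """Map the full path of every file under `path` to its size in bytes."""
    sizes = {}
    for root, dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            sizes[full_path] = os.path.getsize(full_path)
    return sizes
```

Summing the dictionary's values gives the total size of the tree, and the same full paths can be passed to open() for hashing or modification.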
