Python: Unzip selected files in directory tree - python

I have the following directory, in the parent dir there are several folders lets say ABCD and within each folder many zips with names as displayed and the letter of the parent folder included in the name along with other info:
-parent--A-xxxAxxxx_timestamp.zip
-xxxAxxxx_timestamp.zip
-xxxAxxxx_timestamp.zip
--B-xxxBxxxx_timestamp.zip
-xxxBxxxx_timestamp.zip
-xxxBxxxx_timestamp.zip
--C-xxxCxxxx_timestamp.zip
-xxxCxxxx_timestamp.zip
-xxxCxxxx_timestamp.zip
--D-xxxDxxxx_timestamp.zip
-xxxDxxxx_timestamp.zip
-xxxDxxxx_timestamp.zip
I need to unzip only selected zips in this tree and place them in the same directory with the same name without the .zip extension.
Output:
-parent--A-xxxAxxxx_timestamp
-xxxAxxxx_timestamp
-xxxAxxxx_timestamp
--B-xxxBxxxx_timestamp
-xxxBxxxx_timestamp
-xxxBxxxx_timestamp
--C-xxxCxxxx_timestamp
-xxxCxxxx_timestamp
-xxxCxxxx_timestamp
--D-xxxDxxxx_timestamp
-xxxDxxxx_timestamp
-xxxDxxxx_timestamp
My effort:
for path in glob.glob('./*/xxx*xxxx*'): ##walk the dir tree and find the files of interest
zipfile=os.path.basename(path) #save the zipfile path
zip_ref=zipfile.ZipFile(path, 'r')
zip_ref=extractall(zipfile.replace(r'.zip', '')) #unzip to a folder without the .zip extension
The problem is that i dont know how to save the A,B,C,D etc to include them in the path where the files will be unzipped. Thus, the unzipped folders are created in the parent directory. Any ideas?

The code that you have seems to be working fine, you just to make sure that you are not overriding variable names and using the correct ones. The following code works perfectly for me
import os
import zipfile
import glob
for path in glob.glob('./*/xxx*xxxx*'): ##walk the dir tree and find the files of interest
zf = os.path.basename(path) #save the zipfile path
zip_ref = zipfile.ZipFile(path, 'r')
zip_ref.extractall(path.replace(r'.zip', '')) #unzip to a folder without the .zip extension

Instead of trying to do it in a single statement , it would be much easier and more readable to do it by first getting list of all folders and then get list of files inside each folder. Example -
import os.path
for folder in glob.glob("./*"):
#Using *.zip to only get zip files
for path in glob.glob(os.path.join(".",folder,"*.zip")):
filename = os.path.split(path)[1]
if folder in filename:
#Do your logic

Related

Moving Files from directory to their own folders

I am struggling with the paths and directories to solve this problem. Basically, I have a long list of .lammps files in one directory. My goal is to copy each file and move it into its own folder (which is one directory back) where its folder name is the file name minus the .lammps. All of the folders are already made, I just can't seem to figure out moving them. The entire list of files is in the Files directory. The individual folders are in the ROTATED FILES directory. Here is what I have. Any tips greatly appreciated.
Here is a file example
n-optimized.new.10_10-90-10_10.Ni00Nj01.lammps
The folder for this file is then named
n-optimized.new.10_10-90-10_10.Ni00Nj01
import os
file_directory = os.chdir("C:\Py Practice\ROTATED FILES\Files")
files = os.listdir()
for file in files:
# get the file -.lammps string
name1 = file.split('.')[0:4]
name2 = ".".join(name1)
# get the path for the files new respective folder (back a directory and paste folder name)
file_folder = "C:\Py Practice\ROTATED FILES/" + name2
# Move
combined_path = os.path.join(file, file_folder)
I've tried shutil and figured path join might be easier.
First of all, the code you have here shouldn't work since you either have to escape backslashes or use a raw string. Secondly, rather than using os for file system operations, it's much better to learn how to use pathlib (also a core python module) which provides a more modern object-oriented approach to file operations.
Using pathlib and shutil you can do something like
from pathlib import Path
from shutil import copyfile
file_directory = Path(r"C:\Py Practice\ROTATED FILES\Files")
# get the list of source files
source_files = [f for f in file_directory.glob('*.lammps')]
# create target file paths
target_files = [file_directory.parent / f.stem/ f.name for f in source_files]
for source, target in zip(source_files, target_files):
copyfile(str(source), str(target))
Here we're accessing different parts of file path using a convenient OOP structure. For example, if your file f is located in 'c:/foo/bar/boo.txt' then f.name is just the name of file: boo.txt, f.stem is the stem part of the file name (excluding the extension) boo, f.parent is its parent directory 'c:/foo/bar/' etc.
There's a really handy graphic of pathlib Path objects here.
The only inconvenience is that not all of core modules support Path objects yet so for copyfile we just need to get the string representation by calling str on the object.
And you don't even need to have target folders created beforehand, it's very easy to create the necessary folder structure as you go along:
from pathlib import Path
from shutil import copyfile
file_directory = Path(r"C:\Py Practice\ROTATED FILES\Files")
# get the list of source files
source_files = [f for f in file_directory.glob('*.lammps')]
# create target file paths
target_files = [file_directory.parent / f.stem/ f.name for f in source_files]
for source, target in zip(source_files, target_files):
# check that target directory exists
# and create a folder if not
if not target.parent.is_dir():
target.parent.mkdir()
copyfile(str(source), str(target))

Sorting through many files and unzipping them at once

I have a program that currently unzips files in a specific folder. However, I have many files in t that need to be sorted through. In my trades folder there are many coins and each coin has many files. I can get the files in each coin but cannot go through the folders initially for each coin. I need to be able to go through all those files without manually changing the directory to each coin.
This is how the files are set up
import os
import zipfile
dir_name = 'E:/binance-public-data/python/data/spot/monthly/trades'
extension = ".zip"
for item in os.listdir(dir_name):
if item.endswith(extension):
file_name = os.path.abspath(item)
zip_ref = zipfile.ZipFile(file_name)
zip_ref.extractall(dir_name)
zip_ref.close()
os.remove(file_name)
os.listdir gives you the file name in the directory. You would have to rebuild its path from your current directory before accessing the file. os.path.abspath(item) created an absolute path from your current working directory, not the directory holding your zip file.
You can use glob instead. It will filter out non-zip file extensions and will keep the relative path to the found file so you don't have to fiddle with the path. The pathlib module makes path handling a bit easier than glob.glob and os.path, so I'll use it here.
import zipfile
from pathlib import Path
dir_name = Path('E:/binance-public-data/python/data/spot/monthly/trades')
extension = ".zip"
for file_name in dir_name.glob("*/*" + extension):
with zipfile.ZipFile(file_name) as zip_ref:
zip_ref.extractall(file_name.parent)
file_name.unlink()

Python: Moving csv files after reading

I'm wanting to move .csv files after reading them.
The code I've come up with is to move any .csv files found in a folder, then direct to an archive folder.
src1 = "\\xxx\xxx\Source Folder"
dst1 = "\\xxx\xxx\Destination Folder"
for root, dirs, files in os.walk(src1):
for f in files:
if f.endswith('.csv'):
shutil.move(os.path.join(root,f), dst1)
Note: I imported shutil at the beginning of my code.
Note 2: The destination archive folder is within the source folder - will this have implications for the above code?
When I run this, nothing happens. I get no error messages and the file remains in the source folder.
Any insight is appreciated.
Edit (some context on my goal):
My overall code will be used to read .csv files that are moved manually into a source folder by users - I then want to archive these .csv files using Python once the data has been used. Every .csv file placed into the source folder by the users will have a different name - no .csv file name will be the same, which is why I want to search the source folder for .csv files and move them all.
You can use the pathlib module. I'm assuming you have got the same folder structure in the destination directory.
from pathlib import Path
src1 = "<Path to source folder>"
dst1 = "<Path to destination folder>"
for csv_file in Path(src1).glob('**/*.csv'):
relative_file_path = csv_file.relative_to(src1)
destination_path = dst1 / relative_file_path
csv_file.rename(destination_path)
Explanation-
for csv_file in Path(src1).glob('**/*.csv'):
The glob(returns generator object) will capture all the CSV files in the directory as well as in the subdirectory. Now, we can iterate over the files one by one.
relative_file_path = csv_file.relative_to(src1)
All the csv_files are now pathlib path objects. So, we can use the functions that the library provides. One such function is relative to. Here It'll copy the relative path of the file from the src folder. Let's say you have a CSV file like-
scr_folder/A/B/c.csv - It'll copy A/B/c.csv
destination_path = dst1 / relative_file_path
As the folder structure is the same the destination path now becomes -
dst_folder/A/B/c.csv
csv_file.rename(destination_path)
At Last, rename will just move the file from src to destination.
After a bunch of research I have found a solution:
import shutil
source = r"\\xx\Source"
destination = r"\\xx\Destination"
files = os.listdir(source)
for file in files:
new_path = shutil.move(f"{source}/{file}", destination)
print(new_path)
I was making it more complicated than it needed to be - because all files in the folder would be .csv anyway, I just needed to move all files. Thanks stackoverlfow.

How to create zip of a specific folder using shutil

I'm trying to zip all the folders of a particular directory separately with their respective folder names as the zip file name just the way how winzip does.My code below:
folder_list = os.walk('.').next()[1] # bingo
print folder_list
for each_folder in folder_list:
shutil.make_archive(each_folder, 'zip', os.getcwd())
But what it is doing is creating a zip of one folder and dumping all other files and folders of that directory into the zip file.LIke that it is doing for all the folders inside the current directory.
Any help on this !!!
With a little more research in shutil, I'm now able to make my code work. Below is my code:
import os, shutil
#Get the list of all folders present within the particular directory
folder_list = os.walk('.').next()[1]
#Start zipping the folders
for each_folder in folder_list:
shutil.make_archive(each_folder, 'zip', os.getcwd() + "\\" + each_folder)
I don;t think that is possible. You could look at the source.
In particular, at line 683 you can see that it explicitly passes compression=zipfile.ZIP_DEFLATED if your Python has the zipfile module, while the fallback code at line 635 doesn't pass any arguments besides -r and -q to the zip command-line tool.
You could try this one:
args = ['tar', '-zcf', dest_file_name, '-C', me_directory, '.']
res = subprocess.call(args)
If you want to get list of directories use some well written libs:
for (root, directories, _) in os.walk(my_dir):
for dir_name in directories:
path_to_dir = os.path.join(root, dir_name)// don't make concat, like a+'//'+b, thats not enviroment saint

Directory is not being recognized in Python

I'm uploading a zipped folder that contains a folder of text files, but it's not detecting that the folder that is zipped up is a directory. I think it might have something to do with requiring an absolute path in the os.path.isdir call, but can't seem to figure out how to implement that.
zipped = zipfile.ZipFile(request.FILES['content'])
for libitem in zipped.namelist():
if libitem.startswith('__MACOSX/'):
continue
# If it's a directory, open it
if os.path.isdir(libitem):
print "You have hit a directory in the zip folder -- we must open it before continuing"
for item in os.listdir(libitem):
The file you've uploaded is a single zip file which is simply a container for other files and directories. All of the Python os.path functions operate on files on your local file system which means you must first extract the contents of your zip before you can use os.path or os.listdir.
Unfortunately it's not possible to determine from the ZipFile object whether an entry is for a file or directory.
A rewrite or your code which does an extract first may look something like this:
import tempfile
# Create a temporary directory into which we can extract zip contents.
tmpdir = tempfile.mkdtemp()
try:
zipped = zipfile.ZipFile(request.FILES['content'])
zipped.extractall(tmpdir)
# Walk through the extracted directory structure doing what you
# want with each file.
for (dirpath, dirnames, filenames) in os.walk(tmpdir):
# Look into subdirectories?
for dirname in dirnames:
full_dir_path = os.path.join(dirpath, dirname)
# Do stuff in this directory
for filename in filenames:
full_file_path = os.path.join(dirpath, filename)
# Do stuff with this file.
finally:
# ... Clean up temporary diretory recursively here.
Usually to make things handle relative paths etc when running scripts you'd want to use os.path.
It seems to me that you're reading from a Zipfile the items you've not actually unzipped it so why would you expect the file/dirs to exist?
Usually I'd print os.getcwd() to find out where I am and also use os.path.join to join with the root of the data directory, whether that is the same as the directory containing the script I can't tell. Using something like scriptdir = os.path.dirname(os.path.abspath(__file__)).
I'd expect you would have to do something like
libitempath = os.path.join(scriptdir, libitem)
if os.path.isdir(libitempath):
....
But I'm guessing at what you're doing as it's a little unclear for me.

Categories