Hi everyone. I'm really new to Python, so I need some help here.
I have a list of folder names inside a .CSV file. All these folders are inside the same path.
I need to zip them individually (each one needs to become a .ZIP file, maintaining its own original name) and, after zipping, delete the original folders.
Tried some things here, but had no success :(
I read about zipfiles, os.walk, import csv, but I can't get these things together.
Can someone help me with this one?
The code is here. I'm really sorry, it probably makes no sense as it is. I'm really a beginner :(
import os
import zipfile
import csv
import sys
os.chdir('dir')
files='dir'
for i in range (len(files)):
    with zipfile.ZipFile(files(i)+'.zip', 'w') as zipMe:
        zipMe.write(files[i], compress_type=zipfile.ZIP_DEFLATED)
The read_csv() function is called to open the CSV file and read the directory names from the CSV into a list named dir_names.
The read_csv() function calls the zip_dirs() function and passes the dir_names list containing the directory names. The zip_dirs() function zips each directory into a zip file and saves it.
The zip_dirs() function calls the delete_dirs() function and it deletes the original directories.
import os
import csv
import shutil
import zipfile
dir_names = []
dir_csv = 'dir_names.csv'
parent_dir = 'dir-to-zip/'
# delete directories (shutil.rmtree also removes non-empty directories)
def delete_dirs(dir_names):
    for name in dir_names:
        shutil.rmtree(os.path.join(parent_dir, name))
# zip directories
def zip_dirs(dir_names):
    for name in dir_names:
        src = os.path.join(parent_dir, name)
        with zipfile.ZipFile(name + '.zip', "w", zipfile.ZIP_DEFLATED) as zip_process:
            # ZipFile.write() does not recurse, so add every file explicitly
            for root, _, files in os.walk(src):
                for file in files:
                    path = os.path.join(root, file)
                    zip_process.write(path, os.path.relpath(path, parent_dir))
    delete_dirs(dir_names)
# read the directory names from csv file
def read_csv():
    with open(dir_csv, 'r', newline='') as f:
        csv_reader = csv.reader(f, delimiter=',')
        for row in csv_reader:
            if row:  # ignore blank lines
                dir_names.append(row[0])
    zip_dirs(dir_names)
read_csv()
CSV Input:
| ------|
| dir1 |
| dir2 |
| dir3 |
| dir4 |
Directory Structure:
./dir-to-zip
    /dir1
    /dir2
    /dir3
    /dir4
Using the shutil library in Python
#importing libraries
import pandas as pd
import os
import shutil
### *****below directories are simply examples; please set your own directories*****
folder_directory = 'D:\\Zip test' #setting directory where original folders are present
os.chdir(folder_directory) #changing current directory so that zip files are formed in the original directory
df = pd.read_csv(r'C:\Users\sahaa\Downloads\folder_names.csv') # reading the csv file with folder names
fol_nam = df['Folder Names'].tolist() # creating a list of the folders to be zipped which are present under the column 'Folder Names' in the CSV.
for i in fol_nam: # for loop
    shutil.make_archive(i, 'zip', os.path.join(folder_directory, i)) # zipping each folder and keeping the nomenclature same
    shutil.rmtree(os.path.join(folder_directory, i)) # deleting original folder after zipping
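If pandas is not already installed, the same idea can be sketched with only the standard library. This is a minimal sketch, not a definitive implementation: the function name `zip_and_remove` is made up for illustration, and it assumes the folder names sit in the first CSV column. It checks each archive with `ZipFile.testzip()` before deleting the original folder, so a failed write should not cost you the data.

```python
import csv
import shutil
import zipfile
from pathlib import Path


def zip_and_remove(csv_path, parent_dir):
    """Zip each folder named in the first CSV column, then delete the folder.

    Returns the list of archives created. Names that are not existing
    directories under parent_dir are skipped.
    """
    parent = Path(parent_dir)
    archives = []
    with open(csv_path, newline="") as f:
        names = [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]
    for name in names:
        folder = parent / name
        if not folder.is_dir():
            continue  # skip header rows, typos, or folders already processed
        # make_archive appends ".zip" itself and recurses into the folder
        archive = Path(shutil.make_archive(str(folder), "zip", root_dir=folder))
        # testzip() returns None when every member passes its CRC check
        with zipfile.ZipFile(archive) as zf:
            if zf.testzip() is not None:
                raise RuntimeError(f"corrupt archive: {archive}")
        shutil.rmtree(folder)
        archives.append(archive)
    return archives
```

Calling `zip_and_remove('folder_names.csv', 'D:\\Zip test')` would leave `name.zip` next to where each folder used to be; both arguments are placeholders for your own paths.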
Try this
└───main_dir
├───dir01
├───dir02
└───dir03
....
import shutil
import os
def compress_directory(source_dir, output_filename, kind='zip'):
    shutil.make_archive(output_filename, kind, source_dir)
main_dir = r'S:\repo\tools\main_dir'
# list sub folders (skip plain files, which shutil.rmtree cannot remove)
all_dirs = [os.path.join(main_dir, sub_dir) for sub_dir in os.listdir(main_dir)
            if os.path.isdir(os.path.join(main_dir, sub_dir))]
# zip sub dirs
for sub_dir in all_dirs:
    compress_directory(sub_dir, sub_dir)
    # remove sub directory
    shutil.rmtree(sub_dir)
# zip main dir
compress_directory(main_dir, main_dir)
# remove main directory
shutil.rmtree(main_dir)
...
└───main_dir.zip
├───dir01.zip
├───dir02.zip
└───dir03.zip
I have a folder with a lot of tutorial links, so I wanted to create a script that reads each file name and, for instance, if the name contains the word "VBA" or "Excel", creates the folder Excel and moves the file into it. The same would happen with files containing the word "python".
The code runs, but nothing happens and the files are still in the same directory. Does anyone have an idea of what I'm doing wrong?
Here is what I have in the folder; all files are links to YouTube tutorials or websites:
Please see my code below:
import os
import shutil
os.chdir(r"C:\Users\RMBORGE\Desktop\Useful stuff")
path = r"C:\Users\RMBORGE\Desktop\Useful stuff\Excel"
for f in os.listdir():
    if "vba" in f:
        shutil.move(os.chdir,path)
Try this
import os
import shutil
path_to_files = 'some path'
files = os.listdir(path_to_files)
for f in files:
    if 'Excel' in f:
        created_folder = os.path.join(path_to_files, 'Excel')
        filepath = os.path.join(path_to_files, f)
        os.makedirs(created_folder, exist_ok=True)
        shutil.move(filepath, created_folder)
NB: You can add more if statements for different keywords like Excel
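Rather than one `if` per keyword, the keywords could live in a dictionary. The sketch below is only an illustration: the `RULES` mapping and the `sort_files` name are made up, and the match is case-insensitive so that "VBA" and "vba" both count. It skips directories so the destination folders are never moved into themselves.

```python
import shutil
from pathlib import Path

# Hypothetical mapping: destination folder name -> keywords that select it
RULES = {
    "Excel": ("excel", "vba"),
    "Python": ("python",),
}


def sort_files(folder, rules=RULES):
    """Move each file in `folder` into the first sub-folder whose keyword matches."""
    folder = Path(folder)
    moved = []
    for item in list(folder.iterdir()):
        if not item.is_file():
            continue  # never move the destination folders themselves
        lowered = item.name.lower()
        for target, keywords in rules.items():
            if any(k in lowered for k in keywords):
                dest = folder / target
                dest.mkdir(exist_ok=True)
                shutil.move(str(item), str(dest / item.name))
                moved.append((item.name, target))
                break  # a file can only be moved once
    return moved
```

The function returns a list of `(file name, folder)` pairs, which is handy for checking what happened before trusting it on a real folder.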
Use pathlib's mkdir to create the folders. Put the folders/keywords you want to sort by in the list 'folders'. What is important is skipping the folders, because os.listdir() returns folders as well (and there is an error if you try to move a folder into itself).
import os
import shutil
import pathlib
folders = ["vba", "Excel"]
path = "/home/vojta/Desktop/THESIS/"
for f in os.listdir(path):  # list the same directory the joins below use
    if not os.path.isdir(os.path.join(path, f)): # skip folders
        for fol in folders:
            if fol in f:
                pathlib.Path(os.path.join(path, fol)).mkdir(parents=True, exist_ok=True)
                fol_path = os.path.join(path, fol)
                shutil.move(os.path.join(path, f), os.path.join(fol_path, f))
                break  # the file is gone now; do not try a second keyword
Hi, I'm working on a simple script that copies files from one directory to another based on a DataFrame that contains a list of invoices.
Is there any way to do this as a partial match? I want all the files whose names contain "F11000", "G13000", and so on, continuing the loop until there is no more data in the DataFrame.
I tried to figure it out by myself, and I'm pretty sure changing the "x" in the copy function will do the trick, but I can't see how.
import pandas as pd
import os
import glob
import shutil
data = {'Invoice':['F11000','G13000','H14000']}
df = pd.DataFrame(data,columns=['Invoice'])
path = 'D:/Pyfilesearch'
dest = 'D:/Dest'
def find(name,path):
    for root,dirs,files in os.walk(path):
        if name in files:
            return os.path.join(root,name)
def copy():
    for x in df['Invoice']:
        shutil.copy(find(x,path),dest)
copy()
Using pathlib
This is part of the standard library
Treats paths as objects with methods instead of strings
Python 3's pathlib Module: Taming the File System
Script assumes dest is an existing directory.
.rglob searches subdirectories for files
from pathlib import Path
import pandas as pd
import shutil
# convert paths to pathlib objects
path = Path('D:/Pyfilesearch')
dest = Path('D:/Dest')
# find files and copy
for v in df.Invoice.unique(): # iterate through unique column values
    files = list(path.rglob(f'*{v}*')) # create a list of files for a value
    files = [f for f in files if f.is_file()] # if not using file extension, verify item is a file
    for f in files: # iterate through and copy files
        print(f)
        shutil.copy(f, dest)
Copy to subdirectories for each value
path = Path('D:/Pyfilesearch')
for v in df.Invoice.unique():
    dest = Path('D:/Dest')
    files = list(path.rglob(f'*{v}*'))
    files = [f for f in files if f.is_file()]
    dest = dest / v # create path with value
    if not dest.exists(): # check if directory exists
        dest.mkdir(parents=True) # if not, create directory
    for f in files:
        shutil.copy(f, dest)
I have one folder that contains 5 sub-folders.
Each sub-folder contains 'x.txt', 'y.txt' and 'z.txt' files, and the same set repeats in every sub-folder.
Now I need to read and print only the 'y.txt' file from all sub-folders.
My problem is that I'm unable to read and print the 'y.txt' files. Can you tell me how to solve this problem?
Below is the code I have written for reading the y.txt file:
import os, sys
import pandas as pd
file_path = ('/Users/Naga/Desktop/Python/Data')
for root, dirs, files in os.walk(file_path):
    for name in files:
        print(os.path.join(root, name))
pd.read_csv('TextInformation.txt',delimiter=";", names = ['Name', 'Value'])
error :File TextInformation.txt does not exist: 'TextInformation.txt'
You could also try the following approach to fetch all y.txt files from your subdirectories:
import glob
import pandas as pd
# get all y.txt files from all subdirectories
all_files = glob.glob('/Users/Naga/Desktop/Python/Data/*/y.txt')
for file in all_files:
    data_from_this_file = pd.read_csv(file, sep=" ", names = ['Name', 'Value'])
    # do something with the data
Subsequently, you can apply your code to all the files within the list all_files. The great thing about glob is that you can use wildcards (*). With them you don't need the names of the subdirectories (you can even use a wildcard within the filename, e.g. *y.txt). Also see the documentation on glob.
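The single `*` in the pattern above only reaches one level of subdirectories. If the files may sit deeper, glob's `**` wildcard with `recursive=True` can be used instead. A minimal sketch, where `find_named_files` is just an illustrative name:

```python
import glob
import os


def find_named_files(base_dir, filename="y.txt"):
    """Return every file called `filename` at any depth below base_dir."""
    pattern = os.path.join(base_dir, "**", filename)
    # recursive=True lets "**" match zero or more directory levels
    return sorted(glob.glob(pattern, recursive=True))
```

Note that `**` also matches zero levels, so a `y.txt` placed directly in `base_dir` is returned too.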
Your issue is that you forgot to add the parent path of the 'y.txt' file. I suggest this code; I hope it helps.
import os
pth = '/Users/Naga/Desktop/Python/Data'
list_sub = os.listdir(pth)
filename = 'TextInformation.txt'
for sub in list_sub:
    # 'with' closes each file once it has been read
    with open('{}/{}/{}'.format(pth, sub, filename), 'r') as f:
        TextInfo = f.read()
    print(TextInfo)
Here is a little code. You can personalize it any way you like, but it should work for you.
import os
for dirPath, foldersInDir, fileName in os.walk(path_to_main_folder):
    for file in fileName:
        if file.endswith('y.txt'):
            loc = os.sep.join([dirPath, file])
            # 'with' closes the file after reading
            with open(loc) as y_txt:
                y = y_txt.read()
            print(y)
But keep in mind that path_to_main_folder is the path that contains the subfolders.
I am new to Python and currently work on data analysis.
I am trying to open multiple folders in a loop and read all the files in those folders.
E.g., the working directory contains 10 folders that need to be opened, and each folder contains 10 files.
My code for opening the .txt files of one folder:
file_open = glob.glob("home/....../folder1/*.txt")
I want to open folder 1 and read all files, then go to folder 2 and read all files... until folder 10 and read all files.
Can anyone help me write a loop to open the folders, including any library that needs to be used?
My background is in R; for example, in R I could write a loop to open folders and files using the code below.
folder_open <- dir("......./main/")
for (n in 1 to length of (folder_open)){
file_open <-dir(paste0("......./main/",folder_open[n]))
for (k in 1 to length of (file_open){
file_open<-readLines(paste0("...../main/",folder_open[n],"/",file_open[k]))
//Finally I can read all folders and files.
}
}
This recursive method will scan all directories within a given directory and then print the names of the txt files. I kindly invite you to take it forward.
import os
def scan_folder(parent):
    # iterate over all the files in directory 'parent'
    for file_name in os.listdir(parent):
        if file_name.endswith(".txt"):
            # if it's a txt file, print its name (or do whatever you want)
            print(file_name)
        else:
            current_path = "".join((parent, "/", file_name))
            if os.path.isdir(current_path):
                # if we're checking a sub-directory, recursively call this method
                scan_folder(current_path)
scan_folder("/example/path") # Insert parent directory's path
Given the following folder/file tree:
C:.
├───folder1
│       file1.txt
│       file2.txt
│       file3.csv
│
└───folder2
        file4.txt
        file5.txt
        file6.csv
The following code will recursively locate all .txt files in the tree:
import os
import fnmatch
for path,dirs,files in os.walk('.'):
    for file in files:
        if fnmatch.fnmatch(file,'*.txt'):
            fullname = os.path.join(path,file)
            print(fullname)
Output:
.\folder1\file1.txt
.\folder1\file2.txt
.\folder2\file4.txt
.\folder2\file5.txt
Your glob() pattern is almost correct. Try one of these:
file_open = glob.glob("home/....../*/*.txt")
file_open = glob.glob("home/....../folder*/*.txt")
The first one will examine all of the text files in any first-level subdirectory of home/......, whatever that is. The second will limit itself to subdirectories named like "folder1", "folder2", etc.
I don't speak R, but this might translate your code:
for filename in glob.glob("......../main/*/*.txt"):
    with open(filename) as file_handle:
        for line in file_handle:
            # process each line of text here
            pass
I think a nice way to do that would be to use os.walk. It generates the directory tree, and you can then iterate through that tree.
import os
directory = './'
for d in os.walk(directory):
    print(d)
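Each item that os.walk yields is a `(root, dirs, files)` tuple, so it is usually unpacked directly in the `for` statement. As a rough sketch of taking this one step further (the function name `list_text_files` is made up for illustration), the .txt files can be grouped by the folder that contains them:

```python
import os


def list_text_files(directory):
    """Group .txt file paths by the folder that contains them."""
    found = {}
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # in-place edit makes the traversal order deterministic
        txt = [os.path.join(root, f) for f in sorted(files) if f.endswith(".txt")]
        if txt:
            found[root] = txt
    return found
```

From there, looping over `found.items()` gives one folder at a time together with the files to open in it.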
This code will look for all directories inside a directory, printing out the names of all files found there:
#--------*---------*---------*---------*---------*---------*---------*---------*
# Desc: print filenames one level down from starting folder
#--------*---------*---------*---------*---------*---------*---------*---------*
import os, fnmatch, sys
def find_dirs(directory, pattern):
    for item in os.listdir(directory):
        if os.path.isdir(os.path.join(directory, item)):
            if fnmatch.fnmatch(item, pattern):
                filename = os.path.join(directory, item)
                yield filename
def find_files(directory, pattern):
    for item in os.listdir(directory):
        if os.path.isfile(os.path.join(directory, item)):
            if fnmatch.fnmatch(item, pattern):
                filename = os.path.join(directory, item)
                yield filename
#--------*---------*---------*---------*---------*---------*---------*---------#
while True:# M A I N L I N E #
#--------*---------*---------*---------*---------*---------*---------*---------#
    # # Set directory
    os.chdir("C:\\Users\\Mike\\Desktop")
    for filedir in find_dirs('.', '*'):
        print ('Got directory:', filedir)
        for filename in find_files(filedir, '*'):
            print (filename)
    sys.exit() # END PROGRAM
pathlib is a good choice
from pathlib import Path
# or use: glob('**/*.txt')
for txt_path in [_ for _ in Path('demo/test_dir').rglob('*.txt') if _.is_file()]:
    print(txt_path.absolute())
I have a folder (not zipped) containing multiple zip files (no other file type within folder). Each zip has the same type of text files containing different data saved within.
I know how to read in each separately, but I am looking to loop the process without having to type in each zip name. The zipfile archive does not seem to allow wild cards, so I cannot loop using this method. Is it possible to loop the process using glob?
The goal is to get the agency names without extracting all the zipfiles.
Single file read
import os
os.listdir('C:\\NTM\\Test\\')
['00003_32_332.zip', '00011_273_569.zip', '00012_258_276.zip']
import glob
glob.glob('C:\\NTM\\Test\\*.zip')
['C:\\NTM\\Test\\00003_32_332.zip', 'C:\\NTM\\Test\\00011_273_569.zip', 'C:\\NTM\\Test\\00012_258_276.zip']
import zipfile
archive=zipfile.ZipFile('C:\\NTM\\Test\\00011_273_569.zip')
testagency=archive.open('agency.txt')
testagency.read()
'agency_id,agency_name,nVRT,ValleyRide'
Update:
Now, that I can loop through the zip files and loop through to get the text file - I cannot print the agency_name from all of the zip files in the folder. My current code only prints the name of the last agency from the text file of the last zip file in the folder. Am I missing some compound statement structure?
def csv_dict_reader(file_obj):
    reader=csv.DictReader(file_obj, delimiter=',')
    for row in reader:
        print(row['agency_name'])
if __name__ == '__main__':
    with archive.open('agency.txt') as f_obj:
        csv_dict_reader(f_obj)
Whatcom Transportation Authority
Sample Code
import glob
import zipfile
dirName = '/backup/'
zipList = glob.glob(dirName+'*.zip')
for zipname in zipList:
    archive = zipfile.ZipFile(zipname)
    fileList = archive.namelist()
    for fileName in fileList:
        if fileName.endswith('.txt'):
            archive.extract(fileName)
    archive.close()
Thanks Jean-Francois!
for archive_name in glob.glob('C:\\NTM\\Test\\*.zip'):
    archive=zipfile.ZipFile(archive_name)
    testagency=archive.open('agency.txt')
    testagency.read()
I could not comment on Fuji Komalan's answer, so here is the fixed code.
import glob
import zipfile
dirName = 'C:/test/'
zipList = glob.glob(dirName + '*.zip')
print(zipList)
for zipname in zipList:
    archive = zipfile.ZipFile(zipname)
    fileList = archive.namelist()
    for fileName in fileList:
        if fileName.endswith('.txt'):
            archive.extract(fileName)
            print(fileName)
    archive.close()
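For the original goal of getting the agency names without extracting anything, the two loops can be combined: open each archive, open `agency.txt` inside it, and collect the `agency_name` column before moving on to the next zip. This is only a sketch under a few assumptions: the function name `agency_names` is made up, every `agency.txt` is assumed to have a header row with an `agency_name` column, and the files are assumed to be UTF-8 (the `utf-8-sig` codec also tolerates a byte-order mark).

```python
import csv
import glob
import io
import os
import zipfile


def agency_names(zip_dir, member="agency.txt"):
    """Collect the agency_name column from `member` inside every zip in zip_dir."""
    names = []
    for zip_path in sorted(glob.glob(os.path.join(zip_dir, "*.zip"))):
        with zipfile.ZipFile(zip_path) as archive:
            if member not in archive.namelist():
                continue  # this archive has no agency file
            with archive.open(member) as raw:
                # ZipFile.open() yields bytes, while csv needs text
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                for row in csv.DictReader(text):
                    names.append(row["agency_name"])
    return names
```

Because the names are appended inside the loop over archives, the result covers every zip in the folder rather than only the last one opened.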