Python: Getting files into an archive without the directory? - python

I've been learning Python for about three weeks, and I'm trying to write a small script that sorts about 10,000 files by the keywords and date appearing in the filename. Files dated before a given date should be added to an archive. The sorting works fine, but the archiving does not.
It creates an archive, and the name is fine, but the archive contains the complete path to the files.
If I open it, it looks like: folder1 -> folder2 -> folder3 -> files.
How can I change it so that the archive contains only the files and not the whole directory structure?
Below is a snippet with my zip function. node is the path where the files were before sorting, folder is a subfolder with the files sorted by a keyword in the name, and items are the folders with files sorted by date.
I am using Python 2.6.
def ZipFolder(node, zipdate):
    xynode = node + '/xy'
    yznode = node + '/yz'
    for folder in [xynode,yznode]:
        items = os.listdir(folder)
        for item in items:
            itemdate = re.findall('(?<=_)\d\d\d\d-\d\d', item)
            print item
            if itemdate[0] <= zipdate:
                arcname = str(item) + '.zip'
                x = zipfile.ZipFile(folder + '/' + arcname, mode='w', compression = zipfile.ZIP_DEFLATED)
                files = os.listdir(folder + '/' + item)
                for f in files:
                    x.write(folder + '/' + item + '/' + f)
                    print 'writing ' + str(folder + '/' + item + '/' + f) + ' in ' + str(item)
                x.close()
                shutil.rmtree(folder + '/' + item)
    return
I am also open to any suggestions and improvements.

From help(zipfile):
| write(self, filename, arcname=None, compress_type=None)
| Put the bytes from filename into the archive under the name
| arcname.
So try changing your write() call to:
x.write(folder + '/' + item + '/' + f, arcname = f)
About your code: it seems good enough to me, especially for someone three weeks into Python, although a few comments would have been welcome ;-)
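As a rough illustration of the arcname fix, here is a minimal sketch in Python 3 syntax. The question uses Python 2.6, where ZipFile cannot yet be used in a with statement, so the explicit close() would stay there. The function name zip_folder_flat is made up for this example, and it assumes the archive is written somewhere outside the folder being zipped.

```python
import os
import zipfile


def zip_folder_flat(folder, archive_path):
    """Zip the regular files directly inside `folder`, storing bare file names.

    Hypothetical helper; assumes archive_path is not located inside `folder`.
    """
    with zipfile.ZipFile(archive_path, mode="w",
                         compression=zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(folder)):
            full_path = os.path.join(folder, name)
            if os.path.isfile(full_path):
                # arcname is the name stored inside the archive; without it
                # the path passed as the first argument is recorded instead.
                archive.write(full_path, arcname=name)
    return archive_path
```

Calling zip_folder_flat(folder + '/' + item, folder + '/' + item + '.zip') would correspond to the inner loop of the question's function.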

Related

Split large directory into chunks of files

I have a directory structure with a lot of files in it (~1 million) which I would like to zip into chunks of 10k files. So far I have this, which creates, well, garbage files: when I unzip them, it looks like all of the files are glommed into one long file instead of individual files, and I'm stuck. Any help would be greatly appreciated.
dirctr = 1
for root, dirs, files in os.walk(args.input_dir, followlinks=False):
    counter = 1
    curtar= args.output_dir + 'File' + str(dirctr) + '.gz'
    tar = tarfile.open(name=curtar, mode="w:gz")
    for filename in files:
        if ((counter -1) % args.files_per_dir) == 0:
            if tarfile.is_tarfile(curtar):
                tar.close(curtar)
                dirctr = dirctr + 1
                curtar= args.output_dir + 'File' + str(dirctr) + '.gz'
                tar.open(name=curtar, mode="w:gz")
        tar.add(os.path.join(root,filename))
        counter = counter + 1
    tar.close(curtar)
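One possible way to approach this, offered only as a sketch: keep a single running file count across the whole walk and start a new archive each time the count reaches a multiple of the chunk size. In the snippet above, tar.close() takes no arguments, and tar.open(...) called on an instance returns a new TarFile that is never assigned, so both calls look like likely trouble spots. The ".gz" name may also contribute to the "one long file" impression, since a "w:gz" archive is a gzip-compressed tar and undoing only the gzip layer leaves a single .tar file. The function name tar_in_chunks and the File<N>.tar.gz naming are assumptions, and the sketch assumes output_dir is not inside input_dir.

```python
import os
import tarfile


def tar_in_chunks(input_dir, output_dir, files_per_archive):
    """Pack every file under input_dir into numbered .tar.gz archives.

    Hypothetical helper; assumes output_dir lies outside input_dir.
    """
    os.makedirs(output_dir, exist_ok=True)
    archives = []
    tar = None
    count = 0
    try:
        for root, _dirs, files in os.walk(input_dir, followlinks=False):
            for filename in sorted(files):
                # Start a new archive every `files_per_archive` files.
                if count % files_per_archive == 0:
                    if tar is not None:
                        tar.close()
                    path = os.path.join(
                        output_dir, "File%d.tar.gz" % (len(archives) + 1))
                    tar = tarfile.open(path, mode="w:gz")
                    archives.append(path)
                full_path = os.path.join(root, filename)
                # Store paths relative to input_dir inside the archive.
                tar.add(full_path,
                        arcname=os.path.relpath(full_path, input_dir))
                count += 1
    finally:
        if tar is not None:
            tar.close()
    return archives
```

Each resulting file can be unpacked with tar -xzf, or with tarfile.open(path).extractall() from Python.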

Generating instances of duplicate files

I'm trying to edit a script I've previously written to generate .lab files (or basically txt files with a .lab extension) as part of a project I'm currently working on. Specifically, what I'm currently working on is to edit the script such that it will be able to handle duplicate filenames. Example: if a file named filename already exists, instead of appending or overwriting the existing file, it would instead create a new file named filename_1.
The script is written as shown below:
for line in reader:
    utterance = line[4]
    path = line[1].split("/")
    folder_name = path[len(path) - 2]
    file_name = path[len(path) - 1].split(".")[0]
    duplicate_num = 0
    # Generate the .lab file
    try:
        if os.path.isfile(truncated_audio_dir + "/" + folder_name + "/" + file_name + "_cut.lab"):
            new_file_name = file_name + "_" + str(duplicate_num)
            while os.path.isfile(truncated_audio_dir + "/" + folder_name + "/" + new_file_name + "_cut.lab"):
                duplicate_num += 1
                new_file_name = file_name + "_" + str(duplicate_num)
            print("New File Name: ", new_file_name)
            outfile = open(truncated_audio_dir + "/" + folder_name + "/" + file_name + "_cut.lab", "a")
            outfile.write(utterance)
            outfile.close()
        elif not os.path.isfile(truncated_audio_dir + "/" + folder_name + "/" + file_name + "_cut.lab"):
            print("Is File")
            outfile = open(truncated_audio_dir + "/" + folder_name + "/" + file_name + "_cut.lab", "a")
            outfile.write(utterance)
            outfile.close()
    except FileNotFoundError:
        print(truncated_audio_dir + "/" + folder_name + "/" + file_name + "_cut.lab" + " not found")
        continue
The issue is that when I run the script, the problems I had before persist. In particular, the branch I wrote to handle duplicates never seems to trigger; instead, the program keeps raising FileNotFoundError (which I originally handled for the case of a directory that doesn't exist). I suspect that the FileNotFoundError handling is causing the issue, but there may be something else I'm not aware of. Any help would be appreciated.
(The code above is most of the script, but not all of it; I imported sys, csv, and os, and reader refers to a CSV file that I am reading from.)
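One thing that stands out in the snippet is that the duplicate branch computes new_file_name but then opens the path built from file_name again. A hedged sketch of one way to structure the "next free name" logic follows. It uses open mode "x", which creates a file and raises FileExistsError if the name is taken. The helper name write_unique is invented here, and creating the folder with os.makedirs is an assumption: the original deliberately reports missing directories instead, so that line may not be wanted.

```python
import os


def write_unique(directory, stem, suffix, text):
    """Write text to directory/stem+suffix, or stem_1, stem_2, ... if taken.

    Hypothetical helper; returns the path that was actually written.
    """
    # Assumption: a missing folder should be created rather than reported.
    os.makedirs(directory, exist_ok=True)
    candidate = stem
    n = 0
    while True:
        path = os.path.join(directory, candidate + suffix)
        try:
            # Mode "x" creates the file and fails if it already exists.
            with open(path, "x") as outfile:
                outfile.write(text)
            return path
        except FileExistsError:
            n += 1
            candidate = "%s_%d" % (stem, n)
```

In the loop above, this would be called roughly as write_unique(truncated_audio_dir + "/" + folder_name, file_name, "_cut.lab", utterance).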

How to unzip all files from the same filetype with python

I want to extract all files that have the same filetype from a zip file.
I have this code:
from zipfile import ZipFile
counter = 0
with ZipFile('Video.zip', 'r') as zipObject:
    listOfFileNames = zipObject.namelist()
    for fileName in listOfFileNames:
        if fileName.endswith('.MXF'):
            zipObject.extract(fileName, 'Greenscreen')
            print('File ' + str(counter) + ' extracted')
            counter += 1
print('All ' + str(counter) + ' files extracted')
The problem is that the zip file also has multiple sub-folders with the required .MXF files in them.
As a result, after running the script my Greenscreen folder also contains all of those sub-folders.
But I only need the files of that file type, so the folder should contain just the .MXF files themselves.
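ZipFile.extract() recreates the member's stored path under the target folder, which is where the sub-folders come from. One way around that, sketched here under assumptions, is to open each member as a stream and copy it to a file named after the last path component only. The function name extract_flat is hypothetical, and files with the same name in different sub-folders would overwrite each other in this version.

```python
import os
import shutil
from zipfile import ZipFile


def extract_flat(zip_path, target_dir, extension):
    """Copy members ending in `extension` into target_dir without sub-folders.

    Hypothetical helper; later duplicates overwrite earlier ones.
    """
    os.makedirs(target_dir, exist_ok=True)
    extracted = []
    with ZipFile(zip_path, "r") as archive:
        for member in archive.namelist():
            if member.endswith("/") or not member.endswith(extension):
                continue  # skip directory entries and other file types
            # Member names inside a zip use "/" as the separator.
            name = member.rsplit("/", 1)[-1]
            target = os.path.join(target_dir, name)
            with archive.open(member) as source, open(target, "wb") as dest:
                shutil.copyfileobj(source, dest)
            extracted.append(target)
    return extracted
```

For the question's case the call would be extract_flat('Video.zip', 'Greenscreen', '.MXF'), and len() of the result gives the counter.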

A simple python script to 'search text files for whole words' - with GUI

I am currently building a small program that allows searching for phrases in actors' dialog, using transcribed text files from video clips. I run into a few issues as described below...
Create user input:
# Get the SEARCH WINDOW
import tkinter as tk
import tkinter.simpledialog as sd
root = tk.Tk()
root.withdraw()
root.option_add('*background', '#111111')
root.option_add('*Entry*background', '#999999')
searchPhrase = sd.askstring(
    "PhraseFinder v0.1 | filmwerk.nyc 2021 ", "Type keyword, or entire phrase, to search...", parent=root,)
This seems to work fine. User input stored in searchPhrase...
Take user input from above (searchPhrase) and search a directory containing 800 text files ('whole word' search only - 'ignore case').
# Do THE SEARCH, based on user input
import glob
import os
import re
rootDir = '/Volumes/audio/TRANSCRIBE/OUT'
os.chdir( rootDir )
for files in glob.glob( "*.txt" ):
    with open(files) as f:
        contents = f.read()
    if (re.search(r'\b'+ re.escape(searchPhrase) + r'\b', contents, re.IGNORECASE)):
        print( f )
This outputs:
<_io.TextIOWrapper name='FW_A01_2020-12-01_1856_C0004.txt' mode='r' encoding='US-ASCII'>
<_io.TextIOWrapper name='FW_A01_2020-12-01_1900_C0007.txt' mode='r' encoding='US-ASCII'>
The search result is correct, but the output format is not what I expected. So I need to rename stuff here. Unless there's a better way to get (print) the results? Currently, this gets output by print( f ).
The only thing I need from this output is to grab the actual file name:
FW_A01_2020-12-01_1856_C0004.txt and FW_A01_2020-12-01_1900_C0007.txt.
Then I need to rename & add the full path and finally store those search results files (clip list) in a continuous list, formatted like this:
> '/Volumes/RAID/Data/Media/TWO_CHAIRS/footage/FW_A01_2020-12-01_1806_C0001/FW_A01_2020-12-01_1806_C0001_000000.dng', '/Volumes/RAID/Data/Media/TWO_CHAIRS/footage/FW_A01_2020-12-01_1806_C0001/FW_A01_2020-12-01_1806_C0001_000000.dng',
Rename the 'search result' filenames (and add the full path), then store them in a variable. Since I don't know (yet) how to pipe in my actual search results into this function, I'll get the rootDir instead to perform the 'rename' as a test.
listofFiles = listdir(rootDir)
for currentFile in listofFiles:
    sourceFile = rootDir + "/" + currentFile
    mainNameEnd = currentFile.find('.')
    newFileName = currentFile[:mainNameEnd] + '_000000.dng'
    dirLoc = currentFile[:mainNameEnd]
    fullPathName = "'" + mediaDir + project.GetName() + "/" + "footage" + "/" + dirLoc + "/" + newFileName + "'" + "," + " "
    print("Converting path name: " + fullPathName)
This outputs:
Converting path name: '/Volumes/RAID/Data/Media/TWO_CHAIRS/footage/FW_A01_2020-12-01_1806_C0001/FW_A01_2020-12-01_1806_C0001_000000.dng',
Converting path name: '/Volumes/RAID/Data/Media/TWO_CHAIRS/footage/FW_A01_2020-12-01_1812_C0003/FW_A01_2020-12-01_1812_C0003_000000.dng',
Converting path name: '/Volumes/RAID/Data/Media/TWO_CHAIRS/footage/FW_A01_2020-12-01_1856_C0004/FW_A01_2020-12-01_1856_C0004_000000.dng',
Great, exactly the output format I need. However, this only works with files found in rootDir. What I really need is to grab the 'search result' clip list and rename those files the same way. Also, the clip list needs to be a continuous line as shown earlier.
Once that's working I'll use the reformatted clip list in the function below. This will then import the clips into an external app.
# Import clips from Search Result
# We insert the search_result_clip_list, separated by comma.
clips = resolve.GetMediaStorage().AddItemsToMediaPool(search_result_clip_list) # <-- clip list goes here
print(search_result_clip_list)
In a nutshell, I can't figure out how to take my search results, create a list, and finally use that list in the function above.
Would someone know how to implement this properly?
python 3.6.8 | MacOS 10.13.2 | Davinci Resolve 15
The real file name is in the variable files, so you should simply use
print(files)
f holds the file object that reads data from the file, not the file name. Alternatively, you could use
print( f.name )
but I would prefer the first version.
EDIT:
If you want to keep all the filenames that match the regex, you should use a list.
Before the loop, create searchResult = [], and inside the loop use searchResult.append( files )
searchResult = []
for files in glob.glob( "*.txt" ):
    # ... code ...
    if (re.search(r'\b'+ re.escape(searchPhrase) + r'\b', contents, re.IGNORECASE)):
        print( files )
        searchResult.append( files )
To get all the names in the same list:
You can use an empty list and add items to it in each loop like this:
my_names_list = []
for currentFile in listofFiles:
    sourceFile = rootDir + "/" + currentFile
    mainNameEnd = currentFile.find('.')
    newFileName = currentFile[:mainNameEnd] + '_000000.dng'
    dirLoc = currentFile[:mainNameEnd]
    fullPathName = "'" + mediaDir + project.GetName() + "/" + "footage" + "/" + dirLoc + "/" + newFileName + "'" + "," + " "
    print("Converting path name: " + fullPathName)
    my_names_list.append(fullPathName)
You will get a list with all the names as its items.
Regarding "However, this only works with files found in rootDir": I don't really understand what you want; please try to be more specific.
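Putting the two answers together, the renaming step can be applied to the list of matching transcript names instead of the whole directory listing. The sketch below is only an illustration: the function name to_clip_paths and the footage_dir parameter are made up, and it returns a plain Python list of path strings rather than one long quoted string. Whether AddItemsToMediaPool accepts such a list should be checked against the Resolve scripting documentation.

```python
import os


def to_clip_paths(transcript_names, footage_dir):
    """Map transcript file names to '<footage_dir>/<clip>/<clip>_000000.dng'.

    Hypothetical helper; footage_dir stands in for
    mediaDir + project.GetName() + "/footage" from the question.
    """
    clip_paths = []
    for name in transcript_names:
        # Drop any directory part and the ".txt" extension.
        clip = os.path.splitext(os.path.basename(name))[0]
        clip_paths.append(footage_dir + "/" + clip + "/" + clip + "_000000.dng")
    return clip_paths
```

With the searchResult list from the first answer, search_result_clip_list = to_clip_paths(searchResult, footage_dir) would then be the value handed to the import call.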

Python copy files to a new directory and rename if file name already exists

I've already read this thread but when I implement it into my code it only works for a few iterations.
I'm using Python to iterate through a directory (let's call it the move directory) to copy mainly PDF files (matching a unique ID) to another directory (the base directory), into the matching folder (with the corresponding unique ID). I started using shutil.copy, but if there are duplicates it overwrites the existing file.
I'd like to be able to search the corresponding folder to see if the file already exists, and number the copies iteratively if more than one occurs.
e.g.
copy file 1234.pdf to folder in base directory 1234.
if 1234.pdf exists to name it 1234_1.pdf,
if another pdf is copied as 1234.pdf then it would be 1234_2.pdf.
Here is my code:
import arcpy
import os
import re
import sys
import traceback
import collections
import shutil
movdir = r"C:\Scans"
basedir = r"C:\Links"
try:
    #Walk through all files in the directory that contains the files to copy
    for root, dirs, files in os.walk(movdir):
        for filename in files:
            #find the name location and name of files
            path = os.path.join(root, filename)
            print path
            #file name and extension
            ARN, extension = os.path.splitext(filename)
            print ARN
            #Location of the corresponding folder in the new directory
            link = os.path.join(basedir,ARN)
            # if the folder already exists in new directory
            if os.path.exists(link):
                #this is the file location in the new directory
                file = os.path.join(basedir, ARN, ARN)
                linkfn = os.path.join(basedir, ARN, filename)
                if os.path.exists(linkfn):
                    i = 0
                    #if this file already exists in the folder
                    print "Path exists already"
                    while os.path.exists(file + "_" + str(i) + extension):
                        i+=1
                    print "Already 2x exists..."
                    print "Renaming"
                    shutil.copy(path, file + "_" + str(i) + extension)
                else:
                    shutil.copy(path, link)
                    print ARN + " " + "Copied"
            else:
                print ARN + " " + "Not Found"
Sometimes it is just easier to start over... I apologize if there is any typo, I haven't had the time to test it thoroughly.
import os
import shutil
movdir = r"C:\Scans"
basedir = r"C:\Links"
# Walk through all files in the directory that contains the files to copy
for root, dirs, files in os.walk(movdir):
    for filename in files:
        # I use the absolute path, in case you want to move several dirs.
        old_name = os.path.join( os.path.abspath(root), filename )
        # Separate base from extension
        base, extension = os.path.splitext(filename)
        # Initial new name
        new_name = os.path.join(basedir, base, filename)
        # If folder basedir/base does not exist... You don't want to create it?
        if not os.path.exists(os.path.join(basedir, base)):
            print os.path.join(basedir,base), "not found"
            continue    # Next filename
        elif not os.path.exists(new_name):  # folder exists, file does not
            shutil.copy(old_name, new_name)
        else:  # folder exists, file exists as well
            ii = 1
            while True:
                new_name = os.path.join(basedir,base, base + "_" + str(ii) + extension)
                if not os.path.exists(new_name):
                    shutil.copy(old_name, new_name)
                    print "Copied", old_name, "as", new_name
                    break
                ii += 1
I always use a timestamp, so it's very unlikely that the file already exists (two copies made within the same second would still get the same name):
import os
import shutil
import datetime
now = str(datetime.datetime.now())[:19]
now = now.replace(":","_")
src_dir="C:\\Users\\Asus\\Desktop\\Versand Verwaltung\\Versand.xlsx"
dst_dir="C:\\Users\\Asus\\Desktop\\Versand Verwaltung\\Versand_"+str(now)+".xlsx"
shutil.copy(src_dir,dst_dir)
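A small variation on the timestamp idea, offered as a sketch rather than a drop-in replacement: build the stamp with strftime and include microseconds, which makes a collision between two quick copies less likely (though still not impossible). The helper name timestamped_name and the exact stamp format are assumptions; the format differs slightly from the snippet above, which keeps a space between date and time.

```python
import os
from datetime import datetime


def timestamped_name(path, now=None):
    """Return path with a '_YYYY-MM-DD_HH_MM_SS_ffffff' stamp before the extension.

    Hypothetical helper; `now` can be passed in to make the result predictable.
    """
    now = now or datetime.now()
    base, extension = os.path.splitext(path)
    # %f is the microsecond field, zero-padded to six digits.
    return base + "_" + now.strftime("%Y-%m-%d_%H_%M_%S_%f") + extension
```

The result can be passed straight to shutil.copy as the destination, for example shutil.copy(src, timestamped_name(src)).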
For me shutil.copy is the best:
import shutil
#make a copy of the invoice to work with
src="invoice.pdf"
dst="copied_invoice.pdf"
shutil.copy(src,dst)
You can change the path of the files as you want.
I would say you have an indentation problem, at least as you wrote it here:
while not os.path.exists(file + "_" + str(i) + extension):
    i+=1
    print "Already 2x exists..."
    print "Renaming"
    shutil.copy(path, file + "_" + str(i) + extension)
should be:
while os.path.exists(file + "_" + str(i) + extension):
    i+=1
print "Already 2x exists..."
print "Renaming"
shutil.copy(path, file + "_" + str(i) + extension)
Check this out, please!
import os
import shutil
import glob
src = r"C:\Source"
dest = r"C:\Destination"
par = "*"
i=1
d = []
for file in glob.glob(os.path.join(src,par)):
    f = str(file).split('\\')[-1]
    for n in glob.glob(os.path.join(dest,par)):
        d.append(str(n).split('\\')[-1])
    if f not in d:
        print("copied",f," to ",dest)
        shutil.copy(file,dest)
    else:
        f1 = str(f).split(".")
        f1 = f1[0]+"_"+str(i)+"."+f1[1]
        while f1 in d:
            f1 = str(f).split(".")
            f1 = f1[0]+"_"+str(i)+"."+f1[1]
            print("{} already exists in {}".format(f1,dest))
            i =i + 1
        shutil.copy(file,os.path.join(dest,f1))
        print("renamed and copied ",f1 ,"to",dest)
        i = 1
