How to extract a file within a folder within a zip? - python

I need to extract a file called Preview.pdf from a folder called QuickLooks inside of a zip file.
Right now my code looks a little like this:
with ZipFile(newName, 'r') as newName:
    newName.extract(\QuickLooks\Preview.pdf)
    newName.close()
(In this case, newName has been set equal to the full path to the zip).
It's important to note that the backslash is correct in this case because I'm on Windows.
The code doesn't work; here's the error it gives:
Traceback (most recent call last):
File "C:\Users\Asit\Documents\Evam\Python_Scripts\pageszip.py", line 18, in <module>
ZF.extract("""QuickLooks\Preview.pdf""")
File "C:\Python33\lib\zipfile.py", line 1019, in extract
member = self.getinfo(member)
File "C:\Python33\lib\zipfile.py", line 905, in getinfo
'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'QuickLook/Preview.pdf' in the archive"
I'm running the Python script from inside Notepad++, and taking the output from its console.
How can I accomplish this?
Alternatively, how could I extract the whole QuickLooks folder, move out Preview.pdf, and then delete the folder and the rest of its contents?
Just for context, here's the rest of the script. It's a script to get a PDF of a .pages file. I know there are bona fide converters out there; I'm just doing this as an exercise with some sort of real-world application.
import os.path
import zipfile
from zipfile import *
import sys
file = raw_input('Enter the full path to the .pages file in question. Please note that file and directory names cannot contain any spaces.')
dir = os.path.abspath(os.path.join(file, os.pardir))
fileName, fileExtension = os.path.splitext(file)
if fileExtension == ".pages":
    os.chdir(dir)
    print (dir)
    fileExtension = ".zip"
    os.rename (file, fileName + ".zip")
    newName = fileName + ".zip" #for debugging purposes
    print (newName) #for debugging purposes
    with ZipFile(newName, 'w') as ZF:
        print("I'm about to list names!")
        print(ZF.namelist()) #for debugging purposes
        ZF.extract("QuickLook/Preview.pdf")
    os.rename('Preview.pdf', fileName + '.pdf')
    finalPDF = fileName + ".pdf"
    print ("Check out the PDF! It's located at" + dir + finalPDF + ".")
else:
    print ("Sorry, this is not a valid .pages file.")
    sys.exit
I'm not sure if the import of Zipfile is redundant; I read on another SO post that it was better to use from zipfile import * than import zipfile. I wasn't sure, so I used both. =)
EDIT: I've changed the code to reflect the changes suggested by Blckknght.

Here's something that seems to work. There were several issues with your code. As I mentioned in a comment, the zip file must be opened with mode 'r' in order to read it. Another is that zip archive member names always use forward slash / characters as separators in their path names, regardless of the operating system (see section 4.4.17.1 of the PKZIP Application Note). It's also important to be aware that there's no way to make extract() in Python's current zipfile module place a nested archive member in a different subdirectory. You can control the root directory, but nothing below it (i.e. any subfolders within the zip).
Lastly, since it's not necessary to rename the .pages file to .zip (the filename you pass to ZipFile() can have any extension), I removed all that from the code. However, to overcome the limitation on extracting members to a different subdirectory, I had to add code to first extract the target member to a temporary directory, and then copy that to the final destination. Afterwards, of course, this temporary folder needs to be deleted. So I'm not sure the net result is much simpler...
import os.path
import shutil
import sys
import tempfile
from zipfile import ZipFile
PREVIEW_PATH = 'QuickLooks/Preview.pdf' # archive member path
pages_file = input('Enter the path to the .pages file in question: ')
#pages_file = r'C:\Stack Overflow\extract_test.pages' # hardcode for testing
pages_file = os.path.abspath(pages_file)
filename, file_extension = os.path.splitext(pages_file)
if file_extension == ".pages":
    tempdir = tempfile.gettempdir()
    temp_filename = os.path.join(tempdir, PREVIEW_PATH)
    with ZipFile(pages_file, 'r') as zipfile:
        zipfile.extract(PREVIEW_PATH, tempdir)
    if not os.path.isfile(temp_filename): # extract failure?
        sys.exit('unable to extract {} from {}'.format(PREVIEW_PATH, pages_file))
    final_PDF = filename + '.pdf'
    shutil.copy2(temp_filename, final_PDF) # copy and rename extracted file
    # delete the temporary subdirectory created (along with pdf file in it)
    shutil.rmtree(os.path.join(tempdir, os.path.split(PREVIEW_PATH)[0]))
    print('Check out the PDF! It\'s located at "{}".'.format(final_PDF))
    #view_file(final_PDF) # see Bonus below
else:
    sys.exit('Sorry, that isn\'t a .pages file.')
Bonus: If you'd like to actually view the final pdf file from the script, you can add the following function and use it on the final pdf created (assuming you have a PDF viewer application installed on your system):
import subprocess
def view_file(filepath):
    subprocess.Popen(filepath, shell=True).wait()
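As a possible alternative to the temporary-directory step, here is a minimal sketch that relies on ZipFile.open(), which returns a file-like object for a single member, so the bytes can be copied straight to any destination without recreating the folder from the archive. The function name extract_member_flat and its parameters are hypothetical, chosen only for illustration.

```python
import shutil
from zipfile import ZipFile


def extract_member_flat(zip_path, member, dest_path):
    """Copy one archive member to dest_path, ignoring its folder inside the zip.

    Hypothetical helper for illustration; raises KeyError if member is missing.
    """
    with ZipFile(zip_path, 'r') as archive:
        # Member names use forward slashes, even on Windows.
        with archive.open(member) as source, open(dest_path, 'wb') as target:
            shutil.copyfileobj(source, target)
    return dest_path
```

A call such as extract_member_flat(pages_file, PREVIEW_PATH, filename + '.pdf') would then stand in for the extract, copy and rmtree steps, assuming the member name matches what namelist() reports for your archive.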

Related

Is this the shortest, most efficient way to write a program to move my pdf files into a new folder?

New to coding, reading some books and trying to practice. Wrote a program in python3.7 to search through a directory, find all the pdf files and move them to a new folder called 'Reading Materials'.
How could I improve on this code, e.g. with a shorter, more concise and/or more efficient script in Python?
import os, re, shutil
os.chdir(r'C:\\Users\\Luke\\Documents\\coding\\python\\') #set cwd to the where I want program to run
#create regex to identify pdf files
PDFregex = re.compile(r'''^(.*?) # all text before the file extension
\.{1} #start of file extension
(pdf)$ #ending in pdf''', re.VERBOSE)
Newdir = os.mkdir('Reading Material') #make new directory for files
NewdirPath = os.path.abspath('Reading Material')
print('new directory made at : '+NewdirPath)
#search through directory for files that contain .pdf extension using regex object
for pdf in os.listdir('.'):
    mo = PDFregex.search(pdf)
    if mo == None: #no pdf's found by regex search
        continue #bypass loop
    else:
        originalLoc = os.path.join(os.path.abspath('.'), pdf) #original file location
        newLoc = shutil.move(originalLoc, os.path.join(NewdirPath, pdf)) #move pdf to new folder
        print('Moving file "%s" moved to "%s"...' %(pdf, newLoc)) #say what's moving
os.listdir(NewdirPath)
A regexp is overkill here. The os module has various functions to help you extract information about files.
You can use os.path.splitext to find the extension.
Something like this should work:
import os
import shutil
old_dir = 'C:\\Users\\Luke\\Documents\\coding\\python\\'
new_dir = os.path.join(old_dir, 'Reading Material')
# PEP 8 recommends underscore_notation for variable names and CamelCase for class names, see https://www.python.org/dev/peps/pep-0008/
os.makedirs(new_dir, exist_ok=True)
for file_name in os.listdir(old_dir):
    # os.listdir returns bare file names, so join them with old_dir before moving
    if os.path.splitext(file_name)[1] == '.pdf':
        shutil.move(os.path.join(old_dir, file_name), os.path.join(new_dir, file_name))
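If a shorter version is the goal, pathlib can do the extension filtering with a glob pattern. This is only a sketch: the function name move_pdfs and the default folder name are assumptions, and it only looks at files directly inside the source folder.

```python
from pathlib import Path
import shutil


def move_pdfs(source_dir, folder_name='Reading Material'):
    """Move every .pdf directly inside source_dir into source_dir/folder_name."""
    source = Path(source_dir)
    target = source / folder_name
    target.mkdir(exist_ok=True)
    moved = []
    # sorted() builds the full list first, so the folder is not changed mid-iteration.
    for pdf in sorted(source.glob('*.pdf')):
        if pdf.is_file():
            moved.append(shutil.move(str(pdf), str(target / pdf.name)))
    return moved
```

It returns the new locations, so the caller can print them instead of printing inside the loop.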

gzip multiple files in python

I have to compress a lot of XML files and split them by the date in the file name. Just for clarification's sake, there is a parser which collects information from each XML file and then moves it to a backup folder. My code needs to gzip the files according to the date in the filename and group them in a compressed .gz file.
Please find the code below:
import os
import re
import gzip
import shutil
import sys
import time
#
timestr = time.strftime("%Y%m%d%H%M")
logfile = 'D:\\Coleta\\log_compactador_xml_tar'+timestr+'.log'
ptm_dir = "D:\\PTM\\monitored_programs\\"
count_files_mdc = 0
count_files_3gpp = 0
count_tar = 0
#
for subdir, dir, files in os.walk(ptm_dir):
    for file in files:
        path = os.path.join(subdir, file)
        try:
            backup_files_dir = path.split(sep='\\')[4]
            parser_id = path.split(sep='\\')[3]
            if re.match('backup_files_*', backup_files_dir):
                if file.endswith('xml'):
                    # print(time.strftime("%Y-%m-%d %H:%M:%S"), path)
                    data_arq = file[1:14]
                    if parser_id in ('parser-924'):
                        gzip_filename_mdc = os.path.join(subdir,'E4G_PM_MDC_IP51_'+timestr+'_'+data_arq)
                        with open(path, 'r')as f_in, gzip.open(gzip_filename_mdc + ".gz", 'at') as f_out_mdc:
                            shutil.copyfileobj(f_in, f_out_mdc)
                            count_files_mdc += 1
                            f_out_mdc.close()
                            f_in.close()
                        print(time.strftime("%Y-%m-%d %H:%M:%S"), "Compressing file MDC: ",path)
                        os.remove(path)
        except PermissionError:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "Permission error on file:", fullpath, file=logfile)
            pass
        except IndexError:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "IndexError: ", path, file=logfile)
            pass
As far as I can tell, it creates a stream of data, then compresses it and writes it to a new file with the specified filename. However, instead of storing each XML file separately inside the ".gz" file, it creates one big file (one big stream of data?) inside the "gzip" file, with the same name as the output "gzip" file but without any extension. Once the files are fully compressed, it's not possible to uncompress the big file generated inside the "gzip" output file. Does anyone know where the problem is in my code?
PS: I have edited the code for readability purposes.
Not sure whether the solution is still needed, but I will just leave it here for anyone who faces the same issue.
There is a way to create a gzip archive in python using tarfile, the code is quite simple:
with tarfile.open(filename, mode="w:gz") as archive:
    archive.add(name=name_of_file_to_add, recursive=True)
In this case name_of_file_to_add can be a directory, in which case tarfile will add it recursively with all its contents. Obviously you will need to import the tarfile module.
If you need to add files without a directory, a simple for loop with calls to add will do (the recursive flag is not required in this case).
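For the grouping part of the question, a sketch along these lines might help. gzip on its own compresses a single stream rather than storing separately named files, which is presumably why appending several XML files produced one big blob; a tar archive compressed with gzip keeps each file as its own entry. The function name archive_by_date and the date_of callback are assumptions made for illustration.

```python
import os
import tarfile
from collections import defaultdict


def archive_by_date(xml_paths, out_dir, date_of):
    """Write one .tar.gz per date key, each holding the files for that date.

    date_of is a caller-supplied function mapping a file name to its date key.
    """
    groups = defaultdict(list)
    for path in xml_paths:
        groups[date_of(os.path.basename(path))].append(path)

    archives = []
    for date_key, paths in sorted(groups.items()):
        archive_path = os.path.join(out_dir, date_key + '.tar.gz')
        with tarfile.open(archive_path, mode='w:gz') as archive:
            for path in paths:
                # arcname stores only the file name, not the full source path
                archive.add(path, arcname=os.path.basename(path))
        archives.append(archive_path)
    return archives
```

With the slicing used in the question, the call could look like archive_by_date(paths, backup_dir, lambda name: name[1:14]), where paths and backup_dir are whatever your script has collected.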

Move pairs of files (.txt & .xml) into their corresponding folder using Python

I have been working this challenge for about a day or so. I've looked at multiple questions and answers asked on SO and tried to 'MacGyver' the code used for my purpose, but still having issues.
I have a directory (let's call it "src\") with hundreds of files (.txt and .xml). Each .txt file has an associated .xml file (let's call it a pair). Example:
src\text-001.txt
src\text-001.xml
src\text-002.txt
src\text-002.xml
src\text-003.txt
src\text-003.xml
Here's an example of how I would like it to turn out, so that each pair of files is placed into its own folder:
src\text-001\text-001.txt
src\text-001\text-001.xml
src\text-002\text-002.txt
src\text-002\text-002.xml
src\text-003\text-003.txt
src\text-003\text-003.xml
What I'd like to do is create an associated folder for each pair and then move each pair of files into its respective folder using Python. I've already tried working from code I found (thanks to a post from Nov '12 by Sethdd), but am having trouble figuring out how to use the move function to grab pairs of files. Here's where I'm at:
import os
import shutil
srcpath = "PATH_TO_SOURCE"
srcfiles = os.listdir(srcpath)
destpath = "PATH_TO_DEST"
# grabs the name of the file before extension and uses as the dest folder name
destdirs = list(set([filename[0:9] for filename in srcfiles]))
def create(dirname, destpath):
    full_path = os.path.join(destpath, dirname)
    os.mkdir(full_path)
    return full_path
def move(filename, dirpath):
    shutil.move(os.path.join(srcpath, filename)
                ,dirpath)
# create destination directories and store their names along with full paths
targets = [
    (folder, create(folder, destpath)) for folder in destdirs
]
for dirname, full_path in targets:
    for filename in srcfiles:
        if dirname == filename[0:9]:
            move(filename, full_path)
I feel like it should be easy, but Python isn't something I work with everyday and it's been a while since my scripting days... Any help would be greatly appreciated!
Thanks,
WK2EcoD
Use the glob module to iterate over all of the 'txt' files. From that you can parse the names, create the folders and move the files.
The process should be as simple as it appears to you as a human.
for file_name in os.listdir(srcpath):
    dir = file_name[:9]
    # if dir doesn't exist, create it
    # move file_name to dir
You're doing a lot of intermediate work that seems to be confusing you.
Also, insert some simple print statements to track data flow and execution flow. It appears that you have no tracing output so far.
You can do it with os module. For every file in directory check if associated folder exists, create if needed and then move the file. See the code below:
import os
SRC = 'path-to-src'
for fname in os.listdir(SRC):
    filename, file_extension = os.path.splitext(fname)
    # splitext keeps the leading dot in the extension
    if file_extension not in ['.xml', '.txt']:
        continue
    folder_path = os.path.join(SRC, filename)
    if not os.path.exists(folder_path):
        os.mkdir(folder_path)
    os.rename(
        os.path.join(SRC, fname),
        os.path.join(folder_path, fname)
    )
My approach would be:
Find the pairs that I want to move (do nothing with files without a pair)
Create a directory for every pair
Move the pair to the directory
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import os, shutil
import re
def getPairs(files):
    pairs = []
    file_re = re.compile(r'^(.*)\.(.*)$')
    for f in files:
        match = file_re.match(f)
        if match:
            (name, ext) = match.groups()
            if ext == 'txt' and name + '.xml' in files:
                pairs.append(name)
    return pairs
def movePairsToDir(pairs):
    for name in pairs:
        os.mkdir(name)
        shutil.move(name+'.txt', name)
        shutil.move(name+'.xml', name)
files = os.listdir()
pairs = getPairs(files)
movePairsToDir(pairs)
NOTE: This script works when called inside the directory with the pairs.
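To avoid depending on the current working directory, the same three steps can be sketched with the source folder passed in explicitly. The function name move_pairs is made up for this example, and it assumes a pair means NAME.txt plus NAME.xml in the same folder.

```python
import os
import shutil


def move_pairs(src_dir):
    """Move each NAME.txt / NAME.xml pair in src_dir into src_dir/NAME/."""
    names = set(os.listdir(src_dir))
    moved = []
    for fname in sorted(names):
        stem, ext = os.path.splitext(fname)
        # Only act on .txt files that have a matching .xml next to them.
        if ext != '.txt' or stem + '.xml' not in names:
            continue
        folder = os.path.join(src_dir, stem)
        os.makedirs(folder, exist_ok=True)
        for part in (stem + '.txt', stem + '.xml'):
            shutil.move(os.path.join(src_dir, part), os.path.join(folder, part))
        moved.append(stem)
    return moved
```

Files without a partner are left where they are, and the returned list of names can be printed to trace what was moved.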

rename a file and only modify after dot

I'm trying to rename test.tar.gz to test.tgz, but it doesn't work. Here is the command:
temporalFolder= /home/albertserres/*.tar.gz
subprocess.call(["mv",temporalFolder,"*.tgz"])
It gives me an error saying that the file doesn't exist. Why?
Also, I only need to change the part after the dot, not the entire name, because I probably won't know the file name. If I use *.tgz, it renames the file to *.tgz, and I want to keep the original name.
This should work:
import shutil
orig_file = '/home/albertserres/test.tar.gz'
new_file = orig_file.replace('tar.gz', 'tgz')
shutil.move(orig_file, new_file)
And if you want to do that for several files:
import shutil
import glob
for orig_file in glob.glob('/home/albertserres/*.tar.gz'):
    new_file = orig_file.replace('tar.gz', 'tgz')
    shutil.move(orig_file, new_file)
rename would probably be easier.
rename 's/\.tar\.gz/\.tgz/' *.tar.gz
In your case
import subprocess
params = r"rename 's/\.tar\.gz/\.tgz/' /home/albertserres/*.tar.gz"
subprocess.call(params, shell=True)
To replace all .tar.gz file extensions with .tgz file extensions in a given directory (similar to @hitzg's answer):
#!/usr/bin/env python
import os
from glob import glob
for filename in glob(b'/home/albertserres/*.tar.gz'):
    new = bytearray(filename)
    new[-len(b'tar.gz'):] = b'tgz'
    os.rename(filename, bytes(new)) # or os.replace() for portability
The code replaces tar.gz only at the end of the name. It raises an error if new is an existing directory; otherwise it silently replaces the file on Unix.
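As for why the original command reports a missing file: when subprocess.call is given a list and no shell=True, the arguments go to mv literally, and wildcard expansion is normally something the shell does, so mv looks for a file actually named *.tar.gz. A rough sketch that does the expansion in Python and changes only the trailing extension follows; the function names retarget_extension and rename_all are hypothetical.

```python
import glob
import os


def retarget_extension(path, old_ext='.tar.gz', new_ext='.tgz'):
    """Return path with a trailing old_ext swapped for new_ext (unchanged otherwise)."""
    if path.endswith(old_ext):
        return path[:-len(old_ext)] + new_ext
    return path


def rename_all(directory, old_ext='.tar.gz', new_ext='.tgz'):
    """Rename every file in directory whose name ends with old_ext."""
    renamed = []
    # glob.glob expands the wildcard in Python; no shell is involved.
    for old_path in sorted(glob.glob(os.path.join(directory, '*' + old_ext))):
        new_path = retarget_extension(old_path, old_ext, new_ext)
        os.rename(old_path, new_path)
        renamed.append(new_path)
    return renamed
```

Because only the end of the string is touched, a directory name that happens to contain tar.gz is left alone.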

Editing file names and saving to new directory in python

I would like to edit the file name of several files in a list of folders and export the entire file to a new folder. While I was able to rename the file okay, the contents of the file didn't migrate over. I think I wrote my code to just create a new empty file rather than edit the old one and move it over to a new directory. I feel that the fix should be easy, and that I am missing a couple of important lines of code. Below is what I have so far:
import libraries
import os
import glob
import re
directory
directory = glob.glob('Z:/Stuff/J/extractions/test/*.fsa')
The two files in the directory look like this when printed out
Z:/Stuff/J/extractions/test\c2_D10.fsa
Z:/Stuff/J/extractions/test\c3_E10.fsa
for fn in directory:
    print fn
this script was designed to manipulate the file name and export the manipulated file to a another folder
for fn in directory:
    output_directory = 'Z:/Stuff/J/extractions/test2'
    value = os.path.splitext(os.path.basename(fn))[0]
    matchObj = re.match('(.*)_(.*)', value, re.M|re.I)
    new_fn = fn.replace(str(matchObj.group(0)), str(matchObj.group(2)) + "_" + str(matchObj.group(1)))
    base = os.path.basename(new_fn)
    v = open(os.path.join(output_directory, base), 'wb')
    v.close()
My end result is the following:
Z:/Stuff/J/extractions/test2\D10_c2.fsa
Z:/Stuff/J/extractions/test2\E10_c3.fsa
But like I said the files are empty (0 kb) in the output_directory
As Stefan mentioned:
import shutil
and replace:
v = open(os.path.join(output_directory, base), 'wb')
v.close()
with:
shutil.copyfile (fn, os.path.join(output_directory, base))
If I'm not wrong, you are only opening the file and then immediately closing it again?
Without any writing to the file, it is surely empty.
Have a look here:
http://docs.python.org/2/library/shutil.html
shutil.copyfile(src, dst) ;)
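Putting the rename and the copy together, a minimal sketch could look like this. The helper names swapped_name and copy_with_swapped_name are invented for the example, and the name is split at the last underscore, which is what the greedy regex in the question does as well.

```python
import os
import shutil


def swapped_name(file_name):
    """Turn 'c2_D10.fsa' into 'D10_c2.fsa'; names without '_' are returned as-is."""
    stem, ext = os.path.splitext(file_name)
    left, sep, right = stem.rpartition('_')
    if not sep:
        return file_name
    return right + '_' + left + ext


def copy_with_swapped_name(src_path, output_directory):
    """Copy src_path into output_directory under its swapped name."""
    dst_path = os.path.join(output_directory, swapped_name(os.path.basename(src_path)))
    shutil.copyfile(src_path, dst_path)  # copies the contents, not just the name
    return dst_path
```

Calling copy_with_swapped_name(fn, output_directory) for each fn in the glob result would replace the open/close pair in the loop.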
