I iterate through the directories and want to find all the zip files and add them to download_all.zip.
I am sure there are zip files in the tree, but Python doesn't seem to recognize them as zip files. Why is that?
my code:
os.chdir(boardpath)
# zf = zipfile.ZipFile('download_all.zip', mode='w')
z = zipfile.ZipFile('download_all.zip', 'w')  # creating the download_all.zip file
for path, dirs, files in os.walk(boardpath):
    for file in files:
        print file
        if file.endswith('.zip'):  # find all zip files
            print ('adding', file)
            z.write(file)  # error here: file is only the bare filename (a str), not a path Python can find
z.close()
z = zipfile.ZipFile("download_all.zip")
z.printdir()
I tried:
file.printdir()
# I got the following error: AttributeError: 'str' object has no attribute 'printdir'
In zipfile.ZipFile.write(name), name should be the file's full path on disk, not just the bare filename.
import os  # at the top

if file.endswith('.zip'):  # find all zip files
    filepath = os.path.join(path, file)
    print ('adding', filepath)
    z.write(filepath)  # no error
As stated in ZipFile.write's documentation, the filename argument should be given relative to the archive root. So the following line:
z.write(file)
Should be:
z.write(os.path.relpath(os.path.join(path, file)))
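For context, here is a minimal sketch of how that relpath call slots into the question's walk loop (assuming the script still chdirs into boardpath first, as the question does; the guard against re-adding download_all.zip itself is an extra assumption, not part of the original answer):

import os
import zipfile

os.chdir(boardpath)  # boardpath is defined elsewhere in the question's script
with zipfile.ZipFile('download_all.zip', 'w') as z:
    for path, dirs, files in os.walk(boardpath):
        for file in files:
            if file.endswith('.zip') and file != 'download_all.zip':  # assumed guard: skip the archive being written
                # store each entry under its path relative to the current directory
                z.write(os.path.relpath(os.path.join(path, file)))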
The files that os.walk() yields are lists of filenames. These filenames are just strings (which don't have a printdir() method).
You want to use context management when opening the zip archive and writing each file you find to it, hence the with statement. In addition, since you're walking through a directory structure, you need to fully qualify each file's path.
import os
import zipfile

with zipfile.ZipFile('download_all.zip', 'w') as zf:
    for path, dirs, files in os.walk('/some_path'):
        for file in files:
            if file.endswith('.zip'):
                zf.write(os.path.join(path, file))
Related
I need to extract some files inside a directory in a zip file.
The main problem is that I want to extract only the contents from this directory, not the directory itself with all the files inside.
I've tried iterating over namelist() and tweaking it with zipfile.Path(), without success.
The code below works, but it extracts the directory along with the files inside it (just like extractall() does). zipfile.Path doesn't work either: it raises a KeyError saying the item doesn't exist, even though it does.
for zip_file in zip_files:
    with zipfile.ZipFile(os.path.join(home_path, zip_file), 'r') as zip_ref:
        files = [n for n in zip_ref.namelist()]
        zip_ref.extractall(os.path.join(home_path, 'dir'), members=files)
Written from my mobile, but I expect it to work:
import shutil
from pathlib import Path
from zipfile import ZipFile

with ZipFile(zipfile_path, "r") as zf:
    for f in zf.namelist():
        if f.endswith('/'):  # skip directory entries; we only want the files
            continue
        source = zf.open(f)
        target = open(target_dir / Path(f).name, "wb")
        with source, target:
            shutil.copyfileobj(source, target)
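If the goal is to extract only the contents of one specific directory inside the archive (rather than flattening every file into one folder), a similar sketch can filter on the entry's prefix. Here 'some_dir/' and 'output' are placeholder names, not taken from the question; zipfile_path is the same variable as above:

import shutil
from pathlib import Path
from zipfile import ZipFile

prefix = 'some_dir/'           # hypothetical directory inside the archive
target_dir = Path('output')    # hypothetical destination directory
target_dir.mkdir(exist_ok=True)

with ZipFile(zipfile_path, "r") as zf:
    for f in zf.namelist():
        if not f.startswith(prefix) or f.endswith('/'):
            continue  # keep only file entries that live under the chosen directory
        # strip the directory prefix so it is not recreated in the output
        relative = Path(f).relative_to(prefix.rstrip('/'))
        destination = target_dir / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        with zf.open(f) as source, open(destination, "wb") as target:
            shutil.copyfileobj(source, target)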
I have a folder (not zipped) containing multiple zip files (no other file type within folder). Each zip has the same type of text files containing different data saved within.
I know how to read each one in separately, but I am looking to loop the process without having to type in each zip name. zipfile does not seem to accept wildcards, so I cannot loop that way. Is it possible to loop the process using glob?
The goal is to get the agency names without extracting all the zipfiles.
Single file read
>>> import os
>>> os.listdir('C:\\NTM\\Test\\')
['00003_32_332.zip', '00011_273_569.zip', '00012_258_276.zip']
>>> import glob
>>> glob.glob('C:\\NTM\\Test\\*.zip')
['C:\\NTM\\Test\\00003_32_332.zip', 'C:\\NTM\\Test\\00011_273_569.zip', 'C:\\NTM\\Test\\00012_258_276.zip']
>>> import zipfile
>>> archive = zipfile.ZipFile('C:\\NTM\\Test\\00011_273_569.zip')
>>> testagency = archive.open('agency.txt')
>>> testagency.read()
'agency_id,agency_name,nVRT,ValleyRide'
Update:
Now that I can loop through the zip files and reach the text file inside each one, I cannot print the agency_name from all of the zip files in the folder. My current code only prints the agency name from the text file of the last zip file in the folder. Am I missing some compound statement structure?
import csv

def csv_dict_reader(file_obj):
    reader = csv.DictReader(file_obj, delimiter=',')
    for row in reader:
        print(row['agency_name'])

if __name__ == '__main__':
    with archive.open('agency.txt') as f_obj:
        csv_dict_reader(f_obj)
Whatcom Transportation Authority
Sample Code
import glob
import zipfile

dirName = '/backup/'
zipList = glob.glob(dirName + '*.zip')
for zipname in zipList:
    archive = zipfile.ZipFile(zipname)
    fileList = archive.namelist()
    for fileName in fileList:
        if fileName.endswith('.txt'):
            archive.extract(fileName)
    archive.close()
Thanks Jean-Francois!
for archive_name in glob.glob('C:\\NTM\\Test\\*.zip'):
    archive = zipfile.ZipFile(archive_name)
    testagency = archive.open('agency.txt')
    testagency.read()
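To print the agency name from every zip (the point of the update above), the glob loop and csv_dict_reader need to be combined so agency.txt is read inside the loop. A sketch reusing the names from the question; the io.TextIOWrapper step (and the assumed utf-8 encoding) is only needed on Python 3, where ZipFile.open returns bytes:

import csv
import glob
import io
import zipfile

def csv_dict_reader(file_obj):
    reader = csv.DictReader(file_obj, delimiter=',')
    for row in reader:
        print(row['agency_name'])

for archive_name in glob.glob('C:\\NTM\\Test\\*.zip'):
    with zipfile.ZipFile(archive_name) as archive:
        with archive.open('agency.txt') as f_obj:
            # wrap the binary stream in a text stream for the csv module (Python 3);
            # on Python 2, f_obj can be passed to csv_dict_reader directly
            csv_dict_reader(io.TextIOWrapper(f_obj, encoding='utf-8'))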
Since I could not comment on Fuji Komalan's comment, here is the fixed code.
import glob
import zipfile

dirName = 'C:/test/'
zipList = glob.glob(dirName + '*.zip')
print(zipList)

for zipname in zipList:
    archive = zipfile.ZipFile(zipname)
    fileList = archive.namelist()
    for fileName in fileList:
        if fileName.endswith('.txt'):
            archive.extract(fileName)
            print(fileName)
    archive.close()
I am trying to print the file names in my directory using the following code:
import os

dir_path = 'D:/#/#/#/#.json'
for filename in os.listdir(dir_path):
    print filename
    f = open(os.path.join(dir_path, filename), 'r')
but this error is displayed when running the code:
for filename in os.listdir(dir_path):
WindowsError: [Error 267] The directory name is invalid: 'D:/#/#/#/#.json/*.*'
I am unsure what the #.json/*.* part of the error message means. I am new to Python, so apologies if this question is vague.
Your question is unclear as to whether the directory D:/#/#/# contains any files in it other than JSON files, so I shall give two answers. Hopefully one of them will apply to you:
Directory contains only JSON files
In that case, simply remove the /#.json from the end of dir_path:
dir_path = 'D:/#/#/#'
for filename in os.listdir(dir_path):
    print filename
    f = open(os.path.join(dir_path, filename), 'r')
Directory contains JSON files and other files that you want to exclude
In this situation it's best to use the Python glob module.
The following should list all of the .json files in the folder D:/#/#/#:
import glob

dir_path = 'D:/#/#/#/*.json'
for filename in glob.glob(dir_path):
    print filename
    f = open(filename, 'r')
Note that filenames returned by glob.glob include the directory path, so we don't use os.path.join on them.
os.listdir lists all the files in the directory you give it. However, it seems you're not passing it the name of a directory; you're passing it the name of a file. How can it possibly list the files in a directory if you don't give it a directory?
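A minimal sketch of that point, splitting off the directory part before calling os.listdir and filtering for JSON files with fnmatch (the path is the placeholder from the question):

import fnmatch
import os

path = 'D:/#/#/#/#.json'          # what was being passed to os.listdir
dir_path = os.path.dirname(path)  # the directory part, which os.listdir actually needs

for filename in os.listdir(dir_path):
    if fnmatch.fnmatch(filename, '*.json'):  # keep only the JSON files
        print(filename)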
Let's say the directory is /Home/Documents/Test_files.
I would like to create a zip file of all the files ending with ".json" and if possible delete the files so that only the zip file is left
So far I have been able to create a zip file of all the files in the given path, but when I use the line zipf.write(file) it throws the error "[Errno 2] No such file or directory: sample.json". However, when I use zipf.write(os.path.join(root, file)) it does write the files, but it also stores the whole directory path, which I don't want.
I just want to write the files themselves. When I use print file, the correct files seem to be printed, so I don't know why I get the error that the file doesn't exist.
Currently my code looks like this:
def create_zip(path, zipf):
    # path is the directory address (i.e. /Home/Documents/Test_files)
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".json"):
                print file
                zipf.write(os.path.join(root, file))
                #zipf.write(file)
I would also like to remove/delete the files after creating the zip file to save space.
Any help as to why this is happening would be appreciated!
You can os.chdir into the file's directory before adding it to the zip file, so that the whole directory path is not included, and then use os.remove to delete the files afterwards:
import os

def create_zip(path, zipf):
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".json"):
                os.chdir(root)
                zipf.write(file)
                os.remove(file)
If you're using Python's zipfile module, you can just pass the arcname argument (the name the file gets inside the archive) to the write() method, as in:
import os
from zipfile import ZipFile

def create_zip(path, zipf):
    # path is the directory address (i.e. /Home/Documents/Test_files)
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".json"):
                print file
                zipf.write(os.path.join(root, file), arcname=file)
                os.remove(os.path.join(root, file))
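As a quick check (a hedged usage sketch; 'json_only.zip' is just an assumed archive name), printdir() on the resulting archive shows flat entries such as sample.json rather than full directory paths, and the source JSON files are removed by the os.remove call above:

from zipfile import ZipFile

with ZipFile('json_only.zip', 'w') as zipf:           # assumed archive name
    create_zip('/Home/Documents/Test_files', zipf)    # directory from the question

with ZipFile('json_only.zip') as zipf:
    zipf.printdir()  # entries are flat, e.g. sample.json, not Home/Documents/Test_files/sample.json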
I am trying to track down files with "server" in the filename. I can print every file in the directory whose name contains "server", but when I try to read one of those files I get an error saying:
Traceback (most recent call last):
File "view_log_packetloss.sh", line 27, in <module>
with open(filename,'rb') as files:
IOError: [Errno 2] No such file or directory: 'pcoip_server_2014_05_19_00000560.txt'
I have seen similar questions asked, but I could not fix mine; some of those errors were fixed by using chdir to change the current directory to the file's directory. Any help is appreciated. Thank you.
#!/usr/bin/env python
import sys, re, os

# function to find the packet loss data in pcoip server files
def function_pcoip_packetloss(filename):
    lineContains = re.compile('.*Loss=.*')  # look for "Loss=" in the file
    for line in filename:
        if lineContains.match(line):  # check if the line matches "Loss="
            print 'The file has: '    # prints if "Loss=" is found
            print line
    return 0

for root, dirs, files in os.walk("/users/home10/tshrestha/brb-view/logs/vdm-sdct-agent/pcoip-logs"):
    lineContainsServerFile = re.compile('.*server.*')
    for filename in files:
        if lineContainsServerFile.match(filename):
            with open(filename, 'rb') as files:
                print 'filename'
                function_pcoip_packetloss(filename)
The files that os.walk() yields are just the names of the files in each directory, not full paths. From the documentation:
dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
try this
for root, dirs, files in os.walk("/users/home10/tshrestha/brb-view/logs/vdm-sdct-agent/pcoip-logs"):
    lineContainsServerFile = re.compile('.*server.*')
    for filename in files:
        if lineContainsServerFile.match(filename):
            filename = os.path.join(root, filename)   # build the full path so open() can find the file
            with open(filename, 'rb') as file_obj:    # avoid shadowing the walk's files list
                print 'filename:', filename
                function_pcoip_packetloss(file_obj)   # pass the open file object so the function iterates over its lines
The os.walk() function is a generator of 3-element tuples. Each tuple contains a directory as its first element. The second element is a list of subdirectories in that directory, and the third is a list of the files.
To generate the full path to each file it is necessary to concatenate the first entry (the directory path) and the filenames from the third entry (the files). The most straightforward and platform-agnostic way to do so uses os.path.join().
Also note that it will be much more efficient to use
lineContainsServerFile = re.compile('server')
and lineContainsServerFile.search() rather than trying to match a wildcard pattern. Even with match(), the trailing ".*" is redundant, since whatever follows the "server" string is irrelevant.
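A small sketch of that suggestion, keeping the question's directory and the function_pcoip_packetloss function defined there; only the regex and the search() call differ from the code above:

import os
import re

lineContainsServerFile = re.compile('server')  # no wildcards needed when using search()

for root, dirs, files in os.walk("/users/home10/tshrestha/brb-view/logs/vdm-sdct-agent/pcoip-logs"):
    for filename in files:
        if lineContainsServerFile.search(filename):   # matches "server" anywhere in the name
            full_path = os.path.join(root, filename)  # full path, as recommended above
            with open(full_path, 'rb') as file_obj:
                function_pcoip_packetloss(file_obj)   # the question's function, passed the open file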