I have tried to zip log files that I have created, but nothing is written!
Code:
def del_files():
    '''Adapted from answer # http://stackoverflow.com/questions/7217196/python-delete-old-files by jterrace'''
    dir_to_search = os.path.join(os.getcwd(),'log')
    for dirpath, dirnames, filenames in os.walk(dir_to_search):
        for file in filenames:
            curpath = os.path.join(dirpath, file)
            log(curpath)
            if curpath != path:
                log("Archiving old log files...")
                with zipfile.ZipFile("log_old.zip", "w") as ZipFile:
                    ZipFile.write(curpath)
                    ZipFile.close()
                log("archived")
For one thing, you are overwriting the output zip file on each iteration:
with zipfile.ZipFile("log_old.zip", "w") as ZipFile:
Mode "w" creates a new file or truncates an existing one. You probably meant to append to the zip file, which you can do by opening it with mode "a". Alternatively, open the zip file once, outside the outer for loop.
Your code should result in log_old.zip containing a single file - the last one found by os.walk().
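The append-mode option can be sketched roughly like this. The helper name append_to_archive and its parameters are made up for illustration; only zipfile.ZipFile and its write() method come from the standard library.

```python
import zipfile


def append_to_archive(archive_path, file_path, arcname=None):
    """Add one file to a zip archive without discarding existing entries."""
    # Mode "a" keeps whatever is already in the archive (and creates the
    # archive if it does not exist yet); mode "w" would start from empty.
    with zipfile.ZipFile(archive_path, "a") as zf:
        zf.write(file_path, arcname)
```

Calling this once per log file leaves every file in the archive, at the cost of reopening the archive each time.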
Opening the archive outside of the main loop is better since the file will only be opened once, and it will be closed automatically because of the context manager (with):
import os
import zipfile

with zipfile.ZipFile("log_old.zip", "w") as zf:
    dir_to_search = os.path.join(os.getcwd(), 'log')
    for dirpath, dirnames, filenames in os.walk(dir_to_search):
        for file in filenames:
            curpath = os.path.join(dirpath, file)
            zf.write(curpath)
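The question's code also skipped one file (the curpath != path check, presumably the log currently in use). A sketch that opens the archive once and keeps that check might look like the following; the function name and the current_log parameter are assumptions, not part of the original code.

```python
import os
import zipfile


def archive_old_logs(log_dir, archive_path, current_log=None):
    """Zip every file under log_dir except current_log; return entry names."""
    skip = os.path.abspath(current_log) if current_log else None
    # The archive is opened once, so mode "w" truncates it only once.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(log_dir):
            for name in filenames:
                curpath = os.path.join(dirpath, name)
                if os.path.abspath(curpath) == skip:
                    continue  # leave the log file that is still in use
                # Store entries relative to log_dir instead of the full path.
                zf.write(curpath, os.path.relpath(curpath, log_dir))
        return zf.namelist()
```

Keep archive_path outside log_dir, otherwise the walk would pick up the archive itself while it is being written.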
Related
Is there a way to not generate folders during zip? When I extract the zip, it needs to show all the files directly without accessing a folder.
file_paths = utils.get_all_file_paths(path)
with ZipFile("{}/files.zip".format(path), "w") as zip:
    for file in file_paths:
        zip.write(file, os.path.basename(file))
I already tried arcname, but extracting still produces a folder named files.
EDIT:
My code above already removes the parent folder. Right now, when I extract the zip file, it first shows a folder with the same name as the zip. What I want is to zip all the files so that extracting shows the files directly. Basically, no folders should appear during extraction.
I hope the following example will be helpful to you:
import os
import zipfile

TARGET_DIRECTORY = "../test"
ZIPFILE_NAME = "CompressedDir.zip"

def zip_dir(directory, zipname):
    if os.path.exists(directory):
        outZipFile = zipfile.ZipFile(zipname, 'w', zipfile.ZIP_DEFLATED)
        for dirpath, dirnames, filenames in os.walk(directory):
            for filename in filenames:
                filepath = os.path.join(dirpath, filename)
                outZipFile.write(filepath)
        outZipFile.close()

if __name__ == '__main__':
    zip_dir(TARGET_DIRECTORY, ZIPFILE_NAME)
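Note that the example above stores each file under its directory path. For the flat layout the question asks about, a sketch along these lines may help; zip_flat is a hypothetical helper, and it passes os.path.basename as arcname exactly as the question's own code does. If a folder named after the archive still appears on extraction, it may be created by the extraction tool rather than stored in the zip; checking namelist() shows what the archive really contains.

```python
import os
import zipfile


def zip_flat(file_paths, zip_path):
    """Write each file at the archive root, dropping its directory part."""
    seen = set()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for file_path in file_paths:
            name = os.path.basename(file_path)
            # Two files with the same base name would collide in a flat archive.
            if name in seen:
                raise ValueError("duplicate file name in flat archive: " + name)
            seen.add(name)
            zf.write(file_path, arcname=name)
    return sorted(seen)
```

Afterwards, zipfile.ZipFile(zip_path).namelist() should list plain file names with no "/" in them.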
import os
exts = ['ppt', 'pptx', 'doc', 'docx', 'txt', 'pdf', 'epub']
files = []
for root, dirnames, filenames in os.walk('.'):
    for i in exts:
        for file in filenames:
            if file.endswith(i):
                file1 = os.path.join(root, file)
                print(file1)
                with open(os.getcwd()+ r"\ally_"+i+".txt", 'w+') as f:
                    f.write("%s\n" % file1)
I'm trying this code. How do I write all files on my system with, for example, the doc extension into a file named all_docs.txt on my desktop? file.write() inside the for loop only writes the last line for each extension into the files.
You need to open the log file in append mode (a) and not in write mode (w), because with w the file gets truncated (all content deleted) before anything new is written to it.
You can look into the docs of open(). This answer also has an overview of all the file modes.
Does it work with a for you?
with open(os.getcwd()+ r"\ally_"+i+".txt", 'w+') as f:
    f.write("%s\n" % file1)
According to https://docs.python.org/2/library/functions.html#open the "w+" operation truncates the file.
Modes 'r+', 'w+' and 'a+' open the file for updating (reading and writing); note that 'w+' truncates the file.
Opening the file with mode w+ truncates it each time; that is why lines are lost and only the last one remains.
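A small sketch of the difference, using a hypothetical helper that reopens the file for every line, just as the loop in the question does:

```python
def write_lines(path, lines, mode):
    """Reopen path once per line with the given mode; return the final lines."""
    for line in lines:
        # With "w" or "w+" each open() empties the file first;
        # with "a" each write lands after the existing content.
        with open(path, mode) as f:
            f.write(line + "\n")
    with open(path) as f:
        return f.read().splitlines()
```

With mode "w" only the last line survives; with mode "a" all of them do.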
Another small problem is that this way of joining the path and the file name is not portable. You should use os.path.join for that purpose.
with open(os.path.join(os.getcwd(),"ally_"+i+".txt"), 'a') as f:
    f.write("%s\n" % file1)
Another issue is the poor performance you may see with many directories and files.
Your code runs through the filenames in each directory once per extension and opens the output file again and again.
One more issue is how the extension is checked. In most cases the extension can be determined by checking the ending of the file name, but sometimes that is misleading. For example, '.doc' is an extension, but in a file named 'Medoc' the ending 'doc' is just three letters of the name.
So here is an example solution that addresses these problems:
import os
exts = ['ppt', 'pptx', 'doc', 'docx', 'txt', 'pdf', 'epub']
files = []
outfiles = {}
for root, dirnames, filenames in os.walk('.'):
    for filename in filenames:
        _, ext = os.path.splitext(filename)
        ext = ext[1:] # we do not need "."
        if ext in exts:
            file1 = os.path.join(root, filename)
            #print(i,file1)
            if ext not in outfiles:
                # open each output file only once and keep it for later writes
                outfiles[ext] = open(os.path.join(os.getcwd(),"ally_"+ext+".txt"), 'a')
            outfiles[ext].write("%s\n" % file1)
# items() works on both Python 2 and 3 (iteritems() is Python 2 only)
for ext,file in outfiles.items():
    file.close()
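An alternative sketch, assuming Python 3, collects the paths first and then writes each listing once, so no output file stays open during the walk. The function names, the out_dir parameter, and the "all_" prefix are illustrative choices, not part of the answer above.

```python
import os
from collections import defaultdict


def group_files_by_extension(top, exts):
    """Walk top once and map each wanted extension to the paths found."""
    wanted = {e.lower() for e in exts}
    found = defaultdict(list)
    for root, dirnames, filenames in os.walk(top):
        for filename in filenames:
            # splitext keeps 'Medoc' from being mistaken for a .doc file.
            ext = os.path.splitext(filename)[1][1:].lower()
            if ext in wanted:
                found[ext].append(os.path.join(root, filename))
    return found


def write_listings(found, out_dir, prefix="all_"):
    """Write one listing file per extension; return the files written."""
    written = []
    for ext, paths in found.items():
        out_path = os.path.join(out_dir, prefix + ext + ".txt")
        with open(out_path, "w") as f:
            for p in sorted(paths):
                f.write(p + "\n")
        written.append(out_path)
    return written
```

Because every listing is written in a single pass, mode "w" is safe here: rerunning the script replaces the old listing instead of appending duplicates.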
I have read all the Stack Exchange help pages on looping through subfolders, as well as the os documentation, but I am still stuck. I am trying to loop over files in subfolders, open each file, extract the first number in the first line, copy the file to a different subfolder (with the same name, but in the output directory), and rename the copy with the number as a suffix.
import os
import re
outputpath = "C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/Wisconsin_Copies_With_PageNumbers"
inputpath = "C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/FRUS_Wisconsin"
suffix=".txt"
for root, dirs, files in os.walk(inputpath):
    for file in files:
        file_path = os.path.join(root, file)
        foldername=os.path.split(os.path.dirname(file_path))[1]
        filebname=os.path.splitext(file)[0]
        filename=filebname + "_"
        f=open(os.path.join(root,file),'r')
        data=f.readlines()
        if data is None:
            f.close()
        else:
            with open(os.path.join(root,file),'r') as f:
                for line in f:
                    s=re.search(r'\d+',line)
                    if s:
                        pagenum=(s.group())
                        break
            with open(os.path.join(outputpath, foldername,filename+pagenum+suffix), 'w') as f1:
                with open(os.path.join(root,file),'r') as f:
                    for line in f:
                        f1.write(line)
I expect the result to be copies of the files in the input directory placed in the corresponding subfolder in the output directory, renamed with a suffix, such as "005_2", where 005 is the original file name, and 2 is the number the python code extracted from it.
The error I get seems to indicate that I am not looping through the files correctly. I know the code for extracting the first number and renaming the file works because I tested it on a single file. But using os.walk to loop through multiple subfolders is not working, and I can't figure out what I am doing wrong. Here is the error:
File "<ipython-input-1-614e2851f16a>", line 23, in <module>
with open(os.path.join(outputpath, foldername,filename+pagenum+suffix), 'w') as f1:
IOError: [Errno 2] No such file or directory: 'C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/Wisconsin_Copies_With_PageNumbers\\FRUS_Wisconsin\\.dropbox_1473986809.txt'
Well, this isn't elegant, but it worked:
import os
import re
from glob import glob
folderlist=glob("C:\\...FRUS_Wisconsin*\\")
outputpath = "C:\\..\Wisconsin_Copies_With_PageNumbers"
suffix=".txt"
for folder in folderlist:
    foldername = str(folder.split('\\')[7])
    for root, dirs, files in os.walk(folder):
        for file in files:
            filebname=os.path.splitext(file)[0]
            filename=filebname + "_"
            # skip hidden files whose names start with "."
            if not filename.startswith('._'):
                with open(os.path.join(root,file),'r') as f:
                    for line in f:
                        s=re.search(r'\d+',line)
                        if s:
                            pagenum=(s.group())
                            break
                with open(os.path.join(outputpath, foldername,filename+pagenum+suffix), 'w') as f1:
                    with open(os.path.join(root,file),'r') as f:
                        for line in f:
                            f1.write(line)
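Looking at the traceback, a likely cause is that the target folder (FRUS_Wisconsin under the output directory) did not exist, and that the hidden .dropbox file in the top-level folder was being processed too. A sketch that needs no hard-coded folder index, assuming Python 3, is shown below; the function names are made up, and the choice to skip hidden files and files without any number is an assumption about what is wanted.

```python
import os
import re
import shutil


def first_number(path):
    """Return the first run of digits found in the file, or None."""
    with open(path, "r", errors="replace") as f:
        for line in f:
            match = re.search(r"\d+", line)
            if match:
                return match.group()
    return None


def copy_with_page_numbers(inputpath, outputpath, suffix=".txt"):
    """Mirror inputpath under outputpath, naming copies <name>_<number><suffix>."""
    copied = []
    for root, dirs, files in os.walk(inputpath):
        # Subfolder of the current directory relative to the input root.
        rel = os.path.relpath(root, inputpath)
        for name in files:
            if name.startswith("."):
                continue  # hidden files such as .dropbox
            src = os.path.join(root, name)
            pagenum = first_number(src)
            if pagenum is None:
                continue  # nothing to use as a suffix
            target_dir = os.path.normpath(os.path.join(outputpath, rel))
            # open(..., 'w') fails if the folder is missing, so create it first.
            os.makedirs(target_dir, exist_ok=True)
            base = os.path.splitext(name)[0]
            dst = os.path.join(target_dir, base + "_" + pagenum + suffix)
            shutil.copyfile(src, dst)
            copied.append(dst)
    return copied
```

Resetting the number per file (instead of reusing a pagenum left over from a previous file) also avoids mislabeling files that contain no digits.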
I am trying to extract all the zip files, giving each output folder the same name as its archive.
Then I loop through all the files in the folder, and through the lines within those files, to write them to a different text file.
This is my code so far:
#!/usr/bin/env python3
import glob
import os
import zipfile
zip_files = glob.glob('*.zip')
for zip_filename in zip_files:
    dir_name = os.path.splitext(zip_filename)[0]
    os.mkdir(dir_name)
    zip_handler = zipfile.ZipFile(zip_filename, "r")
    zip_handler.extractall(dir_name)
path = dir_name
fOut = open("Output.txt", "w")
for filename in os.listdir(path):
    for line in filename.read().splitlines():
        print(line)
        fOut.write(line + "\n")
fOut.close()
This is the error that I encounter:
for line in filename.read().splitlines():
AttributeError: 'str' object has no attribute 'read'
You need to open each file, joining the directory path to the file name. Also, calling splitlines and then adding a newline back to each line is redundant:
path = dir_name
with open("Output.txt", "w") as fOut:
    for filename in os.listdir(path):
        # join filename to path to avoid file not being found
        with open(os.path.join(path, filename)) as f:
            for line in f:
                fOut.write(line)
You should always use with to open your files, as it closes them automatically. If the files are not large, you can simply call fOut.write(f.read()) and remove the inner loop.
You also set path = dir_name after the first loop, which means path holds whatever the last value of dir_name was; that may or may not be what you want. You can also use iglob to avoid building the full list: zip_files = glob.iglob('*.zip').
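Putting these points together, a sketch that processes every extracted folder (not only the last one) could look like this. It assumes Python 3 and that the archives contain text files; the function name and its parameters are hypothetical.

```python
import glob
import os
import zipfile


def extract_and_concatenate(zip_dir, out_path):
    """Extract each *.zip in zip_dir into a same-named folder and append
    the lines of every extracted file to out_path."""
    extracted_dirs = []
    with open(out_path, "w") as f_out:
        for zip_filename in sorted(glob.glob(os.path.join(zip_dir, "*.zip"))):
            dir_name = os.path.splitext(zip_filename)[0]
            os.makedirs(dir_name, exist_ok=True)
            with zipfile.ZipFile(zip_filename, "r") as zf:
                zf.extractall(dir_name)
            extracted_dirs.append(dir_name)
            # Handle this archive's files before moving on to the next one.
            for root, dirs, files in os.walk(dir_name):
                dirs.sort()  # visit subfolders in a stable order
                for name in sorted(files):
                    with open(os.path.join(root, name), "r") as f_in:
                        for line in f_in:
                            f_out.write(line)
    return extracted_dirs
```

os.walk is used instead of os.listdir so that files inside subfolders of an archive are read as well, and so that directories are never passed to open().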
I have a root-ish directory containing multiple subdirectories, all of which contain a file named data.txt. What I would like to do is write a script that takes in the "root" directory, reads every data.txt in the subdirectories, and writes content from each data.txt to an output file.
Here's a snippet of my code:
import os
import sys
rootdir = sys.argv[1]
with open('output.txt','w') as fout:
    for root, subFolders, files in os.walk(rootdir):
        for file in files:
            if (file == 'data.txt'):
                #print file
                with open(file,'r') as fin:
                    for lines in fin:
                        dosomething()
My dosomething() part -- I've tested and confirmed for it to work if I am running that part just for one file. I've also confirmed that if I tell it to print the file instead (the commented out line) the script prints out 'data.txt'.
Right now if I run it Python gives me this error:
File "recursive.py", line 11, in <module>
with open(file,'r') as fin:
IOError: [Errno 2] No such file or directory: 'data.txt'
I'm not sure why it can't find it -- after all, it prints out data.txt if I uncomment the 'print file' line. What am I doing incorrectly?
You need to pass open() the full path. Your file variable is just a bare filename without a directory path, so open() looks for it in the current working directory. The root variable holds the directory that contains it:
with open('output.txt','w') as fout:
    for root, subFolders, files in os.walk(rootdir):
        if 'data.txt' in files:
            with open(os.path.join(root, 'data.txt'), 'r') as fin:
                for lines in fin:
                    dosomething()
[os.path.join(dirpath, filename) for dirpath, dirnames, filenames in os.walk(rootdir)
                                 for filename in filenames]
A functional approach to getting the tree looks shorter, cleaner, and more Pythonic.
You can wrap os.path.join(dirpath, filename) in any function to process the files you get, or save the list of paths for further processing.
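As a sketch of that idea, the comprehension can be turned into a generator and combined with the fix from the first answer. The names iter_files and collect_lines are made up here, and since dosomething() is not shown in the question, the example simply copies each line to the output file.

```python
import os


def iter_files(rootdir, name=None):
    """Yield the full path of every file under rootdir, optionally
    restricted to files with the given name."""
    for dirpath, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            if name is None or filename == name:
                yield os.path.join(dirpath, filename)


def collect_lines(rootdir, out_path, name="data.txt"):
    """Copy every line of every matching file into out_path; return the count."""
    count = 0
    with open(out_path, "w") as fout:
        for path in iter_files(rootdir, name):
            with open(path, "r") as fin:
                for line in fin:
                    fout.write(line)  # stand-in for dosomething()
                    count += 1
    return count
```

A generator avoids building the whole list up front, which matters only for very large trees; list(iter_files(rootdir)) gives the same result as the comprehension.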