I'm unzipping hundreds of zipped files with Python, as explained here.
import os
import zipfile

base_dir = '/users/me/myFile'  # absolute path to the data folder
extension = ".zip"

os.chdir(base_dir)  # change directory from working dir to dir with files

def unpack_all_in_dir(_dir):
    for item in os.listdir(_dir):  # loop through items in dir
        abs_path = os.path.join(_dir, item)  # absolute path of dir or file
        if item.endswith(extension):  # check for ".zip" extension
            file_name = os.path.abspath(abs_path)  # get full path of file
            zip_ref = zipfile.ZipFile(file_name)  # create zipfile object
            zip_ref.extractall(_dir)  # extract contents to dir
            zip_ref.close()  # close file
        elif os.path.isdir(abs_path):
            unpack_all_in_dir(abs_path)  # recurse this function into the inner folder

unpack_all_in_dir(base_dir)
When I unzip a file manually it keeps its original modification date, whereas when I do it with this code I lose that - the modification date turns into the current date.
Any idea of a way to preserve the original modification date?
I don't know zipfile very well, but according to this thread about modification dates, preserving metadata is a pain.
You can hack around it by calling some CLI archiving program as a subprocess, but you need to make sure it's installed on the target system. I actually had to bundle 7zip with my script once because of an issue with the Python libraries, even third-party ones.
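That said, you can restore the timestamps yourself with just the standard library, since each entry's ZipInfo carries the archived date_time. A minimal sketch (the function name and arguments are just illustrative):

import os
import zipfile
from datetime import datetime

def extract_with_mtimes(zip_path, dest_dir):
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            extracted = zf.extract(info, dest_dir)
            # ZipInfo.date_time is a (year, month, day, hour, minute, second) tuple
            mtime = datetime(*info.date_time).timestamp()
            os.utime(extracted, (mtime, mtime))  # set access and modification times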
Actually, opening multiple zip files manually keeps the dates, at least on macOS.
Search for the files you need to unzip, select them -> Open.
I have been looking for days and was wondering if there is any way to read all the files in a directory in Python without using a loop. The reason I ask is that when I go to write the files, the code goes through the loop again and either overwrites my information or doubles it, when I only need to grab one file.
I love pathlib for such tasks
from pathlib import Path

# create a path object for the folder (PosixPath on macOS/Linux, WindowsPath on Windows)
folder_path = Path('/path/to/your/folder')

# iterate over the directory and store the entries in a list
files_list = list(folder_path.iterdir())

# access file names
print(files_list[0].name)
always pretty handy
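If only some of the files matter, Path.glob (or Path.rglob to recurse into subfolders) filters while it iterates; a couple of one-liners with the same folder_path:

zip_files = list(folder_path.glob('*.zip'))  # only .zip files in this folder
py_files = list(folder_path.rglob('*.py'))   # .py files, including subfolders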
I am trying to create a Python script to help clean up a folder that I create with PowerShell. This is a directory that contains folders named after the users, for them to put stuff into.
Once users leave our site their folders remain, and a new folder gets created for every new member of staff. This means we have over 250 folders but only 100 active staff. I have a test folder that I am using to get the script working; after that I will add extra bits like moving the directories to an old location and then deleting them based on age. At the moment I am working on the delete portion of the script. So far I have the script below; it runs with no errors, but it doesn't actually do anything, and I am failing to see why...
It should read a CSV file that I have of all current staff members, compare that to the folders located in the file path, and then remove any folders that don't match the names in the CSV file.
The CSV file is generated from PowerShell using the same script that I used to create the folders.
import os
import pandas as pd

path = "//servername/Vault/Users$"
flist = pd.read_csv('D:/Staff List.csv')
file_name = flist['fileName'].tolist()

for fileName in os.listdir(path):
    # If file is not present in list
    if fileName not in file_name:
        # Get full path of file and remove it
        full_file_path = os.path.join(path, fileName)
        os.remove(full_file_path)
Use shutil and recursively remove all old user directories and their contents.
import shutil
PATH = 'D:/user/bin/old_dir_to_remove'
shutil.rmtree(PATH)
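Wiring that into the comparison from the question, a sketch that assumes the same path and fileName column; note that shutil.rmtree removes a directory tree, which os.remove (meant for files) cannot do:

import os
import shutil
import pandas as pd

path = "//servername/Vault/Users$"
keep = set(pd.read_csv('D:/Staff List.csv')['fileName'])

for name in os.listdir(path):
    full_path = os.path.join(path, name)
    # only remove directories whose name is not on the current staff list
    if os.path.isdir(full_path) and name not in keep:
        shutil.rmtree(full_path)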
I've got a specific problem:
I am downloading some large sets of data using requests. Each request provides me with a compressed file, containing a manifest of the download, and folders, each containing 1 file.
I can unzip the archive and remove it, and afterwards extract all files from the subdirectories and remove the subdirectories.
Is there a way to combine this? Since I'm new to both actions, I studied some tutorials and Stack Overflow questions on both topics. I'm glad it is working, but I'd like to refine my code and possibly combine these two steps; I didn't come across this while browsing other information.
So for each set of parameters, I perform a request which ends up with:
# Write the file
with open(file_location + file_name, "wb") as output_file:
    output_file.write(response.content)

# Unzip it
with tarfile.open(file_location + file_name, "r:gz") as tarObj:
    tarObj.extractall(path=file_location)

# Remove compressed file
os.remove(file_location + file_name)
And then for the next step I wrote a function that does:

target_dir = keyvalue[1]  # target directory is stored in this tuple
subdirs = get_imm_subdirs(target_dir)  # function to get subdirectories
for f in subdirs:
    for c in os.listdir(os.path.join(target_dir, f)):  # find files in subdir
        shutil.move(os.path.join(target_dir, f, c),
                    os.path.join(target_dir, "ALL_FILES/"))  # move them into 1 subdir
for x in subdirs:
    os.rmdir(os.path.join(target_dir, x))  # remove the now-empty subdirs
Is there an action I can perform during the unzip step?
You can extract the files individually rather than using extractall.
with tarfile.open('musthaves.tar.gz') as tarObj:
    for member in tarObj.getmembers():
        if member.isfile():
            member.name = os.path.basename(member.name)
            tarObj.extract(member, ".")
With appropriate credit to this SO question and the tarfile docs.
getmembers() will provide a list of what is inside the archive (as TarInfo objects); you could use getnames() instead, but then you'd have to devise your own test for whether each entry is a file or a directory.
isfile() - if it's not a file, you don't want it.
member.name = os.path.basename(member.name) strips the subdirectory prefix, so the extractor thinks everything is at the top level.
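If you want to fold the download step in as well, tarfile can read straight from an in-memory file object, so the archive never lands on disk and there is nothing to remove afterwards. A sketch assuming the response and file_location from the question:

import io
import os
import tarfile

# read the gzipped tar straight out of the requests response body
with tarfile.open(fileobj=io.BytesIO(response.content), mode="r:gz") as tarObj:
    for member in tarObj.getmembers():
        if member.isfile():
            member.name = os.path.basename(member.name)  # flatten subdirectories
            tarObj.extract(member, file_location)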
I have a directory on my system that contains ten zip files. Each zip file contains 1 text file. I want to write a Python script that unzips all of the files in the directory, and then concatenates all of the resulting (unzipped) files into a single file. How can I do this? So far, I have a script that is unzipping all of the files, but I am not sure how to go about adding the concatenation. Below is what I have.
import os, zipfile
from fnmatch import fnmatch

dir_name = '/path/to/dir'
pattern = "my-pattern*.gz"

os.chdir(dir_name)  # change directory from working dir to dir with files

for item in os.listdir(dir_name):  # loop through items in dir
    if fnmatch(item, pattern):  # glob-match my pattern (plain == never matches a wildcard)
        file_name = os.path.abspath(item)  # get full path of file
        zip_ref = zipfile.ZipFile(file_name)  # create zipfile object
        zip_ref.extractall(dir_name)  # extract file to dir
        zip_ref.close()  # close file
You don't have to write the files to disk when you unzip them; Python can read each member directly from the zip. So, assuming you don't need anything except the concatenated result, replace your last two lines with:

with open('targetfile', 'ab') as target:  # append in binary mode, since read() returns bytes
    for name in zip_ref.namelist():
        target.write(zip_ref.read(name))
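Putting it together with the directory loop from the question (same dir_name and pattern assumptions), a sketch of the whole job:

import os
import zipfile
from fnmatch import fnmatch

with open('targetfile', 'ab') as target:
    for item in sorted(os.listdir(dir_name)):  # sorted for a deterministic order
        if fnmatch(item, pattern):
            with zipfile.ZipFile(os.path.join(dir_name, item)) as zf:
                for name in zf.namelist():
                    target.write(zf.read(name))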
This is my first time using Python and I keep running into error 183. The script I created searches the network for all '.py' files and copies them to my backup drive. Please don't laugh at my script, as this is my first.
Any clue as to what I am doing wrong in the script?
import os
import shutil
import datetime

today = datetime.date.today()
rundate = today.strftime("%Y%m%d")

for root, dirr, filename in os.walk("p:\\"):
    for files in filename:
        if files.endswith(".py"):
            sDir = os.path.join(root, files)
            dDir = "B:\\Scripts\\20120124"
            modname = rundate + '_' + files
            shutil.copy(sDir, dDir)
            os.rename(os.path.join(dDir, files), os.path.join(dDir, modname))
            print "Renamed %s to %s in %s" % (files, modname, dDir)
I'm guessing you are running the script on Windows. According to the list of Windows error codes, error 183 is ERROR_ALREADY_EXISTS.
So I would guess the script is failing because you're attempting to rename a file over an existing file.
Perhaps you are running the script more than once per day? That would result in all the destination files already being there, so the rename is failing when the script is run additional times.
If you specifically want to overwrite the files, then you should probably delete them using os.unlink first.
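A sketch of that overwrite variant, reusing files, dDir, and modname from your script:

dest = os.path.join(dDir, modname)
if os.path.exists(dest):
    os.unlink(dest)  # remove the stale copy so the rename cannot hit error 183
os.rename(os.path.join(dDir, files), dest)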
Given that error 183 is [Error 183] Cannot create a file when that file already exists, you're most likely finding two files with the same name in the os.walk() call; after the first one is renamed successfully, the second one fails to be renamed to the same name, so you get a file-already-exists error.
I suggest a try/except around the os.rename() call to handle this case (append a digit after the name or something).
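For example, a sketch with a hypothetical helper that appends _1, _2, ... on a collision:

import errno
import os

def rename_with_suffix(src, dst):
    # On Windows, os.rename fails when dst exists (error 183, surfaced as EEXIST).
    base, ext = os.path.splitext(dst)
    candidate, n = dst, 1
    while True:
        try:
            os.rename(src, candidate)
            return candidate
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
            candidate = "%s_%d%s" % (base, n, ext)
            n += 1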
[Yes, I know it's been 7 years since this question was asked, but if I got here from a Google search, maybe others are reaching it too and this answer might help.]
I just encountered the same issue: when you try to rename a folder and a folder with the same name already exists in the same directory, Python will raise an error.
If you try to do that in Windows Explorer, it will ask whether you want to merge the two folders; Python, however, doesn't have this feature.
Below is my code to rename a folder while a folder with the same name already exists, which really amounts to merging the folders.
import os, shutil

DEST = 'D:/dest/'
SRC = 'D:/src/'

for filename in os.listdir(SRC):  # move files from SRC to the DEST folder
    try:
        shutil.move(SRC + filename, DEST)
    # In case the file you're trying to move already has a copy in the DEST folder.
    except shutil.Error:  # shutil.Error: Destination path 'D:/DEST/xxxx.xxx' already exists
        pass

# Now delete the SRC folder.
# To delete a folder, you have to empty it of files first.
if os.path.exists(SRC):
    for i in os.listdir(SRC):
        os.remove(os.path.join(SRC, i))
    # delete the now-empty folder
    os.rmdir(SRC)