This question already has answers here:
Python: Open file in zip without temporarily extracting it
(4 answers)
Closed 5 years ago.
I have a url link for a zip file. I want to download the zip file. Then I want to list the name of all files that are in the zip file. One of them is a .csv file. I also want to read from the csv file.
Can anybody tell me how I can do it in python3?
urllib.request.urlretrieve to download zip file
https://docs.python.org/3/library/urllib.request.html
zipfile module to extract files https://docs.python.org/3/library/zipfile.html
find csv file(s) in path with glob module https://docs.python.org/3/library/glob.html
finally use csv module
https://docs.python.org/3/library/csv.html
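The steps listed above can be sketched in a few lines. This is only a minimal sketch under stated assumptions: the function names fetch_zip and read_csv_rows are made up for illustration, the archive is assumed to fit in memory, and the CSV is assumed to be UTF-8 encoded. It reads the CSV straight from the archive, so nothing is extracted to disk and the glob step is not needed.

```python
import csv
import io
import urllib.request
import zipfile


def fetch_zip(url):
    """Download the archive at `url` and return it as an in-memory ZipFile."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    return zipfile.ZipFile(io.BytesIO(data))


def read_csv_rows(archive, encoding="utf-8"):
    """Return the rows of the first .csv member found in `archive`."""
    csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
    if not csv_names:
        raise ValueError("no .csv file in the archive")
    with archive.open(csv_names[0]) as raw:
        # ZipFile.open yields bytes, so wrap it to get text for csv.reader
        text = io.TextIOWrapper(raw, encoding=encoding, newline="")
        return list(csv.reader(text))
```

With a real link you would call archive = fetch_zip(url), print archive.namelist() to see every file name, and then pass the archive to read_csv_rows.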
This question already has answers here:
Extracting specific files within directory - Windows
(2 answers)
Closed 3 years ago.
I am running a loop which needs to access circa 200 files in the directory.
In the folder, the file names take the following forms:
Excel_YYYYMMDD.txt
Excel_YYYYMMDD_V2.txt
Excel_YYYYMMDD_orig.txt
I only need to extract the first one, that is Excel_YYYYMMDD.txt, and nothing else.
I am using glob.glob to access the directory where I specified my path name as follows:
path = "Z:\T\Al8787\Box\EAST\OT\\ABB files/2019/*[0-9].txt"
However, the code also picks up the Excel_YYYYMMDD_orig.txt file.
I would appreciate help modifying the code so that it only picks up the desired files.
A simple solution would be to loop through the files returned by glob.glob(path). For example, if
files = glob.glob(r"Z:\T\Al8787\Box\EAST\OT\ABB files/2019/*[0-9].txt")
you could have
cleaned_files = [file for file in files if "orig" not in file]
This removes every item in files that contains the substring orig.
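An alternative is to match the whole file name rather than exclude one substring. That also drops the _V2 names, which end in a digit and therefore still match *[0-9].txt. The sketch below assumes the date is always eight digits; the names DATED_NAME and keep_dated are made up for illustration.

```python
import os
import re

# Matches only names like Excel_20190131.txt (eight digits, no suffix)
DATED_NAME = re.compile(r"Excel_\d{8}\.txt")


def keep_dated(paths):
    """Keep the paths whose file name is exactly Excel_YYYYMMDD.txt."""
    return [p for p in paths if DATED_NAME.fullmatch(os.path.basename(p))]
```

You would pass it the list returned by glob.glob, for example cleaned_files = keep_dated(files).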
Maybe you should incorporate a split into the code, splitting each file name on whatever character separates its parts:
var = path.split('whatever character separates them')
Then print out that variable to see the pieces.
This question already has answers here:
How to delete a file by extension in Python?
(7 answers)
Closed 4 years ago.
I want to delete PDF files from a directory using Python.
I have a folder named "pdffiles" that contains lots of PDFs. I want to delete all the files in it without deleting the folder itself. How can I do it (maybe using os.remove())?
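Try listdir + remove:
import os
folder = 'directory path'
for name in os.listdir(folder):
    if name.endswith('.pdf'):
        # listdir returns bare names, so join them with the folder
        os.remove(os.path.join(folder, name))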
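The same idea can be written with pathlib, which yields full paths so no joining is needed. This is a minimal sketch; the function name delete_pdfs is made up for illustration, and it only looks at files directly inside the folder, not in subfolders.

```python
from pathlib import Path


def delete_pdfs(folder):
    """Delete every .pdf file directly inside `folder`; return how many were removed."""
    removed = 0
    for path in Path(folder).glob("*.pdf"):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed
```

The folder itself is left in place, along with any files that do not end in .pdf.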
This question already has answers here:
Regular expression usage in glob.glob?
(4 answers)
Closed 4 years ago.
In a directory I have multiple files, but I want to fetch only a few CSV files whose names match a particular pattern.
Example
files in a directory: abc.csv, xyz.csv, uvw.csv, sampl.csv, code.py, commands.txt, abc_1.csv, sam.csv, xyz_1.csv, uvw_1.csv, mul.csv, pp.csv......
I need to fetch csv filenames : abc.csv , xyz.csv, uvw.csv, abc_1.csv, xyz_1.csv, uvw_1.csv, abc_2.csv , xyz_2.csv, uvw_2.csv,.... (sometimes more files with change in just the number in filename like abc_3.csv)
In python we can fetch the files using
files = glob.glob("*.csv")
But how should I modify the line above to meet this requirement, or is there a more efficient way of doing it?
Using Regex.
Ex:
import glob
import os
import re
for filename in glob.glob(r"Path\*.csv"):
    # abc, xyz or uvw, optionally followed by _<number>
    if re.fullmatch(r"(abc|xyz|uvw)(_\d+)?\.csv", os.path.basename(filename)):
        print(filename)
This question already has answers here:
Python append multiple files in given order to one big file
(12 answers)
Closed 6 years ago.
I have a directory on my system that contains ten zip files. Each zip file contains 1 text file. I want to write a Python script that unzips all of the files in the directory, and then concatenates all of the resulting (unzipped) files into a single file. How can I do this? So far, I have a script that is unzipping all of the files, but I am not sure how to go about adding the concatenation. Below is what I have.
import fnmatch, os, zipfile
dir_name = '/path/to/dir'
pattern = "my-pattern*.gz"
os.chdir(dir_name) # change directory from working dir to dir with files
for item in os.listdir(dir_name): # loop through items in dir
    if fnmatch.fnmatch(item, pattern): # check the name against my pattern
        file_name = os.path.abspath(item) # get full path of files
        zip_ref = zipfile.ZipFile(file_name) # create zipfile object
        zip_ref.extractall(dir_name) # extract file to dir
        zip_ref.close() # close file
You don't have to write the files to disk when you unzip them; Python can read a file directly from the zip. So, assuming you don't need anything except the concatenated result, replace your last two lines with:
for name in zip_ref.namelist():
    # read() returns bytes, so append to the target in binary mode
    with open('targetfile', 'ab') as target:
        target.write(zip_ref.read(name))
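Putting the pieces together, the whole job might look like the sketch below. It is not a definitive implementation: the function name concatenate_zips and its parameters are made up for illustration, the archives are assumed to be processed in alphabetical order, and the target file is overwritten on each run.

```python
import glob
import os
import zipfile


def concatenate_zips(dir_name, pattern, target_path):
    """Write every member of every matching zip in `dir_name` to `target_path`."""
    zip_paths = sorted(glob.glob(os.path.join(dir_name, pattern)))
    with open(target_path, "wb") as target:
        for zip_path in zip_paths:
            with zipfile.ZipFile(zip_path) as archive:
                for name in archive.namelist():
                    # read() returns the member's contents as bytes
                    target.write(archive.read(name))
    return zip_paths
```

Opening the target once in binary mode avoids reopening it for every member, and the with blocks close each archive even if reading fails.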
This question already has answers here:
reading tar file contents without untarring it, in python script
(4 answers)
Closed 2 years ago.
I've got a huge *.tar.gz file and I want to see the list of files contained in it without extracting the contents (preferably with mtimes per file). How can I achieve that in python?
You can use TarFile.getnames() like this:
#!/usr/bin/env python3
import tarfile
tarf = tarfile.open('foo.tar.gz', 'r:gz')
print(tarf.getnames())
http://docs.python.org/3.3/library/tarfile.html#tarfile.TarFile.getnames
And if you want mtime values you can use getmembers().
print([(member.name, member.mtime) for member in tarf.getmembers()])
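For a huge archive it may be preferable to iterate over the TarFile object, which yields one TarInfo at a time instead of building the full member list first. The sketch below assumes a gzip-compressed archive and converts each mtime to a UTC datetime; the function name list_members is made up for illustration.

```python
import tarfile
from datetime import datetime, timezone


def list_members(path):
    """Return (name, modification time) pairs without extracting anything."""
    entries = []
    with tarfile.open(path, "r:gz") as tarf:
        for member in tarf:  # iterating yields one TarInfo at a time
            mtime = datetime.fromtimestamp(member.mtime, tz=timezone.utc)
            entries.append((member.name, mtime))
    return entries
```

Note that a .tar.gz has no central index, so the archive still has to be decompressed sequentially to find every header, even though no file contents are written to disk.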