How to pass a file name at run time in Python - python

I have a directory which contains 4 .csv files at the moment.
I am able to retrieve their names using the os module in the code given below:
import os
fileNames = os.listdir(path)
for f in fileNames:
    print(f)
Now I want to pass the file names one by one to my open file command and do the related processing.
How do I pass the file name in my command:
file = open(r'C:\Users\hu170f\Documents\TEST1\<filename to be passed>')

import os
dir_path = r'D:\text_dirs\\'
fileNames = os.listdir(dir_path)
for fn in fileNames:
    f = open(dir_path+fn, "r")
    print(f.read())

import os
path = r"C:\..."
filenames = os.listdir(path)
You can use the map function like a for loop: it applies one callable (the first parameter) to the elements of a list (the second parameter), one after another. The lambda here is a small function that executes open and returns an _io.TextIOWrapper, so files ends up as a list of _io.TextIOWrapper objects.
files = list(map(lambda filename: open(f"{path}\\{filename}"), filenames))
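Note that this leaves every file handle open. As a small follow-up (just a sketch reusing the files list above), the contents can be read and the handles closed like this:
contents = []
for f in files:
    contents.append(f.read())  # collect the text of each open file
    f.close()                  # close the handle once it has been read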

Related

Extract zip file and nested zip files into target directory using Python

I have a file structure something like this:
/a.zip
    /not_a_zip/
        contents
    /b.zip
        contents
and I want to create a directory a, extract a.zip into it, and extract all the nested zipped files in place, so I get something like this:
/a/
    /not_a_zip/
        contents
    /b/
        contents
I tried this solution, but I was getting errors because inside my main directory I have subdirectories, as well as zip files.
I want to be able to extract the main zip file into a directory of the same name, then be able to extract all nested files within, no matter how deeply nested they are.
EDIT: my current code is this
archive = zipfile.ZipFile(zipped, 'r')
for file in archive.namelist():
    archive.extract(file, resultDirectory)
for f in [filename for filename in archive.NameToInfo if filename.endswith(".zip")]:
    # get file name and path to extract
    fileToExtract = resultDirectory + '/' + f
    # get directory to extract new file to
    directoryToExtractTo = fileToExtract.rsplit('/', 1)
    directoryToExtractTo = directoryToExtractTo[0] + '/'
    # extract nested file
    nestedArchive = zipfile.ZipFile(fileToExtract, 'r')
    for file in nestedArchive.namelist():
        nestedArchive.extract(fileToExtract, directoryToExtractTo)
but I'm getting this error:
KeyError: "There is no item named 'nestedFileToExtract.zip' in the archive"
Even though it exists in the file system.
Based on these other solutions: this and this.
import os
import io
import sys
import zipfile

def extract_with_structure(input_file, output):
    with zipfile.ZipFile(input_file) as zip_file:
        print(f"namelist: {zip_file.namelist()}")
        for obj in zip_file.namelist():
            filename = os.path.basename(obj)
            if not filename:
                # Skip folders
                continue
            if 'zip' == filename.split('.')[-1]:
                # extract a zip: read the nested archive by its full member name
                content = io.BytesIO(zip_file.read(obj))
                f = zipfile.ZipFile(content)
                dirname = os.path.splitext(os.path.join(output, filename))[0]
                for i in f.namelist():
                    f.extract(i, dirname)
            else:
                # extract a file
                zip_file.extract(obj, os.path.join(output))

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("No zipfile specified or output folder.")
        exit(1)
    extract_with_structure(sys.argv[1], sys.argv[2])
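If archives can be nested arbitrarily deep, as the question asks, a recursive approach is another option. The following is only a rough sketch (not the answer above), using a hypothetical extract_nested helper that keeps going until no .zip members remain:
import os
import zipfile

def extract_nested(zip_path, target_dir):
    # extract this archive, then recurse into any .zip files it contained
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    for root, _dirs, files in os.walk(target_dir):
        for name in files:
            if name.endswith(".zip"):
                inner = os.path.join(root, name)
                # extract each nested zip into a folder with the same base name
                extract_nested(inner, os.path.splitext(inner)[0])
                os.remove(inner)  # optional: drop the nested archive afterwards

# e.g. extract_nested("a.zip", "a") would produce the /a/ layout shown above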

How to open and read text files in a folder python

I have a folder which has text files in it. I want to be able to put in a path to this folder and have python go through it, open each file, and append its content to a list.
import os
folderpath = "/Users/myname/Downloads/files/"
inputlst = [os.listdir(folderpath)]
filenamelist = []
for filename in os.listdir(folderpath):
    if filename.endswith(".txt"):
        filenamelist.append(filename)
print(filenamelist)
So far this outputs:
['test1.txt', 'test2.txt', 'test3.txt', 'test4.txt', 'test5.txt', 'test6.txt', 'test7.txt', 'test8.txt', 'test9.txt', 'test10.txt']
I want to have the code take each of these files, open them, and put all of their content into a single huge list, not just print the file names. Is there any way to do this?
You should use open for this.
Read the documentation about its advanced options here.
Anyway, here is one way you can do it:
import os
folderpath = r"yourfolderpath"
inputlst = [os.listdir(folderpath)]
filenamecontent = []
for filename in os.listdir(folderpath):
    if filename.endswith(".txt"):
        f = open(os.path.join(folderpath, filename), 'r')
        filenamecontent.append(f.read())
print(filenamecontent)
If you are using Python 3, you can use:
for filename in filename_list:
    with open(filename, "r") as file_handler:
        data = file_handler.read()
Please mind that you will need the full path (either relative or absolute) to your file in filename.
This way, your file handler will be automatically closed when you get out of the with scope.
More information around here : https://docs.python.org/fr/3/library/functions.html#open
On a side note, in order to list files, you might want to have a look at glob and use:
filename_list = glob.glob("/path/to/files/*.txt")
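Putting the two together, a minimal sketch (assuming the same /path/to/files folder from above) that builds the list of file contents could look like this:
import glob

contents = []
for filename in glob.glob("/path/to/files/*.txt"):
    # open each text file and append its whole content to one list
    with open(filename, "r") as file_handler:
        contents.append(file_handler.read())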
You can use fileinput
Code:
import os
import fileinput

folderpath = "your_path_to_directory_where_files_are_stored"
file_list = [os.path.join(folderpath, a) for a in os.listdir(folderpath) if a.endswith(".txt")]
# This will return all the files which are in .txt format, with their folder path
get_all_files = fileinput.input(file_list)
with open("alldata.txt", 'a+') as writefile:
    for line in get_all_files:
        # lines already end with a newline, so write them as-is
        writefile.write(line)
The above code will read all the data from the .txt files in the specified directory (folderpath) and store it in alldata.txt. The long list you wanted is now stored in that .txt file; if you don't want the file, you can remove the write step.
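If the goal is the in-memory list rather than a combined file, a variant of the same idea (just a sketch, reusing the folderpath placeholder above) can collect the lines directly:
import os
import fileinput

folderpath = "your_path_to_directory_where_files_are_stored"
file_list = [os.path.join(folderpath, a) for a in os.listdir(folderpath) if a.endswith(".txt")]
# gather every line from every .txt file into one big list instead of writing a file
all_lines = [line.rstrip("\n") for line in fileinput.input(file_list)]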
Links:
https://docs.python.org/3/library/fileinput.html
https://docs.python.org/3/library/functions.html#open

Filter Directory using Regex and output filtered files to another directory

I am simply trying to create a Python 3 program that runs through all .sql files in a specific directory, applies my regex that adds ; after a certain instance, and writes the changed files to a separate directory with the same file names.
So, if I had file1.sql and file2.sql in the "/home/files" directory, after I run the program the output should write those two files to "/home/new_files" without changing the content of the original files.
Here is my code:
import glob
import re
folder_path = "/home/files/d_d"
file_pattern = "/*sql"
folder_contents = glob.glob(folder_path + file_pattern)
for file in folder_contents:
    print("Checking", file)
for file in folder_contents:
    read_file = open(file, 'rt', encoding='latin-1').read()
    #words=read_file.split()
    with open(read_file, "w") as output:
        output.write(re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', f, flags=re.DOTALL))
I receive an error of File name too long: "CREATE EXTERNAL TABLe", and I am also not too sure where I would put my output path (/home/files/new_dd) in my code.
Any ideas or suggestions?
With read_file = open(file, 'rt', encoding='latin-1').read(), the whole content of the file was being used as the file name to open for writing. The code provided here iterates over the file names found with the glob.glob pattern, opens each one to read, processes the data, and opens a file to write in the output folder (assuming that a folder newfile_sqls already exists;
if not, an error would be raised: FileNotFoundError: [Errno 2] No such file or directory).
import glob
import os
import re
folder_path = "original_sqls"
#original_sqls\file1.sql, original_sqls\file2.sql, original_sqls\file3.sql
file_pattern = "*sql"
# new/modified files folder
output_path = "newfile_sqls"
folder_contents = glob.glob(os.path.join(folder_path,file_pattern))
# iterate over file names
for file_ in [os.path.basename(f) for f in folder_contents]:
    # open to read
    with open(os.path.join(folder_path, file_), "r") as inputf:
        read_file = inputf.read()
    # use variable 'read_file' here
    tmp = re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', read_file, flags=re.DOTALL)
    # open to write to (previously created) new folder
    with open(os.path.join(output_path, file_), "w") as output:
        output.writelines(tmp)
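If the output folder may not exist yet, creating it up front avoids the FileNotFoundError mentioned above; a small addition (using the same output_path) would be:
import os

# create the destination folder if it is not already there
os.makedirs(output_path, exist_ok=True)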

Read .txt from multiple .zip in folder

I have a folder (not zipped) containing multiple zip files (no other file type within folder). Each zip has the same type of text files containing different data saved within.
I know how to read in each separately, but I am looking to loop the process without having to type in each zip name. The zipfile archive does not seem to allow wild cards, so I cannot loop using this method. Is it possible to loop the process using glob?
The goal is to get the agency names without extracting all the zipfiles.
Single file read
import os
os.listdir('C:\\NTM\\Test\\')
['00003_32_332.zip', '00011_273_569.zip', '00012_258_276.zip']
import glob
glob.glob('C:\\NTM\\Test\\*.zip')
['C:\\NTM\\Test\\00003_32_332.zip', 'C:\\NTM\\Test\\00011_273_569.zip', 'C:\\NTM\\Test\\00012_258_276.zip']
import zipfile
archive=zipfile.ZipFile('C:\\NTM\\Test\\00011_273_569.zip')
testagency=archive.open('agency.txt')
testagency.read()
'agency_id,agency_name,nVRT,ValleyRide'
Update:
Now, that I can loop through the zip files and loop through to get the text file - I cannot print the agency_name from all of the zip files in the folder. My current code only prints the name of the last agency from the text file of the last zip file in the folder. Am I missing some compound statement structure?
import csv

def csv_dict_reader(file_obj):
    reader = csv.DictReader(file_obj, delimiter=',')
    for row in reader:
        print(row['agency_name'])

if __name__ == '__main__':
    with archive.open('agency.txt') as f_obj:
        csv_dict_reader(f_obj)
Whatcom Transportation Authority
Sample Code
import glob
import zipfile
dirName = '/backup/'
zipList = glob.glob(diName+'*.zip')
for zipname in zipList:
    archive = zipfile.ZipFile(zipname)
    fileList = archive.namelist()
    for fileName in fileList:
        if fileName.endswith('.txt'):
            archive.extract(fileName)
    archive.close()
Thanks Jean-Francois!
for archive_name in glob.glob('C:\\NTM\\Test\\*.zip'):
    archive = zipfile.ZipFile(archive_name)
    testagency = archive.open('agency.txt')
    testagency.read()
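To get the agency name from every archive rather than only the last one, the CSV reading has to happen inside the same loop. A Python 3 sketch of that (assuming each zip contains an agency.txt with an agency_name column, as above):
import csv
import glob
import io
import zipfile

for archive_name in glob.glob('C:\\NTM\\Test\\*.zip'):
    with zipfile.ZipFile(archive_name) as archive:
        # open the text file inside this archive and wrap it for the csv module
        with archive.open('agency.txt') as f_obj:
            reader = csv.DictReader(io.TextIOWrapper(f_obj, encoding='utf-8'), delimiter=',')
            for row in reader:
                print(row['agency_name'])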
As I could not comment on Fuji Komalan's comment, here is the fixed code.
import glob
import zipfile
dirName = 'C:/test/'
zipList = glob.glob(dirName + '*.zip')
print(zipList)
for zipname in zipList:
    archive = zipfile.ZipFile(zipname)
    fileList = archive.namelist()
    for fileName in fileList:
        if fileName.endswith('.txt'):
            archive.extract(fileName)
            print(fileName)
    archive.close()

Reading in multiple files in directory using python

I'm trying to open each file from a directory and print the contents, so I have a code as such:
import os, sys
def printFiles(dir):
    os.chdir(dir)
    for f in os.listdir(dir):
        myFile = open(f, 'r')
        lines = myFile.read()
        print lines
        myFile.close()

printFiles(sys.argv[1])
The program runs, but the problem here is that it only prints the contents of one of the files, probably the last file that it has read. Does this have something to do with the open() function?
Edit: added last line that takes in sys.argv. That's the whole code, and it still only prints the last file.
There is a problem with the directory and file paths.
Option 1 - chdir:
def printFiles(dir):
    os.chdir(dir)
    for f in os.listdir('.'):
        myFile = open(f, 'r')
        # ...
Option 2 - computing full path:
def printFiles(dir):
    # no chdir here
    for f in os.listdir(dir):
        myFile = open(os.path.join(dir, f), 'r')
        # ...
But you are combining both options - that's wrong.
This is why I prefer pathlib.Path - it's much simpler:
from pathlib import Path
def printFiles(dir):
    dir = Path(dir)
    for f in dir.iterdir():
        myFile = f.open()
        # ...
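For completeness, a filled-in version of the pathlib variant might look like this (a sketch; the # ... placeholders stand for the same read-and-print body as in the question):
from pathlib import Path

def printFiles(dir):
    dir = Path(dir)
    for f in dir.iterdir():
        if f.is_file():
            # read_text opens, reads, and closes the file in one call
            print(f.read_text())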
The code itself certainly should print the contents of every file.
However, if you supply a relative path and not an absolute path, it will not work.
For example, imagine you have the following folder structure:
./a
./a/x.txt
./a/y.txt
./a/a
./a/a/x.txt
If you now run
printFiles('a')
you will only get the contents of x.txt, because os.listdir will be executed from within a, and will list the contents of the internal a/a folder, which only has x.txt.
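One way around this gotcha, if the chdir call is kept, is to resolve the argument to an absolute path first; a minimal sketch of that adjustment:
import os

def printFiles(dir):
    dir = os.path.abspath(dir)  # resolve to an absolute path before changing directory
    os.chdir(dir)
    for f in os.listdir(dir):
        with open(f, 'r') as myFile:
            print(myFile.read())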
