How to select single File from os.path.join() result? - python

I selected the files I need like this:
import glob, os
from pathlib import Path
folder = Path('D:/xyz/123/Files')
os.chdir(folder)
for file in glob.glob("*.json"):
    JsonFiles = os.path.join(folder, file)
    print(JsonFiles)
As output I get all the .json files I need:
D:/xyz/123/Files/Data.json
D:/xyz/123/Files/Stuff.json
D:/xyz/123/Files/Random.json
D:/xyz/123/Files/Banana.json
D:/xyz/123/Files/Apple.json
For my further code I need a variable holding the different JSON paths. So my idea was to store them in a list instead of printing them. But that is not working:
ListJson = [JsonFiles]
print(ListJson[1])
I get this error:
    print(ListJson[1])
IndexError: list index out of range
How would you solve this problem? I just need a way to work with the paths I have already selected.

Solution 1 with append():
If you change
for file in glob.glob("*.json"):
    JsonFiles = os.path.join(folder, file)
    print(JsonFiles)
to
ListJson = []
for file in glob.glob("*.json"):
    JsonFile = os.path.join(folder, file)
    ListJson.append(JsonFile)
you create an empty list and add one file (the result of os.path.join) to it during each iteration. Your original code kept only the last path, because JsonFiles was overwritten on every pass and ListJson = [JsonFiles] wrapped that single value.
Then you have what you want.
Solution 2 with list comprehensions:
If you want to use a list comprehension ( https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions ), then the following would do:
ListJson = [os.path.join(folder, file) for file in glob.glob("*.json")]
Solution 3 with pathlib, absolute() and glob():
On the other hand, you don't even have to chdir() into the given directory. If Path objects are good enough in your context, you could directly do:
ListJson = list(folder.absolute().glob('*.json'))
or, if you really need strings:
ListJson = [str(path) for path in folder.absolute().glob('*.json')]
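A minimal sketch of why the original attempt yields a one-element list, and what the collected list looks like. It runs against a temporary directory with made-up file names (an assumption for illustration, not the asker's D:/xyz/123/Files folder), and it sorts the result because glob does not guarantee any order.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the asker's folder: a temp dir with sample files.
tmp = tempfile.TemporaryDirectory()
folder = Path(tmp.name)
for name in ("Data.json", "Stuff.json", "Apple.json", "notes.txt"):
    (folder / name).write_text("{}")

# Overwriting one variable in a loop keeps only the last value,
# so wrapping it in a list afterwards gives a single-element list.
last_only = None
for path in folder.glob("*.json"):
    last_only = path
single = [last_only]

# Collecting while iterating keeps every path; sorting makes the order stable.
list_json = sorted(str(p) for p in folder.glob("*.json"))

print(len(single))     # 1
print(len(list_json))  # 3
tmp.cleanup()
```

With the full list in hand, list_json[0], list_json[1] and so on address the individual paths.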

You can use the list append() method:
import glob, os
from pathlib import Path
folder = Path('D:/xyz/123/Files')
ListJson = []
os.chdir(folder)
for file in glob.glob("*.json"):
    JsonFiles = os.path.join(folder, file)
    ListJson.append(JsonFiles)
print(ListJson)

How to loop and optimise data extraction - Python [duplicate]

I need to iterate through all .asm files inside a given directory and do some actions on them.
How can this be done in an efficient way?
Python 3.6 version of the above answer, using os - assuming that you have the directory path as a str object in a variable called directory_in_str:
import os
directory = os.fsencode(directory_in_str)
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".asm") or filename.endswith(".py"):
        # print(os.path.join(directory_in_str, filename))
        continue
    else:
        continue
Or recursively, using pathlib:
from pathlib import Path
pathlist = Path(directory_in_str).glob('**/*.asm')
for path in pathlist:
    # because path is a Path object, not a string
    path_in_str = str(path)
    # print(path_in_str)
You can replace glob('**/*.asm') with rglob('*.asm').
This is like calling Path.glob() with '**/' added in front of the given relative pattern:
from pathlib import Path
pathlist = Path(directory_in_str).rglob('*.asm')
for path in pathlist:
    # because path is a Path object, not a string
    path_in_str = str(path)
    # print(path_in_str)
Original answer:
import os
directory = "/path/to/dir/"
for filename in os.listdir(directory):
    if filename.endswith(".asm") or filename.endswith(".py"):
        # print(os.path.join(directory, filename))
        continue
    else:
        continue
This will iterate over all descendant files, not just the immediate children of the directory:
import os
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # print(os.path.join(subdir, file))
        filepath = subdir + os.sep + file
        if filepath.endswith(".asm"):
            print(filepath)
You can try using the glob module:
import glob
for filepath in glob.iglob('my_dir/*.asm'):
    print(filepath)
and since Python 3.5 you can search subdirectories as well:
glob.glob('**/*.txt', recursive=True) # => ['2.txt', 'sub/3.txt']
From the docs:
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched.
Since Python 3.5, things are much easier with os.scandir(), and 2-20x faster (the with statement form shown here needs Python 3.6+):
import os
with os.scandir(path) as it:
    for entry in it:
        if entry.name.endswith(".asm") and entry.is_file():
            print(entry.name, entry.path)
Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.
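The scandir approach can be wrapped in a small helper. This is only a sketch: the function name files_with_suffix and the sample file names are made up for illustration, and it assumes Python 3.6+ for the with form. It also shows why the is_file() check matters, since a directory can have a matching name too.

```python
import os
import tempfile

def files_with_suffix(directory, suffix):
    """Return sorted names of regular files in `directory` ending with `suffix`."""
    with os.scandir(directory) as entries:
        # sorted() consumes the generator before the scandir iterator is closed.
        return sorted(
            entry.name
            for entry in entries
            if entry.is_file() and entry.name.endswith(suffix)
        )

# Demo in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.asm", "a.asm", "c.py"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "d.asm"))  # a directory with a matching name
    result = files_with_suffix(tmp, ".asm")

print(result)  # ['a.asm', 'b.asm']
```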
Python 3.4 and later offer pathlib in the standard library. You could do:
from pathlib import Path
asm_paths = [pth for pth in Path.cwd().iterdir()
             if pth.suffix == '.asm']
Or if you don't like list comprehensions:
asm_paths = []
for pth in Path.cwd().iterdir():
    if pth.suffix == '.asm':
        asm_paths.append(pth)
Path objects can easily be converted to strings.
Here's how I iterate through files in Python:
import os
path = 'the/name/of/your/path'
folder = os.fsencode(path)
filenames = []
for file in os.listdir(folder):
    filename = os.fsdecode(file)
    if filename.endswith(('.jpeg', '.png', '.gif')):  # whatever file types you're using...
        filenames.append(filename)
filenames.sort()  # now you have the filenames and can do something with them
NONE OF THESE TECHNIQUES GUARANTEE ANY ITERATION ORDERING
Yup, super unpredictable. Notice that I sort the filenames, which is important if the order of the files matters, e.g. for video frames or time-dependent data collection. Be sure to put indices in your filenames though!
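One caveat with indices in filenames: a plain sort() compares character by character, so test10 lands before test2 unless the numbers are zero-padded. A possible workaround is a "natural" sort key, sketched here. The helper name natural_key is an assumption for illustration, not something from the answers above.

```python
import re

def natural_key(name):
    """Split a name into text and number chunks so 'test10' sorts after 'test2'."""
    # re.split with a capturing group alternates text (even index) and digits (odd index).
    parts = re.split(r"(\d+)", name)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]

names = ["test10.txt", "test2.txt", "test1.txt"]
plain = sorted(names)
natural = sorted(names, key=natural_key)

print(plain)    # ['test1.txt', 'test10.txt', 'test2.txt']
print(natural)  # ['test1.txt', 'test2.txt', 'test10.txt']
```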
You can use glob to refer to the directory and list its files:
import glob
import os
# Load the images from the images folder.
for f in glob.glob(os.path.join('images', '*.jpg')):
    image_file_name = os.path.basename(f)
    # Print the file name and the file name with its path (both are strings)
    print(image_file_name, f)
To get a list of all entries in a directory, you can use os:
os.listdir(directory)
I'm not quite happy with this implementation yet. I wanted a custom constructor that does DirectoryIndex._make(next(os.walk(input_path))), so that you can just pass the path you want a file listing for. Edits welcome!
import collections
import os
DirectoryIndex = collections.namedtuple('DirectoryIndex', ['root', 'dirs', 'files'])
index = DirectoryIndex(*next(os.walk('.')))
for file_name in index.files:
    file_path = os.path.join(index.root, file_name)
I really like using the scandir function that is built into the os module. Here is a working example:
import os
i = 0
with os.scandir('/usr/local/bin') as root_dir:
    for path in root_dir:
        if path.is_file():
            i += 1
            print(f"Full path is: {path.path} and just the name is: {path.name}")
print(f"{i} files scanned successfully.")
Get all the .asm files in a directory by doing this:
import os
path = "path_to_file"
file_type = '.asm'
for filename in os.listdir(path=path):
    if filename.endswith(file_type):
        print(filename)
        print(f"{path}/{filename}")
        # do something below
I don't understand why some answers are complicated. This is how I would do it with Python 2.7. Replace DIRECTORY_TO_LOOP with the directory you want to use.
import os
DIRECTORY_TO_LOOP = '/var/www/files/'
for root, dirs, files in os.walk(DIRECTORY_TO_LOOP, topdown=False):
    for name in files:
        print(os.path.join(root, name))

How to open and read text files in a folder python

I have a folder with text files in it. I want to be able to pass in the path to this folder and have Python go through it, open each file and append its content to a list.
import os
folderpath = "/Users/myname/Downloads/files/"
inputlst = [os.listdir(folderpath)]
filenamelist = []
for filename in os.listdir(folderpath):
    if filename.endswith(".txt"):
        filenamelist.append(filename)
print(filenamelist)
So far this outputs:
['test1.txt', 'test2.txt', 'test3.txt', 'test4.txt', 'test5.txt', 'test6.txt', 'test7.txt', 'test8.txt', 'test9.txt', 'test10.txt']
I want the code to take each of these files, open it and put all of its content into a single huge list, not just print the file names. Is there any way to do this?
You should use open() for this.
See the documentation for its advanced options.
Anyway, here is one way to do it:
import os
folderpath = r"yourfolderpath"
filenamecontent = []
for filename in os.listdir(folderpath):
    if filename.endswith(".txt"):
        # the with block closes each file after reading
        with open(os.path.join(folderpath, filename), 'r') as f:
            filenamecontent.append(f.read())
print(filenamecontent)
If you are using Python 3, you can use:
for filename in filename_list:
    with open(filename, "r") as file_handler:
        data = file_handler.read()
Please mind that filename must hold the full (either relative or absolute) path to your file.
This way, your file handler will be closed automatically when you leave the with block.
More information here: https://docs.python.org/fr/3/library/functions.html#open
On a side note, to list files, you might want to have a look at glob and use:
filename_list = glob.glob("/path/to/files/*.txt")
You can use fileinput.
Code:
import fileinput
import os
folderpath = "your_path_to_directory_where_files_are_stored"
# Full paths of all files in folderpath that end with .txt
file_list = [os.path.join(folderpath, a) for a in os.listdir(folderpath) if a.endswith(".txt")]
# Append every line of every file to alldata.txt
with fileinput.input(file_list) as get_all_files, open("alldata.txt", "a") as writefile:
    for line in get_all_files:
        writefile.write(line)  # each line already ends with its newline
The above code reads all the data from the .txt files in the specified directory (folderpath) and stores it in alldata.txt. So the long list you wanted is now stored in a text file; if you don't need that, you can remove the write step.
Links:
https://docs.python.org/3/library/fileinput.html
https://docs.python.org/3/library/functions.html#open
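To end up with the "single huge list" the question asks for, one option is to extend a list with the lines of each file. The following is a sketch under assumptions: the helper name read_text_files, the UTF-8 encoding and the sample files are illustrative choices, not taken from the answers above.

```python
import tempfile
from pathlib import Path

def read_text_files(folder, pattern="*.txt", encoding="utf-8"):
    """Return the lines of every matching file in `folder` as one flat list."""
    lines = []
    # sorted() gives a stable file order; glob itself guarantees none.
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(encoding=encoding) as handle:
            lines.extend(line.rstrip("\n") for line in handle)
    return lines

# Demo in a throwaway directory (hypothetical file names and contents).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "test1.txt").write_text("alpha\nbeta\n", encoding="utf-8")
    Path(tmp, "test2.txt").write_text("gamma\n", encoding="utf-8")
    Path(tmp, "skip.csv").write_text("ignored\n", encoding="utf-8")
    all_lines = read_text_files(tmp)

print(all_lines)  # ['alpha', 'beta', 'gamma']
```

Using extend() instead of append() keeps the result flat: one list of lines rather than one string per file.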

How to delete a file by extension in Python?

I was messing around, trying to make a script that deletes items with the ".zip" extension.
import sys
import os
from os import listdir
test = os.listdir("/Users/ben/downloads/")
for item in test:
    if item.endswith(".zip"):
        os.remove(item)
Whenever I run the script I get:
OSError: [Errno 2] No such file or directory: 'cities1000.zip'
cities1000.zip is obviously a file in my downloads folder.
What did I do wrong here? Is the issue that os.remove requires the full path to the file? If so, how can I do that in the current script without completely rewriting it?
You can put the path into a dir_name variable, then use os.path.join in your os.remove call.
import os
dir_name = "/Users/ben/downloads/"
test = os.listdir(dir_name)
for item in test:
    if item.endswith(".zip"):
        os.remove(os.path.join(dir_name, item))
For this operation you need to prepend the directory path to the file name so that the command knows which folder to look in.
You can do this correctly and portably in Python using os.path.join.
For example:
import os
directory = "/Users/ben/downloads/"
test = os.listdir(directory)
for item in test:
    if item.endswith(".zip"):
        os.remove(os.path.join(directory, item))
An alternate approach that avoids joining paths yourself over and over: use the glob module to join once, then let it give you back the paths directly.
import glob
import os
dir = "/Users/ben/downloads/"
for zippath in glob.iglob(os.path.join(dir, '*.zip')):
    os.remove(zippath)
I think you could use pathlib, a more modern way, like the following:
import pathlib
dir = pathlib.Path("/Users/ben/downloads/")
zip_files = dir.glob("*.zip")  # the pattern is relative to dir
for zf in zip_files:
    zf.unlink()
If you want to delete all zip files recursively, just write:
import pathlib
dir = pathlib.Path("/Users/ben/downloads/")
zip_files = dir.rglob("*.zip")  # recursively
for zf in zip_files:
    zf.unlink()
Just leaving my two cents on this issue: if you want to be chic, you can use glob or iglob from the glob module, like so:
import glob
import os
files_in_dir = glob.glob('/Users/ben/downloads/*.zip')
# or if you want to be fancy, you can use iglob, which returns an iterator:
files_in_dir = glob.iglob('/Users/ben/downloads/*.zip')
for _file in files_in_dir:
    print(_file)  # just to be sure, you know how it is...
    os.remove(_file)
import os
origfolder = "/Users/ben/downloads/"
test = os.listdir(origfolder)
for item in test:
    if item.endswith(".zip"):
        os.remove(os.path.join(origfolder, item))
The directory name is not included in the os.listdir output. You have to prepend it to reference a file from the list returned by that function.
Prepend the directory to the filename:
os.remove("/Users/ben/downloads/" + item)
EDIT: or change the current working directory using os.chdir.
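Since deletion is irreversible, it can help to list the targets before removing them. Below is a sketch of that idea with pathlib; the function name delete_by_suffix and its dry_run parameter are assumptions for illustration, and the demo runs in a temporary directory rather than a real downloads folder.

```python
import tempfile
from pathlib import Path

def delete_by_suffix(directory, suffix, dry_run=True):
    """Delete regular files in `directory` whose suffix equals `suffix`.

    Returns the names of the matching files; with dry_run=True nothing is deleted.
    """
    targets = sorted(p for p in Path(directory).iterdir()
                     if p.is_file() and p.suffix == suffix)
    if not dry_run:
        for path in targets:
            path.unlink()  # full path, so the working directory does not matter
    return [p.name for p in targets]

# Demo in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.zip", "b.zip", "keep.txt"):
        Path(tmp, name).touch()
    planned = delete_by_suffix(tmp, ".zip")                 # only reports
    removed = delete_by_suffix(tmp, ".zip", dry_run=False)  # actually deletes
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(planned, removed, remaining)
```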

Find all CSV files in a directory using Python

How can I find all files in a directory with the extension .csv in Python?
import os
import glob
path = 'c:\\'
extension = 'csv'
os.chdir(path)
result = glob.glob('*.{}'.format(extension))
print(result)
from os import listdir
def find_csv_filenames(path_to_dir, suffix=".csv"):
    filenames = listdir(path_to_dir)
    return [filename for filename in filenames if filename.endswith(suffix)]
The function find_csv_filenames() returns a list of the filenames (as strings) in the directory path_to_dir that end with the given suffix (by default, ".csv").
Addendum
How to print the filenames:
filenames = find_csv_filenames("my/directory")
for name in filenames:
    print(name)
By combining filter and a lambda, you can easily pick out the CSV files in a given folder.
import os
all_files = os.listdir("/path-to-dir")
csv_files = list(filter(lambda f: f.endswith('.csv'), all_files))
# lambda returns True if filename (within `all_files`) ends with .csv or else False
# and filter function uses the returned boolean value to filter .csv files from list files.
Use the Python os module to find CSV files in a directory.
A simple example:
import os
# This is the path where you want to search
path = r'd:'
# This is the extension you want to detect
extension = '.csv'
for root, dirs_list, files_list in os.walk(path):
    for file_name in files_list:
        if os.path.splitext(file_name)[-1] == extension:
            file_name_path = os.path.join(root, file_name)
            print(file_name)
            print(file_name_path)  # This is the full path of the matching file
I had to get CSV files that were in subdirectories, so I modified the response from tchlpr to fit my use case:
import os
import glob
os.chdir( '/path/to/main/dir' )
result = glob.glob( '*/**.csv' )
print( result )
import os
path = 'C:/Users/Shashank/Desktop/'
os.chdir(path)
for p, n, f in os.walk(os.getcwd()):
    for a in f:
        a = str(a)
        if a.endswith('.csv'):
            print(a)
            print(p)
This also helps to identify the path of these CSV files.
While the solution given by thclpr works, it scans only the files directly in the directory and not files in any subdirectories. Although this is not the requirement, in case someone wishes to scan subdirectories too, below is code that uses os.walk:
import os
from glob import glob
PATH = "/home/someuser/projects/someproject"
EXT = "*.csv"
all_csv_files = [file
                 for path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(path, EXT))]
print(all_csv_files)
Copied from this blog.
Use the python glob module to easily list out the files we need.
import glob
path_csv=glob.glob("../data/subfolrder/*.csv")
This solution uses the Python function filter, which keeps the elements for which a function returns true. In this case, the anonymous function does a partial match of '.csv' against every element of the list of directory entries obtained with os.listdir(filepath):
import os
filepath = 'filepath_to_my_CSVs'  # for example: './my_data/'
list(filter(lambda x: '.csv' in x, os.listdir(filepath)))
Many (linked) answers change working directory with os.chdir(). But you don't have to.
Recursively print all CSV files in the /home/project/ directory:
import glob
pathname = "/home/project/**/*.csv"
for file in glob.iglob(pathname, recursive=True):
    print(file)
Requires Python 3.5+. From the docs [1]:
pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif)
pathname can contain shell-style wildcards.
Whether or not the results are sorted depends on the file system.
If recursive is true, the pattern ** will match any files and zero or more directories, subdirectories and symbolic links to directories
[1] https://docs.python.org/3/library/glob.html#glob.glob
You could just use glob with recursive=True; the pattern ** will match any files and zero or more directories, subdirectories and symbolic links to directories.
import glob, os
os.chdir("C:\\Users\\username\\Desktop\\MAIN_DIRECTORY")
for file in glob.glob("**/*.csv", recursive=True):
    print(file)
Please use this tested, working code. This function returns a list of all the CSV files in your specified path, as absolute file paths.
import os
from glob import glob
def get_csv_files(dir_path, ext):
    os.chdir(dir_path)
    return list(map(lambda x: os.path.join(dir_path, x), glob(f'*.{ext}')))
print(get_csv_files("E:\\input\\dir\\path", "csv"))
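For comparison, here is a sketch of a recursive search that neither changes the working directory nor depends on it, using pathlib's rglob. The function name find_csv_files and the sample directory layout are assumptions for illustration.

```python
import tempfile
from pathlib import Path

def find_csv_files(root):
    """Return sorted absolute paths (as strings) of all .csv files under `root`."""
    return sorted(str(p.resolve())
                  for p in Path(root).rglob("*.csv")
                  if p.is_file())

# Demo in a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub").mkdir()
    (base / "top.csv").write_text("a,b\n")
    (base / "sub" / "nested.csv").write_text("c,d\n")
    (base / "notes.txt").write_text("x\n")
    found = [Path(p).name for p in find_csv_files(base)]

print(found)  # ['nested.csv', 'top.csv']
```

Because no os.chdir() call is involved, other code that relies on relative paths keeps working after the search.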
