Reading multiple lines of strings into a list - Python [duplicate] - python

This question already has answers here:
How do I list all files of a directory?
(21 answers)
Closed 2 years ago.
I have a code that outputs all of the files that are a .pdf in a directory. It outputs a stack of strings like below.
file0.PDF
file1.PDF
file2.PDF
file3.PDF
I want to put these strings into a list, that looks like this:
['file0.PDF', 'file1.PDF', 'file2.PDF', 'file3.PDF']
I have managed to do this with the code below.
import os
list_final = []
for file in os.listdir(path):
    if ".PDF" in file:
        for value in file.split('\n'):
            list_final.append(value)
print(list_final)
This gives the list in the format above, which is what I want.
Is there a better way to do this? I feel that my code is very inefficient. I tried a list comprehension such as the one below, but I am unsure why it does not work.
list_final = [value for value in file.split('\n')]
Thanks in advance.

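Try using glob.glob(); it finds all files that match a pattern:
import glob
print(glob.glob("*.pdf")) # returns a list of filenames
On POSIX systems the match is case-sensitive, so use "*.PDF" for the files shown in the question.
If you want a directory other than the current one, prepend it to the pattern:
print(glob.glob(path + "/*.pdf")) # returns a list of filenames
Or, even better, build the pattern with os.path.join(). Do not start the pattern with a slash here, because join() discards path when a later component is absolute:
from os.path import join
glob.glob(join(path, "*.pdf"))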

You can use a list comprehension:
list_final = [e for e in os.listdir(path) if e.endswith('.PDF')]
Or you could use pathlib.Path.glob:
from pathlib import Path
p = Path(path)
list_final = [e.name for e in p.glob('*.PDF')]
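The answers above match either ".PDF" or ".pdf" exactly. As a minimal sketch, assuming you want to accept the extension in any letter case, the suffix can be compared after lowercasing it. The helper name list_pdfs is hypothetical and not part of the original answers.

```python
from pathlib import Path


def list_pdfs(directory):
    """Return the sorted names of files in `directory` with a .pdf suffix in any case."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        # suffix is the final extension including the dot, e.g. ".PDF"
        if entry.is_file() and entry.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # List the PDF files of the current working directory.
    print(list_pdfs("."))
```

Sorting is optional; it is included here only so that the result does not depend on the order in which the file system lists entries.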

Related

Reading multiple CSVs using glob results in wrong order [duplicate]

This question already has answers here:
Sort filenames in directory in ascending order [duplicate]
(3 answers)
Non-alphanumeric list order from os.listdir()
(14 answers)
Closed 2 years ago.
I need to read in multiple CSV files in the correct order. The files are named with sequential numbers, like "file_0.csv", "file_1.csv", "file_2.csv", ... and were created in this same order.
When running my code, the files are not kept in this order but instead completely mixed up.
There are no other files in the path folder.
import glob
import pandas as pd

path = "stored_files"
filenames = glob.iglob(path + "/*.csv")
dataframes = []
for filename in filenames:
    dataframes.append(pd.read_csv(filename))
AFAIK, glob returns files in arbitrary order (whatever order the file system lists them in), not in name or creation order. You can always sort the filenames:
path = "stored_files"
filenames = glob.iglob(path + "/*.csv")
dataframes = []
for filename in sorted(filenames):
    dataframes.append(pd.read_csv(filename))
A plain string sort will not order names containing numbers the way you expect (in particular, "10" sorts before "2"). So, if you know what your filenames look like, loop through the numbers and build each name, e.g. "foo" + str(i) + ".csv", appending it to your file list.
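If the numbers are not known in advance, another option is a "natural" sort key that compares the digit runs as integers. This is a minimal sketch, not something taken from the answers above; the helper name natural_key and the sample file names are assumptions made for illustration.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    # re.split with a capturing group keeps the digit runs in the result,
    # so text chunks and number chunks always alternate.
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r"(\d+)", name)
    ]


# Hypothetical file names, deliberately out of order.
filenames = ["file_10.csv", "file_2.csv", "file_0.csv", "file_1.csv"]
ordered = sorted(filenames, key=natural_key)
print(ordered)
```

The same key can be passed to sorted() around glob.iglob(...) in the loop above, so the CSV files are read in numeric order.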

Get a list of tuples containing files with the same names but different endings

I have a folder which among others contains pairs of json and jpeg files with the same file names. Based on this folder, I want to create a list of tuples containing the pairs as follows:
[('first.json','first.jpg'),('second.json','second.jpg')...('last.json','last.jpg')]
Filtering for only json and jpg files is easy:
import os
import re
files = [targetFile for targetFile in os.listdir('Z:/data') if re.match(r'.*\.json|.*\.jpg', targetFile)]
print(files)
But how can I combine that part with the generation of the list of tuples without iterating through the file list for a second time?
This should work, per your comments:
files, tuples = list(), list()
for targetFile in os.listdir('Z:/data'):
    if re.match(r'.*\.json|.*\.jpg', targetFile):
        files.append(targetFile)
        tuples.append((...))
...where in the ellipsis you place code that extracts the filenames and appends the endings.
Thanks to @John Perry, who reminded me that style shouldn't prevail over functionality, I came up with the following simple solution:
import os
from collections import defaultdict
listOfRelevantFiles = defaultdict(list)
for targetFile in os.listdir('Z:/data'):
    if '.jpg' in targetFile or '.json' in targetFile:
        listOfRelevantFiles[targetFile.split('.')[0]].append(targetFile)
print(listOfRelevantFiles)
If you are looking for a more compact way of doing this, the snippet below works. PS: note that it returns a list of lists instead of a list of tuples, and because it iterates over a set, the order of the groups may vary between runs.
import re
# bunch of files os.listdir() returns
files = ['first.jpg', 'first.json', 'second.jpg', 'second.json']
print([re.findall(re.escape(fileName) + r'(?:\.jpg|\.json)', ' '.join(files))
       for fileName in set(re.findall(r'(\w*?)(?:\.jpg|\.json)', ' '.join(files)))])
# e.g. [['second.jpg', 'second.json'], ['first.jpg', 'first.json']]
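None of the snippets above actually produces the list of tuples the question asks for. One way to get it in a single pass over the names is to group them by stem with os.path.splitext and keep only the stems that have both endings. This is a sketch under assumptions: the function name pair_files and its extensions parameter are hypothetical, and the pairs are returned sorted by stem.

```python
import os
from collections import defaultdict


def pair_files(names, extensions=(".json", ".jpg")):
    """Return one tuple per stem that has a file for every extension in `extensions`."""
    by_stem = defaultdict(dict)
    for name in names:
        stem, ext = os.path.splitext(name)
        ext = ext.lower()
        if ext in extensions:
            by_stem[stem][ext] = name
    return [
        tuple(by_stem[stem][ext] for ext in extensions)
        for stem in sorted(by_stem)
        # skip stems where one of the two files is missing
        if all(ext in by_stem[stem] for ext in extensions)
    ]


# Sample names standing in for what os.listdir() would return.
sample = ["first.jpg", "first.json", "second.jpg", "second.json", "notes.txt"]
print(pair_files(sample))
```

With a real folder, the call would be pair_files(os.listdir('Z:/data')), which still lists the directory only once.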

Python glob, os, relative path, making filenames into a list [duplicate]

This question already has answers here:
Python Glob without the whole path - only the filename
(10 answers)
Closed 5 years ago.
I am trying to make a list of all files in a directory whose filenames end in .root.
After reading some posts in the forum I tried two basic strategies, using glob and os.listdir, but I ran into trouble with both of them.
First, when I use
import glob
filelist = glob.glob('/home/usr/dir/*.root')
It does make a list of strings with all filenames that end in .root, but I still face a problem.
I would like the strings in the list to look like '/dir/*.root', but they contain the full path '/home/usr/dir/*.root'.
Second, if I use os.listdir, I get into the trouble that
path = '/home/usr/'
filelist = os.listdir(path + 'dir/*.root')
syntax error
which tells me that I cannot get the list of only the .root files this way.
In summary, I would like to make a list of filenames that end in .root and are in my /home/usr/dir, while cutting off the '/home/usr' part. If I use glob, I have the problem of keeping /home/usr/. If I use os.listdir, I can't specify the ".root" ending.
glob will return paths in a format matching your query, so that
glob.glob("/home/usr/dir/*.root")
# ['/home/usr/dir/foo.root', '/home/usr/dir/bar.root', ...]
glob.glob("*.root")
# ['foo.root', 'bar.root', ...]
glob.glob("./*.root")
# ['./foo.root', './bar.root', ...]
...and so forth.
To get only the filename, you can use path.basename of the os module, something like this:
from glob import glob
from os import path
pattern = "/home/usr/dir/*.root"
files = [path.basename(x) for x in glob(pattern)]
# ['foo.root', 'bar.root', ...]
...or, if you want to prepend the dir part:
pattern = "/home/usr/dir/*.root"
files = [path.join('dir', path.basename(x)) for x in glob(pattern)]
# ['dir/foo.root', 'dir/bar.root', ...]
...or, if you really want the path separator at the start:
from glob import glob
import os
pattern = "/home/usr/dir/*.root"
files = [os.sep + os.path.join('dir', os.path.basename(x)) for x in glob(pattern)]
# ['/dir/foo.root', '/dir/bar.root', ...]
Using path.join and path.sep will make sure that the correct path syntax is used, depending on your OS (i.e. / or \ as a separator).
Depending on what you are really trying to do here, you might want to look at os.path.relpath, for the relative path. The title of your question indicates that relative paths might be what you are actually after:
pattern = "/home/usr/dir/*.root"
files = [os.path.relpath(x) for x in glob(pattern)]
# files will now contain the relative path to each file, from the current working directory
Just use glob to get the list you want, and then call os.path.relpath on each file:
import glob
import os
files_names = []
for file in glob.glob('/home/usr/dir/*.root'):
    files_names.append(os.path.relpath(file, "/home/usr"))
You can also use a regex inside the same loop to strip the prefix:
import re
files_names.append(re.sub(r'^/home/usr/', '', file))
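The same result can also be reached with pathlib, where relative_to drops the leading part of each path. This is a minimal sketch, assuming the base directory, subdirectory and pattern are passed in separately; the helper name relative_names is hypothetical.

```python
from pathlib import Path


def relative_names(base, subdir, pattern):
    """Return paths matching `pattern` inside base/subdir, expressed relative to `base`."""
    base = Path(base)
    return sorted(
        str(match.relative_to(base))
        for match in (base / subdir).glob(pattern)
    )


# Usage with the paths from the question (not executed here):
# relative_names("/home/usr", "dir", "*.root")
```

The returned strings start with the subdirectory name and use the separator of the operating system the code runs on.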

Make a List for sub directories path [duplicate]

This question already has answers here:
Getting a list of all subdirectories in the current directory
(34 answers)
Closed 7 years ago.
I have a directory and I need to make a path list for the sub directories.
For example my main directory is
C:\A
which contains four different sub directories : A1,A2,A3,A4
I need a list like this:
Path_List = ["C:\A\A1","C:\A\A2","C:\A\A3","C:\A\A4"]
Cheers
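import os
base_dir = os.getcwd()  # assumes the script is run from the main directory, e.g. C:\A
sub_dirs = [os.path.join(base_dir, d) for d in os.listdir(base_dir)
            if os.path.isdir(os.path.join(base_dir, d))]  # keep directories only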
You could (and should, I think) use the os module, which is easy to use and provides a lot of tools for dealing with paths.
I won't say any more, because your question is vague and shows no effort at searching. You are welcome!!
sub_dirs = os.listdir(os.getcwd())  # Assuming you start the script in C:/A
path_list = []
for i in sub_dirs:
    path_list.append(os.path.join(os.getcwd(), i))
Dirty and fast.
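Note that os.listdir returns files as well as directories. As a sketch of an alternative, os.scandir yields entries that already know whether they are directories, so the filter and the path join come for free. The function name list_subdirectories is hypothetical.

```python
import os


def list_subdirectories(base_dir):
    """Return the sorted full paths of the immediate subdirectories of `base_dir`."""
    with os.scandir(base_dir) as entries:
        # entry.path is base_dir joined with the entry name
        return sorted(entry.path for entry in entries if entry.is_dir())


if __name__ == "__main__":
    # Subdirectories of the current working directory.
    print(list_subdirectories(os.getcwd()))
```

Passing the main directory explicitly, for example list_subdirectories(r"C:\A"), avoids depending on where the script is started.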

save os.walk() in variable [duplicate]

This question already has answers here:
os.walk() ValueError: need more than 1 value to unpack
(4 answers)
Closed 9 years ago.
Can I somehow save the output of os.walk() in variables? I tried
basepath, directories, files = os.walk(path)
but it didn't work. I want to process the files of the directory and one specific subdirectory. Is this somehow possible? Thanks
os.walk() returns a generator that successively yields a (dirpath, dirnames, filenames) tuple for every directory in the tree below the starting path, which is why it cannot be unpacked into three variables directly. If you only want to process the files in a directory and one specific subdirectory, you can combine os.listdir() with os.path.isfile() and os.path.isdir() to get what you want.
Something like this:
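import os

def files_and_subdirectories(root_path):
    files = []
    directories = []
    for f in os.listdir(root_path):
        # join with root_path so the checks work from any working directory
        full_path = os.path.join(root_path, f)
        if os.path.isfile(full_path):
            files.append(f)
        elif os.path.isdir(full_path):
            directories.append(f)
    return directories, files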
And use it like so:
directories,files = files_and_subdirectories(path)
"I want to proceed the files of the directory and one specific subdirectory. is this somehow possible ?"
If that's all you want, then simply try:
[e for e in os.listdir('.') if os.path.isfile(e)] + os.listdir(special_dir)
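To keep the unpacking from the question, the first tuple can be pulled out of the os.walk() generator with next(). This is a minimal sketch; the function name top_level is hypothetical, and it assumes the path exists (os.walk yields nothing for a missing directory, in which case next() raises StopIteration).

```python
import os


def top_level(path):
    """Return (root, directories, files) for `path` only, without descending further."""
    # os.walk is a generator; next() takes just its first (dirpath, dirnames, filenames) tuple
    return next(os.walk(path))


if __name__ == "__main__":
    basepath, directories, files = top_level(".")
    print(basepath, directories, files)
```

The files of one specific subdirectory can then be read the same way, for example with top_level(os.path.join(basepath, name)) for a name taken from directories.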
