Reading multiple CSVs using glob results in wrong order [duplicate] - python

This question already has answers here:
Sort filenames in directory in ascending order [duplicate]
(3 answers)
Non-alphanumeric list order from os.listdir()
(14 answers)
Closed 2 years ago.
I need to read in multiple CSV files in the correct order. The files are named with sequential numbers, like "file_0.csv", "file_1.csv", "file_2.csv", ... and were created in this same order.
When running my code, the files are not kept in this order but instead completely mixed up.
There are no other files in the path folder.
import glob
import pandas as pd

path = "stored_files"
filenames = glob.iglob(path + "/*.csv")
dataframes = []
for filename in filenames:
    dataframes.append(pd.read_csv(filename))

AFAIK, glob makes no guarantee about ordering: it returns names in whatever order the operating system lists the directory, which can look random. You can always sort the filenames:
path = "stored_files"
filenames = glob.iglob(path + "/*.csv")
dataframes = []
for filename in sorted(filenames):
    dataframes.append(pd.read_csv(filename))

A plain string sort will not order names containing numbers the way you expect: it compares character by character, so 10 precedes 2. So, if you know what your filenames look like, loop through the numbers and append "foo"+str(i)+".csv" (or whatever your pattern is) to your file list.
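Another way to get numeric ordering is to sort with a key that extracts the number from each name. A minimal sketch, assuming the file_N.csv naming from the question; numeric_key is a hypothetical helper, not part of glob or pandas:

```python
import os
import re

def numeric_key(filename):
    # Hypothetical helper: use the first run of digits in the base name as the sort key.
    match = re.search(r"\d+", os.path.basename(filename))
    # Names without any digits sort first.
    return int(match.group()) if match else -1

# Example names in the arbitrary order glob might return them.
filenames = ["stored_files/file_10.csv", "stored_files/file_2.csv", "stored_files/file_0.csv"]
ordered = sorted(filenames, key=numeric_key)
print(ordered)
```

With real files, the same key could be passed as sorted(glob.glob(path + "/*.csv"), key=numeric_key) before calling pd.read_csv on each name.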

Related

How to get all files in a directory? [duplicate]

This question already has answers here:
List only files in a directory?
(8 answers)
how to check if a file is a directory or regular file in python? [duplicate]
(4 answers)
Closed 2 months ago.
I have a directory and need to get all files in it, but not subdirectories.
I have found os.listdir(path) but that gets subdirectories as well.
My current temporary solution is to then filter the list to include only the things with '.' in the title (since files are expected to have extensions, .txt and such) but that is obviously not optimal.
We can create an empty list called my_files and iterate through the entries in the directory. os.listdir returns bare names, so each one is joined to path before the check; otherwise it would be resolved against the current working directory. If an entry is not a directory, it is treated as a file.
import os

my_files = []
for i in os.listdir(path):
    if not os.path.isdir(os.path.join(path, i)):
        my_files.append(i)
That being said, you can also check that it is a file instead of checking that it is not a directory, by using if os.path.isfile(os.path.join(path, i)).
I find this approach simpler than glob, although the names still have to be joined to path before they are tested.
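If you would rather not build full paths yourself, os.scandir is another option. A minimal sketch, where list_files and the contents of the temporary directory are made up for illustration:

```python
import os
import tempfile

def list_files(path):
    # os.scandir yields DirEntry objects that know their own location,
    # so is_file() works without joining names to path.
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())

# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subdir"))
    for name in ("a.txt", "README"):
        open(os.path.join(tmp, name), "w").close()
    found = list_files(tmp)
    print(found)
```

Files without an extension (README here) are kept and the subdirectory is skipped, which the '.'-in-the-name filter from the question would get wrong.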

While iterating through files, how to append each filename to a list? [duplicate]

This question already has answers here:
how to split out the file name from path by different characters in python?
(2 answers)
Extract file name from path, no matter what the os/path format
(22 answers)
Split filenames with python
(6 answers)
How do I get the filename without the extension from a path in Python?
(31 answers)
Closed 2 years ago.
I am iterating through a folder of files, to extract some text from an xml, and wish to keep track of which file each text match came from.
I am looking to put the filenames into the filename_master list. I think I may be over-complicating by using a regex (each filename has 14 digits.xml) but this isn't coming to me.
import glob
import os
import re
import xml.etree.ElementTree as ET

path = '/Users/Downloads/PDF/XML/'
read_files = glob.glob(os.path.join(path, '*.xml'))
filename_master = []
text_master = []
for file in read_files:
    parse = ET.parse(file)
    root = parse.getroot()
    all_nodes = list(root.iter())
    # keep the text of every element whose mark attribute is "1"
    ls = [ele.text for ele in all_nodes if ele.findall('[@mark="1"]')]
    # capture the 14 digits that precede ".xml"
    my_exp = re.compile(r'.*(\d{14})\.xml')
    name = my_exp.match(file).group(1)
    filename_master.append(name)
    text_master.append(ls)
If you are sure that every filename ends in 14 digits followed by .xml, you can slice the path directly:
name = file[-18:-4]
filename_master.append(name)
or, if you are in a Linux environment (where "/" is the path separator):
name = file.split('/')[-1][:-4]
filename_master.append(name)
or better, since it uses the platform's own path separator:
name = os.path.basename(file)[:-4]
filename_master.append(name)
but using a regex is fine IMHO.
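If the extension length should not be hard-coded, the standard library can strip it for you. A small sketch using os.path.splitext and pathlib's .stem; the example path is made up:

```python
import os
from pathlib import Path

file = "/Users/Downloads/PDF/XML/12345678901234.xml"  # made-up example path

# os.path.splitext splits off the last extension, whatever its length.
name = os.path.splitext(os.path.basename(file))[0]

# pathlib exposes the same value as the .stem attribute.
stem = Path(file).stem

print(name, stem)
```

Either value could be appended to filename_master in place of the sliced string.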

Reading multiple lines of strings into a list - Python [duplicate]

This question already has answers here:
How do I list all files of a directory?
(21 answers)
Closed 2 years ago.
I have code that outputs all of the .pdf files in a directory. It outputs a stack of strings like below.
file0.PDF
file1.PDF
file2.PDF
file3.PDF
I want to put these strings into a list, that looks like this:
['file0.PDF', 'file1.PDF', 'file2.PDF', 'file3.PDF']
I have managed to do this with the code below.
import os
list_final = []
for file in os.listdir(path):
    if ".PDF" in file:
        for value in file.split('\n'):
            list_final.append(value)
print(list_final)
This gives the list in the format above, which is what I want.
Is there a better way to do this? I feel that my code is very inefficient. I tried a list comprehension such as the one below, but I am unsure why it does not work.
list_final = [value for value in file.split('\n')]
Thanks in advance.
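Try using glob.glob(); it finds all files that match a pattern. Note that on a case-sensitive filesystem "*.pdf" does not match names ending in ".PDF", so adjust the pattern to the case your files use:
import glob
print(glob.glob("*.pdf")) # returns a list of filenames
Or, if you want to search a path other than the current directory, just prepend it to the pattern:
print(glob.glob(path + "/*.pdf")) # returns a list of filenames
Or even better, use os.path.join() instead (without a leading slash in the pattern, because an absolute second argument makes join discard path):
from os.path import join
glob.glob(join(path, "*.pdf"))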
You can use a list comprehension:
list_final = [e for e in os.listdir(path) if e.endswith('.PDF')]
or you could use pathlib.Path.glob:
from pathlib import Path
p = Path(path)
list_final = [e.name for e in p.glob('*.PDF')]
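Since the files in the question end in .PDF while some patterns above use .pdf, a case-insensitive filter may be useful. A minimal sketch, assuming both spellings should be accepted; list_pdfs and the file names are made up for the demonstration:

```python
import os
import tempfile

def list_pdfs(path):
    # Compare the lower-cased name so both .pdf and .PDF are accepted.
    return sorted(e for e in os.listdir(path) if e.lower().endswith(".pdf"))

with tempfile.TemporaryDirectory() as tmp:
    for name in ("file0.PDF", "file1.pdf", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    pdfs = list_pdfs(tmp)
    print(pdfs)
```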

How to rename many files in many folders with python? [duplicate]

This question already has answers here:
Rename multiple files inside multiple folders
(3 answers)
Closed 4 years ago.
I'm trying to strip everything from each filename except its last 4 characters and the file extension, in Python. For example:
a2b-0001.tif to 0001.tif
a3tcd-0002.tif to 0002.tif
as54d-0003.tif to 0003.tif
Let's say that folders "a", "b" and "c", which contain those tif files, are located in D:\photos.
There are many of those files in many folders in D:\photos.
This is how far I have got:
import os
os.chdir('C:/photos')
for dirpath, dirnames, filenames in os.walk('C:/photos'):
    os.rename(filenames, filenames[-8:])
Why is that not working?
So long as you have Python 3.4+, pathlib makes it extremely simple to do:
import pathlib

def rename_files(path):
    # Iterate through children of the given path
    for child in path.iterdir():
        # If the child is a file
        if child.is_file():
            # .stem is the filename (minus the extension)
            # .suffix is the extension
            name, ext = child.stem, child.suffix
            # Rename the file in place, keeping only the last 4 characters of the name
            # (with_name keeps it in its own folder; a bare name would be relative to the cwd)
            child.rename(child.with_name(name[-4:] + ext))

directory = pathlib.Path(r"C:\photos").resolve()
# Iterate through your initial folder
for child in directory.iterdir():
    # If the child is a folder
    if child.is_dir():
        # Rename all files within that folder
        rename_files(child)
Just note that because you're truncating file names, there may be collisions (e.g. files named 12345.jpg and 22345.jpg both map to 2345.jpg). On Unix the second rename silently overwrites the first file; on Windows it raises FileExistsError instead.
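A collision check can be added before each rename. The sketch below is one possible approach, not part of the answer above; safe_rename is a hypothetical helper and the file names are the ones from the collision example:

```python
import pathlib
import tempfile

def safe_rename(child):
    # Keep only the last 4 characters of the stem, in the file's own folder.
    target = child.with_name(child.stem[-4:] + child.suffix)
    if target == child:
        return child  # name is already short enough; nothing to do
    if target.exists():
        # Skip instead of overwriting; the caller decides what to do next.
        return None
    child.rename(target)
    return target

with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    first = folder / "12345.jpg"
    second = folder / "22345.jpg"
    first.touch()
    second.touch()
    results = [safe_rename(first), safe_rename(second)]
    remaining = sorted(p.name for p in folder.iterdir())
    print(remaining)
```

Here the first file is renamed and the second is left untouched because its target name is already taken.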

Use os.listdir to show directories only [duplicate]

This question already has answers here:
How to list only top level directories in Python?
(21 answers)
Closed 2 years ago.
How can I bring python to only output directories via os.listdir, while specifying which directory to list via raw_input?
What I have:
file_to_search = raw_input("which file to search?\n>")
dirlist=[]
for filename in os.listdir(file_to_search):
    if os.path.isdir(filename) == True:
        dirlist.append(filename)
print dirlist
Now this actually works if I input (via raw_input) the current working directory. However, if I put in anything else, the list returns empty. I tried to divide and conquer this problem but individually every code piece works as intended.
That's expected: os.listdir only returns the names of the files/dirs, so os.path.isdir looks for them in the current directory and does not find them unless that is the directory you listed.
You have to join each name to the scanned directory to build the full path for it to work:
for filename in os.listdir(file_to_search):
    if os.path.isdir(os.path.join(file_to_search,filename)):
        dirlist.append(filename)
Note the list comprehension version:
dirlist = [filename for filename in os.listdir(file_to_search) if os.path.isdir(os.path.join(file_to_search,filename))]
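On Python 3, pathlib avoids the join altogether because each child already carries its full path. A minimal sketch; list_dirs and the directory names are made up for the demonstration:

```python
import tempfile
from pathlib import Path

def list_dirs(root):
    # Each child is a full path, so is_dir() does not depend on the current working directory.
    return sorted(child.name for child in Path(root).iterdir() if child.is_dir())

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "alpha").mkdir()
    (Path(tmp) / "beta").mkdir()
    (Path(tmp) / "notes.txt").touch()
    dirs = list_dirs(tmp)
    print(dirs)
```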
