Maintaining sequence of the read files in a directory with python - python

I am trying to loop through the csv files in a directory so that they are read in the same sequence as stored in the directory. I am aware of the methods like using os.walk or os.listdir. To preserve the sequence I used a sorted(os.listdir..) but the sequence is still not the same. I have attached two images for ease of understanding. I do not want to change the names of the files now because this data was generated from different simulation software.

try this statement: sorted(os.listdir(dir_unresolved),key=len)

Related

How to Open and Save Files in Parallel Directory Without Knowing Full Path

I am wondering if there is an easy way to access 'parallel' directories (See photo for what I am talking about... I don't know what else to call them, please correct me if they are called something else!) from a Python file without having to input the string path.
The basic structure I intend to use is shown in the picture. The structure will be used across different computers, so I need to avoid just typing in "C:\stuff_to_get_there\parent_directory\data\file.txt" because "C:\stuff_to_get_there" will not be the same on different computers.
I want to store the .py files in their own directory, then access the data files in data directory, and save figures to figures directory. I was thinking of trying os module but not sure if that's the correct way to go.
parent directory
scripts
.py files
figures
save files here
data
.txt files stored here
Thanks for any help!

Create new directory with files without copying or overwriting old ones? (Python)

I'm working with many thousands of images held in a particular folder structure which is needed for certain image processing programs. There is a different program, however, that requires the images to be in a different folder structure. I can't change the programs, so what I do is just make the second folder structure out of the first one using copyfile from python's shutil module.
This works okay with small data sets, but my latest one is 12 gigs and it is so silly to have duplicates of everything. Is there a way to create multiple folders containing the "same" file?
Thanks so much!
The usual solution to this is to simply make a symbolic link with the desired name. This saves the trouble of consuming double disk storage. In UNIX (Linux), the command is
ln -s <existing_dir_name> <desried_dir_name>
In Windows, you make a "short cut".

What is the best way to open multiple files in python when I have the name of the files stored in a list?

I am dealing with a large set of data that can be classified to be written in one of many files. I am trying to open the files all at once so I can write to the files as I am going through the data (I am working with Python 3.7).
I could do multiple
with open(...) as ... statements but I was wondering if there is a way to do this without having to write out the open statements for each file.
I was thinking about using a for loop to open the files but heard this is not exception safe and is bad practice.
So what do you think is the best way to open multiple files where the filenames are stored in a list?
I usually use glob and dict to do so. This will asume your data is in .csv format, but shouldn't really matter to the idea:
You use glob to create a variable with all your files. Say they are in a folder called Data inside your main folder:
data=glob.glob('Data/'+'*.csv') #Put every .csv file into a list
#you can change .csv with wathever you need
dict_data={} #Create empty dictionary
for n,i in enumerate(sorted(data)):
dict_data['file_'+str(n+1)]=pd.read_csv(i)
Here you can replace with your with...open statement. In the end you'll get a dict with keys file_1 ,..., file_n that will have inside your data. I find it the best way to work with lots of data. Might need to do some tinkering if you're working with more than one type of data, though.
Hope it helps

how to compare and check if two binary(h5) files numerically have the same contents

I am trying to compare the contents of two binary files. I use python 3.6 filecomp comparing same name files inside two directories.
results_dummy=filecmp.cmpfiles(dir1, dir2, common, shallow=True)
The above line works for *.bin file I have in both directories, but it does not work with h5 files.
When comparing two hdf5 files that contain exactly the same groups/datasets and numerical data, filecmp.cmpfiles finds them as mismatch.
Is there anyway to compare the contents of two hdf5 files from within Python script and without using h5diff?
Thanks in Advance,
I finally set with using h5diff. The user of the script would need to install hdf5/Tools to run the script though,
Thanks for your answers,

How can I efficiently select 100 random JPG files from a directory (including subdirs) in Python?

I have a very large directory of files and folders. Currently, I scan the entire directory for JPGs and store them in a list. This is really slow due to the size of the directory. Is there a faster, more efficient way to do this? Perhaps without scanning everything?
My directory looks like this:
/library/Modified/2000/[FolderName]/Images.JPG
/library/Modified/2001/[FolderName]/Images.JPG
/library/Modified/2002/[FolderName]/Images.JPG
/library/Modified/2003/[FolderName]/Images.JPG
/library/Modified/2004/[FolderName]/Images.JPG
...
/library/Modified/2012/FolderName/Images.JPG
Thanks
See Generator Tricks for System Programmers for a bunch of neat stuff. But specifically, see the gen-find example. This is as efficient as you are going to get, without making a bunch of assumptions about your file structure layout.
Assuming that you application is the only one changing directory and that you have control over the directory names/structure and that you have to do the operation described in your question more than once:
Rename all the files once so you can access them in predictable order. Say, give all files numeric name from 1 to N (where N is the number of files in directory) and have a special file ".count" which will hold the N for each directory. Then access them directly with their names generated by random generator.
I don't know where the slowness occurs, but to scan directories and files I found it much faster the dump the directories/files into a text file first using a batch file then get python to read the file. This worked well on our server system with 7 servers and many thousands of directories.
Python could, of course, run the batch file.

Categories