Sort dates through file name date - python

I have a folder of *.txt files whose names follow a specific format (c is a character, d is a digit, and yyyy-mm-dd-hh-mm-ss is the date format):
cccccd_ddd_cc_ccc_c_dd-ddd_yyyy-mm-dd-hh-mm-ss.txt
or
cccccd_ddd_cc_ccc_c_dd-dddd_yyyy-mm-dd-hh-mm-ss.txt
or
cccccd_ddd_cc_ccc_c_d_yyyy-mm-dd-hh-mm-ss.txt
when the single digit d is equal to 0.
I would like to create a Python script to extract the dates and sort the files by that date.
So far I have done:
import os
list_files=[]
for file in os.listdir():
    if file.endswith(".txt"):
        #print(file)
        list_files.append(file)
But I am a bit new to regular expressions. Thanks

You can use .split() to split a string.
It seems we can split at the last occurrence of "_" and then drop everything after the "." to get the timestamp.
So a function that returns the timestamp from the file name is:
def get_timestamp(file_name):
    return file_name.split("_")[-1].split('.')[0]
Because every date uses the same zero-padded yyyy-mm-dd-hh-mm-ss format, sorting the timestamp strings alphabetically also sorts them chronologically.
To get the list of file names sorted by that timestamp, you can do:
sorted_list = sorted(list_files, key=get_timestamp)
More about the key function can be found in the official Python documentation.
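If you would rather compare real dates than strings (for example, to filter by a date range later), here is a minimal sketch that parses the timestamp with datetime.strptime. The file names below are made up to follow the pattern in the question, and get_datetime is a hypothetical helper name.

```python
from datetime import datetime

def get_datetime(file_name):
    # Take the text after the last "_" and drop the extension.
    stamp = file_name.rsplit("_", 1)[-1].rsplit(".", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d-%H-%M-%S")

# Hypothetical file names following the pattern described in the question.
list_files = [
    "abcde1_123_ab_abc_a_12-345_2021-03-04-10-15-00.txt",
    "abcde1_123_ab_abc_a_0_2020-12-31-23-59-59.txt",
    "abcde1_123_ab_abc_a_12-3456_2021-03-04-09-00-00.txt",
]

sorted_list = sorted(list_files, key=get_datetime)
for name in sorted_list:
    print(get_datetime(name), name)
```

For this format the result is the same order as sorting on the raw timestamp string; parsing mainly helps when you need date arithmetic or want malformed names to raise an error.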

Related

Is there a way to get python to find a file name based on a changing date variable?

I'm trying to define a function that pulls data out of a newly exported csv every day, where the name of the csv changes based on the day.
So far I have the following:
import fnmatch
import os
import pandas as pd
todays_date = pd.to_datetime('today').strftime('%Y%m%d')
todays_date_name_string = 'unchanging part of filename ' + str(todays_date)
var1 = fnmatch.filter(os.listdir('P:directory/'), 'todays_date_name_string*.csv')
print(var1)
But an empty list is printed. I can't seem to get it to take the variable, even though printing todays_date_name_string by itself gives the string I want. Am I using fnmatch or os.listdir incorrectly?
Change this line:
var1 = fnmatch.filter(os.listdir('P:directory/'), 'todays_date_name_string*.csv')
to
var1 = fnmatch.filter(os.listdir('P:directory/'), f'{todays_date_name_string}*.csv')
The problem is that you never actually use the variable todays_date_name_string, which holds today's date as a string. Instead you pass the literal text 'todays_date_name_string*.csv' as the pattern, so fnmatch looks for files whose names literally start with todays_date_name_string and end with .csv.
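To see the difference without touching the file system, here is a small sketch that runs both patterns against a hard-coded list. The file names and the fixed date are made up for illustration; in the real script the list would come from os.listdir and the date from pandas.

```python
import fnmatch

# Hypothetical stand-ins for the os.listdir(...) result and today's date.
file_names = [
    "report 20240105.csv",
    "report 20240105 v2.csv",
    "report 20240104.csv",
    "notes.txt",
]
todays_date = "20240105"
todays_date_name_string = "report " + todays_date

# Pattern written as a plain string: the variable name is just text here.
literal = fnmatch.filter(file_names, "todays_date_name_string*.csv")

# Pattern built with an f-string: the variable's value is substituted in.
interpolated = fnmatch.filter(file_names, f"{todays_date_name_string}*.csv")

print(literal)
print(interpolated)
```

The first call returns an empty list, as in the question, while the second returns the two files whose names start with the date-specific prefix.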

Find file by part of filename in python

I need to search for files by part of their names.
I have a column in my database that contains product names, and each product has a description file, so I need to add that file's contents to the same row in my database.
I want to add a new column called description and fill it with the contents of the file that matches the value in the name column. However, the names in the column and the file names differ: for example, a product is called cs12 in the database, while its file is called datasheet-cs12, guide-cs12, or something similar.
You will need to figure out how to:
1. Get a list of files in a folder
2. Look for a substring in each element of that list
Re 1.: https://stackoverflow.com/search?q=%5Bpython%5D+get+list+of+files+in+directory
Re 2.:
You have a logical problem: there might be multiple files that match any given string, so I don't think you can solve this fully automatically. What if you have two files, datasheet_BAT54alternative.txt and info_BAT54A, and two rows containing the strings BAT54 and BAT54A? A "BAT54" is not the same as a "BAT54A". So you'll always have to deal with a list of candidates. If you're lucky, that list has only one entry:
def give_candidates(list_of_file_names, substring):
    return [fname for fname in list_of_file_names if substring.lower() in fname.lower()]
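As a rough sketch of how the candidate lists might be used, the snippet below runs the function over the example names from this answer plus the cs12 case from the question. The file list and the product list are hard-coded assumptions; in practice they would come from the folder and the database.

```python
def give_candidates(list_of_file_names, substring):
    # Case-insensitive substring match against every file name.
    needle = substring.lower()
    return [fname for fname in list_of_file_names if needle in fname.lower()]

# Hypothetical inputs, taken from the examples in this thread.
file_names = ["datasheet_BAT54alternative.txt", "info_BAT54A", "guide-cs12.txt"]
product_names = ["BAT54", "BAT54A", "cs12"]

candidates_by_product = {
    product: give_candidates(file_names, product) for product in product_names
}

for product, candidates in candidates_by_product.items():
    if len(candidates) == 1:
        print(product, "->", candidates[0])
    else:
        print(product, "is ambiguous:", candidates)
```

Only cs12 resolves to a single file here; both BAT54 and BAT54A match both BAT54 files, which is exactly the ambiguity described above and has to be resolved by hand or by a stricter naming rule.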

Strip str between last 2 instances of common character

I have many different strings (which are files) that look like this:
20201225_00_ec_op_2m_temp_24hdelta_argentinacorn_timeseries.nc
20201225_00_ec_op_2m_temp_chinawheat_timeseries.nc
20201225_00_ec_op_snowfall_romaniawheat_timeseries.nc
And many more. I want to be able to loop through all of these files and store their file path in a dictionary. To do that, I want the key to be the text that is between the last two instances of an underscore. For example, if this was the file 20201225_00_ec_op_2m_temp_24hdelta_argentinacorn_timeseries.nc, then the dict would be
{'argentinacorn': 'path/to/file/20201225_00_ec_op_2m_temp_24hdelta_argentinacorn_timeseries.nc'}
How can I loop through and do this pythonically?
You can use regexes to extract the key from the strings like this:
import re
input_string = "20201225_00_ec_op_2m_temp_24hdelta_argentinacorn_timeseries.nc"
dict_key = re.findall(".*_(.+)_[^_]+", input_string)[0]
gives
'argentinacorn'
Or with just a simple split:
dict_key = input_string.split("_")[-2]
Regarding file names, you can get the list from current working directory like this:
import os
file_names = os.listdir()
You can just loop through this list and apply the split/regex as shown above.
A simple split and pick:
parts = "20201225_00_ec_op_2m_temp_24hdelta_argentinacorn_timeseries.nc".split("_")
key = parts[-2]  # index, not slice: parts[-2:-1] would give a one-element list
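Putting the pieces together, here is a minimal sketch of the loop that builds the dictionary. The path/to/file prefix is the placeholder from the question, and the list is hard-coded instead of being read from disk.

```python
import os

# Paths built from the example names in the question; the directory is a placeholder.
file_paths = [
    "path/to/file/20201225_00_ec_op_2m_temp_24hdelta_argentinacorn_timeseries.nc",
    "path/to/file/20201225_00_ec_op_2m_temp_chinawheat_timeseries.nc",
    "path/to/file/20201225_00_ec_op_snowfall_romaniawheat_timeseries.nc",
]

paths_by_key = {}
for path in file_paths:
    name = os.path.basename(path)   # drop the directory part
    key = name.split("_")[-2]       # text between the last two underscores
    paths_by_key[key] = path

print(paths_by_key)
```

Note that if two files share the same second-to-last part, the later one overwrites the earlier one in the dictionary; storing a list of paths per key would avoid that.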

Extracting the date from multiple file names, python

I have a list of CSV file names, and I am trying to extract the date from each so I can put them in order by date.
Here's a snippet of the file names:
csse_covid_19_data/csse_covid_19_daily_reports/02-01-2020.csv
csse_covid_19_data/csse_covid_19_daily_reports/01-31-2020.csv
csse_covid_19_data/csse_covid_19_daily_reports/02-02-2020.csv
csse_covid_19_data/csse_covid_19_daily_reports/01-24-2020.csv
csse_covid_19_data/csse_covid_19_daily_reports/01-29-2020.csv
I can get the date from a single file using
a_date_string= all_files[0].split("/")[-1].split(".")[0]
print(a_date_string)
which gives output
02-01-2020
How do I get the code to return all the dates? The code above was given to me; I am just trying to adapt it so the files are ordered by the dates in their names.
dates = [i[-14:-4] for i in all_files]
print(dates)
See if this works for you. It returns a list of those dates, assuming every name ends in a 10-character date followed by .csv.
If you have any doubts about this snippet, let me know in the comments.
Since you tagged the question regex, here is a possible solution:
/(\d{2}-\d{2}-\d{4})\.csv$
The resulting capture groups will contain the desired dates.
a_date_string=[]
for file_name in all_files:
    a_date_string.append(file_name.split("/")[-1].split(".")[0])
print(a_date_string)
Assuming all_files is a list of all file names.
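Since the goal is to order the files by date, here is a minimal sketch that parses each extracted string with datetime.strptime and uses it as the sort key. The list holds the file names from the question; date_from_path is a hypothetical helper name.

```python
from datetime import datetime

all_files = [
    "csse_covid_19_data/csse_covid_19_daily_reports/02-01-2020.csv",
    "csse_covid_19_data/csse_covid_19_daily_reports/01-31-2020.csv",
    "csse_covid_19_data/csse_covid_19_daily_reports/02-02-2020.csv",
    "csse_covid_19_data/csse_covid_19_daily_reports/01-24-2020.csv",
    "csse_covid_19_data/csse_covid_19_daily_reports/01-29-2020.csv",
]

def date_from_path(path):
    # Same extraction as in the question, then parse as month-day-year.
    date_string = path.split("/")[-1].split(".")[0]
    return datetime.strptime(date_string, "%m-%d-%Y")

sorted_files = sorted(all_files, key=date_from_path)
for path in sorted_files:
    print(path)
```

Parsing matters here because mm-dd-yyyy strings do not sort chronologically once the files span more than one year, whereas datetime objects do.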

Access last string after split function to create new list

I am a beginner in Python, and I have been working on code that accesses two types of files (dcd and inp files), combines them, and creates a new list with the matching strings.
I got stuck near the beginning. I want to get all the dcd files here. They all have the .dcd extension, but the first part of each name differs, so I was wondering whether there is a way to access them after I have split the string.
#collect all dcd files into a list
list1 = []
for filename1 in glob.glob('*/FEP_SYAF014*/FEP1/298/*/*.dcd'):
    filename1 = filename1.split('/')
    filename1.sort()
    list1.append(filename1)
I want to get only the names with the dcd extension, which are at index [5], and either create a new list or mutate this one, but I am not sure how to do that.
P.S. I have posted only the first part of the code.
Thank you !
[Screenshots: the oddly sorted output; a better-looking output; the desired output, which should be sorted and without the eq* files.]
Just use sorted with a sort key, os.path.basename, so that only the base name of each file is used for the comparison:
import os, glob
list1 = sorted(glob.glob('*/FEP_SYAF014*/FEP1/298/*/*.dcd'), key = os.path.basename)
So this worked. I just added del filename1[:5] to get rid of the other unnecessary string parts:
import os, glob
list1 = sorted(glob.glob('*/FEP_SYAF014*/FEP1/298/*/*.dcd'), key = os.path.basename)
for filename1 in sorted(glob.glob('*/FEP_SYAF014*/FEP1/298/*/*.dcd'), key = os.path.basename):
    filename1 = filename1.split('/')
    filename1.sort()
    list1.append(filename1)
    del filename1[:5]
    print(filename1)
Your sort is applied to the parts of each file name, which is not what you want. If I understand correctly, you want to sort the list of file names, not the parts of each name.
The code given by Jean François is great, but I guess you'd like to get your own code working.
You need to extract the file name by keeping only the last part of the split.
A split returns a list of strings; each item is one part of the original.
filename = filename.split('/')[-1]
This line gets you the last part of the split.
Then you can add that part to your list,
and after all that you can sort your list.
Hope this helps!
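The steps above can be sketched roughly as follows. The paths are invented stand-ins for what glob.glob would return (the real directory and file names are not shown in the question), and skipping names that start with "eq" is an assumption based on the asker's wish to leave out the eq* files.

```python
# Hypothetical paths standing in for the glob.glob(...) result.
paths = [
    "sys/FEP_SYAF014_x/FEP1/298/1/md_0200.dcd",
    "sys/FEP_SYAF014_x/FEP1/298/0/eq1.dcd",
    "sys/FEP_SYAF014_x/FEP1/298/0/md_0100.dcd",
    "sys/FEP_SYAF014_x/FEP1/298/2/md_0050.dcd",
]

list1 = []
for filename1 in paths:
    name = filename1.split('/')[-1]   # keep only the last part of the split
    if name.startswith("eq"):         # assumed rule for dropping eq* files
        continue
    list1.append(name)

list1.sort()                          # sort the names, not the parts of one name
print(list1)
```

The key difference from the original loop is that sort is called once on the finished list instead of on the pieces of each path.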
