Extracting the date from multiple file names, python


I have a list of CSV file names, and I am trying to extract the date from each one so that I can put the files in order by date.
Here's a snippet of the file names:
csse_covid_19_data/csse_covid_19_daily_reports/02-01-2020.csv
csse_covid_19_data/csse_covid_19_daily_reports/01-31-2020.csv
csse_covid_19_data/csse_covid_19_daily_reports/02-02-2020.csv
csse_covid_19_data/csse_covid_19_daily_reports/01-24-2020.csv
csse_covid_19_data/csse_covid_19_daily_reports/01-29-2020.csv
I can get the date from a single file using
a_date_string= all_files[0].split("/")[-1].split(".")[0]
print(a_date_string)
which gives output
02-01-2020
How do I get the code to return all the dates? The code above was given to me; I am just trying to adapt it so that the files end up ordered by the dates in their names.

dates = [i[-14:-4] for i in all_files]
print(dates)
See if this works for you. It returns a list of those dates, assuming every file name ends with a 10-character date followed by .csv.
If you have any doubts about this snippet, let me know in the comments.

Since you tagged the question regex, here is a possible solution:
/(\d{2}-\d{2}-\d{4})\.csv$
The resulting capture groups will contain the desired dates.
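For illustration only, here is a minimal sketch of how that pattern could be applied in Python with re.search. The all_files sample is copied from the question; skipping names that do not match is an assumption about what you would want.

```python
import re

# Pattern from the answer above: a slash, MM-DD-YYYY, then ".csv" at the end.
DATE_RE = re.compile(r"/(\d{2}-\d{2}-\d{4})\.csv$")

all_files = [
    "csse_covid_19_data/csse_covid_19_daily_reports/02-01-2020.csv",
    "csse_covid_19_data/csse_covid_19_daily_reports/01-31-2020.csv",
]

dates = []
for path in all_files:
    match = DATE_RE.search(path)
    if match:  # assumption: silently skip names that do not fit the pattern
        dates.append(match.group(1))

print(dates)
```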

a_date_string = []
for file_name in all_files:
    a_date_string.append(file_name.split("/")[-1].split(".")[0])
print(a_date_string)
Assuming all_files is a list of all file names
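The question also asks for the files to be ordered by date. A possible sketch, assuming the names always follow MM-DD-YYYY.csv as in the sample: parse each date with datetime.strptime and use it as the sort key. The helper name file_date is made up for this example. Sorting the raw MM-DD-YYYY strings would put the month before the year, which breaks once the data spans more than one year.

```python
from datetime import datetime

all_files = [
    "csse_covid_19_data/csse_covid_19_daily_reports/02-01-2020.csv",
    "csse_covid_19_data/csse_covid_19_daily_reports/01-31-2020.csv",
    "csse_covid_19_data/csse_covid_19_daily_reports/02-02-2020.csv",
    "csse_covid_19_data/csse_covid_19_daily_reports/01-24-2020.csv",
    "csse_covid_19_data/csse_covid_19_daily_reports/01-29-2020.csv",
]

def file_date(path):
    """Return the date encoded in the file name (hypothetical helper)."""
    date_string = path.split("/")[-1].split(".")[0]
    return datetime.strptime(date_string, "%m-%d-%Y")

# Sort the full paths chronologically, then pull the dates back out.
sorted_files = sorted(all_files, key=file_date)
sorted_dates = [f.split("/")[-1].split(".")[0] for f in sorted_files]
print(sorted_dates)
```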

Related

Is there a way to get python to find a file name based on a changing date variable?

I'm trying to define a function that pulls data out of a newly exported csv every day, where the name of the csv changes based on the day.
So far I have the following:
import fnmatch
import os
import pandas as pd
todays_date = pd.to_datetime('today').strftime('%Y%m%d')
todays_date_name_string = 'unchanging part of filename ' + str(todays_date)
var1 = fnmatch.filter(os.listdir('P:directory/'), 'todays_date_name_string*.csv')
print(var1)
But an empty list is printed. The pattern does not seem to pick up the variable, even though printing todays_date_name_string by itself gives the string I want. Am I using fnmatch or os.listdir incorrectly?

Change this line:
var1 = fnmatch.filter(os.listdir('P:directory/'), 'todays_date_name_string*.csv')
to
var1 = fnmatch.filter(os.listdir('P:directory/'), f'{todays_date_name_string}*.csv')
Your problem is that you intend to use the variable todays_date_name_string, which contains today's date as a string, but you never actually use it. Inside the quotes, todays_date_name_string is just literal text, so you are asking for all files whose names start with the characters todays_date_name_string and end with .csv. The f-string substitutes the variable's value into the pattern.
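A small sketch of the difference, run against a plain list of names instead of os.listdir so it can be tried anywhere. The file names and the value of todays_date_name_string are made up for the example.

```python
import fnmatch

# Hypothetical value; in the question it is built from today's date.
todays_date_name_string = "report 20200707"

names = [
    "report 20200707 v1.csv",
    "report 20200706 v1.csv",
    "todays_date_name_string.csv",
]

# The variable name inside quotes is matched literally.
literal = fnmatch.filter(names, 'todays_date_name_string*.csv')

# The f-string inserts the variable's value into the pattern.
formatted = fnmatch.filter(names, f'{todays_date_name_string}*.csv')

print(literal)
print(formatted)
```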

How to extract number from a list of string to be in a new column with python?

I have this file name as follows:
8550 - Field Data Progress_070720.xlsx
and I need to get only the 8550 into a new column.
How can I do this with Python?
Thank you

I can already get the file name as a string with:
filename = os.path.basename(xls)
From that file name, I want to extract only 8550 so that it appears in a new column.
I was thinking of using str.split, but I could not work out the exact code to extract only the number.
Thanks
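One possible approach, sketched under the assumption that the number always sits at the very start of the file name: match the leading digits with a regular expression. The helper name leading_number is made up for this example.

```python
import os
import re

def leading_number(path):
    """Return the digits at the start of the file name, or None (hypothetical helper)."""
    filename = os.path.basename(path)
    match = re.match(r"\s*(\d+)", filename)
    return match.group(1) if match else None

xls = "8550 - Field Data Progress_070720.xlsx"
print(leading_number(xls))
```

If the file names are already in a pandas column, something like df["number"] = df["filename"].str.extract(r"^(\d+)", expand=False) should do the same job; the column names here are placeholders.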

Store list elements in single variable for query

I am currently facing what is probably a very simple problem, and I think I am overcomplicating the solution.
I have an Excel file with city names and postal codes.
I read the file and export the postal codes (PLZ) with
zipfile = pd.read_excel("file.xlsx")
zipcode = pd.DataFrame(zipfile, columns=['PLZ']).values
Output is: [80331][80333] ....
Each ZIP code is later used to conduct a query on a website.
For that I use bs4 and requests and the following line of code (this is not the complete code, just the relevant line):
data = {'tx_ybpn_storefinder[searchReq][term]': zip}
The process is:
Enter the ZIP code from the list (in "zip")
Query on the website
Save the results (data) of the website-query
Query with the next ZIP code
Save data from query
Repeat for every zip code in the list
I think I have to work with a for/while loop combination here, but I don't actually know how. Is it necessary to store each zip code in its own variable?
Thanks in advance!

I think I have to work here with a for/while-loop-combination
Right. Loop over the values in the PLZ column:
zipcode = pd.read_excel("file.xlsx")
for zip in zipcode['PLZ']:
    data = {'tx_ybpn_storefinder[searchReq][term]': zip}
    # query the website, etc.
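On the question of whether each zip code needs its own variable: it does not. One way to keep the results apart is a dictionary keyed by zip code, sketched below. query_store_finder is a hypothetical stand-in for the real request and parsing code, and the two sample codes are taken from the output shown in the question.

```python
def query_store_finder(plz):
    """Hypothetical stand-in for the real website query.

    It only builds the form data; the actual request and parsing would go here.
    """
    return {'tx_ybpn_storefinder[searchReq][term]': str(plz)}

zip_codes = [80331, 80333]  # sample values from the question's output

results = {}
for plz in zip_codes:
    # One entry per zip code, so nothing is overwritten between iterations.
    results[plz] = query_store_finder(plz)

print(results)
```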

Access last string after split function to create new list

I am a beginner in Python and I have been working on code to access two types of files (dcd and inp files), combine them, and create a new list with the matching strings.
I got stuck near the beginning. I want to get all the dcd files here. They all have the .dcd extension, but the first part of each name differs, so I was wondering whether there is a way to access them after I have split the string.
# collect all dcd files into a list
list1 = []
for filename1 in glob.glob('*/FEP_SYAF014*/FEP1/298/*/*.dcd'):
    filename1 = filename1.split('/')
    filename1.sort()
    list1.append(filename1)
I want to get only names with dcd extension that are indexed [5] and create a new list or mutate this one, but I am not sure how to do that.
p.s I have just posted first part of the code
Thank you !
(Screenshots omitted: the oddly sorted output, a better-looking output, and the desired output, which should be sorted and should not include the eq* files.)

Just use sorted with a sort key, os.path.basename, which extracts only the base name of each file for the comparison:
import os, glob
list1 = sorted(glob.glob('*/FEP_SYAF014*/FEP1/298/*/*.dcd'), key = os.path.basename)

So this worked. I just added del filename1[:5] to get rid of the other unnecessary string parts:
import os, glob
list1 = sorted(glob.glob('*/FEP_SYAF014*/FEP1/298/*/*.dcd'), key=os.path.basename)
for filename1 in sorted(glob.glob('*/FEP_SYAF014*/FEP1/298/*/*.dcd'), key=os.path.basename):
    filename1 = filename1.split('/')
    filename1.sort()
    list1.append(filename1)
    del filename1[:5]
    print(filename1)

Your sort is applied to the parts of each file name. This is not what you want. If I understand correctly, you want to sort the list of file names, not the parts of a single file name.
The code given by Jean François is great, but I guess you'd like to get your own code working.
You need to extract the file name by keeping only the last part of the split.
A split returns a list of strings. Each item is one part of the original path.
filename = filename.split('/')[-1]
This line gives you the last part of the split.
Then you can add that part to your list.
After all the names have been added, you can sort the list.
Hope this helps!
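Put together, those steps might look like the sketch below. The paths are made up to resemble the glob pattern in the question (a real run would use glob.glob), and dropping names that start with eq is an assumption based on the wish to exclude the eq* files.

```python
import os

# Hypothetical paths standing in for the glob.glob(...) result.
paths = [
    "sys/FEP_SYAF014_a/FEP1/298/r2/md_0020.dcd",
    "sys/FEP_SYAF014_a/FEP1/298/r1/eq1.dcd",
    "sys/FEP_SYAF014_a/FEP1/298/r1/md_0010.dcd",
]

list1 = []
for path in paths:
    name = os.path.basename(path)  # same idea as path.split('/')[-1]
    if name.startswith("eq"):      # assumption: eq* files are not wanted
        continue
    list1.append(name)

list1.sort()  # sort the list of names once, after it is complete
print(list1)
```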

Sort dates through file name date

I have a folder with *.txt files which contain a specific format (c is character and d is digit and yyyy-mm-dd-hh-mm-ss is the date format)
cccccd_ddd_cc_ccc_c_dd-ddd_yyyy-mm-dd-hh-mm-ss.txt
or
cccccd_ddd_cc_ccc_c_dd-dddd_yyyy-mm-dd-hh-mm-ss.txt
or
cccccd_ddd_cc_ccc_c_d_yyyy-mm-dd-hh-mm-ss.txt
The last form applies when the single digit d is equal to 0.
I would like to create a Python script that extracts the dates and sorts the files by them.
So far I have done:
import os
list_files = []
for file in os.listdir():
    if file.endswith(".txt"):
        # print(file)
        list_files.append(file)
But I am a bit new to regular expressions. Thanks

You can use .split() to split a string.
Here we can take the part after the last occurrence of "_" and then remove everything from the "." onward to get the timestamp.
So a function that returns the timestamp from the file name is:
def get_timestamp(file_name):
    return file_name.split("_")[-1].split('.')[0]
Because all the timestamps share the same zero-padded format, with the year first and the seconds last, Python can sort them correctly as plain strings.
To get the list of file names sorted by that timestamp, you can do:
sorted_list = sorted(list_files, key=get_timestamp)
More about key functions can be found in the official Python documentation.
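If you would rather work with real datetime objects, for example to catch malformed names early, a variant of the key function could parse the timestamp with datetime.strptime. This is only a sketch; the file names below are made up to fit the three patterns in the question, and get_datetime is a name chosen for this example.

```python
from datetime import datetime

def get_datetime(file_name):
    """Parse the trailing yyyy-mm-dd-hh-mm-ss timestamp (raises ValueError if malformed)."""
    stamp = file_name.split("_")[-1].split(".")[0]
    return datetime.strptime(stamp, "%Y-%m-%d-%H-%M-%S")

# Hypothetical file names following the formats described in the question.
list_files = [
    "abcde1_123_ab_abc_a_12-345_2020-03-01-10-00-00.txt",
    "abcde1_123_ab_abc_a_0_2019-12-31-23-59-59.txt",
    "abcde1_123_ab_abc_a_12-3456_2020-03-01-09-30-00.txt",
]

sorted_list = sorted(list_files, key=get_datetime)
for name in sorted_list:
    print(name)
```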
