I have a large sales database whose first column is the purchase date. The problem is that some of these dates are entered in DD.MM.YY format, some in YY.MM.DD, and some in YYYY/MM/DD. I want to convert them all to the same format. What is the cleanest way to do this?
Note 1: I'm thinking of doing a series of ifs but that would be a lot of conditions so I'm wondering if there is a cleaner shortcut.
Note 2: An additional complication is that the dates are in the Jalaali calendar, not Gregorian. I have a function that will convert them to Gregorian, but I need to pass the correct year, month, and day arguments to it; this is why I want to bring them all to a single format. It also means that "Gregorian-only" solutions, like dateutil.parser, might not work.
Immediately after posting this I found/thought of a solution myself, but instead of deleting the question I decided to post the answer in case someone else comes across a similar problem.
tl;dr - I just added a century option to dateutil.parser. I didn't know how to at first, but I found this.
Here's my end code:
from khayyam import JalaliDate
from dateutil.parser import parse, parserinfo

class MyParserInfo(parserinfo):
    def convertyear(self, year, *args, **kwargs):
        # Treat two-digit years as Jalali years in the 1300s
        if year < 100:
            year += 1300
        return year

if __name__ == '__main__':
    dt = parse("9.12.96", MyParserInfo()).date()
    a = JalaliDate(dt.year, dt.month, dt.day).todate()
    print(dt)
    print(a)
    # 1396-09-12
    # 2017-12-03
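For anyone who would rather not depend on dateutil for this step, here is a minimal standard-library sketch that splits the three formats from the question into (year, month, day) before calling a Jalali converter. The function name split_jalali_date and the century parameter are made up for illustration. It also assumes that every two-digit year is greater than 31 (true for 1332 to 1399), which is the only thing that lets it tell YY.MM.DD from DD.MM.YY. Note that it reads "9.12.96" as day 9, month 12, following the DD.MM.YY description in the question.

```python
import re


def split_jalali_date(text, century=1300):
    """Return (year, month, day) for DD.MM.YY, YY.MM.DD or YYYY/MM/DD input.

    Assumption: two-digit years are larger than 31, so a leading field
    above 31 must be a year and not a day.
    """
    parts = [int(p) for p in re.split(r"[./-]", text.strip())]
    if len(parts) != 3:
        raise ValueError(f"expected three date fields: {text!r}")
    first, middle, last = parts
    if first >= 1000:
        # Four-digit leading field: YYYY/MM/DD
        year, month, day = first, middle, last
    elif first > 31:
        # Leading field too large to be a day: YY.MM.DD
        year, month, day = first + century, middle, last
    else:
        # Otherwise assume DD.MM.YY
        year, month, day = last + century, middle, first
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError(f"month or day out of range: {text!r}")
    return year, month, day


if __name__ == "__main__":
    for raw in ("9.12.96", "96.12.09", "1396/12/09"):
        print(raw, "->", split_jalali_date(raw))
```

The resulting tuple can then be passed to whatever Jalali-to-Gregorian function you already have.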
I need to return the date format from a string. Currently I am using parser to parse a string as a date, then replacing the year with yyyy or yy, and similarly for the other date components. Is there some function I could use that would return mm-dd-yyyy when I send 12-05-2018?
Technically, it is an impossible question. If you send in 12-05-2018, there is no way for me to know whether you are sending in a mm-dd-yyyy (Dec 5, 2018) or dd-mm-yyyy (May 12, 2018).
One approach might be to do a regex replacement of anything which matches your expected date pattern, e.g.
import re

date = "Here is a date: 12-05-2018 and here is another one: 10-31-2010"
date_masked = re.sub(r'\b\d{2}-\d{2}-\d{4}\b', 'mm-dd-yyyy', date)
print(date)
print(date_masked)
Here is a date: 12-05-2018 and here is another one: 10-31-2010
Here is a date: mm-dd-yyyy and here is another one: mm-dd-yyyy
Of course, the above script makes no effort to check whether the dates are actually valid. If you require that, you may use one of the date libraries available in Python.
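If you do need the validity check, one possible sketch (standard library only) is to keep the same regex but only mask matches that datetime.strptime accepts. The helper name mask_valid_dates and its default arguments are my own, and it assumes the text uses mm-dd-yyyy.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")


def mask_valid_dates(text, date_format="%m-%d-%Y", mask="mm-dd-yyyy"):
    """Replace date-like substrings with `mask`, but only if they parse."""

    def replace(match):
        try:
            datetime.strptime(match.group(0), date_format)
        except ValueError:
            # Looks like a date but is not a real one: leave it untouched
            return match.group(0)
        return mask

    return DATE_PATTERN.sub(replace, text)


if __name__ == "__main__":
    print(mask_valid_dates("Valid: 12-05-2018, invalid: 13-45-2018"))
```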
I don't really understand what you plan to do with the format. There are two reasons I can think of why you might want it. (1) You want at some future point to convert a normalized datetime back into the original string. If that is what you want, you would be better off just storing the normalized datetime and the original string. Or (2) you want to draw (dodgy) conclusions about the person sending the data, because different nationalities tend to use different formats. But, whatever you want it for, you can do it this way:
from dateutil import parser

def get_date_format(date_input):
    date = parser.parse(date_input)
    for date_format in ("%m-%d-%Y", "%d-%m-%Y", "%Y-%m-%d"):
        # You can extend the list above to include formats with %y in addition to %Y, etc, etc
        if date.strftime(date_format) == date_input:
            return date_format
>>> date_input = "12-05-2018"
>>> get_date_format(date_input)
'%m-%d-%Y'
You mention in a comment you are prepared to make assumptions about ambiguous dates like 12-05-2018 (could be May or December) and 05-12-18 (could be 2018 or 2005). You can pass those assumptions to dateutil.parser.parse. It accepts boolean keyword parameters dayfirst and yearfirst which it will use in ambiguous cases.
Take a look at the datetime library. There you will find the function strptime(), which is exactly what you are looking for.
Here is the documentation: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
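As a rough sketch of how strptime() can be used for this: try each candidate format and collect the ones that parse. The function name detect_formats and the candidate list are assumptions for illustration. Returning every match, rather than the first, makes the ambiguity of inputs like 12-05-2018 visible.

```python
from datetime import datetime


def detect_formats(date_input, candidates=("%m-%d-%Y", "%d-%m-%Y", "%Y-%m-%d")):
    """Return every candidate format that strptime accepts for date_input."""
    matches = []
    for fmt in candidates:
        try:
            datetime.strptime(date_input, fmt)
        except ValueError:
            continue  # this format does not fit the input
        matches.append(fmt)
    return matches


if __name__ == "__main__":
    for sample in ("12-05-2018", "31-12-2018", "2018-12-05"):
        print(sample, detect_formats(sample))
```

If more than one format comes back, you still have to decide which convention to assume.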
I have some text, taken from different websites, that I want to extract dates from. As one can imagine, the dates vary substantially in how they are formatted, and look something like:
Posted: 10/01/2014
Published on August 1st 2014
Last modified on 5th of July 2014
Posted by Dave on 10-01-14
What I want to know is whether anyone knows of a Python library [or API] that would help with this (other than e.g. regex, which will be my fallback). I could probably remove the "posted on" parts relatively easily, but getting the other stuff consistent does not look easy.
My solution using dateutil
Following Lukas's suggestion, I used the dateutil package (it seemed far more flexible than Arrow) with the fuzzy option, which basically ignores things that are not dates.
Caution on Fuzzy parsing using dateutil
The main thing to note is that, as discussed in the thread Trouble in parsing date using dateutil, if it is unable to parse a day/month/year it takes a default value (which is the current day, unless specified), and as far as I can tell no flag is reported to indicate that it took the default.
This would result in "random text" returning today's date of 2015-4-16 which could have caused problems.
Solution
Since I really want to know when it fails, rather than have the date filled in with a default value, I ended up running the parser twice with different defaults and checking whether a field took the default on both runs. If not, I assumed it had parsed correctly.
from datetime import datetime
from dateutil.parser import parse

def extract_date(text):
    date = {}
    # Parse twice with different defaults; a field that follows the default
    # both times was not found in the text.
    date_1 = parse(text, fuzzy=True, default=datetime(2001, 1, 1))
    date_2 = parse(text, fuzzy=True, default=datetime(2002, 2, 2))
    if date_1.day == 1 and date_2.day == 2:
        date["day"] = "XX"
    else:
        date["day"] = date_1.day
    if date_1.month == 1 and date_2.month == 2:
        date["month"] = "XX"
    else:
        date["month"] = date_1.month
    if date_1.year == 2001 and date_2.year == 2002:
        date["year"] = "XXXX"
    else:
        date["year"] = date_1.year
    return date

print(extract_date("Posted: by dave August 1st"))
Obviously this is a bit of a botch (so if anyone has a more elegant solution, please share), but it correctly parsed the four examples I had above [where it assumed US format for the date 10/01/2014 rather than UK format], and returned XX appropriately where data was missing.
You could use Arrow library:
import arrow
arrow.get('2013-05-05 12:30:45', ['MM/DD/YYYY', 'YYYY-MM-DD HH:mm:ss'])
Two arguments, first a str to parse and second a list of formats to try.
For example I give the date as:
2/12/2015
The result should be:
February/Thursday/2015
I tried to do it with if statements but I'm not getting the right result. It would be nice if you could also show me the long way (without relying too much on built-in modules like datetime). I'm new to Python, and not much is taught at my school.
You don't have to use datetime too much, simply parse the date and output it in whatever format you want
from datetime import datetime
d = "2/12/2015"
print(datetime.strptime(d,"%m/%d/%Y").strftime("%B/%A/%Y"))
February/Thursday/2015
%A = Locale’s full weekday name.
%B = Locale’s full month name.
%Y = Year with century as a decimal number.
All the format directives are here
You could create a dict mapping, but you will find datetime is a lot simpler.
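For completeness, here is a rough sketch of what the "long way" with a dict mapping could look like, without datetime. The weekday is computed with a well-known day-of-week formula (often attributed to Tomohiko Sakamoto) for the Gregorian calendar. All names here (MONTH_NAMES, weekday_index, describe) are made up for illustration, and the input is assumed to be month/day/year.

```python
MONTH_NAMES = {
    1: "January", 2: "February", 3: "March", 4: "April",
    5: "May", 6: "June", 7: "July", 8: "August",
    9: "September", 10: "October", 11: "November", 12: "December",
}
WEEKDAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday"]
# Per-month offsets used by the day-of-week formula
MONTH_OFFSETS = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]


def weekday_index(year, month, day):
    """Return 0 for Sunday through 6 for Saturday (Gregorian calendar)."""
    if month < 3:
        # January and February count as part of the previous year
        year -= 1
    return (year + year // 4 - year // 100 + year // 400
            + MONTH_OFFSETS[month - 1] + day) % 7


def describe(date_text):
    """Turn 'M/D/YYYY' into 'MonthName/WeekdayName/YYYY'."""
    month, day, year = (int(part) for part in date_text.split("/"))
    weekday = WEEKDAY_NAMES[weekday_index(year, month, day)]
    return f"{MONTH_NAMES[month]}/{weekday}/{year}"


if __name__ == "__main__":
    print(describe("2/12/2015"))
```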
I am writing Python code to change the date on a Linux system to today-1 (dynamically). I tried various combinations, but I have not succeeded yet. I searched and found something close to my scenario in this question.
I am able to change the date if I execute the command with static value say:
date --set="$(date +'2013%m%d %H:%M')"
However, I don't want to hardcode the year, i.e., 2013. Instead I want to specify something like "%y-1", i.e.,
date --set="$(date +'%y-1%m%d %H:%M')"
If I run the above command I get the following error
[root@ramesh ~]$ date --set="$(date +'%y-1%m%d %H:%M')"
date: invalid date `14-11016 13:05'
Thanks for your answer. I did not try your approach, though, because it would mean dealing with formatting issues again whenever arithmetic is involved.
So, I figured out a much simpler and more general approach.
Fetch the previous_year value with this command
date --date='1 years ago'
This gives the date one year ago. It can then be used in the Python program to update the system date in the following way:
"date --set=$(date +'%%y%%m%s %%H:%%M')" % previous_year
This method has a few advantages:
I can apply it to the day and month as well, using "1 days ago" or "1 month ago" along with the +%d, +%m, and +%y format values,
e.g., date --date='1 years ago' +%y
I don't have to worry about the day and month arithmetic myself.
date will interpret the %y-1 literally, as you showed. What you need to do is retrieve the current year, subtract 1, and use this value as the new year.
To get the current_year - 1 you can do:
previous_year=$((`date +'%y'`-1))
echo $previous_year
>>> 13
Now you just need to use this variable to set the new date.
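Since the question mentions doing this from Python, here is a minimal sketch of the same idea using datetime instead of shell arithmetic. It only builds the argument list for date --set and does not run it. The function name previous_year_command and the fallback to Feb 28 for leap days are my own assumptions.

```python
from datetime import datetime


def previous_year_command(now):
    """Build the `date --set` argument list for the same moment one year earlier."""
    try:
        target = now.replace(year=now.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to
        # Feb 28 (an assumption, pick whatever suits your use case)
        target = now.replace(year=now.year - 1, day=28)
    return ["date", "--set=" + target.strftime("%Y-%m-%d %H:%M")]


if __name__ == "__main__":
    print(previous_year_command(datetime.now()))
```

The list could then be handed to something like subprocess.run(cmd, check=True), which needs root privileges to actually change the clock.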
from datetime import datetime
datetime.strptime('AUG21 3:26PM', '%b%d %I:%M%p')
results in
1900-08-21 15:26:00
How can I write this in a Pythonic way so that, when there's no year, the current year (2013) is used as the default?
I checked, and the strptime function doesn't have an option to change the default. Maybe another time library can do this?
thx
Parse the date as you are already doing, and then
date = date.replace(year=2013)
This is one of the simplest solutions with the modules you are using.
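Thinking better about it, you will probably face a problem next Feb 29: the default year 1900 is not a leap year, so parsing fails before the year can be replaced. Putting the year into the string before parsing avoids this:
import datetime
text = 'Aug21 3:26PM'
output = datetime.datetime.strptime('2013 ' + text, '%Y %b%d %I:%M%p')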
You can use today's date to fill in the year dynamically:
datetime.strptime('AUG21 3:26PM', '%b%d %I:%M%p').replace(year=datetime.today().year)
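One caveat with replacing the year after parsing is that a string like "Feb29 3:26PM" typically fails inside strptime itself, because the default year 1900 has no Feb 29. A minimal sketch that combines both ideas (dynamic year, and year supplied before parsing) could look like this. The function name parse_with_year and its year parameter are hypothetical.

```python
from datetime import datetime


def parse_with_year(text, year=None):
    """Parse strings like 'AUG21 3:26PM', using `year` (default: current year)."""
    if year is None:
        year = datetime.today().year
    # Prepend the year so that Feb 29 is validated against the right year
    return datetime.strptime(f"{year} {text}", "%Y %b%d %I:%M%p")


if __name__ == "__main__":
    print(parse_with_year("AUG21 3:26PM", year=2013))
    print(parse_with_year("Feb29 3:26PM", year=2024))
```

Note that %b and %p depend on the locale, so this assumes English month abbreviations and AM/PM markers.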