I'm looking to accept date input in a script and am running into the conundrum of how to differentiate between 040507 // 04052007 and 050407 // 05042007 when the user intends April 5th, 2007 (or May 4th, 2007). The US tends to follow the first form, but other countries tend to use the second.
I know I can use IP/GPS in some instances, but I'm looking for a method that works offline, maybe from system location/language?
I'm primarily looking for a Windows solution, but other solutions will surely be useful in the future or to other people.
NB I'm not considering timezone a good option, as different countries in the same timezone can use different conventions.
Judging by your date formats, I think your user manually enters the date. Unfortunately, locality will have little to do with how it is entered. I am in the US but prefer to input my date with the full year.
A simple way would be to force a standard or accept either way and test for which was entered.
def test():
    while True:
        testdate = input()
        if testdate.isdigit() and len(testdate) == 6:
            # do something
            break
        elif testdate.isdigit() and len(testdate) == 8:
            # do something
            break
        else:
            print("Please enter correct format")
This checks that only digits were entered, then checks the length to determine which format was used.
You could force a standard by specifying “ddmmyyyy” and only accept an 8 digit input.
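As a rough sketch of the "force a standard" idea, the length check can be combined with datetime.strptime so that impossible dates are rejected too. The helper name parse_digits_date and its day_first flag are illustrative assumptions, not something from the answer above.

```python
from datetime import datetime


def parse_digits_date(text, day_first=True):
    """Parse a 6- or 8-digit date string; return a date, or None if invalid.

    Hypothetical helper: day_first states the assumed order of the first
    two fields, because the digits alone cannot reveal it.
    """
    if not text.isdigit() or len(text) not in (6, 8):
        return None
    year = "%y" if len(text) == 6 else "%Y"
    fmt = ("%d%m" if day_first else "%m%d") + year
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        # Right length, but not a real calendar date (e.g. 31 April).
        return None


print(parse_digits_date("05042007"))                 # day first
print(parse_digits_date("040507", day_first=False))  # month first
```

Both calls describe April 5th, 2007; which flag to use still has to come from a stated convention or from the user.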
If I’m wrong on how the date is given, let me know and I’ll update accordingly.
EDIT:
If you want to guess the user’s format by determining their location, you can use the locale module.
import locale
print(locale.getlocale())
Output:
('en_US', 'UTF-8')
Another way using locale is to check the international currency symbol of the locale.
import locale
locale.setlocale(locale.LC_ALL, "")
print(locale.localeconv()['int_curr_symbol'])
Output:
USD
Here is a list of currency codes: https://www.ibm.com/support/knowledgecenter/en/SSZLC2_7.0.0/com.ibm.commerce.payments.developer.doc/refs/rpylerl2mst97.htm
You could always check the OS default language, using getdefaultlocale(), and you could use that to guide how you parse dates:
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'cp1252')
This wouldn't be exact, as I would enter dates the same way no matter what locale my computer is using, but it could give you a starting point.
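One way to turn the locale into a concrete hint, sketched here under some assumptions, is to format a known date with the locale's own "%x" representation and see whether the day or the month comes first. The function names are made up for illustration, the result is only a guess at the user's habits, and locale.setlocale changes process-wide state.

```python
import locale
from datetime import date


def order_from_sample(formatted):
    """Infer field order from a rendering of 22 November 1999.

    Returns "day_first", "month_first", or None when the numbers cannot
    be found (for example, when the month is spelled out).
    """
    day_pos = formatted.find("22")
    month_pos = formatted.find("11")
    if day_pos == -1 or month_pos == -1:
        return None
    return "day_first" if day_pos < month_pos else "month_first"


def guess_date_order():
    """Guess the order from the user's default locale (an assumption)."""
    try:
        # Note: this switches LC_TIME for the whole process.
        locale.setlocale(locale.LC_TIME, "")
    except locale.Error:
        return None
    return order_from_sample(date(1999, 11, 22).strftime("%x"))


print(guess_date_order())
```

The sample date is chosen so that day, month, and year cannot be confused with one another. A year-first rendering such as 1999-11-22 is reported as "month_first", since the month still precedes the day.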
Related
I have a problem with upper and lower functions in Python/Django. I have the following line of code:
UserInfo.objects.get(id=user_id).city.upper()
The problem is that some of the Turkish users let me know that they are seeing the wrong information. For example, one of them is from a city called "izmir". The upper function converts that into "IZMIR", but it turns out the correct result should be "İZMİR".
What is the right way to use upper or lower functions for any given language? I read about changing the server locale as an answer. Changing server locale for each user request does not make sense to me. What about multi-threaded applications that handle different users simultaneously?
Thanks in advance for your time.
I would suggest using PyICU:
>>> from icu import UnicodeString, Locale
>>> tr = Locale("TR")
>>> s = UnicodeString("I")
>>> print(unicode(s.toLower(tr)))
ı
>>> s = UnicodeString("i")
>>> print(unicode(s.toUpper(tr)))
İ
First, ask the user to select their preferred language, then convert the city name to uppercase using that language's rules.
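If installing PyICU is not an option, a small pure-Python workaround might be enough for Turkish specifically. This is only a sketch: the function names are hypothetical, and it handles nothing beyond the dotted and dotless i pairs. It does not touch the process locale, so it should be safe to call per request from multiple threads.

```python
def turkish_upper(text):
    """Uppercase text using the Turkish i rules (i -> İ, ı -> I)."""
    # Swap the two special letters first, then let str.upper do the rest.
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text):
    """Lowercase text using the Turkish i rules (İ -> i, I -> ı)."""
    # Replacing before lower() avoids the i + combining dot that
    # str.lower produces for "İ".
    return text.replace("İ", "i").replace("I", "ı").lower()


print(turkish_upper("izmir"))
print(turkish_lower("ISPARTA"))
```

The caller would pick these functions only for users whose selected language is Turkish and fall back to the plain upper and lower methods otherwise.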
I need to return the date format from a string. Currently I am using parser to parse a string as a date, then replacing the year with a yyyy or yy. Similarly for other dates items. Is there some function I could use that would return mm-dd-yyyy when I send 12-05-2018?
Technically, it is an impossible question. If you send in 12-05-2018, there is no way for me to know whether you are sending in a mm-dd-yyyy (Dec 5, 2018) or dd-mm-yyyy (May 12, 2018).
One approach might be to do a regex replacement of anything which matches your expected date pattern, e.g.
import re

date = "Here is a date: 12-05-2018 and here is another one: 10-31-2010"
date_masked = re.sub(r'\b\d{2}-\d{2}-\d{4}\b', 'mm-dd-yyyy', date)
print(date)
print(date_masked)
Here is a date: 12-05-2018 and here is another one: 10-31-2010
Here is a date: mm-dd-yyyy and here is another one: mm-dd-yyyy
Of course, the above script makes no effort to check whether the dates are actually valid. If you require that, you may use one of the date libraries available in Python.
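To make the ambiguity concrete, and to add the validity check mentioned above, here is a minimal sketch using only the standard library. It lists every candidate format under which a string is a real date; the name matching_formats and the list of candidate formats are assumptions chosen for illustration.

```python
from datetime import datetime

# Assumed candidates; extend with %y variants or other separators as needed.
CANDIDATE_FORMATS = ("%m-%d-%Y", "%d-%m-%Y", "%Y-%m-%d")


def matching_formats(text, formats=CANDIDATE_FORMATS):
    """Return every format under which text parses as a valid date."""
    matches = []
    for fmt in formats:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue  # not a valid date in this format
        matches.append(fmt)
    return matches


print(matching_formats("12-05-2018"))  # two formats fit: ambiguous
print(matching_formats("10-31-2010"))  # only month-first fits
```

A result with more than one entry means the string alone cannot settle the question, so the format has to come from context or from the user.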
I don't really understand what you plan to do with the format. There are two reasons I can think of why you might want it. (1) You want at some future point to convert a normalized datetime back into the original string. If that is what you want, you would be better off just storing the normalized datetime and the original string. Or (2) you want to draw (dodgy) conclusions about the person sending the data, because different nationalities tend to use different formats. But, whatever you want it for, you can do it this way:
from dateutil import parser

def get_date_format(date_input):
    date = parser.parse(date_input)
    for date_format in ("%m-%d-%Y", "%d-%m-%Y", "%Y-%m-%d"):
        # You can extend the list above to include formats with %y in addition to %Y, etc, etc
        if date.strftime(date_format) == date_input:
            return date_format
>>> date_input = "12-05-2018"
>>> get_date_format(date_input)
'%m-%d-%Y'
You mention in a comment you are prepared to make assumptions about ambiguous dates like 12-05-2018 (could be May or December) and 05-12-18 (could be 2018 or 2005). You can pass those assumptions to dateutil.parser.parse. It accepts boolean keyword parameters dayfirst and yearfirst which it will use in ambiguous cases.
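A short illustration of those two keyword parameters follows. It needs the third-party python-dateutil package, and the variable names are only for demonstration.

```python
from dateutil import parser

ambiguous_day_month = "12-05-2018"
# Without hints, dateutil reads the first field as the month.
month_first = parser.parse(ambiguous_day_month)
# dayfirst=True reads the first field as the day instead.
day_first = parser.parse(ambiguous_day_month, dayfirst=True)

ambiguous_year = "05-12-18"
# Without hints, the last field is taken as the year.
year_last = parser.parse(ambiguous_year)
# yearfirst=True takes the first field as the year instead.
year_first = parser.parse(ambiguous_year, yearfirst=True)

print(month_first.date(), day_first.date())
print(year_last.date(), year_first.date())
```

The flags only matter in ambiguous cases; a string such as 31-05-2018 can be read only one way regardless of dayfirst.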
Take a look at the datetime library. There you will find the function strptime(), which is exactly what you are looking for.
Here is the documentation: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
I am trying to fix older code someone wrote years ago using python. I believe the "\d\d\d\d" refers to the number of text characters, and 0-9A-Z limits the type of input but I can't find any documentation on this.
idTypes = {"PFI":"\d\d\d\d",
           "VA HOSPITAL ID":"V\d\d\d",
           "CERTIFICATION NUMBER":"\d\d\d-[A-Z]-\d\d\d",
           "MORTUARY FIRM ID":"[0-9]",
           "HEALTH DEPARTMENT ID":"[0-9]",
           "NYSDOH OFFICE ID":"[0-9]",
           "ACF ID":"AF\d\d\d\d",
           "GENERIC NUMBER ID":"[0-9]",
           "GENERIC ID":"[A-Za-z0-9]",
           "OASAS FAC":"[0-9]",
           "OMH PSYCH CTR":"[0-9A-Z]"}
For example, the PFI values seem to be limited to 4 numeric digits in a string field, so 12345 doesn't work later in the code but 1234 does. Adding another \d doesn't appear to be the answer.
These are, apparently, regular expressions used to validate inputs. See https://docs.python.org/2/library/re.html
Without seeing the code that uses these values it is impossible to say more.
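For orientation only, here is a sketch of how such patterns are commonly applied. It assumes the validation uses re.fullmatch, which may not be what the original code does; is_valid_id is a hypothetical helper, and only a few of the patterns are copied over.

```python
import re

# A few patterns from the question, written as raw strings.
ID_PATTERNS = {
    "PFI": r"\d\d\d\d",          # exactly four digits
    "VA HOSPITAL ID": r"V\d\d\d",
    "GENERIC NUMBER ID": r"[0-9]",
}


def is_valid_id(id_type, value, patterns=ID_PATTERNS):
    """Return True if the whole value matches the pattern for id_type."""
    return re.fullmatch(patterns[id_type], value) is not None


print(is_valid_id("PFI", "1234"))                   # True
print(is_valid_id("PFI", "12345"))                  # False
print(bool(re.match(ID_PATTERNS["PFI"], "12345")))  # True: prefix match only
```

How a value like 12345 is treated therefore depends on whether the calling code uses match, search, or fullmatch. Under fullmatch, a pattern such as r"\d{4,5}" would accept both four and five digits.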
I have some text, taken from different websites, that I want to extract dates from. As one can imagine, the dates vary substantially in how they are formatted, and look something like:
Posted: 10/01/2014
Published on August 1st 2014
Last modified on 5th of July 2014
Posted by Dave on 10-01-14
What I want to know is whether anyone knows of a Python library [or API] that would help with this (other than e.g. regex, which will be my fallback). I could probably remove the "Posted on" parts relatively easily, but getting the other stuff consistent does not look easy.
My solution using dateutil
Following Lukas's suggestion, I used the dateutil package (it seemed far more flexible than Arrow) with its fuzzy option, which basically ignores things that are not dates.
Caution on Fuzzy parsing using dateutil
The main thing to note, as discussed in the thread "Trouble in parsing date using dateutil", is that if it is unable to parse a day, month, or year, it takes a default value (the current day, unless specified), and as far as I can tell no flag is reported to indicate that it took the default.
This would result in "random text" returning today's date of 2015-4-16 which could have caused problems.
Solution
Since I really want to know when it fails, rather than have the date filled in with a default value, I ended up running the parser twice with different defaults and checking whether each field took the default in both runs. If it did not, I assumed the field was parsed correctly.
from datetime import datetime
from dateutil.parser import parse

def extract_date(text):
    date = {}
    # Parse twice with different defaults: a field that follows the default
    # in both runs was not found in the text.
    date_1 = parse(text, fuzzy=True, default=datetime(2001, 1, 1))
    date_2 = parse(text, fuzzy=True, default=datetime(2002, 2, 2))
    if date_1.day == 1 and date_2.day == 2:
        date["day"] = "XX"
    else:
        date["day"] = date_1.day
    if date_1.month == 1 and date_2.month == 2:
        date["month"] = "XX"
    else:
        date["month"] = date_1.month
    if date_1.year == 2001 and date_2.year == 2002:
        date["year"] = "XXXX"
    else:
        date["year"] = date_1.year
    return date

print(extract_date("Posted: by dave August 1st"))
Obviously this is a bit of a botch (so if anyone has a more elegant solution, please share), but it correctly parsed the four examples I had above [where it assumed US format for the date 10/01/2014 rather than UK format], and returned XX appropriately when data was missing.
You could use the Arrow library:
arrow.get('2013-05-05 12:30:45', ['MM/DD/YYYY', 'MM-DD-YYYY'])
Two arguments, first a str to parse and second a list of formats to try.
I am writing a program in Python where I need to output dates according to the user's locale:
Get a list of timezones in a country specified as per user input (did that using pytz)
Get the locale of the user (which I am unable to figure out how to do)
Is there a way to get the locale from the country/timezone, or does some other method need to be followed?
Or do I need to get the locale input from user itself?
EDIT
The program is to be a web app. The user can give me their country. But do they have to provide the locale explicitly as well, or can I get it from their timezone/country?
"Locale" is a country + language pair.
You have country + timezone. But no info about the language.
I don't think it's possible to convert country + timezone into a single 'correct' locale... in countries with multiple languages there is not a 1:1 relation between language and timezone.
The closest I can see is to use Babel:
>>> from babel import Locale
>>> Locale.parse('und_BR')  # 'und' here means unknown language, BR is country
Locale('pt', territory='BR')
This gives you a single 'most likely' (or default) locale for the country. To handle the languages properly you need to ask the user their preferred language.