Python strptime for lists and milliseconds - python

Hopefully 2 quick questions...
I have a datastring that is stored in a dictionary of dictionaries. I.e
data['<ITEM NUM>']['<time>']
My first question is this: Can I use this data structure directly in strptime? With my first few attempts I was getting an error message saying: Must be string, not list
Secondly, my time tag is stored in this format HH:MM:SS.f but the milliseconds has 5 digits. Is there a quick way to resolve this since strptime's %f format only accepts 3 digits?
Update:
Well, either way, I still have 5 digits for milliseconds and strptime does not seem to like that when I pass in my string. Besides adding a 0 to the end of it, is there a way to get it to convert without having to do this?
Thanks!

strptime() takes a string and a format as the input. It doesn't traverse a list of items. You can easily accomplish this with a simple loop over your dict.
import time
for key in data:
    # each value is itself a dict, so index into it to reach the time string
    timeobj = time.strptime(data[key]['<time>'], '%H:%M:%S.%f')
    # do something with the time object ...
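On the second question, a minimal sketch, assuming Python 3 and datetime.datetime.strptime rather than time.strptime: %f accepts one to six digits and pads on the right, so a five-digit fraction parses without appending a 0. The dictionary contents and the 'time' key below are made up for illustration.

```python
from datetime import datetime

# Hypothetical data in the shape described in the question:
# a dict of dicts, each holding a time string with a 5-digit fraction.
data = {
    '1': {'time': '12:34:56.12345'},
    '2': {'time': '01:02:03.00001'},
}

# Parse every time string; %f takes 1 to 6 digits and right-pads with zeros.
parsed = {
    key: datetime.strptime(item['time'], '%H:%M:%S.%f')
    for key, item in data.items()
}

for key, value in parsed.items():
    print(key, value.time(), value.microsecond)
```

Note that the struct_time returned by time.strptime has no field for fractional seconds, which is why this sketch uses datetime instead.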

Related

how to convert string to dictionary

I have a long string that almost looks like a dictionary. I want to convert this to a proper Python dictionary. An example of the string is below:
'{"autorunResult":"0","batteryInfo":"No system battery","cpuBrand":"Intel(R) Xeon(R) CPU E5-1650 v3 # 3.50GHz","id":"bMlXyTrjXOOo","localeId":"1033","numCores":"1","payloadResult":"0","processorArchitecture":"x64 (AMD or Intel)","systemMemory":"0.2 GB","v":"5","windowsVersion":"Windows 7 Service Pack 1","payloadSaved":true,"autorunSaved":true,"installedApps":["AddressBook","Adobe AIR","com.adobe.mauby.4875E02D9FB21EE389F73B8D1702B320485DF8CE.1","Connection Manager","DirectDrawEx","Fontcore","IE40","IE4Data","IE5BAKEX","IEData","MobileOptionPack","Pillow-py2.7","SchedulingAgent","WIC","{00203668-8170-44A0-BE44-B632FA4D780F}","{26A24AE4-039D-4CA4-87B4-2F83217000FF}","{32A3A4F4-B792-11D6-A78A-00B0D0170000}","{4A03706F-666A-4037-7777-5F2748764D10}","{77DCDCE3-2DED-62F3-8154-05E745472D07}","{AC76BA86-7AD7-1033-7B44-A90000000001}","{BB8B979E-E336-47E7-96BC-1031C1B94561}","{C3CC4DF5-39A5-4027-B136-2B3E1F5AB6E2}"],"autoRunApps":["OptionalComponents","Adobe Reader Speed Launcher","SunJavaUpdateSched","MFDS"]}'
Note that this looks like a string representation of a dictionary. In fact, it is not. These two k,v pairs kill it: "payloadSaved":true,"autorunSaved":true. (no double-quotes around the values).
Basically, I need to take the long input string and convert it to a dictionary. Any tricks?
I tried:
using ast.literal_eval. It bombs...because of the above issue. Need to somehow sanitize the input string so that ast works.
Take out the parenthesis, tokenize the long string on comma, but again, it bombs...(the list values have commas...).
Not sure how to proceed.
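If that is JSON, then:
import json
d = json.loads(s)
If that is a Python literal:
d = eval(s)  # for input you do not control, ast.literal_eval(s) is the safer choice
For string keys and values you will not find much difference. The difference appears when true/True, false/False, or null/None values occur, or in how lists and dicts are serialized in some cases. Your string contains true, so it is JSON; eval(s) would fail on it with a NameError.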
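A short sketch of the JSON route, using a shortened, made-up excerpt of the string from the question. It shows how json.loads maps the bare true to Python's True and keeps the list values intact.

```python
import json

# Shortened excerpt of the input string (illustrative, not the full original).
s = '{"numCores":"1","payloadSaved":true,"installedApps":["AddressBook","Adobe AIR"]}'

d = json.loads(s)

print(type(d).__name__)      # dict
print(d["payloadSaved"])     # JSON true becomes Python True
print(d["installedApps"])    # JSON arrays become Python lists
```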

How to require a timestamp to be zero-padded during validation in Python?

I'm trying to validate a string that's supposed to contain a timestamp in the format of ISO 8601 (commonly used in JSON).
Python's strptime seems to be very forgiving when it comes to validating zero-padding, see code example below (note that the hour is missing a leading zero):
>>> import datetime
>>> s = '1985-08-23T3:00:00.000'
>>> datetime.datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f')
datetime.datetime(1985, 8, 23, 3, 0)
It gracefully accepts a string that's not zero-padded for the hour for example, and doesn't throw a ValueError exception as I would expect.
Is there any way to enforce strptime to validate that it's zero-padded? Or is there any other built-in function in the standard libs of Python that does?
I would like to avoid writing my own regexp for this.
There is already an answer explaining that parsing ISO 8601 or RFC 3339 date/time with Python's strptime() is impossible: How to parse an ISO 8601-formatted date?
So, to answer your question: no, there is no way in the standard Python library to reliably parse such a date.
Regarding the regex suggestions, a date string like
2020-14-32T45:33:44.123
would pass a regex that only counts digits, even though it is not a valid date. There are lots of Python modules for this (search for "iso8601" on https://pypi.python.org), but building a complete ISO 8601 validator would require handling things like leap seconds, the list of possible time zone offset values, and many more.
To make strptime validate leading zeros for you, you would have to add your own patterns to Python's _strptime._TimeRE_cache. The solution is very hacky, most likely not very portable, and requires writing a regex, although only for the hour part of a timestamp.
Another solution would be to write your own function that uses strptime, converts the parsed date back to a string, and compares the two strings. This solution is portable, but it lacks clear error messages: you won't be able to distinguish between missing leading zeros in hours, minutes, or seconds.
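The round-trip idea can be sketched as follows. This is one possible reading of it, assuming Python 3.6+ and exactly three fractional digits in the input; parse_strict is a hypothetical helper name, and isoformat(timespec='milliseconds') stands in for formatting the date back to a string.

```python
from datetime import datetime

FORMAT = '%Y-%m-%dT%H:%M:%S.%f'

def parse_strict(s):
    """Parse s, rejecting anything that is not fully zero-padded.

    Hypothetical helper: parses with strptime, renders the result back to
    text, and requires the rendered text to equal the input exactly.
    """
    dt = datetime.strptime(s, FORMAT)  # raises ValueError for out-of-range fields
    # isoformat always zero-pads and, with this timespec, emits 3 fraction digits
    if dt.isoformat(timespec='milliseconds') != s:
        raise ValueError("time data %r is not zero-padded as %r" % (s, FORMAT))
    return dt

for candidate in ['1985-08-23T03:00:00.000', '1985-08-23T3:00:00.000']:
    try:
        print(candidate, '->', parse_strict(candidate))
    except ValueError as exc:
        print(candidate, '-> rejected:', exc)
```

Because strptime runs first, a string such as 2020-14-32T45:33:44.123 is rejected for its out-of-range fields before the padding comparison is reached.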
You said you want to avoid a regex, but this is actually the type of problem where a regex is appropriate. As you discovered, strptime is very flexible about the input it will accept. However, the regex for this problem is relatively easy to compose:
import re
# Escape the dot and anchor the end so only a complete timestamp matches
date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}$')
s_list = [
    '1985-08-23T3:00:00.000',
    '1985-08-23T03:00:00.000'
]
for s in s_list:
    if date_pattern.match(s):
        print("%s is valid" % s)
    else:
        print("%s is invalid" % s)
Output
1985-08-23T3:00:00.000 is invalid
1985-08-23T03:00:00.000 is valid
The only thing I can think of outside of messing with Python internals is to test for the validity of the format by knowing what you are looking for.
So, if I gather it right, the format is '%Y-%m-%dT%H:%M:%S.%f' and should be zero-padded.
Then you know the exact length of the string you are looking for and can reproduce the intended result.
import datetime
s = '1985-08-23T3:00:00.000'
fmt = '%Y-%m-%dT%H:%M:%S.%f'
stripped = datetime.datetime.strptime(s, fmt)
# A fully zero-padded timestamp with 3 fraction digits is exactly 23 characters long
if len(s) != 23:
    raise ValueError("time data '{}' does not match format '{}'".format(s, fmt))
print(stripped)  # just for good measure
>>ValueError: time data '1985-08-23T3:00:00.000' does not match format '%Y-%m-%dT%H:%M:%S.%f'

Python - Getting the date format [duplicate]

This question already has answers here:
How to determine appropriate strftime format from a date string?
I'm getting a date as a string, then I'm parsing it to datetime object.
Is there any way to check what the date format of the object is?
Let's say that this is the object that I'm creating:
modified_date = parser.parse("2015-09-01T12:34:15.601+03:00")
How can I print or get the exact date format of this object? I need this in order to verify that it's in the correct format, so I'll be able to make a diff of today's date and the given date.
I had a look in the source code and, unfortunately, python-dateutil doesn't expose the format. In fact it doesn't even generate a guess for the format at all, it just goes ahead and parses - the code is like a big nested spaghetti of conditionals.
You could have a look at dateinfer which looks to be what you're searching for, but these are unrelated libraries so there is no guarantee at all that python-dateutil will parse with the same format that dateinfer suggests.
>>> from dateinfer import infer
>>> s = "2015-09-01T12:34:15.601+03:00"
>>> infer([s])
'%Y-%d-%mT%I:%M:%S.601+%m:%d'
Look at that .601. Close, but no cigar. I think it has probably also mixed up the month and the day. You might get better results by giving it more than one date string to base the guess upon.
"I need this in order to verify that it's in the correct format"
If you know the expected time format (or a set of valid time formats) then you could just parse the input using it: if it succeeds then the time format is valid (the usual EAFP approach in Python):
from datetime import datetime

def parse_date(date_string, valid_date_formats):
    # Try each format in turn and return the first that parses
    for date_format in valid_date_formats:
        try:
            return datetime.strptime(date_string, date_format), date_format
        except ValueError:  # wrong date format
            pass  # try the next format
    raise ValueError("{date_string} is not in the correct format. "
                     "valid formats: {valid_date_formats}".format(**vars()))
Here's a complete code example (in Russian -- ignore the text, look at the code).
If there are many valid date formats then to improve time performance you might want to combine them into a single regular expression or convert the regex to a deterministic or non-deterministic finite-state automaton (DFA or NFA).
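One way to read the single-regex suggestion, as a rough sketch only: give each format its own named group, join the groups with alternation, and use the name of the group that matched to pick the strptime format. The two formats, the group names, and parse_with_regex are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Hypothetical table: group name -> (regex fragment, strptime format).
FORMATS = {
    'iso': (r'\d{4}-\d{2}-\d{2}', '%Y-%m-%d'),
    'us': (r'\d{2}/\d{2}/\d{4}', '%m/%d/%Y'),
}

# One combined pattern: (?P<iso>...)|(?P<us>...)
COMBINED = re.compile(
    '|'.join('(?P<%s>%s)' % (name, pattern)
             for name, (pattern, _) in FORMATS.items())
)

def parse_with_regex(date_string):
    """Pick the format with one regex match, then parse once with strptime."""
    match = COMBINED.fullmatch(date_string)
    if match is None:
        raise ValueError("%r matches none of the known formats" % date_string)
    date_format = FORMATS[match.lastgroup][1]  # lastgroup names the branch that matched
    return datetime.strptime(date_string, date_format), date_format

print(parse_with_regex('2015-09-01'))
print(parse_with_regex('09/01/2015'))
```

The regex only selects a format by shape; strptime still does the range checking, so a string like 2015-13-01 matches the pattern and then raises ValueError.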
In general, if you need to extract dates from a larger text that is too varied to create parsing rules manually; consider machine learning solutions e.g., a NER system such as webstruct (for html input).

Write Number as string csv python

I'm trying to write a csv file from json data. While doing so, I want to write '001023472' but it is written as '1023472'. I have searched a lot but didn't find an answer.
The value is of type string before writing. The problem is during writing it into the file.
Thanks in advance.
Convert the number to string with formatting operator; in your case: "%09d" % number.
Use the format builtin or format string method.
>>> format(1023472, '09')
'001023472'
>>> '{:09}'.format(1023472)
'001023472'
If your "number" is actually a string, you can also just left-pad it with '0's:
>>> format('1023472', '>09')
'001023472'
The Python docs generally eschew % formatting, saying it may go away in the future and is also more finicky; for new code there is no real reason to use it, especially in 2.7+.
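To tie this back to the CSV file itself, here is a minimal sketch, assuming Python 3 and the standard csv module, that writes to an in-memory buffer instead of a real file. The column names and row values are made up. It suggests that a string value keeps its leading zeros on the way into the file, so if they still disappear, the tool used to open the file may be the one dropping them.

```python
import csv
import io

buf = io.StringIO()  # stands in for an open file
writer = csv.writer(buf)

writer.writerow(['id', 'name'])
writer.writerow(['001023472', 'already a string'])       # written unchanged
writer.writerow([format(1023472, '09'), 'padded int'])   # int padded to 9 digits

text = buf.getvalue()
print(text)
```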

Python json.dumps with string interpolation?

Let's say I want to create a json object following the structure:
{"favorite_food":["icecream","hamburguers"]}
to do so in python, if i know the whole string in advance, I can just do:
json.dumps({"favorite_food":["icecream","hamburguers"]})
which works fine.
my question though is, how would i do the same thing if i wanted to get the object as a result of a string interpolation? For example:
favorite_food = 'pizza'
json.dumps({"favorite_food":[%s]}) %favorite_food
the issue i found is, if I do the interpolation prior to calling the json.dumps:
dict= '{"favorite_food":[%s]}' % favorite_food
if i then do json.dumps(dict), because of the string quotation, json.dumps returns:
{"favorite_food":[pizza]}
that is, is not a dict anymore (but a string with the structure of a dict)
How can i solve this simple issue?
Why not just:
>>> food = "pizza"
>>> json.dumps({"favorite_food":[food]})
'{"favorite_food": ["pizza"]}'
json.dumps takes actual values as input, that is, real dicts, lists, ints, and strings. If you want to put your string value in the dict, just put it in. You don't want to put in a string representation of it; you want to put in the actual value and let json.dumps make the string representation.
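A small sketch of why this matters, using a made-up value that contains double quotes: interpolating it into a JSON-looking template produces text that no longer parses, while json.dumps escapes it correctly.

```python
import json

food = 'mac "n" cheese'  # illustrative value containing quotes

# String interpolation: the embedded quotes break the JSON text.
interpolated = '{"favorite_food":["%s"]}' % food
try:
    json.loads(interpolated)
    interpolation_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    interpolation_ok = False

# Real values: json.dumps does the quoting and escaping itself.
encoded = json.dumps({"favorite_food": [food]})
decoded = json.loads(encoded)

print(interpolation_ok)
print(encoded)
print(decoded)
```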
How about below:
favorite_food = 'pizza'
my_dict = {"favorite_food":[favorite_food]}
print(json.dumps(my_dict))
I found this is very simple.
