I have a long string that almost looks like a dictionary. I want to convert this to a proper Python dictionary. An example of the string is below:
'{"autorunResult":"0","batteryInfo":"No system battery","cpuBrand":"Intel(R) Xeon(R) CPU E5-1650 v3 # 3.50GHz","id":"bMlXyTrjXOOo","localeId":"1033","numCores":"1","payloadResult":"0","processorArchitecture":"x64 (AMD or Intel)","systemMemory":"0.2 GB","v":"5","windowsVersion":"Windows 7 Service Pack 1","payloadSaved":true,"autorunSaved":true,"installedApps":["AddressBook","Adobe AIR","com.adobe.mauby.4875E02D9FB21EE389F73B8D1702B320485DF8CE.1","Connection Manager","DirectDrawEx","Fontcore","IE40","IE4Data","IE5BAKEX","IEData","MobileOptionPack","Pillow-py2.7","SchedulingAgent","WIC","{00203668-8170-44A0-BE44-B632FA4D780F}","{26A24AE4-039D-4CA4-87B4-2F83217000FF}","{32A3A4F4-B792-11D6-A78A-00B0D0170000}","{4A03706F-666A-4037-7777-5F2748764D10}","{77DCDCE3-2DED-62F3-8154-05E745472D07}","{AC76BA86-7AD7-1033-7B44-A90000000001}","{BB8B979E-E336-47E7-96BC-1031C1B94561}","{C3CC4DF5-39A5-4027-B136-2B3E1F5AB6E2}"],"autoRunApps":["OptionalComponents","Adobe Reader Speed Launcher","SunJavaUpdateSched","MFDS"]}'
Note that this looks like the string representation of a dictionary, but it is not quite one. These two key/value pairs break it: "payloadSaved":true,"autorunSaved":true (the values have no double quotes around them).
Basically, I need to take the long input string and convert it to a dictionary. Any tricks?
I tried:
Using ast.literal_eval. It fails because of the issue above, so I would need to sanitize the input string somehow before ast can parse it.
Stripping the braces and splitting the long string on commas. That fails too, because the list values themselves contain commas.
Not sure how to proceed.
If that is JSON, then:
import json
d = json.loads(s)
If that is a Python literal (and you trust its source, since eval executes arbitrary code):
d = eval(s)
For string keys and values you will not find much difference between the two. Differences appear with true/True, false/False, and null/None values, and in how lists and dicts are serialized in some cases.
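A minimal sketch of the JSON route, using a shortened stand-in for the string in the question (the variable name s and the trimmed sample are assumptions for illustration). The bare true values are JSON booleans, which json.loads maps to Python's True, while ast.literal_eval rejects them because true is not a Python literal.

```python
import ast
import json

# Shortened stand-in for the long string in the question (illustrative only).
s = '{"numCores":"1","payloadSaved":true,"autoRunApps":["MFDS","SunJavaUpdateSched"]}'

d = json.loads(s)  # JSON true/false/null become True/False/None

try:
    ast.literal_eval(s)
    literal_eval_ok = True
except ValueError:
    # literal_eval sees the bare name `true`, which is not a Python literal
    literal_eval_ok = False

print(d["payloadSaved"], d["autoRunApps"], literal_eval_ok)
```

The list value survives intact, which is where the split-on-commas attempt breaks down.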
Related
Hopefully 2 quick questions...
I have a datastring that is stored in a dictionary of dictionaries. I.e
data['<ITEM NUM>']['<time>']
My first question is this: can I use this data structure directly in strptime? In my first few attempts I got an error message saying: Must be string, not list
Secondly, my time tag is stored in the format HH:MM:SS.f, but the milliseconds field has 5 digits. Is there a quick way to resolve this, since strptime's %f format only accepts 3 digits?
Update:
Well, either way, I still have 5 digits for milliseconds, and strpdate does not seem to like that when I pass in my string. Besides adding a 0 to the end of it, is there a way to convert it without having to do this?
Thanks!
strptime() takes a string and a format as the input. It doesn't traverse a list of items. You can easily accomplish this with a simple loop over your dict.
import time

for key in data:
    # each item keeps its time string under the '<time>' key
    timeobj = time.strptime(data[key]['<time>'], '%H:%M:%S.%f')
    # (do something with the time object ...)
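For the five-digit fraction, one option worth trying is datetime.strptime instead of time.strptime. Its %f directive is documented as accepting one to six digits and zero-padding on the right, so no manual zero should be needed. A small sketch, where the data dict, its item keys, and the "time" key are made-up stand-ins for the structure in the question:

```python
from datetime import datetime

# Hypothetical stand-in for the dict of dicts described in the question.
data = {
    "item1": {"time": "12:34:56.12345"},
    "item2": {"time": "01:02:03.00500"},
}

parsed = {}
for key, entry in data.items():
    # %f reads 1 to 6 fractional digits and pads on the right to microseconds
    parsed[key] = datetime.strptime(entry["time"], "%H:%M:%S.%f").time()

print(parsed["item1"].microsecond)
```

Unlike the struct_time that time.strptime returns, the datetime result keeps the fractional part.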
I'm trying to send a Unix timestamp to a web service over REST. When I convert the dictionary I get the value 1392249600000L, but I need this value to be an integer.
So I tried int(1392249600000L) and I get 1392249600000L, still a long value.
The reason I need this is that the JSON web service only accepts timestamps with milliseconds in them, but when I pass the JSON value with the 'L' in it I get an invalid JSON primitive error for the value 1392249600000L.
Can someone please help me resolve this? It seems like it should be so easy, but it's driving me crazy!
You should not be using Python representations when you are sending JSON data. Use the json module to represent integers instead:
>>> import json
>>> json.dumps(1392249600000L)
'1392249600000'
In any case, the L is only part of the string representation to make debugging easier; it makes clear that you have a long value, not an int. Don't use Python string representations for network communication.
For example, if you have a list of Python values, the str() representation of that list uses the repr() representation of its contents, resulting in L suffixes for long integers. json.dumps() handles such cases properly, and handles other types correctly as well (Python None becomes JSON null, Python True becomes JSON true, and so on):
>>> json.dumps([1392249600000L, True, None])
'[1392249600000, true, null]'
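For reference, a sketch of the same idea on Python 3, where int and long are a single type and the L suffix no longer exists. The payload dict and its key names are invented for illustration. The point is unchanged: build the Python value and let json.dumps produce the wire format.

```python
import json

# Hypothetical payload; the key names are made up for this example.
payload = {"timestamp": 1392249600000, "active": True, "note": None}

# sort_keys only makes the output order predictable for the example.
body = json.dumps(payload, sort_keys=True)
print(body)
```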
Let's say I want to create a json object following the structure:
{"favorite_food":["icecream","hamburguers"]}
To do so in Python, if I know the whole string in advance, I can just do:
json.dumps({"favorite_food":["icecream","hamburguers"]})
which works fine.
My question, though, is how I would do the same thing if I wanted to build the object through string interpolation. For example:
favorite_food = 'pizza'
json.dumps({"favorite_food":[%s]}) %favorite_food
The issue I found is that if I do the interpolation before calling json.dumps:
dict = '{"favorite_food":[%s]}' % favorite_food
and then call json.dumps(dict), then because of the string quoting, json.dumps returns:
{"favorite_food":[pizza]}
That is, it is not a dict anymore (but a string with the structure of a dict).
How can I solve this simple issue?
Why not just:
>>> food = "pizza"
>>> json.dumps({"favorite_food":[food]})
'{"favorite_food": ["pizza"]}'
json.dumps takes actual values as input: real dicts, lists, ints, and strings. If you want to put your string value in the dict, just put it in. You don't want to put in a string representation of it; you want to put in the actual value and let json.dumps make the string representation.
How about the following:
import json
favorite_food = 'pizza'
my_dict = {"favorite_food":[favorite_food]}
print(json.dumps(my_dict))
I find this very simple.
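To see why the interpolation approach goes wrong, the sketch below compares both routes side by side. The variable names are illustrative; the template string is the one from the question.

```python
import json

favorite_food = "pizza"

# Route 1: put the real value in a real dict, then serialize.
good = json.dumps({"favorite_food": [favorite_food]})

# Route 2: interpolate into a template; pizza ends up unquoted.
template_result = '{"favorite_food":[%s]}' % favorite_food

try:
    json.loads(template_result)
    template_is_valid_json = True
except ValueError:  # json.JSONDecodeError subclasses ValueError
    template_is_valid_json = False

print(good)
print(template_is_valid_json)
```

Letting json.dumps do the quoting also covers values that contain quotes or backslashes, which a hand-written template would have to escape itself.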
I want to encode a string in UTF-8 and view the corresponding UTF-8 bytes individually. In the Python REPL the following seems to work fine:
>>> unicode('©', 'utf-8').encode('utf-8')
'\xc2\xa9'
Note that I’m using U+00A9 COPYRIGHT SIGN as an example here. The '\xC2\xA9' looks close to what I want — a string consisting of two separate code points: U+00C2 and U+00A9. (When UTF-8-decoded, it gives back the original string, '\xA9'.)
Then, I want the UTF-8-encoded string to be converted to a JSON-compatible string. However, the following doesn’t seem to do what I want:
>>> import json; json.dumps('\xc2\xa9')
'"\\u00a9"'
Note that it generates a string containing U+00A9 (the original symbol). Instead, I need the UTF-8-encoded string, which would look like "\u00C2\u00A9" in valid JSON.
TL;DR How can I turn '©' into "\u00C2\u00A9" in Python? I feel like I’m missing something obvious — is there no built-in way to do this?
If you really want "\u00c2\u00a9" as the output, give json a Unicode string as input.
>>> print json.dumps(u'\xc2\xa9')
"\u00c2\u00a9"
You can generate this Unicode string from the raw bytes:
s = unicode('©', 'utf-8').encode('utf-8')
s2 = u''.join(unichr(ord(c)) for c in s)
I think what you really want is "\xc2\xa9" as the output, but I'm not sure how to generate that yet.
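On Python 3 the same byte-to-code-point mapping can be sketched with a Latin-1 decode, since Latin-1 maps every byte value to the code point with the same number. This is an equivalent I am assuming fits the question, not something taken from the answer above.

```python
import json

utf8_bytes = "\u00a9".encode("utf-8")   # the two bytes C2 A9

# Latin-1 turns each byte into the code point of the same value,
# like the unichr(ord(c)) join in the Python 2 snippet.
as_code_points = utf8_bytes.decode("latin-1")

escaped = json.dumps(as_code_points)
print(escaped)
```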
I have a dictionary of dictionaries in Python:
d = {"a11y_firesafety.html":{"lang:hi": {"div1": "http://a11y.in/a11y/idea/a11y_firesafety.html:hi"}, "lang:kn": {"div1": "http://a11y.in/a11ypi/idea/a11y_firesafety.html:kn"}}}
I have this in a JSON file and I encoded it using json.dumps(). Now when I decode it using json.loads() in Python I get a result like this:
temp = {u'a11y_firesafety.html': {u'lang:hi': {u'div1': u'http://a11y.in/a11ypi/idea/a11y_firesafety.html:hi'}, u'lang:kn': {u'div1': u'http://a11y.in/a11ypi/idea/a11y_firesafety.html:kn'}}}
My problem is with the "u" prefix, which marks a Unicode string, in front of every item in temp (a dictionary of dictionaries). How do I get rid of that "u"?
Why do you care about the 'u' characters? They're just a visual indicator; unless you're actually using the result of str(temp) in your code, they have no effect on your code. For example:
>>> test = u"abcd"
>>> test == "abcd"
True
If they do matter for some reason, and you don't care about consequences like not being able to use this code in an international setting, then you could pass in a custom object_hook (see the json module documentation) to produce dictionaries with string contents rather than unicode.
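A rough sketch of what such an object_hook could look like. The helper name encode_text and the sample document are hypothetical. On Python 2 it yields plain str keys and values (no u prefix); on Python 3 the same code yields bytes, so treat it as an illustration of the mechanism rather than a recommendation.

```python
import json

text_type = type(u"")  # unicode on Python 2, str on Python 3

def encode_text(value):
    """Recursively encode text to UTF-8 byte strings (hypothetical helper)."""
    if isinstance(value, text_type):
        return value.encode("utf-8")
    if isinstance(value, list):
        return [encode_text(v) for v in value]
    if isinstance(value, dict):
        return {encode_text(k): encode_text(v) for k, v in value.items()}
    return value

# Made-up document shaped like the one in the question.
raw = '{"lang:hi": {"div1": "a11y_firesafety.html:hi"}}'

# object_hook is called for every decoded JSON object, innermost first.
decoded = json.loads(raw, object_hook=encode_text)
print(decoded)
```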
You could also use this:
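import fileinput

with open("out.txt", "a") as fout:
    for line in fileinput.input("in.txt"):
        # Strip the u prefix before both quote styles. Caution: this also
        # alters any text that happens to contain u" or u' (e.g. you're).
        cleaned = line.replace('u"', '"').replace("u'", "'")
        fout.write(cleaned)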
Printed Python 2 representations of decoded JSON carry these two prefixes, u' and u".
This snippet gets rid of both of them. It may not be required, since the prefix doesn't hinder any logical processing, as the previous answer mentions.
There is no "unicode" encoding; unicode is a separate data type. I don't really see why unicode would be a problem, since you can always convert it to a byte string with, e.g., foo.encode('utf-8').
However, if you really want str objects up front, you should probably create your own decoder class and use it when decoding the JSON.