How to convert Unicode dict to dict - python

I am trying to convert:
datalist = [u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/3/_/3_13.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/3/_/3_13.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/5/_/5_3_1.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/5/_/5_3_1.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/1/_/1_22.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/1/_/1_22.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/4/_/4_7_1.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/4/_/4_7_1.jpg'}"]
into a list containing Python dicts. If I try to extract a value by key, I get this error:
for i in datalist:
    print i['smallimage']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-686ea4feba66> in <module>()
1 for i in datalist:
----> 2 print i['smallimage']
3
TypeError: string indices must be integers
How do I convert a list of Unicode strings that look like dicts into a list of actual dicts?

You could use the demjson module which has a non-strict mode that handles the data you have:
import demjson
for data in datalist:
    dct = demjson.decode(data)
    print dct['gallery']  # etc...

In this case, I'd hand-craft a regular expression to make these into something you can evaluate as Python:
import re
import ast
from functools import partial
keys = re.compile(r'(gallery|smallimage|largeimage)')
fix_keys = partial(keys.sub, r'"\1"')
for entry in datalist:
    entry = ast.literal_eval(fix_keys(entry))
Yes, this is limited; but it works for this set and is robust as long as the keys match. The regular expression is simple to maintain. Moreover, this doesn't use any external dependencies, it's all based on batteries already included.
Result:
>>> for entry in datalist:
...     print ast.literal_eval(fix_keys(entry))
...
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/3/_/3_13.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/3/_/3_13.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/5/_/5_3_1.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/5/_/5_3_1.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/1/_/1_22.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/1/_/1_22.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/4/_/4_7_1.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/4/_/4_7_1.jpg'}
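For readers on Python 3, the same idea can be sketched with a more general pattern that quotes any bare identifier sitting between a `{` or `,` and a `:`. This is only a sketch under stated assumptions: the helper name `loose_to_dict` and the shortened sample URLs are illustrative, and it assumes no value contains a comma followed by a word and a colon.

```python
import ast
import re

# Matches a bare identifier used as a key: preceded by "{" or ",", followed by ":".
BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_]\w*)\s*:')

def loose_to_dict(text):
    """Quote bare keys, then parse the result as a Python literal."""
    quoted = BARE_KEY.sub(r'\1"\2":', text)
    return ast.literal_eval(quoted)

# Shortened stand-in for one entry of datalist (hypothetical URLs).
sample = "{gallery: 'gal1', smallimage: 'http://example.com/s.jpg',largeimage: 'http://example.com/l.jpg'}"
result = loose_to_dict(sample)
print(result['smallimage'])
```

Unlike the fixed list of key names, this variant does not need to be updated when a new key appears, at the cost of being easier to fool with unusual values.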

Just as another thought, your list is properly formatted YAML.
> import yaml
> yaml.safe_load(u'{foo: "bar"}')['foo']
'bar'
And if you want to be really fancy and parse everything at once:
> data = yaml.safe_load('['+','.join(datalist)+']')
> data[0]['smallimage']
'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'
> data[3]['gallery']
'gal1'

If your dictionary keys were quoted (and every string used double quotes), you could use json.loads to load the string:
import json
for i in datalist:
    print json.loads(i)['smallimage']
(With quoted keys, ast.literal_eval would have worked too, and it accepts single quotes.)
However, as it is, this will work with an old-school eval:
>>> class Mdict(dict):
...     def __missing__(self, key):
...         return key
...
>>> eval(datalist[0],Mdict(__builtins__=None))
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'}
Note that this is probably vulnerable to injection attacks, so only use it if the string is from a trusted source.
Finally, for anyone wanting a short, although somewhat dense solution that uses only the standard library and isn't vulnerable to injection attacks... This little gem does the trick (assuming the dictionary keys are valid identifiers)!
import ast
class RewriteName(ast.NodeTransformer):
    def visit_Name(self, node):
        return ast.Str(s=node.id)
transformer = RewriteName()
for x in datalist:
    tree = ast.parse(x, mode='eval')
    transformer.visit(tree)
    print ast.literal_eval(tree)['smallimage']
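The snippet above targets Python 2. A rough Python 3 equivalent (3.8 or later) might look like the sketch below, which swaps `ast.Str` for `ast.Constant`. The function name `parse_loose_dict` and the sample string are illustrative, not part of the original answer.

```python
import ast

class QuoteNames(ast.NodeTransformer):
    """Turn every bare name (e.g. gallery) into the string constant 'gallery'."""
    def visit_Name(self, node):
        return ast.copy_location(ast.Constant(value=node.id), node)

def parse_loose_dict(text):
    tree = ast.parse(text, mode='eval')   # parse only, nothing is executed
    tree = QuoteNames().visit(tree)       # bare keys become string constants
    return ast.literal_eval(tree)         # accepts literals only

# Shortened stand-in for one entry of datalist (hypothetical URL).
sample = "{gallery: 'gal1', smallimage: 'http://example.com/s.jpg'}"
parsed = parse_loose_dict(sample)
print(parsed['smallimage'])
```

Note that the transformer rewrites every bare name, so an unquoted value would also be turned into a string rather than rejected.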

Your datalist is a list of unicode strings.
You could use eval, except that your keys are not properly quoted. What you can do is requote your keys on the fly with replace:
for i in datalist:
    my_dict = eval(i.replace("gallery", "'gallery'").replace("smallimage", "'smallimage'").replace("largeimage", "'largeimage'"))
    print my_dict["smallimage"]

I don't see why the need for all the extra things such as using re or json...
fdict = {str(k): v for (k, v) in udict.items()}
Where udict is the dict that has unicode keys. Simply convert them to str. In your given data, you can simply...
datalist = [dict((str(k), v) for (k, v) in i.items()) for i in datalist]
Simple test:
>>> datalist = [{u'a':1,u'b':2},{u'a':1,u'b':2}]
>>> datalist
[{u'a': 1, u'b': 2}, {u'a': 1, u'b': 2}]
>>> datalist = [dict((str(k), v) for (k, v) in i.items()) for i in datalist]
>>> datalist
[{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]
No import re or import json. Simple and quick.

Related

Replace all ' ' in Dictionary with " " [duplicate]

I am trying to create a Python dictionary to be used as a JavaScript variable inside an HTML file for visualization purposes. As a requirement, I need the dictionary to have all names inside double quotes instead of the single quotes Python uses by default. Is there an easy and elegant way to achieve this?
couples = [
['jack', 'ilena'],
['arun', 'maya'],
['hari', 'aradhana'],
['bill', 'samantha']]
pairs = dict(couples)
print pairs
Generated Output:
{'arun': 'maya', 'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana'}
Expected Output:
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
I know, json.dumps(pairs) does the job, but the dictionary as a whole is converted into a string which isn't what I am expecting.
P.S.: Is there an alternate way to do this without using json, since I am dealing with nested dictionaries?
json.dumps() is what you want here. If you use print(json.dumps(pairs)), you will get your expected output:
>>> pairs = {'arun': 'maya', 'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana'}
>>> print(pairs)
{'arun': 'maya', 'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana'}
>>> import json
>>> print(json.dumps(pairs))
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
You can construct your own version of a dict with special printing using json.dumps():
>>> import json
>>> class mydict(dict):
...     def __str__(self):
...         return json.dumps(self)
...
>>> couples = [['jack', 'ilena'],
...            ['arun', 'maya'],
...            ['hari', 'aradhana'],
...            ['bill', 'samantha']]
>>> pairs = mydict(couples)
>>> print pairs
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
You can also iterate:
>>> for el in pairs:
...     print el
...
arun
bill
jack
hari
# do not use this until you understand it
import json
class doubleQuoteDict(dict):
    def __str__(self):
        return json.dumps(self)
    def __repr__(self):
        return json.dumps(self)
couples = [
['jack', 'ilena'],
['arun', 'maya'],
['hari', 'aradhana'],
['bill', 'samantha']]
pairs = doubleQuoteDict(couples)
print pairs
Yields:
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
Here's a basic print version:
>>> print '{%s}' % ', '.join(['"%s": "%s"' % (k, v) for k, v in pairs.items()])
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
The premise of the question is wrong:
I know, json.dumps(pairs) does the job, but the dictionary
as a whole is converted into a string which isn't what I am expecting.
You should be expecting a conversion to a string. All "print" does is convert an object to a string and send it to standard output.
When Python sees:
print somedict
What it really does is:
sys.stdout.write(somedict.__str__())
sys.stdout.write('\n')
As you can see, the dict is always converted to a string (after all, a string is the only datatype you can send to a file such as stdout).
Controlling the conversion to a string can be done either by defining __str__ for an object (as the other respondents have done) or by calling a pretty printing function such as json.dumps(). Although both ways have the same effect of creating a string to be printed, the latter technique has many advantages (you don't have to create a new object, it recursively applies to nested data, it is standard, it is written in C for speed, and it is already well tested).
The postscript still misses the point:
P.S.: Is there an alternate way to do this without using json, since I am
dealing with nested dictionaries.
Why work so hard to avoid the json module? Pretty much any solution to the problem of printing nested dictionaries with double quotes will re-invent what json.dumps() already does.
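To make the point about nested data concrete, here is a small sketch comparing Python's own string conversion with json.dumps(). The `profile` data is made up for illustration; True and None are included because they are also rendered differently in JavaScript.

```python
import json

# Hypothetical nested data; True and None stand in for non-string values.
profile = {'arun': {'partner': 'maya', 'active': True, 'nickname': None}}

as_python = str(profile)        # Python repr: single quotes, True, None
as_json = json.dumps(profile)   # JSON text: double quotes, true, null

print(as_python)
print(as_json)
```

The second line of output is text that a JavaScript engine can read as an object literal, and json.loads() turns it back into an equal Python dict.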
The problem that has gotten me multiple times is when loading a json file.
import json
with open('json_test.json', 'r') as f:
    data = json.load(f)
print(type(data), data)
json_string = json.dumps(data)
print(json_string)
I accidentally pass data to some function that wants a JSON string, and I get an error saying that a single quote is not valid JSON. I recheck the input JSON file, see the double quotes, and then scratch my head for a minute.
The problem is that data is a dict, not a string, and when Python converts it to a string for you, the result is NOT valid JSON.
<class 'dict'> {'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana', 'arun': 'maya'}
{"bill": "samantha", "jack": "ilena", "hari": "aradhana", "arun": "maya"}
If the JSON is valid and the dict does not need processing before conversion to a string, just reading the file as a string does the trick.
with open('json_test.json', 'r') as f:
    json_string = f.read()
print(json_string)
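The difference between the two conversions can be checked directly. The sketch below builds the dict from an inline string instead of json_test.json so that it runs on its own; the helper `is_valid_json` is a made-up name for illustration.

```python
import json

data = json.loads('{"bill": "samantha", "jack": "ilena"}')

implicit = str(data)         # what you get when a dict is formatted as text
explicit = json.dumps(data)  # what a function expecting JSON actually needs

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True

print(is_valid_json(implicit), is_valid_json(explicit))
```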
It's easy, just two steps:
Step 1: wrap your dict in a list.
Step 2: iterate over your list and convert each item to JSON.
For a better understanding, check the snippet below:
import json
couples = [
['jack', 'ilena'],
['arun', 'maya'],
['hari', 'aradhana'],
['bill', 'samantha']]
pairs = [dict(couples)]  # wrap the dict in a list
print(pairs)
# iterate over the list and convert each dict to JSON
for x in pairs:
    print("\n after converting: \n\t", json.dumps(x))  # JSON-like structure

python split string into multiple delimiters and put into dictionary

I have the string below that I am trying to split into a dictionary with specific names.
string1 = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
What I am hoping to obtain is:
>>> print(dict)
{'namy_names': 'specialtime', 'tracks': ['instruction1', 'instruction2', 'instruction3']}
I'm quite new to working with dictionaries, so I'm not too sure how it is supposed to turn out.
I have tried the code below, but it only provides instruction1 instead of the full list of instructions:
import re
delimiters = ['&nn', '&tr']
values = re.split('|'.join(delimiters), string1)
values.pop(0) # remove the initial empty string
keys = re.findall('|'.join(delimiters), string1)
output = dict(zip(keys, values))
print(output)
Use url-parsing.
from urllib import parse
url = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
d = parse.parse_qs(parse.urlparse(url).query)
print(d)
Returns:
{'nn': ['specialtime'],
'tr': ['instruction1', 'instruction2', 'instruction3'],
'x': ['klink:apple']}
And from this point, if necessary, you would simply have to rename and pick your variables, like this:
d = {
    'namy_names': d.get('nn', ['Empty'])[0],
    'tracks': d.get('tr', [])
}
# {'namy_names': 'specialtime', 'tracks': ['instruction1', 'instruction2', 'instruction3']}
This looks like url-encoded data, so you can/should use urllib.parse.parse_qs:
import urllib.parse
string1 = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
dic = urllib.parse.parse_qs(string1)
dic = {'namy_names': dic['nn'][0],
'tracks': dic['tr']}
# result: {'namy_names': 'specialtime',
# 'tracks': ['instruction1', 'instruction2', 'instruction3']}
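Both answers rely on parse_qs collecting repeated keys into a list, which is exactly what the dict(zip(...)) attempt loses. A minimal sketch that wraps this up, assuming everything after the first "?" is the query part; the function name `extract_tracks` is made up for illustration:

```python
from urllib.parse import parse_qs

def extract_tracks(text):
    # Keep only what follows the first "?" (the query part).
    query = text.partition('?')[2]
    # Every value comes back as a list; repeated keys such as tr are collected.
    params = parse_qs(query)
    return {
        'namy_names': params.get('nn', [None])[0],
        'tracks': params.get('tr', []),
    }

string1 = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
print(extract_tracks(string1))
```

Using .get() with defaults keeps the function from raising KeyError when a string has no nn or tr parameter.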

python 2.7 get rid of double backslashes

I have a list with one string element, see below:
>>> s
['{\\"SrcIP\\":\\"1.1.1.1\\",\\"DstIP\\":\\"2.2.2.2\\",\\"DstPort\\":\\"80\\"}']
I want to get rid of these '\\' and have a dict instead:
{"SrcIP":"1.1.1.1","DstIP":"2.2.2.2","DstPort":"80"}
It looks like a JSON object. You can load it into a dict using the json package, but first, to get rid of the list and the \\, you can call s[0].replace('\\', ''):
import json
my_dict = json.loads(s[0].replace('\\', ''))
You can try this:
import re
import ast
s = ['{\\"SrcIP\\":\\"1.1.1.1\\",\\"DstIP\\":\\"2.2.2.2\\",\\"DstPort\\":\\"80\\"}']
final_response = [ast.literal_eval(re.sub('\\\\', '', i)) for i in s][0]
Output:
{'SrcIP': '1.1.1.1', 'DstIP': '2.2.2.2', 'DstPort': '80'}
Just use the string replace method:
list_1=['{\\"SrcIP\\":\\"1.1.1.1\\",\\"DstIP\\":\\"2.2.2.2\\",\\"DstPort\\":\\"80\\"}']
for i in list_1:
    print(str(i).replace("\\",""))
Or you can do it in one line:
print(str(list_1[0]).replace("\\",""))
output:
{"SrcIP":"1.1.1.1","DstIP":"2.2.2.2","DstPort":"80"}
s is a list with one text item. You could get your desired output as follows:
import ast
s = ['{\\"SrcIP\\":\\"1.1.1.1\\",\\"DstIP\\":\\"2.2.2.2\\",\\"DstPort\\":\\"80\\"}']
s_dict = ast.literal_eval(s[0].replace('\\', ''))
print s_dict
print s_dict['DstIP']
Giving you the following output:
{'SrcIP': '1.1.1.1', 'DstIP': '2.2.2.2', 'DstPort': '80'}
2.2.2.2
The Python function ast.literal_eval() can be used to safely convert a string into a Python object, in this case a dictionary.
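Stripping every backslash works for this input, but it would also destroy any backslash that legitimately belongs to a value. An alternative sketch, assuming the text is JSON that was escaped one extra time (so it is the inside of a JSON string literal), is to decode it twice. The variable names are illustrative.

```python
import json

s = ['{\\"SrcIP\\":\\"1.1.1.1\\",\\"DstIP\\":\\"2.2.2.2\\",\\"DstPort\\":\\"80\\"}']

# Assumption: s[0] is the body of a JSON string literal, so wrapping it in
# double quotes and decoding undoes exactly one level of escaping.
inner_text = json.loads('"' + s[0] + '"')

# The result is ordinary JSON text, which decodes to a dict.
record = json.loads(inner_text)

print(record['DstIP'])
```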

Remove Quotation marks from a dictionary

I have a dictionary of bigrams, obtained by importing a csv and transforming it to a dictionary:
bigram_dict = {"('key1', 'key2')": 'meaning', "('key22', 'key13')": 'mean2'}
I want the dictionary's keys to be without quotation marks, i.e.:
desired_bigram_dict={('key1', 'key2'): 'meaning', ('key22', 'key13'): 'mean2'}
Could you please suggest how to do this?
This can be done using a dictionary comprehension, where you call literal_eval on the key:
from ast import literal_eval
bigram_dict = {"('key1', 'key2')": 'meaning', "('key22', 'key13')": 'mean2'}
res = {literal_eval(k): v for k,v in bigram_dict.items()}
Result:
{('key22', 'key13'): 'mean2', ('key1', 'key2'): 'meaning'}
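You can literal_eval each key and reassign, popping the old string key as you go (iterate over a copy of the keys, since the dict changes during the loop):
from ast import literal_eval
bigram_dict = {"('key1', 'key2')": 'meaning', "('key22', 'key13')": 'mean2'}
for k in list(bigram_dict):
    bigram_dict[literal_eval(k)] = bigram_dict.pop(k)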
Or to create a new dict, just use the same logic with a dict comprehension:
{literal_eval(k):v for k,v in bigram_dict.items()}
Both will give you:
{('key1', 'key2'): 'meaning', ('key22', 'key13'): 'mean2'}

Storing a UNC in JSON and loading into a dict

I have a situation where a JSON configuration document, editable by users, needs to be loaded into a dictionary in my application.
One specific scenario causing problems is a windows UNC path, such as:
\\server\share\file_path
So, valid JSON for this would intuitively be:
{"foo" : "\\\server\\share\\file_path"}
however this is invalid.
I'm going in circles with this. Here are some trials:
# starting with a json string
>>> x = '{"foo" : "\\\server\\share\\file_path"}'
>>> json.loads(x)
ValueError: Invalid \escape: line 1 column 18 (char 18)
# that didn't work, let's try to reverse engineer a dict that's correct
>>> d = {"foo":"\\server\share\file_path"}
>>> d["foo"]
'\\server\\share\x0cile_path'
# good grief, where'd my "f" go?
SUMMARY
How do I create a properly formatted JSON document that includes \\server\share\file_path?
How do I load that string into a dictionary that will return the exact value?
You're running into the escape sequences supported by the string literal. Using raw strings, this becomes clearer:
>>> d = {"foo":"\\server\share\file_path"}
>>> d
{'foo': '\\server\\share\x0cile_path'}
>>> d = {"foo": r"\\server\share\file_path"}
>>> d
{'foo': '\\\\server\\share\\file_path'}
>>> import json
>>> json.dumps(d)
'{"foo": "\\\\\\\\server\\\\share\\\\file_path"}'
>>> with open('out.json', 'w') as f: f.write(json.dumps(d))
...
>>>
$ cat out.json
{"foo": "\\\\server\\share\\file_path"}
Without raw strings, you must "escape all the things!"
>>> d = {"foo":"\\server\share\file_path"}
>>> d
{'foo': '\\server\\share\x0cile_path'}
>>> d = {"foo":"\\\\server\\share\\file_path"}
>>> d
{'foo': '\\\\server\\share\\file_path'}
>>> print d['foo']
\\server\share\file_path
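The whole round trip can be checked in a few lines. This is a small sketch in Python 3 that keeps the document in a string rather than a file; the variable names are illustrative.

```python
import json

# Raw string: the value really starts with two backslashes.
unc = r"\\server\share\file_path"

# json.dumps doubles every backslash, which is what valid JSON requires.
document = json.dumps({"foo": unc})
print(document)

# Loading the document gives back the exact original path.
loaded = json.loads(document)
print(loaded["foo"])
```

In other words, the JSON text that users edit must contain four backslashes before the server name and two before each later path component.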
