I have a situation where a JSON configuration document, editable by users, needs to be loaded into a dictionary in my application.
One specific scenario causing problems is a Windows UNC path, such as:
\\server\share\file_path
So, valid JSON for this would intuitively be:
{"foo" : "\\\server\\share\\file_path"}
However, this is invalid.
I'm going in circles with this. Here are some trials:
# starting with a json string
>>> x = '{"foo" : "\\\server\\share\\file_path"}'
>>> json.loads(x)
ValueError: Invalid \escape: line 1 column 18 (char 18)
# that didn't work, let's try to reverse engineer a dict that's correct
>>> d = {"foo":"\\server\share\file_path"}
>>> d["foo"]
'\\server\\share\x0cile_path'
# good grief, where'd my "f" go?
SUMMARY
How do I create a properly formatted JSON document that includes \\server\share\file_path?
How do I load that string into a dictionary that will return the exact value?
You're running into the escape sequences that Python string literals support: \f, for example, is a form feed (\x0c), which is why the "f" disappeared. Using raw strings, this becomes clearer:
>>> d = {"foo":"\\server\share\file_path"}
>>> d
{'foo': '\\server\\share\x0cile_path'}
>>> d = {"foo": r"\\server\share\file_path"}
>>> d
{'foo': '\\\\server\\share\\file_path'}
>>> import json
>>> json.dumps(d)
'{"foo": "\\\\\\\\server\\\\share\\\\file_path"}'
>>> with open('out.json', 'w') as f: f.write(json.dumps(d))
...
>>>
$ cat out.json
{"foo": "\\\\server\\share\\file_path"}
Without raw strings, you must "escape all the things!"
>>> d = {"foo":"\\server\share\file_path"}
>>> d
{'foo': '\\server\\share\x0cile_path'}
>>> d = {"foo":"\\\\server\\share\\file_path"}
>>> d
{'foo': '\\\\server\\share\\file_path'}
>>> print d['foo']
\\server\share\file_path
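To tie the pieces together, here is a minimal sketch of the round trip: build the value with a raw string, let json.dumps do the escaping, and check that json.loads hands the same value back. It is written for Python 3, which is an assumption on my part since the session above is Python 2, and the names path, text and loaded are purely illustrative.

```python
import json

# Raw string: backslashes are kept literally, so this is the real UNC path.
path = r"\\server\share\file_path"

# json.dumps doubles every backslash, which is what valid JSON text requires.
text = json.dumps({"foo": path})
print(text)

# json.loads reverses the escaping and returns the original value.
loaded = json.loads(text)
print(loaded["foo"])
```

The point of the sketch is that users editing the JSON file by hand have to double every backslash, while code that goes through json.dumps never has to think about it.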
Related
I am having trouble (I'm a beginner in Python) assigning an array to JSON.
For example, it works fine in the Python console:
>>> data = [
... "byte_array_format",
... "auto_timer",
... "auto_calc",
... "auto_intense",
... "auto_balance"]
>>> print(data)
['byte_array_format', 'auto_timer', 'auto_calc', 'auto_intense', 'auto_balance']
>>> json_db = {}
>>> json_db["value"] = data
>>> print(json_db)
{'value': ['byte_array_format', 'auto_timer', 'auto_calc', 'auto_intense', 'auto_balance']}
But when I do the same thing in my Python script, the square brackets end up wrapped in single quotes.
json_obj = json.dumps(unique_ids)
print(json_obj)
["byte_array_format", "auto_timer", "auto_calc", "auto_intense", "auto_balance"]
Adding json_obj as the value wraps the square brackets in single quotes, which ends up as invalid JSON.
final_json = {}
final_json["value"] = json_obj
print(final_json)
{'value': '["byte_array_format", "auto_timer", "auto_calc", "auto_intense", "auto_balance"]'}
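What seems to be happening here: json.dumps returns a str, so storing json_obj under "value" stores text rather than a list, and the single quotes in the printed dict are just how Python displays a str. A minimal sketch, assuming the goal is one JSON document with the list nested inside it (the short unique_ids list below is a stand-in for the real data):

```python
import json

unique_ids = ["byte_array_format", "auto_timer"]

# Dumping the list first and then nesting the result encodes it twice:
# the inner value becomes a JSON string that merely looks like an array.
double_encoded = json.dumps({"value": json.dumps(unique_ids)})
print(double_encoded)

# Keeping the list as a Python object and serializing once at the end
# produces a real JSON array.
final_json = json.dumps({"value": unique_ids})
print(final_json)
```

In other words, json.dumps is probably best called once, on the outermost object, right before the text is written or sent.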
I'm new to Python. Please help me pass a JSON value as a parameter instead of loading it from a file. Please check the code below for reference.
import json
filename = input("Enter your train data filename : ")
print(filename)
with open(filename) as train_data:
    train = json.load(train_data)
TRAIN_DATA = []
for data in train:
    ents = [tuple(entity) for entity in data['entities']]
    TRAIN_DATA.append((data['content'],{'entities':ents}))
with open('{}'.format(filename.replace('json','txt')),'w') as write:
    write.write(str(TRAIN_DATA))
In the code above, the JSON value is loaded from a file. Instead of a file, I want to pass the JSON value directly and load it.
Ex:
train_data=[{"content":"what is the price of polo?","entities":[[21,25,"PrdName"]]}]
with open(filename) as train_data:
    train = json.load(train_data)
Thanks,
"JSON value" doesn't mean anything. JSON is a text format, not a data type, and what json.loads() does is transform JSON text into Python objects (dicts, lists, etc.) according to the JSON syntax and whichever Python type makes sense (JSON object -> dict, JSON array -> list, etc.). You can check this yourself in your Python shell:
>>> import json
>>> jsonstr = '{"foo":"bar", "baaz":[1, 2, 3]}'
>>> json_data = json.loads(jsonstr)
>>> json_data
{'foo': 'bar', 'baaz': [1, 2, 3]}
>>> type(json_data)
<class 'dict'>
In other words, if you already have the correct Python dict, you have nothing else to do.
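As a rough sketch of how that applies to the question's code, assuming the training data arrives as a JSON string rather than a file (json_text is a made-up variable holding the example from the question), json.loads takes the place of the open/json.load pair and the rest stays the same:

```python
import json

# Hypothetical input: the question's example as JSON text instead of a file.
json_text = '[{"content": "what is the price of polo?", "entities": [[21, 25, "PrdName"]]}]'

# json.loads parses text; json.load (no "s") would expect a file object.
train = json.loads(json_text)

TRAIN_DATA = []
for data in train:
    ents = [tuple(entity) for entity in data['entities']]
    TRAIN_DATA.append((data['content'], {'entities': ents}))

print(TRAIN_DATA)
```

If the data is already a Python list of dicts, the json.loads line can be dropped and the loop can run over that list directly.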
url_a = """http://some.url/"""
url_b = """http://some.url/'{}'/target"""
a= requests.get(url_a)
a_data = a.json()
a_id = [i['id'] for i in a_data]
b= requests.get(url_b.format(a_id[0]))
b_data = b.json()
print(b_data)
{u'message': u"Unrecognized REST Request: GET/aps/2/resources/'%5C73d49684-dc10-4d6a-ae56-eb3816cd7064'%5C/subscriptions", u'error': u'APS::Util::Exception'}
type(a_data)
<type 'list'>
URL A has some data fetched in json format, that is represented as a list of dictionaries. I need to feed that value for key 'id' into URL B but I can't do it. It's sending it as http://some.url/'12345'/target with quotes. If I escape the quotes it is still sending literal escapes to the API controller.
If I don't use quotes it returns an empty result.
A valid result is there if it's passed as /aps/2/resources/12345/subscriptions however I can't figure out how to represent it in python.
Appreciate some assistance. Thank you.
>>> s_data = s.json()
>>> print(s_data)
[]
>>> f = i_ids[0]
>>> print(f)
73d49684-dc10-4d6a-ae56-eb3816cd7064
>>> f = i_ids[1]
>>> print(f)
89c20244-331a-48c4-afea-3e23e72af768
>>> s = requests.get(s_url.format(i_ids[1]), verify=False)
>>> s_data = s.json()
>>> print(s_data)
>>> len(s_data)
143
>>>
Sorry, everyone, but yes: I just had to remove the single quotes around {}.
Thanks
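For anyone landing here with the same symptom, a small sketch of the difference. The URL is the placeholder from the question and resource_id is an illustrative name; str.format inserts the value as-is, so any quotes in the template become part of the URL.

```python
resource_id = "73d49684-dc10-4d6a-ae56-eb3816cd7064"

# Quotes inside the template are literal characters and end up in the path.
quoted = "http://some.url/'{}'/target".format(resource_id)

# Without them, only the id itself is substituted.
plain = "http://some.url/{}/target".format(resource_id)

print(quoted)
print(plain)
```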
I am trying to convert:
datalist = [u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/3/_/3_13.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/3/_/3_13.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/5/_/5_3_1.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/5/_/5_3_1.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/1/_/1_22.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/1/_/1_22.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/4/_/4_7_1.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/4/_/4_7_1.jpg'}"]
into a list of Python dicts. If I try to extract a value by key, I get this error:
for i in datalist:
    print i['smallimage']
....:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-686ea4feba66> in <module>()
      1 for i in datalist:
----> 2     print i['smallimage']
      3
TypeError: string indices must be integers
How do I convert a list containing unicode dict strings into dicts?
You could use the demjson module which has a non-strict mode that handles the data you have:
import demjson
for data in datalist:
    dct = demjson.decode(data)
    print dct['gallery'] # etc...
In this case, I'd hand-craft a regular expression to make these into something you can evaluate as Python:
import re
import ast
from functools import partial
keys = re.compile(r'(gallery|smallimage|largeimage)')
fix_keys = partial(keys.sub, r'"\1"')
for entry in datalist:
    entry = ast.literal_eval(fix_keys(entry))
Yes, this is limited; but it works for this set and is robust as long as the keys match. The regular expression is simple to maintain. Moreover, this doesn't use any external dependencies, it's all based on batteries already included.
Result:
>>> for entry in datalist:
...     print ast.literal_eval(fix_keys(entry))
...
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/3/_/3_13.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/3/_/3_13.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/5/_/5_3_1.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/5/_/5_3_1.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/1/_/1_22.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/1/_/1_22.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/4/_/4_7_1.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/4/_/4_7_1.jpg'}
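If the key names are not known up front, a variation on the same idea is to quote any identifier that sits in key position, that is, right after "{" or "," and right before ":". This is only a sketch under that assumption, written for Python 3, and it would misfire on values that themselves contain a comma followed by a word and a colon. The function name quote_keys and the shortened example.com URLs are placeholders, not part of the original data.

```python
import ast
import re

# An identifier directly after "{" or "," and directly before ":" is treated as a key.
KEY_PATTERN = re.compile(r'([{,]\s*)([A-Za-z_]\w*)\s*:')

def quote_keys(text):
    """Wrap bare dictionary keys in double quotes."""
    return KEY_PATTERN.sub(r'\1"\2":', text)

# Placeholder entry with the same shape as the strings in datalist.
entry = ("{gallery: 'gal1', smallimage: 'http://example.com/small_image/2_12.jpg',"
         "largeimage: 'http://example.com/image/2_12.jpg'}")

# Once the keys are quoted, the string is a plain Python dict literal.
parsed = ast.literal_eval(quote_keys(entry))
print(parsed['smallimage'])
```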
Just as another thought, your list is properly formatted Yaml.
> yaml.load(u'{foo: "bar"}')['foo']
'bar'
And if you want to be really fancy and parse everything at once:
> data = yaml.load('['+','.join(datalist)+']')
> data[0]['smallimage']
'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'
> data[3]['gallery']
'gal1'
If your dictionary keys were quoted (and the strings used double quotes, as JSON requires), you could use json.loads to load the string.
import json
for i in datalist:
    print json.loads(i)['smallimage']
(ast.literal_eval would have worked too...)
However, as it is, this will work with an old-school eval:
>>> class Mdict(dict):
...     def __missing__(self,key):
...         return key
...
>>> eval(datalist[0],Mdict(__builtins__=None))
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'}
Note that this is probably vulnerable to injection attacks, so only use it if the string is from a trusted source.
Finally, for anyone wanting a short, although somewhat dense solution that uses only the standard library and isn't vulnerable to injection attacks... This little gem does the trick (assuming the dictionary keys are valid identifiers)!
import ast
class RewriteName(ast.NodeTransformer):
    def visit_Name(self,node):
        return ast.Str(s=node.id)
transformer = RewriteName()
for x in datalist:
    tree = ast.parse(x,mode='eval')
    transformer.visit(tree)
    print ast.literal_eval(tree)['smallimage']
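The snippet above targets Python 2; ast.Str has since been deprecated and was removed in Python 3.12. Here is a sketch of the same idea for Python 3.8 or newer, using ast.Constant instead. The class and function names are mine, and the sample string is a shortened stand-in for one datalist entry.

```python
import ast

class NamesToStrings(ast.NodeTransformer):
    """Replace every bare name (e.g. gallery) with a string constant."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Constant(value=node.id), node)

def parse_loose_dict(text):
    """Parse a dict literal whose keys are unquoted identifiers."""
    tree = ast.parse(text, mode='eval')
    tree = ast.fix_missing_locations(NamesToStrings().visit(tree))
    # literal_eval only accepts literals, so no code from the string is executed.
    return ast.literal_eval(tree)

# Placeholder entry; the URL is not from the original data.
sample = "{gallery: 'gal1', smallimage: 'http://example.com/2_12.jpg'}"
print(parse_loose_dict(sample)['smallimage'])
```

One side effect worth knowing about: a bare name used as a value would also be turned into a string, which may or may not be what you want.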
Your datalist is a list of unicode strings.
You could use eval, except your keys are not properly quoted. What you can do is requote your keys on the fly with replace:
for i in datalist:
    my_dict = eval(i.replace("gallery", "'gallery'").replace("smallimage", "'smallimage'").replace("largeimage", "'largeimage'"))
    print my_dict["smallimage"]
I don't see the need for all the extra things such as re or json...
fdict = {str(k): v for (k, v) in udict.items()}
Where udict is the dict that has unicode keys. Simply convert them to str. In your given data, you can simply...
datalist = [dict((str(k), v) for (k, v) in i.items()) for i in datalist]
Simple test:
>>> datalist = [{u'a':1,u'b':2},{u'a':1,u'b':2}]
[{u'a': 1, u'b': 2}, {u'a': 1, u'b': 2}]
>>> datalist = [dict((str(k), v) for (k, v) in i.items()) for i in datalist]
>>> datalist
[{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]
No import re or import json. Simple and quick.
I have
>>> import yaml
>>> yaml.dump(u'abc')
"!!python/unicode 'abc'\n"
But I want
>>> import yaml
>>> yaml.dump(u'abc', magic='something')
'abc\n'
What magic param forces no tagging?
You can use safe_dump instead of dump. Just keep in mind that it won't be able to represent arbitrary Python objects then. Also, when you load the YAML, you will get a str object instead of unicode.
How about this:
def unicode_representer(dumper, uni):
    node = yaml.ScalarNode(tag=u'tag:yaml.org,2002:str', value=uni)
    return node
yaml.add_representer(unicode, unicode_representer)
This seems to make dumping unicode objects work the same as dumping str objects for me (Python 2.6).
In [72]: yaml.dump(u'abc')
Out[72]: 'abc\n...\n'
In [73]: yaml.dump('abc')
Out[73]: 'abc\n...\n'
In [75]: yaml.dump(['abc'])
Out[75]: '[abc]\n'
In [76]: yaml.dump([u'abc'])
Out[76]: '[abc]\n'
You need a new dumper class that does everything the standard Dumper class does but overrides the representers for str and unicode.
from yaml.dumper import Dumper
from yaml.representer import SafeRepresenter
class KludgeDumper(Dumper):
    pass
KludgeDumper.add_representer(str,
    SafeRepresenter.represent_str)
KludgeDumper.add_representer(unicode,
    SafeRepresenter.represent_unicode)
Which leads to
>>> print yaml.dump([u'abc',u'abc\xe7'],Dumper=KludgeDumper)
[abc, "abc\xE7"]
>>> print yaml.dump([u'abc',u'abc\xe7'],Dumper=KludgeDumper,encoding=None)
[abc, "abc\xE7"]
Granted, I'm still stumped on how to keep this pretty.
>>> print u'abc\xe7'
abcç
And it breaks a later yaml.load()
>>> yy=yaml.load(yaml.dump(['abc','abc\xe7'],Dumper=KludgeDumper,encoding=None))
>>> yy
['abc', 'abc\xe7']
>>> print yy[1]
abc�
>>> print u'abc\xe7'
abcç
A little addition to interjay's excellent answer: you can keep your unicode on a reload if you take care of your file encodings.
# -*- coding: utf-8 -*-
import yaml
import codecs
data = dict(key = u"abcç\U0001F511")
fn = "test2.yaml"
with codecs.open(fn, "w", encoding="utf-8") as fo:
    yaml.safe_dump(data, fo)
with codecs.open(fn, encoding="utf-8") as fi:
    data2 = yaml.safe_load(fi)
print ("data2:", data2, "type(data.key):", type(data2.get("key")) )
print data2.get("key")
test2.yaml contents in my editor:
{key: "abc\xE7\uD83D\uDD11"}
print outputs:
('data2:', {'key': u'abc\xe7\U0001f511'}, 'type(data.key):', <type 'unicode'>)
abcç🔑
Plus, after reading http://nedbatchelder.com/blog/201302/war_is_peace.html I am pretty sure that safe_load/safe_dump is where I want to be anyway.
I've just started with Python and YAML, but this may also help. Just compare the outputs:
def test_dump(self):
    print yaml.dump([{'name': 'value'}, {'name2': 1}], explicit_start=True)
    print yaml.dump_all([{'name': 'value'}, {'name2': 1}])