Replace all ' ' in Dictionary with " " [duplicate] - python

I am trying to create a python dictionary which is to be used as a java script var inside a html file for visualization purposes. As a requisite, I am in need of creating the dictionary with all names inside double quotes instead of default single quotes which Python uses. Is there an easy and elegant way to achieve this.
couples = [
['jack', 'ilena'],
['arun', 'maya'],
['hari', 'aradhana'],
['bill', 'samantha']]
pairs = dict(couples)
print pairs
Generated Output:
{'arun': 'maya', 'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana'}
Expected Output:
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
I know, json.dumps(pairs) does the job, but the dictionary as a whole is converted into a string which isn't what I am expecting.
P.S.: Is there an alternate way to do this with using json, since I am dealing with nested dictionaries.

json.dumps() is what you want here, if you use print(json.dumps(pairs)) you will get your expected output:
>>> pairs = {'arun': 'maya', 'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana'}
>>> print(pairs)
{'arun': 'maya', 'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana'}
>>> import json
>>> print(json.dumps(pairs))
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}

You can construct your own version of a dict with special printing using json.dumps():
>>> import json
>>> class mydict(dict):
def __str__(self):
return json.dumps(self)
>>> couples = [['jack', 'ilena'],
['arun', 'maya'],
['hari', 'aradhana'],
['bill', 'samantha']]
>>> pairs = mydict(couples)
>>> print pairs
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
You can also iterate:
>>> for el in pairs:
print el
arun
bill
jack
hari

# do not use this until you understand it
import json
class doubleQuoteDict(dict):
def __str__(self):
return json.dumps(self)
def __repr__(self):
return json.dumps(self)
couples = [
['jack', 'ilena'],
['arun', 'maya'],
['hari', 'aradhana'],
['bill', 'samantha']]
pairs = doubleQuoteDict(couples)
print pairs
Yields:
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}

Here's a basic print version:
>>> print '{%s}' % ', '.join(['"%s": "%s"' % (k, v) for k, v in pairs.items()])
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}

The premise of the question is wrong:
I know, json.dumps(pairs) does the job, but the dictionary
as a whole is converted into a string which isn't what I am expecting.
You should be expecting a conversion to a string. All "print" does is convert an object to a string and send it to standard output.
When Python sees:
print somedict
What it really does is:
sys.stdout.write(somedict.__str__())
sys.stdout.write('\n')
As you can see, the dict is always converted to a string (afterall a string is the only datatype you can send to a file such as stdout).
Controlling the conversion to a string can be done either by defining __str__ for an object (as the other respondents have done) or by calling a pretty printing function such as json.dumps(). Although both ways have the same effect of creating a string to be printed, the latter technique has many advantages (you don't have to create a new object, it recursively applies to nested data, it is standard, it is written in C for speed, and it is already well tested).
The postscript still misses the point:
P.S.: Is there an alternate way to do this with using json, since I am
dealing with nested dictionaries.
Why work so hard to avoid the json module? Pretty much any solution to the problem of printing nested dictionaries with double quotes will re-invent what json.dumps() already does.

The problem that has gotten me multiple times is when loading a json file.
import json
with open('json_test.json', 'r') as f:
data = json.load(f)
print(type(data), data)
json_string = json.dumps(data)
print(json_string)
I accidentally pass data to some function that wants a json string and I get the error that single quote is not valid json. I recheck the input json file and see the double quotes and then scratch my head for a minute.
The problem is that data is a dict not a string, but when Python converts it for you it is NOT valid json.
<class 'dict'> {'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana', 'arun': 'maya'}
{"bill": "samantha", "jack": "ilena", "hari": "aradhana", "arun": "maya"}
If the json is valid and the dict does not need processing before conversion to string, just load as string does the trick.
with open('json_test.json', 'r') as f:
json_string = f.read()
print(json_string)

It's Easy just 2 steps
step1:converting your dict to list
step2:iterate your list and convert as json .
For better understanding check down below snippet
import json
couples = [
['jack', 'ilena'],
['arun', 'maya'],
['hari', 'aradhana'],
['bill', 'samantha']]
pairs = [dict(couples)]#converting your dict to list
print(pairs)
#iterate ur list and convert as json
for x in pairs:
print("\n after converting: \n\t",json.dumps(x))#json like structure

Related

Converting data inside a JSON file into Python dict

I have a JSON file and the inside of it looks like this:
"{'reviewId': 'gp:AOqpTOGiJUWB2pk4jpWSuvqeXofM9B4LQQ4Iom1mNeGzvweEriNTiMdmHsxAJ0jaJiK7CbjJ_s7YEWKE2DA_Qzo', 'userName': '\u00c0ine Mongey', 'userImage': 'https://play-lh.googleusercontent.com/a-/AOh14GhUv3c6xHP4kvLSJLaRaydi6o2qxp6yZhaLeL8QmQ', 'content': \"Honestly a great game, it does take a while to get money at first, and they do make it easier to get money by watching ads. I'm glad they don't push it though, and the game is super relaxing and fun!\", 'score': 5, 'thumbsUpCount': 2, 'reviewCreatedVersion': '1.33.0', 'at': datetime.datetime(2021, 4, 23, 8, 20, 34), 'replyContent': None, 'repliedAt': None}"
I am trying to convert this into a dict and then to a pandas DataFrame. I tried this but it will just turn this into a string representation of a dict, not a dict itself:
with open('sampledict.json') as f:
dictdump = json.loads(f.read())
print(type(dictdump))
I feel like I am so close now but I can't find out what I miss to get a dict out of this. Any input will be greatly appreciated!
If I get your data format correctly, this will work:
with open('sampledict.json') as f:
d = json.load(f)
d = eval(d)
# Or this works as well
d = json.loads(f.read())
d = eval(d)
>>> d.keys()
['userName', 'userImage', 'repliedAt', 'score', 'reviewCreatedVersion', 'at', 'replyContent', 'content', 'reviewId', 'thumbsUpCount']
Are you sure that you have your source JSON correct? The JSON snippet you have provided is a string; it has a " at the start and end. So in its current form getting a string is correct behaviour.
Note also that it is a string representation of a Python dict rather than a JSON object. This is evidenced by the fact that the strings are denoted by single quotes rather than double, and the use of the Python keyword None rather than the JSON null.
If the JSON file were a representative of a plain object then the content would be something of the form:
{
"reviewId": "gp"AO...",
"userName": "...",
"replyContent": null,
"repliedAt": null
}
I.e. the first and last characters are curly braces, not double quotes.

python split string into multiple delimiters and put into dictionary

i have the below string that i am trying to split into a dictionary with specific names.
string1 = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
what I am hoping to obtain is:
>>> print(dict)
{'namy_names': 'specialtime', 'tracks': ['instruction1', 'instruction2', 'instruction3']}
i'm quite new to working with dictionaries, so not too sure how it is supposed to turn out.
I have tried the below code, but it only provides instruction1 instead of the full list of instructions
delimiters = ['&nn', '&tr']
values = re.split('|'.join(delimiters), string1)
values.pop(0) # remove the initial empty string
keys = re.findall('|'.join(delimiters), string1)
output = dict(zip(keys, values))
print(output)
Use url-parsing.
from urllib import parse
url = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
d = parse.parse_qs(parse.urlparse(url).query)
print(d)
Returns:
{'nn': ['specialtime'],
'tr': ['instruction1', 'instruction2', 'instruction3'],
'x': ['klink:apple']}
And from this point, if necessary..., you would simply have to rename and pick your vars. Like this:
d = {
'namy_names':d.get('nn',['Empty'])[0],
'tracks':d.get('tr',[])
}
# {'namy_names': 'specialtime', 'tracks': ['instruction1', 'instruction2', 'instruction3']}
This looks like url-encoded data, so you can/should use urllib.parse.parse_qs:
import urllib.parse
string1 = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
dic = urllib.parse.parse_qs(string1)
dic = {'namy_names': dic['nn'][0],
'tracks': dic['tr']}
# result: {'namy_names': 'specialtime',
# 'tracks': ['instruction1', 'instruction2', 'instruction3']}

Python replace values in unknown structure JSON file

Say that I have a JSON file whose structure is either unknown or may change overtime - I want to replace all values of "REPLACE_ME" with a string of my choice in Python.
Everything I have found assumes I know the structure. For example, I can read the JSON in with json.load and walk through the dictionary to do replacements then write it back. This assumes I know Key names, structure, etc.
How can I replace ALL of a given string value in a JSON file with something else?
This function recursively replaces all strings which equal the value original with the value new.
This function works on the python structure - but of course you can use it on a json file - by using json.load
It doesn't replace keys in the dictionary - just the values.
def nested_replace( structure, original, new ):
if type(structure) == list:
return [nested_replace( item, original, new) for item in structure]
if type(structure) == dict:
return {key : nested_replace(value, original, new)
for key, value in structure.items() }
if structure == original:
return new
else:
return structure
d = [ 'replace', {'key1': 'replace', 'key2': ['replace', 'don\'t replace'] } ]
new_d = nested_replace(d, 'replace', 'now replaced')
print(new_d)
['now replaced', {'key1': 'now replaced', 'key2': ['now replaced', "don't replace"]}]
I think there's no big risk if you want to replace any key or value enclosed with quotes (since quotes are escaped in json unless they are part of a string delimiter).
I would dump the structure, perform a str.replace (with double quotes), and parse again:
import json
d = { 'foo': {'bar' : 'hello'}}
d = json.loads(json.dumps(d).replace('"hello"','"hi"'))
print(d)
result:
{'foo': {'bar': 'hi'}}
I wouldn't risk to replace parts of strings or strings without quotes, because it could change other parts of the file. I can't think of an example where replacing a string without double quotes can change something else.
There are "clean" solutions like adapting from Replace value in JSON file for key which can be nested by n levels but is it worth the effort? Depends on your requirements.
Why not modify the file directly instead of treating it as a JSON?
with open('filepath') as f:
lines = f.readlines()
for line in lines:
line = line.replace('REPLACE_ME', 'whatever')
with open('filepath_new', 'a') as f:
f.write(line)
You could load the JSON file into a dictionary and recurse through that to find the proper values but that's unnecessary muscle flexing.
The best way is to simply treat the file as a string and do the replacements that way.
json_file = 'my_file.json'
with open(json_file) as f:
file_data = f.read()
file_data = file_data.replace('REPLACE_ME', 'new string')
<...>
with open(json_file, 'w') as f:
f.write(file_data)
json_data = json.loads(file_data)
From here the file can be re-written and you can continue to use json_data as a dict.
Well that depends, if you want to place all the strings entitled "REPLACE_ME" with the same string you can use this. The for loop loops through all the keys in the dictionary and then you can use the keys to select each value in the dictionary. If it is equal to your replacement string it will replace it with the string you want.
search_string = "REPLACE_ME"
replacement = "SOME STRING"
test = {"test1":"REPLACE_ME", "test2":"REPLACE_ME", "test3":"REPLACE_ME", "test4":"REPLACE_ME","test5":{"test6":"REPLACE_ME"}}
def replace_nested(test):
for key,value in test.items():
if type(value) is dict:
replace_nested(value)
else:
if value==search_string:
test[key] = replacement
replace_nested(test)
print(test)
To solve this problem in a dynamic way, I have obtained to use the same json file to declare the variables that we want to replace.
Json File :
{
"properties": {
"property_1": "value1",
"property_2": "value2"
},
"json_file_content": {
"key_to_find": "{{property_1}} is my value"
"dict1":{
"key_to_find": "{{property_2}} is my other value"
}
}
Python code (references Replace value in JSON file for key which can be nested by n levels):
import json
def fixup(self, a_dict:dict, k:str, subst_dict:dict) -> dict:
"""
function inspired by another answers linked below
"""
for key in a_dict.keys():
if key == k:
for s_k, s_v in subst_dict.items():
a_dict[key] = a_dict[key].replace("{{"+s_k+"}}",s_v)
elif type(a_dict[key]) is dict:
fixup(a_dict[key], k, subst_dict)
# ...
file_path = "my/file/path"
if path.exists(file_path):
with open(file_path, 'rt') as f:
json_dict = json.load(f)
fixup(json_dict ["json_file_content"],"key_to_find",json_dict ["properties"])
print(json_dict) # json with variables resolved
else:
print("file not found")
Hope it helps

Accessing an value in defaultdict and stripping out url portion of it

I have a very large defaultdict that has a dict within a dict, the inner dict containing html from an email body. I only want to return an http string from within the inner dict. What's the best way to go about extracting that?
Do I need to convert the dict to another data structure before using regex? Is there a better way? I'm still fairly new to Python and appreciate any pointers.
For example, what I'm working with:
defaultdict(<type 'dict'>, {16: {u'SEQ': 16, u'RFC822': u'Delivered-To:
somebody#email.com LOTS MORE HTML until http://the_url_I_want_to_extract.com' }}
One thing I've tried is using re.findall on defaultdict which didn't work:
confirmation_link = re.findall('Click this link to confirm your registration:<br />"
(.*?)"', body)
for conf in confirmation_link:
print conf
Error:
line 177, in findall
return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer
You can only only use the regular expression, once you've iterated over your dictionary for the corresponding value:
import re
d = defaultdict(<type 'dict'>, {16: {u'SEQ': 16, u'RFC822': u'Delivered-To: somebody#email.com LOTS MORE HTML until http://the_url_I_want_to_extract.com' }}
for k, v in d.iteritems():
#v is the dictionary that contains your html string:
str_with_html = v['RFC822']
#this regular expression starts with matching http, and then
#continuing until a white space character is hit.
match = re.search("http[^\s]+", str_with_html)
if match:
print match.group(0)
Output:
http://the_url_I_want_to_extract.com

How to convert Unicode dict to dict

I am trying to convert :
datalist = [u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/3/_/3_13.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/3/_/3_13.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/5/_/5_3_1.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/5/_/5_3_1.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/1/_/1_22.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/1/_/1_22.jpg'}",
u"{gallery: 'gal1', smallimage: 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/4/_/4_7_1.jpg',largeimage: 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/4/_/4_7_1.jpg'}"]
To list containing python dict. If i try to extract value using keyword i got this error:
for i in datalist:
print i['smallimage']
....:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-686ea4feba66> in <module>()
1 for i in datalist:
----> 2 print i['smallimage']
3
TypeError: string indices must be integers
How do i convert list containing Unicode Dict to Dict..
You could use the demjson module which has a non-strict mode that handles the data you have:
import demjson
for data in datalist:
dct = demjson.decode(data)
print dct['gallery'] # etc...
In this case, I'd hand-craft a regular expression to make these into something you can evaluate as Python:
import re
import ast
from functools import partial
keys = re.compile(r'(gallery|smallimage|largeimage)')
fix_keys = partial(keys.sub, r'"\1"')
for entry in datalist:
entry = ast.literal_eval(fix_keys(entry))
Yes, this is limited; but it works for this set and is robust as long as the keys match. The regular expression is simple to maintain. Moreover, this doesn't use any external dependencies, it's all based on batteries already included.
Result:
>>> for entry in datalist:
... print ast.literal_eval(fix_keys(entry))
...
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/3/_/3_13.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/3/_/3_13.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/5/_/5_3_1.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/5/_/5_3_1.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/1/_/1_22.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/1/_/1_22.jpg'}
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/4/_/4_7_1.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/4/_/4_7_1.jpg'}
Just as another thought, your list is properly formatted Yaml.
> yaml.load(u'{foo: "bar"}')['foo']
'bar'
And if you want to be really fancy and parse everything at once:
> data = yaml.load('['+','.join(datalist)+']')
> data[0]['smallimage']
'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'
> data[3]['gallery']
'gal1'
If your dictionary keys were quoted, you could
use json.loads to load the string.
import json
for i in datalist:
print json.loads(i)['smallimage']
(ast.literal_eval would have worked too...)
however, as it is, this will work with an old-school eval:
>>> class Mdict(dict):
... def __missing__(self,key):
... return key
...
>>> eval(datalist[0],Mdict(__builtins__=None))
{'largeimage': 'http://www.styleever.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/2/_/2_12.jpg', 'gallery': 'gal1', 'smallimage': 'http://www.styleever.com/media/catalog/product/cache/1/small_image/445x370/17f82f742ffe127f42dca9de82fb58b1/2/_/2_12.jpg'}
Note that this is probably vulnerable to injection attacks, so only use it if the string is from a trusted source.
Finally, for anyone wanting a short, although somewhat dense solution that uses only the standard library and isn't vulnerable to injection attacks... This little gem does the trick (assuming the dictionary keys are valid identifiers)!
import ast
class RewriteName(ast.NodeTransformer):
def visit_Name(self,node):
return ast.Str(s=node.id)
transformer = RewriteName()
for x in datalist:
tree = ast.parse(x,mode='eval')
transformer.visit(tree)
print ast.literal_eval(tree)['smallimage']
Your datalist is a list of unicode strings.
You could use eval, except your keys are not properly quoted. what you can do is requote your keys on the fly with replace:
for i in datalist:
my_dict = eval(i.replace("gallery", "'gallery'").replace("smallimage", "'smallimage'").replace("largeimage", "'largeimage'"))
print my_dict["smallimage"]
I don't see why the need for all the extra things such as using re or json...
fdict = {str(k): v for (k, v) in udict.items()}
Where udict is the dict that has unicode keys. Simply convert them to str. In your given data, you can simply...
datalist = [dict((str(k), v) for (k, v) in i.items()) for i in datalist]
Simple test:
>>> datalist = [{u'a':1,u'b':2},{u'a':1,u'b':2}]
[{u'a': 1, u'b': 2}, {u'a': 1, u'b': 2}]
>>> datalist = [dict((str(k), v) for (k, v) in i.items()) for i in datalist]
>>> datalist
[{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]
No import re or import json. Simple and quick.

Categories