get JSON var value in python - python

In Python, what is the easiest way to extract a line containing a JavaScript variable definition and get the value assigned to it? (I'm scraping the JavaScript from web pages using BeautifulSoup.) The value is enclosed in curly braces (i.e. {, }) and may itself contain several levels of nested curly braces.
For example, with the input
var myVar = { "a": "123","b":"345", "c": {"c1":20,"c2":"c123", "c3": {"c3_1": {"c3_1_1":"12"}}}, "d":21, "e":["1","2"]}
I would like to get the complete myVar value as a string (as I want to convert this to a Python list after that),
{ "a": "123","b":"345", "c": {"c1":20,"c2":"c123", "c3": {"c3_1": {"c3_1_1":"12"}}}, "d":21, "e":["1","2"]}
Any help would be great as I am new to Python.

Use str.index to find where the JSON object starts, then repair the JSON with re.sub (which turns a: "123" into "a": "123") and str.replace (which changes the single quotes in ['1','2'] to double quotes):
import json
import re
var = '''var myVar = { a: "123",b:"345", c: {c1:20,c2:"c123", c3: {c3_1: {c3_1_1:"12"}}}, d:21, e:['1','2']}'''
v = var[var.index('{'):]
v = re.sub(r'(\w+):', r'"\1":', v)
v = v.replace('\'', '\"')
>>> v
'{ "a": "123","b":"345", "c": {"c1":20,"c2":"c123", "c3": {"c3_1": {"c3_1_1":"12"}}}, "d":21, "e":["1","2"]}'
>>> json.loads(v)
{u'a': u'123', u'c': {u'c3': {u'c3_1': {u'c3_1_1': u'12'}}, u'c2': u'c123', u'c1': 20}, u'b': u'345', u'e': [u'1', u'2'], u'd': 21}
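When the scraped value is already valid JSON, as in the question's input with quoted keys, the regex repair can probably be skipped. A minimal sketch, assuming the object literal directly follows the variable name and contains only JSON syntax; extract_js_object and the sample script string are illustrative names, not part of the answer above:

```python
import json

def extract_js_object(script_text, var_name):
    """Decode the JSON object assigned to var_name (hypothetical helper)."""
    # Locate the declaration, then the first opening brace after it.
    decl = script_text.index("var " + var_name)
    start = script_text.index("{", decl)
    # raw_decode parses one complete JSON value starting at `start` and
    # ignores whatever follows it, so nested braces need no manual counting.
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj

# Sample input modelled on the question, with trailing JavaScript added.
script = ('var myVar = { "a": "123","b":"345", "c": {"c1":20,"c2":"c123", '
          '"c3": {"c3_1": {"c3_1_1":"12"}}}, "d":21, "e":["1","2"]}; var other = 1;')
my_var = extract_js_object(script, "myVar")
print(my_var["c"]["c3"]["c3_1"]["c3_1_1"])  # prints 12
```

This returns a Python dict directly; if the string form is needed, json.dumps(my_var) produces it.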

import json
a = json.dumps(myVar)
The variable a is of the type string in this example. You can manipulate it as you like.

Related

String with backslashes to dictionary

I have the following string:
txt = "{\'legs_a\': 1,\'legs_b\': 0,\'score_a\': 304,\'score_b\': 334,\'turn\': B,\'i\': 2,\'z\': 19}"
When I print it, I see the below output in my console
{'legs_a': 1,'legs_b': 0,'score_a': 304,'score_b': 334,'turn': B,'i': 2,'z': 19}
I want to make a dictionary of the string by using ast.literal_eval()
import ast
d = ast.literal_eval(txt)
This yields the following error:
{ValueError}malformed node or string: <_ast.Name object at
0x7fb8b8ab4fa0>
What is going wrong, and how can I make a dictionary from the string? Thanks.
B is neither a quoted string nor a literal, so it is parsed as a variable name, which ast.literal_eval() refuses to evaluate. Quote it:
import ast
txt = "{\'legs_a\': 1,\'legs_b\': 0,\'score_a\': 304,\'score_b\': 334,\'turn\': \'B\',\'i\': 2,\'z\': 19}"
d = ast.literal_eval(txt)
print(d)
Output:
{'legs_a': 1, 'legs_b': 0, 'score_a': 304, 'score_b': 334, 'turn': 'B', 'i': 2, 'z': 19}
Note: if you wanted to deserialize the string with json.loads() instead, you would first need to replace the single quotes with double quotes:
import json
data = json.loads(txt.replace("'", '"'))
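The quote-swapping shortcut is fragile when a value itself contains an apostrophe. A small sketch of both failure modes, assuming a made-up record (the name O'Brien is not from the question):

```python
import ast
import json

# A bare name such as B is rejected by literal_eval with a ValueError.
try:
    ast.literal_eval("{'turn': B}")
    bare_name_ok = True
except ValueError:
    bare_name_ok = False

# Hypothetical record whose value contains an apostrophe.
txt = "{'name': \"O'Brien\", 'i': 2}"
d = ast.literal_eval(txt)  # copes with mixed quoting

# Blindly swapping quotes turns the apostrophe into a stray double quote.
try:
    json.loads(txt.replace("'", '"'))
    swap_ok = True
except json.JSONDecodeError:
    swap_ok = False

print(bare_name_ok, d, swap_ok)
```

For Python-style literals, ast.literal_eval is therefore usually the safer of the two.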

Replace all ' ' in Dictionary with " " [duplicate]

I am trying to create a Python dictionary which is to be used as a JavaScript var inside an HTML file for visualization purposes. As a requirement, I need to create the dictionary with all names inside double quotes instead of the default single quotes which Python uses. Is there an easy and elegant way to achieve this?
couples = [
['jack', 'ilena'],
['arun', 'maya'],
['hari', 'aradhana'],
['bill', 'samantha']]
pairs = dict(couples)
print(pairs)
Generated Output:
{'arun': 'maya', 'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana'}
Expected Output:
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
I know, json.dumps(pairs) does the job, but the dictionary as a whole is converted into a string which isn't what I am expecting.
P.S.: Is there an alternate way to do this without using json, since I am dealing with nested dictionaries?
json.dumps() is what you want here. If you use print(json.dumps(pairs)) you will get your expected output:
>>> pairs = {'arun': 'maya', 'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana'}
>>> print(pairs)
{'arun': 'maya', 'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana'}
>>> import json
>>> print(json.dumps(pairs))
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
You can construct your own version of a dict with special printing using json.dumps():
>>> import json
>>> class mydict(dict):
...     def __str__(self):
...         return json.dumps(self)
...
>>> couples = [['jack', 'ilena'],
...            ['arun', 'maya'],
...            ['hari', 'aradhana'],
...            ['bill', 'samantha']]
>>> pairs = mydict(couples)
>>> print(pairs)
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
You can also iterate:
>>> for el in pairs:
...     print(el)
...
arun
bill
jack
hari
# do not use this until you understand it
import json
class doubleQuoteDict(dict):
    def __str__(self):
        return json.dumps(self)
    def __repr__(self):
        return json.dumps(self)
couples = [
    ['jack', 'ilena'],
    ['arun', 'maya'],
    ['hari', 'aradhana'],
    ['bill', 'samantha']]
pairs = doubleQuoteDict(couples)
print(pairs)
Yields:
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
Here's a basic print version:
>>> print('{%s}' % ', '.join(['"%s": "%s"' % (k, v) for k, v in pairs.items()]))
{"arun": "maya", "bill": "samantha", "jack": "ilena", "hari": "aradhana"}
The premise of the question is wrong:
I know, json.dumps(pairs) does the job, but the dictionary
as a whole is converted into a string which isn't what I am expecting.
You should be expecting a conversion to a string. All "print" does is convert an object to a string and send it to standard output.
When Python sees:
print(somedict)
What it really does is:
sys.stdout.write(somedict.__str__())
sys.stdout.write('\n')
As you can see, the dict is always converted to a string (after all, a string is the only datatype you can send to a file such as stdout).
Controlling the conversion to a string can be done either by defining __str__ for an object (as the other respondents have done) or by calling a pretty printing function such as json.dumps(). Although both ways have the same effect of creating a string to be printed, the latter technique has many advantages (you don't have to create a new object, it recursively applies to nested data, it is standard, it is written in C for speed, and it is already well tested).
The postscript still misses the point:
P.S.: Is there an alternate way to do this without using json, since I am
dealing with nested dictionaries?
Why work so hard to avoid the json module? Pretty much any solution to the problem of printing nested dictionaries with double quotes will re-invent what json.dumps() already does.
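To illustrate the point about nested data, here is a minimal sketch that serializes a nested dictionary and embeds it as a JavaScript variable. The nested structure, the JavaScript variable name and the surrounding script tag are assumptions made for the example:

```python
import json

# Hypothetical nested data; any JSON-serializable structure works the same way.
couples = {
    'jack': {'partner': 'ilena', 'pets': ['rex']},
    'arun': {'partner': 'maya', 'pets': []},
}

# json.dumps recurses into nested dicts and lists and emits double quotes.
js_literal = json.dumps(couples)

# Embed the literal as a JavaScript variable (the variable name is made up).
script_tag = '<script>var couples = ' + js_literal + ';</script>'
print(script_tag)
```

Note that json.dumps does not escape a literal </script> inside string values, so untrusted data would need extra escaping before being placed in a page.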
The problem that has gotten me multiple times is when loading a json file.
import json
with open('json_test.json', 'r') as f:
    data = json.load(f)
print(type(data), data)
json_string = json.dumps(data)
print(json_string)
I accidentally pass data to some function that wants a JSON string and I get an error saying that a single quote is not valid JSON. I recheck the input JSON file, see the double quotes, and then scratch my head for a minute.
The problem is that data is a dict, not a string, and when Python converts it to a string for you (via str() or repr()) the result uses single quotes and is NOT valid JSON.
<class 'dict'> {'bill': 'samantha', 'jack': 'ilena', 'hari': 'aradhana', 'arun': 'maya'}
{"bill": "samantha", "jack": "ilena", "hari": "aradhana", "arun": "maya"}
If the JSON is valid and the dict does not need processing before conversion to a string, just reading the file as a string does the trick.
with open('json_test.json', 'r') as f:
    json_string = f.read()
print(json_string)
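The difference between the two string forms can be checked directly. A short sketch, using a small in-memory dict rather than the json_test.json file mentioned above:

```python
import json

data = {'bill': 'samantha', 'jack': 'ilena'}

as_repr = str(data)         # Python repr: single quotes
as_json = json.dumps(data)  # JSON text: double quotes

# The repr form is rejected by the JSON parser.
try:
    json.loads(as_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

print(repr_is_json)                 # prints False
print(json.loads(as_json) == data)  # prints True
```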
It's easy, in just two steps:
Step 1: wrap your dict in a list.
Step 2: iterate over the list and convert each item to JSON.
For a better understanding, see the snippet below:
import json
couples = [
['jack', 'ilena'],
['arun', 'maya'],
['hari', 'aradhana'],
['bill', 'samantha']]
pairs = [dict(couples)]  # wrap the dict in a list
print(pairs)
# iterate over the list and convert each item to JSON
for x in pairs:
    print("\n after converting: \n\t", json.dumps(x))  # JSON-like structure

python split string into multiple delimiters and put into dictionary

I have the string below, which I am trying to split into a dictionary with specific names.
string1 = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
What I am hoping to obtain is:
>>> print(dict)
{'namy_names': 'specialtime', 'tracks': ['instruction1', 'instruction2', 'instruction3']}
I'm quite new to working with dictionaries, so I'm not sure how it is supposed to turn out.
I have tried the code below, but it only provides instruction1 instead of the full list of instructions:
delimiters = ['&nn', '&tr']
values = re.split('|'.join(delimiters), string1)
values.pop(0) # remove the initial empty string
keys = re.findall('|'.join(delimiters), string1)
output = dict(zip(keys, values))
print(output)
Use url-parsing.
from urllib import parse
url = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
d = parse.parse_qs(parse.urlparse(url).query)
print(d)
Returns:
{'nn': ['specialtime'],
'tr': ['instruction1', 'instruction2', 'instruction3'],
'x': ['klink:apple']}
From this point, if necessary, you simply rename and pick out the values you need, like this:
d = {
    'namy_names': d.get('nn', ['Empty'])[0],
    'tracks': d.get('tr', [])
}
# {'namy_names': 'specialtime', 'tracks': ['instruction1', 'instruction2', 'instruction3']}
This looks like url-encoded data, so you can/should use urllib.parse.parse_qs:
import urllib.parse
string1 = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
dic = urllib.parse.parse_qs(string1)
dic = {'namy_names': dic['nn'][0],
'tracks': dic['tr']}
# result: {'namy_names': 'specialtime',
# 'tracks': ['instruction1', 'instruction2', 'instruction3']}
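For comparison, the attempt in the question keeps a single value per key because dict(zip(keys, values)) overwrites repeated keys. A hedged sketch of a manual alternative that appends instead of overwriting; the regular expression and the grouped name are assumptions for illustration, and urllib.parse remains the more robust choice:

```python
import re
from collections import defaultdict

string1 = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"

# Each pair is introduced by ? or & and has the form key=value.
pairs = re.findall(r'[?&](\w+)=([^&]*)', string1)

# Collect every value seen for a key in a list.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)  # append instead of overwrite

result = {'namy_names': grouped['nn'][0], 'tracks': grouped['tr']}
print(result)
```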

Python Convert string to dict

I have a string :
'{tomatoes : 5 , livestock :{cow : 5 , sheep :2 }}'
and would like to convert it to
{
"tomatoes" : "5" ,
"livestock" :"{"cow" : "5" , "sheep" :"2" }"
}
Any ideas ?
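This has been settled in question 988251.
In short: use the literal_eval() function from Python's ast module.
import ast
my_string = "{'key':'val','key2':2}"
my_dict = ast.literal_eval(my_string)
Note that this only works when the keys are quoted, as in my_string above; the string in the question has bare keys such as tomatoes, for which literal_eval() raises a ValueError.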
The problem with your input string is that it's not actually valid JSON, because your keys are not declared as strings; otherwise you could just use the json module to load it and be done with it.
A quick and dirty way to get what you want is to first turn it into valid JSON by adding quotation marks around everything that's not whitespace or a syntax character:
source = '{tomatoes : 5 , livestock :{cow : 5 , sheep :2 }}'
output = ""
quoting = False
for char in source:
    if char.isalnum():
        if not quoting:
            output += '"'
            quoting = True
    elif quoting:
        output += '"'
        quoting = False
    output += char
print(output) # {"tomatoes" : "5" , "livestock" :{"cow" : "5" , "sheep" :"2" }}
This gives you a valid JSON so now you can easily parse it to a Python dict using the json module:
import json
parsed = json.loads(output)
# {'livestock': {'sheep': '2', 'cow': '5'}, 'tomatoes': '5'}
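A regular expression can do a similar repair in one step. The sketch below is a variant, not the loop above: it quotes only the bare words followed by a colon (the keys), so numeric values stay numbers rather than becoming strings. It assumes keys consist of word characters only:

```python
import json
import re

source = '{tomatoes : 5 , livestock :{cow : 5 , sheep :2 }}'

# Quote only the bare words that are followed by a colon, i.e. the keys.
quoted_keys = re.sub(r'(\w+)\s*:', r'"\1":', source)

parsed = json.loads(quoted_keys)
print(parsed)  # {'tomatoes': 5, 'livestock': {'cow': 5, 'sheep': 2}}
```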
If what you have is a valid JSON-formatted string, you can convert it to a Python dictionary using the json library:
import json
with open("your file", "r") as f:
    dictionary = json.load(f)
Now dictionary contains the data structure you are looking for.
Here is my answer:
dict_str = '{tomatoes: 5, livestock: {cow: 5, sheep: 2}}'
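def dict_from_str(dict_str):
    while True:
        try:
            # eval() executes arbitrary code; only use it on trusted input
            dict_ = eval(dict_str)
        except NameError as e:
            # The message looks like: name 'tomatoes' is not defined
            key = str(e).split("'")[1]
            dict_str = dict_str.replace(key, "'{}'".format(key))
        else:
            return dict_
print(dict_from_str(dict_str))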
My strategy is to convert the dictionary str to a dict by eval. However, I first have to deal with the fact that your dictionary keys are not enclosed in quotes. I do that by evaluating it anyway and catching the error. From the error message, I extract the key that was interpreted as an unknown variable, and enclose it with quotes.

Parse Key Value Pairs in Python

So I have a key value file that's similar to JSON's format but it's different enough to not be picked up by the Python JSON parser.
Example:
"Matt"
{
"Location" "New York"
"Age" "22"
"Items"
{
"Banana" "2"
"Apple" "5"
"Cat" "1"
}
}
Is there any easy way to parse this text file and store the values into an array such that I could access the data using a format similar to Matt[Items][Banana]? There is only to be one pair per line and a bracket should denote going down a level and going up a level.
You could use re.sub to 'fix up' your string and then parse it. As long as the format is always either a single quoted string or a pair of quoted strings on each line, you can use that to determine where to place commas and colons.
import re
s = """"Matt"
{
"Location" "New York"
"Age" "22"
"Items"
{
"Banana" "2"
"Apple" "5"
"Cat" "1"
}
}"""
# Put a colon after the first string in every line
s1 = re.sub(r'^\s*(".+?")', r'\1:', s, flags=re.MULTILINE)
# add a comma if the last non-whitespace character in a line is " or }
s2 = re.sub(r'(["}])\s*$', r'\1,', s1, flags=re.MULTILINE)
Once you've done that, you can use ast.literal_eval to turn it into a Python dict. I use that over JSON parsing because it allows for trailing commas, without which the decision of where to put commas becomes a lot more complicated:
import ast
data = ast.literal_eval('{' + s2 + '}')
print(data['Matt']['Items']['Banana'])
# 2
I'm not sure how robust this approach is outside of the example you've posted, but it does support escaped characters and deeper levels of structured data. It's probably not going to be fast enough for large amounts of data.
The approach converts your custom data format to JSON using a (very) simple parser that adds the required colons, commas and outer braces; the JSON data can then be converted to a native Python dictionary.
import json
# Define the data that needs to be parsed
data = '''
"Matt"
{
"Location" "New \\"York"
"Age" "22"
"Items"
{
"Banana" "2"
"Apple" "5"
"Cat"
{
"foo" "bar"
}
}
}
'''
# Convert the data from custom format to JSON
json_data = ''
# Define parser states
state = 'OUT'
key_or_value = 'KEY'
for c in data:
    # Handle quote characters
    if c == '"':
        json_data += c
        if state == 'IN':
            state = 'OUT'
            if key_or_value == 'KEY':
                key_or_value = 'VALUE'
                json_data += ':'
            elif key_or_value == 'VALUE':
                key_or_value = 'KEY'
                json_data += ','
        else:
            state = 'IN'
    # Handle braces
    elif c == '{':
        if state == 'OUT':
            key_or_value = 'KEY'
        json_data += c
    elif c == '}':
        # Strip trailing comma and add closing brace and comma
        json_data = json_data.rstrip().rstrip(',') + '},'
    # Handle escaped characters
    elif c == '\\':
        state = 'ESCAPED'
        json_data += c
    else:
        json_data += c
# Strip trailing comma
json_data = json_data.rstrip().rstrip(',')
# Wrap the data in braces to form a dictionary
json_data = '{' + json_data + '}'
# Convert from JSON to the native Python
converted_data = json.loads(json_data)
print(converted_data['Matt']['Items']['Banana'])
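An alternative to rewriting the text into JSON is to tokenize it and build the nested dictionary directly. The following is a rough sketch of a small recursive-descent parser, assuming the input contains only double-quoted strings and braces and is well formed; TOKEN_RE, tokenize and parse_pairs are hypothetical names, not from either answer:

```python
import re

# A token is either a double-quoted string (with backslash escapes) or a brace.
TOKEN_RE = re.compile(r'"((?:[^"\\]|\\.)*)"|([{}])')

def tokenize(text):
    """Return a list of ('str', value) or ('brace', '{' / '}') tuples."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        if match.group(2):
            tokens.append(('brace', match.group(2)))
        else:
            # Undo backslash escapes such as \" inside the string.
            tokens.append(('str', re.sub(r'\\(.)', r'\1', match.group(1))))
    return tokens

def parse_pairs(tokens, pos=0):
    """Parse key/value pairs until a closing brace or the end of input."""
    result = {}
    while pos < len(tokens):
        if tokens[pos] == ('brace', '}'):
            return result, pos + 1
        key = tokens[pos][1]
        if tokens[pos + 1] == ('brace', '{'):
            # The value is a nested block: recurse past the opening brace.
            result[key], pos = parse_pairs(tokens, pos + 2)
        else:
            result[key] = tokens[pos + 1][1]
            pos += 2
    return result, pos

text = '''
"Matt"
{
    "Location" "New \\"York"
    "Age" "22"
    "Items"
    {
        "Banana" "2"
        "Apple" "5"
    }
}
'''
parsed, _ = parse_pairs(tokenize(text))
print(parsed['Matt']['Items']['Banana'])  # prints 2
```

Because strings and braces are tagged separately, a quoted value that happens to contain a brace is not mistaken for structure.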
