I'm not sure what I am doing wrong. I have a dictionary that I want to convert to JSON. My problem is with the escape \
How do I put a dictionary into JSON without the escape \
Here is my code:
def printJSON(dump):
print(json.dumps(dump, indent=4, sort_keys=True))
data = {'number':7, 'second_number':44}
json_data = json.dumps(data)
printJSON(json_data)
The output is:
"{\"second_number\": 44, \"number\": 7}"
I want the output to look like this:
"{"second_number": 44, "number": 7}"
The reason is because you are dumping your JSON data twice. Once outside the function and another inside it. For reference:
>>> import json
>>> data = {'number':7, 'second_number':44}
# JSON dumped once, without `\`
>>> json.dumps(data)
'{"second_number": 44, "number": 7}'
# JSON dumped twice, with `\`
>>> json.dumps(json.dumps(data))
'"{\\"second_number\\": 44, \\"number\\": 7}"'
If you print the data dumped twice, you will see what you are getting currently, i.e:
>>> print json.dumps(json.dumps(data))
"{\"second_number\": 44, \"number\": 7}"
I had a slightly different problem that resulted in the same issue. My code had this:
requests.post('https://example.com/data', data=clinicListBody).text
When it should have had this
requests.post('https://example.com/data', data=clinicListBody).json()
.text was returning a string with strings inside it, which is why I was seeing escaped json in the saved file.
Related
I have a JSON file and the inside of it looks like this:
"{'reviewId': 'gp:AOqpTOGiJUWB2pk4jpWSuvqeXofM9B4LQQ4Iom1mNeGzvweEriNTiMdmHsxAJ0jaJiK7CbjJ_s7YEWKE2DA_Qzo', 'userName': '\u00c0ine Mongey', 'userImage': 'https://play-lh.googleusercontent.com/a-/AOh14GhUv3c6xHP4kvLSJLaRaydi6o2qxp6yZhaLeL8QmQ', 'content': \"Honestly a great game, it does take a while to get money at first, and they do make it easier to get money by watching ads. I'm glad they don't push it though, and the game is super relaxing and fun!\", 'score': 5, 'thumbsUpCount': 2, 'reviewCreatedVersion': '1.33.0', 'at': datetime.datetime(2021, 4, 23, 8, 20, 34), 'replyContent': None, 'repliedAt': None}"
I am trying to convert this into a dict and then to a pandas DataFrame. I tried this but it will just turn this into a string representation of a dict, not a dict itself:
with open('sampledict.json') as f:
dictdump = json.loads(f.read())
print(type(dictdump))
I feel like I am so close now but I can't find out what I miss to get a dict out of this. Any input will be greatly appreciated!
If I get your data format correctly, this will work:
with open('sampledict.json') as f:
d = json.load(f)
d = eval(d)
# Or this works as well
d = json.loads(f.read())
d = eval(d)
>>> d.keys()
['userName', 'userImage', 'repliedAt', 'score', 'reviewCreatedVersion', 'at', 'replyContent', 'content', 'reviewId', 'thumbsUpCount']
Are you sure that you have your source JSON correct? The JSON snippet you have provided is a string; it has a " at the start and end. So in its current form getting a string is correct behaviour.
Note also that it is a string representation of a Python dict rather than a JSON object. This is evidenced by the fact that the strings are denoted by single quotes rather than double, and the use of the Python keyword None rather than the JSON null.
If the JSON file were a representative of a plain object then the content would be something of the form:
{
"reviewId": "gp"AO...",
"userName": "...",
"replyContent": null,
"repliedAt": null
}
I.e. the first and last characters are curly braces, not double quotes.
I've a text document that has several thousand jsons strings in the form of: "{...}{...}{...}". This is not a valid json it self but each {...} is.
I currently use the following a regular expression to split them:
fp = open('my_file.txt', 'r')
raw_dataset = (re.sub('}{', '}\n{', fp.read())).split('\n')
Which basically breaks every line where a curly bracket closes and other opens (}{ -> }\n{) so I can split them into different lines.
The problem is that few of them have a tags attribute written as "{tagName1}{tagName2}" which breaks my regular expression.
An example would be:
'{"name":\"Bob Dylan\", "tags":"{Artist}{Singer}"}{"name": "Michael Jackson"}'
Is parsed into
'{"name":"Bob Dylan", "tags":"{Artist}'
'{Singer}"}'
'{"name": "Michael Jackson"}'
instead of
'{"name":"Bob Dylan", "tags":"{Artist}{Singer}"}'
'{"name": "Michael Jackson"}'
What is the proper way of achieve this for further json parsing?
Use the raw_decode method of json.JSONDecoder
>>> import json
>>> d = json.JSONDecoder()
>>> x='{"name":\"Bob Dylan\", "tags":"{Artist}{Singer}"}{"name": "Michael Jackson"}'
>>> d.raw_decode(x)
({'tags': '{Artist}{Singer}', 'name': 'Bob Dylan'}, 47)
>>> x=x[47:]
>>> d.raw_decode(x)
({'name': 'Michael Jackson'}, 27)
raw_decode returns a 2-tuple, the first element being the decoded JSON and the second being the offset in the string of the next byte after the JSON ended.
To loop until the end or until an invalid JSON element is encountered:
>>> while True:
... try:
... j,n = d.raw_decode(x)
... except ValueError:
... break
... print(j)
... x=x[n:]
...
{'name': 'Bob Dylan', 'tags': '{Artist}{Singer}'}
{'name': 'Michael Jackson'}
When the loop breaks, inspection of x will reveal if it has processed the whole string or had encountered a JSON syntax error.
With a very long file of short elements you might read a chunk into a buffer and apply the above loop, concatenating anything that's left over with the next chunk after the loop breaks.
You can use the jq command line utility to transfer your input to json. Let's say you have the following input:
input.txt:
{"name":"Bob Dylan", "tags":"{Artist}{Singer}"}{"name": "Michael Jackson"}
You can use jq -s, which consumes multiple json documents from input and transfers them into a single output array:
jq -s . input.txt
Gives you:
[
{
"name": "Bob Dylan",
"tags": "{Artist}{Singer}"
},
{
"name": "Michael Jackson"
}
]
I've just realized that there are python bindings for libjq. Meaning you
don't need to use the command line, you can use jq directly in python.
https://github.com/mwilliamson/jq.py
However, I've not tried it so far. Let me give it a try :) ...
Update: The above library is nice, but it does not support the slurp mode so far.
you need to make a parser ... I dont think regex can help you for
data = ""
curlies = []
def get_dicts(file_text):
for letter in file_text:
data += letter
if letter == "{":
curlies.append(letter)
elif letter == "}":
curlies.pop() # remove last
if not curlies:
yield json.loads(data)
data = ""
note that this does not actually solve the problem that {name:"bob"} is not valid json ... {"name":"bob"} is
this will also break in the event you have weird unbalanced parenthesis inside of strings ie {"name":"{{}}}"} would break this
really your json is so broken based on your example your best bet is probably to edit it by hand and fix the code that is generating it ... if that is not feasible you may need to write a more complex parser using pylex or some other grammar library (effectively writing your own language parser)
I have the following JSON data.
"[
\"msgType\": \"0\",
\"tid\": \"1\",
\"data\": \"[
{
\\\"EventName\\\": \\\"TExceeded\\\",
\\\"Severity\\\": \\\"warn\\\",
\\\"Subject\\\": \\\"Exceeded\\\",
\\\"Message\\\": \\\"tdetails: {
\\\\\\\"Message\\\\\\\": \\\\\\\"my page tooktoolong(2498ms: AT: 5ms,
BT: 1263ms,
CT: 1230ms),
andexceededthresholdof5ms\\\\\\\",
\\\\\\\"Referrer\\\\\\\": \\\\\\\"undefined\\\\\\\",
\\\\\\\"Session\\\\\\\": \\\\\\\"None\\\\\\\",
\\\\\\\"ResponseTime\\\\\\\": 0,
\\\\\\\"StatusCode\\\\\\\": 0,
\\\\\\\"Links\\\\\\\": 215,
\\\\\\\"Images\\\\\\\": 57,
\\\\\\\"Forms\\\\\\\": 2,
\\\\\\\"Platform\\\\\\\": \\\\\\\"Linuxx86_64\\\\\\\",
\\\\\\\"BrowserAppname\\\\\\\": \\\\\\\"Netscape\\\\\\\",
\\\\\\\"AppCodename\\\\\\\": \\\\\\\"Mozilla\\\\\\\",
\\\\\\\"CPUs\\\\\\\": 8,
\\\\\\\"Language\\\\\\\": \\\\\\\"en-GB\\\\\\\",
\\\\\\\"isEvent\\\\\\\": \\\\\\\"true\\\\\\\",
\\\\\\\"PageLatency\\\\\\\": 2498,
\\\\\\\"Threshold\\\\\\\": 5,
\\\\\\\"AT\\\\\\\": 5,
\\\\\\\"BT\\\\\\\": 1263,
\\\\\\\"CT\\\\\\\": 1230
}\\\",
\\\"EventTimestamp\\\": \\\"1432514783269\\\"
}
]\",
\"Timestamp\": \"1432514783269\",
\"AppName\": \"undefined\",
\"Group\": \"UndefinedGroup\"
]"
I want to make this JSON file into a single level of wrapping.i.e I want to remove the nested structure inside and copy that data over to the top level JSON structure. How can I do this?
If this strucutre is named json_data
I want to be able to access
json_data['Platform']
json_data[BrowserAppname']
json_data['Severity']
json_data['msgType']
Basically some kind of rudimentary normalization.What is the easiest way to do this using python
A generally unsafe but probably okay in this case solution would be:
import json
d = json.loads(json_string.replace('\\', ''))
I'm not sure what happened but this doesn't look like valid JSON.
You have some double-quotes escaped once, some twice, some three times etc.
You have key/value pairs inside of a list-like object []
tdetails is missing a trailing quote
Even if you fix the above you still have your data list quoted as a multi-line string which is invalid.
It appears to be that this "JSON" was constructed by hand, by someone with no knowledge of JSON.
You can try "massaging" the data into JSON with the following:
import re
x = re.sub(r'\\+', '', js_str)
x = re.sub(r'\n', '', js_str)
x = '{' + js_str.strip()[1:-1] + '}'
Which would make the string almost json like, but you still need to fix point #3.
Here is how I dump a file
with open('es_hosts.json', 'w') as fp:
json.dump(','.join(host_list.keys()), fp)
The results is
"a,b,c"
I would like:
a,b,c
Thanks
Before doing a string replace, you might want to strip the quotation marks:
print '"a,b,c"'.strip('"')
Output:
a,b,c
That's closer to what you want to achieve. Even just removing the first and the last character works: '"a,b,c"'[1:-1].
But have you looked into this question?
To remove the quotation marks in the keys only, which may be important if you are parsing it later (presumably with some tolerant parser or maybe you just pipe it directly into node for bizarre reasons), you could try the following regex.
re.sub(r'(?<!: )"(\S*?)"', '\\1', json_string)
One issue is that this regex expects fields to be seperated key: value and it will fail for key:value. You could make it work for the latter with a minor change, but similarly it won't work for variable amounts of whitespace after :
There may be other edge cases but it will work with outputs of json.dumps, however the results will not be parseable by json. Some more tolerant parsers like yaml might be able to read the results.
import re
regex = r'(?<!: )"(\S*?)"'
o = {"noquotes" : 127, "put quotes here" : "and here", "but_not" : "there"}
s = json.dumps(o)
s2 = json.dumps(o, indent=3)
strip_s = re.sub(regex,'\\1',s)
strip_s2 = re.sub(regex,'\\1',s2)
print(strip_s)
print(strip_s2)
assert(json.loads(strip_s) == json.loads(s) == json.loads(strip_s2) == json.loads(s2) == object)
Will raise a ValueError but prints what you want.
Well, that's not valid json, so the json module won't help you to write that data. But you can do this:
import json
with open('es_hosts.json', 'w') as fp:
data = ['a', 'b', 'c']
fp.write(json.dumps(','.join(data)).replace('"', ''))
That's because you asked for json, but since that's not json, this should suffice:
with open('es_hosts.json', 'w') as fp:
data = ['a', 'b', 'c']
fp.write(','.join(data))
Use python's built-in string replace function
with open('es_hosts.json', 'w') as fp:
json.dump(','.join(host_list.keys()).replace('\"',''), fp)
Just use for loop to assign list to string.
import json
with open('json_file') as f:
data = json.loads(f.read())
for value_wo_bracket in data['key_name']:
print(value_wo_bracket)
Note there is difference between json.load and json.loads
How do I parse a json output get the list from data only and then add the output into say google.com/confidetial and the other strings in the list.
so my json out put i will name it "text"
text = {"success":true,"code":200,"data":["Confidential","L1","Secret","Secret123","foobar","maret1","maret2","posted","rontest"],"errs":[],"debugs":[]}.
What I am looking to do is get the list under data only. so far the script i got is giving me the entire json out put.
json.loads(text)
print text
output = urllib.urlopen("http://google.com" % text)
print output.geturl()
print output.read()
jsonobj = json.loads(text)
print jsonobj['data']
Will print the list in the data section of your JSON.
If you want to open each as a link after google.com, you could try this:
def processlinks(text):
output = urllib.urlopen('http://google.com/' % text)
print output.geturl()
print output.read()
map(processlinks, jsonobj['data'])
info = json.loads(text)
json_text = json.dumps(info["data"])
Using json.dumps converts the python data structure gotten from json.loads back to regular json text.
So, you could then use json_text wherever you were using text before and it should only have the selected key, in your case: "data".
Perhaps something like this where result is your JSON data:
from itertools import product
base_domains = ['http://www.google.com', 'http://www.example.com']
result = {"success":True,"code":200,"data":["Confidential","L1","Secret","Secret123","foobar","maret1","maret2","posted","rontest"],"errs":[],"debugs":[]}
for path in product(base_domains, result['data']):
print '/'.join(path) # do whatever
http://www.google.com/Confidential
http://www.google.com/L1
http://www.google.com/Secret
http://www.google.com/Secret123
http://www.google.com/foobar
http://www.google.com/maret1
http://www.google.com/maret2
http://www.google.com/posted
http://www.google.com/rontest
http://www.example.com/Confidential
http://www.example.com/L1
http://www.example.com/Secret
http://www.example.com/Secret123
http://www.example.com/foobar
http://www.example.com/maret1
http://www.example.com/maret2
http://www.example.com/posted
http://www.example.com/rontest