How to prevent Python json.loads() from decoding characters unnecessarily

I have a file that has:
{
"name": "HOSTNAME_HTTP",
"description": "Custom hostname for http service route. Leave blank for default hostname, e.g.: \u003capplication-name\u003e-\u003cproject\u003e.\u003cdefault-domain-suffix\u003e"
}
when I open the file using:
with open('data.txt', 'r') as file:
    data = file.read()
I pass this to json.loads and the content in data is replaced with:
<application>...</application>
How can I prevent python json.loads from messing with the encoding in the content?

You could use a workaround like this to escape the unicode sequences:
>>> obj = json.loads(data.replace('\\', '\\\\'))
>>> obj
{'name': 'HOSTNAME_HTTP',
'description': 'Custom hostname for http service route. Leave blank for default hostname, e.g.: \\u003capplication-name\\u003e-\\u003cproject\\u003e.\\u003cdefault-domain-suffix\\u003e'}
And then when you're done modifying:
>>> print(json.dumps(obj).replace('\\\\', '\\'))
{"name": "HOSTNAME_HTTP", "description": "Custom hostname for http service route. Leave blank for default hostname, e.g.: \u003capplication-name\u003e-\u003cproject\u003e.\u003cdefault-domain-suffix\u003e"}
If you expect other backslashes in the file, it would be safer to use regular expressions:
import json
import re
# matches a JSON unicode escape such as \u003c
from_pattern = re.compile(r'(\\u[0-9a-fA-F]{4})')
# matches the doubled form (\\u003c) that json.dumps produces for it
to_pattern = re.compile(r'\\(\\u[0-9a-fA-F]{4})')
def from_json_escaped(path):
    with open(path, 'r') as f:
        return json.loads(from_pattern.sub(r'\\\1', f.read()))
def to_json_escaped(path, obj):
    with open(path, 'w') as f:
        f.write(to_pattern.sub(r'\1', json.dumps(obj)))
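A minimal in-memory sketch of the same regular-expression round trip, which may make it easier to check the behaviour without touching a file. The helper names `loads_keep_escapes` and `dumps_keep_escapes` are hypothetical, and the sketch assumes the input never contains an already escaped backslash directly in front of a `u` plus four hex digits, since the pattern cannot tell that case apart.

```python
import json
import re

FROM_PATTERN = re.compile(r'(\\u[0-9a-fA-F]{4})')
TO_PATTERN = re.compile(r'\\(\\u[0-9a-fA-F]{4})')

def loads_keep_escapes(text):
    # Double the backslash so json.loads keeps \uXXXX as literal text.
    return json.loads(FROM_PATTERN.sub(r'\\\1', text))

def dumps_keep_escapes(obj):
    # Undo the doubling that json.dumps applies to the literal backslash.
    return TO_PATTERN.sub(r'\1', json.dumps(obj))

# Raw string: the backslashes below are literal characters, as in the file.
raw = r'{"name": "HOSTNAME_HTTP", "description": "e.g.: \u003capplication-name\u003e"}'

obj = loads_keep_escapes(raw)
obj["name"] = "HOSTNAME_HTTPS"  # modify something unrelated
out = dumps_keep_escapes(obj)
print(out)
```

The escape sequences in `out` are the same characters as in `raw`, and any standard JSON parser still decodes them to `<` and `>`.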

I found a solution:
import json
with open('data.txt', 'r') as file:
    data = file.read()
data_without_dump = '{"data":\"' + data + '\"}'
datum_dump = json.dumps(data)
datum = '{"data": ' + datum_dump + '}'
datum_load = json.loads(datum)
datum_load_without_dump = json.loads(data_without_dump)
print(datum_dump)
print(datum)
print(datum_load["data"])
print(datum_load_without_dump["data"])
print(type(datum_dump), type(datum), type(datum_load))
Output:
"\\u003capplication\\u003e.....\\u003c/application\\u003e"
{"data": "\\u003capplication\\u003e.....\\u003c/application\\u003e"}
\u003capplication\u003e.....\u003c/application\u003e
<application>.....</application>
<class 'str'> <class 'str'> <class 'dict'>
My reasoning:
json.loads : Deserialize a str or unicode instance containing a JSON document to a Python object.
json.dumps : Serialize obj to a JSON formatted str.
So chaining them (json.dumps first, then json.loads) gives the desired result.
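For context on why a workaround is needed at all, here is a small sketch showing that the escaped and the literal spelling are the same value as far as JSON is concerned. The sample strings are made up for illustration; the point is that `json.loads` decodes `\u003c` to `<`, and `json.dumps` has no reason to escape it again because `<` is plain ASCII.

```python
import json

escaped = r'{"description": "\u003cproject\u003e"}'  # escapes as stored in the file
plain = '{"description": "<project>"}'               # same value, written literally

# Both documents parse to the same Python object.
assert json.loads(escaped) == json.loads(plain)

round_tripped = json.dumps(json.loads(escaped))
print(round_tripped)  # {"description": "<project>"}
```

So the decoding is not a corruption of the data. Preserving the exact spelling only matters when a byte-for-byte identical file is required, for example to keep diffs small.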

Related

Handling a Txt file content as a String Variable

I have a JSON file, and I need to read all of its content as string data. How can I read all the data and store the whole content in a string variable? The JSON file contains blanks, newlines, special characters, etc., in case that matters.
Thanks for your help!
import json
from ast import literal_eval
with open('<path_to_json_data>/json_data.txt') as f:
    json_data = json.load(f)  # dict object
print(json_data, type(json_data))
json_data_as_str = str(json_data)  # dict --> str object
print(json_data_as_str, type(json_data_as_str))
data = literal_eval(json_data_as_str)  # str --> dict object again
print(data, type(data))
Hope it helps
It is as simple as this example:
import json
with open("path/to/json/filename.json", "r") as json_file:
    data = json.load(json_file)
print(data)
dataStr = json.dumps(data)
print(dataStr)
Use json.loads:
import json
with open(file_name, "r") as fp:
    as_string = str(json.loads(fp.read()))
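If the goal is only to hold the file content in a string, parsing is optional: `f.read()` already returns a `str` with blanks and newlines intact. The sketch below reads the text as is and uses `json.loads` only as a validity check. The function name `read_json_text` and the sample file are assumptions made for illustration.

```python
import json
import os
import tempfile

def read_json_text(path):
    """Return the file's content as a str, after checking that it parses as JSON."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    json.loads(text)  # raises json.JSONDecodeError if the content is not valid JSON
    return text

# Usage with a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write('{\n  "name": "demo",\n  "tags": ["a", "b"]\n}\n')
    as_string = read_json_text(sample_path)

print(type(as_string))
print(as_string)
```

Unlike `str(json.load(f))`, which produces a Python repr with single quotes, the string returned here is still valid JSON.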

Why Python JSON output always added with escape string?

the code
import json
jsonData = {
    'url' : 'test/file.jpg'
}
arrayData = json.dumps(jsonData)
# parse the JSON string back into a Python dict
resp = json.loads(arrayData)
resp['url'] = resp['url'].replace(r'/',r'\/')
print('without JSON output:',resp['url'])
jsonDataNew = [
    {'url' : str(resp['url'])}
]
json_data = json.dumps(jsonDataNew)
json_data = json.loads(json_data)
print('with JSON output')
print(json_data)
the output
without JSON output: test\/file.jpg
with JSON output
[{'url': 'test\\/file.jpg'}]
the desired output would be
with JSON output
[{'url': 'test\/file.jpg'}]
No matter what I try, I'm not able to remove the escape in the JSON output, even when it is written to a JSON file. Is this unavoidable in Python?
This is because printing a list shows the repr of its strings, and repr uses a backslash to escape a backslash. The double backslash you see is actually a single character in the string.
For an example,
>>> a = 'test\\'
>>> a
'test\\'
>>> print(a)
test\
You can read more about this under repr.
On the other hand, if what you want is to escape the forward slashes, perhaps the easier and cleaner solution is to use ujson.
import ujson
jsonData = {
    'url' : 'test/file.jpg'
}
json_data = ujson.dumps(jsonData, escape_forward_slashes=True)
print(json_data)
The output will be :
{"url":"test\/file.jpg"}
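If adding a dependency is not an option, a similar result can probably be had with the standard library alone. This is a sketch rather than an equivalent of ujson: it relies on the fact that in `json.dumps` output a `/` can only occur inside a string, and that `\/` is a legal JSON escape for `/`. The helper name `dumps_escaping_slashes` is hypothetical.

```python
import json

def dumps_escaping_slashes(obj):
    # "/" only ever appears inside keys or string values in the serialized
    # text, so a plain replace cannot damage the JSON structure.
    return json.dumps(obj).replace("/", "\\/")

json_text = dumps_escaping_slashes({"url": "test/file.jpg"})
print(json_text)  # {"url": "test\/file.jpg"}

# Any JSON parser turns the escape back into a plain slash.
restored = json.loads(json_text)
print(restored["url"])  # test/file.jpg
```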

How can I replace \" with '

I have the following content:
{
"z":"[{\"ItemId\":\"1234\",\"a\":\"1234\",\"b\":\"4567\",\"c\":\"d\"}]"
}
This is a part of the json response I get from a certain API. I need to replace the \"s with 's. Unfortunately, that's where I got stuck!
Most of the answers I found simply replace the \ with "" or " ", so they did not help me. So my questions are the following:
How can I replace the \" with ':
in a file where I copy-pasted the content?
if I receive this as a response to a certain API call?
I tried the following to replace the content in a file but I am clearly only replacing the "s with ':
with open(file, "r") as f:
    content = f.read()
new_content = content.replace("\"", "'")
with open(file, "w") as new_file:
    new_file.write(new_content)
If what you're trying to do is transform each value from a JSON string to a Python repr() string, while keeping the wrapper format as JSON, that might look like:
with open(filename, "r") as old_file:
    old_content = json.load(old_file)
new_content = {k: repr(json.loads(v)) for k, v in old_content.items()}
with open(filename, "w") as new_file:
    json.dump(new_content, new_file)
If your old file contains:
{"z":"[{\"ItemId\":\"1234\",\"a\":\"1234\",\"b\":\"4567\",\"c\":\"d\"}]"}
...the new file will contain:
{"z": "[{'ItemId': '1234', 'a': '1234', 'b': '4567', 'c': 'd'}]"}
Note that in this new file, the inner fields are now in Python format, not JSON format; they can no longer be parsed by JSON parsers. Usually, I would suggest doing something different instead, as in:
with open(filename, "r") as old_file:
    old_content = json.load(old_file)
new_content = {k: json.loads(v) for k, v in old_content.items()}
with open(filename, "w") as new_file:
    json.dump(new_content, new_file)
...which would yield an output file with:
{"z": [{"ItemId": "1234", "a": "1234", "b": "4567", "c": "d"}]}
...which is both easy-to-read and easy to process with standard JSON-centric tools (jq, etc).
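For the second case in the question, where the content arrives as an API response instead of a file, the same idea can be applied to the response text directly. The sketch below assumes the body is available as a string, here a shortened, made-up `response_body`, and that every top-level value is itself a JSON-encoded string.

```python
import json

# Raw string so the \" sequences are literal characters, as they would be on the wire.
response_body = r'{"z":"[{\"ItemId\":\"1234\",\"c\":\"d\"}]"}'

outer = json.loads(response_body)  # values are still JSON-encoded strings
decoded = {key: json.loads(value) for key, value in outer.items()}

print(json.dumps(decoded))  # {"z": [{"ItemId": "1234", "c": "d"}]}
```

Once decoded, the escaped quotes disappear on their own because the inner document is no longer stored as a string.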
Using the json module, you can serialize the data with json.dumps and then parse it again with json.loads:
import json
data = {
    "z": "[{\"ItemId\":\"1234\",\"a\":\"1234\",\"b\":\"4567\",\"c\":\"d\"}]"
}
g = json.dumps(data)
c = json.loads(g)  # parse the serialized string, not the dict
print(c)
print(str(c).replace("\"","'"))
Output:
{'z': '[{"ItemId":"1234","a":"1234","b":"4567","c":"d"}]'}
{'z': '[{'ItemId':'1234','a':'1234','b':'4567','c':'d'}]'}

How to write human-readable data to a JSON file

When I export data from Python to a JSON file, it contains characters like:
{"-": "text", "menu": {"-": "node", "id": 2244676, "prev": "[2/40] \u0d2a\u0d4d\u0d30\u0d2f\u0d4b\u0d1c\u0d15 \u0d15\u0d4d\u0d30\u0d3f\u0d2f
I used
with open('messages.json', 'w') as outfile:
    json.dump(all_messages, outfile, cls=DateTimeEncoder)
in python. How to convert it to normal unicode text?
If you want the output JSON to be human-readable, use UTF-8 encoding and the ensure_ascii=False parameter:
with open('messages.json', 'w', encoding='utf8') as outfile:
    json.dump(all_messages, outfile, cls=DateTimeEncoder, ensure_ascii=False)
If you just want to read the data back in again, json.load will convert it back to Unicode:
with open('messages.json', encoding='utf8') as infile:
    data = json.load(infile)
Examples with simple strings:
>>> s = '[2/40] പ്രയോജക ക്രിയ'
>>> print(json.dumps(s))
"[2/40] \u0d2a\u0d4d\u0d30\u0d2f\u0d4b\u0d1c\u0d15 \u0d15\u0d4d\u0d30\u0d3f\u0d2f"
>>> print(json.dumps(s,ensure_ascii=False))
"[2/40] പ്രയോജക ക്രിയ"
>>> out = json.dumps(s)
>>> out
'"[2/40] \\u0d2a\\u0d4d\\u0d30\\u0d2f\\u0d4b\\u0d1c\\u0d15 \\u0d15\\u0d4d\\u0d30\\u0d3f\\u0d2f"'
>>> json.loads(out)
'[2/40] പ്രയോജക ക്രിയ'
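The question does not show its `DateTimeEncoder`, so the class below is only a guess at what such an encoder might look like, included to show that a custom `cls` and `ensure_ascii=False` can be combined. The sample message is made up.

```python
import json
from datetime import datetime

class DateTimeEncoder(json.JSONEncoder):
    """Hypothetical stand-in for the DateTimeEncoder used in the question."""

    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()  # serialize datetimes as ISO 8601 strings
        return super().default(o)  # let the base class raise TypeError

message = {"date": datetime(2020, 1, 2, 3, 4, 5), "prev": "[2/40] പ്രയോജക ക്രിയ"}

readable = json.dumps(message, cls=DateTimeEncoder, ensure_ascii=False)
escaped = json.dumps(message, cls=DateTimeEncoder)

print(readable)  # Malayalam text stays readable
print(escaped)   # same data with \uXXXX escapes
```

Both strings load back to the same object, so the choice only affects how the file looks to a human reader.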

Writing to a JSON file and updating said file

I have the following code that will write to a JSON file:
import json
def write_data_to_table(word, hash):
    data = {word: hash}
    with open("rainbow_table\\rainbow.json", "a+") as table:
        table.write(json.dumps(data))
What I want to do is open the JSON file, add another line to it, and close it. How can I do this without messing with the file?
As of right now when I run the code I get the following:
write_data_to_table("test1", "0123456789")
write_data_to_table("test2", "00123456789")
write_data_to_table("test3", "000123456789")
#<= {"test1": "0123456789"}{"test2": "00123456789"}{"test3": "000123456789"}
How can I update the file without completely screwing with it?
My expected output would probably be something along the lines of:
{
    "test1": "0123456789",
    "test2": "00123456789",
    "test3": "000123456789"
}
You may read the JSON data with :
parsed_json = json.loads(json_string)
You now manipulate a classic dictionary. You can add data with :
parsed_json.update({'test4': '0000123456789'})
Then you can write data to a file using :
with open('data.txt', 'w') as outfile:
    json.dump(parsed_json, outfile)
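Put together, the read, update, and write steps might look like the sketch below. It reuses the function name from the question but adds a `path` parameter and a fallback to an empty table when the file does not exist yet; both are assumptions for illustration, as is the temporary directory used for the demonstration.

```python
import json
import os
import tempfile

def write_data_to_table(path, word, hash_value):
    """Add one entry to the JSON object stored at path, creating the file if needed."""
    try:
        with open(path, "r", encoding="utf-8") as table_file:
            table = json.load(table_file)
    except FileNotFoundError:
        table = {}  # first call: start with an empty table
    table[word] = hash_value
    # Rewrite the whole file so it always holds one valid JSON object.
    with open(path, "w", encoding="utf-8") as table_file:
        json.dump(table, table_file, indent=4)
    return table

with tempfile.TemporaryDirectory() as tmp:
    table_path = os.path.join(tmp, "rainbow.json")
    write_data_to_table(table_path, "test1", "0123456789")
    write_data_to_table(table_path, "test2", "00123456789")
    with open(table_path, encoding="utf-8") as table_file:
        final_table = json.load(table_file)

print(final_table)
```

Rewriting the file on every call costs more as the table grows, which is the trade-off against the append-only approaches.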
If you are sure the closing "}" is the last byte in the file you can do this:
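import json
with open('test.json', 'w') as f:  # create the file
    json.dump({"foo": "bar"}, f)
# Binary mode is used because Python 3 text files cannot seek relative to
# the end, and append mode ('a+') always writes at the end regardless of seek().
with open('test.json', 'rb+') as f:
    f.seek(-1, 2)  # position on the closing "}"
    f.truncate()  # remove it
    f.write((',\n' + json.dumps({"spam": "bacon"})[1:]).encode('utf-8'))
with open('test.json') as f:
    print(f.read())
Output:
{"foo": "bar",
"spam": "bacon"}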
Since your data is not hierarchical, you should consider a flat format like "TSV".
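A rough sketch of what the TSV suggestion could look like with the standard `csv` module. Appending a row never requires rewriting earlier rows, which is what makes a flat format convenient here. The function names, the file name, and the temporary directory are assumptions for illustration.

```python
import csv
import os
import tempfile

def append_row(path, word, hash_value):
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as table:
        csv.writer(table, delimiter="\t").writerow([word, hash_value])

def read_table(path):
    # Each row is a [word, hash] pair, so the rows convert directly to a dict.
    with open(path, newline="", encoding="utf-8") as table:
        return dict(csv.reader(table, delimiter="\t"))

with tempfile.TemporaryDirectory() as tmp:
    tsv_path = os.path.join(tmp, "rainbow.tsv")
    append_row(tsv_path, "test1", "0123456789")
    append_row(tsv_path, "test2", "00123456789")
    rows = read_table(tsv_path)

print(rows)
```

The values come back as strings, so hashes with leading zeros are preserved.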
