I have this code:
import json
header = """{"fields": ["""
print(header)
with open('fields.json', 'w') as outfile:
    json.dump(header, outfile)
and the output of print is fine:
{"fields": [
but what ends up in fields.json is
"{\"fields\": ["
Why and how can I solve it?
Thanks
There is no issue here. The \ in the file escapes the ". To see this, read the JSON back and print it. Adding to your code:
import json
header = """{"fields": ["""
print(header)
with open('C:\\Users\\tkaghdo\\Documents\\fields.json', 'w') as outfile:
    json.dump(header, outfile)
with open('C:\\Users\\tkaghdo\\Documents\\fields.json') as data_file:
    data = json.load(data_file)
print(data)
You will see that the data is printed as {"fields": [
The JSON you are trying to write is being treated as a string, so " is converted to \". To avoid that, parse the JSON with json.loads() before writing it to the file.
The JSON must be complete, or json.loads() will raise an error.
import json
header = """{"name":"sk"}"""
h = json.loads(header)
print(header)
with open('fields.json', 'w') as outfile:
    json.dump(h, outfile)
First of all, it's not /, it's \.
Second, it's not in your JSON, because you don't really seem to have a JSON here, you have just a string. And json.dump() converts the string into a string escaped for JSON format, replacing all " with \".
By the way, you should not try to write incomplete JSON as a string yourself. It is better to make a dictionary {"fields":[]}, fill it with all the values you want, and then save it to a file with json.dump().
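A minimal sketch of that approach. The field names (`id`, `name`) are hypothetical, and an in-memory buffer stands in for fields.json so the snippet runs anywhere:

```python
import io
import json

# Build the structure as ordinary Python objects first.
document = {"fields": []}
for field_name in ("id", "name"):  # hypothetical field names, for illustration only
    document["fields"].append({"name": field_name})

# json.dump serializes the whole object; no manual quoting is needed.
buffer = io.StringIO()  # stands in for open('fields.json', 'w')
json.dump(document, buffer)
written = buffer.getvalue()
print(written)  # {"fields": [{"name": "id"}, {"name": "name"}]}
```

Because json.dump receives a dictionary rather than a string, the quotes in the output are structural and none of them are escaped.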
Related
I'm new to Python and I have encountered an issue regarding Unicode text content and JSON fields.
My goal is to read some text files that contain Unicode characters, extract their whole content, and put it into JSON fields. However, the JSON fields end up containing the encoding (UTF-8) instead of the original Unicode characters (e.g. the JSON has \u00e8\u0107 instead of èć). How can I put the whole text file content into the JSON field?
Here is my code:
import json
file_1 = open('utf8_1.txt', 'r', encoding='utf-8').read()
file_2 = open('utf8_2.txt', 'r', encoding='utf-8').read()
with open("test.json", "r") as jsonFile:
    data = json.load(jsonFile)
data[0]['field_1'] = file_1
data[0]['field_2'] = file_2
with open("test.json", "w") as jsonFile:
    json.dump(data, jsonFile)
Here are two files that have Unicode characters:
utf8_1.txt:
Kèććia
ivò
utf8_2.txt:
ććiùri
iχa
Here is the test.json: (note: Two fields are set to be empty and need to be updated with the file content)
[
{
"field_1": "",
"field_2": ""
}
]
and here is what I got on test.json from running the code above:
[
{
"field_1": "K\u00e8\u0107\u0107ia\niv\u00f2",
"field_2": "\u0107\u0107i\u00f9ri\ni\u03c7a"
}
]
But my expected output for test.json is something like the following:
[
{
"field_1": "Kèććia ivò",
"field_2": "ććiùri iχa"
}
]
My goal is to put whatever is in utf8_1.txt into field_1 and whatever is in utf8_2.txt into field_2 in test.json, preferably as a string value. I have been stuck on this for a long time. I really appreciate your help!
What you get is valid UTF-8 JSON. It's just written as pure ASCII using escape codes for non-ASCII characters, which as a subset of UTF-8 is also valid UTF-8. Read it back in with json.load and it will be the original string. If you want the actual Unicode characters encoded as UTF-8 instead of escape codes when written to the file, use json.dump with the ensure_ascii=False parameter, and make sure to open the file with encoding='utf8':
with open("test.json", "w", encoding='utf8') as jsonFile:
    json.dump(data, jsonFile, ensure_ascii=False)
This is in the documentation:
json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True,
check_circular=True, allow_nan=True, cls=None, indent=None,
separators=None, default=None, sort_keys=False, **kw)
...
If ensure_ascii is true (the default), the output is guaranteed to
have all incoming non-ASCII characters escaped. If ensure_ascii is
false, these characters will be output as-is.
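As a quick check of the two settings, here is a small sketch that uses json.dumps on the sample text from utf8_1.txt, so no files are needed:

```python
import json

text = "Kèććia\nivò"  # sample content taken from utf8_1.txt in the question

escaped = json.dumps(text)                      # default: ensure_ascii=True
literal = json.dumps(text, ensure_ascii=False)  # keeps non-ASCII characters as-is

print(escaped)
print(literal)

# Both forms decode back to the same Python string.
assert json.loads(escaped) == json.loads(literal) == text
```

Note that the line break is still written as \n in both cases, because a JSON string cannot contain a raw newline. It becomes a real newline again when the file is loaded.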
I am using pandas.DataFrame.to_json to convert a data frame to JSON data.
data = df.to_json(orient="records")
print(data)
This works fine and the output when printing is as expected in the console.
[{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:voltage","bt":1610040655,"u":"V","v":237.3},
{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:power","bt":1610040836,"u":"W","v":512.3},
{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:voltage","bt":1610040840,"u":"V","v":238.4}]
The problem comes when uploading it to an external API, which converts it to a file format, or when writing it to a file locally. The output has a \ added at the beginning and end of each string.
def dataToFile(processedData):
    with open('data.json', 'w') as outfile:
        json.dump(processedData, outfile)
The result is shown in the clip below
[{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:power\",\"bt\":1610024746,\"u\":\"W\",\"v\":40.3},
{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:voltage\",\"bt\":1610024751,\"u\":\"V\",\"v\":238.5},
{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:power\",\"bt\":1610024764,\"u\":\"W\",\"v\":39.7}]
Is there any formatting specifically I should be including/excluding when converting the data to a file format?
Your data variable is a string of json data and not an actual dictionary. You can do a few things:
Use DataFrame.to_json() to write the file, the first argument of to_json() is the file path:
df.to_json('./data.json', orient='records')
Write the json string directly as text:
def write_text(text: str, path: str):
    with open(path, 'w') as file:
        file.write(text)
data = df.to_json(orient="records")
write_text(data, './data.json')
If you want to play around with the dictionary data:
def write_json(data, path, indent=4):
    with open(path, 'w') as file:
        json.dump(data, file, indent=indent)
df_data = df.to_dict(orient='records')
# ...some operations here...
write_json(df_data, './data.json')
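To see why the backslashes appear, the following sketch reproduces the double encoding with the standard library only. The literal `json_text` is a shortened, hypothetical stand-in for the string returned by df.to_json(orient="records"):

```python
import json

# Stand-in for the string returned by df.to_json(orient="records").
json_text = '[{"n":"voltage","u":"V","v":237.3}]'

double_encoded = json.dumps(json_text)  # serializes the *string* a second time
records = json.loads(json_text)         # parse into Python objects first
single_encoded = json.dumps(records)    # serializes the list of dicts once

print(double_encoded)  # every inner quote is escaped with a backslash
print(single_encoded)  # plain JSON, no backslashes
```

Passing a string that already holds JSON to json.dump or json.dumps always produces this escaped form, which is why the options above either write the string as plain text or start from Python objects.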
My JSON file is huge: it contains many dictionaries, with lists nested inside them, and the whole content is enclosed in double quotes. How should I process it? What does deserializing mean, and how do I do it?
Use the json module.
If your JSON is in a file, you can use:
with open("json_data.json", "r") as data:
    print(json.load(data))
OR
with open("json_data.json", "r") as data:
    print(json.loads(data.read()))
If your JSON is in a variable, you can use:
jsonData = '{}'
jsonVal = json.loads(jsonData)
There is a package called json in python, which you can use to serialize and deserialize a dictionary.
To serialize a dictionary (here called my_dict) to a JSON string, use the following:
json_str = json.dumps(my_dict)
To deserialize a JSON file into a dictionary, use the following:
with open("huge_json_file.json", "r") as data:
    json_dict = json.load(data)
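The difference between the two deserializing functions can be sketched as follows: json.loads parses a string, while json.load reads from a file object. The sample data is hypothetical, and io.StringIO stands in for an opened file:

```python
import io
import json

json_text = '{"items": [{"id": 1}, {"id": 2}]}'  # hypothetical sample data

from_string = json.loads(json_text)            # loads: parse a str
from_file = json.load(io.StringIO(json_text))  # load: parse a file-like object

# Both give the same nested dictionaries and lists.
assert from_string == from_file
print(from_string["items"][1]["id"])  # 2

# Serializing goes the other way: Python objects back to a JSON string.
round_trip = json.dumps(from_string)
```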
I got some data from an API with Python, and I'm trying to print it to a file. My understanding was that the indent argument lets you pretty print. Here's my code:
import urllib2, json
APIKEY_VALUE = "APIKEY"
APIKEY = "?hapikey=" + APIKEY_VALUE
HS_API_URL = "http://api.hubapi.com"
def getInfo():
    xulr = "/engagements/v1/engagements/paged"
    url = HS_API_URL + xulr + APIKEY + params
    response = urllib2.urlopen(url).read()
    with open("hubdataJS.json", "w") as outfile:
        json.dump(response, outfile, sort_keys=True, indent=4, ensure_ascii=False)
getInfo()
What I expected hubdataJS.json to look like when I opened it in Sublime text is some JSON with a format like this:
{
a: some data
b: [
some list of data,
more data
]
c: some other data
}
What I got instead was all the data on one line, in quotes (I thought dumps was for outputting as a string), with lots of \s, \rs, and \ns.
Confused about what I'm doing wrong.
In your code, response is a bytestring that contains the data serialized in the JSON format. When you call json.dump, you serialize that string to JSON again. You end up with a JSON file containing a single string, and that string holds the original JSON data: JSON inside JSON.
To solve that, you have to decode (deserialize) the bytestring you got from the internet before re-encoding it to JSON to write it to the file.
response = json.load(urllib2.urlopen(url))
That will convert the serialized data from the web into a real Python object.
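The same idea as a Python 3 sketch. The byte string below is a hypothetical payload standing in for what urlopen(url).read() returns, so no network call is made, and an in-memory buffer stands in for hubdataJS.json:

```python
import io
import json

# Hypothetical payload standing in for urlopen(url).read(); not real API output.
response = b'{"results": [{"id": 1}], "hasMore": false}'

# Deserialize first: bytes -> str -> Python objects.
parsed = json.loads(response.decode("utf-8"))

# Then serialize the objects; indent now applies to the real structure.
buffer = io.StringIO()  # stands in for open("hubdataJS.json", "w")
json.dump(parsed, buffer, sort_keys=True, indent=4)
pretty = buffer.getvalue()
print(pretty)
```

With a parsed object, sort_keys and indent produce the multi-line layout the question expected, instead of one long quoted string.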
I am getting a JSON file with the following format:
// 20170407
// http://info.employeeportal.org
{
"EmployeeDataList": [
{
"EmployeeCode": "200005ABH9",
"Skill": CT70,
"Sales": 0.0,
"LostSales": 1010.4
}
]
}
I need to remove the extra comment lines present in the file.
I tried the following code:
import json
import commentjson
with open('EmployeeDataList.json') as json_data:
    employee_data = json.load(json_data)
    '''employee_data = json.dump(json.load(json_data))'''
    '''employee_data = commentjson.load(json_data)'''
print(employee_data)
I am still not able to remove the comments from the file and get the JSON into the correct format.
I can't figure out where things are going wrong. Any direction is highly appreciated. Thanks in advance.
You're not using commentjson correctly. It has the same interface as the json module:
import commentjson
with open('EmployeeDataList.json', 'r') as handle:
    employee_data = commentjson.load(handle)
print(employee_data)
Although in this case, your comments are simple enough that you probably don't need to install an extra module to remove them:
import json
with open('EmployeeDataList.json', 'r') as handle:
    fixed_json = ''.join(line for line in handle if not line.startswith('//'))
employee_data = json.loads(fixed_json)
print(employee_data)
Note the difference here between the two code snippets is that json.loads is used instead of json.load, since you're parsing a string instead of a file object.
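The filtering step can also be wrapped in a small helper that works on a string. This is only a sketch: the function name is made up, it only drops lines that start with // (after optional indentation), and it does not handle trailing or /* */ comments. The sample also assumes the Skill value is quoted in the real file, because json.loads rejects the bare CT70 shown in the question.

```python
import json

def load_json_without_line_comments(text):
    """Parse JSON after dropping lines whose first non-blank characters are //."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))

# Sample modeled on the question; "CT70" is quoted here (an assumption).
sample = """// 20170407
// http://info.employeeportal.org
{
  "EmployeeDataList": [
    {"EmployeeCode": "200005ABH9", "Skill": "CT70", "Sales": 0.0, "LostSales": 1010.4}
  ]
}"""

employee_data = load_json_without_line_comments(sample)
print(employee_data["EmployeeDataList"][0]["EmployeeCode"])  # 200005ABH9
```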
Try JSON-minify:
JSON-minify minifies blocks of JSON-like content into valid JSON by removing all whitespace and JS-style comments (single-line // and multiline /* .. */).
I usually read the JSON as a normal file, delete the comments and then parse it as a JSON string. It can be done in one line with the following snippet:
with open(path,'r') as f: jsonDict = json.loads('\n'.join(row for row in f if not row.lstrip().startswith("//")))
IMHO it is very convenient because it does not need CommentJSON or any other non-standard library.
Well, that's not valid JSON, so just open it like you would a text document and then delete everything from // to \n.
with open("EmployeeDataList.json", "r") as rf:
    with open("output.json", "w") as wf:
        for line in rf.readlines():
            # skip whole-line comments
            if line[0:2] == "//":
                continue
            wf.write(line)
Your file is parsable using HOCON.
pip install pyhocon
>>> from pyhocon import ConfigFactory
>>> conf = ConfigFactory.parse_file('data.txt')
>>> conf
ConfigTree([('EmployeeDataList',
[ConfigTree([('EmployeeCode', '200005ABH9'),
('Skill', 'CT70'),
('Sales', 0.0),
('LostSales', 1010.4)])])])
If it is the same number of lines every time you can just do:
fh = open('EmployeeDataList.NOTjson', "r")
rawText = fh.read()
fh.close()
# split off the first 3 lines and keep the remainder
json_data = rawText.split("\n", 3)[3]
This way json_data is now the string of text without the first 3 lines.