Python code to create JSON with Marathi language gives unreadable JSON - python

I am trying to create a JSON file using Python code. The file is created successfully with English text, but it does not work properly with Marathi text.
Please check out the code:
import os
import json

jsonFilePath = "E:/file/"
captchaImgLocation = "E:/file/captchaimg/"
path_to_tesseract = r"C:/Program Files/Tesseract-OCR/tesseract.exe"
image_path = r"E:/file/captchaimg/captcha.png"

x = {
    "FName": "प्रवीण",
}

# convert into JSON:
y = json.dumps(x, ensure_ascii=False).encode('utf8')

# the result is a JSON string:
print(y.decode())

completeName = os.path.join(jsonFilePath, "searchResult_Unicode.json")
print(str(completeName))

file1 = open(completeName, "w")
file1.write(str(y))
file1.close()
Output on the console:
{"FName": "प्रवीण"}
The file created inside the folder looks like this:
b'{"FName": "\xe0\xa4\xaa\xe0\xa5\x8d\xe0\xa4\xb0\xe0\xa4\xb5\xe0\xa5\x80\xe0\xa4\xa3"}'
There is no run-time or compile-time error, but the JSON is created in the format shown above.
Please suggest a solution.

Open the file in the encoding you need and then json.dump to it:
import json

data = { "FName": "प्रवीण" }

# Writing human-readable. Note: some text viewers on Windows require UTF-8
# with a BOM to *display* the file correctly. That is not a problem with the
# writing itself, but you can use encoding='utf-8-sig' to hint to those
# programs that the file is UTF-8 if you see that issue. You MUST use a UTF-8
# encoding to read the file back correctly.
with open('out.json', 'w', encoding='utf8') as f:
    json.dump(data, f, ensure_ascii=False)

# Writing non-human-readable for non-ASCII, but others will have few problems
# reading it back into Python because all common encodings are ASCII-compatible.
# This would work with the default encoding as well; being explicit about the
# encoding is simply good practice.
with open('out2.json', 'w', encoding='ascii') as f:
    json.dump(data, f, ensure_ascii=True)  # True is the default anyway

# Reading either file back is the same:
with open('out.json', encoding='utf8') as f:
    data2 = json.load(f)
with open('out2.json', encoding='utf8') as f:  # UTF-8 is ASCII-compatible
    data3 = json.load(f)

# Round-trip test
print(data == data2, data2)
print(data == data3, data3)
Output:
True {'FName': 'प्रवीण'}
True {'FName': 'प्रवीण'}
out.json (UTF-8-encoded):
{"FName": "प्रवीण"}
out2.json (ASCII-encoded):
{"FName": "\u092a\u094d\u0930\u0935\u0940\u0923"}

You have encoded the JSON string, so you must either open the file in binary mode or decode the bytes back to a string before writing to the file, so:
file1 = open(completeName, "wb")
file1.write(y)
or
file1 = open(completeName, "w")
file1.write(y.decode('utf-8'))
Doing
file1 = open(completeName, "w")
file1.write(str(y))
writes the string representation of the bytes to the file, which is always the wrong thing to do.
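To see why, compare the two in a small demonstration:
y = '{"FName": "प्रवीण"}'.encode('utf8')

print(str(y))             # the repr, with the b'' wrapper and \x escapes
print(y.decode('utf-8'))  # {"FName": "प्रवीण"} -- the actual text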

Do you want your JSON to be human-readable? It's often considered bad practice, since a reader of the file has no way to know which encoding to use.
You can write/read your json files with the json module without worrying about encoding:
import json

json_path = "test.json"
x = {"FName": "प्रवीण"}

with open(json_path, "w") as outfile:
    json.dump(x, outfile, indent=4)

with open(json_path, "r") as infile:
    print(json.load(infile))
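For reference, with the default ensure_ascii=True the non-ASCII characters are escaped, and the escaped form round-trips back unchanged:
import json

x = {"FName": "प्रवीण"}

print(json.dumps(x))                   # {"FName": "\u092a\u094d\u0930\u0935\u0940\u0923"}
print(json.loads(json.dumps(x)) == x)  # True -- the escapes round-trip losslessly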

Related

I need help creating a simple Python script that stores an attribute value from a custom JSON file

JSON file looks like this:
{"Clear":"Pass","Email":"noname#email.com","ID":1234}
There are hundreds of json files with different email values, which is why I need a script to run against all files.
I need to extract the value associated with the Email attribute, which is noname#email.com.
I tried using import json but I'm getting a decoder error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Script looks like this:
import json

json_data = json.loads("file.json")
print(json_data["Email"])
Thanks!
According to the docs, json.loads() takes a str, bytes or bytearray as its argument. So if you want to load a JSON file this way, you should pass the content of the file instead of its path.
import json
file = open("file.json", "r") # Opens file.json in read mode
file_data = file.read()
json_data = json.loads(file_data)
file.close() # Remember to close the file after using it
You can also use json.load(), which takes a file object as its argument:
import json
file = open("file.json", "r")
json_data = json.load(file)
file.close()
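The same thing with a with block, which closes the file for you even if an error occurs:
import json

# the with statement closes the file automatically
with open("file.json", "r") as file:
    json_data = json.load(file)

print(json_data["Email"])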
Your script needs to open the file to get a file handle; then we can read the JSON.
This sample contains code that can read the JSON file. To simulate this, it uses a string that is identical to the data coming from the file.
import json

# this is how to read from the real json file:
# file_name = 'email.json'
# with open(file_name, 'r') as f_obj:
#     json_data = json.load(f_obj)

# this is a string identical to the result of reading the json file:
json_data = '{"Clear":"Pass","Email":"noname#email.com","ID":1234}'
json_data = json.loads(json_data)
print(json_data["Email"])
result: noname#email.com
import json

with open("file.json", 'r') as f:
    file_content = f.read()

# convert the json to a python dict
tmp = json.loads(file_content)
email = tmp["Email"]
As already pointed out in previous comments, json.loads() takes the contents of a file rather than a file path.
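Since you have hundreds of files, here is a sketch that applies this to every .json file in one folder (the *.json pattern and the current-directory layout are assumptions; adjust to where your files actually live):
import glob
import json

# assumption: all the JSON files sit in the current directory
emails = []
for path in glob.glob("*.json"):
    with open(path, "r") as f:
        record = json.load(f)
    emails.append(record["Email"])

print(emails)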

Python converts multiple JSON files in a folder directory to CSV

I have a lot of JSON files in a folder and I want to convert them to CSV format.
Should I use import glob? I am a novice; how can I modify my code?
# -*- coding: utf-8 -*-
import csv
import json
import sys
import codecs

def trans(path):
    jsonData = codecs.open('C:/Users/jeri/Desktop/1', '*.json', 'r', 'utf-8')
    # csvfile = open(path+'.csv', 'w')
    # csvfile = open(path+'.csv', 'wb')
    csvfile = open('C:/Users/jeri/Desktop/1.csv', 'w', encoding='utf-8', newline='')
    writer = csv.writer(csvfile, delimiter=',')
    flag = True
    for line in jsonData:
        dic = json.loads(line)
        if flag:
            keys = list(dic.keys())
            print(keys)
            flag = False
        writer.writerow(list(dic.values()))
    jsonData.close()
    csvfile.close()

if __name__ == '__main__':
    path = str(sys.argv[0])
    print(path)
    trans(path)
Yes, using glob would be a good way to iterate through the .json files in your folder! But glob doesn't have anything to do with reading/writing files. After importing glob, you can use it like this:
import glob

for curr_file in glob.glob("*.json"):
    # process each file here
I see that you've used the json module in your code snippet. I'd say the better way to go about it is to use pandas:
df = pd.read_json(curr_file)
I say this because with the pandas library you can simply convert from .json to .csv using:
df.to_csv('file_name.csv')
Combining the three together, it would look like this (note that each CSV is named after its source file; writing everything to one fixed name would overwrite the output on every iteration):
import glob
import pandas as pd

for curr_file in glob.glob("*.json"):
    # read each file into a DataFrame, then write it out as CSV
    df = pd.read_json(curr_file)
    df.to_csv(curr_file.replace('.json', '.csv'))
Also, note that if your JSON has nested objects, it can't be directly converted to CSV; you'll have to flatten the data before the conversion.
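If you do hit nested objects, pandas' json_normalize can flatten one level before writing the CSV; a minimal sketch with a made-up record:
import pandas as pd

# hypothetical nested record: the 'address' object would break a naive to_csv
record = {"id": 1, "address": {"city": "Pune", "zip": "411001"}}

df = pd.json_normalize(record)  # columns become id, address.city, address.zip
df.to_csv("flattened.csv", index=False)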

Update a JSON file with Python and keep the original format

I have a Python script that updates some values in a JSON file. The original file is formatted with spaces and line breaks.
To edit the value I use this code:
import json

status_wifi = "ok"

with open("config_wifi.json", "r") as jsonFile:
    data = json.load(jsonFile)

data['wifi_session']['status'] = status_wifi

with open("config_wifi.json", "w") as jsonFile:
    json.dump(data, jsonFile)
But when the values are updated, my JSON is written back compressed onto a single line.
I want the JSON file to keep its original format with all spaces and line breaks. How could I do that?
Try passing indent=4 to json.dump.
Example:
import json

status_wifi = "ok"

with open("config_wifi.json", "r") as jsonFile:
    data = json.load(jsonFile)

data['wifi_session']['status'] = status_wifi

with open("config_wifi.json", "w") as jsonFile:
    # pass indent to json.dump directly; wrapping the data in json.dumps first
    # would write a quoted string instead of a JSON object
    json.dump(data, jsonFile, indent=4)
indent is the number of spaces used for one level of indentation.
If you set this parameter, the JSON will be pretty-printed.
You can read more about it here.
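A quick illustration of what indent does (the output is shown in the comments):
import json

print(json.dumps({"wifi_session": {"status": "ok"}}, indent=4))
# {
#     "wifi_session": {
#         "status": "ok"
#     }
# }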

Merge txt files in a folder and replacing characters in python

I am unsure how to continue the code: I need to take all the files from a folder and merge them into one file with another text format.
Example:
The Input files are of text format like this:
"{'nr': '3173391045', 'data': '27/12/2017'}"
"{'nr': '2173391295', 'data': '05/01/2017'}"
"{'nr': '5173351035', 'data': '07/03/2017'}"
The Output files must be lines like this:
"3173391045","27/09/2017"
"2173391295","05/01/2017"
"5173351035","07/03/2017"
This is my working code; it handles the merge and takes out the blank lines:
import glob2
import datetime

filenames = glob2.glob("*.txt")

with open(datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f") + ".SAI", 'w') as file:
    for filename in filenames:
        with open(filename, "r") as f:
            file.write(f.read())
I'm trying something with .replace, but it is not working; I get syntax errors or blank files:
filedata = filedata.replace("{", "") for line in filedata
If your input files contained valid JSON, the correct way would be to parse each line as JSON and write it back as CSV. Since the strings are enclosed in single quotes ('), they are rejected by the json module of the Python standard library, so my advice is to use a regex to parse them. The code could become:
import glob2
import datetime
import csv
import re

# the regex to parse the line
rx = re.compile(r".*'nr'\s*:\s*'(\d+)'.*'data'\s*:\s*'([/\d]+)'")

filenames = glob2.glob("*.txt")

with open(datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f") + ".SAI", 'w', newline='') as file:
    wr = csv.writer(file, quoting=csv.QUOTE_ALL)
    for filename in filenames:
        with open(filename, "r") as f:
            for line in f:  # process line by line
                m = rx.match(line)
                if m:  # skip blank or malformed lines
                    wr.writerow(m.groups())
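You can check the pattern against one of the sample lines:
import re

rx = re.compile(r".*'nr'\s*:\s*'(\d+)'.*'data'\s*:\s*'([/\d]+)'")

m = rx.match("\"{'nr': '3173391045', 'data': '27/12/2017'}\"")
print(m.groups())  # ('3173391045', '27/12/2017')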
With a few tweaks, the input data can be coerced into a form suitable for JSON parsing:
from datetime import datetime
import json
import glob2
import csv

with open(datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f") + ".SAI", 'w', newline='') as f_output:
    csv_output = csv.writer(f_output, quoting=csv.QUOTE_ALL)
    for filename in glob2.glob('*.txt'):
        with open(filename) as f_input:
            for row in f_input:
                if not row.strip():
                    continue  # skip blank lines
                row_dict = json.loads(row.strip('"\n').replace("'", '"'))
                csv_output.writerow([row_dict['nr'], row_dict['data']])
Giving you:
"3173391045","27/12/2017"
"2173391295","05/01/2017"
"5173351035","07/03/2017"
Note, in Python 3.x the output file should be opened with newline=''. Without this, extra blank lines can appear in the output file.
Using regex/replace tricks to parse those strings is fragile: you could always stumble on data containing the delimiter, a comma, etc.
In this case, even though json cannot read those lines, ast.literal_eval can. Note that each line is itself wrapped in double quotes, so it has to be evaluated twice: once to unwrap the outer string, once to parse the dict:
import ast
import csv
import glob2

filenames = glob2.glob("*.txt")

with open("output.csv", "w", newline="") as fw:
    cw = csv.writer(fw)
    for filename in filenames:
        with open(filename) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                # first pass unwraps the outer double quotes, second parses the dict
                d = ast.literal_eval(ast.literal_eval(line))
                cw.writerow([d['nr'], d['data']])
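To see the two-step evaluation on one sample line:
import ast

line = "\"{'nr': '3173391045', 'data': '27/12/2017'}\""
s = ast.literal_eval(line)  # first pass removes the outer double quotes -> str
d = ast.literal_eval(s)     # second pass parses the dict literal
print(d['nr'], d['data'])   # 3173391045 27/12/2017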

Saving a file in a directory but also being able to print it whenever I need to

I am running Python 3.x. I have been working on some code that fetches data on currency names around the world from a currency website. The code is as follows:
def _fetch_currencies():
    import urllib.request
    import json
    f = urllib.request.urlopen('http://openexchangerates.org/api/currencies.json')
    charset = f.info().get_param('charset', 'utf8')
    data = f.read()
    decoded = json.loads(data.decode(charset))
    dumps = json.dumps(decoded, indent=4)
    return dumps
I then need to save it as a file locally, but I am having some issues and can't see where.
Here is the code for saving the currencies:
def save_currencies(_fetch_currencies, filename):
    sorted_currencies = sorted(decoded.items())
    with open(filename, 'w') as my_csv:
        csv_writer = csv.writer(my_csv, delimiter=',')
        csv_writer.writerows(sorted_currencies)
They just don't seem to work together, except when I remove the line dumps = json.dumps(decoded, indent=4), but I need that line to be able to print the data as text. How do I get around deleting this line and still be able to save and print? How do I also pick where it saves?
Any help will be great; thank you very much to everyone who answers or reads this.
I may be mistaken, but your "decoded" variable should be declared as global in both functions.
I would actually have _fetch_currencies() return a dictionary, and then pass that dictionary on to save_currencies(currencies_decoded, filename). For example:
import csv
import json
import urllib.request

def _fetch_currencies():
    f = urllib.request.urlopen('http://openexchangerates.org/api/currencies.json')
    charset = f.info().get_param('charset', 'utf8')
    data = f.read()
    decoded = json.loads(data.decode(charset))
    return decoded

def save_currencies(currencies_decoded, filename):
    sorted_currencies = sorted(currencies_decoded.items())
    with open(filename, 'w') as my_csv:
        csv_writer = csv.writer(my_csv, delimiter=',')
        csv_writer.writerows(sorted_currencies)

my_currencies_decoded = _fetch_currencies()
save_currencies(my_currencies_decoded, "filename.csv")
Furthermore, if you would like to save your CSV file to a certain location in your filesystem, you can import os and use the os.path.join() function to build the full path. For example, to save the file under "Documents/Location/Here", you can do:
import os

def save_currencies(currencies_decoded, filename):
    sorted_currencies = sorted(currencies_decoded.items())
    # join the directory parts and the file name into one path
    with open(os.path.join("Documents", "Location", "Here", filename), 'w') as my_csv:
        csv_writer = csv.writer(my_csv, delimiter=',')
        csv_writer.writerows(sorted_currencies)
You can also use a relative path, so that if you're already in the "Documents" directory and you'd like to save a file in "Documents/Location/Here", you can instead just say:
with open(os.path.join("Location", "Here", filename), 'w') as my_csv:
