Reading from excel sheet and writing exact characters to the json file - python

I have an excel sheet and I am reading from it and writing the values read to a json file. But the problem is the characters are not written as they are.
For example:
If the text is "Молба", it is written as "\u041b\u0438\u0447\u043d\u0430" in unicode or something.
Code I am using to write to file is
with open('data.json', 'w') as file:
    str = json.dumps(json_list, indent=4)
    file.write(str)
    file.close()
json_list has list of objects.
Any suggestions to solve this issue would be helpful.

Pass ensure_ascii=False to json.dumps() so that non-ASCII characters are written as they are instead of as \uXXXX escapes.

Considering suggestion from #leotrubach,
json.dumps(json_list, indent=4, ensure_ascii=False).encode('utf8') worked as desired.
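A note for Python 3 readers: there, .encode('utf8') returns bytes, which cannot be written to a file opened in text mode. A minimal sketch of the same idea, assuming Python 3 and using made-up sample data and a temporary path in place of the real json_list and data.json:

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for json_list from the question.
json_list = [{"title": "Молба"}, {"title": "Лична"}]

# Temporary location used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "data.json")

# encoding="utf-8" makes the file encoding explicit instead of relying on
# the platform default; ensure_ascii=False keeps the Cyrillic text unescaped.
with open(path, "w", encoding="utf-8") as f:
    json.dump(json_list, f, indent=4, ensure_ascii=False)

# Read the file back to check what was actually written.
with open(path, encoding="utf-8") as f:
    text = f.read()

print(text)
```

Opening the file with an explicit encoding does the job that .encode('utf8') did in the accepted comment, so no manual encoding step is needed.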

Related

Writing a JSON file from dictionary, correcting the output

So I am working on a conversion file that is taking a dictionary and converting it to a JSON file. Current code looks like:
data = {json_object}
json_string = jsonpickle.encode(data)
with open('/Users/machd/Mac/Documents/VISUAL CODE/CSV_to_JSON/JSON FILES/test.json', 'w') as outfile:
    json.dump(json_string, outfile)
But when I go to open that rendered file, it is adding three \ on the front and back of each string.
ps: sorry if I am using the wrong terminology, I am still new to python and don't know the vocabulary that well yet.
Try this
import json
data = {"k": "v"}
with open('path_to_file.json', 'w') as f:
    json.dump(data, f)
You don't need jsonpickle to encode dict data.
json.dump is a wrapper that first converts the data to a JSON string and then writes that string to your file.
The backslashes appear because jsonpickle has already turned your data into a JSON string. json.dump then encodes that string a second time, and every quote (") inside it is escaped as \".
Just use the following code to write the dict to a JSON file:
with open('/Users/machd/Mac/Documents/VISUAL CODE/CSV_to_JSON/JSON FILES/test.json', 'w') as outfile:
    json.dump(data, outfile)
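The double encoding described above can be reproduced with the json module alone. This is only an illustration: json.dumps is used here in place of jsonpickle.encode, on the assumption that both return a JSON string for a plain dict.

```python
import json

data = {"k": "v"}

once = json.dumps(data)    # the dict encoded as JSON text
twice = json.dumps(once)   # that text encoded again, now as one JSON string

print(once)   # {"k": "v"}
print(twice)  # "{\"k\": \"v\"}"

# Undoing the damage takes two decoding passes.
restored = json.loads(json.loads(twice))
```

The second line of output is what ends up in the file when an already-encoded string is passed to json.dump.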

How to convert a series of JSON strings into one json file?

I am using python and json to construct a json file. I have a string, 'outputString' which consists of multiple lines of dictionaries turned into jsons, in the following format:
{size:1, title:"Hello", space:0}
{size:21, title:"World", space:10}
{size:3, title:"Goodbye", space:20}
I would like to write this string of multiple JSON objects to a single new json file, with each item still on its own line. I have attached the code showing how I got outputString and what I have tried. Right now, the code I have writes the file, but all on one line. I would like the lines to be separated as they are in the string.
for value in outputList:
    newOutputString = json.dumps(value)
    outputString += (newOutputString + "\n")
with open('data.json', 'w') as outfile:
    for item in outputString.splitlines():
        json.dump(item, outfile)
        json.dump("\n", outfile)
PROBLEM: json.dump("\n", outfile) does not write a line break. It writes the JSON encoding of that string (a quote, a backslash, the letter n and a quote), so everything ends up on the same line.
SOLUTION: write the newline with a plain Python string instead of a JSON-encoded one:
with open('data.json', 'a') as outfile:  # appending, so each run adds its lines after what is already in the file
    for item in outputString.splitlines():
        outfile.write(item)  # item is already a JSON string, so write it as-is; json.dump would encode it a second time
        outfile.write("\n")  # a plain Python newline, no need to encode it with json
See comments for explanation.
Please ensure that the file you write to is empty if you want it to contain only these json objects.
Your value rows are not valid JSON, because the property names are not enclosed in double quotes.
This would be proper JSON:
{"size":1, "title":"Hello", "space":0}
Having said that here is a solution to your question with the type of data you provided.
I am assuming your data comes like this:
outputList = ['{size:1, title:"Hello", space:0}',
              '{size:21, title:"World", space:10}',
              '{size:3, title:"Goodbye", space:20}']
so the only thing you need to do is write each value using the file.write() function
Python 3.6 and above:
with open('data.json', 'w') as outfile:
    for value in outputList:
        outfile.write(f"{value}\n")
Python 3.5 and below:
with open('data.json', 'w') as outfile:
    for value in outputList:
        outfile.write(value + "\n")
data.json file will look like this:
{size:1, title:"Hello", space:0}
{size:21, title:"World", space:10}
{size:3, title:"Goodbye", space:20}
Note: As someone already commented, your data.json file will not be a true JSON-formatted file, but it serves the purpose of your question. Enjoy! :)
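If the values are available as Python dicts rather than pre-built strings, each line can be made valid JSON on its own (a layout often called JSON Lines). A rough sketch, assuming Python 3 and that outputList holds dicts; the records and the temporary path below are made up for illustration:

```python
import json
import os
import tempfile

# Hypothetical records, standing in for outputList as a list of dicts.
records = [
    {"size": 1, "title": "Hello", "space": 0},
    {"size": 21, "title": "World", "space": 10},
    {"size": 3, "title": "Goodbye", "space": 20},
]

# Temporary location used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "data.json")

# One json.dumps call per record, followed by a plain Python newline.
with open(path, "w", encoding="utf-8") as outfile:
    for record in records:
        outfile.write(json.dumps(record) + "\n")

# Reading back: every line parses as JSON by itself.
with open(path, encoding="utf-8") as infile:
    loaded = [json.loads(line) for line in infile]
```

The file as a whole is still not a single JSON document, but each line can be parsed independently with json.loads.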

How to write exact wordings of unicode characters into a file?

I want to write "සිවු අවුරුදු පාටමාලාව" with the exact wording into a json file using Python 3.6, but instead \u0dc3\u0dd2\u0dc3\u0dd4\u0db1\u0dca\u0da7 \u0dc3\u0dd2\u0dc0\u0dd4 is written into the json file.
I read an Excel file using xlrd and write the output using open().
import xlrd
import json
wb = xlrd.open_workbook('data.xlsx',encoding_override='utf-8')
sheet = wb.sheet_by_index(0)
with open('data.json', 'w') as outfile:
    data = json.dump(outerdata, outfile, ensure_ascii=True)
If I do this in Python with the escape string you report:
>>> print ("\u0dc3\u0dd2\u0dc3\u0dd4\u0db1\u0dca\u0da7 \u0dc3\u0dd2\u0dc0\u0dd4")
සිසුන්ට සිවු
you will see that the escapes render as the characters you want. These are two different representations of the same data, and both are valid JSON. But you are calling json.dump() with ensure_ascii=True, which tells it to use the representation with escapes: the output is restricted to ASCII, so every non-ASCII character is written as a \uXXXX escape. Change that to ensure_ascii=False.
But because you are now no longer writing pure ascii to your output file data.json, you need to specify an encoding when you open it:
with open("data.json", "w", encoding="utf-8") as outfile:
    data = json.dump(outerdata, outfile, ensure_ascii=False)
This will make your JSON file look the way you want it to look.
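The point that both representations carry the same data can be checked directly. A small sketch, using part of the text from the question as sample input:

```python
import json

text = "සිවු අවුරුදු"

escaped = json.dumps(text, ensure_ascii=True)   # \uXXXX escapes, ASCII only
literal = json.dumps(text, ensure_ascii=False)  # characters written as-is

print(escaped)
print(literal)

# Both forms decode back to the identical Python string.
same = json.loads(escaped) == json.loads(literal) == text
```

Any JSON parser will read either form, so ensure_ascii=False only matters when the file has to be readable by people.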

Raw string for variables in python?

I have seen several similar posts on this but nothing has solved my problem.
I am reading a list of numbers with backslashes and writing them to a .csv. Obviously the backslashes are causing problems.
addr = "6253\342\200\2236387"
with open("output.csv", 'a') as w:
    write = writer(w)
    write.writerow([addr])
I found that using r"6253\342\200\2236387" gave me exactly what I want for the output, but since I am reading my input from a file I can't use a raw string. I tried .encode('string-escape'), but that gave me 6253\xe2\x80\x936387 as output, which is definitely not what I want. unicode-escape gave me an error. Any thoughts?
The r prefix only affects how a string literal in your source code is parsed. If you're reading data from a file, it's already 'raw': backslashes in the file arrive as literal backslash characters. You shouldn't have to do anything special when reading in your data.
Note that if your data is not plain ascii, you may need to decode it or read it in binary. For example, if the data is utf-8, you can open the file like this before reading:
import codecs
f = codecs.open("test", "r", "utf-8")
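To see that text read from a file needs no raw-string treatment, here is a minimal sketch, assuming Python 3 and that the input file contains literal backslashes followed by digits. The temporary input file is created only for the demonstration.

```python
import os
import tempfile

# Temporary input file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "input.txt")

# Write the text exactly as it would appear in the input file:
# literal backslashes followed by digits (the r prefix is needed
# here only because this is a literal in source code).
with open(path, "w", encoding="utf-8") as f:
    f.write(r"6253\342\200\2236387" + "\n")

# Reading it back performs no escape processing at all.
with open(path, encoding="utf-8") as f:
    addr = f.readline().strip()

print(addr)  # 6253\342\200\2236387
```

Escape sequences such as \342 are interpreted only when Python parses a string literal, never when it reads file contents.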
Text file contains...
1234\4567\7890
41\5432\345\6789
Code:
with open('c:/tmp/numbers.csv', 'ab') as w:
    f = open(textfilepath)
    wr = csv.writer(w)
    for line in f:
        line = line.strip()
        wr.writerow([line])
    f.close()
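This produced a csv with each whole line in a single column. Consider using 'ab' rather than 'a' as the file open mode; I was getting extra blank records in my csv when using just 'a'. (This applies to Python 2. On Python 3 the csv module expects a text-mode file opened with newline=''.)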
I created this a while back. It helps you write to a csv file.
def write2csv(fileName, theData):
    # Append one row to <fileName>.csv; the with block closes the file afterwards
    with open(fileName + '.csv', 'a') as theFile:
        wr = csv.writer(theFile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
        wr.writerow(theData)

Writing unicode data in csv

I know this kind of question has been asked many times, but I have not been able to get a csv writer to write Unicode properly (the output shows garbage).
I am trying to use UnicodeWriter as mentioned in the official docs.
ff = open('a.csv', 'w')
writer = UnicodeWriter(ff)
st = unicode('Displaygrößen', 'utf-8') #gives (u'Displaygr\xf6\xdfen', 'utf-8')
writer.writerow([st])
This does not give me any decoding or encoding error. But the word Displaygrößen shows up garbled in the written file, which is not good. Can anyone tell me what I am doing wrong here?
You are writing the file in UTF-8, but nothing in the csv file indicates that.
You should write the UTF-8 byte order mark (BOM) at the beginning of the file. Add this:
import codecs
ff = open('a.csv', 'w')
ff.write(codecs.BOM_UTF8)
After that, the csv file should open correctly in the program that reads it.
Opening the file with codecs.open should fix it.
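The answers above target Python 2, where UnicodeWriter and unicode() exist. On Python 3 a comparable result can be sketched with the built-in "utf-8-sig" codec, which writes the BOM automatically. The temporary path is made up for the demonstration.

```python
import codecs
import csv
import os
import tempfile

# Temporary location used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "a.csv")

# "utf-8-sig" writes the UTF-8 BOM before the data; newline="" is what
# the csv module expects for text-mode files on Python 3.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerow(["Displaygrößen"])

# Inspect the raw bytes to confirm the BOM is there.
with open(path, "rb") as f:
    raw = f.read()

# Reading with the same codec strips the BOM again.
with open(path, encoding="utf-8-sig", newline="") as f:
    rows = list(csv.reader(f))
```

Whether the BOM is needed depends on the program that opens the file; some spreadsheet applications rely on it to detect UTF-8.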
