Python throwing error in reading JSON file

I am writing a function in a Python script which will read a JSON file and print its contents.
The script reads as:
def main(conn):
    global link, link_ID
    with open('ad_link.json', 'r') as statusFile:
        status = json.loads(statusFile.read())
        statusFile.close()
    print(status)
    link_data = json.load[status]
    link = link_data["link"]
    link_ID = link_data["link_id"]
    print(link)
    print(link_ID)
I am getting this error:
link_data = json.load[status]
TypeError: 'function' object is not subscriptable
What is the issue?
The content of ad_link.json (the file I am receiving is saved in this manner):
"{\"link\": \"https://res.cloudinary.com/dnq9phiao/video/upload/v1534157695/Adidas-Break-Free_nfadrz.mp4\", \"link_id\": \"ad_Bprise_ID_Adidas_0000\"}"
The function that receives and writes the JSON file:
def on_message2(client, userdata, message):
    print("New MQTT message received. File %s line %d" % (filename, cf.f_lineno))
    print("message received?/'/'/' ", str(message.payload.decode("utf-8")), \
          "topic", message.topic, "retained ", message.retain)
    global links
    links = str(message.payload.decode("utf-8"))
    logging.debug("Got new mqtt message as %s" % message.payload.decode("utf-8"))
    status_data = str(message.payload.decode("utf-8"))
    print(status_data)
    print("in function on_message2")
    with open("ad_link.json", "w") as outFile:
        json.dump(status_data, outFile)
    time.sleep(3)
The output of this function:
New MQTT message received. File C:/Users/arunav.sahay/PycharmProjects/MediaPlayer/venv/Include/mediaplayer_db_mqtt.py line 358
message received?/'/'/' {"link": "https://res.cloudinary.com/dnq9phiao/video/upload/v1534157695/Adidas-Break-Free_nfadrz.mp4", "link_id": "ad_Bprise_ID_Adidas_0000"} topic ios_push retained 1
{"link": "https://res.cloudinary.com/dnq9phiao/video/upload/v1534157695/Adidas-Break-Free_nfadrz.mp4", "link_id": "ad_Bprise_ID_Adidas_0000"}
EDIT
I found out that the error is in the JSON format: I am receiving the JSON data in the wrong format. How can I correct that?

There are two major errors here:

1. You are trying to use the json.load function as a sequence or dictionary mapping. It's a function; you can only call it: you'd use json.load(file_object). Since status is actually a string, you'd have to use json.loads(status) to actually decode a JSON document stored in a string.
2. In on_message2, you encoded JSON data to JSON again. Now you have to decode it twice. That's an unfortunate waste of computer resources.
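A minimal sketch of that double encoding, using a made-up payload shaped like the one in the question:

import json

# A string that already contains a JSON document:
payload = '{"link": "https://example.com/ad.mp4", "link_id": "ad_123"}'

# json.dumps() on that *string* wraps it in another layer of JSON,
# which is what happened in on_message2:
double_encoded = json.dumps(payload)
print(double_encoded)

# One json.loads() only recovers the inner string; a second call is
# needed to reach the actual dictionary:
inner = json.loads(double_encoded)   # -> str
data = json.loads(inner)             # -> dict
print(data["link_id"])               # ad_123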
In the on_message2 function, the message.payload object is a bytes-value containing a UTF-8 encoded JSON document, if you want to write that to a file, don't decode to text, and don't encode the text to JSON again. Just write those bytes directly to a file:
def on_message2(client, userdata, message):
    logging.debug("Got new mqtt message as %s" % message.payload.decode("utf-8"))
    with open("ad_link.json", "wb") as out:
        out.write(message.payload)
Note the 'wb' mode; that opens the file in binary mode for writing, at which point you can write the bytes object directly to that file.
When you open a file without a b in the mode, you open a file in text mode, and when you write a text string to that file object, Python encodes that text to bytes for you. The default encoding depends on your OS settings, so without an explicit encoding argument to open() you can't even be certain that you end up with UTF-8 JSON bytes again! Since you already have a bytes value, there is no need to manually decode then have Python encode again, so use a binary file object and avoid that decode / encode dance here too.
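A short sketch of that locale dependence (example.txt is just a scratch file for illustration):

import locale

# The encoding open() uses in text mode when no encoding argument is
# given; e.g. 'cp1252' on many Windows setups:
print(locale.getpreferredencoding(False))

# If you must use text mode, always name the encoding explicitly:
with open('example.txt', 'w', encoding='utf-8') as f:
    f.write('{"link_id": "ad_123"}')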
You can now load the file contents with json.load() without having to decode again:
def main(conn):
    with open('ad_link.json', 'rb') as status_file:
        status = json.load(status_file)
    link = status["link"]
    link_id = status["link_id"]
Note that I opened the file in binary mode again. As of Python 3.6, the json.load() function can work with both binary files and text files, and for binary files it can auto-detect whether the JSON data was encoded as UTF-8, UTF-16 or UTF-32.
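A small sketch of that auto-detection, using an in-memory binary file:

import io
import json

doc = '{"link_id": "ad_123"}'

# json.load() (Python 3.6+) detects the UTF flavour of binary input:
for codec in ('utf-8', 'utf-16', 'utf-32'):
    binary_file = io.BytesIO(doc.encode(codec))
    print(json.load(binary_file))   # {'link_id': 'ad_123'} each time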
If you are using Python 3.5 or earlier, open the file as text, but do explicitly set the encoding to UTF-8:
def main(conn):
    with open('ad_link.json', 'r', encoding='utf-8') as status_file:
        status = json.load(status_file)
    link = status["link"]
    link_id = status["link_id"]

def main(conn):
    global link, link_ID
    with open('ad_link.json', 'r') as statusFile:
        link_data = json.loads(statusFile.read())
    link = link_data["link"]
    link_ID = link_data["link_id"]
    print(link)
    print(link_ID)

Replace loads with load when dealing with a file object, which supports read-like operations:
def main(conn):
    global link, link_ID
    with open('ad_link.json', 'r') as statusFile:
        status = json.load(statusFile)
        status = json.loads(status)
        link = status["link"]
        link_ID = status["link_id"]
        print(link)
        print(link_ID)

Related

UTF-8 Codec Error When Assigning Uncompressed GZip File from URL to String Variable

I am downloading a gzipped log from a URL and saving it to a variable. I then want to iterate over that string variable line by line. If I just save the file and open it in Notepad++, I can see that the saved log file is in UTF-8 encoding.
I wanted to skip saving the file and then reopening it to parse it, so I attempted to assign the file contents to a variable and then use io.StringIO to iterate over each line within the variable. This process works fine, but occasionally the script blows up with the following error when it reaches the line return str(file_content, 'utf-8'):
Exception Raised in connect function: 'utf-8' codec can't decode byte 0xe0 in position 138037: invalid continuation byte
Here is the section of code that makes the request and then assigns the contents to a string variable:
# Making a get request with basic authentication
request = urllib.request.Request(url)
base64string = base64.b64encode(bytes('%s:%s' % ('xxxxx', 'xxxxx'), 'ascii'))
request.add_header("Authorization", "Basic %s" % base64string.decode('utf-8'))

# Open the request, then use gzip to read the shoutcast log that is in gzip
# format, and return the uncompressed version
with urllib.request.urlopen(request) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_content = uncompressed.read()
return str(file_content, 'utf-8')
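For reference, a minimal sketch of the io.StringIO iteration described above, assuming file_content has already been decoded to a str (log_text is dummy data):

import io

log_text = "line one\nline two\nline three"
# io.StringIO wraps the string in a file-like object, so the log can be
# iterated line by line without ever touching the disk:
for line in io.StringIO(log_text):
    print(line.rstrip('\n'))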

How to Replace \n, b and single quotes from Raw File from GitHub?

I am trying to download a file from GitHub (a raw file) and then run it as a .sql file.
import snowflake.connector
from codecs import open
import logging
import requests
from os import getcwd
import os
import sys

# logging
logging.basicConfig(
    filename='C:/Users/abc/Documents/Test.log',
    level=logging.INFO
)

url = "https://github.com/raw/abc/master/file_name?token=Anvn3lJXDks5ciVaPwA%3D%3D"
directory = getcwd()
filename = os.path.join(getcwd(), 'VIEWS.SQL')
r = requests.get(url)
filename.decode("utf-8")
f = open(filename, 'w')
f.write(str(r.content))
with open(filename, 'r') as theFile, open(filename, 'w') as outFile:
    data = theFile.read().split('\n')
    data = theFile.read().replace('\n', '')
    data = theFile.read().replace("b'", "")
    data = theFile.read()
    outFile.write(data)
However, I get this error:
syntax error line 1 at position 0 unexpected 'b'
My converted SQL file has a b at the beginning and a bunch of newline \n characters in the file. Also the entire output file is wrapped in single quotes 'text'. Can anyone help me get rid of these? It looks like replace isn't working.
OS: Windows
Python Version: 3.7.0
You introduced a b'..' prefix by converting the response.content bytes value to a string with str():
>>> import requests
>>> r = requests.get("https://github.com/raw/abc/master/file_name?token=Anvn3lJXDks5ciVaPwA%3D%3D")
>>> r.content
b'Not Found'
>>> str(r.content)
"b'Not Found'"
Of course, the specific dummy URL you gave in your question produces a 404 Not Found response, hence the Not Found content of the response body:
>>> r.status_code
404
so the contents in this demonstration are not actually all that useful. However, even for your real URL you probably want to test for a 200 status code before moving to write the data to a file!
What is going wrong in the above is that str(bytesvalue) converts a bytes object to its representation. You'd normally want to decode a bytes value with a text codec, using the bytes.decode() method. But because you are writing the data to a file here, you should instead just open the file in binary mode and write the bytes object without decoding:
r = requests.get(url)
if r.status_code == 200:
    with open(filename, 'wb') as f:
        f.write(r.content)
The 'wb' mode opens the file for writing in binary mode. Writing binary content to a binary file is the most efficient; decoding it first then writing to a text file requires that it is encoded again. Better to avoid doing double work.
As a side note: there is no need to join a local filename with getcwd(); relative paths always end up in the current working directory, and otherwise it's better to use os.path.abspath(filename).
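For instance, in a hypothetical interpreter session where the current working directory is C:\Users\abc:

>>> import os
>>> os.getcwd()
'C:\\Users\\abc'
>>> os.path.abspath('VIEWS.SQL')
'C:\\Users\\abc\\VIEWS.SQL'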
You could also trust that GitHub sets the correct character set in the Content-Type headers and have requests decode the value to str for you in the form of the response.text attribute:
r = requests.get(url)
if r.status_code == 200:
    with open(filename, 'w') as f:
        f.write(r.text)
but again, that's really doing extra work for nothing, first decoding the binary content from the request, then encoding again when writing to a text file.
Finally, for larger file responses it is better to stream the data and copy it directly to a file. The shutil.copyfileobj() function can take a raw response fileobject directly, provided you enable transparent transport decompression:
import shutil

r = requests.get(url, stream=True)
if r.status_code == 200:
    with open(filename, 'wb') as f:
        # enable transparent transport decompression handling
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
Depending on your version of Python/OS, it could be as simple as reading and writing the file in binary, and moving the replaces so they operate on the data (as bytes) rather than on a list:
# Read the mangled file in binary, strip the artifacts, then write it back.
# Reading and writing happen in separate steps so the file isn't truncated
# before it is read.
with open(filename, 'rb') as theFile:
    data = theFile.read()
data = data.replace(b"b'", b"")
data = data.replace(b'\\n', b'\n')
with open(filename, 'wb') as outFile:
    outFile.write(data)
It would help to have a copy of the file and the line the error is occurring on.

File IO Error in porting from Python 2 to Python 3

I am porting my project from Python 2.7 to Python 3.6.
What I was doing in Python 2.7:
1) Decode from Base64
2) Uncompress using gzip
3) Read line by line and write to a file
bytes_array = base64.b64decode(encryptedData)
fio = StringIO.StringIO(bytes_array)
f = gzip.GzipFile(fileobj=fio)
decoded_data = f.read()
f.close()
f = file("DecodedData.log", 'w')
for item in decoded_data:
    f.write(item)
f.close()
I tried the same thing with the Python 3 changes, but it is not working; it gives one error or another.
I am not able to use StringIO; it gives the error:
initial_value must be str or None, not bytes
So I try this
bytes_array = base64.b64decode(encryptedData)
fio = io.BytesIO(bytes_array)
f = gzip.GzipFile(fileobj=fio)
decoded_data = f.read()
f = open("DecodedData.log", 'w')
for item in decoded_data:
    f.write(item)
f.close()
This gives an error on the line f.write(item):
write() argument must be str, not int
To my surprise, item actually contains an integer when I print it (83, 83, 61, 62).
I think, since I have not given a limit, it is reading as much as it can.
So I tried to read the file line by line:
f = open("DecodedData.log", 'w')
with open(decoded_data) as l:
    for line in l:
        f.write(line)
But it is still not working, and \n is also printed in the file.
Can someone suggest what I am missing?
decoded_data = f.read()
will result in decoded_data being a bytes object. bytes objects are iterable; when you iterate over them, they return each byte value from the data as an integer (0-255). That means when you do
for item in decoded_data:
    f.write(item)
then item will be each integer byte value from your raw data.
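For example, b'SS=>' is a byte string whose values are exactly the integers you saw printed:

>>> for item in b'SS=>':
...     print(item)
...
83
83
61
62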
f.write(decoded_data)
You've opened f in text mode, so you'll need to open it in binary mode if you want to write raw binary data into it. But the fact you've called the file DecodedData.log suggests you want it to be a (human readable?) text file.
So I think overall this will be more readable:
gzipped_data = base64.b64decode(encryptedData)
data = gzip.decompress(gzipped_data)
with open("DecodedData.log", 'wb') as f:
    f.write(data)
There's no need for the intermediate BytesIO at all; gzip has a decompress() function (https://docs.python.org/3/library/gzip.html#gzip.decompress).
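A quick round-trip check of that function in the interpreter:

>>> import gzip
>>> gzip.decompress(gzip.compress(b'hello'))
b'hello'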

How can I convert JSON-encoded data that contains Unicode surrogate pairs to string?

So I am trying to take this data, which uses Unicode escape sequences, and make it print with emojis. It is currently in a .txt file, but I will write to an Excel file later. Anyway, I am getting an error I am not sure what to do with. This is the text I am reading:
"Thanks #UglyGod \ud83d\ude4f https:\\/\\/t.co\\/8zVVNtv1o6\"
"RT #Rosssen: Multiculti beatdown \ud83d\ude4f https:\\/\\/t.co\\/fhwVkjhFFC\"
And here is my code:
sampleFile = open('tweets.txt', 'r').read()
splitFile = sampleFile.split('\n')
for line in sampleFile:
    x = line.encode('utf-8')
    print(x.decode('unicode-escape'))
This is the error message:
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 0: \ at end of string
Any ideas?
This is how the data was originally generated.
class listener(StreamListener):

    def on_data(self, data):
        # Check for a field unique to tweets (if missing, return immediately)
        if "in_reply_to_status_id" not in data:
            return
        with open("see_no_evil_monkey.csv", 'a') as saveFile:
            try:
                saveFile.write(json.dumps(data) + "\n")
            except (BaseException, e):
                print("failed on data", str(e))
                time.sleep(5)
        return True

    def on_error(self, status):
        print(status)
This is how the data was originally generated... saveFile.write(json.dumps(data) + "\n")
You should use json.loads() instead of .decode('unicode-escape') to read JSON text:
#!/usr/bin/env python3
import json

with open('tweets.txt', encoding='ascii') as file:
    for line in file:
        text = json.loads(line)
        print(text)
Your emoji 🙏 is represented as a surrogate pair, see also here for info about this particular glyph. Python cannot decode surrogates, so you'll need to look at exactly how your tweets.txt file was generated, and try encoding the original tweets, along with the emoji, as UTF-8. This will make reading and processing the text file much easier.
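To see why json.loads() is the right tool here, note that the JSON decoder recombines the surrogate pair \ud83d\ude4f into the single 🙏 code point:

>>> import json
>>> json.loads('"Thanks \\ud83d\\ude4f"')
'Thanks 🙏'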

Unable to Save Arabic Decoded Unicode to CSV File Using Python

I am working with a Twitter streaming package for Python. I am currently using a keyword, written in Unicode, to search for tweets containing that word. I am then using Python to create a CSV database file of the tweets. However, I want to convert the tweets back to Arabic symbols when I save them in the CSV.
The errors I am receiving are all similar to "error ondata the ASCII caracters in position ___ are not within the range of 128".
Here is my code:
class listener(StreamListener):

    def on_data(self, data):
        try:
            #print data
            tweet = (str((data.split(',"text":"')[1].split('","source')[0]))).encode('utf-8')
            now = datetime.now()
            tweetsymbols = tweet.encode('utf-8')
            print tweetsymbols
            saveThis = str(now) + ':::' + tweetsymbols.decode('utf-8')
            saveFile = open('rawtwitterdata.csv', 'a')
            saveFile.write(saveThis)
            saveFile.write('\n')
            saveFile.close()
            return True
Excel requires a Unicode BOM character written to the beginning of a UTF-8 file to view it properly. Without it, Excel assumes "ANSI" encoding, which is OS locale-dependent.
This writes a 3-row, 3-column CSV file with Arabic:
#!python2
# coding: utf8
import io

with io.open('arabic.csv', 'w', encoding='utf-8-sig') as f:
    s = u'إعلان يونيو وبالرغم تم. المتحدة'
    s = u','.join([s, s, s]) + u'\n'
    f.write(s)
    f.write(s)
    f.write(s)
For your specific example, just make sure to write a BOM character u'\ufeff' as the first character of your file, encoded in UTF-8. In the example above, the 'utf-8-sig' codec ensures a BOM is written.
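A quick check in a Python 2 shell that the 'utf-8-sig' codec writes that BOM for you:

>>> u'\ufeff'.encode('utf-8')
'\xef\xbb\xbf'
>>> u'abc'.encode('utf-8-sig')
'\xef\xbb\xbfabc'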
Also consult this answer, which shows how to wrap the csv module to support Unicode, or get the third party unicodecsv module.
Here is a snippet to write Arabic text:
# coding=utf-8
import codecs
from datetime import datetime


class listener(object):

    def on_data(self, tweetsymbols):
        # python2
        # tweetsymbols is str
        # tweet = (str((data.split(',"text":"')[1].split('","source')[0]))).encode('utf-8')
        now = datetime.now()
        # work with unicode
        saveThis = unicode(now) + ':::' + tweetsymbols.decode('utf-8')
        try:
            saveFile = codecs.open('rawtwitterdata.csv', 'a', encoding="utf8")
            saveFile.write(saveThis)
            saveFile.write('\n')
        finally:
            saveFile.close()
        return self


listener().on_data("إعلان يونيو وبالرغم تم. المتحدة")
All you need to know about encoding: https://pythonhosted.org/kitchen/unicode-frustrations.html
