How can I fix this problem serializing JSON? - python

I want to post input data with Python to a JSON document database, but the error below appears. I think it is a serialization problem, but I don't know how to fix it:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\33769\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "C:\Users\33769\AppData\Local\Programs\Python\Python310\lib\json\encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Users\33769\AppData\Local\Programs\Python\Python310\lib\json\encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "C:\Users\33769\AppData\Local\Programs\Python\Python310\lib\json\encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type set is not JSON serializable
>>> res = requests.post(url, headers=headers, data=data)
>>> print(res.text)
{"error":"bad_request","reason":"invalid UTF-8 JSON"}
Here is the code I used:
>>> import requests
>>> import json
>>> url = 'http://admin:pass#localhost:5984/db_reviewin'
>>> data = {'key':'value'}
>>> headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
>>> a = input("ton_nom:")
ton_nom:ayoub
>>> b = input("ton_age")
ton_age16
>>> c = input("gender:")
gender:M
>>> e_mail = input("ton e_mail:")
ton e_mail:ayoub_semsar#yahoo.com
>>> d = input("country:")
country:france
>>> data = {"full_name":{a}, "age":{b}, "gender":{c}, "e_mail":{e_mail}, "country":{d}}
>>> res = requests.post(url, headers=headers, data=json.dumps(data))

In Python, {a} creates a set, and sets are not JSON serializable. Use a list instead:
data = {"full_name":[a], "age":[b], "gender":[c], "e_mail":[e_mail], "country":[d]}

Writing {a} inside the dictionary creates a Python set, and JSON has no set type, so json.dumps() raises the TypeError.
For example, this is valid JSON:
{"full_name": "ayoub"}
This is also valid JSON:
{"full_name": {"name": "ayoub"}}
However, this is the structure your code builds (showing only the first field), and it is not valid JSON:
{"full_name": {"ayoub"}}
Either remove the curly brackets around the values, or convert them to lists, which can hold multiple strings:
data = {"full_name": a, "age": b, "gender": c, "e_mail": e_mail, "country": d}
data = {"full_name":[a], "age":[b], "gender":[c], "e_mail":[e_mail], "country":[d]}
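A minimal sketch of both fixes, using hard-coded values in place of the input() calls from the question. The encode_sets helper is a hypothetical name, not part of the json module; it only illustrates the default= hook of json.dumps() for the case where a set really is wanted and should be sent as a list.

```python
import json

# Hard-coded stand-ins for the values read with input() in the question.
a, b, c, e_mail, d = "ayoub", "16", "M", "user@example.com", "france"

# Plain values: each field becomes a JSON string.
data = {"full_name": a, "age": b, "gender": c, "e_mail": e_mail, "country": d}
payload = json.dumps(data)


def encode_sets(obj):
    """Hypothetical fallback for json.dumps: turn sets into sorted lists."""
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# With the fallback, the original set-based dictionary also serializes.
payload_from_sets = json.dumps({"full_name": {a}}, default=encode_sets)

print(payload)
print(payload_from_sets)
```

The payload string can then be passed as data=payload to requests.post(), as in the question.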

Related

Extract specific value from JSON with Python

I am trying to get all the websites from this json file
Unfortunately, when I use this code:
import requests
response = requests.get("https://github.com/solana-labs/token-list/blob/main/src/tokens/solana.tokenlist.json")
output = response.json()
# Extract specific node content.
print(output['website'])
I get the following error:
Traceback (most recent call last):
File "/Users/dusandev/Desktop/StreamFlowWebTests/extract.py", line 5, in <module>
output = response.json()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/models.py", line 900, in json
return complexjson.loads(self.text, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 7 column 1 (char 6)
Any help is appreciated. Thank you in advance
Use the raw URL to get the raw JSON, then iterate over the 'tokens' key of the parsed response:
import requests
response = requests.get(
    "https://raw.githubusercontent.com/solana-labs/token-list/main/src/tokens/solana.tokenlist.json")
output = response.json()
for i in output['tokens']:
    # not every token has an 'extensions' entry
    if i.get('extensions'):
        print(i.get('extensions').get('website'))
The URL https://github.com/solana-labs/token-list/blob/main/src/tokens/solana.tokenlist.json returns an HTML page, not JSON. Use https://raw.githubusercontent.com/solana-labs/token-list/main/src/tokens/solana.tokenlist.json instead.
If you visit the url https://github.com/solana-labs/token-list/blob/main/src/tokens/solana.tokenlist.json in a browser, you'll get a fully rendered web page. In order to get just JSON you need to use the "view raw" link. That winds up being
https://raw.githubusercontent.com/solana-labs/token-list/main/src/tokens/solana.tokenlist.json
You will then have several thousand elements in the list under the "tokens" key of the response dictionary. To get the website of each token, iterate through that list and look inside each "extensions" dictionary:
>>> output["tokens"][0]
{'chainId': 101, 'address': 'CbNYA9n3927uXUukee2Hf4tm3xxkffJPPZvGazc2EAH1', 'symbol': 'agEUR', 'name': 'agEUR (Wormhole)', 'decimals': 8, 'logoURI': 'https://raw.githubusercontent.com/solana-labs/token-list/main/assets/mainnet/CbNYA9n3927uXUukee2Hf4tm3xxkffJPPZvGazc2EAH1/logo.png', 'tags': ['ethereum', 'wrapped', 'wormhole'], 'extensions': {'address': '0x1a7e4e63778B4f12a199C062f3eFdD288afCBce8', 'assetContract': 'https://etherscan.io/address/0x1a7e4e63778B4f12a199C062f3eFdD288afCBce8', 'bridgeContract': 'https://etherscan.io/address/0x3ee18B2214AFF97000D974cf647E7C347E8fa585', 'coingeckoId': 'ageur', 'description': 'Angle is the first decentralized, capital efficient and over-collateralized stablecoin protocol', 'discord': 'https://discord.gg/z3kCpTaKMh', 'twitter': 'https://twitter.com/AngleProtocol', 'website': 'https://www.angle.money'}}
>>> output["tokens"][0]["extensions"]["website"]
'https://www.angle.money'
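The iteration described above can be sketched without a network call. The extract_websites function and the sample dictionary are made up for illustration; the sample only mimics the shape of the real token list ("tokens" list, optional "extensions" dictionary, optional "website" key).

```python
def extract_websites(token_list):
    """Return every website URL found under tokens[*].extensions.website."""
    websites = []
    for token in token_list.get("tokens", []):
        # "extensions" is optional, so fall back to an empty dict.
        extensions = token.get("extensions") or {}
        website = extensions.get("website")
        if website:
            websites.append(website)
    return websites


# Small made-up sample with the same shape as the real token list.
sample = {
    "tokens": [
        {"symbol": "agEUR", "extensions": {"website": "https://www.angle.money"}},
        {"symbol": "NOEXT"},
        {"symbol": "NOSITE", "extensions": {"twitter": "https://twitter.com/example"}},
    ]
}

print(extract_websites(sample))
```

With the real data, the same function could be called on the dictionary returned by response.json() for the raw URL.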
This error usually means that the response body cannot be parsed as JSON.
You have two options:
1. Use "https://raw.githubusercontent.com/solana-labs/token-list/main/src/tokens/solana.tokenlist.json" instead:
import requests
response = requests.get("https://raw.githubusercontent.com/solana-labs/token-list/main/src/tokens/solana.tokenlist.json")
output = response.json()
first_website = output["tokens"][0]["extensions"]["website"]
# all websites:
for token in output['tokens']:
    if extensions := token.get('extensions'):
        print(extensions.get('website'))
# output:
'https://www.angle.money'
2. Parse the rendered HTML page using BeautifulSoup: https://www.dataquest.io/blog/web-scraping-python-using-beautiful-soup/

JSONDecodeError Issue? python

Can someone help me fix this? I don't know why I am getting this error.
I am trying to use a Python program someone else made. I tried to mess around with it, but I could not figure out the issue.
Error:
PS D:\Python> python .\quizlet.py
Traceback (most recent call last):
File "D:\Python\quizlet.py", line 69, in <module>
q = QuizletParser(website)
File "D:\Python\quizlet.py", line 17, in QuizletParser
data = json.loads(BeautifulSoup(session.get(link).content, features="lxml").find_all('script')[-6].string[44:-152])
File "C:\Users\john\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\john\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 14 (char 13)
I am trying to use the code I found here from a bit ago: https://github.com/daijro/python-quizlet
Source:
from requests_html import HTMLSession
from box import Box
import box
import json
from bs4 import BeautifulSoup
from difflib import SequenceMatcher
def FindFlashcard(flashcards: box.box_list.BoxList, match: str):
    similar = lambda a, b: SequenceMatcher(None, a, b).ratio()
    data = max(list(zip([similar(match, x.term) for x in flashcards], [x for x in range(len(flashcards))])))
    flashcard = flashcards[data[1]]
    flashcard.update({'similarity': data[0]})
    return flashcard
def QuizletParser(link: str):
    session = HTMLSession()
    data = json.loads(BeautifulSoup(session.get(link).content, features="lxml").find_all('script')[-6].string[44:-152])
    flashcards = []
    for i in list(data['termIdToTermsMap'].values()):
        i = {
            'index': i['rank'],
            'id': i['id'],
            'term': i['word'],
            'definition': i['definition'],
            'setId': i['setId'],
            'image': i['_imageUrl'],
            'termTts': 'https://quizlet.com'+i['_wordTtsUrl'],
            'termTtsSlow': 'https://quizlet.com'+i['_wordSlowTtsUrl'],
            'definitionTts': 'https://quizlet.com'+i['_definitionTtsUrl'],
            'definitionTtsSlow': 'https://quizlet.com'+i['_definitionSlowTtsUrl'],
            'lastModified': i['lastModified'],
        }
        flashcards.append(i)
    output = {
        'title': data['set']['title'],
        'flashcards': flashcards,
        'author': {
            'name': data['creator']['username'],
            'id': data['creator']['id'],
            'timestamp': data['creator']['timestamp'],
            'lastModified': data['creator']['lastModified'],
            'image': data['creator']['_imageUrl'],
            'timezone': data['creator']['timeZone'],
            'isAdmin': data['creator']['isAdmin'],
        },
        'id': data['set']['id'],
        'link': data['set']['_webUrl'],
        'thumbnail': data['set']['_thumbnailUrl'],
        'timestamp': data['set']['timestamp'],
        'lastModified': data['set']['lastModified'],
        'publishedTimestamp': data['set']['publishedTimestamp'],
        'authorsId': data['set']['creatorId'],
        'termLanguage': data['set']['wordLang'],
        'definitionLanguage': data['set']['defLang'],
        'description': data['set']['description'],
        'numTerms': data['set']['numTerms'],
        'hasImages': data['set']['hasImages'],
        'hasUploadedImage': data['hasUploadedImage'],
        'hasDiagrams': data['set']['hasDiagrams'],
        'hasImages': data['set']['hasImages'],
    }
    return Box(output)
website = 'https://quizlet.com/475389316/python-web-scraping-flash-cards/'
text = 'Two popular parsers'
q = QuizletParser(website)
flashcard = FindFlashcard(q.flashcards, match=text) # finds the flashcard most similar to the input
print(flashcard.term + " " + flashcard.definition) # prints the matched term and its definition
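It is hard to give a solution without looking at the data.
"Extra data" means the decoder parsed one complete JSON value and then found more characters after it (here at char 13). The hard-coded script index [-6] and slice [44:-152] probably no longer match the page's current markup.
A few tips for debugging JSON errors:
Check the input data passed to the JSON decoder. A trailing comma after the last key-value pair is a very common cause of errors.
Check the data type. If your input data came from an external source, inspect it first.
I would suggest printing the input and pasting it here if possible: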
input_data = BeautifulSoup(session.get(link).content, features="lxml").find_all('script')[-6].string[44:-152]
print(input_data)
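To illustrate what "Extra data" means, here is a small sketch on a made-up string (not the real Quizlet page content): a valid JSON object followed by leftover JavaScript, similar to what a slightly wrong slice of a script body could produce. It uses JSONDecodeError.pos to show where parsing stopped and JSONDecoder.raw_decode() to parse only the first value.

```python
import json

# Made-up input: a JSON object with trailing non-JSON text after it.
text = '{"set": "demo"};window.other = 1;'

try:
    json.loads(text)
except json.JSONDecodeError as err:
    # err.pos is the index at which the unexpected extra text begins.
    message = err.msg
    leftover = text[err.pos:]

# raw_decode parses the first JSON value and reports where it ended,
# without complaining about whatever follows.
obj, end = json.JSONDecoder().raw_decode(text)

print(message)
print(repr(leftover))
print(obj, end)
```

Printing the leftover text in the same way for the real input should show how far off the hard-coded slice is.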

Fail to access dict in dict: "string indices must be integers" error

import boto3
client = boto3.client('secretsmanager')
response = client.get_secret_value(SecretId='xxxx')
print('entire response:', response)
print('SecretString:',response['SecretString'])
print('testvalue:', response['SecretString']["testkey"])
I am trying to use AWS Secrets Manager and need to access the testvalue.
entire response:{---, u'SecretString': u'{"testkey":"testvalue","testkey2":"testvalue2"}', ----}
Secretstring:{"testkey":"testvalue","testkey2":"testvalue2"}
Traceback (most recent call last):
File "secretmanagertest.py", line 7, in <module>
print('testvalue',response['SecretString']["testkey"])
TypeError: string indices must be integers
When I use an integer index instead, I only get a single character:
print(response['SecretString'][0])
{
print(response['SecretString'][1])
"
print(response['SecretString'][2])
t
etc.
The SecretString value is a JSON document stored as a string, not a dictionary yet. Decode it first with json.loads():
import json
secret = json.loads(response['SecretString'])
print(secret['testkey'])
Demo:
>>> import json
>>> response = {u'SecretString': u'{"testkey":"testvalue","testkey2":"testvalue2"}'}
>>> response['SecretString']
u'{"testkey":"testvalue","testkey2":"testvalue2"}'
>>> json.loads(response['SecretString'])
{u'testkey2': u'testvalue2', u'testkey': u'testvalue'}
>>> json.loads(response['SecretString'])['testkey']
u'testvalue'

python : error handling Ordered dict with unicode data

My script migrates data from MySQL to MongoDB. It runs perfectly well when no Unicode columns are included, but throws the error below when the OrgLanguages column is added.
mongoImp = dbo.insert_many(odbcArray)
File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/collection.py", line 711, in insert_many
blk.execute(self.write_concern.document)
File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/bulk.py", line 493, in execute
return self.execute_command(sock_info, generator, write_concern)
File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/bulk.py", line 319, in execute_command
run.ops, True, self.collection.codec_options, bwc)
bson.errors.InvalidStringData: strings in documents must be valid UTF-8: 'Portugu\xeas do Brasil, ?????, English, Deutsch, Espa\xf1ol latinoamericano, Polish'
My code:
import MySQLdb, MySQLdb.cursors, sys, pymongo, collections
odbcArray=[]
mongoConStr = '192.168.10.107:36006'
sqlConnect = MySQLdb.connect(host = "54.175.170.187", user = "testuser", passwd = "testuser", db = "testdb", cursorclass=MySQLdb.cursors.DictCursor)
mongoConnect = pymongo.MongoClient(mongoConStr)
sqlCur = sqlConnect.cursor()
sqlCur.execute("SELECT ID,OrgID,OrgLanguages,APILoginID,TransactionKey,SMTPSpeed,TimeZoneName,IsVideoWatched FROM organizations")
dbo = mongoConnect.eaedw.mysqlData
tuples = sqlCur.fetchall()
for tuple in tuples:
    odbcArray.append(collections.OrderedDict(tuple))
mongoImp = dbo.insert_many(odbcArray)
sqlCur.close()
mongoConnect.close()
sqlConnect.close()
sys.exit()
The script above migrates the data perfectly when the OrgLanguages column is left out of the SELECT query.
To overcome this, I tried to build the OrderedDict() in another way, but that gives me a different error.
Changed code:
for tuple in tuples:
    doc = collections.OrderedDict()
    doc['oid'] = tuple.OrgID
    doc['APILoginID'] = tuple.APILoginID
    doc['lang'] = unicode(tuple.OrgLanguages)
    odbcArray.append(doc)
mongoImp = dbo.insert_many(odbcArray)
Error Received:
Traceback (most recent call last):
File "pymsql.py", line 19, in <module>
doc['oid'] = tuple.OrgID
AttributeError: 'dict' object has no attribute 'OrgID'
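Your MySQL connection is returning characters in a different encoding than UTF-8, which is the encoding that all BSON strings must be in. Try your original code but pass charset='utf8' to MySQLdb.connect. The AttributeError in your changed code is a separate issue: DictCursor returns each row as a dict, so the columns must be read as tuple['OrgID'], not tuple.OrgID.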
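The following sketch only shows why the string in the error message is rejected; it does not touch MySQL or MongoDB. It uses a shortened copy of the bytes from the error, and it assumes they are Latin-1 encoded, which is a guess based on 0xEA and 0xF1 being "ê" and "ñ" in that encoding.

```python
# A shortened copy of the byte string from the error message.
raw = b"Portugu\xeas do Brasil, Espa\xf1ol latinoamericano"

# The bytes are not valid UTF-8, which is what BSON requires.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# Assumption: the bytes are Latin-1; 0xEA and 0xF1 map to "ê" and "ñ" there.
text = raw.decode("latin-1")

# Re-encoded as UTF-8, the same text is acceptable as a BSON string.
utf8_bytes = text.encode("utf-8")

print(is_valid_utf8)
print(text)
```

Passing charset='utf8' to MySQLdb.connect, as suggested above, should make the driver deliver UTF-8 data so that no manual conversion like this is needed.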

django.http.JsonResponse return json data in wrong format

I want to return the queryset in json format, and I use the JsonResponse as the following:
def all_alert_history(request):
    ''' get all all alert history data '''
    all_data_json = serializers.serialize('json', LatestAlert.objects.all())
    return JsonResponse(all_data_json,safe=False)
but the browser shows like this:
"[{\"fields\": {\"alert_name\": \"memory usage\", \"alert_value\": 83.7, \"alert_time\": \"2016-11-08T06:21:20.717Z\", \"alert_level\": \"warning\", \"alert_rule\": \"warning: > 80%\"}, \"model\": \"alert_handler.latestalert\", \"pk\": \"xyz.test-java.ip-10-0-10-138.memory.percent\"}]"
I replace the JsonResponse with HttpResponse :
def all_alert_history(request):
    ''' get all all alert history data '''
    all_data_json = serializers.serialize('json', LatestAlert.objects.all())
    return HttpResponse(all_data_json, content_type='application/json')
and the browser shows like this:
[{"fields": {"alert_name": "memory usage", "alert_value": 83.7, "alert_time": "2016-11-08T06:21:20.717Z", "alert_level": "warning", "alert_rule": "warning: > 80%"}, "model": "alert_handler.latestalert", "pk": "xyz.test-java.ip-10-0-10-138.memory.percent"}]
So why does the \ appear when I use JsonResponse but disappear when I use HttpResponse?
Django version: 1.8
JsonResponse takes a Python dictionary and returns it as a JSON-formatted string for the browser.
Since you are passing JsonResponse a string that is already JSON, it serializes that string a second time: the whole text is wrapped in quotes and every inner double quote is escaped with \.
Example:
>>> from django.http import JsonResponse
>>> response = JsonResponse({'foo': 'bar'})
>>> response.content
b'{"foo": "bar"}'
In your case JsonResponse even warns you about what you are doing when passing a string, hence making the safe = False parameter necessary:
>>> mydata = {"asd":"bdf"}
>>> import json
>>> myjson = json.dumps(mydata)
>>> JsonResponse(myjson)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/swozny/work2/local/lib/python2.7/site-packages/django/http/response.py", line 500, in __init__
raise TypeError('In order to allow non-dict objects to be '
TypeError: In order to allow non-dict objects to be serialized set the safe parameter to False
With the parameter set to False your observed behavior is reproducible:
>>> JsonResponse(myjson,safe=False).content
'"{\\"asd\\": \\"bdf\\"}"'
Bottom line: if your model contains more than basic field types (IntegerField, CharField, ...), you will probably want to do the serialization yourself and stick to HttpResponse, or use djangorestframework, which offers tools to do it for you.
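The double encoding can be reproduced with the json module alone, without Django. In this sketch, already_json is a made-up stand-in for the string returned by serializers.serialize('json', ...), and the second json.dumps() call stands in for what JsonResponse does to its argument.

```python
import json

# Stand-in for the string returned by serializers.serialize('json', queryset).
already_json = json.dumps([{"pk": "xyz", "fields": {"alert_level": "warning"}}])

# Encoding a string that is already JSON wraps it in quotes and escapes
# the inner double quotes, which is the output seen in the browser.
double_encoded = json.dumps(already_json)

# Decoding first gives a plain list, which can then be encoded exactly once.
records = json.loads(already_json)
single_encoded = json.dumps(records)

print(double_encoded)
print(single_encoded)
```

Following the same idea, one option in the view might be to pass json.loads(all_data_json) to JsonResponse with safe=False instead of the string itself; returning the string through HttpResponse, as in the question, avoids the extra decode step.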
