Python json.loads() returns JSONDecodeError [closed] - python

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
I wanted to write a program that gives me a live feed of how many subscribers a YouTube channel has. For this I used Google's API, which returns the info as JSON:
{
"kind": "youtube#channelListResponse",
"etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/wTcrqM2kHwjf7GxOEpSBk_lofRA\"",
"pageInfo": {
"totalResults": 1,
"resultsPerPage": 5
},
"items": [
{
"kind": "youtube#channel",
"etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/HhHZCWV2vASrbydwK9ItUgUm0X8\"",
"id": "UC-lHJZR3Gqxm24_Vd_AJ5Yw",
"statistics": {
"viewCount": "19893639729",
"commentCount": "0",
"subscriberCount": "79695778",
"hiddenSubscriberCount": false,
"videoCount": "3707"
}
}
]
}
Here's the code:
import json
json_str = '''{
{
"kind": "youtube#channelListResponse",
"etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/wTcrqM2kHwjf7GxOEpSBk_lofRA\"",
"pageInfo": {
"totalResults": 1,
"resultsPerPage": 5
},
"items": [
{
"kind": "youtube#channel",
"etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/HhHZCWV2vASrbydwK9ItUgUm0X8\"",
"id": "UC-lHJZR3Gqxm24_Vd_AJ5Yw",
"statistics": {
"viewCount": "19893639729",
"commentCount": "0",
"subscriberCount": "79695778",
"hiddenSubscriberCount": false,
"videoCount": "3707"
}
}
]
}
'''
data = json.loads(json_str)
print(data)
But when I try to convert it into a Python dictionary using json.loads(), I get the following error:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 2 (char 3)
Also:
print(ascii(json_str))
'{\n {\n "kind": "youtube#channelListResponse",\n "etag": ""XI7nbFXulYBIpL0ayR_gDh3eu1k/wTcrqM2kHwjf7GxOEpSBk_lofRA"",\n "pageInfo": {\n "totalResults": 1,\n "resultsPerPage": 5\n },\n "items": [\n {\n "kind": "youtube#channel",\n "etag": ""XI7nbFXulYBIpL0ayR_gDh3eu1k/HhHZCWV2vASrbydwK9ItUgUm0X8"",\n "id": "UC-lHJZR3Gqxm24_Vd_AJ5Yw",\n "statistics": {\n "viewCount": "19893639729",\n "commentCount": "0",\n "subscriberCount": "79695778",\n "hiddenSubscriberCount": false,\n "videoCount": "3707"\n }\n }\n ]\n}\n'
What's causing the problem?

Using the following code I was able to open your JSON and print it. Save your JSON to temp.json and try this:
import json
with open("temp.json", "r") as infile:
    # json.load reads and parses the file in one step
    data = json.load(infile)
print(data)
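Judging from the error message and the ascii() output in the question, two things in the pasted string appear to break parsing. First, the string starts with an extra `{`, so on line 2 the parser finds a second `{` where it expects a quoted key. Second, in a normal (non-raw) Python string literal `\"` collapses to `"`, which turns the etag values into `""XI7..."",`. The sketch below is one possible fix, assuming you want to keep the JSON inline: it drops the extra brace and uses a raw string so the backslashes survive. The payload is trimmed for brevity.

```python
import json

# Raw string (r'''...''') keeps the backslashes, so \" reaches the JSON parser intact.
# Note there is only ONE opening brace at the top.
json_str = r'''{
  "kind": "youtube#channelListResponse",
  "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/wTcrqM2kHwjf7GxOEpSBk_lofRA\"",
  "items": [
    {
      "statistics": {
        "subscriberCount": "79695778",
        "hiddenSubscriberCount": false
      }
    }
  ]
}'''

data = json.loads(json_str)

# The API returns the count as a string, so convert it before doing arithmetic.
subscriber_count = int(data["items"][0]["statistics"]["subscriberCount"])
print(subscriber_count)
```

Reading the JSON from a file, as in the answer above, sidesteps the escaping issue because no Python string literal is involved.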

Related

Easy way to convert JSON to pyarrow schema

Disclaimer: I'm new to Apache Parquet and pyarrow. Is there an easy way to convert JSON to a pyarrow schema? The JSON I'm working with is:
{
"_time": ${datetime},
"activity": ${event_id},
"activity_id": 6,
"category_name": "Network Activity",
"category_uid": 4,
"class_name": "HTTP Activity",
"class_uid": 4002,
"dst_endpoint": {
"ip": ${sip}
},
"http_request": {
"hostname": ${host},
"url": {
"hostname": ${url_host},
"path": ${url_path},
"text": ${url}
},
"user_agent": ${ua},
"version": ${reqversion}
},
"http_response": {
"code": ${respcode}
},
"metadata": {
"version": 1.0.0
},
"severity": ${riskscore},
"severity_id": 0,
"src_endpoint": {
"ip": ${cip}
},
"type_name": "HTTP Activity: Traffic",
"type_uid": 400206
}

Fetching Comments using Youtube data API

I'm trying to fetch all the comments for a particular YouTube video.
I've written the following code:
from googleapiclient.discovery import build

# storing all the comments in a list (l)
# note: l, api_key, remove_URL and n are defined elsewhere in the asker's script
def video_comments(url):
    # empty list for storing replies
    replies = []
    # creating youtube resource object
    youtube = build('youtube', 'v3', developerKey=api_key)
    # retrieve youtube video results (first page only)
    video_response = youtube.commentThreads().list(
        part='snippet,replies',
        videoId=url
    ).execute()
    for item in video_response['items']:
        # extracting the top-level comment
        comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
        # counting the number of replies to the comment
        replycount = item['snippet']['totalReplyCount']
        # if there are replies
        if replycount > 0:
            # iterate through all replies
            for reply in item['replies']['comments']:
                # extract the reply text
                reply = reply['snippet']['textDisplay']
                # store the reply in the list
                replies.append(reply)
        comment = remove_URL(comment)
        # store the cleaned comment
        l.append(comment)
        for resp in replies:
            resp = remove_URL(resp)
            # print(resp, replies, end='\n\n')
            # store the cleaned reply (the original appended comment again here)
            l.append(resp)
        # empty the reply list
        replies = []

video_comments(n)
However, the code above fetches only 20-25 comments even though the video has hundreds or thousands of comments.
The response has a nextPageToken attribute (see the documentation). Pass its value as the pageToken parameter of your next request to get the next page of results, and repeat until the response no longer contains a nextPageToken.
Try this example:
https://youtube.googleapis.com/youtube/v3/commentThreads?part=id%2Creplies%2Csnippet&maxResults=10&videoId=pf3kMUZvyE8&key=[YOUR_API_KEY]
Response (note the nextPageToken attribute):
{
"kind": "youtube#commentThreadListResponse",
"etag": "priyTHCuTXn9LlRkKazYailhGq0",
"nextPageToken": "QURTSl9pMlgzMi1IR0ZfTEtXZzNFRjQ1N3dEVmJlNXlPZ3BqUDFrMHlUejdxc3NIZFBOS013dWFRVjU5TWotWFJBaFJfUE1BSHR4aE9BQQ==",
"pageInfo": {
"totalResults": 9,
"resultsPerPage": 10
},
"items": [
{
"kind": "youtube#commentThread",
"etag": "MezAPCqHnXHD4xfxGWCKw8GwMrk",
"id": "Ugybh70lAXjKtWKnhVt4AaABAg",
"snippet": {
"videoId": "pf3kMUZvyE8",
"topLevelComment": {
"kind": "youtube#comment",
"etag": "MfJ5ylnOGfVyfNlVM7qc0mSwLJQ",
"id": "Ugybh70lAXjKtWKnhVt4AaABAg",
"snippet": {
"videoId": "pf3kMUZvyE8",
"textDisplay": "Electricity is raw energy",
"textOriginal": "Electricity is raw energy",
"authorDisplayName": "Kevinzhw Zhang wang",
"authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AKedOLSU9_Tg183EZXdMmQbFcYKBw4WBajjPZc4gpT1W=s48-c-k-c0x00ffffff-no-rj",
"authorChannelUrl": "http://www.youtube.com/channel/UCBCwvesq011-2OP1mXq6t8w",
"authorChannelId": {
"value": "UCBCwvesq011-2OP1mXq6t8w"
},
"canRate": true,
"viewerRating": "none",
"likeCount": 0,
"publishedAt": "2021-12-24T05:59:47Z",
"updatedAt": "2021-12-24T05:59:47Z"
}
},
"canReply": true,
"totalReplyCount": 0,
"isPublic": true
}
},
{
"kind": "youtube#commentThread",
"etag": "fiwm5vdcDBQh_CtyzB05jqp3h68",
"id": "UgzoTdopkSulNGL_6tZ4AaABAg",
"snippet": {
"videoId": "pf3kMUZvyE8",
"topLevelComment": {
"kind": "youtube#comment",
"etag": "pCGjZzOYwkp7Z4bbhF_DiutwSow",
"id": "UgzoTdopkSulNGL_6tZ4AaABAg",
"snippet": {
"videoId": "pf3kMUZvyE8",
"textDisplay": "Yo no tengo autismo y si intenté eso XD",
"textOriginal": "Yo no tengo autismo y si intenté eso XD",
"authorDisplayName": "XXX DDD",
"authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AKedOLTiD1hjwHmK8TWDil3XujkWfIFMvrc-_y0cTg=s48-c-k-c0x00ffffff-no-rj",
"authorChannelUrl": "http://www.youtube.com/channel/UCXarJ5GGpaBLaV1KEPimQXA",
"authorChannelId": {
"value": "UCXarJ5GGpaBLaV1KEPimQXA"
},
"canRate": true,
"viewerRating": "none",
"likeCount": 0,
"publishedAt": "2021-12-24T00:45:31Z",
"updatedAt": "2021-12-24T00:45:31Z"
}
},
"canReply": true,
"totalReplyCount": 0,
"isPublic": true
}
},
[other comments here...]
]
}
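The paging described above can be sketched as a small loop. This is a minimal illustration, not the asker's code: `iter_comment_threads`, `fetch_page`, `fake_fetch` and `FAKE_PAGES` are hypothetical names, and the fake pages stand in for the real API so the example runs offline.

```python
def iter_comment_threads(fetch_page):
    """Yield every item across all pages.

    fetch_page(token) must return one response dict; token is None for the
    first page and the previous response's nextPageToken afterwards.
    """
    token = None
    while True:
        response = fetch_page(token)
        yield from response.get("items", [])
        token = response.get("nextPageToken")
        if not token:
            # No nextPageToken means this was the last page.
            break


# Hypothetical stand-in for the API so the sketch runs without network access.
FAKE_PAGES = {
    None: {"items": [{"id": "a"}, {"id": "b"}], "nextPageToken": "p2"},
    "p2": {"items": [{"id": "c"}]},
}


def fake_fetch(token):
    return FAKE_PAGES[token]


all_ids = [item["id"] for item in iter_comment_threads(fake_fetch)]
print(all_ids)

# With google-api-python-client, fetch_page might look like this (assumption,
# not run here; youtube and video_id come from your own script):
# def fetch_page(token):
#     return youtube.commentThreads().list(
#         part="snippet,replies", videoId=video_id,
#         maxResults=100, pageToken=token,
#     ).execute()
```

Keeping the loop separate from the request makes it easy to test the paging logic without spending API quota.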

key error while parsing python dictionary

{
"kind": "youtube#commentThreadListResponse",
"etag": "5b1YCNidguUpH4QsR6mpPJrL6es",
"nextPageToken": "QURTSl9pMTQwTEZFU1VRZTB1R2toTFh5djJJSWQzM1oyOXp4Z3ppSXZSNEtNQ25RRzQyRm1xXzFwMDZvc3dqb1g5dnQyTnVUMVJld2lWVXFta2tFclh2LWk3eENwOFFxMmluTGhlY3JXOHNsSnh4ZlFyNllfdWVWMVlPdkhiWWlnVzA=",
"pageInfo": {
"totalResults": 100,
"resultsPerPage": 100
},
"items": [
{
"kind": "youtube#commentThread",
"etag": "GQifP0HFLluusa1n0pFQCxggSvI",
"id": "UgxWDLFO6d6fhe4UaJd4AaABAg",
"snippet": {
"videoId": "BEWz4SXfyCQ",
"topLevelComment": {
"kind": "youtube#comment",
"etag": "YlbdyUbeN1LqFBOqDnQnQZU2DnQ",
"id": "UgxWDLFO6d6fhe4UaJd4AaABAg",
"snippet": {
"videoId": "BEWz4SXfyCQ",
"textDisplay": "Honestly Jeremy is just an annoying piggyback rider",
"textOriginal": "Honestly Jeremy is just an annoying piggyback rider",
"authorDisplayName": "Michael Myers",
"authorProfileImageUrl": "https://yt3.ggpht.com/a/AATXAJwHIfrPXguIZR7YggVntreixLfBisGtlo5xTg=s48-c-k-c0xffffffff-no-rj-mo",
"authorChannelUrl": "http://www.youtube.com/channel/UCs4do_iNqxBcxxPmv6U1VPg",
"authorChannelId": {
"value": "UCs4do_iNqxBcxxPmv6U1VPg"
},
"canRate": true,
"viewerRating": "none",
"likeCount": 0,
"publishedAt": "2020-07-08T20:55:48Z",
"updatedAt": "2020-07-08T20:55:48Z"
}
},
"canReply": true,
"totalReplyCount": 0,
"isPublic": true
}
},
{
"kind": "youtube#commentThread",
"etag": "wFEgumlYzFR2ZLOsHgEdQoV45SI",
"id": "UgxaQ38-nL84EgK9ABh4AaABAg",
"snippet": {
"videoId": "BEWz4SXfyCQ",
"topLevelComment": {
"kind": "youtube#comment",
"etag": "KyMK87Zq9ej2AHtl44x5-ykwnzQ",
"id": "UgxaQ38-nL84EgK9ABh4AaABAg",
"snippet": {
"videoId": "BEWz4SXfyCQ",
"textDisplay": "Bring bob back and leave captain graybeard at the damn house",
"textOriginal": "Bring bob back and leave captain graybeard at the damn house",
"authorDisplayName": "Brad Johnson",
"authorProfileImageUrl": "https://yt3.ggpht.com/a/AATXAJzzXxTu9bz5hzGL20X1w3ALIcqIWBCc4uzuQPS8=s48-c-k-c0xffffffff-no-rj-mo",
"authorChannelUrl": "http://www.youtube.com/channel/UCwTUCnELUJ3IwcBsEwqjNaQ",
"authorChannelId": {
"value": "UCwTUCnELUJ3IwcBsEwqjNaQ"
},
"canRate": true,
"viewerRating": "none",
"likeCount": 1,
"publishedAt": "2020-07-08T18:37:35Z",
"updatedAt": "2020-07-08T18:37:35Z"
}
},
"canReply": true,
"totalReplyCount": 1,
"isPublic": true
},
"replies": {
"comments": [
{
"kind": "youtube#comment",
"etag": "eEq9MZRmGGq3sX4IpzEHk_pYvTw",
"id": "UgxaQ38-nL84EgK9ABh4AaABAg.9ArZ6N2FniS9ArdOylLUcm",
"snippet": {
"videoId": "BEWz4SXfyCQ",
"textDisplay": "No, because then there'd be no one to distract you from what a fraud Lazar is.",
"textOriginal": "No, because then there'd be no one to distract you from what a fraud Lazar is.",
"parentId": "UgxaQ38-nL84EgK9ABh4AaABAg",
"authorDisplayName": "Rombert Dillahuntsvalle",
"authorProfileImageUrl": "https://yt3.ggpht.com/a/AATXAJwALDysFZlmZoXLVeqzSZc6HcvUetsOCk6a2vTY=s48-c-k-c0xffffffff-no-rj-mo",
"authorChannelUrl": "http://www.youtube.com/channel/UCpdQrMvl72DIMs1vpsKvpgQ",
"authorChannelId": {
"value": "UCpdQrMvl72DIMs1vpsKvpgQ"
},
"canRate": true,
"viewerRating": "none",
"likeCount": 0,
"publishedAt": "2020-07-08T19:23:49Z",
"updatedAt": "2020-07-08T19:23:49Z"
}
}
]
}
},
The code is:
for i in data['items']:
    print(i['replies']['comments'][0]['snippet']['textOriginal'])
My apologies for the terrible formatting, but I couldn't get all of it to fit in the code block.
I am trying to retrieve the nested "replies" then "comments". I have searched extensively through similar posts, and am still stuck.
I keep getting a key error for 'replies'.
Any help would be much appreciated, thanks.
You need either to check if the key exists or use a try/except block:
for i in dct['items']:
    try:
        print(i['replies']['comments'][0]['snippet']['textOriginal'])
    except KeyError:
        pass
This yields for your given input:
No, because then there'd be no one to distract you from what a fraud Lazar is.
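The other option mentioned above, checking whether the key exists, could look roughly like this. It is a sketch on a trimmed-down stand-in for the response (the `data` dict and the reply text here are illustrative, not the full payload): `dict.get` with a default avoids the KeyError for threads that have no "replies" key.

```python
# Trimmed stand-in for the API response: the first thread has no "replies" key.
data = {
    "items": [
        {"snippet": {"totalReplyCount": 0}},
        {
            "snippet": {"totalReplyCount": 1},
            "replies": {
                "comments": [
                    {"snippet": {"textOriginal": "first reply"}},
                ]
            },
        },
    ]
}

first_replies = []
for item in data["items"]:
    # .get returns the default instead of raising KeyError when a key is missing.
    comments = item.get("replies", {}).get("comments", [])
    if comments:
        first_replies.append(comments[0]["snippet"]["textOriginal"])

print(first_replies)
```

Compared with try/except, this only tolerates the missing "replies" and "comments" keys; a missing "snippet" or "textOriginal" would still raise, which can be useful for spotting unexpected data.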

Read specific JSON object from response in Python [duplicate]

This question already has answers here:
Accessing elements of Python dictionary by index
(11 answers)
Closed 6 years ago.
When you have a JSON response that contains multiple JSON objects, how do you pull out a specific object within the JSON using Python?
For example, with the following JSON response, I have three objects in it.
{
"_links": {
"base": "REDACTED",
"context": "",
"self": "REDACTED"
},
"limit": 20,
"results": [
{
"_expandable": {
"ancestors": "",
"body": "",
"children": "",
"container": "",
"descendants": "",
"extensions": "",
"history": "/rest/api/content/198121503/history",
"metadata": "",
"operations": "",
"space": "/rest/api/space/ReleaseNotes",
"version": ""
},
"_links": {
"self": "REDACTED",
"tinyui": "/x/HxjPCw",
"webui": "UNIQUE_URL_HERE"
},
"id": "198121503",
"status": "current",
"title": "Unique Title of Content",
"type": "page"
},
{
"_expandable": {
"ancestors": "",
"body": "",
"children": "",
"container": "",
"descendants": "",
"extensions": "",
"history": "/rest/api/content/197195923/history",
"metadata": "",
"operations": "",
"space": "/rest/api/space/ReleaseNotes",
"version": ""
},
"_links": {
"self": "REDACTED",
"tinyui": "/x/k-jACw",
"webui": "UNIQUE_URL_HERE"
},
"id": "197195923",
"status": "current",
"title": "Unique Title of Content",
"type": "page"
},
{
"_expandable": {
"ancestors": "",
"body": "",
"children": "",
"container": "",
"descendants": "",
"extensions": "",
"history": "/rest/api/content/198121203/history",
"metadata": "",
"operations": "",
"space": "/rest/api/space/ReleaseNotes",
"version": ""
},
"_links": {
"self": "REDACTED",
"tinyui": "/x/8xbPCw",
"webui": "UNIQUE_URL_HERE"
},
"id": "198121203",
"status": "current",
"title": "Unique Title of Content",
"type": "page"
}
],
"size": 3,
"start": 0
}
How can I retrieve the ID and TITLE for a specific object in the response?
I read in other threads that when you use json.loads(your_json), it becomes a dictionary. If that's the case, how do I pull this data if it's stored as a dictionary?
Update
Let me clarify, as maybe I'm not seeing or explaining this clearly.
Is the only option to cycle through everything? Is there no way to say "get me the 2nd JSON object and return its ID and title"? If not, why shouldn't I create a custom object for each JSON object, store the items I want in it, and keep those objects in an array so that I can access each one?
After you parse the response into a Python dictionary, you can access its values by key.
for result in data['results']:
    print("id: {}, title: {}".format(result['id'], result['title']))
As you mentioned, you can use json.loads to turn the JSON string into a dictionary. But if you're using the requests library, just call response.json() to get the data in the required format.
Use bracket notation to access the keys after parsing the string with json.loads. Loop through the list under the "results" key until you find the object you want, like this:
j = json.loads(your_json)
for r in j["results"]:
    if r["title"] == "Something":
        print(r["id"])
        print(r["title"])
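On the "get me the 2nd object" part of the update: since "results" parses to a Python list, indexing it directly should work without a loop. The sketch below uses a trimmed-down copy of the response from the question (most fields omitted); `by_id` is a hypothetical helper name for an optional lookup table.

```python
import json

# Trimmed-down version of the response from the question.
your_json = '''{
  "results": [
    {"id": "198121503", "title": "Unique Title of Content"},
    {"id": "197195923", "title": "Unique Title of Content"},
    {"id": "198121203", "title": "Unique Title of Content"}
  ],
  "size": 3
}'''

data = json.loads(your_json)

# "results" is a list, so the 2nd object sits at index 1.
second = data["results"][1]
print(second["id"], second["title"])

# Optional: index the results by id for direct lookup later.
by_id = {r["id"]: r for r in data["results"]}
print(by_id["198121203"]["title"])
```

A custom class is not required for this; the parsed dictionaries already behave like the "array of objects" described in the update.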

ElasticSearch Parse Error

I am attempting to read JSON data from a network port scan and store the results in an ElasticSearch index as a document. However, whenever I try to do this, I get a MapperParsingException on the scan output results. In my mapping, I tried changing the index setting to not_analyzed and to no, but the error doesn't go away. Then I figured that ES might be trying to interpret certain values as dates, so I attempted to set date_format to 0 or none. That led to a dead end as well, with the mapping throwing an "Unsupported option" exception.
I have a dump of the values that I want to index in ElasticSearch here:
{
"protocol": "tcp",
"service": "ssh",
"state": "open",
"script_out": [
{
"output": "\n 1024 de:4e:50:33:cd:f6:8a:d0:c4:5a:e9:7d:1e:7b:13:12 (DSA)\nssh-dss AAAAB3NzaC1kc3MAAACBANkPx1nphZwsN1SVPPQHwz93abIHuEC4wMEeZiXdBC8RoSUUeCmdgPfIh4or0LvZ1pqaZP/k0qzCLyVxFt/eI7n36Lb9sZdVMf1Ao7E9TSc7lj9wg5ffY58WbWob/GQs1llGZ2K9Gp7oWuwCjKP164MsxMvahoJAAaWfap48ZiXpAAAAFQCnRMwRp8wBzzQU6lia8NegIb5rswAAAIEAxvN66VMDxE5aU8SvwwVmcUNwtVQWZ6pxn2W0gzF6H7JL1BhcnbCwQ3J/S6WdtqL2Dscw8drdAvsrN4XC8RT6Jowsir4q4HSQCybll6fSpNEdlv/nLIlYsH5ZuZZUIMxbTQ9vT0oYvzpDHejIQ/Zl1inYnJ+6XJmOc0LPUsu5PEsAAACAQO+Tsd3inLGskrqyrWSDO0VDD3cApYW7C+uTWXBfIoh/sVw+X9+OPa833w/PQkpacm68kYPXKS7GK8lqhg93dwbUNYFKz9MMNY6WVOjeAX9HtUAbglgLyRIt0CBqmL4snoZeKab22Nlmaf4aU5cHFlG9gnFEcK0vVIwIWp2EM/I=\n 2048 94:5f:86:77:81:39:2e:03:e0:42:d8:7d:10:a5:60:f0 (RSA)\nssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDV9BKj+QSavAr4UcDaCoHADVIaMOpcI5/hx/X9CRLDTxmB/WvEiL42tziMZEx7ipHT28/hl4HOwK64eXZuK75JMrMDutCZ2gmvRmvFKl6mAVbUEOlVkMGZeNJxATCZyWQyrZ6wA9E2ns5+id6l9C8we+bdq39cIR/e+yR8Ht8sfaigDi0gcW67GrHDI/oIgTQ79l+T/xAqCVrtQxqn/6pCuaCWQUVCxgOPXmJPbsd+g+oqZtm0aEjIJvcDJocMkZ2qMMlgMPeJBN27FCTKB80UUbV57iHXHzZF+cD7v+Jlw0fmyMapMkkPH+aabOUy7Kkbty1mucrFxaisLsckEf47",
"elements": {
"null": [
{
"type": "ssh-dss",
"bits": "1024",
"key": "AAAAB3NzaC1kc3MAAACBANkPx1nphZwsN1SVPPQHwz93abIHuEC4wMEeZiXdBC8RoSUUeCmdgPfIh4or0LvZ1pqaZP/k0qzCLyVxFt/eI7n36Lb9sZdVMf1Ao7E9TSc7lj9wg5ffY58WbWob/GQs1llGZ2K9Gp7oWuwCjKP164MsxMvahoJAAaWfap48ZiXpAAAAFQCnRMwRp8wBzzQU6lia8NegIb5rswAAAIEAxvN66VMDxE5aU8SvwwVmcUNwtVQWZ6pxn2W0gzF6H7JL1BhcnbCwQ3J/S6WdtqL2Dscw8drdAvsrN4XC8RT6Jowsir4q4HSQCybll6fSpNEdlv/nLIlYsH5ZuZZUIMxbTQ9vT0oYvzpDHejIQ/Zl1inYnJ+6XJmOc0LPUsu5PEsAAACAQO+Tsd3inLGskrqyrWSDO0VDD3cApYW7C+uTWXBfIoh/sVw+X9+OPa833w/PQkpacm68kYPXKS7GK8lqhg93dwbUNYFKz9MMNY6WVOjeAX9HtUAbglgLyRIt0CBqmL4snoZeKab22Nlmaf4aU5cHFlG9gnFEcK0vVIwIWp2EM/I=",
"fingerprint": "de4e5033cdf68ad0c45ae97d1e7b1312"
},
{
"type": "ssh-rsa",
"bits": "2048",
"key": "AAAAB3NzaC1yc2EAAAADAQABAAABAQDV9BKj+QSavAr4UcDaCoHADVIaMOpcI5/hx/X9CRLDTxmB/WvEiL42tziMZEx7ipHT28/hl4HOwK64eXZuK75JMrMDutCZ2gmvRmvFKl6mAVbUEOlVkMGZeNJxATCZyWQyrZ6wA9E2ns5+id6l9C8we+bdq39cIR/e+yR8Ht8sfaigDi0gcW67GrHDI/oIgTQ79l+T/xAqCVrtQxqn/6pCuaCWQUVCxgOPXmJPbsd+g+oqZtm0aEjIJvcDJocMkZ2qMMlgMPeJBN27FCTKB80UUbV57iHXHzZF+cD7v+Jlw0fmyMapMkkPH+aabOUy7Kkbty1mucrFxaisLsckEf47",
"fingerprint": "945f867781392e03e042d87d10a560f0"
}
]
},
"id": "ssh-hostkey"
}
],
"banner": "product: OpenSSH version: 6.2 extrainfo: protocol 2.0",
"port": "22"
},
Update
I am able to index the content in the "output" key. However, the error appears when I try and index the content in the "elements" key
Update 2
It's possible that something is wrong with my mapping. This is the Python code that I am using for the mapping:
"scan_info": {
"properties": {
"protocol": {
"type": "string",
"index": "analyzed"
},
"service": {
"type": "string",
"index": "analyzed"
},
"state": {
"type": "string",
"index": "not_analyzed"
},
"banner": {
"type": "string",
"index": "analyzed"
},
"port": {
"type": "string",
"index": "not_analyzed"
},
"script_out": { #is this the problem??
"type": "object",
"dynamic": True
}
}
}
I am drawing a blank here. What do I need to do?
