I get the JSON response with requests.get
req = requests.get(SAMPLE_SCHEDULE_API)
and convert it into a dictionary
data = json.loads(req.text)["data"]
When I tried to convert that string into a Python dict with ast.literal_eval(data), I got ValueError: malformed node or string. I have no idea how to do this task.
Code snippet:
def schedules(cls, start_date=None, end_date=None):
    import ast
    req = requests.get(SAMPLE_SCHEDULE_API)
    data = json.loads(req.text)["data"]
    ast.literal_eval(data)
    return pd.DataFrame(json.loads(req.text)["data"])
JSON response
{
status: "ok",
version: "v1",
data: "[
{"_id":"2015-01-28","end_date":"2015-01-28","estimated_release":1422453600000,"is_projection":false,"is_statement":true,"material_link":null,"start_date":"2015-01-27"},
{"_id":"2015-03-18","end_date":"2015-03-18","estimated_release":1426687200000,"is_projection":false,"is_statement":false,"material_link":null,"start_date":"2015-03-17"},
{"_id":"2015-04-29","end_date":"2015-04-29","estimated_release":1430316000000,"is_projection":false,"is_statement":false,"material_link":null,"start_date":"2015-04-28"},
{"_id":"2015-06-17","end_date":"2015-06-17","estimated_release":1434549600000,"is_projection":false,"is_statement":false,"material_link":null,"start_date":"2015-06-16"},
{"_id":"2015-07-29","end_date":"2015-07-29","estimated_release":1438178400000,"is_projection":false,"is_statement":false,"material_link":null,"start_date":"2015-07-28"}]"
}
Detailed error message:
Traceback (most recent call last):
File "fomc.py", line 25, in <module>
schedules = FOMC.schedules()
File "fomc.py", line 21, in schedules
ast.literal_eval(data)
File "/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/ast.py", line 86, in literal_eval
return _convert(node_or_string)
File "/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/ast.py", line 58, in _convert
return list(map(_convert, node.elts))
File "/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/ast.py", line 63, in _convert
in zip(node.keys, node.values))
File "/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/ast.py", line 62, in <genexpr>
return dict((_convert(k), _convert(v)) for k, v
File "/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/ast.py", line 85, in _convert
raise ValueError('malformed node or string: ' + repr(node))
ValueError: malformed node or string: <_ast.Name object at 0x10a19c990>
The data has been encoded twice (which strictly shouldn't be necessary). You just need to decode it again with json.loads:
def schedules(cls, start_date=None, end_date=None):
    req = requests.get(SAMPLE_SCHEDULE_API)
    data_json = json.loads(req.text)["data"]
    data = json.loads(data_json)
    return pd.DataFrame(data)
Do note that ast.literal_eval is for Python literals, whereas json.loads is for JSON, which closely follows JavaScript syntax; the differences are, for example, true, false and null versus True, False and None. The former is the JavaScript syntax used in JSON (and thus you would need json.loads); the latter is Python code, for which you would use ast.literal_eval.
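A quick standalone sketch of that difference (not part of the question's code):

```python
import ast
import json

# JSON uses JavaScript-style literals: true, false, null
assert json.loads('{"flag": true, "missing": null}') == {'flag': True, 'missing': None}

# ast.literal_eval expects Python-style literals: True, False, None
assert ast.literal_eval("{'flag': True, 'missing': None}") == {'flag': True, 'missing': None}

# Feeding JSON syntax to ast.literal_eval raises ValueError, as in the question
try:
    ast.literal_eval('{"flag": true}')
except ValueError as exc:
    print(exc)  # malformed node or string: ...
```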
As the response is already in JSON format, you do not need to encode it again. Approach it like this:
req = requests.get(SAMPLE_SCHEDULE_API)
data_str = req.json().get('data')
json_data = json.loads(data_str)
The json() method returns the JSON-decoded content of the response.
The field "data" is a string, not a list. The content of that string seems to be JSON, too, so you have JSON encapsulated in JSON for some reason. If you can, fix that so that you only encode as JSON once. If that doesn't work, you can retrieve that field and decode it separately.
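To illustrate the JSON-in-JSON situation with a self-contained sketch (the payload here is a made-up stand-in for the API response):

```python
import json

# Simulate a server that JSON-encodes a list, then embeds that string in another JSON document
inner = json.dumps([{"_id": "2015-01-28", "is_statement": True}])
payload = json.dumps({"status": "ok", "data": inner})

outer = json.loads(payload)
print(type(outer["data"]))            # <class 'str'>, still a string after one decode
records = json.loads(outer["data"])   # second decode unwraps the embedded JSON
print(records[0]["_id"])              # 2015-01-28
```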
Related
I am using the Python code below (a Lambda function) for data transformation with Kinesis Data Firehose. I am getting the error below.
Code:
#This function is created to Transform the data from Kinesis Data Firehose -> S3 Bucket
#It converts single line json to multi line json as expected by AWS Athena best practice.
#It also removes special characters from json keys (column name in Athena) as Athena expects column names without special characters
import json
import boto3
import base64
import string
from typing import Optional, Iterable, Union

delete_dict = {sp_character: '' for sp_character in string.punctuation}
PUNCT_TABLE = str.maketrans(delete_dict)
output = []

def lambda_handler(event, context):
    for record in event['records']:
        payload = base64.b64decode(record['data']).decode('utf-8')
        remove_special_char = json.loads(payload, object_pairs_hook=clean_keys)
        row_w_newline = str(remove_special_char) + "\n"
        row_w_newline = base64.b64encode(row_w_newline.encode('utf-8'))
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': row_w_newline
        }
        output.append(output_record)
    print('Processed {} records.'.format(len(event['records'])))
    return {'records': output}

def strip_punctuation(s: str,
                      exclude_chars: Optional[Union[str, Iterable]] = None) -> str:
    """
    Remove punctuation and spaces from a string.
    If `exclude_chars` is passed, certain characters will not be removed
    from the string.
    """
    punct_table = PUNCT_TABLE.copy()
    if exclude_chars:
        for char in exclude_chars:
            punct_table.pop(ord(char), None)
    # Next, remove the desired punctuation from the string
    return s.translate(punct_table)

def clean_keys(o):
    return {strip_punctuation(k): v for k, v in o}
Error:
An error occurred during JSON serialization of response: b'eyd2ZXJzaW9uJzogJzAnLCAnaWQnOiAnNjFhMGI4YjQtOGRhYS0xNGMwLTllOTMtNzhhNjk0MTY0MDgxJywgJ2RldGFpbHR5cGUnOiAnQVdTIEFQSSBDYWxsIHZpYSBDbG91ZFRyYWlsJywgJ3NvdXJjZSc6ICdhd3Muc2VjdXJpdHlodWInLCAnYWNjb3VudCc6ICc5MzQ3NTU5ODkxNzYnLCAndGltZSc6ICcyMDIxLTExLTIzVDE1OjQxOjQ3WicsICdyZWdpb24nOiAndXMtZWFzdC0xJywgJ3Jlc291cmNlcyc6IFtdLCAnZGV0YWlsJzogeydldmVudFZlcnNpb24nOiAnMS4wOCcsICd1c2VySWRlbnRpdH'
is not JSON serializable
Traceback (most recent call last):
File "/var/lang/lib/python3.6/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/var/lang/lib/python3.6/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/var/lang/lib/python3.6/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/var/runtime/bootstrap.py", line 135, in decimal_serializer
raise TypeError(repr(o) + " is not JSON serializable")
Event:
{'recordId': '49623720050963652954313901532126731765249603147428528130000000', 'approximateArrivalTimestamp': 1637711607661, 'data': 'eyJ2ZXJzaW9uIjoiMCIsImlkIjoiMzFkOGE3MmItYWUxNC02ZDYzLWRjODUtMTZmNWViMzk3ZTAyIiwiZGV0YWlsLXR5cGUiOiJBV1MgQVBJIENhbGwgdmlhIENsb3VkVHJhaWwiLCJzb3VyY2UiOiJhd3Muc2VjdXJpdHlodWIiLCJhY2NvdW50IjoiMjIwMzA3MjAyMzYyIiwidGltZSI6IjIwMjEtMTEtMjNUMjM6NTM6MTdaIiwicmVnaW9uIjoidXMtd2VzdC0yIiwicmVzb3VyY2VzIjpbXSwiZGV0YWlsIjp7ImV2ZW50VmVyc2lvbiI6IjEuMDgiLCJ1c2VySWRlbnRpdHkiOnsidHlwZSI6IlJvb3QiLCJwcmluY2lwYWxJZCI6IjIyMDMwNzIwMjM2MiIsImFybiI6ImFybjphd3M6aWFtOjoyMjAzMDcyMDIzNjI6cm9vdCIsImFjY291bnRJZCI6IjIyMDMwNzIwMjM2MiIsImFjY2Vzc0tleUlkIjoiQVNJQVRHUzJWRUU1TEQ2TUZFRlYiLCJzZXNzaW9uQ29udGV4dCI6eyJzZXNzaW9uSXNzdWVyIjp7fSwid2ViSWRGZWRlcmF0aW9uRGF0YSI6e30sImF0dHJpYnV0ZXMiOnsiY3JlYXRpb25EYXRlIjoiMjAyMS0xMS0yM1QxNToxMDo1N1oiLCJtZmFBdXRoZW50aWNhdGVkIjoiZmFsc2UifX19LCJldmVudFRpbWUiOiIyMDIxLTExLTIzVDIzOjUzOjE3WiIsImV2ZW50U291cmNlIjoic2VjdXJpdHlodWIuYW1hem9uYXdzLmNvbSIsImV2ZW50TmFtZSI6IkJhdGNoRGlzYWJsZVN0YW5kYXJkcyIsImF3c1JlZ2lvbiI6InVzLXdlc3QtMiIsInNvdXJjZUlQQWRkcmVzcyI6IjEwNC4xMjkuMTk4LjEwMSIsInVzZXJBZ2VudCI6ImF3cy1pbnRlcm5hbC8zIGF3cy1zZGstamF2YS8xLjEyLjExMiBMaW51eC81LjQuMTU2LTk0LjI3My5hbXpuMmludC54ODZfNjQgT3BlbkpES182NC1CaXRfU2VydmVyX1ZNLzI1LjMxMi1iMDcgamF2YS8xLjguMF8zMTIgdmVuZG9yL09yYWNsZV9Db3Jwb3JhdGlvbiBjZmcvcmV0cnktbW9kZS9zdGFuZGFyZCIsInJlcXVlc3RQYXJhbWV0ZXJzIjp7IlN0YW5kYXJkc1N1YnNjcmlwdGlvbkFybnMiOlsiYXJuOmF3czpzZWN1cml0eWh1Yjp1cy13ZXN0LTI6MjIwMzA3MjAyMzYyOnN1YnNjcmlwdGlvbi9hd3MtZm91bmRhdGlvbmFsLXNlY3VyaXR5LWJlc3QtcHJhY3RpY2VzL3YvMS4wLjAiXX0sInJlc3BvbnNlRWxlbWVudHMiOnsiU3RhbmRhcmRzU3Vic2NyaXB0aW9ucyI6W3siU3RhbmRhcmRzQXJuIjoiYXJuOmF3czpzZWN1cml0eWh1Yjp1cy13ZXN0LTI6OnN0YW5kYXJkcy9hd3MtZm91bmRhdGlvbmFsLXNlY3VyaXR5LWJlc3QtcHJhY3RpY2VzL3YvMS4wLjAiLCJTdGFuZGFyZHNJbnB1dCI6e30sIlN0YW5kYXJkc1N0YXR1cyI6IkRFTEVUSU5HIiwiU3RhbmRhcmRzU3Vic2NyaXB0aW9uQXJuIjoiYXJuOmF3czpzZWN1cml0eWh1Yjp1cy13ZXN0LTI6MjIwMzA3MjAyMzYyOnN1YnNjcmlwdGlvbi9hd3MtZm91bmRhdGlvbmFsLXNlY3VyaXR5LWJlc3QtcH
JhY3RpY2VzL3YvMS4wLjAiLCJTdGFuZGFyZHNTdGF0dXNSZWFzb24iOnsiU3RhdHVzUmVhc29uQ29kZSI6Ik5PX0FWQUlMQUJMRV9DT05GSUdVUkFUSU9OX1JFQ09SREVSIn19XX0sInJlcXVlc3RJRCI6IjcyYzVjODYyLTJmOWEtNDBjYS05NDExLTY2YzIxMTcyNjIxMCIsImV2ZW50SUQiOiI3YWY4NjFiZS03YjExLTRmOTQtOWZlYS0yYTgyZjg5NDIxNWYiLCJyZWFkT25seSI6ZmFsc2UsImV2ZW50VHlwZSI6IkF3c0FwaUNhbGwiLCJtYW5hZ2VtZW50RXZlbnQiOnRydWUsInJlY2lwaWVudEFjY291bnRJZCI6IjIyMDMwNzIwMjM2MiIsImV2ZW50Q2F0ZWdvcnkiOiJNYW5hZ2VtZW50In19'}
This code resolved the issue above:
def lambda_handler(event, context):
    for record in event['records']:
        payload = base64.b64decode(record['data']).decode('utf-8')
        remove_special_char = json.loads(payload, object_pairs_hook=clean_keys)
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(json.dumps(remove_special_char).encode('utf-8') + b'\n').decode('utf-8')
        }
        output.append(output_record)
    print('Processed {} records.'.format(len(event['records'])))
    return {'records': output}
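The key change is serializing with json.dumps instead of str(): Python's str() on a dict produces single-quoted output that is not valid JSON. A minimal sketch:

```python
import json

record = {'recordId': '123', 'result': 'Ok'}

print(str(record))        # {'recordId': '123', 'result': 'Ok'}, single quotes, not valid JSON
print(json.dumps(record)) # {"recordId": "123", "result": "Ok"}, valid JSON

# Downstream JSON consumers reject the str() form:
try:
    json.loads(str(record))
except json.JSONDecodeError:
    print("str() output is not parseable as JSON")
```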
I am parsing a scraped HTML page that contains a script with JSON inside. This JSON contains all the info I am looking for, but I can't figure out how to extract valid JSON from it.
Minimal example:
my_string = '''
(function(){
    window.__PRELOADED_STATE__ = window.__PRELOADED_STATE__ || [];
    window.__PRELOADED_STATE__.push(
        { *placeholder representing valid JSON inside* }
    );
})()
'''
The json inside is valid according to jsonlinter.
The result should be loaded into a dictionary:
import json
import re
my_json = re.findall(r'.*(?={\").*', my_string)[0]  # extract json
data = json.loads(my_json)
# print(data)
regex: https://regex101.com/r/r0OYZ0/1
This attempt results in:
>>> data = json.loads(my_json)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/code.py", line 90, in runcode
exec(code, self.locals)
File "<console>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 7 (char 6)
How can the JSON be extracted and loaded from the string with Python 3.7.x?
You can try extracting it with this regex; it's a very simple case and might not handle all possible JSON variations:
my_string = '''
(function(){
    window.__PRELOADED_STATE__ = window.__PRELOADED_STATE__ || [];
    window.__PRELOADED_STATE__.push(
        {"tst":{"f":3}}
    );
})()
'''
result = re.findall(r"push\(\s*([{\[].*?[}\]])\s*\)", my_string, re.DOTALL)[0]
result
>>> '{"tst":{"f":3}}'
To parse it into a dictionary now:
import json
dictionary = json.loads(result)
type(dictionary)
>>> dict
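If the regex proves brittle, another option (a sketch, not from either answer) is to locate the opening brace of the payload and let json.JSONDecoder.raw_decode consume exactly one JSON value from that position:

```python
import json

my_string = '''
(function(){
    window.__PRELOADED_STATE__ = window.__PRELOADED_STATE__ || [];
    window.__PRELOADED_STATE__.push(
        {"tst":{"f":3}}
    );
})()
'''

start = my_string.index('{"')  # first opening brace of the payload
data, end = json.JSONDecoder().raw_decode(my_string, start)
print(data)  # {'tst': {'f': 3}}
```

raw_decode stops at the end of the first complete JSON value, so trailing JavaScript like `);` is ignored.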
Have a look at the below. Note that { *placeholder representing valid JSON inside* } has to be valid JSON.
my_string = '''
<script>
    (function(){
        window.__PRELOADED_STATE__ = window.__PRELOADED_STATE__ || [];
        window.__PRELOADED_STATE__.push(
            {"foo":["bar1", "bar2"]}
        );
    })()
</script>
'''
import re, json
my_json = re.findall(r'.*(?={\").*', my_string)[0].strip()
data = json.loads(my_json)
print(data)
Output:
{'foo': ['bar1', 'bar2']}
The my_string provided here is not valid JSON. For valid JSON, you can use json.loads(JSON_STRING)
import json
d = json.loads('{"test":2}')
print(d) # Prints the dictionary `{'test': 2}`
I have the following JSON structure coming from a Flask REST API.
It is more than one JSON document, based on how many assets we query for, and I am not able to convert it to a pandas DataFrame.
from flask import Flask
import requests
import pandas as pd
import json
url = "http://localhost:5000/getpqdata"
random_cols = ['AAPL', 'MSFT']
JsonOutput = {'Assets': random_cols}
headers = {'Content-type': 'application/json'}
response = requests.post(url, json=JsonOutput, headers=headers)
rawdata = response.text
rawdata comes back as below:
rawdata = '''[{"APPL": 1.067638}, {"AAPL": -1.996081}]
[{"MSFT": 0.086638}, {"MSFT": -0.926081}]'''
data = json.loads(rawdata)
df = pd.DataFrame(data)
print(df)
It gives the following error.
C:\Python36>python D:\Python\pyarrow\RestfulAPI\test.py
Traceback (most recent call last):
File "D:\Python\pyarrow\RestfulAPI\test.py", line 36, in <module>
data = json.loads(rawdata)
File "C:\Python36\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\Python36\lib\json\decoder.py", line 342, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 13 (char 54)
The problem you are having does not have anything to do with pandas but rather with the JSON decoding. json.loads(...) only supports one JSON object. Your rawdata has 2 JSON objects in it. Thus when it reaches the second line it tells you there is extra data. You can see a potential solution to that in this answer.
In short, you can do something like this:
def parse_json_stream(stream):
    decoder = json.JSONDecoder()
    while stream:
        obj, idx = decoder.raw_decode(stream)
        yield obj
        stream = stream[idx:].lstrip()
parsed_data = list(parse_json_stream(rawdata))
print(parsed_data)
[[{'APPL': 1.067638}, {'AAPL': -1.996081}], [{'MSFT': 0.086638}, {'MSFT': -0.926081}]]
As for converting it to a DataFrame, it depends on how you want to organize your data.
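For example, one way (a sketch, assuming each inner list should become one row) is to merge each list of single-key dicts into a row dict and hand those to pandas:

```python
parsed_data = [[{'APPL': 1.067638}, {'AAPL': -1.996081}],
               [{'MSFT': 0.086638}, {'MSFT': -0.926081}]]

# Merge each chunk's single-key dicts into one dict per row;
# note that repeated keys (MSFT appears twice) overwrite earlier values
rows = [{k: v for d in chunk for k, v in d.items()} for chunk in parsed_data]
print(rows)
# [{'APPL': 1.067638, 'AAPL': -1.996081}, {'MSFT': -0.926081}]

# df = pd.DataFrame(rows) would then give one row per chunk
```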
I looked up lots of similar questions, but sadly none of them are close to mine.
I have a simple script that checks the balance from an exchange. It is part of an unofficial API wrapper written in Python, and my understanding is that it is stuck somewhere between Python 2 and Python 3. I fixed the errors one by one, but I'm completely stuck with this one. Here is the code:
import urllib.parse
import urllib.request
import json
import time
import hmac,hashlib
class Poloniex():
    def __init__(self, APIKey, Secret):
        self.APIKey = APIKey
        self.Secret = Secret

    def api_query(self, command, req={}):
        self.req = bytes(req, 'utf-8')
        req['command'] = command
        req['nonce'] = int(time.time()*1000)
        post_data = urllib.parse.quote(req)
        my_key = self.Secret
        my_key_bytes = bytes(my_key, 'utf-8')
        post_data_bytes = bytes(post_data, 'utf-8')
        sign = hmac.new(my_key_bytes, post_data_bytes, hashlib.sha512).hexdigest()
        headers = {
            'Sign': sign,
            'Key': my_key_bytes,
            #'Content-Type': 'application/json'
        }
        ret = urllib.request.urlopen(
            urllib.parse.quote('https://poloniex.com/tradingApi', safe=':/'), post_data_bytes,
            headers)
        jsonRet = json.loads(ret.read())
        return self.post_process(jsonRet)

    def returnBalances(self):
        return self.api_query('returnBalances')

inst = Poloniex("AAA-BBB", "123abcdef")
balance = inst.returnBalances()
print(balance)
It looks like I have a problem with syntax, but even after reading the manual I can't figure this out. It throws:
TypeError: encoding without a string argument
and before that I had:
TypeError: quote_from_bytes() expected bytes
which was 'fixed' by
self.req = bytes(req, 'utf-8')
Can anybody please point me in the right direction?
Thank you.
UPD
Sorry, I forgot the traceback:
Traceback (most recent call last):
File "script.py", line 43, in <module>
balance = inst.returnBalances()
File "script.py", line 37, in returnBalances
return self.api_query('returnBalances')
File "script.py", line 18, in api_query
post_data = urllib.parse.quote(req)
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/parse.py", line 775, in quote
return quote_from_bytes(string, safe)
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/parse.py", line 800, in quote_from_bytes
raise TypeError("quote_from_bytes() expected bytes")
TypeError: quote_from_bytes() expected bytes
In your code, req is a dictionary, but you're attempting to convert it to bytes here: self.req = bytes(req, 'utf-8'), which doesn't make sense since only strings can be converted this way.
The second error is caused by the fact that urllib.parse.quote works only with strings and bytes, but you're passing it a dictionary.
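A minimal sketch of the likely intent (urlencode, not quote, is the usual tool for turning a dict of POST parameters into a string; the key and values here are made up):

```python
import hashlib
import hmac
import time
import urllib.parse

req = {'command': 'returnBalances', 'nonce': int(time.time() * 1000)}

# urlencode accepts a dict; quote only accepts str/bytes
post_data = urllib.parse.urlencode(req)       # 'command=returnBalances&nonce=...'
post_data_bytes = post_data.encode('utf-8')   # bytes for urlopen and hmac

sign = hmac.new(b'secret', post_data_bytes, hashlib.sha512).hexdigest()
print(post_data.startswith('command=returnBalances'))  # True
```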
I am making an API call, and the response contains Unicode characters. Loading this response into a file throws the following error:
'ascii' codec can't encode character u'\u2019' in position 22462
I've tried all combinations of decode and encode ('utf-8').
Here is the code:
url = "https://%s?start_time=%s&include=metric_sets,users,organizations,groups" % (api_path, start_epoch)

while url != None and url != "null":
    json_filename = "%s/%s.json" % (inbound_folder, start_epoch)
    try:
        resp = requests.get(url,
                            auth=(api_user, api_pwd),
                            headers={'Content-Type': 'application/json'})
    except requests.exceptions.RequestException as e:
        print "|********************************************************|"
        print e
        return "Error: {}".format(e)
        print "|********************************************************|"
        sys.exit(1)
    try:
        total_records_extracted = total_records_extracted + rec_cnt
        jsonfh = open(json_filename, 'w')
        inter = resp.text
        string_e = inter#.decode('utf-8')
        final = string_e.replace('\\n', ' ').replace('\\t', ' ').replace('\\r', ' ')#.replace('\\ ',' ')
        encoded_data = final.encode('utf-8')
        cleaned_data = json.loads(encoded_data)
        json.dump(cleaned_data, jsonfh, indent=None)
        jsonfh.close()
    except ValueError as e:
        tb = traceback.format_exc()
        print tb
        print "|********************************************************|"
        print e
        print "|********************************************************|"
        sys.exit(1)
Lots of developers have faced this issue. A lot of places suggest using .decode('utf-8') or putting # _*_ coding: utf-8 _*_ at the top of the Python file.
It is still not helping.
Can someone help me with this issue?
Here is the trace:
Traceback (most recent call last):
File "/Users/SM/PycharmProjects/zendesk/zendesk_tickets_api.py", line 102, in main
cleaned_data = json.loads(encoded_data)
File "/Users/SM/anaconda/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/Users/SM/anaconda/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/SM/anaconda/lib/python2.7/json/decoder.py", line 380, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid \escape: line 1 column 2826494 (char 2826493)
|********************************************************|
Invalid \escape: line 1 column 2826494 (char 2826493)
inter = resp.text
string_e = inter#.decode('utf-8')
encoded_data = final.encode('utf-8')
The text property is a Unicode character string, decoded from the original bytes using whatever encoding the Requests module guessed might be in use from the HTTP headers.
You probably don't want that; JSON has its own ideas about what the encoding should be, so you should let the JSON decoder do that by taking the raw response bytes from resp.content and passing them straight to json.loads.
What's more, Requests has a shortcut method to do the same: resp.json().
final = string_e.replace('\\n', ' ').replace('\\t', ' ').replace('\\r', ' ')#.replace('\\ ',' ')
Trying to do this on the JSON-string-literal formatted input is a bad idea: you will miss some valid escapes and incorrectly unescape others. Your actual error has nothing to do with Unicode at all; it's that this replacement is mangling the input. For example, consider the input JSON:
{"message": "Open the file C:\\newfolder\\text.txt"}
after replacement:
{"message": "Open the file C:\ ewfolder\ ext.txt"}
which is clearly not valid JSON.
Instead of trying to operate on the JSON-encoded string, you should let json decode the input and then filter any strings in the structured output. This may involve using a recursive function to walk down into each level of the data looking for strings to filter, e.g.:
def clean(data):
    if isinstance(data, basestring):
        return data.replace('\n', ' ').replace('\t', ' ').replace('\r', ' ')
    if isinstance(data, list):
        return [clean(item) for item in data]
    if isinstance(data, dict):
        return {clean(key): clean(value) for (key, value) in data.items()}
    return data

cleaned_data = clean(resp.json())
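Note that basestring exists only in Python 2; in Python 3 the same recursive filter can be sketched with str (the input string here is made up for illustration):

```python
import json

def clean(data):
    # Python 3: check str instead of basestring
    if isinstance(data, str):
        return data.replace('\n', ' ').replace('\t', ' ').replace('\r', ' ')
    if isinstance(data, list):
        return [clean(item) for item in data]
    if isinstance(data, dict):
        return {clean(key): clean(value) for key, value in data.items()}
    return data

cleaned = clean(json.loads('{"message": "line1\\nline2", "items": ["a\\tb"]}'))
print(cleaned)  # {'message': 'line1 line2', 'items': ['a b']}
```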