I have the following JSON structure coming from a Flask REST API.
The response contains more than one JSON document, depending on how many assets we query for,
and I am not able to convert it to a pandas DataFrame.
from flask import Flask
import requests
import pandas as pd
import json
url = "http://localhost:5000/getpqdata"
random_cols = ['AAPL', 'MSFT']
JsonOutput = {'Assets': random_cols}
headers = {'Content-type': 'application/json'}
response = requests.post(url, json=JsonOutput, headers=headers)
rawdata = response.text
rawdata comes back as below:
rawdata = '''[{"APPL": 1.067638}, {"AAPL": -1.996081}]
[{"MSFT": 0.086638}, {"MSFT": -0.926081}]'''
data = json.loads(rawdata)
df = pd.DataFrame(data)
print(df)
It gives the following error.
C:\Python36>python D:\Python\pyarrow\RestfulAPI\test.py
Traceback (most recent call last):
File "D:\Python\pyarrow\RestfulAPI\test.py", line 36, in <module>
data = json.loads(rawdata)
File "C:\Python36\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\Python36\lib\json\decoder.py", line 342, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 13 (char 54)
The problem you are having has nothing to do with pandas but rather with the JSON decoding. json.loads(...) only supports a single JSON document. Your rawdata contains two JSON documents, so when the decoder reaches the second line it reports extra data. You can see a potential solution to that in this answer.
In short, you can do something like this:
def parse_json_stream(stream):
    decoder = json.JSONDecoder()
    while stream:
        obj, idx = decoder.raw_decode(stream)
        yield obj
        stream = stream[idx:].lstrip()
parsed_data = list(parse_json_stream(rawdata))
print(parsed_data)
[[{'APPL': 1.067638}, {'AAPL': -1.996081}], [{'MSFT': 0.086638}, {'MSFT': -0.926081}]]
As for converting it to a DataFrame, it depends on how you want to organize your data.
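For example, a minimal sketch (assuming each inner list holds the values for a single ticker, that rows align by position, and that the 'APPL' key in the sample is the 'AAPL' typo corrected) could flatten the parsed output into one column per asset:

import pandas as pd

# Parsed output from parse_json_stream, with the 'APPL' typo corrected to 'AAPL'.
parsed_data = [
    [{'AAPL': 1.067638}, {'AAPL': -1.996081}],
    [{'MSFT': 0.086638}, {'MSFT': -0.926081}],
]

# Collect the values for each ticker, keeping their original order.
columns = {}
for asset_rows in parsed_data:
    for row in asset_rows:
        for ticker, value in row.items():
            columns.setdefault(ticker, []).append(value)

df = pd.DataFrame(columns)
print(df)
#        AAPL      MSFT
# 0  1.067638  0.086638
# 1 -1.996081 -0.926081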
I am trying to convert a JSON response I get after calling an API into an Excel file. I don't need to add specific headers or get only specific data; I just need everything this call returns.
I have found the tablib library.
I managed to make it work when I fetched data from GitLab, but after changing my payload to TFS I get an error, and I am not sure what the problem is or how to resolve it.
This is my code:
import requests
import urllib3
import json
from requests.packages.urllib3.exceptions import InsecureRequestWarning
import tablib
import datetime
import time
import os


class gitlab():
    def get_closed():
        url = "https://IP:443/DefaultCollection/_apis/projects"
        payload = {}
        querystring = {"api-version": "4.1"}
        headers = {
            'Content-Type': "application/json-patch+json",
            'Authorization': "KEY"
        }
        requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
        response = requests.request(
            "GET", url, headers=headers, data=payload, params=querystring, verify=False)
        return json.loads(response.text)


if __name__ == "__main__":
    list_b = gitlab.get_closed()
    print(list_b)
    data = tablib.Dataset()
    data.json = json.dumps(list_b)
    data_export = data.export('xlsx')
    filename = os.path.dirname(os.path.realpath(__file__)) + '/closed_' + str(datetime.date.today()) + '.xlsx'
    with open(filename, 'wb') as f:
        f.write(data_export)
        f.close()
Executing the script with Python 3.7.2 shows the following error:
{'count': 1, 'value': [{'id': 'ID', 'name': 'TFS', 'url': 'https://TFS/DefaultCollection/_apis/projects/PROJ', 'state': 'wellFormed', 'revision': 00, 'visibility': 'private'}]}
Traceback (most recent call last):
  File ".\gitlab.py", line 94, in <module>
    data.json = json.dumps(list_with_bugs, indent=4, ensure_ascii=False)
  File "C:\Users\marialena\AppData\Local\Programs\Python\Python37\lib\site-packages\tablib\formats\_json.py", line 39, in import_set
    dset.dict = json.loads(in_stream)
  File "C:\Users\marialena\AppData\Local\Programs\Python\Python37\lib\site-packages\tablib\core.py", line 381, in _set_dict
    if isinstance(pickle[0], list):
KeyError: 0
You can see the API response in the output as well. Why is tablib failing to convert it to Excel?
Dataset.json expects to receive a serialised list. The code in the question is passing a serialised dict, and this is the cause of the error.
Looking at the data, it seems the value of the value key in the dictionary is what is required, so pass that to the dataset.
>>> import json
>>> import tablib
>>> d = {'count': 1, 'value': [{'id': 'ID', 'name': 'TFS', 'url': 'https://TFS/DefaultCollection/_apis/projects/PROJ', 'state': 'wellFormed', 'revision': 00, 'visibility': 'private'}]}
>>> ds = tablib.Dataset()
>>> ds.json = json.dumps(d['value'])
>>> with open('test.xlsx', 'wb') as f:
... f.write(ds.export('xlsx'))
I see that you have raised a bug on the project's issue tracker. While this behaviour arguably isn't a bug, it would certainly be an improvement if the code emitted a more meaningful error message.
I am trying to work with Google Vision and Python. I am using the sample files but I keep getting the same error message:
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\google\protobuf\json_format.py", line 416, in Parse
    js = json.loads(text, object_pairs_hook=_DuplicateChecker)
  File "C:\Program Files (x86)\Python37-32\lib\json\__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "C:\Program Files (x86)\Python37-32\lib\json\decoder.py", line 338, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files (x86)\Python37-32\lib\json\decoder.py", line 356, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "sample.py", line 72, in <module>
    async_detect_document('gs://matr/file_1035.pdf', 'gs://matr/output/')
  File "sample.py", line 59, in async_detect_document
    json_string, vision.types.AnnotateFileResponse())
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\google\protobuf\json_format.py", line 418, in Parse
    raise ParseError('Failed to load JSON: {0}.'.format(str(e)))
google.protobuf.json_format.ParseError: Failed to load JSON: Expecting value: line 1 column 1 (char 0).
I am guessing it has something to do with the resulting JSON file. It does produce a JSON file, but I guess it should also print it out to the command line. Here are the first few lines of the JSON file:
{
  "inputConfig": {
    "gcsSource": {
      "uri": "gs://python-docs-samples-tests/HodgeConj.pdf"
    },
    "mimeType": "application/pdf"
  },
The resulting file does load into a JSON object using
data = json.load(jsonfile)
I have tried print(json_string) but I only get b'placeholder'.
How can I get this to work? I am using Python 3.7.2
My code is below:
def async_detect_document(gcs_source_uri, gcs_destination_uri):
    """OCR with PDF/TIFF as source files on GCS"""
    from google.cloud import vision
    from google.cloud import storage
    from google.protobuf import json_format
    import re

    # Supported mime_types are: 'application/pdf' and 'image/tiff'
    mime_type = 'application/pdf'
    # How many pages should be grouped into each json output file.
    batch_size = 2

    client = vision.ImageAnnotatorClient()
    feature = vision.types.Feature(
        type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)

    gcs_source = vision.types.GcsSource(uri=gcs_source_uri)
    input_config = vision.types.InputConfig(
        gcs_source=gcs_source, mime_type=mime_type)

    gcs_destination = vision.types.GcsDestination(uri=gcs_destination_uri)
    output_config = vision.types.OutputConfig(
        gcs_destination=gcs_destination, batch_size=batch_size)

    async_request = vision.types.AsyncAnnotateFileRequest(
        features=[feature], input_config=input_config,
        output_config=output_config)

    operation = client.async_batch_annotate_files(
        requests=[async_request])

    print('Waiting for the operation to finish.')
    operation.result(timeout=180)

    # Once the request has completed and the output has been
    # written to GCS, we can list all the output files.
    storage_client = storage.Client()

    match = re.match(r'gs://([^/]+)/(.+)', gcs_destination_uri)
    bucket_name = match.group(1)
    prefix = match.group(2)

    bucket = storage_client.get_bucket(bucket_name=bucket_name)

    # List objects with the given prefix.
    blob_list = list(bucket.list_blobs(prefix=prefix))
    print('Output files:')
    for blob in blob_list:
        print(blob.name)

    # Process the first output file from GCS.
    # Since we specified batch_size=2, the first response contains
    # the first two pages of the input file.
    output = blob_list[0]

    json_string = output.download_as_string()
    response = json_format.Parse(
        json_string, vision.types.AnnotateFileResponse())

    # The actual response for the first page of the input file.
    first_page_response = response.responses[0]
    annotation = first_page_response.full_text_annotation

    # Here we print the full text from the first page.
    # The response contains more information:
    # annotation/pages/blocks/paragraphs/words/symbols
    # including confidence scores and bounding boxes
    print(u'Full text:\n{}'.format(
        annotation.text))


async_detect_document('gs://my_bucket/file_1035.pdf', 'gs://my_bucket/output/')
I received an answer from a user on a GitHub issue:
https://github.com/GoogleCloudPlatform/python-docs-samples/issues/2086#issuecomment-487635159
I had this issue and determined it was caused by the prefix being iterated as part of the blob list. I can see that "output/" is listed as a file in your output, and parsing is subsequently attempted on it, causing the error.
Try hardcoding a prefix, something like prefix = 'output/out', and that folder won't be included in the list.
The demo code should probably be modified to handle this simple case a little better.
import re


def async_detect_document(gcs_source_uri, gcs_destination_uri):
    """OCR with PDF/TIFF as source files on GCS"""
    from google.cloud import vision
    from google.cloud import storage
    from google.protobuf import json_format

    # Supported mime_types are: 'application/pdf' and 'image/tiff'
    mime_type = 'application/pdf'
    # How many pages should be grouped into each json output file.
    batch_size = 2

    client = vision.ImageAnnotatorClient()
    feature = vision.types.Feature(
        type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)

    gcs_source = vision.types.GcsSource(uri=gcs_source_uri)
    input_config = vision.types.InputConfig(
        gcs_source=gcs_source, mime_type=mime_type)

    gcs_destination = vision.types.GcsDestination(uri=gcs_destination_uri)
    output_config = vision.types.OutputConfig(
        gcs_destination=gcs_destination, batch_size=batch_size)

    async_request = vision.types.AsyncAnnotateFileRequest(
        features=[feature], input_config=input_config,
        output_config=output_config)

    operation = client.async_batch_annotate_files(
        requests=[async_request])

    print('Waiting for the operation to finish.')
    operation.result(timeout=180)

    # Once the request has completed and the output has been
    # written to GCS, we can list all the output files.
    storage_client = storage.Client()

    match = re.match(r'gs://([^/]+)/(.+)', gcs_destination_uri)
    bucket_name = match.group(1)
    prefix = match.group(2)

    bucket = storage_client.get_bucket(bucket_name=bucket_name)

    print('prefix: ' + prefix)
    prefix = 'output/out'
    print('prefix new: ' + prefix)

    # List objects with the given prefix.
    blob_list = list(bucket.list_blobs(prefix=prefix))
    print('Output files:')
    for blob in blob_list:
        print(blob.name)

    # Process the first output file from GCS.
    # Since we specified batch_size=2, the first response contains
    # the first two pages of the input file.
    output = blob_list[0]

    json_string = output.download_as_string()
    response = json_format.Parse(
        json_string, vision.types.AnnotateFileResponse())

    # The actual response for the first page of the input file.
    first_page_response = response.responses[0]
    annotation = first_page_response.full_text_annotation

    # Here we print the full text from the first page.
    # The response contains more information:
    # annotation/pages/blocks/paragraphs/words/symbols
    # including confidence scores and bounding boxes
    print(u'Full text:\n{}'.format(
        annotation.text))


async_detect_document('gs://my_bucket/my_file.pdf', 'gs://my_bucket/output/out')
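An alternative to hardcoding the prefix, sketched below on the assumption that the only non-JSON entry is the "output/" folder placeholder (a zero-byte object whose name ends with a slash), is to filter it out when building blob_list; bucket and prefix here are the same variables as in the snippet above:

# Hypothetical variant: skip the folder placeholder instead of changing the
# prefix. Assumes the placeholder is the only blob whose name ends with '/'.
blob_list = [
    blob for blob in bucket.list_blobs(prefix=prefix)
    if not blob.name.endswith('/')
]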
I am trying to run this code but it produces an error.
import json
import requests
import pprint

data = []
with open('data.txt') as o1:
    for line in o1:
        data.append(json.loads(line))
        print(data)
        print(" \n")

print(data)
url = 'http://xyz.abcdfx.in/devicedata'
body_json = json.dumps(data)
headers = {'Content-Type': 'application/json'}
d = requests.post(url, data=body_json, headers=headers)
pprint.pprint(d.json())
It shows:
ValueError: No JSON object could be decoded
I am new to programming and am not able to figure out what the problem is.
It seems like you are trying to parse the JSON file line by line, but a JSON object may (and usually does) span more than one line. You need to read the entire file in order to parse it:
with open('data.txt') as o1:
    data = json.loads(o1.read())  # read ALL the file and parse; no loops

print(data)
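If, on the other hand, data.txt really does contain one complete JSON object per line (the JSON Lines layout the question's loop assumes), a sketch like this builds the full list before posting it:

import json

# A sketch assuming data.txt is JSON Lines: one complete JSON object per line.
data = []
with open('data.txt') as o1:
    for line in o1:
        line = line.strip()
        if line:  # skip blank lines
            data.append(json.loads(line))

print(data)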
I solved my problem using this:
data = []
with open('data.txt') as f:
    for line in f:
        data = json.loads(line)
        print(data)
        url = 'http://xyz.abcdfx.cn/devicedata'
        body_json = json.dumps(data)
        headers = {'Content-Type': 'application/json'}
        d = requests.post(url, data=body_json, headers=headers)
        pprint.pprint(d.json())
I am new to using JSON with Python, and I am getting the error simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0). I am trying to get the ["series"]["TimeStamp"] data.
import urllib
import simplejson
response = urllib.urlopen("http://chartapi.finance.yahoo.com/instrument/1.0/RUSHIL.NS/chartdata;type=quote;range=5d/json")
#response.read() //this works
data = simplejson.loads(response)
print data //error
I found that your data has some unnecessary text around it: the response starts with 'finance_charts_json_callback(', so you need to remove that function wrapper before parsing. The following code shows how:
import urllib
import simplejson
response = urllib.urlopen("http://chartapi.finance.yahoo.com/instrument/1.0/RUSHIL.NS/chartdata;type=quote;range=5d/json")
a = response.read()
a = a[29:-1] # remove function wrap
data = simplejson.loads(a)
print(data)
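A slightly more robust sketch (hypothetical, not part of the original answer) strips the JSONP wrapper with a regular expression instead of hardcoding the 29-character offset; strip_jsonp and the sample string below are illustrative only:

import re
import simplejson

def strip_jsonp(text):
    # Remove a JSONP wrapper such as callback_name( ... ) if one is present.
    match = re.match(r'^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$', text, re.DOTALL)
    return match.group(1) if match else text

raw = 'finance_charts_json_callback( {"series": [{"Timestamp": 1}]} )'
data = simplejson.loads(strip_jsonp(raw))
print(data["series"][0]["Timestamp"])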
I get the JSON response with requests.get
req = requests.get(SAMPLE_SCHEDULE_API)
and convert it into a dictionary:
data = json.loads(req.text)["data"]
When I tried to convert the string into a Python dict,
I got ValueError: malformed node or string:
ast.literal_eval(data)
I have no idea how to do this task.
code snippets
def schedules(cls, start_date=None, end_date=None):
    import ast
    req = requests.get(SAMPLE_SCHEDULE_API)
    data = json.loads(req.text)["data"]
    ast.literal_eval(data)
    return pd.DataFrame(json.loads(req.text)["data"])
JSON response
{
status: "ok",
version: "v1",
data: "[
{"_id":"2015-01-28","end_date":"2015-01-28","estimated_release":1422453600000,"is_projection":false,"is_statement":true,"material_link":null,"start_date":"2015-01-27"},
{"_id":"2015-03-18","end_date":"2015-03-18","estimated_release":1426687200000,"is_projection":false,"is_statement":false,"material_link":null,"start_date":"2015-03-17"},
{"_id":"2015-04-29","end_date":"2015-04-29","estimated_release":1430316000000,"is_projection":false,"is_statement":false,"material_link":null,"start_date":"2015-04-28"},
{"_id":"2015-06-17","end_date":"2015-06-17","estimated_release":1434549600000,"is_projection":false,"is_statement":false,"material_link":null,"start_date":"2015-06-16"},
{"_id":"2015-07-29","end_date":"2015-07-29","estimated_release":1438178400000,"is_projection":false,"is_statement":false,"material_link":null,"start_date":"2015-07-28"}]"
}
Detailed error message:
Traceback (most recent call last):
File "fomc.py", line 25, in <module>
schedules = FOMC.schedules()
File "fomc.py", line 21, in schedules
ast.literal_eval(data)
File "/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/ast.py", line 86, in literal_eval
return _convert(node_or_string)
File "/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/ast.py", line 58, in _convert
return list(map(_convert, node.elts))
File "/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/ast.py", line 63, in _convert
in zip(node.keys, node.values))
File "/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/ast.py", line 62, in <genexpr>
return dict((_convert(k), _convert(v)) for k, v
File "/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/ast.py", line 85, in _convert
raise ValueError('malformed node or string: ' + repr(node))
ValueError: malformed node or string: <_ast.Name object at 0x10a19c990>
The data has been encoded twice (which should not strictly be necessary), so you just need to decode the "data" field again with json.loads:
def schedules(cls, start_date=None, end_date=None):
    req = requests.get(SAMPLE_SCHEDULE_API)
    data_json = json.loads(req.text)["data"]
    data = json.loads(data_json)
    return pd.DataFrame(data)
Do note that ast.literal_eval is for Python literals, whereas json.loads is for JSON, which closely follows JavaScript syntax; the differences include, for example, true, false and null versus True, False and None. The former is the JavaScript syntax used in JSON (and thus you would need json.loads); the latter is Python code, for which you would use ast.literal_eval.
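A small illustration of that difference, using only the standard library:

import ast
import json

# The same mapping written in the two syntaxes:
print(json.loads('{"flag": true, "missing": null}'))        # JSON / JavaScript literals
print(ast.literal_eval("{'flag': True, 'missing': None}"))  # Python literals

# Mixing them up fails:
# json.loads("{'flag': True}")        -> json.decoder.JSONDecodeError
# ast.literal_eval('{"flag": true}')  -> ValueError: malformed node or string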
As the response is already in JSON format, you do not need to encode it again. Approach it like this:
req = requests.get(SAMPLE_SCHEDULE_API)
data_str = req.json().get('data')
json_data = json.loads(data_str)
The json() method returns the JSON content of the response, decoded into Python objects.
The field "data" is a string, not a list. The content of that string seems to be JSON, too, so you have JSON encapsulated in JSON for some reason. If you can, fix that so that you only encode as JSON once. If that doesn't work, you can retrieve that field and decode it separately.