Python pretty printing JSON from API to a file - python

I got some data from an API with Python, and I'm trying to print it to a file. My understanding was that the indent argument lets you pretty print. Here's my code:
import urllib2, json
APIKEY_VALUE = "APIKEY"
APIKEY = "?hapikey=" + APIKEY_VALUE
HS_API_URL = "http://api.hubapi.com"
def getInfo():
    xulr = "/engagements/v1/engagements/paged"
    url = HS_API_URL + xulr + APIKEY + params
    response = urllib2.urlopen(url).read()
    with open("hubdataJS.json", "w") as outfile:
        json.dump(response, outfile, sort_keys=True, indent=4, ensure_ascii=False)
getInfo()
What I expected hubdataJS.json to look like when I opened it in Sublime text is some JSON with a format like this:
{
    a: some data
    b: [
        some list of data,
        more data
    ]
    c: some other data
}
What I got instead was all the data on one line, in quotes (I thought dumps was for outputting as a string), with lots of \s, \rs, and \ns.
Confused about what I'm doing wrong.

In your code, response is a bytestring that contains the data already serialized as JSON. When you call json.dump on it, you serialize that string to JSON a second time. You end up with a JSON file containing a single string, and inside that string is the original JSON data with its quotes and newlines escaped: JSON inside JSON.
To fix this, decode (deserialize) the bytestring you got from the API before re-encoding it as JSON to write to the file:
response = json.load(urllib2.urlopen(url))
That converts the serialized data from the web into a real Python object, which json.dump can then write out with indentation.
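To make the difference concrete, here is a minimal Python 3 sketch of the same idea. The sample bytes are made up for illustration, and an in-memory io.StringIO stands in for the output file so that nothing touches the network or the disk.

```python
import io
import json

# Stand-in for the raw body returned by the API (hypothetical sample data).
raw_body = b'{"b": [1, 2], "a": "some data"}'

# Wrong: dumping the raw text produces one quoted, escaped JSON string.
double_encoded = json.dumps(raw_body.decode("utf-8"), indent=4)

# Right: parse first, then dump the resulting Python object.
parsed = json.loads(raw_body.decode("utf-8"))
outfile = io.StringIO()  # stands in for open("hubdataJS.json", "w")
json.dump(parsed, outfile, sort_keys=True, indent=4, ensure_ascii=False)
pretty = outfile.getvalue()

print(double_encoded)
print(pretty)
```

The first print shows the single-line, escaped form described in the question; the second shows the indented form, because indent only has structure to work with once the data is a dict or list.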

Related

Read a specific value from a json file with Python

I am getting a JSON file from a curl request and I want to read a specific value from it.
Suppose that I have a JSON file, like the following one. How can I insert the "result_count" value into a variable?
Currently, after getting the response from curl, I am writing the JSON objects into a txt file like this.
json_response = connect_to_endpoint(url, headers)
f.write(json.dumps(json_response, indent=4, sort_keys=True))
Your json_response isn't JSON content (JSON is a formatted string) but a Python dict, so you can access its values by key:
res_count = json_response['meta']['result_count']
Use the json module from the python standard library.
data itself is just a python dictionary, and can be accessed as such.
import json
with open('path/to/file/filename.json') as f:
    data = json.load(f)
result_count = data['meta']['result_count']
You can parse a JSON string using the json.loads() method in the json module (this assumes connect_to_endpoint returns the raw JSON text; if it already returns a dict, skip this step).
response = connect_to_endpoint(url, headers)
json_response = json.loads(response)
After that, you can extract an element by specifying its name in brackets:
result_count = json_response['meta']['result_count']
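As a rough, self-contained illustration of the lookups above: the response text below is invented, and only the meta and result_count keys come from the question. The .get() variant is an optional extra for when a key might be absent.

```python
import json

# Hypothetical response text; only 'meta' / 'result_count' come from the question.
raw = '{"data": [{"id": "1"}, {"id": "2"}], "meta": {"result_count": 2}}'

json_response = json.loads(raw)

# Direct indexing raises KeyError if a key is missing.
result_count = json_response["meta"]["result_count"]

# .get() with defaults returns a fallback instead of raising.
safe_count = json_response.get("meta", {}).get("result_count", 0)
missing = {}.get("meta", {}).get("result_count", 0)

print(result_count, safe_count, missing)
```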

Expecting property name enclosed in double quotes

If my JSON file is huge, with many dictionaries and lists nested inside a dictionary, and everything is enclosed in double quotes, how should I process it? What does deserializing mean, and how do I deserialize?
Use json module.
If you have JSON in a file, you can use:
with open("json_data.json", "r") as data:
    print(json.load(data))
or
with open("json_data.json", "r") as data:
    print(json.loads(data.read()))
If you have JSON in a variable, you can use:
jsonData = '{}'
jsonVal = json.loads(jsonData)
Python has a built-in package called json, which you can use to serialize and deserialize a dictionary.
To serialize (convert a dictionary to a JSON string), use:
json_str = json.dumps(my_dict)  # my_dict is any JSON-serializable dictionary
To deserialize (read a JSON file back into a dictionary), use:
with open("huge_json_file.json", "r") as data:
    json_dict = json.load(data)
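The error in the title typically appears when the text uses single quotes (or otherwise is not strict JSON). The sketch below reproduces it with a made-up string and shows one possible recovery, on the assumption that the text is really a Python literal. If the file comes from elsewhere, fixing the producer so it emits valid JSON is the better route.

```python
import ast
import json

bad_text = "{'name': 'sk'}"  # single quotes: a Python literal, not valid JSON

try:
    json.loads(bad_text)
    error_message = ""
except json.JSONDecodeError as exc:
    error_message = str(exc)

# If the text is really a Python literal, ast.literal_eval can read it safely.
as_dict = ast.literal_eval(bad_text)

# Re-serializing with json.dumps yields valid JSON with double quotes.
good_text = json.dumps(as_dict)

print(error_message)
print(good_text)
```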

Uploading a csv type data using python request.put without reading from a saved csv file?

I have an API endpoint that I upload data to using Python. The endpoint accepts:
putHeaders = {
    'Authorization': user,
    'Content-Type': 'application/octet-stream' }
My current code does this:
1. Save a dictionary as a CSV file
2. Encode the CSV as UTF-8
dataFile = open(fileData['name'], 'r').read().encode('utf-8')
3. Upload the file to the API endpoint
fileUpload = requests.put(url,
                          headers=putHeaders,
                          data=(dataFile))
What I am trying to achieve is uploading the data without saving it to a file first.
So far I have tried converting my dictionary to bytes using
data = json.dumps(payload).encode('utf-8')
and uploading that to the API endpoint. This works, but the output at the API endpoint is not correct.
Question
Does anyone know how to upload CSV-type data without actually saving the file?
EDIT: use io.StringIO() as your file-like object when you're writing your dict to CSV. Then call getvalue() and pass the result as your data param to requests.put().
See this question for more details: How do I write data into CSV format as string (not file)?.
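A minimal sketch of that approach, assuming a flat dict with one value per column. The helper name dict_to_csv_bytes is made up for illustration, and the HTTP call is left out so the snippet runs on its own.

```python
import csv
import io


def dict_to_csv_bytes(row):
    """Render one dict as a header line plus one data line, encoded as UTF-8."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    # getvalue() returns everything written so far, regardless of stream position.
    return buffer.getvalue().encode("utf-8")


csv_bytes = dict_to_csv_bytes({"col1": 1, "col2": 2})
print(csv_bytes)
```

The resulting bytes could then be passed as data=csv_bytes to requests.put(), in place of the contents read from the saved file.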
Old answer:
If your dict is this:
my_dict = {'col1': 1, 'col2': 2}
then you could convert it to a csv format like so:
csv_data = ','.join(my_dict.keys()) + '\n'  # header row, then a line break
csv_data += ','.join(str(v) for v in my_dict.values())  # join() needs strings
csv_data = csv_data.encode('utf8')
And then do your requests.put() call with data=csv_data.
Updated answer
I hadn't realized your input was a dictionary, you had mentioned the dictionary was being saved as a file. I assumed the dictionary lookup in your code was referencing a file. More work needs to be done if you want to go from a dict to a CSV file-like object.
Based on the I/O from your question, it appears that your input dictionary has this structure:
file_data = {"name": {"Col1": 1, "Col2": 2}}
Given that, I'd suggest trying the following using csv and io:
import csv
import io
import requests
session = requests.Session()
session.headers.update(
{"Authorization": user, "Content-Type": "application/octet-stream"}
)
file_data = {"name": {"Col1": 1, "Col2": 2}}
with io.StringIO() as f:
    name = file_data["name"]
    writer = csv.DictWriter(f, fieldnames=name)
    writer.writeheader()
    writer.writerows([name]) # `name` is a dict but writerows expects a list of dicts
    # Send the buffer's contents; after writing, `f` itself is positioned at the end
    response = session.put(url, data=f.getvalue().encode("utf-8"))
You may want to test using the correct MIME type passed in the request header. While the endpoint may not care, it's best practice to use the correct type for the data. CSV should be text/csv. Python also provides a MIME types module:
>>> import mimetypes
>>>
>>> mimetypes.types_map[".csv"]
'text/csv'
Original answer
Just open the file in bytes mode rather than worrying about encoding or reading it into memory.
Additionally, use a context manager to handle the file rather than assigning to a variable, and pass your header to a Session object so you don't have to repeatedly pass header data in your request calls.
Documentation on the PUT method:
https://requests.readthedocs.io/en/master/api/#requests.put
data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
import requests
session = requests.Session()
session.headers.update(
{"Authorization": user, "Content-Type": "application/octet-stream"}
)
with open(file_data["name"], "rb") as f:
    response = session.put(url, data=f)
Note: I modified your code to more closely follow python style guides.

Some way to covert the string representation of a pdf into bytes in python

I'm trying to do something, and I don't know if it's a reasonable approach.
Problem:
I have a web client and a web server. The server (written in Python with Flask) processes a PDF file to extract some data; the client just sends the PDF file and waits for the response. The thing is that the client can send several PDF files to process, and I want to send all the PDFs from the client to the server in a single request.
What I have planned to do:
I was thinking of converting the Blob of each PDF to a string and sending a POST request with a JSON body like this:
BODY:
{
    "content":[
        {"name": "pdf_name_1.pdf", "data": "some blob data converted to string"},
        {"name": "pdf_name_2.pdf", "data": "some blob data converted to string"},
        {"name": "pdf_name_3.pdf", "data": "some blob data converted to string"},
        ...
    ]
}
Then on the server I was planning to convert the data back into a blob (bytes) in order to write the PDF to disk and start processing it.
My question:
Is there any way to convert the str representation of the PDF to bytes in order to write the PDF to disk with Python?
Thanks a lot. If someone comes up with another idea for sending a bunch of PDFs in a single request, please let me know.
PS: I'm using Python 3.5 and Flask for the web server.
In such cases, it's preferable to send the file data with the files keyword of requests.post, like so:
import requests
def send_pdf_data(filename_list):
    files = {}
    for index, filename in enumerate(filename_list, start=1):
        # Each entry is (filename, file object, content type); 'rb' reads the raw bytes.
        files[f"pdf_name_{index}.pdf"] = (filename, open(filename, 'rb'), 'application/pdf')
    data = {}
    # *Put whatever you want in data dict*
    requests.post("http://yourserveradders", data=data, files=files)
def main():
    filename_list = ["pdf_name_1.pdf", "pdf_name_2.pdf"]
    send_pdf_data(filename_list)
if __name__ == '__main__':
    main()
However, if you really want to pass the data as JSON, you should use the base64 module, as @Mark Ransom mentioned.
You can implement it this way:
import requests
import json
import base64
def encode(data: bytes) -> str:
    """
    Return base-64 encoded value of binary data as a str.
    """
    # b64encode returns bytes; decode to str so json.dumps can serialize it.
    return base64.b64encode(data).decode("ascii")
def decode(data: str) -> bytes:
    """
    Return decoded value of a base-64 encoded string.
    """
    return base64.b64decode(data.encode())
def get_pdf_data(filename):
    """
    Open pdf file in binary mode,
    return a string encoded in base-64.
    """
    with open(filename, 'rb') as file:
        return encode(file.read())
def send_pdf_data(filename_list, encoded_pdf_data):
    data = {}
    # *Put whatever you want in data dict*
    # Create content list.
    content = [{"name": filename, "data": pdf_data}
               for (filename, pdf_data) in zip(filename_list, encoded_pdf_data)]
    data["content"] = content
    data = json.dumps(data) # Convert it to json.
    requests.post("http://yourserveradders", data=data)
def main():
    filename_list = ["pdf_name_1.pdf", "pdf_name_2.pdf"]
    pdf_blob_data = [get_pdf_data(filename) for filename
                     in filename_list]
    send_pdf_data(filename_list, pdf_blob_data)
if __name__ == '__main__':
    main()
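For the server half of the question (turning the string back into bytes and writing it to disk), a round trip might look like the sketch below. The sample bytes are placeholders rather than a real PDF, and the Flask request handling is omitted; in a Flask view, the JSON text would come from the request body instead of the local body variable.

```python
import base64
import json
import os
import tempfile

# Client side: raw bytes (a stand-in for real PDF content) become a JSON-safe str.
original_bytes = b"%PDF-1.4 hypothetical sample bytes \x00\xff"
body = json.dumps({
    "content": [
        {"name": "pdf_name_1.pdf",
         "data": base64.b64encode(original_bytes).decode("ascii")},
    ]
})

# Server side: parse the JSON body, decode each entry, and write it to disk.
out_dir = tempfile.mkdtemp()
written = []
for item in json.loads(body)["content"]:
    pdf_bytes = base64.b64decode(item["data"])
    # basename() guards against a client-supplied name containing directories.
    path = os.path.join(out_dir, os.path.basename(item["name"]))
    with open(path, "wb") as f:
        f.write(pdf_bytes)
    written.append(path)

# Read the file back to confirm the bytes survived the round trip.
with open(written[0], "rb") as f:
    restored = f.read()
print(restored == original_bytes)
```

Base64 makes each payload roughly a third larger than the original, which is one reason the multipart files approach is usually preferred for large PDFs.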

Why this '/' in my json

I have this
import json
header = """{"fields": ["""
print(header)
with open('fields.json', 'w') as outfile:
    json.dump(header, outfile)
The output of print is fine:
{"fields": [
but what ends up in fields.json is
"{\"fields\": ["
Why and how can I solve it?
Thanks
There is no issue there. The \ in the file is there to escape the ". To see this, read the JSON back and print it. So, adding to your code:
import json
header = """{"fields": ["""
print(header)
with open('C:\\Users\\tkaghdo\\Documents\\fields.json', 'w') as outfile:
    json.dump(header, outfile)
with open('C:\\Users\\tkaghdo\\Documents\\fields.json') as data_file:
    data = json.load(data_file)
print(data)
You will see the data is printed as {"fields": [
The JSON you are trying to write is being treated as a string, so " is converted to \". To avoid that, decode the JSON using json.loads() before writing it to the file.
The JSON must be complete, or else json.loads() will throw an error.
import json
header = """{"name":"sk"}"""
h = json.loads(header)
print(header)
with open('fields.json', 'w') as outfile:
    json.dump(h, outfile)
First of all, it's not /, it's \.
Second, it's not in your JSON, because you don't really seem to have a JSON here, you have just a string. And json.dump() converts the string into a string escaped for JSON format, replacing all " with \".
By the way, you should not try to write incomplete JSON as a string yourself, it's better just to make a dictionary {"fields":[]}, then fill it with all values you want and later save into file through json.dump().
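A small sketch of that last suggestion, with made-up field names and an in-memory buffer standing in for fields.json:

```python
import io
import json

# Build the whole structure as Python objects first (field names are hypothetical).
document = {"fields": []}
document["fields"].append("first_name")
document["fields"].append("last_name")

outfile = io.StringIO()  # stands in for open('fields.json', 'w')
json.dump(document, outfile)
written = outfile.getvalue()
print(written)
```

Because json.dump receives a dict rather than a pre-formatted string, the quotes are written as JSON structure and no backslash escapes appear in the output.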
