I am getting following response from an API, I want to extract Phone Number from this object in Python. How can I do that?
{
"ParsedResults": [
{
"TextOverlay": {
"Lines": [
{
"Words": [
{
"WordText": "+971555389583", //this field
"Left": 0,
"Top": 5,
"Height": 12,
"Width": 129
}
],
"MaxHeight": 12,
"MinTop": 5
}
],
"HasOverlay": true,
"Message": "Total lines: 1"
},
"TextOrientation": "0",
"FileParseExitCode": 1,
"ParsedText": "+971555389583 \r\n",
"ErrorMessage": "",
"ErrorDetails": ""
}
],
"OCRExitCode": 1,
"IsErroredOnProcessing": false,
"ProcessingTimeInMilliseconds": "308",
"SearchablePDFURL": "Searchable PDF not generated as it was not requested."**strong text**}
Store the API response to a variable. Let's call it response.
Now convert the JSON string to a Python dictionary using the json module.
import json
response_dict = json.loads(response)
Now traverse through the response_dict to get the required text.
phone_number = response_dict["ParsedResults"][0]["TextOverlay"]["Lines"][0]["Words"][0]["WordText"]
Wherever the dictionary value is an array, [0] is used to access the first element of the array. If you want to access all elements of the array, you will have to loop through the array.
You have to parse the resulting stirng into a dictionary using the library json, afterwards you can traverse the result by looping over the json structure like this:
import json
raw_output = '{"ParsedResults": [ { "Tex...' # your api response
json_output = json.loads(raw_output)
# iterate over all lists
phone_numbers = []
for parsed_result in json_output["ParsedResults"]:
for line in parsed_result["TextOverlay"]["Lines"]:
# now add all phone numbers in "Words"
phone_numbers.extend([word["WordText"] for word in line["Words"]])
print(phone_numbers)
You may want to check if all keys exist within that process, depending on the API you use, like
# ...
for line in parsed_result["TextOverlay"]["Lines"]:
if "Words" in line: # make sure key exists
phone_numbers.extend([word["WordText"] for word in line["Words"]])
# ...
Related
I am using python 3.6.3a. I would like to generate payload for each of the json records. I am using each variable to access the record. How to assign variable value (each in this case) in payload? I tried {each} and other methods but didn't work.
code snippet below.
json_records = [{"description":"<p>This is scenario1<\/p>","owner":"deb",
"priority":"high"},
{"description":"<p>This is scenario2<\/p>","owner":"deb",
"priority":"medium"}]
json_object = json.loads(json_records)
for each in json_object:
payload = """
{
"subject": "test",
"fieldValues": [
{each}
]
}
"""
There are two ways to approach this problem.
One way could be creating a dict() object and inserting keys as you wish, then json.dumps(object) to convert into string payload as in:
import json
json_records = [{"description":"This is scenario1</p>","owner":"deb","priority":"high"}
,{"description":"This is scenario2</p>","owner":"deb","priority":"medium"}]
for obj in json_records:
payload = dict()
payload['subject'] = 'test'
for key,value in obj.items():
payload['fieldName'] = {
key:value
}
print(json.dumps(payload))
#{"subject": "test", "fieldName": {"priority": "high"}}
#{"subject": "test", "fieldName": {"priority": "medium"}}
Second way is to create a textual payload from string as in, however if you need a valid JSON at the end, this would require a post-step of validation (something like try json.loads(payload) - So I'd just use the first method. I would use this method only if I have a specific requirements to generate the payload in a certain way.
import json
json_records = [{"description":"This is scenario1</p>","owner":"deb","priority":"high"}
,{"description":"This is scenario2</p>","owner":"deb","priority":"medium"}]
# json_object = json.loads(json_records) # json.loads works only on byte-like strings. your object is already in python in this case.
for obj in json_records:
payload = """
{
"subject": "test",
"fieldValues": [
%s
]
}
""" % (obj["priority"])
print(payload)
#{
# "subject": "test",
# "fieldValues": [
# high
# ]
# }
#
#
# {
# "subject": "test",
# "fieldValues": [
# medium
# ]
# }
You could make payload a Template string and use it to put the data in each JSON record into the format you want. Bracket {} characters have not special meaning in Templates, which is what makes using them easy.
Doing that will create a valid string representation of a dictionary containing everything. You can turn this into an actual Python dictionary data-structure using the ast.literal_eval() function, and then convert that into JSON string format — which I think is the final format you're after.
rom ast import literal_eval
import json
from string import Template
from textwrap import dedent
json_records = '''[{"description":"<p>This is scenario1<\/p>","owner":"deb",
"priority":"high"},
{"description":"<p>This is scenario2<\/p>","owner":"deb",
"priority":"medium"}]'''
json_object = json.loads(json_records)
payload = Template(dedent("""
{
"subject": "test",
"fieldValues": [
$each
]
}""")
)
for each in json_object:
obj = literal_eval(payload.substitute(dict(each=each)))
print(json.dumps(obj, indent=2))
Output:
{
"subject": "test",
"fieldValues": [
{
"description": "<p>This is scenario1</p>",
"owner": "deb",
"priority": "high"
}
]
}
{
"subject": "test",
"fieldValues": [
{
"description": "<p>This is scenario2</p>",
"owner": "deb",
"priority": "medium"
}
]
}
I have a JSON String like this:
{
"Sommersprossen": {
"count": 5,
"lastSeenTime": 1586959168567
},
"inkognito": {
"count": 7,
"lastSeenTime": 1586901586361
},
"Marienkäfer": {
"count": 7,
"lastSeenTime": 1586958264090,
"difficulty": 0.8902439024390244,
"difficultyWeight": 82
},
"Zaun": {
"count": 8,
"lastSeenTime": 1586958848320
},
Now I want to print something like this: Sommersprossen, inkognito, Marienkäfer, Zaun.
But I don't know how to...
My existing Python code is:
url = "https://skribbliohints.github.io/German.json"
response = urllib.request.urlopen(url)
data = json.loads(response.read())
But the problem is I cant find a way to print the objects names.
A json is essentially just a dictionary. You can access it like any other python dictionary.
url = "https://skribbliohints.github.io/German.json"
response = urllib.request.urlopen(url)
data = json.loads(response.read())
roots = list(data.keys())
You can treat this json output as dictionary and print its Keys, for this exapmle you can simply perform it with:
data.keys()
Output:
dict_keys(['Sommersprossen', 'inkognito', 'Marienkäfer', 'Zaun'])
It’s a dict, so you can do:
“, “.join(data.keys())
I'm trying to scrape a website and get items list from it using python. I parsed the html using BeaufitulSoup and made a JSON file using json.loads(data). The JSON object looks like this:
{ ".1768j8gv7e8__0":{
"context":{
//some info
},
"pathname":"abc",
"showPhoneLoginDialog":false,
"showLoginDialog":false,
"showForgotPasswordDialog":false,
"isMobileMenuExpanded":false,
"showFbLoginEmailDialog":false,
"showRequestProductDialog":false,
"isContinueWithSite":true,
"hideCoreHeader":false,
"hideVerticalMenu":false,
"sequenceSeed":"web-157215950176521",
"theme":"default",
"offerCount":null
},
".1768j8gv7e8.6.2.0.0__6":{
"categories":[
],
"products":{
"count":12,
"items":[
{
//item info
},
{
//item info
},
{
//item info
}
],
"pageSize":50,
"nextSkip":100,
"hasMore":false
},
"featuredProductsForCategory":{
},
"currentCategory":null,
"currentManufacturer":null,
"type":"Search",
"showProductDetail":false,
"updating":false,
"notFound":false
}
}
I need the items list from product section. How can I extract that?
Just do:
products = jsonObject[list(jsonObject.keys())[1]]["products"]["items"]
import json packagee and map every entry to a list of items if it has any:
This solution is more universal, it will check all items in your json and find all the items without hardcoding the index of an element
import json
data = '{"p1": { "pathname":"abc" }, "p2": { "pathname":"abcd", "products": { "items" : [1,2,3]} }}'
# use json package to convert json string to dictionary
jsonData = json.loads(data)
type(jsonData) # dictionary
# use "list comprehension" to iterate over all the items in json file
# itemData['products']["items"] - select items from data
# if "products" in itemData.keys() - check if given item has products
[itemData['products']["items"] for itemId, itemData in jsonData.items() if "products" in itemData.keys()]
Edit: added comments to code
I'll just call the URL of the JSON file you got from BeautifulSoup "response" and then put in a sample key in the items array, like itemId:
import json
json_obj = json.load(response)
array = []
for i in json_obj['items']:
array[i] = i['itemId']
print(array)
Okay, so I've been banging my head on this for the last 2 days, with no real progress. I am a beginner with python and coding in general, but this is the first issue I haven't been able to solve myself.
So I have this long file with JSON formatting with about 7000 entries from the youtubeapi.
right now I want to have a short script to print certain info ('videoId') for a certain dictionary key (refered to as 'key'):
My script:
import json
f = open ('path file.txt', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['key']['Items']['id']['videoId'])
# print(trailers['key']['videoId'] gives same response
Error:
print(trailers['key']['Items']['id']['videoId'])
TypeError: string indices must be integers
It does work when I want to print all the information for the dictionary key:
This script works
import json
f = open ('path file.txt', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['key'])
Also print(type(trailers)) results in class 'dict', as it's supposed to.
My JSON File is formatted like this and is from the youtube API, youtube#searchListResponse.
{
"kind": "youtube#searchListResponse",
"etag": "",
"nextPageToken": "",
"regionCode": "",
"pageInfo": {
"totalResults": 1000000,
"resultsPerPage": 1
},
"items": [
{
"kind": "youtube#searchResult",
"etag": "",
"id": {
"kind": "youtube#video",
"videoId": ""
},
"snippet": {
"publishedAt": "",
"channelId": "",
"title": "",
"description": "",
"thumbnails": {
"default": {
"url": "",
"width": 120,
"height": 90
},
"medium": {
"url": "",
"width": 320,
"height": 180
},
"high": {
"url": "",
"width": 480,
"height": 360
}
},
"channelTitle": "",
"liveBroadcastContent": "none"
}
}
]
}
What other information is needed to be given for you to understand the problem?
The following code gives me all the videoId's from the provided sample data (which is no id's at all in fact):
import json
with open('sampledata', 'r') as datafile:
data = json.loads(datafile.read())
print([item['id']['videoId'] for item in data['items']])
Perhaps you can try this with more data.
Hope this helps.
I didn't really look into the youtube api but looking at the code and the sample you gave it seems you missed out a [0]. Looking at the structure of json there's a list in key items.
import json
f = open ('json1.json', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['items'][0]['id']['videoId'])
I've not used json before at all. But it's basically imported in the form of dicts with more dicts, lists etc. Where applicable. At least from my understanding.
So when you do type(trailers) you get type dict. Then you do dict with trailers['key']. If you do type of that, it should also be a dict, if things work correctly. Working through the items in each dict should in the end find your error.
Pythons error says you are trying find the index/indices of a string, which only accepts integers, while you are trying to use a dict. So you need to find out why you are getting a string and not dict when using each argument.
Edit to add an example. If your dict contains a string on key 'item', then you get a string in return, not a new dict which you further can get a dict from. item in the json for example, seem to be a list, with dicts in it. Not a dict itself.
I have the below Python code
from flask import Flask, jsonify, json
app = Flask(__name__)
with open('C:/test.json', encoding="latin-1") as f:
dataset = json.loads(f.read())
#app.route('/api/PDL/<string:dataset_identifier>', methods=['GET'])
def get_task(dataset_identifier):
global dataset
dataset = [dataset for dataset in dataset if dataset['identifier'] == dataset_identifier]
if len(task) == 0:
abort(404)
return jsonify({'dataset': dataset})
if __name__ == '__main__':
app.run(debug=True)
Test.json looks like this:
{
"dataset": [{
"bureauCode": [
"016:00"
],
"description": "XYZ",
"contactPoint": {
"fn": "AG",
"hasEmail": "mailto:AG#AG.com"
},
"distribution": [
{
"format": "XLS",
"mediaType": "application/vnd.ms-excel",
"downloadURL": "https://www.example.com/xyz.xls"
}
],
"programCode": [
"000:000"
],
"keyword": [ "return to work",
],
"modified": "2015-10-14",
"title": "September 2015",
"publisher": {
"name": "abc"
},
"identifier": US-XYZ-ABC-36,
"rights": null,
"temporal": null,
"describedBy": null,
"accessLevel": "public",
"spatial": null,
"license": "http://creativecommons.org/publicdomain/zero/1.0/",
"references": [
"http://www.example.com/example.html"
]
}
],
"conformsTo": "https://example.com"
}
When I pass the variable in the URL like this: http://127.0.0.1:5000/api/PDL/1403
I get the following error: TypeError: string indices must be integers
Knowing that the "identifier" field is a string and I am passing the following in the URL:
http://127.0.0.1:5000/api/PDL/"US-XYZ-ABC-36"
http://127.0.0.1:5000/api/PDL/US-XYZ-ABC-36
I keep getting the following error:
TypeError: string indices must be integers
Any idea on what am I missing here? I am new to Python!
The problem is that you are trying to iterate the dictionary instead of the list of datasources inside it. As a consequence, you're iterating through the keys of the dictionary, which are strings. Additionaly, as it was mentioned by above, you will have problems if you use the same name for the list and the iterator variable.
This worked for me:
[ds for ds in dataset['dataset'] if ds['identifier'] == dataset_identifier]
The problem you have right now is that during iteration in the list comprehension, the very first iteration changes the name dataset from meaning the dict you json.loads-ed to a key of that dict (dicts iterate their keys). So when you try to look up a value in dataset with dataset['identifier'], dataset isn't the dict anymore, it's the str key of you're currently iterating.
Stop reusing the same name to mean different things.
From the JSON you posted, what you probably want is something like:
with open('C:/test.json', encoding="latin-1") as f:
alldata = json.loads(f.read())
#app.route('/api/PDL/<string:dataset_identifier>', methods=['GET'])
def get_task(dataset_identifier):
# Gets the list of data objects from top level object
# Could be inlined into list comprehension, replacing dataset with alldata['dataset']
dataset = alldata['dataset']
# data is a single object in that list, which should have an identifier key
# data_for_id is the list of objects passing the filter
data_for_id = [data for data in dataset if data['identifier'] == dataset_identifier]
if len(task) == 0:
abort(404)
return jsonify({'dataset': data_for_id})