How to filter JSON data? - python

import requests
s = requests.Session()
r = s.get(
'https://www.off---white.com/en/GB/men/products/omia066s188000161001.json')
print(r.text)
The code above returns the following:
{"available_sizes":[{"id":104792,"name":"40","preorder_only":false},
{"id":104794,"name":"42","preorder_only":false},
{"id":104795,"name":"43","preorder_only":false}]}
How would I filter the above data so that when I specify the name value 40, the id value 104792 is printed?
In simple terms, if I ask for the 'name' value 40, the script should print the corresponding 'id' value.

You can use the .json() method of requests.Response.
data = r.json()
try:
    value = next(size['id']
                 for size in data['available_sizes']
                 if size['name'] == '40')
except StopIteration:
    value = None
value will hold the id of the first size whose name is '40' if one exists; otherwise it will be None.
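The same lookup can also be written without the try/except by giving next() a default as its second argument. A minimal sketch, assuming the response body shown in the question; the find_size_id helper name is made up for illustration:

```python
import json

# Sample payload copied from the question; in the real script this would be r.json().
raw = '''{"available_sizes":[{"id":104792,"name":"40","preorder_only":false},
{"id":104794,"name":"42","preorder_only":false},
{"id":104795,"name":"43","preorder_only":false}]}'''
data = json.loads(raw)


def find_size_id(payload, name):
    """Return the id of the first size whose name matches, or None."""
    return next(
        (size["id"] for size in payload["available_sizes"] if size["name"] == name),
        None,  # default returned when no size matches
    )


print(find_size_id(data, "40"))  # 104792
print(find_size_id(data, "41"))  # None
```

Note that the names in this payload are strings, so the lookup has to use "40" rather than the integer 40.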

# foo is the parsed JSON, e.g. foo = r.json()
data = foo["available_sizes"]
new_dict = {}
for dict_ in data:
    # map each size name to its id
    new_dict.update({dict_["name"]: dict_["id"]})

Related

Parsing Boolean from a json and assigning to a variable does not work when the value is False

Using Python, I've got a function retrieving a list of operations from an API endpoint.
The function takes a filter argument in order to filter the results on a given predicate.
Function looks like this:
def list_operations(filter=None):
    # make a curl call to the product recognizer
    headers = {
        'Authorization': 'Bearer {}'.format(creds.token),
        'Content-Type': 'application/json',
    }
    response = requests.get(
        'https://{}/v1alpha1/projects/{}/locations/us-central1/operations'.format(API_ENDPOINT, project),
        headers=headers
    )
    # dump the json response and display their names
    data = json.loads(response.text)
    #add a Metadata element to the operations if it does not exist
    for item in data['operations']:
        if not item.get('metadata'):
            item['metadata'] = {}
            item['metadata']['createTime'] = ''
        else:
            if not item['metadata'].get('createTime'):
                item['metadata']['createTime'] = ''
    # Order operations by create time if the metadata exists and the createTime exists
    data['operations'] = sorted(data['operations'], key=lambda k: k['done'], reverse=True)
    if filter:
        # filter the operations by the filter value
        # Parse the filter value to get the operation name
        filter_path = filter.split('=')[0].split('.')
        filter_value = filter.split('=')[1]
        #check if the filter_value could be a Boolean
        if filter_value == 'True':
            filter_value = True
        elif filter_value == 'False':
            filter_value = False
        # iterate backwards to avoid index out of range error using reversed
        for item in reversed(data['operations']):
            # for every element in filter_path, check if it exists
            item_value = item
            for filter_el in filter_path:
                if item_value.get(filter_el):
                    item_value = item_value.get(filter_el)
            # if the item value is not equal to the filter value, remove it from the list
            if item_value != filter_value:
                data['operations'].remove(item)
My problem is that when I call the function with:
list_operations(filter='done=False')
even when the done key from the response message is False, the assignment of the value to item_value does not work:
item_value = item_value.get(filter_el)
Using the debugger, item_value is {'name': 'api_path/operation-1676883175156-5f51dc9fc2ad1-b4c56f97-edd1e5be', 'done': False, 'metadata': {'createTime': ''}} instead of False
It works fine when calling
list_operations(filter='done=True')
I can't see what's missing here ...
[EDIT]
The problem was in this line:
if item_value.get(filter_el):
To test for the existence of the key, I should have written:
if filter_el in item_value:
Stupid mistake ...
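The difference matters because dict.get() returns the stored value, and False is falsy, so the if branch is skipped exactly when the value is False. A small sketch with a made-up operation record; the resolve helper is hypothetical and not part of the original function:

```python
def resolve(item, path):
    """Walk a list of keys into nested dicts, testing key membership, not truthiness."""
    value = item
    for key in path:
        if isinstance(value, dict) and key in value:
            value = value[key]
        else:
            return None  # the path does not exist in this item
    return value


# Made-up record shaped like the one shown in the debugger output.
operation = {"name": "operation-1", "done": False, "metadata": {"createTime": ""}}

# Truthiness check: a stored False looks the same as a missing key.
print(bool(operation.get("done")))   # False
# Membership check: the key is found even though its value is False.
print("done" in operation)           # True
print(resolve(operation, ["done"]))  # False
```

The same trap applies to any falsy value, such as 0, an empty string, or an empty list.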
json.loads() loads a JSON to a Python dictionary, so "done": false is already converted to {"done": False} in Python:
import json
d = json.loads("""{"name": "api_path/operation-1676883175156-5f51dc9fc2ad1-b4c56f97-edd1e5be",
"done": false,
"metadata": {"createTime": ""}}""")
print(type(d['done']))
>>> <class 'bool'>
I don't have your full response so I cannot help beyond this point.

Processing API data (json) into a singular data frame (list of list of dictionaries)?

This is somewhat of a continuation of a previous post of mine, except now I have API data to work with. I am trying to get the keys Type and Email as columns in a data frame to come up with a final number. My code:
jsp_full=[]
for p in payloads:
    payload = {"payload": {"segmentId":p}}
    r = requests.post(url,headers = header, json = payload)
    #print(r, r.reason)
    time.sleep(r.elapsed.total_seconds())
    json_data = r.json() if r and r.status_code == 200 else None
    json_keys = json_data['payload']['supporters']
    json_package = []
    jsp_full.append(json_package)
    for row in json_keys:
        SID = row['supporterId']
        Handle = row['contacts']
        a_key = 'value'
        list_values = [a_list[a_key] for a_list in Handle]
        string = str(list_values).split(",")
        data = {
            'SupporterID' : SID,
            'Email' : strip_characters(string[-1]),
            'Type' : labels(p)
        }
        json_package.append(data)
    t2 = round(time.perf_counter(),2)
    b_key = "Email"
    e = len([b_list[b_key] for b_list in json_package])
    t = str(labels(p))
    #print(json_package)
    print(f'There are {e} emails in the {t} segment')
    print(f'Finished in {t2 - t1} seconds')
    excel = pd.DataFrame(json_package)
    excel.to_excel(r'C:\Users\am\Desktop\email parsing\{0} segment {1}.xlsx'.format(t, str(today)), sheet_name=t)
This part works well. Each payload in the API represents a different segment of people, so I split them out into different files. However, I am now at a point where I need to combine all records into a single data frame, which is why I append to jsp_full. This is a list of lists of dictionaries.
Once I have that I would run the balance of my code which is like this:
S= pd.DataFrame(jsp_full[0], index = {0})
Advocacy_Supporters = S.sort_values("Type").groupby("Type", as_index=False)["Email"].first()
print(Advocacy_Supporters['Email'].count())
print("The number of Unique Advocacy Supporters is :")
Advocacy_Supporters_Group = Advocacy_Supporters.groupby("Type")["Email"].nunique()
print(Advocacy_Supporters_Group)
Some sample data:
[{'SupporterID': '565f6a2f-c7fd-4f1b-bac2-e33976ef4306', 'Email': 'somebody#somewhere.edu', 'Type': 'd_Student Ambassadors'}, {'SupporterID': '7508dc12-7647-4e95-a8b8-bcb067861faf', 'Email': 'someoneelse#email.somewhere.edu', 'Type': 'd_Student Ambassadors'},...
My desired output is a dataframe that looks like so:
SupporterID Email Type
565f6a2f-c7fd-4f1b-bac2-e33976ef4306 somebody#somewhere.edu d_Student Ambassadors
7508dc12-7647-4e95-a8b8-bcb067861faf someoneelse#email.somewhere.edu d_Student Ambassadors
Any help is greatly appreciated!!
Because this code creates an Excel file for each segment, all I did was read the Excel files back in with a for loop, like so:
filesnames = ['e_S Donors', 'b_Contributors', 'c_Activists', 'd_Student Ambassadors', 'a_Volunteers', 'f_Offline Action Takers']
S= pd.DataFrame()
for i in filesnames:
    data = pd.read_excel(r'C:\Users\am\Desktop\email parsing\{0} segment {1}.xlsx'.format(i, str(today)),sheet_name= i, engine = 'openpyxl')
    # Note: DataFrame.append was removed in pandas 2.0; pd.concat([S, data]) is the replacement there.
    S= S.append(data)
This did the trick since the data was already in the format I wanted.
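An alternative that avoids the round trip through Excel is to flatten jsp_full before building the frame. A sketch, assuming jsp_full has the list-of-lists-of-dictionaries shape described in the question; the record in the second segment is invented for illustration:

```python
from itertools import chain

# Shape assumed from the question: one inner list of records per segment.
jsp_full = [
    [
        {"SupporterID": "565f6a2f-c7fd-4f1b-bac2-e33976ef4306",
         "Email": "somebody#somewhere.edu", "Type": "d_Student Ambassadors"},
        {"SupporterID": "7508dc12-7647-4e95-a8b8-bcb067861faf",
         "Email": "someoneelse#email.somewhere.edu", "Type": "d_Student Ambassadors"},
    ],
    [
        # Invented record standing in for a second segment.
        {"SupporterID": "00000000-0000-0000-0000-000000000000",
         "Email": "placeholder#example.org", "Type": "a_Volunteers"},
    ],
]

# Flatten one level: list of lists of dicts -> list of dicts.
flat_records = list(chain.from_iterable(jsp_full))
print(len(flat_records))  # 3
```

The flattened list can then be passed to pd.DataFrame(flat_records), which builds one row per dictionary with SupporterID, Email, and Type as columns.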

Field is present in output, but getting KeyError

Below is my output. As you can see, children is clearly a (dictionary) field in my response.
This code works perfectly, but it keeps any nested fields (lists or dictionaries) as is:
user = ""
password = getattr(config, 'password')
url = ''
req = requests.post(url = url, auth=(user, password))
print('Authentication successful!\n')
ans = req.json()
#Transform resultList into Pandas DF
solr_df = pd.DataFrame.from_dict(json_normalize(ans['resultList']), orient='columns')
I instead would like to normalize the "children" field, so I did the following instead of the last row above:
solr_df = pd.DataFrame()
for record in ans['resultList']:
    df = pd.DataFrame(record['children'])
    df['contactId'] = record['contactId']
    solr_df = solr_df.append(df)
However, I am getting a KeyError: 'children'.
Can anyone suggest what I am doing wrong?
One of your records is probably missing the 'children' key so catch that exception and continue processing the rest of the output.
solr_df = pd.DataFrame()
for record in ans['resultList']:
    try:
        df = pd.DataFrame(record['children'])
        df['contactId'] = record['contactId']
        solr_df = solr_df.append(df)
    except KeyError as e:
        print("Record {} triggered {}".format(record, e))
Since the message is KeyError: 'children', the only plausible reason for the error is that the children key is missing from one of the dicts. You can avoid the exception with a try/except block, or you can pass a default value for the key, like:
solr_df = pd.DataFrame()
for record in ans['resultList']:
    # .get() returns the default instead of raising KeyError
    df = pd.DataFrame(record.get('children', {}))
    df['contactId'] = record.get('contactId')
    solr_df = solr_df.append(df)
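Another option is to flatten the children into plain rows first and build the DataFrame once at the end. A sketch with invented records: it assumes each children value is a list of dictionaries, and the name field inside them is made up, since the question does not show the real structure:

```python
# Invented sample standing in for ans['resultList']; the second record lacks 'children'.
result_list = [
    {"contactId": 1, "children": [{"name": "a"}, {"name": "b"}]},
    {"contactId": 2},
    {"contactId": 3, "children": [{"name": "c"}]},
]

rows = []
for record in result_list:
    # A default of [] skips records without children instead of raising KeyError.
    for child in record.get("children", []):
        row = dict(child)  # copy so the original record is not modified
        row["contactId"] = record.get("contactId")
        rows.append(row)

print(rows)
```

Passing rows to pd.DataFrame(rows) then gives one row per child with its parent's contactId, without growing a DataFrame inside the loop.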

Selecting values from a JSON file in Python

I am getting JIRA data using the following Python code.
How do I store the response for more than one key (my example shows only one key, but in general I get a lot of data) and print only the values corresponding to total, key, customfield_12830, and summary?
import requests
import json
import logging
import datetime
import base64
import urllib
serverURL = 'https://jira-stability-tools.company.com/jira'
user = 'username'
password = 'password'
query = 'project = PROJECTNAME AND "Build Info" ~ BUILDNAME AND assignee=ASSIGNEENAME'
jql = '/rest/api/2/search?jql=%s' % urllib.quote(query)
response = requests.get(serverURL + jql,verify=False,auth=(user, password))
print response.json()
response.json() OUTPUT:-
http://pastebin.com/h8R4QMgB
From the JSON at the pastebin link you posted, the response has an issues list whose items contain key, fields (which holds the custom fields), self, id, and expand.
You can simply iterate through this response and extract the values for the keys you want. For example:
data = response.json()
issues = data.get('issues', list())
x = list()
for issue in issues:
    temp = {
        'key': issue['key'],
        'customfield': issue['fields']['customfield_12830'],
        'total': issue['fields']['progress']['total']
    }
    x.append(temp)
print(x)
x is a list of dictionaries containing the data for the fields you mentioned. Let me know if anything is unclear or if this is not what you are looking for.
PS: It is generally advisable to use dict.get('keyname', None) to read values, since it lets you supply a default when the key is not found. I didn't do that in this solution because I only wanted to show the approach.
Update: In the comments you (OP) mentioned that it raises an AttributeError. Try this code:
data = response.json()
issues = data.get('issues', list())
x = list()
for issue in issues:
    temp = dict()
    key = issue.get('key', None)
    if key:
        temp['key'] = key
    fields = issue.get('fields', None)
    if fields:
        customfield = fields.get('customfield_12830', None)
        temp['customfield'] = customfield
        progress = fields.get('progress', None)
        if progress:
            total = progress.get('total', None)
            temp['total'] = total
    x.append(temp)
print(x)
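The chain of .get() calls and if checks can be folded into one small helper. A sketch only: the dig function is a made-up name, and the issue below is invented to match the field names in the question, not taken from the real JIRA response:

```python
def dig(mapping, *keys, default=None):
    """Follow keys through nested dicts; return default if any level is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Invented issue shaped like the fields named in the question.
issue = {
    "key": "PROJ-1",
    "fields": {
        "summary": "Example summary",
        "customfield_12830": "build-42",
        "progress": {"total": 3600},
    },
}

row = {
    "key": dig(issue, "key"),
    "summary": dig(issue, "fields", "summary"),
    "customfield": dig(issue, "fields", "customfield_12830"),
    "total": dig(issue, "fields", "progress", "total"),
}
print(row)
```

Because the helper checks isinstance before indexing, a level that turns out to be None or a string yields the default instead of an AttributeError.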

How do I access a dictionary value for use with the urllib module in python?

Example - I have the following dictionary...
URLDict = {'OTX2':'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=OTX2&action=view_all',
'RAB3GAP':'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=RAB3GAP1&action=view_all',
'SOX2':'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=SOX2&action=view_all',
'STRA6':'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=STRA6&action=view_all',
'MLYCD':'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=MLYCD&action=view_all'}
I would like to use urllib to call each url in a for loop, how can this be done?
I have successfully done this with the URLs in a list, like this...
OTX2 = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=OTX2&action=view_all'
RAB3GAP = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=RAB3GAP1&action=view_all'
SOX2 = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=SOX2&action=view_all'
STRA6 = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=STRA6&action=view_all'
MLYCD = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=MLYCD&action=view_all'
URLList = [OTX2,RAB3GAP,SOX2,STRA6,PAX6,MLYCD]
for URL in URLList:
    sourcepage = urllib.urlopen(URL)
    sourcetext = sourcepage.read()
but I also want to be able to print the key later when returning data. With the list format, the key is only a variable name, so I cannot access it for printing; I would only be able to print the value.
Thanks for any help.
Tom
Have you tried (as a simple example):
for key, value in URLDict.iteritems():
    print key, value
Doesn't look like a dictionary is even necessary.
dbs = ['OTX2', 'RAB3GAP', 'SOX2', 'STRA6', 'PAX6', 'MLYCD']
urlbase = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=%s&action=view_all'
for db in dbs:
    sourcepage = urllib.urlopen(urlbase % db)
    sourcetext = sourcepage.read()
I would go about it like this:
for url_key in URLDict:
    URL = URLDict[url_key]
    sourcepage = urllib.urlopen(URL)
    sourcetext = sourcepage.read()
The URL is URLDict[url_key], and the key itself stays available in url_key. For example:
print url_key
On each iteration this prints the current key, for example OTX2.
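The snippets above are Python 2. In Python 3 the same loop is written with .items() and urllib.request.urlopen. A sketch that builds the dictionary from the URL pattern in the question and keeps the key next to each URL; the network call is left commented out so the snippet runs offline, and the db_ids name is made up here:

```python
from urllib.request import urlopen  # Python 3 location of urlopen

URL_BASE = "http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=%s&action=view_all"

# Key -> database id; the RAB3GAP key maps to RAB3GAP1 in the question's URL.
db_ids = {"OTX2": "OTX2", "RAB3GAP": "RAB3GAP1", "SOX2": "SOX2",
          "STRA6": "STRA6", "MLYCD": "MLYCD"}
url_dict = {key: URL_BASE % db_id for key, db_id in db_ids.items()}

for key, url in url_dict.items():
    # sourcetext = urlopen(url).read()  # uncomment to fetch the page
    print(key, url)
```

Keeping a separate key-to-id mapping preserves the one case where the key and the database id differ, which a plain list of names with a URL template would lose.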
