I collected public course data from Udemy and put it all in a json file. Each course has an identifier number under which all the data is stored. I can perfectly list out any details I want, except for these identifier numbers.
How can I list out these numbers themselves? Thanks.
{
"153318":
{
"lectures data": "31 lectures, 5 hours video",
"instructor work": "Academy Of Technical Courses, Grow Your Skills Today",
"title": "Oracle Applications R12 Order Management and Pricing",
"promotional price": "$19",
"price": "$20",
"link": "https://www.udemy.com/oracle-applications-r12-order-management-and-pricing/",
"instructor": "Parallel Branch Inc"
},
"616990":
{
"lectures data": "24 lectures, 1.5 hours video",
"instructor work": "Learning Sans Location",
"title": "Cloud Computing Development Essentials",
"promotional price": "$19",
"price": "$20",
"link": "https://www.udemy.com/cloud-computing-development-essentials/",
"instructor": "Destin Learning"
}
}
You want the keys of that dictionary.
import json
with open('course.json') as json_file:
    course = json.load(json_file)
# The identifiers are the top-level keys of the parsed dict.
print(list(course.keys()))
giving:
['153318', '616990']
Parse the JSON into a Python dict, then loop over the keys:
import json
parsed = json.loads(json_text)  # json_text: the JSON document as a string
for key in parsed.keys():
    print(key)
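As a small self-contained sketch of the same idea, the snippet below parses a shortened copy of the sample (only the title field is kept, and the names `course_json` and `list_course_ids` are made up for illustration) and returns the identifiers as a list.

```python
import json

# Shortened copy of the sample data; only "title" is kept for brevity.
course_json = """
{
  "153318": {"title": "Oracle Applications R12 Order Management and Pricing"},
  "616990": {"title": "Cloud Computing Development Essentials"}
}
"""


def list_course_ids(json_text):
    """Return the top-level keys (course identifiers) of a JSON object."""
    courses = json.loads(json_text)
    return list(courses)  # iterating a dict yields its keys


course_ids = list_course_ids(course_json)
print(course_ids)

# items() gives each identifier together with its course record.
for course_id, details in json.loads(course_json).items():
    print(course_id, details["title"])
```

If the identifiers are needed as numbers rather than strings, `int(course_id)` can be applied to each key, since JSON object keys are always strings.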
I need to create a "bundled" item that points to items I reference by item_id. In other words, this is an 'un-stockable' item that is not tracked by inventory; instead, when it is sold, it deducts inventory only from the linked items/variations. I cannot figure out how to do this using Square's Endpoint API. So far I have:
result = client.catalog.upsert_catalog_object(
    body={
        "idempotency_key": str(idempotency_key),
        "object": {
            "type": "ITEM",
            "id": "#Cocoa_bundle",
            "item_data": {
                "name": "Cocoa Bundle",
                "description": "Hot Chocolate Bundle of Small and Large sizes",
                "abbreviation": "CB",
                "item_variation_ids": ["H6EK5FZ4MMJB56FFASFNYZXC", "NZGVV7TXJHVUMNQAY743FRQX"],
                "track_inventory": False,
            }
        }
    }
)
And I receive:
"category": "INVALID_REQUEST_ERROR",
"code": "BAD_REQUEST",
"detail": "Item with name Cocoa Bundle and token #Cocoa_bundle must have at least one variation."
However, this will create a new "variation" of an item rather than creating a bundle. It needs to be an item linked to the variations: 'item_variation_ids'. Is it possible to create a bundle as such?
I'm trying to read from a JSON file, but I get this error: TypeError: unhashable type: 'dict'
import pandas as pd
osp_json_path = 'Data/syllabi.sample.json'
df_osp_data = pd.read_json(osp_json_path)
print(df_osp_data.head())
Here is a sample of the JSON file:
[
{
"id": 17308718202881,
"syllabus_probability": 0.6546372771263123,
"date": {
"year": 2016,
"term": null
},
"field": {
"code": "11",
"name": "Computer Science",
"label_precision": 0.7397769689559937,
"label_recall": 0.8223140239715576,
"label_f1": 0.7788649797439575
},
"institution": {
"id": 3994319585322,
"grid_id": "grid.7112.5",
"wikidata_id": "Q1783765",
"unitid": null,
"name": "Mendel University Brno",
"url": "http:\/\/mendelu.cz\/en\/",
"lat": 49.210208892822266
},
"extracted_metadata": {
"code": {
"text": "VT1",
"clean_text": "VT1",
"mean_p": 0.9811509251594543
},
"title": {
"text": "Computer Technology I",
"clean_text": "Computer Technology I",
"mean_p": 0.9993407726287842
},
"date": {
"text": "SS 2016\/2017",
"clean_text": "SS 2016\/2017",
"mean_p": 0.9996379613876343
},
"description": [
{
"text": "The aim of this course is to introduce students into the subject of computer science and data processing and to explain basic principles of computer operations. The part relating to operating systems is centered on operations with computer files and processes in systems Windows and Unix\/Linux.",
"clean_text": "The aim of this course is to introduce students into the subject of computer science and data processing and to explain basic principles of computer operations. The part relating to operating systems is centered on operations with computer files and processes in systems Windows and Unix\/Linux.",
"mean_p": 0.9988000392913818
}
]
},
"text_md5": "18443cb0375d7ba646c5ac203aac7380",
"mime_type": "text\/xml",
"anonymized_text": "Department of Informatics (FBE) Time allowance: full-time, period A\u00a02\/0, period B 2\/2 (hours of lectures per week \/ hours of seminars per week) Prerequisites for registration: not Computer Technology and Algorithms I and not Final Bachelor Exam Type of study: consulting Form of teaching: lecture, seminar Mode of completion and credits: Fulfillment of requirements (2 credits) \u00a0 Course objective: The aim of this course is to introduce students into the subject of computer science and data processing and to explain basic principles of computer operations. The part relating to operating systems is centered on operations with computer files and processes in systems Windows and Unix\/Linux. \u00a0 Course content: 1. Introduction to computer science (allowance 2\/0) \u00a0 a. Basic concepts"
}
Is there a way to avoid this TypeError? Also, not all fields are necessary, only "id" and "anonymized_text" are going to be used, so filtering some keys is also a possibility.
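One way the key filtering mentioned above could look is sketched below, using only the standard json module. The `sample_text` string is a hand-trimmed stand-in for the real file, and `keep_fields` is a hypothetical helper name, not something from pandas.

```python
import json

# Trimmed stand-in for the sample file; the nested "date" and "field"
# objects are kept to show that the filter simply ignores them.
sample_text = """
[
  {
    "id": 17308718202881,
    "date": {"year": 2016, "term": null},
    "field": {"code": "11", "name": "Computer Science"},
    "anonymized_text": "Department of Informatics (FBE) ..."
  }
]
"""


def keep_fields(records, fields=("id", "anonymized_text")):
    """Return one flat dict per record containing only the wanted keys."""
    return [{name: record.get(name) for name in fields} for record in records]


rows = keep_fields(json.loads(sample_text))
print(rows)
```

With the real file, `json.load(file_object)` would take the place of `json.loads(sample_text)`. The resulting list of flat dicts can be passed to `pd.DataFrame(rows)`, which may sidestep whatever nested-dict handling triggers the TypeError, although that has not been verified against the full file.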
I am trying to filter out data from API JSON response with Python and I get weird results. I would be glad if somebody can guide me how to deal with the situation.
The main idea is to remove irrelevant data in the JSON and keep only the data that is associated with particular people which I hold in a list.
Here is a snippet of the JSON file:
{
"result": [
{
"number": "Number1",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "John Doe",
"link": "https://some_link.com"
}
},
{
"number": "Number2",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-10 11:07:13",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Tyrell Greenley",
"link": "https://some_link.com"
}
},
{
"number": "Number3",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-20 10:23:35",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Delmar Vachon",
"link": "https://some_link.com"
}
},
{
"number": "Number4",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Samual Isham",
"link": "https://some_link.com"
}
}
]
}
Here is the Python code:
import json
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
# Load JSON file
with open('extract.json', 'r') as input_file:
    input_data = json.load(input_file)
# Create a function to clear the data
def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in data:
        print(elem['assigned_to']['display_value'] not in users)
        if elem['assigned_to']['display_value'] not in users:
            print('Removing {} from JSON as not present in list of names.'.format(elem['assigned_to']['display_value']))
            data.remove(elem)
        else:
            print('Keeping the record for {} in JSON.'.format(elem['assigned_to']['display_value']))
    return data
cd = clear_data(input_data['result'], users_test)
And here is the output, which seems to iterate through only 2 of the items in the file:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
Process finished with exit code 0
It seems that the problem is related to the .remove() method; however, I can't find any other suitable way to delete the particular items that I do not need.
Here is the output of the iteration without applying the remove() method:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Tyrell Greenley from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
False
Keeping the record for Samual Isham in JSON.
Process finished with exit code 0
Note: I have left the check for the name visible on purpose.
I would appreciate any ideas to sort out the situation.
If you don't need to log info about the people you are removing, you could simply use a list comprehension:
filtered = [i for i in data['result'] if i['assigned_to']['display_value'] in users_test]
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
solution = []
for user in users_test:
    print(user)
    for value in data['result']:
        if user == value['assigned_to']['display_value']:
            solution.append(value)
print(solution)
For more efficient code, as asked by @NomadMonad:
solution = list(filter(lambda x: x['assigned_to']['display_value'] in users_test, data['result']))
You are modifying a list while iterating over it, which makes the loop skip elements. Check out this blog post which describes this behavior.
A safer way to do this is to iterate over a copy of the list and delete from the original:
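import copy
def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in copy.deepcopy(data):  # deepcopy handles nested dicts
        # Removing from the original is safe because the loop runs over the copy
        if elem['assigned_to']['display_value'] not in users:
            data.remove(elem)
    return data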
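To see why the original loop only reached two of the four records, here is a minimal sketch that reproduces the skipping with plain strings instead of the nested records. The helper names `remove_while_iterating` and `filter_with_comprehension` are invented for this illustration.

```python
def remove_while_iterating(items, keep):
    """Reproduce the bug: remove from a list while looping over that same list."""
    items = list(items)  # work on a private copy of the input
    visited = []
    for item in items:
        visited.append(item)
        if item not in keep:
            # Every later element shifts one slot left, so the loop's
            # internal index jumps over the element that follows.
            items.remove(item)
    return visited, items


def filter_with_comprehension(items, keep):
    """Build a new list instead of mutating the one being iterated."""
    return [item for item in items if item in keep]


names = ["John Doe", "Tyrell Greenley", "Delmar Vachon", "Samual Isham"]
keep = {"Samual Isham"}

visited, remaining = remove_while_iterating(names, keep)
print(visited)    # only the names the buggy loop actually looked at
print(remaining)  # names that survived, including one that should not have
print(filter_with_comprehension(names, keep))
```

The buggy version never inspects "Tyrell Greenley" or "Samual Isham", which matches the two-line output in the question, while the comprehension checks every name exactly once.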
I receive a JSON response from Lobbyview. I tried to put it in a DataFrame to access only some variables, but with no success. How can I access only some variables, such as the id and the committees, in a format exportable to .dta? Here is the code I have tried.
import requests, json
query = {"naics": "424430"}
results = requests.post('https://www.lobbyview.org/public/api/reports',
data = json.dumps(query))
print(results.json())
import pandas as pd
b = pd.DataFrame(results.json())
_id = data["_id"]
committee = data["_source"]["specific_issues"][0]["bills_by_algo"][0]["committees"]
An observation of the json looks like this:
"_score": 4.421936,
"_type": "object",
"_id": "5EZUMbQp3hGKH8Uq2Vxuke",
"_source":
{
"issue_codes": ["CPT"],
"received": 1214320148,
"client_name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION",
"amount": 240000,
"client":
{
"legal_name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION",
"name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION",
"naics": null,
"gvkey": null,
"ticker": "Unlisted",
"id": null,
"bvdid": "US131283992L"},
"specific_issues": [
{
"text": "H.R. 34, H.R. 1908, H.R. 2336, H.R. 3093 S. 522, S. 681, S. 1145, S. 1745",
"bills_by_algo": [
{
"titles": ["To amend title 35, United States Code, to provide for patent reform.", "Patent Reform Act of 2007", "Patent Reform Act of 2007", "Patent Reform Act of 2007"],
"top_terms": ["Commerce", "Administrative fees"],
"sponsor":
{
"firstname": "Howard",
"district": 28,
"title": "rep",
"id": 400025
},
"committees": ["House Judiciary"],
"introduced": 1176868800,
"type": "HR", "id": "110_HR1908"},
{
"titles": ["To amend title 35, United States Code, relating to the funding of the United States Patent and Trademark Office."],
"top_terms": ["Commerce", "Administrative fees"],
"sponsor":
{
"firstname": "Howard",
"district": 28,
"title": "rep",
"id": 400025
},
"committees": ["House Judiciary"],
"introduced": 1179288000,
"type": "HR",
"id": "110_HR2336"
}],
"gov_entities": ["U.S. House of Representatives", "Patent and Trademark Office (USPTO)", "U.S. Senate", "UNDETERMINED", "U.S. Trade Representative (USTR)"],
"lobbyists": ["Valente, Thomas Silvio", "Wamsley, Herbert C"],
"year": 2007,
"issue": "CPT",
"id": "S4nijtRn9Q5NACAmbqFjvZ"}],
"year": 2007,
"is_latest_amendment": true,
"type": "MID-YEAR AMENDMENT",
"id": "1466CDCD-BA3D-41CE-B7A1-F9566573611A",
"alternate_name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION"
},
"_index": "collapsed"}
Since the data you want is nested fairly deeply in the JSON response, you have to loop through it and collect it in a list first. To understand the response data better, I would advise you to inspect the JSON structure with a tool such as this online JSON viewer. Not every entry in the JSON contains the necessary data, so I catch the resulting KeyError with try/except. To make sure that the id and committees are matched correctly, I add them to the list as small dicts. This list can then be read into pandas with ease. Saving to .dta requires converting the lists inside the committees column to strings; alternatively, you might want to save as .csv for a more generally usable format.
import requests, json
import pandas as pd
query = {"naics": "424430"}
results = requests.post(
    "https://www.lobbyview.org/public/api/reports", data=json.dumps(query)
)
json_response = results.json()["result"]
# to save the JSON response
# with open("data.json", "w") as outfile:
#     json.dump(results.json()["result"], outfile)
resulting_data = []
# loop through the response
for data in json_response:
    # try to find entries with specific issues, bills_by_algo and committees
    try:
        # loop through the special issues
        for special_issue in data["specific_issues"]:
            _id = special_issue["id"]
            # loop through the bills_by_algo's
            for x in special_issue["bills_by_algo"]:
                # append the id and committees in a dict
                resulting_data.append({"id": _id, "committees": x["committees"]})
    except KeyError as e:
        print(e, "not found in entry.")
        continue
# create a DataFrame
df = pd.DataFrame(resulting_data)
# export of list objects in the column is not supported by .dta, therefore we convert
# to strings with ";" as delimiter
df["committees"] = ["; ".join(map(str, l)) for l in df["committees"]]
print(df)
df.to_stata("result.dta")
Results in
id committees
0 D8BxG5664FFb8AVc6KTphJ House Judiciary
1 D8BxG5664FFb8AVc6KTphJ Senate Judiciary
2 8XQE5wu3mU7qvVPDpUWaGP House Agriculture
3 8XQE5wu3mU7qvVPDpUWaGP Senate Agriculture, Nutrition, and Forestry
4 kzZRLAHdMK4YCUQtQAdCPY House Agriculture
.. ... ...
406 ZxXooeLGVAKec9W2i32hL5 House Agriculture
407 ZxXooeLGVAKec9W2i32hL5 Senate Agriculture, Nutrition, and Forestry; H...
408 ZxXooeLGVAKec9W2i32hL5 House Appropriations; Senate Appropriations
409 ahmmafKLfRP8wZay9o8GRf House Agriculture
410 ahmmafKLfRP8wZay9o8GRf Senate Agriculture, Nutrition, and Forestry
[411 rows x 2 columns]
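The loop above reads `data["specific_issues"]` and uses the issue-level id, whereas the observation pasted in the question nests everything under `"_source"` and asks for the report-level `"_id"`. The following offline sketch assumes the record shape from the question's sample. `flatten_report` is a hypothetical helper, and it uses `dict.get` with defaults in place of try/except so that incomplete entries just yield no rows.

```python
def flatten_report(report):
    """Yield one {"id", "committees"} row per bill found in a single report."""
    source = report.get("_source", {})
    for issue in source.get("specific_issues", []):
        for bill in issue.get("bills_by_algo", []):
            yield {
                "id": report.get("_id"),
                # Join the list so the value is a plain string (needed for .dta).
                "committees": "; ".join(bill.get("committees", [])),
            }


# Hand-trimmed copy of the observation shown in the question.
sample_report = {
    "_id": "5EZUMbQp3hGKH8Uq2Vxuke",
    "_source": {
        "specific_issues": [
            {
                "bills_by_algo": [
                    {"committees": ["House Judiciary"], "id": "110_HR1908"},
                    {"committees": ["House Judiciary"], "id": "110_HR2336"},
                ],
                "id": "S4nijtRn9Q5NACAmbqFjvZ",
            }
        ]
    },
}

rows = list(flatten_report(sample_report))
print(rows)
```

The `rows` list has the same shape as `resulting_data` above, so it can be handed to `pd.DataFrame` and exported in the same way.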
I am reading a file from a URL, and its format is as below:
[{
"Coupon_ID": "IW12390",
"Campaign_ID": "353",
"Campaign_Name": "Dominos",
"Title": "Get 10% Off on INR 400",
"Description": "Get 10% Off on INR 400 & Above. Valid for online order only. This code is not valid on Simply Veg, Simply N-Veg Pizzas, and Combos.",
"Category": "Food & Beverages",
"Type": "Coupon",
"Type_Value": "DOM10",
"Tracking_URL": "http:\/\/tracking.icubeswire.com\/aff_c?offer_id=353&aff_id=1784",
"Added_Date": "2017-02-01",
"Expiry_Date": "2017-02-07"
},{
"Coupon_ID": "IW12392",
"Campaign_ID": "2269",
"Campaign_Name": "Shoppers Stop",
"Title": "Flat 50% on Fratini Woman",
"Description": "Flat 50% on Fratini Woman only at Shoppers Stop. So Hurry!\r\n\r\n",
"Category": "E-Commerce",
"Type": "Deal",
"Type_Value": "None",
"Tracking_URL": "http:\/\/tracking.icubeswire.com\/aff_c?offer_id=2269&aff_id=1784&url_id=16740",
"Added_Date": "2017-01-05",
"Expiry_Date": "2017-02-01"
},]
Now I want this value as a list variable. That is, I need to load it into a variable x so that
print x shows the same as the input, and
x[0] is the first dict,
x[1] is the second dict, and so on.
I tried Python's urllib and json modules, but nothing worked.
The code I tried:
import json,urllib;
f = urllib.urlopen("http://assets.icubeswire.com/dealscoupons /api/getcoupon.php?API_KEY=365dx177x70080ce8b07a05e47ae79118d8641")
x = json.load(f)
# The API key has been changed, so this URL will not work as-is; please try with the sample input.
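One detail of the sample stands out: the list ends with `},]`, and a comma before the closing bracket is not valid JSON, so Python's json module rejects it. Whether the real feed contains that comma or it crept in while pasting is not known. The sketch below assumes it is really there and is the only defect. `raw_text` is a shortened stand-in for the feed and `load_coupons` is a hypothetical helper name.

```python
import json
import re

# Shortened stand-in for the feed shown above, including the trailing comma.
raw_text = """
[{
"Coupon_ID": "IW12390",
"Campaign_Name": "Dominos"
},{
"Coupon_ID": "IW12392",
"Campaign_Name": "Shoppers Stop"
},]
"""


def load_coupons(text):
    """Parse the feed into a list of dicts, tolerating one trailing comma."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Assumption: the only defect is a comma right before the final "]".
        repaired = re.sub(r",\s*\]\s*$", "]", text)
        return json.loads(repaired)


x = load_coupons(raw_text)
print(x[0])  # first dict
print(x[1])  # second dict
```

With the live feed, `text` would be the decoded body of the HTTP response. The regular expression only touches the very end of the document, so it is a narrow repair and not a general fix for malformed JSON.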