How to generate a summary, like IBM does, from JSON using the Discovery News service with Python
# Note: a Python dict cannot hold three 'filter' keys; only the last one survives.
qopts = {'nested': '(enriched_text.entities)', 'filter': '(enriched_text.entities.type::Person)',
         'term': '(enriched_text.entities.text,count:10)', 'filter': '(enriched_text.concepts.text:infosys)',
         'filter': '(enriched_text.concepts.text:ceo)'}
my_query = discovery.query('system', 'news', qopts)
print(json.dumps(my_query, indent=2))
Is this query correct for finding the CEO of Infosys? The output comes back as one large JSON document, so how do I identify the answer in it, or build a summary such as a top-ten list of CEOs or people?
I believe there are two questions here.
In order to answer a question like "Who is the CEO of Infosys?" I would instead make use of the natural_language_query parameter as follows:
qopts = {'natural_language_query': 'Who is the CEO of Infosys?', 'count': '5'}
response = discovery.query(environment_id='system', collection_id='news', query_options=qopts)
print(json.dumps(response, indent=2))
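If you only want to skim the hits instead of reading the whole JSON, the matching documents live in the response's results list. A minimal sketch (the title field is typical for Discovery News documents, but check the keys in your own response):

# Print one line per matching document instead of the full JSON blob.
for doc in response['results']:
    print(doc.get('title'))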
In order to make use of aggregations, they must be specified in a single aggregation parameter combined with filter aggregations in the query options as follows:
qopts = {'aggregation': 'nested(enriched_text.entities).filter(enriched_text.entities.type::Person).term(enriched_text.entities.text,count:10)',
         'filter': 'enriched_text.entities:(text:Infosys,type:Company)',
         'count': '0'}
response = discovery.query(environment_id='system', collection_id='news', query_options=qopts)
print(json.dumps(response, indent=2))
Notice that aggregations are chained/combined with the . symbol.
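Once that runs, the counts come back under the response's aggregations key, nested in the same order the aggregation string was chained. A rough sketch of walking down to the term buckets, assuming the nested/filter/term structure above (adjust the indexing if your response is shaped differently):

# Walk nested -> filter -> term, then print the top person names with counts.
agg = response['aggregations'][0]   # the nested(...) level
agg = agg['aggregations'][0]        # the filter(...) level
agg = agg['aggregations'][0]        # the term(...) level
for bucket in agg['results']:
    print(bucket['key'], bucket['matching_results'])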
Trying to use the Lobbying Disclosure API from the US Congress database in Python.
I want to get only the contributions for specific members of congress along with who contributed (lobbyist or organization).
import requests
import json

parameters = {
    "contribution_payee": "Rashida Tlaib for Congress"
}
response = requests.get("https://lda.senate.gov/api/v1/contributions", params=parameters)
# print(response.json())

def jprint(obj):
    # Pretty-print the JSON response into a text file.
    text = json.dumps(obj, sort_keys=True, indent=4)
    with open('Test.txt', 'w') as f:
        f.write(text)

jprint(response.json())
I am getting lists of dictionaries covering not just Rashida Tlaib, for example, but everyone on the LDA-203 form who also received a donation from that lobbyist or organization.
In the output text file, I only want the data for Rashida Tlaib, not John Thune or Ted Lieu for example, but I still want the name of the organization and lobbyist who donated.
Here is an example of what each LDA-203 contribution form looks like; it includes every candidate who received a donation from a specific organization or lobbyist. I am using Python to narrow the data down to specific members of Congress rather than sifting through it by hand: https://lda.senate.gov/filings/public/contribution/6285c999-2ec6-4d27-8963-b40bab7def55/print/
Is there a way I can narrow down my results to only include certain members of congress that I pass as a parameter, while excluding the information of everyone else who received a donation from that lobbyist or org?
Was thinking regular expressions could do the trick, but I am not very good at implementing them. Should I try to do this in R instead of Python?
Thank you!
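Plain Python filtering (no regex needed) may be enough here: loop over the returned filings and keep only the contribution items whose payee matches, while still reading the registrant off the enclosing filing. This is only a sketch; the field names below (results, registrant, contribution_items, payee_name, contributor_name, amount) are assumptions based on the LDA-203 structure, so print one raw filing first and adjust the keys to whatever the API actually returns.

import requests

parameters = {"contribution_payee": "Rashida Tlaib for Congress"}
response = requests.get("https://lda.senate.gov/api/v1/contributions", params=parameters)

# Each filing lists every payee on the LDA-203 form, so filter item by item.
for filing in response.json().get("results", []):
    registrant = (filing.get("registrant") or {}).get("name")
    for item in filing.get("contribution_items", []):
        if "Rashida Tlaib" in (item.get("payee_name") or ""):
            print(registrant, item.get("contributor_name"), item.get("amount"))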
I am looking at the PatentsView API and it's unclear how to retrieve the full text of a patent. It contains only the detail_desc_length, not the actual detailed description.
I would like to perform the following on both the patent_abstract and the "detailed_description".
import httpx

url = 'https://api.patentsview.org/patents/query?q={"_and": [{"_gte":{"patent_date":"2001-01-01"}},{"_text_any":{"patent_abstract":"radar aircraft"}},{"_neq":{"assignee_lastknown_country":"US"}}]}&o={"per_page": 1000}'
r = httpx.get(url)
r.json()
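As a side note, hand-writing JSON inside the URL is fragile (the options object must be passed as o=, not o:). Building the query with json.dumps and letting httpx encode the parameters is less error-prone; a sketch of the same request:

import json
import httpx

# Build the PatentsView query as Python structures, then serialize them.
query = {"_and": [
    {"_gte": {"patent_date": "2001-01-01"}},
    {"_text_any": {"patent_abstract": "radar aircraft"}},
    {"_neq": {"assignee_lastknown_country": "US"}},
]}
options = {"per_page": 1000}
r = httpx.get("https://api.patentsview.org/patents/query",
              params={"q": json.dumps(query), "o": json.dumps(options)})
print(r.json())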
You should take a look at patent_client! It's a python module that searches the live USPTO and EPO databases using a Django-style API. The results from any query can then be cast into pandas DataFrames or Series with a simple .to_pandas() call.
from patent_client import Patent

result = Patent.objects.filter(issue_date__gt="2001-01-01", abstract="radar aircraft")
# That provides an iterator of Patent objects that match the query.
# You can grab abstracts and detailed descriptions like this:
for patent in result:
    patent.abstract
    patent.description

# Or you can just load it up in a Pandas dataframe:
result.values("publication_number", "abstract", "description").to_pandas()
# Produces a Pandas dataframe with columns for the patent number, abstract, and description.
A great place to start is the User Guide Introduction
PyPI | GitHub | Docs
(Full disclosure - I'm the author and maintainer of patent_client)
I want to use the census API to pull employment data that is identical to the CB1100A11 table (Screenshot attached). Each row of this table represents a different 2-digit NAICS sector. Although structuring this table is another task entirely, it appears that I am unable to get API data when I include additional variables.
I have had success with each of the example URLs the Census Bureau provides, but I have not had any success with my own. I have included a code snippet below, minus my key, to show what this looks like. I am using Python 3 in Jupyter Notebooks and BeautifulSoup (bs4).
I have already consulted the API users documentation and variable list without success.
import requests

example_vars = 'NAICS2007_TTL,GEO_TTL,EMP,LFO_TTL,ESTAB,PAYANN'
my_vars = 'NAICS2007,NAICS2007_TTL,GEO_TTL,EMP,LFO_TTL,ESTAB,PAYANN'
county_fips = '027'
state_fips = '42'
key = 'str'  # actual API key removed

url = 'https://api.census.gov/data/2011/cbp?get=' + my_vars + '&for=county:' + county_fips + '&in=state:' + state_fips + '&key=' + key
res = requests.get(url)
res.status_code
When I add additional variables like NAICS2007 I receive a status code 400, but when I use the example variables I get a 200. The common denominator seems to be my code. Can anyone help?
[image of the CB1100A11 table]
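For debugging, building the same request with a params dict lets requests handle the URL encoding, and the body of a 400 response from the Census API usually names the offending variable. A sketch:

import requests

params = {
    "get": "NAICS2007,NAICS2007_TTL,GEO_TTL,EMP,LFO_TTL,ESTAB,PAYANN",
    "for": "county:027",
    "in": "state:42",
    "key": "str",  # replace with a real API key
}
res = requests.get("https://api.census.gov/data/2011/cbp", params=params)
print(res.status_code)
print(res.text)  # on a 400, this usually contains a short error message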
This should be moved to the comments (I can't comment because of my reputation), but as someone who has worked closely with the US Census API, I highly recommend using the Census library:
https://github.com/datamade/census
One of my queries looks like this (where acs1dp is the database I am querying):
from census import Census

conn = Census("MY API KEY")

name = 'NAME'
agriculture = 'DP03_0033PE'
laborForce = 'DP03_0003PE'
travelTime = 'DP03_0025E'
highSchool = 'DP02_0066PE'
unemployed = 'DP03_0009PE'
poverty = 'DP03_0128PE'

payload = conn.acs1dp.get((name, travelTime, agriculture, poverty,
                           unemployed, laborForce, highSchool), {'for': 'state:*'})
which returns each of those column values for all of the states.
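Since the library returns a list of dicts, the payload also loads straight into a DataFrame if you want it as a table; a small sketch:

import pandas as pd

# Each dict becomes a row; each variable code becomes a column.
df = pd.DataFrame(payload)
print(df.head())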
I am setting up a weather camera which will provide a live stream of the current conditions outside, but I also would like to overlay continuously updated weather conditions (temperature, wind speed/direction, current weather) from a local National Weather Service weather station, from a browser API source provided in JSON format.
I have had success extracting the desired values from a different API source using a Python script I wrote; however long story short that API source is unreliable. Therefore I am using API from the official National Weather Service ASOS station at my nearby airport. The output from the new API source I am polling from is rather complicated, however, with various tiers of indentation. I have not worked with Python very long and tutorials and guides online have either been for other languages (Java or C++ mostly) or have not worked for my specific case.
First off, about the structure of the JSON I am receiving: the values I am trying to extract are listed under the OBSERVATIONS section, associated with precip_accum_24_hour_value_1, wind_gust_value_1, wind_cardinal_direction_value_1d, and so on. The issue is that there are two values underneath each observation, so the script I have tried isn't returning the values I want. Here is the code I have tried:
import urllib.request
import json

f = urllib.request.urlopen('https://api.synopticdata.com/v2/stations/latest?token=8c96805fbf854373bc4b492bb3439a67&stid=KSTC&complete=1&units=english&output=json')
json_string = f.read()
parsed_json = json.loads(json_string)

for each in parsed_json['STATION']:
    observations = each['OBSERVATIONS']
    print(observations)
This prints out everything underneath the OBSERVATIONS in the JSON as expected, as one long string.
{'precip_accum_24_hour_value_1': {'date_time': '2018-12-06T11:53:00Z', 'value': 0.01}, 'wind_gust_value_1': {'date_time': '2018-12-12T01:35:00Z', 'value': 14.0},
to show a small snippet of the output I am receiving. I was hoping I could individually extract the values I want from this output, but nothing I have attempted has worked. I would really appreciate some guidance on finishing this piece of code so I can return the values I am looking for; I suspect it requires some kind of loop or special syntax.
Try something like this:
for each in parsed_json['STATION']:
    observations = each['OBSERVATIONS']
    for k, v in observations.items():
        print(k, v["value"])
JSON maps well into python's dictionary and list types, so accessing substructures can be done with a[<index-or-key>] syntax. Iteration over key-value pairs of a dictionary can be done as I've shown above. If you're not familiar with dictionaries in python yet, I'd recommend reading about them. Searching online should yield a lot of good tutorials.
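For example, to grab a couple of the fields directly (key names taken from the OBSERVATIONS snippet in the question):

for station in parsed_json['STATION']:
    obs = station['OBSERVATIONS']
    # Each observation holds a 'date_time' and a 'value'; take the value.
    print(obs['precip_accum_24_hour_value_1']['value'])
    print(obs['wind_gust_value_1']['value'])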
Does this help?
When you say the JSON is complicated, it really is just nested dictionaries within the main JSON response. You would access them in the same way as you would the initial JSON blob:
import urllib.request
import json

f = urllib.request.urlopen('https://api.synopticdata.com/v2/stations/latest?token=8c96805fbf854373bc4b492bb3439a67&stid=KSTC&complete=1&units=english&output=json')
json_string = f.read()
parsed_json = json.loads(json_string)

for each in parsed_json['STATION']:
    for value in each:
        print(value, each[value])
I'm new to using an API, and I'm currently trying to use Elsevier API. My goal is to extract the author (university) affiliations for each submission in a given journal. I've set up the API Key and looked at the exampleProg.py found here.
The How-To guides also aren't very helpful with my specific task. Could someone point me in the right direction?
Using the pybliometrics package that we designed (we're Scopus users without an Elsevier affiliation), it's very easy:
from pybliometrics.scopus import ScopusSearch

q = "ISSN(0036-8075)"  # Query for the journal Science
s = ScopusSearch(q)    # Handles access, retrieval and parsing
pubs = s.results       # A list of namedtuples, one for each publication

data = []
for pub in pubs:
    if not pub.author_ids:
        continue
    authors = pub.author_ids.split(";")
    affs = pub.author_afids.split(";")  # Multiple affiliations per author are joined on a hyphen!
    data.extend(list(zip(authors, affs)))
We designed it so that missing affiliations are simply stored as empty strings.
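If the end goal is a per-affiliation tally, the (author, affiliation) pairs collect naturally into a Counter; a small sketch over the data list built above, skipping the empty strings:

from collections import Counter

# Count how often each Scopus affiliation ID appears across the journal.
counts = Counter(aff for _, aff in data if aff)
print(counts.most_common(10))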