Finding Wikidata identifiers (properties & lexemes) - Python

My problem:
I'm writing an NLP program in Python and I need to get the entity IDs for properties and lexemes. What I basically want is, e.g., if the input is the word/property "father", the return value should be "P22" (the property number for father). I already know some methods for getting the Q-number (see below).
from requests import get

def get_qnumber(wikiarticle, wikisite):
    # Resolve a site/title pair to its Wikidata Q-number via wbgetentities
    resp = get('https://www.wikidata.org/w/api.php', {
        'action': 'wbgetentities',
        'titles': wikiarticle,
        'sites': wikisite,
        'props': '',
        'format': 'json'
    }).json()
    return list(resp['entities'])[0]

print(get_qnumber(wikiarticle="Andromeda Galaxy", wikisite="enwiki"))
I thought getting the P- and L-numbers would work similarly, but finding the lexeme and property numbers seems to be much trickier.
What I've tried:
The closest thing I've found is manually searching for ID numbers with https://www.wikidata.org/wiki/Special:Search and putting "P:" or "L:" in the search string.
I also found some SPARQL code, but it was slow and I don't know how to refine the search to exclude unrelated results.
query = """
SELECT ?item
WHERE
{
?item rdfs:label "father"#en
}
"""
I'm a total noob at this and haven't found any info on Google. Am I approaching this completely wrong, or am I missing something really obvious?

Use action=wbsearchentities with type=property or type=lexeme:
import requests

params = dict(
    action='wbsearchentities',
    format='json',
    language='en',
    uselang='en',
    type='property',
    search='father'
)

response = requests.get('https://www.wikidata.org/w/api.php', params).json()
print(response.get('search')[0]['id'])
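The same endpoint handles lexemes: pass type='lexeme' instead of type='property'. A minimal wrapper (the helper name get_entity_id is just for illustration):

def get_entity_id(search, entity_type):
    # entity_type is one of 'item', 'property', 'lexeme' (also 'form', 'sense')
    resp = requests.get('https://www.wikidata.org/w/api.php', {
        'action': 'wbsearchentities',
        'format': 'json',
        'language': 'en',
        'uselang': 'en',
        'type': entity_type,
        'search': search
    }).json()
    matches = resp.get('search', [])
    return matches[0]['id'] if matches else None

print(get_entity_id('father', 'property'))  # P22
print(get_entity_id('father', 'lexeme'))    # L-number of the first matching lexeme

Note that wbsearchentities returns a ranked list, so for ambiguous words you may want to inspect more than the first hit.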
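If you would rather stay with SPARQL, the slow query from the question can be narrowed by type, which also excludes unrelated results. A sketch against the public query service (the User-Agent string is a placeholder; WDQS asks clients to identify themselves):

import requests

SPARQL = """
SELECT ?property WHERE {
  ?property rdf:type wikibase:Property ;
            rdfs:label "father"@en .
}
"""

resp = requests.get(
    'https://query.wikidata.org/sparql',
    params={'query': SPARQL, 'format': 'json'},
    headers={'User-Agent': 'my-nlp-script/0.1 (example@example.org)'}
).json()
for row in resp['results']['bindings']:
    print(row['property']['value'])  # e.g. http://www.wikidata.org/entity/P22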

Related

PyMongo Atlas Search not returning anything

I'm trying to do a full-text search using Atlas Search for MongoDB, through the PyMongo driver in Python. I'm using the aggregation pipeline with a $search stage, but it seems to return nothing.
cursor = db.collection.aggregate([
    {"$search": {"text": {"query": "hello", "path": "text_here"}}},
    {"$project": {"file_name": 1}}
])
for x in cursor:
    print(x)
What I'm trying to achieve with this code is to search a field in the collection called "text_here" for the term "hello" and return all matching documents, listed by their "file_name". However, it returns nothing, and I'm quite confused since this is almost identical to the example code on the documentation website. The only thing I can think of is that the path isn't correct and it can't access the field I've specified. The code also raises no errors; it simply returns nothing, as I verified by looping through the cursor.
I had the same issue. I solved it by also passing the name of the index in the query. For example:
{
  index: "name_of_the_index",
  text: {
    query: 'john doe',
    path: 'name'
  }
}
I followed the tutorials but couldn't get any results back without specifying the "index" name. I wish the documentation listed it as mandatory.
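In PyMongo, that corresponds to adding the index key inside the $search stage (the index name here is a placeholder):

cursor = db.collection.aggregate([
    {"$search": {
        "index": "name_of_the_index",   # the Atlas Search index you created
        "text": {"query": "hello", "path": "text_here"}
    }},
    {"$project": {"file_name": 1}}
])
for doc in cursor:
    print(doc)

If your index was created with the default name, it is called "default", which is why tutorial code that omits the key can still work.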
If you are only doing a find and project, you don't need an aggregate query, just a find(). The syntax you want is:
db.collection.find({'$text': {'$search': 'hello'}}, {'file_name': 1})
Equivalent using aggregate:
cursor = db.collection.aggregate([
    {'$match': {'$text': {'$search': 'hello'}}},
    {'$project': {'file_name': 1}}
])
Worked example:
from pymongo import MongoClient, TEXT

db = MongoClient()['mydatabase']
# $text queries require a text index on the searched field
db.collection.create_index([('text_here', TEXT)])
db.collection.insert_one({"text_here": "hello, is it me you're looking for", "file_name": "foo.bar"})
cursor = db.collection.find({'$text': {'$search': 'hello'}}, {'file_name': 1})
for item in cursor:
    print(item)
prints:
{'_id': ObjectId('5fc81ce9a4a46710459de610'), 'file_name': 'foo.bar'}

MediaWiki API revisions VS allrevisions

I am trying to write a script to get the revision history of biographies (the goal is to investigate how a biography changes over time). I have read most of the related articles here and the documentation on the revisions module, but I can't get the results I want. I'm posting my code below; most of it is copied (partially or completely) from the documentation. I only changed the value of the titles parameter.
Moreover, I found the allrevisions submodule. I got it to return revisions for a specific biography, but what I get doesn't correspond to the revision history shown on the page.
Code related to "revisions"
import requests

S = requests.session()
URL = "https://www.mediawiki.org/w/api.php"
PARAMS = {
    "action": "query",
    "prop": "revisions",
    "titles": "Albert Einstein",
    "rvprop": "timestamp|user|content",
    "rvslots": "main",
    "formatversion": "2",
    "format": "json"
}
R = S.get(url=URL, params=PARAMS)
DATA = R.json()
print(DATA)
Code related to "allrevisions"
URL = "https://www.mediawiki.org/w/api.php"
PARAMS = {
    "action": "query",
    "list": "allrevisions",
    "titles": "Albert Einstein",
    "arvprop": "user|timestamp|content",
    "arvslots": "main",
    "arvstart": "2020-11-12T12:06:00Z",
    "formatversion": "2",
    "format": "json"
}
R = S.get(url=URL, params=PARAMS)
DATA = R.json()
print(DATA)
Any suggestions to make it work properly? Most importantly, why doesn't the code using "revisions" return anything?
As suggested, I want to get the full revision history for a specific page.
prop modules return information about a specific page (or set of pages) you provide. list modules return information about a list of pages where you only provide some abstract criteria and finding the pages matching those criteria is part of the work the API is doing (as such, titles in your second example will essentially be ignored).
You don't explain clearly what you are trying to do, but I'm guessing you want to get the full page history for a specific title, so your first example is mostly right, except you should set a higher rvlimit.
See also the (unfortunately not very good) docs on continuing queries, since many pages have a history that is too long to return in a single request.
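A sketch of fetching the full history with rvlimit and API continuation. Note that the question queries www.mediawiki.org, where no "Albert Einstein" page exists; for a Wikipedia biography the endpoint would be the one assumed below:

import requests

S = requests.Session()
URL = "https://en.wikipedia.org/w/api.php"
PARAMS = {
    "action": "query",
    "prop": "revisions",
    "titles": "Albert Einstein",
    "rvprop": "timestamp|user",   # add "content" if needed; it lowers the allowed rvlimit
    "rvslots": "main",
    "rvlimit": "max",
    "formatversion": "2",
    "format": "json"
}

revisions = []
while True:
    data = S.get(url=URL, params=PARAMS).json()
    revisions.extend(data["query"]["pages"][0].get("revisions", []))
    if "continue" not in data:
        break
    PARAMS.update(data["continue"])   # carries rvcontinue into the next request

print(len(revisions), "revisions fetched")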

Performing google search in python with API, returned with KeyError

I'm using exactly the same code as in this answer, but it didn't work out.
from googleapiclient.discovery import build
import pprint

my_api_key = "Google API key"
my_cse_id = "Custom Search Engine ID"

def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res['items']

results = google_search(
    'stackoverflow site:en.wikipedia.org', my_api_key, my_cse_id, num=10)
for result in results:
    pprint.pprint(result)
The result shows KeyError: 'items'.
Then I tried removing the ['items'] lookup to see what the result is.
It seems there isn't any key named "items".
So the question is:
How can I tweak the code to get a list of the links ranked in the top 20 Google search results?
Thanks in advance.
Sandra
This happens when the query has no results. If there were results, they would come under res["items"]; since there are none, the items key is not generated.
The custom search engine you created might be restricted to only a few URLs, which would leave the result empty.
Make sure the configuration of your "Custom Search" engine app at Setup -> Basic (Tab) -> Sites to Search (Section) is set to "Search the entire web but emphasize included sites".
Also, in the code, instead of returning res["items"] directly, check whether res["items"] is present and return None otherwise; then the KeyError exception won't happen.
if "items" in res.keys():
return res["items"]
else:
return None
Just replace your return:
return res.get('items', None)
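Put together, the function from the question then becomes (a sketch; returning an empty list so the caller's loop still works):

def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res.get('items', [])   # [] instead of a KeyError when there are no hits

for result in google_search('stackoverflow site:en.wikipedia.org',
                            my_api_key, my_cse_id, num=10):
    pprint.pprint(result['link'])   # each item carries its URL under 'link'

Also note that num is capped at 10 per request, so "top 20" needs a second call with start=11.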

Elasticsearch - Reindex single field with different analyzer using Python

I use dynamic mapping in Elasticsearch to load my JSON file into Elasticsearch, like this:
import json
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

def extract():
    f = open('tmdb.json')
    if f:
        return json.loads(f.read())

movieDict = extract()

def index(movieDict={}):
    for id, body in movieDict.items():
        es.index(index='tmdb', id=id, doc_type='movie', body=body)

index(movieDict)
How can I update the mapping for a single field? I have a field title to which I want to assign a different analyzer.
title_settings = {"properties": {"title": {"type": "text", "analyzer": "english"}}}
es.indices.put_mapping(index='tmdb', body=title_settings)
This fails.
I know that I cannot update an already existing index, but what is the proper way to reindex using the mapping generated from my JSON file? My file has a lot of fields, so creating the mapping/settings manually would be very troublesome.
I am able to specify an analyzer for a query, like this:
query = {"query": {
    "multi_match": {
        "query": userSearch, "analyzer": "english", "fields": ['title^10', 'overview']}}}
How do I specify it for an index or a field?
I am also able to add an analyzer to the settings after closing and reopening the index:
analysis = {'settings': {'analysis': {'analyzer': 'english'}}}
es.indices.close(index='tmdb')
es.indices.put_settings(index='tmdb', body=analysis)
es.indices.open(index='tmdb')
Copying the exact settings for the english analyzer doesn't 'activate' it for my data:
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis-lang-analyzer.html#english-analyzer
By 'activate' I mean that search results are not processed by the english analyzer, i.e. there are still stopwords.
Solved it with a massive amount of googling...
You cannot change the analyzer on already indexed data; this includes closing and reopening the index. The quickest way is to create a new index with a new mapping and load your data into it.
Specifying an analyzer for the whole index isn't a good solution, as the 'english' analyzer is specific to 'text' fields; it's better to specify the analyzer per field.
If analyzers are specified per field, you also need to specify the field type.
Remember that analyzers can be used at index time, at search time, or both. Reference: Specifying analyzers.
Code:
import time

def create_index(movieDict={}, mapping={}):
    es.indices.create(index='test_index', body=mapping)
    start = time.time()
    for id, body in movieDict.items():
        es.index(index='test_index', id=id, doc_type='movie', body=body)
    print("--- %s seconds ---" % (time.time() - start))
Now I've got the mapping from the dynamic mapping of my JSON file. I saved it back to a JSON file for easier editing, because with over 40 fields to map, doing it by hand would be tiresome:
mapping = es.indices.get_mapping(index='tmdb')
This is an example of how the title key should be specified to use the english analyzer:
'title': {'type': 'text', 'analyzer': 'english','fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}
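Putting the steps together, a minimal sketch (the file name edited_mapping.json is an assumption, and the exact nesting of the mapping body differs slightly between Elasticsearch versions):

import json
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# 1. Dump the dynamically generated mapping so it can be edited by hand
mapping = es.indices.get_mapping(index='tmdb')
with open('edited_mapping.json', 'w') as f:
    json.dump(mapping['tmdb'], f, indent=2)

# 2. Edit 'title' in the file as shown above, then create the new index with it
with open('edited_mapping.json') as f:
    edited = json.load(f)
es.indices.create(index='test_index', body={'mappings': edited['mappings']})

# 3. Reload the documents with create_index() above, or copy them server-side:
es.reindex(body={'source': {'index': 'tmdb'}, 'dest': {'index': 'test_index'}})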

Show HTML generated by json2html on a webpage [Django] [Python]?

Following is my code in the view:
RESULTS = {}
for k in RESULTS_LIST[0].iterkeys():
    RESULTS[k] = list(results[k] for results in RESULTS_LIST)
RESULTS.pop('_id', None)
html_table = json2html.convert(json=RESULTS)
return render(request, 'homepage.html', {'html_table': html_table})
Here I am arranging data fetched from MongoDB into a JSON object named RESULTS, and the json2html package successfully generates an HTML table for it. To embed the HTML table code in an empty div on the HTML page, I am doing:
<div>{{html_table}}</div>
But it fails to display the table on the page. I have tried numerous ways but didn't succeed. Please help me resolve this issue; if you have done any relevant example before, please point me in the right direction.
JS code is:
angular.module("homeapp", [])
    .controller("homecontroller", ['$http', '$scope', function($http, $scope) {
        $scope.search = {
            'keyword': null,
            'option': null,
            'startDate': null,
            'lastDate': null
        };
        $scope.search_info = function() {
            var search_req = {
                url: '/home',
                method: 'POST',
                data: $scope.search
            }
            $http(search_req) // in case I don't want any response back, or
            $http(search_req).success(function(response) {
                window.alert(response)
            }) // in case I need to check the response
        }
    }]);
I found an easy solution to this problem using JS, which is given below, and it worked.
$http(search_req).success(function(response) {
    angular.element(".search_results").append(response);
})
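For completeness: the original {{html_table}} approach most likely fails because Django auto-escapes HTML in templates. If you prefer the server-side route, marking the string as safe should work (a sketch, not part of the JS solution above):

from django.utils.safestring import mark_safe

html_table = json2html.convert(json=RESULTS)
return render(request, 'homepage.html', {'html_table': mark_safe(html_table)})

Equivalently, keep the view unchanged and write {{ html_table|safe }} in the template.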
