Search for Q-ID (from Wikidata) with Twitter Username ID. (Python)

I have a list of verified Twitter User-IDs.
data['screen_name'] = [MOFAJapan_en, serenawilliams, JeffBezos ....]
data['twitter_ids'] = [303735625, 26589987, 15506669 ....]
and I want to get their respective Q-IDs from Wikidata. For the above Twitter-username-IDs, it will look sort of like this:
q_id_list = [Q222241, Q11459, Q312556 ....]
I ran into a slight complication here: if you search for MOFAJapan_en or MOFA of Japan, Wikidata API cannot recognize it. However, MOFAJapan has a wikidata page.
I know that the Property # for Twitter username is P2002, but how do I query for this without knowing the Q-ID?
Thank you in advance.

Given a list of Twitter names (inside VALUES), this SPARQL query will find the persons:
SELECT ?twitterName ?person
WHERE {
VALUES ?twitterName {
"MOFAJapan_en"
"serenawilliams"
"JeffBezos"
}
?person wdt:P2002 ?twitterName .
}
It won’t find anything for MOFAJapan_en, as the correct value seems to be MofaJapan_en. To ignore case, you can use a FILTER with LCASE, but this will make the query slower:
SELECT ?twitterName ?person
WHERE {
VALUES ?twitterName_anyCase {
"MOFAJapan_en"
"serenawilliams"
"JeffBezos"
}
FILTER( LCASE(?twitterName_anyCase) = LCASE(?twitterName) ) .
?person wdt:P2002 ?twitterName .
}
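Since the question asks for Python, here is a minimal sketch of how the case-insensitive query could be generated from a list and how the results could be mapped back to Q-IDs. It assumes the public endpoint https://query.wikidata.org/sparql and the standard SPARQL JSON results format; the helper names and the User-Agent string are invented for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_URL = "https://query.wikidata.org/sparql"


def build_query(usernames):
    """Build a case-insensitive P2002 lookup for the given Twitter usernames."""
    # Twitter usernames only contain letters, digits and underscores,
    # so JSON-style quoting is safe inside the SPARQL string literals.
    values = " ".join(json.dumps(name.lower()) for name in usernames)
    return (
        "SELECT ?twitterName ?person WHERE {\n"
        "  VALUES ?wanted { " + values + " }\n"
        "  ?person wdt:P2002 ?twitterName .\n"
        "  FILTER( LCASE(?twitterName) = ?wanted )\n"
        "}"
    )


def parse_bindings(payload):
    """Map lower-cased username -> Q-ID from a SPARQL JSON result document."""
    mapping = {}
    for row in payload["results"]["bindings"]:
        name = row["twitterName"]["value"].lower()
        # Entity URIs look like http://www.wikidata.org/entity/Q312556
        mapping[name] = row["person"]["value"].rsplit("/", 1)[-1]
    return mapping


def lookup_qids(usernames):
    """Query the endpoint (network call) and return Q-IDs in input order."""
    params = urllib.parse.urlencode(
        {"query": build_query(usernames), "format": "json"})
    request = urllib.request.Request(
        WDQS_URL + "?" + params,
        headers={"User-Agent": "qid-lookup-example/0.1"})
    with urllib.request.urlopen(request) as response:
        mapping = parse_bindings(json.load(response))
    # Usernames without a match come back as None.
    return [mapping.get(name.lower()) for name in usernames]
```

Calling lookup_qids(data['screen_name']) would then give a list aligned with the input; for long lists it is probably wise to send the names in batches.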


Is there a way to find record in mongo by matching field string with an array of values

I have the below record
{
"title": "Kim floral jacquard minidress",
"designer": "Rotate Birger Christensen"
}
How can I find a record in the collection using an array of values. For example, I have the below array values. Because "title" field contains the "floral" value, the record is selected.
['floral', 'dresses']
The query I am using below doesn't work. :(
queryParam = ['floral', 'dresses']
def get_query(queryParam, gender):
    query = {
        "gender": gender
    }
    if len(queryParam) != 0:
        query["title"] = {"$in": queryParam}
    return query
products_query = get_query(queryParam, gender)
products = mongo.db.products.find(products_query)
To add to the previous answer, there's a little bit more to do to get this to work in pymongo. You have to use re.compile() to get the regex search to work:
import re
queryParam = [re.compile('floral'), re.compile('dresses')]
Alternatively, you could use this approach, which removes the need for the $in operator:
import re
queryParam = re.compile('floral|dresses')
# then: query["title"] = queryParam
And once you've done that, you don't even need to use re.compile:
queryParam = 'floral|dresses'
...
query = {"title": {"$regex": queryParam}}
Take your pick.
You need to do a regex search along with the $in operator:
db.collectionName.find( { title: { $in: [ /floral/, /dresses/ ] } })
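Putting the suggestions above together, the query builder from the question could look roughly like this. It is only a sketch: the search terms are joined into one alternation pattern and passed via $regex, and re.escape is an addition of mine so that terms containing characters such as "." or "+" are matched literally.

```python
import re


def build_title_query(terms, gender):
    """Return a filter matching documents whose title contains any term."""
    query = {"gender": gender}
    if terms:
        # re.escape keeps regex metacharacters in the terms literal.
        pattern = "|".join(re.escape(term) for term in terms)
        query["title"] = {"$regex": pattern}
    return query
```

The result can be passed to find() exactly like products_query in the question, for example mongo.db.products.find(build_title_query(['floral', 'dresses'], gender)).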

Python GraphQL API call composition

I've recently started learning how to use Python and I'm having some trouble with a GraphQL API call.
I'm trying to set up a loop to grab all the information using pagination, and my first request is working just fine.
values = """
{"query" : "{organizations(ids:) {pipes {id name phases {id name cards_count cards(first:30){pageInfo{endCursor hasNextPage} edges {node {id title current_phase{name} assignees {name} due_date createdAt finished_at fields{name value filled_at updated_at} } } } } }}}"}
"""
But the second call, which uses the end cursor as a variable, isn't working for me. I assume that's because I don't understand how to properly escape the string of the variable, but for the life of me I can't work out how it should be done.
Here's what I've got for it so far...
values = """
{"query" : "{phase(id: """ + phaseID+ """ ){id name cards_count cards(first:30, after:"""\" + pointer + "\"""){pageInfo{endCursor hasNextPage} edges {node {id title assignees {name} due_date createdAt finished_at fields{name value datetime_value updated_at phase_field { id label } } } } } } }"}
"""
As it loops, the second one just returns a 400 Bad Request.
Any help would be greatly appreciated.
As a general rule you should avoid building up queries using string manipulation like this.
In the GraphQL query itself, GraphQL allows variables that can be placeholders in the query for values you will plug in later. You need to declare the variables at the top of the query, and then can reference them anywhere inside the query. The query itself, without the JSON wrapper, would look something like
query = """
query MoreCards($phase: ID!, $cursor: String) {
phase(id: $phase) {
id, name, cards_count
cards(first: 30, after: $cursor) {
... CardConnectionData
}
}
}
"""
To actually supply the variable values, they get passed as an ordinary dictionary
variables = {
"phase": phaseID,
"cursor": pointer
}
The actual request body is a straightforward JSON structure. You can construct this as a dictionary too:
body = {
"query": query,
"variables": variables
}
Now you can use the standard json module to format it to a string
import json
print(json.dumps(body))
or pass it along to something like the requests package that can directly accept the object and encode it for you.
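As a rough sketch of the whole pagination loop built on variables: the field names below are taken from the question, the response is assumed to follow the usual GraphQL layout with the result under "data", and post is a placeholder for whatever function actually sends the request and returns the decoded JSON.

```python
import json

# Field names come from the question; the schema itself is not verified here.
QUERY = """
query MoreCards($phase: ID!, $cursor: String) {
  phase(id: $phase) {
    cards(first: 30, after: $cursor) {
      pageInfo { endCursor hasNextPage }
      edges { node { id title } }
    }
  }
}
"""


def fetch_all_cards(phase_id, post):
    """Collect every card node; `post` sends a JSON string and returns a dict."""
    cards, cursor = [], None
    while True:
        body = {"query": QUERY,
                "variables": {"phase": phase_id, "cursor": cursor}}
        result = post(json.dumps(body))
        connection = result["data"]["phase"]["cards"]
        cards.extend(edge["node"] for edge in connection["edges"])
        if not connection["pageInfo"]["hasNextPage"]:
            return cards
        # Ask for the page that starts after the last card we received.
        cursor = connection["pageInfo"]["endCursor"]
```

With the requests package, post could be something like lambda raw: requests.post(api_url, data=raw, headers={"Content-Type": "application/json"}).json(), where api_url stands for your endpoint. A cursor of None is serialized as null, which requests the first page.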
I had a similar situation where I had to aggregate data by paginating through a GraphQL endpoint. The above solution didn't work that well for me.
To start, my header config for GraphQL was like this:
headers = {
"Authorization":f"Bearer {token}",
"Content-Type":"application/graphql"
}
for my query string, I used the triple quote with a variable placeholder:
user_query = """
{
  user(
    limit:100,
    page:$page,
    sort:[{field:"email",order:"ASC"}]
  ){
    list{
      email,
      count
    }
  }
}
"""
Basically, I had my loop here for the pages:
import requests
for page in range(1, 9):
    formatted_query = user_query.replace("$page", str(page))
    response = requests.post(api_url, data=formatted_query, headers=headers)
    status_code, payload = response.status_code, response.json()

Searching letter by letter in elastic search

I am using Elasticsearch with Python as the client. I want to query through a list of companies. Say the Company field values are
Gokl
Normn
Nerth
Scenario 1 (using elasticsearch-dsl for Python)
s = Search(using=client, index="index-test") \
    .query("match", Company="N")
When I put N in the match query, I don't get Normn or Nerth. I think it's probably because of word-based tokenization.
Scenario 2 (using elasticsearch-dsl for Python)
s = Search(using=client, index="index-test") \
    .query("match", Company="Normn")
When I enter Normn, I get the expected result. So how can I make the search return matches when I enter only the letter N, as in scenario 1?
I think you are looking for a prefix search. I don't know the python syntax but the direct query would look like this:
GET index-test/_search
{
  "query": {
    "prefix": {
      "Company": {
        "value": "N"
      }
    }
  }
}
See here for more info.
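The answer above only gives the raw request. As a rough Python sketch, the same body can be built as a plain dictionary. One assumption is baked in: as far as I know, prefix queries are not analyzed, so if Company is a text field indexed with the default analyzer (which lower-cases terms), the value has to be lower-case to match. Drop the .lower() call if the field is a keyword field. The helper name is made up.

```python
def prefix_query_body(field, text):
    """Build a prefix-query body; the value is lower-cased (see assumption above)."""
    return {"query": {"prefix": {field: {"value": text.lower()}}}}


body = prefix_query_body("Company", "N")
```

With elasticsearch-dsl, a dictionary like this can be turned into a search with Search.from_dict(body) and then pointed at the client and index with .using(client).index("index-test").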
If I understand correctly, you need to query companies whose names start with a specific letter.
In this case you can use this query:
{
"query": {
"regexp": {
"Company": "n.*"
}
}
}
Please read about the query types here.
For this case, you can use the code below:
s = Search(using=client, index="index-test") \
    .query("match_phrase_prefix", Company="N")
You can use a multi_match query for Company and another field like this:
s = Search(using=client, index="index-test") \
    .query("multi_match", query="N", fields=['Company', 'Another_field'], type='phrase_prefix')

aggregate a field in elasticsearch-dsl using python

Can someone tell me how to write Python statements that will aggregate (sum and count) stuff about my documents?
SCRIPT
from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q
# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])
s = Search(using=client, index="attendance")
s = s.execute()
for tag in s.aggregations.per_tag.buckets:
    print (tag.key)
OUTPUT
File "/Library/Python/2.7/site-packages/elasticsearch_dsl/utils.py", line 106, in __getattr__
'%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Response' object has no attribute 'aggregations'
What is causing this? Is the "aggregations" keyword wrong? Is there some other package I need to import? If a document in the "attendance" index has a field called emailAddress, how would I count which documents have a value for that field?
First of all, I notice now that what I wrote here actually has no aggregations defined. The documentation on how to use this is not very readable for me. Using what I wrote above, I'll expand. I'm changing the index name to make for a nicer example.
from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q
# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])
s = Search(using=client, index="airbnb", doc_type="sleep_overs")
# Don't call s.execute() yet: the aggregation must be defined on the Search object first.
# invalid! You haven't defined an aggregation.
#for tag in s.aggregations.per_tag.buckets:
#    print (tag.key)
# Let's make an aggregation
# 'by_house' is a name you choose, 'terms' is a keyword for the type of aggregator
# 'field' is also a keyword, and 'house_number' is a field in our ES index
s.aggs.bucket('by_house', 'terms', field='house_number', size=0)
Above we're creating one bucket per house number, so the key of each bucket is the house number. Elasticsearch (ES) always returns the count of documents that fit into each bucket. size=0 means to give us all results, since ES has a default setting to return only 10 results (or whatever your dev set it up to do).
# This runs the query.
s = s.execute()
# let's see what's in our results
print(s.aggregations.by_house.doc_count)
print(s.hits.total)
print(s.aggregations.by_house.buckets)
for item in s.aggregations.by_house.buckets:
    print(item.doc_count)
My mistake before was thinking an Elasticsearch query had aggregations by default. You define them yourself, then execute the query. Then your response can be split by the aggregators you defined.
The CURL for the above should look like:
NOTE: I use SENSE, an Elasticsearch plugin/extension/add-on for Google Chrome. In SENSE you can use // to comment things out.
POST /airbnb/sleep_overs/_search
{
// the size 0 here actually means to not return any hits, just the aggregation part of the result
"size": 0,
"aggs": {
"by_house": {
"terms": {
// the size 0 here means to return all results, not just the default 10 results
"field": "house_number",
"size": 0
}
}
}
}
Work-around. Someone on the GIT of DSL told me to forget translating, and just use this method. It's simpler, and you can just write the tough stuff in CURL. That's why I call it a work-around.
# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])
s = Search(using=client, index="airbnb", doc_type="sleep_overs")
# how simple: we just paste the CURL body here
body = {
    "size": 0,
    "aggs": {
        "by_house": {
            "terms": {
                "field": "house_number",
                "size": 0
            }
        }
    }
}
s = Search.from_dict(body)
s = s.index("airbnb")
s = s.doc_type("sleep_overs")
body = s.to_dict()
t = s.execute()
for item in t.aggregations.by_house.buckets:
    # item.key will be the house number
    print(item.key, item.doc_count)
Hope this helps. I now design everything in CURL, then use Python statement to peel away at the results to get what I want. This helps for aggregations with multiple levels (sub-aggregations).
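To illustrate the "peel away at the results" step for nested aggregations, here is a small sketch that walks a raw response dictionary (for example the output of to_dict() on the response) and flattens terms buckets at any depth. The function name and the by_guest sub-aggregation in the usage line are hypothetical; the "aggregations" / "buckets" / "key" / "doc_count" layout is the standard shape of a terms aggregation result.

```python
def flatten_buckets(agg, prefix=()):
    """Yield (keys, doc_count) for every innermost bucket of a terms aggregation."""
    for bucket in agg["buckets"]:
        keys = prefix + (bucket["key"],)
        # Sub-aggregations appear as extra entries that hold their own "buckets".
        children = [value for value in bucket.values()
                    if isinstance(value, dict) and "buckets" in value]
        if children:
            for child in children:
                yield from flatten_buckets(child, keys)
        else:
            yield keys, bucket["doc_count"]
```

For a by_house aggregation with a by_guest sub-aggregation, list(flatten_buckets(raw["aggregations"]["by_house"])) gives one (house_number, guest) tuple per innermost bucket together with its document count.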
I do not have the rep to comment yet, but I wanted to make a small fix to Matthew's comment on VISQL's answer regarding from_dict. If you want to maintain the search properties, use update_from_dict rather than from_dict.
According to the docs, from_dict creates a new search object, but update_from_dict modifies the object in place, which is what you want if the Search already has properties such as index, using, etc.
So you would want to declare the query body before the search and then create the search like this:
query_body = {
"size": 0,
"aggs": {
"by_house": {
"terms": {
"field": "house_number",
"size": 0
}
}
}
}
s = Search(using=client, index="airbnb", doc_type="sleep_overs").update_from_dict(query_body)

How can I retrieve the wikipageID if the page is not in english using sparql and dbpedia

I want to retrieve the wikipageID for the same query name in different languages. For example:
select * where { <http://dbpedia.org/resource/Mike_Quigley_(footballer)> dbpedia-owl:wikiPageID ?wikiID }
====>Mike_Quigley_(footballer) 17237449 en
select * where { <http://dbpedia.org/resource/Theodore_Roberts> dbpedia-owl:wikiPageID ?wikiID }
====>Theodore_Roberts 6831454 en
select * where { <http://de.dbpedia.org/resource/Theodore_Roberts> dbpedia-owl:wikiPageID ?wikiID }
====>Theodore_Roberts de
select * where { <http://fr.dbpedia.org/resource/Theodore_Roberts> dbpedia-owl:wikiPageID ?wikiID }
====>Theodore_Roberts fr
select * where { <http://it.dbpedia.org/resource/Theodore_Roberts> dbpedia-owl:wikiPageID ?wikiID }
====>Theodore_Roberts it
select * where { <http://ja.dbpedia.org/resource/セオドア・ロバーツ> dbpedia-owl:wikiPageID ?wikiID }
====>セオドア・ロバーツ ja
In the first query, for Mike_Quigley_(footballer), which is in English, I was able to retrieve its ID (17237449), but when the language changes, as you can see, I cannot retrieve the wikiPageIDs.
How can I retrieve the IDs of these pages?
The more complex part is that the German-language resource for Theodore_Roberts leads me to a page where the wikiPageID property is not under dbpedia-owl, so it gets really complicated.
Do you have any idea how to solve this?
Amazing, when I try this query
SELECT ?uri ?id
WHERE {
?uri <http://dbpedia.org/ontology/wikiPageID> ?id.
FILTER (?uri = <http://dbpedia.org/resource/Lyon>)
}
I get the following result:
http://dbpedia.org/resource/Lyon 863863
But when I change the URI to <http://it.dbpedia.org/resource/Lyon>, I get nothing.
So if I understand correctly, you want to get the Italian version of your URI. The problem is that you are looking for the wrong URI. According to the English DBpedia, the Italian counterpart of the Lyon URI is http://it.dbpedia.org/resource/Lione, and I am assuming that is the one you want. I found this out by:
SELECT *
WHERE {
?uri owl:sameAs ?b.
FILTER (?uri = <http://dbpedia.org/resource/Lyon>)
}
And I couldn't get the Italian page ID from the English DBpedia.
When I tried it on the Italian DBpedia, it worked:
SELECT ?uri ?id
WHERE {
?uri <http://dbpedia.org/ontology/wikiPageID> ?id.
FILTER (?uri = <http://it.dbpedia.org/resource/Lyon>)
}
However, if you look at the Italian page http://it.dbpedia.org/resource/Lyon, you can see that it has a property dbpedia-owl:wikiPageRedirects which is equal to http://it.dbpedia.org/resource/Lione that the owl:sameAs gives you. Maybe you can work your way through that.
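Since the lookup only worked when it was run against the matching language chapter, a small Python helper can pick the endpoint from the resource URI. This is a sketch under one assumption that I have not verified for every chapter: that each chapter serves SPARQL at http://<host>/sparql, as dbpedia.org does. The function name is made up.

```python
from urllib.parse import urlsplit


def page_id_query(resource_uri):
    """Return (endpoint, query) for looking up wikiPageID on the matching chapter."""
    host = urlsplit(resource_uri).netloc
    # Assumption: each chapter exposes SPARQL at http://<host>/sparql.
    endpoint = "http://%s/sparql" % host
    query = (
        "SELECT ?id WHERE { <%s> "
        "<http://dbpedia.org/ontology/wikiPageID> ?id }" % resource_uri
    )
    return endpoint, query
```

The returned query string can then be sent to the returned endpoint with any SPARQL client, after first resolving owl:sameAs or dbpedia-owl:wikiPageRedirects if the localized URI is not known.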
