Highlighting in solr with python

Highlighting in solr with python - python

If I want to achieve highlight function of Solr in Django with python, how could it be done by using the package solrpy?
How did solrpy deal with it, as the highlighting results live in a absolute fragment on the SolrResponse object,shown as a dictionary of dictionaries.
What's more, does solrpy still work for more function of solr such as faceting, highlighting and stuff, besides basic query
sc = solr.SolrConnection("http://localhost:8080/solr/cases")
response_c=sc.query('name:*%s'%q+'*',fields='name,decision_date', highlight='name')
print(response_c.results)
for hit in response_c.results:
print(hit)
And why above code does not work to achieve Highlighting?

The highlighting information is stored in a separate entry named highlighting on the response object:
If you pass in `highlight` to the SolrConnection.query call,
then the response object will also have a "highlighting" property,
which will be a dictionary.
That being said, I strongly recommend using pysolr instead of solrpy, as pysolr is maintained by the django-haystack project and has been developed continuously over the last few years compared to solrpy.

Yes. The code below allows highlighting (pysolr, version 3.6.0):
import pysolr
solr = pysolr.Solr('http://localhost:8983/solr/<core/collection>')
results = solr.search('hello', **{
'hl': 'true',
'hl.fragsize': 10,
'hl.field': 'text'
})
for i in results:
print(i)
print(results.highlighting)
results.highlighting field will store the highlighted snippets of the search. Other fields are facets, grouped, hits, spellcheck, stats. See more info at https://github.com/django-haystack/pysolr

Related

Fetching place details (specifically reviews) with GooglePlaces in Python 3

I am completely new to this module and Python in general, yet wanted to start some sort of a fun project in my spare time.
I have a specific question concerning the GooglePlaces module for Python - how do I retrieve the reviews of a place by only knowing its Place ID.
So far I have done...
from googleplaces import GooglePlaces, types, lang
google_places = GooglePlaces('API KEY')
query_result = google_places.get_place(place_id="ChIJB8wSOI11nkcRI3C2IODoBU0")
print(query_result) #<Place name="Starbucks", lat=48.14308250000001, lng=11.5782337>
print(query_result.get_details()) # Prints None
print(query_result.rating) # Prints the rating of 4.3
I am completely lost here, because I cannot get access to the object's details. Maybe I am missing something, yet would be very thankful for any guidance through my issue.

If you are completly lost just read the docs :)
Example from https://github.com/slimkrazy/python-google-places:
for place in query_result.places:
# Returned places from a query are place summaries.
# The following method has to make a further API call.
place.get_details()
# Referencing any of the attributes below, prior to making a call to
# get_details() will raise a googleplaces.GooglePlacesAttributeError.
print place.details # A dict matching the JSON response from Google.
See the Problem with your code now?
print(query_result.get_details()) # Prints None
should be
query_result.get_details() # Fetch details
print(query_result.details) # Prints details dict
Regarding the results, the Google Docs states:
reviews[] a JSON array of up to five reviews. If a language parameter
was specified in the Place Details request, the Places Service will
bias the results to prefer reviews written in that language. Each
review consists of several components:

Pymongo how to use full text search

I am looking to implement full text search in my python application using pymongo. I have been looking at this question but for some reason I am unable to implement this in my project as I am getting an error no such cmd: text. Can anyone direct me on what I am doing wrong?
Here is my code:
db = client.test
collection = db.videos
def search_for_videos(self, search_text)
self.db.command("text", "videos",
search=search_text,
limit=10)
The collection I am trying to search is called videos however I am not sure if I am putting this in the correct parameter, and I also am not sure if I need the line project={"name": 1, "_id": 0}.
The documentation here I believe is using the mongo shell to execute commands, however I wish to perform this action in my code.
I have looked at using the db.videos.find() function, but cannot seem to implement it correctly either.
How to I use PyMongo Full Text Search from my Python Code?

First be sure that you have a text index created on the field as mentioned here or you can just do it with pymongo too :
collection.create_index([('your field', 'text')])
Using pymongo you can do this to search:
collection.find({"$text": {"$search": your search}})
your function should look like this:
def search_for_videos(search_text):
collection.find({"$text": {"$search": search_text}}).limit(10)
I hope this helps you.

First create a text index based on the field you want to do the search on.
from pymongo import TEXT
db = MongoClient('localhost',port = 27017).DBNAME
db.collection.create_index([('FIELD_NAME',TEXT)],default_language ="english")
once you create the text index use the following query to search text. Depending on the size of your database, it might take long to create the text index.
db.collection.find({"$text": {"$search": your search}})

SolrClient python update document

I'm currently trying to create a small python program using SolrClient to index some files.
My need is that I want to index some file content and then add some attributes to enrich the document.
I used the post command line tool to index the files. Then I use a python program trying to enrich documents, something like this:
doc = solr.get('collection', id)
doc['new_attribute'] = 'value'
solr.index_json('collection',json.dumps([doc]))
solr.commit(openSearcher=True)
Problem is that I have the feeling that we lost file content index. If I run a query with a word present in all attributes of the doc, I find it.
If I run a query with a word only in the file, it does not work (it works indexing only the file with post without my update tentative).
I'm not sure to understand how to update the doc keeping the index created by the post command.
I hope I'm clear enough, maybe I misunderstood the way it works...
thanks a lot

If I understand correctly, you want to modify an existing record. You should be able to do something like this without using a solr.get:
doc = [{'id': 'value', 'new_attribute':{'set': 'value'}}]
solr.index_json('collection',json.dumps([doc]))
See also:
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

It has worked for me in this way, it can be useful for someone
from SolrClient import SolrClient
solrConect = SolrClient("http://xx.xx.xxx.xxx:8983/solr/")
doc = [{'id': 'my_id', 'count_related_like':{'set': 10}}]
solrConect.index_json("my_collection", json.dumps(doc) )
solrConect.commit("my_collection", softCommit=True)

Trying with Curl did not change anything. I did it differently so now it works. Instead of adding the file with the post command and trying to modify it afterwards, I read the file in a string and index in a "content" field. It means every document is added in one shot.
The content field is defined as not stored, so I just index it.
It works fine and suits my needs. It's also more simple since it removes many attributes set by post command that I don't need.
If I find some time, I'll try again the partial update and update the post.
Thanks
Rémi

How to insert clean data into Firebase via REST API and Python

I have a simple Firebase that I mostly interact with via Javascript, which works really well. However, I also have a Python program that needs to get data from existing children and put/update data on existing children. I tried python-firebasin, which would do what I want, but it is unreliable (hangs, fails, etc.).
So I'm looking at the python-firebase REST wrapper. This seems efficient, and works well. However, every time I try to post() data, I get not just the data I'm posting, but some kind of unique string paired with it, all inserted as a child.
For example, via Javascript, I might say:
db = new Firebase('https://myfirebase.firebaseio.com/testval/');
db.transaction(function(current) { return 1; });
This would then give me a Firebase that looked like:
|---testval: 1
But when I try to do something similar with the Python Firebase REST wrapper, such as:
db = firebase.FirebaseApplication('https://myfirebase.firebaseio.com/')
db.post('/testval/',1)
My Firebase looks something like this:
|---testval:
|---JI4BiBbICSEAnM9mDXf: 1
In other words, it inserts a new child, gives it a new string, and then appends the data. Is there any way to insert/modify data on my Firebase using the REST wrapper that would do it cleanly like I'm doing with Javascript? Without adding children, without adding these unique strings?

Try this instead:
db.put(1)
db.post() is the equivalent of .push() in the JavaScript API, so it creates a unique ID for you. db.put() is equivalent to .set() and will just set the data, which appears to be what you want.
Note that there is no equivalent for transactions in the REST API, but your example was just using a transaction to do a .set() so hopefully you don't actually need them.

Try this:
db = firebase.FirebaseApplication('https://myfirebase.firebaseio.com/')
db.put('', 'testval', 1)
put takes three arguments : first is url or path, second is the key name or the snapshot name and third is the data(json)

BioPython Pubmed Eutils url?

I'm trying to run some queries against Pubmed's Eutils service. If I run them on the website I get a certain number of records returned, in this case 13126 (link to pubmed).
A while ago I bodged together a python script to build a query to do much the same thing, and the resultant url returns the same number of hits (link to Eutils result).
Of course, not having any formal programming background, it was all a bit cludgy, so I'm trying to do the same thing using Biopython. I think the following code should do the same thing, but it returns a greater number of hits, 23303.
from Bio import Entrez
Entrez.email = "A.N.Other#example.com"
handle = Entrez.esearch(db="pubmed", term="stem+cell[All Fields]",datetype="pdat", mindate="2012", maxdate="2012")
record = Entrez.read(handle)
print(record["Count"])
I'm fairly sure it's just down to some subtlety in how the url is being generated, but I can't work out how to see what url is being generated by Biopython. Can anyone give me some pointers?
Thanks!
EDIT:
It's something to do with how the url is being generated, as I can get back the original number of hits by modifying the code to include double quotes around the search term, thus:
handle = Entrez.esearch(db='pubmed', term='"stem+cell"[ALL]', datetype='pdat', mindate='2012', maxdate='2012')
I'm still interested in knowing what url is being generated by Biopython as it'll help me work out how i have to structure the search term for when i want to do more complicated searches.

handle = Entrez.esearch(db="pubmed", term="stem+cell[All Fields]",datetype="pdat", mindate="2012", maxdate="2012")
print(handle.url)

You've solved this already (Entrez likes explicit double quoting round combined search terms), but currently the URL generated is not exposed via the API. The simplest trick would be to edit the Bio/Entrez/__init__.py file to add a print statement inside the _open function.
Update: Recent versions of Biopython now save the URL as an attribute of the returned handle, i.e. in this example try doing print(handle.url)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Highlighting in solr with python - python

Related

Fetching place details (specifically reviews) with GooglePlaces in Python 3

Pymongo how to use full text search

SolrClient python update document

How to insert clean data into Firebase via REST API and Python

BioPython Pubmed Eutils url?

Categories

Resources