I'm trying to run some queries against PubMed's Eutils service. If I run them on the website I get a certain number of records returned, in this case 13126 (link to pubmed).
A while ago I bodged together a Python script to build a query that does much the same thing, and the resulting URL returns the same number of hits (link to Eutils result).
Of course, not having any formal programming background, it was all a bit kludgy, so I'm trying to do the same thing using Biopython. I think the following code should do the same thing, but it returns a greater number of hits: 23303.
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"
handle = Entrez.esearch(db="pubmed", term="stem+cell[All Fields]",datetype="pdat", mindate="2012", maxdate="2012")
record = Entrez.read(handle)
print(record["Count"])
I'm fairly sure it's just down to some subtlety in how the URL is being generated, but I can't work out how to see what URL Biopython is generating. Can anyone give me some pointers?
Thanks!
EDIT:
It's something to do with how the URL is being generated, as I can get the original number of hits back by modifying the code to include double quotes around the search term, thus:
handle = Entrez.esearch(db='pubmed', term='"stem+cell"[ALL]', datetype='pdat', mindate='2012', maxdate='2012')
I'm still interested in knowing what URL Biopython generates, as it'll help me work out how I have to structure the search term when I want to do more complicated searches.
handle = Entrez.esearch(db="pubmed", term="stem+cell[All Fields]",datetype="pdat", mindate="2012", maxdate="2012")
print(handle.url)
You've solved this already (Entrez likes explicit double quoting round combined search terms), but currently the URL generated is not exposed via the API. The simplest trick would be to edit the Bio/Entrez/__init__.py file to add a print statement inside the _open function.
Update: Recent versions of Biopython now save the URL as an attribute of the returned handle, i.e. in this example try doing print(handle.url)
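If you want to see exactly how the term was encoded, you can also decompose the query string with the standard library (a small sketch; handle.url is only available in Biopython versions that expose it, as noted above):
from urllib.parse import urlparse, parse_qs
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"
handle = Entrez.esearch(db="pubmed", term='"stem+cell"[ALL]',
                        datetype="pdat", mindate="2012", maxdate="2012")
# Split the generated URL into its query parameters to see how
# the term was percent-encoded before being sent to NCBI.
params = parse_qs(urlparse(handle.url).query)
print(params["term"])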
Related
I'm using the click package to get input for one or more variables, which get loaded in as a combined dictionary. Each entry is then joined, and the combined string is appended to a base URL and sent through the requests package to receive some XML data.
Earlier I had an issue with one of the variables that let you search through a range, such as
[value1, value2]
Python added double quotes around it, so the search function didn't operate correctly. I used
.replace('"', '')
on the joined string before combining it with the base URL, and that seemed to fix that problem. The issue now is that individual input containing more than one word no longer produces the same output as the actual search engine online. I have to use quotes when I input the information to keep it as a single argument, but then the quotes get removed by the replace above, and I believe that is what is causing the issue.
I think if I have a way to access individual entries of this dictionary and remove the double quotes from only certain entries, then that should get the job done. But if I am overlooking something, please let me know.
Help is appreciated.
Code added below:
import click
import requests

@click.command()
@click.option('--variable1')
@click.option('--variable2')
def search(variable1, variable2):
    # Join the inputs and append them to the base URL.
    query_list = [variable1, variable2]
    query = ''.join(query_list)
    base_url = "abc.com...."
    response = requests.get(base_url, params=query)

if __name__ == '__main__':
    search()
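One way to attack the selective-quoting problem described above (a sketch, assuming the inputs live in a dict keyed by parameter name, and that only range-style values such as [value1, value2] should lose their quotes):

def clean_params(params):
    # Strip double quotes only from range-style (bracketed) values,
    # leaving quoted multi-word phrases intact.
    cleaned = {}
    for key, value in params.items():
        stripped = value.replace('"', '')
        if stripped.startswith('[') and stripped.endswith(']'):
            cleaned[key] = stripped
        else:
            cleaned[key] = value
    return cleaned

print(clean_params({'range': '"[value1, value2]"', 'phrase': '"stem cell"'}))
# {'range': '[value1, value2]', 'phrase': '"stem cell"'}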
We are currently working on a project where we need to access the 'NP_' accession number from ClinVar. However, when we use the Entrez.efetch() function, this information appears to be missing from the result. Here is a link to the website page where the NP_ number is listed:
https://www.ncbi.nlm.nih.gov/clinvar/variation/558834/
And here is the Python sample script code that fetches the XML result:
handle = Entrez.efetch(db="clinvar", id=558834, rettype='variation', retmode="text")
print(handle.read())
Interestingly enough, this used to return the NP number in the results; however, it seems the website formatting/style changed since we last developed our Python script, and we cannot figure out how to retrieve the NP number now.
Any help would be greatly appreciated! Thank you for your time and input!
You need to format it like a new query, not an old one:
handle = Entrez.efetch(db="clinvar", id=558834, rettype='vcv', is_varationid="true", from_esearch="true")
print(handle.read())
See also: https://www.ncbi.nlm.nih.gov/clinvar/docs/maintenance_use/
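If all you need is the protein accession itself, one pragmatic follow-up (a sketch, assuming the NP_ accession appears verbatim in the returned VCV XML) is to scan the raw response rather than walk the whole XML tree:

import re
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # use your own address here
handle = Entrez.efetch(db="clinvar", id=558834, rettype='vcv',
                       is_variationid="true", from_esearch="true")
xml_text = handle.read()
# Collect every versioned protein accession mentioned in the record.
accessions = set(re.findall(r"NP_\d+\.\d+", xml_text))
print(accessions)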
I am completely new to this module and Python in general, yet wanted to start some sort of a fun project in my spare time.
I have a specific question concerning the GooglePlaces module for Python: how do I retrieve the reviews of a place knowing only its Place ID?
So far I have done...
from googleplaces import GooglePlaces, types, lang
google_places = GooglePlaces('API KEY')
query_result = google_places.get_place(place_id="ChIJB8wSOI11nkcRI3C2IODoBU0")
print(query_result) #<Place name="Starbucks", lat=48.14308250000001, lng=11.5782337>
print(query_result.get_details()) # Prints None
print(query_result.rating) # Prints the rating of 4.3
I am completely lost here because I cannot get access to the object's details. Maybe I am missing something, but I would be very thankful for any guidance through my issue.
If you are completely lost, just read the docs :)
Example from https://github.com/slimkrazy/python-google-places:
for place in query_result.places:
    # Returned places from a query are place summaries.
    # The following method has to make a further API call.
    place.get_details()
    # Referencing any of the attributes below, prior to making a call to
    # get_details() will raise a googleplaces.GooglePlacesAttributeError.
    print place.details  # A dict matching the JSON response from Google.
See the problem with your code now?
print(query_result.get_details()) # Prints None
should be
query_result.get_details() # Fetch details
print(query_result.details) # Prints details dict
Regarding the results, the Google docs state:
reviews[]: a JSON array of up to five reviews. If a language parameter was specified in the Place Details request, the Places Service will bias the results to prefer reviews written in that language. Each review consists of several components.
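Putting it together for the original question (a sketch; the 'reviews' key comes from the Place Details JSON response quoted above):

from googleplaces import GooglePlaces

google_places = GooglePlaces('API KEY')
place = google_places.get_place(place_id="ChIJB8wSOI11nkcRI3C2IODoBU0")
place.get_details()  # populates place.details with the raw JSON dict
# 'reviews' holds up to five review dicts, per the Google documentation.
for review in place.details.get('reviews', []):
    print(review.get('author_name'), review.get('rating'))
    print(review.get('text'))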
I'm relatively new, and I'm just at a loss as to where to start. I don't expect detailed step-by-step responses (though, of course, those are more than welcome), but any nudges in the right direction would be greatly appreciated.
I want to use the Gutenberg python library to select a text based on a user's input.
Right now I have the code:
from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers
text = strip_headers(load_etext(11)).strip()
where the number represents the text (in this case 11 = Alice in Wonderland).
Then I have a bunch of code about what to do with the text, but I don't think that's relevant here. (If it is let me know and I can add it).
Basically, instead of just selecting a text, I want to let the user do that. I want to ask the user for their choice of author and, if Project Gutenberg (PG) has pieces by that author, have them select from the list of book titles (if PG doesn't have anything by that author, return some response along the lines of "sorry, don't have anything by $author_name, pick someone else"). And then once the user has decided on a book, have the number corresponding to that book be entered into the code.
I just have no idea where to start in this process. I know how to handle user input, but I don't know how to take that input and search for something online using it.
Ideally, I'd be able to handle things like spelling mistakes too, but that may be down the line.
I really appreciate any help anyone has the time to give. Thanks!
The gutenberg module includes facilities for searching for a text by metadata, such as author. The example from the docs is:
from gutenberg.query import get_etexts
from gutenberg.query import get_metadata
print(get_metadata('title', 2701)) # prints frozenset([u'Moby Dick; Or, The Whale'])
print(get_metadata('author', 2701)) # prints frozenset([u'Melville, Hermann'])
print(get_etexts('title', 'Moby Dick; Or, The Whale')) # prints frozenset([2701, ...])
print(get_etexts('author', 'Melville, Hermann')) # prints frozenset([2701, ...])
It sounds as if you already know how to read a value from the user into a variable, so replacing the literal author in the above is as simple as something like:
author_name = my_get_input_from_user_function()
texts = get_etexts('author', author_name)
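A fuller sketch of the flow you describe, built on the same query functions (the prompts and variable names here are just illustrative):

from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers
from gutenberg.query import get_etexts, get_metadata

author_name = input("Author (e.g. 'Melville, Herman'): ")
text_ids = get_etexts('author', author_name)
if not text_ids:
    print("sorry, don't have anything by %s, pick someone else" % author_name)
else:
    # Show the titles on offer and let the user pick one by number.
    choices = sorted(text_ids)
    for i, etext_id in enumerate(choices):
        print(i, ' / '.join(get_metadata('title', etext_id)))
    pick = int(input("Which number? "))
    text = strip_headers(load_etext(choices[pick])).strip()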
Also note this caveat from the same section:
Before you use one of the gutenberg.query functions you must populate the local metadata cache. This one-off process will take quite a while to complete (18 hours on my machine) but once it is done, any subsequent calls to get_etexts or get_metadata will be very fast. If you fail to populate the cache, the calls will raise an exception.
With that in mind, I haven't tried the code I've presented in this answer because I'm still waiting for my local cache to populate.
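For reference, the cache-population step mentioned above looks like this in the package's docs (check the README of your installed version, as the API may have moved):

from gutenberg.acquire import get_metadata_cache

cache = get_metadata_cache()
cache.populate()  # one-off; can take many hours, as noted above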
I'm using a library called ABPY (library here) for Python, but I think it is an older version. I'm using Python 3.3.
I did fix some print errors, but that's about as much as I know; I'm really new to programming.
I want to fetch some webpage, filter the advertising out of it, and then print it again.
EDITED after Sg'te'gmuj told me how to convert from Python 2.x to 3.x; this is my new code:
#!/usr/local/bin/python3.1
import cgitb;cgitb.enable()
import urllib.request
response = urllib.request.build_opener()
response.addheaders = [('User-agent', 'Mozilla/5.0')]
response = urllib.request.urlopen("http://www.youtube.com")
html = response.read()
from abpy import Filter
with open("easylist.txt") as f:
ABPFilter = Filter(file('easylist.txt'))
ABPFilter.match(html)
print("Content-type: text/html")
print()
print (html)
Now it is displaying a blank page
I just took a peek at the library; it seems that the file "easylist.txt" does not exist. You need to create the file and populate it with the appropriate filters (in whatever format ABP specifies).
Additionally, it appears it takes a file object; try something like this instead:
with open("easylist.txt") as f:
ABPFilter = Filter(f)
I can't say this is wholly accurate, though, since I have no experience with the library, but looking at its code I'd suspect either of the two is the problem, if not both.
Addendum #1
Looking at the code more in depth, I have to agree that even if the fix I supplied works, you're going to have more problems (the library is in 2.x, as you suggested, while you're using 3.x). I'd suggest using Python's 2to3 tool to convert typical Python 2 code to Python 3 (it's not foolproof, though). The command line would be:
2to3 -w abpy.py
That will convert it from Python 2.x to 3.x code, and re-write the source file.
Addendum #2
The code to pass the file object should be the "f" variable, as shown above (modified to represent that; I wasn't paying attention and just left the old file function call in the argument).
You need to pass a URI to the function as well:
ABPFilter.match(URI)
You'll need to modify the code to pass those items in as an array (I'm assuming, at least); I'm playing with it now to see. At present I'm getting a rule error (not a Python error, but merely the error handling used by abpy.py, which is good because it suggests this is the right train of thought).
The code for the Filter.match function is as following (after using the 2to3 Python script):
def match(self, url, elementtype=None):
    tokens = RE_TOK.split(url)
    print(tokens)
    for tok in tokens:
        if len(tok) > 2:
            if tok in self.index:
                for rule in self.index[tok]:
                    if rule.match(url, elementtype=elementtype):
                        print(str(rule))
What this means is that, at present, you're at a point where you need to program the functionality yourself; it appears this module only indicates the matching rule. However, that is still useful.
It also means you're going to have to modify this function to take the HTML in place of the "url" parameter. You'd regex the HTML (this may be rather intensive) for a list of URIs and then run each item through the match loop. Where you go from there to actually filter the nodes, I'm not sure; but there is a list of filter types, so I'm assuming there is a typical procedure ABP uses to remove the nodes (possibly, in some cases, merely by removing the given URI from the HTML?).
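A rough sketch of that idea (hypothetical; it assumes the Filter is built as above and that a crude regex is good enough to pull candidate URIs out of the page):

import re
from abpy import Filter

with open("easylist.txt") as f:
    ABPFilter = Filter(f)

with open("page.html") as f:  # the fetched page, decoded to str
    html = f.read()

# Crudely pull every http(s) URI out of src/href attributes.
uris = re.findall(r'(?:src|href)=["\'](https?://[^"\']+)["\']', html)
for uri in uris:
    # match() prints any filter rule that fires for this URI; a hit
    # marks the URI as an ad candidate to strip from the HTML.
    ABPFilter.match(uri)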
References
http://docs.python.org/3.3/library/2to3.html