Entrez eFetch Accession Number - python

We are currently working on a project where we need to access the 'NP_' accession number from ClinVar. However, when we use the Entrez.efetch() function, this information appears to be missing from the result. Here is a link to the web page where the NP_ number is listed:
https://www.ncbi.nlm.nih.gov/clinvar/variation/558834/
And here is the Python sample script code that fetches the XML result:
handle = Entrez.efetch(db="clinvar", id=558834, rettype='variation', retmode="text")
print(handle.read())
Interestingly enough, this used to return the NP number in the results; however, it seems the website formatting/style changed since we last developed our Python script, and we cannot figure out how to retrieve the NP number now.
Any help would be greatly appreciated! Thank you for your time and input!

You need to format the request as a new-style query, not an old one (note the is_variationid parameter):
handle = Entrez.efetch(db="clinvar", id=558834, rettype='vcv', is_variationid="true", from_esearch="true")
print(handle.read())
See also: https://www.ncbi.nlm.nih.gov/clinvar/docs/maintenance_use/
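For completeness, here is a hedged sketch of pulling the NP_ accession out of the returned VCV XML. Rather than depending on a specific element name in the VCV schema, it just scans the raw XML for protein accessions; adapt it once you have inspected your actual output:

import re

from Bio import Entrez

Entrez.email = "you@example.com"  # NCBI requires a contact address

handle = Entrez.efetch(db="clinvar", id=558834, rettype="vcv",
                       is_variationid="true", from_esearch="true")
xml = handle.read()
if isinstance(xml, bytes):  # some Biopython versions return bytes
    xml = xml.decode()

# Collect every NP_ accession mentioned anywhere in the VCV record.
np_accessions = sorted(set(re.findall(r"NP_\d+\.\d+", xml)))
print(np_accessions)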

Python - I want to increase the Row-Index automatically

I am absolutely new to Python, or coding for that matter, so any help would be greatly appreciated. I have around 21 Salesforce orgs and am trying to get some information from each org into one place to send out in an email.
import pandas as pd
df = pd.read_csv("secretCSV.csv", usecols = ['client','uname','passw','stoken'])
username = df.loc[[1],'uname'].values[0]
password = df.loc[[1],'passw'].values[0]
sectocken = df.loc[[1],'stoken'].values[0]
I have saved all my usernames, passwords, and security tokens in the secretCSV.csv file, and with the above code I can get the data for one row at the index value I have given. I would like to know how I can loop through this, increasing the index value after each loop, until all rows from the CSV file are read.
Thank you in advance for any assistance you all can offer.
Adil
--
You can iterate over the dataframe, but it's strongly discouraged (inefficient, verbose, too much code, etc.):
df = pd.read_csv("secretCSV.csv", usecols = ['client','uname','passw','stoken'])
So DO NOT DO THIS, even though it works:
for i in range(df.shape[0]):
    username = df.loc[[i],'uname'].values[0]
    password = df.loc[[i],'passw'].values[0]
    sectocken = df.loc[[i],'stoken'].values[0]
Instead, do this:
sec_list = [(u,p,s) for _,u,p,s in df.values]
Now you have sec_list, a list of tuples of (username, password, sectocken).
Access example: sec_list[0][1], as in row 0, get the password (located at index [1]).
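A usage sketch for the loop the question is building toward: feed each credential tuple into your per-org function (the getLicense name below is a stand-in for whatever you have written):

def getLicense(username, password, sectocken):
    # placeholder: connect to the Salesforce org and pull license details here
    ...

for username, password, sectocken in sec_list:
    getLicense(username, password, sectocken)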
Pandas is great when you want to apply operations to a large set of data, but it is usually not a good fit when you want to manipulate individual cells in Python. Each cell would need to be converted to a Python object each time it's touched.
For your goals, I think the standard csv module is what you want:
import csv

with open("secretCSV.csv", newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for client, username, password, sectoken in reader:
        # do all the things
        ...
Thank you everyone for your responses. I think I will start with learning Python first and then get back to this. I should have learnt coding before coding. :)
Also, I was able to iterate (sorry, most of you said not to iterate the dataframe) and get the credentials from the file.
I actually have 21 Salesforce orgs and am trying to get license information from each of them and email it to certain people on a daily basis. I didn't want to expose the Salesforce credentials, hence went with a flat-file option.
I have built the code to get the Salesforce license details and am able to pull them in the format I want for one client. However, I have to do this for 21 clients, so I thought of iterating over the credentials to run the getLicense function in a loop until all 21 clients' data is fetched.
I will learn Python, or at least a little more than I know now, and come back to this. Until then, Informatica and batch scripts will have to do.
Thank you again to each one of you for your help!
Adil
--

Selenium in Jupyter Notebook, Different Outcome on Different Cells

I'm a student studying Python and web crawling in Korea.
I found something I can't understand. I want to ask why this happens and how I can fix it.
It would be lovely if someone could help me.
Here is my situation:
This is the code for my web crawling. There are some Korean words in it, but that's not important, I think.
from selenium import webdriver

zeropay_official = 'https://www.zeropay.or.kr/main.do?pgmId=PGM0081'
driver = webdriver.Chrome('./driver/chromedriver')
driver.get(zeropay_official)
driver.find_element_by_id('tryCode').click()
driver.find_element_by_id('tryCode').send_keys('서울특별시')
driver.find_element_by_id('skkCode').click()
driver.find_element_by_id('skkCode').send_keys('노원구')
driver.find_element_by_id('pobsAfstrName').send_keys('다마식당')
driver.find_element_by_xpath('//*[@id="form"]/div[2]/a').click()
test = driver.find_element_by_id('list_div')
test.text
and right below this Jupyter Notebook cell, I put the last line of the code,
test.text
to check what's happening.
But the first cell's output is '' (an empty string), and the second cell's output is the string I wanted to get.
Why is this happening? And if I need to get the output string in the first cell, so that I can turn this code into a module my team can import, what should I do?
Check this image if you couldn't clearly understand what I said due to my poor English. (sob)
You can add some wait time.
import time

from selenium import webdriver

zeropay_official = 'https://www.zeropay.or.kr/main.do?pgmId=PGM0081'
driver = webdriver.Chrome('./driver/chromedriver')
driver.get(zeropay_official)
driver.find_element_by_id('tryCode').click()
driver.find_element_by_id('tryCode').send_keys('서울특별시')
driver.find_element_by_id('skkCode').click()
driver.find_element_by_id('skkCode').send_keys('노원구')
driver.find_element_by_id('pobsAfstrName').send_keys('다마식당')
driver.find_element_by_xpath('//*[@id="form"]/div[2]/a').click()
time.sleep(time_in_seconds)  # give the page time to render the results
test = driver.find_element_by_id('list_div')
test.text
The Korean text takes some time to appear, so the element is still empty when you read it immediately after clicking.
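As an alternative to a fixed sleep (not part of the original answer), Selenium's explicit waits poll until the element actually has text; a minimal sketch:

from selenium.webdriver.support.ui import WebDriverWait

# Wait up to 10 seconds until the results element contains text,
# instead of sleeping for a fixed amount of time.
test = WebDriverWait(driver, 10).until(
    lambda d: d.find_element_by_id('list_div')
    if d.find_element_by_id('list_div').text
    else False
)
print(test.text)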

Fetching place details (specifically reviews) with GooglePlaces in Python 3

I am completely new to this module and Python in general, yet wanted to start some sort of a fun project in my spare time.
I have a specific question concerning the GooglePlaces module for Python: how do I retrieve the reviews of a place knowing only its Place ID?
So far I have done...
from googleplaces import GooglePlaces, types, lang
google_places = GooglePlaces('API KEY')
query_result = google_places.get_place(place_id="ChIJB8wSOI11nkcRI3C2IODoBU0")
print(query_result) #<Place name="Starbucks", lat=48.14308250000001, lng=11.5782337>
print(query_result.get_details()) # Prints None
print(query_result.rating) # Prints the rating of 4.3
I am completely lost here, because I cannot get access to the object's details. Maybe I am missing something; I would be very thankful for any guidance through my issue.
If you are completely lost, just read the docs :)
Example from https://github.com/slimkrazy/python-google-places:
for place in query_result.places:
    # Returned places from a query are place summaries.
    # The following method has to make a further API call.
    place.get_details()
    # Referencing any of the attributes below, prior to making a call to
    # get_details(), will raise a googleplaces.GooglePlacesAttributeError.
    print(place.details)  # A dict matching the JSON response from Google.
See the problem with your code now?
print(query_result.get_details()) # Prints None
should be
query_result.get_details() # Fetch details
print(query_result.details) # Prints details dict
Regarding the results, the Google docs state:
reviews[]: a JSON array of up to five reviews. If a language parameter was specified in the Place Details request, the Places Service will bias the results to prefer reviews written in that language. Each review consists of several components.
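So once get_details() has been called, the reviews should be reachable from the details dict. A short sketch, assuming the dict mirrors the Places API JSON quoted above:

query_result.get_details()  # populates query_result.details

# 'reviews' mirrors the Places API JSON: up to five entries per place.
for review in query_result.details.get('reviews', []):
    print(review.get('author_name'), review.get('rating'))
    print(review.get('text'))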

Basics of connecting python to the web and validating user input

I'm relatively new, and I'm just at a loss as to where to start. I don't expect detailed step-by-step responses (though, of course, those are more than welcome), but any nudges in the right direction would be greatly appreciated.
I want to use the Gutenberg python library to select a text based on a user's input.
Right now I have the code:
from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers
text = strip_headers(load_etext(11)).strip()
where the number represents the text (in this case 11 = Alice in Wonderland).
Then I have a bunch of code about what to do with the text, but I don't think that's relevant here. (If it is let me know and I can add it).
Basically, instead of just selecting a text, I want to let the user do that. I want to ask the user for their choice of author, and if Project Gutenberg (PG) has pieces by that author, have them select from the list of book titles (if PG doesn't have anything by that author, return some response along the lines of "sorry, don't have anything by $author_name, pick someone else"). And then once the user has decided on a book, have the number corresponding to that book be entered into the code.
I just have no idea where to start in this process. I know how to handle user input, but I don't know how to take that input and search for something online using it.
Ideally, I'd be able to handle things like spelling mistakes too, but that may be down the line.
I really appreciate any help anyone has the time to give. Thanks!
The gutenberg module includes facilities for searching for a text by metadata, such as author. The example from the docs is:
from gutenberg.query import get_etexts
from gutenberg.query import get_metadata
print(get_metadata('title', 2701)) # prints frozenset([u'Moby Dick; Or, The Whale'])
print(get_metadata('author', 2701)) # prints frozenset([u'Melville, Hermann'])
print(get_etexts('title', 'Moby Dick; Or, The Whale')) # prints frozenset([2701, ...])
print(get_etexts('author', 'Melville, Hermann')) # prints frozenset([2701, ...])
It sounds as if you already know how to read a value from the user into a variable, and replacing the literal author in the above would be as simple as doing something like:
author_name = my_get_input_from_user_function()
texts = get_etexts('author', author_name)
Note the following note from the same section:
Before you use one of the gutenberg.query functions you must populate the local metadata cache. This one-off process will take quite a while to complete (18 hours on my machine) but once it is done, any subsequent calls to get_etexts or get_metadata will be very fast. If you fail to populate the cache, the calls will raise an exception.
With that in mind, I haven't tried the code I've presented in this answer because I'm still waiting for my local cache to populate.
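Putting the pieces together, here is an untested sketch of the interactive flow the question describes (it assumes the metadata cache has been populated):

from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers
from gutenberg.query import get_etexts, get_metadata

author_name = input("Author (e.g. 'Melville, Hermann'): ")
text_ids = get_etexts('author', author_name)

if not text_ids:
    print("sorry, don't have anything by " + author_name + ", pick someone else")
else:
    # Show each etext number with its title so the user can choose.
    for n in sorted(text_ids):
        print(n, ', '.join(get_metadata('title', n)))
    choice = int(input("Enter the number of the book you want: "))
    text = strip_headers(load_etext(choice)).strip()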

BioPython Pubmed Eutils url?

I'm trying to run some queries against Pubmed's Eutils service. If I run them on the website I get a certain number of records returned, in this case 13126 (link to pubmed).
A while ago I bodged together a python script to build a query to do much the same thing, and the resultant url returns the same number of hits (link to Eutils result).
Of course, not having any formal programming background, it was all a bit kludgy, so I'm trying to do the same thing using Biopython. I think the following code should do the same thing, but it returns a greater number of hits: 23303.
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"
handle = Entrez.esearch(db="pubmed", term="stem+cell[All Fields]",datetype="pdat", mindate="2012", maxdate="2012")
record = Entrez.read(handle)
print(record["Count"])
I'm fairly sure it's just down to some subtlety in how the URL is being generated, but I can't work out how to see what URL is being generated by Biopython. Can anyone give me some pointers?
Thanks!
EDIT:
It's something to do with how the URL is being generated, as I can get back the original number of hits by modifying the code to include double quotes around the search term, thus:
handle = Entrez.esearch(db='pubmed', term='"stem+cell"[ALL]', datetype='pdat', mindate='2012', maxdate='2012')
I'm still interested in knowing what URL is being generated by Biopython, as it'll help me work out how I have to structure the search term when I want to do more complicated searches.
handle = Entrez.esearch(db="pubmed", term="stem+cell[All Fields]",datetype="pdat", mindate="2012", maxdate="2012")
print(handle.url)
You've solved this already (Entrez likes explicit double quoting around combined search terms). At the time this was first answered, the generated URL was not exposed via the API; the simplest trick was to edit the Bio/Entrez/__init__.py file and add a print statement inside the _open function.
Update: Recent versions of Biopython now save the URL as an attribute of the returned handle, i.e. in this example try doing print(handle.url)
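Combining the two, a quick check of what Biopython actually sends (assuming a Biopython version recent enough to expose handle.url):

from Bio import Entrez

Entrez.email = "A.N.Other@example.com"

handle = Entrez.esearch(db='pubmed', term='"stem+cell"[ALL]',
                        datetype='pdat', mindate='2012', maxdate='2012')
print(handle.url)  # the exact E-utilities URL Biopython generated
record = Entrez.read(handle)
print(record["Count"])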
