I am trying to download data from UniProt using Python from within a script. If you follow the previous link, you will see a Download button, and then the option of choosing the format of the data. I would like to download the Excel format, compressed. Is there a way to do this within a script?
You can find the URL behind that button by watching the Firefox "Network" tab (or the equivalent in another browser) while you click it. For this page it seems to be https://www.uniprot.org/uniprot/?query=*&format=xlsx&force=true&columns=id,entry%20name,reviewed,protein%20names,genes,organism,length&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22%20AND%20reviewed:yes&compress=yes. You should be able to download it using requests or any similar library.
Example:
import requests
url = "https://www.uniprot.org/uniprot/?query=*&format=xlsx&force=true&columns=id,entry%20name,reviewed,protein%20names,genes,organism,length&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22%20AND%20reviewed:yes&compress=yes"
with open("downloaded.xlsx.gz", "wb") as target:
    target.write(requests.get(url).content)
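The hand-encoded URL above is hard to read and easy to break when editing. As a minimal sketch, the same query string can be built from a plain dictionary and the response streamed to disk. The helper names `build_query_url` and `download` are made up for illustration, and the sketch uses only the standard library (requests would work the same way). It also assumes the endpoint quoted in the answer is still served in this form.

```python
import shutil
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://www.uniprot.org/uniprot/"

# The same parameters as in the URL above, written unencoded.
PARAMS = {
    "query": "*",
    "format": "xlsx",
    "force": "true",
    "columns": "id,entry name,reviewed,protein names,genes,organism,length",
    "fil": 'organism:"Homo sapiens (Human) [9606]" AND reviewed:yes',
    "compress": "yes",
}


def build_query_url(base_url, params):
    """Percent-encode the parameters (spaces become %20) and append them."""
    return base_url + "?" + urlencode(params, quote_via=quote)


def download(url, path):
    """Stream the response body to a file without loading it all into memory."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as target:
        shutil.copyfileobj(response, target)


if __name__ == "__main__":
    download(build_query_url(BASE_URL, PARAMS), "downloaded.xlsx.gz")
```

Keeping the parameters in a dictionary makes it straightforward to change the organism filter or the column list without re-encoding the URL by hand.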
Related
I'm trying to download the CSV file using Python from this site: https://gats.pjm-eis.com/gats2/PublicReports/GATSGenerators
There's a csv button in the top right corner that I want to automatically load into a data warehouse. I've gone through a few tutorials (new to Python) and have yet to be successful. Any recommendations?
Use the library called requests:
import requests
You need it to send the request for the CSV resource.
There is also a screen-scraping library called bs4 (Beautiful Soup):
import bs4
You will need both to build what you want. Look for a course on web scraping with Python and bs4.
There is also a standard-library module called csv:
import csv
You can use it to easily parse the csv file once you get it.
Check this example or google it:
https://www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3
Here's another course on LinkedIn learning platform
https://www.linkedin.com/learning/scripting-for-testers/welcome
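As a rough sketch of how requests and csv fit together once the address of the CSV resource is known: the function names below are hypothetical, and `CSV_URL` is a placeholder, not the real export address of the GATS page, which would have to be found first (for example in the browser's network tab).

```python
import csv
import io


def parse_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv_rows(session, url):
    """Download a CSV resource and return its parsed rows.

    `session` is any object with a requests-style get(), e.g. requests.Session().
    """
    response = session.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return parse_csv_text(response.text)


if __name__ == "__main__":
    import requests  # third-party: pip install requests

    CSV_URL = "https://example.com/report.csv"  # placeholder, not the real export URL
    for row in fetch_csv_rows(requests.Session(), CSV_URL)[:5]:
        print(row)
```

If the button turns out to be driven by JavaScript rather than a plain link, this approach may not be enough on its own.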
Selenium did the trick for me:
from selenium import webdriver
browser = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
url = 'https://gats.pjm-eis.com/gats2/PublicReports/GATSGenerators'
browser.get(url)
browser.find_element_by_xpath('//*[@id="CSV"]').click()
browser.close()
I'm trying to download a .xls from this Site
I need to somehow click the second button ("Exporta informácion diária") on the grid and download the .xls file.
I tried requests and BeautifulSoup, but that didn't work.
After that, I tried Selenium for a few tests and managed to do what I needed.
Can someone please explain how I can download the .xls file without using a headless browser?
Thank You.
To do this, you first need to understand the flow of network requests that performs the download.
The easiest way is to open the developer tools in your browser and follow the relevant requests.
In your case, there is a POST request that returns the exact address of the file.
Download it with a GET request.
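The two-step flow described above could look roughly like this. It is only a sketch: the URLs and form fields are placeholders, the function name is made up, and it assumes the POST reply body contains nothing but the file's address (possibly relative). The real request details have to be read from the developer tools.

```python
from urllib.parse import urljoin


def download_via_post(session, post_url, form_data, target_path):
    """POST the form, read the file address from the reply, then GET the file."""
    reply = session.post(post_url, data=form_data, timeout=30)
    reply.raise_for_status()
    # Assumption: the reply body is just the file's address, possibly relative.
    file_url = urljoin(post_url, reply.text.strip())
    download = session.get(file_url, timeout=30)
    download.raise_for_status()
    with open(target_path, "wb") as target:
        target.write(download.content)
    return file_url


if __name__ == "__main__":
    import requests  # third-party: pip install requests

    # Placeholder values; copy the real ones from the browser's network tab.
    POST_URL = "https://example.com/export"
    FORM_DATA = {"report": "daily"}
    print(download_via_post(requests.Session(), POST_URL, FORM_DATA, "report.xls"))
```

Using one session for both requests keeps any cookies set by the POST available to the GET, which some sites require.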
I want to write a Python script to automate uploading an image to https://cloud.google.com/vision/ and collect the information from the JSON tab there. I need to know how to do it.
So far, I have only been able to open the website in Chrome using the following code:
import webbrowser
url = 'https://cloud.google.com/vision/'
webbrowser.open_new_tab(url + 'doc/')
I tried using urllib2 but couldn't get anything.
Help me out please
You have to use the google-cloud-vision library.
There is sample code in these docs:
https://cloud.google.com/vision/docs/reference/libraries#client-libraries-install-python
you can start from here
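For orientation, a label-detection call with that library might look like the sketch below. It assumes google-cloud-vision 2.x or later is installed and that Google Cloud credentials are already configured. `photo.jpg` and the two function names are placeholders for illustration, and the linked docs remain the authoritative reference.

```python
def summarize_labels(response):
    """Turn a label-detection response into (description, score) pairs."""
    return [(label.description, label.score) for label in response.label_annotations]


def detect_labels(image_path):
    """Send a local image to the Vision API and return its labels."""
    from google.cloud import vision  # third-party: pip install google-cloud-vision

    client = vision.ImageAnnotatorClient()  # needs Google Cloud credentials
    with open(image_path, "rb") as image_file:
        image = vision.Image(content=image_file.read())
    response = client.label_detection(image=image)
    return summarize_labels(response)


if __name__ == "__main__":
    # Placeholder file name; point this at a real local image.
    for description, score in detect_labels("photo.jpg"):
        print(description, score)
```

This talks to the API directly instead of driving the demo web page, so there is no upload form or JSON tab to automate.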
How can we save the webpage including the content in it, so that it is viewable offline, using urllib in python language? Currently I am using the following code:
import urllib.request
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.yahoo.com")
urllib.request.urlretrieve("http://www.yahoo.com", "C:\\Users\\karanjuneja\\Downloads\\kj\\yahoo.mhtml")
This works and stores an .mhtml file in the folder, but when you open the file you only see the page's source code, not the page as it appears online. Do we need to make changes to the code?
Also, is there an alternative way of saving the webpage in MHTML format with all the content as it appears online, and not just the source? Any suggestions?
Thanks Karan
I think this page might help you:
Create an MHTML archive
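One possible route, which may not be the one the linked page describes, is to let Chrome produce the archive itself through the DevTools Protocol command `Page.captureSnapshot`. Selenium exposes this on Chromium-based drivers as `execute_cdp_cmd`. The helper name `save_mhtml` and the output file name are made up for this sketch.

```python
def save_mhtml(driver, target_path):
    """Ask Chrome for an MHTML snapshot of the current page and write it to disk."""
    snapshot = driver.execute_cdp_cmd("Page.captureSnapshot", {"format": "mhtml"})
    # newline="" keeps the line endings exactly as Chrome produced them.
    with open(target_path, "w", encoding="utf-8", newline="") as target:
        target.write(snapshot["data"])


if __name__ == "__main__":
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Chrome()
    try:
        driver.get("http://www.yahoo.com")
        save_mhtml(driver, "yahoo.mhtml")
    finally:
        driver.quit()
```

Unlike `urlretrieve`, which only fetches the HTML source, the snapshot is taken from the rendered page, so images and styles are embedded in the one file.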
There is a stock screening website called finviz. You can set up specific parameters for it to screen, and then there is a button in the bottom right corner that lets you export the results as a .csv file.
I would like to create a script, in Python 2.7, that will download and analyze the file. I imagine I can use urllib2 to access the website, but how can I trigger the export and then read from the resulting file? Using the standard urllib2.urlopen(url).read() returns the HTML of the whole page, not the export I need.
It turns out that, at least in this case, the export button is really just a link to a different URL. The screener's URL might be: http://finviz.com/screener.ashx?v=111&f=sh_price_u1
The export version of the URL is: http://finviz.com/export.ashx?v=111&f=sh_price_u1
The second URL triggers the download, so instead of urllib2.urlopen("http://finviz.com/screener.ashx?v=111&f=sh_price_u1").read() I need
urllib2.urlopen("http://finviz.com/export.ashx?v=111&f=sh_price_u1").read()
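The URL swap can be automated so that any screener URL is turned into its export counterpart and the result parsed as CSV. This is a sketch in Python 3 (so `urllib.request` rather than `urllib2`); the function names are made up, and it assumes the export URL still returns plain CSV without requiring extra headers or a login.

```python
import csv
import io
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def to_export_url(screener_url):
    """Swap screener.ashx for export.ashx, keeping the query string intact."""
    parts = urlsplit(screener_url)
    path = parts.path.replace("screener.ashx", "export.ashx")
    return urlunsplit(parts._replace(path=path))


def read_export(screener_url):
    """Download the export for a screener URL and return its rows as lists."""
    with urllib.request.urlopen(to_export_url(screener_url)) as response:
        text = response.read().decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    rows = read_export("http://finviz.com/screener.ashx?v=111&f=sh_price_u1")
    print(rows[:3])
```

Only the path is rewritten, so whatever filters were set in the screener carry over to the export unchanged.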
This one does the job in python. Have a look. https://github.com/nicolamr/trending-value