I'm trying to download the CSV file from this site using Python: https://gats.pjm-eis.com/gats2/PublicReports/GATSGenerators
There's a CSV button in the top right corner, and I want to load that file into a data warehouse automatically. I've gone through a few tutorials (I'm new to Python) and haven't been successful yet. Any recommendations?
Use the requests library:
import requests
You need it to send the request for the CSV resource.
There's also a library for screen scraping called bs4 (Beautiful Soup 4):
import bs4
You will need both to build what you want. Look for a course on web scraping with Python and bs4.
There's also the standard-library csv module:
import csv
You can use it to easily parse the csv file once you get it.
Check this example or google it:
https://www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3
Here's another course on the LinkedIn Learning platform:
https://www.linkedin.com/learning/scripting-for-testers/welcome
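As a minimal sketch of how requests and csv fit together, assuming you already have a direct URL for the CSV file (the GATS page's actual export link isn't known here, so the URL you pass in is your own). `download_csv_rows` and `rows_from_csv_text` are hypothetical helper names.

```python
import csv
import io


def rows_from_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def download_csv_rows(url):
    """Fetch a CSV file over HTTP and parse it.

    Assumes `url` points directly at the CSV file, not at the HTML page
    that contains the CSV button.
    """
    import requests  # third-party (pip install requests); imported lazily

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return rows_from_csv_text(response.text)
```

Each row comes back as a dict such as `{"Unit": "A", "Capacity": "10"}`, with all values as strings, which is a convenient shape to hand to whatever loader your data warehouse uses.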
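Selenium did the trick for me:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# Selenium 4 takes the chromedriver path via a Service object
browser = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
url = 'https://gats.pjm-eis.com/gats2/PublicReports/GATSGenerators'
browser.get(url)
# XPath attribute tests use @id; find_element_by_xpath is gone in newer Selenium 4 releases
browser.find_element(By.XPATH, '//*[@id="CSV"]').click()
# The download may need time to finish before the browser is closed
browser.close()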
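If the downloaded CSV should land in a known folder before it is loaded into the warehouse, Chrome can be told where to save files, and the script can wait for the download to finish instead of closing the browser right away. This is a sketch under assumptions: it relies on Chrome's `download.default_directory` preference and its `.crdownload` suffix for partial files, and it expects the download folder to start out empty. The helper names are hypothetical.

```python
import os
import time


def completed_files(names):
    """Return `names` if no download is still in progress, else an empty list.

    Chrome gives in-progress downloads a ".crdownload" suffix.
    """
    if any(name.endswith(".crdownload") for name in names):
        return []
    return list(names)


def wait_for_download(directory, timeout=60.0, poll=0.5):
    """Poll `directory` until a finished file appears; return the newest one."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        finished = completed_files(os.listdir(directory))
        if finished:
            paths = [os.path.join(directory, name) for name in finished]
            return max(paths, key=os.path.getmtime)
        time.sleep(poll)
    raise TimeoutError(f"no completed download in {directory!r} after {timeout}s")


def make_chrome(download_dir):
    """Start Chrome so that downloads are saved to `download_dir` without a prompt."""
    from selenium import webdriver  # pip install selenium (4.x)

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", {
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
    })
    return webdriver.Chrome(options=options)
```

Usage would be: create an empty `downloads` folder, start the browser with `make_chrome("downloads")`, open the page and click the CSV button, then call `wait_for_download("downloads")` to get the file path before calling `browser.quit()`.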
I cannot seem to find a URL for the CSV file itself, so I can't use Python's csv.reader, requests, urllib, or pandas on it.
Here is the webpage: https://www.portconnect.co.nz/#/vessel-schedule/expected-arrivals
Am I on the right track?
Could you please suggest a solution?
Normally this is quite easy with pandas, as long as you have the direct link to the CSV file. From the page you provided, I could download the CSV from https://www.portconnect.co.nz/568a8a10-3379-4096-a1e8-76e0f1d1d847
import pandas as pd
portconnecturl = "https://www.portconnect.co.nz/568a8a10-3379-4096-a1e8-76e0f1d1d847"
print("Downloading data from portconnect...")
df_csv = pd.read_csv(portconnecturl)
print(df_csv)
A .csv file isn't embedded in the web page itself, so the page's URL won't give you the CSV; you need the file's own download URL. Download the file separately and read it in your code. Otherwise, you will need to use web scraping and extract the details you want with the Beautiful Soup or Selenium Python modules.
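When the CSV link is present in the page's static HTML, a small scraper can find it for you. This minimal sketch swaps in the standard library's `html.parser` for Beautiful Soup to stay dependency-free, and its matching rule (an `href` ending in `.csv`, or link text that mentions CSV) is an assumption. On a JavaScript-driven page like the portconnect one (note the `#/` route), the link may not be in the downloaded HTML at all; then Selenium, or watching the browser's network tab, is the way to find it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CsvLinkFinder(HTMLParser):
    """Collect <a href> targets that look like CSV downloads."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # href of the <a> currently open and not yet recorded

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            if self._href and self._href.lower().endswith(".csv"):
                self.links.append(urljoin(self.base_url, self._href))
                self._href = None  # already recorded

    def handle_data(self, data):
        # Also accept links whose visible text mentions CSV.
        if self._href and "csv" in data.lower():
            self.links.append(urljoin(self.base_url, self._href))
            self._href = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def find_csv_links(html, base_url):
    """Return absolute URLs of CSV-looking links in `html`."""
    finder = CsvLinkFinder(base_url)
    finder.feed(html)
    return finder.links
```

Any URL it returns can then go straight to `pd.read_csv`.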
I am trying to download data from UniProt using Python from within a script. If you follow the previous link, you will see a Download button, and then the option of choosing the format of the data. I would like to download the Excel format, compressed. Is there a way to do this within a script?
You can easily see the URL for that by monitoring the request in Firefox's "Network" tab (or the equivalent in other browsers). For this page it seems to be https://www.uniprot.org/uniprot/?query=*&format=xlsx&force=true&columns=id,entry%20name,reviewed,protein%20names,genes,organism,length&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22%20AND%20reviewed:yes&compress=yes. You should be able to download it with requests or any similar library.
Example:
import requests
url = "https://www.uniprot.org/uniprot/?query=*&format=xlsx&force=true&columns=id,entry%20name,reviewed,protein%20names,genes,organism,length&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22%20AND%20reviewed:yes&compress=yes"
response = requests.get(url)
response.raise_for_status()  # stop here if the server returned an error
# compress=yes returns a gzip-compressed file, hence the .gz name
with open("downloaded.xlsx.gz", "wb") as target:
    target.write(response.content)
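Hand-encoding that query string is error-prone if you want to change the filter or columns. As a sketch, the same URL can be built from its parameters with `urllib.parse.urlencode`, and the file can be streamed to disk in chunks. The parameter names are taken from the URL above; whether UniProt still accepts them is an assumption (the endpoint may have changed since), and the helper names are hypothetical.

```python
import urllib.parse

# Endpoint from the URL seen in the browser's network tab; it may have changed since.
BASE_URL = "https://www.uniprot.org/uniprot/"


def uniprot_xlsx_url(filter_expr, columns):
    """Build the compressed-Excel download URL, letting urlencode handle escaping."""
    params = {
        "query": "*",
        "format": "xlsx",
        "force": "true",
        "columns": ",".join(columns),
        "fil": filter_expr,
        "compress": "yes",
    }
    # quote_via=quote encodes spaces as %20, as in the original URL
    return BASE_URL + "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)


def download_to_file(url, path, chunk_size=64 * 1024):
    """Stream the response to disk instead of holding it all in memory."""
    import requests  # pip install requests

    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(path, "wb") as target:
            for chunk in response.iter_content(chunk_size=chunk_size):
                target.write(chunk)
```

For example, `uniprot_xlsx_url('organism:"Homo sapiens (Human) [9606]" AND reviewed:yes', ["id", "entry name", "reviewed"])` produces a URL with the same parameters as the one above, restricted to three columns.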
My goal for this Python code is to collect job information into a folder, but the first step is failing. When I run the code, I want it to print https://www.indeed.com/, but instead it prints https://secure.indeed.com/account/login. I am open to using urllib or cookielib to resolve this issue.
import requests
import urllib
data = {
'action':'Login',
'__email':'email@gmail.com',
'__password':'password',
'remember':'1',
'hl':'en',
'continue':'/account/view?hl=en',
}
response = requests.get('https://secure.indeed.com/account/login',data=data)
print(response.url)
If you're trying to scrape information from Indeed, you should use the Selenium library for Python:
https://pypi.python.org/pypi/selenium
You can then write your program within the context of a real user browsing the site normally.
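For comparison with the Selenium route, here is a sketch of why the question's code ends up back on the login page, written with `urllib` and `http.cookiejar` (Python 3's name for cookielib), which the asker mentioned. A login form is normally submitted with POST, and servers generally ignore a body sent with GET; the session cookie must then be kept for later requests. This is a sketch under assumptions: Indeed's real login may require extra hidden fields, tokens, or JavaScript, so it may still land on the login page, which is exactly the case where Selenium helps. The helper names are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "https://secure.indeed.com/account/login"


def build_login_request(fields):
    """Encode form fields as a POST body, the way a browser submits a login form."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(LOGIN_URL, data=body, method="POST")


def make_cookie_opener():
    """Return an opener that keeps cookies between requests, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

With the question's `data` dict, you would call `opener.open(build_login_request(data))` and check `geturl()` on the response; reusing the same `opener` for later pages sends the session cookies along.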
I want to write a Python script that automates uploading an image to https://cloud.google.com/vision/ and collects the information from the JSON tab there. How can I do this?
So far, I'm only able to open the website in Chrome using the following code:
import webbrowser
url = 'https://cloud.google.com/vision/'
webbrowser.open_new_tab(url + 'doc/')
I tried using urllib2 but couldn't get anything.
Help me out please
You have to use the google-cloud-vision library.
There is sample code in these docs:
https://cloud.google.com/vision/docs/reference/libraries#client-libraries-install-python
You can start from there.
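A minimal sketch of that route, assuming google-cloud-vision 2.x or later and credentials configured (for example through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable). It runs label detection only, one of several features the web demo runs, and returns the response serialized as JSON, roughly the kind of data shown in the demo's JSON tab. The function name is hypothetical.

```python
def annotate_image_json(path):
    """Run label detection on a local image and return the response as JSON text."""
    # Imported here so this file loads without the library installed
    # (pip install google-cloud-vision).
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as image_file:
        image = vision.Image(content=image_file.read())

    response = client.label_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)

    # proto-plus message classes can serialize an instance to JSON
    return vision.AnnotateImageResponse.to_json(response)
```

Calling the API directly like this replaces the browser upload entirely, so there is no need to automate the demo page.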
I want to be able to generate automatic alerts for certain types of matches to a web search. The first step is to read the URL in Python, so that I can then parse it using BeautifulSoup or regex-based methods.
For a page like the one in the example below though, the html doesn't capture the results that I'm visualizing when I open the page with a browser.
Is there a way to actually get the HTML with search results themselves?
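import urllib.request
link = 'http://www.sas.com/jobs/USjobs/search.html'
# Python 3: urlopen lives in urllib.request and print is a function
with urllib.request.urlopen(link) as f:
    myfile = f.read().decode('utf-8', errors='replace')
print(myfile)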
You cannot get data that is generated dynamically by JavaScript using the traditional urllib, urllib2 or requests modules (or even mechanize, for that matter). You'll have to simulate a browser environment using Selenium with Chrome, Firefox or PhantomJS, so that the JavaScript in the page is evaluated.
Have a look at the Selenium bindings for Python.
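A minimal sketch of that approach with Selenium 4 and headless Chrome: load the page, wait until the JavaScript has inserted the results, then take the rendered HTML and parse it with BeautifulSoup as planned. The CSS selector you pass is a hypothetical placeholder; you would pick one that matches an element of the search results on the real page.

```python
def fetch_rendered_html(url, wait_css, timeout=15):
    """Load `url` in headless Chrome, wait for `wait_css` to appear, return the HTML."""
    # Imported here so this file loads without Selenium installed (pip install selenium).
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element rendered by JavaScript exists (or time out).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source  # the DOM after scripts ran, not the raw download
    finally:
        driver.quit()
```

The returned string contains the search results that `urlopen` never sees, so the BeautifulSoup or regex step can run on it unchanged.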