I cannot seem to locate a URL path to the CSV file, so I can't use Python's csv.reader, requests, urllib, or pandas.
Here is the webpage: https://www.portconnect.co.nz/#/vessel-schedule/expected-arrivals
Am I on the right track?
Could you please suggest a solution?
Normally this is quite easy with pandas, provided you have a direct link to the CSV file. From the page you provided, I could download the CSV at https://www.portconnect.co.nz/568a8a10-3379-4096-a1e8-76e0f1d1d847:
import pandas as pd
portconnecturl = "https://www.portconnect.co.nz/568a8a10-3379-4096-a1e8-76e0f1d1d847"
print("Downloading data from portconnect...")
df_csv = pd.read_csv(portconnecturl)
print(df_csv)
.csv files can't be embedded inside a webpage, so you won't get a combined webpage + .csv URL. You will have to download the file separately and read it in your code. Otherwise you will need to use web scraping and extract the details you want with the Beautiful Soup or Selenium Python modules.
Related
I am trying to download data from UniProt using Python from within a script. If you follow the previous link, you will see a Download button, and then the option of choosing the format of the data. I would like to download the Excel format, compressed. Is there a way to do this within a script?
You can easily see the URL for that if you monitor it in the Firefox "Network" tab or its equivalent in other browsers. For this page it seems to be https://www.uniprot.org/uniprot/?query=*&format=xlsx&force=true&columns=id,entry%20name,reviewed,protein%20names,genes,organism,length&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22%20AND%20reviewed:yes&compress=yes. You should be able to download it using requests or any similar library.
Example:
import requests
url = "https://www.uniprot.org/uniprot/?query=*&format=xlsx&force=true&columns=id,entry%20name,reviewed,protein%20names,genes,organism,length&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22%20AND%20reviewed:yes&compress=yes"
with open("downloaded.xlsx.gz", "wb") as target:
    target.write(requests.get(url).content)
I'm trying to download the CSV file using Python from this site: https://gats.pjm-eis.com/gats2/PublicReports/GATSGenerators
There's a CSV button in the top right corner, and I want to automatically load that file into a data warehouse. I've gone through a few tutorials (I'm new to Python) but haven't been successful yet. Any recommendations?
Use the requests library:
import requests
You need it to make the request for the CSV resource.
There's also a library used for screen scraping called bs4:
import bs4
You will need both to build what you want. Look for a course on web scraping with Python and bs4.
There's also a built-in library called csv:
import csv
You can use it to easily parse the CSV file once you have it.
Check this example or google it:
https://www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3
Here's another course on LinkedIn learning platform
https://www.linkedin.com/learning/scripting-for-testers/welcome
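To sketch how the requests + csv pieces named above fit together: the idea is to fetch the CSV text over HTTP and hand it to csv.reader. The endpoint URL and the sample data below are assumptions for illustration; an inline sample string stands in for the HTTP response so the sketch runs on its own.

```python
import csv
import io

# In practice the text would come from the CSV endpoint you find in the
# browser's Network tab, e.g.:
#   import requests
#   text = requests.get(csv_url).text   # csv_url is a placeholder here
# An inline sample keeps this sketch self-contained.
text = "name,fuel,mw\nPlant A,Wind,120\nPlant B,Solar,80\n"

# csv.reader wants a file-like object, so wrap the string in StringIO.
rows = list(csv.reader(io.StringIO(text)))
header, records = rows[0], rows[1:]
print(header)   # ['name', 'fuel', 'mw']
print(records)  # [['Plant A', 'Wind', '120'], ['Plant B', 'Solar', '80']]
```

From here each record is a plain list of strings, ready to be cleaned up and loaded into a warehouse.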
Selenium did the trick for me:
from selenium import webdriver
browser = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
url = 'https://gats.pjm-eis.com/gats2/PublicReports/GATSGenerators'
browser.get(url)
browser.find_element_by_xpath('//*[@id="CSV"]').click()
browser.close()
I wrote a small python program that works with data from a CSV file. I am tracking some numbers in a google sheet and I created the CSV file by downloading the google sheet. I am trying to find a way to have python read in the CSV file directly from google sheets, so that I do not have to download a new CSV when I update the spreadsheet.
I see that the requests library may be able to handle this, but I'm having a hard time figuring it out. I've chosen not to try the google APIs because this way seems simpler as long as I don't mind making the sheet public to those with the link, which is fine.
I've tried working with the requests documentation but I'm a novice programmer and I can't get it to read in as a CSV.
This is how the data is currently taken into python:
import csv

file = open('data1.csv', newline='')
reader = csv.reader(file)
I would like the file = open() to ideally be replaced by the requests library and pull directly from the spreadsheet.
You need to find the correct URL request that downloads the file.
Sample URL:
csv_url='https://docs.google.com/spreadsheets/d/169AMdEzYzH7NDY20RCcyf-JpxPSUaO0nC5JRUb8wwvc/export?format=csv&id=169AMdEzYzH7NDY20RCcyf-JpxPSUaO0nC5JRUb8wwvc&gid=0'
The way to find it is to manually download your file while inspecting the request URLs in the Network tab of your browser's developer tools.
Then the following is enough:
import requests as rs
csv_url=YOUR_CSV_DOWNLOAD_URL
res=rs.get(url=csv_url)
open('google.csv', 'wb').write(res.content)
It will save the CSV file with the name 'google.csv' in the folder of your Python script file.
import pandas as pd
import requests
YOUR_SHEET_ID=''
r = requests.get(f'https://docs.google.com/spreadsheet/ccc?key={YOUR_SHEET_ID}&output=csv')
open('dataset.csv', 'wb').write(r.content)
df = pd.read_csv('dataset.csv')
df.head()
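The intermediate dataset.csv file is optional: the response body can be fed to pandas in memory via io.StringIO. A minimal sketch, where an inline sample string stands in for r.text (which would come from the same export URL):

```python
import io
import pandas as pd

# r.text would come from requests.get(...) as in the snippet above;
# this sample keeps the sketch runnable on its own.
sample_text = "item,count\napples,4\npears,7\n"

# StringIO makes the string look like a file to pandas.
df = pd.read_csv(io.StringIO(sample_text))
print(df.shape)          # (2, 2)
print(list(df.columns))  # ['item', 'count']
```

If the sheet is public, pd.read_csv can also be given the export URL directly, skipping requests entirely.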
I tried @adirmola's solution but I had to tweak it a little.
When he wrote "You need to find the correct URL request that downloads the file" he had a point. An easy solution is what I'm showing here: adding "&output=csv" after your Google Sheet id.
Hope it helps!
I'm not exactly sure about your usage scenario, and Adirmola already provided a very exact answer to your question, but my immediate question is why you want to download the CSV in the first place.
Google Sheets has a python library so you can just get the data from the GSheet directly.
You may also be interested in this answer since you're interested in watching for changes in GSheets
I would just like to say that using OAuth keys and the Google Python API is not always an option. I found the above quite useful for my current application.
I am having a problem downloading a CSV file using Python.
File Location: https://www.hkex.com.hk/eng/sorc/options/statistics_hv_iv.aspx?ucode=00001
The download button "export to csv" triggers JavaScript without exposing a link.
How can I download the file? Thanks a lot!
From the URL provided you cannot extract the download link directly. However, downloading the CSV reveals the following URL:
https://www.hkex.com.hk/eng/sorc/options/statistics_hv_iv.aspx?action=csv&type=3&ucode=00001, which makes it predictable.
So depending on your use case, you can use that. The values for ucode can be extracted from the select box in your first URL.
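Since only the ucode query parameter changes, download URLs for other underlyings can be generated from a template. The codes below are illustrative; the real list would be scraped from the select box on the page:

```python
# URL template observed from the page's CSV export (action=csv&type=3).
BASE = ("https://www.hkex.com.hk/eng/sorc/options/"
        "statistics_hv_iv.aspx?action=csv&type=3&ucode={code}")

# Hypothetical codes; scrape the actual values from the page's select box.
ucodes = ["00001", "00005", "00700"]

urls = [BASE.format(code=c) for c in ucodes]
for u in urls:
    print(u)
```

Each of these URLs can then be fetched with requests.get() exactly as in the single-file example.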
Actually, there is a link: https://www.hkex.com.hk/eng/sorc/options/statistics_hv_iv.aspx?action=csv&type=3&ucode=00001
A simple requests.get() call will give you the file:
import requests
filedata = requests.get('https://www.hkex.com.hk/eng/sorc/options/statistics_hv_iv.aspx?action=csv&type=3&ucode=00001').text
How can we save a webpage, including its content, so that it is viewable offline, using urllib in Python? Currently I am using the following code:
import urllib.request
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.yahoo.com")
urllib.request.urlretrieve("http://www.yahoo.com", "C:\\Users\\karanjuneja\\Downloads\\kj\\yahoo.mhtml")
This works and stores an MHTML version of the webpage in the folder, but when you open the file you only see the source code, not the page as it appears online. Do we need to make changes to the code?
Also, is there an alternate way of saving the webpage in MHTML format with all the content as it appears online, and not just the source? Any suggestions?
Thanks Karan
I guess this site might help you~
Create an MHTML archive