I am having a problem downloading a CSV file using Python.
File Location: https://www.hkex.com.hk/eng/sorc/options/statistics_hv_iv.aspx?ucode=00001
The "Export to CSV" download button runs JavaScript instead of pointing to a link.
How can I download the file? Thanks a lot!
You cannot extract the download link directly from the URL provided. However, downloading the CSV in a browser reveals the following URL:
https://www.hkex.com.hk/eng/sorc/options/statistics_hv_iv.aspx?action=csv&type=3&ucode=00001
The URL is predictable, so depending on your use case you can build it yourself. The values for ucode can be extracted from the select box on the page at your first URL.
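A minimal sketch of building that URL for several codes. It assumes the pattern seen in the browser (action=csv, type=3, ucode=...) holds for every code; what type=3 means is not known here, and the code list below is hypothetical (in practice you would read it from the select box).

```python
from urllib.parse import urlencode

# Base page from the question; the query parameters are copied from the
# CSV URL observed in the browser (assumed to be stable).
BASE_URL = "https://www.hkex.com.hk/eng/sorc/options/statistics_hv_iv.aspx"


def build_csv_url(ucode: str) -> str:
    """Return the assumed CSV export URL for one underlying code."""
    query = urlencode({"action": "csv", "type": "3", "ucode": ucode})
    return f"{BASE_URL}?{query}"


# Hypothetical list of codes; replace with the values from the select box.
for code in ["00001", "00005"]:
    print(build_csv_url(code))
```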
Actually, there is a link: https://www.hkex.com.hk/eng/sorc/options/statistics_hv_iv.aspx?action=csv&type=3&ucode=00001
A simple requests.get() call will give you the file:
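import requests
response = requests.get('https://www.hkex.com.hk/eng/sorc/options/statistics_hv_iv.aspx?action=csv&type=3&ucode=00001')
response.raise_for_status()  # raise an error for 4xx/5xx responses
filedata = response.text  # the CSV content as a string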
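Once you have filedata as a string, the standard library csv module can split it into rows, or you can write it to disk unchanged. This is a sketch that assumes the response really is plain CSV; the exact layout of the HKEX file (header rows, columns) is not shown in the thread.

```python
import csv
import io


def parse_csv_text(text: str) -> list[list[str]]:
    """Split CSV text (e.g. response.text) into a list of rows."""
    return list(csv.reader(io.StringIO(text)))


def save_csv_text(text: str, path: str) -> None:
    """Write the downloaded text to disk as received.

    newline="" stops Python from translating line endings, so the file
    keeps exactly the line endings the server sent.
    """
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

With the answer's variable this would be rows = parse_csv_text(filedata) or save_csv_text(filedata, "00001.csv").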
Related
I cannot locate a URL for the CSV file, so I cannot try Python's csv.reader, requests, urllib, or pandas on it.
Here is the webpage: https://www.portconnect.co.nz/#/vessel-schedule/expected-arrivals
Am I on the right track?
Could you please suggest a solution?
Normally this is quite easy with pandas if you have the direct link to the CSV file. From the provided URL, I was able to download the CSV from https://www.portconnect.co.nz/568a8a10-3379-4096-a1e8-76e0f1d1d847
import pandas as pd
portconnecturl = "https://www.portconnect.co.nz/568a8a10-3379-4096-a1e8-76e0f1d1d847"
print("Downloading data from portconnect...")
df_csv = pd.read_csv(portconnecturl)
print(df_csv)
A .csv file can't be embedded inside a webpage, so you won't get a combined webpage + .csv file URL. You will have to download the file separately and read it in your code. Otherwise, you will need to use web scraping to extract the details you want, using the Beautiful Soup or Selenium Python modules.
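For the scraping route, here is a small sketch that pulls table cells out of HTML. It deliberately swaps in the standard library html.parser instead of Beautiful Soup so it has no dependencies. One assumption to be aware of: the portconnect page is a JavaScript app (note the #/ in the URL), so the static HTML may not contain the table at all; in that case you would first render the page with Selenium and feed driver.page_source to this parser.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")  # start a new, empty cell
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.rows[-1][-1] = self.rows[-1][-1].strip()
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data  # text may arrive in several chunks


def extract_table_rows(html: str) -> list[list[str]]:
    """Return the non-empty table rows found in an HTML string."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if row]
```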
I am trying to download data from UniProt using Python from within a script. If you follow the previous link, you will see a Download button, and then the option of choosing the format of the data. I would like to download the Excel format, compressed. Is there a way to do this within a script?
You can easily find the URL by watching the Network tab in Firefox's developer tools (or the equivalent in other browsers). For this page it seems to be https://www.uniprot.org/uniprot/?query=*&format=xlsx&force=true&columns=id,entry%20name,reviewed,protein%20names,genes,organism,length&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22%20AND%20reviewed:yes&compress=yes. You should be able to download it using requests or any similar library.
Example:
import requests
url = "https://www.uniprot.org/uniprot/?query=*&format=xlsx&force=true&columns=id,entry%20name,reviewed,protein%20names,genes,organism,length&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22%20AND%20reviewed:yes&compress=yes"
response = requests.get(url)
response.raise_for_status()  # stop on HTTP errors instead of saving an error page
with open("downloaded.xlsx.gz", "wb") as target:
    target.write(response.content)
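If you later need the plain .xlsx, the saved file can be decompressed with the standard library. This sketch assumes compress=yes produces gzip data (as the .gz file name suggests); checking the gzip magic bytes first is a cheap way to notice when the server sent an HTML error page instead.

```python
import gzip
import shutil


def is_gzip(data: bytes) -> bool:
    """gzip streams always start with the two magic bytes 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"


def gunzip_file(src: str, dst: str) -> None:
    """Decompress src (e.g. downloaded.xlsx.gz) into dst (e.g. downloaded.xlsx)."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams, so large files are fine
```

For example: if is_gzip(response.content), write the file as above and then call gunzip_file("downloaded.xlsx.gz", "downloaded.xlsx").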
Hi, there's a button on a web page; if you click it, it downloads a file.
Say the corresponding URL looks like this:
http://www.mydata.com/data/filedownload.aspx?e=MyArgu1&k=kfhk22wykq
If I put this URL in the browser's address bar, it downloads the file properly as well.
Now I do this in Python:
urllib.urlretrieve(url, "myData.csv")
The CSV file is empty. Any suggestions, please?
This may not be possible with every website. If the link contains a token, Python is unlikely to be able to use it as-is, because the token is tied to your browser session.
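When the token is tied to a session, one thing worth trying is to visit the page that contains the button first and keep its cookies for the download request. This is a sketch under assumptions: it uses the Python 3 standard library (http.cookiejar and urllib.request rather than the Python 2 urllib.urlretrieve from the question), page_url is a hypothetical parameter for the page with the button, and it will not help if the token is generated by JavaScript in the browser.

```python
import http.cookiejar
import urllib.request


def make_session_opener() -> urllib.request.OpenerDirector:
    """Build an opener that stores cookies and sends a browser-like User-Agent."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Some servers respond differently to Python's default User-Agent.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener


def download_with_session(opener, page_url: str, file_url: str, path: str) -> None:
    """Visit page_url to collect cookies, then fetch file_url into path."""
    with opener.open(page_url) as page:
        page.read()  # cookies from this response are stored in the jar
    with opener.open(file_url) as resp, open(path, "wb") as out:
        out.write(resp.read())
```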
How can I save a webpage, including its content, so that it is viewable offline, using urllib in Python? Currently I am using the following code:
import urllib.request
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.yahoo.com")
urllib.request.urlretrieve("http://www.yahoo.com", "C:\\Users\\karanjuneja\\Downloads\\kj\\yahoo.mhtml")
This works and stores an .mhtml file of the webpage in the folder, but when I open the file, I only see the source code, not the page as it appears online. Do I need to change the code?
Also, is there an alternative way of saving the webpage in MHTML format with all the content as it appears online, not just the source? Any suggestions?
Thanks, Karan
This site might help you:
Create an MHTML archive
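Since the question already drives Chrome with Selenium, another option is to ask Chrome itself for an MHTML snapshot via the Chrome DevTools Protocol command Page.captureSnapshot, exposed through Selenium's execute_cdp_cmd on Chromium-based drivers. This is a sketch, assuming a Chrome/Chromium driver; it is not urllib-based, and the file name is just an example.

```python
def capture_mhtml(driver) -> str:
    """Return the currently loaded page as an MHTML string (Chromium drivers only)."""
    snapshot = driver.execute_cdp_cmd("Page.captureSnapshot", {"format": "mhtml"})
    return snapshot["data"]  # the MHTML document is returned under "data"


def save_page_as_mhtml(driver, path: str) -> None:
    """Write the snapshot to disk; newline="" keeps Chrome's line endings as-is."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(capture_mhtml(driver))


# Example usage (needs selenium and Chrome installed):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("http://www.yahoo.com")
# save_page_as_mhtml(driver, "yahoo.mhtml")
# driver.quit()
```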
I am trying to download an Excel file from a OneDrive location. My code retrieves a file, but the file is corrupt (I get an error message):
import urllib.request
data = urllib.request.urlopen("enter url here")
with open('C:\\Video.xlsx', 'wb') as output:
    output.write(data.read())  # the with block closes the file automatically
print("done")
I use guest access to the Excel file so that I don't have to deal with authentication. The resulting file is 15 KB, while the original is 22 KB.
I got it. The URL has the following format:
'https://onedrive.live.com/view.aspx?cid=.....app=Excel'
So all I had to do was change "view" to "download" in that URL and use the code below:
import urllib.request
url = 'https://onedrive.live.com/download.aspx?cid=.....app=Excel'
urllib.request.urlretrieve(url, "test.xlsx")
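The "view" to "download" change can be done programmatically so that only the path changes and the query string is left untouched. A sketch, assuming the link follows the onedrive.live.com/view.aspx?... shape shown above; whether OneDrive still serves files this way may change over time, and the cid value in the example is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def onedrive_view_to_download(url: str) -> str:
    """Turn a .../view.aspx?... OneDrive link into the .../download.aspx?... form."""
    parts = urlsplit(url)
    if not parts.path.endswith("/view.aspx"):
        raise ValueError("expected a .../view.aspx OneDrive link")
    new_path = parts.path[: -len("view.aspx")] + "download.aspx"
    return urlunsplit(parts._replace(path=new_path))


# Hypothetical link for illustration only.
print(onedrive_view_to_download("https://onedrive.live.com/view.aspx?cid=ABC&app=Excel"))
```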
You can't just download the Excel file directly from OneDrive using a URL. Even if you share the file without requiring authorization, you'll probably still get a link to an intermediate HTML page rather than the Excel binary itself.
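A quick way to confirm that the "corrupt" 15 KB file is such an HTML page: an .xlsx file is a ZIP archive, so the standard library zipfile module can tell the two apart before you save anything. This is a sketch; it only checks the container format, not that the archive is a valid workbook.

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if data is a ZIP archive (as every .xlsx file is)."""
    return zipfile.is_zipfile(io.BytesIO(data))


def save_if_xlsx(data: bytes, path: str) -> None:
    """Save data only if it looks like an .xlsx, otherwise show what arrived."""
    if not looks_like_xlsx(data):
        raise ValueError(f"not an .xlsx file; first bytes: {data[:60]!r}")
    with open(path, "wb") as f:
        f.write(data)
```

In the question's code that would be save_if_xlsx(data.read(), 'C:\\Video.xlsx').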
To download items from your OneDrive, you'll first need to authenticate and then pass the location of the file you're after. You'll probably want to use the OneDrive REST API. The details are documented on the GitHub page of the OneDrive SDK for Python, with some examples to get you started.
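As a sketch of the REST route: OneDrive's API is also reachable through Microsoft Graph, which (as far as I know) lets you address a shared file by encoding its sharing link as a share id ("u!" followed by the unpadded base64url form of the URL) and fetching /shares/{id}/driveItem/content. This swaps in Microsoft Graph rather than the Python SDK mentioned above, and you still need an OAuth access token sent as an Authorization: Bearer header; obtaining that token is not shown here.

```python
import base64

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def encode_sharing_url(sharing_url: str) -> str:
    """Encode a sharing link as a Graph share id: "u!" + unpadded base64url."""
    b64 = base64.urlsafe_b64encode(sharing_url.encode("utf-8")).decode("ascii")
    return "u!" + b64.rstrip("=")


def shared_item_content_url(sharing_url: str) -> str:
    """URL that should return the shared file's bytes (access token still required)."""
    return f"{GRAPH_BASE}/shares/{encode_sharing_url(sharing_url)}/driveItem/content"
```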