Using requests in Python to download an xls file - python

On this page you will find a link to download an xls file (under "attachment", or "adjuntos"): https://www.banrep.gov.co/es/emisiones-vigentes-el-dcv
The link to download the xls file is: https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls
I was using this code to automatically download that file:
import requests
import os
path = os.path.abspath(os.getcwd())  # where the file will be downloaded
path = path.replace("\\", '/')+'/'
url = 'https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls'
myfile = requests.get(url, verify=False)
open(path+'EMISIONES.xls', 'wb').write(myfile.content)
This code was working well, but suddenly the downloaded file started being corrupted.
If I run the code, it raises this warning:
InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.banrep.gov.co'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(

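The error is related to how your request is being built. The status_code returned by the request is 403 (Forbidden), so the file you save is most likely the server's error page rather than the spreadsheet. You can see the status by typing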
myfile.status_code
I guess the security issue is related to the cookies and headers in your GET request, so I suggest you look at which headers your browser sends when it requests the URL you're using.
TIP: open your web browser's developer tools and use the Network tab to identify the headers.
To handle the cookies, request an earlier page on www.banrep.gov.co first so that they are set naturally, using requests.Session:
session_ = requests.Session()
Before coding you could try to test your requests using Postman, or other REST API test software.
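A minimal sketch of the session approach above, assuming the server rejects requests that lack browser-like headers or cookies. The header values and the build_session / download_file helpers are illustrative assumptions, not something confirmed for www.banrep.gov.co; copy the real headers from the Network tab.

```python
import requests

# Hypothetical browser-like headers; replace with the ones seen in the Network tab.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "*/*",
}


def build_session(headers=BROWSER_HEADERS):
    """Return a Session that sends the given headers with every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def download_file(session, landing_url, file_url, dest):
    """Visit landing_url first so the session collects cookies, then fetch file_url."""
    session.get(landing_url, timeout=30)
    response = session.get(file_url, timeout=30)
    # Raise on 403 and other error statuses instead of saving an error page to disk.
    response.raise_for_status()
    with open(dest, "wb") as fh:
        fh.write(response.content)
    return dest
```

Called as download_file(build_session(), 'https://www.banrep.gov.co/es/emisiones-vigentes-el-dcv', url, 'EMISIONES.xls'), this would either save the file or raise requests.HTTPError, which makes a 403 visible immediately.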

Related

Can I do a python request using google vpn extension?

I just want to get my location using python.requests by retrieving the content of this website 'http://ipinfo.io/json'.
I planned to get the website content through a Google VPN extension (Hola VPN) by using cookies, but I can't get the content of this website while using the extension, for a reason I don't know.
code:
import requests
import browser_cookie3
cookies = browser_cookie3.chrome(cookie_file="C:\\Users\\USERNAME\\AppData\\Local\\Google\\Chrome\\User Data\\Profile 1\\Network\\Cookies")
response = requests.get('http://ipinfo.io/json', cookies=cookies)
print(response.content)
Note: I can do it by using selenium but I want another way to do it

How do I read a CSV directly into a pandas dataframe from a download link button?

I'm trying to read the Train File directly into a pandas dataframe from the link address instead of downloading to my local computer then reading.
The website is:
https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii/download/#ProblemStatement
The link address when you right click the Train File at the bottom of the page is:
https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii/download/train-file
I tried:
import pandas as pd
url = 'https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii/download/train-file'
df = pd.read_csv(url)
The error is:
HTTPError: HTTP Error 403: Forbidden
I also tried using requests to download the CSV then reading it from my local computer, but I couldn't get that to work either.
You need to supply your login credentials to the website. With requests you pass them in as an argument, as follows:
from requests.auth import HTTPBasicAuth
response = requests.get(url, auth=HTTPBasicAuth(username, password))
Replace username and password with your own username and password. This authenticates the request and returns a 200 response; otherwise it returns a 403 error.
I have also found this answer, which may be helpful.
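A sketch of how the authenticated response could be read straight into a dataframe without saving it first. It assumes the site accepts HTTP Basic authentication, which many sites with a login form do not; the helper names frame_from_response and read_protected_csv are made up for illustration.

```python
import io

import pandas as pd
import requests
from requests.auth import HTTPBasicAuth


def frame_from_response(response):
    """Parse the body of an HTTP response as CSV, raising on 4xx/5xx statuses."""
    response.raise_for_status()
    return pd.read_csv(io.StringIO(response.text))


def read_protected_csv(url, username, password):
    """Fetch url with HTTP Basic credentials (an assumption) and return a DataFrame."""
    response = requests.get(url, auth=HTTPBasicAuth(username, password), timeout=30)
    return frame_from_response(response)
```

If the site uses a login form instead, the same frame_from_response step should still apply once a logged-in requests.Session has fetched the file.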

Best way to ignore SSL certificate with request+python

I have an application deployed on a private server.
ip = "100.10.1.1"
I want to read the source code / web content of that page.
When I visit the page in a browser, it shows "Your connection is not private".
After I choose to proceed with the unsafe connection, it takes me to the page.
Below is the code I am using to read the HTML content. It does not give me the correct HTML, even though the response is 200 OK.
I tried ignoring the SSL certificate with the code below, but it does not work:
import requests
url = "http://100.10.1.1"
requests.packages.urllib3.disable_warnings()
response = requests.get(url, verify=False)
print(response.text)
Can someone share some ideas on how to proceed, or am I doing something wrong here?

How to log in over HTTPS via urllib?

I'm interested in using Python to retrieve a file that exists at an HTTPS url.
I have credentials for the site, and when I access it in my browser I'm able to download the file. How do I use those credentials in my Python script to do the same thing?
So far I have:
import urllib.request
response = urllib.request.urlopen('https:// (some url with an XML file)')
html = response.read()
html.write('C:/Users/mhurley/Portable_Python/notebooks/XMLOut.xml')
This code works for non-secured pages, but (understandably) returns 401:Unauthorized for the https address. I don't understand how urllib handles credentials, and the docs aren't as helpful as I'd like.
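One possible approach with urllib alone, sketched on the assumption that the server answers with an HTTP Basic authentication challenge (the 401 suggests it, but does not prove it). The helper names and the top_level_url argument are illustrative.

```python
import urllib.request


def build_authenticated_opener(top_level_url, username, password):
    """Return an opener that answers HTTP Basic auth challenges for top_level_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means: use these credentials for any realm under this URL.
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def save_url(opener, url, dest):
    """Download url through opener and write the raw bytes to dest."""
    with opener.open(url, timeout=30) as response, open(dest, "wb") as fh:
        fh.write(response.read())
    return dest
```

Note that response.read() returns bytes, so the file is written with open(dest, "wb") rather than by calling write on the downloaded data.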

Python Requests untrusted certificate

I am using the Python 2.7 requests module to build a web crawler, but I am having trouble making requests to a site that requires a certificate. When I call requests.get(url), it raises an SSLError: certificate verify failed.
So I tried requests.get(url, verify=False). It works, but it returns meta http-equiv="refresh" url='...', and that URL is not the one I requested. Is there a way to solve this problem, or do I need to send the certificate?
I saw in the requests docs that I can send the certificate and the key. I have the certificate.crt, but I don't have the key. Is there a way to get the key?
The certificate is AC certisign multipla G5 and uses TLS 1.2.
After a long time of trying to solve this issue, I figured it out. The problem was not with the SSL certificate.
I was making a request to a web page that needs a session; the URL I was using is reached through a redirect from another page. To access it correctly, I had to send a request to that other page and take the last page in the redirect chain.
So, what I did was use the get method of a Requests Session:
Session.get(url, verify=False)
where the url is the redirecting url.
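A minimal sketch of that idea, assuming the redirect is an ordinary HTTP redirect (Requests follows those by default, but it does not follow an HTML meta refresh). The function name fetch_after_redirects is made up for illustration.

```python
import requests


def fetch_after_redirects(entry_url, session=None, verify=True):
    """GET entry_url with a Session and return (final_url, hops, response)."""
    if session is None:
        session = requests.Session()
    # Cookies set by intermediate responses are kept on the session.
    response = session.get(entry_url, verify=verify, timeout=30)
    # response.history holds the intermediate redirect responses, oldest first.
    hops = [r.url for r in response.history]
    return response.url, hops, response
```

Passing verify=False reproduces the call above; comparing the returned final URL with the one originally requested shows whether a redirect took place.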
