Python SSL cert error despite no expiry in the browser

I am a Python noob and was trying to scrape random websites (without abusing them). This site caught my attention, and this is the code I ran:
import requests

url = 'https://resultsarchives.nic.in/cbseresults/cbseresults2018/class12zpq/class12th18.asp'

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Language": "en-US,en;q=0.9,bn;q=0.8",
    "X-Requested-With": "XMLHttpRequest",
    "Content-Type": "application/x-www-form-urlencoded",
    ":authority": "resultsarchives.nic.in",
    "Origin": "http://resultsarchives.nic.in",
    "Referer": "https://resultsarchives.nic.in/cbseresults/cbseresults2018/class12zpq/class12th18.htm",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "same-origin",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1"
}

for i in range(8397, 8398):
    payload = {
        'regno': '6529437',
        'sch': '12345',
        'cno': str(i),
        'B2': 'Submit'
    }
    response = requests.post(url, headers=headers, data=payload)
    # Append each response to the output file and close the handle properly.
    with open('scrape.html', 'a', encoding="utf-8") as f:
        f.write(response.text)
When executed, the code throws an SSL certificate expiry error. However, browsers (Chrome/Firefox) load the site fine and show the certificate as valid until Dec. 2022. The error was:
requests.exceptions.SSLError: HTTPSConnectionPool(host='resultsarchives.nic.in', port=443): Max retries exceeded with url: /cbseresults/cbseresults2018/class12zpq/class12th18.asp (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
If I run the same code against the HTTP version of the site, it works fine!
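For what it's worth, a "certificate has expired" error from Python while the browser still shows a valid certificate usually means an expired root or intermediate in the local CA bundle, not in the site's own certificate: requests verifies against certifi's bundle, not the browser's trust store. A minimal sketch of the usual checks, assuming a stale certifi bundle is the cause (upgrade it first with pip install --upgrade certifi):

import certifi
import requests

url = 'https://resultsarchives.nic.in/cbseresults/cbseresults2018/class12zpq/class12th18.asp'

# Show which CA bundle requests/certifi is actually verifying against.
print(certifi.where())

# Retry with the (freshly upgraded) bundle passed explicitly.
response = requests.get(url, verify=certifi.where())
print(response.status_code)

# Last resort, for testing only: disable verification entirely.
# response = requests.get(url, verify=False)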

Related

ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host') with the Python requests library

I am trying to run a request with the following code, but it keeps returning ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host').
I have tried the following solutions:
Adding request headers such as user-agent, accept-encoding, accept, connection, etc.
Updating the cookie and path by opening a new connection
However, these solutions don't fix it. Here is my code:
import requests
import logging
from http.client import HTTPConnection

def debug_requests_on():
    '''Switches on logging of the requests module.'''
    HTTPConnection.debuglevel = 1
    logging.basicConfig()
    logging.getLogger().setLevel(logging.DEBUG)
    requests_log = logging.getLogger("requests.packages.urllib3")
    requests_log.setLevel(logging.DEBUG)
    requests_log.propagate = True

try:
    url = "https://www.ixl.com/practice/tally?pesId=f171242dd3042b4bfddcaaa1f993bdc4_l92ykqs8_26fg"
    headers = {
        "cookie": 'TRID=d4bcb89b.5e9372c8af73a; visited=true; g_state={"i_l":0}; CD=xxx; userType=2; HSI=SS; userSubjects="math,ela"; nces_id=""; mdr_id=unknown; just_logged_in=true; PPE=1_092122; ahl_banner_dismissed=false; ajs_user_id=xxx; ajs_anonymous_id=xxx; ixl_sess9300=xxx; is_logged_in=true; EUFA=true; sign_in_redirect=; debug_id=6369225f-2450-48ae-a175-536455514d19; lastLoginStatus=true; sessionUserInfo=xxx; JSESSIONID=xxx',
        "authority": "www.ixl.com",
        "method": "POST",
        "path": "/practice/tally?pesId=f171242dd3042b4bfddcaaa1f993bdc4_l92ykqs8_26fg",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
        "accept": "application/json",
        "accept-language": "en-US,en;q=0.9",
        "content-type": "application/json;",
        "sec-fetch-dest": "empty",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "same-origin",
        "sec-gpc": "1",
        "content-length": '99',
        'origin': 'https://www.ixl.com',
        'referer': 'https://www.ixl.com/ela/grade-11/recall-the-source-of-an-allusion',
        'Accept-Encoding': "gzip,deflate,br",
        "Connection": "keep-alive"
    }
    debug_requests_on()
    r = requests.post(url, headers=headers)
    r.raise_for_status()
except requests.exceptions.RequestException as e:
    raise SystemExit(e)
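None of the attempts above retries the request, and intermittent 10054 resets often go away with retries and backoff. A minimal sketch using urllib3's Retry through a requests Session (the URL is from the question; fill in the same headers dict):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

url = "https://www.ixl.com/practice/tally?pesId=f171242dd3042b4bfddcaaa1f993bdc4_l92ykqs8_26fg"
headers = {}  # reuse the headers dict from the question

session = requests.Session()
# Retry up to 5 times with exponential backoff; connection errors count
# against the total, and the listed status codes are retried as well.
retry = Retry(total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retry))

r = session.post(url, headers=headers)
r.raise_for_status()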

Imitate a request with Python

I'm trying to imitate this request:
POST /default/latex2image HTTP/2
Host: e1kf0882p7.execute-api.us-east-1.amazonaws.com
Content-Length: 96
Sec-Ch-Ua: " Not A;Brand";v="99", "Chromium";v="104"
Accept: application/json, text/javascript, */*; q=0.01
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Sec-Ch-Ua-Mobile: ?0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36
Sec-Ch-Ua-Platform: "Windows"
Origin: https://latex2image.joeraut.com
Sec-Fetch-Site: cross-site
Sec-Fetch-Mode: cors
Sec-Fetch-Dest: empty
Referer: https://latex2image.joeraut.com/
Accept-Encoding: gzip, deflate
Accept-Language: es-ES,es;q=0.9
{"latexInput":"\\begin{align*}\n{1}\n\\end{align*}\n",
"outputFormat":"PNG",
"outputScale":"125%"}
When it's sent from the original browser, there is no problem.
However, when I try to do it in Python, the server rejects the request, and I don't know why.
This is what I tried:
import requests

pload = {
    "latexInput": "{0}",
    "outputFormat": "PNG",
    "outputScale": "125%"
}

header = {
    "Content-Length": "96",
    "Sec-Ch-Ua": "Not A;Brand;v=99, Chromium;v=104",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Sec-Ch-Ua-Mobile": "?0",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36",
    "Sec-Ch-Ua-Platform": "Windows",
    "Origin": "https://latex2image.joeraut.com",
    "Sec-Fetch-Site": "cross-site",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
    "Referer": "https://latex2image.joeraut.com/",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "es-ES,es;q=0.9",
}

r = requests.post("http://e1kf0882p7.execute-api.us-east-1.amazonaws.com", data=pload, headers=header)
print(r.text)
print(r.status_code)
And the error it raised:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='e1kf0882p7.execute-api.us-east-1.amazonaws.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000259A4D20670>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
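Comparing the Python code with the captured request shows two differences: the capture is HTTPS and targets the /default/latex2image path (posting to http:// on port 80 is exactly what produces the connection-refused error), and the body is a raw JSON string sent under a form-urlencoded content type, not form fields. A sketch that mirrors the capture more closely:

import json
import requests

url = "https://e1kf0882p7.execute-api.us-east-1.amazonaws.com/default/latex2image"

pload = {
    "latexInput": "\\begin{align*}\n{1}\n\\end{align*}\n",
    "outputFormat": "PNG",
    "outputScale": "125%",
}

# The browser sends the JSON text as the raw body, so serialize it
# explicitly instead of letting requests form-encode the dict.
header = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
r = requests.post(url, data=json.dumps(pload), headers=header)
print(r.status_code, r.text)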

API request works on the website but not with Python requests

I go to this webpage:
https://iso19139echnap.geocat.live/geonetwork/doc/api/index.html#/records/getRecord
and try the API call under Records > "Get a metadata record".
It worked.
However, if I call the same API in Python, it responds with a 403:
import requests

url_metadata = "https://iso19139echnap.geocat.live/geonetwork/srv/api/0.1/records/d1ec996c-d21c-4bc4-9888-6f1722b44a57"

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "Cookie": "XSRF-TOKEN=97bb29dd-9165-4fd4-bbd1-e2c72bffa509; JSESSIONID=78C1024AF960D630A4EA49DA02DFC89A; serverTime=1615580729954; sessionExpiry=1615582829954",
    "Host": "iso19139echnap.geocat.live",
    "Referer": "https://iso19139echnap.geocat.live/geonetwork/doc/api/index.html",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.72 Safari/537.36 Edg/89.0.774.45",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
}

payload = {}
r_metadata = requests.request("GET", url_metadata, headers=headers, data=payload)
print("single metadata api status: " + str(r_metadata))
It's an authentication problem: you need to include X-XSRF-TOKEN as a header. Please refer to this answer on how to send a request to GeoNetwork from an API client.
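A minimal sketch of that pattern, assuming the server issues the XSRF-TOKEN cookie on the first response; the cookie value is then echoed back as the X-XSRF-TOKEN header:

import requests

url = "https://iso19139echnap.geocat.live/geonetwork/srv/api/0.1/records/d1ec996c-d21c-4bc4-9888-6f1722b44a57"

session = requests.Session()
# First call: the server sets the XSRF-TOKEN cookie on the session.
session.get(url, headers={"Accept": "application/xml"})

# Second call: echo the cookie value back as the X-XSRF-TOKEN header.
token = session.cookies.get("XSRF-TOKEN")
r = session.get(url, headers={"Accept": "application/xml", "X-XSRF-TOKEN": token})
print(r.status_code)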

Scraping a website with requests and BS4, soup content coming back with question marks in the HTML

I am scraping a website with the following url and headers:
url : 'https://tennistonic.com/tennis-news/'
headers :
{
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Cache-Control": "no-cache",
    "content-length": "0",
    "content-type": "text/plain",
    "cookie": "IDE=AHWqTUl3YRZ8Od9MzGofphNI-OCOFESmxlN69Ekm4Sbh9tcBDXGJQ1LVwbDd2uX_; DSID=AAO-7r74ByYt6ieW2yasN78hFsOGY6mrhpN5pEOWQ1vGRnAOdolIlKv23JqCRf11OpFUGFdZ-yxB3Ii1VE6UjcK-jny-4mcJ5uO-_BaV3bEFbLvU7rJNBlc",
    "origin": "https://tennistonic.com",
    "Connection": "keep-alive",
    "Pragma": "no-cache",
    "Referer": "https://tennistonic.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36",
    "x-client-data": "CI22yQEIprbJAQjBtskBCKmdygEIl6zKAQisx8oBCPXHygEI58jKAQjpyMoBCOLNygEI3NXKAQjB18oBCP2XywEIj5nLARiKwcoB"
}
The x-client-data header has a decoded section afterwards, which I have left out here but have also tried sending. The full request from dev tools is shown below:
:authority: stats.g.doubleclick.net
:method: POST
:path: /j/collect?t=dc&aip=1&_r=3&v=1&_v=j87&tid=UA-13059318-2&cid=1499412700.1601628730&jid=598376897&gjid=243704922&_gid=1691643639.1604317227&_u=QACAAEAAAAAAAC~&z=1736278164
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-GB,en-US;q=0.9,en;q=0.8
cache-control: no-cache
content-length: 0
content-type: text/plain
cookie: IDE=AHWqTUl3YRZ8Od9MzGofphNI-OCOFESmxlN69Ekm4Sbh9tcBDXGJQ1LVwbDd2uX_; DSID=AAO-7r74ByYt6ieW2yasN78hFsOGY6mrhpN5pEOWQ1vGRnAOdolIlKv23JqCRf11OpFUGFdZ-yxB3Ii1VE6UjcK-jny-4mcJ5uO-_BaV3bEFbLvU7rJNBlc
origin: https://tennistonic.com
pragma: no-cache
referer: https://tennistonic.com/
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36
x-client-data: CI22yQEIprbJAQjBtskBCKmdygEIl6zKAQisx8oBCPXHygEI58jKAQjpyMoBCOLNygEI3NXKAQjB18oBCP2XywEIj5nLARiKwcoB
Decoded:
message ClientVariations {
// Active client experiment variation IDs.
repeated int32 variation_id = [3300109, 3300134, 3300161, 3313321, 3315223, 3318700, 3318773, 3318887, 3318889, 3319522, 3320540, 3320769, 3329021, 3329167];
// Active client experiment variation IDs that trigger server-side behavior.
repeated int32 trigger_variation_id = [3317898];
}
r = requests.get(url2, headers=headers2)
soup_cont = soup(r.content, 'html.parser')
My soup content from the response comes back full of question marks.
Is this website protected or am I sending wrong requests?
Try using selenium:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
driver = webdriver.Chrome()
driver.get('https://tennistonic.com/tennis-news/')
time.sleep(3)
soup = BeautifulSoup(driver.page_source,'html5lib')
print(soup.prettify())
driver.close()
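If the question marks come from a charset mismatch rather than bot protection, a cheaper thing to try before Selenium (not part of the answer above) is letting requests re-detect the encoding before parsing:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://tennistonic.com/tennis-news/')
# requests falls back to ISO-8859-1 when the server omits a charset;
# apparent_encoding re-detects the charset from the body bytes.
r.encoding = r.apparent_encoding
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.title)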

Strange error response - requests

Updated code - I'm using this code to send the request:
import requests

headers = {
    "Host": "www.roblox.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US;q=0.7,en;q=0.3",
    "Referer": "https://www.roblox.com/users/12345/profile",
    "Content-Type": "application/json;charset=utf-8",
    "X-CSRF-TOKEN": "some-xsrf-token",
    "Content-Length": "27",
    "DNT": "1",
    "Connection": "close"
}

data = {"targetUserId": "56789"}
url = "http://www.roblox.com/user/follow"
r = requests.post(url, headers=headers, data=data, cookies={"name": "value"})
Response (using r.text):
{"isValid":false,"data":null,"error":""}
The request itself is valid; I sent it using Burp and it worked:
POST /user/follow HTTP/1.1
Host: www.roblox.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Accept: application/json, text/plain, */*
Accept-Language: pl,en-US;q=0.7,en;q=0.3
Referer: https://www.roblox.com/users/12345/profile
Content-Type: application/json;charset=utf-8
X-CSRF-TOKEN: Ab1/2cde3fGH
Content-Length: 27
Cookie: some-cookie=;
DNT: 1
Connection: close
{"targetUser":"56789"}
Because it works in Burp but not in Python requests, get a packet sniffer (Wireshark is the simplest, IMO) and compare the packet Burp sends that works against the one Python sends that does not. I suspect the problem is that the website is HTTPS but you are using http://www.roblox.com. Try https://www.roblox.com, though I am not sure it will work.
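One more difference worth noting: the headers declare Content-Type: application/json, but data= sends the dict form-encoded, whereas the Burp request that worked had a JSON body. A hedged rewrite combining both fixes (the token and cookie values are the placeholders from the question):

import requests

url = "https://www.roblox.com/user/follow"  # https, per the suggestion above
data = {"targetUserId": "56789"}

headers = {
    "Accept": "application/json, text/plain, */*",
    "X-CSRF-TOKEN": "some-xsrf-token",  # placeholder from the question
}

# json= serializes the dict and sets Content-Type: application/json,
# matching the body format that worked in Burp.
r = requests.post(url, headers=headers, json=data, cookies={"name": "value"})
print(r.text)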
