I used to connect to a websocket by sending Origin, User-Agent and Cookie headers, and the connection worked. The site owner has since moved the websocket to its own subdomain and put Cloudflare protection in front of it, and my connection method no longer works. Can anyone suggest a method, or point me to information, on how to connect to a websocket behind Cloudflare?
Example of my code:
import websocket
import json
import time
import traceback
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.173', 'cookie': '__cfduid=da97b059db0292806e2affdf9c3f4fd8b1593022325; _csrf=i8W6njc7hUXMOf4iQjiAxKg1; language=en; theme=darkTheme; pro_version=false; csgo_ses=1489162147d69debd9fe5d0ea2e445c87a117578d774502172d7151b89b82f7f; steamid=76561199068891508; avatar=https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/fe/fef49e7fa7e1997310d705b2a6158ff8dc1cdfeb_medium.jpg; username=andrewcrook232; thirdparty_token=06d04856ce6e334aa1368696df775e7ba0b1b898db135b0af0b5dc0fe001dd55; user_type=old; sellerid=6721648; type_device=desktop', 'origin': 'https://cs.money'}
def start_ws():
    try:
        ws = websocket.WebSocketApp("wss://ws.cs.money/ws", on_message = on_message, cookie = json.dumps(headers))
        print("Connected")
        while True:
            ws.run_forever(ping_timeout=20)
            print("Reload")
            time.sleep(20)
    except:
        print(traceback.format_exc())
def on_message(ws, message):
    try:
        print(message)
    except:
        print(traceback.format_exc())
if __name__ == "__main__":
    start_ws()
Below is all the information that I got with Chrome Inspector (f12) -> Network -> WS -> headers, this information should be more than enough to successfully join WSS.
Request URL: wss://ws.cs.money/ws
Request Method: GET
Status Code: 101 Switching Protocols
alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400
CF-Cache-Status: DYNAMIC
CF-RAY: 5a886ad37f4b8ac6-KBP
cf-request-id: 038921182700008ac6798a2200000001
Connection: upgrade
Date: Wed, 24 Jun 2020 18:12:29 GMT
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Sec-WebSocket-Accept: zrH4CEKXm3BY5z77HroJDqGgYSc=
Server: cloudflare
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Upgrade: websocket
X-Content-Type-Options: nosniff
Accept-Encoding: gzip, deflate, br
Accept-Language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control: no-cache
Connection: Upgrade
Host: ws.cs.money
Origin: https://cs.money
Pragma: no-cache
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Key: GXVT8QewAgPEZDEZZ+x3dA==
Sec-WebSocket-Version: 13
Upgrade: websocket
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.173
Also additional page data:
Request URL: https://cs.money/
Request Method: GET
Status Code: 200
Remote Address: 104.20.76.156:443
Referrer Policy: no-referrer-when-downgrade
alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400
cf-cache-status: DYNAMIC
cf-ray: 5a886ab5adac8aea-KBP
cf-request-id: 038921058800008aea96109200000001
content-encoding: br
content-security-policy: script-src 'self' cs.money dev.csgo.trade gleam.io www.am4charts.com translate.google.com translate.googleapis.com www.googletagmanager.com www.googleoptimize.com www.google-analytics.com connect.facebook.net https://vk.com 'unsafe-inline' top-fwz1.mail.ru 'unsafe-eval' api.usersnap.com cdn.usersnap.com cs.money mc.yandex.ru diffuser-cdn.app-us1.com diffuser-cdn.app-us1.com prism.app-us1.com trackcmp.net api.basisid.com https://cdn.amplitude.com sc-static.net support.cs.money embed-sandbox.bridgerpay.com embed.bridgerpay.com cs.money; worker-src 'self' data: blob: cs.money; object-src cs.money dota.money; media-src cs.money dota.money; frame-src cs.money dota.money onesignal.com https://*.com https://*.ru https://*.ua http://www.youtube.com
content-type: text/html; charset=utf-8
date: Wed, 24 Jun 2020 18:12:25 GMT
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare
set-cookie: user_type=old; Path=/
set-cookie: language=en; Max-Age=8640000; Domain=cs.money; Path=/; Expires=Fri, 02 Oct 2020 18:12:25 GMT
set-cookie: language=en; Max-Age=8640000; Domain=.cs.money; Path=/; Expires=Fri, 02 Oct 2020 18:12:25 GMT
set-cookie: sellerid=6721648; Max-Age=8640000; Domain=cs.money; Path=/; Expires=Fri, 02 Oct 2020 18:12:25 GMT
set-cookie: pro_version=false; Max-Age=8640000; Domain=cs.money; Path=/; Expires=Fri, 02 Oct 2020 18:12:25 GMT
status: 200
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-cache-status: BYPASS
x-content-type-options: nosniff
x-dns-prefetch-control: off
x-download-options: noopen
x-frame-options: SAMEORIGIN
x-powered-by: PHP 4.1.0
x-xss-protection: 1; mode=block
:authority: cs.money
:method: GET
:path: /
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7
cache-control: max-age=0
cookie: __cfduid=da97b059db0292806e2affdf9c3f4fd8b1593022325; _csrf=i8W6njc7hUXMOf4iQjiAxKg1; language=en; theme=darkTheme; pro_version=false; csgo_ses=1489162147d69debd9fe5d0ea2e445c87a117578d774502172d7151b89b82f7f; steamid=76561199068891508; avatar=https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/fe/fef49e7fa7e1997310d705b2a6158ff8dc1cdfeb_medium.jpg; username=andrewcrook232; thirdparty_token=06d04856ce6e334aa1368696df775e7ba0b1b898db135b0af0b5dc0fe001dd55; user_type=old; sellerid=6721648; type_device=desktop
referer: https://steamcommunity.com/openid/login?openid.mode=checkid_setup&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.return_to=https%3A%2F%2Fauth.dota.trade%2Flogin%2Fcallback%3FredirectUrl%3Dhttps%3A%2F%2Fcs.money%26callbackUrl%3Dhttps%3A%2F%2Fcs.money%2Flogin&openid.realm=https%3A%2F%2Fauth.dota.trade
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: cross-site
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.173
I'm not sure this is the real cause, but your code does have a bug.
To open a websocket connection with custom headers, pass them through the header parameter instead of JSON-dumping the whole dict into cookie.
ws = websocket.WebSocketApp("wss://ws.cs.money/ws",
                            on_message = on_message,
                            cookie = json.dumps(headers))
should be
cookie_string = headers['cookie']
del headers['cookie']
header_without_cookie = headers
ws = websocket.WebSocketApp("wss://ws.cs.money/ws",
                            on_message = on_message,
                            header = header_without_cookie,
                            cookie = cookie_string)
The websocket-client documentation is sparse, so reading the source may help with usage:
https://github.com/websocket-client/websocket-client/blob/2222f2c49d71afd74fcda486e3dfd14399e647af/websocket/_app.py
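The same split can be sketched as a small helper that leaves the original dict untouched. This is only a sketch: `split_cookie_header` is a hypothetical name, and it assumes `WebSocketApp` accepts a dict for `header` and a string for `cookie`, as the snippet above does.

```python
def split_cookie_header(headers):
    """Return (headers_without_cookie, cookie_string) without mutating the input."""
    remaining = {}
    cookie = None
    for name, value in headers.items():
        # Header names are case-insensitive, so match "cookie" and "Cookie" alike.
        if name.lower() == "cookie":
            cookie = value
        else:
            remaining[name] = value
    return remaining, cookie


# Shortened example values; the real dict would hold the full browser headers.
example_headers = {
    "User-Agent": "Mozilla/5.0",
    "origin": "https://cs.money",
    "cookie": "language=en; theme=darkTheme",
}
header_without_cookie, cookie_string = split_cookie_header(example_headers)

# The two values would then be passed separately, for example:
# ws = websocket.WebSocketApp("wss://ws.cs.money/ws", on_message=on_message,
#                             header=header_without_cookie, cookie=cookie_string)
```

Unlike `del headers['cookie']`, this keeps `headers` intact, which matters if the connection is rebuilt in a reconnect loop.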
How can I get past the HTTP/1.1 403 Forbidden response when connecting to wss://ws2.qxbroker.com/socket.io/EIO=3&transport=websocket? I tried changing the User-Agent, using a proxy, and adding cookies, but none of it worked.
class WebsocketClient(object):
    def __init__(self, api):
        websocket.enableTrace(True)
        Origin = 'Origin: https://qxbroker.com'
        Extensions = 'Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits'
        Host = 'Host: ws2.qxbroker.com'
        Agent = 'User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 OPR/94.0.0.0'
        self.api = api
        self.wss=websocket.WebSocketApp(('wss://ws2.qxbroker.com/socket.io/EIO=3&transport=websocket'), on_message=(self.on_message),
                                        on_error=(self.on_error),
                                        on_close=(self.on_close),
                                        on_open=(self.on_open),
                                        header=[Origin,Extensions,Agent])
The request and response headers are below. The site is protected by Cloudflare.
--- request header ---
GET /socket.io/?EIO=3&transport=websocket HTTP/1.1
Upgrade: websocket
Host: ws2.qxbroker.com
Sec-WebSocket-Key: 7DgEjWxUp8N8PVY7N7vyDw==
Sec-WebSocket-Version: 13
Connection: Upgrade
Origin: https://qxbroker.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
-----------------------
--- response header ---
HTTP/1.1 403 Forbidden
Date: Sat, 11 Feb 2023 23:33:11 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: close
Permissions-Policy: accelerometer=(),autoplay=(),camera=(),clipboard-read=(),clipboard-write=(),fullscreen=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()
Referrer-Policy: same-origin
X-Frame-Options: SAMEORIGIN
Cache-Control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Set-Cookie: __cf_bm=7TD4hk4.bntJRdP6w9K.AjXF5MsV9LERTJV00jL2Uww-1676158391-0-AZFOKw90ZYdyy4RxX1xJ4jZQMt74+3UkQDZpDrdXE8BxGJULfe8j0T8EZnpUNXr2W3YHd/FxRoO/bPhKA2Dc0E0=; path=/; expires=Sun, 12-Feb-23 00:03:11 GMT; domain=.qxbroker.com; HttpOnly; Secure; SameSite=None
Server-Timing: cf-q-config;dur=6.9999950937927e-06
Server: cloudflare
CF-RAY: 7980e3583b6a0785-MRS
# selenium-request.py
from seleniumwire import webdriver # Import from seleniumwire
# Create a new instance of the Chrome driver
driver = webdriver.Chrome()
driver.get('https://www.cmegroup.com/content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12|07|2021.01|01|2008.json')
for request in driver.requests:
    if request.response:
        print(request.response.headers)
When I run that code I get the headers Selenium uses:
$ python selenium-request.py
Accept-Ranges: bytes
Access-Control-Allow-Origin: http://star-website.com
Content-Type: application/json
ETag: W/"36b8a-5d3d28ed9cc43"
Last-Modified: Thu, 23 Dec 2021 16:16:16 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: Apache
ServerID: e1
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
Content-Encoding: gzip
Cache-Control: max-age=86400
Date: Thu, 23 Dec 2021 16:16:16 GMT
Content-Length: 46236
Connection: keep-alive
Content-Security-Policy: frame-ancestors 'self' *.cmegroup.com *.quikstrike.net commodex.co.il openexchange.community.cmegroup.com staging.tickertocker.com http://www.straitsfinancial.com www.straitsfinancial.com http://straitsfinancial.com https://www.home.saxo https://app.topsteptrader.com https://help.topsteptrader.com https://staging.topsteptrader.com https://blueeditsitecore.sys.dom https://bluesitecore.sys.dom https://sitecoredev.orange.saxobank.com https://sitecoredev-nocache.orange.saxobank.com https://sitecoredevedit.orange.tst2.dom http://star-website.com https://www.investing.com https://*.benzinga.com https://bz.zingbot.bz https://www.zingbot.bz https://gdcdyn.interactivebrokers.com https://www.interactivebrokers.com https://zingbot.bz https://www.zingbot.bz https://m.zingbot.bz https://bz.zingbot.bz https://dev.futuresfirstacademy.com https://uat.futuresfirstacademy.com https://futuresfirstacademy.com http://stage.barchart.com http://www.barchart.com https://www.infinityfutures.com https://kilofutures.com https://m.cqg.com https://mdemo.cqg.com *.chicago.cme.com:7822 https://uatm.cqg.com https://local.zingbot.bz https://www.gulfbondsukuk.org www.kgieworld.sg https://www.propex24.wpcomstaging.com https://www.propex24.com *.straitsfinancial.gate39tech.com us.straitsfinancial.com https://*.kapcoclients.com https://kapcoclients.com https://*.wallstreetbound.org https://wallstreetbound.org https://cofcointl.plateau.com https://rise.articulate.com https://members.tradeday.com http://blf-django.herokuapp.com https://www.bluelinefutures.com https://www.bluelinefutures.live https://www.bluelinefutures.trade https://login.chicago.cme.com https://loginnr.chicago.cme.com https://logincert.chicago.cme.com https://login-ny.chicago.cme.com https://ampfutures.com https://cme.ampfutures.com https://*.advantagefutures.com https://*.e-futures.com https://*.etrade.com https://*.gffbrokers.com https://infinityfutures-cn.com https://sweetfutures.com https://*.tradovate.com 
https://home.saxo https://*.tickmill.co.uk https://*.directa.it https://big.pt https://*.tradestation-international.com https://*.stonex.com http://tradinglesson.com https://tradinglesson.com *.ibroker.it *.ibroker.es *.cornertrader.ch *.whselfinvest.com *.banxbroker.de *.ameritrade.com *.sweetfutures.com *.danielstrading.com *.gainfutures.com *.futuresonline.com *.tdainc.com *.lsvp.com *.schwab.com *.schwab.co.uk *.us.global.schwab.com *.dev.schwab.com;
Set-Cookie: ak_bmsc=AB0A9701302106EABE2E195C6AC2A074~000000000000000000000000000000~YAAQLtERAvOZVN19AQAA7C8U6A7AWr7StAmiphZPltguFftPSOXgfa2NAq7Vts+40k7AdnPG55ULK1vyBRhPRdqWbtYml3JTC3RjHLu31l8kWBFvysYyuY2uz4GpkvmOWoBSN/Dl/2bQ9bEgbiYj3tCZ1o+wEvMfsiAWiJeMY3M1ozu6nyQz0JVpdvfsqun3z5wGhpJWhkjrJjeIyHvVdzx2uyIb1azRFlHT+nRCR6NHGoaMM/G2sI1DqPOXPB5btXjdncvB739c2Beh7RgWD/zvb78qpAJDUR1KOenDy1EwN2Bg8pqH1sxlsoVrl7i7r/pAOaWKfd4U1FKP7p730GfOp/m2VRBIdYgHDPHPvGeITPKrR/G22aR886r9Lerhug==; Domain=.cmegroup.com; Path=/; Expires=Thu, 23 Dec 2021 18:16:01 GMT; Max-Age=7185; HttpOnly
I copy these exact headers into a python dict and request as follows:
# python-request.py
import requests
headers = {
"Accept-Ranges": "bytes",
"Access-Control-Allow-Origin": "http://star-website.com",
"Content-Type": "application/json",
"ETag": 'W/"36b8a-5d3d28ed9cc43"',
"Last-Modified": "Thu, 23 Dec 2021 16:16:16 GMT",
"Referrer-Policy": "no-referrer-when-downgrade",
"Server": "Apache",
"ServerID": "e1",
"Strict-Transport-Security": "max-age=31536000; includeSubDomains",
"Vary": "Accept-Encoding",
"Content-Encoding": "gzip",
"Cache-Control": "max-age=86400",
"Date": "Thu, 23 Dec 2021 16:16:16 GMT",
"Content-Length": "46236",
"Connection": "keep-alive",
"Content-Security-Policy": "frame-ancestors 'self' *.cmegroup.com *.quikstrike.net commodex.co.il openexchange.community.cmegroup.com staging.tickertocker.com http://www.straitsfinancial.com www.straitsfinancial.com http://straitsfinancial.com https://www.home.saxo https://app.topsteptrader.com https://help.topsteptrader.com https://staging.topsteptrader.com https://blueeditsitecore.sys.dom https://bluesitecore.sys.dom https://sitecoredev.orange.saxobank.com https://sitecoredev-nocache.orange.saxobank.com https://sitecoredevedit.orange.tst2.dom http://star-website.com https://www.investing.com https://*.benzinga.com https://bz.zingbot.bz https://www.zingbot.bz https://gdcdyn.interactivebrokers.com https://www.interactivebrokers.com https://zingbot.bz https://www.zingbot.bz https://m.zingbot.bz https://bz.zingbot.bz https://dev.futuresfirstacademy.com https://uat.futuresfirstacademy.com https://futuresfirstacademy.com http://stage.barchart.com http://www.barchart.com https://www.infinityfutures.com https://kilofutures.com https://m.cqg.com https://mdemo.cqg.com *.chicago.cme.com:7822 https://uatm.cqg.com https://local.zingbot.bz https://www.gulfbondsukuk.org www.kgieworld.sg https://www.propex24.wpcomstaging.com https://www.propex24.com *.straitsfinancial.gate39tech.com us.straitsfinancial.com https://*.kapcoclients.com https://kapcoclients.com https://*.wallstreetbound.org https://wallstreetbound.org https://cofcointl.plateau.com https://rise.articulate.com https://members.tradeday.com http://blf-django.herokuapp.com https://www.bluelinefutures.com https://www.bluelinefutures.live https://www.bluelinefutures.trade https://login.chicago.cme.com https://loginnr.chicago.cme.com https://logincert.chicago.cme.com https://login-ny.chicago.cme.com https://ampfutures.com https://cme.ampfutures.com https://*.advantagefutures.com https://*.e-futures.com https://*.etrade.com https://*.gffbrokers.com https://infinityfutures-cn.com https://sweetfutures.com 
https://*.tradovate.com https://home.saxo https://*.tickmill.co.uk https://*.directa.it https://big.pt https://*.tradestation-international.com https://*.stonex.com http://tradinglesson.com https://tradinglesson.com *.ibroker.it *.ibroker.es *.cornertrader.ch *.whselfinvest.com *.banxbroker.de *.ameritrade.com *.sweetfutures.com *.danielstrading.com *.gainfutures.com *.futuresonline.com *.tdainc.com *.lsvp.com *.schwab.com *.schwab.co.uk *.us.global.schwab.com *.dev.schwab.com;",
"Set-Cookie": "ak_bmsc=AB0A9701302106EABE2E195C6AC2A074~000000000000000000000000000000~YAAQLtERAvOZVN19AQAA7C8U6A7AWr7StAmiphZPltguFftPSOXgfa2NAq7Vts+40k7AdnPG55ULK1vyBRhPRdqWbtYml3JTC3RjHLu31l8kWBFvysYyuY2uz4GpkvmOWoBSN/Dl/2bQ9bEgbiYj3tCZ1o+wEvMfsiAWiJeMY3M1ozu6nyQz0JVpdvfsqun3z5wGhpJWhkjrJjeIyHvVdzx2uyIb1azRFlHT+nRCR6NHGoaMM/G2sI1DqPOXPB5btXjdncvB739c2Beh7RgWD/zvb78qpAJDUR1KOenDy1EwN2Bg8pqH1sxlsoVrl7i7r/pAOaWKfd4U1FKP7p730GfOp/m2VRBIdYgHDPHPvGeITPKrR/G22aR886r9Lerhug==; Domain=.cmegroup.com; Path=/; Expires=Thu, 23 Dec 2021 18:16:01 GMT; Max-Age=7185; HttpOnly"
}
requests.get(
"https://www.cmegroup.com/content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12|07|2021.01|01|2008.json",
headers=headers)
When I run this it just hangs indefinitely, so there is some issue with the request.
Apart from the headers, what is the difference between the requests made by python and Selenium - how could I identify the issue and hopefully get this working with the python requests library?
Update
I updated the code to get the request.headers instead:
Host: www.cmegroup.com
Connection: keep-alive
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Linux"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
... but the python requests script has the same result when using these headers, just hanging (or timing out if I set a timeout parameter).
Further update
Debug output is as follows:
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.cmegroup.com:443
send: b'GET /content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12%7C07%7C2021.01%7C01%7C2008.json HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nConnection: keep-alive\r\nHost: www.cmegroup.com\r\nsec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"\r\nsec-ch-ua-mobile: ?0\r\nsec-ch-ua-platform: Linux\r\nUpgrade-Insecure-Requests: 1\r\nSec-Fetch-Site: none\r\nSec-Fetch-Mode: navigate\r\nSec-Fetch-User: ?1\r\nSec-Fetch-Dest: document\r\nAccept-Language: en-US,en;q=0.9\r\n\r\n'
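Debug output of this shape (a `DEBUG:urllib3.connectionpool` line followed by a raw `send: b'...'` line) can be produced with the standard library alone. The following is a minimal sketch of one common way to switch it on; the function name `enable_http_debug` is hypothetical, and the asker may have used a different setup.

```python
import http.client
import logging


def enable_http_debug():
    """Print raw request lines and headers for every HTTP(S) connection."""
    # http.client prints "send: ..." and "reply: ..." lines when debuglevel > 0.
    http.client.HTTPConnection.debuglevel = 1
    # basicConfig's default format gives the "DEBUG:urllib3.connectionpool:..." lines.
    logging.basicConfig(level=logging.DEBUG)
    urllib3_logger = logging.getLogger("urllib3")
    urllib3_logger.setLevel(logging.DEBUG)
    urllib3_logger.propagate = True


enable_http_debug()
```

Calling this before `requests.get(...)` shows exactly which headers leave the process, which makes it easier to compare them with what the browser sends.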
It looks like it only needs a compatible User-Agent header.
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0',
}
url = 'https://www.cmegroup.com/content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12|07|2021.01|01|2008.json'
response = requests.get(url, headers = headers, timeout = 30) # A
print(response.status_code) # Prints 200 (OK).
print(response.json()) # Prints the output as JSON. "item" key has 50 values in a list.
^ This snippet did the trick for me.
It looks like you are using the response headers, not the request headers.
Try
print(request.headers)
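A minimal sketch of that idea: pick the headers the browser sent for one captured request and turn them into a plain dict that can be handed to `requests`. The helper name `captured_request_headers` is hypothetical, and the sketch assumes each captured request exposes `url`, `headers` and `response` attributes as used in the question. The stand-in objects only exist so the example runs without a browser.

```python
from types import SimpleNamespace


def captured_request_headers(captured_requests, url_fragment):
    """Return the request headers of the first answered request whose URL matches."""
    for request in captured_requests:
        if url_fragment in request.url and request.response:
            # request.headers holds what the browser sent;
            # request.response.headers holds what the server returned.
            return dict(request.headers.items())
    return None


# Stand-in objects shaped like captured requests (assumption, for illustration only).
fake_requests = [
    SimpleNamespace(url="https://example.com/other", headers={"Accept": "*/*"}, response=None),
    SimpleNamespace(
        url="https://example.com/data.json",
        headers={"User-Agent": "Mozilla/5.0"},
        response=SimpleNamespace(status_code=200),
    ),
]
sent = captured_request_headers(fake_requests, "data.json")

# With a real driver this would look like:
# sent = captured_request_headers(driver.requests, "advisorySearch")
# requests.get(url, headers=sent, timeout=30)
```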
I want to build a simple Python program that gets tracking updates from UPS. I couldn't get an account number with them, so I can't use their API, and I decided to try web scraping instead.
Here's an example of a tracking number:
https://www.ups.com/track?loc=en_US&tracknum=1Z0X118AYW08592000&requester=WT/trackdetails
I want to get the scheduled delivery date. The problem is that what the requests module returns (and what shows when I view the page source) does not include the information inside a tag called app-root, which is where the delivery date is.
I found a similar post that solves this problem for FedEx, but I can't get it to work with the UPS website: Parsing HTML does not output desired data(tracking info for FedEx)
I installed an extension called HTTP Trace that shows all the requests being made, but I can't find the one that carries the UPS tracking data. This is what the extension captured when I searched for the tracking number. Any ideas what I can do here?
https://wwwapps.ups.com/WebTracking/track?loc=en_IL
HTMLVersion: 5.0
loc: en_IL
track.x: Track
trackNums: 1Z0X118AYW08592000
ups-search: 1Z0X118AYW08592000
POST https://wwwapps.ups.com/WebTracking/track?loc=en_IL
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
HTTP/1.1 302 Moved Temporarily
Redirect to: https://www.ups.com/track?loc=en_IL&tracknum=1Z0X118AYW08592000&requester=WT
Server: Apache
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000; includeSubDomains
Cache-Control: no-store, no-cache
Pragma: no-cache
Location: https://www.ups.com/track?loc=en_IL&tracknum=1Z0X118AYW08592000&requester=WT
Content-Length: 365
Content-Type: text/html
Date: Thu, 14 Jan 2021 00:32:34 GMT
Connection: keep-alive
Server-Timing: cdn-cache; desc=MISS
Server-Timing: edge; dur=164
Server-Timing: origin; dur=23
Debug-AK-TLS: No bypass
GET https://www.ups.com/track?loc=en_IL&tracknum=1Z0X118AYW08592000&requester=WT
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
HTTP/1.1 200 OK
Server: Apache
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000; includeSubDomains
Cache-Control: no-store, no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Debug-AK-TLS: No bypass
X-Akamai-Transformed: 9 9152 0 pmb=mTOE,1mRUM,1
Date: Thu, 14 Jan 2021 00:32:34 GMT
Content-Length: 10947
Connection: keep-alive
Vary: Accept-Encoding
Server-Timing: cdn-cache; desc=MISS
Server-Timing: edge; dur=182
Server-Timing: origin; dur=201
https://www.facebook.com/tr/?id=969628123173894&ev=PageView&dl=https%3A%2F%2Fwww.ups.com%2Ftrack%3Floc%3Den_IL%26tracknum%3D1Z0X118AYW08592000%26requester%3DWT%2Ftrackdetails&rl=https%3A%2F%2Fwww.ups.com%2F&if=false&ts=1610584355509&sw=1920&sh=1080&v=2.9.32&r=stable&a=tmtealium&ec=0&o=30&fbp=fb.1.1598067407332.38393503&it=1610584355413&coo=false&dpo=LDU&dpoco=0&dpost=0&rqm=GET
GET https://www.facebook.com/tr/?id=969628123173894&ev=PageView&dl=https%3A%2F%2Fwww.ups.com%2Ftrack%3Floc%3Den_IL%26tracknum%3D1Z0X118AYW08592000%26requester%3DWT%2Ftrackdetails&rl=https%3A%2F%2Fwww.ups.com%2F&if=false&ts=1610584355509&sw=1920&sh=1080&v=2.9.32&r=stable&a=tmtealium&ec=0&o=30&fbp=fb.1.1598067407332.38393503&it=1610584355413&coo=false&dpo=LDU&dpoco=0&dpost=0&rqm=GET
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36
Accept: image/avif,image/webp,image/apng,image/*,*/*;q=0.8
HTTP/1.1 302
Redirect to: https://cx.atdmt.com/?c=1479770850078954307&f=AYzL_IHfyiIJ9HIa7oqq8XcmRPtLo6M0aKkForULuTS_d5qgkpmUtO1x4Rmi3jkdZ4EPRHG7qxKZDTiWb-BA5MYf&id=969628123173894&l=3&v=0
cache-control: no-cache, no-store, must-revalidate
pragma: no-cache
expires: 0
date: Thu, 14 Jan 2021 00:32:36 GMT
location: https://cx.atdmt.com/?c=1479770850078954307&f=AYzL_IHfyiIJ9HIa7oqq8XcmRPtLo6M0aKkForULuTS_d5qgkpmUtO1x4Rmi3jkdZ4EPRHG7qxKZDTiWb-BA5MYf&id=969628123173894&l=3&v=0
content-type: text/plain
content-length: 0
server: proxygen-bolt
alt-svc: h3-29=":443"; ma=3600,h3-27=":443"; ma=3600
GET https://cx.atdmt.com/?c=1479770850078954307&f=AYzL_IHfyiIJ9HIa7oqq8XcmRPtLo6M0aKkForULuTS_d5qgkpmUtO1x4Rmi3jkdZ4EPRHG7qxKZDTiWb-BA5MYf&id=969628123173894&l=3&v=0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36
Accept: image/avif,image/webp,image/apng,image/*,*/*;q=0.8
HTTP/1.1 200
x-fb-rlafr: 0
content-type: image/gif
date: Wed, 13 Jan 2021 16:32:37 PST
x-content-type-options: nosniff
report-to: {"group":"coep_report","max_age":86400,"endpoints":[{"url":"https:\/\/www.facebook.com\/browser_reporting\/"}]}
cache-control: public, max-age=0
content-encoding: br
x-frame-options: DENY
cross-origin-resource-policy: cross-origin
expires: Wed, 13 Jan 2021 16:32:37 PST
vary: Accept-Encoding
cross-origin-embedder-policy-report-only: require-corp;report-to="coep_report"
pragma: public
x-fb-debug: ySiesinmQSMWtIWGg5+rMp+g66R70GGiqJJC3M0DowZMGuFf14OidRiX02DfG99gXxjUSjCaEtHosxh/9tl/hQ==
I honestly do not believe this is possible. I checked how UPS loads its pages: it loads the frontend first and then calls an API to fetch the dates. For example, the delivered-on date comes from this API endpoint ("https://www.ups.com/track/api/Track/GetStatus?loc=en_US"), which needs a number of headers and relies on Akamai security cookies (which may prevent you from scraping it).
If you really do not want to use an API, I would suggest something like Selenium, provided you do not need it to be quick and do not have many links to work with.
Your only choices are Selenium or the API, and the API comes with a hitch. From what I can tell on the UPS website, their API only allows queries at night. They only want "emergencies" hitting their API, which is preposterous, since I would imagine web requests hit the same API.
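The Selenium route suggested above could look roughly like the sketch below. It is not a tested scraper: the helper names are hypothetical, it assumes Chrome with a matching driver is installed, and it simply waits for the `app-root` element mentioned in the question to contain text. UPS may still block automated browsers.

```python
from urllib.parse import urlencode


def build_tracking_url(tracking_number, locale="en_US"):
    """Build the public tracking page URL in the form shown in the question."""
    query = urlencode(
        {"loc": locale, "tracknum": tracking_number, "requester": "WT/trackdetails"},
        safe="/",  # keep the slash in "WT/trackdetails" unescaped
    )
    return "https://www.ups.com/track?" + query


def fetch_rendered_text(url, wait_seconds=20):
    """Load the page in a real browser and return the text rendered inside app-root."""
    # Imported lazily so that building URLs does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until the JavaScript frontend has filled app-root with some text.
        WebDriverWait(driver, wait_seconds).until(
            lambda d: d.find_element(By.TAG_NAME, "app-root").text.strip()
        )
        return driver.find_element(By.TAG_NAME, "app-root").text
    finally:
        driver.quit()


tracking_url = build_tracking_url("1Z0X118AYW08592000")
# text = fetch_rendered_text(tracking_url)  # opens a browser window
```

The delivery date would then have to be located within the returned text, which depends on the page layout at the time.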
I'm trying to download a file from a website using Python's requests module.
However, the site only allows the download if the link is clicked directly from the download page.
So, using requests, I tried hitting the download page's URL first with requests.get() and then downloading the file. Unfortunately this doesn't work: a message asking me to open the download page first gets written into file.torrent instead.
import requests
def download(username, password):
    with requests.Session() as session:
        session.post('https://website.net/forum/login.php', data={'login_username': username, 'login_password': password})
        # Download page URL
        requests.get('https://website.net/forum/viewtopic.php?t=2508126')
        # The download URL itself
        response = requests.get('https://website.net/forum/dl.php?t=2508126')
        with open('file.torrent', 'wb') as f:
            f.write(response.content)
download(username='XXXXX', password='YYYYY')
Response when downloading directly from the download page (works) :
General :
Request URL: https://website.net/forum/dl.php?t=2508126
Request Method: GET
Status Code: 200 OK
Remote Address: 185.37.128.136:443
Referrer Policy: no-referrer-when-downgrade
Response Headers :
Cache-Control: no-store, no-cache, must-revalidate
Cache-Control: post-check=0, pre-check=0
Content-Disposition: attachment; filename="[website.net].t2508126.torrent"
Content-Length: 33641
Content-Type: application/x-bittorrent; name="[website.net].t2508126.torrent"
Date: Thu, 14 Feb 2019 07:57:08 GMT
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Last-Modified: Thu, 14 Feb 2019 07:57:09 GMT
Pragma: no-cache
Server: nginx
Set-Cookie: bb_dl=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/forum/; domain=.website.net
Request Headers :
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Cookie: bb_t=a%3A3%3A%7Bi%3A2507902%3Bi%3A1550052944%3Bi%3A2508011%3Bi%3A1550120230%3Bi%3A2508126%3Bi%3A1550125516%3B%7D; bb_data=1-27969311-wXVPJGcedLE1I2mM9H0u-3106784170-1550128652-1550131012-3061288864-1; bb_dl=2508126
Host: website.net
Referer: https://website.net/forum/viewtopic.php?t=2508126
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3701.0 Safari/537.36
Query String Parameters :
t: 2508126
Response when opening the download link on its own (doesn't work) :
General :
Request URL: https://website.net/forum/dl.php?t=2508126
Request Method: GET
Status Code: 200 OK
Remote Address: 185.37.128.136:443
Referrer Policy: no-referrer-when-downgrade
Response Headers :
Cache-Control: no-store, no-cache, must-revalidate
Cache-Control: post-check=0, pre-check=0
Content-Type: text/html; charset=windows-1251
Date: Thu, 14 Feb 2019 08:03:29 GMT
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Last-Modified: Thu, 14 Feb 2019 08:03:29 GMT
Pragma: no-cache
Server: nginx
Transfer-Encoding: chunked
Request Headers :
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Cookie: bb_t=a%3A3%3A%7Bi%3A2507902%3Bi%3A1550052944%3Bi%3A2508011%3Bi%3A1550120230%3Bi%3A2508126%3Bi%3A1550125516%3B%7D; bb_data=1-27969311-wXVPJGcedLE1I2mM9H0u-3106784170-1550128652-1550131390-3061288864-1
Host: website.net
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3701.0 Safari/537.36
Query String Parameters :
t: 2508126
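This works for me: add an empty login field to the form data
data={'login_username': username, 'login_password': password, 'login': ''}
and use session.get() instead of requests.get(), so that the cookies set at login are sent with the later requests.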
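Put together, the two changes might look like the sketch below. The session is passed in as a parameter, and `topic_id` and `out_path` are added here for illustration. The URLs and form fields are the ones from the question.

```python
def download(session, username, password, topic_id, out_path):
    """Log in, visit the download page, then fetch the file, all on one session."""
    base = "https://website.net/forum"
    session.post(
        base + "/login.php",
        data={"login_username": username, "login_password": password, "login": ""},
    )
    # Visit the download page first so the session keeps any cookies it sets.
    session.get(base + "/viewtopic.php", params={"t": topic_id})
    # Same session again: cookies from the login and the page visit are sent along.
    response = session.get(base + "/dl.php", params={"t": topic_id})
    with open(out_path, "wb") as f:
        f.write(response.content)
    return response


# Usage with the real library (not executed here):
# import requests
# with requests.Session() as session:
#     download(session, "XXXXX", "YYYYY", 2508126, "file.torrent")
```

Each bare `requests.get()` call starts with an empty cookie jar, which is why the original version reached the site as an anonymous visitor.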
This is similar to the question asked here: Http Redirection code 3XX in python requests. I also do not receive a redirect when I post a form with Python's requests.
To get around the same-origin policy, my goal is to proxy an internal site through my Flask application with the following code:
method_requests_mapping = {
    'GET': requests.get,
    'HEAD': requests.head,
    'POST': requests.post,
    'PUT': requests.put,
    'DELETE': requests.delete,
    'PATCH': requests.patch,
    'OPTIONS': requests.options,
}
@bp.route('/<path:url>', methods=method_requests_mapping.keys())
def proxy(url):
    url='https://intern.something.com/'+url
    username=session['username']
    password=session['password']
    requests_function = method_requests_mapping[flask.request.method]
    request = requests_function(url, stream=True, params=flask.request.args,auth=(username, password),allow_redirects=False)
    response = flask.Response(flask.stream_with_context(request.iter_content()),
                              content_type=request.headers['content-type'],
                              status=request.status_code, )
    response.headers['Access-Control-Allow-Origin'] = '*'
    print(request.history)
    print(request.cookies)
    print(request.status_code)
    return response
When I use the site without my Flask proxy, the network analysis shows this:
Request:
Host: intern.something.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate, br
Referer: https://intern.something.com/contract_config_edit.php4?Contract_ID=1463234
Content-Type: application/x-www-form-urlencoded
Content-Length: 4024
Authorization: Basic YWhvZWhuZTpLYXR6ZTc0MzYh
Connection: keep-alive
Cookie: PHPSESSID=kr9am6tpid67ikct3up67f03h0
Upgrade-Insecure-Requests: 1
Answer:
HTTP/1.1 302 Found
Date: Wed, 02 Jan 2019 07:50:31 GMT
Server: Apache/2.2.3 (Red Hat)
X-Powered-By: PHP/5.1.6
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: https://intern.something.com/contract_show.php4?Contract_ID=1463234
Content-Length: 0
Connection: close
Content-Type: text/html
But if I do it with the proxy it seems not to work correctly:
Request:
Host: 10.146.177.18:7000
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://10.146.177.18:7000/backoffice/contract/contract_config_edit.php4?Contract_ID=1463234
Content-Type: application/x-www-form-urlencoded
Content-Length: 4024
Authorization: Basic RWluaG9ybjpGZXVlcnphbmdlbmJvaGxlNTU0ISE/
Connection: keep-alive
Cookie: _pk_id.7.1c19=5f552d1eb2170bab.1546180080.2.1546185355.1546184002.; session=.eJwtj1FKxTAQRddivt9Hkk5mJm8LLqJMJjdUxFbaPgTFvVvRz3PhwD1fYR47jiXcz_2BW5hfergHjTrIMlHxOrgSWh- NxNU0e67iEch5SpqaQaRxSz4oo1dzcRLNXcQ5Ugd4yMhVS8m9oVMt3pJpacw2UUEtrUfXaNQ7C DJaEw234Mc-5nN7xXr9YWdTBpJAY-KRMBVCKYYqrPEyJFav-fLe7Tg- tv234tnOTwhN_HTtjwP7X1z6p9XecKEtG5YV4fsHxkJOZg.Dw34rg.p2bNxLLF26aIXxth9VN7 BHA5x4U
Upgrade-Insecure-Requests: 1
Answer:
HTTP/1.0 200 OK
Content-Type: text/html
Access-Control-Allow-Origin: *
Vary: Cookie
Connection: close
Server: Werkzeug/0.14.1 Python/3.5.2
Date: Wed, 02 Jan 2019 08:15:38 GMT
Maybe it is a problem with the cookies, although according to the console the correct cookie is sent:
10.146.177.49 - - [02/Jan/2019 09:15:38] "POST /backoffice/contract/contract_config_edit.php4?Contract_ID=1463234 HTTP/1.1" 200 -
<RequestsCookieJar[<Cookie PHPSESSID=saqjj7n6m61aee19k3pe6moaf4 for intern.something.com/>]>
Does anyone know what the problem is here?
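One thing visible in the proxy code is that only the query string and the credentials are forwarded: the POST body and its Content-Type are never passed to the upstream call, and only content-type is copied back from the upstream response. Whether that is what prevents the 302 here is an assumption. The sketch below shows one way to forward those pieces. The helper names are hypothetical, and `incoming` is assumed to behave like `flask.request` (`args`, `headers`, `get_data()`).

```python
from types import SimpleNamespace


def upstream_kwargs(incoming, username, password):
    """Build keyword arguments for the upstream requests call, including the body."""
    headers = {}
    content_type = incoming.headers.get("Content-Type")
    if content_type:
        headers["Content-Type"] = content_type
    return {
        "params": incoming.args,
        "data": incoming.get_data(),  # raw form body of the incoming request
        "headers": headers,
        "auth": (username, password),
        "allow_redirects": False,
        "stream": True,
    }


def passthrough_headers(upstream_headers):
    """Pick the upstream response headers to copy back, including Location for redirects."""
    wanted = ("Content-Type", "Location")
    return {name: upstream_headers[name] for name in wanted if name in upstream_headers}


# Stand-in for flask.request so the sketch runs outside a Flask app.
fake_incoming = SimpleNamespace(
    args={"Contract_ID": "1463234"},
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    get_data=lambda: b"field=value",
)
kwargs = upstream_kwargs(fake_incoming, "user", "secret")
copied = passthrough_headers({"Content-Type": "text/html", "Location": "/contract_show.php4"})
```

In the proxy this would be used as `requests_function(url, **upstream_kwargs(flask.request, username, password))`. A forwarded Location header would still point at the internal host, so it would likely need rewriting before it reaches the browser.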