Strange error response - requests - python

Updated code - I'm using this code to send the request:
headers = {
    "Host": "www.roblox.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US;q=0.7,en;q=0.3",
    "Referer": "https://www.roblox.com/users/12345/profile",
    "Content-Type": "application/json;charset=utf-8",
    "X-CSRF-TOKEN": "some-xsrf-token",
    "Content-Length": "27",
    "DNT": "1",
    "Connection": "close"
}
data = {"targetUserId": "56789"}
url = "http://www.roblox.com/user/follow"
r = requests.post(url, headers=headers, data=data, cookies={"name": "value"})
Response (using r.text):
{"isValid":false,"data":null,"error":""}
The request itself is valid; I sent it using Burp and it worked:
POST /user/follow HTTP/1.1
Host: www.roblox.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Accept: application/json, text/plain, */*
Accept-Language: pl,en-US;q=0.7,en;q=0.3
Referer: https://www.roblox.com/users/12345/profile
Content-Type: application/json;charset=utf-8
X-CSRF-TOKEN: Ab1/2cde3fGH
Content-Length: 27
Cookie: some-cookie=;
DNT: 1
Connection: close
{"targetUser":"56789"}

Because it works in Burp but not in Python requests, get a packet sniffer (Wireshark is the simplest, in my opinion) and compare the packet sent by Burp, which works, against the one sent from Python, which does not. I suspect the problem is that the website is served over HTTPS but you are using http://www.roblox.com. Try https://www.roblox.com, though I am not sure that alone will fix it.
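Two further differences are visible between the working Burp capture and the Python code: Burp posts a raw JSON body whose key is targetUser, while the Python code passes a dict via data= (which form-encodes it) under the key targetUserId. Preparing the request locally shows what requests will actually put on the wire, without sending anything:

```python
import requests

payload = {"targetUser": "56789"}
url = "https://www.roblox.com/user/follow"  # https, matching the Burp capture

# data= form-encodes a dict; json= serializes it to JSON and sets Content-Type
form = requests.Request("POST", url, data=payload).prepare()
js = requests.Request("POST", url, json=payload).prepare()

print(form.body)                    # targetUser=56789
print(js.headers["Content-Type"])   # application/json
```

Comparing the two prepared bodies against the Burp raw view makes mismatches like this obvious before any packet sniffing.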


Imitate Request with python

I'm trying to imitate this request:
POST /default/latex2image HTTP/2
Host: e1kf0882p7.execute-api.us-east-1.amazonaws.com
Content-Length: 96
Sec-Ch-Ua: " Not A;Brand";v="99", "Chromium";v="104"
Accept: application/json, text/javascript, */*; q=0.01
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Sec-Ch-Ua-Mobile: ?0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36
Sec-Ch-Ua-Platform: "Windows"
Origin: https://latex2image.joeraut.com
Sec-Fetch-Site: cross-site
Sec-Fetch-Mode: cors
Sec-Fetch-Dest: empty
Referer: https://latex2image.joeraut.com/
Accept-Encoding: gzip, deflate
Accept-Language: es-ES,es;q=0.9
{"latexInput":"\\begin{align*}\n{1}\n\\end{align*}\n",
"outputFormat":"PNG",
"outputScale":"125%"}
When it's sent from the original browser, there is no problem.
However, when I try to do it in Python, the server rejects the request, and I don't know why.
This is what I tried:
pload = {
    "latexInput": "{0}",
    "outputFormat": "PNG",
    "outputScale": "125%"
}
header = {
    "Content-Length": "96",
    "Sec-Ch-Ua": "Not A;Brand;v=99, Chromium;v=104",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Sec-Ch-Ua-Mobile": "?0",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36",
    "Sec-Ch-Ua-Platform": "Windows",
    "Origin": "https://latex2image.joeraut.com",
    "Sec-Fetch-Site": "cross-site",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
    "Referer": "https://latex2image.joeraut.com/",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "es-ES,es;q=0.9",
}
r = requests.post("http://e1kf0882p7.execute-api.us-east-1.amazonaws.com", data=pload, headers=header)
print(r.text)
print(r.status_code)
And the error it raised:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='e1kf0882p7.execute-api.us-east-1.amazonaws.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000259A4D20670>: Failed to establish a new connection: [WinError 10061] No se puede establecer una conexión ya que el equipo de destino denegó expresamente dicha conexión'))
(The Spanish WinError 10061 message translates to "No connection could be made because the target machine actively refused it.")
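The connection refusal comes from using http:// (port 80) where the capture is HTTPS; the script also drops the /default/latex2image path, and the browser sends a JSON string even though the Content-Type is form-urlencoded. A sketch of a request matching the capture, prepared but not sent so the wire format can be inspected first:

```python
import json
import requests

url = "https://e1kf0882p7.execute-api.us-east-1.amazonaws.com/default/latex2image"
payload = {
    "latexInput": "\\begin{align*}\n{1}\n\\end{align*}\n",
    "outputFormat": "PNG",
    "outputScale": "125%",
}
req = requests.Request(
    "POST",
    url,
    # serialize by hand: the browser posts JSON text under this Content-Type,
    # whereas data= with a dict would form-encode it instead
    data=json.dumps(payload),
    headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
).prepare()
print(req.url)   # https scheme and full path
print(req.body)  # JSON text, matching the browser payload
# requests.Session().send(req) would actually send it
```

This is a sketch built only from the captured request above; whether the endpoint accepts it also depends on server-side checks not visible in the capture.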

Scraping website with requests and BS4 soup content coming back with question marks in html

I am scraping a website with the following url and headers:
url : 'https://tennistonic.com/tennis-news/'
headers :
{
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Cache-Control": "no-cache",
    "content-length": "0",
    "content-type": "text/plain",
    "cookie": "IDE=AHWqTUl3YRZ8Od9MzGofphNI-OCOFESmxlN69Ekm4Sbh9tcBDXGJQ1LVwbDd2uX_; DSID=AAO-7r74ByYt6ieW2yasN78hFsOGY6mrhpN5pEOWQ1vGRnAOdolIlKv23JqCRf11OpFUGFdZ-yxB3Ii1VE6UjcK-jny-4mcJ5uO-_BaV3bEFbLvU7rJNBlc",
    "origin": "https://tennistonic.com",
    "Connection": "keep-alive",
    "Pragma": "no-cache",
    "Referer": "https://tennistonic.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36",
    "x-client-data": "CI22yQEIprbJAQjBtskBCKmdygEIl6zKAQisx8oBCPXHygEI58jKAQjpyMoBCOLNygEI3NXKAQjB18oBCP2XywEIj5nLARiKwcoB"}
The x-client-data header has a decoded section, which I have left out here but have also tried including. The full request from dev tools is shown below:
:authority: stats.g.doubleclick.net
:method: POST
:path: /j/collect?t=dc&aip=1&_r=3&v=1&_v=j87&tid=UA-13059318-2&cid=1499412700.1601628730&jid=598376897&gjid=243704922&_gid=1691643639.1604317227&_u=QACAAEAAAAAAAC~&z=1736278164
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-GB,en-US;q=0.9,en;q=0.8
cache-control: no-cache
content-length: 0
content-type: text/plain
cookie: IDE=AHWqTUl3YRZ8Od9MzGofphNI-OCOFESmxlN69Ekm4Sbh9tcBDXGJQ1LVwbDd2uX_; DSID=AAO-7r74ByYt6ieW2yasN78hFsOGY6mrhpN5pEOWQ1vGRnAOdolIlKv23JqCRf11OpFUGFdZ-yxB3Ii1VE6UjcK-jny-4mcJ5uO-_BaV3bEFbLvU7rJNBlc
origin: https://tennistonic.com
pragma: no-cache
referer: https://tennistonic.com/
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36
x-client-data: CI22yQEIprbJAQjBtskBCKmdygEIl6zKAQisx8oBCPXHygEI58jKAQjpyMoBCOLNygEI3NXKAQjB18oBCP2XywEIj5nLARiKwcoB
Decoded:
message ClientVariations {
// Active client experiment variation IDs.
repeated int32 variation_id = [3300109, 3300134, 3300161, 3313321, 3315223, 3318700, 3318773, 3318887, 3318889, 3319522, 3320540, 3320769, 3329021, 3329167];
// Active client experiment variation IDs that trigger server-side behavior.
repeated int32 trigger_variation_id = [3317898];
}
r = requests.get(url2, headers=headers2)
soup_cont = soup(r.content, 'html.parser')
My soup contents from the response came back full of question marks (output omitted).
Is this website protected, or am I sending the wrong requests?
Try using selenium:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
driver = webdriver.Chrome()
driver.get('https://tennistonic.com/tennis-news/')
time.sleep(3)
soup = BeautifulSoup(driver.page_source,'html5lib')
print(soup.prettify())
driver.close()
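Before switching to Selenium, it is worth checking the copied Accept-Encoding header: it advertises Brotli ("br"), which requests can only decode when Brotli support is installed; otherwise compressed bytes reach BeautifulSoup undecoded and render as question marks. A minimal check, assuming that is the cause, is to drop the hard-coded header and let requests advertise only what it can decode:

```python
import requests

# requests advertises only the encodings it can actually decompress; copying
# the browser's "gzip, deflate, br" can force a Brotli response the client
# then cannot decode, which shows up as mojibake/question marks.
defaults = requests.utils.default_headers()
print(defaults["Accept-Encoding"])  # "br" appears only when Brotli support is present

headers = {"User-Agent": "Mozilla/5.0 ..."}  # note: no Accept-Encoding override
# r = requests.get("https://tennistonic.com/tennis-news/", headers=headers)
```

If the response still comes back garbled with negotiated encodings, the site may indeed require a real browser, and the Selenium answer above applies.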

Return 302 in Web Crawler

After simulating the login, when I POST to the original website from my script, it returns 302. When I open the original website in Chrome, it returns 415.
I tried several approaches:
session.post(url,headers = headers,data = data)
requests.post(url,headers = headers,data = data)
urllib.request.urlopen(url).read().decode()
import requests
import json
header = {
    'Host': 'sty.js118114.com:8080',
    'Connection': 'keep-alive',
    'Content-Length': '8188',
    'Accept': '*/*',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
    'Content-Type': 'text/plain;charset=UTF-8;application/xml',
    'Origin': 'http://sty.js118114.com:8080',
    'Referer': 'http://sty.js118114.com:8080/Report/report/movecar_list.html',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cookie': cookie_name + '=' + cookie_value
}
data = {"calling_no":"","begin_time":"","end_time":"","called_car_no":""}
res = requests.post(target,data = json.dumps(data),headers = header)
print(res.content.decode())
I expect the content to be JSON or HTML so that I can use the re module or XPath to extract the information I want (without any redirects).
Finally, here is the relevant information about the problem:
Chrome Network
General
Request URL: http://sty.js118114.com:8080/Report/movecar/list/1/10
Request Method: POST
Status Code: 200 OK
Remote Address: 127.0.0.1:8888
Referrer Policy: no-referrer-when-downgrade
Response Headers
Content-Length: 8150
Content-Type: application/json;charset=UTF-8
Date: Thu, 22 Aug 2019 00:47:51 GMT
Server: Apache-Coyote/1.1
Request Headers
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Content-Length: 66
Content-Type: text/plain;charset=UTF-8;
Cookie: JSESSIONID=0A474B00017BFFD89A515B336F482905
Host: sty.js118114.com:8080
Origin: http://sty.js118114.com:8080
Proxy-Connection: keep-alive
Referer: http://sty.js118114.com:8080/Report/report/movecar_list.html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
X-Requested-With: XMLHttpRequest
Request Payload
{calling_no: "", begin_time: "", end_time: "", called_car_no: ""}
Fiddler Inspectors Raw
POST http://sty.js118114.com:8080/Report/movecar/list/1/10 HTTP/1.1
Host: sty.js118114.com:8080
Connection: keep-alive
Content-Length: 66
Accept: */*
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
Content-Type: text/plain;charset=UTF-8;
Origin: http://sty.js118114.com:8080
Referer: http://sty.js118114.com:8080/Report/report/movecar_list.html
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Cookie: JSESSIONID=0A474B00017BFFD89A515B336F482905
{"calling_no":"","begin_time":"","end_time":"","called_car_no":""}
Response Raw
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json;charset=UTF-8
Date: Thu, 22 Aug 2019 00:27:59 GMT
Content-Length: 8150
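A 302 from the script while Fiddler shows 200 for the same request usually means the JSESSIONID the script sends does not belong to a valid logged-in session. A sketch, assuming the login step is done first, that lets one requests.Session carry the cookie into the report POST and copies the captured Content-Type exactly (the request below is only prepared, not sent):

```python
import json
import requests

session = requests.Session()
# ... perform the login POST here so a valid JSESSIONID lands in session.cookies ...

data = {"calling_no": "", "begin_time": "", "end_time": "", "called_car_no": ""}
req = session.prepare_request(
    requests.Request(
        "POST",
        "http://sty.js118114.com:8080/Report/movecar/list/1/10",
        data=json.dumps(data),  # JSON text, as in the Fiddler capture
        headers={
            "Content-Type": "text/plain;charset=UTF-8;",  # exactly as captured
            "X-Requested-With": "XMLHttpRequest",
        },
    )
)
print(req.headers["Content-Type"])
print(req.body)
# res = session.send(req, allow_redirects=False)  # a 302 here exposes the Location header
```

Sending with allow_redirects=False makes the 302 visible instead of silently following it, so you can see where the server is redirecting (typically back to the login page when the session is invalid).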

Can't login to a specific ASP.NET website using python requests

So I've been trying for the last six hours to make this work, and endless searching didn't help. I guess I'm either doing something fundamentally wrong, or it's a trivial bug that happens to match my logic, so I need extra eyes to help me fix it.
The website url is this.
I wrote a piece of messy Python code to just log in and read the next page, but all I get is a nasty 500 error saying something on the server went wrong processing my request.
Here is the request made by a browser, which works just fine.
The HTTP response code to this request is 302 (redirect).
POST /appstatus/index.aspx HTTP/1.1
Host: www.wes.org
Connection: close
Content-Length: 303
Cache-Control: max-age=0
Origin: https://www.wes.org
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: https://www.wes.org/appstatus/index.aspx
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8,fa;q=0.6
Cookie: ASP.NET_SessionId=bu2gemmlh3hvp4f5lqqngrbp; _ga=GA1.2.1842963052.1473348318; _gat=1
__VIEWSTATE=%2FwEPDwUKLTg3MTMwMDc1NA9kFgICAQ9kFgICAQ8PFgIeBFRleHRkZGRk9rP20Uj9SdsjOKNUBlbw55Q01zI%3D&__VIEWSTATEGENERATOR=189D346C&__EVENTVALIDATION=%2FwEWBQK6lf6LBAKf%2B9bUAgK9%2B7qcDgK8w4S2BALowqJjoU1f0Cg%2FEAGU6r2IjpIPG8BO%2BiE%3D&txtUID=Email%40Removed.com&txtPWD=PASSWORDREMOVED&Submit=Log+In&Hidden1=
And this is the request made by my script.
POST /appstatus/index.aspx HTTP/1.1
Host: www.wes.org
Connection: close
Accept-Encoding: gzip, deflate, br
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
Origin: https://www.wes.org
Accept-Language: en-US,en;q=0.8,fa;q=0.6
Cache-Control: max-age=0
Referer: https://www.wes.org/appstatus/indexca.aspx
Cookie: ASP.NET_SessionId=nxotmb55jjwf5x4511rwiy45
Content-Length: 303
txtPWD=PASSWORDREMOVED&Submit=Log+In&__EVENTVALIDATION=%2FwEWBQK6lf6LBAKf%2B9bUAgK9%2B7qcDgK8w4S2BALowqJjoU1f0Cg%2FEAGU6r2IjpIPG8BO%2BiE%3D&txtUID=Email%40Removed.com&__VIEWSTATE=%2FwEPDwUKLTg3MTMwMDc1NA9kFgICAQ9kFgICAQ8PFgIeBFRleHRkZGRk9rP20Uj9SdsjOKNUBlbw55Q01zI%3D&Hidden1=&__VIEWSTATEGENERATOR=189D346C
And this is the script making the request; I'm sorry it's so messy, I just needed something quick.
import requests
import bs4
import urllib.parse

def main():
    session = requests.Session()
    headers = {"Origin": "https://www.wes.org",
               "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
               "Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1", "Connection": "close",
               "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36",
               "Referer": "https://www.wes.org/appstatus/indexca.aspx", "Accept-Encoding": "gzip, deflate, br",
               "Accept-Language": "en-US,en;q=0.8,fa;q=0.6", "Content-Type": "application/x-www-form-urlencoded"}
    r = session.get('https://www.wes.org/appstatus/index.aspx', headers=headers)
    cookies = r.cookies
    soup = bs4.BeautifulSoup(r.content, "html5lib")
    viewState = urllib.parse.quote(str(soup.select('#__VIEWSTATE')[0]).split('value="')[1].split('"/>')[0])
    viewStateGenerator = urllib.parse.quote(str(soup.select('#__VIEWSTATEGENERATOR')[0]).split('value="')[1].split('"/>')[0])
    eventValidation = urllib.parse.quote(str(soup.select('#__EVENTVALIDATION')[0]).split('value="')[1].split('"/>')[0])
    paramsPost = {}
    paramsPost.update({'__VIEWSTATE': viewState})
    paramsPost.update({'__VIEWSTATEGENERATOR': viewStateGenerator})
    paramsPost.update({'__EVENTVALIDATION': eventValidation})
    paramsPost.update({"txtUID": "My#Email.Removed"})
    paramsPost.update({"txtPWD": "My_So_Called_Password"})
    paramsPost.update({"Submit": "Log In"})
    paramsPost.update({"Hidden1": ""})
    response = session.post("https://www.wes.org/appstatus/index.aspx", data=paramsPost, headers=headers,
                            cookies=cookies)
    print("Status code:", response.status_code)  # Outputs 500.
    # print("Response body:", response.content)

if __name__ == '__main__':
    main()
Any help would be so much appreciated.
You are doing way too much work, and in doing so you are not passing valid data. Extract the value attribute directly, i.e. .select_one('#__VIEWSTATEGENERATOR')["value"], and do the same for the rest. The cookies will be set on the Session object after your initial GET, so the logic boils down to:
import requests
import bs4

with requests.Session() as session:
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"}
    r = session.get('https://www.wes.org/appstatus/index.aspx', headers=headers)
    soup = bs4.BeautifulSoup(r.content, "html5lib")
    viewState = soup.select_one('#__VIEWSTATE')["value"]
    viewStateGenerator = soup.select_one('#__VIEWSTATEGENERATOR')["value"]
    eventValidation = soup.select_one('#__EVENTVALIDATION')["value"]
    paramsPost = {'__VIEWSTATE': viewState, '__VIEWSTATEGENERATOR': viewStateGenerator,
                  '__EVENTVALIDATION': eventValidation, "txtUID": "My#Email.Removed",
                  "txtPWD": "My_So_Called_Password",
                  "Submit": "Log In", "Hidden1": ""}
    response = session.post("https://www.wes.org/appstatus/index.aspx", data=paramsPost, headers=headers)
    print("Status code:", response.status_code)
Python by convention uses CamelCase for class names and lowercase with underscores for multi-word names; you might want to consider applying that to your code.

POST requests to ASP site don't bring a good result

I'm trying to send a POST request to an ASP site (just like Firefox does) to get a JSON response.
But in my code the response is HTML, not JSON.
link to site
Firebug Response Headers:
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: application/json; charset=utf-8
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Mon, 08 Sep 2014 11:32:22 GMT
Content-Length: 101
Firebug Request Headers:
POST /Portal/WebPageMethods/Playlista/playlist.aspx HTTP/1.1
Host: www.polskieradio.pl
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: pl,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Referer: http://www.polskieradio.pl/10,Czworka.json
Content-Length: 17
Cookie: cookies-accepted=true; ASP.NET_SessionId=35p3kig5t5cmlikubnlnytlh
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
source code:
import requests
import json
url = "http://www.polskieradio.pl/Portal/WebPageMethods/Playlista/playlist.aspx?program=4&count=1"
payload = {
    "Host": "www.polskieradio.pl",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Accept-Language": "pl,en-US;q=0.7,en;q=0.3",
    "Accept-Encoding": "gzip, deflate",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "http://www.polskieradio.pl/10,Czworka",
    "Content-Length": "17",
    "Cookie": "cookies-accepted=true; ASP.NET_SessionId=5l1eezrjfdyvvevxushojtc2",
    "Connection": "keep-alive",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache"
}
r = requests.post(url, data=json.dumps(payload))
print(r.headers['content-type'])
print(r.content)
How do I do this properly?
Thanks for any answers!
Try it a little differently.
Look at this example:
headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
r = requests.post(url, data=json.dumps(data), headers=headers)
Accept is a header, not a payload.
Everything you are sending as the payload is, in fact, headers.
Your POST payload may be program=4&count=1, or you can do a GET.
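The GET alternative can be sketched with params=, which builds the query string for you (the request is shown prepared rather than sent):

```python
import requests

# params= appends the query string, so no manual URL assembly is needed
req = requests.Request(
    "GET",
    "http://www.polskieradio.pl/Portal/WebPageMethods/Playlista/playlist.aspx",
    params={"program": 4, "count": 1},
).prepare()
print(req.url)  # ...playlist.aspx?program=4&count=1
```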
--- ADDITION with final solution
import requests
import json
url = "http://www.polskieradio.pl/Portal/WebPageMethods/Playlista/playlist.aspx"
data = 'program=4&count=1'
headers = {
    'User-Agent': 'curl/7.35.0',
    'Host': 'www.polskieradio.pl',
    'Accept': '*/*',
    'Proxy-Connection': 'Keep-Alive',
    'Content-Type': 'application/x-www-form-urlencoded'
}
r = requests.post(url, data=data, headers=headers)
print(r.content)
