I am sending several post requests in Fiddler2 to check my site to make sure it is working properly. However, when I automate in Python to simulate this over several hours (I really don't want to spend 7 hours hitting space!).
This works in fiddler. I can create the account and perform related API commands. However in Python, nothing happens with this code:
def main():
import socket
from time import sleep
x = raw_input("Points: ")
x = int(x)
x = int(x/150)
for y in range(x):
new = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
new.connect(('example.com', 80))
mydata ="""POST http://www.example.com/api/site/register/ HTTP/1.1
Host: www.example.com
Connection: keep-alive
Content-Length: 191
X-NewRelic-ID: UAIFVlNXGwEFV1hXAwY=
Origin: http://www.example.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Accept: application/json, text/javascript, */*; q=0.01
X-Requested-With: XMLHttpRequest
X-CSRFToken: CEC9EzYaQOGBdO9HGPVVt3Fg66SVWVXg
DNT: 1
Referer: http://www.example.com/signup
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-GB,en;q=0.8
Cookie: sessionid=sessionid; sb-closed=true; arp_scroll_position=600; csrftoken=2u92jo23g929gj2; __utma=912.1.1.2.5.; __utmb=9139i91; __utmc=2019199; __utmz=260270731.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
username=user&password=password&moredata=here """
new.send(mydata.encode('hex'))
print "Sent", y, "of", x
sleep(1)
print "Sent all!"
print "restarting"
main()
main()
I know I could use While True, but I intend to add more functions later to test more sites.
Why does this program not do anything to the API, when Fiddler2 can? I know it is my program, as I can send the exact same packet in fiddler (obviously pointing to the right place) and it works.
PS - If anyone does fix this, as its probably something really obvious, please can you only use modules that are bundled with Python. I cannot install modules from other places. Thanks!
HTTP requests are not as easy as you think they are. First of all this is wrong:
"""POST http://www.example.com/api/site/register/ HTTP/1.1
Host: www.example.com
Connection: keep-alive
...
"""
Each line in HTTP request has to end with CRLF (in Python with \r\n), i.e. it should be:
"""POST http://www.example.com/api/site/register/ HTTP/1.1\r
Host: www.example.com\r
Connection: keep-alive\r
...
"""
Note: LF = line feed = \n is there implicitly. Also you didn't see CR in your fiddler, because it's a white-space. But it has to be there (simple copy-paste won't work).
Also HTTP specifies that after headers there has to be CRLF as well. I.e. your entire request should be:
mydata = """POST http://www.example.com/api/site/register/ HTTP/1.1\r
Host: www.example.com\r
Connection: keep-alive\r
Content-Length: 191\r
X-NewRelic-ID: UAIFVlNXGwEFV1hXAwY=\r
Origin: http://www.example.com\r
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36\r
Content-Type: application/x-www-form-urlencoded; charset=UTF-8\r
Accept: application/json, text/javascript, */*; q=0.01\r
X-Requested-With: XMLHttpRequest\r
X-CSRFToken: CEC9EzYaQOGBdO9HGPVVt3Fg66SVWVXg\r
DNT: 1\r
Referer: http://www.example.com/signup\r
Accept-Encoding: gzip,deflate,sdch\r
Accept-Language: en-GB,en;q=0.8\r
Cookie: sessionid=sessionid; sb-closed=true; arp_scroll_position=600; csrftoken=2u92jo23g929gj2; __utma=912.1.1.2.5.; __utmb=9139i91; __utmc=2019199; __utmz=260270731.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)\r
\r
username=user&password=password&moredata=here"""
Warning: it should be exactly as I've written. There can't be any spaces in front of each line, i.e. this:
mydata = """POST http://www.example.com/api/site/register/ HTTP/1.1\r
Host: www.example.com\r
Connection: keep-alive\r
...
"""
is wrong.
Side note: you can move mydata to the top, outside of the loop. Unimportant optimization but makes your code cleaner.
Now you've said that the site you are using wants you to hex-encode HTTP request? It's hard for me to believe that (HTTP is a raw string by definition). Don't do that (and ask them to specify what exactly this hex-encoding means). Possibly they've meant that the URL should be hex-encoded (since it is the only hex-encoding that is actually used in HTTP)? In your case there is nothing to encode so don't worry about it. Just remove the .encode('hex') line.
Also Content-Length header is messed up. It should be the actual length of the content. So if for example body is username=user&password=password&moredata=here then it should be Content-Length: 45.
Next thing is that the server might not allow you making multiple requests without getting a response. You should use new.recv(b) where b is a number of bytes you want to read. But how many you should read? Well this might be problematic and that's where Content-Length header comes in. First you have to read headers (i.e. read until you read \r\n\r\n which means the end of headers) and then you have to read the body (based on Content-Length header). As you can see things are becoming messy (see: final section of my answer).
There are probably more issues with your code. For example X-CSRFToken suggests that the site uses CSRF prevention mechanism. In that case your request might not work at all (you have to get the value of X-CSRFToken header from the server).
Finally: don't use sockets directly. Httplib (http://docs.python.org/2/library/httplib.html) is a great (and built-in) library for making HTTP requests which will deal with all the funky and tricky HTTP stuff for you. Your code for example may look like this:
import httplib
headers = {
"Host": "www.example.com",
"X-NewRelic-ID": "UAIFVlNXGwEFV1hXAwY=",
"Origin": "http://www.example.com",
"Connection": "keep-alive",
"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36",
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"Accept": "application/json, text/javascript, */*; q=0.01",
"X-Requested-With": "XMLHttpRequest",
"X-CSRFToken": "CEC9EzYaQOGBdO9HGPVVt3Fg66SVWVXg",
"DNT": "1",
"Referer": "http://www.example.com/signup",
"Accept-Encoding": "gzip,deflate,sdch",
"Accept-Language": "en-GB,en;q=0.8",
"Cookie": "sessionid=sessionid; sb-closed=true; arp_scroll_position=600; csrftoken=2u92jo23g929gj2; __utma=912.1.1.2.5.; __utmb=9139i91; __utmc=2019199; __utmz=260270731.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)"
}
body = "username=user&password=password&moredata=here"
conn = httplib.HTTPConnection("example.com")
conn.request("POST", "http://www.example.com/api/site/register/", body, headers)
res = conn.getresponse()
Note that you don't need to specify Content-Length header.
Related
There are similar questions posted, but I still seem to have a problem. I am expecting to receive a registration email after running this. I receive nothing. Two questions. What is wrong? How would I even know if the data was successfully submitted as opposed to the page just loading normally?
serviceurl = 'https://signup.com/'
payload = {'register-fname': 'Peter', 'register-lname': "Parker", 'register-email': 'xyz#email.com', 'register-password': '9dlD313kF'}
r2 = requests.post(serviceurl, data=payload)
print(r2.status_code)
The url for the POST request is actually https://signup.com/api/users, and it returns 200 (in my browser).
You need to replicate what your browser does. This might include certain request headers.
You will want to use your browser's dev tools/network inspector to gather this information.
The information below it from my Firefox on my computer:
Request headers:
Host: signup.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: application/json;charset=utf-8
Content-Length: 107
Origin: https://signup.com
Connection: keep-alive
Referer: https://signup.com/
Cookie: _vspot_session_id=ce1937cf52382239112bd4b98e0f1bce; G_ENABLED_IDPS=google; _ga=GA1.2.712393353.1584425227; _gid=GA1.2.1095477818.1584425227; __utma=160565439.712393353.1584425227.1584425227.1584425227.1; __utmb=160565439.2.10.1584425227; __utmc=160565439; __utmz=160565439.1584425227.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmt=1; __qca=P0-1580853344-1584425227133; _gat=1
Pragma: no-cache
Cache-Control: no-cache
Payload:
{"status":true,"code":null,"email":"TestEmail#hotmail.com","user":{"id":20540206,"email":"TestEmail#hotmail.com","name":"TestName TestSurname","hashedpassword":"4ffdbb1c33d14ed2bd02164755c43b4ad8098be2","salt":"700264767700800.7531319164902858","accesskey":"68dd25c3ae0290be69c0b59877636a5bc5190078","isregistered":true,"activationkey":"f1a6732b237379a8a1e6c5d14e58cf4958bf2cea","isactivated":false,"chgpwd":false,"timezone":"","phonenumber":"","zipcode":"","gender":"N","age":-1,"isdeferred":false,"wasdeferred":false,"deferreddate":null,"registerdate":"2020/03/17 06:09:27 +0000","activationdate":null,"addeddate":"2020/03/17 06:09:27 +0000","admin":false,"democount":0,"demodate":null,"invitationsrequest":null,"isvalid":true,"timesinvalidated":0,"invaliddate":null,"subscribe":0,"premium":false,"contributiondate":null,"contributionamount":0,"premiumenddate":null,"promo":"","register_token":"","premiumstartdate":null,"premiumsubscrlength":0,"initial_reg_type":"","retailmenot":null,"sees":null,"created_at":"2020/03/17 06:09:27 +0000","updated_at":"2020/03/17 06:09:27 +0000","first_name":"TestName","last_name":"TestSurname"},"first_name":"TestName","last_name":"TestSurname","mobile_redirect":false}
There's a lot to replicate. Things like the hashed password, salt, dates, etc would have been generated by JavaScript executed by your browser.
Keep in mind, the website owner might not appreciate a bot creating user accounts.
I'm trying to download several large files from a server through a url.
When downloading the file through Chrome or Internet Explorer it takes around 4-5 minutes to download the file, which has a size of around 100 MB.
But when I try to do the same download using either PyCurl
buffer = BytesIO()
ch = curl.Curl()
ch.setopt(ch.URL, url)
ch.setopt(curl.TRANSFERTEXT, True)
ch.setopt(curl.AUTOREFERER, True)
ch.setopt(curl.FOLLOWLOCATION, True)
ch.setopt(curl.POST, False)
ch.setopt(curl.SSL_VERIFYPEER, 0)
ch.setopt(curl.WRITEFUNCTION, buffer.write)
ch.perform()
Or using requests
r = requests.get(url).text
I get the
'pycurl.error: (56, 'OpenSSL SSL_read: Connection was reset, errno 10054')'
or
'[Errno 10054] An existing connection was forcibly closed by the remote host.'
When I look in Chrome during the download of the large file this is what I see
General:
Referrer Policy: no-referrer-when-downgrade
Requests Header:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cache-Control: no-cache
Connection: keep-alive
Cookie : JSESSIONID = ****
Pragma: no-cache
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/... (KHTML, like Gecko) Chrome/... Safari/...
Is there anything I can do in my configuration to not have the connection close, similar to when I access it through my browser? Or is it on the server side the problem is?
EDIT
To add more information. Most of the time after the request is made is spent waiting for the server to put together the data before the actual download starts (it is generating an XML file by aggregating data from different data sources).
Try adding headers and cookies to your request.
headers = { "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36" }
cookies = { "Cookie1": "Value1"}
r = requests.get(url, headers=headers, cookies=cookies)
(I'm new to posting on stack overflow tell me stuff to improve on ty)
I recently got into python requests and I am a bit new to web stuff related to python so if its something stupid don't flame me. I've been trying a lot of ways to get one of my post events to not return error 400. I messed with headers and data and a bunch of other stuff and I would either get code 404 or 400.
I have tried adding a bunch of headers to my project. Im trying to login to the site "Roblox" making a auto purchase bot. The Csrf Token was not in cookies so I found a loop hole getting the token with sending a request to a logout url (If anyone gets confused). I double checked the Data I inputted aswell and it is indeed correct. I was researching for multiple hours and I couldn't find a way to fix so I came to stack overflow for the first time.
RobloxSignInDeatils = {
"cvalue": "TestWebDevAcc", #Username
"ctype": "Username", #Leave Alone
"password": "Test123123" #Password
}
def GetToken(Session, Headers):
LogoutUrl = "https://api.roblox.com/sign-out/v1" #Url to get CSRF
Request = Session.post(LogoutUrl, Headers) #Sent Post
print("Token: ", Request.headers["X-CSRF-TOKEN"]) #Steal the token
return Request.headers["X-CSRF-TOKEN"] #Return Token
with requests.session() as CurrentSession:
Header = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.96 Safari/537.36"
} #Get the UserAgent Header Only
CurrentSession.get("https://www.roblox.com/login", headers=Header) #Send to the login page
CsrfToken = GetToken(CurrentSession, Header) #Get The Token
Header["X-CSRF-TOKEN"] = CsrfToken #Set the token
SignIn = CurrentSession.post("https://auth.roblox.com/v2/login",
data=RobloxSignInDeatils, headers=Header) # Send Post to Sign in
print(SignIn.status_code) #Returns 400 ?
Printing "print(SignIn.status_code)" just returns 400 nothing else to explain
EDIT: If this helps heres a list of ALL the headers:
Request URL: https://auth.roblox.com/v2/login
Request Method: OPTIONS
Status Code: 200 OK
Remote Address: 128.116.112.44:443
Referrer Policy: no-referrer-when-downgrade
Access-Control-Allow-Credentials: true
Access-Control-Allow-Headers: X-CSRF-TOKEN,Content-Type,Pragma,Cache-Control,Expires
Access-Control-Allow-Methods: OPTIONS, TRACE, GET, HEAD, POST, DELETE, PATCH
Access-Control-Allow-Origin: https://www.roblox.com
Access-Control-Max-Age: 600
Cache-Control: max-age=120, private
Content-Length: 0
Date: Thu, 14 Feb 2019 04:04:13 GMT
P3P: CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV INT DEM PRE"
Roblox-Machine-Id: CHI1-WEB2875
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Access-Control-Request-Headers: content-type,x-csrf-token
Access-Control-Request-Method: POST
Connection: keep-alive
Host: auth.roblox.com
Origin: https://www.roblox.com
Referer: https://www.roblox.com/
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.96 Safari/537.36
Payload:
{cvalue: "TestWebDevAcc", ctype: "Username", password: "Test123123"}
I'm trying to put variables into long string but it raises error. I've tried %s notation and now I'm trying to put there .format() notation but it doesn't work either.
Could you tell me what is the problem? Is it because the string is on multiple lines?
num = 5
r=requests.post("http://www.quoka.de/kleinanzeigen/nachhilfe/cat_0_ct_0_page_'{0}'.html".format(str(num)), headers=mLib.firebug_headers_to_dict("""POST /kleinanzeigen/nachhilfe/cat_0_ct_0_page_'{1}'.html HTTP/1.1
Host: www.quoka.de
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.quoka.de/kleinanzeigen/nachhilfe/cat_0_ct_0_page_'{2}'.html
Cookie: QSESSID=cs9djh8q8c6mjgsme85s4mf7iq24pqrthag630ar6po9fp078e20; PARTNER=VIEW%02quoka%01COOKIEBEGIN%021438270171; QUUHS=QPV%027%01QABAFS%02A%01ARYSEARCHODER%02%7B%22search1%22%3A%22nachhilfe%22%7D; __utma=195481459.415565446.1438270093.1438270093.1438270093.1; __utmb=195481459.22.8.1438270093; __utmc=195481459; __utmz=195481459.1438270093.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmt=1; __utmt_t2=1; POPUPCHECK=1438356493224; axd=100086901728180087; __gads=ID=92bfc541911a1c81:T=1438270098:S=ALNI_MYuEdhQnu7sWAfK-fyKf1Ej93_9KA; crtg_rta=; OX_sd=5; OX_plg=wmp|pm; rsi_segs=L11278_10123|L11278_10571|L11278_11639|F12351_10001|F12351_0; PURESADSCL=done
Connection: keep-alive
""".format(str(num),str(num-1))))
ERROR:
""".format(str(num),str(num-1))))
IndexError: tuple index out of range
In the second multi-line string, you are giving indexes as - {1} and {2} at -
_page_'{1}'.html
and
_0_page_'{2}'.html
But you are only giving two variables as argument for the format() function.
That is why when format function tries to access variable at {2} (which is the third argument, as its 0 indexed) it gets error.
You should define them as {0} and {1} .
Example -
r=requests.post("http://www.quoka.de/kleinanzeigen/nachhilfe/cat_0_ct_0_page_'{0}'.html".format(str(num)), headers=mLib.firebug_headers_to_dict("""POST /kleinanzeigen/nachhilfe/cat_0_ct_0_page_'{0}'.html HTTP/1.1
Host: www.quoka.de
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.quoka.de/kleinanzeigen/nachhilfe/cat_0_ct_0_page_'{1}'.html
Cookie: QSESSID=cs9djh8q8c6mjgsme85s4mf7iq24pqrthag630ar6po9fp078e20; PARTNER=VIEW%02quoka%01COOKIEBEGIN%021438270171; QUUHS=QPV%027%01QABAFS%02A%01ARYSEARCHODER%02%7B%22search1%22%3A%22nachhilfe%22%7D; __utma=195481459.415565446.1438270093.1438270093.1438270093.1; __utmb=195481459.22.8.1438270093; __utmc=195481459; __utmz=195481459.1438270093.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmt=1; __utmt_t2=1; POPUPCHECK=1438356493224; axd=100086901728180087; __gads=ID=92bfc541911a1c81:T=1438270098:S=ALNI_MYuEdhQnu7sWAfK-fyKf1Ej93_9KA; crtg_rta=; OX_sd=5; OX_plg=wmp|pm; rsi_segs=L11278_10123|L11278_10571|L11278_11639|F12351_10001|F12351_0; PURESADSCL=done
Connection: keep-alive
""".format(str(num),str(num-1))))
Your format string starts numbering at 1, not 0. You only provide 2 values for slots 0 and 1, but your template expects parameters numbered 1 and 2.
Note that you have two separate strings here. The first string uses {0}, the second uses {1} and {2}. Renumber those to {0} and {1} instead.
To reiterate, the first string is fine:
"http://www.quoka.de/kleinanzeigen/nachhilfe/cat_0_ct_0_page_'{0}'.html".format(str(num))
but you have to restart the numbering in the second string. In that string you have
POST /kleinanzeigen/nachhilfe/cat_0_ct_0_page_'{1}'.html HTTP/1.1
and
Referer: http://www.quoka.de/kleinanzeigen/nachhilfe/cat_0_ct_0_page_'{2}'.html
Simply use
POST /kleinanzeigen/nachhilfe/cat_0_ct_0_page_'{0}'.html HTTP/1.1
and
Referer: http://www.quoka.de/kleinanzeigen/nachhilfe/cat_0_ct_0_page_'{1}'.html
Note that those '..' single quotes are part of the string you are producing. I just tested the site and the URLs there do not use those quotes. You should probably remove them. You also don't have to call str() on everything, the template takes care of that for you:
num = 5
r = requests.post(
"http://www.quoka.de/kleinanzeigen/nachhilfe/cat_0_ct_0_page_{0}.html".format(num),
headers=mLib.firebug_headers_to_dict("""\
POST /kleinanzeigen/nachhilfe/cat_0_ct_0_page_{0}.html HTTP/1.1
Host: www.quoka.de
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.quoka.de/kleinanzeigen/nachhilfe/cat_0_ct_0_page_{1}.html
Cookie: QSESSID=cs9djh8q8c6mjgsme85s4mf7iq24pqrthag630ar6po9fp078e20; PARTNER=VIEW%02quoka%01COOKIEBEGIN%021438270171; QUUHS=QPV%027%01QABAFS%02A%01ARYSEARCHODER%02%7B%22search1%22%3A%22nachhilfe%22%7D; __utma=195481459.415565446.1438270093.1438270093.1438270093.1; __utmb=195481459.22.8.1438270093; __utmc=195481459; __utmz=195481459.1438270093.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmt=1; __utmt_t2=1; POPUPCHECK=1438356493224; axd=100086901728180087; __gads=ID=92bfc541911a1c81:T=1438270098:S=ALNI_MYuEdhQnu7sWAfK-fyKf1Ej93_9KA; crtg_rta=; OX_sd=5; OX_plg=wmp|pm; rsi_segs=L11278_10123|L11278_10571|L11278_11639|F12351_10001|F12351_0; PURESADSCL=done
Connection: keep-alive
""".format(num, num - 1)
))
Furthermore, I strongly doubt you need to include the POST <path> HTTP/1.1line at all. The Host, Accept-Encoding and Connection headers will be taken care of by requests, and if you were to use a Session() object, so will the cookies. Try not to set all those headers explicitly. Setting the headers as a dictionary is perhaps more readable:
session = requests.Session()
url = 'http://www.quoka.de/kleinanzeigen/nachhilfe/cat_0_ct_0_page_{0}.html'
num = 5
r = session.post(
url.format(num),
headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0',
'Referer': url.format(num - 1),
}
)
Last but not least, are you sure you need to perform a POST here? You are not posting anything, and a GET works just fine too.
I wrote a small module about 1 year age.
When I tried to add some feature to it these days, I found there is a big change in python's urllib and I was confused about how to build a request.
In the old module FancyURLopener was used as my base class, yet I found it was Deprecated since version 3.3.
So I read the document again, and try to build a request instead of a opener.
However, when I tried to add headers, only one function Request.add_header(key, val) was provided. I have headers copied from fiddler like this:
GET some_url HTTP/1.1
Host: sosu.qidian.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: zh-cn,en-us;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
X-Requested-With: XMLHttpRequest
Referer: anothe_url
Cookie: a lot of data
Connection: keep-alive
So do I have to add them to the request one by one?
Also I found another openner urllib.request.build_opener() which can add a lot of headers in one time. But I could not set method 'get' or 'post'.
I am a newbie on python, any suggestions ?
By far the best way to make http requests in Python is to install and use this module:
http://docs.python-requests.org/en/latest/
You can write code like this:
headers = {
'X-Requested-With': 'XMLHttpRequest',
'Referer': 'anothe_url',
# etc
}
response = requests.get(some_url, headers=headers)
http://docs.python-requests.org/en/latest/user/quickstart/#custom-headers