Normally, when sending an HTTP request, the actual traffic looks like this:
GET /abc?hello HTTP/1.1
Host: localhost:8080
User-Agent: python-requests/2.7
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
However, I would like to send URLs without the leading slash, for example:
GET abc?hello HTTP/1.1
GET ftp://abc?hello HTTP/1.1
I understand that's not compliant with the RFCs, but I just need to send such requests for testing purposes in Python.
I have checked requests, urllib, urllib2, and urllib3, but haven't figured out how to do it.
Can anyone help me out?
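For testing like this you can bypass the HTTP libraries entirely and write the request line yourself over a raw socket. A minimal sketch, assuming the localhost:8080 server from the trace above:

import socket

# Build the request line by hand; requests/urllib normalize the request
# target and will not send one without the leading slash.
sock = socket.create_connection(("localhost", 8080))
request = (
    "GET abc?hello HTTP/1.1\r\n"   # deliberately non-compliant request target
    "Host: localhost:8080\r\n"
    "Connection: close\r\n"
    "\r\n"
)
sock.sendall(request.encode("ascii"))
print(sock.recv(4096).decode("ascii", errors="replace"))
sock.close()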
I am scraping search results from Google using the people_also_ask module. The module itself doesn't have a method for using proxies, so I manually added proxies in the module. When I got blocked by Google, I printed the status and it said my IP address was banned from sending requests. The code I added in the people_also_ask module to use proxies is:
proxies = {
    'http': "http://username:password@ip:port"
}
response = SESSION.get(URL, params=params, headers=HEADERS, proxies=proxies)
I know it is an illegal activity, but I mainly want to know why it happens, for educational purposes. I think the code that extracts the data is irrelevant, so I am adding simple code that sends a request using the people_also_ask module:
import people_also_ask as paa

queries = ["how to boil eggs", "how to make cake", "price of poco f1",
           "price of wooden table", "best soap in us", "how much tesla worth"]
for query in queries:
    questions = paa.get_related_questions(query, 40)
Note: The changes were made in the first function, search(), in google.py of the people_also_ask module.
Note: I can search from the browser without any problem. Why is Google allowing me to use Google from the browser but blocking the script?
The answer is quite simple. Although it is a proxy service, it doesn't guarantee 100% anonymity. When you send an HTTP GET request via the proxy server, the request your program sends to the proxy server is:
GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0
Now, when the proxy server sends this request to the actual destination, it sends:
GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0
Via: 1.1 naxserver (squid/3.1.8)
X-Forwarded-For: 122.126.64.43
Cache-Control: max-age=18000
Connection: keep-alive
As you can see, the proxy puts your IP (in my case, 122.126.64.43) in the X-Forwarded-For HTTP header, and hence the website knows that the request was sent on behalf of 122.126.64.43.
Read more about this header at: https://www.rfc-editor.org/rfc/rfc7239
If you want to host your own squid proxy server and want to disable setting the X-Forwarded-For header, read: http://www.squid-cache.org/Doc/config/forwarded_for/
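To see what your proxy actually forwards, one option is a header-echo service. A minimal sketch, assuming a placeholder proxy URL (httpbin.org/headers simply echoes back the headers it received):

import requests

# The proxy URL is a placeholder; substitute your own credentials and host.
proxies = {"http": "http://username:password@proxy-host:3128"}
r = requests.get("http://httpbin.org/headers", proxies=proxies)
print(r.json())  # check the echoed headers for X-Forwarded-For and Via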
I don't take any credit for this answer; I copied it from the following post I found: Python Requests module - proxy not working
There are many different ways of reading web pages in Python.
I focused on the following methods:
Retrieve a page
Opening Socket
Making Request
Example of Retrieving a page:
from urllib.request import urlretrieve
url = 'http://ce.sharif.edu/courses'
file_name = 'courses.html'
urlretrieve(url, file_name)
Example of Opening Socket:
from urllib.request import urlopen
url = 'http://ce.sharif.edu/courses'
socket = urlopen(url)
text = socket.read().decode('utf-8')
socket.close()
Example of Making Request:
>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
u'{"type":"User"...'
>>> r.json()
{u'private_gists': 419, u'total_private_repos': 77, ...}
So my question is: what are the main differences between the above methods, and when should each be used?
The third method uses a different library than the first two. That doesn't look like a problem by itself, but let's look at the content of each request as seen from the server side.
1)
GET /example1.html HTTP/1.1
Accept-Encoding: identity
Host: XXXXXXXX
Connection: close
User-Agent: Python-urllib/3.5
2)
GET /example2.html HTTP/1.1
Accept-Encoding: identity
Connection: close
Host: XXXXXXXX
User-Agent: Python-urllib/3.5
There is no noticeable difference between 1) and 2).
3)
GET /example3 HTTP/1.1
Host: XXXXXXXX
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.18.4
The last is slightly different; this means there is at least the possibility of obtaining different results in the response, and this will depend on the server configuration.
Accept-Encoding: gzip, deflate
This may result in the server compressing the response, which means less data transferred.
Connection: keep-alive
The server will keep the connection open for reusing with subsequent requests (possibly more efficient).
User-Agent:
Many web servers adapt the content depending on the identified client software. I don't think there will be any difference in this particular case; however, it can't be ruled out completely.
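If these differences matter for your use case, requests lets you override its defaults per request. A minimal sketch that makes a requests call resemble the urllib traffic above:

import requests

headers = {
    "Accept-Encoding": "identity",      # no compression, as urllib sends
    "Connection": "close",              # don't keep the connection open
    "User-Agent": "Python-urllib/3.5",  # mimic urllib's identification
}
r = requests.get("http://example.com/", headers=headers)
print(r.status_code)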
This is a follow-up to the Security Dialogflow fulfillment thread.
The answer there,
explore the req.headers.authorization, you will find an authentication variable
(concatenate these three things:
your Dialogflow username,
the character ':',
your Dialogflow password,
and encode the result in base64)
makes sense, but in my Python implementation the request headers I get are:
Accept: */*
Content-Type: application/json; charset=UTF-8
Content-Length: 571
Host: xxxxxxxx
User-Agent: Apache-HttpClient/4.5.4 (Java/1.8.0_151)
Accept-Encoding: gzip,deflate
X-Forwarded-Proto: https
X-Forwarded-For: xx.xxxx.xx..xx
PS: I tried both V1 and V2.
I am not sure how to handle the authorization.
You have to set the basic auth fields in the Fulfillment settings (the ones below the Fulfillment URL). Only then will you receive the base64-encoded part in the Authorization header.
This has nothing to do with the personal credentials you use to log in to Dialogflow! Do not use them for basic auth!
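On the Python side, a minimal sketch of verifying that header in a webhook, assuming a Flask app and placeholder credentials (the user/password pair configured in the Fulfillment settings, not your Dialogflow login):

import base64
from flask import Flask, request, abort

app = Flask(__name__)

WEBHOOK_USER = "webhook-user"  # placeholder: basic auth user from Fulfillment settings
WEBHOOK_PASS = "webhook-pass"  # placeholder: basic auth password

@app.route("/webhook", methods=["POST"])
def webhook():
    # Reconstruct the expected "Basic <base64(user:password)>" value.
    expected = "Basic " + base64.b64encode(
        f"{WEBHOOK_USER}:{WEBHOOK_PASS}".encode()).decode()
    if request.headers.get("Authorization") != expected:
        abort(401)  # reject requests without the configured credentials
    return {"fulfillmentText": "ok"}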
I've made a Python server with swagger-codegen. I have one endpoint that receives a file with multipart/form-data.
I also created a client with go-swagger for testing.
I created a file to upload: $ echo "123file content321" > data
and used the client to upload the file to the server. The resulting HTTP request looks like this:
POST /api/order/1/attachment HTTP/1.1
Host: 127.0.0.1:8080
User-Agent: Go-http-client/1.1
Transfer-Encoding: chunked
Accept: application/json
Content-Type: multipart/form-data; boundary=5f3f0ad86e6345b77c869cbe0a5e608f038354cf9ceab74ec2533d7555c0
Accept-Encoding: gzip
ff
--5f3f0ad86e6345b77c869cbe0a5e608f038354cf9ceab74ec2533d7555c0
Content-Disposition: form-data; name="file"; filename="data"
Content-Type: application/octet-stream
123file content321
--5f3f0ad86e6345b77c869cbe0a5e608f038354cf9ceab74ec2533d7555c0--
but the server doesn't accept it and responds:
HTTP/1.0 400 BAD REQUEST
Connection: close
Content-Length: 120
Content-Type: application/problem+json
Date: Fri, 19 May 2017 15:15:44 GMT
Server: Werkzeug/0.12.1 Python/3.6.1
{
"type": "about:blank",
"title": "Bad Request",
"detail": "Missing formdata parameter 'file'",
"status": 400
}
So the request isn't parsed properly. But when I use the swagger-ui, the file is uploaded correctly. Is there a problem with the client's request, or does the server have a problem?
EDIT: I think the Content-Length is missing, or the ff at the beginning of the body shouldn't be there.
EDIT2: the swagger-ui request:
POST /api/order/1/attachment HTTP/1.1
Host: localhost:8080
Connection: keep-alive
Content-Length: 211
Origin: http://localhost:8080
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundarypzmNwrDR7zzpZ7SJ
Accept: application/json
X-Requested-With: XMLHttpRequest
DNT: 1
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
------WebKitFormBoundarypzmNwrDR7zzpZ7SJ
Content-Disposition: form-data; name="file"; filename="data"
Content-Type: application/octet-stream
123file content321
------WebKitFormBoundarypzmNwrDR7zzpZ7SJ--
The first request you send is an HTTP/1.1 request using chunked transfer encoding. This means the body consists of multiple chunks, where each chunk is prefixed by its size in hex followed by \r\n, then the data, then another \r\n. I'm not sure if the ff at the beginning of the body you show really specifies the size of the following data (i.e. 255 bytes). But the last chunk, with a size of 0, is missing, so the request is incomplete. Maybe you just omitted that part from the question, though.
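For illustration, here is what a complete, correctly terminated chunked body looks like on the wire (a generic example, not your exact payload):

body = (
    b"4\r\n"      # chunk size in hex: 4 bytes of data follow
    b"Wiki\r\n"   # chunk data
    b"0\r\n"      # zero-size chunk marks the end of the body
    b"\r\n"
)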
Apart from that, the server responds with version HTTP/1.0. Chunked transfer encoding is only defined for HTTP/1.1, which means this request will not be understood by an HTTP/1.0 server. And not even all HTTP/1.1 servers understand chunked transfer encoding in requests, even though they should.
The second request you show (created by Chrome) does not use chunked transfer encoding but instead specifies the length of the body with a Content-Length header. That's the way you should go, since it works with all web servers, including HTTP/1.0 servers.
Based on the two requests you have posted, I would attempt to set the Content-Length on your Go request first and test that. I've run into issues before with the ArangoDB HTTP API not accepting requests without a correct Content-Length value.
If that succeeds, great.
Otherwise, the ff in your request is the next thing I'd look at getting rid of. But I'd focus on the Content-Length header first.
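If you want to reproduce the working Content-Length variant from Python for comparison, a minimal sketch against the endpoint from your trace:

import requests

url = "http://127.0.0.1:8080/api/order/1/attachment"
with open("data", "rb") as f:
    # requests builds the multipart body in memory and sets Content-Length
    # itself instead of using chunked transfer encoding.
    r = requests.post(url, files={"file": ("data", f, "application/octet-stream")})
print(r.status_code, r.text)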
I need to log into a website using Python, but the login page requires a session ID cookie in the request header. Using Google developer tools along with a web client (hurl.it), I was able to determine the required format of the request headers that the webserver accepts:
Accept: */*
Accept-Encoding: gzip, deflate
Content-Length: 85
Content-Type: application/x-www-form-urlencoded
Cookie: www_amsterdam-dance-event_nl_session=l9Abno8a1UyHPof%2BOyVqk8BxHjesGMi78z6Ot0ZXCCbI%2BxVKqjm30ALTfW%2FR7yKcDaqfEtFOyysTrjIeU8lU5ylv1TOlW6GLHY8jDfKKWSULKsUUJiTh92DbvkuYBuE6zt%2FeLs44lDna6Nz3uMCOaSARN7gCpoSz0TOcFaes8Hk9q6FikP1F9e%2B%2FsMwfUP0RTA0Rc5gJFyJPxHXNCdn%2BT49mhHYnzoIWVlxGHhlaEkZX1PPsYx1xq0BCgpb0WnPViuiZiBnQY2nz%2BBO4Uur0WPNfpSSWZg5Qxz79nYeChlRe16JhYjVOdaiUhnfEvp1jM7h%2BCdR6cUeatd7HGbftRCjINDrVuPeyB5ltVihStmzKEjOmWetI0xNuaNswsPIKKuo%2BV6JFNfdLcA6h3iy1K8o%2FA49tKGMP2rmGe4e5Jec%3Df395212364d1ffc80cf95ebf5abf3b40f9dc6441;
User-Agent: runscope/0.1
email=******%40beatswitch.com&login_token=545a46230b291&password=*****&submission=
I have produced the following request using the Python requests module:
POST /my-ade/login/ HTTP/1.1
Host: www.amsterdam-dance-event.nl
Content-Length: 85
Accept-Encoding: gzip,deflate
Accept: */*
User-Agent: runscope/0.1
Connection: keep-alive
Cookie: www_amsterdam-dance-event_nl_session=l9Abno8a1UyHPof%2BOyVqk8BxHjesGMi78z6Ot0ZXCCbI%2BxVKqjm30ALTfW%2FR7yKcDaqfEtFOyysTrjIeU8lU5ylv1TOlW6GLHY8jDfKKWSULKsUUJiTh92DbvkuYBuE6zt%2FeLs44lDna6Nz3uMCOaSARN7gCpoSz0TOcFaes8Hk9q6FikP1F9e%2B%2FsMwfUP0RTA0Rc5gJFyJPxHXNCdn%2BT49mhHYnzoIWVlxGHhlaEkZX1PPsYx1xq0BCgpb0WnPViuiZiBnQY2nz%2BBO4Uur0WPNfpSSWZg5Qxz79nYeChlRe16JhYjVOdaiUhnfEvp1jM7h%2BCdR6cUeatd7HGbftRCjINDrVuPeyB5ltVihStmzKEjOmWetI0xNuaNswsPIKKuo%2BV6JFNfdLcA6h3iy1K8o%2FA49tKGMP2rmGe4e5Jec%3Df395212364d1ffc80cf95ebf5abf3b40f9dc6441;
Content-Type: application/x-www-form-urlencoded
login_token=545a46230b291&password=*****&email=******%40beatswitch.com&submission='
When I send the former request with hurl.it, everything works perfectly and the webserver lets me log in, but the almost-identical request with the same parameters fails in Python: the webserver presents an error page. Any help would be highly appreciated.
EDIT:
Here is the code:
import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Open the login page to get the session ID and login_token
loginURL = "https://www.amsterdam-dance-event.nl/my-ade/login/"
loginReq = session.get(loginURL)
loginSoup = BeautifulSoup(loginReq.text, 'html.parser')
loginToken = loginSoup.find('input', attrs={'name': 'login_token'})['value']
sessionID = loginReq.cookies['www_amsterdam-dance-event_nl_session']
cookie = 'www_amsterdam-dance-event_nl_session=' + sessionID

# Construct the headers and post the login form to the webserver
headers = {'Content-Length': '85', 'Accept': '*/*', 'User-Agent': 'runscope/0.1',
           'Content-Type': 'application/x-www-form-urlencoded',
           'Accept-Encoding': 'gzip,deflate', 'Cookie': cookie}
payload = {'email': '*******@beatswitch.com', 'password': '********',
           'login_token': loginToken, 'submission': ''}
loggedinReq = session.post(loginURL, headers=headers, data=payload)
I found the solution, thanks to Md. Mohsin. I was trying to handle the request headers and cookies manually, while the requests module can handle them by itself. So I removed the following line from the code and let requests take total control, and everything worked as intended:
headers = {'Content-Length':'85','Accept':'*/*','User-Agent':' runscope/0.1','Content-Type':'application/x-www-form-urlencoded','Accept-Encoding':'gzip,deflate','Cookie':cookie}
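With that line gone, the post call reduces to:

loggedinReq = session.post(loginURL, data=payload)

and the session supplies the cookie and computes Content-Length itself.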