Python-Requests close http connection - python

I was wondering, how do you close a connection with Requests (python-requests.org)?
With httplib it's HTTPConnection.close(), but how do I do the same with Requests?
Code:
r = requests.post("https://stream.twitter.com/1/statuses/filter.json", data={'track':toTrack}, auth=('username', 'passwd'))
for line in r.iter_lines():
    if line:
        self.mongo['db'].tweets.insert(json.loads(line))

I think a more reliable way of closing a connection is to tell the server explicitly to close it, in a way compliant with the HTTP specification:
HTTP/1.1 defines the "close" connection option for the sender to
signal that the connection will be closed after completion of the
response. For example,
Connection: close
in either the request or the response header fields indicates that the
connection SHOULD NOT be considered `persistent' (section 8.1) after
the current request/response is complete.
The Connection: close header is added to the actual request:
r = requests.post(url=url, data=body, headers={'Connection':'close'})

I came to this question looking to solve the "too many open files" error, but I am using requests.session() in my code. A few searches later, I came across an answer in the Python Requests documentation which suggests using a with block so that the session is closed even if there are unhandled exceptions:
with requests.Session() as s:
    s.get('http://google.com')
If you're not using Session you can actually do the same thing: https://2.python-requests.org/en/master/user/advanced/#session-objects
with requests.get('http://httpbin.org/get', stream=True) as r:
    pass  # Do something with r

As discussed here, there really isn't such a thing as an HTTP connection and what httplib refers to as the HTTPConnection is really the underlying TCP connection which doesn't really know much about your requests at all. Requests abstracts that away and you won't ever see it.
The newest version of Requests does in fact keep the TCP connection alive after your request. If you do want your TCP connections to close, you can configure Requests not to use keep-alive:
s = requests.session()
s.config['keep_alive'] = False
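Note that the config dictionary is gone in newer versions of Requests (it was removed in the 1.0 API rework), so the snippet above only applies to very old releases. On current versions, a rough equivalent, offered here as a sketch rather than the library's official recipe, is to send a Connection: close header from the session, or simply close the session when you are done:
import requests

s = requests.Session()
# Ask the server to close the connection after each response
# instead of keeping it alive (same intent as the old config flag).
s.headers['Connection'] = 'close'
r = s.get('https://example.org/')
s.close()  # releases any pooled connections held by the session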

Use response.close() to close the connection and avoid the "too many open files" error.
For example:
r = requests.post("https://stream.twitter.com/1/statuses/filter.json", data={'track':toTrack}, auth=('username', 'passwd'))
....
r.close()

On Requests 1.X, the connection is available on the response object:
r = requests.post("https://stream.twitter.com/1/statuses/filter.json",
                  data={'track': toTrack}, auth=('username', 'passwd'))
r.connection.close()

This works for me:
res = requests.get(<url>, timeout=10).content
requests.session().close()

Based on the latest requests (2.25.1), requests.<method> will close the connection by default:
with sessions.Session() as session:
    return session.request(method=method, url=url, **kwargs)
https://github.com/psf/requests/blob/master/requests/api.py#L60
Thus, if you use the latest version of requests, it seems we don't need to close the connection ourselves.
Also, if you need to send multiple requests with the same session, it's better to use requests.Session() instead of opening and closing the connection multiple times.
For example:
with requests.Session() as s:
    r = s.get('https://example.org/1/')
    print(r.text)
    r = s.get('https://example.org/2/')
    print(r.text)
    r = s.get('https://example.org/3/')
    print(r.text)

To remove the "keep-alive" header in requests, I just created the request from a Request object and then sent it with a Session:
headers = {
    'Host' : '1.2.3.4',
    'User-Agent' : 'Test client (x86_64-pc-linux-gnu 7.16.3)',
    'Accept' : '*/*',
    'Accept-Encoding' : 'deflate, gzip',
    'Accept-Language' : 'it_IT'
}
url = "https://stream.twitter.com/1/statuses/filter.json"
#r = requests.get(url, headers = headers) #this triggers keep-alive: True
s = requests.Session()
r = requests.Request('GET', url, headers=headers)
prepped = r.prepare()   # preparing the bare Request skips the session's default headers
resp = s.send(prepped)  # so no Connection: keep-alive header is added to this request

Related

Being unable to send request using different proxies in Python (using requests module)

I am trying to send a request to a URL through a proxy using the requests module of Python (3.6.5). The request completes successfully, but when I check the origin of the request (by printing req.content), it still shows my IP. Looking through examples on the Internet, I couldn't figure out what is behind this problem.
def send_request(url):
    header = get_random('UserAgents.txt')
    proxy = get_random('ProxyList.txt')
    print("Proxy: " + str(proxy))
    proxies = {
        'http' : 'http://' + str(proxy),
    }
    try:
        session = requests.Session()
        session.proxies = proxies
        session.headers = HEADER
        req = session.get(url)
        # req = requests.get(url, headers = { 'User-Agent' : HEADER
        # }, proxies = proxies)
        print(req.content)
        req.raise_for_status()
    except Exception as e:
        print(e)
        sys.exit()
    print('Request is successful!')
    return req
There is a possibility that your particular proxy does not hide your IP. You can try one of these free proxies (results are in JSON format):
getproxylist
gimmeproxy
pubproxy
and see if that works. By the way, if what you want is anonymous web access, a VPN is far better than a proxy.
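One quick way to check whether traffic is actually going through the proxy is to ask a service such as httpbin.org for the origin IP. A minimal sketch, using a placeholder proxy address (not from the question) and setting both scheme keys, since the original proxies dict only sets 'http' and any https:// URL would bypass the proxy:
import requests

proxy_addr = "203.0.113.10:8080"  # placeholder; substitute a proxy from the services above
proxies = {
    'http': 'http://' + proxy_addr,
    'https': 'http://' + proxy_addr,
}

resp = requests.get('http://httpbin.org/ip', proxies=proxies, timeout=10)
print(resp.json())  # should print the proxy's IP, not yours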

Requests SSL connection timeout

I'm using Python Requests to send HTTP requests to www.fredmeyer.com.
I can't even get past an initial GET request to this domain. Doing a simple requests.get results in the connection hanging and never timing out. I've verified I have access to this domain and am able to run the request on my local machine. Can anyone replicate this?
The site seems to have some filtering enabled to prohibit bots or similar. The following HTTP request currently works with the site:
GET / HTTP/1.1
Host: www.fredmeyer.com
Connection: keep-alive
Accept: text/html
Accept-Encoding:
If the Connection header is removed or its value is changed to close, it will hang. If the (empty) Accept-Encoding header is missing, it will also hang. If the Accept line is missing, it will return 403 Forbidden.
In order to access this site with requests the following currently works for me:
import requests
headers = { 'Accept':'text/html', 'Accept-Encoding': '', 'User-Agent': None }
resp = requests.get('https://www.fredmeyer.com', headers=headers)
print(resp.text)
Note that the heuristics used by the site to detect bots might change, so this might stop working in the future.

python http client module error / inconsistent

I'm getting the following output:
301 Moved Permanently --- when using http.client
200 --- when using requests
The URL "http://i.imgur.com/fyxDric.jpg" is passed as an argument on the command line.
What I expect is a 200 OK response.
This is the body:
if scheme == 'http':
    print('Running in the http')
    conn = http.client.HTTPConnection("www.i.imgur.com")
    conn.request("GET", urlparse(url).path)
    conn_resp = conn.getresponse()
    body = conn_resp.read()
    print(conn_resp.status, conn_resp.reason, body)
When using requests:
headers = {'User-Agent': 'Mozilla/5.0 Chrome/54.0.2840.71 Safari/537.36'}
response = requests.get(url, allow_redirects=False)
print(response.status_code)
You are trying to hit imgur over HTTP, but imgur redirects all of its requests to HTTPS.
The issue is occurring because of this redirect.
The http module doesn't handle redirects for you; you need to handle them yourself, whereas the requests module handles redirects by itself.
The documentation on the http module includes in its first sentence "It is normally not used directly." Unlike requests, it doesn't act on the 301 response and follow the redirection given in the Location header. It instead returns the 301, which you would have to process yourself.
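If you do want to stay with http.client, a rough sketch of processing the redirect yourself (an illustration, not the asker's code, and assuming the Location header carries an absolute https URL) could look like this:
import http.client
from urllib.parse import urlparse

url = "http://i.imgur.com/fyxDric.jpg"
parsed = urlparse(url)

conn = http.client.HTTPConnection(parsed.netloc)
conn.request("GET", parsed.path)
resp = conn.getresponse()

if resp.status in (301, 302):
    # Follow the redirect manually via the Location header,
    # which points at the https:// version of the URL.
    target = urlparse(resp.getheader("Location"))
    conn = http.client.HTTPSConnection(target.netloc)
    conn.request("GET", target.path)
    resp = conn.getresponse()

print(resp.status, resp.reason)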

Python proxy authentication with Requests and urllib2

Does anyone have any ideas on why the urllib2 version returns the webpage, while the Requests version returns a connection error:
[Errno 10060] A connection attempt failed because the connected party
did not properly respond after a period of time, or established
connection failed because connected host has failed to respond.
Urllib2 code (Working):
import urllib2
proxy = urllib2.ProxyHandler({'http': 'http://login:password@proxy1.com:80'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
wPage = urllib2.urlopen('http://www.google.com/')
print wPage.read();
Requests code (Not working - Errno 10060):
import requests
proxy = {"http": "http://login:password#proxy1.com:80"}
wPage = requests.get('http://www.google.com/', proxies=proxy)
print wPage.text
The requests version returns intranet webpages, but gives an error on internet pages.
I am running Python 2.7
* Edit *
Based on m170897017's suggestion, I looked for differences in the GET requests. The only difference was in Connection and Proxy-Connection.
Urllib2 version:
header: Connection: close
header: Proxy-Connection: close
Requests version :
header: Connection: Keep-Alive
header: Proxy-Connection: Keep-Alive
I forced the Requests version to close both of those connections by modifying the headers:
header = {
    "Connection": "close",
    "Proxy-Connection": "close"
}
The GET requests for both now match; however, the Requests version still does not work.
Try this:
import urllib2
proxy = urllib2.ProxyHandler({'http': '1.1.1.1:9090'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com/')
datum = response.read().decode("UTF-8")
response.close()
print datum
A little late... but for future reference, this line:
proxy = {"http": "http://login:password@proxy1.com:80"}
should also have a second key/value pair for https, even if it's not going to be used.
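In other words, something like the sketch below (with the same placeholder credentials as in the question):
import requests

proxies = {
    "http": "http://login:password@proxy1.com:80",
    "https": "http://login:password@proxy1.com:80",  # same proxy entry reused for https traffic
}
wPage = requests.get('http://www.google.com/', proxies=proxies)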
Also, there is a package built on top of requests called proxy-requests that does something very similar:
pip3 install proxy-requests
https://pypi.org/project/proxy-requests/

Django rejects Requests' CSRF Token

I'm writing an Ajax POST with Python's Requests library to a Django backend.
Code:
import requests
import json
import sys
URL = 'http://localhost:8000/'
client = requests.session()
client.get(URL)
csrftoken = client.cookies['csrftoken']
data = { 'file': "print \"It works!\"", 'fileName' : "JSONtest", 'fileExt':".py",'eDays':'99','eHours':'1', 'eMinutes':'1' }
headers = {'Content-type': 'application/json', "X-CSRFToken":csrftoken}
r = requests.post(URL+"au", data=json.dumps(data), headers=headers)
Django gives me a 403 error stating that the CSRF token isn't set even though the request.META from csrf_failure() shows it is set. Is there something I'm missing or a stupid mistake I'm not catching?
I asked my friend and he figured out the problem: basically, you have to send back the cookies that Django gives you every time you make a request.
Corrected:
cookies = dict(client.cookies)
r = requests.post(URL+"au", data=json.dumps(data), headers=headers,cookies=cookies)
You need to pass the Referer in the headers. From the Django docs:
In addition, for HTTPS requests, strict referer checking is done by
CsrfViewMiddleware. This is necessary to address a Man-In-The-Middle
attack that is possible under HTTPS when using a session independent
nonce, due to the fact that HTTP ‘Set-Cookie’ headers are
(unfortunately) accepted by clients that are talking to a site under
HTTPS. (Referer checking is not done for HTTP requests because the
presence of the Referer header is not reliable enough under HTTP.)
So change your headers line to this:
headers = {'Content-type': 'application/json', "X-CSRFToken":csrftoken, "Referer": URL}
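Putting the two answers together, a minimal sketch of the corrected client, reusing the same session so the cookies are sent back automatically and including both the token and the Referer, might look like:
import json
import requests

URL = 'http://localhost:8000/'

client = requests.session()
client.get(URL)  # Django sets the csrftoken cookie on this response
csrftoken = client.cookies['csrftoken']

data = {'file': "print \"It works!\"", 'fileName': "JSONtest", 'fileExt': ".py",
        'eDays': '99', 'eHours': '1', 'eMinutes': '1'}
headers = {'Content-type': 'application/json',
           'X-CSRFToken': csrftoken,
           'Referer': URL}

# Posting through the same session sends the csrftoken cookie along automatically.
r = client.post(URL + "au", data=json.dumps(data), headers=headers)
print(r.status_code)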
