What happens when you call requests.get()? - Python

While working with Python Requests, I ran into ConnectionRefusedError [WinError 10061], presumably because of settings and limitations in my network, or because my company's network software won't allow it (I think).
But I was curious about what actually happens when I call requests.get(). Maybe I'm just not reading the documentation well, but I could not find any description of the processes that happen after the call.
For example, why is it that accessing a URL from the browser works fine, but accessing it with requests fails?
What I'm asking is: what processes happen after calling the get() method? Does it start a server at localhost? Configure it? Form headers? How does it send the request?

Generally, most companies use a proxy server for all outgoing requests. Once the proxy is set in the connection settings, browsers read it and apply it to every request. You can check whether a proxy is enabled by looking at your browser's settings.
When you make a request from Python, however, you need to set the proxy on the request yourself, like this:
proxyDict = {
    "http": "192.168.100.3:8080",
    "https": "Some/Same proxy for https",
    "ftp": "Some proxy for ftp (optional)"
}
r = requests.get(url, headers=headers, proxies=proxyDict)
Browsers also set the content type, request headers, and other such parameters. You can open your browser's developer console (for example in Google Chrome), go to the Network tab, see which parameters are sent with the request, and apply the same parameters in your requests.get(). In the case of headers, it would be:
r = requests.get(url, proxies=proxyDict, headers={'Content-Type': 'application/json'})
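To come back to the original question: requests.get() does not start a server at localhost. It builds a Request object, prepares it (merging default and user headers, cookies, and auth), resolves any proxy settings, and sends it over a pooled urllib3 connection. A simplified sketch of that flow, using an example URL:
import requests

# Roughly what requests.get("http://example.com") does internally (simplified):
with requests.Session() as session:
    req = requests.Request("GET", "http://example.com", headers={"Accept": "*/*"})
    prepared = session.prepare_request(req)  # merge session headers, cookies, auth
    # send() resolves proxy settings, takes a connection from the urllib3 pool,
    # performs the TCP (and TLS) handshake, writes the request, and parses the response.
    response = session.send(prepared, timeout=10)
    print(response.status_code)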

Related

Python Requests Returning 401 code on 'get' method

I'm working on a web-scrape function that will pull HTML data from internal (non-public) servers. I have a connection through a VPN and proxy server, so when I ping any public site I get code 200 with no problem, but our internal sites return 401.
Here's my code:
http_str = f'http://{username}:{password}@proxy.yourorg.com:80'
proxyDict = {
    'http': http_str,
    'https': https_str,
    'ftp': https_str
}
html_text = requests.get(url, verify=True, proxies=proxyDict, auth=HTTPBasicAuth(user, pwd))
I've tried flushing my DNS server and using different certificate chains (which came with a whole new list of problems). I'm using urllib3 version 1.23 because that seemed to help with SSL errors. I've considered using a requests Session, but I'm not sure what that would change.
Also, the URLs we're trying to access DO NOT require a login. I'm not sure why it's throwing 401 errors, but the auth is for the proxy server, I think. Any help or ideas are appreciated, along with questions, since at this point I'm not even sure what to ask to move this along.
Edit: the proxyDict has a string with the user and pwd passed in for each type (https, http, ftp, etc.).
To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax in any of the proxy configuration entries. See the API docs.
import requests
proxyDict = {
    "http": "http://username:password@proxy.yourorg.com:80",
    "https": "http://username:password@proxy.yourorg.com:80"
}
url = 'http://myorg.com/example'
response = requests.get(url, proxies=proxyDict)
If, however, you are accessing internal URLs via VPN (i.e., internal to your organization on your intranet) then you should NOT need the proxy to access them.
Try:
import requests
url = 'http://myorg.com/example'
response = requests.get(url, verify=False)
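If the proxy settings are instead coming from environment variables (HTTP_PROXY/HTTPS_PROXY), a Session with trust_env=False is one way to make sure requests ignores them for intranet hosts reached over the VPN. A sketch with a placeholder URL:
import requests

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables and .netrc
response = session.get('http://myorg.com/example', timeout=10)
print(response.status_code)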

View headers and body of a POST request made by Python Script

In my application, I have an API at localhost:8000/api/v0.1/save_with_post.
I've also written a Python script that makes a POST request to that API.
My script:
import requests
url = 'http://localhost:8000/api/v0.1/save_with_post'
myobj = {'key': 'value'}
x = requests.post(url, data=myobj)
Is it possible to view headers and body of the request in Chrome rather than debugging my application code?
You want Postman.
With Postman you can either generate a request to your service from Postman itself, or set up Postman as a proxy so you can see the requests that your API client is generating and the responses from the server.
If you want to view the response headers from the POST request, have you tried:
>>> x.headers
Or you could add headers to your POST request yourself, like so:
h = {"Content-Type": "application/xml"}  # plus any other headers you need
x = requests.post(url, data=myobj, headers=h)
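If the goal is just to see what was actually sent, the response object keeps a reference to the PreparedRequest, so the outgoing headers and body can be inspected from Python itself. A small sketch reusing the URL from the question:
import requests

url = 'http://localhost:8000/api/v0.1/save_with_post'
x = requests.post(url, data={'key': 'value'})
print(x.request.headers)  # headers that were actually sent
print(x.request.body)     # encoded request body, e.g. 'key=value'
print(x.headers)          # response headers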
Well, I don't know of a way to view the request in Chrome DevTools directly (I don't think there is one); however, there are two alternatives for seeing the request body and response:
1 - use Selenium with a Chrome webdriver
This lets you run Chrome automated by Python. You can then open a test page and run JavaScript in it to perform your POST request.
See these for more info on how to do it:
https://selenium-python.readthedocs.io/getting-started.html
Getting the return value of Javascript code in Selenium
You will need the selenium-requests library to use the requests library with Selenium:
https://pypi.org/project/selenium-requests/
2 - use Wireshark
This program lets you see all the traffic going through your network card, so you can monitor every request going back and forth. However, because Wireshark captures everything your network card sends or receives, it may be hard to pick out the specific request you want.

HTTP GET and POST requests through a proxy in Python

import requests
proxies = {'http': '203.92.33.87:80'}
# Creating the session and setting up the proxies.
s = requests.Session()
s.proxies = proxies
# Making the HTTP request through the created session.
r = s.get('https://www.trackip.net/ip')
# Check if the proxy was indeed used (the text should contain the proxy IP).
print(r.text)
In the above code I expected that print would output 203.92.33.87.
But it is printing my real public IP.
In your proxies dictionary, you only specify a proxy for the protocol http. But in your s.get(), you specify the protocol https. Since there is no https key in your dictionary, no proxy is used.
If 203.92.33.87:80 is, in fact, an HTTPS proxy, then change the proxies dictionary to reflect that. On the other hand, if it is an HTTP proxy, then change s.get() to s.get('http://...').
Also, I believe you've incorrectly specified the proxy URL. According to the documentation:
Note that proxy URLs must include the scheme
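Putting both points together, a corrected version might look like this, assuming 203.92.33.87:80 is a plain HTTP proxy that can also tunnel HTTPS:
import requests

proxies = {
    'http': 'http://203.92.33.87:80',
    'https': 'http://203.92.33.87:80',  # HTTPS requests tunnelled through the same proxy
}
s = requests.Session()
s.proxies = proxies
r = s.get('https://www.trackip.net/ip')
print(r.text)  # should now show the proxy's IP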

Capture existing cookie of browser, with python

How can I capture existing cookies from my browser (current session on a certain site) to use with requests in python?
You can't read cookies from your browser, for security reasons. If you want cookies in your Python script, you should obtain them with requests.
req = requests.get("http://example.com")
and req.cookies will hold your cookie objects.
To send cookies, you can create a simple dictionary and pass it with the relevant request:
cookies = { "id": "516561346236234" }
requests.post("http://example.com/send", cookies=cookies)
P.S. You can, however, grab cookies by hand using dev tools or plugins like EditThisCookie or CookieInspector.
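If you do copy cookies by hand, a Session is a convenient place to put them, because it also keeps any cookies the server sets afterwards. The cookie names, values, and URL below are purely hypothetical:
import requests

browser_cookies = {'sessionid': 'abc123', 'csrftoken': 'xyz789'}  # copied from dev tools

s = requests.Session()
s.cookies.update(browser_cookies)        # reuse the browser's session cookies
r = s.get('http://example.com/profile')  # placeholder URL
print(r.status_code)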

Scraping a website through a web proxy using Python

I am working on scraping databases that I have access to through the Duke library web proxy. The issue is that, since the database is accessed through a proxy server, I can't scrape it directly the way I would if it did not require proxy authentication.
I tried several things:
I wrote one script that logs into the Duke network (https://shib.oit.duke.edu/idp/AuthnEngine).
I then hardcode my login data:
login_data = urllib.urlencode({'j_username': 'userxx',
                               'j_password': 'passwordxx',
                               'Submit': 'Enter'})
I then log in:
resp = opener.open('https://shib.oit.duke.edu/idp/AuthnEngine', login_data)
and then I create a cookie jar object to hold the cookies from the proxy website.
Then I try to access the database with my script, and it still tells me authentication is required. I want to know how I can get past the authentication required by the proxy server.
If you have any suggestions please let me know.
Thank you,
Jan
A proxy login does not store cookies but instead uses the Proxy-Authorization header. This header needs to be sent with every request, similar to cookies. The header has the same format as regular Basic authentication, although other formats are possible (Digest, NTLM). I suggest you check the headers of a normal login and copy the Proxy-Authorization header that was sent.
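With requests specifically, one way to get that header sent is to embed the proxy credentials in the proxy URL; requests then derives the Proxy-Authorization header for you. The proxy host, port, and credentials below are placeholders:
import requests

proxies = {
    'http': 'http://userxx:passwordxx@proxy.example.edu:3128',
    'https': 'http://userxx:passwordxx@proxy.example.edu:3128',
}
r = requests.get('https://database.example.com/', proxies=proxies)
print(r.status_code)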
