I'm trying to read a URL within our corporate network. Specifically, the server I'm contacting is in one office and the client PC is in another:
from urllib2 import urlopen
print(urlopen(r"http://london.mycompany/mydir/").read())
Whenever I run this function I get:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "C:\Python24\lib\urllib2.py", line 130, in urlopen
return _opener.open(url, data)
File "C:\Python24\lib\urllib2.py", line 364, in open
response = meth(req, response)
File "C:\Python24\lib\urllib2.py", line 471, in http_response
response = self.parent.error(
File "C:\Python24\lib\urllib2.py", line 402, in error
return self._call_chain(*args)
File "C:\Python24\lib\urllib2.py", line 337, in _call_chain
result = func(*args)
File "C:\Python24\lib\urllib2.py", line 480, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 407: Proxy Authentication Required
The odd thing is that there's no firewall between these two computers. For some reason urllib2 has decided to connect to the web server via the proxy we'd normally use for content outside the company, and in this case that's failing because I haven't authenticated with it.
I'm pretty sure the fault occurs on the client PC: I did an nslookup and a ping to the server to confirm that there's a connection between the two computers. However, when I watch the transaction using TCPView for Windows, I can see that the python.exe process is connecting to a completely different server (yes, the proxy!).
So what could be causing this? Note that the os.environ["http_proxy"] variable is NOT set; that variable is often used to make urllib connect via a proxy server, but that's not the case here. Could something else have the same effect?
FYI, I'm running Python 2.4.4 on Windows XP 32-bit in a very locked-down corporate environment.
urllib reads the proxy from the system settings (on Windows, the Internet Options registry keys), even when http_proxy is unset. Pass an explicit, empty proxy dictionary to urllib.FancyURLopener to bypass them:
import urllib

opener = urllib.FancyURLopener({})  # empty dict = use no proxies
f = opener.open("http://london.mycompany/mydir/")
f.read()
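The same idea works with urllib2, which the traceback shows is what urlopen resolves to here: building an opener around an empty ProxyHandler forces a direct connection regardless of the system settings. A minimal sketch:
import urllib2

# An empty proxy mapping disables proxy detection entirely,
# so the request goes straight to the target server.
opener = urllib2.build_opener(urllib2.ProxyHandler({}))
print opener.open("http://london.mycompany/mydir/").read()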
A while ago I started using yagmail to send mails via Gmail using OAuth2.
The code that I use is straightforward:
def send_with_yagmail(self):
    yag = yagmail.SMTP(self.our_email, oauth2_file="/<path_to>/credentials.json")
    yag.send(self.to_email, self.message_subject, contents=self.message_body)
All was well for a few days, and then suddenly Gmail stopped sending mails with the following as part of the stack trace:
...
return get_oauth_string(user, oauth2_info)
File "/usr/local/lib/p│ython3.9/site-packages/yagmail/oauth2.py", line 96, in get_oauth_string
access_token, expires_in = refresh_authorization(**oauth2_info)
File "/usr/local/lib/python3.9/site-packages/yagmail/oauth2.py", line 91, in refresh_authorization
response = call_refresh_token(google_client_id, google_client_secret, google_refresh_token)
File "/usr/local/lib/python3.9/site-packages/yagmail/oauth2.py", line 71, in call_refresh_token
response = urlopen(request_url, encoded_params).read().decode('UTF-8')
File "/usr/local/lib/python3.9/urllib/request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.9/urllib/request.py", line 523, in open
response = meth(req, response)
File "/usr/local/lib/python3.9/urllib/request.py", line 632, in http_response
response = self.parent.error(
File "/usr/local/lib/python3.9/urllib/request.py", line 561, in error
return self._call_chain(*args)
File "/usr/local/lib/python3.9/urllib/request.py", line 494, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.9/urllib/request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
I solved it by regenerating credentials.json locally on my machine and uploading it to the server, thinking it was a one-off event. It worked for a few days and now it has stopped working again. So I'm wondering: what should I do so that I don't have to regenerate the credentials every few days? My credentials.json looks like this:
{
    "email_address": "my.team@gmail.com",
    "google_client_id": "74<blabla>0e1feov.apps.googleusercontent.com",
    "google_client_secret": "GOCSP<blabla>kOz",
    "google_refresh_token": "1//09<blabla>-2_dxZH8"
}
Isn't the point of the refresh token to be used by the library such that I don't have to regenerate this thing by hand?
Can anybody recommend something that can be done? Thanks!
I ran into this same problem. Is your app in 'testing' mode with the Google API? Testing tokens expire after 7 days, which is why you had to regenerate your token. I found this answer, but after I changed my status to "In Production" it now gives me an authentication error. Based on this page, if you're using Gmail, a production app needs to be verified by Google, but my credential says it doesn't need to be verified and I don't see an option to request verification. You might need to create a new credential.
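If you want to confirm that the refresh token itself is what's expiring (rather than anything in yagmail), here is a diagnostic sketch that calls Google's OAuth2 token endpoint directly with the values from credentials.json; the file path is the placeholder from the question:
import json
import urllib.error
import urllib.parse
import urllib.request

# Load the same credentials file yagmail uses.
with open("/<path_to>/credentials.json") as f:
    creds = json.load(f)

params = urllib.parse.urlencode({
    "client_id": creds["google_client_id"],
    "client_secret": creds["google_client_secret"],
    "refresh_token": creds["google_refresh_token"],
    "grant_type": "refresh_token",
}).encode()

try:
    urllib.request.urlopen("https://oauth2.googleapis.com/token", params)
    print("Refresh succeeded; the token is still valid.")
except urllib.error.HTTPError as e:
    # A 400 with "invalid_grant" means the refresh token has expired or been
    # revoked, matching the 7-day testing-mode expiry described above.
    print("Refresh failed:", e.code, e.read().decode())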
I just got started with the urllib module. I'm trying to scrape products from supermarkets, and one website always seems to respond with HTTP Error 429: Too Many Requests. I did a bit of research on Stack Overflow and no one seems to have the same problem. My code is as simple as it can get:
>>> import urllib.request
>>> resp = urllib.request.urlopen("https://shop.coles.com.au/a/a-national/product/head-shoulders-shampoo-conditioner-2in1-deep-clean")
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
resp = urllib.request.urlopen("https://shop.coles.com.au/a/a-national/product/head-shoulders-shampoo-conditioner-2in1-deep-clean")
File "C:\Users\thank\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\thank\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\thank\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 640, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\thank\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 568, in error
return self._call_chain(*args)
File "C:\Users\thank\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Users\thank\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 648, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: Too Many Requests
I've also tried modifying the user agent as this answer suggests, but the result is still the same.
Can someone explain which default settings inside the urllib module may cause the problem? Or is it because the website blocks bots? Other product pages on the site don't work either.
A 429 is the server asking you to stop. The web server thinks you are trying to spam or scrape it, and it doesn't like that. Generally you should honor that, and if the 429 response carries a Retry-After header you should wait that long before trying again.
If you feel the server is blocking you wrongly, you can make your request *similar* to the one a user's browser would generate, which includes the User-Agent and all the other headers a regular browser sends with a request. If the server still sends 429 despite that, it has most probably blocked your IP temporarily or permanently. In that case you would have to look into scraping through multiple IPs.
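As a sketch of both suggestions combined (the header values are illustrative, and it assumes a numeric Retry-After):
import time
import urllib.error
import urllib.request

url = ("https://shop.coles.com.au/a/a-national/product/"
       "head-shoulders-shampoo-conditioner-2in1-deep-clean")
# Headers resembling what a regular browser sends; values are illustrative.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-AU,en;q=0.9",
}

for attempt in range(3):
    try:
        resp = urllib.request.urlopen(urllib.request.Request(url, headers=headers))
        print(resp.status)
        break
    except urllib.error.HTTPError as e:
        if e.code != 429:
            raise
        # Honor the server's Retry-After header if present (assumed numeric).
        time.sleep(int(e.headers.get("Retry-After", "30")))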
I'm using the Python script from the link below on a Raspberry Pi 3, inserting my Google email address and Google Sheet number into the script:
https://gist.github.com/Thuruv/dc0e2f781b8e095b9981f265647b8304
and entering my Google password when I run the script, but I get the errors below:
Traceback (most recent call last):
File "Googlespreadsheets.py", line 53, in <module>
csv_file = gs.download(ss)
File "Googlespreadsheets.py", line 34, in download
"Authorization": "GoogleLogin auth=" + self.get_auth_token(),
File "Googlespreadsheets.py", line 29, in get_auth_token
return self._get_auth_token(self.email, self.password, source,
service="wise")
File "Googlespreadsheets.py", line 25, in _get_auth_token
return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]
File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 473, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
Navigating to the URL in the code directly links here, displaying the warning from Google:
Important: ClientLogin has been officially deprecated since April 20, 2012 and is now no longer available. Requests to ClientLogin will fail with a HTTP 404 response. We encourage you to migrate to OAuth 2.0 as soon as possible.
This code will fail with a 404 response, as your attempt demonstrates. Try migrating this code to OAuth 2.0.
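For reference, a minimal OAuth 2.0 sketch against the Sheets API v4, assuming the google-api-python-client and google-auth-oauthlib packages; the client-secret filename, spreadsheet ID, and range are placeholders rather than values from the original script:
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/spreadsheets.readonly"]

# Opens a browser for user consent; client_secret.json is a placeholder
# for an OAuth client file downloaded from the Google Cloud console.
flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
creds = flow.run_local_server(port=0)

sheets = build("sheets", "v4", credentials=creds)
result = sheets.spreadsheets().values().get(
    spreadsheetId="YOUR_SPREADSHEET_ID", range="Sheet1",
).execute()
for row in result.get("values", []):
    print(",".join(row))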
I've implemented an open source Python command-line utility, https://pypi.org/project/google-sheets-to-csv/, that should work on a Pi 3 as long as you have Python 3 installed. If you want to integrate it into a larger application, you should be able to use it as a third-party API.
Basic usage on Linux:
pip install google-sheets-to-csv
mkdir out
gs-to-csv <spreadsheet ID> <sheet selector (regex)> out/
You'll get one CSV file per sheet that matches the given regex selector.
If you have a browser installed on your Pi 3, the first time you connect you'll be asked to grant the Python application read access to all your spreadsheets. If you use your Pi 3 as a headless server, you could run the tool on your own computer and copy the generated token over, but in that case I would recommend using a Google service account instead and giving it access to the spreadsheets you want to download; a sketch of that route follows.
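For the service-account route, the only difference from the browser-based OAuth flow is how the credentials are built; a minimal sketch, assuming the google-auth and google-api-python-client packages and a key file whose service-account email the spreadsheet has been shared with:
from google.oauth2 import service_account
from googleapiclient.discovery import build

# service-account.json is a placeholder for a key file created in the
# Google Cloud console; share the spreadsheet with its client_email.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/spreadsheets.readonly"],
)
# The credentials plug into the Sheets client without any browser
# consent step, which suits a headless Pi.
sheets = build("sheets", "v4", credentials=creds)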
I'm currently trying to fix a Kodi plugin called NetfliXBMC.
It uses this url to get information on specific movies:
http://www.netflix.com/JSON/BOB?movieid=<SOMEID>
While trying to build a minimal case to ask this question I discovered that it's not even necessary to be logged in to access the information, which simplifies my question a lot.
Querying information about a movie works from wget, from curl, from incognito chrome etc. It just never works from urllib2:
# wget works just fine
$: wget -q -O- http://www.netflix.com/JSON/BOB?movieid=80021955
{"contextData":"{\"cookieDisclosure\":{\"data\":{\"showCookieBanner\":false}}}","result":"success","actionErrors":null,"fieldErrors":null,"actionMessages":null,"data":[output omitted for brevity]}
# so does curl
$: curl http://www.netflix.com/JSON/BOB?movieid=80021955
{"contextData":"{\"cookieDisclosure\":{\"data\":{\"showCookieBanner\":false}}}","result":"success","actionErrors":null,"fieldErrors":null,"actionMessages":null,"data":[output omitted for brevity}
# but python's urllib always gets a 500
$: python -c "import urllib2; urllib2.urlopen('http://www.netflix.com/JSON/BOB?movieid=80021955').read()"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 410, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 448, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
$: python --version
Python 2.7.6
What I've tried so far: several different user-agent strings, initializing a urlopener with a cookie jar, plain old urllib (doesn't raise an exception but receives the same error page).
I'm really curious as to why this might be. Thanks in advance!
It turned out to be a bug on Netflix's side when no Accept header is sent.
This doesn't work:
opener = urllib2.build_opener()
opener.open("http://www.netflix.com/JSON/BOB?movieid=80021955")
Adding a proper Accept header makes it work:
opener = urllib2.build_opener()
mimeAccept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
opener.addheaders = [('Accept', mimeAccept)]
opener.open("http://www.netflix.com/JSON/BOB?movieid=80021955")
[...]
Of course, there is another bug there: the server returns a 500 Internal Server Error instead of a 400 Bad Request, even though the problem is clearly with the request.
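For what it's worth, the same fix also works per request with a urllib2.Request object instead of a custom opener:
import urllib2

# Send the Accept header on the individual request.
req = urllib2.Request(
    "http://www.netflix.com/JSON/BOB?movieid=80021955",
    headers={"Accept": "text/html,application/xhtml+xml,"
                       "application/xml;q=0.9,*/*;q=0.8"},
)
print urllib2.urlopen(req).read()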
I have tried to use the forum item below to fix the problem, but it did not seem to work for me:
https://stackoverflow.com/questions/21955234/ckan-install-paster-error
Amazingly, I got the same issue when I tried to install CKAN on Windows:
paster db init -c XXXX/development.ini not working for CKAN - command 'db' not known
This time I am trying to install CKAN on Ubuntu 12.04 (actually 12.04.5, as I couldn't get 12.04.4), as instructed in
http://docs.ckan.org/en/latest/maintaining/installing/install-from-source.html
I am having to install everything through a proxy.
I have added the password to the SQLAlchemy URL, and development.ini does exist. This is my error (below).
Is this a proxy issue? I have used chmod to change the access to the ini file, as the other forum recommended. I also set the virtual path. The database does exist; I checked it.
(default)root#UbuntaDataServer:/usr/lib/ckan/default/src/ckan# paster db init -c /etc/ckan/default/development.ini
Traceback (most recent call last):
File "/usr/lib/ckan/default/bin/paster", line 9, in <module>
load_entry_point('PasteScript==1.7.5', 'console_scripts', 'paster')()
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
invoke(command, command_name, options, args[1:])
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
exit_code = runner.run(args)
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
result = self.command()
File "/root/ckan/lib/default/src/ckan/ckan/lib/cli.py", line 156, in command
self._load_config()
File "/root/ckan/lib/default/src/ckan/ckan/lib/cli.py", line 98, in _load_config
load_environment(conf.global_conf, conf.local_conf)
File "/root/ckan/lib/default/src/ckan/ckan/config/environment.py", line 232, in load_environment
p.load_all(config)
File "/root/ckan/lib/default/src/ckan/ckan/plugins/core.py", line 124, in load_all
unload_all()
File "/root/ckan/lib/default/src/ckan/ckan/plugins/core.py", line 182, in unload_all
unload(*reversed(_PLUGINS))
File "/root/ckan/lib/default/src/ckan/ckan/plugins/core.py", line 210, in unload
plugins_update()
File "/root/ckan/lib/default/src/ckan/ckan/plugins/core.py", line 116, in plugins_update
environment.update_config()
File "/root/ckan/lib/default/src/ckan/ckan/config/environment.py", line 270, in update_config
search.check_solr_schema_version()
File "/root/ckan/lib/default/src/ckan/ckan/lib/search/__init__.py", line 291, in check_solr_schema_version
res = urllib2.urlopen(req)
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 444, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 503: Service Unavailable
This part of the stacktrace:
File "/root/ckan/lib/default/src/ckan/ckan/lib/search/init.py", line 291, in check_solr_schema_version
res = urllib2.urlopen(req)
suggests that there is a problem connecting to Solr. You should make sure Solr is running, that you can connect to it, and that the settings in your .ini file for the location and port Solr runs on are correct.
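A quick connectivity check from the same machine can be sketched with urllib2; the URL below is an assumption based on Solr's default port, and must be adjusted to match the solr_url value in your .ini file:
import urllib2

# Placeholder URL: substitute the solr_url setting from your .ini file.
solr_url = "http://localhost:8983/solr/admin/ping"
try:
    print urllib2.urlopen(solr_url, timeout=5).read()
except urllib2.HTTPError as e:
    print "Solr responded with HTTP %d" % e.code
except urllib2.URLError as e:
    print "Could not reach Solr:", e.reason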
This is not the complete answer, but maybe it's close.
This is what I see on http://localhost/solr/
Solr Admin (ckan)
UbuntaDataServer:8983
cwd=/var/cache/jetty/tmp SolrHome=/usr/share/solr/
This is what is running on the URL. I assume this is close or correct?
Any more suggestions?
Using CKAN 2.2, I had the same problem with proxies that require authentication.
If you are installing CKAN from source, I suggest moving to version 2.2.1 (or newer); in those versions I found no issues with an authenticating proxy.
Anyway, if you're bound to a specific, older version of CKAN, you can manually add a proxy handler.
First of all, set your http_proxy environment variables (both uppercase and lowercase).
Now you can edit the file ckan/ckan/lib/search/__init__.py and get your hands dirty.
We need to declare a handle_proxy() function:
import os
import urllib2

def handle_proxy():
    # Collect proxy URLs from the environment (http_proxy, https_proxy, ...).
    proxy_settings = {}
    for k, v in os.environ.items():
        if k.lower().endswith('_proxy'):
            # ProxyHandler expects scheme keys like 'http', not 'http_proxy'.
            proxy_settings[k.lower()[:-len('_proxy')]] = v
    proxy_handler = urllib2.ProxyHandler(proxy_settings)
    opener = urllib2.build_opener(proxy_handler)
    urllib2.install_opener(opener)
Now we can call it in the check_solr_schema_version() function just before sending the request.
Replace
res = urllib2.urlopen(req)
with
handle_proxy()
res = urllib2.urlopen(req)
NOTE: this is a temporary workaround, just in case upgrading to a newer version (I currently use the 2.2.2 branch) does not fix the problem for you. I wouldn't suggest it for a production environment :)
I found another answer; if the above does not work, try:
Install this again:
sudo -E apt-get install python-pastescript
. /usr/lib/ckan/default/bin/activate
cd /usr/lib/ckan/default/src/ckan
paster make-config ckan /etc/ckan/default/development.ini
Change the Solr URL in the ini file to your IP address instead of localhost
paster db init -c /etc/ckan/default/development.ini
Hope that fixes your problem.