I have an application deployed on a private server.
ip = "100.10.1.1"
I want to read the source code/Web content of that page.
When I visit the page in a browser, it shows me "Your connection is not private".
After I proceed past the unsafe-connection warning, it takes me to the page.
Below is the code I am using to read the HTML content, but it is not giving me the correct HTML content even though it shows a 200 OK response.
I tried ignoring the SSL certificate with the code below, but it is not working:
import requests
url = "http://100.10.1.1"
requests.packages.urllib3.disable_warnings()
response = requests.get(url, verify=False)
print response.text
Can someone share some ideas on how to proceed, or am I doing anything wrong here?
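For what it's worth, a minimal sketch of one direction worth trying: the https:// scheme here is an assumption based on the browser's privacy warning (the code above requests plain http://, so the 200 OK may be coming from a different page than the one seen in the browser), and a Session is used so any redirects and cookies are handled in one place.
import requests
import urllib3

url = "https://100.10.1.1"  # assumed: the app is actually served over HTTPS

urllib3.disable_warnings()  # silence the InsecureRequestWarning from verify=False

with requests.Session() as session:
    response = session.get(url, verify=False, allow_redirects=True)
    print(response.url)          # final URL after any redirects
    print(response.status_code)
    print(response.text[:500])   # first part of the returned HTML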
Related
I am trying to make a request to a webpage using the code below and I am getting response 444.
Is there anything I can do about it?
import requests
url = "https://www.pseudosite.com/"
response = requests.get(url)
print(response) # <Response [444]>
The http.dev website says the following:
When the 444 No Response status code is generated, the server returns no information to the client and closes the HTTP connection. This error message can be found in the nginx logs and will not be sent to the client. It is useful for dealing with malicious HTTP requests, such as one that includes an illegal Host header.
I am trying to web-scrape that website using Python, but I am blocked at the first step.
I believe that you need to add headers to view this webpage. If you open devtools in your browser, you should see a GET request; if you click on its 'Headers' tab, you can build a dictionary from that data.
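A small sketch of that idea, assuming the block is due to missing browser-like headers (the header values below are illustrative; copy the real ones from the devtools request):
import requests

# illustrative headers; copy the actual values from your browser's devtools
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

url = "https://www.pseudosite.com/"
response = requests.get(url, headers=headers)
print(response.status_code)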
I had been scraping without any problem when access was suddenly denied.
The error code is as below.
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://www.investing.com/equities/alibaba
Most of the solutions on Google say to set a User-Agent header to avoid robot detection. That was already applied to the code and the scraping had worked well, but from one day onward the requests started being rejected.
So I tried fakeuseragent and random user agent to send a different User-Agent each time, but the request was always rejected.
Secondly, to rule out IP-based blocking, I used a VPN to switch IP addresses, but the request was still denied.
The last thing I found was regarding Referer Control. So I installed the Referer Control extension, but I can't find any information on how to use it.
My code is below.
url = "https://www.investing.com/equities/alibaba"
ua = generate_user_agent()
print(ua)
headers = {"User-agent":ua}
res = requests.get(url,headers=headers)
print("Response Code :", res.status_code)
res.raise_for_status()
print("Start")
Any help will be appreciated.
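For the Referer idea specifically, no browser extension is needed; requests can send the header directly. A hedged sketch (whether it gets past the block is not guaranteed, since the site may rely on other bot-detection signals such as cookies or TLS fingerprinting):
import requests
from user_agent import generate_user_agent  # assumed source of generate_user_agent

url = "https://www.investing.com/equities/alibaba"

headers = {
    "User-Agent": generate_user_agent(),
    "Referer": "https://www.investing.com/",   # pretend we navigated from the homepage
    "Accept-Language": "en-US,en;q=0.9",
}

with requests.Session() as session:            # one session so cookies persist
    res = session.get(url, headers=headers)
    print("Response Code :", res.status_code)
    res.raise_for_status()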
I wish to make requests with the Python requests module. I have a large database of URLs I wish to download. The URLs are stored in the database in the form page.be/something/something.html.
I get a lot of ConnectionErrors. If I search the URL in my browser, the page exists.
My Code:
if not webpage.url.startswith('http://www.'):
    new_html = requests.get(webpage.url, verify=True, timeout=10).text
An example of a page I'm trying to download is carlier.be/categorie/jobs.html. This gives me a ConnectionError, logged as below:
Connection error, Webpage not available for
"carlier.be/categorie/jobs.html" with webpage_id "229998"
What seems to be the problem here? Why can't requests make the connection, while I can find the page in the browser?
The Requests library requires that you supply a schema for it to connect with (the 'http://' part of the url). Make sure that every url has http:// or https:// in front of it. You may want a try/except block where you catch a requests.exceptions.MissingSchema and try again with "http://" prepended to the url.
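A small sketch of that try/except approach (fetch is a hypothetical helper name; adapt it to your own loop over the database):
import requests

def fetch(url, timeout=10):
    # hypothetical helper: retry with an http:// prefix when the schema is missing
    try:
        return requests.get(url, verify=True, timeout=timeout)
    except requests.exceptions.MissingSchema:
        return requests.get("http://" + url, verify=True, timeout=timeout)

new_html = fetch("carlier.be/categorie/jobs.html").text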
I am using the Python 2.7 requests module to build a web crawler, but I am having trouble making requests to a site that requires a certificate. When I call requests.get(url), it throws an SSLError: certificate verify failed.
So I tried requests.get(url, verify=False); it works, but it returns a meta http-equiv="refresh" tag with a url='...' that is not the one I requested. Is there a way to solve this problem, or do I need to send the certificate?
I saw in the requests docs that I can send the certificate and the key. I have the certificate.crt, but I don't have the key; is there a way to get the key?
The certificate is AC certisign multipla G5 and uses TLS 1.2
After a long time of trying to solve this issue, I figured it out. The problem was not with the SSL certificate.
I was making a request to a web page that needs a session; the URL I was using is redirected to from another page. To access it correctly, I had to send a request to that page and follow it to the last redirected page.
So, what I did was use Requests' Session object:
session = requests.Session()
response = session.get(url, verify=False)
where the url is the redirecting url.
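A slightly fuller sketch of the same idea (start_url is a placeholder for the redirecting page; response.history shows the hops requests followed):
import requests

start_url = "https://example.com/redirecting-page"  # placeholder for the redirecting url

session = requests.Session()
response = session.get(start_url, verify=False, allow_redirects=True)

print(response.history)  # the intermediate redirect responses, if any
print(response.url)      # the last redirected page, which holds the real content
html = response.text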
I'm slamming my head against the wall with this one. I've been trying every example and reading every last bit I can find online about basic HTTP authorization with urllib2, but I cannot figure out what is causing my specific error.
Adding to the frustration is that the code works for one page, and yet not for another.
Logging into www.mysite.com/adm goes absolutely smoothly; it authenticates with no problem. Yet if I change the address to 'http://mysite.com/adm/items.php?n=201105&c=200' I receive this error:
<h4 align="center" class="teal">Add/Edit Items</h4>
<p><strong>Client:</strong> </p><p><strong>Event:</strong> </p><p class="error">Not enough information to complete this task</p>
<p class="error">This is a fatal error so I am exiting now.</p>
Searching Google has led to zero information on this error.
The adm page is a frameset page; I'm not sure if that's relevant at all.
Here is the current code:
import urllib2, urllib
import sys
import re
import base64
from urlparse import urlparse
theurl = 'http://xxxxxmedia.com/adm/items.php?n=201105&c=200'
username = 'XXXX'
password = 'XXXX'
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, theurl,username,password)
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
pagehandle = urllib2.urlopen(theurl)
url = 'http://xxxxxxxmedia.com/adm/items.php?n=201105&c=200'
values = {'AvAudioCD': 1,
          'AvAudioCDDiscount': 00, 'AvAudioCDPrice': 50,
          'ProductName': 'python test', 'frmSubmit': 'Submit'}
#opener2 = urllib2.build_opener(urllib2.HTTPCookieProcessor())
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
This is just one of the many versions I've tried. I've followed every example from the Urllib2 Missing Manual but still receive the same error.
Can anyone point to what I'm doing wrong?
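One thing that may be worth trying, given the commented-out cookie processor above and the fact that /adm is a frameset: build a single opener that carries both the Basic Auth handler and a cookie jar, fetch /adm first, then do the POST through the same opener. This is a hedged sketch of that idea, not a confirmed fix:
import urllib
import urllib2
import cookielib

theurl = 'http://xxxxxmedia.com/adm/items.php?n=201105&c=200'
username = 'XXXX'
password = 'XXXX'

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, theurl, username, password)

# one opener that handles Basic Auth *and* keeps cookies between requests
opener = urllib2.build_opener(
    urllib2.HTTPBasicAuthHandler(passman),
    urllib2.HTTPCookieProcessor(cookielib.CookieJar()),
)
urllib2.install_opener(opener)

urllib2.urlopen('http://xxxxxmedia.com/adm/')  # establish any session state first

values = {'AvAudioCD': 1, 'AvAudioCDDiscount': 0, 'AvAudioCDPrice': 50,
          'ProductName': 'python test', 'frmSubmit': 'Submit'}
response = urllib2.urlopen(theurl, urllib.urlencode(values))
print response.read()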
I ran into a similar problem today. I was using basic authentication on the website I am developing and I couldn't authenticate any users.
Here are a few things you can use to debug your problem:
I used slumber.in and httplib2 for testing purposes. I ran both from an IPython shell to see what responses I was receiving.
Slumber actually uses httplib2 under the covers, so they acted similarly. I used tcpdump and later tcpflow (which shows the information in a much more readable form) to see what was really being sent and received. If you want a GUI, see Wireshark or alternatives.
I tested my website with curl and when I used curl with my username/password it worked correctly and showed the requested page. But slumber and httplib2 were still not working.
I tested my website against browserspy.dk to see what the differences were. The important thing is that browserspy's site works with basic authentication and mine did not, so I could compare the two. I had read in a lot of places that you need to send HTTP 401 Unauthorized so that the browser or tool you are using will send the username/password you provided. But what I didn't know was that you also need the WWW-Authenticate field in the response header. That was the missing piece.
What made this whole situation odd was that while testing I would see httplib2 send basic authentication headers with most of the requests (tcpflow showed that). It turns out the library does not send the username/password on the first request; only if "Status 401" AND "WWW-Authenticate" are in the response are the credentials sent on the second request, and on all requests to that domain from then on.
So to sum up, your application may be correct but you might not be returning the standard headers and status code needed for the client to send credentials. Use your debug tools to find out which it is. Also, httplib2 has a debug mode: just set httplib2.debuglevel = 1 so that debug information is printed to standard output. This is much more helpful than using tcpdump because it works at a higher level.
Hope this helps someone.
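For the httplib2 debug mode mentioned above, a minimal sketch (the URL and credentials are placeholders):
import httplib2

httplib2.debuglevel = 1  # makes httplib print the raw request/response exchange to stdout

h = httplib2.Http()
h.add_credentials('myuser', 'mypassword')  # placeholder credentials
resp, content = h.request('http://example.com/protected/', 'GET')
print(resp.status)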
About a year ago, I went through the same process and documented how I solved the problem: the direct and simple way to authenticate, and the standard one. Choose whichever you deem fit.
HTTP Authentication in Python
There is a detailed description in the urllib2 missing manual.
From the HTML you posted, I still think that you authenticate successfully but encounter an error afterwards, in the processing of your POST request. I tried your URL and, failing authentication, I got a standard 401 page.
In any case, I suggest you try running your code again and performing the same operation manually in Firefox, only this time with Wireshark capturing the exchange. You can grab the full text of the HTTP request and response in both cases and compare the differences. In most cases that will lead you to the source of the error you get.
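If a full Wireshark capture is more than you need, urllib2 can also echo the exchange itself; a small sketch (the handlers' debuglevel makes the underlying httplib print the raw request and response headers to stdout):
import urllib2

# an opener whose handlers dump each HTTP(S) exchange to stdout
opener = urllib2.build_opener(
    urllib2.HTTPHandler(debuglevel=1),
    urllib2.HTTPSHandler(debuglevel=1),
)
urllib2.install_opener(opener)
urllib2.urlopen('http://xxxxxmedia.com/adm/')  # URL from the question above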
I also found the passman approach doesn't work (sometimes?). Adding the base64 user/pass header as per this answer https://stackoverflow.com/a/18592800/623159 did work for me. I am accessing a Jenkins URL like this: http:///job//lastCompletedBuild/testReport/api/python
This works for me:
import urllib2
import base64
baseurl="http://jenkinsurl"
username=...
password=...
url="%s/job/jobname/lastCompletedBuild/testReport/api/python" % baseurl
base64string = base64.encodestring('%s:%s' % (username, password)).replace('\n', '')
request = urllib2.Request(url)
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request)
data = result.read()
This doesn't work for me, error 403 each time:
import urllib2
baseurl="http://jenkinsurl"
username=...
password=...
url = "%s/job/jobname/lastCompletedBuild/testReport/api/python" % baseurl  # same url as above
##urllib2.HTTPError: HTTP Error 403: Forbidden
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, username, password)
urllib2.install_opener(urllib2.build_opener(urllib2.HTTPBasicAuthHandler(passman)))
req = urllib2.Request(url)
result = urllib2.urlopen(req)
data = result.read()