Python urllib or requests library get stuck to open certain URLs?

Python urllib or requests library get stuck to open certain URLs? - python

I am trying to send HTTP GET request to certain website, for example, https://www.united.com, but it get stuck with no response.
Here is the code:
from urllib.request import urlopen
url = 'https://www.united.com'
resp = urlopen(url,timeout=10 )
Every time, it goes timeout. But the same code works for other URLs, for example, https://www.aa.com.
So I wonder what is behind https://www.united.com that keeps me from getting the HTTP request through. Thank you!
Update:
Adding a request header still doesn't work for this site:
from urllib.request import urlopen
url = 'https://www.united.com'
req = urllib.request.Request(
url,
data=None,
headers={
'User-Agent':' Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36'
}
)
resp = urlopen(req,timeout=3 )

The server of united.com might be only responding to certain user-agent strings or request headers and blocking for others. You have to send certain headers or user-agent string which are allowed by their server. This depends upon website to website who want to add some more security to their applications so they are very specific about user-agents like which resource is trying to access them.

Related

Blob URLs with python requests

I'm trying to figure out why the request i send gives me this error :
requests.exceptions.InvalidSchema: No connection adapters were found for 'blob:https://192.168.56.108/7020557a-95f0-4560-a3d4-94e23bc3db4a'
In another thread, i read that it was due to https missing. But in my url i still do have it. Here is the code i wrote to send the request :
url_image = 'blob:https://192.168.56.108/7020557a-95f0-4560-a3d4-94e23bc3db4a'
headers = {'Origin': 'https://192.168.56.108',
'Referer':'',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.27'}
response = s.get(url_image, stream=True, verify=False)
print(response.url)
I also read in another thread that blob url where generated by the browser once the page is loaded. So i thought a doing a GET request to the page where i would usually download first then sending the POST request but it doesn't work still. I thought it could be for the fact that the blob url was not the one associated to the page i loaded (a new one would have been generated).
For a bit more context, i load a page on which there is a graphic that i can download. To check what happens, i use the network console. What happens is that each time i click and download that graphic. A GET request is made with a blob URL that changes each time i download.
So my question is more how to get the correct url with python requests and why would i get the first error when sending the request to the blob url ?

Why does post request only works the first time?

I'm trying to make a web scraper with python, I made it with selenium but it is really slow.Then i saw that i could speed up the project because of a button that make a post request.
import requests
from bs4 import BeautifulSoup
url = "http://vidtome.host/tnoz00am9j8p"
myobj = {
'op': 'download1',
'code':'tnoz00am9j8p',
'hash': 'the hash',
'imhuman': 'Proceed to video'
}
x = requests.post(url, data = myobj)
print(x.text)
That's the code and it works but only for the first time.
When I started it the first time it doesn't show any error and it printed me out the page with the right changes, but when i started it later it gave me no error but it printed me out the page with no changes like it doesn't do anything.
How can it be possible?

Requests are faster, but you cannot extract dynamically rendered content. However this is probably not the issue.
Problem is that you do not have access to the website.
If it is a basic human checking system, you could try to add user agent to your request
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36 Edg/88.0.705.68',
}
r = requests.get(url, headers=headers)
If this will not work, I would recommend looking into the data that you are passing. Maybe it is validating through it and it contains expired values or something.

Unable to fetch a response - Request library Python

I am unable to fetch a response from this url. While it works in browser, even in incognito mode. Not sure why it is not working. It is just keep running without any output. No errors. I even tried request headers by setting 'user-agent' key but again received no response
Following is the code used:
import requests
response = requests.get('https://www1.nseindia.com/ArchieveSearch?h_filetype=eqbhav&date=04-12-2020&section=EQ')
print(response.text)
I want html text from the response page for further use.

Your server is checking to see if you are sending the request from a web browser. If not, it's not returning anything. Try this:
import requests
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0'}
r=requests.get('https://www1.nseindia.com/ArchieveSearch?h_filetype=eqbhav&date=04-12-2020&section=EQ', timeout=3, headers=headers)
print(r.text)

How to implement ajax request using Python Request

I'm trying to log into a website using Python request. Unfortunately, it is always showing this error when printing its content.
b'<head><title>Not Acceptable!</title></head><body><h1>Not Acceptable!</h1><p>An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.</p></body></html>
For reference my code
from requests import Session
import requests
INDEX_URL = 'https://phpzag.com/demo/ajax_login_script_with_php_jquery/index.php'
URL = 'https://phpzag.com/demo/ajax_login_script_with_php_jquery/welcome.php'
LOGIN_URL = 'https://phpzag.com/demo/ajax_login_script_with_php_jquery/login.php' # Or whatever the login request url is
payload = {'user_email': 'test#phpzag.com','password':'test'}
s = requests.Session()
user_agent = {'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36'}
t=s.post(LOGIN_URL, data=payload, headers=user_agent)
r=s.get('https://phpzag.com/demo/ajax_login_script_with_php_jquery/welcome.php',headers=user_agent,cookies=t.cookies.get_dict())
print(r.content)
May I know what is missing and how can I get HTML code of welcome page from this
UPDATE
I'm trying to get make an API call after login authentication. However, I'm not able to succeed in login authentication. Hence I am not able to get the response of API Call. As per my thought it due to multi-factor authentication it is getting failed. I need to know how can I implement this?
For eg: www.abc.com is the URL of the website. The login is done through JS form submission Hence URL is specified in the ajax part. On the success of that, there is another third authentication party(okta) which will also verify the credentials and finally reach the home page. then I need to call the real API for my task.
But it is not working.
import requests
import sys
class Login:
def sendRequestWithAuthentication(self,loginDetails,requestDetails):
user_agent = {'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36'}
action_url=loginDetails['action_url'] if 'action_url' in loginDetails.keys() else None
pay_load=loginDetails['payload'] if 'payload' in loginDetails.keys() else None
session_requests = requests.session()
if action_url and pay_load:
act_resp=session_requests.post(action_url, data=pay_load, headers=user_agent,verify=False,files=[ ])
print(act_resp)
auth_cookies=act_resp.cookies.get_dict()
url,method,request_payload = requestDetails['url'],requestDetails['method'],requestDetails['payload']
querystring=requestDetails['querystring']
response=session_requests.get(url,headers=user_agent,cookies=auth_cookies,data=request_payload,params=querystring)
print(response)
return response.json()
In the above action URL is the API given in the ajax part & in the second request, the URL is the API address for that GET.
In short, may I know how can implement multifactor authentication in python request
My Doubt
Do we need the cookies from the login form page to include in the login request
How to implement multifactor authentication in python request(Here we don't need any pin or something it is done through RSA.)Is there any need of a certificate for login as it now raising unable to validate the SSL certificate
Give a dummy example api that is implement such kind of scenario

No, you make it complex.This code worked:
import requests
login_url = "https://phpzag.com/demo/ajax_login_script_with_php_jquery/login.php"
welcome_url = "https://phpzag.com/demo/ajax_login_script_with_php_jquery/welcome.php"
payload = 'user_email=test#phpzag.com&password=test&login_button='
login_headers = {
'x-requested-with': 'XMLHttpRequest',
'Content-Type': 'application/x-www-form-urlencoded', # its urlencoded instead of form-data
'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36',
}
s = requests.Session()
login = s.post(login_url, headers=login_headers, data=payload) # post requests
welcome = s.get(welcome_url, headers=login_headers)
print(welcome.text)
Result:
.....Hello, <br><br>Welcome to the members page.<br><br>

TL;DR
Change the part of your code that says data=payload to json=payload, and it should work.
Direct answer to your question
How [does one] implement [an] AJAX request using Python Requests?
You cannot do that. An AJAX request is specifically referring to a Javascript-based HTTP request. To quote from W3 school's AJAX introduction page, "AJAX = Asynchronous JavaScript And XML".
Indirect answer to your question
What I believe you're asking is how to perform auth/login HTTP requests using the popular python package, requests. The short answer— unfortunately, and like most things—is that it depends. Various auth pages handle the auth requests differently, and so you might have to do different things in order to authenticate against the specific web service.
Based on your code
I'm going to make some assumptions that the login page is probably looking for a POST request with the authentication details (e.g. credentials) in the form of a JSON object based on your code, and based on the response back from the server being a 406 error meaning that you're sending data with an accept header that doesn't align with how the server wants to respond.
When using requests, using the data parameter to the request function will send the data "raw"; that is, it'll send it in the native data format it is (like in cases of binary data), or it'll translate it to standard HTML form data if that format doesn't work (e.g. key1=value1&key2=value2&key3=value3, this form has the MIME type of application/x-www-form-urlencoded and is what requests will send when data has not been specified with an accept header). I'm going to make an educated guess based on the fact that you put your credentials into a dictionary that the login form is expecting a POST request with a JSON-formatted body (most modern web apps do this), and you were under the impression that setting the data parameter to requests will make this into a JSON object. This is a common gotcha/misconception with requests that has bitten me before. What you want is instead to pass the data using the json parameter.
Your code:
from requests import Session
import requests
INDEX_URL = 'https://phpzag.com/demo/ajax_login_script_with_php_jquery/index.php'
URL = 'https://phpzag.com/demo/ajax_login_script_with_php_jquery/welcome.php'
LOGIN_URL = 'https://phpzag.com/demo/ajax_login_script_with_php_jquery/login.php' # Or whatever the login request url is
payload = {'user_email': 'test#phpzag.com','password':'test'}
s = requests.Session()
user_agent = {'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36'}
t=s.post(LOGIN_URL, data=payload, headers=user_agent)
r=s.get('https://phpzag.com/demo/ajax_login_script_with_php_jquery/welcome.php',headers=user_agent,cookies=t.cookies.get_dict())
print(r.content)
Fixed (and cleaned up) code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Test script to login to php web app.
"""
import requests
INDEX_URL = 'https://phpzag.com/demo/ajax_login_script_with_php_jquery/index.php'
URL = 'https://phpzag.com/demo/ajax_login_script_with_php_jquery/welcome.php'
LOGIN_URL = 'https://phpzag.com/demo/ajax_login_script_with_php_jquery/login.php' # Or whatever the login request url is
payload = {
'user_email': 'test#phpzag.com',
'password':'test'
}
headers = {
'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36'
}
session = requests.Session()
auth_response = session.post(
url=LOGIN_URL,
json=payload, # <--- THIS IS THE IMPORTANT BIT. Note: data param changed to json param
headers=user_agent
)
response = session.get(
'https://phpzag.com/demo/ajax_login_script_with_php_jquery/welcome.php',
headers=headers,
cookies=auth_response.cookies.get_dict() # TODO: not sure this is necessary, since you're using the session object to initiate the request, so that should maintain the cookies/session data throughout the session...
)
print(response.content)
Check out this section of the requests documentation on POST requests, if you scroll down a bit from there you'll see the docs talk about the github API which expects JSON and how to handle that.
Auth can be tricky overall. Sometimes things will want "basic auth", which requests will expect you to pass as a tuple to the auth parameter, sometimes they'll want a bearer token / OAUTH thing which can get headache-inducing-ly complicated/annoying.
Hope this helps!

You are missing the User agent that the server (apache?) requires
Try this:
import requests
from requests import Session
URL = 'https://phpzag.com/demo/ajax_login_script_with_php_jquery/welcome.php'
LOGIN_URL = 'https://phpzag.com/demo/ajax_login_script_with_php_jquery/login.php' # Or whatever the login request url is
payload = {'user_email': 'test#phpzag.com','password':'test'}
user_agent = {'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36'}
s = requests.Session()
x=s.get(URL, headers=user_agent)
x=s.post(LOGIN_URL, data=payload, headers=user_agent)
print(x.content)
print(x.status_code)

Take a look at Requests: Basic Authentication
import requests
requests.post(URL, auth=('user', 'pass'))
# If there are some cookies you need to send
cookies = dict(cookies_are='working')
requests.post(URL, auth=('user', 'pass'), cookies=cookies)

403 Forbidden Error when scraping a site, user-agents already used and updated. Any ideas?

As the title above states I am getting a 403 error. The URLs generated are valid, I can print them and then open them in my browser just fine.
I've got a user agent, it's the exact same one that my browser sends when accessing the page I want to scrape pulled straight from chrome devtools. I've tried using sessions instead of a straight request, I've tried using urllib, and I've tried using a generic request.get.
Here's the code I'm using, that 403s. Same result with request.get etc.
headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36'}
session = requests.Session()
req = session.get(URL, headers=headers)
So yeah, I assume I'm not creating the useragent write so it can tell I am scraping. But I'm not sure what I'm missing, or how to find that out.

I got all headers from DevTools and I started removing headers one by one and I found it needs only Accept-Language and it doesn't need User-Agent and it doesn't need Session.
import requests
url = 'https://www.g2a.com/lucene/search/filter?&search=The+Elder+Scrolls+V:+Skyrim&currency=nzd&cc=NZD'
headers = {
'Accept-Language': 'en-US;q=0.7,en;q=0.3',
}
r = requests.get(url, headers=headers)
data = r.json()
print(data['docs'][0]['name'])
Result:
The Elder Scrolls V: Skyrim Special Edition Steam Key GLOBAL

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.