I am trying to instrument Chrome and the driver needs to include a payload:
payload = {
    'response_type': 'code',
    'redirect_uri': config.redirect_uri,
    'client_id': client_code,
}
#
## Print chrome_options
# print(get_chrome_options())
#
## Open browser window to authenticate using User ID and Password
browser = webdriver.Chrome(config.chrome_driver_path, options=self.get_chrome_options(), params=payload)
This is generating an error:
got an unexpected keyword argument 'params'
Is it possible to send a payload with the driver?
@jared - I was doing this out of ignorance. I managed to solve the issue by using query strings/parameters:
import urllib.parse
url_encoded_client_code = urllib.parse.quote(client_code)
url_with_qstrings = ("https://url?response_type=code&redirect_uri=http%3A%2F%2Flocalhost"
                     "&client_id=" + url_encoded_client_code)
browser = webdriver.Chrome(config.chrome_driver_path, options=self.get_chrome_options())
browser.get(url_with_qstrings)
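As a side note, urllib.parse.urlencode can build and percent-encode the whole query string in one step; a minimal sketch, reusing config, client_code, and the chrome-options helper from the question (the base URL is still a placeholder):
import urllib.parse
from selenium import webdriver

# urlencode percent-encodes every value, so no manual quoting is needed.
# config, client_code, and get_chrome_options come from the question's code.
query = urllib.parse.urlencode({
    'response_type': 'code',
    'redirect_uri': 'http://localhost',
    'client_id': client_code,
})
url_with_qstrings = "https://url?" + query
browser = webdriver.Chrome(config.chrome_driver_path, options=get_chrome_options())
browser.get(url_with_qstrings)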
First of all, I'm a Python beginner and this is my first attempt at scraping websites.
I'm trying to scrape a website and I found my way in using cookies, but the cookies seem to expire every 30 minutes, so I tried to log in using a username, password, and cookies instead, but I can't get it to work.
The code I'm trying with:
import requests
from bs4 import BeautifulSoup

login_url = 'https://aff.ven-door.com/login'
response = requests.get(login_url)
soup = BeautifulSoup(response.content, 'html.parser')
token = soup.find('input', {'name': '_token'})['value']
payload = {
    'token': token,
    'username': 'mail',
    'password': 'pass'
}
with requests.session() as s:
    s.post(login_url, data=payload)
    logged_in = s.get('https://aff.ven-door.com/products')
    soup = BeautifulSoup(logged_in.content, 'html.parser')
I suspected that I had created the payload dictionary the wrong way, so I tried putting the token at the top of the payload, and in another attempt as the last key-value pair, but it didn't work.
I also tried changing the username key to name, as it appears in the page source, i.e.:
{'token': 'actual token',
 'name': 'actual login',
 'password': 'actual password'}
but that didn't work either.
I also tried replacing the token key with either '_token' or 'csrf-token' to match the page source, with no luck (actually, I don't know what I'm doing).
I'm sure I'm doing something wrong but can't figure out what it is.
I'm not sure how to do this using only requests/BeautifulSoup by sending the credentials in the payload. It depends on how the site has set up its authentication, so it might be challenging.
One option is to do the scraping using Selenium (a Python library) instead.
Selenium opens an actual browser that is controlled by Python. You can find the input fields, type the credentials there, and then click the submit button. From there you can continue scraping as normal by getting the cookies from the Selenium browser's logged-in session and using those, as sketched below.
This will be slower than finding a way to do it with only requests/BeautifulSoup.
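A minimal sketch of that approach, with placeholder element IDs for the login form (inspect the page source for the real ones):
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://aff.ven-door.com/login')
# The element IDs below are placeholders; look up the real ones in the page source.
driver.find_element(By.ID, 'username').send_keys('mail')
driver.find_element(By.ID, 'password').send_keys('pass')
driver.find_element(By.ID, 'submit').click()
# Copy the logged-in session's cookies into a requests session
# and continue scraping with plain requests from here on.
s = requests.Session()
for cookie in driver.get_cookies():
    s.cookies.set(cookie['name'], cookie['value'])
driver.quit()
logged_in = s.get('https://aff.ven-door.com/products')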
I am trying to build an Azure Function to get some data from the Autodesk Forge API and put it into a centralised data warehouse. When I test everything locally it works and updates my tables; however, when I deploy it as a function to Azure I get an authentication issue when trying to use a 3-legged token.
I am using this python wrapper: https://github.com/lfparis/forge-python-wrapper/tree/75868b11a3d8bac4b65f66b905c2313a35ba5711/forge
When I run locally, the authentication works fine and I get the access token etc. However, when running on Azure, instead of being taken to my callback URL, I am redirected to https://auth.autodesk.com/as/NH3Mc/resume/as/authorization.ping?opentoken=... and so there is no access token in the URL to extract. Do you know why I might be being redirected here?
This is the section of code that handles the three-legged auth:
"""https://forge.autodesk.com/en/docs/oauth/v2/reference/http/authorize-GET/""" # noqa:E501
url = "{}/authorize".format(AUTH_V1_URL)
params = {
"redirect_uri": self.redirect_uri,
"client_id": self.client_id,
"scope": " ".join(self.scopes),
"response_type": response_type,
}
url = self._compose_url(url, params)
logger.info('Start url: %s', url)
chrome_driver_path = os.environ.get("CHROMEDRIVER_PATH")
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--log-level=3")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")
google_chrome_path = os.environ.get("GOOGLE_CHROME_BIN")
if google_chrome_path:
chrome_options.binary_location = google_chrome_path
try:
driver = Chrome(
executable_path=chrome_driver_path,
chrome_options=chrome_options,
)
except (TypeError, WebDriverException):
chrome_driver_path = chromedriver_autoinstaller.install()
driver = Chrome(
executable_path=chrome_driver_path,
chrome_options=chrome_options,
)
try:
driver.implicitly_wait(15)
driver.get(url)
logger.info('Start driver url: %s', driver.current_url)
user_name = driver.find_element(by=By.ID, value="userName")
logger.info('Username: %s', self.username)
user_name.send_keys(self.username)
verify_user_btn = driver.find_element(
by=By.ID, value="verify_user_btn"
)
verify_user_btn.click()
logger.info('After first click url: %s', driver.current_url)
pwd = driver.find_element(by=By.ID, value="password")
logger.info('pwd: %s', self.password)
pwd.send_keys(self.password)
submit_btn = driver.find_element(by=By.ID, value="btnSubmit")
submit_btn.click()
logger.info('After Password url: %s', driver.current_url)
allow_btn = driver.find_element(by=By.ID, value="allow_btn")
allow_btn.click()
driver.implicitly_wait(15)
logger.info('Driver url: %s', driver.current_url)
return_url = driver.current_url
driver.quit()
except Exception as e:
self.logger.error(
"Please provide the correct user information."
+ "\n\nException: {}".format(e)
)
"chrome://settings/help"
"https://chromedriver.chromium.org/downloads"
sys.exit()
logger.info("Return url %s", return_url)
params = self._decompose_url(return_url)
logger.info("Returns params from Auth: %s", params)
self.__dict__.update(params)```
Apart from the fact that this is an unusual (and likely not officially supported) workflow for obtaining a 3-legged token, I'm not even sure this is a problem on the Autodesk Forge side. It could be some difference between your local Selenium setup and the setup running in Azure. Have you tried inspecting the HTTP headers sent back and forth when running your Python app in Azure? Any differences there could provide more clues as to why you're not being redirected to the expected URL.
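One way to inspect that traffic from a Selenium-driven Chrome is to enable DevTools performance logging; a rough sketch, assuming Chrome with a recent Selenium (the goog:loggingPrefs capability is Chrome-specific):
import json
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
# Record DevTools network events so the requests, responses, and
# redirects can be examined after the run.
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
driver = webdriver.Chrome(options=options)
driver.get(url)  # the composed /authorize URL from the code above
for entry in driver.get_log("performance"):
    message = json.loads(entry["message"])["message"]
    if message["method"] == "Network.responseReceived":
        response = message["params"]["response"]
        print(response["status"], response["url"])
driver.quit()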
I'm new to web scraping, and I want to download a runtime CSV file (the button has no URL; it uses a JS function) after logging in. I have tried using https://curl.trillworks.com/# and it works fine, but it uses dynamic cookies.
import requests
cookies = {
    ...,
}
headers = {
    ...
}
data = {
    ...
}
s = requests.Session()
response = s.post(posturl, headers=headers, cookies=cookies, data=data, verify=False)
The cookies are dynamic, so every time I want to download the files I have to get new cookies, so I tried something different using the same script:
payload = {
    'login': 'login',
    'username': 'My_name',
    'password': 'My_password',
}
logurl = "http:..."
posturl = 'http:...'
s = requests.Session()
response = s.post(logurl, headers=headers, data=payload)
# response = s.post(posturl, data=payload, auth=(my_name, my_password))  # This too gives me the wrong output
But this doesn't give me the right output; it gives me the first page as text/html. The response headers give me two different content types:
print response.headers['Content-Type']
The right output would be 'text/csv;charset=UTF-8', but it gives me 'text/html;charset=UTF-8', and the status_code for both is 200.
For information, the posturl for the CSV file is the same as for the HTML page.
After some deep searching, I found:
There are several quite different tools for web scraping:
1. requests or urllib: widely used tools. They let us make POST and GET requests, log in, and create persistent cookies using Session(); there is also a great helper at https://curl.trillworks.com/. But that is not enough for complicated data extraction.
2. BeautifulSoup or lxml: used for HTML parsing, navigating the HTML source, something like regular expressions for extracting the desired elements from an HTML page (get the title, find the div with id=12345). These tools can't understand a JS button and can't perform actions like POST or GET requests, clicking a button, or submitting a form; they are just a way to read data from a response.
3. Mechanize, RoboBrowser, or MechanicalSoup: great tools for web browsing and cookie handling, with browser history. We can consider them a mix of requests and BeautifulSoup: we can make GET and POST requests, submit forms, and navigate the page's HTML content easily, as with BeautifulSoup. But they are not a real browser, so they cannot execute or understand JS, send asynchronous requests, move the scrollbar, or export selected data from a table; they are not enough for complicated requests.
4. Selenium: a powerful tool and a real browser. We can get data as we want: make GET and POST requests, search, submit, select, move the scrollbar. It is used just like any browser; with Selenium nothing is impossible. We can use a real browser with a GUI, or the 'headless' option for a server environment.
Below I explain, step by step, how to submit a form and click a JS button in a server environment.
A. Install the webdriver for a server environment.
Open a terminal:
sudo apt install chromium-chromedriver
sudo pip install selenium
If you want to use a webdriver with a GUI, download it from https://chromedriver.chromium.org/downloads
B. Example in Python 2.7 (it also works in Python 3; just edit the print lines):
import os
import time

from selenium.webdriver.chrome.options import Options
from selenium import webdriver

options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
# define the download directory
options.add_experimental_option("prefs", {
    "download.default_directory": "/tmp/",
    "download.prompt_for_download": False,
})
browser = webdriver.Chrome(chrome_options=options)  # see edit for recent code change.
USERNAME = 'mail'
PASSWORD = 'password'
browser.get('http://www.example.com')
print browser.title
user_input = browser.find_element_by_id('mail_input')
user_input.send_keys(USERNAME)
pass_input = browser.find_element_by_id('pass_input')
pass_input.send_keys(PASSWORD)
login_button = browser.find_element_by_id("btn_123")
login_button.click()
csv_button = browser.find_element_by_id("btn45875465")
csv_button.click()
browser.close()  # close the current page; you can also use `browser.quit()` to destroy the whole webdriver instance
# Check if the file was downloaded completely and successfully
file_path = '/tmp/file_name'
while not os.path.exists(file_path):
    time.sleep(1)
if os.path.isfile(file_path):
    print 'The file was downloaded completely and successfully'
I have to log in to a site (for example, I will use facebook.com). I can manage the login process using Selenium, but I need to do it with a POST. I've tried to use requests, but I'm not able to pass the needed info to the Selenium webdriver in order to enter the site as a logged-in user. I've found online that there is a library that integrates Selenium and requests, https://pypi.org/project/selenium-requests/, but the problem is that there is no documentation and I'm stuck at the same point.
With selenium-requests:
from seleniumrequests import Chrome

webdriver = Chrome()
url = "https://www.facebook.com"
webdriver.get(url)
params = {
    'email': 'my_email',
    'pass': 'my_password'
}
resp = webdriver.request('POST', 'https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110', params)
webdriver.get(url)
# I hoped that the newly opened page would be the one with me logged in, but it did not work
With Selenium and requests, passing the cookies:
import requests
from selenium import webdriver

driver = webdriver.Chrome()
url = "https://www.facebook.com"
driver.get(url)
# storing the cookies generated by the browser
request_cookies_browser = driver.get_cookies()
# making a persistent connection using the requests library
params = {
    'email': 'my_email',
    'pass': 'my_password'
}
s = requests.Session()
# passing the cookies generated from the browser to the session
c = [s.cookies.set(c['name'], c['value']) for c in request_cookies_browser]
resp = s.post('https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110', params)  # I get a 200 status_code
# passing the cookies of the response to the browser
dict_resp_cookies = resp.cookies.get_dict()
response_cookies_browser = [{'name': name, 'value': value} for name, value in dict_resp_cookies.items()]
c = [driver.add_cookie(c) for c in response_cookies_browser]
driver.get(url)
In both cases, if I print the cookies at the end, it seems that something has changed from the beginning, but the page remains the one with the login form.
These are the two attempts I've tried; I've included both, but it is sufficient to find the solution to one of the two.
Can someone help me and tell me what I have to do or change to open the page with me logged in?
Thank you in advance!
I had the same problem.
In your code, you just pass params as-is, so it is sent as query-string parameters rather than form data.
In this example the fix would be data=params in:
resp = webdriver.request('POST', 'https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110', data=params)
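A minimal end-to-end sketch with that fix applied, reusing the selenium-requests Chrome driver and the placeholder credentials from the question:
from seleniumrequests import Chrome

driver = Chrome()
url = "https://www.facebook.com"
driver.get(url)
# Send the credentials as form data (data=), not as query parameters.
params = {
    'email': 'my_email',
    'pass': 'my_password'
}
resp = driver.request(
    'POST',
    'https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110',
    data=params,
)
# The cookies set by the POST are shared with the browser session,
# so reloading the page should now show the logged-in view.
driver.get(url)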
I would like to integrate the Python Selenium and requests modules to authenticate on a website.
I am using the following code:
import requests
from selenium import webdriver
driver = webdriver.Firefox()
url = "some_url" #a redirect to a login page occurs
driver.get(url) #the login page is displayed
#making a persistent connection to authenticate
params = {'os_username':'username', 'os_password':'password'}
s = requests.Session()
resp = s.post(url, params) #I get a 200 status_code
#passing the cookies to the driver
driver.add_cookie(s.cookies.get_dict())
The problem is that when I go back to the browser, the login page is still there when I try to access the URL, even though I passed the cookies generated by the requests session.
How can I modify the code above to get through the authentication page?
I finally found out what the problem was.
Before making the post request with the requests library, I should have passed the cookies of the browser first.
The code is as follows:
import requests
from selenium import webdriver
driver = webdriver.Firefox()
url = "some_url" #a redirect to a login page occurs
driver.get(url)
#storing the cookies generated by the browser
request_cookies_browser = driver.get_cookies()
#making a persistent connection using the requests library
params = {'os_username':'username', 'os_password':'password'}
s = requests.Session()
#passing the cookies generated from the browser to the session
c = [s.cookies.set(c['name'], c['value']) for c in request_cookies_browser]
resp = s.post(url, params) #I get a 200 status_code
#passing the cookie of the response to the browser
dict_resp_cookies = resp.cookies.get_dict()
response_cookies_browser = [{'name':name, 'value':value} for name, value in dict_resp_cookies.items()]
c = [driver.add_cookie(c) for c in response_cookies_browser]
#the browser now contains the cookies generated from the authentication
driver.get(url)
I had some issues with this code because it set duplicate cookies on top of the original browser cookies (from before login); I solved this by clearing the cookies before setting the login cookies on the original browser. I used this command:
driver.delete_all_cookies()
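In context, that means clearing the browser's cookies just before copying over the response cookies in the answer's code above; a small sketch:
# Clear the pre-login cookies so the authenticated ones are not duplicated.
driver.delete_all_cookies()
for name, value in resp.cookies.get_dict().items():
    driver.add_cookie({'name': name, 'value': value})
driver.get(url)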