Unable to log in to Amazon using Python

I'm using Python 3 to write a script to log in to Amazon to grab my Kindle highlights. It is based on this article: https://blog.jverkamp.com/2015/07/02/scraping-kindle-highlights/
I am unable to successfully log in and instead get a message saying to enable cookies to continue:
<RequestsCookieJar[<Cookie ubid-main=189-4768762-8531647 for .amazon.com/>]>
Failed to login:
Please Enable Cookies to Continue
To continue shopping at Amazon.com, please enable cookies in your Web browser.
Learn more about cookies and how to enable them.
I have included requests sessions to handle cookies, but it doesn't seem to be working.
Here is the code I am using to try to do this:
import bs4, requests

session = requests.Session()
session.headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36'
}

# Log in to Amazon; we have to get the real login page to bypass CSRF
print('Logging in...')
response = session.get('https://kindle.amazon.com/login')
soup = bs4.BeautifulSoup(response.text, "html.parser")

signin_data = {}
signin_form = soup.find('form', {'name': 'signIn'})
for field in signin_form.find_all('input'):
    try:
        signin_data[field['name']] = field['value']
    except KeyError:
        pass

signin_data[u'ap_email'] = 'myemail'
signin_data[u'ap_password'] = 'mypassword'

response = session.post('https://www.amazon.com/ap/signin', data=signin_data)
soup = bs4.BeautifulSoup(response.text, "html.parser")

warning = soup.find('div', {'id': 'message_warning'})
if warning:
    print('Failed to login: {0}'.format(warning.text))
Is there something I'm missing with my use of sessions?

2020 - this code will no longer work. Amazon has added JavaScript to its sign-in pages which, if not executed, makes this sequence fail. Retrieved pages claim cookies are not enabled even though they are enabled and working. Sending the username and password together returns a verification page that includes a captcha. Sending the username first and the password in a second exchange returns "something went wrong" and asks for the credentials again. Amazon recognizes that the JavaScript was not executed.
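If you need to automate this today, the usual workaround is to drive a real browser so that Amazon's sign-in JavaScript actually executes. Below is a minimal Selenium sketch; the entry URL and the element IDs (ap_email, ap_password, signInSubmit) are assumptions based on Amazon's classic sign-in form and may change, and a captcha or OTP challenge can still stop an automated login:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# Any page that redirects to sign-in works as an entry point (an assumption).
driver.get('https://www.amazon.com/gp/css/order-history')

# IDs taken from Amazon's classic sign-in form; they may change over time.
driver.find_element(By.ID, 'ap_email').send_keys('myemail')
# In two-step flows, click the 'continue' button here before the password field loads.
driver.find_element(By.ID, 'ap_password').send_keys('mypassword')
driver.find_element(By.ID, 'signInSubmit').click()

html = driver.page_source  # scrape with the now-authenticated browser session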

Your sign-in form data is actually not correct; the fields should be email and password:
signin_data[u'email'] = 'your_email'
signin_data[u'password'] = 'your_password'
You can also avoid the try/except by using a CSS select and has_attr:
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36'
}

with requests.Session() as s:
    s.headers = headers
    r = s.get('https://kindle.amazon.com/login')
    soup = BeautifulSoup(r.content, "html.parser")
    # Keep every named input that carries a value; no try/except needed.
    signin_data = {inp["name"]: inp["value"]
                   for inp in soup.select("form[name=signIn]")[0].select("input[name]")
                   if inp.has_attr("value")}
    signin_data[u'email'] = 'your_em'
    signin_data[u'password'] = 'pass'
    response = s.post('https://www.amazon.com/ap/signin', data=signin_data)
    soup = BeautifulSoup(response.text, "html.parser")
    warning = soup.find('div', {'id': 'message_warning'})
    if warning:
        print('Failed to login: {0}'.format(warning.text))
    print(response.content)
In the first line of the output you can see <title>Amazon Kindle: Home</title> near the end:
b'<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">\n <head>\n <title>Amazon Kindle: Home</title>\n
If it is still not working, update your version of requests and maybe try another user-agent. Once I changed ap_email and ap_password to email and password, I logged in fine.

Related

Login with requests in Python not working

I tried to log in to a website, and I have no clue why it doesn't work. To log in you need something called a 'woocommerce-login-nonce'. I scraped the token (which changes when you refresh the site) via BeautifulSoup. This is how the token looks in the HTML on the site: <input type="hidden" id="woocommerce-login-nonce" name="woocommerce-login-nonce" value="28a347ad37">. After I had the token, I combined it with the username etc. and inserted it into the payload. However, I still can't log in to the site! Can anyone help? This is the form data:
username: test@gmail.com
password: TestPassword123
woocommerce-login-nonce: 28a347ad37
_wp_http_referer: /my-account/
login: Log in
Here is my code:
from bs4 import BeautifulSoup
import requests

source = requests.get("https://sneakerboxtlv.com/my-account/")
src = source.content
soup = BeautifulSoup(source.text, 'lxml')
s = requests.Session()
payload = {"username": "test@gmail.com",
           "password": "Testpassword123",
           "woocommerce-login-nonce": soup.find("input", {"name": "woocommerce-login-nonce"})['value'],
           "_wp_http_referer": "/my-account/",
           "login": "Log in"}
visit = s.get('https://sneakerboxtlv.com/my-account/')
login = s.post('https://sneakerboxtlv.com/my-account/', data=payload)
Try adding user-agent and upgrade-insecure-requests to your headers. I think your payload looks OK.
from bs4 import BeautifulSoup
import requests

source = requests.get("https://sneakerboxtlv.com/my-account/")
src = source.content
soup = BeautifulSoup(source.text, 'html.parser')
s = requests.Session()
payload = {"username": "test@gmail.com",
           "password": "Testpassword123",
           "woocommerce-login-nonce": soup.find("input", {"name": "woocommerce-login-nonce"})['value'],
           "_wp_http_referer": "/my-account/",
           "login": "Log in"}
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36",
           "upgrade-insecure-requests": "1"}
visit = s.get('https://sneakerboxtlv.com/my-account/')
login = s.post('https://sneakerboxtlv.com/my-account/', data=payload, headers=headers)
I didn't create an account to test it, so let me know how it goes.
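One more thing worth checking: a WooCommerce login nonce is usually tied to the session cookies it was issued with, and both snippets above fetch the nonce with a one-off requests.get() outside the session that later posts it. Here is a sketch that keeps the whole exchange in one session; the URL and field names are the ones from the question, and I haven't tested it against this site:

from bs4 import BeautifulSoup
import requests

with requests.Session() as s:
    s.headers.update({
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36",
        "upgrade-insecure-requests": "1",
    })
    # Fetch the login page with the same session so the nonce matches our cookies.
    page = s.get("https://sneakerboxtlv.com/my-account/")
    soup = BeautifulSoup(page.text, "html.parser")
    nonce = soup.find("input", {"name": "woocommerce-login-nonce"})["value"]
    payload = {
        "username": "test@gmail.com",
        "password": "Testpassword123",
        "woocommerce-login-nonce": nonce,
        "_wp_http_referer": "/my-account/",
        "login": "Log in",
    }
    login = s.post("https://sneakerboxtlv.com/my-account/", data=payload)
    print(login.status_code)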

Requests login into website only getting 403 error

I am trying to log in to www.ebay-kleinanzeigen.de using the requests library, but every time I try to post my data (on the register page it's the same as on the login page) I get a 403 error.
Here is the code for the register function:
import requests
from bs4 import BeautifulSoup

session = requests.Session()
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
headers = {'user-agent': user_agent, 'Referer': 'https://www.ebay-kleinanzeigen.de'}

with requests.Session() as c:
    url = 'https://www.ebay-kleinanzeigen.de/m-benutzer-anmeldung.html'
    c.headers = headers
    hp = c.get(url, headers=headers)
    soup = BeautifulSoup(hp.content, 'html.parser')
    crsf = soup.find('input', {'name': '_csrf'})['value']
    print(crsf)
    payload = dict(email='test.email@emailzz1.de', password='test123', passwordConfirmation='test123',
                   _marketingOptIn='on', _crsf=crsf)
    page = c.post(url, data=payload, headers=headers)
    print(page.text)
    print(page.url)
    print(page.status_code)
Is the problem that I need some more headers? Isn't a user-agent and a referrer enough?
I have tried adding all requested headers, but then I am getting no response.
I have managed to create a script that successfully completes the registration form you're trying to fill in, using the mechanicalsoup library. Note that you will have to manually check your email account for the message they send you to complete registration.
I realise this doesn't actually answer the question of why the requests approach returned a 403 Forbidden error, but it does complete your task without hitting the same error.
import mechanicalsoup

browser = mechanicalsoup.StatefulBrowser()
browser.open("https://www.ebay-kleinanzeigen.de/m-benutzer-anmeldung.html")
browser.select_form('#registration-form')
browser.get_current_form().print_summary()
browser["email"] = "mailuser@emailprovider.com"
browser["password"] = "testSO12345"
browser["passwordConfirmation"] = "testSO12345"
response = browser.submit_selected()
rsp_code = response.status_code
#print(response.text)
print("Response code:", rsp_code)
if rsp_code == 200:
    print("Success! Opening a local debug copy of the page... (no CSS formatting)")
    browser.launch_browser()
else:
    print("Failure!")

Python Screen Scraping Forbes.com

I'm writing a Python program to extract and store metadata from interesting online tech articles: "og:title", "og:description", "og:image", og:url, and og:site_name.
This is the code I'm using...
import urllib3
from bs4 import BeautifulSoup

# Setup Headers
headers = {}
headers['Accept'] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
headers['Accept-Charset'] = 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
headers['Accept-Encoding'] = 'none'
headers['Accept-Language'] = "en-US,en;q=0.8"
headers['Connection'] = 'keep-alive'
headers['User-Agent'] = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"

# Create the Request
http = urllib3.PoolManager()

# Create the Response
response = http.request('GET', url, headers=headers)

# BeautifulSoup - Construct
soup = BeautifulSoup(response.data, 'html.parser')

# Scrape <meta property="og:title" content=" x x x ">
title = ''
for tag in soup.find_all("meta"):
    if tag.get("property", None) == "og:title":
        if len(tag.get("content", None)) > len(title):
            title = tag.get("content", None)
The program runs fine on all but one site. On "forbes.com", I can't get to the articles using Python:
url = https://www.forbes.com/consent/?toURL=https://www.forbes.com/sites/shermanlee/2018/07/31/privacy-revolution-how-blockchain-is-reshaping-our-economy/#72c3b4e21086
I can't get past this consent page, which seems to be the "Cookie Consent Manager" solution from "TrustArc". In a browser, you provide your consent once, and on each subsequent visit you can access the articles.
If I request the "toURL" target directly:
https://www.forbes.com/sites/shermanlee/2018/07/31/privacy-revolution-how-blockchain-is-reshaping-our-economy/#72c3b4e21086
to bypass the "https://www.forbes.com/consent/" page, I'm redirected back to the consent page.
I've tried to see if there is a cookie I could set in the header, but couldn't find the magic key.
Can anyone help me?
There is a required cookie, notice_gdpr_prefs, that needs to be sent to view the data:
import requests
from bs4 import BeautifulSoup

src = requests.get(
    "https://www.forbes.com/sites/shermanlee/2018/07/31/privacy-revolution-how-blockchain-is-reshaping-our-economy/",
    headers={
        "cookie": "notice_gdpr_prefs"
    })
soup = BeautifulSoup(src.content, 'html.parser')
title = soup.find("meta", property="og:title")
print(title["content"])

Set server cookie

I'm trying to set a cookie for a website, but if I print the cookie jar I only get the session ID cookie set by the website, not the one I tried to set.
I tried to follow the documentation but can't figure out why it doesn't work.
Kind regards,
Mark
import requests
from bs4 import BeautifulSoup

s = requests.session()
cookie = {"testcookie": "testvalue"}
header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"}
s.get("http://www.example.com", cookies=cookie, headers=header)

# Get xsrf code
loginpage = s.get("https://example.com/login/", headers=header)
soup = BeautifulSoup(loginpage.text, "html.parser")
xsrflist = []
source = soup.find_all('input', {"value": True})
for sources in source:
    print(sources['value'])
    xsrflist.append(sources["value"])
xsrf = xsrflist[0]

# Login
payload = {"username": "username1", "password": "password1", 'anti_xsrf_token': xsrf}
login = s.post("https://example.com/login/", data=payload, cookies=cookie, headers=header)
print(s.headers)
print(requests.utils.dict_from_cookiejar(s.cookies))
You cannot set a server-side cookie from the client. When you send cookies to the server, the server can do anything with them, including ignoring them.
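If the goal is just to make your own cookie persist across requests, note that cookies passed per-request via cookies= are merged into that one request but never stored in the session's jar. Setting the cookie on the jar itself does persist; a sketch, reusing the placeholder domain from the question:

import requests

s = requests.Session()
# Cookies passed as `cookies=` on a single request are not kept in the jar;
# set the cookie on the session's jar so it is sent with every matching request.
s.cookies.set("testcookie", "testvalue", domain="example.com", path="/")

r = s.get("https://example.com/login/")
print(requests.utils.dict_from_cookiejar(s.cookies))  # now includes testcookie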
To install selenium
pip install selenium
Here is the solution.
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('yoururl')
htmlpage = driver.page_source
#do something with htmlpage
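If you also need your own cookie in the selenium session, the driver can add it after you have first loaded a page on the matching domain (add_cookie requires that); a sketch using the placeholder domain from the question:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://example.com/')  # must visit the domain before adding a cookie for it
driver.add_cookie({'name': 'testcookie', 'value': 'testvalue'})
driver.get('https://example.com/login/')  # the cookie is now sent with this request
htmlpage = driver.page_source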

Unable to log in to ASP.NET website with requests module of Python

I am trying to log in to an ASP.NET website using the requests module in Python.
While logging in manually on the website, I can see the following headers as well as cookies.
Request Headers:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip,deflate
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:810
Content-Type:application/x-www-form-urlencoded
Cookie:ASP.NET_SessionId=sfiziz55undlnz452gfc2d55; __utma=120481550.280814175.1411461613.1411461613.1411479534.2; __utmb=120481550.1.10.1411479534; __utmc=120481550; __utmz=120481550.1411461613.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Host:www11.davidsonsinc.com
Origin:http://www11.davidsonsinc.com
Referer:http://www11.davidsonsinc.com/Login/Login.aspx?ReturnUrl=%2fdefault.aspx
User-Agent:Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.103 Safari/537.36
Form Data:
__EVENTTARGET:
__EVENTARGUMENT:
__LASTFOCUS:
__VIEWSTATE:/wEPDwUKMTY3MDM5MDAxNQ9kFgJmD2QWAgIDD2QWAgIDD2QWAgIBD2QWBAIBD2QWAmYPZBYCAg0PEA8WAh4HQ2hlY2tlZGdkZGRkAgMPDxYCHgdWaXNpYmxlaGRkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYBBUBjdGwwMCRDb250ZW50UGxhY2VIb2xkZXJOYXZQYW5lJExlZnRTZWN0aW9uJFVzZXJMb2dpbiRSZW1lbWJlck1lsSFPYUYvIbQNBPs/54aHYcx6GyU=
__VIEWSTATEGENERATOR:1806D926
__EVENTVALIDATION:/wEWBQLy8oGOCwKanaixDwKPr7TsAQKu3uTtBgKs+sa/CQVDEisOu4Iw1m9stXWgAAz9TWQn
ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$UserName:Username
ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$Password:password
ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$RememberMe:on
ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$LoginButton:Log In
Request Cookies
ASP.NET_SessionId: nz452gfc2d55
Response Cookies
.ASPXAUTH: 1F5A05237A1AA18795ECA108CE6E70D48FE5CBB5B38D061E0770618F6C069ABA03604335B6209CF8198AD3E98AE934F14056F5C887A92BB099BF38D639A22BC12972DEEE91BCE0BF36239BD1728E228E0E9CA1E5146A6C69E906E177CC8FB27395CE2F56B4013535C62E821384231EF0AD632474D6EBCFCD859882DBE9D420B6A8816BE6
Following is the script I use to log in to websites using Python/Django.
import requests

with requests.Session() as c:
    url = 'http://www.noobmovies.com/accounts/login/?next=/'
    USERNAME = 'user name'
    PASSWORD = 'password'
    c.get(url)
    csrftoken = c.cookies['csrftoken']
    login_data = dict(csrfmiddlewaretoken=csrftoken, username=USERNAME, password=PASSWORD, next='/')
    c.post(url, data=login_data, headers={"Referer": "http://www.noobmovies.com/"})
    page = c.get('http://www.noobmovies.com/user/profile/0/')
    print(page.status_code)
But I don't know how to log in to an ASP.NET website. How do I post the data to the ASP.NET website?
import requests
from bs4 import BeautifulSoup

URL = "http://www11.davidsonsinc.com/Login/Login.aspx"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36"}
username = "username"
password = "password"

s = requests.Session()
s.headers.update(headers)
r = s.get(URL)
soup = BeautifulSoup(r.content, "html.parser")

# The hidden ASP.NET state fields must be posted back along with the credentials.
VIEWSTATE = soup.find(id="__VIEWSTATE")['value']
VIEWSTATEGENERATOR = soup.find(id="__VIEWSTATEGENERATOR")['value']
EVENTVALIDATION = soup.find(id="__EVENTVALIDATION")['value']

login_data = {"__VIEWSTATE": VIEWSTATE,
              "__VIEWSTATEGENERATOR": VIEWSTATEGENERATOR,
              "__EVENTVALIDATION": EVENTVALIDATION,
              "ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$UserName": username,
              "ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$Password": password,
              "ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$LoginButton": "Log In"}
r = s.post(URL, data=login_data)
print(r.url)
I was initially using requests + bs4 as well, but I was running into similar issues with the ASPX site I'm scraping. I found another library called robobrowser that wraps requests and bs4. With it you no longer have to manually set items such as __VIEWSTATE and friends when interacting with ASPX sites.
from robobrowser import RoboBrowser

url = 'http://www11.davidsonsinc.com'
login_url = url + '/Login/Login.aspx'
username = "username"
password = "password"

browser = RoboBrowser(history=True)
# This retrieves __VIEWSTATE and friends
browser.open(login_url)

signin = browser.get_form(id='aspnetForm')
signin["ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$UserName"].value = username
signin["ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$Password"].value = password
signin["ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$LoginButton"].value = "Log In"

browser.submit_form(signin)
print(browser.url)
I think this is cleaner and more generic.
import requests
from bs4 import BeautifulSoup

url = "http://www11.davidsonsinc.com/Login/Login.aspx"
username = "username"
password = "password"

session = requests.Session()
# Don't bother with headers at first
# session.headers.update(headers)
response = session.get(url)
soup = BeautifulSoup(response.content, "html.parser")

login_data = {}
# Get the ASP.NET state form data with BeautifulSoup
aspnetstates = ['__VIEWSTATE', '__VIEWSTATEGENERATOR', '__EVENTVALIDATION', '__EVENTTARGET',
                '__EVENTARGUMENT', '__VIEWSTATEENCRYPTED']
for aspnetstate in aspnetstates:  # search for existing ASP.NET state fields and grab their values
    result = soup.find('input', {'name': aspnetstate})
    if result is not None:  # when present (some may not be needed!)
        login_data.update({aspnetstate: result['value']})

login_data.update(
    {"ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$UserName": username,
     "ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$Password": password,
     "ctl00$ContentPlaceHolderNavPane$LeftSection$UserLogin$LoginButton": "Log In"})
response = session.post(url, data=login_data)
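Given the .ASPXAUTH response cookie captured in the question, one quick way to sanity-check that the login actually succeeded is to look for that cookie in the session jar after the post (a sketch; the cookie name is taken from the question's traffic):

# After session.post(...) above: forms-authentication sites typically hand
# back an .ASPXAUTH cookie on success, so its presence is a quick check.
if '.ASPXAUTH' in [cookie.name for cookie in session.cookies]:
    print("Logged in: .ASPXAUTH cookie received")
else:
    print("Login probably failed; inspect the returned page for an error message")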
