I have been through numerous Google results and Stack Overflow questions trying to figure out how to do the following. Most have suggested the use of requests' session class to store session information.
Unfortunately, none of the solutions provided worked with any of the sites I have tried. Obviously I'm doing something wrong, and I want to figure out what that is before I drive myself crazy.
My current code:
from requests import session
from bs4 import BeautifulSoup as bs
USER = 'leinad177'
PASSWORD = '' # removed for obvious reasons
URL = 'https://en.wikipedia.org/w/index.php?title=Special:UserLogin'
with session() as s:
    login_data = {'wpName': USER,
                  'wpPassword': PASSWORD}
    r = s.post(URL, data=login_data)
    r = s.get('https://en.wikipedia.org/wiki/Special:Preferences')
    print(bs(r.text, 'html.parser').find('div', {'id': 'mw-content-text'}).p.text.strip())
    # "Please log in to change your preferences."
You are missing some POST parameters. wpLoginToken is probably the only one that is mandatory.
wpLoginAttempt: Log in
wpLoginToken: ...
wpForceHttps: 1
Also, the correct URL is:
https://en.wikipedia.org/w/index.php?title=Special:UserLogin&action=submitlogin&type=login
wpLoginToken is not static, so you will have to parse it with BeautifulSoup before logging in.
How to get the token:
from bs4 import BeautifulSoup as bs
import requests
s = requests.session()
URL = 'https://en.wikipedia.org/w/index.php?title=Special:UserLogin'
req = s.get(URL).text
html = bs(req, "html.parser")
wp_login_token = html.find("input", {"name": "wpLoginToken"}).attrs['value']
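Putting this together, a hedged sketch of the full login POST using the token parsed above; the submit URL and field names are the ones listed in this answer and have not been re-verified against the current Wikipedia login form:
login_url = ('https://en.wikipedia.org/w/index.php?'
             'title=Special:UserLogin&action=submitlogin&type=login')
payload = {
    'wpName': 'leinad177',           # username from the question
    'wpPassword': 'your_password',   # placeholder
    'wpLoginAttempt': 'Log in',
    'wpForceHttps': '1',
    'wpLoginToken': wp_login_token,  # token parsed above
}
r = s.post(login_url, data=payload)
print(r.status_code)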
This is what I have so far. I'm very new to this, so please point out if I'm doing anything wrong.
import requests
url = "https://9anime.to/user/watchlist"
payload = {
    "username": "blahblahblah",
    "password": "secretstringy"
    # anything else?
}

with requests.Session() as s:
    res = s.post(url, data=payload)
    print(res.status_code)
You will need to inspect the form and see where it posts to, which is given by its action attribute. In this case it posts to user/ajax/login. Instead of posting to the watchlist URL, post your credentials to the login URL; once you are logged in, you can request your watchlist.
from lxml.html import fromstring
import requests
url = "https://9anime.to/user/watchlist"
loginurl = "https://9anime.to/user/ajax/login"
payload = {
    "username": "someemail@gmail.com",
    "password": "somepass"
}

with requests.Session() as s:
    res = s.post(loginurl, data=payload)
    print(res.content)
    # b'{"success":true,"message":"Login successful"}'
    res = s.get(url)
    tree = fromstring(res.content)
    elem = tree.cssselect("div.watchlist div.widget-body")[0]
    print(elem.text_content())
    # Your watch list is empty.
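A small optional addition (not part of the original answer): since the endpoint returns JSON like the sample above, the login result can be checked right after the post, inside the with block. This assumes the response always carries a "success" key:
    res = s.post(loginurl, data=payload)
    if not res.json().get("success"):  # assumption: the JSON always includes "success"
        raise RuntimeError("Login failed: " + res.text)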
You would need to have knowledge (documentation of some form) of what that URL expects and how you are supposed to interact with it. There is no way to know given only the information you have provided.
If you have some system that can already interact with that URL (e.g. you're able to log in with your browser), then you could try to reverse-engineer what your browser is doing...
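For example, here is a purely hypothetical sketch of replaying a login request captured from the browser's network tab; every URL, header and field name below is a placeholder, not something taken from the site in question:
import requests

# Copy these from the captured request in the browser's developer tools
headers = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",
}
data = {"username": "...", "password": "..."}  # whatever fields the captured request sends

with requests.Session() as s:
    r = s.post("https://example.com/login", headers=headers, data=data)
    print(r.status_code, r.headers.get("Content-Type"))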
I'm trying to scrape data from a site, but I'm running into an issue while logging in. When I log in with my username and password, the login does not go through.
I think there is an issue with the token: every time I try to log in, a new token is generated (check the headers in the console).
import requests
from bs4 import BeautifulSoup

url = "http://indiatechnoborate.tymra.com"

with requests.Session() as s:
    first = s.get(url)
    start_soup = BeautifulSoup(first.content, 'lxml')
    print(start_soup)
    retVal = start_soup.find("input", {"name": "return"}).get('value')
    print(retVal)
    formdata = start_soup.find("form", {"id": "form-login"})
    dynval = formdata.find_all('input', {"type": "hidden"})[1].get('name')
    print(dynval)
    dictdata = {"username": "username", "password": "password", "return": retVal, dynval: "1"}
    print(dictdata)
    pr = {"task": "user.login"}
    print(pr)
    sec = s.post("http://indiatechnoborate.tymra.com/component/users/", data=dictdata, params=pr)
    print("------------------------------------------")
    print(sec.status_code, sec.url)
    print(sec.text)
I want to log in to the site and then get the data that is available after login.
Try replacing this line:
dictdata={"username":"username", "password":"password","return":retVal,dynval:"1"}
with this one:
dictdata={"username":"username", "password":"password","return":retVal + "==",dynval:"1"}
Hope this helps.
Try using requests' built-in authentication support instead of passing the credentials in the payload:
import requests
from requests.auth import HTTPBasicAuth
USERNAME = "<USERNAME>"
PASSWORD = "<PASSWORD>"
BASIC_AUTH = HTTPBasicAuth(USERNAME, PASSWORD)
LOGIN_URL = "http://indiatechnoborate.tymra.com"
response = requests.get(LOGIN_URL, auth=BASIC_AUTH)
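Note, as an addition to this answer: HTTP Basic auth only helps if the server actually uses it; a form-based login will usually ignore it. A quick way to check is to look for a 401 response with a WWW-Authenticate header:
probe = requests.get(LOGIN_URL)
print(probe.status_code, probe.headers.get("WWW-Authenticate"))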
After some discussion of my problem in "Unable to print links using beautifulsoup while automating through selenium",
I realized that the main problem is the URL, which requests is not able to fetch. The URL of the page is actually https://society6.com/discover, but since I am using Selenium to log into my account, the URL becomes https://society6.com/society?show=2.
However, I can't use the second URL with requests, since it raises an error. How do I scrape information from a URL like this?
You need to log in first!
To do that, you can use the bs4.BeautifulSoup library to pull the CSRF token out of the login page.
Here is an implementation that I have used:
import requests
from bs4 import BeautifulSoup
BASE_URL = "https://society6.com/"
def log_in_and_get_session():
    """
    Get the session object with login details
    :return: requests.Session
    """
    ss = requests.Session()
    ss.verify = False  # optional, for sites without a valid certificate
    text = ss.get(f"{BASE_URL}login").text
    csrf_token = BeautifulSoup(text, "html.parser").input["value"]
    data = {"username": "your_username", "password": "your_password", "csrfmiddlewaretoken": csrf_token}
    results = ss.post("{}login".format(BASE_URL), data=data)
    if results.ok:
        print("Login success", results.status_code)
        return ss
    else:
        print("Can't login", results.status_code)
Using the `post` method to log in...
Hope this helps you!
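For reference, a hedged usage sketch of the function above; the /discover path comes from the question, and whether the first <input> on the login page really holds the CSRF token is an assumption of this answer:
session = log_in_and_get_session()
if session:  # the function returns None when the login fails
    page = session.get(f"{BASE_URL}discover")
    print(page.status_code)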
This question has been addressed in various shapes and flavors, but I have not been able to apply any of the solutions I read online.
I would like to use Python to log into the site: https://app.ninchanese.com/login
and then reach the page: https://app.ninchanese.com/leaderboard/global/1
I have tried various things, but without success...
Using the POST method:
import urllib
import requests
oURL = 'https://app.ninchanese.com/login'
oCredentials = dict(email='myemail@hotmail.com', password='mypassword')
oSession = requests.session()
oResponse = oSession.post(oURL, data=oCredentials)
oResponse2 = oSession.get('https://app.ninchanese.com/leaderboard/global/1')
Using the authentication helper from the requests package:
import requests
oSession = requests.session()
oResponse = oSession.get('https://app.ninchanese.com/login', auth=('myemail@hotmail.com', 'mypassword'))
oResponse2 = oSession.get('https://app.ninchanese.com/leaderboard/global/1')
Whenever I print oResponse2, I can see that I'm always on the login page, so I am guessing the authentication did not work.
Could you please advise how to achieve this?
You have to send the csrf_token along with your login request:
import urllib
import requests
import bs4
URL = 'https://app.ninchanese.com/login'
credentials = dict(email='myemail@hotmail.com', password='mypassword')
session = requests.session()
response = session.get(URL)
html = bs4.BeautifulSoup(response.text, 'html.parser')
credentials['csrf_token'] = html.find('input', {'name':'csrf_token'})['value']
response = session.post(URL, data=credentials)
response2 = session.get('https://app.ninchanese.com/leaderboard/global/1')
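As an optional sanity check (an addition; it assumes the login page contains a csrf_token input while the leaderboard page does not):
if 'csrf_token' in response2.text:
    print("Still on the login page -- the login probably failed")
else:
    print("Logged in, leaderboard fetched")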
I want to log in to Ideone.com using a Python script and then extract data from my own account with subsequent requests.
This is what I used to log in to the website:
import requests
import urllib
from bs4 import BeautifulSoup

url = 'http://ideone.com/account/login/'
body = {'username': 'USERNAME', 'password': 'PASSWORD'}

s = requests.Session()
loginPage = s.get(url)
soup = BeautifulSoup(loginPage.text, 'html.parser')
r = s.post(soup.form['action'], data=body)
print(r)
This code successfully logs me in to my Ideone account.
But if I make a subsequent call (using BeautifulSoup) to access my account details, it sends me the HTML of the login page again.
How can I save the session for a particular script so that it accepts subsequent calls?
Thanks in advance, and sorry if this has been asked earlier.
Here is how we can do this:
from requests import session
from bs4 import BeautifulSoup
payload = {
    'action': 'login',
    'username': 'USERNAME',
    'password': 'PASSWORD'
}

login_url = 'http://ideone.com/account/login/'

with session() as c:
    c.post(login_url, data=payload)
    request = c.get('http://ideone.com/myrecent')
    print(request.headers)
    print(request.text)
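Since BeautifulSoup is imported above but never used, here is a hedged sketch of feeding the logged-in page into it; the tag choice is generic and not verified against Ideone's actual markup:
soup = BeautifulSoup(request.text, 'html.parser')
for link in soup.find_all('a'):  # list every link on the "my recent" page
    print(link.get('href'))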