Not able to log in to site and scrape data - python

I'm trying to scrape the site data, but facing issue while logging in to the site. when I log in to the site with username and password it does not do so.
I think there is an issue with the token, every time I try to login to the system a token is generated(check in the console headers)
import requests
from bs4 import BeautifulSoup
s = requests.session()
url = "http://indiatechnoborate.tymra.com"
with requests.Session() as s:
first = s.get(url)
start_soup = BeautifulSoup(first.content, 'lxml')
print(start_soup)
retVal=start_soup.find("input",{"name":"return"}).get('value')
print(retVal)
formdata=start_soup.find("form",{"id":"form-login"})
dynval=formdata.find_all('input',{"type":"hidden"})[1].get('name')
print(dynval)
dictdata={"username":"username", "password":"password","return":retVal,dynval:"1"
}
print(dictdata)
pr = {"task":"user.login"}
print(pr)
sec = s.post("http://indiatechnoborate.tymra.com/component/users/",data=dictdata,params=pr)
print("------------------------------------------")
print(sec.status_code,sec.url)
print(sec.text)
I want to log in to the site and want to get the data after login is done

try replacing this line:
dictdata={"username":"username", "password":"password","return":retVal,dynval:"1"}
with this one:
dictdata={"username":"username", "password":"password","return":retVal + "==",dynval:"1"}
hope this helps

Try to use authentication methods instead of passing in payload
import requests
from requests.auth import HTTPBasicAuth
USERNAME = "<USERNAME>"
PASSWORD = "<PASSWORD>"
BASIC_AUTH = HTTPBasicAuth(USERNAME, PASSWORD)
LOGIN_URL = "http://indiatechnoborate.tymra.com"
response = requests.get(LOGIN_URL,headers={},auth=BASIC_AUTH)

Related

Beautifulsoup Facebook Login

I am trying to use Beautifulsoup to scrape the post data by using the below code,
but I found that the beautifulsoup fail to login, that cause the scraper return text of all the post and include the header message (text that ask you to login).
Might I know how to modify the code in order to return info for the specific post with that id not all the posts info. Thanks!
import requests
from bs4 import BeautifulSoup
class faceBookBot():
login_basic_url = "https://mbasic.facebook.com/login"
login_mobile_url = 'https://m.facebook.com/login'
payload = {
'email': 'XXXX#gmail.com',
'pass': "XXXX"
}
post_ID = ""
# login to facebook and redirect to the link with specific post
# I guess something wrong happen in below function
def parse_html(self, request_url):
with requests.Session() as session:
post = session.post(self.login_basic_url, data=self.payload)
parsed_html = session.get(request_url)
return parsed_html
# scrape the post all <p> which is the paragraph/content part
def post_content(self):
REQUEST_URL = f'https://m.facebook.com/story.php?story_fbid={self.post_ID}&id=7724542745'
soup = BeautifulSoup(self.parse_html(REQUEST_URL).content, "html.parser")
content = soup.find_all('p')
post_content = []
for lines in content:
post_content.append(lines.text)
post_content = ' '.join(post_content)
return post_content
bot = faceBookBot()
bot.post_ID = "10158200911252746"
You can't, facebook encrypts password and you don't have encryption they use, server will never accept it, save your time and find another way
#AnsonChan yes, you could open the page with selenium, login and then copy it's cookies to requests:
from selenium import webdriver
import requests
driver = webdriver.Chrome()
driver.get('http://facebook.com')
# login manually, or automate it.
# when logged in:
session = requests.session()
[session.cookies.update({cookie['name']: cookie['value']}) for cookie in driver.get_cookies()]
driver.quit()
# get the page you want with requests
response = session.get('https://m.facebook.com/story.php?story_fbid=123456789')

Python requests post login not working for this site

So I've tried everything to try to login to this site with sessions and python requests but it doesn't seem to work and keeps redirecting me to the login page when I try to access the protected url. (status_code = 302)
import time
import smtplib
import requests
from bs4 import BeautifulSoup
from lxml import html
url = "https://beatyourcourse.com/school_required#"
protected_url = "https://beatyourcourse.com/flyering"
session = requests.Session()
responce = session.get(url)
tree = html.fromstring(responce.text)
token = list(set(tree.xpath("//input[#name='authenticity_token']/#value")))[0]
payload = {
'user[email]' : '****',
'user[password]' : '****',
'authenticity_token' : token
}
responce = session.post(url, data = payload) #Logging in
responce = session.get(protected_url) # visiting protected url
print responce.url # prints "https://beatyourcourse.com/school_required#" (redirected to login page)

Mechanize not logging in?

I'm very new to python, and I'm trying to scrape a webpage using BeautifulSoup, which requires a log in.
So far I have
import mechanize
import cookielib
import requests
from bs4 import BeautifulSoup
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.open('URL')
#login form
br.select_form(nr=2)
br['email'] = 'EMAIL'
br['pass'] = 'PASS'
br.submit()
soup = BeautifulSoup(br.response().read(), "lxml")
with open("output1.html", "w") as file:
file.write(str(soup))
(With "URL" "EMAIL" and "PASS" being the website, my email and password.)
Still the page I get in output1.html is the logged out page, rather than what you would see after logging in?
How can I make it so it logs in with the details and returns what's on the page after log in?
Cheers for any help!
Let me suggest another way to obtain desired page.
It may be a little bit easy to troubleshoot.
First, you should login manually with open any browser Developer tools's page Network. After sending login credentials, you will get a line with POST request. Open the request and right side you will get the "form data" information.
Use this code to send login data and get response:
`
from bs4 import BeautifulSoup
import requests
session = requests.Session()
url = "your url"
req = session.get(url)
soup = BeautifulSoup(req.text, "lxml")
# You can collect some useful data here (like csrf code or some token)
#fill in form data here
params = {'login': 'your login',
'password': 'your password'}
req = session.post(url)
I hope this code will be helpful.

Login into web site using Python

This question has been addresses in various shapes and flavors but I have not been able to apply any of the solutions I read online.
I would like to use Python to log into the site: https://app.ninchanese.com/login
and then reach the page: https://app.ninchanese.com/leaderboard/global/1
I have tried various stuff but without success...
Using POST method:
import urllib
import requests
oURL = 'https://app.ninchanese.com/login'
oCredentials = dict(email='myemail#hotmail.com', password='mypassword')
oSession = requests.session()
oResponse = oSession.post(oURL, data=oCredentials)
oResponse2 = oSession.get('https://app.ninchanese.com/leaderboard/global/1')
Using the authentication function from requests package
import requests
oSession = requests.session()
oResponse = oSession.get('https://app.ninchanese.com/login', auth=('myemail#hotmail.com', 'mypassword'))
oResponse2 = oSession.get('https://app.ninchanese.com/leaderboard/global/1')
Whenever I print oResponse2, I can see that I'm always on the login page so I am guessing the authentication did not work.
Could you please advise how to achieve this?
You have to send the csrf_token along with your login request:
import urllib
import requests
import bs4
URL = 'https://app.ninchanese.com/login'
credentials = dict(email='myemail#hotmail.com', password='mypassword')
session = requests.session()
response = session.get(URL)
html = bs4.BeautifulSoup(response.text)
credentials['csrf_token'] = html.find('input', {'name':'csrf_token'})['value']
response = session.post(URL, data=credentials)
response2 = session.get('https://app.ninchanese.com/leaderboard/global/1')

How to maintain login session while logging in using Python Script

I want to login to Ideone.com using python script and then extract stuff from my own account using subsequent requests using python script.
This is what I used for logging in to the website:
import requests
import urllib
from bs4 import BeautifulSoup
url='http://ideone.com/account/login/'
body = {'username':'USERNAME', 'password':'PASSWORD'}
s = requests.Session()
loginPage = s.get(url)
soup = BeautifulSoup(loginPage.text)
r = s.post(soup.form['action'], data = body)
print r
This code successfully logs me in to my ideone account.
But if I make subsequent call(using BeautifulSoup) to access my account details, it send me HTML of login page again.
How can I save session for a particular script so that it accepts the subsequent calls?
Thanks in advance and sorry if this has been asked earlier.
Here is how we can do this:
from requests import session
from bs4 import BeautifulSoup
payload = {
'action' : 'login',
'username' : 'USERNAME',
'password' : 'PASSWORD'
}
login_url='http://ideone.com/account/login/'
with session() as c:
c.post(login_url, data = payload)
request = c.get('http://ideone.com/myrecent')
print request.headers
print request.text

Categories