Fill and submit an HTML form - Python

I am trying to write a Python (2.7) script that goes to a form on a website (with the name "form1") and fills in the first input field in said form with the word hello, the second input field with the word Ronald, and the third field with ronaldG54@gmail.com.
Can anyone help me code this, or give me any tips or pointers on how to do it?

Aside from the Mechanize and Selenium David has mentioned, this can also be achieved with Requests and BeautifulSoup.
To be more clear: use Requests to send requests to and retrieve responses from the server, and use BeautifulSoup to parse the response HTML to figure out what parameters to send to the server.
Here is an example script I wrote that uses both Requests and BeautifulSoup to submit a username and password to log in to Wikipedia:
import requests
from bs4 import BeautifulSoup as bs

def get_login_token(raw_resp):
    # parse the login page and pull out the hidden wpLoginToken value
    soup = bs(raw_resp.text, 'lxml')
    token = [n['value'] for n in soup.find_all('input')
             if n.get('name') == 'wpLoginToken']
    return token[0]

payload = {
    'wpName': 'my_username',
    'wpPassword': 'my_password',
    'wpLoginAttempt': 'Log in',
    #'wpLoginToken': '',  # filled in below, after fetching the login page
}

with requests.session() as s:
    resp = s.get('http://en.wikipedia.org/w/index.php?title=Special:UserLogin')
    payload['wpLoginToken'] = get_login_token(resp)
    response_post = s.post('http://en.wikipedia.org/w/index.php?title=Special:UserLogin&action=submitlogin&type=login',
                           data=payload)
    response = s.get('http://en.wikipedia.org/wiki/Special:Watchlist')
Update:
For your specific case, here is the working code:
import requests
from bs4 import BeautifulSoup as bs

def get_session_id(raw_resp):
    # parse the survey page and pull out the hidden survey_session_id value
    soup = bs(raw_resp.text, 'lxml')
    token = soup.find_all('input', {'name': 'survey_session_id'})[0]['value']
    return token

payload = {
    'f213054909': 'o213118718',  # 21st checkbox
    'f213054910': 'Ronald',      # first input field
    'f213054911': 'ronaldG54@gmail.com',
}

url = r'https://app.e2ma.net/app2/survey/39047/213008231/f2e46b57c8/?v=a'

with requests.session() as s:
    resp = s.get(url)
    payload['survey_session_id'] = get_session_id(resp)
    response_post = s.post(url, data=payload)
    print response_post.text

Take a look at Mechanize and Selenium. Both are excellent pieces of software that let you automate filling in and submitting a form, among other browser tasks.
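For instance, with Selenium you could drive a real browser to fill the three fields from the question. This is a minimal sketch, assuming a hypothetical page URL and that the form's input fields appear in document order:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://example.com/page-with-form')  # hypothetical URL for the page containing form1

# locate the form by its name, then its input fields in document order
form = driver.find_element_by_name('form1')
fields = form.find_elements_by_tag_name('input')
fields[0].send_keys('hello')
fields[1].send_keys('Ronald')
fields[2].send_keys('ronaldG54@gmail.com')

form.submit()
driver.quit()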

Related

Beautifulsoup Facebook Login

I am trying to use BeautifulSoup to scrape post data with the code below, but I found that BeautifulSoup fails to log in, which causes the scraper to return the text of all the posts, including the header message (the text asking you to log in).
How can I modify the code so it returns the info for the specific post with that ID, not all the posts? Thanks!
import requests
from bs4 import BeautifulSoup

class faceBookBot():
    login_basic_url = "https://mbasic.facebook.com/login"
    login_mobile_url = 'https://m.facebook.com/login'
    payload = {
        'email': 'XXXX@gmail.com',
        'pass': "XXXX"
    }
    post_ID = ""

    # log in to Facebook and redirect to the link with the specific post
    # I guess something goes wrong in the function below
    def parse_html(self, request_url):
        with requests.Session() as session:
            post = session.post(self.login_basic_url, data=self.payload)
            parsed_html = session.get(request_url)
            return parsed_html

    # scrape all of the post's <p> tags, which hold the paragraph/content part
    def post_content(self):
        REQUEST_URL = f'https://m.facebook.com/story.php?story_fbid={self.post_ID}&id=7724542745'
        soup = BeautifulSoup(self.parse_html(REQUEST_URL).content, "html.parser")
        content = soup.find_all('p')
        post_content = []
        for lines in content:
            post_content.append(lines.text)
        post_content = ' '.join(post_content)
        return post_content

bot = faceBookBot()
bot.post_ID = "10158200911252746"
You can't. Facebook encrypts the password, and you don't have the encryption they use, so the server will never accept it. Save your time and find another way.
@AnsonChan yes, you could open the page with Selenium, log in, and then copy its cookies over to Requests:
from selenium import webdriver
import requests

driver = webdriver.Chrome()
driver.get('http://facebook.com')

# log in manually, or automate it.
# when logged in:
session = requests.session()
for cookie in driver.get_cookies():
    session.cookies.update({cookie['name']: cookie['value']})
driver.quit()

# get the page you want with requests
response = session.get('https://m.facebook.com/story.php?story_fbid=123456789')

Log in to a PHP website and scrape it using Python, without Selenium

I am trying to log in to the website https://www.icloudemserp.com/tpct/ to scrape some data, but I am unable to log in. I tried it with Requests in Python, using get to fetch the URL and post to send the post URL and the form data; it just doesn't work, or else I don't understand it.
I know how I can achieve this with Selenium. The website only posts to https://www.icloudemserp.com/corecampus/checkuser1.php after I log in, and this is the form data:
branchid: 1
userid:****
pass_word:***
branchid: 17
sel_acad_yr: 2013-2014
sel_sem: Sem 1
import requests
from bs4 import BeautifulSoup

login_data = {
    'branchid': '1',
    'userid': '****',
    'pass_word': '***',
    'branchid': '17',  # note: duplicate dict keys collapse, so only '17' is actually sent
    'sel_acad_yr': '2013-2014',
    'sel_sem': 'Sem1',
}

with requests.Session() as s:
    url = 'https://www.icloudemserp.com/tpct/'
    r = s.get(url)
    #soup = BeautifulSoup(r.content, 'html5lib')
    r = s.post('https://www.icloudemserp.com/corecampus/checkuser1.php', data=login_data)
    print(r.content)
Am I even on the right track?

Unable to log in to indeed.com using Python Requests

I'm trying to write code to collect resumes from the indeed.com website.
To download resumes from indeed.com you have to log in with your account.
My problem is that after posting the data it returns response [200], which indicates a successful post, but the login still fails.
Here is my code:
import requests
from bs4 import BeautifulSoup
from lxml import html

page = requests.get('https://secure.indeed.com/account/login')
soup = BeautifulSoup(page.content, 'html.parser')
row_text = soup.text

# pull the tokens out of the page text by string slicing
surftok = str(row_text[row_text.find('"surftok":') + 11:row_text.find('","tmpl":')])
formtok = str(row_text[row_text.find('"tk":') + 6:row_text.find('","variation":')])
logintok = str(row_text[row_text.find('"loginTk":') + 11:row_text.find('","debugBarLink":')])
cfb = int(str(row_text[row_text.find('"cfb":') + 6:row_text.find(',"pvr":')]))
pvr = int(str(row_text[row_text.find('"pvr":') + 6:row_text.find(',"obo":')]))
hl = str(row_text[row_text.find('"hl":') + 6:row_text.find('","co":')])

data = {
    'action': 'login',
    '__email': 'myEmail',
    '__password': 'myPassword',
    'remember': '1',
    'hl': hl,
    'cfb': cfb,
    'pvr': pvr,
    'form_tk': formtok,
    'surftok': surftok,
    'login_tk': logintok
}

response = requests.post("https://secure.indeed.com/", data=data)
print response
print 'myEmail' in response.text
It returns response [200], but when I search for my email in the response page to confirm that the login succeeded, I don't find it. It seems the login fails for a reason I don't know.
Send headers as well in your post request; you can copy them from the request your browser makes (visible in its developer tools).
headers = {'user-agent': 'Chrome'}
response = requests.post("https://secure.indeed.com/", headers=headers, data=data)
Some websites use JavaScript redirection, and indeed.com is one of them. Unfortunately, Python Requests does not support JavaScript redirection. In such situations we may use Selenium.
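A minimal sketch of that Selenium fallback; the field names __email and __password are borrowed from the form data above, but they are assumptions and may differ on the live page:
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://secure.indeed.com/account/login')

# assumed field names, taken from the form data in the question
driver.find_element_by_name('__email').send_keys('myEmail')
driver.find_element_by_name('__password').send_keys('myPassword')
driver.find_element_by_name('__password').submit()

# after the JavaScript redirect completes, grab the logged-in page
html = driver.page_source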

Unable to access webpage with Requests in Python

After some discussion of my problem in "Unable to print links using beautifulsoup while automating through selenium",
I realized that the main problem is the URL, which Requests is not able to extract. The URL of the page is actually https://society6.com/discover, but I am using Selenium to log into my account, so the URL becomes https://society6.com/society?show=2.
However, I can't use the second URL with Requests since it shows an error. How do I scrape information from a URL like this?
You need to log in first!
To do that you can use Requests together with the bs4.BeautifulSoup library.
Here is an implementation that I have used:
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://society6.com/"

def log_in_and_get_session():
    """
    Get the session object with login details
    :return: requests.Session
    """
    ss = requests.Session()
    ss.verify = False  # optional, for sites with certificate problems
    text = ss.get(f"{BASE_URL}login").text
    csrf_token = BeautifulSoup(text, "html.parser").input["value"]
    data = {"username": "your_username", "password": "your_password", "csrfmiddlewaretoken": csrf_token}
    results = ss.post("{}login".format(BASE_URL), data=data)
    if results.ok:
        print("Login success", results.status_code)
        return ss
    else:
        print("Can't login", results.status_code)
Using the `post` method to log in...
Hope this helps you!
Edit
Added the beginning of the function.
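For example, a hypothetical usage of the function above, fetching the logged-in URL from the question:
ss = log_in_and_get_session()
if ss is not None:
    # URL from the question: the page shown after logging in
    response = ss.get(f"{BASE_URL}society?show=2")
    print(response.status_code)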

Mechanize not logging in?

I'm very new to Python, and I'm trying to scrape a webpage that requires a login, using BeautifulSoup.
So far I have:
import mechanize
import cookielib
import requests
from bs4 import BeautifulSoup

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

br.open('URL')

# login form
br.select_form(nr=2)
br['email'] = 'EMAIL'
br['pass'] = 'PASS'
br.submit()

soup = BeautifulSoup(br.response().read(), "lxml")
with open("output1.html", "w") as file:
    file.write(str(soup))
(With "URL" "EMAIL" and "PASS" being the website, my email and password.)
Still the page I get in output1.html is the logged out page, rather than what you would see after logging in?
How can I make it so it logs in with the details and returns what's on the page after log in?
Cheers for any help!
Let me suggest another way to obtain the desired page. It may be a little easier to troubleshoot.
First, log in manually with your browser's developer tools open on the Network tab. After sending the login credentials, you will see a line with a POST request. Open the request, and on the right side you will find the "form data" information.
Use this code to send the login data and get the response:
from bs4 import BeautifulSoup
import requests

session = requests.Session()
url = "your url"
req = session.get(url)
soup = BeautifulSoup(req.text, "lxml")
# You can collect some useful data here (like a csrf code or some token)

# fill in the form data here
params = {'login': 'your login',
          'password': 'your password'}

req = session.post(url, data=params)
I hope this code will be helpful.
