I am currently trying out Selenium to develop a program that automates testing of website login forms.
I am using Selenium to find a form on the websites I am testing, and I've noticed that different websites have different form names and form ids, and some websites have neither.
But from my observations, the form action is always there, and I've used the code below to retrieve the form's action:
import requests
from bs4 import BeautifulSoup

request = requests.get("whicheverwebsite")
parseHTML = BeautifulSoup(request.text, 'html.parser')
htmlForm = parseHTML.form
formName = htmlForm['action']
I am trying to retrieve the form and then call form.submit() to submit it.
I know of the functions find_element_by_name and find_element_by_id, but I am trying to find the element by action, and I am not sure how this can be done.
I've found the answer to this.
By using XPath with the form tag and its action attribute, I am able to achieve this.
form = driver.find_element_by_xpath("//form[@action='" + formName + "']")
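Putting it together, a minimal sketch (keeping the question's placeholder URL):

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://whicheverwebsite"
# Grab the form's action attribute with BeautifulSoup.
formName = BeautifulSoup(requests.get(url).text, 'html.parser').form['action']

driver = webdriver.Chrome()
driver.get(url)
# Locate the same form in Selenium by that action and submit it.
driver.find_element_by_xpath("//form[@action='" + formName + "']").submit()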
I would recommend including the URL of one or two of the sites you are trying to scrape, along with your full code. Based on the information above, it appears that you are using BeautifulSoup rather than Selenium.
I would use the following:
from selenium import webdriver
url = 'https://whicheverwebsiteyouareusing.com'
driver = webdriver.Chrome()
driver.get(url)
From there you have many options to select the form, but again, without the actual site we can't identify which would be most relevant. I would recommend reading https://selenium-python.readthedocs.io/locating-elements.html to find out which would be most applicable to your situation.
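For example, a few of the locator strategies from those docs (the ids and names below are placeholders, not taken from any real site):

form = driver.find_element_by_id('login-form')                    # by id, if the form has one
form = driver.find_element_by_name('loginForm')                   # by name
form = driver.find_element_by_xpath("//form[@action='/login']")   # by action attribute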
Hope this helps.
Keep in mind that a login page can have multiple form tags even if you see only one. Here is an example where a login page has only one visible form even though there are three in the DOM.
So the most reliable way is to dig into each form (if there are multiple) and check two things:
Whether there's a [type=password] element (we definitely need a password to log in)
Whether there's a second input there (though this can be considered optional)
Ruby example:
forms = page.all(:xpath, '//form') # retrieve all the forms and iterate
forms.each do |form|
  # if there's a password field and exactly two input fields in general
  if form.has_css?('input[type=password]') && form.all(:xpath, './/input').count == 2
    return form
  end
end
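For readers working in Python/Selenium rather than Ruby, an equivalent sketch (assuming driver is a Selenium WebDriver that has already loaded the login page):

# Find the form that has a password field and exactly two inputs overall.
login_form = None
for form in driver.find_elements_by_xpath('//form'):
    has_password = form.find_elements_by_css_selector('input[type=password]')
    if has_password and len(form.find_elements_by_xpath('.//input')) == 2:
        login_form = form
        break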
I am trying to scrape the earnings calendar data from the table on zacks.com; the URL is attached below.
https://www.zacks.com/stock/research/aapl/earnings-calendar
The thing is, I am trying to scrape all the data from the table, but it has a dropdown list to select 10, 25, 50 or 100 rows per page. Ideally I want to scrape all 100 rows, but when I select 100 from the dropdown list, the URL doesn't change. My code is below.
Note that the website blocks the default user agent, so I had to use the Chrome driver to impersonate a human visiting the site. The result from pd.read_html is a list of all the tables, and d[4] returns the earnings calendar with only 10 rows (which I want to change to 100).
import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome('../files/chromedriver96')
symbol = 'AAPL'
url = 'https://www.zacks.com/stock/research/{}/earnings-calendar'.format(symbol)
driver.get(url)
content = driver.page_source
d = pd.read_html(content)
d[4]
So I'm calling for help from anyone who can guide me on this.
Thanks!
UPDATE: it looks like my last post was downvoted due to a lack of clear articulation and evidence of past research. Maybe I am still a newbie at posting questions on this site. Actually, I have found several pages, including this one, with the same issue, but the solutions didn't seem to work for me, which is why I posted this as a new question.
UPDATE 12/05:
Thanks a lot for the advice. As commented below, I finally got it working. Below is the code I used:
import time

dropdown = driver.find_element_by_css_selector('#earnings_announcements_earnings_table_length')
time.sleep(1)
hundreds = dropdown.find_element_by_xpath(".//option[. = '100']")
hundreds.click()
Having taken a look, this is not going to be something that is easy to scrape. Given that the table is produced by JavaScript, I would say you have two options.
Option one:
Use Selenium to render the page, allowing the JavaScript to run. This way you can simply use the id/class of the dropdown to interact with it.
You can then scrape the data by reading the values in the table.
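A sketch of option one, reusing the dropdown id that the question's update found to work (assumed still correct):

import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.zacks.com/stock/research/AAPL/earnings-calendar')

# Pick the 100-row option from the rows-per-page dropdown.
dropdown = driver.find_element_by_css_selector('#earnings_announcements_earnings_table_length')
dropdown.find_element_by_xpath(".//option[. = '100']").click()

# Re-parse the rendered page; index 4 held the earnings calendar in the
# question's run, though that position may change.
tables = pd.read_html(driver.page_source)
print(tables[4])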
Option two:
This is the more challenging one. Look through the requests the page makes and try to find the ones whose responses contain the data you then see on the page. By cross-referencing these, there will be a way to request the data you want directly.
You may find that to get at the data you need to accept a key from the original request to the page and then send that key as part of a second request. This way you can scrape the data without having to run a Selenium instance, which will run more efficiently.
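A rough sketch of option two; the endpoint and parameters below are made up for illustration and have to be replaced with whatever the browser's network tab actually shows:

import requests

session = requests.Session()  # keeps any cookies/keys between requests
# The first request establishes whatever the data endpoint expects.
session.get('https://www.zacks.com/stock/research/AAPL/earnings-calendar')

# Hypothetical data endpoint; find the real one in the network tab.
response = session.get('https://www.zacks.com/hypothetical/data/endpoint',
                       params={'ticker': 'AAPL'})
print(response.text)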
My personal suggestion is to go with option one, as computer resources are cheap and developer time is expensive.
I want to sharpen my Python skills and build a program that uses http://www.allflicks.net/ to search for a given title and return whether it is available on Netflix. Since Netflix has removed access to their public API, I need to figure out how to search for the show inside allflicks and return the results. The issue I'm having is that, the way allflicks works, the list of results is narrowed down as the name of the show is typed.
The examples I've seen here and on other websites assume the website's search box will automatically take you to the results once you fill it in and call .click(), but this isn't working for me. Any ideas on a specific library I might need, or any general advice, would be extremely helpful. Thank you.
You want to intercept the requests your browser is making so that you can view them. There should be a request that searches for movies based on the name. You can then use urllib or a package like requests to make the same request from inside Python and interpret the response to determine whether or not the movie is on Netflix.
You may want to look into an intercepting proxy or a browser add-on that will let you look at the requests and responses your browser makes.
In Firefox you can use Tamper Data, an add-on that lets you capture outgoing requests.
A quick peek at allflicks.com shows me that a request is sent out every time you type into the search box. The response is labelled as text/html, but it appears to actually be JSON. Each request has a ton of query parameters, but the important one is search_value or something similarly named.
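A sketch of that approach; the endpoint URL below is hypothetical and has to be replaced with the one you capture:

import requests

# Hypothetical endpoint; substitute the URL captured with the proxy/add-on.
SEARCH_URL = 'http://www.allflicks.net/endpoint/captured/from/your/proxy'
params = {'search_value': 'House of Cards'}  # the parameter observed above
response = requests.get(SEARCH_URL, params=params)
# The response is labelled text/html but is actually JSON, so parse it as such.
results = response.json()
print(results)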
Hello, so I wrote some code for this exact thing, since a lot of websites covered the USA or other regions and couldn't be translated into an answer that worked for my Netflix region.
https://github.com/Eglis05/netflix-selenium
You can have a look at it and report anything you don't like. :)
Most of the code I copy-pasted below.
import time

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

browser = webdriver.Chrome(ChromeDriverManager().install())
browser.get(URL)
try:
    username_field = browser.find_element_by_id("id_userLoginId")
    username_field.send_keys(USERNAME)
    password_field = browser.find_element_by_id("id_password")
    password_field.send_keys(PASSWORD)
except:
    username_field = browser.find_element_by_id('email')
    username_field.send_keys(USERNAME)
    password_field = browser.find_element_by_id("password")
    password_field.send_keys(PASSWORD)
login_button = browser.find_element_by_class_name('login-button')
login_button.click()
time.sleep(2)
profile_button = browser.find_element_by_class_name('profile-icon')
profile_button.click()
search_box = browser.find_element_by_class_name('searchTab')
search_box.click()
search_box = browser.find_element_by_css_selector("input[placeholder='Titles, people, genres']")
And let's say the movie you want to search for is saved in the variable movie:
search_box.send_keys(movie)
time.sleep(1)
names = browser.find_elements_by_class_name('slider-refocus')
You might want to check only the first few movies that show up (say 1-4). Save that number in the variable nr_checkings:
for i in range(nr_checkings):
    name = names[2 * i + 1].get_attribute('aria-label')
    if movie == name:
        return True
return False
I am lost on how to use mechanize to fill out the form on the following website and then click submit.
https://dxtra.markets.reuters.com/Dx/DxnHtm/Default.htm
On the left side, click Currency Information,
then Value Dates.
This is for a finance class of mine, and we need the dates for many different currency pairs. I want to put a date in the "Trade Date" field, select the "Base" and "Quote" currencies I want, click submit, and then get the dates off the next page using Beautiful Soup.
1) Is this possible using mechanize?
2) How do I go about it? I have read the docs on the website and looked all through Stack Overflow, but I can't seem to get this to work at all. I was trying to get the form and then set what I want, but I can't get the correct forms.
Any help would be greatly appreciated. I am not tied down to mechanize; I'm just not sure what the best module to use is.
This is what I have so far, and I get ZERO forms to attach a value to.
from mechanize import Browser
import urllib2
br = Browser()
baseURL = "https://dxtra.markets.reuters.com/Dx/DxnHtm/Default.htm"
br.open(baseURL)
for form in br.forms():
    print form
Mechanize can't find any forms on that page. It only parses the HTML response it received from the request to baseURL. When you click on value dates, the browser sends another request and receives another HTML page to parse. It seems you should use https://dxtra.markets.reuters.com/Dx/DxnOutbound/400201404162135222149001.htm as the baseURL value. Also, Python mechanize doesn't support AJAX calls; for more complicated tasks you can use python-selenium, a more powerful tool for web browsing.
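A minimal sketch of that suggestion (Python 2, as in the question; the inner URL is the one mentioned above, and these generated URLs may expire, so re-capture it if the open fails):

from mechanize import Browser

br = Browser()
# Open the page that actually contains the form, not the outer frame page.
br.open("https://dxtra.markets.reuters.com/Dx/DxnOutbound/400201404162135222149001.htm")
for form in br.forms():
    print form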
I want to cycle through the dates at the bottom of the page using what looks like a form, but it is returning a blank. Here is my code.
import mechanize
URL='http://www.airchina.com.cn/www/jsp/airlines_operating_data/exlshow_en.jsp'
br = mechanize.Browser()
r=br.open(URL)
for form in br.forms(): # finding the name of the form
    print form.name
    print form
Why is this not returning any forms? Is it not a form? If not, how do I control the year and month at the bottom to cycle through the pages?
Can someone provide some sample code on how to do it?
When you try to access that page, what actually happens is that you are redirected to an error page. Paste that URL into a browser and you get a page with:
Not comply with the conditions of the inquiry data
and no forms at all.
You need to access the page in a different way. I would suggest stepping through the URL directory until you find the right path.
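One quick way to confirm the redirect (a sketch using the same mechanize setup as in the question):

import mechanize

br = mechanize.Browser()
response = br.open('http://www.airchina.com.cn/www/jsp/airlines_operating_data/exlshow_en.jsp')
print response.geturl()       # the final URL after any redirect
print response.read()[:500]   # the start of the returned HTML (the error page)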
I am trying to log in to a website using Selenium. The website is http://projecteuler.net/login.
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('http://projecteuler.net/login')
username = browser.find_element_by_id('username')
username.send_keys(USERNAME_HERE)
password = browser.find_element_by_name('password')
password.send_keys(PASSWORD_HERE)
browser.find_element_by_name("login").submit()
The program works correctly up to the last statement. I tried omitting the last statement and logged in manually, and it worked. But when I added the last statement and ran the program, it just seemed to reload the same page, minus the information that I had entered via the program.
So it is only the submission that is giving a problem. I viewed the source and checked whether there is some other element by that name, but there was no other element named "login". So what am I getting wrong here? Do I need to take care of something else as well?
There is a weird thing happening. When I submit the form via code and try to view the source in Google Chrome 33.0.1750.154 m, I get the below.
Try click() instead of submit()
submit() is particularly useful for forms without submit buttons, e.g. single-input "Search" forms.
Source: http://selenium.googlecode.com/git/docs/api/py/selenium/selenium.selenium.html?highlight=submit#selenium.selenium.selenium.submit
In your case there is a submit button, so it's better to just click it.
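Applied to the code in the question, the last line becomes:

browser.find_element_by_name("login").click()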