I am lost on how to use mechanize to fill out the form on the following website and then click Submit.
https://dxtra.markets.reuters.com/Dx/DxnHtm/Default.htm
On the left side, click "Currency Information", then "Value Dates".
This is for a finance class of mine, and we need the value dates for many different currency pairs. I wanted to enter a date in the "Trade Date" field, select the "base" and "quote" currencies I wanted, click Submit, and then get the dates off the next page using Beautiful Soup.
1) Is this possible using mechanize?
2) How do I go about it? I have read the docs on the website and looked all through Stack Overflow, but I can't seem to get this to work at all. I was trying to get the form and then set what I want, but I can't get the correct forms.
Any help would be greatly appreciated. I am not tied down to mechanize; I'm just not sure what the best module to use is.
This is what I have so far, and I get ZERO forms to attach a value to.
from mechanize import Browser

br = Browser()
baseURL = "https://dxtra.markets.reuters.com/Dx/DxnHtm/Default.htm"
br.open(baseURL)
for form in br.forms():
    print form
Mechanize can't find any forms on that page. It only parses the HTML response it received from the request to baseURL. When you click on "Value Dates", the browser sends another request and receives different HTML to parse. It seems you should use https://dxtra.markets.reuters.com/Dx/DxnOutbound/400201404162135222149001.htm as the baseURL value. Also, Python mechanize doesn't support AJAX calls. For more complicated tasks you can use python-selenium; it's a more powerful tool for web browsing.
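To see why zero forms turn up, check the fetched HTML directly: mechanize can only find <form> tags that are literally present in the response it got. A minimal offline sketch with the standard library (Python 3 syntax here; both HTML snippets are invented to mimic a frameset landing page versus the inner page):

```python
from html.parser import HTMLParser

class FormCounter(HTMLParser):
    """Count <form> start tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.count += 1

# A frameset landing page (like many portal default pages) contains no forms;
# the form only exists in the inner page loaded by a frame or a later click.
landing = "<html><frameset><frame src='menu.htm'></frameset></html>"
inner = ("<html><body><form action='valuedates'>"
         "<input name='tradeDate'></form></body></html>")

for doc in (landing, inner):
    p = FormCounter()
    p.feed(doc)
    print(p.count)  # 0, then 1
```

If the landing page prints 0 here, mechanize will report zero forms too, and you need to open the inner page's URL instead.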
Related
I am currently trying out Selenium to develop a program to automate testing of the login forms of websites.
I am trying to use Selenium to find a form on the websites I am testing, and I've noticed that different websites have different form names and form IDs; some websites don't have either.
But from my observations, I've noticed that the form action is always there, and I've used the code below to retrieve the form's action:
request = requests.get("whicheverwebsite")
parseHTML = BeautifulSoup(request.text, 'html.parser')
htmlForm = parseHTML.form
formName = htmlForm['action']
I am trying to retrieve the form and then use form.submit() to submit it.
I know of the functions find_element_by_name and find_element_by_id, but I am trying to find the element by its action, and I am not sure how this can be done.
I've found the answer to this.
By using xpath and using form and action, I am able to achieve this.
form = driver.find_element_by_xpath("//form[@action='" + formName + "']")
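For a quick offline check of the same attribute match, the standard library's ElementTree supports the [@attr='value'] predicate used in that XPath (Python 3 here; the page fragment and action value are made up for illustration):

```python
import xml.etree.ElementTree as ET

# A hypothetical, well-formed login page fragment.
page = """
<html><body>
  <form action="/login" method="post">
    <input name="user"/>
    <input name="pass" type="password"/>
  </form>
</body></html>
"""

root = ET.fromstring(page)
form_action = "/login"
# ElementTree understands the same [@attr='value'] predicate as the Selenium XPath.
form = root.find(".//form[@action='%s']" % form_action)
print(form.get("method"))  # post
```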
I would recommend including the URL of one or two of the sites you are trying to scrape, and your full code. Based on the information above, it appears that you are using BeautifulSoup rather than Selenium.
I would use the following:
from selenium import webdriver
url = 'https://whicheverwebsiteyouareusing.com'
driver = webdriver.Chrome()
driver.get(url)
From there you have many options to select the form, but again, without the actual site we can't identify which would be most relevant. I would recommend reading https://selenium-python.readthedocs.io/locating-elements.html to find out which would be most applicable to your situation.
Hope this helps.
Keep in mind that a login page can have multiple form tags even if you see only one. Here is an example where a login page has only one visible form even though there are 3 in the DOM.
So the most reliable way is to dig into the form (if there are multiple ones) and check two things:
Whether there's a [type=password] element (we definitely need a password to log in)
Whether there's a 2nd input there (though this can be considered optional)
Ruby example:
forms = page.all(:xpath, '//form') # retrieve all the forms and iterate
forms.each do |form|
  # keep the form that has a password field and exactly two input fields overall;
  # note the relative './/input' so we only count inputs inside this form
  if form.has_css?('input[type=password]') && form.all(:xpath, './/input').count == 2
    return form
  end
end
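The same heuristic can be sketched in Python with only the standard library (Python 3; the page markup below is invented, and a real page may not be well-formed XML, so treat this as an illustration of the logic rather than a production parser):

```python
import xml.etree.ElementTree as ET

# Hypothetical DOM with three forms, only one of which is a real login form.
page = """
<html><body>
  <form action="/search"><input name="q"/></form>
  <form action="/login">
    <input name="user"/>
    <input name="pass" type="password"/>
  </form>
  <form action="/newsletter"><input name="email"/></form>
</body></html>
"""

def find_login_form(root):
    """Return the first form with a password field and exactly two inputs."""
    for form in root.iter("form"):
        inputs = form.findall(".//input")
        has_password = any(i.get("type") == "password" for i in inputs)
        if has_password and len(inputs) == 2:
            return form
    return None

root = ET.fromstring(page)
print(find_login_form(root).get("action"))  # /login
```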
I am trying to parse a web page whose HTML source code changes when I press an arrow key to get a drop-down list.
I want to parse the contents of that drop-down list. How can I do that?
Example of the problem: if you go to this site: http://in.bookmyshow.com/hyderabad and select the arrow button on the "Select Movie" combo box, a drop-down list of movies appears. I want to get a list of these movies.
Thanks in advance.
The actual URL with the data used to populate the drop-down box is here:
http://in.bookmyshow.com/getJSData/?file=/data/js/GetEvents_MT.js&cmd=GETEVENTSWEB&et=MT&rc=HYD&=1425299159643&=1425299159643
I'd be a bit careful, though, and double-check the site's terms of use, or see whether there are any APIs that you could use instead.
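Note that the URL serves a JavaScript data file rather than plain JSON, so the values usually have to be picked out by hand. A rough offline sketch (the variable name and contents below are invented; inspect the real file to adapt the pattern):

```python
import re

# Hypothetical excerpt of a GetEvents-style JS data file.
js = 'var strEvents = "Movie One|Movie Two|Movie Three";'

# Pull the quoted string out of the assignment, then split on the delimiter.
match = re.search(r'var strEvents = "([^"]*)";', js)
movies = match.group(1).split("|")
print(movies)  # ['Movie One', 'Movie Two', 'Movie Three']
```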
You may want to have a look at Selenium. It allows you to reproduce exactly the same steps you do manually, because it also uses the browser (Firefox, Chrome, etc.).
Of course, it's not as fast as using mechanize, urllib, BeautifulSoup and all this stuff, but it is worth a try.
You will need to dig into the JavaScript to see how that menu gets populated. If it is getting populated via AJAX, then it might be easy to get that content by re-doing a request to the same URL (e.g., do a GET to "http://www.example.com/get_dropdown_entries.php").
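If the endpoint turns out to return JSON, decoding the reply is straightforward. An offline sketch with a made-up response body (the endpoint name is the hypothetical one above):

```python
import json

# Hypothetical JSON body from an AJAX endpoint such as
# http://www.example.com/get_dropdown_entries.php
# In the live case it would come from something like:
#   body = urllib.request.urlopen(ajax_url).read()
body = '{"entries": ["Alpha", "Beta", "Gamma"]}'

entries = json.loads(body)["entries"]
print(entries)  # ['Alpha', 'Beta', 'Gamma']
```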
I want to cycle through the dates at the bottom of the page using what looks like a form, but it returns a blank. Here is my code.
import mechanize

URL = 'http://www.airchina.com.cn/www/jsp/airlines_operating_data/exlshow_en.jsp'
br = mechanize.Browser()
r = br.open(URL)
for form in br.forms():  # finding the name of the form
    print form.name
    print form
Why is this not returning any forms? Is it not a form? If not, how do I control the year and month at the bottom to cycle through the pages?
Can someone provide some sample code on how to do it?
Trying to access that page, you are actually being directed to an error page. Paste that URL in a browser and you get a page with:
Not comply with the conditions of the inquiry data
and no forms at all.
You need to access the page in a different way. I would suggest stepping through the URL directory until you find the right path.
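One way to step through the URL directory is to generate every parent path and probe each one. A standard-library sketch (Python 3; the actual requests are left commented out here, since this runs offline):

```python
from urllib.parse import urlsplit, urlunsplit

url = 'http://www.airchina.com.cn/www/jsp/airlines_operating_data/exlshow_en.jsp'
parts = urlsplit(url)
segments = parts.path.strip('/').split('/')

# Build every parent path: /www, /www/jsp, /www/jsp/airlines_operating_data, ...
candidates = []
for i in range(1, len(segments) + 1):
    path = '/' + '/'.join(segments[:i])
    candidates.append(urlunsplit((parts.scheme, parts.netloc, path, '', '')))

for c in candidates:
    print(c)
    # urllib.request.urlopen(c)  # inspect each response to find the real entry page
```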
I am wondering how I can fill in an online form automatically. I have researched it, and it turned out that one can use Python (I am more interested in knowing how to do it with Python, because it is a scripting language I know), but the documentation about it is not very good. This is what I found:
Fill form values in a web page via a Python script (not testing)
Even the "mechanize" package itself does not have enough documentation:
http://wwwsearch.sourceforge.net/mechanize/
More specifically, I want to fill in the TextArea on this page (Addresses):
http://stevemorse.org/jcal/latlonbatch.html?direction=forward
so I don't know what I should look for. Should I look for the "id" of the textarea? It doesn't look like it has an "id" (or I am very naive!). How can I "select_form"?
Python, web gurus, please help.
Thanks
See if my answer to the other question you linked helps:
https://stackoverflow.com/a/5685569/711017
EDIT:
Here is the explicit code for your example. I don't have mechanize installed right now, so I haven't been able to check the code; none of the online IDEs I checked have it either. But even if it doesn't work, toy around with it, and you should eventually get there:
from mechanize import Browser

br = Browser()
br.open("http://stevemorse.org/jcal/latlonbatch.html?direction=forward")
br.select_form(name="display")
br["locations"] = "Hollywood and Vine, Hollywood CA"
response = br.submit()
print response.read()
Explanation: br emulates a browser that opens your URL and selects the desired form, which is called display on the website. The textarea for entering the address is called locations; I fill the address into it, then submit the form. Whatever the server returns is the string response.read(), in which you should find your lat/longs somewhere. Install mechanize and check it out.
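Under the hood, submitting that form is just an HTTP POST whose body is the URL-encoded form fields. A small stdlib sketch of the body mechanize would build (Python 3; the field name comes from the page above, the address is just an example):

```python
from urllib.parse import urlencode

# "locations" is the textarea's name on the page; the value is an example address.
fields = {"locations": "Hollywood and Vine, Hollywood CA"}
body = urlencode(fields)
print(body)  # locations=Hollywood+and+Vine%2C+Hollywood+CA
```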
I am using Python 2.7.1 to access an online website. I need to load a URL, then submit a POST request to that URL that causes the website to redirect to a new URL. I would then like to POST some data to the new URL. This would be easy to do, except that the website in question does not allow the user to use browser navigation. (As in, you cannot just type in the URL of the new page or press the back button, you must arrive there by clicking the "Next" button on the website). Therefore, when I try this:
import urllib, urllib2, cookielib
url = "http://www.example.com/"
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
form_data_login = "POSTDATA"
form_data_try = "POSTDATA2"
resp = opener.open(url, form_data_login)
resp2 = opener.open(resp.geturl(), form_data_try)
print resp2.read()
I get a "Do not use the back button on your browser" message from the website in resp2. Is there any way to POST data to the website resp gives me? Thanks in advance!
EDIT: I'll look into Mechanize, so thanks for that pointer. For now, though, is there a way to do it with just Python?
Have you taken a look at mechanize? I believe it has the functionality you need.
You're probably getting to that page by POSTing something via that Next button. You'll have to take a look at the POST parameters sent when pressing that button and add all of those parameters to your call.
The website could, though, be set up in such a way that it only accepts a particular POST parameter that ensures you have to go through the website itself (e.g. by hashing a timestamp in a certain way or something like that), but it's not very likely.
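Such required parameters often sit in hidden <input> fields on the intermediate page, so one approach is to scrape them and merge them into your next POST body. A standard-library sketch (Python 3; the markup and field names below are invented):

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

class HiddenInputs(HTMLParser):
    """Collect name/value pairs from hidden <input> fields."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "hidden":
            self.fields[a.get("name")] = a.get("value", "")

# Hypothetical page served after the first POST.
page = ('<form><input type="hidden" name="token" value="abc123">'
        '<input type="submit" value="Next"></form>')

p = HiddenInputs()
p.feed(page)
p.fields["answer"] = "42"   # merge in your own data
body = urlencode(p.fields)
print(body)  # token=abc123&answer=42
```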