I'm working on a project to automatically monitor item prices on an Angular website.
Here's what a link to a particular item would look like:
https://www.<site-name>.com/categories/<sub-category>/products?prodNum=9999999
Using Selenium (in Python) on a page with product listings, I can get some useful information about the items, but what I really want is the prodNum parameter.
The onClick attribute for each item is clickOnItem(item, $index).
I do have some information for the items, including the presumed item and $index values, which are visible within the HTML, but I doubt there is a way of seeing what actually happens inside clickOnItem.
I've tried looking around using dev-tools to find where clickOnItem is defined, but I haven't been successful.
Since I don't see any way of getting prodNum without clicking, I'm wondering: is there a way to simulate a click to see where it would redirect, without actually loading the link? Loading each link would take far too much time per item.
Note: I want the specific prodNum so that I can hit the item page directly, without first going through the main listing page.
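One approach that can work on AngularJS sites is reading the bound data straight off each element's scope instead of clicking. A minimal sketch, assuming an AngularJS app with debug info enabled; the listing URL, the ng-click selector, and the item.prodNum property name are all assumptions:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.example.com/categories/some-category/products")  # hypothetical listing URL

# Elements whose Angular click handler is clickOnItem(item, $index)
items = driver.find_elements_by_css_selector("[ng-click^='clickOnItem']")
for el in items:
    # angular.element(el).scope() exposes the scope bound to the element,
    # but only when the site ships with AngularJS debug info enabled
    prod_num = driver.execute_script(
        "return angular.element(arguments[0]).scope().item.prodNum;", el)
    print(prod_num)

If scope() comes back undefined, debug info is disabled, and the next best route is the network tab in dev tools: the XHR that populates the listing usually carries the prodNum values in its JSON.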
Related
I'm new to using Selenium, and I'm having trouble figuring out how to click through all instances of a specific element. To clarify, I can't even get it to click through one, as it's a dropdown but is defined as an ordinary element.
I am trying to scrape FanDuel. When you click on a specific game, you are presented with a number of main title bets, and to get the information I need, I have to click the dropdowns to reach it. There is also another dropdown labeled "See More", which is a similar problem, but if this gets fixed I assume I'll be able to figure that one out.
So far, I have tried to use:
find_element_by_class_name()
find_element_by_css_selector()
I have also used the plural find_elements_* variants and tried to loop through and click each element in the list, but that did not work.
Any ideas would be much appreciated.
FYI: I am using Beautiful Soup to scrape the information from the website; I figured Selenium would be helpful for making the information that isn't currently accessible, accessible.
This image shows the dropdowns that I am trying to access, in this case the dropdown 'Win Margin'. The HTML code is shown to the left of it.
This also shows that there are multiple dropdowns, varying in number depending on the game.
You can also try using ActionChains from Selenium:

from selenium.webdriver.common.action_chains import ActionChains

menu = driver.find_element_by_css_selector(".nav")
hidden_submenu = driver.find_element_by_css_selector(".nav #submenu1")

# Hover over the menu so the hidden submenu becomes visible, then click it
ActionChains(driver).move_to_element(menu).click(hidden_submenu).perform()
Source: here
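If the goal is to open every dropdown on the page, a loop over all matches with a scroll-into-view and a short pause tends to be more reliable than a single click. A rough sketch, under the assumption that each dropdown header shares one class name (".dropdown-header" and the game URL are hypothetical):

import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/some-game")  # hypothetical game page

headers = driver.find_elements_by_css_selector(".dropdown-header")  # hypothetical class name
for header in headers:
    # Scroll each dropdown into view so the click isn't intercepted
    driver.execute_script("arguments[0].scrollIntoView(true);", header)
    time.sleep(0.5)  # give the page a moment to settle
    header.click()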
I am trying to go through a webpage with Selenium and create a set of all elements with certain class names, so I have been using:
elements = set(driver.find_elements_by_class_name('class name'))
However, in some cases there are thousands of elements on the page (if I scroll down), and I've noticed that this code only finds the first 18-20 elements on the page (only about 14-16 are visible to me at once). Do I need to scroll, or am I doing something else wrong? Is there any way to instantaneously get all of the elements I want in the HTML into a list without having to visually see them on the screen?
It depends on your webpage. Just look at the HTML source code (or the network log) before you scroll down. If there are only the 18-20 elements, then the page lazy-loads the next items (as e.g. Twitter or Instagram do). This means the server only renders the next items once you reach a certain point on the webpage. Otherwise all thousand items would be loaded at once, which would increase the page size, loading time, and server load.
In this case, you have to scroll down until the end and then get the source code to parse all items.
You could use more advanced methods, like treating each loaded chunk as a page in a pagination scheme (i.e. instead of "go to next page", "scroll down"). But I guess you're a beginner, so I would start with simply scrolling down to the end (scroll, wait, scroll, ... until there are no new elements), then fetching the HTML and then parsing it, as sketched below.
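A minimal sketch of that scroll-until-done loop (the URL and class name are placeholders, as in the question):

import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/listing")  # hypothetical page

seen = 0
while True:
    # Scroll to the bottom so the page loads the next chunk of items
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # wait for the lazy-loaded items to render
    elements = driver.find_elements_by_class_name("class-name")  # placeholder class
    if len(elements) == seen:
        break  # nothing new appeared; we've reached the end
    seen = len(elements)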
I have been writing a program that would, hypothetically, find items on a website as soon as they are loaded onto it. As of now the script takes as input two values (keywords) used to describe an item, plus a color used to pick the item's color. The parsing is spot-on for items that are already on the website, but let's say I run my program before the website loads the items: instead of having to re-run the entire script, I'd like it to just refresh the page and re-parse the data until it finds them. I also included no errors in my question because, in my example run of the script, I entered keywords and a color not matching any item on the website, and instead of getting an error I just got "Process finished with exit code 0". Thank you in advance to anyone who takes the time to help!
Here is my code:
As another user suggested, you're probably better off using Selenium for the entire process rather than using it for only parts of your code and swapping between BSoup and Selenium.
As for reloading the page when certain items are not present: if you already know which items are supposed to be on the page, you can search for each item by id with Selenium, and if you can't find one or more of them, refresh the page with the following line of code:
driver.refresh()
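Put together, a hedged sketch of the refresh-and-retry loop (the page URL and the element id "item-id" are assumptions):

import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
driver.get("https://example.com/shop")  # hypothetical page

while True:
    try:
        item = driver.find_element_by_id("item-id")  # hypothetical element id
        break  # found it; carry on parsing the item
    except NoSuchElementException:
        time.sleep(5)      # don't hammer the server
        driver.refresh()   # reload the page and look again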
For the life of me I can't think of a better title...
I have a Python WebDriver-based scraper that goes to Google and enters a local search such as chiropractors+new york+ny, which, after clicking on More chiropractors+New York+NY, ends up on a page like this
The goal of the scraper is to grab the phone number and full address (including suite # etc.) of each of the 20 results on such a results page. In order to do so, I need to have WebDriver click each of the 20 entries to bring up an overlay over the Google Map:
This is mighty slow. Were it not for having to trigger each of these overlays, I could do everything up to that point with the much faster lxml, by going straight to the final URL of the results page and extracting via XPath. But I appear to be stuck: I can't get data from the overlay without first clicking the link that brings it up.
Is there a way to get the data out of this page element without having to click the associated links?
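One thing worth checking is whether the overlay markup is already in the DOM and merely hidden; if so, you can read it without any clicks. Selenium's .text property only returns visible text, but the textContent attribute has no such restriction. A sketch (the overlay selector is an assumption):

# 'driver' is the existing WebDriver instance, already on the results page
overlays = driver.find_elements_by_css_selector("div.result-overlay")  # hypothetical selector
for overlay in overlays:
    # textContent returns the text of hidden elements too, unlike the
    # .text property, which only returns text the user can currently see
    print(overlay.get_attribute("textContent"))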
I am trying to parse a web page whose HTML source changes when I press an arrow key to open a drop-down list.
I want to parse the contents of that drop-down list. How can I do that?
Example of the problem: if you go to this site: http://in.bookmyshow.com/hyderabad and click the arrow button on the "Select Movie" combo box, a drop-down list of movies appears. I want to get the list of these movies.
Thanks in advance.
The actual URL with the data used to populate the drop-down box is here:
http://in.bookmyshow.com/getJSData/?file=/data/js/GetEvents_MT.js&cmd=GETEVENTSWEB&et=MT&rc=HYD&=1425299159643&=1425299159643
I'd be a bit careful, though, and double-check the site's terms of use, or whether there are any APIs you could use instead.
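Since that endpoint is a plain HTTP GET, you can fetch it without a browser at all. A sketch using requests, dropping the timestamp-like cache-buster parameters; the response is a JavaScript data file, so you would still need to pick the movie names out of it, and its exact format is not guaranteed:

import requests

url = ("http://in.bookmyshow.com/getJSData/?file=/data/js/GetEvents_MT.js"
       "&cmd=GETEVENTSWEB&et=MT&rc=HYD")
resp = requests.get(url)
resp.raise_for_status()
print(resp.text)  # raw JavaScript data file listing the movies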
You may want to have a look at Selenium. It allows you to reproduce exactly the same steps as you would manually, because it drives a real browser (Firefox, Chrome, etc.).
Of course, it's not as fast as using mechanize, urllib, BeautifulSoup and the like, but it is worth a try.
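A rough sketch of that approach; the selectors for the combo box and its options are assumptions about the site's markup:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://in.bookmyshow.com/hyderabad")

# Open the "Select Movie" combo box so its option list gets rendered
driver.find_element_by_css_selector("#select-movie .arrow").click()  # hypothetical selector
options = driver.find_elements_by_css_selector("#select-movie li")   # hypothetical selector
print([opt.text for opt in options])
driver.quit()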
You will need to dig into the JavaScript to see how that menu gets populated. If it is getting populated via AJAX, then it might be easy to get that content by re-doing a request to the same URL (e.g., do a GET to "http://www.example.com/get_dropdown_entries.php").