Remember the text of clicked elements in Selenium? - python

As an example, let's say I wanted to record all the bios of users on SO.
Let's say I loaded up: How to click an element in Selenium WebDriver using JavaScript
I clicked all the users: .user-details a (11 of them).
I wrote the extracted text to a CSV.
driver.get('Version compatibility of Firefox and the latest Selenium IDE (2.9.1.1-signed)')
I read the users back from the CSV.
user: Ripon Al Wasim [is present again, do not click him]. How can this be achieved, given that it's just text?
Is something like this achievable, or is this a limitation of Selenium with Python?
You could click all of them, but let's say you had to scrape 200 pages and a common name like Bob popped up 430 times. It feels unnecessary to click his name every time. Is something like this possible with Selenium?
I feel like I'm missing something and this is achievable, but I'm unaware of how.
You could compare against a text file: print(elem.get_attribute("href")), write that to a file, and compare the two. If an element were already present you could delete it, but this is just text. You could (maybe) put the text in an Excel file. I'm not entirely sure this is possible, but you could write the CSS selectors individually beside the text in the Excel file, delete the rows with matched strings, and then have Selenium load that back into WebDriver.
I'm not entirely convinced even this would work.
Is there a sane way of clicking elements by CSS while ignoring names you have already clicked and stored in a text file?

There's nothing special here with Selenium. That is your tool for interacting with the browser. It is your program that needs to decide how to do that interaction, and what you do with the information from it.
It sounds like you want to build a database of users, so why not use a database? Something like SQLite or PostgreSQL might work nicely for you.
Among the user details, store the name as it appears in the link (assuming it will be unique for each user), and index that name. When scraping your page, pull that link text, then use SQL statements to check whether a record already exists with that name; if not, click the link and add a new record.
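A minimal sketch of that approach with SQLite. The .bio selector, the users table layout, and the idea of collecting hrefs up front (to avoid stale element references after navigating) are my assumptions, not part of the question:

```python
import sqlite3


def open_db(path=":memory:"):
    # One row per user; the PRIMARY KEY on name doubles as the index used
    # to answer "have I clicked this person already?".
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT PRIMARY KEY, bio TEXT)")
    return conn


def seen_before(conn, name):
    return conn.execute("SELECT 1 FROM users WHERE name = ?", (name,)).fetchone() is not None


def record_user(conn, name, bio):
    conn.execute("INSERT OR IGNORE INTO users (name, bio) VALUES (?, ?)", (name, bio))
    conn.commit()


def scrape_page(driver, conn):
    # Imported here so the database helpers above work without Selenium installed.
    from selenium.webdriver.common.by import By

    # Grab (name, href) pairs up front: navigating away and back would
    # otherwise leave the WebElements stale.
    links = [(a.text, a.get_attribute("href"))
             for a in driver.find_elements(By.CSS_SELECTOR, ".user-details a")]
    for name, href in links:
        if seen_before(conn, name):
            continue  # e.g. the 430th "Bob": skip without visiting
        driver.get(href)
        bio = driver.find_element(By.CSS_SELECTOR, ".bio").text  # hypothetical selector
        record_user(conn, name, bio)
        driver.back()
```

The lookup stays fast no matter how many pages you scrape, because the check is an indexed SELECT rather than a scan of a text file.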

Related

Selenium to simulate click without loading link?

I'm working on a project trying to autonomously monitor item prices on an Angular website.
Here's what a link to a particular item would look like:
https://www.<site-name>.com/categories/<sub-category>/products?prodNum=9999999
Using Selenium (in Python) on a page with product listings, I can get some useful information about the items, but what I really want is the prodNum parameter.
The onClick attribute for the items is clickOnItem(item, $index).
I do have some information for items including the presumable item and $index values which are visible within the html, but I'm doubtful there is a way of seeing what is actually happening in clickOnItem.
I've tried looking around using dev-tools to find where clickOnItem is defined, but I haven't been successful.
Considering that I don't see any way of getting prodNum without clicking, I'm wondering: is there a way I could simulate a click to see where it would redirect to, but without actually loading the link, as this would take way too much time to do for each item?
Note: I want to get the specific prodNum. I want to be able to hit the item page directly without first going through the main listing page.

Clicking multiple <span> elements with Selenium Python

I'm new to using Selenium, and I am having trouble figuring out how to click through all iterations of a specific element. To clarify, I can't even get it to click through one, as it's a dropdown but is defined as a span element.
I am trying to scrape FanDuel; when clicking on a specific game you are presented with a bunch of main title bets, and in order to get the information I need, I have to click the dropdowns to get to it. There is also another dropdown labelled "See More", which is a similar problem, but assuming this gets fixed I'm assuming I will be able to figure that one out.
So far, I have tried to use:
find_element_by_class_name()
find_element_by_css_selector()
I have also used the plural find_elements variants and tried to loop through and click on each index of the list, but that did not work.
If there are any ideas, they would be much appreciated.
FYI: I am using Beautiful Soup to scrape the website for the information; I figured Selenium would be helpful for making the information that isn't currently accessible, accessible.
This image shows the dropdowns that I am trying to access, in this case the dropdown 'Win Margin'. The HTML code is shown to the left of it.
This also shows that there are multiple dropdowns, varying in amount based off the game.
You can also try using ActionChains from Selenium:
from selenium.webdriver.common.action_chains import ActionChains
menu = driver.find_element_by_css_selector(".nav")
hidden_submenu = driver.find_element_by_css_selector(".nav #submenu1")
ActionChains(driver).move_to_element(menu).click(hidden_submenu).perform()
Source: here
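If ActionChains doesn't pan out, a plain loop over every matching dropdown can work. This is only a sketch, not FanDuel's actual markup: the selector is a placeholder, and the list is re-queried after each click because expanding a section can re-render the DOM and leave earlier references stale:

```python
def expand_all(driver, selector):
    # Click each matching dropdown in turn, re-querying after every click
    # so we never hold a stale reference to a re-rendered element.
    opened = 0
    while True:
        elements = driver.find_elements_by_css_selector(selector)
        if opened >= len(elements):
            return opened  # number of dropdowns clicked
        elements[opened].click()
        opened += 1

# expand_all(driver, "span.dropdown-toggle")  # placeholder selector
```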

Scraping text values using Selenium with Python

For each vendor in an ERP system (total # of vendors = 800+), I am collecting its data and exporting this information as a pdf file. I used Selenium with Python, created a class called Scraper, and defined multiple functions to automate this task. The function, gather_vendors, is responsible for scraping and does this by extracting text values from tag elements.
Every vendor has a section called EFT Manager. EFT Manager has 9 rows I am extracting from:
For #2 and #3, both have string values (crossed out confidential info). But, #3 returns null. I don’t understand why #3 onward returns null when there are text values to be extracted.
The format of code for each element is the same.
I tried switching frames but that did not work. I tried to scrape from edit mode and that didn’t work as well. I was curious if anyone ever encountered a similar situation. It seems as though no matter what I do I can’t scrape certain values… I’d appreciate any advice or insight into how I should proceed.
Thank you.
Why not try to use
find_element_by_class_name("panelList").find_elements_by_tag_name('li')
to collect all of the li elements, then use li.text to retrieve their text values. It's hard to tell what your actual output is beyond your saying it "returns null".
Try to use visibility_of_element_located instead of presence_of_element_located
Try to get textContent with JavaScript for the element (see: Given a (python) selenium WebElement can I get the innerText?):
element = driver.find_element_by_id('txtTemp_creditor_agent_bic')
text = driver.execute_script("return arguments[0].textContent", element)
The following is what worked for me:
Get rid of the try/except blocks.
Find elements via ID's (not xpath).
That allowed me to extract text from elements I couldn't extract from before.
You should change the way you extract the elements on the web page to IDs, since all of the elements have distinct IDs provided. If you want to use XPath, you could match on the element's text, e.g.
//span[text()='Bank Name']

Parsing a Dynamic Web Page using Python

I am trying to parse a web page whose HTML source changes when I press an arrow key to open a drop-down list.
I want to parse the contents of that drop-down list. How can I do that?
Example of the problem: if you go to this site: http://in.bookmyshow.com/hyderabad and click the arrow button on the "Select Movie" combo box, a drop-down list of movies appears. I want to get a list of these movies.
Thanks in advance.
The actual URL with the data used to populate the drop-down box is here:
http://in.bookmyshow.com/getJSData/?file=/data/js/GetEvents_MT.js&cmd=GETEVENTSWEB&et=MT&rc=HYD&=1425299159643&=1425299159643
I'd be a bit careful though and double-check with the site terms of use or if there are any APIs that you could use instead.
You may want to have a look at Selenium. It allows you to reproduce exactly the same steps as you do manually, because it also drives a real browser (Firefox, Chrome, etc.).
Of course, it's not as fast as using mechanize, urllib, BeautifulSoup, and all that stuff, but it is worth a try.
You will need to dig into the JavaScript to see how that menu gets populated. If it is getting populated via AJAX, then it might be easy to get that content by re-doing a request to the same URL (e.g., do a GET to "http://www.example.com/get_dropdown_entries.php").
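In the AJAX case, the re-request can be as simple as a plain GET against whatever endpoint dev-tools shows the page calling. A sketch (the URL in the commented call is the illustrative one from above, not a real endpoint):

```python
import urllib.request


def fetch_menu_source(url):
    # Plain GET against the endpoint the page itself calls to fill the
    # drop-down; returns the raw response body for you to parse.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

# body = fetch_menu_source("http://www.example.com/get_dropdown_entries.php")
```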

Selenium unable to switch to TinyMCE iframe in Internet Explorer 9

I'm trying to switch to an iframe in IE9 so I can send_keys() to the text area. When I do switch, I can tell that the WebDriver thinks it switched (when I print page_source, it's right), but the cursor is still blinking in another text field (not TinyMCE); at this point, if I send keys, they get appended to the other text field and not to TinyMCE.
So I've been trying workarounds. If I select the TinyMCE iframe and click(), the cursor is in the right place and I can send keys, but then I can't return (switch back to the original frame/window) to submit the input.
Has anyone else run into this in IE9, are there workarounds?
This works, in Firefox and Chrome, just not IE9.
I had a problem like this once, and it's quite complicated to work around since TinyMCE generates some dynamic content. What I ended up doing to manipulate the contents of the TinyMCE editor was calling its API directly via page.execute_script and doing it all in JavaScript.
A sample of my JS code is:
jQuery('textarea.tinymce').tinymce().setContent('test text in editor');
jQuery('textarea.tinymce').tinymce().selection.select(jQuery('textarea.tinymce').tinymce().dom.select('p')[0]);
jQuery('textarea.tinymce').tinymce().execCommand('Italic','true');
jQuery('textarea.tinymce').tinymce().execCommand('Underline','true');
jQuery('textarea.tinymce').tinymce().execCommand('Bold','true');
The first line adds text in TinyMCE's textarea, the second selects it (simulating a user cursor select), the third, fourth and fifth just manipulate the controls.
.execCommand() was particularly useful for activating the different extensions. After that I just validated that the form fields I was using were set with the expected HTML tags and called it a day.
I hope it helps!
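Since the question uses Python, the same idea can be driven through driver.execute_script. A sketch that builds the JavaScript in a small helper so quoting stays safe (this assumes the page exposes the editor via the jQuery TinyMCE plugin, as in the snippet above; the selector is the one from that snippet):

```python
import json


def set_tinymce_content_js(text, selector="textarea.tinymce"):
    # json.dumps produces valid JS string literals, so quotes inside the
    # content or selector can't break the generated script.
    return "jQuery({sel}).tinymce().setContent({txt});".format(
        sel=json.dumps(selector), txt=json.dumps(text)
    )

# driver.execute_script(set_tinymce_content_js("test text in editor"))
```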
