Detect all names and get their link with Selenium Python - python

I want to make a search system where we enter a word in a variable and it searches through all the link names on this page (all the games), a little like a "Ctrl+F", and displays the results (names + links) using Selenium (Python).
I don't know how to build a system like that! If you can help, that would be great!
Have a nice code!

You are attempting to locate specific elements on a page and then sort through them for a key search term. Selenium can identify elements on a page through a number of methods; see here for a guide. Once you have located all the elements, you can filter them for the search term of interest.
Finding ALL the elements of interest:
I would utilise the XPath of your elements to find them on the page and build a list that you can then search based on your keyword. In your case they are all identifiable by this XPath:
//div[@class="blog-content"]//a
Extract the required information:
Once you have the list of elements, you will need to iterate over them to extract the href attribute (the game's URL) and the innerHTML text (the name of the game).
I have used a list comprehension in the example below to do this, which creates a dictionary {url: name, ...} you can filter your specific items from.
Example Code:
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.firefox import GeckoDriverManager

website_url = 'https://steamunlocked.net/all-games-2/'
game_xpaths = '//div[@class="blog-content"]//a'

driver = webdriver.Firefox(service=Service(GeckoDriverManager().install()))
driver.get(website_url)

game_elements = driver.find_elements(By.XPATH, game_xpaths)
games = {g.get_attribute('href'): g.get_attribute('innerHTML') for g in game_elements}
print(games)
"""
Outputs:
{'https://steamunlocked.net/red-tether-free-download/': '—Red—Tether–> Free Download (v1.006)',
'https://steamunlocked.net/hack-g-u-last-recode-free-download/': '.hack//G.U. Last Recode Free Download (v1.01)',
'https://steamunlocked.net/n-verlore-verstand-free-download/': '‘n Verlore Verstand Free Download',
'https://steamunlocked.net/0-n-0-w-free-download/': '0°N 0°W Free Download',
'https://steamunlocked.net/007-legends-free-download/': '007 Legends Free Download', ...
"""
Finding SPECIFIC items (i.e. CTRL+F):
To identify only specific items, filter the dictionary for the occurrence of the word/string you are interested in:
def search(myDict, search_term):
    return [[v, k] for k, v in myDict.items() if search_term.lower() in v.lower()]
>>> search(games, 'Ninja')
[['10 Second Ninja Free Download','https://steamunlocked.net/10-second-ninja-free-download/'],
['10 Second Ninja X Free Download','https://steamunlocked.net/10-second-ninja-x-free-download/']]


Store text of driver.find_element into a python variable

I'm trying to create a Discord embed that contains info from a website. I'm trying to store the driver.find_element(...).text with Selenium.
Then I want to put that Python variable into the JSON code that builds the Discord embed.
The problem is that each product on this page gives me 3 different texts. How can I save each one in different variables? I put my code here:
from selenium import webdriver
from selenium.webdriver.common.by import By
import csv

DRIVER_PATH = r'C:\chromedriver.exe'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get('https://www.zalando.es/release-calendar/zapatillas-mujer/')
product_1 = driver.find_element(By.XPATH, '//*[@id="release-calendar"]/div/div[1]/div')
print(product_1.text)
The result in the terminal is:
119,95€
adidas Originals
Forum MID TM
28 de octubre de 2022, 8:00
Recordármelo
Thanks for the help, I really don't know how to save the .text info into different Python variables.
Store the text in a variable, or split it into a list:
element_text = product_1.text
element_text_split = element_text.split() # split by space
If you wanted the price of that item: element_text_split[0] would get the first word
Second word element_text_split[1] is the company
You could also slice up the string using string slicing. Keep in mind not all data you get is going to look exactly the same.
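Since the terminal output above shows one field per line, splitting the element's text on newlines is often more robust than splitting on spaces (brand names like "adidas Originals" contain a space). A small sketch, using the sample output from the question as a stand-in for a live `product_1.text`:

```python
# Sketch: parse the multiline .text of one product card into named variables.
# The sample string mirrors the terminal output in the question.
sample_text = ("119,95€\n"
               "adidas Originals\n"
               "Forum MID TM\n"
               "28 de octubre de 2022, 8:00\n"
               "Recordármelo")

lines = sample_text.splitlines()
price, brand, model, release_date = lines[0], lines[1], lines[2], lines[3]

print(price)  # 119,95€
print(brand)  # adidas Originals
```

With a live driver you would use `lines = product_1.text.splitlines()` instead of the literal string.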
So, you are trying to get texts of each product on the page, right?
If so, you can use find_elements() method to put all the products on the page into a list. For that you need to use an xpath (or any other locator) that finds not one element but all of them.
Here is the code
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.zalando.es/release-calendar/zapatillas-mujer/')

# This finds all products on the page and puts them into a list of elements:
products = driver.find_elements(By.XPATH, '//div[@class="auWjdQ rj7dfC Hjm2Cs _8NW8Ug xJbu_q"]')

# This will put the text of each product into a list
texts = []
for product in products:
    texts.append(product.text)

# Here we can see what is in the list of texts
print('All texts:', texts)

# If a list doesn't suit your needs, you can unpack the list into three variables
# (only if you are sure that there are exactly three items on the page):
prod1, prod2, prod3 = texts
print('Product 1:', prod1)
print('Product 2:', prod2)
print('Product 3:', prod3)
driver.quit()
This is what I've used in the past but that's assuming you have a specific element attribute which you're trying to extract:
For example:
element_text = driver.find_element(By.XPATH, '//*[@id="release-calendar"]/div/div[1]/div').get_attribute('title')
You'll need to replace 'title' with the specific attribute from your element which contains the text you want.

Using Selenium to grab specific information in Python

So I am quite new to using Selenium and thus quite unsure how to do this, or even how to word it for that matter.
But what I am trying to do is to use selenium to grab the following values and then store them into a list.
Image provided of the inspector window of Firefox, to show what I am trying to grab (Highlighted)
https://i.stack.imgur.com/rHk9R.png
In Selenium, you access elements using functions find_element(s)_by_xxx(), where xxx is for example the tag name, element name or class name (and more). The functions find_element_... return the first element that matches the argument, while find_elements_... return all matching elements.
Selenium has good documentation; in the section "Getting started" you can find several examples of basic usage.
As to your question, the following code should collect the values you want:
from selenium import webdriver

driver = webdriver.Firefox()  # driver for the browser you use
select_elem = driver.find_element_by_name('ctl00_Content...')  # full name of the element
options = select_elem.find_elements_by_tag_name('option')
values = []
for option in options:
    val = option.get_attribute('value')
    values.append(val)

Selenium scraping

I am trying to come up with a way to scrape information on houses on Zillow and I am currently using xpath to look at data such as rent price, principal and mortgage costs, insurance costs.
I was able to find the information using XPath, but I wanted to make it automatic and put it inside a for loop. I realized that, as I was using XPath, not all the data for each listing has the same XPath; for some it would be off by one div or list index. See the code below for what I mean. How do I make it more general? Is there a way to look up a string like "principal and interest" and select the next value, which would be the numerical value that I am looking for?
works for one listing:
driver.find_element_by_xpath("/html/body/div[1]/div[6]/div/div[1]/div[1]/div[1]/ul/li[1]/article/div[1]/div[2]/div")
a different listing would contain this:
driver.find_element_by_xpath("/html/body/div[1]/div[6]/div/div[1]/div[1]/div[2]/ul/li[1]/article/div[1]/div[2]/div")
The xpaths that you are using are specific to the elements of the first listing. To be able to access elements for each listing, you will need to use xpaths in a way that can help you access elements for each listing:
import pandas as pd
from selenium import webdriver
I searched for listings for sale in Manhattan and got the below URL:
url = "https://www.zillow.com/homes/Manhattan,-New-York,-NY_rb/"
Asking selenium to open the above link in Chrome
driver = webdriver.Chrome()
driver.get(url)
I hovered my mouse on one of the house listings and clicked "inspect". This opened the HTML code and highlighted the item I am inspecting. I noticed that the elements having class "list-card-info" contain all the info of the house that we need. So, our strategy would be for each house access the element that has class "list-card-info". So, using the following code, I saved all such HTML blocks in house_cards variable
house_cards = driver.find_elements_by_class_name("list-card-info")
There are 40 elements in house_cards i.e. one for each house (each page has 40 houses listed)
I loop over each of these 40 houses and extract the information I need. Notice that I am now using XPaths which are relative to each "list-card-info" element. I save this info in a pandas DataFrame.
address = []
price = []
bedrooms = []
baths = []
sq_ft = []
for house in house_cards:
    address.append(house.find_element_by_class_name("list-card-addr").text)
    price.append(house.find_element_by_class_name("list-card-price").text)
    bedrooms.append(house.find_element_by_xpath('.//div[@class="list-card-heading"]/ul[@class="list-card-details"]/li[1]').text)
    baths.append(house.find_element_by_xpath('.//div[@class="list-card-heading"]/ul[@class="list-card-details"]/li[2]').text)
    sq_ft.append(house.find_element_by_xpath('.//div[@class="list-card-heading"]/ul[@class="list-card-details"]/li[3]').text)
driver.quit()
# print(address, price, bedrooms, baths, sq_ft)
manhattan_listings = pd.DataFrame({"address": address,
                                   "bedrooms": bedrooms,
                                   "baths": baths,
                                   "sq_ft": sq_ft,
                                   "price": price})
(screenshot: pandas DataFrame output)
Now, to extract info from more pages (page 2, page 3, etc.), you can loop over the website's pages, i.e. keep modifying your URL and extracting info.
Happy Scraping!
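As for the question's other idea (searching for a label like "principal and interest" and taking the next value), a relative XPath with contains() can do that. A hedged sketch; the span tags here are assumptions about the markup, not verified against Zillow:

```python
# Build an XPath that matches the element whose text contains a given label,
# then selects its following sibling (which would hold the numeric value).
# The "span" tag names are assumptions; adjust them to the real page.
def label_value_xpath(label):
    return f'//span[contains(text(), "{label}")]/following-sibling::span'

xp = label_value_xpath("Principal and interest")
# With a live driver: value = driver.find_element_by_xpath(xp).text
print(xp)
```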
Selecting multiple elements using absolute XPaths is not a good idea. You can look into CSS selectors instead; with those you can match similar elements.
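For example, a single class-based CSS selector can match every listing card at once instead of addressing each by an absolute path. A minimal sketch; the class name is borrowed from the other answer's inspection and may differ on the live page:

```python
# One CSS selector that matches all listing cards sharing a class.
# "list-card-info" is an assumption taken from the other answer here.
css = "div.list-card-info"
# With a live driver:
# cards = driver.find_elements_by_css_selector(css)
# prices = [c.find_element_by_css_selector(".list-card-price").text for c in cards]
print(css)
```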

Looping through drop downs using selenium in python

I am trying to simulate clicking through multiple options on an online data tool that ends with downloading an Excel sheet given your filters.
I am currently using selenium and identifying xpaths.
I am able to get through a single iteration and get a single excel sheet, but I need to do it for every possible permutation of drop down choices. To do by hand is unrealistic, as there are thousands of options.
The website for context: https://data.cms.gov/mapping-medicare-disparities
Does anyone know of a function that can be done in selenium that will work?
My current strategy is to create lists with the xpaths and then try to do a permutation function to get all the combinations. However, this has not worked because the function: b.find_element_by_xpath only allows one xpath at a time.
Examples of lists:
# geography: county ("c") or state/territory ("s")
G1 = '//select[@id="geography"]//option[@value="c"]'
G2 = '//select[@id="geography"]//option[@value="s"]'
Geo = [G1, G2]
Creating the pool of combinations:
from itertools import product

for perm in product(Geo, Adjust, Analysis, Domain):
    print(perm)
Actual code to use selenium:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

b = webdriver.Firefox()
code to click through a popup
pop_up = b.find_element_by_xpath('/html/body/div[1]/button')
pop_up.click()
Code trying to use xpath to select all options at once:
b.find_element_by_xpath(('//select[@id="geography"]//option[@value="c"]',
                         '//select[@id="adjust"]//option[@value="1"]',
                         '//select[@id="analysis"]//option[@value="base"]',
                         '//select[@id="domain"]//option[@value="d1"]'))
error message: InvalidArgumentException: Message: invalid type: sequence, expected a string at line 1 column 28
This is because the find_element_by_xpath (I am assuming) will only look at 1 xpath at a time.
Your syntax in "code trying to use xpath..." is wrong anyway, but you could just put all the xpaths in a list and loop through it:
xpathlist = ['//select[@id="geography"]//option[@value="c"]', '//select[@id="adjust"]//option[@value="1"]', .....]
for xp in xpathlist:
    b.find_element_by_xpath(xp)
    # then add code to click or download or whatever
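An alternative sketch of the permutation loop: enumerate every combination of option values with itertools.product, then (with a live driver) apply each one via Selenium's Select helper rather than one XPath per option. The ids and values below mirror the question's XPaths; how the page reacts between selections (waits, reloads) is untested:

```python
from itertools import product

# Option values per dropdown, taken from the question's XPaths; extend as needed.
geography = ["c", "s"]
adjust = ["1"]
analysis = ["base"]
domain = ["d1"]

combos = list(product(geography, adjust, analysis, domain))
print(len(combos))  # 2 combinations with the lists above

# With a live driver you would then select each combination in turn:
# from selenium.webdriver.support.ui import Select
# for g, adj, ana, dom in combos:
#     Select(b.find_element_by_id("geography")).select_by_value(g)
#     Select(b.find_element_by_id("adjust")).select_by_value(adj)
#     Select(b.find_element_by_id("analysis")).select_by_value(ana)
#     Select(b.find_element_by_id("domain")).select_by_value(dom)
#     # ...click the download button, wait for the file, then repeat
```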

Python Selenium - Count number of items in listbox

I am trying to count the number of items in a list box on a webpage and then select multiple items from this list box. I can select the items fine I am just struggling to find out how to count the items in the list box.
see code:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
... ...
accountListBox = Select(driver.find_element_by_id("ctl00_MainContent_accountItemsListBox"))
accountListBox.select_by_index(0)
print(len(accountListBox))
I have tried using len() which results in the error "TypeError: object of type 'Select' has no len()".
I have also tried accountListBox.size() and also removed the 'Select' from line 3 which also doesn't work.
Pretty new to this so would appreciate your feedback.
Thanks!
According to the docs, a list of a Select element's options can be obtained via select.options. In your particular case this would be accountListBox.options, and you need to call len() on that, not on the Select instance itself:
print(len(accountListBox.options))
Or, if you only want to print a list of currently selected options:
print(len(accountListBox.all_selected_options))
You should use find_elements with a selector common to each of the list box's items to find all of them, store the found elements in a variable, and count them with len().
I usually use Selenium along with Beautiful Soup. Beautiful Soup is a Python package for parsing HTML and XML documents.
With Beautiful Soup you can get the count of items in a list box in the following way:
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.PhantomJS() # or webdriver.Firefox()
driver.get('http://some-website.com/some-page/')
html = driver.page_source.encode('utf-8')
b = BeautifulSoup(html, 'lxml')
items = b.find_all('p', attrs={'id':'ctl00_MainContent_accountItemsListBox'})
print(len(items))
I assumed that the DOM element you want to find is a paragraph (p tag), but you can replace this with whatever element you need to find.
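A driver-free illustration of the same counting logic, using only the standard library: parse a small HTML snippet and count the option tags inside the list box. With Selenium or Beautiful Soup you would feed driver.page_source in instead of the literal string below, which is a made-up stand-in for the real page:

```python
from html.parser import HTMLParser

# Stand-in for driver.page_source; the id mirrors the question's list box.
html = """
<select id="ctl00_MainContent_accountItemsListBox">
  <option>Account A</option>
  <option>Account B</option>
  <option>Account C</option>
</select>
"""

class OptionCounter(HTMLParser):
    """Counts <option> start tags as the document is parsed."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self.count += 1

parser = OptionCounter()
parser.feed(html)
print(parser.count)  # 3
```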
