I'm trying to create a Discord embed that contains info from a website. I'm trying to store the driver.find_element(...).text value with Selenium.
Then I want to put that Python variable into the JSON that makes a Discord embed.
The problem is that each product on this page gives me 3 different texts. How can I save each one in a different variable? Here is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
import csv
DRIVER_PATH = r'C:\chromedriver.exe'  # raw string so the backslash is kept literally
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get('https://www.zalando.es/release-calendar/zapatillas-mujer/')
product_1 = driver.find_element(By.XPATH, '//*[@id="release-calendar"]/div/div[1]/div')
print(product_1.text)
The result in the terminal is:
119,95€
adidas Originals
Forum MID TM
28 de octubre de 2022, 8:00
Recordármelo
Thanks for the help. I really don't know how to save the .text info into different Python variables.
Store the text in a variable, then split it into a list:
element_text = product_1.text
element_text_split = element_text.split()  # split on any whitespace (spaces and line breaks)
If you wanted the price of that item, element_text_split[0] would get the first word.
The second word, element_text_split[1], is the first word of the company name ("adidas" here).
You could also slice up the string using string slicing. Keep in mind not all data you get is going to look exactly the same.
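Because the text printed in the question has one field per line, splitting on line breaks may fit better than splitting on spaces. A minimal sketch, assuming every product has the same five lines in the same order as the sample output; the variable names (price, brand, model, release_date, button_label) are only illustrative:

```python
# Sample text copied from the terminal output in the question.
element_text = (
    "119,95€\n"
    "adidas Originals\n"
    "Forum MID TM\n"
    "28 de octubre de 2022, 8:00\n"
    "Recordármelo"
)

# splitlines() yields one list item per line, so multi-word fields stay whole.
lines = element_text.splitlines()

# Unpacking assumes exactly five lines; it raises ValueError otherwise.
price, brand, model, release_date, button_label = lines

print(price)
print(brand)
print(model)
```

If some products have more or fewer lines, indexing into `lines` after checking `len(lines)` is safer than unpacking.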
So, you are trying to get texts of each product on the page, right?
If so, you can use find_elements() method to put all the products on the page into a list. For that you need to use an xpath (or any other locator) that finds not one element but all of them.
Here is the code
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('https://www.zalando.es/release-calendar/zapatillas-mujer/')
# This finds all products on the page and puts them into a list of elements:
products = driver.find_elements(By.XPATH, '//div[@class="auWjdQ rj7dfC Hjm2Cs _8NW8Ug xJbu_q"]')
# This will put text of each product into a list
texts = []
for product in products:
    texts.append(product.text)
# Here we can see what is in the list of texts
print('All texts:', texts)
# If a list doesn't suit your needs, you can unpack it into three variables
# (only if you are sure that there are exactly three items on the page):
prod1, prod2, prod3 = texts
print('Product 1:', prod1)
print('Product 2:', prod2)
print('Product 3:', prod3)
driver.quit()
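To get from these texts to the Discord embed mentioned in the question, each product text could be split into lines and mapped onto embed fields. This is a rough sketch, not a definitive implementation: it assumes each text has the same line layout as the sample output in the question, and it assumes the common webhook payload shape (an `embeds` list whose entries carry `title`, `description` and `fields`). `build_embed` is a hypothetical helper name.

```python
import json


def build_embed(product_text):
    """Turn one product's text into a Discord-style embed dict."""
    lines = product_text.splitlines()
    # Assumed line order: price, brand, model, release date, then a button label.
    price, brand, model, release_date = lines[:4]
    return {
        "title": f"{brand} {model}",
        "description": f"Release: {release_date}",
        "fields": [{"name": "Price", "value": price, "inline": True}],
    }


# Stand-in for the `texts` list collected with Selenium.
texts = [
    "119,95€\nadidas Originals\nForum MID TM\n28 de octubre de 2022, 8:00\nRecordármelo"
]

payload = {"embeds": [build_embed(text) for text in texts]}
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The resulting `payload` dict is what would be sent as the JSON body of the webhook request.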
This is what I've used in the past, but it assumes the text you're trying to extract is stored in a specific element attribute.
For example:
element_text = driver.find_element(By.XPATH, '//*[@id="release-calendar"]/div/div[1]/div').get_attribute('title')
You'll need to replace 'title' with the specific attribute of your element that contains the text you want.
Related
I want to make a search system: when we enter a word in a variable, it searches all the link names on this page (all the games), a bit like Ctrl+F, and displays the results (names + links) using Selenium (Python).
I don't know how to build a system like that. Any help would be appreciated!
Happy coding!
You are attempting to locate specific elements on a page and then sorting through them for a key search term. Selenium can identify elements on a page through a number of methods, see here for a guide. Once you have located all the elements you can filter them for the search term of interest.
Finding ALL the elements of interest:
I would use the XPath of your elements to find them on the page and build a list that you can then search through based on your keyword. In your case they are all identifiable by this XPath:
//div[@class="blog-content"]//a
Extract the required information:
Once you have the list of elements, you will need to iterate over them to extract the href attribute (the game's URL) and the innerHTML text (the name of the game).
I have used a dictionary comprehension in the example below to do this, which creates a dictionary {url: name, ...} that you can filter your specific items from.
Example Code:
from selenium import webdriver
from selenium.webdriver.firefox.service import Service  # Firefox service, to match GeckoDriverManager
from selenium.webdriver.common.by import By
from webdriver_manager.firefox import GeckoDriverManager
website_url = 'https://steamunlocked.net/all-games-2/'
game_xpaths = '//div[@class="blog-content"]//a'
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install()))
driver.get(website_url)
game_elements = driver.find_elements(By.XPATH, game_xpaths)
games = {g.get_attribute('href'):g.get_attribute('innerHTML') for g in game_elements}
games
"""
Outputs:
{'https://steamunlocked.net/red-tether-free-download/': '—Red—Tether–> Free Download (v1.006)',
'https://steamunlocked.net/hack-g-u-last-recode-free-download/': '.hack//G.U. Last Recode Free Download (v1.01)',
'https://steamunlocked.net/n-verlore-verstand-free-download/': '‘n Verlore Verstand Free Download',
'https://steamunlocked.net/0-n-0-w-free-download/': '0°N 0°W Free Download',
'https://steamunlocked.net/007-legends-free-download/': '007 Legends Free Download', ...
"""
Finding SPECIFIC items (i.e. CTRL+F)
To filter the dictionary down to the items whose name contains the word or string you are interested in:
def search(myDict, search_term):
    # Case-insensitive match on the name; returns [name, url] pairs
    return [[v, k] for k, v in myDict.items() if search_term.lower() in v.lower()]
>>> search(games, 'Ninja')
[['10 Second Ninja Free Download','https://steamunlocked.net/10-second-ninja-free-download/'],
['10 Second Ninja X Free Download','https://steamunlocked.net/10-second-ninja-x-free-download/']]
My code goes to a website and clicks each row of the table in turn, which opens a new window.
I want to scrape one piece of information from each new window, but I am having difficulty using CSS selectors to get this field (Faculty).
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import requests
driver = webdriver.Chrome()
productlink=[]
driver.get('https://aaaai.planion.com/Web.User/SearchSessions?ACCOUNT=AAAAI&CONF=AM2021&USERPID=PUBLIC&ssoOverride=OFF')
time.sleep(3)
page_source = driver.page_source
soup = BeautifulSoup(page_source,'html.parser')
productlist=driver.find_elements_by_class_name('clickdiv')
for item in productlist:
    item.click()  # opens the new window for each row
    time.sleep(2)
    faculty = driver.find_elements_by_xpath('//*[@id="W1"]/div/div/div/div[2]/div[2]/table/tbody/tr[7]/td/table/tbody/tr/td[2]/b')
    print(faculty)
    driver.find_element_by_class_name('XX').click()  # closes the window
    time.sleep(1)
You have:
faculty = driver.find_elements_by_xpath('//*[@id="W1"]/div/div/div/div[2]/div[2]/table/tbody/tr[7]/td/table/tbody/tr/td[2]/b')
find_elements_by_xpath returns a list of elements, so when you print it you see a list of Selenium element objects rather than their text. If you only expect one element, you can use find_element_by_xpath instead of find_elements_by_xpath.
If you want to obtain the faculty values and you have a list, you need to loop over it and extract the text (or read an attribute) of each element.
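As a small illustration of that loop, the helper below collects the `.text` of each element in a list. It relies only on the `.text` attribute, so it is shown with stand-in objects rather than a live browser; `element_texts` is a made-up name and the sample names are placeholder data, not values from the page.

```python
from types import SimpleNamespace


def element_texts(elements):
    """Return the stripped, non-empty text of each element."""
    texts = []
    for element in elements:
        text = element.text.strip()
        if text:
            texts.append(text)
    return texts


# Stand-ins for what find_elements_by_xpath(...) would return.
fake_elements = [
    SimpleNamespace(text=" Jane Doe, MD "),
    SimpleNamespace(text=""),
    SimpleNamespace(text="John Roe, PhD"),
]

print(element_texts(fake_elements))
```

In the question's loop, this would be called as `element_texts(faculty)` in place of `print(faculty)`.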
Faculty reside within a table with class sorttable. You want the first of these, so you can use nth-of-type to restrict matches. The actual names are within b tags, so use a descendant combinator (a space) to move to the b tags within the table. Use a list comprehension to return a faculty list:
faculty = [i.text for i in driver.find_elements_by_css_selector('.sortable:nth-of-type(1) b')]
I am trying to create a function that takes a search term from a list and adds the matching link to a list of links.
formatted_link = link_to_vid.replace(' ', '+')
driver.get('https://www.youtube.com/results?search_query={}'.format(str(formatted_link)))
It then extracts the first link that YouTube returns. For example, I search 'Never Gonna Give You Up' and it adds the link of the first result to a list.
Then I may want to do the same for several terms, e.g. ['Never Gonna Give You Up', 'Shrek'].
How would I do this without actually clicking on the link?
I hope I got your question right. Here is an example:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.implicitly_wait(5)
# Let's create a list of terms to search for and an empty list for links
search_terms = ['Never Gonna Give You Up', 'Shrek']
links = []
for term in search_terms:
    # Open YouTube page for each search term
    driver.get('https://www.youtube.com/results?search_query={}'.format(term))
    # Find the first webelement with a video thumbnail on the page
    link_webelement = driver.find_element(By.CSS_SELECTOR, 'div#contents ytd-item-section-renderer>div#contents a#thumbnail')
    # Grab the webelement's href, this is your link, and store it in the list of links:
    links += [link_webelement.get_attribute('href')]
print(links)
Hope this helps. Keep in mind that for scraping data like this you don't have to open web pages with Selenium; there are libraries like Beautiful Soup that help you do this kind of thing faster.
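One detail the example above glosses over is that search terms contain spaces and possibly other special characters. A small sketch of building the search URL with the standard library instead of replacing spaces by hand; `youtube_search_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode


def youtube_search_url(term):
    """Build a YouTube results URL with the search term safely encoded."""
    # urlencode turns spaces into '+' and escapes characters such as '&'.
    return "https://www.youtube.com/results?" + urlencode({"search_query": term})


search_terms = ["Never Gonna Give You Up", "Shrek"]
urls = [youtube_search_url(term) for term in search_terms]

for url in urls:
    print(url)
```

Each URL can then be passed to `driver.get(...)` inside the loop.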
I am quite new to Selenium, so I am unsure how to do this, or even how to word it.
What I am trying to do is use Selenium to grab the following values and store them in a list.
Below is an image of the Firefox inspector window showing what I am trying to grab (highlighted):
https://i.stack.imgur.com/rHk9R.png
In Selenium, you access elements using functions find_element(s)_by_xxx(), where xxx is for example the tag name, element name or class name (and more). The functions find_element_... return the first element that matches the argument, while find_elements_... return all matching elements.
Selenium has good documentation; in the "Getting started" section you can find several examples of basic usage.
As to your question, the following code should collect the values you want:
from selenium import webdriver
driver = webdriver.Firefox() # driver for the browser you use
select_elem = driver.find_element_by_name('ctl00_Content...') # full name of the element
options = select_elem.find_elements_by_tag_name('option')
values = []
for option in options:
    val = option.get_attribute('value')
    values.append(val)
I am trying to come up with a way to scrape information on houses on Zillow, and I am currently using XPath to look at data such as rent price, principal and mortgage costs, and insurance costs.
I was able to find the information using XPath, but when I tried to automate it in a for loop I realized that not every listing has the same XPath for the same data. For some listings it is off by one in a list or div index. See the code below for what I mean. How do I make it more specific? Is there a way to look up a string like "principal and interest" and select the next value, which would be the numerical value that I am looking for?
works for one listing:
driver.find_element_by_xpath("/html/body/div[1]/div[6]/div/div[1]/div[1]/div[1]/ul/li[1]/article/div[1]/div[2]/div")
a different listing would contain this:
driver.find_element_by_xpath("/html/body/div[1]/div[6]/div/div[1]/div[1]/div[2]/ul/li[1]/article/div[1]/div[2]/div")
The xpaths that you are using are specific to the elements of the first listing. To be able to access elements for each listing, you will need to use xpaths in a way that can help you access elements for each listing:
import pandas as pd
from selenium import webdriver
I searched for listings for sale in Manhattan and got the URL below:
url = "https://www.zillow.com/homes/Manhattan,-New-York,-NY_rb/"
Ask Selenium to open the above link in Chrome:
driver = webdriver.Chrome()
driver.get(url)
I hovered my mouse on one of the house listings and clicked "inspect". This opened the HTML code and highlighted the item I am inspecting. I noticed that the elements having class "list-card-info" contain all the info of the house that we need. So, our strategy would be for each house access the element that has class "list-card-info". So, using the following code, I saved all such HTML blocks in house_cards variable
house_cards = driver.find_elements_by_class_name("list-card-info")
There are 40 elements in house_cards, i.e. one for each house (each page lists 40 houses).
I loop over each of these 40 houses and extract the information I need. Notice that I am now using XPaths that are relative to the "list-card-info" element. I save this info in a pandas DataFrame.
address = []
price = []
bedrooms = []
baths = []
sq_ft = []
for house in house_cards:
    address.append(house.find_element_by_class_name("list-card-addr").text)
    price.append(house.find_element_by_class_name("list-card-price").text)
    bedrooms.append(house.find_element_by_xpath('.//div[@class="list-card-heading"]/ul[@class="list-card-details"]/li[1]').text)
    baths.append(house.find_element_by_xpath('.//div[@class="list-card-heading"]/ul[@class="list-card-details"]/li[2]').text)
    sq_ft.append(house.find_element_by_xpath('.//div[@class="list-card-heading"]/ul[@class="list-card-details"]/li[3]').text)
driver.quit()
# print(address, price,bedrooms,baths, sq_ft)
Manhattan_listings = pd.DataFrame({
    "address": address,
    "bedrooms": bedrooms,
    "baths": baths,
    "sq_ft": sq_ft,
    "price": price,
})
pandas dataframe output
Now, to extract info from more pages (page 2, page 3, etc.), you can loop over the result pages, i.e. keep modifying your URL and extracting the info from each one.
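On the question of looking up a string like "principal and interest" and taking the value next to it: XPath can match an element by its text and then step to the following sibling. The sketch below only builds such an XPath string. It assumes the value sits in the element immediately after the label, which has not been checked against Zillow's actual markup, and `value_after_label_xpath` is a hypothetical helper name.

```python
def value_after_label_xpath(label):
    """Build an XPath for the first sibling after the element containing `label`."""
    # contains(text(), ...) is case-sensitive; `label` must not contain double quotes.
    return f'//*[contains(text(), "{label}")]/following-sibling::*[1]'


xpath = value_after_label_xpath("Principal and interest")
print(xpath)

# With a live driver this could be used as (untested against the real page):
# value = driver.find_element_by_xpath(xpath).text
```

Because the lookup is anchored on the label text instead of on div or li positions, it should not shift when a listing has an extra block above it.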
Happy Scraping!
Selecting multiple elements using XPath is not a good idea. You can look into CSS selectors instead; with those you can select groups of similar elements.