Scraping webpage with selenium (python)

Hi, I would like to scrape what is selected in the following image:
[Image: Code]
I know I could use the following code to get the text:
cell = driver.find_elements_by_xpath(".//*[@id='ip_selection1233880116name']")
print cell.text
But my problem is that ip_selection1233880116name is dynamic: it changes every time, as you can see from the image.
How could I do it?
Thanks a lot for your help!

Use contains to match just the stable parts of the id, presuming the numbers are all that change. For a single element you should also use find_element as opposed to find_elements, which returns a list:
find_element_by_xpath("//*[contains(@id,'ip_selection') and contains(@id,'name')]")
You could also use starts-with and ends-with, depending on the browser. Note that ends-with is an XPath 2.0 function, and browsers generally implement only XPath 1.0, so it may not be available:
find_element_by_xpath("//*[starts-with(@id,'ip_selection') and ends-with(@id,'name')]")
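If ends-with is rejected by the browser, the same test can be written with XPath 1.0 functions only. Here is a minimal sketch, assuming the id always has the form prefix + digits + suffix. The helper names dynamic_id_xpath and find_by_dynamic_id are made up for illustration, and the lookup uses the Selenium 4 style find_element(By.XPATH, ...) rather than find_element_by_xpath.

```python
def dynamic_id_xpath(prefix: str, suffix: str) -> str:
    """Build an XPath 1.0 expression for ids that start with `prefix`
    and end with `suffix` (assumes neither contains a single quote)."""
    # XPath 1.0 has starts-with() but no ends-with(), so the suffix test
    # cuts the tail of @id out with substring() and compares it to `suffix`.
    return (
        f"//*[starts-with(@id, '{prefix}') and "
        f"substring(@id, string-length(@id) - string-length('{suffix}') + 1)"
        f" = '{suffix}']"
    )


def find_by_dynamic_id(driver, prefix: str, suffix: str):
    """Look up a single element whose id matches prefix...suffix."""
    # Imported here so the XPath builder above works without Selenium installed.
    from selenium.webdriver.common.by import By

    return driver.find_element(By.XPATH, dynamic_id_xpath(prefix, suffix))


if __name__ == "__main__":
    print(dynamic_id_xpath("ip_selection", "name"))
```

With a live driver, find_by_dynamic_id(driver, "ip_selection", "name").text would then read the cell regardless of the number in the middle.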

Related

Selenium - how to make find_elements readable

Basic concept I know:
find_element finds a single element. We can use .text or .get_attribute('href') to read the element. Since find_elements returns a list, we can't use .text or .get_attribute('href') on it directly; otherwise it raises an error saying the list has no such attribute.
To read the information from find_elements, we can use a for loop:
vegetables_search = driver.find_elements(By.CLASS_NAME, "product-brief-wrapper")
for i in vegetables_search:
    print(i.text)
Here is my problem: when I use find_element, it shows the same result every time. I searched for the problem on the internet, and the answer said it's because find_element returns only a single result. Here is my code, which is meant to grab the different URLs:
links.append(driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
But I don't know how to combine the results in pandas. If I run this code, the links column contains the same URL in every row of the CSV file:
vegetables_search = driver.find_elements(By.CLASS_NAME, "product-brief-wrapper")
Product_name = []
links = []
for search in vegetables_search:
    Product_name.append(search.find_element(By.TAG_NAME, "h4").text)
    links.append(driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
# use the pandas module to export the information
df = pd.DataFrame({'Product': Product_name, 'Link': links})
df.to_csv('name.csv', index=False)
print(df)
Of course, if I loop over find_elements separately, it shows different links. (Does that mean my XPath is correct?)
product_link = driver.find_elements(By.XPATH, "//a[@rel='noopener']")
for i in product_link:
    print(i.get_attribute('href'))
My questions:
Besides using a for loop, how can I make the result of find_elements readable, just like find_element(By.attribute, 'content').text?
How do I fix my code? I cannot print out different URLs.
Thanks so much. ORZ
This is the HTML inspected from the website:

This line:
links.append(driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
should be changed to:
links.append(search.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href') always returns the first element in the whole DOM matching the .//a[@rel='noopener'] XPath locator, while you want the match inside another element.
To do so, call find_element on the WebElement object search that you want to search inside, instead of on the WebDriver object driver, as shown above.
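Putting the fix together, a minimal sketch of the scoped lookup might look like this. The function name collect_products and the idea of passing locators in as (by, value) tuples are assumptions for illustration, not part of the original code.

```python
def collect_products(driver, card_locator, name_locator, link_locator):
    """Return one {'Product': ..., 'Link': ...} row per product card."""
    rows = []
    for card in driver.find_elements(*card_locator):
        # Calling find_element on the card (a WebElement) limits the search
        # to that card's subtree. An XPath locator must start with "." for
        # this to work; "//a" would still search the whole document.
        name = card.find_element(*name_locator).text
        link = card.find_element(*link_locator).get_attribute("href")
        rows.append({"Product": name, "Link": link})
    return rows


# Usage with a live driver (locator values taken from the question):
#   from selenium.webdriver.common.by import By
#   rows = collect_products(
#       driver,
#       (By.CLASS_NAME, "product-brief-wrapper"),
#       (By.TAG_NAME, "h4"),
#       (By.XPATH, ".//a[@rel='noopener']"),
#   )
#   pd.DataFrame(rows).to_csv("name.csv", index=False)
```

As for reading a find_elements result without writing out a for loop, a list comprehension such as [e.text for e in elements] is the usual shorthand; it still visits each element, because the list itself has no .text.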

How to deal with dynamic Xpath selenium and python

I have an XPath like this that changes every time I open the URL:
//*[@id="awsui-select-23"]/div/awsui-icon/span
What changes is the number after select, for example:
//*[@id="awsui-select-41"]/div/awsui-icon/span
I tried the following XPath, but it's not working:
("//*[contains(@id, 'awsui-select-')]")
Can you help me understand how I should change the XPath in my code so I can access the element each time the URL is opened?

This is the XPath you are using:
//*[@id="awsui-select-23"]/div/awsui-icon/span
Only the digits change, so you can use contains:
//*[contains(@id,'awsui-select-')]/div/awsui-icon/span
or (if you want to select the first span)
(//*[contains(@id,'awsui-select')]/descendant::span)[1]
Use it like this:
ele = driver.find_element_by_xpath("//*[contains(@id,'awsui-select-')]/div/awsui-icon/span")
Update 1:
The following works for the OP:
(//*[contains(@id,'awsui-select')]/descendant::span)[2]
or
(//*[contains(@id,'awsui-select-')]/div/awsui-icon/span)[2]
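The parentheses in the indexed expressions matter: without them, span[2] means "a span that is the second span child of its parent", whereas (...)[2] picks the second node of the whole result set in document order. A small sketch of a helper that applies the wrapping; the name nth_match is hypothetical.

```python
def nth_match(xpath: str, n: int) -> str:
    """Wrap `xpath` so that [n] indexes the whole result set (1-based)."""
    if n < 1:
        raise ValueError("XPath positions start at 1")
    return f"({xpath})[{n}]"


if __name__ == "__main__":
    base = "//*[contains(@id,'awsui-select-')]/div/awsui-icon/span"
    print(nth_match(base, 2))
```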

Send_keys function triggers error message: 'Message: element not interactable'

I'm using Selenium to fill out this HTML form, but when it comes to inputting the data it says 'element not interactable'. I am able to click on the element; however, sending a string produces an error. How can I fix this?
driver.get('https://www.masmovil.es/cobertura-fibra-optica-campos/')
prov = Select(driver.find_element_by_xpath('//*[@id="province"]'))
prov.select_by_index(32)
driver.find_element_by_xpath('//*[@id="town"]').send_keys('1')
Thank you!

On the page you are accessing, the selector by_xpath('//*[@id="town"]') returns two elements: one is an "mm-ui-autocomplete", the other is an "input".
The "mm-ui-autocomplete" is neither visible nor interactable to a real user, which is probably what throws the exception you're seeing. Selenium always takes the first match when the selector returns more than one element, so, assuming you want to type something into the "Localidad" field, it is selecting the wrong element.
Try changing your selector to by_xpath('//input[@id="town"]') and see if it works.
Hope it helps.
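When it is not obvious which tag to pin the selector to, another option is to collect every match and keep the first one that is actually displayed. This is only a sketch: first_displayed is a made-up helper name, and it takes the locator as a (by, value) tuple. It relies on WebElement.is_displayed().

```python
def first_displayed(driver, locator):
    """Return the first element matching `locator` that is visible, or None."""
    for element in driver.find_elements(*locator):
        # Hidden matches (such as the wrapper that shares the id) are skipped.
        if element.is_displayed():
            return element
    return None


# Usage with a live driver:
#   from selenium.webdriver.common.by import By
#   town = first_displayed(driver, (By.XPATH, '//*[@id="town"]'))
#   if town is not None:
#       town.send_keys('1')
```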

Can you try this CSS selector:
input[id='town']
Code:
driver.find_element_by_css_selector("input[id='town']").send_keys('1')
The XPath you have used (//*[@id="town"]) matches two entries:
one with the mm-ui-autocomplete tag and one with the input tag.
Always give preference to a CSS selector over XPath. It's more stable than XPath.
If you do not want to use a CSS selector, you can use an XPath like this:
//input[@id='town']
Code:
driver.find_element_by_xpath("//input[@id='town']").send_keys('1')

In my case, find_element was running before the frontend had finished loading.
I solved this by adding sleep(2) before the find_element_by_xpath call. You will need to import the function with from time import sleep.

Using selenium to get access class info on website

I am using the following code with Python 3.6 and Selenium:
element = driver.find_element_by_class_name("first_result_price")
print(element)
On the website it looks like this:
<span class="first_result_price">712</span>
However, if I print element I get a completely different number.
Any suggestions?
Many thanks!

"element" is a WebElement, a type of object that Selenium provides, so printing it shows the object's representation rather than the text on the page. If you want the text inside that element, use
element.text
which should return what you're looking for, '712', albeit in string form.
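Since .text gives back a string, it has to be converted before doing arithmetic with it. A tiny sketch, assuming the price is a plain whole number as in the question; parse_price is a hypothetical helper name.

```python
def parse_price(text: str) -> int:
    """Convert element text such as ' 712 ' to the integer 712."""
    # strip() drops surrounding whitespace; int() raises ValueError
    # if anything other than digits (and an optional sign) remains.
    return int(text.strip())


# Usage with a live driver:
#   price = parse_price(element.text)
```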

How to Collect the line with Selenium Python

I want to know how I can collect the line or mailto link containing an email address using Selenium and Python. The emails contain an @ sign. On the contact page I tried the following XPath, but it works on some pages and not on others:
//*[contains(text(),"@")]
The email formats differ: sometimes it is <p>Email: name@domain.com</p>, sometimes <span>Email: name@domain.com</span>, or just name@domain.com.
Is there any way to collect them with one statement?
Thanks

Here is the XPath you are looking for, my friend:
//*[contains(text(),"@")]|//*[contains(@href,"@")]

You could create a collection of the link text values on the page that contain @ and then iterate through it to format them. You are going to have to format the span that has Email: name@domain.com anyway.
Use find_elements_by_partial_link_text to make the collection.

I think you need two XPaths: the first finds elements whose text contains "Email:", the second finds elements whose href attribute contains "mailto:".
//*[contains(text(),"Email:")]|//*[contains(@href,"mailto:")]
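Whichever XPath finds the elements, the matched text still has to be reduced to the bare address ("Email: name@domain.com" and "mailto:name@domain.com" both need trimming). Here is a rough sketch of that step using a regular expression. The pattern is deliberately simple and is an assumption about what the addresses look like, and extract_emails is a made-up helper name.

```python
import re

# Deliberately simple pattern: good enough for "name@domain.com"-style
# addresses, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_emails(strings):
    """Return the unique addresses found in `strings`, in first-seen order."""
    found = []
    for s in strings:
        if not s:  # get_attribute("href") returns None when there is no href
            continue
        for address in EMAIL_RE.findall(s):
            if address not in found:
                found.append(address)
    return found


# Usage with a live driver, where `elements` is the find_elements result:
#   strings = [e.text for e in elements]
#   strings += [e.get_attribute("href") for e in elements]
#   emails = extract_emails(strings)
```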
