Data scraping on Flipkart [duplicate] - python

I'm trying to find the number of search results displayed on the Flipkart e-commerce website using a class name, XPath, or CSS selector, but I'm unable to find the total number of results.
The total number is displayed as text on the page:
"Showing 1 – 24 of 8,747 results for "mobile phones"
I'm also unable to identify the number of search items displayed on each page, which is 24 in this case.
The code that I used to find elements is:
List<WebElement> flipkartTotalItems = driver.findElements(By.cssSelector("#container > div > div:nth-child(2) > div > div._1XdvSH._17zsTh > div > div._2xw3j- > div > div:nth-child(3) > div._2SxMvQ > div > div:nth-child(1)"));
#container > div > div:nth-child(2) > div > div._1XdvSH._17zsTh > div > div._2xw3j- > div > div._15eYWX > div > div.KG9X1F > h1 > span
I added a Thread.sleep() call to wait for the page to load, too.
HTML code for the text WebElement:

You can use the XPath below to locate the text "Showing 1 – 24 of 8,747 results for "mobile phones":
//*[contains(text(),'Showing 1 – 24 of 8,747 results for')]
And the one below to find the number of search results shown on a page:
//*[@class='_1UoZlX']

As per the URL you have shared, to find the number of search results displayed on the Flipkart search results page you have to induce WebDriverWait for the WebElement to be visible and then extract the text, as follows:
System.setProperty("webdriver.gecko.driver", "C:\\Utility\\BrowserDrivers\\geckodriver.exe");
WebDriver driver = new FirefoxDriver();
driver.get("https://www.flipkart.com/search?q=mobile%20phones&otracker=start&as-show=on&as=off");
new WebDriverWait(driver, 20).until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//span[@class='W-gt5y']")));
String results = driver.findElement(By.xpath("//span[@class='W-gt5y']//ancestor::span")).getText();
System.out.println(results);
Console Output :
Showing 1 – 24 of 8,753 results for "mobile phones"
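Once that text is in hand, the counts can be pulled out with a regular expression instead of hard-coding the numbers. A minimal sketch in Python, assuming the string keeps the "Showing X – Y of Z results" shape shown above:

```python
import re

# The summary string extracted above (the numbers change per search)
text = 'Showing 1 – 24 of 8,753 results for "mobile phones"'

match = re.search(r"Showing\s+(\d+)\s*[–-]\s*(\d+)\s+of\s+([\d,]+)", text)
first, last, total = match.groups()

items_on_page = int(last) - int(first) + 1   # items shown on this page
total_results = int(total.replace(",", ""))  # total across all pages
print(items_on_page, total_results)  # 24 8753
```

The character class [–-] accepts either the en dash Flipkart renders or a plain hyphen.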

I think the best way to get the total number of results is to use this XPath:
//span[contains(text(),'Showing')]
Then call getText() and pull the number out of the string:
String result = driver.findElement(By.xpath("//span[contains(text(),'Showing')]")).getText();

The HTML is not very friendly for browser automation, but I think the most stable XPath for getting the search results on a page is:
List<WebElement> flipkartTotalItems = driver.findElements(By.xpath("//div[@class = 'col col-7-12']"));
Getting the size of this list will give you the number of results on the page.
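To illustrate why the list size equals the result count, here is a small self-contained Python sketch using ElementTree on stand-in markup (on the real page the same list comes back from driver.findElements as above):

```python
import xml.etree.ElementTree as ET

# Stand-in markup: one div per result card, as on the Flipkart results page
html = """
<div>
  <div class="col col-7-12">result 1</div>
  <div class="col col-7-12">result 2</div>
  <div class="col col-7-12">result 3</div>
</div>
"""

# Same idea as //div[@class = 'col col-7-12']: one match per result card
items = ET.fromstring(html).findall(".//div[@class='col col-7-12']")
print(len(items))  # 3
```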

Related

How do I get an element from an HTML table that has an always changing child element number?

I am trying to collect the total number of hours that I worked from each pay period that I worked. I can collect the total number of hours from the "Grand Totals" section, however it seems like that section never has a static child element number and shares class names with another row.
Here is an example:
From the first page, if I copy the selector in Chrome, I get:
#TableTimeCard > tbody > tr:nth-child(12) > td:nth-child(3)
However, when I go to the next pay period, I get:
#TableTimeCard > tbody > tr:nth-child(14) > td:nth-child(3)
That "tr:nth-child" number ranges between 4 and however many entries I had on my time card.
Here is the snippet of code from the site. I am trying to take in 10.37:
I've tried grabbing it by that class name, but then it just grabs the first number after "Total Hours" in the image (0.00 in this case). I believe the reason is that it has the same class name:
All I need to do is print grand total hours to console. Any ideas?
I am doing print(driver.find_element_by_css_selector("#TableTimeCard > tbody > tr:nth-child(12) > td:nth-child(3)").text)
Thanks!
Well, you can use find_elements to store all of them, and iterate like this:
hours = driver.find_elements(By.CSS_SELECTOR, "#TableTimeCard > tbody > tr > td:nth-child(3)")
print(hours[0].text)
or you can iterate over that list as well:
for hour in driver.find_elements(By.CSS_SELECTOR, "#TableTimeCard > tbody > tr > td:nth-child(3)"):
    print(hour.text)
Update 1:
If you just want to grab the grand-total hours value, use this XPath:
//td[text()='Grand Totals']/following-sibling::td[4]
In code:
print(driver.find_element_by_xpath("//td[text()='Grand Totals']/following-sibling::td[4]").text)

Selenium Python - get elements and click on each

I want to store elements in a list and click on each of them one by one. But I have to target the <u> element, because .click() only works when I use the XPath of the <u>, not the element with the class.
<a href="javascript:doQuery(...);" class="report">
    <u>Test1</u>
</a>
<a href="javascript:doQuery(...);" class="report">
    <u>Test2</u>
</a>
<a href="javascript:doQuery(...);" class="report">
    <u>Test3</u>
</a>
Any tips? Thanks.
You can use a CSS selector for this.
selector = '.report > u'
elements = driver.find_elements_by_css_selector(selector)
for elem in elements:
    elem.click()
This selector .report > u will select all <u> elements whose parent is an element with the report class.
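If it helps to see exactly what .report > u matches, here is a tiny offline model with ElementTree (in Selenium the same elements come back from find_elements_by_css_selector; the href values are placeholders):

```python
import xml.etree.ElementTree as ET

# Stand-in for the report links from the question, plus one non-match
html = """
<div>
  <a href="#" class="report"><u>Test1</u></a>
  <a href="#" class="report"><u>Test2</u></a>
  <a href="#" class="other"><u>Other</u></a>
</div>
"""

# Same shape as the CSS '.report > u': <u> children of class="report" parents
labels = [u.text for u in ET.fromstring(html).findall(".//a[@class='report']/u")]
print(labels)  # ['Test1', 'Test2']
```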

python selenium get text from element

How would I get the text "Premier League (ENG 1)" extracted from this HTML tree? (marked part)
I tried to get the text with XPath, CSS selector, class... but I can't seem to get this text extracted.
Basically I want to create a list, go over all "class=with icon" elements that include a text (League), and append the text to that list.
This was my last attempt:
def scrape_test():
    alleligen = []
    # click the dropdown menu to open the folder with all the leagues
    league_dropdown_menu = driver.find_element_by_xpath('/html/body/main/section/section/div[2]/div/div[2]/div/div[1]/div[1]/div[7]/div')
    league_dropdown_menu.click()
    time.sleep(1)
    # get the text from all elements that contain a league as text
    leagues = driver.find_elements_by_css_selector('body > main > section > section > div.ut-navigation-container-view--content > div > div.ut-pinned-list-container.ut-content-container > div > div.ut-pinned-list > div.ut-item-search-view > div.inline-list-select.ut-search-filter-control.has-default.has-image.is-open.active > div > ul > li:nth-child(3)')
    # append to the list
    alleligen.append(leagues)
    print(alleligen)
But I don't get any output.
What am I missing here?
(I am new to coding)
try this
path = "//ul[@class='inline-list']//li[2]"
element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH, path))).text
print(element)
The path specifies the element you want to target. The leading // in the path means that the element you want to find is not necessarily the first element on the page and can exist anywhere in it. li[2] selects the second li tag, i.e. the li right after the first one.
The WebDriverWait waits up to a specified number of seconds (in this case, 5) for the element to be present. You might want to put the WebDriverWait inside a try block.
The .text at the end extracts the text from the tag. In this case it is the text you want, Premier League (ENG 1).
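As a quick offline check of which node an index like that selects, assuming the open dropdown renders as a plain inline-list (markup guessed from the question):

```python
import xml.etree.ElementTree as ET

# Guessed markup for the open league dropdown
html = """
<ul class="inline-list">
  <li>All leagues</li>
  <li>Premier League (ENG 1)</li>
  <li>LaLiga (ESP 1)</li>
</ul>
"""

# //ul[@class='inline-list']//li[2] picks the second <li>
second = ET.fromstring(html).findall("li")[1].text
print(second)  # Premier League (ENG 1)
```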
Can you try:
leagues = driver.find_elements_by_xpath("//li[@class='with-icon' and contains(text(), 'League')]")
for league in leagues:
    alleligen.append(league.text)
print(alleligen)
If you know that your locator will remain at the same position in that list tree, you can use the following, where the li element is taken based on its index:
locator = "//ul[@class='inline-list']//li[2]"
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, locator))).text

Amazon dynamically changing the names of their CSS selectors and HTML objects?

I built an Amazon.es web scraper using Selenium.
I'm using a CSS selector to find the total number of pages (to determine how many times my loop will iterate).
But every day I have to go back and update the selector name, because it seems to change dynamically.
I'm not extremely good with HTML/CSS; how do they do this?
Selector that worked yesterday:
lastPage = browser.find_elements_by_css_selector('div.s-desktop-width-max.s-desktop-content.sg-row > div.sg-col-20-of-24.sg-col-28-of-32.sg-col-16-of-20.sg-col.sg-col-32-of-36.sg-col-8-of-12.sg-col-12-of-16.sg-col-24-of-28 > div > span:nth-child(5) > div.s-main-slot.s-result-list.s-search-results.sg-row > div:nth-child(58) > span > div > div > ul > li:nth-child(6)')
Selector that works today:
lastPage = browser.find_elements_by_css_selector('div.s-desktop-width-max.s-desktop-content.sg-row > div.sg-col-20-of-24.sg-col-28-of-32.sg-col-16-of-20.sg-col.sg-col-32-of-36.sg-col-8-of-12.sg-col-12-of-16.sg-col-24-of-28 > div > span:nth-child(5) > div.s-main-slot.s-result-list.s-search-results.sg-row > div:nth-child(51) > span > div > div > ul > li:nth-child(6)')
I am not sure what you are trying to accomplish, but instead of a brittle full CSS path you can use a partial attribute match with XPath's contains().
Syntax:
//tagName[contains(@attribute,'value')]
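For example, anchoring on one substring of the class that stays stable survives the rotating names. A sketch of the idea in plain Python over made-up pagination markup (in Selenium you would pass the contains() XPath to find_element_by_xpath instead):

```python
import xml.etree.ElementTree as ET

# Made-up pagination markup; only the 'a-last' token is assumed stable,
# the other class fragments stand in for the rotating names
html = """
<ul>
  <li class="xz-3f a-normal">5</li>
  <li class="xz-3f a-normal">6</li>
  <li class="xz-9q a-last">Next</li>
</ul>
"""

# Same test as //li[contains(@class, 'a-last')]
last = next(li for li in ET.fromstring(html).iter("li")
            if "a-last" in li.get("class", ""))
print(last.text)  # Next
```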

How to grab correct highchart number in selectors using Selenium Webdriver with Python with several charts on the page?

I am testing a web application that uses highcharts. The selectors look like this, and the highchart number of the same chart is always different. For example:
#highcharts-4 >div:nth-child(1) > span > div > span
When there is only one chart on a page, I do the following, and it works perfectly:
[id^='highcharts-'] > div:nth-child(1) > span > div > span
It selects the first element where id begins with the string 'highcharts-', but how can I select the second and the third elements if let's say I have several charts present on the same page?
For example, when there are three identical charts, the same element on each chart would have the following selectors with ID being different by two all the time:
#highcharts-4 >div:nth-child(1) > span > div > span
#highcharts-6 >div:nth-child(1) > span > div > span
#highcharts-8 >div:nth-child(1) > span > div > span
How can I grab the second and the third ones?
You should be able to use find_elements_by_css_selector to select multiple elements that match your selector:
my_charts = driver.find_elements_by_css_selector("[id^='highcharts-'] > div:nth-child(1) > span > div > span")
for chart in my_charts:
    print(chart.text)
(you didn't mention what you were doing with those charts, but here I'm just printing whatever text might be associated with it)
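To show how the prefix selector fans out over several charts, here is a self-contained model (the chart texts are invented; the ids mirror the question):

```python
import xml.etree.ElementTree as ET

# Three charts whose ids differ by two, as in the question
html = """
<div>
  <div id="highcharts-4"><span>Chart A</span></div>
  <div id="highcharts-6"><span>Chart B</span></div>
  <div id="highcharts-8"><span>Chart C</span></div>
</div>
"""

root = ET.fromstring(html)
# Same idea as the CSS [id^='highcharts-']: match on the id prefix
charts = [d for d in root.iter("div") if d.get("id", "").startswith("highcharts-")]
print([c.find("span").text for c in charts])  # ['Chart A', 'Chart B', 'Chart C']
```

The returned list is ordered as in the document, so charts[1] and charts[2] are the second and third charts.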
