Formatting Table text with Selenium - python

I am currently writing a script for an internal website and feel like I am getting quite far with it. I have managed to scrape the table I want with Selenium and print it to the console. However, I now need to format the text into a specific pipe-delimited layout.
This is the code I need help with:
Username = driver.find_elements(By.XPATH, value="/html/body/form[2]/table/tbody/tr/td[2]")
Role = driver.find_elements(By.XPATH, value="/html/body/form[2]/table/tbody/tr/td[3]")
Cluster_Access = driver.find_elements(By.XPATH, value="/html/body/form[2]/table/tbody/tr/td[4]")
Failed_Logins = driver.find_elements(By.XPATH, value="/html/body/form[2]/table/tbody/tr/td[5]")
Password_Exp = driver.find_elements(By.XPATH, value="/html/body/form[2]/table/tbody/tr/td[6]")
Mainframe_Data=[]
for i in range(len(Username)):
    temporary_data={'Username': 'AU-Business-SSG-MF|S|TS347IAZ2|Company TECH|' + Username[i].text,
                    'Role': Role[i].text,
                    'Cluster Access' : Cluster_Access[i].text,
                    'Failed Logins' : Failed_Logins[i].text,
                    'Password Expiration' : Password_Exp[i].text}
    Mainframe_Data.append(temporary_data)
for i in range(len(Username)):
    temporary_data=['AU-NAB-SSG-MF|S|System1|IBM STORAGE|' + Username[i].text,
                    Role[i].text,
                    Cluster_Access[i].text,
                    Failed_Logins[i].text,
                    Password_Exp[i].text]
    Mainframe_Data.append(temporary_data)
print(*Mainframe_Data,sep='\n')
I am wondering whether, when a username equals a certain value, I can change it and append the result to Mainframe_Data.
This is the table I'm scraping:
[1]: https://i.stack.imgur.com/B6cVQ.png
This is the format I need to make it output:
AU-Business-SSG-MF|S|System1|Company TECH|(User)||616/K/(Employee ID)|enabled|||Administrator
I somehow need to have an individual output for all systems on a separate line. For example in the picture provided I would need the following output:
AU-Business-SSG-MF|S|System1|Company TECH|(User)||616/K/(Employee ID)|enabled|||Administrator
AU-Business-SSG-MF|S|System2|Company TECH|(User)||616/K/(Employee ID)|enabled|||Administrator
AU-Business-SSG-MF|S|System3|Company TECH|(User)||616/K/(Employee ID)|enabled|||Administrator
AU-Business-SSG-MF|S|System4|Company TECH|(User)||616/K/(Employee ID)|enabled|||Administrator
I do realise that not all the data I need is in the table, but I was hoping it's possible to say that if the username equals "User", I can fill in the rest using the format above.
I am hoping to do this in as lightweight a way as possible.
I'm sorry if this question isn't worded very well, but I hope someone can help me.
Thanks in advance
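One possible approach, as a minimal sketch rather than a definitive answer: keep a small lookup of the usernames you care about, and emit one pipe-delimited line per system for each of them. The names EMPLOYEE_IDS, SYSTEMS and format_rows are hypothetical, the employee ID value is a placeholder, and the sketch assumes the last field is the Role column from the table and that "enabled" is a fixed value.

```python
# Hypothetical lookup: username -> employee ID. The value is a placeholder,
# not data from the original table.
EMPLOYEE_IDS = {"User": "E12345"}

# System names taken from the expected output in the question.
SYSTEMS = ["System1", "System2", "System3", "System4"]


def format_rows(rows, systems=SYSTEMS, employee_ids=EMPLOYEE_IDS):
    """Return one pipe-delimited line per (known user, system) pair."""
    lines = []
    for row in rows:
        username = row["Username"]
        if username not in employee_ids:
            continue  # skip users we have no extra details for
        for system in systems:
            fields = [
                "AU-Business-SSG-MF", "S", system, "Company TECH",
                username, "",
                "616/K/" + employee_ids[username],
                "enabled", "", "",
                row["Role"],  # assumption: last field is the Role column
            ]
            lines.append("|".join(fields))
    return lines


# Stand-in for the scraped data: in the real script each dict would be built
# from Username[i].text, Role[i].text, and so on.
rows = [{"Username": "User", "Role": "Administrator"}]
print(*format_rows(rows), sep="\n")
```

Building each line from a list of fields and joining with "|" keeps the empty fields explicit, which is easier to check against the target format than concatenating strings by hand.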

Related

Python Selenium Web-Driver not finding text element

I am trying to get the product's seller yet it will not get the text. I assume this is some weird thing since the text is also a link. Any help?
Python Code:
self.sold_by = driver.find_element_by_css_selector('#sellerProfileTriggerId').text
HTML Element:
SKUniverse
Try like this:
self.sold_by = driver.find_element_by_css_selector('#sellerProfileTriggerId')
text_element=self.sold_by.text
print(text_element)
Also, why aren't you using XPath or ID selectors? Just asking :)

String filtering in an if function not working in Python

I am writing a web scraper that scrapes data from a list of links one after the other. The problem is that the website uses the same class name for up to 3 different buttons at once, with no other unique identifiers, which to my understanding makes it impossible to point to the exact button if there is more than one.
I used driver.find_element, which worked well since it just found the first result and ignored the other buttons. However, on some pages the offers information that I am trying to scrape is missing, which results in the script picking up the wrong data and filling it in, even though I am not interested in that data at all.
So I went out with a solution that checks whether the scraped information contains a specific string that only appears for that one piece of information that I am trying to get and if the string is not found the data variable should get overwritten with empty data so that it would be obvious that the information doesn't exist.
However, during the process the if statement that I am trying to filter the strings with doesn't seem to work at all. When there are no buttons on the webpage it indeed manages to fill in the variable with empty data. However, once a different button appears it's not filtered and gets through somehow and ruins the whole thing.
This is an example webpage which doesn't contain the data at all :
https://reality.idnes.cz/rk/detail/nido-group-s-r-o/5a85b108a26e3a2adb4e394c/?page=185
This is an example webpage that contains 2 buttons with data, the first of which I am trying to scrape. Look for the "nemovitostí" text in the blue button; that's what I am trying to filter on.
https://reality.idnes.cz/rk/detail/m-m-reality-holding-a-s/5a85b582a26e3a321d4f2700/
This is the problematic code :
# Offers
offers = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
offers = offers.text
print(offers)
# Check if scraped information contains offers else move on
if "nemovitostí" or "nemovitosti" or "nemovitost" in offers:
    pass
else:
    offers = ""
Since the if statement should look for the set of strings and, if none of them is found, execute the code under the else statement, I can't understand how the data gets in at all. There are no errors or warnings; it just picks up the data instead of ignoring it, even if the string is different.
This is more of the code for reference :
# Open links.csv file and read its contents
with open('links.csv') as read:
    reader = csv.reader(read)
    link_list = list(reader)
# Information search
for link in link_list:
    driver.get(', '.join(link))
    # Title
    title = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "h1.b-annot__title.mb-5")))
    # Offers
    offers = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
    offers = offers.text
    print(offers)
    # Check if scraped information contains offers else move on
    if "nemovitostí" or "nemovitosti" or "nemovitost" in offers:
        None
    else:
        offers = ""
    # Address
    address = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "p.font-sm")))
    # Phone number
    # Try to obtain phone number if nonexistent move on
    try:
        phone_number = wait.until(ec.presence_of_element_located((By.XPATH, "//a[./span[contains(@class, 'icon icon--phone')]]")))
        phone_number = phone_number.text
    except TimeoutException:
        phone_number = ""
    # Email
    # Try to obtain email if nonexistent move on
    try:
        email = wait.until(ec.presence_of_element_located((By.XPATH, "//a[./span[contains(@class, 'icon icon--email')]]")))
        email = email.text
    except TimeoutException:
        email = ""
    # Print scraping results
    print(title.text, " ", offers, " ", address.text, " ", phone_number, " ", email)
    # Save results to a list
    company = [title.text, offers, address.text, phone_number, email]
    # Write results to scraped.xlsx file
    worksheet.write_row(row, 0, company)
    del title, offers, address, phone_number, email
    # Push row number lower
    row += 1
workbook.close()
driver.quit()
How is it possible that the data still gets through? Is there an error in my syntax? If you saw my mistake please let me know so I can get better next time! Thanks to anyone for any sort of help!
1. The problem is that the website uses the same class names for up to 3 different buttons at once with no other unique identifiers used which to my understanding makes it impossible to point to the exact button if there are more
You can actually get the element you need if you use By.XPATH instead of By.CSS_SELECTOR.
The first would be (//span[@class='btn__text'])[1], the second (//span[@class='btn__text'])[2] and the third (//span[@class='btn__text'])[3]
Or, if you are not sure what the order will be, you can be more specific, like
(//span[@class='btn__text' and contains(text(),'nemovitostí')])
2. The second problem is the syntax of the if statement in Python.
It should be like this:
if "nemovitostí" in offers or "nemovitosti" in offers or "nemovitost" in offers:
There might be a nicer way to write this, maybe something like this:
for i in ["nemovitostí", "nemovitosti", "nemovitost"]:
    if i in offers:
        pass  # a match was found
The most idiomatic way to write this would be the following:
value = ["nemovitostí", "nemovitosti", "nemovitost"]
if any(s in offers for s in value):
    pass  # do something here
else:
    offers = ""
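To see why the original condition let everything through, here is a small sketch that needs no browser. The button text "Kontaktovat" is a made-up stand-in for an unrelated button. Python parses the original condition as `"nemovitostí" or "nemovitosti" or ("nemovitost" in offers)`, and a non-empty string is always truthy, so the else branch can never run.

```python
offers = "Kontaktovat"  # hypothetical text of an unrelated button

# Parsed as: "nemovitostí" or "nemovitosti" or ("nemovitost" in offers).
# The first operand is a non-empty string, so the expression is truthy
# regardless of what offers contains.
buggy = bool("nemovitostí" or "nemovitosti" or "nemovitost" in offers)

# Each keyword is tested against offers individually.
keywords = ["nemovitostí", "nemovitosti", "nemovitost"]
fixed = any(word in offers for word in keywords)

print(buggy, fixed)  # prints: True False
```

Since "nemovitost" is a prefix of the other two keywords, a single `"nemovitost" in offers` check would likely be enough here.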

Using Selenium to get the information within title= ' '

Please help me get the information out of this HTML code using Selenium, without XPath, because I want to do it in a loop.
I want to get the result "4.7" from title="4.7/5 - 10378 Reviews".
(The original post included a screenshot of the HTML.)
If you already have the driver set up, try something like this:
rating = driver.find_element_by_class_name("stars").get_attribute("title")
print (rating)
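The call above returns the whole title attribute, so one more step is needed to get just "4.7". A minimal sketch, assuming the title always has the form shown in the question (rating, a slash, then the rest); rating_from_title is a hypothetical helper name:

```python
def rating_from_title(title):
    """Return the part of the title before the first '/'."""
    return title.split("/", 1)[0].strip()


# In the real script this string would come from get_attribute("title").
title = "4.7/5 - 10378 Reviews"
print(rating_from_title(title))  # prints: 4.7
```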

Web scraping in Selenium in Python - find elements via xpath or id return empty list

I am trying to scrape a list of email addresses from my User Explorer page in Google Analytics.
The item's XPath, which I obtained from the page, is //*[@id="ID-explorer-table-dataTable-key-0-0"]/div
But no matter which of these I try:
driver.find_elements_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div')
or
driver.find_elements_by_xpath('//*[@id="ID-reportContainer"]')
or
driver.find_elements_by_id(r"ID-explorer-table-dataTable-key-0-0")
it returns an empty list.
Can anyone tell me where I have gone wrong?
I also tried using:
html = driver.page_source
but of course I couldn't find the list of emails there either.
I am also wondering, if this doesn't work, whether there is a way to automate Ctrl+A and copy all the displayed text into a string in Python, and then use re.findall() to find the email addresses?
email = driver.find_element_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div')
print("email", email.get_attribute("innerHTML"))
Thanks for the help of @Guy!
It was related to an iframe. This worked, and it detected which frame the item I need belongs to:
iframelist=driver.find_elements_by_tag_name('iframe')
for i in range(len(iframelist)):
    driver.switch_to.frame(iframelist[i])
    if len(driver.find_elements_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div'))!=0:
        print('it is item {}'.format(i))
        break
    else:
        driver.switch_to.default_content()

how to extract the number out of 'ui_bubble_rating bubble_50' using selenium

I am using Selenium to scrape reviews from tripadvisor.com. I haven't found the right way to extract the review ratings given by users. The rating is encoded in the class name "ui_bubble_rating bubble_50", where 50 = 5 stars.
<span class="ui_bubble_rating bubble_50"></span>
::before==$0
::after==$0
Is there any way to extract the number using selenium?
If anyone knows, please point me in the right direction. Many thanks.
I tried the code below, but the XPath can't find the value I need. It returned only one star rating, which is the same for every review.
var = driver.find_element_by_xpath("//span[contains(@class, 'ui_bubble_rating bubble_')]").get_attribute("class")
I need the star rating of each review; please have a look at the photo below for what I need. Thank you
Hi. I finally solved my problem. Thank you so much for your time. I am putting the answer here in case someone needs it.
var = driver.find_element_by_class_name('ui_bubble_rating').get_attribute('class')
review_rating = var.split("_")[-1]
Get the class name value and store it in 'var'.
Split the class name on underscores; this returns a list of the split strings.
Access the value at the last index of the list 'data'; the list contains the elements "ui", "bubble", "rating bubble", "50".
var = driver.find_element_by_xpath("//span[contains(@class, 'ui_bubble_rating bubble_')]").get_attribute("class")
data = var.split("_")
print(data[-1])
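To get the rating of every review rather than only the first match, the same parsing can be applied to each class string in turn. A minimal sketch that works on plain strings, so it runs without a browser; in a real script each string would come from get_attribute("class") on an element of the list returned by a find_elements call. The helper name bubble_rating is hypothetical, and dividing by 10 is an assumption based on the question's "50 = 5 star".

```python
import re


def bubble_rating(class_attr):
    """Return the star rating encoded in a class string, or None if absent."""
    match = re.search(r"bubble_(\d+)", class_attr)
    if match is None:
        return None
    return int(match.group(1)) / 10  # assumption: 50 means 5.0 stars


# Stand-ins for the class attributes of several review elements.
class_values = ["ui_bubble_rating bubble_50", "ui_bubble_rating bubble_35"]
print([bubble_rating(c) for c in class_values])  # prints: [5.0, 3.5]
```

Using a regular expression instead of split("_") avoids depending on the digits being the last underscore-separated piece of the class string.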
