Marking off checkboxes in a table that match a specific text - python

I have a table (screenshot below) where I want to mark off the checkboxes that have the text "Xatu Auto Test" in the same row using selenium python.
I've tried following these two posts:
Iterating Through a Table in Selenium Very Slow
Get row & column values in web table using python web driver
But I couldn't get those solutions to work on my code.
My code:
form = self.browser.find_element_by_id("quotes-form")
try:
    rows = form.find_elements_by_tag_name("tr")
    for row in rows:
        columns = row.find_elements_by_tag_name("td")
        for column in columns:
            if column.text == self.group_name:
                column.find_element_by_name("quote_id").click()
except NoSuchElementException:
    pass
The checkboxes are never clicked and I am wondering what I am doing wrong.
This is the HTML when I inspect with FirePath:
<form id="quotes-form" action="/admin/quote/delete_multiple" method="post" name="quotesForm">
<table class="table table-striped table-shadow">
<thead>
<tbody id="quote-rows">
<tr>
<tr>
<td class="document-column">
<td>47</td>
<td class="nobr">
<td class="nobr">
<td class="nobr">
<td class="nobr">
<a title="Xatu Auto Test Data: No" href="http://192.168.56.10:5001/admin/quote/47/">Xatu Auto Test</a>
</td>
<td>$100,000</td>
<td style="text-align: right;">1,000</td>
<td class="nobr">Processing...</td>
<td class="nobr">192.168....</td>
<td/>
<td>
<input type="checkbox" value="47" name="quote_id"/>
</td>
</tr>
<tr>
</tbody>
<tbody id="quote-rows-footer">
</table>
<div class="btn-toolbar" style="text-align:center; width:100%;">

With a quick look, I reckon this line needs changing: you're trying to access the column's quote_id, but it should be the row's:
From:
column.find_element_by_name("quote_id").click()
To:
row.find_element_by_name("quote_id").click()
P.S. Provided that, as @Saifur commented, you have your comparison done correctly.
Updated:
I have run a simulation, and indeed the checkbox is ticked when changing column to row. A simplified version:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('your-form-sample.html')
form = driver.find_element_by_id("quotes-form")
rows = form.find_elements_by_tag_name("tr")
for row in rows:
    columns = row.find_elements_by_tag_name("td")
    for column in columns:
        # I changed this to the actual string, provided your comparison is correct
        if column.text == 'Xatu Auto Test':
            # you need to change from column to row, and it will work
            row.find_element_by_name("quote_id").click()

Related

How to parse column values and their hrefs with selenium

I'm new to Selenium and to parsing data from websites.
The problem is: I have a table on a website with the following HTML code:
<table width="580" cellspacing="1" cellpadding="3" bgcolor="#ffffff" id="restab">
<tbody>
<tr align="center" valign="middle">
<td width="40" bgcolor="#555555"><font color="#ffffff">№</font></td>
<td width="350" bgcolor="#555555"><font color="#ffffff">Название организации</font></td>
<td width="100" bgcolor="#555555"><font color="#ffffff">Город</font></td>
<td width="60" bgcolor="#555555"><span title="Число публикаций данной организации на eLibrary.Ru"><font color="#ffffff">Публ.</font></span></td><td width="30" bgcolor="#555555"><span title="Число ссылок на публикации организации"><font color="#ffffff">Цит.</font></span></td>
</tr>
<tr valign="middle" bgcolor="#f5f5f5" id="a18098">
<td align="center"><font color="#00008f">1</font></td>
<td align="left"><font color="#00008f"><a href="org_about.asp?orgsid=18098">
"Академия информатизации образования" по Ленинградской области</a></font></td>
<td align="center"><font color="#00008f">Гатчина</font></td>
<td align="right"><font color="#00008f">0<img src="/pic/1pix.gif" hspace="16"></font></td>
<td align="center"><font color="#00008f">0</font></td>
</tr>
<tr valign="middle" bgcolor="#f5f5f5" id="a17954">
<td align="center"><font color="#00008f">2</font></td>
<td align="left"><font color="#00008f"><a href="org_about.asp?orgsid=17954">
"Академия талантов" Санкт-Петербурга</a></font></td>
<td align="center"><font color="#00008f">Санкт-Петербург</font></td>
<td align="right"><font color="#00008f">3<img src="/pic/stat.gif" width="12" height="13" hspace="10" border="0"></font></td>
<td align="center"><font color="#00008f">0</font></td>
</tr>
</tbody>
</table>
I need to get all the table values, plus the href of each link in the left td.
I tried to use XPath, but it throws an error. How can I do this better?
In short, I need a dataframe with the table values plus an extra column holding the href from the left column.
First try to use pandas.read_html(). See code example below.
If that doesn't work, then use the right-click menu in a browser such as Mozilla Firefox (Inspect Element) or Google Chrome (Developer Tools) to find the CSS selector or XPath. Then feed the CSS selector or XPath into Selenium.
Another useful tool for finding complicated CSS/Xpath is the Inspector Gadget browser plug-in.
import pandas as pd

# this is the website you want to read ... table with "Minimum Level for Adult Cats"
str_url = 'http://www.felinecrf.org/catfood_data_how_to_use.htm'

# use pandas.read_html()
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html
list_df = pd.read_html(str_url, match='DMA')
print('Number of dataframes on the page: ', len(list_df))
print()

for idx, each_df in enumerate(list_df):
    print(f'Show dataframe number {idx}:')
    print(each_df.head())
    print()

# use table 2 on the page
df_target = list_df[2]

# create column headers
# https://chrisalbon.com/python/data_wrangling/pandas_rename_column_headers/
header_row = df_target.iloc[0]

# Replace the dataframe with a new one which does not contain the first row
df_target = df_target[1:]

# Rename the dataframe's column values with the header variable
df_target.columns = header_row
print(df_target.head())
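Note that pandas.read_html() returns only the cell text, so the href column the asker wants still needs a separate pass. Here is a sketch with BeautifulSoup that builds the dataframe with an extra href column; it runs on a trimmed, hypothetical copy of the question's table, and the column names are made up for illustration:

```python
import pandas as pd
from bs4 import BeautifulSoup

# A trimmed stand-in for the question's table markup, inlined for the example
html = '''<table id="restab"><tbody>
<tr id="a18098"><td>1</td>
<td><a href="org_about.asp?orgsid=18098">Org A</a></td>
<td>City A</td><td>0</td><td>0</td></tr>
<tr id="a17954"><td>2</td>
<td><a href="org_about.asp?orgsid=17954">Org B</a></td>
<td>City B</td><td>3</td><td>0</td></tr>
</tbody></table>'''

soup = BeautifulSoup(html, 'html.parser')
records = []
for tr in soup.select('#restab tr'):
    # text of every cell in the row
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    # the left-column link, if the row has one
    link = tr.find('a')
    cells.append(link['href'] if link else None)
    records.append(cells)

df = pd.DataFrame(records, columns=['num', 'name', 'city', 'pub', 'cit', 'href'])
print(df)
```

The same loop works on the live page by replacing the inline string with the page source (e.g. `driver.page_source` from Selenium).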

Is it possible to add a new <td> instance to a <tr> row with bs4?

I want to edit a table of an .htm file, which roughly looks like this:
<table>
<tr>
<td>
parameter A
</td>
<td>
value A
</td>
</tr>
<tr>
<td>
parameter B
</td>
<td>
value B
</td>
</tr>
...
</table>
I made a preformatted template in Word, which has nicely formatted style="" attributes. I insert parameter values into the appropriate tds from a poorly formatted .html file (the output of a scientific program). My job is basically to automate the creation of HTML tables so that they can be used in a paper.
This works fine while the template has empty td instances in a tr. But when I try to create additional tds inside a tr (over which I iterate), I get stuck. The .append and .insert_after methods on the rows just overwrite existing td instances. I need to create new tds, since I want to create the number of columns dynamically, and I need to iterate over up to 5 unformatted input .html files.
from bs4 import BeautifulSoup

with open('template.htm') as template:
    template = BeautifulSoup(template)
template = template.find('table')
lines_template = template.findAll('tr')
for line in lines_template:
    newtd = line.findAll('td')[-1]
    newtd['control_string'] = 'this_is_new'
    line.append(newtd)
=> No new tds. The last one is just overwritten. No new column was created.
I want to copy and paste the last td in a row, because it will have the correct style="" for that row. Is it possible to just copy a bs4.element with all the formatting and add it as the last td in a tr? If not, what module/approach should I use?
Thanks in advance.
You can copy the attributes by assigning to the attrs:
data = '''<table>
<tr>
<td style="color:red;">
parameter A
</td>
<td style="color:blue;">
value A
</td>
</tr>
<tr>
<td style="color:red;">
parameter B
</td>
<td style="color:blue;">
value B
</td>
</tr>
</table>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
for i, tr in enumerate(soup.select('tr'), 1):
    tds = tr.select('td')
    new_td = soup.new_tag('td', attrs=tds[-1].attrs)
    new_td.append('This is data for row {}'.format(i))
    tr.append(new_td)
print(soup.table.prettify())
Prints:
<table>
<tr>
<td style="color:red;">
parameter A
</td>
<td style="color:blue;">
value A
</td>
<td style="color:blue;">
This is data for row 1
</td>
</tr>
<tr>
<td style="color:red;">
parameter B
</td>
<td style="color:blue;">
value B
</td>
<td style="color:blue;">
This is data for row 2
</td>
</tr>
</table>
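As for the asker's original wish to literally copy a td with all its formatting: since Beautiful Soup 4.4 you can also `copy.copy()` a tag, which returns a detached clone, attributes included. A minimal sketch on a made-up one-cell table:

```python
import copy
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<table><tr><td style="color:blue;">value A</td></tr></table>',
    'html.parser')
tr = soup.find('tr')

# copy.copy() returns a standalone clone of the last cell, styles and all
new_td = copy.copy(tr.find_all('td')[-1])
new_td.string = 'cloned cell'
tr.append(new_td)
print(tr)
```

The clone is not attached to any tree, so appending it does not move or overwrite the original cell, which is exactly the behavior the question was missing.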

Get all table elements in Python using Selenium

I have a webpage which looks like this:
<table class="data" width="100%" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<th>1</th>
<th>2</th>
<th>3 by</th>
</tr>
<tr>
<td width="10%">5120432</td>
<td width="70%">INTERESTED_SITE1/</td>
<td width="20%">foo2</td>
</tr>
<tr class="alt">
<td width="10%">5120431</td>
<td width="70%">INTERESTED_SITE2</td>
<td width="20%">foo2</td>
</tr>
</tbody>
</table>
I want to grab those two sites (INTERESTED_SITE1 and INTERESTED_SITE2). I tried something like this:
chrome = webdriver.Chrome(chrome_path)
chrome.get("fooSite")
time.sleep(.5)
alert = chrome.find_element_by_xpath("/div/table/tbody/tr[2]/td[2]").text
print (alert)
But I can't find the first site. If I can't do this in a for loop, I don't mind getting every link separately. How can I get to that link?
It would be easier to use a CSS query:
driver.find_element_by_css_selector("td:nth-child(2)")
You can use an XPath expression to deal with this by looping over each row.
XPath expression: html/body/table/tbody/tr[i]/td[2]
Get the number of rows first:
totals_rows = chrome.find_elements_by_xpath("html/body/table/tbody/tr")
total_rows_length = len(totals_rows)
Then loop through them, building the XPath for each row's second cell:
count = 1
for row in totals_rows:
    site = "html/body/table/tbody/tr[" + str(count) + "]/td[2]"
    print("site name is: " + chrome.find_element_by_xpath(site).text)
    count += 1
Basically, loop through each row and get the value in the second column (td[2]).

Python Selenium click on a specific row in a table containing the right data in a column

So I have a table that can have from 0 to x rows and always has 7 columns.
Something like below.
Type Price Store Weight For-sale Stock Discount
x
x
x
x
x
here is how the HTML looks:
<table id="my_table" class="datatable" cellspacing="0" cellpadding="0" border="0" style="border-width:0px;border-collapse:collapse;">
<tbody>
<tr>
<tr class="row" style="cursor:pointer;" onclick="javascript:__doPostBack('my$table','Select$0')">
<td>
<td class="first">Meat</td>
<td>75</td>
<td>Adams grocery</td>
<td align="center">1kg</td>
<td>Yes</td>
<td>Full</td>
<td>Yes</td>
<td>
</tr>
<tr class="row" style="cursor:pointer;" onclick="javascript:__doPostBack('my$table','Select$1')">
<td>
<td class="first">Vegetable</td>
<td>25</td>
<td>Adams grocery</td>
<td align="center">0.5kg</td>
<td>No</td>
<td>Empty</td>
<td>No</td>
<td>
</tr>
</tbody>
</table>
</div>
What I want to do is click on each row (if any exists) that contains the text "Adams grocery" (which is in column 3) so they open in separate tabs, then give new instructions to all tabs at once. For example: click the button "welcome" on all tabs.
I have a feeling the above might be a little too complicated for me as a beginner... So I thought maybe just click on one of the rows to begin with.
Been thinking about this the whole day, thanks for all help!
Do you need something like this?
Tested against this HTML:
http://jsfiddle.net/zvhrm6tf/
from selenium.webdriver.support.wait import WebDriverWait

td_list = WebDriverWait(driver, 10).until(
    lambda driver: driver.find_elements_by_css_selector("#my_table tr td"))
for td in td_list:
    if td.text == "Adams grocery":
        td.click()
and if you need to target the table row you could do something like this:
tr = td.find_element_by_xpath("..")

Get table with maximum number of rows in a page using BeautifulSoup

Can anyone tell me how I can get the table in an HTML page that has the most rows? I'm using BeautifulSoup.
There is one little problem, though. Sometimes one table is nested inside another:
<table>
<tr>
<td>
<table>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
</table>
<td>
</tr>
</table>
When the table.findAll('tr') code executes, it counts all the rows under the table, including the rows of any table nested inside it. The parent table has just one row, but the nested table has three, and I would consider the nested one to be the largest table. Below is the code I'm currently using to dig out the largest table, but it doesn't take the aforementioned scenario into consideration.
soup = BeautifulSoup(html)

# Get the largest table
largest_table = None
max_rows = 0
for table in soup.findAll('table'):
    number_of_rows = len(table.findAll('tr'))
    if number_of_rows > max_rows:
        largest_table = table
        max_rows = number_of_rows
I'm really lost with this. Any help, guys?
Thanks in advance.
Calculate number_of_rows like this:
number_of_rows = len(table.findAll(lambda tag: tag.name == 'tr' and tag.findParent('table') == table))
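To see why this works, here is a self-contained run against a trimmed version of the nested-table HTML from the question. (This sketch uses `is` for the parent check rather than `==`, since Beautiful Soup compares tags by markup equality, and identity is what we actually want here.)

```python
from bs4 import BeautifulSoup

# The question's nested-table layout, trimmed down
html = '''<table id="outer"><tr><td>
<table id="inner">
<tr><td></td></tr>
<tr><td></td></tr>
<tr><td></td></tr>
</table>
</td></tr></table>'''

soup = BeautifulSoup(html, 'html.parser')

def own_rows(table):
    # count only the <tr> tags whose nearest enclosing <table> is this one
    return len(table.find_all(lambda tag: tag.name == 'tr'
                              and tag.find_parent('table') is table))

counts = {t.get('id'): own_rows(t) for t in soup.find_all('table')}
print(counts)  # the outer table has 1 row of its own, the inner one has 3
```

Plugging `own_rows(table)` in place of the plain `len(table.findAll('tr'))` in the question's loop makes the inner table win, as desired.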
