python: parsing certain values from svg file - python

I have an SVG file like the following (an example):
<svg
<g class="displacy-arrow">
<path class="displacy-arc" id="arrow-ec55d4518d3c43e391ffce0b97c713ab-0-2" stroke-width="2px" d="M420,89.5 C420,2.0 575.0,2.0 575.0,89.5" fill="none" stroke="currentColor"/>
<text dy="1.25em" style="font-size: 0.8em; letter-spacing: 1px">
<textPath xlink:href="#arrow-ec55d4518d3c43e391ffce0b97c713ab-0-2" class="displacy-label" startOffset="50%" side="left" fill="currentColor" text-anchor="middle">pd</textPath>
</text>
<path class="displacy-arrowhead" d="M575.0,91.5 L583.0,79.5 567.0,79.5" fill="currentColor"/>
</g>
</svg>
I have tried to access what is inside the 'textPath' node using the code below:
import xml.dom.minidom
doc = xml.dom.minidom.parse('my_file.svg')
name = doc.getElementsByTagName('textPath')
for t in name:
    print([x.nodeValue for x in t.childNodes])
I would however like to get the other information included in the 'textpath', like the values for 'side' or 'fill', but I do not know how to access those.

Just for future reference, I wrote a function based on the links that @Aswath sent in the comments:
from bs4 import BeautifulSoup

def extract_data_from_report3(filename):
    # html.parser lowercases tag names, hence 'textpath' rather than 'textPath'
    with open(filename) as f:
        soup = BeautifulSoup(f, "html.parser")
    for element in soup.find_all('textpath'):
        print(element.get('side'))

extract_data_from_report3('my_file.svg')
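For anyone who wants to stay with xml.dom.minidom from the question, attributes can be read with getAttribute. The snippet below is only a minimal sketch: the inline SVG string is a shortened stand-in for my_file.svg, and it assumes the real file declares the xlink namespace on the root element (minidom refuses to parse an undeclared xlink: prefix).

```python
import xml.dom.minidom

# Shortened stand-in for my_file.svg; the xmlns declarations are assumed here.
svg = '''<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<text><textPath xlink:href="#arrow-0" class="displacy-label" side="left" fill="currentColor">pd</textPath></text>
</svg>'''

doc = xml.dom.minidom.parseString(svg)
labels = []
for node in doc.getElementsByTagName('textPath'):
    labels.append({
        'text': node.firstChild.nodeValue,   # the text inside the node
        'side': node.getAttribute('side'),   # attribute values by name
        'fill': node.getAttribute('fill'),
    })
print(labels)
```

getAttribute returns an empty string when the attribute is missing, so a missing 'side' will not raise an error.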

Related

Get first text inside a tag using beautiful soup

I need to get the salary range from span tag. I tried using find().text but doesn't work as the span tag has other tags in it.
job_list = soup.find_all("div", class_="d-flex flex-column pl-sm css-1buaf54 job-search-key- 1mn3dn8 e1rrn5ka0")
for job in job_list:
    salary = job.find("span", class_="job-search-key-1hbqxax e1wijj240")
    print(salary)
The output I have:
₹362,870 - ₹955,252 is my desired output
What you actually want to do here is get the text directly inside the outer span, but not the text in the tags nested within it.
You can use find with the options text=True and recursive=False.
Creating the data
from bs4 import BeautifulSoup
html_doc = '''<span class="job-search-key-1hbqxax elwijj240" data-test="detailsalary">
₹362,870 - ₹955,252 <span class="job-search-key- elwijj242">(Glassdoor Est.)</span> <span class="SVGInline greyInfoIcon" data-test="salaryIcon"> <svg class="SVGInline-svg greyInforcon-svg" height="16" viewbox="0 # 16 16" width="16" xmlns="http://www.w3.org/2000/svg"> <g fill="none" fill-rule="evenodd" id="prefix__info-16-px" stroke="none" stroke-width="1">
<path d="M8 14A6 6 # 118 2a6 6 # 010 12zme- 1A5 5 # 108 3a5 5 # 880 18zm-.6-5.60.6.6 8 111.2 8v11a.6.6 # 01-1.2 8v7.42MS 5.62.6.6 110-1.2.6.6 # 810 1.2z" fill="#505863" id="prefix_a"></path>
</svg> </span>
<div class="d-none"></div> </span>'''
soup = BeautifulSoup(html_doc, 'html.parser')
Generating the output
soup.find(class_='job-search-key-1hbqxax elwijj240').find(text=True, recursive=False).strip()
Output
This gives us
'₹362,870 - ₹955,252'
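If the fragment happens to be well-formed XML, the same idea can be sketched with the standard library instead of BeautifulSoup. This is a different technique from the answer above: in xml.etree.ElementTree, an element's .text attribute holds only the text that comes before its first child. The markup here is a simplified, hypothetical version of the span in the question.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the salary span; the class name is hypothetical.
fragment = '<span class="salary">₹362,870 - ₹955,252 <span>(Glassdoor Est.)</span></span>'

span = ET.fromstring(fragment)
# .text is the text before the first nested tag; nested text is not included
salary = span.text.strip()
print(salary)
```

Real-world HTML is often not valid XML, in which case the BeautifulSoup approach above is the safer choice.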

set XML attribute value with BeautifulSoup4

I'd like to read, modify, and save (overwrite) my svg file with BeautifulSoup in Python.
Contents of bs-test.svg:
<g data-default-color="#FFFFFF" data-element-id="X123456">
<rect class="selection-box" fill="none" height="91" stroke="none" width="140" x="-30" y="-10"/>
<circle cx="40" cy="25" data-colored="true" fill="red" pointer-events="visible" r="25" stroke="black" stroke-width="3"/>
<text fill="black" font-family="Verdana" font-size="16" text-anchor="middle" x="40" y="55">
<tspan dy="16" x="40">Label Text</tspan>
</text>
</g>
The contents are actually a subset of a larger svg, where I find g elements based on user-provided data-element-id values.
I'd like to change the fill attribute of the circle element to "blue".
what I have so far:
from bs4 import BeautifulSoup as bs
with open("bs-test.svg", "r") as f:
    contents = f.read()
soup = bs(contents, "xml")
# grab g tags with the required data-element-id
elem_ls = soup.find_all(attrs={"data-element-id" : "X123456"})
x = elem_ls[0]
x
Output
<g data-default-color="#FFFFFF" data-element-id="X123456">
<rect class="selection-box" fill="none" height="91" stroke="none" width="140" x="-30" y="-10"/>
<circle cx="40" cy="25" data-colored="true" fill="red" pointer-events="visible" r="25" stroke="black" stroke-width="3"/>
<text fill="black" font-family="Verdana" font-size="16" text-anchor="middle" x="40" y="55">
<tspan dy="16" x="40">Label Text</tspan>
</text>
</g>
I'm sure that this is just a syntax question that I can't quite find the answer to; how might I go about grabbing the fill attr of circle, replacing its value with "blue", then writing out?
You just have to set the attribute value on the bs4 element using its key, the same way you would set a dictionary entry.
CODE:
from bs4 import BeautifulSoup as bs
with open("bs-test.svg", "r") as f:
    contents = f.read()
soup = bs(contents, "xml")
# grab g tags with the required data-element-id
elem_ls = soup.find_all(attrs={"data-element-id" : "X123456"})
for e in elem_ls:
    circle = e.find('circle')
    circle['fill'] = 'blue'
    print(e)
RESULTS:
<g data-default-color="#FFFFFF" data-element-id="X123456">
<rect class="selection-box" fill="none" height="91" stroke="none" width="140" x="-30" y="-10"/>
<circle cx="40" cy="25" data-colored="true" fill="blue" pointer-events="visible" r="25" stroke="black" stroke-width="3"/>
<text fill="black" font-family="Verdana" font-size="16" text-anchor="middle" x="40" y="55">
<tspan dy="16" x="40">Label Text</tspan>
</text>
</g>
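The question also asks about writing the result back out, which the code above does not do. With BeautifulSoup, writing str(soup) to a file should cover it. Below is a self-contained sketch of the whole read, modify, write cycle that swaps in the standard-library xml.etree.ElementTree so it runs without extra packages. The trimmed SVG string, the wrapping svg element, and the output file name are assumptions for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Trimmed stand-in for bs-test.svg, wrapped in a root element (assumed).
svg = '''<svg><g data-default-color="#FFFFFF" data-element-id="X123456">
<circle cx="40" cy="25" data-colored="true" fill="red" r="25"/>
</g></svg>'''

root = ET.fromstring(svg)
# Select g elements by their data-element-id, then recolor the circle inside
for group in root.findall(".//g[@data-element-id='X123456']"):
    circle = group.find('circle')
    if circle is not None:
        circle.set('fill', 'blue')

# Write to a temporary location here; in practice this would be the svg path
out_path = os.path.join(tempfile.mkdtemp(), 'bs-test-modified.svg')
ET.ElementTree(root).write(out_path, encoding='unicode')

with open(out_path) as f:
    written = f.read()
print(written)
```

A real SVG with an xmlns declaration would need namespace-qualified tag names in the ElementTree paths.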

Python Selenium - Can't Click on Button

All I am trying to do is select the drop down & then select "Export Excel Spread Sheet".
Example of Drop Down
Code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
url2 =["https://example.com/reports"]
driver = webdriver.Chrome()
driver.implicitly_wait(15)
driver.get("https://example.com")
for u in url2:
    driver.implicitly_wait(15)
    driver.get(u)
I have tried so many different Xpaths & ID's
#driver.find_element_by_xpath("//a[contains(@class,'dropdown__trigger header-export-menu--toggle-btn')]").click()
#driver.find_element_by_xpath("//li[contains(text(),'Export Excel Spread Sheet')]").click()
#act.click().perform()
#act.click(driver.find_element_by_xpath("//a[contains(@class,'dropdown__trigger header-export-menu--toggle-btn')]")).perform()
#act.move_to_element(driver.find_element_by_xpath("//a[contains(@class,'dropdown__trigger header-export-menu--toggle-btn')]")).perform()
#WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.ID, "Header-Dropdown-Menu")).click())
#driver.find_element_by_class_name("//div[contains(@class,'dropdown__content header-export-menu--content')]").click()
#driver.find_element_by_xpath('//div[@class="dropdown header-export-menu" and @class="dropdown dropdown--active header-export-menu"]')
#driver.quit()
HTML Code
Click Me For HTML Example1
<!-- Under React Empty: 32 -->
<div class = "dropdown header-export-menu">
Take a look at the name in Example 1 vs Example 2
Click Me for HTML Example2
You'll notice the HTML code has changed to
<!-- Under React Empty: 32 -->
<div class = "dropdown dropdown--active header-export-menu">
Which I think is part of the problem I am having. Pretty Stuck.
I have also tried to use ChroPath & XPath Helper to try and resolve the issue but no luck.
Thank you in Advance !
Update:
The Comments have asked for further detail of the HTML code & I have gathered the following block.
<div class="header-container">
<!-- react-empty: 429 -->
<div class="header-event-info" id="header-event-info">
<div class="">
<div>
<div class="single-event-info">
<div class="event-data">
<p class="data-dd">25</p>
<p class="data-mmyy">Jun 2017</p>
</div>
<div class="event-detail">
<p class="event-name">"A name of a musical"</p>
<p class="event-more-details">
<!-- react-text: 437 -->
"Tuesday, 7:00 pm, Some Theatre"
<!-- /react-text -->
<a class="popup" data-content="" data-icon="" data-position="bottom" data-width="350" data-height="auto" data-trigger="click" data-scrollable="false">
<span class="popup-icon">
<svg width="19px" height="19px" viewBox="0 0 19 19" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<desc>Created with sketchtool.</desc>
<defs></defs>
<g id="Totals-For-Today" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g class="svg-icon-path" id="01-Event-Audit-Narrow---No-TFT" transform="translate(-983.000000, -121.000000)" stroke="#919598">
<g id="Group" transform="translate(27.500000, 15.560000)">
<g id="iButton" transform="translate(956.000000, 106.000000)">
<path d="M9,17.4399996 C13.6944204,17.4399996 17.5,13.63442 17.5,8.93999958 C17.5,4.24557921 13.6944204,0.43999958 9,0.43999958 C4.30557963,0.43999958 0.5,4.24557921 0.5,8.93999958 C0.5,13.63442 4.30557963,17.4399996 9,17.4399996 Z" id="outline">
</path>
<path class="svg-icon-text" d="M10.4765625,13.3169527 L7.68164062,13.3169527 L7.68164062,12.930234 C7.77148482,12.9224214 7.86425733,12.914609 7.95996094,12.9067965 C8.05566454,12.8989839 8.13867152,12.8833591 8.20898438,12.8599215 C8.31835992,12.824765 8.3994138,12.7632422 8.45214844,12.6753511 C8.50488308,12.5874601 8.53125,12.4732034 8.53125,12.3325777 L8.53125,8.76421833 C8.53125,8.63921771 8.50292997,8.52496104 8.44628906,8.42144489 C8.38964815,8.31792875 8.31054738,8.23101556 8.20898438,8.16070271 C8.13476525,8.11382747 8.02734445,8.07378881 7.88671875,8.04058552 C7.74609305,8.00738223 7.61718809,7.98687462 7.5,7.97906208 L7.5,7.59820271 L9.5390625,7.46929646 L9.62109375,7.55132771 L9.62109375,12.2622652 C9.62109375,12.3989846 9.64746067,12.5122648 9.70019531,12.602109 C9.75292995,12.6919532 9.83593693,12.7583587 9.94921875,12.8013277 C10.0351567,12.8364841 10.1191402,12.8648042 10.2011719,12.8862886 C10.2832035,12.9077731 10.3749995,12.9224214 10.4765625,12.930234 L10.4765625,13.3169527 Z M9.73828125,5.18999958 C9.73828125,5.41265694 9.66503979,5.60699094 9.51855469,5.77300739 C9.37206958,5.93902385 9.19140732,6.02203083 8.9765625,6.02203083 C8.77734275,6.02203083 8.60449292,5.94293006 8.45800781,5.78472614 C8.31152271,5.62652223 8.23828125,5.44585997 8.23828125,5.24273396 C8.23828125,5.02788913 8.31152271,4.84039101 8.45800781,4.68023396 C8.60449292,4.5200769 8.77734275,4.43999958 8.9765625,4.43999958 C9.19921986,4.43999958 9.38183522,4.51519414 9.52441406,4.66558552 C9.6669929,4.81597689 9.73828125,4.99077983 9.73828125,5.18999958 L9.73828125,5.18999958 Z" id="i-2-copy-2" stroke-width="0.25" fill="#919598"></path>
</g>
</g>
</g>
</g>
</svg>
</span>
</a>
</p>
</div>
</div>
<div class="header-export">
<!-- react-empty: 32 -->
<div class="dropdown dropdown--active header-export-menu">
<a class="dropdown__trigger header-export-menu--toggle-btn">
<svg width="9px" height="5px" viewBox="0 0 9 5" version="1.1">
<desc>Created with Sketch.</desc>
<defs></defs>
<g id="Page-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g id="Artboard" transform="translate(-109.000000, -97.000000)" fill="#FFFFFF">
<g id="Header-Dropdown-Menu" transform="translate(96.000000, 82.000000)">
<path d="M18.1734867,19.6470682 C17.8014721,20.0543213 17.1922167,20.0476427 16.8263028,19.6470682 L13.2549246,15.7373969 C12.8829101,15.3301438 13.0295754,15 13.5787039,15 L21.4210856,15 C21.9719185,15 22.1107787,15.3368224 21.7448649,15.7373969 L18.1734867,19.6470682 Z" id="options-dropdown-menu-arrow"></path>
</g>
</g>
</g>
</svg>
</a>
<div class="dropdown__content header-export-menu--content">
<ul class="export-menu">
<li class="export-menu-item ">
<svg width="17px" height="14px" viewBox="0 0 17 15" version="1.1">
<desc>Created with Sketch.</desc>
<defs></defs>
<g id="Basic-Report-Template-SPECS" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g id="Basic-Report-Template---EXPORT-SPECS-OnClick" transform="translate(-522.000000, -180.000000)" stroke="#484B4D" stroke-width="2">
<g id="SPECS" transform="translate(487.000000, 14.000000)">
<g id="Download-Icon-Copy-2" transform="translate(43.500000, 172.000000) rotate(-180.000000) translate(-43.500000, -172.000000) translate(35.000000, 164.000000)">
<path d="M5.36902902,13.624518 L5.36902902,6.56257607 L12.430971,6.56257607" id="Rectangle-242-Copy-5" transform="translate(8.900000, 10.093547) rotate(-315.000000) translate(-8.900000, -10.093547) "></path>
<path d="M8.9,6.99999999 L8.9,12.9999999" id="Line-Copy-10" stroke-linecap="square"></path>
<path d="M16.9,0.0208873076 L16.9,4.02297419 C16.9,5.12639113 16.0054862,6.02088731 14.9059397,6.02088731 L2.89406028,6.02088731 C1.7927712,6.02088731 0.9,5.12262668 0.9,4.02297419 L0.9,0.0208873076" id="Rectangle-243-Copy-4" transform="translate(8.900000, 3.020887) rotate(-180.000000) translate(-8.900000, -3.020887) "></path>
</g>
</g>
</g>
</g>
</svg>
<!-- react-text: 55 -->
"Export PDF"
<!-- /react-text -->
</li>
<li class="export-menu-item">
<svg width="17px" height="14px" viewBox="0 0 17 15" version="1.1">
<desc>Created with Sketch.</desc>
<defs></defs><g id="Basic-Report-Template-SPECS" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g id="Basic-Report-Template---EXPORT-SPECS-OnClick" transform="translate(-522.000000, -180.000000)" stroke="#484B4D" stroke-width="2">
<g id="SPECS" transform="translate(487.000000, 14.000000)">
<g id="Download-Icon-Copy-2" transform="translate(43.500000, 172.000000) rotate(-180.000000) translate(-43.500000, -172.000000) translate(35.000000, 164.000000)">
<path d="M5.36902902,13.624518 L5.36902902,6.56257607 L12.430971,6.56257607" id="Rectangle-242-Copy-5" transform="translate(8.900000, 10.093547) rotate(-315.000000) translate(-8.900000, -10.093547) "></path>
<path d="M8.9,6.99999999 L8.9,12.9999999" id="Line-Copy-10" stroke-linecap="square">
</path>
<path d="M16.9,0.0208873076 L16.9,4.02297419 C16.9,5.12639113 16.0054862,6.02088731 14.9059397,6.02088731 L2.89406028,6.02088731 C1.7927712,6.02088731 0.9,5.12262668 0.9,4.02297419 L0.9,0.0208873076" id="Rectangle-243-Copy-4" transform="translate(8.900000, 3.020887) rotate(-180.000000) translate(-8.900000, -3.020887)
"></path>
</g>
</g>
</g>
</g>
</svg>
<!-- react-text: 79 -->
"Export Excel Spread Sheet"
<!-- /react-text -->
</li>
<li class="export-menu-item ">
<svg width="16px" height="12px" viewBox="0 0 16 13" version="1.1">
<desc>Created with Sketch.</desc>
<defs></defs>
<g id="Basic-Report-Template-SPECS" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g id="Basic-Report-Template---EXPORT-SPECS-OnClick" transform="translate(-554.000000, -182.000000)" stroke="#484B4D" stroke-width="2">
<g id="SPECS" transform="translate(487.000000, 14.000000)">
<g id="Email-Icon-Copy" transform="translate(67.000000, 168.000000)">
<rect id="Rectangle-40" x="0" y="0.669998169" width="16" height="11" rx="2"></rect>
<path d="M1.55761719,3.08300781 L8.07275391,7.10009766 L14.8974609,3.14355469" id="Path-41"></path>
</g>
</g>
</g>
</g>
</svg>
<!-- react-text: 90 -->
"Email/Schedule Report"
<!-- /react-text -->
</li>
</ul>
</div>
</div>
Update 2 (Solution)
Here's where I went wrong.
I didn't provide enough of the HTML code.
Just a few lines above there was an "iframe" which was not allowing me to enter the block of code.
After switching into the iframe, I was able to click into the button and complete the following task of exporting the excel report.
example of code (Generalized to your future endeavors)
import time

# Find the frame
iframe = driver.find_element_by_id("IDofFrame")
# Switch to that frame
driver.switch_to.frame(iframe)
# Find the dropdown button element and click it
driver.find_element_by_xpath("XPathOfButton").click()
# Delay before the export click
time.sleep(3)
# Export click
driver.find_element_by_xpath("XPathOfButtonToExport").click()
# If you need to switch out of the frame to go back to the original HTML block
driver.switch_to.default_content()
CHECK YOUR HTML CODE FOR FRAMES !!!!
good video to reference.
https://www.youtube.com/watch?v=NhRx99uFUNk
Actually, every one of your attempts is incorrect. Take this one:
driver.find_element_by_xpath("//li[contains(text(),'Export Excel Spread Sheet')]").click()
Here contains(text(), ...) is the problem. When you pass the node-set selected by text() to contains(), it is converted to a string by taking the string value of the first node in the set only. In your HTML the <li> starts with an <svg> element, so its first text node is just the whitespace before the <svg>, not the label. I would suggest using a dot (.) instead, which takes the string value of the whole node, together with an explicit wait:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
element = wait.until(EC.presence_of_element_located((By.XPATH, "//li[contains(.,'Export Excel Spread Sheet')]")))
element.click()
Hope it helps
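To see why text() fails and the dot works, here is a small offline sketch. It does not run XPath or Selenium. It only uses the standard-library ElementTree to show the two strings that the two expressions end up comparing against, on a shortened copy of the <li> from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the menu item from the question's HTML.
li = ET.fromstring('''<li class="export-menu-item">
<svg><desc>Created with Sketch.</desc></svg>
Export Excel Spread Sheet
</li>''')

# The first text node of the <li>: roughly what contains(text(), ...) tests.
first_text_node = li.text
# All text inside the <li>, nested tags included: what contains(., ...) tests.
string_value = "".join(li.itertext())

print(repr(first_text_node))
print(repr(string_value))
```

The first string is only whitespace, while the second contains the label, which is why the dot-based locator finds the element.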

Using selenium webdriver python to retrieve SVG text element

I am trying to use selenium webdriver (Firefox) to get information about college courses from a website where we can see course reviews .... I can get the webdriver to successfully log into the website and reach the course info page, but once I am there I can't access the text element for the overall course rating.
Here is what the page looks like:
Course Ratings Chart:
And this is what the text element HTML code looks like:
<text style="text-anchor: middle; font: 12px Arial,Helvetica,sans-serif; opacity: 1;"
x="438.00500259399416" y="131.25" text-anchor="middle" font="10px "Arial""
stroke="none" fill="#3c4c30" font-size="12px" font-family="Arial,Helvetica,sans-serif"
font-style="normal" font-weight="normal" transform="matrix(1,0,0,1,0,0)"
opacity="1"><tspan dy="4">3.00</tspan></text>
And the svg code:
<svg height="200" version="1.1" width="600" xmlns="http://www.w3.org/2000/svg"
style="overflow: hidden; position: relative; left: -0.5px; top: -0.866669px;"><rect
x="0" y="0" width="600" height="200" r="0" rx="0" ry="0" fill="#ffffff" stroke="#ffffff"
style="stroke-linejoin: round; stroke-linecap: square; stroke-opacity: 1; fill-opacity: 1;"
stroke-linejoin="round" stroke-linecap="square" stroke-width="1" stroke-opacity="1"
fill-opacity="1"></rect>
.......</svg>
First I tried identifying the element by its CSS selector (#chart > svg:nth-child(1) > text:nth-child(107)) but I got a NoSuchElementException.
I think the next option is to find the element by XPath but I'm not sure how to identify the "3.00" element because it doesn't have a specific ID or class name.
Parent element1:
(bar and text for Papers/Problem Sets)
-Papers/Psets label:
<text style="text-anchor: middle; font: 12px Arial,Helvetica,sans-serif;"
x="0" y="0" text-anchor="middle" font="10px "Arial"" stroke="none"
fill="#3c4c30" font-size="12px" font-family="Arial,Helvetica,sans-serif"
font-style="normal" font-weight="normal"
transform="matrix(1,0,0,1,128,102.0833)"><tspan dy="4">Papers, Reports,
Problem Sets, Examinations</tspan></text>
Paper/Psets bar:
<rect x="262.03334045410156" y="96.00694444444444" width="216.0105950756073"
height="12.152777777777777" r="0" rx="0" ry="0" fill="#ffffff"
stroke="#ffffff" style="stroke-linejoin: round; stroke-linecap: square;
stroke-opacity: 0; opacity: 1; fill-opacity: 0;" stroke-linejoin="round"
stroke-linecap="square" stroke-width="0" stroke-opacity="0" opacity="1"
fill-opacity="0"></rect>
Number rating for papers/ psets:
<text style="text-anchor: middle; font: 12px Arial,Helvetica,sans-serif; opacity: 1;"
x="458.2356021327972" y="102.08333333333333" text-anchor="middle" font="10px "Arial""
stroke="none" fill="#3c4c30" font-size="12px" font-family="Arial,Helvetica,sans-serif"
font-style="normal" font-weight="normal" transform="matrix(1,0,0,1,0,0)"
opacity="1"><tspan dy="3.999997456868485">3.31</tspan></text>
Parent element 2 (Feedback for other students bar)
Feedback text label:
<text style="text-anchor: middle; font: 12px Arial,Helvetica,sans-serif;"
x="0" y="0" text-anchor="middle" font="10px "Arial"" stroke="none"
fill="#3c4c30" font-size="12px" font-family="Arial,Helvetica,sans-serif"
font-style="normal" font-weight="normal"
transform="matrix(1,0,0,1,175.3333,160.4167)"><tspan dy="4">Feedback for
other students</tspan></text>
Bar for feedback:
<rect x="262.03334045410156" y="154.34027777777777"
width="232.3255947036743" height="12.152777777777777" r="0" rx="0" ry="0"
fill="#ffffff" stroke="#ffffff"
style="stroke-linejoin: round; stroke-linecap: square; stroke-opacity: 0; opacity: 1; fill-opacity: 0;"
stroke-linejoin="round" stroke-linecap="square" stroke-width="0" stroke-opacity="0"
opacity="1" fill-opacity="0"></rect>
Feedback rating text:
<text style="text-anchor: middle; font: 12px Arial,Helvetica,sans-serif; opacity: 1;"
x="474.55060176086425" y="160.41666666666666" text-anchor="middle" font="10px "Arial""
stroke="none" fill="#3c4c30" font-size="12px" font-family="Arial,Helvetica,sans-serif"
font-style="normal" font-weight="normal" transform="matrix(1,0,0,1,0,0)"
opacity="1"><tspan dy="3.9999949137369697">3.56</tspan></text>
Here is the entire HTML code for the body of the website from page_source:
(https://pastebin.com/zpd4iF05)
And for the python code I attempted to use to find the element:
(https://pastebin.com/aW40P86u)
First you need to get the html from the iframe. See the answer here:
Is it possible to get contents of iframe in selenium webdriver python?
Once the driver has switched into the iframe, here's the full code to get the necessary info:
tspans = driver.find_element_by_id('chart').find_elements_by_tag_name("tspan")
# Build a real list so it can be measured and indexed (map() is lazy in Python 3)
values = [t.get_attribute('innerHTML') for t in tspans]
length = len(values)
scores = {
    "Lectures": values[length-2],
    "Precepts": values[length-3],
    "Readings": values[length-4],
    "Papers, Reports, Problem Sets, Examinations": values[length-5],
    "Overall Quality of the Course": values[length-6],
    "Feedback for other students": values[length-7]
}
driver.close()
print(scores)
That will output:
{'Lectures': '2.71', 'Precepts': '3.43', 'Readings': '3.67', 'Papers, Reports, Problem Sets, Examinations': '3.31', 'Overall Quality of the Course': '3.00', 'Feedback for other students': '3.56'}
Without more of the HTML it's hard to say what the right locator would be. I would start with the actual element that contains the text and avoid locators that use things like nth-child() because it's WAY too easy for the HTML to change slightly and then your locator is pointing at the wrong element.
The element you want is <tspan dy="4">3.00</tspan>. Have you tried a simple CSS selector like, tspan[dy='4']?
I'm hoping that the dy is related to the text position and will be unique on the page. If you can post the HTML for the entire row that contains the "Overall quality of the course" label and the bar graph that contains 3.00, I think that an XPath can be created to find what you want.

Select SVG paths of group by group id using lxml

I am having trouble selecting a particular set of paths using lxml. The SVG structure looks like this
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Created with matplotlib (http://matplotlib.org/) -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" height="288pt" version="1.1" viewBox="0 0 432 288" width="432pt">
<defs>
<style type="text/css">
*{stroke-linecap:butt;stroke-linejoin:round;}
</style>
</defs>
<g id="figure_1">
<g id="patch_1">
<path d=" M0 288 L432 288 L432 0 L0 0 z " style="fill:#ffffff;"/>
</g>
<g id="patch_2">
<path d=" M0 288 L432 288 L432 0 L0 0 z " style="fill:#ffffff;"/>
</g>
<g id="axes_1">
<g id="Poly3DCollection_1">
<path clip-path="url(#pe61355d493)" d=" M195.211 34.2225 L194.801 34.0894 L196.527 212.986 L196.909 212.999 z " style="fill:#0000ff;"/>
<path clip-path="url(#pe61355d493)" d=" M195.504 34.3231 L195.211 34.2225 L196.909 212.999 L197.184 213.022 z " style="fill:#0000ff;"/>
...
It's the paths listed at the bottom that I want to select so I can change their styles, but I can't seem to get the syntax right and I fail to select the paths.
from lxml import etree

ifilename = "myfig.svg"
with open( ifilename, 'r') as infile:
    tree = etree.parse( infile )
elements = tree.findall(".//g[@id='Poly3DCollection_1'")
new_style = 'stroke-width:4px; stroke: linear-gradient(orange, darkblue)'
for child in elements:
    child.attrib['style'] = new_style
mod_svg = 'myfigmod.svg'
tree.write(mod_svg)
EDIT
so this gets me the element I want in this instance but I would still like a specific way of getting this element
root = tree.getroot()
for child in root[1][2][0]:
    child.attrib['style'] = new_style
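There is no get_element_by_id in etree, so you have to use xpath, like you are doing to grab the element. Because the file declares a default namespace, the g tag has to be qualified with it (SVG_NS below is the xmlns value from your file). I created your file and ran the code below and was able to change the style of the group.
SVG_NS = "http://www.w3.org/2000/svg"
element = tree.findall(".//{%s}g[@id='Poly3DCollection_1']" % SVG_NS)[0]
element.attrib["style"] = new_style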
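The question was about restyling the paths inside the group rather than the group itself. Here is a minimal sketch of that, using the standard-library xml.etree.ElementTree, which accepts the same {namespace}tag path syntax as lxml. The inline SVG is a cut-down stand-in for myfig.svg with two placeholder paths.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Cut-down stand-in for myfig.svg; the path data is placeholder geometry.
svg = '''<svg xmlns="http://www.w3.org/2000/svg">
<g id="figure_1"><g id="axes_1"><g id="Poly3DCollection_1">
<path d="M0 0 L1 1 z" style="fill:#0000ff;"/>
<path d="M1 1 L2 2 z" style="fill:#0000ff;"/>
</g></g></g></svg>'''

root = ET.fromstring(svg)
new_style = 'stroke-width:4px; stroke: linear-gradient(orange, darkblue)'

# Without the namespace the group is not found at all
unqualified = root.findall(".//g[@id='Poly3DCollection_1']")

# With the namespace-qualified tag the group and its paths are found
group = root.find(".//{%s}g[@id='Poly3DCollection_1']" % SVG_NS)
paths = group.findall("{%s}path" % SVG_NS)
for path in paths:
    path.set("style", new_style)

styles = [p.get("style") for p in paths]
print(len(unqualified), styles)
```

The empty result for the unqualified lookup is the same failure as in the original attempt: the tag name alone never matches an element that lives in the SVG namespace.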
