Python find value of attribute with lxml


How can I get the value of data-sku using lxml (from lxml import html)?
<button data-e2e="product-size" type="button" class="btn btn-default "
data-price="DKK 1,750.00"
data-sku="050226_jdsportsdk.006458"
data-brand="Nike"
title="Vælg størrelse 42"
>
42
<span class="fulfilment-notice-html hide">
That is the HTML above.
I tried xpath('//button[@data-e2e="product-size"]//@data-sku()')
but it doesn't work.
I want to get the value of data-sku, which here would be "050226_jdsportsdk.006458". I don't know in advance what the value will be, which is why I need a way to scrape it.
Thanks

I found a way to fix it!
I had () after @data-sku, and also a double //.
The right answer is xpath('//button[@data-e2e="product-size"]/@data-sku')
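A minimal, self-contained sketch of the working expression, assuming the button markup from the question is already available as a string (in a real scraper it would be the downloaded page). The final /@data-sku step selects the attribute node, and lxml returns its value as a string. get_skus is just an illustrative name.

```python
from lxml import html

# Markup from the question, with the <button> closed so it parses cleanly.
SAMPLE = """
<button data-e2e="product-size" type="button" class="btn btn-default "
        data-price="DKK 1,750.00"
        data-sku="050226_jdsportsdk.006458"
        data-brand="Nike"
        title="Vælg størrelse 42">42</button>
"""


def get_skus(page_source):
    """Return the data-sku value of every product-size button on the page."""
    tree = html.fromstring(page_source)
    # Ending the path with /@data-sku selects the attribute itself, so lxml
    # returns its value instead of an element. No "()" after the name.
    values = tree.xpath('//button[@data-e2e="product-size"]/@data-sku')
    # lxml returns "smart strings"; convert them to plain str.
    return [str(v) for v in values]


print(get_skus(SAMPLE))  # ['050226_jdsportsdk.006458']
```

Because xpath() returns a list, a page with several size buttons gives one SKU per button, and a page without any gives an empty list.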

Related

Select class with selenium python

Hi, I'm trying to automate a process with Selenium, Python and GLPI, but I've spent a lot of time trying to select a user from a dropdown menu. I've already tried linkText, cssSelector and XPath, but none of them works for me. Maybe I'm doing it wrong; I'd appreciate your help.
driver.find_element_by_xpath("//a[contains(@href,'javascript:void(0)') and contains(.,'select2-choice')]").click()
driver.find_element_by_css_selector("a[href*='select2-container select2-container-active select2-dropdown-open']").click()
driver.find_element_by_link_text("javascript:void(0)").click()
Here is a screenshot of the page source: https://ibb.co/RbrXDrv
<div class="select2-container select2-container-active select2-dropdown-open" id="s2id_dropdown__users_id_requester722037505" style="width: 80%;"> <span class="select2-chosen" id="select2-chosen-4">-----</span><abbr class="select2-search-choice-close"></abbr> <span class="select2-arrow" role="presentation"><b role="presentation"></b></span><label for="s2id_autogen4" class="select2-offscreen"></label><input class="select2-focusser select2-offscreen" type="text" aria-haspopup="true" role="button" aria-labelledby="select2-chosen-4" id="s2id_autogen4" disabled=""></div>
thanks

Your XPath is not correct.
Try this:
driver.find_element_by_xpath("//a[@href='javascript:void(0)' and @class='select2-choice']").click()
By the way, it would be better to provide the URL or the HTML, not a picture.

How to click a button from Python

I am using Anaconda2, Jupyter, and the Chrome browser.
My code so far runs successfully, but now I want to click a button:
<a href="#" action="exportSelected" class="btn btn-default">
<i class="glyphicon glyphicon-download"></i> Export
</a>
What should I write in Python to access this?
find_element_by_partial_link_text('btn btn-default').click()
It's giving an error.

"btn btn-default" is not a link text, but class names.
You can use one of the solutions below:
Locate by exact link text:
find_element_by_link_text('Export').click()
Locate by compound class name:
find_element_by_css_selector('a.btn.btn-default').click()
Locate by action attribute:
find_element_by_css_selector('a[action="exportSelected"]').click()

Thank you all for replying and for suggesting the Python docs; I appreciate it.
In the end I used a very simple approach: copy the element's XPath from the browser's developer tools and pass it to
driver.find_element_by_xpath('copy your xpath')
For example:
driver.find_element_by_xpath('//*[@id="app_content"]/div[2]/div/div/section/div/div[2]/div[1]/div[1]/div/div[2]/a[1]').click()
driver.implicitly_wait(5)
driver.find_element_by_xpath('//*[@id="export_button"]').click()

Python 3.5 + Selenium Scrape. Is there anyway to select <a><a/> tags?

So I'm very new to Python and Selenium. I'm writing a scraper to fetch some account balances and download a txt file. So far I've managed to grab the account balances, but downloading the txt files has proven to be difficult.
This is a sample of the html
<td>
<div id="expoDato_msdd" class="dd noImprimible" style="width: 135px">
<div id="expoDato_title123" class="ddTitle">
<span id="expoDato_arrow" class="arrow" style="background-position: 0pt 0pt"></span>
<span id="expoDato_titletext" class="textTitle">Exportar Datos</span>
</div>
<div id="expoDato_child" class="ddChild" style="width: 133px; z-index: 50">
<a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=txt">txt</a>
<a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=pdf">PDF</a>
<a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=excel">Excel</a>
<a class="modal" href="#info_formatos">Información Formatos</a>
</div>
</div>
I need to click the first "a" with class="enabled", but I just can't manage to get there by XPath, class or anything else. Here is the last thing I tried:
# Download files
ddmenu2 = driver.find_element_by_id("expoDato_child")
ddmenu2.find_element_by_css_selector("txt").click()
Here are more of the things I've already tried:
#TXT = driver.select
#TXT.send_keys(Keys.RETURN)
#ddmenu2 = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[2]")
#Descarga = ddmenu2.find_element_by_visible_text("txt")
#Descarga.send_keys(Keys.RETURN)
I would appreciate your help.
PS: English is not my native language, so I'm sorry for any confusion.
EDIT:
This was the approach that worked; I'll try your other suggestions to make the code neater. Also, it only works if the mouse pointer is over the browser window (anywhere in it).
ddmenu2a = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[1]").click()
ddmenu2b = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[2]")
ddmenu2c = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[2]/a[1]").click()
It's pretty much brute force, but I'm starting to like Python scripting.

Or simply use CSS to match on the href:
driver.find_element_by_css_selector("div#expoDato_child a.enabled[href*='txt']")

You can get all anchor elements like this:
a_list = driver.find_elements_by_tag_name('a')
This returns a list of elements. You can click each one, re-locating the list on every pass, since the old references go stale once the page reloads:
for i in range(len(a_list)):
    driver.find_elements_by_tag_name('a')[i].click()
    driver.back()
Or try an XPath for each anchor element:
a1 = driver.find_element_by_xpath('//a[@class="enabled"][1]')
a2 = driver.find_element_by_xpath('//a[@class="enabled"][2]')
a3 = driver.find_element_by_xpath('//a[@class="enabled"][3]')
Please let me know if this was helpful

You can reach the links directly by XPath via their text:
driver.find_element_by_xpath("//*[@id='expoDato_child']/a[contains(., 'txt')]").click()
driver.find_element_by_xpath("//*[@id='expoDato_child']/a[contains(., 'PDF')]").click()
...
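When a locator doesn't work, it can help to test the XPath offline against a saved copy of the HTML before handing it to Selenium. A sketch with lxml and the dropdown markup from the question (jsessionid shortened to X); the three expressions are illustrative variants of the approaches above, each stepping from the container div down to the <a> itself. matching_texts is an illustrative helper name.

```python
from lxml import html

# Dropdown markup from the question (jsessionid shortened for readability).
DROPDOWN = """
<div id="expoDato_child" class="ddChild" style="width: 133px; z-index: 50">
  <a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=X?tipoExportacion=txt">txt</a>
  <a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=X?tipoExportacion=pdf">PDF</a>
  <a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=X?tipoExportacion=excel">Excel</a>
  <a class="modal" href="#info_formatos">Información Formatos</a>
</div>
"""

# Three ways to point at the "txt" link.
TXT_LOCATORS = {
    "position": "//div[@id='expoDato_child']/a[@class='enabled'][1]",
    "href": "//div[@id='expoDato_child']/a[contains(@href, 'tipoExportacion=txt')]",
    "text": "//div[@id='expoDato_child']/a[contains(., 'txt')]",
}


def matching_texts(markup, xpath):
    """Return the text of every element the XPath selects in the markup."""
    root = html.fromstring(markup)
    return [el.text_content() for el in root.xpath(xpath)]


for name, xpath in TXT_LOCATORS.items():
    print(name, matching_texts(DROPDOWN, xpath))  # each line ends with ['txt']
```

An expression that returns more than one element, or the container instead of the link, shows up immediately in this kind of check.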

If there is a public link for the page in question that would be helpful.
However, generally, I can think of two methods for this:
If you can discover the direct link, you can extract its URL and use Python's urllib to download the file directly.
or
Use Selenium's click function to have it click the link on the page.
A quick search turned up this:
downloading-file-using-selenium
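The first (direct download) idea can be sketched with the standard library alone: collect the href of the txt link from the page HTML, resolve it against the page URL with urljoin, and fetch it with urllib.request. Assumptions: bank.example in the usage below is a placeholder host, and because this is a logged-in page, the server will probably want the browser's session cookies too, so download() forwards cookies in the list-of-dicts shape returned by Selenium's driver.get_cookies(). find_export_url and download are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_export_url(page_html, page_url, fmt="txt"):
    """Return the absolute URL of the export link for fmt, or None if absent."""
    collector = LinkCollector()
    collector.feed(page_html)
    for href in collector.hrefs:
        if href.endswith("tipoExportacion=" + fmt):
            return urljoin(page_url, href)
    return None


def download(url, cookies=()):
    """Fetch url, forwarding cookies given as Selenium-style {'name', 'value'} dicts."""
    cookie_header = "; ".join(f"{c['name']}={c['value']}" for c in cookies)
    headers = {"Cookie": cookie_header} if cookie_header else {}
    with urlopen(Request(url, headers=headers)) as response:
        return response.read()
```

In the Selenium script this would look like url = find_export_url(driver.page_source, driver.current_url) followed by data = download(url, driver.get_cookies()), after which data can be written to a file opened in "wb" mode.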

Mechanize fill out textarea within <noscript> tag

So I have the following HTML code:
<form action="blabla" blabla >
<input 1 type=blah>
<input 2 type=blah2> etc
<noscript>
<textarea name="prda" rows="3" cols="40"></textarea>
</noscript>
I want to fill out that textarea, preferably with mechanize (in Python); however, form["prda"] always gives me a "control not found" error. Another user on StackOverflow suggested that mechanize cannot parse controls inside the <noscript> tag, which seems odd to me.
Anyway, my question is: can mechanize parse a control inside the <noscript> tag, and if so, how? Also, if someone can suggest an alternative way of writing to the textarea, I'd be more than happy to hear it. Thanks!

Mechanize can't recognize this particular control, so you need to add a new parameter to your form.
br.form.new_control('text','prda',{'value':''})
br.form.fixup()
br.form['prda'] = 'input'
I know you're probably no longer interested, but I went crazy trying to solve this same problem.

I changed your HTML slightly (closing the "form" tag and adding some content to the textarea for testing):
<form action="blabla" blabla >
<input 1 type=blah>
<input 2 type=blah2>
<noscript>
<textarea name="prda" rows="3" cols="40">Foobar</textarea>
</noscript>
</form>
Okay, here comes the mechanize version:
from mechanize import ParseResponse, urlopen
response = urlopen("http://localhost:8000/test")
forms = ParseResponse(response, backwards_compat=False)
form = forms[0]
print(form["prda"])
This prints "Foobar", so I guess I succeeded in selecting the textarea.
Non-mechanize version: From here:
from lxml.html import fromstring, tostring
# html_code holds the page's HTML as a string
form_page = fromstring(html_code)
form = form_page.forms[0]
form.fields = dict(
    prda='input',
)
print(tostring(form))
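A Python 3 sketch of the same lxml idea that sets only the textarea instead of assigning the whole form.fields mapping. The form markup is a cleaned-up, hypothetical version of the question's, and fill_textarea is an illustrative name. It relies on lxml's HTML parser reading the contents of <noscript> as ordinary markup, which the answer above also depends on.

```python
from lxml.html import fromstring, tostring

# Hypothetical, cleaned-up version of the question's form.
HTML_CODE = """
<form action="/blabla" method="post">
  <input type="text" name="user" value="alis">
  <noscript>
    <textarea name="prda" rows="3" cols="40"></textarea>
  </noscript>
</form>
"""


def fill_textarea(html_code, name, text):
    """Set the named textarea in the first form and return that form element."""
    page = fromstring(html_code)
    form = page.forms[0]             # .forms also matches the root element itself
    form.inputs[name].value = text   # replaces the textarea's contents
    return form


form = fill_textarea(HTML_CODE, "prda", "input")
print(form.form_values())  # [('user', 'alis'), ('prda', 'input')]
print(tostring(form, encoding="unicode"))
```

form_values() lists the (name, value) pairs that would be submitted, which is a quick way to confirm the textarea inside <noscript> was picked up.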

I am not able to parse using Beautiful Soup

<td>
<a name="corner"></a>
<div>
<div style="aaaaa">
<div class="class-a">My name is alis</div>
</div>
<div>
<span><span class="class-b " title="My title"><span>Very Good</span></span> </span>
<b>My Description</b><br />
My Name is Alis I am a python learner...
</div>
<div class="class-3" style="style-2 clear: both;">
alis
</div>
</div>
<br /></td>
I want the description after scraping it:
My Name is Alis I am a python learner...
I tried a lot of things but couldn't figure out the best way. Can you give me a general solution for this?

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup("Your sample html here")
soup.td.div('div')[2].contents[-1]
This will return the string you are looking for (the unicode string, with any applicable whitespace, it should be noted).
This works by parsing the html, grabbing the first td tag and its contents, grabbing any div tags within the first div tag, selecting the 3rd item in the list (list index 2), and grabbing the last of its contents.
In BeautifulSoup there are a lot of ways to do this, so this answer probably hasn't taught you much; I genuinely recommend reading the tutorial that David suggested.
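BeautifulSoup 3 (from BeautifulSoup import BeautifulSoup) only runs on Python 2. A sketch of the same extraction with the current bs4 package, anchored on the <b>My Description</b> label instead of on the position of the divs, assuming that label text is stable on the real page. get_description is an illustrative name.

```python
from bs4 import BeautifulSoup

# Sample HTML from the question.
HTML = """
<td>
<a name="corner"></a>
<div>
<div style="aaaaa">
<div class="class-a">My name is alis</div>
</div>
<div>
<span><span class="class-b " title="My title"><span>Very Good</span></span> </span>
<b>My Description</b><br />
My Name is Alis I am a python learner...
</div>
<div class="class-3" style="style-2 clear: both;">
alis
</div>
</div>
<br /></td>
"""


def get_description(html_text):
    """Return the text that follows the <b>My Description</b> label, or None."""
    soup = BeautifulSoup(html_text, "html.parser")
    label = soup.find("b", string="My Description")
    if label is None:
        return None
    # The description is the bare text node right after the <br /> that follows the label.
    br = label.find_next_sibling("br")
    if br is None or br.next_sibling is None:
        return None
    return br.next_sibling.strip()


print(get_description(HTML))  # My Name is Alis I am a python learner...
```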

Have you tried reading the examples provided in the documentation? The quick start is here: http://www.crummy.com/software/BeautifulSoup/documentation.html#Quick Start
Edit:
To find the div with class "class-a", you would load your HTML via:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup("My html here")
myDiv = soup.find("div", { "class" : "class-a" })
Also remember that you can do most of this in the Python console, using dir() and help() to walk through what you're trying to do. It might make life easier to try IPython or Python's IDLE, which have very friendly consoles for beginners.
