Selenium2Library: move mouse position for click - python

I am new to the whole Robot Framework and Selenium2Library world, and I have a problem.
I have two divs: rasterContainer and anlageContainer.
They have the same x- and y-offset. The anlageContainer has a z-index of 3 and the rasterContainer has 0, so the anlageContainer lies on top of the rasterContainer.
Together they build a time bar. The anlageContainer has just one id, while the rasterContainer contains many other divs, each with its own id.
If you mouse over those divs, the rasterContainer shows you the time. If you click there, you actually click on the anlageContainer; some other methods then calculate the offset to get the time and open a window with this time in a textbox.
What I want to do:
I want to move my mouse to an element of the rasterContainer and click at the same position on the anlageContainer.
What I have tried:
I began to write my own library in Python. It has just one method, which gets an instance of the Selenium2Library, the vertical value of the mouse position (the mouse is on top of the anlageContainer), and the vertical value of the rasterContainer element.
def click_on_element(self, vertEl, vertMo, se2lib):
    v = vertEl - vertMo
    # Get WebDriver
    driver = se2lib._current_browser()
    # ActionChains instance
    ac = webdriver.ActionChains(driver)
    ac.move_by_offset(0, v)
    ac.click().perform()
    return "On my way"
With move_by_offset, the window opens, but with the wrong time (07:00); I wanted 09:30.
I also tried:
#Get Element
elmfinder = ElementFinder()
elm = elmfinder.find(driver, "5_09_30")[0]
ac.move_to_element(elm)
ac.move_to_element_with_offset(elm, 461, 422)
The window opened neither with move_to_element nor with move_to_element_with_offset.
I really don't know what I am missing here.
Any hints would help.
EDIT:
HTML code:
<div id="resource_id_5_2013-07-30" class="resource" daylenght="720" loaded="false" date="2013-07-30" time="07:00" style="top: 0px; height: 1540px; width: 309.75px; left: 619.5px;">
<div class="terminContainer"></div>
<div class="overlapContainer" style="width: 10%; position: absolute; left: 90%; height: 1560.0px; top: 0px;"></div>
<div id="5" class="anlageContainer" style="width: 10%; height: 1440px; top: 0px;" title="08:53"></div>
<div class="rasterContainer" style="width: 10%; height: 1440px; top: 0px;">
<div id="5_07_00" class="rasterLabel" style="position: absolute; top: 0px;">7:00</div>
<div id="5_07_15" class="rasterLabel" style="position: absolute; top: 30px;">7:15</div>
<div id="5_07_30" class="rasterLabel" style="position: absolute; top: 60px;">7:30</div>
etc...
</div>
</div>
CSS style:
.rasterContainer{
position: absolute;
background-color: #EEEEEE;
}
.anlageContainer:hover + .rasterContainer{
background-color: #e3e3e3;
}
.rasterLabel{
z-index: 2;
font-size: 0.7em;
color: #000;
border-top: solid 1px #888;
}
.anlageContainer{
z-index: 3;
cursor: pointer;
position: absolute;
}
There you can see that the anlageContainer is above the rasterContainer, with the rasterLabels between them (z-index 2).
The anlageContainer has this click handler:
dojo.connect(anlageContainer, 'onclick', function(clickevt){
    addTermin(resourceId, getOffsetY(clickevt)/g_terminMultiplikator, datum);
});
Two links to images:
Time bar
3D time bar

element = driver.find_element_by_xpath(".//div[@id='resource_id_5_2013-07-30']//div[@class='anlageContainer']")
element.click()
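If the goal is to click the anlageContainer at the height of a particular raster label (for example 5_09_30), another way to sketch it, assuming the Selenium 2/3 behaviour where move_to_element_with_offset measures from the element's top-left corner, is to compute the vertical distance between the label and the overlay and click there. The locators below are taken from the HTML in the question:

from selenium.webdriver.common.action_chains import ActionChains

driver = se2lib._current_browser()
anlage = driver.find_element_by_xpath(
    ".//div[@id='resource_id_5_2013-07-30']//div[@class='anlageContainer']")
label = driver.find_element_by_id("5_09_30")

# vertical distance of the label from the top of the overlay (page coordinates)
dy = label.location['y'] - anlage.location['y']
dx = int(anlage.size['width'] / 2)  # aim for the horizontal middle of the overlay

ActionChains(driver).move_to_element_with_offset(anlage, dx, dy).click().perform()

If the window still opens with the wrong time, printing dy next to the label's expected top value (30 px per 15 minutes in the pasted HTML) should show where the offset goes wrong.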

Related

Python Selenium Having Issues Selecting From Dropdown

I have no training in Python and am trying to write my first script using Python Selenium via Visual Studio Code. I'm currently trying to select an item from a dropdown box, but I don't understand how this HTML was put together; it feels pretty unconventional.
I want to select A-Active from the Status dropdown box:
However, when inspecting the code for the dropdown box, I see it's not using a Select tag. When I copy the outerHTML, this is all I get back:
<div class="g_GridElement g_GridElement_scrollignore gl_94_x_97 gl_94_w_97_12 gl_94_y_1 gl_94_h_1_1">
<div class="containerElement gbc_WidgetBase gbc_ComboBoxWidget gbc_MunisComboBoxWidget w_111 g_measureable gbc_WidgetBase_standalone aui__1107 gbc_NotNull gbc_Required gbc_Focus editing" tabindex="0" role="combobox" __widgetbase="" __comboboxwidget="" __muniscomboboxwidget="" id="w_111" data-aui-id="1107" aria-expanded="false" aria-live="polite" aria-owns="w_113" aria-labelledby="w_112" data-gqa-name="prempmst.prem_act_stat" data-gqa-aui-id="1107" data-gqa-tabindex="7" data-aui-name="prempmst.prem_act_stat" aria-describedby="tcw-tooltip-22" aria-selected="true">
<tcw-text-field density="dense" class="gbc_dataContentPlaceholder gbc_WidgetBase gbc_EditWidget gbc_MunisEditWidget w_112 g_measureable gbc_WidgetBase_standalone gbc_staticMeasure aui__1107" __widgetbase="" __editwidget="" __muniseditwidget="" id="w_112" data-aui-id="1107" tcw-text-field-label-floating="">
<input type="text" class="gbc-label-text-container tyl-text-field__input--focused" autocomplete="new-password" __widgetbase="" __editwidget="" __muniseditwidget="" id="input-w_112" readonly="readonly">
<label __widgetbase="" __editwidget="" __muniseditwidget="" for="input-w_112" slot="label" class="tyl-floating-label tyl-text-field__label--focused tyl-floating-label--float"></label>
<char-measurer class="g_layout_charMeasurer" aria-hidden="true">
<char-measurer-item class="g_layout_charMeasurer1">
MMMMMMMMMM
M
M
M
M
M
M
M
M
M
</char-measurer-item><char-measurer-item class="g_layout_charMeasurer2">0000000000</char-measurer-item>
</char-measurer><span class="gbc_dataContentMeasure" aria-hidden="" __leaflayoutmeasureelement="">B - BENEFITS ONLY</span><i slot="trailing" class="tyler-icons" aria-hidden="true">arrow_drop_down</i>
</tcw-text-field><i class="zmdi toggle" title="Open list" __widgetbase="" __comboboxwidget="" __muniscomboboxwidget=""></i>
<char-measurer class="g_layout_charMeasurer" aria-hidden="true">
<char-measurer-item class="g_layout_charMeasurer1">
MMMMMMMMMM
M
M
M
M
M
M
M
M
M
</char-measurer-item><char-measurer-item class="g_layout_charMeasurer2">0000000000</char-measurer-item>
</char-measurer><div tabindex="0" class="gbc_dataContentPlaceholder mt-label gbc_WidgetBase gbc_LabelWidget gbc_MunisLabelWidget w_114 g_measureable tyl-typography--caption gbc_WidgetBase_standalone gbc_staticMeasure aui__1107" role="note" __widgetbase="" __labelwidget="" __munislabelwidget="" id="w_114" data-aui-id="1107">
<span class="gbc-label-text-container is-empty-label" __widgetbase="" __labelwidget="" __munislabelwidget=""></span>
<char-measurer class="g_layout_charMeasurer" aria-hidden="true">
<char-measurer-item class="g_layout_charMeasurer1">
MMMMMMMMMM
M
M
M
M
M
M
M
M
M
</char-measurer-item><char-measurer-item class="g_layout_charMeasurer2">0000000000</char-measurer-item>
</char-measurer><span class="gbc_dataContentMeasure" aria-hidden="" __leaflayoutmeasureelement=""></span>
</div><tcw-tooltip target="#w_111" text="Active Status from employee master" style="border: 0px; clip: rect(0px, 0px, 0px, 0px); height: 1px; margin: -1px; overflow: hidden; padding: 0px; position: absolute; width: 1px; outline: 0px; appearance: none;" id="tcw-tooltip-22">Active Status from employee master</tcw-tooltip>
</div>
</div>
It's not even showing all the options, just the first two. Has anybody come across HTML like this? Any ideas on how I can select A for the Status box? I feel like I'm not giving y'all enough to work with, but I'm not sure what else to provide...
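One possible direction, sketched only since the option list's markup isn't shown in the question: this is a custom combobox rather than a native select element, so Selenium's Select class won't help. You could click the toggle icon that is present in the pasted HTML (the <i class="zmdi toggle" title="Open list"> element inside the w_111 container) and then click the entry by its visible text once the list opens. The option text "A - ACTIVE" below is a guess patterned on the visible "B - BENEFITS ONLY" value.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)

# open the custom dropdown via its toggle icon (present in the pasted HTML)
wait.until(EC.element_to_be_clickable(
    (By.CSS_SELECTOR, "#w_111 i.zmdi.toggle"))).click()

# the option markup is not shown, so this locator and text are assumptions
wait.until(EC.element_to_be_clickable(
    (By.XPATH, "//*[normalize-space(text())='A - ACTIVE']"))).click()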

Parse div element from html with style attributes

I'm trying to get the text "Something here I want to get" inside the div element from an HTML file using Python and BeautifulSoup.
This is what part of the HTML looks like:
<div xmlns="" id="idp46819314579224" style="box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #d43f3a; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;" class="" onclick="toggleSection('idp46819314579224-container');" onmouseover="this.style.cursor='pointer'">Something here I want to get<div id="idp46819314579224-toggletext" style="float: right; text-align: center; width: 8px;">
-
</div>
</div>
And this is how I tried to do it:
vu = soup.find_all("div", {"style" : "background: #d43f3a"})
for div in vu:
    print(div.text)
I use a loop because there are several divs with different ids, but all of them have the same background colour. There are no errors, but I get no output.
How can I get the text using the background colour as the condition?
The style attribute has other content inside it:
style="box-sizing: ....; ....;"
Your current code is asking if style == "background: #d43f3a", which it is not.
What you can do instead is ask if "background: #d43f3a" in style -- a substring check.
One approach is passing a regular expression.
>>> import re
>>> vu = soup.find_all("div", style=re.compile("background: #d43f3a"))
>>> for div in vu:
...     print(div.text.strip())
Something here I want to get
You can also say the same thing using CSS selectors:
soup.select('div[style*="background: #d43f3a"]')
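For example, iterating over the selector's matches gives the same output as the regex version:

for div in soup.select('div[style*="background: #d43f3a"]'):
    print(div.text.strip())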
Or by passing a function/lambda
>>> vu = soup.find_all("div", style=lambda style: "background: #d43f3a" in style)
>>> for div in vu:
...     print(div.text.strip())
Something here I want to get
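One caveat with the lambda version: divs that have no style attribute at all will pass None to the function, which raises a TypeError on the in check, so a slightly more defensive variant guards against that:

vu = soup.find_all("div", style=lambda style: style is not None and "background: #d43f3a" in style)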

How to find an element in an <iframe> (no id, no class name) and interact with it?

I'm trying to locate the element at XPath '/html/body/div/div[2]' so I can automatically hold down a button using Selenium with Python; however, I'm having difficulty since it is inside an iframe.
I tried "expected_conditions as EC", but I'm not knowledgeable enough to make it work. Tell me what to change, or suggest a more sensible approach.
HTML:
<div id="px-captcha" role="main">
<iframe style="display: none; width: 310px; height: 100px; border: 0px; user-select: none;" token="951d7e81fd6fb5e2af2cb2c701dbb6c391ab81d4b983da5f2f2de85667241a43a3a814a87cae2e98c70b730f7eaaac0a04bbf77bbfc63735e436d1d07675cb68"></iframe>
<iframe style="display: block; width: 310px; height: 100px; border: 0; -moz-user-select: none; -khtml-user-select: none; -webkit-user-select: none; -ms-user-select: none; user-select: none;" token="951d7e81fd6fb5e2af2cb2c701dbb6c391ab81d4b983da5f2f2de85667241a43a3a814a87cae2e98c70b730f7eaaac0a04bbf77bbfc63735e436d1d07675cb68">
#document
<html lang="en-US"
<head>...</head>
<body>
<div id="kkBSsePnKDMVkwa" class="eIlUWbNLSMdFkEz">
<div id="#LrJbZYBfdAzlAkl"></div>
<div id="BlXIkuwFPcwvDCY" role="main" aria-label="Please press and hold the button until verified">...</div>
</div>
</body>
</html>
</iframe>
<iframe style="display: none; width: 310px; height: 100px; border: 0px; user-select: none;" token="951d7e81fd6fb5e2af2cb2c701dbb6c391ab81d4b983da5f2f2de85667241a43a3a814a87cae2e98c70b730f7eaaac0a04bbf77bbfc63735e436d1d07675cb68"></iframe>
<p style="color: red; margin-top: 4;">Please try again</p>
</div>
Code (updated; see the error noted inline):
def captcha(url):
    driver.get(str(url))
    time.sleep(10)
    try:
        captcha_element = driver.find_element_by_id('px-captcha')
        print(len(captcha_element.text), 'Captcha verification request page')
        print('Run pass captcha programing')
        # 2.1: Verify captcha
        # Research iframe containing captcha
        # # Example 2: Use pyautogui library
        # driver.set_window_position(0, 0)
        # driver.set_window_size(1024, 640)
        # sleep(randint(5,10))
        # pyautogui.moveTo(400, 438)
        # pyautogui.click()
        # pyautogui.dragTo(596, 438, 5, button='left')
        # Example 3:
        for i in range(10):
            try:
                wait = WebDriverWait(driver, 10)
                wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "(//div[@id='px-captcha']/iframe)[{i}]")))  # ERROR HERE. TO TRY REPLACEMENT <wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "(//div[@id='px-captcha']/iframe)[2]")))>
                print("- Found iframe")
                element = driver.find_element(By.XPATH, "//div[contains(@aria-label, 'Please press and hold the button until verified')]")
                print("- Found element")
                # click and hold 5 seconds to pass the captcha
                print("Button verify: ", len(element.text))
                action = ActionChains(driver)
                click = ActionChains(driver)
                frame_x = element.location['x']
                frame_y = element.location['y']
                print("x: ", frame_x)
                print("y: ", frame_y)
                print("size box: ", element.size)
                print("x max click: ", frame_x + element.size['width'])
                print("y max click: ", frame_y + element.size['height'])
                x_move = frame_x + element.size['width']/2
                y_move = frame_y + element.size['height']/2
                print("Click (x,y) = ", x_move, y_move)
                action.move_to_element_with_offset(element, x_move, y_move).click_and_hold().perform()
                time.sleep(10)
                action.release(element)
                action.perform()
                time.sleep(0.2)
                action.release(element)
                print('Verify successful')
                break
            except:
                print(f'- NOT Found xpath Num.: {i}')
                sleep(randint(5,10))
    except:
        # 2.2: Skip captcha
        print('Website does NOT require captcha verification')
        sleep(randint(2,3))
I wanted to find this element:
<div id="BlXIkuwFPcwvDCY" role="main" aria-label="Please press and hold the button until verified">...</div>
In order to interact with this web element:
<div id="BlXIkuwFPcwvDCY" role="main" aria-label="Please press and hold the button until verified">...</div>
you need to switch to this iframe first :
<iframe style="display: block; width: 310px; height: 100px; border: 0; -moz-user-select: none; -khtml-user-select: none; -webkit-user-select: none; -ms-user-select: none; user-select: none;" token="951d7e81fd6fb5e2af2cb2c701dbb6c391ab81d4b983da5f2f2de85667241a43a3a814a87cae2e98c70b730f7eaaac0a04bbf77bbfc63735e436d1d07675cb68">
Now, since you have mentioned that we are not able to find any unique identifier for it, I would probably use its parent div <div id="px-captcha" role="main">.
Something like this:
wait = WebDriverWait(driver, 10)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "(//div[@id='px-captcha']/iframe)[2]")))
and then you can interact with the desired web element:
driver.find_element(By.XPATH, "//div[contains(@aria-label, 'Please press and hold the button until verified')]").click()
You are going to need the below imports as well:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
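Since the end goal seems to be a press-and-hold rather than a plain click, one possible variant, sketched under the assumptions that the same locator as above works and that roughly a 10-second hold is enough, chains click_and_hold with a pause before releasing:

from selenium.webdriver.common.action_chains import ActionChains

button = driver.find_element(By.XPATH, "//div[contains(@aria-label, 'Please press and hold the button until verified')]")
# press, keep the button held for ~10 seconds, then release
ActionChains(driver).click_and_hold(button).pause(10).release().perform()

# switch back to the main document afterwards if you need the rest of the page
driver.switch_to.default_content()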

I have 4 nested div tags and when I print text using find_all, it prints the text 4 times

I am extracting text from an HTML file which contains a lot of div tags. However, in some places there are, say, 4 nested div tags, and when I print the text, it prints 4 times.
<div>
<div id="PGBRK" style="TEXT-INDENT: 0pt; WIDTH: 100%; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt">
<div id="PN" style="PAGE-BREAK-AFTER: always; WIDTH: 100%">
<div style="TEXT-ALIGN: center; WIDTH: 100%"><font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 10pt">27</font></div>
</div>
</div>
</div>
For example, here if I do:
for item in page_soup.find_all('div'):
    if "27" in item.text:
        print(item)
It prints the number 27 four times and therefore messes up the whole text.
How can I get my code to only print the nested text once?
EDIT 1:
This works well for this part of the code, but like I said, this is only true in some places. For example, when I do:
for item in page_soup.find_all('div', recursive=False):
    print(item)
It does not print anything. For reference, this is the document I am trying to scrape.
EDIT 2:
From the given html, I am trying to extract the section "ITEM 1A. RISK FACTORS".
should_print = False
for item in page_soup.find_all('div'):
    if "ITEM 1A." in item.text:
        should_print = True
    elif "ITEM 1B." in item.text:
        break
    if should_print:
        print(item)
So I am printing everything starting from ITEM 1A. until it finds ITEM 1B.
Here, in some places, there are nested div tags, which get printed multiple times with this piece of code.
If I use recursive=False, it does not print anything.
Here is one option
import bs4, re
html = '''<div>
<div id="PGBRK" style="TEXT-INDENT: 0pt; WIDTH: 100%; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt">
<div id="PN" style="PAGE-BREAK-AFTER: always; WIDTH: 100%">
<div style="TEXT-ALIGN: center; WIDTH: 100%"><font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 10pt">27</font></div>
</div>
</div>
</div>
</div>'''
soup = bs4.BeautifulSoup(html,'html.parser')
elements = soup.find_all(text=re.compile('27'))
print(elements)
output
[u'27']
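If you then need the enclosing div rather than the bare string, a small follow-up using find_parent can climb from the matched text node to its nearest div:

# take the text node found above and climb to its innermost enclosing <div>
text_node = elements[0]
inner_div = text_node.find_parent('div')
print(inner_div)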
To print everything starting from ITEM 1A. until it finds ITEM 1B, go through the .string attribute (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string):
import requests
from bs4 import BeautifulSoup

url = 'https://www.sec.gov/Archives/edgar/data/4904/000000490412000013/ye11aep10k.htm'
html_doc = requests.get(url).content
page_soup = BeautifulSoup(html_doc, 'html.parser')

do_print = False
for el in page_soup.find_all('div'):
    if el.string:
        if "ITEM 1A" in el.string:
            do_print = True
        elif "ITEM 1B" in el.string:
            break
    if do_print:
        print(el)
The output (I'll show representative start and end blocks without the middle part, to keep the dump short):
<div align="justify" style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt"><font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 12pt; FONT-WEIGHT: bold"><font style="DISPLAY: inline; TEXT-DECORATION: underline">ITEM 1A.   RISK FACTORS</font></font></div>
<div style="TEXT-INDENT: 0pt; DISPLAY: block"><br/>
</div>
<div align="justify" style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt"><font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 12pt; FONT-WEIGHT: bold">GENERAL RISKS OF OUR REGULATED OPERATIONS</font></div>
<div style="TEXT-INDENT: 0pt; DISPLAY: block">
<div align="justify" style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt"><font style="FONT-STYLE: italic; DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 12pt; FONT-WEIGHT: bold"> </font></div>
<div align="justify" style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt"><font style="FONT-STYLE: italic; DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 12pt; FONT-WEIGHT: bold">The regulatory environment in Ohio has recently become unpredictable and increasingly uncertain. – Affecting AEP and OPCo</font></div>
<div style="TEXT-INDENT: 0pt; DISPLAY: block"><br/>
.....
<div style="TEXT-ALIGN: center; WIDTH: 100%"><font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 10pt">37</font></div>
<div style="TEXT-ALIGN: center; WIDTH: 100%">
<hr noshade="" size="2" style="COLOR: black"/>
</div>
<div id="HDR">
<div align="right" id="GLHDR" style="WIDTH: 100%"><font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 8pt">  </font></div>
</div>
<div align="right" id="GLHDR" style="WIDTH: 100%"><font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 8pt">  </font></div>
<div style="TEXT-INDENT: 0pt; DISPLAY: block"> </div>
You can provide the option text="27" to search the divs by text and identify only that exact div. The code below should work fine. If you want to get all the divs, just remove text="27" or replace it with whatever text you want to find. You can also use recursive=False to get only the top-level divs.
Edit 1:
from bs4 import BeautifulSoup
t = '''
<div>
27
</div>
<div>
<div id="PGBRK" style="TEXT-INDENT: 0pt; WIDTH: 100%; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt">
<div id="PN" style="PAGE-BREAK-AFTER: always; WIDTH: 100%">
<div style="TEXT-ALIGN: center; WIDTH: 100%"><font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 10pt">27</font></div>
</div>
</div>
</div>
</div>
'''
page_soup = BeautifulSoup(t, 'html.parser')
for item in page_soup.find_all('div', text="27"):
    print(item.text)
Edit 2:
I have added code that works for your problem specifically. Try the code below. The div range you are expecting is from 567 to 715, with page numbers removed.
import requests
from bs4 import BeautifulSoup
resp = requests.get(
    r'https://www.sec.gov/Archives/edgar/data/4904/000000490412000013/ye11aep10k.htm')
t = resp.text
page_soup = BeautifulSoup(t, 'html.parser')
s = 'body > div:not(#PGBRK)'
for i in page_soup.select(s)[567:715]:
    print(i.get_text(strip=True))
Well, I think that is a cool question, and I don't see a simple answer if you want to generalize it to find out what text there is at each level without resorting to searching for a specific number like 27. Beautiful Soup doesn't seem to have a function for showing only the text at the top level, and recursive=False simply prevents the search from delving below the first level; it will still include everything below the first level as contents, so if a div is at the top level, it will capture it and everything below it.
So I think you'd actually have to recurse down the tree of divs and compare the text at each level. I figured this out below. It prints in reverse order as it bubbles up from the recursion, but the results could be stored in a list and output in forward order.
from bs4 import BeautifulSoup
soup = BeautifulSoup('<div>1A<div>2A</div>1B<div>2B<div>3A</div><div>3A</div>2C</div>1C</div>', 'html.parser')
def mangle(node):
    divs = node.find_all('div')
    if len(divs):
        result = [divs[0]] + [n for n in divs[0].next_siblings if n.__class__.__name__ == 'Tag']
        txt = []
        for r in result:
            txt.append(r.__repr__())
            for c in mangle(r):
                txt[-1] = txt[-1].replace(c.__repr__(), '')
        print(''.join(BeautifulSoup(t, 'html.parser').text for t in txt))
        return result
    else:
        return []

if __name__ == '__main__':
    mangle(soup)
Basically, it walks down the branches of divs and builds lists at each fork of the tree, including the tags; then the caller removes anything found below it, leaving just the text that is defined at that level. I keep the tags in place so that text patterns appearing at multiple levels don't get removed by mistake.
Output from the HTML 1A2A1B2B3A3A2C1C was:
3A3A
2A2B2C
1A1B1C
which is the 3rd, 2nd and 1st nesting levels respectively. Hope this helps.
I will answer my own question since I finally got it to work.
The solution was easy; I was just overthinking it.
I just added the condition that the parent of the item should not be a div. Now the program does not print the text multiple times.
should_print = False
for item in page_soup.find_all('div'):
    if item.name == "div" and item.parent.name != "div":
        if "ITEM 1A." in item.text:
            should_print = True
        elif "ITEM 1B." in item.text:
            break
        if should_print:
            print(item)
Thank you everyone for your contributions. Appreciated...

Extracting parent and child information

Using Python and beautifulsoup, I need help extracting information from a parent div and a child div at the same time.
Here is the first example code:
<div id="slide-609becd056bb40a7ad42607a4d1c67f5"
class="slide has-link slick-slide"
data-label="April 2 2018 Acura TLX Offer 2000x700.jpg"
data-link="/new-inventory/index.htm?model=TLX&year=2018" data-target="_self"
style="background-image: url("https://pictures.dealer.com/a/adw/0877/5eabcb338dc604c09b28a4df5a49ad78x.jpg?impolicy=resize&h=514");
width: 1897px; position: relative; left: 0px; top: 0px; z-index: 998; opacity: 0; height: 514px; transition: opacity 750ms ease;" data-slick-index="0" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide00">
Here is example code 2:
<div id="slide-7ae8b29ddc9e45d1a219beffe5793b2b"
class="html-slide slide slick-slide"
data-label="March-Madness.jpg" data-link="" data-target=""
data-promo-id="" data-slick-index="2" aria-hidden="true" tabindex="-1" role="option"
aria-describedby="slick-slide02"
style="width: 1897px; position: relative; left: -3794px; top: 0px; z-index: 998; opacity: 0; height: 514px; transition: opacity 750ms ease;">
<div class="slide-background"
style="background-image: linear-gradient(rgba(0, 0, 0, 0), rgba(0, 0, 0, 0)), url("https://pictures.dealer.com/g/goodsonacuraofdallasadw/1747/13ed067a023df8ad412feea2c6eddec9x.jpg?impolicy=resize&h=514"); height: 514px;">
<img src="https://pictures.dealer.com/g/goodsonacuraofdallasadw/1747/13ed067a023df8ad412feea2c6eddec9x.jpg?impolicy=resize&h=514" class="placeholder-image pull-left"> </div>
I need to get the style attribute from both examples so I can get the background image URL. The issue is that the first example has the style on the parent div, while the second example has the style on the child div. How do I get those two style attributes at the same time using Python and BeautifulSoup?
Here is the code I have tried:
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.goodsonacura.com/'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
banner_info = page_soup.findAll('div',{'class':['slide has-link', 'html-slide slide has-link']})
picture = [banner.get('style') for banner in banner_info]
This code gives me the correct style element for the first example code, but it gives me the wrong style element for the second example code.
Add "slide-background" class in the find_all query. See the example below:-
banner_info = page_soup.find_all('div',{'class':['slide has-link', 'html-slide slide has-link', 'slide-background']})
It works for me. May this helps you.
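If you would rather end up with one background URL per slide instead of a flat list that mixes parent and child matches, another option, a sketch based on the class names shown in the question's HTML, is to walk each slide and fall back to its slide-background child only when the slide itself carries no background-image:

import re

slides = page_soup.find_all('div', class_='slide')
urls = []
for slide in slides:
    style = slide.get('style') or ''
    if 'background-image' not in style:
        # the second example keeps the style on the child .slide-background div
        child = slide.find('div', class_='slide-background')
        if child:
            style = child.get('style') or ''
    # pull the url(...) value out of the inline style, if there is one
    match = re.search(r'url\("?([^")]+)"?\)', style)
    if match:
        urls.append(match.group(1))
print(urls)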
