I am trying to print the value of a span every time it changes. To print the value of the span is quite easy:
popup = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="spot"]')))
print(popup.text)
This prints the value at that moment. The problem is that the value changes every 2 seconds. I tried using:
# wait for the first popup to appear
popup = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="spot"]')))
# print the text
print(popup.text)
# wait for the first popup to disappear
wait.until(EC.staleness_of(popup))
# wait for the second popup to appear
popup = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="spot"]')))
# print the text
print(popup.text)
# wait for the second popup to disappear
wait.until(EC.staleness_of(popup))
No matter how long my wait value is (10, 20 or even 30 seconds), the process always times out. I do not know much about coding, but I think this method does not work because the span as a whole does not change, only the span's value (its text). One method that I tried was to loop the print call, and it partially worked: it printed the same value 489 times until it changed, and then printed the other one 489 times again. I have since tried this code:
popup = wait.until(EC.text_to_be_present_in_element_value((By.XPATH, '//*[@id="spot"]')))
print(popup.text)
but it returns:
TypeError: __init__() missing 1 required positional argument: 'text_'
Please help: what do I need to add, or which method do I need to use, to get the changing value?
HTML code inspection
Please note: I'm not trying to print the text of the span once; I already know how to do that. I want to print it every time it changes.
Assuming that the element does disappear and reappear again:
You can just go back and forth between waiting for the element to be located and waiting for it to go stale.
Assuming that the element's content changes, but the element doesn't disappear:
I don't know of any explicit way to wait for a change in the content of an element, so as far as I know you need to compare the values yourself. You might want to add a fixed wait of less than 2 seconds to limit the number of unnecessary comparisons you make.
import time
# Init a list to contain the values later on
values = []
# Wait for the element to be loaded in the first place
popup = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="spot"]')))
values.append(popup.text)
while True:
    # possibly wait here; the value changes roughly every 2 seconds
    time.sleep(1)
    # re-locate the element and read its current text (not the element itself)
    new_value = driver.find_element(By.XPATH, '//*[@id="spot"]').text
    # look up if the value has changed based on the values you know and add the new value
    if values[-1] != new_value:
        values.append(new_value)
        print(new_value)
    # add an exit condition unless you actually want to do it forever
Please be aware: This will only work if the value actually changes each and every time or if you don't need duplicates that follow one another.
If you need every value, you can leave out the comparison and add one value every ca. 2 seconds.
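The comparison idea above can also be wrapped as a custom wait condition, so that WebDriverWait does the polling. This is only a sketch: the class name text_changed_from is hypothetical (it is not part of Selenium), and it relies on the fact that WebDriverWait.until accepts any callable that takes the driver and returns a truthy value once the wait is over. Note that an empty new text would count as "not changed yet" here.

```python
class text_changed_from:
    """Hypothetical custom wait condition (not part of Selenium).

    Truthy once the located element's text differs from `old_text`.
    """

    def __init__(self, locator, old_text):
        self.locator = locator      # e.g. ("xpath", '//*[@id="spot"]')
        self.old_text = old_text

    def __call__(self, driver):
        text = driver.find_element(*self.locator).text
        # Return the new text when it changed, otherwise False so the wait keeps polling.
        return text if text != self.old_text else False


# Usage sketch, assuming `driver` and `wait = WebDriverWait(driver, 10)` already exist:
#
# locator = ("xpath", '//*[@id="spot"]')
# last = driver.find_element(*locator).text
# while True:
#     last = wait.until(text_changed_from(locator, last))
#     print(last)
```

Returning the new text instead of True lets the loop reuse it as the next baseline without a second lookup.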
For your example:
The page on binary.com you provided uses a WebSocket to refresh the content. This is a protocol that allows the server to send data to the client and the other way around.
So it's a different approach from the HTTP protocol you are used to (you send a request, the server replies: say you ask for the webpage, then the server just sends it).
This protocol opens a connection and keeps it alive. There will hardly be a way to anticipate this change. But: in your browser (assuming Chrome here) you can open the developer tools, go to the "Network" tab and filter for WS (WebSocket). You'll see a connection named v3?app_id=1 (you might need to refresh the page to get output in the Network tab).
Click on that connection and you'll see the messages your client sent and the ones you received. Naturally you only need the received ones, so filter for those.
As those are quite a few steps, have a look at the screenshot, which shows the correct settings:
Every message is in JSON format and you can click on it to see its content. Under "tick" you'll see the ask and bid data.
In case that suffices, you can just leave the page open for as long as you need, then copy the output, save it as a file and read it with python for analysis.
It seems you can also automate this with Selenium, as demonstrated here:
http://www.amitrawat.tech/post/capturing-websocket-messages-using-selenium/
Basically they do the same thing: they set the capability to record the log, then filter through it to get the data they need. Note that they use Java to do so, but it won't be hard to translate to Python.
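As a rough Python counterpart to that log-filtering idea, here is a sketch that pulls received WebSocket payloads out of Chrome performance-log entries. It assumes the performance log was enabled through the goog:loggingPrefs capability ({"performance": "ALL"}) and read with driver.get_log("performance"). The function name received_ticks is hypothetical, and the "tick" key follows the message layout described above; the exact fields inside it are not verified here.

```python
import json


def received_ticks(log_entries):
    """Return the "tick" objects from received WebSocket frames.

    `log_entries` is a list of dicts shaped like the entries returned by
    driver.get_log("performance"): each has a "message" key holding a JSON string.
    """
    ticks = []
    for entry in log_entries:
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.webSocketFrameReceived":
            continue  # skip sent frames and unrelated network events
        payload = event["params"]["response"]["payloadData"]
        try:
            data = json.loads(payload)
        except ValueError:
            continue  # payload is not JSON
        if isinstance(data, dict) and "tick" in data:
            ticks.append(data["tick"])
    return ticks


# Usage sketch, assuming a Chrome `driver` started with performance logging enabled:
# for tick in received_ticks(driver.get_log("performance")):
#     print(tick)
```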
I want to fill out some personal data on a website. The first input element can be accessed by find_element_by_id but the id of the next text field has a different id every time I access the website. In a Browser, I can simply press the TAB key two times to get to the desired text input field. Is there a way to achieve similar behavior with selenium?
I tried the following:
input1 = browser.find_element_by_id('id_email')
input1.send_keys("email@something.com")
input2 = browser.send_keys(Keys.TAB).send_keys(Keys.TAB)
input2.send_keys("Something else")
But Line 3 gives me:
AttributeError: 'WebDriver' object has no attribute 'send_keys'
You cannot send keys against the browser object.
This line:
input2 = browser.send_keys(Keys.TAB).send_keys(Keys.TAB)
is not valid. The error even tells you as much: 'WebDriver' object has no attribute 'send_keys'
That says your WebDriver object (which you called "browser") does not have an attribute (a method/function) called send_keys.
Top tip to avoid this sort of problem is to use a good IDE with intellisense. That will tell you the methods you can use.
In vscode, you get this:
As you type, it tells you valid commands - and you can see the driver has no send keys!
What you need to do is use .send_keys(...) against a web element.
In your code you already have input1 - that is a web element. You can send keys against that.
Something like this:
input1 = browser.find_element_by_id('id_email')
input1.send_keys("email@something.com")
input1.send_keys(Keys.TAB)
If you want to do multiple tabs from the same object, you can just add multiple. This will tab 3 times
.send_keys(Keys.TAB + Keys.TAB + Keys.TAB)
When I run a sample script on google.com and tab 3 times, I land on the Google Search button (the first tab is the clear button, the second tab is the microphone, the third tab goes to the button):
Finally, using tabs to navigate is a LAST RESORT. They can be flaky and inconsistent. You REALLY should get an identifier for your object.
If you can share your URL or the page DOM, then I can help you identify a working identifier. I know you say there is no stable ID, but there are many ways to access objects.
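If tabbing really is the only option, one sketch is to send the TAB keys to the element that currently has focus and then type into whatever element receives focus afterwards, which Selenium exposes as driver.switch_to.active_element. The helper name type_after_tabs is hypothetical; "\ue004" is the value of Keys.TAB, written out here so the snippet needs no imports.

```python
TAB = "\ue004"  # same value as selenium.webdriver.common.keys.Keys.TAB


def type_after_tabs(driver, start_element, text, tabs=2):
    """Press TAB `tabs` times from `start_element`, then type into the focused element."""
    start_element.send_keys(TAB * tabs)
    focused = driver.switch_to.active_element  # element that now has keyboard focus
    focused.send_keys(text)
    return focused


# Usage sketch with the names from the question:
# input1 = browser.find_element_by_id('id_email')
# input1.send_keys("email@something.com")
# type_after_tabs(browser, input1, "Something else", tabs=2)
```

This still inherits the flakiness of tab navigation, so a stable locator remains the better choice when one exists.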
Welcome!
I can only guess, since I'm not able to test the code. But it seems you are not reaching the right element with the two tabs, because tabbing starts from the top of the HTML page and not from the input1 field. You'd be better off locating the password field by its id or its parent's id rather than with Keys.TAB.
I want to scrape data from an HTML table for different combinations of drop down values via looping over those combinations. After a combination is chosen, the changes need to be submitted. This is, however, causing an error since it refreshes the page.
This is what I've done so far:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time
browser.get('https://daten.ktbl.de/feldarbeit/entry.html')
# Selecting the constant values of some of the drop downs:
fertilizer = Select(browser.find_element_by_name("hgId"))
fertilizer.select_by_value("2")
fertilizer = Select(browser.find_element_by_name("gId"))
fertilizer.select_by_value("193")
fertilizer = Select(browser.find_element_by_name("avId"))
fertilizer.select_by_value("383")
fertilizer = Select(browser.find_element_by_name("hofID"))
fertilizer.select_by_value("2")
# Looping over different combinations of plot size and amount of fertilizer:
size = Select(browser.find_element_by_name("flaecheID"))
for size_values in size.options:
    size.select_by_value(size_values.get_attribute("value"))
    time.sleep(1)
    amount = Select(browser.find_element_by_name("mengeID"))
    for amount_values in amount.options:
        amount.select_by_value(amount_values.get_attribute("value"))
        time.sleep(1)
        # Refreshing the page after the two variable values are chosen:
        button = browser.find_element_by_xpath("//*[@type='submit']")
        button.click()
        time.sleep(5)
This leads to the error:
selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of <option> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed.
Obviously the issue is that I did indeed refresh the document.
After submitting the changes, and once the page has loaded the results, I want to retrieve them with:
html_source = browser.page_source
df_list = pd.read_html(html_source, match = "Dieselbedarf")
(Shout-out to @bink1time who answered this part of my question here).
How can I update the page without breaking the loop?
I would very much appreciate some help here!
Stale Element Reference Exception often occurs upon page refresh because of an element UUID change in the DOM.
In order to avoid it, always try to search for an element before an interaction. In your particular case, you searched for size and amount, found them and stored them in variables. But then, upon refresh, their UUID changed, so old ones that you have stored are no longer attached to the DOM. When trying to interact with them, Selenium cannot find them in the DOM and throws this exception.
I modified your code to always re-search size and amount elements before the interaction:
# Looping over different combinations of plot size and amount of fertilizer:
size = Select(browser.find_element_by_name("flaecheID"))
for i in range(len(size.options)):
    # Search and save new select element
    size = Select(browser.find_element_by_name("flaecheID"))
    size.select_by_value(size.options[i].get_attribute("value"))
    time.sleep(1)
    amount = Select(browser.find_element_by_name("mengeID"))
    for j in range(len(amount.options)):
        # Search and save new select element
        amount = Select(browser.find_element_by_name("mengeID"))
        amount.select_by_value(amount.options[j].get_attribute("value"))
        time.sleep(1)
        # Refreshing the page after the two variable values are chosen:
        button = browser.find_element_by_xpath("//*[@type='submit']")
        button.click()
        time.sleep(5)
Try this? It worked for me. I hope it helps.
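A related sketch: plain strings never go stale, so the option values can be read once up front and the loop can then iterate over those strings, re-locating the select elements after each submit. The helpers option_values and combinations are hypothetical names, and the sketch assumes the list of amount options does not depend on the chosen size; if it does, the amount values have to be re-read after each size selection, as in the code above.

```python
from itertools import product


def option_values(select):
    """Copy the value attribute of every <option> into a plain list of strings."""
    return [option.get_attribute("value") for option in select.options]


def combinations(size_select, amount_select):
    """All (size, amount) value pairs, independent of later page refreshes."""
    return list(product(option_values(size_select), option_values(amount_select)))


# Usage sketch, assuming `browser` and `Select` as in the code above:
# pairs = combinations(Select(browser.find_element_by_name("flaecheID")),
#                      Select(browser.find_element_by_name("mengeID")))
# for size_value, amount_value in pairs:
#     Select(browser.find_element_by_name("flaecheID")).select_by_value(size_value)
#     Select(browser.find_element_by_name("mengeID")).select_by_value(amount_value)
#     browser.find_element_by_xpath("//*[@type='submit']").click()
```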
I am trying to validate that a value changes to the correct text and, if it does not, to refresh the page and check again for up to a set time.
I have tried while loops, if statements and nested variations of both with no success. I am not even sure how to format it at this point.
element = driver.find_element_by_xpath('xpath')
while True:
    if element contains textA
        break
    else if element contains textB
        driver.refresh()
    else
        error
Something along those lines. Ignore any syntax errors, I am just trying to get the idea across
I have also tried using EC and By with no luck
Edit: Adding some details
So what I have is a table. I am inserting a new row with no problems. Then I need to check that one of the column values of the new row gets updated from 'new' to 'old', which usually takes anywhere from 30 seconds to 2 minutes. This is all viewable from a web UI. I need to refresh the page in order to see the value change. I wish I had some more detailed code or an error to post along with it, but honestly I am just beginning to learn Selenium.
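Can you please try the following:
from selenium.common.exceptions import NoSuchElementException
while True:
    try:
        # 'xpath' should match the element only when it contains the expected text
        driver.find_element_by_xpath('xpath')
    except NoSuchElementException:
        # not there yet: reload the page and try again
        driver.refresh()
    else:
        print("Text found")
        break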
Note: I suggest creating a text-based XPath, which avoids an extra line of code to get and compare the text.
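Since the question asks to keep checking only "for up to a set time", here is a sketch of the same refresh loop with a deadline. The helper name refresh_until_text and its parameters are hypothetical; read_text is any function that takes the driver and returns the current text of the cell.

```python
import time


def refresh_until_text(driver, read_text, expected, timeout=120, poll=5):
    """Refresh the page until read_text(driver) equals `expected` or `timeout` seconds pass.

    Returns True if the expected text was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_text(driver) == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)   # pause between reloads
        driver.refresh()


# Usage sketch; 'xpath' is the placeholder from the question:
# ok = refresh_until_text(
#     driver,
#     lambda d: d.find_element_by_xpath('xpath').text,
#     "old",
#     timeout=120,
# )
```

The text is checked once before the first refresh, so a value that is already correct returns immediately.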
(Selenium/webscraping noob warning.)
selenium 3.141.0
chromedriver 78
MacOS 10.14.6
I'm compiling a list of URLs across a range of dates for later download. The URLs are in a table that displays information for the date selected on a nearby calendar. When the user clicks a new date on the calendar, the table is updated asynchronously with a new list of URLs or – if no files exist for that date – with a message inside a <td class="dataTables_empty"> tag.
For each date in the desired range, my code clicks the calendar, using WebDriverWait with a custom expectation to track when the first href value in the table changes (indicating the table has finished updating), and scrapes the URLs for that day. If no files are available for a given date, the code looks for the dataTables_empty tag to go away to indicate the next date's URLs have loaded.
if current_first_uri != NO_ATT_DATA:
    element = WebDriverWait(browser, 10).until_not(
        text_to_be_present_in_href((
            By.XPATH, first_uri_in_att_xpath),
            current_first_uri))
else:
    element = WebDriverWait(browser, 10).until_not(
        EC.presence_of_element_located((
            By.CLASS_NAME, "dataTables_empty")))
This works great in all my use cases but one: if two or more consecutive days have no data, the code doesn't notice the table has refreshed, since the dataTables_empty class remains in the table (and the cell is identical in every other respect).
In the Chrome inspector, when I click from one date without data to another, the corresponding <td> flashes pink. That suggests the values are being updated, even though their values remain the same.
Questions:
Is there a mechanism in Selenium to detect that the value was refreshed, even if it hasn't changed?
If not, any creative ideas on how to determine the table has refreshed in the problem use case? I don't want to wait blindly for some arbitrary length of time.
UPDATE: The accepted answer answered the latter of the two questions, and I was able to replace my entire detection scheme using the MutationObserver.
You could use a MutationObserver:
driver.execute_script("""
new MutationObserver(() => {
window.lastRefresh = new Date()
}).observe(document.querySelector('table.my-table'), { attributes: true, childList: true, subtree: true } )
""")
And get the last time the table dom changed with:
lastRefresh = driver.execute_script("return window.lastRefresh")
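To turn the observer into a wait, one option is to count mutations instead of storing a date, and to poll that counter from Python. This is only a sketch: the global window.__mutationCount, the helper names, and the table.my-table selector are placeholders, not something the page or Selenium provides.

```python
import time

INSTALL_OBSERVER_JS = """
window.__mutationCount = 0;
new MutationObserver(() => { window.__mutationCount += 1; })
  .observe(document.querySelector(arguments[0]),
           { attributes: true, childList: true, subtree: true });
"""


def install_observer(driver, css_selector):
    """Start counting DOM mutations under the element matching `css_selector`."""
    driver.execute_script(INSTALL_OBSERVER_JS, css_selector)


def mutation_count(driver):
    """Read the current counter (0 if the observer has not fired yet)."""
    return driver.execute_script("return window.__mutationCount || 0")


def wait_for_mutation(driver, baseline, timeout=10, poll=0.2):
    """Return True once the counter exceeds `baseline`, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if mutation_count(driver) > baseline:
            return True
        time.sleep(poll)
    return False


# Usage sketch:
# install_observer(driver, "table.my-table")
# before = mutation_count(driver)
# ... click the next calendar date ...
# refreshed = wait_for_mutation(driver, before)
```

A counter sidesteps the question of how a JavaScript Date is serialized on its way back to Python.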
I use the method below to check whether an element has gone stale. Usually I expect False.
The same may help in your case, where you expect True.
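from selenium.common.exceptions import StaleElementReferenceException
def isElementStale(driver, element):
    """Return True if the element reference has gone stale, else False."""
    # driver is unused here; it is kept so the call below stays the same
    try:
        # any call on a stale reference raises StaleElementReferenceException
        element.is_enabled()
        return False
    except StaleElementReferenceException:
        return True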
So you can pass the element to this method and check whether any change has occurred to it, like this:
# element = Get First element
# Make changes that causes the refresh
if isElementStale(driver, element):
    print('Element refreshed')
else:
    print('Element Not refreshed')
I wanted to get some experience with HTML crawling, so I wanted to see if I could grab some values from the following site: http://www.iex.nl/Aandeel-Koers/11890/Royal-Imtech/koers.aspx
This site shows the price of Imtech shares.
If you take a look at the site, you see there is one number shown in bold; this is the price of the share.
As you may have seen, this price changes, and that's okay. I only want the value at the time I run my script.
But if you reload the page, you may notice that it first shows "laatste koers" and after a delay of 1 second it shows "realtime".
As you may have figured out by now, I'm interested in the "realtime" value.
Here is my question: how do I get this value? I've tried time.sleep(2) in different places. I've tried a timeout on the request. Neither worked.
How can I fix this?
from lxml import html
import requests
pagina = 'http://www.iex.nl/Aandeel-Koers/11890/Royal-Imtech/koers.aspx'
page = requests.get(pagina)
tree = html.fromstring(page.text)
koers = tree.xpath('//span[@class="RealtimeLabel"]/text()')
prices = tree.xpath('//span[@id="ctl00_ctl00_Content_LeftContent_PriceDetails_lblLastPrice"]/text()')
print(koers[0], pagina.split("/")[5], prices[0])
I get output like this
Laatste koers Royal-Imtech 0,093
While I want output like this
Realtime Royal-Imtech 0,093
I would suggest waiting until the element changes.
The block of code below should help you.
import time
def wait_while(condition, timeout, delta=1):
    """
    @condition: lambda function which checks if the text contains "REALTIME"
    @timeout: Max waiting time
    @delta: time after which another check has to be made
    """
    max_time = time.time() + timeout
    while max_time > time.time():
        if condition():
            return True
        time.sleep(delta)
    return False