Selenium Get An Element Quickly - python

Currently I'm making a python bot for whatsapp manually without APIs or that sort because I am clueless. As such, I'm using Selenium to take in messages and auto reply. Currently, I'm noticing that every few messages, one message doesn't get picked up because the loops ran are too slow and my computer is already pretty fast. Here's the code:
def incoming_msges():
msges = driver.find_elements_by_class_name("message-in")
msgq = []
tq = []
try:
for msg in msges:
txt_msg = msg.find_elements_by_class_name("copyable-text")
time = msg.find_elements_by_class_name("_18lLQ")
for t in time:
tq.append(t.text.lower())
for txt in txt_msg:
msgq.append(txt.text.lower())
msgq = msgq[-1]
tq = tq[-1]
if len(msgq) > 0:
return (msgq, tq)
except StaleElementReferenceException:
pass
return False
Previously, I didn't add the time check thing, and the message sent would be saved, with this loop continuously running such that even if the other party sent the same thing again, the code would not recognise it as a new message because it thinks it's the same one as before. So now, the problem is that my code is super time consuming and I have no idea how to speed it up. I tried doing this:
def incoming_msges():
msges = browser.find_elements_by_class_name("message-in")
try:
msg = msges[-1]
txt_msg = msg.find_element_by_xpath("/span[#class=\"copyable-text\"]").text.lower()
time = msg.find_element_by_xpath("/span[#class=\"_18lLQ\"]").text.lower()
return (txt_msg, time)
except Exception:
pass
return False
However, like this, the code just doesn't find any messages. I have gotten the elements' types and classes correct according to the whatsapp web website but it just doesn't run. What's the correct way of rewriting my first code block as it is still correct? Thanks in advance.

First thing first ...
I definitely recommend using API ... Because what you are trying to do here is to reinvent the wheel. API has the power of telling you if there is a change in your status and you can queue these changes ... So I definitely recommend to use API ... It might be hard at the beginning, but trust me, its worth it ...
Next I would recommend you to use normal variable names. msges msgq tq (these are kindof unreadable and I still dont get what they are supposed to be after reading the code twice ...)
But to your speed problem ... "try - catch (aka except)" blocks are really heavy on a performance ... I would recommend to use safe programming if possible (20 if statements might be faster, but might not a same time) ... Also I think you are kind of unaware of a python language (atleast from what i can see here)
msgq = msgq[-1] # you are telling it to take the last element and change array variable to string .. to be more specific...
msgq ([1,2,3,4]) = msgq[-1] (4) will result to -> msgq = 4 (which in my option hits you performance as well)
tq = tq[-1] # same here
This would be better :)
if len(msgq[-1]) > 0:
return (msgq[-1], tq[-1])
If I understand your code correctly, you are trying to scrape the messages, but if its like you are saying that you want to make auto-reply bot, I would recommend you to eighter get ready for some JS magic or switch tool. I personally noticed that the selenium has a problem with dynamic content ... to be more specific ... once its at the end of the file it does not scrape it again ... so if you do not want to auto refresh every 5-10 seconds to get the latest HTML file, I recommend eighter to create this bot in JS (that will trigger everytime that an element changes) or use the API and use selenium just for responses. I was told that Selenium was created to simulate the common user to check if user interface works as it should (if buttons exists, if the website contains all what it should etc.) ... I think that selenium is for this job something like a flower small sponge for a car clean ... you can do it ... buts gonna cost you alot of time and you might miss some spots (like you missed those messages) ...
Lastly ... the work with strings in general is really costly. you are doing O(n^2) of operations in a try block ... which i can imagine can be really costly ... if its possible, I would reduce the number of inner for loops.
I wish you good luck in this project and I hope you find the answer you seek, while I hope my answer was at least a little helpful.

Related

Get the duration of a URL-based Media object with python-vlc - Cannot parse

I'm trying to use the python 2.7 python-vlc to parse then get the duration of a music track from a URL. Parsing doesn't work and playing then pausing the media returns -1 for the duration occasionally.
There are two ways I know of to parse media, which has to be done before using media.get_duration(). I can parse it, or I can play it.
No matter what, I cannot parse the media. Using parse_with_options() gives me parsed status MediaParsedStatus.skipped for everything except for parse_with_option(1,0)which gives me parsed status MediaParsedStatus.FIXME_(0L)
p = vlc.MediaPlayer(songurl)
media = p.get_media()
media.parse_with_options(1, 0)
print media.get_parsed_status()
print media.get_duration()
The string "songurl" is the actual streaming URL of a song from Youtube or Google Play Music, which works perfectly fine with the MediaPlayer.
I have also tried playing the media for short 0.01 to 0.5 second periods then attempting to get the time, which works MOST OF THE TIME but randomly returns a duration of -1 about 1 in 10 times. Using media.get_duration() again returns the same result.
I would prefer to just parse the song rather than worry about playing it, but I can't figure out any way to parse it.
I already submitted a bug report to the python-vlc github since I figure MediaParsedStatus.FIXME_(0L) is some sort of bug.
UPDATE: I GOT IT! This was possibly the biggest pain in all my programming career (which isnt much). Here's the code used to get the time for a URL track:
instance = vlc.Instance()
media = instance.media_new(songurl)
player = instance.media_player_new()
player.set_media(media)
#Start the parser
media.parse_with_options(1,0)
while True:
if str(media.get_parsed_status()) == 'MediaParsedStatus.done':
break #Might be a good idea to add a failsafe in here.
print media.get_duration()
media.parse_with_options is asynchronous. So your code isn't waiting for a response from URL, it's just immediately moving on. As with all asynchronous methods, you need to receive a notification that the data has been received and then you can move on. In this case it looks like it is the MediaParsedChanged event.
https://www.videolan.org/developers/vlc/doc/doxygen/html/group__libvlc__media.html#ga55f5a33e22aa32e17a9bb75decd1497b
Alternatively, you should be able to use the parse() method which is synchronous and will block until the meta data is received. This isn't recommended (and it's deprecated) because it could block indefinitely and lock up. But it is an option depending on what you are using the code for.
https://www.videolan.org/developers/vlc/doc/doxygen/html/group__libvlc__media.html#ga4b71084fb35b3dd8cc6457a4d27baf0c
EDIT:
If you need an example of using the event manager with the python bindings, here is a great example:
VLC Python EventManager callback type?
Particularly, look at Rolf's answer as the way he is using it might be a good starting point for you.
import vlc
parseReady = 0
def ParseReceived(event):
global parseReady
#set a flag that your data is ready
parseReady = 1
...
events = player.event_manager()
events.event_attach(vlc.EventType.MediaParsedChanged, ParseReceived)
...
parseReady = 0
media.parse_with_options(1, 0)
while parseReady == 0:
#TODO: spin something to waste time
#Once the flag is set, your data is ready
print media.get_parsed_status()
print media.get_duration()
There are undoubtedly better ways to do it but that's a basic example. Note, according to the documentation, you can not call vlc methods from within an event callback. Thus the use of a simple flag rather that calling the media methods directly in the callback.
libvlc will not parse network resources by default. You need to call parse with options with libvlc_media_parse_network.

Endless while loop and old removed code (popup)... well "popping up."

I'm struggling to troubleshoot a strange problem I've been having since starting to use Sikuli over multiple projects. I've been using the IDE and later tried to branch out due to having strange things happening with code. If I were to debug code earlier with popups, I can save the code, even restart my pc, even check the code in other text editors but the now non-existent popups (and old code) sometimes, well, pop up. In the end normally I end up ditching original files, and having to sometimes strangely comment out lines and re-add them one at a time (even though in the grand scale of things the end script was the same as before i did all that). I'm at a real loss for words.
It's making me struggle to differentiate between bad code and something going wrong elsewhere. Does anyone know what might cause this "phantom code"? Because I'm really at a loss.
And i would like advice as to what's going wrong with the while i < (inputvariable). I can't figure out what might be going wrong at all, am i over looking something?
I'm running all scripts through Sikuli IDE at the moment. I did want to learn how to write scripts and include sikuli hoping i could package it neatly but i couldn't seem to wrap my head around it.
For the while loop, where it's being compared to "SSLoops" i can't see why it's not breaking out of the loop when the criteria is met. (prints out above and beyond the number.)
I've had to do strange workarounds such as commenting out whole sections of code, trying to get it to work, and then slowly one by one reintroduce it till it matched the old script entirely. If I copied the script to a new file to make a cleaner copy, in hopes that if there is some sort of caching issue(?) it'd resolve, but I'd normally have to tinker around with it again.
BP = getBundlePath()
print(BP)
setBundlePath(BP + "\images")
BP2 = getBundlePath()
print(BP2)
# Regions
gameRegion = Region(230, 138, 1442, 875)
matchSpeedRegion = Region(1282, 920, 162, 91)
rewardRegion = Region()
def main():
SSLoops = input("How many times would you like to run Super Smash?")
SuperSmash(SSLoops)
def SuperSmash(SSLoops):
print(SSLoops)
i = 1
while i < SSLoops:
print(i)
print(SSLoops)
if exists("btnEnterSuperSmash.PNG"):
click("btnEnterSuperSmash.PNG")
while True:
if exists("btnReward.png"):
print("Completed! On to Rewards.")
#selectRewards()
break
else:
pass
if matchSpeedRegion.exists("btnStart.png"):
matchSpeedRegion.click("btnStart.png")
matchSpeedRegion.wait("btnRetreat.png", 3600)
if matchSpeedRegion.exists("btnSpeedUp.png"):
matchSpeedRegion.click("btnSpeedUp.png")
print("clicked x1")
print("clicking retreat")
matchSpeedRegion.click("btnRetreat.png")
matchSpeedRegion.wait(Pattern("btnRetreat.png").similar(0.65), 3600)
print("clicking okay")
gameRegion.click("btnOK.png")
wait(2)
gameRegion.wait("btnOK.png", 3600)
gameRegion.click("btnOK.png")
print("Completed!")
i = i + 1
if __name__ == '__main__':
main()
I have been getting popups saying "hey" because i had a loop in while true btnRewards to run a function to say "hey" - this would have hopefully on on to pick a reward out of 5 images in the end. But after removing it, as i'm trying to trouble shoot the main loop, it still pops up.
The loop that compares the user input variable to i just keeps on increasing. The indentation looks okay to me? but i must be wrong? or is something else making it go wrong?
I've been making the program run on a folder so the pictures to break the loop are immediately up, so it should have theoretically ran the amount of times inputted without anything else (1). Any help is deeply appreciated.
====
1
1
1
[log] CLICK on L[889,656]#S(0) (568 msec)
Completed! On to Rewards.
Completed!
2
1
[log] CLICK on L[889,656]#S(0) (565 msec)
Completed! On to Rewards.
Completed!
3
1
[log] CLICK on L[889,656]#S(0) (584 msec)
Completed! On to Rewards.
Completed!
4
1
====
Your problem: input() returns a string like so "4"
you then compare it using
while i < SSLoops:
which is always True and hence the loop does not end.
using
SSLoops = int(input("How many times would you like to run Super Smash?")) will solve your problem.
Be aware: this will crash if the given input cannot be converted to an integer value.
Suggestion: debug prints should look like so:
print "SSLoops =", SSLoops
so the output is better readable.
RaiMan from SikuliX (greetings to your cat ;-)

How to transfer a value from a function in one script to another script without re-running the function (python)?

I'm really new to programming in general and very inexperienced, and I'm learning python as I think it's more simple than other languages. Anyway, I'm trying to use Flask-Ask with ngrok to program an Alexa skill to check data online (which changes a couple of times per hour). The script takes four different numbers (from a different URL) and organizes it into a dictionary, and uses Selenium and phantomjs to access the data.
Obviously, this exceeds the 8-10 second maximum runtime for an intent before Alexa decides that it's taken too long and returns an error message (I know its timing out as ngrok and the python log would show if an actual error occurred, and it invariably occurs after 8-10 seconds even though after 8-10 seconds it should be in the middle of the script). I've read that I could just reprompt it, but I don't know how and that would only give me 8-10 more seconds, and the script usually takes about 25 seconds just to get the data from the internet (and then maybe a second to turn it into a dictionary).
I tried putting the getData function right after the intent that runs when the Alexa skill is first invoked, but it only runs when I initialize my local server and just holds the data for every new Alexa session. Because the data changes frequently, I want it to perform the function every time I start a new session for the skill with Alexa.
So, I decided just to outsource the function that actually gets the data to another script, and make that other script run constantly in a loop. Here's the code I used.
import time
def getData():
username = '' #username hidden for anonymity
password = '' #password hidden for anonymity
browser = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')
browser.get("https://gradebook.com") #actual website name changed
browser.find_element_by_name("username").clear()
browser.find_element_by_name("username").send_keys(username)
browser.find_element_by_name("password").clear()
browser.find_element_by_name("password").send_keys(password)
browser.find_element_by_name("password").send_keys(Keys.RETURN)
global currentgrades
currentgrades = []
gradeids = ['2018202', '2018185', '2018223', '2018626', '2018473', '2018871', '2018886']
for x in range(0, len(gradeids)):
try:
gradeurl = "https://www.gradebook.com/grades/"
browser.get(gradeurl)
grade = browser.find_element_by_id("currentStudentGrade[]").get_attribute('innerHTML').encode('utf8')[0:3]
if grade[2] != "%":
grade = browser.find_element_by_id("currentStudentGrade[]").get_attribute('innerHTML').encode('utf8')[0:4]
if grade[1] == "%":
grade = browser.find_element_by_id("currentStudentGrade[]").get_attribute('innerHTML').encode('utf8')[0:1]
currentgrades.append(grade)
except Exception:
currentgrades.append('No assignments found')
continue
dictionary = {"class1": currentgrades[0], "class2": currentgrades[1], "class3": currentgrades[2], "class4": currentgrades[3], "class5": currentgrades[4], "class6": currentgrades[5], "class7": currentgrades[6]}
return dictionary
def run():
dictionary = getData()
time.sleep(60)
That script runs constantly and does what I want, but then in my other script, I don't know how to just call the dictionary variable. When I use
from getdata.py import dictionary
in the Flask-ask script it just runs the loop and constantly gets the data. I just want the Flask-ask script to take the variable defined in the "run" function and then use it without running any of the actual scripts defined in the getdata script, which have already run and gotten the correct data. If it matters, both scripts are running in Terminal on a MacBook.
Is there any way to do what I'm asking about, or are there any easier workarounds? Any and all help is appreciated!
It sounds like you want to import the function, so you can run it; rather than importing the dictionary.
try deleting the run function and then in your other script
from getdata import getData
Then each time you write getData() it will run your code and get a new up-to-date dictionary.
Is this what you were asking about?
This issue has been resolved.
As for the original question, I didn't figure out how to make it just import the dictionary instead of first running the function to generate the dictionary. Furthermore, I realized there had to be a more practical solution than constantly running a script like that, and even then not getting brand new data.
My solution was to make the script that gets the data start running at the same time as the launch function. Here was the final script for the first intent (the rest of it remained the same):
#ask.intent("start_skill")
def start_skill():
welcome_message = 'What is the password?'
thread = threading.Thread(target=getData, args=())
thread.daemon = True
thread.start()
return question(welcome_message)
def getData():
#script to get data here
#other intents and rest of script here
By design, the skill requested a numeric passcode to make sure I was the one using it before it was willing to read the data (which was probably pointless, but this skill is at least as much for my own educational reasons as for practical reasons, so, for the extra practice, I wanted this to have as many features as I could possibly justify). So, by the time you would actually be able to ask for the data, the script to get the data will have finished running (I have tested this and it seems to work without fail).

Extra Characthers showing up after peeking at an MSMQ message

I am in the process of upgrading an older legacy system that is using Biztalk, MSMQs, Java, and python.
Currently, I am trying to upgrade a particular piece of the project which when complete will allow me to begin an in-place replacement of many of the legacy systems.
What I have done so far is recreate the legacy system in a newer version of Biztalk (2010) and on a machine that isn't on its last legs.
Anyway, the problem I am having is that there is a piece of Python code that picks up a message from an MSMQ and places it on another server. This code has been in place on our legacy system since 2004 and has worked since then. As far as I know, has never been changed.
Now when I rebuilt this, I started getting errors in the remote server and, after checking a few things out and eliminating many possible problems, I have established that the error occurs somewhere around the time the Python code is picking up from the MSMQ.
The error can be created using just two messages. Please note that I am using sample XMls here as the actual ones are pretty long.
Message one:
<xml>
<field1>Text 1</field1>
<field2>Text 2</field2>
</xml>
Message two:
<xml>
<field1>Text 1</field1>
</xml>
Now if I submit message one followed by message two to the MSMQ, they both appear correctly on the queue. If I then call the Python script, message one is returned correctly but message two gains extra characters.
Post-Python message two:
<xml>
<field1>Text 1</field1>
</xml>1>Te
I thought at first that there might have been scoping problems within the Python code but I have gone through that as well as I can and found none. However, I must admit that the first time that I've looked seriously at Python code is this project.
The Python code first peeks at a message and then receives it. I have been able to see the message when the script peeks and it has the same error message as when it receives.
Also, this error only shows up when going from a longer message to a shorter message.
I would welcome any suggestions of things that might be wrong, or things I could do to identify the problem.
I have googled and searched and gone a little crazy. This is holding an entire project up, as we can't begin replacing the older systems with this piece in place to act as a new bridge.
Thanks for taking the time to read through my problem.
Edit: Here's the relevant Python code:
import sys
import pythoncom
from win32com.client import gencache
msmq = gencache.EnsureModule('{D7D6E071-DCCD-11D0-AA4B-0060970DEBAE}', 0, 1, 0)
def Peek(queue):
qi = msmq.MSMQQueueInfo()
qi.PathName = queue
myq = qi.Open(msmq.constants.MQ_PEEK_ACCESS,0)
if myq.IsOpen:
# Don't loose this pythoncom.Empty thing (it took a while)
tmp = myq.Peek(pythoncom.Empty, pythoncom.Empty, 1)
myq.Close()
return tmp
The function calls this piece of code. I don't have access to the code that calls this until Monday, but the call is basically:
msg= MSMQ.peek()
2nd Edit.
I am attaching the first half of the script. this basically loops around
import base64, xmlrpclib, time
import MSMQ, Config, Logger
import XmlRpcExt,os,whrandom
QueueDetails = Config.InQueueDetails
sleeptime = Config.SleepTime
XMLRPCServer = Config.XMLRPCServer
usingBase64 = Config.base64ing
version=Config.version
verbose=Config.verbose
LogO = Logger.Logger()
def MSMQToIAMS():
# moved svr cons out of daemon loop
LogO.LogP(version)
svr = xmlrpclib.Server(XMLRPCServer, XmlRpcExt.getXmlRpcTransport())
while 1:
GotOne = 0
for qd in QueueDetails:
queue, agency, messagetype = qd
#LogO.LogD('['+version+"] Searching queue %s for messages"%queue)
try:
msg=MSMQ.Peek(queue)
except Exception,e:
LogO.LogE("Peeking at \"%s\" : %s"%(queue, e))
continue
if msg:
try:
msg = msg.__call__().encode('utf-8')
except:
LogO.LogE("Could not convert massege on \"%s\" to a string, leaving it on queue"%queue)
continue
if verbose:
print "++++++++++++++++++++++++++++++++++++++++"
print msg
print "++++++++++++++++++++++++++++++++++++++++"
LogO.LogP("Found Message on \"%s\" : \"%s...\""%(queue, msg[:40]))
try:
rv = svr.accept(msg, agency, messagetype)
if rv[0] != "OK":
raise Exception, rv[0]
LogO.LogP('Message has been sent successfully to IAMS from %s'%queue)
MSMQ.Receive(queue)
GotOne = 1
StoreMsg(msg)
except Exception, e:
LogO.LogE("%s"%e)
if GotOne == 0:
time.sleep(sleeptime)
else:
gotOne = 0
This is the full code that calls MSMQ. Creates a little program that watches MSMQ and when a message arrives picks it up and sends it off to another server.
Sounds really Python-specific (of which I know nothing) rather then MSMQ-specific. Isn't this just a case of a memory variable being used twice without being cleared in between? The second message is shorter than the first so there are characters from the first not being overwritten. What do the relevant parts of the Python code look like?
[[21st April]]
The code just shows you are populating the tmp variable with a message. What happens to tmp before the next message is accessed? I'm assuming it is not cleared.

Dragon NaturallySpeaking Programmers

Is there anyway to encorporate Dragon NaturallySpeaking into an event driven program? My boss would really like it if I used DNS to record user voice input without writing it to the screen and saving it directly to XML. I've been doing research for several days now and I can not see a way for this to happen without the (really expensive) SDK, I don't even know that it would work then.
Microsoft has the ability to write a (Python) program where it's speech recognizer can wait until it detects a speech event and then process it. It also has the handy quality of being able to suggest alternative phrases to the one that it thinks is the best guess and recording the .wav file for later use. Sample code:
spEngine = MsSpeech()
spEngine.setEventHandler(RecoEventHandler(spEngine.context))
class RecoEventHandler(SpRecoContext):
def OnRecognition(self, StreamNumber, StreamPosition, RecognitionType, Result):
res = win32com.client.Dispatch(Result)
phrase = res.PhraseInfo.GetText()
#from here I would save it as XML
# write reco phrases
altPhrases = reco.Alternates(NBEST)
for phrase in altPhrases:
nodePhrase = self.doc.createElement(TAG_PHRASE)
I can not seem to make DNS do this. The closest I can do-hickey it to is:
while keepGoing == True:
yourWords = raw_input("Your input: ")
transcript_el = createTranscript(doc, "user", yourWords)
speech_el.appendChild(transcript_el)
if yourWords == 'bye':
break
It even has the horrible side effect of making the user say "new-line" after every sentence! Not the preferred solution at all! Is there anyway to make DNS do what Microsoft Speech does?
FYI: I know the logical solution would be to simply switch to Microsoft Speech but let's assume, just for grins and giggles, that that is not an option.
UPDATE - Has anyone bought the SDK? Did you find it useful?
Solution: download Natlink - http://qh.antenna.nl/unimacro/installation/installation.html
It's not quite as flexible to use as SAPI but it covers the basics and I got almost everything that I needed out of it. Also, heads up, it and Python need to be downloaded for all users on your machine or it won't work properly and it works for every version of Python BUT 2.4.
Documentation for all supported commands is found under C:\NatLink\NatLink\MiscScripts\natlink.txt after you download it. It's under all the updates at the top of the file.
Example code:
#make sure DNS is running before you start
if not natlink.isNatSpeakRunning():
raiseError('must start up Dragon NaturallySpeaking first!')
shutdownServer()
return
#connect to natlink and load the grammer it's supposed to recognize
natlink.natConnect()
loggerGrammar = LoggerGrammar()
loggerGrammar.initialize()
if natlink.getMicState() == 'off':
natlink.setMicState('on')
userName = 'Danni'
natlink.openUser(userName)
#natlink.waitForSpeech() continuous loop waiting for input.
#Results are sent to gotResultsObject method of the logger grammar
natlink.waitForSpeech()
natlink.natDisconnect()
The code's severely abbreviated from my production version but I hope you get the idea. Only problem now is that I still have to returned to the mini-window natlink.waitForSpeech() creates to click 'close' before I can exit the program safely. A way to signal the window to close from python without using the timeout parameter would be fantastic.

Categories