Python threading as a way to complete a script that always crashes

I've been struggling for many days now with a class PublicationSaver() that I wrote. It has a method for loading XML documents as strings (not shown here), and it passes each loaded string to self.savePublication(self, publication, myDirPath).
Every time I have used it, it crashed after about 25,000 strings, and it saves the last string on which it crashes. I was able to parse that string separately, so I suppose that the problem is not bad XML.
I asked here but got no answers.
I googled a lot and it seems that I'm not the only one having this problem: here
So, since I really need to complete this task, I thought of this: can I wrap everything in a Thread started from main, so that when the lxml parser throws an exception I catch it and send a result to main, which then kills the thread and starts it again?
#threading
result_q = Queue.Queue()
# Create the thread
xmlSplitter = XmlSplitter_Thread(result_q=result_q)
xmlSplitter.run(toSplit_DirPath, target_DirPath)
print "Hello !!!\n"
toSplitDirEmptyB=False
while not toSplitDirEmptyB:
    splitterAlive=True
    while splitterAlive:
        sleep(120)
        splitterAlive=result_q.get()
    xmlSplitter.join()
    print "*** KILLED XmlSplitter_Thread !!! ***\n"
    if not os.listdir(toSplit_DirPath):
        toSplitDirEmptyB=True
    else:
        xmlSplitter.run(toSplit_DirPath, target_DirPath)
Is this a valid approach? When I run the code above, it does not work at the moment: I never get "Hello !!!" displayed, and the xmlSplitter just keeps going even when it starts to fail (there is an exception handler that keeps it going).

Probably the thread fails and it's blocking on the join method; take a look here. Split the XML into chunks and try to parse each chunk to avoid memory errors.
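One thing worth checking in the question's code: calling run() directly on a threading.Thread executes the work in the calling thread, which would explain why "Hello !!!" is never printed; start() is what launches a new thread. Below is a minimal Python 3 sketch of the restart idea with that correction. The names supervise, _work and process are hypothetical stand-ins, not the asker's XmlSplitter_Thread, and the sketch assumes the failure surfaces as a Python exception. A hard crash of the interpreter cannot be contained by a thread.

```python
import queue
import threading


def _work(items, process, result_q):
    """Process items in order; report the index of the first failure."""
    for index, item in enumerate(items):
        try:
            process(item)
        except Exception:
            result_q.put(("failed", index))
            return
    result_q.put(("done", None))


def supervise(items, process):
    """Run the work in a thread, restarting after each failing item.

    Returns the list of items that raised and were skipped.
    """
    pending = list(items)
    skipped = []
    while pending:
        result_q = queue.Queue()
        worker = threading.Thread(target=_work,
                                  args=(pending, process, result_q))
        worker.start()                    # start(), not run()
        status, index = result_q.get()    # blocks until the worker reports
        worker.join()
        if status == "done":
            break
        # Remember the bad item and restart just after it.
        skipped.append(pending[index])
        pending = pending[index + 1:]
    return skipped
```

Because the worker reports the index at which it failed, the supervisor can skip the offending item instead of retrying it forever.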

Related

Reading from a running process/Python app, that is printing to console? [duplicate]

This question already has answers here:
Pipe output(stdout) from running process Win32Api
(3 answers)
Closed 7 years ago.
How can I read the output of a running console process? I found a snippet that shows how to do it for a process being started, by using ReadFile() on the process handle obtained by CreateProcess(), but my question is: how can I achieve this for a process that is already running? Thanks.
What I have tried is using OpenProcess() on the console app (I hardcoded the PID just to test) and then calling ReadFile() on the handle, but I get gibberish or nothing at all.
Edit: Here's the code I tried; the PID is hardcoded just for testing.
procedure TForm1.Button1Click(Sender: TObject);
var
  hConsoleProcess: THandle;
  Buffer: Array[0..512] of ansichar;
  MyBuf: Array[0..512] of ansichar;
  bytesReaded: DWORD;
begin
  hConsoleProcess := OpenProcess(PROCESS_ALL_ACCESS, False, 6956);
  ReadFile(hConsoleProcess, Buffer, sizeof(Buffer), bytesReaded, nil);
  OemToCharA(Buffer, MyBuf);
  showmessage(string(MyBuf));
  // ShellExecute(Handle, 'open', 'cmd.exe', '/k ipconfig', nil, SW_SHOWNORMAL);
end;

It's unrealistic to expect to be able to do this. Perhaps it is possible to hack it, but no good will come of doing so. You could inject into the process, obtain its standard output handle with GetStdHandle, and read from that. But no good will come of that, as I said.
Why will no good come of this? Well, standard input/output is designed for a single reader, and a single writer. If you have two readers, then one, or both, of the readers are going to miss some of the text. In fact I'd be surprised if two blocking synchronous calls to ReadFile were allowed by the system. I'd expect the second one to fail. [Rob's comment explains that this is allowed, but it's more like first come, first served.]
What you could perhaps do is to create a multi-casting program to listen to the output of the main program. Pipe the output of the main program into the multi-caster. Have the multi-caster echo to its standard output and to one or more other pipes.
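As a rough illustration of the multi-caster idea, here is a minimal Python sketch (the thread itself is about Delphi and the Win32 API, so this is only a sketch of the concept). The function name multicast and its parameters are hypothetical. It copies everything from one binary stream to several sinks, so that each consumer gets the complete output.

```python
def multicast(source, sinks, chunk_size=4096):
    """Copy all bytes from source to every sink; return the byte count."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:          # empty read means end of stream
            break
        for sink in sinks:
            sink.write(chunk)
            sink.flush()       # keep every consumer up to date
        total += len(chunk)
    return total
```

In a real pipeline the source would be sys.stdin.buffer, and the sinks would be sys.stdout.buffer plus one or more files or pipes opened in binary mode, with the main program's output piped into the script.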
The bottom line here is that whatever your actual problem is, hooking up multiple readers to the standard out is not the solution.

Segmentation fault when initializing array

I am getting a segmentation fault when initializing an array.
I have a callback function from when an RFID tag gets read
IDS = []
def readTag(e):
    epc = str(e.epc, 'utf-8')
    if not epc in IDS:
        now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
        IDS.append([epc, now, "name.instrument"])
and a main function from which it's called
def main():
    for x in vals:
        IDS.append([vals[0], vals[1], vals[2]])
    for x in IDS:
        print(x[0])
    r = mercury.Reader("tmr:///dev/ttyUSB0", baudrate=9600)
    r.set_region("NA")
    r.start_reading(readTag, on_time=1500)
    input("press any key to stop reading: ")
    r.stop_reading()
The error occurs because of the line IDS.append([epc, now, "name.instrument"]). I know because when I replace it with a print call, the program runs just fine. I've tried using different types for the array elements (integers), creating an array of the same objects outside of the append call, etc. For some reason, just creating an array inside the readTag function, such as row = [1,2,3], causes the segmentation fault.
Does anyone know what causes this error and how I can fix it? To be a little more specific: the readTag function works fine for the first two (only ever two) calls, but then it crashes. The Reader object that has the start_reading() function is from the mercury-api.

This looks like a scoping issue to me; the mercury library doesn't have permission to access your list's memory address, so when it invokes your callback function readTag(e) a segfault occurs. I don't think that the behavior that you want is supported by that library

To extend Michael's answer, this appears to be an issue with scoping and the API you're using. In general pure-Python doesn't seg-fault. Or at least, it shouldn't seg-fault unless there's a bug in the interpreter, or some extension that you're using. That's not to say pure-Python won't break, it's just that a genuine seg-fault indicates the problem is probably the result of something messy outside of your code.
I'm assuming you're using this Python API.
In that case, the README.md mentions that the Reader.start_reading() method you're using is "asynchronous", meaning it starts a new thread or process and returns immediately, and then the background thread continues to call your callback each time something is scanned.
I don't really know enough about the nitty-gritty of CPython to say exactly what's going on, but you've declared IDS = [] as a global variable and it seems like the background thread is running the callback with a different context to the main program. So when it attempts to access IDS it's reading memory it doesn't own, hence the seg-fault.
Because of how restrictive the callback is and the apparent lack of a buffer, this might be an oversight on the behalf of the developer. If you really need asynchronous reads it's worth sending them an issue report.
Otherwise, considering you're just waiting for input you probably don't need the asynchronous reads, and you could use the synchronous Reader.read() method inside your own busy loop instead with something like:
try:
    while True:
        readTags(r.read(timeout=10))
except KeyboardInterrupt: ## break loop on SIGINT (Ctrl-C)
    pass
Note that r.read() returns a list of tags rather than just one, so you'd need to modify your callback slightly, and if you're writing more than just a quick script you probably want to use threads to interrupt the loop properly as SIGINT is pretty hacky.
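A modified callback along those lines might look like the sketch below. The name read_tags is hypothetical, and the only thing assumed about each tag object is the bytes attribute epc that the question's own callback already uses. The sketch also compares against the stored EPC strings: the original test "not epc in IDS" can never match, because IDS holds lists rather than strings.

```python
import datetime

IDS = []


def read_tags(tags):
    """Record each new EPC from a list of tags, skipping known ones."""
    seen = {row[0] for row in IDS}      # EPC strings already recorded
    for tag in tags:
        epc = str(tag.epc, 'utf-8')     # tag.epc is assumed to be bytes
        if epc in seen:
            continue
        now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
        IDS.append([epc, now, "name.instrument"])
        seen.add(epc)
```

Since everything here runs in the main thread, the list is never touched from the library's background thread.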

How to transfer a value from a function in one script to another script without re-running the function (python)?

I'm really new to programming in general and very inexperienced, and I'm learning Python as I think it's simpler than other languages. Anyway, I'm trying to use Flask-Ask with ngrok to program an Alexa skill to check data online (which changes a couple of times per hour). The script takes four different numbers (from a different URL) and organizes them into a dictionary, and uses Selenium and PhantomJS to access the data.
Obviously, this exceeds the 8-10 second maximum runtime for an intent before Alexa decides that it's taken too long and returns an error message (I know its timing out as ngrok and the python log would show if an actual error occurred, and it invariably occurs after 8-10 seconds even though after 8-10 seconds it should be in the middle of the script). I've read that I could just reprompt it, but I don't know how and that would only give me 8-10 more seconds, and the script usually takes about 25 seconds just to get the data from the internet (and then maybe a second to turn it into a dictionary).
I tried putting the getData function right after the intent that runs when the Alexa skill is first invoked, but it only runs when I initialize my local server and just holds the data for every new Alexa session. Because the data changes frequently, I want it to perform the function every time I start a new session for the skill with Alexa.
So, I decided just to outsource the function that actually gets the data to another script, and make that other script run constantly in a loop. Here's the code I used.
import time
def getData():
    username = '' #username hidden for anonymity
    password = '' #password hidden for anonymity
    browser = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')
    browser.get("https://gradebook.com") #actual website name changed
    browser.find_element_by_name("username").clear()
    browser.find_element_by_name("username").send_keys(username)
    browser.find_element_by_name("password").clear()
    browser.find_element_by_name("password").send_keys(password)
    browser.find_element_by_name("password").send_keys(Keys.RETURN)
    global currentgrades
    currentgrades = []
    gradeids = ['2018202', '2018185', '2018223', '2018626', '2018473', '2018871', '2018886']
    for x in range(0, len(gradeids)):
        try:
            gradeurl = "https://www.gradebook.com/grades/"
            browser.get(gradeurl)
            grade = browser.find_element_by_id("currentStudentGrade[]").get_attribute('innerHTML').encode('utf8')[0:3]
            if grade[2] != "%":
                grade = browser.find_element_by_id("currentStudentGrade[]").get_attribute('innerHTML').encode('utf8')[0:4]
            if grade[1] == "%":
                grade = browser.find_element_by_id("currentStudentGrade[]").get_attribute('innerHTML').encode('utf8')[0:1]
            currentgrades.append(grade)
        except Exception:
            currentgrades.append('No assignments found')
            continue
    dictionary = {"class1": currentgrades[0], "class2": currentgrades[1], "class3": currentgrades[2], "class4": currentgrades[3], "class5": currentgrades[4], "class6": currentgrades[5], "class7": currentgrades[6]}
    return dictionary
def run():
    dictionary = getData()
    time.sleep(60)
That script runs constantly and does what I want, but then in my other script, I don't know how to just call the dictionary variable. When I use
from getdata.py import dictionary
in the Flask-ask script it just runs the loop and constantly gets the data. I just want the Flask-ask script to take the variable defined in the "run" function and then use it without running any of the actual scripts defined in the getdata script, which have already run and gotten the correct data. If it matters, both scripts are running in Terminal on a MacBook.
Is there any way to do what I'm asking about, or are there any easier workarounds? Any and all help is appreciated!

It sounds like you want to import the function so that you can run it, rather than importing the dictionary.
Try deleting the run function and then, in your other script, use
from getdata import getData
Then each time you call getData() it will run your code and return a new, up-to-date dictionary.
Is this what you were asking about?
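If the goal is instead for a separate, constantly running script to hand its latest result to the Flask-Ask script without re-running the scrape, one possible workaround (not from the original posts) is to pass the dictionary through a small JSON file. This is only a sketch under that assumption; the helper names save_snapshot and load_snapshot and the file path are hypothetical.

```python
import json
import os
import tempfile


def save_snapshot(data, path):
    """Write data as JSON, replacing the old file in a single step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(data, handle)
    # os.replace swaps the file atomically, so a reader never
    # sees a half-written snapshot.
    os.replace(tmp_path, path)


def load_snapshot(path):
    """Return the most recently saved dictionary."""
    with open(path) as handle:
        return json.load(handle)
```

The fetching script would call save_snapshot(getData(), path) after each scrape, and the intent handler would call load_snapshot(path), which returns immediately.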

This issue has been resolved.
As for the original question, I didn't figure out how to make it just import the dictionary instead of first running the function to generate the dictionary. Furthermore, I realized there had to be a more practical solution than constantly running a script like that, and even then not getting brand new data.
My solution was to make the script that gets the data start running at the same time as the launch function. Here was the final script for the first intent (the rest of it remained the same):
@ask.intent("start_skill")
def start_skill():
    welcome_message = 'What is the password?'
    thread = threading.Thread(target=getData, args=())
    thread.daemon = True
    thread.start()
    return question(welcome_message)
def getData():
    #script to get data here
#other intents and rest of script here
By design, the skill requested a numeric passcode to make sure I was the one using it before it was willing to read the data (which was probably pointless, but this skill is at least as much for my own educational reasons as for practical reasons, so, for the extra practice, I wanted this to have as many features as I could possibly justify). So, by the time you would actually be able to ask for the data, the script to get the data will have finished running (I have tested this and it seems to work without fail).
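The snippet above starts the thread but does not show how a later intent picks up the result. One way to do that is sketched below with a small wrapper around threading.Thread. The class name BackgroundFetch and its methods are hypothetical and not part of Flask-Ask; the fetch argument stands in for a function like getData.

```python
import threading


class BackgroundFetch:
    """Run a slow function in a daemon thread and collect its result."""

    def __init__(self, fetch):
        self._result = None
        self._thread = threading.Thread(target=self._run, args=(fetch,))
        self._thread.daemon = True

    def _run(self, fetch):
        self._result = fetch()

    def start(self):
        self._thread.start()

    def result(self, timeout=None):
        """Wait up to timeout seconds; return None if still running."""
        self._thread.join(timeout)
        if self._thread.is_alive():
            return None
        return self._result
```

The launch handler would create and start the object, and a later intent would call result(timeout=...) with a timeout short enough to stay inside the response limit.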

Python simple threading ends without doing anything (maybe)

When I run the following code (using "sudo python servers.py"), the process seems to finish immediately after just printing "test".
Why don't the "proxy_server" functions run? Or maybe they do, but I do not realize it (because the first line in the proxy function doesn't print anything).
This is a stripped-down version of the code; I didn't want to include unnecessary content, yet it still demonstrates my problem:
import os,sys,thread,socket,select,struct,time
HTTP_PORT = 80
FTP_PORT=21
FTP_DATA_PORT = 20
IP_IN = '10.0.1.3'
IP_OUT = '10.0.3.3'
sys_http = 'http_proxy'
sys_ftp = 'ftp_proxy'
sys_ftp_data = 'ftp_data_proxy'
def main():
    try:
        thread.start_new_thread(proxy_server, (HTTP_PORT, IP_IN,sys_http,http_handler))
        thread.start_new_thread(proxy_server, (FTP_PORT, IP_IN,sys_ftp,http_handler))
        thread.start_new_thread(proxy_server, (FTP_DATA_PORT, IP_OUT,sys_ftp_data,http_handler))
        print "test"
    except e:
        print 'Error!'
        sys.exit(1)
def proxy_server(host,port,fileName,handler):
    print "Proxy Server Running on ",host,":",port
def http_handler(src,sock):
    return ''
if __name__ == '__main__':
    main()
What am I missing or doing wrong?

First, you have indentation problems related to using mixed tabs and spaces for indentation. While they didn't cause your code to misbehave in this particular case, they will cause you problems later if you don't stick to consistently using one or the other. They've already broken the displayed indentation in your question; see the print "test" line in main, which looks misaligned.
Second, instead of the low-level thread module, you should be using threading. Your problem is occurring because, as documented in the thread module documentation,
When the main thread exits, it is system defined whether the other threads survive. On SGI IRIX using the native thread implementation, they survive. On most other systems, they are killed without executing try ... finally clauses or executing object destructors.
threading threads let you explicitly define whether other threads should survive the death of the main thread, and default to surviving. In general, threading is much easier to use correctly.
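A minimal Python 3 sketch of the same program using threading might look like this. It is an illustration rather than a drop-in replacement: the log list and the simplified proxy_server signature are assumptions made so the example is self-contained, and the handler argument is omitted.

```python
import threading

HTTP_PORT = 80
FTP_PORT = 21
FTP_DATA_PORT = 20
IP_IN = '10.0.1.3'
IP_OUT = '10.0.3.3'


def proxy_server(port, host, name, log):
    # Stand-in for the real server loop: just record that it started.
    log.append("Proxy Server Running on %s:%d (%s)" % (host, port, name))


def main():
    log = []
    servers = [
        (HTTP_PORT, IP_IN, 'http_proxy'),
        (FTP_PORT, IP_IN, 'ftp_proxy'),
        (FTP_DATA_PORT, IP_OUT, 'ftp_data_proxy'),
    ]
    threads = [
        threading.Thread(target=proxy_server, args=(port, host, name, log))
        for port, host, name in servers
    ]
    for t in threads:
        t.start()
    # join() keeps main from returning before the threads finish.
    for t in threads:
        t.join()
    return log
```

Even without the join() calls, non-daemon threading threads keep the interpreter alive until they finish, which is the difference from thread.start_new_thread described above.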

Extra characters showing up after peeking at an MSMQ message

I am in the process of upgrading an older legacy system that is using Biztalk, MSMQs, Java, and python.
Currently, I am trying to upgrade a particular piece of the project which when complete will allow me to begin an in-place replacement of many of the legacy systems.
What I have done so far is recreate the legacy system in a newer version of Biztalk (2010) and on a machine that isn't on its last legs.
Anyway, the problem I am having is that there is a piece of Python code that picks up a message from an MSMQ and places it on another server. This code has been in place on our legacy system since 2004 and has worked since then. As far as I know, has never been changed.
Now when I rebuilt this, I started getting errors in the remote server and, after checking a few things out and eliminating many possible problems, I have established that the error occurs somewhere around the time the Python code is picking up from the MSMQ.
The error can be reproduced using just two messages. Please note that I am using sample XMLs here as the actual ones are pretty long.
Message one:
<xml>
<field1>Text 1</field1>
<field2>Text 2</field2>
</xml>
Message two:
<xml>
<field1>Text 1</field1>
</xml>
Now if I submit message one followed by message two to the MSMQ, they both appear correctly on the queue. If I then call the Python script, message one is returned correctly but message two gains extra characters.
Post-Python message two:
<xml>
<field1>Text 1</field1>
</xml>1>Te
I thought at first that there might have been scoping problems within the Python code but I have gone through that as well as I can and found none. However, I must admit that the first time that I've looked seriously at Python code is this project.
The Python code first peeks at a message and then receives it. I have been able to inspect the message when the script peeks, and it already shows the same extra characters as when it is received.
Also, this error only shows up when going from a longer message to a shorter message.
I would welcome any suggestions of things that might be wrong, or things I could do to identify the problem.
I have googled and searched and gone a little crazy. This is holding an entire project up, as we can't begin replacing the older systems with this piece in place to act as a new bridge.
Thanks for taking the time to read through my problem.
Edit: Here's the relevant Python code:
import sys
import pythoncom
from win32com.client import gencache
msmq = gencache.EnsureModule('{D7D6E071-DCCD-11D0-AA4B-0060970DEBAE}', 0, 1, 0)
def Peek(queue):
    qi = msmq.MSMQQueueInfo()
    qi.PathName = queue
    myq = qi.Open(msmq.constants.MQ_PEEK_ACCESS,0)
    if myq.IsOpen:
        # Don't lose this pythoncom.Empty thing (it took a while)
        tmp = myq.Peek(pythoncom.Empty, pythoncom.Empty, 1)
    myq.Close()
    return tmp
The function calls this piece of code. I don't have access to the code that calls this until Monday, but the call is basically:
msg= MSMQ.peek()
2nd Edit.
I am attaching the first half of the script. This basically loops around:
import base64, xmlrpclib, time
import MSMQ, Config, Logger
import XmlRpcExt,os,whrandom
QueueDetails = Config.InQueueDetails
sleeptime = Config.SleepTime
XMLRPCServer = Config.XMLRPCServer
usingBase64 = Config.base64ing
version=Config.version
verbose=Config.verbose
LogO = Logger.Logger()
def MSMQToIAMS():
    # moved svr cons out of daemon loop
    LogO.LogP(version)
    svr = xmlrpclib.Server(XMLRPCServer, XmlRpcExt.getXmlRpcTransport())
    while 1:
        GotOne = 0
        for qd in QueueDetails:
            queue, agency, messagetype = qd
            #LogO.LogD('['+version+"] Searching queue %s for messages"%queue)
            try:
                msg=MSMQ.Peek(queue)
            except Exception,e:
                LogO.LogE("Peeking at \"%s\" : %s"%(queue, e))
                continue
            if msg:
                try:
                    msg = msg.__call__().encode('utf-8')
                except:
                    LogO.LogE("Could not convert massege on \"%s\" to a string, leaving it on queue"%queue)
                    continue
                if verbose:
                    print "++++++++++++++++++++++++++++++++++++++++"
                    print msg
                    print "++++++++++++++++++++++++++++++++++++++++"
                LogO.LogP("Found Message on \"%s\" : \"%s...\""%(queue, msg[:40]))
                try:
                    rv = svr.accept(msg, agency, messagetype)
                    if rv[0] != "OK":
                        raise Exception, rv[0]
                    LogO.LogP('Message has been sent successfully to IAMS from %s'%queue)
                    MSMQ.Receive(queue)
                    GotOne = 1
                    StoreMsg(msg)
                except Exception, e:
                    LogO.LogE("%s"%e)
        if GotOne == 0:
            time.sleep(sleeptime)
        else:
            gotOne = 0
This is the full code that calls MSMQ. It creates a little program that watches the MSMQ and, when a message arrives, picks it up and sends it off to another server.

Sounds really Python-specific (of which I know nothing) rather than MSMQ-specific. Isn't this just a case of a memory variable being used twice without being cleared in between? The second message is shorter than the first, so there are characters from the first not being overwritten. What do the relevant parts of the Python code look like?
[[21st April]]
The code just shows you are populating the tmp variable with a message. What happens to tmp before the next message is accessed? I'm assuming it is not cleared.
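The hypothesis above can be illustrated with a short, self-contained Python sketch. It does not use MSMQ or win32com and is not a diagnosis of that code; the names write_message and naive_read are hypothetical. It only shows how a reused buffer that is read without regard to the real message length leaks the tail of a longer earlier message.

```python
def write_message(buffer, message):
    """Copy message to the start of buffer without clearing the rest."""
    buffer[:len(message)] = message
    return len(message)


def naive_read(buffer):
    """Read up to the first NUL byte, ignoring the real message length."""
    end = buffer.find(b"\x00")
    return bytes(buffer if end == -1 else buffer[:end])


shared = bytearray(32)                  # one buffer reused for every message

write_message(shared, b"longer message")
first = naive_read(shared)              # b"longer message"

length = write_message(shared, b"short")
second_naive = naive_read(shared)       # b"shortr message": stale tail
second_fixed = bytes(shared[:length])   # b"short": honours the length
```

The symptom matches the question: the corruption appears only when a shorter message follows a longer one, and the extra characters come from the earlier message.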
