I'm trying to use a Unix named pipe to output statistics from a running service. I intend to provide an interface similar to /proc, where one can see live stats by catting a file.
I'm using code similar to this in my Python program:
while True:
    f = open('/tmp/readstatshere', 'w')
    f.write('some interesting stats\n')
    f.close()
/tmp/readstatshere is a named pipe created by mknod.
I then cat it to see the stats:
$ cat /tmp/readstatshere
some interesting stats
It works fine most of the time. However, if I cat the entry several times in quick succession, I sometimes get multiple lines of some interesting stats instead of one. Once or twice it has even gone into an infinite loop, printing that line forever until I killed it. The only fix I've found so far is to add a delay of, say, 500 ms after f.close() to prevent this issue.
I'd like to know why exactly this happens and if there is a better way of dealing with it.
Thanks in advance
A pipe is simply the wrong solution here. If you want to present a consistent snapshot of the internal state of your process, write that to a temporary file and then rename it to the "public" name. This will prevent all issues that can arise from other processes reading the state while you're updating it. Also, do NOT do that in a busy loop, but ideally in a thread that sleeps for at least one second between updates.
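A minimal sketch of that approach might look like this (untested; the file path, the stats text, and the one-second interval are placeholders):
import os
import tempfile
import threading
import time

STATS_PATH = '/tmp/servicestats'   # the "public" name readers will cat

def publish_stats():
    while True:
        # write the snapshot to a temporary file in the same directory...
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(STATS_PATH))
        with os.fdopen(fd, 'w') as f:
            f.write('some interesting stats\n')
        os.chmod(tmp_path, 0o644)   # mkstemp creates files 0600; loosen so readers can see it
        # ...then atomically rename it over the public name; on POSIX, readers
        # always see either the old snapshot or the new one, never a mixture
        os.rename(tmp_path, STATS_PATH)
        time.sleep(1)   # no busy loop

updater = threading.Thread(target=publish_stats)
updater.daemon = True
updater.start()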
What about a UNIX socket instead of a pipe?
That way you can react to each connect by producing fresh data just in time.
The only downside is that you cannot simply cat the data; you have to create a new socket handle and connect() to the socket file (though tools such as socat, or a netcat built with UNIX-socket support via nc -U, can stand in for cat).
MYSOCKETFILE = '/tmp/mysocket'
import socket
import os
try:
    os.unlink(MYSOCKETFILE)   # remove a stale socket file from a previous run
except OSError:
    pass

s = socket.socket(socket.AF_UNIX)
s.bind(MYSOCKETFILE)
s.listen(10)

while True:
    s2, peeraddr = s.accept()
    s2.send('These are my actual data')
    s2.close()
Program querying this socket:
MYSOCKETFILE = '/tmp/mysocket'
import socket
import os
s = socket.socket(socket.AF_UNIX)
s.connect(MYSOCKETFILE)
while True:
    d = s.recv(100)
    if not d:
        break
    print d
s.close()
I think you should use FUSE.
It has Python bindings; see http://pypi.python.org/pypi/fuse-python/.
This allows you to compose answers to questions formulated as POSIX filesystem calls.
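For instance, a tiny read-only filesystem exposing a single stats file could look roughly like this using the fusepy bindings (note these are a different package than the fuse-python bindings linked above, and this is an untested sketch; the mount point is an example):
import errno
import stat
import time
from fuse import FUSE, FuseOSError, Operations   # fusepy

class StatsFS(Operations):
    def _render(self):
        # replace with your real statistics
        return b'some interesting stats\n'

    def getattr(self, path, fh=None):
        now = time.time()
        if path == '/':
            return dict(st_mode=(stat.S_IFDIR | 0o755), st_nlink=2,
                        st_ctime=now, st_mtime=now, st_atime=now)
        if path == '/stats':
            return dict(st_mode=(stat.S_IFREG | 0o444), st_nlink=1,
                        st_size=len(self._render()),
                        st_ctime=now, st_mtime=now, st_atime=now)
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return ['.', '..', 'stats']

    def read(self, path, size, offset, fh):
        # the content is regenerated on every read, like /proc
        return self._render()[offset:offset + size]

if __name__ == '__main__':
    FUSE(StatsFS(), '/tmp/statsmount', foreground=True)
With that mounted, cat /tmp/statsmount/stats produces fresh data on every read.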
Don't write to an actual file. That's not what /proc does. Procfs presents a virtual (non-disk-backed) filesystem which produces the information you want on demand. You can do the same thing, but it'll be easier if it's not tied to the filesystem. Instead, just run a web service inside your Python program, and keep your statistics in memory. When a request comes in for the stats, formulate them into a nice string and return them. Most of the time you won't need to waste cycles updating a file which may not even be read before the next update.
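For example, with nothing but the standard library (an untested sketch; the port and the stats dict are placeholders, and on Python 2 the module is BaseHTTPServer rather than http.server):
from http.server import BaseHTTPRequestHandler, HTTPServer

stats = {'requests_handled': 0, 'uptime_seconds': 0}   # updated in memory by your service

class StatsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # format the in-memory stats on demand, only when someone actually asks
        body = ''.join('%s: %s\n' % item for item in stats.items()).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == '__main__':
    HTTPServer(('127.0.0.1', 8080), StatsHandler).serve_forever()
Then curl http://127.0.0.1:8080/ takes the place of catting a file.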
You need to unlink the pipe after you issue the close. I think this is because there is a race condition where the pipe can be opened for reading again before cat finishes and it thus sees more data and reads it out, leading to multiples of "some interesting stats."
Basically you want something like:
import os

the_pipe = '/tmp/readstatshere'

while True:
    os.mkfifo(the_pipe)
    f = open(the_pipe, 'w')
    f.write('some interesting stats')
    f.close()
    os.unlink(the_pipe)
Update 1: added the call to mkfifo.
Update 2: as noted in the comments, this code also has a race condition when there are multiple consumers.
Related
Is it possible -- other than by using something like a .txt/dummy file -- to pass a value from one program to another?
I have a program that uses a .txt file to pass a starting value to another program. I update the value in the file between launches, and I run the second program ten times, essentially simultaneously. Doing this is fine, but I would like the 'child' program to report back to the 'mother' program when it is finished, and also report back which files it found to download.
Is it possible to do this without using eleven files (one for each instance of child-to-mother reporting, plus one for mother-to-child)? I am talking about completely separate programs, not classes or functions or anything like that.
To operate efficiently, and not wait around for hours for everything to complete, I need the 'child' program to run ten times in parallel and get things done MUCH faster, so I give each instance a separate range to check through.
Both programs run fine, but I would like to get them to report back and forth to each other, ideally without using files as the 'transmission' channel, especially on the child-to-mother side of the data transfer.
'Mother' program...currently
import os
import sys
import subprocess
import time

os.chdir('/media/')

#find highest download video
Hival = open("Highest.txt", "r")
Histr = Hival.read()
Hival.close()
HiNext = str(int(Histr)+1)

#setup download #1
NextVal = open("NextVal.txt","w")
NextVal.write(HiNext)
NextVal.close()

#call download #1
procs=[]
proc=subprocess.Popen(['python','test.py'])
procs.append(proc)
time.sleep(2)

#setup download #2-11
Histr2 = int(Histr)/10000
Histr2 = Histr2 + 1
for i in range(10):
    Hiint = str(Histr2)+"0000"
    NextVal = open("NextVal.txt","w")
    NextVal.write(Hiint)
    NextVal.close()
    proc=subprocess.Popen(['python','test.py'])
    procs.append(proc)
    time.sleep(2)
    Histr2 = Histr2 + 1

for proc in procs:
    proc.wait()
'Child' program
import urllib
import os
from Tkinter import *
import time

root = Tk()
root.title("Audiodownloader")
root.geometry("200x200")
app = Frame(root)
app.grid()

os.chdir('/media/')
Fileval = open('NextVal.txt','r')
Fileupdate = Fileval.read()
Fileval.close()

Fileupdate = int(Fileupdate)
Filect = Fileupdate/10000
Filect2 = str(Filect)+"0009"
Filecount = int(Filect2)

while Fileupdate <= Filecount:
    root.title(Fileupdate)
    url = 'http://www.yourfavoritewebsite.com/audio/encoded/'+str(Fileupdate)+'.mp3'
    urllib.urlretrieve(url, str(Fileupdate)+'.mp3')
    statinfo = os.stat(str(Fileupdate)+'.mp3')
    if statinfo.st_size < 10000L:
        os.remove(str(Fileupdate)+'.mp3')
    time.sleep(.01)
    Fileupdate = Fileupdate+1
    root.update_idletasks()
I'm trying to convert the original VB6 program over to Linux and make it much easier to use at the same time, hence .mainloop() being missing. This was my first real attempt at anything in Python at all, hence the lack of defs or classes. I'm trying to come back and finish this up after 1.5 months of doing nothing with it, mostly because I didn't know how to. From the research I did a while ago, I found this is WAY over my head. I haven't ever done anything with threads/sockets/client/server interaction, so I'm purely an idiot in this case. Google anything on it and I just get brought right back here to Stack Overflow.
Yes, I want 10 copies of the program running at the same time, to save time. I could do without the GUI if it were possible for the program to report back to the 'mother', so the mother could print on screen the current value being searched, and so the child could report back when it's finished and which files it downloaded successfully (versus downloaded and then erased for being too small). I would use the successful-download information to update Highest.txt for the next run.
I think this may clarify things MUCH better... that, or I don't understand the nature of using server/client interaction :) The only reason time.sleep is in the program is to try to make sure the files get written before the next instance of the child program is started. I didn't know what kind of timing issues I might run into, so I included those lines for safety.
This can be implemented with a simple client/server topology, using the multiprocessing library. In your mother/child terminology:
server.py
from multiprocessing.connection import Listener

# client (child): handles one connection
def child(conn):
    while True:
        msg = conn.recv()
        # this just echoes the value back; replace with your custom logic
        conn.send(msg)

# server (mother): accepts connections and hands them to child()
def mother(address):
    serv = Listener(address)
    while True:
        client = serv.accept()
        child(client)

mother(('', 5000))
client.py
from multiprocessing.connection import Client
c = Client(('localhost', 5000))
c.send('hello')
print('Got:', c.recv())
c.send({'a': 123})
print('Got:', c.recv())
Run with
$ python server.py
$ python client.py
When you talk about using a txt file to pass information between programs, we first need to know what language you're using.
In my experience both Java and Python can do this, though it can be laborious depending on how much information you want to pass around.
In Python you can use the standard library to read and write the txt files, and to schedule execution you can use APScheduler.
The problem is that my script doesn't work (it prints an empty line), even though the same code works in the Python interactive console.
import telnetlib
tn = telnetlib.Telnet("killermud.pl", 4000)
data = tn.read_very_eager()
data = data.decode()
print(data)
tn.close()
What is the reason of such behavior?
I just took a look at the documentation for the read_very_eager method, which says:
Read all data available already queued or on the socket,
without blocking.
It is likely that at the time you call this method there is no data "already available or queued on the socket", so you're getting nothing back. You probably want to use something like the read_until method, which will read data until it finds a specific string. For example (note the bytes literal, since your script is running under Python 3):
data = tn.read_until(b'Podaj swoje imie')
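Folding that into the original script would look something like this (the prompt text is just an example of what the server might send first; adjust it to the real greeting):
import telnetlib

tn = telnetlib.Telnet("killermud.pl", 4000)
# block until the expected greeting text arrives, or give up after 10 seconds
data = tn.read_until(b"Podaj swoje imie", timeout=10)
print(data.decode())
tn.close()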
According to the telnetlib documentation, Telnet.read_very_eager() raises EOFError if the connection is closed and no cooked data is available, and returns '' if no cooked data is available otherwise. It does not block unless in the midst of an IAC sequence.
So if data == "" evaluates to true, it means that no cooked data was available.
I've made a script to get inventory data from the Steam API and I'm a bit unsatisfied with the speed. So I read a bit about multiprocessing in python and simply cannot wrap my head around it. The program works as such: it gets the SteamID from a list, gets the inventory and then appends the SteamID and the inventory in a dictionary with the ID as the key and inventory contents as the value.
I've also understood that there are some issues involved with using a counter when multiprocessing, which is a small problem as I'd like to be able to resume the program from the last fetched inventory rather than from the beginning again.
Anyway, what I'm asking for is really a concrete example of how to do multiprocessing when opening the URL that contains the inventory data so that the program can fetch more than one inventory at a time rather than just one.
onto the code:
with open("index_to_name.json", "r", encoding=("utf-8")) as fp:
index_to_name=json.load(fp)
with open("index_to_quality.json", "r", encoding=("utf-8")) as fp:
index_to_quality=json.load(fp)
with open("index_to_name_no_the.json", "r", encoding=("utf-8")) as fp:
index_to_name_no_the=json.load(fp)
with open("steamprofiler.json", "r", encoding=("utf-8")) as fp:
steamprofiler=json.load(fp)
with open("itemdb.json", "r", encoding=("utf-8")) as fp:
players=json.load(fp)
error=list()
playerinventories=dict()
c=127480
while c<len(steamprofiler):
inventory=dict()
items=list()
try:
url=urllib.request.urlopen("http://api.steampowered.com/IEconItems_440/GetPlayerItems/v0001/?key=DD5180808208B830FCA60D0BDFD27E27&steamid="+steamprofiler[c]+"&format=json")
inv=json.loads(url.read().decode("utf-8"))
url.close()
except (urllib.error.HTTPError, urllib.error.URLError, socket.error, UnicodeDecodeError) as e:
c+=1
print("HTTP-error, continuing")
error.append(c)
continue
try:
for r in inv["result"]["items"]:
inventory[r["id"]]=r["quality"], r["defindex"]
except KeyError:
c+=1
error.append(c)
continue
for key in inventory:
try:
if index_to_quality[str(inventory[key][0])]=="":
items.append(
index_to_quality[str(inventory[key][0])]
+""+
index_to_name[str(inventory[key][1])]
)
else:
items.append(
index_to_quality[str(inventory[key][0])]
+" "+
index_to_name_no_the[str(inventory[key][1])]
)
except KeyError:
print("keyerror, uppdate def_to_index")
c+=1
error.append(c)
continue
playerinventories[int(steamprofiler[c])]=items
c+=1
if c % 10==0:
print(c, "inventories downloaded")
I hope my problem was clear; otherwise just say so. I would ideally avoid using 3rd-party libraries, but if it's not possible, it's not possible. Thanks in advance.
So you're assuming the fetching of the URL might be the thing slowing your program down? You'd do well to check that assumption first, but if it's indeed the case using the multiprocessing module is a huge overkill: for I/O bound bottlenecks threading is quite a bit simpler and might even be a bit faster (it takes a lot more time to spawn another python interpreter than to spawn a thread).
Looking at your code, you might get away with sticking most of the content of your while loop in a function with c as a parameter, and starting a thread from there using another function, something like:
import threading

def process_item(c):
    # The work goes here.
    # Replace all those 'continue' statements with 'return'.
    ...

for c in range(127480, len(steamprofiler)):
    thread = threading.Thread(name="inventory {0}".format(c),
                              target=process_item, args=[c])
    thread.start()
A real problem might be that there's no limit to the number of threads being spawned, which might break the program. Also the guys at Steam might not be amused at getting hammered by your script, and they might decide to un-friend you.
A better approach would be to fill a collections.deque object with your list of c's and then start a limited set of threads to do the work:
import collections
import threading

def process_item(c):
    # The work goes here.
    # Replace all those 'continue' statements with 'return'.
    ...

def process():
    while True:
        process_item(work.popleft())

work = collections.deque(range(127480, len(steamprofiler)))
threads = [threading.Thread(name="worker {0}".format(n), target=process)
           for n in range(6)]
for worker in threads:
    worker.start()
Note that I'm counting on work.popleft() to throw an IndexError when we're out of work, which will kill the thread. That's a bit sneaky, so consider using a try...except instead.
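For example, the worker could handle an exhausted deque explicitly rather than letting the IndexError kill the thread (same assumptions as the sketch above):
def process():
    while True:
        try:
            c = work.popleft()
        except IndexError:
            return               # queue drained, let this thread exit cleanly
        process_item(c)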
Two more things:
Consider using the excellent Requests library instead of urllib (which, API-wise, is by far the worst module in the entire Python standard library that I've worked with).
For Requests, there's an add-on called grequests which allows you to do fully asynchronous HTTP-requests. That would have made for even simpler code.
I hope this helps, but please keep in mind this is all untested code.
The outermost while loop is the part that should be distributed over a few processes (or tasks).
When you break the loop into tasks, note that you are sharing the playerinventories and error objects between processes. You will need multiprocessing.Manager to share them safely.
I recommend you start modifying your code from a snippet like this.
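A rough, untested sketch of that idea (fetch_inventory() stands in for the body of your while loop; playerinventories and error become Manager proxies, and steamprofiler is the list your script already loads):
import multiprocessing

def fetch_inventory(steamid, playerinventories, error):
    try:
        items = []
        # ... download and parse this player's inventory exactly as in the
        # original while-loop body, appending to items ...
        playerinventories[int(steamid)] = items
    except Exception:
        error.append(steamid)

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    playerinventories = manager.dict()   # proxies shared across processes
    error = manager.list()
    pool = multiprocessing.Pool(processes=6)
    for steamid in steamprofiler[127480:]:
        pool.apply_async(fetch_inventory, (steamid, playerinventories, error))
    pool.close()
    pool.join()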
I'm writing an SSH communicator class in Python. I already have a Telnet communicator class, and the SSH class needs to offer the same functions. The Telnet communicator has read_until and read_very_eager functions.
read_until : Read until a given string is encountered or until timeout.
read_very_eager : Read everything that's possible without blocking in I/O (eager).
I couldn't find these functions for the SSH communicator. Any ideas?
You didn't state it in the question, but I am assuming you are using Paramiko as per the tag.
read_until: Read until a given string is encountered or until timeout.
This seems like a very specialized function for a particular high level task. I think you will need to implement this one. You can set a timeout using paramiko.Channel.settimeout and then read in a loop until you get either the string you want or a timeout exception.
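One rough (untested) way to build that, assuming chan is an already-open paramiko.Channel:
import socket

def read_until(chan, expected, timeout=10.0):
    chan.settimeout(timeout)
    buf = b''
    try:
        while expected not in buf:
            chunk = chan.recv(1024)
            if not chunk:        # the remote side closed the channel
                break
            buf += chunk
    except socket.timeout:
        pass                     # timed out; return whatever arrived so far
    return buf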
read_very_eager: Read everything that's possible without blocking in I/O (eager).
Paramiko doesn't directly provide this, but it does provide primitives for non-blocking I/O and you can easily put this in a loop to slurp in everything that's available on the channel. Have you tried something like this?
import socket

channel.settimeout(0.0)   # a timeout of 0.0 puts the channel in non-blocking mode
resultlist = []
while True:
    try:
        chunk = channel.recv(1024)
    except socket.timeout:
        break                 # nothing more to read right now
    if not chunk:             # the remote side closed the channel
        break
    resultlist.append(chunk)
return ''.join(resultlist)
Hi there, I was searching for a solution to the same problem.
I think this might help you.
One observation: tell me if you find a better solution.
I don't get any output if I remove the recv_exit_status() call below.
I was originally printing its return value to check the status; later I found that recv_exit_status() has to be called at all for this code to work.
import paramiko, sys

trans = paramiko.Transport((host, 22))
trans.connect(username=user, password=passwd)
session = trans.open_channel("session")
session.exec_command('grep -rE print .')
session.recv_exit_status()
while session.recv_ready():
    temp = session.recv(1024)
    print temp
1. read_until: scan the received data for the string you are looking for and break out of the loop when it appears.
2. read_very_eager: use the above-mentioned code.
Here is a sample Python script. How do I run it multiple times from the command line so that the import line is not executed every time? The import takes too long.
import arcpy
val = arcpy.GetCellValue_management("D:\dem-merged\lidar_wsg84", "-95.090174910630012 29.973962146120652", "")
print str(val)
This problem has no solution if you strictly want the script to be called from another program by issuing 'python script.py' on the command line.
If you want to do the "heavy import" only once, you have to start the Python script only once.
Think about starting a daemon, which starts once and then processes calls from other programs. This way all initialization has to be done only one time, and subsequent calls will be fast.
If you split your Python code into two parts (one for the daemon, one for the daemon client), you'll be able to call 'python client.py' from another program, but the actual computation will be performed by the daemon, which is started just once.
As example:
daemon.py
import socket
#import arcpy

def actual_work():
    #val = arcpy.GetCellValue_management("D:\dem-merged\lidar_wsg84", "-95.090174910630012 29.973962146120652", "")
    #return str(val)
    return 'dummy_reply'

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(('127.0.0.1', 6666))
        while True:
            data, addr = sock.recvfrom(4096)
            reply = actual_work()
            sock.sendto(reply, addr)
    except KeyboardInterrupt:
        pass
    finally:
        sock.close()

if __name__ == '__main__':
    main()
client.py
import socket
import sys

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1)
    try:
        sock.sendto('', ('127.0.0.1', 6666))
        reply, _ = sock.recvfrom(4096)
        print reply
    except socket.timeout:
        sys.exit(1)
    finally:
        sock.close()

if __name__ == '__main__':
    main()
It's virtually impossible. Once you leave the interpreter, the modules that were imported are no longer in memory. It's similar to asking Firefox to keep large webpages in memory because reading them back from the cache takes too long: once Firefox (or Python) is shut down, it's pretty much bye-bye to anything in RAM.
You can make the load time a little faster, but at your own risk. By running
python -O
you can make it go a bit faster (this runs with basic optimizations and strips assert statements). You can also add another 'O' (-OO, which additionally strips docstrings) to squeeze out a bit more. However, this can make some programs buggy and doesn't always work.
You could copy the functions you need into your program by doing
from arcpy import <what you need>
and that might make things go slightly faster.
As far as I know the module gets imported once. So if you do:
import a
import a
it only gets imported once. So instead of running the script many times, maybe you can change it to make all the copies in one go.
If you have to run this specific script many times, I think you can't avoid the import and you'll have to import it every time.
One solution I can think of is to have a persistently running server process that does the actual work, while the script actually invoked from the command line merely issues requests to that server. This is a fair bit of work, but it may be worth it.
The only solution I can think of is to copy the individual function(s) you need into your code manually, if what you need to execute is small enough.
If you need help on how to do this, just ask in the comments.
Looking at your use case (calling it from a Ruby on Rails webservice), one of the easiest ways would be to use XML-RPC. Use the SimpleXMLRPCServer from the python standard lib, and then use a ruby client (ruby seems to have xmlrpc in the standard lib)?
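A hedged sketch of the Python side (the port, endpoint, and function name are made up; the Rails side would talk to it with Ruby's XMLRPC client):
from SimpleXMLRPCServer import SimpleXMLRPCServer   # xmlrpc.server on Python 3
import arcpy   # the expensive import happens once, when the server starts

def get_cell_value(point):
    # point is e.g. "-95.090174910630012 29.973962146120652"
    val = arcpy.GetCellValue_management("D:\\dem-merged\\lidar_wsg84", point, "")
    return str(val)

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(get_cell_value, "get_cell_value")
server.serve_forever()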
Easy.
Write your own simple shell using the cmd module and use the runpy module to run your scripts. Import your big module in the shell program and pass it to the scripts using init_globals.
Look through the docs for http://pypi.python.org/pypi/cmd2/ and it should be fairly clear how you can write your own simple shell, even if it just has two commands, one to edit a file and one to run it.
runpy is part of the Python standard library http://docs.python.org/library/runpy.html and you may not need it, but it is useful to know that the import and module loading mechanism can be controlled and even modified by your command shell.
Have you ever wondered where the name "var1" goes when you execute something like var1 = 25? How does Python find what var1 refers to when you later execute print var1? The answer is that these names are in a dictionary and if you understand what Python dictionaries are and what they can do, it seems like an obvious solution to the problem of connecting names with values. But there's more. Python can have lots of namespaces and you can manipulate those namespaces the same way you manipulate dictionaries. Read this http://www.diveintopython.net/html_processing/locals_and_globals.html to understand the locals and globals namespace. Here is another discussion that will help http://lucumr.pocoo.org/2011/2/1/exec-in-python/
Play around with exec like in this question globals and locals in python exec() until you understand how it works. Then build your command shell to import the module one time at the beginning, and write your scripts to only import the module if it is not already available. When the script is run from inside your shell, the module will already be there.
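A bare-bones, untested sketch of such a shell (the command names are illustrative):
import cmd
import runpy
import arcpy   # the heavy import happens exactly once, when the shell starts

class ArcShell(cmd.Cmd):
    prompt = 'arc> '

    def do_run(self, path):
        """run SCRIPT.PY -- execute a script with arcpy already in its globals"""
        runpy.run_path(path, init_globals={'arcpy': arcpy})

    def do_quit(self, line):
        return True

if __name__ == '__main__':
    ArcShell().cmdloop()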