I was reading through a proxy server written in Python.
I don't understand the method def _read_write, which uses select to relay data between the client and server sockets.
def _read_write(self):
    time_out_max = self.timeout/3
    socs = [self.client, self.target]
    count = 0
    while 1:
        count += 1
        (recv, _, error) = select.select(socs, [], socs, 3)
        if error:
            break
        if recv:
            for in_ in recv:
                data = in_.recv(BUFLEN)
                if in_ is self.client:
                    out = self.target
                else:
                    out = self.client
                if data:
                    out.send(data)
                    count = 0
        if count == time_out_max:
            break
Could someone please help me understand it?
Here is my quick and dirty annotation:
def _read_write(self):
    # This allows us to get multiple
    # lower-level timeouts before we give up.
    # (but see later note about Python 3)
    time_out_max = self.timeout/3
    # We have two sockets we care about
    socs = [self.client, self.target]
    # Loop until error or timeout
    count = 0
    while 1:
        count += 1
        # select is very efficient. It will let
        # other processes execute until we have
        # data or an error.
        # We only care about receive and error
        # conditions, so we pass in an empty list
        # for transmit, and assign transmit results
        # to the _ variable to ignore.
        # We also pass a timeout of 3 seconds, which
        # is why it's OK to divide the timeout value
        # by 3 above.
        # Note that select doesn't read anything for
        # us -- it just blocks until data is ready.
        (recv, _, error) = select.select(socs, [], socs, 3)
        # If we have an error, break out of the loop
        if error:
            break
        # If we have receive data, it's from the client
        # for the target, or the other way around, or
        # even both. Loop through and deal with whatever
        # receive data we have and send it to the other
        # port.
        # BTW, "if recv" is redundant here -- (a) in
        # general (except for timeouts) we'll have
        # receive data here, and (b) the for loop won't
        # execute if we don't.
        if recv:
            for in_ in recv:
                # Read data up to a max of BUFLEN,
                data = in_.recv(BUFLEN)
                # Dump the data out the other side.
                # Indexing probably would have been
                # more efficient than this if/else
                if in_ is self.client:
                    out = self.target
                else:
                    out = self.client
                # I think this may be a bug. IIRC,
                # send is not required to send all the
                # data, but I don't remember and cannot
                # be bothered to look it up right now.
                if data:
                    out.send(data)
                    # Reset the timeout counter.
                    count = 0
        # This is ugly -- should be >=, then it might
        # work even on Python 3...
        if count == time_out_max:
            break
    # We're done with the loop and exit the function on
    # either a timeout or an error.
The second 'if' statement midway through this code uses an 'or' between two conditions, and that is what is causing my issue; I just don't know how to get around it. The code goes through a data file and turns on the given relay number at a specific time, and I need it to do this only once per given relay. If I use an 'and' between the conditions, it will only turn on the first relay that matches the current time and then wait for the next hour before turning on the next given relay.
Could someone suggest something to fix this issue? Thank you!
def schedule():
    metadata, sched = dbx.files_download(path=RELAYSCHEDULE)
    if not sched.content:
        pass  # If file is empty then exit routine
    else:
        relaySchedule = str(sched.content)
        commaNum = relaySchedule.count(',')
        data1 = relaySchedule.split(',')
        for i in range(commaNum):
            data2 = data1[i].split('-')
            Time1 = data2[1]
            currentRN = data2[0]
            currentDT = datetime.datetime.now()
            currentHR = currentDT.hour
            global RN
            global T
            if str(currentHR) == str(Time1):
                if T != currentHR or RN != currentRN:
                    relaynum = int(data2[0])
                    relaytime = int(data2[2])
                    T = currentHR
                    RN = currentRN
                    k = threading.Thread(target=SendToRelay(relaynum, relaytime)).start()
                else:
                    print("Pass")
Desired Inputs:
sched.content = '1-19-10,3-9-20,4-9-10,'
T = ' '
RN = ' '
T and RN are global variables because the loop is running indefinitely, they're there to let the loop know whether the specific Time(T) and Relay Number(RN) have already been used.
Desired Outputs:
If the time is 9 AM then,
T = 9
RN should be whatever the given relay number is, so RN = 3, but I'm not sure this is the right thing to use.
Sorry if this is confusing. I basically need the program to read a set of scheduled times for specific relays to turn on. It should read the current time and, if it matches a time in the schedule, check which relay falls within that time and turn it on for however long. Once that is done, it needs to go over the same set of data again in case another relay within the same time also needs to turn on. The issue is that if I don't use the T and RN variables to check whether a previous relay has already been handled, it will read the file and turn on the same relay over and over.
Try printing all the variables you use and check that everything is what you think it is. On top of that, whitespace characters sometimes cause problems with comparisons.
I fixed it. For anyone wondering, this is the new working code:
def schedule():
    metadata, sched = dbx.files_download(path=RELAYSCHEDULE)
    if not sched.content:
        pass  # If file is empty then exit routine
    else:
        relaySchedule = str(sched.content)
        commaNum = relaySchedule.count(',')
        data1 = relaySchedule.split(',')
        for i in range(commaNum):
            data2 = data1[i].split('-')
            TimeSched = data2[1]
            relaySched = data2[0]
            currentDT = datetime.datetime.now()
            currentHR = currentDT.hour
            global RN
            global T
            if str(currentHR) == str(TimeSched):
                if str(T) != str(currentHR):
                    RN = ''
                    T = currentHR
                if str(relaySched) not in str(RN):
                    relaynum = int(data2[0])
                    relaytime = int(data2[2])
                    k = threading.Thread(target=SendToRelay(relaynum, relaytime)).start()
                    RN = str(RN) + str(relaySched)
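One thing worth flagging in this version (my observation, not part of the fix above): threading.Thread(target=SendToRelay(relaynum, relaytime)) calls SendToRelay immediately, in the main loop, and hands its return value to the Thread as the target, so the relay work never actually runs in the background thread. Passing the callable and its arguments separately avoids that. A tiny self-contained sketch with a stand-in SendToRelay:

import threading
import time

def SendToRelay(relaynum, relaytime):              # stand-in for the real function
    print("relay", relaynum, "on for", relaytime, "s")
    time.sleep(relaytime)

# Calls SendToRelay right here, then starts a thread whose target is None:
threading.Thread(target=SendToRelay(2, 1)).start()

# Passes the function and its arguments so the call happens inside the thread:
threading.Thread(target=SendToRelay, args=(2, 1)).start()

Separately, the substring test str(relaySched) not in str(RN) can misfire once relay numbers share digits (for example '1' is a substring of '10'); keeping the fired relay numbers in a set or list and testing exact membership avoids that.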
I have an application with one producer and many consumers, and a queue that connects them.
Each consumer should collect some data from the queue, say qsize()/number_of_consumers items, but it must stop its work when a sentinel appears.
I have code like this:
frame = 0
elems_max = 10
while frame is not None:
    frames = []
    for _ in range(elems_max):
        frame = queue_in.get()
        if frame:
            frames.append(frame)
        else:
            break
    process_data(frames)
As you can see, None is the sentinel for this queue, and when it appears I want to break out of my working process. I also want to get more than one element per round of data processing.
What is the fastest method to achieve this (in Python 3.5)?
I understand that you want to break your outer while when encountering a None.
You can keep a boolean variable that is True while the while loop should keep running, and False when it should stop.
This would look like this:
frame = 0
elems_max = 10
running = True
while running and frame is not None:
    frames = []
    for _ in range(elems_max):
        frame = queue_in.get()
        if frame is not None:
            frames.append(frame)
        else:
            running = False
            break
    process_data(frames)
The break instruction will break the inner for, but not the outer while.
However, having set running to False, the while loop will stop.
Based on your comment: it is not possible to include a break statement in a comprehension, nor an else clause, as you wanted to do:
frames = [f for i in range(elems_max) if queue_in.get() is not None else break]
However, you can build your list, and then remove all the elements after a None:
frames = [queue_in.get() for _ in range(elems_max)]
try:
    noneId = frames.index(None)
    frames = frames[:noneId]
except ValueError:
    pass
This is not very efficient, because potentially many elements will be pulled from the queue and put into frames only to be thrown away.
I would prefer a manual construction, to avoid this hazard.
One more solution, based on a generator.
This might not be what you expected, but the syntax is rather simple, so you may like it.
The idea is to wrap the getting of the data inside of a generator, that breaks on a None value:
def queue_data_generator(queue, count):
    for _ in range(count):
        item = queue.get()
        if item is None:
            return  # ends the generator; raising StopIteration here would break on Python 3.7+ (PEP 479)
        else:
            yield item
Then, instantiate this generator, and simply iterate over it:
g = queue_data_generator(queue_in, elems_max)
frames = [frame for frame in g]
The frames list will contain all the frames contained in queue_in, until the first None.
The usage is rather simple, but you have to set it up by defining the generator.
I think it's pretty elegant though.
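For completeness, the standard library can express the same idea without writing a generator by hand: the two-argument form of iter() turns queue_in.get into an iterator that stops at the first None, and itertools.islice caps each batch at elems_max. A minimal sketch (the consume wrapper is my own name; queue_in, elems_max and process_data are as in the question):

from itertools import islice

def consume(queue_in, elems_max, process_data):
    # iter(callable, sentinel) calls queue_in.get() repeatedly and stops
    # for good once it returns None.
    it = iter(queue_in.get, None)
    while True:
        # Take at most elems_max frames; since get() blocks, an empty
        # batch can only mean the sentinel was reached.
        frames = list(islice(it, elems_max))
        if not frames:
            break
        process_data(frames)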
I would do the following (kind of pseudocode):
class CInputQueue:
    def get(self, preferred_N):
        # do sync stuff
        # take <= N elements (you can kind of balance the load this way)
        # or throw an exception
        raise Exception("No data, no work, no life.")

elems_max = 10
try:
    while True:
        process_data(queue_in.get(elems_max))
except:
    None  # break
I assume that data processing takes much more time than 0 ms, so I use an exception. I know it's not okay to use exceptions for flow control, but for the worker this really is an exception: its "life" is built around processing data, and here there is no work for it, not even a sleep task.
I am trying to design an async pipeline that makes it easy to build a data processing pipeline. The pipeline is composed of several functions. Input data goes in at one end of the pipeline and comes out at the other end.
I want to design the pipeline in such a way that:
Additional functions can be inserted into the pipeline
Functions already in the pipeline can be popped out.
Here is what I came up with:
import asyncio

@asyncio.coroutine
def add(x):
    return x + 1

@asyncio.coroutine
def prod(x):
    return x * 2

@asyncio.coroutine
def power(x):
    return x ** 3

def connect(funcs):
    def wrapper(*args, **kwargs):
        data_out = yield from funcs[0](*args, **kwargs)
        for func in funcs[1:]:
            data_out = yield from func(data_out)
        return data_out
    return wrapper

pipeline = connect([add, prod, power])
input = 1
output = asyncio.get_event_loop().run_until_complete(pipeline(input))
print(output)
This works, of course, but the problem is that if I want to add another function into (or pop out a function from) this pipeline, I have to disassemble and reconnect every function again.
I would like to know if there is a better scheme or design pattern to create such a pipeline?
I've done something similar before, using just the multiprocessing library. It's a bit more manual, but it gives you the ability to easily create and modify your pipeline, as you've requested in your question.
The idea is to create functions that can live in a multiprocessing pool, and their only arguments are an input queue and an output queue. You tie the stages together by passing them different queues. Each stage receives some work on its input queue, does some more work, and passes the result out to the next stage through its output queue.
The workers spin on trying to get something from their queues, and when they get something, they do their work and pass the result to the next stage. All of the work ends by passing a "poison pill" through the pipeline, causing all stages to exit:
This example just builds a string in multiple work stages:
import multiprocessing as mp

POISON_PILL = "STOP"

def stage1(q_in, q_out):
    while True:
        # get either work or a poison pill from the previous stage (or main)
        val = q_in.get()
        # check to see if we got the poison pill - pass it along if we did
        if val == POISON_PILL:
            q_out.put(val)
            return
        # do stage 1 work
        val = val + "Stage 1 did some work.\n"
        # pass the result to the next stage
        q_out.put(val)

def stage2(q_in, q_out):
    while True:
        val = q_in.get()
        if val == POISON_PILL:
            q_out.put(val)
            return
        val = val + "Stage 2 did some work.\n"
        q_out.put(val)

def main():
    pool = mp.Pool()
    manager = mp.Manager()

    # create managed queues
    q_main_to_s1 = manager.Queue()
    q_s1_to_s2 = manager.Queue()
    q_s2_to_main = manager.Queue()

    # launch workers, passing them the queues they need
    results_s1 = pool.apply_async(stage1, (q_main_to_s1, q_s1_to_s2))
    results_s2 = pool.apply_async(stage2, (q_s1_to_s2, q_s2_to_main))

    # Send a message into the pipeline
    q_main_to_s1.put("Main started the job.\n")

    # Wait for work to complete
    print(q_s2_to_main.get()+"Main finished the job.")

    q_main_to_s1.put(POISON_PILL)

    pool.close()
    pool.join()

    return

if __name__ == "__main__":
    main()
The code produces this output:
Main started the job.
Stage 1 did some work.
Stage 2 did some work.
Main finished the job.
You can easily put more stages in the pipeline or rearrange them just by changing which functions get which queues. I'm not very familiar with the asyncio module, so I can't speak to what capabilities you would be losing by using the multiprocessing library instead, but this approach is very straightforward to implement and understand, so I like its simplicity.
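For instance (a sketch of my own, not part of the answer above), a hypothetical stage3 written in the same shape as stage1 and stage2 can be spliced in between stage 2 and main just by adding one more queue and re-wiring the ends:

def stage3(q_in, q_out):
    # same POISON_PILL convention as the other stages
    while True:
        val = q_in.get()
        if val == POISON_PILL:
            q_out.put(val)
            return
        val = val + "Stage 3 did some work.\n"
        q_out.put(val)

# Inside main(), the only changes would be along these lines:
#     q_s2_to_s3 = manager.Queue()
#     q_s3_to_main = manager.Queue()
#     results_s2 = pool.apply_async(stage2, (q_s1_to_s2, q_s2_to_s3))
#     results_s3 = pool.apply_async(stage3, (q_s2_to_s3, q_s3_to_main))
# and the final result is then read from q_s3_to_main instead of q_s2_to_main.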
I don't know if it is the best way to do it, but here is my solution.
While I think it's possible to control a pipeline using a list or a dictionary, I found it easier and more efficient to use a generator.
Consider the following generator:
def controller():
old = value = None
while True:
new = (yield value)
value = old
old = new
This is basically a one-element queue: it stores the value that you send to it and releases it at the next call of send (or next).
Example:
>>> c = controller()
>>> next(c) # prime the generator
>>> c.send(8) # send a value
>>> next(c) # pull the value from the generator
8
By associating every coroutine in the pipeline with its own controller, we get an external handle that we can use to set the target of each one. We just need to define our coroutines so that they pull their new target from the controller on every cycle.
Now consider the following coroutines:
def source(controller):
while True:
target = next(controller)
print("source sending to", target.__name__)
yield (yield from target)
def add():
return (yield) + 1
def prod():
return (yield) * 2
The source is a coroutine that does not return, so it will not terminate itself after the first cycle. The other coroutines are "sinks" and do not need a controller.
You can use these coroutines in a pipeline as in the following example. We initially set up a route source --> add and after receiving the first result we change the route to source --> prod.
# create a controller for the source and prime it
cont_source = controller()
next(cont_source)
# create three coroutines
# associate the source with its controller
coro_source = source(cont_source)
coro_add = add()
coro_prod = prod()
# create a pipeline
cont_source.send(coro_add)
# prime the source and send a value to it
coro_source.send(None)
print("add =", coro_source.send(4))
# change target of the source
cont_source.send(coro_prod)
# reset the source, send another value
coro_source.send(None)
print("prod =", coro_source.send(8))
Output:
source sending to add
add = 5
source sending to prod
prod = 16
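To show how little is needed to route to another stage, the power function from the question can be written in the same sink style and targeted the same way (continuing the session above; this is my extension of the example, not part of the original answer):

def power():
    return (yield) ** 3

coro_power = power()

# point the source at the new sink and push a value through it
cont_source.send(coro_power)
coro_source.send(None)
print("power =", coro_source.send(2))

which should print source sending to power followed by power = 8.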
I'm having trouble in writing a script for reading some temperature sensors (DS18B20) on a raspberry pi.
I have a working script, but sometimes the sensors drop out and then the script stops as well.
I'm trying to make a more robust version by integrating a try-except statement. The goal is to move on to the next sensor in the range if one of the sensors doesn't respond. If I emulate sensor failure by unplugging one of the sensors, the script stops taking measurements for all the sensors (instead of only for the one that was unplugged), and it doesn't give me an error. Any ideas?
This is the part of the script with the try statement:
if time.time() <= timeout:
    for index in range (numsensors):
        try:
            def read_temp_raw(): # gets the temps one by one
                f = open(device_file[index], 'r')
                lines = f.readlines()
                f.close()
                return lines

            def read_temp(): # checks the received temperature for errors
                lines = read_temp_raw()
                while lines[0].strip()[-3:] != 'YES':
                    time.sleep(0.2)
                    lines = read_temp_raw()
                equals_pos = lines[1].find('t=')
                if equals_pos != -1:
                    temp_string = lines[1][equals_pos+2:]
                    # set proper decimal place for deg C
                    temp = float(temp_string) / 1000.0
                    # Round temp to x decimal points --> round(temp,x)
                    temp = round(temp, 2)
                    return temp

            reading = (read_temp())
            temp[index].append(reading)
            print device[index], "=", temp[index]
            continue
        except IOError:
            print "Error"
"What has been asked" inventory:
Is using try-except construct making underlying system more
robust?
Why is the code not giving any error on indicated sensor-failure?
A1:
The try-except clause sounds like a self-explanatory, life-saving package; however, it is not.
One has to fully understand what sorts of exception types the code is expected to meet face-to-face and how to handle each of them. Naive or erroneous use of this syntax construct effectively masks the rest of the exceptions from your debugging radar screen, leaving the unhandled cases to fail in dark silence, out of your control and without your knowing about them at all. True "Robustness" and "Failure Resilience" are something else than this.
This code sample leaves hidden all real-life collisions except the only one listed, the IOError; and if that one does not happen, all the others that do happen are not handled:
if time.time() <= timeout:              # START if .time() is still before a T/O
    for index in range (numsensors):    #   ITERATE over all sensors
        try:                            #     TRY:
            <<<something>>>             #       <<<something>>>
        except IOError:                 #     EXC IOError:
            <<<IOError__>>>             #       Handle EXC.IOError
Presumably all your def ...(): definitions belong in a non-repeating section of the code, prior to the if:/for:, since you need not modify the code "on-the-fly", do you?
def read_temp_raw():                          # DEF: gets the temps one by one
    f = open(device_file[index], 'r')         #      SET aFileHANDLE access to IO.DEV ( beware var index "visibility" )
    lines = f.readlines()                     #      IO.DEV.READ till <EoF>
    f.close()                                 #      IO.DEV.CLOSE
    return lines                              #      RET all lines

def read_temp():                              # DEF: checks the received temperature for errors
    lines = read_temp_raw()                   #      GET lines from read_temp_raw()
    while lines[0].strip()[-3:] != 'YES':     #      WHILE last-3Bytes of 1st-line != "YES"
        time.sleep(0.2)                       #            NOP/Sleep()
        lines = read_temp_raw()               #            GET lines again (beware var index)
    equals_pos = lines[1].find('t=')          #      SET position of 't=' in 2nd-line
    if equals_pos != -1:                      #      IF position != -1
        temp_string = lines[1][equals_pos+2:]
        temp = float(temp_string) \
               / 1000.0                       #         DIV( 1000 ) decimal place for deg C
        temp = round(temp, 2)                 #         ROUND temp to x decimal points --> round(temp,x)
        return temp                           #         RET->
    # --------------------------------------- #      ELSE: re-loop in WHILE
# ------------------------------------------- # LOOP AGAIN AD INFIMUM
A2: your try-except clause in the posted code expects only one kind of exception to be handled by it -- the IOError -- which is raised only when an actual IO.DEV operation fails for an I/O-related reason. That does not cover the case where you physically "un-plug" a sensor while the IO.DEV is still present and can carry out its IO.DEV.READ(s), so no exceptions.EnvironmentError.IOError gets raise-d.
That means the IO.DEV.READ(s) take place and the code, as the condition WHILE last-3Bytes of 1st-line dictates, ends up in an endless loop, because the 1st-line "still" does not end with "YES".
Q.E.D.
The Goal focus
Coming back to the issue, you would rather set up a safer test for the real-world case where erroneous input may appear during your sensor-network scan.
The principle may look like:
    f.close()                                 # IO.DEV.CLOSE
    if ( len(lines) < 2 ):                    # IF <<lines>> are nonsense:
        return [ "NULL_LENGTH_READING still with CODE EXPECTED ACK-SIG-> YES", \
                 "SIG ERROR FOR POST-PROCESSOR WITH AN UNREALISTIC VALUE t=-99999999" \
                 ]
    return( lines )                           # OTHERWISE RET( lines )
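Along the same lines, here is a minimal sketch of my own (assuming the device_file, device, temp and numsensors variables from the question, and that time is imported) that bounds the retry loop, so a sensor that never answers "YES" becomes a reported, skippable event instead of stalling the whole scan:

def read_temp_raw(index):
    # read one sensor's raw output; the caller decides what to do on failure
    f = open(device_file[index], 'r')
    lines = f.readlines()
    f.close()
    return lines

def read_temp(index, max_retries=10):
    # return a rounded temperature in deg C, or None if the sensor never
    # delivers a valid "YES"-terminated reading within max_retries attempts
    for _ in range(max_retries):
        lines = read_temp_raw(index)
        if len(lines) >= 2 and lines[0].strip()[-3:] == 'YES':
            equals_pos = lines[1].find('t=')
            if equals_pos != -1:
                return round(float(lines[1][equals_pos + 2:]) / 1000.0, 2)
        time.sleep(0.2)
    return None                                  # give up on this sensor only

for index in range(numsensors):
    reading = read_temp(index)
    if reading is None:
        print "Sensor %d gave no valid reading, skipping" % index
    else:
        temp[index].append(reading)
        print device[index], "=", temp[index]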
I am writing some code to build a table of variable-length (Huffman) codes, and I wanted to use the multiprocessing module for fun. The idea is to have each process try to get a node from the queue. They do work on the node, and either put that node's two children back into the work queue, or put the variable-length code into the result queue. They are also passing messages to a message queue, which should be printed by a thread in the main process. Here is the code so far:
import Queue
import multiprocessing as mp
from threading import Thread
from collections import Counter, namedtuple

Node = namedtuple("Node", ["child1", "child2", "weight", "symbol", "code"])

def _sort_func(node):
    return node.weight

def _encode_proc(proc_number, work_queue, result_queue, message_queue):
    while True:
        try:
            #get a node from the work queue
            node = work_queue.get(timeout=0.1)
            #if it is an end node, add the symbol-code pair to the result queue
            if node.child1 == node.child2 == None:
                message_queue.put("Symbol processed! : proc%d" % proc_number)
                result_queue.put({node.symbol:node.code})
            #otherwise do some work and add some nodes to the work queue
            else:
                message_queue.put("More work to be done! : proc%d" % proc_number)
                node.child1.code.append(node.code + '0')
                node.child2.code.append(node.code + '1')
                work_queue.put(node.child1)
                work_queue.put(node.child2)
        except Queue.Empty: #everything is probably done
            return

def _reporter_thread(message_queue):
    while True:
        try:
            message = message_queue.get(timeout=0.1)
            print message
        except Queue.Empty: #everything is probably done
            return

def _encode_tree(tree, symbol_count):
    """Uses multiple processes to walk the tree and build the huffman codes."""
    #Create a manager to manage the queues, and a pool of workers.
    manager = mp.Manager()
    worker_pool = mp.Pool()
    #create the queues you will be using
    work = manager.Queue()
    results = manager.Queue()
    messages = manager.Queue()
    #add work to the work queue, and start the message printing thread
    work.put(tree)
    message_thread = Thread(target=_reporter_thread, args=(messages,))
    message_thread.start()
    #add the workers to the pool and close it
    for i in range(mp.cpu_count()):
        worker_pool.apply_async(_encode_proc, (i, work, results, messages))
    worker_pool.close()
    #get the results from the results queue, and update the table of codes
    table = {}
    while symbol_count > 0:
        try:
            processed_symbol = results.get(timeout=0.1)
            table.update(processed_symbol)
            symbol_count -= 1
        except Queue.Empty:
            print "WAI DERe NO SYMBOLzzzZzz!!!"
        finally:
            print "Symbols to process: %d" % symbol_count
    return table

def make_huffman_table(data):
    """
    data is an iterable containing the string that needs to be encoded.
    Returns a dictionary mapping symbols to codes.
    """
    #Build a list of Nodes out of the characters in data
    nodes = [Node(None, None, weight, symbol, bytearray()) for symbol, weight in Counter(data).items()]
    nodes.sort(reverse=True, key=_sort_func)
    symbols = len(nodes)
    append_node = nodes.append
    while len(nodes) > 1:
        #make a new node out of the two nodes with the lowest weight and add it to the list of nodes.
        child2, child1 = nodes.pop(), nodes.pop()
        new_node = Node(child1, child2, child1.weight+child2.weight, None, bytearray())
        append_node(new_node)
        #then resort the nodes
        nodes.sort(reverse=True, key=_sort_func)
    top_node = nodes[0]
    return _encode_tree(top_node, symbols)

def chars(fname):
    """
    A simple generator to make reading from files without loading them
    totally into memory a simple task.
    """
    f = open(fname)
    char = f.read(1)
    while char != '':
        yield char
        char = f.read(1)
    f.close()
    raise StopIteration

if __name__ == "__main__":
    text = chars("romeo-and-juliet.txt")
    table = make_huffman_table(text)
    print table
The current output of this is:
More work to be done! : proc0
WAI DERe NO SYMBOLzzzZzz!!!
Symbols to process: 92
WAI DERe NO SYMBOLzzzZzz!!!
Symbols to process: 92
WAI DERe NO SYMBOLzzzZzz!!!
Symbols to process: 92
It just repeats the last bit forever. After the first process reports more work to be done on the node, everything just stops. Why is that? Am I misunderstanding or misusing queues? Sorry for all the code to read.
Your first problem is trying to use timeouts. They're almost never a good idea. They may be a good idea if you can't possibly think of a reliable way to do something efficiently, and you use timeouts only as a first step in checking whether something is really done.
That said, the primary problem is that multiprocessing is often very bad at reporting exceptions that occur in worker processes. Your code is actually dying here:
node.child1.code.append(node.code + '0')
The error message you're not seeing is "an integer or string of size 1 is required". You can't append a bytearray to a bytearray. You want to do :
node.child1.code.extend(node.code + '0')
                 ^^^^^^
instead, and in the similar line for child2. As is, because the first worker process to take something off the work queue dies, nothing more is ever added to the work queue. That explains everything you've seen - so far ;-)
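To see that failure in isolation, here is a quick Python 2 interactive sketch of my own (node.code starts life as an empty bytearray, exactly as in the question):

>>> code = bytearray()
>>> code.append(bytearray() + '0')     # what the original line hands to append()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer or string of size 1 is required
>>> code.extend(bytearray() + '0')     # extend() accepts the whole sequence
>>> code
bytearray(b'0')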
No timeouts
FYI, the usual approach to avoid timeouts (which are flaky - unreliable) is to put a special sentinel value on a queue. Consumers know it's time to quit when they see the sentinel, and use a plain blocking .get() to retrieve items from the queue. So first thing is to create a sentinel; e.g., add this near the top:
ALL_DONE = "all done"
Best practice is also to .join() threads and processes - that way the main program knows (doesn't just guess) when they're done too.
So, you can change the end of _encode_tree() like so:
for i in range(1, symbol_count + 1):
processed_symbol = results.get()
table.update(processed_symbol)
print "Symbols to process: %d" % (symbol_count - i)
for i in range(mp.cpu_count()):
work.put(ALL_DONE)
worker_pool.join()
messages.put(ALL_DONE)
message_thread.join()
return table
The key here is that the main program knows all the work is done when, and only when, no symbols remain to be processed. Until then, it can unconditionally .get() results from the results queue. Then it puts a number of sentinels on the work queue equal to the number of workers. They'll each consume a sentinel and quit. Then we wait for them to finish (worker_pool.join()). Then a sentinel is put on the message queue, and we wait for that thread to end too. Only then does the function return.
Now nothing ever terminates early, everything is shut down cleanly, and the output of your final table isn't mixed up anymore with various other output from the workers and the message thread. _reporter_thread() gets rewritten like so:
def _reporter_thread(message_queue):
    while True:
        message = message_queue.get()
        if message == ALL_DONE:
            break
        else:
            print message
and similarly for _encode_proc(). No more timeouts or try/except Queue.Empty: fiddling. You don't even have to import Queue anymore :-)
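For concreteness, here is a sketch (my reading of the answer, not code it spelled out) of what _encode_proc() might look like under the same ALL_DONE convention, with the extend fix folded in:

def _encode_proc(proc_number, work_queue, result_queue, message_queue):
    while True:
        node = work_queue.get()              # plain blocking get -- no timeout
        if node == ALL_DONE:                 # sentinel: this worker is finished
            return
        if node.child1 == node.child2 == None:
            message_queue.put("Symbol processed! : proc%d" % proc_number)
            result_queue.put({node.symbol: node.code})
        else:
            message_queue.put("More work to be done! : proc%d" % proc_number)
            node.child1.code.extend(node.code + '0')   # extend, not append
            node.child2.code.extend(node.code + '1')
            work_queue.put(node.child1)
            work_queue.put(node.child2)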