Using the Python watchdog file-system events library, I noticed that under Windows Server 2003 it entered "Polling Mode". It therefore stopped using asynchronous OS notifications, which heavily reduces system performance when there is a large number of file changes.
I traced the problem to the watchdog/observers/winapi.py file. There, the CancelIoEx system call is used to unblock a pending ReadDirectoryChangesW call when the user wants to stop monitoring the watched directory or file:
(winapi.py)
CancelIoEx = ctypes.windll.kernel32.CancelIoEx
CancelIoEx.restype = ctypes.wintypes.BOOL
CancelIoEx.errcheck = _errcheck_bool
CancelIoEx.argtypes = (
ctypes.wintypes.HANDLE, # hObject
ctypes.POINTER(OVERLAPPED) # lpOverlapped
)
...
...
...
def close_directory_handle(handle):
    try:
        CancelIoEx(handle, None)  # force ReadDirectoryChangesW to return
    except WindowsError:
        return
The problem is that CancelIoEx is not available before Windows Server 2008 (Windows Vista on the desktop):
http://msdn.microsoft.com/en-us/library/windows/desktop/aa363792(v=vs.85).aspx
One possible alternative is to change close_directory_handle so that it creates a dummy file in the monitored directory. This unblocks the thread that is waiting for ReadDirectoryChangesW to return.
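The dummy-file workaround could look roughly like the sketch below. The helper name wake_directory_watcher is hypothetical and is not part of watchdog. It assumes the process has write access to the watched directory. After waking up, the watcher thread still has to check its own stop flag and ignore the two events this produces.

```python
import os
import tempfile


def wake_directory_watcher(path):
    """Hypothetical helper: create and immediately delete a throwaway file.

    The create/delete pair is a real change inside `path`, so a thread that is
    blocked in ReadDirectoryChangesW on that directory returns. That thread
    should then check its own "stop requested" flag and discard both events.
    """
    fd, name = tempfile.mkstemp(prefix=".wakeup-", dir=path)
    os.close(fd)  # close the OS-level handle before deleting the file
    os.remove(name)
    return name
```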
However, I noticed that the CancelIo system call is available in Windows Server 2003:
Cancels all pending input and output (I/O) operations that are issued
by the calling thread for the specified file. The function does not
cancel I/O operations that other threads issue for a file handle. To
cancel I/O operations from another thread, use the CancelIoEx
function.
But calling CancelIo from another thread won't affect the waiting thread.
Does anyone have an idea of how to solve this problem?
Maybe threading.enumerate() could be used to send a signal to each thread, with CancelIo being called from the signal handlers?
The natural approach is to implement a completion routine and call ReadDirectoryChangesW in overlapped mode. The following example shows how to do that:
RDCW_CALLBACK_F = ctypes.WINFUNCTYPE(None, ctypes.wintypes.DWORD, ctypes.wintypes.DWORD, ctypes.POINTER(OVERLAPPED))
First, create a WINFUNCTYPE factory. It generates C-like functions (callable from the Windows API) from Python functions. In this case the callback has no return value and three parameters, matching the FileIOCompletionRoutine header:
VOID CALLBACK FileIOCompletionRoutine(
  _In_    DWORD dwErrorCode,
  _In_    DWORD dwNumberOfBytesTransfered,
  _Inout_ LPOVERLAPPED lpOverlapped
);
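WINFUNCTYPE exists only in Windows builds of ctypes. The same callback mechanism can be tried on any platform with CFUNCTYPE. The sketch below is an illustration, not watchdog code, and it uses c_uint32 and c_void_p as stand-ins for DWORD and LPOVERLAPPED. It also shows one rule that matters for call2pass in the example that follows: the wrapper object must stay referenced for as long as C code may call it.

```python
import ctypes

# Same shape as RDCW_CALLBACK_F: no return value and three parameters.
# c_uint32 stands in for DWORD and c_void_p for LPOVERLAPPED (assumptions
# made so the sketch runs on any platform).
COMPLETION_F = ctypes.CFUNCTYPE(None, ctypes.c_uint32, ctypes.c_uint32,
                                ctypes.c_void_p)

calls = []


def on_complete(error_code, bytes_transferred, overlapped):
    # ctypes converts the C arguments back to Python objects
    # (ints, and None for a NULL pointer) before calling this function
    calls.append((error_code, bytes_transferred, overlapped))


# The wrapper is what C code receives as a function pointer. Keep a
# reference to it for as long as C may call it; if it is garbage-collected
# first, the C side is left holding a dangling pointer.
c_callback = COMPLETION_F(on_complete)

# Calling the wrapper from Python exercises the same argument conversions.
c_callback(0, 128, None)
print(calls)  # [(0, 128, None)]
```

On Windows, RDCW_CALLBACK_F is the same idea built with WINFUNCTYPE, which uses the stdcall convention that the Win32 API expects.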
The callback reference, as well as the overlapped structure, must be added to the ReadDirectoryChangesW argument list:
ReadDirectoryChangesW = ctypes.windll.kernel32.ReadDirectoryChangesW
ReadDirectoryChangesW.restype = ctypes.wintypes.BOOL
ReadDirectoryChangesW.errcheck = _errcheck_bool
ReadDirectoryChangesW.argtypes = (
ctypes.wintypes.HANDLE, # hDirectory
LPVOID, # lpBuffer
ctypes.wintypes.DWORD, # nBufferLength
ctypes.wintypes.BOOL, # bWatchSubtree
ctypes.wintypes.DWORD, # dwNotifyFilter
ctypes.POINTER(ctypes.wintypes.DWORD), # lpBytesReturned
ctypes.POINTER(OVERLAPPED), # lpOverlapped
RDCW_CALLBACK_F # FileIOCompletionRoutine # lpCompletionRoutine
)
From here, we are ready to perform the overlapped system call.
This is a simple callback, only useful for testing that everything works:
def dir_change_callback(dwErrorCode,dwNumberOfBytesTransfered,p):
    print("dir_change_callback! PID:" + str(os.getpid()))
    print("CALLBACK THREAD: " + str(threading.currentThread()))
Prepare and perform the call:
event_buffer = ctypes.create_string_buffer(BUFFER_SIZE)
nbytes = ctypes.wintypes.DWORD()
overlapped_read_dir = OVERLAPPED()
call2pass = RDCW_CALLBACK_F(dir_change_callback)
hand = get_directory_handle(os.path.abspath("/test/"))
def docall():
    ReadDirectoryChangesW(hand, ctypes.byref(event_buffer),
                          len(event_buffer), False,
                          WATCHDOG_FILE_NOTIFY_FLAGS,
                          ctypes.byref(nbytes),
                          ctypes.byref(overlapped_read_dir), call2pass)
print("Waiting!")
docall()
If you load and run all this code in a DreamPie interactive shell, you can check that the system call is made and that the callback runs. After the first change under the c:\test directory, it prints the PID and the thread. You will also notice that these are the same as those of the main thread and process. Although the event is raised by a separate thread, the callback runs in the same process and thread as our main program, which leads to undesired behaviour:
lck = threading.Lock()
def dir_change_callback(dwErrorCode,dwNumberOfBytesTransfered,p):
    print("dir_change_callback! PID:" + str(os.getpid()))
    print("CALLBACK THREAD: " + str(threading.currentThread()))
...
...
...
lck.acquire()
print("Waiting!")
docall()
lck.acquire()
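This program blocks the main thread, and the callback never runs. A completion routine is queued as an APC to the thread that issued the call. It only runs when that thread enters an alertable wait state, such as SleepEx, or WaitForSingleObjectEx with bAlertable set to TRUE. A blocked lck.acquire() evidently does not provide such a wait.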
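The blocking itself is easy to reproduce in plain Python. threading.Lock is not reentrant, so the second acquire() from the same thread waits for a release. In the program above, that release would have to come from the callback. This sketch uses a timeout so it does not hang:

```python
import threading

lck = threading.Lock()
lck.acquire()  # first acquire: the lock was free, returns True at once

# threading.Lock is not reentrant: a second acquire() from the same thread
# waits until somebody else calls release(). Nothing here ever will, so the
# wait is bounded with a timeout instead of hanging forever.
second = lck.acquire(timeout=0.2)
print("second acquire succeeded:", second)  # second acquire succeeded: False

lck.release()  # undo the first acquire
```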
I tried many synchronization tools, even Windows API semaphores, and always got the same behaviour. So I finally decided to implement the asynchronous call with the synchronous form of ReadDirectoryChangesW, run inside a separate process that is managed and synchronized with Python's multiprocessing library:
Calls to get_directory_handle no longer return the handle given by the Windows API, but one managed by the winapi module. For that, I implemented a handle generator:
class FakeHandleFactory():
    _hl = threading.Lock()
    _next = 0
    @staticmethod
    def next():
        FakeHandleFactory._hl.acquire()
        ret = FakeHandleFactory._next
        FakeHandleFactory._next += 1
        FakeHandleFactory._hl.release()
        return ret
Each generated handle has to be globally associated with a file system path:
handle2file = {}
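The same handle bookkeeping can also be written with `with` statements. These release the lock even if an exception is raised between acquire() and release(). FakeHandleRegistry is a hypothetical name. It merges FakeHandleFactory and handle2file into one object and is only a sketch of the idea:

```python
import itertools
import threading


class FakeHandleRegistry(object):
    """Hypothetical merge of FakeHandleFactory and handle2file:
    hands out integer handles and remembers the path each one refers to."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = itertools.count()  # 0, 1, 2, ...
        self._paths = {}

    def open(self, path):
        with self._lock:  # released even if something below raises
            handle = next(self._counter)
            self._paths[handle] = path
            return handle

    def path_of(self, handle):
        with self._lock:
            return self._paths.get(handle)

    def close(self, handle):
        with self._lock:
            self._paths.pop(handle, None)
```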
Each call to read_directory_changes now creates a ReadDirectoryRequest object (derived from multiprocessing.Process):
class ReadDirectoryRequest(multiprocessing.Process):
    def _perform_and_wait4request(self, path, recursive, event_buffer, nbytes):
        # Runs in the child process: open the directory and block in the
        # synchronous (non-overlapped) ReadDirectoryChangesW call
        hdl = CreateFileW(path, FILE_LIST_DIRECTORY, WATCHDOG_FILE_SHARE_FLAGS,
                          None, OPEN_EXISTING, WATCHDOG_FILE_FLAGS, None)
        #print("path: " + path)
        aux_buffer = ctypes.create_string_buffer(BUFFER_SIZE)
        aux_n = ctypes.wintypes.DWORD()
        #print("_perform_and_wait4request! PID:" + str(os.getpid()))
        #print("CALLBACK THREAD: " + str(threading.currentThread()) + "\n----------")
        try:
            ReadDirectoryChangesW(hdl, ctypes.byref(aux_buffer),
                                  len(aux_buffer), recursive,
                                  WATCHDOG_FILE_NOTIFY_FLAGS,
                                  ctypes.byref(aux_n), None, None)
        except WindowsError as e:
            print("!" + str(e))
            # Aborted (ERROR_OPERATION_ABORTED) or failed: report no data.
            # nbytes/event_buffer must stay the shared objects, not be rebound.
            aux_n.value = 0
        # Copy the result into the shared memory read by the parent process
        # (range works on both Python 2 and 3)
        nbytes.value = aux_n.value
        for i in range(self.int_class(aux_n.value)):
            event_buffer[i] = aux_buffer[i]
        CloseHandle(hdl)
        try:
            self.lck.release()  # wake up read_changes() in the parent
        except:
            pass
    def __init__(self, handle, recursive):
        buffer = ctypes.create_string_buffer(BUFFER_SIZE)
        self.event_buffer = multiprocessing.Array(ctypes.c_char, buffer)
        self.nbytes = multiprocessing.Value(ctypes.wintypes.DWORD, 0)
        targetPath = handle2file.get(handle, None)
        super(ReadDirectoryRequest, self).__init__(target=self._perform_and_wait4request, args=(targetPath, recursive, self.event_buffer, self.nbytes))
        self.daemon = True
        self.lck = multiprocessing.Lock()
        self.result = None
        try:
            self.int_class = long
        except NameError:
            self.int_class = int
        if targetPath is None:
            self.result = ([], -1)
    def CancelIo(self):
        try:
            self.result = ([], 0)
            self.lck.release()  # wake up read_changes() without waiting for the child
        except:
            pass
    def read_changes(self):
        #print("read_changes! PID:" + str(os.getpid()))
        #print("CALLBACK THREAD: " + str(threading.currentThread()) + "\n----------")
        if self.result is not None:
            raise Exception("ReadDirectoryRequest object can be used only once!")
        # Take the lock, start the child, then block on the lock a second time:
        # it is released either by the child (changes read) or by CancelIo()
        self.lck.acquire()
        self.start()
        self.lck.acquire()
        self.result = (self.event_buffer, self.int_class(self.nbytes.value))
        return self.result
This class extends Process: the child process performs the blocking system call, while read_changes waits until either:
A change event has been raised, or
The main thread cancels the request by calling the CancelIo method of the ReadDirectoryRequest object.
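Without the Windows specifics, the wait-until-done-or-cancelled logic can be sketched as follows. CancellableRequest is a hypothetical name. A worker thread stands in for the child process, and a threading.Event replaces the double lck.acquire() trick. The slow call in the usage below is simulated with time.sleep.

```python
import threading
import time


class CancellableRequest(object):
    """Hypothetical sketch: run a blocking call in a worker and wait until
    it either finishes or cancel() is called from another thread."""

    def __init__(self, blocking_call):
        self._blocking_call = blocking_call
        self._done = threading.Event()  # set on completion OR cancellation
        self._lock = threading.Lock()   # guards result/cancelled
        self.result = None
        self.cancelled = False

    def _worker(self):
        value = self._blocking_call()
        with self._lock:
            if not self.cancelled:
                self.result = value
        self._done.set()

    def cancel(self):
        with self._lock:
            self.cancelled = True
        self._done.set()  # wake up the waiting caller immediately

    def read(self):
        # daemon=True so an abandoned (cancelled) worker does not keep the
        # interpreter alive, like the daemon process in ReadDirectoryRequest
        threading.Thread(target=self._worker, daemon=True).start()
        self._done.wait()
        with self._lock:
            return None if self.cancelled else self.result


if __name__ == "__main__":
    slow = CancellableRequest(lambda: time.sleep(5) or "late")
    threading.Timer(0.1, slow.cancel).start()  # cancel from another thread
    print(slow.read())  # None, after about 0.1 s instead of 5 s
```

A blocking system call running in a thread cannot be interrupted from Python. That is one reason to run it in a separate daemon process, as the answer does: an abandoned worker does not hold up the caller.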
Note that the role of get_directory_handle, close_directory_handle and read_directory_changes is now to manage requests. For that, thread locks and auxiliary data structures are needed:
rqIndexLck = threading.Lock() # Protects the access to `rqIndex`
rqIndex = {} # Maps handles to request objects sets.
get_directory_handle
def get_directory_handle(path):
    rqIndexLck.acquire()
    ret = FakeHandleFactory.next()
    handle2file[ret] = path
    rqIndexLck.release()
    return ret
close_directory_handle
def close_directory_handle(handle):
    rqIndexLck.acquire()
    rqset4handle = rqIndex.get(handle, None)
    if rqset4handle is not None:
        for rq in rqset4handle:
            rq.CancelIo()
        del rqIndex[handle]
    if handle in handle2file:
        del handle2file[handle]
    rqIndexLck.release()
And last but not least: read_directory_changes
def read_directory_changes(handle, recursive):
    rqIndexLck.acquire()
    rq = ReadDirectoryRequest(handle, recursive)
    set4handle = None
    if handle in rqIndex:
        set4handle = rqIndex[handle]
    else:
        set4handle = set()
        rqIndex[handle] = set4handle
    set4handle.add(rq)
    rqIndexLck.release()
    ret = rq.read_changes()
    rqIndexLck.acquire()
    if rq in set4handle:
        set4handle.remove(rq)
    rqIndexLck.release()
    return ret
I see a lot of tutorials on how to use queues, but they always show them implemented in the same file. I'm trying to organize my code files well from the beginning because I anticipate the project to become very large. How do I get the queue that I initialize in my main file to import into the other function files?
Here is my main file:
import multiprocessing
import queue
from data_handler import data_handler
from get_info import get_memory_info
from get_info import get_cpu_info
if __name__ == '__main__':
    q = queue.Queue()
    getDataHandlerProcess = multiprocessing.Process(target=data_handler(q))
    getMemoryInfoProcess = multiprocessing.Process(target=get_memory_info(q))
    getCPUInfoProcess = multiprocessing.Process(target=get_cpu_info(q))
    getDataHandlerProcess.start()
    getMemoryInfoProcess.start()
    getCPUInfoProcess.start()
    print("DEBUG: All tasks successfully started.")
Here is my producer:
import psutil
import struct
import time
from data_frame import build_frame
def get_cpu_info(q):
    while True:
        cpu_string_data = bytes('', 'utf-8')
        cpu_times = psutil.cpu_percent(interval=0.0, percpu=True)
        for item in cpu_times:
            cpu_string_data = cpu_string_data + struct.pack('<d',item)
        cpu_frame = build_frame(cpu_string_data, 0, 0, -1, -1)
        q.put(cpu_frame)
        print(cpu_frame)
        time.sleep(1.000)
def get_memory_info(q):
    while True:
        memory_string_data = bytes('', 'utf-8')
        virtual_memory = psutil.virtual_memory()
        swap_memory = psutil.swap_memory()
        memory_info = list(virtual_memory+swap_memory)
        for item in memory_info:
            memory_string_data = memory_string_data + struct.pack('<d',item)
        memory_frame = build_frame(memory_string_data, 0, 1, -1, -1)
        q.put(memory_frame)
        print(memory_frame)
        time.sleep(1.000)
def get_disk_info(q):
    while True:
        disk_usage = psutil.disk_usage("/")
        disk_io_counters = psutil.disk_io_counters()
        time.sleep(1.000)
        print(disk_usage)
        print(disk_io_counters)
def get_network_info(q):
    while True:
        net_io_counters = psutil.net_io_counters()
        time.sleep(1.000)
        print(net_io_counters)
And here is my consumer:
def data_handler(q):
    while True:
        next_element = q.get()
        print(next_element)
        print('Item received at data handler queue.')
It is not entirely clear to me what you mean by "How do I get the queue that I initialize in my main file to import into the other function files?".
Normally you pass a queue as an argument to a function and use it within the function's scope, regardless of the file structure. Alternatively, you can use any other variable-sharing technique you would use for any other data type.
However, your code seems to have a few errors. First, you shouldn't use queue.Queue with multiprocessing; multiprocessing has its own version of that class:
q = multiprocessing.Queue()
It is slower than queue.Queue, but it works for sharing data across processes.
Secondly, the proper way to create process objects is:
getDataHandlerProcess = multiprocessing.Process(target=data_handler, args = (q,))
Otherwise you are actually calling data_handler(q) in the main thread and trying to assign its return value to the target argument of multiprocessing.Process. Your data_handler function never returns, so the program probably gets stuck at this point, before any multiprocessing even begins. Edit: more precisely, it probably waits forever trying to get an element from an empty queue that will never be filled.
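With both fixes applied, the main file could look roughly like the sketch below. The n_samples and n_producers parameters, the finite producer loop and the None sentinel are assumptions added so the example terminates. Everything is shown in one file for brevity. In your layout, get_cpu_info and data_handler would stay in get_info.py and data_handler.py and be imported as before. The queue reaches them as an argument, not through an import.

```python
import multiprocessing


def get_cpu_info(q, n_samples):
    # Simplified, finite producer: n_samples fake readings, then a sentinel
    for i in range(n_samples):
        q.put(("cpu", i))
    q.put(None)  # sentinel: tells the consumer this producer is done


def data_handler(q, n_producers):
    # Consume until every producer has sent its sentinel
    received = []
    finished = 0
    while finished < n_producers:
        item = q.get()
        if item is None:
            finished += 1
            continue
        received.append(item)
        print("Item received at data handler queue:", item)
    return received


def main():
    q = multiprocessing.Queue()  # not queue.Queue: it must cross processes
    # Pass the function object as target and its arguments via args;
    # target=data_handler(q) would call it right here in the parent.
    consumer = multiprocessing.Process(target=data_handler, args=(q, 1))
    producer = multiprocessing.Process(target=get_cpu_info, args=(q, 3))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    print("DEBUG: All tasks finished.")


if __name__ == '__main__':
    main()
```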
I have written a sample Subscriber. I want to store the data I get from rospy.Subscriber in another variable, so that I can use it later in the program for processing. I can see that the Subscriber is working, because the subscribed values are printed when I use rospy.loginfo(). However, I don't know how to store this data in another variable. I have tried assigning it directly to a variable using the assignment operator '=', but I get an error.
I have tried writing a callback function with rospy.loginfo to print the position data from the subscribed object. I have subscribed to JointState, which contains header, position, velocity and effort arrays. Using rospy.loginfo I can verify that the subscriber is receiving data. But when I try to assign it directly to a variable, I get an error.
I am logging from a callback function as follows:
def callback(data):
    rospy.loginfo(data.position)
global listen
listen = rospy.Subscriber("joint_states", JointState,
callback)
rospy.spin()
This works fine. But when I slightly modify the code to assign the subscribed values, I get the following error:
listen1 = rospy.Subscriber("joint_states", JointState,
callback=None)
listen = listen1.position
#rospy.loginfo(listen)
print(listen)
rospy.spin()
The error is as follows:
listen = listen1.position
AttributeError: 'Subscriber' object has no attribute 'position'
EDIT:
Here is the node I have defined in my program:
def tactile_callback(data):
    #rospy.loginfo(msg.data)
    global tactile_states
    tactile_states = data.data
def joint_callback(data):
    #rospy.loginfo(data.position)
    global g_joint_states
    global g_position
    global g_pos1
    g_joint_states = data
    #for i in len(data.position):
        #g_position[i] = data.position[i]
    g_position = data.position
    if len(data.position) > 0:
        print("jointstate more than 0")
        g_pos1 = data.position[0]
    #print(g_position)
def joint_modifier(*args):
    #choice describes what the node is supposed to do whether act as publisher or subscribe to joint states or tactile sensors
    rospy.init_node('joint_listener_publisher', anonymous=True)
    pub1 = rospy.Publisher('joint_states', JointState, queue_size = 10)
    if(len(args)>1):
        choice = args[0]
        joint_name = args[1]
        position = args[2]
    else:
        choice = args[0]
    if (choice == 1):
        rate = rospy.Rate(1)
        robot_configuration = JointState()
        robot_configuration.header = Header()
        robot_configuration.name = [joint_name]
        robot_configuration.position = [position]
        robot_configuration.velocity = [10]
        robot_configuration.effort = [100]
        while not rospy.is_shutdown():
            robot_configuration.header.stamp = rospy.Time.now()
            rospy.loginfo(robot_configuration)
            break
        pub1.publish(robot_configuration)
        rospy.sleep(2)
    if (choice == 2):
        #rospy.Timer(rospy.Duration(2), joint_modifier)
        listen = rospy.Subscriber("joint_states", JointState, joint_callback)
        rospy.spin()
    if (choice == 3):
        #rospy.Timer(rospy.Duration(2), joint_modifier)
        tactile_sub = rospy.Subscriber("/sr_tactile/touch/ff", Float64, tactile_callback)
        rospy.spin()
This is how I am calling the node in the main body of the program:
joint_modifier(2)
print("printing g_position")
print(g_position)#to check the format of g_position
print("printed g _position")
leg_1 = Leg_attribute(g_position[0], g_position[1], g_position[2], velocity1 = 10, velocity2 = 10, velocity3 = 10, effort1 = 100, effort2 = 100, effort3 = 100, acceleration=1)
When called this way, the program gets stuck at joint_modifier(2), because that function calls rospy.spin().
The style you're using is not very standard. I assume you've seen the example on the ROS wiki; I've modified it below to demonstrate standard usage.
Regarding the code you posted: you need a variable with global scope, outside the callback, to store the data you want. That variable holds the data, not the Subscriber object. rospy.spin() never goes in a callback, only in the main node function/section. The subscriber object, listen1, is rarely needed. It doesn't return anything and doesn't store the data it receives. That is why you need to give Subscriber() a non-None callback.
It's more of a binding: the data is passed to the callback instead of being returned from Subscriber. That's why listen1 (a Subscriber) has no attribute position (which belongs to JointState).
import rospy
from sensor_msgs.msg import JointState
# Subscribers
# joint_sub (sensor_msgs/JointState): "joint_states"
# This is where you store all the data you receive
g_joint_states = None
g_positions = None
g_pos1 = None
def timer_callback(event): # Type rospy.TimerEvent
    print('timer_cb (' + str(event.current_real) + '): g_positions is')
    print(str(None) if g_positions is None else str(g_positions))
def joint_callback(data): # data of type JointState
    # Each subscriber gets 1 callback, and the callback either
    # stores information and/or computes something and/or publishes
    # It _does not!_ return anything
    global g_joint_states, g_positions, g_pos1
    rospy.loginfo(data.position)
    g_joint_states = data
    g_positions = data.position
    if len(data.position) > 0:
        g_pos1 = data.position[0]
    print(g_positions)
# Only in your main function do you subscribe to topics
def joint_logger_node():
    # Init ROS
    rospy.init_node('joint_logger_node', anonymous=True)
    # Subscribers
    # Each subscriber has the topic, topic type, AND the callback!
    rospy.Subscriber('joint_states', JointState, joint_callback)
    # Rarely need to hold onto the object with a variable:
    # joint_sub = rospy.Subscriber(...)
    rospy.Timer(rospy.Duration(2), timer_callback)
    # spin() simply keeps python from exiting until this node is stopped
    # This is an infinite loop; the only code that runs is callbacks
    rospy.spin()
    # NO CODE GOES AFTER THIS, NONE! USE TIMER CALLBACKS!
    # unless you need to clean up resource allocation, close(), etc when program dies
if __name__ == '__main__':
    joint_logger_node()
Edit 1:
There seems to be some confusion about what Subscriber(), spin(), and _callback(s) do.
It's a bit obscured in Python, but there is a master program that manages all nodes and sends messages between them. In each node, we register with that master program that the node exists and which publishers and subscribers it has. Registering means telling the master program, "Hey, I want that topic!". In your case, for your (undeclared) joint_sub Subscriber, it means "Hey, I want all the JointState msgs from the joint_states topic!" Every time the master program gets a new joint_states JointState msg (from some publisher somewhere), it sends it to that subscriber.
The subscriber handles, deals with, and processes the msg (data) with a callback: when(!) I receive a message, run the callback.
So the master program receives a new joint_states JointState msg from some publisher. Because we registered a subscriber for it, the master then sends it to this node. rospy.spin() is an infinite loop waiting for that data. This is what it does (kinda-mostly):
def rospy.spin():
    while rospy.ok():
        for new_msg in get_new_messages_from_master():
            if I have a subscriber to new_msg:
                my_subscriber.callback(new_msg)
rospy.spin() is where your callbacks, joint_callback (and/or timer_callback, etc.), actually get called and executed. A callback only runs when there is data for it.
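The toy model below is plain Python, not the rospy API; ToyNode, subscribe, spin_once and JointStateMsg are all hypothetical names. It mimics the flow described above. Subscribing only registers a callback, and data appears in your variables only when the loop hands a message to that callback.

```python
from collections import defaultdict, namedtuple

# Toy model only, NOT the rospy API: every name here is hypothetical.
JointStateMsg = namedtuple("JointStateMsg", ["name", "position"])  # stand-in for JointState


class ToyNode(object):
    def __init__(self):
        self._callbacks = defaultdict(list)  # topic -> registered callbacks

    def subscribe(self, topic, callback):
        # Like rospy.Subscriber, this only registers the callback and returns
        # a handle; it never returns message data
        self._callbacks[topic].append(callback)
        return (topic, callback)

    def spin_once(self, incoming):
        # incoming: (topic, msg) pairs that "arrived" since the last call;
        # each message is handed to every callback registered for its topic
        for topic, msg in incoming:
            for callback in self._callbacks[topic]:
                callback(msg)


g_position = None  # filled in by the callback, read by the rest of the program


def joint_callback(data):
    global g_position
    g_position = data.position


node = ToyNode()
sub = node.subscribe("joint_states", joint_callback)
print(g_position)  # None: subscribing alone delivers nothing
node.spin_once([("joint_states", JointStateMsg(["j1"], [0.5]))])
print(g_position)  # [0.5]
```

In real rospy, the delivery happens inside the library, and you do not write that loop yourself. The point carried over is that the data reaches your code only through the callback.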
More fundamentally, I think this confusion has made your program structure flawed: your functions don't do what you think they do. This is how you should structure your node:
Move your math portion (all the real non-ROS code, the part doing the NN) into a separate module, and write a function to run it.
If you only want to run it when you receive data, run it in the callback. If you want to publish the result, publish in the callback.
Don't call the main function! The if __name__ == '__main__': my_main_function() should be the only place it gets called, and this will call your code. I repeat: the main function, declaring subscribers/publishers/init/timers/parameters, is only run in if __name__ ..., and this function runs your code. To have it run your code, place your code in a callback. Timer callbacks are handy for this.
I hope this code sample clarifies:
import rospy
from std_msgs.msg import Header
from sensor_msgs.msg import JointState
import my_nn as nn # nn.run(data)
# Subscribers
# joint_sub (sensor_msgs/JointState): "joint_states"
# Publishers
# joint_pub (sensor_msgs/JointState): "target_joint_states"
joint_pub = None
def joint_callback(data): # data of type JointState
    pub_msg = JointState() # Make a new msg to publish results
    pub_msg.header = Header()
    pub_msg.name = data.name
    pub_msg.velocity = [10] * len(data.name)
    pub_msg.effort = [100] * len(data.name)
    # This next line might not be quite right for what you want to do,
    # But basically, run the "real code" on the data, and get the
    # result to publish back out
    pub_msg.position = nn.run(data.position) # Run NN on data, store results
    joint_pub.publish(pub_msg) # Send it when ready!
if __name__ == '__main__':
    # Init ROS
    rospy.init_node('joint_logger_node', anonymous=True)
    # Subscribers
    rospy.Subscriber('joint_states', JointState, joint_callback)
    # Publishers
    joint_pub = rospy.Publisher('target_joint_states', JointState, queue_size = 10)
    # Spin
    rospy.spin()
    # No more code! This is not a function to call, but its
    # own program! This is an executable! Run your code in
    # a callback!
Notice that a Python module designed to be a ROS node has no functions meant to be called from outside. It has a defined structure of callbacks and global data shared between them, all initialized and registered in the main function / if __name__ == '__main__'.
I'm working on a Raspberry Pi (3 B+), building a data collection device. I'm trying to spawn a process that records the incoming data and writes it to a file. I have a function for the writing that works fine when I call it directly.
When I call it using multiprocessing, however, nothing seems to happen. I can see in Linux task monitors that the process does get spawned, but no file gets written. When I try to pass a flag to shut it down, that doesn't work either, so I end up terminating the process and nothing seems to have happened.
I've been over this every which way and can't see what I'm doing wrong; does anyone else? In case it's relevant, these are functions inside a parent class, and one of the functions is meant to spawn another as a thread.
Code I'm using:
from datetime import datetime, timedelta
import csv
from drivers.IMU_SEN0 import IMU_SEN0
import multiprocessing, os
class IMU_data_logger:
    _output_filename = ''
    _csv_headers = []
    _accelerometer_headers = ['Accelerometer X','Accelerometer Y','Accelerometer Z']
    _gyroscope_headers = ['Gyroscope X','Gyroscope Y','Gyroscope Z']
    _magnetometer_headers = ['Bearing']
    _log_accelerometer = False
    _log_gyroscope= False
    _log_magnetometer = False
    IMU = None
    _writer=[]
    _run_underway = False
    _process=[]
    _stop_value = 0
    def __init__(self,output_filename='/home/pi/blah.csv',log_accelerometer = True,log_gyroscope= True,log_magnetometer = True):
        """data logging device
        NOTE! Multiple instances of this class should not use the same IMU devices simultaneously!"""
        self._output_filename = output_filename
        self._log_accelerometer = log_accelerometer
        self._log_gyroscope = log_gyroscope
        self._log_magnetometer = log_magnetometer
    def __del__(self):
        # TODO Update this
        if self._run_underway: # If there's still a run underway, end it first
            self.end_recording()
    def _set_up(self):
        self.IMU = IMU_SEN0(self._log_accelerometer,self._log_gyroscope,self._log_magnetometer)
        self._set_up_headers()
    def _set_up_headers(self):
        """Set up the headers of the CSV file based on the header substrings at top and the input flags on what will be measured"""
        self._csv_headers = []
        if self._log_accelerometer is not None:
            self._csv_headers+= self._accelerometer_headers
        if self._log_gyroscope is not None:
            self._csv_headers+= self._gyroscope_headers
        if self._log_magnetometer is not None:
            self._csv_headers+= self._magnetometer_headers
    def _record_data(self,frequency,stop_value):
        self._set_up() #Run setup in thread
        """Record data function, which takes a recording frequency, in herz, as an input"""
        previous_read_time=datetime.now()-timedelta(1,0,0)
        self._run_underway = True # Note that a run is now going
        Period = 1/frequency # Period, in seconds, of a recording based on the input frequency
        print("Writing output data to",self._output_filename)
        with open(self._output_filename,'w',newline='') as outcsv:
            self._writer = csv.writer(outcsv)
            self._writer.writerow(self._csv_headers) # Write headers to file
            while stop_value.value==0: # While a run continues
                if datetime.now()-previous_read_time>=timedelta(0,1,0): # If we've waited a period, collect the data; otherwise keep looping
                    print("run underway value",self._run_underway)
                if datetime.now()-previous_read_time>=timedelta(0,Period,0): # If we've waited a period, collect the data; otherwise keep looping
                    previous_read_time = datetime.now() # Update previous readtime
                    next_row = []
                    if self._log_accelerometer:
                        # Get values in m/s^2
                        axes = self.IMU.read_accelerometer_values()
                        next_row += [axes['x'],axes['y'],axes['z']]
                    if self._log_gyroscope:
                        # Read gyro values
                        gyro = self.IMU.read_gyroscope_values()
                        next_row += [gyro['x'],gyro['y'],gyro['z']]
                    if self._log_magnetometer:
                        # Read magnetometer value
                        b= self.IMU.read_magnetometer_bearing()
                        next_row += b
                    self._writer.writerow(next_row)
            # Close the csv when done
            outcsv.close()
    def start_recording(self,frequency_in_hz):
        # Create recording process
        self._stop_value = multiprocessing.Value('i',0)
        self._process = multiprocessing.Process(target=self._record_data,args=(frequency_in_hz,self._stop_value))
        # Start recording process
        self._process.start()
        print(datetime.now().strftime("%H:%M:%S.%f"),"Data logging process spawned")
        print("Logging Accelerometer:",self._log_accelerometer)
        print("Logging Gyroscope:",self._log_gyroscope)
        print("Logging Magnetometer:",self._log_magnetometer)
        print("ID of data logging process: {}".format(self._process.pid))
    def end_recording(self,terminate_wait = 2):
        """Function to end the recording multithread that's been spawned.
        Args: terminate_wait: This is the time, in seconds, to wait after attempting to shut down the process before terminating it."""
        # Get process id
        id = self._process.pid
        # Set stop event for process
        self._stop_value.value = 1
        self._process.join(terminate_wait) # Wait two seconds for the process to terminate
        if self._process.is_alive(): # If it's still alive after waiting
            self._process.terminate()
            print(datetime.now().strftime("%H:%M:%S.%f"),"Process",id,"needed to be terminated.")
        else:
            print(datetime.now().strftime("%H:%M:%S.%f"),"Process",id,"successfully ended itself.")
====================================================================
ANSWER: For anyone following up here, it turns out the problem was my use of the VS Code debugger which apparently doesn't work with multiprocessing and was somehow preventing the success of the spawned process. Many thanks to Tomasz Swider below for helping me work through issues and, eventually, find my idiocy. The help was very deeply appreciated!!
I can see a few things wrong in your code:
First:
stop_value == 0 will not work, because multiprocessing.Value('i', 0) != 0; change that line to
while stop_value.value == 0
Second, you never update previous_read_time, so it will write readings as fast as it can, and you will run out of disk space quickly.
Third, try using time.sleep(). What you are doing is called busy looping, and it wastes CPU cycles needlessly.
Fourth, stopping with self._stop_value = 1 will not work. It only rebinds the attribute in the parent and leaves the shared value unchanged; use self._stop_value.value = 1 instead.
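Points three and four can be combined by stopping on a multiprocessing.Event and sleeping with Event.wait(timeout). That call returns True as soon as the event is set. This is a sketch, not a drop-in replacement. record_loop, read_row, write_row and _fake_reading are hypothetical names, and the fake reading stands in for the IMU.

```python
import multiprocessing
import time


def record_loop(stop_event, period, read_row, write_row):
    """Hypothetical sketch: sample once per `period` seconds until stop_event is set.

    stop_event.wait(timeout) sleeps instead of busy-looping, and it returns True
    as soon as the event is set, so a stop request is noticed without waiting
    out the rest of the period. Returns the number of rows written.
    """
    rows = 0
    next_time = time.monotonic()
    while not stop_event.is_set():
        write_row(read_row())
        rows += 1
        next_time += period
        # Sleep until the next sample is due (or not at all if we are late)
        if stop_event.wait(max(0.0, next_time - time.monotonic())):
            break
    return rows


def _fake_reading():
    # Stand-in for the IMU read; returns one CSV-style row
    return [time.time()]


if __name__ == "__main__":
    stop = multiprocessing.Event()
    worker = multiprocessing.Process(target=record_loop,
                                     args=(stop, 0.05, _fake_reading, print))
    worker.start()
    time.sleep(0.3)
    stop.set()  # plays the role of self._stop_value.value = 1
    worker.join(2)
    print("exit code:", worker.exitcode)
```

In the real logger, write_row would be the csv writer's writerow and read_row would build next_row from the IMU.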
Here is a piece of example code, based on the code you provided, that works just fine:
import csv
import multiprocessing
import time
from datetime import datetime, timedelta
from random import randint
class IMU(object):
    @staticmethod
    def read_accelerometer_values():
        return dict(x=randint(0, 100), y=randint(0, 100), z=randint(0, 10))
class Foo(object):
    def __init__(self, output_filename):
        self._output_filename = output_filename
        self._csv_headers = ['xxxx','y','z']
        self._log_accelerometer = True
        self.IMU = IMU()
    def _record_data(self, frequency, stop_value):
        #self._set_up() # Run setup functions for the data collection device and store it in the self.IMU variable
        """Record data function, which takes a recording frequency, in hertz, as an input"""
        previous_read_time = datetime.now() - timedelta(1, 0, 0)
        self._run_underway = True # Note that a run is now going
        Period = 1 / frequency # Period, in seconds, of a recording based on the input frequency
        print("Writing output data to", self._output_filename)
        with open(self._output_filename, 'w', newline='') as outcsv:
            self._writer = csv.writer(outcsv)
            self._writer.writerow(self._csv_headers) # Write headers to file
            while stop_value.value == 0: # While a run continues
                if datetime.now() - previous_read_time >= timedelta(0, 1, 0): # Only true if no sample was taken in the last second
                    print("run underway value", self._run_underway)
                if datetime.now() - previous_read_time >= timedelta(0, Period, 0): # If we've waited a period, collect the data; otherwise keep looping
                    next_row = []
                    if self._log_accelerometer:
                        # Get values in m/s^2
                        axes = self.IMU.read_accelerometer_values()
                        next_row += [axes['x'], axes['y'], axes['z']]
                    previous_read_time = datetime.now()
                    self._writer.writerow(next_row)
            # Close the csv when done (redundant inside "with", but harmless)
            outcsv.close()
    def start_recording(self, frequency_in_hz):
        # Create recording process
        self._stop_value = multiprocessing.Value('i', 0)
        self._process = multiprocessing.Process(target=self._record_data, args=(frequency_in_hz, self._stop_value))
        # Start recording process
        self._process.start()
        print(datetime.now().strftime("%H:%M:%S.%f"), "Data logging process spawned")
        print("ID of data logging process: {}".format(self._process.pid))
    def end_recording(self, terminate_wait=2):
        """Function to end the recording multithread that's been spawned.
        Args: terminate_wait: This is the time, in seconds, to wait after attempting to shut down the process before terminating it."""
        # Get process id
        id = self._process.pid
        # Set stop event for process
        self._stop_value.value = 1
        self._process.join(terminate_wait) # Wait two seconds for the process to terminate
        if self._process.is_alive(): # If it's still alive after waiting
            self._process.terminate()
            print(datetime.now().strftime("%H:%M:%S.%f"), "Process", id, "needed to be terminated.")
        else:
            print(datetime.now().strftime("%H:%M:%S.%f"), "Process", id, "successfully ended itself.")
if __name__ == '__main__':
    foo = Foo('/tmp/foometer.csv')
    foo.start_recording(20)
    time.sleep(5)
    print('Ending recording')
    foo.end_recording()
I'm building a Sublime Text 3 plugin to shorten URLs using the goo.gl API. Bear in mind that the following code is hacked together from other plugins and tutorial code. I have no previous experience with Python.
The plugin does actually work as it is. The URL is shortened and replaced inline. Here is the plugin code:
import sublime
import sublime_plugin
import urllib.request
import urllib.error
import json
import threading
class ShortenUrlCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        sels = self.view.sel()
        threads = []
        for sel in sels:
            url = self.view.substr(sel)
            thread = GooglApiCall(sel, url, 5) # Send the selection, the URL and timeout to the class
            threads.append(thread)
            thread.start()
        # Wait for threads
        for thread in threads:
            thread.join()
        self.view.sel().clear()
        self.handle_threads(edit, threads, sels)
    def handle_threads(self, edit, threads, sels, offset=0, i=0, dir=1):
        next_threads = []
        for thread in threads:
            sel = thread.sel
            result = thread.result
            if thread.is_alive():
                next_threads.append(thread)
                continue
            if thread.result == False:
                continue
            offset = self.replace(edit, thread, sels, offset)
        thread = next_threads
        if len(threads):
            before = i % 8
            after = (7) - before
            if not after:
                dir = -1
            if not before:
                dir = 1
            i += dir
            self.view.set_status("shorten_url", "[%s=%s]" % (" " * before, " " * after))
            sublime.set_timeout(lambda: self.handle_threads(edit, threads, sels, offset, i, dir), 100)
            return
        self.view.erase_status("shorten_url")
        selections = len(self.view.sel())
        sublime.status_message("URL shortener successfully ran on %s URL%s" %
            (selections, "" if selections == 1 else "s"))
    def replace(self, edit, thread, sels, offset):
        sel = thread.sel
        result = thread.result
        if offset:
            sel = sublime.Region(edit, thread.sel.begin() + offset, thread.sel.end() + offset)
        self.view.replace(edit, sel, result)
        return
class GooglApiCall(threading.Thread):
    def __init__(self, sel, url, timeout):
        self.sel = sel
        self.url = url
        self.timeout = timeout
        self.result = None
        threading.Thread.__init__(self)
    def run(self):
        try:
            apiKey = "xxxxxxxxxxxxxxxxxxxxxxxx"
            requestUrl = "https://www.googleapis.com/urlshortener/v1/url"
            data = json.dumps({"longUrl": self.url})
            binary_data = data.encode("utf-8")
            headers = {
                "User-Agent": "Sublime URL Shortener",
                "Content-Type": "application/json"
            }
            request = urllib.request.Request(requestUrl, binary_data, headers)
            response = urllib.request.urlopen(request, timeout=self.timeout)
            self.result = json.loads(response.read().decode())
            self.result = self.result["id"]
            return
        except (urllib.error.HTTPError) as e:
            err = "%s: HTTP error %s contacting API. %s." % (__name__, str(e.code), str(e.reason))
        except (urllib.error.URLError) as e:
            err = "%s: URL error %s contacting API" % (__name__, str(e.reason))
        sublime.error_message(err)
        self.result = False
The problem is that I get the following error in the console every time the plugin runs:
Traceback (most recent call last):
  File "/Users/joejoinerr/Library/Application Support/Sublime Text 3/Packages/URL Shortener/url_shortener.py", line 51, in <lambda>
    sublime.set_timeout(lambda: self.handle_threads(edit, threads, sels, offset, i, dir), 100)
  File "/Users/joejoinerr/Library/Application Support/Sublime Text 3/Packages/URL Shortener/url_shortener.py", line 39, in handle_threads
    offset = self.replace(edit, thread, sels, offset)
  File "/Users/joejoinerr/Library/Application Support/Sublime Text 3/Packages/URL Shortener/url_shortener.py", line 64, in replace
    self.view.replace(edit, sel, result)
  File "/Applications/Sublime Text.app/Contents/MacOS/sublime.py", line 657, in replace
    raise ValueError("Edit objects may not be used after the TextCommand's run method has returned")
ValueError: Edit objects may not be used after the TextCommand's run method has returned
I'm not sure what the problem is from that error. I have done some research and I understand that the solution may be held in the answer to this question, but due to my lack of Python knowledge I can't figure out how to adapt it to my use case.
I was searching for a Python autocompletion plugin for Sublime and found this question. I like your plugin idea. Did you ever get it working? The ValueError is telling you that you are trying to use the edit argument to ShortenUrlCommand.run after ShortenUrlCommand.run has returned. I think you could do this in Sublime Text 2 using begin_edit and end_edit, but in 3 your plugin has to finish all of its edits before run returns (https://www.sublimetext.com/docs/3/porting_guide.html).
In your code, the handle_threads function checks the GooglApiCall threads every 100 ms and executes the replacement for any thread that has finished. But handle_threads has a typo that would keep it rescheduling itself forever: thread = next_threads where it should be threads = next_threads. This means that finished threads are never removed from the list of active threads, so every invocation of handle_threads processes all the threads again. The first invocation scheduled with sublime.set_timeout runs after run has returned, so its call to replace throws the exception you see.
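Stripped of the Sublime specifics, the corrected polling step can be sketched with plain threading threads. This is an illustration only: poll_until_done, on_finished and Worker are hypothetical names, Worker stands in for GooglApiCall without any network call, and time.sleep stands in for sublime.set_timeout (you must not sleep on Sublime's UI thread). Each pass hands finished threads to the callback exactly once and keeps only the live ones, so the loop ends once every thread is done.

```python
import threading
import time


def poll_until_done(threads, on_finished, interval=0.01):
    """Hand each finished thread to on_finished exactly once, re-checking the rest every interval seconds."""
    while threads:
        next_threads = []
        for thread in threads:
            if thread.is_alive():
                next_threads.append(thread)  # still running: check again on the next pass
            else:
                on_finished(thread)
        # The fix: rebind `threads` (not `thread`) so finished threads drop out of the list
        threads = next_threads
        if threads:
            time.sleep(interval)  # stand-in for sublime.set_timeout(..., 100)


class Worker(threading.Thread):
    """Hypothetical stand-in for GooglApiCall: sleeps briefly, then stores a result."""

    def __init__(self, n):
        super().__init__()
        self.n = n
        self.result = None

    def run(self):
        time.sleep(0.01 * self.n)
        self.result = "short-%d" % self.n


workers = [Worker(n) for n in range(3)]
for w in workers:
    w.start()
handled = []
poll_until_done(workers, lambda t: handled.append(t.result))
```

Even with the typo fixed, in Sublime Text 3 the deferred callbacks still cannot use the edit object, which is why joining the threads inside run, as described next, is the simpler route.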
You actually don't need to worry about whether the GooglApiCall threads are finished in handle_threads, though, because you call join on each one before calling handle_threads (see the python threading docs for more detail on join: https://docs.python.org/2/library/threading.html). You know the threads are finished, so you can just do something like:
def handle_threads(self, edit, threads, sels):
    offset = 0
    for thread in threads:
        if thread.result:
            offset = self.replace(edit, thread, sels, offset)
    selections = len(threads)
    sublime.status_message("URL shortener successfully ran on %s URL%s" %
                           (selections, "" if selections == 1 else "s"))
This still has problems: it does not properly handle multiple selections and it blocks the UI thread in Sublime.
Multiple Selections
When you replace multiple selections you have to consider that the replacement text might not be the same length as the text it replaces. This shifts the text after it and you have to adjust the indexes for subsequent selected regions. For example, suppose the URLs are selected in the following text and that you are replacing them with shortened URLs:
          1         2         3         4         5         6         7
01234567890123456789012345678901234567890123456789012345678901234567890123
blah blah http://example.com/long blah blah http://example.com/longer blah
The second URL occupies indexes 44 to 68. After replacing the first URL we have:
          1         2         3         4         5         6         7
01234567890123456789012345678901234567890123456789012345678901234567890123
blah blah http://goo.gl/abc blah blah http://example.com/longer blah
Now the second URL occupies indexes 38 to 62. It is shifted by -6: the length of the string we replaced it with (17) minus the length of the string we just replaced (23). You need to keep track of that difference and update it after each replacement as you go along. It looks like you had this in mind with your offset argument, but never got around to implementing it.
def handle_threads(self, edit, threads, sels):
    offset = 0
    for thread in threads:
        if thread.result:
            offset = self.replace(edit, thread.sel, thread.result, offset)
    selections = len(threads)
    sublime.status_message("URL shortener successfully ran on %s URL%s" %
                           (selections, "" if selections == 1 else "s"))
def replace(self, edit, selection, replacement_text, offset):
    # Adjust the selection region to account for previous replacements
    adjusted_selection = sublime.Region(selection.begin() + offset,
                                        selection.end() + offset)
    self.view.replace(edit, adjusted_selection, replacement_text)
    # Update the offset for the next replacement
    old_len = selection.size()
    new_len = len(replacement_text)
    delta = new_len - old_len
    new_offset = offset + delta
    return new_offset
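The offset bookkeeping in replace can be checked outside Sublime on a plain string. This is a sketch, not plugin code: replace_regions is a hypothetical helper, regions are (begin, end) pairs with an exclusive end (the same convention as sublime.Region, where size() is end minus begin), the regions are assumed sorted and non-overlapping, and the shortened URLs are made-up placeholders. It reproduces the worked example above.

```python
def replace_regions(text, regions, replacements):
    """Apply replacements to sorted, non-overlapping (begin, end) regions of text.

    Returns the new text and the final cumulative offset.
    """
    offset = 0
    for (begin, end), new_text in zip(regions, replacements):
        # Shift the original region by the length change of all earlier replacements
        adj_begin, adj_end = begin + offset, end + offset
        text = text[:adj_begin] + new_text + text[adj_end:]
        # Same delta as in replace(): new length minus old length
        offset += len(new_text) - (end - begin)
    return text, offset


original = "blah blah http://example.com/long blah blah http://example.com/longer blah"
regions = [(10, 33), (44, 69)]  # the two URLs from the worked example
shortened = ["http://goo.gl/abc", "http://goo.gl/def"]  # hypothetical short URLs
result, final_offset = replace_regions(original, regions, shortened)
# result == "blah blah http://goo.gl/abc blah blah http://goo.gl/def blah"
# final_offset == -14  (-6 for the first URL, -8 for the second)
```

Walking the regions from last to first would avoid the offset entirely, because a replacement never moves the text that comes before it.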
Blocking the UI Thread
I'm not familiar with Sublime plugins, so I looked at how this is handled in the Gist plugin (https://github.com/condemil/Gist). They block the UI thread for the duration of the HTTP requests. This seems undesirable, but I think there might be a problem if you don't block: the user could change the text buffer and invalidate the selection indexes before your plugin finishes its updates. If you want to go down this road, you might try moving the URL shortening calls into a WindowCommand. Then once you have the replacement text you could execute a replacement command on the current view for each one. This example gets the current view and executes ShortenUrlCommand on it. You will have to move the code that collects the shortened URLs out into ShortenUrlWrapperCommand.run:
class ShortenUrlWrapperCommand(sublime_plugin.WindowCommand):
    def run(self):
        view = self.window.active_view()
        view.run_command("shorten_url")
I've been working on a graph traversal algorithm over a simple network and I'd like to run it using multiprocessing, since it is going to require a lot of I/O-bound calls when I scale it over the full network. The simple version runs pretty fast:
already_seen = {}
already_seen_get = already_seen.get
GH_add_node = GH.add_node
GH_add_edge = GH.add_edge
GH_has_node = GH.has_node
GH_has_edge = GH.has_edge
def graph_user(user, depth=0):
    logger.debug("Searching for %s", user)
    logger.debug("At depth %d", depth)
    users_to_read = followers = following = []
    if already_seen_get(user):
        logging.debug("Already seen %s", user)
        return None
    result = [x.value for x in list(view[user])]
    if result:
        result = result[0]
        following = result['following']
        followers = result['followers']
        users_to_read = set().union(following, followers)
    if not GH_has_node(user):
        logger.debug("Adding %s to graph", user)
        GH_add_node(user)
    for follower in users_to_read:
        if not GH_has_node(follower):
            GH_add_node(follower)
            logger.debug("Adding %s to graph", follower)
            if depth < max_depth:
                graph_user(follower, depth + 1)
        if GH_has_edge(follower, user):
            GH[follower][user]['weight'] += 1
        else:
            GH_add_edge(user, follower, {'weight': 1})
It's actually significantly faster than my multiprocessing version:
to_write = Queue()
to_read = Queue()
to_edge = Queue()
already_seen = Queue()
def fetch_user():
    seen = {}
    read_get = to_read.get
    read_put = to_read.put
    write_put = to_write.put
    edge_put = to_edge.put
    seen_get = seen.get
    while True:
        try:
            logging.debug("Begging for a user")
            user = read_get(timeout=1)
            if seen_get(user):
                continue
            logging.debug("Adding %s", user)
            seen[user] = True
            result = [x.value for x in list(view[user])]
            write_put(user, timeout=1)
            if result:
                result = result.pop()
                logging.debug("Got user %s and result %s", user, result)
                following = result['following']
                followers = result['followers']
                users_to_read = list(set().union(following, followers))
                [edge_put((user, x, {'weight': 1})) for x in users_to_read]
                [read_put(y, timeout=1) for y in users_to_read if not seen_get(y)]
        except Empty:
            logging.debug("Fetches complete")
            return
def write_node():
    users = []
    users_app = users.append
    write_get = to_write.get
    while True:
        try:
            user = write_get(timeout=1)
            logging.debug("Writing user %s", user)
            users_app(user)
        except Empty:
            logging.debug("Users complete")
            return users
def write_edge():
    edges = []
    edges_app = edges.append
    edge_get = to_edge.get
    while True:
        try:
            edge = edge_get(timeout=1)
            logging.debug("Writing edge %s", edge)
            edges_app(edge)
        except Empty:
            logging.debug("Edges Complete")
            return edges
if __name__ == '__main__':
    pool = Pool(processes=1)
    to_read.put(me)
    pool.apply_async(fetch_user)
    users = pool.apply_async(write_node)
    edges = pool.apply_async(write_edge)
    GH.add_weighted_edges_from(edges.get())
    GH.add_nodes_from(users.get())
    pool.close()
    pool.join()
What I can't figure out is why the single-process version is so much faster. In theory, the multiprocessing version should be writing and reading simultaneously. I suspect there is lock contention on the queues and that this is the cause of the slowdown, but I don't really have any evidence of that. When I scale up the number of fetch_user processes it seems to run faster, but then I have issues with synchronizing the seen data across them. So some thoughts I've had are:
Is this even a good application for multiprocessing? I was originally using it because I wanted to be able to fetch from the db in parallel.
How can I avoid resource contention when reading from and writing to the same queue?
Did I miss some obvious caveat in the design?
What can I do to share a lookup table between the readers so I don't keep fetching the same user twice?
When I increase the number of fetching processes, the writers eventually lock. It looks like the write queue is not being written to, but the read queue is full. Is there a better way to handle this situation than with timeouts and exception handling?
Queues in Python are synchronized. This means that only one thread at a time can read or write, which will definitely create a bottleneck in your app.
One better solution is to distribute the processing based on a hash function and assign the work to the threads with a simple modulo operation. So, for instance, if you have 4 threads you could have 4 queues:
from multiprocessing import Queue  # queue.Queue if the workers are threads
NUM_WORKERS = 4
thread_queues = []
for i in range(NUM_WORKERS):
    thread_queues.append(Queue())  # one queue per worker
for user in user_list:
    user_hash = hash(user.user_id)  # hash here is just a shortcut for some standard hash utility
    thread_id = user_hash % NUM_WORKERS
    thread_queues[thread_id].put(user)
# From here ... your pool of threads accesses thread_queues, but each thread ONLY accesses
# one queue, based on a numeric id given to each of them.
Most hash functions will distribute your data evenly. I normally use UMAC, but maybe you can just try the hash function from Python's string implementation.
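A sketch of the partitioning idea under a few assumptions: users are plain string ids, NUM_WORKERS, partition_of and drain are hypothetical names, and zlib.crc32 stands in for UMAC or any other hash. One caveat: since Python 3.3, hash() of a str is randomized per interpreter process unless PYTHONHASHSEED is set, so separate processes can disagree on hash(user_id); a hash of the encoded bytes, such as CRC32, gives the same answer in every process. Because a given user always lands in the same partition, each worker can keep its own local seen set instead of sharing a lookup table. queue.Queue keeps the sketch single-process; multiprocessing.Queue would be the cross-process equivalent.

```python
import zlib
from queue import Queue

NUM_WORKERS = 4  # hypothetical worker count


def partition_of(user_id, num_workers=NUM_WORKERS):
    """Map a user id to a worker index with a hash that is stable across processes."""
    return zlib.crc32(user_id.encode("utf-8")) % num_workers


# One queue per worker; each worker reads only from its own queue
thread_queues = [Queue() for _ in range(NUM_WORKERS)]

user_list = ["alice", "bob", "carol", "dave", "alice"]  # made-up ids, including a duplicate
for user_id in user_list:
    thread_queues[partition_of(user_id)].put(user_id)


def drain(q):
    """What one worker might do: a local seen set suffices, since no other worker receives these ids."""
    seen = set()
    while not q.empty():
        user_id = q.get()
        if user_id in seen:
            continue  # duplicate already handled by this worker
        seen.add(user_id)
    return seen


per_worker = [drain(q) for q in thread_queues]
```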
Another improvement would be to avoid the use of Queues and use a non-synchronized object, such as a list.