Passing thread names as global variables to a function - python

I'm a bit stuck. I'm trying to pass the thread names assigned by the system into my function so that I can print the start and end time of the thread currently working in the function, and I'm using the global variables name and name3 for that. The user has to input a number in the given interval. The thread names work fine when I input 1001, but if I input numbers like 1200 or 100001 the names no longer match. I've included examples of the output: output 1 is not what I'm looking for, output 2 is what I need. I'm not sure what is causing the name change. If any additional information is needed, I'm happy to provide it.
import os
from posixpath import abspath
import time
import sys
import signal
import threading
import platform
import subprocess
from pathlib import Path
import math
lokot = threading.Lock()
lista = []
name = 0
name3 = 0
def divisor(start, end):
    lokot.acquire()
    start = time.time()
    print('{} started working at {}'.format(name, start))
    for i in range(int(start), int(end)+1):
        if int(end) % i == 0:
            lista.append(i)
    end = time.time()
    print('{} ended working at {}'.format(name, end))
    lokot.release()
def new_lista():
    lokot.acquire()
    start = time.time()
    nlista = []
    for i in lista:
        if i % 2 == 0:
            nlista.append(i)
    print(nlista)
    print('{} was executed in time frame {}'.format(name3, time.time()-start))
    lokot.release()
def f4():
    while True:
        print('Input a non negative number in given range <1000,200000>')
        number = input()
        if number.isalpha() or not number or int(number) not in range(1000, 200000):
            print('Number entered is not in the interval <1000,200000>')
            continue
        else:
            global name
            global name3
            x = int(number) / 2
            t1 = threading.Thread(target=divisor, args=(1, x))
            t2 = threading.Thread(target=divisor, args=(1, number))
            t3 = threading.Thread(target=new_lista)
            name = t1.name
            t1.start()
            name = t2.name
            t2.start()
            name3 = t3.name
            t3.start()
            t1.join()
            t2.join()
            t3.join()
            break
Input 1:
100001
Output 1:
Thread-1 started working at 1624538800.4813018
Thread-2 ended working at 1624538800.4887686
Thread-2 started working at 1624538800.4892647
Thread-2 ended working at 1624538800.5076165
[2, 4, 8, 10, 16, 20, 40, 50, 80, 100, 200, 250, 400, 500, 1000, 1250, 2000, 2500, 5000, 6250, 10000, 12500, 25000, 50000]
Thread-3 was executed in time frame 0.0
Input 2:
1001
Output 2:
Thread-1 started working at 1624538882.90607
Thread-1 ended working at 1624538882.9070616
Thread-2 started working at 1624538882.9074266
Thread-2 ended working at 1624538882.9089162
[2, 4, 8, 10, 20, 40, 50, 100, 200, 250, 500, 1000, 1250, 2500, 5000]
Thread-3 was executed in time frame 0.0

This won't necessarily work:
name = t1.name
t1.start()
name = t2.name
Nothing prevents the second assignment from happening before the t1 thread accesses the name variable.
Q: Why don't you just assign names when you create the threads instead of letting the threading library assign them? E.g.:
t1 = threading.Thread(target=divisor, args=(1, x), name="t1")
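For illustration, here is a minimal sketch (my code, not the original) that avoids the shared global entirely: name each thread at creation, as suggested above, and read threading.current_thread().name inside the worker:

import threading
import time

def divisor(start, end):
    # Each worker reads its own thread's name; nothing is shared, so no race.
    me = threading.current_thread().name
    print('{} started working at {}'.format(me, time.time()))
    # ... the divisor loop from the question goes here ...
    print('{} ended working at {}'.format(me, time.time()))

t1 = threading.Thread(target=divisor, args=(1, 50000), name="divisor-1")
t2 = threading.Thread(target=divisor, args=(1, 100001), name="divisor-2")
t1.start()
t2.start()
t1.join()
t2.join()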

Related

Missing result when using threadid parallel computing python

I'm running code on a server and splitting it across the 4 cores I have. The results are as expected when I have 3 parameter values, but when I have 4 it only gives 3 results. The code looks like this:
import time
import threading
import numpy as np
import tensorflow as tf

parameter_values = [0.2, 0.3, 0.4, 0.5]

def runsoncores(threadid):
    np.random.seed(threadid)
    tf.random.set_seed(threadid)
    parameter_for_sim = parameter_values[threadid]
    # **runs simulation here with the parameter value**
    filename = 'results_' + str(parameter_for_sim) + '.npy'
    np.save(filename, samples)

def parallel_run(threadid, gpu):
    with tf.name_scope(gpu):
        with tf.device(gpu):
            runsoncores(threadid)
    return

gpu_list = tf.config.experimental.list_logical_devices('GPU')
num_threads = len(gpu_list)
print(num_threads)

threads = list()
start = time.time()
for index in range(num_threads):
    x = threading.Thread(target=parallel_run, args=(index, gpu_list[index].name))
    threads.append(x)
    x.start()

for index, thread in enumerate(threads):
    thread.join()
end = time.time()
print('Threaded time taken: ', end-start)
What is happening and how can I get the same number of results as input values? Thanks!
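No answer is recorded here, but one common reason for a silently missing result is an unhandled exception inside a thread: the thread dies, its file is never written, and the main program carries on. A hedged diagnostic sketch (the wrapper name is my own, not from the original code):

import threading
import traceback

def safe_run(target, *args):
    # Run the real target and print any exception, so a failing
    # thread cannot disappear without a trace.
    try:
        target(*args)
    except Exception:
        print('thread %s failed:' % threading.current_thread().name)
        traceback.print_exc()

# usage: threading.Thread(target=safe_run, args=(parallel_run, index, gpu_list[index].name))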

"aborted (disconnected)" on multithreading in python 3.5.2

I wrote this Python code that runs a simple Markov chain to generate output, and I want to launch several threads of the same code and have them all report back to one central part of the program. I have assigned one queue to send messages on and one to receive them on, one pair for each thread I'm launching. In the main loop of the thread, I put the first item from the moves deque into qIn and wait for a prompt from qOut before producing another result. In the main loop of the central part of the program, one of the threads is randomly selected every second; the data it sent is consumed, and it is signalled to send more. This all works as intended when running one thread, but as soon as I increase the number of threads, e.g. to 3, after a few seconds I get a message saying "aborted (disconnected)" and execution stops. What have I done wrong here?
import time
from numpy.random import choice
from numpy import array
from collections import OrderedDict, deque
from threading import *
from queue import Queue
import random

class processMarkovChain():
    def __init__(self, qIn, qOut, IOtoCPU=65, IOtoIO=35, CPUtoCPU=75, CPUTtoIO=25, ID=0):
        IOTransitionProbabilities = OrderedDict(sorted({"CPU": IOtoCPU, "IO ": IOtoIO}.items(), key=lambda x: x[0]))
        CPUTransitionProbabilities = OrderedDict(sorted({"CPU": CPUtoCPU, "IO ": CPUTtoIO}.items(), key=lambda x: x[0]))
        self.TP = OrderedDict(sorted({"CPU": CPUTransitionProbabilities,
                                      "IO ": IOTransitionProbabilities}.items(), key=lambda x: x[0]))
        self.state = "CPU"
        self.moves = deque(self.generate_move() for _ in range(50))
        self.ID = ID
        self.qIn = qIn
        self.qOut = qOut

    def generate_move(self):
        draw = choice(list(self.TP.keys()), 1, p=array(list(self.TP[self.state].values()))/100)
        self.state = draw[0]
        return draw[0]

    def count(self, state):
        counter = 0
        for s in self.moves:
            if s != state:
                break
            counter += 1
        return counter

    def run_loop(self):
        while(1):
            time.sleep(1)
            retMove = self.moves.popleft()
            print(self.ID, retMove, self.count(retMove))
            self.qIn.put([retMove, self.count(retMove)], timeout=1000)
            self.qOut.get(timeout=1000)
            self.moves.append(self.generate_move())

numThreads = 1
inQueueList = [Queue() for ID in range(numThreads)]
outQueueList = [Queue() for ID in range(numThreads)]
threadList = [Thread(target=processMarkovChain(ID=ID,
                                               qIn=inQueueList[ID],
                                               qOut=outQueueList[ID]).run_loop).start() for ID in range(numThreads)]

while 1:
    time.sleep(1)
    luckyThread = random.randint(0, numThreads-1)
    print(inQueueList[luckyThread].get(timeout=1000))
    outQueueList[luckyThread].put("hello", timeout=1000)
Sample output from one thread:
0 CPU 1
['CPU', 1]
0 CPU 0
['CPU', 0]
0 IO 1
['IO ', 1]
0 IO 0
['IO ', 0]
0 CPU 2
['CPU', 2]
0 CPU 1
['CPU', 1]
0 CPU 0
['CPU', 0]
0 IO 0
['IO ', 0]
0 CPU 0
['CPU', 0]
0 IO 1
...
Sample output from three threads:
0 CPU 6
1 IO 0
2 IO 4
['IO ', 4]
['CPU', 6]aborted (disconnected)
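No answer is recorded for this question. For comparison, a minimal sketch (my code, heavily simplified) of the per-thread request/reply handshake the question describes, which works unchanged for any number of threads:

import threading
import queue
import random
import time

def worker(q_in, q_out, ident):
    while True:
        q_in.put((ident, random.random()))  # offer one result
        q_out.get()                         # block until asked for more

n = 3
in_qs = [queue.Queue() for _ in range(n)]
out_qs = [queue.Queue() for _ in range(n)]
for i in range(n):
    threading.Thread(target=worker, args=(in_qs[i], out_qs[i], i), daemon=True).start()

for _ in range(5):
    time.sleep(1)
    lucky = random.randrange(n)   # pick one thread, consume its data,
    print(in_qs[lucky].get())     # then signal it to produce more
    out_qs[lucky].put('more')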

python time sliding window variation

I'm stuck with a variation of the sliding window problem!
Usually we set the number of elements to slide over, but in my case I want to slide over time!
The goal I would like to reach is a function (a thread in this case) that is able to create a "time" window of n seconds (given by the user).
Starting from the first element of the queue, in this case [datetime.time(7, 6, 14, 537370), 584], add 5 seconds -> 7:6:19.537370 (ending point) and sum all elements in this interval:
[datetime.time(7, 6, 14, 537370), 584]
[datetime.time(7, 6, 18, 542798), 761]
Total: 584 + 761 = 1345
Then create another window starting from the second element, and so on.
IMPORTANT: one item can be part of several windows. The items are generated on the fly, so a naive solution with a function that sleeps for n seconds and then flushes the queue is not good for my problem.
I think it's a variation of this post:
Flexible sliding window (in Python)
But I still can't solve the problem! Any help or suggestions will be appreciated.
Thanks!
Example list of elements:
[datetime.time(7, 6, 14, 537370), 584]
[datetime.time(7, 6, 18, 542798), 761]
[datetime.time(7, 6, 20, 546007), 848]
[datetime.time(7, 6, 24, 550969), 20]
[datetime.time(7, 6, 27, 554370), 478]
[datetime.time(7, 6, 27, 554628), 12]
[datetime.time(7, 6, 31, 558919), 29]
[datetime.time(7, 6, 31, 559562), 227]
[datetime.time(7, 6, 32, 560863), 379]
[datetime.time(7, 6, 35, 564863), 132]
[datetime.time(7, 6, 37, 567276), 651]
[datetime.time(7, 6, 38, 568652), 68]
[datetime.time(7, 6, 40, 569861), 100]
[datetime.time(7, 6, 41, 571459), 722]
[datetime.time(7, 6, 44, 574802), 560]
...
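For concreteness, here is a minimal sketch (my illustration, over a static copy of the first few entries above) of the window sum being described: for each element, sum every entry whose timestamp falls within 5 seconds of it. It does not deal with the live queue, only the summing rule:

import datetime

events = [
    (datetime.time(7, 6, 14, 537370), 584),
    (datetime.time(7, 6, 18, 542798), 761),
    (datetime.time(7, 6, 20, 546007), 848),
]

def to_td(t):
    # datetime.time cannot be added to a timedelta directly,
    # so convert it to a timedelta first
    return datetime.timedelta(hours=t.hour, minutes=t.minute,
                              seconds=t.second, microseconds=t.microsecond)

window = datetime.timedelta(seconds=5)
for start, _ in events:
    total = sum(v for t, v in events
                if to_td(start) <= to_td(t) < to_td(start) + window)
    print(start, total)
# the first window prints 584 + 761 = 1345, matching the example above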
Code:
import random
import time
import threading
import datetime
from multiprocessing import Queue

q = Queue()

# this is a producer that puts elements in the queue
def t1():
    element = [0, 0]
    while True:
        time.sleep(random.randint(0, 5))
        element[0] = datetime.datetime.now().time()
        element[1] = random.randint(0, 1000)
        q.put(element)

# this is a consumer that sums elements inside a window of n seconds
# I need a sliding time window of n seconds that sums all elements inside it
def t2():
    windowsize = 5  # size of the window: 5 seconds
    total = 0
    while not q.empty():
        e = q.get()
        start = e[0]  # the first element is the beginning point
        end = start + datetime.timedelta(seconds=windowsize)  # ending point
        total += e[1]
        # some code that solves the problem :)

a = threading.Thread(target=t1)
a.start()
b = threading.Thread(target=t2)
b.start()

while True:
    time.sleep(1)
Would this do? This is how I understood your problem. It creates a class that keeps track of things: you either add to it with tw.insert() or sum over an interval with tw.sum_window(start, end).
When you initialise TimeWindow you can give it a max size parameter; the default is 10 seconds. When you add elements or calculate sums, it does a clean-up first, so that before every insert or sum operation the first element time e[0][0] and the last element time e[n][0] are within 10 seconds of each other; older entries are expunged. A "poller" thread is there to track your requests.
I have added two queues, as I do not know what you intend to do with the results. If you want to request the data from now until 5 seconds in the future, you create a request and put it in the queue. The request has a random id so that you can match it to its result. Your main thread needs to monitor the result queue: about five seconds after a request is sent to the queue, it comes back with the same id and the sum.
If this is not what you want to do, then I just don't understand what it is you are trying to achieve here. Even this is already rather complicated, and there may be a much simpler way to achieve what you intend.
import random
import time
import threading
import datetime
import Queue
import uuid
from collections import deque

q_lock = threading.RLock()

class TimeWindow(object):
    def __init__(self, max_size=10):
        self.max_size = max_size
        self.q = deque()

    def expire(self):
        # drop entries older than max_size seconds
        time_now = datetime.datetime.now()
        while True:
            try:
                oldest_element = self.q.popleft()
                oe_time = oldest_element[0]
                if oe_time + datetime.timedelta(seconds=self.max_size) > time_now:
                    self.q.appendleft(oldest_element)
                    break
            except IndexError:
                break

    def insert(self, elm):
        self.expire()
        self.q.append(elm)

    def sum_window(self, start, end):
        self.expire()
        try:
            _ = self.q[0]
        except IndexError:
            return 0
        result = 0
        for f in self.q:
            if start < f[0] < end:
                result += f[1]
            else:
                pass
        return result

tw = TimeWindow()

def t1():
    while True:
        time.sleep(random.randint(0, 3))
        element = [datetime.datetime.now(), random.randint(0, 1000)]
        with q_lock:
            tw.insert(element)

def poller(in_q, out_q):
    pending = []
    while True:
        try:
            # note: Queue.get's first positional argument is `block`,
            # so the timeout has to be passed by keyword
            new_request = in_q.get(timeout=0.1)
            new_request["end"] = new_request["start"] + datetime.timedelta(seconds=new_request["frame"])
            pending.append(new_request)
        except Queue.Empty:
            pass
        new_pending = []
        for a in pending:
            if a["end"] < datetime.datetime.now():
                with q_lock:
                    r_sum = tw.sum_window(a["start"], a["end"])
                r_structure = {"id": a["id"], "result": r_sum}
                out_q.put(r_structure)
            else:
                new_pending.append(a)
        pending = new_pending

a = threading.Thread(target=t1)
a.daemon = True
a.start()

in_queue = Queue.Queue()
result_queue = Queue.Queue()
po = threading.Thread(target=poller, args=(in_queue, result_queue,))
po.daemon = True
po.start()

while True:
    time.sleep(1)
    newr = {"id": uuid.uuid4(), "frame": 5, "start": datetime.datetime.now()}
    in_queue.put(newr)
    try:
        ready = result_queue.get(block=False)
        print ready
    except Queue.Empty:
        pass
garim#wof:~$ python solution.py
1 t1 produce element: 16:09:30.472497 1
2 t1 produce element: 16:09:33.475714 9
3 t1 produce element: 16:09:34.476922 10
4 t1 produce element: 16:09:37.480100 7
solution: 16:09:37.481171 {'id': UUID('adff334f-a97a-459d-8dcc-f28309e25574'), 'result': 19}
5 t1 produce element: 16:09:38.481352 10
solution: 16:09:38.482687 {'id': UUID('0a7481e5-e993-439a-9f7e-2c5aeef86155'), 'result': 19}
It still doesn't work :( I added a counter for each element that t1 inserts. The goal is to do the sum (result_queue.get) at this time:
16:09:35.472497 ---> 16:09:30.472497 + 5 seconds
not before. Only then does the element go out. The next time the sum will be done at:
16:09:38.475714 ---> 16:09:33.475714 + 5 seconds
I understand that it's hard to explain... With both of your solutions the time window slides, so I can consider the problem solved :) I will try to improve when the sum function is executed; that time trigger is important. I have acquired a lot of useful knowledge. Thanks for helping.
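As one possible way to make that trigger fire exactly at start + 5 seconds (my sketch, building on the TimeWindow answer above; not part of the original answers): schedule each request with threading.Timer instead of polling:

import threading
import datetime

def schedule_window_sum(tw, q_lock, out_q, start, seconds):
    # Fire exactly `seconds` after `start`, then compute the window
    # sum once, using tw and q_lock from the answer above.
    end = start + datetime.timedelta(seconds=seconds)
    def fire():
        with q_lock:
            out_q.put(tw.sum_window(start, end))
    delay = (end - datetime.datetime.now()).total_seconds()
    threading.Timer(max(delay, 0), fire).start()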

object of type '_Task' has no len() error

I am using the Parallel Python module (pp). I have a function that returns an array, but when I print the variable that holds the result of the parallelized function, it prints "pp._Task object at 0x04696510" and not the value of the matrix.
Here is the code:
from __future__ import print_function
import scipy, pylab
from scipy.io.wavfile import read
import sys
import peakpicker as pea
import pp
import fingerprint as fhash
import matplotlib
import numpy as np
import tdft
import subprocess
import time

if __name__ == '__main__':
    start = time.time()

    # Peak picking dimensions
    f_dim1 = 30
    t_dim1 = 80
    f_dim2 = 10
    t_dim2 = 20
    percentile = 80
    base = 100  # lowest frequency bin used (peaks below are too common/not as useful for identification)
    high_peak_threshold = 75
    low_peak_threshold = 60

    # TDFT parameters
    windowsize = 0.008  # set the window size (0.008s = 64 samples)
    windowshift = 0.004  # set the window shift (0.004s = 32 samples)
    fftsize = 1024  # set the fft size (if srate = 8000, 1024 --> 513 freq. bins separated by 7.797 Hz from 0 to 4000Hz)

    # Hash parameters
    delay_time = 250  # 250*0.004 = 1 second#200
    delta_time = 250*3  # 750*0.004 = 3 seconds#300
    delta_freq = 128  # 128*7.797Hz = approx 1000Hz#80

    # Time pair parameters
    TPdelta_freq = 4
    TPdelta_time = 2

    # Loading stored data
    database = np.loadtxt('database.dat')
    songnames = np.loadtxt('songnames.dat', dtype=str, delimiter='\t')
    separator = '.'

    print('Please enter an audio sample file to identify: ')
    userinput = raw_input('---> ')
    subprocess.call(['ffmpeg', '-y', '-i', userinput, '-ac', '1', '-ar', '8k', 'filesample.wav'])
    sample = read('filesample.wav')
    userinput = userinput.split(separator, 1)[0]
    print('Analyzing the audio sample: '+str(userinput))

    srate = sample[0]  # sample rate in samples/second
    audio = sample[1]  # audio data
    spectrogram = tdft.tdft(audio, srate, windowsize, windowshift, fftsize)
    mytime = spectrogram.shape[0]
    freq = spectrogram.shape[1]
    print('The size of the spectrogram is time: '+str(mytime)+' and freq: '+str(freq))

    threshold = pea.find_thres(spectrogram, percentile, base)
    peaks = pea.peak_pick(spectrogram, f_dim1, t_dim1, f_dim2, t_dim2, threshold, base)
    print('The initial number of peaks is:'+str(len(peaks)))
    peaks = pea.reduce_peaks(peaks, fftsize, high_peak_threshold, low_peak_threshold)
    print('The reduced number of peaks is:'+str(len(peaks)))

    # Store information for the spectrogram graph
    samplePeaks = peaks
    sampleSpectro = spectrogram

    hashSample = fhash.hashSamplePeaks(peaks, delay_time, delta_time, delta_freq)
    print('The dimensions of the hash matrix of the sample: '+str(hashSample.shape))

    # tuple of all parallel python servers to connect with
    ppservers = ()
    #ppservers = ("10.0.0.1",)

    if len(sys.argv) > 1:
        ncpus = int(sys.argv[1])
        # Creates jobserver with ncpus workers
        job_server = pp.Server(ncpus, ppservers=ppservers)
    else:
        # Creates jobserver with automatically detected number of workers
        job_server = pp.Server(ppservers=ppservers)
    print("Starting pp with", job_server.get_ncpus(), "workers")

    print('Attempting to identify the sample audio clip.')
Here I call the function in fingerprint; the commented line worked, but when I try to parallelize it, it doesn't work:
    timepairs = job_server.submit(fhash.findTimePairs, (database, hashSample, TPdelta_freq, TPdelta_time, ))
    # timepairs = fhash.findTimePairs(database, hashSample, TPdelta_freq, TPdelta_time)
    print(timepairs)

    # Compute number of matches by song id to determine a match
    numSongs = len(songnames)
    songbins = np.zeros(numSongs)
    numOffsets = len(timepairs)
    offsets = np.zeros(numOffsets)
    index = 0
    for i in timepairs:
        offsets[index] = i[0]-i[1]
        index = index+1
        songbins[i[2]] += 1

    # Identify the song
    #orderarray=np.column_stack((songbins,songnames))
    #orderarray=orderarray[np.lexsort((songnames,songbins))]
    q3 = np.percentile(songbins, 75)
    q1 = np.percentile(songbins, 25)
    j = 0
    for i in songbins:
        if i > (q3+(3*(q3-q1))):
            print("Result-> "+str(i)+":"+songnames[j])
        j += 1

    end = time.time()
    print('Tiempo: '+str(end-start)+' s')
    print("Time elapsed: ", +time.time() - start, "s")

    fig3 = pylab.figure(1003)
    ax = fig3.add_subplot(111)
    ind = np.arange(numSongs)
    width = 0.35
    rects1 = ax.bar(ind, songbins, width, color='blue', align='center')
    ax.set_ylabel('Number of Matches')
    ax.set_xticks(ind)
    xtickNames = ax.set_xticklabels(songnames)
    matplotlib.pyplot.setp(xtickNames)
    pylab.title('Song Identification')
    fig3.show()
    pylab.show()
    print('The sample song is: '+str(songnames[np.argmax(songbins)]))
The function in fingerprint that I try to parallelize is:
def findTimePairs(hash_database, sample_hash, deltaTime, deltaFreq):
    "Find the matching pairs between sample audio file and the songs in the database"
    timePairs = []
    for i in sample_hash:
        for j in hash_database:
            if (i[0] > (j[0]-deltaFreq) and i[0] < (j[0] + deltaFreq)):
                if (i[1] > (j[1]-deltaFreq) and i[1] < (j[1] + deltaFreq)):
                    if (i[2] > (j[2]-deltaTime) and i[2] < (j[2] + deltaTime)):
                        timePairs.append((j[3], i[3], j[4]))
                    else:
                        continue
                else:
                    continue
            else:
                continue
    return timePairs
The complete error is:
Traceback (most recent call last):
  File "analisisPrueba.py", line 93, in <module>
    numOffsets = len(timepairs)
TypeError: object of type '_Task' has no len()
The submit() method submits a task to the server. What you get back is a reference to the task, not its result. (How could it return its result? submit() returns before any of that work has been done!) You should instead provide a callback function to receive the results. For example, timepairs.append is a function that will take the result and append it to the list timepairs.
timepairs = []
job_server.submit(fhash.findTimePairs, (database, hashSample, TPdelta_freq, TPdelta_time, ), callback=timepairs.append)
(Each findTimePairs call should calculate one result, in case that isn't obvious, and you should submit multiple tasks. Otherwise you're invoking all the machinery of Parallel Python for no reason. And make sure you call job_server.wait() to wait for all the tasks to finish before trying to do anything with your results. In short, read the documentation and some example scripts and make sure you understand how it works.)
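A hedged sketch of that advice (the chunking is my own illustration; pp itself does not split the data for you): submit one task per chunk of the database, collect partial results through the callback, and wait for completion:

# split the database into one chunk per worker and submit each
# chunk as its own task; the callback merges the partial results
timepairs = []
chunks = np.array_split(database, job_server.get_ncpus())
for chunk in chunks:
    job_server.submit(fhash.findTimePairs,
                      (chunk, hashSample, TPdelta_freq, TPdelta_time),
                      callback=timepairs.extend)
job_server.wait()  # block until every task has finished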

Multiprocessing in Python to process a list of parameters

I'm writing my first multiprocessing program in Python.
I want to create a list of values to be processed, and 8 processes (the number of CPU cores) will consume and process the list of values.
I wrote the following python code:
__author__ = 'Rui Martins'

from multiprocessing import cpu_count, Process, Lock, Value

def proc(lock, number_of_active_processes, valor):
    lock.acquire()
    number_of_active_processes.value += 1
    print "Active processes:", number_of_active_processes.value
    lock.release()
    # DO SOMETHING ...
    for i in range(1, 100):
        valor = valor**2
    # (...)
    lock.acquire()
    number_of_active_processes.value -= 1
    lock.release()

if __name__ == '__main__':
    proc_number = cpu_count()
    number_of_active_processes = Value('i', 0)
    lock = Lock()
    values = [11, 24, 13, 40, 15, 26, 27, 8, 19, 10, 11, 12, 13]
    values_processed = 0
    processes = []
    for i in range(proc_number):
        processes += [Process()]
    while values_processed < len(values):
        while number_of_active_processes.value < proc_number and values_processed < len(values):
            for i in range(proc_number):
                if not processes[i].is_alive() and values_processed < len(values):
                    processes[i] = Process(target=proc, args=(lock, number_of_active_processes, values[values_processed]))
                    values_processed += 1
                    processes[i].start()
        while number_of_active_processes.value == proc_number:
            # BUG: always number_of_active_processes.value == 8 :(
            print "Active processes:", number_of_active_processes.value

    print ""
    print "Active processes at END:", number_of_active_processes.value
And I have the following problems:
The program never stops
I run out of RAM
Simplifying your code to the following:
def proc(lock, number_of_active_processes, valor):
    lock.acquire()
    number_of_active_processes.value += 1
    print("Active processes:", number_of_active_processes.value)
    lock.release()
    # DO SOMETHING ...
    for i in range(1, 100):
        print(valor)
        valor = valor**2
    # (...)
    lock.acquire()
    number_of_active_processes.value -= 1
    lock.release()

if __name__ == '__main__':
    proc_number = cpu_count()
    number_of_active_processes = Value('i', 0)
    lock = Lock()
    values = [11, 24, 13, 40, 15, 26, 27, 8, 19, 10, 11, 12, 13]
    values_processed = 0
    processes = [Process() for _ in range(proc_number)]
    while values_processed < len(values)-1:
        for p in processes:
            if not p.is_alive():
                p = Process(target=proc,
                            args=(lock, number_of_active_processes, values[values_processed]))
                values_processed += 1
                p.start()
If you run it like the above with the print(valor) added, you see exactly what is happening: you are repeatedly squaring valor until you run out of memory. You don't get stuck in the while loop; you get stuck in the for loop.
This is the output at the 12th process after adding a print(len(str(valor))); within a fraction of a second it already looks like this, and it just keeps on going:
2
3
6
11
21
.........
59185
70726
68249
73004
77077
83805
93806
92732
90454
104993
118370
136498
131073
Just changing your loop to the following:
for i in range(1, 100):
    print(valor)
    valor = valor*2
The last number created is:
6021340351084089657109340225536
Using your own code you seem to get stuck in the while loop, but it is really valor growing in the for loop, to numbers with as many digits as:
167609
180908
185464
187612
209986
236740
209986
And on....
The problem is not your multiprocessing code. It's the pow operator in the for loop:
for i in range(1, 100):
    valor = valor**2
The loop squares 99 times, so the final result would be pow(valor, 2**99), which is far too big; calculating it would cost too much time and memory, so you ran out of memory in the end.
4 GB = 4 * pow(2, 10) * pow(2, 10) * pow(2, 10) * 8 bit = 2**35 bit
and for your smallest number 8:
pow(8, 2**99) = pow(2**3, 2**99) = pow(2, 3*pow(2, 99))
pow(2, 3*pow(2, 99)) bit / 4 GB = 3*pow(2, 99-35) = 3*pow(2, 64)
It would need 3*pow(2, 64) times 4 GB of memory.
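To see the growth concretely, a quick sketch (my illustration) that prints the digit count after each squaring:

valor = 8
for i in range(1, 10):  # even 9 squarings show the trend
    valor = valor**2
    print(i, len(str(valor)))
# digits roughly double each pass: 2, 4, 8, 15, 29, 58, 116, 232, 463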
