I'm trying to solve Problem 8 of Project Euler using multithreading in Python.
Find the greatest product of five consecutive digits in the 1000-digit number. The number can be found here.
My approach is to compute the products of non-overlapping chunks of 5 digits from the original list, and to repeat this 5 times, each time shifting the starting index one to the right.
Here is my thread class:
class pThread(threading.Thread):
    def __init__(self, l):
        threading.Thread.__init__(self)
        self.l = l
        self.p = 0
    def run(self):
        def greatest_product(l):
            """
            Divide the list into chunks of 5 and find the greatest product
            """
            def product(seq):
                return reduce(lambda x,y : x*y, seq)
            def chunk_product(l, n=5):
                for i in range(0, len(l), n):
                    yield product(l[i:i+n])
            result = 0
            for p in chunk_product(num):
                result = result > p and result or p
            return result
        self.p = greatest_product(self.l)
When I try to create 5 threads to cover all 5-digit chunks in my original list, the manual approach below gives the correct answer, with num being the list of single-digit numbers that I parse from the text:
thread1 = pThread(num)
del num[0]
thread2 = pThread(num)
del num[0]
thread3 = pThread(num)
del num[0]
thread4 = pThread(num)
del num[0]
thread5 = pThread(num)
thread1.start()
thread2.start()
thread3.start()
thread4.start()
thread5.start()
thread1.join()
thread2.join()
thread3.join()
thread4.join()
thread5.join()
def max(*args):
    result = 0
    for i in args:
        result = i > result and i or result
    return result
print max(thread1.p, thread2.p, thread3.p, thread4.p, thread5.p)
But this doesn't give the correct result:
threads = []
for i in range(0, 4):
    tmp = num[:]
    del tmp[0:i+1]
    thread = pThread(tmp)
    thread.start()
    threads.append(thread)
for i in range(0, 4):
    threads[i].join()
What did I do wrong here? I'm very new to multithreading so please be gentle.
There are 3 problems:
The first is that the "manual" approach does not give the correct answer. It just happens that the correct answer to the problem is at the offset 4 from the start of your list. You can see this by using:
import operator as op
print max(reduce(op.mul, num[i:i+5]) for i in range(1000))
for k in range(5):
print max(reduce(op.mul, num[i:i+5]) for i in range(k, 1000, 5))
Part of the reason is that in your "manual" approach the threads share num: every thread's self.l refers to the same list object. So when you do del num[0], every threadX.l is affected. The fact that you nevertheless get the same answer every time is due to the next problem.
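A minimal illustration of that aliasing, with made-up sample values: binding a list to another name (or storing it as self.l) does not copy it, so deleting through one name is visible through every other name, while slicing with num[:] creates an independent shallow copy.

```python
# Two names bound to one list object, as when every pThread stores num.
num = [7, 3, 1, 6, 7, 1, 7]
shared = num       # no copy: both names refer to the same list
copied = num[:]    # slicing builds a new (shallow) copy
del num[0]
print(shared)      # [3, 1, 6, 7, 1, 7] -> changed together with num
print(copied)      # [7, 3, 1, 6, 7, 1, 7] -> unaffected
```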
The line
for p in chunk_product(num):
should be:
for p in chunk_product(l):
since you want to use the parameter of function greatest_product(l) and not the global variable num.
In the second method you only spawn 4 threads since the loops range over [0, 1, 2, 3]. Also, you want to delete the values tmp[0:i] and not tmp[0:i+1]. Here is the code:
threads = []
for i in range(5):
    tmp = num[:]
    del tmp[0:i]
    thread = pThread(tmp)
    thread.start()
    threads.append(thread)
for i in range(5):
    threads[i].join()
print len(threads), map(lambda th: th.p, threads)
print max(map(lambda th: th.p, threads))
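A sketch of an alternative that avoids copying and deleting altogether, in Python 3: every thread receives the same read-only list plus a starting offset, and examines the windows that start at offset, offset + 5, offset + 10, and so on. OffsetProductThread, greatest_product and the sample digits are made up for illustration, and only full 5-digit windows are counted. In CPython the GIL keeps these threads from doing the multiplications in parallel, so this shows the work split rather than a speedup.

```python
import operator
import threading
from functools import reduce


class OffsetProductThread(threading.Thread):
    """Hypothetical helper: best product among windows starting at offset, offset+width, ..."""

    def __init__(self, digits, offset, width=5):
        super().__init__()
        self.digits = digits    # shared list; the thread only reads it
        self.offset = offset
        self.width = width
        self.p = 0

    def run(self):
        last_start = len(self.digits) - self.width   # last index where a full window fits
        for i in range(self.offset, last_start + 1, self.width):
            self.p = max(self.p, reduce(operator.mul, self.digits[i:i + self.width]))


def greatest_product(digits, width=5):
    # Offsets 0..width-1 together cover every window start exactly once.
    threads = [OffsetProductThread(digits, k, width) for k in range(width)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return max(th.p for th in threads)


# Arbitrary sample digits, not the Project Euler number.
digits = [int(c) for c in "73167176531330624919225119674426574742355349194934"]
print(greatest_product(digits))
```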
I took a stab at this mainly to get some practice with multiprocessing and to learn how to use argparse.
Be warned that this run took around 4-5 GB of RAM, in case your machine doesn't have a lot.
python euler.py -l 50000000 -n 100 -p 8
Took 5.836833333969116 minutes
The largest product of 100 consecutive numbers is: a very large number
If you type python euler.py -h at the command line you get:
usage: euler.py [-h] -l L [L ...] -n N [-p P]

Calculates the product of consecutive numbers and returns the largest product.

optional arguments:
  -h, --help    show this help message and exit
  -l L [L ...]  A single number or list of numbers, where each # is separated
                by a space
  -n N          A number that specifies how many consecutive numbers should be
                multiplied together.
  -p P          Number of processes to create. Optional, defaults to the # of
                cores on the pc.
And the code:
"""A multiprocess iplementation for calculation the maximum product of N consecutive
numbers in a given range (list of numbers)."""
import multiprocessing
import math
import time
import operator
from functools import reduce
import argparse
def euler8(alist,lenNums):
"""Returns the largest product of N consecutive numbers in a given range"""
return max(reduce(operator.mul, alist[i:i+lenNums]) for i in range(len(alist)))
def split_list_multi(listOfNumbers,numLength,threads):
"""Split a list into N parts where N is the # of processes."""
fullLength = len(listOfNumbers)
single = math.floor(fullLength/threads)
results = {}
counter = 0
while counter < threads:
if counter == (threads-1):
temp = listOfNumbers[single*counter::]
if counter == 0:
results[str(counter)] = listOfNumbers[single*counter::]
else:
prevListIndex = results[str(counter-1)][-int('{}'.format(numLength-1))::]
newlist = prevListIndex + temp
results[str(counter)] = newlist
else:
temp = listOfNumbers[single*counter:single*(counter+1)]
if counter == 0:
newlist = temp
else:
prevListIndex = results[str(counter-1)][-int('{}'.format(numLength-1))::]
newlist = prevListIndex + temp
results[str(counter)] = newlist
counter += 1
return results,threads
def worker(listNumbers,number,output):
"""A worker. Used to run seperate processes and put the results in the queue"""
result = euler8(listNumbers,number)
output.put(result)
def main(listOfNums,lengthNumbers,numCores=multiprocessing.cpu_count()):
"""Runs the module.
listOfNums must be a list of ints, or single int
lengthNumbers is N (an int) where N is the # of consecutive numbers to multiply together
numCores (an int) defaults to however many the cpu has, can specify a number if you choose."""
if isinstance(listOfNums,list):
if len(listOfNums) == 1:
valuesToSplit = [i for i in range(int(listOfNums[0]))]
else:
valuesToSplit = [int(i) for i in listOfNums]
elif isinstance(listOfNums,int):
valuesToSplit = [i for i in range(listOfNums)]
else:
print('First arg must be a number or a list of numbers')
split = split_list_multi(valuesToSplit,lengthNumbers,numCores)
done_queue = multiprocessing.Queue()
jobs = []
startTime = time.time()
for num in range(split[1]):
numChunks = split[0][str(num)]
thread = multiprocessing.Process(target=worker, args=(numChunks,lengthNumbers,done_queue))
jobs.append(thread)
thread.start()
resultlist = []
for i in range(split[1]):
resultlist.append(done_queue.get())
for j in jobs:
j.join()
resultlist = max(resultlist)
endTime = time.time()
totalTime = (endTime-startTime)/60
print("Took {} minutes".format(totalTime))
return print("The largest product of {} consecutive numbers is: {}".format(lengthNumbers, resultlist))
if __name__ == '__main__':
#To call the module from the commandline with arguments
parser = argparse.ArgumentParser(description="""Calculates the product of consecutive numbers \
and return the largest product.""")
parser.add_argument('-l', nargs='+', required=True,
help='A single number or list of numbers, where each # is seperated by a space')
parser.add_argument('-n', required=True, type=int,
help = 'A number that specifies how many consecutive numbers should be \
multiplied together.')
parser.add_argument('-p', default=multiprocessing.cpu_count(), type=int,
help='Number of processes to create. Optional, defaults to the # of cores on the pc.')
args = parser.parse_args()
main(args.l, args.n, args.p)
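The essential trick in split_list_multi is the overlap: every chunk after the first is prefixed with the last numLength - 1 elements of the previous chunk, so a window that straddles a chunk boundary still lies entirely inside one chunk. Below is a shorter sketch of the same split using plain slicing; overlapping_chunks and best_window_product are made-up names, and only full windows are counted.

```python
import operator
from functools import reduce


def overlapping_chunks(values, parts, width):
    """Split values into `parts` chunks; each chunk after the first starts width - 1
    elements early, so every full window fits completely inside at least one chunk."""
    size = len(values) // parts
    chunks = []
    for k in range(parts):
        start = k * size
        stop = len(values) if k == parts - 1 else (k + 1) * size  # last chunk takes the rest
        chunks.append(values[max(0, start - (width - 1)):stop])
    return chunks


def best_window_product(values, width):
    # Only full windows: start indices 0 .. len(values) - width.
    return max(reduce(operator.mul, values[i:i + width])
               for i in range(len(values) - width + 1))


digits = [int(ch) for ch in "9018273645546372819009876543210123"]
chunks = overlapping_chunks(digits, 3, 5)
print(max(best_window_product(c, 5) for c in chunks) == best_window_product(digits, 5))
```

Each chunk could then go to a worker, for example with multiprocessing.Pool.starmap(euler8, [(c, n) for c in chunks]) under an if __name__ == '__main__': guard; this assumes every chunk holds at least width elements. Slicing from max(0, start - (width - 1)) also behaves sensibly for width = 1, where a [-0::] slice would prepend the entire previous chunk.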
I don't know if this is a good way to optimize, but basically I am using Python inside a 3D app to create random colors per object. The code I have works well on objects with up to 10k polygons, but it crashes at 100k polygons. Is there a way to run the loop in chunks? Right now I have the for loop with an if statement that filters the first 100 elements, but then I need the next 100, and another 100, and so on. How can I write that? Maybe with a time.sleep between each chunk. It's not going to be faster, but at least it might not crash the program. Thanks.
for i, n in enumerate(uvShellIds):
    #code can only perform well within sets of 100 elements
    limit = 100 #?
    if 0 <= i <= 100:
        #do something
        print(n)
    # now I need it to work on a new set of 100 elements
    #if 101 <= i <= 200:
        #(...keep going between sets of 100...)
My current code:
import maya.OpenMaya as om
import maya.cmds as cmds
import random
def getUvShelList(name):
    selList = om.MSelectionList()
    selList.add(name)
    selListIter = om.MItSelectionList(selList, om.MFn.kMesh)
    pathToShape = om.MDagPath()
    selListIter.getDagPath(pathToShape)
    meshNode = pathToShape.fullPathName()
    uvSets = cmds.polyUVSet(meshNode, query=True, allUVSets =True)
    allSets = []
    for uvset in uvSets:
        shapeFn = om.MFnMesh(pathToShape)
        shells = om.MScriptUtil()
        shells.createFromInt(0)
        # shellsPtr = shells.asUintPtr()
        nbUvShells = shells.asUintPtr()
        uArray = om.MFloatArray()   #array for U coords
        vArray = om.MFloatArray()   #array for V coords
        uvShellIds = om.MIntArray() #The container for the uv shell Ids
        shapeFn.getUVs(uArray, vArray)
        shapeFn.getUvShellsIds(uvShellIds, nbUvShells, uvset)
        # shellCount = shells.getUint(shellsPtr)
        shells = {}
        for i, n in enumerate(uvShellIds):
            #print(i,n)
            limit = 100
            if i <= limit:
                if n in shells:
                    # shells[n].append([uArray[i],vArray[i]])
                    shells[n].append( '%s.map[%i]' % ( name, i ) )
                else:
                    # shells[n] = [[uArray[i],vArray[i]]]
                    shells[n] = [ '%s.map[%i]' % ( name, i ) ]
        allSets.append({uvset: shells})
        for shell in shells:
            selection_shell = shells.get(shell)
            cmds.select(selection_shell)
            #print(shells.get(shell))
            facesSel = cmds.polyListComponentConversion(fromUV=True, toFace=True)
            cmds.select(facesSel)
            r = [random.random() for i in range(3)]
            cmds.polyColorPerVertex(facesSel,rgb=(r[0], r[1], r[2]), cdo=1 )
            cmds.select(deselect=1)
getUvShelList( 'polySurface359' )
You can use islice from itertools to chunk.
from itertools import islice
uvShellIds = list(range(1000))
iterator = iter(uvShellIds)
while True:
    chunk = list(islice(iterator, 100))
    if not chunk:
        break
    print(chunk)  # chunk contains 100 elements you can process
I don't know how well it fits into your current code, but below is how you can process the chunks:
from itertools import islice
uvShellIds = list(range(1000))
iterator = iter(uvShellIds)
offset = 0
while True:
    chunk = list(islice(iterator, 100))
    if not chunk:
        break
    # Processing chunk items
    for i, n in enumerate(chunk):
        # offset + i will give you the right index referring to the uvShellIds variable
        # Then, perform your actions
        if n in shells:
            # shells[n].append([uArray[i],vArray[i]])
            shells[n].append( '%s.map[%i]' % ( name, offset + i ) )
        else:
            # shells[n] = [[uArray[i],vArray[i]]]
            shells[n] = [ '%s.map[%i]' % ( name, offset + i ) ]
    offset += 100
    # Your sleep can come here
The snippet above should replace your for i, n in enumerate(uvShellIds): block.
As David Culbreth's answer stated, I'm not sure the sleep will help, but I left a comment showing where you can place it.
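A variation on the islice approach, sketched here: a generator that yields each chunk together with its starting offset, so the offset bookkeeping lives in one place and a shorter final chunk is counted correctly. enumerate(chunk, start=offset) then gives the index into the original sequence directly. chunks_with_offset, the name 'mesh' and the sample shell ids are placeholders; in Maya the real name and uvShellIds would come from the code in the question.

```python
from itertools import islice


def chunks_with_offset(iterable, size):
    """Yield (offset, chunk) pairs; offset is the index of the chunk's first element."""
    iterator = iter(iterable)
    offset = 0
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield offset, chunk
        offset += len(chunk)


uv_shell_ids = [0, 0, 1, 2, 1, 0, 2]   # made-up shell ids for illustration
shells = {}
for offset, chunk in chunks_with_offset(uv_shell_ids, 3):
    # enumerate(..., start=offset) keeps i equal to the index into uv_shell_ids
    for i, n in enumerate(chunk, start=offset):
        shells.setdefault(n, []).append('mesh.map[%i]' % i)
    # an optional pause between chunks would go here
print(shells)
```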
I use this generator to "chunkify" my long-running operations in python into smaller batches:
def chunkify_list(items, chunk_size):
    for i in range(0, len(items), chunk_size):
        yield items[i:i+chunk_size]
With this defined, you can write your program something like this:
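from time import sleep
items = [1, 2, 3, 4, 5, ...]  # placeholder: your real list of items
for chunk in chunkify_list(items, 100):
    for item in chunk:
        process_item(item)  # placeholder for your per-item work
    sleep(delay)  # delay in seconds, yours to choose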
Now, I'm not going to guarantee that sleep will actually solve your problems, but this lets you handle your data one chunk at a time.
I have a program that I wrote using threads, but then I learned that in Python threads don't run in parallel (because of CPython's GIL) while processes do. As a result, I am trying to rewrite the program using multiprocessing, but I am having a hard time doing so. I have tried following several examples that show how to create processes and pools, but I don't think they are exactly what I want.
Below is my code with the attempts I have tried. The program estimates the value of pi by randomly placing points on a graph that contains a circle. It takes two command-line arguments: the first is the total number of points to place on the graph (N), and the second is the number of threads/processes I want to create.
import math
import sys
from time import time
import concurrent.futures
import random
import multiprocessing as mp
def myThread(arg):
    # Take care of input argument
    n = int(arg)
    print("Thread received. n = ", n)
    # main calculation loop
    count = 0
    for i in range (0, n):
        x = random.uniform(0,1)
        y = random.uniform(0,1)
        d = math.sqrt(x * x + y * y)
        if (d < 1):
            count = count + 1
    print("Thread found ", count, " points inside circle.")
    return count
# end myThread
# receive command line arguments
if (len(sys.argv) == 3):
    N = sys.argv[1] # original ex: 0.01
    N = int(N)
    totalThreads = sys.argv[2]
    totalThreads = int(totalThreads)
    print("N = ", N)
    print("totalThreads = ", totalThreads)
else:
    print("Incorrect number of arguments!")
    sys.exit(1)
if ((totalThreads == 1) or (totalThreads == 2) or (totalThreads == 4) or (totalThreads == 8)):
    print()
else:
    print("Invalid number of threads. Please use 1, 2, 4, or 8 threads.")
    sys.exit(1)
# start experiment
t = int(time() * 1000) # begin run time
total = 0
# ATTEMPT 1
# processes = []
# for i in range(totalThreads):
#     process = mp.Process(target=myThread, args=(N/totalThreads))
#     processes.append(process)
#     process.start()
# for process in processes:
#     process.join()
# ATTEMPT 2
#pool = mp.Pool(mp.cpu_count())
#total = pool.map(myThread, [N/totalThreads])
# ATTEMPT 3
#for i in range(totalThreads):
#    total = total + pool.map(myThread, [N/totalThreads])
#    p = mp.Process(target=myThread, args=(N/totalThreads))
#    p.start()
# ATTEMPT 4
# with concurrent.futures.ThreadPoolExecutor() as executor:
#     for i in range(totalThreads):
#         future = executor.submit(myThread, N/totalThreads) # start thread
#         total = total + future.result() # get result
# analyze results
pi = 4 * total / N
print("pi estimate =", pi)
delta_time = int(time() * 1000) - t # calculate time required
print("Time =", delta_time, " milliseconds")
I thought that creating a loop from 0 to totalThreads that creates a process for each iteration would work. I also wanted to pass in N/totalThreads (to divide the work), but it seems that processes take in an iterable list rather than an argument to pass to the method.
What is it I am missing with multiprocessing? Is it at all possible to even do what I want to do with processes?
Thank you in advance for any help, it is greatly appreciated :)
I have simplified your code and used some hard-coded values which may or may not be reasonable.
import math
import concurrent.futures
import random
from datetime import datetime
def myThread(arg):
    count = 0
    for i in range(0, arg[0]):
        x = random.uniform(0, 1)
        y = random.uniform(0, 1)
        d = math.sqrt(x * x + y * y)
        if (d < 1):
            count += 1
    return count
N = 10_000
T = 8
_start = datetime.now()
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = {executor.submit(myThread, (int(N / T),)): _ for _ in range(T)}
    total = 0
    for future in concurrent.futures.as_completed(futures):
        total += future.result()
_end = datetime.now()
print(f'Estimate for PI = {4 * total / N}')
print(f'Run duration = {_end-_start}')
A typical output on my machine looks like this:
Estimate for PI = 3.1472
Run duration = 0:00:00.008895
Bear in mind that the number of threads that actually run is managed by the ThreadPoolExecutor (TPE). When it is constructed without a max_workers argument, it chooses a default based on your machine's processor count (min(32, os.cpu_count() + 4) since Python 3.8). Tasks submitted beyond that number wait in the executor's queue until a worker thread is free. Therefore you could, if you really wanted to, set T to a very high number, and the TPE would simply queue the extra tasks rather than start more threads.
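Since the question asked about processes rather than threads, here is a sketch using concurrent.futures.ProcessPoolExecutor, which offers the same submit/map interface as the ThreadPoolExecutor above but runs each call in a separate process, so the GIL does not serialize the loop. The function names are made up for this sketch. It also avoids the problem in Attempt 1: args=(N/totalThreads) is just a number in parentheses, not a tuple (that would need a trailing comma), whereas executor.map simply takes an iterable with one argument per call.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def count_hits(n_points, seed=None):
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)   # private generator; seed=None draws fresh OS entropy
    hits = 0
    for _ in range(n_points):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1.0:   # same test as sqrt(x*x + y*y) < 1, without the sqrt
            hits += 1
    return hits


def split_work(total, workers):
    """Split total into `workers` integers that add up to exactly total."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def estimate_pi(total_points, workers):
    parts = split_work(total_points, workers)
    with ProcessPoolExecutor(max_workers=workers) as executor:
        # map() hands one integer to each call; each call runs in a worker process
        hits = sum(executor.map(count_hits, parts))
    return 4 * hits / total_points
```

Call estimate_pi from inside an if __name__ == "__main__": block (required on Windows, where worker processes re-import the main module), for example print(estimate_pi(1_000_000, 4)). Splitting with divmod also keeps the remainder that int(N / T) drops.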
I want to use multiprocessing in Python to speed up a while loop.
More specifically:
I have a matrix (samples x features). I want to select x subsets of samples whose values at a random subset of features are all different from a certain value (-1 in this case).
My serial code:
np.random.seed(43)
datafile = '...'
df = pd.read_csv(datafile, sep=" ", nrows = 89)
no_feat = 500
no_samp = 5
no_trees = 5
i=0
iter=0
samples = np.zeros((no_trees, no_samp))
features = np.zeros((no_trees, no_feat))
while i < no_trees:
    rand_feat = np.random.choice(df.shape[1], no_feat, replace=False)
    iter_order = np.random.choice(df.shape[0], df.shape[0], replace=False)
    samp_idx = []
    a=0
    #--------------
    #how to run in parallel?
    for j in iter_order:
        pot_samp = df.iloc[j, rand_feat]
        if len(np.where(pot_samp==-1)[0]) == 0:
            samp_idx.append(j)
        if len(samp_idx) == no_samp:
            print a
            break
        a+=1
    #--------------
    if len(samp_idx) == no_samp:
        samples[i,:] = samp_idx
        features[i, :] = rand_feat
        i+=1
    iter+=1
    if iter>1000: #break if subsets cannot be found
        break
Searching for fitting samples is the potentially expensive part (the j for loop), which in theory can be run in parallel. In some cases, it is not necessary to iterate over all samples to find a large enough subset, which is why I am breaking out of the loop as soon as the subset is large enough.
I am struggling to find an implementation that lets the workers check how many valid results have already been generated. Is that even possible?
I have used joblib before. If I understand correctly, it uses multiprocessing's pool methods as a backend, which only work for independent tasks? I think queues might help, but so far I have failed to implement them.
I found a working solution. I decided to run the while loop in parallel and have the different processes interact over a shared counter. Furthermore, I vectorized the search for suitable samples.
The vectorization yielded a ~300x speedup and running on 4 cores speeds up the computation ~twofold.
First I tried to implement separate processes and put the results into a queue. Turns out these aren't made to store large amounts of data.
If anyone sees another bottleneck in this code, I would be glad if you pointed it out.
With my basically nonexistent knowledge of parallel computing, I found it really hard to puzzle this together, especially since the examples on the internet are all very basic. I learned a lot, though =)
My code:
import numpy as np
import pandas as pd
import itertools
from multiprocessing import Pool, Lock, Value
from datetime import datetime
import settings
val = Value('i', 0)
worker_ID = Value('i', 1)
lock = Lock()
def findSamp(no_trees, df, no_feat, no_samp):
    lock.acquire()
    print 'starting worker - {0}'.format(worker_ID.value)
    worker_ID.value +=1
    worker_ID_local = worker_ID.value
    lock.release()
    max_iter = 100000
    samp = []
    feat = []
    iter_outer = 0
    iter = 0
    while val.value < no_trees and iter_outer<max_iter:
        rand_feat = np.random.choice(df.shape[1], no_feat, replace=False)
        #get samples with random features from dataset;
        #find and select samples that don't have missing values in the random features
        samp_rand = df.iloc[:,rand_feat]
        nan_idx = np.unique(np.where(samp_rand == -1)[0])
        all_idx = np.arange(df.shape[0])
        notnan_bool = np.invert(np.in1d(all_idx, nan_idx))
        notnan_idx = np.where(notnan_bool == True)[0]
        if notnan_idx.shape[0] >= no_samp:
            #if enough samples for random feature subset, select no_samp samples randomly
            notnan_idx_rand = np.random.choice(notnan_idx, no_samp, replace=False)
            rand_feat_rand = rand_feat
            lock.acquire()
            val.value += 1
            #x = val.value
            lock.release()
            #print 'no of trees generated: {0}'.format(x)
            samp.append(notnan_idx_rand)
            feat.append(rand_feat_rand)
        else:
            #increase iter_outer counter if no sample subset could be found for random feature subset
            iter_outer += 1
        iter+=1
    if iter >= max_iter:
        print 'exiting worker{0} because iter >= max_iter'.format(worker_ID_local)
    else:
        print 'worker{0} - finished'.format(worker_ID_local)
    return samp, feat
def initialize(*args):
    global val, worker_ID, lock
    val, worker_ID, lock = args
def star_findSamp(i_df_no_feat_no_samp):
    return findSamp(*i_df_no_feat_no_samp)
if __name__ == '__main__':
    np.random.seed(43)
    datafile = '...'
    df = pd.read_csv(datafile, sep=" ", nrows = 89)
    df = df.fillna(-1)
    df = df.iloc[:, 6:]
    no_feat = 700
    no_samp = 10
    no_trees = 5000
    startTime = datetime.now()
    print 'starting multiprocessing'
    ncores = 4
    p = Pool(ncores, initializer=initialize, initargs=(val, worker_ID, lock))
    args = itertools.izip([no_trees]*ncores, itertools.repeat(df), itertools.repeat(no_feat), itertools.repeat(no_samp))
    result = p.map(star_findSamp, args)#, callback=log_result)
    p.close()
    p.join()
    print '{0} sample subsets for tree training have been found'.format(val.value)
    samples = [x[0] for x in result if x != None]
    samples = np.vstack(samples)
    features = [x[1] for x in result if x != None]
    features = np.vstack(features)
    print datetime.now() - startTime
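One thing worth checking, since the answer asks about other problems: np.random.seed(43) is called in the parent process, and with the fork start method (the default on Linux) every Pool worker inherits that same NumPy global random state, so the workers can draw identical rand_feat sequences and produce duplicate subsets. Below is a sketch of one fix using NumPy's SeedSequence API (NumPy 1.17+): spawn one child seed per worker and build a separate Generator from each. make_worker_rngs and draw_features are made-up names; in the Pool version, each child SeedSequence would be passed to findSamp as an extra argument.

```python
import numpy as np


def make_worker_rngs(seed, n_workers):
    """Derive one independent Generator per worker from a single master seed."""
    children = np.random.SeedSequence(seed).spawn(n_workers)
    return [np.random.default_rng(child) for child in children]


def draw_features(rng, n_columns, no_feat):
    """Random feature subset without replacement, analogous to rand_feat in findSamp."""
    return rng.choice(n_columns, no_feat, replace=False)


rngs = make_worker_rngs(43, 4)
draws = [draw_features(rng, 1000, 5) for rng in rngs]
for d in draws:
    print(sorted(d.tolist()))
```

The master seed keeps runs reproducible, while each worker still gets its own stream.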
I am doing an end-of-semester project in which I need to raise a matrix to a power, and the computation has to be multithreaded.
This code works in some situations and not in others. I believe the problem is in the logic of the nested loops in the process_data function, but I can't see what I am doing wrong! I have been working on this for a couple of weeks and I am absolutely stumped. It seems to have something to do with my threads going out of bounds, but I'm not sure, because in some situations the threads go out of bounds and the matrices are still calculated properly.
Please help!
import copy
import numpy
import Queue
import random
import threading
import time
import timeit
# Create variable that determines the number of columns and
# rows in the matrix.
n = 4
# Create variable that determines the power we are taking the
# matrix to.
p = 2
# Create variable that determines the number of threads we are
# using.
t = 2
# Create an exit flag.
exitFlag = 0
# Create threading class.
class myThread (threading.Thread):
    def __init__(self, threadID, name, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.q = q
    def run(self):
        print "Starting " + self.name
        process_data(self.name, self.q)
        print "Exiting " + self.name
# Create a function that will split our data into multiple threads
# and do the matrix multiplication.
def process_data(threadName, q):
    numCalc = ((n^3)/t)
    for a in range(p-1):
        for b in range((numCalc*(q-1)),(numCalc*(q))):
            for c in range(n):
                for d in range(n):
                    matrix[a+1][b][c] += matrix[a][b][d] * matrix[0][d][c]
# Create a three dimensional matrix that will store the output for
# each power of the matrix multiplication.
matrix = [[[0 for k in xrange(n)] for j in xrange(n)] for i in xrange(p)]
print matrix
# This part fills our initial n by n matrix with random numbers
# ranging from 0 to 9 and then prints it!
print "Populating Matrix!"
for i in range(n):
    for j in range(n):
        matrix[0][i][j] = random.randint(0,9)
# Tells the user that we are multiplying matrices and starts the
# timer.
print "Taking our matrix to the next level!"
start = timeit.default_timer()
threadLock = threading.Lock()
threads = []
threadID = 1
# Create new threads
for tName in range(t):
    thread = myThread(threadID, "Thread-0"+str(tName), threadID)
    thread.start()
    threads.append(thread)
    threadID += 1
# Wait for all threads to complete
for x in threads:
    x.join()
stop = timeit.default_timer()
print stop - start
print "Exiting main thread!"
print matrix
Squaring the matrix seems to work in every case, but if I try to go beyond that, the remaining powers come out as matrices filled with zeroes! The case that I have posted works.
When I change the n, p and t variables is when I run into problems where it does not calculate properly.
Thank you for your time.
This is not correct:
numCalc = ((n^3)/t)
for b in range((numCalc*(q-1)),(numCalc*(q))):
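For instance, when n = 4 and t = 2, the first thread should have b range over the rows [0,1] and the second thread over the rows [2,3]. But in Python ^ is bitwise XOR, not exponentiation, so n^3 is 7 (not 64), and with Python 2 integer division this calculation gives:
numCalc = (4^3) / 2 = 7 / 2 = 3
thread 1 ranges b over range(0, 3) = [0,1,2]
thread 2 ranges b over range(3, 6) = [3,4,5]
So the rows are split unevenly, and thread 2 tries to access the non-existent rows 4 and 5. That raises an IndexError, which ends thread 2 before it computes its rows of any higher power, so those rows stay zero. Writing n**3 would not help either: each thread should handle n // t rows, not a share of n**3.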
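A sketch of a corrected partition, written in Python 3 rather than the question's Python 2 (row_bounds and matrix_power_threaded are made-up names). Each thread gets a contiguous block of rows, with any remainder spread over the first threads. Because row b of A^(k+1) is row b of A^k multiplied by A, each thread can compute every power of its own rows without waiting for the others. In CPython the GIL keeps these threads from doing the arithmetic in parallel, so this is about correctness rather than speed.

```python
import threading


def row_bounds(n, t, q):
    """Half-open [start, stop) row range for thread q (1-based) out of t threads."""
    base, extra = divmod(n, t)
    start = (q - 1) * base + min(q - 1, extra)
    stop = start + base + (1 if q <= extra else 0)
    return start, stop


def matrix_power_threaded(a, p, t):
    """Return [A, A^2, ..., A^p] for a square list-of-lists matrix a."""
    n = len(a)
    powers = [a] + [[[0] * n for _ in range(n)] for _ in range(p - 1)]

    def work(q):
        start, stop = row_bounds(n, t, q)
        for k in range(p - 1):
            for b in range(start, stop):   # row b of A^(k+2) needs only row b of A^(k+1)
                for c in range(n):
                    powers[k + 1][b][c] = sum(powers[k][b][d] * a[d][c] for d in range(n))

    threads = [threading.Thread(target=work, args=(q,)) for q in range(1, t + 1)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return powers


print(matrix_power_threaded([[1, 1], [0, 1]], 3, 2)[2])
```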
I am currently working with Python 2.7 on Windows 8.
My program uses threads, and I give each thread a name when I create it: the first thread is named First-Thread and the second one Second-Thread. The threads execute a function named getData() that does the following:
makes the current thread sleep for some time
calls compareValues()
retrieves the results from compareValues() and adds them to a list called myList
compareValues() does the following:
generates a random number
checks whether it is less than 5 or greater than or equal to 5
yields the result along with the current thread's name
I save the results of these threads to myList and finally print myList.
Problem: why do I never see Second-Thread in myList? I don't understand this behavior. Please try running this code and look at the output to understand my problem.
Code:
import time
from random import randrange
import threading
myList = []
def getData(i):
    print "Sleep for %d"%i
    time.sleep(i)
    data = compareValues()
    for d in list(data):
        myList.append(d)
def compareValues():
    number = randrange(10)
    if number >= 5:
        yield "%s: Greater than or equal to 5: %d "%(t.name, number)
    else:
        yield "%s: Less than 5: %d "%(t.name, number)
threadList = []
wait = randrange(10)+1
t = threading.Thread(name = 'First-Thread', target = getData, args=(wait,))
threadList.append(t)
t.start()
wait = randrange(3)+1
t = threading.Thread(name = 'Second-Thread', target = getData, args=(wait,))
threadList.append(t)
t.start()
for t in threadList:
    t.join()
print
print "The final list"
print myList
Sample output:
Sleep for 4Sleep for 1
The final list
['First-Thread: Greater than or equal to 5: 7 ', 'First-Thread: Greater than or equal to 5: 8 ']
Thank you for your time.
def compareValues():
    number = randrange(10)
    if number >= 5:
        yield "%s: Greater than or equal to 5: %d "%(t.name, number)
    else:
        yield "%s: Less than 5: %d "%(t.name, number)
In the body of compareValues the code refers to t.name. Since t is not a local variable, it is looked up according to the LEGB rule and found in the global scope, where the main thread keeps rebinding it. By the time the threads call compareValues(), the loop for t in threadList: has bound t to the first thread, and the main thread is blocked in t.join() waiting on it. t.name thus has the value First-Thread.
To get the current thread name, use threading.current_thread().name:
def compareValues():
    number = randrange(10)
    name = threading.current_thread().name
    if number >= 5:
        yield "%s: Greater than or equal to 5: %d "%(name, number)
    else:
        yield "%s: Less than 5: %d "%(name, number)
Then you will get output like
Sleep for 4
Sleep for 2
The final list
['Second-Thread: Less than 5: 3 ', 'First-Thread: Greater than or equal to 5: 5 ']
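Another option, sketched here in Python 3 (the snake_case names are just a port of the question's code, and the short sleeps are arbitrary): look up the name once in get_data and pass it to the generator explicitly, so compare_values no longer depends on any global t at all.

```python
import random
import threading
import time

my_list = []
list_lock = threading.Lock()   # list.append is thread-safe in CPython; the lock makes the intent explicit


def compare_values(name):
    number = random.randrange(10)
    if number >= 5:
        yield "%s: Greater than or equal to 5: %d" % (name, number)
    else:
        yield "%s: Less than 5: %d" % (name, number)


def get_data(wait):
    time.sleep(wait)
    name = threading.current_thread().name   # the thread that is running this function
    for line in compare_values(name):
        with list_lock:
            my_list.append(line)


threads = [threading.Thread(name=n, target=get_data, args=(random.uniform(0.01, 0.05),))
           for n in ("First-Thread", "Second-Thread")]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(my_list)
```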