I am computing cosine similarity between pairs of strings in a list so I can remove near-duplicate sentences and keep only unique strings. I take one string and compare it with every other string in the list. The method I implemented is O(n^2) and will take at least a month to finish for all my strings, so I was wondering whether I could run the nested-loop tasks in parallel with asyncio to reduce the time.
I tried something very similar to the code below, but it doesn't run asynchronously. Kindly guide me a little bit; thank you.
import asyncio
import random

async def dumb_add(i, j):
    print("adding", i, "+", j)
    await asyncio.sleep(random.randint(0, 3))
    print(i, "+", j, "=", (i + j))

async def main():
    for i in range(0, 2):
        for j in range(0, 2):
            await dumb_add(i, j)
    print('main done')

# note: create_task only works where an event loop is already running (e.g. a notebook)
asyncio.create_task(main())
Results:
adding 0 + 0
0 + 0 = 0
adding 0 + 1
0 + 1 = 1
adding 1 + 0
1 + 0 = 1
adding 1 + 1
1 + 1 = 2
main done
It is not running in parallel because the await keyword makes the coroutine wait for each dumb_add call to finish before moving on to the next one, so the calls run sequentially rather than concurrently.
If you want your dumb_add calls to run concurrently, use asyncio.gather(): build a list of coroutines and await them all at once.
Something like this:
import asyncio
import random

async def dumb_add(i, j):
    print("adding", i, "+", j)
    await asyncio.sleep(random.randint(0, 3))
    print(i, "+", j, "=", (i + j))

async def main():
    tasks = []
    for i in range(0, 2):
        for j in range(0, 2):
            tasks.append(dumb_add(i, j))
    await asyncio.gather(*tasks)
    print('main done')

asyncio.run(main())
import time

i = 1

def sendData(x):
    time.sleep(5)
    print("delayed data: ", x)

while (1):
    print(i)
    sendData(i)
    i += 1
    time.sleep(0.5)
What I want is to print a value every 5 seconds while the infinite loop keeps running, so I can see values printing every 0.5 seconds and another value being printed every 5 seconds.
At the moment, the loop still gets delayed because of the time.sleep(5) in the helper function. Any help is appreciated. Thank you.
You can achieve your goal using the threading library. It allows you to run code in the "background" while your main code runs alongside it.
Here's an example of how to run the sendData function in the background with the main loop executing concurrently. Notice that I modified sendData to use the global variable i instead of receiving it as a parameter to allow the main loop to update i.
import threading
import time

i = 1

def sendData():
    while True:
        time.sleep(5)
        print("delayed data: ", i)

thread = threading.Thread(target=sendData)
thread.start()

while (1):
    print(i)
    i += 1
    time.sleep(0.5)
You can read more about the threading module and about sharing variables between threads.
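As a side note on sharing variables: in the example above only the main loop writes i, so a plain global works. If more than one thread ever writes the shared value, a minimal sketch of guarding it with a threading.Lock (the lock name and loop bounds here are my own illustration, not part of the original answer) could look like this:

import threading
import time

i = 1
i_lock = threading.Lock()  # hypothetical lock guarding the shared counter

def sendData():
    while True:
        time.sleep(5)
        with i_lock:                     # read the shared value under the lock
            print("delayed data: ", i)

thread = threading.Thread(target=sendData, daemon=True)
thread.start()

for _ in range(20):
    with i_lock:                         # update the shared value under the lock
        i += 1
        print(i)
    time.sleep(0.5)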
You could achieve this with an asynchronous approach, or with multithreading or multiprocessing.
Your current approach blocks execution because it runs step by step.
The right choice depends on what you want the sendData method to do, but judging from the name, asyncio should work just fine.
import asyncio

async def send_data(x):
    await asyncio.sleep(5)  # Could be a network request as well
    print("delayed data: ", x)

async def main():
    i = 1
    while True:
        print(i)
        # Create non-blocking task (runs in the background)
        asyncio.create_task(send_data(i))
        i += 1
        await asyncio.sleep(0.5)

if __name__ == '__main__':
    asyncio.run(main())
This code may help you, but you will need to modify it to fit your needs:
# importing module
import time

# running loop from 0 to 4
for i in range(0, 5):
    # printing numbers
    print("delayed data: ", i)
    # adding 0.5 seconds time delay
    time.sleep(0.5)
The output prints every 0.5 seconds, like this:
delayed data: 0
delayed data: 1
delayed data: 2
delayed data: 3
delayed data: 4
My program retrieves a data file and then runs the same function n times (in this example, 100) to process that data differently each time (with randomly generated ranges), and then maps the results of each trial to a file.
I tried asyncio, but I have clearly been using it wrong, since it doesn't run the way I want:
I want all the instances of the function start_analysis() to run in parallel (assuming that is the most efficient method).
This is what I have so far:
import asyncio

async def start_analysis(trial, data):
    ranges = generate_ranges()
    res_1 = func_1(ranges)
    res_2 = func_2(ranges)
    res_3 = func_3(ranges)
    map_results(trial, res_1, res_2, res_3)

def main():
    data = retrieve_data()
    trial = 1
    while trial <= 100:
        asyncio.run(start_analysis(trial, data))
        trial += 1

if __name__ == "__main__":
    main()
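For what it's worth, each asyncio.run() call above creates a fresh event loop and runs one trial to completion before the next one starts, so nothing overlaps. A minimal sketch of the asyncio.gather() pattern shown earlier, applied to this program, might look like the following; it assumes the asker's generate_ranges, func_1 through func_3, map_results, and retrieve_data helpers exist, and it only buys concurrency if start_analysis actually awaits something (such as I/O), since purely CPU-bound work will still run one trial at a time on a single event loop.

import asyncio

async def start_analysis(trial, data):
    ranges = generate_ranges()        # assumed helper from the question
    res_1 = func_1(ranges)
    res_2 = func_2(ranges)
    res_3 = func_3(ranges)
    map_results(trial, res_1, res_2, res_3)

async def main():
    data = retrieve_data()            # assumed helper from the question
    # Build one coroutine per trial and run them all in a single event loop.
    tasks = [start_analysis(trial, data) for trial in range(1, 101)]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())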
I'm trying to implement a multithreaded program in Python and am having trouble.
I want to design the program so that when the main thread receives a specific command, the program's counter goes back to the previous number and then continues counting.
The following is the code I tried to write:
import threading
import time

count = 0
preEvent = threading.Event()
runEvent = threading.Event()
runEvent.set()
preEvent.clear()

def pre():
    global count
    while True:
        if preEvent.isSet():
            count -= 1
            preEvent.clear()

def doSomething():
    global count
    while True:
        # if not runEvent.isSet():
        #     runEvent.wait()
        print(count)
        count += 1
        time.sleep(1)

def main():
    t1 = threading.Thread(target=pre, args=())
    t2 = threading.Thread(target=doSomething, args=())
    t1.start()
    t2.start()

    command = input("input command")
    while command != 'quit':
        command = input("input command")
        if command == 'pre':
            preEvent.set()

main()
But I encountered a few problems:
1. How to block t1 at the same moment I input a specific command
2. How to make t1 start over from the beginning when it is resumed, instead of continuing from the point where it was blocked
Regarding question 1, I tried adding a condition check before the print(count) statement, but if I enter the "pre" command while the program is outputting count, the program will still perform count += 1.
After that, the program goes back to check the condition and blocks t1, but this does not achieve the synchronization effect I wanted.
Is there a way to achieve this goal?
Question 2 is similar to question 1. Whenever I enter the command, I would like my program to output like this:
1
2
3
4
pre
3
4
...
But if t1 is blocked only after finishing the print(count) instruction, then when the event is cleared, t1 will continue executing from count += 1, so the output becomes the following:
1
2
3
4
pre
4
5
...
I tried to find information on the Internet, but I didn't know which keywords to search for.
Is there a method or library that can achieve this?
I have tried my best to describe my problem, but it may not be good enough. If anything is unclear, I can add more explanation.
Thank you all for your patience in reading my question.
The threading module's Condition object has a wait method that blocks until notify or notify_all is called. This should accomplish what you're looking for in your first question. For question 2 you can either define a function that handles the exit case, or just recreate the thread so it starts from the beginning.
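As a rough illustration of that wait/notify idea (the names cond, paused, and counting below are my own, not taken from the question), the counting thread can check a flag under a Condition and block until the main thread notifies it:

import threading
import time

# Hypothetical names for illustration: a Condition plus a "paused" flag.
cond = threading.Condition()
paused = False
count = 0

def counting():
    global count
    while True:
        with cond:
            while paused:        # wait() releases the lock and blocks
                cond.wait()      # until notify() is called
            count += 1
            print(count)
        time.sleep(1)

threading.Thread(target=counting, daemon=True).start()

while True:
    command = input("input command: ")
    if command == 'quit':
        break
    elif command == 'pause':
        with cond:
            paused = True        # the worker will block at cond.wait()
    elif command == 'pre':
        with cond:
            paused = False
            count -= 1           # roll the counter back one step
            cond.notify()        # wake the worker up again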
Put simply, it can work like this; maybe it helps.
from threading import Thread, Event

count = 0
counter = Event()

def pre():
    global count
    counter.wait()
    count -= 1
    print(count)

def main():
    global count
    print(count)
    while True:
        command = input("Enter 1 to increase, Enter 2 to decrease :")
        if command == "1":
            t1 = Thread(target=pre, args=())
            t1.start()
            counter.set()
        else:
            count += 1
            print(count)

main()
I'm working with trio to run asynchronous concurrent tasks that will do some web scraping on different websites. I'd like to be able to choose how many concurrent workers to divide the tasks between. To do so I've written this code:
async def run_task():
    s = trio.Session(connections=5)

    Total_to_check = to_check() / int(module().workers)

    line = 0
    if int(Total_to_check) < 1:
        Total_to_check = 1
        module().workers = int(to_check())

    for i in range(int(Total_to_check)):
        try:
            async with trio.open_nursery() as nursery:
                for x in range(int(module().workers)):
                    nursery.start_soon(python_worker, self, s, x, line)
                    line += 1
        except BlockingIOError as e:
            print("[Fatal Error]", str(e))
            continue
In this example, to_check() equals how many URLs are given to fetch data from, and module().workers equals how many concurrent workers I'd like to use.
So if I had 30 URLs and I input that I want 10 concurrent tasks, it'll fetch data from 10 URLs concurrently and repeat the procedure 3 times.
This is all well and good until Total_to_check (the number of URLs divided by the number of workers) has a fractional part.
If I have, say, 15 URLs and ask for 10 workers, this code will only check 10 URLs. The same happens if I've got 20 URLs but ask for 15 workers.
I could do something like math.ceil(Total_to_check), but then it would start trying to check URLs that don't exist.
How could I make this work properly, so that if I have 10 concurrent tasks and 15 URLs, it checks the first 10 concurrently and then the last 5 concurrently, without skipping URLs (or trying to check too many)?
Thanks!
This is what trio's CapacityLimiter is for; you would use it like this:
async def python_worker(self, session, workers, line, limit):
    async with limit:
        ...
Then you can simplify your run_task:
async def run_task():
    limit = trio.CapacityLimiter(10)
    s = trio.Session(connections=5)
    line = 0

    async with trio.open_nursery() as nursery:
        for x in range(int(to_check())):
            nursery.start_soon(python_worker, self, s, x, line, limit)
            line += 1
I believe the BlockingIOError handling would have to move inside python_worker too, because nursery.start_soon() won't block; it's the __aexit__ of the nursery that automagically waits at the end of the async with trio.open_nursery() as nursery block.
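A minimal sketch of what that relocation could look like, with placeholder logic standing in for the real scraping work:

async def python_worker(self, session, workers, line, limit):
    async with limit:                       # at most 10 workers run at once
        try:
            ...                             # the real fetch/scrape work goes here
        except BlockingIOError as e:
            # handle the error per task instead of around the whole nursery
            print("[Fatal Error]", str(e))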
Please bear with me as this is a bit of a contrived example of my real application. Suppose I have a list of numbers and I wanted to add a single number to each number in the list using multiple (2) processes. I can do something like this:
import multiprocessing

my_list = list(range(100))
my_number = 5
data_line = [{'list_num': i, 'my_num': my_number} for i in my_list]

def worker(data):
    return data['list_num'] + data['my_num']

pool = multiprocessing.Pool(processes=2)
pool_output = pool.map(worker, data_line)
pool.close()
pool.join()
Now however, there's a wrinkle to my problem. Suppose that I want to alternate adding two numbers (instead of just adding one). So around half the time I want to add my_number1, and the other half of the time I want to add my_number2. It doesn't matter which number gets added to which item in the list. However, the one requirement is that the different processes must never be adding the same number at the same time. What this boils down to essentially (I think) is that I want Process 1 to use the first number and Process 2 to use the second number exclusively, so that the processes are never simultaneously adding the same number. So something like:
my_num1 = 5
my_num2 = 100
data_line = [{'list_num': i, 'my_num1': my_num1, 'my_num2': my_num2} for i in my_list]

def worker(data):
    # if in Process 1:
    return data['list_num'] + data['my_num1']
    # if in Process 2:
    return data['list_num'] + data['my_num2']
    # and so forth
Is there an easy way to specify specific inputs per process? Is there another way to think about this problem?
multiprocessing.Pool lets you pass an initializer function, which is executed in each worker process before the actual mapped function runs.
You can use it together with a global variable to let your function know which process it is running in.
You probably want to control the initial number each process gets. You can use a Queue to tell the processes which number to pick up.
This solution is not optimal, but it works.
import multiprocessing

process_number = None

def initializer(queue):
    global process_number
    process_number = queue.get()  # atomically get the process index

def function(value):
    print("I'm process %s" % process_number)
    return value[process_number]

def main():
    queue = multiprocessing.Queue()
    for index in range(multiprocessing.cpu_count()):
        queue.put(index)

    pool = multiprocessing.Pool(initializer=initializer, initargs=[queue])
    tasks = [{0: 'Process-0', 1: 'Process-1', 2: 'Process-2'}, ...]
    print(pool.map(function, tasks))
My PC is a dual core, so as you can see, only Process-0 and Process-1 appear in the output.
I'm process 0
I'm process 0
I'm process 1
I'm process 0
I'm process 1
...
['Process-0', 'Process-0', 'Process-1', 'Process-0', ... ]