I am using threading to download files in parallel. I have a list of image URLs, img_list, that I want to download with 2 threads, so I defined two download functions.
I pass half of the list to download_1 and the other half to download_2, expecting the downloads to complete faster, but when I run the script the downloads still happen serially. I don't know why. How can I modify my script?
here's the code:
import requests, threading

img_list = [...]
num = len(img_list)

def download_1(img_list):
    n = 0
    for i in img_list:
        n += 1
        with open('./img/' + str(n) + '.jpg', 'wb') as f:
            f.write(requests.get(i).content)
        print(str(n) + "download1 complete")

def download_2(img_list):
    n = len(img_list)
    for i in img_list:
        n += 1
        with open('./img/' + str(n) + '.jpg', 'wb') as f:
            f.write(requests.get(i).content)
        print(str(n) + "download2 complete")

thread_1 = threading.Thread(target=download_1(img_list[:int(num/2)]))
thread_2 = threading.Thread(target=download_2(img_list[int(num/2):]))
thread_1.start()
thread_2.start()
In this line
threading.Thread(target=download_1(img_list[:int(num/2)]))
you call download_1(...) and pass its return value (None) to the thread. The download runs serially because the call happens before the thread is even created. Instead you want to pass the download_1 function itself (not the result of calling it) to the thread, like this:
threading.Thread(target=download_1, args=(img_list[:int(num/2)],))
Do it in both places.
Side note: you should t.join() both threads at the end.
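For reference, here is a minimal corrected sketch of the whole script. It is not your exact code: I merged the two nearly identical functions into one download helper and added a start_index parameter so the filenames stay unique; the ./img/ path and the .jpg naming come from the question.

import requests
import threading

img_list = [...]  # your URLs
num = len(img_list)

def download(urls, start_index, label):
    # start_index keeps the filenames unique across the two halves
    n = start_index
    for url in urls:
        n += 1
        with open('./img/' + str(n) + '.jpg', 'wb') as f:
            f.write(requests.get(url).content)
        print(str(n) + " " + label + " complete")

# pass the function and its arguments separately; do not call it here
thread_1 = threading.Thread(target=download, args=(img_list[:num // 2], 0, "download1"))
thread_2 = threading.Thread(target=download, args=(img_list[num // 2:], num // 2, "download2"))
thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()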
You are calling both functions at the time you create the threads, so each thread is passed None as its target and therefore does nothing. Change the code like this (note the trailing comma: args must be a tuple):
thread_1 = threading.Thread(target=download_1, args=(img_list[:int(num/2)],))
thread_2 = threading.Thread(target=download_2, args=(img_list[int(num/2):],))
thread_1.start()
thread_2.start()
Related
I was thinking of using the multiprocessing package to run a function in parallel, but I need to pass a different parameter value on every run (every 1 sec).
e.g.)
def foo(lst):
    while True:
        <do something with lst>
        sleep(1000)

def main():
    process = multiprocessing.Process(target=foo, args=(lst,))
    process.start()
    <keep updating lst>
This causes foo() to keep running with the same parameter value over and over. How can I work around this?
Armed with the knowledge of what you're actually trying to do, i.e.
The foo function does an http post call to save some logs (batch) to the storage. The main function is getting text logs (save log to the batch) while running a given shell script. Basically, I'm trying to do batching for logging.
the answer is to use a thread and a queue for message passing (multiprocessing.Process and multiprocessing.Queue would also work, but aren't really necessary):
import threading
import time
from queue import Queue

def send_batch(batch):
    print("Sending", batch)

def save_worker(queue: Queue):
    while True:
        batch = queue.get()
        if batch is None:  # stop signal
            break
        send_batch(batch)
        queue.task_done()

def main():
    batch_queue = Queue()
    save_thread = threading.Thread(target=save_worker, args=(batch_queue,))
    save_thread.start()
    log_batch = []
    for x in range(42):  # pretend this is the shell script outputting things
        message = f"Message {x}"
        print(message)
        log_batch.append(message)
        time.sleep(0.1)
        if len(log_batch) >= 7:  # could also look at wallclock
            batch_queue.put(log_batch)
            log_batch = []
    if log_batch:
        batch_queue.put(log_batch)  # send the last batch
    print("Script stopped, waiting for worker to finish")
    batch_queue.put(None)  # stop signal
    save_thread.join()

if __name__ == "__main__":
    main()
import threading
import time

def run_every_second(param):
    # your function code here
    print(param)

# create a list of parameters to pass to the function
params = [1, 2, 3, 4]

# create and start a thread for each parameter
threads = []
for param in params:
    t = threading.Thread(target=run_every_second, args=(param,))
    t.start()
    threads.append(t)
    time.sleep(1)

# wait for all threads to complete
for t in threads:
    t.join()
This will create a new thread for each parameter, and each thread will run the run_every_second function with the corresponding parameter. The threads will run concurrently, so the functions will be executed in parallel. There will be a 1-second time lapse between the start of each thread.
Background: I'm trying to do 100's of dymola simulations with the python-dymola interface. I managed to run them in a for-loop. Now I want to run them with multi-threading so I can run multiple models in parallel (which will be much faster). Since probably nobody uses the interface, I wrote some simple code that also shows my problem:
1: Turn a for-loop into a function that is then called from another for-loop, where both the function and the for-loop share the same variable 'i'.
2: Turn a for-loop into a function and use multi-threading to execute it. A for-loop runs the commands one by one; I want to run them in parallel with a maximum of x threads at a time. The result should be the same as when executing the for-loop.
Example-code:
import os

nSim = 100
ndig = '{:01d}'

for i in range(nSim):
    os.makedirs(str(ndig.format(i)))
Note that the names of the created directories are just the numbers from the for-loop (this is important). Now, instead of using the for-loop, I would love to create the directories with multi-threading (note: probably not worthwhile for this short code, but when calling and executing 100's of simulation models it definitely is worthwhile to use multi-threading).
So I started with something I thought was simple: turning the for-loop into a function that is then called inside another for-loop, hoping to get the same result as with the for-loop code above, but I got this error:
AttributeError: 'NoneType' object has no attribute 'start'
(Note: I just started with this because I had not used the def statement before, and the threading package is also new to me. After this I would move on to multi-threading.)
1:
import os

nSim = 100
ndig = '{:01d}'

def simulation(i):
    os.makedirs(str(ndig.format(i)))

for i in range(nSim):
    simulation(i=i).start
After that failed, I tried to move on to multi-threading (converting the for-loop into something that does the same thing but runs the commands in parallel instead of one by one, with a maximum number of threads):
2:
import os
import threading

nSim = 100
ndig = '{:01d}'

def simulation(i):
    os.makedirs(str(ndig.format(i)))

if __name__ == '__main__':
    i in range(nSim)
    simulation_thread[i] = threading.Thread(target=simulation(i=i))
    simulation_thread[i].daemon = True
    simulation_thread[i].start()
Unfortunately that attempt failed as well and now I got the error:
NameError: name 'i' is not defined
Does anybody have suggestions for issues 1 or 2?
Both examples are incomplete. Here's a complete example. Note that target gets passed the function itself (target=simulation) and args gets a tuple of its arguments (args=(i,)). Don't call the function (target=simulation(i=i)), because that just passes the result of the call, which is equivalent to target=None in this case.
import threading

nSim = 100

def simulation(i):
    print(f'{threading.current_thread().name}: {i}')

if __name__ == '__main__':
    threads = [threading.Thread(target=simulation, args=(i,)) for i in range(nSim)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
Output:
Thread-1: 0
Thread-2: 1
Thread-3: 2
.
.
Thread-98: 97
Thread-99: 98
Thread-100: 99
Note that you usually don't want more threads than CPUs, a number you can get from multiprocessing.cpu_count(). You can create a thread pool and use a queue.Queue to post work that the threads execute. An example is in the Python queue documentation.
You cannot call .start like this
simulation(i=i).start
on a non-threading object. Also, you have to import the threading module as well.
It seems like you forgot to add 'for' and indent the code in your loop
i in range(nSim)
simulation_thread[i] = threading.Thread(target=simulation(i=i))
simulation_thread[i].daemon = True
simulation_thread[i].start()
to
for i in range(nSim):
    simulation_thread[i] = threading.Thread(target=simulation(i=i))
    simulation_thread[i].daemon = True
    simulation_thread[i].start()
If you would like to have a maximum number of threads in a pool, and to run all items in the queue, we can continue #mark-tolonen's answer and do it like this:
import threading
import queue
import time

def main():
    size_of_threads_pool = 10
    num_of_tasks = 30
    task_seconds = 1

    q = queue.Queue()

    def worker():
        while True:
            item = q.get()
            print(my_st)
            print(f'{threading.current_thread().name}: Working on {item}')
            time.sleep(task_seconds)
            print(f'Finished {item}')
            q.task_done()

    my_st = "MY string"
    threads = [threading.Thread(target=worker, daemon=True) for i in range(size_of_threads_pool)]
    for t in threads:
        t.start()

    # send the task requests to the workers
    for item in range(num_of_tasks):
        q.put(item)

    # block until all tasks are done
    q.join()
    print('All work completed')

    # No need for this, as the workers run `while True` and so never stop:
    # for t in threads:
    #     t.join()

if __name__ == '__main__':
    main()
This will run 30 tasks of 1 second each, using 10 threads, so the total time is about 3 seconds.
$ time python3 q_test.py
...
All work completed
real 0m3.064s
user 0m0.033s
sys 0m0.016s
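If you do want to join the workers cleanly instead of leaving daemon threads running, a common variation (a sketch, not part of the answer above) is to push one sentinel value per worker after q.join(), so each worker exits its loop and can be joined:

import threading
import queue
import time

def worker(q):
    while True:
        item = q.get()
        if item is None:          # sentinel: no more work for this thread
            q.task_done()
            break
        time.sleep(1)             # simulate the task
        print(f'Finished {item}')
        q.task_done()

if __name__ == '__main__':
    q = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q,)) for _ in range(10)]
    for t in threads:
        t.start()
    for item in range(30):
        q.put(item)
    q.join()                      # wait until all real tasks are done
    for _ in threads:
        q.put(None)               # one sentinel per worker
    for t in threads:
        t.join()                  # now the workers can be joined
    print('All work completed')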
EDIT: I found another higher-level interface for asynchronously executing callables.
Use concurrent.futures, see the example in the docs:
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
Note max_workers=5, which sets the maximum number of threads, and note the for url in URLS loop that you can adapt to your own list of work items.
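Applied to the directory-creation example from the question, a minimal sketch (assuming the same simulation(i) function and nSim as above; the max_workers value is just an example) could look like this:

import concurrent.futures
import os

nSim = 100
ndig = '{:01d}'

def simulation(i):
    os.makedirs(str(ndig.format(i)))

if __name__ == '__main__':
    # at most 5 simulations run at the same time
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = {executor.submit(simulation, i): i for i in range(nSim)}
        for future in concurrent.futures.as_completed(futures):
            i = futures[future]
            try:
                future.result()
            except OSError as exc:
                print('simulation %d failed: %s' % (i, exc))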
I have a small piece of Python code that I made to test out and hopefully debug the problem without having to modify the code in my main applet. This has led me to the following code:
#!/usr/bin/env python
import sys, threading, time

def loop1():
    count = 0
    while True:
        sys.stdout.write('\r thread 1: ' + str(count))
        sys.stdout.flush()
        count = count + 1
        time.sleep(.3)
        pass
    pass

def loop2():
    count = 0
    print ""
    while True:
        sys.stdout.write('\r thread 2: ' + str(count))
        sys.stdout.flush()
        count = count + 2
        time.sleep(.3)
        pass

if __name__ == '__main__':
    try:
        th = threading.Thread(target=loop1)
        th.start()
        th1 = threading.Thread(target=loop2)
        th1.start()
        pass
    except KeyboardInterrupt:
        print ""
        pass
    pass
My goal with this code is to have both of these threads displaying their output on stdout (with flushing) at the same time, side by side or something like that. The problem is that, since each thread flushes and rewrites the line, it overwrites the other thread's string. I don't quite know how to get this to work, if it is even possible.
If I run just one of the threads, it works fine. However, I want to be able to run both threads with their own string updating at the same time in the terminal output. Here is a picture showing what I'm getting:
terminal screenshot
let me know if you need more info. thanks in advance.
Instead of allowing each thread to output to stdout, a better solution is to have one thread control stdout exclusively. Then provide a threadsafe channel for the other threads to dispatch data to be output.
One good method to achieve this is to share a Queue between all threads. Ensure that only the output thread is accessing data after it has been added to the queue.
The output thread can store the last message from each of the other threads and use that data to format stdout nicely. This can include clearing the output to display something like the following, and updating it as each thread generates new data.
Threads
#1: 0
#2: 0
Example
Some decisions were made to simplify this example:
There are gotchas to be wary of when giving arguments to threads.
Daemon threads terminate themselves when the main thread exits. They are used to avoid adding complexity to this answer. Using them on long-running or large applications can pose problems. Other
questions discuss how to exit a multithreaded application without leaking memory or locking system resources. You will need to think about how your program needs to signal an exit. Consider using asyncio to save yourself these considerations.
No newlines are used because \r carriage returns cannot clear the whole console. They only allow the current line to be rewritten.
import queue, threading
import time, sys

q = queue.Queue()
keepRunning = True

def loop_output():
    thread_outputs = dict()
    while keepRunning:
        try:
            thread_id, data = q.get_nowait()
            thread_outputs[thread_id] = data
        except queue.Empty:
            # because the queue is used to update, there's no need to wait or block
            pass
        pretty_output = ""
        for thread_id, data in thread_outputs.items():
            pretty_output += '({}:{}) '.format(thread_id, str(data))
        sys.stdout.write('\r' + pretty_output)
        sys.stdout.flush()
        time.sleep(1)

def loop_count(thread_id, increment):
    count = 0
    while keepRunning:
        msg = (thread_id, count)
        try:
            q.put_nowait(msg)
        except queue.Full:
            pass
        count = count + increment
        time.sleep(.3)
        pass
    pass

if __name__ == '__main__':
    try:
        th_out = threading.Thread(target=loop_output)
        th_out.start()

        # make sure to use args, not pass arguments directly
        th0 = threading.Thread(target=loop_count, args=("Thread0", 1))
        th0.daemon = True
        th0.start()

        th1 = threading.Thread(target=loop_count, args=("Thread1", 3))
        th1.daemon = True
        th1.start()

        # Keep the main thread alive to wait for KeyboardInterrupt
        while True:
            time.sleep(.1)
    except KeyboardInterrupt:
        print("Ended by keyboard stroke")
        keepRunning = False
        for th in [th0, th1]:
            th.join()
Example Output:
(Thread0:110) (Thread1:330)
I'm writing a library that will connect to sockets and manage them, process their data, and act based on that.
My problem lies in sending b"\r\n\x00" to the socket every 20 seconds. I thought that starting a new thread for the ping function would work.
However, time.sleep() seems to pause the entire program instead of just that thread, as I had expected.
Here's my code thus far:
def main(self):
    recvbuf = b""
    self.connect(self.group, self.user, self.password)
    while self.connected:
        rSocket, wSocket, error = select.select([x[self.group] for x in self.conArray], [x[self.group] for x in self.conArray], [x[self.group] for x in self.conArray], 0.2)  # getting already made socket connections
        for rChSocket in rSocket:
            while not recvbuf.endswith(b"\x00"):  # [-1] doesn't work on empty things... and recvbuf is empty
                recvbuf += rChSocket.recv(1024)  # need the WHOLE message ;D
            if len(recvbuf) > 0:
                dataManager.manage(self, self.group, recvbuf)
                recvbuf = b""
        for wChSocket in wSocket:
            t = threading.Thread(self.pingTimer(wChSocket))  # here's what I need to be run every 20 seconds
            t.start()
    x[self.group] for x in self.conArray.close()
and here's the pingTimer function:
def pingTimer(self, wChSocket):
    time.sleep(20)
    print(time.strftime("%I:%M:%S %p]") + "Ping test!")  # I don't want to mini-DDoS, testing first
    # wChSocket.send(b"\r\n\x00")
Thanks :D
This:
t = threading.Thread(self.pingTimer(wChSocket))
Does not do what you expect. It calls self.pingTimer in the same thread and passes the return value to threading.Thread. That's not what you want. You probably want this:
t = threading.Thread(target=self.pingTimer, args=(wChSocket,))
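As a side note, time.sleep(20) inside pingTimer only delays a single ping. If the goal is one ping every 20 seconds for as long as the connection is open, a hedged sketch is a long-lived daemon thread per socket; the stop_pinging event and the interval parameter are illustrative additions, not taken from your code:

import threading
import time

def ping_loop(wChSocket, stop_event, interval=20):
    # send a ping every `interval` seconds until stop_event is set;
    # Event.wait() returns False on timeout and True once the event is set
    while not stop_event.wait(interval):
        print(time.strftime("%I:%M:%S %p]") + " Ping test!")
        # wChSocket.send(b"\r\n\x00")

wChSocket = None  # placeholder: the real writable socket from your select() loop
stop_pinging = threading.Event()
t = threading.Thread(target=ping_loop, args=(wChSocket, stop_pinging), daemon=True)
t.start()
# ... later, when the connection closes:
stop_pinging.set()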
I need to make blocking XML-RPC calls from my Python script to several physical servers simultaneously and perform actions based on the response from each server independently.
To explain in detail, let us assume the following pseudo code:
while True:
    response = call_to_server1()  # blocking and takes very long time
    if response == this:
        do that
I want to do this for all the servers simultaneously and independently, but from the same script.
Use the threading module.
Boilerplate threading code (I can tailor this if you give me a little more detail on what you are trying to accomplish)
def run_me(func):
    while not stop_event.isSet():
        response = func()  # blocking and takes very long time
        if response == this:
            do that

def call_to_server1():
    # code to call server 1...
    return magic_server1_call()

def call_to_server2():
    # code to call server 2...
    return magic_server2_call()

# used to stop your loop
stop_event = threading.Event()

t = threading.Thread(target=run_me, args=(call_to_server1,))
t.start()

t2 = threading.Thread(target=run_me, args=(call_to_server2,))
t2.start()

# wait for threads to return
t.join()
t2.join()

# we are done....
You can use the multiprocessing module:
import multiprocessing

def call_to_server(ip, port):
    ....
    ....

process = []
for i in xrange(server_count):
    process.append(multiprocessing.Process(target=call_to_server, args=(ip, port)))
    process[i].start()

# waiting for the processes to stop
for p in process:
    p.join()
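If you don't need to manage the Process objects yourself, a multiprocessing.Pool can do the bookkeeping for you. A minimal sketch (the server list and the body of call_to_server are placeholders, not taken from the answer above):

import multiprocessing

def call_to_server(address):
    ip, port = address
    # ... do the blocking call here ...
    return '%s:%s ok' % (ip, port)

if __name__ == '__main__':
    servers = [('10.0.0.1', 8000), ('10.0.0.2', 8000)]  # placeholder addresses
    pool = multiprocessing.Pool(processes=len(servers))
    results = pool.map(call_to_server, servers)  # blocks until every server has answered
    pool.close()
    pool.join()
    print(results)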
You can use multiprocessing plus queues. With a single sub-process, this is the example:
import multiprocessing
import time

def processWorker(input, result):
    def remoteRequest(params):
        ## this is my remote request
        return True
    while True:
        work = input.get()
        if 'STOP' in work:
            break
        result.put(remoteRequest(work))

input = multiprocessing.Queue()
result = multiprocessing.Queue()

p = multiprocessing.Process(target=processWorker, args=(input, result))
p.start()

requestlist = ['1', '2']
for req in requestlist:
    input.put(req)
for i in xrange(len(requestlist)):
    res = result.get(block=True)
    print 'retrieved ', res

input.put('STOP')
time.sleep(1)
print 'done'
To have more than one sub-process, simply use a list object to store all the sub-processes you start.
The multiprocessing queue is a process-safe object.
Then you can keep track of which request is being executed by each sub-process by simply storing the request together with a workid (the workid can be a counter incremented each time the queue is filled with new work). Using multiprocessing.Queue is robust since you do not need to rely on stdout/stderr parsing, and you also avoid the related limitations.
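A small sketch of that bookkeeping (Python 3 syntax; the names remote_request, pending and workid are illustrative, not from the answer above): tag each request with a counter when it is put on the input queue, have the worker echo the tag back with the result, and match them up on the main side.

import multiprocessing

def remote_request(params):
    # placeholder for the real remote call
    return 'result for %s' % params

def process_worker(inq, outq):
    while True:
        work = inq.get()
        if work == 'STOP':
            break
        workid, params = work                        # unpack the tag and the payload
        outq.put((workid, remote_request(params)))   # echo the tag back with the result

if __name__ == '__main__':
    inq, outq = multiprocessing.Queue(), multiprocessing.Queue()
    p = multiprocessing.Process(target=process_worker, args=(inq, outq))
    p.start()
    pending = {}                                     # workid -> original request
    for workid, params in enumerate(['1', '2']):
        pending[workid] = params
        inq.put((workid, params))
    for _ in range(len(pending)):
        workid, res = outq.get(block=True)
        print('request', pending[workid], '->', res)
    inq.put('STOP')
    p.join()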
Then, you can also set a timeout on how long you want a get call to wait at most, e.g.:
import Queue
try:
    res = result.get(block=True, timeout=10)
except Queue.Empty:
    print 'error'
Use twisted.
It has a lot of useful tools for working with networks, and it is also very good at working asynchronously.
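A minimal sketch of how that could look for the polling loop above (hedged: threads.deferToThread simply runs your existing blocking call in twisted's thread pool; the bodies of call_to_server1 and handle_response are placeholders for your own code):

from twisted.internet import reactor, threads

def call_to_server1():
    # placeholder for the blocking xmlrpc call (e.g. via xmlrpc.client.ServerProxy)
    return 'response from server 1'

def handle_response(response):
    # runs in the reactor thread once the blocking call has returned
    print('got', response)
    reactor.callLater(0, poll_once)  # schedule the next call, keeping the loop going

def poll_once():
    d = threads.deferToThread(call_to_server1)  # run the blocking call off the reactor thread
    d.addCallback(handle_response)

if __name__ == '__main__':
    poll_once()
    reactor.run()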