multithreading not running task - python

So I have been messing around with multithreading all day long but can't seem to get it to work. It's in Python 3.
The program is trying to generate a list of 5 words 1000 times, using multithreading to increase the speed.
I've changed the code to lots of different approaches I've found online, but with no result.
With what I have at the moment it runs without any issue but doesn't print any of the words.
Any chance someone could have a look over it?
import random
from threading import Thread

word_file = "words.txt"

def gen():
    Words = open(word_file).read().splitlines()  # retrieving and sorting word file
    seed = random.randrange(0, 2048)  # amount of words to choose from in list
    for x in range(0, 1000):
        print(random.choices(Words, k=5))  # print the words

def main():
    t1 = Thread(target=gen)
    t2 = Thread(target=gen)
    t3 = Thread(target=gen)
    t4 = Thread(target=gen)
    t1.start()
    t2.start()
    t3.start()
    t4.start()

print("completed")

Very simple: your code is not calling any function; it only defines them and leaves them alone.
Just add main() right before print("completed") so the script actually calls that function.
Note 1: Why is every thread re-reading the text file, and why open it manually? Let a with block close it automatically:

with open(word_file, "r") as f:
    Words = f.read().splitlines()

and do this once, before gen().
Note 2: What is seed doing? You defined it but haven't used it anywhere.
Note 3: Please check your indentation before posting a question.
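Putting those notes together, a minimal corrected sketch could look like the following. The join() calls are an addition of mine so that "completed" is only printed after every thread has finished; the rest follows the original code.

import random
from threading import Thread

word_file = "words.txt"

with open(word_file, "r") as f:      # read the word list once, auto-closed
    Words = f.read().splitlines()

def gen():
    for x in range(1000):
        print(random.choices(Words, k=5))   # pick and print 5 random words

def main():
    threads = [Thread(target=gen) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                     # wait for all threads to finish

main()
print("completed")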

Related

Can I continue an input after a new line is created

I want to make my input continue after a new line is created in the terminal.
Here's the quick script I wrote:
import threading, time

def inn():
    while True:
        input()

def count():
    a = 0
    while True:
        a += 1
        print(a)
        time.sleep(1)

t1 = threading.Thread(target=inn)
t2 = threading.Thread(target=count)
t1.start()
t2.start()
Is there any way I could accomplish this, preferably with built-in functions?
If you change your code to:

def inn():  # "in" is a reserved name in Python, hence "inn"
    while True:
        print(input())

...  # rest of the script unchanged
you will notice that your code already does what you want! Even though it doesn't look like it, the input is not interrupted. Take a look at the program's output:
1
2
t3
his 4
iss 5
my 6
strin7
g 8
this iss my string
9
The only change your code needs is to avoid the reserved name in.
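For completeness, here is the question's script with only that one change applied (nothing else is modified):

import threading, time

def inn():
    while True:
        print(input())   # echo what was typed so it stays visible after the counter lines

def count():
    a = 0
    while True:
        a += 1
        print(a)
        time.sleep(1)

t1 = threading.Thread(target=inn)
t2 = threading.Thread(target=count)
t1.start()
t2.start()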

How long does it take to create a thread in python

I'm trying to finish my programming course and I'm stuck on one exercise.
I have to measure how much time it takes in Python to create threads and whether it depends on the number of threads already created.
I wrote a simple script and I don't know if it is good:
import threading
import time

def fun1(a, b):
    c = a + b
    print(c)
    time.sleep(100)

times = []
for i in range(10000):
    start = time.time()
    threading.Thread(target=fun1, args=(55, 155)).start()
    end = time.time()
    times.append(end - start)

print(times)
In times[] I got 10000 results near 0.0 or exactly 0.0.
Now I don't know whether I built the test incorrectly because I'm misunderstanding something, or whether the result is correct and the time to create a thread really does not depend on the number of already created ones.
Can you help me with it? If my solution is wrong, please explain why; if it's correct, please confirm it. :)
So there are two ways to interpret your question:
Whether the existence of other threads (that have not been started) affects creation time for new threads
Whether other threads running in the background (threads already started) affect creation time for new threads
Checking the first one
In this case, you simply don't start the threads:
import threading
import time

def fun1(a, b):
    c = a + b
    print(c)
    time.sleep(100)

times = []
for i in range(10):
    start = time.time()
    threading.Thread(target=fun1, args=(55, 155))  # don't start
    end = time.time()
    times.append(end - start)

print(times)
output for 10 runs:
[4.696846008300781e-05, 2.8848648071289062e-05, 2.6941299438476562e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5033950805664062e-05, 2.6941299438476562e-05]
As you can see, the times are about the same (as you would expect).
Checking the second one
In this case, we want the previously created threads to keep running as we create more threads. So we give each thread a task that never finishes:
import threading
import time

def fun1(a, b):
    while True:
        pass  # never ends

times = []
for i in range(100):
    start = time.time()
    threading.Thread(target=fun1, args=(55, 155)).start()
    end = time.time()
    times.append(end - start)

print(times)
output:
Over 100 runs, the first one took 0.0003440380096435547 s whereas the last one took 0.3017098903656006 s, so creation time grows by roughly three orders of magnitude.
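As a side note (my own suggestion, not part of the original answer): time.time() has coarse resolution on some platforms, which is why many of the measurements come out as exactly 0.0. time.perf_counter() is a better fit for timing something this short. A minimal sketch:

import threading
import time

def fun1(a, b):
    c = a + b          # quick task; what it does is irrelevant to the timing

times = []
for i in range(1000):
    start = time.perf_counter()                       # high-resolution timer
    threading.Thread(target=fun1, args=(55, 155)).start()
    times.append(time.perf_counter() - start)

print(min(times), max(times), sum(times) / len(times))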

Looping through a list of proxies

I am currently working on a function that loops through a list of proxies and then restarts back at the top once it reaches the bottom. So far this is the code that I have:
import time

createLimit = 100
proxyFile = 'proxies.txt'

def getProxies():
    proxyList = []
    with open(proxyFile, 'r') as f:
        for line in f:
            proxyList.append(line)
    return proxyList

proxyList = getProxies()

def loopProxySwitch():
    print("running")
    current_run = 0
    while current_run <= createLimit:
        if current_run >= len(proxyList):
            lengthOfList = len(proxyList)
            useProxy = proxyList[current_run % lengthOfList]
            print("Current Ip: " + useProxy)
            print("Current Run: " + current_run)
            print("Using modulus")
            return useProxy
        else:
            useProxy = proxyList[current_run]
            print("Current Ip: " + useProxy)
            print("Current Run: " + current_run)
            return useProxy
        time.sleep(2)

print("Script ran")
loopProxySwitch()
The problem I am having is that the loopProxySwitch function does not return or print anything within the while loop, and I don't see how its condition could be false. Here is the format of the text file with fake proxies:
111.111.111.111:2222
333.333.333.333:4444
444.444.444.444:5555
777.777.777.777:8888
919.919.919.919:0000
Any advice on this situation? I intend to incorporate this into a program that I am working on; however, instead of cycling through the file on a timed interval, it would only move on when a certain condition is returned (such as another function letting the loop function know that it has run and that it is time to switch to the next proxy). If this is a bit confusing, I will be happy to elaborate and clear up any confusion. Any suggestions, ideas, or fixes are appreciated. Thanks!
EDIT: Thanks to the comments below, I fixed the printing issue. However, the function does not loop through all the proxies... Any suggestions?
Nothing is printed because you return something before printing.
The loop stops the first time either condition is met: return hands back a value and exits the function before the later print calls and/or the next iteration are reached.
BTW, if you actually want to print the returned value, you can print the function call itself:
print(loopProxySwitch())
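One way to make the function actually cycle through every proxy (a sketch of my own, not from the original answer; it reuses proxies.txt from the question) is to drop the early return and hand out proxies from an itertools.cycle, so the caller can ask for the next proxy whenever its own condition says it is time to switch:

import itertools

def getProxies():
    with open('proxies.txt', 'r') as f:
        return [line.strip() for line in f if line.strip()]

proxyCycle = itertools.cycle(getProxies())   # endlessly repeats the list, wrapping around

def nextProxy():
    """Return the next proxy; call this whenever it is time to switch."""
    return next(proxyCycle)

for current_run in range(10):                # example: ten runs, wrapping past the end of the file
    print("Current Run:", current_run, "Current Ip:", nextProxy())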

Always run a constant number of subprocesses in parallel

I want to use subprocesses to run 20 instances of a script in parallel. Let's say I have a big list of URLs with around 100,000 entries, and my program should ensure that 20 instances of my script are working on that list at all times. I wanted to code it as follows:
urllist = [url1, url2, url3, .. , url100000]
i = 0
while number_of_subproccesses < 20 and i < 100000:
    subprocess.Popen(['python', 'script.py', urllist[i]])
    i = i + 1
My script just writes something into a database or text file. It doesn't output anything and doesn't need more input than the URL.
My problem is that I wasn't able to find out how to get the number of subprocesses that are active. I'm a novice programmer, so every hint and suggestion is welcome. I was also wondering how, once the 20 subprocesses are loaded, I can make the while loop check the conditions again. I thought of maybe putting another while loop over it, something like
while i < 100000:
    while number_of_subproccesses < 20:
        subprocess.Popen(['python', 'script.py', urllist[i]])
        i = i + 1
    if number_of_subprocesses == 20:
        sleep()  # wait some time until checking again
Or maybe there's a better way for the while loop to keep checking the number of subprocesses?
I also considered using the multiprocessing module, but I found it really convenient to just call script.py with subprocess instead of a function with multiprocessing.
Maybe someone can help me and lead me in the right direction. Thanks a lot!
Taking a different approach from the above - as it seems that the callback can't be sent as a parameter:
import subprocess
import time

NextURLNo = 0
MaxProcesses = 20
MaxUrls = 100000  # Note this would be better as len(urllist)
Processes = []

def StartNew():
    """ Start a new subprocess if there is work to do """
    global NextURLNo
    global Processes
    if NextURLNo < MaxUrls:
        proc = subprocess.Popen(['python', 'script.py', urllist[NextURLNo]])
        print("Started to Process", urllist[NextURLNo])
        NextURLNo += 1
        Processes.append(proc)

def CheckRunning():
    """ Check any running processes and start new ones if there are spare slots. """
    global Processes
    global NextURLNo
    for p in range(len(Processes) - 1, -1, -1):  # Check the processes in reverse order
        if Processes[p].poll() is not None:      # poll() returns None while the process is still running
            del Processes[p]                     # Remove from list - this is why we needed reverse order
    while (len(Processes) < MaxProcesses) and (NextURLNo < MaxUrls):  # More to do and some spare slots
        StartNew()

if __name__ == "__main__":
    CheckRunning()            # This will start the max processes running
    while len(Processes) > 0:  # Something still going on
        time.sleep(0.1)        # You may wish to change the time for this
        CheckRunning()
    print("Done!")
Just keep count as you start them and use a callback from each subprocess to start a new one if there are any url list entries to process.
e.g. Assuming that your sub-process calls the OnExit method passed to it as it ends:
import subprocess
import time

NextURLNo = 0
MaxProcesses = 20
NoSubProcess = 0
MaxUrls = 100000

def StartNew():
    """ Start a new subprocess if there is work to do """
    global NextURLNo
    global NoSubProcess
    if NextURLNo < MaxUrls:
        # assumes script.py somehow invokes the OnExit callback when it finishes (see the note above)
        subprocess.Popen(['python', 'script.py', urllist[NextURLNo], OnExit])
        print("Started to Process", urllist[NextURLNo])
        NextURLNo += 1
        NoSubProcess += 1

def OnExit():
    global NoSubProcess
    NoSubProcess -= 1

if __name__ == "__main__":
    for n in range(MaxProcesses):
        StartNew()
    while NoSubProcess > 0:
        time.sleep(1)
        if NextURLNo < MaxUrls:
            for n in range(NoSubProcess, MaxProcesses):
                StartNew()
To keep a constant number of concurrent requests, you could use a thread pool:
#!/usr/bin/env python
from multiprocessing.dummy import Pool  # thread pool

def process_url(url):
    # ... handle a single url

urllist = [url1, url2, url3, .. , url100000]

for _ in Pool(20).imap_unordered(process_url, urllist):
    pass
To run processes instead of threads, remove .dummy from the import.
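Since the question specifically wants to keep launching script.py as a separate process, the two ideas can be combined: each worker thread just blocks on a child process. This is a sketch of my own, not part of the original answer, and the urllist below is a placeholder:

#!/usr/bin/env python
import subprocess
from multiprocessing.dummy import Pool   # thread pool; each thread only waits on a child process

def process_url(url):
    # launch the existing script for one url and wait until it exits
    subprocess.call(['python', 'script.py', url])

urllist = ['http://example.com/page%d' % i for i in range(100)]   # placeholder list

for _ in Pool(20).imap_unordered(process_url, urllist):
    pass                                  # at most 20 copies of script.py run at any time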

What's wrong with my python multiprocessing code?

I am an almost new programmer learning python for a few months. For the last 2 weeks, I had been coding to make a script to search permutations of numbers that make magic squares.
Finally I succeeded in finding all 880 4x4 magic square number sets within 30 seconds. After that I made a different Perimeter Magic Square program. It finds more than 10,000,000 permutations, so I want to store them to files part by part. The problem is that my program doesn't use all of my CPU: while it is storing partial data to a file, it stops searching for new number sets. I hope I could have one process of my CPU keep searching while the others store the found data to files.
The following has a similar structure to my magic square program.
while True:
    print('How many digits do you want? (more than 20): ', end='')
    ansr = input()
    if ansr.isdigit() and int(ansr) > 20:
        ansr = int(ansr)
        break
    else:
        continue

fileNum = 0
itemCount = 0

def fileMaker():
    global fileNum, itemCount
    tempStr = ''
    for i in permutationList:
        itemCount += 1
        tempStr += str(sum(i[:3])) + ' : ' + str(i) + ' : ' + str(itemCount) + '\n'
    fileNum += 1
    file = open('{0} Permutations {1:03}.txt'.format(ansr, fileNum), 'w')
    file.write(tempStr)
    file.close()

numList = [i for i in range(1, ansr+1)]
permutationList = []
itemCount = 0

def makePermutList(numList, ansr):
    global permutationList
    for i in numList:
        numList1 = numList[:]
        numList1.remove(i)
        for ii in numList1:
            numList2 = numList1[:]
            numList2.remove(ii)
            for iii in numList2:
                numList3 = numList2[:]
                numList3.remove(iii)
                for iiii in numList3:
                    numList4 = numList3[:]
                    numList4.remove(iiii)
                    for v in numList4:
                        permutationList.append([i, ii, iii, iiii, v])
                        if len(permutationList) == 200000:
                            print(permutationList[-1])
                            fileMaker()
                            permutationList = []
    fileMaker()

makePermutList(numList, ansr)
I added from multiprocessing import Pool at the top, and I replaced the two fileMaker() calls at the end with the following.
if __name__ == '__main__':
    workers = Pool(processes=2)
    workers.map(fileMaker, ())
The result? Oh no. It just works awkwardly. For now, multiprocessing looks too difficult for me.
Anybody, please, teach me something. How should my code be modified?
Well, addressing some things that are bugging me before getting to your actual question.
numList = [i for i in range(1, ansr+1)]
I know list comprehensions are cool, but please just do list(range(1, ansr+1)) if you need the iterable to be a list (which you probably don't need, but I digress).
def makePermutList(numList, ansr):
...
This is quite the hack. Is there a reason you can't use itertools.permutations(numList,n)? It's certainly going to be faster, and friendlier on memory.
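For illustration, a minimal sketch of that suggestion (assuming the five-element permutations produced by the original nested loops):

import itertools

def makePermutList(numList):
    # lazily yields the same 5-element permutations as the nested loops
    for perm in itertools.permutations(numList, 5):
        yield list(perm)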
Lastly, answering your question: if you are looking to improve i/o performance, the last thing you should do is make it multithreaded. I don't mean you shouldn't do it, I mean that it should literally be the last thing you do. Refactor/improve other things first.
You need to take all of that top-level code that uses globals, apply the backspace key to it, and rewrite functions that pass data around properly. Then you can think about using threads. I would personally use from threading import Thread and manually spawn Threads to do each unit of I/O rather than using multiprocessing.
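A rough sketch of what that last suggestion could look like (my own illustration, with hypothetical helpers write_chunk and chunks; ansr comes from the question's code): each finished chunk is handed to a Thread so file writing does not block the search.

from threading import Thread

def write_chunk(filename, lines):
    # one unit of I/O: write a finished chunk of formatted permutations to its own file
    with open(filename, 'w') as f:
        f.write('\n'.join(lines))

writers = []
for file_num, chunk in enumerate(chunks, start=1):   # `chunks` yields lists of formatted lines
    t = Thread(target=write_chunk,
               args=('{0} Permutations {1:03}.txt'.format(ansr, file_num), chunk))
    t.start()            # writing happens in the background...
    writers.append(t)    # ...while the main loop keeps searching for new sets

for t in writers:
    t.join()             # make sure every file is written before the program exits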
