Concurrent collections in Python

I was wondering if there was any concurrent structure like queue in python but with the ability to remove a specific element.
Example:
import queue
#with queue would be
q = queue.Queue()
#put some element
q.put(elem)
#i want to delete a specific element
#but queue does not provide this method
q.remove(elem)
What could I use?

Python lists can actually do what you are looking for. The translation of your code (which requires no imports) looks like this:
#Create the list
q = [element1, element2, element3...]
#Insert element
q.insert(position, element4)
#Insert element in the end
q.append(element4)
#Remove element at a position
del q[position]
#Remove a specific element by value
q.remove(element4)
So that you can manage it as desired.
I hope that helps you.
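If thread safety actually matters (the question did ask for a concurrent structure), plain list operations are not enough on their own. One possible sketch is to subclass queue.Queue and add a remove method that takes the same internal lock the class already uses; the class and method names here are my own:

```python
import queue

class RemovableQueue(queue.Queue):
    """queue.Queue plus a best-effort remove(); a sketch, not a full
    drop-in replacement (e.g. task_done accounting is not adjusted)."""

    def remove(self, elem):
        # queue.Queue guards its underlying deque (self.queue) with
        # self.mutex, so we take the same lock before mutating it.
        with self.mutex:
            try:
                self.queue.remove(elem)
                self.not_full.notify()
                return True
            except ValueError:
                return False

q = RemovableQueue()
q.put(1)
q.put(2)
q.put(3)
q.remove(2)
print(q.get(), q.get())  # prints: 1 3
```

Because remove() holds the queue's own mutex, it cannot interleave with a concurrent put() or get() on the shared deque.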

Related

I want to update a list that I'm using in a while loop, while the loop is still running. Is there a way to do this?

I have a function that uses a while loop, which I ideally want to set up and then run in the background. In this while loop, I use a list. What I want to do is if I think of something else to put in this list, I can simply edit the list, and then the next time the loop begins the updated list is used. At the moment, I can't seem to find a way to do this.
I tried to define the list in a separate programme, and then import it at the start of each loop. I have then updated the list in the separate programme, but this hasn't been reflected in the output.
import time

while True:
    from list_test import sample_list
    print(sample_list)
    time.sleep(30)
When I update sample_list, the output doesn't change. Does anyone know why this is? Apologies if the solution is simple, I'm quite new to programming in general!
As already stated in the comments, it is generally not advised to update a list you are iterating over (even though you are just printing it in your example). That said, you could use importlib and its reload function. For example like this:
import time
import importlib

list_module = importlib.import_module("list_test")
sample_list = list_module.sample_list

while True:
    sample_list = importlib.reload(list_module).sample_list
    print(sample_list)
    time.sleep(5)
Note that you have to edit list_test.py by hand to see the changes; updating the list from another program at runtime will not work.
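If the loop and the updates live in the same process, a common alternative to reloading a module is to share one list between a background thread and the main thread, guarding access with a lock. A minimal sketch (the function names and timings are my own invention):

```python
import threading
import time

sample_list = ["alpha", "beta"]
lock = threading.Lock()

def worker():
    # Background loop: print a snapshot of the list each iteration,
    # so changes made by the main thread show up next time around.
    for _ in range(3):
        with lock:
            snapshot = list(sample_list)
        print(snapshot)
        time.sleep(0.1)

t = threading.Thread(target=worker)
t.start()

# Meanwhile the main thread can keep appending; a later loop
# iteration sees the updated contents.
with lock:
    sample_list.append("gamma")

t.join()
```

The lock is arguably overkill for a bare append under CPython's GIL, but it makes the intent explicit and stays correct if the updates ever become compound operations.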

Python Multithreading for search

I have a class that I have written that will open a text document and search it line by line for the keywords that are input from a GUI that I have created in a different file. It works great, the only problem is the text document that I am searching is long (over 60,000 entries). I was looking at ways to make the search faster and have been playing around with multithreading but have not had any success yet. Basically, the main program calls the search function which takes the line and breaks it into individual words. Then over a loop checks each of the words against the keywords from the user. If the keyword is in that word then it says its true and adds a 1 to a list. At the end, if there is the same number of keywords as true statements then it adds that line to a set that is returned at the end of main.
What I would like to do is incorporate multithreading into this so that it will run much faster but at the end of the main function will still return results. Any advice or direction with being able to accomplish this will be very helpful. I have tried to read a bunch of examples and watched a bunch of youtube videos but it didn't seem to transfer over when I tried. Thank you for your help and your time.
import pdb
from threading import Thread

class codeBook:
    def __init__(self):
        pass

    def main(self, search):
        count = 0
        results = set()
        with open('CodeBook.txt') as current_CodeBook:
            lines = current_CodeBook.readlines()
        for line in lines:
            line = line.strip()
            new_search = self.change_search(line, search)
            line = new_search[0]
            search = new_search[1]
            #if search in line:
            if self.search(line, search) == True:
                results.add(line)
            else:
                pass
            count = count + 1
        results = sorted(list(results))
        return results

    def change_search(self, current_line, search):
        current_line = current_line.lower()
        search = search.lower()
        return current_line, search

    def search(self, line, keywords):
        split_line = line.split()
        split_keywords = keywords.split()
        numberOfTrue = list()
        for i in range(0, len(split_keywords)):
            if split_keywords[i] in line:
                numberOfTrue.append(1)
        if len(split_keywords) == len(numberOfTrue):
            return True
        else:
            return False
You can split the file into several parts and create a new thread that reads and processes a specific part. You can keep a data structure global to all threads and add lines that match the search query from all the threads to it. This structure should either be thread-safe or you need to use some kind of synchronization (like a lock) to work with it.
Note: the CPython interpreter has a global interpreter lock (GIL), so if you're using it and your application is CPU-heavy (which seems to be the case here), you might not get any benefit from multithreading whatsoever.
You can use the multiprocessing module instead. It comes with means of interprocess communication. A Queue looks like the right structure for your problem (each process could add matching lines to the queue). After that, you just need to get all lines from the queue and do what you did with the results in your code.
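As a rough sketch of the multiprocessing route, here is one possible shape using a Pool instead of an explicit Queue (the keyword set, function names, and chunksize are assumptions, not taken from the question):

```python
import multiprocessing

KEYWORDS = {"alpha", "beta"}  # assumed search terms for this sketch

def matches(line):
    # Return the line if every keyword occurs in it, else None.
    lowered = line.lower()
    return line if all(kw in lowered for kw in KEYWORDS) else None

def parallel_search(lines, processes=4):
    # Each worker process filters a chunk of lines; pool.map gathers
    # the results back in order, sidestepping the GIL for CPU work.
    with multiprocessing.Pool(processes) as pool:
        hits = pool.map(matches, lines, chunksize=1000)
    return sorted(line for line in hits if line is not None)

if __name__ == "__main__":
    lines = ["alpha beta here", "only alpha", "beta and alpha again"]
    print(parallel_search(lines))
```

For 60,000 lines the per-line work is tiny, so a generous chunksize matters: it amortizes the cost of shipping data between processes.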
While threading and/or multiprocessing can be beneficial and speed up execution, I would direct your attention to optimizing your current single-threaded algorithm before doing that.
Looking at your implementation I believe a lot of work is done several times for no reason. To the best of my understanding the following function will perform the same operation as your codeBook.main but with less overhead:
def search_keywords(keyword_string, filename='CodeBook.txt'):
    results = set()
    keywords = set()
    for keyword in keyword_string.lower().split():
        keywords.add(keyword)
    with open(filename) as code_book:
        for line in code_book:
            words = line.strip().lower()
            kws_present = True
            for keyword in keywords:
                kws_present = keyword in words
                if not kws_present:
                    break
            if kws_present:
                results.add(line)
    return sorted(list(results))
Try this function, as is or slightly modified for your needs, and see if it gives you a sufficient speed-up. Only when that is not enough should you look into more complex solutions, as introducing more threads/processes will invariably increase the complexity of your program.
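To try the idea without the real CodeBook.txt, here is a self-contained demo of a compact variant of the same keyword filter, run against a temporary file (the sample contents are made up):

```python
import os
import tempfile

def search_keywords(keyword_string, filename):
    # Keep every line that contains all of the (lowercased) keywords.
    keywords = set(keyword_string.lower().split())
    results = set()
    with open(filename) as code_book:
        for line in code_book:
            lowered = line.strip().lower()
            if all(kw in lowered for kw in keywords):
                results.add(line)
    return sorted(results)

# Write a tiny stand-in for CodeBook.txt and search it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fp:
    fp.write("Alpha Beta Gamma\nalpha only\nbeta then ALPHA\n")
    path = fp.name

print(search_keywords("alpha beta", path))
os.unlink(path)
```

Both lines containing "alpha" and "beta" come back; the middle line is filtered out because it lacks "beta".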

Can I read content of the oldest element in a Queue without ruining thread safety?

Removing an element from my queue depends on the content of the element.
In order to access the content the only suggested way is via myQueue.get() method.
The content of a queue can also be accessed as myQueue.queue[0], which reads directly from the underlying deque. Reading the content that way bypasses the locking that makes the queue thread-safe in the first place.
Is there a thread-safe way to read the content without removing the element?
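One possible minimal peek helper: queue.Queue keeps a mutex attribute that all of its methods acquire, so holding it while reading the deque keeps the read consistent. Note that nothing stops another thread from removing the item right after peek returns, so this is only safe for "read a snapshot" use; the helper name is my own:

```python
import queue

def peek(q, default=None):
    # Hold the queue's own lock while reading the underlying deque,
    # so no other thread can mutate it mid-read.
    with q.mutex:
        return q.queue[0] if q.queue else default

q = queue.Queue()
q.put("first")
q.put("second")
print(peek(q))    # prints: first
print(q.qsize())  # prints: 2 -- peeking removed nothing
```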
You may want to adapt the solution shown below. It allows you to get an element if the element satisfies some condition. Your question did not make it clear how you wanted to handle blocking, timeouts, or the inability to find a matching item. As such, the following code is nonblocking and returns a default value when the oldest value does not match the given condition. Please modify the code to fit your particular usage scenario.
import queue

class Queue(queue.Queue):
    def get_if(self, condition, default=None):
        with self.not_empty:
            if not self._qsize():
                raise queue.Empty
            if condition(self._peek()):
                item = self._get()
                self.not_full.notify()
                return item
            return default

    def _peek(self):
        return self.queue[0]

def main():
    my_queue = Queue()
    for number in range(100, 200):
        my_queue.put_nowait(number)
    number = my_queue.get_if(lambda item: not item % 10)
    print(number)

if __name__ == '__main__':
    main()

Python test if an element is in a dictionary

I am trying to make a Python program call notepad.exe if it is not already running. Right now, I'm using psutil with
pinfo = proc.as_dict(attrs=['name'])
if ("{'name': %s}" % SomeProcess) != str(pinfo):
    subprocess.call("%s" % SomeProcess, shell=True)
However, this won't work because the subprocess.call will call for every name on the list besides the one it is looking for.
Knowing how to use subprocess and some of psutil I should know this, but is there any way to see if a dictionary has a preset string in it with one line of code? Something like
pinfo = proc.as_dict(attrs=['name'])
if pinfo contains "somename":
    do something
If this is possible, (99% sure it is.) can it be done without a loop? (I want to update the process list every second or so.)
Thanks!
Edit:
Okay, probably should have given slightly more code, as that would have been relevant.
proc.as_dict()
by itself won't do anything, so I have this in a "for" loop.
for proc in psutil.process_iter():
    pinfo = proc.as_dict(attrs=['name'])
How could I change that to output a dictionary, rather than a single line*, and would pinfo.get('somename') work on that created dictionary?
*If I use print(pinfo) it outputs one line. Something like {'name': 'pythonw.exe'}.
I think you want to do something like this:
import psutil
import subprocess

names = set()
for proc in psutil.process_iter():
    names.add(proc.name())

if SomeProcess not in names:
    subprocess.call("%s" % SomeProcess, shell=True)
It does not make too much sense to create a dictionary with one entry; what you really wanted to do was scan all the processes on your system (I guess). So you already mentioned the iteration, only your question was a little unclear. Using the set (names) makes the membership check a little more performant. You could also use a list here.

In python, how to use queues properly?

So far I have the following:
fnamw = input("Enter name of file:")

def carrem(fnamw):
    s = Queue()
    for line in fnamw:
        s.enqueue(line)
    return s

print(carrem(fnamw))
The above doesn't print a list of the numbers in the file that I input instead the following is obtained:
<__main__.Queue object at 0x0252C930>
When printing a Queue, you're just printing the object directly, which is why you get that result.
You don't want to print the object representation, but I'm assuming you want to print the contents of the Queue. To do so you need to call the get method of the Queue. It's worth noting that in doing so, you will exhaust the Queue.
Replacing print(carrem(fnamw)) with print(carrem(fnamw).get()) should print the first item of the Queue.
If you really just want to print the list of items in the Queue, you should just use a list. Queues are specifically for when you need a FIFO (first-in-first-out) data structure.
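If you do want to see everything currently sitting in a queue.Queue, the usual approach is to drain it with get_nowait() until it raises queue.Empty; note that this empties the queue as a side effect:

```python
import queue

q = queue.Queue()
for line in ["first\n", "second\n", "third\n"]:
    q.put(line)

items = []
while True:
    try:
        # get_nowait raises queue.Empty once the queue is drained
        items.append(q.get_nowait())
    except queue.Empty:
        break

print(items)  # prints: ['first\n', 'second\n', 'third\n']
```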
It seems to me that you don't actually have any need for a Queue in that program. A Queue is used primarily for synchronization and data transfer in multithreaded programming. And it really doesn't seem as if that is what you're attempting to do.
For your usage, you could just as well use an ordinary Python list:
fnamw = input("Enter name of file:")

def carrem(fnamw):
    s = []
    for line in fnamw:
        s.append(line)
    return s

print(carrem(fnamw))
On that same note, however, you're not actually reading the file. The program as you quoted it will simply put each character of the filename into the list (or Queue) as an item of its own. What you really want is this:
def carrem(fnamw):
    s = []
    with open(fnamw) as fp:
        for line in fp:
            s.append(line)
    return s
Or, even simpler:
def carrem(fnamw):
    with open(fnamw) as fp:
        return list(fp)
