I am fairly new to Python, and I'm in a situation where I have to deal with multi-threading in my production code. I'm pasting a cut-down portion of the code, which replicates the relevant functionality, below:
import logging
import threading
import time
from collections import defaultdict
from concurrent.futures import Future
from typing import Dict

class DataProcessor(object):
    def __init__(self, config: DataProcessorConfig) -> None:
        self.config = config
        self.lock = defaultdict(set)
        self.function_thread_lock = threading.Lock()
        self.config_singleton = ConfigSingleton.create_instance()
        self._logger = logging.getLogger(__name__)

    def process_data(self, data_record: Dict[str, str]) -> None:
        self._logger.debug(f"record: {data_record}")
        try:
            self.acquire_locks(data_record)
            self.process_record(data_record)
        finally:
            self.release_locks(data_record)

    def release_locks(self, data_record: Dict[str, str]) -> None:
        with self.function_thread_lock:
            for obj in self.config_singleton.get_ids(parameter):
                id_value = obj.get_id_value(data_record)
                if id_value:
                    self.lock[obj.id_key].remove(id_value)

    def acquire_locks(self, data_record: Dict[str, str], threshold: int = 3) -> None:
        for obj in self.config_singleton.get_ids(parameter):
            try_count = 1
            id_value = obj.get_id_value(data_record)
            if id_value:
                while try_count <= threshold:
                    try:
                        self.function_thread_lock.acquire()
                        if id_value not in self.lock[obj.id_key]:
                            self.lock[obj.id_key].add(id_value)
                            break
                    finally:
                        self.function_thread_lock.release()
                    # back off exponentially before retrying
                    sleep_amount = 2 ** try_count
                    time.sleep(sleep_amount)
                    try_count += 1
                else:
                    raise Exception("blah blah")

    def async_data_processing(self, data_record: Dict) -> Future:
        future = self.executor_pool.submit(self.process_data, data_record)
        return future
Now, in another class, one of the functions calls async_data_processing to perform batch processing across multiple threads.
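A minimal sketch of how that caller looks; the executor setup and record shape here are hypothetical stand-ins, but the flow matches my code:

from concurrent.futures import ThreadPoolExecutor, as_completed

processor = DataProcessor(config)
processor.executor_pool = ThreadPoolExecutor(max_workers=8)  # hypothetical setup; the real pool lives inside the class

records = [{"id": str(i), "payload": "..."} for i in range(100)]  # hypothetical batch
futures = [processor.async_data_processing(record) for record in records]
for future in as_completed(futures):
    future.result()  # re-raises any exception from the worker thread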
But it seems that the defaultdict(set) is causing problems for the multithreaded processing: every now and then it raises a KeyError, or fails to acquire the locks. The KeyError doesn't really make sense to me, since avoiding it is the whole point of using defaultdict(set) instead of a plain dictionary.
I have been struggling with this issue for a few days and haven't been able to find any proper solution or direction on it.
Reaching out here in the hope of some help!
Would appreciate any help, thank you ☺️
GvR's App Engine ndb library, as well as monocle and, to my understanding, modern JavaScript, use generators to make async code look like blocking code.
Functions are decorated with @ndb.tasklet. They yield when they want to give execution back to the run loop, and when their result is ready they raise StopIteration(value) (or its alias ndb.Return):
@ndb.tasklet
def get_google_async():
    context = ndb.get_context()
    result = yield context.urlfetch("http://www.google.com/")
    if result.status_code == 200:
        raise ndb.Return(result.content)
    raise RuntimeError
To use such a function you get an ndb.Future object back and call get_result() on it to wait for the result and retrieve it. E.g.:
def get_google():
    future = get_google_async()
    # do something else in real code here
    return future.get_result()
This all works very nicely, but how do I add type annotations? The correct types are:
get_google_async() -> ndb.Future (via yield)
ndb.tasklet(get_google_async) -> ndb.Future
ndb.tasklet(get_google_async).get_result() -> str
So far, the only thing I have come up with is casting:
def get_google():
    # type: () -> str
    future = get_google_async()
    # do something else in real code here
    return cast('str', future.get_result())
Unfortunately this is not only about urlfetch but about hundreds of methods, mainly on ndb.Model.
get_google_async itself is a generator function, so its type hint can be () -> Generator[ndb.Future, None, None], I think.
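Applied as a Python 2 style type comment, that would look like this (a sketch; how your type checker resolves the ndb types may vary):

@ndb.tasklet
def get_google_async():
    # type: () -> Generator[ndb.Future, None, None]
    context = ndb.get_context()
    result = yield context.urlfetch("http://www.google.com/")
    if result.status_code == 200:
        raise ndb.Return(result.content)
    raise RuntimeError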
As for get_google, if you don't want to cast, a runtime type check may work, like:
def get_google():
    # type: () -> Optional[str]
    future = get_google_async()
    # do something else in real code here
    res = future.get_result()
    if isinstance(res, str):
        return res
    # somehow convert res to str, or
    return None
Is there a better way of doing this than:
def create_expired_weakref():
    class Tmp: pass
    ref = weakref.ref(Tmp())
    assert ref() is None
    return ref
Context: I want a default state for my weakref, so that my class can do:
def __init__(self):
    self._ref = create_expired_weakref()

def get_thing(self):
    r = self._ref()  # I need an empty weakref for this to work the first time
    if r is None:
        r = SomethingExpensive()
        self._ref = weakref.ref(r)
    return r
Another approach is to use duck typing here. If all you care about is that it behaves like a dead weakref with respect to the self._ref() call, then you can do
self._ref = lambda: None
This is what I ended up using when I had a similar desire to have a property that would return a cached value if it was available, but None otherwise. I initialized it with this lambda function. Then the property was
@property
def ref(self):
    return self._ref()
Update: Credit to @Aran-Fey, who I see posted this idea as a comment on the question, rather than as an answer.
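Putting the pieces together, a minimal self-contained sketch of the pattern (SomethingExpensive and Holder are stand-in names for whatever is costly to build and the class that caches it):

import weakref

class SomethingExpensive:
    pass  # stand-in for an object that is costly to construct

class Holder:
    def __init__(self):
        self._ref = lambda: None  # duck-types as a dead weakref

    def get_thing(self):
        r = self._ref()
        if r is None:
            r = SomethingExpensive()
            self._ref = weakref.ref(r)
        return r

holder = Holder()
thing = holder.get_thing()           # built on first use
assert holder.get_thing() is thing   # reused while a strong reference exists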
Use you a weakref.finalize for great good:
import weakref

def create_expired_weakref(type_=type("", (object,), {'__slots__': ('__weakref__',)})):
    obj = type_()
    ref = weakref.ref(obj)
    collected = False

    def on_collect():
        nonlocal collected
        collected = True

    final = weakref.finalize(obj, on_collect)
    del obj
    while not collected:
        pass
    return ref
This might block the thread for a while if you're debugging, and might even deadlock in some obscure situations, but it's guaranteed to return an expired weakref.
I catch myself doing this a lot. The example is simple, but, in practice, there are a lot of complex assignments to update data structures and conditions under which the second recursion is not called.
I'm working with mesh data. Points, Edges, and Faces are stored in separate dictionaries and "pointers" (dict keys) are heavily used.
import itertools

class Demo(object):
    def __init__(self):
        self.a = {}
        self.b = {}
        self.keygen = itertools.count()

    def add_to_b(self, val):
        new_key = next(self.keygen)
        self.b[new_key] = val
        return new_key

    def recur_method(self, arg, argisval=True):
        a_key = next(self.keygen)
        if argisval is True:
            # arg is a value
            b_key = self.add_to_b(arg)
            self.a[a_key] = b_key
            self.recur_method(b_key, argisval=False)
        else:
            # arg is a key
            self.a[a_key] = arg

demo = Demo()
demo.recur_method(2.2)
Is there a better way, short of cutting up all of my assignment code into seven different methods? Should I be worried about this anyway?
Try

def recur_method(self, key=None, val=None):
    if key is None and val is None:
        raise Exception("You fail it")
If None is a valid input, then use a sentinel as the guard value:

sentinel = object()

def recur_method(self, key=sentinel, val=sentinel):
    if key is sentinel and val is sentinel:
        raise Exception("You fail it")
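A quick sketch of why the sentinel matters: an explicit None is now distinguishable from an omitted argument (shown as a free function for brevity):

sentinel = object()

def recur_method(key=sentinel, val=sentinel):
    if key is sentinel and val is sentinel:
        raise Exception("You fail it")
    return key if key is not sentinel else val

print(recur_method(key=None))  # None is accepted as a real value
try:
    recur_method()  # nothing passed: raises
except Exception as e:
    print(e)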
Sometimes in my code I have a function which can take an argument in one of two ways. Something like:
def func(objname=None, objtype=None):
    if objname is not None and objtype is not None:
        raise ValueError("only 1 of the ways at a time")
    if objname is not None:
        obj = getObjByName(objname)
    elif objtype is not None:
        obj = getObjByType(objtype)
    else:
        raise ValueError("not given any of the ways")
    doStuffWithObj(obj)
Is there any more elegant way to do this? What if the arg could come in one of three ways? If the types are distinct I could do:
def func(objnameOrType):
    if type(objnameOrType) is str:
        getObjByName(objnameOrType)
    elif type(objnameOrType) is type:
        getObjByType(objnameOrType)
    else:
        raise ValueError("unk arg type: %s" % type(objnameOrType))
But what if they are not? This alternative seems silly:
def func(objnameOrType, isName=True):
    if isName:
        getObjByName(objnameOrType)
    else:
        getObjByType(objnameOrType)
because then you have to call it like func(mytype, isName=False), which is weird.
How about using something like a command dispatch pattern:
def func(objnameOrType):
    dispatcher = {str: getObjByName,
                  type1: getObjByType1,
                  type2: getObjByType2}
    t = type(objnameOrType)
    obj = dispatcher[t](objnameOrType)
    doStuffWithObj(obj)
where type1, type2, etc. are actual Python types (e.g. int, float, etc.).
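A concrete, self-contained sketch of that dispatch (the lookup functions are hypothetical stand-ins):

def getObjByName(name):
    return {'Name': name}  # stand-in lookup

def getObjByType(tp):
    return {'Type': tp}  # stand-in lookup

def func(objnameOrType):
    dispatcher = {str: getObjByName, type: getObjByType}
    return dispatcher[type(objnameOrType)](objnameOrType)

print(func('spam'))  # {'Name': 'spam'}
print(func(int))     # {'Type': <class 'int'>}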
Sounds like it should go to https://codereview.stackexchange.com/
Anyway, keeping the same interface, I may try
arg_parsers = {
    'objname': getObjByName,
    'objtype': getObjByType,
    ...
}

def func(**kwargs):
    assert len(kwargs) == 1  # replace this with your favorite exception
    (argtypename, argval) = next(iter(kwargs.items()))  # dict views aren't iterators, so wrap in iter()
    obj = arg_parsers[argtypename](argval)
    doStuffWithObj(obj)
or simply create 2 functions?
def funcByName(name): ...
def funcByType(type_): ...
One way to make it slightly shorter is
def func(objname=None, objtype=None):
    if [objname, objtype].count(None) != 1:
        raise TypeError("Exactly 1 of the ways must be used.")
    if objname is not None:
        obj = getObjByName(objname)
    else:
        obj = getObjByType(objtype)
I have not yet decided if I would call this "elegant".
Note that you should raise a TypeError if the wrong number of arguments was given, not a ValueError.
For whatever it's worth, similar kinds of things happen in the Standard Libraries; see, for example, the beginning of GzipFile in gzip.py (shown here with docstrings removed):
class GzipFile:
    myfileobj = None
    max_read_chunk = 10 * 1024 * 1024  # 10Mb

    def __init__(self, filename=None, mode=None,
                 compresslevel=9, fileobj=None):
        if mode and 'b' not in mode:
            mode += 'b'
        if fileobj is None:
            fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
        if filename is None:
            if hasattr(fileobj, 'name'): filename = fileobj.name
            else: filename = ''
        if mode is None:
            if hasattr(fileobj, 'mode'): mode = fileobj.mode
            else: mode = 'rb'
Of course this accepts both filename and fileobj keywords and defines a particular behavior in the case that it receives both; but the general approach seems pretty much identical.
I use a decorator:
from functools import wraps

def one_of(kwarg_names):
    # assert that one and only one of the given kwarg names are passed to the decorated function
    def inner(f):
        @wraps(f)
        def wrapped(*args, **kwargs):
            count = 0
            for kw in kwargs:
                if kw in kwarg_names and kwargs[kw] is not None:
                    count += 1
            assert count == 1, f'exactly one of {kwarg_names} required, got {kwargs}'
            return f(*args, **kwargs)
        return wrapped
    return inner
Used as:
@one_of(['kwarg1', 'kwarg2'])
def my_func(kwarg1='default', kwarg2='default'):
    pass
Note that this only accounts for non-None values that are passed as keyword arguments. E.g. multiple of the kwarg_names may still be passed if all but one of them have a value of None.
To allow for passing none of the kwargs simply assert that the count is <= 1.
It sounds like you're looking for function overloading, which isn't implemented in Python 2. In Python 2, your solution is nearly as good as you can expect to get.
You could probably bypass the extra argument problem by allowing your function to process multiple objects and return a generator:
import types

all_types = set([getattr(types, t) for t in dir(types) if t.endswith('Type')])

def func(*args):
    for arg in args:
        if arg in all_types:
            yield getObjByType(arg)
        else:
            yield getObjByName(arg)
Test:
>>> getObjByName = lambda a: {'Name': a}
>>> getObjByType = lambda a: {'Type': a}
>>> list(func('IntType'))
[{'Name': 'IntType'}]
>>> list(func(types.IntType))
[{'Type': <type 'int'>}]
The built-in sum() can be used on a list of boolean expressions. In Python, bool is a subclass of int, and in arithmetic operations True behaves as 1 and False behaves as 0.
This means that this rather short code will test mutual exclusivity for any number of arguments:
def do_something(a=None, b=None, c=None):
    if sum([a is not None, b is not None, c is not None]) != 1:
        raise TypeError("specify exactly one of 'a', 'b', or 'c'")
Variations are also possible:
def do_something(a=None, b=None, c=None):
    if sum([a is not None, b is not None, c is not None]) > 1:
        raise TypeError("specify at most one of 'a', 'b', or 'c'")
I occasionally run into this problem as well, and it is hard to find an easily generalisable solution. Say I have more complex combinations of arguments that are delineated by a set of mutually exclusive arguments, and I want to support additional arguments for each (some required, some optional), as in the following signatures:
def func(mutex1: str, arg1: bool): ...
def func(mutex2: str): ...
def func(mutex3: int, arg1: Optional[bool] = None): ...
I would use object orientation to wrap the arguments in a set of descriptors (with names depending on the business meaning of the arguments), which can then be validated by something like pydantic:
from typing import Optional
from pydantic import BaseModel, Extra

# Extra.forbid ensures a validation error if superfluous arguments are provided
class BaseDescription(BaseModel, extra=Extra.forbid):
    pass  # Arguments common to all descriptions go here

class Description1(BaseDescription):
    mutex1: str
    arg1: bool

class Description2(BaseDescription):
    mutex2: str

class Description3(BaseDescription):
    mutex3: int
    arg1: Optional[bool]
You could instantiate these descriptions with a factory:
class DescriptionFactory:
    _class_map = {
        'mutex1': Description1,
        'mutex2': Description2,
        'mutex3': Description3
    }

    @classmethod
    def from_kwargs(cls, **kwargs) -> BaseDescription:
        kwargs = {k: v for k, v in kwargs.items() if v is not None}
        set_fields = kwargs.keys() & cls._class_map.keys()
        try:
            [set_field] = set_fields
        except ValueError:
            raise ValueError(f"exactly one of {list(cls._class_map.keys())} must be provided")
        return cls._class_map[set_field](**kwargs)

    @classmethod
    def validate_kwargs(cls, func):
        def wrapped(**kwargs):
            return func(cls.from_kwargs(**kwargs))
        return wrapped
Then you can wrap your actual function implementation like this and use type checking to see which arguments were provided:
@DescriptionFactory.validate_kwargs
def func(desc: BaseDescription):
    if isinstance(desc, Description1):
        ...  # use desc.mutex1 and desc.arg1
    elif isinstance(desc, Description2):
        ...  # use desc.mutex2
    ...  # etc.
and call as func(mutex1='', arg1=True), func(mutex2=''), func(mutex3=123) and so on.
This is not overall shorter code, but it performs argument validation in a very descriptive way according to your specification, raises useful pydantic errors when validation fails, and results in accurate static types in each branch of the function implementation.
Note that if you're using Python 3.10+, structural pattern matching could simplify some parts of this.
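For instance, a sketch of what the dispatch on description type might look like with structural pattern matching (assuming the same Description classes as above):

@DescriptionFactory.validate_kwargs
def func(desc: BaseDescription):
    match desc:
        case Description1(mutex1=m, arg1=a):
            ...  # use m and a
        case Description2(mutex2=m):
            ...  # use m
        case Description3(mutex3=m, arg1=a):
            ...  # use m and the optional a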
Have Python iterators got a has_next method?
There's an alternative to catching StopIteration: use next(iterator, default_value).
For example:
>>> a = iter('hi')
>>> print next(a, None)
h
>>> print next(a, None)
i
>>> print next(a, None)
None
So you can detect None, or another pre-specified value, as the end of the iterator if you don't want the exception approach.
No, there is no such method. The end of iteration is indicated by an exception. See the documentation.
If you really need a has-next functionality, it's easy to obtain it with a little wrapper class. For example:
class hn_wrapper(object):
    def __init__(self, it):
        self.it = iter(it)
        self._hasnext = None

    def __iter__(self):
        return self

    def next(self):
        if self._hasnext:
            result = self._thenext
        else:
            result = next(self.it)
        self._hasnext = None
        return result

    def hasnext(self):
        if self._hasnext is None:
            try:
                self._thenext = next(self.it)
            except StopIteration:
                self._hasnext = False
            else:
                self._hasnext = True
        return self._hasnext
now something like
x = hn_wrapper('ciao')
while x.hasnext(): print next(x)
emits
c
i
a
o
as required.
Note that the use of next(self.it) as a built-in requires Python 2.6 or better; if you're using an older version of Python, use self.it.next() instead (and similarly for next(x) in the example usage). [[You might reasonably think this note is redundant, since Python 2.6 has been around for over a year now; but more often than not when I use Python 2.6 features in a response, some commenter or other feels duty-bound to point out that they are 2.6 features, thus I'm trying to forestall such comments for once;-)]]
===
For Python3, you would make the following changes:
from collections.abc import Iterator  # since Python 3.3 Iterator lives here

class hn_wrapper(Iterator):  # need to subclass Iterator rather than object
    def __init__(self, it):
        self.it = iter(it)
        self._hasnext = None

    def __iter__(self):
        return self

    def __next__(self):  # __next__ vs next in Python 2
        if self._hasnext:
            result = self._thenext
        else:
            result = next(self.it)
        self._hasnext = None
        return result

    def hasnext(self):
        if self._hasnext is None:
            try:
                self._thenext = next(self.it)
            except StopIteration:
                self._hasnext = False
            else:
                self._hasnext = True
        return self._hasnext
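Usage in Python 3 then mirrors the Python 2 example above:

x = hn_wrapper('ciao')
while x.hasnext():
    print(next(x))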
In addition to all the mentions of StopIteration, the Python "for" loop simply does what you want:
>>> it = iter("hello")
>>> for i in it:
... print i
...
h
e
l
l
o
Some concrete iterators (list iterators, for example) expose a __length_hint__() method, so in those cases you can try:

iter(...).__length_hint__() > 0

Be aware, though, that it is not available on every iterator (generators don't provide it), and per PEP 424 it is only a hint, not a guaranteed count.
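A quick illustration; operator.length_hint() is the public wrapper, which falls back to a default for iterators without the method:

import operator

it = iter([1, 2, 3])
print(it.__length_hint__())           # 3: list iterators implement it
print(operator.length_hint(it))       # 3, via the public helper

gen = (x for x in range(3))
print(operator.length_hint(gen, -1))  # -1: generators provide no hint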
You can tee the iterator using itertools.tee and check for StopIteration on the teed iterator.
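A sketch of that approach; since tee buffers items, only the lookahead copy is advanced, and the original iterator must not be used afterwards:

import itertools

def has_next(it):
    # Returns (has_next, iterator); use the returned iterator from here on.
    it, lookahead = itertools.tee(it)
    try:
        next(lookahead)
    except StopIteration:
        return False, it
    return True, it

ok, it = has_next(iter('ab'))
print(ok)        # True
print(next(it))  # 'a'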
hasNext somewhat translates to the StopIteration exception, e.g.:
>>> it = iter("hello")
>>> it.next()
'h'
>>> it.next()
'e'
>>> it.next()
'l'
>>> it.next()
'l'
>>> it.next()
'o'
>>> it.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
StopIteration docs: http://docs.python.org/library/exceptions.html#exceptions.StopIteration
An article about iterators and generators in Python: http://www.ibm.com/developerworks/library/l-pycon.html
No. The most similar concept is most likely a StopIteration exception.
I believe Python just has next(), and according to the doc, it throws an exception if there are no more elements.
http://docs.python.org/library/stdtypes.html#iterator-types
The use case that led me to search for this is the following:
def setfrom(self, f):
    """Set from iterable f"""
    fi = iter(f)
    for i in range(self.n):
        try:
            x = next(fi)
        except StopIteration:
            fi = iter(f)
            x = next(fi)
        self.a[i] = x
Where hasnext() is available, one could do
def setfrom(self, f):
    """Set from iterable f"""
    fi = iter(f)
    for i in range(self.n):
        if not hasnext(fi):
            fi = iter(f)  # restart
        self.a[i] = next(fi)
which to me is cleaner. Obviously you can work around such issues by defining utility classes, but what then happens is you get a proliferation of twenty-odd different almost-equivalent workarounds, each with its quirks, and if you wish to reuse code that uses different workarounds, you either have to keep multiple near-equivalents in a single application, or go around picking through and rewriting code to use the same approach. The 'do it once and do it well' maxim fails badly.
Furthermore, the iterator itself needs an internal 'hasnext' check in order to decide whether to raise the exception. This internal check is then hidden, so it has to be tested by trying to get an item, catching the exception, and running a handler if it is thrown. This is unnecessary hiding, IMO.
Maybe it's just me, but while I like https://stackoverflow.com/users/95810/alex-martelli 's answer, I find this a bit easier to read:
from collections.abc import Iterator  # since Python 3.3 Iterator lives here

class MyIterator(Iterator):  # need to subclass Iterator rather than object
    def __init__(self, it):
        self._iter = iter(it)
        self._sentinel = object()
        self._next = next(self._iter, self._sentinel)

    def __iter__(self):
        return self

    def __next__(self):  # __next__ vs next in Python 2
        if not self.has_next():
            next(self._iter)  # raises StopIteration
        val = self._next
        self._next = next(self._iter, self._sentinel)
        return val

    def has_next(self):
        return self._next is not self._sentinel
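Used like this:

it = MyIterator('ciao')
while it.has_next():
    print(next(it))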
No, there is no such method. The end of iteration is indicated by a StopIteration (more on that here).
This follows the python principle EAFP (easier to ask for forgiveness than permission). A has_next method would follow the principle of LBYL (look before you leap) and contradicts this core python principle.
This interesting article explains the two concepts in more detail.
The suggested way is StopIteration.
Please see this Fibonacci example from tutorialspoint:
#!/usr/bin/python3
import sys

def fibonacci(n):  # generator function
    a, b, counter = 0, 1, 0
    while True:
        if counter > n:
            return
        yield a
        a, b = b, a + b
        counter += 1

f = fibonacci(5)  # f is an iterator object

while True:
    try:
        print(next(f), end=" ")
    except StopIteration:
        sys.exit()
It is also possible to implement a helper generator that wraps any iterator and answers question if it has next value:
def has_next(it):
    first = True
    for e in it:
        if not first:
            yield True, prev
        else:
            first = False
        prev = e
    if not first:
        yield False, prev
for has_next_, e in has_next(range(4)):
    print(has_next_, e)
Which outputs:
True 0
True 1
True 2
False 3
The main, and probably only, drawback of this method is that it reads one element ahead. For most tasks that's totally fine, but for some it may be unacceptable, especially if the user of has_next() is not aware of the read-ahead logic and may misuse it.
The code above works for infinite iterators too.
In all the cases I have ever programmed, this kind of has_next() was entirely sufficient, didn't cause any problems, and was in fact very helpful. You just have to be aware of its read-ahead logic.
The way I have solved it, based on handling the StopIteration exception, is pretty straightforward for reading all iterations:
end_cursor = False
while not end_cursor:
    try:
        print(cursor.next())
    except StopIteration:
        print('end loop')
        end_cursor = True
    except:
        print('other exceptions to manage')
        end_cursor = True
I think there are valid use cases for when you may want some sort of has_next functionality, in which case you should decorate an iterator with a has_next defined.
Combining concepts from the answers to this question, here is my implementation, which feels like a nice, concise solution to me (Python 3.9):
from typing import Iterator, TypeVar

_T = TypeVar('_T')
_EMPTY_BUF = object()

class BufferedIterator(Iterator[_T]):
    def __init__(self, real_it: Iterator[_T]):
        self._real_it = real_it
        self._buf = next(self._real_it, _EMPTY_BUF)

    def has_next(self) -> bool:
        return self._buf is not _EMPTY_BUF

    def __next__(self) -> _T:
        v = self._buf
        self._buf = next(self._real_it, _EMPTY_BUF)
        if v is _EMPTY_BUF:
            raise StopIteration()
        return v
The main difference is that has_next is just a boolean expression, and also handles iterators with None values.
Added this to a gist here with tests and example usage.
With 'for' one can implement one's own version of 'next' that avoids the exception:
def my_next(it):
    for x in it:
        return x
    return None
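For example (note that a None result is ambiguous if the iterable itself can contain None):

it = iter([1, 2])
print(my_next(it))  # 1
print(my_next(it))  # 2
print(my_next(it))  # None: exhausted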
Very interesting question, but this 'hasNext' design has made it into LeetCode:
https://leetcode.com/problems/iterator-for-combination/
Here is my implementation:
class CombinationIterator:
    def __init__(self, characters: str, combinationLength: int):
        from itertools import combinations
        from collections import deque
        self.iter = combinations(characters, combinationLength)
        self.res = deque()

    def next(self) -> str:
        if len(self.res) == 0:
            return ''.join(next(self.iter))
        else:
            return ''.join(self.res.pop())

    def hasNext(self) -> bool:
        try:
            self.res.insert(0, next(self.iter))
            return True
        except StopIteration:
            return len(self.res) > 0
The way I solved my problem is to keep a count of the number of objects iterated over so far. I wanted to iterate over a set using calls to an instance method. Since I knew the length of the set, and the number of items counted so far, I effectively had a hasNext method.
A simple version of my code:
class Iterator:
    # s is a string, say
    def __init__(self, s):
        self.s = set(list(s))
        self.done = False
        self.iter = iter(self.s)  # iterate the set itself so the count below matches
        self.charCount = 0

    def next(self):
        if self.done:
            return None
        self.char = next(self.iter)
        self.charCount += 1
        self.done = (self.charCount >= len(self.s))  # done once everything is consumed
        return self.char

    def hasMore(self):
        return not self.done
Of course, the example is a toy one, but you get the idea. This won't work in cases where there is no way to get the length of the iterable, like a generator etc.