pymongo error when writing - python

I am unable to do any writes to a remote mongodb database. I am able to connect and do lookups (e.g. find). I connect like this:
conn = pymongo.MongoClient(db_uri,slaveOK=True)
db = conn.test_database
coll = db.test_collection
But when I try to insert,
coll.insert({'a':1})
I run into an error:
---------------------------------------------------------------------------
AutoReconnect Traceback (most recent call last)
<ipython-input-56-d4ffb9e3fa79> in <module>()
----> 1 coll.insert({'a':1})
/usr/lib/python2.7/dist-packages/pymongo/collection.pyc in insert(self, doc_or_docs, manipulate, safe, check_keys, continue_on_error, **kwargs)
410 message._do_batched_insert(self.__full_name, gen(), check_keys,
411 safe, options, continue_on_error,
--> 412 self.uuid_subtype, client)
413
414 if return_one:
/usr/lib/python2.7/dist-packages/pymongo/mongo_client.pyc in _send_message(self, message, with_last_error, command, check_primary)
1126 except (ConnectionFailure, socket.error), e:
1127 self.disconnect()
-> 1128 raise AutoReconnect(str(e))
1129 except:
1130 sock_info.close()
AutoReconnect: not master
If I remove the slaveOK=True (setting it to its default value of False) then I can still connect, but the reads (and writes) fail:
AutoReconnect Traceback (most recent call last)
<ipython-input-70-6671eea24f80> in <module>()
----> 1 coll.find_one()
/usr/lib/python2.7/dist-packages/pymongo/collection.pyc in find_one(self, spec_or_id, *args, **kwargs)
719 *args, **kwargs).max_time_ms(max_time_ms)
720
--> 721 for result in cursor.limit(-1):
722 return result
723 return None
/usr/lib/python2.7/dist-packages/pymongo/cursor.pyc in next(self)
1036 raise StopIteration
1037 db = self.__collection.database
-> 1038 if len(self.__data) or self._refresh():
1039 if self.__manipulate:
1040 return db._fix_outgoing(self.__data.popleft(),
/usr/lib/python2.7/dist-packages/pymongo/cursor.pyc in _refresh(self)
980 self.__skip, ntoreturn,
981 self.__query_spec(), self.__fields,
--> 982 self.__uuid_subtype))
983 if not self.__id:
984 self.__killed = True
/usr/lib/python2.7/dist-packages/pymongo/cursor.pyc in __send_message(self, message)
923 self.__tz_aware,
924 self.__uuid_subtype,
--> 925 self.__compile_re)
926 except CursorNotFound:
927 self.__killed = True
/usr/lib/python2.7/dist-packages/pymongo/helpers.pyc in _unpack_response(response, cursor_id, as_class, tz_aware, uuid_subtype, compile_re)
99 error_object = bson.BSON(response[20:]).decode()
100 if error_object["$err"].startswith("not master"):
--> 101 raise AutoReconnect(error_object["$err"])
102 elif error_object.get("code") == 50:
103 raise ExecutionTimeout(error_object.get("$err"),
AutoReconnect: not master and slaveOk=false
Am I connecting incorrectly? Is there a way to specify connecting to the primary replica?

AutoReconnect: not master means that your operation is failing because the node on which you are attempting to issue the command is not the primary of a replica set, where the command (e.g., a write operation) requires that node to be a primary. Setting slaveOK=True just enables you to read from a secondary node, where by default you would only be able to read from the primary.
MongoClient is automatically able to discover and connect to the primary if the replica set name is provided to the constructor with replicaSet=<replica set name>. See "Connecting to a Replica Set" in the PyMongo docs.
As an aside, slaveOK is deprecated, replaced by ReadPreference. You can specify a ReadPreference when creating the client or when issuing queries, if you want to target a node other than the primary.
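As a rough illustration of the replicaSet option, the sketch below adds it to a connection string using only the standard library. The helper name with_replica_set, the host names, and the set name rs0 are made up for this example; replicaSet and readPreference are standard MongoDB connection-string options.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_replica_set(uri, replica_set, read_preference=None):
    """Return `uri` with the replicaSet (and optionally readPreference) option set.

    Hypothetical helper for illustration; it only edits the string and
    does not contact any server.
    """
    parts = urlsplit(uri)
    # Keep any options that are already present in the URI.
    options = dict(parse_qsl(parts.query))
    options["replicaSet"] = replica_set
    if read_preference is not None:
        options["readPreference"] = read_preference
    # Make sure there is a "/" between the host list and the options.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(options), parts.fragment)
    )


# Host names and the replica set name are placeholders.
uri = with_replica_set("mongodb://host1:27017,host2:27017/test_database", "rs0")
print(uri)
```

The resulting string can be passed to pymongo.MongoClient in place of db_uri, so that the client can discover the primary on its own.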

I don't know whether this is related to the topic, but when I searched for the exception below, Google led me to this question, so maybe it will be helpful.
pymongo.errors.NotMasterError: not master
In my case, my hard drive was full.
You can check this with the df -h command.
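If you would rather check from Python than from the shell, a minimal sketch using the standard library's shutil.disk_usage could look like this. The function name free_fraction is invented for the example, and it has to run on the machine that hosts mongod, since that is the disk that matters.

```python
import shutil


def free_fraction(path="."):
    """Return the fraction (0.0 to 1.0) of free space on the disk holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual filesystems report a size of zero; avoid dividing by it.
        return 0.0
    return usage.free / usage.total


fraction = free_fraction(".")
print(f"{fraction:.1%} of the disk is free")
```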

Related

py2neo Issue: ConnectionUnavailable: Cannot open connection to ConnectionProfile('bolt://localhost:7687')

I am trying to replicate this example on neo4j desktop:
https://stellargraph.readthedocs.io/en/stable/demos/connector/neo4j/load-cora-into-neo4j.html
I am able to reproduce everything until I get to the following line:
import os
import py2neo
default_host = os.environ.get("STELLARGRAPH_NEO4J_HOST")
# Create the Neo4j Graph database object; the arguments can be edited to specify location and authentication
graph = py2neo.Graph(host=default_host, port=None, user=None, password=None)
I have tried the following attempts to create the neo4j database object:
#1
default_host = os.environ.get("StellarGraph")
graph = py2neo.Graph(host=default_host, port=None, user=None, password=None)
#2
uri = 'bolt://localhost:7687'
graph = Graph(uri, auth=("neo4j", "password"), port= 7687, secure=True)
#3
uri = 'bolt://localhost:7687'
graph = Graph(uri, auth=("neo4j", "password"), port= 7687, secure=True, name= "StellarGraph")
However, each time I attempt this, it results in some variation of this error:
IndexError Traceback (most recent call last)
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:806, in ConnectionPool.acquire(self, force_reset, can_overfill)
804 try:
805 # Plan A: select a free connection from the pool
--> 806 cx = self._free_list.popleft()
807 except IndexError:
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
ConnectionRefusedError Traceback (most recent call last)
File ~/.local/lib/python3.8/site-packages/py2neo/wiring.py:62, in Wire.open(cls, address, timeout, keep_alive, on_broken)
61 try:
---> 62 s.connect(address)
63 except (IOError, OSError) as error:
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
WireError Traceback (most recent call last)
File ~/.local/lib/python3.8/site-packages/py2neo/client/bolt.py:355, in Bolt.open(cls, profile, user_agent, on_release, on_broken)
354 try:
--> 355 wire = cls._connect(profile, on_broken=on_broken)
356 protocol_version = cls._handshake(wire)
File ~/.local/lib/python3.8/site-packages/py2neo/client/bolt.py:369, in Bolt._connect(cls, profile, on_broken)
368 log.debug("[#%04X] C: (Dialing <%s>)", 0, profile.address)
--> 369 wire = Wire.open(profile.address, keep_alive=True, on_broken=on_broken)
370 local_port = wire.local_address.port_number
File ~/.local/lib/python3.8/site-packages/py2neo/wiring.py:64, in Wire.open(cls, address, timeout, keep_alive, on_broken)
63 except (IOError, OSError) as error:
---> 64 raise_from(WireError("Cannot connect to %r" % (address,)), error)
65 return cls(s, on_broken=on_broken)
File <string>:3, in raise_from(value, from_value)
WireError: Cannot connect to IPv4Address(('localhost', 7687))
The above exception was the direct cause of the following exception:
ConnectionUnavailable Traceback (most recent call last)
/home/myname/Project1/graph_import.ipynb Cell 13' in <cell line: 2>()
1 uri = 'bolt://localhost:7687'
----> 2 graph = Graph(uri, auth=("neo4j", "mypass"), port= 7687, secure=True, name= "StellarGraph")
File ~/.local/lib/python3.8/site-packages/py2neo/database.py:288, in Graph.__init__(self, profile, name, **settings)
287 def __init__(self, profile=None, name=None, **settings):
--> 288 self.service = GraphService(profile, **settings)
289 self.__name__ = name
290 self.schema = Schema(self)
File ~/.local/lib/python3.8/site-packages/py2neo/database.py:119, in GraphService.__init__(self, profile, **settings)
116 if connector_settings["init_size"] is None and not profile.routing:
117 # Ensures credentials are checked on construction
118 connector_settings["init_size"] = 1
--> 119 self._connector = Connector(profile, **connector_settings)
120 self._graphs = {}
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:960, in Connector.__init__(self, profile, user_agent, init_size, max_size, max_age, routing_refresh_ttl)
958 else:
959 self._router = None
--> 960 self._add_pools(*self._initial_routers)
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:982, in Connector._add_pools(self, *profiles)
980 continue
981 log.debug("Adding connection pool for profile %r", profile)
--> 982 pool = ConnectionPool.open(
983 profile,
984 user_agent=self._user_agent,
985 init_size=self._init_size,
986 max_size=self._max_size,
987 max_age=self._max_age,
988 on_broken=self._on_broken)
989 self._pools[profile] = pool
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:649, in ConnectionPool.open(cls, profile, user_agent, init_size, max_size, max_age, on_broken)
627 """ Create a new connection pool, with an option to seed one
628 or more initial connections.
629
(...)
646 scheme
647 """
648 pool = cls(profile, user_agent, max_size, max_age, on_broken)
--> 649 seeds = [pool.acquire() for _ in range(init_size or cls.default_init_size)]
650 for seed in seeds:
651 seed.release()
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:649, in <listcomp>(.0)
627 """ Create a new connection pool, with an option to seed one
628 or more initial connections.
629
(...)
646 scheme
647 """
648 pool = cls(profile, user_agent, max_size, max_age, on_broken)
--> 649 seeds = [pool.acquire() for _ in range(init_size or cls.default_init_size)]
650 for seed in seeds:
651 seed.release()
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:813, in ConnectionPool.acquire(self, force_reset, can_overfill)
807 except IndexError:
808 if self._has_capacity() or can_overfill:
809 # Plan B: if the pool isn't full, open
810 # a new connection. This may raise a
811 # ConnectionUnavailable exception, which
812 # should bubble up to the caller.
--> 813 cx = self._connect()
814 if cx.supports_multi():
815 self._supports_multi = True
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:764, in ConnectionPool._connect(self)
761 def _connect(self):
762 """ Open and return a new connection.
763 """
--> 764 cx = Connection.open(self.profile, user_agent=self.user_agent,
765 on_release=lambda c: self.release(c),
766 on_broken=lambda msg: self.__on_broken(msg))
767 self._server_agent = cx.server_agent
768 return cx
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:174, in Connection.open(cls, profile, user_agent, on_release, on_broken)
172 if profile.protocol == "bolt":
173 from py2neo.client.bolt import Bolt
--> 174 return Bolt.open(profile, user_agent=user_agent,
175 on_release=on_release, on_broken=on_broken)
176 elif profile.protocol == "http":
177 from py2neo.client.http import HTTP
File ~/.local/lib/python3.8/site-packages/py2neo/client/bolt.py:364, in Bolt.open(cls, profile, user_agent, on_release, on_broken)
362 return bolt
363 except (TypeError, WireError) as error:
--> 364 raise_from(ConnectionUnavailable("Cannot open connection to %r" % profile), error)
File <string>:3, in raise_from(value, from_value)
ConnectionUnavailable: Cannot open connection to ConnectionProfile('bolt+s://localhost:7687')
I have also tried variations on the fix from "ISSUE IN CONNECTING py2neo v4 to my neo4j server", but got the same error.
I appreciate any help resolving this issue. Thanks!
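The ConnectionRefusedError at the bottom of the chain usually means nothing is listening on that host and port. One way to confirm this independently of py2neo is a plain socket check, sketched below with only the standard library. The function name port_is_open is made up for this example; localhost and 7687 are taken from the URI above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and name resolution failures.
        return False


# False here suggests the database is not started or Bolt uses another port.
print(port_is_open("localhost", 7687))
```

If this prints False, the problem lies with the server or its port settings rather than with the arguments passed to Graph.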
I was able to resolve this with the following syntax:
graph = Graph('neo4j://localhost:7687', user="neo4j", password="999")
However, I am now having an issue with the following block:
empty_db_query = """
MATCH(n) DETACH
DELETE(n)
"""
tx = graph.begin(autocommit=True)
tx.evaluate(empty_db_query)
In the newer version of py2neo, graph.begin takes readonly=False instead of autocommit=True, but in any case I now get this error:
ServiceUnavailable Traceback (most recent call last)
/home/myname/Project1/graph_import.ipynb Cell 13' in <cell line: 6>()
1 empty_db_query = """
2 MATCH(n) DETACH
3 DELETE(n)
4 """
----> 6 tx = graph.begin(readonly=False)
7 tx.evaluate(empty_db_query)
File ~/.local/lib/python3.8/site-packages/py2neo/database.py:351, in Graph.begin(self, readonly)
340 def begin(self, readonly=False,
341 # after=None, metadata=None, timeout=None
342 ):
343 """ Begin a new :class:`~py2neo.Transaction`.
344
345 :param readonly: if :py:const:`True`, will begin a readonly
(...)
349 removed. Use the 'auto' method instead.*
350 """
--> 351 return Transaction(self, autocommit=False, readonly=readonly,
352 # after, metadata, timeout
353 )
File ~/.local/lib/python3.8/site-packages/py2neo/database.py:915, in Transaction.__init__(self, graph, autocommit, readonly)
913 self._ref = None
914 else:
--> 915 self._ref = self._connector.begin(self.graph.name, readonly=readonly,
916 # after, metadata, timeout
917 )
918 self._readonly = readonly
919 self._closed = False
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:1357, in Connector.begin(self, graph_name, readonly)
1345 def begin(self, graph_name, readonly=False,
1346 # after=None, metadata=None, timeout=None
1347 ):
1348 """ Begin a new explicit transaction.
1349
1350 :param graph_name:
(...)
1355 :raises Failure: if the server signals a failure condition
1356 """
-> 1357 cx = self._acquire(graph_name)
1358 try:
1359 return cx.begin(graph_name, readonly=readonly,
1360 # after=after, metadata=metadata, timeout=timeout
1361 )
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:1111, in Connector._acquire(self, graph_name, readonly)
1109 return self._acquire_ro(graph_name)
1110 else:
-> 1111 return self._acquire_rw(graph_name)
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:1203, in Connector._acquire_rw(self, graph_name)
1199 # TODO: exit immediately if the server/cluster is in readonly mode
1201 while True:
-> 1203 ro_profiles, rw_profiles = self._get_profiles(graph_name, readonly=False)
1204 if rw_profiles:
1205 # There is at least one writer, so collect the pools
1206 # for those writers. In all implementations to date,
1207 # a Neo4j cluster will only ever contain at most one
1208 # writer (per database). But this algorithm should
1209 # still survive if that changes.
1210 pools = [pool for profile, pool in list(self._pools.items())
1211 if profile in rw_profiles]
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:1016, in Connector._get_profiles(self, graph_name, readonly)
1014 rt.wait_until_updated()
1015 else:
-> 1016 self.refresh_routing_table(graph_name)
File ~/.local/lib/python3.8/site-packages/py2neo/client/__init__.py:1064, in Connector.refresh_routing_table(self, graph_name)
1062 cx.release()
1063 else:
-> 1064 raise ServiceUnavailable("Cannot connect to any known routers")
1065 finally:
1066 rt.set_not_updating()
ServiceUnavailable: Cannot connect to any known routers
Appreciate any help in resolving this. Thank you!

Deleting files from blob - TypeError: quote_from_bytes() expected bytes

I have some files inside a container named data:
folder1/somepath/folder2/output/folder3/my_file1.csv
folder1/somepath/folder2/output/folder3/my_file4.csv
folder1/somepath/folder2/output/folder3/my_file23.csv
I have the following code:
file_names_prefix = os.path.join('folder1/somepath/','folder2','output','folder3','my_file')
client = BlobServiceClient('https://mystoragename.blob.core.windows.net',credential=ManagedIdentityCredential()).get_container_client('data')
blob_list = client.list_blobs(name_starts_with=file_names_prefix)
file_list = [blob.name for blob in blob_list]
The code above produces the following output:
['folder1/somepath/folder2/output/folder3/my_file1.csv',
'folder1/somepath/folder2/output/folder3/my_file4.csv',
'folder1/somepath/folder2/output/folder3/my_file23.csv']
but when trying to delete these files using:
client.delete_blobs(file_list)
There is an error:
TypeError Traceback (most recent call last)
/tmp/ipykernel_2376/712121654.py in
----> 1 client.delete_blobs(file_list)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azure/core/tracing/decorator.py in wrapper_use_tracer(*args, **kwargs)
81 span_impl_type = settings.tracing_implementation()
82 if span_impl_type is None:
---> 83 return func(*args, **kwargs)
84
85 # Merge span is parameter is set, but only if no explicit parent are passed
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azure/storage/blob/_container_client.py in delete_blobs(self, *blobs, **kwargs)
1298 return iter(list())
1299
-> 1300 reqs, options = self._generate_delete_blobs_options(*blobs, **kwargs)
1301
1302 return self._batch_send(*reqs, **options)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azure/storage/blob/_container_client.py in _generate_delete_blobs_options(self, *blobs, **kwargs)
1206 req = HttpRequest(
1207 "DELETE",
-> 1208 "/{}/{}{}".format(quote(container_name), quote(blob_name, safe='/~'), self._query_str),
1209 headers=header_parameters
1210 )
/anaconda/envs/azureml_py38/lib/python3.8/urllib/parse.py in quote(string, safe, encoding, errors)
817 if errors is not None:
818 raise TypeError("quote() doesn't support 'errors' for bytes")
--> 819 return quote_from_bytes(string, safe)
820
821 def quote_plus(string, safe='', encoding=None, errors=None):
/anaconda/envs/azureml_py38/lib/python3.8/urllib/parse.py in quote_from_bytes(bs, safe)
842 """
843 if not isinstance(bs, (bytes, bytearray)):
--> 844 raise TypeError("quote_from_bytes() expected bytes")
845 if not bs:
846 return ''
TypeError: quote_from_bytes() expected bytes
Can someone please help?
I tried various things, but nothing worked. Ended up deleting files in a loop.
for file in file_list:
    client.delete_blob(file)
See https://github.com/Azure/azure-sdk-for-python/issues/25764. delete_blobs takes *blobs as its first argument. So
client.delete_blobs(*file_list)
should do the trick.
Here are the official docs for reference.
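To see why the unpacking matters, here is a small stand-alone sketch. The delete_blobs function below is a hypothetical stand-in, not the Azure SDK method; it only mimics the *blobs signature and the quote() call visible in the traceback, and the /data/ prefix stands for the container name.

```python
from urllib.parse import quote


def delete_blobs(*blobs):
    """Hypothetical stand-in: builds one request path per positional argument."""
    return ["/data/" + quote(name, safe="/~") for name in blobs]


file_list = [
    "folder1/somepath/folder2/output/folder3/my_file1.csv",
    "folder1/somepath/folder2/output/folder3/my_file4.csv",
]

error_message = None
try:
    # The whole list arrives as ONE argument, so quote() receives a list.
    delete_blobs(file_list)
except TypeError as exc:
    error_message = str(exc)
    print("list passed as a single argument:", error_message)

# Unpacking turns every name into its own positional argument.
urls = delete_blobs(*file_list)
print(urls)
```

The first call fails with the same TypeError as in the question, while the unpacked call produces one path per blob.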
The error is due to a lack of permissions. Azure uses Shared Access Signature (SAS) tokens and roles to protect Azure Blob Storage objects such as containers and blobs. The code snippet above uses credentials that have read and list access to the blob container, but that identity does not have a role that allows deleting blobs. Check the Azure documentation to find the RBAC roles that allow blob deletion.
To delete a blob, the role must include the RBAC action Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete.
See the Azure documentation for the full list of RBAC actions.
Refer to this SO answer.

Error while saving Optuna study to Google Drive from Colab

From Colab, I can save an arbitrary file to my Drive like this:
with open("gdrive/My Drive/chapter_classification/output/hello.txt", 'w') as f:
    f.write('hello')
This works fine, but when I follow the approach from the official Optuna documentation, using this code:
direction = 'minimize'
name = 'opt1'
study = optuna.create_study(sampler=optuna.samplers.TPESampler(),direction=direction,study_name=name, storage=f"gdrive/My Drive/chapter_classification/output/sqlite:///{name}.db",load_if_exists=True)
study.optimize(tune, n_trials=1000)
it throws an error:
ArgumentError Traceback (most recent call last)
<ipython-input-177-f32da2c0f69a> in <module>()
2 direction = 'minimize'
3 name = 'opt1'
----> 4 study = optuna.create_study(sampler=optuna.samplers.TPESampler(),direction=direction,study_name=name, storage="gdrive/My Drive/chapter_classification/output/sqlite:///opt1.db",load_if_exists=True)
5 study.optimize(tune, n_trials=1000)
6 frames
/usr/local/lib/python3.7/dist-packages/optuna/study/study.py in create_study(storage, sampler, pruner, study_name, direction, load_if_exists, directions)
1134 ]
1135
-> 1136 storage = storages.get_storage(storage)
1137 try:
1138 study_id = storage.create_new_study(study_name)
/usr/local/lib/python3.7/dist-packages/optuna/storages/__init__.py in get_storage(storage)
29 return RedisStorage(storage)
30 else:
---> 31 return _CachedStorage(RDBStorage(storage))
32 elif isinstance(storage, RDBStorage):
33 return _CachedStorage(storage)
/usr/local/lib/python3.7/dist-packages/optuna/storages/_rdb/storage.py in __init__(self, url, engine_kwargs, skip_compatibility_check, heartbeat_interval, grace_period, failed_trial_callback)
173
174 try:
--> 175 self.engine = create_engine(self.url, **self.engine_kwargs)
176 except ImportError as e:
177 raise ImportError(
<string> in create_engine(url, **kwargs)
/usr/local/lib/python3.7/dist-packages/sqlalchemy/util/deprecations.py in warned(fn, *args, **kwargs)
307 stacklevel=3,
308 )
--> 309 return fn(*args, **kwargs)
310
311 doc = fn.__doc__ is not None and fn.__doc__ or ""
/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/create.py in create_engine(url, **kwargs)
528
529 # create url.URL object
--> 530 u = _url.make_url(url)
531
532 u, plugins, kwargs = u._instantiate_plugins(kwargs)
/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/url.py in make_url(name_or_url)
713
714 if isinstance(name_or_url, util.string_types):
--> 715 return _parse_rfc1738_args(name_or_url)
716 else:
717 return name_or_url
/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/url.py in _parse_rfc1738_args(name)
775 else:
776 raise exc.ArgumentError(
--> 777 "Could not parse rfc1738 URL from string '%s'" % name
778 )
779
ArgumentError: Could not parse rfc1738 URL from string 'gdrive/My Drive/chapter_classification/output/sqlite:///opt1.db'
According to the official documentation of create_study:
When a database URL is passed, Optuna internally uses SQLAlchemy to handle the database. Please refer to SQLAlchemy’s document for further details. If you want to specify non-default options to SQLAlchemy Engine, you can instantiate RDBStorage with your desired options and pass it to the storage argument instead of a URL.
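SQLAlchemy's documentation shows that a database URL must begin with the dialect prefix (here sqlite:///), followed by the path to the database file. In your code the directory was placed in front of the prefix, so the string could not be parsed as a URL.
So all you have to do is change
storage=f"gdrive/My Drive/chapter_classification/output/sqlite:///{name}.db"
so that the path comes after the prefix:
storage = f"sqlite:///gdrive/My Drive/chapter_classification/output/{name}.db"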
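A small helper can keep the prefix and the path in the right order and make sure there is a separator between the directory and the file name. This is only a sketch; sqlite_url is a name invented for the example, and the directory is the one from the question.

```python
import posixpath


def sqlite_url(directory, study_name):
    """Build a SQLAlchemy-style SQLite URL: prefix first, then the file path."""
    return "sqlite:///" + posixpath.join(directory, f"{study_name}.db")


storage = sqlite_url("gdrive/My Drive/chapter_classification/output", "opt1")
print(storage)
```

Note that with three slashes SQLAlchemy treats the path as relative to the current working directory; an absolute path needs a fourth slash, for example sqlite:////content/... on a Unix-like system.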

How to abbreviate traceback in Jupyter Notebook?

I documented an XML-API with Jupyter Notebook, so documentation and specification cannot drift apart.
This works great.
As the API also has to handle invalid input, Jupyter Notebook shows - correctly - the traceback.
The traceback is very verbose - I'd like to abbreviate / shorten it - ideally, only the last line should be shown.
request
server.get_licenses("not-existing-id")
current print out in Jupyter Notebook
---------------------------------------------------------------------------
Fault Traceback (most recent call last)
<ipython-input-5-366cceb6869e> in <module>
----> 1 server.get_licenses("not-existing-id")
/usr/lib/python3.9/xmlrpc/client.py in __call__(self, *args)
1114 return _Method(self.__send, "%s.%s" % (self.__name, name))
1115 def __call__(self, *args):
-> 1116 return self.__send(self.__name, args)
1117
1118 ##
/usr/lib/python3.9/xmlrpc/client.py in __request(self, methodname, params)
1456 allow_none=self.__allow_none).encode(self.__encoding, 'xmlcharrefreplace')
1457
-> 1458 response = self.__transport.request(
1459 self.__host,
1460 self.__handler,
/usr/lib/python3.9/xmlrpc/client.py in request(self, host, handler, request_body, verbose)
1158 for i in (0, 1):
1159 try:
-> 1160 return self.single_request(host, handler, request_body, verbose)
1161 except http.client.RemoteDisconnected:
1162 if i:
/usr/lib/python3.9/xmlrpc/client.py in single_request(self, host, handler, request_body, verbose)
1174 if resp.status == 200:
1175 self.verbose = verbose
-> 1176 return self.parse_response(resp)
1177
1178 except Fault:
/usr/lib/python3.9/xmlrpc/client.py in parse_response(self, response)
1346 p.close()
1347
-> 1348 return u.close()
1349
1350 ##
/usr/lib/python3.9/xmlrpc/client.py in close(self)
660 raise ResponseError()
661 if self._type == "fault":
--> 662 raise Fault(**self._stack[0])
663 return tuple(self._stack)
664
Fault: <Fault 1: 'company id is not valid'>
desired output
Fault: <Fault 1: 'company id is not valid'>
As it turns out, that's built into IPython, so you don't need to install or update anything.
Just put a single cell at the top of your notebook and run %xmode Minimal as the only input. You can also see the documentation with %xmode? or the documentation for many other "magic commands" with %quickref.
The following solution, using sys.excepthook works in a REPL...
code
import sys
def my_exc_handler(type, value, traceback):
    print(repr(value), file=sys.stderr)
sys.excepthook = my_exc_handler
1 / 0
bash
❯ python3.9 main.py
ZeroDivisionError('division by zero')
... but unfortunately not in Jupyter Notebook - I still get the full traceback.
When I have a look at Python's documentation...
When an exception is raised and uncaught
... maybe the "uncaught" is the problem. If I had to guess, Jupyter Notebook catches all exceptions itself and does the formatting and printing on its own.
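If the notebook does intercept exceptions before sys.excepthook is consulted, one workaround that does not depend on the hook is to catch the exception inside the cell and print only its final line. The sketch below uses traceback.format_exception_only from the standard library; the helper name last_line is made up for this example, and 1 / 0 stands in for the failing API call.

```python
import traceback


def last_line(exc):
    """Return only the final 'ExceptionType: message' line for `exc`."""
    lines = traceback.format_exception_only(type(exc), exc)
    return lines[-1].rstrip("\n")


try:
    1 / 0
except ZeroDivisionError as exc:
    summary = last_line(exc)
    print(summary)
```

The drawback is that every documented call needs its own try/except, so this is more of a fallback than a notebook-wide setting.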

how to debug a CommClosedError in Dask Gateway deployed in Kubernetes

I have deployed dask_gateway 0.8.0 (with dask==2.25.0 and distributed==2.25.0) in a Kubernetes cluster.
When I create a new cluster with:
cluster = gateway.new_cluster(public_address = gateway._public_address)
I get this error:
Task exception was never retrieved
future: <Task finished coro=<connect.<locals>._() done, defined at /home/jovyan/.local/lib/python3.6/site-packages/distributed/comm/core.py:288> exception=CommClosedError()>
Traceback (most recent call last):
File "/home/jovyan/.local/lib/python3.6/site-packages/distributed/comm/core.py", line 297, in _
handshake = await asyncio.wait_for(comm.read(), 1)
File "/cvmfs/sft.cern.ch/lcg/releases/Python/3.6.5-f74f0/x86_64-centos7-gcc8-opt/lib/python3.6/asyncio/tasks.py", line 351, in wait_for
yield from waiter
concurrent.futures._base.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/jovyan/.local/lib/python3.6/site-packages/distributed/comm/core.py", line 304, in _
raise CommClosedError() from e
distributed.comm.core.CommClosedError
However, if I check the pods, the cluster has actually been created, and I can scale it up, and everything seems fine in the dashboard (I can even see the workers).
However, I cannot get the client:
> client = cluster.get_client()
Task exception was never retrieved
future: <Task finished coro=<connect.<locals>._() done, defined at /home/jovyan/.local/lib/python3.6/site-packages/distributed/comm/core.py:288> exception=CommClosedError()>
Traceback (most recent call last):
File "/home/jovyan/.local/lib/python3.6/site-packages/distributed/comm/core.py", line 297, in _
handshake = await asyncio.wait_for(comm.read(), 1)
File "/cvmfs/sft.cern.ch/lcg/releases/Python/3.6.5-f74f0/x86_64-centos7-gcc8-opt/lib/python3.6/asyncio/tasks.py", line 351, in wait_for
yield from waiter
concurrent.futures._base.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/jovyan/.local/lib/python3.6/site-packages/distributed/comm/core.py", line 304, in _
raise CommClosedError() from e
distributed.comm.core.CommClosedError
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/distributed/comm/core.py in connect(addr, timeout, deserialize, handshake_overrides, **connection_args)
321 if not comm:
--> 322 _raise(error)
323 except FatalCommClosedError:
~/.local/lib/python3.6/site-packages/distributed/comm/core.py in _raise(error)
274 )
--> 275 raise IOError(msg)
276
OSError: Timed out trying to connect to 'gateway://traefik-dask-gateway:80/jhub.0373ea68815d47fca6a6c489c8f7263a' after 100 s: connect() didn't finish in time
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
<ipython-input-19-affca45186d3> in <module>
----> 1 client = cluster.get_client()
~/.local/lib/python3.6/site-packages/dask_gateway/client.py in get_client(self, set_as_default)
1066 set_as_default=set_as_default,
1067 asynchronous=self.asynchronous,
-> 1068 loop=self.loop,
1069 )
1070 if not self.asynchronous:
~/.local/lib/python3.6/site-packages/distributed/client.py in __init__(self, address, loop, timeout, set_as_default, scheduler_file, security, asynchronous, name, heartbeat_interval, serializers, deserializers, extensions, direct_to_workers, connection_limit, **kwargs)
743 ext(self)
744
--> 745 self.start(timeout=timeout)
746 Client._instances.add(self)
747
~/.local/lib/python3.6/site-packages/distributed/client.py in start(self, **kwargs)
948 self._started = asyncio.ensure_future(self._start(**kwargs))
949 else:
--> 950 sync(self.loop, self._start, **kwargs)
951
952 def __await__(self):
~/.local/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
337 if error[0]:
338 typ, exc, tb = error[0]
--> 339 raise exc.with_traceback(tb)
340 else:
341 return result[0]
~/.local/lib/python3.6/site-packages/distributed/utils.py in f()
321 if callback_timeout is not None:
322 future = asyncio.wait_for(future, callback_timeout)
--> 323 result[0] = yield future
324 except Exception as exc:
325 error[0] = sys.exc_info()
/cvmfs/sft.cern.ch/lcg/views/LCG_96python3/x86_64-centos7-gcc8-opt/lib/python3.6/site-packages/tornado/gen.py in run(self)
1131
1132 try:
-> 1133 value = future.result()
1134 except Exception:
1135 self.had_exception = True
~/.local/lib/python3.6/site-packages/distributed/client.py in _start(self, timeout, **kwargs)
1045
1046 try:
-> 1047 await self._ensure_connected(timeout=timeout)
1048 except (OSError, ImportError):
1049 await self._close()
~/.local/lib/python3.6/site-packages/distributed/client.py in _ensure_connected(self, timeout)
1103 try:
1104 comm = await connect(
-> 1105 self.scheduler.address, timeout=timeout, **self.connection_args
1106 )
1107 comm.name = "Client->Scheduler"
~/.local/lib/python3.6/site-packages/distributed/comm/core.py in connect(addr, timeout, deserialize, handshake_overrides, **connection_args)
332 backoff = min(backoff, 1) # wait at most one second
333 else:
--> 334 _raise(error)
335 else:
336 break
~/.local/lib/python3.6/site-packages/distributed/comm/core.py in _raise(error)
273 error,
274 )
--> 275 raise IOError(msg)
276
277 backoff = 0.01
OSError: Timed out trying to connect to 'gateway://traefik-dask-gateway:80/jhub.0373ea68815d47fca6a6c489c8f7263a' after 100 s: Timed out trying to connect to 'gateway://traefik-dask-gateway:80/jhub.0373ea68815d47fca6a6c489c8f7263a' after 100 s: connect() didn't finish in time
How do I debug this? Any pointer would be greatly appreciated.
I already tried increasing all the timeouts, but nothing changed:
os.environ["DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT"]="100s"
os.environ["DASK_DISTRIBUTED__COMM__TIMEOUTS__TCP"]="600s"
os.environ["DASK_DISTRIBUTED__COMM__RETRY__DELAY__MIN"]="1s"
os.environ["DASK_DISTRIBUTED__COMM__RETRY__DELAY__MAX"]="60s"
I wrote a tutorial about the steps I took to deploy dask gateway, see https://zonca.dev/2020/08/dask-gateway-jupyterhub.html.
I am quite sure this was working fine a few weeks ago, but I cannot identify what changed...
You need to use compatible versions of dask and dask-distributed everywhere.
I believe this is an error related to an upgrade in the communications protocol for distributed. See https://github.com/dask/dask-gateway/issues/316#issuecomment-702947730
These are the pinned versions of the dependencies for the docker images as of Nov 10, 2020 (in conda environment.yml compatible format):
- python=3.7.7
- dask=2.21.0
- distributed=2.21.0
- cloudpickle=1.5.0
- toolz=0.10.0
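To compare environments, it can help to print the installed versions of these packages on the client, the scheduler, and the workers, and check that they agree. The following is a minimal sketch using importlib.metadata, which needs Python 3.8 or newer (the Python 3.6 environment from the question would need a different approach); installed_versions is a name invented for this example.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Run this in every environment and compare the output line by line.
report = installed_versions(["dask", "distributed", "cloudpickle", "toolz"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```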
