How to force reading from replicaset secondary member with PyMongo? - python

I am trying to connect to a MongoDB replicaset with PyMongo and to manually balance the reading load with ReadPreference.
The problem is that whatever I try with MongoClient, it always reads from PRIMARY.
I am using Python 2.7.6 and PyMongo 2.6.3 with mongodb-10gen 2.4.14 (all for legacy reasons) on Linux Mint 17.2.
My connection sequence looks like this (without the print, and with massive request on some collection of some database from the connection):
>>> from pymongo import MongoClient, ReadPreference
>>> HOST = "10.0.0.51"
>>> PORT = 49029
>>> print MongoClient(host=HOST, port=PORT, replicaset="rs02", readPreference=ReadPreference.SECONDARY)
MongoClient([u'XXXX-MNGO03664:49029', u'XXXX-MNGO03663:49029'])
This one would be the right way to go with PyMongo > 3, from what I have read. Unfortunately I am stuck with PyMongo 2.6.3, and I can only assume that this is why it doesn't read from secondary.
After a bit of digging, I found about ReplicaSetConnection (deprecated since PyMongo 2.4) and MongoReplicaSetClient (see e.g. pymongo replication secondary readreference not work and pymongo: Advantage of using MongoReplicaSetClient?), but it also doesn't seem to work for me, for different reasons though.
>>> print MongoReplicaSetClient(host=HOST, port=PORT, replicaset="rs02", readPreference=ReadPreference.SECONDARY)
MongoReplicaSetClient([])
The client doesn't seem to be able to see the members of the replicaset…
And of course when I start reading from this connection, it doesn't work.
>>> myConn = MongoReplicaSetClient(host=HOST, port=PORT, replicaset="rs02", readPreference=ReadPreference.SECONDARY)
>>> print myConn.someDB.someCollection.count()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 759, in count
return self.find().count()
File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 640, in count
**command)
File "/usr/local/lib/python2.7/dist-packages/pymongo/database.py", line 391, in command
result = self["$cmd"].find_one(command, **extra_opts)
File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 604, in find_one
for result in self.find(spec_or_id, *args, **kwargs).limit(-1):
File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 904, in next
if len(self.__data) or self._refresh():
File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 848, in _refresh
self.__uuid_subtype))
File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 782, in __send_message
res = client._send_message_with_response(message, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pymongo/mongo_replica_set_client.py", line 1631, in _send_message_with_response
raise AutoReconnect(msg, errors)
pymongo.errors.AutoReconnect: No replica set secondary available for query with ReadPreference SECONDARY
Note that the same error appears when I try to read with ReadPreference.PRIMARY.
The weird thing here is that if I change the name of the replicaset to connect to, the client spots that it doesn't exist :
>>> print MongoReplicaSetClient(host=HOST, port=PORT, replicaset="rs42", readPreference=ReadPreference.SECONDARY)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pymongo/mongo_replica_set_client.py", line 742, in __init__
self.refresh()
File "/usr/local/lib/python2.7/dist-packages/pymongo/mongo_replica_set_client.py", line 1135, in refresh
% (host, port, self.__name))
pymongo.errors.ConfigurationError: 10.0.0.51:49029 is not a member of replica set rs42
So I assume that in normal cases, it has a way to see that there is a replicaset here, and who are its members.

Related

Asn1 parsing error when trying to connect to mongo atlas

I am using MongoDB atlas for my discord bot but recently ran into an error. On the hosting (Heroku) everything works without errors, I first updated all the modules, but the error has not disappeared.
I am using motor as a driver to work with MongoDB atlas.
Checked the database connection URL, everything is correct.
Python version 3.10(on Heroku too)
Traceback (most recent call last):
File "D:\Проекты\AkainuBot\main.py", line 23, in <module>
mongo = AsyncIOMotorClient(
File "D:\Python\lib\site-packages\motor\core.py", line 159, in __init__
delegate = self.__delegate_class__(*args, **kwargs)
File "D:\Python\lib\site-packages\pymongo\mongo_client.py", line 718, in __init__
self.__options = options = ClientOptions(
File "D:\Python\lib\site-packages\pymongo\client_options.py", line 165, in __init__
self.__pool_options = _parse_pool_options(options)
File "D:\Python\lib\site-packages\pymongo\client_options.py", line 132, in _parse_pool_options
ssl_context, ssl_match_hostname = _parse_ssl_options(options)
File "D:\Python\lib\site-packages\pymongo\client_options.py", line 98, in _parse_ssl_options
ctx = get_ssl_context(
File "D:\Python\lib\site-packages\pymongo\ssl_support.py", line 159, in get_ssl_context
ctx.load_verify_locations(certifi.where())
File "D:\Python\lib\site-packages\pymongo\pyopenssl_context.py", line 276, in load_verify_locations
self._callback_data.trusted_ca_certs = _load_trusted_ca_certs(cafile)
File "D:\Python\lib\site-packages\pymongo\ocsp_support.py", line 79, in _load_trusted_ca_certs
_load_pem_x509_certificate(cert_data, backend))
File "D:\Python\lib\site-packages\cryptography\x509\base.py", line 436, in load_pem_x509_certificate
return rust_x509.load_pem_x509_certificate(data)
ValueError: error parsing asn1 value: ParseError { kind: InvalidValue, location: ["RawCertificate::tbs_cert", "TbsCertificate::serial"] }
I've solved my problem getting the parsing asn1 value error by adding the ssl_cert_reqs option on the MongoClient (and obviously importing ssl).
dbClient = pymongo.MongoClient(uri, ssl_cert_reqs=ssl.CERT_NONE)

Pymongo pymongo.errors.ServerSelectionTimeoutError when using example code

I am trying to run very straightforward code to figure out how to use pymongo with the MongoDB Atlas Cloud.
Here is the example code
import pymongo
client = pymongo.MongoClient("mongodb+srv://{myusername}:{mypassword}#cluster0-uywu8.mongodb.net/test?retryWrites=true&w=majority")
db = client.BroadwayMatch
print(db)
collection = db.Artists
print(collection)
print(collection.insert_one({'x': 1}))
BroadwayMatch and Artists are existing databases and collections that I was able to insert to last week, I am not sure what changed. It seems to be connecting to the database and collection successfully, but for some reason is unable to read or write to it. All attributes of collections can be accessed, but all methods result in a ServerSelectionTimeoutError. Here is the output from this snippet
Database(MongoClient(host=['cluster0-shard-00-01-uywu8.mongodb.net:27017', 'cluster0-shard-00-00-uywu8.mongodb.net:27017', 'cluster0-shard-00-02-uywu8.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, authsource='admin', replicaset='Cluster0-shard-0', ssl=True, retrywrites=True, w='majority'), 'BroadwayMatch')
Collection(Database(MongoClient(host=['cluster0-shard-00-01-uywu8.mongodb.net:27017', 'cluster0-shard-00-00-uywu8.mongodb.net:27017', 'cluster0-shard-00-02-uywu8.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, authsource='admin', replicaset='Cluster0-shard-0', ssl=True, retrywrites=True, w='majority'), 'BroadwayMatch'), 'Artists')
Traceback (most recent call last):
File "C:\Python37\Spotify-Match\mongotest.py", line 10, in <module>
print(collection.insert_one({'x': 1}))
File "C:\Python37\lib\site-packages\pymongo\collection.py", line 700, in insert_one
session=session),
File "C:\Python37\lib\site-packages\pymongo\collection.py", line 614, in _insert
bypass_doc_val, session)
File "C:\Python37\lib\site-packages\pymongo\collection.py", line 602, in _insert_one
acknowledged, _insert_command, session)
File "C:\Python37\lib\site-packages\pymongo\mongo_client.py", line 1279, in _retryable_write
with self._tmp_session(session) as s:
File "C:\Python37\lib\contextlib.py", line 112, in __enter__
return next(self.gen)
File "C:\Python37\lib\site-packages\pymongo\mongo_client.py", line 1611, in _tmp_session
s = self._ensure_session(session)
File "C:\Python37\lib\site-packages\pymongo\mongo_client.py", line 1598, in _ensure_session
return self.__start_session(True, causal_consistency=False)
File "C:\Python37\lib\site-packages\pymongo\mongo_client.py", line 1551, in __start_session
server_session = self._get_server_session()
File "C:\Python37\lib\site-packages\pymongo\mongo_client.py", line 1584, in _get_server_session
return self._topology.get_server_session()
File "C:\Python37\lib\site-packages\pymongo\topology.py", line 434, in get_server_session
None)
File "C:\Python37\lib\site-packages\pymongo\topology.py", line 200, in _select_servers_loop
self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: connection closed,connection closed,connection closed
I'm not sure what I am doing wrong, can anyone help?
Reasons you might not be able to connect to an Atlas server:
Your white list has not enable your current IP address
you are using the wrong username and/or password. In your case it looks like your fString is missing an f at the start.
When diagnosing these conditions cutting and pasting the MongoDB Atlas connection string (see below) for the MongoDB shell or MongoDB Compass can often expose username and/or password errors.
Either your mongo server is not exposed otherwise it is not in the default port. Try following:
import pymongo
client = pymongo.MongoClient("mongodb://uname:pass#ip:port/")
db = client['BroadwayMatch']

Accessing HDInsight Hive with python

We have a HDInsight cluster with some tables in HIVE. I want to query these tables from Python 3.6 from a client machine (outside Azure).
I have tried using PyHive, pyhs2 and also impyla but I am running into various problems with all of them.
Does anybody have a working example of accessing a HDInsight HIVE from Python?
I have very little experience with this, and don't know how to configure PyHive (which seems the most promising), especially regarding authorization.
With impyla:
from impala.dbapi import connect
conn = connect(host='redacted.azurehdinsight.net',port=443)
cursor = conn.cursor()
cursor.execute('SELECT * FROM cs_test LIMIT 100')
print(cursor.description) # prints the result set's schema
results = cursor.fetchall()
This gives:
Traceback (most recent call last):
File "C:/git/ml-notebooks/impyla.py", line 3, in <module>
cursor = conn.cursor()
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 125, in cursor
session = self.service.open_session(user, configuration)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 995, in open_session
resp = self._rpc('OpenSession', req)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 923, in _rpc
response = self._execute(func_name, request)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 954, in _execute
.format(self.retries))
impala.error.HiveServer2Error: Failed after retrying 3 times
With Pyhive:
from pyhive import hive
conn = hive.connect(host="redacted.azurehdinsight.net",port=443,auth="NOSASL")
#also tried other auth-types, but as i said, i have no clue here
This gives:
Traceback (most recent call last):
File "C:/git/ml-notebooks/PythonToHive.py", line 3, in <module>
conn = hive.connect(host="redacted.azurehdinsight.net",port=443,auth="NOSASL")
File "C:\Users\chris\Anaconda3\lib\site-packages\pyhive\hive.py", line 64, in connect
return Connection(*args, **kwargs)
File "C:\Users\chris\Anaconda3\lib\site-packages\pyhive\hive.py", line 164, in __init__
response = self._client.OpenSession(open_session_req)
File "C:\Users\chris\Anaconda3\lib\site-packages\TCLIService\TCLIService.py", line 187, in OpenSession
return self.recv_OpenSession()
File "C:\Users\chris\Anaconda3\lib\site-packages\TCLIService\TCLIService.py", line 199, in recv_OpenSession
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 134, in readMessageBegin
sz = self.readI32()
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 217, in readI32
buff = self.trans.readAll(4)
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TTransport.py", line 60, in readAll
chunk = self.read(sz - have)
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TTransport.py", line 161, in read
self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TSocket.py", line 117, in read
buff = self.handle.recv(sz)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
According to the offical document Understand and resolve errors received from WebHCat on HDInsight, it said as below.
What is WebHCat
WebHCat is a REST API for HCatalog, a table, and storage management layer for Hadoop. WebHCat is enabled by default on HDInsight clusters, and is used by various tools to submit jobs, get job status, etc. without logging in to the cluster.
So a workaround way is to use WebHCat to run the Hive QL in Python, please refer to the Hive document to learn & use it. As reference, there is a similar MSDN thread discussed about it.
Hope it helps.
Technically you should be able to use the Thrift connector and pyhive but I haven't had any success with this. However I have successfully used the JDBC connector using JayDeBeAPI.
First you need to download the JDBC driver.
http://central.maven.org/maven2/org/apache/hive/hive-jdbc/1.2.1/hive-jdbc-1.2.1-standalone.jar
http://repo1.maven.org/maven2/org/apache/httpcomponents/httpclient/4.4/httpclient-4.4.jar
http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.4.4/httpcore-4.4.4.jar
I put mine in /jdbc and used JayDeBeAPI with the following connection string.
edit: You need to add /jdbc/* to your CLASSPATH environment variable.
import jaydebeapi
conn = jaydebeapi.connect("org.apache.hive.jdbc.HiveDriver",
"jdbc:hive2://my_ip_or_url:443/;ssl=true;transportMode=http;httpPath=/hive2",
[username, password],
"/jdbc/hive-jdbc-1.2.1.jar")

How to Integrate Pyramid 1.1 and Mongo DB - as few lines as possible

Goal: I try to integrate Mongo DB with Pyramid 1.1 basic application.
Background: Appliation is created by the book (https://docs.pylonsproject.org/projects/pyramid/1.1/narr/project.html#creating-the-project) using basic command "paste create -t pyramid_starter"
I followed this cookbook article: https://docs.pylonsproject.org/projects/pyramid_cookbook/dev/mongo.html
Problem: It seems that when ever I add MongoDB connection into request I got "Internal Server Error" with
I have tried several articles and it seems that I must start debug system more?
Has anybody found easy solution for this?
Exception if it helps some expert
Exception happened during processing of request from ('127.0.0.1', 53697)
Traceback (most recent call last):
File "virtualenv\lib\site-packages\paste-1.7.5.1-py2.7.egg\paste\httpserver.py", line 1068, in process_request_in_thread
self.finish_request(request, client_address)
File "C:\Python27\Lib\SocketServer.py", line 323, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "C:\Python27\Lib\SocketServer.py", line 639, in __init__
self.handle()
File "virtualenv\lib\site-packages\paste-1.7.5.1-py2.7.egg\paste\httpserver.py", line 442, in handle
BaseHTTPRequestHandler.handle(self)
File "C:\Python27\Lib\BaseHTTPServer.py", line 343, in handle
self.handle_one_request()
...
File "C:\Python27\lib\site-packages\pyramid_debugtoolbar-0.8-py2.7.egg\pyramid_debugtoolbar\panels\__init__.py", line 24, in render
return render(template_name, vars, request=request)
File "virtualenv\lib\site-packages\pyramid-1.2a1-py2.7.egg\pyramid\renderers.py", line 69, in render
return helper.render(value, None, request=request)
File "virtualenv\lib\site-packages\pyramid-1.2a1-py2.7.egg\pyramid\renderers.py", line 418, in render
result = renderer(value, system_values)
File "C:\Python27\lib\site-packages\pyramid_jinja2-1.1-py2.7.egg\pyramid_jinja2\__init__.py", line 277, in __call__
return self.template.render(system)
File "C:\Python27\lib\site-packages\jinja2-2.6-py2.7.egg\jinja2\environment.py", line 894, in render
return self.environment.handle_exception(exc_info, True)
File "C:\Python27\lib\site-packages\pyramid_debugtoolbar-0.8-py2.7.egg\pyramid_debugtoolbar\panels\templates\request_vars.jinja2", line 110, in top-level template code
<td>{{ value|escape }}</td>
File "virtualenv\lib\site-packages\markupsafe-0.15-py2.7.egg\markupsafe\_native.py", line 20, in escape
return s.__html__()
File "virtualenv\lib\site-packages\pymongo-2.0.1-py2.7-win-amd64.egg\pymongo\collection.py", line 1156, in __call__
self.__name)
TypeError: 'Collection' object is not callable. If you meant to call the '__html__' method on a 'Database' object it is failing because no such method exists.
Another possible solution is to use the 'debugtoolbar.panels' setting in your config file to disable the request_vars panel (which is what is causing the issue):
[app:main]
.. other stuff ...
debugtoolbar.panels =
pyramid_debugtoolbar.panels.versions.VersionDebugPanel
pyramid_debugtoolbar.panels.settings.SettingsDebugPanel
pyramid_debugtoolbar.panels.headers.HeaderDebugPanel
# pyramid_debugtoolbar.panels.request_vars.RequestVarsDebugPanel
pyramid_debugtoolbar.panels.renderings.RenderingsDebugPanel
pyramid_debugtoolbar.panels.logger.LoggingPanel
pyramid_debugtoolbar.panels.performance.PerformanceDebugPanel
pyramid_debugtoolbar.panels.routes.RoutesDebugPanel
pyramid_debugtoolbar.panels.sqla.SQLADebugPanel
Pymongo's Database and Collection objects respond to __getattr__ in order to provide a nicer interface and let you write code like:
db.foo.bar.find(...)
Any call to __getattr__ will succeed, but unfortunately this confuses some libraries which expect certain attributes to be callable (the Pymongo Collection object is not callable, except to raise that exception you're seeing above).
What I've done in Pyramid projects is only use the database from within the resources, to prevent references to the database or collections from becoming present in the module level in views or other code. As an added benefit, this ends up being a good way of enforcing separation of concerns so that resources handle database manipulation, and views only translate that for display in the templates.
That error means that your trying to call a method (html) that doesn't exist in the Database instance.
>>> conn = Connection()
>>> db = conn.mydb
>>> col = db.mycoll
>>> col = db.mycoll()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/virtualenvs/myenv/lib/python2.7/site-packages/pymongo-2.0-py2.7-macosx-10.6-x86_64.egg/pymongo/collection.py", line 1156, in __call__
self.__name)
TypeError: 'Collection' object is not callable. If you meant to call the 'mycoll' method on a 'Database' object it is failing because no such method exists.
If you haven't modified the code then it is possible it's an bug in markupsafe that tries to call html() in a Database instance
s.__html__()

Pymongo AssertionError: ids don't match

I use:
MongoDB 1.6.5
Pymongo 1.9
Python 2.6.6
I have 3 types of daemons. 1st load data from web, 2nd analyze it and save result, and 3rd group result. All of them working with Mongodb.
At some time 3rd daemon throws many exceptions like this(mostly when there are big amount of data in DB):
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/gevent-0.13.1-py2.6-linux-x86_64.egg/gevent/greenlet.py", line 405, in run
result = self._run(*self.args, **self.kwargs)
File "/data/www/spider/daemon/scripts/mainconverter.py", line 72, in work
for item in res:
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/cursor.py", line 601, in next
if len(self.__data) or self._refresh():
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/cursor.py", line 564, in _refresh
self.__query_spec(), self.__fields))
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/cursor.py", line 521, in __send_message
**kwargs)
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/connection.py", line 743, in _send_message_with_response
return self.__send_and_receive(message, sock)
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/connection.py", line 724, in __send_and_receive
return self.__receive_message_on_socket(1, request_id, sock)
File "/usr/local/lib/python2.6/dist-packages/pymongo-1.9_-py2.6-linux-x86_64.egg/pymongo/connection.py", line 714, in __receive_message_on_socket
struct.unpack("<i", header[8:12])[0])
AssertionError: ids don't match -561338340 0
<Greenlet at 0x2baa628: <bound method Worker.work of <scripts.mainconverter.Worker object at 0x2ba8450>>> failed with AssertionError
Can anyone tell what cause this exeption and how to fix this.
Thanks.
This is likely a threading problem related to how you are using worker threads with gevent coroutines. It seems like the pymongo connection object is reading a response for a request it didn't make.

Categories