I have a query that joins on a jsonb-type column in Postgres, which I want to convert to SQLAlchemy in Django using the aldjemy package:
SELECT anon_1.key AS tag, count(anon_1.value ->> 'polarity') AS count_1, anon_1.value ->> 'polarity' AS anon_2
FROM feedback f
JOIN tagging t ON t.feedback_id = f.id
JOIN jsonb_each(t.json_content -> 'entityMap') AS anon_3 ON true
JOIN jsonb_each(((anon_3.value -> 'data') - 'selectionState') - 'segment') AS anon_1 ON true
WHERE f.id = 2
GROUP BY anon_1.value ->> 'polarity', anon_1.key;
The json_content field stores data in the following format:
{
    "entityMap": {
        "0": {
            "data": {
                "people": {
                    "labelId": 5,
                    "polarity": "positive"
                },
                "segment": "a small segment",
                "selectionState": {
                    "focusKey": "9xrre",
                    "hasFocus": true,
                    "anchorKey": "9xrre",
                    "isBackward": false,
                    "focusOffset": 75,
                    "anchorOffset": 3
                }
            },
            "type": "TAG",
            "mutability": "IMMUTABLE"
        },
        "1": {
            "data": {
                "product": {
                    "labelId": 6,
                    "polarity": "positive"
                },
                "segment": "another segment",
                "selectionState": {
                    "focusKey": "9xrre",
                    "hasFocus": true,
                    "anchorKey": "9xrre",
                    "isBackward": false,
                    "focusOffset": 138,
                    "anchorOffset": 79
                }
            },
            "type": "TAG",
            "mutability": "IMMUTABLE"
        }
    }
}
I wrote the following SQLAlchemy code to achieve this query:
first_alias = aliased(func.jsonb_each(Tagging.sa.json_content["entityMap"]))
print(first_alias)
second_alias = aliased(
    func.jsonb_each(
        first_alias.c.value.op("->")("data")
        .op("-")("selectionState")
        .op("-")("segment")
    )
)
polarity = second_alias.c.value.op("->>")("polarity")
p_tag = second_alias.c.key
_count = (
    Feedback.sa.query()
    .join(
        CampaignQuestion,
        CampaignQuestion.sa.question_id == Feedback.sa.question_id,
        isouter=True,
    )
    .join(Tagging)
    .join(first_alias, true())
    .join(second_alias, true())
    .filter(CampaignQuestion.sa.campaign_id == campaign_id)
    .with_entities(p_tag.label("p_tag"), func.count(polarity), polarity)
    .group_by(polarity, p_tag)
    .all()
)
print(_count)
but it gives me NotImplementedError: Operator 'getitem' is not supported on this expression when accessing first_alias.c.
The stack trace:
Traceback (most recent call last):
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
response = handler(request, *args, **kwargs)
File "/home/work/api/app/campaign/views.py", line 119, in results_p_tags
d = campaign_service.get_p_tag_count_for_campaign_results(id)
File "/home/work/api/app/campaign/services/campaign.py", line 177, in get_p_tag_count_for_campaign_results
return campaign_selectors.get_p_tag_counts_for_campaign(campaign_id)
File "/home/work/api/app/campaign/selectors.py", line 196, in get_p_tag_counts_for_campaign
polarity = second_alias.c.value.op("->>")("polarity")
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 1093, in __get__
obj.__dict__[self.__name__] = result = self.fget(obj)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/selectable.py", line 746, in columns
self._populate_column_collection()
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/selectable.py", line 1617, in _populate_column_collection
self.element._generate_fromclause_column_proxies(self)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/selectable.py", line 703, in _generate_fromclause_column_proxies
fromclause._columns._populate_separate_keys(
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/base.py", line 1216, in _populate_separate_keys
self._colset.update(c for k, c in self._collection)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/base.py", line 1216, in <genexpr>
self._colset.update(c for k, c in self._collection)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/operators.py", line 434, in __getitem__
return self.operate(getitem, index)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 831, in operate
return op(self.comparator, *other, **kwargs)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/operators.py", line 434, in __getitem__
return self.operate(getitem, index)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/type_api.py", line 75, in operate
return o[0](self.expr, op, *(other + o[1:]), **kwargs)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/default_comparator.py", line 173, in _getitem_impl
_unsupported_impl(expr, op, other, **kw)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/default_comparator.py", line 177, in _unsupported_impl
raise NotImplementedError(
NotImplementedError: Operator 'getitem' is not supported on this expression
Any help would be greatly appreciated.
PS: The SQLAlchemy version I'm using for this is 1.4.6.
I used the same SQLAlchemy query expression before in a Flask project with SQLAlchemy 1.3.22 and it was working correctly.
I fixed the issue by using table-valued functions as described in the docs, and by accessing the ColumnCollection of the function using indices instead of keys. The code is as follows:
first_alias = func.jsonb_each(Tagging.sa.json_content["entityMap"]).table_valued(
    "key", "value"
)
second_alias = func.jsonb_each(
    first_alias.c[1].op("->")("data").op("-")("selectionState").op("-")("segment")
).table_valued("key", "value")
polarity = second_alias.c[1].op("->>")("polarity")
p_tag = second_alias.c[0]
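With the table-valued aliases in place, the rest of the query is unchanged from the failing version above; shown here for completeness:
_count = (
    Feedback.sa.query()
    .join(
        CampaignQuestion,
        CampaignQuestion.sa.question_id == Feedback.sa.question_id,
        isouter=True,
    )
    .join(Tagging)
    .join(first_alias, true())
    .join(second_alias, true())
    .filter(CampaignQuestion.sa.campaign_id == campaign_id)
    .with_entities(p_tag.label("p_tag"), func.count(polarity), polarity)
    .group_by(polarity, p_tag)
    .all()
)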
I am trying to create tests to check a newly implemented login process; however, I got stuck at the very beginning.
When I try to make a query in my tests, Django throws MySQLdb._exceptions.OperationalError and fails to continue. I noticed that this issue happens only when I use django.test.TestCase, but not when I use django.test.TransactionTestCase. I have read that there is a difference in how these two handle transactions, but I still cannot understand why one fails while the other works. Can someone please point me in the right direction?
This code fails:
class AuthenticateUserTest(TestCase):
    fixtures = ["test_users.json"]
    serialized_rollback = True

    def test_login_success(self):
        print(User.objects.all())
but this code doesn't:
class AuthenticateUserTest(TransactionTestCase):
    fixtures = ["test_users.json"]
    serialized_rollback = True

    def test_login_success(self):
        print(User.objects.all())
with this fixture:
[
    {
        "model": "app.user",
        "pk": 1,
        "fields": {
            "created_at": "2022-04-07T13:34:53+01:00",
            "email": "test#test.com",
            "password": "topsecret",
            "is_active": true,
            "roles": "[\"SUPER_ADMIN\"]",
            "first_name": "Test",
            "last_name": "User"
        }
    }
]
and with the following error:
ERROR: test_login_success (tests.auth.test_auth_api.AuthenticateUserTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/app/tests/auth/test_auth_api.py", line 36, in test_login_success
    print(User.objects.all())
  File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 252, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 258, in __len__
    self._fetch_all()
  File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 1261, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 57, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1155, in execute_sql
    cursor.close()
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 83, in close
    while self.nextset():
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 137, in nextset
    nr = db.next_result()
MySQLdb._exceptions.OperationalError: (2006, '')
Background
I am trying to validate a JSON file using jsonschema. However, the library is trying to make a GET request and I want to avoid that.
from jsonschema import validate

point_schema = {
    "$id": "https://example.com/schemas/point",
    "type": "object",
    "properties": {"x": {"type": "number"}, "y": {"type": "number"}},
    "required": ["x", "y"],
}

polygon_schema = {
    "$id": "https://example.com/schemas/polygon",
    "type": "array",
    "items": {"$ref": "https://example.com/schemas/point"},
}

a_polygon = [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}, {'x': 1, 'y': 2}]

validate(instance=a_polygon, schema=polygon_schema)
Error
I am trying to connect both schemas using a $ref key from the spec:
https://json-schema.org/understanding-json-schema/structuring.html?highlight=ref#ref
Unfortunately for me, this means the library will make a GET request to the URI specified and try to decode it:
Traceback (most recent call last):
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/jsonschema/validators.py", line 777, in resolve_from_url
document = self.resolve_remote(url)
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/jsonschema/validators.py", line 860, in resolve_remote
result = requests.get(uri).json()
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/requests/models.py", line 910, in json
return complexjson.loads(self.text, **kwargs)
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/jsonschema/validators.py", line 932, in validate
error = exceptions.best_match(validator.iter_errors(instance))
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/jsonschema/exceptions.py", line 367, in best_match
best = next(errors, None)
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/jsonschema/validators.py", line 328, in iter_errors
for error in errors:
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/jsonschema/_validators.py", line 81, in items
for error in validator.descend(item, items, path=index):
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/jsonschema/validators.py", line 344, in descend
for error in self.iter_errors(instance, schema):
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/jsonschema/validators.py", line 328, in iter_errors
for error in errors:
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/jsonschema/_validators.py", line 259, in ref
scope, resolved = validator.resolver.resolve(ref)
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/jsonschema/validators.py", line 766, in resolve
return url, self._remote_cache(url)
File "/home/user/anaconda3/envs/myapp-py/lib/python3.7/site-packages/jsonschema/validators.py", line 779, in resolve_from_url
raise exceptions.RefResolutionError(exc)
jsonschema.exceptions.RefResolutionError: Expecting value: line 1 column 1 (char 0)
I don't want this; I just want the polygon schema to reference the point schema right above it (for this purpose, a polygon is a list of points).
In fact, these schemas are in the same file.
Questions
I could always do the following:
point_schema = {
    "$id": "https://example.com/schemas/point",
    "type": "object",
    "properties": {"x": {"type": "number"}, "y": {"type": "number"}},
    "required": ["x", "y"],
}

polygon_schema = {
    "$id": "https://example.com/schemas/polygon",
    "type": "array",
    "items": point_schema,
}
And this would technically work.
However, I would simply be building a bigger dictionary, and I would not be using the spec as it was designed to be used.
How can I use the spec to solve my problem?
You have to provide your other schemas to the implementation.
With this implementation, you must provide a RefResolver to the validate function.
You'll need to either provide a single base_uri and referrer (the schema), or a store which contains a dictionary of URI to schema.
Additionally, you may handle protocols with a function.
Your RefResolver would look like the following...
refResolver = jsonschema.RefResolver(referrer=point_schema, base_uri='https://example.com/schemas/point')
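For a complete, self-contained example (a sketch: the store option is one way to register every schema by its $id so nothing is fetched over the network, and validate() forwards extra keyword arguments such as resolver to the underlying validator class):
from jsonschema import RefResolver, validate

# Register both schemas by their $id so $ref lookups stay local.
schema_store = {
    point_schema["$id"]: point_schema,
    polygon_schema["$id"]: polygon_schema,
}

refResolver = RefResolver(
    base_uri=polygon_schema["$id"],
    referrer=polygon_schema,
    store=schema_store,
)

# The resolver is used for every $ref lookup, so no GET request is made.
validate(instance=a_polygon, schema=polygon_schema, resolver=refResolver)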
I have created a Beam Dataflow pipeline that parses a single JSON from a PubSub topic:
{
    "data": "test data",
    "options": {
        "test options": "test",
        "test_units": {
            "test": {
                "test1": "test1",
                "test2": "test2"
            },
            "test2": {
                "test1": "test1",
                "test2": "test2"
            },
            "test3": {
                "test1": "test1",
                "test2": "test2"
            }
        }
    }
}
My output is something like this:
{
    "data": "test data",
    "test_test_unit": "test1",
    "test_test_unit": "test2",
    "test1_test_unit": "test1",
    ...
},
{
    "data": "test data",
    "test_test_unit": "test1",
    "test_test_unit": "test2",
    "test1_test_unit": "test1",
    ...
}
Basically, what I'm doing is flattening the data based on how many test_units are in the JSON from PubSub and returning that many rows in a single dict.
I have created a class to flatten the data, which returns a dict of rows.
Here is my Beam pipeline:
lines = (
    p
    | 'Read from PubSub' >> beam.io.ReadStringsFromPubSub(known_args.input_topic)
    | 'Parse data' >> beam.ParDo(parse_pubsub())
    | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
        known_args.output_table,
        schema=table_schema,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED
    )
)
Here is some of the class to handle the flattening:
class parse_pubsub(beam.DoFn):
    def process(self, element):
        # ...
        # flattens the data
        # ...
        return rows
Here is the error from the Stackdriver logs:
Error processing instruction -138. Original traceback is
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 151, in _execute
    response = task()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 186, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 265, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 281, in process_bundle
    delayed_applications = bundle_processor.process_bundle(instruction_id)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 552, in process_bundle
    op.finish()
  File "apache_beam/runners/worker/operations.py", line 549, in apache_beam.runners.worker.operations.DoOperation.finish
    def finish(self):
  File "apache_beam/runners/worker/operations.py", line 550, in apache_beam.runners.worker.operations.DoOperation.finish
    with self.scoped_finish_state:
  File "apache_beam/runners/worker/operations.py", line 551, in apache_beam.runners.worker.operations.DoOperation.finish
    self.dofn_runner.finish()
  File "apache_beam/runners/common.py", line 758, in apache_beam.runners.common.DoFnRunner.finish
    self._invoke_bundle_method(self.do_fn_invoker.invoke_finish_bundle)
  File "apache_beam/runners/common.py", line 752, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
    self._reraise_augmented(exn)
  File "apache_beam/runners/common.py", line 777, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    raise_with_traceback(new_exn)
  File "apache_beam/runners/common.py", line 750, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
    bundle_method()
  File "apache_beam/runners/common.py", line 361, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
    def invoke_finish_bundle(self):
  File "apache_beam/runners/common.py", line 365, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
    self.signature.finish_bundle_method.method_value())
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 630, in finish_bundle
    self._flush_batch()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 637, in _flush_batch
    table_id=self.table_id, rows=self._rows_buffer)
  # HERE:
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery_tools.py", line 611, in insert_rows
    for k, v in iteritems(row):
  File "/usr/local/lib/python2.7/dist-packages/future/utils/__init__.py", line 308, in iteritems
    func = obj.items
AttributeError: 'int' object has no attribute 'items'
[while running 'generatedPtransform-135']
I've also tried returning a list and got the same error ('list' object has no attribute 'items'), so I'm converting the list of rows to a dict keyed by index, like this:
{
    0: {
        "data": "test data",
        "test_test_unit": "test1",
        "test_test_unit": "test2",
        "test1_test_unit": "test1",
        ...
    },
    1: {
        "data": "test data",
        "test_test_unit": "test1",
        "test_test_unit": "test2",
        "test1_test_unit": "test1",
        ...
    }
}
I'm fairly new to this so any help will be appreciated!
You'll need to use the yield keyword to emit multiple outputs in your DoFn. For example:
class parse_pubsub(beam.DoFn):
    def process(self, element):
        # ...
        # flattens the data
        # ...
        for row in rows:
            yield row
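For illustration, here is a fuller sketch of what the flattening could look like for the sample message above (the exact flattening rules are an assumption inferred from the example output, not the asker's actual code):
import json

import apache_beam as beam

class parse_pubsub(beam.DoFn):
    def process(self, element):
        message = json.loads(element)
        units = message.get("options", {}).get("test_units", {})
        # Emit one flat dict per test_unit entry; WriteToBigQuery expects
        # each element to be a single row dict, so never return a dict of rows.
        for unit_name, unit in units.items():
            row = {"data": message.get("data")}
            for key, value in unit.items():
                row["{}_{}".format(unit_name, key)] = value
            yield row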
I have defined MongoEngine classes which are mapped to MongoDB collections. When I access the data using MongoEngine, one specific piece of code fails on the first attempt but successfully returns data on the second attempt. I am executing the code in a Python terminal:
from Project.Mongo import User
user = User.objects(username='xyz#xyz.com').first()
from Project.Mongo import Asset
Asset.objects(org=user.org)
The last line generates the following error on the first attempt:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/mongoengine/queryset/manager.py", line 37, in __get__
    queryset = queryset_class(owner, owner._get_collection())
  File "/usr/local/lib/python3.5/dist-packages/mongoengine/document.py", line 209, in _get_collection
    cls.ensure_indexes()
  File "/usr/local/lib/python3.5/dist-packages/mongoengine/document.py", line 765, in ensure_indexes
    collection.create_index(fields, background=background, **opts)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 1754, in create_index
    self.__create_index(keys, kwargs, session, **cmd_options)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 1656, in __create_index
    session=session)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 245, in _command
    retryable_write=retryable_write)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/pool.py", line 517, in command
    collation=collation)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/network.py", line 125, in command
    parse_write_concern_error=parse_write_concern_error)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/helpers.py", line 145, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Index: { v: 2, key: { org: 1, _fts: "text", _ftsx: 1 }, name: "org_1_name_content_text_description_text_content_text_tag_content_text_remote.source_text", ns: "digitile.asset", weights: { content: 3, description: 1, name_content: 10, remote.owner__name: 20, remote.source: 2, tag_content: 2 }, default_language: "english", background: false, language_override: "language", textIndexVersion: 3 } already exists with different options: { v: 2, key: { org: 1, _fts: "text", _ftsx: 1 }, name: "org_1_name_text_description_text_content_text_tag_content_text_remote.source_text", ns: "digitile.asset", default_language: "english", background: false, weights: { content: 3, description: 1, name: 10, remote.owner__name: 20, remote.source: 2, tag_content: 2 }, language_override: "language", textIndexVersion: 3 }
When I run the same last line a second time, it produces the correct result.
I am using Python 3.5.2, pymongo 3.7.2, and mongoengine 0.10.6.
The first time you call .objects on a document class, mongoengine tries to create the indexes if they don't exist.
In this case it fails while creating an index on the asset collection (the index definitions are derived from your Asset/User Document classes), as you can see in the error message:
pymongo.errors.OperationFailure: Index: {...new index details...} already exists with different options: {...existing index details...}
The second time you make that call, mongoengine assumes the indexes were already created and doesn't attempt to create them again, which explains why the second call passes.
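One way forward (my suggestion, not part of the original answer): drop the stale index once so MongoEngine can recreate it with the new definition, or disable automatic index creation in the Document's meta and manage indexes yourself. A sketch, with the database and index name taken from the error message and assuming a default local connection:
from pymongo import MongoClient

# Drop the conflicting index once; MongoEngine will recreate it with
# the new definition on the next .objects access.
client = MongoClient()
client.digitile.asset.drop_index(
    "org_1_name_text_description_text_content_text_tag_content_text_remote.source_text"
)

# Alternatively, stop MongoEngine from creating indexes automatically:
# class Asset(Document):
#     meta = {"auto_create_index": False}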
I have spent the last 12 hours scouring the web, and I am completely lost. Please help.
I am trying to pull data from an API endpoint and put it into MongoDB. The data looks like this:
{"_links": {
"self": {
"href": "https://us.api.battle.net/data/sc2/ladder/271302?namespace=prod"
}
},
"league": {
"league_key": {
"league_id": 5,
"season_id": 37,
"queue_id": 201,
"team_type": 0
},
"key": {
"href": "https://us.api.battle.net/data/sc2/league/37/201/0/5?namespace=prod"
}
},
"team": [
{
"id": 6956151645604413000,
"rating": 5321,
"wins": 131,
"losses": 64,
"ties": 0,
"points": 1601,
"longest_win_streak": 15,
"current_win_streak": 4,
"current_rank": 1,
"highest_rank": 10,
"previous_rank": 1,
"join_time_stamp": 1534903699,
"last_played_time_stamp": 1537822019,
"member": [
{
"legacy_link": {
"id": 9964871,
"realm": 1,
"name": "mTOR#378",
"path": "/profile/9964871/1/mTOR"
},
"played_race_count": [
{
"race": "Zerg",
"count": 195
}
],
"character_link": {
"id": 9964871,
"battle_tag": "Hellghost#11903",
"key": {
"href": "https://us.api.battle.net/data/sc2/character/Hellghost-11903/9964871?namespace=prod"
}
}
}
]
},
{
"id": 11611747760398664000, .....
....
Here's the code:
for ladder_number in ladder_array:
    ladder_call_url = ladder_call + slash + str(ladder_number) + eng_locale + access_token
    url = str(ladder_call_url)
    response = requests.get(url)
    print('trying ladder number ' + str(ladder_number))
    print('calling: ' + url)
    if response.status_code == 200:
        print('status: ' + str(response))
        mmr_db.ladders.insert_one(response.json())
I get an error:
OverflowError: MongoDB can only handle up to 8-byte ints
Is this because the data I am trying to load is too large? Are the "id" integers too large?
Oh man, any help would be sincerely appreciated.
_______ EDIT ____________
Edited to include the Traceback:
Traceback (most recent call last):
  File "C:\scripts\mmr_from_ladders.py", line 96, in <module>
    mmr_db.ladders.insert_one(response.json(), bypass_document_validation=True)
  File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\collection.py", line 693, in insert_one
    session=session),
  File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\collection.py", line 607, in _insert
    bypass_doc_val, session)
  File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\collection.py", line 595, in _insert_one
    acknowledged, _insert_command, session)
  File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\mongo_client.py", line 1243, in _retryable_write
    return self._retry_with_session(retryable, func, s, None)
  File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\mongo_client.py", line 1196, in _retry_with_session
    return func(session, sock_info, retryable)
  File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\collection.py", line 590, in _insert_command
    retryable_write=retryable_write)
  File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\pool.py", line 584, in command
    self._raise_connection_failure(error)
  File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\pool.py", line 745, in _raise_connection_failure
    raise error
  File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\pool.py", line 579, in command
    unacknowledged=unacknowledged)
  File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\network.py", line 114, in command
    codec_options, ctx=compression_ctx)
  File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pymongo\message.py", line 679, in _op_msg
    flags, command, identifier, docs, check_keys, opts)
OverflowError: MongoDB can only handle up to 8-byte ints
The BSON spec (MongoDB's native binary serialization format) only supports 32-bit (signed) and 64-bit (signed) integers; 8 bytes is 64 bits.
The maximum value that can be stored in a 64-bit signed int is:
9,223,372,036,854,775,807
In your example you appear to have larger ids, for example:
11,611,747,760,398,664,000
I'm guessing that the app generating this data is using uint64 types (an unsigned 64-bit int can hold values up to 2^64 - 1, roughly twice the signed range).
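A quick check in Python confirms the arithmetic:
>>> 2**63 - 1                          # largest signed 64-bit int BSON can store
9223372036854775807
>>> 11611747760398664000 > 2**63 - 1
True
>>> 11611747760398664000 <= 2**64 - 1  # but it fits in an unsigned 64-bit range
True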
I would start by looking at either of these potential solutions, if possible:
Changing the other side to use int64 (signed) types for the IDs.
Replacing the incoming IDs with ObjectId(), which gives you a 12-byte GUID-like value for your unique IDs.
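If neither option is possible, a workaround (my sketch, not part of the original answer) is to coerce any out-of-range integers to strings before inserting:
MAX_BSON_INT = 2**63 - 1
MIN_BSON_INT = -(2**63)

def sanitize(value):
    """Recursively replace ints that don't fit in a signed 64-bit BSON int."""
    if isinstance(value, bool):
        return value  # bool is a subclass of int; leave it alone
    if isinstance(value, int) and not (MIN_BSON_INT <= value <= MAX_BSON_INT):
        return str(value)
    if isinstance(value, dict):
        return {k: sanitize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value

# Then, inside the loop from the question:
mmr_db.ladders.insert_one(sanitize(response.json()))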