getting pymongo.errors.CursorNotFound: cursor id "..." not found - python

I am reading over 100 million records from MongoDB and creating nodes and relationships in Neo4j.
Whenever I run this, after processing a certain number of records I get pymongo.errors.CursorNotFound: cursor id "..." not found.
Earlier, when I was executing the query without no_cursor_timeout=True, I got the same error at every 64179 records. After looking this up on Stack Overflow I added no_cursor_timeout=True, but I still get the same error, now at record 2691734. How can I get rid of this error? I have also tried defining the batch size.

Per the ticket Belly Buster mentioned, you should try:
manually specifying the session to use with all your operations, and
periodically pinging the server using that session id to keep it alive on the server
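A rough sketch of what that can look like with PyMongo (the connection string, the 5-minute refresh interval, and the process_record helper are illustrative):

import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client.mydb.mycollection

with client.start_session() as session:
    cursor = coll.find({}, no_cursor_timeout=True, session=session)
    last_refresh = time.time()
    try:
        for record in cursor:
            # ping the server roughly every 5 minutes so it keeps the session alive
            if time.time() - last_refresh > 300:
                client.admin.command("refreshSessions", [session.session_id])
                last_refresh = time.time()
            process_record(record)  # your Neo4j node/relationship logic goes here
    finally:
        cursor.close()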

Related

How to be informed that some database information has been changed in Python

I'm working on code written in Python 2.7 that connects to a MariaDB database to read data.
This database receives data from different external sources; my code only reads it.
My service reads the data once at start-up and keeps everything in memory to avoid I/O.
I would like to know if there is some way to create a 'callback' in my code that receives an alert on new updates/inserts, so I can reload my in-memory data from the database every time an external source changes or saves new data.
I have thought of creating an SQL trigger that inserts a "flag" row into a new table, and having my service check that table periodically for the flag.
If the flag is present, reload the data and delete the flag.
But it sounds like a clumsy workaround...
I'm using:
Python 2.7
MariaDB Ver 15.1 Distrib 10.3.24-MariaDB
lib mysql-connector 2.1.6
The better solution for MariaDB is streaming changes with the CDC API: https://mariadb.com/resources/blog/how-to-stream-change-data-through-mariadb-maxscale-using-cdc-api/
The plan you have now, using a flag table, means your client has to poll that table for the presence of the flag. You have to run a query against it at intervals, and keep doing so 24/7. Depending on how quickly your client needs to be notified of a change, you might need to run this kind of polling query very frequently, which puts a burden on the MariaDB server just to answer the polling queries, even when there is no change to report.
The CDC solution is better because the client can just request to be notified the next time a change occurs, then the client waits. It does not put extra load on the MariaDB server, any more than if you had simply added a replica server.
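For reference, the flag-table polling plan from the question boils down to a loop like this (the change_flags table, the reload_cache helper, and the connection details are made up for illustration):

import time
import mysql.connector  # the mysql-connector driver mentioned in the question

conn = mysql.connector.connect(host="localhost", user="reader",
                               password="secret", database="mydb")

while True:
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM change_flags")
    (pending,) = cur.fetchone()
    if pending:
        reload_cache()                        # re-read everything into memory
        cur.execute("DELETE FROM change_flags")
        conn.commit()
    cur.close()
    time.sleep(5)  # every polling interval costs the server a query, 24/7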

Slow query msg using python motor client and mongodb

I have a MongoDB pod with millions of records in k8s, and I am using AsyncIOMotorClient to iterate over the cursor with a batch size of 1000. However, after a few iterations the fetching gets stuck, and I can see the "Slow query" message in the MongoDB logs.
The iteration looks like this after getting the collection:
cursor = collection.aggregate_raw_batches(pipeline=pipeline, batchSize=c['QUERY_BATCH_SIZE'])
async for batch in cursor:
...
I even tried to paginate the fetched records by adding $skip and $limit to the pipeline, but the same result appears again.
I also tried to track resource usage such as CPU and memory in Kubernetes, but everything looks fine except MongoDB itself.
Example MongoDB log entry:
{"t":{"$date":"2022-03-24T13:02:41.042+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn1214224","msg":"Slow query","attr":{"type":"command","ns":"product.feed_1648113467","command":{"getMore":4946197083994343333,"collection":"feed_1648113467","batchSize":1000,"lsid":{"id":{"$uuid":"bec21ebb-f305-4438-8e1d-16badfc22bd3"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1648126959,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"product"},"originatingCommand":{"aggregate":"feed_1648113467","pipeline":[{"$match":{"_disapproved":{"$ne":true},"_gpla_feed_name":"detske-zbozi/kocarky","_delivery.direct":true,"$or":[{"_restricted":{"$ne":true}},{"_restriction_exception_for_google":true}]}},{"$limit":1000000},{"$project":{"_id":0,"g:description":1,"g:id":1,"title":"$g:name","g:price":{"$concat":[{"$toString":"$g:price_rounded"}," CZK"]},"g:brand":"$g:producer_name","g:image_link":"$g:image_url","g:additional_image_link":1,"link":"$g:url","g:availability":"in stock","g:condition":"new","g:material":1,"g:color":1,"g:gender":1,"g:size":1,"g:product_type":1,"g:google_product_category":1,"g:energy_efficiency_class":1,"g:unit_pricing_measure":1,"g:unit_pricing_base_measure":1,"g:gtin":"$g:ean","g:adult":"$_restricted","g:product_detail":1}}],"cursor":{"batchSize":0},"lsid":{"id":{"$uuid":"bec21ebb-f305-4438-8e1d-16badfc22bd3"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1648126959,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"product","$readPreference":{"mode":"primaryPreferred"}},"planSummary":"IXSCAN { _disapproved: 1, _gpla_feed_name: 1 }","cursorid":4946197083994343333,"keysExamined":861,"docsExamined":859,"cursorExhausted":true,"numYields":8,"nreturned":960,"reslen":642671,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":9}},"Global":{"acquireCount":{"r":9}},"Database":{"acquireCount":{"r":9}},"Collection":{"acquireCount":{"r":9}},"Mutex":{"acquireCount":{"r":1}}},"storage":{"data":{"bytesRead":21129204,"timeReadingMicros":112254}},"protocol":"op_msg","durationMillis":138}}
Thanks for any help

MongoDB service goes down while inserting data

I am running MongoDB 4.2.8 on Windows Server 2019 Datacenter and I want to simply insert a lot of data into the database, using Python and PyMongo.
It worked fine at first, but after inserting a certain number of records (~2 million), the MongoDB service went down. I deleted the data and ran my program again, and the same thing happened.
I couldn't find out the cause of this problem.
Here are the error messages that I copied from the log file.
EDIT: The first line of the error messages is
2020-07-17T18:00:51.614+0800 E STORAGE [conn38247] WiredTiger error (0) [1594980051:613746][17560:140711179997792], file:collection-8-5621763546278059960.wt, WT_CURSOR.search: __wt_block_read_off, 283: collection-8-5621763546278059960.wt: read checksum error for 32768B block at offset 4277100544: block header checksum of 0xf5304876 doesn't match expected checksum of 0x1ea56329 Raw: [1594980051:613746]
The read checksum error indicates that the storage engine read a block whose checksum did not match the block's data.
This implies that the data on disk was changed after being written by the mongod process and is no longer self-consistent.
Make sure that you aren't running two different mongod processes with the same data directory, and that no other processes are trying to access these files.

how do you tell if arcgis request is processed correctly?

My company has an ArcGIS server, and I've been trying to geocode some addresses using the Python requests package.
However, as long as the input format is correct, response.status_code is always 200, meaning everything is OK, even if the server didn't process the request properly.
(For example, if the batch size limit is 1000 records and I send a JSON input with 2000 records, it still returns status code 200, but half of the records get ignored.)
I'm just wondering if there is a way for me to know whether the server processed the request properly or not.
A great place to start is the server logs. They are located in your ArcGIS Server Manager (https://gisserver.domain.com:6443/arcgis/manager). I would assume it would log some kind of warning/info there if records were ignored, but since it is not technically an error, no error messages would be returned anywhere.
I doubt you'd want to do this, but if you want to raise your limit you can follow this technical article on how to do that: https://support.esri.com/en/technical-article/000012383
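On the client side you can also compare the response body against what you sent, instead of relying on the status code. A rough sketch, assuming the standard geocodeAddresses REST endpoint; the URL, the addresses_records variable, and the exact attribute names are illustrative and may differ for your service:

import json
import requests

# addresses_records: the list of {"attributes": {"OBJECTID": ..., "SingleLine": ...}} you submit
payload = {"addresses": json.dumps({"records": addresses_records}), "f": "json"}
resp = requests.post(geocode_url, data=payload)
resp.raise_for_status()               # only catches transport/HTTP-level failures
body = resp.json()

# ArcGIS reports application errors inside the body, still with HTTP 200
if "error" in body:
    raise RuntimeError(body["error"])

# compare the ids that came back against the ids that were sent
sent = {rec["attributes"]["OBJECTID"] for rec in addresses_records}
returned = {loc["attributes"]["ResultID"] for loc in body.get("locations", [])}
missing = sent - returned
if missing:
    print("records not geocoded:", sorted(missing))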

MySQL select query not working with limit, offset parameters

I am running MySQL 5.1 on my Windows Vista installation. The table in question uses MyISAM and has about 10 million rows. It is used to store text messages posted by users on a website.
I am trying to run the following query on it:
query = "select id, text from messages order by id limit %d offset %d" %(limit, offset)
where limit is set to a fixed value (in this case 20000) and offset is incremented in steps of 20000.
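Spelled out, the paging loop looks roughly like this (sketched with the MySQLdb driver and parameter binding instead of %-formatting; the connection details are placeholders):

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="mydb")
cur = conn.cursor()

limit, offset = 20000, 0
while True:
    # same query as above, but with the values bound by the driver
    cur.execute("SELECT id, text FROM messages ORDER BY id LIMIT %s OFFSET %s",
                (limit, offset))
    rows = cur.fetchall()
    if not rows:
        break
    # ... process this 20000-row page ...
    offset += limit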
This query goes into an infinite loop when offset = 240000, and only at that particular value.
I isolated this query into a script and ran it, and got the same results. I then tried to run the last query (with offset = 240000) directly, and it worked!
I then tried executing the same queries directly in a MySQL client to make sure that the error was not in the Python DB accessor module. All the queries returned results, except the one with offset = 240000.
I then looked at the MySQL server logs and saw the following:
[ERROR] C:\Program Files\MySQL\MySQL Server 5.1\bin\mysqld: Sort aborted
This probably means that when I stopped the Python process (out of frustration), the mysqld process was 'sorting' something. When I looked at the my.ini file, I saw a lot of MAX_* options. I am currently experimenting with these, but just throwing it out there in the meantime.
Any help appreciated!
Have you checked the table with myisamchk?
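Something along these lines, run while the server is stopped or the table is flushed and locked; the data directory path is illustrative:

myisamchk --check "C:\Program Files\MySQL\MySQL Server 5.1\data\mydb\messages.MYI"

If it reports corruption, myisamchk --recover on the same file can usually rebuild the table and its indexes.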
