I am running MongoDB 4.2.8 on Windows Server 2019 Datacenter and I want to simply insert a lot of data into the database, using Python and PyMongo.
It worked fine at first, but after inserting a certain number of records (~2 million), the MongoDB service went down. I deleted the data and ran my program again, but the same thing happened.
I couldn't find out the cause of this problem.
Here are the error messages that I copied from the log file.
EDIT: The first line of the error messages is
2020-07-17T18:00:51.614+0800 E STORAGE [conn38247] WiredTiger error (0) [1594980051:613746][17560:140711179997792], file:collection-8-5621763546278059960.wt, WT_CURSOR.search: __wt_block_read_off, 283: collection-8-5621763546278059960.wt: read checksum error for 32768B block at offset 4277100544: block header checksum of 0xf5304876 doesn't match expected checksum of 0x1ea56329 Raw: [1594980051:613746]
The error wt: read checksum error indicates that the storage engine read a block whose checksum did not match the data.
This implies that the data on disk was changed after being written by the mongod process and is no longer self-consistent.
Make sure that you aren't trying to run 2 different mongod processes with the same data directory, and that there are no other processes trying to access these files.
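Once you have ruled out a second mongod or another process touching the data directory, it can be worth confirming whether the collection is still readable end to end. Below is a minimal sketch using MongoDB's validate command from PyMongo; the connection string, database and collection names are placeholders, not taken from your setup.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # adjust to your deployment
db = client["mydb"]                                # placeholder database name

# validate scans the collection's data and indexes and reports any corruption it finds
result = db.command("validate", "mycollection", full=True)
print(result.get("valid"), result.get("errors"))
If validate reports the collection as invalid, the usual next step is to restore from a backup or resync from a healthy replica set member rather than trying to repair in place.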
My company has an ArcGIS server, and I've been trying to geocode some addresses using the Python requests package.
However, as long as the input format is correct, response.status_code is always 200, meaning everything is OK, even if the server didn't process the request properly.
(For example, if the batch size limit is 1000 records and I send a JSON input with 2000 records, it still returns status_code 200, but half of the records get ignored.)
Just wondering if there is a way for me to know whether the server processed the request properly or not?
A great spot to check, to start with, is the server logs. They are located in your ArcGIS Server Manager (https://gisserver.domain.com:6443/arcgis/manager). I would assume some type of warning/info gets logged there if records were ignored, but it is not technically an error, so no error messages would be returned anywhere.
I doubt you'd want to do this, but if you want to raise your limit you can follow this technical article on how to do that: https://support.esri.com/en/technical-article/000012383
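On the client side, one sanity check is to compare what you sent against what came back, since the REST API reports service-level problems in the JSON body rather than in the HTTP status. The sketch below assumes the standard geocodeAddresses request/response shape (OBJECTID in, ResultID out); the service URL and addresses are placeholders.
import json
import requests

url = ("https://gisserver.domain.com:6443/arcgis/rest/services/"
       "MyLocator/GeocodeServer/geocodeAddresses")  # placeholder locator service
addresses = ["380 New York St, Redlands, CA", "1600 Pennsylvania Ave NW, Washington, DC"]
records = [{"attributes": {"OBJECTID": i, "SingleLine": a}} for i, a in enumerate(addresses, 1)]

resp = requests.post(url, data={"addresses": json.dumps({"records": records}), "f": "json"})
resp.raise_for_status()              # only catches transport/HTTP-level failures
body = resp.json()
if "error" in body:                  # service-level errors come back in the body with status 200
    raise RuntimeError(body["error"])

# any input OBJECTID with no matching ResultID in the response was dropped
returned_ids = {loc["attributes"].get("ResultID") for loc in body.get("locations", [])}
missing = {r["attributes"]["OBJECTID"] for r in records} - returned_ids
if missing:
    print(f"{len(missing)} records were not geocoded: {sorted(missing)}")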
I'm currently building a pipeline that reads data from MongoDB every time a new document gets inserted and sends it to an external data source after some preprocessing. The preprocessing and sending parts work well the way I designed them.
The problem, however, is that I can't read the data from MongoDB. I'm trying to build a trigger that reads data from MongoDB when a certain collection gets updated and then sends it to Python. I'm not considering polling MongoDB, since it's too resource-intensive.
I've found the library mongotriggers (https://github.com/drorasaf/mongotriggers/) and am now taking a look at it.
In summary, how can I build a trigger that sends data from MongoDB to Python when a new document gets inserted into a specific collection?
Any comment or feedback would be appreciated.
Thanks in advance.
Best
Gee
In MongoDB v3.6+, you can now use MongoDB Change Streams. Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them.
For example, to listen to the change stream when a new document gets inserted:
import logging
import pymongo
from pymongo import MongoClient

db = MongoClient()["mydb"]  # adjust the connection and database name to your deployment

try:
    with db.collection.watch([{'$match': {'operationType': 'insert'}}]) as stream:
        for insert_change in stream:
            # Do something with each newly inserted document
            print(insert_change)
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    logging.error('...')
pymongo.collection.Collection.watch() is available from PyMongo 3.6.0+.
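If the listening process can die and restart, it may also be worth persisting the stream's resume token so the watcher can pick up where it left off instead of missing inserts. A sketch building on the snippet above (db is the same handle; process() and the token storage are placeholders; ChangeStream.resume_token requires PyMongo 3.9+):
resume_token = None  # load a previously saved token here, if any
try:
    with db.collection.watch([{'$match': {'operationType': 'insert'}}],
                             resume_after=resume_token) as stream:
        for insert_change in stream:
            process(insert_change['fullDocument'])  # placeholder preprocessing/forwarding step
            resume_token = stream.resume_token      # persist this somewhere durable
except pymongo.errors.PyMongoError:
    logging.error('change stream failed; restart with the saved resume token')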
We use SQLite databases to store the results coming out of data analysis pipelines. The database files sit on our high-performance scale-out filestore, which is connected to the same switch as our cluster nodes to ensure a good connection.
However, recently I've been having trouble querying the databases via Python, particularly when many jobs are trying to query the database at once. I get error messages such as
sqlite3.DatabaseError: malformed database schema (primary_digest_joint) - index primary_digest_joint already exists
or
sqlite3.DatabaseError: database disk image is malformed
Note that these jobs are only reading the database, not writing to it (nothing is writing to the database), which I thought should be fine with SQLite.
Generally, if I stop the pipeline, I can access the database fine and it appears to be perfectly intact. If I restart the pipeline, a number of jobs will complete successfully before I get the error again.
Any idea why this is happening or what can be done to stop it? Is there any chance that the database is actually being damaged, even though it seems to be fine and running a PRAGMA integrity_check doesn't suggest anything is wrong?
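Since nothing writes while these jobs run, one thing worth trying is opening the file read-only through SQLite's URI syntax, so readers don't create journal or lock files on the network filesystem. A minimal sketch (the path is a placeholder; immutable=1 is only safe if the file genuinely cannot change while the jobs are running):
import sqlite3

# mode=ro opens the database read-only; immutable=1 additionally tells SQLite the
# file cannot change, so it skips locking and change detection entirely.
uri = "file:/filestore/results/analysis.db?mode=ro&immutable=1"  # placeholder path
conn = sqlite3.connect(uri, uri=True)
print(conn.execute("PRAGMA integrity_check;").fetchone())
conn.close()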
I have a Python script which loops through log files in a directory and uses Oracle SQL*Loader to load them into the Oracle database. The script works properly, and so does SQL*Loader.
But after loading around 200k records, the load fails with this error:
Record 11457: Rejected - Error on table USAGE_DATA.
ORA-12571: TNS:packet writer failure
SQL*Loader-926: OCI error while uldlfca:OCIDirPathColArrayLoadStream for table USAGE_DATA
SQL*Loader-2026: the load was aborted because SQL Loader cannot continue.
Specify SKIP=11000 when continuing the load.
SQL*Loader-925: Error while uldlgs: OCIStmtExecute (ptc_hp)
ORA-03114: not connected to ORACLE
SQL*Loader-925: Error while uldlgs: OCIStmtFetch (ptc_hp)
ORA-24338: statement handle not executed
I am not sure why this is happening. I have checked the data files corresponding to the table's tablespace, and they have autoextend set to true. What else could be the reason?
In the sqlldr command I have rows=1000 and direct=true, so it commits after every 1000 records loaded. I have tested by varying this number and still get the same error.
sqlldr arisdw/arisdwabc01#APPDEV24 control=Temp_Sadish.ctl direct=true rows=1000 data=C:/_dev/logs/sample/data/mydata1.csv;
Post your controlfile contents. What version of Oracle are you using?
The ORA-24338 error is the one I would focus on. Are you doing any sort of data transformation in your job? Calling functions or similar?
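It may also help to see exactly which file is being loaded when the connection drops. Below is a minimal sketch of a Python driver loop that records SQL*Loader's exit status per file; the connect string, control file, and data path are copied from the command line above, and everything else is a placeholder rather than your actual script.
import glob
import subprocess

for data_file in sorted(glob.glob("C:/_dev/logs/sample/data/*.csv")):
    cmd = [
        "sqlldr", "arisdw/arisdwabc01#APPDEV24",
        "control=Temp_Sadish.ctl", "direct=true", "rows=1000",
        f"data={data_file}", f"log={data_file}.log",
    ]
    # a non-zero exit code means this particular file failed; the matching
    # .log file will contain the ORA-/SQL*Loader messages for it
    if subprocess.run(cmd).returncode != 0:
        print(f"load failed for {data_file}")
        break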
I am running MySQL 5.1 on my Windows Vista installation. The table in question uses MyISAM and has about 10 million rows. It is used to store text messages posted by users on a website.
I am trying to run the following query on it,
query = "select id, text from messages order by id limit %d offset %d" %(limit, offset)
where limit is set to a fixed value (in this case 20000) and offset is incremented in steps of 20000.
This query goes into an infinite loop when offset = 240000, and only at that particular value.
I isolated this query into a script and ran it, and got the same result. I then tried to run that last query (with offset = 240000) directly, and it worked!
I then tried executing the same queries directly in a mysql client to make sure that the error was not in the python DB accessor module. All the queries returned results, except the one with offset = 240000.
I then looked at the mysql server logs and saw the following.
[ERROR] C:\Program Files\MySQL\MySQL Server 5.1\bin\mysqld: Sort aborted
This probably means that when I stopped the Python process (out of frustration), the mysqld process was 'sorting' something. When I looked at the my.ini file, I saw a lot of MAX_* options. I am currently experimenting with these, but am just throwing it out there in the meantime.
Any help appreciated!
Have you checked the table with myisamchk?
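In addition to myisamchk (run it while the server is stopped or the table is otherwise not in use), you can check the table from inside MySQL without shutting anything down. A sketch using MySQLdb, assuming that is the Python DB accessor module mentioned above; the connection details are placeholders:
import MySQLdb

conn = MySQLdb.connect(host="localhost", user="root", passwd="secret", db="mydb")
cur = conn.cursor()
# CHECK TABLE returns (Table, Op, Msg_type, Msg_text) rows; anything other than
# an "OK"/"status" row suggests the MyISAM table or its indexes need repair
cur.execute("CHECK TABLE messages EXTENDED")
for row in cur.fetchall():
    print(row)
conn.close()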