We use SQLite databases to store the results coming out of data analysis pipelines. The database files sit on our high-performance scale-out filestore, which is connected to the same switch as our cluster nodes to ensure a good connection.
However, recently I've been having trouble querying the database via Python, particularly when many jobs are trying to query it at once. I get error messages such as
sqlite3.DatabaseError: malformed database schema (primary_digest_joint) - index primary_digest_joint already exists
or
sqlite3.DatabaseError: database disk image is malformed
Note that these jobs are only reading the database, not writing to it (nothing is writing to the database), which I thought should be fine with SQLite.
Generally, if I stop the pipeline, I can access the database fine and it appears to be perfectly intact. If I restart the pipeline, a number of jobs will complete successfully before I get the error again.
Any idea why this is happening or what can be done to stop it? Is there any chance that the database is actually being damaged, even though it seems to be fine and running a PRAGMA integrity_check doesn't suggest anything is wrong?
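For reference, the check I'm running is just the standard one via Python's sqlite3 module (the path below is a placeholder for the real file on the filestore):

import sqlite3

# Placeholder path; the real file lives on the network filestore.
conn = sqlite3.connect("/path/to/results.db")
try:
    # Returns [('ok',)] when SQLite finds no corruption.
    print(conn.execute("PRAGMA integrity_check;").fetchall())
finally:
    conn.close()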
I am using an embedded MonetDB database in Python via MonetDBe.
I can see how to create a new connection with the :memory: setting, but I can't see a way to persist the created database and tables for use later.
Once an in-memory session ends, all data is lost.
So I have two questions:
Is there a way to persist an in-memory DB to local disk?
and
Once an in-memory DB has been saved to local disk, is it possible to load it back into memory at a later point to allow fast data analytics? At the moment it looks like if I create a connection from a file location, my queries are reading from local disk rather than from memory.
It is a little bit hidden away admittedly, but you can check out the following code snippet from the movies.py example in the monetdbe-examples repository:
import monetdbe

database = '/tmp/movies.mdbe'

with monetdbe.connect(database) as conn:
    conn.set_autocommit(True)
    conn.execute(
        """CREATE TABLE Movies
        (id SERIAL, title TEXT NOT NULL, "year" INTEGER NOT NULL)""")
So in this example the single argument to connect is just the desired path to your database directory. This is how you can (re)start a database that stores its data in a persistent way on a file system.
Notice that I have intentionally removed the Python lines from the example in the actual repo that start with the comment # Removes the database if it already exists, just to make the example in this answer persistent.
I haven't run the code, but I expect that if you run it twice consecutively, the second run will return a database error on the execute statement, as the Movies table should already be there.
And just to be sure, don't use the /tmp directory if you want your data to persist between restarts of your computer.
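And once that directory exists, a later session can simply connect to the same path and query the data again. A minimal sketch, assuming MonetDBe's usual DB-API style cursor interface (cursor/execute/fetchall):

import monetdbe

# Reconnect to the directory created by the earlier run.
database = '/tmp/movies.mdbe'

with monetdbe.connect(database) as conn:
    # Assuming the DB-API cursor interface is available here.
    cur = conn.cursor()
    cur.execute('SELECT id, title, "year" FROM Movies')
    print(cur.fetchall())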
I am reading voltages from a device continuously and want to log the data to an SQLite database in real time. Is it possible to add and read the data from the database at the same time so that I can plot it? I am using matplotlib in Python.
There's no reason why you couldn't. If the database is locked by a write, the read(s) will block for that time.
You might want to try the SQLite WAL mode for better concurrent performance by running
PRAGMA journal_mode=WAL;
as the first command of your SQLite connection.
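As a rough sketch of how the two sides could look with the standard sqlite3 module (the file name, table and column names below are just placeholders for illustration):

import sqlite3
import time
import matplotlib.pyplot as plt

DB = "voltages.db"  # placeholder file name

# --- logging side ---
writer = sqlite3.connect(DB)
writer.execute("PRAGMA journal_mode=WAL;")  # first command on this connection
writer.execute("CREATE TABLE IF NOT EXISTS readings (ts REAL, volts REAL)")

def log_reading(volts):
    # Call this each time a new voltage arrives from the device.
    writer.execute("INSERT INTO readings VALUES (?, ?)", (time.time(), volts))
    writer.commit()

# --- plotting side (can live in a separate process) ---
def plot_latest(n=1000):
    reader = sqlite3.connect(DB)
    rows = reader.execute(
        "SELECT ts, volts FROM readings ORDER BY ts DESC LIMIT ?", (n,)).fetchall()
    reader.close()
    if rows:
        ts, volts = zip(*reversed(rows))
        plt.plot(ts, volts)
        plt.show()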
So I have a Google Sheet that maintains a lot of data. I also have a MySQL DB with a huge chunk of data. There is a vital piece of information in the Sheet that is also present in the DB. Both need to be in sync. The information always enters the Sheet first. I had a Python script with MySQL queries to update my database separately.
Now the workflow has changed. Data will enter the Sheet, and whenever that happens the database has to be updated automatically.
After some research, I found that using the onEdit function of Google Apps Script (I learned from here), I could pick up when the file has changed.
The next step is to fetch the data from the relevant cell, which I can do using this.
Now I need to connect to the DB and send some queries. This is where I am stuck.
Approach 1:
Have a Python web app running live and send the data via UrlFetchApp; a rough sketch of the receiving end is included after the question. This I have yet to try.
Approach 2:
Connect to MySQL remotely through Apps Script. But I am not sure this is possible after 2-3 hours of reading the docs.
So this is my scenario. Any viable solution you can think of or a better approach?
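For Approach 1, the receiving end I have in mind is roughly the following (endpoint name, credentials and table are placeholders, and I haven't tested it yet):

from flask import Flask, request
import mysql.connector  # any MySQL driver would do

app = Flask(__name__)

@app.route("/sheet-update", methods=["POST"])  # called from Apps Script via UrlFetchApp
def sheet_update():
    payload = request.get_json()  # e.g. {"id": ..., "value": ...} sent by the onEdit trigger
    db = mysql.connector.connect(user="user", password="password", database="mydb")
    try:
        cur = db.cursor()
        cur.execute("UPDATE my_table SET value = %s WHERE id = %s",
                    (payload["value"], payload["id"]))
        db.commit()
    finally:
        db.close()
    return "ok"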
Connect directly to MySQL. You likely missed reading this part: https://developers.google.com/apps-script/guides/jdbc
Using JDBC within Apps Script will work if you have the time to build this yourself.
If you don't want to roll your own solution, check out SeekWell. It allows you to connect to databases and write SQL queries directly in Sheets. You can create a “Run Sheet” that will run multiple queries at once and schedule those queries to be run without you even opening the Sheet.
Disclaimer: I made this.
I am trying to load a huge file, around 5900 lines of SQL creates, inserts and alter tables, into a MySQL database with Flask-SQLAlchemy.
I am parsing the file and separating the commands by splitting on the ; character.
This works as expected.
Here is what I have so far.
For the SQL query execution I am using the Engine API of SQLAlchemy.
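In simplified form it looks like this (the connection string is a placeholder; in the Flask app the engine comes from Flask-SQLAlchemy's db.engine):

from sqlalchemy import create_engine, text

# Placeholder connection string; the real app uses db.engine from Flask-SQLAlchemy.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

# Split the dump into individual statements on ';'.
with open("dump.sql") as f:
    statements = [s.strip() for s in f.read().split(";") if s.strip()]

# Execute everything in a single transaction.
with engine.begin() as conn:
    for stmt in statements:
        conn.execute(text(stmt))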
When I execute the queries, it seems that the database quits its job after about 5400 lines of the file, but the application logs the full execution up to line 5900 without error.
When I do the creates and inserts separately it also works, so is there a way to split the batch execution, or to use pooling or something like that, so that the database does not get stuck?
Thank you!
I have a Python script which loops through log files in a directory and uses Oracle SQL*Loader to load the log files into an Oracle database. The script works properly, and so does SQL*Loader.
But after loading around 200k records, the loading fails with this exception:
Record 11457: Rejected - Error on table USAGE_DATA.
ORA-12571: TNS:packet writer failure
SQL*Loader-926: OCI error while uldlfca:OCIDirPathColArrayLoadStream for table USAGE_DATA
SQL*Loader-2026: the load was aborted because SQL Loader cannot continue.
Specify SKIP=11000 when continuing the load.
SQL*Loader-925: Error while uldlgs: OCIStmtExecute (ptc_hp)
ORA-03114: not connected to ORACLE
SQL*Loader-925: Error while uldlgs: OCIStmtFetch (ptc_hp)
ORA-24338: statement handle not executed
I am not sure why this is happening. I have checked the data files corresponding to the table's tablespace and they have autoextend set to true. What else could be the reason?
In the sqlldr command I have rows=1000 and direct=true, so it commits for every 1000 records loaded. I have tested by varying this number and still get the same error.
sqlldr arisdw/arisdwabc01#APPDEV24 control=Temp_Sadish.ctl direct=true rows=1000 data=C:/_dev/logs/sample/data/mydata1.csv;
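The Python side is essentially just this loop over the directory (the paths and the connect string below are placeholders):

import glob
import subprocess

# Hand each log file in the directory to SQL*Loader.
for data_file in glob.glob("C:/_dev/logs/sample/data/*.csv"):
    subprocess.call(
        "sqlldr user/password@DBALIAS control=Temp_Sadish.ctl "
        "direct=true rows=1000 data={}".format(data_file),
        shell=True)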
Post your controlfile contents. What version of Oracle are you using?
The ORA-24338 error is the one I would focus on. Are you doing any sort of data transformation in your job? Calling functions or similar?