Oracle SQL*Loader error while loading really huge log files - python

I have a Python script which loops through log files in a directory and uses Oracle SQL*Loader to load them into the Oracle database. The script works properly, and so does SQL*Loader at first.
But after loading around 200k records, the load fails with this exception:
Record 11457: Rejected - Error on table USAGE_DATA.
ORA-12571: TNS:packet writer failure
SQL*Loader-926: OCI error while uldlfca:OCIDirPathColArrayLoadStream for table USAGE_DATA
SQL*Loader-2026: the load was aborted because SQL Loader cannot continue.
Specify SKIP=11000 when continuing the load.
SQL*Loader-925: Error while uldlgs: OCIStmtExecute (ptc_hp)
ORA-03114: not connected to ORACLE
SQL*Loader-925: Error while uldlgs: OCIStmtFetch (ptc_hp)
ORA-24338: statement handle not executed
I am not sure why this is happening. I have checked the data files corresponding to the table's tablespace and they have autoextend set to true. What else could be the reason?
in the "sqlldr" command i have rows=1000 and Direct=True, so it commits for every 1000 records loaded, i have tested by varying this number, still getting same error.
sqlldr arisdw/arisdwabc01#APPDEV24 control=Temp_Sadish.ctl direct=true rows=1000 data=C:/_dev/logs/sample/data/mydata1.csv;
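For context, the Python side of the loop is essentially a shell-out to sqlldr for each file. A rough sketch (the credentials, control file, and paths are placeholders loosely based on the command above, not the actual script):

import glob
import subprocess

# Sketch of the loading loop; the connect string, control file and data
# directory below are placeholders, not the real values.
for data_file in glob.glob(r"C:/_dev/logs/sample/data/*.csv"):
    cmd = [
        "sqlldr",
        "user/password@APPDEV24",      # placeholder credentials
        "control=Temp_Sadish.ctl",
        "direct=true",
        "rows=1000",
        "data=" + data_file,
    ]
    rc = subprocess.call(cmd)
    if rc != 0:
        print("sqlldr exited with code", rc, "for", data_file)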

Post your controlfile contents. What version of Oracle are you using?
The ORA-24338 error is the one I would focus on. Are you doing any sort of data transformation in your job? Calling functions or similar?

Related

How to be informed that some database information has been changed in Python

I'm working on code written in Python 2.7 that connects to a MariaDB database to read data.
This database receives data from different external sources; my code only reads it.
My service reads the data once at the beginning and keeps everything in memory to avoid I/O.
I would like to know if there is some way to register a callback in my code so I can receive an alert on new updates/inserts and reload my in-memory data from the database every time an external source changes or saves new data.
I have thought of creating a SQL trigger that inserts a "flag" row into a new table, and having my service check that table periodically for the flag.
If so, reload the data and delete the flag.
But it sounds like the wrong kind of workaround...
I'm using:
Python 2.7
MariaDB Ver 15.1 Distrib 10.3.24-MariaDB
lib mysql-connector 2.1.6
The better solution for MariaDB is streaming with the CDC API: https://mariadb.com/resources/blog/how-to-stream-change-data-through-mariadb-maxscale-using-cdc-api/
The plan you have now, with using a flag table, means your client has to poll the flag table for presence of the flag. You have to run a query against that table at intervals, and keep doing it 24/7. Depending on how quickly your client needs to be notified of a change, you might need to run this type of polling query very frequently, which puts a burden on the MariaDB server just to respond to the polling queries, even when there is no change to report.
The CDC solution is better because the client can just request to be notified the next time a change occurs, then the client waits. It does not put extra load on the MariaDB server, any more than if you had simply added a replica server.
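For comparison, the flag-table polling described in the question boils down to a loop like the sketch below (the table, columns, and connection details are made up), and every client would have to keep it running around the clock:

import time
import mysql.connector  # mysql-connector, as listed in the question

# autocommit=True so each poll sees freshly committed rows rather than a stale snapshot.
conn = mysql.connector.connect(host="localhost", user="reader", password="secret",
                               database="mydb", autocommit=True)

def reload_cache():
    # Placeholder for re-reading the data the service keeps in memory.
    pass

def poll_for_changes(interval_seconds=5):
    """Hypothetical polling loop against a flag table populated by a trigger."""
    cur = conn.cursor()
    while True:
        cur.execute("SELECT COUNT(*) FROM change_flags")   # flag table name is made up
        (pending,) = cur.fetchone()
        if pending:
            reload_cache()
            cur.execute("DELETE FROM change_flags")
        time.sleep(interval_seconds)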

pyodbc An internal error occurred that prevents further processing of this command: 'Object reference not set to an instance of an object.'

I have a Python script which uses pyodbc to connect to a Microsoft SQL Server (a SQL pool, in fact) and executes a COPY INTO statement on a daily basis. It had been working fine for months, but last week it suddenly started crashing with the above-mentioned error. I haven't made any changes, and the statement still executes fine if I run it directly on the server, just not from code.
The purpose of the script is to perform a COPY INTO operation from an Azure Data Lake file which gets loaded every day into the SQL Server. Again, as I said, I have already tried executing the command on the server and loading previous versions of the file, but the error keeps appearing and it is not very descriptive.
This is the piece of code that throws the error:
If I try to change the statement to use a table that does not exist or an invalid file name, the error changes appropriately (to "invalid object name", for example), so the connection is okay imo. I have also tried executing the statement without passing variables into the string.
Thanks in advance.
Just in case someone faces a similar issue: it was resolved after setting the pyodbc connection's autocommit to True.
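A minimal sketch of that fix, with a placeholder connection string and COPY INTO statement (the original code isn't shown above, so none of these names are the real ones):

import pyodbc

# Placeholder server, database, credentials, table and file path; the relevant
# part of the fix is autocommit=True on the connection.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;DATABASE=mypool;"
    "UID=myuser;PWD=mypassword",
    autocommit=True,
)
cursor = conn.cursor()
cursor.execute("""
    COPY INTO dbo.daily_load
    FROM 'https://myaccount.dfs.core.windows.net/container/path/file.csv'
    WITH (FILE_TYPE = 'CSV', FIRSTROW = 2)
""")
conn.close()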

MongoDB service goes down while inserting data

I am running MongoDB 4.2.8 on Windows Server 2019 Datacenter and I want to simply insert a lot of data into the database, using Python and PyMongo.
It worked fine at first, but after inserting a certain number of records (~2 million), the MongoDB service went down. I deleted the data and ran my program again, and the same thing happened.
I couldn't find out the cause of this problem.
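For reference, a minimal sketch of the kind of bulk insert described (the connection string, database, collection, and batch size are placeholders; the actual code isn't shown):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder connection string
collection = client["mydb"]["records"]              # placeholder database/collection

def insert_records(records, batch_size=10000):
    """Insert an iterable of dicts in batches via insert_many."""
    batch = []
    for doc in records:
        batch.append(doc)
        if len(batch) >= batch_size:
            collection.insert_many(batch, ordered=False)
            batch = []
    if batch:
        collection.insert_many(batch, ordered=False)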
Here are the error messages that I copied from the log file.
EDIT: The first line of the error messages is
2020-07-17T18:00:51.614+0800 E STORAGE [conn38247] WiredTiger error (0) [1594980051:613746][17560:140711179997792], file:collection-8-5621763546278059960.wt, WT_CURSOR.search: __wt_block_read_off, 283: collection-8-5621763546278059960.wt: read checksum error for 32768B block at offset 4277100544: block header checksum of 0xf5304876 doesn't match expected checksum of 0x1ea56329 Raw: [1594980051:613746]
The WiredTiger "read checksum error" indicates that the storage engine read a block whose checksum did not match the data.
This implies that the data on disk was changed after being written by the mongod process and is no longer self-consistent.
Make sure that you aren't trying to run 2 different mongod processes with the same data directory, and that there are no other processes trying to access these files.

Temporary SQLite malformed pragma / disk image

We use SQLite databases to store the results coming out of data analysis pipelines. The database files sit on our high performance scale-out filestore which is connected to the same switch as our cluster nodes to ensure a good connection.
However, recently I've been having trouble querying the database via Python, particularly when many jobs are trying to query the database at once. I get error messages such as
sqlite3.DatabaseError: malformed database schema (primary_digest_joint) - index primary_digest_joint already exists
or
sqlite3.DatabaseError: database disk image is malformed
Note that these jobs are only reading the database, not writing to it (nothing is writing to the database), which I thought should be fine with SQLite.
Generally if I stop the pipeline, I can access the database fine and it appears to be perfectly intact. If I restart the pipeline again a number of jobs will successfully complete before I get the error again.
Any idea why this is happening or what can be done to stop it? Is there any chance that the database is actually being damaged, even though it seems to be fine and running a PRAGMA integrity_check doesn't suggest anything is wrong?
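For reference, each job's read path is plain sqlite3 usage like the sketch below. The read-only URI flag is shown as one way to make the read-only intent explicit; whether the pipeline actually opens the files this way is an assumption, and the path and table name are placeholders:

import sqlite3

DB_PATH = "/filestore/results/pipeline.db"   # placeholder path on the scale-out filestore

# Opening via a URI with mode=ro means this connection can never take a write lock.
conn = sqlite3.connect("file:{}?mode=ro".format(DB_PATH), uri=True)

cur = conn.execute("SELECT COUNT(*) FROM results")   # placeholder table name
print(cur.fetchone()[0])
conn.close()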

SQL queries through PYODBC fail silently on one machine, works on another

I am working on a program to automate parsing data from XML files and storing it into several databases. (Specifically the USGS realtime water quality service, if anyone's interested, at http://waterservices.usgs.gov/rest/WaterML-Interim-REST-Service.html) It's written in Python 2.5.1 using LXML and PYODBC. The databases are in Microsoft Access 2000.
The connection function is as follows:
import pyodbc

def get_AccessConnection(db):
    connString = 'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=' + db
    cnxn = pyodbc.connect(connString, autocommit=False)
    cursor = cnxn.cursor()
    return cnxn, cursor
where db is the filepath to the database.
The program:
a) opens the connection to the database
b) parses 2 to 8 XML files for that database and builds the values from them into a series of records to insert into the database (using a nested dictionary structure, not a user-defined type)
c) loops through the series of records, cursor.execute()-ing an SQL query for each one
d) commits and closes the database connection
If the cursor.execute() call throws an error, it writes the traceback and the query to the log file and moves on.
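A sketch of what step (c) and that error handling amount to, with a made-up query and record layout standing in for the real schema:

import logging
import traceback

logging.basicConfig(filename="loader.log", level=logging.ERROR)

def insert_records(cnxn, cursor, records):
    # The SQL text and the record keys are placeholders, not the real schema.
    sql = ("INSERT INTO Readings (SiteID, ParamCode, SampleTime, Value) "
           "VALUES (?, ?, ?, ?)")
    for rec in records:
        try:
            cursor.execute(sql, rec["site"], rec["param"], rec["time"], rec["value"])
        except Exception:
            # Log the traceback and the failing query, then move on to the next record.
            logging.error("Query failed: %s %r\n%s", sql, rec, traceback.format_exc())
    cnxn.commit()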
When my coworker runs it on his machine, for one particular database, specific records will simply not be there, with no errors recorded. When I run the exact same code on the exact same copy of the database over the exact same network path from my machine, all the data that should be there is there.
My coworker and I are both on Windows XP computers with Microsoft Access 2000 and the same versions of Python, lxml, and pyodbc installed. I have no idea how to check whether we have the same version of the Microsoft ODBC drivers. I haven't been able to find any difference between the records that are there and the records that aren't. I'm in the process of testing whether the same problem happens with the other databases, and whether it happens on a third coworker's computer as well.
What I'd really like to know is ANYTHING anyone can think of that would cause this, because it doesn't make sense to me. To summarize: Python code executing SQL queries will silently fail half of them on one computer and work perfectly on another.
Edit:
No more problem. I just had my coworker run it again, and the database was updated completely with no missing records. Still no idea why it failed in the first place, nor whether or not it will happen again, but "problem solved."
"I have no idea how to check whether we have the same version of the Microsoft ODBC drivers."
I think you're looking for Control Panel | Administrative Tools | Data Sources (ODBC). Click the "Drivers" tab.
I think either Access 2000 or Office 2000 shipped with a desktop edition of SQL Server called "MSDE". Might be worth installing that for testing. (Or production, for that matter.)
