I am running MySQL 5.1 on my Windows Vista installation. The table in question uses MyISAM and has about 10 million rows. It is used to store text messages posted by users on a website.
I am trying to run the following query on it:
query = "select id, text from messages order by id limit %d offset %d" %(limit, offset)
where limit is set to a fixed value (in this case 20000) and offset is incremented in steps of 20000.
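For illustration, the pagination loop described above might look roughly like the following sketch. The MySQLdb driver, the connection credentials, and the processing step are assumptions; the post only mentions a generic Python DB accessor module.
import MySQLdb  # assumed driver; the actual DB module is not named in the post
conn = MySQLdb.connect(host="localhost", user="root", passwd="", db="mydb")  # hypothetical credentials
cur = conn.cursor()
limit, offset = 20000, 0
while True:
    query = "select id, text from messages order by id limit %d offset %d" % (limit, offset)
    cur.execute(query)
    rows = cur.fetchall()
    if not rows:
        break
    # ... process the 20000-row chunk here ...
    offset += limit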
This query goes into an infinite loop when offset = 240000, at that particular value and no other.
I isolated this query into a script and ran it, and got the same results. I then tried to run the last query (with offset = 240000) directly, and it worked!
I then tried executing the same queries directly in a mysql client to make sure that the error was not in the python DB accessor module. All the queries returned results, except the one with offset = 240000.
I then looked at the MySQL server logs and saw the following:
[ERROR] C:\Program Files\MySQL\MySQL Server 5.1\bin\mysqld: Sort aborted
This probably means that when I stopped the Python process (out of frustration), the mysqld process was 'sorting' something. When I looked at the my.ini file, I saw a lot of MAX_* options. I am currently experimenting with these, but just throwing it out there in the meantime.
Any help appreciated!
Have you checked the table with myisamchk?
I am running MongoDB 4.2.8 on Windows Server 2019 Datacenter and I want to simply insert a lot of data into the database, using Python and PyMongo.
It worked fine at first, but after inserting a certain number of records (~2 million) the MongoDB service went down. I deleted the data and ran my program again, and the same thing still happened.
I couldn't find out the cause of this problem.
Here are the error messages that I copied from the log file.
EDIT: The first line of the error messages is
2020-07-17T18:00:51.614+0800 E STORAGE [conn38247] WiredTiger error (0) [1594980051:613746][17560:140711179997792], file:collection-8-5621763546278059960.wt, WT_CURSOR.search: __wt_block_read_off, 283: collection-8-5621763546278059960.wt: read checksum error for 32768B block at offset 4277100544: block header checksum of 0xf5304876 doesn't match expected checksum of 0x1ea56329 Raw: [1594980051:613746]
The error wt: read checksum error indicates that the storage engine read a block whose checksum did not match the data.
This implies that the data on disk was changed after being written by the mongod process and is no longer self-consistent.
Make sure that you aren't trying to run 2 different mongod processes with the same data directory, and that there are no other processes trying to access these files.
I am reading over 100 million records from MongoDB and creating nodes and relationships in Neo4j.
Whenever I run this, after processing a certain number of records I get pymongo.errors.CursorNotFound: cursor id "..." not found.
Earlier, when I was executing it without no_cursor_timeout=True in the MongoDB query, I got the same error at every 64179 records. After looking this up on Stack Overflow I tried adding no_cursor_timeout=True, but now I get the same error at record 2691734. How can I get rid of this error? I have also tried defining the batch size.
Per the ticket Belly Buster mentioned, you should try:
manually specifying the session to use with all your operations, and
periodically pinging the server using that session id to keep it alive on the server (see the sketch below).
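A minimal sketch of that approach with PyMongo; the connection string, database and collection names, and the process_doc function are assumptions:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
coll = client.mydb.mycoll                          # hypothetical database and collection

with client.start_session() as session:
    cursor = coll.find({}, no_cursor_timeout=True, session=session)
    for i, doc in enumerate(cursor, 1):
        process_doc(doc)                           # hypothetical processing step
        if i % 100000 == 0:
            # ping the server with this session id so it is not reaped as idle
            client.admin.command("refreshSessions", [session.session_id], session=session)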
I'm running PostgreSQL 9.6 (in Docker, using the postgres:9.6.13 image) and psycopg2 2.8.2.
My PostgreSQL server (local) hosts two databases. My goal is to create materialized views in one of the databases that use data from the other database using Postgres's foreign data wrappers. I do all this from a Python script that uses psycopg2.
This works well as long as creating the materialized view does not take too long (i.e. if the amount of data being imported isn't too large). However, if the process takes longer than roughly ~250 seconds, psycopg2 throws the exception
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
No error message (or any message concerning this whatsoever) can be found in Postgres's logs.
Materialized view creation completes successfully if I do it from an SQL client (Postico).
This code illustrates roughly what I'm doing in the Python script:
import psycopg2 as pg  # imported as pg to match the pg.connect call below

db = pg.connect(
    dbname=config.db_name,
    user=config.db_user,
    password=config.db_password,
    host=config.db_host,
    port=config.db_port
)

with db.cursor() as c:
    c.execute("""
        CREATE EXTENSION IF NOT EXISTS postgres_fdw;
        CREATE SERVER fdw FOREIGN DATA WRAPPER postgres_fdw OPTIONS (...);
        CREATE USER MAPPING FOR CURRENT_USER SERVER fdw OPTIONS (...);
        CREATE SCHEMA foreign;
        IMPORT FOREIGN SCHEMA foreign_schema FROM SERVER fdw INTO foreign;
    """)
    c.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS my_view AS (
            SELECT (...)
            FROM foreign.foreign_table
        );
    """)
Adding the keepalive parameters to the psycopg2.connect call seems to have solved the problem:
self.db = pg.connect(
    dbname=config.db_name,
    user=config.db_user,
    password=config.db_password,
    host=config.db_host,
    port=config.db_port,
    keepalives=1,
    keepalives_idle=30,
    keepalives_interval=10,
    keepalives_count=5
)
I still don't know why this is necessary. I can't find anyone else who has described having to use the keepalives keyword parameters when using Postgres in Docker just to be able to run queries that take longer than 4-5 minutes, but maybe it's obvious enough that nobody has noted it?
We encountered the same issue, and resolved it by adding net.ipv4.tcp_keepalive_time=200 to our docker-compose.yml file:
services:
  myservice:
    image: myimage
    sysctls:
      - net.ipv4.tcp_keepalive_time=200
From what I understand, this makes the connection signal that it is still alive after 200 seconds of idleness, which is less than the time it takes for the connection to be dropped (300 seconds?), thus preventing it from being dropped.
It might be that PostgreSQL 9.6 kills your connections after the new timeout mentioned at https://stackoverflow.com/a/45627782/1587329. In that case, you could set statement_timeout in postgresql.conf, but that is not recommended.
It might work in Postico because the value has been set there.
For the error to show up in the logs, you need to set log_min_error_statement to ERROR or lower.
I am using Python 2.7, pyodbc and MySQL 5.5. I am on Windows.
I have a query which returns millions of rows, and I would like to process it in chunks using the fetchmany function.
Here is a portion of the code:
import pyodbc
connection = pyodbc.connect('Driver={MySQL ODBC 5.1 Driver};Server=127.0.0.1;Port=3306;Database=XXXX;User=root; Password='';Option=3;')
cursor_1 = connection.cursor()
strSQLStatement = 'SELECT x1, x2 from X'
cursor_1.execute(strSQLStatement)
# the error occurs here
x1 = cursor_1.fetchmany(10)
print x1
connection.close()
My problem:
I get the error "MySQL client ran out of memory".
I guess that this is because cursor_1.execute tries to read everything into memory, and I tried the following (one at a time) but to no avail:
In the user interface (ODBC administrator tools) I ticked "Don't cache results of forward-only cursors".
connection.query("SET GLOBAL query_cache_size = 40000")
My question:
Does pyodbc have a way to run the query and serve the results only on demand?
The MySQL manual suggests invoking mysql with the --quick option. Can this also be done when not using the command line?
Thanks for your help.
P.S.: Suggestions for an alternative MySQL module are also welcome, but I use Portable Python so my choices are limited.
Using MySQLdb with SSCursor will solve your issues.
Unfortunately the documentation isn't great, but it is mentioned in the user guide and you can find an example in this Stack Overflow question.
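A rough sketch of what that could look like, reusing the connection details from the question; MySQLdb's connect arguments shown here are standard, but adjust them to your setup:
import MySQLdb
from MySQLdb.cursors import SSCursor

# server-side cursor: rows are streamed from the server instead of loaded into client memory
connection = MySQLdb.connect(host="127.0.0.1", port=3306, user="root",
                             passwd="", db="XXXX", cursorclass=SSCursor)
cursor_1 = connection.cursor()
cursor_1.execute("SELECT x1, x2 FROM X")

while True:
    rows = cursor_1.fetchmany(10)
    if not rows:
        break
    for row in rows:
        print row  # Python 2 print, matching the question

cursor_1.close()
connection.close()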
Use the LIMIT clause on the query string.
http://dev.mysql.com/doc/refman/5.5/en/select.html
By using
SELECT x1, x2 from X LIMIT 0,1000
You'll only get the first 1,000 records; then by doing:
SELECT x1, x2 from X LIMIT 1000,1000
you'd get the next 1,000 records (the first number is the offset, the second is the row count).
Loop this appropriately to get all your records. (I don't know Python, so I can't help with that part.)
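For illustration, a sketch of that LIMIT loop in Python, reusing the pyodbc cursor from the question; the chunk size and the processing step are placeholders:
chunk_size = 1000
offset = 0
while True:
    # fetch one chunk at a time instead of the whole result set
    cursor_1.execute("SELECT x1, x2 FROM X LIMIT %d, %d" % (offset, chunk_size))
    rows = cursor_1.fetchall()
    if not rows:
        break
    # ... process this chunk of rows ...
    offset += chunk_size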
I am working on a program to automate parsing data from XML files and storing it into several databases. (Specifically the USGS realtime water quality service, if anyone's interested, at http://waterservices.usgs.gov/rest/WaterML-Interim-REST-Service.html) It's written in Python 2.5.1 using lxml and pyodbc. The databases are in Microsoft Access 2000.
The connection function is as follows:
import pyodbc

def get_AccessConnection(db):
    connString = 'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=' + db
    cnxn = pyodbc.connect(connString, autocommit=False)
    cursor = cnxn.cursor()
    return cnxn, cursor
where db is the filepath to the database.
The program:
a) opens the connection to the database
b) parses 2 to 8 XML files for that database and builds the values from them into a series of records to insert into the database (using a nested dictionary structure, not a user-defined type)
c) loops through the series of records, cursor.execute()-ing an SQL query for each one
d) commits and closes the database connection
If the cursor.execute() call throws an error, it writes the traceback and the query to the log file and moves on.
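For illustration, steps (c) and (d) roughly follow the pattern below; the record fields, the SQL text, and the log_file handle are assumptions, not the actual program:
import traceback

# records: built from the parsed XML in step (b); shown here as a flat list for simplicity
# log_file: an already-open file handle for the error log (assumed)
for record in records:
    sql = "INSERT INTO results (site, datetime, value) VALUES (?, ?, ?)"  # illustrative query
    params = (record['site'], record['datetime'], record['value'])
    try:
        cursor.execute(sql, params)
    except Exception:
        # write the traceback and the failed query to the log, then move on
        log_file.write(traceback.format_exc())
        log_file.write(sql + ' ' + repr(params) + '\n')

cnxn.commit()
cnxn.close()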
When my coworker runs it on his machine, for one particular database, specific records will simply not be there, with no errors recorded. When I run the exact same code on the exact same copy of the database over the exact same network path from my machine, all the data that should be there is there.
My coworker and I are both on Windows XP computers with Microsoft Access 2000 and the same versions of Python, lxml, and pyodbc installed. I have no idea how to check whether we have the same version of the Microsoft ODBC drivers. I haven't been able to find any difference between the records that are there and the records that aren't. I'm in the process of testing whether the same problem happens with the other databases, and whether it happens on a third coworker's computer as well.
What I'd really like to know is anything anyone can think of that would cause this, because it doesn't make sense to me. To summarize: Python code executing SQL queries silently fails for half of them on one computer and works perfectly on another.
Edit:
No more problem. I just had my coworker run it again, and the database was updated completely with no missing records. Still no idea why it failed in the first place, nor whether or not it will happen again, but "problem solved."
"I have no idea how to check whether we have the same version of the Microsoft ODBC drivers."
I think you're looking for Control Panel | Administrative Tools | Data Sources (ODBC). Click the "Drivers" tab.
I think either Access 2000 or Office 2000 shipped with a desktop edition of SQL Server called "MSDE". Might be worth installing that for testing. (Or production, for that matter.)