I had a question regarding the python IBM_DB package (but I think it could be applied to any of the packages that employ the connection/cursor logic i.e. pyodbc).
When the cursor.execute() method is called, it executes an sql query on the database. However, to access this data, you would need to use the fetchall()/other fetch methods. I want to time the hit on the database.
Does the query completely finish running at the execute level, and it is in memory just for python to fetch? Or does the fetch method continue calling the database? I have scoured the documentation and am unable to find anything definitive on this subject.
Most or all of the Db2 open source drivers are based on the Call Level Interface (CLI). The CLI functions and details are part of the overall Db2 documentation. The Fetch() from a ResultSet retrieves one more row.
AFAIK the result set can be cached or go back to the engine. It makes sense to bring in few (dozen) rows, but not for some million rows.
You would need insights and understanding of how drivers and database query processing work in order to measure something useful and interpret it correctly.
BTW: There is some form of CLI tracing available.
Related
Say I have a SQL 2008 Database, which contains the inventory data for my business. I need to trigger a python script, once an item qty is changed. This can result under several conditions, it could be a sale order, or simply a qty change. The python script will transform the data, and upload it to google sheets.
I need this to trigger in realtime, like when the specified columns change or records are created, I need to fire off the script.
Its preferred the solution runs on the DB server itself, without having to pay for other integration tools such as Zapier. (Besides Zapier wont help here)
Constraints:
I cannot move the database to the cloud (Business Restriction)
Upgrading the database to a new version is not possible either (budget)
Changing the Database to open source is not possible either (other application dependencies)
Its a real pickle, but I'm trying to find a solution for a real time trigger.
Failing that I could almost implement a periodic scanning method, but this will create new problems.
Havent tried anything yet, because I have no idea what to try here.
Some google searches, but was not able to find a solution.
Source: https://www.sqlshack.com/use-xp-cmdshell-extended-procedure/
The answer to this is to configure the required trigger, using xp_cmdshell. You can then use xp_cmdshell to run a .bat or .py file.
Be aware of the permissions xp_cmdshell is running as, most likely this will be the SQL user that runs the file, you will have to ensure this user has the right privileges at the OS level to execute files, and write to any directories that need to be written to.
If anyone is in a similar situation, they could look into upgrading to SQL Express (which is free), not sure if this would break the application, but there is a good chance it will not(SQL Server 2008R2 Express upgrade to SQL Server 2012 Express).
It goes without saying this is certainly not best practice, and if you can at all avoid it, it would be best to run a scheduled task instead.
I am working on a project now where I need to load daily data from one psql database into another one (both databases are on separate remote machines).
The Postgres version I'm using is 9.5, and due to our infrastructure, I am currently doing this using python scripts, which works fine for now, although I was wondering:
Is it possible to do this using psql commands that I can easily schedule? or is python a flexible enough appproach for future developments?
EDIT:
The main database contains a backend connected directly to a website and the other contains an analytics system which basically only needs to read the main db's data and store future transformations of it.
The latency is not very important, what is important is the reliability and simplicity.
sure, you can use psql and an ssh connection if you want.
this approach (or using pg_dump) can be useful as way to reduce the effexcts of latency.
however note that the SQL insert...values command can insert several rows in a single command. When I use python scripts to migrate data I build insert commands that insert up-to 1000 rows, thus reducing latency by a factor of 1000,
Another approach worth considering is dblink which allows postgres to query a remote postgres directly, so you could do a select from the remote database and insert the result into a local table.
Postgres-FDW may be worth a look too.
So I have a Google sheet that maintains a lot of data. I also have a MySQL DB with a huge junk of data. There is a vital piece of information in the Sheet that is also present in the DB. Both needs to be in sync. The information always enters the Sheet first. I had a python script with mysql queries to update my database separately.
Now the work flow has changed. Data will enter the sheet and whenever that happens the database has to updated automatically.
After some research, I found that using the onEdit function of Google AppScript (I learned from here.), I could pickup when the file has changed.
The Next step is to fetch the data from relevant cell, which I can do using this.
Now I need to connect to the DB and send some queries. This is where I am stuck.
Approach 1:
Have a python web-app running live. Send the data via UrlFetchApp.This I yet have to try.
Approach 2:
Connect to mySQL remotely through appscript. But I am not sure this is possible after 2-3 hours of reading the docs.
So this is my scenario. Any viable solution you can think of or a better approach?
Connect directly to mySQL. You likely missed reading this part https://developers.google.com/apps-script/guides/jdbc
Using JDBC within Apps Script will work if you have the time to build this yourself.
If you don't want to roll your own solution, check out SeekWell. It allows you to connect to databases and write SQL queries directly in Sheets. You can create a run a “Run Sheet” that will run multiple queries at once and schedule those queries to be run without you even opening the Sheet.
Disclaimer: I made this.
Example scenario:
MySQL running a single server -> HOSTNAME
Two MySQL databases on that server -> USERS , GAMES .
Task -> Fetch 10 newest games from GAMES.my_games_table , and fetch users playing those games from USERS.my_users_table ( assume no joins )
In Django as well as Python MySQLdb , why is having one cursor for each database more preferable ?
What is the disadvantage of an extended cursor which is single per MySQL server and can switch databases ( eg by querying "use USERS;" ), and then work on corresponding database
MySQL connections are cheap, but isn't single connection better than many , if there is a linear flow and no complex tranasactions which might need two cursors ?
A shorter answer would be, "MySQL doesn't support that type of cursor", so neither does Python-MySQL, so the reason one connection command is preferred is because that's the way MySQL works. Which is sort of a tautology.
However, the longer answer is:
A 'cursor', by your definition, would be some type of object accessing tables and indexes within an RDMS, capable of maintaining its state.
A 'connection', by your definition, would accept commands, and either allocate or reuse a cursor to perform the action of the command, returning its results to the connection.
By your definition, a 'connection' would/could manage multiple cursors.
You believe this would be the preferred/performant way to access a database as 'connections' are expensive, and 'cursors' are cheap.
However:
A cursor in MySQL (and other RDMS) is not a the user-accessible mechanism for performing operations. MySQL (and other's) perform operations in as "set", or rather, they compile your SQL command into an internal list of commands, and do numerous, complex bits depending on the nature of your SQL command and your table structure.
A cursor is a specific mechanism, utilized within stored procedures (and there only), giving the developer a way to work with data in a procedural way.
A 'connection' in MySQL is what you think of as a 'cursor', sort of. MySQL does not expose it's internals for you as an iterator, or pointer, that is merely moving over tables. It exposes it's internals as a 'connection' which accepts SQL and other commands, translates those commands into an internal action, performs that action, and returns it's result to you.
This is the difference between a 'set' and a 'procedural' execution style (which is really about the granularity of control you, the user, is given access to, or at least, the granularity inherent in how the RDMS abstracts away its internals when it exposes them via an API).
As you say, MySQL connections are cheap, so for your case, I'm not sure there is a technical advantage either way, outside of code organization and flow. It might be easier to manage two cursors than to keep track of which database a single cursor is currently talking to by painstakingly tracking SQL 'USE' statements. Mileage with other databases may vary -- remember that Django strives to be database-agnostic.
Also, consider the case where two different databases, even on the same server, require different access credentials. In such a case, two connections will be necessary, so that each connection can successfully authenticate.
One cursor per database is not necessarily preferable, it's just the default behavior.
The rationale is that different databases are more often than not on different servers, use different engines, and/or need different initialization options. (Otherwise, why should you be using different "databases" in the first place?)
In your case, if your two databases are just namespaces of tables (what should be called "schemas" in SQL jargon) but reside on the same MySQL instance, then by all means use a single connection. (How to configure Django to do so is actually an altogether different question.)
You are also right that a single connection is better than two, if you only have a single thread and don't actually need two database workers at the same time.
I have an application that needs to interface with another app's database. I have read access but not write.
Currently I'm using sql statements via pyodbc to grab the rows and using python manipulate the data. Since I don't cache anything this can be quite costly.
I'm thinking of using an ORM to solve my problem. The question is if I use an ORM like "sql alchemy" would it be smart enough to pick up changes in the other database?
E.g. sql alchemy accesses a table and retrieves a row. If that row got modified outside of sql alchemy would it be smart enough to pick it up?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Edit: To be more clear
I have one application that is simply a reporting tool lets call App A.
I have another application that handles various financial transactions called App B.
A has access to B's database to retrieve the transactions and generates various reports. There's hundreds of thousands of transactions. We're currently caching this info manually in python, if we need an updated report we refresh the cache. If we get rid of the cache, the sql queries combined with the calculations becomes unscalable.
I don't think an ORM is the solution to your problem of performance. By default ORMs tend to be less efficient than row SQL because they might fetch data that you're not going to use (eg. doing a SELECT * when you need only one field), although SQLAlchemy allows fine-grained control over the SQL generated.
Now to implement a caching mechanism, depending on your application, you could use a simple dictionary in memory or a specialized system such as memcached or Redis.
To keep your cached data relatively fresh, you can poll the source at regular intervals, which might be OK if your application can tolerate a little delay. Otherwise you'll need the application that has write access to the db to notify your application or your cache system when an update occurs.
Edit: since you seem to have control over app B, and you've already got a cache system in app A, the simplest way to solve your problem is probably to create a callback in app A that app B can call to expire cached items. Both apps need to agree on a convention to identify cached items.