So I have a Google Sheet that maintains a lot of data. I also have a MySQL DB with a huge chunk of data. There is a vital piece of information that is present in both the Sheet and the DB, and the two need to stay in sync. The information always enters the Sheet first. I had a Python script with MySQL queries to update my database separately.
Now the workflow has changed: data will enter the Sheet, and whenever that happens the database has to be updated automatically.
After some research, I found that using the onEdit function of Google Apps Script (I learned about it from here), I can pick up when the file has changed.
The next step is to fetch the data from the relevant cell, which I can do using this.
Now I need to connect to the DB and send some queries. This is where I am stuck.
Approach 1:
Have a Python web app running live and send the data to it via UrlFetchApp. This I have yet to try.
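For approach 1, what I have in mind on the Python side is something like the minimal sketch below. Flask is chosen purely as an example, and the /update endpoint, table and column names are placeholders, not anything that exists yet; onEdit would POST the edited value here with UrlFetchApp.fetch().

```python
# Minimal sketch of the receiving web app for approach 1.
# Endpoint, credentials, table and column names are placeholders.
from flask import Flask, request
import mysql.connector

app = Flask(__name__)

@app.route("/update", methods=["POST"])
def update():
    payload = request.get_json()  # e.g. {"row_id": 42, "value": "new value"}
    conn = mysql.connector.connect(
        host="localhost", user="dbuser", password="dbpass", database="mydb"
    )
    try:
        cur = conn.cursor()
        cur.execute(
            "UPDATE my_table SET my_column = %s WHERE id = %s",
            (payload["value"], payload["row_id"]),
        )
        conn.commit()
    finally:
        conn.close()
    return {"status": "ok"}

if __name__ == "__main__":
    app.run(port=8000)
```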
Approach 2:
Connect to MySQL remotely through Apps Script. But after 2-3 hours of reading the docs, I am not sure this is possible.
So this is my scenario. Any viable solution you can think of or a better approach?
Connect directly to MySQL. You likely missed this part of the docs: https://developers.google.com/apps-script/guides/jdbc
Using JDBC within Apps Script will work if you have the time to build this yourself.
If you don't want to roll your own solution, check out SeekWell. It allows you to connect to databases and write SQL queries directly in Sheets. You can create a "Run Sheet" that will run multiple queries at once, and schedule those queries to run without you even opening the Sheet.
Disclaimer: I made this.
Say I have a SQL Server 2008 database that contains the inventory data for my business. I need to trigger a Python script once an item quantity is changed. This can happen under several conditions: it could be a sales order, or simply a quantity adjustment. The Python script will transform the data and upload it to Google Sheets.
I need this to trigger in real time: when the specified columns change or records are created, I need to fire off the script.
It is preferred that the solution run on the DB server itself, without having to pay for other integration tools such as Zapier (besides, Zapier won't help here).
Constraints:
I cannot move the database to the cloud (business restriction)
Upgrading the database to a newer version is not possible either (budget)
Switching the database to an open-source engine is not possible either (other application dependencies)
It's a real pickle, but I'm trying to find a solution for a real-time trigger.
Failing that, I could implement a periodic scanning method, but that would create new problems.
I haven't tried anything yet, because I have no idea what to try here. I did some Google searches but was not able to find a solution.
Source: https://www.sqlshack.com/use-xp-cmdshell-extended-procedure/
The answer to this is to configure the required trigger and have it use xp_cmdshell; xp_cmdshell can then run a .bat or .py file.
Be aware of the permissions xp_cmdshell runs under. Most likely this will be the SQL user running the file, and you will have to ensure this user has the right privileges at the OS level to execute files and to write to any directories that need to be written to.
If anyone is in a similar situation, they could also look into upgrading to SQL Server Express (which is free). I'm not sure whether this would break the application, but there is a good chance it will not (e.g. upgrading SQL Server 2008 R2 Express to SQL Server 2012 Express).
It goes without saying this is certainly not best practice, and if you can at all avoid it, it would be best to run a scheduled task instead.
I have a question regarding the Python IBM_DB package (though I think it applies to any package that uses the connection/cursor pattern, e.g. pyodbc).
When the cursor.execute() method is called, it executes an SQL query on the database. However, to access the resulting data, you need to use fetchall() or one of the other fetch methods. I want to time the hit on the database.
Does the query completely finish running at the execute() call, with the result sitting in memory just for Python to fetch? Or do the fetch methods keep calling back to the database? I have scoured the documentation and am unable to find anything definitive on this subject.
Most or all of the Db2 open-source drivers are based on the Call Level Interface (CLI). The CLI functions and details are part of the overall Db2 documentation. A fetch from a result set retrieves one more row.
AFAIK the result set can be cached on the client, or each fetch can go back to the engine. It makes sense to bring in a few (dozen) rows at a time, but not several million.
You would need insights and understanding of how drivers and database query processing work in order to measure something useful and interpret it correctly.
BTW: There is some form of CLI tracing available.
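If you just want to see where the time goes from the Python side, timing the two calls separately is a reasonable start. A rough sketch, with a placeholder connection string and query, and keeping in mind that what each number includes depends on the driver's prefetching:

```python
# Rough sketch: time execute() and fetchall() separately with ibm_db_dbi
# (the same pattern works with pyodbc). Connection string and query are placeholders.
import time
import ibm_db_dbi

conn = ibm_db_dbi.connect(
    "DATABASE=mydb;HOSTNAME=myhost;PORT=50000;PROTOCOL=TCPIP;UID=user;PWD=pass;"
)
cur = conn.cursor()

t0 = time.perf_counter()
cur.execute("SELECT * FROM my_schema.my_table")
t1 = time.perf_counter()
rows = cur.fetchall()
t2 = time.perf_counter()

print(f"execute():  {t1 - t0:.3f}s")  # time spent in the execute() call
print(f"fetchall(): {t2 - t1:.3f}s")  # time spent pulling all rows to the client
conn.close()
```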
I have a Python scraper that I run periodically on my free-tier AWS EC2 instance using cron; it outputs a CSV file every day containing around 4,000-5,000 rows with 8 columns. I have been SSHing into the instance from my home Ubuntu machine and adding the new data to a SQLite database, which I can then use to extract the data I want.
Now I would like to try the free-tier AWS MySQL (RDS) database so I can have the database in the cloud and pull data from it from the terminal on my home PC. I have searched around and found no direct tutorial on how this could be done. It would be great if anyone who has done this could give me a conceptual idea of the steps I would need to take. Ideally I would like the database to update automatically as soon as my EC2 instance produces a new CSV file. I can do all the de-duping once the table is in the AWS MySQL database.
Any advice or links to tutorials on this are most welcome. As I said, I have searched quite a bit for guides but haven't found anything on this. Perhaps the concept is completely wrong and there is an entirely different way of doing it that I am not seeing?
The problem is that you don't have access to the RDS filesystem, so you cannot upload the CSV there (and cannot import it from there, either).
Instead, modify your Python scraper to connect to the database directly and insert the data there.
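A rough sketch of what that could look like, assuming pymysql (any MySQL driver works) and made-up host, credentials, file, table and column names:

```python
# Sketch: load the scraper's CSV and insert the rows straight into RDS MySQL.
# Host, credentials, file, table and column names are placeholders.
import csv
import pymysql

conn = pymysql.connect(
    host="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="secret",
    database="scraper_db",
)
try:
    with open("scraped_data.csv", newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        rows = list(reader)
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO scraped_rows (c1, c2, c3, c4, c5, c6, c7, c8) "
            "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)",
            rows,
        )
    conn.commit()
finally:
    conn.close()
```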
Did you consider using AWS Lambda to run your scraper?
Take a look at this AWS tutorial which will help you configure a Lambda Function to access an Amazon RDS database.
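For completeness, a sketch of what the Lambda handler could look like, assuming pymysql is bundled with the function and the rows arrive in the invocation payload; all names and environment variables here are made up:

```python
# Sketch of a Lambda handler that writes scraped rows into RDS MySQL.
# Assumes pymysql is packaged with the function; names and env vars are placeholders.
import os
import pymysql

def lambda_handler(event, context):
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    try:
        with conn.cursor() as cur:
            for row in event.get("rows", []):  # rows passed in the invocation payload
                cur.execute(
                    "INSERT INTO scraped_rows (c1, c2, c3) VALUES (%s, %s, %s)",
                    row,
                )
        conn.commit()
    finally:
        conn.close()
    return {"inserted": len(event.get("rows", []))}
```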
I am creating a Python application that uses embedded SQLite databases. The programme creates the db files, which live on a shared network drive. At this point there will be no more than 5 computers on the network running the programme.
My initial thought was to ask the user on startup whether they are the server or a client. If they are the server, they create the database; if they are a client, they must find a server instance on the network. One way, I suppose, is to send all db commands from the clients to the server and have the server apply them to the database. Would that solve the shared-db issue?
Alternatively, is there some way to create a SQLite "server"? I presume this would be the quicker option, if available.
Note: I can't use a server engine such as MySQL or PostgreSQL at this point, but I am implementing this through an ORM, so when that becomes viable it should be easy to change over.
Here's a "SQLite Server", http://sqliteserver.xhost.ro/, but it looks like not in maintain for years.
SQLite supports some concurrency itself: multiple processes can read data at the same time, but only one can write. When a process is writing, it locks the whole database file (possibly for a few seconds), and according to the official documentation the others have to wait in the meantime.
I guess this is sufficient for the 5 processes in your scenario; you just need to write code to handle the waiting.
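For the waiting part, sqlite3's built-in busy timeout plus a small retry loop usually suffices. A sketch, where the path, table and columns are placeholders:

```python
# Sketch: open the shared SQLite file with a busy timeout and retry writes
# that fail because another process holds the write lock.
# The path, table and columns are placeholders.
import sqlite3
import time

DB_PATH = r"\\server\share\app.db"

def write_with_retry(sql, params, retries=5):
    for attempt in range(retries):
        conn = sqlite3.connect(DB_PATH, timeout=10)  # how long sqlite waits for the lock
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(sql, params)
            return
        except sqlite3.OperationalError as e:
            if "locked" not in str(e).lower() or attempt == retries - 1:
                raise
            time.sleep(0.5 * (attempt + 1))  # back off and retry
        finally:
            conn.close()

write_with_retry("INSERT INTO items (name, qty) VALUES (?, ?)", ("widget", 3))
```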
I would like to write a trigger for a PostgreSQL database which, on insertions, would notify a node.js server, which would then send some data to connected clients.
Currently my thought is to write a Python row-insert trigger for the database that writes data to a file, which would then be read by the node.js server.
However, this would be slow, as disk access would be involved. What would be a better way to connect these two applications?
Have you looked at the LISTEN/NOTIFY functionality?
http://www.postgresql.org/docs/9.0/interactive/sql-notify.html
Also, you will want to test the different options against your needs, instead of assuming one is not fast enough for what you need. Maybe your Python approach will work just fine.
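To sketch the shape of it: a regular trigger calls pg_notify('new_row', payload), and the server just listens on that channel, with no disk access involved. The listening side below is shown in Python with psycopg2 purely for illustration (node.js can do the same with the pg module); the channel name and connection details are made up.

```python
# Sketch of the listening side with psycopg2; a trigger on the table is assumed
# to call pg_notify('new_row', <payload>). Channel and connection details are placeholders.
import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=mydb user=me password=secret host=localhost")
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN new_row;")

print("Waiting for notifications...")
while True:
    # Block until the connection's socket becomes readable (a notification arrived)
    if select.select([conn], [], [], 60) == ([], [], []):
        continue  # timeout, loop again
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        print(f"Got notification on {notify.channel}: {notify.payload}")
```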