I'm developing a GUI in Python in Visual Studio Code, which is connected to a SQL Server 2019 database. I created a number of tables in the database, taking normalization into consideration.
I'm not quite sure which approach I should take. I need to perform CRUD operations on multiple tables simultaneously from the GUI. Would using stored procedures be the best option, and would I have to create the stored procedure in SQL Server 2019 and then call it by name from my code in VS Code? Is there a more effective way to achieve this?
Thanks
Related
I am working on a project now where I need to load daily data from one psql database into another one (both databases are on separate remote machines).
The Postgres version I'm using is 9.5, and due to our infrastructure, I am currently doing this using python scripts, which works fine for now, although I was wondering:
Is it possible to do this using psql commands that I can easily schedule? Or is Python a flexible enough approach for future developments?
EDIT:
The main database contains a backend connected directly to a website and the other contains an analytics system which basically only needs to read the main db's data and store future transformations of it.
The latency is not very important, what is important is the reliability and simplicity.
Sure, you can use psql and an ssh connection if you want.
This approach (or using pg_dump) can be useful as a way to reduce the effects of latency.
However, note that the SQL INSERT ... VALUES command can insert several rows in a single command. When I use Python scripts to migrate data, I build insert commands that insert up to 1000 rows each, which cuts the number of round trips (and so the latency cost) by a factor of up to 1000.
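The multi-row insert described above can be sketched as follows. This is a minimal illustration, not the answerer's actual script: the helper names `batched` and `insert_in_batches`, the `measurements` table, and the use of an in-memory SQLite database as a stand-in for Postgres are all assumptions made so the example runs anywhere. With a Postgres driver such as psycopg2 the placeholder would be `%s` rather than `?`.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(conn, table, columns, rows, batch_size=1000, placeholder="?"):
    """Insert rows with one multi-row INSERT ... VALUES statement per batch.

    `table` and `columns` are interpolated into the SQL text, so they must
    come from trusted code, never from user input. Returns the number of
    statements executed.
    """
    col_list = ", ".join(columns)
    row_template = "(" + ", ".join([placeholder] * len(columns)) + ")"
    statements = 0
    cur = conn.cursor()
    for chunk in batched(rows, batch_size):
        sql = (
            f"INSERT INTO {table} ({col_list}) VALUES "
            + ", ".join([row_template] * len(chunk))
        )
        # Flatten the rows so the values line up with the placeholders
        params = [value for row in chunk for value in row]
        cur.execute(sql, params)
        statements += 1
    conn.commit()
    return statements


# Demo against an in-memory SQLite database (hypothetical table and data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, reading REAL)")
rows = [(i, i * 0.5) for i in range(250)]
n_statements = insert_in_batches(
    conn, "measurements", ["id", "reading"], rows, batch_size=100
)
total = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
```

Here 250 rows go over the wire in three statements instead of 250. The batch size is kept small in the demo because some SQLite builds cap the number of bound parameters per statement; other databases have their own limits.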
Another approach worth considering is dblink, which allows Postgres to query a remote Postgres directly, so you could do a select from the remote database and insert the result into a local table.
postgres_fdw may be worth a look too.
So I have a Google Sheet that maintains a lot of data. I also have a MySQL DB with a huge chunk of data. There is a vital piece of information in the Sheet that is also present in the DB. Both need to be in sync. The information always enters the Sheet first. I had a Python script with MySQL queries to update my database separately.
Now the workflow has changed. Data will enter the Sheet, and whenever that happens the database has to be updated automatically.
After some research, I found that using the onEdit function of Google Apps Script (I learned from here), I could pick up when the file has changed.
The next step is to fetch the data from the relevant cell, which I can do using this.
Now I need to connect to the DB and send some queries. This is where I am stuck.
Approach 1:
Have a Python web app running live. Send the data via UrlFetchApp. I have yet to try this.
Approach 2:
Connect to MySQL remotely through Apps Script. But I am not sure this is possible, even after 2-3 hours of reading the docs.
So this is my scenario. Any viable solution you can think of or a better approach?
Connect directly to MySQL. You likely missed reading this part: https://developers.google.com/apps-script/guides/jdbc
Using JDBC within Apps Script will work if you have the time to build this yourself.
If you don't want to roll your own solution, check out SeekWell. It allows you to connect to databases and write SQL queries directly in Sheets. You can create a "Run Sheet" that will run multiple queries at once and schedule those queries to be run without you even opening the Sheet.
Disclaimer: I made this.
Is there a way of making pandas (or sqlalchemy) output the SQL that would be executed by a call to to_sql() instead of actually executing it? This would be handy in many cases where I actually need to update multiple databases with the same data where python and pandas only exists in one of my machines.
According to the doc, use the echo parameter as:
from sqlalchemy import create_engine
engine = create_engine("mysql://scott:tiger@hostname/dbname", echo=True)
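Note that echo=True logs the statements while still executing them. If the goal is a SQL script that can be carried to a machine without Python or pandas, one possible workaround (an assumption on my part, not something from the SQLAlchemy docs quoted above) is to stage the data in an in-memory SQLite database and dump it as text with the standard library's `iterdump()`. The sketch below fills the staging table by hand so it runs without pandas; the helper name `dump_as_sql` and the sample rows are hypothetical. The output is in SQLite's dialect and may need adjusting for other databases.

```python
import sqlite3


def dump_as_sql(conn):
    """Return the full contents of a SQLite connection as a SQL script."""
    return "\n".join(conn.iterdump())


staging = sqlite3.connect(":memory:")
# With pandas installed, the table could instead be filled with
# df.to_sql("tablename", staging, index=False), since to_sql accepts
# a sqlite3 connection. The two statements below stand in for that call.
staging.execute("CREATE TABLE tablename (id INTEGER, name TEXT)")
staging.executemany(
    "INSERT INTO tablename VALUES (?, ?)", [(1, "alice"), (2, "bob")]
)
staging.commit()

# The script holds CREATE TABLE and INSERT statements as plain text
script = dump_as_sql(staging)

# Replaying the script elsewhere needs only a SQL client, not pandas
target = sqlite3.connect(":memory:")
target.executescript(script)
copied = target.execute("SELECT id, name FROM tablename ORDER BY id").fetchall()
```

The `script` string can be written to a file and executed on each target database.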
This is more a process question than a programming one. First, there is the use of multiple databases. Relational database management systems (RDBMS) are designed as multiple-user systems for many simultaneous users/apps/clients/machines. Designed to run as ONE system, the database serves as the central repository for related applications. Some argue databases should be agnostic to apps and be data-centric (Postgres folks) and others believe databases should be app-centric (MySQL folks). Overall, understand they are more involved than a flat-file spreadsheet or data frame.
Usually, RDBMSs come in two structural types:
File-level systems like SQLite and MS Access (where the database resides in a file saved to a directory on disk); these systems, though still powerful and multi-user, mostly serve smaller business applications with a relatively small number of users.
Server-level systems like SQL Server, MySQL, PostgreSQL, DB2, Oracle (where databases run over a network without any localized file); these serve as enterprise-level systems that run full-scale business operations over LAN intranets or web networks.
Meanwhile, pandas is not a database but a data analysis toolkit (much like MS Excel), though it can import/export queried result sets from RDBMSs. Therefore, it maintains no native SQL dialect for DDL/DML procedures. Moreover, pandas runs in memory on the OS calling the Python script and cannot be shared by other clients/machines. Pandas also does not track changes the way you intend: it does not know the different states of a data frame during the runtime of a script unless you design for that yourself, keeping a before and an after copy and identifying the column/row changes.
With that mouthful said, why not use ONE database and have your Python script serve as just another of the many clients that connect to the database to import/export data to and from a data frame? Then, after every data frame change, actually run to_sql(). Recall that pandas' to_sql uses the if_exists argument:
# DROPS TABLE, RECREATES IT, AND UPDATES IT
df.to_sql(name='tablename', con=conn, if_exists='replace')
# APPENDS DF DATA TO EXISTING TABLE
df.to_sql(name='tablename', con=conn, if_exists='append')
In turn, every app/machine that connects to the centralized database will only need to refresh their instance and current data would be available in real-time for their end use needs. Though of course, table-locking states can be an issue in multi-user environments if another user had a table record in edit mode while your script tried updating it. But transactions here may help.
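As a rough illustration of the transaction point, the sketch below writes to two tables as one unit: either both writes are committed or neither is. It uses the standard library's sqlite3 module as a stand-in for a server database, and the `orders` / `order_items` tables and the `save_order` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
conn.execute("CREATE TABLE order_items (order_id INTEGER NOT NULL, sku TEXT NOT NULL)")
conn.commit()


def save_order(conn, order_id, customer, skus):
    """Write to both tables atomically.

    Returns True on success, False if the transaction was rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (id, customer) VALUES (?, ?)",
                (order_id, customer),
            )
            conn.executemany(
                "INSERT INTO order_items (order_id, sku) VALUES (?, ?)",
                [(order_id, sku) for sku in skus],
            )
        return True
    except sqlite3.Error:
        return False


ok = save_order(conn, 1, "alice", ["A-1", "B-2"])
# The None sku violates NOT NULL, so the whole second order is rolled back
failed = save_order(conn, 2, "bob", ["C-3", None])

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
item_count = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
```

After both calls only the first order and its two items remain, so other clients never see a half-written order.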
I am working on a project involving inserting a lot of data into the database. I am wondering if anybody knows how to fill 2 or 3 tables in the database at the same time. An example or pseudocode would be helpful.
Thanks
If you have a lot of data to insert into the database all at once, then you probably are interested in bulk loading data. The ideal tool for that is the bulk loader that likely comes with your database -- Oracle, Microsoft SQL Server, Sybase SQL Server, and MySQL (to name the ones that come to mind) all have bulk loaders. For example, Microsoft has the bulk insert statement and the bcp program to perform this task. I recommend you look into that rather than rigging up some tool in python, with or without threads.
I am working on a program to automate parsing data from XML files and storing it into several databases. (Specifically the USGS realtime water quality service, if anyone's interested, at http://waterservices.usgs.gov/rest/WaterML-Interim-REST-Service.html) It's written in Python 2.5.1 using lxml and pyodbc. The databases are in Microsoft Access 2000.
The connection function is as follows:
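import pyodbc

def get_AccessConnection(db):
    # db is the file path to the .mdb database
    connString = 'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=' + db
    # autocommit is off, so nothing is saved until cnxn.commit() is called
    cnxn = pyodbc.connect(connString, autocommit=False)
    cursor = cnxn.cursor()
    return cnxn, cursor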
where db is the filepath to the database.
The program:
a) opens the connection to the database
b) parses 2 to 8 XML files for that database and builds the values from them into a series of records to insert into the database (using a nested dictionary structure, not a user-defined type)
c) loops through the series of records, cursor.execute()-ing an SQL query for each one
d) commits and closes the database connection
If the cursor.execute() call throws an error, it writes the traceback and the query to the log file and moves on.
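The loop in steps c) and d) can be sketched roughly as below. This is not the original program: it uses the standard library's sqlite3 module in place of pyodbc and Access so that it runs anywhere, and the `insert_records` helper, the `readings` table, and the sample records are hypothetical. Returning the success and failure counts gives the caller something to compare against the number of records that should have been written.

```python
import logging
import sqlite3
import traceback

log = logging.getLogger("loader")


def insert_records(conn, sql, records):
    """Execute `sql` once per record; log failures and keep going.

    Returns (succeeded, failed) so the caller can compare the totals
    against len(records).
    """
    cursor = conn.cursor()
    succeeded = failed = 0
    for params in records:
        try:
            cursor.execute(sql, params)
            succeeded += 1
        except sqlite3.Error:  # with pyodbc this would be pyodbc.Error
            failed += 1
            log.error(
                "Query failed: %s %r\n%s", sql, params, traceback.format_exc()
            )
    conn.commit()  # nothing is persisted until this point
    return succeeded, failed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT NOT NULL, value REAL)")
# The middle record violates NOT NULL and should be logged, not stored
records = [("site-a", 7.9), (None, 8.1), ("site-a", 8.3)]
succeeded, failed = insert_records(
    conn, "INSERT INTO readings VALUES (?, ?)", records
)
stored = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Comparing `stored` with `succeeded` after the commit is one way to notice rows that went missing without an exception being raised.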
When my coworker runs it on his machine, for one particular database, specific records will simply not be there, with no errors recorded. When I run the exact same code on the exact same copy of the database over the exact same network path from my machine, all the data that should be there is there.
My coworker and I are both on Windows XP computers with Microsoft Access 2000 and the same versions of Python, lxml, and pyodbc installed. I have no idea how to check whether we have the same version of the Microsoft ODBC drivers. I haven't been able to find any difference between the records that are there and the records that aren't. I'm in the process of testing whether the same problem happens with the other databases, and whether it happens on a third coworker's computer as well.
What I'd really like to know is ANYTHING anyone can think of that would cause this, because it doesn't make sense to me. To summarize: Python code executing SQL queries will silently fail half of them on one computer and work perfectly on another.
Edit:
No more problem. I just had my coworker run it again, and the database was updated completely with no missing records. Still no idea why it failed in the first place, nor whether or not it will happen again, but "problem solved."
"I have no idea how to check whether we have the same version of the Microsoft ODBC drivers."
I think you're looking for Control Panel | Administrative Tools | Data Sources (ODBC). Click the "Drivers" tab.
I think either Access 2000 or Office 2000 shipped with a desktop edition of SQL Server called "MSDE". Might be worth installing that for testing. (Or production, for that matter.)