User friendly SQLite database csv file import update solution - python

I was wondering if there is a way to allow a user to export a SQLite database as a .csv file, make some changes to it in a program like Excel, then upload that .csv file back to the table it came from using a record UPDATE method.
Currently I have a client that needed an inventory and pricing management system for their e-commerce store. I designed a database system and logic in Python 3 and SQLite. The system from a programming standpoint works flawlessly.
The problem I have is that there are some less then technical office staff that need to edit things like product markup within the database. Currently, I have them setup with SQLite DB Browser, from there they can edit products one at a time and write the changes to the database. They can also export tables to a .csv file for data manipulation in Excel.
The main issue is getting that .csv file back into the table it was exported from using an UPDATE method. When importing a .csv file to a table in SQLite DB Browser there is no way to perform an update import. It can only insert new rows by default and do to my table constraints that is a problem.
I like SQLite DB Browser because it is clean and simple and does exactly what I need. However, as soon as you have to edit more then one thing at a time and filter information in more complicated ways it starts to lack the functionality needed.
Is there a solution out there for SQLite DB Browser to tackle this problem? Is there a better software option all together to interact with a SQLite database that would give me that last bit of functionality?

Have you tried SQLiteForExcel? however, some coding is required.

So after researching some off the shelf options I found that the Devart Excel Add Ins did exactly what I needed. They are paid add ins, however, they seem to support almost all modern databases including SQlite. Once the add in is installed you can connect to a database and manipulate the data returned just like normal in Excel including bulk edits and advanced filtering, all changes are highlighted and can easily be written to the database with one click.
Overall I thought it was a pretty solid solution and everyone seems to be very happy with it as it made interacting with a database intuitive and non threatening to the more technically challenged.

Related

csv->Oracle DB autoscheduled import

I have basic csv report that is produced by other team on a daily basis, each report has 50k rows, those reports are saved on sharedrive everyday. And I have Oracle DB.
I need to create autoscheduled process (or at least less manual) to import those csv reports to Oracle DB. What solution would you recommend for it?
I did not find such solution in SQL Developer, since it is upload from file and not a query. I was thinking about python cron script, that will autoran on a daily basis and transform csv report to txt with needed SQL syntax (insert into...) and then python will connect to Oracle DB and will ran txt file as SQL command and insert data.
But this looks complicated.
Maybe you know other solution that you would recommend yo use?
Create an external table to allow you to access the content of the CSV as if it were a regular table. This assumes the file name does not change day-to-day.
Create a scheduled job to import the data in that external table and do whatever you want with it.
One common blocking issue that prevents using 'external tables' is that external tables require the data to be on the computer hosting the database. Not everyone has access to those servers. Or sometimes the external transfer of data to that machine + the data load to the DB is slower than doing a direct path load from the remote machine.
SQL*Loader with direct path load may be an option: https://docs.oracle.com/en/database/oracle/oracle-database/19/sutil/oracle-sql-loader.html#GUID-8D037494-07FA-4226-B507-E1B2ED10C144 This will be faster than Python.
If you do want to use Python, then read the cx_Oracle manual Batch Statement Execution and Bulk Loading. There is an example of reading from a CSV file.

Flask website backend structure guidance assistance?

I have a basic personal project website that I am looking to learn some web dev fundamentals with and database (SQL) fundamentals as well (If SQL is even the right technology to use??).
I have the basic skeleton up and running but as I am new to this, I want to make sure I am doing it in the most efficient and "correct" way possible.
Currently the site has a main index (landing) page and from there the user can select one of a few subpages. For the sake of understanding, each of these sub pages represents a different surf break and they each display relevant info about that particular break i.e. wave height, wind, tide.
As I have already been able to successfully scrape this data, my main questions revolve around how would I go about inserting this data into a database for future use (historical graphs, trends)? How would I ensure data is added to this database in a continuous manner (once/day)? How would I use data that was scraped from an earlier time, say at noon, to be displayed/used at 12:05 PM rather than scraping it again?
Any other tips, guidance, or resources you can point me to are much appreciated.
This kind of data is called time series. There are specialized database engines for time series, but with a not-extreme volume of observations - (timestamp, wave heigh, wind, tide, which break it is) tuples - a SQL database will be perfectly fine.
Try to model your data as a table in Postgres or MySQL. Start by making a table and manually inserting some fake data in a GUI client for your database. When it looks right, you have your schema. The corresponding CREATE TABLE statement is your DDL. You should be able to write SELECT queries against your table that yield the data you want to show on your webapp. If these queries are awkward, it's a sign that your schema needs revision. Save your DDL. It's (sort of) part of your source code. I imagine two tables: a listing of surf breaks, and a listing of observations. Each row in the listing of observations would reference the listing of surf breaks. If you're on a Mac, Sequel Pro is a decent tool for playing around with a MySQL database, and playing around is probably the best way to learn to use one.
Next, try to insert data to the table from a Python script. Starting with fake data is fine, but mold your Python script to read from your upstream source (the result of scraping) and insert into the table. What does your scraping code output? Is it a function you can call? A CSV you can read? That'll dictate how this script works.
It'll help if this import script is idempotent: you can run it multiple times and it won't make a mess by inserting duplicate rows. It'll also help if this is incremental: once your dataset grows large, it will be very expensive to recompute the whole thing. Try to deal with importing a specific interval at a time. A command-line tool is fine. You can specify the interval as a command-line argument, or figure out out from the current time.
The general problem here, loading data from one system into another on a regular schedule, is called ETL. You have a very simple case of it, and can use very simple tools, but if you want to read about it, that's what it's called. If instead you could get a continuous stream of observations - say, straight from the sensors - you would have a streaming ingestion problem.
You can use the Linux subsystem cron to make this script run on a schedule. You'll want to know whether it ran successfully - this opens a whole other can of worms about monitoring and alerting. There are various open-source systems that will let you emit metrics from your programs, basically a "hey, this happened" tick, see these metrics plotted on graphs, and ask to be emailed/texted/paged if something is happening too frequently or too infrequently. (These systems are, incidentally, one of the main applications of time-series databases). Don't get bogged down with this upfront, but keep it in mind. Statsd, Grafana, and Prometheus are some names to get you started Googling in this direction. You could also simply have your script send an email on success or failure, but people tend to start ignoring such emails.
You'll have written some functions to interact with your database engine. Extract these in a Python module. This forms the basis of your Data Access Layer. Reuse it in your Flask application. This will be easiest if you keep all this stuff in the same Git repository. You can use your chosen database engine's Python client directly, or you can use an abstraction layer like SQLAlchemy. This decision is controversial and people will have opinions, but just pick one. Whatever database API you pick, please learn what a SQL injection attack is and how to use user-supplied data in queries without opening yourself up to SQL injection. Your database API's documentation should cover the latter.
The / page of your Flask application will be based on a SQL query like SELECT * FROM surf_breaks. Render a link to the break-specific page for each one.
You'll have another page like /breaks/n where n identifies a surf break (an integer that increments as you insert surf break rows is customary). This page will be based on a query like SELECT * FROM observations WHERE surf_break_id = n. In each case, you'll call functions in your Data Access Layer for a list of rows, and then in a template, iterate through those rows and render some HTML. There are various Javascript and Python graphing libraries you can feed this list of rows into and get graphs out of (client side or server side). If you're interested in something like a week-over-week change, you should be able to express that in one SQL query and get that dataset directly from the database engine.
For performance, try not to get in a situation where more than one SQL query happens during a page load. By default, you'll be doing some unnecessary work by going back to the database and recomputing the page every time someone requests it. If this becomes a problem, you can add a reverse proxy cache in front of your Flask app. In your case this is easy, since nothing users do to the app cause its content to change. Simply invalidate the cache when you import new data.

Python or SQL: Populating an excel form (multiple times and saving outputs) from another table

Problem: Customer has requested we fill out a form (excel) for each item we provide them. Due to us providing them a large amount of parts, I would like to figure out a way to automate it as much as possible.
Idea: Create a table ('Data') with each part number and relevant information in the columns. Use Python to read 'Data' table, open blank customer form, populate blank customer form, and then save customer form.
Questions:
Can SQL accomplish this task as well? In relation to this task, I've only really created flat table outputs with SQL. Not really sure how this would work.
Recommended Python packages / documentation?
Similar example with code available? Just helps me learn being able to walk through something.
Any other ideas? Maybe I am tackling this issue the wrong way.
I am just unsure of my best path of action.
You could create a simple table on your SQL system (PostgreSQL, MySQL), so you can add modify simply your items.
Then you can export your table in excel format as the customer wants with:
Copy (Select * From foo) To '/tmp/test.csv' With CSV DELIMITER ',';
You can also do it with python, but i think it's more complicated to update item with python, with a SQL system you could create and HTML/PHP front-end page making it more customizable.

Peewee adding columns on demand

I have an sqlite database which I use as a data storage file for an application I develop in python.
Now the development of new features requires me to define new fields in the database. Is there a way, with peewee, of loading a database file, which used the old table definition (without new field) without getting an SQLError: no such column error?
Like an automatic insertion of the new field with a default value in the database. This would make life a lot easier for having backwards compatibility with opening database files from previous versions.
I've written a web-based tool called sqlite-web that will allow you to manage your database schema using a GUI.
If you want to add columns on the fly in your Python code, check out peewee's migration extension: http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#schema-migrations

Multiple pandas users connecting to SQL DB

New to Pandas & SQL. Haven't found an answer specific to this config, and not sure if standard SQL wisdom applies when introducing pandas to the mix.
Doing a school project that involves ~300 gb of data in ~6gb .csv chunks.
School advised syncing data via dropbox, but this seemed impractical for a 4-person team.
So, current solution is AWS EC2 & RDS instance (MySQL, I think it'll be, 1 table).
What I wanted to confirm before we start setting it up:
If multiple users are working with (and occasionally modifying) the data, can this arrangement manage conflicts? e.g., if user A uses pandas to construct a dataframe from a query, are the records in that query frozen if user B tries to work with them?
My assumption is that the data in the frame are in memory, and the records in the SQL database are free to be modified by others until the dataframe is written back to the db, but I'm hoping that either I'm wrong or there's a simple solution here (like a random sample query for each user or something).
A pandas DataFrame object does not interact directly with the db. Once you read it in it sits in memory locally. You would have to use a method like DataFrame.to_sql to write your changes back to the MySQL DB. For more information on reading and writing to SQL tables, see the pandas documentation here.

Categories