I'm building a web application that uses SQLAlchemy to store and retrieve data. My goal is to update the SQLite database on a daily schedule in the background while the app is constantly running. My current approach works as follows:
The SQLite database is first initialized and built by the script initializedb.py, which reads through a series of text files and adds the relevant information as rows to a table in the database
The Pyramid app is then run and is accessible via localhost:6543
The user is then able to access a list read from the SQLite database, rendered using a Jinja2 template
My app will be running 24/7 so that the user can access this list at any time. Because the text files I initialize the database from are constantly updated, I want to update the database every day as well. My main question is this:
How would I automatically update the database on a daily basis using SQLAlchemy and Pyramid?
Should the code that updates the database periodically live in a script running separately from the app, or should it be part of the Pyramid code itself, such as in views.py?
Use cron to schedule regular tasks
Just use cron. Run your initialise code once per day to recreate the database.
If you need something more sophisticated you can use Celery for more advanced scheduling. But I think cron is the best place to start.
Should you make the database primary?
You should try to have only one canonical copy of your data. It sounds like you have text files and are 'importing' them into a database, but that the text files are updated regularly by some other process.
An alternative approach is to make the database the canonical version of the data. You could create an administrative interface in your app to update the database.
If the data come in via automatic processes, then perhaps you could create an import script to ingest the new data.
This could be done via a command-line script. Just add this kind of thing to your setup.py:
entry_points = """\
[paste.app_factory]
main = myapp:main
[console_scripts]
some_script = myapp.scripts.script:main
another_script = myapp.any_module:some_function
"""
I'm working on a personal project to segment some data from the Sendinblue API (a CRM service). Basically, what I'm trying to achieve is to generate a new score attribute for each user based on their emailing behavior. For that purpose, the process I've planned is as follows:
Get data from the API
Store in database
Analyze and segment the data with Python
Create and update the score attribute in Sendinblue every 24 hours
The API has a rate limit of 400 requests per minute, and we are talking about 100k records right now, which means I have to spend about 3 hours to get all the initial data (currently I'm using concurrent.futures for multiprocessing). After that, I plan to store and update only the records that have changed. I'm wondering if this is the best way to do it and which combination of tools is best for this job.
Right now I have all my scripts in Jupyter notebooks, and I recently finished my first Django project, so I don't know if I need a Django app for this one or if I should just connect the notebook to a database (PostgreSQL?), and if the latter is possible, which library I need to learn in order to run my script every 24 hours. (I'm a beginner.) Thanks!
I don't think you need Django unless you want a web interface to view your data. Even then, you can write a web application to view your statistics with any framework/language. So I think a simpler approach is:
1) Create your Python project; its entry-point main function will execute the logic to fetch data from the API (see the sketch after this list). Once that's done, run the logic to analyze the data and compute the statistics, then save the results in the database.
2) If you can view your final results by querying them with SQL, you don't need to build a web application. Otherwise, you might want to build a small web application that pulls data from the database to view the statistics in charts or export them in any preferred format.
3) Set up a Linux cron job to execute the Python code from #1 and let it run every 24 hours at the particular time you want. Link: https://phoenixnap.com/kb/set-up-cron-job-linux
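To make #1 a bit more concrete, here is a rough sketch of such an entry point, not a definitive implementation: it assumes a placeholder Sendinblue contacts endpoint, a made-up scoring rule, and a hypothetical PostgreSQL table named contact_scores with a unique email column, so check the real API docs for the exact paths, parameters and pagination.

import requests
import psycopg2

API_KEY = 'your-api-key'          # assumption: keep the real key in env/config
BASE_URL = 'https://api.sendinblue.com/v3'


def fetch_contacts(limit=500, offset=0):
    # Placeholder call; adjust endpoint/params/pagination to the real API docs.
    resp = requests.get(
        f'{BASE_URL}/contacts',
        headers={'api-key': API_KEY},
        params={'limit': limit, 'offset': offset},
    )
    resp.raise_for_status()
    return resp.json().get('contacts', [])


def compute_score(contact):
    # Placeholder scoring rule based on emailing behaviour.
    stats = contact.get('statistics', {})
    return len(stats.get('clicked', [])) * 2 + len(stats.get('opened', []))


def main():
    conn = psycopg2.connect('dbname=crm user=crm')   # hypothetical DSN
    with conn, conn.cursor() as cur:
        for contact in fetch_contacts():
            # Upsert so only changed records are effectively updated.
            cur.execute(
                """
                INSERT INTO contact_scores (email, score)
                VALUES (%s, %s)
                ON CONFLICT (email) DO UPDATE SET score = EXCLUDED.score
                """,
                (contact['email'], compute_score(contact)),
            )


if __name__ == '__main__':
    main()

A single crontab line pointing at this script once a day then covers step 3).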
I have a processing engine built in Python and a driver program with several components that uses this engine to process some files.
See the pictorial representation here.
The Engine is used for math calculations.
The Driver program has several components.
Scanner keeps scanning a folder to check for new files; if one is found, it makes an entry in the DB by calling an API.
Scheduler picks up new entries made by the scanner and schedules them for processing (makes an entry in the 'jobs' table in the DB).
Executor picks entries from the jobs table, executes them using the engine, and outputs new files.
All the components run continuously as separate Python processes. This is very inefficient; how can I improve it? Django is used to provide a DB (so the multiple processes can communicate) and to keep a record of how many files have been processed.
Next came a new requirement to manually check the processed files for errors, so a UI was developed for this. Also, access to the engine was to be made API-based. See the new block diagram here.
Now the entire thing is a huge mess, in my opinion. For a start, Django now has to serve two different sets of APIs: one for the UI and another for the driver program. If the server stops, both the UI and the driver program stop working.
Since the engine is API-based, a huge amount of data is passed to it in each request. The engine takes several minutes (3 to 4) to process the files, and most of the time the request to the engine times out. The driver program is started separately from the terminal, and it fails if the Django server is not running, since the DB APIs are required to schedule and execute the jobs.
I want to ask what is the best way to structure such projects.
Should I keep the engine and driver program logic inside Django? In that case, how do I start the driver program?
Or should I keep both of them outside Django, in which case how do I communicate with Django so that I can keep processing files even if the Django server is down?
I would really appreciate any sort of improvement ideas in any of the areas.
I am trying to write a Django app that queries a remote database for some data, performs some calculations on a portion of this data and stores the results (in the local database using Django models). It also filters another portion and stores the result separately. My front end then queries my Django database for these processed data and displays them to the user.
My questions are:
How do I write an agent program that runs continuously in the backend, downloads data from the remote database, does the calculations/filtering and stores the results in the local Django database? In particular, what are the most important things to keep in mind when writing a program that runs indefinitely?
Is using cron for this purpose a good idea?
The data retrieved from the remote database belong to multiple users, and each user's data must be kept/stored separately in my local database as well. How do I achieve that? Using row-level/class-instance-level permissions, maybe? Remember that the backend agent does the storing, updating and deleting; the front end only reads data (through HTTP requests).
And finally, I allow the creation of new users. If a new user has valid credentials for the remote database, the user should be allowed to use my app, in which case my backend will download this particular user's data from the remote database, perform the calculations/filtering and present the results to the user. How can I handle the dynamic creation of objects/database tables for the new users? And how can I differentiate between users' data when retrieving it?
Would very much appreciate answers from experienced programmers with knowledge of Django. Thank you.
For your questions:
1) The standard go-to solution for timed and background tasks is Celery, which has Django integration. There are others, like Huey: https://github.com/coleifer/huey
2) The usual solution is that each row contains a user_id column indicating which user the data belongs to. This maps to the User model using Django ORM's ForeignKey field (see the sketch after this list). Do your users need to query the database directly, or do they have direct database accounts? If not, then this solution should be enough. It sounds like your front end has one database connection and all permission logic is handled by the front end, not by the database itself.
3) See 2
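To sketch what 1) and 2) can look like together, assuming a Django app called metrics and Celery already configured (all model, field and task names below are illustrative only): each imported row carries a ForeignKey to the owning user, and a periodic task refreshes the data.

# metrics/models.py
from django.conf import settings
from django.db import models


class RemoteRecord(models.Model):
    # Row-level ownership: every record points at the user it belongs to.
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    payload = models.JSONField()
    computed_value = models.FloatField(null=True)
    fetched_at = models.DateTimeField(auto_now=True)


# metrics/tasks.py
from celery import shared_task


@shared_task
def refresh_user_data(user_id):
    # Placeholder: pull this user's rows from the remote database,
    # run the calculations/filtering, and store the results locally.
    ...

The front end then only ever queries RemoteRecord.objects.filter(user=request.user), and Celery beat (or Huey's periodic tasks) calls refresh_user_data on whatever schedule you need.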
I am working on a web app written using the Pyramid web framework, with MySQL to store the relational stuff. But the web app is also a data storage facility, and we use Postgres for that purpose.
Note that each user's account gets its own connection parameters in Postgres; the hosts running Postgres are not necessarily the same for all users.
We have a couple of stored procedures that are essential for the app to function. I was wondering how to ship the procedures to each Postgres database instance. I would like to make sure that it is pretty easy to update them as well.
Here is what I've come up with so far.
I have a file in the app's code base called procedures.sql:
CREATE FUNCTION {procedure_name_1} (text, text,
                                    max_split integer) RETURNS text AS $$
BEGIN
    -- do stuff --
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION {procedure_name_2} (text, text,
                                    max_split integer) RETURNS text AS $$
BEGIN
    -- do stuff --
END;
$$ LANGUAGE plpgsql;
Whenever a user wants to talk to their DB, I execute the _add_procedures_to_db function from the Python app:
procedure_name_map = {
    'procedure_name_1': 'function_1_v1',
    'procedure_name_2': 'function_2_v1'
}

def _add_procedures_to_db(connection):
    cursor = connection.cursor()
    with open(PROCEDURE_FILE_PATH) as f:
        # Substitute the versioned function names into the SQL template.
        sql = f.read().format(**procedure_name_map)
    try:
        cursor.execute(sql)
        connection.commit()
    except Exception:
        # CREATE FUNCTION fails once the functions already exist,
        # so the error is silently swallowed here.
        pass
Note that the connection parameters are obtained from the MySQL DB when we want to perform a transaction within the web response cycle.
The strategy is to change function_1_v1 to function_1_v2 whenever I update the code for the procedure.
But this seems like an expensive way to do it, as every time I want to connect after the first, I will get an exception that has to be handled.
So here is my question:
Is there another way to do this from within the web app code? Or should I make procedure updates a part of deployment and configuration rather than an app layer thing?
If you are looking for how to change the database (tables, views, stored procedures) between deployments of different Pyramid web application versions, that's usually called a migration.
If you are using SQLAlchemy, you can use an automated migration tool like Alembic.
If you are using raw SQL commands, then you need to write and run a custom command-line script each time you deploy a different application version. This command-line script would prepare the database for the current application version; this would include running ALTER TABLE SQL commands, etc.
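If you do go the Alembic route, a stored procedure can be shipped like any other schema change by executing raw SQL inside a migration. A minimal sketch (the revision identifiers and the function body are placeholders):

from alembic import op

# placeholder revision identifiers; Alembic generates real ones
revision = 'abc123'
down_revision = None


def upgrade():
    op.execute("""
        CREATE OR REPLACE FUNCTION my_function(a text, b text, max_split integer)
        RETURNS text AS $$
        BEGIN
            -- do stuff --
            RETURN a;
        END;
        $$ LANGUAGE plpgsql;
    """)


def downgrade():
    op.execute("DROP FUNCTION IF EXISTS my_function(text, text, integer)")

Note that PostgreSQL's CREATE OR REPLACE FUNCTION also sidesteps the "already exists" exception mentioned in the question, whether you run it from a migration or from the app.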
I'm building a web application in Pyramid and it needs user logins. The database backend is a MySQL DB connected via SQLAlchemy.
Pyramid has an introduction on using Beaker for sessions, but it only shows how to configure it using files. I couldn't find out how to store session data in the database (I think it should be possible), since then I would have only one place where my varying data is stored.
I found it. Put something like this in your configuration file (development.ini/production.ini):
session.type=ext:database
session.secret=someThingReallyReallySecret
session.cookie_expires=true
session.key=WhatEver
session.url=mysql://user:password@host/database
session.timeout=3000
session.lock_dir=%(here)s/var/lock
I don't know if it is possible (or sensible) to put the locking in the DB too, but the sessions should live in the DB like this. You'll need to take care of deleting old sessions from the DB yourself (but I think that's the case when using files, too).
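For completeness, here is roughly how those settings get wired into the app with pyramid_beaker (a sketch; the rest of the Configurator setup is elided):

from pyramid.config import Configurator
from pyramid_beaker import session_factory_from_settings


def main(global_config, **settings):
    config = Configurator(settings=settings)
    # Build the Beaker session factory from the session.* settings above.
    session_factory = session_factory_from_settings(settings)
    config.set_session_factory(session_factory)
    # ... routes, views, etc. ...
    return config.make_wsgi_app()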