Right way to manage a high traffic connection application - python

Introduction
I am working on a GPS listener, a service built on Twisted (Python). The app receives at least 100 connections from GPS devices and works without issues. Each GPS sends data containing positions every 5 seconds. (Next week there must be at least 200 GPS devices connected.)
Database
I am using a single PostgreSQL connection, shared between all connected GPS devices, to save and store information. PostgreSQL uses pgbouncer as a pooler.
Server
I am using a small PC as the server, and I need to find a way to have a highly available application without losing data.
Problem
With the high traffic on my app, I am having memory issues: after about 30 minutes, data starts to appear as not saved, even though the queries are still being executed on Postgres (I have checked that via the last activity).
Fake Solution
I have made a script that restarts my app, Postgres and pgbouncer. However, this is a wrong solution, because each time I restart my app the GPS devices get disconnected and must reconnect.
Possible Solution
I am thinking of a high availability solution based on a data layer, where each time the database has to be restarted or something goes wrong, a txt file stores the data from the GPS devices.
To get there, I am thinking of dropping the single shared connection and opening a simple connection each time a piece of data must be saved, testing the database first, like a pooler does. If the database connection fails, the txt file stores the data until the database is OK again, and another process reads the txt file and sends the info to the database.
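The fallback described above could be sketched roughly as follows. This is only an illustration under assumptions: `save_to_db` is a hypothetical callable standing in for the real database write, the spool file name is made up, and one JSON document per line is an arbitrary choice of format.

```python
import json
import os


def store_position(record, save_to_db, spool_path="gps_spool.txt"):
    """Try the database first; on failure, append the record to a text file."""
    try:
        save_to_db(record)  # hypothetical callable doing the real INSERT and COMMIT
        return True
    except Exception:
        # Database unavailable: keep the record on disk, one JSON document per line.
        with open(spool_path, "a", encoding="utf-8") as spool:
            spool.write(json.dumps(record) + "\n")
        return False


def replay_spool(save_to_db, spool_path="gps_spool.txt"):
    """Send spooled records to the database; keep the ones not yet saved."""
    if not os.path.exists(spool_path):
        return 0
    with open(spool_path, encoding="utf-8") as spool:
        records = [json.loads(line) for line in spool if line.strip()]
    sent, remaining = 0, []
    for position, record in enumerate(records):
        try:
            save_to_db(record)
            sent += 1
        except Exception:
            # Stop at the first failure and keep everything from here on.
            remaining = records[position:]
            break
    # Rewrite the spool so it only holds records that are still unsaved.
    with open(spool_path, "w", encoding="utf-8") as spool:
        for record in remaining:
            spool.write(json.dumps(record) + "\n")
    return sent
```

The sketch ignores records that might be appended while a replay is running, and it does blocking file I/O, so in a Twisted service it would probably need to run outside the reactor thread.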
Question
Since I am thinking of an application-level data pooler and a new connection each time data must be saved, so as not to lose data, I want to know:
Is it OK to open a new connection each time data is saved for this kind of app, knowing that this will happen more than 100 times every 5 seconds?
As I said, my question is simple: what is the right way to work with DB connections in a high-traffic app? A separate connection per query, or one shared connection for the whole app?
The reason for asking this single question is to find the right way to work with DB connections considering memory resources.
I am not looking to solve PostgreSQL issues or performance, just to know the right way to work with this kind of application. That is why I have given as much detail as possible about my application.
Note
One more thing: I have seen one vote to close this question for being unclear, even though the question is headed with the word "Question" and was marked in italics. I have now marked it in gray so that people who missed the word "Question" will notice it.
Thanks a lot

Databases do not just lose data willy-nilly. Not losing data is pretty much number one in their job description. If it seems to be losing data, you must be misusing transactions in your application. Figure out what you are doing wrong and fix it.
Making and breaking a connection between your app and pgbouncer for each transaction is not good for performance, but is not terrible either; and if that is what helps you fix your transaction boundaries then do that.
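To illustrate what a misplaced transaction boundary looks like, here is a minimal sketch. It uses the standard-library `sqlite3` module purely as a stand-in so that it runs anywhere; the table and column names are made up, and the asker's real code and PostgreSQL driver are not known. The point it shows is general: a statement can execute, yet its row stays invisible to every other connection until the transaction is committed.

```python
import os
import sqlite3
import tempfile

# sqlite3 is only a stand-in here; the same commit rule applies to PostgreSQL.
db_path = os.path.join(tempfile.mkdtemp(), "positions.db")

writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE positions (device_id TEXT, lat REAL, lon REAL)")
writer.commit()

reader = sqlite3.connect(db_path)


def count_rows(conn):
    """Count rows as seen by the given connection."""
    return conn.execute("SELECT COUNT(*) FROM positions").fetchall()[0][0]


# The INSERT is executed, but the transaction it opened is still pending.
writer.execute("INSERT INTO positions VALUES (?, ?, ?)", ("gps-1", 4.6, -74.1))
visible_before_commit = count_rows(reader)

# Only the commit makes the row visible to other connections.
writer.commit()
visible_after_commit = count_rows(reader)

print(visible_before_commit, visible_after_commit)
```

With one long-lived shared connection it is easy to leave a transaction open like this, which would match the symptom of queries showing up as activity while the data appears not to be saved.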

Related

How can I optimize an alert system that processes 10k requests / job?

I'm building a solution, Match Service, that receives data from a third-party provider through an MQTT server. This is real-time data. We save this data in an RDS cluster.
Our users can create, in another service, a filter called a Strategy. We send a cron trigger to this service every 5 minutes, and all Strategy records in the database are sent to a Kafka topic to be processed in Match Service.
My design is based on events, so for each new Strategy record in the topic, Match Service performs a query in the database to check whether any Match activates the Strategy threshold. If the threshold is passed, it sends a new message to the broker.
The API processes about 10k Strategies in each job, and it is taking a long time (about 250s for each job).
So my question is whether there is a better way to design this system. I was thinking of adding a Redis layer to avoid database transactions.
All suggestions welcome!
Think long and hard about your relational data store. If you really need it to be relational, then it may absolutely make sense, but if not, a relational database is often a horrible place to dump things like time-series and IoT output. It's a great place to put normalized, structured data for reporting, but a lousy place for dump/load workloads and real-time matching.
Look more at something like Amazon Redshift, Elasticsearch, or some other NoSQL solution that can ingest and match things at orders of magnitude higher scale.
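Whichever store is used, part of the cost in the question comes from running one query per Strategy. As a rough sketch of the alternative, the latest Match values can be loaded once per job and all Strategies checked in memory. Everything here is hypothetical: the `Strategy` fields, the `latest_values` mapping and the "value at or above threshold" rule are assumptions, since the question does not describe the data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Strategy:
    """Hypothetical shape of a user-defined filter."""
    strategy_id: int
    metric: str
    threshold: float


def triggered_strategies(strategies, latest_values):
    """Return (strategy_id, match_id) pairs whose metric reached the threshold.

    latest_values maps (match_id, metric) to the latest value and is
    assumed to be loaded once per job instead of once per Strategy.
    """
    # Index the snapshot by metric so each Strategy only scans relevant values.
    by_metric = defaultdict(list)
    for (match_id, metric), value in latest_values.items():
        by_metric[metric].append((match_id, value))

    hits = []
    for strategy in strategies:
        for match_id, value in by_metric.get(strategy.metric, []):
            if value >= strategy.threshold:
                hits.append((strategy.strategy_id, match_id))
    return hits
```

The snapshot could equally live in a cache such as the Redis layer mentioned in the question; the sketch only shows the batching idea.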

web service for recommendation system

I'm trying to build a recommendation system in Python using the LightFM library and an API created with the Flask framework.
My question is more design related than coding.
The web service, which will be called when a user logs in to the website, receives a JSON with a userid and returns a JSON with the userid and 5 product SKUs to be recommended.
I would like to save those recommendations in a DB. That way I can compare this table with other tables in the DB and find out whether a user has purchased a product that I recommended.
My concern (maybe it's stupid) is that everything will slow down if I open a connection to the DB and write data to it.
Potentially the service can be called between 5k and 7k times per day.
Thanks
What I've understood from your explanation is that you will be comparing the products actually selected by the user with the ones you recommended. So, assuming you run that comparison about once a week, it won't affect your processing much.
Your concern is whether everything would slow down if a DB connection is opened.
It won't slow down the service. At a usage of 5k calls per day, other factors are more likely to slow the service down or cause it to stop. For example, when the number of concurrent users is too high, a single Python process will fail.
What you need to do here is use a web application server such as Gunicorn or uWSGI (see "Using Gunicorn with Flask").
Gunicorn starts multiple Python processes running Flask, so it will support a high number of concurrent users.
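For a sense of how small the write itself is, here is a minimal sketch of storing the five recommended SKUs for one call in a single transaction. It uses the standard-library `sqlite3` module as a stand-in for whatever database the site uses, and the table name, column names and `save_recommendations` helper are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def save_recommendations(conn, user_id, skus):
    """Store one row per recommended SKU in a single transaction."""
    recommended_at = datetime.now(timezone.utc).isoformat()
    rows = [(user_id, sku, recommended_at) for sku in skus]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO recommendations (user_id, sku, recommended_at) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recommendations (user_id TEXT, sku TEXT, recommended_at TEXT)"
)
saved = save_recommendations(conn, "user-42", ["A1", "B2", "C3", "D4", "E5"])
```

Keeping the user id and a timestamp on each row is what later allows the table to be joined against purchases.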

Sending data to Django backend from RaspberryPi Sensor (frequency, bulk-update, robustness)

I’m currently working on a Raspberry Pi/Django project slightly more complex than I’m used to. (I either do local Raspberry Pi projects or simple Django websites; never the two combined!)
The idea is to have two Raspberry Pis collecting information by running a local Python script; each would take input from one HDMI feed (I’ve got all that part figured out, I THINK) using image processing. Now I want these two Raspberry Pis (which don’t talk to each other) to connect to a backend server that would combine, store (and process) the information gathered by my two Pis.
I’m expecting each Pi to work on one frame per second, compare it to the frame from a second earlier (there are only a few different things it is looking out for), isolate any new event, and send it to the server. I’m therefore expecting no more than a dozen binary timestamped data points per second.
Now what is the smart way to do it here ?
Do I contact the backend every second? Every 10 seconds?
How do I make these bulk HTTP requests? Through a POST request? Through a simple text file that I send for the Django backend to process? (I have found some info about “bulk updates” for Django but I’m not sure that covers it entirely.)
How do I make it robust? How do I make sure that all data was successfully transmitted before deleting the log locally? (If one call fails for some reason, or gets delayed, how do I make sure that the next one compensates for the lost info?)
Basically, I’m asking for advice on an IoT-based project where a sensor gathers bulk information and needs to send it to a backend server for processing, and on how that archiving process should be designed.
PS: I expect the image processing part (at one fps) to be fast enough on my Pi Zero (as it is VERY simple); backlog at that level shouldn’t be an issue.
PPS: I’m using a Django backend (even if it seems a little overkill)
a/ because I already know the framework pretty well
b/ because I’m expecting to build real-time performance indicators from the combined data points gathered, using Django, and to display them in (almost) real time on a webpage.
Thank you very much !
This partly depends on just how resilient you need it to be. If you really can't afford for a single update to be lost, I would consider using a message queue such as RabbitMQ - the clients would add things directly to the queue and the server would pop them off in turn, with no need to involve HTTP requests at all.
Otherwise it would be much simpler to just POST each frame's data in some serialized format (e.g. JSON); Django would simply deserialize it and iterate through the list, saving each entry to the db. This should be fast enough for the rate you describe (I'd expect saving a dozen db entries to take significantly less than half a second), but this still leaves the problem of what to do if things get hung up for some reason. Setting a super-short timeout on the server will help, as would keeping the data to be posted until you have confirmation that it has been saved, and creating unique IDs in the client to ensure that the request is idempotent.
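The last two ideas (keep data until it is confirmed, and use client-generated IDs so a repeated request is harmless) could be sketched as below. This is an illustration under assumptions, not a Django implementation: `Outbox`, `Server` and the `send` callable are hypothetical names, and the in-memory dictionaries stand in for a local log file on the Pi and a database table on the server.

```python
import uuid


class Outbox:
    """Client-side buffer: events stay here until the server confirms them."""

    def __init__(self):
        self.pending = {}  # event_id -> payload

    def add(self, payload):
        event_id = str(uuid.uuid4())  # unique ID generated on the client
        self.pending[event_id] = payload
        return event_id

    def flush(self, send):
        """Send everything pending; drop only what the server acknowledged.

        `send` is a hypothetical callable that takes a list of
        {"id": ..., "data": ...} dicts and returns the ids it stored.
        """
        batch = [{"id": i, "data": d} for i, d in self.pending.items()]
        if not batch:
            return 0
        try:
            acknowledged = send(batch)
        except Exception:
            return 0  # keep everything; the next flush retries
        for event_id in acknowledged:
            self.pending.pop(event_id, None)
        return len(acknowledged)


class Server:
    """Server-side stand-in: storing the same id twice has no extra effect."""

    def __init__(self):
        self.stored = {}

    def receive(self, batch):
        for event in batch:
            self.stored.setdefault(event["id"], event["data"])
        return [event["id"] for event in batch]
```

If a response is lost after the server has already stored a batch, the client simply sends the same IDs again on the next flush and the server ignores the duplicates.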

Keeping partly-offline sqlite db in sync with postgresql

This question is more on architecture and libs, than on implementation.
I am currently working on a project which requires a local long-term cache storage (updated once a day) on the client, kept in sync with a remote db on the server. For the client side SQLite has been chosen as a lightweight approach, and PostgreSQL as a feature-rich db on the server. Native replication mechanisms of Postgres are not an option because I need to keep the client really lightweight and free of reliance on external components like db servers.
The implementation language would be Python. Now I'm looking at ORMs like SQLAlchemy, but haven't worked with any before.
Does SQLAlchemy have any tools to keep sqlite and postgres dbs in sync?
If not, are there any other Python libraries which have such tools?
Any ideas about how the architecture should look if the task must be solved "by hand"?
Added:
It's like telemetry, because the client would have an internet connection for only approximately 20 minutes a day.
So, the main question is about the architecture of such a system.
It doesn't usually fall within the tasks of an ORM to sync data between databases, so you will likely have to implement it yourself. I don't know of any solution that will handle syncing for you given your choice of databases.
There are a couple of important design choices to consider:
how do you figure out what data changed (i.e. inserted, updated or deleted)?
what is the most efficient way to package the change log?
will you have to deal with conflicts, and if so, how?
The most efficient way to figure out what changed is to have the database tell you that directly. Bottled Water can offer some inspiration in this regard. The idea is to tap into the event log Postgres would use for replication. You will need something like Kafka to keep track of what each of your clients already knows. This will allow you to optimize your server for writes, as you won't have clients querying to figure out what changed since they were last online.
The same can be achieved on the SQLite end with event callbacks; you'll just have to trade some storage space on the client to retain the changes to be sent to the server. If that sounds like too much infrastructure for your needs, it's something that you can easily implement with SQL and polling as well, but I would still think of it as an event log, and consider how it's implemented a detail, possibly allowing for a more efficient implementation later on.
The best way to structure and package your change log will depend on your application's requirements, available bandwidth, etc. You could use standard formats such as JSON, and compress and encrypt if needed.
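The plain-SQL variant mentioned above (a log table that is queried periodically) might look roughly like this on the client. It is a sketch under assumptions: the `items` table, the `change_log` table and the trigger names are made up, and triggers are used here instead of Python-level callbacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);

-- Hypothetical event log: one row per change, in the order it happened.
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,
    item_id INTEGER NOT NULL
);

CREATE TRIGGER items_insert AFTER INSERT ON items
BEGIN
    INSERT INTO change_log (op, item_id) VALUES ('insert', NEW.id);
END;

CREATE TRIGGER items_update AFTER UPDATE ON items
BEGIN
    INSERT INTO change_log (op, item_id) VALUES ('update', NEW.id);
END;

CREATE TRIGGER items_delete AFTER DELETE ON items
BEGIN
    INSERT INTO change_log (op, item_id) VALUES ('delete', OLD.id);
END;
""")


def changes_since(conn, last_seq):
    """Return every change recorded after the last synced sequence number."""
    return conn.execute(
        "SELECT seq, op, item_id FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()


conn.execute("INSERT INTO items (id, name) VALUES (1, 'a')")
conn.execute("UPDATE items SET name = 'b' WHERE id = 1")
conn.execute("DELETE FROM items WHERE id = 1")
conn.commit()

pending = changes_since(conn, 0)
```

During the daily connection window the client would send `pending`, remember the highest `seq` the server confirmed, and prune log rows up to that point.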
It will be much simpler to design your application as such to avoid conflicts, and possibly flow data in a single direction, or partition your data so that it always flows in a single direction for a specific partition.
One final thought is that with such an architecture you would be getting incremental updates, some of which might be missed for unplanned reasons (system failure, bugs, dropped messages, etc.). You could have some built-in heuristic to check that your data matches, such as at least checking the number of records on each side, with some way to recover from such a fault: at a minimum, a way to manually re-fetch the data from the authoritative source, i.e. if the server is authoritative, the client should be able to discard its data and re-fetch it. You might need such a mechanism anyway for cases when the client is reinstalled, etc.

SQLite over a network

I am creating a Python application that uses embedded SQLite databases. The programme creates the db files and they are on a shared network drive. At this point there will be no more than 5 computers on the network running the programme.
My initial thought was to ask the user on startup whether they are the server or a client. If they are the server, they create the database. If they are a client, they must find a server instance on the network. One way, I suppose, is to send all db commands from the client to the server and have the server apply them to the database. Will that solve the shared db issue?
Alternatively, is there some way to create a SQLite "server"? I presume this would be the quicker option if available.
Note: I can't use a server engine such as MySQL or PostgreSQL at this point but I am implementing using ORM and so when this becomes viable, it should be easy to change over.
Here's a "SQLite Server", http://sqliteserver.xhost.ro/, but it looks like it has not been maintained for years.
SQLite supports concurrency itself: multiple processes can read data at the same time, but only one can write at a time. Also, while a process is writing, it locks the whole database file (normally for a few milliseconds) and the others have to wait in the meantime, according to the official documentation.
I guess this is sufficient for 5 processes as in your scenario. You just need to write code to handle the waiting.
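One way to handle that waiting is sketched below. The `timeout` argument of `sqlite3.connect` makes the driver itself wait for a lock, and a small retry loop covers the case where that wait is not enough. The function name and the retry numbers are arbitrary choices for illustration.

```python
import sqlite3
import time


def write_with_retry(db_path, sql, params=(), attempts=5, delay=0.2,
                     busy_timeout=1.0):
    """Run one write, retrying while another process holds the write lock.

    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        # `timeout` is how long sqlite3 itself waits on a locked database.
        conn = sqlite3.connect(db_path, timeout=busy_timeout)
        try:
            with conn:  # commit on success, roll back on error
                conn.execute(sql, params)
            return attempt
        except sqlite3.OperationalError as exc:
            # Only retry lock contention; re-raise anything else,
            # and give up once the last attempt has failed.
            if "locked" not in str(exc) or attempt == attempts:
                raise
            time.sleep(delay)
        finally:
            conn.close()
```

A call such as `write_with_retry(path, "INSERT INTO notes VALUES (?)", ("hello",))` then either succeeds within the allowed attempts or raises `sqlite3.OperationalError` for the caller to report.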
