I have an application (running in Docker and managed by Marathon) that uses server-side Flask sessions with FileSystemSessionInterface (permanent sessions).
My problem is that if the user waits too long to go to the next step, the session data is lost.
One of my assumptions was that this is caused by Marathon, which health-checks the application with an HTTP GET request every 2 seconds. Each of those requests opens a new session file, so I suspected that the maximum number of open files was being reached. However, when I check inside the Docker container how many session files are open, the number is not that big, around 350.
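For reference, my session setup is roughly the following (the directory and lifetime values are placeholders, not my exact configuration):

from datetime import timedelta
from flask import Flask
from flask_session import Session

app = Flask(__name__)
app.config["SESSION_TYPE"] = "filesystem"                        # FileSystemSessionInterface
app.config["SESSION_PERMANENT"] = True
app.config["SESSION_FILE_DIR"] = "/tmp/flask_session"            # placeholder path
app.config["SESSION_FILE_THRESHOLD"] = 500                       # default; if I read the docs right, old session files get pruned once this many exist
app.config["PERMANENT_SESSION_LIFETIME"] = timedelta(hours=12)   # placeholder lifetime
Session(app)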
Has anyone had this problem? Any ideas on why my session data disappears?
I have a processing engine built in Python, and a driver program with several components that uses this engine to process some files.
See the pictorial representation here.
The Engine is used for math calculations.
The Driver program has several components.
Scanner keeps scanning a folder for new files; when one is found, it makes an entry in the DB by calling an API.
Scheduler picks up new entries made by the Scanner and schedules them for processing (it makes an entry in the 'jobs' table in the DB).
Executer picks entries from the jobs table, executes them using the engine, and outputs new files.
All the components run continuously as separate Python processes. This is very inefficient; how can I improve it? Django is used to provide a DB (so the multiple processes can communicate) and to keep a record of how many files are processed. A rough sketch of the Scanner loop as it works today is shown below.
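To make the current setup concrete, the Scanner is essentially a polling loop like this sketch (the folder path and API endpoint are placeholders, not the real ones):

import os
import time

import requests

WATCH_DIR = "/data/incoming"                    # placeholder folder being watched
API_URL = "http://localhost:8000/api/files/"    # placeholder Django API endpoint

seen = set()
while True:
    for name in os.listdir(WATCH_DIR):
        path = os.path.join(WATCH_DIR, name)
        if os.path.isfile(path) and path not in seen:
            # register the new file in the DB through the Django API
            requests.post(API_URL, json={"path": path}, timeout=10)
            seen.add(path)
    time.sleep(5)                               # poll interval

The Scheduler and Executer are similar loops that poll the DB through the API instead of a folder.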
Next came a new requirement to manually check the processed files for errors, so a UI was developed for this. Also, access to the engine was to be made API based. See the new block diagram here.
Now the entire thing is a huge mess, in my opinion. For a start, Django now has to serve 2 different sets of APIs: one for the UI and one for the driver program. If the server stops, both the UI and the Driver program stop working.
Since the engine is API based, a huge amount of data is passed to it in each request. The Engine takes several minutes (3 to 4) to process the files, and most of the time the request to the engine times out. The Driver program is started separately from the terminal, and it fails if the Django server is not running, since the DB APIs are required to schedule and execute the jobs.
I want to ask what the best way to structure such projects is:
Should I keep the Engine and driver program logic inside Django? In that case, how do I start the driver program?
Or should I keep both of them outside Django? In that case, how do I communicate with Django so that even if the Django server is down I can still keep processing the files?
I would really appreciate any sort of improvement ideas in any of the areas.
I have been working on a website for over a year now, using Django and Python3 primarily. A few of my buddies and I built a front end where a user enters some parameters and submits; this goes to GAE to run the job and return the results.
In my local dev environment, everything works well. I have two separate dev environments. One builds the entire service up in a Docker container; this produces the desired results in roughly 11 seconds. The other environment runs the source files locally on my computer and connects to the Postgres database hosted in Google Cloud, with the Python application running locally. It takes roughly 2 minutes to run this way, since there is a lot of latency between the cloud and the posts/gets from my local machine.
Once I perform the gcloud app deploy and attempt to run it in production, it never finishes. I have some print statements built into the code, so I know it gets to the part where the submitted parameters reach the Python code. I monitor via this command on my local computer: gcloud app logs read.
I suspect that since my local computer is a beast (i7-7770 processor with 64 GB of RAM), it runs the whole thing no problem. But in the GAE, I don't think it's providing the proper machines to do the job efficiently (not enough compute, not enough RAM). That's my guess.
So, I need help in how to troubleshoot this. I tried changing my app.yaml file so that resources would scale to 16 GB of memory, but it would never deploy. I received an error 13.
One other note, after it spins around trying to run the job for 60 minutes, the website crashes and displays this message:
502 Server Error
Error: Server Error
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
OK, so just in case anybody in the future has a similar problem: the constant crashing of my Google App Engine workers was caused by using Pandas DataFrames in the production environment. I don't know exactly what Pandas was doing, but I kept getting MemoryErrors that would crash the site, and they didn't appear to come from a single line of code. That is, they happened randomly somewhere in a Pandas DataFrame operation.
I am still using a Pandas DataFrame simply to read in a CSV file. I then use

import pandas as pd

df = pd.read_csv("input.csv")  # hypothetical filename
data_I_care_about = dict(zip(df.col1, df.col2))  # build a dict from two columns
# or pull a single column out as a plain list:
other_data = df.col3.values.tolist()
and then go to town with the processing. As a note, on my local machine (my development environment, basically) it took 6 seconds to run from start to finish. That's a long time for a web request, but I was in a hurry, which is why I used Pandas to begin with.
After refactoring, the same job completed in roughly 200 ms using Python lists and dicts (again, in my dev environment). The website is up and running very smoothly now. It takes a maximum of 7 seconds after pressing "Submit" for the back-end to return the data sets and render them on the web page. Thanks for the help, peeps!
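For anyone curious, the refactor basically amounted to reading the CSV with the standard library instead of Pandas. A rough sketch (the filename and column names are the same hypothetical ones as above, not my real schema):

import csv

data_I_care_about = {}
other_data = []

with open("input.csv", newline="") as f:        # hypothetical filename
    for row in csv.DictReader(f):
        # same shapes as before: a col1 -> col2 mapping and a plain list of col3 values
        data_I_care_about[row["col1"]] = row["col2"]
        other_data.append(row["col3"])

(One difference to keep in mind: the csv module gives you strings, so any numeric conversion Pandas did for free has to be done explicitly.)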
Is it possible to switch Flask session backends on a per-route basis?
My site stores all sessions in Redis with RedisSessionInterface. Each time a new visitor comes, a new session gets written to Redis, so when thousands of visitors (real or not) arrive within a few hours, I run out of RAM because Redis eats it all.
I really only need sessions for internal pages (/administration_area) accessible by me, so I don't want a session to be started each time a visitor hits the index page (/).
Is it possible to start a new session only on specific routes, or can the RedisSessionInterface implementation be changed somehow so that it won't open a new session on every route?
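Something like the wrapper below is what I have in mind, though I don't know if it is the right approach. It is only a sketch against Flask-Session's Redis backend; /administration_area is my admin prefix:

from flask import Flask
from flask_session import Session

app = Flask(__name__)
app.config["SESSION_TYPE"] = "redis"    # plus whatever SESSION_REDIS setting is already in use
Session(app)                            # installs the Redis-backed interface on app.session_interface

class AdminOnlySessionInterface:
    """Use the real Redis-backed interface only under /administration_area."""

    def __init__(self, redis_interface):
        self._redis = redis_interface

    def __getattr__(self, name):
        # everything not overridden (save_session, is_null_session, make_null_session, ...) falls through
        return getattr(self._redis, name)

    def open_session(self, app, request):
        if request.path.startswith("/administration_area"):
            return self._redis.open_session(app, request)
        # Returning None makes Flask fall back to a NullSession for this request,
        # so nothing gets written to Redis when the response is finalized.
        return None

app.session_interface = AdminOnlySessionInterface(app.session_interface)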
Introduction
I am working on a GPS Listener. It is a service built on Twisted Python; the app receives at least 100 connections from GPS devices and is working without issues. Each GPS sends data every 5 seconds, containing positions. (Next week there must be at least 200 GPS devices connected.)
Database
I am using a single PostgreSQL connection, shared between all connected GPS devices, to save and store the information. PostgreSQL sits behind PgBouncer as a connection pooler.
Server
I am using a small PC as the server, and I need to find a way to make the application highly available without losing data.
Problem
Given the high traffic on my app, I am having issues: after 30 minutes, data starts to appear as not saved, even though the queries are being executed on Postgres (I have checked the last activity).
Fake Solution
I have made a script that restarts my app, Postgres and PgBouncer. However, this is the wrong solution, because each time I restart my app the GPS devices get disconnected and have to reconnect again.
Possible Solution
I am thinking of a high-availability solution based on a data layer, where each time the database has to be restarted, or something else goes wrong, a txt file stores the data from the GPS devices.
To get there, I am thinking of dropping the single shared connection and instead opening a simple connection each time a record must be saved, testing the database first (like a pooler would). If the database connection is broken, the record is stored in the txt file until the database is OK again, and another process reads the txt file and sends the data to the database.
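Just to make the idea concrete, a minimal sketch of that fallback layer (write_to_db is a hypothetical callable standing in for the real INSERT; the file name is a placeholder):

import json

FALLBACK_FILE = "pending_positions.txt"    # placeholder path

def save_or_buffer(record, write_to_db):
    """Try the database first; if the write fails, buffer the record on disk."""
    try:
        write_to_db(record)                # hypothetical callable that does the actual INSERT
    except Exception:
        # database (or pgbouncer) unreachable: keep the record for a later replay
        with open(FALLBACK_FILE, "a") as f:
            f.write(json.dumps(record) + "\n")

def replay_buffered(write_to_db):
    """Run separately once the database is OK again (a real version must also clear replayed lines)."""
    with open(FALLBACK_FILE) as f:
        for line in f:
            write_to_db(json.loads(line))

(In the real Twisted service the blocking database call would also have to run off the reactor thread, e.g. via twisted.internet.threads.deferToThread.)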
Question
Since I am thinking of an app-level data pooler and a single connection each time data must be saved, to try not to lose data, I want to know:

Is it OK to make a single connection each time data is saved for this kind of app, knowing that connections will be made more than 100 times every 5 seconds?
As I said, my question is quite simple: which is the right way to work with DB connections in a high-traffic app, single connections per query or a single shared connection for the whole app?
The reason for asking this single question is that I am looking for the right way to work with DB connections while keeping memory resources in mind.
I am not looking to solve PostgreSQL issues or performance problems, just to know the right way to work with this kind of application. That is the reason I have given as much detail as possible about my application.
Note
One more thing: I have seen one vote to close this question as unclear, even though the question is under the heading "Question" and was marked in italics. I have now marked it in gray so that people who don't read the word "question" will notice it.
Thanks a lot
Databases do not just lose data willy-nilly. Not losing data is pretty much number one in their job description. If it seems to be losing data, you must be misusing transactions in your application. Figure out what you are doing wrong and fix it.
Making and breaking a connection between your app and pgbouncer for each transaction is not good for performance, but is not terrible either; and if that is what helps you fix your transaction boundaries then do that.
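For illustration, the per-transaction pattern looks roughly like this (table, columns and connection parameters are placeholders; note the port is PgBouncer's, not Postgres's):

import psycopg2

def store_position(device_id, lat, lon):
    # short-lived connection through pgbouncer for a single transaction
    conn = psycopg2.connect(host="127.0.0.1", port=6432, dbname="gps", user="gps")
    try:
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO positions (device_id, lat, lon) VALUES (%s, %s, %s)",
            (device_id, lat, lon),
        )
        conn.commit()    # explicit transaction boundary; without it the row is rolled back on close
    finally:
        conn.close()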
So I want to serve some files with a Django + Nginx setup. The problem is that many of the files being served are huge, and users have a quota for downloading files.
So how can I find out how much of a file was actually served to the user? What I mean is that the user may close the download connection partway through, so how can I find the real number of bytes served?
thanks!
With Nginx you can log the amount of bandwidth used to a log file by using the log module (ngx_http_log_module). This is not exactly what you want, but it can help you achieve it.
So now you will have logs you can parse to get the bytes actually sent and the total bandwidth used. A script can then update a database, and you can authorize or refuse future downloads depending on whether the user's limit is reached or still within some soft-limit range.
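For example, assuming a custom log_format that writes a user identifier and $bytes_sent as the first two space-separated fields, a parsing script could look roughly like this (field positions and the quota helper are assumptions):

import collections

def tally_bytes(access_log_path):
    """Sum $bytes_sent per user from an access log in the assumed 'user bytes_sent ...' format."""
    usage = collections.defaultdict(int)
    with open(access_log_path) as log:
        for line in log:
            fields = line.split()
            if len(fields) >= 2 and fields[1].isdigit():
                usage[fields[0]] += int(fields[1])
    return usage

# then feed the totals into the quota table, e.g.:
# for user, total in tally_bytes("/var/log/nginx/downloads.log").items():
#     update_user_quota(user, total)    # hypothetical Django-side helper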
Server Fault with similar question
Another option (making an assumption about your setup) is to keep the file sizes in a database and keep a tally at request time: whenever a user hits a download link, it immediately increments their download count; if they are over their limit you invalidate the link, otherwise you make the link valid and pass them over to Nginx.
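One common way to do that hand-off to Nginx is an X-Accel-Redirect response from the Django view; a rough sketch, assuming an internal nginx location at /protected/ and hypothetical quota helpers:

from django.http import HttpResponse, HttpResponseForbidden

def download(request, filename):
    # over_quota / record_download are hypothetical helpers backed by your quota table
    if over_quota(request.user, filename):
        return HttpResponseForbidden("Download quota exceeded")
    record_download(request.user, filename)

    # empty body; nginx intercepts the header and serves the file itself from e.g.
    #   location /protected/ { internal; alias /srv/files/; }
    response = HttpResponse()
    response["X-Accel-Redirect"] = "/protected/" + filename
    return response

Note this counts the whole file against the quota even if the client disconnects midway, so you would still want to reconcile against the logs from the first option.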
Another option would be to write a custom Nginx module that performs the increment at a much more fine-grained level, but this could be more work than your situation requires.