Tools for initializing a database with GAE - python

I'm looking for a way to initialize a database with a set of preliminary data. I've been looking around, but I can't seem to find anything that exactly matches what I'd like to do. Any suggestions?

The Datastore Administration page lets you copy data to another app. But if you need this for testing your app locally, as I suppose you do, you could write a script.

You can also do this programmatically. Write a function that initializes your database and call it from inside a dedicated URL handler, e.g.
http://myapp.com/prepareDb
See here how to configure this URL so that only admins of the app have permission to access it.
An HTTP request coming from a user has 60 seconds to complete before it is aborted by App Engine. If, for some reason, you need more than 60 seconds to prepare your database, do it from a cron job (which has 10 minutes to finish).
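As a rough illustration of such an initialization function, here is a minimal sketch. It is written to be idempotent, so hitting the URL twice does not duplicate records. The names prepare_db and SEED_DATA are hypothetical, and the store argument is a plain dict standing in for the datastore; in a real handler you would replace the membership check and the assignment with your own datastore get and put calls.

```python
# Hypothetical seed records; replace with your own preliminary data.
SEED_DATA = [
    {"key": "admin", "role": "administrator"},
    {"key": "guest", "role": "viewer"},
]


def prepare_db(store, seed=SEED_DATA):
    """Insert seed records that are missing and return how many were written.

    `store` is a dict used as a stand-in for the real datastore (assumption).
    """
    written = 0
    for record in seed:
        if record["key"] not in store:  # skip records that already exist
            store[record["key"]] = dict(record)
            written += 1
    return written
```

Calling prepare_db from the admin-only handler, and again from a cron job if needed, should then be safe to repeat.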

Related

Best approach to an API update project

I'm working on a personal project to segment some data from the Sendinblue API (a CRM service). Basically, what I'm trying to achieve is to generate a new score attribute for each user based on their emailing behavior. For that purpose, the process I've planned is as follows:
Get data from the API
Store in database
Analyze and segment the data with Python
Create and update the score attribute in Sendinblue every 24 hours
The API has a rate limit of 400 requests per minute. We are talking about 100k records right now, which means I have to spend about 3 hours to get all the initial data (currently I'm using concurrent.futures for multiprocessing). After that I plan to store and update only the records that have changed. I'm wondering if this is the best way to do it and which combination of tools is best for this job.
Right now I have all my scripts in Jupyter notebooks, and I recently finished my first Django project, so I don't know if I need a Django app for this one or can simply connect the notebook to a database (PostgreSQL?). If the latter is possible, which library do I have to learn to run my script every 24 hours? (I'm a beginner.) Thanks!
I don't think you need Django unless you want a web interface to view your data. Even then, you can write a web application to view your statistics with any framework or language. So I think a simpler approach is:
1. Create your Python project. The entry point (a main function) executes the logic to fetch data from the API. Once that's done, it runs the analysis and statistics logic and saves the result in the database.
2. If you can view your final result with SQL queries, you don't need to build a web application. Otherwise you might want to build a small web application that pulls data from the database to show statistics in charts or export them in whatever format you prefer.
3. Set up a Linux cron job to execute the Python code from #1 and let it run every 24 hours at the particular time you want. Link: https://phoenixnap.com/kb/set-up-cron-job-linux
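The steps above can be sketched roughly as follows. This is only an outline under assumptions: fetch_contacts is a hypothetical stand-in for the rate-limited API calls, the scoring rule is made up for illustration, and sqlite3 from the standard library is used in place of PostgreSQL so the example runs without extra setup.

```python
import sqlite3


def fetch_contacts():
    # Hypothetical stand-in for the Sendinblue API calls (assumption).
    return [
        {"email": "a@example.com", "opens": 8, "clicks": 2},
        {"email": "b@example.com", "opens": 1, "clicks": 0},
    ]


def score(contact):
    # Made-up scoring rule, for illustration only.
    return contact["opens"] + 3 * contact["clicks"]


def store_scores(conn, contacts):
    """Create the table if needed and upsert one score per contact."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores (email TEXT PRIMARY KEY, score INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO scores (email, score) VALUES (?, ?)",
        [(c["email"], score(c)) for c in contacts],
    )
    conn.commit()


def main(db_path="scores.db"):
    # Entry point that a cron job would call once every 24 hours.
    conn = sqlite3.connect(db_path)
    try:
        store_scores(conn, fetch_contacts())
    finally:
        conn.close()


if __name__ == "__main__":
    main()
```

A crontab entry such as `0 3 * * * python3 /path/to/main.py` (the path is a placeholder) would run it daily at 03:00.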

mongodb - app-users vs db-users login?

I am working on an app with a locally stored MongoDB instance and I am struggling with the design of how app-users should be stored in order to implement in-app login.
On the one hand, MongoDB provides solid access control and authentication for db-users, with the ability to define roles, actions and privileges. So I feel tempted to leverage this to implement my app-user storage.
On the other hand, considering it uses a system collection, I get the feeling (and at least this thread suggests I am right) that the user management provided by MongoDB should be used to manage db-user accounts only (that is, software that accesses the database), not app-user accounts (people who use the software that accesses the database).
So I am thinking my storage schema should look something like this:
system.
    users #for db-users (apps and services)
    other system cols
    ...
myappdb.
    users #for app-users (actual people using the app)
    other app cols
    ...
So, in order to log into my app, I need a first set of credentials (db-user) so the app can log into my db. The app can then retrieve the app-user credentials and log the person in when they type their own credentials.
Question 1: Does this make sense?
Question 2: If yes, how do I hide my db-user credentials? I get the feeling they should not be hardcoded, but I can't find a way to connect to the database without hardcoding them.
Question 3: If not, what would be an appropriate way to deal with this? Links and articles are welcome.

Getting timeout on django application

I'm currently working on a Django application. I can't add an element to my database from the admin view. I fill in all the information, but when I click the save button the operation doesn't finish and I get a timeout. I use sqlite3 as the database.
My question is: does anyone know the origin of this problem? If not, how could I investigate it? When I work with other languages (Java, C, etc.) and have a problem, I can use a debugger. What options do I have here?
This problem can occur for the following reasons:
(Less probable) Your computation code is too slow. This is rare, because the timeout is set to about 1 minute and code usually doesn't take that long to execute.
Your app is waiting on some external resource that is not responding. For this you will have to check the Django logs for errors related to an external resource.
(Most probable) The database is taking too much time. This can happen because:
The app can't connect to the database: check the database logs, or try to connect to the database manually with python manage.py dbshell
A DB query is taking too long to execute: check the database logs for how long a query takes, or connect manually via dbshell and run the same query there
You can also use tools like django-profiler, Django Debug Toolbar, etc. for debugging, and the Python debugger (pdb) for native Python code.
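To check query times outside Django, a small timing helper can be enough. The sketch below assumes you open the same SQLite file your project uses; timed_query is a hypothetical helper name, not part of Django or sqlite3.

```python
import sqlite3
import time


def timed_query(conn, sql, params=()):
    """Run a query and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start
```

For example, `rows, elapsed = timed_query(sqlite3.connect("db.sqlite3"), "SELECT COUNT(*) FROM some_table")`, where the file and table names are placeholders for your own. If even a trivial query hangs, another process may be holding a lock on the database file.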

Google AppEngine and Threaded Workers

I am currently trying to develop something using Google AppEngine. I am using Python as my runtime and need some advice on setting up the following.
I am running a web server that provides JSON data to clients. The data comes from an external service that I have to pull it from.
What I need is a background system that checks memcache to see if there are any required IDs. If there is an ID, I need to fetch some data for that ID from the external source and place the data in memcache.
If there are multiple IDs (say, more than 30), I need to be able to make all of those requests as quickly and efficiently as possible.
I am new to Python development and AppEngine, so any advice you could give would be great.
Thanks.
You can use "backends" or "task queues" to run processes in the background. Tasks have a 10-minute run time limit, and backends have no run time limit. There's also a cronjob mechanism which can trigger requests at regular intervals.
You can fetch the data from external servers with the "URLFetch" service.
Note that using memcache as the communication mechanism between front-end and back-end is unreliable -- the contents of memcache may be partially or fully erased at any time (and it does happen from time to time).
Also note that you can't query memcache if you don't know the exact keys ahead of time. It's probably better to use the task queue to queue up requests instead of using memcache, or to use the datastore as the storage mechanism.
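The queue-then-fetch pattern can be sketched as below. This is an in-process illustration only, not App Engine code: queue.Queue stands in for the task queue, fetch_external is a hypothetical stand-in for a URLFetch call, and a thread pool is used to issue the fetches concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Empty, Queue


def fetch_external(item_id):
    # Hypothetical stand-in for the call to the external service (assumption).
    return {"id": item_id, "payload": "data-for-%s" % item_id}


def drain(pending, max_workers=10):
    """Take every queued ID and fetch them concurrently; return {id: result}."""
    ids = []
    while True:
        try:
            ids.append(pending.get_nowait())
        except Empty:
            break
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch_external, ids))
    return {result["id"]: result for result in results}
```

Because the IDs travel through an explicit queue, the worker never has to guess cache keys, and a cache eviction cannot lose a pending request.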

Web application: Hold large object between requests

I'm working on a web application related to genome searching. This application makes use of this suffix tree library through Cython bindings. Objects of this type are large (hundreds of MB up to ~10GB) and take as long to load from disk as it takes to process them in response to a page request. I'm looking for a way to load several of these objects once on server boot and then use them for all page requests.
I have tried using a remote manager / client setup using the multiprocessing module, modeled after this demo, but it fails when the client connects with an error message that says the object is not picklable.
I would suggest writing a small Flask application (or even raw WSGI, but Flask will probably be simpler to get up and running quickly) which loads the genome database and then exposes a simple API. Something like this:
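    from flask import Flask, jsonify

    app = Flask(__name__)
    # load_database() is a placeholder for your own loader; it runs once at startup
    database = load_database()

    @app.route('/get_genomes')
    def get_genomes():
        # Assumes all_genomes() returns something JSON-serializable
        return jsonify(database.all_genomes())

    app.run(debug=True)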
Or, you know, something a bit more sensible.
Also, if you need to handle more than one request at a time (I believe that app.run will only handle one at a time), start with threading. If that's too slow, you can os.fork() after the database is loaded and run multiple request handlers from there (that way they will all share the same database in memory).
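To illustrate the fork-after-load idea, here is a minimal sketch, assuming a POSIX system where os.fork() is available. load_database and query_in_child are hypothetical names, and the tiny dict stands in for the large suffix tree objects. The child reads the object that the parent loaded before forking, without reloading or pickling it, and sends its answer back through a pipe.

```python
import os


def load_database():
    # Hypothetical stand-in for the expensive load from disk (assumption).
    return {"genomes": ["g1", "g2"]}


def query_in_child(database, key):
    """Fork a child that reads the already-loaded database and reports back."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: it sees the parent's memory as of the fork.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, ",".join(database[key]).encode())
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code
    # Parent process: read the child's answer until it closes the pipe.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return data.decode()
```

In a real server the children would each run a request loop rather than answer one query and exit, but the sharing works the same way.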
