Large-scale Pyramid application with multiple databases [closed]

This is less of a question and more of an "am I on the right track with my train of thought?" topic.
I have an idea for a fairly large-scale application written in Pyramid, using multiple databases, and wanted to post some of my musings to see if what I have in mind would work. I'm still designing the application and haven't written any code yet.
I'm considering it large scale because I anticipate the database growing significantly.
Main points:
The main route would be in the form of www.domain.com/[name]/whatever, where [name] is the key parameter that will decide what data to present to the client.
There will be multiple databases: one database, say site.db, that contains site-wide information and configs, while the rest of the databases will be [name].db, each containing one user's data. This is where I expect the bulk of the scaling to happen, as I want to design the application to accept hundreds of databases, one per user, to compartmentalize the data and prevent cross-contamination.
Options:
2 databases: site.db and userdata.db. I would use the same database models, but the table names would be determined dynamically from the [name] parameter: [name]_table1, [name]_table2, etc. (where table1/2/n would be far more descriptive in practice). This may be simpler in the sense that I'd only ever have two database sessions to work with, and as long as I keep the sessions separate, I should be able to query each one individually. The downside is that userdata.db can grow large with hundreds of users. In theory, that shouldn't be a problem if I can query the right tables using the [name] parameter.
1 + n databases. Managing database sessions could be a pain here, but one possible solution might be to scan a folder for all .db files, create a session for each database, and build a dictionary of sessions keyed by file name. Something like sessions['site'] points to the session that handles site-wide data, while sessions['name'] points to a session that manipulates name.db, if that makes sense. This way, I can use the [name] parameter in the route to pick the appropriate database session.
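A minimal sketch of that session registry, assuming SQLAlchemy with SQLite files in a db/ folder (the folder path and file names are placeholders):

```python
from pathlib import Path

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker


def build_sessions(db_dir="db"):
    """Scan a folder for .db files and map each bare file name
    ('site', 'alice', ...) to a session factory for that database."""
    sessions = {}
    for db_file in Path(db_dir).glob("*.db"):
        engine = create_engine(f"sqlite:///{db_file}")
        sessions[db_file.stem] = sessionmaker(bind=engine)
    return sessions


sessions = build_sessions()
site_session = sessions["site"]()   # site-wide data
user_session = sessions["alice"]()  # per-user data, chosen via the [name] route parameter
```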
Aside from the above, there will be a sub-application as well, in the form of an IRC bot that monitors chat events and commands and manipulates the same databases. One way to do this would be to run instances of the IRC bot with [name] passed in as a parameter, so that each instance only accesses one database. There's still the possibility that the IRC bot and the Pyramid application may try to manipulate the same database at the same time, which could be problematic. Maybe one way around this would be for the IRC bot to import the database sessions (as mentioned in option 2) and use those same sessions within the bot (would that take advantage of Pyramid's transaction manager? Presumably yes).
That's all I have right now. I hope I'm thinking about this correctly. Is there anything that I'm grossly mistaken on?

There's a lot to unpack in your question and it's probably too much for SO, but I have a few comments.
I would strongly urge you to avoid a per-tenant schema with dynamic table naming, so I'd avoid option 1. Session management is always difficult, and the right approach depends somewhat on your choice of database (for example, pgbouncer can help greatly by keeping persistent connections open to a lot of databases without complicating your code as much).
Manipulating the same database isn't the end of the world - any decent database has concurrency support and can handle multiple connections so long as you handle your locking correctly.
Consider whether it's really worth separating the databases as it definitely adds a lot of complexity up front. It depends on the problem domain and use cases whether it's worth it.


Text File, MongoDB or JSON? [closed]

So, I'm learning Python and discord.py, and out of a bot with 500 lines of code, I've only asked for help with one item. The rest I've been researching myself through trial and error. I'm currently at a crossroads and would like some advice on which route to take. I'm not looking to ask how; I'll figure that out on my own (hopefully).
So, I have a bot running on my Windows PC, active on only a single server, which is my own. The bot returns an embedded message with a list of inactive users, based on a series of roles. After a few nested if statements, it adds a field with person.mention, then posts the list to a specific channel, mentioning them all.
As per the rules, they have 48 hours to improve their activity, which will modify their roles.
While the first command works like a charm, I'm looking to create a second command that goes through the list of users from the previous "audit" (typically about 15-30 people), checks whether their activity has improved (i.e., whether a set of roles exists), and reports back in a staff channel: "Members out of compliance, and subject to removal:". The saved list of users is then wiped for the next audit (twice a month).
To do this I still need to research how, but to save time I'm asking which route I should investigate, and why: a text file, a database, or JSON?
I appreciate everyone’s input.
I'd normally suggest using a small database (like SQLite) for small bots, but if you're new to Python you may not want to take on SQL yet. Using a JSON file can work, though JSON is better suited to config files than to acting as a database. A few downsides of using JSON files are:
It's a file-based data storage, which makes it vulnerable to race conditions.
You'll need to implement your own synchronization primitives to avoid corrupting data.
If you're not careful, you could accidentally wipe your entire JSON file.
Alternatives to JSON files are YAML or TOML files, but the downsides are the same.
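To make the first two downsides concrete, here is a minimal sketch of the synchronization and atomic-write boilerplate you would end up writing yourself for a single JSON file (single-process only; the file name is a placeholder):

```python
import json
import os
import threading

_lock = threading.Lock()  # guards the file within this one process only


def load(path="data.json"):
    with _lock:
        try:
            with open(path, "r", encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}


def save(data, path="data.json"):
    """Write to a temp file, then atomically replace the original,
    so a crash mid-write can't wipe the whole file."""
    with _lock:
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
        os.replace(tmp, path)  # atomic rename on POSIX and Windows
```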
Using databases:
If you want to learn SQL (there are good, free, easy-to-follow resources out there like SQLBolt), the advantages are:
Databases organize your data into tables, and are fast at inserting, retrieving, and removing records.
You can impose uniqueness constraints to ensure against duplication.
The Python libraries enforce synchronization for you.
The query language is intuitive; you can get running with simple queries in just a few hours!
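As a taste, here's a minimal sqlite3 sketch of the audit workflow described above (the table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect("bot.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS audits (
        user_id INTEGER NOT NULL UNIQUE,  -- uniqueness constraint prevents duplicates
        flagged_at TEXT NOT NULL
    )
""")

# INSERT OR IGNORE silently skips users already on the list
conn.execute("INSERT OR IGNORE INTO audits (user_id, flagged_at) VALUES (?, ?)",
             (123456789, "2023-01-01"))
conn.commit()

# Fetch everyone from the previous audit
rows = conn.execute("SELECT user_id FROM audits").fetchall()

# Wipe the list for the next audit
conn.execute("DELETE FROM audits")
conn.commit()
```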
MongoDB is another option; I haven't personally used it, but it's a popular non-relational database (it doesn't use SQL).
PS: Don't even think about using txt files as a database, that's a bad, bad, bad idea.

Understanding how FHIR works and how to implement it in a healthcare app [closed]

I have a project for University where we are developing a toy healthcare app and I am the backend developer. This app interacts with users, collecting their data. This data is used to make predictions for the user (data science) and can also be sent to doctors. Doctors can also update the data and send it back to the app.
What I do know:
The backend will be in Python, since that's the language of Data Science.
Will probably use Flask for ease of use
Needs a database to store the data - probably starting with SQLite, also for ease of use
What I don't know:
since we are dealing with health data, I know there are standards for transferring such data, such as FHIR. But I have no idea where FHIR fits into the app. I saw that SMART on FHIR has a fhirclient library with an example in Flask - is that the way to go?
I assume I need a FHIR server, but how do I use one? I saw many available for testing, but how can I use those if the data needs to be private?
Basically, although I know which technologies to use, I don't know how to actually fit the pieces together. This question is asking for ideas on how to piece this project together. I need some clarity to get started and get things off the ground. I have a Flask server - how do I implement this FHIR in it so that I store the data properly, get the data for predictions and also send the data back and forth between the app and the doctor?
I appreciate any help!
FHIR is principally a standard for sharing information between software systems - be that applications within a hospital, EMRs and community pharmacies, clinical systems and research systems, etc. If your system isn't actually sharing data with other applications, there's no real need to use FHIR at all.
You could choose to use FHIR anyhow - you could download one of the FHIR open source servers and use that as your persistence layer. (You'd have your own instance running on your own hardware/cloud server, so your data would be your own.) The rationale for doing that is that it'll be good at storing healthcare data and will have most of the features you need (though it'll likely have a whole lot of features you don't). Also, if one of the objectives of your project is learning, then understanding how FHIR represents data will be useful as you move beyond a 'toy' application and start working with real systems that invariably need to share data.
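If you go that route, a minimal sketch of talking to such a server with the fhirclient library might look like this (the server URL, app id, and patient id are placeholders, and this assumes the library's documented read API):

```python
from fhirclient import client
from fhirclient.models.patient import Patient

settings = {
    "app_id": "my_toy_app",                    # placeholder app id
    "api_base": "http://localhost:8080/fhir",  # your own server, so data stays private
}
smart = client.FHIRClient(settings=settings)

# Read a Patient resource by id from your own FHIR server
patient = Patient.read("example-patient-id", smart.server)
print(patient.name[0].family if patient.name else "no name on record")
```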
SMART on FHIR is a mechanism to let you embed applications inside of electronic health record systems that have access to the EHR's data. It can also be used to create web applications that have access to EHR data. The key thing that SMART provides is an ability for a user to control what data the app has access to. (Just like you can control whether an app on your phone can access your address book or microphone, SMART lets you control whether a healthcare app can access your allergies or medications.) It's not obvious from your project description that there'd necessarily be a need for that functionality.
In short, you probably don't need FHIR, but you may find some of the open source tools developed by the FHIR community useful. Good luck with your project.

Using Python and SQL [closed]

I have used txt files to store data: I read from them whenever I need, search in them, and append to and delete from them.
So why should I use a database when I can keep using txt files?
In fact, you have used files instead of a database. To answer the question, let us check the advantages of using a database:
it is faster: a service awaits commands and your app sends commands to it. Database management systems have a lot of optimizations implemented that you will lack if you use a single file. True, you could write a service that loads the file into memory and serves commands, but while that seems easy, it will be inferior to an RDBMS: your implementation is highly unlikely to come close to the optimizations done for RDBMSs over decades - unless you implement an RDBMS yourself, but then you end up with an RDBMS after all
it is safer: RDBMSs encrypt data and provide user/password authentication and network access control
it is smaller: data is stored in a compact manner, so if you end up with a lot of data, size becomes critical much later
it is developed: you will always be able to upgrade your system with recently implemented features and keep pace with current developments
you can use ORMs and other tools built to ease the pain of data handling
it supports concurrent access: imagine many people reaching your database at the same time. Instead of implementing very complicated machinery yourself, you get this feature out of the box
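To make the speed and concurrency points concrete, here is a minimal sketch contrasting a hand-rolled text-file scan with an indexed SQLite lookup (the file, table, and column names are invented):

```python
import sqlite3

# Text-file approach: every lookup is a full linear scan you write yourself
def find_in_txt(username, path="users.txt"):
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.split(",")[0] == username:
                return line.strip()
    return None

# Database approach: the index makes the lookup fast, and the engine
# handles locking for concurrent readers and writers
conn = sqlite3.connect("users.db")
conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, email TEXT)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_users_name ON users (name)")
row = conn.execute("SELECT name, email FROM users WHERE name = ?",
                   ("alice",)).fetchone()
```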
All in all, you will either use a database management system (not necessarily relational), implement your own, or work with text files. Your text file will quickly be overwhelmed if your application is successful, and you will then need a database management system. If you write your own, you might have a success story to tell, but it will come only after many years of hard work. So if you get successful, you will need a database management system; if you do not, you can use text files, but the question is: is it worth it?
And finally, your text file is a database, but you are managing it with your own custom and probably very primitive database management system (no offence, but it is virtually impossible to match decades of work by the whole field). So yes, you should learn to use an advanced database management system and refactor your project to use one.

General queries vs detailed queries to database

I'm writing a web application in Python and PostgreSQL. Users access a lot of information during a session, and almost all of it is indexed in the database. My question is: should I litter the code with specific queries, or is it better practice to query larger chunks of information, cache them, and let Python process the chunks for the finer pieces?
For example: a user asks for entries in a payment log. Either one writes a query asking for the specific entries requested, or one collects the user's payment history and then uses Python to select the specific entries.
Of course caching is preferred when working with heavy queries, but since nearly all my data is indexed, direct database access is fast and the caching approach would not yield much, if any, extra speed. But are there other factors that may still make the caching approach preferable?
Database designers spend a lot of time on caching and optimization. Unless you hit a specific problem, it's probably better to let the database do the database stuff, and your code do the rest instead of having your code try to take over some of the database functionality.
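A minimal sketch of that advice with psycopg2, using an invented payments table: push the filtering into SQL rather than fetching the whole history and filtering in Python:

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # placeholder DSN


def entries_between(user_id, start, end):
    """Let PostgreSQL use its indexes; return only the rows requested."""
    with conn.cursor() as cur:
        cur.execute(
            """SELECT id, amount, paid_at
                 FROM payments
                WHERE user_id = %s AND paid_at BETWEEN %s AND %s""",
            (user_id, start, end),
        )
        return cur.fetchall()

# versus the chunk-and-cache approach the question describes: fetching the
# user's entire history and filtering in Python costs more I/O and memory,
# and gets no benefit from the database's indexes.
```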

Django calling REST API from models or views? [closed]

I have to call external REST APIs from Django. The external data source schemas resemble my Django models, and I'm supposed to keep the remote and local data in sync (maybe not relevant to the question).
Questions:
What is the most logical place from where to call external web services: from a model method or from a view?
Should I put the code that call the remote API in external modules that will be then called by the views?
Is it possible to conditionally select the data source? Meaning presenting the data from the REST API or the local models depending on their "freshness"?
Thanks
EDIT: for the people wanting to close this question: I broke the question down into three simple questions from the start, and I've received good answers so far, thanks.
I think where to call web services is largely a matter of opinion. I would say don't pollute your models, because that means you probably need instances of those models to call these web services, which might not make any sense. Your other choice there is to make things @classmethods on the models, which I would argue is not very clean design.
Calling from the view is probably more natural if accessing the view itself is what triggers the web service call. Is it? You said that you need to keep things in sync, which points to a possible need for background processing. At that point, you can still use views if your background processes issue HTTP requests, but that's often not the best design. If anything, you would probably want your own REST API for this, which necessitates separating the code from your average web site view.
My opinion is these calls should be placed in modules and classes specifically encapsulated for your remote calls and processing. This makes things flexible (background jobs, signals, etc.) and it is also easier to unit test. You can trigger calling this code in the views or elsewhere, but the logic itself should be separate from both the views and the models to decouple things nicely.
You should imagine that this logic should exist on its own if there was no Django around it, then build other pieces that connect that logic to Django (ex: syncing the models). In other words, keep things atomic.
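As a minimal sketch of that separation (the module, endpoint, and function names are hypothetical): the remote-API logic lives in its own module, and the view is thin glue that calls it:

```python
# services.py -- all remote-API logic lives here, with no Django dependency
import requests

API_BASE = "https://remote.example.com/api"  # placeholder endpoint


def fetch_remote_items():
    resp = requests.get(f"{API_BASE}/items/", timeout=5)
    resp.raise_for_status()
    return resp.json()


# views.py -- thin glue: trigger the call, hand the result to the response
from django.http import JsonResponse

from . import services


def items_view(request):
    return JsonResponse({"items": services.fetch_remote_items()})
```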
Yes, same reasons as above, especially flexibility. Is there any reason not to?
Yes, simply create the equivalent of an interface and have each class map to it. If the fields are the same and you are lazy, in Python you can just dump the fields you need as dicts to the constructor (using **kwargs) and be done with it, or rename the keys using some convention you can process. I usually build a simple data-mapper class for this and process the Django or REST models in a list comprehension, but there is no need if things match up as I mentioned.
Another related option is to dump things into a common structure in a cache such as Redis or Memcached. It might be wise to update this info atomically if you are concerned with "freshness", but in general you should have a single source of authority that can tell you what is actually fresh. In sync situations, I think it's better to pick one or the other to keep things predictable and clear.
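For the cache variant, a small sketch with redis-py (the key name and TTL are arbitrary):

```python
import json

import redis

r = redis.Redis()  # assumes a local Redis instance


def get_items(fetch_remote):
    """Serve cached data while it's fresh; otherwise refresh from the remote API."""
    cached = r.get("items")
    if cached is not None:
        return json.loads(cached)
    items = fetch_remote()
    r.set("items", json.dumps(items), ex=300)  # 5-minute freshness window
    return items
```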
One last thing that might influence your design: by definition, keeping things in sync is a difficult process. Syncs tend to be very prone to failure, so you should have some sort of durable mechanism, such as a task queue or job system, for retries. Always assume that calls to a remote REST API can fail for odd reasons such as network hiccups. Also keep transactions and transactional behavior in mind when syncing. Since these are important, it points again to the fact that if you put all this logic directly in a view, you will probably run into trouble reusing it in the background without abstracting things a bit anyway.
What is the most logical place from where to call external web services: from a model method or from a view?
Ideally your models should only talk to the database and have no clue what's happening in your business logic.
Should I put the code that call the remote API in external modules that will be then called by the views?
If you need to access them from multiple modules, then yes, placing them in a module makes sense. That way you can reuse them efficiently.
Is it possible to conditionally select the data source? Meaning presenting the data from the REST API or the local models depending on their "freshness"?
Of course it's possible; you can decide how to fetch the data on each request. But the more efficient way might be to avoid that logic altogether: sync your local data with the remote data, and present the local data in your views.
