Text File, MongoDB or JSON? [closed]


Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
So, I’m learning Python and discord.py, and in a bot with about 500 lines of code, I have only asked for help with one item. The rest I’ve worked out through research and trial and error. I’m currently at a crossroads and would like some advice on which route to take. I’m not asking how to do it; I’ll figure that out on my own (hopefully).
So, I have a bot running on my Windows PC, on a single server, which is my own. The bot returns an embedded message with a list of inactive users, which is based on a series of roles. After a few nested IF statements, it adds a field with person.mention for each inactive user, then posts the list to a specific channel, mentioning them all.
As per rules, they have 48 hours to improve their activity, which will modify their roles.
So, while the first command works like a charm, I’m looking to create a second command that goes through the list of users from the previous “audit” (typically about 15-30 people), checks whether their activity has improved (that is, whether a set of roles exists), and reports back in a staff channel: “Members out of compliance, and subject to removal:”. The saved list of users is then wiped for the next audit (audits run twice a month).
To do this, I need to research how, but to save time, I’m asking which route I should investigate and why: text file, DB, or JSON?
I appreciate everyone’s input.

I'd normally suggest using a small database (like SQLite) for small bots, but if you're new to Python you may not want to learn SQL at the same time. A JSON file works, though using one as a database is not a great idea; JSON is mostly used for config files. A few downsides of using JSON files are:
It's file-based data storage, which makes it vulnerable to race conditions when two writers touch the file at once.
You'll need to implement your own synchronization primitives to avoid corrupting data.
If you're not careful, you could accidentally wipe your entire JSON file, for example if the program crashes halfway through rewriting it.
Other alternatives to JSON files are YAML or TOML files, but the downsides are the same.
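If you do go with a JSON file, the "accidentally wipe the file" risk can be reduced by never writing to the real file directly. A minimal sketch, assuming a single bot process and using only the standard library (the function names save_json_atomic and load_json are made up for illustration):

```python
import json
import os
import tempfile


def save_json_atomic(path, data):
    """Write data as JSON so that a crash mid-write leaves the old file intact."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the new file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_json(path, default):
    """Return the parsed file, or default if the file does not exist yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

This only protects against half-written files. It does not solve the race-condition problem if two processes write at the same time.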
Using databases:
If you want to learn SQL (there are good, free, easy-to-follow resources out there, like sqlbolt), the advantages are:
Databases organize your data into tables, and are fast at inserting, retrieving, and removing records.
You can impose uniqueness constraints to guard against duplicates.
The database and its Python driver handle locking for you, so you don't have to write your own synchronization.
The query language is intuitive; you can get running with simple queries in just a few hours!
MongoDB is another option. I haven't personally used it, but it's a well-regarded non-relational database (it doesn't use SQL).
PS: Don't even think about using txt files as a database, that's a bad, bad, bad idea.
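For the audit use case in the question, a rough sketch with the standard-library sqlite3 module could look like the following. The table name flagged_members and the helper functions are hypothetical, and the user IDs are assumed to be plain integers:

```python
import sqlite3


def open_audit_db(path=":memory:"):
    """Open (or create) the audit database; ':memory:' is only for trying it out."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flagged_members ("
        "user_id INTEGER PRIMARY KEY, flagged_at TEXT NOT NULL)"
    )
    return conn


def flag_member(conn, user_id, flagged_at):
    # The PRIMARY KEY is a uniqueness constraint: flagging the same
    # user twice keeps a single row instead of creating a duplicate.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO flagged_members VALUES (?, ?)",
            (user_id, flagged_at),
        )


def flagged_members(conn):
    rows = conn.execute("SELECT user_id FROM flagged_members ORDER BY user_id")
    return [row[0] for row in rows]


def clear_audit(conn):
    # Wipe the saved list so the next audit starts empty.
    with conn:
        conn.execute("DELETE FROM flagged_members")
```

The second command would call flagged_members, check each user's roles, post the report, and then call clear_audit. Passing a file path such as "audit.db" instead of ":memory:" keeps the list across bot restarts.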

Related

Large scale pyramid application with multiple databases [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
This is less of a question, and more of an "am I on the right track with my train of thought?" topic.
I have an idea for a fairly large scale application written using Pyramid, using multiple databases, and wanted to post some of my musings to see if what I have in mind would work. I'm in the process of designing the application, but haven't started writing any code yet, to test out if it would work or not.
I'm considering it large scale because I anticipate the database growing significantly.
Main points:
The main route would be in the form of www.domain.com/[name]/whatever, where [name] is the key parameter that will decide what data to present to the client.
There will be multiple databases: one database, let's say site.db, that contains site-wide information and configs, and the rest of the databases will be [name].db, each containing one individual user's data. This is where I expect the bulk of the scaling to happen, as I want to design the application to accept hundreds of databases, one per user, to compartmentalize the data and prevent data cross-contamination.
Options:
2 databases. site.db and userdata.db. I would use the same database models, but the table names would be determined dynamically using the [name] parameter as such: [name]_table1, [name]_table2, etc, where table1/2/n would be a lot more descriptive than that. This may be simpler in the sense that I'd only ever have 2 database sessions to work with, and as long as I keep the sessions separate, I should be able to query each one individually. The downside is that the userdata.db can grow large with 100s of users. In theory, shouldn't be a problem if I can query the right tables using the [name] parameter.
1 + n databases. Managing database sessions could be a pain here, but one possible solution might be to scan a folder for all .db files, create a session for each database, and build a dictionary of sessions keyed by file name. Something like sessions['site'] would point to the DB session that handles site-wide data, while sessions['name'] would point to a session that manipulates name.db, if that makes sense. This way, I can use the [name] parameter in the route to pick the appropriate database session.
Aside from the above, there will be a sub-application as well, in the form of an IRC bot, that will monitor for chat events and commands to manipulate the same databases. I'm thinking one way to do this would be to run instances of the IRC bot, passing in [name] as a parameter, so that each instance of the IRC bot is only accessing one database. There's still the possibility that the IRC bot and Pyramid application may be trying to manipulate the same database, which could be problematic. Maybe one way around this would be for the IRC bot to import the database sessions (as mentioned above in point #2), and use the same sessions within the IRC bot application (would that take advantage of Pyramid's transaction manager? Presumably yes).
That's all I have right now. I hope I'm thinking about this correctly. Is there anything that I'm grossly mistaken on?

There's a lot to unpack in your question and it's probably too much for SO but I have a few comments.
I would strongly urge avoiding a per-tenant schema where you're doing dynamic naming of tables. So I'd avoid option 1. Session management is always difficult, and sort of depends on your choice of database (for example, pgbouncer can greatly help by keeping persistent connections open to a lot of databases without complicating your code as much).
Manipulating the same database isn't the end of the world - any decent database has concurrency support and can handle multiple connections so long as you handle your locking correctly.
Consider whether it's really worth separating the databases as it definitely adds a lot of complexity up front. It depends on the problem domain and use cases whether it's worth it.
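If separate databases turn out not to be worth the complexity, one common alternative to dynamically named tables is a single shared table with a tenant column. This is only a sketch under that assumption, using sqlite3 for brevity; the notes table and the helper names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One fixed table for every user; the tenant column replaces the
# [name]_table1 style of dynamic table naming.
conn.execute(
    "CREATE TABLE notes ("
    "id INTEGER PRIMARY KEY, tenant TEXT NOT NULL, body TEXT NOT NULL)"
)
conn.execute("CREATE INDEX notes_by_tenant ON notes (tenant)")


def add_note(conn, tenant, body):
    with conn:
        conn.execute(
            "INSERT INTO notes (tenant, body) VALUES (?, ?)", (tenant, body)
        )


def notes_for(conn, tenant):
    # Every query filters on tenant, so one user's rows never leak
    # into another user's results.
    rows = conn.execute(
        "SELECT body FROM notes WHERE tenant = ? ORDER BY id", (tenant,)
    )
    return [row[0] for row in rows]
```

The [name] route parameter would be passed as tenant. The table names stay fixed, so the models do not have to be generated at runtime.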

using Python and SQL [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have been using txt files to store data: I read them any time I need to, search in them, and append to and delete from them.
So why should I use a database if I can keep using txt files?

In fact, you have used files instead of a database. To answer the question, let us check the advantages of using a database:
it is faster: a service is awaiting commands and your app sends some commands to it. Database Management Systems have a lot of cool stuff implemented which you will be lacking if you use a single file. True, you can create a service which loads the file into memory and serves commands, but while that seems to be easy, it will be inferior to RDBMS's, since your implementation is highly unlikely to be even close to a match of the optimizations done for RDBMS's over decades, unless you implement an RDBMS, but then you end up with an RDBMS, after all
it is safer: RDBMSs provide user and password authentication with access control over the network, and many can encrypt data at rest or in transit
it is smaller: many database systems store data compactly and some support compression, so if you end up with a lot of data, its size becomes critical much later
it is developed: you will always be able to upgrade your system with recently implemented features and keep pace with current developments in the field
you can use ORM's and other stuff built to ease the pain of data handling
it supports concurrent access: imagine the case when many people are reaching your database at the same time. Instead of you implementing very complicated stuff, you can get this feature instantly
All in all, you will either use a database management system (not necessarily relational), implement your own, or work with text files. Your text file will quickly be overwhelmed if your application is successful, and you will need a database management system. If you write your own, you might have a success story to tell, but it will come only after many years of hard work. So, if you become successful, you will need a database management system. If you do not, you can use text files, but the question is: is it worth it?
And finally, your textual file is a database, but you are managing it by your custom and probably very primitive (no offence, but it is virtually impossible to achieve results when you are racing against the whole world) database management system compared to the ones out there. So, yes, you should learn to use advanced database management systems and should refactor your project to use one.

Python evaluate and grade code from students [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
For a class, I would like to automatically evaluate (parts of) the coding assignments of students. The setup I had in mind is something like:
Students get a class skeleton, which they have to fill in.
Students "upload" this class definition to a server (or via a web interface).
The server runs a test script on specific functions, e.g. class.sigmoid(x), checks whether the output of the function is correct, and might give suggestions.
This setup brings a whole lot of problems, since you're evaluating untrusted code. However, it would be extremely useful for many of my classes, so I'm willing to spend some time thinking it through. I remember Coursera had something similar for MATLAB/Octave assignments, but I can't find the details of that.
I've looked at many online Python interfaces (e.g. codecademy.com, ideone.com, c9.io). While they seem perfect for learning and/or sharing code with online evaluation, I miss the option of keeping the evaluation script "hidden" from the students (i.e. the evaluation script should contain a correct reference implementation to compare output on randomly generated data). Moreover, the course I give requires some data (e.g. images) and packages (sklearn / numpy), which are not always available.
Specifically, my questions are
Am I missing an online environment that actually offers such functionality? (That would be easiest.)
To set this up myself, I was thinking to host it at (eg) amazon cloud (so no problem with infrastructure at University), but are there any python practices you could recommend on sandboxing the evaluation?
Thanks in advance for any suggestions!
Pity to hear that the question is not suitable for StackOverflow. Thanks to the people (partially) answering the question.
After some more feedback via other channels, I think my approach will become as follows:
Student gets skeleton and fills it in
Student also has the evaluation script.
In the script, some connections with a server are made to
login
obtain some random data
check if the output of the student's code is numerically identical to what the server expects.
In this way the student's code is evaluated locally, and only the output is sent to the server. This limits the kinds of evaluation possible, but still allows for a form of automatic evaluation of code.
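The comparison step of that plan can be sketched without any server at all. In the snippet below, reference_sigmoid and check_outputs are hypothetical names, the inputs stand in for the "random data" the server would hand out, and a small tolerance is assumed in place of strict numerical identity, because two correct implementations can differ in the last floating-point digits:

```python
import math


def reference_sigmoid(x):
    # Reference implementation; in the plan above this lives on the server.
    return 1.0 / (1.0 + math.exp(-x))


def check_outputs(student_outputs, expected_outputs, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if every student output matches the expected value."""
    if len(student_outputs) != len(expected_outputs):
        return False
    return all(
        math.isclose(s, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for s, e in zip(student_outputs, expected_outputs)
    )
```

A student script would compute its outputs locally on the inputs it received, send only those numbers, and the server would call something like check_outputs against its own reference values.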

Sandboxing Python in general is impossible. You can try to prevent dangerous operations, which will mean significantly limiting what the student code can do. But that will likely leave open attack vectors anyway. A better option is to use OS-level sandboxing to isolate the Python process. The CodeJail library uses AppArmor to provide a safe Python eval, for example.
As an example of the difficulty of sandboxing Python, see Eval really is dangerous, or consider this input to your sandbox: 9**9**99, which will attempt to compute an integer on the order of a googolplex, consuming all of your RAM after a long time.
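To illustrate the process-level idea (not a real sandbox), here is a minimal sketch that runs submitted code in a separate interpreter with a wall-clock timeout. The run_untrusted helper is a made-up name. A timeout alone does not limit memory, file, or network access, so this would still need to sit inside OS-level isolation such as the AppArmor setup mentioned above:

```python
import subprocess
import sys


def run_untrusted(code, timeout_s=5.0):
    """Run code in a child interpreter; return (status, stdout)."""
    try:
        proc = subprocess.run(
            # -I starts the child in isolated mode (ignores user site
            # packages and PYTHON* environment variables).
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this.
        return ("timeout", "")
    status = "ok" if proc.returncode == 0 else "error"
    return (status, proc.stdout)
```

A runaway computation then ends as a "timeout" result in the grader instead of hanging the server process itself.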

This is currently a very active field in programming languages research.
I know of these two different approaches that look at the problem:
- http://arxiv.org/pdf/1409.0166.pdf
- http://research.microsoft.com/en-us/um/people/sumitg/pubs/cacm14.pdf (this is actually only one of very many papers by Sumit and his group)
You may want to look at these things to find something that could help with your problem (and edit this answer to make it more useful).

Reverse Engineer a program working as a webservice, the future? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
First I want to clarify that by reverse engineering I mean something like "decompiling" and getting back the original source code or something similar.
Yesterday I read a question about someone who wanted to protect his python code from "getting stolen" in other words: he didn't like that someone can read his python code.
The interesting thing I read was that someone said that the only reliable way to "protect" his code from getting reverse engineered is by using a Webservice.
So I could actually only write some GUIs in Python, PHP, whatever and do the "very secret code" I want to protect via a Webservice. (Basically sending variables to the host and getting results back).
Is it really impossible to reverse engineer a Webservice (via code and without hacking into the server)? Will this be the future of modern commercial applications? The cloud hype is already here, so I wouldn't be surprised.
I'm very sorry if this topic was already discussed, but I couldn't find any resources about this.
EDIT: The whole idea reminds me of AJAX. The code is executed on the server and the content is sent to the client and "prettified". The client doesn't see what PHP code or other technology is behind it.

Wow, this is awesome! I've never thought of it this way, but you could create a program that crawls an API and produces as output a django/tastypie application that mimics everything the API does.
By calling the service, and reading what it says, you can parse it, and begin to see the relationships between objects inside the api. Having this, you can create the models, and tastypie takes it from this point.
The awesome thing about this, is that normal people (or at least not backend developers) could create an api just by describing what they want to be as an output. I've seen many android/iphone developers creating a bunch of static xml or json, so they can call their service, and start the frontend development. Well what if that was enough? Take some xml/json files as input, get a backend as an output.

Yes,
All they could do is treat your web service as a black box, query the WSDL for all the parameters it accepts and the data that it returns.
They could then submit different variables and see what different results are. The "code" could not be seen or stolen (with proper security) but the inputs and outputs could be duplicated.
If you want to secure your "very secret code" a web service is a great way to protect the actual code.
-sb

It depends on what you mean by reverse engineering: by repeatedly sending input and analyzing the output the behaviour of your code can still be seen. I wouldn't have your code but I can still see what the system does. This means I could build a similar system that does the same thing, given the same input.
It would be hard to catch exceptional cases (such as output that is different on one day of the year only) but the common behaviour can certainly be copied. It is similar to analyzing the protocol of an instant messaging client: you may not have the original code but you can still build a copy.
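As a toy illustration of copying behaviour from inputs and outputs alone: hidden_service below is a local stand-in for the remote service, and the prober assumes (purely for the example) that the service computes a linear function of one number.

```python
def hidden_service(x):
    # Hypothetical stand-in for the web service; the caller of
    # infer_linear never looks at this body, only at its results.
    return 3 * x + 7


def infer_linear(service):
    """Recover a and b in a*x + b from two probes of the service."""
    b = service(0)
    a = service(1) - b
    return a, b


def make_clone(service):
    a, b = infer_linear(service)
    # The clone reproduces the observed behaviour without the original code.
    return lambda x: a * x + b
```

Real services are rarely this simple, and a rule that only fires on one day of the year would not be found by two probes, which is the point made above about exceptional cases.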

Personalizing Online Assignments for a Statistics Class [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I teach undergraduate statistics, and am interested in administering personalized online assignments. I have already solved one portion of the puzzle: generating multiple versions of a question using LaTeX/markdown + knitr/Sweave, using seeds.
I am now interested in developing a web-based system, that would use the various versions generated, and administer a different one for each student, online. I have looked into several sites related to forms (google docs, wufoo, formsite etc.), but none of them allow programmatic creation of questionnaires.
I am tagging this with R since that is the language I am most familiar with, and is key to solving the first part of the problem. I know that there are several web-based frameworks for R, and was wondering whether any of them are suitable for this job.
I am not averse to solutions in other languages like Ruby, Python etc. But the key consideration is the ability to programmatically deliver online assignments. I am aware of tools like WebWork, but they require the use of Perl and the interfaces are usually quite clunky.
Feel free to add tags to the post, if you think I have missed a framework that would be more suitable.
EDIT. Let me make it clear by giving an example. Currently, if I want to administer an assignment online, I could simply create a Google Form, send the link to my students, and collect all responses in a spreadsheet, and automatically grade it. This works, if I just have one version of the assignment.
My question is: if I want to administer a different version of the assignment to each student and collect their responses, how can I do that?

The way you have worded your question it's not really clear why you have to mark the students' work online. Especially since you say that you generate assignments using sweave. If you use R to generate the (randomised) questions, then you really have to use R to mark them (or output the data set).
For my courses, I use a couple of strategies.
For the end of year exam (~500 students), each student gets a unique data set. The students log on to a simple web-site (we use blackboard since the University already has it set up). All students answer the same questions, but use their own unique data set. For example, "What is the mean". The answers are marked offline using an R script.
In my introductory R course, students upload their R functions and I run and mark them off line. I use sweave to generate a unique pdf for each student. Their pdf shows where they lost marks. For example, they didn't use the correct named arguments.
Coupling a simple web-form with marking offline gives you a lot of flexibility and is fairly straightforward.
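The answer's own workflow is in R. The sketch below shows the same idea in Python, with made-up helper names: derive a repeatable seed from the student ID, generate that student's data set, and mark a submitted mean offline. The ID format, sample size, and tolerance are all assumptions.

```python
import hashlib
import random
import statistics


def dataset_for(student_id, n=20):
    """Generate the same pseudo-random data set every time for one student."""
    # Derive a stable seed from the ID; the built-in hash() of a
    # string changes between interpreter runs, so it is not used here.
    digest = hashlib.sha256(student_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    return [round(rng.gauss(50, 10), 1) for _ in range(n)]


def mark_mean(student_id, submitted, tol=0.01):
    """Check a submitted answer to 'What is the mean?' for this student."""
    expected = statistics.mean(dataset_for(student_id))
    return abs(submitted - expected) <= tol
```

Because the data set is regenerated from the ID at marking time, only the student ID and the submitted answer need to be collected by the web form.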

I found one possible solution that might work using the RGoogleDocs package. I am posting this as an answer only because it is long. I am still interested in better approaches, and hence will keep the question open.
Here is the gist of the idea, which is still untested.
Create multiple versions of each assignment using knitr/Sweave.
Upload them to GoogleDocs using uploadDoc.
Share one document per student using setAccess which modifies access controls.
Create a common Google Form to capture final answers for each student.
The advantage I see is two-fold. One, since all final answers get captured on a spreadsheet, I can access them with R and grade them automatically. Two, since I have access to all the completed assignments on Google Docs, I can skim through them and provide individual comments as required (or let some of my TAs do it).
I will provide an update, if I manage to get this working, and maybe even create an R package if it would be useful for others.

I know that this was asked a long time ago, but I think that today the best solution is the package exams plus Moodle.
The exams package can now generate Moodle XML questions that can be uploaded to the Moodle platform, so students can solve the exercises online.
This is an example of a question made with exams package and uploaded to Moodle.

I just stumbled upon the exams package in R, which is available on CRAN. Could this be something for you?
