Server Infrastructure, file upload project

Server Infrastructure, file upload project - python

I am planning to create a file upload website where users register as members and then upload files through both a file upload form and ftp account (each file can be up to 10gb).
For each file uploaded the member gets provided with a link which he can share with other users. Unfortunately I am just an average Django coder/linux user and have not worked on any similar project before.
Problem 1
The storage space used will potentially quickly grow to 1000s of TB's, how do I optimise the server and its storage for this? Should I use a Cloud-service or which type of Hosting would be most suitable? How would you setup the Infrastructure to make this run smoothly?
I was planning to run Freebsd as OS and Django/Python for the Development ...
Appreciate your input and all ideas!

From what you describe, I would start with a cloud service and see how the actual usage turns out. That might be the cheapest and most scalable version.
For setting things up, you have several options (surprise! :-) ). AFAIK, Amazon has some preconfigured images that might take you a long way. Since you're doing python, you could also look at Google and see how their services play together.
As you described yourself primarily as a coder, I would stay away from puppet, chef, ansible and such. While those are great tools, they add a layer of abstraction to managing actual servers. I might be wrong, of course, and such tools are just the help you need in order to set things up.
For many admin-tools, there are ready-to-use modules or templates that might help you achieve your goal.
As a simple battle-plan suggestion:
look at cloud-providers to determine which one suites you well.
look at tools to interact with the cloud-provider you are thinking about using.
try to find user-groups for the cloud-provider/admin tool you chose to learn more about them or get help from other people.

Related

trying to create script that scan for secrets in python code

As part of our CI/CD we want to add some check to run code on python files and check if there are some secrets in code (like API, passwords etc.).
I saw only programs that do this, and I want to create a Python script that does it.
Does anyone have some suggestion or example for this?

I'm not sure it exactly what you look for but you can use GitGuardian API,
The GitGuardian API puts at your fingertips the power to detect more
than 200 types of secrets in any text content, as well as other
potential security vulnerabilities.
py-gitguardian can be used to create integrations to scan various data
sources, from your workstation's filesystem to your favorite chat
application.
You can check API details here with all the response codes and expected structures on each method.
just take a look GitGuardian/py-gitguardian Github repository,
You can also check this Youtube video that will help you implement this.
Good luck.

Is there a python webframework which is able to run decently on 128mb of ram?

I have a small and easy project with no DB interactions for which I don't have free resources except the small linux VPS (vServer) 128MB RAM machine. Feeling adventures I want to try to implement this project in python.
Will it be possible? If so what setup (webserver, framework and so on) I have to choose?
I'm reading files from file-system and displaying their content in a beautiful way. Also diffs between the files and couple of similar things... No file upload from the users, all textfiles are pre-made.

I would go with a micro framework like bottle or flask.
Edit: You probably don't want to use django if you are looking for light. Django is a full stack framework and if you don't need database interaction I would seriously look into one of the above CherryPy or web.py.
Honestly I think that you should go with bottle. It is a single file and its memory usage is very low.
This will allow you to have python and bottle installed and you can read those files easily and serve content with bottle easily and with very low memory use.
Like I said before web.py, CherryPy, Flask are also good alternatives.

Not directly answering your question, but if you have no DB interactions, why use Python at all? I think I would prefer to serve a site that does the expensive work on the client (since you are resource-restrained), i.e. with a light web server and neat Javscript framework. There are many, I like angular.js.

Best Python Web Framework for my API Server Needs

I am working on developing two systems:
A system that will constantly retrieve economic data from a 3rd party data feed and push it into a MySQL DB (using sqlalchemy)
A server that will allow anyone to query the data in the db over a JSON AJAX API (similar to Yelp or Yahoo API for example)
I have two main questions:
Which Python framework should I use in 2)? Pyramid is my first choice, but if you strongly suggest against it or in favor of something else like Django or Pylons I am definitely wiling to consider it.
Should I develop the two system separately? Or should 1) be a part of 2), running within the framework (using crontab or celery for example)?

Depends on what stage you are at, I would suggest to develop 2 systems because the load to pull data from 3rd party and the load to handle the API would be different. You can scale them into a different types of nodes if you want.
Django-Tastypie (https://github.com/toastdriven/django-tastypie) is not bad, it supports all JSON, XML and YAML. Also you can add OAuth easily. Though, Django itself maybe a bit heavy for your needs at this time.

You might want to check out web2py's new functionality for easily generating RESTful API's, particularly its parse_as_rest and smart_query functions. You might also consider using web2py's database abstraction layer to handle #1.
If you need any help, ask on the mailing list.

I agree with Anthony, you should look at Web2Py. It is very easy to get started, very low learning cure and easy to deploy on many systems including Linux, Windows and Amazon.
So far I have found nothing that Web2Py can not do. But more importantly it does things how you would think they should be done, so if you are not sure, very often a guess is good enough and it just works. If you do get stuck, it has by far the best and most up to date documentation for any Python Web Framework.
Even with all it's great features, easy use and up to date documentation, you will also find that the web2py user group on Google, is like having a paid for help desk staffed 24 hours a day. Most questions are answered with a couple minutes and Massimo (The original creator of Web2Py) goes out of his way not only to help, but to implement new ideas, suggestions and bug fixes within days of them being raised in the group.

Python/GAE social network / cms?

After much research, I've come up with a list of what I think might be the best way of putting together a Python based social network/cms, but have some questions about how some of these components fit together.
Before I ask about the particular components, here are some of the key features of the site to be built:
a modern almost desktop-like gui
future ability to host an advanced html5 sub-application (ex.http://www.lucidchart.com)
high scalability both for functionality and user load
user ability to password protect and permission manage content on per item/group basis
typical social network features
ability to build a scaled down mobile version in the future
Here's the list of tools I'm considering using:
Google App Engine
Python
Django
Pinax
Pyjamas
wxPython
And the questions:
Google App Engine -- this is an attempt to cut to the chase as many pieces of the puzzle seem to be in place.
Question: Am I limiting my options with this choice? Example: datastore not being relational? Should I wait
for SQL support under the Business version?
Python -- I considered 'drupal' at first, but in the end decided that being dependent on modules that may or
may not exist tomorrow + limitations of its templating system are a no-no. Learning its API, too, would be useless elsewhere
whereas Python seems like a swiss army knife of languages -- good for almost anything.
Question: v.2.5.2 is required by GAE, but python.org recommends 2.5.5. Which do I install?
Django -- v.0.96 is built into GAE. You seem to be able to upgrade it.
Questions: Any reason not to upgrade to the latest version? Ways to get around the lack of HTML5 support?
Pinax (http://pinaxproject.com) Rides on top of Django and appears to provide most of the social network functionality
anyone would want.
Question: Reasons NOT to use it? Alternatives?
Pyjamas and wxPython -- this is the part that gets a little confusing. The basic idea behind these is the ability
to build a GUI. I've considered Silverlight and Flash, before the GAE/Python route, but a few working versions of
HTML5 apps convinced me that enough of it ALREADY runs on the latest batch of browsers to chose the HTML5/Javascript
route instead.
Question: How do I extend/supplement Python/Django to build an app-like HTML5 interface? Are Pyjamas and wxPython
the way to go? Or should I change my thinking completely?
Answers to some/any of these questions would be of great help. Please excuse my ignorance if any of this doesn't make much sense.
My last venture into web programming was a decent sized LAMP website some 5-6 years ago. On the desktop side of things,
my programming experience boils down to very high level scripting languages that I keep on learning to accomplish very specific
tasks :)

As someone who has deployed a Django site to GAE, I can tell you that you are not going to reach the ideal solution. Django on GAE misses some of the best aspects of Django because the ORM doesn't work right. The best compromise may be to use Django-nonrel to add the features back in.
This introduces it's own problems though: because of the large number of files and memory used by a Django app you're code will be unloaded from memory quickly after the app becomes idle. That means that visitors will frequently hit an approximately 6 second delay on the first page view after the site's code has been unloaded from memory while GAE uncompresses the zipped modules. Once your site is busy this won't be a problem, but while your site is still young and unknown it will cause the appearance of performance problems. :-(
Second, I've also worked for a company that built a custom CMS and can tell you that the first 80% is pretty easy, especially with modern frameworks. However, the rest can be quite challenging. For example, user roles and custom content types are two challenging aspects. Therefore strongly consider standing on the backs of giants and finding a CMS or CMS framework that almost perfectly meets your needs and then extend it to do that extra bit you need.
So, that said, answering your points:
Yes, you're limiting your options but that may be OK. Most developers are more comfortable with the relational model than the nosql model. Therefore more open source software is built with it in mind. Also, GAE is a closed source platform which is also a deterrent to open source developers. App Engine Oil is a CMS framework that may suit you well and is optimized for App Engine. Also look at web2py which has support for GAE.
I've found myself to be extremely productive with Python. I used to write a lot of PHP now I find it ugly. That said, think about the total line count of code you'll have to write. If you can make Drupal work with high quality pre-made modules you may find yourself only needing 1/10th of the code. By the way, the trick with Drupal is to mainly use only high quality modules. Look at the history, make sure not to use development versions. Try to contact the authors on IRC. I'm not saying you should use Drupal but it is possible to have a reliable site with it (for example, whitehouse.gov)
You're in the classic GAE/Django problem. If you use 0.96 you get great performance but you miss a lot of the great 1.0+ features and you don't get the ORM and all of it's benefits. If you use a newer version of Django you get the performance/memory problems mentioned above.
I'm about to investigate pinax for my company. I've done a very cursor glance at it. I don't know if it has good support for non relational model backends. You'll probably need to look at django-nonrel. However know that you're going to be investing in relatively untried solutions here. A small percentage of Django users use Pinax and an even smaller percentage, if any, use it on a nonrelational backend. Therefore you're going to be in the highly experimental scenario you mentioned in point 2 above.
I can't offer personal experience on it. I've investigated pyjamas a few times. However I like writing HTML CSS and JS. I like to have control. I like progressive enhancement and knowing what users will see if they don't have the full capabilities. Also, I think any new app that doesn't explicitly address mobile clients is implicitly shooting themself in the foot. As many as 15% of Internet users only use the Internet via their smart phone. What kind of experience will they get with pyjamas?
You didn't mention this, but one thing I consider when choosing a platform is vendor lockin and portability. If you develop your solution for GAE and find that you're not able to do what you want, will you be able to port it to another solution elsewhere? How much work will it take? If you code heavily for GAE or make commitments to its architecture, you're stuck with it or with rewriting to move. Using Django or Web2py can help mitigate this.
That said, the big benefit of Python GAE is that you get to be very productive, see your results instantly, get hosting for free while your site is small and get excellent scalability. These are not small things. There is great value there.

Secure plugin system for python application

I have an application written in python. I created a plugin system for the application that uses egg files. Egg files contain compiled python files and can be easily decompiled and used to hack the application. Is there a way to secure this system? I'd like to use digital signature for this - sign these egg files and check the signature before loading such egg file. Is there a way to do this programmatically from python? Maybe using winapi?

Is there a way to secure this system?
The answer is "that depends".
The two questions you should ask is "what are people supposed to be able to do" and "what are people able to do (for a given implementation)". If there exists an implementation where the latter is a subset of the former, the system can be secured.
One of my friend is working on a programming competition judge: a program which runs a user-submitted program on some test data and compares its output to a reference output. That's damn hard to secure: you want to run other peoples' code, but you don't want to let them run arbitrary code. Is your scenario somewhat similar to this? Then the answer is "it's difficult".
Do you want users to download untrustworthy code from the web and run it with some assurance that it won't hose their machine? Then look at various web languages. One solution is not offering access to system calls (JavaScript) or offering limited access to certain potentially dangerous calls (Java's SecurityManager). None of them can be done in python as far as I'm aware, but you can always hack the interpreter and disallow the loading of external modules not on some whitelist. This is probably error-prone.
Do you want users to write plugins, and not be able to tinker with what the main body of code in your application does? Consider that users can decompile .pyc files and modify them. Assume that those running your code can always modify it, and consider the gold-farming bots for WoW.
One Linux-only solution, similar to the sandboxed web-ish model, is to use AppArmor, which limits which files your app can access and which system calls it can make. This might be a feasible solution, but I don't know much about it so I can't give you advice other than "investigate".
If all you worry about is evil people modifying code while it's in transit in the intertubes, standard cryptographic solutions exist (SSL). If you want to only load signed plugins (because you want to control what the users do?), signing code sounds like the right solution (but beware of crafty users or evil people who edit the .pyc files and disables the is-it-signed check).

Maybe some crypto library like this http://chandlerproject.org/Projects/MeTooCrypto helps to build an ad-hoc solution. Example usage: http://tdilshod.livejournal.com/38040.html

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.