Security Issues with Anaconda?

Security Issues with Anaconda? - python

There has been some back and forth between myself and the IT department of a company I recently began working regarding the installation of Python / Anaconda suite on my work PC. The IT department is making claims of security risks (with Anaconda) but I suspect it’s more of a matter of them not wanting to give me access. My suspicion is based, not on my IT knowledge, but due to the fact that I’ve used Anaconda at my last job with no issues. I’m hoping for some insight of enterprise risks (if any) associated with installation of Anaconda. To summarize the situation/my knowledge:
• I am not a developer, nor do I come from an IT/enterprise risk background. I’ve used Python for analytics, data cleansing and report automation
• Current and past companies are within the finance industry, i.e. confidential information lives on the network
• I’m requesting Anaconda as opposed Anaconda Enterprise
• I’m requesting Python version 3.6.4
I’m not trying to write-off IT’s concerns. What I’m trying to do is better understand the situation, educate myself and either alleviate their concerns or propose an alternative all parties can work with.
So my questions are:
Are their security threats associated with leveraging Anaconda? If so, what specifically?
If the risk is too great, what are alternatives to simply installing the Anaconda Suite?
Thanks

Are their security threats associated with leveraging Anaconda? If so, what specifically?
It depends on the environment. Do you have admin privileges ? How the global GPO policy looks like ? For example if you don't have an admin rights you can't do things like create a socket, access network stack on os level and vice versa. The thing is, that you can do similar damage with CMD/PowerShell also. Is the former secure than the latter, I don't think so ....
If the risk is too great, what are alternatives to simply installing the Anaconda Suite?
It depends on what kind of functionality you need from Anaconda, maybe you can use different python interpreter/framework, but from security perspective looks the same.
Usually "IT Stuff" don't have a clue how to implement OS/Domain/Security in a proper manner so their solution is to tell everybody that it's a security risk. In these days, everything is a security risk.

One could argue that any programming environment is a potential security risk.
But such an argument is rather unconvincing in any office environment where you will probably find a well-known office suite installed that comes with its own built-in programming environment! You can cause plenty of mayhem with VBA macros hidden in Word en Excel documents... Especially when these documents are sent to unsuspecting co-workers.
To put it more bluntly, any environment that is not nailed down to the point of being unusable is a potential security risk. Can you e.g. run programs off a USB drive? Or open Office files from them? Or even copy files to them?
You could ask IT what their specific objections are?
BTW; You don't need administrator privileges to install Anaconda for yourself. Only if you want to install it for all users on your machine.
Edit: Another approach is to bring your own laptop and use Python on that.
This is an approach I'm using. And I would suggest using a laptop loaded with a UNIX-like operating system like a Linux distribution or one of the BSD variants. All of these come with a lot of tools out of the box and since they have decent package management (as opposed to ms-windows) basically every open-source tool you can think of is easily available. There is a learning curve associated with this, but on these systems tools are meant to work together instead of existing as pre-packaged one-trick ponies.
For example, I keep yearly logbooks of work-related stuff that should be documented. These logbooks can run into 200-300 pages, with hundreds of illustrations and graphs. For such a thing, ms-word just doesn't cut it; I've tried several times. So I use LaTeX, python, gnuplot and a host of other tools for it. And the whole thing is kept under revision control as a matter of course.
This laptop isn't connected to the network at work, so it cannot be a threat to that.

Related

Is there a way for two People to work on one Jupyter Notebook

A friend of mine and me are doing some field research for our Physics degree. And we are using jupyter notebook to analyse the data we get. We usually sit together working at two different copies of the same file that in the end will be drag and dropped together using jupyter lab. This is obviously not ideal, so i thought is there any way for just two people to work on one document in Jupyter, sadly Google Colab has been Deprecated and CoCalc is expensive. So i thought id ask here if there is a way to make one person run a Jupyter notebook and the other one just being able to access it over peer to peer aswell so we could write in the same file at the same time.
Do you guys know something that makes me do this maybe a workaround that i can do.
Thanks for answers in advance

CoCalc is expensive.
Fortunately, we also provide a complete free easy to install open source version of CoCalc, which you can run on any computer that supports Docker. For example, here's how to run it on Google cloud.
(I have put too many years of my life into making realtiime collaboration work for Jupyter via CoCalc... In any case, the open source code has been battle tested in production for a while now and is working well finally. I hope it can solve your problem...)

You can upload your notebook to Deepnote. It provides a hosted environment, where you and your colleague can connect at the same time and work on the same notebook in real-time (the same way you'd do in Google Docs).
Colab is also good, but writing at the same time will result in conflicts.

Notebook itself doesn't support to collaborate simultaneously, but you can use GitHub to manage your python script and upload it into Colab separately. This way Github can help manage the file history and solve the conflicts.

JupyterLab 3.1.0a7 introduced real time collaboration.
There is a screencast showing it in action.
Key thing to note is the new top-level menu item called Share, to the right of Settings & Help.
You can click on launch binder here or here to try it now.
"Once you see the JupyterLab interface, there's a new top-level menu item called "Share"; click that, grab and share that URL, and you're done!"-SOURCE: Step #5 here
There's a gist here that seems to be updated regularly with how to activate the feature.
There's a detailed walk-through here if you want to add the ability into your own repositories that can launch via MyBinder.org. Although if that repo falls behind the gist, you'll probably want to consult the gist for the current best practices once you have the idea from the detailed walk-through.
Closely related question with an answer by #krassowski, is here. You may want to look there for some additional details.

While you can use github for this it can get messy, many people clear output cells when committing to git to avoid conflict issues. Which would defeat the object of your review work.
You should try Curvenote (which we're building for that reason) it doesn't offer compute as its a collaborative writing tool, works on top of Jupyter via a chrome extenson and gives you real time versioning, commenting and diffs.

Google Colab has been Deprecated and CoCalc is expensive
Noteable.io is 100% free for all users including storage, compute, RAM. For your purposes, it will be ideal as you will get Google Drive like collaboration (commenting, #mentioning, Annotating data points), versioning, sharing, interactive visualizations, choice of using Python and SQL in the same notebook and a ton of other features.
Here are good example notebooks on Noteable:
Climate Change: An analysis of Dew Point for the city of Toronto
Healthcare Sector Employee Attrition Exploratory Data Analysis
Exploratory Data Analysis Using SQL and Python - Online Retailer Orders

Enforcing use of a package management system when developing with a team

I work on a large python code base with several teammates. We are often installing or updating dependencies on other python packages, and inevitably this causes problems when someone else updates their master branch from git, or we deploy on a new system.
I've seen many tools available for deploying environments on new computers, which are great. The problem is that these tools only work if everyone is consistently updating the relevant files (e.g. requirements.txt, setup.py, tarballs on a PyPI server...) every time they update or add a package.
We use Github's pull request system for code reviews. What would be great would be some means of indicating to the reviewer that the dependency structure has changed, prompting the reviewer to check for the necessary updates (also good would be to build in a checklist that the reviewer has to complete, reminding them to do the check).
How have other folks dealt with this problem?

I would enforce the use of tools with a network proxy or network ACL, to block public sites and stand up internal services like gitlab, bitbucket, GitHub enterprise, or internal pypi server, to force the use of certain standards.

Encrypted and secure docker containers

We all know situations when you cannot go open source and freely distribute software - and I am in one of these situations.
I have an app that consists of a number of binaries (compiled from C sources) and Python code that wraps it all into a system. This app used to work as a cloud solution so users had access to app functions via network but no chance to touch the actual server where binaries and code are stored.
Now we want to deliver the "local" version of our system. The app will be running on PCs that our users will physically own. We know that everything could be broken, but at least want to protect the app from possible copying and reverse-engineering as much as possible.
I know that Docker is a wonderful deployment tool so I wonder: is it possible to create encrypted Docker containers where no one can see any data stored in the container's filesystem? Is there a known solution to this problem?
Also, maybe there are well known solutions not based on Docker?

The root user on the host machine (where the docker daemon runs) has full access to all the processes running on the host. That means the person who controls the host machine can always get access to the RAM of the application as well as the file system. That makes it impossible to hide a key for decrypting the file system or protecting RAM from debugging.
Using obfuscation on a standard Linux box, you can make it harder to read the file system and RAM, but you can't make it impossible or the container cannot run.
If you can control the hardware running the operating system, then you might want to look at the Trusted Platform Module which starts system verification as soon as the system boots. You could then theoretically do things before the root user has access to the system to hide keys and strongly encrypt file systems. Even then, given physical access to the machine, a determined attacker can always get the decrypted data.

What you are asking about is called obfuscation. It has nothing to do with Docker and is a very language-specific problem; for data you can always do whatever mangling you want, but while you can hope to discourage the attacker it will never be secure. Even state-of-the-art encryption schemes can't help since the program (which you provide) has to contain the key.
C is usually hard enough to reverse engineer, for Python you can try pyobfuscate and similar.
For data, I found this question (keywords: encrypting files game).

If you want a completely secure solution, you're searching for the 'holy grail' of confidentiality: homomorphous encryption. In short, you want to encrypt your application and data, send them to a PC, and have this PC run them without its owner, OS, or anyone else being able to scoop at the data.
Doing so without a massive performance penalty is an active research project. There has been at least one project having managed this, but it still has limitations:
It's windows-only
The CPU has access to the key (ie, you have to trust Intel)
It's optimised for cloud scenarios. If you want to install this to multiple PCs, you need to provide the key in a secure way (ie just go there and type it yourself) to one of the PCs you're going to install your application, and this PC should be able to securely propagate the key to the other PCs.
Andy's suggestion on using the TPM has similar implications to points 2 and 3.

Sounds like Docker is not the right tool, because it was never intended to be used as a full-blown sandbox (at least based on what I've been reading). Why aren't you using a more full-blown VirtualBox approach? At least then you're able to lock up the virtual machine behind logins (as much as a physical installation on someone else's computer can be locked up) and run it isolated, encrypted filesystems and the whole nine yards.
You can either go lightweight and open, or fat and closed. I don't know that there's a "lightweight and closed" option.

I have exactly the same problem. Currently what I was able to discover is bellow.
A. Asylo(https://asylo.dev)
Asylo requires programs/algorithms to be written in C++.
Asylo library is integrated in docker and it seems to be feаsable to create custom dоcker image based on Asylo .
Asylo depends on many not so popular technologies like "proto buffers" and "bazel" etc. To me it seems that learning curve will be steep i.e. the person who is creating docker images/(programs) will need a lot of time to understand how to do it.
Asylo is free of charge
Asylo is bright new with all the advantages and disadvantages of being that.
Asylo is produced by Google but it is NOT an officially supported Google product according to the disclaimer on its page.
Asylo promises that data in trusted environment could be saved even from user with root privileges. However, there is lack of documentation and currently it is not clear how this could be implemented.
B. Scone(https://sconedocs.github.io)
It is binded to INTEL SGX technology but also there is Simulation mode(for development).
It is not free. It has just a small set of functionalities which are not paid.
Seems to support a lot of security functionalities.
Easy for use.
They seems to have more documentation and instructions how to build your own docker image with their technology.

For the Python part, you might consider using Pyinstaller, with appropriate options, it can pack your whole python app in a single executable file, which will not require python installation to be run by end users. It effectively runs a python interpreter on the packaged code, but it has a cipher option, which allows you to encrypt the bytecode.
Yes, the key will be somewhere around the executable, and a very savvy costumer might have the means to extract it, thus unraveling a not so readable code. It's up to you to know if your code contains some big secret you need to hide at all costs. I would probably not do it if I wanted to charge big money for any bug solving in the deployed product. I could use it if client has good compliance standards and is not a potential competitor, nor is expected to pay for more licenses.
While I've done this once, I honestly would avoid doing it again.
Regarding the C code, if you can compile it into executables and/or shared libraries can be included in the executable generated by Pyinstaller.

Considerations for moving our web app to an appliance

We have developed a web application running on Linux that is quite popular. We now wish to release it as an appliance so customers can run it internally on their own networks.
We are unsure of the best approach. We are flexible on areas such as: the Linux distro, whether it's a hardware or software only appliance. Does anyone have any advice on the best way to go about this? Links to any good resources on the subject? Questions we should be asking ourselves? Legal considerations for a commercial app? Security considerations?
UPDATE:
It's a Python based web application. We would like the user to be able to do everything via a web interface. No command line stuff etc.

I know when Github needed to do something similar, they contracted with a company who specializes in building installers called BitRock.
If you want to develop the solution yourself, you can't go wrong building a Debian package (or RPM if you prefer). That's what most linux system admins would be comfortable with, and there are very well-known ways to give them a mix of customization/control while also making the process easy to manage on your end. That gives you and your users a very well-known update path as well.
Unless you have had very specific requests from customers, I would shy away from a turnkey-style appliance where you provide hardware. It's extra work for you and could be a turn-off to customers. Different business have different needs though, so maybe your client-base doesn't have IT and would prefer an all-in-one solution. Until you ask, you can't be sure though.

It depends on what langauge/tecnology application is written.
If it is java, release war file + tomcat/jboss.
If it is python, release eggs.
If it is php... not sure, probably just .tar.bz2.
Linux distro or virtual image mught be advantage, but I dislike using them, because they are usually does not fits to my infra (why do I have to install some custom debian-based distro to my rhel-only infrastructure?).

I need a beginners guide to setting up windows for python development

I currently work with .NET exclusively and would like to have a go at python. To this end I need to set up a python development environment. I guide to this would be handy. I guess I would be doing web development so will need a web server and probably a database. I also need pointers to popular ORM's, an MVC framework, and a testing library.
One of my main criteria with all this is that I want to understand how it works, and I want it to be as isolated as possible. This is important as i am wary of polluting what is a working .NET environment with 3rd party web and database servers. I am perfectly happy using SQLite to start with if this is possible.
If I get on well with this I am also likely to want to set up automated build and ci server (On a virtual machine, probably ubuntu). Any suggestions for these would be useful.
My ultimate aim if i like python is to have similar sorts of tools that i have available with .NET and to really understand the build and deployment of it all. To start with I will settle for a simple development environment that is as isolated as possible and will be easy to remove if I don't like it. I don't want to use IronPython as I want the full experience of developing a python solution using the tools and frameworks that are generally used.

It's not that hard to set up a Python environment, and I've never had it muck up my .NET work. Basically, install Python --- I'd use 2.6 rather than 3.0, which is not yet broadly accepted --- and add it to your PATH, and you're ready to go with the language. I wouldn't recommend using a Ubuntu VM as your development environment; if you're working on Windows, you might as well develop on Windows, and I've had no significant problems doing so. I go back and forth from Windows to Linux with no trouble.
If you have an editor that you're comfortable with that has basic support for Python, I'd stick with it. If not, I've found Geany to be a nice, light, easy-to-use editor with good Python support, though I use Emacs myself because I know it; other people like SCITE, NotePad++, or any of a slew of others. I'd avoid fancy IDEs for Python, because they don't match the character of the language, and I wouldn't bother with IDLE (included with Python), because it's a royal pain to use.
Suggestions for libraries and frameworks:
Django is the standard web framework, but it's big and you have to work django's way; I prefer CherryPy, which is also actively supported, but is light, gives you great freedom, and contains a nice, solid webserver that can be replaced easily with httpd.
Django includes its own ORM, which is nice enough; there's a standalone one for Python, though, which is even nicer: SQL Alchemy
As far as a testing library goes, pyunit seems to me to be the obvious choice
Good luck, and welcome to a really fun language!
EDIT summary: I originally recommended Karrigell, but can't any more: since the 3.0 release, it's been continuously broken, and the community is not large enough to solve the problems. CherryPy is a good substitute if you like a light, simple framework that doesn't get in your way, so I've changed the above to suggest it instead.

Well, if you're thinking of setting up an Ubuntu VM anyway, you might as well make that your development environment. Then you can install Apache and MySQL or Postgres on that VM just via the standard packaging tools (apt-get install), and there's no danger of polluting your Windows environment.
You can either do the actual development on your Windows machine via your favourite IDE, using the VM as a networked drive and saving the code there, or you can just use the VM as a full desktop environment and do everything there, which is what I would recommend.

Install the pre-configured ActivePython release from activestate.
Among other features, it includes the PythonWin IDE (Windows only) which makes it easy to explore Python interactively.
The recommended reference is Dive Into Python, mentioned many times on similar SO discussions.

You should install python 2.4, python 2.5, python 2.6 and python 3.0, and add to your path the one you use more often (Add c:\Pythonxx\ and c:\Pythonxx\Scripts).
For every python 2.x, install easy_install; Download ez_setup.py and then from the cmd:
c:\Python2x\python.exe x:\path\to\ez_setup.py
c:\Python2x\Scripts\easy_install virtualenv
Then each time you start a new project create a new virtual environment to isolate the specific package you needs for your project:
mkdir <project name>
cd <project name>
c:\Python2x\Scripts\virtualenv --no-site-packages .\v
It creates a copy of python and its libraries in .v\Scripts and .\v\Lib. Every third party packages you install in that environment will be put into .\v\Lib\site-packages. The -no-site-packages don't give access to the global site package, so you can be sure all your dependencies are in .\v\Lib\site-packages.
To activate the virtual environment:
.\v\Scripts\activate
For the frameworks, there are many. Django is great and very well documented but you should probably look at Pylons first for its documentions on unicode, packaging, deployment and testing, and for its better WSGI support.
For the IDE, Python comes with IDLE which is enough for learning, however you might want to look at Eclipse+PyDev, Komodo or Wingware Python IDE. Netbean 6.5 has beta support for python that looks promising (See top 5 python IDE).
For the webserver, you don't need any; Python has its own and all web framework come with their own. You might want to install MySql or ProgreSql; it's often better to develop on the same DB you will use for production.
Also, when you have learnt Python, look at Foundations of Agile Python Development or Expert Python Programming.

Using Python on Windows
SO: Python tutorial for total beginners?

Take a look at Pylons, read about WSGI and Paste.
There's nice introductory Google tech talk about them: ReUsable Web Components with Python and Future Python Web Development.
Here's my answer to similar question:
Django vs other Python web frameworks?

NOTE: I included a lot of links to frameworks, projects and what-not, but as a new user I was limited to 1 link per answer. If someone else with enough reputation to edit wants/can edit them into this answer instead of the footnotes, I'd be grateful.
There are some Python IDE's such as Wing IDE[1], I believe some people use Eclipse[2] with a python plugin[3] as well. A lot of people in the #python channel of FreeNode seem to prefer vim, emacs, nano and similar text editors in favor of IDE's. My personal preffered editor is Vim, but if you've mostly done .NET development on windows, presumably with the usual Visual X IDE's, vim and emacs will probably cause you culture shock and you'd be better of using an IDE.
Nearly all python web frameworks* support the WSGI standard[4], most of the large web servers have some sort of plugin to support WSGI, the others support WSGI via fast cgi or plain cgi.
The Zope[5] and Django[6] frameworks have their own ORM's, of other ORM's the two most well known appear to be SQL Alchemy[7] and SQL Object[8]. I only have experience with the former, but both support all possible sane database choices, including SQLite which is installed together with Python and hence perfectly suited to testing and experimenting without polluting your .NET environment with 3rd part web servers and database servers.
The builtin unittest[9] and pyunit[10] frameworks seem to be the preffered solutions for unit testing, but I don't have much experience with these.
bpython[11] and ipython[12] offer enhanced interactive python shells which can greatly help speed up and testing small bits of code and hence worth looking in to.
As for a list of well known and often used web frameworks, look into the following frameworks**:
Twisted[13] is a generic networking framework, which supports almost every single protocol under the sun.
Pylons[14] is light-weight framework aimed at being as flexible as possible and leaving all the choices about what ORM, templating language and what-not to you.
CherryPy[15] tries to provide an interface to expose Python objects to the web.
Django[6] attempts to be an all-in-one solution, builtin template system, ORM, admin pages and internationalization. While the previous frameworks have more DIY wiring together various frameworks work involved with them.
Zope[5] is aimed to be suitable for large enterprise applications, I've heard nothing but good things about it, but consensus seems to be that for smaller you're probably better off with one of the simpler and smaller frameworks.
TurboGears[16] is the framework I know the least about, but it seems to be mostly competition for Django.
This is everything I can think of right now, I'll edit and add stuff if I can think of it. I hope this helps you some in the wonderful world of python.
* - The main exception would be Apache's mod_python, which you should avoid for exactly that reason, use mod_wsgi instead.
** - Word of warning, I have not personally used these frameworks this is just a very short impression I have gotten from talking to other people about each framework, it may be wildly inaccurate. (If anyone has any corrections, do comment and I'll try to edit and fix this answer).
(The http:// is missing since they're recognized as links otherwise)
[1] www.wingware.com/
[2] www.eclipse.org/
[3] pydev.sourceforge.net/
[4] wsgi.org/wsgi/
[5] www.zope.org/
[6] www.djangoproject.com/
[7] www.sqlalchemy.org/
[8] www.sqlobject.org/
[9] docs.python.org/library/unittest.html
[10] pyunit.sourceforge.net/pyunit.html
[11] www.bpython-interpreter.org/
[12] ipython.scipy.org/
[13] twistedmatrix.com/trac/
[14] pylonshq.com/
[15] www.cherrypy.org/
[16] turbogears.org/

Environment?
Here is the simplest solution:
Install Active Python 2.6. Its the Python itself, but comes with some extra handy useful stuff, like DiveintoPython chm.
Use Komodo Edit 5. It is among the good free editor you can use for Python.
Use IDLE. Its the best simplest short snippet editor, with syntax highlighting and auto complete unmatched by most other IDEs. It comes bundled with python.
Use Ipython. Its a shell that does syntax highlighting and auto complete, bash functions, pretty print, logging, history and many such things.
Install easy_install and/or pip for installing various 3rd party apps easily.
Coming from Visual Studio and .Net it will sound a lot different, but its an entirely different world.
For the framework, django works the best. Walk thro the tutorial and you will be impressed enough. The documentation rocks. The community, you have to see for yourself, to know how wonderful it is!!

Python has build in SQL like database and web server, so you wouldn't need to install any third party apps. Remember Python comes with batteries included.

If you've worked with Eclipse before you could give Pydev a try

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.