Remove unused code automatically from large modules like numpy, pandas

Remove unused code automatically from large modules like numpy, pandas - python

I'm creating a tool for my company's developers that will take a python project, zip it up with the required modules from the site-packages of a virtualenv, and run the code in an AWS Lambda function. I need to do this because Lambda doesn't come with 3rd party modules, and also won't let you install using pip, so I have to bring them all myself. The problem is that there's also a 250mb limit for the total amount of code, and if I have numpy+pandas, that's already over the limit. However, the developers aren't using nearly the full functionality of these modules, so the actual amount of code being used is tiny in comparison. I'd like something that'll strip out at least some of the unused code based on what the developer's project uses, or at least give me enough information to where I can write a script to remove the dead code automatically. Does this exist, or is it at least partially implemented?

You could try PyMinifier which looks like it can reduce size by about half. It supports obfusticating code, compressing your project, and it looks like it has an analyzer to look for and exclude unused imports.
Edit: Linked to the documentation in my answer, here's the GitHub repo.

Related

Can a Package containing several sub-packages be also called as a Library in Python?

I am a little bit confused in the difference between a package and a library. When I install packages from pypi.org, these packages contain several sub-packages, that contain modules. When I googled the difference between a package and I library, I found this.
And that being the case, can a package contain several sub-packages be also called as a library? If no then what is a library? And what is the difference between a library and a package containing sub-packages?

Library
Most often will refer to the general library or another collection created with a similar format and use. The General Library is the sum of 'standard', popular and widely used Modules, witch can be thought of as single file tools, for now or short cuts making things possible or faster. The general library is an option most people enable when installing Python. Because it has this name "Python General Library" it is used often with similar structure, and ideas. Witch is simply to have a bunch of Modules, maybe even packages grouped together, usually in a list. The list is usually to download them. Generally it is just related files, with similar interests. That is the easiest way to describe it.
Module
A Module refers to a file. The file has script 'in it' and the name of the file is the name of the module, Python files end with .py. All the file contains is code that ran together makes something happen, by using functions, strings ect. Main modules you probably see most often are popular because they are special modules that can get info from other files/modules. It is confusing because the name of the file and module are equal and just drop the .py. Really it's just code you can use as a shortcut written by somebody to make something easier or possible.
Package
This is a termis used to generally sometimes, although context makes a difference. The most common use from my experience is multiple modules (or files) that are grouped together. Why they are grouped together can be for a few reasons, that is when context matters. These are ways I have noticed the term package(s) used. They are a group of Downloaded, created and/or stored modules. Which can all be true, or only 1, but really it is just a file that references other files, that need to be in the correct structure or format, and that entire sum is the package itself, installed or may have been included in the python general library. A package can contain modules(.py files) because they depend on each other and sometimes may not work correctly, or at all. There is always a common goal of every part (module/file) of a package, and the total sum of all of the parts is the package itself.
Most often in Python Packages are Modules, because the package name is the name of the module that is used to connect all the pieces. So you can input a package because it is a module, also allows it to call upon other modules, that are not packages because they only perform a certain function, or task don't involve other files. Packages have a goal, and each module works together to achieve that final goal.
Most confusion come from a simple file file name or prefix to a file, used as the module name then again the package name.
Remember Modules and Packages can be installed. Library is usually a generic term for listing, or formatting a group of modules and packages. Much like Pythons general library. A hierarchy would not work, APIs do not belong really, and if you did they could be anywhere and every ware involving Script, Module, and Packages, the worl library being such a general word, easily applied to many things, also makes API able to sit above or below that. Some Modules can be based off of other code, and that is the only time I think it would relate to a pure Python related discussion.

Extract dependencies and versions from a gradle file with Python

I need to extract the dependencies and versions of build.gradle file. I do not have access to the project folder, only to the file, so this answer does not work in my case. I am using python to do the parsing, but it has not worked for me, especially since it does not have a structure already defined for example JSON.
I'm using these files to test my parsing:
Twidere gradle
votling grade
Thanks in advance

Unfortunately, you can't do what you want.
As you can see from the answer given to the SO post you linked, a gradle build file is a script. That script is written in either Kotlin or Groovy, and you can programmatically define the version and dependencies in a multitude of ways. For instance, to set a version, you can hard code it in the script, reference a system property, or get it through an included plugin and more. In your first example, it is set through an extension property, and in the second it is not even defined - likely leaving it up to the individual sub-projects if they even use it. In both examples, the build files are just a small part of a larger multi-project, and each individual project potentially has their own defined dependencies and version.
So there is really no way to tell without actually evaluating the script. And you can't do that unless you have access to the full project structure.

Deploying Python modules for 3D software

I have been developing a fairly extensive library of python modules that automate the more time consuming parts of "3D character development" for games/film/tv.
All of my code up until a few months ago has been run within Maya's dedicated python interpreter, however, my GUIs are built in PySide/PyQt, and so, run just fine in mac/windows/linux or a few other Graphics programs such as Nuke, XSI, Max.
What I would really like to figure out is a "simple" way to distribute my code to various different people ---> using various different operating Systems ---> potentially using various applications (Nuke, XSI, Max), which, in turn, have their own dedicated python interpreters.
The obvious option would be pip and easy_install.. These modules are clearly the "right" way to go, but its not really clear how a user would install/run them under the dedicated python installs that ship with Maya/Nuke/ etc...Though, it does seem possible (as explained here). Still Its going to be a pretty big barrier for a less-technical user.
Any help or points in the right direction would be immensely appreciated..

I would not say that pip/easy_install are the 'right' way for this problem. They are pretty good (not quite 'great') tools for motivated, technically inclined users -- but even in that context they have issues (such as unintended upgrades or deletions). Most importantly, they are opt-in methods: nobody can make you pip unless you want to. This means users can accidentally or deliberately get themselves into very different positions from each other, which makes support and maintenance a nightmare.
I've had very good luck in Maya distributing a zipped file containing a complete environment - all the modules etc. userSetup.py adds that zip to the path and the Python's native zipimport functionality handles the rest. This makes sure that there is only one file to maintain and distribute. It also fixes the common problem of leftover .pyc files creating havok after .py files get moved or renamed. Since this is all standard python, I'd assume this will work for any app-specific python that uses a 2.6+ version of python, though I've never tried it in Nuke or Max.
The main wrinkle will be modules with .pyd or other binary components, typically these don't work inside the zip files. I include a bootstrap routine which unpacks those to a (disposable) location on the user's disk and adds that to the path.
There's a detailed discussion of the method here and some background here

deploying python applications

Is it possible to deploy python applications such that you don't release the source code and you don't have to be sure the customer has python installed?
I'm thinking maybe there is some installation process that can run a python app from just the .pyc files and a shared library containing the interpreter or something like that?
Basically I'm keen to get the development benefits of a language like Python - high productivity etc. but can't quite see how you could deploy it professionally to a customer where you don't know how there machine is set up and you definitely can't deliver the source.
How do professional software houses developing in python do it (or maybe the answer is that they don't) ?

You protect your source code legally, not technologically. Distributing py files really isn't a big deal. The only technological solution here is not to ship your program (which is really becoming more popular these days, as software is provided over the internet rather than fully installed locally more often.)
If you don't want the user to have to have Python installed but want to run Python programs, you'll have to bundle Python. Your resistance to doing so seems quite odd to me. Java programs have to either bundle or anticipate the JVM's presence. C programs have to either bundle or anticipate libc's presence (usually the latter), etc. There's nothing hacky about using what you need.
Professional Python desktop software bundles Python, either through something like py2exe/cx_Freeze/some in-house thing that does the same thing or through embedding Python (in which case Python comes along as a library rather than an executable). The former approach is usually a lot more powerful and robust.

Yes, it is possible to make installation packages. Look for py2exe, cx_freeze and others.
No, it is not possible to keep the source code completely safe. There are always ways to decompile.
Original source code can trivially be obtained from .pyc files if someone wants to do it. Code obfuscation would make it more difficult to do something with the code.

I am surprised no one mentioned this before now, but Cython seems like a viable solution to this problem. It will take your Python code and transpile it into CPython compatible C code. You also get a small speed boost (~25% last I checked) since it will be compiled to native machine code instead of just Python byte code. You still need to be sure the user has Python installed (either by making it a pre-requisite pushed off onto the user to deal with, or bundling it as part of the installer process). Also, you do need to have at least one small part of your application in pure Python: the hook into the main function.
So you would need something basic like this:
import cython_compiled_module
if __name__ == '__main__':
cython_compiled_module.main()
But this effectively leaks no implementation details. I think using Cython should meet the criteria in the question, but it also introduces the added complexity of compiling in C, which loses some of Python's easy cross-platform nature. Whether that is worth it or not is up to you.
As others stated, even the resulting compiled C code could be decompiled with a little effort, but it is likely much more close to the type of obfuscation you were initially hoping for.

Well, it depends what you want to do. If by "not releasing the source code" you mean "the customer should not be able to access the source code in any way", well, you're fighting a losing battle. Even programs written in C can be reverse engineered, after all. If you're afraid someone will steal from you, make them sign a contract and sue them if there's trouble.
But if you mean "the customer should not care about python files, and not be able to casually access them", you can use a solution like cx_Freeze to turn your Python application into an executable.

Build a web application in python. Then the world can use it via a browser with zero install.

Python4Delphi-powered program, how to deploy it?

My Delphi program uses Python4Delphi, and the Python script uses some standard libs. When deploying my program I don't want an entire Python installation, and it must work with python27.dll. What are the minimal set of necessary files? The document in Python4Delphi is dated and it's not clear to me...
Thanks for your help.

When I did this, I made the list myself, of what I needed for my embedded python application to work.
I remember this worked with python15.dll:
PythonXX.dll should work, without any other external files other than the Visual C++ Runtime DLLs, which require a side-by-side manifest (see the p4d wiki page) to work.
If you want to IMPORT something, then you need to ship it and anything it depends on. That means, either you pick part of the python standard libraries you want, or you pick all of it. There is no way you need all of Python's standard libraries. But I wouldn't want to live without OS and a a few other key ones. BUt the decision is yours.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Remove unused code automatically from large modules like numpy, pandas - python

You could try PyMinifier which looks like it can reduce size by about half. It supports obfusticating code, compressing your project, and it looks like it has an analyzer to look for and exclude unused imports. Edit: Linked to the documentation in my answer, here's the GitHub repo.

Related

Can a Package containing several sub-packages be also called as a Library in Python?

Extract dependencies and versions from a gradle file with Python

Deploying Python modules for 3D software

deploying python applications

Python4Delphi-powered program, how to deploy it?

Categories

Resources