Quickly Search a Drive with Python

I recently asked this question and got a wonderful answer involving os.walk. My script uses it to search an entire drive for a specific folder with for root, dirs, files in os.walk(drive):. Unfortunately, on a 600 GB drive, this takes about 10 minutes.
Is there a better way to invoke this or a more efficient command to be using? Thanks!

If you're just looking for a small constant improvement, there are ways to do better than os.walk on most platforms.
In particular, walk ends up having to stat many regular files just to make sure they're not directories, even though that information is already available from the lower-level APIs on Windows, and could be on most *nix systems. Unfortunately, that information isn't exposed at the Python level… but you can get to it via ctypes, by building a C extension library, or by using third-party modules like scandir.
This may cut your time to somewhere between 10% and 90% of the original, depending on your platform and the details of your directory layout. But it's still a linear search that has to check every directory on your system. The only way to do better than that is to consult some kind of index. Your platform may have one (e.g., Windows Desktop Search or Spotlight); your filesystem may as well (but using it will require low-level calls, and may require root/admin access); or you can build one of your own.

Use subprocess.Popen to start a native 'find' process.
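For example, on a *nix system (a rough sketch; the mount point and folder name are placeholders, and on Windows you would need a different external tool):

    import subprocess

    # Let the platform's native 'find' do the traversal and read back its output.
    proc = subprocess.Popen(
        ["find", "/mnt/drive", "-type", "d", "-name", "target_folder"],
        stdout=subprocess.PIPE)
    matches = [line.decode().strip() for line in proc.stdout]
    proc.wait()
    print(matches)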

scandir.walk(path) gives results 2-20 times faster than os.walk(path).
You can install the module with pip install scandir.
The docs for scandir are here.
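A rough sketch of how that might look (the drive and folder names are placeholders; on Python 3.5+, os.walk already uses the scandir approach internally):

    import os

    try:
        from scandir import walk   # third-party backport: pip install scandir
    except ImportError:
        from os import walk        # fall back to the standard library

    def find_folder(drive, name):
        # Return as soon as the folder turns up instead of walking the whole drive.
        for root, dirs, files in walk(drive):
            if name in dirs:
                return os.path.join(root, name)

    print(find_folder("D:\\", "target_folder"))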

Related

Why do we need qresource files?

Why does Qt seem to need "resource files"? I mean the things you compile with pyside2-rcc resources.qrc -o resources.py, for example.
The documentation seems to say that it has something to do with accessing paths, abstracted in such a way that Python doesn't trip over the differences in a cross-platform application. But IME, Python does fine with Linux/Unix-style pathnames on Windows, especially if you use pathlib or os.path.join() or something.
In the days I used to use DOS, I found that command.com had problems with /, but you could binary-patch command.com to rename cd to something else, and use a cd.exe that worked with /'s fine - it was command.com that had the problem, not the underlying OS.
https://doc.qt.io/qtforpython/overviews/resources.html seems to say it has something to do with not losing files, which seems kinda odd.
So why does Qt need resource files, and can I safely skip them?
TL;DR: We are not required to use qresource files; they are only an option.
It is not that Qt requires qresources; rather, they are an option that the Qt world provides. That is, using them or not generally brings neither benefit nor harm by itself.
From a Qt/C++ point of view, resources let us embed files inside the compiled binary, and that abstraction avoids problems with filesystem paths. Since PySide2 is a Qt wrapper, it exposes the same tool.
And as you point out, several Python libraries already handle paths in a cross-platform way, avoiding the well-known path differences between operating systems, so you could use those instead.
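For completeness, a minimal sketch of the opt-in workflow (the resource path and file names here are hypothetical):

    # Generated once with: pyside2-rcc resources.qrc -o resources.py
    from PySide2 import QtGui, QtWidgets
    import resources  # importing the generated module registers the embedded data with Qt

    app = QtWidgets.QApplication([])
    label = QtWidgets.QLabel()
    label.setPixmap(QtGui.QPixmap(":/icons/app.png"))  # ":/" paths are read from the embedded data
    label.show()
    app.exec_()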

Deploying Python modules for 3D software

I have been developing a fairly extensive library of python modules that automate the more time consuming parts of "3D character development" for games/film/tv.
Until a few months ago, all of my code ran within Maya's dedicated Python interpreter. However, my GUIs are built in PySide/PyQt, so they run just fine on Mac/Windows/Linux and in a few other graphics programs such as Nuke, XSI, and Max.
What I would really like to figure out is a "simple" way to distribute my code to various different people ---> using various different operating systems ---> potentially using various applications (Nuke, XSI, Max), which, in turn, have their own dedicated Python interpreters.
The obvious option would be pip and easy_install. These tools are clearly the "right" way to go, but it's not really clear how a user would install/run them under the dedicated Python installs that ship with Maya/Nuke/etc. It does seem possible (as explained here), but it's still going to be a pretty big barrier for a less technical user.
Any help or pointers in the right direction would be immensely appreciated.
I would not say that pip/easy_install are the 'right' way for this problem. They are pretty good (not quite 'great') tools for motivated, technically inclined users -- but even in that context they have issues (such as unintended upgrades or deletions). Most importantly, they are opt-in methods: nobody can make you pip unless you want to. This means users can accidentally or deliberately get themselves into very different positions from each other, which makes support and maintenance a nightmare.
I've had very good luck in Maya distributing a zipped file containing a complete environment - all the modules and so on. userSetup.py adds that zip to the path, and Python's native zipimport functionality handles the rest. This ensures there is only one file to maintain and distribute. It also fixes the common problem of leftover .pyc files creating havoc after .py files get moved or renamed. Since this is all standard Python, I'd assume it will work for any app-specific interpreter that ships Python 2.6+, though I've never tried it in Nuke or Max.
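A minimal sketch of that userSetup.py, assuming the zip lives at a shared path (the path below is hypothetical):

    # userSetup.py -- Maya runs this at startup if it is found on PYTHONPATH
    import os
    import sys

    TOOLS_ZIP = r"\\studio\share\tools.zip"  # hypothetical shared location

    if os.path.exists(TOOLS_ZIP) and TOOLS_ZIP not in sys.path:
        # Python's zipimport machinery makes modules inside the archive importable.
        sys.path.insert(0, TOOLS_ZIP)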
The main wrinkle will be modules with .pyd or other binary components; typically these don't work inside zip files. I include a bootstrap routine which unpacks those to a (disposable) location on the user's disk and adds that location to the path.
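A sketch of what that bootstrap might look like (paths and names are made up):

    import os
    import sys
    import zipfile

    TOOLS_ZIP = r"\\studio\share\tools.zip"                           # hypothetical shared archive
    BIN_CACHE = os.path.join(os.path.expanduser("~"), ".tools_bin")   # disposable local cache

    # Binary extensions can't be imported from inside a zip, so unpack them locally.
    with zipfile.ZipFile(TOOLS_ZIP) as zf:
        for name in zf.namelist():
            if name.endswith((".pyd", ".so")):
                zf.extract(name, BIN_CACHE)

    if BIN_CACHE not in sys.path:
        sys.path.append(BIN_CACHE)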
There's a detailed discussion of the method here and some background here.

Python .pyc files and Windows UAC

I'm working on an Inno Setup installer for a Python application for Windows 7, and I have these requirements:
The app shouldn't write anything to the installation directory
It should be able to use .pyc files
The app shouldn't require a specific Python version, so I can't just add a set of .pyc files to the installer
Is there a recommended way of handling this? Like give the user a way to (re)generate the .pyc files? Or is the shorter startup time benefit from the .pyc files usually not worth worrying about?
PYC files aren't guaranteed to be compatible across different Python versions. If you don't know that all your customers are running the same Python version, you really don't want to distribute .pyc files directly. So, you have to choose between distributing PYCs and supporting multiple Python versions.
You could create a build process that compiles all your files using py_compile and zips them up into a version-specific package. You can do this with setuptools; however, it will be awkward because you'll have to run py_compile under every Python version you need to support.
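A rough sketch of such a build step, run once under each interpreter version you support (the directory and archive names are made up; on Python 3 the .pyc files land in __pycache__ subfolders):

    import compileall
    import os
    import sys
    import zipfile

    SRC = "src"
    compileall.compile_dir(SRC, quiet=1)   # byte-compile for the running interpreter

    tag = "py%d%d" % sys.version_info[:2]  # e.g. py27 -- bake the version into the name
    with zipfile.ZipFile("myapp-%s.zip" % tag, "w") as zf:
        for root, dirs, files in os.walk(SRC):
            for name in files:
                if name.endswith(".pyc"):
                    path = os.path.join(root, name)
                    zf.write(path, os.path.relpath(path, SRC))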
If you are basically distributing a closed application and don't want people to have trivial access to your source code, then py2exe is probably a simpler alternative. If your Python code is supposed to be integrated into the user's Python install, then it's probably simpler to just create a zip of your .py files and add a one-line .py stub that imports the zipped package(s) (Python can import straight from a zip archive via zipimport).
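A sketch of the stub for that zip-of-.py-files route (the archive and package names are placeholders):

    # run_myapp.py -- lives next to myapp.zip
    import os
    import sys

    sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "myapp.zip"))

    import myapp       # resolved from inside the zip via zipimport
    myapp.main()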
If it makes you feel better, .pyc files don't provide much extra security, and they don't really boost performance much either :)
If you haven't read PEP 3147, that will probably answer your questions.
I don't mean the solution described in that PEP and implemented as of Python 3.2. That's great if your "multiple Python versions" just means "3.2, 3.3, and probably future 3.x". Or even if it means "2.6+ and 3.1+, but I only really care about 3.2 and 3.3, so if I don't get the pyc speedups for other ones that's OK".
But when I asked about your supported versions, you said "2.7", which means you can't rely on PEP 3147 to solve your problems.
Fortunately, the PEP is full of discussion of earlier attempts to solve the problem, and the pitfalls of each, and there should be more than enough there to figure out what the options are and how to implement them.
The one problem is that the PEP is very linux-centric—mainly because it's primarily linux distros that tried to solve the problem in the past. (Apple also did so, but their solution was (a) pretty much working, and (b) tightly coupled with the whole Mac-specific "framework" thing, so they were mostly ignored…)
So, it largely leaves open the question of "Where should I put the .pyc files on Windows?"
The best choice is probably an app-specific directory under the user's local application data directory. See Known Folders if you can require Vista or later, CSIDL if you can't. Either way, you're looking for the FOLDERID_LocalAppData or CSIDL_LOCAL_APPDATA, which is:
The file system directory that serves as a data repository for local (nonroaming) applications. A typical path is C:\Documents and Settings\username\Local Settings\Application Data.
The point is that it's a place for applications to store data that's separate for each user (and inside that user's profile directory) and separate for each machine the user's roaming profile might end up on. That means you can safely put files there knowing the user has permission to write there without UAC getting involved, and knowing (as well as you ever can) that no other user or machine will interfere with what's there.
Within that directory, you create a directory for your program, and put whatever you want there, and as long as you picked a unique name (e.g., My Unique App Name or My Company Name\My App Name or a UUID), you're safe from accidental collision with other programs. (There used to be specific guidelines on this in MSDN, but I can no longer find them.)
So, how do you get to that directory?
The easiest way is to just use the env variable %LOCALAPPDATA%. If you need to deal with older Windows, you can use %USERPROFILE% and tack \Local Settings\Application Data onto the end, which is guaranteed to either be the same, or end up in the same place via junctions.
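Something along these lines (the app folder name is just an example):

    import os

    local_appdata = os.environ.get("LOCALAPPDATA")
    if not local_appdata:
        # Older Windows: build the pre-Vista path from the profile directory.
        local_appdata = os.path.join(os.environ["USERPROFILE"],
                                     "Local Settings", "Application Data")

    pyc_dir = os.path.join(local_appdata, "My Unique App Name", "compiled")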
You can also use pywin32 or ctypes to access the native Windows APIs (since there are at least 3 different APIs for this and at least two ways to access those APIs, I don't want to give all possible ways to write this… but a quick google or SO search for "pywin32 SHGetFolderPath" or "ctypes SHGetKnownFolderPath" or whatever should give you what you need).
Or, there are multiple third-party modules to handle this. The first one both Google and PyPI turned up was winshell.
Re-reading the original question, there's a much simpler answer that probably fits your requirements.
I don't know much about Inno, but most installers give you a way to run an arbitrary command as a post-copy step.
So, you can just use python -m compileall to create the .pyc files for you at install time—while you've still got elevated privileges, so there's no problem with UAC.
In fact, if you look at pywin32, and various other Python packages that come as installer packages, they do exactly this. This is an idiomatic thing to do for installing libraries into the user's Python installation, so I don't see why it wouldn't be considered reasonable for installing an executable that uses the user's Python installation.
Of course if the user later decides to uninstall Python 2.6 and install 2.7, your .pyc files will be hosed… but from your description, it sounds like your entire program will be hosed anyway, and the recommended solution for the user would probably be to uninstall and reinstall anyway, right?

Py2App Can't find standard modules

I've created an app using py2app, which works fine, but if I zip/unzip it, the newly unzipped version can't access standard python modules like traceback, or os. The manpage for zip claims that it preserves resource forks, and I've seen other applications packaged this way (I need to be able to put this in a .zip file). How do I fix this?
This is caused by building a semi-standalone version that contains symlinks to the natively installed files; as you've found, those links are lost when zipping/unzipping unless the "-y" option is used.
An alternative solution is to build a standalone version instead, which copies the (public domain) files inside the application and so survives zipping/unzipping better. It also means the app is more resilient to changes in the underlying OS. The downside is that it is bigger, of course, and more complicated to set up.
To build a standalone version, you need to install the python.org version of Python, which can be repackaged.
An explanation of how to do this is here, but read the comments as there have been some changes since the blog post was written.
Use zip -y ... to create the archive while preserving symlinks.
You probably need to give it your full PYTHONPATH.
That depends on your OS. Here's how to find out where the standard modules are loaded from:
    import os  # or any other standard module
    print(os.__file__)

Using C in a shared multi-platform POSIX environment

I write tools that are used in a shared workspace. Since there are multiple OS's working in this space, we generally use Python and standardize the version that is installed across machines. However, if I wanted to write some things in C, I was wondering if maybe I could have the application wrapped in a Python script, that detected the operating system and fired off the correct version of the C application. Each platform has GCC available and uses the same shell.
One idea was to have the C code compiled into the user's local ~/bin, comparing timestamps against the C source so it is recompiled only when the code is updated, not on every run. Another was to just compile it for each platform and have the wrapper script select the proper executable.
Is there an accepted/stable process for this? Are there any catches? Are there alternatives (assuming the absolute need to use native C code)?
Clarification: Multiple OS's are involved that do not share an ABI, e.g. OS X, various Linuxes, BSD, etc. I need to be able to update the code in place in shared folders and have the new code working more or less instantaneously. Distributing binary or source packages is less than ideal.
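A minimal sketch of that wrapper-script idea (the binary names and /shared/bin layout below are made up):

    #!/usr/bin/env python
    import os
    import subprocess
    import sys

    BIN_DIR = "/shared/bin"
    SUFFIXES = {"darwin": "mac", "linux2": "linux-x86", "linux": "linux-x86"}

    suffix = SUFFIXES.get(sys.platform)
    if suffix is None:
        sys.exit("unsupported platform: %s" % sys.platform)

    tool = os.path.join(BIN_DIR, "toolname-%s" % suffix)
    sys.exit(subprocess.call([tool] + sys.argv[1:]))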
Launching a Python interpreter instance just to select the right binary to run would be much heavier than you need. I'd distribute a shell .rc file which provides aliases.
In /shared/bin, you put the various binaries: /shared/bin/toolname-mac, /shared/bin/toolname-debian-x86, /shared/bin/toolname-netbsd-dreamcast, etc. Then, in the common shared shell .rc file, you put the logic to set the aliases according to platform, so that on OSX, it gets alias toolname=/shared/bin/toolname-mac, and so forth.
This won't work as well if you're adding new tools all the time, because the users will need to reload the aliases.
I wouldn't recommend distributing tools this way, though. Testing and qualifying new builds of the tools should be taking up enough time and effort that the extra time required to distribute the tools to the users is trivial. You seem to be optimizing to reduce the distribution time. Replacing tools that quickly in a live environment is all too likely to result in lengthy and confusing downtime if anything goes wrong in writing and building the tools--especially when subtle cross-platform issues creep in.
Also, you could use autoconf and distribute your application in source form only. :)
You know, you should look at static linking.
These days, we all have HUGE hard drives, and a few extra megabytes (for carrying around libc and what not) is really not that big a deal anymore.
You could also try running your applications in chroot() jails and distributing those.
Depending on your mix of OSes, you might be better off creating packages for each class of system.
Alternatively, if they all share the same ABI and hardware architecture, you could also compile static binaries.
