My Python package needs to download a data file (cc-cedict) that I cannot include in my package because of licensing issues. The file is large enough that I only want to force users to download it once and have it live somewhere in the local filesystem where it can be found by my package. I may want to treat other external data resources in this fashion as well.
What is the recommended way of doing this? Specifically, is there a platform-neutral way of getting a user-writable directory path where such a datafile can be stored and accessed?
I've done enough research to know that the file cannot be stored inside the package itself (I guess permissions would make that complicated even if it were allowed in general).
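For context, this is the kind of API I'm imagining; a minimal sketch using the third-party appdirs package (one option I've seen mentioned; the app name and the download helper below are made up, not something I've verified):

import os
import appdirs  # third-party: pip install appdirs

# appdirs resolves the conventional per-user data directory on each platform:
# ~/.local/share/<app> on Linux, ~/Library/Application Support/<app> on macOS,
# %LOCALAPPDATA%\<author>\<app> on Windows.
data_dir = appdirs.user_data_dir("myapp", "myauthor")  # placeholder names
if not os.path.isdir(data_dir):
    os.makedirs(data_dir)

cedict_path = os.path.join(data_dir, "cedict_ts.u8")
if not os.path.exists(cedict_path):
    download_cedict(cedict_path)  # hypothetical one-time download helper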
Related
I am currently writing a Python script for a project, but to move on with the project,
I need to know where all downloaded files/programs (.exe) go in Windows.
I know they go to the user's Downloads folder, as long as the user hasn't changed the default download location or used Save As to pick a different one. But what I am asking is: is there a way to locate all downloaded files regardless of where they were saved?
Any help would be greatly appreciated. I have googled but haven't found the answer I was looking for, so I'm hoping someone here can provide some insight.
The fact that a file was downloaded once-upon-a-time isn't something you can observe from the filesystem. In other words, no, you can't do this.
However, NTFS does store the fact that a file was JUST downloaded in an alternate data stream (ADS) which you can read. This is how Windows warns you that a file was downloaded from the internet and might be dangerous.
The problem with that is, if the file is ever opened and the user says the file is safe, that data is removed. You can't know that a file was previously downloaded, only if it was downloaded and has never been opened.
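For illustration, a minimal sketch of checking for that stream from Python on Windows (the path below is a made-up example; the Zone.Identifier stream name itself is standard):

path = r"C:\Users\me\Downloads\example.exe"  # made-up example path
try:
    # NTFS alternate data streams are addressed as "filename:streamname".
    with open(path + ":Zone.Identifier") as ads:
        print("File is marked as downloaded:")
        print(ads.read())  # typically contains a line like ZoneId=3
except OSError:
    # No stream: never downloaded, or the user has unblocked the file.
    print("No Zone.Identifier stream found.")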
If your python script needs to act upon some other files, you should ask the user where the files are, either on the command line or interactively.
There are a few things to consider.
You can always scan the entire drive to find the latest files. It's slow, but possible. That's your worst case scenario.
By extension, you can leverage the Windows file index - Windows Search. This will speed up searching and allow sorting by date, but is still just a faster version of the first option. In other words, it doesn't tell you anything new.
To get the current user's default Downloads folder, consider using Windows environment variables, such as %USERPROFILE%\Downloads. This greatly simplifies programmatically finding the current user's folder without having to know their username.
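For example, from Python you could resolve that default location and list its newest files like this (assuming the user hasn't relocated their Downloads folder):

import os

downloads = os.path.join(os.environ["USERPROFILE"], "Downloads")

# Sort the folder's contents by modification time, newest first.
files = sorted(
    (os.path.join(downloads, name) for name in os.listdir(downloads)),
    key=os.path.getmtime,
    reverse=True,
)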
(Edited) I had stated that there was no way to track the origin, but as pointed out in another answer, it is possible to tell whether a file was downloaded using ADS. Specifically, the Zone.Identifier stream signifies that the file came from a different "security zone", i.e. not this computer. Beyond that, it doesn't provide details of where the file came from, but perhaps that's not important for your use case.
I need to extract the dependencies and versions from a build.gradle file. I do not have access to the project folder, only to the file itself, so this answer does not work in my case. I am using Python to do the parsing, but it has not worked for me, especially since the file does not have a predefined structure like, for example, JSON.
I'm using these files to test my parsing:
Twidere gradle
votling gradle
Thanks in advance
Unfortunately, you can't do what you want.
As you can see from the answer given to the SO post you linked, a Gradle build file is a script. That script is written in either Kotlin or Groovy, and the version and dependencies can be defined programmatically in a multitude of ways. For instance, to set a version, you can hard-code it in the script, reference a system property, or get it through an included plugin, among other options. In your first example, it is set through an extension property, and in the second it is not even defined, likely leaving it up to the individual sub-projects if they even use it. In both examples, the build files are just a small part of a larger multi-project build, and each individual project potentially has its own defined dependencies and version.
So there is really no way to tell without actually evaluating the script. And you can't do that unless you have access to the full project structure.
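To make the limitation concrete, here is a best-effort sketch in Python: it only catches dependencies declared as plain string literals, and it necessarily misses anything defined through variables, extension properties, or plugins (the regex is an illustration, not a robust grammar):

import re

# Matches literal declarations such as:
#   implementation 'com.squareup.okhttp3:okhttp:3.8.1'
#   compile "org.slf4j:slf4j-api:1.7.25"
DEP_RE = re.compile(
    r"""\b(?:implementation|api|compile|testCompile|testImplementation)"""
    r"""\s+['"]([\w.\-]+):([\w.\-]+):([^'"]+)['"]"""
)

def naive_dependencies(gradle_text):
    # Returns (group, artifact, version) tuples for literal declarations only.
    # A version like "$okhttpVersion" is captured verbatim and unresolved,
    # which is exactly the case that cannot be handled without evaluating
    # the script.
    return DEP_RE.findall(gradle_text)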
My goal is to make a program I've written easily accessible to potential employers/etc. in order to... showcase my skills... or whatever. I am not a computer scientist, and I've never written a Python module meant for installation before, so I'm new to this aspect.
I've written a machine learning algorithm, and fit parameters to data that I have locally. I would like to distribute the algorithm with "default" parameters, so that the downloader can use it "out of the box" for classification without having a training set. I've written methods which save the parameters to/load the parameters from text files, which I've confirmed work on my platform. I could simply ask users to download the files I've mentioned separately and use the loadParameters method I've created to manually load the parameters, but I would like to make the installation process as easy as possible for people who may be evaluating me.
What I'm not sure about is how to package the text files in such a way that they can automatically be loaded in the __init__ method of the object I have.
I have put the algorithm and files on github here, and written a setup.py script so that it can be downloaded from github using pip like this:
pip install --upgrade https://github.com/NathanWycoff/SySE/tarball/master
However, this doesn't seem to install the text files containing the data I need, only the __init__.py Python file containing my code.
So I guess the question boils down to: How do I force pip to download additional files aside from just the module in __init__.py? Or, is there a better way to load default parameters?
Yes, there is a better way to distribute data files with a Python package.
First of all, read up on proper Python package structure. For instance, it's not recommended to put code into __init__.py files. They just mark a directory as a Python package, plus you can put some import statements there. So it's better to put your SySE class into (for instance) a file syse.py in that directory; then in __init__.py you can write from .syse import SySE.
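So the layout would look something like this (syse.py is a suggested name, not a requirement):

setup.py
/syse
__init__.py (just: from .syse import SySE)
syse.py (the SySE class and the rest of the code)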
Now, the data files. By default, setuptools distributes only *.py and a few other special files (README, LICENSE and so on). However, you can tell setuptools that you want to distribute other files with the package: use setup()'s package_data keyword argument (more about that here). Also don't forget to list all your data files in MANIFEST.in (more on that here).
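A minimal sketch of both pieces; the package name and data path below are assumptions, so adjust them to your actual layout:

# setup.py
from setuptools import setup

setup(
    name="SySE",
    version="0.1",
    packages=["syse"],
    # Ship the bundled parameter files alongside the code:
    package_data={"syse": ["data/*.txt"]},
    include_package_data=True,
)

And in MANIFEST.in (a plain text file next to setup.py):

include syse/data/*.txt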
If you do all of the above correctly, you can then use the pkg_resources package to locate your data files at runtime. pkg_resources handles all the possible situations: your package can be distributed in several ways; it can be installed from a pip server, from a wheel, as an egg, and so on (more on that here).
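For example, loading a bundled parameter file at runtime might look like this (the package and file names are placeholders):

import pkg_resources

# Works whether the package is installed flat, as an egg, or inside a zip.
raw = pkg_resources.resource_string("syse", "data/parameters.txt")  # bytes

# Or get a real filesystem path (extracted to a cache if necessary):
path = pkg_resources.resource_filename("syse", "data/parameters.txt")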
Lastly, if your package is public, I can only recommend uploading it to PyPI (if it is not public, you can run your own pip server). Register there and upload your package. You can then simply run pip install syse to install it from anywhere. It's quite likely the best way to distribute your package.
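The upload itself is only a couple of commands; twine is the commonly recommended upload tool (this assumes you have registered a PyPI account):

python setup.py sdist
twine upload dist/*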
It's quite a lot of work and reading, but I'm pretty sure you will benefit from it.
Hope this helps.
I have a Python script with a make install option (installing by default to /usr/local/lib/python2.7/dist-packages/). But the script also generates files with user-specific mutable data during normal usage. It seems to me that I should not keep the compiled script files together with the data. What is the conventional default place to store software data in such cases?
Summarizing from the Filesystem Hierarchy Standard:
Immutable architecture-independent data should go in /usr/share or /usr/local/share. Mutable data should go in the user's home directory if it's user-specific (XDG provides more guidance here), or in /var if it's system-wide (this usually requires a group-owned directory and files, and a setgid application, to allow writing to a shared file).
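For the user-specific case, a minimal Python sketch following the XDG convention might look like this (the application name is a placeholder):

import os

# XDG: honor $XDG_DATA_HOME, defaulting to ~/.local/share.
base = os.environ.get("XDG_DATA_HOME") or os.path.expanduser("~/.local/share")
data_dir = os.path.join(base, "myapp")  # placeholder application name
if not os.path.isdir(data_dir):
    os.makedirs(data_dir)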
/usr/share and /usr/local/share usually have a structure that somewhat mirrors /usr/lib and /usr/local/lib; I don't know about Python, but Perl has a module File::ShareDir that helps a module with installing and accessing data in a share directory corresponding to the directory where the module is installed.
And don't forget the other option: just ask the user where the data should go.
Using this general structure:
setup.py
/package
__init__.py
project.py
/data
client.log
I have a script that saves a list of names to client.log, so I don't have to reinitialize that list each time I need access to it or run the module. Before I set up this structure with pkg_resources, I used open('.../data/client.log', 'w') to update the log with explicit paths, but this doesn't work anymore.
Is there any way to edit data files within modules? Or is there a better way to save this list?
No, pkg_resources is for reading resources within a package. You can't use it to write log files, because a package is the wrong place for log files. Your package directory should typically not be writable by the user that loads the library. Also, your package may in fact be inside a ZIP file.
You should instead store the logs in a log directory. Where to put it depends on a lot of things; the biggest factor is your operating system, but also whether it's system software or user software.
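As a minimal sketch, using only the standard library and a hidden per-user directory (the directory name is a placeholder; a platform-specific location would be more polished):

import os

# A per-user location the package can safely write to.
data_dir = os.path.join(os.path.expanduser("~"), ".myproject")  # placeholder
if not os.path.isdir(data_dir):
    os.makedirs(data_dir)

log_path = os.path.join(data_dir, "client.log")
with open(log_path, "a") as f:  # append a new entry to the list
    f.write("new client name\n")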