share files and functions through multiple projects

share files and functions through multiple projects - python

it's a kind of open question but please bear with me.
I am working on several projects (mainly with pandas) and I have created my standard approach to manage them:
1. create a main folder for all files in a project
2. create a data folder
3. have all the output in another folder
and so on.
One of my main activities is data cleaning, and in order to standardize it I have created a dictionary file where I store the various translation of the same entity, e.g. USA, US, United States, and so on, so that the files I am producing are consistent.
Every time I create a new project, I copy the dictionary file in the data directory and then:
xls = pd.ExcelFile(r"data/dictionary.xlsx")
df_area = xls.parse("area")
and after, to translate the country name into my standard, I call:
join_column, how_join = "country", "inner"
df_ct = pd.concat([
df_ct.merge(df_area, left_on=join_column, right_on="country_name", how=how_join),
df_ct.merge(df_area, left_on=join_column, right_on="alternative01", how=how_join),
and finally I check that I am not losing an record with a miss-join.
Over and over the same thing.
I would like to have a way to remove all this unnecessary cut and paste (of the file and of the code). Also, the file I used on the first projects are already deprecated and I need to update them (and sometime the code) when I need to process new data. Sometimes I also lose track of where is the latest dictionary file! Overall it's a lot of maintenance, which I believe might be saved.
Creating my own package is the way to go or is it a little too much ambitious?
Is there another shortcut? Overall it's not a lot of code, but multiplied by several projects.
Thanks for any insight, your time going through this is appreciated.

At the end I decided to create my own package.
It required some time so I am happy to share the details about the process (I run python on jupyter and windows).
The first step is to decide where to store the code.
In my case it was C:\Users\my_user\Documents
You need to add this directory to the list of the directories where python is looking for packages. this is achieved running the following statement:
import sys
sys.path.append("C:\\Users\\my_user\\Documents")
In order to run the above statement each time you start python, it must be included into a file in the directory (this directory might vary depending on your installation):
C:\Users\my_user\.ipython\profile_default\startup
the file can be named "00-first.py" ("50-middle.py" or "99-last.py" will also work)
To verify everything is working, restart python and run the command:
print(sys.path)
you should be able to see your directory at this point.
create a folder with the package name in your directory, and a subfolder (I prefer not to have code in the main package folder)
C:\Users\my_user\Documents\my_package\my_subfolder
put an empty file named "_ _init__.py" (note that there should be no space between underscores, but I do not know how to achieve it with the editor) in each of the two folders: my package and my_subfolder. At this point you should be able already to import your empty package from python
import my_package as my_pack
inside my_subfolder create a file (my_code.py) which will store the actual code
def my_function(name):
print("Hallo " + name)
modify the outer _ _init__.py file to include shortcuts. Add the following:
from my_package.my_subfolder.my_code import my_function
You should be able now to run the following in python:
my_pack.my_function("World!")
Hope you find it useful!

Related

naming and storing fileinformation for comparison

I am currently working on a script that automatically syncs files from the Documents and Picture directory with an USB stick that I use as sort of an "essentials backup". In practice, this should identify filenames and some information about them (like last time edited etc.) in the directories that I choose to sync.
If a file exists in one directory, but not in the other (i.e. it's on my computer but not on my USB drive), it should automatically copy that file to the USB as well. Likewise, if a file exists in both directories, but has different mod-times, it should replace the older with the newer one.
However, I have some issues with storing that information for the purpose of comparing those files. I initially thought about a file class, that stores all that information and through which I can compare objects with the same name.
Problem 1 with that approach is, that if I create an object, how do I name it? Do I name it like the file? I then would have to remove the file-extension like .txt or .py, because I'd run into trouble with my code. but I might have a notes.odt and a notes.jpg, which would be problem 2.
I am pretty new to Python, so my imagination is probably limited by my lack of knowledge. Any pointers on how I could make that work?

Create new directory with files without copying or overwriting old ones? (Python)

I'm working with many thousands of images held in a particular folder structure which is needed for certain image processing programs. There is a different program, however, that requires the images to be in a different folder structure. I can't change the programs, so what I do is just make the second folder structure out of the first one using copyfile from python's shutil module.
This works okay with small data sets, but my latest one is 12 gigs and it is so silly to have duplicates of everything. Is there a way to create multiple folders containing the "same" file?
Thanks so much!

The usual solution to this is to simply make a symbolic link with the desired name. This saves the trouble of consuming double disk storage. In UNIX (Linux), the command is
ln -s <existing_dir_name> <desried_dir_name>
In Windows, you make a "short cut".

Creating Executable Zip Archives With Packages in the Project

I feel this question needs a better title and I will amend it if someone suggests something better. The problem is I'm not sure of the terminology of the feature that I'm using here.
The best way to describe my problem is to show what I've done. The project is here: https://github.com/jeffnyman/quendor
This project is setup so it can be executed as a module. For example, from the project root someone could do this:
python3 -m quendor
I also have a build script to generate an in-memory zip (if I'm using that terminology correctly):
https://github.com/jeffnyman/quendor/blob/master/build.py
That works in that if you run build.py it will generate a quendor.py file that executes the entire project. That worked fine up until I included other directories (like my utilities and zinterface).
With the project as it is in the repo right now, if you run the build (.\build.py) and then run the generated file:
./quendor.py
You get the following error:
File "./quendor.py/quendor/__main__.py", line 6, in <module>
ModuleNotFoundError: No module named 'quendor.zinterface'
So a key point: if all of my files are in the same directory (i.e., in quendor) this build script works fine in terms of producing an executable script file.
But once I include the subdirectories and files in those directories, things go south on me with the above error.
I'm sure all the files are being gathered. I handle that starting on line 18 (https://github.com/jeffnyman/quendor/blob/master/build.py#L18). And if you were to add to line 24 this statement:
print(f"* {file_path}")
You would see it outputs the following:
* quendor/__init__.py
* quendor/__version__.py
* quendor/zinterface/fileio.py
* quendor/utilities/messages.py
* quendor/__main__.py
So I'm suspecting it might have to do with the code where I write the string at line 28 (https://github.com/jeffnyman/quendor/blob/master/build.py#L28). I feel I have to do more to let the executable zipped script file know about the modules.
But I'm not sure if (1) I'm accurate and (2) even if I'm accurate, if that's possible. I'm finding I'm in a bit over my head here.
Any thoughts would be appreciated and I'm happy to update with any necessarily clarifications or terminology.

So it won't let me comment unless I have more reputation but I can post an answer. Even though I don't have an answer, but rather a comment. I think the above comment was not meant for your actual __main__.py file but rather the one that is getting generated in your quendor.py file. You might want to try adding the import statements to your packed string that you write.
For example, see what happens if on line 32 you add this: import quendor.zinterface.fileio as zio. (Don't replace the line that's there. Just put my line and then keep your others.) I'm not sure how the zip process works but if it tries to mirror the module process that should work. However, if it doesn't, that won't work. You might also just want to try doing import quendor.zinterface. By itself that won't work but it would be interesting to see if it gave you a different error.

Actually, it turns out I found a way to do this! It required using os.walk rather than os.listdir. This required taking a few ideas that people here discussed. Here is the script that does the trick:
https://github.com/jeffnyman/quendor/blob/master/build.py
You can compare that with my previous commit that was trying to handle this a different way.
Eldritch was right that I couldn't just flatten the directory nor could I just add imports to the string I was writing to the final zip file. Jean-François was correct that I had to focus on the __main__.py that was being generated. My contribution was figuring out os.walk() and then parameterizing the written string to handle the different directories.
Finally, this solution does require, as per HTF's suggestion, that I put an empty __init__.py file in each package.
With my solution in place, you can run build.py which then generates the quendor.py script. That script then executes correctly, in terms of recognizing the imports to various packages.

Playing around with just about every variation of import and file gathering that I can think of with your repo, there's a good news / bad news thing.
The bad news is that the answer is this: it isn't possible.
The good news is this: you do have a working implementation if you just keep all files in the quendor directory rather than having subdirectories.
The other good news is you stumbled on something, and posed a problem, that Python gurus aren't able to answer. And there's a certain pleasure to be found in that! I guarantee you will not get an answer to this that works (except for the "all files in one directory" solution).
A refinement to the answer is that if you're setting up the program to run as a module anyway, just use a pip configuration. That basically does the same thing that you want but without having to go through the contortions. (Unless there's a reason you were doing the build the way you were rather than using pip.)

Maya Python: fix matching names

I am writing a script to export alembic caches of animation in a massive project containing lots of maya files. Our main character is having an issue; along the way his eyes somehow ended up with the same name. This has created issues with the alembic export. Dose maya already have a sort of clean up function that can correct matching names?

Any two objects can have the same names, but never the same DAG paths. In your script, make sure all your ls, listRelatives calls etc. Have the full path or longName or long flags set so you always operate on the full DAG paths as opposed to the possibly conflicting short names.

To my knowledge maya (and its python api) does not offer anything like that.
You'll have to run a snippet on export to check for duplicates before export.
Or, alternatively use an already existing script and run that.

can linux command line programs see python temporary files?

I have a simple web-server written using Python Twisted. Users can log in and use it to generate certain reports (pdf-format), specific to that user. The report is made by having a .tex template file where I replace certain content depending on user, including embedding user-specific graphs (.png or similar), then use the command line program pdflatex to generate the pdf.
Currently the graphs are saved in a tmp folder, and that path is then put into the .tex template before calling pdflatex. But this probably opens up a whole pile of problems when the number of users increases, so I want to use temporary files (tempfile module) instead of a real tmp folder. Is there any way I can make pdflatex see these temporary files? Or am I doing this the wrong way?

without any code it's hard to tell you how, but
Is there any way I can make pdflatex see these temporary files?
yes you can print the path to the temporary file by using a named temporary file:
>>> with tempfile.NamedTemporaryFile() as temp:
... print temp.name
...
/tmp/tmp7gjBHU

As commented you can use tempfile.NamedTemporaryFile. The problem is that this will be deleted once it is closed. That means you have to run pdflatex while the file is still referenced within python.
As an alternative way you could just save the picture with a randomly generated name. The tempfile is designed to allow you to create temporary files on various platforms in a consistent way. This is not what you need, since you'll always run the script on the same webserver I guess.
You could generate random file names using the uuid module:
import uuid
for i in xrange(3):
print(str(uuid.uuid4()))
The you save the pictures explictly using the random name and pass insert it into the tex-file.
After running pdflatex you explicitly have to delete the file, which is the drawback of that approach.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.