Creating Executable Zip Archives With Packages in the Project


I feel this question needs a better title and I will amend it if someone suggests something better. The problem is I'm not sure of the terminology of the feature that I'm using here.
The best way to describe my problem is to show what I've done. The project is here: https://github.com/jeffnyman/quendor
This project is set up so it can be executed as a module. For example, from the project root someone could do this:
python3 -m quendor
I also have a build script to generate an in-memory zip (if I'm using that terminology correctly):
https://github.com/jeffnyman/quendor/blob/master/build.py
That works in that if you run build.py it will generate a quendor.py file that executes the entire project. That worked fine up until I included other directories (like my utilities and zinterface).
With the project as it is in the repo right now, if you run the build (.\build.py) and then run the generated file:
./quendor.py
You get the following error:
File "./quendor.py/quendor/__main__.py", line 6, in <module>
ModuleNotFoundError: No module named 'quendor.zinterface'
So a key point: if all of my files are in the same directory (i.e., in quendor) this build script works fine in terms of producing an executable script file.
But once I include the subdirectories and files in those directories, things go south on me with the above error.
I'm sure all the files are being gathered. I handle that starting on line 18 (https://github.com/jeffnyman/quendor/blob/master/build.py#L18). And if you were to add to line 24 this statement:
print(f"* {file_path}")
You would see it outputs the following:
* quendor/__init__.py
* quendor/__version__.py
* quendor/zinterface/fileio.py
* quendor/utilities/messages.py
* quendor/__main__.py
So I'm suspecting it might have to do with the code where I write the string at line 28 (https://github.com/jeffnyman/quendor/blob/master/build.py#L28). I feel I have to do more to let the executable zipped script file know about the modules.
But I'm not sure if (1) I'm accurate and (2) even if I'm accurate, if that's possible. I'm finding I'm in a bit over my head here.
Any thoughts would be appreciated and I'm happy to update with any necessary clarifications or terminology.

The site won't let me comment without more reputation, but I can post an answer, even though this is really a comment rather than an answer. I think the above comment was not meant for your actual __main__.py file but rather the one that is getting generated in your quendor.py file. You might want to try adding the import statements to the packed string that you write.
For example, see what happens if on line 32 you add this: import quendor.zinterface.fileio as zio. (Don't replace the line that's there. Just put my line and then keep your others.) I'm not sure how the zip process works but if it tries to mirror the module process that should work. However, if it doesn't, that won't work. You might also just want to try doing import quendor.zinterface. By itself that won't work but it would be interesting to see if it gave you a different error.

Actually, it turns out I found a way to do this! It required using os.walk rather than os.listdir. This required taking a few ideas that people here discussed. Here is the script that does the trick:
https://github.com/jeffnyman/quendor/blob/master/build.py
You can compare that with my previous commit that was trying to handle this a different way.
Eldritch was right that I couldn't just flatten the directory nor could I just add imports to the string I was writing to the final zip file. Jean-François was correct that I had to focus on the __main__.py that was being generated. My contribution was figuring out os.walk() and then parameterizing the written string to handle the different directories.
Finally, this solution does require, as per HTF's suggestion, that I put an empty __init__.py file in each package.
With my solution in place, you can run build.py which then generates the quendor.py script. That script then executes correctly, in terms of recognizing the imports to various packages.
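For readers who want to see the general idea, here is a minimal sketch of building an executable zip with os.walk so that sub-packages keep their directory structure inside the archive. It is not the build.py from the repository: the function name build_executable_zip, the default interpreter line, and the generated top-level __main__.py that hands off to the package through runpy are all assumptions made for illustration.

```python
import os
import stat
import zipfile


def build_executable_zip(package_dir, output_path, interpreter="/usr/bin/env python3"):
    """Pack package_dir, including its sub-packages, into a runnable zip archive."""
    package_name = os.path.basename(os.path.normpath(package_dir))
    parent = os.path.dirname(os.path.abspath(package_dir))

    with open(output_path, "wb") as out:
        # A shebang line before the zip data lets a POSIX shell run the file directly.
        out.write(f"#!{interpreter}\n".encode())
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as archive:
            # os.walk descends into every sub-directory, unlike os.listdir.
            for root, _dirs, files in os.walk(package_dir):
                for name in files:
                    if not name.endswith(".py"):
                        continue
                    file_path = os.path.join(root, name)
                    # Store paths relative to the package's parent so that
                    # pkg/sub/mod.py stays importable as pkg.sub.mod.
                    arcname = os.path.relpath(file_path, parent).replace(os.sep, "/")
                    archive.write(file_path, arcname)
            # Python runs the archive's top-level __main__.py; this one simply
            # executes the package as if "python -m <package>" had been used.
            archive.writestr(
                "__main__.py",
                "import runpy\n"
                f"runpy.run_module({package_name!r}, run_name='__main__')\n",
            )

    # Mark the result as executable for user, group and others.
    mode = os.stat(output_path).st_mode
    os.chmod(output_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return output_path
```

Calling build_executable_zip("quendor", "quendor.py") from the project root would be the equivalent usage here. The standard library's zipapp module covers much the same ground and may be worth a look before maintaining a custom script.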

Playing around with just about every variation of import and file gathering that I can think of with your repo, there's a good news / bad news thing.
The bad news is that the answer is this: it isn't possible.
The good news is this: you do have a working implementation if you just keep all files in the quendor directory rather than having subdirectories.
The other good news is you stumbled on something, and posed a problem, that Python gurus aren't able to answer. And there's a certain pleasure to be found in that! I guarantee you will not get an answer to this that works (except for the "all files in one directory" solution).
A refinement to the answer is that if you're setting up the program to run as a module anyway, just use a pip configuration. That basically does the same thing that you want but without having to go through the contortions. (Unless there's a reason you were doing the build the way you were rather than using pip.)

Related

How can I prevent the filename from changing (rename) using python?

I have a project in mind, but there is a part that I don't know how to do. I'm using Python version 3.6 and Windows 10. For example, we have a file named "example.txt". I want to prevent the name and the content of this file from being changed.
I researched this topic but could not find anything. Can we prevent a file's name (including its extension) or its contents from being changed? To do this, I think the script would need to run as administrator.
Thanks.

It is possible to stop another program from editing a file by locking it in python.
There is a module that does this called filelock. Take a look at the source code to see how it is done.
It is also worth noting that more advanced ransomware will try to stop processes so they can encrypt files, so this might not work in all cases.

How can the root directory in python chunk be specified?

Setting the root directory in a python-chunk with the following code line results in an error while for an ordinary r-chunk it works just fine
knitr::opts_knit$set(root.dir ="..")
Optimally there should exist the following options for each knitr-chunk:
- directory to find code to be imported / executed
- directory to find files / dependencies that are needed for code execution
- directory to save any code output
Does something similar exist?

What it looks like here is that you have told it that it is to look for python code:
```{python}
knitr::opts_knit$set(root.dir ="..")
```
When you run this in R studio it will give you an error:
Error: invalid syntax (, line 1)
You told it to expect Python code but fed it R code instead. This makes sense, as the call knitr::opts_knit$set means to look in the knitr package for opts_knit$set and set it to…. This doesn’t exist in Python… yet. The Python interpreter does not recognize it as Python code and returns a syntax error, whereas when you run it as an R chunk, it knows to look into the knitr package. Error handling can be a huge issue to deal with. It makes more sense to handle error categories than to account for every type of error. If you want to control the settings for a code chunk, you would do so inside the braces of the chunk header, i.e.:
```{python, args }
codeHere
```
I have not seen args for any other language than R, but that does not mean it doesn’t exist. I have just not seen them. I also do not believe that this will fix your issue. You could try some of the following ideas:
Write your Python in a separate file and link to it. This would allow you to take advantage of the language and use things like the os module. This may be something you want to consider, as Python has its own ways of navigating the various operating systems. This may be helpful if you are just running quick scripts and not loading or running Python programs.
# OS module
import os
# Your os name
print(os.name)
# Gets PWD or present working directory
os.getcwd()
# change your directory
os.chdir("path")
You could try using the reticulate library within an R chunk and load your python that way
Another thought is that you could try
library(reticulate)
use_python("path")
Knitr looks in the same directory as your markdown file to find other files if needed. This is just like almost any other IDE.
At one point in time knitr would not accept R’s setwd() command. Trying to call setwd() may not do you any good.
It may not be the best idea to compute paths relative to what's being executed. If possible they should be determined relative to where the user calls the code from.
This site may help.
The author of the knitr package is very active and involved. Good luck with what you are doing!
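As a rough illustration of the "separate Python file" idea, the sketch below builds the three directories the question asks for (code, dependencies/data, output) as absolute paths from a single project root, instead of changing the working directory. The function name project_dirs and the folder names "code", "data" and "output" are hypothetical; substitute whatever layout the project actually uses.

```python
from pathlib import Path


def project_dirs(root):
    """Return absolute code, data and output directories under a project root."""
    root = Path(root).resolve()
    return {
        "code": root / "code",      # hypothetical: scripts to import or execute
        "data": root / "data",      # hypothetical: files the code depends on
        "output": root / "output",  # hypothetical: where results are written
    }


# ".." mirrors the root.dir value from the question: one level above the
# directory the code is run from.
dirs = project_dirs("..")
print(dirs["data"])
```

Because every path is absolute after resolve(), the chunk keeps working even if something later calls os.chdir().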

share files and functions through multiple projects

It's a kind of open question but please bear with me.
I am working on several projects (mainly with pandas) and I have created my standard approach to manage them:
1. create a main folder for all files in a project
2. create a data folder
3. have all the output in another folder
and so on.
One of my main activities is data cleaning, and in order to standardize it I have created a dictionary file where I store the various translation of the same entity, e.g. USA, US, United States, and so on, so that the files I am producing are consistent.
Every time I create a new project, I copy the dictionary file in the data directory and then:
xls = pd.ExcelFile(r"data/dictionary.xlsx")
df_area = xls.parse("area")
and after, to translate the country name into my standard, I call:
join_column, how_join = "country", "inner"
df_ct = pd.concat([
    df_ct.merge(df_area, left_on=join_column, right_on="country_name", how=how_join),
    df_ct.merge(df_area, left_on=join_column, right_on="alternative01", how=how_join),
])
and finally I check that I am not losing any record to a mis-join.
Over and over the same thing.
I would like to have a way to remove all this unnecessary cut and paste (of the file and of the code). Also, the files I used in the first projects are already out of date and I need to update them (and sometimes the code) when I need to process new data. Sometimes I also lose track of where the latest dictionary file is! Overall it's a lot of maintenance, which I believe could be avoided.
Is creating my own package the way to go, or is it a little too ambitious?
Is there another shortcut? Overall it's not a lot of code, but multiplied by several projects.
Thanks for any insight, your time going through this is appreciated.

In the end I decided to create my own package.
It required some time so I am happy to share the details about the process (I run python on jupyter and windows).
The first step is to decide where to store the code.
In my case it was C:\Users\my_user\Documents
You need to add this directory to the list of directories where Python looks for packages. This is achieved by running the following statements:
import sys
sys.path.append("C:\\Users\\my_user\\Documents")
In order to run the above statement each time you start python, it must be included into a file in the directory (this directory might vary depending on your installation):
C:\Users\my_user\.ipython\profile_default\startup
The file can be named "00-first.py" ("50-middle.py" or "99-last.py" will also work).
To verify everything is working, restart python and run the command:
print(sys.path)
You should be able to see your directory at this point.
Create a folder with the package name in your directory, and a subfolder (I prefer not to have code in the main package folder):
C:\Users\my_user\Documents\my_package\my_subfolder
Put an empty file named "__init__.py" in each of the two folders: my_package and my_subfolder. At this point you should already be able to import your empty package from Python:
import my_package as my_pack
Inside my_subfolder create a file (my_code.py) which will store the actual code:
def my_function(name):
    print("Hallo " + name)
Modify the outer __init__.py file to include shortcuts. Add the following:
from my_package.my_subfolder.my_code import my_function
You should be able now to run the following in python:
my_pack.my_function("World!")
Hope you find it useful!
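The whole layout can be tried out without touching the Documents folder. The sketch below recreates the same structure in a temporary directory and imports it; create_demo_package is a hypothetical helper, and my_function returns the greeting here instead of printing it, purely so the result is easy to check.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def create_demo_package(base_dir):
    """Create my_package/my_subfolder/my_code.py under base_dir and return base_dir."""
    base = Path(base_dir)
    sub = base / "my_package" / "my_subfolder"
    sub.mkdir(parents=True, exist_ok=True)
    # An empty __init__.py marks the subfolder as a package.
    (sub / "__init__.py").write_text("")
    (sub / "my_code.py").write_text(
        "def my_function(name):\n"
        "    return 'Hallo ' + name\n"
    )
    # The outer __init__.py holds the shortcut import.
    (base / "my_package" / "__init__.py").write_text(
        "from my_package.my_subfolder.my_code import my_function\n"
    )
    return base


base = create_demo_package(tempfile.mkdtemp())
# Same role as the sys.path.append line in the startup file.
if str(base) not in sys.path:
    sys.path.append(str(base))
importlib.invalidate_caches()

import my_package as my_pack

print(my_pack.my_function("World!"))
```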

Extremely new user to Python. "No module named request" error while trying code to detect image subdomains in a website to extract them to a folder

I may sound rather uninformed writing this, and unfortunately, my current issue may require a very articulate answer to fix. Therefore, I will try to be specific as possible as to ensure that my problem can be concisely understood.
My apologies for that, as this Python code was obtained from a friend of mine who wrote it for me in order to complete a certain task. I myself have extremely minimal programming knowledge.
Essentially, I am running Python 3.6 on a Mac. I am trying to work out a code that allows Python to scan through a bulk of a particular website's potentially existent subdomains in order to find possibly-existent JPG images files contained within said subdomains, and download any and all of the resulting found files to a distinct folder on my Desktop.
The Setup-
The code itself, named "download.py" on my computer, is written as follows:
import urllib.request
start = int(input("Start range:100000"))
stop = int(input("End range:199999"))
for i in range(start, stop + 1):
    filename = str(i).rjust(6, '0') + ".jpg"
    url = "http://website.com/Image_" + filename
    urllib.request.urlretrieve(url, filename)
    print(url)
(Note that the words "website" and "Image" have been substituted for the actual text included in my code).
Before I proceed, perhaps some explanation would be necessary.
Basically, the website in question contains several subdomains that include .JPG images, however, the majority of the exact URLs that allow the user to access these sub-domains are unknown and are a hidden component of the internal website itself. The format is "website.com/Image_xxxxxx.jpg", wherein x indicates a particular digit, and there are 6 total numerical digits by which only when combined to make a valid code pertain to each of the existent images on the site.
So as you can see, I have calibrated the code so that Python will initially search through number values in the aforementioned URL format from 100000 to 199999, and upon discovering any .JPG images attributed to any of the thousands of link combinations, will directly download all existent uncovered images to a specific folder that resides within my Desktop. The aim would be to start from that specific portion of number values, and upon running the code and fetching any images (or not), continually renumbering the code to work my way through all of the possible 6-digit combos until the operation is ultimately a success.
(Possible Side-Issue- Although I am fairly confident that my friend's code is written in a manner so that Python will only download .JPG files to my computer from images that actually do exist on that particular URL, rather than swarming my folder with blank/bare files from every single one of URL attempts regardless of whether that URL happens to be successful or not, I am admittedly not completely certain. If the latter is the case, informing me of a more suitable edit to my code would be tremendously appreciated.)
The Execution-
Right off the bat, the code experienced a large error. I'll list through the series of steps that led to the creation of said error.
#1- Of course, I first copy-pasted the code into a text document, and saved it as "download.py". I saved it inside of a folder named "Images" where I sought the images to be directly downloaded to. I used BBEdit.
#2- I proceeded, in Terminal, to input the commands "cd Desktop/Images" (to account for the file being held within the "Images" folder on my Desktop), followed by the command "Python download.py" (to actually run the code).
As you can see, the error which I obtained following my attempt to run the code was the ImportError: No module named request. Despite me guessing that the answer to solving this is simple, I can legitimately say I have got such minimal knowledge regarding Python that I've absolutely no idea how to solve this.
Hint: Prior to making the download.py file, the folder, and typing the Terminal code the only interactions I made with Python were downloading the program (3.6) and placing it in my toolbar. I'm not even quite sure if I am required to create any additional scripts/text files, or make any additional downloads before a script like this would work and successfully download the resulting images into my "Images" folder as is my desired goal. If I sincerely missed something integral at any point during this long read, hopefully, someone in here can provide a thoroughly detailed explanation as to how to solve my issue.
Finishing statements for those who've managed to stick along this far:
Thank you. I know this is one hell of a read, and I'm getting more tired as I go along. What I hope to get out of this question is
1.) Obviously, what would constitute a direct solution to the "No module named request" ImportError in Terminal. In other words, what I did wrong there or am missing.
2.) Any other helpful information that you know would assist this code, for example, if there is any integral step or condition I've missed or failed to meet that would ultimately cause the entirety of my code to cease to work. If you do see a fault in this, I only ask of you to be specific, as I've not got much experience in the programming world. After all, I know there are a lot of developers out here who are far more informed and experienced than I am. Thanks.

urllib.request exists in Python 3 only. When you run 'python' on a Mac, you get Python 2 by default. Try running the script with python3. To check which version 'python' runs:
python --version
If python3 is not installed, you might need to run:
brew install python3

urllib.request is a Python 3 construct. Most systems run Python 2 as default and this is what you get when you run simply python.
To install Python 3, go to https://brew.sh/ and follow the instructions to install the Homebrew package manager. Then run
brew install python3
python3 download.py
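On the side issue raised in the question: as written, the loop has no error handling, so the first number without an image would raise urllib.error.HTTPError and stop the script. A minimal sketch of one way around that is below. The function name download_range and the fetch parameter are made up for this example (fetch only exists so the loop can be tried without a network), and "http://website.com/Image_" is the placeholder URL from the question.

```python
import urllib.error
import urllib.request


def download_range(base_url, start, stop, fetch=urllib.request.urlretrieve):
    """Try each numbered URL and return the filenames that were actually saved."""
    saved = []
    for i in range(start, stop + 1):
        filename = str(i).rjust(6, "0") + ".jpg"
        url = base_url + filename
        try:
            fetch(url, filename)
        except urllib.error.HTTPError as err:
            # HTTP error such as 404: skip this number and carry on.
            print(f"skipped {url} ({err.code})")
            continue
        except urllib.error.URLError as err:
            # Network-level problem (no connection, unknown host, ...).
            print(f"could not reach {url}: {err.reason}")
            continue
        saved.append(filename)
        print(url)
    return saved


if __name__ == "__main__":
    start = int(input("Start range: "))
    stop = int(input("End range: "))
    download_range("http://website.com/Image_", start, stop)
```

HTTPError is caught before URLError because it is a subclass of URLError; with the order reversed, the first handler would catch everything.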

Listing files in a sub folder using Glob

I saw this answer - How can I search sub-folders using glob.glob module in Python? but I didn't understand what os.walk() was doing (I read the docs but it didn't quite make sense).
I'm really new to pathing and still trying to make sense of it.
I have a script that lives in - /users/name/Desktop/folder/ and I want to access some files in /users/name/Desktop/folder/subfolder/*.html
I tried glob.glob('/users/name/Desktop/folder/subfolder/*.html') but it returned an empty list. I realize this is what the previous person did and it didn't work (I was just hoping that glob had been updated!)
Any thoughts on how to do this?

Without any further information it's hard to say what the issue is. I tested your syntax, it works fine for me. Are you sure the extension is .html not .htm in your /users/name/Desktop/folder/subfolder/ directory?
Also, to troubleshoot further, you can check what Python can see in your directory by running:
os.listdir('/users/name/Desktop/folder/subfolder/')
This should get you started.
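Since the question also asks what os.walk() does, here is a small sketch comparing the two approaches. The function names find_html and find_html_recursive are just for illustration. glob with a plain "*.html" pattern only looks in the one folder given, while os.walk visits that folder and every folder beneath it, handing back the file names found in each.

```python
import glob
import os


def find_html(folder):
    """Return the .html files directly inside folder (no sub-folders)."""
    return sorted(glob.glob(os.path.join(folder, "*.html")))


def find_html_recursive(folder):
    """Return the .html files in folder and in every folder below it."""
    matches = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory.
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if name.endswith(".html"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

On Python 3.5 and later, glob.glob(os.path.join(folder, "**", "*.html"), recursive=True) gives a similar recursive search without writing the loop by hand.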
