Failing import of zipped library with py-files

Failing import of zipped library with py-files - python

I have to maintain an oll code running with pyspark.
It's using a method I've never seen.
I have some reusable code zipped into a file ingestion.zip.
Then, this file is called using a pipeline.cfg file like this:
[spark]
master=spark://master
py-files=${HOME}/lib/ingestion.zip
spark-submit=${SPARK_HOME}/bin/spark-submit
When I'm trying to import the library as shown below, I cant make Pycharm understand that the lib should point to the zip file.
from ingestion.data import csv, storage
I've seen the zip is a solution proposed by spark-submit using py-files but how can I make it running on my IDE ?

I haven't used below method with pycharm, but it worked for us with spark-submit and we could import these modules using normal import statements.
Actually, we had a very few files to import and we needed something quick. So, if you also have the same use-case and if pycharm allows then maybe, you can give it a try.
--py-files s3://bucket-name/module1.py,s3://bucket-name/module2.py,s3://bucket-name/module3.py,s3://bucket-name/module4.py"
(Note - there shouldn't be any spaces.)
(Note - this suggestion is only an interim solution till someone replies with a better answer.)

Related

When I import a package in my python script, does Python also import the sub package?

I am doing an online Python course and currently learning about modules and packages, but can't seem to get my head around the logic. I have used modules (or maybe they were packages who knows?) before in current job but without really knowing what was happening when I would do import Pandas for example.
By following the course I have created a folder called "MyMainPackage" and in this folder there is another folder called MySubPackage.
In "MyMainPackage":
init.py and my_main_script.py; in my_main_script.py there is a function called main_func()
in "MySubPackage":
init.py and my_sub_script.py; in my_sub_script.py there is a function called sub_func()
In Sublime Text editor I wrote a script called myprogram2.py:
import MyMainPackage
MyMainPackage.my_main_script.main_func()
However, this hasn't worked; when I call python myprogram2.py in Windows Command Prompt I get the following message
Attribute Error: module MyMainPackage has no attribute my_main_script
However, as per the online lecture, what does work is this:
from MyMainPackage import my_main_script
from MyMainPackage.MySubPackage import my_sub_script
my_main_script.main_func()
my_sub_script.sub_func()
Why can't I just import the entire package and access the modules like I have tried to above instead of the way the online lecture does it? I thought it would be the same thing. I am just struggling to understand the logic.

How can the root directory in python chunk be specified?

Setting the root directory in a python-chunk with the following code line results in an error while for an ordinary r-chunk it works just fine
knitr::opts_knit$set(root.dir ="..")
knitr::opts_knit$set(root.dir ="..")
Optimally there should exist the following options for each knitr-chunk:
- directory to find code to be imported / executed
- directory to find files / dependencies that are needed for code execution
- directory to save any code output
Does something similar exist?

What it looks like here is that you have told it that it is to look for python code:
```{python}
knitr::opts_knit$set(root.dir ="..")
```
When you run this in R studio it will give you an error:
Error: invalid syntax (, line 1)
You fed it python code instead. This makes sense as the call knit::opt_knit$set means to look in the knitr package for the opts_knit$set and set it to…. This doesn’t exist in python… yet. The python compiler does not recognize it as python code and returns a syntax error. Whereas when you run it as an R chunk, it knows to look into the knitr package. Error handling can be huge issue to deal with. It makes more sense to handle error categories than to account for every type of error. If you want to control the settings for a code chunk, you would do so in the parenthesis ie:
```{python, args }
codeHere
```
I have not seen args for any other language than R, but that does not mean it doesn’t exist. I have just not seen them. I also do not believe that this will fix your issue. You could try some of the following ideas:
Writing your python in a separate file and link to it. This would allow for you to take advantage of the language and utilize things like the OS import. This may be something you want to consider as even python has its ways of navigating around the various operating systems. This may be helpful if you are just running quick scripts and not loading or running python programs.
# OS module
import os
# Your os name
print(os.name)
# Gets PWD or present working directory
os.getcwd()
# change your directory
os.chdir("path")
You could try using the reticulate library within an R chunk and load your python that way
Another thought is that you could try
library(reticulate)
use_python(“path”)
Knitr looks in the same directory as your markdown file to find other files if needed. This is just like about any other IDE
At one point in time knitr would not accept R’s setwd() command. Trying to call setwd() may not do you any good.
It may not the best idea to compute paths relative to what's being executed. If possible they should be determined relative to where the user calls the code from.
This site may help.
The author of the knitr package is very active and involved. Good luck with what you are doing!

Creating Executable Zip Archives With Packages in the Project

I feel this question needs a better title and I will amend it if someone suggests something better. The problem is I'm not sure of the terminology of the feature that I'm using here.
The best way to describe my problem is to show what I've done. The project is here: https://github.com/jeffnyman/quendor
This project is setup so it can be executed as a module. For example, from the project root someone could do this:
python3 -m quendor
I also have a build script to generate an in-memory zip (if I'm using that terminology correctly):
https://github.com/jeffnyman/quendor/blob/master/build.py
That works in that if you run build.py it will generate a quendor.py file that executes the entire project. That worked fine up until I included other directories (like my utilities and zinterface).
With the project as it is in the repo right now, if you run the build (.\build.py) and then run the generated file:
./quendor.py
You get the following error:
File "./quendor.py/quendor/__main__.py", line 6, in <module>
ModuleNotFoundError: No module named 'quendor.zinterface'
So a key point: if all of my files are in the same directory (i.e., in quendor) this build script works fine in terms of producing an executable script file.
But once I include the subdirectories and files in those directories, things go south on me with the above error.
I'm sure all the files are being gathered. I handle that starting on line 18 (https://github.com/jeffnyman/quendor/blob/master/build.py#L18). And if you were to add to line 24 this statement:
print(f"* {file_path}")
You would see it outputs the following:
* quendor/__init__.py
* quendor/__version__.py
* quendor/zinterface/fileio.py
* quendor/utilities/messages.py
* quendor/__main__.py
So I'm suspecting it might have to do with the code where I write the string at line 28 (https://github.com/jeffnyman/quendor/blob/master/build.py#L28). I feel I have to do more to let the executable zipped script file know about the modules.
But I'm not sure if (1) I'm accurate and (2) even if I'm accurate, if that's possible. I'm finding I'm in a bit over my head here.
Any thoughts would be appreciated and I'm happy to update with any necessarily clarifications or terminology.

So it won't let me comment unless I have more reputation but I can post an answer. Even though I don't have an answer, but rather a comment. I think the above comment was not meant for your actual __main__.py file but rather the one that is getting generated in your quendor.py file. You might want to try adding the import statements to your packed string that you write.
For example, see what happens if on line 32 you add this: import quendor.zinterface.fileio as zio. (Don't replace the line that's there. Just put my line and then keep your others.) I'm not sure how the zip process works but if it tries to mirror the module process that should work. However, if it doesn't, that won't work. You might also just want to try doing import quendor.zinterface. By itself that won't work but it would be interesting to see if it gave you a different error.

Actually, it turns out I found a way to do this! It required using os.walk rather than os.listdir. This required taking a few ideas that people here discussed. Here is the script that does the trick:
https://github.com/jeffnyman/quendor/blob/master/build.py
You can compare that with my previous commit that was trying to handle this a different way.
Eldritch was right that I couldn't just flatten the directory nor could I just add imports to the string I was writing to the final zip file. Jean-François was correct that I had to focus on the __main__.py that was being generated. My contribution was figuring out os.walk() and then parameterizing the written string to handle the different directories.
Finally, this solution does require, as per HTF's suggestion, that I put an empty __init__.py file in each package.
With my solution in place, you can run build.py which then generates the quendor.py script. That script then executes correctly, in terms of recognizing the imports to various packages.

Playing around with just about every variation of import and file gathering that I can think of with your repo, there's a good news / bad news thing.
The bad news is that the answer is this: it isn't possible.
The good news is this: you do have a working implementation if you just keep all files in the quendor directory rather than having subdirectories.
The other good news is you stumbled on something, and posed a problem, that Python gurus aren't able to answer. And there's a certain pleasure to be found in that! I guarantee you will not get an answer to this that works (except for the "all files in one directory" solution).
A refinement to the answer is that if you're setting up the program to run as a module anyway, just use a pip configuration. That basically does the same thing that you want but without having to go through the contortions. (Unless there's a reason you were doing the build the way you were rather than using pip.)

Python how to learn about importing files etc

I'm fairly new to python programming, i'm familiar with the very basic stuff and i'm currently learning about creating function definitions in scripts.
In particular i'm a mac user and i'm using text wrangler to write and run my python programmes.
Now that i have learned how to define basic functions in scripts i have questions which my notes do not seem to answer.
How do i import my definition which is saved in a file on my desktop to use on IDLE? I've tried
import fileaname
in IDLE and it does not work.
Secondly Suppose i create a function A in one script and then another function B in a separate script that depends on A, do i have to import A in the script for B first? Do they need to be saved in the same file?
I appreciate any advice and useful tips.

With import filename , you want to make sure its in the same directory as your original file. Try using this too
import sys
sys.path.append(directory_path)
It seems to be a common issue.

Python project architecture

I'm a java developer new to python. In java, you can access all classes in the same directory without having to import them.
I am trying to achieve the same behavior in python. Is this possible?
I've tried various solutions, for example by importing everything in a file which I import everywhere. That works, but I have to type myClass = rootFolder.folder2.folder3.MyClass() each time I want to access a foreign class.
Could you show me an example for how a python architecture over several directories works? Do you really have to import all the classes you need in each file?
Imagine that I'm writing a web framework. Will the users of the framework have to import everything they need in their files?

Put everything into a folder (doesn't matter the name), and make sure that that folder has a file named __init__.py (the file can be empty).
Then you can add the following line to the top of your code:
from myfolder import *
That should give you access to everything defined in that folder without needing to give the prefix each time.
You can also have multiple depths of folders like this:
from folder1.folder2 import *
Let me know if this is what you were looking for.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.