I'm following this tutorial: https://www.youtube.com/watch?v=EzQArFt_On4
In the tutorial only one Python script is used, but what if I need to import some functions from another Python script? For example: import script2
What is the correct way to set this up in a Glue job? I've tried storing the script in an S3 bucket and adding its location under Edit job -> Security configuration, script libraries, and job parameters (optional) -> Python library path, but I get the error ModuleNotFoundError: No module named 'script2'. Does anyone know how to fix this? Thanks.
In the video tutorial there is no import like import script2, so if you add one to your script without providing the script2.py library, the import will fail with the message you are getting.
How to write modules is best explained in the Python docs.
The best way to start programming Glue jobs is to auto-generate Glue scripts from the Glue console and use them as a starting point for customization. What's more, you can set up Glue development endpoints, or even run Glue locally (or on an EC2 instance), for learning and development purposes.
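For illustration, here is a minimal sketch with placeholder names and a placeholder S3 path. The helper module might look like this:

# script2.py -- upload this file to S3, e.g. s3://my-bucket/libs/script2.py
def clean_name(value):
    # Trim whitespace and lowercase a string value.
    return value.strip().lower()

and the main Glue job script can then import it as usual, once the S3 location of script2.py is listed under Python library path (equivalently, passed through the --extra-py-files job parameter):

# main Glue job script
import script2   # resolves because script2.py is registered as a job library

print(script2.clean_name("  Hello  "))   # prints "hello"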
Related
I'm trying to run a Python script on Azure Batch nodes. One of the things the script requires is
import azure.storage.blob
from which I need to use the BlobServiceClient class. When I try to access the BlobServiceClient class (BSC for short), it tells me that no attribute of that name exists in the azure.storage.blob module. Here are the things I have done:
I've run the script on my local machine, and it works perfectly.
python3 --version returns 3.8.10 on my Azure nodes.
I have installed the azure-storage-blob module on my Azure compute nodes (which are Linux nodes).
What might I need to do?
Either you have not prepared your compute node with the proper software, or the task execution environment does not have the proper context. Please read the documentation about Azure Batch jobs and tasks.
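One thing worth checking (an assumption on my part, since the question does not show the installed version): BlobServiceClient only exists in azure-storage-blob version 12 and later, while the older 2.x line exposes BlockBlobService instead, which produces exactly this kind of attribute error. A quick check the task itself could run:

import azure.storage.blob as blob

# Version of the package as the Batch task actually sees it
# (it may differ from what is installed on your local machine).
print(getattr(blob, "__version__", "unknown"))

# False here means the node has a pre-12 package without BlobServiceClient.
print(hasattr(blob, "BlobServiceClient"))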
I built a simple web app using Flask. It basically takes data from a form and sends a POST request; the data is then passed as a command-line argument to a script using
os.popen("python3 script.py " + postArgument).read()
The output is stored in a variable, which is then passed to an element on a new page showing the results.
About the script: it runs the string from the POST through an API, gets some data, processes it, sends it to another API, and finally prints the results (which end up in that variable).
It works fine on a local server, but on Azure nothing is returned; the string is empty.
How do I get some terminal logs?
Is there a solution?
In my experience, the issue is that on Azure the Python 3 (and even Python 2) interpreter is invoked as python, not python3.
So if you have configured the Python 3 runtime in the Application settings on the Azure portal, please use python script.py instead of python3 script.py in your code.
Alternatively, you can use the absolute path of Python 3 on an Azure Web App, D:\Python34\python, instead of python3 in your code.
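For example, a sketch reusing the question's os.popen call and postArgument variable (the interpreter path is the one mentioned above and may differ on your App Service plan):

import os

# "python" rather than "python3": on this Azure Web App the Python 3
# interpreter is registered under that name.
output = os.popen("python script.py " + postArgument).read()

# Or call the interpreter through its absolute path:
output = os.popen(r"D:\Python34\python script.py " + postArgument).read()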
However, there is another possible issue besides the above: you may be using some Python packages that were not installed with pip on Azure. If so, refer to the Troubleshooting - Package Installation section of the official Azure documentation for Python to resolve it.
Hope it helps. If you have any concerns or updates, please feel free to let me know.
I'm experimenting with the Dataflow Python SDK and would like some sort of reference as to what the various commands do, their required args and their recommended syntax.
So after
import google.cloud.dataflow as df
Where can I read up on df.Create, df.Write, df.FlatMap, df.CombinePerKey, etc.? Has anybody put together such a reference?
Is there any place (link, please) where all the possible Apache Beam / Dataflow commands are collected and explained?
There is not yet a pydoc server running for Dataflow Python. However, you can easily run your own in order to browse: https://github.com/GoogleCloudPlatform/DataflowPythonSDK#a-quick-tour-of-the-source-code
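For orientation, here is a short sketch (not taken from that tour): in current releases the package is named apache_beam rather than google.cloud.dataflow, but the transforms the question mentions carried over, e.g.:

import apache_beam as beam   # successor package name for google.cloud.dataflow

# Count words with Create, FlatMap, Map and CombinePerKey, writing the
# result to a local file via the default direct runner.
with beam.Pipeline() as p:
    (p
     | beam.Create(['apple banana', 'apple'])
     | beam.FlatMap(lambda line: line.split())
     | beam.Map(lambda word: (word, 1))
     | beam.CombinePerKey(sum)
     | beam.io.WriteToText('/tmp/word_counts'))   # placeholder output path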
I'm trying to do some local processing of data entries from the GAE datastore using the remote_api. I just want to write some quick processing scripts that pull some data, but I am getting import errors saying that Python cannot import from google.
Am I supposed to run the script from within the development environment somehow? Or perhaps I need to include all of the Google libraries in my Python path? That seems excessive, though.
Why is including the paths that onerous?
Normally the remote_api shell is used interactively, but it is a good tool that you can use as the basis for achieving what you want.
The simplest way is to copy and modify the remote_api shell so that, rather than presenting an interactive shell, it runs a named script.
That way it deals with all the path setup.
In the past I have integrated the remote_api inside a Zope server so that Plone could publish content to App Engine. All sorts of things are possible with remote_api; however, you need to deal with imports like anything else in Python, except that the App Engine libraries are not installed in site-packages.
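If you prefer to wire it up yourself rather than modify the shell, the classic remote_api client pattern looks roughly like this (the SDK path, app id, and credentials are placeholders, and exact helper names vary between SDK versions):

import sys
sys.path.insert(0, '/usr/local/google_appengine')   # wherever the SDK lives

import dev_appserver
dev_appserver.fix_sys_path()   # puts the bundled App Engine libraries on sys.path

from google.appengine.ext.remote_api import remote_api_stub

def auth_func():
    # Placeholder credentials used by the stub to authenticate.
    return ('you@example.com', 'your-password')

remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func,
                                   'your-app-id.appspot.com')

# From here on, normal datastore calls are proxied to the remote application.
from google.appengine.ext import db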
I have some configuration files I want to write in YAML and read from a Python script running on Google App Engine. Given that App Engine uses app.yaml, index.yaml, and others, it seems reasonable to assume there is a Python YAML parser available.
How can I gain access to this parser (what is the import), and where can I find its documentation?
I'd also like to use this parser for scripts running outside of App Engine (build scripts and such), so how can I gain access to the same import from a script that will run from the command line?
The YAML library is included with the App Engine SDK. It is located in google_appengine/lib/yaml, and you should be able to use it in your App Engine code just by writing import yaml.
For non-App Engine work, a quick Google search turns up http://pyyaml.org/, home to several Python YAML implementations.
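Either way, usage is the same; a minimal sketch (config.yaml is a hypothetical file name):

import yaml

# Parse the file into plain Python dicts and lists; safe_load avoids
# constructing arbitrary Python objects from the document.
with open('config.yaml') as f:
    config = yaml.safe_load(f)

print(config)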