The documentation by Microsoft at https://learn.microsoft.com/en-us/azure/databricks/notebooks/notebook-workflows says that you can run another notebook and pass parameters by doing the following:
notebook1:
result = dbutils.notebook.run("notebook2", 60, {"argument": "data", "argument2": "data2"})
print(f"{result}")
But it doesn't say how I can fetch the parameters argument and argument2 in my notebook2.
notebook2:
argument = ??
argument2 = ??
print(f"argument={argument} and argument2={argument2}")
dbutils.notebook.exit("Success")
How can I get the parameters in notebook2?
The documentation provides an answer for this. In order to get the parameters passed from notebook1, you must create two text widgets using dbutils.widgets.text() in notebook2, then use the dbutils.widgets.get() method to read the values of those parameters.
You can try using the following code:
Notebook1
result = dbutils.notebook.run("nb2", 60, {"argument": "data", "argument2": "data2"})
print(f"{result}")
Notebook2
dbutils.widgets.text("argument","argument_default")
argument = dbutils.widgets.get("argument")
dbutils.widgets.text("argument2","argument2_default")
argument2 = dbutils.widgets.get("argument2")
ans = argument+' '+argument2
#print(f"argument={argument} and argument2={argument2}")
dbutils.notebook.exit(ans)
When you execute notebook1 to run notebook2, notebook2 runs successfully and returns the exit value shown below:
data data2
Note: If you pass only one value, the other argument in notebook2 takes the default value specified as the second parameter of dbutils.widgets.text().
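For example, a hedged illustration based on the widgets above: if the caller passes only argument, the exit value falls back to the widget default for argument2.
# Only "argument" is supplied; "argument2" falls back to its widget default.
result = dbutils.notebook.run("nb2", 60, {"argument": "data"})
print(result)  # expected: "data argument2_default"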
I'm working on a small script that receives multiple arguments from an MQTT server and uses them to predict another value. I'm showing a simplified version here just to get some help. To pass the arguments to the script for the prediction, the first part creates a numpy array, then the arguments are passed to the script using sys.argv[], and finally indexing places the incoming values.
import numpy as np
import sys
# creating empty numpy array for feature values
X = np.empty(2).reshape(1, 2)
#storing the arguments
azimuth_sin=sys.argv[1]
azimuth_cos=sys.argv[2]
#displaying the arguments
print("azimuth_sin : " + azimuth_sin)
print("azimuth_cos : " + azimuth_cos)
print("Number of arguments : ", len(sys.argv))
# set vector values
X[:,0] = sys.argv[1]
X[:,1] = sys.argv[2]
print(X)
However, I have an issue with the second argument as I get an error:
exit code: 1, Traceback (most recent call last):
File "numpy-array.py", line 10, in
azimuth_cos=sys.argv[2]
IndexError: list index out of range
The only way to avoid that error is if I set both arguments to sys.argv[1]:
#storing the arguments
azimuth_sin=sys.argv[1]
azimuth_cos=sys.argv[1]
#displaying the arguments
print("azimuth_sin : " + azimuth_sin)
print("azimuth_cos : " + azimuth_cos)
print("Number of arguments : ", len(sys.argv))
# set vector values
X[:,0] = sys.argv[1]
X[:,1] = sys.argv[1]
print(X)
Then I get two consecutive outputs:
azimuth_sin : -0.9152180545267792
azimuth_cos : -0.9152180545267792
Number of arguments : 2
[[-0.91521805 -0.91521805]]
and:
azimuth_sin : 0.40295894662883136
azimuth_cos : 0.40295894662883136
Number of arguments : 2
[[0.40295895 0.40295895]]
which are actually the values of the two arguments, each printed twice: sin = -0.9152180545267792 and cos = 0.40295894662883136
If I put the arguments in one line:
#storing the arguments
azimuth_sin, azimuth_cos = sys.argv[1:2]
The error is:
exit code: 1, Traceback (most recent call last):
File "numpy-array-t1.py", line 10, in
azimuth_sin, azimuth_cos = sys.argv[1:2]
ValueError: not enough values to unpack (expected 2, got 1)
I've tried many ways to fix this without success; I'd appreciate any help or suggestions. Thank you in advance.
Start with something simple to validate the data you are receiving.
Do something like:
import sys

# Verify first so you don't get an IndexError
# This check verifies we have at least two parameters after the script name
if len(sys.argv) > 2:
    if sys.argv[1]:
        var_argv1 = sys.argv[1]
        print("var_argv1 type: %s, length: %s" % (type(var_argv1), len(var_argv1)))
        # It is pointless to continue if argv[1] has no data
        if sys.argv[2]:
            var_argv2 = sys.argv[2]
            print("var_argv2 type: %s, length: %s" % (type(var_argv2), len(var_argv2)))
        else:
            print("sys.argv[2] has no data")
    else:
        print("sys.argv[1] has no data")
else:
    print("Expected two arguments, got %d" % (len(sys.argv) - 1))
It may be that you are trying to pass a numpy object on the command line.
Note:
Do you have access to the MQTT server?
It might be easier to pick a channel (topic) to use for this data transfer.
You could get the MQTT server to publish this data on a channel and have this script subscribe to that channel.
Then you could make sending information as easy as a function call on your MQTT system.
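For example, a minimal sketch of that subscriber side using the paho-mqtt client (the broker address, topic name, and payload format are assumptions, not something from your setup):
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # Assumed payload format: "sin,cos" as two comma-separated floats (hypothetical).
    azimuth_sin, azimuth_cos = (float(v) for v in msg.payload.decode().split(","))
    print("azimuth_sin :", azimuth_sin)
    print("azimuth_cos :", azimuth_cos)

client = mqtt.Client()                 # paho-mqtt 1.x style constructor
client.on_message = on_message
client.connect("localhost", 1883)      # assumed broker address
client.subscribe("sensors/azimuth")    # hypothetical topic
client.loop_forever()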
In a Linux terminal window, I have a simple script that just displays sys.argv:
1619:~$ cd mypy
1619:~/mypy$ cat echo.py
import sys
print(sys.argv)
When I call it thus:
1619:~/mypy$ python3 echo.py 1.23 3.112 foo bar
['echo.py', '1.23', '3.112', 'foo', 'bar']
Note that sys.argv is a list of 5 strings, which come from the command line.
If you are calling your script from a shell or Windows command window, you should be able to enter and see multiple strings.
But people have problems using sys.argv (and argparse) when running a script from something like PyDev or a Jupyter notebook. I don't know anything about the MQTT server, so I can't help with providing even one command line argument. As demonstrated above, sys.argv is primarily intended as a way of providing startup values and options when running a script from the operating system.
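If the script really is invoked from a shell with two numeric arguments, a small argparse sketch (the argument names here are assumptions) fails with a clear usage message instead of an IndexError:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("azimuth_sin", type=float)   # first positional argument
parser.add_argument("azimuth_cos", type=float)   # second positional argument
args = parser.parse_args()                       # exits with a usage message if an argument is missing

print("azimuth_sin :", args.azimuth_sin)
print("azimuth_cos :", args.azimuth_cos)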
I am running the same notebook three times in parallel using the code below:
from concurrent.futures import ThreadPoolExecutor

def notebook1_function(country, days):
    dbutils.notebook.run(path = "/pathtonotebook1/notebook1",
                         timeout_seconds = 300,
                         arguments = {"Country": country, "Days": days})

countries = ['US', 'Canada', 'UK']
days = [2] * len(countries)

with ThreadPoolExecutor() as executor:
    results = executor.map(notebook1_function, countries, days)
Each time, I am passing a different value for 'country' and 2 for 'days'. Inside notebook1 I have df1.
I want to know the following:
How to append all the df1's from the three concurrent runs into a single dataframe.
How to get the status [Success/Failure] of each run after completion.
Thank you in advance.
When you're using dbutils.notebook.run (so-called notebook workflows), the notebook is executed as a separate job, and the caller of the notebook doesn't share anything with it - all communication happens via the parameters that you pass to the notebook, and the notebook may return only a string value specified via a call to dbutils.notebook.exit. So your code doesn't have access to the df1 inside the notebook that you're calling.
Usually, if you're using such a notebook workflow, you need to somehow persist the content of df1 from the called notebook into some table, and then read that content from the caller notebook.
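A minimal sketch of that approach (the table name tmp_country_results is hypothetical):
# In the called notebook: append df1 to a shared table instead of trying to return it.
df1.write.mode("append").saveAsTable("tmp_country_results")
dbutils.notebook.exit("Success")

# In the caller notebook, after all three runs have finished:
df_all = spark.table("tmp_country_results")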
Another possibility is to extract the code from the called notebook into a function that receives arguments and returns the dataframe, include that notebook via %run, call the function with different arguments, and combine the results using union. Something like this:
Notebook 1 (called):
def my_function(country, days):
    # do something
    return dataframe
Caller notebook:
%run "./Notebook 1"
df_us = my_function('US', 10)
df_canada = my_function('Canada', 10)
df_uk = my_function('UK', 10)
df_all = df_us.union(df_canada).union(df_uk)
I'd like to run an IPython script from Python, i.e.:
code='''a=1
b=a+1
b
c'''
from Ipython import executor
for l in code.split("\n"):
    print(executor(l))
which would print:
None
None
2
NameError: name 'c' is not defined
Does this exist? I searched the docs, but it does not seem to be (well) documented.
In short, depending on what you want to do and how many IPython features you want to include, you will need to do more or less work.
First thing you need to know is that IPython separates its code into blocks.
Each block has its own result.
If you work at the block level, the following applies:
If you don't need any of the magic IPython provides and don't want the result of each block, then you could just use exec(compile(script, "<string>", "exec"), {}, {}).
If you want more than that, you will need to actually spawn an InteractiveShell instance, as features like %magic and %%magic need a working InteractiveShell.
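For example, a minimal sketch using InteractiveShell.run_cell (results are per block, which is as fine-grained as IPython itself goes):
from IPython.core.interactiveshell import InteractiveShell

shell = InteractiveShell.instance()
for block in ["a=1", "b=a+1", "b", "c"]:
    result = shell.run_cell(block)        # each string is executed like one cell
    if result.error_in_exec is not None:
        print(result.error_in_exec)       # e.g. NameError: name 'c' is not defined
    else:
        print(result.result)              # None for statements, the value for expressions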
In one of my projects I have this function to execute code in an InteractiveShell-instance:
https://github.com/Irrational-Encoding-Wizardry/yuuno/blob/master/yuuno_ipython/ipython/utils.py#L28
If you want to just get the result of each expression, you should parse the code using the ast module and add code to store each result.
You will see this in the function linked above from line 34 onwards.
Here is the relevant excerpt:
if isinstance(expr_ast.body[-1], ast.Expr):
    last_expr = expr_ast.body[-1]
    assign = ast.Assign(              # _yuuno_exec_last_ = <LAST_EXPR>
        targets=[ast.Name(
            id=RESULT_VAR,
            ctx=ast.Store()
        )],
        value=last_expr.value
    )
    expr_ast.body[-1] = assign
else:
    assign = ast.Assign(              # _yuuno_exec_last_ = None
        targets=[ast.Name(
            id=RESULT_VAR,
            ctx=ast.Store(),
        )],
        value=ast.NameConstant(
            value=None
        )
    )
    expr_ast.body.append(assign)
ast.fix_missing_locations(expr_ast)
Doing this for every statement in the body, instead of only the last one, and replacing each with some "printResult"-style transformation will do the same for you.
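To round that off, a hedged sketch of what happens after the AST has been patched (RESULT_VAR is the variable name used in the excerpt above):
# Assuming expr_ast is the patched ast.Module from above:
code_obj = compile(expr_ast, "<cell>", "exec")
namespace = {}
exec(code_obj, namespace)
print(namespace[RESULT_VAR])   # the value of the last expression, or None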
The Python script that returns the list of launch configurations is as follows (for the us-east-1 region):
import boto.ec2.autoscale

region = "us-east-1"
autoscaling_connection = boto.ec2.autoscale.connect_to_region(region)
nlist = autoscaling_connection.get_all_launch_configurations()
For some reason the length of nlist is 50, i.e. we found only 50 launch configurations. The same query in the AWS CLI returns 174 results:
aws autoscaling describe-launch-configurations --region us-east-1 | grep LaunchConfigurationName | wc
Why is there such a big deviation?
Because get_all_launch_configurations has a default limit of 50 returned records per call. It doesn't seem to be specifically documented for that boto2 function, but the corresponding boto3 function describe_launch_configurations mentions it:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/autoscaling.html#AutoScaling.Client.describe_launch_configurations
Parameters
MaxRecords (integer) -- The maximum number of items to return with this call. The default value is 50 and the maximum value is 100.
NextToken (string) -- The token for the next set of items to return. (You received this token from a previous call.)
The same parameters are supported by boto2's get_all_launch_configurations() under the names max_records and next_token.
First make a call with NextToken="" and you'll get the first 50 (or up to 100) launch configs. In the returned data, look for the NextToken value and keep repeating the call until the returned data comes back without NextToken.
Something like this:
data = conn.get_all_launch_configurations()
process_lc(data['LaunchConfigurations'])
while 'NextToken' in data:
    data = conn.get_all_launch_configurations(next_token=data['NextToken'])
    process_lc(data['LaunchConfigurations'])
Hope that helps :)
BTW If you're writing a new script consider writing it in boto3 as that's the current and recommended version.
Update - boto2 vs boto3:
Looks like boto2 doesn't return NextToken in the return value list. Use boto3, it's better and more logical, really :)
Here is an actual script that works:
#!/usr/bin/env python3
import boto3
def process_lcs(launch_configs):
    for lc in launch_configs:
        print(lc['LaunchConfigurationARN'])

client = boto3.client('autoscaling')

response = client.describe_launch_configurations(MaxRecords=1)
process_lcs(response['LaunchConfigurations'])

while 'NextToken' in response:
    response = client.describe_launch_configurations(MaxRecords=1, NextToken=response['NextToken'])
    process_lcs(response['LaunchConfigurations'])
I intentionally set MaxRecords=1 for testing, raise it to 50 or 100 in your actual script.
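Alternatively, boto3's built-in paginator handles NextToken for you; a sketch using the same client:
paginator = client.get_paginator('describe_launch_configurations')
for page in paginator.paginate(PaginationConfig={'PageSize': 100}):
    process_lcs(page['LaunchConfigurations'])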
I'm using win32com.client to write a little plugin, but I have a problem with setting a property. The definition for the property is this:
[id(0x00000021), propget, helpstring("property SystemChannel")]
long SystemChannel(
long lVEN,
long lDEV,
long lSVID,
long lSID);
[id(0x00000021), propput, helpstring("property SystemChannel")]
void SystemChannel(
long lVEN,
long lDEV,
long lSVID,
long lSID,
[in] long rhs);
I have no problems getting the value; the following code works great:
app = Dispatch("CmAVConfig.AudioConfig")
self.speakerNumber = app.SystemChannel(self.glVid, self.glDid, self.glSvid, self.glsid)
But I can't set the value of the same property. I have tried the following instructions and I get the errors below:
app = Dispatch("CmAVConfig.AudioConfig")
app.SystemChannel(self.glVid, self.glDid, self.glSvid, self.glsid, self.speakerNumber)
ERROR: SystemChannel() takes at most 5 arguments (6 given)
# this one mirrors a working example written in JavaScript
app.SystemChannel(self.glVid, self.glDid, self.glSvid, self.glsid) = self.speakerNumber
ERROR: SyntaxError: ("can't assign to function call", ('ooo.py', 56, None, 'app.SystemChannel(self.glVid, self.glDid, self.glSvid, self.glsid) = self.speakerNumber\n'))
If you run makepy for the library (or use win32com.client.gencache.EnsureDispatch), it should create a SetSystemChannel method that takes the extra argument.
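A hedged sketch of that, reusing the self.* names from the question's snippet (SetSystemChannel is the setter makepy is expected to generate for the parameterized propput):
from win32com.client import gencache

app = gencache.EnsureDispatch("CmAVConfig.AudioConfig")   # forces makepy wrapper generation
app.SetSystemChannel(self.glVid, self.glDid, self.glSvid, self.glsid, self.speakerNumber)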