Function input() in pyspark - python

My problem is that when I enter the value of p, nothing happens: execution does not continue. Is there a way to fix it?
import sys
from pyspark import SparkContext
sc = SparkContext("local", "simple App")
p = input("Enter the word")
rdd1 = sc.textFile("monfichier")
rdd2 = rdd1.map(lambda l: l.split("\t"))
rdd3 = rdd2.map(lambda l: l[1])
print(rdd3.take(6))
# filter() is lazy: nothing is computed until an action such as rdd5.take(6)
rdd5 = rdd3.filter(lambda l: p in l)
sc.stop()

You can use py4j to get input via Java
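# sc is the existing SparkContext; its py4j gateway gives access to the JVM
scanner = sc._gateway.jvm.java.util.Scanner
# "in" is a Python keyword, so System.in has to be fetched with getattr
sys_in = getattr(sc._gateway.jvm.java.lang.System, 'in')
result = scanner(sys_in).nextLine()
print(result)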
Depending on your environment or Spark version, you might need to replace sc with spark.sparkContext.

You have to distinguish between two different cases:
1. Script submitted with $SPARK_HOME/bin/spark-submit script.py
In this case you execute a Scala application, which in turn starts the Python interpreter. Since the Scala application doesn't expect any interaction from standard input, let alone pass it on to the Python interpreter, your Python script will simply hang waiting for data that never arrives.
2. Script executed directly using the Python interpreter (python script.py).
You should be able to use input directly, but at the cost of handling manually in your code all the configuration details normally handled by spark-submit / org.apache.spark.deploy.SparkSubmit.
In general, all required arguments for your script can be passed on the command line:
$SPARK_HOME/bin/spark-submit script.py some_app_arg another_app_arg
and accessed using standard methods like sys.argv or argparse, so using input is neither necessary nor useful.
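As a minimal sketch of the command-line route, assuming the search word is passed as the first application argument, the script can parse its arguments with argparse instead of calling input(). The argument names `word` and `--path` and the helper names are hypothetical:

```python
import argparse


def parse_args(argv=None):
    # Hypothetical argument names; adapt them to your own script.
    parser = argparse.ArgumentParser(
        description="Keep lines whose second column contains a word")
    parser.add_argument("word", help="word to look for")
    parser.add_argument("--path", default="monfichier", help="input text file")
    # argv=None makes argparse read sys.argv[1:], i.e. the arguments after script.py
    return parser.parse_args(argv)


def contains_word(word):
    # Predicate for rdd.filter(); the closure captures only `word`,
    # which Spark serializes and ships to the executors.
    return lambda field: word in field
```

Launched as `$SPARK_HOME/bin/spark-submit script.py hello --path monfichier`, the Spark part of the script would call `args = parse_args()` and then `rdd3.filter(contains_word(args.word))`, with no interaction on standard input at all.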

I had the same problem in Azure Databricks; I used widgets to get input from the user.

Related

Python logging doesn't work when script is called from another program

I have a Python script that I use with LibreOffice Calc for some more advanced macros. I need to debug this script and I'm trying to use logging for this. Logging works fine when the script is called from the command line, but it doesn't work at all when the script is called by LibreOffice.
Here is my logging test code:
import logging
logging.basicConfig(filename='test.log', level=logging.INFO)
logging.warning('test')
As requested, here is the LibreOffice Basic script that calls the Python script (this was mostly just a copy/paste from a guide on how to call Python scripts from LO):
function cev(a as String) as double
Dim scriptPro As Object, myScript As Object
Dim a1(1), b1(0), c1(0) as variant
a1(0) = ThisComponent
a1(1) = a
scriptPro = ThisComponent.getScriptProvider()
myScript = scriptPro.getScript( _
"vnd.sun.star.script:Cell_Functions.py$calcEffectValue?language=Python&location=user")
cev = myScript.invoke(a1, b1, c1)
end function
The basic script is called on a single cell using CEV(cellAddress), which passes the contents of the cell through to the Python script as a string.
Well, I updated to LibreOffice 7 and this started working. The Python version in LO 7 is 3.8 instead of 3.5, so maybe that made the difference.
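Maybe it is working, but you don't know where the test.log file is being written when the script runs from LibreOffice: a relative path like test.log is resolved against the current working directory of the process, which for LibreOffice is not your script's folder. Try providing an absolute file path for test.log, for example C:/test.log.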
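A sketch of that suggestion, assuming the system temp directory is writable from LibreOffice (any absolute path you can find afterwards works just as well). It also passes `force=True`, which needs Python 3.8+ (the version bundled with LibreOffice 7): logging.basicConfig() does nothing if the root logger already has handlers, so this is one possible reason, not a confirmed one, for logging staying silent inside a host application.

```python
import logging
import os
import tempfile

# Assumption: the temp directory is writable; substitute any absolute path.
LOG_PATH = os.path.join(tempfile.gettempdir(), "test.log")


def setup_logging(path=LOG_PATH):
    # An absolute path does not depend on the working directory of the
    # process that loaded the script (here: LibreOffice).
    # force=True (Python 3.8+) removes handlers already attached to the root
    # logger; without it basicConfig() silently does nothing in that case.
    logging.basicConfig(filename=os.path.abspath(path),
                        level=logging.INFO, force=True)


setup_logging()
logging.warning("test")
print("Logging to", LOG_PATH)
```

Printing the path (or showing it in a cell) tells you exactly where to look for the file.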

How do I run a python file with arguments as variables from another python file?

I'm working on cloning a virtual machine (VM) in a vCenter environment using this code. It takes command-line arguments for the VM name, template, datastore, etc. (e.g. $ clone_vm.py -s <host_name> -p <password> -nossl ....)
I have another Python file (let's call it ds_info.py) where I list the datastore volumes in descending order of free_storage and store the datastore with the most available storage in a variable ds_max.
I would like to pass the ds_max variable from ds_info.py as the datastore command-line argument of clone_vm.py.
I tried importing the os module in ds_info.py and running os.system(python clone_vm.py ....arguments...), but it did not take the ds_max variable as an argument.
I'm new to coding and not confident enough to change clone_vm.py so that it picks the datastore with the most free storage itself.
Thank you for taking the time to read through this.
I suspect there is something wrong in your os.system call, but you don't provide it, so I can't check.
Generally it is a good idea to use the current approach, and the received wisdom (TM) is to use subprocess. See the docs, but the basic pattern is:
from subprocess import run
cmd = ["mycmd", "--arg1", "--arg2", "val_for_arg2"]
run(cmd)
Since this is just a list, you can easily drop arguments into it:
var = "hello"
cmd = ["echo", var]
run(cmd)
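Applied to the clone_vm.py case, a sketch might look like this. It assumes clone_vm.py sits in the current directory; the `build_clone_cmd` helper and the `--datastore-name` flag are hypothetical (check `clone_vm.py --help` for the real flag names). Because each list element reaches the child process as a separate argument, the value of ds_max is passed as-is, with no shell quoting involved:

```python
import sys


def build_clone_cmd(ds_max, host, password):
    # Hypothetical helper: returns the argument list for clone_vm.py.
    # "--datastore-name" is an assumed flag name; use whatever clone_vm.py expects.
    return [
        sys.executable, "clone_vm.py",  # same interpreter as the calling script
        "-s", host,
        "-p", password,
        "--datastore-name", ds_max,
    ]


cmd = build_clone_cmd("datastore-big", "vcenter.example.com", "secret")
print(cmd)
```

In ds_info.py you would then call `run(build_clone_cmd(ds_max, host, password), check=True)`; `check=True` raises `subprocess.CalledProcessError` if clone_vm.py exits with a non-zero status.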
However, if your other command is in fact a Python script, it is more usual to refactor it so that the main functionality is wrapped in a function, called main by convention:
# script2.py
import sys
def main(arg1, arg2, arg3):
    ...  # do the work here
if __name__ == "__main__":
    # when run directly, read the three arguments from the command line
    main(*sys.argv[1:4])
Then you can simply import script2 from script1 and run the code directly:
# script1.py
from script2 import main
# pass your own values (e.g. ds_max) instead of command-line arguments
main("value1", "value2", "value3")
This is 'better' as it doesn't involve spawning a whole new python process just to run python code, and it generally results in neater code. But nothing stops you calling a python script the same way you'd call anything else.

How to answer input call from console script in pytest?

I'm currently writing a module which uses console_scripts in setup.py to create scripts at installation time. To test them, I use the pytest-console-scripts plugin to execute those scripts. One of the functions I want to test involves an input() call to get an answer from the user ('y'es or 'n'o), but I have no idea how to mock this input.
A sample test using pytest-console-scripts looks like:
import pytest
def test_my_function(script_runner):
    # the first argument is the console script to run, followed by its arguments
    ret = script_runner.run('myscript', '--version')
    assert ret.success
This can be used when the console script does not involve user action. How can this be solved?
Many thanks in advance, regards, Thomas
EDIT: the solutions provided in How to test a function with input call may only partially solve my problem. My intention is to test the functionality through the console script, without importing the module containing the function called by that script, if this is possible.
After investigating a lot more through Google I came across a solution, which worked perfectly for me:
# pip install pytest-mock pytest-console-scripts
...
def test_user_input(script_runner, mocker):
    # return_value makes every input() call return the same value;
    # use side_effect=[...] instead if you want to feed more than one value.
    # The patch only reaches the script when it runs in the test's own process
    # (pytest-console-scripts' in-process launch mode), not in a subprocess.
    mocker.patch('builtins.input', return_value='<your_expected_input>')
    # Options have to be separated, e.g.
    # script_runner.run('my_prog', '-a', 'val_a', '-b', 'val_b')
    # or: script_runner.run('my_prog', *'-a val_a -b val_b'.split(' '))
    ret = script_runner.run('my_prog')
    assert ret.success
    assert ret.stdout == '<expected output>'
    # or: assert 'string' in ret.stdout
See https://docs.python.org/3/library/unittest.mock.html#unittest.mock.Mock.side_effect for further possibilities of how to use side_effect.
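To illustrate the difference between return_value and side_effect, here is a small self-contained sketch that uses unittest.mock.patch directly (pytest-mock's mocker.patch accepts the same arguments). The functions ask_confirm and ask_twice are hypothetical stand-ins for the code behind a console script:

```python
from unittest import mock


def ask_confirm():
    # Hypothetical function standing in for the code your console script runs
    answer = input("Continue? [y/n] ")
    return answer.strip().lower() == "y"


def ask_twice():
    return [ask_confirm(), ask_confirm()]


# With a list as side_effect, each input() call returns the next item in turn
with mock.patch("builtins.input", side_effect=["y", "n"]):
    result = ask_twice()
print(result)  # [True, False]

# With return_value, every call gets the same answer
with mock.patch("builtins.input", return_value="y"):
    same = ask_twice()
```

If the code asks more questions than the list holds, the extra call raises StopIteration, which is a quick way to spot an unexpected prompt.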

Set the variable in command output

I would like to know how I can feed my variables to another command. For example, if I generate some keys with "openssl", I get asked about the country, state, organization, etc.
I would like my script to fill in this information from my variables: I'll have a variable "Country", a variable "State", etc., and their values should be passed in as the answers to openssl's questions when the command runs.
I'm trying this in bash, but I would also like to know how the same thing would be done in Python.
Kind regards
You have multiple ways to do so.
1. If your script runs before the Python script and stores its result in an environment variable, you can read that environment variable from your Python script as follows:
import os
os.environ.get('MYVARIABLE', 'Default val')
2. Otherwise, you can launch the other application from your Python script and read its output using os.popen():
import os
tmp = os.popen("ls").read()
or, better, with the subprocess module:
import subprocess
proc = subprocess.Popen('ls', stdout=subprocess.PIPE)
tmp = proc.stdout.read()
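The snippets above read a command's output but do not answer its questions. One way to feed your variables to a program that prompts on standard input is subprocess.run with input=, writing one answer per line in the order the questions are asked. In this sketch a tiny Python child process stands in for openssl so the example runs anywhere; the variable values are made up:

```python
import subprocess
import sys

country = "FR"
state = "Ile-de-France"

# Stand-in for a prompting program such as openssl: it asks two questions
# with input() and prints the answers it received.
child = [sys.executable, "-c",
         "c = input('Country: '); s = input('State: '); print(c + '/' + s)"]

# One answer per line, in the order the program asks for them
answers = f"{country}\n{state}\n"
proc = subprocess.run(child, input=answers, capture_output=True,
                      text=True, check=True)
print(proc.stdout)
```

For openssl specifically, `openssl req` can also take the subject fields directly with the `-subj` option (for example `-subj "/C=$COUNTRY/ST=$STATE/O=$ORG"` in bash), which skips the prompts entirely; the openssl req manual page documents the exact format.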

python define replica set configuration

I'm writing a Python script to set up a MongoDB replica set.
The first part of the script starts the processes and the second part should configure the replica set.
From the command line I usually do:
config={_id:"aaa",members:[{_id:0,host:"localhost:27017"},{_id:1,host:"localhost:27018"},{_id:2,host:"localhost:27019",arbiterOnly:true}]}
rs.initiate(config)
rs.status();
Then I check with rs.status() that all members are initialized.
I want to do the same in python script.
In general, I'm looking for a good reference for MongoDB setup scripts (including sharding). I saw the Python script on their site; it is a good starting point, but it only covers a single machine and a single node in the replica set. I need to set everything up on different machines.
Thanks
If you run rs.initiate (without the (config)) the shell tells you which command it would run. In this case, it would be:
function (c) {
return db._adminCommand({replSetInitiate:c});
}
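In Python this should be something like the following (the old pymongo Connection class has been removed in favour of MongoClient; directConnection=True, available since PyMongo 3.11, talks to that one not-yet-initiated member without replica set discovery):
>>> from pymongo import MongoClient
>>> client = MongoClient("morton.local:27017", directConnection=True)
>>> client.admin.command("replSetInitiate", c)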
With c being your replicaset configuration. http://api.mongodb.org/python/current/api/pymongo/database.html#pymongo.database.Database.command has some more information on calling commands.
Thanks, Derick. Some remarks on your answer: 'replSetInitiate' is an admin command, so run it against the 'admin' database, as here:
from pymongo import MongoClient
conn = MongoClient("localhost:27017", directConnection=True)
conn.admin.command("replSetInitiate", config)  # config: the replica set document, as a dict
To get the output of rs.status() in PyMongo, you can use something like this:
from pymongo import MongoClient
class ClusterStatus:  # hypothetical wrapper class for the two methods
    def __init__(self):
        '''Constructor'''
        self.mdb = MongoClient('localhost:27017', replicaSet='rs0')
    def statusofcluster(self):
        '''Check the status of the cluster and return the replSetGetStatus output'''
        print("We are inside status of cluster")
        output = self.mdb.admin.command('replSetGetStatus')
        return output
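To script the whole sequence from the question, the shell config document can be built as a Python dict and the replSetGetStatus reply checked programmatically. This is a sketch under assumptions: the helper names are hypothetical, host strings are whatever "host:port" values your machines use, and a member counts as ready here when its health is 1 and its stateStr is PRIMARY, SECONDARY or ARBITER:

```python
READY_STATES = {"PRIMARY", "SECONDARY", "ARBITER"}


def build_replset_config(set_id, hosts, arbiter=None):
    # Python equivalent of the shell's config={_id:..., members:[...]}
    members = [{"_id": i, "host": h} for i, h in enumerate(hosts)]
    if arbiter is not None:
        members.append({"_id": len(members), "host": arbiter, "arbiterOnly": True})
    return {"_id": set_id, "members": members}


def all_members_ready(status):
    # status: the dict returned by client.admin.command("replSetGetStatus")
    members = status.get("members", [])
    return bool(members) and all(
        m.get("health") == 1 and m.get("stateStr") in READY_STATES
        for m in members
    )


config = build_replset_config(
    "aaa", ["localhost:27017", "localhost:27018"], arbiter="localhost:27019")
print(config)
```

With a MongoClient connected to one member, `client.admin.command("replSetInitiate", config)` would start the set, and polling `all_members_ready(client.admin.command("replSetGetStatus"))` with a short sleep between calls replaces eyeballing rs.status(). For separate machines, only the host strings change.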
