I want to run dask.dataframe by using Jython

I want to run Dask in a Java process by using Jython.
I installed dask[complete] with pip.
But the Java process raises an ImportError for dask.
How can I fix this?
package test;

import org.python.core.*;
import org.python.util.*;

public class TestJython {
    private static PythonInterpreter pi;

    public static void main(String[] args) throws PyException {
        pi = new PythonInterpreter();
        PySystemState sys = pi.getSystemState();
        sys.path.append(new PyString("/usr/local/lib/python2.7/dist-packages"));
        pi.exec("import dask.dataframe as dd");
    }
}
Error log:
Exception in thread "MainThread" Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/dask/dataframe/__init__.py", line 31, in <module>
raise ImportError(str(e) + '\n\n' + msg)
ImportError: Missing required dependencies ['numpy']

Looks like the PythonInterpreter isn't initialized with the correct PYTHONPATH setup. This is not an issue with Dask, but with how you're initializing PythonInterpreter. Looks like you may need to set the python.path system property, or use the JYTHONPATH environment variable: https://www.stefaanlippens.net/jython_and_pythonpath/.
Note that the dask team has no experience running dask in Jython, and cannot guarantee that things will work, or be performant.

Related

Call java .jar package in python using jpype

I'm trying to run a .jar from python, but I get the following error. I need help to solve it.
The Python code is:
import jpype
import os.path

jvmPath = jpype.getDefaultJVMPath()
jarPath = os.path.join(os.path.abspath('.'), 'C:\Programación\Java\JpypePrueba\dist\JpypePrueba.jar')
dependency = os.path.join(os.path.abspath('.'), "C:\Programación\Java\JpypePrueba")
jpype.startJVM(jvmPath, "-ea", "-Djava.class.path=%s" % jarPath, "-Djava.ext.dirs=%s" % dependency)
JDClass = jpype.JClass("project1.sort")
jd = JDClass()
print(jd.calc(1, 2))
jpype.shutdownJVM()
and the Java code is:
package project1;

public class sort {
    public static void main(String[] args) {
        sort t2 = new sort();
        System.out.println(t2.calc(1, 2));
    }

    public int calc(int a, int b) {
        return a + b;
    }
}
The error that is generated in python is the following:
runfile('C:/Programación/Java/JpypePrueba/JpypePrueba.py', wdir='C:/Programación/Java/JpypePrueba')
Traceback (most recent call last):
File "C:\Programación\Java\JpypePrueba\JpypePrueba.py", line 17, in <module>
jpype.startJVM(jvmPath, "-ea", "-Djava.class.path=%s" % jarPath, "-Djava.ext.dirs=%s" % dependency)
File "C:\ProgramData\Anaconda3\lib\site-packages\jpype\_core.py", line 166, in startJVM
raise OSError('JVM is already started')
OSError: JVM is already started
The location of my python main code looks like the attached image:
The program should return the sum of 2+1=3.
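The "OSError: JVM is already started" error comes from the fact that JPype allows only one JVM per process, so re-running the script in the same interpreter session (for example from an IDE console, as the runfile(...) call in the traceback suggests) trips over the JVM left over from the previous run. A minimal sketch of a guard, not from the original post, reusing the jar path and class name from the question (the -Djava.ext.dirs option is left out for brevity):
# Sketch, not from the original post: guard against starting the JVM twice.
# JPype allows only one JVM per process and cannot restart it after shutdown,
# so a second run in the same console session raises "JVM is already started".
import jpype

jarPath = 'C:\\Programación\\Java\\JpypePrueba\\dist\\JpypePrueba.jar'  # path from the question

if not jpype.isJVMStarted():
    jpype.startJVM(jpype.getDefaultJVMPath(), "-ea", "-Djava.class.path=%s" % jarPath)

JDClass = jpype.JClass("project1.sort")
jd = JDClass()
print(jd.calc(1, 2))  # expected output: 3
# jpype.shutdownJVM() is deliberately omitted: once shut down, the JVM cannot
# be started again in the same Python process.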

call python script in java application (Error in python import)

I call my Python script from Java and also pass variables to it. The script alone works in a PyCharm project, but if I call it via Java (IntelliJ) nothing happens.
The program should pass values to my script via Java so that the script changes and saves values in a Word document (docx).
If I use the version of my script in the IntelliJ folder, it has problems importing docx.
I think the problem is that in the PyCharm project the docx package is directly in the project. This is not the case with IntelliJ, so it has to access the system-wide docx installation.
I have already completely reinstalled lxml and docx, but to no avail.
How do I have to change my program structure or my script so that it works?
Java:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class runPython {
    public static void main(String[] args) throws IOException {
        try {
            ProcessBuilder builder = new ProcessBuilder("python",
                    "C://Users//Notebook//IdeaProjects//bhTool_bridge//scripts//main.py",
                    "Annemarie Brekow");
            Process process = builder.start();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Python:
from docx import Document
import sys
import os

document = Document('C:/Users/Notebook/IdeaProjects/bhTool_bridge/template/Muster_Rechnung.docx')
workingdirectory = os.getcwd()

def find_replace(paragraph_keyword, draft_keyword, paragraph):
    if paragraph_keyword in paragraph.text:
        # print("found")
        paragraph.text = paragraph.text.replace(paragraph_keyword, draft_keyword)

for paragraph in document.paragraphs:
    find_replace("$Kunden-Name", "Annemarie Brekow", paragraph)
    print(paragraph.text)

document.save("C:/Users/Notebook/IdeaProjects/bhTool_bridge/template/Muster_Rechnung.docx")
EDIT:
Error message:
Traceback (most recent call last):
File "C:\Users\Notebook\IdeaProjects\bhTool_bridge\scripts\main.py", line 2, in
from docx import Document
ModuleNotFoundError: No module named 'docx'
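One thing worth checking (an assumption on my part, not something stated in the question): the python executable that ProcessBuilder finds on the PATH may not be the PyCharm interpreter that has python-docx installed, which would explain the ModuleNotFoundError. A small diagnostic sketch for the top of main.py, writing the interpreter path to a hypothetical file under the project directory so it is visible even though the Java side never reads the process output:
# Diagnostic sketch (hypothetical output file, not part of the original script):
# record which Python interpreter the Java side actually launched.
import sys

with open('C:/Users/Notebook/IdeaProjects/bhTool_bridge/which_python.txt', 'w') as f:
    f.write(sys.executable + '\n')   # path of the running interpreter
    f.write('\n'.join(sys.path))     # where it looks for packages such as docx
If the interpreter listed there is not the one with docx installed, either point ProcessBuilder at the full path of that interpreter or run pip install python-docx for the interpreter on the PATH.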

Running python script with gurobipy module from qt

I want to run a python script from Qt, when the user clicks a button. This script works properly in a terminal but I get an error when I execute from Qt.
I have tried to execute the script from Pycharm IDE and I get the same error:
Traceback (most recent call last):
File "/home/ana/PycharmProjects/Gurobi/one_set.py", line 1, in <module>
from gurobipy import *
File "/usr/local/lib/python2.7/dist-packages/gurobipy/__init__.py", line 1, in <module>
from .gurobipy import *
ImportError: libgurobi81.so: cannot open shared object file: No such file or directory
When I execute "import gurobipy" in a python console, I get no error.
import gurobipy
import pkg_resources
pkg_resources.get_distribution("gurobipy").version
'8.1.1'
Searching libgurobi81.so, I check that this file exists in:
/opt/gurobi811/linux64/lib/libgurobi81.so
/usr/lib/python2.7/dist-packages/gurobi811/linux64/lib/libgurobi81.so
/usr/local/lib/python2.7/dist-packages/gurobipy/libgurobi81.so
As suggested in install instructions, I have included environment variables in /home/usr/.bashrc as:
export GUROBI_HOME="/opt/gurobi811/linux64"
export PATH="${PATH}:${GUROBI_HOME}/bin"
export LD_LIBRARY_PATH="${GUROBI_HOME}/lib"
I also included the other directories that contain libgurobi81.so:
export PATH=$PATH:/usr/lib/python2.7/dist-packages/gurobi811/
export PATH=$PATH:/usr/local/lib/python2.7/dist-packages/gurobipy/
However, from terminal everything works fine and I get the solution:
/usr/bin/python2.7 /home/ana/PycharmProjects/Gurobi/one_set.py
Academic license - for non-commercial use only
instance objVal time
Instance1.csv 0.030176 0.0002670288
[1 rows x 2 columns]
The code I use to run python script from Qt is:
QString command("/usr/bin/python2.7");
QStringList params = QStringList() << "/home/ana/PycharmProjects/Gurobi/one_set.py";
QProcess *process = new QProcess();
process->startDetached(command, params);
process->waitForFinished();
qDebug()<<process->readAllStandardOutput();
process->close();
I expected the same output from Qt as from terminal, since the command I use to run it is the same:
/usr/bin/python2.7 /home/ana/PycharmProjects/Gurobi/one_set.py
Solved. The solution was adding environment variables before the start of the process:
QString command("/usr/bin/python2.7");
QStringList params = QStringList();
params.append("/home/ana/PycharmProjects/Gurobi/one_set.py");
QProcess *process = new QProcess();
QProcessEnvironment env = QProcessEnvironment::systemEnvironment();
env.insert("LD_LIBRARY_PATH", "/usr/local/lib:/opt/gurobi811/linux64/lib:/opt/gurobi811/linux64/lib:/opt/gurobi811/linux64/lib/"); // Add an environment variable
process->setProcessEnvironment(env);
process->start(command, params);
process->waitForFinished();
QString p_stdout = process->readAllStandardOutput();
ui->Output->setText(p_stdout);
process->close();

Airflow Exception: DataFlow failed with return code 1

I am trying to execute a Dataflow jar through an Airflow script. For this I am using DataFlowJavaOperator. In the jar parameter, I am passing the path of the executable jar file present on the local system. But when I try to run this job I get the error:
{gcp_dataflow_hook.py:108} INFO - Start waiting for DataFlow process to complete.
[2017-09-12 16:59:38,225] {models.py:1417} ERROR - DataFlow failed with return code 1
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1374, in run
result = task_copy.execute(context=context)
File "/usr/lib/python2.7/site-packages/airflow/contrib/operators/dataflow_operator.py", line 116, in execute
hook.start_java_dataflow(self.task_id, dataflow_options, self.jar)
File "/usr/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 146, in start_java_dataflow
task_id, variables, dataflow, name, ["java", "-jar"])
File "/usr/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 138, in _start_dataflow
_Dataflow(cmd).wait_for_done()
File "/usr/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 119, in wait_for_done
self._proc.returncode))
Exception: DataFlow failed with return code 1
My Airflow script is:
from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 03, 16),
    'email': [<EmailID>],
    'dataflow_default_options': {
        'project': '<ProjectId>',
        # 'zone': 'europe-west1-d', (i am not sure what should i pass here)
        'stagingLocation': 'gs://spark_3/staging/'
    }
}

dag = DAG('Dataflow', schedule_interval=timedelta(minutes=2),
          default_args=default_args)

dataflow1 = DataFlowJavaOperator(
    task_id='dataflow_example',
    jar='/root/airflow_scripts/csvwriter.jar',
    gcp_conn_id='GCP_smoke',
    dag=dag)
I am not sure what mistake I am making. Can anybody please help me get out of this?
Note: I am creating this jar by selecting the Runnable JAR file option and packaging all the external dependencies.
The problem was with the jar that I was using. Before using the jar, make sure that it executes as expected.
Example:
If your jar is dataflow_job_1.jar, execute it using
java -jar dataflow_job_1.jar --parameters_if_any
Once your jar runs successfully, proceed with using it in the Airflow DataFlowJavaOperator.
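As a rough illustration of that check (a sketch, not part of the original answer; the jar path is the one from the question and any pipeline options are left out), the point is simply that the same command the Airflow hook builds must exit with return code 0:
# Sketch: smoke-test the jar locally before handing it to DataFlowJavaOperator.
# The hook essentially runs "java -jar <jar> <options>" and raises the
# "DataFlow failed with return code 1" exception when the exit status is non-zero.
import subprocess

returncode = subprocess.call(["java", "-jar", "/root/airflow_scripts/csvwriter.jar"])
print("jar exit status: %d" % returncode)  # anything non-zero will also fail in Airflow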
Furthermore, if you encounter errors related to Coders, you may have to write your own coder to execute the code.
For instance, I had a problem with the TableRow class as it did not have a default coder, so I had to write this one:
TableRowCoder:
public class TableRowCoder extends Coder<TableRow> {
    private static final long serialVersionUID = 1L;
    private static final Coder<TableRow> tableRow = TableRowJsonCoder.of();

    @Override
    public void encode(TableRow value, OutputStream outStream) throws CoderException, IOException {
        tableRow.encode(value, outStream);
    }

    @Override
    public TableRow decode(InputStream inStream) throws CoderException, IOException {
        return new TableRow().set("F1", tableRow.decode(inStream));
    }

    @Override
    public List<? extends Coder<?>> getCoderArguments() {
        // TODO Auto-generated method stub
        return null;
    }

    @Override
    public void verifyDeterministic() throws org.apache.beam.sdk.coders.Coder.NonDeterministicException {
    }
}
Then register this coder in your code using:
pipeline.getCoderRegistry().registerCoderForClass(TableRow.class, new TableRowCoder())
If there are still errors (which are not related to coders), navigate to:
*.jar\META-INF\services\FileSystemRegistrar
and add any dependencies that may occur.
For example, there might be a staging error such as:
Unable to find registrar for gs
I had to add the following line to make it work:
org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar

How do I get debug_info from lttng ctf trace using babeltrace python bindings?

I'm using the Babeltrace python3 bindings to read an lttng ust trace that contains debug_info. When I run Babeltrace from the shell I see the debug_info in the output:
[13:28:29.998652878] (+0.000000321) hsm-dev lttng_ust_cyg_profile:func_exit: { cpu_id = 1 }, { ip = 0x4008E5, debug_info = { bin = "a.out#0x4008e5", func = "foo+0" }, vpid = 28208, vtid = 28211 }, { addr = 0x4008E5, call_site = 0x400957 }
From the python bindings I can get the other event fields (cpu_id, ip, addr, call_site...) but I get key errors trying to access debug_info, bin or func.
import babeltrace

collection = babeltrace.TraceCollection()
collection.add_traces_recursive('lttng-traces/a.out-20170624-132829/', 'ctf')

for e in collection.events:
    if e.name == 'lttng_ust_cyg_profile:func_entry':
        print(e['addr'])
        print(e['func'])
Traceback (most recent call last):
File "fields.py", line 9, in <module>
print(e['func'])
File "/usr/lib/python3/dist-packages/babeltrace.py", line 865, in __getitem__
raise KeyError(field_name)
KeyError: 'func'
Is there a way to get those fields from Python?
I'm using Babeltrace 1.5.2
Not yet. It is possible with the Babeltrace 2 Python bindings, after building the appropriate processing graph and running it, but that major revision has not been released as of this date (it is at the pre-release stage).
There's a hack for debug information in Babeltrace 1 in which the text output "injects" virtual fields at print time, but they are not available before that, which is why you can't access e['func'], for example.
Your best bet for the moment is to create a babeltrace CLI subprocess and, one line of output at a time, use a regex to find the fields you need. Ugly, but that's what's available today.
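A minimal sketch of that workaround (an illustration, not code from the answer; the trace path is the one from the question, and the regex only pulls the bin and func values out of lines shaped like the sample output above):
# Sketch: run the babeltrace CLI and extract debug_info fields from its text
# output with a regex, since the Babeltrace 1.5 Python bindings do not expose them.
import re
import subprocess

DEBUG_INFO_RE = re.compile(r'debug_info = \{ bin = "([^"]*)", func = "([^"]*)" \}')

proc = subprocess.Popen(['babeltrace', 'lttng-traces/a.out-20170624-132829/'],
                        stdout=subprocess.PIPE, universal_newlines=True)
for line in proc.stdout:
    if 'lttng_ust_cyg_profile:func_' not in line:
        continue
    match = DEBUG_INFO_RE.search(line)
    if match:
        binary, func = match.groups()
        print(binary, func)
proc.wait()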
