Import fails for murmur2 package in Redshift UDF - python

I am trying to import the murmur2 package as a library in a Redshift database. I did the following steps:
1. Run the module packer:
$ ./installPipModuleAsRedshiftLibrary.sh -m murmur2 -s s3://path/to/murmur2/lib
2. Create the library on Redshift:
CREATE OR REPLACE LIBRARY murmur2 LANGUAGE plpythonu from 's3://path/to/murmur2/lib/murmur2.zip' WITH CREDENTIALS AS 'aws_access_key_id=AAAAAAAAAAAAAAAAAAAA;aws_secret_access_key=SSSSSSSSSSSSSSSSS' region 'us-east-1';
3. Create the function and query it:
create OR REPLACE function f_py_kafka_partitioner (s varchar, ps int)
returns int stable as $$ import murmur2
m2 = murmur2.murmur64a(s, len(s), 0x9747b28c)
return m2 % ps
$$ language plpythonu;
SELECT f_py_kafka_partitioner('jiimit', 100);
This gives the following error:
[Amazon](500310) Invalid operation: ImportError: No module named murmur2. Please look at svl_udf_log for more information
Details:
-----------------------------------------------
error: ImportError: No module named murmur2. Please look at svl_udf_log for more information
code: 10000
context: UDF
query: 0
location: udf_client.cpp:366
process: padbmaster [pid=31381]
-----------------------------------------------;
And here are the contents of svl_udf_log:
0 ImportError: No module named murmur2 2018-10-14 07:05:43.431561 line 2, in f_py_kafka_partitioner\n f_py_kafka_partitioner 1000 20000 0
The folder structure looks like this: [screenshot in original post]
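One thing worth checking at this point: Amazon Redshift can only import pure-Python libraries, and the module must sit at the top level of the zip that CREATE LIBRARY loads. A minimal sketch to inspect the archive the packer produced, assuming a local copy of the S3 object named murmur2.zip:
# check_zip.py -- verify the archive layout before CREATE LIBRARY
# Expect murmur2/__init__.py (or murmur2.py) at the top level; a nested
# path like lib/murmur2/... will not be importable, and compiled
# extensions (.so/.pyd) cannot be loaded by Redshift at all.
import zipfile
with zipfile.ZipFile("murmur2.zip") as zf:
    for name in zf.namelist():
        print(name)
        if name.endswith((".so", ".pyd")):
            print("  ^ compiled extension -- Redshift cannot import this")
If the module turns out to be a compiled extension, a pure-Python reimplementation (or a different hashing library) is the usual workaround.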

Related

IronPython.Runtime.Exceptions.ImportException: 'No module named 'errno'

Overview
Hi, I'm trying to integrate some Python code that uses various libraries into a C# application, but even referencing the Lib folder isn't helping me.
C# code
var loader = "C:\\PythonNet\\PythonNetProvider\\loader.py";
var lib = "C:\\Python34\\Lib";
ICollection<string> paths = engine.GetSearchPaths();
paths.Add(loader);
paths.Add(lib);
engine.SetSearchPaths(paths);
var source = engine.CreateScriptSourceFromFile("C:\\PythonNet\\PythonNetProvider\\main.py");
var scope = engine.CreateScope();
var compiledSource = source.Compile();
//The line underneath is where the exception is being thrown
compiledSource.Execute(scope);
The exception is the following: IronPython.Runtime.Exceptions.ImportException: 'No module named 'errno''
loader.py
import pandas as pd
# other code
main.py
import loader
import copy
# other code
Conclusions
If I try to change the order of the two imports in main.py, the exception gives me the following message: IronPython.Runtime.Exceptions.ImportException: 'No module named '_weakref''
How can I solve these issues, even if it means using a different technology than IronPython?
I've done pip install pandas and added the pandas folder path to the ICollection<string> paths, but nothing changes.
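A diagnostic worth trying: errno and _weakref are standard-library/built-in modules, so failing to resolve them usually means the engine is not seeing IronPython's own Lib folder; C:\Python34\Lib belongs to CPython and is not interchangeable. A sketch of a script to run through the same engine, to see what the embedded interpreter actually sees (the install path below is an assumption):
# diag.py -- run via engine.CreateScriptSourceFromFile instead of main.py
import sys
print(sys.version)   # should identify IronPython, not CPython
print(sys.path)      # must include IronPython's own Lib folder, e.g.
                     # C:\Program Files\IronPython 2.7\Lib (assumed path)
Note also that pandas is built on NumPy's C extensions, which IronPython cannot load, so fixing the search paths alone is unlikely to make import pandas succeed; running the pandas part under CPython (for example as a separate process) is the more realistic route.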

python: generated grpc and protobuf files don't stay in current directory

I have two .proto schema files with the following directives:
// vendor_interface.proto
syntax = "proto3";
package vendor.name.int.v1;
import "google/protobuf/timestamp.proto";
...
and
// vendor_interface_api.proto
syntax = "proto3";
package vendor.name.int.v1;
import "vendor/name/int/v1/vendor_interface.proto";
...
message my_message {
vendor.name.int.v1.token token = 1;
...
}
...
I noticed that the gRPC and protobuf code generated with
% python3 -m grpc_tools.protoc --proto_path=./my_proto_grpc/ \
--python_out=. \
--grpc_python_out=. \
./my_proto_grpc/vendor/name/int/v1/vendor_interface.proto
is placed in vendor/name/int/v1/. Is this expected behavior? Does it mean that Python code using the API, classes, etc. from the generated files needs an updated PYTHONPATH (so that it finds the generated code)?
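For what it's worth, this is protoc's intended behavior: output paths mirror each .proto's path relative to --proto_path, and the generated stubs import one another by that full package path, so the directory you generated into must be importable. A rough sketch of the consumer side, where the output root is an assumed path:
# consumer.py -- assumes the protoc invocation above ran in /home/me/codegen,
# producing vendor/name/int/v1/vendor_interface_pb2.py and friends
import sys
sys.path.insert(0, "/home/me/codegen")   # or: export PYTHONPATH=/home/me/codegen
from vendor.name.int.v1 import vendor_interface_pb2
from vendor.name.int.v1 import vendor_interface_pb2_grpc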

Nagios check giving error and not the output I expect

I have a Python script which gives the correct output when run locally, but when I run it as a Nagios check it gives errors.
Code:
#!/usr/bin/env python
import pandas as pd
df = pd.read_csv("...")
print(df)
Nagios configuration:
inside localhost.cfg
define service {
use local-service
host_name localhost
service_description active edges
check_command check_edges
}
inside commands.cfg
define command {
command_name check_edges
command_line $USER1$/check_edges.py $HOSTADDRESS$
}
Error:
(No output on stdout) stderr:
Traceback (most recent call last):
  File "/usr/local/nagios/libexec/check_edges.py", line 3, in <module>
    import pandas as pd
ImportError: No module named pandas
Please give as much detail as possible to solve this problem.
pip show python gives:
Location: /usr/lib/python2.7/lib-dynload
pip show pandas gives:
Location: /home/nwvepops01/.local/lib/python2.7/site-packages
As the user, from the shell, check which instance of Python is being used.
For example with this command:
env python
Modify the script and, on the first line, replace this:
#!/usr/bin/env python
with the absolute path of the Python executable.
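The likely mismatch here: pandas is installed under /home/nwvepops01/.local (a per-user site-packages), which the nagios daemon user cannot see. A quick sketch to confirm which interpreter and search path the check really gets, added temporarily to the top of check_edges.py:
#!/usr/bin/env python
# temporary diagnostics: log the interpreter and search path the
# Nagios daemon actually uses before the failing import
import sys
sys.stderr.write("interpreter: %s\n" % sys.executable)
sys.stderr.write("sys.path: %r\n" % sys.path)
import pandas as pd   # the line that currently fails
Installing pandas system-wide (or for the nagios user) is the other half of the fix.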

Unable to import module 'lambda_function': No module named 'error'

I have a simple Python script that uses the Elasticsearch module "curator" to make snapshots.
I've tested my code locally and it works.
Now I want to run it in an AWS Lambda function, but I get this error:
Unable to import module 'lambda_function': No module named 'error'
Here is how I proceeded:
I manually created a Lambda function and gave it an "AISA-BasicLambdaExecutionRole" role. Then I created my package with my function and the dependencies, which I installed with the command:
pip install elasticsearch-curator -t /<path>/myRepository
I zipped the content (not the folder) and uploaded it to my Lambda.
I changed the handler name to "lambda_function.lambda_handler" (my file's name is "lambda_function.py").
Did I miss something? This is my first time working with Lambda and Python.
I've seen the other questions about this error:
"errorMessage": "Unable to import module 'lambda_function'"
But nothing works for me.
EDIT:
Here is my lambda_function:
from __future__ import print_function
import curator
import time
from curator.exceptions import NoIndices
from elasticsearch import Elasticsearch

def lambda_handler(event, context):
    es = Elasticsearch()
    index_list = curator.IndexList(es)
    index_list.filter_by_regex(kind='prefix', value="logstash-")
    Number = 1
    try:
        while Number <= 3:
            Name = "snapshotLmbd_n_" + str(Number)
            curator.Snapshot(index_list, repository="s3-backup", name=Name, wait_for_completion=True).do_action()
            Number += 1
            print('Just taking a nap ! will be back soon')
            time.sleep(30)
    except KeyboardInterrupt:
        print('My bad ! I interrupted this')
    return
Thank you for your time.
OK, since you have everything else correct, check the permissions of the Python script. It must have executable permissions (755).
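A common way those permissions go wrong is zipping on a machine whose umask strips the read/execute bits, since Lambda keeps the modes stored in the zip. A sketch of building the package with explicit 755 entries, assuming the dependency folder from above is ./myRepository:
# build_package.py -- zip the deployment package with rwxr-xr-x entries
import os
import zipfile

def build_package(src_dir, zip_path):
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _, files in os.walk(src_dir):
            for name in files:
                path = os.path.join(root, name)
                info = zipfile.ZipInfo(os.path.relpath(path, src_dir))
                info.external_attr = 0o755 << 16   # Unix mode 755
                with open(path, "rb") as f:
                    zf.writestr(info, f.read())

build_package("myRepository", "lambda.zip")   # both paths are assumptions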

Pig streaming through python script with import modules

Working with:
pigtmp$ pig --version
Apache Pig version 0.8.1-cdh3u1 (rexported)
compiled Jul 18 2011, 08:29:40
I have a Python script (CPython) which imports another script; both are very simple in my example:
DATA
example$ hadoop fs -cat /user/pavel/trivial.log
1 one
2 two
3 three
EXAMPLE WITHOUT INCLUDE - works fine
example$ pig -f trivial_stream.pig
(1,1,one)
()
(1,2,two)
()
(1,3,three)
()
where
1) trivial_stream.pig:
DEFINE test_stream `test_stream.py` SHIP ('test_stream.py');
A = LOAD 'trivial.log' USING PigStorage('\t') AS (mynum: int, mynumstr: chararray);
C = STREAM A THROUGH test_stream;
DUMP C;
2) test_stream.py
#! /usr/bin/env python
import sys
import string
for line in sys.stdin:
    if len(line) == 0: continue
    new_line = line
    print "%d\t%s" % (1, new_line)
So essentially I just aggregate lines with one key, nothing special.
EXAMPLE WITH INCLUDE - bombs!
Now I'd like to append a string from a Python import module which sits in the same directory as test_stream.py. I've tried to ship the import module in many different ways, but I get the same error (see below).
1) trivial_stream.pig:
DEFINE test_stream `test_stream.py` SHIP ('test_stream.py', 'test_import.py');
A = LOAD 'trivial.log' USING PigStorage('\t') AS (mynum: int, mynumstr: chararray);
C = STREAM A THROUGH test_stream;
DUMP C;
2) test_stream.py
#! /usr/bin/env python
import sys
import string
import test_import
for line in sys.stdin:
    if len(line) == 0: continue
    new_line = ("%s-%s") % (line.strip(), test_import.getTestLine())
    print "%d\t%s" % (1, new_line)
3) test_import.py
def getTestLine():
    return "test line"
Now
example$ pig -f trivial_stream.pig
Backend error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: 'test_stream.py ' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:265)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.cleanup(PigMapBase.java:103)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Pig Stack Trace
ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: 'test_stream.py ' failed with exit status: 1
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias C. Backend error : Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: 'test_stream.py ' failed with exit status: 1
at org.apache.pig.PigServer.openIterator(PigServer.java:753)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:615)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:396)
at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: 'test_stream.py ' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:382)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1209)
at org.apache.pig.PigServer.storeEx(PigServer.java:885)
at org.apache.pig.PigServer.store(PigServer.java:827)
at org.apache.pig.PigServer.openIterator(PigServer.java:739)
... 7 more
Thank you very much for your help!
-Pavel
Correct answer from comment above:
The dependencies aren't shipped. If you want your Python app to work with Pig, you need to tar it (don't forget the __init__.py's!), then include the .tar file in Pig's SHIP statement. The first thing your script does is untar the app. There might be issues with paths, so I'd suggest the following even before tar extraction: sys.path.insert(0, os.getcwd()).
You need to append the current directory to sys.path in your test_stream.py:
#! /usr/bin/env python
import sys
sys.path.append(".")
Thus the SHIP clause you had there does ship the Python script; you just need to tell Python where to look (see the combined sketch below).
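Putting both pieces together, a sketch of test_stream.py under the tar approach, where deps.tar is an assumed archive containing test_import.py (shipped via SHIP('test_stream.py', 'deps.tar')):
#! /usr/bin/env python
# sketch: make the task's working directory importable, unpack the
# shipped tarball, then import the dependency as usual
import os
import sys
import tarfile

sys.path.insert(0, os.getcwd())            # look in the work dir first
if os.path.exists("deps.tar"):             # assumed archive name
    tarfile.open("deps.tar").extractall()

import test_import

for line in sys.stdin:
    if len(line) == 0: continue
    print "%d\t%s" % (1, "%s-%s" % (line.strip(), test_import.getTestLine()))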
