Import own python modules in nextflow script block? - python

I created a python script called utilities.py in bin/ directory:
#!/usr/bin/env python3
import numpy as np
import pandas as pd
from datetime import datetime
import io

def print_info(in_df, fname_base):
    buffer = io.StringIO()
    df = in_df.copy()
    df.info(buf=buffer)
    s = buffer.getvalue()
    with open(fname_base+"_info.txt", "w", encoding="utf-8") as f:
        f.write(s)

def print_desc(in_df, fname_base):
    df = in_df.copy()
    desc = df.describe()
    desc.to_csv(fname_base+"_desc.tsv", sep = '\t')

def print_data(in_df, fname_base):
    df = in_df.copy()
    print_info(df, fname_base)
    print_desc(df, fname_base)
    df.to_csv(fname_base+".tsv", sep = '\t')
and made it executable with chmod +x. I would like to use these functions in several script blocks across various processes in my workflow. Currently, when I try importing a function from my utilities module:
#!/usr/bin/env nextflow
process transform_data {
    input:
    path(data)

    output:
    path("out.tsv"), emit: out_data

    script:
    """
    #!/usr/bin/env python3
    import pandas as pd
    import io
    from utilities import print_info
    """
}
I get the following error:
Traceback (most recent call last):
  File ".command.sh", line 4, in <module>
    from utilities import print_info
ModuleNotFoundError: No module named 'utilities'
Is it possible to import own modules in this way?

Which version of Nextflow are you using?
I tested with v22.04.5 and the following works:
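My setup is a little different: instead of putting #!/usr/bin/env python3 in the script block, I invoke a Python script (test.py) directly. It contains from utilities import print_info, and it works fine.
script:
    """
    test.py
    """
This works because Python puts the directory of the script being run at the front of sys.path, so test.py can see utilities.py, assuming both sit in bin/. With an inline script block the code runs as .command.sh in the task's work directory, so bin/ is not searched for modules. Note that a relative import, from .utilities import print_info, won't work. So yes, you can import a custom Python module with Nextflow.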
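If the import has to stay inside an inline script block, one possible workaround (a sketch, not something taken from the answer above) is to look up utilities.py on PATH and add its directory to sys.path. This assumes Nextflow has put bin/ on PATH and that the file is executable, as described in the question. The names add_script_dir_to_sys_path and demo_utilities.py are hypothetical, and the temporary directory only simulates bin/ so that the snippet runs on its own.

```python
import os
import shutil
import stat
import sys
import tempfile


def add_script_dir_to_sys_path(script_name):
    """Find an executable script on PATH and make its directory importable."""
    located = shutil.which(script_name)
    if located is None:
        raise FileNotFoundError(f"{script_name} was not found on PATH")
    script_dir = os.path.dirname(os.path.realpath(located))
    if script_dir not in sys.path:
        sys.path.insert(0, script_dir)
    return script_dir


# Simulate a bin/ directory that is on PATH (assumption: Nextflow does this for you).
demo_bin = tempfile.mkdtemp()
module_path = os.path.join(demo_bin, "demo_utilities.py")
with open(module_path, "w", encoding="utf-8") as f:
    f.write("def greet(name):\n    return 'hello ' + name\n")
# Equivalent of chmod +x, so that shutil.which can see the file.
os.chmod(module_path, os.stat(module_path).st_mode | stat.S_IXUSR)
os.environ["PATH"] = demo_bin + os.pathsep + os.environ.get("PATH", "")

add_script_dir_to_sys_path("demo_utilities.py")
from demo_utilities import greet

print(greet("nextflow"))
```

In a real script block the call would be add_script_dir_to_sys_path("utilities.py") followed by from utilities import print_info. Calling a script from bin/ directly, as in the answer, avoids the extra step.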

Related

Install pandas when calling the PythonRunner object

I have a python script:
# -*- coding: utf-8 -*-
import pandas as pd
print('Hello World')
I'm trying to run it in my Scala project using a PythonRunner object:
import org.apache.spark.deploy.PythonRunner
import java.io.File
import java.nio.file.Paths

object PythonRunnerApp extends App {
  val pyFilePath = this.getClass.getResource("").getPath + "/hello.py"
  PythonRunner.main(Array(pyFilePath, "hello.py"))
}
As a result, I get an import error:
Traceback (most recent call last):
  File "/Users/a19562665/IdeaProjects/PythonRunner/target/scala-2.12/classes//hello.py", line 3, in <module>
    import pandas as pd
ImportError: No module named pandas
Exception in thread "main" org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:103)
    at PythonRunnerApp$.runUsingSpark(PythonRunnerApp.scala:15)
    at PythonRunnerApp$.delayedEndpoint$PythonRunnerApp$1(PythonRunnerApp.scala:27)
    at PythonRunnerApp$delayedInit$body.apply(PythonRunnerApp.scala:8)
    at scala.Function0.apply$mcV$sp(Function0.scala:39)
    at scala.Function0.apply$mcV$sp$(Function0.scala:39)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
    at scala.App.$anonfun$main$1$adapted(App.scala:80)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at scala.App.main(App.scala:80)
    at scala.App.main$(App.scala:78)
    at PythonRunnerApp$.main(PythonRunnerApp.scala:8)
    at PythonRunnerApp.main(PythonRunnerApp.scala)
Is there any way I can ask PythonRunner to install pandas?
UPD:
Here is another example of running python scripts:
#!/usr/bin/python
# -*- coding: utf-8 -*-
#import pandas as pd
import sys
for line in sys.stdin:
    print('Hello, ' + line)
# this is hello.py
And Scala application:
spark.sparkContext.addFile(getClass.getResource("hello.py").getPath, true)
val test = spark.sparkContext.parallelize(List("Body!")).repartition(1)
val piped = test.pipe(SparkFiles.get("./hello.py"))
val c = piped.collect()
c.foreach(println)
Output: Hello, Body!
But the question remains open for me: can I, as a cluster user, install pandas on the workers?
You need to install all necessary Python dependencies on every executor before submitting Spark applications to your cluster.
Keep in mind that you're not actually using pandas here, and Spark SQL should probably be used instead.
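Before submitting a job, it can help to confirm which modules a given interpreter can actually import. The following is a minimal sketch using only the standard library; missing_modules is a hypothetical helper name and the list of required modules is just an example. It reports on whichever interpreter runs it, so it would have to be executed on each worker to say anything about that worker.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example list; replace with the modules your job needs.
required = ["json", "pandas"]
print("interpreter:", sys.executable)
print("missing:", missing_modules(required))
```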

Can't import modules in Python?

I'm following instructions and using files from: https://github.com/eBay/ebay-oauth-python-client
I'm getting an error when I import oauth2api, credentialutil, and model. This is step 3 on the above site.
import yaml, json
sys.path.insert(0, '/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/oauthclient/model')
sys.path.insert(1, '/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/test')
sys.path.insert(2, '/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/oauthclient')
import credentialutil
import model
import oauth2api
print(sys.path)
error message:
C:\Users\kyle\AppData\Local\Programs\Python\Python38-32\python.exe C:/Users/kyle/PycharmProjects/app/app.py
Traceback (most recent call last):
  File "C:/Users/kyle/PycharmProjects/app/app.py", line 10, in <module>
    import credentialutil
  File "/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/oauthclient\credentialutil.py", line 20, in <module>
    from model.model import environment, credentials
ModuleNotFoundError: No module named 'model.model'; 'model' is not a package
Process finished with exit code 1
The code runs if I only import model:
import yaml, json
sys.path.insert(0, '/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/oauthclient/model')
sys.path.insert(1, '/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/test')
sys.path.insert(2, '/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/oauthclient')
import model
print(sys.path)
no error message:
C:\Users\kyle\AppData\Local\Programs\Python\Python38-32\python.exe C:/Users/kyle/PycharmProjects/app/app.py
['/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/oauthclient/model', '/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/test', '/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/oauthclient', 'C:\Users\kyle\PycharmProjects\app', 'C:\Users\kyle\PycharmProjects\app', 'C:\Users\kyle\AppData\Local\Programs\Python\Python38-32\python38.zip', 'C:\Users\kyle\AppData\Local\Programs\Python\Python38-32\DLLs', 'C:\Users\kyle\AppData\Local\Programs\Python\Python38-32\lib', 'C:\Users\kyle\AppData\Local\Programs\Python\Python38-32', 'C:\Users\kyle\AppData\Local\Programs\Python\Python38-32\lib\site-packages', 'C:\Users\kyle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pymodel']
Process finished with exit code 0
I'm also getting a green line under oauthclient, and I don't know why. Everything is spelled correctly.
sys.path.insert(0, '/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/oauthclient/model')
sys.path.insert(2, '/Users/kyle/PycharmProjects/app/ebay-oauth-python-client-master/oauthclient')
I can see two problems.
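First, you are running Python on Windows but inserting POSIX-style paths into sys.path. Windows resolves a path like /Users/kyle/... against the current drive, so it happens to work here (the traceback shows that credentialutil.py was found), but a raw string with the full drive path is less fragile.
Second, you only need to insert the parent path, i.e. /path/to/ebay-oauth-python-client/oauthclient, into your sys.path. Because .../oauthclient/model comes first on sys.path, import model picks up the file model.py inside that directory, which is a plain module rather than the model package, and that is why from model.model import ... fails with "'model' is not a package". In my local test, this works: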
import yaml, json
import sys
sys.path.insert(0, r"C:\Users\guosh\Downloads\test\ebay-oauth-python-client\oauthclient")
import credentialutil
import model
import oauth2api
print(sys.path)
However, I would suggest you import the package as a whole, like below:
import yaml, json
import sys
sys.path.insert(0, r"C:\Users\guosh\Downloads\test\ebay-oauth-python-client")
import oauthclient
print(sys.path)
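The "'model' is not a package" error can be reproduced in isolation. The sketch below builds a throwaway package named demo_model (a hypothetical stand-in for the model package, containing a file of the same name) and imports it twice: once with the parent directory on sys.path and once with the package directory itself on sys.path. import_is_package is a hypothetical helper.

```python
import importlib
import os
import sys
import tempfile

# Layout: <root>/demo_model/__init__.py and <root>/demo_model/demo_model.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_model")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "demo_model.py"), "w") as f:
    f.write("environment = 'sandbox'\n")


def import_is_package(path_entry):
    """Import demo_model with path_entry first on sys.path; report if it is a package."""
    # Forget any earlier import so the lookup really happens again.
    for name in [n for n in sys.modules if n.split(".")[0] == "demo_model"]:
        del sys.modules[name]
    importlib.invalidate_caches()
    sys.path.insert(0, path_entry)
    try:
        module = importlib.import_module("demo_model")
        # Only packages have a __path__ attribute.
        return hasattr(module, "__path__")
    finally:
        sys.path.remove(path_entry)


is_package_from_parent = import_is_package(root)
is_package_from_inner = import_is_package(pkg_dir)
print(is_package_from_parent, is_package_from_inner)
```

With the parent directory on the path the name resolves to the package, so demo_model.demo_model can be imported; with the inner directory on the path it resolves to the single file, and any dotted import below it fails.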

Python cannot import from another file name not defined

I am trying to import a custom function from another python file but keep getting an error NameError: name 'testme' is not defined. I confirmed that I am importing the file correctly according to this SO post and that the function is top level. What else can I try to fix this?
My main python file is:
import sys
import dbconn
#from dbconn import testme #<----did not work
dev=True
if(dev):
    categId='528'
    pollIds=[529,530,531]
else:
    categId=str(sys.argv[1])
    pollIds=[529,530,531]
df=testme(categIds)#callServer(categId,pollIds)
df
if(not categId.isdigit):
    print('categ id fail. expected digit got: '+categId)
.....
and dbconn.py:
import pymysql #pip3 install PyMySQL
import pandas as pd
from scipy.stats.stats import pearsonr
from scipy import stats

def testme(categIds):
    try:
        df=categIds
    except Exception as e:
        print("broke")
    return categIds
Not sure if it makes a difference, but I am running the main Python file from within a Jupyter notebook and have a compiled version of dbconn.py in the same directory.
In response to the suggestions I tried:
df=dbconn.testme(categIds)
got the error:
module 'dbconn' has no attribute 'testme'
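You have to use one of these four import forms:
1) import <package>
2) import <module>
3) from <package> import <module or subpackage or object>
4) from <module> import <object>
In your case, dbconn is the module and testme is the object, so form 4 applies:
from dbconn import testme
With a plain import dbconn (form 2), the function has to be called as dbconn.testme(...); the bare name testme is not defined, which explains the NameError. If both forms still fail inside Jupyter, the kernel is probably holding an older copy of dbconn, for example one imported before testme was added. Restarting the kernel or calling importlib.reload(dbconn) should pick up the current source.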

import ansible.module_utils in 2.2.1.0 as part of inventory module

Importing UTILS classes into Inventory - can it be done?
I have created a custom LDAP data importer as part of creating my inventory class. The LDAP schema we have wasn't similar enough to the LDAP plugin provided in samples.
My class is called ldapDataModule; the class is in:
/home/agt/ansible/agt_module_utils/ldapDataModule.py
My "$HOME/.ansible.cfg" file has the following:
module_utils = /home/agt/ansible/agt_module_utils
When running my Ansible inventory module, I get the following output:
ansible ecomtest37 -m ping
ERROR! Attempted to execute "/sites/utils/local/ansible/hosts" as
inventory script: Inventory script (/sites/utils/local/ansible/hosts) had
an execution error: Traceback (most recent call last):
  File "/sites/utils/local/ansible/hosts", line 22, in <module>
    from ansible.module_utils import ldapDataModule
ImportError: No module named module.utils
The import statements inside hosts look like this:
import copy
import ldap
import re
import sys
import operator
import os
import argparse
import datetime
import os.path
try:
    import json
except:
    import simplejson as json
from ansible.module_utils import ldapDataModule
class agtInventory(object):
Any recommendations?
I was able to do the following as a workaround. I'd still like to hear from Ansible gurus on the proper use of the module_utils variable in ansible.cfg.
sys.path.insert(0, '/home/agt/ansible/agt_module_utils')
from ldapDataModule import ldapDataModule

Python: Import networkx as nx: Global name not defined

I wrote a module (processing_0) in which I import all packages and modules required for my project.
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import collections
import la
import csv
import fishery
import re
from collections import OrderedDict
import processing_1
import processing_2
import processing_3
from processing_1 import readingraph, readinpathgraph, preparefisher, inEEG
from processing_2 import pathwayprofile
from processing_3 import checkkin
from fishery import fisher
The modules that I wrote (processing_1/2/3) all require access to networkx (nx).
As part of the master module, I have the startup function:
def startup():
    EEG = readingraph("/.../file1")
    EET = readingraph("/.../file2")
    EEL = readingraph("/.../file3")
    return EEG, EET, EEL
However, after importing processing_0 and trying to run startup() that uses readingraph from processing_1, I keep getting the error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "processing_0.py", line 31, in startup
    EEG = readingraph("/.../file1")
  File "processing_1.py", line 4, in process
    graph = nx.read_adjlist(filename)
NameError: global name 'nx' is not defined
Is there any way to globally import networkx as nx and make it accessible to all imported modules?
If you are using Ubuntu Linux, do the following in order:
sudo apt-get update
sudo apt-get install python-networkx
In PyCharm, go to File > Settings and open the interpreter settings to configure your Python environment. All packages installed in the environment are listed there; click + to add a new package.
Type networkx in the search box, select it from the package list, and click Install.
When it has finished, click OK and close the window.
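You need to import networkx in every file that uses it. An import binds the name nx only in the namespace of the module that executes it, so importing it in processing_0 does not make it visible in processing_1. Just repeat the line
import networkx as nx
inside processing_1.py, and in any other module that refers to nx.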
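The per-module scope of imports can be shown with a self-contained sketch. It uses json from the standard library as a stand-in for networkx, and the two demo module names are hypothetical; they are written to a temporary directory so the snippet runs without any extra files.

```python
import importlib
import os
import sys
import tempfile

import json  # imported here, but visible only in this module's namespace

module_dir = tempfile.mkdtemp()
sys.path.insert(0, module_dir)

# A module that uses json without importing it (like processing_1 using nx).
with open(os.path.join(module_dir, "demo_without_import.py"), "w") as f:
    f.write("def dump(value):\n    return json.dumps(value)\n")

# The same module with its own import.
with open(os.path.join(module_dir, "demo_with_import.py"), "w") as f:
    f.write("import json\n\ndef dump(value):\n    return json.dumps(value)\n")

importlib.invalidate_caches()
import demo_without_import
import demo_with_import

try:
    demo_without_import.dump([1, 2])
    outcome = "no error"
except NameError:
    # The importing script's json is not shared with the imported module.
    outcome = "NameError"

result = demo_with_import.dump([1, 2])
print(outcome, result)
```

Repeating the import costs almost nothing: Python caches loaded modules in sys.modules, so each later import only binds the name.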
