How to install only parts of a specific Python package?

I am looking at code that uses the acicobra package. The acicobra library is 670 MB, but the Python code uses only a few functions from it. Is there a way to install only the required modules from the acicobra library and not the entire library? If I install the entire library, my Docker image size gets inflated because of this gigantic dependency.
root@1f5edb150a78:/usr/local/lib/python2.7/site-packages# ls -l | grep -v dist-info | du -sh * | sort -hr
659M cobra
These are the only references to cobra in the Python code:
from cobra.mit.access import MoDirectory
from cobra.mit.session import LoginSession, LoginError
from cobra.mit.request import ClassQuery, DnQuery, QueryError
As you can see, the code references only three modules out of the entire library.
I am looking for ways to avoid installing the entire library so I can limit the size of the Docker image.

Not unless it relies on other, smaller sub-libraries that you can fetch.
You could check out the code and see if you can create your own smaller Python package.
But 90% of the time you'll go down a dependency chain and need the full library anyway.
Also check the license if this is for commercial use.
This should be the correct GitHub repository for the source:
https://github.com/datacenter/cobra
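If the cobra.mit subpackage really is self-contained, one option is to check out that source and vendor only the parts you need into a small package of your own. Below is a minimal sketch of such a trimmed setup.py; the package name is made up, and the assumption that cobra.mit does not import anything outside the cobra tree is exactly what you would have to verify in the source first.
# Hypothetical trimmed setup.py for a vendored subset of the acicobra source.
# Assumes the modules under cobra/mit pull in nothing outside the cobra tree.
from setuptools import setup

setup(
    name="cobra-mit-subset",
    version="0.1.0",
    packages=["cobra", "cobra.mit"],
)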

Related

Running a .jl file from R or Python

I am new to Julia. I developed a few lines of code to get the results I needed from Julia packages I was not able to find in Python or R. Now I am trying to make this file easily accessible by wrapping the code in Python or R. Has anyone done this before? I have tried a few methods and have not found anything that helped.
The simplest approach would be just a few lines of code that call the .jl file, run it (the Julia code writes its results to a .txt file), and then alert you when it is done.
Any help would be greatly appreciated. R is the preferred method, and at this point Python would be appreciated as well.
Please find below instructions for Python, R, and just an external process (which of course is an executable command that can be run from any other process). I recommend putting your code in a package and loading it from one of those languages rather than executing it as an external process.
Python
1. Use Anaconda Python (not the built-in system Python) and install Julia.
2. Run Julia and install PyCall:
using Pkg
ENV["PYTHON"]="/path/to/your/python/executable"
pkg"add PyCall"
pkg"build PyCall"
3. Put your code into a Julia package:
using Pkg
Pkg.generate("MyPackage")
In the src folder you will find MyPackage.jl; edit it to look like this:
module MyPackage

function main(x, y)
    # do very complex stuff, or place it in your_other_file.jl
    2x .+ y
end

include("your_other_file.jl")
export main, and_whatever_other_function_you_defined

end
4. Install pyjulia:
python -m pip install julia
(On Linux systems you might want to use python3 instead of the python command.)
For this step, note that while an external Python can be used with Julia, for convenience it might be worth considering the Python that was installed together with Julia by PyCall.
In that case, use a command such as this one for the installation:
%HOMEPATH%\.julia\conda\3\python -m pip install julia
or on Linux
~/.julia/conda/3/python -m pip install julia
Note that if you have the JULIA_DEPOT_PATH variable defined, you can replace %HOMEPATH%\.julia or ~/.julia/ with its value.
5. Run the appropriate Python and tell it to configure the Python-Julia integration:
import julia
julia.install()
6. Now you are ready to call your Julia code:
>>> from julia import Pkg
>>> Pkg.activate(".\\MyPackage") #use the correct path
Activating environment at `MyPackage\Project.toml`
>>> from julia import MyPackage
>>> MyPackage.main([1,2],5)
[7,9]
GNU R
1. Configure your system PATH variable to point to your Julia location, so that when you type julia in the console it starts Julia.
2. Run the script below to install the R-Julia integration:
install.packages("JuliaCall")
library(JuliaCall)
julia <- julia_setup()
3. Follow the instructions for Python above (step 3 only) and create the package named MyPackage.
4. Run the code:
library(JuliaCall)
julia_eval("using Pkg;Pkg.activate(\"C:/temp/rrr/MyPackage\")")
julia_library("MyPackage")
julia_eval("MyPackage.main(3,5)")
Bash (or just about any language)
1. Build the package following the instructions for Python (step 3 only).
2. Configure the system PATH variable.
3. From the package directory, run the command below (note that string(:.) is a Julia trick I use to avoid escaping apostrophes in bash commands):
julia -e "using Pkg;Pkg.activate(string(:.));Pkg.instantiate();using MyPackage;MyPackage.main(3,4)"
This will install all dependencies for your package. To skip the installation, remove Pkg.instantiate() from the above command.
The answer from @przemyslaw-szufel is correct but maybe a bit overcomplicated. You don't necessarily need to wrap your code in a module or define a custom environment (yes, it is good practice, but one step at a time...).
First create a file juliaScript.jl with this content:
function getAnElement(array, n)
    return array[n]
end
To run functions defined in a .jl file in R
Then in R you just do:
> install.packages("JuliaCall")
> library(JuliaCall)
> julia_setup() # on every new R session !
> julia_source("juliaScript.jl")
> out <- julia_call("getAnElement",c(10,20,30),2)
> out
[1] 20
Note that the R vector has been automatically converted to a Julia Array.
To run functions defined in a .jl file in Python
In Python, it is even easier:
$ python3 -m pip install --user julia
>>> import julia
>>> julia.install() # only once, not every session
>>> jl=julia.Julia(compiled_modules=False)
>>> from julia import Main
>>> Main.include("juliaScript.jl")
>>> Main.getAnElement([1,2,3],2)
20
Also in Python, arrays (native Python lists as well as NumPy arrays and other commonly used data structures) are automatically converted between Python and Julia.
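For example, here is a minimal sketch of passing a NumPy array to the function above, assuming the same juliaScript.jl file and a working pyjulia setup:
import numpy as np
from julia import Main

# Load the Julia file and call it with a NumPy array;
# pyjulia converts the array to a Julia Array automatically.
Main.include("juliaScript.jl")
arr = np.array([10.0, 20.0, 30.0])
print(Main.getAnElement(arr, 2))   # prints 20.0 (Julia uses 1-based indexing)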
Not to make advertising, but more details on interfacing R <-> Julia or Python <-> Julia are in my Apress (2019) book "Julia Quick Syntax Reference", in Chapter 7, "Interfacing Julia with Other Languages" (I shouldn't say it, but you can easily find the PDF online on well-known sites...).
Using the JuliaConnectoR package:
library(JuliaConnectoR)
fun <- juliaCall("include", "/path/to/file.jl") # you may need to provide the full path
For more info on JuliaConnectoR, see the link above as well as this paper, which additionally compares it to alternative packages such as JuliaCall and XRJulia.

JMeter with Python: how to import the packages

I am new to JMeter.
My code works in Python 2.7 with the additional packages dateutil and parser imported.
Problem: when I try to run the same code in a JMeter JSR223 PreProcessor, I get an error saying No module named dateutil.
So I tried another approach using Jython.
I installed Jython, downloaded dateutil, and added the package references as follows:
import sys
sys.path.append('C:/Jython27/Lib/site-packages')
sys.path.append('C:/Jython27/Lib/site-packages/python_dateutil-2.4.2-py2.7/dateutil')
Now the package error is gone, but a string syntax error is present:
java.sql.Date' object has no attribute .
I believe the dateutil package can be picked up from CPython, as it doesn't require any extra wrappers for Java.
Install dateutil normally using pip, like:
pip install python-dateutil
Add the site-packages folder of the Python (not Jython) installation to sys.path, like:
sys.path.append("C:\Python27\Lib\site-packages")
That's it; now you should be able to use dateutil module functions from the JSR223 Test Elements.
Be aware that invoking Python scripts via the Jython interpreter is not the best idea from a performance perspective, and if you're going to invoke your Python code only a limited number of times and/or with a single thread, it might be better to go for the OS Process Sampler.
If you plan to use the Python code to create the main load, consider using the Locust tool instead of JMeter. If you don't want to change from JMeter, a good approach would be rewriting your Python code in Groovy; it will be way better from the performance perspective.
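For reference, a minimal locustfile sketch for a recent Locust release is shown below; the /events endpoint is only a placeholder for whatever API you are testing:
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(1, 3)   # simulated users pause 1-3 seconds between tasks

    @task
    def get_events(self):
        # Hypothetical endpoint; replace with the API under test
        self.client.get("/events")
You would run it with locust -f locustfile.py and point it at the target host.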
Hi, please find the following:
import sys
sys.path.append('C:/Python27/Lib/site-packages')
sys.path.append('C:/Python27/Lib/site-packages/python_dateutil-2.4.2-py2.7/dateutil')
from dateutil.parser import *
sourceDateTimeOfEvent = ""
dateTimeOfEvent = ""
a=parse('2016-07-01 13:00:00')
sourceDateTimeOfEvent = a.isoformat()+"+05:30Z"
dateTimeOfEvent = a.isoformat()+ "Z"
vars.put("sourceDateTimeOfEvent", sourceDateTimeOfEvent)
vars.put("dateTimeOfEvent", dateTimeOfEvent)
The sourceDateTimeOfEvent and dateTimeOfEvent values are stored as two JMeter variables and passed to the JSON file.

How to bundle Python dependencies in IronWorker?

I'm writing a simple IronWorker in Python to do some work with the AWS API.
To do so I want to use the boto library, which is distributed via the PyPI repository. The boto library is not installed by default in the IronWorker runtime environment.
How can I bundle the boto library dependency with my IronWorker code?
Ideally I'm hoping I can use something like the gem dependency bundling available for Ruby IronWorkers, i.e. in myRuby.worker specify:
gemfile '../Gemfile', 'common', 'worker' # merges gems from common and worker groups
In the Python Loggly sample, I see that the hoover library is used:
#here we have to include hoover library with worker.
hoover_dir = os.path.dirname(hoover.__file__)
shutil.copytree(hoover_dir, worker_dir + '/loggly') #copy it to worker directory
However, I can't see where/how you specify which hoover library version you want, or where to download it from.
What is the official/correct way to use 3rd party libraries in Python IronWorkers?
Newer iron_worker versions have native support for the pip command.
So, you need:
runtime "python"
exec "something.py"
pip "boto"
pip "someotherpip"
full_remote_build true
[edit]We've worked on our toolset a bit since this answer was written and accepted. The answer from my colleague below is the recommended course moving forward.[/edit]
I wrote the Python client library for IronWorker. I'm also employed by Iron.io.
If you're using the Python client library, the easiest (and recommended) way to do this is to just copy over the library's installed folder and include it when uploading the package. That's what the Python Loggly sample above is doing. As you said, that doesn't specify a version or where to download the library from, because it doesn't care: it just takes the one installed on your system and uses it. Whatever you get when you enter "import boto" on your local machine is what would be uploaded.
The other option is using our CLI to upload your worker, with a .worker file.
To do this, here's what you'd need to do:
Create a botoworker.worker file:
runtime "binary"
build 'pip install --install-option="--prefix=`pwd`/pips" boto'
file 'botoworker.py'
exec "botoworker.sh"
That second line is the pip command that will be run to install the dependency. You can modify it like you would any pip command run from the command line. It's going to execute that command on the worker during the "build" phase, so it's only executed once instead of every time you run a task.
The third line should be changed to the Python file you want to run--it's your Python worker file. Here's the one we used to test this:
import boto
If you save that as botoworker.py, the above should work without any modification. :)
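For something slightly more realistic, a hypothetical botoworker.py that lists your S3 buckets could look like the sketch below; it assumes AWS credentials are available to boto through environment variables or a boto config file shipped with the worker:
import boto

# Connect to S3 with whatever credentials boto can discover
conn = boto.connect_s3()

# Print the name of every bucket in the account
for bucket in conn.get_all_buckets():
    print(bucket.name)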
The fourth line is a shell script that's going to actually run your worker. I've included the one we used below. Just save it as botoworker.sh, and you won't have to worry about modifying the .worker file above.
PYTHONPATH="$HOME/pips/lib/python2.7/site-packages:$PYTHONPATH" python botoworker.py "$@"
You'll notice it refers to your Python file--if you don't name your Python file botoworker.py, remember to change it here, too. All this does is set your PYTHONPATH to include the installed library, and then runs your Python file.
To upload this, just make sure you have the CLI installed (gem install iron_worker_ng, making sure your Ruby version is 1.9.3 or higher) and then run "iron_worker upload botoworker" in your shell, from the same directory your botoworker.worker file is in.
Hope this helps!

Is there a faster method to load a YAML file than the standard .load method? Django/Python

I am loading a big YAML file and it is taking forever. I am wondering if there is a faster method than yaml.load().
I have read that there is a CLoader class but haven't been able to run it.
The website that suggested the CLoader approach asks me to do this:
Download the source package PyYAML-3.08.tar.gz and unpack it.
Go to the directory PyYAML-3.08 and run:
$ python setup.py install
If you want to use LibYAML bindings, which are much faster than the pure Python version, you need to download and install LibYAML.
Then you may build and install the bindings by executing
$ python setup.py --with-libyaml install
In order to use LibYAML based parser and emitter, use the classes CParser and CEmitter:
from yaml import load, dump
try:
from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
from yaml import Loader, Dumper
This looks like it will work, but I don't have a setup.py directory anywhere in my Django project and therefore can't install/import any of these things.
Can anyone help me figure out how to do this, or let me know about another, faster loading method?
Thanks for the help!
I have no idea what's faster; bspymaster's ideas might be the most useful.
When you download PyYAML-3.08.tar.gz, inside the archive there will be a setup.py that you can run.
Note that to use LibYAML, download this: http://pyyaml.org/download/libyaml/yaml-0.1.4.tar.gz
and build it using the instructions from http://pyyaml.org/wiki/LibYAML
You will need a set of build tools, which should already be installed on Linux/Unix; on OS X make sure Xcode is installed, and I'm not sure about Windows.
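Once PyYAML has been built with the LibYAML bindings, the C loader still has to be passed to yaml.load explicitly; a minimal sketch (data.yaml is just a placeholder file name):
import yaml

try:
    from yaml import CLoader as Loader   # fast LibYAML-based parser
except ImportError:
    from yaml import Loader              # pure-Python fallback

with open("data.yaml") as f:
    data = yaml.load(f, Loader=Loader)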

How can I import hbase in Python?

I'm trying to play around with HBase in Python, and I am using the Cloudera repository to install the Hadoop/HBase packages. It seems to work, as I can access and work on the database using the shell, but it's not fully working within Python.
I know that to communicate with HBase I need Thrift, so I downloaded and compiled it from source. I can import thrift into Python, but when I do from hbase import Hbase, I get module not found errors.
Does anyone know what package/module I would need to get it to work? I tried looking around easy_install and yum (I'm using CentOS 6) with no luck. I did find an article where a person on Debian installed it with sudo aptitude install python-hbase. I don't have that command/package, so I'm not sure how to get it (or whether I have to compile from source to get it).
Also, if it helps, I installed most of it from Cloudera and followed some of the instructions (the ones that didn't require installing) from http://yannramin.com/2008/07/19/using-facebook-thrift-with-python-and-hbase/
Any help/tips/suggestions would be great.
Thanks!
Have a look at HappyBase (see https://github.com/wbolster/happybase for info). It is the modern way to interact with HBase from Python. It covers the complete Thrift API but wraps it in a much better interface.
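A minimal HappyBase sketch, assuming the HBase Thrift server is running and reachable (the host, table, and column names below are placeholders):
import happybase

# Connect to the HBase Thrift server
connection = happybase.Connection('hbase-host')
table = connection.table('my_table')

# Write one row and read it back
table.put(b'row-key', {b'cf:col': b'value'})
print(table.row(b'row-key'))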
Okay, I figured it out. If anyone else is having problems with this in the future, it's actually pretty easy. In the step where you run thrift --gen py Hbase.thrift, it creates an hbase folder in the location where you ran that command. Simply take that folder and copy it to your default module folder (or into the folder where you run your program) and it should work.
Search for /src/contrib/thriftfs/gen-py under the Hadoop installation folder.
Copy the output of thrift --gen py Hbase.thrift to the location below (the part up to /home/hadoop/data/ will differ in your case): /home/hadoop/data/hadoop-1.0.4/src/contrib/thriftfs/gen-py
then
$ python
import sys
sys.path.append("/home/hadoop/data/hadoop-1.0.4/src/contrib/thriftfs/gen-py")
import hbase
It should work now
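With the generated module on sys.path, a minimal Thrift-based sketch might look like the following, assuming the HBase Thrift server is listening on the default port 9090 on localhost:
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

# Open a buffered connection to the HBase Thrift server
transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 9090))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)

transport.open()
print(client.getTableNames())   # list the existing tables
transport.close()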
