I just spent 20 min monkey-patching this, figured I’d share
My code calls a function dataloader.dataloader from a module dataloader, which in turn calls scipy.misc.imresize.
I tried using numpy.array(Image.fromarray(arr).resize()) as suggested in the scipy docs, but I ran into this issue, and since the accepted answer there was to use scipy.misc.imresize, that was not very helpful.
What should I replace imresize with?
import PIL
import numpy as np

def imresize(arr, size, interp="nearest", mode="L"):
    if len(arr.shape) == 3:
        return np.stack([imresize(arr[:, :, channel], size, interp, mode)
                         for channel in range(arr.shape[-1])], axis=-1)
    resample_ = {'nearest': PIL.Image.NEAREST, 'lanczos': PIL.Image.LANCZOS,
                 'bilinear': PIL.Image.BILINEAR, 'bicubic': PIL.Image.BICUBIC,
                 'cubic': PIL.Image.BICUBIC}[interp]
    return np.array(PIL.Image.fromarray(arr, mode=mode).resize(size, resample=resample_))

dataloader.scipy.misc.imresize = imresize
Explanation:
This function supports all the arguments of the original imresize, translating the call into a call to PIL as suggested in the scipy documentation.
The first part handles images with several channels: we simply resize them one channel at a time and stack the results.
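As a quick sanity check (my own example, not from the original post; note that Pillow's resize expects size as (width, height), so a square target sidesteps any ambiguity):

import numpy as np

arr = (np.random.rand(32, 32) * 255).astype(np.uint8)
print(imresize(arr, (64, 64), interp="bilinear").shape)   # (64, 64)

rgb = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
print(imresize(rgb, (64, 64)).shape)                      # (64, 64, 3)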
I'm trying to examine multiple tone-mapping operators in OpenCV, using Python.
Various sources use four operators (Drago, Durand, Reinhard, Mantiuk). Three of them work. However, when I call cv2.createTonemapDurand(), I get this error:
AttributeError: module 'cv2.cv2' has no attribute 'createTonemapDurand'
Is it possible to call the Durand operator somehow, or did OpenCV drop it recently?
Thanks!
I'll switch from a comment to an answer so the code is easier to read.
You just have to:
import cv2
cv2.xphoto.createTonemapDurand()
Be aware that if you compiled OpenCV yourself, you need to have enabled OPENCV_ENABLE_NONFREE.
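As a hedged sketch (my own addition, not part of the original answer), you can guard the call so the code still runs on builds that lack the contrib xphoto module:

import cv2

# Durand lives in the contrib xphoto module; fall back to an operator from the
# main module if this build does not provide it.
if hasattr(cv2, "xphoto") and hasattr(cv2.xphoto, "createTonemapDurand"):
    tonemap = cv2.xphoto.createTonemapDurand()
else:
    tonemap = cv2.createTonemapDrago()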
Please post the code where you import cv2 and call the function. If you want to look for functions, attributes or anything else, either check the package documentation or use dir() and type(). For your example you can use this:
import cv2
from re import match
cv2_filtered = filter(lambda v: match('.*Tonemap', v), dir(cv2))
[print(val) for val in cv2_filtered]
Returns:
Tonemap
TonemapDrago
TonemapMantiuk
TonemapReinhard
createTonemap
createTonemapDrago
createTonemapMantiuk
createTonemapReinhard
Seems like there is no function createTonemapDurand in cv2.
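Building on the other answer (my own addition, assuming a contrib build that provides cv2.xphoto is installed), the same dir() trick shows where Durand moved to:

import cv2

# On a plain opencv-python wheel cv2.xphoto may not exist at all, hence the guard.
if hasattr(cv2, "xphoto"):
    print([v for v in dir(cv2.xphoto) if 'Tonemap' in v])  # createTonemapDurand should show up here on a contrib build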
I have a large Python code that I've been maintaining/updating/expanding since ~2014. Recently I came across numpy's Random Number Generator Policy (2018-05) and now I'm a bit confused.
I'm not sure what changed, and if I should upgrade my code accordingly to use the new Random Generator. For example, the Random sampling docs say:
# Do this
from numpy.random import default_rng
rng = default_rng()
vals = rng.standard_normal(10)
more_vals = rng.standard_normal(10)
# instead of this
from numpy import random
vals = random.standard_normal(10)
more_vals = random.standard_normal(10)
All my code depends on the (old?) syntax shown in the second block (i.e., I don't use default_rng but simple calls to np.random.seed(), np.random.uniform(), np.random.normal(), etc), and I don't know why I should use the first block instead of the second block.
Could someone shed some light over this please?
1. In Python 2 (old code), default_rng is not available (it was added in NumPy 1.17, which requires Python 3).
2. In Python 3 (new code) with a recent NumPy, both the first and the second block you mentioned run without error.
3. In the future, random.standard_normal and the other legacy functions may be dropped, which is why the docs recommend using default_rng instead.
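For context, here is a minimal side-by-side of the two APIs (my own sketch; note that the same seed does not produce the same stream in both, because default_rng uses a different underlying bit generator than the legacy global state):

import numpy as np

# Legacy API: hidden global state, seeded module-wide.
np.random.seed(12345)
legacy_uniform = np.random.uniform(0.0, 1.0, size=5)
legacy_normal = np.random.normal(loc=0.0, scale=1.0, size=5)

# New API: the state lives in an explicit Generator object you pass around.
rng = np.random.default_rng(12345)
new_uniform = rng.uniform(0.0, 1.0, size=5)
new_normal = rng.normal(loc=0.0, scale=1.0, size=5)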
I'm currently using a python module called petsc4py (https://pypi.org/project/petsc4py/). My main issue is that none of the typical intellisense features seems to work with this module.
I'm guessing it might have something to do with it being a C extension module, but I am not sure exactly why this happens. I initially thought that intellisense was unable to look inside ".so" files, but it seems that numpy is able to do this with the array object, which in my case is inside a file called multiarray.cpython-37m-x86_64-linux-gnu (check example below).
Does anyone know why I see this behaviour with the petsc4py module? Is there anything that I (or the developers of petsc4py) can do to get intellisense to work?
Example:
import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc
x_p = PETSc.Vec().create()
x_p.setSizes(10)
x_p.setFromOptions()
u_p = x_p.duplicate()
import numpy as np
x_n = np.array([1,2,3])
u_n = x_n.copy()
In this example, when trying to work with a Vec object from petsc4py, IntelliSense cannot find u_p.duplicate(); the suggestion is simply a repetition of the function called immediately before. However, with an array from numpy, u_n.copy() is suggested perfectly.
If you're compiling in-place then you're bumping up against https://github.com/microsoft/python-language-server/issues/197.
I am using numpy in one of my libraries. No surprise there.
One user would essentially like a copy of my project where I don't use the default numpy, but the one bundled with autograd. For instance, let's say I have a dumb function:
import numpy

def doSomething(x):
    return numpy.sin(x)
They would like a copy of the library where all of these import numpy statements are replaced by from autograd import numpy:
from autograd import numpy

def doSomething(x):
    return numpy.sin(x)
This would allow them to easily compute gradients and jacobians of my functions.
I would like to know what the easiest way to handle this is without copying the whole codebase and replacing all of these lines.
Options I am aware of:
I could make a copy of the codebase (lib and lib_autograd) where the first uses import numpy, and the second uses from autograd import numpy. This is bad because then I have to maintain two codebases.
I could automatically import from autograd if it is available:
try:
    from autograd import numpy
except ImportError:
    import numpy
The reason I do not want to do this is that many people have highly optimized numpy installs, whereas autograd's might not be. So I want to give the user the choice of which version to import. Forcing the autograd version on anyone who happens to have it installed seems bad: it would not be apparent to the user what is going on, and they would have to uninstall autograd to use the library with their default numpy installation.
So what are my options?
Ideally there would be a way of doing something like passing a parameter to the import statement (I do realize that you can't do this):
useAutograd = False
from lib(useAutograd) import doSomething
You can have a 'conditional' import with:
try:
    from autograd import numpy
except ImportError:
    import numpy
Another option is an environment variable that switches between autograd's numpy and the regular one, because with the try/except above you always get autograd.numpy whenever autograd is installed; there is no way to choose plain numpy in that case.
To elaborate on giving the user an option to switch, here is one possibility:
import os

if os.environ.get('AUTOGRADNUMPY'):
    try:
        from autograd import numpy
    except ImportError:
        import numpy
else:
    import numpy
Have the environment variable AUTOGRADNUMPY set to True (or anything else that is not an empty string) when you want to load numpy from the autograd package. If it is not set, the regular numpy is imported.
All of this holds only if the user has at least numpy installed.
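To make the ordering constraint explicit, here is a small self-contained sketch (my own addition): the variable has to be set before the import logic runs, e.g. in the shell that launches Python or at the very top of the entry script.

import os

# Simulate a user run: select the autograd backend before the import logic executes.
os.environ['AUTOGRADNUMPY'] = '1'

if os.environ.get('AUTOGRADNUMPY'):
    try:
        from autograd import numpy
    except ImportError:
        import numpy
else:
    import numpy

print(numpy.__name__)   # 'autograd.numpy' if autograd is installed, otherwise 'numpy'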
This might help:

try:
    from autograd import numpy as np
except ImportError:
    import numpy as np

...
...
np.sum(..)
I get an IOError: bad message length when passing large arguments to the map function. How can I avoid this?
The error occurs when I set N=1500 or bigger.
The code is:
import numpy as np
import multiprocessing

def func(args):
    i = args[0]
    images = args[1]
    print i
    return 0

N = 1500   # N=1000 works fine

images = []
for i in np.arange(N):
    images.append(np.random.random_integers(1, 100, size=(500, 500)))

iter_args = []
for i in range(0, 1):
    iter_args.append([i, images])

pool = multiprocessing.Pool()
print pool
pool.map(func, iter_args)
In the multiprocessing docs there is a function recv_bytes that raises an IOError. Could it be because of this? (https://python.readthedocs.org/en/v2.7.2/library/multiprocessing.html)
EDIT
If I use images as a numpy array instead of a list, I get a different error: SystemError: NULL result without error in PyObject_Call.
Slightly different code:
import numpy as np
import multiprocessing

def func(args):
    i = args[0]
    images = args[1]
    print i
    return 0

N = 1500   # N=1000 works fine

images = []
for i in np.arange(N):
    images.append(np.random.random_integers(1, 100, size=(500, 500)))
images = np.array(images)   # new

iter_args = []
for i in range(0, 1):
    iter_args.append([i, images])

pool = multiprocessing.Pool()
print pool
pool.map(func, iter_args)
EDIT2 The actual function that I use is:
def func(args):
    i = args[0]
    images = args[1]
    image = np.mean(images, axis=0)
    np.savetxt("image%d.txt" % (i), image)
    return 0
Additionally, the iter_args do not contain the same set of images:
iter_args = []
for i in range(0, 1):
    rand_ind = np.random.random_integers(0, N-1, N)
    iter_args.append([i, images[rand_ind]])
You're creating a pool and sending all the images at once to func(). If you can get away with working on a single image at a time, try something like this, which runs to completion with N=10000 in 35 s with Python 2.7.10 for me:
import numpy as np
import multiprocessing

def func(args):
    i = args[0]
    img = args[1]
    print "{}: {} {}".format(i, img.shape, img.sum())
    return 0

N = 10000
images = ((i, np.random.random_integers(1, 100, size=(500, 500))) for i in xrange(N))

pool = multiprocessing.Pool(4)
pool.imap(func, images)
pool.close()
pool.join()
The key here is to use iterators so you don't have to hold all the data in memory at once. For instance, I converted images from an array holding all the data into a generator expression that creates each image only when needed. You could modify this to load your images from disk or whatever. I also used pool.imap instead of pool.map.
If you can, try to load the image data in the worker function. Right now you have to serialize all the data and ship it across to another process. If your image data is larger, this might be a bottleneck.
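If it helps, here is a minimal sketch of that idea (my own code, not from the original answer): ship only an index to each worker and build, or load, the image inside func, so nothing large is pickled. The random generation below stands in for your real loading code.

import numpy as np
import multiprocessing

def func(i):
    # build/load the image inside the worker instead of shipping it from the parent
    img = np.random.random_integers(1, 100, size=(500, 500))
    print "{}: {}".format(i, img.sum())
    return 0

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    pool.map(func, xrange(100))
    pool.close()
    pool.join()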
[update now that we know func has to handle all images at once]
You could do an iterative mean on your images. Here's a solution without using multiprocessing. To use multiprocessing, you could divide your images into chunks, and farm those chunks out to the pool.
import numpy as np

N = 10000
shape = (500, 500)

def func(images):
    average = np.zeros(shape)          # float accumulator
    for i, img in images:
        average += img / float(N)      # avoid integer division under Python 2
    return average

images = ((i, np.full(shape, i)) for i in range(N))
print func(images)
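And here is a hedged sketch of the chunked multiprocessing variant mentioned above (chunk size, pool size and the synthetic data are my own placeholders): each worker averages one equally-sized chunk, and the partial means are combined in the parent.

import numpy as np
import multiprocessing

shape = (500, 500)

def chunk_mean(chunk):
    total = np.zeros(shape)
    for img in chunk:
        total += img
    return total / len(chunk)

if __name__ == '__main__':
    N = 1000
    chunk_size = 100   # N must be divisible by chunk_size for the simple combine below
    images = [np.random.random_integers(1, 100, size=shape) for _ in xrange(N)]
    chunks = [images[k:k + chunk_size] for k in xrange(0, N, chunk_size)]

    pool = multiprocessing.Pool(4)
    partial_means = pool.map(chunk_mean, chunks)
    pool.close()
    pool.join()

    overall = sum(partial_means) / len(partial_means)
    print overall.mean()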
Python is likely to load your data into RAM and you need that memory to be available. Have you checked your computer's memory usage?
Also, as Patrick mentioned, you're loading 3 GB of data, so make sure you use the 64-bit version of Python, as you are hitting the 32-bit memory constraint. This could cause your process to crash: 32 vs 64 bits Python
Another improvement would be to use Python 3.4 instead of 2.7. The Python 3 implementation seems to be optimized for very large ranges, see Python3 vs Python2 list/generator range performance
When running your program, it actually gives me a clear error:
OSError: [Errno 12] Cannot allocate memory
Like the other users mentioned, the solution to your problem is simple: add memory (a lot of it) or change the way your program handles the images.
The reason it's using so much memory is that you allocate the memory for your images at module level. So when multiprocessing forks your process, it also copies all the images (which isn't free, according to Shared-memory objects in python multiprocessing). This is unnecessary, because you also pass the images as an argument to the function, which the multiprocessing module copies again using IPC and pickle; even so, this would still likely result in a lack of memory. Try one of the solutions proposed by the other users.
This is what solved the problem: making images a module-level (global) object and reading it inside the worker instead of passing it through the pool.
import numpy as np
import multiprocessing

N = 1500   # N=1000 works fine

images = []
for i in np.arange(N):
    images.append(np.random.random_integers(1, 100, size=(500, 500)))

def func(args):
    i = args[0]
    imgs = images   # read the module-level list directly; it is not pickled per task
    print i
    return 0

iter_args = []
for i in range(0, 1):
    iter_args.append([i])

pool = multiprocessing.Pool()
print pool
pool.map(func, iter_args)
The reason why you get IOError: bad message length when passing around large objects is a hard-coded limit of 0x7fffffff bytes (around 2.1 GB) in older CPython versions (3.2 and earlier): https://github.com/python/cpython/blob/v2.7.5/Modules/_multiprocessing/multiprocessing.h#L182
This CPython changeset (included in CPython 3.3 and later) removed the hard-coded limit: https://github.com/python/cpython/commit/87cf220972c9cb400ddcd577962883dcc5dca51a#diff-4711c9abeca41b149f648d4b3c15b6a7d2baa06aa066f46359e4498eb8e39f60L182