Limit the range of x in seaborn distplot KDE estimation - python

Suppose we have an array with numbers between 0 and 1:
arr=np.array([ 0. , 0. , 0. , 0. , 0.6934264 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.6934264 , 0. , 0.6934264 ,
0. , 0. , 0. , 0. , 0.251463 ,
0. , 0. , 0. , 0.87104906, 0.251463 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.48419626,
0. , 0. , 0. , 0. , 0. ,
0.87104906, 0. , 0. , 0.251463 , 0.48419626,
0. , 0.251463 , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.251463 , 0. , 0.35524532, 0. ,
0. , 0. , 0. , 0. , 0.251463 ,
0.251463 , 0. , 0.74209813, 0. , 0. ])
Using seaborn, I want to plot a distribution plot:
sns.distplot(arr, hist=False)
Which will give us the following figure:
As you can see, the KDE estimate ranges from roughly -0.20 to 1.10. Is it possible to force the estimate to stay between 0 and 1? I have tried the following with no luck:
sns.distplot(arr, hist=False, hist_kws={'range': (0.0, 1.0)})
sns.distplot(arr, hist=False, kde_kws={'range': (0.0, 1.0)})
The second line raises an exception: range is not a valid keyword for kde_kws.

The correct way to do this is to use the clip keyword instead of range:
sns.distplot(arr, hist=False, kde_kws={'clip': (0.0, 1.0)})
which will produce:
Indeed, if you only care about the kde and not the histogram, you can use the kdeplot function, which will produce the same result:
sns.kdeplot(arr, clip=(0.0, 1.0))

Setting plt.xlim(0, 1) beforehand should help (note that this only limits the visible axis range; the KDE itself is still estimated beyond it):
import matplotlib.pyplot as plt
plt.xlim(0, 1)
sns.distplot(arr, hist=False)

Related

Zeroes on specific rows in Python

I have an array Pe. I want to exclude certain rows mentioned in the list J and ensure the other rows have all zero elements. For example, for Pe[0], J[0]=[0,1] which means 0,1 rows of Pe[0] are to be excluded but 2 row of Pe[0] should contain all zero elements. Similarly, for Pe[1]. But I am getting an error. I also present the expected output.
import numpy as np
Pe = [np.array([[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 423.81345923, 0. , 407.01354328,
419.14952534, 0. , 316.58460442, 0. ,
0. , 0. , 0. , 0. ],
[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ]]),
np.array([[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 423.81345923, 0. , 407.01354328,
419.14952534, 0. , 316.58460442, 0. ,
0. , 0. , 0. , 0. ],
[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ]])] #Entry pressure
J = [[0,1],[2]]
for i in range(0, len(Pe)):
    out = np.zeros_like(Pe[i])
    for j in range(0, len(J)):
        out[i][J[j]] = Pe[i][J[j]]
    print([out])
The error is
in <module>
out[i][J[j]] = Pe[i][J[j]]
ValueError: shape mismatch: value array of shape (2,12) could not be broadcast to indexing result of shape (2,)
The expected output is
[np.array([[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 423.81345923, 0. , 407.01354328,
419.14952534, 0. , 316.58460442, 0. ,
0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ]]),
np.array([[0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0., 0. ,
0. , 0. , 0. , 0. ],
[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ]])]
Using lists and loops with NumPy is often an anti-pattern, and that is the case here. You should use vectorised operations throughout. Since J is jagged, reinterpret it as a boolean mask. Also, because Pe repeats the same 2-D array, you can start from a single two-dimensional array instead of a list:
import numpy as np
Pe = np.array([[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 423.81345923, 0. , 407.01354328,
419.14952534, 0. , 316.58460442, 0. ,
0. , 0. , 0. , 0. ],
[402.93473651, 0. , 230.97804127, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ]])
J = np.ones((2, Pe.shape[0]), dtype=bool)  # True marks a row to zero out
J[0, 0:2] = 0  # keep rows 0 and 1 of the first copy
J[1, 2] = 0    # keep row 2 of the second copy
Pe_indexed = np.tile(Pe, (J.shape[0], 1, 1))
Pe_indexed[J, :] = 0
Pe_indexed will now be a proper three-dimensional array, with no lists.
Alternatively, if you keep the original list of arrays and the jagged J, a short loop over the pairs also works:
out = []
for arr, ind in zip(Pe, J):
    x = np.zeros_like(arr)
    x[ind] = arr[ind]
    out.append(x)
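For completeness, the jagged J from the question can be turned into the boolean mask generically instead of by hand; a sketch, under the assumption that J lists the rows to keep:

```python
import numpy as np

J = [[0, 1], [2]]  # rows to keep, per copy (as in the question)
n_rows = 3

# True marks a row that should be zeroed out
mask = np.ones((len(J), n_rows), dtype=bool)
for i, keep in enumerate(J):
    mask[i, keep] = False

Pe = np.arange(12, dtype=float).reshape(3, 4)  # small stand-in 2-D array
Pe_indexed = np.tile(Pe, (len(J), 1, 1))
Pe_indexed[mask, :] = 0  # zero every row not listed in J
```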

How to add rows to a matrix with pad?

I have a matrix like this:
profile=np.array([[0,0,0.5,0.1],
[0.3,0,0,0],
[0,0,0.1,0.9],
[0,0,0,0.1],
[0,0.5,0,0]])
And I want to add a row before and after filled with zeros. How can I do that?
I thought of using np.pad but not sure how.
Output should be:
np.array([[0,0,0,0],
[0,0,0.5,0.1],
[0.3,0,0,0],
[0,0,0.1,0.9],
[0,0,0,0.1],
[0,0.5,0,0]
[0,0,0,0]])
The np.pad function allows you to specify the axes you want to pad:
In [3]: np.pad(profile, ((1, 1), (0, 0)))
Out[3]:
array([[0. , 0. , 0. , 0. ],
[0. , 0. , 0.5, 0.1],
[0.3, 0. , 0. , 0. ],
[0. , 0. , 0.1, 0.9],
[0. , 0. , 0. , 0.1],
[0. , 0.5, 0. , 0. ],
[0. , 0. , 0. , 0. ]])
The nested tuple gives (before, after) pad widths per axis: pad 1 row before and 1 row after along axis 0, and 0 columns before and after along axis 1.
Another example, which pads five columns "after" on axis 1:
In [4]: np.pad(profile, ((0, 0), (0, 5)))
Out[4]:
array([[0. , 0. , 0.5, 0.1, 0. , 0. , 0. , 0. , 0. ],
[0.3, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0.1, 0.9, 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.1, 0. , 0. , 0. , 0. , 0. ],
[0. , 0.5, 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])
You can use np.pad:
out = np.pad(profile, 1)[:, 1:-1]
Output:
>>> out
array([[0. , 0. , 0. , 0. ],
[0. , 0. , 0.5, 0.1],
[0.3, 0. , 0. , 0. ],
[0. , 0. , 0.1, 0.9],
[0. , 0. , 0. , 0.1],
[0. , 0.5, 0. , 0. ],
[0. , 0. , 0. , 0. ]])
Because np.pad pads it on all sides (left and right, in addition to top and bottom), [:, 1:-1] slices off the first and last columns.
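If you would rather not think about pad_width tuples at all, stacking explicit zero rows is an equivalent alternative (a sketch, mirroring the profile array above):

```python
import numpy as np

profile = np.array([[0, 0, 0.5, 0.1],
                    [0.3, 0, 0, 0],
                    [0, 0, 0.1, 0.9],
                    [0, 0, 0, 0.1],
                    [0, 0.5, 0, 0]])

zero_row = np.zeros((1, profile.shape[1]))
out = np.vstack([zero_row, profile, zero_row])  # one zero row before and after
```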

Multiprocessing pool inside a class in python changing array to list

I am trying to use multiprocessing.Pool in Python 3 inside a class. This is the original function:
stress_matrix, compliance = self.problem.compute_objs(self.xphys)
avg_sm = np.zeros(self.nel)
for i2 in range(self.nel):
avg_sm = avg_stress_calc(self.xphys, self.nelx, self.nely,
self.rmin, stress_matrix, avg_sm, i2)
This leaves me with an array with a shape of (16,) and is this:
array([0.81814754, 0.64561319, 0.62517261, 0.78422925, 0.6962134 ,
0.65993462, 0.63970099, 0.68776093, 0.49890513, 0.60900864,
0.71575952, 0.73120825, 0.32964378, 0.53196899, 0.80481781,
0.99930964])
I tried to speed this up by using a multiprocessing pool, as my nel is usually greater than 10,000, like so:
avgsm = np.zeros(self.nel)
pool = multiprocessing.Pool()
func = partial(avg_stress_calc, self.xphys, self.nelx,
self.nely, self.rmin, stress_matrix, avgsm)
avg_sm = pool.map(func, range(self.nel))
For some reason when I do this I get an attribute error: 'list' object has no attribute 'shape'; when I convert it to an array, its shape is (16, 16). The output from the multiprocess version looks like this:
[array([0.81814754, 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ]), array([0. , 0.64561319, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ]), array([0. , 0. , 0.62517261, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ]), array([0. , 0. , 0. , 0.78422925, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ]), array([0. , 0. , 0. , 0. , 0.6962134, 0. ,
0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ]), array([0. , 0. , 0. , 0. , 0. ,
0.65993462, 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ]), array([0. , 0. , 0. , 0. , 0. ,
0. , 0.63970099, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ]), array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.68776093, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ]), array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.49890513, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ]), array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.60900864,
0. , 0. , 0. , 0. , 0. ,
0. ]), array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.71575952, 0. , 0. , 0. , 0. ,
0. ]), array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.73120825, 0. , 0. , 0. ,
0. ]), array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.32964378, 0. , 0. ,
0. ]), array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.53196899, 0. ,
0. ]), array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.80481781,
0. ]), array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.99930964])]
I was hoping to use multiprocessing to speed up the for loop as that is the greatest time consumer in my code. Any advice on what I am doing wrong would be greatly appreciated.
Thanks!
Pool().map() is good, but if you need an object (memory space) to be shared between the different workers, you are better off using threads, like this:
import numpy as np
from threading import Thread
import random
import time
def partial(args, process_id, avg_sm):
    # do some stuff
    print("Th%d - Starting working time from 1 to 3 seconds" % process_id)
    # do more stuff
    working_time = random.uniform(1, 3)
    time.sleep(working_time)
    # write the result
    avg_sm[process_id] = random.uniform(0.0, 1.0)
    print("Th%d - Finishing working time from 1 to 3 seconds" % process_id)

if __name__ == "__main__":
    MAX_THREADS = 1000
    nel = 10000
    args = ["some args you define"]
    avg_sm = np.zeros(nel)  # [0]*nel
    process_l = []
    t0 = time.time()
    for pid in range(nel):
        if pid % MAX_THREADS == 0:
            for p in process_l:
                p.join()
        process_l.append(Thread(target=partial, args=(args, pid, avg_sm)))
        process_l[pid].start()
    for p in process_l:
        p.join()
    t1 = time.time() - t0
    print(avg_sm)
    print("Total computation time: %d s" % t1)
PS: you can pass as many args as you want; in this example I used a single list named args.
EDIT
A MAX_THREADS limit was needed due to memory problems; you will have to tune it yourself. When that number of created threads is reached (if pid % MAX_THREADS == 0:), the code waits until they are finished (creating a new thread as soon as one or more has finished would be more efficient, I think). I assumed your method takes 1 to 3 seconds to finish.
Results:
Th9889 - Starting working time from 1 to 3 seconds
Th9988 - Starting working time from 1 to 3 seconds
Th9919 - Starting working time from 1 to 3 seconds
Th9986 - Starting working time from 1 to 3 seconds
Th9866 - Starting working time from 1 to 3 seconds
Th9951 - Starting working time from 1 to 3 seconds
Th9918 - Starting working time from 1 to 3 seconds
Th9991 - Starting working time from 1 to 3 seconds
Th9886 - Starting working time from 1 to 3 seconds
Th9915 - Starting working time from 1 to 3 seconds
Th9996 - Starting working time from 1 to 3 seconds
Th9963 - Starting working time from 1 to 3 seconds
Th9978 - Starting working time from 1 to 3 seconds
[0.01340808 0.0567745 0.31191508 ... 0.91127015 0.95141791 0.60075809]
Total computation time: 42 s
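For what it's worth, the original Pool approach can also be salvaged: each map call starts from its own copy of avgsm, so the result is one mostly-zero array per index, and summing them along axis 0 merges everything back. A sketch with a dummy worker in place of avg_stress_calc (whose real body is not shown in the question), using the thread-backed pool so the snippet runs as-is; multiprocessing.Pool exposes the same map API:

```python
from functools import partial
from multiprocessing.pool import ThreadPool

import numpy as np

def avg_stress_calc(xphys, i2):
    # dummy stand-in: each call fills only slot i2, like the output shown above
    out = np.zeros(len(xphys))
    out[i2] = xphys[i2]
    return out

xphys = np.linspace(0.1, 1.0, 16)
func = partial(avg_stress_calc, xphys)
with ThreadPool() as pool:
    results = pool.map(func, range(len(xphys)))

# each per-call array is "one-hot"; summing along axis 0 recovers the full vector
avg_sm = np.sum(results, axis=0)
```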

How can I solve this error?: networkx.exception.NetworkXError: ('Adjacency matrix is not square.', 'nx,ny=(10, 11)')

I am trying to create a graph from a numpy array using networkx but I get this error: networkx.exception.NetworkXError: ('Adjacency matrix is not square.', 'nx,ny=(10, 11)')
Someone know how to solve it?
My_Diz = {'X120213_1_0013_2_000004': array([[ 0. , 23.40378234, 30.29631001, 49.45217086,
53.47727757, 74.32949293, 73.27188558, 93.85556785,
132.31971186, 118.04532327, 88.1557181 ],
[ 0. , 0. , 34.41617904, 39.54024761,
34.25713329, 51.79037103, 51.33810652, 70.9900316 ,
109.76561471, 98.51724406, 69.76728919],
[ 0. , 0. , 0. , 26.66788605,
42.7133817 , 79.11779461, 65.88325262, 89.68664703,
125.91837789, 102.22926865, 71.58316322],
[ 0. , 0. , 0. , 0. ,
22.98401022, 65.5730092 , 44.53195174, 68.64071584,
102.34029705, 75.76571351, 45.22368742],
[ 0. , 0. , 0. , 0. ,
0. , 43.0377496 , 23.19245567, 47.19664886,
83.42653241, 65.0762151 , 35.66216118],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 30.28626571, 29.1448064 ,
64.72235299, 72.76481721, 56.93798086],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 24.18622881,
60.591058 , 49.69530936, 27.61846738],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
39.02763348, 46.26701103, 40.06206332],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 44.72240673, 62.0541588 ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 30.69921172]])}
for k,v in My_Diz.items():
G = nx.from_numpy_matrix(v)
nx.draw(G)
Your matrix is not square; networkx needs a square adjacency matrix.
Since the matrix is (n × n+1) and upper-triangular, you can symmetrize it like this:
for k, v in My_Diz.items():
    r, c = v.shape
    M = np.zeros((c, c))
    M[:r, :c] = v
    M[:c, :r] += v.T
    G = nx.from_numpy_matrix(M)
    nx.draw(G)
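One caveat: from_numpy_matrix was removed in networkx 3.0, and from_numpy_array is the current equivalent. A small sketch with a stand-in symmetric matrix:

```python
import numpy as np
import networkx as nx

# small symmetric stand-in for the M built above
M = np.array([[0., 1., 2.],
              [1., 0., 3.],
              [2., 3., 0.]])

# from_numpy_array replaces the removed from_numpy_matrix in networkx >= 3.0
G = nx.from_numpy_array(M)
```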

How to vectorize this cumulative operation?

Let W be some matrix of dimension (x, nP) [see end of question]
Right now, I'm doing the following code:
uUpperDraw = np.zeros(W.shape)
for p in np.arange(0, nP):
    uUpperDraw[s, p] = (W[s+1, :(p+1)]).sum()
I want to vectorize this for efficiency gains. Given pGrid = [0, 1, ...], how can I reproduce the following?
uUpperDraw = np.array([W[x, 0], W[x, 0] + W[x, 1], W[x, 0] + W[x, 1] + W[x, 2], ...
Here is some reproducible example.
>>> s, nP
(3, 10)
>>> W
array([[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[ 2. , 1.63636364, 1.38461538, 1.2 , 1.05882353,
0.94736842, 0.85714286, 0.7826087 , 0.72 , 0.66666667]])
>>> uUpperDraw
array([[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ],
[ 2. , 3.63636364, 5.02097902, 6.22097902,
7.27980255, 8.22717097, 9.08431383, 9.86692252,
10.58692252, 11.25358919],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ]])
This looks like the cumulative sum. When you want the cumulative sum of each row separately, this works:
uUpperDraw = np.cumsum(W, axis=1)
Note the row offset in your loop: row s of uUpperDraw is built from row s+1 of W, so for that single row it is uUpperDraw[s] = np.cumsum(W[s+1]).
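A tiny runnable check of the cumsum behaviour (toy numbers, not the W from the question):

```python
import numpy as np

W = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.]])

# running total along each row
out = np.cumsum(W, axis=1)
```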
