I have created a random data source that looks like this:
This is the code I use to generate and plot the first image:
import pandas as pd
import numpy as np
import numpy.ma as ma
import matplotlib.pyplot as plt
msize=25
rrange=5
jump=3
start=1
dpi=96
h=500
w=500
X,Y=np.meshgrid(range(0,msize),range(0,msize))
dat=np.random.rand(msize,msize)*rrange
msk=np.zeros_like(dat)
msk[start::jump,start::jump].fill(1)
mdat=msk*dat
mdat[mdat==0]=np.nan
mmdat = ma.masked_where(np.isnan(mdat),mdat)
fig = plt.figure(figsize=(w/dpi,h/dpi),dpi=dpi)
cmap = plt.get_cmap('RdYlBu')
cmap.set_bad(color='#cccccc', alpha=1.)
plot = plt.pcolormesh(X,Y,mmdat,cmap=cmap)
plot.axes.set_ylim(0,msize-1)
plot.axes.set_xlim(0,msize-1)
fig.savefig("masked.png",dpi=dpi)
Often this data source isn't so evenly distributed (but that is another subject).
Is there any kind of interpolation that makes each point "spill out" from its position?
For example, take the light yellow point at (1,1) and give the whole region around it (radius 1 in the taxicab metric, plus the diagonals, i.e. its 8 neighbouring cells) the same color/value. This should happen for every valid point in the image; NaNs should not be expanded.
I "gimped" the three lowest-left values in this image to show the idea. I'd like a way to do the same for all valid points without using GIMP ;-):
After some thinking I arrived at this solution:
import numpy as np
import matplotlib.pyplot as plt
t=np.array([
[ 0,0,0,0,0,0,0,0 ],
[ 0,0,0,0,0,0,0,0 ],
[ 0,0,2,0,0,4,0,0 ],
[ 0,0,0,0,0,0,0,0 ],
[ 0,0,0,0,0,0,0,0 ],
[ 0,0,3,0,0,1,0,0 ],
[ 0,0,0,0,0,0,0,0 ],
[ 0,0,0,0,0,0,0,0 ]])
def spill(arr, nval=0, m=1):
    narr = np.copy(arr)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            if arr[i][j] != nval:
                # clamp the window start so it never wraps around to a negative index
                narr[max(i-m, 0):i+m+1, max(j-m, 0):j+m+1] = arr[i][j]
    return narr
l=spill(t)
plt.figure()
plt.pcolormesh(t)
plt.savefig("notspilled.png")
plt.figure()
plt.pcolormesh(l)
plt.savefig("spilled.png")
plt.show()
This solution didn't make me very happy because of the double for loop inside the spill() function :-/
Here is the output from the last code.
This one isn't spilled:
This one was spilled:
How can I enhance the code above to eliminate the double loop?
You could do this with a 2D convolution. For example:
from scipy.signal import convolve2d
def spill2(arr, nval=0, m=1):
return convolve2d(arr, np.ones((2*m+1, 2*m+1)), mode='same')
np.allclose(spill(t), spill2(t))
# True
Be aware that as written, the results will not match if nval != 0 or if the spilled pixels overlap, but you can probably modify this to suit your needs.
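One possible way to lift both restrictions is sketched below. It is only an illustration, not the approach from the answer itself: the helper name spill_mean is made up, and it assumes that overlapping windows should be averaged, which also means a valid cell lying inside a neighbour's window gets averaged with that neighbour. It convolves the values and a validity mask separately and divides one by the other.

```python
import numpy as np
from scipy.signal import convolve2d

def spill_mean(arr, nval=0, m=1):
    """Hypothetical helper: spread valid cells over a (2m+1)x(2m+1) window.

    Assumption: where windows overlap, the contributing values are averaged.
    Cells that no window reaches are set to nval.
    """
    arr = np.asarray(arr, dtype=float)
    # valid cells are those different from nval (NaN needs its own test)
    valid = ~np.isnan(arr) if np.isnan(nval) else arr != nval
    kernel = np.ones((2 * m + 1, 2 * m + 1))
    # sum of valid values and number of valid values inside each window
    total = convolve2d(np.where(valid, arr, 0.0), kernel, mode="same")
    count = convolve2d(valid.astype(float), kernel, mode="same")
    reached = count > 0.5
    out = np.full(arr.shape, nval, dtype=float)
    out[reached] = total[reached] / count[reached]
    return out

# same layout as the 8x8 example array t from the question
t = np.zeros((8, 8))
t[2, 2], t[2, 5], t[5, 2], t[5, 5] = 2, 4, 3, 1
spilled = spill_mean(t)

# the same data with NaN as the "empty" marker
nan_grid = np.where(t == 0, np.nan, t)
spilled_nan = spill_mean(nan_grid, nval=np.nan)
```

If you would rather keep the largest value on overlap instead of the mean, a maximum filter would be the natural thing to try, but that is a different operation from the convolution shown in the answer.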
I would like to plot the values of matrix A on y-axis as a function of node number on x-axis. However, since I have a 5x5 matrix, I don't wish to define the node numbers manually. For instance, node 1 corresponds to 2.53734572e-01, node 2 to -1.08940733e-01,..., node 6 to -5.02000098e-01 and so on.
import numpy as np
import matplotlib.pyplot as plt
Node=np.array([[1,2,3,4,5],[6,7,8,9,10]])
A=np.array([[ 2.53734572e-01, -1.08940733e-01, 3.26138649e-03,
-6.10246692e-03, -2.59115145e-02],
[-5.02000098e-01, 1.08933714e-01, -3.65540228e-02,
5.93536044e-03, 3.88767438e-02],
[-1.42775456e+00, 4.52103243e-01, -2.33067190e-02,
7.27554880e-03, 1.15638039e-01],
[ 4.81030592e-01, -8.91302226e-02, 1.40486724e-03,
2.28801066e-02, -3.83389182e-02],
[ 8.39965176e-01, -2.81589587e-01, 2.24843962e-01,
-8.47758268e-03, -6.84721033e-02]])
plt.scatter(Node, A)
plt.xlabel('Node')
plt.ylabel('Velocity')
We can flatten the matrix to one dimension and use numpy.arange over its length (note that this numbers the nodes starting from 0):
import numpy as np
import matplotlib.pyplot as plt
ys=np.array([[ 2.53734572e-01, -1.08940733e-01, 3.26138649e-03,
-6.10246692e-03, -2.59115145e-02],
[-5.02000098e-01, 1.08933714e-01, -3.65540228e-02,
5.93536044e-03, 3.88767438e-02],
[-1.42775456e+00, 4.52103243e-01, -2.33067190e-02,
7.27554880e-03, 1.15638039e-01],
[ 4.81030592e-01, -8.91302226e-02, 1.40486724e-03,
2.28801066e-02, -3.83389182e-02],
[ 8.39965176e-01, -2.81589587e-01, 2.24843962e-01,
-8.47758268e-03, -6.84721033e-02]]).flatten()
nodes=np.arange(len(ys))
plt.scatter(nodes, ys)
plt.xlabel('Node')
plt.ylabel('Velocity')
plt.show()
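Since the question numbers the nodes from 1, here is a small sketch of the same idea with 1-based node numbers. The matrix below is only a placeholder standing in for A, and the plotting calls are left out.

```python
import numpy as np

# placeholder 5x5 matrix standing in for A from the question
A = np.arange(25, dtype=float).reshape(5, 5)

# row-major flattening: node 6 is the first element of the second row
ys = A.ravel()

# node numbers 1..25 instead of 0..24
nodes = np.arange(1, ys.size + 1)
```

The arrays nodes and ys can then be passed to plt.scatter exactly as above.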
I have to analyse a PPG signal. I found something to find the peaks but I can't use the values of the heights. They are stored in like a dictionary array or something and I don't know how to extract the values out of it. I tried using dict.values() but that didn't work.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import savgol_filter
data = pd.read_excel('test_heartpy.xlsx')
arr = np.array(data)
time = arr[1:,0] # time in s
ECG = arr[1:,1] # ECG
PPG = arr[1:,2] # PPG
filtered = savgol_filter(PPG, 251, 3)
plt.plot(time, filtered)
plt.xlabel('Time (in s)')
plt.ylabel('PPG')
plt.grid(True)
The PPG signal looks like this. To search for the peaks I used:
# searching peaks
from scipy.signal import find_peaks
peaks, heights_peak_0 = find_peaks(PPG, height=0.2)
heights_peak = heights_peak_0.values()
plt.plot(PPG)
plt.plot(peaks, np.asarray(PPG)[peaks], "x")
plt.plot(np.zeros_like(PPG), "--", color="gray")
plt.title("PPG peaks")
plt.show()
print(heights_peak_0)
print(heights_peak)
print(peaks)
Printing:
{'peak_heights': array([0.4822998 , 0.4710083 , 0.43884277, 0.46728516, 0.47094727,
0.44702148, 0.43029785, 0.44146729, 0.43933105, 0.41400146,
0.45318604, 0.44335938])}
dict_values([array([0.4822998 , 0.4710083 , 0.43884277, 0.46728516, 0.47094727,
0.44702148, 0.43029785, 0.44146729, 0.43933105, 0.41400146,
0.45318604, 0.44335938])])
[787 2513 4181 5773 7402 9057 10601 12194 13948 15768 17518 19335]
Signal with highlighted peaks looks like this.
heights_peak_0 is the properties dict returned by scipy.signal.find_peaks.
The SciPy documentation for find_peaks describes what it contains.
You can extract the array containing the heights of all the peaks with heights_peak_0["peak_heights"]:
# the following will give you an array with the values of peaks
heights_peak_0['peak_heights']
# peaks holds the indices where find_peaks found peaks in the original signal, so you can also get the peak values this way
PPG[peaks]
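To make the difference between the dict, its values() view and the actual array concrete, here is a minimal sketch on a made-up signal (the array below is not the PPG data from the question):

```python
import numpy as np
from scipy.signal import find_peaks

# tiny made-up signal with two peaks above the 0.2 threshold
signal = np.array([0.0, 0.5, 0.0, 0.1, 0.0, 0.9, 0.0])

# the second return value is a dict; "peak_heights" is present because height= was given
peaks, properties = find_peaks(signal, height=0.2)

# index the dict by key to get a plain NumPy array of heights
heights = properties["peak_heights"]

# .values() only gives a view wrapping that array; it has to be unpacked first
heights_from_values = list(properties.values())[0]
```

Here heights, heights_from_values and signal[peaks] all hold the same numbers, which is why dict.values() on its own looked unusable: it returns the container, not the array inside it.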
According to the docs, the find_peaks() function returns a tuple consisting of the peak indices and a properties dict. If you are only interested in the peak values, you can simply ignore the second element of the tuple and index the signal with the first one.
Assuming you want the 'coordinates' of your peaks, you can then combine the peak heights (y-values) with their positions (x-values) like so (based on the first code snippet given in the docs):
import matplotlib.pyplot as plt
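try:
    from scipy.datasets import electrocardiogram  # SciPy >= 1.10
except ImportError:
    from scipy.misc import electrocardiogram  # older SciPy versions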
from scipy.signal import find_peaks
x = electrocardiogram()[2000:4000]
peaks, _ = find_peaks(x, distance=150)
peaks_x_values = peaks
peaks_y_values = x[peaks]
peak_coordinates = list(zip(peaks_x_values, peaks_y_values))
print(peak_coordinates)
plt.plot(x)
plt.plot(peaks_x_values, peaks_y_values, "x")
plt.show()
Printing:
[(65, 0.705), (251, 1.155), (431, 1.705), (608, 1.96), (779, 1.925), (956, 2.09), (1125, 1.745), (1292, 1.37), (1456, 1.2), (1614, 0.81), (1776, 0.665), (1948, 0.665)]
I have a Lab dataset, val_lab_2, that looks like this:
[[ 7.803e+01 3.100e-01 1.382e+01]
[ 6.697e+01 -7.400e+00 2.750e+01]
[ 5.631e+01 -1.804e+01 1.599e+01]
[ 6.701e+01 2.650e+00 2.913e+01]
[ 6.564e+01 1.660e+00 2.540e+01]
[ 3.537e+01 2.050e+01 3.784e+01]
[ 4.178e+01 2.251e+01 4.438e+01]
[ 6.129e+01 1.261e+01 5.934e+01]
[ 4.269e+01 5.120e+00 4.995e+01]...]
I want to change it to RGB 0-255, so I use colormath package:
import numpy as np
import pandas as pd
from colormath.color_objects import sRGBColor, XYZColor, LabColor
from colormath.color_conversions import convert_color
val_rgb_2 = []
for lab_list in val_lab_2:
    lab = LabColor(*[component for component in lab_list])
    rgb = convert_color(lab, sRGBColor)
    rgb_list = [255*color for color in rgb.get_value_tuple()]
    val_rgb_2.append(rgb_list)
val_rgb_2 = np.array(val_rgb_2)
print(val_rgb_2)
the result shows:
[[201.44003158 192.14717152 167.2643737 ]
[163.51015463 166.15931613 112.52909259]
[109.8451797 143.64817993 106.17872676]
[181.45664244 160.44644464 110.27374654]
[174.70091435 157.51328159 113.65978481]
[122.18753543 69.26114552 19.79086107]
[143.2650315 82.85139354 20.50673141]
[187.60706038 138.48640317 31.06087929]...]
However, I think this is not correct, because I have a label for each row, and according to the first few labels the colors should be:
natural white 100% wool,blue-violet flowers,blue flowers,flowers,whole plant,peels,peels,peels...
I suggest using this website to:
insert the Lab values
check the RGB values that the website gives you
check the color associated with the Lab/RGB values you have inserted
I have checked, and the conversion between the two matrices you posted is correct.
For example, let's try the second element of the list above, which should be labeled as blue-violet flowers:
[ 6.697e+01 -7.400e+00 2.750e+01]
The conversion is correct; I have some doubts regarding the labels.
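If it helps with checking the colors by eye, the float results can be turned into integer 0-255 triples and hex codes that any color picker accepts. This is only a sketch: the rows are copied from the output printed in the question, and the clipping step is an assumption on my part to guard against out-of-gamut values falling outside 0-255.

```python
import numpy as np

# first rows of val_rgb_2 as printed in the question
val_rgb_float = np.array([
    [201.44003158, 192.14717152, 167.2643737],
    [163.51015463, 166.15931613, 112.52909259],
    [109.8451797, 143.64817993, 106.17872676],
])

# round to the nearest integer and clip into the displayable 0-255 range
val_rgb_uint8 = np.clip(np.rint(val_rgb_float), 0, 255).astype(np.uint8)

# hex codes that can be pasted into a color picker for a visual check
hex_codes = ["#{:02x}{:02x}{:02x}".format(*row.tolist()) for row in val_rgb_uint8]
```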
I have a question similar to the question asked here:
simple way of fusing a few close points. I want to replace points that are located close to each other with the average of their coordinates. The closeness in cells is specified by the user (I am talking about euclidean distance).
In my case I have a lot of points (about 1-million). This method is working, but is time consuming as it uses a double for loop.
Is there a faster way to detect and fuse close points in a numpy 2d array?
To be complete I added an example:
points=np.array([[ 382.49056159, 640.1731949 ],
[ 496.44669161, 655.8583119 ],
[ 1255.64762859, 672.99699399],
[ 1070.16520917, 688.33538171],
[ 318.89390168, 718.05989421],
[ 259.7106383 , 822.2 ],
[ 141.52574427, 28.68594436],
[ 1061.13573287, 28.7094536 ],
[ 820.57417943, 84.27702407],
[ 806.71416007, 108.50307828]])
A scatterplot of the points is visible below. The red circle indicates the points located close to each other (in this case a distance of 27.91 between the last two points in the array). So if the user would specify a minimum distance of 30 these points should be fused.
In the output of the fuse function the last two points are fused. This will look like:
#output
array([[ 382.49056159, 640.1731949 ],
[ 496.44669161, 655.8583119 ],
[ 1255.64762859, 672.99699399],
[ 1070.16520917, 688.33538171],
[ 318.89390168, 718.05989421],
[ 259.7106383 , 822.2 ],
[ 141.52574427, 28.68594436],
[ 1061.13573287, 28.7094536 ],
[ 813.64416975, 96.390051175]])
If you have a large number of points then it may be faster to build a k-D tree using scipy.spatial.KDTree, then query it for pairs of points that are closer than some threshold:
import numpy as np
from scipy.spatial import KDTree
tree = KDTree(points)
rows_to_fuse = tree.query_pairs(r=30)
print(repr(rows_to_fuse))
# {(8, 9)}
print(repr(points[list(rows_to_fuse)]))
# one (2, 2) block of coordinates per pair (recent NumPy versions):
# array([[[820.57417943,  84.27702407],
#         [806.71416007, 108.50307828]]])
The major advantage of this approach is that you don't need to compute the distance between every pair of points in your dataset.
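The snippet above only finds the close pairs. One way to go on and actually fuse them is sketched below. The function name fuse is hypothetical, and it assumes that closeness is transitive: if A is close to B and B is close to C, all three are merged into one averaged point. That is a design choice rather than something the question specifies.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import KDTree

def fuse(points, r):
    """Hypothetical helper: replace each group of close points by its mean."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # index pairs of points closer than r, as an (n_pairs, 2) integer array
    pairs = np.array(sorted(KDTree(points).query_pairs(r=r)), dtype=int).reshape(-1, 2)
    # treat the pairs as edges of a graph and label its connected components
    graph = coo_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n))
    n_groups, labels = connected_components(graph, directed=False)
    # average the coordinates within each component
    sums = np.zeros((n_groups, points.shape[1]))
    np.add.at(sums, labels, points)
    counts = np.bincount(labels, minlength=n_groups)
    return sums / counts[:, None]

points = np.array([[382.49056159, 640.1731949],
                   [496.44669161, 655.8583119],
                   [1255.64762859, 672.99699399],
                   [1070.16520917, 688.33538171],
                   [318.89390168, 718.05989421],
                   [259.7106383, 822.2],
                   [141.52574427, 28.68594436],
                   [1061.13573287, 28.7094536],
                   [820.57417943, 84.27702407],
                   [806.71416007, 108.50307828]])
fused = fuse(points, r=30)
```

Points that have no close neighbour form a group of one and come out unchanged, so the result has one row per group.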
You can use scipy's distance functions such as pdist in order to quickly find which points should be merged:
import numpy as np
from scipy.spatial.distance import pdist, squareform
a = points  # the (n, 2) array of points from the question
d = squareform(pdist(a))
d = np.ma.array(d, mask=np.isclose(d, 0))
a[d.min(axis=1) < 30]
#array([[ 820.57417943, 84.27702407],
# [ 806.71416007, 108.50307828]])
NOTE
For large samples this method can cause memory errors since it is storing a full matrix containing the relative distances.
I am looking to find the peaks in some Gaussian-smoothed data that I have. I have looked at some of the peak detection methods available, but they require an input range over which to search and I want this to be more automated than that. These methods are also designed for non-smoothed data. As my data is already smoothed, I need a much simpler way of retrieving the peaks. My raw and smoothed data are in the graph below.
Essentially, is there a pythonic way of retrieving the max values from the array of smoothed data such that an array like
a = [1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1]
would return:
r = [5,3,6]
There is a function in SciPy, argrelextrema, that gets this task done:
import numpy as np
from scipy.signal import argrelextrema
a = np.array([1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1])
# determine the indices of the local maxima
max_ind = argrelextrema(a, np.greater)
# get the actual values using these indices
r = a[max_ind] # array([5, 3, 6])
That gives you the desired output for r.
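One thing to watch with smoothed or quantized data: with np.greater the comparison is strict, so a flat-topped peak is not reported. The sketch below, on a made-up array, compares the strict comparator with np.greater_equal and with find_peaks, which treats a plateau as a single peak.

```python
import numpy as np
from scipy.signal import argrelextrema, find_peaks

# made-up data with a flat top (two equal samples at the maximum)
flat = np.array([1, 2, 3, 3, 2, 1])

# strict comparison: neither plateau sample is greater than both neighbours
strict_idx = argrelextrema(flat, np.greater)[0]

# non-strict comparison: every sample of the plateau is reported
loose_idx = argrelextrema(flat, np.greater_equal)[0]

# find_peaks reports one index per plateau
peak_idx, _ = find_peaks(flat)
```

Which behaviour is preferable depends on whether a plateau should count as one peak, several, or none.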
As of SciPy version 1.1, you can also use find_peaks. Below are two examples taken from the documentation itself.
Using the height argument, one can select all maxima above a certain threshold (in this example, all non-negative maxima; this can be very useful if one has to deal with a noisy baseline; if you want to find minima, just multiply your input by -1):
import matplotlib.pyplot as plt
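try:
    from scipy.datasets import electrocardiogram  # SciPy >= 1.10
except ImportError:
    from scipy.misc import electrocardiogram  # older SciPy versions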
from scipy.signal import find_peaks
import numpy as np
x = electrocardiogram()[2000:4000]
peaks, _ = find_peaks(x, height=0)
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.plot(np.zeros_like(x), "--", color="gray")
plt.show()
Another extremely helpful argument is distance, which defines the minimum distance between two peaks:
peaks, _ = find_peaks(x, distance=150)
# difference between peaks is >= 150
print(np.diff(peaks))
# prints [186 180 177 171 177 169 167 164 158 162 172]
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.show()
If your original data is noisy, then using statistical methods is preferable, as not all peaks are going to be significant. For your a array, a possible solution is to take the second difference and keep the points where it is negative:
peaks = a[1:-1][np.diff(np.diff(a)) < 0]
# peaks = array([5, 3, 6])
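A caveat worth spelling out: a negative second difference marks every concave point, not only the maxima. It works for a because all the slopes there are +1 or -1. A slightly more robust variant, sketched below with a hypothetical helper name, looks at where the sign of the first difference turns from rising to falling.

```python
import numpy as np

def local_maxima(values):
    """Hypothetical helper: interior samples where the slope turns downward."""
    values = np.asarray(values)
    slope_sign = np.sign(np.diff(values))
    # a drop in the slope sign marks a turn from rising (or flat) to flat (or falling)
    turns_down = np.diff(slope_sign) < 0
    return values[1:-1][turns_down]

a = np.array([1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1])
maxima_a = local_maxima(a)

# a rise that flattens before the peak: concave everywhere, but only one maximum
b = np.array([0, 3, 5, 6, 0])
second_diff_b = b[1:-1][np.diff(np.diff(b)) < 0]
maxima_b = local_maxima(b)
```

For b the second-difference test flags 3, 5 and 6, while the sign-change test keeps only 6. Flat tops are still flagged sample by sample, so this is not a full replacement for find_peaks.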
>>> import numpy as np
>>> from scipy.signal import argrelextrema
>>> a = np.array([1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1])
>>> argrelextrema(a, np.greater)
(array([ 4, 10, 17]),)
>>> a[argrelextrema(a, np.greater)]
array([5, 3, 6])
If your input represents a noisy distribution, you can try smoothing it with NumPy's convolve function.
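As a rough illustration of that suggestion, here is a moving-average smoother built on np.convolve. The window length and the synthetic noisy sine are arbitrary choices for the example, and the helper name is made up.

```python
import numpy as np

def moving_average(values, window=5):
    """Hypothetical helper: box-filter smoothing via np.convolve."""
    kernel = np.ones(window) / window
    # mode="same" keeps the length; samples near the edges are biased toward zero
    return np.convolve(values, kernel, mode="same")

# synthetic noisy signal, only for demonstration
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)
noisy = np.sin(x) + 0.2 * rng.standard_normal(x.size)
smooth = moving_average(noisy, window=15)
```

The smoothed array can then be passed to argrelextrema as above. A wider window removes more noise but also flattens and can shift narrow peaks.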
If you can exclude maxima at the edges of the array, you can always check whether an element is bigger than each of its neighbors:
import numpy as np
array = np.array([1,2,3,4,5,4,3,2,1,2,3,2,1,2,3,4,5,6,5,4,3,2,1])
# Check that each element is bigger than both of its neighbors, excluding the edges
# (named is_max to avoid shadowing the built-in max):
is_max = (array[1:-1] > array[:-2]) & (array[1:-1] > array[2:])
# Print these values
print(array[1:-1][is_max])
# Locations of the maxima
print(np.arange(1, array.size-1)[is_max])