Shift several graphs under each other - Python

I want to shift several graphs under each other. I read the data in as arrays with 4 columns:
# load data for variable intensities
data_with_minimum = []
for i in [6, 12, 25, 50, 100]:
    data_with_minimum.append(np.loadtxt('data{0}.dat'.format(i)))
Then I search for a characteristic point, in this case a minimum within the first 5000 rows (I know there always is one), and save the indices.
# open arrays for minimum value and index
m = []
mi = []
for k in range(5):
    m.append(0)
    mi.append(0)

# search minimum in first 5000 data points
for k in range(5):
    for i in range(5000):
        if m[k] > data_with_minimum[k][i,1]:
            m[k] = data_with_minimum[k][i,1]
            mi[k] = i
Lastly, I want to shift the data along the x-axis (the first column) so that all minima lie under each other:
# shift x-axis
for i in range(30000 - m_max):
    for k in range(5):
        data_with_minimum[k][i,1] = data_with_minimum[k][i+(mi[k]-min(mi)),1]
Unfortunately this does not work, because values are overwritten while they are still needed. Since I'm quite new to Python I got stuck, so any suggestions would be helpful. Or is there perhaps an easier way to solve this problem in general? My approach seems inconvenient to me. Thank you!
Edit:
1) Unfortunately I can't post images because I don't have enough reputation points, so I have to post this link instead: shift graphs. Sorry for that. My goal is that the minima of all graphs end up at the same point. This graph was plotted with the command:
plt.figure(0)
for i in range(5):
    plt.plot(data_with_minimum[i][:,0], data_with_minimum[i][:,1])
Minimum data example:
x y(file1) y(file2) y(file3)
1 5 8 3
2 3 6 1
3 1 5 5
4 2 3 8
5 5 1 10
6 8 3 13
7 10 4 15
8 14 7 18
9 16 10 20
...
this should become
x y(file1) y(file2) y(file3)
1 3 3 3
2 1 1 1
3 2 3 5
4 5 4 8
5 8 7 10
6 10 10 13
7 14 - 15
8 16 - 18
9 - - 20
...
with 1 being the minimum. Note, however, that there could be an additional minimum after the first 5000 data points.
And the beginning of the real data from one file:
0.000000 -1.057758
0.000200 -1.051918
0.000400 -1.063922
0.000600 -1.065220
0.000800 -1.069438
0.001000 -1.065220
0.001400 -1.065545
0.001600 -1.077549
0.001800 -1.072682
0.002000 -1.082416
0.002200 -1.078847
0.002400 -1.090203
0.002600 -1.087283
0.002800 -1.095069
0.003000 -1.090527
0.003200 -1.098314
0.003400 -1.100261
0.003600 -1.108372
0.003800 -1.103505
0.004000 -1.111292
0.004200 -1.107074
0.004400 -1.113887
0.004600 -1.112590
0.004800 -1.127514
0.005000 -1.115510
0.005200 -1.127514
...
2) Changed "columns" to "rows" in the passage "in this case a minimum in the first 5000 columns".

First of all, you can find the minimum indices much more easily and quickly using numpy's argmin:
import numpy

# setup example data
x = numpy.arange(9)
data_with_minimum = numpy.array(
    [[ 5,  3,  1,  2,  5,  8, 10, 14, 16],
     [ 8,  6,  5,  3,  1,  3,  4,  7, 10],
     [ 3,  1,  5,  8, 10, 13, 15, 18, 20]])
mi = numpy.argmin(data_with_minimum, axis=1)
Then I wanted to point you to numpy.roll, which could be used to shift/align the arrays. But if you are interested in plotting, it is much more elegant and logical not to modify the arrays at all (and not to have to deal with boundary issues), but simply to shift the line plots:
import matplotlib.pyplot as plt

plt.clf()
for i, row in enumerate(data_with_minimum):
    plt.plot(x - mi[i], row)
plt.xlabel('offset from minimum')
plt.show()
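For completeness, a minimal sketch of the numpy.roll idea mentioned above, reusing the mi and data_with_minimum variables from this example. Note that roll wraps values around the end of the array, so the wrapped-in tail would normally be cut off or masked before plotting:
shift_to = mi.min()
aligned = numpy.array([numpy.roll(row, shift_to - mi[i])
                       for i, row in enumerate(data_with_minimum)])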

This is hard to answer without an MWE, but here's how I would plot two lines so that their minimum values are aligned:
import numpy as np
np.random.seed(1)
a = np.random.random_sample(10)
b = np.random.random_sample(10)
# say we want to align "b" to "a" based on
# the minima as you describe
a_indices = np.arange(0, len(a))
b_indices = a_indices + (a.argmin() - b.argmin())
import matplotlib.pyplot as plt
plt.plot(a_indices, a)
plt.plot(b_indices, b)
plt.show()
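The same idea extends to more than two arrays. A small sketch aligning a whole list of y-arrays (e.g. the five data files) to the first array's minimum, reusing np and plt from above:
arrays = [a, b]  # replace with your own list of y-columns
ref_min = arrays[0].argmin()
for arr in arrays:
    idx = np.arange(len(arr)) + (ref_min - arr.argmin())
    plt.plot(idx, arr)
plt.show()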

Related

How to find the closest data points to a query data point in a pandas dataframe?

I have a query data point with 15 columns and a pandas data frame with the same 15 columns, and I want to find the data points in the data frame that are closest to my query data point. Can someone guide me on this?
Example:
query data point
[1, 2, 3, 4]
df
1 3 5 6
2 7 9 1
2 8 1 8
5 4 9 0
2 4 6 7
Here, the rows below are the closest; in the same way I want to retrieve the first n closest data points to my query point.
1 3 5 6
2 4 6 7
I tried clustering, but it was too complex for me to understand, and KNN expects a target variable, so I need your help. Thank you!
You can use the Euclidean distance (L2 norm) to calculate the distance between each row of your dataframe and your query point.
import pandas as pd

df = pd.DataFrame([[1, 3, 5, 6],
                   [2, 7, 9, 1],
                   [2, 8, 1, 8],
                   [5, 4, 9, 0],
                   [2, 4, 6, 7]])
vec = [1, 2, 3, 4]
dist = df.sub(vec, axis=1).pow(2).sum(axis=1).pow(.5)
This gives the output,
0 3.000000
1 8.426150
2 7.549834
3 8.485281
4 4.795832
dtype: float64
You can select the n shortest distances, which give you the positions of the n closest data points to your query point.
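A minimal sketch of that selection, continuing from the dist series above (n is chosen arbitrarily here):
n = 2
closest = df.loc[dist.nsmallest(n).index]  # the n rows with the smallest distances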
Or you can use np.linalg.norm:
import numpy as np

dist = np.linalg.norm(df.to_numpy() - vec, axis=1)
which gives you the output
array([3. , 8.42614977, 7.54983444, 8.48528137, 4.79583152])
You can try:
query_point = [1, 2, 3, 4]
n = 2
n_closest_points = df.loc[(df - query_point).pow(2).sum(axis=1).nsmallest(n).index]
gives
0 1 2 3
0 1 3 5 6
4 2 4 6 7
We take the sum of squared distances between each row and the query_point by chaining subtraction (which broadcasts), squaring (pow) and summing (sum). Then we select the n closest rows via nsmallest, which returns a Series whose values are the squared distances and whose index marks the desired rows; we take that index and look the rows up in the original df (.loc).

Merge 3D numpy array into pandas Dataframe + 1D vector

I have a dataset which is a numpy array with shape (1536 x 16 x 48). A quick explanation of these dimensions might be helpful:
The dataset consists of data collected by EEG sensors at a 256 Hz rate (1 second = 256 measurements/values);
1536 values represent 6 seconds of EEG data (256 * 6 = 1536);
16 is the number of electrodes used to collect data;
48 is the number of samples.
In summary: I have 48 samples of 6 seconds (1536 values) of EEG data, collected by 16 electrodes.
I need to create a pandas dataframe with all this data, and therefore turn this 3D array into 2D. The depth dimension (48) can be removed if I stack all samples one above another, so the new dataset will be shaped (1536 * 48) x 16.
In addition, since this is a classification problem, I have a vector with 48 values that represents the class of each EEG sample. The new dataset should also have this as a "class" column, so the real shape would be (1536 * 48) x 16 + 1 (class).
I could easily do that by looping through the depth dimension of the 3D array and concatenating everything into a new 2D array, but that looks bad since I will be dealing with many datasets like this one, and performance is an issue. I would like to know if there is a cleverer way of doing it.
I tried to provide as much information as I could for this question, but since it is not a trivial task, feel free to ask for further details if needed.
Thanks in advance.
Setup
>>> import numpy as np
>>> import pandas as pd
>>> a = np.zeros((4,3,3),dtype=int) + [0,1,2]
>>> a *= 10
>>> a += np.array([1,2,3,4])[:,None,None]
>>> a
array([[[ 1, 11, 21],
        [ 1, 11, 21],
        [ 1, 11, 21]],

       [[ 2, 12, 22],
        [ 2, 12, 22],
        [ 2, 12, 22]],

       [[ 3, 13, 23],
        [ 3, 13, 23],
        [ 3, 13, 23]],

       [[ 4, 14, 24],
        [ 4, 14, 24],
        [ 4, 14, 24]]])
Split evenly along the last dimension; stack those elements, reshape, feed to DataFrame. Using the lengths of the array's dimensions simplifies the process.
>>> d0,d1,d2 = a.shape
>>> pd.DataFrame(np.stack(np.dsplit(a,d2)).reshape(d0*d2,d1))
0 1 2
0 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 11 11 11
5 12 12 12
6 13 13 13
7 14 14 14
8 21 21 21
9 22 22 22
10 23 23 23
11 24 24 24
>>>
Using your shape.
>>> b = np.random.random((1536, 16, 48))
>>> d0,d1,d2 = b.shape
>>> df = pd.DataFrame(np.stack(np.dsplit(b,d2)).reshape(d0*d2,d1))
>>> df.shape
(73728, 16)
>>>
After making the DataFrame from the 3D array, add the classification column to it with df['class'] = data (see the pandas docs on column selection, addition, deletion). Since the stacked frame has 1536 * 48 rows but there are only 48 class labels, each label has to be repeated to match, as sketched below.
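A minimal sketch of that last step, assuming the 48 labels live in a 1-D array called labels (a made-up name) and that the samples are stacked block by block as in the answer above, so each label is repeated 1536 times:
import numpy as np
import pandas as pd

b = np.random.random((1536, 16, 48))    # stand-in for the EEG data
labels = np.random.randint(0, 2, 48)    # hypothetical class vector, one label per sample
d0, d1, d2 = b.shape
df = pd.DataFrame(np.stack(np.dsplit(b, d2)).reshape(d0 * d2, d1))
df['class'] = np.repeat(labels, d0)     # each sample's label repeated 1536 times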
For the numpy part:
import numpy as np

x = np.random.random((1536, 16, 48))  # ndarray with a similar shape
x = x.swapaxes(1, 2)                  # swap axes 1 and 2, i.e. 16 and 48
x = x.reshape((-1, 16), order='C')    # order is important, you may want to check the docs
c = np.zeros((x.shape[0], 1))         # class column, shape=(73728, 1)
x = np.hstack((x, c))                 # final dataset
x.shape
Output
(73728, 17)
or in one line
x = np.hstack((x.swapaxes(1,2).reshape((-1, 16), order='C'), c))
Finally,
x = pd.DataFrame(x)
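Optionally, a small sketch with made-up column names so that the class column is easy to address later:
import pandas as pd

# 16 electrode columns plus the class column appended above
df = pd.DataFrame(x, columns=['electrode_{}'.format(i) for i in range(16)] + ['class'])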

Check if X,Y column pair in table B is within delta distance of any X, Y column pair in table A

I have a dataframe named origA:
X, Y
10, 20
11, 2
9, 35
8, 7
And another one named calcB:
Xc, Yc
1, 7
9, 22
I want to check, for every Xc, Yc pair in calcB, whether there is an X, Y pair in origA whose Euclidean distance to Xc, Yc is less than delta, and if so, put True in a new column Detected in the respective row of origA.
@Wen-Ben's solution might work for small datasets. However, you quickly run into performance problems when you try to compute the distances for many points. Hence, there are already plenty of smart algorithms that reduce the number of required distance calculations; one of them is BallTree (provided by scikit-learn):
from sklearn.neighbors import BallTree
import numpy as np
import pandas as pd

# Prepare the data and the search radius:
origA = pd.DataFrame()
origA['X'] = [10, 11, 9, 8]
origA['Y'] = [20, 2, 35, 7]
calcB = pd.DataFrame()
calcB['Xc'] = [1, 9]
calcB['Yc'] = [7, 22]
delta = 5
# Stack the coordinates together:
pointsA = np.column_stack([origA.X, origA.Y])
pointsB = np.column_stack([calcB.Xc, calcB.Yc])
# Create the Ball Tree and search for close points:
tree = BallTree(pointsB)
detected = tree.query_radius(pointsA, r=delta, count_only=True)
# Add results as additional column:
origA['Detected'] = detected.astype(bool)
Output
X Y Detected
0 10 20 True
1 11 2 False
2 9 35 False
3 8 7 False
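If you also need to know which calcB points were found rather than just whether any were, query_radius can return the neighbor indices instead of counts (a small follow-up sketch):
# For each origA point, an array of indices into pointsB within radius delta
neighbors = tree.query_radius(pointsA, r=delta)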
You can use cdist from scipy:
from scipy.spatial.distance import cdist

delta = 5
# dfa, dfb are the same frames as origA and calcB above
ary = cdist(dfa, dfb, metric='euclidean')
ary
Out[189]:
array([[15.8113883 ,  2.23606798],
       [11.18033989, 20.09975124],
       [29.12043956, 13.        ],
       [ 7.        , 15.03329638]])
dfa['detected'] = (ary < delta).any(1)
dfa
Out[191]:
    X   Y  detected
0  10  20      True
1  11   2     False
2   9  35     False
3   8   7     False

How can I split my array into small arrays of 10 elements, shifting by 2 elements each time?

I am trying to split my array of 100 elements into small arrays of 10 elements each and calculate their averages (the average of each small array). The catch is that each time I want to shift by two elements. Is what I am doing in the following code correct?
Avg_Arr=[sum(Signal[k:k+10])/10 for k in range(0,N,2)]
More precisely, if my Array is the following
Array=[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 .....]
My first small array is
My_Array1=[0 1 2 3 4 5 6 7 8 9]
==> average is (0+1+2+3+4+5+6+7+8+9)/10
while my second one must be
My_Array2=[2 3 4 5 6 7 8 9 10 11]
This should work:
Signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
N = len(Signal)
Avg_Arr = [sum(Signal[k:k+10]) / 10 for k in range(0, N - 10 + 1, 2)]
print(Avg_Arr)
Beware that the start index must not exceed N - 10, so that every window still contains 10 elements; otherwise you are not averaging over 10 elements.
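If NumPy is available, a vectorized alternative (a sketch assuming NumPy 1.20+ for sliding_window_view) avoids the Python-level loop entirely:
import numpy as np

signal = np.arange(100)  # example data
# All length-10 windows, keep every second one, then average each window
windows = np.lib.stride_tricks.sliding_window_view(signal, 10)[::2]
avg_arr = windows.mean(axis=1)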

Resampling a numpy array representing an image

I am looking for how to resample a numpy array representing image data at a new size, preferably having a choice of the interpolation method (nearest, bilinear, etc.). I know there is
scipy.misc.imresize
which does exactly this by wrapping PIL's resize function. The only problem is that since it uses PIL, the numpy array has to conform to image formats, giving me a maximum of 4 "color" channels.
I want to be able to resize arbitrary images, with any number of "color" channels. I was wondering if there is a simple way to do this in scipy/numpy, or if I need to roll my own.
I have two ideas for how to concoct one myself:
a function that runs scipy.misc.imresize on every channel separately
create my own using scipy.ndimage.interpolation.affine_transform
The first one would probably be slow for large data, and the second one does not seem to offer any other interpolation method except splines.
Based on your description, you want scipy.ndimage.zoom.
Bilinear interpolation would be order=1, nearest is order=0, and cubic is the default (order=3).
zoom is specifically for regularly-gridded data that you want to resample to a new resolution.
As a quick example:
import numpy as np
import scipy.ndimage
x = np.arange(9).reshape(3, 3)

print('Original array:')
print(x)
print('Resampled by a factor of 2 with nearest interpolation:')
print(scipy.ndimage.zoom(x, 2, order=0))
print('Resampled by a factor of 2 with bilinear interpolation:')
print(scipy.ndimage.zoom(x, 2, order=1))
print('Resampled by a factor of 2 with cubic interpolation:')
print(scipy.ndimage.zoom(x, 2, order=3))
And the result:
Original array:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
Resampled by a factor of 2 with nearest interpolation:
[[0 0 1 1 2 2]
 [0 0 1 1 2 2]
 [3 3 4 4 5 5]
 [3 3 4 4 5 5]
 [6 6 7 7 8 8]
 [6 6 7 7 8 8]]
Resampled by a factor of 2 with bilinear interpolation:
[[0 0 1 1 2 2]
 [1 2 2 2 3 3]
 [2 3 3 4 4 4]
 [4 4 4 5 5 6]
 [5 5 6 6 6 7]
 [6 6 7 7 8 8]]
Resampled by a factor of 2 with cubic interpolation:
[[0 0 1 1 2 2]
 [1 1 1 2 2 3]
 [2 2 3 3 4 4]
 [4 4 5 5 6 6]
 [5 6 6 7 7 7]
 [6 6 7 7 8 8]]
Edit: As Matt S. pointed out, there are a couple of caveats for zooming multi-band images. I'm copying the portion below almost verbatim from one of my earlier answers:
Zooming also works for 3D (and nD) arrays. However, be aware that if you zoom by 2x, for example, you'll zoom along all axes.
from scipy import ndimage

data = np.arange(27).reshape(3, 3, 3)
print('Original:\n', data)
print('Zoomed by 2x gives an array of shape:', ndimage.zoom(data, 2).shape)
This yields:
Original:
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
Zoomed by 2x gives an array of shape: (6, 6, 6)
In the case of multi-band images, you usually don't want to interpolate along the "z" axis, creating new bands.
If you have something like a 3-band, RGB image that you'd like to zoom, you can do this by specifying a sequence of tuples as the zoom factor:
print('Zoomed by 2x along the last two axes:')
print(ndimage.zoom(data, (1, 2, 2)))
This yields:
Zoomed by 2x along the last two axes:
[[[ 0  0  1  1  2  2]
  [ 1  1  1  2  2  3]
  [ 2  2  3  3  4  4]
  [ 4  4  5  5  6  6]
  [ 5  6  6  7  7  7]
  [ 6  6  7  7  8  8]]

 [[ 9  9 10 10 11 11]
  [10 10 10 11 11 12]
  [11 11 12 12 13 13]
  [13 13 14 14 15 15]
  [14 15 15 16 16 16]
  [15 15 16 16 17 17]]

 [[18 18 19 19 20 20]
  [19 19 19 20 20 21]
  [20 20 21 21 22 22]
  [22 22 23 23 24 24]
  [23 24 24 25 25 25]
  [24 24 25 25 26 26]]]
If you want to resample, then you should look at Scipy's cookbook for rebinning. In particular, the congrid function defined at the end will support rebinning or interpolation (equivalent to the function in IDL with the same name). This should be the fastest option if you don't want interpolation.
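The cookbook's congrid function is not reproduced here, but as an illustration of plain rebinning (no interpolation), here is a minimal block-averaging sketch for integer downsampling factors:
import numpy as np

def rebin_mean(a, factor):
    # Downsample a 2D array by an integer factor via block averaging.
    # Assumes both dimensions are divisible by `factor`.
    h, w = a.shape
    return a.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)
print(rebin_mean(x, 2).shape)  # (3, 3)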
You can also use directly scipy.ndimage.map_coordinates, which will do a spline interpolation for any kind of resampling (including unstructured grids). I find map_coordinates to be slow for large arrays (nx, ny > 200).
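As a rough sketch of the map_coordinates route (linear splines here, order=1), upsampling a small 2D array to a new shape:
import numpy as np
from scipy import ndimage

im = np.arange(16, dtype=float).reshape(4, 4)
new_shape = (8, 8)

# (row, col) positions at which to sample the original array
rows = np.linspace(0, im.shape[0] - 1, new_shape[0])
cols = np.linspace(0, im.shape[1] - 1, new_shape[1])
coords = np.array(np.meshgrid(rows, cols, indexing='ij'))

resampled = ndimage.map_coordinates(im, coords, order=1)
print(resampled.shape)  # (8, 8)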
For interpolation on structured grids, I tend to use scipy.interpolate.RectBivariateSpline. You can choose the order of the spline (linear, quadratic, cubic, etc) and even independently for each axis. An example:
import scipy.interpolate as interp

# x, y: 1-D coordinate arrays of the grid; im: 2-D array of shape (len(x), len(y))
f = interp.RectBivariateSpline(x, y, im, kx=1, ky=1)
new_im = f(new_x, new_y)
In this case you're doing a bi-linear interpolation (kx = ky = 1). The 'nearest' kind of interpolation is not supported, as all this does is a spline interpolation over a rectangular mesh. It's also not the fastest method.
If you're after bi-linear or bi-cubic interpolation, it is generally much faster to do two 1D interpolations:
f = interp.interp1d(y, im, kind='linear')
temp = f(new_y)
f = interp.interp1d(x, temp.T, kind='linear')
new_im = f(new_x).T
You can also use kind='nearest', but in that case get rid of the transverse arrays.
Have you looked at Scikit-image? Its transform.pyramid_* functions might be useful for you.
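For instance, a minimal sketch of pyramid_reduce, which smooths the image and then downsamples it by the given factor (assuming a single-channel image here):
import numpy as np
from skimage.transform import pyramid_reduce

img = np.random.random((128, 128))
smaller = pyramid_reduce(img, downscale=2)  # smoothed and downsampled
print(smaller.shape)  # (64, 64)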
I've recently just found an issue with scipy.ndimage.interpolation.zoom, which I've submitted as a bug report: https://github.com/scipy/scipy/issues/3203
As an alternative (or at least for me), I've found that scikit-image's skimage.transform.resize works correctly: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.resize
However, it works differently from scipy's interpolation.zoom: rather than specifying a multiplier, you specify the output shape that you want. This works for 2D and 3D images.
For just 2D images, you can use transform.rescale and specify a multiplier or scale as you would with interpolation.zoom.
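A short sketch of both calls, assuming a single-channel image (the exact keyword arguments, e.g. for anti-aliasing or interpolation order, vary between scikit-image versions):
import numpy as np
from skimage.transform import resize, rescale

img = np.random.random((100, 200))
out_a = resize(img, (50, 100))   # specify the target shape
out_b = rescale(img, 0.5)        # specify a scale factor, like scipy.ndimage.zoom
print(out_a.shape, out_b.shape)  # (50, 100) (50, 100)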
You can use interpolate.interp2d (note that interp2d is deprecated in recent SciPy versions).
For example, considering an image represented by a numpy array arr, you can resize it to an arbitrary height and width as follows:
import numpy as np
from scipy.interpolate import interp2d

W, H = arr.shape[:2]
new_W, new_H = (600, 300)
xrange = lambda x: np.linspace(0, 1, x)
f = interp2d(xrange(W), xrange(H), arr, kind="linear")
new_arr = f(xrange(new_W), xrange(new_H))
Of course, if your image has multiple channels, you have to perform the interpolation for each one.
This solution scales X and Y of the fed image without affecting RGB channels:
import numpy as np
import scipy.ndimage
import matplotlib.pyplot

matplotlib.pyplot.imshow(scipy.ndimage.zoom(image_np_array, zoom=(7, 7, 1), order=1))
Hope this is useful.
