I have tried to view field lines of an uncomplete regular grid vector field with first pyVista Streamlines and then with plotly without success... I have yet good results with other 2d streamplots :
2d streamplot of the data
Could someone help me with this ? I found no answer... Here is my data : https://wetransfer.com/downloads/7f3c4ae01e5922e753ea708134f956e720230214141330/bf11ab
import pandas as pd
import numpy as np
import pyvista as pv
import plotly.graph_objects as go
df = pd.read_csv("mix_griddata.csv")
X = df['X']
Y = df['Y']
Z = df['Z']
Vx = df['Vx']
Vy = df['Vy']
Vz = df['Vz']
fig = go.Figure(data=go.Streamtube(
x = X,
y = Y,
z = Z,
u = Vx,
v = Vy,
w = Vz,
starts = dict(
x = X.sample(frac=0.01,replace=False),
y = Y.sample(frac=0.01,replace=False),
z = Z.sample(frac=0.01,replace=False)
),
sizeref =1,
colorscale = 'Portland',
showscale = False,
maxdisplayed = 30000000
))
fig.update_layout(
scene = dict(
aspectratio = dict(
x = 1,
y = 1,
z = 1
)
),
margin = dict(
t = 10,
b = 10,
l = 10,
r = 10
)
)
fig.show(renderer="browser")
#Streamlines
mix_FD_grid = np.load("C:/Users/hd377/OneDrive - ensam.eu/0-Thesis/Fibres_Direction_in_allvolume/mix/mix_FD_grid.npy")
origin = (0,0,0)
mesh = pv.UniformGrid(dimensions=mix_FD_grid[:,:,:,0].shape, spacing=(1, 1, 1), origin=origin)
vectors = np.empty((mesh.n_points, 3))
vectors[:, 0] = mix_FD_grid[:,:,:,0].flatten()
vectors[:, 1] = mix_FD_grid[:,:,:,1].flatten()
vectors[:, 2] = mix_FD_grid[:,:,:,2].flatten()
mesh['vectors'] = vectors
stream, src = mesh.streamlines(
'vectors', return_source=True, max_steps = 20000, n_points=200, source_radius=25, source_center=(15, 0, 30)
)
p = pv.Plotter()
p.add_mesh(mesh.outline(), color="k")
p.add_mesh(stream.tube(radius=0.1))
p.camera_position = [(182.0, 177.0, 50), (139, 105, 19), (-0.2, -0.2, 1)]
p.show()
The plotly window does appear in my browser but no tube are visible at all, and the axes values are false.
The pyVista does show something, but in the wrong direction, and clearly not what expected (longitudinal flux circumventing a central cone).
I'll only be tackling PyVista. It's hard to say for sure and I'm only guessing, but your data is probably laid out in the wrong order.
For starters, your data is inconsistent to begin with: your CSV has 1274117 rows whereas your multidimensional array has shape (37, 364, 100, 3), for a total of 1346800 vectors. And your question title says "unstructured", but your PyVista attempt uses a uniform grid with.
Secondly, your CSV doesn't correspond to a regular grid in the first place, e.g. at the end of the file you have 15 rows starting with 368.693,36.971999999999994, then 8 rows starting with 369.71999999999997,36.971999999999994, then a single row starting with 370.74699999999996,36.971999999999994. In a regular grid you'd get the same number of items in each block.
Thirdly, your CSV has an unusual (MATLAB-smelling) layout that the order of axes is z-x-y (rather than either x-y-z or z-y-x). This is a strong clue that your data is mangled due to memory layout issues when flattened. But the previous two point mean that I can't verify how your 4d array was created, I have to take it for granted that it's correct.
Just plotting your raw data makes it obvious that the data is mangled in your original version (with some style cleanup):
import numpy as np
import pyvista as pv
mix_FD_grid = np.load("mix_FD_grid.npy")
origin = (0, 0, 0)
mesh = pv.UniformGrid(dimensions=mix_FD_grid.shape[:-1], spacing=(1, 1, 1), origin=origin)
vectors = np.empty_like(mesh.points)
vectors[:, 0] = mix_FD_grid[..., 0].ravel()
vectors[:, 1] = mix_FD_grid[..., 1].ravel()
vectors[:, 2] = mix_FD_grid[..., 2].ravel()
mesh.point_data['vectors'] = vectors
mesh.plot()
The fragmented pattern you can see is a hallmark of data mangling due to mistaken memory layout.
If we assume the layout is more or less sane, trying column-major layout ("F" for "Fortran", also used by MATLAB) seems to make a lot more sense:
vectors[:, 0] = mix_FD_grid[..., 0].ravel('F')
vectors[:, 1] = mix_FD_grid[..., 1].ravel('F')
vectors[:, 2] = mix_FD_grid[..., 2].ravel('F')
mesh.point_data['vectors'] = vectors
mesh.plot()
So we can try using streamlines using that:
stream, src = mesh.streamlines(
'vectors', return_source=True, max_steps=20000, n_points=200, source_radius=25, source_center=(15, 0, 30)
)
p = pv.Plotter()
p.add_mesh(mesh.outline(), color="k")
p.add_mesh(stream.tube(radius=0.1))
p.show()
It doesn't look great:
So, you said that the streamlines should be longitudinal, but here they are clearly transversal. Can it be that the x and y field components are swapped? I can't tell, so let's try!
import numpy as np
import pyvista as pv
mix_FD_grid = np.load("mix_FD_grid.npy")
origin = (0, 0, 0)
mesh = pv.UniformGrid(dimensions=mix_FD_grid.shape[:-1], spacing=(1, 1, 1), origin=origin)
vectors = np.empty_like(mesh.points)
vectors[:, 0] = mix_FD_grid[..., 1].ravel('F') # swap 0 <-> 1
vectors[:, 1] = mix_FD_grid[..., 0].ravel('F') # swap 0 <-> 1
vectors[:, 2] = mix_FD_grid[..., 2].ravel('F')
mesh.point_data['vectors'] = vectors
stream, src = mesh.streamlines(
'vectors', return_source=True, max_steps=20000, n_points=200, source_radius=25, source_center=(15, 0, 30)
)
p = pv.Plotter()
p.add_mesh(mesh.outline(), color="k")
p.add_mesh(stream.tube(radius=0.1))
p.show()
Now we're talking!
Bonus: y field component on a volumetric plot:
mesh.plot(volume=True, scalars=vectors[:, 1], show_scalar_bar=False)
Related
I have a data frame like the below:
Every row represents a person. They stay at 3 different locations for some time given on the dataframe. The first few people don't stay at location1 but they "born" at location2. The rest of them stay at every locations (3 locations).
I would like to animate every person at the given X, Y coordinates given on the data frame and represent them as dots or any other shape. Here is the flow:
Every person should appear at the first given location (location1) at the given time. Their color should be blue at this state.
Stay at location1 until location2_time and then appear at location2. Their color should be red at this state.
Stay at location2 until location3_time and then appear at location3. Their color should be red at this state.
Stay at location3 for 3 seconds and disappear forever.
There can be several people on the visual at the same time. How can I do that?
There are some good answers on the below links. However, on these solutions, points don't disappear.
How can i make points of a python plot appear over time?
How to animate a scatter plot?
The following is an implementation with python-ffmpeg, pandas, matplotlib, and seaborn. You can find output video on my YouTube channel (link is unlisted).
Each frame with figures is saved directly to memory. New figures are generated only when the state of the population changes (person appears/moves/disappears).
You should definetely separate this code into smaller chunks if you are using this in a Python package:
from numpy.random import RandomState, SeedSequence
from numpy.random import MT19937
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import ffmpeg
RESOLUTION = (12.8, 7.2) # * 100 pixels
NUMBER_OF_FRAMES = 900
class VideoWriter:
# Courtesy of https://github.com/kylemcdonald/python-utils/blob/master/ffmpeg.py
def __init__(
self,
filename,
video_codec="libx265",
fps=15,
in_pix_fmt="rgb24",
out_pix_fmt="yuv420p",
input_args=None,
output_args=None,
):
self.filename = filename
self.process = None
self.input_args = {} if input_args is None else input_args
self.output_args = {} if output_args is None else output_args
self.input_args["r"] = self.input_args["framerate"] = fps
self.input_args["pix_fmt"] = in_pix_fmt
self.output_args["pix_fmt"] = out_pix_fmt
self.output_args["vcodec"] = video_codec
def add(self, frame):
if self.process is None:
height, width = frame.shape[:2]
self.process = (
ffmpeg.input(
"pipe:",
format="rawvideo",
s="{}x{}".format(width, height),
**self.input_args,
)
.filter("crop", "iw-mod(iw,2)", "ih-mod(ih,2)")
.output(self.filename, **self.output_args)
.global_args("-loglevel", "quiet")
.overwrite_output()
.run_async(pipe_stdin=True)
)
conv = frame.astype(np.uint8).tobytes()
self.process.stdin.write(conv)
def close(self):
if self.process is None:
return
self.process.stdin.close()
self.process.wait()
def figure_to_array(figure):
"""adapted from: https://stackoverflow.com/questions/21939658/"""
figure.canvas.draw()
buf = figure.canvas.tostring_rgb()
n_cols, n_rows = figure.canvas.get_width_height()
return np.frombuffer(buf, dtype=np.uint8).reshape(n_rows, n_cols, 3)
# Generate data for the figure
rs1 = RandomState(MT19937(SeedSequence(123456789)))
time_1 = np.round(rs1.rand(232) * NUMBER_OF_FRAMES).astype(np.int16)
time_2 = time_1 + np.round(rs1.rand(232) * (NUMBER_OF_FRAMES - time_1)).astype(np.int16)
time_3 = time_2 + np.round(rs1.rand(232) * (NUMBER_OF_FRAMES - time_2)).astype(np.int16)
loc_1_x, loc_1_y, loc_2_x, loc_2_y, loc_3_x, loc_3_y = np.round(rs1.rand(6, 232) * 100, 1)
df = pd.DataFrame({
"loc_1_time": time_1,
"loc_1_x": loc_1_x,
"loc_1_y": loc_1_y,
"loc_2_time": time_2,
"loc_2_x": loc_2_x,
"loc_2_y": loc_2_y,
"loc_3_time": time_3,
"loc_3_x": loc_3_x,
"loc_3_y": loc_3_y,
})
"""The stack answer starts here"""
# Add extra column for disappear time
df["disappear_time"] = df["loc_3_time"] + 3
all_times = df[["loc_1_time", "loc_2_time", "loc_3_time", "disappear_time"]]
change_times = np.unique(all_times)
# Prepare ticks for plotting the figure across frames
x_values = df[["loc_1_x", "loc_2_x", "loc_3_x"]].values.flatten()
x_ticks = np.array(np.linspace(x_values.min(), x_values.max(), 6), dtype=np.uint8)
y_values = df[["loc_1_y", "loc_2_y", "loc_3_y"]].values.flatten()
y_ticks = np.array(np.round(np.linspace(y_values.min(), y_values.max(), 6)), dtype=np.uint8)
sns.set_theme(style="whitegrid")
video_writer = VideoWriter("endermen.mp4")
if 0 not in change_times:
# Generate empty figure if no person arrive at t=0
fig, ax = plt.subplots(figsize=RESOLUTION)
ax.set_xticklabels(x_ticks)
ax.set_yticklabels(y_ticks)
ax.set_title("People movement. T=0")
video_writer.add(figure_to_array(fig))
loop_range = range(1, NUMBER_OF_FRAMES)
else:
loop_range = range(NUMBER_OF_FRAMES)
palette = sns.color_palette("tab10") # Returns three colors from the palette (we have three groups)
animation_data_df = pd.DataFrame(columns=["x", "y", "location", "index"])
for frame_idx in loop_range:
if frame_idx in change_times:
plt.close("all")
# Get person who appears/moves/disappears
indexes, loc_nums = np.where(all_times == frame_idx)
loc_nums += 1
for i, loc in zip(indexes, loc_nums):
if loc != 4:
x, y = df[[f"loc_{loc}_x", f"loc_{loc}_y"]].iloc[i]
if loc == 1: # location_1
animation_data_df = animation_data_df.append(
{"x": x, "y": y, "location": loc, "index": i},
ignore_index=True
)
else:
data_index = np.where(animation_data_df["index"] == i)[0][0]
if loc in (2, 3): # location_2 or 3
animation_data_df.loc[[data_index], :] = x, y, loc, i
elif loc == 4: # Disappear
animation_data_df.iloc[data_index] = np.nan
current_palette_size = np.sum(~np.isnan(np.unique(animation_data_df["location"])))
fig, ax = plt.subplots(figsize=RESOLUTION)
sns.scatterplot(
x="x", y="y", hue="location", data=animation_data_df, ax=ax, palette=palette[:current_palette_size]
)
ax.set_xticks(x_ticks)
ax.set_xticklabels(x_ticks)
ax.set_yticks(y_ticks)
ax.set_yticklabels(y_ticks)
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
ax.set_title(f"People movement. T={frame_idx}")
video_writer.add(figure_to_array(fig))
video_writer.close()
Edit: There was a bug in which location_3 wasn't removed after 3 seconds. Fixed now.
Modifying the code from this question to only include the positions you want automatically removes the old ones if the old position isn't included in the new ones. This doesn't change if you want to animate by time or iterations or anything else. I have opted to use iterations here since it's easier and I don't know how you are handling your dataset. The code does have one bug though, the last point (or points if they last the same amount of time) remaining won't disappear, this can be solved easily if you don't want to draw anything again, if you do though for exaple in case you there is a gap in the data with no people and then the data resumes I haven't found any workarounds
import math
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
#The t0,t1,t2,t3 are the times (in iterations) that the position changes
#If t0 is None then the person will never be displayed
people = [
# t0 x1 y1 t1 x2 y2 t2 x3 y3 t4
[ 0, 1, 0.1, 1, 2, 0.2, 2, 3, 0.3, 3],
[ 2, None, None, None, 2, 1, 3, 4, 1, 7],
[ 2, float("NaN"), float("NaN"), float("NaN"), 2, 0.8, 4, 4, 0.8, 10],
]
fig = plt.figure()
plt.xlim(0, 5)
plt.ylim(0, 1)
graph = plt.scatter([], [])
def animate(i):
points = []
colors = []
for person in people:
if person[0] is None or math.isnan(person[0]) or i < person[0]:
continue
# Position 1
elif person[3] is not None and not (math.isnan(person[3])) and i <= person[3]:
new_point = [person[1], person[2]]
color = "b"
# Position 2
elif person[6] is not None and not (math.isnan(person[6])) and i <= person[6]:
new_point = [person[4], person[5]]
color = "r"
# Position 3
elif person[9] is not None and not (math.isnan(person[9])) and i <= person[9]:
new_point = [person[7], person[8]]
color = "r"
else:
people.remove(person)
new_point = []
if new_point != []:
points.append(new_point)
colors.append(color)
if points != []:
graph.set_offsets(points)
graph.set_facecolors(colors)
else:
# You can use graph.remove() to fix the last point not disappiring but you won't be able to plot anything after that
# graph.remove()
pass
return graph
ani = FuncAnimation(fig, animate, repeat=False, interval=500)
plt.show()
I am trying to identify small dashed lines in an image. An example would be identifying copy area in an excel type of application.
I have tried this.
I am finding it difficult to chose the filter sizes. So, I tried a different approach using Fourier Transform to check repeatability.
Given I know the dashed line pixel repetition range I go row by row by using a moving window to check for periodicity by finding dominant frequency in that window.
If dominant frequency is in range of dashed lines period I set the mask in the mask image. I repeat the same for columns. However this is still failing. Any suggestions/other techniques ?
Here is the code:
import cv2
import numpy as np
img = cv2.imread('test.png')
imgGray=cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
cv2.imshow('imgGray', imgGray)
rows,cols = imgGray.shape
maskImage = np.full((rows, cols), 0, dtype=np.uint8)
kernelL = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]], dtype=np.float32)
imgLaplacian = cv2.filter2D(imgGray, cv2.CV_32F, kernelL)
imgResult = imgLaplacian
imgResult = np.clip(imgResult, 0, 255)
imgResult = imgResult.astype('uint8')
imgLaplacian = imgResult
cv2.imshow('imgLaplacian', imgLaplacian)
dashLineSearchInterval = 30
fmaxPixel =9 # minimum interval for dash repetation
fminPixel =7 # maximum interval for dash repetation
stride =2
for y in range(0,rows-dashLineSearchInterval,stride):
for x in range(0,cols-dashLineSearchInterval,stride):
kX = (imgLaplacian[y,x:x+ dashLineSearchInterval]).copy()
kX = kX - np.mean(kX)
N= dashLineSearchInterval
freq = np.fft.fftfreq(N)
ft = np.fft.fft(kX) # fourier transform
power = ft.real**2 + ft.imag**2 # power
maxPowerFreq= np.argmax(power) # dominant frequency
domFreq = freq [maxPowerFreq]
if(domFreq<0):
domFreq = -domFreq
#print(domFreq)
if float(1/fmaxPixel) <= domFreq <= float(1/fminPixel) :
maskImage[y,x:x+dashLineSearchInterval]=255
for x in range(0,cols-dashLineSearchInterval,stride):
for y in range(0,rows-dashLineSearchInterval,stride):
kY = (imgLaplacian[y:y+dashLineSearchInterval,x]).copy()
kY = kY - np.mean(kY)
N= dashLineSearchInterval
freq = np.fft.fftfreq(N)
ft = np.fft.fft(kY) # fourier transform
power = ft.real**2 + ft.imag**2 # power
maxPowerFreq= np.argmax(power) # dominant frequency
domFreq = freq [maxPowerFreq]
if(domFreq<0):
domFreq = -domFreq
#print(domFreq)
if float(1/fmaxPixel) <= domFreq <= float(1/fminPixel) :
maskImage[y:y+dashLineSearchInterval,x]=255
cv2.imshow('maskImage', maskImage)
cv2.waitKey()
I'm trying to make a fit using matplotlib.psd function. My datafile has 8 columns with displacement and speed for a particle (positionX, positionY, positionZ, AveragePositionXYZ, speedX, speedY, speedZ, AverageSpeedXYZ). Using the positionX for example, I try to get the Power Spectrum with matplotlib.psd:
power, freqs = plt.psd(data, len(data), Fs = 256, scale_by_freq=True, return_line=0)
Then, I try to make a curve fitting using linear regression with scipy stas.linregress:
slope, inter, r2, p, stderr = stats.linregress(x, y)
However, my results are very bad. I try to plot with:
line = (inter + slope * (10 * np.log10(freqs)))
plt.semilogx(freqs, line)
plt.show()
And get the following image:
I know that I have a lot of mistakes, and I try to get some solutions in the web. However, I have not had much success. So, I'm asking if there's someone here that could help me.
The datafile has the following format (first 10 lines):
1.50000000,0.00000000,0.00000000,0.50000000,0.00000000,0.00000000,0.00000000,0.00000000
1.49788889,0.00000000,0.00000000,0.49929630,-0.06333333,0.00000000,0.00000000,-0.02111111
1.49367078,0.00000005,0.00000000,0.49789028,-0.12654314,0.00000165,0.00000000,-0.04218050
1.48735391,0.00000027,0.00000000,0.49578473,-0.18950635,0.00000659,0.00000000,-0.06316659
1.47895054,0.00000082,0.00000000,0.49298379,-0.25210085,0.00001647,0.00000000,-0.08402813
1.46847701,0.00000192,0.00000000,0.48949298,-0.31420588,0.00003296,0.00000000,-0.10472431
1.45595360,0.00000385,0.00000000,0.48531915,-0.37570257,0.00005769,0.00000000,-0.12521496
1.44140445,0.00000692,0.00000000,0.48047046,-0.43647431,0.00009232,0.00000000,-0.14546066
1.42485754,0.00001154,0.00000000,0.47495636,-0.49640723,0.00013851,0.00000000,-0.16542291
1.40634452,0.00001814,0.00000000,0.46878755,-0.55539066,0.00019789,0.00000000,-0.18506426
My complete Python code is as follows:
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np
filename = 'datafile.txt'
# Load data
file = np.genfromtxt(filename,
skip_header = 0,
skip_footer = 0,
delimiter = ',',
dtype = 'float32',
filling_values = 0,
usecols = (0, 1, 2, 3, 4, 5, 6, 7),
names = ['posX', 'posY', 'posZ', 'posMedias', 'velX', 'velY', 'velZ', 'velMedias'])
# Map values
posX = file['posX']
posY = file['posY']
posZ = file['posZ']
posMedia = file['posMedias']
velX = file['velX']
velY = file['velY']
velZ = file['velZ']
velMedia = file['velMedias']
# Column data that will be used
data = posMedia
# PSD calculation
power, freqs = plt.psd(data, len(data), Fs = 256, scale_by_freq=True, return_line=0)
# Linear fit
x = np.log10(freqs[1:])
y = np.log10(power[1:])
slope, inter, r2, p, stderr = stats.linregress(x, y)
print(slope, inter)
# Plot
line = (inter + slope * (10 * np.log10(freqs)))
plt.semilogx(freqs, line)
plt.show()
Thank you so much!
I'm trying to make a movie by taking png images of an updating plot and stitching them together. There are three variables: degrees, ksB, and mp. Only mp changes each frame; the other two are constant. The data for mp for all times is stored in X. This is the relevant part of the code:
def plot(fname, haveMLPY=False):
# Load data from .npz file.
data = np.load(fname)
X = data["X"]
T = data["T"]
N = X.shape[1]
A = data["vipWeights"]
degrees = A.sum(1)
ksB = data["ksB"]
# Initialize a figure.
figure = plt.figure()
# Generate a plottable axis as the first subplot in 1 rows and 1 columns.
axis = figure.add_subplot(1,1,1)
# MP is the first (0th) variable. Plot one trajectory for each cell over time.
axis.plot(T, X[:,:,0], color="black")
# Decorate the plot.
axis.set_xlabel("time [hours]")
axis.set_ylabel("MP [nM]")
axis.set_title("PER mRNA concentration across all %d cells" % N)
firstInd = int(T.size / 2)
if haveMLPY:
import circadian.analysis
# Generate a and plot Signal object, which encapsulates wavelet analysis.
signal = circadian.analysis.Signal(X[firstInd:, 0, 0], T[firstInd:])
signal.showSpectrum(show=False)
files=[]
# filename for the name of the resulting movie
filename = 'animation'
mp = X[10**4-1,:,0]
from mpl_toolkits.mplot3d import Axes3D
for i in range(10**4):
print i
mp = X[i,:,0]
data2 = np.c_[degrees, ksB, mp]
# Find best fit surface for data2
# regular grid covering the domain of the data
mn = np.min(data2, axis=0)
mx = np.max(data2, axis=0)
X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))
XX = X.flatten()
YY = Y.flatten()
order = 2 # 1: linear, 2: quadratic
if order == 1:
# best-fit linear plane
A = np.c_[data2[:,0], data2[:,1], np.ones(data2.shape[0])]
C,_,_,_ = scipy.linalg.lstsq(A, data2[:,2]) # coefficients
# evaluate it on grid
Z = C[0]*X + C[1]*Y + C[2]
# or expressed using matrix/vector product
#Z = np.dot(np.c_[XX, YY, np.ones(XX.shape)], C).reshape(X.shape)
elif order == 2:
# best-fit quadratic curve
A = np.c_[np.ones(data2.shape[0]), data2[:,:2], np.prod(data2[:,:2], axis=1), data2[:,:2]**2]
C,_,_,_ = scipy.linalg.lstsq(A, data2[:,2])
# evaluate it on a grid
Z = np.dot(np.c_[np.ones(XX.shape), XX, YY, XX*YY, XX**2, YY**2], C).reshape(X.shape)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, alpha=0.2)
ax.scatter(degrees, ksB, mp)
ax.set_xlabel('degrees')
ax.set_ylabel('ksB')
ax.set_zlabel('mp')
# form a filename
fname2 = '_tmp%03d.png'%i
# save the frame
savefig(fname2)
# append the filename to the list
files.append(fname2)
# call mencoder
os.system("mencoder 'mf://_tmp*.png' -mf type=png:fps=10 -ovc lavc -lavcopts vcodec=wmv2 -oac copy -o " + filename + ".mpg")
# cleanup
for fname2 in files: os.remove(fname2)
Basically, all the data is stored in X. The format X[i, i, i] means X[time, neuron, data type]. Each time through the loop, I want to update the time, but still plot mp (the 0th variable) for all the neurons.
When I run this code, I get "IndexError: too many indices for array". I asked it to print i to see when the code was going wrong. I get an error when i = 1, meaning that the code loops through once but then has the error the second time.
However, I have data for 10^4 time steps. You can see in the first line of the provided code, I access X[10**4-1, :, 0] successfully. That's why it's confusing to me why X[1,:,0] would be out of range. If anybody could explain why/help me get around this, that would be great.
The traceback error is
Traceback (most recent call last):
File"/Users/angadanand/Documents/LiClipseWorkspace/Circadian/scripts /runMeNets.py", line 196, in module
plot(fname)
File"/Users/angadanand/Documents/LiClipseWorkspace/Circadian/scripts /runMeNets.py", line 142, in plot
mp = X[i,:,0]
IndexError: too many indices for array
Thanks!
Your problem is that you overwrite your X inside your loop:
X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))
So afterwards it will have another shape and contain different data. I would suggest changing this second X to x_grid and check where you need this "other" X and where the original.
for example:
X_grid, Y_grid = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))
I've got two musical files: one lossless with little sound gap (at this time it's just silence but it could be anything: sinusoid or just some noise) at the beginning and one mp3:
In [1]: plt.plot(y[:100000])
Out[1]:
In [2]: plt.plot(y2[:100000])
Out[2]:
This lists are similar but not identical so I need to cut this gap, to find the first occurrence of one list in another with lowest delta error.
And here's my solution (5.7065 sec.):
error = []
for i in range(25000):
y_n = y[i:100000]
y2_n = y2[:100000-i]
error.append(abs(y_n - y2_n).mean())
start = np.array(error).argmin()
print(start, error[start]) #23057 0.0100046
Is there any pythonic way to solve this?
Edit:
After calculating the mean distance between special points (e.g. where data == 0.5) I reduce the area of search from 25000 to 2000. This gives me reasonable time of 0.3871s:
a = np.where(y[:100000].round(1) == 0.5)[0]
b = np.where(y2[:100000].round(1) == 0.5)[0]
mean = int((a - b[:len(a)]).mean())
delta = 1000
error = []
for i in range(mean - delta, mean + delta):
...
What you are trying to do is a cross-correlation of the two signals.
This can be done easily using signal.correlate from the scipy library:
import scipy.signal
import numpy as np
# limit your signal length to speed things up
lim = 25000
# do the actual correlation
corr = scipy.signal.correlate(y[:lim], y2[:lim], mode='full')
# The offset is the maximum of your correlation array,
# itself being offset by (lim - 1):
offset = np.argmax(corr) - (lim - 1)
You might want to take a look at this answer to a similar problem.
Let's generate some data first
N = 1000
y1 = np.random.randn(N)
y2 = y1 + np.random.randn(N) * 0.05
y2[0:int(N / 10)] = 0
In these data, y1 and y2 are almost the same (note the small added noise), but the first 10% of y2 are empty (similarly to your example)
We can now calculate the absolute difference between the two vectors and find the first element for which the absolute difference is below a sensitivity threshold:
abs_delta = np.abs(y1 - y2)
THRESHOLD = 1e-2
sel = abs_delta < THRESHOLD
ix_start = np.where(sel)[0][0]
fig, axes = plt.subplots(3, 1)
ax = axes[0]
ax.plot(y1, '-')
ax.set_title('y1')
ax.axvline(ix_start, color='red')
ax = axes[1]
ax.plot(y2, '-')
ax.axvline(ix_start, color='red')
ax.set_title('y2')
ax = axes[2]
ax.plot(abs_delta)
ax.axvline(ix_start, color='red')
ax.set_title('abs diff')
This method works if the overlapping parts are indeed "almost identical". You will have to think of smarter alignment ways if the similarity is low.
I think what you are looking for is correlation. Here is a small example.
import numpy as np
equal_part = [0, 1, 2, 3, -2, -4, 5, 0]
y1 = equal_part + [0, 1, 2, 3, -2, -4, 5, 0]
y2 = [1, 2, 4, -3, -2, -1, 3, 2]+y1
np.argmax(np.correlate(y1, y2, 'same'))
Out:
7
So this returns the time-difference, where the correlation between both signals is at its maximum. As you can see, in the example the time difference should be 8, but this depends on your data...
Also note that both signals have the same length.