I'm trying to make a Python app that shows a graph after the input of the data by the user, but the problem is that the y_array and the x_array do not have the same dimensions. When I run the program, this error is raised:
ValueError: x and y must have same first dimension, but have shapes () and ()
How can I draw a graph with the X and Y axis of different length?
Here is a minimal example that reproduces the error:
import matplotlib.pyplot as plt
y = [0, 8, 9, 3, 0]
x = [1, 2, 3, 4, 5, 6, 7]
plt.plot(x, y)
plt.show()
This is virtually a copy/paste of the answer found here, but I'll show what I did to get these to match.
First, we need to decide which array to resample: the x_array of length 7, or the y_array of length 5. I'll show both, starting with the former. Note that I am using numpy arrays, not lists.
Let's load the modules
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate as interp
and the arrays
y = np.array([0, 8, 9, 3, 0])
x = np.array([1, 2, 3, 4, 5, 6, 7])
In both cases, we use interp.interp1d which is described in detail in the documentation.
For the x_array to be reduced to the length of the y_array:
x_inter = interp.interp1d(np.arange(x.size), x)
x_ = x_inter(np.linspace(0,x.size-1,y.size))
print(len(x_), len(y))
# Prints: 5 5
plt.plot(x_,y)
plt.show()
This gives a plot of the five points (x_, y).
and for the y_array to be increased to the length of the x_array:
y_inter = interp.interp1d(np.arange(y.size), y)
y_ = y_inter(np.linspace(0,y.size-1,x.size))
print(len(x), len(y_))
# Prints: 7 7
plt.plot(x,y_)
plt.show()
This gives a plot of the seven points (x, y_).
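For plain linear interpolation, the same stretching can also be sketched with NumPy alone. This is a minimal sketch, assuming linear interpolation is acceptable; np.interp is swapped in for interp.interp1d, and the names old_positions, new_positions and y_stretched are made up for illustration.

```python
import numpy as np

y = np.array([0, 8, 9, 3, 0])
x = np.array([1, 2, 3, 4, 5, 6, 7])

# Index positions of the original y samples (0..4) and the
# 7 evenly spaced positions at which y should be evaluated
old_positions = np.arange(y.size)
new_positions = np.linspace(0, y.size - 1, x.size)

# Linear interpolation of y onto the 7 new positions
y_stretched = np.interp(new_positions, old_positions, y)

print(len(x), len(y_stretched))
```

The result has the same length as x, so plt.plot(x, y_stretched) no longer raises the ValueError. Keep in mind that this invents intermediate y values; whether that is meaningful depends on what the data represents.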
I have two measurements consisting of x and y value pairs. I want to calculate the difference between these two series. The problem is that I cannot simply calculate the difference between these two measurements because they are sampled differently in the x values.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x1 = np.array([1, 2, 3, 4, 5])
y1 = np.array([1, 4, 9, 16, 25])
x2 = np.array([1.5, 2.5, 3.3, 4.2, 5.1])
y2 = np.array([1.3, 2.5, 3.3, 4.2, 5.1])
df = np.array([x1, y1, x2, y2])
df = pd.DataFrame(df.T, columns=['x1', 'y1', 'x2', 'y2'])
df.head()
plt.plot(df.x1.values, df.y1.values, df.x2.values, df.y2.values)
I would like to assign a new variable x = np.linspace(0, 5, 100, endpoint=True) and then determine new y1_new and y2_new by interpolating the y1 and y2 values on the values of x.
I have looked at pandas.resample() but that seems to be working with timestamps. Maybe 'scipy.interpolate' could help but I am not sure about the capabilities. In principle, I know how to program this by hand in python, but I am sure that there is already a solution to my problem.
An example of using the scipy.interpolate would be:
import scipy.interpolate as interp
import numpy as np
x1 = np.array([1, 2, 3, 4, 5])
y1 = np.array([1, 4, 9, 16, 25])
new_x1 = np.linspace(0, 5, 100, endpoint=True)
interpolated_1 = interp.interp1d(x1, y1, fill_value="extrapolate")
new_y1 = interpolated_1(new_x1)
new_y1
The other interpolation methods in scipy.interpolate follow more or less the same signature, as you can see in the docs. Which one to use depends on the underlying data: in your example, the first series looks quadratic (y1 = x1**2) and the second looks like the identity (y2 is roughly equal to x2).
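To get the actual difference between the two series, both have to be evaluated on the same x values. The following is a minimal sketch, assuming linear interpolation is good enough and using np.interp instead of interp1d. It restricts the common grid to the range covered by both series so that nothing has to be extrapolated; the names lo, hi, x_common and difference are made up for illustration.

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5])
y1 = np.array([1, 4, 9, 16, 25])
x2 = np.array([1.5, 2.5, 3.3, 4.2, 5.1])
y2 = np.array([1.3, 2.5, 3.3, 4.2, 5.1])

# Common grid limited to the x range that both series cover
# (assumption: comparing only where both were measured is acceptable)
lo = max(x1.min(), x2.min())
hi = min(x1.max(), x2.max())
x_common = np.linspace(lo, hi, 100)

# Linear interpolation of each series onto the common grid
y1_new = np.interp(x_common, x1, y1)
y2_new = np.interp(x_common, x2, y2)

# Point-by-point difference between the two resampled series
difference = y1_new - y2_new
```

np.interp requires the sample x values to be increasing, which holds for both series here. Outside the sampled range it repeats the end values rather than extrapolating, which is why the grid is clipped to the overlap.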
I'm trying to repeat a calculation in which each step uses the result of the previous step, and I would like to do it with the map function. My code works, but it looks ugly. If you know how to write it more elegantly, please show me. Any help would be much appreciated.
The iterative process is described in the figure below.
I have included my ugly code and also my attempt with the map function. Thanks in advance for your help.
The ugly one
import numpy as np
ys=np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
xs=ys
from scipy.interpolate import interp1d
g = interp1d(xs, ys, fill_value='extrapolate')
x0=ys[0]
s1=-4
def func(x1):
    return -g(x1)/(x0-x1)-s1
from scipy.optimize import fsolve
initial_guess = 5
x1=fsolve(func, initial_guess)[0]
print(x1)
s2=-2
def func(x2):
    return -g(x2)/(x1-x2)-s2
from scipy.optimize import fsolve
initial_guess = 5
x2=fsolve(func, initial_guess)[0]
print(x2)
s3=-0.67
def func(x3):
    return -g(x3)/(x2-x3)-s3
from scipy.optimize import fsolve
initial_guess = 5
x3=fsolve(func, initial_guess)[0]
print(x3)
My trial with map function
import numpy as np
ys=np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
xs=ys
from scipy.interpolate import interp1d
g = interp1d(xs, ys, fill_value='extrapolate')
x0=ys[0]
s=[-4,-2,-0.67]
def func(x):
    return -g(x)/(x0-x)-s
xall=list(map(func, s))
from scipy.optimize import fsolve
initial_guess = 5*np.ones(s.size)
xi=fsolve(xall, initial_guess)[0]
print(xi)
Maybe you want to use a lambda function as input to fsolve. Something like this:
import numpy as np
from scipy.optimize import fsolve
from scipy.interpolate import interp1d
ys = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
xs = ys
g = interp1d(xs, ys, fill_value='extrapolate')
x0 = ys[0]
s = [-4, -2, -0.67]
initial_guess = 5
for si in s:
    # each solve uses the previous root (stored in x0) as the reference point
    x0 = fsolve(lambda x1: -g(x1)/(x0 - x1) - si, initial_guess)[0]
    print(x0)
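map applies a function to each element independently, so it cannot carry the previous root forward. If a functional style is still wanted, itertools.accumulate is the closer fit, because it passes the running result into each step. The following is only a sketch under assumptions: next_root is a hypothetical helper, and brentq is swapped in for fsolve with an assumed bracket between 0 and the previous root, which holds for this example data but not necessarily for other data.

```python
from itertools import accumulate

import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import brentq

ys = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
xs = ys
g = interp1d(xs, ys, fill_value='extrapolate')
s = [-4, -2, -0.67]

def next_root(x_prev, si):
    """Solve -g(x)/(x_prev - x) - si = 0 for x, given the previous root."""
    def f(x):
        return float(-g(x) / (x_prev - x) - si)
    # Assumed bracket: the root lies between 0 and just below x_prev
    return brentq(f, 0.0, x_prev - 1e-9)

# accumulate feeds each result into the next call; the first entry is x0 itself
roots = list(accumulate(s, next_root, initial=float(ys[0])))
print(roots[1:])
```

The initial keyword of accumulate requires Python 3.8 or later. A plain for loop, as in the answer above, does the same job and is arguably easier to read.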
In an experiment, a load cell advances in equal increments of distance with time, compressing a sample; it stops when a specified distance from the start point is reached, then retracts in equal increments of distance with time back to the starting position.
A plot of pressure (load cell reading) on the y axis against distance on the x axis produces a familiar hysteresis loop. A plot of pressure (load cell reading) on the y axis against time on the x axis produces an asymmetric peak with the maximum pressure in the centre, corresponding to the maximum advancement point of the sensor.
Instead of the above, I'd like to plot pressure on the y axis against distance on the x axis, with the additional constraint that the x axis is labelled starting at 0, with maximum pressure at the middle of the x axis, and 0 again at the right hand end of the x-axis. In other words, the curve will be identical in shape to the plot of pressure v time, but will be of pressure v distance, where the left half of the plot indicates the distance of the probe from its starting position during advancement; and the right half of the plot indicates distance of the probe from its starting position during retraction.
My actual datasets contain thousands of rows of data but by way of illustration, a minimal dummy dataset would look something like the following, where the 3 columns correspond to Time, Distance of probe from origin, and Pressure measured by probe respectively:
[
[0,0,0],
[1,2,10],
[2,4,30],
[3,6,60],
[4,4,35],
[5,2,15],
[6,0,0]
]
I can't work out how to get Matplotlib to construct the x-axis so that the range goes from 0 to a maximum, then back to 0 again. I'd be grateful for advice on how to achieve this plot in the simplest and most elegant way. Many thanks.
Since you have the time values, you can use them for the x-axis positions and just change the x tick labels:
import numpy as np
import matplotlib.pyplot as plt
# Time, Distance, Pressure
data = [[0, 0, 0],
[1, 2, 10],
[2, 4, 30],
[3, 6, 60],
[4, 4, 35],
[5, 2, 15],
[6, 0, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
fig = plt.figure()
ax = fig.add_subplot(111)
max_ticks = 10
skip = (data.shape[0] // max_ticks) + 1  # integer division: slice steps must be integers
ax.plot(data[:, 0], data[:, 2]) # Pressure(time)
ax.set_xticks(data[::skip, 0])
ax.set_xticklabels(data[::skip, 1]) # Pressure(Distance(time)) ?
ax.set_ylabel('Pressure [Pa?]')
ax.set_xlabel('Distance [m?]')
fig.show()
The skip is just so you don't end up with too many ticks on the plot, change as you like.
As noted in the comments, the above only holds when the distance changes uniformly with time. For non-uniform changes, you'll have to use something like:
data = [[0, 0, 0],
[1, 2, 10],
[2, 4, 30],
[3, 6, 60],
[3.5, 5.4, 40],
[4, 4, 35],
[5, 2, 15],
[6, 0, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
def find_max_pos(data, column=0):
    return np.argmax(data[:, column])
def reverse_unload(data, unload_start):
    # prepare new_data with new column:
    new_shape = np.array(data.shape)
    new_shape[1] += 1
    new_data = np.empty(new_shape)
    # copy all correct data
    new_data[:, 0] = data[:, 0]
    new_data[:, 1] = data[:, 1]
    new_data[:, 2] = data[:, 2]
    new_data[:unload_start+1, 3] = data[:unload_start+1, 1]
    # use the step-to-step change in distance to fill the rest
    # (np.gradient returns central differences, which are not the actual steps)
    steps = -np.diff(data[:, 1])
    for i in range(unload_start + 1, data.shape[0]):
        new_data[i, 3] = new_data[i-1, 3] + steps[i-1]
    return new_data
data = reverse_unload(data, find_max_pos(data, 1))
fig = plt.figure()
ax = fig.add_subplot(111)
max_ticks = 10
skip = (data.shape[0] // max_ticks) + 1  # integer division: slice steps must be integers
ax.plot(data[:, 3], data[:, 2]) # Pressure("Distance")
ax.set_xticks(data[::skip, 3])
ax.set_xticklabels(data[::skip, 1])
ax.grid() # added for clarity
ax.set_ylabel('Pressure [Pa?]')
ax.set_xlabel('Distance [m?]')
fig.show()
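The idea behind the extra column is that the x coordinate keeps growing while the probe retracts. A loop-free way to sketch that idea is to accumulate the absolute step-to-step change in distance. This is a minimal sketch, assuming the probe only advances and then only retracts; the name travelled is made up for illustration.

```python
import numpy as np

# Time, Distance, Pressure (same dummy data as above)
data = np.array([[0, 0, 0],
                 [1, 2, 10],
                 [2, 4, 30],
                 [3, 6, 60],
                 [3.5, 5.4, 40],
                 [4, 4, 35],
                 [5, 2, 15],
                 [6, 0, 0]])

distance = data[:, 1]

# Total path length covered so far: it increases during the advance
# and keeps increasing during the retraction
steps = np.abs(np.diff(distance))
travelled = distance[0] + np.concatenate(([0.0], np.cumsum(steps)))

print(travelled)
```

Plotting data[:, 2] against travelled and labelling the ticks with the matching values of distance gives an axis that runs from 0 up to the maximum and back to 0.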
Using the measured values as tick positions gives tick labels that are not nice round numbers, so I found it easier to let matplotlib choose the ticks automatically and map them to the correct values:
import numpy as np
import matplotlib.pyplot as plt
data = [[0, 0, 0],
[1, 2, 10],
[2, 4, 30],
[3, 6, 60],
[3.5, 5.4, 40],
[4, 4, 35],
[5, 2, 15],
[6, 0, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
def find_max_pos(data, column=0):
    return np.argmax(data[:, column])
def reverse_unload(data):
    unload_start = find_max_pos(data, 1)
    # prepare new_data with new column:
    new_shape = np.array(data.shape)
    new_shape[1] += 1
    new_data = np.empty(new_shape)
    # copy all correct data
    new_data[:, 0] = data[:, 0]
    new_data[:, 1] = data[:, 1]
    new_data[:, 2] = data[:, 2]
    new_data[:unload_start+1, 3] = data[:unload_start+1, 1]
    # use gradient to fill the rest
    gradient = data[unload_start:-1, 1]-data[unload_start+1:, 1]
    for i, j in enumerate(range(unload_start + 1, data.shape[0])):
        new_data[j, 3] = new_data[j-1, 3] + gradient[i]
    return new_data
def create_map_function(data):
    """
    Return function that maps values of distance
    folded over the maximum pressure applied.
    """
    max_index = find_max_pos(data, 1)
    x0, y0 = data[max_index, 1], data[max_index, 1]
    x1, y1 = 2*data[max_index, 1], 0
    m = (y1 - y0) / (x1 - x0)
    b = y0 - m*x0
    def map_function(x):
        if x < x0:
            return x
        else:
            return m*x+b
    return map_function
def process_data(data):
    data = reverse_unload(data)
    map_function = create_map_function(data)
    fig, ax = plt.subplots()
    ax.plot(data[:, 3], data[:, 2])
    ax.set_xticklabels([map_function(x) for x in ax.get_xticks()])
    ax.grid()
    ax.set_ylabel('Pressure [Pa?]')
    ax.set_xlabel('Distance [m?]')
    fig.show()
if __name__ == '__main__':
    process_data(data)
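The piecewise linear mapping above (identity up to the maximum distance, then a line of slope -1 back down to 0) can also be written as a single expression. This is a minimal sketch; fold_distance is a hypothetical name, and x_max stands for the maximum distance reached by the probe.

```python
import numpy as np

def fold_distance(x, x_max):
    """Map an unfolded x coordinate back to the distance from the origin.

    Values up to x_max are returned unchanged; values beyond x_max are
    reflected, so that 2 * x_max maps back to 0.
    """
    x = np.asarray(x, dtype=float)
    return x_max - np.abs(x - x_max)

print(fold_distance([0, 3, 6, 8, 12], 6))
```

One way to use it is as a tick formatter, for example ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f"{fold_distance(x, x_max):g}")) with FuncFormatter imported from matplotlib.ticker. That way the labels follow the ticks if the axis limits change, instead of being fixed once with set_xticklabels.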
Update: I have found a workaround for rounding the tick labels to the nearest integer: the np.around function, which rounds to a specified number of decimal places (default = 0) and rounds values exactly halfway between two candidates to the nearest even value: e.g. 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc. More info here: https://docs.scipy.org/doc/numpy1.10.4/reference/generated/numpy.around.html
So berna1111's code becomes:
import numpy as np
import matplotlib.pyplot as plt
# Time, Distance, Pressure
data = [[0, 0, 0],
[1, 1.9, 10], # Dummy data including decimals to demonstrate rounding
[2, 4.1, 30],
[3, 6.1, 60],
[4, 3.9, 35],
[5, 1.9, 15],
[6, -0.2, 0]]
# convert to array to allow indexing like [i, j]
data = np.array(data)
fig = plt.figure()
ax = fig.add_subplot(111)
max_ticks = 10
skip = (data.shape[0] // max_ticks) + 1  # integer division: slice steps must be integers
ax.plot(data[:, 0], data[:, 2]) # Pressure(time)
ax.set_xticks(data[::skip, 0])
ax.set_xticklabels(np.absolute(np.around((data[::skip, 1])))) # Pressure(Distance(time)); rounded to nearest integer
ax.set_ylabel('Pressure [Pa?]')
ax.set_xlabel('Distance [m?]')
fig.show()
According to the numpy documentation, np.around should round the final value of -0.2 for Distance to '0.0'; however it seems to round to '-0.0' instead. Not sure why this occurs, but since all my xticklabels in this particular case need to be positive integers or zero, I can correct this behaviour by using the np.absolute function as shown above. Everything now seems to work OK for my requirements, but if I'm missing something, or there's a better solution, please let me know.
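On the '-0.0' label: floating-point numbers have a signed zero, and rounding a small negative number such as -0.2 keeps the sign, so the result is negative zero. It compares equal to 0.0 but prints as -0.0. Adding 0.0 turns negative zero into positive zero without touching any other value, whereas np.absolute would also flip genuinely negative distances. A minimal sketch, with labels as a made-up name:

```python
import numpy as np

# Distance column from the dummy data above
distance = np.array([0, 1.9, 4.1, 6.1, 3.9, 1.9, -0.2])

rounded = np.around(distance)  # the last element is -0.0 (negative zero)
labels = rounded + 0.0         # -0.0 + 0.0 is 0.0; all other values are unchanged

print(rounded[-1], labels[-1])
```

If negative distances can never occur in the real data, np.absolute as used above gives the same labels.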