I can sample HSV space (with fixed s and v) like so:
import numpy as np

hue_gradient = np.linspace(0, 360, 16)  # sample 16 equally spaced hues
hsv = np.ones(shape=(1, len(hue_gradient), 3), dtype=float) * 0.75  # set saturation and value to 0.75
hsv[:, :, 0] = hue_gradient  # fill in the hue channel
hsv
array([[[ 0. , 0.75, 0.75],
[ 24. , 0.75, 0.75],
[ 48. , 0.75, 0.75],
[ 72. , 0.75, 0.75],
[ 96. , 0.75, 0.75],
[120. , 0.75, 0.75],
[144. , 0.75, 0.75],
[168. , 0.75, 0.75],
[192. , 0.75, 0.75],
[216. , 0.75, 0.75],
[240. , 0.75, 0.75],
[264. , 0.75, 0.75],
[288. , 0.75, 0.75],
[312. , 0.75, 0.75],
[336. , 0.75, 0.75],
[360. , 0.75, 0.75]]])
However, the spacing between these colors is not perceptually uniform.
I can confirm this by computing deltaE2000 distances (delta_e_cie2000 from the colormath package) between the colors. Colors 0-15 correspond to the hue angle positions above, and some of the pairwise deltaE values fall below the perceptual threshold.
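Roughly, the check looks like this (a sketch, not the exact script; it compares adjacent pairs only):

from colormath.color_objects import HSVColor, LabColor
from colormath.color_conversions import convert_color
from colormath.color_diff import delta_e_cie2000

# Convert each sampled hue (s = v = 0.75) to CIELAB, then compare adjacent pairs
labs = [convert_color(HSVColor(h, 0.75, 0.75), LabColor) for h in hue_gradient]
deltas = [delta_e_cie2000(labs[i], labs[i + 1]) for i in range(len(labs) - 1)]
print(deltas)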
So, the question is: is it possible to uniformly sample an HSV space with s and v fixed? If not, how can I sample the space so that the colors are arranged as neighbors by hue similarity, with s and v varying as little as they have to?
I tried a few things, but in the end this seemed to work. It spaces the hue values uniformly and then nudges them until they are perceptually uniform.
from colormath import color_objects, color_diff, color_conversions
SAT = 1.0
VAL = 1.0
COLOR_COUNT = 16
NUDGE_SIZE = 0.2
def hue_to_lab(hue):
    return color_conversions.convert_color(
        color_objects.HSVColor(hue, SAT, VAL), color_objects.LabColor
    )

def get_equally_spaced(number, iters=100):
    # Create hues with evenly spaced values in hue space
    hues = [360 * x / number for x in range(number)]
    for _ in range(iters):
        # Convert hues to CIELAB colours
        cols = [hue_to_lab(h) for h in hues]
        # Work out the perceptual differences between pairs of adjacent colours
        deltas = [
            color_diff.delta_e_cie2000(cols[i], cols[i - 1]) for i in range(len(cols))
        ]
        # Nudge each hue towards whichever adjacent colour is furthest away perceptually
        nudges = [
            (deltas[(i + 1) % len(deltas)] - deltas[i]) * NUDGE_SIZE
            for i in range(len(deltas))
        ]
        hues = [(h + d) % 360 for (h, d) in zip(hues, nudges)]
    return hues
print(get_equally_spaced(COLOR_COUNT, iters=1000))
NUDGE_SIZE can mess it up if set wrong (changing it to 2 here results in nothing resembling a rainbow) and I think the best value depends on how many iterations you’re doing and how many colours you’re generating. The delta_e_cie2000 values for adjacent colours (with the settings given) are [16.290288769191324, 16.290288766871242, 16.290288753399196, 16.290288726186013, 16.290288645469946, 16.290288040904777, 16.290288035037598, 16.290288051426675, 16.290288079361915, 16.290288122430887, 16.290288180738187, 16.290288265350803, 16.290288469198916, 16.29028866254433, 16.2902887136652], which are pretty uniform: I think iters=1000 is overkill for this few colours. I’m using normal lists here, but it should translate to NumPy arrays—and probably run a bit faster.
The algorithm works like this:
1. Start with a naïve evenly spaced set of hues.
2. Calculate the perceptual differences between adjacent pairs of colours.
3. Move each hue slightly towards whichever of its neighbours is most different from it perceptually. The size of this movement is proportional to NUDGE_SIZE.
4. Repeat steps 2-3 until the hues have been nudged iters times.
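To eyeball the result, a quick sketch (using the standard-library colorsys module, which is not part of the code above) that converts the returned hues back to RGB:

import colorsys

hues = get_equally_spaced(COLOR_COUNT, iters=1000)
# colorsys expects hue in [0, 1); SAT and VAL are the module-level constants above
rgb = [colorsys.hsv_to_rgb(h / 360.0, SAT, VAL) for h in hues]
print(rgb)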
Does anybody have an idea how to get the elements in a list whose values fall within a specific (from - to) range?
I need a loop to check if a list contains elements in a specific range, and if there are any, I need the biggest one saved in a variable.
Example:
list = [0.5, 0.56, 0.34, 0.45, 0.53, 0.6]
# range (0.5 - 0.58)
# biggest = 0.56
You could use a filtered comprehension to get only those elements in the range you want, then find the biggest of them using the built-in max():
lst = [0.5, 0.56, 0.34, 0.45, 0.53, 0.6]
biggest = max([e for e in lst if 0.5 < e < 0.58])
# biggest = 0.56
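One caveat: if nothing falls inside the range, max() is called on an empty sequence and raises ValueError. On Python 3.4+ you can pass a default to cover that case:

# Returns None instead of raising when no element is in range
biggest = max((e for e in lst if 0.5 < e < 0.58), default=None)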
As an alternative to other answers, you can also use filter and lambda:
lst = [0.5, 0.56, 0.34, 0.45, 0.53, 0.6]
biggest = max(filter(lambda x: 0.5 < x < 0.58, lst))
I suppose a normal if check would be faster, but I'll give this just for completeness.
Also, you should not use list = ..., as list is a built-in in Python.
You could also go about it a step at a time, as the approach may aid in debugging.
I used numpy in this case, which is also a helpful tool to put in your tool belt.
This should run as is:
import numpy as np

l = [0.5, 0.56, 0.34, 0.45, 0.53, 0.6]
a = np.array(l)

low = 0.5
high = 0.58

# Boolean mask of elements below the upper bound
index_low = (a < high)
print(index_low)

# Keep only those elements
a_low = a[index_low]
print(a_low)

# Of those, mask the ones at or above the lower bound
index_in_range = (a_low >= low)
print(index_in_range)

# Elements inside the range
a_in_range = a_low[index_in_range]
print(a_in_range)

# Largest element in the range
a_max = a_in_range.max()
print(a_max)
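The same result can be written in one step by combining the two boolean masks:

# One-step version of the masking above (same bounds as the step-by-step code)
a_max = a[(a >= low) & (a < high)].max()
print(a_max)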
I'm currently working on a project to estimate flow meter uncertainty. The meter uncertainty is based on four different values:
Liquid Flowrate (liq)
Fluid Viscosity (cP)
Water Liquid Ratio (wlr)
Gas Volume Fraction (gvf)
A third party provides tables for the meter at multiple different values of liq, cP, wlr and gvf. As you can guess, the data from the meter never falls perfectly onto one of the predefined values. For example, a minute of data may read:
Liquid Flowrate: 6532
Fluid Viscosity: 22
Water Liquid Ratio: 0.412
Gas Volume Fraction: 0.634
With the data above, a four-way interpolation on the tables is performed to find the uncertainty.
I've come up with a solution, but it seems clunky and I'm wondering if anyone has any ideas. I'm still new to the pandas game and really appreciate seeing other people's solutions.
Initially I sort the data to reduce the table down to the values above and below the actual point that I'm looking for.
import numpy as np
import pandas as pd

aliq = 6532   # stbpd
avisc = 22    # centipoise
awlr = 0.412  # water liquid ratio
agvf = 0.634  # gas volume fraction

def findclose(num, colm):
    """Return the table values just below and just above num (or num itself if present)."""
    arr = np.unique(np.asarray(colm))  # works for a pandas column or a plain array
    if num in arr:
        clslo = num
        clshi = num
    else:
        clslo = arr[arr < num].max()  # closest value below
        clshi = arr[arr > num].min()  # closest value above
    return [clslo, clshi]

df = tbl_vx52[
    (tbl_vx52['liq'].isin(findclose(aliq, tbl_vx52['liq']))) &
    (tbl_vx52['visc'].isin(findclose(avisc, tbl_vx52['visc']))) &
    (tbl_vx52['wlr'].isin(findclose(awlr, tbl_vx52['wlr']))) &
    (tbl_vx52['gvf'].isin(findclose(agvf, tbl_vx52['gvf'])))
].reset_index(drop=True)
The table is reduced from 2240 rows down to 16. Rather than include all of the data (tbl_vx52), I've included code below that builds the resulting sub-dataframe, df, containing just the values above and below the target point for this example.
df = pd.DataFrame({'liq':[5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000, 7000, 7000, 7000, 7000, 7000, 7000, 7000, 7000],
'visc':[10, 10, 10, 10, 30, 30, 30, 30, 10, 10, 10, 10, 30, 30, 30, 30],
'wlr':[0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5],
'gvf':[0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75],
'uncert':[0.0707, 0.0992, 0.0906, 0.1278, 0.0705, 0.0994, 0.091, 0.128, 0.0702, 0.0991, 0.0905, 0.1279, 0.0704, 0.0992, 0.0904, 0.1283],
})
Some pretty crude looping is done to start pairing the values based on individual inputs (either liq, visc, wlr or gvf). Shown below is the first loop on gvf.
pairs = [
slice(0,1),
slice(2,3),
slice(4,5),
slice(6,7),
slice(8,9),
slice(10,11),
slice(12,13),
slice(14,15)]
for pair in pairs:
    df.loc[pair, 'uncert'] = np.interp(
        agvf,
        df.loc[pair, 'gvf'],
        df.loc[pair, 'uncert']
    )
    df.loc[pair, 'gvf'] = agvf
df = df.drop_duplicates().reset_index(drop=True)
The duplicate rows are then dropped, reducing from 16 rows to 8. The same process is repeated for wlr.
pairs = [
slice(0,1),
slice(2,3),
slice(4,5),
slice(6,7)
]
for pair in pairs:
    df.loc[pair, 'uncert'] = np.interp(
        awlr,
        df.loc[pair, 'wlr'],
        df.loc[pair, 'uncert']
    )
    df.loc[pair, 'wlr'] = awlr
df = df.drop_duplicates().reset_index(drop=True)
The structure above is repeated for visc (four rows) and finally liquid (two rows) until only one row is left, which gives the meter uncertainty at the operating point.
I know its pretty clunky. Any input or thoughts on different methods is appreciated.
Alright, I was able to find and apply a matrix-based solution. It is based on the matrix method for trilinear interpolation, which can be extended to quadlinear interpolation. Wikipedia provides a good write-up on trilinear interpolation; the 8x8 matrix in that article can be expanded to a 16x16 matrix for quadlinear interpolation. A single function is written below to make each row inside the matrix.
def quad_row(x, y, z, k):
    """
    Generate a row for the quad interpolation matrix.
    x, y, z, k are scalar input values.
    """
    qrow = [1,
            x, y, z, k,
            x*y, x*z, x*k, y*z, y*k, z*k,
            x*y*z, x*y*k, x*z*k, y*z*k,
            x*y*z*k]
    return qrow
It should be evident that this is just an extension of the rows inside the trilinear matrix. The function can be called sixteen times in a loop, once per corner of the bracketing sub-table, to generate the entire matrix, for example:
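A minimal illustration, using the bracketing values from the example table (quad_interp below builds the same matrix from the masked table rows):

import numpy as np
from itertools import product

# The 16 corners that bracket the target point, one per row of the 16x16 matrix
corner_points = list(product([5000, 7000], [10, 30], [0.375, 0.5], [0.625, 0.75]))
quad_matrix = np.array([quad_row(x, y, z, k) for x, y, z, k in corner_points])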
Side Note: If you want to get fancy you can accomplish the quad_row function using itertools combinations. The advantage is that you can input an array of any size and it returns the properly formatted row for the interpolation matrix. The function is more flexible, but ultimately slower.
from itertools import combinations

def interp_row(values):
    values = np.asarray(values)
    n = len(values)
    intp_row = [1]
    for i in range(1, n + 1):
        intp_row.extend([np.prod(x) for x in combinations(values, i)])
    return intp_row
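A quick sanity check (hypothetical values) that the flexible builder matches quad_row term for term; combinations preserves the x, y, z, k ordering, so the rows line up exactly:

x, y, z, k = 2.0, 3.0, 5.0, 7.0
assert interp_row([x, y, z, k]) == quad_row(x, y, z, k)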
The function that accepts an input table, finds the values close to your interpolated values, builds the interpolation matrix and performs the matrix math is shown below.
def quad_interp(values, table):
    """
    values - four points to interpolate across, pass as list or numpy array
    table - lookup data, four input columns and one output column
    """
    table = np.asarray(table)
    A, B, C, D, E = np.transpose(table)
    a, b, c, d = values
    in_vector = quad_row(a, b, c, d)
    mask = (
        np.isin(A, findclose(a, A)) &
        np.isin(B, findclose(b, B)) &
        np.isin(C, findclose(c, C)) &
        np.isin(D, findclose(d, D)))
    quad_matrix = []
    c_vector = []
    for row in table[mask]:
        x, y, z, v, w = row
        quad_matrix.append(quad_row(x, y, z, v))
        c_vector.append(w)
    quad_matrix = np.matrix(quad_matrix)
    c_vector = np.asarray(c_vector)
    a_vector = np.dot(np.linalg.inv(quad_matrix), c_vector)
    return float(np.dot(a_vector, in_vector))
For example, calling the function would look like this.
df = pd.DataFrame({'liq':[5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000, 7000, 7000, 7000, 7000, 7000, 7000, 7000, 7000],
'visc':[10, 10, 10, 10, 30, 30, 30, 30, 10, 10, 10, 10, 30, 30, 30, 30],
'wlr':[0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5],
'gvf':[0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75],
'uncert':[0.0707, 0.0992, 0.0906, 0.1278, 0.0705, 0.0994, 0.091, 0.128, 0.0702, 0.0991, 0.0905, 0.1279, 0.0704, 0.0992, 0.0904, 0.1283],
})
values = [6532, 22, 0.412, 0.634]
quad_interp(values, df)
As seen, no error handling exists for the above function. It will break down if the following is attempted:
1. Interpolating values outside table boundaries.
2. Inputting lookup values that are already in the table, resulting in less than 16 points being selected.
Also, I acknowledge the following:
1. The naming convention could have been better.
2. A faster way may exist for creating the mask.
The function findclose() is shown in the original question.
Please let me know if you have any feedback or room for improvement.
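For completeness, since the bracketing sub-table forms a full regular grid, the same multilinear interpolation can also be done with scipy's RegularGridInterpolator. A sketch against the example table above (grid axis values hard-coded here for illustration):

import numpy as np
from scipy.interpolate import RegularGridInterpolator

liq_pts = [5000, 7000]
visc_pts = [10, 30]
wlr_pts = [0.375, 0.5]
gvf_pts = [0.625, 0.75]

# df above is ordered with liq varying slowest and gvf fastest, so a C-order
# reshape lines up with the (liq, visc, wlr, gvf) axes.
uncert_grid = df['uncert'].to_numpy().reshape(2, 2, 2, 2)

interp = RegularGridInterpolator((liq_pts, visc_pts, wlr_pts, gvf_pts), uncert_grid)
print(interp([6532, 22, 0.412, 0.634]))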
I have periodic data with the index being a floating point number like so:
from pandas import DataFrame

time = [0, 0.1, 0.21, 0.31, 0.40, 0.49, 0.51, 0.6, 0.71, 0.82, 0.93]
voltage = [1, -1, 1.1, -0.9, 1, -1, 0.9, -1.2, 0.95, -1.1, 1.11]
df = DataFrame(data=voltage, index=time, columns=['voltage'])
df.plot(marker='o')
I want to create a cross(df, y_val, direction='rise' | 'fall' | 'cross') function that returns an array of times (indexes) with all the interpolated points where the voltage values equal y_val. For 'rise', only the values where the slope is positive are returned; for 'fall', only the values with a negative slope are returned; for 'cross', both are returned. So if y_val=0 and direction='cross', an array with 10 values would be returned containing the X values of the crossing points (the first one being 0.05).
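For instance, that first crossing falls straight out of the two-point line formula applied to the first two samples:

# First two samples: (0, 1) and (0.1, -1); where does the line between them hit 0?
x1, y1 = 0.0, 1.0
x2, y2 = 0.1, -1.0
x_cross = x1 + (0 - y1) * (x2 - x1) / (y2 - y1)
print(x_cross)  # 0.05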
I was thinking this could be done with an iterator but was wondering if there was a better way to do this.
Thanks. I'm loving Pandas and the Pandas community.
To do this I ended up with the following. It is a vectorized version which is 150x faster than one that uses a loop.
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame

def cross(series, cross=0, direction='cross'):
    """
    Given a Series, returns all the index values where the data values equal
    the 'cross' value.

    Direction can be 'rising' (for rising edge), 'falling' (for only falling
    edge), or 'cross' for both edges.
    """
    # Find if values are above or below the yvalue crossing:
    above = series.values > cross
    below = np.logical_not(above)
    left_shifted_above = above[1:]
    left_shifted_below = below[1:]
    x_crossings = []
    # Find indexes on left side of crossing point
    if direction == 'rising':
        idxs = (left_shifted_above & below[0:-1]).nonzero()[0]
    elif direction == 'falling':
        idxs = (left_shifted_below & above[0:-1]).nonzero()[0]
    else:
        rising = left_shifted_above & below[0:-1]
        falling = left_shifted_below & above[0:-1]
        idxs = (rising | falling).nonzero()[0]
    # Calculate x crossings with interpolation using the formula for a line:
    x1 = series.index.values[idxs]
    x2 = series.index.values[idxs + 1]
    y1 = series.values[idxs]
    y2 = series.values[idxs + 1]
    x_crossings = (cross - y1) * (x2 - x1) / (y2 - y1) + x1
    return x_crossings
# Test it out:
time = [0, 0.1, 0.21, 0.31, 0.40, 0.49, 0.51, 0.6, 0.71, 0.82, 0.93]
voltage = [1, -1, 1.1, -0.9, 1, -1, 0.9,-1.2, 0.95, -1.1, 1.11]
df = DataFrame(data=voltage, index=time, columns=['voltage'])
x_crossings = cross(df['voltage'])
y_crossings = np.zeros(x_crossings.shape)
plt.plot(time, voltage, '-ob', x_crossings, y_crossings, 'or')
plt.grid(True)
It was quite satisfying when this worked. Any improvements that can be made?
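For example, pulling out only the rising-edge crossings with the same function:

# Only the crossings where the signal goes from below to above zero
rising = cross(df['voltage'], cross=0, direction='rising')
print(rising)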