Normalization VS. numpy way to normalize? - python

I'm supposed to normalize an array. I've read about normalization and came across this formula:
x_normalized = (x - x_min) / (x_max - x_min)
I wrote the following function for it:
def normalize_list(list):
    max_value = max(list)
    min_value = min(list)
    for i in range(0, len(list)):
        list[i] = (list[i] - min_value) / (max_value - min_value)
That is supposed to normalize an array of elements.
Then I have come across this: https://stackoverflow.com/a/21031303/6209399
Which says you can normalize an array by simply doing this:
import numpy as np

def normalize_list_numpy(list):
    normalized_list = list / np.linalg.norm(list)
    return normalized_list
If I normalize this test array test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9] with my own function and with the numpy method, I get these answers:
My own function: [0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
The numpy way: [0.059234887775909233, 0.11846977555181847, 0.17770466332772769, 0.23693955110363693, 0.29617443887954614, 0.35540932665545538, 0.41464421443136462, 0.47387910220727386, 0.5331139899831830...]
Why do the functions give different answers? Are there other ways to normalize an array of data? What does numpy.linalg.norm(list) do? What am I getting wrong?

There are different types of normalization. You are using min-max normalization. The min-max normalization from scikit learn is as follows.
import numpy as np
from sklearn.preprocessing import minmax_scale
# your function
def normalize_list(list_normal):
    max_value = max(list_normal)
    min_value = min(list_normal)
    for i in range(len(list_normal)):
        list_normal[i] = (list_normal[i] - min_value) / (max_value - min_value)
    return list_normal

# scikit-learn version
def normalize_list_numpy(list_numpy):
    normalized_list = minmax_scale(list_numpy)
    return normalized_list
test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9]
test_array_numpy = np.array(test_array)
print(normalize_list(test_array))
print(normalize_list_numpy(test_array_numpy))
Output:
[0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
[0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
MinMaxScaler (and the minmax_scale function used above) uses exactly your formula for normalization/scaling:
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.minmax_scale.html
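For reference, the class-based MinMaxScaler interface gives the same result; a minimal sketch (the reshape is needed because the scaler expects a 2D samples-by-features array):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

test_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)

# MinMaxScaler works column-wise on 2D input, so reshape to a single column
# and flatten the result back to 1D.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(test_array.reshape(-1, 1)).ravel()
print(scaled)  # [0.    0.125 0.25  0.375 0.5   0.625 0.75  0.875 1.   ]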
@OuuGiii: Note that it is not a good idea to use Python built-in function names as variable names. list() is a Python built-in, so its use as a variable name should be avoided.

The question/answer that you reference doesn't explicitly relate your own formula to the np.linalg.norm(list) version that you use here.
One NumPy solution would be this:
import numpy as np
def normalize(x):
    x = np.asarray(x)
    return (x - x.min()) / np.ptp(x)
print(normalize(test_array))
# [ 0. 0.125 0.25 0.375 0.5 0.625 0.75 0.875 1. ]
Here np.ptp is peak-to-peak, i.e. the range of values (maximum - minimum) along an axis.
This approach scales the values to the interval [0, 1], as pointed out by @phg.
The more traditional definition of normalization would be to scale to a 0 mean and unit variance:
x = np.asarray(test_array)
res = (x - x.mean()) / x.std()
print(res.mean(), res.std())
# 0.0 1.0
Or use sklearn.preprocessing.scale as a pre-canned function for this (sklearn.preprocessing.normalize, by contrast, scales to unit norm rather than to zero mean and unit variance).
Using test_array / np.linalg.norm(test_array) creates a result of unit length; you'll see that np.linalg.norm(test_array / np.linalg.norm(test_array)) equals 1. So you're really dealing with two different concepts here, one from statistics and one from linear algebra.
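A quick check makes the distinction concrete, using the test_array from the question (a small verification sketch):

import numpy as np

test_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)

unit = test_array / np.linalg.norm(test_array)                  # linear-algebra sense: unit Euclidean length
minmax = (test_array - test_array.min()) / np.ptp(test_array)   # statistics sense: rescale to [0, 1]

print(np.linalg.norm(unit))        # 1.0
print(minmax.min(), minmax.max())  # 0.0 1.0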

The power of NumPy is broadcasting, which lets you apply vectorized array operations without explicit looping. You do not need to write a function with an explicit for loop, which is slow, especially if your dataset is big.
The pythonic way of doing min-max normalization is
import numpy as np

test_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
normalized_test_array = (test_array - min(test_array)) / (max(test_array) - min(test_array))
output >> [ 0., 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1. ]

Related

Sum of sines and cosines from DFT

I have a signal and want to reconstruct it from its spectrum as a sum of sines and/or cosines. I am aware of the inverse FFT but I want to reconstruct the signal in this way.
An example would look like this:
import numpy as np

sig = np.array([1, 5, -3, 0.7, 3.1, -5, -0.5, 3.2, -2.3, -1.1, 3, 0.3, -2.05, 2.1, 3.05, -2.3])
fft = np.fft.rfft(sig)
mag = np.abs(fft) * 2 / sig.size
phase = np.angle(fft)
x = np.arange(sig.size)
reconstructed = list()
for x_i in x:
    val = 0
    for i, (m, p) in enumerate(zip(mag, phase)):
        val += ...  # what's the correct form?
    reconstructed.append(val)
What's the correct code to write in the next-to-last line?
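One way the missing term could be written, following how mag and phase were built above: each rfft bin k contributes a cosine at frequency k/N with amplitude mag[k] and phase phase[k], except that the DC bin (and, for an even-length signal, the Nyquist bin) occurs only once in the one-sided spectrum, so the factor of 2 baked into mag has to be undone for those bins. A minimal sketch under those assumptions:

import numpy as np

sig = np.array([1, 5, -3, 0.7, 3.1, -5, -0.5, 3.2, -2.3, -1.1, 3, 0.3, -2.05, 2.1, 3.05, -2.3])
N = sig.size
fft = np.fft.rfft(sig)
mag = np.abs(fft) * 2 / N
phase = np.angle(fft)

reconstructed = []
for n in range(N):
    val = 0.0
    for k, (m, p) in enumerate(zip(mag, phase)):
        # DC (k == 0) and, for even N, the Nyquist bin (k == N // 2) are not
        # doubled in a one-sided spectrum, so halve their scaled magnitudes.
        if k == 0 or (N % 2 == 0 and k == N // 2):
            m = m / 2
        val += m * np.cos(2 * np.pi * k * n / N + p)
    reconstructed.append(val)

print(np.allclose(reconstructed, sig))  # True if the reconstruction is exact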

Recursively dividing up a list, based on the endpoints

Here is what I am trying to do.
Take the list:
list1 = [0,2]
This list has start point 0 and end point 2.
Now, if we were to take the midpoint of this list, the list would become:
list1 = [0,1,2]
Now, if we were to recursively split up the list again (take the midpoints of the midpoints), the list would becomes:
list1 = [0,.5,1,1.5,2]
I need a function that will generate lists like this, preferably by keeping track of a variable. So, for instance, let's say there is a variable, n, that keeps track of something. When n = 1, the list might be [0,1,2] and when n = 2, the list might be [0,.5,1,1.5,2], and I am going to increment the value of n to keep track of how many times I have divided up the list.
I know you need to use recursion for this, but I'm not sure how to implement it.
Should be something like this:
def recursive(list1, a, b, n):
    """list1 is a list of values, a and b are the start
    and end points of the list, and n is an int representing
    how many times the list needs to be divided"""
    mid = len(list1) // 2
    # stuff
Could someone help me write this function? It's not for homework; it's part of a project I'm working on that involves using mesh analysis to divide up a rectangle into parts.
This is what I have so far:
def recursive(a, b, list1, n):
    w = b - a
    mid = a + w / 2
    left = list1[0:mid]
    right = list1[mid:len(list1)-1]
    return recursive(a, mid, list1, n) + mid + recursive(mid, b, list1, n)
but I'm not sure how to incorporate n into here.
NOTE: The list1 would initially be [a,b] - I would just manually enter that but I'm sure there's a better way to do it.
You've generated some interesting answers. Here are two more.
My first uses an iterator to avoid slicing the list and is recursive because that seems like the most natural formulation.
def list_split(orig, n):
    if not n:
        return orig
    else:
        li = iter(orig)
        this = next(li)
        result = [this]
        for nxt in li:
            result.extend([(this + nxt) / 2, nxt])
            this = nxt
        return list_split(result, n - 1)

for i in range(6):
    print(i, list_split([0, 2], i))
This prints
0 [0, 2]
1 [0, 1.0, 2]
2 [0, 0.5, 1.0, 1.5, 2]
3 [0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2]
4 [0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0, 1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875, 2]
5 [0, 0.0625, 0.125, 0.1875, 0.25, 0.3125, 0.375, 0.4375, 0.5, 0.5625, 0.625, 0.6875, 0.75, 0.8125, 0.875, 0.9375, 1.0, 1.0625, 1.125, 1.1875, 1.25, 1.3125, 1.375, 1.4375, 1.5, 1.5625, 1.625, 1.6875, 1.75, 1.8125, 1.875, 1.9375, 2]
My second is based on the observation that recursion isn't necessary if you always start from two elements. Suppose those elements are mn and mx. After N applications of the split operation the list will have 2^N + 1 elements, so the numerical distance between adjacent elements will be (mx - mn) / 2**N.
Given this information it should therefore be possible to compute the elements of the list deterministically, or, even easier, to use numpy.linspace like this:
import numpy

def grid(emin, emax, N):
    return numpy.linspace(emin, emax, 2**N + 1)
This appears to give the same answers, and will probably serve you best in the long run.
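A quick usage check (the definition is repeated so the snippet runs on its own); the output matches list_split for the same number of splits:

import numpy

def grid(emin, emax, N):
    return numpy.linspace(emin, emax, 2**N + 1)

print(grid(0, 2, 1))  # [0. 1. 2.]
print(grid(0, 2, 2))  # [0.  0.5 1.  1.5 2. ]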
You can use some arithmetic and slicing to figure out the size of the result, and fill it efficiently with values.
While not required, you can implement a recursive call by wrapping this functionality in a simple helper function, which checks what iteration of splitting you are on, and splits the list further if you are not at your limit.
def expand(a):
    """
    expands a list based on average values between every two values
    """
    o = [0] * ((len(a) * 2) - 1)
    o[::2] = a
    o[1::2] = [(x + y) / 2 for x, y in zip(a, a[1:])]
    return o

def rec_expand(a, n):
    if n == 0:
        return a
    else:
        return rec_expand(expand(a), n - 1)
In action
>>> rec_expand([0, 2], 2)
[0, 0.5, 1.0, 1.5, 2]
>>> rec_expand([0, 2], 4)
[0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0, 1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875, 2]
You could do this with a for loop
import numpy as np
def add_midpoints(orig_list, n):
    for i in range(n):
        new_list = []
        for j in range(len(orig_list) - 1):
            new_list.append(np.mean(orig_list[j:(j + 2)]))
        orig_list = orig_list + new_list
        orig_list.sort()
    return orig_list
add_midpoints([0,2],1)
[0, 1.0, 2]
add_midpoints([0,2],2)
[0, 0.5, 1.0, 1.5, 2]
add_midpoints([0,2],3)
[0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2]
You can also do this totally non-recursively and without looping. What we're doing here is just making a binary scale between two numbers like on most Imperial system rulers.
def binary_scale(start, stop, level):
    length = stop - start
    scale = 2 ** level
    return [start + i * length / scale for i in range(scale + 1)]
In use:
>>> binary_scale(0, 10, 0)
[0.0, 10.0]
>>> binary_scale(0, 10, 2)
[0.0, 2.5, 5.0, 7.5, 10.0]
>>> binary_scale(10, 0, 1)
[10.0, 5.0, 0.0]
Fun with anti-patterns:
def expand(a, n):
    for _ in range(n):
        a[:-1] = sum(([a[i], (a[i] + a[i + 1]) / 2] for i in range(len(a) - 1)), [])
    return a
print(expand([0, 2], 2))
Output:
[0, 0.5, 1.0, 1.5, 2]

Sample from a 2d probability numpy array?

Say that I have a 2D array ar like this:
0.9, 0.1, 0.3
0.4, 0.5, 0.1
0.5, 0.8, 0.5
And I want to sample from [1, 0] according to this probability array.
rdchoice = lambda x: numpy.random.choice([1, 0], p=[x, 1-x])
I have tried two methods:
1) reshape it into a 1d array first and use numpy.random.choice and then reshape it back to 2d:
np.array(list(map(rdchoice, ar.reshape((-1,))))).reshape(ar.shape)
2) use the vectorize function.
func = numpy.vectorize(rdchoice)
func(ar)
But both of these ways are too slow, and I learned that np.vectorize is essentially a for loop under the hood; in my experiments I found that map is no faster than vectorize.
I thought this could be done faster. If the 2D array is large, it is unbearably slow.
You should be able to do this like so:
>>> p = np.array([[0.9, 0.1, 0.3], [0.4, 0.5, 0.1], [0.5, 0.8, 0.5]])
>>> (np.random.rand(*p.shape) < p).astype(int)
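Why this works: np.random.rand draws uniform values in [0, 1), so each elementwise comparison is an independent Bernoulli trial with success probability p[i, j]. A small sanity-check sketch (averaging many draws should roughly recover p):

import numpy as np

p = np.array([[0.9, 0.1, 0.3],
              [0.4, 0.5, 0.1],
              [0.5, 0.8, 0.5]])

# Stack many independent draws and check that the sample mean approaches p.
n = 100_000
samples = (np.random.rand(n, *p.shape) < p).astype(int)
print(samples.mean(axis=0))  # each entry close to the corresponding p[i, j]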
Actually I can use np.random.binomial:
import numpy as np

p = [[0.9, 0.1, 0.3],
     [0.4, 0.5, 0.1],
     [0.5, 0.8, 0.5]]

np.random.binomial(1, p)

Python rising/falling edge oscilloscope-like trigger

I'm trying to detect rising and/or falling edges in a numpy vector, based on a trigger value. This is kinda like how oscilloscope triggering works.
The numpy vector contains floating point values. The trigger itself is a floating point value. I would expect this to work as such:
import numpy as np
data = np.array([-1, -0.5, 0, 0.5, 1, 1.5, 2])
trigger = rising_edge(data, 0.3)
print(trigger)
[3]
In other words, it would work like np.where, returning a vector containing the positions where the condition is true.
I know I can simply iterate over the vector and get the same result (which is what I'm doing), but it isn't ideal, as you can imagine. Is there some functionality built into NumPy that can do this using optimized C code? Or maybe in some other library?
Thank you.
We could slice one off and compare against the trigger for smaller than and greater than, like so -
In [41]: data = np.array([-1, -0.5, 0, 0.5, 1, 1.5, 2, 0, 0.5])
In [43]: trigger_val = 0.3
In [44]: np.flatnonzero((data[:-1] < trigger_val) & (data[1:] > trigger_val))+1
Out[44]: array([3, 8])
If you would like to include equality as well, i.e. <= or >=, simply add that into the comparison.
To include for both rising and falling edges, add the comparison the other way -
In [75]: data = np.array([-1, -0.5, 0, 0.5, 1, 1.5, 2, 0.5, 0])
In [76]: trigger_val = 0.3
In [77]: mask1 = (data[:-1] < trigger_val) & (data[1:] > trigger_val)
In [78]: mask2 = (data[:-1] > trigger_val) & (data[1:] < trigger_val)
In [79]: np.flatnonzero(mask1 | mask2)+1
Out[79]: array([3, 8])
So I was just watching the latest 3Blue1Brown video on convolution when I realized a new way of doing this:
import numpy as np

def rising_edge(data, thresh):
    sign = data >= thresh
    pos = np.where(np.convolve(sign, [1, -1]) == 1)
    return pos
So, get all the positions where the data is larger or equal to the threshold, do a convolution over it with [1, -1], and then just find where the convolution returns a 1 for a rising edge. Want a falling edge? Look for -1 instead.
Pretty neat, if I do say so myself. And it's about 5-10% faster.
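For completeness, the falling-edge variant of the same idea might look like this (a sketch using the example data from the answer above):

import numpy as np

def falling_edge(data, thresh):
    sign = data >= thresh
    # The [1, -1] kernel produces -1 exactly where the signal drops below the threshold.
    return np.where(np.convolve(sign, [1, -1]) == -1)

data = np.array([-1, -0.5, 0, 0.5, 1, 1.5, 2, 0.5, 0])
print(falling_edge(data, 0.3))  # (array([8]),)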

Probability list creation

I have the following problem: starting from a base triple (0, 1, 0), I now try to create a list of changed triples in a given range.
The constraints:
triple[0] and triple[2] should have a maximum of r, e.g. r = 0.2
sum(triple) = 1
triple[0] need not equal triple[1] and should be increased by a given step parameter s, e.g. s = 0.02
In the above-mentioned example our method should create
lst = [(0.0, 1, 0.0),(0.02, 0.98, 0.), (0.04, 0.96,0), (0.04,0.94, 0.02), (0.06,0.94,0), (0.06, 0.92, 0.02), (0.06, 0.9, 0.04), ...]
Is there any pretty way to do this?
Maybe you have an idea how to create this list without nested loops (probably with numpy?).
Thanks a lot!
Here's a list comprehension that should provide all 3-tuples that meet your constraints (as I understand them). It's a bit clunkier than I'd like because the range function only accepts integers:
import math

r = 0.2
s = 0.02
steps = int(math.ceil(r / s))
lst = [(a * s, 1 - (a + b) * s, b * s) for b in range(steps) for a in range(steps)]
Results:
>>> lst[0:4]
[(0.0, 1.0, 0.0), (0.02, 0.98, 0.0), (0.04, 0.96, 0.0), (0.06, 0.94, 0.0)]
>>> lst[90:94]
[(0.0, 0.8200000000000001, 0.18), (0.02, 0.8, 0.18), (0.04, 0.78, 0.18), (0.06, 0.76, 0.18)]
The first and last values only go up to 0.18 in this code, and I'm not sure if that's desirable or not (is the constraint < r or <= r?). It shouldn't be too hard to tweak, if you want it the other way.
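If the bound is meant to be inclusive (<= r), one possible tweak is simply to extend both ranges by one step; a sketch of that variant:

import math

r = 0.2
s = 0.02
steps = int(math.ceil(r / s))

# range(steps + 1) lets a*s and b*s reach r itself.
lst = [(a * s, 1 - (a + b) * s, b * s)
       for b in range(steps + 1) for a in range(steps + 1)]

print(lst[0])   # (0.0, 1.0, 0.0)
print(lst[-1])  # (0.2, 0.6..., 0.2), up to float rounding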
You could create a function that makes the triple as you describe ... something such as:
import random

def make_triple(r=0.2, s=0.02):
    element_one = round(random.uniform(0, r), 2)
    max_s = int(r / s)  # randint needs integer bounds
    element_three = random.randint(0, max_s) * s
    element_two = round(1 - element_one - element_three, 2)
    return (element_one, element_two, element_three)
And then just create a single loop that calls this function:
list_of_triples = []
for i in range(5):
    list_of_triples.append(make_triple(0.2, 0.02))
And there you go! No nested loops necessary.
Another numpy answer just for kicks:
import numpy as np
r = .2
s = .02
a, b = np.mgrid[0:r:s, 0:r:s]
lst = np.dstack([a, 1 - (a+b), b]).reshape(-1, 3)
Here's a NumPy solution without for loops, as requested. It uses a 3D array and NumPy broadcasting rules to assign the scale by row and by column. scale is a 2D array with a single column so it can be conveniently transposed with .T. In the end, the 3D array is reshaped to 2D.
import numpy as np

r = .2
s = .02
scale = np.arange(r, step=s, dtype=float).reshape(-1, 1)

a = np.empty((len(scale), len(scale), 3), dtype=float)
a[:, :, 0] = scale
a[:, :, 2] = scale.T
a[:, :, 1] = 1 - a[:, :, 0] - a[:, :, 2]

print(a.reshape(-1, 3))
