I have a numpy array with the following integer numbers:
[10 30 16 18 24 18 30 30 21 7 15 14 24 27 14 16 30 12 18]
I want to normalize them to a range between 1 and 10.
I know that the general formula to normalize arrays (to the range 0 to 1) is:
x_norm = (x - min(x)) / (max(x) - min(x))
But how am I supposed to scale them between 1 and 10?
Question: What is the simplest/fastest way to normalize this array to values between 1 and 10?
Your range is actually 9 wide: from 1 to 10. If you multiply the normalized array (values 0 to 1) by 9, you get values from 0 to 9, which you then need to shift up by 1:
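import numpy as np
arr = np.array([10, 30, 16, 18, 24, 18, 30, 30, 21, 7, 15, 14, 24, 27, 14, 16, 30, 12, 18])
start = 1
end = 10
width = end - start
res = (arr - arr.min())/(arr.max() - arr.min()) * width + start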
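Note that the denominator here is available as the NumPy built-in np.ptp ("peak to peak", i.e. max minus min). The ndarray.ptp method was removed in NumPy 2.0, so use the function form:
res = (arr - arr.min())/np.ptp(arr) * width + start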
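The same min-max rescaling can be wrapped in a small reusable helper. This is a sketch; the name rescale and its default bounds are illustrative, and it assumes the array is not constant (otherwise np.ptp is 0 and the division fails):

```python
import numpy as np

def rescale(arr, start=1, end=10):
    """Linearly map arr so its minimum becomes start and its maximum becomes end."""
    arr = np.asarray(arr, dtype=float)
    span = np.ptp(arr)  # max - min; assumed non-zero
    return (arr - arr.min()) / span * (end - start) + start

res = rescale([10, 30, 16, 18, 24, 18, 30, 30, 21, 7, 15, 14, 24, 27, 14, 16, 30, 12, 18])
```

With the question's data, 7 maps to 1 and 30 maps to 10; the other values land proportionally in between as floats.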
For those not familiar with Young tableaux: values must increase from left to right and from top to bottom. A tableau may contain infinity values, but only at the end, since they are the largest.
I've written a function, extract_min, that removes the smallest value, replaces it with the largest value (the bottom-right entry), and puts infinity where the largest value used to be. It then has to shift the value at the first position until the Young tableau rules are restored (values increase from left to right and from top to bottom). For example, in the following table:
12 13 15
13 18 20
15 23 25
We will remove 12, replace it with 25, and replace 25 with infinity, resulting in the following table:
25 13 15
13 18 20
15 23 inf
We then perform an operation that moves the value 25 until the rows and columns are in increasing order from left to right and from top to bottom, respectively. By the end, it should look like this:
11 13 15
18 20 25
22 23 inf
My code is as follows:
def extract_min(arr):
    min=arr[0][0]
    arr[0][0], arr[len(arr)-1][len(arr)-1]=arr[len(arr)-1][len(arr)-1], float('inf')
    young_order(arr,0,0)
    return min
def young_order(arr, x, y):
    while (arr[x][y] >= arr[x + 1][y] and arr[x][y] >= arr[x][y + 1]):
        if arr[x + 1][y] < arr[x][y + 1] or y+1==len(arr):
            arr[x][y], arr[x + 1][y] = arr[x + 1][y], arr[x][y]
            x += 1
        if arr[x][y+1] < arr[x+1][y] or x + 1 == len(arr):
            arr[x][y], arr[x][y + 1] = arr[x][y + 1], arr[x][y]
            y += 1
for i in range(len(empty)):
    print(empty[i])
For some reason, this function works on the example I've provided, but not on the following table:
1 2 3
4 5 6
7 8 9
My questions are:
1. Why does it fail, and how do I fix it?
2. How can I make this algorithm run recursively?
Thanks in advance.
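A possible fix, sketched under the assumption that the tableau is a list of equal-length lists with math.inf marking empty cells. In the posted young_order, the loop condition uses `and`, so the value stops sinking as soon as it is smaller than either neighbour, and the neighbours are indexed before the bounds checks run: for the 1 to 9 table, the 9 reaches the last column and `arr[x][y + 1]` raises an IndexError. As posted, when both neighbours are equal neither branch swaps, so the loop would also make no progress. The sketch below compares against whichever neighbours exist, swaps with the smaller one, and recurses from the new position, which also answers the recursion question:

```python
import math

def young_order(arr, x=0, y=0):
    """Recursively sink arr[x][y] until rows and columns are increasing again."""
    rows, cols = len(arr), len(arr[0])
    # Find the smallest of the current cell and its existing right/down neighbours
    sx, sy = x, y
    if x + 1 < rows and arr[x + 1][y] < arr[sx][sy]:
        sx, sy = x + 1, y
    if y + 1 < cols and arr[x][y + 1] < arr[sx][sy]:
        sx, sy = x, y + 1
    if (sx, sy) != (x, y):
        # Swap with the smaller neighbour and continue from its position
        arr[x][y], arr[sx][sy] = arr[sx][sy], arr[x][y]
        young_order(arr, sx, sy)

def extract_min(arr):
    smallest = arr[0][0]
    last_row, last_col = len(arr) - 1, len(arr[0]) - 1
    # Move the bottom-right value to the top-left and mark its old cell as infinity
    arr[0][0], arr[last_row][last_col] = arr[last_row][last_col], math.inf
    young_order(arr, 0, 0)
    return smallest
```

On the 1 to 9 table this returns 1 and leaves [[2, 3, 6], [4, 5, 9], [7, 8, inf]]. Each recursive call moves one step down or right, so the recursion depth is at most rows + cols; replacing the recursive call with a `while` loop over (x, y) gives an equivalent iterative version.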
I have an array
a=[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21...]
and I want to first choose the elements 7, 8, 9 and then, every 10 elements, choose the next three (17, 18, 19, then 27, 28, 29, and so on) to form a new array
b=[7 8 9 17 18 19 27 28 29 ....]
How could I implement this?
You could use a list comprehension and boolean indexing:
import numpy as np
a = np.arange(123)
mask = [x % 10 in (7, 8, 9) for x in a]
b = a[mask]
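The same value-based mask can be built without a Python-level loop using np.isin; this sketch uses an array starting at 1, as in the question:

```python
import numpy as np

a = np.arange(1, 30)
# Keep elements whose last digit is 7, 8 or 9 (selection by value, not by position)
mask = np.isin(a % 10, [7, 8, 9])
b = a[mask]
```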
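You can use reshaping to convert it to 2-D, then select columns and finally flatten it back to 1-D:
b = a.reshape(-1,10)[:,7:10].flatten()
This selects by position: columns 7 to 9 of each row of 10 hold the values 7, 8, 9 when the array starts at 0 (as with np.arange). For the question's array, which starts at 1, use [:,6:9] instead.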
If your array's length is not a multiple of 10, reshape will raise an error, so you can either crop it first or pad it with zeros and then remove the extra zeros from the selection.
How to crop first:
a = a[:a.size//10*10]
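And padding with zeros (np.pad returns a new array; ndarray.resize works in place and returns None, so assigning its result back to a would lose the array):
a = np.pad(a, (0, -a.size % 10))
Removing the padded zeros that end up in the selection is another cropping step.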
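Putting padding and cropping together, here is a sketch (the helper name take_every_ten and its start parameter are illustrative). It pads with np.pad, takes three columns per block of 10, and drops padded entries by position rather than by value, so genuine zeros in the data would be kept:

```python
import numpy as np

def take_every_ten(a, start=6):
    """Return a[start:start+3], a[start+10:start+13], ... as one 1-D array."""
    a = np.asarray(a)
    padded = np.pad(a, (0, -a.size % 10))          # zero-pad to a multiple of 10
    picked = padded.reshape(-1, 10)[:, start:start + 3]
    # Apply the same selection to the positions so padding can be dropped by index
    positions = np.arange(padded.size).reshape(-1, 10)[:, start:start + 3]
    return picked[positions < a.size]
```

With start=6 and an array beginning at 1, this yields 7, 8, 9, 17, 18, 19, and so on.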
I am writing code to represent the Ulam spiral diagonal numbers, and this is the code I wrote myself:
t = 1
i = 2
H = [1]
while i < 25691:
    for n in range(4):
        t += i
        H.append(t)
    i += 2
print(H)
The number 25691 in the code is the side length of the spiral. If it were 7, the spiral would contain 49 numbers, and so on.
H holds all the numbers on the diagonals, but I wonder whether there is a much faster way to do this.
For example, if I increase the side length by a large amount, it takes a very long time to calculate the next H.
Code Example:
t = 1
i = 2
H = [1]
for j in range(25000, 26000):
    while i < j:
        for n in range(4):
            t += i
            H.append(t)
        i += 2
My computer cannot finish this calculation, so is there a faster way to do this?
You don't need to calculate the intermediate values:
Diagonal, horizontal, and vertical lines in the number spiral correspond to polynomials of the form
f(n) = 4n^2 + bn + c
where b and c are integer constants (source: Wikipedia).
You can find b and c by solving a linear system of two equations, using two known values on the line.
17 16 15 14 13
18 5 4 3 12 ..
19 6 1 2 11 28
20 7 8 9 10 27
21 22 23 24 25 26
E.g., for the line 1, 2, 11, 28, etc.:
f(0) = 4*0*0+0*b+c = 1 => c = 1
f(1) = 4*1*1+1*b+1 = 2 => 5+b = 2 => b = -3
f(2) = 4*2*2+2*(-3)+1 = 11
f(3) = 4*3*3+3*(-3)+1 = 28
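The closed form can be checked in code. In this sketch (fit_line and ulam_diagonals are illustrative names), fit_line solves the two-equation system for b and c with np.linalg.solve, and ulam_diagonals evaluates f(n) = 4n^2 + bn + 1 for the four diagonals through 1. In the grid above those diagonals are 1, 3, 13 (up-right), 1, 5, 17 (up-left), 1, 7, 21 (down-left) and 1, 9, 25 (down-right), which have b = -2, 0, 2, 4. Taking the ring index n from 1 to (side - 1)/2 reproduces the question's H in the same order, without the step-by-step loop:

```python
import numpy as np

def fit_line(n1, v1, n2, v2):
    """Solve 4*n*n + b*n + c = v at two points n1 != n2 for (b, c)."""
    A = np.array([[n1, 1.0], [n2, 1.0]])
    rhs = np.array([v1 - 4 * n1 * n1, v2 - 4 * n2 * n2], dtype=float)
    b, c = np.linalg.solve(A, rhs)
    return b, c

def ulam_diagonals(side):
    """Diagonal numbers of an Ulam spiral with odd side length, ordered like H."""
    n = np.arange(1, (side - 1) // 2 + 1, dtype=np.int64)[:, None]  # ring index
    b = np.array([-2, 0, 2, 4], dtype=np.int64)  # one b per diagonal, c = 1
    corners = 4 * n * n + b * n + 1  # shape (rings, 4), each row ascending
    return np.concatenate(([1], corners.ravel()))

print(fit_line(0, 1, 1, 2))  # b and c for the line 1, 2, 11, 28
print(ulam_diagonals(7))
```

For an odd side s the result has 2*s - 1 entries and is a prefix of the result for any larger side, so one call for the largest side can be sliced for every smaller one, e.g. ulam_diagonals(25999)[:2 * s - 1].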
I understand how to create simple quantiles in Pandas using pd.qcut. But after searching around, I don't see anything to create weighted quantiles. Specifically, I wish to create a variable which bins the values of a variable of interest (from smallest to largest) such that each bin contains an equal weight. So far this is what I have:
import pandas as pd
def wtdQuantile(dataframe, var, weight = None, n = 10):
    if weight is None:
        return pd.qcut(dataframe[var], n, labels = False)
    else:
        dataframe.sort_values(var, ascending = True, inplace = True)
        cum_sum = dataframe[weight].cumsum()
        cutoff = max(cum_sum)/n
        quantile = cum_sum/cutoff
        quantile.iloc[-1:] -= 1
        return quantile.map(int)
Is there an easier way, or something prebuilt from Pandas that I'm missing?
Edit: As requested, I'm providing some sample data. In the following, I'm trying to bin the "Var" variable using "Weight" as the weight. Using pd.qcut, we get an equal number of observations in each bin. Instead, I want an equal weight in each bin, or in this case, as close to equal as possible.
Weight Var pd.qcut(n=5) Desired_Rslt
10 1 0 0
14 2 0 0
18 3 1 0
15 4 1 1
30 5 2 1
12 6 2 2
20 7 3 2
25 8 3 3
29 9 4 3
45 10 4 4
I don't think this is built-in to Pandas, but here is a function that does what you want in a few lines:
import numpy as np
import pandas as pd
from pandas.api.types import is_integer
def weighted_qcut(values, weights, q, **kwargs):
    'Return weighted quantile cuts from a given series, values.'
    if is_integer(q):
        quantiles = np.linspace(0, 1, q + 1)
    else:
        quantiles = q
    order = weights.iloc[values.argsort()].cumsum()
    bins = pd.cut(order / order.iloc[-1], quantiles, **kwargs)
    return bins.sort_index()
We can test it on your data this way:
data = pd.DataFrame({
'var': range(1, 11),
'weight': [10, 14, 18, 15, 30, 12, 20, 25, 29, 45]
})
data['qcut'] = pd.qcut(data['var'], 5, labels=False)
data['weighted_qcut'] = weighted_qcut(data['var'], data['weight'], 5, labels=False)
print(data)
The output matches your desired result from above:
var weight qcut weighted_qcut
0 1 10 0 0
1 2 14 0 0
2 3 18 1 0
3 4 15 1 1
4 5 30 2 1
5 6 12 2 2
6 7 20 3 2
7 8 25 3 3
8 9 29 4 3
9 10 45 4 4
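For comparison, here is a NumPy-only sketch of the same idea (the name weighted_bins is illustrative): sort by value, take each observation's cumulative share of the total weight, and count how many interior bin edges lie strictly below that share, which mirrors pd.cut's default right-closed intervals. Ties in the values keep their original order (stable sort), which is an assumption:

```python
import numpy as np

def weighted_bins(values, weights, q):
    """Assign each value to one of q bins holding roughly equal total weight."""
    values = np.asarray(values)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")          # positions sorted by value
    share = np.cumsum(weights[order]) / weights.sum()  # cumulative weight share
    inner_edges = np.linspace(0, 1, q + 1)[1:-1]       # e.g. 0.2, 0.4, 0.6, 0.8
    # Right-closed bins: the bin index is the number of edges strictly below share
    sorted_bins = np.searchsorted(inner_edges, share, side="left")
    bins = np.empty_like(sorted_bins)
    bins[order] = sorted_bins                          # back to the original order
    return bins

print(weighted_bins(range(1, 11), [10, 14, 18, 15, 30, 12, 20, 25, 29, 45], 5))
```

On the sample data this gives the same bins as weighted_qcut: 0 0 0 1 1 2 2 3 3 4.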
I have a numpy array like this:
foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]
I want to replace all the zeros with the median value of the whole array (where the zero values are not to be included in the calculation of the median)
So far I have this:
import numpy as np
foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]
foo = np.array(foo_array)
foo = np.sort(foo)
print("foo sorted:", foo)
#foo sorted: [ 0 0 0 0 0 3 5 8 14 15 16 17 18 19 21 26 27 29 29 31 38 38 40 49 55]
nonzero_values = foo[0::] > 0
nz_values = foo[nonzero_values]
print("nonzero_values?:", nz_values)
#nonzero_values?: [ 3 5 8 14 15 16 17 18 19 21 26 27 29 29 31 38 38 40 49 55]
size = np.size(nz_values)
middle = size // 2
print("median is:", nz_values[middle])
#median is: 26
Is there a clever way to achieve this with numpy syntax?
Thank you
This solution takes advantage of numpy.median:
import numpy as np
foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]
foo = np.array(foo_array)
# Compute the median of the non-zero elements
m = np.median(foo[foo > 0])
# Assign the median to the zero elements
foo[foo == 0] = m
Just a note of caution: the median of your non-zero values is 23.5, but because foo has an integer dtype, the assignment above stores 23.
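If you need the exact 23.5, one option (a sketch) is to build a new float array with np.where instead of assigning into the integer one:

```python
import numpy as np

foo = np.array([38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16])
m = np.median(foo[foo != 0])          # median of the non-zero values only
filled = np.where(foo == 0, m, foo)   # float64 result, zeros replaced by m
```

np.where promotes the result to a common dtype, so the integers and the float median end up together in a float64 array. foo itself is left unchanged.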
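foo2 = foo.copy()
foo2[foo2 == 0] = nz_values[middle]
Use foo.copy() rather than foo[:]: slicing a NumPy array returns a view, so writing into foo[:] also changes foo. (Also note that nz_values[middle] is the upper of the two middle values, 26, not the median.) Instead of foo2, you could just update foo if you want. NumPy's boolean indexing can also combine a few lines of your code. For example, instead of,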
nonzero_values = foo[0::] > 0
nz_values = foo[nonzero_values]
You can just do
nz_values = foo[foo > 0]
You can find out more about boolean array indexing (part of NumPy's "advanced indexing") in the documentation.