Numpy: How to scale an integer array by an integer value? - python

I have a numpy array with maximum value num. I would like to scale all the values in the array by newMaxValue/num so that the new maximum value of array is newMaxValue. I tried to convert the array to float and make the division afterwards but I cannot seem to divide and multiply it successfully. I always end up with a zero valued array.
What is the correct way of doing this?
Thanks

Make sure you convert the max to a float:
>>> from numpy import array
>>> a = array([1, 2, 3, 4, 5])
>>> new_max = 6
>>> a / max(a) # Integer division under Python 2; this is probably what happens to you
array([0, 0, 0, 0, 1])
>>> a / float(max(a)) # Convert that integer to a float and it'll work
array([ 0.2, 0.4, 0.6, 0.8, 1. ])
>>> a / float(max(a)) * new_max
array([ 1.2, 2.4, 3.6, 4.8, 6. ])

import numpy as np
newMax = 20
myarr = np.random.randint(10, size=(10,2))
newarr = myarr / float(np.amax(myarr)) * newMax
PS: post your code, you probably made a simple coding mistake.
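For readers on Python 3, where `/` between NumPy integer arrays already performs true division, the same scaling can be sketched as below. The helper name `scale_to_max` is hypothetical (it is not from the answers above), and the sketch assumes the array's maximum is nonzero.

```python
import numpy as np

def scale_to_max(arr, new_max):
    """Scale arr so that its largest value becomes new_max (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)  # convert first so no integer division can occur
    return arr / arr.max() * new_max    # assumes arr.max() != 0

a = np.array([1, 2, 3, 4, 5])
scaled = scale_to_max(a, 6)
print(scaled)
```

Converting to float up front keeps the result independent of the Python version and of the input dtype.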

Related

Insert calculated values between consecutive values in array

Let's say I have a simple array, like this one:
import numpy as np
a = np.array([1,2,3])
Which returns me, obviously:
array([1, 2, 3])
I'm trying to add calculated values between consecutive values in this array. The calculation should return n equally spaced values between its bounds.
To express myself in numbers, let's say I want to add 1 value between each pair of consecutive values, so the function should return an array like this:
array([1, 1.5, 2, 2.5, 3])
Another example, now with 2 values between each pair:
array([1, 1.33, 1.66, 2, 2.33, 2.66, 3])
I know the logic and I can create myself a function which will do the work, but I feel numpy has specific functions that would make my code so much cleaner!
If your array is not evenly spaced, you can interpolate with np.interp:
import numpy as np
n = 2
a = np.array([1,2,5])
new_size = a.size + (a.size - 1) * n
x = np.linspace(a.min(), a.max(), new_size)
xp = np.linspace(a.min(), a.max(), a.size)
fp = a
result = np.interp(x, xp, fp)
returns: array([1. , 1.33333333, 1.66666667, 2. , 3. , 4. , 5. ])
If your array is always evenly spaced, you can just use
new_size = a.size + (a.size - 1) * n
result = np.linspace(a.min(), a.max(), new_size)
Using linspace should do the trick:
a = np.array([1,2,3])
n = 1
temps = []
for i in range(1, len(a)):
    temps.append(np.linspace(a[i-1], a[i], num=n+1, endpoint=False))
# Add last final ending point
temps.append(np.array([a[-1]]))
new_a = np.concatenate(temps)
print(new_a)
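Try with np.arange (note that this only works when consecutive values differ by exactly 1, and that it leaves out the final value):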
a = np.array([1,2,3])
n = 2
print(np.arange(a.min(), a.max(), 1 / (n + 1)))
Output:
[1. 1.33333333 1.66666667 2. 2.33333333 2.66666667]
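The approaches above can be combined into one function that handles both evenly and unevenly spaced input and keeps the last value. This is only a sketch: it interpolates over index positions rather than over the values themselves, and the name `insert_between` is made up for illustration.

```python
import numpy as np

def insert_between(a, n):
    """Insert n linearly interpolated values between consecutive elements (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    old_idx = np.arange(a.size)                      # integer positions of the original values
    new_size = a.size + (a.size - 1) * n
    new_idx = np.linspace(0, a.size - 1, new_size)   # fractional positions, both endpoints included
    return np.interp(new_idx, old_idx, a)

result_even = insert_between([1, 2, 3], 1)
result_uneven = insert_between([1, 2, 5], 2)
print(result_even)
print(result_uneven)
```

Because the interpolation runs over positions, each original value lands exactly on an integer position and is reproduced unchanged in the output.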

Fast, efficient approach to average values in one array selected using a key in another array

Sorry for the cryptic description.
I'm working in Python and need a fast solution for the problem below.
I have an array of float values (the array can contain millions of values):
values = [0.1, 0.2, 5.7, 12.9, 3.5, 100.6]
Each value represents an estimate of a quantity at a particular location where the location is identified by an ID. Multiple estimates per location are possible/common
locations = [1, 5, 3, 1, 1, 3]
I need to average all of the values that share the same location ID.
I can use numpy.where to do this for one location value
average_value_at_location = np.average(values[np.where(locations == 1)])
And of course I could loop over all of the unique values in locations, but I'm looking for a fast (vectorized) way of doing this and can't figure out how to compose the numpy functions to do this without looping in Python.
I'm not tied to numpy for this solution.
Any help will be gratefully received.
Thanks,
Doug
Assuming locations go from 0 to a maximum value of locmax (e.g. locmax=5), you can create a 2d array of nans to store the values at the corresponding location:
placements = np.zeros((values.size, locmax+1)) * np.nan
Then assign all the values using indexing:
placements[np.arange(values.size), locations] = values
Finally, calculate the np.nanmean along axis 0:
means = np.nanmean(placements, axis=0)
For your example this results in:
array([ nan, 5.5 , nan, 53.15, nan, 0.2 ])
Using add.reduceat for every group.
Preparing the arrays
import numpy as np
values = np.array([0.1, 0.2, 5.7, 12.9, 3.5, 100.6])
locations = np.array([1, 5, 3, 1, 1, 3])
Getting the indices to sort the arrays in groups
locsort = np.argsort(locations)
# locations[locsort] -> [ 1, 1, 1, 3, 3, 5]
# values[locsort] -> [0.1, 12.9, 3.5, 5.7, 100.6, 0.2]
Computing the start index for each group
i = np.flatnonzero(np.diff(locations[locsort], prepend=0))
# [0, 3, 5]
Adding values per group and dividing by the group size
np.add.reduceat(values[locsort], i) / np.diff(i, append=len(locsort))
# [ 16.5, 106.3, 0.2] / [3, 2, 1]
Output
array([ 5.5 , 53.15, 0.2 ])
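One caveat with the snippet above: `prepend=0` only marks the first group when the smallest location ID is nonzero. A possible variant, sketched here under the assumption of non-empty 1-D inputs, takes the group starts from `np.unique(..., return_index=True)` instead. The function name `group_means_reduceat` is hypothetical.

```python
import numpy as np

def group_means_reduceat(values, locations):
    """Mean of values per location ID via sort + add.reduceat (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    locations = np.asarray(locations)
    order = np.argsort(locations, kind="stable")
    sorted_locs = locations[order]
    # return_index gives the first position of each ID in the sorted array,
    # so no assumption about the smallest ID is needed.
    ids, starts, counts = np.unique(sorted_locs, return_index=True, return_counts=True)
    sums = np.add.reduceat(values[order], starts)
    return ids, sums / counts

# Same data as above, but with location 1 renamed to 0 to exercise the edge case
ids, means = group_means_reduceat([0.1, 0.2, 5.7, 12.9, 3.5, 100.6], [0, 5, 3, 0, 0, 3])
print(ids, means)
```

Returning the IDs alongside the means also makes it clear which mean belongs to which location.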
OK - I've tried four solutions based on the replies here. So far, the pandas groupby approach is the winner, but the numpy add.reduceat solution proposed by Michael S is a close second.
Using pandas (from the link provided by Ben T)
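# Imports used by the snippets below (assumed; the original post did not show them)
import numpy as np
import pandas as pd
import numpy_indexed as npi
from timeit import default_timer as timer
# Set up the data arrays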
rng = np.random.default_rng(12345)
values = rng.random(size = 100000)
locations = rng.integers(low = 1, high = 25000, size = 100000)
#Create the pandas dataframe
df = pd.DataFrame({"locations":locations, "values": values})
# groupby and mean
start=timer()
average_by_location_pandas = df.groupby(["locations"]).mean()
end=timer()
print("Pandas time :", end-start)
Pandas time : 0.009602722000000008
Using numpy boolean masks and a list comprehension to loop over unique locations
unique_locations = np.unique(locations)
average_by_location_numpy = [(i, values[locations==i].mean()) for i in unique_locations]
Numpy time : 2.644003632
Using numpy_indexed (link provided by Ben T)
average_by_location_numpy_indexed = npi.group_by(locations).mean(values)
Numpy_indexed time : 0.03701074199999965
Using numpy add.reduceat (solution proposed by Michael S)
locsort = np.argsort(locations)
i = np.flatnonzero(np.diff(locations[locsort], prepend=0))
out = np.add.reduceat(values[locsort], i) / np.diff(i, append=len(locsort))
Numpy add_reduceat time : 0.01057279099999997
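Another loop-free option, which was not among the four timed above, is `np.bincount` with weights. The sketch below makes no claim about how it would rank in these timings, and the name `group_means_bincount` is hypothetical.

```python
import numpy as np

def group_means_bincount(values, locations):
    """Mean of values per location ID using np.unique + np.bincount (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # inverse maps every element to the index of its ID in the sorted unique IDs
    ids, inverse = np.unique(np.asarray(locations), return_inverse=True)
    sums = np.bincount(inverse, weights=values)   # sum of values per ID
    counts = np.bincount(inverse)                 # number of values per ID
    return ids, sums / counts

ids_bc, means_bc = group_means_bincount(
    [0.1, 0.2, 5.7, 12.9, 3.5, 100.6], [1, 5, 3, 1, 1, 3]
)
print(ids_bc, means_bc)
```

Mapping the IDs to consecutive indices first keeps the `bincount` output compact even when the location IDs are large or sparse.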

Convert Numpy array of floats to ints proportionately (balancing chemical equation)

I have a code that balances the chemical equations. The only problem is that I want to convert the final solution i.e. 1D np array of floats to integers. Obviously, I can not directly round it to nearest integers, that would mess up the balancing. One way is to multiply it with a number that will convert the floats to integers(type does not matter). See below for an example.
>>> coeffs=equation_balancer(reactants=["H2","O2"], products=["H2O"])
>>> coeffs
{"H2": 1.0, "O2": 0.5, 'H2O1': 1.0}
>>> import numpy as np
>>> np.asarray([i for i in coeffs.values()])
array([1. , 0.5, 1.])
If the final array is multiplied by 2, then the fractions (floats) can be removed.
PS: to show the example above, I changed back to np, since the equation_balancer uses scipy.linalg.solve to balance the equation.
>>> np.asarray([i for i in coeffs.values()])*2
array([2., 1., 2.])
How to get this number that on multiplication with array gives the integer-valued array? The actual type of array does not matter.
One way would be to multiply the array with highest denominator i.e. multiples of 10. And then find the highest common factor:
>>> c=np.asarray([i for i in coeffs.values()])*10
>>> factor = np.gcd.reduce(c.astype(int))
>>> factor
5
>>> c/factor
array([2., 1., 2.])
In the above case, finding the power of ten, 10**n, that is determined by the largest number of decimal places is crucial. I don't know how to code it at the moment. Is there any other approach that would be more suitable? Any help would be appreciated.
This seems to work:
(Credit to this SO answer on how to convert a floating point number into a tuple of "minimal" integer numerator and integer denominator, rather than some freakishly large numerator and denominator)
import numpy as np
from fractions import Fraction
# A configurable param.
# Keep this small to avoid freakishly large results.
# Increase it only in rare cases where the coeffs
# span a "huge" scale.
MAX_DENOM = 100
fractions = [Fraction(val).limit_denominator(MAX_DENOM)
             for val in coeffs.values()]
ratios = np.array([(f.numerator, f.denominator) for f in fractions])
# As an alternative to the above two statements, uncomment and use
# below statement for Python 3.8+
# ratios = np.array([Fraction(val).limit_denominator(MAX_DENOM).as_integer_ratio()
# for val in coeffs.values()])
factor = np.lcm.reduce(ratios[:,1])
result = [round(v * factor) for v in coeffs.values()]
# print
result
Output for coeffs = {"H2": 1.0, "O2": 0.5, 'H2O1': 1.0}:
[2, 1, 2]
Output for coeffs = {"H2": 0.5, "N2":0.5, "O2": 1.5, "H1N1O3":1.0}:
[1, 1, 3, 2]
Output for coeffs = {"H2": 1.0, "O3": (1/3), "H2O1":1.0}:
[3, 1, 3]
Output for coeffs = {"H4": 0.5, "O7": (1/7), "H2O1":1.0}:
[7, 2, 14]
Output for coeffs = {"H2": .1, "O2": 0.05, 'H2O1': .1}:
[2, 1, 2]
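Multiplying by the least common multiple of the denominators always yields integers, but not necessarily the smallest ones (for example 2/3 and 4/3 would give 2 and 4). A pure-Python sketch that also divides by the greatest common divisor is shown below. The function name `smallest_integer_coeffs` is hypothetical, and the sketch assumes at least one nonzero coefficient.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def smallest_integer_coeffs(values, max_denom=100):
    """Scale float coefficients to the smallest proportional integers (hypothetical helper)."""
    fracs = [Fraction(v).limit_denominator(max_denom) for v in values]
    # The least common multiple of all denominators clears every fraction.
    common = reduce(lambda x, y: x * y // gcd(x, y), (f.denominator for f in fracs), 1)
    ints = [int(f * common) for f in fracs]
    # Dividing by the greatest common divisor gives the smallest integer set.
    divisor = reduce(gcd, ints)  # assumes not all coefficients are zero
    return [i // divisor for i in ints]

print(smallest_integer_coeffs([1.0, 0.5, 1.0]))
print(smallest_integer_coeffs([2 / 3, 4 / 3]))
```

Working with `Fraction` objects throughout avoids the final `round` call, since every product is an exact integer.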
I am not entirely happy with my solution, but it seems to work alright; let me know what you think. I am essentially converting each float to a string and counting the number of characters after the decimal point. It will work as long as the values are always floats.
import numpy as np
coeffs = {"H2": .1, "O2": 0.05, 'H2O1': .1}
n = max([len(str(i).split('.')[1]) for i in coeffs.values()])
c=np.array([i for i in coeffs.values()])*10**n
factor = np.gcd.reduce(c.astype(np.uint64))
print((c/factor).astype(np.uint64))
source and other solutions:
Easy way of finding decimal places
Testing: running some potentially difficult cases and converting back:
primes = [3,5,7,11,13,17,19,23,29,79] ## some prime numbers
primes_over_1 = [1/i for i in primes]
for i in range(1, len(primes_over_1) - 1):
    coeffs = {"H2": primes_over_1[i-1], "O2": primes_over_1[i], 'H2O1': primes_over_1[i+1]}
    print('coefs: ', [a for a in coeffs.values()])
    n = max([len(str(a).split('.')[1]) for a in coeffs.values()])
    c=np.array([a for a in coeffs.values()])*10**n
    factor = np.gcd.reduce(c.astype(np.uint64))
    coeffs_asInt = (c/factor).astype(np.uint64)
    print('as int:', coeffs_asInt)
    coeffs_back = coeffs_asInt.astype(np.float64)*(factor/10**n)
    coeffs_back_str = ["{0:.16g}".format(a) for a in coeffs_back]
    print('back: ', coeffs_back_str)
    print('########################################################\n')
output:
coefs: [0.3333333333333333, 0.2, 0.14285714285714285]
as int: [8333333333333333 5000000000000000 3571428571428571]
back: ['0.3333333333333334', '0.2', '0.1428571428571428']
########################################################
coefs: [0.2, 0.14285714285714285, 0.09090909090909091]
as int: [5000000000000000 3571428571428571 2272727272727273]
back: ['0.2', '0.1428571428571428', '0.09090909090909093']
########################################################
coefs: [0.14285714285714285, 0.09090909090909091, 0.07692307692307693]
as int: [14285714285714284 9090909090909092 7692307692307693]
back: ['0.1428571428571428', '0.09090909090909093', '0.07692307692307694']
########################################################
coefs: [0.09090909090909091, 0.07692307692307693, 0.058823529411764705]
as int: [2840909090909091 2403846153846154 1838235294117647]
back: ['0.09090909090909091', '0.07692307692307693', '0.05882352941176471']
########################################################
coefs: [0.07692307692307693, 0.058823529411764705, 0.05263157894736842]
as int: [2403846153846154 1838235294117647 1644736842105263]
back: ['0.07692307692307693', '0.05882352941176471', '0.05263157894736842']
########################################################
coefs: [0.058823529411764705, 0.05263157894736842, 0.043478260869565216]
as int: [1838235294117647 1644736842105263 1358695652173913]
back: ['0.05882352941176471', '0.05263157894736842', '0.04347826086956522']
########################################################
coefs: [0.05263157894736842, 0.043478260869565216, 0.034482758620689655]
as int: [6578947368421052 5434782608695652 4310344827586207]
back: ['0.05263157894736842', '0.04347826086956522', '0.03448275862068966']
########################################################
coefs: [0.043478260869565216, 0.034482758620689655, 0.012658227848101266]
as int: [21739130434782608 17241379310344828 6329113924050633]
back: ['0.04347826086956522', '0.03448275862068966', '0.01265822784810127']
########################################################

Any better way to slicing numpy array in parametric way in numpy?

I'd like to slice a numpy array parametrically inside a function so that I get the elements I need for my computation. I know how to slice the array by index, but I'm more interested in slicing driven by a parameter, so that I don't have to hard-code the index. In my case, I have a coefficient array c, a power array p, and a parameter num_order. Basically, num_order decides how many elements are sliced. Here is my attempt:
import numpy as np
c=[1,1/2, -1/6, 1/12]
p= [1,2,3,4]
x = np.array([1, 1, 2, 3, 5, 8, 13, 21])
def arr_pow(x, num_order):
    output = []
    for i in range(num_order):
        mul = c[i] * np.power(x, p[i])
        output.append(mul)
    return output
So, if num_order=2, I also slice the first two terms of c and p with c_new = c[:-2], p_new = p[:-2], giving c_new=[1,1/2], p_new=[1,2], and so on. Is there a better way to slice elements of two or more arrays based on the parameter num_order? Can anyone point me to an elegant way to do this in a parameterized function? Any thoughts?
update:
Instead of doing c_new=c[:-1], p_new=p[:-1] if num_order=3, and c_new=c[:-2], p_new=p[:-2] if num_order=2, and so on, is there a more elegant (parametric) way to do this? Is there an efficient way of doing this in a Python function? Thanks!
I'm not sure if this is the output you want (if you could please update your question to include the expected output that would be helpful):
import numpy as np
c = np.array([1, 1 / 2, -1 / 6, 1 / 12])
p = np.array([1, 2, 3, 4])
x = np.array([1, 1, 2, 3, 5, 8, 13, 21])
def arr_pow_numpy(x, num_order):
    return c[:num_order, None] * np.power(x[None], p[:num_order, None])
def arr_pow(x, num_order):
    output = []
    for i in range(num_order):
        mul = c[i] * np.power(x, p[i])
        output.append(mul)
    return np.asarray(output)
for num_order in range(1, len(p)):
    assert np.array_equal(arr_pow(x, num_order), arr_pow_numpy(x, num_order)), f"{num_order}"
The idea here is to use NumPy broadcasting plus NumPy slicing to achieve the result you want without for loops and in a parametric way.
Use the following:
num_order = 2
np.array([c[i] * np.power(x, p[i]) for i in range(num_order)])
# Out:
# array([[ 1. , 1. , 2. , 3. , 5. , 8. , 13. , 21. ],
# [ 0.5, 0.5, 2. , 4.5, 12.5, 32. , 84.5, 220.5]])
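On the slicing itself: `c[:num_order]` keeps the first `num_order` entries whatever the array length, so it can stand in for the length-dependent negative slices (`c[:-2]`, `c[:-1]`) from the question. A minimal sketch, with the hypothetical helper name `first_terms`:

```python
import numpy as np

c = np.array([1, 1 / 2, -1 / 6, 1 / 12])
p = np.array([1, 2, 3, 4])
x = np.array([1, 1, 2, 3, 5, 8, 13, 21])

def first_terms(x, num_order):
    """Return the first num_order terms c[i] * x**p[i], one row per term (hypothetical helper)."""
    c_new, p_new = c[:num_order], p[:num_order]   # parametric slices, no negative indices
    # Column vectors of shape (num_order, 1) broadcast against x of shape (len(x),)
    return c_new[:, None] * x ** p_new[:, None]

terms = first_terms(x, 2)
print(terms.shape)
```

For the four-element arrays here, `c[:2]` and `c[:-2]` select the same elements, but only the former stays correct if more coefficients are added later.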

Math library and arrays in Python

I want to use the math library to do some calculations on an array.
I tried something like this:
import numpy as np
import math
a = np.array([0, 1, 2, 3])
a1 = np.vectorize(a)
print("sin(a) = \n", math.sin(a1))
Unfortunately it does not work. An error occurs: "TypeError: must be real number, not vectorize".
How can I use the vectorize function to be able to calculate that kind of things?
The whole point of numpy is that you don't need any math method or any list comprehension:
>>> import numpy as np
>>> a = np.array([0, 1, 2, 3])
>>> a + 1
array([1, 2, 3, 4])
>>> np.sin(a)
array([ 0. , 0.84147098, 0.90929743, 0.14112001])
>>> a ** 2
array([0, 1, 4, 9])
>>> np.exp(a)
array([ 1. , 2.71828183, 7.3890561 , 20.08553692])
You can use a as if it were a scalar and you get the corresponding array.
If you really need to use math.sin (hint: you don't), you can vectorize it (the function itself, not the array):
>>> vsin = np.vectorize(math.sin)
>>> vsin(a)
array([ 0. , 0.84147098, 0.90929743, 0.14112001])
import numpy as np
import math
a = np.array([0, 1, 2, 3])
print("sin(a) = \n", [math.sin(x) for x in a])
math.sin requires one real number at a time.
