I have some code that balances chemical equations. The only problem is that I want to convert the final solution, a 1D NumPy array of floats, to integers. Obviously I cannot simply round to the nearest integers; that would break the balancing. One way is to multiply the array by a number that converts the floats to integers (the resulting type does not matter). See below for an example.
>>> coeffs=equation_balancer(reactants=["H2","O2"], products=["H2O"])
>>> coeffs
{"H2": 1.0, "O2": 0.5, 'H2O1': 1.0}
>>> import numpy as np
>>> np.asarray([i for i in coeffs.values()])
array([1. , 0.5, 1.])
If the final array is multiplied by 2, the fractions (floats) disappear.
PS: to show the example above I converted back to a NumPy array; equation_balancer itself uses scipy.linalg.solve to balance the equation.
>>> np.asarray([i for i in coeffs.values()])*2
array([2., 1., 2.])
How do I find the number that, on multiplication with the array, gives an integer-valued array? The actual dtype of the array does not matter.
One way would be to multiply the array by a suitable power of 10 and then divide out the greatest common factor:
>>> c = np.asarray([i for i in coeffs.values()])*10
>>> factor = np.gcd.reduce(c.astype(int))
>>> factor
5
>>> c/factor
array([2., 1., 2.])
In the above case, finding the right power of 10 (10**n, where n is the largest number of decimal places) is crucial. I don't know how to code that at the moment. Is there any other approach that would be more suitable? Any help is appreciated.
This seems to work:
(Credit to this SO answer on how to convert a floating point number into a tuple of "minimal" integer numerator and integer denominator -- rather than some freakishly large numerator and denominator.)
import numpy as np
from fractions import Fraction
# A configurable parameter.
# Keep this small to avoid freakishly large results.
# Increase it only in rare cases where the coeffs span a "huge" scale.
MAX_DENOM = 100

fractions = [Fraction(val).limit_denominator(MAX_DENOM)
             for val in coeffs.values()]
ratios = np.array([(f.numerator, f.denominator) for f in fractions])

# As an alternative to the two statements above, uncomment and use
# the statement below on Python 3.8+:
# ratios = np.array([Fraction(val).limit_denominator(MAX_DENOM).as_integer_ratio()
#                    for val in coeffs.values()])

# Scale by the LCM of all denominators so every value becomes integral
factor = np.lcm.reduce(ratios[:, 1])
result = [round(v * factor) for v in coeffs.values()]
# print
result
Output for coeffs = {"H2": 1.0, "O2": 0.5, 'H2O1': 1.0}:
[2, 1, 2]
Output for coeffs = {"H2": 0.5, "N2":0.5, "O2": 1.5, "H1N1O3":1.0}:
[1, 1, 3, 2]
Output for coeffs = {"H2": 1.0, "O3": (1/3), "H2O1":1.0}:
[3, 1, 3]
Output for coeffs = {"H4": 0.5, "O7": (1/7), "H2O1":1.0}:
[7, 2, 14]
Output for coeffs = {"H2": .1, "O2": 0.05, 'H2O1': .1}:
[2, 1, 2]
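For reuse, the same idea can be packaged in a small helper. A minimal sketch (the function name float_coeffs_to_ints is mine, not from the original code):
from fractions import Fraction
import numpy as np

def float_coeffs_to_ints(values, max_denom=100):
    # Convert each float to a "minimal" fraction, then scale everything
    # by the LCM of the denominators so all values become whole numbers.
    fracs = [Fraction(v).limit_denominator(max_denom) for v in values]
    factor = np.lcm.reduce([f.denominator for f in fracs])
    return [round(v * factor) for v in values]

print(float_coeffs_to_ints([1.0, 0.5, 1.0]))  # [2, 1, 2]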
I am not entirely happy with my solution, but it seems to work alright; let me know what you think. I am essentially converting each float to a string and counting the number of characters after the decimal point, so it will work as long as the values are always floats.
import numpy as np
coeffs = {"H2": .1, "O2": 0.05, 'H2O1': .1}
# Largest number of digits after the decimal point across all values
n = max(len(str(i).split('.')[1]) for i in coeffs.values())
c = np.array(list(coeffs.values())) * 10**n
factor = np.gcd.reduce(c.astype(np.uint64))
print((c / factor).astype(np.uint64))
source and other solutions:
Easy way of finding decimal places
Testing: running some potentially difficult cases and converting back to check:
primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 79]  # some prime numbers
primes_over_1 = [1/i for i in primes]

for i in range(1, len(primes_over_1) - 1):
    coeffs = {"H2": primes_over_1[i-1], "O2": primes_over_1[i], 'H2O1': primes_over_1[i+1]}
    print('coefs: ', [a for a in coeffs.values()])
    n = max([len(str(a).split('.')[1]) for a in coeffs.values()])
    c = np.array([a for a in coeffs.values()]) * 10**n
    factor = np.gcd.reduce(c.astype(np.uint64))
    coeffs_asInt = (c / factor).astype(np.uint64)
    print('as int:', coeffs_asInt)
    coeffs_back = coeffs_asInt.astype(np.float64) * (factor / 10**n)
    coeffs_back_str = ["{0:.16g}".format(a) for a in coeffs_back]
    print('back: ', coeffs_back_str)
    print('########################################################\n')
output:
coefs: [0.3333333333333333, 0.2, 0.14285714285714285]
as int: [8333333333333333 5000000000000000 3571428571428571]
back: ['0.3333333333333334', '0.2', '0.1428571428571428']
########################################################
coefs: [0.2, 0.14285714285714285, 0.09090909090909091]
as int: [5000000000000000 3571428571428571 2272727272727273]
back: ['0.2', '0.1428571428571428', '0.09090909090909093']
########################################################
coefs: [0.14285714285714285, 0.09090909090909091, 0.07692307692307693]
as int: [14285714285714284 9090909090909092 7692307692307693]
back: ['0.1428571428571428', '0.09090909090909093', '0.07692307692307694']
########################################################
coefs: [0.09090909090909091, 0.07692307692307693, 0.058823529411764705]
as int: [2840909090909091 2403846153846154 1838235294117647]
back: ['0.09090909090909091', '0.07692307692307693', '0.05882352941176471']
########################################################
coefs: [0.07692307692307693, 0.058823529411764705, 0.05263157894736842]
as int: [2403846153846154 1838235294117647 1644736842105263]
back: ['0.07692307692307693', '0.05882352941176471', '0.05263157894736842']
########################################################
coefs: [0.058823529411764705, 0.05263157894736842, 0.043478260869565216]
as int: [1838235294117647 1644736842105263 1358695652173913]
back: ['0.05882352941176471', '0.05263157894736842', '0.04347826086956522']
########################################################
coefs: [0.05263157894736842, 0.043478260869565216, 0.034482758620689655]
as int: [6578947368421052 5434782608695652 4310344827586207]
back: ['0.05263157894736842', '0.04347826086956522', '0.03448275862068966']
########################################################
coefs: [0.043478260869565216, 0.034482758620689655, 0.012658227848101266]
as int: [21739130434782608 17241379310344828 6329113924050633]
back: ['0.04347826086956522', '0.03448275862068966', '0.01265822784810127']
########################################################
Related
I have a list called x containing 1,000,000 elements (numbers), and I would like to count how many of them are equal to or above each of the thresholds [0.5, 0.55, 0.60, ..., 1]. Is there a way to do it without a for loop?
Right now I have the following code, which works for a specific value in the [0.5, ..., 1] interval, say 0.5, and assigns the result to the count variable:
count=len([i for i in x if i >= 0.5])
EDIT: Basically what I want to avoid is doing this... if possible?
obs = []
alpha = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1]
for a in alpha:
    count = len([i for i in x if i >= a])
    obs.append(count)
Thanks in advance
Best, Mikael
I don't think it's possible without a loop, but you can sort the list x and then use the bisect module (doc) to locate the insertion point (index) of each threshold.
For example:
x = [0.341, 0.423, 0.678, 0.999, 0.523, 0.751, 0.7]
alpha = [0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,1]
x = sorted(x)
import bisect
obs = [len(x) - bisect.bisect_left(x, a) for a in alpha]
print(obs)
Will print:
[5, 4, 4, 4, 3, 2, 1, 1, 1, 1, 0]
Note:
sorted() is O(n log n) and each bisect_left() call is O(log n)
You can use numpy and boolean indexing:
>>> import numpy as np
>>> a = np.array(list(range(100)))
>>> a[a>=50].size
50
Even if you are not writing a for loop yourself, the internal methods still use one; they just iterate much more efficiently.
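If you want all thresholds at once with the same numpy idea, broadcasting avoids the explicit Python loop entirely (a sketch of mine, at the cost of an M-by-N boolean matrix in memory):
import numpy as np

x = np.random.rand(1_000_000)
alpha = np.arange(0.5, 1.01, 0.05)

# Compare every element against every threshold in one vectorized step;
# row i of the boolean matrix corresponds to threshold alpha[i].
counts = (x[None, :] >= alpha[:, None]).sum(axis=1)
print(counts)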
You can use filter, with no explicit for loop on your end:
x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
l = list(filter(lambda i: i >= 0.5, x))
print(len(l))  # the count of elements >= 0.5
Based on comments, you're OK with using numpy, so use np.searchsorted to simply insert alpha into a sorted version of x; subtracting the resulting insertion indices from the array size gives your counts.
If you're ok with sorting x in-place:
x.sort()
counts = x.size - np.searchsorted(x, alpha)
If not,
counts = x.size - np.searchsorted(np.sort(x), alpha)
These counts include elements equal to each alpha, i.e. they give the number of elements with x >= alpha. To count strictly greater elements (x > alpha) instead, add the keyword side='right':
np.searchsorted(x, alpha, side='right')
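A tiny check of the side semantics (my example, not from the original answer):
import numpy as np

x = np.array([0.4, 0.5, 0.5, 0.7])
alpha = np.array([0.5])
print(x.size - np.searchsorted(x, alpha))                # [3] -> counts x >= 0.5
print(x.size - np.searchsorted(x, alpha, side='right'))  # [1] -> counts x > 0.5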
PS
There are a couple of significant problems with the line
count = len([i for i in x if i >= 0.5])
First of all, you're creating a list of all the matching elements instead of just counting them. To count them do
count = sum(1 for i in x if i >= threshold)
Now the problem is that you are doing a linear pass through the entire array for each alpha, which is not necessary.
As I commented under #Andrej Kesely's answer, let's say we have N = len(x) and M = len(alpha). Your implementation is O(M * N) time complexity, while sorting first gives O((M + N) log N). For M << N (few thresholds), your complexity is approximately O(N), which beats O(N log N). But for M ≈ N, yours approaches O(N^2) versus O(N log N).
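A rough way to see this in practice (an illustrative sketch of mine; the exact numbers will vary by machine):
import bisect
import random
import timeit

x = [random.random() for _ in range(100_000)]
alpha = [i / 100 for i in range(50, 101, 5)]

def linear_scans():
    # O(M * N): one full pass over x per threshold
    return [sum(1 for i in x if i >= a) for a in alpha]

def sort_then_bisect():
    # O((M + N) log N): sort once, then binary-search each threshold
    xs = sorted(x)
    return [len(xs) - bisect.bisect_left(xs, a) for a in alpha]

print(timeit.timeit(linear_scans, number=5))
print(timeit.timeit(sort_then_bisect, number=5))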
EDIT: If you are using NumPy already, you can simply do this:
import numpy as np
# Make random data
np.random.seed(0)
x = np.random.binomial(n=20, p=0.5, size=1000000) / 20
bins = np.arange(0.55, 1.01, 0.05)
# One extra value for the upper bound of last bin
bins = np.append(bins, max(bins.max(), x.max()) + 1)
h, _ = np.histogram(x, bins)
result = np.cumsum(h)
print(result)
# [280645 354806 391658 406410 411048 412152 412356 412377 412378 412378]
If you are dealing with large arrays of numbers, you may consider using NumPy. But if you are using simple Python lists, you can do it, for example, like this:
def how_many_bigger(nums, mins):
    # List of counts for each minimum
    counts = [0] * len(mins)
    # For each number
    for n in nums:
        # For each minimum
        for i, m in enumerate(mins):
            # Add 1 to the count if the number is at least the current minimum
            if n >= m:
                counts[i] += 1
    return counts
# Test
import random
# Make random data
random.seed(0)
nums = [random.random() for _ in range(1_000_000)]
# Make minimums
mins = [i / 100. for i in range(55, 101, 5)]
print(mins)
# [0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0]
count = how_many_bigger(nums, mins)
print(count)
# [449771, 399555, 349543, 299687, 249605, 199774, 149945, 99928, 49670, 0]
I have a method that uses the random package to generate a list with a certain probability for each element, for example:
import random
seed = 30
rand = random.Random(seed)
options_list = [1, 2, 3, 4, 5, 6]
prob_weights = [0.1, 0.2, 0.1, 0.05, 0.02, 0.06]
result = rand.choices(options_list, prob_weights, k=4) # k will be <= len(options_list)
My problem is that result can hold two of the same item, and I want the items to be unique.
I could make the k param much larger and then filter out the unique items, but that seems like the wrong way to do it. I looked in the docs and I don't see that the choices function accepts this kind of parameter.
Any ideas how to config random to return a list of unique items?
You can use np.random.choice, which allows you to assign probabilities to each entry and also to generate random samples without replacement. The probabilities, however, must add up to one, so you'll have to divide the weight vector by its L1 norm. Here's how you could do it:
import numpy as np
options_list = np.array([1, 2, 3, 4, 5, 6])
prob_weights = np.array([0.1, 0.2, 0.1, 0.05, 0.02, 0.06])
prob_weights_scaled = prob_weights / sum(prob_weights)
some_length = 4
np.random.choice(a=options_list, size=some_length, replace=False, p=prob_weights_scaled)
Output
array([2, 1, 6, 3])
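On NumPy 1.17+ the same draw can also be written with the Generator API; a sketch along the same lines:
import numpy as np

rng = np.random.default_rng(seed=30)
options_list = np.array([1, 2, 3, 4, 5, 6])
prob_weights = np.array([0.1, 0.2, 0.1, 0.05, 0.02, 0.06])

# Generator.choice also requires probabilities that sum to one
p = prob_weights / prob_weights.sum()
print(rng.choice(options_list, size=4, replace=False, p=p))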
For an arbitrary pair of 2D points in the plane, I want to break the connecting vector into parts specified by a precision factor, while always including the start and end points. As an extra feature, I expect that segmenting the vector from end to beginning gives the same segmentation as going from beginning to end (after flipping it, of course). As far as I can see, numpy.linspace naturally satisfies this condition, except when the precision is so large that the result consists of only one point. Is there any built-in function to take care of this situation, or any hints on how to correct this behaviour?
import numpy as np

alpha = np.array([0, 0])
beta = np.array([1, 1])
alpha_beta_dist = np.linalg.norm(beta - alpha)

for i in range(10):
    precision = np.random.random(1)
    traversal = np.linspace(0.0, 1.0, num=alpha_beta_dist / float(precision))
    traversal2 = np.fliplr([np.linspace(1.0, 0.0, num=alpha_beta_dist / float(precision))])
    traversal2 = traversal2[0]
    if (traversal != traversal2).all():
        print 'precision: ', precision
        print 'traversal: ', traversal
        print 'traversal2: ', traversal2[0]
Make sure num is at least 2:
traversal = np.linspace(0.0, 1.0,
                        num=max(alpha_beta_dist / float(precision), 2))
np.linspace will return both endpoints (by default) unless num is less than 2:
In [23]: np.linspace(0, 1, num=0)
Out[23]: array([], dtype=float64)
In [24]: np.linspace(0, 1, num=1)
Out[24]: array([ 0.])
In [25]: np.linspace(0, 1, num=2)
Out[25]: array([ 0., 1.])
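One caveat of mine, not part of the original answer: current NumPy requires num to be an integer, so the max(...) above needs an explicit conversion. Something like this, with example values for the free variables:
import numpy as np

alpha_beta_dist = np.linalg.norm(np.array([1, 1]) - np.array([0, 0]))
precision = 0.3  # example value

# ceil so the requested spacing is never exceeded; the floor of 2 keeps both endpoints
num = max(int(np.ceil(alpha_beta_dist / precision)), 2)
traversal = np.linspace(0.0, 1.0, num=num)
print(traversal)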
I have a numpy array with maximum value num. I would like to scale all the values in the array by newMaxValue/num so that the new maximum value of the array is newMaxValue. I tried converting the array to float and doing the division afterwards, but I cannot seem to divide and multiply it successfully; I always end up with a zero-valued array.
What is the correct way of doing this?
Thanks
Make sure you convert the max to a float:
>>> from numpy import array
>>> a = array([1, 2, 3, 4, 5])
>>> new_max = 6
>>> a / max(a) # This is probably what happens to you
array([0, 0, 0, 0, 1])
>>> a / float(max(a)) # Convert that integer to a float and it'll work
array([ 0.2, 0.4, 0.6, 0.8, 1. ])
>>> a / float(max(a)) * new_max
array([ 1.2, 2.4, 3.6, 4.8, 6. ])
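Side note (my addition): on Python 3 the / operator already performs true division, so the float() cast is only needed on Python 2:
import numpy as np

a = np.array([1, 2, 3, 4, 5])
new_max = 6
# Integer arrays divide to floats automatically under true division
print(a / a.max() * new_max)  # [1.2 2.4 3.6 4.8 6. ]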
import numpy as np

newMax = 20
myarr = np.random.randint(10, size=(10, 2))
newarr = myarr / float(np.amax(myarr)) * newMax
PS: post your code, you probably made a simple coding mistake.
I have an array of element probabilities, let's say [0.1, 0.2, 0.5, 0.2]. The array sums up to 1.0.
Using plain Python or numpy, I want to draw elements proportionally to their probability: the first element about 10% of the time, the second 20%, the third 50%, etc. The "draw" should return the index of the element drawn.
I came up with this:
import numpy

def draw(probs):
    # probs is assumed to be a NumPy array so the division broadcasts
    cumsum = numpy.cumsum(probs / sum(probs))  # normalize to sum to 1.0, just in case
    return len(numpy.where(numpy.random.rand() >= cumsum)[0])
It works, but it's too convoluted; there must be a better way. Thanks.
import numpy as np

def random_pick(choices, probs):
    '''
    >>> a = ['Hit', 'Out']
    >>> b = [.3, .7]
    >>> random_pick(a, b)
    '''
    cutoffs = np.cumsum(probs)
    idx = cutoffs.searchsorted(np.random.uniform(0, cutoffs[-1]))
    return choices[idx]
How it works:
In [22]: import numpy as np
In [23]: probs = [0.1, 0.2, 0.5, 0.2]
Compute the cumulative sum:
In [24]: cutoffs = np.cumsum(probs)
In [25]: cutoffs
Out[25]: array([ 0.1, 0.3, 0.8, 1. ])
Compute a uniformly distributed random number in the half-open interval [0, cutoffs[-1]):
In [26]: np.random.uniform(0, cutoffs[-1])
Out[26]: 0.9723114393023948
Use searchsorted to find the index where the random number would be inserted into cutoffs:
In [27]: cutoffs.searchsorted(0.9723114393023948)
Out[27]: 3
Return choices[idx], where idx is that index.
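Since searchsorted accepts an array of values, the same idea extends to drawing many indices at once (a sketch of mine, not part of the original answer):
import numpy as np

probs = [0.1, 0.2, 0.5, 0.2]
cutoffs = np.cumsum(probs)

# One uniform draw per sample; searchsorted maps each draw onto an index
draws = cutoffs.searchsorted(np.random.uniform(0, cutoffs[-1], size=10))
print(draws)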
You want to sample from the categorical distribution, which is not implemented in numpy. However, the multinomial distribution is a generalization of the categorical distribution and can be used for that purpose.
>>> import numpy as np
>>>
>>> def sampleCategory(p):
... return np.flatnonzero( np.random.multinomial(1,p,1) )[0]
...
>>> sampleCategory( [0.1,0.5,0.4] )
1
Use numpy.random.multinomial -- it's the most efficient option.
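To draw many samples at once along these lines (a hedged sketch of mine): multinomial with a size argument returns one one-hot row per draw, and argmax recovers each category index:
import numpy as np

p = [0.1, 0.5, 0.4]
# Each row is a one-hot vector; argmax along axis 1 gives the drawn index
samples = np.random.multinomial(1, p, size=1000).argmax(axis=1)
print(np.bincount(samples) / 1000)  # should roughly match p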
I've never used numpy, but I assume my code below (plain Python only) does the same thing as what you accomplished in one line. I'm putting it here just in case you want it.
It looks very C-ish, so apologies for not being very Pythonic.
weight_total would be 1 for you.
import random

def draw(probs, weight_total=1):
    # uniform rather than randrange: the weights here are floats,
    # and randrange(1) would always return 0
    r = random.uniform(0, weight_total)
    running_total = 0
    for i, p in enumerate(probs):
        running_total += p
        if running_total > r:
            return i
use bisect
import bisect
import numpy

def draw(probs):
    # probs is assumed to be a NumPy array so the division broadcasts
    cumsum = numpy.cumsum(probs / sum(probs))
    return bisect.bisect_left(cumsum, numpy.random.rand())
should do the trick.