So I have a script that reads multiple values from a file, and creates lists with these values.
I have a list of fractional coordinates of atoms (where each element is a list containing x, y, z coords) and their corresponding charges in another list. I also have three values that are scalars and correspond to the dimensions I am working in.
Here is a snippet of how the lists look:
Coords = [[0.982309, 0.927798, 0.458125], [0.017691, 0.072202, 0.958125], [0.482309, 0.572202, 0.458125], [0.517691, 0.427798, 0.958125], [0.878457, 0.311996, 0.227878], [0.121543, 0.688004, 0.727878], [0.378457, 0.188004, 0.227878], [0.621543, 0.811996, 0.727878], [0.586004, 0.178088, 0.37778], [0.413997, 0.821912, 0.87778], [0.086003, 0.321912, 0.37778], ......]
Charges = [0.18, 0.18, 0.18, 0.18, 0.17, 0.17, 0.17, 0.17, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.39, 0.39, 0.39, 0.39, 0.4, 0.4, 0.4, 0.4, 0.17, 0.17, 0.17....]
Then I have the three dimension values that I will call a, b, c.
Here is what I need to calculate:
I need to do an inner (dot) product of the fractional coordinates with the dimensions, for each atom. That is, for each sub-list I need to multiply the first element by a, the second element by b and the third by c. I also need to multiply each of these components by the corresponding charge. This will give me the dipole moments that I finally need to calculate.
Detailed example:
So I want to take each element in the list. We'll start with element 0.
So coords[0] = [0.982309, 0.927798, 0.458125], and charge[0] = 0.18
What I want to do, is take the first element of coords[0], multiply that by a. Then the second by b, and the third by c. Then, I want to multiply each of these elements by charges[0], 0.18, and sum the three together. That gives me the dipole moment due to one atom. I want to repeat this for every atom, so coords[1] with charges[1], coords[2] with charges[2] and so forth. Then, I want to sum all of these together, and divide by a * b * c to give the polarization.
I'm still new to using Python, and so I am quite unsure on even where to start with this! I hope I have explained well enough, but if not I can clarify where needed.
Side note:
I also then need to change some of these fractional coordinates and repeat the calculation. I am fairly certain I know how to do this, here's how it would look.
for x in displacement:
    fracCoord[0] = fracCoord[0] + x
So I would change the relevant values in the fractional coordinates list by constant amount before repeating the calculations.
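A slightly fuller version of that sketch (compute_polarization is a hypothetical wrapper for the calculation above, not something already defined; exactly which entries get shifted depends on your data):
# compute_polarization is a hypothetical helper wrapping the calculation
# described above, taking the coordinate list, charge list and dimensions.
for x in displacement:
    shifted = [list(c) for c in Coords]  # copy so the originals stay intact
    shifted[0][0] += x                   # displace the relevant fractional coordinate
    print(compute_polarization(shifted, Charges, a, b, c))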
You could use the numpy package; it is designed for such (and more complex) numerical computations based on matrices and arrays.
Then, you have:
import numpy as np
# Assume you have N atoms.
# N x 3 matrix where each row is the coordinates of an atom
CoordsMat = np.array(Coords)
# N x 1 vector where each value is charge of atom
ChargesVec = np.array(Charges)
# 3 x 1 vector, the weights for each coordinate
dims = np.array([a, b, c])
# N x 1 vector, computes dot product for each atom's coordinates with the weights
positionVecs = np.matmul(CoordsMat, dims)
# N x 1 vector, scales each dot product by the charges vector
dipoleMoments = np.multiply(positionVecs, ChargesVec)
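To finish the calculation described in the question, sum the per-atom moments and divide by a * b * c:
# Sum of all per-atom dipole moments, divided by a * b * c,
# gives the polarization requested in the question.
polarization = np.sum(dipoleMoments) / (a * b * c)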
Related
Here is the problem that I have to solve in Python.
I have to create an array of random numbers that add up to 1. But, there are a few conditions to consider.
The number of elements in the array is fixed. For example, let's take a list of size 7.
Certain elements in this array need to be higher or lower than a certain number. Let's say that the second element needs to be higher than 0.4, the third element needs to be smaller than 0.2, and the seventh element needs to be lower than 0.1.
After steps 1 and 2, the sum of the list should be 1.
I do not intend to select the results that satisfy these conditions, instead I want to create generic arrays for each loop that already satisfies the conditions.
The script will tell me how long the list should be and which elements need to be higher or lower than certain values. The information I can use is an array that looks like this:
higher_than = 0.4
lower_than_1 = 0.2
lower_than_2 = 0.1
array = [1, 2, 0, 1, 1, 1, 0]
Here 2 means that the resulting weight should be higher than "higher_than". The first 0 means that the corresponding element in the result should be lower than "lower_than_1", and the second 0 means that its element should be lower than "lower_than_2". I will then use this info to come up with a solution to the problem stated above.
I would very much like to hear your insights and solutions to the problem. Thank you in advance.
There are lots of ways you could do this. Here's one.
Start with your higher_than buckets. If you have more than two, there's no solution (three floors of 0.4 would already exceed 1). Otherwise allocate the floor to each of them (0.4 in your example).
Next, treat each bucket as having a cap. The lower-thans are given. For the higher_thans, use 1.0 less the higher_than floor (so 0.6 here).
Draw rand() * cap for each bucket. Rescale these draws so that they fill the capacity remaining after the initial allocation to the higher_than buckets, and complete the allocation.
E.g., say we have two buckets with a cap of 0.1, two with a cap of 0.2, and one with a floor of 0.4.
1. allocate weight to the higher_than bucket.
weights = [0.0, 0.0, 0.0, 0.0, 0.4] (buckets in order I listed them)
2. rand() * cap:
unscaled allocation = [0.73 * 0.1, 0.24 * 0.1, 0.34 * 0.2, 0.87 * 0.2, 0.33 * 0.6] = [0.073, 0.024, 0.068, 0.174, 0.198]
Now the scaling is a bit tricky because the naive approach, scaling everything by the ratio of the desired to current allocation, risks exceeding a lower_than cap.
Handle this by applying the scaling factor to the array of (cap - unscaled allocation) values, i.e. [0.1 - 0.073, 0.1 - 0.024, 0.2 - 0.068, 0.2 - 0.174, 0.6 - 0.198] = [0.027, 0.076, 0.132, 0.026, 0.402]
We want to use this to scale the unscaled allocation array so that it sums to 0.6. Currently it sums to 0.537, and the caps_less_allocation array sums to 0.663. We need to boost our allocation by 0.6 - 0.537 = 0.063. So we multiply everything in the caps_less_allocation array by 0.063/0.663, then sum up our three arrays:
[0.0, 0.0, 0.0, 0.0, 0.4] - initial weight array
[0.073, 0.024, 0.068, 0.174, 0.198] - unscaled allocation array
[0.00257, 0.00722, 0.01254, 0.00247, 0.03820] - additive scaling factors (rounded)
---------------------------------------------------------------------
[0.07557, 0.03122, 0.08054, 0.17647, 0.63620]
Now we have a random array that meets our constraints and sums to 1.0
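Here is a minimal sketch of that procedure in Python. It assumes the floors have already been checked for feasibility and that the random draws undershoot the remaining capacity (a negative shortfall would shrink the draws instead); the function name and argument layout are mine, not from the question:
import random

def constrained_weights(caps, floors):
    # Step 1: allocate each floor up front (0.0 for the lower_than buckets).
    weights = list(floors)
    remaining = 1.0 - sum(floors)
    # Step 2: draw rand() * cap for each bucket.
    unscaled = [random.random() * c for c in caps]
    # Step 3: spread the remaining shortfall in proportion to each bucket's
    # headroom (cap - draw), so no cap can be exceeded.
    headroom = [c - u for c, u in zip(caps, unscaled)]
    factor = (remaining - sum(unscaled)) / sum(headroom)
    return [w + u + h * factor for w, u, h in zip(weights, unscaled, headroom)]

# The worked example: two caps of 0.1, two of 0.2, one floor of 0.4 (cap 0.6).
w = constrained_weights([0.1, 0.1, 0.2, 0.2, 0.6], [0.0, 0.0, 0.0, 0.0, 0.4])
print(w, sum(w))  # sums to 1.0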
I have a Python NxN numpy pairwise array (matrix) of double values. Each array element, e.g. (i, j), is a measurement between the i and j items. The diagonal, where i == j, is 1, as it's a pairwise measurement of an item with itself. This also means the matrix is symmetric, so it can be represented in triangular form (one half of the numpy array is identical to the other half across the diagonal).
A truncated representation:
[[1. 0.11428571 0.04615385 ... 0.13888889 0.07954545 0.05494505]
[0.11428571 1. 0.09836066 ... 0.06578947 0.09302326 0.07954545]
[0.04615385 0.09836066 1. ... 0.07843137 0.09821429 0.11711712]
...
[0.13888889 0.06578947 0.07843137 ... 1. 0.34313725 0.31428571]
[0.07954545 0.09302326 0.09821429 ... 0.34313725 1. 0.64130435]
[0.05494505 0.07954545 0.11711712 ... 0.31428571 0.64130435 1. ]]
I want to get out the smallest N values whilst not including the pairwise values twice, as would be the case due to the pair-wise duplication e.g., (5,6) == (6,5), and I do not want to include any of the identical diagonal values of 1 where i == j.
I understand that numpy has the partition method and I've seen plenty of examples for a flat array, but I'm struggling to find anything straightforward for a pair-wise comparison matrix.
EDIT #1
Based on my first response below I implemented:
seventyPercentInt: int = round((populationSizeInt/100)*70)
upperTriangleArray = dataArray[np.triu_indices(len(dataArray),1)]
seventyPercentArray = upperTriangleArray[np.argpartition(upperTriangleArray,seventyPercentInt)][0:seventyPercentInt]
print(len(np.unique(seventyPercentArray)))
The upperTriangleArray numpy array has 1133265 elements to pick the lowest k from. In this case k is represented by seventyPercentInt, which is around 1054 values. However, when I apply np.argpartition only the value of 0 is returned.
The flat array upperTriangleArray is reduced to a shape (1133265,).
SOLUTION
As per the first reply below (the accepted answer), my code that worked:
upperTriangleArray = dataArray[np.triu_indices(len(dataArray),1)]
seventyPercentInt: int = round((len(upperTriangleArray)/100)*70)
seventyPercentArray = upperTriangleArray[np.argpartition(upperTriangleArray,seventyPercentInt)][0:seventyPercentInt]
I ran into some slight trouble (my own making), with the seventyPercentInt. Rather than taking 70% of the pairwise elements, I took 70% of the elements to be compared. Two very different values.
You can use np.triu_indices to keep only the values of the upper triangle.
Then you can use np.argpartition as in the example below.
import numpy as np
A = np.array([[1.0, 0.1, 0.2, 0.3],
              [0.1, 1.0, 0.4, 0.5],
              [0.2, 0.4, 1.0, 0.6],
              [0.3, 0.5, 0.6, 1.0]])
A_upper_triangle = A[np.triu_indices(len(A), 1)]
print(A_upper_triangle)
# returns [0.1 0.2 0.3 0.4 0.5 0.6]
k=2
print(A_upper_triangle[np.argpartition(A_upper_triangle, k)][0:k])
# returns [0.1 0.2]
I am trying to take the reciprocal of every non zero value in a numpy array but am messing something up. Suppose:
norm = np.arange(0,11)
I would like the np.array that would be (maintaining the zeros in place)
[ 0, 1, 0.5 , 0.33, 0.25, 0.2 , 0.17, 0.14, 0.12, 0.11, 0.1]
If I set
mask = norm !=0
and I try
1/norm[mask]
I receive the expected result of
[1, 0.5 , 0.33, 0.25, 0.2 , 0.17, 0.14, 0.12, 0.11, 0.1]
However I'm trying to understand why is it that when I try the following assignment
norm[mask] = 1/norm[mask]
i get the following numpy array.
[0,1,0,0,0,0,0,0,0,0,0]
any ideas on why this is or how to achieve the desired np.array?
Are you sure you didn't accidentally change the value of norm?
Both
mask = norm != 0
norm[mask] = 1 / norm[mask]
and
norm[norm != 0] = 1 / norm[norm != 0]
do exactly what you want them to do. I also tried it using mask on the left side and norm != 0 on the right side, like you do above (why?), and it works fine.
EDIT BY FY: I misread the example. I thought original poster was starting with [0, .5, .333, .25] rather than with [0, 1, 2, 3, 4]. Poster is accidentally creating an int64 array rather than a floating point array, and everything is rounding down to zero. Change it to np.arange(0., 11.)
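A quick demonstration of that fix, using the float range suggested above:
import numpy as np

norm = np.arange(0.0, 11.0)  # float start/stop -> float64 array
mask = norm != 0
norm[mask] = 1 / norm[mask]  # the fractions now survive the assignment
print(norm)                  # [0.  1.  0.5  0.333...  ...  0.1]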
Another option is using numpy.reciprocal, as documented here, with the where parameter. Note that entries where the condition is False are left uninitialized unless you also pass an out array, so supply one:
import numpy as np
data = np.reciprocal(data, where=data != 0, out=np.zeros_like(data))
example:
In [1]: data = np.array([2.0, 4.0, 0.0])
In [2]: np.reciprocal(data, where=data != 0, out=np.zeros_like(data))
Out[2]: array([0.5 , 0.25, 0.  ])
Notice that this function is not intended to work with ints; that is why the values here are initialized with the .0 suffix.
If you're not sure of the type, you can always use data.astype(np.float64).
Here's a custom function that allows stepping through decimal increments:
def my_range(start, stop, step):
    i = start
    while i < stop:
        yield i
        i += step
It works like this:
out = list(my_range(0, 1, 0.1))
print(out)
[0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999]
Now, there's nothing surprising about this. It's understandable this happens because of floating point inaccuracies and that 0.1 has no exact representation in memory. So, those precision errors are understandable.
Take numpy on the other hand:
import numpy as np
out = np.arange(0, 1, 0.1)
print(out)
array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
What's interesting is that there are no visible precision errors introduced here. I thought this might have to do with what the __repr__ shows, so to confirm, I tried this:
x = list(my_range(0, 1.1, 0.1))[-1]
print(x.is_integer())
False
x = list(np.arange(0, 1.1, 0.1))[-1]
print(x.is_integer())
True
So, my function returns an incorrect upper value (it should be 1.0 but it is actually 1.0999999999999999), but np.arange does it correctly.
I'm aware of Is floating point math broken? but the point of this question is:
How does numpy do this?
The difference in endpoints is because NumPy calculates the length up front instead of ad hoc, because it needs to preallocate the array. You can see this in the _calc_length helper. Instead of stopping when it hits the end argument, it stops when it hits the predetermined length.
Calculating the length up front doesn't save you from the problems of a non-integer step, and you'll frequently get the "wrong" endpoint anyway, for example, with numpy.arange(0.0, 2.1, 0.3):
In [46]: numpy.arange(0.0, 2.1, 0.3)
Out[46]: array([ 0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1])
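A minimal sketch of that length computation (mirroring the idea of _calc_length, not its exact code) shows why 2.1 sneaks in:
import math

def arange_length(start, stop, step):
    # NumPy-style up-front length: ceil((stop - start) / step).
    return math.ceil((stop - start) / step)

# 2.1 / 0.3 is 7.000000000000001 in floating point, so the ceiling is 8
# and arange produces 8 elements, the last one being 2.1.
print(arange_length(0.0, 2.1, 0.3))  # 8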
It's much safer to use numpy.linspace, where instead of the step size, you say how many elements you want and whether you want to include the right endpoint.
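For example, to reproduce the array above deliberately:
import numpy as np

# Eight evenly spaced points; the right endpoint is included by default
# (endpoint=True), so hitting 2.1 is by design rather than by accident.
print(np.linspace(0.0, 2.1, 8))
# [0.   0.3  0.6  0.9  1.2  1.5  1.8  2.1]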
It might look like NumPy has suffered no rounding error when calculating the elements, but that's just due to different display logic. NumPy is truncating the displayed precision more aggressively than float.__repr__ does. If you use tolist to get an ordinary list of ordinary Python scalars (and thus the ordinary float display logic), you can see that NumPy has also suffered rounding error:
In [47]: numpy.arange(0, 1, 0.1).tolist()
Out[47]:
[0.0,
0.1,
0.2,
0.30000000000000004,
0.4,
0.5,
0.6000000000000001,
0.7000000000000001,
0.8,
0.9]
It's suffered slightly different rounding error - for example, in .6 and .7 instead of .8 and .9 - because it also uses a different means of computing the elements, implemented in the fill function for the relevant dtype.
The fill function implementation has the advantage that it uses start + i*step instead of repeatedly adding the step, which avoids accumulating error on each addition. However, it has the disadvantage that (for no compelling reason I can see) it recomputes the step from the first two elements instead of taking the step as an argument, so it can lose a great deal of precision in the step up front.
While arange does step through the range in a slightly different way, it still has the float representation issue:
In [1358]: np.arange(0,1,0.1)
Out[1358]: array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
The print hides that; convert it to a list to see the gory details:
In [1359]: np.arange(0,1,0.1).tolist()
Out[1359]:
[0.0,
0.1,
0.2,
0.30000000000000004,
0.4,
0.5,
0.6000000000000001,
0.7000000000000001,
0.8,
0.9]
or with another iteration
In [1360]: [i for i in np.arange(0,1,0.1)] # e.g. list(np.arange(...))
Out[1360]:
[0.0,
0.10000000000000001,
0.20000000000000001,
0.30000000000000004,
0.40000000000000002,
0.5,
0.60000000000000009,
0.70000000000000007,
0.80000000000000004,
0.90000000000000002]
In this case each displayed item is an np.float64, whereas in the first each is a float.
Aside from the different representation of lists and arrays, NumPy's arange works by multiplying instead of repeatedly adding. It's more like:
def my_range2(start, stop, step):
    i = 0
    while start + (i * step) < stop:
        yield start + (i * step)
        i += 1
Then the output is completely equal:
>>> np.arange(0, 1, 0.1).tolist() == list(my_range2(0, 1, 0.1))
True
With repeated addition you would "accumulate" floating point rounding errors. The multiplication is still affected by rounding but the error doesn't accumulate.
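A quick illustration of the difference:
s = 0.0
for _ in range(1000):
    s += 0.1           # repeated addition accumulates rounding error
print(s)               # slightly below 100.0 (e.g. 99.9999999999986)
print(1000 * 0.1)      # 100.0: one multiplication, one rounding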
As pointed out in the comments, that's not really what is happening. As far as I can see, it's more like:
import math

def my_range2(start, stop, step):
    length = math.ceil((stop - start) / step)
    # The next two lines are mostly so the function really behaves like NumPy does
    # Remove them to get better accuracy...
    next = start + step
    step = next - start
    for i in range(length):
        yield start + (i * step)
But not sure if that's exactly right either because there's a lot more going on in NumPy.
I am having a problem fitting the following data to the range 0.1-1.0:
t=[0.23,0.76,0.12]
Obviously each item in the t-list falls within the range 0.1-1.0, but the output of my code indicates the opposite.
My attempt
import numpy as np
>>> g=np.arange(0.1,1.0,0.1)
>>> t=[0.23,0.76,0.12]
>>> t2=[x for x in t if x in g]
>>> t2
[]
Desired output:[0.23,0.76,0.12]
I clearly understand that using an interval of 0.1 will make it difficult to find any of the t-list items in the specified arange. I could have made some adjustments, but my range is fixed and my data is large, which makes it practically impossible to keep adjusting the range.
Any suggestions on how to get around this? Thanks.
Did you try to inspect g?
>>> g
array([ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
So clearly none of your elements is in g.
Probably, you look for something like
>>> [x for x in t if 0.1<=x<=1.0]
[0.23, 0.76, 0.12]
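Since the data is large, a vectorized version of the same test may help (a sketch, assuming t is first converted to a numpy array):
import numpy as np

t = np.array([0.23, 0.76, 0.12])
t2 = t[(t >= 0.1) & (t <= 1.0)]  # boolean mask instead of a Python loop
print(t2)                        # [0.23 0.76 0.12]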