+= with numpy.array object modifying original object - python

In the following code, I am attempting to calculate both the frequency and the sum of a set of vectors (numpy vectors):
def calculate_means_on(the_labels, the_data):
    freq = dict()
    sums = dict()
    means = dict()
    total = 0
    for index, a_label in enumerate(the_labels):
        this_data = the_data[index]
        if a_label not in freq:
            freq[a_label] = 1
            sums[a_label] = this_data
        else:
            freq[a_label] += 1
            sums[a_label] += this_data
Suppose the_data (a numpy 'matrix') is originally :
[[ 1. 2. 4.]
[ 1. 2. 4.]
[ 2. 1. 1.]
[ 2. 1. 1.]
[ 1. 1. 1.]]
After running the above code, the_data becomes:
[[ 3. 6. 12.]
[ 1. 2. 4.]
[ 7. 4. 4.]
[ 2. 1. 1.]
[ 1. 1. 1.]]
Why is this? I've narrowed it down to the line sums[a_label] += this_data: when I change it to sums[a_label] = sums[a_label] + this_data it behaves as expected; i.e., the_data is not modified.

This line:
this_data = the_data[index]
takes a view, not a copy, of a row of the_data. The view is backed by the original array, and mutating the view will write through to the original array.
This line:
sums[a_label] = this_data
inserts that view into the sums dict, and this line:
sums[a_label] += this_data
mutates the original array through the view, since += requests that the operation be performed by mutation instead of by creating a new object, when the object is mutable.
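The difference is easy to demonstrate in isolation (a small self-contained sketch, not the question's exact data):

```python
import numpy as np

the_data = np.array([[1., 2., 4.],
                     [1., 2., 4.]])
row = the_data[0]         # a view into the_data, not a copy
row += the_data[1]        # in-place add: writes through to the_data
print(the_data[0])        # [2. 4. 8.] -- the original array changed

# taking an explicit copy breaks the link
the_data = np.array([[1., 2., 4.],
                     [1., 2., 4.]])
row = the_data[0].copy()  # an independent copy
row += the_data[1]
print(the_data[0])        # [1. 2. 4.] -- the original is untouched
```

In the question's function, writing sums[a_label] = this_data.copy() would therefore also fix the problem, at the cost of one extra row copy per label.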


adding values in a 2d array provided that the first value is greater than 5

This question seems elementary, but for some reason I can't get it to work. I have a 2D list, and I need to add rows together so that the sum of the first number in each merged group is not less than 5 (only adjacent rows may be summed). For example:
array([[ 0. , 3.817549],
[ 3. , 21.275711],
[ 11. , 59.286198],
[ 47. , 110.136649],
[132. , 153.451585],
[263. , 171.041259],
[301. , 158.872652],
[198. , 126.488376],
[ 50. , 200.63002 ]])
and I need output like this:
array([[ 14. , 84.3794...],
[ 47. , 110.136649],
[132. , 153.451585],
[263. , 171.041259],
[301. , 158.872652],
[198. , 126.488376],
[ 50. , 200.63002 ]])
Try:
arr = np.array([[0., 3.817549],
                [3., 21.275711],
                [11., 59.286198],
                [47., 110.136649],
                [132., 153.451585],
                [263., 171.041259],
                [301., 158.872652],
                [198., 126.488376],
                [50., 200.63002]])
for i in range(len(arr)):
    if arr[i, 0] >= 5.0:
        arr = arr[i:, :]
        break
    else:
        arr[i + 1, :] += arr[i, :]
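As a self-contained check, running that loop on the sample data reproduces the expected first row (3.817549 + 21.275711 + 59.286198 = 84.379458):

```python
import numpy as np

arr = np.array([[0., 3.817549], [3., 21.275711], [11., 59.286198],
                [47., 110.136649], [132., 153.451585], [263., 171.041259],
                [301., 158.872652], [198., 126.488376], [50., 200.63002]])
# merge leading rows into the next row until the first column reaches 5
for i in range(len(arr)):
    if arr[i, 0] >= 5.0:
        arr = arr[i:, :]  # drop the rows that were merged away
        break
    else:
        arr[i + 1, :] += arr[i, :]
print(arr[0])  # [14. 84.379458]
```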
I'm not entirely sure I understand the question, but I will try to help.
I would approach this problem with the following steps:
Create a separate 2D list to store your final output and a two-value accumulator list to temporarily store values. Initialize the accumulator to the values at index 0 of your input array.
Iterate over the remaining rows of the original 2D list.
For each row:
a. if accumulator[0] >= 5, add the accumulated values to your output and then set the accumulator to the values of the current row
b. otherwise, add the values of the current row to your accumulator
After the loop, flush whatever is left in the accumulator to the output.
The following code takes your input and reproduces the exact output you wanted:
# Assuming current_vals is the input list...
def merge_rows(current_vals):  # wrapped in a function so the return is valid
    final_vals = []
    accumulator = [current_vals[0][0], current_vals[0][1]]
    for sublist_index in range(1, len(current_vals)):
        if accumulator[0] >= 5:
            final_vals.append([accumulator[0], accumulator[1]])
            accumulator[0] = current_vals[sublist_index][0]
            accumulator[1] = current_vals[sublist_index][1]
        else:
            accumulator[0] += current_vals[sublist_index][0]
            accumulator[1] += current_vals[sublist_index][1]
    final_vals.append([accumulator[0], accumulator[1]])  # flush the last group
    return final_vals

Numpy method to return the index of the occurrence of an array within an array of arrays

I have an array of arrays that represents a set of unique colour values:
[[0. 0. 0. ]
[0. 0. 1. ]
[0. 1. 1. ]
[0.5019608 0.5019608 0.5019608 ]
[0.64705884 0.16470589 0.16470589]
[0.9607843 0.9607843 0.8627451 ]
[1. 0. 0. ]
[1. 0.84313726 0. ]
[1. 1. 0. ]
[1. 1. 1. ]]
And another numpy array that represents one of the colours:
[0.9607843 0.9607843 0.8627451 ]
I need a function to find the index where the colour array occurs in the set of colours, i.e. the function should return 5 for the arrays above.
numpy.where() returns the exact positions in the array of the values matching a given condition. So here it would be as follows (denoting the big array as arr1 and the sought vector as arr2):
np.where(np.all(arr1 == arr2, axis=1))
which returns an array of the row indexes of the sought rows.
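Putting it together with the colour table from the question (trimmed to six rows here so the target is still at index 5; note that np.where returns a tuple of index arrays, hence the [0]):

```python
import numpy as np

colours = np.array([[0., 0., 0.],
                    [0., 0., 1.],
                    [0., 1., 1.],
                    [0.5019608, 0.5019608, 0.5019608],
                    [0.64705884, 0.16470589, 0.16470589],
                    [0.9607843, 0.9607843, 0.8627451]])
target = np.array([0.9607843, 0.9607843, 0.8627451])

# rows where every component matches the target colour
rows = np.where(np.all(colours == target, axis=1))[0]
print(rows[0])  # 5
```

Exact float equality is fine here because the target was taken verbatim from the table; for computed colours, np.isclose would be safer than ==.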
Assuming that this is a relatively short list of colors (<1000), the simplest thing to do is probably just iterate over the list and compare each element of the sub-array.
color_list = ...
color_index = -1
target_color = [0.9607843, 0.9607843, 0.8627451]
for i in range(len(color_list)):
    cur_color = color_list[i]
    if (cur_color[0] == target_color[0] and
            cur_color[1] == target_color[1] and
            cur_color[2] == target_color[2]):
        color_index = i
        break

negative values in my probability vector

Hi, I want to create a probability vector from my 2-dimensional array. I wrote a function myself that iterates through the elements and calculates the probability of each value. When I enter only positive values everything works, but as soon as there is a negative number I get a negative probability, which shouldn't be possible since each probability must satisfy 0 <= x <= 1.
def createProbabilityVector(inputArray):
    vector = inputArray
    probabilityVector = np.zeros(vector.shape)
    for x in range(vector.shape[0]):
        vectorSum = sum(vector[x])
        probabilityVector[[x]] = vector[[x]] / vectorSum
    return probabilityVector
Is the mistake in the code, or do I simply misunderstand what I'm trying to do?
edit: some examples
input
[[ 1.62242568 1.27356428 -1.88008155 1.37183247]
[-1.10638392 0.18420085 -1.68558966 -1.59951709]
[ 1.79166467 -0.21911691 -1.29066019 0.4565108 ]
[-0.20459109 1.59912774 0.47735207 1.6398782 ]]
output:
[[ 0.67948147 0.53337625 -0.78738927 0.57453155]
[ 0.26296832 -0.04378136 0.4006355 0.38017754]
[ 2.42642012 -0.2967462 -1.74791851 0.61824459]
[-0.05825873 0.45536272 0.13592931 0.4669667 ]]
-----
input
[[ 1.50162225 -0.31502279 -1.40281248 -1.09221922]
[ 1.93663826 1.31671237 -1.14334774 1.54792572]
[ 1.21376416 -1.44547074 0.0045907 1.4099986 ]
[ 0.51903455 -0.80046238 -1.69780354 -1.29893969]]
output:
[[-1.14764998 0.24076355 1.0721323 0.83475413]
[ 0.52943577 0.3599612 -0.31256699 0.42317002]
[ 1.02610693 -1.2219899 0.00388094 1.19200202]
[-0.15833053 0.24417956 0.51791182 0.39623914]]
-----
input
[[-1.6333837 -0.50469549 -1.62305585 -1.43558978]
[ 0.29636416 -0.22401163 -1.82816273 0.10676174]
[-1.6599302 -0.2516563 -1.64843802 -0.86857615]
[ 1.31762542 0.8690911 1.5888384 -1.83204102]]
output:
[[ 0.31431022 0.09711799 0.31232284 0.27624895]
[-0.17971828 0.13584296 1.10861674 -0.06474142]
[ 0.37482047 0.05682524 0.37222548 0.1961288 ]
[ 0.67796038 0.44717514 0.81750812 -0.94264364]]
-----
input
[[ 0.15369025 1.05426071 -0.61295255 0.95033555]
[ 0.04138761 -1.41072628 1.90319561 -1.2563338 ]
[ 1.85131197 -1.24551221 -1.62731374 0.43129381]
[ 0.21235188 1.21581691 -0.57470021 -0.58482563]]
output:
[[ 0.09945439 0.68222193 -0.3966473 0.61497099]
[-0.05728572 1.95262488 -2.63426518 1.73892602]
[-3.1366464 2.11025017 2.75713 -0.73073377]
[ 0.79046139 4.52577253 -2.13927148 -2.17696245]]
You need to transform all the values of the input array into non-negative values. A few alternatives are:
Convert all the negatives to 0, function zeroed
Shift all the values by the absolute value of the minimum element, function shifted
Apply the exponential function to the values, function exponential
After you have converted the values of the input array you can use your function as usual. The definitions of the transformation functions follow:
def zeroed(arr):
    return arr.clip(min=0)

def shifted(arr):
    return arr + abs(np.min(arr))

def exponential(arr):
    return np.exp(arr)
In your function you can use the transformation as follows:
def createProbabilityVector(inputArray):
    vector = inputArray
    probabilityVector = np.zeros(vector.shape)
    for x in range(vector.shape[0]):
        new_vector = zeroed(vector[x])
        vectorSum = sum(new_vector)
        probabilityVector[[x]] = new_vector / vectorSum
    return probabilityVector
The function zeroed can be replaced by shifted or exponential. For the input:
array = np.array([[1.62242568, 1.27356428, -1.88008155, 1.37183247],
[-1.10638392, 0.18420085, -1.68558966, -1.59951709],
[1.79166467, -0.21911691, -1.29066019, 0.4565108],
[-0.20459109, 1.59912774, 0.47735207, 1.6398782]])
These are the results for the function zeroed:
[[0.38015304 0.29841079 0. 0.32143616]
[0. 1. 0. 0. ]
[0.79694165 0. 0. 0.20305835]
[0. 0.43029432 0.1284462 0.44125948]]
for shifted:
[[0.35350056 0.31829072 0. 0.32820872]
[0.22847732 0.73756992 0. 0.03395275]
[0.52233595 0.18158552 0. 0.29607853]
[0. 0.41655061 0.15748787 0.42596152]]
and exponential:
[[0.39778013 0.28063027 0.01198184 0.30960776]
[0.17223667 0.62606504 0.09651165 0.10518664]
[0.69307072 0.09279107 0.03177905 0.18235916]
[0.06504215 0.39494808 0.12863496 0.41137482]]
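As a side note, the exponential variant is exactly the softmax function, and the row loop can be avoided entirely with broadcasting. A vectorized sketch (softmax_rows is a name chosen here, not from the question):

```python
import numpy as np

def softmax_rows(arr):
    # subtract each row's max first for numerical stability
    # (this does not change the mathematical result)
    e = np.exp(arr - arr.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

array = np.array([[1.62242568, 1.27356428, -1.88008155, 1.37183247],
                  [-1.10638392, 0.18420085, -1.68558966, -1.59951709],
                  [1.79166467, -0.21911691, -1.29066019, 0.4565108],
                  [-0.20459109, 1.59912774, 0.47735207, 1.6398782]])
print(softmax_rows(array)[0])  # ≈ [0.39778 0.28063 0.01198 0.30961]
```

This reproduces the "exponential" results above, and every row sums to exactly 1 by construction.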

Compare items of a list sequentially with another one then use one by one in VPython application

I have a grid of objects in VPython created by this code:
iX = [(x - pointW // 2) * sclFact for x in range(pointW)]
iY = [(x - pointH // 2) * sclFact for x in range(pointH)]
iYr = iY[::-1]
xy = list(itertools.product(iX, iYr))
ixyz = np.array(list(itertools.product(iX, iYr, [-0.0])))
for element in ixyz:
    cube = box(pos=element,
               size=(.1, .1, .1))
Printing the ixyz list looks like this:
[[-0.5 0. -0. ]
[-0.5 -0.5 -0. ]
[ 0. 0. -0. ]
[ 0. -0.5 -0. ]
[ 0.5 0. -0. ]
[ 0.5 -0.5 -0. ]]
I have another list in which the z values sometimes change depending on certain input, and it is always being updated; it will look like this:
[[-0.5 0. -0. ]
[-0.5 -0.5 -0. ]
[ 0. 0. -0. ]
[ 0. -0.5 -0. ]
[ 0.5 0. -2.3570226]
[ 0.5 -0.5 -0. ]]
I want to move the objects based on the new list. I tried different variations but it did not work; every box always ends up at the last item in the second list:
while True:
    # ... some code here (the one getting the new list)
    # ...
    # ...
    # then I added this:
    for obj in scene.objects:
        if isinstance(obj, box):
            for i in xyz:  # xyz is the new list
                if obj.pos != i:
                    obj.pos = i
This variation makes all the boxes collapse into one and move based on the last position in the list.
What am I doing wrong, or is there another way to do this?
Or should I change the whole process of creating the objects and moving them?
I am really new to VPython, and to Python itself.
Edit
I fixed both lists to present them better, like this:
[(-0.5,0.0,-0.0),(-0.5,-0.5,-0.0),...(0.5,-0.5,-0.0)]
You are repeatedly setting each box's position to every element in the updated positions list, so each box ends up at the last position:
box.pos = 1
box.pos = 2
box.pos = 3
You need to set each position only once, so compute an index:
i = 0
for obj in scene.objects:
    if isinstance(obj, box):
        obj.pos = xyz[i]
        i += 1
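The same bookkeeping can also be written with zip, pairing each box with its new position and avoiding the manual counter. Below is the pattern in plain Python, with a minimal stand-in class since VPython objects are not assumed here; in the real program the loop would set obj.pos on the actual VPython boxes:

```python
class FakeBox:
    """Minimal stand-in for a VPython box, only to illustrate the pairing."""
    def __init__(self, pos):
        self.pos = pos

xyz = [(-0.5, 0.0, -0.0), (-0.5, -0.5, -0.0), (0.0, 0.0, -0.0)]
boxes = [FakeBox((0.0, 0.0, 0.0)) for _ in xyz]  # one box per grid point

# pair each box with its updated position; no index variable needed
for obj, p in zip(boxes, xyz):
    obj.pos = p

print(boxes[0].pos)  # (-0.5, 0.0, -0.0)
```

Keeping your own list of boxes at creation time (instead of filtering scene.objects each frame) also guarantees that the pairing order matches the order of ixyz.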

Suggestions for faster for/if statements in my code?

My code takes about two hours to process. The bottleneck is in the for loop and if statements (see comment in code).
I'm a beginner with python :) Can anyone recommend an efficient python way to replace the nested for and if statements?
I have tables of ~30 million rows, each row with (x,y,z) values:
20.0 11.3 7
21.0 11.3 0
22.0 11.3 3
...
My desired output is a table in the form x, y, min(z), count(min(z)). The last
column is a final count of the least z values at that (x,y). Eg:
20.0 11.3 7 7
21.0 11.3 0 10
22.0 11.3 3 1
...
There's only about 600 unique coordinates, so the output table will be 600x4.
My code:
import numpy as np
file = open('input.txt', 'r')
coordset = set()
data = np.zeros((600, 4)) * np.nan
irow = 0
ctr = 0
for row in file:
    item = row.split()
    x = float(item[0])
    y = float(item[1])
    z = float(item[2])
    # build unique grid of coords
    if (x, y) not in coordset:
        data[irow][0] = x
        data[irow][1] = y
        data[irow][2] = z
        irow = irow + 1  # grows up to 599
        # lookup table of unique coords
        coordset.add((x, y))
    # BOTTLENECK. replace ifs? for?
    for i in range(0, irow):
        if data[i][0] == x and data[i][1] == y:
            if z > data[i][2]:
                continue
            elif z == data[i][2]:
                ctr = ctr + 1
                data[i][3] = ctr
            if z < data[i][2]:
                data[i][2] = z
                ctr = 1
                data[i][3] = ctr
edit: For reference, the approach by @Joowani computes in 1m26s. My original approach, same computer, same datafile: 106m23s.
edit2: @Ophion and @Sibster, thanks for the suggestions; I don't have enough credit to +1 useful answers.
Your solution seems slow because it iterates through the list (i.e. data) every time you make an update. A better approach is to use a dictionary, which takes O(1) per update as opposed to O(n).
Here would be my solution using a dictionary:
file = open('input.txt', 'r')
# coordinates
c = {}
for line in file:
    # items
    (x, y, z) = (float(n) for n in line.split())
    if (x, y) not in c:
        c[(x, y)] = [z, 1]
    elif c[(x, y)][0] > z:
        c[(x, y)][0], c[(x, y)][1] = z, 1
    elif c[(x, y)][0] == z:
        c[(x, y)][1] += 1
for key in c:
    print("{} {} {} {}".format(key[0], key[1], c[key][0], c[key][1]))
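The update logic can be checked in isolation by feeding it a few lines from memory instead of a file (a small sketch; the sample rows are made up here to exercise all three branches):

```python
lines = ["20.0 11.3 7", "20.0 11.3 7", "20.0 11.3 9", "21.0 11.3 0"]
c = {}
for line in lines:
    x, y, z = (float(n) for n in line.split())
    if (x, y) not in c:
        c[(x, y)] = [z, 1]       # first time seeing this coordinate
    elif c[(x, y)][0] > z:
        c[(x, y)] = [z, 1]       # new minimum: reset the count
    elif c[(x, y)][0] == z:
        c[(x, y)][1] += 1        # another occurrence of the current minimum
print(c)  # {(20.0, 11.3): [7.0, 2], (21.0, 11.3): [0.0, 1]}
```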
Why not change the last if to an elif?
As it is now, z < data[i][2] will be evaluated on every iteration of the loop.
You could even just replace it with an else, since you have already checked z > data[i][2] and z == data[i][2], so the only remaining possibility is z < data[i][2].
So the following code will do the same and should be faster:
if z > data[i][2]:
    continue
elif z == data[i][2]:
    ctr = ctr + 1
    data[i][3] = ctr
else:
    data[i][2] = z
    ctr = 1
    data[i][3] = ctr
To do this in numpy use np.unique.
def count_unique(arr):
    # view each row as one opaque (void) item so np.unique can compare whole rows
    row_view = np.ascontiguousarray(arr).view(
        np.dtype((np.void, arr.dtype.itemsize * arr.shape[1])))
    ua, uind = np.unique(row_view, return_inverse=True)
    unique_rows = ua.view(arr.dtype).reshape(ua.shape + (-1,))
    count = np.bincount(uind)
    return np.hstack((unique_rows, count[:, None]))
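On newer NumPy versions (1.13+) the void-view trick is unnecessary: np.unique accepts axis= and return_counts= directly. An equivalent sketch:

```python
import numpy as np

def count_unique(a):
    # unique rows, plus how many times each row occurs
    rows, counts = np.unique(a, axis=0, return_counts=True)
    return np.hstack((rows, counts[:, None]))

a = np.array([[0., 0., 0.],
              [0., 1., 1.],
              [0., 1., 1.],
              [1., 0., 0.]])
print(count_unique(a))
# [[0. 0. 0. 1.]
#  [0. 1. 1. 2.]
#  [1. 0. 0. 1.]]
```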
First let's check a small array:
a = np.random.rand(10, 3)
a = np.around(a, 0)
print(a)
[[ 0. 0. 0.]
[ 0. 1. 1.]
[ 0. 1. 0.]
[ 1. 0. 0.]
[ 0. 1. 1.]
[ 1. 1. 0.]
[ 1. 0. 1.]
[ 1. 0. 1.]
[ 1. 0. 0.]
[ 0. 0. 0.]]
output = count_unique(a)
print(output)
[[ 0. 0. 0. 2.]
[ 0. 1. 0. 1.]
[ 0. 1. 1. 2.]
[ 1. 0. 0. 2.]
[ 1. 0. 1. 2.]
[ 1. 1. 0. 1.]]
print(np.sum(output[:,-1]))
10
Looks good! Now let's check a large array:
a = np.random.rand(int(3E7), 3)
a = np.around(a, 1)
output = count_unique(a)
print(output.shape)
(1331, 4)  # Close as I can get to 600 unique elements.
print(np.sum(output[:,-1]))
30000000.0
Takes about 33 seconds on my machine and 3 GB of memory; doing this all in memory for large arrays will likely be your bottleneck. For reference, @Joowani's solution took about 130 seconds, although this is a bit of an apples-and-oranges comparison since we start with a numpy array. Your mileage may vary.
To read in the data as a numpy array I would view the question here, but it should look something like the following:
arr=np.genfromtxt("./input.txt", delimiter=" ")
Loading in that much data from a txt file I would really recommend using the pandas example in that link.
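For completeness, once the data is in pandas, the whole x, y, min(z), count(min(z)) aggregation can be expressed as a single groupby (a sketch with a toy stand-in table; the column names x, y, z and the read_csv line are assumptions about the file layout):

```python
import pandas as pd

# toy stand-in for the 30-million-row table
df = pd.DataFrame({'x': [20.0, 20.0, 20.0, 21.0],
                   'y': [11.3, 11.3, 11.3, 11.3],
                   'z': [7, 9, 7, 0]})
# for the real file, something like:
# df = pd.read_csv('input.txt', sep=r'\s+', names=['x', 'y', 'z'])

g = df.groupby(['x', 'y'])['z']
out = g.agg(zmin='min',  # smallest z per (x, y)
            nmin=lambda s: int((s == s.min()).sum())  # how often it occurs
            ).reset_index()
print(out)
```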
