I have the following list or numpy array
ll=[7.2,0,0,0,0,0,6.5,0,0,-8.1,0,0,0,0]
and an additional list indicating the positions of non-zeros
i=[0,6,9]
I would like to make two new lists out of them, one filling the zeros and one counting in between, for this short example:
a=[7.2,7.2,7.2,7.2,7.2,7.2,6.5,6.5,6.5,-8.1,-8.1,-8.1,-8.1,-8.1]
b=[0,1,2,3,4,5,0,1,2,0,1,2,3,4]
Is there a way to do that without a for loop to speed things up, as the list ll is quite long in my case?
Array a is the result of a forward fill, and array b holds, for each position, the offset from the most recent non-zero element.
pandas has a forward fill function, but it is easy enough to compute with numpy, and there are many sources on how to do this.
import numpy as np
ll=[7.2,0,0,0,0,0,6.5,0,0,-8.1,0,0,0,0]
a = np.array(ll)
# find zero elements and associated index
mask = a == 0
# index of each non-zero element, 0 where the element is zero
idx = np.where(~mask, np.arange(mask.size), 0)
# do the fill
a[np.maximum.accumulate(idx)]
output:
array([ 7.2, 7.2, 7.2, 7.2, 7.2, 7.2, 6.5, 6.5, 6.5, -8.1, -8.1,
-8.1, -8.1, -8.1])
More information about forward fill is found here:
Most efficient way to forward-fill NaN values in numpy array
Finding the consecutive zeros in a numpy array
To compute array b, you can reuse the forward-fill indices and combine them with a single np.arange:
fill_mask = np.maximum.accumulate(idx)
np.arange(len(fill_mask)) - fill_mask
output:
array([0, 1, 2, 3, 4, 5, 0, 1, 2, 0, 1, 2, 3, 4])
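Putting the two steps above together, here is a minimal self-contained sketch. The function name ffill_and_count is made up for illustration, and it assumes the first element is non-zero, as in the question (leading zeros would otherwise stay zero and be counted from index 0).

```python
import numpy as np

def ffill_and_count(values):
    """Forward-fill zeros and count the distance to the last non-zero entry."""
    arr = np.asarray(values, dtype=float)
    positions = np.arange(arr.size)
    # Index of each non-zero element, 0 where the element is zero
    idx = np.where(arr != 0, positions, 0)
    # The running maximum gives, for every position, the index of the
    # most recent non-zero element
    last = np.maximum.accumulate(idx)
    return arr[last], positions - last

ll = [7.2, 0, 0, 0, 0, 0, 6.5, 0, 0, -8.1, 0, 0, 0, 0]
a, b = ffill_and_count(ll)
print(a)
print(b)
```

Both results come from the same running-maximum array, so the input is only scanned once.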
So...
import numpy as np
ll = np.array([7.2, 0, 0, 0, 0, 0, 6.5, 0, 0, -8.1, 0, 0, 0, 0])
i = np.array([0, 6, 9])
counts = np.append(
    np.diff(i),  # difference between each element in i
                 # (1 element shorter than i)
    len(ll) - i[-1],  # + length of last repeat
)
repeated = np.repeat(ll[i], counts)
repeated becomes
[ 7.2 7.2 7.2 7.2 7.2 7.2 6.5 6.5 6.5 -8.1 -8.1 -8.1 -8.1 -8.1]
b could be computed with
b = np.concatenate([np.arange(c) for c in counts])
print(b)
# [0 1 2 3 4 5 0 1 2 0 1 2 3 4]
but that involves a loop in the form of that list comprehension; perhaps someone Numpyier could implement it without a Python loop.
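One possible loop-free variant, as a sketch: repeat each start index over its run and subtract it from a plain np.arange. This assumes i is sorted and that i[0] == 0, as in the example; the name starts is introduced here only for illustration.

```python
import numpy as np

ll = np.array([7.2, 0, 0, 0, 0, 0, 6.5, 0, 0, -8.1, 0, 0, 0, 0])
i = np.array([0, 6, 9])

# Run lengths: distance between consecutive starts, plus the final run
counts = np.diff(np.append(i, len(ll)))
# Start index of the run each position belongs to
starts = np.repeat(i, counts)
# Offset of every position from the start of its run
b = np.arange(len(ll)) - starts
a = np.repeat(ll[i], counts)
print(b)
```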
I have created a vector of zeros called Qc_vector (18 rows x 1 column).
I have created another vector called s_vector (6 rows x 1 column) that is generated each time by a for loop within the range ingreso_datos, that is, for this example it is generated 5 times.
I have also created a list called indices that is generated for each iteration of the loop, these indices tell me the row number to which I should index the values from s_vector to Qc_vector
PROBLEM
When trying to do this I get the following error: ValueError: shape mismatch: value array of shape (6,) could not be broadcast to indexing result of shape (6,1)
For element 6 of the matrix ingreso_datos, the indices are: [1,2,3,4,5,6]
At the end of the loop, that is, for element number 5, s_vector looks like this:
[image: s_vector for element 5]
[image: Qc_vector indexed, how it should look]
import numpy as np
# Element 1(i) 2(i) 3(i) 1(j) 2(j) 3(j) x(i) y(i) x(j) y(j) | W(kg/m) Axis(kg/m)
# [Col0] [Col1] [Col2] [Col3] [Col4] [Col5] [Col6] [Col7] [Col8] [Col9] [Col10] | [Col11] [Col12]
ingreso_datos = [[ 1, 13, 14, 15, 7, 8, 9, 0, 0, 0, 2.5, 0, 0],
[ 2, 16, 17, 18, 10, 11, 12, 4.5, 0, 4.5, 2.5, 0, 0],
[ 3, 7, 8, 9, 1, 2, 3, 4.5, 0, 4.5, 2.5, 0, 0],
[ 4, 10, 11, 12, 4, 5, 6, 4.5, 0, 4.5, 2.5, 0, 0],
[ 5, 7, 8, 9, 10, 11, 12, 4.5, 0, 4.5, 2.5, -2200, 0]]
Qc_vector = np.zeros((18,1)) # Vector of zeros
for i in range(len(ingreso_datos)):
    indices = []
    indices.append([ingreso_datos[i][0], ingreso_datos[i][1], ingreso_datos[i][2], ingreso_datos[i][3],
                    ingreso_datos[i][4], ingreso_datos[i][5], ingreso_datos[i][6]])
    for row in indices:
        indices = np.array(row[1:])
    L = np.sqrt((ingreso_datos[i][9]-ingreso_datos[i][7])**2+(ingreso_datos[i][10]-ingreso_datos[i][8])**2)
    lx = (ingreso_datos[i][9]-ingreso_datos[i][7])/L
    ly = (ingreso_datos[i][10]-ingreso_datos[i][8])/L
    w = ingreso_datos[i][11]
    ad = ingreso_datos[i][12]
    s_vector = np.array([ad*L/2, w*L/2, (w*L**2)/12, ad*L/2, w*L/2, (-w*L**2)/12]) # s_vector
    Qc_vector[np.ix_(indices)] = s_vector # Indexing
Qc_vector is (18,1).
indices = [[ingreso_datos[i][0], ingreso_datos[i][1], ingreso_datos[i][2], ingreso_datos[i][3], ingreso_datos[i][4], ingreso_datos[i][5], ingreso_datos[i][6]]]
or simply (if ingreso_datos is a numpy array rather than a list of lists):
indices = [ingreso_datos[i,[0,1,2,3,4,5,6]]]
followed by:
for row in indices:
indices = np.array(row[1:])
which is just
ingreso_datos[i,[1,2,3,4,5,6]]
s_vector is a 6 element array, shape (6,)
In:
Qc_vector[np.ix_(indices)] = s_vector
you don't need ix_. In my previous answer I suggested:
master_matrix[np.ix_(indices,indices)] = little_matrix
as a way of doing the indexing for all rows, not just one at a time.
I think your assignment can be simplified to
Qc_vector[indices, 0] = s_vector
That way there's a shape (6,) array on both sides. Alternatively, define Qc_vector with shape (18,) rather than (18,1).
I have a feeling you are still trying to write this code by copying other people's code, without understanding what is happening, or why they suggest things.
A quick fix, if you don't want to change much, is to use numpy.reshape() so that both sides of the assignment have matching shapes, for example by reshaping s_vector to (6,1).
This way you can manage the shape mismatch.
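To make the two suggestions concrete, here is a small sketch. The index values and the contents of s_vector are placeholders chosen for illustration (0-based, so they fit an 18-row vector); they are not the values from the question.

```python
import numpy as np

Qc_vector = np.zeros((18, 1))
indices = np.array([6, 7, 8, 0, 1, 2])   # hypothetical 0-based row numbers
s_vector = np.arange(1.0, 7.0)           # placeholder values, shape (6,)

# Option 1: select column 0 explicitly, so both sides have shape (6,)
Qc_vector[indices, 0] = s_vector

# Option 2: reshape the values to (6, 1) to match Qc_alt[indices]
Qc_alt = np.zeros((18, 1))
Qc_alt[indices] = s_vector.reshape(-1, 1)

print(Qc_vector[:9, 0])
```

Both options write the same values; the first avoids creating a reshaped temporary.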
Simple problem, but I cannot seem to get it to work. I want to calculate the percentage a number occurs in a list of arrays and output this percentage accordingly.
I have a list of arrays which looks like this:
import numpy as np
# Create some data
listvalues = []
arr1 = np.array([0, 0, 2])
arr2 = np.array([1, 1, 2, 2])
arr3 = np.array([0, 2, 2])
listvalues.append(arr1)
listvalues.append(arr2)
listvalues.append(arr3)
listvalues
>[array([0, 0, 2]), array([1, 1, 2, 2]), array([0, 2, 2])]
Now I count the occurrences using collections, which returns a list of collections.Counter:
import collections
counter = []
for i in range(len(listvalues)):
    counter.append(collections.Counter(listvalues[i]))
counter
>[Counter({0: 2, 2: 1}), Counter({1: 2, 2: 2}), Counter({0: 1, 2: 2})]
The result I am looking for is an array with 3 columns, representing the value 0 to 2 and len(listvalues) of rows. Each cell should be filled with the percentage of that value occurring in the array:
# Result
66.66 0 33.33
0 50 50
33.33 0 66.66
So 0 occurs 66.66% in array 1, 0% in array 2 and 33.33% in array 3, and so on..
What would be the best way to achieve this?
Many thanks!
Here's an approach -
# Get the length of each array in the input list
lens = np.array([len(item) for item in listvalues])
# Form a group ID array that labels each element of the flattened listvalues
ID_arr = np.repeat(np.arange(len(lens)),lens)
# Flatten all values, map each (group, value) pair to a linear index and count
vals = np.concatenate(listvalues)
out_shp = [ID_arr.max()+1,vals.max()+1]
# minlength keeps the reshape valid when the last array lacks the largest value
counts = np.bincount(ID_arr*out_shp[1] + vals, minlength=out_shp[0]*out_shp[1])
# Finally get the percentages by dividing by the group lengths
out = 100*np.true_divide(counts.reshape(out_shp),lens[:,None])
Sample run with an additional fourth array in input list -
In [316]: listvalues
Out[316]: [array([0, 0, 2]),array([1, 1, 2, 2]),array([0, 2, 2]),array([4, 0, 1])]
In [317]: print out
[[ 66.66666667 0. 33.33333333 0. 0. ]
[ 0. 50. 50. 0. 0. ]
[ 33.33333333 0. 66.66666667 0. 0. ]
[ 33.33333333 33.33333333 0. 0. 33.33333333]]
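The same approach can be wrapped in a self-contained function, sketched below. The name percentage_table is made up here, and the sketch assumes the arrays hold non-negative integers. Passing minlength to np.bincount keeps the reshape valid even when the last array does not contain the largest value, as in this sample input.

```python
import numpy as np

def percentage_table(arrays):
    """Rows: input arrays; columns: values 0..max; cells: percentages."""
    lens = np.array([len(a) for a in arrays])
    group_ids = np.repeat(np.arange(len(arrays)), lens)
    vals = np.concatenate(arrays)
    n_rows, n_cols = len(arrays), int(vals.max()) + 1
    # Map each (group, value) pair to a linear index and count occurrences
    counts = np.bincount(group_ids * n_cols + vals,
                         minlength=n_rows * n_cols)
    return 100.0 * counts.reshape(n_rows, n_cols) / lens[:, None]

listvalues = [np.array([0, 0, 2]), np.array([1, 1, 2, 2]), np.array([0, 1])]
table = percentage_table(listvalues)
print(table)
```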
The numpy_indexed package has a utility function for this, called count_table, which can be used to solve your problem efficiently as such:
import numpy_indexed as npi
arrs = [arr1, arr2, arr3]
idx = [np.ones(len(a))*i for i, a in enumerate(arrs)]
(rows, cols), table = npi.count_table(np.concatenate(idx), np.concatenate(arrs))
table = table / table.sum(axis=1, keepdims=True)
print(table * 100)
You can get a list of all values and then simply iterate over the individual arrays to get the percentages:
values = sorted(set(y for row in listvalues for y in row))
print([[(a==x).sum()*100.0/len(a) for x in values] for a in listvalues])
You can create a list with the percentages with the following code :
percentage_list = [((counter[i].get(j) if counter[i].get(j) else 0)*10000)//len(listvalues[i])/100.0 for i in range(len(listvalues)) for j in range(3)]
After that, create a np array from that list :
results = np.array(percentage_list)
Reshape it to get the desired layout (one row per array, one column per value):
results = results.reshape(3,3)
This should allow you to get what you wanted.
This is most likely not efficient, and not the best way to do this, but it has the merit of working.
Do not hesitate to ask if you have any questions.
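If you want to keep working from the list of Counter objects without hard-coding the number of columns, something along these lines might do. It is only a sketch; the variable names values and results are chosen for illustration.

```python
import collections
import numpy as np

listvalues = [np.array([0, 0, 2]), np.array([1, 1, 2, 2]), np.array([0, 2, 2])]
counter = [collections.Counter(arr.tolist()) for arr in listvalues]

# All distinct values seen in any array, in sorted order
values = sorted(set().union(*counter))
# A Counter returns 0 for missing keys, so no special-casing is needed
results = np.array([[100.0 * c[v] / sum(c.values()) for v in values]
                    for c in counter])
print(results)
```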
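I would like to use a functional approach to solve this problem. Note that in the result below each row corresponds to a value and each column to an array (the transpose of the requested layout), and the entries are fractions rather than percentages. For example: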
>>> import numpy as np
>>> import pprint
>>>
>>> arr1 = np.array([0, 0, 2])
>>> arr2 = np.array([1, 1, 2, 2])
>>> arr3 = np.array([0, 2, 2])
>>>
>>> arrays = (arr1, arr2, arr3)
>>>
>>> u = np.unique(np.hstack(arrays))
>>>
>>> result = [[1.0 * c.get(uk, 0) / l
... for l, c in ((len(arr), dict(zip(*np.unique(arr, return_counts=True))))
... for arr in arrays)] for uk in u]
>>>
>>> pprint.pprint(result)
[[0.6666666666666666, 0.0, 0.3333333333333333],
[0.0, 0.5, 0.0],
[0.3333333333333333, 0.5, 0.6666666666666666]]
I am rather new to programming, so I apologise if this is a classic and trivial question. I have a 100x100 2D array of values which is plotted by means of matplotlib. In this image, each cell has its value (ranging 0.0 to 1.0) and ID (ranging 0 to 9999 starting from the upper left corner). I want to sample the matrix by using a 2x2 moving window which produces two dictionaries:
1st dictionary: the key represents the intersection of 4 cells; the value represents the tuple with the IDs of the 4 neighboring cells (see image below - the intersection is represented by "N");
2nd dictionary: the key represents the intersection of 4 cells; the value represents the mean value of the 4 neighboring cells (see image below).
In the example below (upper left panel), where N has ID=0, the 1st dictionary would yield
{'0': (0,1,100,101)} since the cells are numbered 0 to 99 toward the right hand side and 0 to 9900, step=100, downward. The 2nd dictionary would yield {'0': 0.775}, as 0.775 is the average value of the 4 neighboring cells of N. Of course, these dictionaries must have as many keys as "intersections" I have on the 2D array.
How can this be accomplished? And are dictionaries the best "tool" in this case? Thank you guys!
PS:
I tried my own way but my code is incomplete, wrong, and I cannot get my head around it:
a=... #The 2D array which contains the cell values ranging 0.0 to 1.0
neigh=numpy.zeros(4)
mean_neigh=numpy.zeros(10000/4)
for k in range(len(neigh)):
for i in a.shape[0]:
for j in a.shape[1]:
neigh[k]=a[i][j]
...
Well, dictionaries may in fact be the way in your case.
Are you sure that the numpy.array format you're using is correct? I don't find any array((int, int)) form in the API. anyway...
What to do once you have your 2D array declared
To make things ordered, let's make two functions that will work with any square 2D array, returning the two dictionaries that you need:
#this is the one that returns the first dictionary
#(note: it stores the four cell values of each window, not the cell IDs)
def dictionarize1(array):
    dict1 = {}
    count = 0
    for x in range(len(array[0]) - 1) :
        for y in range(len(array[0]) - 1):
            dict1[count] = [array[x][y], array[x][y+1], array[x+1][y], array[x + 1][y+1]]
            count = count + 1
    return dict1
#this is the one that returns the second dictionary (mean of each window)
def dictionarize2(array):
    dict2 = {}
    counter = 0
    for a in range(len(array[0]) - 1) :
        for b in range(len(array[0]) - 1):
            dict2[counter] = (array[a][b] + array[a][b+1] + array[a+1][b] + array[a + 1][b+1])/4
            counter = counter + 1
    return dict2
#here's a little trial code to see them working
eighties = [[2.0, 2.2, 2.6, 5.7, 4.7], [2.1, 2.3, 2.3, 5.8, 1.6], [2.0, 2.2, 2.6, 5.7, 4.7],[2.0, 2.2, 2.6, 5.7, 4.7],[2.0, 2.2, 2.6, 5.7, 4.7]]
print("Dictionarize1: \n")
print(dictionarize1(eighties))
print("\n\n")
print("Dictionarize2: \n")
print(dictionarize2(eighties))
print("\n\n")
Compared to the first code, I preferred using an integer as the key. Since Python 3.7, dictionaries keep insertion order, so the entries print in the order the windows were visited (row by row); in older versions the order of a dictionary is not guaranteed. However, you can change the key back to a string by using str(count), as I did before.
I hope this helps. I'm not very experienced with math libraries, but the code that I wrote should work well with any square 2D array that you may want to use as input!
Let's say data is the original numpy.array with dimension dr and dc for rows and columns.
dr = data.shape[0]
dc = data.shape[1]
You could produce Keys as a function that returns the indices of interest and Values as an array with the computed mean of the 4 neighbouring cells.
In that case, Keys is equal to:
def Keys(x):
    # x is the window number; each row of windows has dc-1 entries
    xmod = x + x // (dc - 1) # dc is in scope
    return [xmod, xmod + 1, xmod + dc, xmod + 1 + dc]
The dimension of Values is equal to (dr-1) * (dc-1) since the last row and column are not included. We can compute it as a moving average and reshape to 1D later, (inspiration from link):
Values = ((data[:-1,:-1] + data[1:,:-1] + data[:-1,1:] + data[1:,1:])/4).reshape((dr-1)*(dc-1))
Example:
dr = 3
dc = 5
In: np.array(range(dc*dr)).reshape((dr, dc)) # data
Out:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In: [Keys(x) for x in range((dr-1)*(dc-1))]
Out:
[[0, 1, 5, 6],
[1, 2, 6, 7],
[2, 3, 7, 8],
[3, 4, 8, 9],
[5, 6, 10, 11],
[6, 7, 11, 12],
[7, 8, 12, 13],
[8, 9, 13, 14]]
In: Values
Out: array([ 3, 4, 5, 6, 8, 9, 10, 11])
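For completeness, here is a sketch that builds both dictionaries asked for in the question in one go. The function name window_dicts is hypothetical, and integer keys are used for the intersections (wrap them in str() if string keys are preferred).

```python
import numpy as np

def window_dicts(data):
    """Return (IDs per 2x2 window, mean per 2x2 window) as two dicts."""
    dr, dc = data.shape
    # Cell IDs numbered row by row from the upper left corner
    ids = np.arange(dr * dc).reshape(dr, dc)
    # Four corners of every window, one window per row of the result
    corners = np.stack([ids[:-1, :-1], ids[:-1, 1:],
                        ids[1:, :-1], ids[1:, 1:]], axis=-1).reshape(-1, 4)
    means = ((data[:-1, :-1] + data[:-1, 1:] +
              data[1:, :-1] + data[1:, 1:]) / 4).ravel()
    id_dict = {n: tuple(int(v) for v in row) for n, row in enumerate(corners)}
    mean_dict = {n: float(m) for n, m in enumerate(means)}
    return id_dict, mean_dict

data = np.arange(15, dtype=float).reshape(3, 5)
id_dict, mean_dict = window_dicts(data)
print(id_dict[0], mean_dict[0])
```

For the 100x100 array in the question this gives 99 * 99 entries per dictionary. Whether dictionaries are needed at all is debatable, since the corners and means arrays already hold the same information indexed by window number.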
in Python, given an n x p matrix, e.g. 4 x 4, how can I return a matrix that's 4 x 2 that simply averages the first two columns and the last two columns for all 4 rows of the matrix?
e.g. given:
a = array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]])
return a matrix that has the average of a[:, 0] and a[:, 1] and the average of a[:, 2] and a[:, 3].
I want this to work for an arbitrary n x p matrix, assuming that the number of columns p is evenly divisible by the number of columns averaged together.
let me clarify: for each row, I want to take the average of the first two columns, then the average of the last two columns. So it would be:
(1 + 2) / 2, (3 + 4) / 2 <- row 1 of new matrix
(5 + 6) / 2, (7 + 8) / 2 <- row 2 of new matrix, etc.
which should yield a 4 by 2 matrix rather than 4 x 4.
thanks.
How about using some math? You can define a matrix M = [[0.5,0],[0.5,0],[0,0.5],[0,0.5]] so that A*M is what you want.
from numpy import array, matrix
A = array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]])
M = matrix([[0.5,0],
[0.5,0],
[0,0.5],
[0,0.5]])
print(A*M)
Generating M is pretty simple too: each entry is either 1/k, where k is the number of columns averaged together, or zero.
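One way to generate such an M for other sizes is a Kronecker product, sketched below with plain arrays and the @ operator instead of numpy.matrix. The helper name averaging_matrix and its parameters are made up for this illustration.

```python
import numpy as np

def averaging_matrix(p, group):
    """(p x p/group) matrix whose columns average consecutive column groups."""
    if p % group:
        raise ValueError("p must be divisible by group")
    # Each identity entry expands into a column block filled with 1/group
    return np.kron(np.eye(p // group), np.full((group, 1), 1.0 / group))

A = np.arange(1, 17).reshape(4, 4)
M = averaging_matrix(4, 2)
result = A @ M
print(result)
```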
reshape - get mean - reshape
>>> a.reshape(-1, a.shape[1]//2).mean(1).reshape(a.shape[0],-1)
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])
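This is supposed to work for any array with an even number of columns (each row is split into two halves that are averaged), and reshape doesn't make a copy when the array is contiguous.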
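If the goal is instead to average every group of k adjacent columns, a 3D reshape may be easier to read. This is a sketch; group_mean and its group parameter are names invented here.

```python
import numpy as np

def group_mean(a, group):
    """Average each run of `group` adjacent columns, row by row."""
    a = np.asarray(a)
    n, p = a.shape
    if p % group:
        raise ValueError("number of columns must be divisible by group")
    # Shape (n, p // group, group): the last axis holds one column group
    return a.reshape(n, p // group, group).mean(axis=2)

a = np.arange(1, 17).reshape(4, 4)
out = group_mean(a, 2)

wide = np.arange(12).reshape(2, 6)
out3 = group_mean(wide, 3)
print(out)
print(out3)
```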
It's a bit unclear what should happen for matrices with n > 4, but this code will do what you want:
import numpy as N
a = N.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]], dtype=float)
avg = N.vstack((N.average(a[:,0:2], axis=1), N.average(a[:,2:4], axis=1))).T
This yields avg =
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])
Here's a way to do it. You only need to change groupsize to make it work with other sizes like you said, though I'm not fully sure what you want. Note that np.hsplit treats groupsize as the number of equal column groups to split a into.
groupsize = 2
out = np.hstack([np.mean(x,axis=1,keepdims=True) for x in np.hsplit(a,groupsize)])
yields
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])
for out. Hopefully it gives you some ideas on how to do exactly what it is that you want to do. You can make groupsize dependent on the dimensions of a for instance.