I have a 2D array of the form: np.zeros((m,n)).
I want to look at the first two columns and find the element that occurs most often in the first column (the mode of the first column). However, I do not want to count a row twice if its second-column value has already appeared for that element.
7x3 example:
[[1 2 x], [1 2 y], [1 3 z], [5 3 w], [5 6 v], [9 2 x], [9 2 y]]
Desired output, i.e. the number of occurrences of:
[1]: 2
[5]: 2
[9]: 1
So in a way it is a counter function but conditional on a second array (column 2).
I am relatively new to Python. Is there a function that can do this directly and reasonably efficiently? I need to run it on very large matrices, but could not find such a function.
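This function solves your problem.
import numpy as np

def count_special(arr):
    counter = {}
    # Loop over each distinct value in the first column
    for i in np.unique(arr[:, 0]):
        # Second-column values of the rows whose first column equals i
        sec = arr[arr[:, 0] == i, 1]
        # Count distinct second-column values; .item() gives a plain Python key
        counter[i.item()] = len(np.unique(sec))
    return counter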
which, for your input, returns:
arr = np.array([[1,2,0],[1,2,4],[1,3,4],[5,3,1],[5,6,0]])
print(count_special(arr))
-> {1: 2, 5: 2}
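For very large matrices, a loop-free variant may be worth trying. The following is only a sketch, assuming the first two columns are numeric and that NumPy 1.13 or later is available (for the axis argument of np.unique). The name count_distinct_pairs is made up for this illustration.

```python
import numpy as np

def count_distinct_pairs(arr):
    # Hypothetical helper: drop duplicate (col0, col1) pairs first ...
    pairs = np.unique(arr[:, :2], axis=0)
    # ... then count how often each col0 value remains
    keys, counts = np.unique(pairs[:, 0], return_counts=True)
    return dict(zip(keys.tolist(), counts.tolist()))

# Third column is arbitrary filler; only the first two columns matter
arr = np.array([[1, 2, 0], [1, 2, 4], [1, 3, 4], [5, 3, 1],
                [5, 6, 0], [9, 2, 0], [9, 2, 4]])
result = count_distinct_pairs(arr)
print(result)
```

The mode asked for in the question would then be the key with the largest count, for example max(result, key=result.get).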
How do I fill a 2D array from a 1D list that is rebuilt on each iteration?
For example, I have a list that I get from this code:
a=[]
for k, v in data.items():
    b=v/sumcount
    a.append(b)
What I want to do is produce several 'a' lists and put their values into a 2D array, one list per column. Alternatively, I could put the b values directly into the 2D array, where each column represents one pass of the loop over k.
*My difficulty here is that k is not an integer. It is a dict key (str), and there are 9 of them.
I have tried this, but it does not work:
row = len(data.items())
matrix=np.zeros((9,2))
for i in range (1,3) :
    a=[]
    for k, v in data.items():
        b=v/sumcount
        matrix[x][i].fill(b), for x in range (1, 10)
The a list is:
1
2
3
4
5
6
7
8
9
If I then run the outer loop, for example from 1 to 2, I expect 2 columns and 9 rows:
1 6
2 7
3 8
4 9
5 14
6 15
7 16
8 17
9 18
I want to fill the matrix with the b values.
import numpy as np
import pandas as pd
matrix = np.zeros((9, 2))
df = pd.DataFrame({'aaa': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
sumcount = [1, 2]
for i in range(len(sumcount)):
    matrix[:, i] = df['aaa']/sumcount[i]
print(matrix)
As far as I understand, you need to take a column from the dataframe and place the result in a NumPy array. There is no need to iterate over each row when sumcount is the same number for the whole column; doing so would be slow. In general, explicit loops are a last resort, for when there is no other possibility.
Slice assignment such as matrix[:, i] = ... sets a whole column in NumPy at once.
Or do without an explicit for loop by using a list comprehension, wrapping the result in np.array and applying transpose:
bbb = np.array([df['aaa']/sumcount[i] for i in range(len(sumcount))]).transpose()
print(bbb)
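Broadcasting can remove even the comprehension. This is a minimal sketch, not a drop-in replacement: values is an assumed plain array standing in for the dataframe column, and sumcount is taken to be a short list of divisors as in the answer above.

```python
import numpy as np

# Assumed stand-in for the 'aaa' column (same values as in the answer)
values = np.arange(1, 10, dtype=float)
sumcount = np.array([1, 2], dtype=float)

# values[:, None] has shape (9, 1); dividing by an array of shape (2,)
# broadcasts to shape (9, 2), one column per divisor
matrix = values[:, None] / sumcount
print(matrix.shape)
print(matrix)
```

Each column of matrix then holds the values divided by the corresponding entry of sumcount.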
I have some code right now that works fine, but it is entirely too slow. I'm trying to add up the weighted sum of squares for every row in a Pandas dataframe. I'd like to vectorize the operations, since that seems to run much, much faster, but there's a wrinkle in the code that has defeated my attempts to vectorize.
totalDist = 0.0
for index, row in pU.iterrows():
    totalDist += (row['distance'][row['schoolChoice']]**2.0*float(row['students']))
Each row has 'students' (an integer), 'distance' (a numpy array of length n), and 'schoolChoice' (an integer less than or equal to n-1 that designates which element of the distance array I'm using for the calculation). Basically, I'm pulling a row-specific value from the numpy array. I've used df.lookup, but that actually seems to be slower and is being deprecated. Any suggestions on how to make this run faster? Thanks in advance!
If all else fails you can use .apply() on each row
totalSum = df.apply(lambda row: row.distance[row.schoolChoice] ** 2 * row.students, axis=1).sum()
To go faster you can import numpy
totalSum = (numpy.stack(df.distance)[range(len(df.schoolChoice)), df.schoolChoice] ** 2 * df.students).sum()
The numpy method requires distance be the same length for each row - however it is possible to pad them to the same length if needed. (Though this may affect any gains made.)
Tested on a df of 150,000 rows like:
    distance  schoolChoice  students
0  [1, 2, 3]             0         4
1  [4, 5, 6]             2         5
2  [7, 8, 9]             2         6
3  [1, 2, 3]             0         4
4  [4, 5, 6]             2         5
Timings:
     method   time
0  for loop  15.9s
1  df.apply   4.1s
2     numpy   0.7s
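To make the indexing step in the numpy method easier to follow, here is a small sketch on the five sample rows above. It assumes the distance arrays have already been stacked into one 2D array (as numpy.stack(df.distance) would do); the variable names are illustrative, not taken from the original code.

```python
import numpy as np

# Assumed data: the five sample rows, already stacked into a 2D array
distance = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                     [1, 2, 3], [4, 5, 6]], dtype=float)
school_choice = np.array([0, 2, 2, 0, 2])
students = np.array([4, 5, 6, 4, 5])

# Pair each row number with its chosen column: one element per row
rows = np.arange(len(school_choice))
picked = distance[rows, school_choice]

# Weighted sum of squares, computed without a Python-level loop
total = float((picked ** 2 * students).sum())

# Plain-loop version of the same quantity, for comparison only
loop_total = sum(d[c] ** 2 * s
                 for d, c, s in zip(distance, school_choice, students))
print(total, loop_total)
```

Both versions should agree; the indexed version simply moves the per-row lookup into NumPy.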
Consider two numpy arrays:
array1 = np.arange(0,6)
array2 = np.arange(0,12)
I want to run a loop (preferably a list comprehension) where the desired output for a single round is
print(array1[0])
print(array2[0], array2[1])
or
print(array1[1])
print(array2[2], array2[3])
i.e. the loop runs six times, and for every element of array1 it selects the next two consecutive elements from array2.
I have tried something like
for i in xrange(array1):
    for v in xrange(array2):
but this evidently runs the second loop inside the first one. How can I run them simultaneously but select a different number of elements from each array in one round?
I have also tried making the loops equal in length such as
array1 = np.repeat(np.arange(0,6),2).ravel()
array1 = [0,0,1,1,2,2.....5,5]
This makes the two arrays equal in length, but I still cannot get the desired output.
(In the actual case, the elements of the arrays are pandas Series objects.)
There are a bunch of different ways of going about this. One thing you can do is use the indices:
for ind, item in enumerate(array1):
    print(item, array2[2*ind:2*ind+2])
This does not use the full power of numpy, however. The easiest thing I can think of is to concatenate your arrays into a single array containing the desired sequence. You can make it into a 2D array for easy iteration, where each column or row will be the sequence of three elements you want:
array1 = np.arange(6)
array2 = np.arange(12)
combo = np.concatenate((array1.reshape(-1, 1), array2.reshape(-1, 2)), axis=1)
for row in combo:
    print(row)
Results in
[0 0 1]
[1 2 3]
[2 4 5]
[3 6 7]
[4 8 9]
[ 5 10 11]
In this case, the explicit reshape of array1 is necessary because array1.T will result in a 1D array.
You can use a hybrid of the two approaches, as @Divakar suggests, where you reshape array2 but iterate using the index:
array3 = array2.reshape(-1, 2)
for ind, item in enumerate(array1):
    print(item, array3[ind])
Yes, as @MadPhysicist mentioned, there are a lot of ways to do this, but the simplest is
>>> for x,y,z in zip(array1,array2[:-1:2],array2[1::2]):
...     print(x, y, z)
...
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
5 10 11
for i in range(len(array1)):
    print(array1[i])
    print(array2[2*i],array2[2*i+1])
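Since the question asks for a list comprehension, the zip and reshape ideas above can be combined into one. This is just a sketch under the assumption that array2 has exactly twice as many elements as array1; triples is a name chosen for illustration.

```python
import numpy as np

array1 = np.arange(6)
array2 = np.arange(12)

# View array2 as rows of two consecutive elements each
pairs = array2.reshape(-1, 2)

# One tuple per round: an element of array1 plus its two partners
triples = [(int(x), int(y), int(z)) for x, (y, z) in zip(array1, pairs)]

for t in triples:
    print(*t)
```

If the lengths do not match, reshape(-1, 2) raises an error for an odd-length array2, and zip silently stops at the shorter input, so it may be worth checking the lengths first.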
I am trying to print the row number and column number of a matrix where the value is 1.
For example:
A=[0 1 0]
[1 0 1]
[1 0 0]
I want the output to be displayed as follows (row number followed by the corresponding column numbers):
0 1
1 0 2
2 0
I tried to use enumerate() but it gave me different kind of output.
G={i: [j for j, y in enumerate(row) if y] for i, row in enumerate(A)}
print (G)
Python's indices are zero-based. You have done everything correctly; if the output you expect is one-based, you just need to add a couple of +1s. Also, dictionaries did not guarantee any ordering before Python 3.7, so you would be better off just using a list of tuples:
G = [(i+1, [j+1 for j, y in enumerate(row) if y]) for i, row in enumerate(A)]
Or, better still, use just a 2D list and take the list index as the row number when you need it.
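If A is a NumPy array rather than a list of lists, np.nonzero can locate the ones directly. The sketch below assumes zero-based numbering, as in the example output in the question, and positions is an illustrative name.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [1, 0, 0]])

# Row and column indices of every non-zero entry, in row-major order
rows, cols = np.nonzero(A)

# Group the column indices by row
positions = {}
for r, c in zip(rows.tolist(), cols.tolist()):
    positions.setdefault(r, []).append(c)

# Print each row number followed by its column numbers
for r, cs in positions.items():
    print(r, *cs)
```

Rows that contain no ones do not appear in positions at all, which may or may not be what is wanted.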
I'm trying to write a function to delete all rows that contain a zero value.
This is not from my code, but an example of the idea I am using:
import numpy as np
a = np.array(([7,1,2,8],[4,0,3,2],[5,8,3,6],[4,3,2,0]))
b = []
for i in range(len(a)):
    for j in range(len(a[i])):
        if a[i][j] == 0:
            b.append(i)
print('b=', b)
for zero_row in b:
    x = np.delete(a, zero_row, 0)
print('a=', a)
and this is my output:
b= [1, 3]
a= [[7 1 2 8]
[4 0 3 2]
[5 8 3 6]
[4 3 2 0]]
How do I get rid of the rows with the index in b?
Sorry, I'm fairly new to this. Any help would be really appreciated.
I'm trying to write a function to delete all rows that contain a zero value.
You don't need to write a function for that; it can be done in a single expression:
>>> a[np.all(a != 0, axis=1)]
array([[7, 1, 2, 8],
[5, 8, 3, 6]])
Read as: select from a all rows that are entirely non-zero.
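Broken into steps, the expression works roughly as follows. This is an illustrative sketch using the array from the question; the intermediate names nonzero, keep and filtered are made up here.

```python
import numpy as np

a = np.array([[7, 1, 2, 8], [4, 0, 3, 2], [5, 8, 3, 6], [4, 3, 2, 0]])

# Elementwise test: True wherever the entry is not zero
nonzero = a != 0

# Collapse each row to a single flag: True only if every entry is non-zero
keep = np.all(nonzero, axis=1)

# Boolean indexing keeps the rows whose flag is True
filtered = a[keep]
print(keep)
print(filtered)
```

Note that this builds a new array and leaves a itself unchanged.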
It looks like np.delete doesn't change the array in place; it just returns a new array. So
instead of
x = np.delete(a,zero_row, 0)
try
a = np.delete(a,zero_row, 0)
I think I have found the answer:
As @tuxcanfly said, I changed x to a.
I have also removed the for loop, as it removed the row with index 2 for some reason (presumably because the remaining rows shift up after each deletion, so the stored indices no longer point at the original rows).
Instead, I now pass the whole list b to np.delete, which removes every row whose index is in the list in a single call.
The new code:
import numpy as np
a = np.array(([7,1,2,8],[4,0,3,2],[5,8,3,6],[4,3,2,0]))
b = []
for i in range(len(a)):
    for j in range(len(a[i])):
        if a[i][j] == 0:
            b.append(i)
print('b=', b)
# Delete all rows listed in b with one call
a = np.delete(a, b, 0)
print('a=', a)
and the output:
b= [1, 3]
a= [[7 1 2 8]
[5 8 3 6]]
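The nested loops that build b could also be replaced by a NumPy expression. A possible sketch, using the same array as above (cleaned is an illustrative name):

```python
import numpy as np

a = np.array([[7, 1, 2, 8], [4, 0, 3, 2], [5, 8, 3, 6], [4, 3, 2, 0]])

# Indices of the rows that contain at least one zero
b = np.where((a == 0).any(axis=1))[0]

# np.delete returns a new array with those rows removed
cleaned = np.delete(a, b, axis=0)
print('b=', b.tolist())
print(cleaned)
```

Unlike the loop version, each row index appears in b only once, even when a row holds several zeros.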
I think this helps readability (and allows you to loop once, not twice):
#!/usr/bin/env python
import numpy as np
a = np.array(([7,1,2,8], [4,0,3,2], [5,8,3,6], [4,3,2,0]))
b = None
for row in a:
    # Keep only rows that contain no zero
    if 0 not in row:
        b = np.vstack((b, row)) if b is not None else row
a = b
print('a = ', a)
In this version, you loop over each row and test for 0's membership in the row. If the row does not contain a zero, you attempt to use np.vstack to append the row to an array called b. If b has not yet been assigned to, it is initialized to the current row.
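A variation on the same loop collects the surviving rows in a plain Python list and converts them once at the end, which avoids calling np.vstack on every iteration. This is a sketch of that swapped-in technique, not the code from the answer above; kept and result are illustrative names.

```python
import numpy as np

a = np.array([[7, 1, 2, 8], [4, 0, 3, 2], [5, 8, 3, 6], [4, 3, 2, 0]])

# Keep every row that does not contain a zero
kept = [row for row in a if 0 not in row]

# Build the final array in a single step
result = np.array(kept)
print(result)
```

If no row survives, result is an empty 1D array, so code that relies on a 2D shape would need to handle that case.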