How to create a basic table in Python?

I want to make a table of 10 columns. I also want to find the row with the minimum value in column 0.
Example:
[[1, 2, 3],
 [4, 5, 6],
 [7, 8, 9],
 [10, 11, 21]]
How do I get the row which has the minimum value in column 0? I just need a function that can use column 0.
[1,2,3]

With numpy arange we can easily create a range of numbers, and then reshape them into a 2d array:
In [70]: arr = np.arange(1,13).reshape(4,3)
In [71]: arr
Out[71]:
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
argmin gives the index of the minimum value, for the whole array (flattened) or by row or column:
In [72]: np.argmin(arr, axis=1)
Out[72]: array([0, 0, 0, 0])
The 0 column:
In [73]: arr[:,0]
Out[73]: array([ 1, 4, 7, 10])
In [74]: np.argmin(arr[:,0])
Out[74]: 0
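That argmin gives the index of the winning row; to get the row itself (what the question asks for), index back into the array. A small sketch of that step:
arr[np.argmin(arr[:, 0])]   # -> array([1, 2, 3])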
pandas makes a nice table.
In [76]: import pandas as pd
In [77]: df = pd.DataFrame(arr)
In [78]: df
Out[78]:
    0   1   2
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12
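pandas can also look up the row with the minimum column-0 value directly; a small sketch using the df built above:
df.loc[df[0].idxmin()]   # -> the row 1, 2, 3, since 1 is the smallest value in column 0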

There is a builtin function for that: range.
range does not create a list but a lazy range object which behaves much like a list and should be more than enough for you (its items are calculated only when requested).
So:
a = range(10)
print(a)      # -> range(0, 10)
for i in a:
    print(i)  # -> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
print(a[2])   # -> 2
print(a[0])   # -> 0
If you don't want to start from 0, just use range(start_value, end_value).
And if you want a custom increment, use range(start_value, end_value, increment) (the default increment is 1, but if you want to go backward you can use -1).
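For example, a couple of quick sketches of those forms:
list(range(3, 8))        # -> [3, 4, 5, 6, 7]
list(range(10, 0, -2))   # -> [10, 8, 6, 4, 2]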
Edit:
To create a table like your example you can use this small function:
def ct(nStart, nEnd, nPerSubTable):
    r = []  # resulting table
    subTable = []
    for i in range(nStart, nEnd):  # the main range loop
        subTable.append(i)
        if len(subTable) == nPerSubTable:  # when the sub table reaches the requested length, append it to r and reset it
            r.append(subTable)
            subTable = []
    if len(subTable) > 0:  # if there is some left over because the last sub table is smaller than expected, add it anyway
        r.append(subTable)
    return r
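A quick check of what that returns for the question's example (my own illustrative call):
ct(1, 13, 3)   # -> [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]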

Related

can't understand scipy.sparse.csr_matrix example

I can't wrap my head around csr_matrix examples in scipy documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html
Can someone explain how this example work?
>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])
I believe this is following this format.
csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])
where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].
What is a here?
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
From the above arrays, applying a[row_ind[k], col_ind[k]] = data[k] for k = 0..5 gives:
a[row[0], col[0]] = a[0, 0] = 1   (from data[0])
a[row[1], col[1]] = a[0, 2] = 2   (from data[1])
a[row[2], col[2]] = a[1, 2] = 3   (from data[2])
a[row[3], col[3]] = a[2, 0] = 4   (from data[3])
a[row[4], col[4]] = a[2, 1] = 5   (from data[4])
a[row[5], col[5]] = a[2, 2] = 6   (from data[5])
So arranging matrix 'a' in shape (3, 3):
a
    0  1  2
0  [1, 0, 2]
1  [0, 0, 3]
2  [4, 5, 6]
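The 'a' here is just the dense matrix being described; it can be reproduced with a plain array and fancy indexing (a small sketch, not part of the original answer):
a = np.zeros((3, 3), dtype=int)
a[row, col] = data   # places each data[k] at position (row[k], col[k])
a                    # -> [[1, 0, 2], [0, 0, 3], [4, 5, 6]]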
This is a sparse matrix, so it stores the explicit indices and the values at those indices. For example, row=0 and col=0 correspond to 1 (the first entries of all three arrays in your example), hence the [0,0] entry of the matrix is 1. And so on.
Represent the "data" in a 4 X 4 Matrix:
data = np.array([10,0,5,99,25,9,3,90,12,87,20,38,1,8])
indices = np.array([0,1,2,3,0,2,3,0,1,2,3,1,2,3])
indptr = np.array([0,4,7,11,14])
'indptr' (index pointers) tells you where each row's entries live inside 'indices' and 'data': the slice indptr[i]:indptr[i+1] selects the column indices (and values) belonging to row i.
The final entry, 14, is simply len(data), so indptr could equivalently be written as np.array([0, 4, 7, 11, len(data)]).
0:4   --> row 0 uses column indices 0, 1, 2, 3
4:7   --> row 1 uses column indices 0, 2, 3
7:11  --> row 2 uses column indices 0, 1, 2, 3
11:14 --> row 3 uses column indices 1, 2, 3
# Representing the data in a 4 x 4 matrix
from scipy.sparse import csr_matrix
a = csr_matrix((data, indices, indptr), shape=(4, 4), dtype=int)
a.todense()
matrix([[10,  0,  5, 99],
        [25,  0,  9,  3],
        [90, 12, 87, 20],
        [ 0, 38,  1,  8]])
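To see indptr in action, you can pull out one row's stored entries by hand (my own illustrative slices, using the arrays above):
i = 1
indices[indptr[i]:indptr[i+1]]   # -> array([0, 2, 3]), the columns stored for row 1
data[indptr[i]:indptr[i+1]]      # -> array([25,  9,  3]), the corresponding values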
Another Stack Overflow explanation:
As far as I understand, the row and col arrays hold the indices which correspond to the non-zero values in the matrix: a[0, 0] = 1, a[0, 2] = 2, a[1, 2] = 3 and so on. Since we have no indices for a[0, 1], a[1, 0], a[1, 1], the corresponding values in the matrix are equal to 0.
Also, maybe this little intro will be helpful for you:
https://www.youtube.com/watch?v=Lhef_jxzqCg
@Rohit Pandey stated it correctly; I just want to add an example to that.
When most of the elements of a matrix have 0 values, we call it a sparse matrix. The process consists of removing the zero elements from the matrix, thus saving memory space and computing time. We only store the non-zero items with their respective row and column indices, i.e.
0 3 0 4
0 5 7 0
0 0 0 0
0 2 6 0
We build the sparse form by listing, for each non-zero item, its row index first, then its column index, and finally its value, like the following:
Row:    0 0 1 1 3 3
Column: 1 3 1 2 1 2
Value:  3 4 5 7 2 6
By reversing the process we get the simple matrix form from the sparse form.
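As a sanity check, feeding those triplets back to csr_matrix rebuilds the original matrix; a small sketch (my own addition, using the (data, (row, col)) form from the question above):
import numpy as np
from scipy.sparse import csr_matrix
row = np.array([0, 0, 1, 1, 3, 3])
col = np.array([1, 3, 1, 2, 1, 2])
val = np.array([3, 4, 5, 7, 2, 6])
csr_matrix((val, (row, col)), shape=(4, 4)).toarray()
# -> [[0 3 0 4]
#     [0 5 7 0]
#     [0 0 0 0]
#     [0 2 6 0]]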

deleting rows based on value found in specific column

I am attempting to write code that searches a numpy array for rows where the value in the fifth column is not 50. If it is not, I wish to remove that row.
This is what I have so far:
for rows in range(len(b)):
    if b[:,4].any() != 50:
        b = np.delete(b, b[rows])
However, I keep getting the following error:
too many indices for array
Let's run the calculation with some diagnostic prints. Note where the error occurs. That's important! (We shouldn't just keep trying things without isolating the problem!)
In [2]: b=np.array([[0,1,2],[1,2,3],[2,1,2]])
In [3]: for row in range(len(b)):
   ...:     print(row)
   ...:     if b[:,2].any() !=2:
   ...:         print(b[row])
   ...:         b = np.delete(b, b[row])
   ...:
0
[0 1 2]
1
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-3-04dc188d9a2b> in <module>()
      1 for row in range(len(b)):
      2     print(row)
----> 3     if b[:,2].any() !=2:
      4         print(b[row])
      5         b = np.delete(b, b[row])
IndexError: too many indices for array
So the error occurs on the 2nd iteration (row 1). Something is wrong with b after the delete. What is the new value of b?
In [4]: b
Out[4]: array([1, 2, 3, 2, 1, 2])
b is a 1d array, not the 2d we started with. That explains the error, right? Something must be wrong with the use of delete. Maybe we need to check its documentation????
Look at the axis parameter:
axis : int, optional
    The axis along which to delete the subarray defined by `obj`.
    If `axis` is None, `obj` is applied to the flattened array.
We didn't specify an axis, so the delete was applied to the flattened array, and the result was flattened to 1d.
But even if I specify an axis I get an error (I won't get into that), which prompts me to look more carefully at the if condition:
In [10]: b[:,2]
Out[10]: array([2, 3, 2])
In [11]: b[:,2].any()
Out[11]: True
In [12]: b[:,2]!=2
Out[12]: array([False, True, False])
Applying any to the column doesn't make sense - it just checks whether any values in the column are non-zero. Instead we want to test the column against the target, getting a boolean array that matches the column in size.
We can use that boolean array directly as a row selection mask:
In [13]: b[_,:]
Out[13]: array([[1, 2, 3]])
No need to iterate.
Another problem with your iteration: you iterate on range(3), i.e. [0,1,2], but inside the loop you try to remove a row from b, changing the size of b. That's going to give problems when you try to index b[row] by number, right? When iterating, in Python or numpy, be careful about modifying the object that you are iterating over.
Sorry to be long winded about this, but it looks like you need some basic debugging guidance.
Here's a basic list approach:
In [15]: [row for row in b if row[2]!=2]
Out[15]: [array([1, 2, 3])]
I'm iterating on the rows, not their indices, and for each row checking the column value, and keeping that row if the check is True. We could do that with np.delete, but a list comprehension is clearer (and faster).
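For comparison, here is roughly what the fully vectorized version looks like (a sketch on the same toy b; the np.delete call is my own illustration, not from the session above):
mask = b[:, 2] != 2
b[mask]                                            # rows whose column-2 value is not 2
np.delete(b, np.nonzero(b[:, 2] == 2)[0], axis=0)  # same result via np.delete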
It would be better to provide b and the desired output, but if I understand it correctly, you could use:
import numpy as np
b = np.array([[50, 2, 3, 4, 5, 6],
              [4, 50, 6, 7, 8, 9],
              [1, 1, 1, 1, 50, 9]])
array([[50,  2,  3,  4,  5,  6],
       [ 4, 50,  6,  7,  8,  9],
       [ 1,  1,  1,  1, 50,  9]])
Then you can check which rows contain 50 in the 5th column using
b[:, 4] == 50
array([False, False, True])
and feed this Boolean array back to b to select the desired rows:
b[b[:, 4] == 50]
which leaves you with one row in this case
array([[ 1, 1, 1, 1, 50, 9]])

Map index of numpy matrix

How should I map indices of a numpy matrix?
For example:
mx = np.matrix([[5,6,2],[3,3,7],[0,1,6]])
The row/column indices are 0, 1, 2.
So:
>>> mx[0,0]
5
Let's say I need to map these indices, converting 0, 1, 2 into, e.g., 10, 'A', 'B', in such a way that:
mx[10,10] #returns 5
mx[10,'A'] #returns 6 and so on..
I can just set a dict and use it to access the elements, but I would like to know if it is possible to do something like what I just described.
I would suggest using a pandas DataFrame, with its index and columns set to the new row and column mappings respectively, for ease of indexing. It allows us to select a single element or an entire row or column with the familiar colon operator.
Consider a generic (non-square 4x3 shaped matrix) -
mx = np.matrix([[5,6,2],[3,3,7],[0,1,6],[4,5,2]])
Consider the mappings for rows and columns -
row_idx = [10, 'A', 'B','C']
col_idx = [10, 'A', 'B']
Let's take a look on the workflow with the given sample -
# Get data into dataframe with given mappings
In [57]: import pandas as pd
In [58]: df = pd.DataFrame(mx,index=row_idx, columns=col_idx)
# Here's how dataframe data looks like
In [60]: df
Out[60]:
    10  A  B
10   5  6  2
A    3  3  7
B    0  1  6
C    4  5  2
# Get one scalar element
In [61]: df.loc['C',10]
Out[61]: 4
# Get one entire col
In [63]: df.loc[:,10].values
Out[63]: array([5, 3, 0, 4])
# Get one entire row
In [65]: df.loc['A'].values
Out[65]: array([3, 3, 7])
And best of all we are not making any extra copies as the dataframe and its slices are still indexing into the original matrix/array memory space -
In [98]: np.shares_memory(mx,df.loc[:,10].values)
Out[98]: True
Try this:
import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])
B = np.array(list(map(tuple, A)), dtype=dt)
print(B['ID'])
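For what it's worth, that gives named field access rather than remapped positional indices; a quick check (my own calls, reusing B from above):
print(B['ID'])    # -> [ 1  3 50]
print(B['Ring'])  # -> [  2   4 100]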
You can use the __getitem__ and __setitem__ special methods and create a new class as shown.
Store the index map as a dictionary in an instance variable self.index_map.
import numpy as np

class Matrix(np.matrix):
    def __init__(self, lis):
        self.matrix = np.matrix(lis)
        self.index_map = {}

    def setIndexMap(self, index_map):
        self.index_map = index_map

    def getIndex(self, key):
        if type(key) is slice:
            return key
        elif key not in self.index_map.keys():
            return key
        else:
            return self.index_map[key]

    def __getitem__(self, idx):
        return self.matrix[self.getIndex(idx[0]), self.getIndex(idx[1])]

    def __setitem__(self, idx, value):
        self.matrix[self.getIndex(idx[0]), self.getIndex(idx[1])] = value
Usage:
Creating a matrix.
>>> mx = Matrix([[5,6,2],[3,3,7],[0,1,6]])
>>> mx
Matrix([[5, 6, 2],
        [3, 3, 7],
        [0, 1, 6]])
Defining the Index Map.
>>> mx.setIndexMap({10:0, 'A':1, 'B':2})
Different ways to index the matrix.
>>> mx[0,0]
5
>>> mx[10,10]
5
>>> mx[10,'A']
6
It also handles slicing as shown.
>>> mx[1:3, 1:3]
matrix([[3, 7],
        [1, 6]])

find indices of grouped-item matches between two arrays

a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array ([3,4,7,8,1,3])
I have two lists of integers, each grouped in pairs of consecutive items (i.e. indices [0, 1], [2, 3] and so on).
The pairs of items cannot be found as duplicates in either list, neither in the same nor in the reverse order.
One list is significantly larger and inclusive of the other.
I am trying to figure out an efficient way to get the indices
of the larger list's grouped items that are also in the smaller one.
The desired output in the example above should be:
[2,3,6,7,10,11] #indices
Notice that, as an example, the first group ([3,4]) should not get indices 11,12 as a match because in that case 3 is the second element of [1,3] and 4 the first element of [4,7].
Since you are grouping your arrays by pairs, you can reshape them into 2 columns for comparison. You can then compare each of the elements in the shorter array to the longer array, and reduce the boolean arrays. From there it is a simple matter to get the indices using a reshaped np.arange.
import numpy as np
from functools import reduce
a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array ([3,4,7,8,1,3])
# reshape a and b into columns
a2 = a.reshape((-1,2))
b2 = b.reshape((-1,2))
# create a generator of bools for the row of a2 that holds b2
b_in_a_generator = (np.all(a2==row, axis=1) for row in b2)
# reduce the generator to get an array of boolean that is True for each row
# of a2 that equals one of the rows of b2
ix_bool = reduce(lambda x,y: x+y, b_in_a_generator)
# grab the indices by slicing a reshaped np.arange array
ix = np.arange(len(a)).reshape((-1,2))[ix_bool]
ix
# returns:
array([[ 2,  3],
       [ 6,  7],
       [10, 11]])
If you want a flat array, simply ravel ix
ix.ravel()
# returns
array([ 2, 3, 6, 7, 10, 11])
Here's one approach making use of a NumPy view of groups of elements -
# Taken from https://stackoverflow.com/a/45313353/
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

def grouped_indices(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    sidx = a0v.argsort()
    idx = sidx[np.searchsorted(a0v,b0v, sorter=sidx)]
    return ((idx*2)[:,None] + [0,1]).ravel()
If some group from b has no match in a, we could filter it out using a mask, a0v[idx] == b0v, as sketched below.
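A rough sketch of that filtering inside grouped_indices (the clamping line is my own addition, to keep searchsorted's result in range):
pos = np.searchsorted(a0v, b0v, sorter=sidx)
pos = np.minimum(pos, len(a0v) - 1)        # guard against positions past the end of a0v
idx = sidx[pos]
valid = a0v[idx] == b0v                    # True only where the b group really occurs in a
return ((idx[valid] * 2)[:, None] + [0, 1]).ravel()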
Sample run -
In [345]: a
Out[345]: array([5, 8, 3, 4, 2, 5, 7, 8, 1, 9, 1, 3, 4, 7])
In [346]: b
Out[346]: array([3, 4, 7, 8, 1, 3])
In [347]: grouped_indices(a, b)
Out[347]: array([ 2, 3, 6, 7, 10, 11])
Another one using np.in1d to replace np.searchsorted -
def grouped_indices_v2(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    return (np.flatnonzero(np.in1d(a0v, b0v))[:,None]*2 + [0,1]).ravel()

How to replace only the first n elements in a numpy array that are larger than a certain value?

I have an array myA like this:
array([ 7, 4, 5, 8, 3, 10])
If I want to replace all values that are larger than a value val by 0, I can simply do:
myA[myA > val] = 0
which gives me the desired output (for val = 5):
array([0, 4, 5, 0, 3, 0])
However, my goal is to replace not all but only the first n elements of this array that are larger than a value val.
So, if n = 2 my desired outcome would look like this (10 is the third element larger than val and should therefore not be replaced):
array([ 0, 4, 5, 0, 3, 10])
A straightforward implementation would be:
import numpy as np
myA = np.array([7, 4, 5, 8, 3, 10])
n = 2
val = 5
# track the number of replacements
repl = 0
for ind, vali in enumerate(myA):
    if vali > val:
        myA[ind] = 0
        repl += 1
        if repl == n:
            break
That works, but maybe someone can come up with a smart way of masking!?
The following should work:
myA[(myA > val).nonzero()[0][:2]] = 0
since nonzero will return the indices where the boolean array myA > val is non-zero, i.e. True.
For example:
In [1]: myA = array([ 7, 4, 5, 8, 3, 10])
In [2]: myA[(myA > 5).nonzero()[0][:2]] = 0
In [3]: myA
Out[3]: array([ 0, 4, 5, 0, 3, 10])
Final solution is very simple:
import numpy as np
myA = np.array([7, 4, 5, 8, 3, 10])
n = 2
val = 5
myA[np.where(myA > val)[0][:n]] = 0
print(myA)
Output:
[ 0 4 5 0 3 10]
Here's another possibility (untested), probably no better than nonzero:
def truncate_mask(m, stop):
    m = m.astype(bool, copy=False)  # if we allow non-bool m, the next line becomes nonsense
    return m & (np.cumsum(m) <= stop)

myA[truncate_mask(myA > val, n)] = 0
By avoiding building and using an explicit index you might end up with slightly better performance...but you'd have to test it to find out.
Edit 1: while we're on the subject of possibilities, you could also try:
def truncate_mask(m, stop):
    m = m.astype(bool, copy=True)  # note we need to copy m here to safely modify it
    m[np.searchsorted(np.cumsum(m), stop, side='right'):] = 0  # zero everything after the stop-th True
    return m
Edit 2 (the next day): I've just tested this and it seems that cumsum is actually worse than nonzero, at least with the kinds of values I was using (so neither of the above approaches is worth using). Out of curiosity, I also tried it with numba:
import numba

@numba.jit
def set_first_n_gt_thresh(a, val, thresh, n):
    ii = 0
    while n > 0 and ii < len(a):
        if a[ii] > thresh:
            a[ii] = val
            n -= 1
        ii += 1
This only iterates over the array once, or rather it only iterates over the necessary part of the array once, never even touching the latter part. This gives you vastly superior performance for small n, but even for the worst case of n>=len(a) this approach is faster.
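A quick usage sketch with the question's data (my own call; note the function modifies the array in place):
myA = np.array([7, 4, 5, 8, 3, 10])
set_first_n_gt_thresh(myA, 0, 5, 2)
print(myA)   # -> [ 0  4  5  0  3 10]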
You could use the same solution as here, converting your np.array to a pd.Series:
import pandas as pd
s = pd.Series([7, 4, 5, 8, 3, 10])
n = 2
m = 5
s[s[s>m].iloc[:n].index] = 0
In [416]: s
Out[416]:
0     0
1     4
2     5
3     0
4     3
5    10
dtype: int64
Step by step explanation:
In [426]: s > m
Out[426]:
0     True
1    False
2    False
3     True
4    False
5     True
dtype: bool
In [428]: s[s>m].iloc[:n]
Out[428]:
0    7
3    8
dtype: int64
In [429]: s[s>m].iloc[:n].index
Out[429]: Int64Index([0, 3], dtype='int64')
In [430]: s[s[s>m].iloc[:n].index]
Out[430]:
0    7
3    8
dtype: int64
The output of In [430] looks the same as In [428], but in 428 it's a copy and in 430 it's the original series.
If you need an np.array, you can use the values attribute:
In [418]: s.values
Out[418]: array([ 0, 4, 5, 0, 3, 10], dtype=int64)
