I can create a normal matrix with numpy using
np.zeros([800, 200])
How can I create a matrix with negative indices, as in a 1600x200 matrix whose row index runs from -800 to 800?
Not sure what you need it for but maybe you could use a dictionary instead.
a={i:0 for i in range(-800,801)}
With this you can access a[-800] through a[800].
For 2-D:
a={(i,j):0 for i in range(-800,801) for j in range(-100,101)}
This can be accessed with a[(-800,-100)] through a[(800,100)].
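For instance, reading and writing an entry works like normal dictionary access:
a[(-800, -100)] = 3.14
print(a[(-800, -100)])  # 3.14
print(a[(800, 100)])    # 0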
Not clear what is being asked. NumPy arrays already support access via negative indexing, which refers to positions relative to the end, e.g.:
import numpy as np
m = np.arange(3 * 4).reshape((3, 4))
print(m)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
print(m[-1, :])
# [ 8  9 10 11]
print(m[:, -1])
# [ 3  7 11]
If you need an array that is contiguous near the zero of your indices, one option would be to write a function to map each index i to i + d // 2 (d being the size along the given axis), e.g.:
def idx2neg(indexing, shape):
    new_indexing = []
    for idx, dim in zip(indexing, shape):
        if isinstance(idx, int):
            new_indexing.append(idx + dim // 2)
        ...
        # not implemented for slices, boolean masks, etc.
    return tuple(new_indexing)
Note that the above function is not as flexible as what NumPy accepts; it is just meant to give an idea of how to proceed.
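For illustration, a quick sketch of how such a helper might be used, assuming plain integer indices only (the other index types are left unimplemented above):
m = np.zeros((1600, 200))
# logical index (-800, 0) maps to position (0, 100) of the underlying array
m[idx2neg((-800, 0), m.shape)] = 1.0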
Probably you refer to the Fortran-like arbitrary indexing of arrays. This is not compatible with Python. Check the comments in this question. Basically it clashes with Python's way of treating negative indices, which is to start counting from the end (or right) of the array.
I don't know why you would need that, but if you just need it for indexing, try the following function:
def getVal(matrix, i, k):
    return matrix[i + 800][k]
This function only shifts the first index, so you can pass an index from -800 up to 799 for a 1600x200 matrix.
If you want to index relative to the matrix's number of rows, try the following function:
def getVal(matrix, i, k):
    return matrix[i + len(matrix) // 2][k]
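For example, a quick sketch (either version behaves the same for a 1600-row matrix):
m = [[0]*200 for _ in range(1600)]
m[0][0] = 42
print(getVal(m, -800, 0))  # 42, since -800 + 800 maps to row 0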
Hope that helps!
If extending the dependency list is acceptable, this is fairly straightforward using a pandas DataFrame with a custom index.
You will need to change the syntax for accessing rows (and columns) slightly, but in return you can slice multiple rows and columns.
This is specific to 2d numpy arrays:
import numpy as np
import pandas as pd
a = np.arange(1600*200).reshape([1600, 200])
df = pd.DataFrame(a, index=range(-800, 800))
Once you have such a DataFrame you can access columns and rows (with a few syntax inconsistencies):
Access the 1st column: df[0]
Access the 1st row: df.loc[-800]
Access rows from 1st to 100th: df.loc[-800:-700] and df[-800: -700]
Access columns from 1st to 100th: df.loc[:, 0:100]
Access rows and columns: df.loc[-800:-700, 0:100]
Full documentation on pandas slicing and indexing can be found here.
You can use the np.arange function to generate a range of integers and then reshape it into the desired shape using the reshape method. Note that the number of elements has to match the target shape, so a 1600 x 200 matrix needs 1600 * 200 = 320,000 values.
Here's an example of how you could do this:
import numpy as np
# Create an array of 1600 * 200 consecutive integers
values = np.arange(-160000, 160000)
# Reshape the array into a 1600 x 200 matrix
matrix = values.reshape(1600, 200)
This fills a 1600 x 200 matrix with those values, but the row indices themselves still run from 0 to 1599; negative indices simply count from the end, as usual in NumPy.
For example, to access the element in the last row and first column, you could use the following code:
matrix[-1, 0]
You can create a padded matrix and slice it back down, for example:
import numpy as np
my_matrix = np.zeros((1600, 200))
my_matrix = np.pad(my_matrix, ((800, 800), (0, 0)), mode='constant', constant_values=0)
# the slice has to be 800:-800, not -800:800, which would select an empty range
my_matrix = my_matrix[800:-800, :]
You can create a numpy.ndarray subclass. Take a look at the example below; it creates an array with a specific starting index.
import numpy as np
class CustomArray(np.ndarray):
    def __new__(cls, input_array, startIndex=None):
        obj = np.asarray(input_array)
        if startIndex is not None:
            if isinstance(startIndex, int):
                startIndex = (startIndex, )
            else:
                startIndex = tuple(startIndex)
            # one start index per axis
            assert len(startIndex) == len(obj.shape)
        obj = obj.view(cls)
        obj.startIndex = startIndex
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.startIndex = getattr(obj, 'startIndex', None)

    @staticmethod
    def adj_index(idx, adj):
        if isinstance(idx, tuple):
            if not isinstance(adj, tuple):
                adj = tuple([adj for i in range(len(idx))])
            idx = tuple([CustomArray.adj_index(idx_i, adj_i) for idx_i, adj_i in zip(idx, adj)])
        elif isinstance(idx, slice):
            if isinstance(adj, tuple):
                adj = adj[0]
            idx = slice(idx.start-adj if idx.start is not None else idx.start,
                        idx.stop-adj if idx.stop is not None else idx.stop,
                        idx.step)
        else:
            if isinstance(adj, tuple):
                adj = adj[0]
            idx = idx - adj
        return idx

    def __iter__(self):
        return np.asarray(self).__iter__()

    def __getitem__(self, idx):
        if self.startIndex is not None:
            idx = self.adj_index(idx, self.startIndex)
        return np.asarray(self).__getitem__(idx)

    def __setitem__(self, idx, val):
        if self.startIndex is not None:
            idx = self.adj_index(idx, self.startIndex)
        return np.asarray(self).__setitem__(idx, val)

    def __repr__(self):
        r = np.asarray(self).__repr__()
        if self.startIndex is not None:
            r += f'\n StartIndex: {self.startIndex}'
        return r
Example
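A minimal usage sketch (my own illustration, not the original poster's example):
arr = CustomArray(np.zeros((1600, 200)), startIndex=(-800, 0))
arr[-800, 0] = 1           # maps to row 0, column 0 of the underlying array
print(arr[-800, 0])        # 1.0
print(arr[-800:-798, 0])   # first two rows of column 0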
Related
I have a simple 1-dimensional NumPy array called ar1, and another array called idx that holds range indices (begin index, end index) into ar1. The structure looks something like this:
ar1 = np.zeros((m*n))
idx = np.array(
    [[1,10],
     [40,80],
     [100,110]] )
Now I want to change the elements of ar1 using the range indices that I have in idx. In other words, I'm looking for an efficient way, using NumPy functions and tricks, to manipulate elements 1 to 10, 40 to 80, and 100 to 110 of ar1 and, for example, set them to 255.
How can I do it?
What about just generating all the desired indices once? Would that be too slow for your application? Thinking of something like:
def generate_indices( all_ranges ):
    """Take a list of ranges and explicitly create all indices"""
    indices = []
    for sub_range in all_ranges:
        indices += range( sub_range[0], sub_range[1] )
    return np.array( indices )
Then you could manipulate it as follows:
m=n=25
ar1 = np.zeros((m*n))
idx = np.array(
    [[1,10],
     [40,80],
     [100,110]] )
indices = generate_indices( idx )
ar1[indices] = 255
for i in range(len(idx)):
    ar1[idx[i][0]-1:idx[i][1]].fill(255)
I have a numpy array that may contain inf values.
The numpy array is a 1D vector of numbers.
Is there a way to change the inf values of the array to the previous value of the array (which is not inf)?
So if the value at index 1000 of the array is inf, it should be replaced by the value at index 999, which is not inf.
Here's an example of what I want:
vals = np.random.random(10000)
vals[vals<0.1] = np.inf
indexes = np.asarray(vals==np.inf).nonzero()
for i in indexes:
    vals[i] = vals[i-1]
if np.isinf(vals).any():
    print("It doesnt work")
else:
    print("It works")
Why not use the simplest way?
for i in range(len(a)):
    if a[i] == np.inf: a[i] = a[i-1]
I have never worked with inf; maybe its type is str in your data, in which case you would need to write a[i]=='inf' instead.
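A self-contained version of that idea, assuming float values and a finite first element:
import numpy as np

a = np.array([1.0, np.inf, 2.0, np.inf, np.inf, 5.0])
for i in range(1, len(a)):
    if np.isinf(a[i]):
        a[i] = a[i-1]
print(a)  # [1. 1. 2. 2. 2. 5.]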
import numpy as np
import pandas as pd

def pandas_fill(arr):
    df = pd.DataFrame(arr)
    df.fillna(method='ffill', axis=1, inplace=True)
    out = df.values  # df.as_matrix() in older pandas versions
    return out

def numpy_fill(arr):
    mask = np.isnan(arr)
    idx = np.where(~mask, np.arange(mask.shape[1]), 0)
    np.maximum.accumulate(idx, axis=1, out=idx)
    out = arr[np.arange(idx.shape[0])[:,None], idx]
    return out
inf and -inf can first be converted to nan and then handled the same way.
Try out this updated one:
import numpy as np
Data = np.array([np.nan,1.3,np.nan,1.4,np.nan,np.nan])
nansIndx = np.where(np.isnan(Data))[0]
isanIndx = np.where(~np.isnan(Data))[0]
for nan in nansIndx:
    replacementCandidates = np.where(isanIndx>nan)[0]
    if replacementCandidates.size != 0:
        replacement = Data[isanIndx[replacementCandidates[0]]]
    else:
        replacement = Data[isanIndx[np.where(isanIndx<nan)[0][-1]]]
    Data[nan] = replacement
print(Data)
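Applied to the original inf question, a minimal 1-D sketch of the same fill idea (my own adaptation of numpy_fill above, not part of the original answer) could be:
import numpy as np

vals = np.array([1.0, np.inf, 2.0, np.inf, np.inf, 3.0])
vals[np.isinf(vals)] = np.nan           # treat inf like a missing value
mask = np.isnan(vals)
idx = np.where(~mask, np.arange(len(vals)), 0)
np.maximum.accumulate(idx, out=idx)     # carry forward the last valid index
vals = vals[idx]
print(vals)                             # [1. 1. 2. 2. 2. 3.]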
I have a 2D numpy array with 3 columns. Columns 1 and 2 are a list of connections between IDs. Column 3 is the strength of that connection. I would like to transform this 3-column matrix into a weighted adjacency matrix (an N x N matrix where cells represent the strength of connection between each ID).
I have already done this in my code below. matrix is the 3-column 2D array and t1 is the weighted adjacency matrix. My problem is that this code is very slow because I am using nested for loops. I am familiar with the pandas function melt, which does this, but I am not able to use pandas. Is there a faster implementation that does not use pandas?
import numpy as np
a = np.arange(2000)
np.random.shuffle(a)
b = np.arange(2000)
np.random.shuffle(b)
c = np.random.rand(2000,1)
matrix = np.column_stack((a,b,c))
#get unique value list of nm
flds = list(np.unique(matrix[:,0]))
flds.extend(list(np.unique(matrix[:,1])))
flds = np.asarray(flds)
flds = np.unique(flds)
#make lookup dict
lookup = dict(zip(np.arange(0,len(flds)), flds))
lookup_rev = dict(zip(flds, np.arange(0,len(flds))))
#make empty n by n matrix with unique lists
t1 = np.zeros([len(flds) , len(flds)])
#map values into the n by n matrix and make the rest 0
'''this takes a long time to run'''
#iterate through rows
for i in np.arange(0,len(lookup)):
    #iterate through columns
    for k in np.arange(0,len(lookup)):
        val = matrix[(matrix[:,0] == lookup[i]) & (matrix[:,1] == lookup[k])][:,2]
        if val:
            t1[i,k] = sum(val)
Assuming that I understood the question correctly and that val is a scalar, you could use a vectorized approach that involves initializing with zeros and then indexing, like so:
out = np.zeros((len(flds),len(flds)))
out[matrix[:,0].astype(int),matrix[:,1].astype(int)] = matrix[:,2]
Please note that, from what I can see, you can avoid using lookup entirely.
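If the same (row, column) pair can occur more than once and the weights should be summed, as sum(val) does in the original loop, one option (my suggestion, not part of the answer above) is np.add.at, which accumulates instead of overwriting:
out = np.zeros((len(flds), len(flds)))
np.add.at(out, (matrix[:,0].astype(int), matrix[:,1].astype(int)), matrix[:,2])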
You need to iterate your matrix only once:
import numpy as np
size = 2000
a = np.arange(size)
np.random.shuffle(a)
b = np.arange(size)
np.random.shuffle(b)
c = np.random.rand(size,1)
matrix = np.column_stack((a,b,c))
#get unique value list of nm
fields = np.unique(matrix[:,:2])
n = len(fields)
#make reverse lookup dict
lookup = dict(zip(fields, range(n)))
#make empty n by n matrix
t1 = np.zeros([n, n])
for src, dest, val in matrix:
    i = lookup[src]
    j = lookup[dest]
    t1[i, j] += val
The main acceleration you can get is by not iterating through each element of the NxN matrix but instead iterating through your connection list, which is much smaller.
I tried to simplify your code a bit. It uses the list.index method, which can be slow, but it should still be faster than what you had.
import numpy as np
a = np.arange(2000)
np.random.shuffle(a)
b = np.arange(2000)
np.random.shuffle(b)
c = np.random.rand(2000,1)
matrix = np.column_stack((a,b,c))
lookup = np.unique(matrix[:,:2]).tolist() # You can call unique only once
t1 = np.zeros((len(lookup),len(lookup)))
for i,j,val in matrix:
    t1[lookup.index(i),lookup.index(j)] = val # Fill the matrix
This is a follow-up to Find two pairs of pairs that sum to the same value.
I have random 2d arrays which I make using
import numpy as np
from itertools import combinations
n = 50
A = np.random.randint(2, size=(m,n))
I would like to determine whether the matrix has two disjoint pairs of pairs of columns which sum to the same column vector. I am looking for a fast method to do this. In the previous problem, ((0,1), (0,2)) was acceptable as a pair of pairs of column indices, but in this case it is not, as 0 is in both pairs.
The accepted answer to the previous question is so cleverly optimised that I can't see how to make this simple-looking change, unfortunately. (I am interested in columns rather than rows in this question, but I can always just do A.transpose().)
Here is some code to show it testing all 4 by 4 arrays.
n = 4
nxn = np.arange(n*n).reshape(n, -1)
count = 0
for i in xrange(2**(n*n)):
    A = (i >> nxn) % 2
    p = 1
    for firstpair in combinations(range(n), 2):
        for secondpair in combinations(range(n), 2):
            if firstpair < secondpair and not set(firstpair) & set(secondpair):
                if (np.array_equal(A[firstpair[0]] + A[firstpair[1]], A[secondpair[0]] + A[secondpair[1]] )):
                    if (p):
                        count += 1
                        p = 0
print count
This should output 3136.
Here is my solution, extended to do what I believe you want. It isn't entirely clear though; one may get an arbitrary number of row-pairs that sum to the same total; there may exist unique subsets of rows within them that sum to the same value. For instance:
Given this set of row-pairs that sum to the same total
[[19 19 30 30]
 [11 16 11 16]]
There exists a unique subset of these rows that may still be counted as valid; but should it?
[[19 30]
 [16 11]]
Anyway, I hope those details are easy to deal with, given the code below.
import numpy as np
n = 20
#also works for non-square A
A = np.random.randint(2, size=(n*6,n)).astype(np.int8)
##A = np.array( [[0, 0, 0], [1, 1, 1], [1, 1 ,1]], np.uint8)
##A = np.zeros((6,6))
#force the inclusion of some hits, to keep our algorithm on its toes
##A[0] = A[1]
def base_pack_lazy(a, base, dtype=np.uint64):
    """
    pack the last axis of an array as minimal base representation
    lazily yields packed columns of the original matrix
    """
    a = np.ascontiguousarray( np.rollaxis(a, -1))
    packing = int(np.dtype(dtype).itemsize * 8 / (float(base) / 2))
    for columns in np.array_split(a, (len(a)-1)//packing+1):
        R = np.zeros(a.shape[1:], dtype)
        for col in columns:
            R *= base
            R += col
        yield R
def unique_count(a):
    """returns counts of unique elements"""
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), np.int)
    np.add.at(count, inverse, 1)    #note; this scatter operation requires numpy 1.8; use a sparse matrix otherwise!
    return unique, count, inverse
def voidview(arr):
    """view the last axis of an array as a void object. can be used as a faster form of lexsort"""
    return np.ascontiguousarray(arr).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1]))).reshape(arr.shape[:-1])
def has_identical_row_sums_lazy(A, combinations_index):
    """
    compute the existence of combinations of rows summing to the same vector,
    given an nxm matrix A and an index matrix specifying all combinations
    naively, we need to compute the sum of each row combination at least once, giving n^3 computations
    however, this isnt strictly required; we can lazily consider the columns, giving an early exit opportunity
    all nicely vectorized of course
    """
    multiplicity, combinations = combinations_index.shape
    #list of indices into combinations_index, denoting possibly interacting combinations
    active_combinations = np.arange(combinations, dtype=np.uint32)
    #keep all packed columns; we might need them later
    columns = []
    for packed_column in base_pack_lazy(A, base=multiplicity+1):    #loop over packed cols
        columns.append(packed_column)
        #compute rowsums only for a fixed number of columns at a time.
        #this is O(n^2) rather than O(n^3), and after considering the first column,
        #we can typically already exclude almost all combinations
        partial_rowsums = sum(packed_column[I[active_combinations]] for I in combinations_index)
        #find duplicates in this column
        unique, count, inverse = unique_count(partial_rowsums)
        #prune those combinations which we can exclude as having different sums, based on columns inspected thus far
        active_combinations = active_combinations[count[inverse] > 1]
        #early exit; no pairs
        if len(active_combinations)==0:
            return False
    """
    we now have a small set of relevant combinations, but we have lost the details of their particulars
    to see which combinations of rows does sum to the same value, we do need to consider rows as a whole
    we can simply apply the same mechanism, but for all columns at the same time,
    but only for the selected subset of row combinations known to be relevant
    """
    #construct full packed matrix
    B = np.ascontiguousarray(np.vstack(columns).T)
    #perform all relevant sums, over all columns
    rowsums = sum(B[I[active_combinations]] for I in combinations_index)
    #find the unique rowsums, by viewing rows as a void object
    unique, count, inverse = unique_count(voidview(rowsums))
    #if not, we did something wrong in deciding on active combinations
    assert(np.all(count>1))
    #loop over all sets of rows that sum to an identical unique value
    for i in xrange(len(unique)):
        #set of indexes into combinations_index;
        #note that there may be more than two combinations that sum to the same value; we grab them all here
        combinations_group = active_combinations[inverse==i]
        #associated row-combinations
        #array of shape=(mulitplicity,group_size)
        row_combinations = combinations_index[:,combinations_group]
        #if no duplicate rows involved, we have a match
        if len(np.unique(row_combinations[:,[0,-1]])) == multiplicity*2:
            print row_combinations
            return True
    #none of identical rowsums met uniqueness criteria
    return False
def has_identical_triple_row_sums(A):
    n = len(A)
    idx = np.array( [(i,j,k)
                     for i in xrange(n)
                     for j in xrange(n)
                     for k in xrange(n)
                     if i<j and j<k], dtype=np.uint16)
    idx = np.ascontiguousarray( idx.T)
    return has_identical_row_sums_lazy(A, idx)
def has_identical_double_row_sums(A):
    n = len(A)
    idx = np.array(np.tril_indices(n,-1), dtype=np.int32)
    return has_identical_row_sums_lazy(A, idx)
from time import clock
t = clock()
for i in xrange(1):
##    print has_identical_double_row_sums(A)
    print has_identical_triple_row_sums(A)
print clock()-t
Edit: code cleanup
I'm trying to optimize the following code, potentially by rewriting it in Cython: it simply takes a low-dimensional but relatively long numpy array, looks in one of its columns for 0 values, and marks those rows as -1 in a result array. The code is:
import numpy as np
def get_data():
    data = np.array([[1,5,1]] * 5000 + [[1,0,5]] * 5000 + [[0,0,0]] * 5000)
    return data

def get_cols(K):
    cols = np.array([2] * K)
    return cols

def test_nonzero(data):
    K = len(data)
    result = np.array([1] * K)
    # Index into columns of data
    cols = get_cols(K)
    # Mark zero points with -1
    idx = np.nonzero(data[np.arange(K), cols] == 0)[0]
    result[idx] = -1
import time
t_start = time.time()
data = get_data()
for n in range(5000):
    test_nonzero(data)
t_end = time.time()
print (t_end - t_start)
data is the data. cols is the array giving, for each row, which column of data to check for zero values (for simplicity, I made it the same column everywhere). The goal is to compute a numpy array, result, which has the value 1 for each row where the column of interest is non-zero, and -1 for the rows where the corresponding column of interest has a zero.
Running this function 5000 times on a not-so-large array of 15,000 rows by 3 columns takes about 20 seconds. Is there a way this can be sped up? It appears that most of the work goes into finding the nonzero elements and retrieving them with indices (the call to nonzero and the subsequent use of its index). Can this be optimized, or is this the best that can be done?
How could a Cython implementation gain speed on this?
cols = np.array([2] * K)
That's going to be really slow. It creates a very large Python list and then converts it into a numpy array. Instead, do something like:
cols = np.ones(K, int)*2
That'll be way faster.
result = np.array([1] * K)
Here you should do:
result = np.ones(K, int)
That will produce the numpy array directly.
idx = np.nonzero(data[np.arange(K), cols] == 0)[0]
result[idx] = -1
cols is an array, but here you can just pass 2. Furthermore, using nonzero adds an extra step; a boolean mask can be used directly:
idx = data[np.arange(K), 2] == 0
result[idx] = -1
Should have the same effect.
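Putting those suggestions together, a sketch of the revised function might look like this (same behaviour as the original test_nonzero, just without the Python-list allocations and the nonzero call; the return is my addition, since the original discards result):
import numpy as np

def test_nonzero_fast(data):
    K = len(data)
    result = np.ones(K, int)
    # boolean mask of the rows whose column 2 is zero
    idx = data[np.arange(K), 2] == 0
    result[idx] = -1
    return result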