I'm having trouble figuring out how to create a 10x1 numpy array with the number 5 in the first 3 elements and the other 7 elements with the number 0. Any thoughts on how to do this efficiently?
Simplest would seem to be:
import numpy as np
the_array = np.array([5]*3 + [0]*7)
Does this simple approach present some specific disadvantage for your purposes?
Of course there are many alternatives, such as
the_array = np.zeros((10,))
the_array[:3] = 5
If you need to repeat this specific operation a huge number of times, so small differences in speed matter, you could benchmark various approaches to see where a nanosecond or so might be saved. But the premise is so unlikely I would not suggest doing that for this specific question, even though I'm a big fan of timeit:-).
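If you did want to benchmark, a minimal timeit sketch comparing the two approaches above might look like this (absolute timings vary by machine, so only the relative difference is meaningful):

```python
import timeit

# Compare the two construction approaches shown above.
setup = "import numpy as np"
t_list = timeit.timeit("np.array([5]*3 + [0]*7)", setup=setup, number=10_000)
t_zeros = timeit.timeit("a = np.zeros((10,)); a[:3] = 5", setup=setup, number=10_000)
print(t_list, t_zeros)
```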
I think the way proposed by Alex Martelli is the clearest but here's another alternative using np.repeat which can be quite useful for constructing arrays with repeating values:
>>> np.repeat([5, 0], [3, 7])
array([5, 5, 5, 0, 0, 0, 0, 0, 0, 0])
So here, a list of values [5, 0] is passed in along with a list of repeats [3, 7]. In the returned NumPy array the first element of the values list, 5, is repeated 3 times and the second element 0 is repeated 7 times.
Just do the following.
import numpy as np
arr = np.zeros(10)
arr[:3] = 5
Both Alex and ajcr have useful answers, but one thing to keep in mind is what your expected data-type needs are.
np.zeros, for example, will produce a float array, whereas the other two answers will produce an int array.
You can of course recast using the astype method:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.astype.html
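For example, a small sketch of the dtype difference and an astype recast (the exact integer dtype returned is platform-dependent):

```python
import numpy as np

arr = np.zeros(10)   # float64 by default
arr[:3] = 5
print(arr.dtype)     # float64

arr_int = arr.astype(int)   # recast to the default integer dtype
print(arr_int[:4])          # [5 5 5 0]
```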
import numpy as np
np.repeat([5,0],[3,7]).reshape(10,1)
import numpy as np
array = np.zeros(10)
print("An array of zeros : ")
print(array)
array = np.ones(10)
print("An array of ones : ")
print(array)
array = np.ones(10)*5
print("An array of fives : ")
print(array)
I've seen a lot of questions with answers on how to efficiently replace elements of a NumPy array with specific things, such as "1" or something, if they satisfy certain conditions.
I wish to replace all the elements in a 2D NumPy array with an array of themselves, i.e. element i is turned into an element [i,i,i] or perhaps [f(i),g(i),h(i)] for some functions f(x),g(x),h(x) which I specify. How can this be done pythonically (and preferably, in a way agreeable to Numba)?
Use np.repeat:
a = np.repeat(np.arange(30).reshape(10, 3)[..., np.newaxis], 3, axis=2)
print(a.shape)
print(a[0, 0, :])
Output:
(10, 3, 3)
[0 0 0]
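For the more general [f(i), g(i), h(i)] case mentioned in the question, the same idea works with np.stack along a new last axis; the three functions below are arbitrary placeholders for your f, g, h:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
# Apply three elementwise functions and stack the results as a new last axis
out = np.stack([a + 1, a * 2, a ** 2], axis=2)
print(out.shape)     # (2, 3, 3)
print(out[0, 1, :])  # [2 2 1], i.e. [f(1), g(1), h(1)]
```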
I have a 2-dim numpy array whose rows are combinations of values from (-1, 0, 1) and (0, 1, 2, 3, 4).
Now I want to map each distinct combination to a number in a 1-dim array. For example, [-1, 0] maps to 1 and [-1, 1] maps to 2, so the final result will be a 1-dim array of values 1-15.
Because my data is very big, a slow for loop is not appropriate, so I wanted to use np.where to realize this.
Suppose the 2-dim data is represented as a, with shape (100, 2):
import itertools
import numpy as np
dic = set(itertools.product([-1, 0, 1], [0, 1, 2, 3, 4]))
for key, value in enumerate(dic):
    a[np.where(a == value)[0]] = key
But np.where matches each row twice for a single combination, and the assignments overwrite earlier data, so the result still has shape (100, 2) with overwritten values.
How can I realize my idea quickly and with little computation?
Or is there another way to solve this problem?
Thanks
I have already solved this problem now.
import itertools
import numpy as np

dic = set(itertools.product([-1, 0, 1], [0, 1, 2, 3, 4]))
for key, values in enumerate(dic):
    address = np.where(a == values)[0]
    a[np.delete(address, np.unique(address, return_index=True)[1])] = key
one_dim_result = a[:, 0]
If there are better methods, please share them. Thanks!
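One possible loop-free alternative, assuming the value ranges stated in the question (first column in {-1, 0, 1}, second in {0, 1, 2, 3, 4}): encode each pair arithmetically, which directly yields the 1-15 labels. This is only a sketch based on my reading of the intended mapping:

```python
import numpy as np

a = np.array([[-1, 0], [-1, 1], [1, 4], [0, 2]])  # example (100, 2)-style data
# (first + 1) selects one of 3 blocks of 5; the second column picks the slot
one_dim_result = (a[:, 0] + 1) * 5 + a[:, 1] + 1
print(one_dim_result)  # [ 1  2 15  8]
```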
I have two 2D numpy arrays, for example:
A = numpy.array([[1, 2, 4, 8], [16, 32, 32, 8], [64, 32, 16, 8]])
and
B = numpy.array([[1, 2], [32, 32]])
I want all rows of A that contain all elements of at least one row of B. Where a row of B has 2 of the same element, matching rows of A must contain at least 2 as well. For my example, I want to achieve this:
A_filtered = [[1, 2, 4, 8], [16, 32, 32, 8]]
I have control over the value representation, so I chose numbers whose binary representation has a 1 in only one place (for example 0b00000001, 0b00000010, etc.). This way I can easily check whether all types of values are present in a row using np.logical_or.reduce(), but I cannot check that the counts of repeated elements in a row of A are at least as large as in B. I was really hoping to avoid a plain for loop and deep copies of the arrays, as performance is a very important aspect for me.
How can I do that in numpy in an efficient way?
Update:
A solution from here may work, but performance is a big concern for me: A can be really big (>300000 rows) while B is moderate (>30):
[set(row).issuperset(hand) for row in A.tolist() for hand in B.tolist()]
Update 2:
The set() solution is not working since the set() drops all duplicated values.
I hope I got your question right; at least it works for the problem you described. If the order of the output should stay the same as the input, replace the in-place sort accordingly.
The code looks quite ugly, but it should perform well and shouldn't be too hard to understand.
Code
import time
import numba as nb
import numpy as np
@nb.njit(fastmath=True, parallel=True)
def filter(A, B):
    iFilter = np.zeros(A.shape[0], dtype=nb.bool_)
    for i in nb.prange(A.shape[0]):
        break_loop = False
        for j in range(B.shape[0]):
            ind_to_B = 0
            for k in range(A.shape[1]):
                if A[i, k] == B[j, ind_to_B]:
                    ind_to_B += 1
                if ind_to_B == B.shape[1]:
                    iFilter[i] = True
                    break_loop = True
                    break
            if break_loop:
                break
    return A[iFilter, :]
Measuring performance
####First call has some compilation overhead####
A=np.random.randint(low=0, high=60, size=300_000*4).reshape(300_000,4)
B=np.random.randint(low=0, high=60, size=30*2).reshape(30,2)
t1=time.time()
#At first sort the arrays
A.sort()
B.sort()
A_filtered=filter(A,B)
print(time.time()-t1)
####Let's measure the second call too####
A=np.random.randint(low=0, high=60, size=300_000*4).reshape(300_000,4)
B=np.random.randint(low=0, high=60, size=30*2).reshape(30,2)
t1=time.time()
#At first sort the arrays
A.sort()
B.sort()
A_filtered=filter(A,B)
print(time.time()-t1)
Results
46ms after the first run on a dual-core Notebook (sorting included)
32ms (sorting excluded)
I think this should work:
First, encode the data as follows (this assumes a limited number of 'tokens', as your binary scheme also seems to imply):
Make A shape [n_rows, n_tokens], dtype int8, where each element counts the number of tokens. Encode B in the same way, with shape [n_hands, n_tokens]
This allows a single vectorized expression for your output: matches = (A[None, :, :] >= B[:, None, :]).all(axis=-1). (Exactly how to map this matches array to your desired output format is left as an exercise to the reader, since the question leaves it undefined for multiple matches.)
But we are talking about >10 MB of memory per token here. Even with 32 tokens that should not be unthinkable; still, in a situation like this it tends to be better not to vectorize the loop over n_tokens or n_hands, or both. For loops are fine for small n, or when there is sufficient work in the body that the looping overhead is insignificant.
As long as n_tokens and n_hands remain moderate, I think this will be the fastest solution, if staying within the realm of pure python and numpy.
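A small sketch of the counting encoding described above, using the example arrays from the question (np.bincount assumes small non-negative integer tokens; names like A_counts are illustrative):

```python
import numpy as np

A = np.array([[1, 2, 4, 8], [16, 32, 32, 8], [64, 32, 16, 8]])
B = np.array([[1, 2], [32, 32]])

n_tokens = 65  # one bin per possible token value
A_counts = np.stack([np.bincount(row, minlength=n_tokens) for row in A])
B_counts = np.stack([np.bincount(row, minlength=n_tokens) for row in B])

# matches[h, r]: row r of A contains hand h of B as a multiset
matches = (A_counts[None, :, :] >= B_counts[:, None, :]).all(axis=-1)
A_filtered = A[matches.any(axis=0)]
print(A_filtered)  # [[ 1  2  4  8] [16 32 32  8]]
```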
I have a very large NumPy array: a = np.array. From this array I want to get the min, max and average which can be easily done with np.min(a), np.max(a) and np.mean(a).
However, I want also to have the min, max and average of a portion (begin part or end part) of this array. Are there some functions for this without creating a new array/list (because that would really result in a bad performance penalty)?
All arrays generated by basic slicing are always views of the original array.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
So, yes, just use slices.
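A quick sketch: because a basic slice is a view, the reductions below never copy the data:

```python
import numpy as np

a = np.arange(1000, dtype=np.float64)
head = a[:100]           # basic slice: a view, not a copy
print(head.base is a)    # True -> shares a's memory
print(np.min(head), np.max(head), np.mean(head))  # 0.0 99.0 49.5
```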
If the chunk you're working on is contiguous (i.e. no fancy indexing, in that case the part will get copied), you can use usual slicing syntax to get a view on the part of the array in question, without copying:
>>> import numpy as np
>>> arr = np.array([1,2,3,4,5])
>>> part = arr[1:3] # no copies here
>>> part[:] = 22,33
>>> print(arr)
[ 1 22 33 4 5]
What's the best way to create 2D arrays in Python?
What I want is want is to store values like this:
X , Y , Z
so that I access data like X[2],Y[2],Z[2] or X[n],Y[n],Z[n] where n is variable.
I don't know in the beginning how big n would be so I would like to append values at the end.
>>> a = []
>>> for i in range(3):
...     a.append([])
...     for j in range(3):
...         a[i].append(i+j)
...
>>> a
[[0, 1, 2], [1, 2, 3], [2, 3, 4]]
>>>
Depending what you're doing, you may not really have a 2-D array.
80% of the time you have simple list of "row-like objects", which might be proper sequences.
myArray = [ ('pi',3.14159,'r',2), ('e',2.71828,'theta',.5) ]
myArray[0][1] == 3.14159
myArray[1][1] == 2.71828
More often, they're instances of a class or a dictionary or a set or something more interesting that you didn't have in your previous languages.
myArray = [ {'pi':3.1415925,'r':2}, {'e':2.71828,'theta':.5} ]
20% of the time you have a dictionary, keyed by a pair
myArray = { (2009,'aug'):(some,tuple,of,values), (2009,'sep'):(some,other,tuple) }
Rarely, will you actually need a matrix.
You have a large, large number of collection classes in Python. Odds are good that you have something more interesting than a matrix.
In Python one would usually use lists for this purpose. Lists can be nested arbitrarily, thus allowing the creation of a 2D array. Not every sublist needs to be the same size, so that solves your other problem. Have a look at the examples I linked to.
If you want to do some serious work with arrays then you should use the numpy library. This will allow you for example to do vector addition and matrix multiplication, and for large arrays it is much faster than Python lists.
However, numpy requires that the size is predefined. Of course you can also store numpy arrays in a list, like:
import numpy as np
vec_list = [np.zeros((3,)) for _ in range(10)]
vec_list.append(np.array([1,2,3]))
vec_sum = vec_list[0] + vec_list[1] # possible because we use numpy
print(vec_list[10][2]) # prints 3
But since your numpy arrays are pretty small I guess there is some overhead compared to using a tuple. It all depends on your priorities.
See also this other question, which is pretty similar (apart from the variable size).
I would suggest that you use a dictionary like so:
arr = {}
arr[1] = (1, 2, 4)
arr[18] = (3, 4, 5)
print(arr[1])
>>> (1, 2, 4)
If you're not sure an entry is defined in the dictionary, you'll need a validation mechanism when calling "arr[x]", e.g. try-except.
If you are concerned about memory footprint, the Python standard library contains the array module; these arrays contain elements of the same type.
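A minimal sketch of the array module (type code 'd' stores C doubles):

```python
from array import array

a = array('d', [1.0, 2.0, 3.0])  # homogeneous, compact storage
a.append(4.0)                    # grows like a list
print(a[-1], len(a))  # 4.0 4
```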
Please consider the following code:
from numpy import zeros
scores = zeros((len(chain1),len(chain2)), float)
x = list()

def enter(n):
    y = list()
    for i in range(0, n):
        y.append(int(input("Enter ")))
    return y

for i in range(0, 2):
    x.insert(i, enter(2))
print(x)
Here I made a function that creates a 1-D array and inserted the result into another array as a member, giving multiple 1-D arrays inside one array. As the values of n and i change, you can build multi-dimensional arrays this way.