2D list to np.ndarray of ndarray - python

I have a 2D list like this
import numpy as np
data=[[1,2],[2,3],[3,4]]
I want to turn this to np.ndarray.
I don't want to get this:
np.array(data).shape
#(3,2)
What I expect is something like this:
result=np.array(np.array([1,2]),np.array([2,3]),np.array([3,4]))
result.dtype
result.shape
#np.ndarray
#(3,)
result[i].shape
result[i].dtype
#(2,)
#np.ndarray
What should I do? Thanks a lot.

As @hpaulj suggested in the comments, you can first create an empty array of object dtype and then assign the list of arrays to it:
data = [[1,2],[2,3],[3,4]]
result = np.empty(shape=len(data), dtype=object)
result[:] = [np.array(i) for i in data]
result
array([array([1, 2]), array([2, 3]), array([3, 4])], dtype=object)
result.shape
(3,)
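The same idea can be wrapped in a small helper. This is only a sketch: the name to_object_array is hypothetical and not part of the answer above. It assigns element by element, so NumPy never gets the chance to stack the equal-length rows into a 2D array.

```python
import numpy as np

def to_object_array(rows):
    """Pack each row of a nested list into its own ndarray inside a 1-D object array."""
    out = np.empty(len(rows), dtype=object)
    # Assigning one slot at a time stores each row array as a single object.
    for i, row in enumerate(rows):
        out[i] = np.asarray(row)
    return out

data = [[1, 2], [2, 3], [3, 4]]
result = to_object_array(data)
print(result.shape)     # (3,)
print(result[0].shape)  # (2,)
```

Note that result.dtype is object here; each element result[i] is itself an ndarray.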

You can try converting the list into an ndarray and then flattening it
np.array(data).flatten()
You can also pass the order parameter to flatten to choose row-major ('C') or column-major ('F') order.
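For reference, a short sketch of what flatten returns for the data in the question. Be aware that the result is a flat 1-D array of scalars with shape (6,), not an array whose elements are arrays.

```python
import numpy as np

data = [[1, 2], [2, 3], [3, 4]]

# Row-major: walk along each row in turn.
flat_c = np.array(data).flatten(order="C")
# Column-major: walk down each column in turn.
flat_f = np.array(data).flatten(order="F")

print(flat_c)        # [1 2 2 3 3 4]
print(flat_f)        # [1 2 3 2 3 4]
print(flat_c.shape)  # (6,)
```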

Related

Get unique values in a list of numpy arrays

I have a list made up of arrays. All have shape (2,).
Minimum example: mylist = [np.array([1,2]),np.array([1,2]),np.array([3,4])]
I would like to get a unique list, e.g.
[np.array([1,2]),np.array([3,4])]
or perhaps even better, a dict with counts, e.g. {np.array([1,2]) : 2, np.array([3,4]) : 1}
So far I tried list(set(mylist)), but the error is TypeError: unhashable type: 'numpy.ndarray'
As the error indicates, NumPy arrays aren't hashable. You can turn them to tuples, which are hashable and build a collections.Counter from the result:
from collections import Counter
Counter(map(tuple,mylist))
# Counter({(1, 2): 2, (3, 4): 1})
If you wanted a list of unique tuples, you could construct a set:
set(map(tuple,mylist))
# {(1, 2), (3, 4)}
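If you need the unique items back as a list of arrays, one possible sketch is to deduplicate the tuples with a dict (which keeps first-seen order, unlike a set) and convert them back. The variable names below are illustrative only.

```python
import numpy as np

mylist = [np.array([1, 2]), np.array([1, 2]), np.array([3, 4])]

# dict keys keep insertion order, so duplicates are dropped in first-seen order.
unique_tuples = list(dict.fromkeys(tuple(a.tolist()) for a in mylist))
unique_arrays = [np.array(t) for t in unique_tuples]

print(unique_arrays)  # [array([1, 2]), array([3, 4])]
```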
In general, the best option is np.unique with the axis, return_index and return_counts parameters (here X stands for your list of arrays, e.g. mylist):
u, idx, counts = np.unique(X, axis=0, return_index=True, return_counts=True)
Then, according to documentation:
u is an array of unique arrays
idx is the indices of the X that give the unique values
counts is the number of times each unique item appears in X
If you need a dictionary, you can't use unhashable values such as arrays as its keys, so you might store them as tuples, as in @yatu's answer, or like this:
dict(zip([tuple(n) for n in u], counts))
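Putting the pieces of this answer together, a minimal end-to-end sketch might look like the following. It assumes X is simply the question's mylist converted to a 2D array; count_by_row is an illustrative name.

```python
import numpy as np

mylist = [np.array([1, 2]), np.array([1, 2]), np.array([3, 4])]
X = np.array(mylist)  # shape (3, 2); works because all arrays have the same length

u, idx, counts = np.unique(X, axis=0, return_index=True, return_counts=True)

# Convert each unique row to a tuple of plain Python ints so it can be a dict key.
count_by_row = {tuple(row.tolist()): int(n) for row, n in zip(u, counts)}

print(count_by_row)  # {(1, 2): 2, (3, 4): 1}
print(idx)           # [0 2]
```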
Pure numpy approach:
numpy.unique(mylist, axis=0)
which produces a 2d array with your unique arrays in rows:
array([[1, 2],
       [3, 4]])
Works if all your arrays have same length (like in your example).
This solution can be useful depending on what you do earlier in your code: perhaps you would not need to get into plain Python at all, but stick to numpy instead, which should be faster.
Use the following:
import numpy as np
mylist = [np.array([1,2]),np.array([1,2]),np.array([3,4])]
np.unique(mylist, axis=0)
This returns the unique arrays as the rows of a 2D array:
array([[1, 2],
[3, 4]])
Source: https://numpy.org/devdocs/user/absolute_beginners.html#how-to-get-unique-items-and-counts

Iterating through rows in numpy array with one row

For a 2D numpy array A, the loop for a in A will loop through all the rows in A. This functionality is what I want for my code, but I'm having difficulty with the edge case where A only has one row (i.e., is essentially a 1-dimensional array). In this case, the for loop treats A as a 1D array and iterates through its elements. What I want to instead happen in this case is a natural extension of the 2D case, where the loop retrieves the (single) row in A. Is there a way to format the array A such that the for loop functions like this?
If you create the array yourself, you can declare it as 2D directly:
A = np.array([[1, 2, 3]])
Otherwise, you can check the number of dimensions of your array before iterating over it:
B = np.array([1, 2, 3])
if B.ndim == 1:
    B = B[None, :]
Or you can use the function np.atleast_2d:
C = np.array([1, 2, 3])
C = np.atleast_2d(C)
If your array truly is a 2D array, even with one row, there is no edge case:
import numpy
a = numpy.array([[1, 2, 3]])
for line in a:
    print(line)
# [1 2 3]
You seem to be confusing numpy.array([[1, 2, 3]]) which is a 2D array of one line and numpy.array([1, 2, 3]) which would be a 1D array.
I think you can use np.expand_dims to achieve your goal
X = np.expand_dims(X, axis=0)
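The suggestions above can be combined into a small generator that always iterates over rows. This is a sketch; iter_rows is a hypothetical helper name, and it relies on np.atleast_2d turning a 1-D array of shape (n,) into shape (1, n).

```python
import numpy as np

def iter_rows(a):
    """Yield the rows of `a`, treating a 1-D array as a single row."""
    yield from np.atleast_2d(a)

rows_1d = list(iter_rows(np.array([1, 2, 3])))
rows_2d = list(iter_rows(np.array([[1, 2, 3], [4, 5, 6]])))

print(len(rows_1d), len(rows_2d))  # 1 2
print(rows_1d[0])                  # [1 2 3]
```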

Scale data error: "ValueError: setting an array element with a sequence." [duplicate]

Why do the following code samples:
np.array([[1, 2], [2, 3, 4]])
np.array([1.2, "abc"], dtype=float)
...all give the following error?
ValueError: setting an array element with a sequence.
Possible reason 1: trying to create a jagged array
You may be creating an array from a list that isn't shaped like a multi-dimensional array:
numpy.array([[1, 2], [2, 3, 4]]) # wrong!
numpy.array([[1, 2], [2, [3, 4]]]) # wrong!
In these examples, the argument to numpy.array contains sequences of different lengths. Those will yield this error message because the input list is not shaped like a "box" that can be turned into a multidimensional array.
Possible reason 2: providing elements of incompatible types
For example, providing a string as an element in an array of type float:
numpy.array([1.2, "abc"], dtype=float) # wrong!
If you really want to have a NumPy array containing both strings and floats, you could use the dtype object, which allows the array to hold arbitrary Python objects:
numpy.array([1.2, "abc"], dtype=object)
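Both failure modes, and the dtype=object workaround, can be checked with a short sketch like the one below. The helper try_make is hypothetical; it only catches the ValueError so the three cases can be compared side by side. The exact error text varies between NumPy versions, so it is not reproduced here.

```python
import numpy as np

def try_make(values, dtype):
    """Return the array, or the ValueError raised while building it."""
    try:
        return np.array(values, dtype=dtype)
    except ValueError as exc:
        return exc

ragged = try_make([[1, 2], [2, 3, 4]], float)  # rows of different lengths
mixed = try_make([1.2, "abc"], float)          # "abc" is not a valid float
as_objects = try_make([1.2, "abc"], object)    # arbitrary Python objects are fine

print(type(ragged).__name__, type(mixed).__name__)  # ValueError ValueError
print(as_objects.dtype)                             # object
```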
The Python ValueError:
ValueError: setting an array element with a sequence.
means exactly what it says: you're trying to cram a sequence of numbers into a single number slot. It can be raised under various circumstances.
1. When you pass a python tuple or list to be interpreted as a numpy array element:
import numpy
numpy.array([1,2,3]) #good
numpy.array([1, (2,3)]) #Fail, can't convert a tuple into a numpy array element
numpy.mean([5,(6+7)]) #good
numpy.mean([5,tuple(range(2))]) #Fail, can't convert a tuple into a numpy array element
def foo():
    return 3
numpy.array([2, foo()]) #good
def foo():
    return [3,4]
numpy.array([2, foo()]) #Fail, can't convert a list into a numpy array element
2. By trying to cram a numpy array length > 1 into a numpy array element:
x = np.array([1,2,3])
x[0] = np.array([4]) #good
x = np.array([1,2,3])
x[0] = np.array([4,5]) #Fail, can't convert the numpy array to fit
#into a numpy array element
A numpy array is being created, and numpy doesn't know how to cram multivalued tuples or arrays into single element slots. It expects whatever you give it to evaluate to a single number; if it doesn't, NumPy responds that it doesn't know how to set an array element with a sequence.
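As a sketch of the second case, the snippet below shows the failing assignment next to one possible workaround: an array of dtype object, whose slots can hold whole arrays. The names failed and slots are illustrative only.

```python
import numpy as np

x = np.array([1, 2, 3])
try:
    x[0] = np.array([4, 5])  # two values cannot fit into one integer slot
    failed = False
except ValueError:
    failed = True

# An object array stores references, so one slot can hold an entire array.
slots = np.empty(3, dtype=object)
slots[0] = np.array([4, 5])

print(failed)          # True
print(slots[0].shape)  # (2,)
```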
In my case, I got this error in TensorFlow. The reason was that I was trying to feed an array whose sequences had different lengths.
Example:
import tensorflow as tf
input_x = tf.placeholder(tf.int32,[None,None])
word_embedding = tf.get_variable('embeddin',shape=[len(vocab_),110],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
embedding_look=tf.nn.embedding_lookup(word_embedding,input_x)
with tf.Session() as tt:
    tt.run(tf.global_variables_initializer())
    a,b=tt.run([word_embedding,embedding_look],feed_dict={input_x:example_array})
    print(b)
If my array is:
example_array = [[1,2,3],[1,2]]
then I get the error:
ValueError: setting an array element with a sequence.
But if I pad the shorter sequence:
example_array = [[1,2,3],[1,2,0]]
it works.
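The padding step can be automated. Below is a minimal NumPy-only sketch; pad_sequences here is a hypothetical helper written for illustration (not a TensorFlow function), and padding with 0 on the right is an assumption that matches the example above.

```python
import numpy as np

def pad_sequences(seqs, pad_value=0):
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(s) for s in seqs)
    out = np.full((len(seqs), width), pad_value, dtype=np.int32)
    for i, s in enumerate(seqs):
        out[i, :len(s)] = s  # copy the real values, leave the rest as padding
    return out

example_array = pad_sequences([[1, 2, 3], [1, 2]])
print(example_array.shape)     # (2, 3)
print(example_array.tolist())  # [[1, 2, 3], [1, 2, 0]]
```

Whether 0 is a safe padding value depends on your vocabulary; if index 0 is a real token you would need a different pad_value.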
For those having trouble with similar problems in NumPy, a very simple solution is to specify dtype=object when creating the array you will assign values to. For instance:
out = np.empty_like(lil_img, dtype=object)
In my case, the problem was different. I was trying to convert a list of lists of int to an array, and one list had a different length than the others. To check for this, you can run:
print([i for i,x in enumerate(list) if len(x) != 560])
In my case, the reference length was 560.
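If you don't know the reference length in advance, one possible sketch is to take the most common length as the expected one and report the rows that deviate. The helper odd_length_rows and the sample data are hypothetical.

```python
from collections import Counter

def odd_length_rows(rows):
    """Return the most common row length and the indices of rows that differ."""
    lengths = [len(r) for r in rows]
    expected, _ = Counter(lengths).most_common(1)[0]
    return expected, [i for i, n in enumerate(lengths) if n != expected]

rows = [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10, 11]]
expected, bad = odd_length_rows(rows)
print(expected, bad)  # 3 [2]
```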
In my case, the problem was with a scatterplot of a dataframe X[]:
ax.scatter(X[:,0],X[:,1],c=colors,
cmap=CMAP, edgecolor='k', s=40) #c=y[:,0],
#ValueError: setting an array element with a sequence.
#Fix with .toarray():
colors = 'br'
y = label_binarize(y, classes=['Irrelevant','Relevant'])
ax.scatter(X[:,0].toarray(),X[:,1].toarray(),c=colors,
cmap=CMAP, edgecolor='k', s=40)
When the shape is not regular or the elements have incompatible data types, the only dtype that np.array can use is object. Since NumPy 1.24 you must pass dtype=object explicitly; otherwise a ragged input raises this same ValueError.
import numpy as np
# arr1 = np.array([[10, 20.], [30], [40]], dtype=np.float32) # error
arr2 = np.array([[10, 20.], [30], [40]], dtype=object) # OK, a 1-D array of lists
arr3 = np.array([[10, 20.], 'hello'], dtype=object) # OK, holds a list and a string
In my case, I had a nested list as the series that I wanted to use as an input.
First check: If
df['nestedList'][0]
outputs a list like [1,2,3], you have a nested list.
Then check if you still get the error when changing to input df['nestedList'][0].
Then your next step is probably to concatenate all nested lists into one unnested list, using
[item for sublist in df['nestedList'] for item in sublist]
This flattening of the nested list is borrowed from How to make a flat list out of list of lists?.
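An equivalent way to flatten one level of nesting is itertools.chain.from_iterable. In the sketch below, a plain list of lists stands in for the df['nestedList'] column, which is an assumption made to keep the example self-contained.

```python
from itertools import chain

# Stand-in for df['nestedList']: each entry is itself a list.
nested = [[1, 2, 3], [4, 5], [6]]

# chain.from_iterable walks each sublist in turn, removing one level of nesting.
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5, 6]
```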
The error is because the dtype argument of the np.array function specifies the data type of the elements in the array, and it can only be set to a single data type that is compatible with all the elements. The value "abc" is not a valid float, so trying to convert it to a float results in a ValueError. To avoid this error, you can either remove the string element from the list, or choose a different data type that can handle both float values and string values, such as object.
numpy.array([1.2, "abc"], dtype=object)

Numpy array with numpy arrays as objects

I'd like to create a numpy ndarray with entries of type ndarray itself. I was able to wrap ndarrays into another type to get it work but I want to do this without wrapping. With wrapping a ndarray x into e.g. the dictionary {1:x} I can do
F = np.vectorize(lambda x: {1:np.repeat(x,3)})
F(np.arange(9).reshape(3,3))
and get a (3,3) ndarray with entries {1:[0,0,0]} ... {1:[8,8,8]} (with ndarrays). When I change F to F = np.vectorize(lambda x: np.repeat(x,3)), numpy complains: ValueError: setting an array element with a sequence. I guess it detects that the entries are arrays themselves and doesn't treat them as objects anymore.
How can I avoid this and do the same thing without wrapping the entries from ndarray into something different?
Thanks a lot in advance for hints :)
You can (ab-)use numpy.frompyfunc:
>>> F = np.arange(9).reshape(3, 3)
>>> np.frompyfunc(F.__getitem__, 1, 1)(range(3))
array([array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])], dtype=object)
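Applied to the function from the question, the same trick might look like the sketch below. np.frompyfunc always returns object arrays, so each result of np.repeat is stored as a single element instead of being unpacked. The name repeat3 is illustrative.

```python
import numpy as np

# One input, one output; the output array has dtype object.
repeat3 = np.frompyfunc(lambda x: np.repeat(x, 3), 1, 1)
out = repeat3(np.arange(9).reshape(3, 3))

print(out.shape, out.dtype)  # (3, 3) object
print(out[2, 2])             # [8 8 8]
```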

How to convert a python set to a numpy array?

I am using a set operation in python to perform a symmetric difference between two numpy arrays. The result, however, is a set and I need to convert it back to a numpy array to move forward. Is there a way to do this? Here's what I tried:
a = numpy.array([1,2,3,4,5,6])
b = numpy.array([2,3,5])
c = set(a) ^ set(b)
The result is a set:
In [27]: c
Out[27]: set([1, 4, 6])
If I convert to a numpy array, it places the entire set in the first array element.
In [28]: numpy.array(c)
Out[28]: array(set([1, 4, 6]), dtype=object)
What I need, however, would be this:
array([1,4,6],dtype=int)
I could loop over the elements to convert one by one, but I will have 100,000 elements and hoped for a built-in function to save the loop. Thanks!
Do:
>>> numpy.array(list(c))
array([1, 4, 6])
And dtype is int (int64 on my side.)
Don't convert the numpy array to a set to perform exclusive-or. Use setxor1d directly.
>>> import numpy
>>> a = numpy.array([1,2,3,4,5,6])
>>> b = numpy.array([2,3,5])
>>> numpy.setxor1d(a, b)
array([1, 4, 6])
Try:
numpy.fromiter(c, int, len(c))
This is twice as fast as the solution that uses a list as an intermediate step.
Try this.
numpy.array(list(c))
Converting to a list before creating the numpy array makes each element of the set an integer entry, rather than storing the whole set as a single object element.
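The three approaches from the answers above can be compared side by side with a short sketch. Since sets are unordered, the first two results are sorted explicitly here; np.setxor1d already returns sorted values.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([2, 3, 5])
c = set(a.tolist()) ^ set(b.tolist())  # symmetric difference as a Python set

from_list = np.array(sorted(c))                               # via a list
from_iter = np.sort(np.fromiter(c, dtype=int, count=len(c)))  # straight from the iterable
from_xor = np.setxor1d(a, b)                                  # no Python set at all

print(from_list, from_iter, from_xor)  # [1 4 6] [1 4 6] [1 4 6]
```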
