Are Numpy arrays hashable? - python

I've read that numpy arrays are hashable, which would mean they are immutable, but I'm able to change their values, so what does it actually mean to be hashable?
c=pd.Series('a',index=range(6))
c
Out[276]:
0 a
1 a
2 a
3 a
4 a
5 a
dtype: object
This doesn't give me an error, so why does it give an error if I try to do the same with a numpy array?
d=pd.Series(np.array(['a']),index=range(6))

Contrary to what you have read, arrays are not hashable. You can test this with
import numpy as np, collections.abc
isinstance(np.array(1), collections.abc.Hashable)
or
{np.array(1):1}
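A minimal runnable sketch of both checks (assuming NumPy is installed), showing the result of the isinstance test and the actual error a dict lookup raises:

```python
import collections.abc
import numpy as np

a = np.array(1)
# ndarray sets __hash__ = None, so the ABC check reports False
print(isinstance(a, collections.abc.Hashable))   # False

try:
    {a: 1}
    hashed = True
except TypeError as err:
    hashed = False
    print(err)   # unhashable type: 'numpy.ndarray'
```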
This has nothing to do with the error you are getting:
d=pd.Series(np.array('a'),index=range(6))
ValueError: Wrong number of dimensions
The error is specific and has nothing to do with hashes. The Series constructor expects at least something with 1 dimension, whereas the above has 0 dimensions. This is because it is getting an array, so it checks the dimension (as opposed to passing the string directly, which the pandas developers have chosen to broadcast across the index, as you have shown; to be honest they could have chosen the same for a 0-dimensional array).
So you could try:
d=pd.Series(np.array(('a',)),index=range(6))
ValueError: Wrong number of items passed 1, placement implies 6
The index implies there should be 6 items along one dimension, but the array supplies only 1, so it fails. Finally,
pd.Series(np.array(['a']*6),index=range(6))
0 a
1 a
2 a
3 a
4 a
5 a
dtype: object
works. So a Series has no problem being initiated from an array, and this has nothing to do with hashability.
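A short sketch of the two constructions that do work (assuming pandas and NumPy): a scalar is broadcast across the index, and a 1-D array works when its length matches the index.

```python
import numpy as np
import pandas as pd

# A plain scalar is broadcast across the index by pandas...
s1 = pd.Series('a', index=range(6))

# ...while an array must already be 1-D with a matching length.
s2 = pd.Series(np.array(['a'] * 6), index=range(6))

print(s1.equals(s2))   # True
```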

Upsampling using Numpy

I want to upsample a given 1d array by adding 'k-1' zeros between the elements for a given upsampling factor 'k'.
k=2
A = np.array([1,2,3,4,5])
B = np.insert(A,np.arange(1,len(A)), values=np.zeros(k-1))
The above code works for k=2.
Output: [1 0 2 0 3 0 4 0 5]
k=3
A = np.array([1,2,3,4,5])
B = np.insert(A,np.arange(1,len(A)), values=np.zeros(k-1))
For k=3, it throws an error. The output I desire has k-1, i.e. 3-1 = 2, zeros between the elements of the 1d array:
Output: [1,0,0,2,0,0,3,0,0,4,0,0,5]
ValueError Traceback (most recent call last)
Cell In [98], line 4
1 k = 3
3 A = np.array([1,2,3,4,5])
----> 4 B = np.insert(A, np.arange(1,len(A)), values=np.zeros(k-1))
6 print(k,'\n')
7 print(A,'\n')
File <__array_function__ internals>:180, in insert(*args, **kwargs)
File c:\Users\Naruto\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\lib\function_base.py:5325, in insert(arr, obj, values, axis)
5323 slobj[axis] = indices
5324 slobj2[axis] = old_mask
-> 5325 new[tuple(slobj)] = values
5326 new[tuple(slobj2)] = arr
5328 if wrap:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (4,)
Would this be what you are looking for?
k=3
A=np.array([1,2,3,4,5])
B=np.insert(A, list(range(1,len(A)))*(k-1), 0)
I just duplicate the indexes in the obj argument. Also, there is no need to build an array of zeros; a single scalar 0 will do for the values argument.
Note that there are certainly better ways than a list to create that index (since it actually builds a Python list). I fail to think of a one-liner for now, but if that list is big, it might be a good idea to create an iterator for it.
I am not sure (I've never asked myself this question before) whether this insert is optimal either.
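One possible way to build that index entirely in NumPy, as a sketch, is to repeat an arange instead of multiplying a Python list:

```python
import numpy as np

k = 3
A = np.array([1, 2, 3, 4, 5])

# Each insertion position 1..len(A)-1 is repeated k-1 times, so
# np.insert places k-1 zeros between every pair of elements.
idx = np.arange(1, len(A)).repeat(k - 1)
B = np.insert(A, idx, 0)
print(B)   # [1 0 0 2 0 0 3 0 0 4 0 0 5]
```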
For example
B=np.zeros((k*len(A)-k+1,), dtype=A.dtype)
B[::k]=A
also does the trick. Which one is better memory-wise (I would say this one, at first glance, because it doesn't create the obj list) and CPU-wise, I am not sure.
EDIT: in fact, I've just tried. The second solution is way faster (27 ms vs 1586 ms, for A with 50,000 values and k=100), which is not surprising. It is quite easy to figure out what it does (in C, I mean, in NumPy's code, not in Python): just an allocation, then a loop to copy some values. It could hardly be simpler. Whereas insert probably computes shifting and such.
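The timing comparison can be reproduced in miniature (a sketch; the array size and k are scaled down from the 50,000/100 used above, and the absolute numbers will vary by machine):

```python
import timeit
import numpy as np

A = np.arange(5_000)
k = 10

def with_insert():
    # index built with NumPy's repeat rather than a Python list
    return np.insert(A, np.arange(1, len(A)).repeat(k - 1), 0)

def with_zeros():
    B = np.zeros(k * len(A) - k + 1, dtype=A.dtype)
    B[::k] = A
    return B

# Both approaches produce the same upsampled array
assert np.array_equal(with_insert(), with_zeros())

t_insert = timeit.timeit(with_insert, number=3)
t_zeros = timeit.timeit(with_zeros, number=3)
print(t_insert, t_zeros)
```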
A simple and fast method: use np.zeros to create B, then assign values from A.
k = 3
A = np.array([1,2,3,4,5])
B = np.zeros(k*len(A)-k+1, dtype=A.dtype)
B[::k] = A

Problem converting numpy array to ctypes array

I'm having a problem converting a numpy array to a ctypes array. I don't get any errors or exceptions, but the ctypes array is completely different from the original array.
def convarray(x):
    arr = x.ctypes.data_as(ctypes.POINTER(ctypes.c_uint64))
    print(arr[0], arr[1], arr[2])
    print(x.shape, x.dtype, x)
    ...
The result of the print statements is:
8 399 1099526307842
(958150,) uint64 [ 8 8 8 ... 92 94 96]
As you can see, of the first three elements, only one is correct.
Why is this happening?
I am using Numpy 1.21.0 with Python 3.9.2
I discovered what the problem was: the array being passed as x was derived by slicing from a 2d array, so its underlying data was strided rather than contiguous. Setting x = x.copy() solved the problem by creating a new array with contiguous 1-dimensional data.
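A small sketch of the same fix (the array shapes here are made up for illustration): a column slice of a 2-D array is a strided view, and np.ascontiguousarray (equivalent to .copy() in this case) makes the memory layout match what the ctypes pointer assumes.

```python
import ctypes
import numpy as np

a2d = np.arange(12, dtype=np.uint64).reshape(3, 4)
col = a2d[:, 0]                      # a strided view: elements 4 rows apart
assert not col.flags['C_CONTIGUOUS']

# Make the data contiguous before handing it to ctypes
col_c = np.ascontiguousarray(col)    # same effect as col.copy() here
ptr = col_c.ctypes.data_as(ctypes.POINTER(ctypes.c_uint64))
print([ptr[i] for i in range(3)])    # [0, 4, 8], matching col_c
```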

What is the difference between numpyArr[:,:,:,c] and numpyArr[...,c]?

I've been taking a deep learning course on Coursera. While I was doing my assignment I saw this piece of code on GitHub:
1. numpyArr[...,c]
2. numpyArr[:,:,:,c]
What is the difference between these slicing methods?
If the array has 4 dimensions, there is no difference in the result. However, if you do not really care about the number of dimensions, the Ellipsis (...) stands for any number of dimensions. So the first version means:
"get all dimensions but from the last one (whatever the last is) only entry c "
and the second means
"get dimensions 0, 1, 2 completely, and from dimension 3 only entry c".
This is the same for a 4-d array but different for a 5-d array.
For an array with many dimensions even more fun is possible:
arr = np.random.uniform(size=(3, 3, 3, 3, 3))
print(arr[1, ..., 2, 3].shape)
Which means: take entry 1 along the first dimension, entries 2 and 3 along the last two dimensions, and everything in between in full.
Some years ago, this has already been asked, but one needs to know that ... is the Ellipsis.
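A quick sketch of both claims (shapes chosen arbitrarily for illustration): the two spellings agree on a 4-D array, but on a 5-D array the Ellipsis still targets the last axis while :,:,:,c targets the fourth.

```python
import numpy as np

a4 = np.zeros((2, 3, 4, 5))
# Identical for a 4-D array
assert np.array_equal(a4[..., 1], a4[:, :, :, 1])

a5 = np.zeros((2, 3, 4, 5, 6))
print(a5[..., 1].shape)       # (2, 3, 4, 5): last axis indexed
print(a5[:, :, :, 1].shape)   # (2, 3, 4, 6): fourth axis indexed
```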

How to reproduce the error 'InvalidArgumentError : expected multiples argument to be a vector of length 2 but got length 3' while using tf.tile

I implemented some code with tf.tile and got this error message:
InvalidArgumentError : expected multiples argument to be a vector of length 2 but got length 3
The code is quite complicated and I can't directly find out what caused the error. So I made some dummy code to reproduce the error, so that I might understand which value was its source. However, I can't figure out how to reproduce this error with dummy code.
I tried to do it like this:
import tensorflow as tf
a = tf.constant([[1,2,3],[2,3,4]])
b = tf.tile(a, [1,1,3])
This gives me the error message:
Shape must be rank 2 but is rank 3 for 'Tile_0' with input shapes:~~
Can anybody provide some example code that can reproduce my original error?
After four years, I suspect a direct solution won't be useful, so here's a general explanation for the other 2,000 people who have viewed this in the meantime!
From the documentation:
This operation creates a new tensor by replicating input multiples times. The output tensor's i'th dimension has input.dims(i) * multiples[i] elements, and the values of input are replicated multiples[i] times along the 'i'th dimension. For example, tiling [a b c d] by [2] produces [a b c d a b c d].
Your a is a rank-2 tensor, so tf.tile expects multiples to have 2 elements, but your multiples has 3 elements. The length of multiples must match a's rank.
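As a sketch, assuming TensorFlow 2.x with eager execution: the runtime check fires before any graph shape inference, which in my understanding reproduces the original "vector of length 2 but got length 3" message rather than the rank error seen in graph mode. The exact exception text may differ by version.

```python
import tensorflow as tf

a = tf.constant([[1, 2, 3], [2, 3, 4]])   # a rank-2 tensor

try:
    tf.tile(a, [1, 1, 3])                 # 3 multiples for a rank-2 input
    mismatched = False
except Exception as err:                  # typically tf.errors.InvalidArgumentError
    mismatched = True
    print(err)

b = tf.tile(a, [2, 3])                    # correct: one multiple per dimension
print(b.shape)                            # (4, 9)
```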

Issues converting dtype i4 to dtype s4

So I have an array of shape (10,3) with dtype i4. I'm looking to convert the data into 3 different string arrays, but I'm having issues with the 2d version of the array.
import numpy
xyz = (100.0*numpy.random.random((10,3))).astype("i4")
A = xyz[:,0:3].view("S12") #works fine
B = xyz[:,0:2].view("S8") #fails
C = xyz[:,0:1].view("S4") #works fine
D = xyz[0,0:2].view("S8") #works fine using only 1 element instead of whole array
Why is it not possible for me to use the general form:
xyz[:,0:dim].view("S%d"%(4*dim))
regardless of the dim chosen?
xyz[:,:2].copy().view('S8')
works.
With [:,:2] you are viewing 2 numbers (two 4-byte blocks), skipping 1, viewing the next 2, and so on. That can be a view, not a copy, because strides and shape can describe it without changing the underlying data.
But if you try to view the same memory in 8-byte blocks, strides can't handle it: that would require viewing 1 block, then skipping half a block, and so on.
By making a copy, those 8-byte blocks become contiguous and can be viewed as a unit.
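The explanation above can be demonstrated with strides and flags (a sketch; np.ascontiguousarray has the same effect as .copy() here):

```python
import numpy as np

xyz = (100.0 * np.random.random((10, 3))).astype("i4")

view = xyz[:, 0:2]
print(view.strides)                  # (12, 4): rows sit 12 bytes apart, so the
                                     # two i4 columns are not contiguous
assert not view.flags['C_CONTIGUOUS']

packed = np.ascontiguousarray(view)  # copies just the selected columns
s8 = packed.view('S8')               # each row is now one contiguous 8-byte block
print(s8.shape)                      # (10, 1)
```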
numpy.array(xyz[:,0:2].tobytes()) is effectively a copy as well: it writes the data (just the :2 columns) to a byte string and recreates an array (tobytes() replaces the deprecated tostring()).
So I've done some experimenting and it looks like I've found something that consistently works:
numpy.array(xyz[:,0:2].tobytes()).view("10S8").reshape(10,1)
I'm not super pumped to be using it since it's so odd looking, but it works, so whatever I guess. If anyone has a better answer let me know.
