I'm trying to use the np.ceil function on a structured numpy array, but all I get is the error message:
TypeError: ufunc 'ceil' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Here's a simple example of what that array looks like:
arr = np.array([(1.4,2.3), (3.2,4.1)], dtype=[("x", "<f8"), ("y", "<f8")])
When I try
np.ceil(arr)
I get the above mentioned error. When I just use one column, it works:
In [77]: np.ceil(arr["x"])
Out[77]: array([ 2., 4.])
But I need to get the entire array. Is there any way other than going column by column, or not using structured arrays altogether?
Here's a dirty solution based on viewing the array without its structure, taking the ceiling, and then converting it back to a structured array.
# sample array
arr = np.array([(1.4,2.3), (3.2,4.1)], dtype = [("x", "<f8"), ("y", "<f8")])
# remove struct and take the ceiling
arr1 = np.ceil(arr.view((float, len(arr.dtype.names))))
# coerce it back into the struct
arr = np.array(list(tuple(t) for t in arr1), dtype = arr.dtype)
# kill the intermediate copy
del arr1
And here it is as a less readable one-liner, without assigning the intermediate copy arr1:
arr = np.array(
    list(tuple(t) for t in np.ceil(arr.view((float, len(arr.dtype.names))))),
    dtype=arr.dtype
)
# array([(2., 3.), (4., 5.)], dtype=[('x', '<f8'), ('y', '<f8')])
I don't claim this is a great solution, but it should help you move on with your project until something better is proposed.
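If it helps, here is a sketch of a field-by-field alternative (my own suggestion, assuming every field is a float type): it modifies the array in place, so no intermediate unstructured copy is needed.

import numpy as np

arr = np.array([(1.4, 2.3), (3.2, 4.1)], dtype=[("x", "<f8"), ("y", "<f8")])

# loop over the named fields and take the ceiling of each column in place
for name in arr.dtype.names:
    arr[name] = np.ceil(arr[name])

# arr is now array([(2., 3.), (4., 5.)], dtype=[('x', '<f8'), ('y', '<f8')])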
Related
I was working with numpy.ndarray and something interesting happened.
I created an array with the shape of (2, 2) and left everything else with the default values.
It created an array for me with these values:
array([[2.12199579e-314, 0.00000000e+000],
[5.35567160e-321, 7.72406468e-312]])
I created another array with the same default values and it also gave me the same result.
Then I created a new array (using the default values and the shape (2, 2)) and filled it with zeros using the 'fill' method.
The interesting part is that now whenever I create a new array with ndarray it gives me an array with 0 values.
So what is going on behind the scenes?
See https://numpy.org/doc/stable/reference/generated/numpy.empty.html#numpy.empty:
(Precisely as @Michael Butscher commented.)
np.empty([2, 2]) creates an array without touching the contents of the memory chunk allocated for the array; thus, the array may look as if filled with some more or less random values.
np.ndarray([2, 2]) does the same.
Other creation methods, however, fill the memory with some values:
np.zeros([2, 2]) fills the memory with zeros,
np.full([2, 2], 9) fills the memory with nines, etc.
Now, if you create a new array via np.empty() after creating (and disposing of, i.e. automatically garbage collected) an array filled with e.g. ones, your new array may be allocated the same chunk of memory and thus look as if "filled" with ones.
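As a quick illustration of that reuse (a sketch only; whether the same buffer comes back is an allocator detail and is not guaranteed on your machine or NumPy version):

import numpy as np

a = np.ones((2, 2))                        # allocate a buffer and fill it with ones
addr = a.__array_interface__['data'][0]    # remember where that buffer lives
del a                                      # dispose of the array; its buffer may be freed

b = np.empty((2, 2))                       # may be handed the same chunk back,
print(b)                                   # in which case it still looks "filled" with 1.0
print(b.__array_interface__['data'][0] == addr)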
np.empty explicitly says it returns:
Array of uninitialized (arbitrary) data of the given shape, dtype, and
order. Object arrays will be initialized to None.
It's compiled code so I can't say for sure, but I strongly suspect it just calls np.ndarray with the shape and dtype.
ndarray describes itself as a low-level function and lists many better alternatives.
In an ipython session I can make two arrays:
In [2]: arr = np.empty((2,2), dtype='int32'); arr
Out[2]:
array([[ 927000399, 1267404612],
[ 1828571807, -1590157072]])
In [3]: arr1 = np.ndarray((2,2), dtype='int32'); arr1
Out[3]:
array([[ 927000399, 1267404612],
[ 1828571807, -1590157072]])
The values are the same, but when I check the "location" of their data buffers, I see that they are different:
In [4]: arr.__array_interface__['data'][0]
Out[4]: 2213385069328
In [5]: arr1.__array_interface__['data'][0]
Out[5]: 2213385068176
We can't use that number in code to fiddle with the values, but it's useful as a human-readable indicator of where the data is stored. (Do you understand the basics of how arrays are stored, with shape, dtype, strides, and data-buffer?)
Why the "uninitialized values" are the same is anyone's guess; mine is that it's just an artifact of how that bit of memory was used before. np.empty stresses that we shouldn't attach any significance to those values.
Doing the ndarray call again produces different values and a different location:
In [9]: arr1 = np.ndarray((2,2), dtype='int32'); arr1
Out[9]:
array([[1469865440, 515],
[ 0, 0]])
In [10]: arr1.__array_interface__['data'][0]
Out[10]: 2213403372816
apparent reuse
If I don't assign the array to a variable, or otherwise "hang on to it", numpy may reuse the data buffer memory:
In [17]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[17]: 2213403374512
In [18]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[18]: 2213403374512
In [19]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[19]: 2213403374512
In [20]: np.empty((2,2), dtype='int').__array_interface__['data'][0]
Out[20]: 2213403374512
Again, we shouldn't place too much significance on this reuse, and certainly not count on it for any calculations.
object dtype
If we specify the object dtype, then the values are initialized to None. This dtype contains references/pointers to objects in memory, and "random" pointers wouldn't be safe.
In [14]: arr1 = np.ndarray((2,2), dtype='object'); arr1
Out[14]:
array([[None, None],
[None, None]], dtype=object)
In [15]: arr1 = np.ndarray((2,2), dtype='U3'); arr1
Out[15]:
array([['', ''],
['', '']], dtype='<U3')
I'm pretty illiterate in using Python/numpy.
I have the following piece of code:
data = np.array([])
for i in range(10):
    data = np.append(data, GetData())
return data
GetData() returns a numpy array with a custom dtype. However when executing the above piece of code, the numbers convert to float64 which I suspect is the culprit for other issues I'm having. How can I copy/append the output of the functions while preserving the dtype as well?
Given the comments stating that you will only know the type of data once you run GetData(), and that multiple types are expected, you could do it like so:
# [...]
dataByType = {}  # dictionary to store the dtypes encountered and the arrays with given dtype
for i in range(10):
    newData = GetData()
    if newData.dtype not in dataByType:
        # If the dtype has not been encountered yet,
        # create an empty array with that dtype and store it in the dict
        dataByType[newData.dtype] = np.array([], dtype=newData.dtype)
    # Append the new data to the corresponding array in dict, depending on dtype
    dataByType[newData.dtype] = np.append(dataByType[newData.dtype], newData)
Taking into account hpaulj's answer, if you wish to preserve the different types you might encounter without creating a new array at each iteration, you can adapt the above to:
# [...]
dataByType = {}  # dictionary to store the dtypes encountered and the list storing data with given dtype
for i in range(10):
    newData = GetData()
    if newData.dtype not in dataByType:
        # If the dtype has not been encountered yet,
        # create an empty list for it and store it in the dict
        dataByType[newData.dtype] = []
    # Append the new data to the corresponding list in the dict, depending on dtype
    dataByType[newData.dtype].append(newData)

# At this point, you have all your data pieces stored according to their original dtype inside the dataByType dictionary.
# Now if you wish you can convert them to numpy arrays as well,
# either by concatenation, updating what is stored in the dict:
for dataType in dataByType:
    dataByType[dataType] = np.concatenate(dataByType[dataType])
# No need to specify the dtype in concatenate here, since the previous step ensures all data pieces share the same type

# or by creating an array directly, to store each data piece at a different index:
for dataType in dataByType:
    dataByType[dataType] = np.array(dataByType[dataType])
# As with concatenate, no need to specify the dtype here
A little example:
import numpy as np

# to get something similar to GetData in the example structure:
getData = [
    np.array([1., 2.], dtype=np.float64),
    np.array([1, 2], dtype=np.int64),
    np.array([3, 4], dtype=np.int64),
    np.array([3., 4.], dtype=np.float64)
]  # dtype specified here for clarity, but not needed

dataByType = {}
for i in range(len(getData)):
    newData = getData[i]
    if newData.dtype not in dataByType:
        dataByType[newData.dtype] = []
    dataByType[newData.dtype].append(newData)

print(dataByType)  # output formatted below for clarity
# {dtype('float64'):
#      [array([1., 2.]), array([3., 4.])],
#  dtype('int64'):
#      [array([1, 2], dtype=int64), array([3, 4], dtype=int64)]}
Now if we use concatenate on that dataset, we get 1D arrays, preserving the original type (dtype=float64 is not shown in the output since it is the default type for floating point values):
for dataType in dataByType:
    dataByType[dataType] = np.concatenate(dataByType[dataType])

print(dataByType)  # once again output formatted for clarity
# {dtype('float64'):
#      array([1., 2., 3., 4.]),
#  dtype('int64'):
#      array([1, 2, 3, 4], dtype=int64)}
And if we use array, we get 2D arrays:
for dataType in dataByType:
    dataByType[dataType] = np.array(dataByType[dataType])

print(dataByType)
# {dtype('float64'):
#      array([[1., 2.],
#             [3., 4.]]),
#  dtype('int64'):
#      array([[1, 2],
#             [3, 4]], dtype=int64)}
An important thing to note: using array will not work as intended if the arrays to combine don't all have the same shape:
import numpy as np

print(repr(np.array([
    np.array([1, 2, 3]),
    np.array([4, 5])
])))
# array([array([1, 2, 3]), array([4, 5])], dtype=object)
You get an array of dtype object, whose elements are in this case arrays of different lengths.
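A side note I'll add (based on my understanding of newer NumPy releases, so treat it as an assumption): recent versions refuse to build such a ragged array implicitly, and you have to request the object dtype yourself:

import numpy as np

# explicitly ask for object dtype to hold the differently-sized arrays
ragged = np.array([np.array([1, 2, 3]), np.array([4, 5])], dtype=object)
print(ragged)        # [array([1, 2, 3]) array([4, 5])]
print(ragged.dtype)  # object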
Your use of [] and append indicates that you are naively copying that common list idiom:
alist = []
for x in another_list:
    alist.append(x)
Your data is not a clone of the [] list:
In [220]: np.array([])
Out[220]: array([], dtype=float64)
It's an array with shape (0,) and dtype float.
np.append is not a list append clone. I stress that because too many new users make that mistake, and the result is many different errors. It is really just a cover for np.concatenate, one that takes 2 arguments instead of a list of arguments. As the docs stress, it returns a new array, and when used iteratively, that means a lot of copying.
It is best to collect your arrays in a list and give that to concatenate. List append is in-place and better when done iteratively. If you give concatenate a list of arrays, the resulting dtype will be the common one (or whatever promotion requires). (Newer versions do let you specify the dtype when calling concatenate.)
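For the original loop, the list-then-concatenate pattern would look roughly like this (a sketch; the GetData stub below is just a stand-in for the asker's function):

import numpy as np

def GetData():
    # stand-in for the real GetData(); returns a small array with a custom dtype
    return np.array([(1, 2.5)], dtype=[('a', 'i4'), ('b', 'f8')])

chunks = [GetData() for _ in range(10)]   # plain list collection, cheap
data = np.concatenate(chunks)             # one copy at the end; the custom dtype survives
print(data.dtype)                         # [('a', '<i4'), ('b', '<f8')]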
Keep the numpy documentation at hand (python too if necessary), and look up functions. Pay attention to how they are called, including the keyword parameters. And practice with small examples. I keep an interactive python session at hand, even when writing answers.
When working with arrays, pay close attention to shape and dtype. Don't make assumptions.
concatenating 2 int arrays:
In [238]: np.concatenate((np.array([1,2]),np.array([4,3])))
Out[238]: array([1, 2, 4, 3])
making one a float array (just by adding a decimal point to one number):
In [239]: np.concatenate((np.array([1,2]),np.array([4,3.])))
Out[239]: array([1., 2., 4., 3.])
It won't let me change the result to int:
In [240]: np.concatenate((np.array([1,2]),np.array([4,3.])), dtype=int)
Traceback (most recent call last):
File "<ipython-input-240-91b4e3fec07a>", line 1, in <module>
np.concatenate((np.array([1,2]),np.array([4,3.])), dtype=int)
File "<__array_function__ internals>", line 180, in concatenate
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'same_kind'
If an element is a string, the result is also a string dtype:
In [241]: np.concatenate((np.array([1,2]),np.array(['4',3.])))
Out[241]: array(['1', '2', '4', '3.0'], dtype='<U32')
Sometimes it is necessary to adjust dtypes after a calculation:
In [243]: np.concatenate((np.array([1,2]),np.array(['4',3.]))).astype(float)
Out[243]: array([1., 2., 4., 3.])
In [244]: np.concatenate((np.array([1,2]),np.array(['4',3.]))).astype(float).astype(int)
Out[244]: array([1, 2, 4, 3])
I am using a package that is fetching values from a csv file for me. If I print out the result I get ['0.12' '0.23']. I checked the type, which is <class 'numpy.ndarray'>. I want to convert it to a numpy array like [0.12, 0.23].
I tried np.asarray(variable) but that did not resolve the problem.
Solution
import numpy as np

# convert the existing string array to float
# (np.float is deprecated/removed in newer NumPy; plain float or np.float64 works)
array = array.astype(float)
# If you are just initializing the array you can do this
ar = np.array(your_list, dtype=float)
It might help to know how the csv was read, but for whatever reason it appears to have created a numpy array with a string dtype:
In [106]: data = np.array(['0.12', '0.23'])
In [107]: data
Out[107]: array(['0.12', '0.23'], dtype='<U4')
In [108]: print(data)
['0.12' '0.23']
The str formatting of such an array omits the commas; the repr display keeps them.
A list equivalent also displays with comma:
In [109]: data.tolist()
Out[109]: ['0.12', '0.23']
We call this a numpy array, but technically it is of class numpy.ndarray:
In [110]: type(data)
Out[110]: numpy.ndarray
It can be converted to an array of floats with:
In [111]: data.astype(float)
Out[111]: array([0.12, 0.23])
It is still a ndarray, just the dtype is different. You may need to read more in the numpy docs about dtype.
The error:
If I want to calculate with it it gives me an error TypeError: only size-1 arrays can be converted to Python scalars
has a different source. data has 2 elements. You don't show the code that generates this error, but often we see this in plotting calls. The parameter is supposed to be a single number (often an integer), whereas your array, even with a numeric dtype, is two numbers.
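A minimal sketch reproducing that error (my own illustration, since the failing code isn't shown):

import numpy as np

data = np.array(['0.12', '0.23']).astype(float)
try:
    float(data)            # two elements where a single scalar is expected
except TypeError as e:
    print(e)               # the "only size-1 arrays ..." error from the question
print(float(data[0]))      # 0.12 -- converting a single element works fine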
Is there a more efficient method in python to extract data from a nested python list such as A = array([[array([[12000000]])]], dtype=object)? I have been using A[0][0][0][0], but it does not seem to be an efficient method when you have lots of data like A.
I have also used
numpy.squeeze(array([[array([[12000000]])]], dtype=object)) but this gives me
array(array([[12000000]]), dtype=object)
PS: The nested array was generated by the loadmat() function in the scipy module when loading a .mat file which consists of nested structures.
Creating such an array is a bit tedious, but loadmat does it to handle the MATLAB cells and 2d matrices:
In [5]: A = np.empty((1,1),object)
In [6]: A[0,0] = np.array([[1.23]])
In [7]: A
Out[7]: array([[array([[ 1.23]])]], dtype=object)
In [8]: A.any()
Out[8]: array([[ 1.23]])
In [9]: A.shape
Out[9]: (1, 1)
squeeze compresses the shape, but does not cross the object boundary
In [10]: np.squeeze(A)
Out[10]: array(array([[ 1.23]]), dtype=object)
but if an array has just one item (regardless of shape), item() can extract it. Indexing also works: A[0,0]
In [11]: np.squeeze(A).item()
Out[11]: array([[ 1.23]])
item again to extract the number from that inner array:
In [12]: np.squeeze(A).item().item()
Out[12]: 1.23
Or we don't even need the squeeze:
In [13]: A.item().item()
Out[13]: 1.23
loadmat has a squeeze_me parameter.
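A sketch of what that can look like (the file name 'data.mat' and the variable name 'A' are placeholders here):

from scipy.io import loadmat

# squeeze_me=True collapses the 1x1 MATLAB wrappers at load time,
# so the nested [0,0] indexing above is often unnecessary
mat = loadmat('data.mat', squeeze_me=True)
value = mat['A']    # a scalar (or lower-dimensional array) instead of nested 1x1 arrays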
Indexing is just as easy:
In [17]: A[0,0]
Out[17]: array([[ 1.23]])
In [18]: A[0,0][0,0]
Out[18]: 1.23
astype can also work (though it can be picky about the number of dimensions).
In [21]: A.astype(float)
Out[21]: array([[ 1.23]])
With single-item arrays like this, efficiency isn't much of an issue. All these methods are quick. Things become more complicated when the array has many items, or the items are themselves large.
How to access elements of numpy ndarray?
You could use A.all() or A.any() to get a scalar. This would only work if A contains one element.
Try A.flatten()[0]
This will flatten the array into a single dimension and extract the first item from it. In your case, the first item is the only item.
What worked in my case was the following:
import os
import scipy.io

xcat = scipy.io.loadmat(os.path.join(dir_data, file_name))
pars = xcat['pars']  # Extract the numpy.void element from the loadmat object
# Note that you are dealing with a numpy structured array object when you enter pars[0][0].
# Thus you can access names and all that...
dict_values = [x[0][0] for x in pars[0][0]]    # Extract all elements in one go
dict_keys = list(pars.dtype.names)             # Extract the corresponding names/tags
dict_xcat = dict(zip(dict_keys, dict_values))  # Pack it up again in a dict
The idea behind this is to first extract ALL the values I want and format them in a nice Python dict.
This saves me from cumbersome indexing later in the file...
Of course, this is a very specific solution, since in my case the values I needed were all floats/ints.
How to read only the string column using numpy in Python?
csv file:
1,2,3,"Hello"
3,3,3,"New"
4,5,6,"York"
How to get array like:
["Hello","york","New"]
without using pandas and sklearn.
I give the column names as a,b,c,d in the csv.
import numpy as np
ary=np.genfromtxt(r'yourcsv.csv',delimiter=',',dtype=None)
ary.T[-1]
Out[139]:
array([b'd', b'Hello', b'New', b'York'],
dtype='|S5')
import numpy
fname = 'sample.csv'
csv = numpy.genfromtxt(fname, dtype=str, delimiter=",")
names = csv[:,-1]
print(names)
Choosing the data type
The main way to control how the sequences of strings we have read from the file are converted to other types is to set the dtype argument. Acceptable values for this argument are:
a single type, such as dtype=float. The output will be 2D with the given dtype, unless a name has been associated with each column with the use of the names argument (see below). Note that dtype=float is the default for genfromtxt.
a sequence of types, such as dtype=(int, float, float).
a comma-separated string, such as dtype="i4,f8,|U3".
a dictionary with two keys 'names' and 'formats'.
a sequence of tuples (name, type), such as dtype=[('A', int), ('B', float)].
an existing numpy.dtype object.
the special value None. In that case, the type of the columns will be determined from the data itself (see below).
When dtype=None, the type of each column is determined iteratively from its data. We start by checking whether a string can be converted to a boolean (that is, if the string matches true or false in lower cases); then whether it can be converted to an integer, then to a float, then to a complex and eventually to a string. This behavior may be changed by modifying the default mapper of the StringConverter class.
The option dtype=None is provided for convenience. However, it is significantly slower than setting the dtype explicitly.
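For the sample file above, spelling the dtype out explicitly might look like this (a sketch; the field names a, b, c, d follow the question, and note that genfromtxt keeps the quote characters):

import numpy as np
from io import StringIO

txt = StringIO('1,2,3,"Hello"\n3,3,3,"New"\n4,5,6,"York"')
data = np.genfromtxt(txt, delimiter=',',
                     dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'i4'), ('d', 'U7')])
print(data['d'])    # ['"Hello"' '"New"' '"York"']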
A quick file substitute:
In [275]: txt = b'''
...: 1,2,3,"Hello"
...: 3,3,3,"New"
...: 4,5,6,"York"'''
In [277]: np.genfromtxt(txt.splitlines(), delimiter=',',dtype=None,usecols=3)
Out[277]:
array([b'"Hello"', b'"New"', b'"York"'],
dtype='|S7')
bytestring array in Py3; or a default unicode string dtype:
In [278]: np.genfromtxt(txt.splitlines(), delimiter=',',dtype=str,usecols=3)
Out[278]:
array(['"Hello"', '"New"', '"York"'],
dtype='<U7')
Or the whole thing:
In [279]: data=np.genfromtxt(txt.splitlines(), delimiter=',',dtype=None)
In [280]: data
Out[280]:
array([(1, 2, 3, b'"Hello"'), (3, 3, 3, b'"New"'), (4, 5, 6, b'"York"')],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', 'S7')])
select the f3 field:
In [282]: data['f3']
Out[282]:
array([b'"Hello"', b'"New"', b'"York"'],
dtype='|S7')
Speed should be basically the same.
To extract specific values into a numpy array, one approach could be:
import csv
import numpy as np

with open('Exercise1.csv', 'r') as file:
    file_content = list(csv.reader(file, delimiter=","))

data = np.array(file_content)
print(file_content[1][1], len(file_content))

patient = []  # collect the first column, skipping the header row
for i in range(1, len(file_content)):
    patient.append(file_content[i][0])
first_column_array = np.array(patient, dtype=str)
Here i iterates through the rows of file_content, and the second index is the position of the value within the row, so 0 picks the first value.