Overloading "==" operator for numpy arrays - python

I am defining a function in Python that needs to check

if a == b:
    do.stuff()

In principle, a and b could be numpy arrays or integers, and I would like my implementation to be robust to both. However, to check equality of numpy arrays one needs to wrap the element-wise comparison in all(), which breaks when a and b are plain integers.
Is there a simple way to write the equality test so that it works regardless of whether a and b are integers or numpy arrays?

How about this, which works for both arrays and plain numbers:

if np.array_equal(a, b):
    do.stuff()
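For example, np.array_equal reduces the comparison to a single bool for scalars, lists, and arrays alike, so no call to all() is needed (a minimal sketch):

```python
import numpy as np

# np.array_equal returns one bool regardless of the input types.
print(np.array_equal(3, 3))                                    # True
print(np.array_equal(np.array([1, 2]), np.array([1, 2])))      # True
print(np.array_equal(np.array([1, 2]), np.array([1, 3])))      # False
print(np.array_equal(np.array([1, 2]), np.array([1, 2, 3])))   # False: shapes differ
```

It also returns False (rather than raising) when the shapes differ, which plain `(a == b).all()` would not.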


Write multiple arrays with different format (string and numbers) python

I am very new to Python and would like to write formatted output (something like fprintf in MATLAB). I do not know why this is not working. Here is the code:
import numpy as np
coord=np.linspace(0,10,5)
keyy=("LE")
key=np.repeat(keyy,5)
out_arr=np.array_str(key)
zip=np.array([coord,out_arr])
zzip=zip.T
print(zzip)
savefile=np.savetxt("nam.dat",zzip,fmt="%f %s")
The problem is with the following line:
out_arr=np.array_str(key)
This converts the array ['LE' 'LE' 'LE' 'LE' 'LE'] into the single string "['LE' 'LE' 'LE' 'LE' 'LE']" (note the quotes). It is no longer an array of five elements; numpy then interprets it as a length-1 array. You first need to drop that line:
key=np.repeat(keyy,5)
zip=np.array([coord,key])
The next problem you will run into is that this converts the coord numbers into strings, so every element ends up as a string. This is because a numpy array has a single, fixed dtype (there are exceptions, but they are more complicated), and the only common type that can hold both columns here is a string.
The simple way around this is to use an "object" array (roughly the numpy analogue of a MATLAB cell array), which stores arbitrary Python objects rather than fixed-size data:
zip=np.array([coord,key], dtype='object')
However, the better solution if you can is to use pandas. Pandas is sort of like MATLAB tables, but much more powerful. It is designed for this sort of data, and has very nice functions for writing text files like you want to do here in a cleaner, more explicit way.
Also, zip is a built-in function, and it is better not to shadow built-in names with your own variables. It is allowed, but zip is a useful function and you don't want to block access to it.
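Putting those fixes together, a corrected version of the script might look like this (writing to nam.dat as in the question; the variable names are just illustrative):

```python
import numpy as np

coord = np.linspace(0, 10, 5)
key = np.repeat("LE", 5)        # no np.array_str conversion

# An object array keeps the floats as floats and the labels as strings.
data = np.array([coord, key], dtype=object).T

# One float column and one string column per row.
np.savetxt("nam.dat", data, fmt="%f %s")
```

With pandas, the same thing would be roughly `pd.DataFrame({"coord": coord, "key": key}).to_csv(...)`, which also documents the column meanings.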

why isn't the max of an empty numpy array negative infinity?

Calling arr.max() on an empty numpy array causes an error. But it's a common convention in math to say that the maximum of the empty set is negative infinity. And since numpy supports infinities, why doesn't np.max() behave this way? It would save me a couple lines of additional logic to handle empty arrays. I'm sure there's a good reason, just curious what that is.
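For what it's worth, NumPy does let you supply that identity element yourself via the initial argument to max (available since NumPy 1.15), which avoids the extra empty-array logic:

```python
import numpy as np

arr = np.array([])

# max() on an empty array raises ValueError by default...
try:
    arr.max()
except ValueError as exc:
    print("raises:", exc)

# ...but an explicit identity element makes the empty case well-defined:
print(arr.max(initial=-np.inf))                  # -inf
print(np.array([3.0, 7.0]).max(initial=-np.inf)) # 7.0
```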

Time complexity of python string index access?

If I'm not mistaken, a Python string is stored as Unicode scalars. However, Unicode scalars can combine to form larger grapheme clusters, so computing the memory offset start + scalarSize * n for string[n] would not give you the nth user-perceived character.
Does this mean that Python iterates linearly through each scalar to get to the scalar you are looking for? If you have
word = 'caf' + chr(0x65) + chr(0x301)  # café ('e' plus a combining acute accent)
Does Python store this as five scalars and iteratively check if any should be combined before moving on or does it run a check upon insertion and store 'pure' scalars?
Edit: I was confusing Python with another language. Python's print() output renders as grapheme clusters, but Python's str stores the scalars exactly as you entered them. So two combined scalars display as one grapheme cluster, which can look identical to a single precomposed scalar. When you call string[0], you get back the scalar as it was inserted into the string.
Python string indexing does not consider grapheme clusters. It works by Unicode code points. I don't think Python actually has anything built-in for working with grapheme clusters.
String indexing takes constant time, but if you want to retrieve the nth grapheme cluster, string indexing won't do that for you.
(People sometimes suggest applying canonical composition to the string, but there are plenty of possible grapheme clusters that still take multiple code points after canonical composition.)
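A quick illustration of the code-point behaviour, using unicodedata from the standard library (NFC is canonical composition):

```python
import unicodedata

word = 'caf' + chr(0x65) + chr(0x301)  # 'cafe' + U+0301 combining acute

print(len(word))      # 5 code points, though it displays as 4 characters
print(repr(word[4]))  # indexing returns the bare combining accent, not 'é'

# Canonical composition merges 'e' + U+0301 into the precomposed 'é'...
composed = unicodedata.normalize('NFC', word)
print(len(composed))  # 4
print(composed[3])    # 'é'
# ...but, as noted above, many clusters stay multi-code-point even after NFC.
```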

Longdouble(1e3000) becomes inf: What can I do?

(Most other related questions on the web concern conversion between C's long double and Python's types. This question is different.)
I do not see why I cannot correctly get a longdouble in python like this:
In [72]: import numpy as np
In [73]: np.longdouble(1e3000)
Out[73]: inf
It seems that I need to tell Python that 1e3000 is a longdouble instead of a double. How can I do that?
The problem is that in an expression like ...(1e3000), the Python parser evaluates what is inside the parentheses first and passes the result to the function call. Long double is not a native Python type, so the value inside the parentheses is already inf, and that is what the longdouble constructor receives. The fact that the string version fails too could be considered a bug in NumPy: it suggests the string is converted to a Python float (a "float64", or "double" in C) internally, possibly via the normal Python float constructor.
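You can see this directly: the literal overflows to inf before NumPy is ever involved:

```python
import numpy as np

# The float literal itself exceeds IEEE double's maximum (~1.8e308)...
print(1e3000)                  # inf
print(1e3000 == float('inf'))  # True

# ...so the constructor receives inf, not the original digits.
print(np.longdouble(1e3000))   # inf
```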
The workaround is to build the long double object first, with a value that is compatible with a Python float, and then multiply it up to the desired value. If you need to do that with several values, use a NumPy array instead of a single value:
>>> x = np.longdouble(10)
>>> x
10.0
>>> x **= 3000
>>> x
9.9999999999999999999e+2999
Python doesn't have "long doubles". A literal in scientific notation is a float literal, and floats cannot represent 1e3000, so you get inf. If integers work for your use case, you might be able to do what you need with 10**3000.
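Python ints are arbitrary precision, so 10**3000 is computed exactly (all 3001 digits), unlike the float literal:

```python
# Integer exponentiation never overflows in Python.
n = 10 ** 3000
print(len(str(n)))  # 3001 digits: a 1 followed by 3000 zeros
print(1e3000)       # inf -- the float literal overflows
```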

Limiting Numeric Digits in Python

I want to put numerics and strings into the same numpy array. However, I very rarely (it is difficult to replicate, but it happens) run into an error where the numeric-to-string conversion produces a value that cannot be parsed back into a decimal (i.e., I get "9.8267567e" instead of "9.8267567e-5" in the array). This is causing problems after writing files. Here is an example of what I am doing (though on a much smaller scale):
import numpy as np
x = np.array(.94749128494582)
y = np.array(x, dtype='|S100')
My understanding is that this should allow 100 string characters, but sometimes I am seeing a cut-off after ~10. Is there another type that I should be assigning, or a way to limit the number of characters in my array (x)?
First of all, x = np.array(.94749128494582) may not be doing what you think: it creates a 0-dimensional array rather than a 1-element array. Perhaps you meant x = np.array([.94749128494582])?
Now, as for preserving the strings properly, you could solve this by using
y = np.array(x, dtype=object)
However, as Joe has mentioned in his comment, it's not very numpythonic and you may as well be using plain old python lists.
I would recommend examining carefully why you need to hold strings and numbers in the same array; it suggests you may have inappropriate data structures and could benefit from redesigning/refactoring. numpy arrays are built for fast numerical operations; they are not well suited to string manipulation or to serving as a storage/database layer.
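If you really do need mixed contents, the object dtype keeps each element as its original Python object, so nothing is converted to a string or truncated (a sketch; pandas is usually the better fit for labelled, mixed-type data):

```python
import numpy as np

# Each element is stored as a Python object -- no string conversion.
mixed = np.array([0.94749128494582, "label", 42], dtype=object)

print(type(mixed[0]).__name__)  # float
print(mixed[0])                 # 0.94749128494582 -- full precision preserved
print(type(mixed[1]).__name__)  # str
```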
