cannot sum a column in numpy with nansum - python

OK, here are the preconditions I cannot change:
I have a DataFrame with a single column.
It has to be converted to a NumPy array and summed in NumPy.
It looks like this, and it starts from an arbitrary index (I don't think I need to re-index it just to save on computational overhead):
3 1.32745e+06
4 0
5 6.07657e+08
6 NaN
The following does not sum it but returns NaN. What am I doing wrong?
np_value = np_value.values
print(np.nansum(np_value))

Please provide more information on what your np_value is, because I believe that is where you are going wrong. I tried this and got the correct answer of 5.
import numpy as np
import pandas as pd
# Create a NumPy array of values
np_values = np.array([1, 0, 4, np.nan])
# Put those values in a DataFrame to test
np_values = pd.DataFrame(data=np_values)
# Take just the values of that DataFrame
np_value = np_values.values
print(np.nansum(np_value))

np.nansum can't operate on object arrays or string arrays:
>>> import numpy as np
>>> arr = np.array([1.32745e+06, 0, 6.07657e+08, 'NaN'], dtype=object)
>>> np.nansum(arr)
TypeError: unsupported operand type(s) for +: 'float' and 'str'
>>> arr = np.array([1.32745e+06, 0, 6.07657e+08, 'NaN'])
>>> np.nansum(arr)
TypeError: cannot perform reduce with flexible type
You need to cast it to a numeric type (e.g. float) to make it work:
>>> np.nansum(arr.astype(float))
608984450.0
Note: It's pretty obvious in this case that it's an object or string array, because the 0 would display as 0.0 in a float array. Be careful with object arrays: they are slow and often unsupported.
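If the data starts out in a single-column DataFrame, as in the question, the cast can also be done on the pandas side before handing the values to NumPy. A minimal sketch, assuming the column holds strings like those shown above (the DataFrame construction here is a hypothetical reconstruction of the asker's data):
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the asker's single-column DataFrame,
# starting from an arbitrary index and holding strings.
df = pd.DataFrame({'col': ['1.32745e+06', '0', '6.07657e+08', 'NaN']},
                  index=[3, 4, 5, 6])

# pd.to_numeric parses the strings to floats ('NaN' becomes np.nan),
# so np.nansum can skip the missing value as intended.
np_value = pd.to_numeric(df['col']).values
print(np.nansum(np_value))  # 608984450.0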

Related

Why does select raise a FutureWarning?

In my code I have a 2D numpy.ndarray filled with numpy.str_ values. I'm trying to change the "null" values to "nan" using the select method. The problem is that this method raises a FutureWarning.
I have read this. Following a suggestion there, I tried not to compare Python strings with NumPy strings, but to convert the Python string to a NumPy string at the start. Obviously that doesn't help, and I'm looking for advice.
I would like to avoid silencing the warning (as is done in the link); that seems to me like a very dirty approach.
My code snippet:
import pandas_datareader as pd
import numpy as np
import datetime as dt
start_date = dt.datetime(year=2013, month=1, day=1)
end_date = dt.datetime(year=2013, month=2, day=1)
df = pd.DataReader("AAA", "yahoo", start_date, end_date + dt.timedelta(days=1))
array = df.to_numpy()
null = np.str_("null")
nan = np.str_("nan")
array = np.select([array == null, not array == null], [nan, array])
print(array[0][0].__class__)
print(null.__class__)
C:\Python\Project.py:13: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
array = np.select([array == null, not array == null], [nan, array])
<class 'numpy.str_'>
<class 'numpy.str_'>
I'm quite new to Python, so any help will be appreciated. Also, if you have a better way to achieve this, please let me know.
Thank you!
Edit: Sorry for that. Now it should work as it is.
I don't have 50 reputation yet, so I can't comment.
As I understand it, you only want to change all 'null' entries to 'nan'?
Your code creates a NumPy array of float values, but for some reason you expect strings of 'null' in the array?
Perhaps you should've written
array = df.to_numpy()
array = array.astype(str)
to make it more clear.
From here, the array consists only of strings, and to make the change from 'null' to 'nan', you only have to write
array[array == 'null'] = 'nan'
and the warning is gone. You don't even have to use np.select.
If you want floating-point values in your array, you could use NumPy's own np.nan instead of a string, and do
array = array.astype(float)
The 'nan' strings are automatically converted to np.nan, which is a float.
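Putting those pieces together, here is a minimal self-contained sketch; the small hand-built array is a stand-in for df.to_numpy(), since the Yahoo download above may no longer be reproducible:
import numpy as np

# Hypothetical stand-in for df.to_numpy().astype(str): a 2D string array.
array = np.array([['1.5', 'null'],
                  ['null', '2.5']])

# Boolean-mask assignment replaces every 'null' in place;
# no np.select needed, and no FutureWarning.
array[array == 'null'] = 'nan'

# Optionally parse to floats; the 'nan' strings become np.nan.
array = array.astype(float)
print(array)
# [[1.5 nan]
#  [nan 2.5]]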

numpy slicing - TypeError: only integer scalar arrays can be converted to a scalar index

datafile: pattern1.ktx
import numpy as np
data = np.fromfile('pattern1.ktx', dtype=np.byte)
print ('endianness:', hex(data[12:13]))
Result:
TypeError: only integer scalar arrays can be converted to a scalar index
Looks simple enough? I'm not getting it, though. How do I fix this? Thank you.
How about this one?
b = np.frombuffer(np.array(data[12:12+4], dtype=np.byte), dtype=np.uint32)
print ('endianness:', hex(b))
Same error. How to fix?
OK, np.frombuffer returns an array (not a list), so it has to be indexed:
b = np.frombuffer(np.array(data[12:12+4], dtype=np.byte), dtype=np.uint32)
print ('endianness:', hex(b[0]))
Error fixed.
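As a side note, the intermediate np.array copy isn't needed; the same four bytes can be reinterpreted directly with .view. A sketch, assuming the standard KTX layout (a 12-byte identifier followed by a uint32 endianness field):
import numpy as np

# Read the file as unsigned bytes (np.uint8 avoids the sign surprises
# that np.byte, i.e. int8, can introduce).
data = np.fromfile('pattern1.ktx', dtype=np.uint8)

# Reinterpret bytes 12..15 as one native-endian uint32; [0] extracts
# the scalar that hex() requires.
endianness = data[12:16].view(np.uint32)[0]
print('endianness:', hex(endianness))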

Unable to insert NA's in numpy array

I was working on this piece of code and was stuck here.
import numpy as np
a = np.arange(10)
a[7:] = np.nan
In theory, it should insert missing values starting from index 7 until the end of the array. However, when I ran the code, seemingly random values were inserted into the array instead of NaNs.
Can someone explain what happened here, and how should I intentionally insert missing values into NumPy arrays?
Not-a-number (NaN) is a special floating point value, so it cannot be stored in an integer array. By default, np.arange() creates an array of type int, and assigning np.nan into it casts the NaN to an arbitrary integer, which is where the "random" values come from. Casting the array to float allows you to insert NaNs:
import numpy as np
a = np.arange(10).astype(float)
a[7:] = np.nan
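Equivalently, the array can be created as float from the start, which avoids the intermediate cast. A short sketch:
import numpy as np

# dtype=float means the NaN assignment works without any conversion step.
a = np.arange(10, dtype=float)
a[7:] = np.nan
print(a)  # [0. 1. 2. 3. 4. 5. 6. nan nan nan]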

pandas read_csv() converting imaginary to real

After reading a file with pandas using the following lines:
import pandas as pd
import numpy as np
df = pd.read_csv('PN_lateral_n_eff.txt', header=None)
df.columns = ["effective_index"]
here is my output:
effective_index
0 2.568393573877396+1.139080496494329e-006i
1 2.568398351899841+1.129979376397734e-006i
2 2.568401556986464+1.123872317134941e-006i
After that, I cannot use NumPy to convert it into a real number, because the pandas dtype is object. I tried this:
np.real(df, dtype = float)
TypeError: real() got an unexpected keyword argument 'dtype'
Any way to do that?
Looks like astype(complex) works with NumPy arrays of strings, but not with pandas Series of objects (note: the chain is parenthesized because a backslash continuation followed by a comment is a SyntaxError, and the removed np.complex alias is replaced by the builtin complex):
cmplx = (df['effective_index']
         .str.replace('i', 'j')  # go engineering
         .values                 # go NumPy
         .astype(str)            # go string
         .astype(complex))       # go complex
#array([ 2.56839357 +1.13908050e-06j,  2.56839835 +1.12997938e-06j,
#        2.56840156 +1.12387232e-06j])
df['effective_index'] = cmplx  # go pandas again
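If the end goal really is a real number, taking .real of the complex values finishes the job. Continuing from the snippet above (a sketch; this simply drops the tiny imaginary components):
# .real on a complex NumPy array returns the float64 real parts.
df['effective_index'] = cmplx.real
print(df['effective_index'])
# 0    2.568394
# 1    2.568398
# 2    2.568402
# Name: effective_index, dtype: float64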

DataFrame of objects `astype(float)` behaviour different depending if lists or arrays

I'll preface this with the statement that I wouldn't do this in the first place, and that I ran across it while helping a friend.
Consider the data frame df
df = pd.DataFrame(pd.Series([[1.2]]))
df
0
0 [1.2]
This is a data frame of objects where the objects are lists. In my friend's code, they had:
df.astype(float)
Which breaks as I had hoped
ValueError: setting an array element with a sequence.
However, if those values were numpy arrays instead:
df = pd.DataFrame(pd.Series([np.array([1.2])]))
df
0
0 [1.2]
And I tried the same thing:
df.astype(float)
0
0 1.2
It's happy enough to do something and converts my length-1 arrays to scalars. This feels very dirty!
If instead they were not length-1 arrays:
df = pd.DataFrame(pd.Series([np.array([1.2, 1.3])]))
df
0
0 [1.2, 1.3]
Then it breaks
ValueError: setting an array element with a sequence.
Question
Please tell me this is a bug and we can fix it. Or can someone explain why and in what world this makes sense?
Response to #root
You are right. Is this worth an issue? Do you expect/want this?
a = np.empty((1,), object)
a[0] = np.array([1.2])
a.astype(float)
array([ 1.2])
And
a = np.empty((1,), object)
a[0] = np.array([1.2, 1.3])
a.astype(float)
ValueError: setting an array element with a sequence.
This is due to the unsafe default value for the casting argument of astype. In the docs, the casting argument is described as follows:
"Controls what kind of data casting may occur. Defaults to ‘unsafe’ for backwards compatibility." (my emphasis)
Any of the other possible casting rules raise a TypeError.
a = np.empty((1,), object)
a[0] = np.array([1.2])
a.astype(float, casting='same_kind')
Results in:
TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'same_kind'
This is true for all castings except unsafe, namely: no, equiv, safe, and same_kind.
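So if you want the length-1 case to fail just as loudly, you can pass an explicit casting rule yourself. A small sketch illustrating both behaviours:
import numpy as np

a = np.empty((1,), object)
a[0] = np.array([1.2])

# Default casting='unsafe': the length-1 array is silently unwrapped.
print(a.astype(float))  # array([1.2])

# Any stricter rule refuses the object -> float64 cast up front.
try:
    a.astype(float, casting='safe')
except TypeError as e:
    print(e)
# Cannot cast array from dtype('O') to dtype('float64')
# according to the rule 'safe'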
