I would like to use the dct functionality from scipy.fftpack with an array of numpy float64. However, it seems to be implemented only for np.float32. Is there any quick workaround to get this done? I looked into it briefly, but I am not sure of all the dependencies, so before messing everything up I thought I'd ask for tips here!
The only thing I have found about this so far is this link: http://mail.scipy.org/pipermail/scipy-svn/2010-September/004197.html
Thanks in advance.
Here is the ValueError it raises:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-f09567c28e37> in <module>()
----> 1 scipy.fftpack.dct(c[100])
/usr/local/Cellar/python/2.7.3/lib/python2.7/site-packages/scipy/fftpack/realtransforms.pyc in dct(x, type, n, axis, norm, overwrite_x)
118 raise NotImplementedError(
119 "Orthonormalization not yet supported for DCT-I")
--> 120 return _dct(x, type, n, axis, normalize=norm, overwrite_x=overwrite_x)
121
122 def idct(x, type=2, n=None, axis=-1, norm=None, overwrite_x=0):
/usr/local/Cellar/python/2.7.3/lib/python2.7/site-packages/scipy/fftpack/realtransforms.pyc in _dct(x, type, n, axis, overwrite_x, normalize)
215 raise ValueError("Type %d not understood" % type)
216 else:
--> 217 raise ValueError("dtype %s not supported" % tmp.dtype)
218
219 if normalize:
ValueError: dtype >f8 not supported
The problem is not the double precision; double precision is of course supported. The problem is that you have a little-endian computer but big-endian data (perhaps loaded from a file); note the > in dtype >f8 not supported. It seems you will simply have to cast it to native byte order yourself. If you know the data is double precision, you probably just want to convert everything to your native order once:
c = c.astype(float)
Though I guess you could also check c.dtype.byteorder, which should be '=' for native order, and convert only if it isn't, something along the lines of:
if c.dtype.byteorder != '=':
    c = c.astype(c.dtype.newbyteorder('='))
This should also work if you happen to have single-precision or integer data...
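Putting the two snippets together, a minimal runnable sketch (using a synthetic big-endian array in place of your loaded data):

```python
import numpy as np

# Simulate big-endian float64 data ('>f8'), e.g. read from a file
# written on a big-endian machine.
c = np.arange(4, dtype='>f8')

# Convert to native byte order if needed; '=' means native order,
# '|' means byte order is not applicable (single-byte types).
if c.dtype.byteorder not in ('=', '|'):
    c = c.astype(c.dtype.newbyteorder('='))

# c is now native-endian and can be passed to scipy.fftpack.dct.
print(c.dtype.byteorder)
```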
Related
I started working with a dataset that is a collection of murder reports. There is a column "Perpetrator Age" containing simple integers, but when I checked its type, it turned out to be dtype('O').
To work with this column further, I want to change its type to dtype('int64'). I tried to do it like this:
data['Perpetrator Age'] = data['Perpetrator Age'].astype(int)
and got this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-64-50a3c796ab1e> in <module>()
----> 1 data['Perpetrator Age'] = data['Perpetrator Age'].astype(int)
4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
972 # work around NumPy brokenness, #1987
973 if np.issubdtype(dtype.type, np.integer):
--> 974 return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
975
976 # if we have a datetime/timedelta array of objects
pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()
ValueError: invalid literal for int() with base 10: ' '
I saw advice that an "object" column must first be converted to a string, and then to int. I tried it, but it didn't work either; the same error appeared. Please tell me how I can fix this?
As mentioned in the comments, the first row of your df is apparently an empty space (' '). You can either remove it, replace it with something else, or skip it:
df['column_1'].iloc[1:].astype('int')
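A sketch of all three options on a toy Series (the values here are made up, not from the real dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the problematic column, with a blank entry.
s = pd.Series(['15', ' ', '42'], name='Perpetrator Age')

# Option 1: coerce anything non-numeric to NaN, then decide what to do.
ages = pd.to_numeric(s, errors='coerce')

# Option 2: replace the blank with a sentinel before casting.
ages_filled = s.replace(' ', 0).astype(int)

# Option 3: drop the offending rows entirely.
ages_clean = s[s.str.strip() != ''].astype(int)
```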
I was getting this error:
> float() argument must be a string or a number
So, why does this happen? (I tried commands like np.asarray(), but it keeps failing.)
mp.mpc(cmath.rect(a,b)))
The items in raizes are actually mpmath.mpc instances rather than native Python complex floats. numpy doesn't know how to deal with mpmath types, hence the TypeError.
You didn't mention mpmath at all in your original question. The problem would still have been easy to diagnose if you had posted the full traceback, rather than cutting off the most important part at the end:
In [10]: np.roots(Q)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-f3a270c7e8c0> in <module>()
----> 1 np.roots(Q)
/home/alistair/.venvs/mpmath/lib/python3.6/site-packages/numpy/lib/polynomial.py in roots(p)
220 # casting: if incoming array isn't floating point, make it floating point.
221 if not issubclass(p.dtype.type, (NX.floating, NX.complexfloating)):
--> 222 p = p.astype(float)
223
224 N = len(p)
TypeError: float() argument must be a string or a number, not 'mpc'
Whenever you ask for help with debugging on this site, please always post the whole traceback rather than just (part of) the last line - it contains a lot of information that can be helpful for diagnosing the problem.
The solution is simple enough - just don't convert the native Python complex floats returned by cmath.rect to mpmath.mpc complex floats:
raizes = []
for i in range(2*n):
    a, f = cmath.polar(l[i])
    if (f > np.pi/2) or (f < -np.pi/2):
        raizes.append(cmath.rect(a*r, f))
Q = np.poly(raizes)
print(np.roots(Q))
# [-0.35372430 +1.08865146e+00j -0.92606224 +6.72823602e-01j
#  -0.35372430 -1.08865146e+00j -1.14467588 -9.11902316e-16j
#  -0.92606224 -6.72823602e-01j]
I am loading a train.csv file to fit it with a RandomForestClassifier.
The load and processing of the .csv file happens fine. I am able to play around with my dataframe.
When I try:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=150, min_samples_split=2, n_jobs=-1)
rf.fit(train, target)
I get this:
ValueError: could not convert string to float: 'D'
I have tried:
train = train.astype(float)
Replacing all 'D' with another value.
train.convert_objects(convert_numeric=True)
But the issue still persists.
I also tried printing all the values in my csv file, but I cannot find any 'D'.
This is my trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-9d8e309c06b6> in <module>()
----> 1 rf.fit(train, target)
\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in fit(self, X, y, sample_weight)
222
223 # Convert data
--> 224 X, = check_arrays(X, dtype=DTYPE, sparse_format="dense")
225
226 # Remap output
\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_arrays(*arrays, **options)
279 array = np.ascontiguousarray(array, dtype=dtype)
280 else:
--> 281 array = np.asarray(array, dtype=dtype)
282 if not allow_nans:
283 _assert_all_finite(array)
\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
460
461 """
--> 462 return array(a, dtype, copy=False, order=order)
463
464 def asanyarray(a, dtype=None, order=None):
ValueError: could not convert string to float: 'D'
How should I approach this problem?
Since RandomForestClassifier is not (as far as I could find) part of the Python standard library, it's difficult to know exactly what's going on in your case. However, what's really happening is that at some point you're trying to convert the string 'D' into a float.
I can reproduce your error by doing:
float('D')
Now, to be able to debug this problem, I recommend you catch the exception:
try:
    rf.fit(train, target)
except ValueError as e:
    print(e)
    # do something clever with train and target, like pprint them
Then you can look into what's really going on. I couldn't find much about that random forest classifier, except for this, which might help:
https://www.npmjs.com/package/random-forest-classifier
You should explore and clean your data. You probably have a 'D' somewhere in your data that your code tries to convert to a float. A traceback inside a "try-except" block is a good idea.
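Before fitting, a quick way to locate every non-numeric cell is to coerce a copy of the frame to numbers and compare. A sketch with a made-up frame (the column names and the stray 'D' position are invented for illustration):

```python
import pandas as pd

# Toy stand-in for the training frame loaded from train.csv.
train = pd.DataFrame({'a': [1.0, 2.0, 3.0],
                      'b': ['0.5', 'D', '1.5']})

# Coerce everything to numbers; unparseable cells become NaN.
numeric = train.apply(pd.to_numeric, errors='coerce')

# A cell that is NaN after coercion but was not missing before
# is exactly an offending non-numeric value.
bad = train[numeric.isna() & train.notna()].stack()
print(bad)
```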
I saved some arrays using numpy.savez_compressed(). One of the arrays is gigantic, it has the shape (120000,7680), type float32.
Trying to load the array gives me the error below (message captured in IPython).
It seems this is a NumPy limitation:
Numpy: apparent memory error
What are other ways to save such a huge array? (I had problems with cPickle as well)
In [5]: t=numpy.load('humongous.npz')
In [6]: humg = (t['arr_0.npy'])
/usr/lib/python2.7/dist-packages/numpy/lib/npyio.pyc in __getitem__(self, key)
229 if bytes.startswith(format.MAGIC_PREFIX):
230 value = BytesIO(bytes)
--> 231 return format.read_array(value)
232 else:
233 return bytes
/usr/lib/python2.7/dist-packages/numpy/lib/format.pyc in read_array(fp)
456 # way.
457 # XXX: we can probably chunk this to avoid the memory hit.
--> 458 data = fp.read(int(count * dtype.itemsize))
459 array = numpy.fromstring(data, dtype=dtype, count=count)
460
SystemError: error return without exception set
System: Ubuntu 12.04 64 bit, Python 2.7, numpy 1.6.1
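One possible workaround, sketched here under the assumption that the array fits on disk even if it is awkward to hold in memory: save it uncompressed with np.save and open it memory-mapped, so only the slices you actually index are read from disk (the file name and toy shape below are illustrative):

```python
import os
import tempfile
import numpy as np

# Toy-sized stand-in for the (120000, 7680) float32 array.
big = np.arange(12, dtype=np.float32).reshape(3, 4)
path = os.path.join(tempfile.mkdtemp(), 'humongous.npy')
np.save(path, big)

# mmap_mode='r' maps the file into memory instead of reading it all
# at once; slices are loaded lazily as you touch them.
arr = np.load(path, mmap_mode='r')
print(arr[1])
```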
Trying to use the awfully useful pandas to deal with data as time series, I am now stumbling over the fact that there do not seem to be libraries that can directly interpolate (with a spline or similar) over data that has a DatetimeIndex as the x-axis. I always seem to be forced to convert first to some floating-point number, like seconds since 1980 or something like that.
I have tried the following things so far; sorry for the weird formatting, I have this stuff only in the IPython notebook and can't copy cells from there:
from scipy.interpolate import InterpolatedUnivariateSpline as IUS
type(bb2temp): pandas.core.series.TimeSeries
s = IUS(bb2temp.index.to_pydatetime(), bb2temp, k=1)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-67-19c6b8883073> in <module>()
----> 1 s = IUS(bb2temp.index.to_pydatetime(), bb2temp, k=1)
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/scipy/interpolate/fitpack2.py in __init__(self, x, y, w, bbox, k)
335 #_data == x,y,w,xb,xe,k,s,n,t,c,fp,fpint,nrdata,ier
336 self._data = dfitpack.fpcurf0(x,y,k,w=w,
--> 337 xb=bbox[0],xe=bbox[1],s=0)
338 self._reset_class()
339
TypeError: float() argument must be a string or a number
By using bb2temp.index.values (which look like this:
array([1970-01-15 184:00:35.884999, 1970-01-15 184:00:58.668999,
1970-01-15 184:01:22.989999, 1970-01-15 184:01:45.774000,
1970-01-15 184:02:10.095000, 1970-01-15 184:02:32.878999,
1970-01-15 184:02:57.200000, 1970-01-15 184:03:19.984000,
) as the x-argument, interestingly, the Spline class does create an interpolator, but it still breaks when trying to interpolate/extrapolate to a larger DatetimeIndex (which is my final goal here). Here is how that looks:
all_times = divcal.timed.index.levels[2] # part of a MultiIndex
all_times
<class 'pandas.tseries.index.DatetimeIndex'>
[2009-07-20 00:00:00.045000, ..., 2009-07-20 00:30:00.018000]
Length: 14063, Freq: None, Timezone: None
s(all_times.values) # applying the above generated interpolator
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-74-ff11f6d6d7da> in <module>()
----> 1 s(tall.values)
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/scipy/interpolate/fitpack2.py in __call__(self, x, nu)
219 # return dfitpack.splev(*(self._eval_args+(x,)))
220 # return dfitpack.splder(nu=nu,*(self._eval_args+(x,)))
--> 221 return fitpack.splev(x, self._eval_args, der=nu)
222
223 def get_knots(self):
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/scipy/interpolate/fitpack.py in splev(x, tck, der, ext)
546
547 x = myasarray(x)
--> 548 y, ier =_fitpack._spl_(x, der, t, c, k, ext)
549 if ier == 10:
550 raise ValueError("Invalid input data")
TypeError: array cannot be safely cast to required type
I tried to use s(all_times) and s(all_times.to_pydatetime()) as well, with the same TypeError: array cannot be safely cast to required type.
Am I, sadly, correct? Did everybody get used to convert times to floating points so much, that nobody thought it's a good idea that these interpolations should work automatically? (I would finally have found a super-useful project to contribute..) Or would you like to prove me wrong and earn some SO points? ;)
Edit: Warning: check your pandas data for NaNs before you hand it to the interpolation routines. They will not complain about anything, but will just silently fail.
The problem is that the fitpack routines used underneath require floats, so at some point there has to be a conversion from datetime to float. This conversion is easy. If bb2temp.index.values is your datetime array, just do:
In [1]: bb2temp.index.values.astype('d')
Out[1]:
array([ 1.22403588e+12, 1.22405867e+12, 1.22408299e+12,
1.22410577e+12, 1.22413010e+12, 1.22415288e+12,
1.22417720e+12, 1.22419998e+12])
You just need to pass that to your spline. And to convert the results back to datetime objects, use results.astype('datetime64').
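Putting it together, a runnable sketch with a made-up series (the index values and frequency are invented for illustration, standing in for bb2temp):

```python
import pandas as pd
from scipy.interpolate import InterpolatedUnivariateSpline

# Toy stand-in for bb2temp: a value series on a DatetimeIndex.
idx = pd.date_range('2009-07-20', periods=6, freq='min')
bb2temp = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0, 32.0], index=idx)

# fitpack wants plain floats, so cast the datetime64 index to double.
x = bb2temp.index.values.astype('d')
s = InterpolatedUnivariateSpline(x, bb2temp.values, k=1)

# Evaluate on a denser grid, cast to double the same way.
dense = pd.date_range(idx[0], idx[-1], freq='30s')
y = s(dense.values.astype('d'))
```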