How can I load sklearn data in Jupyter Python 3?

Hey, I have a very short question. I need to load a dataset for my machine learning course, but it does not work for me and I have no idea why. I'm using Jupyter with Python 3.
My Code:
from sklearn.datasets import fetch_covtype
forest = fetch_covtype()
It works fine for my friend under the same conditions. I have already tried updating scikit-learn with pip install -U scikit-learn, but that did not solve the problem. I hope somebody can help me.
It raises the following error:
UnboundLocalError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/sklearn/datasets/covtype.py in fetch_covtype(data_home, download_if_missing, random_state, shuffle, return_X_y)
126 try:
--> 127 X, y
128 except NameError:
UnboundLocalError: local variable 'X' referenced before assignment
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-9-fb303a92b6ca> in <module>
----> 1 forest =fetch_covtype()
/opt/conda/lib/python3.7/site-packages/sklearn/datasets/covtype.py in fetch_covtype(data_home, download_if_missing, random_state, shuffle, return_X_y)
127 X, y
128 except NameError:
--> 129 X, y = _refresh_cache([samples_path, targets_path], 9)
130 # TODO: Revert to the following two lines in v0.23
131 # X = joblib.load(samples_path)
/opt/conda/lib/python3.7/site-packages/sklearn/datasets/base.py in _refresh_cache(files, compress)
928 msg = "sklearn.externals.joblib is deprecated in 0.21"
929 with warnings.catch_warnings(record=True) as warns:
--> 930 data = tuple([joblib.load(f) for f in files])
931
932 refresh_needed = any([str(x.message).startswith(msg) for x in warns])
/opt/conda/lib/python3.7/site-packages/sklearn/datasets/base.py in <listcomp>(.0)
928 msg = "sklearn.externals.joblib is deprecated in 0.21"
929 with warnings.catch_warnings(record=True) as warns:
--> 930 data = tuple([joblib.load(f) for f in files])
931
932 refresh_needed = any([str(x.message).startswith(msg) for x in warns])
/opt/conda/lib/python3.7/site-packages/joblib/numpy_pickle.py in load(filename, mmap_mode)
603 return load_compatibility(fobj)
604
--> 605 obj = _unpickle(fobj, filename, mmap_mode)
606
607 return obj
/opt/conda/lib/python3.7/site-packages/joblib/numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
527 obj = None
528 try:
--> 529 obj = unpickler.load()
530 if unpickler.compat_mode:
531 warnings.warn("The file '%s' has been generated with a "
/opt/conda/lib/python3.7/pickle.py in load(self)
1083 raise EOFError
1084 assert isinstance(key, bytes_types)
-> 1085 dispatch[key[0]](self)
1086 except _Stop as stopinst:
1087 return stopinst.value
/opt/conda/lib/python3.7/site-packages/joblib/numpy_pickle.py in load_build(self)
353 if isinstance(array_wrapper, NDArrayWrapper):
354 self.compat_mode = True
--> 355 self.stack.append(array_wrapper.read(self))
356
357 # Be careful to register our new method.
/opt/conda/lib/python3.7/site-packages/joblib/numpy_pickle.py in read(self, unpickler)
196 array = self.read_mmap(unpickler)
197 else:
--> 198 array = self.read_array(unpickler)
199
200 # Manage array subclass case
/opt/conda/lib/python3.7/site-packages/joblib/numpy_pickle.py in read_array(self, unpickler)
147 read_size = int(read_count * self.dtype.itemsize)
148 data = _read_bytes(unpickler.file_handle,
--> 149 read_size, "array data")
150 array[i:i + read_count] = \
151 unpickler.np.frombuffer(data, dtype=self.dtype,
/opt/conda/lib/python3.7/site-packages/joblib/numpy_pickle_utils.py in _read_bytes(fp, size, error_template)
241 if len(data) != size:
242 msg = "EOF: reading %s, expected %d bytes got %d"
--> 243 raise ValueError(msg % (error_template, size, len(data)))
244 else:
245 return data
ValueError: EOF: reading array data, expected 262144 bytes got 209661
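The final "EOF: reading array data, expected 262144 bytes got 209661" suggests the copy of the dataset cached on disk is truncated (for example from an interrupted download), so joblib fails while unpickling it. A minimal sketch, assuming the cached files live under scikit-learn's default data home (usually ~/scikit_learn_data) in a covertype subfolder, that deletes the cache and forces a fresh download:
import shutil
from pathlib import Path
from sklearn.datasets import fetch_covtype, get_data_home

# Assumption: the cached covertype files sit under the default data home;
# removing them makes fetch_covtype download and re-cache the dataset.
cache_dir = Path(get_data_home()) / "covertype"
shutil.rmtree(cache_dir, ignore_errors=True)

forest = fetch_covtype(download_if_missing=True)
print(forest.data.shape)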

Related

Trying to download a dataset; the code doesn't work in a Jupyter notebook but it does work in PyCharm

I'm trying to download the MNIST dataset from OpenML, using the openml library.
I tried using Jupyter notebooks because I don't want to download the same dataset every time.
The problem is that after running the following code, I get an error:
from openml.datasets import get_dataset
mnist = get_dataset(554)
x, y, p, q = mnist.get_data(
    dataset_format="dataframe", target=mnist.default_target_attribute
)
I'm pasting the whole error message I get; the problem occurs when I try to assign the result of .get_data() to x, y, p and q.
The environment I'm running this on is called Oceanic.
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
File ~\anaconda3\envs\Oceanic\lib\site-packages\openml\datasets\dataset.py:491, in OpenMLDataset._cache_compressed_file_from_file(self, data_file)
490 try:
--> 491 data = pd.read_parquet(data_file)
492 except Exception as e:
File ~\anaconda3\envs\Oceanic\lib\site-packages\pandas\io\parquet.py:493, in read_parquet(path, engine, columns, storage_options, use_nullable_dtypes, **kwargs)
491 impl = get_engine(engine)
--> 493 return impl.read(
494 path,
495 columns=columns,
496 storage_options=storage_options,
497 use_nullable_dtypes=use_nullable_dtypes,
498 **kwargs,
499 )
File ~\anaconda3\envs\Oceanic\lib\site-packages\pandas\io\parquet.py:240, in PyArrowImpl.read(self, path, columns, use_nullable_dtypes, storage_options, **kwargs)
239 try:
--> 240 result = self.api.parquet.read_table(
241 path_or_handle, columns=columns, **kwargs
242 ).to_pandas(**to_pandas_kwargs)
243 if manager == "array":
File ~\anaconda3\envs\Oceanic\lib\site-packages\pyarrow\parquet.py:1731, in read_table(source, columns, use_threads, metadata, use_pandas_metadata, memory_map, read_dictionary, filesystem, filters, buffer_size, partitioning, use_legacy_dataset, ignore_prefixes)
1727 dataset = ParquetFile(
1728 source, metadata=metadata, read_dictionary=read_dictionary,
1729 memory_map=memory_map, buffer_size=buffer_size)
-> 1731 return dataset.read(columns=columns, use_threads=use_threads,
1732 use_pandas_metadata=use_pandas_metadata)
1734 if ignore_prefixes is not None:
File ~\anaconda3\envs\Oceanic\lib\site-packages\pyarrow\parquet.py:1608, in _ParquetDatasetV2.read(self, columns, use_threads, use_pandas_metadata)
1606 use_threads = False
-> 1608 table = self._dataset.to_table(
1609 columns=columns, filter=self._filter_expression,
1610 use_threads=use_threads
1611 )
1613 # if use_pandas_metadata, restore the pandas metadata (which gets
1614 # lost if doing a specific `columns` selection in to_table)
File ~\anaconda3\envs\Oceanic\lib\site-packages\pyarrow\_dataset.pyx:458, in pyarrow._dataset.Dataset.to_table()
File ~\anaconda3\envs\Oceanic\lib\site-packages\pyarrow\_dataset.pyx:2889, in pyarrow._dataset.Scanner.to_table()
File ~\anaconda3\envs\Oceanic\lib\site-packages\pyarrow\error.pxi:141, in pyarrow.lib.pyarrow_internal_check_status()
File ~\anaconda3\envs\Oceanic\lib\site-packages\pyarrow\error.pxi:112, in pyarrow.lib.check_status()
OSError: NotImplemented: Support for codec 'snappy' not built
The above exception was the direct cause of the following exception:
Exception Traceback (most recent call last)
Input In [10], in <cell line: 1>()
----> 1 x, y, p, q = mnist.get_data(
2 dataset_format="dataframe", target=mnist.default_target_attribute
3 )
File ~\anaconda3\envs\Oceanic\lib\site-packages\openml\datasets\dataset.py:698, in OpenMLDataset.get_data(self, target, include_row_id, include_ignore_attribute, dataset_format)
658 def get_data(
659 self,
660 target: Optional[Union[List[str], str]] = None,
(...)
668 List[str],
669 ]:
670 """ Returns dataset content as dataframes or sparse matrices.
671
672 Parameters
(...)
696 List of attribute names.
697 """
--> 698 data, categorical, attribute_names = self._load_data()
700 to_exclude = []
701 if not include_row_id and self.row_id_attribute is not None:
File ~\anaconda3\envs\Oceanic\lib\site-packages\openml\datasets\dataset.py:531, in OpenMLDataset._load_data(self)
528 self._download_data()
530 file_to_load = self.data_file if self.parquet_file is None else self.parquet_file
--> 531 return self._cache_compressed_file_from_file(file_to_load)
533 # helper variable to help identify where errors occur
534 fpath = self.data_feather_file if self.cache_format == "feather" else self.data_pickle_file
File ~\anaconda3\envs\Oceanic\lib\site-packages\openml\datasets\dataset.py:493, in OpenMLDataset._cache_compressed_file_from_file(self, data_file)
491 data = pd.read_parquet(data_file)
492 except Exception as e:
--> 493 raise Exception(f"File: {data_file}") from e
495 categorical = [data[c].dtype.name == "category" for c in data.columns]
496 attribute_names = list(data.columns)
Exception: File: C:\Users\Irving\.openml\org\openml\www\datasets\554\dataset.pq
Now, I've run the same code in PyCharm and it works just fine: the dataframes are assigned and displayed correctly. I have no idea why it isn't working in Jupyter, and I would like to know why, because I would prefer to work with Jupyter notebooks.
Any help is appreciated, thanks in advance.
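A likely culprit is the line "OSError: NotImplemented: Support for codec 'snappy' not built": the pyarrow build inside the Oceanic conda environment apparently lacks the snappy codec that openml's cached Parquet file uses, while the interpreter PyCharm runs against has a complete build. A minimal diagnostic sketch (nothing assumed beyond an importable pyarrow) to check which build a given kernel sees:
import pyarrow as pa

# The cached dataset (dataset.pq) is a snappy-compressed Parquet file;
# a pyarrow build without that codec fails exactly as in the traceback above.
print(pa.__version__)
print(pa.Codec.is_available("snappy"))
If this prints False in the Jupyter kernel, reinstalling pyarrow in that environment (for example from conda-forge, which ships the codec) is the usual remedy; that particular package source is an assumption, not something stated in the question.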

Error while saving a pandas DataFrame to a feather file, using to_feather() function

I am trying to save a pandas DataFrame to a Feather file using the pandas .to_feather() method, as shown below:
df.to_feather("D:\{}.feather".format(parms['table_or_view_name']))
But I am getting an error pointing to the pyarrow library. Please note that I have already upgraded pyarrow to the most recent build (7.0.0); despite that, I am still facing the same issue:
ArrowInvalid Traceback (most recent call last)
<ipython-input-26-5459a546a0cb> in <module>
----> 1 df.to_feather("D:\{}.feather".format(parms['table_or_view_name']))
~\AppData\Roaming\Python\Python38\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
205 else:
206 kwargs[new_arg_name] = new_arg_value
--> 207 return func(*args, **kwargs)
208
209 return cast(F, wrapper)
~\AppData\Roaming\Python\Python38\site-packages\pandas\core\frame.py in to_feather(self, path, **kwargs)
2517 from pandas.io.feather_format import to_feather
2518
-> 2519 to_feather(self, path, **kwargs)
2520
2521 #doc(
~\AppData\Roaming\Python\Python38\site-packages\pandas\io\feather_format.py in to_feather(df, path, storage_options, **kwargs)
85 path, "wb", storage_options=storage_options, is_text=False
86 ) as handles:
---> 87 feather.write_feather(df, handles.handle, **kwargs)
88
89
C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\feather.py in write_feather(df, dest, compression, compression_level, chunksize, version)
152
153 if _pandas_api.is_data_frame(df):
--> 154 table = Table.from_pandas(df, preserve_index=False)
155
156 if version == 1:
C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\table.pxi in pyarrow.lib.Table.from_pandas()
C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
592
593 if nthreads == 1:
--> 594 arrays = [convert_column(c, f)
595 for c, f in zip(columns_to_convert, convert_fields)]
596 else:
C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\pandas_compat.py in <listcomp>(.0)
592
593 if nthreads == 1:
--> 594 arrays = [convert_column(c, f)
595 for c, f in zip(columns_to_convert, convert_fields)]
596 else:
C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\pandas_compat.py in convert_column(col, field)
579 e.args += ("Conversion failed for column {!s} with type {!s}"
580 .format(col.name, col.dtype),)
--> 581 raise e
582 if not field_nullable and result.null_count > 0:
583 raise ValueError("Field {} was non-nullable but pandas column "
C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\pandas_compat.py in convert_column(col, field)
573
574 try:
--> 575 result = pa.array(col, type=type_, from_pandas=True, safe=safe)
576 except (pa.ArrowInvalid,
577 pa.ArrowNotImplementedError,
C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib.array()
C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib._ndarray_to_array()
C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
ArrowInvalid: ('cannot mix list and non-list, non-null values', 'Conversion failed for column Resources with type object')
Note:
This error only occurs when I try to save the file with to_feather(); when I tried to save the dataframe with .to_json() it worked.
Please suggest a workaround for this error, thanks in advance.
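The ArrowInvalid message points at the Resources column: it mixes list values with non-list values, which Arrow cannot represent in a single column (JSON has no such restriction, which is why .to_json() succeeded). A minimal sketch, assuming Resources is meant to hold lists, that coerces every value to a list before writing; the output path here is a placeholder, not the parms-based path from the question:
import pandas as pd

def as_list(value):
    # Wrap scalars and treat missing values as empty lists so the whole
    # column has one consistent (list) type that Arrow can store.
    if isinstance(value, list):
        return value
    if value is None or (isinstance(value, float) and pd.isna(value)):
        return []
    return [value]

df["Resources"] = df["Resources"].apply(as_list)
df.to_feather(r"D:\example.feather")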

Pipeline error (ValueError: Specifying the columns using strings is only supported for pandas DataFrames)

The example is fully reproducible. Here is the full notebook (which also downloads the data): https://github.com/ageron/handson-ml2/blob/master/02_end_to_end_machine_learning_project.ipynb
After this part of the notebook:
full_pipeline_with_predictor = Pipeline([
    ("preparation", full_pipeline),
    ("linear", LinearRegression())
])
full_pipeline_with_predictor.fit(housing, housing_labels)
full_pipeline_with_predictor.predict(some_data)
I am trying to get predictions on the test set with this code:
X_test_prepared = full_pipeline.transform(X_test)
final_predictions = full_pipeline_with_predictor.predict(X_test_prepared)
But I am receiving this error:
C:\Users\Alex\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py:430: FutureWarning: Given feature/column names or counts do not match the ones for the data given during fit. This will fail from v0.24.
FutureWarning)
---------------------------------------------------------------------------
Empty Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
796 try:
--> 797 tasks = self._ready_batches.get(block=False)
798 except queue.Empty:
~\AppData\Local\Continuum\anaconda3\lib\queue.py in get(self, block, timeout)
166 if not self._qsize():
--> 167 raise Empty
168 elif timeout is None:
Empty:
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-141-dc87b1c9e658> in <module>
5
6 X_test_prepared = full_pipeline.transform(X_test)
----> 7 final_predictions = full_pipeline_with_predictor.predict(X_test_prepared)
8
9 final_mse = mean_squared_error(y_test, final_predictions)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\metaestimators.py in <lambda>(*args, **kwargs)
114
115 # lambda, but not partial, allows help() to work with update_wrapper
--> 116 out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
117 # update the docstring of the returned function
118 update_wrapper(out, self.fn)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\pipeline.py in predict(self, X, **predict_params)
417 Xt = X
418 for _, name, transform in self._iter(with_final=False):
--> 419 Xt = transform.transform(Xt)
420 return self.steps[-1][-1].predict(Xt, **predict_params)
421
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in transform(self, X)
586
587 self._validate_features(X.shape[1], X_feature_names)
--> 588 Xs = self._fit_transform(X, None, _transform_one, fitted=True)
589 self._validate_output(Xs)
590
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in _fit_transform(self, X, y, func, fitted)
455 message=self._log_message(name, idx, len(transformers)))
456 for idx, (name, trans, column, weight) in enumerate(
--> 457 self._iter(fitted=fitted, replace_strings=True), 1))
458 except ValueError as e:
459 if "Expected 2D array, got 1D array instead" in str(e):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1002 # remaining jobs.
1003 self._iterating = False
-> 1004 if self.dispatch_one_batch(iterator):
1005 self._iterating = self._original_iterator is not None
1006
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
806 big_batch_size = batch_size * n_jobs
807
--> 808 islice = list(itertools.islice(iterator, big_batch_size))
809 if len(islice) == 0:
810 return False
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in <genexpr>(.0)
454 message_clsname='ColumnTransformer',
455 message=self._log_message(name, idx, len(transformers)))
--> 456 for idx, (name, trans, column, weight) in enumerate(
457 self._iter(fitted=fitted, replace_strings=True), 1))
458 except ValueError as e:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\__init__.py in _safe_indexing(X, indices, axis)
404 if axis == 1 and indices_dtype == 'str' and not hasattr(X, 'loc'):
405 raise ValueError(
--> 406 "Specifying the columns using strings is only supported for "
407 "pandas DataFrames"
408 )
ValueError: Specifying the columns using strings is only supported for pandas DataFrames
Question: How can I correct that error? And why does that error happen?
Since your final pipeline:
full_pipeline_with_predictor = Pipeline([
    ("preparation", full_pipeline),
    ("linear", LinearRegression())
])
clearly already contains full_pipeline, you should not "prepare" X_test again; doing so "prepares" X_test twice, which is wrong. So your code should simply be
final_predictions = full_pipeline_with_predictor.predict(X_test)
exactly as it is for getting predictions for some_data, i.e.
full_pipeline_with_predictor.predict(some_data)
where you correctly do not "prepare" some_data before feeding it into the final pipeline.
The whole point of using pipelines is exactly this: to avoid running fit and transform separately for what may be several preparation steps, by wrapping all of them into a single pipeline instead. You correctly apply this when you predict some_data, but you seem to have forgotten it in the next step, when you try to predict X_test. A short sketch follows below.
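Putting it together, a minimal usage sketch using only the names from the question:
# Fit once on the raw training data; the "preparation" step transforms it internally.
full_pipeline_with_predictor.fit(housing, housing_labels)

# Predict on the raw test set as well; do not pass full_pipeline.transform(X_test).
final_predictions = full_pipeline_with_predictor.predict(X_test)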

ValueError: field 'IFORM' occurs more than once

I am trying to load a ".unf" file in a Jupyter environment using the HyperSpy library, but I get this error.
import hyperspy.api as hs
data = hs.load("/path/to/file/PRC.unf")
This is the error:
ValueError Traceback (most recent call last)
<ipython-input-7-b0117f505d01> in <module>
----> 1 data = hs.load("/home/vahid/PythonProjects/UNFfiles/PRC.unf")
~/PythonProjects/UNFfiles/venv/lib/python3.7/site-packages/hyperspy/io.py in load(filenames, signal_type, stack, stack_axis, new_axis_name, lazy, convert_units, **kwds)
279 objects = [load_single_file(filename, lazy=lazy,
280 **kwds)
--> 281 for filename in filenames]
282
283 if len(objects) == 1:
~/PythonProjects/UNFfiles/venv/lib/python3.7/site-packages/hyperspy/io.py in <listcomp>(.0)
279 objects = [load_single_file(filename, lazy=lazy,
280 **kwds)
--> 281 for filename in filenames]
282
283 if len(objects) == 1:
~/PythonProjects/UNFfiles/venv/lib/python3.7/site-packages/hyperspy/io.py in load_single_file(filename, **kwds)
316 else:
317 reader = io_plugins[i]
--> 318 return load_with_reader(filename=filename, reader=reader, **kwds)
319
320
~/PythonProjects/UNFfiles/venv/lib/python3.7/site-packages/hyperspy/io.py in load_with_reader(filename, reader, signal_type, convert_units, **kwds)
323 lazy = kwds.get('lazy', False)
324 file_data_list = reader.file_reader(filename,
--> 325 **kwds)
326 objects = []
327
~/PythonProjects/UNFfiles/venv/lib/python3.7/site-packages/hyperspy/io_plugins/semper_unf.py in file_reader(filename, **kwds)
703 def file_reader(filename, **kwds):
704 lazy = kwds.get('lazy', False)
--> 705 semper = SemperFormat.load_from_unf(filename, lazy=lazy)
706 semper.log_info()
707 return [semper.to_signal(lazy=lazy)._to_dictionary()]
~/PythonProjects/UNFfiles/venv/lib/python3.7/site-packages/hyperspy/io_plugins/semper_unf.py in load_from_unf(cls, filename, lazy)
386 :rec_length //
387 2],
--> 388 count=1)
389 metadata.update(sarray2dict(header))
390 assert np.frombuffer(f.read(4), dtype=np.int32)[0] == rec_length, \
ValueError: field 'IFORM' occurs more than once
I am not sure what the error is about. Apparently, "IFORM" is some sort of dictionary key in this type of data structure. I would appreciate it if anyone could help me address this problem.

pymc3: Disaster example with deterministic switchpoint function

I'm trying to reproduce the coal mining disaster example with a deterministic function for the switchpoint instead of using Theano's switch function. Code:
%matplotlib inline
import matplotlib.pyplot as plt
import pymc3
import numpy as np
import theano.tensor as t
import theano
data = np.hstack((np.random.poisson(15,1000),np.random.poisson(2,100)))
plt.plot(data)
@theano.compile.ops.as_op(itypes=[t.lscalar, t.dscalar, t.dscalar], otypes=[t.dvector])
def rate1(sw, mu1, mu2):
    n = len(data)
    out = np.empty(n)
    out[:sw] = mu1
    out[sw:] = mu2
    return out

with pymc3.Model() as dis:
    switchpoint = pymc3.DiscreteUniform('switchpoint', lower=0, upper=len(data)-1)
    mu1 = pymc3.Exponential('mu1', lam=1.)
    mu2 = pymc3.Exponential('mu2', lam=1.)
    disasters = pymc3.Poisson('disasters', mu=rate1, observed=data)
But this code raises an error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
c:\program files\git\theano\theano\tensor\type.py in dtype_specs(self)
266 'complex64': (complex, 'theano_complex64', 'NPY_COMPLEX64')
--> 267 }[self.dtype]
268 except KeyError:
KeyError: 'object'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
c:\program files\git\theano\theano\tensor\basic.py in constant_or_value(x, rtype, name, ndim, dtype)
407 rval = rtype(
--> 408 TensorType(dtype=x_.dtype, broadcastable=bcastable),
409 x_.copy(),
c:\program files\git\theano\theano\tensor\type.py in __init__(self, dtype, broadcastable, name, sparse_grad)
49 self.broadcastable = tuple(bool(b) for b in broadcastable)
---> 50 self.dtype_specs() # error checking is done there
51 self.name = name
c:\program files\git\theano\theano\tensor\type.py in dtype_specs(self)
269 raise TypeError("Unsupported dtype for %s: %s"
--> 270 % (self.__class__.__name__, self.dtype))
271
TypeError: Unsupported dtype for TensorType: object
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
c:\program files\git\theano\theano\tensor\basic.py in as_tensor_variable(x, name, ndim)
201 try:
--> 202 return constant(x, name=name, ndim=ndim)
203 except TypeError:
c:\program files\git\theano\theano\tensor\basic.py in constant(x, name, ndim, dtype)
421 ret = constant_or_value(x, rtype=TensorConstant, name=name, ndim=ndim,
--> 422 dtype=dtype)
423
c:\program files\git\theano\theano\tensor\basic.py in constant_or_value(x, rtype, name, ndim, dtype)
416 except Exception:
--> 417 raise TypeError("Could not convert %s to TensorType" % x, type(x))
418
TypeError: ('Could not convert FromFunctionOp{rate1} to TensorType', )
During handling of the above exception, another exception occurred:
AsTensorError Traceback (most recent call last)
<ipython-input-...> in <module>()
14 mu2 = pymc3.Exponential('mu2',lam=1.)
15 #rate1 = pymc3.switch(switchpoint >= np.arange(len(data)), mu1,mu2)
---> 16 disasters=pymc3.Poisson('disasters', mu=rate1, observed = data)
C:\Users\User\Anaconda3\lib\site-packages\pymc3\distributions\distribution.py in __new__(cls, name, *args, **kwargs)
19 if isinstance(name, str):
20 data = kwargs.pop('observed', None)
---> 21 dist = cls.dist(*args, **kwargs)
22 return model.Var(name, dist, data)
23 elif name is None:
C:\Users\User\Anaconda3\lib\site-packages\pymc3\distributions\distribution.py in dist(cls, *args, **kwargs)
32 def dist(cls, *args, **kwargs):
33 dist = object.__new__(cls)
---> 34 dist.__init__(*args, **kwargs)
35 return dist
36
C:\Users\User\Anaconda3\lib\site-packages\pymc3\distributions\discrete.py in __init__(self, mu, *args, **kwargs)
185 super(Poisson, self).__init__(*args, **kwargs)
186 self.mu = mu
--> 187 self.mode = floor(mu).astype('int32')
188
189 def random(self, point=None, size=None, repeat=None):
c:\program files\git\theano\theano\gof\op.py in __call__(self, *inputs, **kwargs)
598 """
599 return_list = kwargs.pop('return_list', False)
--> 600 node = self.make_node(*inputs, **kwargs)
601
602 if config.compute_test_value != 'off':
c:\program files\git\theano\theano\tensor\elemwise.py in make_node(self, *inputs)
540 using DimShuffle.
541 """
--> 542 inputs = list(map(as_tensor_variable, inputs))
543 shadow = self.scalar_op.make_node(
544 *[get_scalar_type(dtype=i.type.dtype).make_variable()
c:\program files\git\theano\theano\tensor\basic.py in as_tensor_variable(x, name, ndim)
206 except Exception:
207 str_x = repr(x)
--> 208 raise AsTensorError("Cannot convert %s to TensorType" % str_x, type(x))
209
210 # this has a different name, because _as_tensor_variable is the
AsTensorError: ('Cannot convert FromFunctionOp{rate1} to TensorType', )
How do I handle this?
The second thing: when I'm using the pymc3.switch function like this:
with pymc3.Model() as dis:
    switchpoint = pymc3.DiscreteUniform('switchpoint', lower=0, upper=len(data)-1)
    mu1 = pymc3.Exponential('mu1', lam=1.)
    mu2 = pymc3.Exponential('mu2', lam=1.)
    rate1 = pymc3.switch(switchpoint >= np.arange(len(data)), mu1, mu2)
    disasters = pymc3.Poisson('disasters', mu=rate1, observed=data)
And then try to sample:
with dis:
    step1 = pymc3.NUTS([mu1, mu2])
    step2 = pymc3.Metropolis([switchpoint])
    trace = pymc3.sample(10000, step=[step1, step2])
I get an error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
c:\program files\git\theano\theano\compile\function_module.py in __call__(self, *args, **kwargs)
858 try:
--> 859 outputs = self.fn()
860 except Exception:
TypeError: expected type_num 9 (NPY_INT64) got 7
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-4-3247d908f897> in <module>()
2 step1 = pymc3.NUTS([mu1, mu2])
3 step2 = pymc3.Metropolis([switchpoint])
----> 4 trace = pymc3.sample(10000, step = [step1,step2])
C:\Users\User\Anaconda3\lib\site-packages\pymc3\sampling.py in sample(draws, step, start, trace, chain, njobs, tune, progressbar, model, random_seed)
153 sample_args = [draws, step, start, trace, chain,
154 tune, progressbar, model, random_seed]
--> 155 return sample_func(*sample_args)
156
157
C:\Users\User\Anaconda3\lib\site-packages\pymc3\sampling.py in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed)
162 progress = progress_bar(draws)
163 try:
--> 164 for i, strace in enumerate(sampling):
165 if progressbar:
166 progress.update(i)
C:\Users\User\Anaconda3\lib\site-packages\pymc3\sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
244 if i == tune:
245 step = stop_tuning(step)
--> 246 point = step.step(point)
247 strace.record(point)
248 yield strace
C:\Users\User\Anaconda3\lib\site-packages\pymc3\step_methods\compound.py in step(self, point)
11 def step(self, point):
12 for method in self.methods:
---> 13 point = method.step(point)
14 return point
C:\Users\User\Anaconda3\lib\site-packages\pymc3\step_methods\arraystep.py in step(self, point)
116 bij = DictToArrayBijection(self.ordering, point)
117
--> 118 apoint = self.astep(bij.map(point))
119 return bij.rmap(apoint)
120
C:\Users\User\Anaconda3\lib\site-packages\pymc3\step_methods\metropolis.py in astep(self, q0)
123
124
--> 125 q_new = metrop_select(self.delta_logp(q,q0), q, q0)
126
127 if q_new is q:
c:\program files\git\theano\theano\compile\function_module.py in __call__(self, *args, **kwargs)
869 node=self.fn.nodes[self.fn.position_of_error],
870 thunk=thunk,
--> 871 storage_map=getattr(self.fn, 'storage_map', None))
872 else:
873 # old-style linkers raise their own exceptions
c:\program files\git\theano\theano\gof\link.py in raise_with_op(node, thunk, exc_info, storage_map)
312 # extra long error message in that case.
313 pass
--> 314 reraise(exc_type, exc_value, exc_trace)
315
316
C:\Users\User\Anaconda3\lib\site-packages\six.py in reraise(tp, value, tb)
656 value = tp()
657 if value.__traceback__ is not tb:
--> 658 raise value.with_traceback(tb)
659 raise value
660
c:\program files\git\theano\theano\compile\function_module.py in __call__(self, *args, **kwargs)
857 t0_fn = time.time()
858 try:
--> 859 outputs = self.fn()
860 except Exception:
861 if hasattr(self.fn, 'position_of_error'):
TypeError: expected type_num 9 (NPY_INT64) got 7
Apply node that caused the error: Elemwise{Composite{Switch(GE(i0, i1), i2, i3)}}(InplaceDimShuffle{x}.0, TensorConstant{[ 0 1..1098 1099]}, InplaceDimShuffle{x}.0, InplaceDimShuffle{x}.0)
Toposort index: 11
Inputs types: [TensorType(int64, (True,)), TensorType(int32, vector), TensorType(float64, (True,)), TensorType(float64, (True,))]
Inputs shapes: [(1,), (1100,), (1,), (1,)]
Inputs strides: [(4,), (4,), (8,), (8,)]
Inputs values: [array([549]), 'not shown', array([ 1.07762995]), array([ 1.01502801])]
Outputs clients: [[Elemwise{eq,no_inplace}(Elemwise{Composite{Switch(GE(i0, i1), i2, i3)}}.0, TensorConstant{(1,) of 0}), Elemwise{Composite{Switch(GE(i0, i1), ((Switch(i2, i3, (i4 * log(i0))) - i5) - i0), i3)}}[(0, 0)](Elemwise{Composite{Switch(GE(i0, i1), i2, i3)}}.0, TensorConstant{(1,) of 0}, InplaceDimShuffle{x}.0, TensorConstant{(1,) of -inf}, TensorConstant{[ 13. 13... 0. 1.]}, TensorConstant{[ 22.55216... ]})]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Being a simple analyst, do I have to learn all this Theano machinery to be able to work on my statistical problems? Is the new MCMC sampler with gradient support the only thing that should motivate me to switch from PyMC2 to PyMC3?
For your first question, it looks like you're trying to pass a theano function as a variable. You need to call the function with the other variables as arguments, which will then return a theano variable. Try changing your line to
disasters = pymc3.Poisson('disasters', mu=rate1(switchpoint, mu1, mu2), observed=data)
I couldn't reproduce the error in your second part; the sampling worked just fine for me.
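For completeness, here is a minimal sketch of the first model with that change applied, using only the code already shown in the question (the as_op decorator plus the suggested call):
@theano.compile.ops.as_op(itypes=[t.lscalar, t.dscalar, t.dscalar], otypes=[t.dvector])
def rate1(sw, mu1, mu2):
    # Piecewise-constant rate: mu1 before the switchpoint, mu2 after it.
    n = len(data)
    out = np.empty(n)
    out[:sw] = mu1
    out[sw:] = mu2
    return out

with pymc3.Model() as dis:
    switchpoint = pymc3.DiscreteUniform('switchpoint', lower=0, upper=len(data) - 1)
    mu1 = pymc3.Exponential('mu1', lam=1.)
    mu2 = pymc3.Exponential('mu2', lam=1.)
    # Calling rate1 builds a Theano variable instead of passing the op object itself.
    disasters = pymc3.Poisson('disasters', mu=rate1(switchpoint, mu1, mu2), observed=data)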
