AzureException: Incorrect padding. Just following Azure tutorial - python

I'm trying to learn how to use the Azure Table Storage service. I'm following this tutorial: https://learn.microsoft.com/en-us/azure/storage/storage-python-how-to-use-table-storage and I am simply copy-pasting the code into a Jupyter notebook. I've set up a storage account and am successfully using the blob storage, also from a notebook.
Code from the tutorial:
from azure.storage.table import TableService, Entity
table_service = TableService(account_name='myaccount', account_key='mykey')
table_service.create_table('tasktable')
When I run the last line I get the following error, and I'm not sure what I am doing wrong to cause it:
---------------------------------------------------------------------------
Error Traceback (most recent call last)
/usr/local/lib/python3.5/site-packages/azure/storage/storageclient.py in _perform_request(self, request, parser, parser_args, operation_context)
205 _add_date_header(request)
--> 206 self.authentication.sign_request(request)
207
/usr/local/lib/python3.5/site-packages/azure/storage/_auth.py in sign_request(self, request)
96
---> 97 self._add_authorization_header(request, string_to_sign)
98
/usr/local/lib/python3.5/site-packages/azure/storage/_auth.py in _add_authorization_header(self, request, string_to_sign)
50 def _add_authorization_header(self, request, string_to_sign):
---> 51 signature = _sign_string(self.account_key, string_to_sign)
52 auth_string = 'SharedKey ' + self.account_name + ':' + signature
/usr/local/lib/python3.5/site-packages/azure/storage/_common_conversion.py in _sign_string(key, string_to_sign, key_is_base64)
87 if key_is_base64:
---> 88 key = _decode_base64_to_bytes(key)
89 else:
/usr/local/lib/python3.5/site-packages/azure/storage/_common_conversion.py in _decode_base64_to_bytes(data)
77 data = data.encode('utf-8')
---> 78 return base64.b64decode(data)
79
/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/base64.py in b64decode(s, altchars, validate)
89 raise binascii.Error('Non-base64 digit found')
---> 90 return binascii.a2b_base64(s)
91
Error: Incorrect padding
During handling of the above exception, another exception occurred:
AzureException Traceback (most recent call last)
<ipython-input-17-192b23ba629f> in <module>()
----> 1 table_service.create_table('tasktable')
/usr/local/lib/python3.5/site-packages/azure/storage/table/tableservice.py in create_table(self, table_name, fail_on_exist, timeout)
520 if not fail_on_exist:
521 try:
--> 522 self._perform_request(request)
523 return True
524 except AzureHttpError as ex:
/usr/local/lib/python3.5/site-packages/azure/storage/table/tableservice.py in _perform_request(self, request, parser, parser_args, operation_context)
1087 def _perform_request(self, request, parser=None, parser_args=None, operation_context=None):
1088 _update_storage_table_header(request)
-> 1089 return super(TableService, self)._perform_request(request, parser, parser_args, operation_context)
/usr/local/lib/python3.5/site-packages/azure/storage/storageclient.py in _perform_request(self, request, parser, parser_args, operation_context)
264 sleep(retry_interval)
265 else:
--> 266 raise ex
267 finally:
268 # If this is a location locked operation and the location is not set,
/usr/local/lib/python3.5/site-packages/azure/storage/storageclient.py in _perform_request(self, request, parser, parser_args, operation_context)
240 if sys.version_info >= (3,):
241 # Automatic chaining in Python 3 means we keep the trace
--> 242 raise AzureException(ex.args[0])
243 else:
244 # There isn't a good solution in 2 for keeping the stack trace
AzureException: Incorrect padding

To summarize, the issue was caused by a mistake in the account key value. According to the error message Error: Incorrect padding, as @Scovetta said, the key is not valid Base64. Changes to the key, like dropping the trailing = padding symbol or adding extra = symbols, will cause this error. A correct Azure Storage account key is 88 characters long.
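A quick sanity check you can run before constructing TableService (a minimal sketch; 'mykey' is a placeholder for your real key):
import base64
account_key = 'mykey'  # placeholder: paste the key copied from the Azure portal
print(len(account_key))  # a valid storage account key is 88 characters long
base64.b64decode(account_key)  # raises binascii.Error: Incorrect padding if the key was truncated or altered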

Shameless necro after putting "incorrect padding azure" in my favorite search engine.
Turns out I was passing arguments like so: --account-key "$ACCOUNT_KEY", and Azure didn't understand the quotes.
Since the key is supposed to be Base64-encoded, all of its characters should be shell-safe, so if your input is OK there shouldn't be any problem passing it unquoted.

Related

NotImplementedError: 'split_respect_sentence_boundary=True' is only compatible with split_by='word'

I have the following lines of code
from haystack.document_stores import InMemoryDocumentStore, SQLDocumentStore
from haystack.nodes import TextConverter, PDFToTextConverter,PreProcessor
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers
doc_dir = "C:\\Users\\abcd\\Downloads\\PDF Files\\"
docs = convert_files_to_docs(dir_path=doc_dir, clean_func=None, split_paragraphs=True)
preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    clean_header_footer=True,
    split_by="passage",
    split_length=2)
doc = preprocessor.process(docs)
When I try to run it, I get the following error message:
NotImplementedError Traceback (most recent call last)
c:\Users\abcd\Downloads\solr9.ipynb Cell 27 in <cell line: 23>()
16 print(type(docs))
17 preprocessor = PreProcessor(
18 clean_empty_lines=True,
19 clean_whitespace=True,
20 clean_header_footer=True,
21 split_by="passage",
22 split_length=2)
---> 23 doc = preprocessor.process(docs)
File ~\AppData\Roaming\Python\Python39\site-packages\haystack\nodes\preprocessor\preprocessor.py:167, in PreProcessor.process(self, documents, clean_whitespace, clean_header_footer, clean_empty_lines, remove_substrings, split_by, split_length, split_overlap, split_respect_sentence_boundary, id_hash_keys)
165 ret = self._process_single(document=documents, id_hash_keys=id_hash_keys, **kwargs) # type: ignore
166 elif isinstance(documents, list):
--> 167 ret = self._process_batch(documents=list(documents), id_hash_keys=id_hash_keys, **kwargs)
168 else:
169 raise Exception("documents provided to PreProcessor.prepreprocess() is not of type list nor Document")
File ~\AppData\Roaming\Python\Python39\site-packages\haystack\nodes\preprocessor\preprocessor.py:225, in PreProcessor._process_batch(self, documents, id_hash_keys, **kwargs)
222 def _process_batch(
223 self, documents: List[Union[dict, Document]], id_hash_keys: Optional[List[str]] = None, **kwargs
224 ) -> List[Document]:
--> 225 nested_docs = [
226 self._process_single(d, id_hash_keys=id_hash_keys, **kwargs)
...
--> 324 raise NotImplementedError("'split_respect_sentence_boundary=True' is only compatible with split_by='word'.")
326 if type(document.content) is not str:
327 logger.error("Document content is not of type str. Nothing to split.")
NotImplementedError: 'split_respect_sentence_boundary=True' is only compatible with split_by='word'.
I don't even have split_respect_sentence_boundary=True as an argument, and I don't have split_by='word' either; I have it set to split_by="passage".
I get the same error if I try changing it to split_by="sentence".
Do let me know if I am missing anything here.
I tried using split_by="sentence" but got the same error.
As you can see in the PreProcessor API docs, the default value for split_respect_sentence_boundary is True.
In order to make your code work, you should specify split_respect_sentence_boundary=False:
preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    clean_header_footer=True,
    split_by="passage",
    split_length=2,
    split_respect_sentence_boundary=False)
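With that flag set, the rest of the question's code runs; a quick check (a sketch, reusing docs from the question):
doc = preprocessor.process(docs)
print(len(docs), "input documents ->", len(doc), "passages")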
I agree that this behavior is not intuitive.
Currently, this node is undergoing a major refactoring.

Error while saving Optuna study to Google Drive from Colab

I can save a random file to my Google Drive from Colab as:
with open("gdrive/My Drive/chapter_classification/output/hello.txt", 'w') as f:
    f.write('hello')
This works fine, but when I use the official Optuna documentation approach with the code:
direction = 'minimize'
name = 'opt1'
study = optuna.create_study(sampler=optuna.samplers.TPESampler(),direction=direction,study_name=name, storage=f"gdrive/My Drive/chapter_classification/output/sqlite:///{name}.db",load_if_exists=True)
study.optimize(tune, n_trials=1000)
it throws an error:
ArgumentError Traceback (most recent call last)
<ipython-input-177-f32da2c0f69a> in <module>()
2 direction = 'minimize'
3 name = 'opt1'
----> 4 study = optuna.create_study(sampler=optuna.samplers.TPESampler(),direction=direction,study_name=name, storage="gdrive/My Drive/chapter_classification/output/sqlite:///opt1.db",load_if_exists=True)
5 study.optimize(tune, n_trials=1000)
6 frames
/usr/local/lib/python3.7/dist-packages/optuna/study/study.py in create_study(storage, sampler, pruner, study_name, direction, load_if_exists, directions)
1134 ]
1135
-> 1136 storage = storages.get_storage(storage)
1137 try:
1138 study_id = storage.create_new_study(study_name)
/usr/local/lib/python3.7/dist-packages/optuna/storages/__init__.py in get_storage(storage)
29 return RedisStorage(storage)
30 else:
---> 31 return _CachedStorage(RDBStorage(storage))
32 elif isinstance(storage, RDBStorage):
33 return _CachedStorage(storage)
/usr/local/lib/python3.7/dist-packages/optuna/storages/_rdb/storage.py in __init__(self, url, engine_kwargs, skip_compatibility_check, heartbeat_interval, grace_period, failed_trial_callback)
173
174 try:
--> 175 self.engine = create_engine(self.url, **self.engine_kwargs)
176 except ImportError as e:
177 raise ImportError(
<string> in create_engine(url, **kwargs)
/usr/local/lib/python3.7/dist-packages/sqlalchemy/util/deprecations.py in warned(fn, *args, **kwargs)
307 stacklevel=3,
308 )
--> 309 return fn(*args, **kwargs)
310
311 doc = fn.__doc__ is not None and fn.__doc__ or ""
/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/create.py in create_engine(url, **kwargs)
528
529 # create url.URL object
--> 530 u = _url.make_url(url)
531
532 u, plugins, kwargs = u._instantiate_plugins(kwargs)
/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/url.py in make_url(name_or_url)
713
714 if isinstance(name_or_url, util.string_types):
--> 715 return _parse_rfc1738_args(name_or_url)
716 else:
717 return name_or_url
/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/url.py in _parse_rfc1738_args(name)
775 else:
776 raise exc.ArgumentError(
--> 777 "Could not parse rfc1738 URL from string '%s'" % name
778 )
779
ArgumentError: Could not parse rfc1738 URL from string 'gdrive/My Drive/chapter_classification/output/sqlite:///opt1.db'
So, according to the official documentation of create_study:
When a database URL is passed, Optuna internally uses SQLAlchemy to handle the database. Please refer to SQLAlchemy’s document for further details. If you want to specify non-default options to SQLAlchemy Engine, you can instantiate RDBStorage with your desired options and pass it to the storage argument instead of a URL.
And when you visit the SQLAlchemy documentation, you find that the URL must follow RFC 1738, i.e. the sqlite:/// scheme comes first, followed by the file path.
So all you have to do is change
storage=f"gdrive/My Drive/chapter_classification/output/sqlite:///{name}.db"
to put the scheme first:
storage = f"sqlite:///gdrive/My Drive/chapter_classification/output/{name}.db"

python atlasapi authentication

I'm trying to authenticate to Atlas with the atlasapi. I'm using my Google account and get the error ErrAtlasUnauthorized: Authentication is required with the method below. Is Google auth supported, or am I doing something wrong?
from atlasapi.atlas import Atlas
auth = Atlas("foo@google.com","<password>","<groupId>")
clusters = auth.Clusters.get_all_clusters
print (clusters())
full trace:
ErrAtlasUnauthorized Traceback (most recent call last)
<ipython-input-61-d69a101fdf69> in <module>
1 clusters = auth.Clusters.get_all_clusters
----> 2 print (clusters())
C:\...\atlasapi\atlas.py in get_all_clusters(self, pageNum, itemsPerPage, iterable)
129
130 uri = Settings.api_resources["Clusters"]["Get All Clusters"] % (self.atlas.group, pageNum, itemsPerPage)
--> 131 return self.atlas.network.get(Settings.BASE_URL + uri)
132
133 def get_single_cluster(self, cluster: str) -> dict:
C:\...\atlasapi\network.py in get(self, uri)
144 logger.debug("Auth information = {} {}".format(self.user, self.password))
145
--> 146 return self.answer(r.status_code, r.json())
147
148 except Exception:
C:\...\atlasapi\network.py in answer(self, c, details)
68 raise ErrAtlasBadRequest(c, details)
69 elif c == Settings.UNAUTHORIZED:
---> 70 raise ErrAtlasUnauthorized(c, details)
71 elif c == Settings.FORBIDDEN:
72 raise ErrAtlasForbidden(c, details)
ErrAtlasUnauthorized: Authentication is required
The API access keys are your user/password.
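In other words, authenticate with an Atlas programmatic API key pair rather than your Google login. A sketch (the key values are placeholders):
from atlasapi.atlas import Atlas
# The API key's public key acts as the user and the private key as the
# password; Google SSO credentials will not work here.
auth = Atlas("<public_key>", "<private_key>", "<groupId>")
clusters = auth.Clusters.get_all_clusters()
print(clusters)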

Wahoo TICKR X .fit file reading/parsing and analysis in Python

Not sure if I can post a question like this here, so please redirect me if I'm in the wrong place.
I've bought a Wahoo TICKR X to monitor my heart rate during exercise. I would also like to get more familiar with Python, so I decided to do the analysis of my heart rate myself in Python instead of in the Wahoo app. I thought this would also give me more freedom in the choice of visualization, testing, etc.
I've recorded my heart rate for 5 minutes or so and exported the .fit file. However, I can't even find a suitable library to read the .fit file. Can anyone recommend a library that works with .fit files from Wahoo?
I'm using Ubuntu, Anaconda, Python 3.7.
import pyfits
# Load the FITS file into the program
hdulist = pyfits.open('/home/bradmin/Downloads/2020-03-26.fit')
# Load table data as tbdata
tbdata = hdulist[1].data
OSError Traceback (most recent call last)
<ipython-input-3-a970e2cd9dee> in <module>
2
3 # Load the FITS file into the program
----> 4 hdulist = pyfits.open('/home/bradmin/Downloads/2020-03-26.fit')
5
6 # Load table data as tbdata
~/anaconda3/lib/python3.7/site-packages/pyfits/hdu/hdulist.py in fitsopen(name, mode, memmap, save_backup, **kwargs)
122 raise ValueError('Empty filename: %s' % repr(name))
123
--> 124 return HDUList.fromfile(name, mode, memmap, save_backup, **kwargs)
125
126
~/anaconda3/lib/python3.7/site-packages/pyfits/hdu/hdulist.py in fromfile(cls, fileobj, mode, memmap, save_backup, **kwargs)
264
265 return cls._readfrom(fileobj=fileobj, mode=mode, memmap=memmap,
--> 266 save_backup=save_backup, **kwargs)
267
268 @classmethod
~/anaconda3/lib/python3.7/site-packages/pyfits/hdu/hdulist.py in _readfrom(cls, fileobj, data, mode, memmap, save_backup, **kwargs)
853 # raise and exception
854 if mode in ('readonly', 'denywrite') and len(hdulist) == 0:
--> 855 raise IOError('Empty or corrupt FITS file')
856
857 # initialize/reset attributes to be used in "update/append" mode
OSError: Empty or corrupt FITS file
link to the file: https://wetransfer.com/downloads/6d054a5d52899aefcb1bcd22bda92ba120200326161849/b9831a
EDIT
I've tried this now, but I get an error:
import fitdecode

src_file = "/home/bradmin/Downloads/2020-03-26.fit"

with fitdecode.FitReader(src_file) as fit:
    for frame in fit:
        # The yielded frame object is of one of the following types:
        # * fitdecode.FitHeader
        # * fitdecode.FitDefinitionMessage
        # * fitdecode.FitDataMessage
        # * fitdecode.FitCRC
        if isinstance(frame, fitdecode.FitDataMessage):
            # Here, frame is a FitDataMessage object.
            # A FitDataMessage object contains decoded values that
            # are directly usable in your script logic.
            print(frame.name)
file_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
developer_data_id
field_description
field_description
field_description
field_description
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-7-e8d95d3087dc> in <module>
2
3 with fitdecode.FitReader(src_file) as fit:
----> 4 for frame in fit:
5 # The yielded frame object is of one of the following types:
6 # * fitdecode.FitHeader
~/anaconda3/lib/python3.7/site-packages/fitdecode/reader.py in __iter__(self)
191
192 def __iter__(self):
--> 193 yield from self._read_next()
194
195 @property
~/anaconda3/lib/python3.7/site-packages/fitdecode/reader.py in _read_next(self)
298 assert self._header
299
--> 300 record = self._read_record()
301 if not record:
302 break
~/anaconda3/lib/python3.7/site-packages/fitdecode/reader.py in _read_record(self)
443 self._add_dev_data_id(message)
444 elif message.mesg_type.mesg_num == profile.MESG_NUM_FIELD_DESCRIPTION:
--> 445 self._add_dev_field_description(message)
446
447 return message
~/anaconda3/lib/python3.7/site-packages/fitdecode/reader.py in _add_dev_field_description(self, message)
780 base_type_id = message.get_field('fit_base_type_id').raw_value
781 field_name = message.get_field('field_name').raw_value
--> 782 units = message.get_field('units').raw_value
783
784 try:
~/anaconda3/lib/python3.7/site-packages/fitdecode/records.py in get_field(self, field_name_or_num, idx)
188 raise KeyError(
189 f'field "{field_name_or_num}" (idx #{idx}) not found in ' +
--> 190 f'message "{self.name}"')
191
192 def get_fields(self, field_name_or_num):
KeyError: 'field "units" (idx #0) not found in message "field_description"'
The format seems to be the FIT (Flexible and Interoperable Data Transfer) format. pyfits is for an entirely different format (FITS, the astronomy image format), it seems.
The article above refers to the gpsbabel tool, which you could use to convert the FIT file to something more interoperable and usable, e.g. GPX (an XML-based format that's easy to parse).
Or, of course, if you want a pure-Python solution, you can port the FIT format reading bits from gpsbabel to Python, or use the fitdecode library.
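For example, once the file parses cleanly, pulling the heart-rate samples out of the 'record' messages could look like this (a sketch assuming a recent fitdecode release; the KeyError above about the missing 'units' field looks like an older-version issue with developer fields, so upgrading fitdecode is worth trying first):
import fitdecode

heart_rates = []  # (timestamp, bpm) pairs
with fitdecode.FitReader("/home/bradmin/Downloads/2020-03-26.fit") as fit:
    for frame in fit:
        # Per-sample sensor values live in the 'record' data messages
        if isinstance(frame, fitdecode.FitDataMessage) and frame.name == 'record':
            if frame.has_field('heart_rate') and frame.has_field('timestamp'):
                heart_rates.append((frame.get_value('timestamp'), frame.get_value('heart_rate')))
print(heart_rates[:5])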

H2O python rbind error

I have a 2000-row data frame, and I'm trying to slice it into two parts and combine them back together.
t1 = test[:10, :]
t2 = test[20:, :]
temp = t1.rbind(t2)
temp.show()
Then I got this error:
---------------------------------------------------------------------------
EnvironmentError Traceback (most recent call last)
<ipython-input-37-8daeb3375743> in <module>()
2 t2 = test[20:, :]
3 temp = t1.rbind(t2)
----> 4 temp.show()
5 print len(temp)
6 print len(test)
/usr/local/lib/python2.7/dist-packages/h2o/frame.pyc in show(self, use_pandas)
383 print("This H2OFrame has been removed.")
384 return
--> 385 if not self._ex._cache.is_valid(): self._frame()._ex._cache.fill()
386 if H2ODisplay._in_ipy():
387 import IPython.display
/usr/local/lib/python2.7/dist-packages/h2o/frame.pyc in _frame(self, fill_cache)
423
424 def _frame(self, fill_cache=False):
--> 425 self._ex._eager_frame()
426 if fill_cache:
427 self._ex._cache.fill()
/usr/local/lib/python2.7/dist-packages/h2o/expr.pyc in _eager_frame(self)
67 if not self._cache.is_empty(): return self
68 if self._cache._id is not None: return self # Data already computed under ID, but not cached locally
---> 69 return self._eval_driver(True)
70
71 def _eager_scalar(self): # returns a scalar (or a list of scalars)
/usr/local/lib/python2.7/dist-packages/h2o/expr.pyc in _eval_driver(self, top)
81 def _eval_driver(self, top):
82 exec_str = self._do_it(top)
---> 83 res = ExprNode.rapids(exec_str)
84 if 'scalar' in res:
85 if isinstance(res['scalar'], list): self._cache._data = [float(x) for x in res['scalar']]
/usr/local/lib/python2.7/dist-packages/h2o/expr.pyc in rapids(expr)
163 The JSON response (as a python dictionary) of the Rapids execution
164 """
--> 165 return H2OConnection.post_json("Rapids", ast=expr,session_id=H2OConnection.session_id(), _rest_version=99)
166
167 class ASTId:
/usr/local/lib/python2.7/dist-packages/h2o/connection.pyc in post_json(url_suffix, file_upload_info, **kwargs)
515 if __H2OCONN__ is None:
516 raise ValueError("No h2o connection. Did you run `h2o.init()` ?")
--> 517 return __H2OCONN__._rest_json(url_suffix, "POST", file_upload_info, **kwargs)
518
519 def _rest_json(self, url_suffix, method, file_upload_info, **kwargs):
/usr/local/lib/python2.7/dist-packages/h2o/connection.pyc in _rest_json(self, url_suffix, method, file_upload_info, **kwargs)
518
519 def _rest_json(self, url_suffix, method, file_upload_info, **kwargs):
--> 520 raw_txt = self._do_raw_rest(url_suffix, method, file_upload_info, **kwargs)
521 return self._process_tables(raw_txt.json())
522
/usr/local/lib/python2.7/dist-packages/h2o/connection.pyc in _do_raw_rest(self, url_suffix, method, file_upload_info, **kwargs)
592 raise EnvironmentError(("h2o-py got an unexpected HTTP status code:\n {} {} (method = {}; url = {}). \n"+ \
593 "detailed error messages: {}")
--> 594 .format(http_result.status_code,http_result.reason,method,url,detailed_error_msgs))
595
596
EnvironmentError: h2o-py got an unexpected HTTP status code:
500 Server Error (method = POST; url = http://localhost:54321/99/Rapids).
detailed error messages: []
If I count rows (len(temp)), it works fine. Also, if I change the slicing indices a little, it works fine too. For example, if I change to this, it shows the data frame:
t1 = test[:10, :]
t2 = test[:5, :]
Am I missing something here? Thanks.
Unclear what happened without more information (logs would probably say why the rbind did not take).
What version are you using? I tried your code with iris on the bleeding edge and it all worked as expected.
By the way, rbind is typically going to be expensive, especially since what you're semantically after is a subset:
test[range(10) + range(20,test.nrow),:]
should also give you the desired subset (with the caveat that you build the full list of row indices in Python and pass it over REST to H2O).
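Note that range(10) + range(20, test.nrow) only works on Python 2, as in the traceback above; on Python 3, range objects don't support +, so the equivalent would be (a sketch):
rows = list(range(10)) + list(range(20, test.nrow))  # materialize the index list on Python 3
subset = test[rows, :]
subset.show()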
