I am trying to query an Azure Storage table to get all rows to turn into a table on a web site, but I cannot get the entries from the table. I get the same error every time: "azure.core.exceptions.HttpResponseError: The requested operation is not implemented on the specified resource."
For the code I am following the examples here, and it is not working as expected.
import os

from azure.data.tables import TableServiceClient
from azure.core.credentials import AzureNamedKeyCredential

def read_storage_table():
    credential = AzureNamedKeyCredential(os.environ["AZ_STORAGE_ACCOUNT"], os.environ["AZ_STORAGE_KEY"])
    service = TableServiceClient(endpoint=os.environ["AZ_STORAGE_ENDPOINT"], credential=credential)
    client = service.get_table_client(table_name=os.environ["AZ_STORAGE_TABLE"])
    entities = client.query_entities(query_filter="PartitionKey eq 'tasksSeattle'")
    client.close()
    service.close()
    return entities
Then calling the function.
table = read_storage_table()
for record in table:
    for key in record.keys():
        print("Key: {}, Value: {}".format(key, record[key]))
And that returns:
Traceback (most recent call last):
File "C:\Program Files\Python310\Lib\site-packages\azure\data\tables\_models.py", line 363, in _get_next_cb
return self._command(
File "C:\Program Files\Python310\Lib\site-packages\azure\data\tables\_generated\operations\_table_operations.py", line 386, in query_entities
raise HttpResponseError(response=response, model=error)
azure.core.exceptions.HttpResponseError: Operation returned an invalid status 'Not Implemented'
Content: {"odata.error":{"code":"NotImplemented","message":{"lang":"en-US","value":"The requested operation is not implemented on the specified resource.\nRequestId:cd29feda-1002-006b-679c-3d39e8000000\nTime:2022-03-22T03:27:00.5993216Z"}}}
Using a similar function I am able to write to the table, but even entities = client.list_entities() gives the same error. I'm at a loss.
KrunkFu, thank you for identifying and sharing the solution. Posting the same in the answer section to help other community members.
Replacing the endpoint https://<accountname>.table.core.windows.net/<table> with https://<accountname>.table.core.windows.net solved the issue.
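For reference, a minimal sketch of the working setup (the environment variable names follow the question; the placeholder account name is an assumption):

import os

from azure.data.tables import TableServiceClient
from azure.core.credentials import AzureNamedKeyCredential

credential = AzureNamedKeyCredential(os.environ["AZ_STORAGE_ACCOUNT"], os.environ["AZ_STORAGE_KEY"])
# The endpoint must be the account URL only, e.g. https://<accountname>.table.core.windows.net
# (no trailing /<table>); the table name is passed to get_table_client instead.
service = TableServiceClient(endpoint="https://<accountname>.table.core.windows.net", credential=credential)
client = service.get_table_client(table_name=os.environ["AZ_STORAGE_TABLE"])
# Materialise the results before closing the clients, since query_entities returns a lazy pager.
entities = list(client.query_entities(query_filter="PartitionKey eq 'tasksSeattle'"))
client.close()
service.close()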
I am trying to add a table in Superset. The other tables get added properly, meaning the columns are fetched properly by Superset. But for my table booking_xml, it does not load any columns.
The description of the table is shown in a screenshot (not included here).
After adding this table, when I click on the table name to explore it, it gives the following error
Empty query?
Traceback (most recent call last):
File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/viz.py", line 473, in get_df_payload
df = self.get_df(query_obj)
File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/viz.py", line 251, in get_df
self.results = self.datasource.query(query_obj)
File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 1139, in query
query_str_ext = self.get_query_str_extended(query_obj)
File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 656, in get_query_str_extended
sqlaq = self.get_sqla_query(**query_obj)
File "/home/superset/superset_venv/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 801, in get_sqla_query
raise Exception(_("Empty query?"))
Exception: Empty query?
ERROR:superset.viz:Empty query?
However, when I try to explore it using the SQL editor, it loads properly. I found a difference in the form_data parameter in the URL when loading from the tables page versus from the SQL editor.
URL from SQL Lab view:
form_data={"queryFields":{"groupby":"groupby","metrics":"metrics"},"datasource":"192__table","viz_type":"table","url_params":{},"time_range_endpoints":["inclusive","exclusive"],"granularity_sqla":"created_on","time_grain_sqla":"P1D","time_range":"Last+week","groupby":[],"metrics":["count"],"all_columns":[],"percent_metrics":[],"order_by_cols":[],"row_limit":10000,"order_desc":true,"adhoc_filters":[],"table_timestamp_format":"smart_date","color_pn":true,"show_cell_bars":true}
URL from datasets list:
form_data={"queryFields":{"groupby":"groupby","metrics":"metrics"},"datasource":"191__table","viz_type":"table","url_params":{},"time_range_endpoints":["inclusive","exclusive"],"time_grain_sqla":"P1D","time_range":"Last+week","groupby":[],"all_columns":[],"percent_metrics":[],"order_by_cols":[],"row_limit":10000,"order_desc":true,"adhoc_filters":[],"table_timestamp_format":"smart_date","color_pn":true,"show_cell_bars":true}
When loading from the datasets list, /explore_json/ gives 400 Bad Request; note that this form_data is missing the metrics and granularity_sqla keys present in the SQL Lab version.
Superset version == 0.37.1, Python version == 3.8
Superset saves the details/metadata of the table being connected. My table had a column with a very long datatype, as you can see in the image in the question, but Superset stores column types in a varchar(32) field. The metadata database therefore refused to store the value, which caused the error: no columns were fetched even after adding the table to the datasources.
What I did was increase the length of the type column in Superset's metadata database:
ALTER TABLE table_columns MODIFY type varchar(200)
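If you prefer to apply the change programmatically, here is a minimal sketch with SQLAlchemy; the connection URI is an assumption, so point it at whatever database backs your Superset installation (the MODIFY syntax is MySQL-specific):

from sqlalchemy import create_engine, text

# Hypothetical URI for the Superset metadata database; adjust to your setup.
engine = create_engine("mysql://superset:superset@localhost/superset")

with engine.connect() as conn:
    # Widen the column that stores datasource column types.
    conn.execute(text("ALTER TABLE table_columns MODIFY type varchar(200)"))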
I am getting an error very similar to the one below, but I am not in the EU:
Document AI: google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument
When I use the raw_document and process a local pdf file, it works fine. However, when I specify a pdf file on a GCS location, it fails.
Error message:
the processor name: projects/xxxxxxxxx/locations/us/processors/f7502cad4bccdd97
the form process request: name: "projects/xxxxxxxxx/locations/us/processors/f7502cad4bccdd97"
inline_document {
uri: "gs://xxxx/temp/test1.pdf"
}
Traceback (most recent call last):
File "C:\Python39\lib\site-packages\google\api_core\grpc_helpers.py", line 66, in error_remapped_callable
return callable_(*args, **kwargs)
File "C:\Python39\lib\site-packages\grpc\_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "C:\Python39\lib\site-packages\grpc\_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Request contains an invalid argument."
debug_error_string = "{"created":"#1647296055.582000000","description":"Error received from peer ipv4:142.250.80.74:443","file":"src/core/lib/surface/call.cc","file_line":1070,"grpc_message":"Request contains an invalid argument.","grpc_status":3}"
>
Code:
client = documentai.DocumentProcessorServiceClient(client_options=opts)
# The full resource name of the processor, e.g.:
# projects/project-id/locations/location/processor/processor-id
# You must create new processors in the Cloud Console first
name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
print(f'the processor name: {name}')
# document = {"uri": gcs_path, "mime_type": "application/pdf"}
document = {"uri": gcs_path}
inline_document = documentai.Document()
inline_document.uri = gcs_path
# inline_document.mime_type = "application/pdf"
# Configure the process request
# request = {"name": name, "inline_document": document}
request = documentai.ProcessRequest(
    inline_document=inline_document,
    name=name
)
print(f'the form process request: {request}')
result = client.process_document(request=request)
I do not believe I have permission issues on the bucket since the same set up works fine for a document classification process on the same bucket.
This is a known issue for Document AI, and is already reported in this issue tracker. Unfortunately the only workaround for now is to either:
1. Download your file, read the file as bytes and use process_document(). See Document AI local processing for the sample code, and the sketch after this list.
2. Use batch_process_documents(), since by default it only accepts files from GCS. This is if you don't want the extra step of downloading the file.
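A minimal sketch of the first option, assuming project_id, location, processor_id and opts are defined as in the question, and treating the bucket and object names as placeholders:

from google.cloud import documentai, storage

# Download the PDF from GCS ourselves, then send it inline as raw bytes.
storage_client = storage.Client()
blob = storage_client.bucket("xxxx").blob("temp/test1.pdf")
pdf_bytes = blob.download_as_bytes()

client = documentai.DocumentProcessorServiceClient(client_options=opts)
name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
raw_document = documentai.RawDocument(content=pdf_bytes, mime_type="application/pdf")
request = documentai.ProcessRequest(name=name, raw_document=raw_document)
result = client.process_document(request=request)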
This is still an issue 5 months later, and something not mentioned in the accepted answer is (and I could be wrong, but it seems to me) that batch processes can only write their results to GCS. So you will still incur the extra step of downloading something from a bucket, be it the input document under Option 1 or the result under Option 2. On top of that, you will have to clean up the bucket if you don't want the results left there, so in many circumstances Option 2 offers little advantage beyond the result download probably being smaller than the input file download.
I'm using the client library in a Python Cloud Function and I'm affected by this issue. I'm implementing Option 1 because it seems simplest, and I'm holding out for the fix. I also considered using the Workflows client library to fire a workflow that runs a Document AI process, or calling the Document AI REST API, but it's all very suboptimal.
I’m trying to use Python to create EC2 instances but I keep getting these errors.
Here is my code:
#!/usr/bin/env python
import boto3
ec2 = boto3.resource('ec2')
instance = ec2.create_instances(
    ImageId='ami-0922553b7b0369273',
    MinCount=1,
    MaxCount=1,
    InstanceType='t2.micro')
print instance[0].id
Here are the errors I'm getting
Traceback (most recent call last):
File "./createinstance.py", line 8, in <module>
InstanceType='t2.micro')
File "/usr/lib/python2.7/site-packages/boto3/resources/factory.py", line 520, in do_action
response = action(self, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/boto3/resources/action.py", line 83, in __call__
response = getattr(parent.meta.client, operation_name)(**params)
File "/usr/lib/python2.7/site-packages/botocore/client.py", line 320, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/lib/python2.7/site-packages/botocore/client.py", line 623, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAMIID.NotFound) when calling the RunInstances operation: The image id '[ami-0922553b7b0369273]' does not exist
I also get an error when trying to create a key pair
Here's my code for creating the keypair
import boto3
ec2 = boto3.resource('ec2')
# create a file to store the key locally
outfile = open('ec2-keypair.pem','w')
# call the boto ec2 function to create a key pair
key_pair = ec2.create_key_pair(KeyName='ec2-keypair')
# capture the key and store it in a file
KeyPairOut = str(key_pair.key_material)
print(KeyPairOut)
outfile.write(KeyPairOut)
response = ec2.instance-describe()
print response
Here are the error messages:
./createkey.py: line 1: import: command not found
./createkey.py: line 2: syntax error near unexpected token `('
./createkey.py: line 2: `ec2 = boto3.resource('ec2')'
What am I missing?
For your first script, one of two possibilities could be occurring:
1. The AMI you are referencing by ID is not available because the ID is incorrect or the AMI doesn't exist.
2. The AMI is unavailable in the region your machine is configured for.
You are most likely running your script from a machine that is not configured for the correct region. If you are running your script locally or on a server that does not have roles configured, and you are using the aws-cli, you can run the aws configure command to set your access keys and region appropriately. If you are running on a server with roles configured, the server needs to run in the correct region, and your roles need to allow access to EC2 AMIs. A sketch of pinning the region in code follows.
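For example, a minimal sketch that pins the region explicitly rather than relying on the default profile (the region here is an assumption; the AMI ID is the one from the question and must actually exist in whichever region you choose):

import boto3

# Pass the region explicitly instead of relying on ~/.aws/config.
ec2 = boto3.resource('ec2', region_name='us-east-1')
instance = ec2.create_instances(
    ImageId='ami-0922553b7b0369273',  # must be a valid AMI ID in the chosen region
    MinCount=1,
    MaxCount=1,
    InstanceType='t2.micro')
print(instance[0].id)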
For your second question (which in the future should probably be posted separately), the syntax error is a side effect of not following the same format as your first script: your Python script is not in fact being interpreted by Python, but by the shell. Add the shebang at the top of the file and remove any spacing preceding your import boto3 statement:
#!/usr/bin/env python
import boto3

ec2 = boto3.resource('ec2')
# create a file to store the key locally
outfile = open('ec2-keypair.pem', 'w')
# call the boto ec2 function to create a key pair
key_pair = ec2.create_key_pair(KeyName='ec2-keypair')
# capture the key and store it in a file
KeyPairOut = str(key_pair.key_material)
print(KeyPairOut)
outfile.write(KeyPairOut)
# describe instances through the underlying client
# (ec2.instance-describe() is not valid Python)
response = ec2.meta.client.describe_instances()
print(response)
I am trying to get the list of assignments due/coursework for all the courses using the Google Classroom API. I am getting a list of courses using the below code :
results = service.courses().list(pageSize = 10).execute()
courses = results.get('courses',[])
Once I get the list of all the courses, I loop over each course and try to supply the courseId in order to get the list of coursework using the courses.courseWork.list method, but I'm getting an error.
I have written the following code :
for course in courses:
    print(course['name'])
    print "Assignments you have due in this course : "
    print course[u'id']
    course_work_results = service.courses().courseWork().list().execute()
    print course_work_results
Since I am not supplying the courseID anywhere (which I need to know how to do), I get the following error :
Traceback (most recent call last):
File "classroom.py", line 53, in <module>
course_work_results = service.courses().courseWork().list().execute()
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 727, in method
raise TypeError('Missing required parameter "%s"' % name)
TypeError: Missing required parameter "courseId"
The error is caused by the line
course_work_results = service.courses().courseWork().list().execute()
How do I fix this?
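A minimal sketch of the fix, passing each course's ID as the courseId parameter the traceback reports as missing (the courseWork response key is taken from the Classroom API reference):

for course in courses:
    print(course['name'])
    # courseWork.list requires the courseId of the course to query.
    course_work_results = service.courses().courseWork().list(
        courseId=course['id']).execute()
    for work in course_work_results.get('courseWork', []):
        print(work.get('title'))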
I am updating data on a Neo4j server using Python (2.7.6) and Py2Neo (1.6.4). My load function is:
from py2neo import neo4j, node, rel, cypher

session = cypher.Session('http://my_neo4j_server.com.mine:7474')

def load_data():
    tx = session.create_transaction()
    for row in dataframe.iterrows():  # dataframe is a pandas dataframe
        name = row[1].name
        id = row[1].id
        merge_query = "MERGE (a:label {name:'%s', name_var:'%s'}) " % (id, name)
        tx.append(merge_query)
    tx.commit()
When I execute this from Spyder on Windows it works great: all the data from the dataframe is committed to Neo4j and visible in the graph. However, when I run it from a Linux server (different from the Neo4j server) I get the following error at tx.commit(). Note that I have the same versions of Python and py2neo on both machines.
INFO:py2neo.packages.httpstream.http:>>> POST http://neo4j1.qs:7474/db/data/transaction/commit [1360120]
INFO:py2neo.packages.httpstream.http:<<< 200 OK [chunked]
ERROR:__main__:some part of process failed
Traceback (most recent call last):
File "my_file.py", line 132, in load_data
tx.commit()
File "/usr/local/lib/python2.7/site-packages/py2neo/cypher.py", line 242, in commit
return self._post(self._commit or self._begin_commit)
File "/usr/local/lib/python2.7/site-packages/py2neo/cypher.py", line 208, in _post
j = rs.json
File "/usr/local/lib/python2.7/site-packages/py2neo/packages/httpstream/http.py", line 563, in json
return json.loads(self.read().decode(self.encoding))
File "/usr/local/lib/python2.7/site-packages/py2neo/packages/httpstream/http.py", line 634, in read
data = self._response.read()
File "/usr/local/lib/python2.7/httplib.py", line 543, in read
return self._read_chunked(amt)
File "/usr/local/lib/python2.7/httplib.py", line 597, in _read_chunked
raise IncompleteRead(''.join(value))
IncompleteRead: IncompleteRead(128135 bytes read)
This post (IncompleteRead using httplib) suggests this is an httplib error. I am not sure how to handle it, since I am not calling httplib directly.
Any suggestions for getting this load to work on Linux, or for what the IncompleteRead error message means?
UPDATE:
The IncompleteRead error is being caused by a Neo4j error being returned. The line returned in _read_chunked that is causing the error is:
pe}"}]}],"errors":[{"code":"Neo.TransientError.Network.UnknownFailure"
Neo4j docs say this is an unknown network error.
Although I can't say for sure, this implies some kind of local network issue between client and server rather than a bug within the library. Py2neo wraps httplib (which is pretty solid itself) and, from the stack trace, it looks as though the client is expecting more chunks from a chunked response.
To diagnose further, you could make some curl calls from your Linux application server to your database server and see what succeeds and what doesn't. If that works, try writing a quick and dirty python script to make the same calls with httplib directly.
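A quick-and-dirty probe along those lines, hitting the transactional endpoint from the log above with httplib directly (Python 2 to match the question's environment; the host and statement are placeholders):

import httplib
import json

# POST a trivial statement to the transactional Cypher endpoint and check
# whether the chunked response arrives intact outside of py2neo.
conn = httplib.HTTPConnection("my_neo4j_server.com.mine", 7474)
payload = json.dumps({"statements": [{"statement": "RETURN 1"}]})
conn.request("POST", "/db/data/transaction/commit", payload,
             {"Content-Type": "application/json", "Accept": "application/json"})
response = conn.getresponse()
print response.status, len(response.read())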
UPDATE 1: Given the update above and the fact that the server streams its responses, I'm thinking that the chunk size might represent the intended payload but the error cuts the response short. Recreating the issue with curl certainly seems like the best next step to help determine whether it is a fault in the driver, the server or something else.
UPDATE 2: Looking again this morning, I notice that you're using Python substitution for the properties within the MERGE statement. As good practice, you should use parameter substitution at the Cypher level:
merge_query = "MERGE (a:label {name:{name}, name_var:{name_var}})"
merge_params = {"name": id, "name_var": name}
tx.append(merge_query, merge_params)
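Folded back into the load function from the question, that would look something like this sketch (same session and dataframe assumed):

def load_data():
    tx = session.create_transaction()
    for row in dataframe.iterrows():
        # Let the server handle quoting and escaping via Cypher parameters.
        merge_query = "MERGE (a:label {name:{name}, name_var:{name_var}})"
        merge_params = {"name": row[1].id, "name_var": row[1].name}
        tx.append(merge_query, merge_params)
    tx.commit()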