Azure Table Storage Querying using Python - Read Integer Column

I am trying to query Azure Table Storage using Python. An int32 column doesn't return its value but instead returns something like azure.storage.table.models.EntityProperty obj..... String columns don't have this issue. Could someone please help me?
The column Pos in the script below is an integer column in the table:
queryfilter = "startDateTime gt datetime'%s' and temp eq '%s'" % (datefilter, temp)
task = table_service.query_entities(azureTable, filter=queryfilter)
for t in task:
    print(t.Pos)

Looking at the documentation here: https://learn.microsoft.com/en-us/python/api/azure.cosmosdb.table.models.entityproperty?view=azure-python, can you try the following?
for t in task:
    print(t.Pos.value)
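For completeness, here is a minimal end-to-end sketch, assuming the legacy azure-cosmosdb-table package (the same pattern applies to azure.storage.table); the connection string, table name, and filter values are placeholders:
from azure.cosmosdb.table.tableservice import TableService

table_service = TableService(connection_string="<your_conn_str>")
datefilter = "2021-01-01T00:00:00Z"   # placeholder values
temp = "A"
queryfilter = "startDateTime gt datetime'%s' and temp eq '%s'" % (datefilter, temp)
task = table_service.query_entities("<your_table>", filter=queryfilter)
for t in task:
    print(t.Pos.value)   # .value unwraps the EntityProperty to a plain int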

Azure Table Storage has a new Python library in preview that can be installed via pip. To install it, use the following command:
pip install azure-data-tables
This SDK can target either a Tables or a Cosmos endpoint (though there are known issues with Cosmos).
The new library uses a similar TableEntity, a key-value type that inherits from the Python dictionary; the values are the same EntityProperty type. There are two ways to access entity properties. If the type is Int32 (the default integer type) or String, they can be accessed directly:
my_value = entity.my_key # direct access
my_value = entity['my_key'] # same access pattern as a dict
If the EntityProperty is of type INT64 or BINARY, you will have to use the .value notation:
my_value = entity.my_key.value # direct access
my_value = entity['my_key'].value # same access pattern as a dict
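Applied to the original question, a minimal sketch with the preview SDK; the connection string, table name, and filter are placeholders, and the filter keyword argument follows the preview release:
from azure.data.tables import TableClient

table_client = TableClient.from_connection_string("<your_conn_str>", "<your_table>")
for entity in table_client.query_entities(filter="Pos gt 100"):
    print(entity["Pos"])   # Int32 values come back as plain Python ints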
FYI I am a full-time engineer at Microsoft on the Azure SDK for Python team.

Related

BigQuery Python client: get the column-based partitioning column name

I am looking to include a where clause to an existing query to exclude data in the BigQuery streaming buffer.
To do this I would like to get the Partition column name so I can add
WHERE partition_column IS NOT NULL;
to my existing query.
I have been looking at the CLI and the get_table method; however, that just returns the value of the column, not the column name.
I get the same when searching .INFORMATION_SCHEMA.PARTITIONS: it returns a partition_id field, but I would prefer the column name itself. Is there a way to get this?
Additionally, the table is set up with column-based partitioning.
Based on the Python BigQuery client documentation, use the time_partitioning attribute:
from google.cloud import bigquery
bq_table = client.get_table('my_partioned_table')
bq_table.time_partitioning # TimePartitioning(field='column_name',type_='DAY')
bq_table.time_partitioning.field # column_name
A small tip: if you don't know where to search, print the API repr:
bq_table.to_api_repr()
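To tie this back to the original goal, a hedged sketch that uses the partition column name to build the WHERE clause excluding the streaming buffer; the dataset and table names are placeholders:
from google.cloud import bigquery

client = bigquery.Client()
bq_table = client.get_table('my_dataset.my_partitioned_table')
partition_column = bq_table.time_partitioning.field

query = (
    "SELECT * FROM `my_dataset.my_partitioned_table` "
    "WHERE {} IS NOT NULL".format(partition_column)
)
rows = client.query(query).result()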

How to get the number of entities returned in an Azure Tables query?

I am using Python to make a query to Azure Tables.
query = table_service.query_entities(table_name, filter=filter_string)
How can I see the number of entities returned by this query? I have tried using
query.count
query.count()
but have had no luck. I get the following error.
'ListGenerator' object has no attribute 'count'
Searching online keeps bringing back results about counting all of the rows in the table, which is not relevant.
There is a new SDK for the Azure Tables service; you can install it from pip with the command pip install azure-data-tables. The new SDK can target either a storage account or a Cosmos account. Here is a sample of how you can find the total number of entities in a table. You will have to iterate through each entity, because the new Tables SDK uses paging on query_entities and list_entities calls: entities are returned in an ItemPaged, which only yields a subset of the entities at a time.
from azure.data.tables import TableClient, TableServiceClient

connection_string = "<your_conn_str>"
table_name = "<your_table_name>"

with TableClient.from_connection_string(connection_string, table_name) as table_client:
    f = "value gt 25"
    query = table_client.query_entities(filter=f)
    count = 0
    for entity in query:
        count += 1
    print(count)
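Under the same assumptions, a more compact variant of the counting loop:
# counts by consuming the paged iterator, same as the explicit loop above
count = sum(1 for _ in table_client.query_entities(filter="value gt 25"))
print(count)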
If you can clarify why you need the number of entities in a query I might be able to give better advice.
(Disclaimer: I work on the Azure SDK for Python team.)
You should use len(query.items) to get the number of returned entities.
The code looks like this:
query = table_service.query_entities(table_name, filter=filter_string)
print(len(query.items))

How to use a SELECT query inside a Python UDF for Redshift?

I tried uploading modules to Redshift through S3, but it always says no module found. Please help.
CREATE OR REPLACE FUNCTION olus_layer(subs_no varchar)
RETURNS varchar VOLATILE AS
$$
import plpydbapi

dbconn = plpydbapi.connect()
cursor = dbconn.cursor()
cursor.execute("SELECT count(*) FROM busobj_group.olus_usage_detail")
d = cursor.fetchall()
dbconn.close()
return d
$$
LANGUAGE plpythonu;
You cannot do this in Redshift, so you will need to find another approach.
1) See the UDF constraints here: http://docs.aws.amazon.com/redshift/latest/dg/udf-constraints.html
2) See here: http://docs.aws.amazon.com/redshift/latest/dg/udf-python-language-support.html
especially this part:
Important: Amazon Redshift blocks all network access and write access
to the file system through UDFs.
This means that even if you try to get around the restriction, it won't work!
If you don't know an alternative way to get what you need, you should ask a new question specifying exactly what your challenge is and what you have tried (leave this question and answer here for future reference by others).
You can't connect to the database inside a UDF. Python functions are scalar in Redshift, meaning a function takes one or more values and returns only one output value.
However, if you want to execute a function against a set of rows, try using the LISTAGG function to aggregate the values or objects (if you need multiple properties) into a large string (beware of the string size limitation), pass it to the UDF as a parameter, and parse/loop over it inside the function, as in the sketch below.
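A hedged sketch of that pattern; the UDF and column names (f_count_ids, subs_no) are hypothetical, and only the table name comes from the question:
-- a sketch only: f_count_ids and subs_no are hypothetical names
CREATE OR REPLACE FUNCTION f_count_ids(id_list varchar(65535))
RETURNS int IMMUTABLE AS
$$
    # the calling query builds id_list with LISTAGG; parse it here
    if not id_list:
        return 0
    return len(id_list.split(','))
$$ LANGUAGE plpythonu;

-- build the string with LISTAGG and hand it to the scalar UDF
SELECT f_count_ids(LISTAGG(subs_no, ',') WITHIN GROUP (ORDER BY subs_no))
FROM busobj_group.olus_usage_detail;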
Amazon has recently announced support for stored procedures in Redshift. Unlike a user-defined function (UDF), a stored procedure can incorporate data definition language (DDL) and data manipulation language (DML) in addition to SELECT queries. It also supports looping and conditional expressions to control logical flow.
https://docs.aws.amazon.com/redshift/latest/dg/stored-procedure-overview.html
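A minimal sketch of what such a procedure could look like; the procedure name is hypothetical, and the table name comes from the question:
-- a sketch only; sp_count_usage is a hypothetical name
CREATE OR REPLACE PROCEDURE sp_count_usage()
AS $$
DECLARE
    row_cnt INT;
BEGIN
    SELECT count(*) INTO row_cnt FROM busobj_group.olus_usage_detail;
    RAISE INFO 'row count: %', row_cnt;
END;
$$ LANGUAGE plpgsql;

CALL sp_count_usage();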

How to get an auto-generated id in ES using Python

I need to store documents with a unique auto-generated id, like when using XPOST, because every time I start the program the data is overwritten.
But I can't find an example that automatically generates ids in Python.
Could you tell me if there are any good examples?
My code:
def saveES(output, es):
    bodys = []
    i = 0
    while i < len(output) - 1:  # output[len(output)-1] is blank
        json_doc = json.dumps(output[i])
        body = {
            "_index": "crawler",
            "_type": "typed",
            "_id": saveES.counter,
            "_source": json_doc
        }
        i += 1
        bodys.append(body)
        saveES.counter += 1
    helpers.bulk(es, bodys)
You don't need to do this in Python: if you index documents without an id, Elasticsearch will automatically create a unique id. However, if for some reason you want to generate the id in Python, you could use uuid.
If you are using the ES Python client, you will not be able to use es.create(...) without an ID. You should use es.index(...) instead.
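For example, a hedged rework of the bulk helper above with the _id key simply omitted (index and type names follow the question; Elasticsearch will then assign the ids):
# a sketch: same bulk call, but without "_id", so Elasticsearch assigns one
import json
from elasticsearch import helpers

def saveES(output, es):
    bodys = []
    for doc in output[:-1]:               # last element is blank, as in the original
        bodys.append({
            "_index": "crawler",
            "_type": "typed",
            "_source": json.dumps(doc),   # no "_id": ES generates a unique id
        })
    helpers.bulk(es, bodys)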
Both of them call the Elasticsearch Index API, but es.create sets the op_type parameter to create, and es.index to index.
The create operation is designed to fail if the ID already exists, so it will not accept being called without an ID.
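A short illustration of the difference; the index, type, and document values are placeholders, and the exact keyword arguments depend on the client version:
es.index(index="crawler", doc_type="typed", body={"field": "value"})                    # auto-generated _id
es.create(index="crawler", doc_type="typed", id="explicit-id", body={"field": "value"})  # fails if the id exists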

mysql - How to determine the length of a field which stores a mutable string?

I'm working on an iOS application and use Flask (a Python framework) to build my backend.
I store my data in a MySQL database.
Now I need to store a bunch of IDs in one attribute.
First I convert the array that stores the IDs to a JSON object.
Then I ran into a problem: how do I store this object?
The length of the object can be rather large (I cannot be sure how many IDs I will store), and SQLAlchemy requires the attribute to have an exact length when I create the table, so how do I determine the length of the attribute?
If you use MySQL 5.7 or newer,
you should look at the new JSON type.
You can use this MySQL feature through SQLAlchemy's types.JSON, which will greatly simplify column data management.
from sqlalchemy import create_engine, Table, Column, Integer, MetaData, JSON

engine = create_engine("mysql+pymysql://user:password@host/dbname")  # placeholder connection URL
metadata = MetaData()

data_table = Table('data_table', metadata,
    Column('id', Integer, primary_key=True),
    Column('loosely_related_ids', JSON)
)
metadata.create_all(engine)

with engine.connect() as conn:
    conn.execute(
        data_table.insert(),
        loosely_related_ids=[1, 54, 56, 99, 104]
    )
Later on, accessing the loosely_related_ids field will return a Python list that you can use normally.
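For instance, reading it back might look like this (a sketch against the same placeholder table as above):
with engine.connect() as conn:
    row = conn.execute(data_table.select()).fetchone()
    print(row.loosely_related_ids)   # e.g. [1, 54, 56, 99, 104]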
If you are using an older version of MySQL
you should use a TEXT field or a wrapper around a similar type.
SQLAlchemy provides the PickleType field which is implemented on top of a BLOB field and will handle pickling and unpickling the array for you. Keep in mind that all the caveats of pickling python objects and sharing them across interpreters still apply here.
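A minimal sketch of that approach; the table and column names are placeholders:
# PickleType stores the pickled list in a BLOB column and unpickles it on read
from sqlalchemy import Table, Column, Integer, MetaData, PickleType

metadata = MetaData()
legacy_table = Table('legacy_table', metadata,
    Column('id', Integer, primary_key=True),
    Column('loosely_related_ids', PickleType)
)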
I don't quite know your exact situation, but it's not recommended to store multiple records in one column. It's more normalized to build a relation table between the ID owner and the IDs.
For example, you can create a new table called 'IDs' with a schema like this:
id int auto_increment,
idbla varchar(<your ID length>),
owner int not null
When you want to get all idbla values for some user x, you can use:
SELECT idbla FROM IDs WHERE owner = x;
Another choice:
You can use NoSQL (a non-relational database) to store your data. It's document-like and fits your situation pretty well.
