BigQuery Python client: get the column-based partitioning column name

I am looking to add a WHERE clause to an existing query to exclude data that is still in the BigQuery streaming buffer. To do this I would like to get the partition column name so I can append
WHERE partition_column IS NOT NULL;
to my existing query.
I have been looking at the CLI and at the get_table method, but that just returns the value of the column, not the column name.
I get the same with .INFORMATION_SCHEMA.PARTITIONS: it returns a partition_id field, but I want the partition column name itself. Is there a way to get it?
Note that the table is set up with column-based partitioning.

Based on the Python BigQuery client documentation, use the time_partitioning attribute:
from google.cloud import bigquery

client = bigquery.Client()
bq_table = client.get_table('my_partitioned_table')

bq_table.time_partitioning        # TimePartitioning(field='column_name', type_='DAY')
bq_table.time_partitioning.field  # 'column_name'
A small tip: if you don't know where to look, print the API representation:
bq_table.to_api_repr()
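Putting it together with the original goal, a minimal sketch (the dataset and table names are placeholders; note that time_partitioning.field is None for ingestion-time partitioned tables):
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table('my_dataset.my_partitioned_table')  # placeholder name

# Column-based partitioned tables expose the partitioning column here;
# ingestion-time partitioned tables have time_partitioning.field == None.
partition_column = table.time_partitioning.field

query = f"""
SELECT *
FROM `my_dataset.my_partitioned_table`
WHERE {partition_column} IS NOT NULL
"""
rows = client.query(query).result()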

Related

Python AWS DynamoDB: how can I query elements with a filter on a non-primary column?

I use DynamoDB to store data for an app.
CURRENTLY: I am loading all the data from my DynamoDB table (with a .scan()) and then filtering it locally.
PROBLEM: Loading the values takes too much time.
SOLUTION:
Load only the filtered data.
I wasn't able to use a query like this one:
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('songs')  # placeholder table name

# Use the Table resource to query for all songs by artist Arturus Ardvarkian
response = table.query(
    KeyConditionExpression=Key('artist').eq('Arturus Ardvarkian')
)
print(response['Items'])
It always fails with botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the Query operation: Query condition missed key schema element: mail
You can create local or global secondary indexes on your DynamoDB table. From your point of view it is as if there were a different table definition (i.e. a different table key) attached to the same data.
This way you will be able to use queries instead of filtering scan results.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html
The exact choice (LSI vs GSI) depends on your use case, and you can find a comparison between the two here:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html
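For example, once a global secondary index on mail exists, the query targets it via IndexName. A minimal sketch (the table name and the mail-index index name are assumptions):
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('my_table')  # placeholder name

# Query the hypothetical 'mail-index' GSI instead of the base table key.
response = table.query(
    IndexName='mail-index',
    KeyConditionExpression=Key('mail').eq('user@example.com')
)
print(response['Items'])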

AWS Glue Search Option

I'm currently using the AWS Glue Data Catalog to organize my database. Once I set up the connection and sent my crawler to gather information, I was able to see the resulting metadata.
One feature that would be nice to have is the ability to SEARCH the entire data catalog on ONE column name. For example, if I have 5 tables in my data catalog and one of those tables happens to have a field "age", I'd like to be able to see that table.
I was also wondering if I can search on the "comment" field that every column of a table has in the AWS Glue Data Catalog.
Hope to get some help!
You can do that with the AWS Glue API. For example, you can use the Python SDK boto3 and its get_tables() method to retrieve all the metadata about the tables in a particular database. Have a look at the response syntax returned by get_tables(); you would then only need to parse it, for example:
import boto3

glue_client = boto3.client('glue')
response = glue_client.get_tables(
    DatabaseName='__SOME_NAME__'
)

for table in response['TableList']:
    columns = table['StorageDescriptor']['Columns']
    for col in columns:
        col_name = col['Name']
        col_comment = col.get('Comment', '')  # 'Comment' is absent when not set
        # Here you do the search for what you need
Note: if you have a table with partitioning (artificial columns), then you would also need to search through
columns_as_partitions = table['PartitionKeys']
for col in columns_as_partitions:
    col_name = col['Name']
    col_comment = col.get('Comment', '')
    # Here you do the search for what you need
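To search the whole catalog rather than a single database, a sketch along these lines should work (the function name is mine; it relies on the boto3 paginators for get_databases and get_tables, since both calls are paginated):
import boto3

def find_tables_with_column(column_name):
    # Walk every database and table in the Glue Data Catalog and
    # collect (database, table) pairs that contain the column.
    glue = boto3.client('glue')
    matches = []
    for db_page in glue.get_paginator('get_databases').paginate():
        for db in db_page['DatabaseList']:
            pages = glue.get_paginator('get_tables').paginate(DatabaseName=db['Name'])
            for tbl_page in pages:
                for table in tbl_page['TableList']:
                    cols = table.get('StorageDescriptor', {}).get('Columns', [])
                    cols += table.get('PartitionKeys', [])
                    if any(c['Name'] == column_name for c in cols):
                        matches.append((db['Name'], table['Name']))
    return matches

print(find_tables_with_column('age'))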

Is it possible to get sql column metadata from pandas DataFrame in python?

I read a database table with Python into a pandas DataFrame and am able to retrieve the types of the columns:
import sqlite3
import pandas

with sqlite3.connect(r'd:\database.sqlite') as connection:
    dataFrame = pandas.read_sql_query('SELECT * FROM my_table', connection)

dataFrame.dtypes
I am wondering if the DataFrame also includes further metadata about the table, e.g. if a column is nullable, the default value of the column and if the column is a primary key?
I would expect some methods like
dataFrame.isNullable('columnName')
but could not find such methods.
If the meta data is not included in the DataFrame I would have to use an extra query to retrieve that data, e.g.
PRAGMA table_info('my_table')
That would give the columns
cid, name, type, notnull, dflt_value, pk
However, I would like to avoid that extra query. Especially if my original query does not contain all the columns, or if it defines some extra columns, it could become complicated to find the corresponding metadata.
If the DataFrame already contains the wanted metadata, please let me know how to access it.
(In Java I would access the metadata with ResultSetMetaData metaData = resultSet.getMetaData(); )
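(For reference, a minimal sketch of the extra PRAGMA query mentioned above, i.e. exactly the round trip the question hopes to avoid:)
import sqlite3

with sqlite3.connect(r'd:\database.sqlite') as connection:
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    for cid, name, col_type, notnull, dflt_value, pk in connection.execute(
            "PRAGMA table_info('my_table')"):
        print(name, col_type,
              'NOT NULL' if notnull else 'NULLABLE',
              'PK' if pk else '', dflt_value)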

column names and types for insert operation in sqlalchemy

I am building an SQLite browser in Python/SQLAlchemy.
Here is my requirement:
I want to perform insert operations on a table.
I need to pass a table name to a function, and it should return all columns along with their respective types.
Can anyone tell me how to do this in SQLAlchemy?
You can access all columns of a Table like this:
my_table.c
This returns an object that behaves similarly to a dictionary, i.e. it has a values() method and so on:
columns = [(item.name, item.type) for item in my_table.c.values()]
You can play around with that to see what else you can get from it. Using the declarative extension you can access the table through the class's __table__ attribute. Furthermore, you might find the Runtime Inspection API helpful.
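Since the requirement is to go from a table name to its columns, a minimal sketch using table reflection (the function name is mine; assumes SQLAlchemy 1.4+, where autoload_with is available):
from sqlalchemy import MetaData, Table, create_engine

def get_columns(engine, table_name):
    # Reflect the table by name and return (column name, column type) pairs.
    metadata = MetaData()
    table = Table(table_name, metadata, autoload_with=engine)
    return [(col.name, col.type) for col in table.columns]

engine = create_engine('sqlite:///my_database.db')  # placeholder database
print(get_columns(engine, 'my_table'))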

How to get datatypes of specific fields of an Access database using pyodbc?

I'm using pyodbc to data-mine a big database in an .mdb (Access) file.
I want to create a new table taking relevant information from several existing tables (to then feed it to a tool).
I think I know all I need to transfer the data, and I know how to create a table given column names and datatypes, but I'm having trouble getting the datatypes (INTEGER, VARCHAR, etc.) of the respective columns in the existing tables. I need these types to create the new columns compatibly.
What I found on the internet (like this and this) is getting me into invalid-command trouble, so I think this is a platform-specific issue. Then again, I'm fairly green on databases.
Does anybody know how to get the types of these fields?
The reason those articles aren't helping you is that they are for SQL Server. SQL Server has system tables that you can query to get the column data; MS Access doesn't. MS Access only lets you query the object names.
However, ODBC does support getting the schema through its connection via the ODBC SQLColumns function.
According to this answer, pyodbc exposes this via a cursor method:
# Columns in table 'x'
for row in cursor.columns(table='x'):
    print(row.column_name)
As Mark noted in the comments, you probably also want row.data_type. The link he provided lists all the columns the cursor provides (see the sketch after this list for an example that prints several of them):
table_cat
table_schem
table_name
column_name
data_type
type_name
column_size
buffer_length
decimal_digits
num_prec_radix
nullable
remarks
column_def
sql_data_type
sql_datetime_sub
char_octet_length
ordinal_position
is_nullable: One of SQL_NULLABLE, SQL_NO_NULLS, SQL_NULLS_UNKNOWN.
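A minimal sketch combining the fields mentioned above (the driver string is standard for the Access ODBC driver, but the path and table name are placeholders):
import pyodbc

conn = pyodbc.connect(
    r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};'
    r'Dbq=C:\path\to\database.mdb;'  # placeholder path
)
cursor = conn.cursor()

# Each row describes one column of table 'x': its name, ODBC type code,
# driver-specific type name, and nullability.
for row in cursor.columns(table='x'):
    print(row.column_name, row.data_type, row.type_name, row.is_nullable)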
I am not familiar with pyodbc, but I have done this in VBA in the past.
The 2 links you mentioned are for SQL Server, not for Access. To find out the data type of each field in an Access table, you can use DAO or ADOX.
Here is an example I did, in VBA with Excel 2010, where I connect to the Access database (2000 mdb format) and list the tables, the fields, and their data types (as an enum; for example, '4' means dbLong). You can see in the output the system tables and, at the bottom, the tables created by the user.
You can easily find examples on the internet for how to do something similar with ADOX. I hope this helps.
Private Sub TableDefDao()
    Dim db As DAO.Database
    Set db = DAO.OpenDatabase("C:\Database.mdb")
    Dim t As DAO.TableDef
    Dim f As DAO.Field
    For Each t In db.TableDefs
        Debug.Print t.Name
        For Each f In t.Fields
            Debug.Print vbTab & f.Name & vbTab & f.Type
        Next
    Next
End Sub
You'll get some type information from this output:
import pandas as pd
import pyodbc

dbq = r'D:\....\xyz.accdb'
conn = pyodbc.connect(r"Driver={Microsoft Access Driver (*.mdb, *.accdb)}; Dbq=%s;" % dbq)
query = 'select * from tablename'
dataf = pd.read_sql(query, conn)
print(list(dataf.dtypes))
