Connecting Athena to Superset - Python

So, I am using AWS Athena, where I have the Data Source set to AwsDataCatalog and the database set to test_db, under which I have a table named debaprc.
Now, I have Superset installed on an EC2 instance (in a virtual environment). On the instance, I have installed PyAthenaJDBC and PyAthena. When I launch Superset and try to add a database, the syntax given is this:
awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}
Now I have 2 questions:
What do I provide for schema_name?
I tried putting test_db as the schema_name, but it couldn't connect for some reason. Am I doing this right, or do I need to do something differently?

Beware of the encoding:
awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com:443/{schema_name}?AwsRegion={region_name}&s3_staging_dir=s3%3A%2F%2Faws-athena-results-xxxxxxx
For example, for me it has been necessary to:
transform s3:// to s3%3A%2F%2F (and not just the :, as in the Superset docs?)
add the region again in the extra parameters
If you do not provide a schema name (also called database), I think it defaults to a value of default.
Sadly, when a connection string fails in Superset, nothing very helpful is displayed...
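If you want to avoid hand-encoding the staging dir, here is a minimal sketch that builds the URI in Python; all the values below are placeholders for illustration, not real credentials:

from urllib.parse import quote_plus

# Placeholder values, for illustration only.
aws_access_key_id = "AKIA..."
aws_secret_access_key = quote_plus("secretKeyMayContain/+chars")  # secrets can contain / and +
region = "us-east-1"
schema = "test_db"
staging_dir = quote_plus("s3://aws-athena-results-xxxxxxx")  # s3:// becomes s3%3A%2F%2F

uri = (
    "awsathena+rest://" + aws_access_key_id + ":" + aws_secret_access_key
    + "@athena." + region + ".amazonaws.com:443/" + schema
    + "?AwsRegion=" + region + "&s3_staging_dir=" + staging_dir
)
print(uri)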

It worked for me after adding port 443 to the connection string, as below, and you can use test_db as the schema_name:
awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={s3_staging_dir}

Check your PyAthena version. The Superset docs say PyAthena>1.2.0, while the PyAthena PyPI page says PyAthena[SQLAlchemy]>=1.0.0, <2.0.0. In my case, PyAthena[SQLAlchemy]>1.2.0, <2.0.0 (combining both constraints) solved an issue, and the tables were present in the dropdown list in SQL Lab (it was empty with the latest version, PyAthena==2.5.1, before).
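For reference, that combined constraint can be installed like this:

pip install "PyAthena[SQLAlchemy]>1.2.0,<2.0.0"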

Related

How can I add an SQLite database to Apache Superset?

I'm trying to add a Python sqlite3-generated database to Superset, but I'm getting a strange error. Is there a way to work around it?
You have to modify the Superset configuration (the config.py file), adding this parameter:
PREVENT_UNSAFE_DB_CONNECTION = False
This is the link to a similar question in the Superset GitHub repository: https://github.com/apache/incubator-superset/issues/9748; it points to the request that added this security measure.
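If you prefer not to edit config.py in place, a common pattern (an assumption about your setup, not something from the question) is to put the override in a superset_config.py that Superset picks up:

# superset_config.py -- loaded by Superset if this file is on PYTHONPATH
# or pointed to via the SUPERSET_CONFIG_PATH environment variable.
PREVENT_UNSAFE_DB_CONNECTION = False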

Specify database backend store creation in a specific schema

When creating an MLflow tracking server and specifying that a SQL Server database is to be used as a backend store, MLflow creates a bunch of tables within the dbo schema. Does anyone know if it is possible to specify a different schema in which to create these tables?
It is possible to alter mlflow/mlflow/store/sqlalchemy_store.py to change the schema of the tables that are stored.
It is very likely that this is the wrong solution for you, since you will go out of sync with the open-source project and lose newer features that alter this file, unless you maintain the fork yourself. Could you maybe reply with your use case?
You can use Postgres URI options. A sample:
"postgresql://postgres:postgres@localhost:5432/postgres?options=-csearch_path%3Ddbo,mlflow_schema"
In your MLflow code:
mlflow.set_tracking_uri("postgresql://postgres:postgres@localhost:5432/postgres?options=-csearch_path%3Ddbo,mlflow_schema")
Don't forget to create the 'mlflow_schema' schema first (see the sketch below).
how-to-specify-schema-in-psycopg2-connection-method
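As a minimal sketch (assuming the Postgres URI above), the schema can be created once via SQLAlchemy before starting MLflow:

from sqlalchemy import create_engine, text

# Same connection details as the tracking URI above.
engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
with engine.begin() as conn:
    conn.execute(text("CREATE SCHEMA IF NOT EXISTS mlflow_schema"))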
I'm using MSSQL Server as the backend store. I could use a schema other than dbo by specifying the default schema for the SQL Server user being used by MLflow.
In my case, if the MLflow tables (e.g., experiments) exist in dbo, then those tables will be used. If not, MLflow will create those tables in the default schema.

How do I set up a login system in Pyramid using MySQL as the database to store email and password?

I have gone through http://docs.pylonsproject.org/projects/pyramid/en/latest/quick_tutorial/authentication.html, but it does not give any clue about how to add a database to store the email and password.
The introduction to the Quick Tutorial describes its purpose and intended audience. Authentication and persistent storage are not covered in the same lesson, but in two different lessons.
You can either combine what you learned from those steps yourself (not recommended) or take a stab at the SQLAlchemy + URL dispatch wiki tutorial, which covers a typical web application with authentication, authorization, hashing of passwords, and persistent storage in an SQL database.
Note, however, that it uses SQLite, not MySQL, as its SQL database, so you'll either have to use the provided one or swap it out for your preferred SQL database.
Here are a few suggestions regarding switching from SQLite to MySQL. In your development.ini (and/or production.ini) file, change from SQLite to MySQL:
# sqlalchemy.url = sqlite:///%(here)s/MyProject.sqlite  [comment out or remove this line]
sqlalchemy.url = mysql://MySQLUsername:MySQLPassword@localhost/MySQLdbName
Of course, you will need a MySQL database (MySQLdbName in the example above) and likely the knowledge and privileges to edit its schema, for example, to add fields called user_email and passwordhash to the users table, or to create a users table if necessary.
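As a minimal sketch of what that might look like as an SQLAlchemy model with bcrypt-hashed passwords (the model and helper methods here are assumptions, following the field names above):

import bcrypt
from sqlalchemy import Column, Integer, Text
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# Hypothetical model using the table/column names mentioned above.
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    user_email = Column(Text, unique=True, nullable=False)
    passwordhash = Column(Text, nullable=False)

    def set_password(self, password):
        # Store a salted bcrypt hash, never the plain password.
        self.passwordhash = bcrypt.hashpw(
            password.encode('utf8'), bcrypt.gensalt()).decode('utf8')

    def check_password(self, password):
        # Compare the candidate password against the stored hash.
        return bcrypt.checkpw(
            password.encode('utf8'), self.passwordhash.encode('utf8'))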
In your setup.py file, you will need to add the mysql-python module to the requirements. An example would be:
requires = [
    'bcrypt',
    'pyramid',
    'pyramid_jinja2',
    'pyramid_debugtoolbar',
    'pyramid_tm',
    'SQLAlchemy',
    'transaction',
    'zope.sqlalchemy',
    'waitress',
    'mysql-python',
]
After specifying new module(s) in setup.py, be sure to run the following commands so your project recognizes the new module(s):
cd $VENV/MyPyramidProject
sudo $VENV/bin/pip install -e .
By this point, your Pyramid project should be hooked up to MySQL. Now it is down to learning the details of Pyramid (and SQLAlchemy, if this is your selected ORM). Much of the advice in the tutorials, particularly the SQLAlchemy + URL dispatch wiki tutorial in your case, should work just as it does with SQLite.

Error retrieving Unicode data from tables using Azure/Python

I'm using Azure and the Python SDK.
I'm using Azure's Table service API for DB interaction.
I've created a table which contains Unicode data (Hebrew, for example). Creating tables and setting the data in Unicode seems to work fine. I'm able to view the data in the database using Azure Storage Explorer, and the data is correct.
The problem is when retrieving the data. Retrieving a specific row works fine, even for Unicode data:
table_service.get_entity("some_table", "partition_key", "row_key")
However, when trying to get a number of records using a filter, an encoding exception is thrown for any row that has non-ASCII characters in it:
tasks = table_service.query_entities('some_table', "PartitionKey eq 'partition_key'")
Is this a bug in the Azure Python SDK? Is there a way to set the encoding beforehand so that it won't crash? (Azure doesn't give access to sys.setdefaultencoding, and using DEFAULT_CHARSET in settings.py doesn't work either.)
I'm using https://www.windowsazure.com/en-us/develop/python/how-to-guides/table-service/ as a reference for the table service API.
Any idea would be greatly appreciated.
This looks like a bug in the Python library to me. I whipped up a quick fix and submitted a pull request on GitHub: https://github.com/WindowsAzure/azure-sdk-for-python/pull/59.
As a workaround for now, feel free to clone my repo (remembering to check out the dev branch) and install it via pip install <path-to-repo>/src.
Caveat: I haven't tested my fix very thoroughly, so you may want to wait for the Microsoft folks to take a look at it.

UTF-8 with SQLAlchemy on a database with init_connect

I am trying to use SQLAlchemy to connect to a MySQL database. I have set charset=utf8&use_unicode=0 in the connection string. This worked with almost all databases, but not with one particular one. I believe it is because that database has the 'init-connect' variable set to 'SET NAMES latin2;', and I have no privileges to change that.
It works for me if I send an explicit SET NAMES utf8 query; however, if there is a temporary disconnection, then after reconnecting my program breaks again, as it gets latin2-encoded data from the server.
Is it possible to create some hook to always send the SET NAMES when SQLAlchemy connects? Or is there any other way to solve this problem?
Sounds like what you want is a custom PoolListener. This SO answer explains how to write one in the context of SQLite's PRAGMA foreign_keys=ON:
Sqlite / SQLAlchemy: how to enforce Foreign Keys?
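On newer SQLAlchemy versions, the same idea can be expressed with the event API instead of a PoolListener. A minimal sketch (the connection string below is a placeholder):

from sqlalchemy import create_engine, event

# Placeholder connection string; adjust to your server.
engine = create_engine("mysql://user:password@host/dbname?charset=utf8&use_unicode=0")

@event.listens_for(engine, "connect")
def set_names(dbapi_connection, connection_record):
    # Runs for every new DBAPI connection, so it also covers
    # connections re-established after a disconnect.
    cursor = dbapi_connection.cursor()
    cursor.execute("SET NAMES utf8")
    cursor.close()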
