Is there a way of creating BigLake tables through Python?

In the documentation I see no reference to BigLake tables. I wonder if there's a way of setting ExternalDataConfiguration to use them.

Figured it out: if you provide a connection ID, it will be used to set the table up as a BigLake table (see https://stackoverflow.com/a/73987775/9944075 on how to create the connection).
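For reference, a minimal sketch of what that looks like with the google-cloud-bigquery client. The project, dataset, bucket and connection names below are hypothetical, and it assumes a client version recent enough to expose ExternalConfig.connection_id:

# A minimal sketch, assuming google-cloud-bigquery >= 2.11 where
# ExternalConfig.connection_id is available. All resource names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/path/*.parquet"]
# Supplying the connection resource is what turns the external table into a BigLake table.
external_config.connection_id = "my-project.us.my-connection"

table = bigquery.Table("my-project.my_dataset.my_biglake_table")
table.external_data_configuration = external_config
client.create_table(table)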

Related

Relational DB - separate joined tables

Is there any way to join tables from a relational database and then separate them again?
I'm working on a project that involves modifying the data after it has been joined. Unfortunately, modifying the data before the join is not an option. I would then want to separate the data according to the original schema.
I'm stuck at the separating part. I have metadata (python dictionary) with the information on the tables (primary keys, foreign keys, fields, etc.).
I'm working with Python, so a Python solution would be greatly appreciated. An SQL solution also helps.
Edit: Perhaps the question was unclear. To summarize, I would like to create a new database with a schema identical to the old one. I do not want to make any modifications to the original database. The data that makes up the new database must originally be in a single table (the result of a join of the old tables), because the operations that need to be run must be run on a single table; running them on the individual tables would not produce the desired outcome.
I would like to know if this is possible and, if so, how can I achieve this?
Thanks!
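For what it's worth, here is a minimal sketch of the separation step, assuming the joined result fits in a pandas DataFrame and that the metadata dictionary records each original table's columns and primary key. All table and column names below are hypothetical:

# A minimal sketch, assuming the joined data is held in a pandas DataFrame and
# the metadata dict maps each original table to its fields and primary key.
import pandas as pd

metadata = {
    "customers": {"primary_key": ["customer_id"], "fields": ["customer_id", "name"]},
    "orders": {"primary_key": ["order_id"], "fields": ["order_id", "customer_id", "total"]},
}

def split_joined(joined: pd.DataFrame, metadata: dict) -> dict:
    tables = {}
    for table_name, info in metadata.items():
        # Project the joined rows onto the original columns and drop the
        # duplicates introduced by the join, keyed on the primary key.
        tables[table_name] = (
            joined[info["fields"]]
            .drop_duplicates(subset=info["primary_key"])
            .reset_index(drop=True)
        )
    return tables

Each resulting DataFrame can then be written back to the new database (for example with DataFrame.to_sql), keeping the original schema intact.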

How to create a managed Hive Table using pyspark

Every table I create using pyspark ends up with type EXTERNAL_TABLE in Hive, but I want to create managed tables and don't know what I am doing wrong. I have tried several ways of creating the tables, for instance:
spark.sql('CREATE TABLE dev.managed_test(key int, value string) STORED AS PARQUET')
spark.read.csv('xyz.csv').write.saveAsTable('dev.managed_test2')
In both cases the resulting table is an EXTERNAL_TABLE. When I describe the table in Apache Hue or Beeline, I also see that the property TRANSLATED_TO_EXTERNAL is set to true.
Does anyone have an idea what could be wrong, or what I could do instead of the two options shown above? Maybe I am missing some configuration parameter?
Thank you!
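No answer is recorded here, but the TRANSLATED_TO_EXTERNAL property usually points at Hive 3 translating non-transactional managed tables to external ones. The sketch below only illustrates how one might test that theory from pyspark; whether the legacy session property is honoured depends on your distribution, so treat it as an assumption rather than a confirmed fix:

# A minimal sketch, assuming a Hive 3 metastore that translates non-ACID managed
# tables to external (which is what TRANSLATED_TO_EXTERNAL=true indicates).
# The session property below is assumed to exist in your Hive build; verify first.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("SET hive.create.as.external.legacy=true")  # assumed Hive 3 session property
spark.sql("CREATE TABLE dev.managed_test (key INT, value STRING) STORED AS PARQUET")

# Check how the catalog actually registered the table.
for t in spark.catalog.listTables("dev"):
    print(t.name, t.tableType)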

Update SQL database records based on JSON

I have a table with 30k clients, with the ClientID as primary key.
I'm getting data from API calls and inserting them into the table using python.
I'd like to find a way to insert rows for new clients and, if the ClientID that comes with the API call already exists in the table, update the existing record with this client's updated information.
Thanks!!
A snippet of code would be nice to show us what exactly you are doing right now. I presume you are using an ORM like SQLAlchemy? If so, you are looking at doing an UPSERT type of operation.
That is already answered HERE
Alternatively, if you are executing raw queries without an ORM, you could write a custom procedure and pass the required parameters. HERE is a good write-up on how that is done in MSSQL under high concurrency. You could use it as a starting point for understanding and then rewrite it for PostgreSQL.
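For the PostgreSQL/SQLAlchemy case, a minimal sketch of such an upsert could look like the following. The connection string, table and column names are hypothetical, and it assumes SQLAlchemy 1.4+ with ClientID as the conflict target:

# A minimal sketch using SQLAlchemy Core's PostgreSQL dialect; names are hypothetical.
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.dialects.postgresql import insert

engine = create_engine("postgresql+psycopg2://user:password@host/db")
metadata = MetaData()
clients = Table("clients", metadata, autoload_with=engine)

def upsert_client(row: dict) -> None:
    stmt = insert(clients).values(**row)
    # On a primary-key collision, update every column except the key itself.
    stmt = stmt.on_conflict_do_update(
        index_elements=["ClientID"],
        set_={k: v for k, v in row.items() if k != "ClientID"},
    )
    with engine.begin() as conn:
        conn.execute(stmt)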

Python, SQLAlchemy, MySQL: Insert data between existing records

Unfortunately I couldn't find any useful information on this topic. I have an existing database with existing tables and existing data in them. Now I have to add new data in between the existing data. My code would look something like this, but it doesn't work:
INSERT INTO table_name(data) VALUES('xyz')
WHERE DATETIME(datetime) > DATETIME('2017-01-01 02:00:00');
I have created an image for a better understanding of my question.
Please note that the primary key needs to adapt to the changes, as you can see in the picture. My tools are Python, SQLAlchemy and MySQL. I look forward to any help.
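No answer is recorded here, but one way to read the question is: shift the primary keys of every later row up by one, then insert the new row into the freed slot. A minimal sketch against MySQL with SQLAlchemy follows; the table and column names are hypothetical, and it assumes no foreign keys reference the shifted keys:

# A minimal sketch; table/column names are hypothetical and the primary key is
# assumed to be an integer column named id with no referencing foreign keys.
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@host/db")
cutoff = "2017-01-01 02:00:00"

with engine.begin() as conn:
    # The id the new row should take: the first row after the cutoff.
    new_id = conn.execute(
        text("SELECT MIN(id) FROM table_name WHERE datetime > :ts"),
        {"ts": cutoff},
    ).scalar()
    # Shift later rows up by one, highest id first, so the UPDATE never collides
    # with an existing key (MySQL allows ORDER BY in a single-table UPDATE).
    conn.execute(
        text("UPDATE table_name SET id = id + 1 WHERE id >= :new_id ORDER BY id DESC"),
        {"new_id": new_id},
    )
    # Insert the new row into the freed slot.
    conn.execute(
        text("INSERT INTO table_name (id, datetime, data) VALUES (:id, :ts, :data)"),
        {"id": new_id, "ts": cutoff, "data": "xyz"},
    )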

Select Data from Table and Insert into a different DB

I'm using python and psycopg2 to remotely query some psql databases, and I'm trying to figure out the best way to select the data I need from the remote table, and insert it into a table on a separate DB (local application server).
Most of the stuff I've read has directed me to avoid executemany and look toward COPY operations, but I'm unsure how to implement this on a specific select statement as opposed to the entire table. Should I be headed this way or am I completely off?
but I'm unsure how to implement this on a specific select statement as opposed to the entire table
COPY isn't limited to tables; you can use a query as the source as well. Check out the examples in the manual, which show how to use COPY to create a text file based on a query:
http://www.postgresql.org/docs/current/static/sql-copy.html#AEN59055
(3rd example)
Take a look at http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/
Granted, this is pulling from Oracle and inserting into SQL Server, but the concepts should be the same.
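Putting the two answers together, a minimal sketch with psycopg2's copy_expert could look like this; the connection parameters, query and table names are all hypothetical:

# A minimal sketch: stream the result of a SELECT out of the remote database
# and COPY it into a local table. All names and queries are hypothetical.
import io
import psycopg2

src = psycopg2.connect("host=remote-db dbname=source user=reader")
dst = psycopg2.connect("host=localhost dbname=app user=writer")

buf = io.StringIO()
with src.cursor() as cur:
    # COPY accepts an arbitrary query as its source, not just a table name.
    cur.copy_expert(
        "COPY (SELECT id, name, created_at FROM events "
        "WHERE created_at > now() - interval '1 day') TO STDOUT WITH CSV",
        buf,
    )

buf.seek(0)
with dst.cursor() as cur:
    cur.copy_expert("COPY events_local (id, name, created_at) FROM STDIN WITH CSV", buf)
dst.commit()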