How to execute a simple SQL query such as an UPDATE in PySpark?

I want to execute an UPDATE query in SQL from PySpark, based on some logic I am applying. All I could find is documentation on how to read from SQL, but there are no proper examples of executing an UPDATE or CREATE statement.
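One thing worth knowing: Spark's JDBC data source can only read tables or append/overwrite them; the DataFrame API has no way to issue UPDATE or CREATE. A workaround is to reach through PySpark's py4j gateway and run the statement with the JVM's java.sql.DriverManager. A minimal sketch, assuming an existing SparkSession, a MySQL JDBC driver already on Spark's classpath (e.g. via spark.jars), and placeholder connection details:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# reach into the JVM via the py4j gateway, since the DataFrame API
# cannot issue UPDATE or CREATE statements
jvm = spark.sparkContext._gateway.jvm
conn = jvm.java.sql.DriverManager.getConnection(
    "jdbc:mysql://host:3306/database", "user", "password")
stmt = conn.createStatement()
stmt.executeUpdate("UPDATE tbl SET mycol = 'value' WHERE id = 1")
stmt.close()
conn.close()
Since the statement runs on the driver over a plain JDBC connection, any ordinary Python DB driver would work just as well here; Spark is only involved if you need its session anyway.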

Related

Can I store an SQL query full of CTEs as a variable in Python and use that variable to join with another query?

I am trying to automate some SQL reports and to replicate the query in Python.
I want to store my SQL query (full of CTEs) in a simple variable, let's say 'query_report'.
I then want to use 'query_report' in another SQL query to join the two queries together.
Is what I am asking even possible?
Any help would be much appreciated.
Thanks
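For what it's worth, since the CTE query is just a string, one way is to wrap the stored query as a derived table and join against it; PostgreSQL and MySQL 8 both accept a WITH clause inside a derived table. A minimal sketch with made-up table and column names:
# the stored report query, CTEs and all (made-up schema)
query_report = """
WITH monthly AS (
    SELECT account_id, SUM(amount) AS total
    FROM payments
    GROUP BY account_id
)
SELECT account_id, total FROM monthly
"""

# wrap the whole CTE query as a derived table so it can be joined
combined = """
SELECT a.name, r.total
FROM accounts a
JOIN (%s) r ON r.account_id = a.account_id
""" % query_report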

How can I use a SQL view to write a pandas dataframe to a SQL table from within Python?

I have a dataframe in my Python program with columns corresponding to a table on my SQL server. I want to append the contents of my dataframe to the SQL table. Here's the catch: I don't have permission to access the SQL table itself; I can only interact with it through a view.
I know that if I could write directly to the table I could use pandas' to_sql function (via SQLAlchemy). However, I can only use a view to write to the table in the database.
Is this even possible? Thanks for the help.
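One route that can work: if the view is updatable (for example a simple single-table view on SQL Server), the database accepts INSERTs routed through it, so you can issue the inserts yourself instead of relying on to_sql, which may try to CREATE the table and fail on permissions. A minimal sketch with placeholder connection details, view name, and columns:
import pandas as pd
from sqlalchemy import create_engine, text

df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
engine = create_engine(
    "mssql+pyodbc://user:password@host/db?driver=ODBC+Driver+17+for+SQL+Server")
# executemany-style insert through the view; the database resolves it
# to the underlying table, provided the view is updatable
with engine.begin() as conn:
    conn.execute(text("INSERT INTO my_view (id, name) VALUES (:id, :name)"),
                 df.to_dict(orient="records"))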

Zeppelin: What's the best way to query data with SQL and work with it?

I want to use Zeppelin to query databases. I currently see two possibilities, but neither of them is sufficient for me:
1. Configure a database connection as an "interpreter", name it e.g. "sql1", use it in a paragraph, run a SQL query, and use the built-in plotting tools. All the tutorials and tips seem to deal with this, but then the documentation suddenly stops. I want to do more with the data: I want to filter and process it. If I want to plot it again (with other restrictions), I have to run the query (which may take seconds or minutes) again (see my other question Zeppelin SQL: reuse data of query without another interpreter or a new query).
2. Use Spark with Python, Scala, or similar. But the documentation only seems to load CSV data, put it into a dataframe, and then access this dataframe with SQL; it never accesses the data with SQL in the first place. How do I best access the SQL data? Can I use an already configured "interpreter" (database connection)?
You can use the Zeppelin API to retrieve paragraph data:
val buffer = scala.io.Source.fromURL("http://XXXXX:9995/api/notebook/2CN2QP93H/paragraph/20170713-092810_1633770798").mkString
val df = sqlContext.read.json(sc.parallelize(buffer :: Nil)).select("body.text")
df.first.getAs[String](0)
These Spark Scala lines will retrieve the SQL query used by a paragraph. You could do the same thing to get the results, I think.
I cannot find a solution for 1., but I have made a short solution for 2. that works within Zeppelin with Python (2.7), sqlalchemy (SQL wrapper), mysqldb (MySQL implementation), and pandas (make sure you have these packages installed; all of them are in Debian 9). I wonder why I have not found such a solution before...
%python
from sqlalchemy import create_engine
import pandas as pd
sql = "select col1, col2 from table limit 10"
# note the '@' between the credentials and the host; user, password,
# host and database are placeholders
df = pd.read_sql(sql,
    create_engine('mysql+mysqldb://user:password@host:3306/database').connect())
z.show(df)
If you want to connect to another database, like DB2 or Oracle, you have to use other Python packages and adjust the first part of the create_engine string.
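For illustration, these are the URL shapes I'd expect for Oracle (via the cx_Oracle package) and DB2 (via ibm_db plus the ibm_db_sa dialect); host, port, and credentials are placeholders:
from sqlalchemy import create_engine

# Oracle through the cx_Oracle driver
oracle_engine = create_engine('oracle+cx_oracle://user:password@host:1521/?service_name=orcl')
# DB2 through the ibm_db driver with the ibm_db_sa dialect
db2_engine = create_engine('db2+ibm_db://user:password@host:50000/database')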

mysql: update millions of records by applying a python function

I have a Python function (pyfunc):
def pyfunc(x):
    ...
    return someString
I want to apply this function to every item in a mysql table column,
something like:
UPDATE tbl SET mycol=pyfunc(mycol);
This update includes tens of millions of records.
Is there an efficient way to do this?
Note: I cannot rewrite this function in sql or any other programming language.
If your pyfunc does not depend on other data sources like APIs or a cache, and just does some data processing like string or mathematical manipulation, or depends only on data stored in the same MySQL database, you should go for MySQL user-defined functions.
Let's assume you create a MySQL function called colFunc; then your query would be:
UPDATE tbl SET mycol = colFunc(mycol);
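As a sketch of that approach from Python (the function body below is a made-up string manipulation; in reality it has to reimplement pyfunc's logic in SQL):
import MySQLdb

conn = MySQLdb.connect(host="host", user="user", passwd="password", db="database")
cur = conn.cursor()
# a stand-in stored function; the real body must replicate pyfunc in SQL
cur.execute("""
    CREATE FUNCTION colFunc(s TEXT) RETURNS TEXT DETERMINISTIC
    RETURN UPPER(TRIM(s))
""")
# one set-based statement, executed entirely inside the server
cur.execute("UPDATE tbl SET mycol = colFunc(mycol)")
conn.commit()
The win is that the tens of millions of rows never cross the wire; everything happens server-side.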
Alternatively, just prepare an update.sql file using a Python script.
After that you can test the update on your local machine (with a dump of the DB): connect to SQL and run the update.sql script that was prepared from Python.
In this case you will apply the update with raw SQL, without Python.
I think it is not a bad solution.
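A minimal sketch of that idea: read the rows, apply pyfunc in Python, and write an update.sql file (table, column, and key names are placeholders):
import MySQLdb
import MySQLdb.cursors

def pyfunc(x):
    return x.upper()  # placeholder for the real function

# a server-side cursor streams rows instead of buffering tens of millions
conn = MySQLdb.connect(host="host", user="user", passwd="password", db="database",
                       cursorclass=MySQLdb.cursors.SSCursor)
cur = conn.cursor()
cur.execute("SELECT id, mycol FROM tbl")
with open("update.sql", "w") as f:
    for row_id, value in cur:
        new_value = pyfunc(value).replace("'", "''")  # naive quoting for the dump
        f.write("UPDATE tbl SET mycol = '%s' WHERE id = %d;\n" % (new_value, row_id))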

In SQLAlchemy how do I preview SQL statements before committing for debugging purposes?

I want to see the SQL code instead of doing an actual db.commit(). This is for a one-off database population script that I want to verify is working as intended before actually making the changes.
Try calling str() on the query object:
print(str(query_object))
From:
How do I get a raw, compiled SQL query from a SQLAlchemy expression?
Other possible solutions:
SQLAlchemy: print the actual query
How to retrieve executed SQL code from SQLAlchemy
The newest (as of v0.9) answer is also:
Retrieving ultimate sql query sentence (with the values in place of any '?') (by Mike Bayer)
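Following those links, the usual trick is to compile the statement with literal_binds so the parameter values are inlined and you can read the final SQL before committing. A minimal sketch (engine, table, and values are made up):
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, insert

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
users = Table("users", metadata,
              Column("id", Integer, primary_key=True),
              Column("name", String))
stmt = insert(users).values(name="alice")
# compile for the engine's dialect and inline the bound values
print(stmt.compile(engine, compile_kwargs={"literal_binds": True}))
Another low-effort option is create_engine(..., echo=True), which logs every statement, though that happens at execution time rather than before.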
