Python, Oracle DB, XML data in a column, fetching cx_Oracle.Object

I am using Python to fetch data from an Oracle DB. All the rows have a column that contains XML data. When I print the rows fetched from the Oracle DB, that column is printed as cx_Oracle.OBJECT object at 0x7fffe373b960. I even converted the data to a pandas data frame, and the column is still printed as cx_Oracle.OBJECT object at 0x7fffe373b960. I want to access the key/value data stored in this column (the XML documents).

Please read the inline comments.
cursor = connection.cursor()
# getClobVal() returns the whole XML document as a CLOB.
# It won't work without the table alias; I don't know why.
query = """select a.columnName.getClobVal() from tablename a"""
cursor.execute(query)

# For a single record:
result = cursor.fetchone()[0].read()

# For all records:
rows = cursor.fetchall()
for row in rows:
    print(row[0].read())
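Once getClobVal() gives you the XML as a string, the standard library's xml.etree.ElementTree can pull out individual element values. A minimal sketch, using a hypothetical document shape since the real schema isn't shown:
import xml.etree.ElementTree as ET

# In practice xml_text would be the string returned by row[0].read() above;
# the document shape here is only a hypothetical example.
xml_text = "<record><key>color</key><value>blue</value></record>"

root = ET.fromstring(xml_text)
print(root.findtext("key"), root.findtext("value"))  # -> color blue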


Handling UUID values in Arrow with Parquet files

I'm new to Python and Pandas - please be gentle!
I'm using SqlAlchemy with pymssql to execute a SQL query against a SQL Server database and then convert the result set into a dataframe. I'm then attempting to write this dataframe as a Parquet file:
engine = sal.create_engine(connectionString)
conn = engine.connect()
df = pd.read_sql(query, con=conn)
df.to_parquet(outputFile)
The data I'm retrieving in the SQL query includes a uniqueidentifier column (i.e. a UUID) named rowguid. Because of this, I'm getting the following error on the last line above:
pyarrow.lib.ArrowInvalid: ("Could not convert UUID('92c4279f-1207-48a3-8448-4636514eb7e2') with type UUID: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column rowguid with type object')
Is there any way I can force all UUIDs to strings at any point in the above chain of events?
A few extra notes:
The goal for this portion of code was to receive the SQL query text as a parameter and act as a generic SQL-to-Parquet function.
I realise I can do something like df['rowguid'] = df['rowguid'].astype(str), but it relies on me knowing which columns have uniqueidentifier types. By the time it's a dataframe, everything is an object and each query will be different.
I also know I can convert it to a char(36) in the SQL query itself; however, I was hoping for something more "automatic" so the person writing the query doesn't trip over this problem accidentally and doesn't have to remember to convert the datatype every time.
Any ideas?
Try DuckDB
import duckdb

engine = sal.create_engine(connectionString)
conn = engine.connect()
df = pd.read_sql(query, con=conn)
# Close the database connection
conn.close()
# Create an in-memory DuckDB connection
duck_conn = duckdb.connect(':memory:')
# DuckDB can query the in-scope pandas DataFrame directly;
# write its content to a snappy-compressed Parquet file
duck_conn.execute("COPY (SELECT * FROM df) TO 'df-snappy.parquet' (FORMAT 'parquet')")
Ref:
https://duckdb.org/docs/guides/python/sql_on_pandas
https://duckdb.org/docs/sql/data_types/overview
https://duckdb.org/docs/data/parquet
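Separately, as a sketch of the "force all UUIDs to strings" idea from the question (an assumption about the frame's contents, not part of the DuckDB approach): scan the object columns for uuid.UUID values and cast them to str before calling to_parquet.
import uuid
import pandas as pd

def stringify_uuid_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Cast any object column that contains uuid.UUID values to str.
    for col in df.columns:
        if df[col].dtype == object and df[col].map(lambda v: isinstance(v, uuid.UUID)).any():
            df[col] = df[col].astype(str)
    return df

df = stringify_uuid_columns(df)
df.to_parquet(outputFile)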

Upsert / merge tables in SQLite

I have created a database using sqlite3 in python that has thousands of tables. Each of these tables contains thousands of rows and ten columns. One of the columns is the date and time of an event: it is a string that is formatted as YYYY-mm-dd HH:MM:SS, which I have defined to be the primary key for each table. Every so often, I collect some new data (hundreds of rows) for each of these tables. Each new dataset is pulled from a server and loaded in directly as a pandas data frame or is stored as a CSV file. The new data contains the same ten columns as my original data. I need to update the tables in my database using this new data in the following way:
Given a table in my database, for each row in the new dataset, if the date and time of the row matches the date and time of an existing row in my database, update the remaining columns of that row using the values in the new dataset.
If the date and time does not yet exist, create a new row and insert it to my database.
Below are my questions:
I've done some searching on Google and it looks like I should be using the UPSERT (merge) functionality of sqlite but I can't seem to find any examples showing how to use it. Is there an actual UPSERT command, and if so, could someone please provide an example (preferably with sqlite3 in Python) or point me to a helpful resource?
Also, is there a way to do this in bulk so that I can UPSERT each new dataset into my database without having to go row by row? (I found this link, which suggests that it is possible, but I'm new to using databases and am not sure how to actually run the UPSERT command.)
Can UPSERT also be performed directly using pandas.DataFrame.to_sql?
My backup solution is to load the table to be UPSERTed using pd.read_sql_query("SELECT * from table", con), perform pandas.DataFrame.merge, delete that table from the database, and then add the updated table back using pd.DataFrame.to_sql (but this would be inefficient).
Instead of going through the UPSERT command, why don't you create your own algorithm that replaces a row's values if its date & time is found, and otherwise inserts a new row? Check out the code I wrote for you, and let me know if you are still confused. You can even do this for hundreds of tables by replacing the table name in the algorithm with a variable and looping it over your list of table names.
import sqlite3
import pandas as pd

csv_data = pd.read_csv("my_CSV_file.csv")  # your CSV data path

def manual_upsert():
    con = sqlite3.connect(connection_str)  # path to your .db file
    cur = con.cursor()
    cur.execute("SELECT * FROM my_CSV_data")  # read the existing table
    data = cur.fetchall()
    old_data_list = []  # collection of all dates already in the database table
    for line in data:
        old_data_list.append(line[0])  # assuming your date column is at index 0
    for new_data in csv_data.itertuples(index=False):  # iterate over the rows of the CSV
        if new_data[0] in old_data_list:
            # update the remaining columns when the date already exists
            cur.execute("UPDATE my_CSV_data SET column1=?, column2=?, column3=? WHERE my_date_column=?",
                        (new_data[1], new_data[2], new_data[3], new_data[0]))
        else:
            # insert a new row when the date is not found
            cur.execute("INSERT INTO my_CSV_data VALUES(?,?,?,?)",
                        (new_data[0], new_data[1], new_data[2], new_data[3]))
    con.commit()
    con.close()

manual_upsert()
First, even though the questions are related, ask them separately in the future.
There is documentation on UPSERT handling in SQLite that explains how to use it, but it is a bit abstract. You can find examples and discussion here: SQLite - UPSERT *not* INSERT or REPLACE
Use a transaction and the statements will be executed in bulk.
As the presence of this library suggests, to_sql does not create UPSERT commands (only INSERT).
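For reference, a minimal sketch of that UPSERT syntax (it needs SQLite 3.24 or newer under the hood), assuming a hypothetical table my_table(event_time, col1, col2) with event_time as the primary key and a pandas frame new_df holding the new rows; executemany applies it in bulk:
import sqlite3

# Convert the new dataset to a list of plain Python tuples
rows = new_df[["event_time", "col1", "col2"]].to_records(index=False).tolist()

con = sqlite3.connect("my_database.db")
con.executemany(
    """
    INSERT INTO my_table (event_time, col1, col2)
    VALUES (?, ?, ?)
    ON CONFLICT(event_time) DO UPDATE SET
        col1 = excluded.col1,
        col2 = excluded.col2
    """,
    rows,
)
con.commit()
con.close()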

Python BigQuery Error:400 configuration.query.destinationTable cannot be set for scripts

Hello everyone. What I am trying to do is insert data into Table A through a query; once the data is inserted into Table A, delete the newly inserted values from A and write the response/output of that deletion into Table B, which I created.
Here is my python code :
client = bigquery.Client()
#This is Table B
table_id = "ntest.practice.btabletest"
#here is the table I am writing my deleted output to
job_config = bigquery.QueryJobConfig(destination=table_id)
sql2 ="""
INSERT INTO `ntest.practice.atabletest`(%s) VALUES (%s);
DELETE FROM `ntest.practice.atabletest`
WHERE name = 'HEART'
"""%(columns_aaa,valueaaa)
query_job1 = client.query(sql2,job_config=job_config) # Make an API request.
query_job1.result() # Waits for query to finish
print("Query results loaded to the table {}".format(table_id))
Yet, I get an error code saying:
google.api_core.exceptions.BadRequest: 400
configuration.query.destinationTable cannot be set for scripts
Any thoughts on how to fix this error? I don't believe that my query is wrong, nor that my tables or values are incorrect.
Although BigQuery scripting doesn't support a destination table, it doesn't seem that you need one for your specific query.
A DELETE query never writes any data to a destination table. You could work around it by sending the INSERT first and then the DELETE as separate jobs; this way the destination table will "work" (i.e. BigQuery won't complain about it), but you would get an empty table.
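If the actual goal is to land the deleted rows in Table B, a minimal sketch (assuming Table A and Table B share the same schema, and reusing the table names from the question) is to copy the matching rows into B with a plain INSERT ... SELECT and then delete them, as two separate jobs with no destination table configured:
from google.cloud import bigquery

client = bigquery.Client()

# Copy the rows that are about to be deleted from Table A into Table B
client.query("""
    INSERT INTO `ntest.practice.btabletest`
    SELECT * FROM `ntest.practice.atabletest`
    WHERE name = 'HEART'
""").result()

# Then delete those rows from Table A
client.query("""
    DELETE FROM `ntest.practice.atabletest`
    WHERE name = 'HEART'
""").result()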

Python - parsing through a variable for desired data

This should be pretty simple.
I'm writing a program to pull data from a database, store it in a variable, and pass it on to another program. I have it connecting to the DB and running the query to pull the data, which is returned with each column on a new line. I would like to parse through this output and store only the columns I need in separate variables to be imported by another Python program. Please note that the print(outbound) part is just there for testing purposes.
Here's the function:
def pullData():
    cnxn = pyodbc.connect('UID='+dbUser+';PWD='+dbPassword+';DSN='+dbHost)
    cursor = cnxn.cursor()
    # run query to pull the newest sms message
    outbound = cursor.execute("SELECT * FROM WKM_SMS_outbound ORDER BY id DESC")
    table = cursor.fetchone()
    for outbound in table:
        print(outbound)
    # close connection
    cnxn.close()
And here's the sample output from the query that I would like to parse through, as it's currently being stored in the variable outbound. (NOTE: this is not one column, it is one ROW; each new line is a new column in the DB. This is just how it's being returned and formatted when I run the program.)
I think this is the best way you can achieve this:
(Considering that your table variable is returned as a list)
# Let's say that up to here you've done your queries
collection = {}
for index, outbound in enumerate(table):
    key_name = "key{0}".format(index)
    collection[key_name] = outbound
print(collection)
OUTPUT Expected:
{
    "key0" : 6932921,
    "key1" : 303794,
    ...
}
Then, to access it from another Python file, add return collection at the end of your pullData function and import the function.
In the other Python file it is as simple as:
from your_file_name import pullData # considering they are in the same directory
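Putting the pieces together, a rough sketch of the two files (reusing the connection details and table name from the question; dbUser, dbPassword and dbHost are assumed to be defined as in the original code):
# your_file_name.py
import pyodbc

def pullData():
    cnxn = pyodbc.connect('UID='+dbUser+';PWD='+dbPassword+';DSN='+dbHost)
    cursor = cnxn.cursor()
    cursor.execute("SELECT * FROM WKM_SMS_outbound ORDER BY id DESC")
    table = cursor.fetchone()  # newest row
    cnxn.close()
    collection = {}
    for index, outbound in enumerate(table):
        collection["key{0}".format(index)] = outbound
    return collection

# consumer.py
from your_file_name import pullData

collection = pullData()
print(collection.get("key0"))  # first column of the newest row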

GET data using requests then insert into DB2

Currently I am trying to retrieve JSON data from an API and store it in a database. I am able to retrieve the JSON data as a list, and I am able to connect to and query the DB2 database. My issue is that I cannot figure out how to generate an INSERT statement for the data retrieved from the API. The application is only for short-term personal use, so SQL injection attacks are not a concern. So overall I need to generate an SQL INSERT statement from a list. My current code is below, with the API URL and info changed.
import ibm_db
import requests

ibm_db_conn = ibm_db.connect("DATABASE=node1;HOSTNAME=100.100.100.100;PORT=50000;PROTOCOL=TCPIP;UID=username;PWD=password;", "", "")

api_request = requests.get("http://api-url/resource?api_key=123456",
                           auth=('user#api.com', 'password'))
api_code = api_request.status_code
api_data = api_request.json()
print(api_code)
print(api_data)
It depends on the format of the JSON returned and on what your table looks like. My first thought, though, is to use Python's json module:
import json
#...
#...
api_data = json.loads(api_request.text)  # equivalent to api_request.json(), which already returns a parsed object
Now, you have a Python object you can access like normal:
api_data["key"][2]
for instance. You can iterate, slice, or do whatever else to extract the data you want. Say your JSON represented rows to be inserted:
query = "INSERT INTO <table> VALUES\n"
for i, row in enumerate(api_data):
    # join the row's values into a parenthesised, comma-separated value list
    query += "(%s)" % ", ".join(repr(value) for value in row)
    if i < len(api_data) - 1:
        query += ",\n"
Again, this will vary greatly depending on the format of your table and JSON, but that's the general idea I'd start with.
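Even though injection isn't a concern here, a minimal sketch using ibm_db's parameter markers sidesteps the quoting issues of building the string by hand (my_table and its three columns are hypothetical placeholders):
import ibm_db

# assumes ibm_db_conn and api_data (a list of rows) exist as in the question
stmt = ibm_db.prepare(ibm_db_conn, "INSERT INTO my_table (col1, col2, col3) VALUES (?, ?, ?)")
for row in api_data:
    ibm_db.execute(stmt, (row[0], row[1], row[2]))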
