Dynamically using INSERT for cx_Oracle - Python

I've been looking around, so hopefully someone here can assist:
I'm attempting to use cx_Oracle in Python to interface with a database; my task is to insert data from an Excel file into an empty (but existing) table.
The Excel file has almost all of the same column names as the columns in the database table, so I essentially want to check whether the columns share the same name and, if so, insert that column from the Excel file (read into a pandas dataframe) into the table in Oracle.
import pandas as pd
import numpy as np
import cx_Oracle
df = pd.read_excel("employee_info.xlsx")
con = None
try:
    con = cx_Oracle.connect(
        config.username,
        config.password,
        config.dsn,
        encoding=config.encoding)
except cx_Oracle.Error as error:
    print(error)
finally:
    cursor = con.cursor()
    rows = [tuple(x) for x in df.values]
    cursor.executemany(''' INSERT INTO ODS.EMPLOYEES({x} VALUES {rows}) '''
I'm not sure what SQL I should put there, or whether there's a way I can use a for-loop to iterate through the columns, but my main issue is: how can I build this INSERT dynamically, so it keeps working when our dataset grows in columns?
I check the columns that match by using:
sql = "SELECT * FROM ODS.EMPLOYEES"
cursor.execute(sql)
data = cursor.fetchall()
col_names = []
for i in range(0, len(cursor.description)):
    col_names.append(cursor.description[i][0])
a = np.intersect1d(df.columns, col_names)
print("common columns:", a)
That gives me a list of all the common columns, so I'm not sure where to go from there. I've renamed the columns in my Excel file to match the columns in the database table, but my issue is how to match these in a dynamic/automated way, so I can keep adding to my datasets without worrying about changing the code.
Bonus: I'm also using a SQL CASE statement to create a new column that rolls up a few other columns; it would also be helpful to know whether there's a way to add this to the first part of my SQL, or whether it's advisable to do all manipulations before using an INSERT statement.

Look at https://github.com/oracle/python-oracledb/blob/main/samples/load_csv.py
You would replace the CSV reading bit with parsing your data frame. You need to construct a SQL statement similar to the one used in that example:
sql = "insert into LoadCsvTab (id, name) values (:1, :2)"
For each spreadsheet column that you decide matches a table column, add its name to the (id, name) part of the statement and another placeholder to the bind section (:1, :2).
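As a rough sketch of that idea (hypothetical, continuing from the question's own df, col_names, cursor and con, and assuming the matched column names are trusted metadata since identifiers cannot be bound), you could derive both the column list and the bind placeholders from the intersection:

# Columns present both in the dataframe and in the table, in dataframe order.
common_cols = [c for c in df.columns if c in col_names]

# Column names are interpolated (not bound), so they must come from trusted metadata only.
col_list = ", ".join(common_cols)                                                # "EMP_ID, NAME, ..."
placeholders = ", ".join(":{}".format(i + 1) for i in range(len(common_cols)))   # ":1, :2, ..."

sql = "INSERT INTO ODS.EMPLOYEES ({}) VALUES ({})".format(col_list, placeholders)

# One tuple per row, restricted to the matching columns.
rows = list(df[common_cols].itertuples(index=False, name=None))

cursor.executemany(sql, rows)
con.commit()

If a new column later appears in both the spreadsheet and the table, it is picked up automatically; columns present in only one of the two are simply skipped.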

Related

Is There a Way To Select All Columns From The Table And Except One Column

I have been using sqlite3 with Python for creating databases. So far I have been successful, but unfortunately I can't find a way around this: I have a table with 63 columns but I want to select only 62 of them. I know I could write the names of the columns in the SELECT statement, but writing out 62 of them seems like an illogical (for a programmer like me) idea. I am using Python's sqlite3 databases. Is there a way around this?
Apologies for any grammatical mistakes.
Thanks in advance
With Sqlite, you can:
do a PRAGMA table_info(tablename); query to get a result set that describes that table's columns
pluck the column names out of that result set and remove the one you don't want
compose a column list for the select statement using e.g. ', '.join(column_names) (though you might want to consider a higher-level SQL statement builder instead of playing with strings).
Example
A simple example using a simple table and an in-memory SQLite database:
import sqlite3
con = sqlite3.connect(":memory:")
con.executescript("CREATE TABLE kittens (id INTEGER, name TEXT, color TEXT, furriness INTEGER, age INTEGER)")
columns = [row[1] for row in con.execute("PRAGMA table_info(kittens)")]
print(columns)
selected_columns = [column for column in columns if column != 'age']
print(selected_columns)
query = f"SELECT {', '.join(selected_columns)} FROM kittens"
print(query)
This prints out
['id', 'name', 'color', 'furriness', 'age']
['id', 'name', 'color', 'furriness']
SELECT id, name, color, furriness FROM kittens

Update SQL database with dataframe content

I have a pandas dataframe containing two columns: ID and MY_DATA. I have an SQL database that contains a column named ID and some other data. I want to match the ID of the SQL database column to the rows of the dataframe ID column and update it with a new column MY_DATA.
So far I used the following:
import sqlite3
import pandas as pd

df = pd.read_csv('my_filename.csv')
con = sqlite3.connect('my_database.sqlite')
cur = con.cursor()

for row in cur.execute('SELECT ID FROM main;'):
    for i in range(len(df)):
        if row[0] == df.ID.iloc[i]:
            update_sqldb(df, i)
However, I think this way of having two nested for-loops is probably ugly and not very pythonic. I thought that maybe I should use the map() function, but is this the right direction to go?
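One more direct alternative (just a sketch, assuming the SQLite table is called main and has columns ID and MY_DATA as described above) is to skip the nested loops and let executemany do the per-row matching:

import sqlite3
import pandas as pd

df = pd.read_csv('my_filename.csv')
con = sqlite3.connect('my_database.sqlite')

# Build (MY_DATA, ID) parameter pairs straight from the dataframe;
# .values.tolist() yields plain Python values that sqlite3 can bind.
params = df[['MY_DATA', 'ID']].values.tolist()

# Run one UPDATE per pair, setting MY_DATA on every row whose ID matches.
con.executemany('UPDATE main SET MY_DATA = ? WHERE ID = ?;', params)
con.commit()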

Splitting one large comma separated row into many rows after number of values

I'm rather new to MySQL, so apologies if this is an intuitive problem; I couldn't find anything too helpful on Stack Overflow. I currently have a rather large amount of financial data in one row, with each value separated by a comma. 12 values equal one set of data, so I want to create a new row after every 12 values.
In other words, the data I have looks like this:
(open_time,open,high,low,close,volume,close_time,quotevol,trades,ignore1,ignore2,ignore3, ...repeat...)
And I'd like for it to look like:
Row1:(open_time,open,high,low,close,volume,close_time,quotevol,trades,ignore1,ignore2,ignore3)
Row2:(open_time2,open2,high2,low2,close2,volume2,close_time2,quotevol2,trades2,ignore4,ignore5,ignore6)
Row3:
...
The data is already a .sql file and I have it in a table too if that makes a difference.
To clarify, the table it is in has only one row and one column.
I don't doubt there is a way to do it in MySQL, but I would approach it by exporting the record as a .CSV:
Export to CSV.
Write a simple Python script using the csv module that shifts every x number of fields to a new row, using the comma as a delimiter. Afterwards, you can reimport it back into MySQL.
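A minimal sketch of such a script (hypothetical file names; it assumes the export produced a single comma-separated row in exported_row.csv and that each record is 12 fields wide):

import csv

FIELDS_PER_ROW = 12  # one complete record per output row

with open("exported_row.csv", newline="") as src:
    # The export is one long comma-separated row; read it as a flat list of values.
    values = next(csv.reader(src))

with open("reshaped.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    # Emit a new row after every 12 values.
    for i in range(0, len(values), FIELDS_PER_ROW):
        writer.writerow(values[i:i + FIELDS_PER_ROW])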
If I understand correctly, you want to do the following:
Get the string from the database, which is located in the first row of the first column in the query results
Break the string into "rows" with 12 values long
Be able to use this data
The way I would go about this in Python is to:
Create a mysql connection and cursor
Execute the query to pull the data from the database
Put the data from the single cell into a string
Split the string at each comma and add those values to a list
Break that list into chunks of 12 elements each
Put this data into a tabular form for easy consumption
Code:
import mysql.connector
import pandas as pd

query = '''this is your sql statement that returns everything into the first row of the first column in your query results'''

cnx = mysql.connector.connect('''enter relevant connection information here: user, password, host, and database''')
mycursor = cnx.cursor()
mycursor.execute(query)

tup = tuple(mycursor.fetchall()[0])
text = str(tup[0])

ls = text.split(',')  # converts the text into a list of values
n = 12
rows = [ls[i:i + n] for i in range(0, len(ls), n)]  # chunks of 12 values each

data = []
for row in rows:
    data.append(tuple(row))

labels = ['open_time', 'open', 'high', 'low', 'close', 'volume', 'close_time',
          'quotevol', 'trades', 'ignore1', 'ignore2', 'ignore3']
df = pd.DataFrame.from_records(data, columns=labels)
print(df)
The list comprehension code was taken from this. You did not specify exactly how you wanted your resultant dataset, but the pandas data frame should have each of your rows.
Without an actual string or dataset, I can't confirm that this works entirely. Would you be able to give us a Minimal, Complete, and Verifiable example?

Importing SQL query into Pandas results in only 1 column

I'm trying to import the results of a complex SQL query into a pandas dataframe. My query requires me to create several temporary tables since the final result table I want includes some aggregates.
My code looks like this:
cnxn = pyodbc.connect(r'DRIVER=foo;SERVER=bar;etc')
cursor = cnxn.cursor()
cursor.execute('SQL QUERY HERE')
cursor.execute('SECONDARY SQL QUERY HERE')
...
df = pd.DataFrame(cursor.fetchall(),columns = [desc[0] for desc in cursor.description])
I get an error that tells me shapes aren't matching:
ValueError: Shape of passed values is (1,900000),indices imply (5,900000)
And indeed, the result of all the SQL queries should be a table with 5 columns rather than 1. I've run the SQL query using Microsoft SQL Server Management Studio, and it works and returns the 5-column table that I want. I've also tried not passing any column names into the dataframe; printing the head of the dataframe shows that pandas has put the information from all 5 columns into 1. The value in each row is a list of 5 values separated by commas, but pandas treats the entire list as 1 column. Why is pandas doing this? I've also tried going the pd.read_sql route, but I still get the same error.
EDIT:
I have done some more debugging, taking the comments into account. The issue doesn't appear to stem from the fact that my query is nested. I tried a simple (one line) query to return a 3 column table and I still got the same error. Printing out fetchall() looks like this:
[(str1,str2,str3,datetime.date(stuff),datetime.date(stuff)),
(str1,str2,str3,datetime.date(stuff),datetime.date(stuff)),...]
Use pd.DataFrame.from_records instead:
df = pd.DataFrame.from_records(cursor.fetchall(),
                               columns=[desc[0] for desc in cursor.description])
Simply adjust the pd.DataFrame() call: right now each row returned by cursor.fetchall() is being treated as a single column, so use tuple() or list() to map each row's child elements into their own columns:
df = pd.DataFrame([tuple(row) for row in cursor.fetchall()],
                  columns=[desc[0] for desc in cursor.description])

Pymssql insert multiple from list of dictionaries with dynamic column names

I am using python 2.7 to perform CRUD operations on a MS SQL 2012 DB.
I have data stored in a List of Dictionaries "NewComputers" (each dictionary is a row in the database).
This is working properly. However, the source data column names and the destination column names are both hard-coded (note that the column names are different).
Questions: Instead of hard-coding the data source column names, how can I loop over the dictionary to determine the column names dynamically? Also, instead of hard-coding the destination (database table) column names, how can I loop over a list of column names dynamically?
I would like to make this function re-usable for different data source columns and destination columns.
In other words:
"INSERT INTO Computer (PARAMETERIZED LIST OF COLUMN NAMES) VALUES (PARAMETERIZED LIST OF VALUES)"
Here is the function:
def insertSR(NewComputers):
    conn = pymssql.connect(mssql_server, mssql_user, mssql_pwd, "Computers")
    cursor = conn.cursor(as_dict=True)
    try:
        cursor.executemany("INSERT INTO Computer (ComputerID, HostName, Type) VALUES (%(computer_id)s, %(host_name)s, %(type)s)", NewComputers)  # How to make the column names dynamic?
    except:
        conn.rollback()
        print("ERROR: Database Insert failed.")
    conn.commit()
    print("Inserted {} rows successfully".format(cursor.rowcount))
    conn.close()
You can't do what you'd like to.
Basically, your multiple-insert SQL query will get translated to:
insert into table (column1, column2, column3) values (a1,a2,a3), (b1,b2,b3)
So, as you can see, you'll have to make at least one separate query per group of destination columns.
On the data source side ((a1,a2,a3),(b1,b2,b3) in my example), you don't have to specify the column names, so you can have different data sources for a given destination.
For that part, I'd do something like the following.
First build a correspondence dict whose keys are the destination field names and whose values are the other names used for that field in the data source tables:
source_correspondence = {
    'ComputerID': ['id_computer', 'computer_Id'],
    'HostName': ['host', 'ip', 'host_name'],
    'Type': ['type'],
}
Then iterate over your data source and replace each column name with the matching key of your correspondence dict.
Finally, you can build your queries (one executemany per destination 'type').
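Putting those pieces together, here is a rough sketch (a hypothetical helper, assuming each data source row is a dict and that the table and column names come from your own trusted mapping, never from user input):

def insert_dynamic(conn, table, rows, source_correspondence):
    """Insert a list of dicts, remapping source keys to destination column names."""
    # Invert the mapping: any known source name -> destination column name.
    to_dest = {src: dest
               for dest, sources in source_correspondence.items()
               for src in sources}

    # Rename each row's keys to the destination column names.
    remapped = [{to_dest.get(key, key): value for key, value in row.items()}
                for row in rows]

    # Build the column list and the %(name)s placeholders from the first row.
    dest_columns = list(remapped[0].keys())
    column_list = ", ".join(dest_columns)
    placeholders = ", ".join("%({})s".format(col) for col in dest_columns)

    sql = "INSERT INTO {} ({}) VALUES ({})".format(table, column_list, placeholders)

    cursor = conn.cursor()
    cursor.executemany(sql, remapped)
    conn.commit()

Called as insert_dynamic(conn, "Computer", NewComputers, source_correspondence), it rebuilds the column list and placeholders for whatever destination columns the mapping produces; as noted above, you still need one executemany per group of destination columns.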
