I have a pandas dataframe containing two columns: ID and MY_DATA. I have an SQL database with a table that contains a column named ID and some other data. I want to match the IDs in the database against the ID column of the dataframe and, for each match, update the database row with the value from MY_DATA as a new column.
So far I used the following:
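import sqlite3
import pandas as pd

df = pd.read_csv('my_filename.csv')
con = sqlite3.connect('my_database.sqlite')
cur = con.cursor()
# fetch the IDs up front so the cursor is free for later statements
for row in cur.execute('SELECT ID FROM main;').fetchall():
    for i in range(len(df)):
        # each row is a 1-tuple, so the ID is row[0]
        if row[0] == df.ID.iloc[i]:
            update_sqldb(df, i)  # my own helper, not shown here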
However, I think this way of having two nested for-loops is probably ugly and not very pythonic. I thought that maybe I should use the map() function, but is this the right direction to go?
I've been looking around so hopefully someone here can assist:
I'm attempting to use cx_Oracle in Python to interface with a database; my task is to insert data from an Excel file into an empty (but existing) table.
The Excel file has almost all of the same column names as the database table, so I essentially want to check which columns share the same name and, for those that do, insert that column from the Excel file (loaded as a pandas dataframe) into the table in Oracle.
import pandas as pd
import numpy as np
import cx_Oracle
import config  # assumed: a local module holding username, password, dsn, encoding

df = pd.read_excel("employee_info.xlsx")
con = None
try:
    con = cx_Oracle.connect(
        config.username,
        config.password,
        config.dsn,
        encoding=config.encoding)
except cx_Oracle.Error as error:
    print(error)
else:
    # only reached when the connection succeeded
    cursor = con.cursor()
    rows = [tuple(x) for x in df.values]
    # incomplete: this is the statement I don't know how to write
    cursor.executemany(''' INSERT INTO ODS.EMPLOYEES({x} VALUES {rows}) ''')
I'm not sure what SQL I should write, or whether I can use a for-loop to iterate through the columns. My main issue is how to add these columns dynamically as our dataset grows in columns.
I check the columns that match by using:
sql = "SELECT * FROM ODS.EMPLOYEES"
cursor.execute(sql)
data = cursor.fetchall()
col_names = []
for i in range(0, len(cursor.description)):
    col_names.append(cursor.description[i][0])
a = np.intersect1d(df.columns, col_names)
print("common columns:", a)
That gives me a list of all the common columns, but I'm not sure where to go from there. I've renamed the columns in my Excel file to match the columns in the database table, but how can I match them in a dynamic, automated way so that I can keep adding to my datasets without changing the code?
Bonus: I'm also using a CASE statement in SQL to create a new column that rolls up a few other columns. It would be helpful to know whether this can be added to the first part of my SQL, or whether it's advisable to do all manipulations before the insert statement.
Look at https://github.com/oracle/python-oracledb/blob/main/samples/load_csv.py
You would replace the CSV reading bit with parsing your data frame. You need to construct a SQL statement similar to the one used in that example:
sql = "insert into LoadCsvTab (id, name) values (:1, :2)"
For each spreadsheet column that you decide matches a table column, add its name to the (id, name) part of the statement and add another bind placeholder to the values part (:1, :2).
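A minimal sketch of how that statement could be assembled once the matching column names are known (for example from np.intersect1d, as in the question). The helper name build_insert_sql and the column names below are hypothetical, chosen only for illustration.

```python
def build_insert_sql(table, columns):
    """Build 'insert into <table> (c1, c2) values (:1, :2)' for the given columns."""
    if not columns:
        raise ValueError("need at least one column")
    column_part = ", ".join(columns)
    # one positional bind placeholder (:1, :2, ...) per column
    bind_part = ", ".join(f":{i}" for i in range(1, len(columns) + 1))
    return f"insert into {table} ({column_part}) values ({bind_part})"


# Hypothetical names standing in for the columns shared by the
# spreadsheet and the table.
common = ["EMPLOYEE_ID", "FIRST_NAME", "LAST_NAME"]
sql = build_insert_sql("ODS.EMPLOYEES", common)
print(sql)
```

With a dataframe, the rows could then be produced with list(df[common].itertuples(index=False, name=None)) and passed to cursor.executemany(sql, rows). Column names are interpolated into the SQL text here, so they should come from the table's own metadata rather than from untrusted input.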
I'm currently looping through the dataframe updating the SQL table for each primary key row, but this is taking a very long time.
Is there a quicker way to implement the following logic:
from sqlalchemy import text

# engine is a SQLAlchemy engine created elsewhere
with engine.begin() as conn:
    for i in range(len(df)):
        conn.execute(
            text('UPDATE SQL_TABLE SET Column1 = :val WHERE primary_key = :pk'),
            {'val': df['Column1'].iloc[i], 'pk': df['primary_key'].iloc[i]},
        )
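One commonly used alternative to a per-row loop is to load the new values into a staging table and apply them with a single UPDATE. The sketch below uses the standard-library sqlite3 module as a stand-in, since the question does not say which database sits behind the engine; the staging table name and the sample data are assumptions, and whether this is actually faster depends on the database and driver.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SQL_TABLE (primary_key INTEGER PRIMARY KEY, Column1 TEXT)")
con.executemany("INSERT INTO SQL_TABLE VALUES (?, ?)",
                [(1, "old"), (2, "old"), (3, "old")])

# Stand-in for the dataframe: (primary_key, Column1) pairs. With pandas:
# list(df[['primary_key', 'Column1']].itertuples(index=False, name=None))
new_values = [(1, "a"), (3, "c")]

with con:  # commits once at the end of the block
    con.execute("CREATE TEMP TABLE staging (primary_key INTEGER PRIMARY KEY, Column1 TEXT)")
    con.executemany("INSERT INTO staging VALUES (?, ?)", new_values)
    # a single set-based UPDATE instead of one statement per row
    con.execute(
        """
        UPDATE SQL_TABLE
        SET Column1 = (SELECT s.Column1 FROM staging AS s
                       WHERE s.primary_key = SQL_TABLE.primary_key)
        WHERE primary_key IN (SELECT primary_key FROM staging)
        """
    )
    con.execute("DROP TABLE staging")

result = con.execute(
    "SELECT primary_key, Column1 FROM SQL_TABLE ORDER BY primary_key"
).fetchall()
print(result)
```

Rows whose key is absent from the staging table are left untouched because of the WHERE ... IN filter.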
I'd like to append a column to an existing table in a sqlite3 database, using values stored in a pandas Series.
My original DataFrame df looks like:
   a  b
0  1  2
1  3  4
And this is stored as a table in sqlite3 also.
If I add a column to df as:
df['c'] = df.a + df.b
then df will be:
   a  b  c
0  1  2  3
1  3  4  7
whereas the table in the sqlite3 db is not changed yet.
What I want to do is to append a column ('c') into the table in sqlite3 and fill its values with df['c'].
What I tried is:
con = sqlite3.connect('data/a.db')
cur = con.cursor()
cur.execute('alter table temptable add column c integer')
con.commit()
cur.execute('update temptable set c=?', df.c)
con.commit()
con.close()
However, it is not working. Is there a way to perform a bulk update for the new column 'c' in sqlite3? The number of rows is usually around 100,000,000.
Assuming ALTER TABLE worked and you were able to add the new column, try using sqlite3's executemany method to fill in the values of the new column. The accepted answer to this SO question shows you how to do it. (Note that you'll need a primary key on your table.)
As an alternative, you can use DataFrame.to_sql to rewrite the entire table from the dataframe without writing the SQL query yourself.
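A minimal sketch of the executemany route, assuming the table has an integer primary key (called id here, which is not in the question) whose values line up with the dataframe index. The table is recreated in memory so that the example is self-contained.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Recreate the question's table; 'id' is an assumed primary key.
cur.execute("CREATE TABLE temptable (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
cur.executemany("INSERT INTO temptable VALUES (?, ?, ?)", [(0, 1, 2), (1, 3, 4)])

cur.execute("ALTER TABLE temptable ADD COLUMN c INTEGER")

# (value, key) pairs; with pandas this could be
# zip(df['c'].tolist(), df.index.tolist())
pairs = [(3, 0), (7, 1)]
cur.executemany("UPDATE temptable SET c = ? WHERE id = ?", pairs)
con.commit()

rows = cur.execute("SELECT id, a, b, c FROM temptable ORDER BY id").fetchall()
print(rows)
con.close()
```

executemany accepts any iterable, so for a very large table the pairs could be fed from a generator and committed in chunks rather than built as one list. Calling tolist() yields plain Python ints, which sqlite3 binds directly.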
I'm trying to import the results of a complex SQL query into a pandas dataframe. My query requires me to create several temporary tables since the final result table I want includes some aggregates.
My code looks like this:
cnxn = pyodbc.connect(r'DRIVER=foo;SERVER=bar;etc')
cursor = cnxn.cursor()
cursor.execute('SQL QUERY HERE')
cursor.execute('SECONDARY SQL QUERY HERE')
...
df = pd.DataFrame(cursor.fetchall(),columns = [desc[0] for desc in cursor.description])
I get an error that tells me shapes aren't matching:
ValueError: Shape of passed values is (1,900000),indices imply (5,900000)
And indeed, the result of all the SQL queries should be a table with 5 columns rather than 1. I've run the SQL query using Microsoft SQL Server Management Studio and it works and returns the 5-column table that I want. I've tried not passing any column names into the dataframe, printed out the head of the dataframe, and found that pandas has put the information from all 5 columns into 1. The value in each row is a list of 5 values separated by commas, but pandas treats the entire list as 1 column. Why is pandas doing this? I've also tried going the pd.read_sql route but I still get the same error.
EDIT:
I have done some more debugging, taking the comments into account. The issue doesn't appear to stem from the fact that my query is nested. I tried a simple (one line) query to return a 3 column table and I still got the same error. Printing out fetchall() looks like this:
[(str1,str2,str3,datetime.date(stuff),datetime.date(stuff)),
(str1,str2,str3,datetime.date(stuff),datetime.date(stuff)),...]
Use pd.DataFrame.from_records instead:
df = pd.DataFrame.from_records(cursor.fetchall(),
                               columns=[desc[0] for desc in cursor.description])
Simply adjust the pd.DataFrame() call, as right now cursor.fetchall() returns a one-length list of tuples. Use tuple() or list() to map the child elements into their own columns:
df = pd.DataFrame([tuple(row) for row in cursor.fetchall()],
                  columns=[desc[0] for desc in cursor.description])
I am new to Python and SQLite, and I have a problem with a CREATE TABLE query.
I need to create a table, but I have the column names of the table as a list.
columnslist = ["column1", "column2", "column3"]
Now I have to create a table MyTable with the above columns. The problem is that I won't know beforehand how many columns there are in columnslist.
Is it possible to create a table with the number and names of columns given in columnslist, and if so, what is the syntax?
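You can first convert your list to a tuple and use str.format:
import sqlite3
columnslist = ["column1", "column2", "column3"]
conn = sqlite3.connect('example.db')
c = conn.cursor()
# tuple(columnslist) renders as ('column1', 'column2', 'column3')
c.execute('''CREATE TABLE MyTable {}'''.format(tuple(columnslist)))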
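One caveat with formatting a tuple: a one-element list renders as ('column1',), and the trailing comma is a syntax error in SQL. A sketch that joins the names explicitly avoids this; the helper name create_table_sql is hypothetical, and an in-memory database is used so the example can run on its own.

```python
import sqlite3


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement with one untyped column per name."""
    if not columns:
        raise ValueError("need at least one column")

    def quote(name):
        # wrap in double quotes, doubling any embedded double quote
        return '"' + name.replace('"', '""') + '"'

    column_part = ", ".join(quote(c) for c in columns)
    return f"CREATE TABLE {quote(table)} ({column_part})"


columnslist = ["column1", "column2", "column3"]
conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql("MyTable", columnslist))

# read the column names back to confirm what was created
created = [row[1] for row in conn.execute("PRAGMA table_info(MyTable)")]
print(created)
```

Table and column names cannot be passed as ? parameters, which is why they are quoted and interpolated into the statement text here.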