I have a database connection and I insert data into a table using to_sql.
xls.to_sql(table, con=engine, if_exists='append', index=False, chunksize=10000)
I've been trying to obtain the number of rows inserted, or the number of rows in the table (considering I truncate the table before inserting new data). I've been unsuccessful though.
Can you help?
I've tried:
countRow=engine.execute("select count(*) from "+table);
print(countRow)
I find it odd that this doesn't work because I use the same thing to truncate the table. Am I missing something or doing something wrong here?
As @Deepak Tripathi suggested, I used this and it worked:
engine.execute("select count(*) from "+table+ ";").fetchall()
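The first attempt printed an object rather than a number because execute returns a result object, and the count still has to be fetched from it. Here is a minimal sketch of that pattern. It uses the standard-library sqlite3 module and a made-up table called demo so that it runs on its own. On an SQLAlchemy result, .scalar() should return the single value in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (x INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO demo VALUES (?)", [(1,), (2,), (3,)])

result = conn.execute("SELECT COUNT(*) FROM demo")  # a cursor, not a number
row_count = result.fetchone()[0]  # first column of the single result row
print(row_count)
```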
I have created a database using sqlite3 in python that has thousands of tables. Each of these tables contains thousands of rows and ten columns. One of the columns is the date and time of an event: it is a string that is formatted as YYYY-mm-dd HH:MM:SS, which I have defined to be the primary key for each table. Every so often, I collect some new data (hundreds of rows) for each of these tables. Each new dataset is pulled from a server and loaded in directly as a pandas data frame or is stored as a CSV file. The new data contains the same ten columns as my original data. I need to update the tables in my database using this new data in the following way:
Given a table in my database, for each row in the new dataset, if the date and time of the row matches the date and time of an existing row in my database, update the remaining columns of that row using the values in the new dataset.
If the date and time does not yet exist, create a new row and insert it to my database.
Below are my questions:
I've done some searching on Google and it looks like I should be using the UPSERT (merge) functionality of sqlite but I can't seem to find any examples showing how to use it. Is there an actual UPSERT command, and if so, could someone please provide an example (preferably with sqlite3 in Python) or point me to a helpful resource?
Also, is there a way to do this in bulk so that I can UPSERT each new dataset into my database without having to go row by row? (I found this link, which suggests that it is possible, but I'm new to using databases and am not sure how to actually run the UPSERT command.)
Can UPSERT also be performed directly using pandas.DataFrame.to_sql?
My backup solution is loading in the table to be UPSERTed using pd.read_sql_query("SELECT * from table", con), performing pandas.DataFrame.merge, deleting the said table from the database, and then adding in the updated table to the database using pd.DataFrame.to_sql (but this would be inefficient).
Instead of using the UPSERT command, why not write your own routine that looks up each date and time, updates the row if it is found, and inserts a new row otherwise? Check out the code I wrote for you, and let me know if you are still confused. You can do the same for hundreds of tables by replacing the table name in the routine with a variable and looping over your list of table names.
import sqlite3
import pandas as pd

csv_data = pd.read_csv("my_CSV_file.csv")  # path to your CSV data
connection_str = "my_database.db"  # placeholder: path to your SQLite database file

def manual_upsert():
    con = sqlite3.connect(connection_str)
    cur = con.cursor()
    # Collect the dates already in the table (I assume the date column is at index 0).
    cur.execute("SELECT * FROM my_CSV_data")
    old_dates = {line[0] for line in cur.fetchall()}
    # Iterate over rows; looping over the DataFrame itself would only yield column names.
    for new_data in csv_data.itertuples(index=False, name=None):
        if new_data[0] in old_dates:
            # The date already exists: update the other columns of that row.
            cur.execute("UPDATE my_CSV_data SET column1=?, column2=?, column3=? WHERE my_date_column=?",
                        (new_data[1], new_data[2], new_data[3], new_data[0]))
        else:
            # The date was not found: insert a new row.
            cur.execute("INSERT INTO my_CSV_data VALUES(?,?,?,?)",
                        (new_data[0], new_data[1], new_data[2], new_data[3]))
    con.commit()
    con.close()

manual_upsert()
First, even though the questions are related, ask them separately in the future.
SQLite's documentation covers UPSERT handling, but it is a bit abstract. You can find examples and discussion here: SQLite - UPSERT *not* INSERT or REPLACE
Use a transaction, and the statements will be executed in bulk.
As the presence of this library suggests, to_sql does not generate UPSERT commands (only INSERT).
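To make the first two points concrete, here is a minimal sketch of a bulk UPSERT with sqlite3. It assumes SQLite 3.24.0 or newer, which is where the ON CONFLICT ... DO UPDATE syntax was added. The table events, its columns a and b, and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the date/time string is the primary key, as in the question.
conn.execute("CREATE TABLE events (event_time TEXT PRIMARY KEY, a REAL, b REAL)")
conn.execute("INSERT INTO events VALUES ('2020-01-01 00:00:00', 1.0, 2.0)")

new_rows = [
    ("2020-01-01 00:00:00", 10.0, 20.0),  # key already exists: row is updated
    ("2020-01-02 00:00:00", 3.0, 4.0),    # new key: row is inserted
]

upsert_sql = """
    INSERT INTO events (event_time, a, b) VALUES (?, ?, ?)
    ON CONFLICT(event_time) DO UPDATE SET a = excluded.a, b = excluded.b
"""
with conn:  # one transaction: commits on success, rolls back on error
    conn.executemany(upsert_sql, new_rows)

rows = conn.execute("SELECT * FROM events ORDER BY event_time").fetchall()
```

If the new data arrives as a pandas DataFrame, df.itertuples(index=False, name=None) should produce row tuples in the shape that executemany expects.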
I currently have a Python dataframe that is 23 columns and 20,000 rows.
Using Python code, I want to write my data frame into a MSSQL server that I have the credentials for.
As a test I am able to successfully write some values into the table using the code below:
connection = pypyodbc.connect('Driver={SQL Server};'
'Server=XXX;'
'Database=XXX;'
'uid=XXX;'
'pwd=XXX')
cursor = connection.cursor()
for index, row in df_EVENT5_15.iterrows():
cursor.execute("INSERT INTO MODREPORT(rowid, OPCODE, LOCATION, TRACKNAME)
cursor.execute("INSERT INTO MODREPORT(rowid, location) VALUES (?,?)", (5, 'test'))
connection.commit()
But how do I write all the rows in my data frame table to the MSSQL server? In order to do so, I need to code up the following steps in my Python environment:
Delete all the rows in the MSSQL server table
Write my dataframe to the server
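When you say Python data frame, I'm assuming you mean a pandas DataFrame. If that's the case, you can use the to_sql method. Note that to_sql expects an SQLAlchemy engine or connection (or a sqlite3 connection), so the connection would have to be created through SQLAlchemy rather than passed in as a raw pypyodbc connection.
df.to_sql("MODREPORT", connection, if_exists="replace")
Setting if_exists to "replace" drops the existing table and recreates it before writing the records, so all existing rows are removed.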
I realise it's been a while since you asked but the easiest way to delete ALL the rows in the SQL server table (point 1 of the question) would be to send the command
TRUNCATE TABLE Tablename
This removes all the data but keeps the table definition and its indexes in place (now empty), so neither you nor the DBA needs to recreate them. It also writes less to the transaction log than a row-by-row DELETE.
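The two steps from the question (empty the table, then write every row) can be sketched with plain DB-API calls. This is only an illustration under a few assumptions. sqlite3 stands in for the SQL Server connection so the example runs on its own. SQLite has no TRUNCATE TABLE, so DELETE FROM is used instead; on SQL Server the TRUNCATE statement above would take its place. The id and location columns and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the pypyodbc connection
cursor = conn.cursor()
cursor.execute("CREATE TABLE MODREPORT (id INTEGER, location TEXT)")  # hypothetical layout
cursor.execute("INSERT INTO MODREPORT VALUES (5, 'test')")  # stale row from an earlier test

# Hypothetical rows; with pandas, df.itertuples(index=False, name=None) would yield them.
rows = [(1, "north"), (2, "south"), (3, "east")]

cursor.execute("DELETE FROM MODREPORT")  # step 1: empty the table
# Step 2: one executemany call instead of one execute per row.
cursor.executemany("INSERT INTO MODREPORT (id, location) VALUES (?, ?)", rows)
conn.commit()

stored = cursor.execute("SELECT id, location FROM MODREPORT ORDER BY id").fetchall()
```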
I am trying to upload data from a CSV file (it's on my local desktop) to my remote SQL database. This is my code:
dsn = "dsnname"; pwd = "password"
import pyodbc
csv_data = open(r'C:\Users\folder\Desktop\filename.csv')
def func(dsn):
    cnnctn = pyodbc.connect(dsn)
    cnnctn.autocommit = True
    cur = cnnctn.cursor()
    for rows in csv_data:
        cur.execute("insert into database.tablename (colname) value(?)", rows)
    cur.commit()
    cnnctn.commit()
    cur.close()
    cnnctn.close()
    return()
c = func(dsn)
The problem is that all of my data gets uploaded into the one column that I specified. If I don't specify a column name, it won't run. I have 9 columns in my database table and I want to upload the data into separate columns.
When you insert with SQL, you need to say which columns you are inserting into. For example, when you execute:
INSERT INTO table (column_name) VALUES (val);
You are letting SQL know that you want to map column_name to val for that specific row. So, you need to make sure that the number of columns in the first parentheses matches the number of values in the second set of parentheses.
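Applied to a CSV file, that means splitting each line into fields and supplying one column name and one placeholder per field. Looping over an open file, as in the question, yields each whole line as a single string, which is why everything ended up in one column. A rough sketch follows. It uses sqlite3 as a stand-in for the pyodbc connection so it runs on its own, and the table people, its three columns, and the sample data are all invented.

```python
import csv
import io
import sqlite3

# Hypothetical three-column CSV; a real file would be opened with open(path, newline="").
csv_file = io.StringIO("1,alice,10.5\n2,bob,7.25\n")

conn = sqlite3.connect(":memory:")  # stand-in for the pyodbc connection
conn.execute("CREATE TABLE people (id TEXT, name TEXT, score TEXT)")

# csv.reader splits each line into a list of fields, one per column.
rows = list(csv.reader(csv_file))
# One column name and one placeholder per field.
conn.executemany("INSERT INTO people (id, name, score) VALUES (?, ?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT id, name, score FROM people ORDER BY id").fetchall()
```

For the 9-column table in the question, the same statement would list nine column names and nine placeholders.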
I've run into a weird problem and I'm stuck. I'm rewriting a Python script that generates some CSV files, and I need to write the same data to a MySQL server.
I've managed to get it working... somehow.
Here is the part that creates CSV:
final_table.get_tunid_town_pivot().to_csv('result_pivot_tunid_town_' + ConsoleLog.get_curr_date_underline() + '.csv', sep=';')
And here is the part that loads data into MySQL table:
conn = pymysql.connect(host='localhost', port=3306, user='test', passwd='test', db='test')
final_table.get_tunid_town_pivot().to_sql(con=conn, name='TunID', if_exists='replace', flavor='mysql', index=False, chunksize=10000)
conn.close()
The problem is that there are 4 columns in the DataFrame, but in MySQL I get only the last column. I have no idea why that is happening, and I found no similar problems. Any help, please?
Your DataFrame has (probably due to the pivoting) a MultiIndex of 3 levels and only 1 column. By default, to_sql will also write the index to the SQL table, but you did specify index=False, so only the one column will be written to SQL.
So either leave the index in (use index=True, which is the default), or reset the index first and then write the frame (df.reset_index().to_sql(..., index=False)).
Also note that using a pymysql connection in to_sql is deprecated (it should give you a warning); you have to go through an SQLAlchemy engine instead.
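The effect of index=False on a MultiIndex frame can be reproduced in a few lines. This is a sketch with made-up data and column names. A groupby stands in for the asker's pivot, and a sqlite3 connection (which to_sql accepts directly) stands in for MySQL.

```python
import sqlite3
import pandas as pd

# Hypothetical data standing in for the asker's frame.
raw = pd.DataFrame({
    "tun_id": [1, 1, 2],
    "town": ["A", "B", "A"],
    "day": ["mon", "mon", "tue"],
    "hits": [10, 20, 30],
})
# Grouping by three keys leaves a 3-level MultiIndex and a single data column.
pivot = raw.groupby(["tun_id", "town", "day"])[["hits"]].sum()

conn = sqlite3.connect(":memory:")
# index=False drops the index levels, so only "hits" reaches the table.
pivot.to_sql("only_values", con=conn, index=False)
# reset_index() turns the index levels into ordinary columns first.
pivot.reset_index().to_sql("all_columns", con=conn, index=False)

def column_names(table):
    """Return the column names SQLite reports for the given table."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

narrow = column_names("only_values")
wide = column_names("all_columns")
```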
I have a table in SQLite that I use with Python. The table always has 3 columns and can have many rows. Each cell is a string. Here is an example table:
serial_num date_measured status
1234A 1-1-2015 passed
4321B 6-21-2015 failed
1423C 12-25-2015 passed
......
My program prompts me for a serial number. This is saved as a variable called serialNum. How can I delete (or overwrite) an entire row if serialNum equals any of the strings in the serial_num column in my table?
I've seen many examples of how to delete (or overwrite) a row in a table if I know all the values in each cell of that row, but my trouble is that the only cell that could ever be the same in each row is the serial number. I need to do a search through the serial_num column and, if any string in that column equals the current value of my serialNum variable, delete (or overwrite) that row.
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('''CREATE TABLE test (serial_num text, date_measured text, status text)''')
c.execute("INSERT INTO test VALUES ('1234A', '1-1-2015', 'passed')")
c.execute("INSERT INTO test VALUES ('4321B', '6-21-2015', 'failed')")
c.execute("INSERT INTO test VALUES ('1423C', '12-25-2015', 'passed')")
conn.commit()
Does anyone know a simple way to do this? I've seen others say that an ID must be used or a temporary table, but I would hope there might be an easier way to accomplish my task. Any advice would be great.
SQL supports this: simply use DELETE
"delete from test where serial_num=<some input>;"
or in this case
c.execute("delete from test where serial_num=?", (serialNum,))
There's no need to search through the list when using SQL. SQL is declarative: you tell it what to do using your query, not how to do it. Don't loop through all your rows to check which to delete: tell it what to delete and the database engine will find the best/fastest way to satisfy that goal.
I hope I interpreted your question correctly.
for row in c.execute('SELECT * FROM test WHERE serial_num = ?', (serialNum,)):
    # do whatever you want on row
    print(row)
I was able to figure out a working solution:
sql = "DELETE FROM test WHERE serial_num = ?"
c.execute(sql, (serialNum,))
The comma after serialNum has to be there: it makes (serialNum,) a one-element tuple, and execute expects a sequence of parameters. Thank you @Michiel Arien for the head start.
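For anyone who wants to try it, here is a self-contained sketch of that delete. It runs against an in-memory database holding two of the rows from the question, so nothing is written to example.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE test (serial_num text, date_measured text, status text)")
c.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    ("1234A", "1-1-2015", "passed"),
    ("4321B", "6-21-2015", "failed"),
])

serialNum = "1234A"
# (serialNum,) is a one-element tuple. Without the comma, (serialNum) is just the
# string itself, which sqlite3 would read as five one-character parameters.
c.execute("DELETE FROM test WHERE serial_num = ?", (serialNum,))
conn.commit()

remaining = [row[0] for row in c.execute("SELECT serial_num FROM test")]
```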