I'm trying to create tables using Python, but when I inspect the data structure in SQLite, the primary keys aren't being assigned. Here's the code for one of the tables. It seems to work as intended except for the primary key part. I'm new to Python and SQLite, so I'm probably missing something very obvious, but I can't find any answers.
# Create a database and connect
conn = sql.connect('Coursework.db')
c = conn.cursor()
# Create the tables from the normalised schema
c.execute('CREATE TABLE IF NOT EXISTS room_host (room_ID integer PRIMARY KEY, host_ID integer)')
c.execute("SELECT count(name) from sqlite_master WHERE type='table' AND name='room_host'")
if c.fetchone()[0] == 1:
    c.execute("DROP TABLE room_host")
else:
    c.execute('CREATE TABLE room_host (room_ID integer PRIMARY KEY, host_ID integer)')
conn.commit()
# read data from csv
read_listings = pd.read_csv('listings.csv')
room_host = pd.DataFrame(read_listings, columns=['id', 'host_id'])
room_host.set_index('id')
room_host.to_sql("room_host", conn, if_exists='append', index=False)
c.execute("""INSERT INTO room_host (id, host_ID)
SELECT room_host.id, room_host.host_ID
FROM room_host
""")
I can't reproduce the issue with the primary key; the table is created as expected when I run that SQL statement.
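One way to double-check is to ask SQLite itself which column it considers the primary key; a minimal sketch, assuming the Coursework.db file from the question:
import sqlite3 as sql

conn = sql.connect('Coursework.db')
# PRAGMA table_info lists one row per column; the last field ("pk") is 1 for primary-key columns
for row in conn.execute('PRAGMA table_info(room_host)'):
    print(row)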
Other than that, the detour through pandas is not really necessary; the csv module plus .executemany() seems to me like a much more straightforward way of loading data from a CSV into a table.
import csv
import sqlite3 as sql

conn = sql.connect('Coursework.db')
conn.execute('CREATE TABLE IF NOT EXISTS room_host (room_ID integer PRIMARY KEY, host_ID integer)')
conn.commit()

with open('listings.csv', encoding='utf8', newline='') as f:
    # DictReader uses the header row, so only the two needed columns are picked out of listings.csv
    reader = csv.DictReader(f)
    rows = ((row['id'], row['host_id']) for row in reader)
    conn.executemany('INSERT INTO room_host (room_ID, host_ID) VALUES (?, ?)', rows)
conn.commit()
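The whole load then happens in one pass: executemany binds each row to the same prepared INSERT statement, and the final commit() writes them all in a single transaction.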
I query 4 hours of data from a source PLC MS SQL database, process it with Python, and write the data to the main PostgreSQL table.
When writing to the main PostgreSQL table each hour, the previous 3 hours of data are duplicates, which causes a primary key error that aborts the transaction and raises a Python error.
So, every hour:
I create a temp PostgreSQL table without any key
Then copy the pandas DataFrame to the temp table
Then insert rows from the temp table into the main PostgreSQL table
Drop the temp PostgreSQL table
This Python script runs hourly via Windows Task Scheduler.
Below is my code.
import io
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:postgres@host:port/dbname?gssencmode=disable')
conn = engine.raw_connection()
cur = conn.cursor()
cur.execute("""CREATE TABLE public.table_temp
(
datetime timestamp without time zone NOT NULL,
tagid text COLLATE pg_catalog."default" NOT NULL,
mc text COLLATE pg_catalog."default" NOT NULL,
value text COLLATE pg_catalog."default",
quality text COLLATE pg_catalog."default"
)
TABLESPACE pg_default;
ALTER TABLE public.table_temp
OWNER to postgres;""");
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_from(output, 'table_temp', null="")
cur.execute("""Insert into public.table_main select * From table_temp ON CONFLICT DO NOTHING;""");
cur.execute("""DROP TABLE table_temp CASCADE;""");
conn.commit()
I would like to know if there is a more efficient/faster way to do this.
If I'm correct in assuming that the data is already in the DataFrame, you should just be able to do:
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:postgres@host:port/dbname?gssencmode=disable')
# drop_duplicates returns a new DataFrame, so assign the result back
df = df.drop_duplicates(subset=None)  # Replace None with the list of columns that define the primary key, e.g. ['column_name1', 'column_name2']
df.to_sql('table_main', engine, if_exists='append', index=False)
Edit due to comment:
If that's the case, you have the right idea. You can make it more efficient by using to_sql to insert the data into the temp table first, like so:
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:postgres@host:port/dbname?gssencmode=disable')
conn = engine.raw_connection()
cur = conn.cursor()
df.to_sql('table_temp', engine, if_exists='replace', index=False)
cur.execute("""INSERT INTO public.table_main SELECT * FROM table_temp ON CONFLICT DO NOTHING;""")
# cur.execute("""DROP TABLE table_temp CASCADE;""")  # Optional: if_exists='replace' already drops and recreates the temp table on the next run
conn.commit()
I'm trying to fill a SQLite database using the Python library sqlite3. In this example, the idea is simply to read data from a file and populate the database; however, although it seems the table is being created, the database is not growing and values are not written to it.
The code I have is below:
import sqlite3

def update_db(c, key, val):
    sql = ''' UPDATE atable SET f1 = ? WHERE id = ?; '''
    c.execute(sql, (val, key))

def create_table(c):
    sql = ''' CREATE TABLE atable (id text PRIMARY KEY, f1 integer DEFAULT 0); '''
    c.execute(sql)

with sqlite3.connect('test.db') as conn:
    c = conn.cursor()
    create_table(c)
    with open('file1.txt') as f1:
        for line in f1:
            l = line.strip().split()
            update_db(c, l[0], int(l[1]))
    conn.commit()
This code runs without errors, but when trying to query this database, either with Python:
with sqlite3.connect('test.db') as conn:
    c = conn.cursor()
    c.execute('SELECT * FROM atable;')
    for row in c:
        print(row)
or in the SQLite command interface:
$ sqlite3 test.db
SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.
sqlite> SELECT * FROM atable;
sqlite> .tables
atable
sqlite>
the output is always empty (but it looks like the table was correctly created). What am I doing wrong here?
The 'file1.txt' for testing is this:
foo 2
bar 0
baz 1
The SQL syntax for adding rows to your table is:
INSERT INTO atable (id, f1) VALUES (?, ?);
UPDATE will do nothing if there are not already rows in the table.
If you want to insert or replace an existing row, sqlite also supports an INSERT OR REPLACE command.
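For example, the update_db helper from the question could be rewritten to insert rows instead; a minimal sketch, assuming the same atable schema as above:
def insert_db(c, key, val):
    # INSERT OR REPLACE creates the row if it is missing and overwrites it otherwise
    sql = ''' INSERT OR REPLACE INTO atable (id, f1) VALUES (?, ?); '''
    c.execute(sql, (key, val))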
I have a 10 GB CSV file of userIDs and genders, which are sometimes duplicated.
userID,gender
372,f
37261,m
23,m
4725,f
...
Here's my code for importing csv and writing it to SQLite database:
import sqlite3
import csv

path = 'genders.csv'
user_table = 'Users'

conn = sqlite3.connect('db.sqlite')
cur = conn.cursor()

cur.execute(f'''DROP TABLE IF EXISTS {user_table}''')
cur.execute(f'''CREATE TABLE {user_table} (
                userID INTEGER NOT NULL,
                gender INTEGER,
                PRIMARY KEY (userID))''')

with open(path) as csvfile:
    datareader = csv.reader(csvfile)
    # skip header
    next(datareader, None)
    for counter, line in enumerate(datareader):
        # change gender string to integer
        line[1] = 1 if line[1] == 'f' else 0
        cur.execute(f'''INSERT OR IGNORE INTO {user_table} (userID, gender)
                        VALUES ({int(line[0])}, {int(line[1])})''')

conn.commit()
conn.close()
For now, it takes 10 seconds to process a 1 MB file (in reality, I have more columns and also create more tables).
I don't think pd.to_sql can be used because I want to have a primary key.
Instead of using cursor.execute for every line, use cursor.executemany and insert all data at once.
Store your values as a list of tuples, _list = [(a, b, c), (a2, b2, c2), (a3, b3, c3), ...], and pass it in a single call (note the f-string so {user_table} is substituted):
cursor.executemany(f'''INSERT OR IGNORE INTO {user_table} (userID, gender, ...)
                       VALUES (?, ?, ...)''', _list)
conn.commit()
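Applied to the question's loop, a minimal sketch (assuming the same genders.csv layout and Users table as above) could look like this:
import csv
import sqlite3

conn = sqlite3.connect('db.sqlite')
cur = conn.cursor()

with open('genders.csv') as csvfile:
    datareader = csv.reader(csvfile)
    next(datareader, None)  # skip the header row
    # Generator of (userID, gender) tuples, mapping 'f' -> 1 and anything else -> 0
    rows = ((int(line[0]), 1 if line[1] == 'f' else 0) for line in datareader)
    # executemany consumes the generator row by row, so a large file never has to sit in a Python list
    cur.executemany('INSERT OR IGNORE INTO Users (userID, gender) VALUES (?, ?)', rows)

conn.commit()
conn.close()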
Info:
https://docs.python.org/2/library/sqlite3.html#module-sqlite3
When trying to insert rows into a table with a unique index, it appears to silently not insert them.
I've captured the behaviour in the following program: on the second call to test_insert I should get an integrity violation on the unique key, but nothing happens. Also, if I take the c.execute(query, [id_to_test]) line and duplicate it directly below itself, I do receive the expected integrity constraint error. What's happening here?
import sqlite3

def test_insert(id_to_test):
    conn = sqlite3.connect('test.db')
    c = conn.cursor()
    query = '''INSERT INTO test(unique_id)
               VALUES(?)'''
    c.execute(query, [id_to_test])

def setup_table():
    conn = sqlite3.connect('test.db')
    c = conn.cursor()
    c.execute('''DROP TABLE IF EXISTS test''')
    c.execute('''CREATE TABLE test (unique_id text)''')
    c.execute('''CREATE UNIQUE INDEX test_unique_id ON test (unique_id)''')

if __name__ == '__main__':
    setup_table()
    test_insert('test_id')
    test_insert('test_id')
    test_insert('test_id')
At the end of database operations, commit the changes to the database:
conn.commit()
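For example, test_insert from the question could commit before the connection is discarded; a sketch based on the code above:
def test_insert(id_to_test):
    conn = sqlite3.connect('test.db')
    c = conn.cursor()
    query = '''INSERT INTO test(unique_id)
               VALUES(?)'''
    c.execute(query, [id_to_test])
    # Without this, the insert is rolled back when the connection is discarded,
    # so later calls never see the row and the UNIQUE index has nothing to conflict with
    conn.commit()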
I am using Python to copy one table (dictionary) to another (origin_dictionary) in SQLite, and here is my code for this part:
def copyDictionaryToOrigin(self):
    dropTableQueryStr = "DROP TABLE IF EXISTS origin_dictionary"
    createTableQueryStr = "CREATE TABLE origin_dictionary (id INTEGER PRIMARY KEY AUTOINCREMENT, word TEXT, type TEXT)"
    syncTableQueryStr = "INSERT INTO origin_dictionary (word, type) SELECT word, type FROM dictionary"

    self.cur.execute(dropTableQueryStr)
    self.cur.fetchone()
    self.cur.execute(createTableQueryStr)
    result = self.cur.fetchone()
    self.cur.execute(syncTableQueryStr)
    result = self.cur.fetchone()
When running this code, I can see an origin_dictionary table is created, but there is no data in it. I could not figure out why the data didn't copy over to the new table. Can someone please help me with this?
If you need to simply copy one table to another, why don't you use CREATE TABLE ... AS SELECT? Also, you need to commit() your statements.
Simply use the code below, and it should work:
import sqlite3
conn = sqlite3.connect("example.db")
cur = conn.cursor()
cur.execute("DROP TABLE IF EXISTS origin_dictionary")
cur.execute("CREATE TABLE origin_dictionary AS SELECT * FROM dictionary")
conn.commit()
conn.close()
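Alternatively, if you want to keep the explicit schema from the question (with the AUTOINCREMENT id column), the original INSERT ... SELECT approach works too once the statements are committed; a sketch based on the question's code:
cur.execute("DROP TABLE IF EXISTS origin_dictionary")
cur.execute("CREATE TABLE origin_dictionary (id INTEGER PRIMARY KEY AUTOINCREMENT, word TEXT, type TEXT)")
cur.execute("INSERT INTO origin_dictionary (word, type) SELECT word, type FROM dictionary")
conn.commit()  # without the commit, the copied rows are never written to the database file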