I am trying to create a training app in Python to work with a database of movies, adding movie details via a text menu that prompts the user for all fields (movie name, actors, company, etc.). I am using PostgreSQL as the database and psycopg2 in Python.
From the user input I collect data that I then want to store in my database tables 'movies' and 'actors'. One movie has several actors. I have this code:
def insert_movie(name, actors, company, year):
    connection = psycopg2.connect(user='postgres', password='postgres', database='movie')
    cursor = connection.cursor()
    query1 = "INSERT INTO movies (name, company, year) VALUES (%s, %s, %s) RETURNING id;"
    cursor.execute(query1, (name, company, year))
    movie_id = cursor.fetchone()[0]
    print(movie_id)
    query2 = 'INSERT INTO actors (last_name, first_name, actor_ordinal) VALUES (%s, %s, %s) RETURNING id;'
    for actor in actors:
        cursor.execute(query2, tuple(actor))
    rows = cursor.fetchall()
    actor_id1 = [row[0] for row in rows]
    actor_id2 = [row[1] for row in rows]
    print(actor_id1)
    print(actor_id2)
    connection.commit()
    connection.close()
This works great for printing movie_id after query1. However, for actor_id2 I get IndexError: list index out of range.
If I keep only actor_id1 after query2, like this:
query2 = 'INSERT INTO actors (last_name, first_name, actor_ordinal) VALUES (%s, %s, %s) RETURNING id;'
for actor in actors:
    cursor.execute(query2, tuple(actor))
rows = cursor.fetchall()
actor_id1 = [row[0] for row in rows]
print(actor_id1)
I get the following result printed:
movie_id --> 112
actor_id --> 155
The problem is that I cannot retrieve the first actor's id (154) with this code; only the last inserted actor's id comes back.
Can anyone help with using fetchall correctly here?
OK, I have found the answer. The fetch should be done inside the loop: each execute() replaces the previous result set, so we have to fetch after every row's INSERT rather than once after the whole loop:
query2 = 'INSERT INTO actors (last_name, first_name, actor_ordinal) VALUES (%s, %s, %s) RETURNING id;'
actor_ids = []
for actor in actors:
    cursor.execute(query2, tuple(actor))
    actor_id = cursor.fetchone()[0]
    actor_ids.append(actor_id)
print(actor_ids)
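The per-execute fetch pattern can be demonstrated end to end with an in-memory sqlite3 database; the table layout mirrors the one above, the sample movie and actor data are invented, and sqlite exposes the generated key via cursor.lastrowid where psycopg2 would use INSERT ... RETURNING id plus fetchone():

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT, company TEXT, year INTEGER)")
cur.execute("CREATE TABLE actors (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, actor_ordinal INTEGER)")

cur.execute("INSERT INTO movies (name, company, year) VALUES (?, ?, ?)",
            ("Alien", "Fox", 1979))
movie_id = cur.lastrowid  # generated key for this one INSERT

actors = [("Weaver", "Sigourney", 1), ("Holm", "Ian", 2)]
actor_ids = []
for actor in actors:
    cur.execute("INSERT INTO actors (last_name, first_name, actor_ordinal) VALUES (?, ?, ?)", actor)
    actor_ids.append(cur.lastrowid)  # grab the key per execute, inside the loop

con.commit()
print(movie_id, actor_ids)  # 1 [1, 2]
```

Fetching (or reading lastrowid) immediately after each execute is what keeps every generated id, instead of only the last one.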
The goal is to continuously look for new records in my CSV and insert them into MS SQL using the pymssql library.
The CSV initially has 244 rows. I'm trying to insert the data and want to dynamically insert only the new rows each time the script is run by the scheduler.
I have a script which runs every 15 seconds to insert the values, but after inserting the values the first time, the second run throws 'Cannot insert duplicate key in object': my first column DateID is set as the PK, so the statement is terminated at the first record itself and the new row is never inserted.
How do I get around this?
Code:
def trial():
    try:
        query = "INSERT INTO data (OrderDate, Region, City, Category) VALUES (%s, %s, %s, %s)"
        for row in df.itertuples():
            datevalue = datetime.datetime.strptime(row.OrderDate, format)
            cursor.execute(query, (datevalue, row.Region, row.City, row.Category))
        print('Values inserted')
        conn.commit()
        conn.close()
    except Exception as e:
        print("Handle error", e)

schedule.every(15).seconds.do(trial)
Library used: pymssql
Database: MS SQL Server 2019
To avoid duplicate values, consider adjusting the query to use an EXCEPT clause (part of the UNION and INTERSECT family of set operators) against the actual data. Also consider using executemany, passing a nested list of all row/column data built with DataFrame.to_numpy().tolist().
By the way, if the OrderDate column is a datetime type in both the data frame and the database table, you do not need to re-format it to a string value.
def trial():
    try:
        query = (
            "INSERT INTO data (OrderDate, Region, City, Category) "
            "SELECT %s, %s, %s, %s "
            "EXCEPT "
            "SELECT OrderDate, Region, City, Category "
            "FROM data"
        )
        vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
        vals = tuple(map(tuple, vals))
        cur.executemany(query, vals)
        print('Values inserted')
        conn.commit()
    except Exception as e:
        print("Handle error", e)
    finally:
        cur.close()
        conn.close()
For a faster bulk insert, consider using a staging (temp) table:
# CREATE EMPTY TEMP TABLE
query = "SELECT TOP 0 OrderDate, Region, City, Category INTO #pydata FROM data"
cur.execute(query)

# INSERT INTO TEMP TABLE
query = (
    "INSERT INTO #pydata (OrderDate, Region, City, Category) "
    "VALUES (%s, %s, %s, %s)"
)
vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
vals = tuple(map(tuple, vals))

cur.execute("BEGIN TRAN")
cur.executemany(query, vals)

# MIGRATE TO FINAL TABLE
query = (
    "INSERT INTO data (OrderDate, Region, City, Category) "
    "SELECT OrderDate, Region, City, Category "
    "FROM #pydata "
    "EXCEPT "
    "SELECT OrderDate, Region, City, Category "
    "FROM data"
)
cur.execute(query)
conn.commit()
print("Values inserted")
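The EXCEPT-based dedup insert is portable enough to try against sqlite3, which makes for a self-contained sketch; sqlite uses ? placeholders instead of %s, and the sample rows below are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE data (OrderDate TEXT, Region TEXT, City TEXT, Category TEXT)")

# Each parameterized row is inserted only if it is not already in the table.
query = (
    "INSERT INTO data (OrderDate, Region, City, Category) "
    "SELECT ?, ?, ?, ? "
    "EXCEPT "
    "SELECT OrderDate, Region, City, Category FROM data"
)
rows = [("2021-01-06", "East", "Boston", "Bars"),
        ("2021-01-23", "Central", "Chicago", "Binders")]

cur.executemany(query, rows)   # first run: both rows inserted
cur.executemany(query, rows)   # second run: EXCEPT filters out the duplicates
con.commit()

print(cur.execute("SELECT COUNT(*) FROM data").fetchone()[0])  # 2
```

Running the same batch twice leaves the table unchanged, which is exactly the behavior the scheduler scenario needs.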
import csv
import sqlite3

open("shows.db", "w").close()
con = sqlite3.connect('shows.db')
db = con.cursor()
db.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))")
db.execute("CREATE TABLE genres (show_id INTEGER, genre TEXT, FOREIGN KEY(show_id) REFERENCES shows(id))")
with open("/Users/xxx/Downloads/CS50 2019 - Lecture 7 - Favorite TV Shows (Responses) - Form Responses 1.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        title = row["title"].strip().upper()
        id = db.execute("INSERT INTO shows (title) VALUES(?)", (title,))
        for genre in row["genres"].split(", "):
            db.execute("INSERT INTO genres (show_id, genre) VALUES(?, ?)", id, genre)
con.commit()
con.close()
When I run this code, I think the problem happens on this line: db.execute("INSERT INTO genres (show_id, genre) VALUES(?, ?)", id, genre).
My console says:
db.execute("INSERT INTO genres (show_id, genre) VALUES(?, ?)", id, genre)
TypeError: function takes at most 2 arguments (3 given)
I don't understand why it says 3 given even though I gave two arguments (id, genre).
Problems with the code
This line returns the cursor. In order to get a result, you would need to call a terminal operation such as .fetchall(), .fetchmany() or .fetchone():
id = db.execute("INSERT INTO shows (title) VALUES(?)", (title,))
As you didn't call a terminal operation or print out the result, you wouldn't know that the INSERT operation returns no rows, not the actual id.
Minor: it is generally not advised to name variables after built-in Python functions. See id.
As I suggested in the comment, you will need to pass the parameters as a single tuple:
db.execute(
    "INSERT INTO genres (show_id, genre) VALUES (?, ?)",
    (id_, genre)
)
Solution
You will need to select the id from the shows table after insertion, then use that id when inserting into the genres table.
The simplified version of the code, to showcase how to do it:
import csv
import sqlite3

open("shows.db", "w").close()
con = sqlite3.connect('shows.db')
db = con.cursor()
db.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))")
db.execute("CREATE TABLE genres (show_id INTEGER, genre TEXT, FOREIGN KEY(show_id) REFERENCES shows(id))")

shows = ["FRIENDS", "Game of Thrones", "Want", "Scooby Doo"]
genres = ["Comedy", "Fantasy", "Action", "Cartoon"]

for ind, show in enumerate(shows):
    db.execute("INSERT INTO shows (title) VALUES(?)", (show,))
    id_ = con.execute(
        "SELECT id FROM shows WHERE title = :show ORDER BY id DESC LIMIT 1",
        {"show": show},
    ).fetchone()
    db.execute(
        "INSERT INTO genres (show_id, genre) VALUES (?, ?)",
        (id_[0], genres[ind]),
    )
con.commit()
con.close()
For more details, check my code on GitHub.
The SELECT statement may look a bit complex. In a nutshell, it takes the matching title and returns the largest id; as titles may repeat, you always get the last inserted row that matches.
General suggestions for debugging issues like this
Try using print as much as possible
Use the dir and type functions to see methods and to be able to google types
Search the docs or examples on GitHub
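As an alternative to selecting the id back after every insert, the sqlite3 cursor exposes the generated key directly via lastrowid; a minimal sketch with invented sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
db = con.cursor()
db.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))")
db.execute("CREATE TABLE genres (show_id INTEGER, genre TEXT)")

db.execute("INSERT INTO shows (title) VALUES (?)", ("Scooby Doo",))
show_id = db.lastrowid  # rowid of the row just inserted, no SELECT round trip needed

db.execute("INSERT INTO genres (show_id, genre) VALUES (?, ?)", (show_id, "Cartoon"))
con.commit()

print(db.execute("SELECT show_id, genre FROM genres").fetchone())  # (1, 'Cartoon')
```

This avoids the ORDER BY id DESC LIMIT 1 workaround entirely, since the key belongs to exactly the row the cursor just inserted.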
I can see your issue. It's because you're passing id and genre to execute as two separate arguments. In the sqlite3 documentation the parameters are bundled into a single tuple, but you unpacked them in the function call, so execute sees three arguments (the SQL string plus two values) when it takes at most two.
Try this instead:
db.execute("INSERT INTO genres (show_id, genre) VALUES(?, ?)", (id, genre))
Note the extra parentheses: execute takes the SQL string plus one sequence of parameters, not one argument per parameter.
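The contrast between the two calling conventions can be reproduced with an in-memory sqlite3 database (sample values invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
db = con.cursor()
db.execute("CREATE TABLE genres (show_id INTEGER, genre TEXT)")

# Passing the values as separate arguments reproduces the error from the question.
error = None
try:
    db.execute("INSERT INTO genres (show_id, genre) VALUES(?, ?)", 1, "Comedy")
except TypeError as e:
    error = e
print(type(error).__name__)  # TypeError

# Bundling the values into one tuple works.
db.execute("INSERT INTO genres (show_id, genre) VALUES(?, ?)", (1, "Comedy"))
print(db.execute("SELECT * FROM genres").fetchall())  # [(1, 'Comedy')]
```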
What's the best / fastest solution for the following task:
Used technology: MySQL database + Python
I'm downloading a data.sql file. Its format:
INSERT INTO `temp_table` VALUES (group_id,city_id,zip_code,post_code,earnings,'group_name',votes,'city_name',person_id,'person_name',networth);
INSERT INTO `temp_table` VALUES (group_id,city_id,zip_code,post_code,earnings,'group_name',votes,'city_name',person_id,'person_name',networth);
...
Values in each row differ.
Table structures: http://sqlfiddle.com/#!9/8f10d6
A person can have multiple cities.
A person can be in only one group, or in no group at all.
A group can have multiple persons.
And I know which country these .sql data come from.
I need to split these data into 3 tables, updating rows that already exist in the tables and inserting new rows otherwise.
So I came up with 2 solutions:
Split the values from the file via Python, then perform for each line 3x select + 3x update/insert in a transaction.
Somehow bulk insert the data into a temporary table and then manipulate the data inside the database: for each row in the temporary table perform 3 select queries (one against each actual table), and if I find the row, run an update query, otherwise an insert query.
I will be running this function multiple times per day with over 10K lines in the .sql file, and it will be updating/creating over 30K rows in the database.
//EDIT
My inserting / updating code now:
autocommit = "SET autocommit=0"
with connection.cursor() as cursor:
    cursor.execute(autocommit)

data = data.sql
lines = data.splitlines()
for line in lines:
    with connection.cursor() as cursor:
        cursor.execute(line)

temp_data = "SELECT * FROM temp_table"
with connection.cursor() as cursor:
    cursor.execute(temp_data)
    temp_data = cursor.fetchall()

for temp_row in temp_data:
    group_id = temp_row[0]
    city_id = temp_row[1]
    zip_code = temp_row[2]
    post_code = temp_row[3]
    earnings = temp_row[4]
    group_name = temp_row[5]
    votes = temp_row[6]
    city_name = temp_row[7]
    person_id = temp_row[8]
    person_name = temp_row[9]
    networth = temp_row[10]

    group_select = "SELECT * FROM perm_group WHERE group_id = %s AND countryid_fk = %s"
    group_values = (group_id, countryid)
    with connection.cursor() as cursor:
        row = cursor.execute(group_select, group_values)
    if row == 0 and group_id != 0:  # if the person doesn't have a group, do not create one
        group_insert = "INSERT INTO perm_group (group_id, group_name, countryid_fk) VALUES (%s, %s, %s)"
        group_insert_values = (group_id, group_name, countryid)
        with connection.cursor() as cursor:
            cursor.execute(group_insert, group_insert_values)
            groupid = cursor.lastrowid
    elif row == 1 and group_id != 0:
        group_update = "UPDATE perm_group SET group_name = %s WHERE group_id = %s AND countryid_fk = %s"
        group_update_values = (group_name, group_id, countryid)
        with connection.cursor() as cursor:
            cursor.execute(group_update, group_update_values)
        # Select the group id for the current row to assign the correct group to the person
        group_certain_select = "SELECT id FROM perm_group WHERE group_id = %s AND countryid_fk = %s"
        group_certain_select_values = (group_id, countryid)
        with connection.cursor() as cursor:
            cursor.execute(group_certain_select, group_certain_select_values)
            groupid = cursor.fetchone()
    # ...
    # Repeat the same piece of code for person and city
Measured time: 206 seconds, which is not acceptable. Replacing the select + insert/update pair with a single upsert helped:
group_insert = ("INSERT INTO perm_group (group_id, group_name, countryid_fk) "
                "VALUES (%s, %s, %s) "
                "ON DUPLICATE KEY UPDATE group_id = %s, group_name = %s")
group_insert_values = (group_id, group_name, countryid, group_id, group_name)
with connection.cursor() as cursor:
    cursor.execute(group_insert, group_insert_values)

# Select the group id for the current row to assign the correct group to the person
group_certain_select = "SELECT id FROM perm_group WHERE group_id = %s AND countryid_fk = %s"
group_certain_select_values = (group_id, countryid)
with connection.cursor() as cursor:
    cursor.execute(group_certain_select, group_certain_select_values)
    groupid = cursor.fetchone()
Measured time: from 30 to 50 seconds. (Still quite long, but it's getting better.)
Are there any other, better (faster) options for how to do it?
Thanks in advance, popcorn
I would recommend that you load the data into a staging table and do the processing in SQL.
Basically, your ultimate result is a set of SQL tables, so SQL is necessarily going to be part of the solution. You might as well put as much logic into the database as you can, to simplify the number of tools needed.
Loading 10,000 rows should not take much time. However, if you have a choice of data formats, I would recommend a CSV file over INSERT statements; the statements incur extra overhead, if only because they are larger.
Once the data is in the database, I would not worry much about the processing time for storing the data in the three tables.
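The staging-table flow can be sketched end to end with sqlite3 standing in for MySQL. In MySQL the load step would typically be LOAD DATA INFILE and the merge an INSERT ... ON DUPLICATE KEY UPDATE; sqlite's equivalent upsert is ON CONFLICT ... DO UPDATE. The table and sample rows below are invented, and the WHERE true is sqlite's required disambiguation when combining SELECT with an upsert clause:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE perm_group (group_id INTEGER PRIMARY KEY, group_name TEXT)")
cur.execute("CREATE TABLE staging (group_id INTEGER, group_name TEXT)")
cur.execute("INSERT INTO perm_group VALUES (1, 'old name')")

# 1) Bulk-load the raw rows into the staging table.
incoming = [(1, "new name"), (2, "another group")]
cur.executemany("INSERT INTO staging VALUES (?, ?)", incoming)

# 2) One set-based upsert from staging into the real table:
#    existing group_ids get updated, new ones get inserted.
cur.execute(
    "INSERT INTO perm_group (group_id, group_name) "
    "SELECT group_id, group_name FROM staging WHERE true "
    "ON CONFLICT(group_id) DO UPDATE SET group_name = excluded.group_name"
)
con.commit()

print(cur.execute("SELECT group_id, group_name FROM perm_group ORDER BY group_id").fetchall())
```

Two statements replace the per-row select/insert/update loop, which is where most of the 206 seconds was going.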
I HAVE ADDED MY OWN ANSWER BELOW BUT AM OPEN TO IMPROVEMENTS
After seeing a project at DataNitro, I took on getting a connection to MySQL (they use SQLite) and I was able to import a small test table into Excel from MySQL.
Inserting new, updated data from the Excel sheet was the next task, and so far I can get one row to work like so...
import MySQLdb

db = MySQLdb.connect("xxx", "xxx", "xxx", "xxx")
c = db.cursor()
c.execute("""INSERT INTO users (id, username, password, userid, fname, lname)
             VALUES (%s, %s, %s, %s, %s, %s);""",
          (Cell(5, 1).value, Cell(5, 2).value, Cell(5, 3).value,
           Cell(5, 4).value, Cell(5, 5).value, Cell(5, 6).value))
db.commit()
db.close()
...but attempts at multiple rows fail. I suspect issues while traversing the rows in Excel. Here is what I have so far...
import MySQLdb

db = MySQLdb.connect(host="xxx.com", user="xxx", passwd="xxx", db="xxx")
c = db.cursor()
c.execute("select * from users")
usersss = c.fetchall()

updates = []
row = 2  # starting row
while True:
    data = tuple(CellRange((row, 1), (row, 6)).value)
    if data[0]:
        if data not in usersss:  # new record
            updates.append(data)
        row += 1
    else:  # end of table
        break

c.executemany("""INSERT INTO users (id, username, password, userid, fname, lname)
                 VALUES (%s, %s, %s, %s, %s, %s)""", updates)
db.commit()
db.close()
...as of now, I don't get any errors, but my new row is not added (id 3). This is what my table looks like in Excel...
The database holds the same structure, minus id 3. There has to be a simpler way to traverse the rows and pull the unique content for INSERT, but after 6 hours of trying different things (and 2 new Python books) I am going to ask for help.
If I run either...
print '[%s]' % ', '.join(map(str, updates))
or
print updates
my result is
[]
So this is likely not passing any data to MySQL in the first place.
LATEST UPDATE AND WORKING SCRIPT
Not exactly what I want, but this has worked for me...
c = db.cursor()
row = 2
while Cell(row, 1).value != None:
    c.execute("""INSERT IGNORE INTO users (id, username, password, userid, fname, lname)
                 VALUES (%s, %s, %s, %s, %s, %s);""",
              (CellRange((row, 1), (row, 6)).value))
    row = row + 1
Here is your problem:
while True:
    if data[0]:
        ...
    else:
        break
Your first id is 0, so in the first iteration of the loop data[0] will be falsy and your loop will exit without ever adding any data. What you probably meant is:
while True:
    if data[0] is not None:
        ...
    else:
        break
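A few lines of plain Python reproduce the difference between the truthiness test and the explicit None check when the first id is 0 (the rows are invented; None stands in for the empty cell that ends the table):

```python
rows = [(0, "alice"), (1, "bob"), (None,)]  # id 0 is a valid record; None marks the end

collected_truthy = []
for data in rows:
    if data[0]:  # 0 is falsy, so the very first record stops the loop
        collected_truthy.append(data)
    else:
        break

collected_checked = []
for data in rows:
    if data[0] is not None:  # only the explicit end marker stops the loop
        collected_checked.append(data)
    else:
        break

print(len(collected_truthy))   # 0 rows survive the truthiness test
print(len(collected_checked))  # 2 rows survive the None check
```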
I ended up finding a solution that gives me an INSERT for new rows and an UPDATE for those that changed. Not exactly a Python selection based on a single query, but it will do.
import MySQLdb

db = MySQLdb.connect("xxx", "xxx", "xxx", "xxx")
c = db.cursor()
row = 2
while Cell(row, 1).value is not None:
    c.execute("INSERT INTO users (id, username, password, "
              "userid, fname, lname) "
              "VALUES (%s, %s, %s, %s, %s, %s) "
              "ON DUPLICATE KEY UPDATE "
              "id=VALUES(id), username=VALUES(username), password=VALUES(password), "
              "userid=VALUES(userid), fname=VALUES(fname), lname=VALUES(lname);",
              (CellRange((row, 1), (row, 6)).value))
    row = row + 1
db.commit()
db.close()
I'm having a small problem with a Python program (below) that I'm writing.
I want to insert two values from a MySQL table into another table from a Python program.
The two fields are priority and product and I have selected them from the shop table and I want to insert them into the products table.
Can anyone help? Thanks a lot. Marc.
import MySQLdb

def checkOut():
    db = MySQLdb.connect(host='localhost', user='root', passwd='$$', db='fillmyfridge')
    cursor = db.cursor(MySQLdb.cursors.DictCursor)
    user_input = raw_input('please enter the product barcode that you are taking out of the fridge: \n')
    cursor.execute('update shops set instock=0, howmanytoorder = howmanytoorder + 1 where barcode = %s', (user_input))
    db.commit()
    cursor.execute('select product, priority from shop where barcode = %s', (user_input))
    rows = cursor.fetchall()
    cursor.execute('insert into products(product, barcode, priority) values (%s, %s)', (rows["product"], user_input, rows["priority"]))
    db.commit()
    print 'the following product has been removed from the fridge and needs to be ordered'
You don't mention what the problem is, but in the code you show this:
cursor.execute('insert into products(product, barcode, priority) values (%s, %s)', (rows["product"], user_input, rows["priority"]))
where your VALUES clause has only two %s placeholders when it should have three:
cursor.execute('insert into products(product, barcode, priority) values (%s, %s, %s)', (rows["product"], user_input, rows["priority"]))
Well, the same thing again:
import MySQLdb

def checkOut():
    db = MySQLdb.connect(host='localhost', user='root', passwd='$$', db='fillmyfridge')
    cursor = db.cursor(MySQLdb.cursors.DictCursor)
    user_input = raw_input('please enter the product barcode that you are taking out of the fridge: \n')
    cursor.execute('update shops set instock=0, howmanytoorder = howmanytoorder + 1 where barcode = %s', (user_input))
    db.commit()
    cursor.execute('select product, priority from shop where barcode = %s', (user_input))
    rows = cursor.fetchall()
Do you need fetchall()? Barcodes are unique, I guess, and one barcode maps to one product, so fetchone() should be enough, shouldn't it?
In any case, if you do a fetchall(), you get a result set, not a single row, so rows["product"] is not valid.
It has to be
for row in rows:
    cursor.execute('insert into products(product, barcode, priority) values (%s, %s, %s)', (row["product"], user_input, row["priority"]))
db.commit()
print 'the following product has been removed from the fridge and needs to be ordered'
or better
import MySQLdb

def checkOut():
    db = MySQLdb.connect(host='localhost', user='root', passwd='$$', db='fillmyfridge')
    cursor = db.cursor(MySQLdb.cursors.DictCursor)
    user_input = raw_input('please enter the product barcode that you are taking out of the fridge: \n')
    cursor.execute('update shops set instock=0, howmanytoorder = howmanytoorder + 1 where barcode = %s', (user_input))
    cursor.execute('insert into products(product, barcode, priority) select product, barcode, priority from shop where barcode = %s', (user_input))
    db.commit()
Edit: Also, you use db.commit() almost like print, sprinkling it everywhere. You need to read up on and understand the atomicity principle for databases.
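The INSERT ... SELECT version keeps the copy entirely inside the database; a runnable sqlite3 sketch of the same idea, with sqlite's ? standing in for MySQLdb's %s and an invented table and sample row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE shop (product TEXT, barcode TEXT, priority INTEGER)")
cur.execute("CREATE TABLE products (product TEXT, barcode TEXT, priority INTEGER)")
cur.execute("INSERT INTO shop VALUES ('milk', '123', 1)")

# Copy the matching row server-side: no fetch, no re-insert from Python,
# and one commit covers the whole unit of work.
cur.execute(
    "INSERT INTO products (product, barcode, priority) "
    "SELECT product, barcode, priority FROM shop WHERE barcode = ?",
    ("123",),
)
con.commit()

print(cur.execute("SELECT product, priority FROM products").fetchone())  # ('milk', 1)
```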