I want to draw random samples from a large database, and I want those samples to be paired, which means that either I care about the order of results from a (series of) SELECT statement(s) or I reorder them afterwards. Additionally, there may be duplicate rows. This is fine, but I want an efficient way to draw these samples straight from the db. I understand that SELECT statements cannot be used with cursor.executemany, but really that is what I would like.
There is a similar question here
where the OP seems to be asking for a multi-select, but is happy with the current top answer, which suggests using IN in the WHERE clause. That is not really what I am looking for. I'd prefer something more like ken.ganong's solution, but I wonder about its efficiency.
More precisely, I do something like the following:
import sqlite3
import numpy as np
# create the database and inject some values
values = [
    (1, "Hannibal Smith", "Command"),
    (2, "The Faceman", "Charm"),
    (3, "Murdock", "Pilot"),
    (4, "B.A. Baracas", "Muscle")]
con = sqlite3.connect('/tmp/test.db')
cur = con.cursor()
cur.execute(
    'CREATE TABLE a_team (tid INTEGER PRIMARY KEY, name TEXT, role TEXT)')
con.commit()
cur.executemany('INSERT INTO a_team VALUES(?, ?, ?)', values)
con.commit()
# now let's say I have these pairs of tid values whose roles I want to select
tid_pairs = np.array([(1,2), (1,3), (2,1), (4,3), (3,4), (4,3)])
# what I currently do is run multiple selects, insert into a running
# list and then numpy.array and reshape the result
out_roles = []
select_query = "SELECT role FROM a_team WHERE tid = ?"
for tid in tid_pairs.flatten():
    cur.execute(select_query, (tid,))
    out_roles.append(cur.fetchall()[0][0])
#
role_pairs = np.array(out_roles).reshape(tid_pairs.shape)
To me it seems like there must be a more efficient way of passing a SELECT statement to the db which requests multiple rows, each with its own constraints, but as I say, executemany cannot be used with a SELECT statement. The alternative is to use an IN constraint in the WHERE clause and then reconstruct the duplicates within Python, as sketched below.
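For reference, a minimal sketch of that IN-based alternative, assuming the a_team table above (a tid missing from the table would surface as a KeyError here, which ties into the side issue below):
unique_tids = [int(t) for t in np.unique(tid_pairs)]
placeholders = ','.join('?' * len(unique_tids))
cur.execute(
    'SELECT tid, role FROM a_team WHERE tid IN ({})'.format(placeholders),
    unique_tids)
# map each tid back onto the original (duplicated, ordered) pairs
role_by_tid = dict(cur.fetchall())
role_pairs = np.array(
    [role_by_tid[t] for t in tid_pairs.flatten()]).reshape(tid_pairs.shape)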
There are a few extra constraints; for instance, some requested rows may not exist in the db, and I may want to handle that by dropping the output pair or substituting a default value, but these things are a side issue.
Thanks in advance.
Related
I am trying to execute a delete statement that checks whether the table has any SKU that exists in the SKU column of the dataframe, and if it does, deletes the row. As I was using a for statement to iterate through the rows and check, it took a long time to run the program for 6000 rows of data.
I used executemany() as it was faster than a for loop around the delete statement, but I am finding it hard to find an alternative for checking the values in the dataframe.
sname = input("Enter name: ")
cursor = mydb.cursor(prepared=True)
column = df["SKU"]
data = [(sname, x) for x in column]
query = """DELETE FROM price_calculations1 WHERE Name=%s AND SKU=%s"""
cursor.executemany(query,data)
mydb.commit()
cursor.close()
Is there a more efficient code for achieving the same?
You could first run a SELECT id FROM price_calculations1 WHERE Name=%s AND SKU=%s
and then use a MySQL WHILE loop to delete those ids without the need for a cursor, which seems to be more performant.
See (the looping technique there is shown for SQL Server): https://www.mssqltips.com/sqlservertip/6148/sql-server-loop-through-table-rows-without-cursor/
A WHILE loop without the preceding SELECT might also work.
See: https://dev.mysql.com/doc/refman/8.0/en/while.html
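A MySQL WHILE loop has to live inside a stored program, so here is a rough client-side sketch of the same select-then-delete idea (the id primary key column is an assumption; the table and other names follow the question):
ids = []
for sku in df["SKU"]:
    cursor.execute(
        "SELECT id FROM price_calculations1 WHERE Name=%s AND SKU=%s",
        (sname, sku))
    ids.extend(row[0] for row in cursor.fetchall())
if ids:
    # one DELETE for all collected ids instead of one per row
    placeholders = ",".join(["%s"] * len(ids))
    cursor.execute(
        "DELETE FROM price_calculations1 WHERE id IN ({})".format(placeholders),
        ids)
mydb.commit()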
Rather than looping, try to do all the work in a single call to the database (this guideline is often applicable when working with databases).
Given a list of name / sku pairs:
pairs = [(name1, sku1), (name2, sku2), ...]
create a query that identifies all the matching records and deletes them in one statement. (MySQL will not let a DELETE select from the same table in a subquery, error 1093, so the criteria are applied to the DELETE directly.)
base_query = """DELETE FROM price_calculations1
WHERE {}
"""
# Build the WHERE clause criteria
criteria = " OR ".join(["(Name = %s AND SKU = %s)"] * len(pairs))
# Create the query
query = base_query.format(criteria)
# "Flatten" the value pairs
values = [i for j in pairs for i in j]
cursor.execute(query, values)
mydb.commit()
My question may be out of pure ignorance. Given an arbitrary dataframe of, say, 5 rows, I want to insert that dataframe into a DB (in my case PostgreSQL). General code to do that is along the lines of:
postgres_insert_query = """INSERT INTO table (ID, MODEL, PRICE) VALUES (%s,%s,%s)"""
record_to_insert = (1, 'A', 100)
cursor.execute(postgres_insert_query, record_to_insert)
Is it a common practice that when inserting more than one row of data, you iterate over your rows and do that?
It appears that every article or example I see is about inserting a single row to a DB.
In Python you could simply loop over your data frame and then do your inserts.
for record in dataframe.itertuples(index=False):
    sql = '''INSERT INTO table (col1, col2, col3)
             VALUES ('{}', '{}', '{}')
          '''.format(record[1], record[0], record[2])
    dbo.execute(sql)
This is highly simplistic. You may want to use something like SQLAlchemy, and make sure you use prepared statements. Never overlook security.
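A hedged sketch of the same insert done as a single executemany call with parameter placeholders instead of string formatting (this assumes a psycopg2-style cursor named dbo and that the dataframe columns are col1, col2, col3; the table name is copied from the question):
# plain tuples of the three columns, one per dataframe row
rows = list(dataframe[['col1', 'col2', 'col3']].itertuples(index=False, name=None))
sql = 'INSERT INTO table (col1, col2, col3) VALUES (%s, %s, %s)'
# the driver binds the values, so quoting and escaping are handled for you
dbo.executemany(sql, rows)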
Say I have a list of following values:
listA = [1,2,3,4,5,6,7,8,9,10]
I want to put each value of this list in a column named formatteddate in my SQLite database using executemany command rather than loop through the entire list and inserting each value separately.
I know how to do it if I had multiple columns of data to insert. For instance, if I had to insert listA, listB, and listC, then I could create a tuple like (listA[i], listB[i], listC[i]). Is it possible to insert one list of values without a loop? Also assume the inserted values are integers.
UPDATE:
Based on the answer provided I tried the following code:
def excutemanySQLCodewithTask(sqlcommand, task, databasefilename):
    # create a database connection
    conn = create_connection(databasefilename)
    with conn:
        cur = conn.cursor()
        cur.executemany(sqlcommand, [(i,) for i in task])
        return cur.lastrowid
tempStorage = [19750328, 19750330, 19750401, 19750402, 19750404, 19750406, 19751024, 19751025, 19751028, 19751030]
excutemanySQLCodewithTask("""UPDATE myTable SET formatteddate = (?);""", tempStorage, databasefilename)
It still takes too long (roughly 10 hours). I have 150,000 items in tempStorage. I tried INSERT INTO and that was slow as well. It seems like it isn't possible to make a list of tuples of integers.
As you say, you need a list of tuples. So you can do:
cursor.executemany("INSERT INTO my_table VALUES (?)", [(a,) for a in listA])
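Note that the UPDATE in the question has no WHERE clause, so each of the 150,000 parameter sets rewrites every row of myTable, which is the likely cause of the slowness rather than the tuple construction. A sketch of a keyed variant, assuming each value belongs in its own row addressed by SQLite's implicit rowid (the question does not say how values map to rows, so the 1-based mapping here is illustrative):
# pair each date with the rowid it should land in
params = [(date, rowid) for rowid, date in enumerate(tempStorage, start=1)]
cur.executemany('UPDATE myTable SET formatteddate = ? WHERE rowid = ?', params)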
I have found only one way to update rows in a MySQL db with Python so that incoming NULL values do not overwrite existing data.
I have this kind of statement:
sql = "INSERT INTO `table` VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)\
ON DUPLICATE KEY UPDATE Data_block_1_HC1_sec_voltage=IF(VALUES(Data_block_1_HC1_sec_voltage)IS NULL,Data_block_1_HC1_sec_voltage,VALUES(Data_block_1_HC1_sec_voltage)),\
`Data_block_1_TC1_1`=IF(VALUES(`Data_block_1_TC1_1`)IS NULL,`Data_block_1_TC1_1`,VALUES(`Data_block_1_TC1_1`)),\
`Data_block_1_TC1_2`=IF(VALUES(`Data_block_1_TC1_2`)IS NULL,`Data_block_1_TC1_2`,VALUES(`Data_block_1_TC1_2`)),\
`Data_block_1_TCF1_1`=IF(VALUES(`Data_block_1_TCF1_1`)IS NULL,`Data_block_1_TCF1_1`,VALUES(`Data_block_1_TCF1_1`)),\
`HC1_HC1_output`=IF(VALUES(`HC1_HC1_output`)IS NULL,`HC1_HC1_output`,VALUES(`HC1_HC1_output`)),\
`Data_block_1_HC1_sec_cur`=IF(VALUES(`Data_block_1_HC1_sec_cur`)IS NULL,`Data_block_1_HC1_sec_cur`,VALUES(`Data_block_1_HC1_sec_cur`)),\
`Data_block_1_HC1_power`=IF(VALUES(`Data_block_1_HC1_power`)IS NULL,`Data_block_1_HC1_power`,VALUES(`Data_block_1_HC1_power`)),\
`HC1_HC1_setpoint`=IF(VALUES(`HC1_HC1_setpoint`)IS NULL,`HC1_HC1_setpoint`,VALUES(`HC1_HC1_setpoint`))\
"
The data blocks are columns in the db. The primary key is datetime. Right now there are 8 columns, but I will have many more variables (more columns). I am not really good at Python, but I don't like this statement because it is hardcoded. Could I build the statement in a for loop or something similar, so it doesn't have to be so long and I don't have to write out all the variables manually?
Thanks for your help.
Let's assume that your column names are stored in an array cols. Then in order to generate the "interesting" inner part of the SQL statement above, you could do
',\n'.join(map(lambda c: '`%(col)s` = IF(VALUES(`%(col)s`) IS NULL, `%(col)s`, VALUES(`%(col)s`))' % {'col': c}, cols))
Here, map generates for each element of cols the corresponding line of the SQL statement and join then stitches everything together.
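A fuller sketch of assembling the whole statement from a column list (cols and the table name come from the question; the two names shown are just the first of the eight):
cols = ['Data_block_1_HC1_sec_voltage', 'Data_block_1_TC1_1']  # ...and so on
placeholders = ','.join(['%s'] * (len(cols) + 1))  # +1 for the datetime key
updates = ',\n'.join(
    '`{c}` = IF(VALUES(`{c}`) IS NULL, `{c}`, VALUES(`{c}`))'.format(c=c)
    for c in cols)
sql = 'INSERT INTO `table` VALUES ({})\nON DUPLICATE KEY UPDATE {}'.format(
    placeholders, updates)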
I am trying to extract data that corresponds to a stock that is present in both of my data sets (given in a code below).
This is my data:
#(stock,price,recommendation)
my_data_1 = [('a',1,'BUY'),('b',2,'SELL'),('c',3,'HOLD'),('d',6,'BUY')]
#(stock,price,volume)
my_data_2 = [('a',1,5),('d',6,6),('e',2,7)]
Here are my questions:
Question 1:
I am trying to extract price, recommendation, and volume that correspond to asset 'a'. Ideally I would like to get a tuple like this:
(u'a',1,u'BUY',5)
Question 2:
What if I wanted to get intersection for all the stocks (not just 'a' as in Question 1), in this case it is stock 'a', and stock 'd', then my desired output becomes:
(u'a',1,u'BUY',5)
(u'd',6,u'BUY',6)
How should I do this?
Here is my try (Question 1):
import sqlite3
my_data_1 = [('a',1,'BUY'),('b',2,'SELL'),('c',3,'HOLD'),('d',6,'BUY')]
my_data_2 = [('a',1,5),('d',6,6),('e',2,7)]
#I am using :memory: because I want to experiment
#with the database a lot
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''CREATE TABLE MY_TABLE_1
(stock TEXT, price REAL, recommendation TEXT )''' )
c.execute('''CREATE TABLE MY_TABLE_2
(stock TEXT, price REAL, volume REAL )''' )
for ele in my_data_1:
    c.execute('''INSERT INTO MY_TABLE_1 VALUES(?,?,?)''', ele)
for ele in my_data_2:
    c.execute('''INSERT INTO MY_TABLE_2 VALUES(?,?,?)''', ele)
conn.commit()
# The problem is with the following line:
c.execute('select * from my_table_1 where stock = ? INTERSECT select * from my_table_2 where stock = ?', ('a', 'a'))
for entry in c:
    print entry
I get no error, but also no output, so something is clearly off.
I also tried this line:
c.execute('select * from my_table_1 where stock = ? INTERSECT select volume from my_table_2 where stock = ?', ('a', 'a'))
but it does not work, I get this error:
c.execute( 'select* from my_table_1 where stock = ? INTERSECT select volume from my_table_2 where stock = ?',('a','a') )
sqlite3.OperationalError: SELECTs to the left and right of INTERSECT do not have the same number of result columns
I understand why I would get a different number of resulting columns, but I don't quite get why that triggers an error.
How should I do this?
Thank you in advance.
It looks like those two questions are really the same question.
Why your query doesn't work: Let's reformat the query.
SELECT * FROM my_table_1 WHERE stock=?
INTERSECT
SELECT volume FROM my_table_2 WHERE stock=?
There are two queries in the intersection,
SELECT * FROM my_table_1 WHERE stock=?
SELECT volume FROM my_table_2 WHERE stock=?
The meaning of "intersect" is "give me the rows that are in both queries". That doesn't make any sense if the queries have a different number of columns, since it's impossible for any row to appear in both queries.
Note that SELECT volume FROM my_table_2 isn't a very useful query, since it doesn't tell you which stock the volume belongs to. The query will give you something like {100, 15, 93, 42}.
What you're actually trying to do: You want a join.
SELECT my_table_1.stock, my_table_2.price, recommendation, volume
FROM my_table_1
INNER JOIN my_table_2 ON my_table_1.stock = my_table_2.stock
WHERE my_table_1.stock = ?
(The stock in the WHERE clause must be qualified, since both tables have a stock column; an unqualified stock = ? raises an "ambiguous column name" error in SQLite.)
Think of join as "glue the rows from one table onto the rows from another table, giving data from both tables in a single row."
It's bizarre that the price appears in both tables; when you write the query with the join you have to decide whether you want my_table_1.price or my_table_2.price, or whether you want to join on my_table_1.price = my_table_2.price. You may want to consider redesigning your schema so this doesn't happen; it may make your life easier.
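Running the parameterized join from Python looks like this (a sketch; c is the cursor from the question, and the tables are as created there):
c.execute(
    '''SELECT my_table_1.stock, my_table_2.price, recommendation, volume
       FROM my_table_1
       INNER JOIN my_table_2 ON my_table_1.stock = my_table_2.stock
       WHERE my_table_1.stock = ?''',
    ('a',))
print c.fetchall()  # -> [(u'a', 1.0, u'BUY', 5.0)]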
You are suffering from a misunderstanding about how to correlate different tables.
In order to do this, the easiest way is to JOIN them with a suitable condition, so the results automatically include the data from both joined tables. In the example below I select all columns, but you can of course select only those you want by naming them in the SELECT clause. You can also restrict the rows you get back with (a) further condition(s) in a WHERE clause. After you execute your code, try the following:
>>> c.execute("select * from my_table_1 t1 JOIN my_table_2 t2 ON t1.stock=t2.stock")
<sqlite3.Cursor object at 0x1004608f0>
This tells SQLite to take rows from table 1 and join them with rows in table 2 meeting the condition in the ON clause (i.e. they have to have the same value for their STOCK attribute). Because you chose such long table names, and because I am a crappy typist, I used table aliases in the FROM clause to allow me to use shortened names in the rest of the query.
>>> c.fetchall()
then gives you the result
[(u'a', 1.0, u'BUY', u'a', 1.0, 5.0), (u'd', 6.0, u'BUY', u'd', 6.0, 6.0)]
which would seem to answer both 1) and 2). For only a particular value of STOCK just add
WHERE t1.STOCK = 'a' -- or other required value, naturally
to the query string. You can see the names of the columns returned by querying the cursor's description attribute:
>>> [d[0] for d in c.description]
['stock', 'price', 'recommendation', 'stock', 'price', 'volume']
The INTERSECT operation is used to take the outputs from two separate SELECT queries and return only those elements that occur in both. I don't think that's going to be helpful here. The reason you got the error is because the queries have to be "UNION compatible", which is to say they need the same number and type of columns in the intersected queries.
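For comparison, a shape-compatible INTERSECT (both sides return a single stock column) runs fine and gives the tickers present in both tables:
c.execute('SELECT stock FROM my_table_1 INTERSECT SELECT stock FROM my_table_2')
print c.fetchall()  # expected: [(u'a',), (u'd',)]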