group_concat only when more than one to group


I'd like to perform a group-concat in SQLite only on those records where there is more than one row to concatenate. It seems like you could do this beforehand (count records using a group by then remove those singleton rows before proceeding with the group_concat); after (complete the group_concat then remove rows where nothing was concatenated); or possibly even during?
My question: what's the fastest way for SQLite to accomplish this?
I've worked out an "after" example using APSW in Python, but am not happy with it:
import apsw
conn = apsw.Connection(":memory:")
c = conn.cursor()
# set up a table with data
c.execute("create table foo(x,y)")
def getvals():
    a = [1, 1, 2, 3, 3]
    b = ['a','b','c','d','e']
    for i in range(5):
        yield a[i], b[i]
c.executemany("insert into foo values(?,?)", getvals())
# APSW runs every statement in a multi-statement string
c.execute('''create table fooc(a,b);
insert into fooc(a,b) select x, group_concat(y) from foo group by x''')
c.execute('select * from fooc')
c.fetchall() ## reports three records
c.execute("select * from fooc where b like '%,%'")
c.fetchall() ## reports two records .. what I want
It seems crazy (and slow?) to use LIKE for this kind of need.

Add a HAVING clause to your query:
INSERT INTO fooc(a,b)
SELECT x, group_concat(y)
FROM foo
GROUP BY x
HAVING COUNT(*) > 1
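For reference, here is a minimal sketch of the HAVING approach run end to end. It uses the standard-library sqlite3 module instead of APSW (an assumption made for portability; the SQL is the same), and the extra COUNT(*) column is included only to show the group sizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Same sample data as in the question
cur.execute("CREATE TABLE foo (x, y)")
cur.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    zip([1, 1, 2, 3, 3], ["a", "b", "c", "d", "e"]),
)

# HAVING filters whole groups after aggregation, so the singleton
# group (x = 2) never reaches the result set.
cur.execute("""
    SELECT x, group_concat(y), COUNT(*)
    FROM foo
    GROUP BY x
    HAVING COUNT(*) > 1
    ORDER BY x
""")
rows = cur.fetchall()
for row in rows:
    print(row)
```

Only the groups for x = 1 and x = 3 come back. The order of the values inside each group_concat string is not guaranteed by SQLite, so don't rely on it.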

Related

Python SQLite3 printing result of two combined tables makes problems

I have a problem programming a sqlite3 database in Python.
So I made two lists:
idata=[(0,"Ingredient1"),
(1,"Ingredient2")]
This first list holds the ingredients and their IDs.
rdata=[(0,"Recipie1",0,1,1)]
The second list holds the recipes, their IDs, and three numbers giving the IDs of the ingredients used in each recipe.
Then I created two tables that I filled with the data of these lists:
import sqlite3
conn = sqlite3.connect ("Alchemy_Data_Bank.dat")
c = conn.cursor()
c.execute("""
CREATE TABLE IF NOT EXISTS recipie(id, name, iid_1, iid_2, iid_3);
""")
c.executemany("insert into recipie(id, name, iid_1, iid_2, iid_3) values (?,?,?,?,?)", rdata)
c.execute("""
CREATE TABLE IF NOT EXISTS ingredient(id, name);
""")
c.executemany("insert into ingredient(id, name) values (?,?)", idata)
conn.commit()
Now I want to print the recipes together with their ingredients, combined in one table. So I did this:
for p in c.execute("""SELECT DISTINCT recipie.name,
        CASE WHEN recipie.iid_1 = ingredient.id THEN ingredient.name END,
        CASE WHEN recipie.iid_2 = ingredient.id THEN ingredient.name END,
        CASE WHEN recipie.iid_3 = ingredient.id THEN ingredient.name END
        FROM recipie, ingredient;"""):
    print(p)
c.close()
conn.close()
What I hoped to get as output is something like this:
('Recipie1', 'Ingredient1', 'Ingredient2', 'Ingredient2')
But it printed this:
('Recipie1', None, None, None)
('Recipie1', None, 'Ingredient2', 'Ingredient2')
('Recipie1', 'Ingredient1', None, None)
I think my problem lies within the CASE WHEN statements, as the program compares recipie.iid_1, recipie.iid_2 and recipie.iid_3 with only one value of ingredient.id at a time.
As far as I can tell, the solution must be a recursive selection in each CASE WHEN statement, but I just can't figure out how to do that.
I hope one of you can tell me how to do that!
Thanks in advance!!
Cazo0

Try rewriting the query so that each ingredient column is looked up by its own subquery, e.g.:
qry1 = """select name,
    (select name from ingredient where ingredient.id = recipie.iid_1),
    (select name from ingredient where ingredient.id = recipie.iid_2),
    (select name from ingredient where ingredient.id = recipie.iid_3)
    from recipie;"""
rsl = c.execute(qry1)
for r in rsl:
    print(r)
See my gist for the whole code:
https://gist.github.com/mh70cz/5cfa595b455e87d7c08da5315b1abd21
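As an alternative to the subqueries, the same result can be produced by joining the ingredient table once per ingredient column. The sketch below is not taken from the gist; it assumes an in-memory database and the aliases r, i1, i2 and i3 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Same tables and data as in the question
c.execute("CREATE TABLE ingredient(id, name)")
c.execute("CREATE TABLE recipie(id, name, iid_1, iid_2, iid_3)")
c.executemany("INSERT INTO ingredient VALUES (?,?)",
              [(0, "Ingredient1"), (1, "Ingredient2")])
c.executemany("INSERT INTO recipie VALUES (?,?,?,?,?)",
              [(0, "Recipie1", 0, 1, 1)])

# One join per ingredient slot; LEFT JOIN keeps the recipe row
# even when a slot points at an id that does not exist.
rows = c.execute("""
    SELECT r.name, i1.name, i2.name, i3.name
    FROM recipie AS r
    LEFT JOIN ingredient AS i1 ON i1.id = r.iid_1
    LEFT JOIN ingredient AS i2 ON i2.id = r.iid_2
    LEFT JOIN ingredient AS i3 ON i3.id = r.iid_3
""").fetchall()
for row in rows:
    print(row)
conn.close()
```

The original query failed because `FROM recipie, ingredient` pairs every recipe with every ingredient, so each output row only ever sees one ingredient.id. Both the subquery and the join version look up each slot independently.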

cleaning a Postgres table of bad rows

I have inherited a Postgres database and am currently in the process of cleaning it. I have written an algorithm to find the rows where the data is bad, encoded in a function called checkProblem(). Using it, I am able to select the bad rows, as shown below ...
import psycopg2
import pandas as pd
from tqdm import tqdm

schema = findTables(dbName)
conn = psycopg2.connect("dbname='%s' user='postgres' host='localhost'"%dbName)
cur = conn.cursor()
results = []
for t in tqdm(sorted(schema.keys())):
    n = 0
    cur.execute('select * from %s'%t)
    for i, cs in enumerate(tqdm(cur)):
        if checkProblem(cs):
            n += 1
    results.append({
        'tableName': t,
        'totalRows': i+1,
        'badRows'  : n,
    })
cur.close()
conn.close()
print(pd.DataFrame(results)[['tableName', 'badRows', 'totalRows']])
Now I need to delete the bad rows. I have two different ways of doing it. First, I can write the clean rows to a temporary table and then rename that table. I think this option is too memory-intensive. It would be much better if I could just delete the specific record at the cursor. Is this even an option?
Otherwise, what is the best way of deleting a record under such circumstances? I am guessing that this is a relatively common thing for database administrators to do ...

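Deleting the specific record as you go is indeed the better option. Use a separate cursor for the delete, because calling execute() on cur would discard the result set the loop is still iterating over. You can do something like:
del_cur = conn.cursor()
for i, cs in enumerate(tqdm(cur)):
    if checkProblem(cs):
        # assuming cs is a tuple with cs[0] being the record id
        del_cur.execute('delete from %s where id=%%s' % t, (cs[0],))
conn.commit()
Or you can store the ids of the bad records and then do something like
DELETE FROM table WHERE id IN (id1,id2,id3,id4)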
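The "collect the ids, then delete in one statement" variant could look roughly like the sketch below. It is demonstrated with sqlite3 so that it runs without a server; the helper name delete_bad_rows, the demo table t and the is_bad callback are all hypothetical, and the sketch assumes the first column of every row is a unique id. With psycopg2 the placeholder would be %s rather than ?.

```python
import sqlite3

def delete_bad_rows(conn, table, is_bad, placeholder="?"):
    """Delete every row of `table` for which is_bad(row) is true.

    `table` is interpolated into the SQL, so it must come from a
    trusted list of table names. Returns the number of ids deleted.
    """
    cur = conn.cursor()
    cur.execute("SELECT * FROM %s" % table)
    # First pass: remember the ids only, not the rows themselves
    bad_ids = [row[0] for row in cur if is_bad(row)]
    if not bad_ids:
        return 0
    # Second pass: one DELETE with one placeholder per id
    marks = ",".join([placeholder] * len(bad_ids))
    cur.execute("DELETE FROM %s WHERE id IN (%s)" % (table, marks), bad_ids)
    conn.commit()
    return len(bad_ids)

# Demo data: rows 2 and 4 have an empty or missing value
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "ok"), (2, None), (3, "ok"), (4, "")])

deleted = delete_bad_rows(conn, "t", lambda row: not row[1])
remaining = conn.execute("SELECT id FROM t ORDER BY id").fetchall()
print(deleted, remaining)
```

Only the ids are held in memory, which keeps the footprint small even for wide tables. For a very large number of bad rows it would be sensible to delete in batches, since databases limit how many parameters one statement may carry.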

MYSQL: how to insert statement without specifying col names or question marks?

I have a list of tuples that I'm inserting into a table.
Each tuple has 50 values. How do I insert them without having to specify the column names and how many ? placeholders there are?
col1 is an auto-increment column, so my insert statement starts at col2 and ends at col51.
Current code:
l = [(1,2,3,.....),(2,4,6,.....),(4,6,7,.....)...]
for tup in l:
    cur.execute(
        """insert into TABLENAME(col2,col3,col4.........col50,col51) VALUES(?,?,?,.............)
        """, tup)
What I want:
insert into TABLENAME(col*) VALUES(*)

MySQL's syntax for INSERT is documented here: http://dev.mysql.com/doc/refman/5.7/en/insert.html
There is no wildcard syntax like you show. The closest thing is to omit the column names:
INSERT INTO MyTable VALUES (...);
But I don't recommend doing that. It works only if you are certain you're going to specify a value for every column in the table (even the auto-increment column), and your values are guaranteed to be in the same order as the columns of the table.
You should learn to use code to build the SQL query based on arrays of values in your application. Here's a Python example of how I do it. Suppose you have a dict of column: value pairs called data_values.
placeholders = ['%s'] * len(data_values)
sql_template = """
INSERT INTO MyTable ({columns}) VALUES ({placeholders})
"""
sql = sql_template.format(
    columns=','.join(data_values.keys()),
    placeholders=','.join(placeholders)
)
cur = db.cursor()
# positional %s placeholders take a sequence, in the same order as keys()
cur.execute(sql, list(data_values.values()))
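Applied to the question's list of tuples, the same idea might be sketched as follows: build the statement once from a generated column list, then hand all rows to executemany. The helper build_insert and the table demo are invented for this illustration, and the demo runs against sqlite3 (placeholder ?) with three columns instead of fifty; a MySQL driver would typically use %s.

```python
import sqlite3

def build_insert(table, columns, placeholder="%s"):
    """Return an INSERT statement with one placeholder per column.

    Table and column names are interpolated as-is, so they must come
    from trusted code, never from user input.
    """
    cols = ", ".join(columns)
    marks = ", ".join([placeholder] * len(columns))
    return "INSERT INTO {} ({}) VALUES ({})".format(table, cols, marks)

# col2..col4 here; range(2, 52) would give col2..col51
columns = ["col{}".format(n) for n in range(2, 5)]
sql = build_insert("demo", columns, placeholder="?")
print(sql)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo (col1 INTEGER PRIMARY KEY AUTOINCREMENT, col2, col3, col4)"
)
rows = [(1, 2, 3), (2, 4, 6), (4, 6, 7)]
conn.executemany(sql, rows)  # col1 is filled in by the database
stored = conn.execute("SELECT * FROM demo ORDER BY col1").fetchall()
print(stored)
```

The values still travel as bound parameters, so only the column list is generated by string formatting.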

Example code to put before your loop:
cols = "("
for x in range(2, 52):
    cols = cols + "col" + str(x) + ","
cols = cols[:-1] + ")"
Inside your loop:
for tup in l:
    # str(tup) renders as "(1, 2, 3, ...)": fine for numbers, unsafe for untrusted strings
    cur.execute("insert into TABLENAME " + cols + " VALUES {0}".format(tup))
This is off the top of my head with no error checking.

Variable in list name

I have this code:
cur.execute("SELECT * FROM foo WHERE date=?",(date,))
for row in cur:
    list_foo.append(row[2])
cur.execute("SELECT * FROM bar WHERE date=?",(date,))
for row in cur:
    list_bar.append(row[2])
It works fine, but I'd like to automate this. I have made a list of the tables in my SQLite database, and I'd like something like this:
table_list = ['foo','bar']
for t in table_list:
    cur.execute("SELECT * FROM "+t+" WHERE date=?",(date,))
    for row in cur:
        # and here I'd like to append to the list whose name depends on t (list_foo, then list_bar, etc.)
But I don't know how to do that. Any ideas?

Use a dictionary to collect your data. Don't try to set new local names for each list.
You could use string templating too, and a list comprehension to turn your result rows into lists:
data = {}
for t in table_list:
    cur.execute("SELECT * FROM {} WHERE date=?".format(t), (date,))
    data[t] = [row[2] for row in cur]
One caveat: only do this with a pre-defined list of table names; don't ever interpolate untrusted input like that without hefty escaping to prevent SQL injection attacks.
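One way to enforce that caveat is to check every table name against a fixed set before it goes anywhere near the SQL string. The sketch below is an illustration under assumptions: ALLOWED_TABLES, collect_by_table and the three-column demo schema are made up, and an in-memory sqlite3 database stands in for the real one.

```python
import sqlite3

ALLOWED_TABLES = {"foo", "bar"}  # hypothetical whitelist of known tables

def collect_by_table(cur, tables, date):
    """Return {table_name: [third column of each matching row]}."""
    data = {}
    for t in tables:
        if t not in ALLOWED_TABLES:
            # Refuse anything that is not a known table name
            raise ValueError("unexpected table name: %r" % (t,))
        cur.execute('SELECT * FROM "{}" WHERE date=?'.format(t), (date,))
        data[t] = [row[2] for row in cur]
    return data

# Demo setup with two small tables
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for t in ("foo", "bar"):
    cur.execute('CREATE TABLE "{}" (id INTEGER, date TEXT, value TEXT)'.format(t))
cur.execute("INSERT INTO foo VALUES (1, '2020-01-01', 'a'), (2, '2020-01-02', 'b')")
cur.execute("INSERT INTO bar VALUES (1, '2020-01-01', 'c')")

data = collect_by_table(cur, ["foo", "bar"], "2020-01-01")
print(data)
```

The date still goes through a bound parameter; only the table name, which cannot be parameterized, is formatted into the string.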

PostgreSql and Python

I am using Python and PostgreSQL. I have a table with 6 columns: one id and 5 entries. I want to copy the id and the most repeated of the 5 entries to a new table.
I have done this:
import psycopg2
connection=psycopg2.connect("dbname=homedb user=ria")
cursor=connection.cursor()
l_dict= {'licence_id':1}
cursor.execute("SELECT * FROM im_entry.usr_table")
rows=cursor.fetchall()
cursor.execute("INSERT INTO im_entry.pr_table (image_1d) SELECT image_1d FROM im_entry.usr_table")
for row in rows:
    p = findmax(row) #to get most repeated entry from first table
    .................
    .................
How can I then insert this p value into the new table?
Please help me

p is a tuple so you can create a new execute with the INSERT statement passing the tuple (or part):
cursor.execute("INSERT INTO new_table (x, ...) VALUES (%s, ...)", p)
where:
(x, ....) contains the column names
(%s, ...) %s is repeated for each column
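Put together, the whole flow might look like the sketch below. Everything in it is an assumption for illustration: the question does not show findmax, so collections.Counter stands in for it; the table and column names are invented; and sqlite3 replaces psycopg2 so the example runs without a server (with psycopg2 the placeholders would be %s).

```python
import sqlite3
from collections import Counter

def most_repeated(values):
    """Return the value that occurs most often in `values`."""
    return Counter(values).most_common(1)[0][0]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source table: one id plus five entries per row
cur.execute("CREATE TABLE usr_table (id INTEGER, e1, e2, e3, e4, e5)")
cur.execute("CREATE TABLE pr_table (id INTEGER, entry)")
cur.executemany(
    "INSERT INTO usr_table VALUES (?,?,?,?,?,?)",
    [(1, "a", "b", "a", "c", "a"), (2, "x", "y", "y", "z", "y")],
)

# Pair each id with the most repeated of its five entries,
# then insert all pairs with a single executemany call.
cur.execute("SELECT * FROM usr_table")
pairs = [(row[0], most_repeated(row[1:])) for row in cur.fetchall()]
cur.executemany("INSERT INTO pr_table (id, entry) VALUES (?, ?)", pairs)
conn.commit()

result = cur.execute("SELECT id, entry FROM pr_table ORDER BY id").fetchall()
print(result)
```

Inserting the id and the computed entry together avoids having to match them up again with a later UPDATE.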
