I am trying to query a table in an existing SQLite database. The data must first be subset by a user input, like this:
query(Data.num == input)
Then I want to find the max and min of another field, date, within this subset.
I have tried using func.min/max, as well as union, but received an error saying the columns do not match. One of the issues here is that func.min/max need to be used as query arguments, not in a filter.
ids = session.query(Data).filter(Data.num == input)
q = session.query(func.max(Data.date),
                  func.min(Data.date))
ids.union(q).all()
ArgumentError: All selectables passed to CompoundSelect must have identical numbers of columns; select #1 has 12 columns, select #2 has 2
Similarly, if I use func.max and min separately, the error says #2 has 1 column.
I think seeing this query in SQL might help as well.
Thanks
The following solution works. You first need to set up the query, then filter the data down afterwards.
query = session.query(Data.num, func.min(Data.date),
                      func.max(Data.date))
query = query.filter(Data.num == input)
results = query.all()
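Since you asked to see the SQL: as a quick sketch, you can str() the query object before calling .all() to see what SQLAlchemy will emit (the exact output depends on your table definition and dialect):

print(str(query))
# Roughly, for SQLite:
# SELECT data.num, min(data.date) AS min_1, max(data.date) AS max_1
# FROM data
# WHERE data.num = ?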
I have:
res = db.engine.execute('select count(id) from sometable')
The returned object is sqlalchemy.engine.result.ResultProxy.
How do I get count value from res?
res cannot be accessed by index, but I have figured this out as:
count = None
for i in res:
    count = i[0]
    break
There must be an easier way right? What is it? I didn't discover it yet.
Note: The db is a postgres db.
While the other answers work, SQLAlchemy provides a shortcut for scalar queries as ResultProxy.scalar():
count = db.engine.execute('select count(id) from sometable').scalar()
scalar() fetches the first column of the first row and closes the result set, or returns None if no row is present. There's also Query.scalar(), if using the Query API.
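A minimal sketch of the Query API variant (assuming a mapped SomeTable model and an ORM session; both names are placeholders):

from sqlalchemy import func

# Query.scalar() also returns the first column of the first row, or None
count = session.query(func.count(SomeTable.id)).scalar()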
What you are asking for is called unpacking; ResultProxy is an iterable, so we can do:
# there will be a single record
record, = db.engine.execute('select count(id) from sometable')
# this record consists of a single value
count, = record
The ResultProxy in SQLAlchemy (documented at http://docs.sqlalchemy.org/en/latest/core/connections.html?highlight=execute#sqlalchemy.engine.ResultProxy) is an iterable of the rows returned from the database. For a count() query, simply fetch the first row, then index into it to get the first (and only) column.
result = db.engine.execute('select count(id) from sometable')
count = result.fetchone()[0]
If you happened to be using the ORM of SQLAlchemy, I would suggest using the Query.count() method on the appropriate model as shown here: http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=count#sqlalchemy.orm.query.Query.count
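A minimal sketch of that ORM route (SomeModel is a placeholder for your mapped class):

# Query.count() wraps the query in a SELECT count(*) subquery
count = db.session.query(SomeModel).count()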
Imagine one has two SQL tables
objects_stock
id | number
and
objects_prop
id | obj_id | color | weight
that should be joined on objects_stock.id = objects_prop.obj_id, hence the plain SQL query reads
select * from objects_prop join objects_stock on objects_stock.id = objects_prop.obj_id;
How can this query be performed with SqlAlchemy such that all returned columns of this join are accessible?
When I execute
query = session.query(ObjectsStock).join(ObjectsProp, ObjectsStock.id == ObjectsProp.obj_id)
results = query.all()
with ObjectsStock and ObjectsProp the appropriate mapped classes, the list results contains objects of type ObjectsStock - why is that? What would be the correct SQLAlchemy query to get access to all fields corresponding to the columns of both tables?
Just in case someone encounters a similar problem: the best way I have found so far is listing the columns to fetch explicitly,
query = session.query(ObjectsStock.id, ObjectsStock.number,
                      ObjectsProp.color, ObjectsProp.weight).\
    select_from(ObjectsStock).\
    join(ObjectsProp, ObjectsStock.id == ObjectsProp.obj_id)
results = query.all()
Then one can iterate over the results and access the properties by their original column names, e.g.
for r in results:
    print(r.id, r.color, r.number)
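Another option, not from the original answer but worth sketching: query both mapped classes, and SQLAlchemy returns tuples of (ObjectsStock, ObjectsProp) pairs, so you keep full objects from both tables:

query = session.query(ObjectsStock, ObjectsProp).\
    join(ObjectsProp, ObjectsStock.id == ObjectsProp.obj_id)

for stock, prop in query.all():
    print(stock.id, stock.number, prop.color, prop.weight)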
A shorter way of achieving the result of #ctenar's answer is by unpacking the columns using the star operator:
query = (
    session
    .query(*ObjectsStock.__table__.columns, *ObjectsProp.__table__.columns)
    .select_from(ObjectsStock)
    .join(ObjectsProp, ObjectsStock.id == ObjectsProp.obj_id)
)
results = query.all()
This is useful if your tables have many columns.
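One caveat (my note, not part of the original answer): both tables here have an id column, so the unpacked row ends up with two fields named id. Explicit label() calls disambiguate, sketched here:

query = (
    session
    .query(
        ObjectsStock.id.label("stock_id"),
        ObjectsStock.number,
        ObjectsProp.id.label("prop_id"),
        ObjectsProp.color,
        ObjectsProp.weight,
    )
    .select_from(ObjectsStock)
    .join(ObjectsProp, ObjectsStock.id == ObjectsProp.obj_id)
)

# Access the renamed columns on each result row
for r in query.all():
    print(r.stock_id, r.prop_id, r.color)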
I want to get all the columns of a table with max(timestamp) and group by name.
What I have tried so far is:
normal_query = "Select max(timestamp) as time from table"
event_list = normal_query \
    .distinct(Table.name) \
    .filter_by(**filter_by_query) \
    .filter(*queries) \
    .group_by(*group_by_fields) \
    .order_by('').all()
The query I get:
SELECT DISTINCT ON (schema.table.name) , max(timestamp)....
This query basically returns two columns, name and timestamp.
Whereas the query I want:
SELECT DISTINCT ON (schema.table.name) * from table order by ....
which returns all the columns in that table. This is the expected behavior, and with it I am able to get all the columns. How could I write this down in Python to get to this statement? Basically, the asterisk is missing.
Can somebody help me?
What you seem to be after is the DISTINCT ON ... ORDER BY idiom in Postgresql for selecting greatest-n-per-group results (N = 1). So instead of grouping and aggregating, just:
event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    all()
This will end up selecting rows "grouped" by name, having the greatest timestamp value.
You do not want to use the asterisk most of the time, not in your application code anyway, unless you're doing manual ad-hoc queries. The asterisk is basically "all columns from the FROM table/relation", which might then break your assumptions later, if you add columns, reorder them, and such.
In case you'd like to order the resulting rows based on timestamp in the final result, you can use for example Query.from_self() to turn the query to a subquery, and order in the enclosing query:
event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    from_self().\
    order_by(Table.timestamp.desc()).\
    all()
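For reference, the first query should emit SQL roughly along these lines (a sketch; the exact column list and schema name depend on your model):

SELECT DISTINCT ON (schema.table.name) schema.table.id, schema.table.name, schema.table.timestamp, ...
FROM schema.table
ORDER BY schema.table.name, schema.table.timestamp DESC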
I have been performing a query to count how many times in my sqlite3 database table (Users), within the column "country", the value "Australia" occurs.
australia = db.session.query(Users.country).filter_by(country="Australia").count()
I need to do this in a more dynamic way for any country value that may be within this column.
I have tried the following but unfortunately I only get a count of 0 for all values that are passed in the loop variable (each).
country = list(db.session.query(Users.country))
country_dict = list(set(country))
for each in country_dict:
    print(db.session.query(Users.country).filter_by(country=(str(each))).count())
Any assistance would be greatly appreciated.
The issue is that country is a list of result tuples, not a list of strings. The end result is that the value of str(each) is something along the lines of ('Australia',), which should make it obvious why you are getting counts of 0 as results.
For when you want to extract a list of single column values, see here. When you want distinct results, use DISTINCT in SQL.
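As a quick sketch of both points (unpack the one-element result tuples, and let the database deduplicate with DISTINCT):

# Each row is a 1-tuple, so unpack it; distinct() deduplicates in SQL
countries = [country for (country,) in
             db.session.query(Users.country).distinct()]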
But you should not first query distinct countries and then fire a query to count the occurrence of each one. Instead use GROUP BY:
country_counts = db.session.query(Users.country, db.func.count()).\
    group_by(Users.country).\
    all()

for country, count in country_counts:
    print(country, count)
The main thing to note is that SQLAlchemy does not hide the SQL when using the ORM, but works with it.
If you can use the sqlite3 module with direct SQL it is a simple query:
curs = con.execute("SELECT COUNT(*) FROM users WHERE country=?", ("Australia",))
nb = curs.fetchone()[0]
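The same parametrized query generalizes to any country; a small sketch (count_users is a hypothetical helper):

def count_users(con, country):
    # Reuse the parametrized COUNT for whatever country is passed in
    curs = con.execute("SELECT COUNT(*) FROM users WHERE country=?", (country,))
    return curs.fetchone()[0]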
I have a table with 4 columns (1 PK) from which I need to select 30 rows.
Of these rows, two columns (col. A and B) must exist in another table (8 columns, 1 PK, 2 of which are A and B).
The second table is large, containing millions of records, and it's enough for me to know if even a single row exists containing the values of col. A and B of the 1st table.
I am using the code below:
query = db.Session.query(db.Table_1).\
    filter(
        exists().where(db.Table_2.col_a == db.Table_1.col_a).
                 where(db.Table_2.col_b == db.Table_1.col_b)
    ).limit(30).all()
This query gets me the results I desire; however, I'm afraid it might be a bit slow, since it does not apply a limit condition to the exists() function, nor does it do a select 1 but a select *.
exists() does not accept a .limit(1)
How can I put a limit to exists to get it not to look for whole table, hence making this query run faster?
I need n rows from Table_1, whose two columns exist in a record in Table_2.
Thank you
You can do the "select 1" thing using a more explicit form, as mentioned here, that is:
exists([1]).where(...)
However, while I've been a longtime diehard "select 1" kind of guy, I've since learned that the usage of "1" vs. "*" for performance is now a myth (more / more).
exists() is also a wrapper around select(), so you can get a limit() by constructing the select() first:
s = select([1]).where(
    table1.c.col_a == table2.c.col_a
).where(
    table1.c.col_b == table2.c.col_b
).limit(1)
s = exists(s)
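To actually apply it, pass the constructed EXISTS to filter(); a sketch following the question's naming, with the limit of 30 on the outer query where it belongs:

# Select up to 30 Table_1 rows for which the EXISTS predicate holds
query = db.Session.query(db.Table_1).filter(s).limit(30)
results = query.all()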
Alternatively, with a plain select() (note that referencing both tables in the WHERE clause produces an implicit join rather than an EXISTS):

query = select([db.Table_1])
query = query.where(
    and_(
        db.Table_2.col_a == db.Table_1.col_a,
        db.Table_2.col_b == db.Table_1.col_b
    )
).limit(30)
result = session.execute(query)