I have 3 lists of user IDs and time ranges (different for each user ID) for which I would like to extract data. I am querying an AWS Redshift database through Python. Normally, with one list, I'd do something like this:
sql_query = "select userid from some_table where userid in {}".format(list_of_users)
where list_of_users is the list of user IDs I want, say (1,2,3...)
This works fine, but now I need to somehow pass it a list of triplets of (userid, lower time bound, upper time bound). So for example ((1,'2018-01-01','2018-01-14'),(2,'2018-12-23','2018-12-25'),...)
I tried various versions of this basic query
sql_query = "select userid from some_table where userid in {} and date between {} and {}".format(list_of_users, list_of_dates_lower_bound, list_of_dates_upper_bound)
but no matter how I structure the lists in format(), it doesn't work. I am not sure this is even possible this way or if I should just loop over my lists and call the query repeatedly for each triplet?
Suppose the lists hold values like the following:
list_of_users = [1,2]
list_of_dates_lower_bound = ['2018-01-01', '2018-12-23']
list_of_dates_upper_bound = ['2018-01-14', '2018-12-25']
the formatted sql would be:
select userid from some_table where userid in [1,2] and date between ['2018-01-01', '2018-12-23'] and ['2018-01-14', '2018-12-25']
This is probably not what you intended: it is simply invalid SQL, because the operands of BETWEEN must be scalar values (and IN expects a parenthesized list, not a Python list in square brackets).
I suggest looping over the lists and passing a single value to each placeholder.
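As an alternative to one query per triplet, the triplets can be folded into a single parameterized query by OR-ing one (userid = ... AND date BETWEEN ... AND ...) group per triplet. The sketch below is only an illustration: build_range_query is a hypothetical helper, and the %s placeholder style assumes a psycopg2-like driver.

```python
def build_range_query(triplets, placeholder="%s"):
    """Return (sql, params) matching any (userid, lower, upper) triplet.

    Hypothetical helper; table and column names follow the question.
    """
    if not triplets:
        raise ValueError("at least one (userid, lower, upper) triplet is required")
    # One parenthesised condition per triplet, joined with OR.
    condition = "(userid = {0} AND date BETWEEN {0} AND {0})".format(placeholder)
    where = " OR ".join([condition] * len(triplets))
    sql = "select userid from some_table where " + where
    # Flatten the triplets so the values line up with the placeholders.
    params = [value for triplet in triplets for value in triplet]
    return sql, params


triplets = [(1, "2018-01-01", "2018-01-14"), (2, "2018-12-23", "2018-12-25")]
sql_query, params = build_range_query(triplets)
# With a DB-API cursor this would be run as: cursor.execute(sql_query, params)
```

Keeping the values as parameters, rather than formatting them into the string, leaves quoting and escaping to the driver.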
You can select within a particular range by using
select col from table where col between range and range;
In your case it may be
select userid from some_table where date_from between yesterday and today;
or even
select userid from some_table where date_from >= yesterday and date_from <= today;
I have two querysets -
A = Bids.objects.filter(*args, **kwargs).annotate(
    # Rank each row by the position of its data source in the preferred order
    highest_priority=Case(*[
        When(data_source=data_source, then=Value(i))
        for i, data_source in enumerate(data_source_order_list)
    ])
).order_by(
    "date",
    "highest_priority"
)
B = A.values("date").annotate(Min("highest_priority")).order_by("date")
The first query gives me all objects within the selected time range, with the proper data sources and values. Through highest_priority I set which item should be selected. All items carry additional data.
The second query gives me grouped information about the items for every date. The second query lacks important values like price, so I assume I have to join these two tables and filter where a.highest_priority = b.highest_priority. That way I would get a queryset of objects with only one item per date.
I have tried using distinct, but it does not work with .first()/.last(). Annotating gives me a dict per group, and grouping by date alone drops a lot of important data, yet I have to group by date only...
The tables look like this:
A
B
How do I join them? Once they are joined I could easily filter on highest_priority matching highest_priority and get my data with only one database hit. I want to use the ORM, because then I could just apply distinct and put the result in a list, and I do not want to hammer the database by issuing a separate query per date.
See if this suggestion works:
SELECT * , (to_char(a.date, 'YYYYMMDD')::integer)*a.highest_priority AS prioritycalc
FROM table A
JOIN table B ON (to_char(a.date, 'YYYYMMDD')::integer)*a.highest_priority = (to_char(b.date, 'YYYYMMDD')::integer)*b.highest_priority
ORDER BY prioritycalc DESC;
I am trying to execute a delete statement that checks if the table has any SKU that exists in the SKU column of the dataframe. And if it does, it deletes the row. As I am using a for statement to iterate through the rows and check, it takes a long time to run the program for 6000 rows of data.
I used executemany() as it was faster than using a for loop for the delete statement, but I am finding it hard to find an alternative for checking values in the dataframe.
sname = input("Enter name: ")
cursor = mydb.cursor(prepared=True)
column = df["SKU"]
data=list([(sname, x) for x in column])
query="""DELETE FROM price_calculations1 WHERE Name=%s AND SKU=%s"""
cursor.executemany(query,data)
mydb.commit()
cursor.close()
Is there a more efficient way to achieve the same?
You could first run SELECT id FROM price_calculations1 WHERE Name=%s AND SKU=%s
and then use a MySQL WHILE loop to delete these ids without the need for a cursor, which seems to perform better.
See: https://www.mssqltips.com/sqlservertip/6148/sql-server-loop-through-table-rows-without-cursor/
A WHILE loop without the preceding SELECT might also work.
See: https://dev.mysql.com/doc/refman/8.0/en/while.html
Rather than looping, try to do all the work in a single call to the database (this guideline is often applicable when working with databases).
Given a list of name / sku pairs:
pairs = [(name1, sku1), (name2, sku2), ...]
create a query that identifies all the matching records and deletes them
# MySQL cannot select from the table being deleted from in a subquery,
# so the criteria go straight into the DELETE's WHERE clause.
base_query = "DELETE FROM price_calculations1 WHERE {}"
# Build the WHERE clause criteria
criteria = " OR ".join(["(name = %s AND sku = %s)"] * len(pairs))
# Create the query
query = base_query.format(criteria)
# "Flatten" the value pairs
values = [i for j in pairs for i in j]
cursor.execute(query, values)
# commit() belongs to the connection, not the cursor
mydb.commit()
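The single-statement idea can be tried without a MySQL server by using the standard-library sqlite3 module as a stand-in. In this sketch the ? placeholder, the in-memory table, and the delete_pairs helper with its chunk_size parameter are all assumptions for illustration; with MySQL Connector the placeholder would be %s.

```python
import sqlite3


def delete_pairs(con, pairs, chunk_size=400):
    """Delete rows matching any (name, sku) pair, one chunk per statement."""
    deleted = 0
    for start in range(0, len(pairs), chunk_size):
        chunk = pairs[start:start + chunk_size]
        # One "(Name = ? AND SKU = ?)" group per pair in this chunk.
        criteria = " OR ".join(["(Name = ? AND SKU = ?)"] * len(chunk))
        values = [v for pair in chunk for v in pair]
        cur = con.execute(
            "DELETE FROM price_calculations1 WHERE " + criteria, values
        )
        deleted += cur.rowcount
    con.commit()
    return deleted


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE price_calculations1 (Name TEXT, SKU TEXT)")
con.executemany(
    "INSERT INTO price_calculations1 VALUES (?, ?)",
    [("shop", "A1"), ("shop", "B2"), ("other", "A1")],
)
deleted = delete_pairs(con, [("shop", "A1"), ("shop", "Z9")])
remaining = con.execute(
    "SELECT Name, SKU FROM price_calculations1 ORDER BY Name, SKU"
).fetchall()
```

Splitting the pairs into chunks keeps each statement below the parameter and statement-size limits that database drivers impose, while still needing far fewer round trips than one DELETE per row.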
I want to get all the columns of a table with max(timestamp) and group by name.
What I have tried so far is:
normal_query ="Select max(timestamp) as time from table"
event_list = normal_query \
.distinct(Table.name)\
.filter_by(**filter_by_query) \
.filter(*queries) \
.group_by(*group_by_fields) \
.order_by('').all()
The query I get:
SELECT DISTINCT ON (schema.table.name) , max(timestamp)....
This query basically returns two columns, name and timestamp,
whereas the query I want is:
SELECT DISTINCT ON (schema.table.name) * from table order by ....
which returns all the columns in that table. That is the expected behavior, and with it I am able to get all the columns. How could I write this in Python to arrive at this statement? Basically, the asterisk is missing.
Can somebody help me?
What you seem to be after is the DISTINCT ON ... ORDER BY idiom in PostgreSQL for selecting greatest-n-per-group results (N = 1). So instead of grouping and aggregating, just use:
event_list = Table.query.\
distinct(Table.name).\
filter_by(**filter_by_query).\
filter(*queries).\
order_by(Table.name, Table.timestamp.desc()).\
all()
This will end up selecting rows "grouped" by name, having the greatest timestamp value.
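DISTINCT ON is specific to PostgreSQL. For comparison, the sketch below gets the same "latest row per name" result with a portable join against a grouped MAX subquery, run on an in-memory SQLite database. This is a different technique from the one in the answer, and the events table and its columns are made up for the demonstration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (name TEXT, timestamp INTEGER, payload TEXT)")
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("a", 1, "old"), ("a", 5, "new"), ("b", 3, "only")],
)

# Join every row to the newest timestamp recorded for its name.
latest_rows = con.execute(
    """
    SELECT e.name, e.timestamp, e.payload
    FROM events AS e
    JOIN (
        SELECT name, MAX(timestamp) AS max_ts
        FROM events
        GROUP BY name
    ) AS latest
      ON latest.name = e.name AND latest.max_ts = e.timestamp
    ORDER BY e.name
    """
).fetchall()
```

Unlike DISTINCT ON, this form returns several rows for a name when two rows share the same maximum timestamp.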
You do not want to use the asterisk most of the time, not in your application code anyway, unless you're doing manual ad-hoc queries. The asterisk is basically "all columns from the FROM table/relation", which might then break your assumptions later, if you add columns, reorder them, and such.
In case you'd like to order the resulting rows by timestamp in the final result, you can for example use Query.from_self() to turn the query into a subquery, and order in the enclosing query:
event_list = Table.query.\
distinct(Table.name).\
filter_by(**filter_by_query).\
filter(*queries).\
order_by(Table.name, Table.timestamp.desc()).\
from_self().\
order_by(Table.timestamp.desc()).\
all()
I have been performing a query to count how many times in my sqlite3 database table (Users), within the column "country", the value "Australia" occurs.
australia = db.session.query(Users.country).filter_by(country="Australia").count()
I need to do this in a more dynamic way for any country value that may be within this column.
I have tried the following but unfortunately I only get a count of 0 for all values that are passed in the loop variable (each).
country = list(db.session.query(Users.country))
country_dict = list(set(country))
for each in country_dict:
print(db.session.query(Users.country).filter_by(country=(str(each))).count())
Any assistance would be greatly appreciated.
The issue is that country is a list of result tuples, not a list of strings. The end result is that the value of str(each) is something along the lines of ('Australia',), which should make it obvious why you are getting counts of 0 as results.
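The effect is easy to reproduce with the standard-library sqlite3 module, which also hands back each row as a tuple. A small sketch, with a made-up in-memory users table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (country TEXT)")
con.execute("INSERT INTO users VALUES ('Australia')")

row = con.execute("SELECT country FROM users").fetchone()
as_string = str(row)   # the tuple's repr, not the country name
country = row[0]       # index into the row to get the actual value
print(repr(as_string), repr(country))
```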
When you want to extract a list of single-column values, unpack each result tuple (for example with row[0]). When you want distinct results, use DISTINCT in SQL.
But you should not first query distinct countries and then fire a query to count the occurrence of each one. Instead use GROUP BY:
country_counts = db.session.query(Users.country, db.func.count()).\
group_by(Users.country).\
all()
for country, count in country_counts:
print(country, count)
The main thing to note is that SQLAlchemy does not hide the SQL when using the ORM, but works with it.
If you can use the sqlite3 module with direct SQL it is a simple query:
curs = con.execute("SELECT COUNT(*) FROM users WHERE country=?", ("Australia",))
nb = curs.fetchone()[0]
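With the sqlite3 module, a GROUP BY query can likewise count every country in one pass instead of one query per value. A small sketch; the in-memory users table and its sample rows are assumptions for the demo.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT)")
con.executemany(
    "INSERT INTO users (country) VALUES (?)",
    [("Australia",), ("Australia",), ("Norway",)],
)

# Each result row is a (country, count) pair, so dict() can consume it directly.
country_counts = dict(
    con.execute("SELECT country, COUNT(*) FROM users GROUP BY country")
)
for country, count in sorted(country_counts.items()):
    print(country, count)
```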
I use MySQLdb to query some data from a database. When using LIKE in SQL, I am confused about how to write the statement.
Since I use LIKE, I constructed the SQL below, which returns the correct result.
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%"+"ccc"+"%",))
Now I want to make column b a variable, as below, but it returns nothing.
cur.execute("SELECT a FROM table WHERE %s like %s limit 0,10", ("b", "%"+"ccc"+"%"))
I searched many websites but found no answer. I am a bit confused.
In the db-api, parameters are for values only, not for columns or other parts of the query. You'll need to insert that using normal string substitution.
column = 'b'
query = "SELECT a FROM table WHERE {} like %s limit 0,10".format(column)
cur.execute(query, ("%"+"ccc"+"%",))
You could make this a bit nicer by using format in the parameters too:
cur.execute(query, ("%{}%".format("ccc"),))
The reason that the second query does not work is that the query that results from the substitution in the parameterised query looks like this:
select a from table where 'b' like '%ccc%' limit 0,10
'b' does not refer to a column, but to the static string 'b'. If you instead passed the string abcccba into the query you'd get a query that selects all rows:
cur.execute("SELECT a FROM table WHERE %s like %s limit 0,10", ("abcccba", "%"+"ccc"+"%"))
generates query:
SELECT a FROM table WHERE 'abcccba' like '%ccc%' limit 0,10
From this you should now be able to see why the second query returns an empty result set: the string b is not like %ccc%, so no rows will be returned.
Therefore you cannot set table or column names using parameterised queries; you must use normal Python string substitution:
cur.execute("SELECT a FROM table WHERE {} like %s limit 0,10".format('b'), ("%"+"ccc"+"%",))
which will generate and execute the query:
SELECT a FROM table WHERE b like '%ccc%' limit 0,10
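Because the column name is pasted into the SQL text, it should never come straight from user input. One common precaution is to check it against a fixed set of known columns first. The sketch below is an illustration under assumptions: ALLOWED_COLUMNS, build_like_query and the table name my_table are hypothetical names.

```python
ALLOWED_COLUMNS = {"a", "b"}  # hypothetical set of searchable columns


def build_like_query(column, needle):
    """Return (sql, params) for a LIKE search on a whitelisted column."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unknown column: {!r}".format(column))
    # The column name is formatted in; the search value stays a parameter.
    sql = "SELECT a FROM my_table WHERE {} LIKE %s LIMIT 0,10".format(column)
    return sql, ("%{}%".format(needle),)


query, params = build_like_query("b", "ccc")
# With a MySQLdb cursor: cur.execute(query, params)
```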
You probably need to rewrite your variable substitution from
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%"+"ccc"+"%"))
to
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%"+"ccc"+"%",))
Note the trailing comma: it turns the parenthesized expression into a one-element tuple, whereas without it the parentheses merely group a plain string. In this example the string concatenation isn't even necessary; this code does the same:
cur.execute("SELECT a FROM table WHERE b like %s limit 0,10", ("%ccc%",))
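The difference the comma makes can be checked directly in Python, independent of any database driver:

```python
not_a_tuple = ("%ccc%")    # parentheses only group; this is a plain str
one_tuple = ("%ccc%",)     # the trailing comma makes a 1-element tuple

print(type(not_a_tuple).__name__, type(one_tuple).__name__, len(one_tuple))
```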