Combining 2 columns in sql query based on alternate field - python

I have 2 columns in a database(sent_time, accept_time) which each include time stamps, as well as a field that can have 2 different values (ref_col). I would like to find a way in my query to make a new column (result_col), which will check the value of ref_col, and copy sent_time if the value is 1, and copy accept_time if the value is 2.
I am using pandas to query the database in python, if that has any bearing on the answer.

Just use case expression statement :
SELECT sent_time,
accept_time,
ref_col,
CASE WHEN ref_col = 1 THEN sent_col
ELSE accept_col
END AS result_col
FROM Your_Table

When you say "I have 2 columns in a database", what you actually mean is that you have 2 columns in a table, right?
In sql for postgresql it would be something like:
select (case when ref_col = 1 then sent_time else accept_time end) as result_col
from mytable
don't know how close from SQL standard that is, but would assume it's not that far.

Related

Query external tuples list embedded within a SQL query

I have to run a sql query that grabs the values only if two conditions are true. So for example, I need to grab all the values where asset=x and id_name=12345. There are about 10k combinations between asset and id_name that I need to be able to query for using sql. Usually I would just do the following:
select * from database where id_name IN (12345)
But how do I make this query when two conditions have to be true. id_name has to equal 12345 AND asset has to equal x.
I tried turning the list i need into tuples like this:
new_list = list(scams[['asset', 'id_name']].itertuples(index=False, name=None))
which gives me a list like this:
new_list = (12345, x), (32342, z)...etc.
Any suggestions would be great. Thanks!
Based on my understanding you need to query or fetch records based on a combination of two filters. Also you have around 10K combinations. Here is a simple SQL based solution.
Create a new column in the same table or build a temp table/view with a new column say "column_new". Populate the concatenated value of id_name and asset in the new column. You can use a concatenation function based on the database. Example in SQL server use CONCAT(column1,column2).
Now you can write your SQL as select * from database where colum_new IN ("12345x","32342z");.
Note : You can also use a "-" or "|" between column 1 and column 2 while doing a concatenation.

What is the correct way to use distinct on (Postgres) with SqlAlchemy?

I want to get all the columns of a table with max(timestamp) and group by name.
What i have tried so far is:
normal_query ="Select max(timestamp) as time from table"
event_list = normal_query \
.distinct(Table.name)\
.filter_by(**filter_by_query) \
.filter(*queries) \
.group_by(*group_by_fields) \
.order_by('').all()
the query i get :
SELECT DISTINCT ON (schema.table.name) , max(timestamp)....
this query basically returns two columns with name and timestamp.
whereas, the query i want :
SELECT DISTINCT ON (schema.table.name) * from table order by ....
which returns all the columns in that table.Which is the expected behavior and i am able to get all the columns, how could i right it down in python to get to this statement?.Basically the asterisk is missing.
Can somebody help me?
What you seem to be after is the DISTINCT ON ... ORDER BY idiom in Postgresql for selecting greatest-n-per-group results (N = 1). So instead of grouping and aggregating just
event_list = Table.query.\
distinct(Table.name).\
filter_by(**filter_by_query).\
filter(*queries).\
order_by(Table.name, Table.timestamp.desc()).\
all()
This will end up selecting rows "grouped" by name, having the greatest timestamp value.
You do not want to use the asterisk most of the time, not in your application code anyway, unless you're doing manual ad-hoc queries. The asterisk is basically "all columns from the FROM table/relation", which might then break your assumptions later, if you add columns, reorder them, and such.
In case you'd like to order the resulting rows based on timestamp in the final result, you can use for example Query.from_self() to turn the query to a subquery, and order in the enclosing query:
event_list = Table.query.\
distinct(Table.name).\
filter_by(**filter_by_query).\
filter(*queries).\
order_by(Table.name, Table.timestamp.desc()).\
from_self().\
order_by(Table.timestamp.desc()).\
all()

count how many times in an sqlite3 database table column the values occurs

I have been performing a query to count how many times in my sqlite3 database table (Users), within the column "country", the value "Australia" occurs.
australia = db.session.query(Users.country).filter_by(country="Australia").count()
I need to do this in a more dynamic way for any country value that may be within this column.
I have tried the following but unfortunately I only get a count of 0 for all values that are passed in the loop variable (each).
country = list(db.session.query(Users.country))
country_dict = list(set(country))
for each in country_dict:
print(db.session.query(Users.country).filter_by(country=(str(each))).count())
Any assistance would be greatly appreciated.
The issue is that country is a list of result tuples, not a list of strings. The end result is that the value of str(each) is something along the lines of ('Australia',), which should make it obvious why you are getting counts of 0 as results.
For when you want to extract a list of single column values, see here. When you want distinct results, use DISTINCT in SQL.
But you should not first query distinct countries and then fire a query to count the occurrence of each one. Instead use GROUP BY:
country_counts = db.session.query(Users.country, db.func.count()).\
group_by(Users.country).\
all()
for country, count in country_counts:
print(country, count)
The main thing to note is that SQLAlchemy does not hide the SQL when using the ORM, but works with it.
If you can use the sqlite3 module with direct SQL it is a simple query:
curs = con.execute("SELECT COUNT(*) FROM users WHERE country=?", ("Australia",))
nb = curs.fetchone()[0]

SQLAlchemy Select with Exist limiting

I have a table with 4 columns (1 PK) from which I need to select 30 rows.
Of these rows, two columns (col. A and B) must exists in another table (8 columns, 1 PK, 2 are A and B).
Second table is large, contains millions of records and it's enough for me to know if even a single row exists containing values of col. A and B of 1st table.
I am using the code below:
query = db.Session.query(db.Table_1).\
filter(
exists().where(db.Table_2.col_a == db.Table_1.col_a).\
where(db.Table_2.col_b == db.Table_2.col_b)
).limit(30).all()
This query gets me the results I desire however I'm afraid it might be a bit slow since it does not imply a limit condition to exists() function nor does it do select 1 but a select *.
exists() does not accept a .limit(1)
How can I put a limit to exists to get it not to look for whole table, hence making this query run faster?
I need n rows from Table_1, which 2 columns exist in a record in
Table_2
Thank you
You can do the "select 1" thing using a more explicit form as it mentioned here, that is,
exists([1]).where(...)
However, while I've been a longtime diehard "select 1" kind of guy, I've since learned that the usage of "1" vs. "*" for performance is now a myth (more / more).
exists() is also a wrapper around select(), so you can get a limit() by constructing the select() first:
s = select([1]).where(
table1.c.col_a == table2.c.colb
).where(
table1.c.colb == table2.c.colb
).limit(30)
s = exists(s)
query=select([db.Table_1])
query=query.where(
and_(
db.Table_2.col_a == db.Table_1.col_a,
db.Table_2.col_b == db.Table_2.col_b
)
).limit(30)
result=session.execute(query)

python and cx_Oracle - dynamic cursor.setinputsizes

I'm using cx_Oracle to select rows from one database and then insert those rows to a table in another database. The 2nd table's columns match the first select.
So I have (simplified):
db1_cursor.execute('select col1, col2 from tab1')
rows = db1_cursor.fetchall()
db2_cursor.bindarraysize = len(rows)
db2_cursor.setinputsizes(cx_Oracle.NUMBER, cx_Oracle.BINARY)
db2_cursor.executemany('insert into tab2 values (:1, :2)', rows)
This works fine, but my question is how to avoid the hard coding in setinputsizes (I have many more columns).
I can get the column types from db1_cursor.description, but I'm not sure how to feed those into setinputsizes. i.e. how can I pass a list to setinputsizes instead of arguments?
Hope this makes sense - new to python and cx_Oracle
Just use tuple unpacking.
eg.
db_types = (d[1] for d in db1_cursor.description)
db2_cursor.setinputsizes(*db_types)

Categories