I am looking to make a query that looks at the "reading" column of a table and returns the difference between the average readings over the past hour for rows where another column (called height) has an id of 1 or 2.
Essentially: take the average of all readings over the past hour with a height value of 1, take the average of all readings over the past hour with a height value of 2, and subtract the two.
How can I do this in one query in sqlalchemy?
This might not be exactly what you want, because I don't know SQLAlchemy, but a raw PostgreSQL query to achieve it could be:
SELECT
AVG(CASE WHEN height = 2 THEN reading ELSE NULL END) -
AVG(CASE WHEN height = 1 THEN reading ELSE NULL END)
FROM table
WHERE reading_time >= (CURRENT_TIMESTAMP - INTERVAL '1 HOUR')
This makes use of the fact that NULL values are excluded from the AVG aggregate function.
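For the SQLAlchemy part of the question, something like the following should translate that query (a sketch only, not a definitive answer: the Reading model, its column names, and the in-memory SQLite setup are my assumptions, and it uses the case() calling style from SQLAlchemy 1.4+):

```python
import datetime

from sqlalchemy import create_engine, Column, Integer, Float, DateTime, func, case
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

# hypothetical model matching the columns described in the question
class Reading(Base):
    __tablename__ = "readings"
    id = Column(Integer, primary_key=True)
    height = Column(Integer)
    reading = Column(Float)
    reading_time = Column(DateTime)

engine = create_engine("sqlite://")  # in-memory DB just for the demo
Base.metadata.create_all(engine)

now = datetime.datetime.now()
with Session(engine) as session:
    session.add_all([
        Reading(height=1, reading=10.0, reading_time=now),
        Reading(height=1, reading=20.0, reading_time=now),
        Reading(height=2, reading=40.0, reading_time=now),
    ])
    session.commit()

    # AVG over a CASE expression, mirroring the raw SQL above:
    # rows that don't match the height yield NULL and are ignored by AVG
    diff = session.query(
        func.avg(case((Reading.height == 2, Reading.reading)))
        - func.avg(case((Reading.height == 1, Reading.reading)))
    ).filter(
        Reading.reading_time >= now - datetime.timedelta(hours=1)
    ).scalar()

print(diff)  # avg(height=2) - avg(height=1) = 40.0 - 15.0 = 25.0
```

The whole thing stays one round trip to the database, same as the raw SQL.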
Related
I have a database table where some of the values in a column for sales are zero.
So far my query returns the min of all the values.
Sale.objects.all().aggregate(minimum=Min('sale'))
I want to exclude the values that are equal to zero while calculating the minimum or the average.
So my question is how is that query done?
Try:
Sale.objects.filter(sale__gt=0).aggregate(minimum=Min('sale'))
or:
Sale.objects.exclude(sale=0).aggregate(minimum=Min('sale'))
Trying to write SQL-style queries in Python.
I have a DataFrame df with columns date, ID, and amount.
Every month I get a new load of data. I have to calculate the average amount for a particular ID over the last 12 months (meaning we will have 12 records for that one ID).
Currently, my approach is
M1 = pd.date_range(first_day_of_month, last_day_of_12_month, freq='D').strftime("%Y%m%d").tolist()
df["new"] = df[df['date'].isin(M1)]['amount'].mean()
Now I want to add this average as a new column, so that each ID, at its current (latest) timestamp, has the average of the last 12 months' amount. I tried using groupby but was not able to apply it properly.
mask = df.date.between(datetime.datetime(2019,1,1), datetime.datetime(2019,12,31))
df[mask].groupby(['ID'])['amount'].mean()
I guess ? maybe ?
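Something along these lines might work: compute the per-ID mean over the 12-month window, then broadcast it back onto every row with map. A sketch only (the sample frame below is made up; only the column names ID, date, and amount come from the question):

```python
import pandas as pd

# hypothetical sample data with the columns from the question
df = pd.DataFrame({
    "ID": ["A"] * 12 + ["B"] * 12,
    "date": pd.date_range("2019-01-01", periods=12, freq="MS").tolist() * 2,
    "amount": list(range(1, 13)) + list(range(10, 121, 10)),
})

# keep only the last 12 months relative to the newest date in the data
cutoff = df["date"].max() - pd.DateOffset(months=12)
last12 = df[df["date"] > cutoff]

# per-ID average over that window, broadcast back as a new column
avg = last12.groupby("ID")["amount"].mean()
df["avg_12m"] = df["ID"].map(avg)
```

The map call is what the lone groupby was missing: it aligns the aggregated Series (indexed by ID) back to the row-level frame.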
I have a situation where I need to select records from a Sybase table based on a certain condition
Records need to be extracted in batches. If the total count is 2000, then I need to extract 500 in the first batch and 500 in the next, until the 2000-record count is reached.
I used the LIMIT condition, but it's giving an incorrect-syntax error:
select top 2 *
from CERD_CORPORATE..BOOK
where id_bo_book in('5330')
limit(2,3)
You can't use a range in the LIMIT condition, but you can use the OFFSET keyword for this:
SELECT top 2 * FROM CERD_CORPORATE..BOOK
WHERE id_bo_book in ('5330')
LIMIT 2 OFFSET 1;
On ASE 12.5.1 and onwards this can be done with a "SQL Derived Table" or "Inline View". The query requires that each row has a unique key so the table can be joined with itself and a count of the rows where the key value is less than the row being joined can be returned. This gives a monotonically increasing number with which to specify the limit and offset.
The equivalents of limit and offset are the values compared against x.rowcounter.
select
x.rowcounter,
x.error,
x.severity
from
(
select
t1.error,
t1.severity,
t1.description,
count(t2.error) as rowcounter
from
master..sysmessages t1,
master..sysmessages t2
where
t1.error >= t2.error
group by
t1.error,
t1.severity,
t1.description
) x
where
x.rowcounter >= 50
and x.rowcounter < 100
SQL Derived Tables are available as far back as Sybase ASE 12.5.1.
The use of master..sysmessages in the example provides a reasonable (10,000 rows) data set with which to experiment.
I'm working with timeseries data in this format:
[timestamp][rain value]
I wanted to count rainfall events in the timeseries data, where we define a rainfall event as a sub-DataFrame of the main DataFrame containing nonzero values between zero rainfall values.
I managed to get the start of the sub-DataFrame by getting the index of the rainfall value before the first nonzero value:
start = df.rain.values.nonzero()[0][0] - 1
cur = df[start:]
What I can't figure out is how to find the end. I was looking for some function zero():
end=cur.rain.values.zero()[0][0]
to find the next zero value in the rain column and mark that as the end of my sub-DataFrame.
Additionally, because my data is sampled at 15-minute intervals, a temporary lull of 15 minutes would give me two rainfall events instead of one, which realistically isn't true. That means I would like to define some time period, 6 hours for example, that separates rainfall events.
What I was thinking of (but could not execute, because I couldn't find the end of the sub-DataFrame yet), in pseudocode:
start = df.rain.values.nonzero()[0][0] - 1
cur = df[start:]
end=cur.rain.values.zero()[0][0]
temp = df[end:]
z = temp.rain.values.nonzero()[0][0] - 1
if timedelta (z-end) >=6hrs:
end stays as endpoint of cur
else:
z is new endpoint, find next nonzero to again check
So I guess my question is: how do I find the end of my sub-DataFrame if I don't want to iterate over all rows?
And am I on the right track with my pseudocode in defining the end of a rainfall event as, say, 6 hours of zero rain?
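One way to avoid iterating over rows at all: label each wet run with a cumulative count of dry-to-wet transitions, then merge runs whose separating dry gap is shorter than the threshold. A sketch (the sample series below is invented; the 6-hour threshold is the one proposed in the question):

```python
import pandas as pd

# hypothetical 15-minute rainfall series with a short lull inside one storm
idx = pd.date_range("2021-01-01", periods=8, freq="15min")
rain = pd.Series([0, 1, 2, 0, 3, 0, 0, 0], index=idx)

wet = rain > 0
# a new raw event starts at each dry-to-wet transition
raw_event = (wet & ~wet.shift(fill_value=False)).cumsum()

# start/end timestamps of each raw event
starts = rain[wet].groupby(raw_event[wet]).apply(lambda s: s.index[0])
ends = rain[wet].groupby(raw_event[wet]).apply(lambda s: s.index[-1])

# merge raw events whose dry gap to the previous one is under 6 hours
gap = starts - ends.shift()
merged_id = (gap >= pd.Timedelta(hours=6)).cumsum()
n_events = merged_id.nunique()
print(n_events)  # the two raw bursts merge into one event: 1
```

The cumsum trick replaces the start/end search entirely: each distinct value of merged_id is one rainfall event, and you can group the original frame by it to get each event as its own sub-DataFrame.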
I have a table in SQLite, using pysqlite:
create table t
(
id integer primary key not null,
time datetime not null,
price decimal(5,2)
)
How can I calculate, with an SQL statement, a moving average over a window X seconds wide from this data?
As far as I understand your question, you do not want the average over the last N items, but over the last x seconds, am I correct?
Well, this gives you the list of all prices recorded in the last 720 seconds:
>>> cur.execute("SELECT price FROM t WHERE datetime(time) > datetime('now','-720 seconds')").fetchall()
Of course, you can feed that to the AVG SQL function to get the average price in that window:
>>> cur.execute("SELECT AVG(price) FROM t WHERE datetime(time) > datetime('now','-720 seconds')").fetchall()
You can also use other time units, and even chain them.
For example, to obtain the average price for the last one and a half hours, do:
>>> cur.execute("SELECT AVG(price) FROM t WHERE datetime(time) > datetime('now','-30 minutes','-1 hour')").fetchall()
Edit: the modifiers are all listed in the SQLite date and time functions reference.
The moving average with a window N samples wide is given at time i by:
(x[i] + x[i+1] + ... + x[i+N-1]) / N
To compute it, you want to keep a FIFO queue of size N and maintain its running sum. You can then update the sum by adding the new value and subtracting the oldest one from the old sum; you get the new value from the database and the oldest one by popping the first element off the queue.
Does that make sense?