How to Loop Through Dates in Python to Pass into a PostgreSQL Query - python

I have 2 date variables which I pass into a SQL query via Python. It looks something like this:
start = '2019-10-01'
finish = '2019-12-22'
code_block = '''select sum(revenue) from table
where date between '{start}' and '{finish}'
'''.format(start = start, finish = finish)
That gets me the data I want for the current quarter; however, I want to be able to run this same query for each of the previous 5 quarters. Can someone help me figure out a way so that this runs for the current quarter, then updates both start and finish to the previous quarter, runs the query, and keeps going until 5 quarters ago?

Consider adding a year and quarter grouping to the aggregate SQL query and avoiding the Python looping. Use a date difference of 15 months (i.e., 5 quarters), and even NOW() for the end date. Also, use parameterization (supported in pandas) rather than string formatting for dynamic querying.
import pandas as pd

code_block = '''select concat(date_part('year', date)::text,
                              'Q', date_part('quarter', date)::text) as yyyyqq,
                      sum(revenue) as sum_revenue
               from table
               where date between (%s::date - INTERVAL '15 MONTHS') and NOW()
               group by date_part('year', date),
                        date_part('quarter', date)
            '''

# myconn is an existing database connection; start is bound to the %s placeholder
df = pd.read_sql(code_block, myconn, params=[start])
If you still need separate quarterly data frames, use groupby to build a dictionary of data frames for the 5 quarters.
# DICTIONARY OF QUARTERLY DATA FRAMES
df_dict = {i: g for i, g in df.groupby('yyyyqq')}
df_dict['2019Q4'].head()
df_dict['2019Q3'].tail()
df_dict['2019Q2'].describe()
...

Just define a list with start dates and a list with finish dates and loop through them with:
for date_start, date_finish in zip(start_list, finish_list):
    start = date_start
    finish = date_finish
    # here you insert the query, using start and finish
Hope this is what you are looking for =)
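If it helps, here is one way those start and finish lists could be built for the current quarter plus the previous five, using pandas period ranges; this is only a sketch and the names are just examples:
import pandas as pd

# current quarter plus the previous 5 quarters
quarters = pd.period_range(end=pd.Timestamp.today(), periods=6, freq="Q")

start_list = [q.start_time.strftime("%Y-%m-%d") for q in quarters]
finish_list = [q.end_time.strftime("%Y-%m-%d") for q in quarters]

for date_start, date_finish in zip(start_list, finish_list):
    print(date_start, date_finish)  # replace the print with the query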

Related

Delete datetime from SQL database based on hour

I'm a Python dev handling an SQL database through sqlite3, and I need to perform a certain SQL query to delete data.
I have tables which contain datetime objects as keys.
I want to keep only one row per hour (the last record for that specific time) and delete the rest.
I also need this to only happen on data older than 1 week.
Here's my attempt:
import sqlite3
c = db.cursor()
c.execute('''DELETE FROM TICKER_AAPL WHERE time < 2022-07-11 AND time NOT IN
( SELECT * FROM
(SELECT min(time) FROM TICKER_AAPL GROUP BY hour(time)) AS temp_tab);''')
First change the format of your dates from yyyyMMdd ... to yyyy-MM-dd ..., because that ISO-8601 style is the text date format that SQLite's date and time functions recognize.
Then use the function strftime() in your query to get the hour of each value in the column time:
DELETE FROM TICKER_AAPL
WHERE time < date(CURRENT_DATE, '-7 day')
AND time NOT IN (SELECT MAX(time) FROM TICKER_AAPL GROUP BY strftime('%Y-%m-%d %H', time));
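For reference, a minimal sketch of running that corrected statement from Python's sqlite3 (the database path is just a placeholder):
import sqlite3

db = sqlite3.connect("ticker.db")  # placeholder path, adjust to your database file
c = db.cursor()
c.execute('''DELETE FROM TICKER_AAPL
             WHERE time < date(CURRENT_DATE, '-7 day')
               AND time NOT IN (SELECT MAX(time)
                                FROM TICKER_AAPL
                                GROUP BY strftime('%Y-%m-%d %H', time));''')
db.commit()  # make the deletions permanent
db.close()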

Get max of column data for entire day 7 days ago

Using pyodbc for Python and a PostgreSQL database, I am looking to gather the max value for a specific day 7 days before the script's run date. The table has the following columns which may prove useful: Timestamp (yyyy-MM-dd hh:mm:ss.ffff), year, month, day.
I've tried the following few approaches:
mon = currentDate - dt.timedelta(days=7)
monPeak = cursor.execute("SELECT MAX(total) FROM {} WHERE timestamp = {};".format(tableName, mon)).fetchval()
Error 42883: operator does not exist: timestamp with timezone = integer
monPeak = cursor.execute("SELECT MAX(total) FROM {} WHERE timestamp = NOW() - INTERVAL '7 DAY'".format(tableName)).fetchval()
No error, but the value is returned as 'None' (I didn't think this was a viable solution anyway, because I want the max of the entire day, not that specific time).
I've tried a few different ways of incorporating the year, date, and time columns from the db table, but with no luck. The goal is to gather the max of the data for every day of the prior week. Any ideas?
You have to cast the timestamp to a date if you want to do date comparisons.
WHERE timestamp::date = (NOW() - INTERVAL '7 DAY')::date
Note that timestamptz-to-date conversions are not immutable; they depend on your current timezone setting.
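A minimal sketch of what that comparison could look like through pyodbc, assuming an existing connection string and using a placeholder table name:
import pyodbc

conn = pyodbc.connect(connection_string)  # connection_string is assumed to exist already
cursor = conn.cursor()

# "my_table" stands in for the real table name
monPeak = cursor.execute(
    "SELECT MAX(total) FROM my_table "
    "WHERE timestamp::date = (NOW() - INTERVAL '7 DAY')::date;"
).fetchval()
print(monPeak)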

Using pandas and datetime in Jupyter to see during what hours no sales were ever made (on any day)

So I have sales data that I'm trying to analyze. I have datetime data ["Order Date Time"], and I'd like to see the most common hours for sales, but more importantly I'd like to see what minutes have NO sales.
I have been spinning my wheels for a while and I can't get my brain around a solution. Any help is greatly appreciated.
I import the data:
import pandas as pd

df = pd.read_excel('Audit Period.xlsx')
print(df)
I clean up the data:
# Drop rows where "Order Date Time" is null
time_df = df[df["Order Date Time"].notnull()]
# Keep only that column and reset the index so it is sequential again
time_df = time_df[["Order Date Time"]].reset_index(drop=True)
# Inspect the first 10 rows
time_df.head(10)
I convert to datetime and I look at the month totals:
# Convert "Order Date Time" to datetime
time_df = time_df.copy()
time_df["Order Date Time"] = time_df["Order Date Time"].apply(pd.to_datetime)
time_df = time_df.set_index(time_df["Order Date Time"])
# Group by month
grouped = time_df.resample("M").count()
time_df = pd.DataFrame({"count": grouped.values.flatten()}, index=grouped.index)
time_df.head(10)
I try to group by hour, but that gives me totals per day/hour rather than totals per hour of the day (e.g., every order ever placed at noon):
# Group into 2-hour bins
grouped = time_df.resample("2H").count()
time_df = pd.DataFrame({"count": grouped.values.flatten()}, index=grouped.index)
time_df.head(10)
And that is where I'm stuck. I'm trying to integrate the below suggestions but can't quite get a grasp on them yet. Any help would be appreciated.
Not sure if this is the most brilliant solution, but I would start by generating a dataframe at the level of detail I wanted, whether that is 1-hour intervals, 5-minute intervals, etc. Then in your df with all the actual data, you could do your grouping as you currently are doing it above. Once it is grouped, join the two. That way you have one dataframe that includes empty rows associated with time spans with no records. The tricky part will just be making sure you have your date and time formatted in a way that it will match and join properly.
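As a rough sketch of that join idea in pandas, assuming a frame called orders indexed by "Order Date Time": resample to hourly counts, reindex against a complete hourly range so empty hours become zero, then look at the hour of day:
import pandas as pd

# hourly counts from the real data (orders is assumed to be indexed by "Order Date Time")
hourly = orders.resample("H").size()

# complete hourly range over the same span; hours with no sales get a count of 0
full_range = pd.date_range(hourly.index.min(), hourly.index.max(), freq="H")
hourly = hourly.reindex(full_range, fill_value=0)

# hours of the day (0-23) during which no sale was ever made on any day
never_sold = sorted(set(range(24)) - set(hourly[hourly > 0].index.hour))
print(never_sold)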

Variables in a Postgres view?

I have a view in Postgres which queries a master table (150 million rows) and retrieves data from the prior day (a function which returns SELECT yesterday; it was the only way to get the view to respect my partition constraints) and then joins it with two dimension tables. This works fine, but how would I loop through this query in Python? Is there a way to make the date dynamic?
for date in date_range('2016-06-01', '2017-07-31'):
    (query from the view, replacing the date with the date in the loop)
My workaround was to literally copy and paste the entire view as a huge select statement format string, and then pass in the date in a loop. This worked, but it seems like there must be a better solution to utilize an existing view or to pass in a variable which might be useful in the future.
To loop day by day over the interval with a for loop, you could do something like:
import datetime

initialDate = datetime.datetime(2016, 6, 1)
finalDate = datetime.datetime(2017, 7, 31)

for day in range((finalDate - initialDate).days + 1):
    current = (initialDate + datetime.timedelta(days=day)).date()
    print("query from the view, replacing the date with " + current.strftime('%m/%d/%Y'))
Replace the print with the call to your query. If the dates are strings, you can do something like:
initialDate = datetime.datetime.strptime("06/01/2016", '%m/%d/%Y')
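If the view is queried through psycopg2, the date from the loop can be passed as a parameter rather than formatted into the SQL string; a minimal sketch (the view name and connection string are assumptions):
import datetime
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string

initialDate = datetime.date(2016, 6, 1)
finalDate = datetime.date(2017, 7, 31)

with conn.cursor() as cur:
    for day in range((finalDate - initialDate).days + 1):
        current = initialDate + datetime.timedelta(days=day)
        # "my_view" stands in for the existing view
        cur.execute("SELECT * FROM my_view WHERE date = %s;", (current,))
        rows = cur.fetchall()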

Python Count number of records within a given date range

We have a backend table that stores details of transactions, including seconds since epoch. I am creating a UI where I collect from-to dates to display counts of transactions that occurred between the dates.
Assuming that the date range is 07/01/2012 - 07/30/2012, I am unable to establish the logic that will increment a counter for records that happened within the time period. I should hit the DB only once, as hitting it for each day would give poor performance.
I am stuck on the logic:
Convert 07/01/2012 & 07/30/2012 to seconds since epoch.
Get the records for start date - end date [as converted to seconds since epoch]
For each record, get the month/date
-- now, how do we add counters for each date between 07/01/2012 and 07/30/2012?
MySQL has the function FROM_UNIXTIME, which will convert your seconds since epoch into a datetime; you can then extract the DATE part of it (YYYY-MM-DD format) and group by it.
SELECT DATE(FROM_UNIXTIME(timestamp_column)), COUNT(*)
FROM table_name
GROUP BY DATE(FROM_UNIXTIME(timestamp_column))
This will return something like
2012-07-01 2
2012-07-03 4
…
(no entries for days without transactions)
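On the Python side, a rough sketch of converting the endpoints to epoch seconds for the WHERE clause and zero-filling the days the query does not return (the rows variable stands in for whatever your DB driver returns):
import datetime as dt

start = dt.datetime(2012, 7, 1)
end = dt.datetime(2012, 7, 30)

# seconds since epoch, to bound the WHERE clause of the grouped query
start_epoch = int(start.timestamp())
end_epoch = int((end + dt.timedelta(days=1)).timestamp())

# suppose rows holds the (date, count) pairs returned by the query above
rows = [(dt.date(2012, 7, 1), 2), (dt.date(2012, 7, 3), 4)]

# zero-fill every day in the range, then overwrite with the returned counts
counts = {start.date() + dt.timedelta(days=i): 0
          for i in range((end - start).days + 1)}
counts.update(dict(rows))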
