Get range of dates across multiple years - python

I have a SQLite table which contains a datetime column of dates and a column of real numbers. Any ideas on how I can query to get all the years within a given date range?
Note the dates are stored as yyyy-mm-dd.
For example, if I had a table with all the dates from Jan 1 1990 - Dec 31 2022 and I wanted between Feb 1 and Feb 2, I would get
2022-02-01
2022-02-02
2021-02-01
2021-02-02
...
1991-02-01
1991-02-02
1990-02-01
1990-02-02
or if I query Jan 31 to Feb 2 I would get
2022-01-31
2022-02-01
2022-02-02
2021-01-31
2021-02-01
2021-02-02
...
1991-01-31
1991-02-01
1991-02-02
1990-01-31
1990-02-01
1990-02-02
I'm trying to do this using peewee for Python, but I'm not even sure how to write the SQL statement. From what I've seen, SQL has a BETWEEN operator, but on its own that won't work, as it would give me only one year when I want every record in the database whose month and day fall within the given range.

Since it appears that the author's sample data posted originally was not representative of the actual data, and the ACTUAL data is stored as YYYY-MM-DD, it is quite simple using SQLite's strftime():
from peewee import fn

start = '01-31'
end = '02-02'
query = (Reg.select()
         .where(fn.strftime('%m-%d', Reg.date).between(start, end)))
for row in query:
    print(row.date)
Should print something like:
2018-01-31
2018-02-01
2018-02-02
2019-01-31
...
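One caveat: this works because the '%m-%d' strings compare lexically and the range does not wrap past Dec 31. For a wrap-around range you would need an OR instead of BETWEEN; a minimal sketch against the same model (the Dec 29 - Jan 5 range is made up for illustration):
# Hypothetical wrap-around range: Dec 29 through Jan 5. BETWEEN cannot
# express this, so test each side of the year boundary separately.
md = fn.strftime('%m-%d', Reg.date)
query = Reg.select().where((md >= '12-29') | (md <= '01-05'))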

Consider filtering the query by building dates with date(), keeping the year of each row's own date. Date ranges that cross a year boundary may need to be split with a self-join or UNION (see the sketch after the query below):
SELECT *
FROM my_table
WHERE my_date BETWEEN date(strftime('%Y', my_date) || '-01-31')
                  AND date(strftime('%Y', my_date) || '-02-02')
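A rough sketch of the UNION variant for a range that crosses the year boundary, run through Python's sqlite3 module (the database file name is a placeholder; the table and column names are the ones used above):
import sqlite3

# Dec 29 - Jan 5 split into two per-year BETWEEN clauses joined by UNION;
# each row is compared against dates built from its own year.
sql = """
SELECT * FROM my_table
WHERE my_date BETWEEN date(strftime('%Y', my_date) || '-12-29')
                  AND date(strftime('%Y', my_date) || '-12-31')
UNION
SELECT * FROM my_table
WHERE my_date BETWEEN date(strftime('%Y', my_date) || '-01-01')
                  AND date(strftime('%Y', my_date) || '-01-05')
"""
conn = sqlite3.connect('my.db')  # hypothetical file name
for row in conn.execute(sql):
    print(row)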

To get the years within a date range in a SQLite table, you can use the strftime() function to extract the year from the date column, and the DISTINCT keyword to return only unique years.
SELECT DISTINCT strftime('%Y', date_column) as year
FROM table_name
WHERE date_column BETWEEN '2022-01-01' AND '2022-12-31'
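For completeness, the same query run from Python's sqlite3 module (the database file name is a placeholder):
import sqlite3

conn = sqlite3.connect('my.db')  # hypothetical file name
rows = conn.execute(
    "SELECT DISTINCT strftime('%Y', date_column) AS year "
    "FROM table_name "
    "WHERE date_column BETWEEN '2022-01-01' AND '2022-12-31'")
years = [r[0] for r in rows]
print(years)  # e.g. ['2022']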

I accidentally ran into this solution (it was a problem for me) working in SQL Server, where I was trying to get records created in the last week:
SELECT *
FROM table1
WHERE DATEPART(WK, table1.date_entered) = DATEPART(WK, GETDATE()) - 1
This returns everything that was created in week now-1 of all previous years, similar to SQLite's strftime('%W', my-date-here).
If the dates you are querying don't span a change in years (i.e. not Dec-29 --> Jan-5), you could do something like the below:
SELECT date_column
FROM myTable
WHERE strftime('%j', date_column) >= strftime('%j', **MY_START_DATE**)
AND strftime('%j', date_column) <= strftime('%j', **MY_END_DATE**)
Here we get the day of the year (that's what '%j' gives us) and select anything from date_column where the day of the year is between our start and end dates. If I understood your question properly, it will give you, as listed:
2022-02-01
2022-02-02
2021-02-01
2021-02-02
...
If you want further information on working with dates, I can't do better than refer you to "Robyn Page's SQL Server DATE/TIME Workbench".
Obviously that article is on SQL Server, but most of it translates to other DBs.
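Back in peewee, the same day-of-year comparison might look something like this (a sketch only; it assumes the Reg model from the earlier answer and fixed start/end dates, and is valid only for ranges within a single year):
from peewee import fn

start, end = '2022-01-31', '2022-02-02'
# Compare the zero-padded day-of-year ('%j') of each stored date against
# the day-of-year of the start and end dates.
doy = fn.strftime('%j', Reg.date)
query = Reg.select().where(
    (doy >= fn.strftime('%j', start)) & (doy <= fn.strftime('%j', end)))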

Related

How to query dates in pandas using pd.DataFrame.query()

When querying rows in a dataframe based on a date column's value, it works when comparing just the year, but not for a full date.
fil1.query('Date>2019')
This works fine. However when giving complete date, it fails.
fil1.query('Date>01-01-2019')
#fil1.query('Date>1-1-2019') # fails as well
TypeError: Invalid comparison between dtype=datetime64[ns] and int
What is the correct way to use dates in the query function? The docs don't seem to help.
There are two errors in your code: the default date format is yyyy-mm-dd, and you should put quotes ("") around date values.
fil1.query('Date>"2019-01-01"')
Query on dates works for me
df = pd.DataFrame({'Dates': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04']})
df
Out[101]:
        Dates
0  2022-01-01
1  2022-01-02
2  2022-01-03
3  2022-01-04
df.query('Dates>"2022-01-02"')
Out[102]:
        Dates
2  2022-01-03
3  2022-01-04
If you filter only the year, this is probably an integer value, so this works:
fil1.query('year > 2019')
A full date-string must be filtered with quotation marks, e.g.
fil1.query('date > "2019-01-01"')
It's a bit like SQL: there you also cannot filter with WHERE date > 1-1-2019; you need to do WHERE date > '2019-01-01'.

Get max of column data for entire day 7 days ago

Using pyodbc for Python and a PostgreSQL database, I am looking to gather the max of the data for a specific day, 7 days before the script's run date. The table has the following columns which may prove useful: Timestamp (yyyy-MM-dd hh:mm:ss.ffff), year, month, day.
I've tried the following few approaches:
mon = currentDate - dt.timedelta(days=7)
monPeak = cursor.execute("SELECT MAX(total) FROM {} WHERE timestamp = {};".format(tableName, mon)).fetchval()
Error 42883: operator does not exist: timestamp with timezone = integer
monPeak = cursor.execute("SELECT MAX(total) FROM {} WHERE timestamp = NOW() - INTERVAL '7 DAY'".format(tableName)).fetchval()
No error, but the value is returned as 'None' (I didn't think this was a viable solution anyway, because I want the max of the entire day, not that specific time).
I've tried a few different ways of incorporating the year, date, and time columns from the db table, but no good. The goal is to gather the max of the data for every day of the prior week. Any ideas?
You have to cast the timestamp to a date if you want to do date comparisons.
WHERE timestamp::date = (NOW() - INTERVAL '7 DAY')::date
Note that timestamptz to date conversions are not immutable, they depend on your current timezone setting.
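A minimal sketch of that predicate from pyodbc, passing the target day as a bound parameter instead of formatting it into the SQL string (the connection string and table name here are placeholders):
import datetime as dt
import pyodbc

conn = pyodbc.connect('DSN=mydb')  # hypothetical DSN
cursor = conn.cursor()

target_day = dt.date.today() - dt.timedelta(days=7)

# Cast the timestamp to a date so the whole day matches, and bind the
# date as a parameter rather than interpolating it into the string.
mon_peak = cursor.execute(
    "SELECT MAX(total) FROM my_table WHERE timestamp::date = ?",
    target_day).fetchval()
print(mon_peak)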

How to get the average of a column that is connected to a datetime column in mysql?

Hello, I am trying to get the average of column tc for a specific date, as well as its average for each different date. Is there a MySQL query for it? Here is my weatherdata table:
tc    date
31    2022-03-11
35    2022-03-13
41    2022-03-14
100   2022-03-15
My current attempt at the MySQL query is this:
select round(avg(tc),0),date_format(dtime,'%m/%d/%Y') as timeee from weatherdata where DATE(dtime) BETWEEN '2022-03-13' AND '2022-03-15';
I am trying to achieve this using Python and Matplotlib, where the dates from MySQL are shown on the x axis of the graph and the y values plotted are the average of column tc for each different date.
Hopefully someone can help me, please. Thanks; still learning.
It sounds to me like you want an average for each day. You probably want to use a GROUP BY clause:
SELECT
    DATE_FORMAT(dtime, '%m/%d/%Y') AS timeee,
    ROUND(AVG(tc), 0)
FROM weatherdata
WHERE DATE(dtime) BETWEEN '2022-03-13' AND '2022-03-15'
GROUP BY 1
ORDER BY DATE(dtime);
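From there, plotting the per-day averages with Matplotlib might look roughly like this (the connection details are placeholders, and mysql.connector is just one possible driver):
import matplotlib.pyplot as plt
import mysql.connector

conn = mysql.connector.connect(user='user', password='secret',
                               database='mydb')  # hypothetical credentials
cursor = conn.cursor()
cursor.execute("""
    SELECT DATE_FORMAT(dtime, '%m/%d/%Y') AS timeee, ROUND(AVG(tc), 0)
    FROM weatherdata
    WHERE DATE(dtime) BETWEEN '2022-03-13' AND '2022-03-15'
    GROUP BY 1
    ORDER BY DATE(dtime)
""")
dates, averages = zip(*cursor.fetchall())

plt.bar(dates, [float(a) for a in averages])  # AVG() comes back as Decimal
plt.xlabel('date')
plt.ylabel('average tc')
plt.show()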

How to retrieve a certain day of the month for each row based on a dataframe value?

I am trying to replace some hardcoded SQL queries related to timezone changes with a more dynamic/data-driven Python script. I have a dataset that looks like this spreadsheet below. WEEK_START/DAY/MONTH is the week, day, and month when daylight savings time begins (for example Canberra starts the first Sunday of April while Vienna is the last Sunday of March). The end variables are in the same format and display when it ends.
Dataset
Here is the issue. I have seen solutions for specific use cases such as this, finding the last Sunday of the month:
import calendar
from datetime import date

today = date.today()
current_year = today.year
current_month = today.month
current_day = today.day
month = calendar.monthcalendar(current_year, current_month)
day_of_month = max(month[-1][calendar.SUNDAY], month[-2][calendar.SUNDAY])
print(day_of_month)
31
This tells me that the last Sunday of this month falls on the 31st. I can adjust the attributes for one given month/scenario, but how would I make a column that retrieves this for each and every row (city)? That is, for several cities that change times on different dates. I thought that if I could set the attributes of day_of_month in an apply function it would work, but when I do something like weekday='SUNDAY' it returns an error, because of course the string 'SUNDAY' is not the same as SUNDAY, the attribute of calendar. My SQL queries are grouped by cities that change on the same day, but ideally anyone would be able to edit the CSV that loads the above dataset as needed, and then the script would run once each day to see if today is between the start and end of daylight savings. We might have new cities to add in the future. I'm confident in doing that bit but quite lost on how to retrieve the dates for a given year.
My alternate, less resilient, option is to look at the distinct list of potential dates (last Sunday of March, first Sunday of April, etc.), write code to retrieve each one upfront (as in the above snippet above), and assign the dates in that way. I say that this is less resilient because if a city is added that does not fit in an existing group for time changes, the code would need to be altered as well.
So stackoverflow, is there a way to do this in a data driven way in pandas through an apply or something similar? Thanks in advance.
Basically, I think you have most of what you need. Just map the WEEK_START/WEEK_END column {-1, 1} to the last or first day of the month, put it all in a function, and apply it to each row. EX:
import calendar
import operator
import pandas as pd

def get_date(year: int, month: int, dayname: str, first=-1) -> pd.Timestamp:
    """
    get the first or last day "dayname" in given month and year.
    returns last by default.
    """
    daysinmonth = calendar.monthcalendar(year, month)
    getday = operator.attrgetter(dayname.upper())
    if first == 1:
        # fall back to week 1 in case the day is absent (0) in week 0
        day = daysinmonth[0][getday(calendar)] or daysinmonth[1][getday(calendar)]
    else:
        day = max(daysinmonth[-1][getday(calendar)], daysinmonth[-2][getday(calendar)])
    return pd.Timestamp(year, month, day)
year = 2021  # we need a year...

df['date_start'] = df.apply(lambda row: get_date(year,
                                                 row['MONTH_START'],
                                                 row['DAY_START'],
                                                 row['WEEK_START']),  # selects first or last
                            axis=1)  # apply to each row
df['date_end'] = df.apply(lambda row: get_date(year,
                                               row['MONTH_END'],
                                               row['DAY_END'],
                                               row['WEEK_END']),
                          axis=1)
giving you for the sample data
df[['CITY', 'date_start', 'date_end']]
               CITY date_start   date_end
0          Canberra 2021-04-04 2021-10-03
1         Melbourne 2021-04-04 2021-10-03
2            Sydney 2021-04-04 2021-10-03
3         Kitzbuhel 2021-03-28 2021-10-31
4            Vienna 2021-03-28 2021-10-31
5           Antwerp 2021-03-28 2021-10-31
6          Brussels 2021-03-28 2021-10-31
7  Louvain-la-Neuve 2021-03-28 2021-10-31
Once you start working with time zones and DST transitions, the question "Is there a way to infer in Python if a date is the actual day in which the DST (Daylight Saving Time) change is made?" might also be interesting to you.

Distinguish between, for example, month February (02) and date 02 in pandas date column in Python

I am new to Python and I am working on a project where I work with timeseries data. I have a pandas dataframe containing the date of my dataset, a small example can be seen below (dates ranging for a whole year):
result_time: 2021-01-01 00:00:08, 2021-01-01 00:00:18, 2021-01-01 00:00:28...
I am processing this column to determine whether the specific date is a weekday or not. When processing moves to the second day of January, e.g. 2021-02-01 12:07:17, 2021-02-01 12:07:27, 2021-02-01 12:07:37, and so on, the day part of the date (02) is treated as the month February. I have tried to make it work but with no luck.
For example, I tried the following, but nothing works. Any advice will be much appreciated!
df_uci['result_time1'] = df_uci['result_time'].dt.strftime('%YYYY-%dd-%mm %HH:%mm:%ss')
df_uci['result_time1'] = pd.to_datetime(df_uci['result_time1'])
df_uci['Weekday1'] = df_uci['result_time1'].dt.day_name()
Try using the pd.to_datetime() function with the format argument. Here you can define how your data should be interpreted. (In particular, it seems your day comes before your month.)
In your case this should do it:
df_uci['result_time_as_datetime'] = pd.to_datetime(df_uci['result_time'], format="%Y-%d-%m %H:%M:%S")
df_uci['Weekday1'] = df_uci['result_time_as_datetime'].dt.day_name()
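A quick self-contained check (the sample strings below are made up in the question's year-day-month layout) shows the swap being handled:
import pandas as pd

df_uci = pd.DataFrame({'result_time': ['2021-02-01 12:07:17',
                                       '2021-03-01 12:07:27']})
df_uci['result_time_as_datetime'] = pd.to_datetime(
    df_uci['result_time'], format="%Y-%d-%m %H:%M:%S")
df_uci['Weekday1'] = df_uci['result_time_as_datetime'].dt.day_name()
print(df_uci[['result_time', 'Weekday1']])
# 2021-02-01 now parses as January 2nd 2021, a Saturday, not February 1st.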
