Get the closest date to a date in a mysql query - python

So I have a table in MySQL which stores a name and a date. I want to write a query that gets the closest date to a certain date I have determined. For example, I have:
x = datetime(2022, 1, 1)
This:
query = "SELECT date_ FROM set_payment7777 GROUP BY name"
mycursor.execute(query)
for cursor in mycursor:
    print(cursor)
is currently printing all the dates from the table grouped by name. I want to add something so that, for each name, it prints only the date that is closest to the variable x.
For instance, if the table contains the entries "06-06-2021, James", "06-07-2021, James", "04-04-2021, Helen" and "05-04-2021, Helen", it should print 06-07-2021 and 05-04-2021.
I hope you understand.

If I understand the problem correctly, you can try this solution, applying it to the result of your query:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'date': ['06/06/2021', '06/07/2021', '04/04/2021', '05/04/2021'],
                   'name': ['James', 'James', 'Helen', 'Helen']})
df['date'] = pd.to_datetime(df['date'])
data_point = datetime(2022, 1, 1)
# absolute distance of each date from the reference date
df['data_diff'] = (df['date'] - data_point).abs()
# the same distance expressed as a whole number of days
df['data_end'] = df['data_diff'].apply(lambda x: x.days)
# keep, for each name, the row with the smallest distance
closest = df.loc[df.groupby('name')['data_diff'].idxmin()]

Since the comparison date is greater than all the dates in the table, you can reduce the problem to finding the greatest date for each name:
SELECT name, MAX(date_)
FROM set_payment7777
GROUP BY name
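MAX only works here because the comparison date lies after every date in the table. When that is not guaranteed, the closest date per name can be picked in Python from the rows the cursor returns. The following is a minimal sketch, not the only way to do it: the helper closest_per_name is hypothetical, the rows are hard-coded stand-ins for (name, date_) tuples from the query, and the sample dates are read as DD-MM-YYYY, which is an assumption.

```python
from datetime import date

def closest_per_name(rows, target):
    """Return {name: date closest to target} for an iterable of (name, date) rows."""
    closest = {}
    for name, d in rows:
        best = closest.get(name)
        # Keep the date whose absolute distance to the target is smallest.
        if best is None or abs(d - target) < abs(best - target):
            closest[name] = d
    return closest

# Stand-in for rows fetched with "SELECT name, date_ FROM set_payment7777".
rows = [
    ("James", date(2021, 6, 6)),
    ("James", date(2021, 7, 6)),
    ("Helen", date(2021, 4, 4)),
    ("Helen", date(2021, 4, 5)),
]
target = date(2022, 1, 1)

result = closest_per_name(rows, target)
for name, d in result.items():
    print(name, d.isoformat())
```

Because the distance is taken as an absolute value, the same function also handles tables that contain dates after the target.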

Related

Trying not to hardcode date range in SQL query (Python, SQL server)

I am using Python to connect to SQL Server database and execute several 'select' type of queries that contain date range written in a particular way. All these queries have the same date range, so instead of hard-coding it, I'd prefer to have it as a string and change it in one place only when needed.
So far, I found out that I can use datetime module and the following logic to convert dates to strings:
from datetime import datetime
start_date = datetime(2020,1,1).strftime("%Y-%m-%d")
end_date = datetime(2020,1,31).strftime("%Y-%m-%d")
Example of the query:
select * from where xxx='yyy' and time between start_date and end_date
How can I make it work?
EDIT
my code:
import pandas as pd
import pyodbc
import sqlalchemy
from sqlalchemy import create_engine
from datetime import datetime
start_date = datetime(2020,1,1).strftime("%Y-%m-%d")
end_date = datetime(2020,1,31).strftime("%Y-%m-%d")
engine = create_engine("mssql+pyodbc://user:pwd#server/monitor2?driver=SQL+Server+Native+Client+11.0")
sql_query = """ SELECT TOP 1000
[mtime]
,[avgvalue]
FROM [monitor2].[dbo].[t_statistics_agg]
where place = 'Europe' and mtime between 'start_date' and 'end_date'
order by [mtime] asc;"""
df = pd.read_sql(sql_query, engine)
print(df)
Thank you all for your input. I have found a way to make the query work. The variables should be date objects:
from datetime import date
start_date = date(2020, 1, 1)
end_date = date(2020, 1, 31)
and SQL query like:
sql_query = f""" SELECT TOP 1000
[mtime]
,[avgvalue]
FROM [monitor2].[dbo].[t_statistics_agg]
where place = 'Europe' and mtime between '{start_date}' and '{end_date}'
order by [mtime] asc;"""
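Formatting the dates into the SQL text works, but it splices values directly into the statement. An alternative worth considering is to keep the dates in one place and pass them as query parameters, so the driver handles quoting. The sketch below uses an in-memory sqlite3 database purely as a stand-in for the SQL Server table (the sample rows are made up); the "?" placeholder style shown is the one sqlite3 and pyodbc both use, and pandas.read_sql accepts the values through its params argument.

```python
import sqlite3
from datetime import date

# The date range is defined once and reused by every query.
start_date = date(2020, 1, 1)
end_date = date(2020, 1, 31)

# In-memory stand-in for [monitor2].[dbo].[t_statistics_agg]; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_statistics_agg (place TEXT, mtime TEXT, avgvalue REAL)")
conn.executemany(
    "INSERT INTO t_statistics_agg VALUES (?, ?, ?)",
    [
        ("Europe", "2019-12-31", 0.5),
        ("Europe", "2020-01-15", 1.5),
        ("Europe", "2020-02-01", 2.5),
        ("Asia", "2020-01-10", 3.5),
    ],
)

# Placeholders instead of values pasted into the SQL string.
sql_query = """
    SELECT mtime, avgvalue
    FROM t_statistics_agg
    WHERE place = ? AND mtime BETWEEN ? AND ?
    ORDER BY mtime ASC
"""
params = ("Europe", start_date.isoformat(), end_date.isoformat())
rows = conn.execute(sql_query, params).fetchall()
print(rows)
```

Changing the range then means editing start_date and end_date only; the SQL text itself stays untouched.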

pandas read_sql. How to query with where clause of date field

I have a field month-year which is in datetime64[ns] format.
How do I use this field in a where clause to get rolling 12-month data (the past 12 months of data)?
The query below does not work, but I would like something that filters data for 12 months.
select * from ABCD.DEFG_TABLE where monthyear > '2019-01-01'
FYI: it is an Oracle database. If I can avoid hard-coding the value 2019-01-01, that would be great!
You need to use the datetime module and set the date format as below.
Just compute your relative date and, if you follow the YYYYMMDD date format, use strftime from datetime with the format string "%Y%m%d".
import datetime
import pandas as pd
from dateutil.relativedelta import relativedelta
query = "SELECT * FROM ng_scott.Emp"
# upper bound: today; lower bound: one year earlier
between_first = pd.Timestamp(datetime.date.today())
between_second = between_first - relativedelta(years=1)
# GET THE DATASET (engine is assumed to be an existing SQLAlchemy engine)
dataset = pd.read_sql(query, con=engine)
# KEEP ONLY THE ROWS FROM THE LAST 12 MONTHS
filtered_dataset = dataset[(dataset['DOJ'] <= between_first) & (dataset['DOJ'] > between_second)]
print(filtered_dataset)
You can do this with pure SQL.
The following expression dynamically computes the beginning of the month one year ago:
add_months(trunc(sysdate, 'month'), -12)
In words: take the first day of the current month and subtract 12 months from it.
You can just use it as a filter condition:
select * from ABCD.DEFG_TABLE where monthyear > add_months(trunc(sysdate, 'month'), -12)
NB: this assumes that monthyear is of datatype date.
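If the boundary has to be computed on the Python side instead (for example to pass it to the query as a bind parameter), the same "first day of the month, 12 months back" logic can be sketched with the standard library alone. The function name months_back_start is hypothetical and is not part of any library.

```python
from datetime import date

def months_back_start(today, months=12):
    """Return the first day of the month that lies `months` months before today's month."""
    # Count months on a single axis so that year boundaries need no special case.
    total = today.year * 12 + (today.month - 1) - months
    return date(total // 12, total % 12 + 1, 1)

# Mirrors add_months(trunc(sysdate, 'month'), -12) for a fixed example date.
boundary = months_back_start(date(2020, 3, 17))
print(boundary)  # 2019-03-01
```

The fixed date is used only to make the example reproducible; in practice date.today() would take its place.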

adding year to date format %a,%d/%m

I'm trying to scrape a table from this website:
https://www.epexspot.com/en/market-data/capacitymarket/capacity-table/
table = pd.read_html("https://www.epexspot.com/en/market-data/capacitymarket")[0]
Here's the output it gives:
I want to change the column names to the format %Y-%m-%d. The columns for the above table should be
2018-09-13, 2018-10-18, 2018-12-13, 2019-03-21, 2019-05-15, 2019-06-27, 2019-09-12
Any suggestion?
You can do this by iterating over table.columns and using the datetime module. Just make sure to call replace(year=2019); otherwise the default year (1900) will be used.
from datetime import datetime
table.columns = [datetime.strptime(column, '%a, %d/%m').replace(year=2019).strftime('%Y-%m-%d')
                 if '/' in column else column for column in table.columns]
You can use map if you don't like the long list comprehension:
def rename_col(col):
    if '/' in col:
        return datetime.strptime(col, '%a, %d/%m').replace(year=2019).strftime('%Y-%m-%d')
    return col
table.columns = list(map(rename_col, table.columns))
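Note that replace(year=2019) gives every column the same year, while the desired output in the question spans 2018 and 2019. One possible variation, sketched below, is to supply the year per column and parse it together with the day and month. The labels and the label-to-year pairing here are hypothetical examples in the '%a, %d/%m' layout; how the right year is determined for each real column is left open.

```python
from datetime import datetime

# Hypothetical column labels, each paired with the year it belongs to.
labels_with_years = [
    ("Thu, 13/09", 2018),
    ("Thu, 18/10", 2018),
    ("Thu, 21/03", 2019),
]

def to_iso(label, year):
    """Parse a label such as 'Thu, 13/09' with an explicit year and return 'YYYY-MM-DD'."""
    # Appending the year lets strptime build the full date in one step.
    parsed = datetime.strptime(f"{label}/{year}", "%a, %d/%m/%Y")
    return parsed.strftime("%Y-%m-%d")

iso_columns = [to_iso(label, year) for label, year in labels_with_years]
print(iso_columns)
```

Parsing the year as part of the string also avoids the intermediate 1900 date, which matters for a label like 29/02 because 1900 was not a leap year.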

Calculate recurring customer

I'm analyzing sales data from a shop and want to calculate the percentage of "first order customers" who turn into recurring customers in the following month.
I have a DataFrame with all the orders. This includes a customer id, a date and a flag if this is his/her first order. This is my data:
import pandas as pd
data = {'Name': ['Tom', 'nick', 'krish', 'Tom'],
'First_order': [1, 1, 1, 0],
'Date' :['01-01-2018', '01-01-2018', '01-01-2018', '02-02-2018']}
df = pd.DataFrame(data)
I would now create a list of all new customers in January and a list of all recurring customers in February and inner-join them. Then I have two numbers with which I could calculate the percentage.
But I have no clue how I could calculate this on a rolling basis for a whole year without looping over the data frame. Is there a nice pandas/Python way to do so?
The goal would be to have a new dataframe with the month and the percentage of recurring customers from the previous month.
One idea is to take all orders from January to November and add a column "recurr" that is True or False depending on whether the customer ordered again in the next month. You can then group by month, take the count and the sum of that column, and add a column with the ratio.
EDIT: before this you may need to convert dates:
df.Date = pd.to_datetime(df.Date)
Then:
df['month'] = df['Date'].apply(lambda x: x.month)  # month number only, for simplicity's sake; not hard to extend to MMYYYY
df1 = df[df.month != 12].copy()  # select everything but December, which has no following month in the data
df1 = df1[df1.First_order == 1].copy()  # keep only the first orders
df1['recurr'] = df1.apply(lambda x: True if len(df[(df.month == x.month + 1)&(df.Name == x.Name)])>0 else False, axis=1)  # True if the same person ordered again the following month
df2 = df1[['month','Name','recurr']].groupby('month').agg({'Name':'count','recurr':'sum'})
At this point, for each month, the "Name" column has number of first orders and "recurr" column has number of those that ordered again the following month. A simple extra column gives you percentage:
df2['percentage_of_recurring_customer'] = (df2.recurr/df2.Name)*100
EDIT: For any number of dates, here's a clumsy solution. Choose a start date and use that year's January as month 1, and number all months sequentially after that.
df.Date = pd.to_datetime(df.Date)
start_year = df.Date.min().year
def get_month_num(date):
    return (date.year-start_year)*12+date.month
Now that we have a function to convert dates, the slightly changed code:
df['month'] = df['Date'].apply(lambda x: get_month_num(x))
df1 = df[df.First_order == 1].copy()
df1['recurr'] = df1.apply(lambda x: True if len(df[(df.month == x.month + 1)&(df.Name == x.Name)])>0 else False, axis=1)
df2 = df1[['month','Name','recurr']].groupby('month').agg({'Name':'count','recurr':'sum'})
Finally, you can make a function to revert your month numbers into dates:
def restore_month(month_num):
    year = (month_num - 1) // 12 + start_year  # subtract 1 first so that month 12 stays in the same year
    month = (month_num - 1) % 12 + 1  # maps 1..12 back to 1..12 (a plain % 12 would turn December into 0)
    return pd.Timestamp(str(year)+'-'+str(month)+'-1')  # this returns the first of that month
df3 = df2.reset_index().copy() #removing month from index so we can change it.
df3['month_date'] = df3['month'].apply(lambda x: restore_month(x))
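The month numbering can be checked in isolation, without pandas. The sketch below uses plain datetime.date values and a fixed start year of 2018 (an assumption matching the sample data) to confirm that converting a date to a month number and back is lossless. December is the case worth testing, because 12 % 12 evaluates to 0.

```python
from datetime import date

START_YEAR = 2018  # assumed: the year of the earliest order in the sample data

def get_month_num(d, start_year=START_YEAR):
    """Number months sequentially, with January of start_year as month 1."""
    return (d.year - start_year) * 12 + d.month

def restore_month(month_num, start_year=START_YEAR):
    """Invert get_month_num and return the first day of that month."""
    # Shift to a 0-based count so that December does not roll into the next year.
    year = (month_num - 1) // 12 + start_year
    month = (month_num - 1) % 12 + 1
    return date(year, month, 1)

# Round trip over three years of month starts.
month_starts = [date(y, m, 1) for y in (2018, 2019, 2020) for m in range(1, 13)]
round_trip_ok = all(restore_month(get_month_num(d)) == d for d in month_starts)
print(round_trip_ok)
```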

Pandas equivalent to sql for month date time

I have a pandas dataframe that I need to filter, just like a SQL query, for a specific month. Every time I run the code I want it to grab data from the previous month, no matter what the specific day of the current month is.
My SQL code is here but I need pandas equivalent.
WHERE DATEPART(m, logged) = DATEPART(m, DATEADD(m, -1, getdate()))
df = pd.DataFrame({'month': ['1-05-01 00:00:00','1-06-01 00:00:00','1-06-01 00:00:00','1-05-01 00:00:00']})
df['month'] = pd.to_datetime(df['month'])
In this example, I only want the data from June.
Would definitely appreciate the help! Thanks.
Modifying based on the question edit:
df = pd.DataFrame({'month': ['1-05-01 00:00:00','1-06-01 00:00:00','1-06-01 00:00:00','1-05-01 00:00:00']})
df['month'] = pd.to_datetime(df['month'])
## To get it to the right format
import datetime as dt
df['month'] = df['month'].apply(lambda x: dt.datetime.strftime(x, '%Y-%d-%m'))
df['month'] = pd.to_datetime(df['month'])
## Extract the month from this date
df['month_ex'] = df.month.dt.month
## Get the current month; the latest month in the dataframe is the one before it (wrapping January back to December)
from datetime import datetime
currentMonth = datetime.now().month
previousMonth = 12 if currentMonth == 1 else currentMonth - 1
newDf = df[df.month_ex == previousMonth]
Output:
month month_ex
1 2001-06-01 6
2 2001-06-01 6
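Subtracting 1 from the current month number needs special care in January, and a month number alone cannot tell June of one year from June of another. A small sketch of another way to find the previous month is to step back one day from the first of the current month, which yields both the year and the month. The helper name previous_month is hypothetical.

```python
from datetime import date, timedelta

def previous_month(today):
    """Return (year, month) of the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    # The day before the 1st is always the last day of the previous month.
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.year, last_of_previous.month

print(previous_month(date(2021, 7, 15)))  # (2021, 6)
print(previous_month(date(2021, 1, 3)))   # (2020, 12)
```

In pandas the resulting pair can be compared against the .dt.year and .dt.month accessors of the datetime column. Unlike the DATEPART(m, ...) condition in the question, which matches on the month number only, that comparison also restricts the year.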
