How to get today - “6 months” date in PySpark(SQL) [duplicate] - python

This question already has answers here:
is there any pyspark function for add next month like DATE_ADD(date, month(int type))
(2 answers)
Closed 4 years ago.
I have a table that is updated every day, and I use it for analysis. I want a static window of 6 months of data as input for the analysis.
I know I can make a filter like this in SQL to have 6 months data every time I run the code.
date >= dateadd(mm, -6, getdate())
Can somebody suggest how I can do the same thing in PySpark? I can only think of this:
df.filter(col("date") >= date_add(current_date(), -6))
Thanks in advance!

date_add adds or subtracts a number of days, so the filter above only goes back 6 days. To shift by months, use add_months instead:
import pyspark.sql.functions as F
df.filter(F.col("date") >= F.add_months(F.current_date(), -6))
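The difference between shifting by days and shifting by months can be illustrated without a Spark session. The following is a minimal plain-Python sketch, not Spark code: months_ago is a hypothetical helper written for this illustration, and it assumes that a day which does not exist in the target month is clamped to that month's last day.

```python
import calendar
from datetime import date, timedelta

def months_ago(d: date, months: int) -> date:
    """Hypothetical helper: step back whole calendar months, clamping the day."""
    # Use a zero-based month index so the arithmetic wraps across year boundaries.
    index = d.year * 12 + (d.month - 1) - months
    year, month = divmod(index, 12)
    month += 1
    # Clamp to the last valid day of the target month (e.g. 31 Aug -> 28 Feb).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

today = date(2021, 8, 31)  # fixed date so the result is reproducible
six_days_back = today - timedelta(days=6)
six_months_back = months_ago(today, 6)

print(six_days_back)    # 2021-08-25
print(six_months_back)  # 2021-02-28
```

A filter built on the first value keeps less than a week of data, while the second keeps roughly half a year.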

Related

How can I have the exact date for a certain week of the year? [duplicate]

This question already has answers here:
Pandas: How to create a datetime object from Week and Year?
(4 answers)
Closed 4 months ago.
I have the following data:
week = [202001, 202002, 202003, ..., 202052]
Where the composition of the variable is [year - 4 digits] + [week - 2 digits] (so, the first row means it's the first week of 2020, and so on).
I want to transform this into a date-time variable [YYYY-MM-DD]. I'm not sure which day should represent each week :( maybe the Saturday of every week.
week_date = [2020-01-04, 2020-01-11, 2020-01-18, ...]
It seems like a simple sequence; nevertheless, I have some missing values in the data, so my n < number of weeks in 2020.
The main purpose of this conversion is that I can have a fit model to train in prophet. I also think I need no missing values when incorporating the data into prophet, so maybe the answer could be also adding 0 to my time series?
Any ideas? Thanks
Try:
import datetime
l = [202001, 202002, 202003, 202052]
out = [datetime.datetime.fromisocalendar(int(x[:4]), int(x[4:]), 6).strftime("%Y-%m-%d") for x in map(str, l)]
print(out)
outputs:
['2020-01-04', '2020-01-11', '2020-01-18', '2020-12-26']
Here I used 6 (Saturday) as the weekday, but choose whichever you want.
This makes a datetime object from the first and last part of each number after mapping them to a string, then outputs a string back with strftime and the right format.
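The question also asks about weeks that are missing from the data. A minimal sketch of one way to fill them, using only the standard library (Python 3.8+ for fromisocalendar): the observed values below are made up for illustration, week_code_to_date is a hypothetical helper, and whether 0 is the right fill value depends on what the series measures.

```python
from datetime import date, timedelta

def week_code_to_date(code: int, weekday: int = 6) -> date:
    """Convert a YYYYWW integer to the date of the given ISO weekday (6 = Saturday)."""
    year, week = divmod(code, 100)
    return date.fromisocalendar(year, week, weekday)

# Hypothetical observations keyed by week code; week 202003 is missing.
observed = {202001: 12, 202002: 7, 202004: 9}

by_date = {week_code_to_date(code): value for code, value in observed.items()}

# Walk from the first to the last observed week in 7-day steps, filling gaps with 0.
filled = []
current, last = min(by_date), max(by_date)
while current <= last:
    filled.append((current.isoformat(), by_date.get(current, 0)))
    current += timedelta(days=7)

print(filled)
```

The resulting list of (date, value) pairs has one row per week, so it can be loaded into a dataframe with a date column and a value column.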

How to parse an 8 digit number as a date? [duplicate]

This question already has answers here:
How to convert integer into date object python?
(4 answers)
How to convert a specific integer to datetime in python
(1 answer)
int to datetime in Python [duplicate]
(1 answer)
Closed 1 year ago.
I need to build a way to automate file names, and I was curious as to if there is a function or a quick way to take in 8 digits and output the corresponding Date.
Specifically I only need the Month and year.
Example: 03232021 -> Mar2021
I was trying pandas to_datetime but it didn't seem like it was what I needed.
In [132]: import datetime as dt
In [133]: s = "03232021"
In [134]: dt.datetime.strptime(s, "%m%d%Y").date()
Out[134]: datetime.date(2021, 3, 23)
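To get from the parsed date to the Mar2021 form the question asks for, strftime can format the month and year. A small sketch, assuming the input is always in MMDDYYYY order; note that %b yields the locale's abbreviated month name, so the English abbreviation relies on the default C locale being in effect.

```python
from datetime import datetime

s = "03232021"  # MMDDYYYY, as in the question
parsed = datetime.strptime(s, "%m%d%Y")

# %b = abbreviated month name (locale-dependent), %Y = four-digit year.
label = parsed.strftime("%b%Y")
print(label)
```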

Remove "days 00:00:00" from dataframe [duplicate]

This question already has answers here:
Pandas Timedelta in Days
(5 answers)
Closed 3 years ago.
So, I have a pandas dataframe with a lot of variables including start/end date of loans.
I subtract these two in order to get their difference in days.
The result I get has the form, e.g., 349 days 00:00:00.
How can I keep only for example the number 349 from this column?
Try this. A Series has no .days attribute of its own, so go through the .dt accessor:
df['date'] = pd.to_timedelta(df['date'], errors='coerce').dt.days
also, check .normalize() function in pandas.
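For the full flow described in the question (subtract two date columns, then keep only the day count), a minimal sketch follows. The column names start, end and days and the sample dates are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical loan dates; the column names are placeholders.
df = pd.DataFrame({
    "start": pd.to_datetime(["2020-01-01", "2020-03-15"]),
    "end": pd.to_datetime(["2020-12-15", "2020-03-20"]),
})

# Subtracting two datetime columns gives a timedelta column ("349 days").
duration = df["end"] - df["start"]

# .dt.days extracts the whole-day component as plain integers.
df["days"] = duration.dt.days
print(df["days"].tolist())
```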

How to use Pandas calculate # of weeks/days? [duplicate]

This question already has answers here:
Pandas Datetime: Calculate Number of Weeks Between Dates in Two Columns
(2 answers)
Closed 4 years ago.
Hi, I have a dataframe with date columns. I want to add a column that calculates how many weeks have passed since the contact. For example, today's date is 20-Sep-18, and I want to use this date in the calculation with the column.
Can anyone help me with this question? Thanks!
You can do it like this:
df['Contact Date']= pd.to_datetime(df['Contact Date'])
import datetime
df['How Many days'] = datetime.datetime.now() - df['Contact Date']
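The snippet above produces a timedelta column rather than a week count. A minimal sketch of going one step further to whole weeks, assuming the reference date from the question (20 Sep 2018) instead of the current time so the result is reproducible; the sample rows and the Weeks Since Contact column name are made up for illustration.

```python
import pandas as pd

# Made-up contact dates for illustration.
df = pd.DataFrame({"Contact Date": ["2018-09-06", "2018-08-01"]})
df["Contact Date"] = pd.to_datetime(df["Contact Date"])

# Fixed reference date taken from the question.
reference = pd.Timestamp("2018-09-20")

# Whole days elapsed, then integer-divide by 7 for completed weeks.
elapsed_days = (reference - df["Contact Date"]).dt.days
df["Weeks Since Contact"] = elapsed_days // 7
print(df["Weeks Since Contact"].tolist())
```

Integer division counts completed weeks only; dividing by 7 with / instead would give fractional weeks.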

Finding the day x years,months from a given time (Python) [duplicate]

This question already has answers here:
python getting weekday from an input date
(2 answers)
Closed 9 years ago.
I have a maths question here which I know how to solve using only pen and paper. It takes a while with that approach, mind you. Does anybody know how to do this using Python? I've done similar questions involving "dates" but none involving "days". Are any of you folk able to figure this one out?
The date on 25/11/1998 is a Wednesday. What is the day on 29/08/2030?
Can anyone at least suggest an algorithm?
Cheers
Use the wonderful datetime module:
>>> import datetime
>>> mydate = datetime.datetime.strptime('29/08/2030', '%d/%m/%Y')
>>> print(mydate.strftime('%A'))
Thursday
The algorithm is quite simple: there are always 7 days in a week. Calculate the number of days between the two dates, add it to the weekday of the given date, then take the sum modulo 7.
>>> from datetime import datetime
>>> given_day = datetime(1998, 11, 25)
>>> cal_day = datetime(2030, 8, 29)
>>> print(cal_day.weekday())  # Monday is 0, so 3 is Thursday
3
>>> print((given_day.weekday() + (cal_day - given_day).days) % 7)
3
