Apache Spark Query only on YEAR from "dd/mm/yyyy" format - python

I have more than 1 million records in an Excel file. I want to query the table using Python, but the date format is dd/mm/yyyy. I know that in MySQL the supported format is yyyy-mm-dd. I am not allowed to change the date format. Is there any way I could do it at run time, i.e. just query on the yyyy part of dd/mm/yyyy and fetch the records?
How do I query data in this format on the year only, not on the month or day?

Assuming the "date" is being received as a string, RIGHT(date, 4) will give you just the year.
(I see no need to reformat the string if you only need the data. Otherwise, see STR_TO_DATE().)
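A minimal sketch of the RIGHT(date, 4) idea, run against an in-memory SQLite table because sqlite3 ships with Python. Assumptions: the table name records, its columns and the sample rows are hypothetical, and SQLite has no RIGHT(), so substr(date, -4) stands in for it. In MySQL itself, RIGHT(date, 4) or YEAR(STR_TO_DATE(date, '%d/%m/%Y')) would play the same role.

```python
import sqlite3

# Hypothetical in-memory table standing in for the real data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(1, "1/12/2016"), (2, "23/12/2016"), (3, "05/03/2017")],
)


def ids_for_year(year):
    # SQLite has no RIGHT(); substr(date, -4) returns the last 4 characters,
    # which is the year in a dd/mm/yyyy (or d/mm/yyyy) string
    rows = conn.execute(
        "SELECT id FROM records WHERE substr(date, -4) = ? ORDER BY id",
        (year,),
    )
    return [row[0] for row in rows]


print(ids_for_year("2016"))  # [1, 2]
```

Because the year is always the last four characters, this also works for unpadded days such as 1/12/2016. Applying a function to the column in the WHERE clause generally prevents an index on that column from being used, which may matter at a million rows.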

Related

What's the correct datetime format for this string date generated by python?

I have a date such as '2022-08-30T11:53:52.204219' stored in a database. When I read it from the database its type is string, so I tried to convert it to a date type with this Python code:
datetime.strptime('2022-08-30T11:53:52.204219', "%Y-%m-%d'T'%H:%M:%S.%f")
I also tried this:
datetime.strptime('2022-08-30T11:53:52.204219', "yyyy-MM-dd'T'HH:mm:ssZ")
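But I always get this error: time data '2022-08-30T11:53:52.204219' does not match format "%Y-%m-%d'T'%H:%M:%S.%f"
I need help converting this date string into an actual date.
As per the comment, drop the quotes around T: in a strptime format, a literal character is written as-is, without quotes.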
from datetime import datetime
print(datetime.strptime('2022-08-30T11:53:52.204219', "%Y-%m-%dT%H:%M:%S.%f"))
Result:
2022-08-30 11:53:52.204219
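A short sketch comparing the corrected strptime call with datetime.fromisoformat (Python 3.7+), which parses this ISO 8601 layout without a format string; .date() then drops the time part if only the date is needed.

```python
from datetime import datetime

raw = "2022-08-30T11:53:52.204219"

# strptime: the T is a plain literal in the format string, no quotes
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f")

# fromisoformat parses the same ISO 8601 layout without a format string
iso_parsed = datetime.fromisoformat(raw)

print(parsed == iso_parsed)  # True
print(parsed.date())         # 2022-08-30
```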

Date format in pandas

I am changing the date format using the code below; however, when I export this DataFrame to Excel, the date appears as text (not as a date).
new_data['Expiry']=new_data['Expiry'].dt.strftime('%d-%b-%Y')
How can I change my code so that the column is also a date in Excel?
PS: I don't want a datetime, only a date.
new_data['Expiry']=pd.to_datetime(new_data['Expiry'], format = '%d-%m-%Y')
new_data['Expiry']=new_data['Expiry'].dt.date
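A minimal sketch of why the answer works, with a hypothetical two-row frame standing in for new_data: .dt.date turns each value into a datetime.date object rather than a formatted string, so it can be written to Excel as a date, whereas .dt.strftime produces plain strings.

```python
import pandas as pd

# Hypothetical sample data standing in for new_data
new_data = pd.DataFrame({"Expiry": ["30-08-2022", "01-12-2016"]})

# Parse the dd-mm-yyyy strings into datetime64 values
new_data["Expiry"] = pd.to_datetime(new_data["Expiry"], format="%d-%m-%Y")

# Keep only the calendar date: each cell becomes a datetime.date object
new_data["Expiry"] = new_data["Expiry"].dt.date

print(type(new_data["Expiry"].iloc[0]))  # <class 'datetime.date'>
```

To control how Excel displays these dates (e.g. 30-Aug-2022, as the original strftime('%d-%b-%Y') did), pandas' ExcelWriter accepts a date_format argument, e.g. pd.ExcelWriter("out.xlsx", date_format="DD-MMM-YYYY"); the file name is hypothetical and an Excel engine such as openpyxl must be installed.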

How to change date format (from yyyy-MM-dd to yyyy-MM)

I'm trying to change date format from yyyy-MM-dd to yyyy-MM.
Ultimately I want to be able to sum and group by month. So far the only working solution I found is adding concat(year(join_data["firstVisit"]), lit("-"), month(join_data["firstVisit"])).alias('firstVisitMonth') to my select statement, but it returns the column as a string and I can't sort it correctly.
Try date_format:
date_format(join_data["firstVisit"], 'yyyy-MM')
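To see why the zero-padded 'yyyy-MM' string sorts correctly where the concat version does not, here is a plain-Python stand-in (not Spark code): an f-string mimics concat(year, "-", month), and strftime("%Y-%m") mimics the 'yyyy-MM' output of date_format. The sample dates are made up.

```python
from datetime import date

visits = [date(2016, 2, 14), date(2016, 10, 3), date(2016, 9, 30)]

# Like concat(year, "-", month): the month is not zero-padded
concat_keys = sorted(f"{d.year}-{d.month}" for d in visits)

# Like a 'yyyy-MM' pattern: the month is always two digits
padded_keys = sorted(d.strftime("%Y-%m") for d in visits)

print(concat_keys)  # ['2016-10', '2016-2', '2016-9']  (string order, wrong)
print(padded_keys)  # ['2016-02', '2016-09', '2016-10']  (chronological)
```

If you would rather keep a genuine date column for grouping and sorting, pyspark.sql.functions.trunc(col, "month") returns the first day of each month as a date.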

Python: Reading Excel and automatically turning a string into a Date object?

I'm using the openpyxl library in Python and I'm trying to read the value of a cell. The cell's value is a date in the format MM/DD/YYYY. I would like the value to be read into my script simply as a string (i.e. "8/6/2014"), but instead Python automatically reads it as a date object (the result is "2014-08-06 00:00:00"). I don't know if this is something I need to fix in Excel or in Python, but how do I get the string I'm looking for?
I would suggest changing it in Excel if you want to preserve what openpyxl reads in. That said, once a cell has been formatted as a date in Excel, its contents are altered to fit the specified format, so the original string is lost in either case.
For example, if a user enters 1/1/2018 into a cell formatted as MM/DD/YYYY, Excel changes it to 01/01/2018 and the string that was originally entered is lost.
If you only need to see the data in the form MM/DD/YYYY, an alternative is to format the date with date_cell.strftime("%m/%d/%Y").
I found out how to fix it with these lines of code:
from datetime import datetime
dateString = str(ws.cell(row=row, column=column).value.date())
newDate = datetime.strptime(dateString, "%Y-%m-%d").strftime("%m/%d/%Y")
The string "newDate" gives me the format "08/06/2014" (strftime zero-pads the month and day).
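Because openpyxl already returns a datetime for a date-formatted cell, the round trip through str() and strptime() is not strictly necessary. The sketch below uses a hard-coded datetime as a stand-in for ws.cell(...).value and shows that strftime zero-pads, while building the string from the attributes gives the unpadded "8/6/2014" the question asked for.

```python
from datetime import datetime

# Stand-in for ws.cell(row=row, column=column).value, which openpyxl
# returns as a datetime for a date-formatted cell
cell_value = datetime(2014, 8, 6)

# strftime always zero-pads %m and %d
padded = cell_value.strftime("%m/%d/%Y")

# Building the string from the attributes avoids the padding
unpadded = f"{cell_value.month}/{cell_value.day}/{cell_value.year}"

print(padded)    # 08/06/2014
print(unpadded)  # 8/6/2014
```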

Redshift COPY Statement Date load error

I am loading the data using the COPY command.
My dates are in the following formats:
D/MM/YYYY eg. 1/12/2016
DD/MM/YYYY eg. 23/12/2016
The target column's data type is DATE. I am getting the following error: "Invalid Date Format - length must be 10 or more"
As per the AWS Redshift documentation,
The default date format is YYYY-MM-DD. The default time stamp without time zone (TIMESTAMP) format is YYYY-MM-DD HH:MI:SS.
So, because your dates are not in that format and vary in length, you are getting this error. Append the following to the end of your COPY command and it should work:
[[COPY command as you are using right now]] + DATEFORMAT 'DD/MM/YYYY'
I'm not sure about the single-digit day case, though. You might want to pad those incoming values with a leading 0 to match the format length.
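One way to do that padding, sketched in Python and assuming you can preprocess the file before it is uploaded for COPY (the helper names are hypothetical): strptime's %d accepts one- or two-digit days, and strftime always writes two. Converting to YYYY-MM-DD instead would match Redshift's default format, so no DATEFORMAT option would be needed.

```python
from datetime import datetime


def pad_ddmmyyyy(value):
    # Hypothetical helper: "1/12/2016" -> "01/12/2016"
    return datetime.strptime(value, "%d/%m/%Y").strftime("%d/%m/%Y")


def to_iso(value):
    # Hypothetical helper: rewrite to YYYY-MM-DD, Redshift's default date format
    return datetime.strptime(value, "%d/%m/%Y").strftime("%Y-%m-%d")


print(pad_ddmmyyyy("1/12/2016"))  # 01/12/2016
print(to_iso("23/12/2016"))       # 2016-12-23
```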
