Writing Pandas dataframe to Snowflake but issue with Date column - python

I have a Pandas dataframe with a date column (e.g., 29-11-2019). But when I write the dataframe to Snowflake it throws an error like this:
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 100040 (22007): Date '29-11-2019' is not recognized
I've tried changing the datatype to datetime:
df['REPORTDATE'] = df['REPORTDATE'].astype('datetime64[ns]')
and I get this error:
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 100035 (22007): Timestamp '00:24.3' is not recognized
I'd appreciate your help.

Snowflake enforces a strict date format: dates are expected as YYYY-MM-DD. Any other format is not going to be recognized, and "odd" dates like 0000-00-00 are rejected as well.
You can try changing DATE_INPUT_FORMAT in the session to 'DD-MM-YYYY' and see if that fixes anything. Otherwise you'd have to re-format your source data (my guess would be strftime("%Y-%m-%d")). If there is an hour/minute/second piece in it, be aware that Snowflake's DATE type truncates it anyway.
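For example, a minimal sketch of the fix on the pandas side, assuming REPORTDATE arrives as DD-MM-YYYY strings as in the error message:

import pandas as pd

# Parse the DD-MM-YYYY strings explicitly, then keep only the date part,
# so Snowflake receives values in its default YYYY-MM-DD form.
df['REPORTDATE'] = pd.to_datetime(df['REPORTDATE'], format='%d-%m-%Y').dt.date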

Related

Unable to format timestamp in pyspark

I have a CSV data like the below:
time_value,annual_salary
5/01/2019 1:02:16,120.56
06/01/2019 2:02:17,12800
7/01/2019 03:02:18,123.00
08/01/2019 4:02:19,123isdhad
Now, I want to convert time_value to a timestamp column. So, I created a view over these records and tried to convert it, but it throws an error:
spark.sql("select to_timestamp(time_value,'M/dd/yyyy H:mm:ss') as time_value from table")
Error:
Text '5/1/2019 1:02:16' could not be parsed
According to the error I am seeing there, this is a date format issue.
Text '5/1/2019 1:02:16' could not be parsed
But your time format is specified as such
'M/dd/yyyy H:mm:ss'
You can see that the day portion of the value is a single digit (/1/), but your format uses dd, which expects two digits.
Please try the following format:
'M/d/yyyy H:mm:ss'
I tried your SQL with no problem; it may be an issue with your Spark version. I used 2.4.8.
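A runnable sketch of the corrected call, with hypothetical sample rows standing in for the CSV (the view name table comes from the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Hypothetical rows mirroring the CSV in the question.
rows = [('5/01/2019 1:02:16',), ('06/01/2019 2:02:17',), ('7/01/2019 03:02:18',)]
spark.createDataFrame(rows, ['time_value']).createOrReplaceTempView('table')
# 'd' accepts one- or two-digit days, so both 5/01 and 06/01 parse.
spark.sql("select to_timestamp(time_value, 'M/d/yyyy H:mm:ss') as time_value from `table`").show(truncate=False)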

querying Elasticsearch for parse date field with format

I am querying Elasticsearch by date, passing in a date-time string in the format yyyy-mm-dd hh:mm:ss, but Elasticsearch is unable to parse this format.
I am writing a script that takes inputs and queries Elasticsearch based on them, primarily by index and date-time. I've written the script using command-line arguments, entering the date-time in the same format, and it runs perfectly. However, when I convert the script to run with hardcoded inputs, this error appears:
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'failed to parse date field [2019-07-01 00:00:00] with format [strict_date_optional_time||epoch_millis]')
#this throws the error
runQueryWithoutCommandLine("log4j-*", "2019-07-01 00:00:00", "csv", "json")
#this does not throw error
def runQueryWithCommandLine(*args):
# "yyyy-mm-dd hh:mm:ss" date-time format is given in commandline
Why is this error appearing, and how can I get rid of it? Thank you!
The date format "strict_date_optional_time||epoch_millis" in Elasticsearch uses the ISO date format standard.
Per that standard, the string representation of a date is:
date-opt-time = date-element ['T' [time-element] [offset]]
In your case, the time portion is separated by whitespace rather than by 'T', hence the parsing error.
In addition, since the time you mention is 00:00:00, you can simply omit it, as that is the default when no time portion is specified.
So, any of below date value will work:
1) 2019-07-01T00:00:00
2) 2019-07-01
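Given the above, the fix on the question's side is a one-character change to the hardcoded string; runQueryWithoutCommandLine is the question's own function:

# 'T' separates date and time; since the time is midnight it can also be dropped.
runQueryWithoutCommandLine("log4j-*", "2019-07-01T00:00:00", "csv", "json")
runQueryWithoutCommandLine("log4j-*", "2019-07-01", "csv", "json")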

Redshift COPY Statement Date load error

I am loading data using the COPY command.
My dates are in the following formats:
D/MM/YYYY e.g. 1/12/2016
DD/MM/YYYY e.g. 23/12/2016
My target table column type is DATE. I am getting the following error: "Invalid Date Format - length must be 10 or more".
As per the AWS Redshift documentation,
The default date format is YYYY-MM-DD. The default time stamp without
time zone (TIMESTAMP) format is YYYY-MM-DD HH:MI:SS.
So, since your dates are not in that format and have a different length, you are getting this error. Append the following to the end of your COPY command and it should work:
[[COPY command as you are using right now]] + DATEFORMAT 'DD/MM/YYYY'
I'm not sure about the single-digit day case, though. You might want to left-pad the incoming values with a 0 to match the format length.
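A hedged sketch of the full command run from Python with psycopg2; the table, S3 path, IAM role, and connection details are all placeholders:

import psycopg2

conn = psycopg2.connect(host='my-cluster.example.com', port=5439, dbname='mydb', user='me', password='secret')
with conn, conn.cursor() as cur:
    # DATEFORMAT is the relevant clause; everything else is hypothetical.
    cur.execute("""
        COPY my_table
        FROM 's3://my-bucket/dates.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-copy-role'
        CSV
        DATEFORMAT 'DD/MM/YYYY';
    """)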

Apache Spark Query only on YEAR from "dd/mm/yyyy" format

I have more than 1 million records in an Excel file. I want to query the table using Python, but the date format is dd/mm/yyyy. I know that in MySQL the supported format is yyyy-mm-dd. I am restricted from changing the date format, so is there any way to do it at run time: query just on the yyyy part of dd/mm/yyyy and fetch the records?
How do I query such a format on the year only, not on the month or day?
Assuming the "date" is being received as a string, then RIGHT(date, 4) will give you just the year.
(I see no need to reformat the string if you only need the data. Otherwise see STR_TO_DATE()
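Since the title mentions Spark, here is a minimal PySpark sketch of the same run-time idea, using a hypothetical two-row column named date:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('23/12/2016',), ('01/05/2017',)], ['date'])
# Parse dd/MM/yyyy at run time and filter on the year alone;
# F.substring('date', -4, 4) works too if the column must stay a string.
df.filter(F.year(F.to_date('date', 'dd/MM/yyyy')) == 2016).show()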

psycopg2, inserting timestamp without time zone

I am having an issue inserting data into a table in postgresql using psycopg2.
The script does the following:
Queries data out of a Postgres database
Does some math using numpy
And then I would like to re-insert the data into another table in the database. Here is the code to insert the data:
cur.executemany("INSERT INTO water_level_elev (hole_name,measure_date,water_level_elev,rid) VALUES (%s,%s,%s,%s);",[(hole.tolist(),m_date.tolist(),wl.tolist(),rid.tolist(),)])
The script throws the following error:
psycopg2.ProgrammingError: column "measure_date" is of type timestamp without time zone but expression is of type timestamp without time zone[]
LINE 1: INSERT INTO water_level_elev (hole_name,measure_date,water_l...
^
HINT: You will need to rewrite or cast the expression.
I'm confused... The column "measure_date" and the data I'm trying to insert are of the same type. What's the issue?
Thanks!
Try it without the tolist() on m_date.
It's not really possible to answer this completely without seeing the table schema for your water_level_elev table or the source of the tolist method. However, it sounds like PostgreSQL is expecting a measure_date value that is a single timestamp but is getting a list of timestamps; that is why the error message has [] on the end of the second type. This appears to happen because your code calls tolist on whatever is in m_date, which most likely converts it into a list of timestamps.
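A sketch of the likely fix: build one parameter tuple per row instead of a single tuple of whole arrays (hole, m_date, wl, and rid are the NumPy arrays from the question):

# zip pairs the arrays element-wise, so executemany receives one tuple per row
# rather than one tuple of lists (which psycopg2 adapts to SQL arrays).
rows = list(zip(hole.tolist(), m_date.tolist(), wl.tolist(), rid.tolist()))
cur.executemany("INSERT INTO water_level_elev (hole_name,measure_date,water_level_elev,rid) VALUES (%s,%s,%s,%s);", rows)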
