I am aware that this question has been asked before, but I still have a few doubts. I have a datetime.date (e.g. mydate = date(2014, 5, 1)) which I converted to a string and saved in the DB as a column (dtype: object) in a table. Now I want to change the storage of dates from text to timestamp in the DB. I tried this:
For example, my table is tab1, and I read it into Python as a dataframe df.
# convert the stored text column back to datetime
df['X'] = pd.to_datetime(df['X'])
When I check the dtype in the Python editor with df.info(), the dtype of X is datetime64[ns], but when I save this to MySQL and read it back as a dataframe in Python, the dtype changes to object. I have the datatype as DATETIME in MySQL, but I need it as the TIMESTAMP datatype in MySQL. Is there any way to do that? Also, I need only the date from Timestamp('2014-05-01 00:00:00'), excluding the time.
The problem is that when you read the serialized value from MySQL, the Python MySQL connector does not convert it. You have to convert it to a datetime value after reading the data from the cursor, by calling the conversion again on the retrieved data:
df['X'] = pd.to_datetime(df['col'])
As suggested, I changed the column type directly by using the dtype argument of the to_sql() function while inserting into the database. So now I can have the datatypes TIMESTAMP, DATETIME, and also DATE in MySQL.
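For reference, a minimal sketch of that approach, assuming a SQLAlchemy engine with the pymysql driver (the connection string, table name, and column name are placeholders):

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import TIMESTAMP

# Placeholder connection string; substitute your own credentials.
engine = create_engine('mysql+pymysql://user:password@localhost/mydb')

df = pd.DataFrame({'X': pd.to_datetime(['2014-05-01'])})

# dtype maps dataframe columns to explicit MySQL column types on insert;
# use DATETIME or DATE from sqlalchemy.types for those storage types instead.
df.to_sql('tab1', engine, if_exists='replace', index=False, dtype={'X': TIMESTAMP})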
This works for me:
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df['timestamp'] = df['Date'].apply(lambda x: x.timestamp())
Related
I have a datetime object that I want to INSERT into an SQL database TIMESTAMP column.
However, the procedure always fails due to a formatting mismatch.
The datetime object:
date = datetime.datetime.strptime('2021-07-21 00:00:00', '%Y-%m-%d %H:%M:%S')
I tried to format it this way:
datetime.datetime.timestamp(date)
But it didn't work out, I think because the timestamp() function returns a float.
Is there a way to properly convert the datetime object to a timestamp?
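A minimal sketch of one approach, assuming the mysql-connector-python driver and a hypothetical table events with a TIMESTAMP column created_at: most DB-API drivers accept a datetime object directly as a bound parameter, so no manual formatting or timestamp() conversion is needed.

import datetime
import mysql.connector  # assumption: mysql-connector-python is the driver in use

conn = mysql.connector.connect(host='localhost', user='user',
                               password='password', database='mydb')
cur = conn.cursor()

date = datetime.datetime.strptime('2021-07-21 00:00:00', '%Y-%m-%d %H:%M:%S')

# Let the driver serialize the datetime instead of building the SQL string yourself.
cur.execute("INSERT INTO events (created_at) VALUES (%s)", (date,))
conn.commit()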
I have a timezone-aware column in my dataframe, and when I run dtypes I get the output datetime64[ns, pytz.FixedOffset(60)].
I am writing a script to ensure that the column data is definitely of type datetime64[ns] before I add it to my database.
However when I go to check the column dtype using the following if statement:
df['date'].dtypes != 'datetime64[ns, pytz.FixedOffset(60)]'
I get the following error message:
TypeError: Invalid datetime unit in metadata string "[ns, pytz.FixedOffset(60)]"
So basically... how do I confirm that a column in a pandas dataframe is of some datetime64 type when the column is timezone-aware?
P.S. My timezone is London/UTC. I have not done any extra formatting/parsing on the column other than parse_dates in read_csv() and:
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%dT%T%z')
Is there something I am missing?
I had a similar problem and solved it this way:
# pd.DatetimeTZDtype is the public alias for pd.core.dtypes.dtypes.DatetimeTZDtype
isinstance(df['col_name'].dtype, pd.DatetimeTZDtype)
df['date'].dtypes is an object. You should be able to str() it and then compare:
str(df['date'].dtypes) != 'datetime64[ns, pytz.FixedOffset(60)]'
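For illustration, a small sketch combining both checks (the sample value and fixed +60-minute offset are assumptions):

import pandas as pd

# Build a timezone-aware series with a fixed +01:00 offset, as in the question.
s = pd.to_datetime(pd.Series(['2021-07-21T00:00:00+01:00']),
                   format='%Y-%m-%dT%H:%M:%S%z')

# Type-based check: True for any timezone-aware datetime column.
print(isinstance(s.dtype, pd.DatetimeTZDtype))

# String-based check: works regardless of the specific offset.
print(str(s.dtype).startswith('datetime64'))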
I'm trying to push data from a dataframe to Google BigQuery.
I set the date field of my dataframe as
df['time'] = df['time'].astype('datetime64[ns]')
and I set the corresponding BigQuery column type to DATETIME. When I do the export from Python to BigQuery, I get this error:
InvalidSchema: Please verify that the structure and data types in the
DataFrame match the schema of the destination table.
If I make everything into string format, it works. I don't think you can set a dataframe field to just a date, right? Is there a clever way to get this working, or do dates have to be set as strings?
TIA.
I found that data loading with DATE and DATETIME type columns was not working, so I tried the TIMESTAMP datatype instead and could then load the data into the BigQuery table.
While defining the schema, define date columns as TIMESTAMP, as below:
bigquery.SchemaField('dateofbirth', 'timestamp')
and convert the dataframe column datatype from object to a datetime format which BigQuery can understand:
df.dateofbirth = df.dateofbirth.astype('datetime64[ns]')  # an explicit unit is required in newer pandas
As of 8-Mar-2019, the DATE and DATETIME column types are not working.
Changing the DATETIME datatype to TIMESTAMP in the BigQuery schema will give you a time value with UTC attached. This might not be the ideal scenario for many of us. Instead, try the code below:
from google.cloud import bigquery

# Assumes bigquery_client, table_schema, and table_id are already defined.
job_config = bigquery.LoadJobConfig(
    schema=table_schema, source_format=bigquery.SourceFormat.CSV
)
load_job = bigquery_client.load_table_from_dataframe(
    dataframe, table_id, job_config=job_config
)
Is there any way to convert string timestamps in a pyarrow table to a datetime format before writing to a parquet file?
Depending on the timestamp format, you can make use of the pyarrow.compute.strptime function. It is not well documented yet, but you can use something like this:
import pyarrow.compute as pc
pc.strptime(table.column("Timestamp"), format='%Y-%m-%d %H:%M:%S', unit='s')
provided your data is stored in table and "Timestamp" is the name of the column with timestamp strings.
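For context, a self-contained sketch of the whole round trip (the column name, sample values, and output file name are assumptions):

import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.parquet as pq

table = pa.table({"Timestamp": ["2014-01-31 05:47:00", "2014-02-01 12:00:00"]})

# Parse the strings into a timestamp column, then swap it into the table.
parsed = pc.strptime(table.column("Timestamp"), format='%Y-%m-%d %H:%M:%S', unit='s')
table = table.set_column(0, "Timestamp", parsed)

pq.write_table(table, "out.parquet")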
I have a date in the format 2014-01-31 05:47.
When it's read into pandas, the column comes in as object.
When I try to change it with pd.to_datetime, there is no error, but the datatype does not change to datetime.
Please suggest a way out.
T = pd.read_csv("TESTING.csv")
T['DATE'] = pd.to_datetime(T['DATE'])
T.dtypes
>DATE object
T['DATE']
>2014-01-31 05:47
Basically, pandas doesn't understand what the string "2014-01-31 05:47" is other than the fact that you gave it a string. If you read this string in from a CSV file, then read the pandas docs on the read_csv method, which allows you to parse datetimes, as sketched below.
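For example, a minimal sketch of the read_csv route (file and column names taken from the question):

import pandas as pd

T = pd.read_csv("TESTING.csv", parse_dates=["DATE"])
print(T.dtypes)  # DATE should now be datetime64[ns]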
However, given something like this:
import pandas

records = ["2014-01-31 05:47", "2014-01-31 14:12"]
df = pandas.DataFrame(records)
df.dtypes
>0 object
>dtype: object
This is because you haven't told pandas how to parse your string into a datetime (or Timestamp) type.
Using the pandas.to_datetime method is what you want, but you must be careful to pass it only the column that has the values you want to convert. Remember that pandas won't mutate the dataframe you're working on; you need to assign the result back.
df[0] = pandas.to_datetime(df[0])
df.dtypes
>0 datetime64[ns]
>dtype: object
This is what you want; the cells are now in the right format.
There are many ways to achieve the same thing: you could use the apply() method with a lambda, parse correctly from CSV or SQL, or work with Series.
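For instance, a sketch of the apply() variant (it works, but is slower than calling pandas.to_datetime on the whole column at once):

df[0] = df[0].apply(lambda s: pandas.to_datetime(s))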