I'm trying to load a pandas DataFrame into a MySQL table using SQLAlchemy.
I connect using:
engine = create_engine("mysql+pymysql://user:password@ip:port/db")
I then simply run:
df.to_sql(con=engine, name='Table', if_exists='append', index=False, chunksize=10000)
I keep getting the error
AttributeError: 'Timestamp' object has no attribute 'translate'
This worked fine when I used older versions and did this via pymysql rather than SQLAlchemy.
I can't find anything online to help. Any ideas?
Thanks.
I converted my Timestamp objects with ts.to_pydatetime() before passing them to the cursor, and that solved the attribute error.
See this answer for different ways of converting timestamps:
Converting between datetime, Timestamp and datetime64
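A rough sketch of that idea, assuming a pymysql connection; the table, column names, and credentials below are made up:
import pandas as pd
import pymysql

df = pd.DataFrame({
    "label": ["a", "b"],
    "event_time": pd.to_datetime(["2021-01-01 10:00", "2021-01-02 12:30"]),
})

conn = pymysql.connect(host="localhost", user="user", password="password", database="db")  # placeholder credentials
with conn.cursor() as cursor:
    # Build plain-Python parameter rows so the DB-API driver never sees pandas Timestamps
    rows = [
        (row.label, row.event_time.to_pydatetime())   # Timestamp -> datetime.datetime
        for row in df.itertuples(index=False)
    ]
    cursor.executemany(
        "INSERT INTO my_table (label, event_time) VALUES (%s, %s)",  # hypothetical table
        rows,
    )
conn.commit()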
I encountered the same issue when I tried to insert a very large DataFrame (df) into MySQL, where one column (df['date_']) was supposed to hold 'datetime64[ns]' values.
The date_ column is initially converted to datetime64[ns] using pd.to_datetime(df['date_']). Even so, I found that the dtype of df['date_'] was object, not datetime64[ns]. To debug, I took a small chunk of the data: for that chunk the dtype was datetime64[ns], which is correct, and that chunk even inserted into the database without errors. So there were evidently some rows that failed to convert to datetime64[ns], which I believe is the reason behind the error.
However, casting this column as a string (shown below) solved the issue for me.
df['date_'] = df['date_'].astype(str)
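If you want to find which rows refuse to convert, something like this can help (a sketch; the small frame here just stands in for the large one described above, with the same 'date_' column name):
import pandas as pd

df = pd.DataFrame({"date_": ["2021-01-01", "not a date", None]})

converted = pd.to_datetime(df["date_"], errors="coerce")   # unparseable rows become NaT
bad_rows = df[converted.isna() & df["date_"].notna()]      # rows that refused to convert
print(bad_rows)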
Check the datatypes of your data frame using df.dtypes and make sure your columns have the correct datatype. If you have a column that you expect to contain timestamps, make sure its dtype is datetime64[ns]. If it is object (or even something else), you can probably convert it using the pandas.to_datetime() function.
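For instance (the column name is purely illustrative):
import pandas as pd

df = pd.DataFrame({"created_at": ["2021-01-01 10:00", "2021-06-15 08:30"]})
print(df.dtypes)                               # created_at: object

df["created_at"] = pd.to_datetime(df["created_at"])
print(df.dtypes)                               # created_at: datetime64[ns]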
I was just having this same issue. In my case there was a column I was trying to write that contained timestamps, but its dtype was object. After converting it with df[colname] = pd.to_datetime(df[colname]) I was able to successfully write it using the pymysql driver and SQLAlchemy.
I was using a parameterized insert query when I encountered this issue. Converting with pd.to_datetime didn't solve it for me, but casting the pandas Timestamp objects to str worked well.
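A rough sketch of that approach with SQLAlchemy; the table, columns, and connection URL are placeholders:
from sqlalchemy import create_engine, text
import pandas as pd

engine = create_engine("mysql+pymysql://user:password@ip:port/db")   # placeholder URL
df = pd.DataFrame({"event_time": pd.to_datetime(["2021-01-01 10:00"]), "value": [1]})

with engine.begin() as conn:
    for row in df.itertuples(index=False):
        conn.execute(
            text("INSERT INTO my_table (event_time, value) VALUES (:ts, :val)"),
            {"ts": str(row.event_time), "val": int(row.value)},   # str() on the Timestamp avoids the error
        )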
I encountered the same problem. I found the best solution was to just cast all timestamp-like columns to string.
df = df.apply(lambda col: col.astype(str) if col.dtype == 'datetime64[ns]' else col)
I've got some timestamps in a database that are 9999-12-31, and I'm trying to convert the data to parquet. Somehow these timestamps all end up as 1816-03-29 05:56:08.066 in the parquet file.
Below is some code to reproduce the issue.
from datetime import datetime
import pyarrow as pa
import pyarrow.parquet as pq

file_path = "tt.parquet"
schema = pa.schema([pa.field("tt", pa.timestamp("ms"))])
table = pa.Table.from_arrays([pa.array([datetime(9999, 12, 31)], pa.timestamp("ms"))], ["tt"])
writer = pq.ParquetWriter(file_path, schema)
writer.write_table(table)
writer.close()
I'm not trying to read the data with pandas, but I did try inspecting it with pandas, and that ends with a pyarrow.lib.ArrowInvalid: Casting from timestamp[ms] to timestamp[ns] would result in out of bounds timestamp error.
I'm loading the parquet files into Snowflake and get back the incorrect timestamp. I've also tried inspecting with parquet-tools but that doesn't seem to work with timestamps.
Does parquet/pyarrow not support large timestamps? How can I store the correct timestamp?
It turns out, for me, it was because I needed to set use_deprecated_int96_timestamps=False on the parquet writer.
The docs say it defaults to False, but I had set the flavor to 'spark', so I think that overrode it.
Thanks for the help.
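For reference, a sketch of that fix applied to the reproduction code above; the flavor='spark' part mirrors the setup described here, and passing the flag explicitly appears to take precedence over the flavor's default:
from datetime import datetime
import pyarrow as pa
import pyarrow.parquet as pq

schema = pa.schema([pa.field("tt", pa.timestamp("ms"))])
table = pa.Table.from_arrays([pa.array([datetime(9999, 12, 31)], pa.timestamp("ms"))], ["tt"])

# int96 storage goes through nanoseconds since the epoch, which overflows
# int64 for year 9999 and wraps around to ~1816; keeping it off avoids that.
writer = pq.ParquetWriter(
    "tt.parquet",
    schema,
    flavor="spark",
    use_deprecated_int96_timestamps=False,
)
writer.write_table(table)
writer.close()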
Clearly the timestamp '9999-12-31' is being used not as a real timestamp, but as a flag for an invalid value.
If at the end of the pipeline Snowflake is seeing those as '1816-03-29 05:56:08.066', then you could just keep them as that - or re-cast them to whatever value you want them to have in Snowflake. At least it's consistent.
But if you insist that you want Python to handle the 9999 cases correctly, look at this question that solves it with use_deprecated_int96_timestamps=True:
handling large timestamps when converting from pyarrow.Table to pandas
I have a field in a data frame that is type 'object'. I tried to convert it to datetime using this.
df['dt'] = pd.to_datetime(df['dt'])
It gives me this error: TypeError: <class 'datetime.time'> is not convertible to datetime
I think the problem is that some of the dates show up like this: 00:00:00
I tried to do a replace and just substitute some old date that has no significance. It ran, but nothing changed:
df['dt'] = df['dt'].replace(['00:00:00'],'2000-01-01 00:00:00')
I'm basically trying to run df.to_sql to append the data frame to a table in PostgreSQL. How can I get this date-thing fixed? Thanks!!
After more and more Googling, I came up with this, and it worked fine for me.
df['dt'] = pd.to_datetime(df['dt'], errors='coerce')
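For what it's worth, the earlier replace likely did nothing because those cells hold datetime.time objects rather than the string '00:00:00'; errors='coerce' simply turns anything unparseable into NaT. A small sketch of the behavior:
import datetime
import pandas as pd

df = pd.DataFrame({"dt": ["2021-03-01", datetime.time(0, 0), "2021-03-02"]})

# Without errors='coerce' this raises the TypeError above;
# with it, the time-only cell simply becomes NaT.
df["dt"] = pd.to_datetime(df["dt"], errors="coerce")
print(df)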
I am aware that this question has been asked before, but I still have a few doubts. I have a datetime.date (e.g. mydate = date(2014,5,1)) which I converted to a string and saved in the DB as a column (dtype: object) of a table. Now I want to change the storage of dates from text to timestamp in the DB.
For example, my table is tab1. I read it into a DataFrame df in Python and tried this:
# datetime to timestamp
df['X'] = pd.to_datetime(mydate)
When I check the dtype in the Python editor with df.info(), the dtype of X is datetime64[ns], but when I save this to MySQL and read it back as a DataFrame in Python, the dtype changes to object. The datatype in MySQL is DATETIME, but I need it to be the TIMESTAMP datatype in MySQL. Is there any way to do that? Also, I need only the date from Timestamp('2014-05-01 00:00:00') and want to exclude the time.
The problem is that when you read the serialized value from MySQL, the Python MySQL connector does not convert it. You have to convert it to a datetime value after reading the data from the cursor, by calling the conversion again on the retrieved data:
df['X'] = pd.to_datetime(df['col'])
As suggested, I changed the column type directly by using the dtype argument of the to_sql() function while inserting into the database. So now I can have the datatypes TIMESTAMP, DATETIME and also DATE in MySQL.
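A sketch of what that can look like, assuming a MySQL engine and a column named X (the URL and table name are placeholders); using DATE also drops the time portion, which the question asked about:
import pandas as pd
import sqlalchemy
from sqlalchemy.dialects.mysql import DATE, DATETIME, TIMESTAMP

engine = sqlalchemy.create_engine("mysql+pymysql://user:password@host:3306/db")   # placeholder URL
df = pd.DataFrame({"X": pd.to_datetime(["2014-05-01"])})

# Map each column to the MySQL type you want instead of the default
df.to_sql("tab1", engine, if_exists="replace", index=False,
          dtype={"X": DATE()})   # or DATETIME() / TIMESTAMP()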
This works for me:
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df['timestamp'] = df['Date'].apply(lambda x: x.timestamp())
I am attempting to insert a DataFrame into my Postgres database using the psycopg2 module together with SQLAlchemy. The process loads an Excel file into a pandas DataFrame and then inserts the DataFrame into the database via a predefined table schema.
I believe these are the relevant lines of code:
post_meta.reflect(schema="users")
df = pd.read_excel(path)
table = sql.Table(table_name, post_meta, schema="users")
dict_items = df.to_dict(orient='records')
connection.execute(table.insert().values(dict_items))
I'm getting the following error:
<class 'sqlalchemy.exc.ProgrammingError'>, ProgrammingError("(psycopg2.ProgrammingError) can't adapt type 'numpy.int64'",)
All data field types in the dataframe are int64.
I can't seem to find a similar question or any information about why this error occurs and what it means.
Any direction would be great.
Thanks
Looks like you're trying to insert numpy integers, and psycopg2 doesn't know how to handle those objects. You need to convert them to normal python integers first. Maybe try calling the int() function on each value... Please provide more context with code if that fails.
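A minimal sketch of that suggestion, reusing the question's own df, table, and connection objects:
import numpy as np

# Cast numpy integer values to plain Python ints before building
# the insert parameters, so psycopg2 can adapt them.
dict_items = [
    {k: (int(v) if isinstance(v, np.integer) else v) for k, v in row.items()}
    for row in df.to_dict(orient='records')
]
connection.execute(table.insert().values(dict_items))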
I also ran into this error, and then realized that I was trying to insert integer data into a SqlAlchemy Numeric column, which maps to float, not int. Changing the offending DataFrame column to float did the trick for me:
df[col] = df[col].astype(float)
Perhaps you are also trying to insert integer data into a non-integer column?
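To illustrate the kind of mismatch meant here (the table and column names are made up):
import pandas as pd
import sqlalchemy as sa

metadata = sa.MetaData()
prices = sa.Table(
    "prices", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("amount", sa.Numeric(10, 2)),   # Numeric column, not Integer
)

df = pd.DataFrame({"id": [1, 2], "amount": [10, 20]})   # ints by default
df["amount"] = df["amount"].astype(float)               # cast so it matches the Numeric column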
I am using the latest pandas, 0.14.1, and the to_sql method to write to a MS SQL Server 2008 v2 server, using SQLAlchemy as the engine. The following dataframe with datetime objects works as expected.
import datetime
import pandas as pd

#test DataFrame
df1 = pd.DataFrame(index=range(10))
df1['A'] = 'Text'
df1['date_test'] = datetime.datetime(2014,1,1)
code used to write to database:
import sqlalchemy
engine = sqlalchemy.create_engine('mssql+pymssql://XXXXXXX')
df1.to_sql('test', engine, if_exists='replace')
For business reasons the data in the database need to be date objects and not datetime. If I use:
#test DataFrame
df2 = pd.DataFrame(index=range(10))
df2['A'] = 'Text'
df2['date_test'] = datetime.date(2014,1,1) # date not datetime
the to_sql method gives a very long error message:
OperationalError: (OperationalError) (206, 'Operand type clash: datetime is incompatible
with textDB-Lib error message 206, severity 16:\nGeneral SQL Server error:
Check messages from the SQL Server.......
My first suspicion is that this might be a bug in the newly created functionality in pandas 0.14.1 when dates are used in the method. Not sure though.
UPDATE: starting from pandas 0.15, to_sql supports writing columns of datetime.date and datetime.time (https://github.com/pydata/pandas/pull/8090, now in development version).
Support for the datetime.date and datetime.time types is at the moment (0.14.1) not yet implemented (only the datetime64 type is supported, and datetime.datetime will be converted to that), but it should be easy to add (there is an issue for it: https://github.com/pydata/pandas/issues/6932).
The problem is that at the moment, to_sql creates a column of text type in the database for the datetime.date column (as is done for all columns of object dtype). For this reason you get the above error message.
A possible solution for now would be to create the table yourself, and then append the dataframe to it.
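A rough sketch of that workaround, reusing the engine and df2 from above; the column types and lengths here are assumptions:
import sqlalchemy

engine = sqlalchemy.create_engine('mssql+pymssql://XXXXXXX')

# Create the table yourself with an explicit DATE column...
metadata = sqlalchemy.MetaData()
test_table = sqlalchemy.Table(
    'test', metadata,
    sqlalchemy.Column('index', sqlalchemy.Integer),
    sqlalchemy.Column('A', sqlalchemy.String(50)),
    sqlalchemy.Column('date_test', sqlalchemy.Date),
)
metadata.create_all(engine)

# ...then append df2 instead of letting to_sql create a text column for the dates.
df2.to_sql('test', engine, if_exists='append')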