I am using the latest pandas (0.14.1) and the to_sql method to write to a MS SQL Server 2008 v2 server, using SQLAlchemy as the engine. The following dataframe with datetime objects works as expected.
import datetime
import pandas as pd

# test DataFrame
df1 = pd.DataFrame(index=range(10))
df1['A'] = 'Text'
df1['date_test'] = datetime.datetime(2014, 1, 1)
Code used to write to the database:
import sqlalchemy
engine = sqlalchemy.create_engine('mssql+pymssql://XXXXXXX')
df1.to_sql('test', engine, if_exists='replace')
For business reasons the data in the database need to be date objects and not datetime. If I use:
#test DataFrame
df2 = pd.DataFrame(index=range(10))
df2['A'] = 'Text'
df2['date_test'] = datetime.date(2014,1,1) # date not datetime
the to_sql method gives a very long error message:
OperationalError: (OperationalError) (206, 'Operand type clash: datetime is incompatible
with textDB-Lib error message 206, severity 16:\nGeneral SQL Server error:
Check messages from the SQL Server.......
My first suspicion is that this might be a bug with the newly added functionality in pandas 0.14.1 when dates are used in the method. Not sure though.
UPDATE: starting from pandas 0.15, to_sql supports writing columns of datetime.date and datetime.time (https://github.com/pydata/pandas/pull/8090, now in development version).
Support for datetime.date and datetime.time types is at the moment (0.14.1) not yet implemented (only for the datetime64 type, and datetime.datetime will be converted to that), but it should be easy to add this (there is an issue for it: https://github.com/pydata/pandas/issues/6932).
The problem is that at the moment, to_sql creates a column of text type in the database for the datetime.date column (as is done for all columns of object dtype). For this reason you get the above error message.
A possible solution for now would be to create the table in the database yourself, and then append the dataframe to it.
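For example, a minimal sketch of that workaround, assuming the engine and df2 from above (the table name test_date and the column definitions are only illustrative):
import sqlalchemy

# Define the target table yourself with a proper DATE column,
# so to_sql only has to append rows into it.
metadata = sqlalchemy.MetaData()
test_date = sqlalchemy.Table(
    'test_date', metadata,
    sqlalchemy.Column('index', sqlalchemy.Integer),
    sqlalchemy.Column('A', sqlalchemy.Text),
    sqlalchemy.Column('date_test', sqlalchemy.Date),
)
metadata.create_all(engine)

# Append the dataframe; the existing table is reused, so the date_test
# values land in a DATE column instead of a text column.
df2.to_sql('test_date', engine, if_exists='append')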
Related
I have a script that loads a CSV into a pandas dataframe, cleanses the resulting table (e.g. removes invalid values, formats dates as dates, etc.) and saves the output to a local SQLite .db file.
I then have other scripts that open that database file and perform other operations on it.
My problem is that SQLite doesn't have a dedicated date type: https://www.sqlite.org/datatype3.html
This means that operations on dates fail, e.g.:
df_read['Months since mydate 2'] = ( pd.to_datetime('15-03-2019') - df_read['mydate'] )
returns
TypeError: unsupported operand type(s) for -: 'Timestamp' and 'str'
How can I export my dataframe in a way which keeps track of all the data types, including dates?
I have thought of the following:
Export to another format, but which format? A proper SQL server would be great, but I don't have access to one in this case. I'd need a format which EXPLICITLY declares the data type of each column, so CSV is not an option.
Having a small function which reconverts the columns to dates after reading them from SQLite (see the sketch after this list). But this would mean I'd have to manually keep track of which columns are dates - it would be cumbersome and slow on large datasets.
Having another table in the SQLite database which keeps track of which columns are dates, and what format they are in (e.g. %Y-%m-%d); this could help with the reconversion into dates, but it still feels very cumbersome, clunky and very un-pythonic.
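A sketch of what the second option could look like, assuming you maintain the list of date columns by hand (pandas' read_sql also accepts a parse_dates argument that does essentially the same thing):
import pandas as pd

# Hypothetical helper: reconvert a hand-maintained list of date columns
# after reading a table back from SQLite.
def reconvert_dates(df, date_cols):
    for col in date_cols:
        df[col] = pd.to_datetime(df[col], errors='coerce')
    return df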
Here is a quick example of what I mean:
import numpy as np
import pandas as pd
import sqlite3

num = int(10e3)
df = pd.DataFrame()
df['month'] = np.random.randint(1, 13, num)
df['year'] = np.random.randint(2000, 2005, num)
# build a yyyymmdd integer (the day value simply reuses the month column)
df['mydate'] = pd.to_datetime(df['year'] * 10000 + df['month'] * 100 + df['month'], format='%Y%m%d')
# introduce some missing dates
df.iloc[20:30, 2] = np.nan

# this works
df['Months since mydate'] = pd.to_datetime('15-03-2019', format='%d-%m-%Y') - df['mydate']

conn = sqlite3.connect("test_sqllite_dates.db")
df.to_sql('mydates', conn, if_exists='replace')
conn.close()

conn2 = sqlite3.connect("test_sqllite_dates.db")
df_read = pd.read_sql('select * from mydates', conn2)

# this doesn't work: 'mydate' comes back as text, not datetime
df_read['Months since mydate 2'] = pd.to_datetime('15-03-2019', format='%d-%m-%Y') - df_read['mydate']
conn2.close()

print(df.dtypes)
print(df_read.dtypes)
As shown in the linked examples (one on writing dates to sqlite, one on reading them back), the solution could be to declare the column type in SQLite as a date/timestamp, so that when reading back, Python will automatically convert it to the datetime type.
Mind that, when you are connecting to the database, you need to pass the parameter detect_types=sqlite3.PARSE_DECLTYPES.
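A minimal sketch of that approach, reusing the df from the example above (the table name mydates2 and the explicit TIMESTAMP declaration are just for illustration):
import sqlite3
import pandas as pd

# Connect with detect_types so sqlite3 converts declared TIMESTAMP columns
# back into datetime objects when reading.
conn = sqlite3.connect("test_sqllite_dates.db",
                       detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("DROP TABLE IF EXISTS mydates2")
conn.execute("CREATE TABLE mydates2 (month INTEGER, year INTEGER, mydate TIMESTAMP)")
conn.commit()

# Append into the pre-declared table (index omitted for simplicity).
df[['month', 'year', 'mydate']].to_sql('mydates2', conn, if_exists='append', index=False)

# mydate now comes back as datetimes rather than plain text,
# so date arithmetic works again.
df_read = pd.read_sql('select * from mydates2', conn)
print(df_read.dtypes)
conn.close()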
I'm trying to load a pandas dataframe to a mysql table using Sqlalchemy.
I connect using: engine = create_engine("mysql+pymysql://user:password@ip:port/db")
I am then simply running:
df.to_sql(con=engine, name='Table', if_exists='append', index=False, chunksize=10000)
I keep getting the error
AttributeError: 'Timestamp' object has no attribute 'translate'
This worked fine when I used older versions and did this via pymysql, not SQLAlchemy.
I can't find anything online to help; any ideas?
Thanks.
I converted my Timestamp objects with ts.to_pydatetime() before passing them to the cursor, and that solved the attribute error.
See this answer for different ways of converting timestamps:
Converting between datetime, Timestamp and datetime64
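For instance, a minimal sketch of that conversion applied to a whole dataframe column (the column name date_col is only a placeholder):
import pandas as pd

# Convert pandas Timestamps to plain datetime.datetime objects
# before handing the values to the database driver.
df['date_col'] = df['date_col'].apply(
    lambda ts: ts.to_pydatetime() if pd.notnull(ts) else None)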
I encountered the same issue when I tried to insert a very large dataframe (df) into MySQL; the dataframe has a column (df['date_']) with dtype datetime64[ns].
The date_ column had initially been converted to datetime64[ns] using pd.to_datetime(df['date_']), but on inspection the dtype of df['date_'] turned out to be object, not datetime64[ns]. When I took a small chunk of data from the table to debug, its dtype was datetime64[ns] as expected, and that chunk even inserted into the database without problems. So there must have been some rows that failed to convert to datetime64[ns], which I believe is the reason behind the error.
However, casting this column as string(shown below) solved the issue for me.
df['date_'] = df['date_'].astype(str)
Check the datatypes of your data frame using df.dtypes and make sure your columns have the correct datatype. If you have a column that you expect to contain timestamps, make sure its datatype is datetime64[ns]. If it is object (or even something else), you can probably convert it using the pandas.to_datetime() function.
I was just having this same issue. In my case there was a column I was trying to write that contained timestamps, but its datatype was object. After converting it with df[colname] = pd.to_datetime(df[colname]) I was able to successfully write it using the pymysql driver and SQLAlchemy.
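Putting the two previous answers together, a minimal sketch (colname is a placeholder for the column you expect to hold timestamps):
import pandas as pd

colname = 'my_timestamp_column'  # placeholder

# Inspect which columns are object instead of datetime64[ns].
print(df.dtypes)

# Convert the offending column; values that fail to parse become NaT.
df[colname] = pd.to_datetime(df[colname], errors='coerce')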
I was using a parameterized insert query when I encountered this issue. Converting with pd.to_datetime didn't solve the issue for me, but casting the pandas Timestamp objects to str worked well.
I encountered the same problem. I found the best solution was to just transform all timestamp-like columns to strings:
df = df.apply(lambda col: col.astype(str) if col.dtype == 'datetime64[ns]' else col)
I am attempting to insert a dataframe into my Postgres database using the psycopg2 module with SQLAlchemy. The process loads an Excel file into a pandas dataframe and then inserts the dataframe into the database via a predefined table schema.
I believe these are the relevant lines of code:
post_meta.reflect(schema="users")
df = pd.read_excel(path)
table = sql.Table(table_name, post_meta, schema="users")
dict_items = df.to_dict(orient='records')
connection.execute(table.insert().values(dict_items))
I'm getting the following error:
<class 'sqlalchemy.exc.ProgrammingError'>, ProgrammingError("(psycopg2.ProgrammingError) can't adapt type 'numpy.int64'",)
All data field types in the dataframe are int64.
I can't seem to find a similar question or any information about what this error is and what it means.
Any direction would be great.
Thanks
Looks like you're trying to insert numpy integers, and psycopg2 doesn't know how to handle those objects. You need to convert them to normal Python integers first. Maybe try calling the int() function on each value, as sketched below. Please provide more context with code if that fails.
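A minimal sketch of that conversion, reusing the names from the question (and assuming, as stated, that all fields are int64):
# Cast every numpy.int64 value to a plain Python int before building the insert.
dict_items = [
    {key: int(value) for key, value in record.items()}
    for record in df.to_dict(orient='records')
]
connection.execute(table.insert().values(dict_items))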
I also ran into this error, and then realized that I was trying to insert integer data into a SqlAlchemy Numeric column, which maps to float, not int. Changing the offending DataFrame column to float did the trick for me:
df[col] = df[col].astype(float)
Perhaps you are also trying to insert integer data into a non-integer column?
The code is pretty straightforward:
import Quandl
import sqlite3
myData = Quandl.get("DMDRN/AAPL_ALLFINANCIALRATIOS")
cnx = sqlite3.connect("APPL.db")
myData.to_sql('AAPL', cnx)
I make a call to the Quandl API, which gives me a pandas dataframe.
When I try to commit the data to a SQL table I get this error:
sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.
The index is a Timestamp.
I have tried these:
1- How to write Pandas dataframe to sqlite with Index
2- Set the index to another value + Convert numpy.datetime64 to string object in python
With the first one I still get an "Error binding parameter" error, and the second isn't working.
What should I do (or what's the best way) if I want to commit the dataframe to a sqlite table and keep the date as the index?
See pandas to_sql method gives error with date column
The problem is that writing datetime64 values is not yet supported with a sqlite connection. With upcoming pandas 0.15, this bug will be fixed.
Writing the index is already supported (from pandas 0.14), and controllable with the index keyword (default of True).
You have some options to solve this:
use sqlalchemy to make the connection (you need at least pandas 0.14 for this), as this already supports writing datetime values:
import sqlalchemy
engine = sqlalchemy.create_engine('sqlite:///APPL.db')
myData.to_sql('AAPL', engine, index=True)
convert the datetime index to strings (and then you can keep using the sqlite connection directly). You can do this with:
myData.index = myData.index.map(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))
use the pandas development version (https://github.com/pydata/pandas)
I am having an issue inserting data into a table in postgresql using psycopg2.
The script does the following:
Queries out data from a postgres datebase
Does some math using numpy
And then I would like to re-insert the data into another table in the database. Here is the code to insert the data:
cur.executemany("INSERT INTO water_level_elev (hole_name,measure_date,water_level_elev,rid) VALUES (%s,%s,%s,%s);",[(hole.tolist(),m_date.tolist(),wl.tolist(),rid.tolist(),)])
The script throws the following error:
psycopg2.ProgrammingError: column "measure_date" is of type timestamp without time zone but expression is of type timestamp without time zone[]
LINE 1: INSERT INTO water_level_elev (hole_name,measure_date,water_l...
^
HINT: You will need to rewrite or cast the expression.
I'm confused... The column "measure_date" and the data I'm trying to insert are of the same type. What's the issue?
Thanks!
Try it without the tolist() on m_date.
It's not really possible to answer this completely without seeing the table schema for your water_level_elev table, or the source of the tolist method. However, it sounds like PostgreSQL is expecting a measure_date value that is a single timestamp, but is getting a list of timestamps; that is why PostgreSQL has [] on the end of the second type in the error message. This appears to be because your code calls a method named tolist on whatever is in the m_date variable, which most likely turns a single timestamp into a list of timestamps containing the timestamp in m_date.
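If hole, m_date, wl and rid are in fact parallel numpy arrays of values (an assumption; the question only shows this one statement), one way to pass one parameter tuple per row is to zip them:
# Build one tuple per row instead of one tuple of whole arrays.
rows = list(zip(hole.tolist(), m_date.tolist(), wl.tolist(), rid.tolist()))
cur.executemany(
    "INSERT INTO water_level_elev (hole_name, measure_date, water_level_elev, rid) "
    "VALUES (%s, %s, %s, %s);",
    rows)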