error inserting values to db with psycopg2 module - python

I am attempting to insert a dataframe into my Postgres database using the psycopg2 module together with SQLAlchemy. The process loads an Excel file into a pandas dataframe and then inserts the dataframe into the database via the predefined table schema.
I believe these are the relevant lines of code:
post_meta.reflect(schema="users")
df = pd.read_excel(path)
table = sql.Table(table_name, post_meta, schema="users")
dict_items = df.to_dict(orient='records')
connection.execute(table.insert().values(dict_items))
I'm getting the following error:
<class 'sqlalchemy.exc.ProgrammingError'>, ProgrammingError("(psycopg2.ProgrammingError) can't adapt type 'numpy.int64'",)
All data field types in the dataframe are int64.
I can't seem to find a similar question or any information on why this error occurs and what it means.
Any direction would be great.
Thanks

It looks like you're trying to insert numpy integers, and psycopg2 doesn't know how to adapt those objects. You need to convert them to plain Python integers first, for example by calling int() on each value. Please provide more context and code if that fails.
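A minimal sketch of two ways to do that, reusing the df, table and connection names from the question; the register_adapter route teaches psycopg2 how to render numpy.int64 everywhere, while the comprehension simply casts the values before the insert:
import numpy as np
from psycopg2.extensions import register_adapter, AsIs

# Option 1: register a global adapter so psycopg2 renders numpy.int64 as a plain int
register_adapter(np.int64, lambda value: AsIs(int(value)))

# Option 2: cast the values while building the records
# (assumes every column is integer-typed, as stated in the question)
dict_items = [
    {key: int(value) for key, value in row.items()}
    for row in df.to_dict(orient='records')
]
connection.execute(table.insert().values(dict_items))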

I also ran into this error, and then realized that I was trying to insert integer data into a SQLAlchemy Numeric column, which maps to float, not int. Changing the offending DataFrame column to float did the trick for me:
df[col] = df[col].astype(float)
Perhaps you are also trying to insert integer data into a non-integer column?
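If you want to check for that kind of mismatch, here is a small hedged sketch, reusing the table and df objects from the question, that prints the database column type next to the pandas dtype for each column:
# Compare the reflected SQLAlchemy column types with the pandas dtypes
for column in table.columns:
    pandas_dtype = df[column.name].dtype if column.name in df.columns else None
    print(column.name, '| db type:', column.type, '| pandas dtype:', pandas_dtype)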

Related

Reading Data Frame in Atoti?

While reading a DataFrame in Atoti using the following code, the error shown below occurs.
#Code
global_data=session.read_pandas(df,keys=["Row ID"],table_name="Global_Superstore")
#error
ArrowInvalid: Could not convert '2531' with type str: tried to convert to int64
How can I solve this? Please help.
I was trying to read a DataFrame using atoti functions.
There are values with different types in that particular column. If you aren't going to preprocess the data and you're fine with that column being read as a string, then you should specify the exact datatypes of each of your columns (or that particular column), either when you load the dataframe with pandas, or when you read the data into a table with the function you're currently using:
import atoti as tt

global_superstore = session.read_pandas(
    df,
    keys=["Row ID"],
    table_name="Global_Superstore",
    types={
        "<invalid_column>": tt.type.STRING
    },
)
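If you would rather fix it on the pandas side instead, a minimal sketch of that option (the file name and the <invalid_column> placeholder are just stand-ins for your own):
import pandas as pd

# Force the mixed-type column to be read as a string when loading the dataframe
df = pd.read_csv("global_superstore.csv", dtype={"<invalid_column>": str})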

write_pandas from Dataframe to Snowflake insert - Can only convert 1-dimensional array values

I am getting the "Can only convert 1-dimensional array values" error when trying to use write_pandas with the dataframe below.
I was assuming that my dataframe was set up appropriately, as I have followed Snowflake's documentation on using a DataFrame to insert with write_pandas.
I printed the dataframe repr. The call I am using is:
write_pandas(connection_String, dataframe, table_name, database='SB', schema='SBB', quote_identifiers=False)
It is also telling me 'Conversion failed for column SRC_XML with type object'. My Snowflake table is set to type VARIANT; I'm not sure whether it should be something else, or whether I can convert SRC_XML to another type.
I tried for a while to reformat the first column of the dataframe to str and it wasn't working. I was finally able to add
df['SRC_XML']=df['SRC_XML'].astype(str)
and it worked. I think the object dtype of that dataframe column was not compatible with the VARIANT data type in the Snowflake table.
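A minimal sketch of the full call with that cast in place, assuming conn is an open Snowflake connection and the target table (table_name, as in the question) already exists:
from snowflake.connector.pandas_tools import write_pandas

df['SRC_XML'] = df['SRC_XML'].astype(str)  # object -> str so the column can be serialized
success, nchunks, nrows, _ = write_pandas(
    conn, df, table_name, database='SB', schema='SBB', quote_identifiers=False
)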

How to efficiently setup column data types in SQL from pandas dataframe

More of a theoretical question as to the best way to set something up.
I have quite a large dataframe in pandas (roughly 330 columns) and I am hoping to transfer it into a table in SQL Server.
My current process has been to export the dataframe as a .csv and then use the Import Flat File function to create the table in the first place; after that I have a direct connection set up in Python to interact with it. For smaller dataframes this has worked fine, as it has been easy to change column data types and eventually get it to work.
When doing it on the larger dataframes my problem is that I am frequently getting the following message:
TITLE: Microsoft SQL Server Management Studio
Error inserting data into table. (Microsoft.SqlServer.Import.Wizard)
The given value of type String from the data source cannot be converted to type nvarchar of the specified target column. (System.Data)
String or binary data would be truncated. (System.Data)
It doesn't tell me which specific column is causing the problem, so is there any way to get this data in more efficiently than going through each column manually?
Any help would be appreciated! Thanks
This is in fact an issue that occurs when you try to write a string value into a column and the column's size limit is exceeded. You can either increase the column size limit or truncate the values before inserting.
Let's say column A in df maps to a varchar(500) column; try the following before inserting:
df.A=df.A.apply(lambda x: str(x)[:500])
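To find which columns are the offenders in a wide dataframe, here is a small hedged sketch that reports the longest string in every object column (the 500-character limit is just an example):
limit = 500
for col in df.select_dtypes(include='object').columns:
    max_len = df[col].astype(str).str.len().max()
    if max_len > limit:
        print(f"{col}: longest value is {max_len} characters")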
Below is the SQLAlchemy alternative for the insertion.
connect_str = "mssql+pyodbc://<username>:<password>@<dsnname>"
To create a connection:
from sqlalchemy import create_engine
engine = create_engine(connect_str)
Create the table:
from sqlalchemy import Table, MetaData, Column, Integer
m = MetaData()
t = Table('example', m,
          Column('column_1', Integer),
          Column('column_2', Integer),
          ...)
m.create_all(engine)
Once created, do the following:
df.to_sql('example', engine, if_exists='append')
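As an alternative to creating the table by hand, to_sql can also create it for you and lets you pin the SQL type of each column via its dtype argument, which avoids the Import Wizard guessing widths. A hedged sketch, where the column names and lengths are just examples:
from sqlalchemy.types import Integer, NVARCHAR

df.to_sql(
    'example',
    engine,
    if_exists='replace',
    index=False,
    dtype={'column_1': Integer(), 'A': NVARCHAR(length=500)},  # example per-column types
)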

Python to mysql 'Timestamp' object has no attribute 'translate'

I'm trying to load a pandas dataframe into a MySQL table using SQLAlchemy.
I connect using: engine = create_engine("mysql+pymysql://user:password@ip:port/db")
I am then simply running:
df.to_sql(con=engine, name='Table', if_exists='append', index=False, chunksize=10000);
I keep getting the error
AttributeError: 'Timestamp' object has no attribute 'translate'
This worked fine when I used older versions and did this via pymysql, not SQLAlchemy.
I can't find anything online to help. Any ideas?
Thanks
I converted my timestamp object into ts.to_pydatetime() before passing it to the cursor and that solved the attribute error.
See this answer for different ways of converting timestamps:
Converting between datetime, Timestamp and datetime64
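For a cursor-based insert, a minimal sketch of that conversion, assuming an open pymysql cursor and hypothetical column names; each value pulled from a datetime64 column is a pandas Timestamp, so convert it before handing it to the driver:
rows = [
    (row['id'], row['created_at'].to_pydatetime())  # Timestamp -> plain datetime.datetime
    for _, row in df.iterrows()
]
cursor.executemany("INSERT INTO `Table` (id, created_at) VALUES (%s, %s)", rows)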
I encountered the same issue when I tried to insert a very large dataframe (df) into MySQL, which had a column (df['date_']) with datetime64[ns] dtype.
The date_ column was initially converted to datetime64[ns] using pd.to_datetime(df['date_']). I then found that the dtype of df['date_'] was object, not datetime64[ns]. So I took a small chunk of data from the table to debug and found that its dtype was datetime64[ns], which is correct, and that data even got inserted into the database. Hence, there were definitely some rows that failed to convert to datetime64[ns], which I believe is the reason behind the error.
However, casting this column as a string (shown below) solved the issue for me.
df['date_'] = df['date_'].astype(str)
Check the datatypes of your data frame using df.dtypes and make sure your columns have the correct datatype. If you have a column that you expect to contain timestamps, make sure its datatype is datetime64[ns]. If it is object (or something else), you can probably convert it using the pandas.to_datetime() function.
I was just having this same issue. In my case there was a column I was trying to write that contained timestamps, but its datatype was object. After converting it with df[colname] = pd.to_datetime(df[colname]) I was able to successfully write it using the pymysql driver and SQLAlchemy.
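A minimal sketch of that fix end to end, with a hypothetical column name and reusing the engine from the question:
import pandas as pd

df['event_time'] = pd.to_datetime(df['event_time'])  # object -> datetime64[ns]
print(df.dtypes)  # confirm the column now shows datetime64[ns]
df.to_sql(con=engine, name='Table', if_exists='append', index=False)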
I was using a parameterized insert query when I encountered this issue. Converting with pd.to_datetime didn't solve it for me, but casting the pandas Timestamp objects to str worked well.
I encountered the same problem. I found the best solution was to just transform all timestamp-like columns to strings:
df = df.apply(lambda col: col.astype(str) if col.dtype == 'datetime64[ns]' else col)

psycopg2, inserting timestamp without time zone

I am having an issue inserting data into a table in postgresql using psycopg2.
The script does the following:
Queries data out of a Postgres database
Does some math using numpy
And then I would like to re-insert the data into another table in the database. Here is the code to insert the data:
cur.executemany("INSERT INTO water_level_elev (hole_name,measure_date,water_level_elev,rid) VALUES (%s,%s,%s,%s);",[(hole.tolist(),m_date.tolist(),wl.tolist(),rid.tolist(),)])
The script throws the following error:
psycopg2.ProgrammingError: column "measure_date" is of type timestamp without time zone but expression is of type timestamp without time zone[]
LINE 1: INSERT INTO water_level_elev (hole_name,measure_date,water_l...
^
HINT: You will need to rewrite or cast the expression.
I'm confused... The column "measure_date" and the data I'm trying to insert are of the same type. What's the issue?
Thanks!
Try it without the tolist() on m_date.
It's not really possible to answer this completely without seeing the table schema for your water_level_elev table, or the source of the tolist method. However, it sounds like PostgreSQL is expecting a measure_date value that is a single timestamp, but is getting a list of timestamps; that is why PostgreSQL has [] on the end of the second type in the error message. This appears to be because the code you pasted calls a method named tolist on whatever is in your m_date variable, which most likely converts it into a list of timestamps containing the timestamp in m_date.
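Assuming hole, m_date, wl and rid are parallel 1-D numpy arrays, a hedged sketch of the usual fix is to zip them into one tuple of scalars per row, since executemany() expects a sequence of row tuples:
rows = list(zip(hole.tolist(), m_date.tolist(), wl.tolist(), rid.tolist()))
cur.executemany(
    "INSERT INTO water_level_elev (hole_name, measure_date, water_level_elev, rid) "
    "VALUES (%s, %s, %s, %s);",
    rows,
)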
