I have a Pandas DataFrame that I'm inserting into an SQL database. I'm using psycopg2 directly to talk to the database, not SQLAlchemy, so I can't use Pandas' built-in to_sql function. Almost everything works as expected, except that numpy np.NaN values get converted to the text 'NaN' and inserted into the database, when they really should be treated as SQL NULL values.
So I'm trying to write a custom adapter to convert np.NaN to SQL NULL, but everything I've tried results in the same 'NaN' strings being inserted into the database.
The code I'm currently trying is:
from psycopg2.extensions import adapt, register_adapter, AsIs
import numpy as np

def adapt_nans(null):
    a = adapt(None).getquoted()
    return AsIs(a)

register_adapter(np.NaN, adapt_nans)
I've tried a number of variations along this theme but haven't had any luck.
The code I was trying previously fails because it assumes that np.NaN is its own type, when it is actually just a float. The following code, courtesy of Daniele Varrazzo on the psycopg2 mailing list, does the job correctly.
import numpy as np
import psycopg2

def nan_to_null(f,
                _NULL=psycopg2.extensions.AsIs('NULL'),
                _Float=psycopg2.extensions.Float):
    # Ordinary floats pass through the stock adapter; NaN becomes NULL.
    if not np.isnan(f):
        return _Float(f)
    return _NULL

psycopg2.extensions.register_adapter(float, nan_to_null)
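As a quick sanity check, you can ask psycopg2 to adapt values directly, with no live connection needed (a minimal sketch; it assumes the adapter registration above has already run):

from psycopg2.extensions import adapt

# With the adapter registered, NaN renders as NULL and ordinary floats as numbers.
print(adapt(float('nan')).getquoted())  # b'NULL'
print(adapt(3.14).getquoted())          # b'3.14'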
This answer is an alternate version of Gregory Arenius's answer. I have replaced the conditional so that it catches any NaN value by simply checking whether the value is equal to itself.
import psycopg2

def nan_to_null(f,
                _NULL=psycopg2.extensions.AsIs('NULL'),
                _Float=psycopg2.extensions.Float):
    # NaN is the only float that is not equal to itself.
    if f != f:
        return _NULL
    else:
        return _Float(f)

psycopg2.extensions.register_adapter(float, nan_to_null)
Comparing a NaN value to itself yields False; NaN is the only float with that property, so f != f is True exactly when f is NaN. The rationale behind why this works is explained in detail in Stephen Canon's answer.
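A quick demonstration of the self-inequality check, in plain Python:

x = float('nan')
print(x != x)      # True: NaN compares unequal to everything, including itself
print(1.0 != 1.0)  # False: any ordinary float equals itself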
If you are trying to insert Pandas DataFrame data into PostgreSQL and getting an error for NaN values, all you have to do is:
import psycopg2

output_df = output_df.fillna(psycopg2.extensions.AsIs('NULL'))
# Now insert output_df data into the table
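For context, the subsequent insert could look like the following sketch (the connection string, table, and column names are all hypothetical):

import psycopg2

conn = psycopg2.connect('dbname=mydb')  # hypothetical connection string
cur = conn.cursor()
cur.executemany(
    'INSERT INTO my_table (col_a, col_b) VALUES (%s, %s)',  # hypothetical table/columns
    list(output_df.itertuples(index=False, name=None)),
)
conn.commit()
# The AsIs('NULL') placeholders from fillna render as SQL NULL in the statement.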
I believe the easiest way is:
df.where(pd.notnull(df), None)
Then None is translated to NULL when the data is imported into Postgres.
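A small illustration with a hypothetical frame. Note that on recent pandas versions, None can be coerced back to NaN in numeric columns, so casting to object first (or using df.replace({np.nan: None})) behaves more predictably:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', None]})
df = df.astype(object).where(pd.notnull(df), None)
print(df)  # NaN entries are now None, which psycopg2 adapts to NULL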
I recently added a new column to my BigQuery table.
The following code snippet is used in legacy code to determine the table schema:
df = gbq.read_gbq('SELECT * FROM {}.{} where 1=0'.format(BIGQUERY_DATASET_NAME, table), project_id=project_id)
But the problem is that it does not return the newly added column in the df. However, when I use some other condition, such as 1=3 in the where clause, or limit 0, it returns the correct schema.
I'm trying to understand what is causing the issue.
I'm guessing it has to do with caching. You can always ask BigQuery not to use cached results:
df = gbq.read_gbq(
    'SELECT * FROM {}.{} where 1=0'.format(...),
    configuration={'query': {'useQueryCache': False}},
)
If you want to get the column names - which I assume is the point of this - and can change the legacy code, perhaps a better approach would be to get them directly from the INFORMATION_SCHEMA view.
An example would be as follows:
schema_query = f"""
SELECT column_name
FROM {BIGQUERY_DATASET_NAME}.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = '{table}'
"""
df = gbq.read_gbq(schema_query, project_id=project_id)
(If you are using Python < 3.6, revert to the .format syntax, of course.)
I have a hunch that this will avoid the problem you're encountering using the legacy code you're working with. I agree with the other answer that it's quite possible you're seeing cached results.
I'm writing a Flask-SQLAlchemy database application.
I don't understand the behavior, nor can I find a solution: when I run a query, it returns the row twice.
The Class table does indeed contain two rows, but they are different from each other.
data = db.session.query(Class).filter_by(Class.id==1).first()
print(data)
<Class>
<Class>
Check whether the data variable is defined somewhere earlier in the code; that could explain the two rows you see printed.
Also, the query as you have written it should actually raise a TypeError (wrong use of the filter_by method).
It should be either:
data = db.session.query(Class).filter_by(id=1).first()
or
data = db.session.query(Class).filter(Class.id==1).first()
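A minimal runnable sketch of the corrected query (the model definition and in-memory database URI are hypothetical):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///:memory:'
db = SQLAlchemy(app)

class Class(db.Model):
    id = db.Column(db.Integer, primary_key=True)

with app.app_context():
    db.create_all()
    db.session.add(Class(id=1))
    db.session.commit()
    data = db.session.query(Class).filter_by(id=1).first()
    print(data)  # a single Class instance, printed once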
I have a Python script which selects some rows from a table and inserts them into another table. One field has type DATE, and there is a problem when its value is '0000-00-00': Python converts this value to None, which then causes an error when inserting it into the second table.
How can I solve this problem? Why does Python convert that value to None?
Thank you in advance.
This is actually a None-like value in the database, in a way: MySQL treats '0000-00-00' specially.
From the MySQL documentation:

MySQL permits you to store a “zero” value of '0000-00-00' as a “dummy date.” This is in some cases more convenient than using NULL values, and uses less data and index space. To disallow '0000-00-00', enable the NO_ZERO_DATE mode.
It seems that Python's MySQL library is trying to be nice to you and converts this to None.
When writing, it cannot guess that you wanted '0000-00-00' and uses NULL instead. You should convert the value yourself. For example, this might work:
if value_read_from_one_table is not None:
    value_written_to_the_other_table = value_read_from_one_table
else:
    value_written_to_the_other_table = '0000-00-00'
I'm trying to update a TIMESTAMP column in a table with a None value in Python code.
It worked perfectly when using an insert statement with a NULL value, but when using an update statement it doesn't work!
The following is test code for your understanding.
(The reason why I'm updating a None value is that the new value comes from another database; I want to update the value with the new one, and some of the values are NULL.)
:1 is a string like '20160418154000' in the Python code, but when the value is None it raises an exception.
INSERT INTO TEST_TABLE (ARR_TIME) VALUES(TO_TIMESTAMP(:1, 'YYYYMMDDHH24MISS'))
it works well!!
UPDATE TEST_TABLE SET ARR_TIME = TO_TIMESTAMP(:1, 'YYYYMMDDHH24MISS')
it doesn't work!!
error message : ORA-00932: inconsistent datatypes: expected - got NUMBER
I think cx_Oracle recognizes the None value in Python as a number (0?), and that it cannot be converted to the 'YYYYMMDDHH24MISS' string format.
Is there way to update NULL value in TIMESTAMP column?
Yes, there is. Unless you specify otherwise, nulls are bound as type string. You can override this, though, using the following code:
cursor.setinputsizes(cx_Oracle.TIMESTAMP)
See here for documentation:
http://cx-oracle.readthedocs.org/en/latest/cursor.html#Cursor.setinputsizes
NOTE: you could also have solved this by using this statement instead:
update test_table set arr_time = :1
There is no need to convert the data using TO_TIMESTAMP(), as cx_Oracle can bind timestamp values directly (use datetime.datetime), and if you bind None, Oracle will implicitly convert it for you.
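A minimal sketch of this direct-bind approach (the connection credentials are hypothetical):

import datetime
import cx_Oracle

conn = cx_Oracle.connect('user/password@dsn')  # hypothetical credentials
cur = conn.cursor()

# Bind a datetime directly; cx_Oracle maps it to a TIMESTAMP, and None becomes NULL.
for value in (datetime.datetime(2016, 4, 18, 15, 40), None):
    cur.execute('UPDATE test_table SET arr_time = :1', [value])
conn.commit()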
python 2.7
pyramid 1.3a4
sqlalchemy 0.7.3
sqlite 3.7.9
From the sqlite prompt I can do:
insert into risk(travel_dt) values ('')
also
insert into risk(travel_dt) values(Null)
Both result in a new row with a NULL value for risk.travel_dt, but when I try those travel_dt values from Pyramid, SQLAlchemy gives me an error.
In the first case, I get sqlalchemy.exc.StatementError:
SQLite Date type only accepts Python date objects as input
In the second case, I get "Null is not defined". When I use the string "Null", I get the first-case error again.
I apologize for another question on nulls: I have read a lot of material but must have missed something simple. Thanks for any help
Clemens Herschel
While you didn't provide any insight into the table definition you're using or any example code, I am guessing the issue is due to confusing NULL (the database reserved word) with None (the Python one).
The error message is telling you that you need to call your SQLAlchemy methods with valid Python date objects, rather than strings such as "Null" or ''.
Assuming you have a Table called risk containing a Column called travel_dt, you should be able to create a row in that table with something like:
risk.insert().values(travel_dt=None)
Note that this is just a snippet, you would need to execute such a call within an engine context like that defined in the SA Docs SQL Expression Language Tutorial.
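A self-contained sketch using the SQLAlchemy Core expression language (the table definition is hypothetical, written to match the column names in the question):

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Date

engine = create_engine('sqlite://')  # in-memory database for illustration
metadata = MetaData()
risk = Table(
    'risk',
    metadata,
    Column('id', Integer, primary_key=True),
    Column('travel_dt', Date),  # accepts a datetime.date or None, never '' or 'Null'
)
metadata.create_all(engine)

# None, not '' or "Null", is what becomes SQL NULL in travel_dt.
with engine.begin() as conn:
    conn.execute(risk.insert().values(travel_dt=None))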