I have some data that contains NULLs, floats and the occasional NaN. I'm trying to insert this data into a MySQL database using Python and MySQLdb.
Here's the insert statement:
for row in zip(currents, voltages):
    row = [id] + list(row)
    sql_insert = ('INSERT INTO result(id, current, voltage) '
                  'VALUES(%s, %s, %s)')
    cursor.execute(sql_insert, row)
This is the table:
CREATE TABLE langmuir_result (result_id INT auto_increment,
id INT,
current FLOAT,
voltage FLOAT,
PRIMARY KEY (result_id));
When I try to insert NaN into the table I get this error:
_mysql_exceptions.DataError: (1265, "Data truncated for column 'current' at row 1")
I want to insert the NaN values into the database as a float or a number, not a string or NULL. I've tried having the type of the column be FLOAT and DECIMAL but get the same error. How can I do this without making it a string or NULL? Is that possible?
No, it's not possible to store a NaN value in a FLOAT column in MySQL; the only allowed values are NULL or a number. You could work around it by reserving a sentinel value that never occurs in your data to stand in for NaN (a negative number, or a very large or very small value).
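For illustration, a minimal sketch of that sentinel approach; the value -9999.0 is an arbitrary assumption, chosen only because it is presumed never to appear in the real measurements:

import math

SENTINEL = -9999.0  # assumed to be outside the range of real measurements

def encode_nan(value):
    """Map NaN to the sentinel so it can be stored in a FLOAT column."""
    return SENTINEL if math.isnan(value) else value

print(encode_nan(float("nan")))  # -9999.0
print(encode_nan(1.25))          # 1.25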
Try converting your NaN values to None; the connector translates None to NULL, which is what you'll see in the UI.
You can use the pandas package's isna function:
import pandas as pd  # use the pandas package

if pd.isna(row[column]):
    values_data.append(None)
else:
    values_data.append(row[column])
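Putting the two suggestions together, here is a rough sketch of how the question's loop could convert NaN to None before executing the insert; it assumes the result table, the id, currents and voltages variables from the question, and an open MySQLdb cursor:

import math

def nan_to_none(value):
    """Return None for NaN so the driver sends SQL NULL; pass other values through."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value

sql_insert = ('INSERT INTO result(id, current, voltage) '
              'VALUES(%s, %s, %s)')
rows = [(id,) + tuple(nan_to_none(v) for v in pair)
        for pair in zip(currents, voltages)]
cursor.executemany(sql_insert, rows)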
I create a table with a primary key and autoincrement.
import sqlite3

with open('RAND.xml', "rb") as f, sqlite3.connect("race.db") as connection:
    c = connection.cursor()
    c.execute(
        """CREATE TABLE IF NOT EXISTS race(RaceID INTEGER PRIMARY KEY AUTOINCREMENT,
           R_Number INT, R_KEY INT, R_NAME TEXT, R_AGE INT, R_DIST TEXT, R_CLASS, M_ID INT)""")
I then want to insert a tuple that, of course, has one fewer value than the table has columns, because the first column is the autoincrement.
sql_data = tuple(b)
c.executemany('insert into race values(?,?,?,?,?,?,?)', b)
How do I fix this error?
sqlite3.OperationalError: table race has 8 columns but 7 values were supplied
It's extremely bad practice to assume a specific ordering of the columns: a DBA might come along and modify the table, breaking your SQL statements. Secondly, an autoincrement value is only generated when you don't supply a value for the field in your INSERT statement; if you do supply a value, that value is stored in the new row.
If you amend the code to read
c.executemany('''insert into
                 race(R_Number, R_KEY, R_NAME, R_AGE, R_DIST, R_CLASS, M_ID)
                 values(?,?,?,?,?,?,?)''',
              sql_data)
you should find that everything works as expected.
From the SQLite documentation:
If the column-name list after table-name is omitted then the number of values inserted into each row must be the same as the number of columns in the table.
RaceID is a column in the table, so it is expected to be present when you're doing an INSERT without explicitly naming the columns. You can get the desired behavior (assign RaceID the next autoincrement value) by passing an SQLite NULL value in that column, which in Python is None:
sql_data = tuple((None,) + a for a in b)
c.executemany('insert into race values(?,?,?,?,?,?,?,?)', sql_data)
The above assumes b is a sequence of sequences of parameters for your executemany statement and attempts to prepend None to each sub-sequence. Modify as necessary for your code.
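For completeness, a self-contained sketch of this approach against an in-memory database; the rows in b are made up purely for illustration:

import sqlite3

# Hypothetical rows: (R_Number, R_KEY, R_NAME, R_AGE, R_DIST, R_CLASS, M_ID)
b = [
    (1, 101, "Alice", 34, "1200m", "A", 7),
    (2, 102, "Bob", 29, "1600m", "B", 7),
]

with sqlite3.connect(":memory:") as connection:
    c = connection.cursor()
    c.execute("""CREATE TABLE IF NOT EXISTS race(
                     RaceID INTEGER PRIMARY KEY AUTOINCREMENT,
                     R_Number INT, R_KEY INT, R_NAME TEXT, R_AGE INT,
                     R_DIST TEXT, R_CLASS, M_ID INT)""")
    sql_data = tuple((None,) + a for a in b)  # None lets RaceID autoincrement
    c.executemany('insert into race values(?,?,?,?,?,?,?,?)', sql_data)
    print(c.execute("SELECT RaceID, R_NAME FROM race").fetchall())
    # [(1, 'Alice'), (2, 'Bob')]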
I've read answers that do something similar, but not exactly what I'm looking for, which is: attempting to insert a row with a NULL value in a column should result in that column's DEFAULT value being inserted instead.
I'm trying to process a large number of inserts with the MySQL Python connector, with a large number of column values that I don't want to deal with individually, and none of the typical alternatives works here. Here is a sketch of my code:
qry = "INSERT INTO table (col1, col2, ...) VALUES (%s, %s, ...)"
row_data_dict = defaultdict(lambda : None, {...})
params = []
for col in [col1, col2, ...]:
params.append(row_data_dict[col])
cursor.execute(qry, tuple(params))
My main problem is that setting None as the default in the dictionary results in either NULL being inserted, or an error if the column is declared NOT NULL. I have a large number of columns that might change in the future, so I'd like to avoid setting different 'default' values for different entries if at all possible.
I can't use the typical way of getting DEFAULT by simply skipping columns in the insert: while those columns might have the DEFAULT value, I can't guarantee it, and since I'm doing a large number of inserts I don't want to rebuild the query string for every insert depending on which values are default.
The other way of inserting DEFAULT seems to be passing DEFAULT as one of the parameters (e.g. INSERT INTO table (col1, ...) VALUES (DEFAULT, ...)), but in my case setting the dictionary's default to 'DEFAULT' results in an error (MySQL complains about an incorrect integer value when inserting into an integer column, so it seems to treat the value as the string 'DEFAULT' rather than the keyword).
This seems like it would be a relatively common use case, so it kind of shocks me that I can't figure out a way to do this. I'd appreciate any way to do this or get around it that I haven't already listed here.
EDIT: All of the relevant columns already have a DEFAULT value defined; it doesn't actually seem to replace NULL (or Python's None) when the row is inserted.
EDIT 2: The reason I want to avoid NULL so badly is that NULL != NULL and I want unique rows: if there's already a row (1, 2, 3, 'Unknown'), INSERT IGNORE'ing another row (1, 2, 3, 'Unknown') shouldn't insert it. With NULL you end up with a bunch of copies of the same record because one of the values is unknown.
You can use the DEFAULT() function in the VALUES list to specify that the column's default value should be used, and you can wrap it in an IFNULL() call so the default is used whenever the supplied value is NULL.
qry = """INSERT INTO table (col1, col2, ...)
VALUES (IFNULL(%s, DEFAULT(col1)), IFNULL(%s, DEFAULT(col2)), ...)"""
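Since the question mentions a large, changing column list, one way to avoid writing the IFNULL(...) pairs by hand is to generate the query string once from the column names. This is only a sketch under the question's assumptions (a dict per row that may be missing keys; the table name, column names and the rows iterable below are placeholders):

columns = ["col1", "col2", "col3"]  # placeholder column names

value_exprs = ", ".join(
    "IFNULL(%s, DEFAULT({0}))".format(col) for col in columns
)
qry = "INSERT INTO result_table ({0}) VALUES ({1})".format(
    ", ".join(columns), value_exprs
)
# INSERT INTO result_table (col1, col2, col3)
# VALUES (IFNULL(%s, DEFAULT(col1)), IFNULL(%s, DEFAULT(col2)), IFNULL(%s, DEFAULT(col3)))

for row_data_dict in rows:  # rows: iterable of per-row dicts, as in the question
    params = tuple(row_data_dict.get(col) for col in columns)  # None where a key is missing
    cursor.execute(qry, params)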
Welcome to Stack Overflow. What you need to do is add a default value in your database for the column that should have one. When you create the table, put DEFAULT and the value after the column definition, like this:
CREATE TABLE `yourTable` (`id` INT DEFAULT 0, .....)
If you have already created the table and need to alter the existing column, you would do something like this:
ALTER TABLE `yourTable` MODIFY `id` INT DEFAULT 0
So in your insert statement coming from Python, as long as you pass in either NULL or None for that column's value, the default value will be populated for that column when the row is inserted into your database.
Another thing to keep in mind is that you have to pass in the proper number of values when a column has a default set up. Say you have a table with 3 columns; we'll call them colA, colB and colC.
If you want to insert a row with colA_value for colA, nothing for colB (so it uses its default value) and colC_value for colC, you still need to pass in 3 values for the insert. If you passed in only colA_value and colC_value, then colA would get colA_value, colB would get colC_value and colC would be left null. You need to pass in values that MySQL will interpret like this:
INSERT INTO `yourTable` (`colA`, `colB`, `colC`)
VALUES
('colA_value', null, 'colC_value')
Even though you are not passing anything meaningful for colB, you still need to pass a null value from your Python program (null or None) as the value for colB in order for colB to be populated with its default value.
If you pass only 2 values to MySQL to insert a row into your table, the insert statement under the hood will look like this:
INSERT INTO `yourTable` (`colA`, `colB`, `colC`)
VALUES
('colA_value', 'colC_value')
which would result in colA getting set to colA_value, colB getting set to colC_value and colC being left as null
If you are passing in the right number of values (which means including null or None as the value for the column with the default), then that is another story. Please let me know whether you are passing in the right number of values so I can help you troubleshoot further if needed.
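In Python, the three-value insert described above might look like the sketch below; yourTable and the colA/colB/colC names are the hypothetical ones from this answer, and whether colB ends up with its default or with NULL still depends on the column's NOT NULL constraint and the server's SQL mode:

sql = "INSERT INTO yourTable (colA, colB, colC) VALUES (%s, %s, %s)"
cursor.execute(sql, ("colA_value", None, "colC_value"))  # None is sent as NULL for colB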
I am trying to write from a pandas dataframe to AWS Redshift:
import numpy as np
import pandas as pd

df_tmp_rpt = pd.read_csv('path')
df_tmp_rpt = df_tmp_rpt[df_tmp_rpt['COL'] == 'VALUE']
df_tmp_rpt = df_tmp_rpt.replace(np.nan, null, regex=True)
records = df_tmp_rpt.to_records(index=False)
for record in records:
    script_insert = ScriptReader.get_script(SCRIPT_PATH).format(record)
    RedshiftDataManager.run_update(script_insert, DB_CONNECTION)
Redshift expects the format ('value1', 'value2', null) for inserting data, which is why I try to replace all NaN with null in the dataframe. How would I achieve that? (I need an actual null value, not the string 'null'.)
Thanks in advance for any help.
This is what worked for me.
df_tmp_rpt = df_tmp_rpt.where(df_tmp_rpt.notna(), None)
This will replace all the NaN values in your DataFrame with None, and None is loaded as NULL in the database. This worked for me with MS SQL.
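As a rough sketch of how those None values can reach the database without being formatted into the SQL text, here is a parameterized insert through a DB-API driver; the table and column names are placeholders, and the driver is assumed to be one that uses %s placeholders (e.g. psycopg2 for Redshift):

df_tmp_rpt = df_tmp_rpt.where(df_tmp_rpt.notna(), None)
rows = [tuple(rec) for rec in df_tmp_rpt.to_records(index=False)]
cursor.executemany(
    "INSERT INTO tmp_rpt (col_a, col_b, col_c) VALUES (%s, %s, %s)",
    rows,
)  # None values are bound as NULL by the driver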
There is no null in Python; None is the closest equivalent. In AWS Redshift a null means the value is missing or unknown, so replacing NaN with an empty string might also work. Consider using df_tmp_rpt.fillna() instead of replace():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html#pandas-dataframe-fillna
I'm in the process of learning the pandas library. My task is to download a table from a website, transform it, and load it into a database - in this case MS Access. I download the data into my DataFrame.
My problem is that one of the columns of the table (a price column) contains the value '-'. Looking for information on how to deal with it, I found three main possibilities:
Using replace to change '-' to 0. However, this solution does not meet my expectations, because '-' means no data, not a value equal to 0.
Replacing '-' with an empty string - this solution won't do either, because after the changes the column needs to have the float data type.
Replacing '-' with NaN using .replace('-', np.nan) - this comes closest to solving my problem, but after loading the data into Access with the pyodbc library the replaced records have the value '1,#QNAN'. I'm betting that's the format Access uses for NaN, but the problem appears when I try to compute an average over the column with SQL:
SELECT AVG(nameColumns) FROM nameTable
returns the 'Overflow' message.
Does anyone have an idea what to do with '-'? Is there any way for the numeric field to simply be empty after loading?
EDIT - more code:
import pyodbc

conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=PathToDB;')
cursor = conn.cursor()

for index, row in df.iterrows():
    cursor.execute("INSERT INTO tableName(col1,col2,col3) VALUES (?,?,?)",
                   row['col1'], row['col2'], row['col3'])
conn.commit()
cursor.close()
conn.close()
EDIT 2 - more code
import pandas as pd

d = {'col1': [1, 2, '-'], 'col2': [5, '-', 3]}
dfstack = pd.DataFrame(data=d)
dfstack.head()

dfstack = dfstack.replace("-", None)
dfstack.head()
Maybe you could replace '-' with the None keyword in Python? I'm not sure exactly how pyodbc handles it, but SQL's AVG function ignores NULL values, and pyodbc should convert None to NULL.
https://www.sqlservertutorial.net/sql-server-aggregate-functions/sql-server-avg/
You need to replace the '-' with None, which gets converted to NULL when inserting with pyodbc:
dfstack = dfstack.where(dfstack!='-', None)
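Plugged into the insert loop from the question's first edit, that might look like the following sketch; tableName and col1..col3 are the question's placeholder names, and conn/cursor are the already-open pyodbc connection and cursor:

df = df.where(df != '-', None)  # '-' becomes None, which pyodbc binds as NULL

for index, row in df.iterrows():
    cursor.execute("INSERT INTO tableName(col1,col2,col3) VALUES (?,?,?)",
                   row['col1'], row['col2'], row['col3'])
conn.commit()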
I have a table in an SQLite3 database (using Python), a tweet table (TwTbl), that has some values in the column geo_id. Most of the values in this column are NULL/None. I want to replace/update all NULLs in the geo_id column of TwTbl with the number 999. I am not sure about the syntax. I am trying the following query, but I am getting an error ("No such column: None"):
c.execute("update TwTbl SET geo_id = 999 where geo_id = None").fetchall()
I even tried using Null instead of None; that did not give any errors, but it also did not update anything.
Any help will be appreciated.
As an answer, so that you can accept it if you're inclined.
You need IS NULL instead of = Null. NULL is a special, indeterminate value: it compares as neither equal nor unequal to anything in most database implementations.
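In the question's code that would be the sketch below; UPDATE returns no result set, so there is nothing to fetch:

c.execute("UPDATE TwTbl SET geo_id = 999 WHERE geo_id IS NULL")
conn.commit()  # assuming `conn` is the open sqlite3 connection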