I am trying to write from a pandas DataFrame to AWS Redshift:
import pandas as pd
import numpy as np

df_tmp_rpt = pd.read_csv('path')
df_tmp_rpt = df_tmp_rpt[df_tmp_rpt['COL'] == 'VALUE']
df_tmp_rpt = df_tmp_rpt.replace(np.nan, null, regex=True)  # NameError: there is no `null` in Python
records = df_tmp_rpt.to_records(index=False)
for record in records:
    script_insert = ScriptReader.get_script(SCRIPT_PATH).format(record)
    RedshiftDataManager.run_update(script_insert, DB_CONNECTION)
Redshift expects the format ('value1','value2',null) for inserting data. That is why I try to replace all NaN values with null in the DataFrame. How would I achieve this? (I need an actual null value, not the string 'null'.)
Thanks for the help in advance.
This is what worked for me.
df_tmp_rpt = df_tmp_rpt.where(df_tmp_rpt.notna(), None)
This will replace all the NaN values in your DataFrame with None, and None is loaded as NULL in the database. This works in MS SQL as well.
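A minimal sketch of that approach; the column names and values here are made up for the demo:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", np.nan]})

# Cast to object first so float columns can hold a real None (a float column
# silently turns None back into NaN), then substitute None where values are missing.
cleaned = df.astype(object).where(df.notna(), None)

rows = [tuple(r) for r in cleaned.itertuples(index=False)]
print(rows)  # [(1.0, 'x'), (None, None)]
```

The resulting tuples contain real None values, which a DB driver sends as NULL.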
There is no null in Python; its equivalent is None. In AWS Redshift, a null marks a value that is missing or unknown, which is not the same as an empty string. Note that fillna() cannot fill with None (df_tmp_rpt.fillna(value=None) raises a ValueError), so consider df_tmp_rpt.replace({np.nan: None}) instead of replace() with a regex.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html#pandas-dataframe-fillna
I'm brand new to Python and to updating tables using SQL. I would like to ask how to update a certain group of values in a single column using SQL. Please see the example below:
id
123
999991234
235
789
200
999993456
I need to add the missing prefix '99999' to the records without it. The id column has an integer data type by default. I've tried the SQL statement below, but I get a conflict between data types even though I've tried it with a cast:
update tablename
set id = concat('99999', cast(id as string))
where id not like '99999%';
To be able to use the LIKE operator and the CONCAT() function, the column data type should be STRING or BYTES. In this case, you need to cast the value in the WHERE clause condition as well as in the SET statement.
Using your sample data, I ran this update script:
UPDATE mydataset.my_table
SET id = CAST(CONCAT('99999', CAST(id AS STRING)) AS INTEGER)
WHERE CAST(id as STRING) NOT LIKE '99999%'
Result: the rows were updated successfully and the table ended up with this data:
id
99999123
999991234
99999235
99999789
99999200
999993456
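The same fix can be sketched with SQLite from the Python standard library, which concatenates with || instead of CONCAT(); the table name is taken from the question, and the data matches the sample:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER)")
conn.executemany("INSERT INTO tablename VALUES (?)",
                 [(123,), (999991234,), (235,), (789,), (200,), (999993456,)])

# Cast to text for LIKE and ||, then cast the prefixed value back to an integer
conn.execute("""
    UPDATE tablename
    SET id = CAST('99999' || CAST(id AS TEXT) AS INTEGER)
    WHERE CAST(id AS TEXT) NOT LIKE '99999%'
""")

ids = [r[0] for r in conn.execute("SELECT id FROM tablename ORDER BY id")]
print(ids)  # [99999123, 99999200, 99999235, 99999789, 999991234, 999993456]
```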
I'm in the process of learning the pandas library. My task is to download a table from a website, transform it and send it to a database - in this case to MS Access. I download the data into my DataFrame.
My problem is that the selected table has the value '-' in one of the columns (concerning prices). Looking for information on how to deal with it, I found three main possibilities:
Using replace to change '-' to 0. However, this solution does not meet my expectations, because '-' means missing data, not a value equal to 0.
Replacing '-' with an empty string - this solution will not work either, because the column needs to keep its float data type.
Replacing '-' with NaN using .replace('-', np.nan) - this possibility is closest to solving my problem, but after loading the data into Access with the pyodbc library, the replaced records have the value '1,#QNAN'. I'm guessing that this is how Access renders NaN, but the problem occurs when I try to pull the average from the column using SQL:
SELECT AVG(nameColumn) FROM nameTable
returns an 'Overflow' message.
Does anyone have any idea what to do with '-'? Is there any way for the numeric field to simply be empty after loading?
EDIT - more code:
conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=PathToDB;')
cursor = conn.cursor()
for index, row in df.iterrows():
    cursor.execute("INSERT INTO tableName (col1, col2, col3) VALUES (?, ?, ?)",
                   row['col1'], row['col2'], row['col3'])
conn.commit()
cursor.close()
conn.close()
EDIT 2 - more code
import pandas as pd
d ={'col1': [1,2,'-'],'col2':[5,'-',3]}
dfstack = pd.DataFrame(data=d)
dfstack.head()
dfstack = dfstack.replace("-",None)
dfstack.head()
Maybe you could replace '-' with the None keyword in Python? I'm not sure how pyodbc works, but SQL's AVG function ignores NULL values, and pyodbc might convert None to NULL.
https://www.sqlservertutorial.net/sql-server-aggregate-functions/sql-server-avg/
You need to replace the '-' with None; that gets converted to NULL when inserting with pyodbc:
dfstack = dfstack.where(dfstack!='-', None)
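Putting it together, a minimal sketch using the sample frame from EDIT 2 ('-' marks missing data, so it becomes a real None that pyodbc would send as NULL):

```python
import pandas as pd

d = {"col1": [1, 2, "-"], "col2": [5, "-", 3]}
dfstack = pd.DataFrame(data=d)

# where() keeps cells that are not '-' and substitutes None elsewhere;
# the mixed columns are already object dtype, so they can hold None.
dfstack = dfstack.where(dfstack != "-", None)

rows = [tuple(r) for r in dfstack.itertuples(index=False)]
print(rows)  # [(1, 5), (2, None), (None, 3)]
```

These tuples can then be passed straight to cursor.execute() in the insert loop above.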
I have a table which looks like this:
When I try to look up only the row with case_id = 5 based on pr, sr, sn, I use the following code:
SELECT case_id, dupl_cnt
FROM cases
WHERE pr = NULLIF('', '')::INT AND
sr = NULLIF('CH_REP13702.10000', '')::VARCHAR AND
sn = NULLIF('22155203912', '')::VARCHAR
However, the code above does not yield any results (empty query result). I have narrowed it down to some sort of issue with the "pr" value being null: when the "pr" line is removed from the query above, it works as expected. Can someone explain why that is happening? I anticipate that the pr or sr columns will sometimes contain NULL values, but I still have to be able to look up case_id numbers with them.
(The NULLIF function is there because this is part of a Python integration via the psycopg2 module, and I have to anticipate that data entry will sometimes produce an empty string for these values.)
NULLIF('', '') returns NULL, and that doesn't satisfy the pr = NULL condition, because
anything = NULL returns NULL
You need to use IS NOT DISTINCT FROM instead of =.
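A minimal sketch of why `= NULL` never matches, using SQLite from the Python standard library; SQLite's null-safe IS operator plays the role of PostgreSQL's IS NOT DISTINCT FROM here, and the table contents are made up to mirror the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (case_id INTEGER, pr INTEGER, sr TEXT)")
conn.execute("INSERT INTO cases VALUES (5, NULL, 'CH_REP13702.10000')")

# `pr = NULL` evaluates to NULL (not TRUE), so no rows match
eq = conn.execute("SELECT case_id FROM cases WHERE pr = NULL").fetchall()

# the null-safe comparison does match the row, even with a None parameter
null_safe = conn.execute("SELECT case_id FROM cases WHERE pr IS ?",
                         (None,)).fetchall()

print(eq)         # []
print(null_safe)  # [(5,)]
```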
I have some data that contains NULLs, floats and the occasional NaN. I'm trying to insert this data into a MySQL database using Python and MySQLdb.
Here's the insert statement:
for row in zip(currents, voltages):
    row = [id] + list(row)
    sql_insert = ('INSERT INTO result (id, current, voltage) '
                  'VALUES (%s, %s, %s)')
    cursor.execute(sql_insert, row)
This is the table:
CREATE TABLE langmuir_result (result_id INT auto_increment,
id INT,
current FLOAT,
voltage FLOAT,
PRIMARY KEY (result_id));
When I try to insert NaN into the table I get this error:
_mysql_exceptions.DataError: (1265, "Data truncated for column 'current' at row 1")
I want to insert the NaN values into the database as a float or a number, not as a string or NULL. I've tried making the column type FLOAT and DECIMAL, but I get the same error. How can I do this without using a string or NULL? Is that possible?
No, it's not possible to store a NaN value in a FLOAT column in MySQL; only NULL or a number is allowed. You could work around it by reserving some value you don't otherwise use to stand in for NaN (perhaps a negative or an extreme value).
Try converting your NaN values to None: MySQL understands None, and in the UI you'll see NULL.
You can use the pandas isna function:
import pandas as pd

if pd.isna(row[column]):
    values_data.append(None)
else:
    values_data.append(row[column])
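The same conversion can be sketched without pandas, using only the standard library; the helper name and sample row are made up for the demo:

```python
import math

def nan_to_none(value):
    """Return None for a float NaN, otherwise the value unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value

# Clean a row before handing it to cursor.execute(), so the driver
# sends NULL instead of a NaN the FLOAT column would reject.
row = (1, float("nan"), 3.7)
clean = tuple(nan_to_none(v) for v in row)
print(clean)  # (1, None, 3.7)
```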
I have a table, Tweet Table (TwTbl), in a SQLite3 database (accessed from Python) that has some values in the column geo_id. Most of the values in this column are NULL/None. I want to update all NULLs in the geo_id column of TwTbl to the number 999. I am not sure about the syntax. I am trying the following query, but I am getting an error ("No such Column: None"):
c.execute("update TwTbl SET geo_id = 999 where geo_id = None").fetchall()
I even tried using Null instead of None; that did not give any errors, but it did not do any update either.
Any help will be appreciated.
As an answer, so that you can accept it if you're inclined.
You need IS NULL instead of = Null. NULL is a special value that is indeterminate: a comparison with it is neither true nor false in most database implementations.
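A minimal self-contained sketch of the fix with sqlite3; the table is recreated in memory here, and the tweet_id column is made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE TwTbl (tweet_id INTEGER, geo_id INTEGER)")
c.executemany("INSERT INTO TwTbl VALUES (?, ?)",
              [(1, 42), (2, None), (3, None)])

# `geo_id = NULL` never matches; `geo_id IS NULL` is the correct test
c.execute("UPDATE TwTbl SET geo_id = 999 WHERE geo_id IS NULL")

geo_ids = [r[0] for r in c.execute("SELECT geo_id FROM TwTbl ORDER BY tweet_id")]
print(geo_ids)  # [42, 999, 999]
```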