Invalid Date when inserting to Teradata using Python - python

I'm working on a Python piece that will insert a dataframe into a Teradata table using pyodbc. The error I can't get past is:
File "file.py", line 33, in <module>
cursor.execute("INSERT INTO DB.TABLE (MASDIV,TRXTYPE,STATION,TUNING_EVNT_START_DT,DOW,MOY,TRANSACTIONS)VALUESrow['MASDIV'],'trx_chtr',row['STATION'],row['TUNING_EVNT_START_DT'],row['DOW'],row['MOY'],row['TRANSACTIONS'])
pyodbc.DataError: ('22008', '[22008] [Teradata][ODBC Teradata Driver][TeradataDatabase] Invalid date supplied for Table.TUNING_EVNT_START_DT. (-2666) (SQLExecDirectW)')
To fill you in: I've got a Teradata table that I want to insert a dataframe into. The table is created as:
CREATE SET TABLE DB.TABLE, FALLBACK
(PK decimal(10,0) NOT NULL GENERATED ALWAYS AS IDENTITY
(START WITH 1
INCREMENT BY 1
MINVALUE 1
--MAXVALUE 2147483647
NO CYCLE),
TRXTYPE VARCHAR(10),
MASDIV VARCHAR(30),
STATION VARCHAR(50),
TUNING_EVNT_START_DT DATE format 'MM/DD/YYYY',
DOW VARCHAR(3),
MOY VARCHAR(10),
TRANSACTIONS INT,
ANOMALY_FLAG INT NOT NULL DEFAULT 1)
PRIMARY INDEX (PK);
The primary key and ANOMALY_FLAG will be filled in automatically. Below is the script that I am using and running into the error. It reads in a CSV and creates a dataframe. The header and first two data rows of the CSV look like:
MASDIV | STATION | TUNING_EVNT_START_DT | DOW | MOY | TRANSACTIONS
Staten Island | WFUTDT4 | 9/12/18 | Wed | September | 538
San Fernando Valley | American Heroes Channel HD | 6/28/2018 | Thu | June | 12382
Here is the script that I am using...
'''
Written by Bobby October 1st, 2018
REFERENCE
https://tomaztsql.wordpress.com/2018/07/15/using-python-pandas-dataframe-to-read-and-insert-data-to-microsoft-sql-server/
'''
import pandas as pd
import pyodbc
from datetime import datetime
#READ IN CSV TEST DATA
df = pd.read_csv('Data\\test_set.csv')
print('CSV LOADED')
#ADJUST DATE FORMAT
df['TUNING_EVNT_START_DT'] = pd.to_datetime(df.TUNING_EVNT_START_DT)
#df['TUNING_EVNT_START_DT'] = df['TUNING_EVNT_START_DT'].dt.strftime('%m/%d/%Y')
df['TUNING_EVNT_START_DT'] = df['TUNING_EVNT_START_DT'].dt.strftime('%Y-%m-%d')
print('DATE FORMAT CHANGED')
print(df)
#PUSH TO DATABASE
conn = pyodbc.connect('dsn=ConnectR')
cursor = conn.cursor()
# Database table has columns...
# PK | TRXTYPE | MASDIV | STATION | TUNING_EVNT_START_DT | DOW | MOY | TRANSACTIONS | ANOMALY_FLAG
# PK is autoincrementing, TRXTYPE needs to be specified on the insert command,
# and ANOMALY_FLAG defaults to 1 for yes
for index, row in df.iterrows():
    cursor.execute("INSERT INTO DLABBUAnalytics_Lab.Anomaly_Detection_SuperSet (MASDIV,TRXTYPE,STATION,TUNING_EVNT_START_DT,DOW,MOY,TRANSACTIONS) VALUES (?,?,?,?,?,?,?)", row['MASDIV'], 'trx_chtr', row['STATION'], row['TUNING_EVNT_START_DT'], row['DOW'], row['MOY'], row['TRANSACTIONS'])
    conn.commit()
    print('RECORD ENTERED')
print('DF SUCCESSFULLY WRITTEN TO DB')
#PULL FROM DATABASE
sql_conn = pyodbc.connect('dsn=ConnectR')
query = 'SELECT * FROM DLABBUAnalytics_Lab.Anomaly_Detection_SuperSet;'
df = pd.read_sql(query, sql_conn)
print(df)
So in this I am converting the date format and trying to insert row by row into the Teradata table. The first record reads in fine and is in the database. The second record throws the error shown at the top. Its date is 6/28/18, and I've changed it to 6/11/18 just to see if there was a mix-up between day and month, but that still had the same problem. Are the columns getting shifted somewhere, so that it is trying to insert a different column's value into the date column?
Any ideas or help is much appreciated!

So the issue was in the format of the table. Initially the date column was built with the MM/DD/YYYY format to match the CSV, but changing it to the YYYY-MM-DD format made the script run perfectly.
Thanks!
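For reference, a minimal sketch of an alternative that sidesteps string formats entirely, assuming the ODBC driver binds Python date objects as typed DATE values (pyodbc generally does):

import pandas as pd

# Bind real date objects instead of formatted strings; the driver then sends
# a typed DATE value, so the column's FORMAT clause no longer matters.
df['TUNING_EVNT_START_DT'] = pd.to_datetime(df['TUNING_EVNT_START_DT']).dt.date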

Related

Pandas df to MySQL gives the same timestamp

I am trying to upload a CSV to MySQL by first reading the CSV using pandas's .read_csv function and then using the .to_sql function to upload it to the db table.
I have a modification_time column defined in the table schema as follows:
CREATE TABLE test_table (
id BIGINT(20) UNSIGNED NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY unique_id (id),
modification_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
insertion_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
and the code to read and upload the data is as follows:
from sqlalchemy import create_engine
import pymysql
import pandas as pd
import mysql.connector
from urllib.parse import quote
conn = mysql.connector.connect(
    host='xx.x.x.xx', user='username', password='password', database='dbname')
sql = 'TRUNCATE ' + 'test_table' + ";"
cur = conn.cursor()
cur.execute(sql)  # execute the TRUNCATE before committing
conn.commit()
conn.close()
engine = create_engine('mysql+pymysql://username:%s@xx.x.x.xx/dbname' % quote('password'), echo=True)
df = pd.read_csv("inputdata.csv")
df.to_sql('test_table', con = engine, if_exists='append', index=False)
The data is being uploaded fine for all the columns except the modification_time and insertion_time columns: the same timestamp is repeated for all the records in the table.
I want a different insertion timestamp for each row, since they are uploaded one after another (I am passing None as the method parameter in the .to_sql function).
The method parameter is referred from here.
Any suggestions are very much appreciated, thanks.
The default MySQL TIMESTAMP records times with a granularity of seconds, so it's quite likely that multiple records will be inserted in the same second.
MariaDB [test]> create table tstest (
-> name varchar(4),
-> ts timestamp default current_timestamp
-> );
Query OK, 0 rows affected (0.274 sec)
MariaDB [test]> insert into tstest (name) values ('a');
Query OK, 1 row affected (0.057 sec)
MariaDB [test]> select * from tstest;
+------+---------------------+
| name | ts                  |
+------+---------------------+
| a    | 2021-12-28 09:42:26 |
+------+---------------------+
You can specify a fractional seconds value in the column description to increase the granularity of the timestamps recorded (6 is the highest value accepted in MySQL 8.0):
MariaDB [test]> create table tstest ( name varchar(4), ts timestamp(6) default current_timestamp(6) );
Query OK, 0 rows affected (0.247 sec)
MariaDB [test]> insert into tstest (name) values ('a');
Query OK, 1 row affected (0.040 sec)
MariaDB [test]> select * from tstest;
+------+----------------------------+
| name | ts                         |
+------+----------------------------+
| a    | 2021-12-28 09:47:10.708227 |
+------+----------------------------+
This doesn't guarantee that you won't have collisions (and it is subject to the precision of the system clock), but it makes them much less likely.
While this approach may help, if you want timestamps based on the order of records in your dataframe, I would consider setting them on arrival in the dataframe; relying on the SQL engine processing the records in a specific order may not be safe or portable.
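For example, a minimal sketch of stamping the rows in the dataframe before upload (assuming the table accepts explicit values for insertion_time instead of relying on the column default):

import pandas as pd

# Give each row a strictly increasing timestamp that follows dataframe order;
# microsecond offsets keep rows distinct even within the same second.
base = pd.Timestamp.now()
df['insertion_time'] = [base + pd.Timedelta(microseconds=i) for i in range(len(df))]
df.to_sql('test_table', con=engine, if_exists='append', index=False)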

Append sqlite3 data from csv to a table whose 1 column is - id INTEGER PRIMARY KEY AUTOINCREMENT

So I have a table whose first column, id, is autoincrementing.
Now, suppose I have data in the table with ids 1, 2, 3.
And I also have some data in the CSV that starts with ids 1, 2, 3.
This is the code that I am trying to use:
cur.execute("CREATE TABLE IF NOT EXISTS sub_features (id INTEGER PRIMARY KEY AUTOINCREMENT,featureId INTEGER, name TEXT, FOREIGN KEY(featureId) REFERENCES features(id))")
df = pd.read_csv(csv_location+'/sub_features_table.csv')
df.to_sql("sub_features", con, if_exists='append', index=False)
I am getting this error:
sqlite3.IntegrityError: UNIQUE constraint failed: sub_features.id
How do I make sure that the data gets appended, that the id gets set as required, and that any row that is an exact duplicate gets ignored?
To explain further, Say I have a table:
id | Name
1 | Abhisek
2 | Amit
And I am trying to import this csv to the same table:
id | Name
1 | Abhisek
2 | Rahul
Then my resultant table should be:
id | Name
1 | Abhisek
2 | Amit
3 | Rahul
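A hedged sketch of one way to get there (it assumes a duplicate is defined by the name column, since the CSV's ids are meant to be discarded): drop the id column and filter out names that already exist before appending.

import sqlite3
import pandas as pd

con = sqlite3.connect('mydb.db')  # hypothetical db file
df = pd.read_csv(csv_location + '/sub_features_table.csv')

# Drop the CSV ids so AUTOINCREMENT can assign fresh ones past the existing max.
df = df.drop(columns=['id'])

# Skip rows whose name is already in the table.
existing = pd.read_sql('SELECT name FROM sub_features', con)
df = df[~df['name'].isin(existing['name'])]

df.to_sql('sub_features', con, if_exists='append', index=False)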

How to split comma delimited values into multiple rows using Sqlite

I'm using Python and SQLite to manipulate a string on Android.
I have a SQLite Table that looks like this:
| ID | Country
+----+----------------------
| 1  | USA, Germany, Mexico
| 2  | Brazil, Canada
| 3  | Peru
I would like to split the comma-delimited values of the Country column and insert them into another table, Countries, so that the Countries table looks like this:
| ID | Country
+----+---------
| 1  | USA
| 1  | Germany
| 1  | Mexico
| 2  | Brazil
| 2  | Canada
| 3  | Peru
How do I split the values from the Country column in one table and insert them into the Country column of another table?
There is no split function in SQLite.
There is of course the substr function, but it's not suitable for your needs since every row could contain more than one comma.
If you were an expert in SQLite, I guess you could create a recursive statement using substr to split each row.
If you're not, use Python to read the data, split each row and write it back to the db.
You can use a recursive common table expression to split the comma-delimited column by extracting substrings of the Country column recursively.
CREATE TABLE country_split AS
WITH RECURSIVE split(id, value, rest) AS (
  -- seed: empty value, full country list with a trailing comma as the rest
  SELECT ID, '', Country||',' FROM country
  UNION ALL
  -- peel off everything before the next comma; keep the remainder in rest
  SELECT
    id,
    substr(rest, 0, instr(rest, ',')),
    substr(rest, instr(rest, ',')+1)
  FROM split WHERE rest!=''
)
SELECT id, value
FROM split
WHERE value!='';
I solved it using Python:
import sqlite3

db = sqlite3.connect('mydb.db')
cursor = db.cursor()
cursor.execute("""SELECT * FROM Countries""")
all_data = cursor.fetchall()
cursor.execute("""CREATE TABLE IF NOT EXISTS Countriess
                  (ID TEXT,
                   Country TEXT)""")
for single_data in all_data:
    countriess = single_data[1].split(",")
    for single_country in countriess:
        # strip() removes the space left after each comma, e.g. "USA, Germany"
        cursor.execute("INSERT INTO Countriess VALUES(:id, :name)",
                       {"id": single_data[0], "name": single_country.strip()})
db.commit()
and afterwards you can use the SQLite db in another project :)

Inserting a predetermined datetime value into SQL table

Question: how do I insert a datetime value into MS SQL server, given the code below?
Context:
I have a 2-D list (i.e., a list of lists) in Python that I'd like to upload to a table in Microsoft SQL Server 2008. For this project I am using Python's pymssql package. Each value in each list is a string except for the very first element, which is a datetime value.
Here is how my code reads:
import pymssql
db_connect = pymssql.connect(  # these are just generic names
    server=server_name,
    user=db_usr,
    password=db_pwd,
    database=db_name
)
my_cursor = db_connect.cursor()
for individual_list in list_of_lists:
    # the first value in the tuple should be datetime
    my_cursor.execute("INSERT INTO [DB_Table_Name] VALUES (%s, %s, %s, %s, %s, %s, %s, %s)", tuple(individual_list))
db_connect.commit()
The Python interpreter is having a tough time inserting my datetime values. I understand that currently I have %s and that it is a string formatter, but I'm unsure what I should use for datetime, which is what the database's first column is formatted as.
The "list of lists" looks like this (after each list is converted into a tuple):
[(datetime.datetime(2012, 4, 1), '1', '4.1', 'hip', 'A1', 'J. Smith', 'B123', 'XYZ'),...]
Here is an illustration of what the table should look like:
+-----------+------+------+--------+-------+-----------+---------+---------+
| date | step | data | type | ID | contact | notif. | program |
+-----------+------+------+--------+-------+-----------+---------+---------+
|2012-04-01 | 1 | 4.1 | hip | A1 | J. Smith | B123 | XYZ |
|2012-09-05 | 2 | 5.1 | hip | A9 | B. Armst | B123 | ABC |
|2012-01-16 | 5 | 9.0 | horray | C6 | F. Bayes | P995 | XYZ |
+-----------+------+------+--------+-------+-----------+---------+---------+
Thank you in advance.
I would try formatting the datetime to "yyyymmdd hh:mm:ss" before inserting. With what you are doing, SQL will be parsing the string, so I would also build the entire string and then insert it as a variable. See below:
for individual_list in list_of_lists:
    # the first value in the list should be datetime
    date_time = individual_list[0].strftime("%Y%m%d %H:%M:%S")
    # quote every value and join them so the statement lists all eight columns
    values = "', '".join([date_time] + [str(v) for v in individual_list[1:]])
    insert_str = "INSERT INTO [DB_Table_Name] VALUES ('" + values + "');"
    print(insert_str)
    my_cursor.execute(insert_str)
db_connect.commit()
I apologize for the crude Python, but SQL should like that insert statement as long as all the fields match up. If not, you may want to specify which fields those values go to in your insert statement.
Let me know if that works.
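For reference, the "yyyymmdd hh:mm:ss" form is a date literal SQL Server parses unambiguously regardless of language or DATEFORMAT settings. A quick check of what the formatting step produces:

import datetime

dt = datetime.datetime(2012, 4, 1)
print(dt.strftime("%Y%m%d %H:%M:%S"))  # prints: 20120401 00:00:00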

Storing the data from text file into mysql table

I have a text file and a MySQL table. The text file looks like below.
new.txt
apple| 3
ball | 4
cat | 2
From this text file I want to store the data in the MySQL table below:
| query | count | is_prod_ready | time_of_created | last_updated |
I want to store apple, ball, cat in the query column, and the numbers 3, 4, 2 in the count column. The is_prod_ready column will be false by default, time_of_created will take the current time, and last_updated will take the update time.
I have already made the table, but I am not able to store all the data into the database from the text file. I have tried the code below:
import MySQLdb
con = MySQLdb.connect(host="localhost",user="root",passwd="9090547207",db="Test")
cur = con.cursor()
query = 'load data local infile "new.txt" into table data field terminated by "|" lines terminated by "\n" '
cur.execute(query)
con.commit()
con.close()
Here my database name is Test and the table name is data.
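A hedged sketch of a likely fix (assuming both the MySQL server and the client permit local_infile, and that the other columns are declared with the desired defaults): the keyword is FIELDS, and naming only the two target columns lets MySQL fill the rest from their defaults.

import MySQLdb

# local_infile=1 lets the client send the file; the server must allow it too.
con = MySQLdb.connect(host="localhost", user="root", passwd="secret",  # placeholder credentials
                      db="Test", local_infile=1)
cur = con.cursor()
query = """
    LOAD DATA LOCAL INFILE 'new.txt' INTO TABLE data
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '\\n'
    (query, `count`)
"""
cur.execute(query)
con.commit()
con.close()

Note the sample rows have spaces around the '|' delimiter; MySQL will still coerce values like ' 3' into an INT column, though trimming the file first is cleaner.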
