What is the best approach to exporting byte/graphic data from Teradata? For a data migration project, can someone guide me on how to export data from Teradata to Snowflake? I am using TPT scripts and the tdload approach; however, it doesn't seem to work.
The approach I have followed:
Convert the data from BYTE to ASCII using the FROM_BYTES() function in Teradata. However, during the ingestion process I was not able to get it back to its original state.
Use the FROM_BYTES() function with base10/base16 to get the data into the desired format; however, I am facing the same issue with this process.
Below is the table structure:
CREATE SET TABLE DBC.AccessRights ,FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO,
MAP = TD_DATADICTIONARYMAP
(
UserId BYTE(4) NOT NULL,
DatabaseId BYTE(4) NOT NULL,
TVMId BYTE(6) NOT NULL,
FieldId SMALLINT FORMAT '---,--9' NOT NULL,
AccessRight CHAR(2) CHARACTER SET LATIN UPPERCASE NOT CASESPECIFIC NOT NULL,
WithGrant CHAR(1) CHARACTER SET LATIN UPPERCASE NOT CASESPECIFIC NOT NULL,
GrantorID BYTE(4) NOT NULL,
AllnessFlag CHAR(1) CHARACTER SET LATIN UPPERCASE NOT CASESPECIFIC NOT NULL,
CreateUID BYTE(4),
CreateTimeStamp TIMESTAMP(0),
LastAccessTimeStamp TIMESTAMP(0),
AccessCount INTEGER FORMAT '--,---,---,--9')
PRIMARY INDEX ( UserId )
PARTITION BY ( RANGE_N((ID2BIGINT(DatabaseId )) MOD 1073741824 BETWEEN 0 AND 1073741823 EACH 1 ),
RANGE_N(ID2BIGINT(TVMId ) BETWEEN 0 AND 4294967295. EACH 1 ) ADD 2 );
TPT script:
USING CHARACTER SET UTF8
DEFINE JOB EXPORT_DELIMITED_FILE
DESCRIPTION 'Export rows from a Teradata table to an unformatted file' ( DEFINE SCHEMA FILE_SCHEMA (
UserId BYTE(4),
DatabaseId BYTE(4),
TVMId BYTE(6),
FieldId SMALLINT,
AccessRight CHAR(4),
WithGrant CHAR(2),
GrantorID BYTE(4),
AllnessFlag CHAR(2),
CreateUID BYTE(4),
CreateTimeStamp TIMESTAMP(0),
LastAccessTimeStamp TIMESTAMP(0),
AccessCount INTEGER
);
DEFINE OPERATOR SQL_SELECTOR
TYPE SELECTOR SCHEMA FILE_SCHEMA ATTRIBUTES
(
VARCHAR PrivateLogName = 'selector_log',
VARCHAR TdpId = <host_id>,
VARCHAR UserName = <user_name>,
VARCHAR UserPassword = <password>,
VARCHAR SelectStmt = 'SELECT * FROM DBC.AccessRights;',
VARCHAR LobDirectoryPath = <lob_dir>
);
DEFINE OPERATOR FILE_WRITER TYPE DATACONNECTOR CONSUMER SCHEMA * ATTRIBUTES
(
VARCHAR PrivateLogName = 'dataconnector_log',
VARCHAR DirectoryPath = <dir_path>,
VARCHAR FileName = 'file.csv',
VARCHAR FORMAT= 'BINARY',
VARCHAR OpenMode = 'Write'
);
APPLY TO OPERATOR (FILE_WRITER)
SELECT * FROM OPERATOR (SQL_SELECTOR);
);
Snowflake defaults to a hex string representation for input/output of binary data. If you have TPT output DELIMITED format, with a schema defined or generated as all VARCHAR fields, you won't be able to use SELECT *, but you can use FROM_BYTES to generate the hex string. There are some quirks that are simple enough to work around as long as you are aware of them, namely:
If the high-order bit of the BYTE value is set, the returned value is the two's complement preceded by a negative sign.
Leading zeros are removed from the result string.
But you can prefix a "positive, non-zero" byte, convert, and then remove the extra characters from the string:
SUBSTRING(FROM_BYTES('12'xb||DatabaseId, 'base16') FROM 3)
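To see why the workaround is needed and to confirm that the result loads back to the original bytes, here is a small Python sketch. It only mimics the FROM_BYTES base16 behaviour described above (signed interpretation, leading zeros dropped); the helper name and the sample BYTE(4) value are made up for illustration.

def from_bytes_base16(value: bytes) -> str:
    """Mimic FROM_BYTES(value, 'base16') as described above: the bytes are read
    as a signed big-endian integer, so a set high-order bit yields a minus sign
    and leading zeros are dropped."""
    n = int.from_bytes(value, byteorder="big", signed=True)
    return format(n, "x") if n >= 0 else "-" + format(-n, "x")

database_id = bytes.fromhex("000004e2")               # hypothetical BYTE(4) value

naive = from_bytes_base16(database_id)                # '4e2'      -> leading zeros lost
fixed = from_bytes_base16(b"\x12" + database_id)[2:]  # '000004e2' -> [2:] plays the role of SUBSTRING ... FROM 3

assert bytes.fromhex(fixed) == database_id            # round-trips back to the original bytes

Since Snowflake's default binary format is hex, as noted above, a fixed-width hex string like this can be loaded into a BINARY column as-is.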
Related
Although I am quite new to SQL, I have already used Python to build DBs, but now I am stuck.
To put it simply, I have a schema with three tables, which are related to one another via foreign keys. They were created using Python, as described below (not showing the definitions of c and conn, as I am pretty sure that the error does not lie there):
import sqlalchemy
import pandas as pd
# create the runsMaster table
c.execute("""CREATE TABLE IF NOT EXISTS `ngsRunStats_FK`.`runsMaster` (
`run_ID` INT NOT NULL AUTO_INCREMENT,
`run_name` VARCHAR(50) NULL,
PRIMARY KEY (`run_ID`))
ENGINE = InnoDB""")
# Create the samplesMaster table
c.execute("""CREATE TABLE IF NOT EXISTS `ngsRunStats_FK`.`samplesMaster` (
`sample_ID` INT NOT NULL AUTO_INCREMENT,
`run_ID` INT NULL,
`sample_name` VARCHAR(50) NULL,
PRIMARY KEY (`sample_ID`),
INDEX `fk_table1_runsMaster1_idx` (`run_ID` ASC),
CONSTRAINT `fk_table1_runsMaster1`
FOREIGN KEY (`run_ID`)
REFERENCES `ngsRunStats_FK`.`runsMaster` (`run_ID`)
ON DELETE CASCADE
ON UPDATE NO ACTION)
ENGINE = InnoDB""")
# Create the XYStats table
c.execute("""CREATE TABLE IF NOT EXISTS `ngsRunStats_FK`.`XYstats` (
`XYstats_ID` INT NOT NULL AUTO_INCREMENT,
`run_ID` INT NULL,
`sample_ID` INT NULL,
`X_TOTAL_COVERAGE` FLOAT NULL,
`X_TARGET_COUNT` FLOAT NULL,
`X_MEAN_TARGET_COVERAGE` FLOAT NULL,
`Y_TOTAL_COVERAGE` FLOAT NULL,
`Y_TARGET_COUNT` FLOAT NULL,
`Y_MEAN_TARGET_COVERAGE` FLOAT NULL,
`Ymeancov_Xmeancov` FLOAT NULL,
PRIMARY KEY (`XYstats_ID`),
INDEX `fk_XYstats_runsMaster_idx` (`run_ID` ASC),
INDEX `fk_XYstats_samplesMaster1_idx` (`sample_ID` ASC),
CONSTRAINT `fk_XYstats_runsMaster`
FOREIGN KEY (`run_ID`)
REFERENCES `ngsRunStats_FK`.`runsMaster` (`run_ID`)
ON DELETE CASCADE
ON UPDATE NO ACTION,
CONSTRAINT `fk_XYstats_samplesMaster1`
FOREIGN KEY (`sample_ID`)
REFERENCES `ngsRunStats_FK`.`samplesMaster` (`sample_ID`)
ON DELETE CASCADE
ON UPDATE NO ACTION)
ENGINE = InnoDB""")
Both the samplesMaster and the runsMaster tables are working fine. They are automatically populated by other iterations that are not all that important for understanding this problem.
After a few operations, I want to extract some values from a pandas df (XY_df) and insert them into the XYstats table. My pandas df looks like the following:
0 1 2 3
0 X 121424.000000 64.0 1897.26000
1 Y 14.019900 4.0 3.50497
2 Ymeancov/Xmeancov 0.001847 NaN NaN
Below is the dictionary obtained from the data frame with XY_df.to_dict():
{0: {0: 'X', 1: 'Y', 2: 'Ymeancov/Xmeancov'},
1: {0: 121424.0, 1: 14.0199, 2: 0.00184739},
2: {0: 64.0, 1: 4.0, 2: nan},
3: {0: 1897.26, 1: 3.5049699999999997, 2: nan}}
The code that I am using to populate the XYstats table is shown below:
c.execute(f"""INSERT INTO XYstats (run_ID, sample_ID, X_TOTAL_COVERAGE, X_TARGET_COUNT, X_MEAN_TARGET_COVERAGE, Y_TOTAL_COVERAGE, Y_TARGET_COUNT, Y_MEAN_TARGET_COVERAGE, Ymeancov_Xmeancov)
VALUES
('{runID}',
'{sampleID}',
'{XY_df.iloc[0,1]}',
'{XY_df.iloc[0,2]}',
'{XY_df.iloc[0,3]}',
'{XY_df.iloc[1,1]}',
'{XY_df.iloc[1,2]}',
'{XY_df.iloc[1,3]}',
'{XY_df.iloc[2,1]}'
""")
conn.commit()
But then I get
ProgrammingError: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 11
which is not informative at all, I reckon.
I am quite sure that my error does not lie in:
The table creation. I have been using the runsMaster as well as the samplesMaster tables the way they are.
The data types that I am trying to insert into the XYstats table: XY_df is a pandas data frame, and what I am trying to insert (e.g. XY_df.iloc[0,3]) are numpy.float64 values (type(XY_df.iloc[0,1])).
But other than that I am quite clueless about what's going on, as the error message I am getting is very vague.
The error is a syntax error in the SQL query you are executing: you have an unclosed bracket after VALUES. All you need to do is add the closing bracket at the end of the query string and you're good to go:
c.execute(f"""INSERT INTO XYstats (run_ID, sample_ID, X_TOTAL_COVERAGE, X_TARGET_COUNT, X_MEAN_TARGET_COVERAGE, Y_TOTAL_COVERAGE, Y_TARGET_COUNT, Y_MEAN_TARGET_COVERAGE, Ymeancov_Xmeancov)
VALUES
('{runID}',
'{sampleID}',
'{XY_df.iloc[0,1]}',
'{XY_df.iloc[0,2]}',
'{XY_df.iloc[0,3]}',
'{XY_df.iloc[1,1]}',
'{XY_df.iloc[1,2]}',
'{XY_df.iloc[1,3]}',
'{XY_df.iloc[2,1]}')
""")
I am working with Python and SQLite. I am constantly getting this message:
"near ")": syntax error"
I tried adding a semicolon to all the queries, but I still get this error message.
tables.append("""
CREATE TABLE IF NOT EXISTS payment (
p_id integer PRIMARY KEY,
o_id integer NON NULL,
FOREIGN KEY(o_id) REFERENCES orders(o_id),
);"""
)
You have a comma before the final closing ). Simply remove it.
i.e. use :-
tables.append("""
CREATE TABLE IF NOT EXISTS payment (
p_id integer PRIMARY KEY,
o_id integer NON NULL,
FOREIGN KEY(o_id) REFERENCES orders(o_id)
);"""
)
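For completeness, a minimal sketch of executing the collected statements with sqlite3 (this assumes tables holds the corrected DDL strings built above; the database file name is made up):

import sqlite3

conn = sqlite3.connect('shop.db')   # hypothetical file name
cur = conn.cursor()
for ddl in tables:                  # tables is the list of CREATE TABLE statements built above
    cur.execute(ddl)
conn.commit()
conn.close()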
Remove the comma at the end of FOREIGN KEY(o_id) REFERENCES orders(o_id).
The working code will be:
tables.append("""
CREATE TABLE IF NOT EXISTS payment (
p_id integer PRIMARY KEY,
o_id integer NON NULL,
FOREIGN KEY(o_id) REFERENCES orders(o_id)
);"""
)
Try this:
tables = []
tables.append("""
CREATE TABLE IF NOT EXISTS payment (
p_id integer PRIMARY KEY,
o_id integer NON NULL,
FOREIGN KEY(o_id) REFERENCES orders(o_id)
);""")
print(tables)
I'm trying to create an SQL database with the following fields:
import sqlite3

connection = sqlite3.connect('Main Database')
crsr = connection.cursor()
#Creates a table for the teacher data if no table is found on the system
crsr.execute("""CREATE TABLE IF NOT EXISTS Teacher_Table(Teacher_ID INTEGER PRIMARY KEY,
TFirst_Name VARCHAR(25) NOT NULL,
TLast_Name VARCHAR (25) NOT NULL,
Gender CHAR(1) NOT NULL,
Home_Address VARCHAR (50) NOT NULL,
Contact_Number VARCHAR (14) NOT NULL);""")
connection.commit()
connection.close()
But when I input values, the Gender field accepts more than one character.
How can I make sure it only accepts one character for that field?
SQLite does not check the length constraints defined at type level, as is specified in the documentation on types:
(...) Note that numeric arguments in parentheses that following the type name (ex: "VARCHAR(255)") are ignored by SQLite - SQLite does not impose any length restrictions (other than the large global SQLITE_MAX_LENGTH limit) on the length of strings, BLOBs or numeric values.
So you cannot enforce this through the declared column type; you would otherwise need to enforce it in your application or views, etc.
We can, however, as @Ilja Everilä says, use a CHECK constraint:
CREATE TABLE IF NOT EXISTS Teacher_Table(
Teacher_ID INTEGER PRIMARY KEY,
TFirst_Name VARCHAR(25) NOT NULL,
TLast_Name VARCHAR (25) NOT NULL,
Gender CHAR(1) NOT NULL CHECK (length(Gender) < 2),
Home_Address VARCHAR (50) NOT NULL,
Contact_Number VARCHAR (14) NOT NULL
)
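A quick way to see the constraint working from Python (a minimal, self-contained sketch using an in-memory database):

import sqlite3

conn = sqlite3.connect(':memory:')
crsr = conn.cursor()
crsr.execute("""CREATE TABLE Teacher_Table(
                    Teacher_ID INTEGER PRIMARY KEY,
                    Gender CHAR(1) NOT NULL CHECK (length(Gender) < 2))""")

crsr.execute("INSERT INTO Teacher_Table (Gender) VALUES ('M')")       # one character: accepted
try:
    crsr.execute("INSERT INTO Teacher_Table (Gender) VALUES ('MF')")  # two characters: rejected
except sqlite3.IntegrityError as err:
    print(err)  # CHECK constraint failed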
I am using Python 3.6.3 and SQLite 3.14.2. I have two tables, with one having a foreign key pointing to the other one. When I run the query with the join in SQLite browser, it works fine and returns the results I need. But when I try to execute the query in Python, it always returns an empty list. No matter how simple I make the join, the result is the same. Can anyone help me? Thank you in advance.
query = '''SELECT f.ID, f.FoodItemName, f.WaterPerKilo, r.AmountInKilo FROM
FoodItems AS f INNER JOIN RecipeItems AS r on f.ID=r.FoodItemID
WHERE r.RecipeID = {:d}'''.format(db_rec[0])
print(query)
db_fooditems = cur.execute(query).fetchall() #this returns []
The tables are as follows:
CREATE TABLE "FoodItems" (
`ID` INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
`FoodItemName` TEXT NOT NULL,
`WaterPerKilo` REAL NOT NULL)
CREATE TABLE "RecipeItems" (
`ID` INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
`RecipeID` INTEGER NOT NULL,
`FoodItemID` INTEGER NOT NULL,
`AmountInKilo` REAL NOT NULL)
Both tables contain some random data.
I'm building a video recommendation site (think Pandora for music videos) in Python and MySQL. I have three tables in my DB:
video - a table of the videos. Data doesn't change. Columns are:
CREATE TABLE `video` (
id int(11) NOT NULL AUTO_INCREMENT,
website_id smallint(3) unsigned DEFAULT '0',
rating_global varchar(128) DEFAULT '0',
title varchar(256) DEFAULT NULL,
thumb_url text,
PRIMARY KEY (`id`),
KEY `websites` (`website_id`),
KEY `id` (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=49362 DEFAULT CHARSET=utf8
video_tag - a table of the tags (attributes) associated with each video. Doesn't change.
CREATE TABLE `video_tag` (
id int(7) NOT NULL AUTO_INCREMENT,
video_id mediumint(7) unsigned DEFAULT '0',
tag_id mediumint(7) unsigned DEFAULT '0',
PRIMARY KEY (`id`),
KEY `video_id` (`video_id`),
KEY `tag_id` (`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=562456 DEFAULT CHARSET=utf8
user_rating - a table of good or bad ratings that the user has given each tag. Data is always changing.
CREATE TABLE `user_rating` (
id int(11) NOT NULL AUTO_INCREMENT,
user_id smallint(3) unsigned DEFAULT '0',
tag_id int(5) unsigned DEFAULT '0',
tag_rating float(10,5) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `video` (`tag_id`),
KEY `user_id` (`user_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=447 DEFAULT CHARSET=utf8
Based on the user's preferences, I want to score each unwatched video and try to predict what they will like best. This has resulted in the following massive query, which takes about 2 seconds to complete for 50,000 videos:
SELECT video_tag.video_id,
(sum(user_rating.tag_rating) * video.rating_global) as score
FROM video_tag
JOIN user_rating ON user_rating.tag_id = video_tag.tag_id
JOIN video ON video.id = video_tag.video_id
WHERE user_rating.user_id = 1 AND video.website_id = 2
AND rating_global > 0 AND video_id NOT IN (1,2,3) GROUP BY video_id
ORDER BY score DESC LIMIT 20
I desperately need to make this more efficient, so I'm just looking for advice as to what the best direction is. Some ideas I've considered:
a) Rework my db table structure (not sure how)
b) Offload more of the grouping and aggregation into Python (haven't figured out a way to join three tables that is actually faster)
c) Store the non-changing tables in memory to try and speed computation time (earlier tinkering hasn't yielded any gains yet..)
How would you recommend making this more efficient?
Thank you!
--
Per request in the comments, EXPLAIN SELECT.. shows:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE user_rating ref video,user_id user_id 3 const 88 Using where; Using temporary; Using filesort
1 SIMPLE video_tag ref video_id,tag_id tag_id 4 db.user_rating.tag_id 92 Using where
1 SIMPLE video eq_ref PRIMARY,websites,id PRIMARY 4 db.video_tag.video_id 1 Using where
Change the field type of rating_global to a numeric type (either FLOAT or INTEGER); there is no need for it to be VARCHAR. Personally, I would change all rating fields to INTEGER; I see no need for them to be FLOAT.
Drop the KEY on id; the PRIMARY KEY is already indexed.
Consider a composite index on the video table covering id, rating_global and website_id, since those are the columns the query reads from that table.
Watch the integer lengths of your references (e.g. video_id -> video.id); you may run out of numbers. These sizes should be the same.
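If it helps to see those suggestions as concrete statements, here is a sketch; the cursor name cur and the exact column definitions are assumptions, so adjust them to your data before running anything:

# Sketch of the schema tweaks suggested above, issued through an open MySQL cursor
# (e.g. mysql-connector-python or PyMySQL).
schema_tweaks = [
    # store the rating as a number rather than varchar(128)
    "ALTER TABLE video MODIFY rating_global FLOAT DEFAULT 0",
    # the PRIMARY KEY already indexes id, so the extra KEY `id` is redundant
    "ALTER TABLE video DROP INDEX `id`",
    # widen the reference column to the same 4-byte size as video.id
    "ALTER TABLE video_tag MODIFY video_id INT UNSIGNED DEFAULT 0",
]
for statement in schema_tweaks:
    cur.execute(statement)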
I suggest the following 2-step solution to replace your query:
CREATE TEMPORARY TABLE rating_stats ENGINE=MEMORY
SELECT video_id, SUM(tag_rating) AS tag_rating_sum
FROM user_rating ur JOIN video_tag vt ON vt.tag_id = ur.tag_id AND ur.user_id = 1
GROUP BY video_id ORDER BY NULL
SELECT v.id, tag_rating_sum*rating_global AS score FROM video v
JOIN rating_stats rs ON rs.video_id = v.id
WHERE v.website_id=2 AND v.rating_global > 0 AND v.id NOT IN (1,2,3)
ORDER BY score DESC LIMIT 20
For the latter query to perform really fast, you could incorporate the website_id and rating_global fields into the PRIMARY KEY of the video table (perhaps only website_id is enough, though).
You can also keep these statistics in another table and precalculate them dynamically based on user login/action frequency. I am guessing you can show the cached data instead of live results; there shouldn't be much difference.