Inserting arrays into databases

Inserting arrays into databases - python

I am trying to write a general function that inserts a line of data into a database table, where the data is an array of unknown length. The aim is to be able to call this function from any program and write a line of data of any length to the table (assuming the table and the array are the same length).
I have tried adding the array as if it were a single piece of data.
import sqlite3
def add2Db(dbName, tableName, data):
    connection = sqlite3.connect(dbName)
    cur = connection.cursor()
    cur.execute("INSERT INTO "+ tableName +" VALUES (?)", (data))
    connection.commit()
    connection.close()
add2Db("items.db", "allItems", (1, "chair", 5, 4))
This just crashes and gives me an error saying it has 4 columns but only one value was supplied.

SQLite does not support arrays - you have to convert to a TEXT using ','.join() to join your array items into a single string and pass that.
Source: SQLite website
https://www.sqlite.org/datatype3.html
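A minimal sketch of that suggestion, assuming an in-memory database and a hypothetical table named notes with a single TEXT column (neither is from the question). The items are converted with str() first, because ','.join() only accepts strings:

```python
import sqlite3

def join_items(items):
    """Turn a sequence of values into one comma-separated string."""
    return ",".join(str(item) for item in items)

# Hypothetical single-column table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (payload TEXT)")
conn.execute("INSERT INTO notes VALUES (?)", (join_items([1, "chair", 5, 4]),))
conn.commit()

stored = conn.execute("SELECT payload FROM notes").fetchone()[0]
print(stored)
conn.close()
```

Note that this stores the whole array in one column rather than one value per column, and values that themselves contain commas cannot be split back reliably.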

I'm not a Python programmer, but I've been doing SQL a long time. I even wrote my own ORM. My advice is do not write your own SQL query builder. There's a myriad of subtle issues and especially security issues. I elaborate on a few of them below.
Instead, use a well-established SQL Query Builder or ORM. They've already dealt with these issues. Here's an example using SQLAlchemy.
from datetime import date
from sqlalchemy import create_engine, MetaData
# Connect to the database with debugging on.
engine = create_engine('sqlite:///test.sqlite', echo=True)
conn = engine.connect()
# Read the schemas from the database
meta = MetaData()
meta.reflect(bind=engine)
# INSERT INTO users (name, birthday, state, country) VALUES (?, ?, ?, ?)
users = meta.tables['users']
conn.execute(
    users.insert().values(name="Yarrow Hock", birthday=date(1977, 1, 23), state="NY", country="US")
)
SQLAlchemy can do the entire range of SQL operations and will work with different SQL variants. You also get type safety.
conn.execute(
    users.insert().values(name="Yarrow Hock", birthday="in the past", state="NY", country="US")
)
sqlalchemy.exc.StatementError: (exceptions.TypeError) SQLite Date type only accepts Python date objects as input. [SQL: u'INSERT INTO users (name, birthday, state, country) VALUES (?, ?, ?, ?)']
insert into table values (...) relies on column definition order
This relies on the order columns were defined in the schema. This leaves two problems. First is a readability problem.
add2Db(db, 'some_table', (1, 39, 99, 45, 'papa foxtrot', 0, 42, 0, 6))
What does any of that mean? A reader can't tell. They have to go digging into the schema and count columns to figure out what each value means.
Second is a maintenance problem. If, for any reason, the schema is altered and the column order is not exactly the same, this can lead to some extremely difficult to find bugs. For example...
create table users ( name text, birthday date, state text, country text );
vs
create table users ( name text, birthday date, country text, state text );
add2Db(db, 'users', ('Yarrow Hock', date(1977, 1, 23), 'NY', 'US'))
That insert will silently "work" with either column order.
You can fix this by passing in a dictionary and using the keys for column names.
add2Db(db, 'users', dict(name="Yarrow Hock", birthday=date(1977, 1, 23), state="NY", country="US"))
Then we'd produce a query like:
insert into users
(name, birthday, state, country)
values (?, ?, ?, ?)
This leads to the next and much bigger problem.
SQL Injection Attack
Now this opens up a new problem. If we simply stick the table and column names into the query that leaves us open to one of the most common security holes, a SQL Injection Attack. That's where someone can craft a value which when naively used in a SQL statement causes the query to do something else. Like Little Bobby Tables.
While the ? protects against SQL Injection for values, it's still possible to inject via the column names. There's no guarantee the column names can be trusted. Maybe they came from the parameters of a web form?
Protecting table and column names is complicated and easy to get wrong.
The more SQL you write the more likely you're vulnerable to an injection attack.
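If a hand-written helper is unavoidable, one common mitigation is to accept only table and column names that already exist in the schema. The following is a minimal sketch, assuming SQLite; insert_row and quote_ident are hypothetical helpers, not part of any library, and this is not a substitute for a tested query builder:

```python
import sqlite3

def quote_ident(name):
    """Quote an identifier the SQLite way, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def insert_row(conn, table, row):
    """Insert a dict as one row, accepting only identifiers found in the schema."""
    if not row:
        raise ValueError("no values given")
    # Table names known to the database (sqlite_master is SQLite's catalog).
    tables = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError("unknown table: %r" % (table,))
    # Column names of that table (second field of each table_info row).
    columns = {r[1] for r in conn.execute(
        "PRAGMA table_info(%s)" % quote_ident(table))}
    unknown = set(row) - columns
    if unknown:
        raise ValueError("unknown columns: %r" % sorted(unknown))
    names = list(row)
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        quote_ident(table),
        ", ".join(quote_ident(n) for n in names),
        ", ".join("?" for _ in names),
    )
    # Values still travel as parameters, never as part of the SQL text.
    conn.execute(sql, [row[n] for n in names])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, birthday TEXT, state TEXT, country TEXT)")
insert_row(conn, "users", {"name": "Yarrow Hock", "birthday": "1977-01-23",
                           "state": "NY", "country": "US"})
conn.commit()
```

A key that is not a real column, such as one crafted to smuggle in extra SQL, is rejected with ValueError before any statement is built.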
You have to write code for everything else.
Ok, you've done insert. Now update? select? Don't forget about subqueries, group by, unions, joins...
If you want to write a SQL query builder, cool! If, instead, you have a job to do using SQL, writing yet another SQL query builder is not your job.
It's harder for anyone else to understand.
There's a good chance that any given Python programmer knows how SQLAlchemy works, and there's plenty of tutorials and documentation if they don't. There's no chance they know about your home-rolled SQL functions, and you have to write all the tutorials and docs.

You shouldn't try to write your own ORM without a well-argued need. You will run into a lot of problems; for example, here are 25 quick reasons not to.
Instead, use a popular, proven ORM. I recommend SQLAlchemy as the go-to outside of Django. With it you can map a dict of values onto an insert, as in insert(schema_name).values(**dict_name) (here's an example of insert/update).

Change your function to this:
def add2Db(dbName, tableName, data):
    num_qs = len(data)
    qm = ','.join(list('?' * num_qs))
    query = """
        INSERT INTO {table}
        VALUES ({qms})
        """.format(table=tableName,
                   qms=qm)
    connection = sqlite3.connect(dbName)
    cur = connection.cursor()
    cur.execute(query, data)
    connection.commit()
    connection.close()
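The placeholder-building step can be tried on its own. This is a small sketch, assuming an in-memory database and made-up column names for allItems (the question does not give the schema); placeholders is a hypothetical helper:

```python
import sqlite3

def placeholders(n):
    """Return n comma-separated '?' markers."""
    return ",".join("?" * n)

conn = sqlite3.connect(":memory:")
# Column names are assumptions; only the column count matters here.
conn.execute("CREATE TABLE allItems (id INTEGER, name TEXT, width INTEGER, height INTEGER)")

data = (1, "chair", 5, 4)
query = "INSERT INTO allItems VALUES ({})".format(placeholders(len(data)))
conn.execute(query, data)
conn.commit()

rows = conn.execute("SELECT * FROM allItems").fetchall()
print(rows)
```

If the length of data does not match the number of columns, SQLite rejects the statement, so the assumption that the table and the array are the same length is checked by the database itself.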

Related

2 questions: Importing data from MySQL data base to Python

Q1. My database contains 3 columns: time, value A and value B. The time data is written in the form 00:00:00 and the increment is 1 minute.
When I try to import data ...
cursor.execute (f"SELECT * FROM trffc_int_data.{i};")
instead of getting (00:00:00, A, B), I get
(datetime.timedelta(0), 7, 2), (datetime.timedelta(seconds=60), 8, 5), .....
I suppose Python doesn't convert the time right. Any suggestions?
Q2. I have an initial database with the data mentioned above. I need to get the data from the initial database, convert it, and save it to another database.
I'm stuck at a point where data should be saved to a new table.
Here are the sections of the code...
# Creating new DB
NewDB = input(" :: Enter the Database name : ")
sqlsynt = f"CREATE DATABASE IF NOT EXISTS {NewDB}"
cursor.execute(sqlsynt,NewDB)
stdb.commit()
# Creating table and writing the data
cursor.execute (f"USE {NewDB}")
sqlsynt = f"CREATE TABLE {dayinweek} (time TIME, Vehicles INT(3), Pedestrians INT(3))"
cursor.execute (sqlsynt, NewDB, dayinweek)
#stdb.commit()
sqlsyntax = f"INSERT INTO {NewDB}.{dayinweek} (time, Vehicles, Pedestrians) VALUES (%s, %s, %s)"
cursor.executemany(sqlsyntax, temp_list_day)
The program gets stuck on the last line, saying that there is no table 1 in NewDB!
mysql.connector.errors.ProgrammingError: 1146 (42S02): Table 'test001.1' doesn't exist
What's wrong with the code? Maybe the problem is in mixing f-string and %s formatting?
Thanks in advance

If I am following this correctly, you are creating a table called 1. Digit-only identifiers are not allowed in MySQL unless the identifier is quoted, as explained in the documentation.
Identifiers may begin with a digit but unless quoted may not consist solely of digits.
Your create table statement did fail, but you did not notice that error until you tried to insert.
You could quote the table name, using backticks:
CREATE TABLE `{dayinweek}` (time TIME, Vehicles INT(3), Pedestrians INT(3))
And then:
INSERT INTO `{NewDB}`.`{dayinweek}` (time, Vehicles, Pedestrians) VALUES (%s, %s, %s)
Quoting the database name may also be a good idea: the same rules apply as for table names (and this is user input to start with).
But overall, changing the table name seems like a better option, as this makes for cleaner code: how about something like table1 for example - or better yet, a table name that is more expressive on what kind of data it contains, such as customer1, or sales1.
Note: your code is open to SQL injection, as you are passing user input directly to the database in a create database statement. Obviously such information cannot be parameterized, however I would still recommend performing a minimal sanity check on application side beforehand.
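A minimal sketch of such a sanity check, assuming a deliberately conservative policy (letters, digits and underscores only, starting with a letter). safe_identifier is a hypothetical helper, and the names test001 and day1 are only examples:

```python
import re

# Assumed policy: start with a letter, then letters, digits or underscores,
# at most 64 characters in total (MySQL's identifier length limit).
_IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,63}")

def safe_identifier(name):
    """Return name wrapped in backticks, or raise ValueError if it looks unsafe."""
    if not _IDENT_RE.fullmatch(name):
        raise ValueError("unsafe identifier: %r" % (name,))
    return "`%s`" % name

new_db = safe_identifier("test001")
table = safe_identifier("day1")
# Identifiers are checked and interpolated; values stay as %s parameters.
statement = f"INSERT INTO {new_db}.{table} (time, Vehicles, Pedestrians) VALUES (%s, %s, %s)"
print(statement)
```

With this policy a digit-only name such as 1 is rejected up front, so the failure shows up before any statement reaches the server.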

Too many server roundtrips w/ psycopg2

I am making a script that should create a schema for each customer. I'm fetching all metadata from a database that defines what each customer's schema should look like, and then creating it. Everything is well defined: the types, names of tables, etc. A customer has many tables (e.g., address, customers, contact, item, etc.), and each table has the same metadata.
My procedure now:
get everything I need from the metadataDatabase.
In a for loop, create a table, and then Alter Table and add each metadata (This is done for each table).
Right now my script runs in about a minute for each customer, which I think is too slow. It has something to do with me having a loop, and in that loop, I’m altering each table.
I think that instead of me altering (which might be not so clever approach), I should do something like the following:
Note that this is just a stupid but valid example:
for table in tables:
    con.execute("CREATE TABLE IF NOT EXISTS tester.%s (%s, %s);", (table, "last_seen date", "valid_from timestamp"))
But it gives me this error (it seems like it reads the table name as a string in a string..):
psycopg2.errors.SyntaxError: syntax error at or near "'billing'"
LINE 1: CREATE TABLE IF NOT EXISTS tester.'billing' ('last_seen da...

Consider creating tables with a serial type (i.e., autonumber) ID field and then use alter table for all other fields by using a combination of sql.Identifier for identifiers (schema names, table names, column names, function names, etc.) and regular format for data types which are not literals in SQL statement.
from psycopg2 import sql
# CREATE TABLE
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (ID serial)"""
cur.execute(sql.SQL(query).format(shm=sql.Identifier("tester"),
                                  tbl=sql.Identifier("table")))
# ALTER TABLE
items = [("last_seen", "date"), ("valid_from", "timestamp")]
query = """ALTER TABLE {shm}.{tbl} ADD COLUMN {col} {typ}"""
for item in items:
    # KEEP IDENTIFIER PLACEHOLDERS
    final_query = query.format(shm="{shm}", tbl="{tbl}", col="{col}", typ=item[1])
    cur.execute(sql.SQL(final_query).format(shm=sql.Identifier("tester"),
                                            tbl=sql.Identifier("table"),
                                            col=sql.Identifier(item[0])))
Alternatively, use str.join with list comprehension for one CREATE TABLE:
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (
"id" serial,
{vals}
)"""
items = [("last_seen", "date"), ("valid_from", "timestamp")]
val = ",\n ".join(["{{}} {typ}".format(typ=i[1]) for i in items])
# KEEP IDENTIFIER PLACEHOLDERS
pre_query = query.format(shm="{shm}", tbl="{tbl}", vals=val)
final_query = sql.SQL(pre_query).format(*[sql.Identifier(i[0]) for i in items],
                                        shm=sql.Identifier("tester"),
                                        tbl=sql.Identifier("table"))
cur.execute(final_query)
SQL (sent to database)
CREATE TABLE IF NOT EXISTS "tester"."table" (
"id" serial,
"last_seen" date,
"valid_from" timestamp
)

However, this becomes heavy as there are too many server roundtrips.
How many tables with how many columns are you creating that this is slow? Could you ssh to a machine closer to your server and run the python there?
I don't get that error. Rather, I get an SQL syntax error. A values list is for conveying data. But ALTER TABLE is not about data, it is about metadata. You can't use a values list there. You need the names of the columns and types in double quotes (or no quotes) rather than single quotes. And you can't have a comma between name and type. And you can't have parentheses around each pair. And each pair needs to be introduced with "ADD"; you can't have it just once. You are using the wrong tool for the job. execute_batch is almost the right tool, except it will use single quotes rather than double quotes around the identifiers. Perhaps you could add a flag to it to tell it to use quote_ident.
Not only is execute_values the wrong tool for the job, but I think python in general might be as well. Why not just load from a .sql file?

Proper way to insert iterative data into Cassandra using Python

Let's say I have a Cassandra table defined like this:
CREATE TABLE IF NOT EXISTS {} (
user_id bigint ,
username text,
age int,
PRIMARY KEY (user_id)
);
I have 3 lists of the same size, let's say 1 000 000 records in each list. Is it good practice to insert data using a for loop like this:
for index, user_id in enumerate(user_ids):
    query = "INSERT INTO TABLE (user_id, username, age) VALUES ({0}, '{1}', {2});".format(user_id, username[index], age[index])
    session.execute(query)

Prepared statements with concurrent execution will be your best bet. The driver provides utility functions for concurrent execution of statements with sequences of parameters, just as you have with your lists: execute_concurrent_with_args
Zipping your lists together will produce a sequence of parameter tuples suitable for input to that function.
Something like this:
from cassandra.concurrent import execute_concurrent_with_args
prepared = session.prepare("INSERT INTO table (user_id, username, age) VALUES (?, ?, ?)")
execute_concurrent_with_args(session, prepared, zip(user_ids, username, age))

It's probably a good idea to start by looking at the Python driver getting started guide. If you have already seen that then apologies, but I thought it worth mentioning.
Generally speaking you'd create your session object and then do your inserts inside your loop, probably using something like a prepared statement (talked about further down the getting started page) but also here and here
The example of the above page uses this as a good starting point
user_lookup_stmt = session.prepare("SELECT * FROM users WHERE user_id=?")
users = []
for user_id in user_ids_to_query:
    user = session.execute(user_lookup_stmt, [user_id])
    users.append(user)
You may also find this blog helps when talking about better throughput with the python driver
You might find the python driver github page a useful resource, in particular I found this example using a prepared statement here that might help you too.

dynamic table mysqldb python string/int issue

I am receiving an error when trying to write data to a database table when using a variable for the table name that I do not get when using a static name. For some reason on the line where I insert, if I insert an integer as the column values the code runs and the table is filled, however, if I try to use a string I get a SQL syntax error
cursor = db.cursor()
cursor.execute('DROP TABLE IF EXISTS %s' %data[1])
sql ="""CREATE TABLE %s (IP TEXT, AVAILIBILITY INT)""" %data[1]
cursor.execute(sql)
for key in data[0]:
    cur_ip = key.split(".")[3]
    cursor.execute("""INSERT INTO %s VALUES (%s,%s)""" %(data[1],key,data[0][key]))
db.commit()
the problem is where I have %(data[1], key, data[0][key]) any ideas?

It's a little hard to analyse your problem when you don't post the actual error, and since we have to guess what your data actually is. But here are some general points of advice:
Using a dynamic table name is often not the way DB systems want to be used. Consider whether the problem could be solved by using a static table name and adding an additional key column to your table. Into that column you can put what you now use as the dynamic table name. This way the DB might be able to better optimize your queries, and your queries are less likely to produce errors (no need to create extra tables on the fly, for one, which is not a cheap thing to do). You would also have no need for dynamic DROP TABLE queries, which could be a security risk.
So my advice to solve your problem would be to actually work around it by trying to get rid of dynamic table names altogether.
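A minimal sketch of the static-table approach, using the standard-library sqlite3 module so it runs without a MySQL server (sqlite3 uses ? placeholders where MySQLdb uses %s). The table name availability, the column scan_name and the sample data are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One fixed table; the value that used to be the table name becomes a column.
conn.execute("CREATE TABLE availability (scan_name TEXT, ip TEXT, available INTEGER)")

def store_scan(conn, scan_name, results):
    """Store one {ip: availability} mapping under the given scan name."""
    conn.executemany(
        "INSERT INTO availability (scan_name, ip, available) VALUES (?, ?, ?)",
        [(scan_name, ip, value) for ip, value in results.items()],
    )
    conn.commit()

store_scan(conn, "office", {"10.0.0.1": 1, "10.0.0.2": 0})
store_scan(conn, "lab", {"10.0.1.1": 1})

# What used to be "select from the office table" is now a WHERE clause.
office_rows = conn.execute(
    "SELECT ip, available FROM availability WHERE scan_name = ? ORDER BY ip",
    ("office",),
).fetchall()
print(office_rows)
```

Because the former table name is now ordinary data, it can be passed as a query parameter like any other value.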
Another problem you have is that you are using Python string formatting and not parameters to the query itself. That is a security problem in itself (SQL injection), but it is also the cause of your syntax error. When you use numbers, your expression evaluates to
INSERT INTO table_name VALUES (100, 200)
Which is valid SQL. But with strings you get
INSERT INTO table_name VALUES (Some Text, some more text)
which is not valid (since you have no quotes ' around the strings).
To get rid of your syntax problem and of the sql-injection-problem, don't add the values to the string, pass them as a list to execute():
cursor.execute("INSERT INTO table_name VALUES (%s,%s)", (key, data[0][key]))
If you must have a dynamic table name, put that in your query string first (e.g. with % formatting), and give the actual values for your query as parameters as above (since I cannot imagine that execute will accept the table name as a parameter).
To put it in some simple sample code. Right now you are trying to do it like this:
# don't do this, this won't even work!
table_name = 'some_table'
user_name = 'Peter Smith'
user_age = 47
query = "INSERT INTO %s VALUES (%s, %s)" % (table_name, user_name, user_age)
cursor.execute(query)
That creates query
INSERT INTO some_table VALUES (Peter Smith, 47)
Which cannot work, because of the unquoted string. So you needed to do:
# DON'T DO THIS, it's bad!
query = "INSERT INTO %s VALUES ('%s', %s)" % (table_name, user_name, user_age)
That's not a good idea, because you need to know where to put quotes and where not (which you will mess up at some point). Even worse, imagine a user named Connor O'Neal. You would get a syntax error:
INSERT INTO some_table VALUES ('Connor O'Neal', 47)
(This is also the way SQL injections are used to crash your system or steal your data.) So you would also need to take care of escaping the values that are strings. It keeps getting more complicated.
Leave those problems to Python and MySQL by passing the data (not the table name) as arguments to execute!
table_name = 'some_table'
user_name = 'Peter Smith'
user_age = 47
query = "INSERT INTO " + table_name + " VALUES (%s, %s)"
cursor.execute(query, (user_name, user_age))
This way you can even pass datetime objects directly. There are other ways to pass the data than using %s; take a look at these examples: http://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-execute.html (Python 3 is used there; I don't know which you use, but apart from the print statements it should work with Python 2 as well, I think).
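The effect of passing values as parameters can be checked without a MySQL server. This is a sketch using the standard-library sqlite3 module instead of MySQLdb, so the placeholders are ? rather than %s; the column names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (user_name TEXT, user_age INTEGER)")

table_name = "some_table"  # trusted, hard-coded name
query = "INSERT INTO " + table_name + " VALUES (?, ?)"
# The apostrophe needs no manual escaping when the value is a parameter.
conn.execute(query, ("Connor O'Neal", 47))
conn.commit()

row = conn.execute("SELECT user_name, user_age FROM some_table").fetchone()
print(row)
```

The name with the apostrophe is stored unchanged, which is the case that broke the string-formatted version.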

Writing URL into a database (sqlite3.OperationalError: no such column: )

I am trying to write some URLs into a SQLite database. I have gotten this to work without the URL. It even works if I replace the 'Volumes/data/rc3/2/sdss/SDSS_r_0_0.fits' with a number.
c.execute("INSERT INTO rc3 (ID,ra,dec,radius,PGC_number,new_ra, new_dec, new_radius,ufits,gfits,rfits,ifits,zfits,best,low,in_SDSS_footprint,clean,error)VALUES({},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{})".format(n,pgc,ra,dec,radius,new_ra,new_dec,new_radius,2, 3, 4, 5, 6, 7,8,in_SDSS_footprint,clean,error))
It seems like it is mistaking the URL for a column, as it is throwing this error:
c.execute("CREATE TABLE rc3 (ID INT , PGC_number INT,ra REAL, dec REAL,radius REAL,new_ra REAL,new_dec REAL,new_radius REAL, ufits TEXT, gfits TEXT, rfits TEXT ,ifits TEXT, zfits TEXT, best TEXT, low TEXT,in_SDSS_footprint BIT ,clean BIT, error INT,PRIMARY KEY(ID))")
....
c.execute("INSERT INTO rc3 (ID,ra,dec,radius,PGC_number,new_ra, new_dec, new_radius,ufits,gfits,rfits,ifits,zfits,best,low,in_SDSS_footprint,clean,error)VALUES({},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{})".format(0, 0.0075, 47.2744444444, 0.01999027436515, 2, 0, 0, 0, 'Volumes/data/rc3/2/sdss/SDSS_u_0_0.fits', 'Volumes/data/rc3/2/sdss/SDSS_g_0_0.fits', 'Volumes/data/rc3/2/sdss/SDSS_r_0_0.fits', 'Volumes/data/rc3/2/sdss/SDSS_i_0_0.fits', 'Volumes/data/rc3/2/sdss/SDSS_z_0_0.fits', 'Volumes/data/rc3/2/sdss/SDSS_0_0_BEST.tiff ', 'Volumes/data/rc3/2/sdss/SDSS_0_0_LOW.tiff ', 0, 1, 0))
sqlite3.OperationalError: no such column: Volumes
but I am not sure what to do. Thanks in advance.

Never try to create SQL statements by embedding the values with string formatting commands. Instead, use SQL parameters.
Instead of this:
c.execute("INSERT INTO breakfast (id, spam, eggs) VALUES({}, {}, {})".format(
id, spam, eggs))
… do this:
c.execute("INSERT INTO breakfast (id, spam, eggs) VALUES(?, ?, ?)",
          (id, spam, eggs))
This is explained at the very top of the sqlite3 documentation. But briefly, the reasons to do things this way are (in rough order of importance):
Avoids SQL injection if any of the data may come from malicious or incompetent users or external programs.
Means you don't have to worry about how to quote/escape strings, format numbers, etc.
Makes errors from inappropriate value types clearer and easier to debug.
Allows the SQL engine to see your 1000 separate inserts as the exact same statement with different values, instead of 1000 completely different statements, making it more likely it can cache or otherwise optimize.
Allows you to use executemany, which can be more readable than a loop, and also may give the SQL engine more optimization opportunities.
In your case, it's the second one you were running into. You're trying to use the string Volumes/data/rc3/2/sdss/SDSS_r_0_0.fits as a value. That's an expression, asking sqlite to divide the Volumes column by the data column, divide that by rc3, etc. If you wanted the string to be stored as a string, you need to put it in quotes.
But, again, don't try to fix this by adding quotes; just use parameters.
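A minimal sketch of the parameterized version with executemany, assuming an in-memory database and a cut-down rc3 table with only three of the original columns. The second row's paths are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of the rc3 table to keep the example short.
conn.execute("CREATE TABLE rc3 (ID INT PRIMARY KEY, ufits TEXT, gfits TEXT)")

rows = [
    (0, "Volumes/data/rc3/2/sdss/SDSS_u_0_0.fits",
        "Volumes/data/rc3/2/sdss/SDSS_g_0_0.fits"),
    (1, "Volumes/data/rc3/2/sdss/SDSS_u_1_0.fits",
        "Volumes/data/rc3/2/sdss/SDSS_g_1_0.fits"),
]

# One statement, many parameter tuples; the paths are passed as plain strings.
conn.executemany("INSERT INTO rc3 (ID, ufits, gfits) VALUES (?, ?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT ufits FROM rc3 WHERE ID = ?", (0,)).fetchone()[0]
count = conn.execute("SELECT COUNT(*) FROM rc3").fetchone()[0]
print(stored, count)
```

Because the paths are bound as values, the slashes are never parsed as division and no quoting is needed.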
