Q1. My database table contains 3 columns: time, value A, and value B. The time data is stored in the form 00:00:00, incrementing by 1 minute.
When I try to import the data ...
cursor.execute(f"SELECT * FROM trffc_int_data.{i};")
instead of getting (00:00:00, A, B), I get
(datetime.timedelta(0), 7, 2), (datetime.timedelta(seconds=60), 8, 5), .....
I suppose Python doesn't convert the time right. Any suggestions?
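A workaround I know of is to convert each timedelta back by hand, along the lines of the sketch below, but it seems like the connector should do this for me:

import datetime

def td_to_time(td):
    # MySQL TIME values arrive as timedeltas; anchor to midnight to get a time
    return (datetime.datetime.min + td).time()

print(td_to_time(datetime.timedelta(seconds=60)))  # -> 00:01:00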
Q2. I have an initial database with the data mentioned above. I need to get the data from the initial database, convert it, and save it to another database.
I'm stuck at a point where data should be saved to a new table.
Here are the sections of the code...
# Creating new DB
NewDB = input(" :: Enter the Database name : ")
sqlsynt = f"CREATE DATABASE IF NOT EXISTS {NewDB}"
cursor.execute(sqlsynt,NewDB)
stdb.commit()
# Creating table and writing the data
cursor.execute (f"USE {NewDB}")
sqlsynt = f"CREATE TABLE {dayinweek} (time TIME, Vehicles INT(3), Pedestrians INT(3))"
cursor.execute (sqlsynt, NewDB, dayinweek)
#stdb.commit()
sqlsyntax = f"INSERT INTO {NewDB}.{dayinweek} (time, Vehicles, Pedestrians) VALUES (%s, %s, %s)"
cursor.executemany(sqlsyntax, temp_list_day)
The program gets stuck on the last line, saying that there is no table 1 in NewDB!
mysql.connector.errors.ProgrammingError: 1146 (42S02): Table 'test001.1' doesn't exist
What's wrong with the code? Maybe the problem is mixing f-string and %s formatting?
Thanks in advance
If I am following this correctly, you are creating a table called 1. Digit-only identifiers are not allowed in MySQL unless the identifier is quoted, as explained in the documentation:
Identifiers may begin with a digit but unless quoted may not consist solely of digits.
Your create table statement did fail, but you did not notice that error until you tried to insert.
You could quote the table name, using backticks:
CREATE TABLE `{dayinweek}` (time TIME, Vehicles INT(3), Pedestrians INT(3))
And then:
INSERT INTO `{NewDB}`.`{dayinweek}` (time, Vehicles, Pedestrians) VALUES (%s, %s, %s)
Quoting the database name may also be a good idea: the same rules apply as for table names (and this is user input to start with).
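As an aside: cursor.execute() takes the parameter sequence as a single optional second argument, and identifiers cannot be passed as parameters at all, so the stray NewDB and dayinweek arguments in your execute() calls should simply be dropped. A sketch of the corrected section, under that assumption:

# Creating new DB (the identifier is spliced in, so validate NewDB first)
cursor.execute(f"CREATE DATABASE IF NOT EXISTS `{NewDB}`")
stdb.commit()

# Creating table and writing the data
cursor.execute(f"USE `{NewDB}`")
cursor.execute(f"CREATE TABLE `{dayinweek}` (time TIME, Vehicles INT(3), Pedestrians INT(3))")

sqlsyntax = f"INSERT INTO `{NewDB}`.`{dayinweek}` (time, Vehicles, Pedestrians) VALUES (%s, %s, %s)"
cursor.executemany(sqlsyntax, temp_list_day)
stdb.commit()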
But overall, changing the table name seems like a better option, as it makes for cleaner code: how about something like table1, for example; or better yet, a table name that is more expressive about the kind of data it contains, such as customer1 or sales1.
Note: your code is open to SQL injection, as you are passing user input directly to the database in a CREATE DATABASE statement. Such an identifier cannot be parameterized, but I would still recommend performing a minimal sanity check on the application side beforehand.
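For example, a minimal whitelist check before the name ever reaches the database (the pattern here is just one reasonable choice):

import re

def safe_ident(name):
    # allow only letters, digits and underscores, starting with a letter
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name

NewDB = safe_ident(input(" :: Enter the Database name : "))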
Related
I'm currently writing a program for a parents' evening system. I have two tables, a bookings table and a teacher table, set up with the following column headings: TeacherSubject | 15:30 | 15:35 | 15:40, etc. When people make a booking, they select a teacher from a drop-down menu and also a time. Therefore, I need the bookingID added into the bookings table where the selected teacher matches the teacher in the table and the selected time matches the time in the database.
At the moment, my code only attempts to match the teacher, but this doesn't work, as I'm getting the following error (from line 5):
TypeError: 'str' object is not callable
Am I doing the whole thing wrong and is this actually possible with the way I have set the table up?
def insert(parent_name, parent_email, student_name, student_form, teacher, app_time, comments):
    conn = sqlite3.connect("parentsevening.db")
    cur = conn.cursor()
    cur.execute("INSERT INTO bookings VALUES (NULL,?,?,?,?,?,?,?)", (parent_name, parent_email, student_name, student_form, teacher, app_time, comments))
    cur.execute("INSERT INTO teachers VALUES (?) WHERE teachers = (?)" (id, teacherName,))
    conn.commit()
    conn.close()
This SQL Query is invalid.
INSERT INTO teachers VALUES (?) WHERE teachers = (?)
It should be
INSERT INTO teachers (id, name) VALUES(?, ?)
Note that I'm guessing the teachers columns are (id, name). The WHERE on the INSERT isn't valid, because WHERE is used to filter existing data (SELECT, UPDATE, DELETE).
OK, let's take out the comments and make this into an answer.
Python error
The TypeError actually comes from the missing comma between the SQL string and the parameter tuple: "INSERT ..." (id, teacherName,) calls the string as if it were a function, hence 'str' object is not callable. Write cur.execute("...", (id, teacherName)) instead.
But...
bad sql syntax
Also, that command as a whole doesn't make much sense SQL-syntax-wise: you seem to be inserting a teacher while filtering on a teacher who doesn't exist yet (if you are inserting them); VALUES on an INSERT does not go with WHERE, and a WHERE needs a FROM. I.e., once you've solved your Python error, sqlite is going to have a fit as well.
That's already covered by another answer.
But...
probably not what you should be doing
If you have an existing teacher, you only need to insert their teacher id into the bookings table. You don't have to, and in fact you can't, insert into the teachers table at this point; you'd get a duplicate data error.
So, rather than fixing your second query, get rid of it entirely; a sketch of the replacement flow follows.
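This is a hypothetical sketch: the teachers(id, name) schema and the bookings column names are assumptions, so adjust them to your actual tables. It also uses an explicit column list on the insert, as recommended below.

cur.execute("SELECT id FROM teachers WHERE name = ?", (teacher,))
row = cur.fetchone()
if row is None:
    raise ValueError("unknown teacher: " + teacher)
teacher_id = row[0]

cur.execute(
    "INSERT INTO bookings (parent_name, parent_email, student_name,"
    " student_form, teacher_id, app_time, comments)"
    " VALUES (?, ?, ?, ?, ?, ?, ?)",
    (parent_name, parent_email, student_name, student_form,
     teacher_id, app_time, comments))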
If you can get a command line or GUI SQL tool up, try running these queries by hand before coding them in Python; the sqlite3 command-line shell should be able to do that for you.
(recommendation) don't use insert table values
Try being explicit with insert into table (<column list>) values .... The reason is that as soon as the table changes in some way that affects column order (possibly an ALTER COLUMN), the values won't line up with the implied insert list. That's hard to debug, and hard to know what was intended at the time of writing. Been there, done that, and I've had to debug a bunch of other folks' code who took this shortcut; it's never fun.
I am making a script that should create a schema for each customer. I'm fetching all metadata from a database that defines what each customer's schema should look like, and then creating it. Everything is well defined: the types, the names of tables, etc. A customer has many tables (e.g. address, customers, contact, item), and each table has the same metadata.
My procedure now:
Get everything I need from the metadataDatabase.
In a for loop, create a table, and then ALTER TABLE to add each piece of metadata (this is done for each table).
Right now my script runs for about a minute per customer, which I think is too slow. It presumably has something to do with the loop in which I'm altering each table.
I think that instead of altering (which might not be such a clever approach), I should do something like the following:
Note that this is just a dummy, but representative, example:
for table in tables:
    con.execute("CREATE TABLE IF NOT EXISTS tester.%s (%s, %s);", (table, "last_seen date", "valid_from timestamp"))
But it gives me this error (it seems to read the table name as a string within a string):
psycopg2.errors.SyntaxError: syntax error at or near "'billing'"
LINE 1: CREATE TABLE IF NOT EXISTS tester.'billing' ('last_seen da...
Consider creating the tables with a serial-type (i.e., autonumber) ID field, and then using ALTER TABLE for all other fields, with a combination of sql.Identifier for identifiers (schema names, table names, column names, function names, etc.) and regular string formatting for the data types, which are not literals in the SQL statement.
from psycopg2 import sql

# CREATE TABLE
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (ID serial)"""
cur.execute(sql.SQL(query).format(shm=sql.Identifier("tester"),
                                  tbl=sql.Identifier("table")))
# ALTER TABLE
items = [("last_seen", "date"), ("valid_from", "timestamp")]
query = """ALTER TABLE {shm}.{tbl} ADD COLUMN {col} {typ}"""

for item in items:
    # KEEP IDENTIFIER PLACEHOLDERS, ONLY FILL IN THE DATA TYPE
    final_query = query.format(shm="{shm}", tbl="{tbl}", col="{col}", typ=item[1])
    cur.execute(sql.SQL(final_query).format(shm=sql.Identifier("tester"),
                                            tbl=sql.Identifier("table"),
                                            col=sql.Identifier(item[0])))
Alternatively, use str.join with a list comprehension for a single CREATE TABLE:
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (
"id" serial,
{vals}
)"""
items = [("last_seen", "date"), ("valid_from", "timestamp")]
val = ",\n ".join(["{{}} {typ}".format(typ=i[1]) for i in items])
# KEEP IDENTIFIER PLACEHOLDERS
pre_query = query.format(shm="{shm}", tbl="{tbl}", vals=val)
final_query = sql.SQL(pre_query).format(*[sql.Identifier(i[0]) for i in items],
shm = sql.Identifier("tester"),
tbl = sql.Identifier("table"))
cur.execute(final_query)
SQL (sent to database)
CREATE TABLE IF NOT EXISTS "tester"."table" (
    "id" serial,
    "last_seen" date,
    "valid_from" timestamp
)
However, the column-by-column ALTER TABLE approach becomes heavy, as it costs one server round trip per column.
How many tables with how many columns are you creating, that this is slow? Could you ssh to a machine closer to your server and run the Python there?
I don't get that error; rather, I get an SQL syntax error. A VALUES list is for conveying data, but ALTER TABLE is not about data, it is about metadata, so you can't use a VALUES list there. You need the names of the columns and types in double quotes (or no quotes) rather than single quotes; you can't have a comma between name and type; you can't have parentheses around each pair; and each pair needs to be introduced with ADD, which you can't have just once. You are using the wrong tool for the job. execute_batch is almost the right tool, except that it will use single quotes rather than double quotes around the identifiers. Perhaps you could add a flag to tell it to use quote_ident.
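For illustration, a sketch of that corrected syntax composed with psycopg2.sql, folding all the ADDs into a single statement (note the type strings are spliced in as raw SQL, so they must come from trusted metadata):

from psycopg2 import sql

items = [("last_seen", "date"), ("valid_from", "timestamp")]

# one statement, one round trip:
# ALTER TABLE "tester"."billing" ADD COLUMN "last_seen" date, ADD COLUMN "valid_from" timestamp
adds = sql.SQL(", ").join(
    sql.SQL("ADD COLUMN {} {}").format(sql.Identifier(name), sql.SQL(typ))
    for name, typ in items
)
cur.execute(sql.SQL("ALTER TABLE {}.{} {}").format(
    sql.Identifier("tester"), sql.Identifier("billing"), adds))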
Not only is execute_values the wrong tool for the job, but I think Python in general might be as well. Why not just load from a .sql file?
I am trying to write a general function that will insert a row of data into a database table, where the row is an array of unknown length. I am aiming to be able to call this function in any program and write a row of any length to the table (assuming the table and the array have the same length).
I have tried passing the array as if it were a single piece of data.
import sqlite3

def add2Db(dbName, tableName, data):
    connection = sqlite3.connect(dbName)
    cur = connection.cursor()
    cur.execute("INSERT INTO " + tableName + " VALUES (?)", (data))
    connection.commit()
    connection.close()

add2Db("items.db", "allItems", (1, "chair", 5, 4))
This just crashes and gives me an error saying the table has 4 columns but only one value was supplied.
SQLite does not support array types: you have to convert the array to TEXT, using ','.join() to turn the items into a single string, and pass that.
Source: SQLite website
https://www.sqlite.org/datatype3.html
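A sketch of that suggestion, with a hypothetical single-TEXT-column table to hold the joined values:

items = (1, "chair", 5, 4)
# hypothetical table: allItemsText(item_blob TEXT)
cur.execute("INSERT INTO allItemsText (item_blob) VALUES (?)",
            (','.join(map(str, items)),))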
I'm not a Python programmer, but I've been doing SQL a long time. I even wrote my own ORM. My advice is do not write your own SQL query builder. There's a myriad of subtle issues and especially security issues. I elaborate on a few of them below.
Instead, use a well-established SQL Query Builder or ORM. They've already dealt with these issues. Here's an example using SQLAlchemy.
from datetime import date
from sqlalchemy import create_engine, MetaData
# Connect to the database with debugging on.
engine = create_engine('sqlite:///test.sqlite', echo=True)
conn = engine.connect()
# Read the schemas from the database
meta = MetaData()
meta.reflect(bind=engine)
# INSERT INTO users (name, birthday, state, country) VALUES (?, ?, ?, ?)
users = meta.tables['users']
conn.execute(
    users.insert().values(name="Yarrow Hock", birthday=date(1977, 1, 23), state="NY", country="US")
)
SQLAlchemy can do the entire range of SQL operations and will work with different SQL variants. You also get type safety.
conn.execute(
    users.insert().values(name="Yarrow Hock", birthday="in the past", state="NY", country="US")
)
sqlalchemy.exc.StatementError: (exceptions.TypeError) SQLite Date type only accepts Python date objects as input. [SQL: u'INSERT INTO users (name, birthday, state, country) VALUES (?, ?, ?, ?)']
insert into table values (...) relies on column definition order
This relies on the order in which the columns were defined in the schema, which leaves two problems. The first is readability.
add2Db(db, 'some_table', (1, 39, 99, 45, 'papa foxtrot', 0, 42, 0, 6))
What does any of that mean? A reader can't tell. They have to go digging into the schema and count columns to figure out what each value means.
Second is a maintenance problem. If, for any reason, the schema is altered and the column order is not exactly the same, this can lead to some extremely difficult to find bugs. For example...
create table users ( name text, birthday date, state text, country text );
vs
create table users ( name text, birthday date, country text, state text );
add2Db(db, 'users', ('Yarrow Hock', date(1977, 1, 23), 'NY', 'US'));
That insert will silently "work" with either column order.
You can fix this by passing in a dictionary and using the keys for column names.
add2Db(db, 'users', dict(name="Yarrow Hock", birthday=date(1977, 1, 23), state="NY", country="US"))
Then we'd produce a query like:
insert into users
(name, birthday, state, country)
values (?, ?, ?, ?)
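A hedged sketch of that dict-based add2Db; it fixes the readability and ordering problems, but note that the table and column names are still spliced straight into the SQL text:

import sqlite3

def add2Db(dbName, tableName, data):
    # data: dict of column name -> value
    columns = ', '.join(data.keys())
    placeholders = ', '.join('?' * len(data))
    # tableName and the keys go into the SQL unescaped: see the
    # injection discussion below before using this on untrusted input
    query = f"INSERT INTO {tableName} ({columns}) VALUES ({placeholders})"
    connection = sqlite3.connect(dbName)
    connection.execute(query, tuple(data.values()))
    connection.commit()
    connection.close()

add2Db("test.sqlite", "users", dict(name="Yarrow Hock", state="NY", country="US"))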
This leads to the next and much bigger problem.
SQL Injection Attack
Now this opens up a new problem. If we simply stick the table and column names into the query, that leaves us open to one of the most common security holes, a SQL Injection Attack. That's where someone crafts a value which, when naively used in a SQL statement, causes the query to do something else. Like Little Bobby Tables.
While the ? protects against SQL Injection for values, it's still possible to inject via the column names. There's no guarantee the column names can be trusted. Maybe they came from the parameters of a web form?
Protecting table and column names is complicated and easy to get wrong.
The more SQL you write the more likely you're vulnerable to an injection attack.
You have to write code for everything else.
Ok, you've done insert. Now update? select? Don't forget about subqueries, group by, unions, joins...
If you want to write a SQL query builder, cool! If, instead, you have a job to do using SQL, writing yet another SQL query builder is not your job.
It's harder for anyone else to understand.
There's a good chance that any given Python programmer knows how SQLAlchemy works, and there's plenty of tutorials and documentation if they don't. There's no chance they know about your home-rolled SQL functions, and you have to write all the tutorials and docs.
You shouldn't try to write your own ORM without a well-argued need. You will have a lot of problems; for example, here are 25 quick reasons not to.
Instead, use any popular ORM that is proven. I recommend SQLAlchemy as the go-to outside of Django. Using it, you can map a dict of values straight into an insert on a model, like insert(schema_name).values(**dict_name) (here's an example of insert/update).
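A minimal sketch of that pattern with SQLAlchemy Core, reusing the users table from the earlier answer:

from sqlalchemy import create_engine, MetaData

engine = create_engine("sqlite:///test.sqlite")
meta = MetaData()
meta.reflect(bind=engine)

row = {"name": "Yarrow Hock", "state": "NY", "country": "US"}
with engine.begin() as conn:  # begin() commits on success
    conn.execute(meta.tables["users"].insert().values(**row))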
Change your function to this:
def add2Db(dbName, tableName, data):
    num_qs = len(data)
    qm = ','.join(list('?' * num_qs))  # one placeholder per value
    query = """
        INSERT INTO {table}
        VALUES ({qms})
    """.format(table=tableName, qms=qm)
    connection = sqlite3.connect(dbName)
    cur = connection.cursor()
    cur.execute(query, data)
    connection.commit()
    connection.close()
I am trying to construct an INSERT statement built from the results of a query: I run a query that retrieves results from one database, then build an INSERT statement from those results and run it against a different database.
The server that is queried initially only returns the fields of the reply that are populated, and these can differ from record to record. The destination table has all of the possible fields available. This is why I need to construct the INSERT statement on the fly for each record retrieved, and why I cannot use a fixed list of fields, as I have no control over which ones will be populated in the response.
Here is a sample of the code. I send off a request for the T&C for an ISIN, and the response is a name and a value.
fields = []
data = []

getTCQ = ("MDH:T&C|" + isin + "|NAME|VALUE")
mdh.execute(getTCQ)
TC = mdh.fetchall()

for values in TC:
    fields.append(values[0])
    data.append(values[1])

insertQ = ("INSERT INTO sp_fields (" + fields + ") VALUES ('" + data + "')")
The problem is with the fields part; MySQL is expecting the following:
INSERT INTO sp_fields (ACCRUAL_COUNT,AMOUNT_OUTSTANDING_CALC_DATE) VALUES ('030/360','2014-11-10')
But I am getting the following for insertQ:
INSERT INTO sp_fields ('ACCRUAL_COUNT','AMOUNT_OUTSTANDING_CALC_DATE') VALUES ('030/360','2014-11-10')
and MySQL does not like the quotes around the field names.
How do I get rid of these, so that it looks like the first insertQ statement, which works?
Many thanks in advance.
You could use ','.join(fields) to create the desired string (without quotes around each field).
Then use parameterized SQL and pass the values as the second argument to cursor.execute:
insertQ = "INSERT INTO sp_fields ({}) VALUES ({})".format(
    ','.join(fields), ','.join(['%s'] * len(data)))
cursor.execute(insertQ, data)
Note that the correct placeholder to use, e.g. %s, depends on the DB adapter you are using: MySQLdb uses %s, but oursql uses ?, for instance.
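Each DB-API module advertises its style in a paramstyle attribute, for example:

import sqlite3
print(sqlite3.paramstyle)   # 'qmark'  -> use ?

import MySQLdb              # assuming the MySQLdb (mysqlclient) adapter is installed
print(MySQLdb.paramstyle)   # 'format' -> use %s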
I want to run various SELECT queries 100 million times, and I have approx. 1 million rows in a table, so I am looking for the fastest way to run all of these queries.
So far I have tried three different methods, and the results were similar.
The following three methods are, of course, not doing anything useful, but are purely for comparing performance.
First method:
for i in range(100000000):
    cur.execute("select id from testTable where name = 'aaa';")
Second method:
cur.execute("""PREPARE selectPlan AS
    SELECT id FROM testTable WHERE name = 'aaa';""")

for i in range(10000000):
    cur.execute("""EXECUTE selectPlan;""")
Third method:
def _data(n):
    cur = conn.cursor()
    for i in range(n):
        yield (i, 'test')

sql = """SELECT id FROM testTable WHERE name = 'aaa';"""
cur.executemany(sql, _data(10000000))
And the table is created like this:
cur.execute("""CREATE TABLE testTable ( id int, name varchar(1000) );""")
cur.execute("""CREATE INDEX indx_testTable ON testTable(name)""")
I thought that using the prepared-statement functionality would really speed up the queries, but as that does not seem to happen, I was hoping you could give me a hint on other ways of doing this.
This sort of benchmark is unlikely to produce any useful data, but the second method should be fastest, as once the statement is prepared it is stored in memory by the database server. Further calls to repeat the query do not require the text of the query to be transmitted, saving a small amount of time.
This is likely to be moot, as the query is very small (likely the same quantity of packets over the wire as sending the query text repeatedly), and the query cache will serve the same data for every request.
What's the purpose of retrieving such an amount of data at once? I don't know your situation, but I'd definitely page the results using LIMIT and OFFSET. Take a look at:
7.6. LIMIT and OFFSET
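A sketch of paging, assuming psycopg2-style %s placeholders:

page_size = 10000
offset = 0
while True:
    cur.execute(
        "SELECT id FROM testTable WHERE name = %s "
        "ORDER BY id LIMIT %s OFFSET %s",
        ('aaa', page_size, offset))
    rows = cur.fetchall()
    if not rows:
        break
    # ... process this page of rows ...
    offset += page_size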
If you just want to benchmark SQL on its own, and not mix Python into the equation, try pgbench.
http://developer.postgresql.org/pgdocs/postgres/pgbench.html
Also, what is your goal here?