I have stored the entry to be inserted into the db as a dictionary, with the dictionary keys the same as the field names. Is there a simple command to insert it directly?
Currently this is the command I use:
cursor.execute('INSERT INTO jobs (title, description, country, state, city) '
               'VALUES (%(title)s, %(description)s, %(country)s, %(state)s, %(city)s)',
               job_data.get_parsed_dictionary())
Python often has elegant library methods for all sorts of things; I am hoping there is one such command that makes this much simpler.
You could generate the column names:
data = job_data.get_parsed_dictionary()
columns = ', '.join(data.keys())
parameters = ', '.join(['%({0})s'.format(k) for k in data.keys()])
query = 'INSERT INTO jobs ({columns}) VALUES ({parameters})'.format(columns=columns, parameters=parameters)
cursor.execute(query, data)
You do need to make sure that the keys in the .get_parsed_dictionary() result are safe to interpolate into a SQL query (remember your SQL injection attack vectors). If you have a list of possible column names handy, I'd certainly check against that to filter out any stray 'extra' keys you may find in that dict.
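As a rough sketch of that filtering step (ALLOWED_COLUMNS is an assumed whitelist built from your real column names):
ALLOWED_COLUMNS = {'title', 'description', 'country', 'state', 'city'}

data = job_data.get_parsed_dictionary()
# Drop any stray keys that don't match a known column name.
data = {k: v for k, v in data.items() if k in ALLOWED_COLUMNS}

columns = ', '.join(data.keys())
parameters = ', '.join('%({0})s'.format(k) for k in data)
query = 'INSERT INTO jobs ({0}) VALUES ({1})'.format(columns, parameters)
cursor.execute(query, data)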
I am making a script that should create a schema for each customer. I'm fetching all metadata from a database that defines what each customer's schema should look like, and then I create it. Everything is well defined: the types, the names of tables, etc. A customer has many tables (e.g., address, customers, contact, item, etc.), and each table has the same metadata.
My procedure now:
Get everything I need from the metadata database.
In a for loop, create a table, then ALTER TABLE to add each metadata column (this is done for each table).
Right now my script runs in about a minute for each customer, which I think is too slow. It has something to do with me having a loop, and in that loop, I’m altering each table.
I think that instead of me altering (which might be not so clever approach), I should do something like the following:
Note that this is just a stupid but valid example:
for table in tables:
    con.execute("CREATE TABLE IF NOT EXISTS tester.%s (%s, %s);", (table, "last_seen date", "valid_from timestamp"))
But it gives me this error (it seems like it reads the table name as a string within a string):
psycopg2.errors.SyntaxError: syntax error at or near "'billing'"
LINE 1: CREATE TABLE IF NOT EXISTS tester.'billing' ('last_seen da...
Consider creating tables with a serial type (i.e., autonumber) ID field, and then use ALTER TABLE for all other fields, combining sql.Identifier for identifiers (schema names, table names, column names, function names, etc.) with regular str.format for data types, which are neither identifiers nor literals in the SQL statement.
from psycopg2 import sql
# CREATE TABLE
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (ID serial)"""
cur.execute(sql.SQL(query).format(shm = sql.Identifier("tester"),
                                  tbl = sql.Identifier("table")))
# ALTER TABLE
items = [("last_seen", "date"), ("valid_from", "timestamp")]
query = """ALTER TABLE {shm}.{tbl} ADD COLUMN {col} {typ}"""
for item in items:
    # KEEP IDENTIFIER PLACEHOLDERS
    final_query = query.format(shm="{shm}", tbl="{tbl}", col="{col}", typ=item[1])
    cur.execute(sql.SQL(final_query).format(shm = sql.Identifier("tester"),
                                            tbl = sql.Identifier("table"),
                                            col = sql.Identifier(item[0])))
Alternatively, use str.join with list comprehension for one CREATE TABLE:
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (
"id" serial,
{vals}
)"""
items = [("last_seen", "date"), ("valid_from", "timestamp")]
val = ",\n ".join(["{{}} {typ}".format(typ=i[1]) for i in items])
# KEEP IDENTIFIER PLACEHOLDERS
pre_query = query.format(shm="{shm}", tbl="{tbl}", vals=val)
final_query = sql.SQL(pre_query).format(*[sql.Identifier(i[0]) for i in items],
                                        shm = sql.Identifier("tester"),
                                        tbl = sql.Identifier("table"))
cur.execute(final_query)
SQL (sent to database)
CREATE TABLE IF NOT EXISTS "tester"."table" (
"id" serial,
"last_seen" date,
"valid_from" timestamp
)
However, this becomes heavy as there are too many server roundtrips.
How many tables with how many columns are you creating that this is slow? Could you ssh to a machine closer to your server and run the python there?
I don't get that error; rather, I get an SQL syntax error. A values list is for conveying data, but ALTER TABLE is not about data, it is about metadata, and you can't use a values list there. You need the names of the columns and types in double quotes (or no quotes) rather than single quotes. And you can't have a comma between name and type, you can't have parentheses around each pair, and each pair needs to be introduced with "ADD", not just the first. You are using the wrong tool for the job. execute_batch is almost the right tool, except it will use single quotes rather than double quotes around the identifiers. Perhaps you could add a flag to tell it to use quote_ident.
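For illustration, here is a rough sketch of one ALTER TABLE with an ADD clause per column, composed with psycopg2's sql module (the schema, table, and column names are taken from the question and are assumptions):
from psycopg2 import sql

items = [("last_seen", "date"), ("valid_from", "timestamp")]
# One ADD COLUMN clause per pair; identifiers get proper double quotes,
# while the type names are injected as raw SQL (they are not literals).
adds = sql.SQL(', ').join(
    sql.SQL('ADD COLUMN {} {}').format(sql.Identifier(name), sql.SQL(typ))
    for name, typ in items
)
cur.execute(sql.SQL('ALTER TABLE {}.{} {}').format(
    sql.Identifier('tester'), sql.Identifier('billing'), adds))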
Not only is execute_values the wrong tool for the job, but I think python in general might be as well. Why not just load from a .sql file?
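For instance, a minimal sketch of that approach, assuming the DDL lives in a hypothetical schema.sql file:
# psycopg2 will run a multi-statement script in a single round trip.
with open('schema.sql') as f:
    cur.execute(f.read())
con.commit()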
I want to insert a list into my database but I can't.
Here is an example of what I need:
variable_1 = "HELLO"
variable_2 = "ADIOS"
list = [variable_1,variable_2]
INSERT INTO table VALUES ('%s') % list
Can something like this be done? Can I insert a list as a value?
When I try it, I get an error saying it is due to an error in my MySQL syntax.
The answer to your original question is: No, you can't insert a list like that.
However, with some tweaking, you could make that code work by using %r and passing in a tuple:
variable_1 = "HELLO"
variable_2 = "ADIOS"
varlist = [variable_1, variable_2]
print "INSERT INTO table VALUES %r;" % (tuple(varlist),)
Unfortunately, that style of variable insertion leaves your code vulnerable to SQL injection attacks.
Instead, we recommend using Python's DB API and building a customized query string with multiple question marks for the data to be inserted:
variable_1 = "HELLO"
variable_2 = "ADIOS"
varlist = [variable_1,variable_2]
var_string = ', '.join('?' * len(varlist))
query_string = 'INSERT INTO table VALUES (%s);' % var_string
cursor.execute(query_string, varlist)
The example at the beginning of the SQLite3 docs shows how to pass arguments using the question marks and it explains why they are necessary (essentially, it assures correct quoting of your variables).
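As a minimal, self-contained illustration with sqlite3 (the table and data here are made up):
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE words (word1 TEXT, word2 TEXT)')

varlist = ['HELLO', 'ADIOS']
var_string = ', '.join('?' * len(varlist))
# The driver quotes each value itself, so there is no injection risk.
conn.execute('INSERT INTO words VALUES (%s);' % var_string, varlist)
print(conn.execute('SELECT * FROM words').fetchall())  # [('HELLO', 'ADIOS')]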
Your question is not clear.
Do you want to insert the list as a comma-delimited text string into a single column in the database? Or do you want to insert each element into a separate column? Either is possible, but the technique is different.
Insert comma-delimited list into one column:
conn.execute('INSERT INTO table (ColName) VALUES (?);', [','.join(list)])
Insert into separate columns:
params = ['?' for item in list]
sql = 'INSERT INTO table (Col1, Col2. . .) VALUES (%s);' % ','.join(params)
conn.execute(sql, list)
both assuming you have established a connection named conn.
A few other suggestions:
Try to avoid INSERT statements that do not list the names and order of the columns you're inserting into. That kind of statement leads to very fragile code; it breaks if you add, delete, or move columns around in your table.
If you're inserting a comma-separated list into a single field, that generally violates principles of database design; you should use a separate table with one value per record (see the sketch after this list).
If you're inserting into separate fields and they have names like Word1 and Word2, that is likewise an indication that you should be using a separate table instead.
Never use direct string substitution to create SQL statements. It will break if one of the values is, for example o'clock. It also opens you to attacks by people using SQL injection techniques.
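As a sketch of that separate-table design, combined with the other suggestions (explicit column names, parameterized values; the table and columns are invented for illustration):
# One row per value instead of a comma-delimited string in one column.
conn.execute('CREATE TABLE IF NOT EXISTS words (list_id INTEGER, word TEXT)')
conn.executemany('INSERT INTO words (list_id, word) VALUES (?, ?)',
                 [(1, word) for word in ['HELLO', 'ADIOS']])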
You can use json.dumps to convert a list to JSON and write that JSON string to the db.
For example:
import json
cursor.execute('INSERT INTO example_table (column_name) VALUES (%s)', [json.dumps(your_list)])
Let's say I have a cassandra table defined like this:
CREATE TABLE IF NOT EXISTS {} (
user_id bigint ,
username text,
age int,
PRIMARY KEY (user_id)
);
I have 3 lists of the same size, say 1,000,000 records in each. Is it good practice to insert the data using a for loop like this:
for index, user_id in enumerate(user_ids):
    query = "INSERT INTO table (user_id, username, age) VALUES ({0}, '{1}', {2});".format(user_id, username[index], age[index])
    session.execute(query)
Prepared statements with concurrent execution will be your best bet. The driver provides utility functions for concurrent execution of statements with sequences of parameters, just as you have with your lists: execute_concurrent_with_args
Zipping your lists together will produce a sequence of parameter tuples suitable for input to that function.
Something like this:
from cassandra.concurrent import execute_concurrent_with_args

prepared = session.prepare("INSERT INTO table (user_id, username, age) VALUES (?, ?, ?)")
execute_concurrent_with_args(session, prepared, list(zip(user_ids, username, age)))
It's probably a good idea to start by looking at the python driver getting started guide. If you have already seen that, then apologies, but I thought it worth mentioning.
Generally speaking you'd create your session object and then do your inserts inside your loop, probably using something like a prepared statement (talked about further down the getting started page) but also here and here
The example from the above page is a good starting point:
user_lookup_stmt = session.prepare("SELECT * FROM users WHERE user_id=?")
users = []
for user_id in user_ids_to_query:
    user = session.execute(user_lookup_stmt, [user_id])
    users.append(user)
You may also find this blog post helpful when it comes to getting better throughput with the python driver.
You might find the python driver github page a useful resource, in particular I found this example using a prepared statement here that might help you too.
I want to execute an INSERT query via psycopg2, for this question let's simplify it to just:
query = """ INSERT INTO %s("%s") VALUES(%s); """
This works just fine when I do:
params = [AsIs(table_name), AsIs(column_name), value]
cursor.execute(query, params)
Now, my Pandas dataframe has 90+ columns, and I want to know the best way to extend the query above so it can be executed for multiple columns.
I have tried joining every column and value together into a single string and passing that in. I have also tried creating a string with 90+ "\"%s\"" placeholders, and building a format string, i.e. """INSERT INTO {0} ({1}...{n}) VALUES ({n+1}...{n+n})""".format(...). There are unrelated issues that prevent these from working, but is there an easier way to handle the multiple-column case?
I'm not familiar with pandas but you probably want something like this:
# column_names is a tuple or list of all the column headings you want in your query
columns = ', '.join('"{}"'.format(name) for name in column_names)
placeholders = ', '.join(['%s'] * len(column_names))
query = 'INSERT INTO {}({}) VALUES({});'.format(table_name, columns, placeholders)
cursor.execute(query, values)  # values: one entry per column, in the same order
The point is that you need to insert the column headings and the values separately. See this post for a much better explanation than what I can provide:
Psycopg2 Insert Into Table with Placeholders
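If you would rather avoid AsIs altogether, here is a rough sketch using psycopg2's sql module for the many-column case (df and table_name are assumed to be your dataframe and target table):
from psycopg2 import sql

# Build the statement from the dataframe's 90+ column names, quoting each one.
cols = list(df.columns)
query = sql.SQL('INSERT INTO {} ({}) VALUES ({})').format(
    sql.Identifier(table_name),
    sql.SQL(', ').join(sql.Identifier(c) for c in cols),
    sql.SQL(', ').join(sql.Placeholder() for _ in cols),
)
for row in df.itertuples(index=False):
    cursor.execute(query, tuple(row))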
I am trying to construct an insert statement that is built from the results of a query. I run a query that retrieves results from one database and then creates an insert statement from the results and inserts that into a different database.
The server that is initially queried only returns those fields in the reply which are populated and this can differ from record to record. The destination database table has all of the possible fields available. This is why I need to construct the insert statement on the fly for each record that is retrieved and why I cannot use a default list of fields as I have no control over which ones will be populated in the response.
Here is a sample of the code: I send off a request for the T&C for an ISIN, and the response is a set of name and value pairs.
fields = []
data = []
getTCQ = ("MDH:T&C|"+isin+"|NAME|VALUE")
mdh.execute(getTCQ)
TC = mdh.fetchall()
for values in TC:
    fields.append(values[0])
    data.append(values[1])
insertQ = ("INSERT INTO sp_fields ("+fields+") VALUES ('"+data+"')")
The problem is with the fields part, mysql is expecting the following:
INSERT INTO sp_fields (ACCRUAL_COUNT,AMOUNT_OUTSTANDING_CALC_DATE) VALUES ('030/360','2014-11-10')
But I am getting the following for insertQ:
INSERT INTO sp_fields ('ACCRUAL_COUNT','AMOUNT_OUTSTANDING_CALC_DATE') VALUES ('030/360','2014-11-10')
and MySQL does not like the single quotes around the field names.
How do I get rid of these, so that it looks like the first insertQ statement, which works?
many thanks in advance.
You could use ','.join(fields) to create the desired string (without quotes around each field).
Then use parametrized sql and pass the values as the second argument to cursor.execute:
insertQ = ("INSERT INTO sp_fields ({}) VALUES ({})".format(
','.join(fields), ','.join(['%s']*len(dates)))
cursor.execute(insertQ, dates)
Note that the correct placemarker to use, e.g. %s, depends on the DB adapter you are using. MySQLdb uses %s, but oursql uses ?, for instance.
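For example, the same insert under the two paramstyles (an illustrative sketch only):
# 'format' paramstyle (MySQLdb, psycopg2):
cursor.execute('INSERT INTO sp_fields (ACCRUAL_COUNT) VALUES (%s)', ('030/360',))
# 'qmark' paramstyle (oursql, sqlite3):
cursor.execute('INSERT INTO sp_fields (ACCRUAL_COUNT) VALUES (?)', ('030/360',))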