If you wanted to manipulate the data in a table in a PostgreSQL database using some Python (maybe running a little analysis on the result set using scipy) and then wanted to export that data back into another table in the same database, how would you go about the implementation?
Is the only/best way to do this to simply run the query, have Python store it in an array, manipulate the array in Python and then run another SQL statement to output to the database?
I'm really just asking: is there a more efficient way to deal with the data?
Thanks,
Ian
You could use PL/Python to write a PostgreSQL function to manipulate the data.
http://www.postgresql.org/docs/current/static/plpython.html
Although, to be honest, I think for most cases it's much of a muchness compared with processing the data in an external client.
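For illustration, a minimal sketch of what such a function might look like (the table and column names here are hypothetical, and the plpythonu language must be installed in the database; use plpython3u for Python 3):

CREATE FUNCTION analyse_and_copy() RETURNS integer AS $$
    # runs inside the server; scipy is usable here only if it is
    # installed in the server's Python environment
    rows = plpy.execute("SELECT id, value FROM the_table")
    plan = plpy.prepare(
        "INSERT INTO anothertable (id, value) VALUES ($1, $2)",
        ["integer", "float8"])
    for row in rows:
        plpy.execute(plan, [row["id"], row["value"] * 2])  # analysis goes here
    return len(rows)
$$ LANGUAGE plpythonu;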
I'm not sure I understand what you mean, but I'd say it sounds very much like
INSERT INTO anothertable SELECT stuff FROM the_table RETURNING *
and then work on the returned rows. That is, of course, if you don't want to modify the data when you manipulate it.
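Run from Python with psycopg2, that might look like this (a sketch; the connection string and table names are made up):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()
cur.execute("INSERT INTO anothertable SELECT stuff FROM the_table RETURNING *")
for row in cur.fetchall():  # the inserted rows come back to the client
    print(row)
conn.commit()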
Is the only/best way to do this to simply run the query, have Python store it in an array, manipulate the array in Python and then run another SQL statement to output to the database?
Not the only way (see the other answers) but IMHO the best, and certainly the simplest. It just requires a PostgreSQL library (I use psycopg). The standard interface is documented in PEP 249.
An example of a SELECT with psycopg:
cursor.execute("SELECT * FROM students WHERE name=%(name)s;",
globals())
and an INSERT:
cursor.execute("INSERT INTO Foobar (t, i) VALUES (%s, %s)",
["I like Python", 42])
pgnumpy seems to be what you're looking for.
I'd think about using http://www.sqlalchemy.org/.
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
It supports Postgres too: http://www.sqlalchemy.org/docs/05/reference/dialects/postgres.html
You could use an ORM such as SQLAlchemy to retrieve the data into an "object", and manipulate that object. Such an implementation would probably be more elegant than just using an array.
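A rough sketch of the ORM approach (the models, column names and connection URL are all invented):

from sqlalchemy import Column, Float, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Measurement(Base):  # maps the source table
    __tablename__ = 'the_table'
    id = Column(Integer, primary_key=True)
    value = Column(Float)

class Result(Base):  # maps the destination table
    __tablename__ = 'results'
    id = Column(Integer, primary_key=True)
    value = Column(Float)

engine = create_engine('postgresql:///mydb')
session = sessionmaker(bind=engine)()

for m in session.query(Measurement):
    session.add(Result(id=m.id, value=m.value * 2))  # analysis goes here
session.commit()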
I agree with the SQLAlchemy suggestions, or using Django's ORM. Your needs seem too simple for PL/Python to be warranted.
I was reading this SO question: psycopg2: insert multiple rows with one query, and I found there was an excellent answer included that used cursor.mogrify to speed up a series of SQL insertions. It got me wondering: does cursor.mogrify successfully escape all SQL injection vulnerabilities?
The code for the answer posted by Alex Riley was as follows:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
Does anyone know of any vulnerabilities in this method of using psycopg2's cursor.mogrify and then following it up with string interpolation in the cursor.execute function like this?
psycopg2 doesn't use server-side prepared statements and bind parameters at all. It actually does all queries via string interpolation, but it respects quoting rules carefully and does so in a secure manner.
cursor.mogrify is just a manual invocation of exactly the same logic that psycopg2 uses when it interpolates parameters into the SQL string itself, before it sends it to the server.
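You can see that for yourself: mogrify returns the exact SQL string (bytes, in Python 3) that execute would send, with values safely quoted. For example:

cur.mogrify("SELECT %s, %s", ("O'Brien", 42))
# -> b"SELECT 'O''Brien', 42"  (the embedded quote is doubled, so the
#    value cannot break out of the string literal)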
It is safe to do this. Just make sure your code has comments explaining why you're doing it and why it's safe.
However, personally I recommend avoiding this approach in favour of using psycopg2's COPY support.
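A sketch of the COPY route with psycopg2 (the table name is a placeholder, and tup, cur and conn are the names from the question's example):

import io

# build the rows as an in-memory tab-separated file
buf = io.StringIO()
for row in tup:
    buf.write('\t'.join(str(v) for v in row) + '\n')
buf.seek(0)

# COPY is much faster than repeated INSERTs for bulk loads
cur.copy_from(buf, 'my_table')
conn.commit()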
As far as I know, there is no feature that gives the expected type of a query's result in SQL without executing the query. However, I think it would be possible to implement, and there may be some tricks for it.
I'm using SQLAlchemy, so I hope the solution is easy to implement with SQLAlchemy. Any idea how to do this?
You can get column types from column_descriptions:
[c['type'] for c in query.column_descriptions]
Or, if you need to know the Python types:
[c['type'].python_type for c in query.column_descriptions]
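For example (a sketch; the mapped User class and the session are assumed to exist, and this applies when you query individual columns rather than whole entities):

query = session.query(User.id, User.name)

print([c['type'] for c in query.column_descriptions])
# e.g. [Integer(), String(length=50)]

print([c['type'].python_type for c in query.column_descriptions])
# e.g. [<class 'int'>, <class 'str'>]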
I use Python and SQLAlchemy to access an Oracle 11 database.
As far as I can tell, SQLAlchemy always uses prepared statements.
On a huge table (4 million records), correctly indexed, SQLAlchemy's filtered queries don't use the index, so they do a full table scan and are very slow; running the same SQL in a raw SQL editor (SQL*Plus) makes Oracle use the index, with good performance.
We tried adding "+index" hints to the queries, with no effect on Oracle's execution plan, which still doesn't use the index.
Any idea? Is it possible to really force Oracle to use an index, or to make SQLAlchemy not use prepared statements?
Best regards,
Thierry
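One thing that may be worth trying (a sketch, not a confirmed fix): SQLAlchemy can emit Oracle optimizer hints through with_hint(), which takes a dialect name so the hint is only rendered for Oracle. The table and index names below are placeholders:

from sqlalchemy import Column, Integer, MetaData, Table, select

metadata = MetaData()
my_table = Table('my_table', metadata, Column('id', Integer))

# %(name)s is replaced with the table's name (or alias) at compile time;
# on Oracle this renders as SELECT /*+ INDEX(my_table ix_my_table) */ ...
stmt = select([my_table]).with_hint(
    my_table, 'INDEX(%(name)s ix_my_table)', dialect_name='oracle')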
What placeholders can I use with pymssql? I'm getting my values from the HTML query string, so they are all of type string. Is this safe with regard to SQL injection?
query = dictify_querystring(Response.QueryString)
employeedata = conn.execute_row("SELECT * FROM employees WHERE company_id=%s and name = %s", (query["id"], query["name"]))
What mechanism is being used in this case to avoid injections?
There isn't much in the way of documentation for pymssql...
Maybe there is a better Python module I could use to interface with SQL Server 2005.
Thanks,
Barry
Regarding SQL injection, and not knowing exactly how that implementation works, I would say that's not safe.
Some simple steps to make it so:
1. Change that query into a prepared statement (or make sure the implementation internally does so, though it doesn't seem like it).
2. Make sure you use ' around your query arguments.
3. Validate the expected type of your arguments (check that request parameters that should be numeric are indeed numeric, etc.; see the sketch below).
Mostly... number one is the key. Using prepared statements is the most important and probably the easiest line of defense against SQL injection.
Some ORM's take care of some of these issues for you (notice the ample use of the word some), but I would advise making sure you know these problems and how to work around them before using an abstraction like an ORM.
Sooner or later, you need to know what's going on under those wonderful layers of time-saving.
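For the validation step (point 3 above), a minimal sketch using the names from the question:

def get_employee(conn, query):
    # reject obviously malformed input before it reaches the database
    if not query["id"].isdigit():
        raise ValueError("company_id must be numeric")
    # the %s placeholders are passed separately, never concatenated in
    return conn.execute_row(
        "SELECT * FROM employees WHERE company_id=%s AND name=%s",
        (int(query["id"]), query["name"]))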
Maybe there is a better python module I could use to interface with Sql Server 2005.
Well, my advice is to use an ORM like SQLAlchemy to handle this; its SqlSoup extension, for example, can introspect your tables for you:
>>> from sqlalchemy.ext.sqlsoup import SqlSoup
>>> db = SqlSoup('mssql:///DATABASE?PWD=yourpassword&UID=some_user&dsn=your_dsn')
>>> employeedata = db.employees.filter(db.employees.company_id==query["id"])\
.filter(db.employees.name==query["name"]).one()
You can use one() if you want an exception raised unless there is exactly one matching record, .first() if you want just the first record, or .all() if you want all records.
As a side benefit, if you later change to other DBMS, the code will remain the same except for the connection URL.
I would like to construct a sqlite3 database query in python, then write it to a file.
I am a huge fan of Python's interfaces for SQL databases, which AFAICT wrap all the calls you could mess up in nice little '?' parameters that sanitize/escape your strings for you, but that's not what I want. I actually just want to prepare and escape a SQL statement; to do this, I need to escape/quote arbitrary strings.
For example:
query = "INSERT INTO example_table VALUES ('%s')",sqlite_escape_string("'")
And so query should contain:
"INSERT INTO example_table VALUES ('''')"
Note that it inserted an additional ' character.
PHP's equivalent is sqlite_escape_string().
Perl's equivalent is DBI's quote().
I feel Python has a better overall interface, but I happen to need the query, pre-exec.
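For illustration, a hand-rolled equivalent of the missing helper might look like this (a sketch: it only handles strings embedded in '-quoted literals, which SQLite escapes by doubling the quote, and it does not handle NUL bytes or non-string values):

def sqlite_escape_string(s):
    # SQLite escapes a quote inside a '-quoted literal by doubling it
    return s.replace("'", "''")

query = "INSERT INTO example_table VALUES ('%s')" % sqlite_escape_string("'")
# query == "INSERT INTO example_table VALUES ('''')"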
When you use SQLite, it doesn't turn parameterized queries back into text. It has an API ("bindings") and stores the values separately; queries can be reused with different values just by changing the bindings, which is what underlies the statement cache. Consequently, you'll get no help from Python/SQLite in doing what you describe.
What you didn't describe is why you want to do this. The usual reason is some form of tracing. My alternate Python/SQLite interface (APSW) provides easy tracing; you don't even have to touch your code to use it:
https://rogerbinns.github.io/apsw/execution.html#apsw-trace
SQLite also has an authorizer API which lets you veto operations performed by a statement. This also has a side effect of telling you what operations a statement would end up performing.
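The standard library's sqlite3 module exposes the same hook; a minimal sketch:

import sqlite3

def authorizer(action, arg1, arg2, db_name, source):
    # called once for each operation the statement would perform;
    # return SQLITE_DENY instead of SQLITE_OK to veto the operation
    print(action, arg1, arg2)
    return sqlite3.SQLITE_OK

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x)")
conn.set_authorizer(authorizer)
conn.execute("SELECT x FROM t")  # prints SQLITE_READ events for t.x, etc.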