This is using pyodbc.
Okay, let's say I create a procedure on the iSeries using something like this:
CREATE PROCEDURE MYLIB.MYSQL(IN WCCOD CHAR ( 6), INOUT WPLIN CHAR (3))
RESULT SETS 1 LANGUAGE CL NOT DETERMINISTIC
CONTAINS SQL EXTERNAL NAME MYLIB.MYCL PARAMETER STYLE GENERAL
Then in Python I do something like this:
cust = 'ABCDEF'
line = '123'
sql = "CALL MYLIB.MYSQL('%(cust)s',?)" % vars()
values = (line)
c = pyodbc.connect('DSN='+system+';CMT=0;NAM=0')
cursor = c.cursor()
cursor.execute(sql,values)
Nothing in the variables shows the return value. The sense I get from seeing comparable code in other languages (e.g., .NET) is that the ODBC "variable" is defined, then updated with the return value, but in this case neither "line" nor "values" is changed.
I realize one alternative is to have the CL program write the result to a file then read the file, but it seems like an extra step that requires maintenance, never mind added complexity.
Has anyone ever made this work?
First, you won't get any results at all if you don't fetch them. When using pyODBC (or practically any other package adhering to Python Database API Specification v2.0), you have a couple of choices to do this. You can explicitly call one of the fetch methods, such as
results = cursor.fetchall()
after which the result set will be in results (where each result is a tuple and results is a list of these tuples). Or you can iterate directly over the cursor (which is a bit like repeatedly calling the .fetchone() method):
for row in cursor:
    # Do stuff with row here
    # Each time through the loop gets another row
    # When there are no more results, the loop ends
Now, whether you explicitly fetch or you use Python's looping mechanism, you receive a brand-new collection of values, accessed by whatever names you chose to receive them (results in my first example, row in my second). You can't specify Python variables to be updated directly by individual values.
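If you need the value in a plain Python variable, you copy it out of the fetched row yourself. A minimal sketch, assuming the procedure's result set carries the value in its first column (that layout is an assumption, not something the question shows):

row = cursor.fetchone()
if row is not None:
    line = row[0]    # copy the first column of the returned row into a variable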
Besides the Python DB API 2.0 spec mentioned above, you'll probably want to read up on pyODBC features, and particularly on the way it handles stored procedures.
Holy cow, I got it working, but it takes two procedures: one to call the CL program and one to run the SQL statement.
For example (first procedure, this is what calls the CL program that returns a value in WPLIN):
CREATE PROCEDURE MYLIB.MYSQLA(IN WCCOD CHAR ( 6), INOUT WPLIN CHAR (3))
RESULT SETS 1 LANGUAGE CL NOT DETERMINISTIC
CONTAINS SQL EXTERNAL NAME MYLIB.MYCL PARAMETER STYLE GENERAL
Second procedure (will call the first, THIS is the procedure we call from ODBC):
CREATE PROCEDURE MYLIB.MYSQLB(IN WCCOD CHAR ( 6), INOUT WPLIN CHAR (3))
DYNAMIC RESULT SETS 1 LANGUAGE SQL
BEGIN
DECLARE C1 CURSOR WITH RETURN TO CLIENT
FOR
SELECT WPLIN FROM DUMMYLIB.DUMMYFILE;
CALL MYLIB.MYSQLA(WCCOD,WPLIN);
OPEN C1;
END
Then from an ODBC connection, we simply execute this:
customer = 'ABCDEF'
line='ABC'
sql = "{CALL MYLIB.MYSQLB('%(customer)s','%(line)s')}" % vars()
cursor.execute(sql)
print cursor.fetchone()
Et voila!
A caveat: DUMMYLIB/DUMMYFILE is a single-record physical file I created with a single one-byte column. It's only used for reference (unless there's a better way?) and it doesn't matter what's in it.
Maybe a bit clumsy, but it works! If anyone knows a way to combine these into a single procedure, that would be nice!
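As an aside, the same call can be made with qmark parameter markers instead of interpolating the values into the SQL string. A sketch, assuming the IBM i ODBC driver accepts markers inside the CALL escape (untested here):

customer = 'ABCDEF'
line = 'ABC'
cursor.execute("{CALL MYLIB.MYSQLB(?,?)}", (customer, line))
print cursor.fetchone()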
I am new to this and trying to learn python. I wrote a select statement in python where I used a parameter
Select """cln.customer_uid = """[(num_cuid_number)])
TypeError: string indices must be integers
Agreed with the others, this doesn't really look like Python by itself.
Even without seeing the rest of that code, I'll guess the [(num_cuid_number)] value being returned is a string, so you'll want to convert it to an integer for the select statement to process.
num_cuid_number is most likely a string in your code; the "string indices" are the ones in the square brackets. First check that variable to see what you actually received there; it is probably a string when it should be an integer.
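As a quick illustration of where that exception comes from (throwaway values; the exact wording of the message varies slightly between Python versions):

>>> "cln.customer_uid = "["123"]    # indexing a string with another string
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers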
Let me give you an example of Python code to execute (just for reference, I have used SQLAlchemy with Flask):
@app.route('/get_data/')
def get_data():
    base_sql = """
        SELECT cln.customer_uid='%s' from cln
        """ % (num_cuid_number)
    data = db.session.execute(base_sql).fetchall()
    return str(data)  # return something Flask can render
Pretty sure you are trying to create a select statement with a "where" clause here. There are many ways to do this; for example, using raw SQL the query would look similar to this:
query = "SELECT * FROM cln WHERE customer_uid = %s"
parameters = (num_cuid_number,)
Separating the parameters from the query is what makes this secure. You can then take these two variables and execute them with your DB engine, like so:
results = db.execute(query, parameters)
This will work. However, especially in Python, it is more common to use a package like SQLAlchemy to make queries more "flexible" (in other words, without manually constructing the query as an actual string). You can do the same thing using SQLAlchemy core functionality:
query = cln.select()
query = query.where(cln.customer_uid == num_cuid_number)
results = db.execute(query)
Note: I simplified "db" in both examples; you'd actually use a cursor, session, engine, or similar to execute your queries, but that wasn't your question.
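For completeness, here is a more self-contained sketch of the core-style version above; the engine URL, table, and column definitions are assumptions you would replace with the real schema:

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine("sqlite:///example.db")   # placeholder backend/URL
metadata = MetaData()
cln = Table(
    "cln", metadata,
    Column("customer_uid", Integer),
    Column("customer_name", String),
)
metadata.create_all(engine)                      # only needed if the table doesn't exist yet

num_cuid_number = 123                            # would normally come from the request
query = cln.select().where(cln.c.customer_uid == num_cuid_number)
with engine.connect() as conn:
    results = conn.execute(query).fetchall()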
I tried uploading modules to Redshift through S3, but it always says no module found. Please help.
CREATE or replace FUNCTION olus_layer(subs_no varchar)
RETURNS varchar volatile AS
$$
import plpydbapi
dbconn = plpydbapi.connect()
cursor = dbconn.cursor()
cursor.execute("SELECT count(*) from busobj_group.olus_usage_detail")
d=cursor.fetchall()
dbconn.close()
return d
$$
LANGUAGE plpythonu;
You cannot do this in Redshift, so you will need to find another approach.
1) see here for udf constraints http://docs.aws.amazon.com/redshift/latest/dg/udf-constraints.html
2) see here http://docs.aws.amazon.com/redshift/latest/dg/udf-python-language-support.html
especially this part:
Important: Amazon Redshift blocks all network access and write access to the file system through UDFs.
This means that even if you try to get around the restriction, it won't work!
If you don't know an alternative way to get what you need, you should ask a new question specifying exactly what your challenge is and what you have tried (and leave this question and answer here for future reference by others).
You can't connect to the database from inside a UDF. Python functions are scalar in Redshift, meaning a function takes one or more values and returns only one output value.
However, if you want to run a function against a set of rows, try using the LISTAGG function to build an array of values or objects (if you need multiple properties) into one large string (beware of the string size limitation), pass it to the UDF as a parameter, and parse/loop over it inside the function.
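Roughly what the parsing/looping inside such a UDF body could look like; the delimiters and field layout below are assumptions for illustration, and in practice this logic would sit inside a CREATE FUNCTION ... LANGUAGE plpythonu definition:

def total_usage(listagg_payload):
    # listagg_payload is what LISTAGG(subs_no || ',' || usage, ';') might produce,
    # e.g. "1001,5;1002,7;1003,2"
    total = 0
    for item in listagg_payload.split(';'):   # one element per aggregated row
        subs_no, usage = item.split(',')      # unpack the properties packed per row
        total += int(usage)
    return str(total)                         # a UDF returns a single scalar value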
Amazon has recently announced support for stored procedures in Redshift. Unlike a user-defined function (UDF), a stored procedure can incorporate data definition language (DDL) and data manipulation language (DML) in addition to SELECT queries. Along with that, it also supports looping and conditional expressions to control logical flow.
https://docs.aws.amazon.com/redshift/latest/dg/stored-procedure-overview.html
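As a rough sketch of what that enables, here is one way to create and call such a procedure from Python; the connection details, the summary table, and the procedure body are all placeholders, not anything from the question:

import psycopg2

conn = psycopg2.connect(host="my-cluster.example.com", port=5439,
                        dbname="dev", user="me", password="...")
cur = conn.cursor()

# A stored procedure may contain DML, unlike a UDF.
cur.execute("""
CREATE OR REPLACE PROCEDURE refresh_usage_summary()
AS $$
BEGIN
    DELETE FROM busobj_group.usage_summary;
    INSERT INTO busobj_group.usage_summary
    SELECT subs_no, COUNT(*) FROM busobj_group.olus_usage_detail GROUP BY subs_no;
END;
$$ LANGUAGE plpgsql;
""")
cur.execute("CALL refresh_usage_summary();")
conn.commit()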
I've hit a strange inconsistency problem with SQL Server inserts using a stored procedure. I'm calling a stored procedure from Python via pyodbc by running a loop to call it multiple times for inserting multiple rows in a table.
It seems to work normally most of the time, but after a while it will just stop working in the middle of the loop. At that point even if I try to call it just once via the code it doesn't insert anything. I don't get any error messages in the Python console and I actually get back the incremented identities for the table as though the data were actually inserted, but when I go look at the data, it isn't there.
If I call the stored procedure from within SQL Server Management Studio and pass in data, it inserts it and shows the incremented identity number as though the other records had been inserted even though they are not in the database.
It seems I reach a certain limit on the number of times I can call the stored procedure from Python and it just stops working.
I'm making sure to disconnect after I finish looping through the inserts, and other stored procedures written in the same way and sent via the same database connection still work as usual.
I've tried restarting the computer with SQL Server and sometimes it will let me call the stored procedure from Python a few more times, but that eventually stops working as well.
I'm wondering if it is something to do with calling the stored procedure in a loop too quickly, but that doesn't explain why after restarting the computer, it doesn't allow any more inserts from the stored procedure.
I've done lots of searching online, but haven't found anything quite like this.
Here is the stored procedure:
USE [Test_Results]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[insertStepData]
    @TestCaseDataId int,
    @StepNumber nchar(10),
    @StepDateTime nvarchar(50)
AS
SET NOCOUNT ON;
BEGIN TRANSACTION
DECLARE @newStepId int
INSERT INTO TestStepData (
    TestCaseDataId,
    StepNumber,
    StepDateTime
)
VALUES (
    @TestCaseDataId,
    @StepNumber,
    @StepDateTime
)
SET @newStepId = SCOPE_IDENTITY();
SELECT @newStepId
FROM TestStepData
COMMIT TRANSACTION
Here is the method I use to call a stored procedure and get back the id number ('conn' is an active database connection via pyodbc):
def CallSqlServerStoredProc(self, conn, procName, *args):
    sql = """DECLARE @ret int
             EXEC @ret = %s %s
             SELECT @ret""" % (procName, ','.join(['?'] * len(args)))
    return int(conn.execute(sql, args).fetchone()[0])
Here is where I'm passing in the stored procedure to insert:
....
for testStep in testStepData:
    testStepId = self.CallSqlServerStoredProc(conn, "insertStepData", testCaseId, testStep["testStepNumber"], testStep["testStepDateTime"])
    conn.commit()
    time.sleep(1)
....
SET @newStepId = SCOPE_IDENTITY();
SELECT @newStepId
FROM TestStepData
looks mighty suspicious to me:
SCOPE_IDENTITY() returns numeric(38,0) which is larger than int. A conversion error may occur after some time. Update: now that we know the IDENTITY column is int, this is not an issue (SCOPE_IDENTITY() returns the last value inserted into that column in the current scope).
SELECT into a variable doesn't guarantee its value if more than one record is returned. Besides, I don't get the idea behind overwriting the identity value we already have. In addition to that, the number of values returned by the last statement is equal to the number of rows in that table, which is increasing quickly; this is a likely cause of the degradation. In brief, the last statement is not just useless, it's detrimental.
The 2nd statement also makes these statements misbehave:
EXEC @ret = %s %s
SELECT @ret
Since the procedure doesn't RETURN anything but SELECTs a single time, this chunk actually returns two data sets: 1) a single @newStepId value (from the EXEC, yielded by the SELECT @newStepId <...>); 2) a single NULL (from SELECT @ret). fetchone() reads the 1st data set by default so you don't notice this, but it doesn't work towards performance or correctness anyway.
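If you want to see both of those data sets from Python, something like this (not part of the original helper; the variable names are just for illustration) walks them with pyodbc's nextset():

cursor = conn.execute(sql, args)    # the DECLARE/EXEC/SELECT batch shown above
step_id = cursor.fetchone()[0]      # 1st data set: value(s) from the proc's trailing SELECT
if cursor.nextset():                # advance to the 2nd data set
    ret = cursor.fetchone()[0]      # the SELECT @ret value -- NULL while the proc never RETURNs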
Bottom line
Replace the 2nd statement with RETURN @newStepId.
Data not in the database problem
I believe it's caused by RETURN before COMMIT TRANSACTION. Make it the other way round.
In the original form, I believe it was caused by the long-running SELECT and/or possible side effects of that non-variable SELECT sitting inside a transaction.
I'm trying to create a python script that constructs valid sqlite queries. I want to avoid SQL injection, so I cannot use '%s'. I've found how to execute queries, cursor.execute('sql ?', (param)), but I want to find out how to get the parsed SQL with the param substituted in. It's not a problem if I have to execute the query first in order to obtain the last query executed.
If you're trying to transmit changes to the database to another computer, why do they have to be expressed as SQL strings? Why not pickle the query string and the parameters as a tuple, and have the other machine also use SQLite parameterization to query its database?
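A minimal sketch of that pickling idea (the query, parameters, and file names below are made up):

import pickle
import sqlite3

# On the sending side: package the statement and its parameters together.
query = "UPDATE stocks SET price = ? WHERE symbol = ?"
params = (42.0, "hello")
payload = pickle.dumps((query, params))       # bytes you can send or store

# On the receiving side: unpack and let sqlite3 do the parameter substitution.
query, params = pickle.loads(payload)
conn = sqlite3.connect("replica.db")
conn.execute(query, params)
conn.commit()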
If you're not after just parameter substitution, but full construction of the SQL, you have to do that using string operations on your end. The ? replacement always just stands for a value. Internally, the SQL string is compiled to SQLite's own bytecode (you can find out what it generates with EXPLAIN <your sql>) and ? replacements are done by just storing the value at the correct place in the value stack; varying the query structurally would require different bytecode, so just replacing a value wouldn't be enough.
Yes, this does mean you have to be ultra-careful. If you don't want to allow updates, try opening the DB connection in read-only mode.
Use the DB-API’s parameter substitution. Put ? as a placeholder wherever you want to use a value, and then provide a tuple of values as the second argument to the cursor’s execute() method.
# Never do this -- insecure!
symbol = 'hello'
c.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
# Do this instead
t = (symbol,)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
print c.fetchone()
More information is in the manual.
I want to find out how to get the parsed SQL with the param substituted in.
It's all open source, so you have full access to the code that does the parsing/sanitization. Why not just read that code, find out how it works, and see whether there's some (possibly undocumented) implementation that you can reuse?
I have the following code
cur = db.cursor(cursors.SSDictCursor)
cur.execute("SELECT * FROM large_table")
result_count = cur.rowcount
print result_count
This prints the number 18446744073709551615 which is obviously wrong. If I remove the cursors.SSDictCursor the correct number is shown. Can anyone tell me how I can get the number of records returned while keeping the SSDictCursor?
To get the number of records returned by SSDictCursor or SSCursor, your only options are:
Fetch the entire result and count it using len(), which defeats the purpose of using SSDictCursor or SSCursor in the first place;
Count the rows yourself as you iterate through them, which means you won't know the count until you hit the end (not likely to be practical); or,
Run an additional, separate COUNT(*) query.
I highly recommend the third option. It's extremely fast if all you're doing is SELECT COUNT(*) FROM table;. It would be slower for some more complex query, but with proper indexing it should still be quick enough for most purposes.
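A sketch of that third option, reusing the names from the question (the count runs on a separate, ordinary cursor before the streaming one):

count_cur = db.cursor()
count_cur.execute("SELECT COUNT(*) FROM large_table")
result_count = count_cur.fetchone()[0]        # accurate row count, computed server-side

cur = db.cursor(cursors.SSDictCursor)         # then stream the actual rows as before
cur.execute("SELECT * FROM large_table")
for row in cur:
    pass                                      # process each row here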
As an aside, the return value you're seeing is sort of correct; at least, as far as the MySQL C API is concerned.
Per the Python DB API defined in PEP 249, the rowcount attribute is -1 if the rowcount of the last operation cannot be determined by the interface. @glglgl explained why the rowcount can't be determined in their answer:
Internally, SSDictCursor uses mysql_use_result() which allows the server to start transferring the data before the acquiring is complete.
In other words, the server doesn't know how many rows it's ultimately going to fetch. When you execute a query, MySQLdb stores the return value of mysql_affected_rows() in the cursor's rowcount attribute. Because the count is indeterminate, this function returns -1 as an unsigned long long integer (my_ulonglong); an equivalent unsigned 64-bit type is available as c_ulonglong in the ctypes module of the standard library:
>>> from ctypes import c_ulonglong
>>> n = c_ulonglong(-1)
>>> n.value
18446744073709551615L
A quick-and-dirty alternative to ctypes, when you know you'll always be dealing with a 64-bit unsigned integer, is:
>>> -1 & 0xFFFFFFFFFFFFFFFF
18446744073709551615L
It would be great if MySQLdb checked for this return value and gave you the signed integer you expect to see, but unfortunately it doesn't.
With an SSDictCursor, this value can only be read (or rather, determined) once the cursor has been used up.
Internally, SSDictCursor uses mysql_use_result() which allows the server to start transferring the data before the acquiring is complete.