I am attempting to get the results of a stored procedure and populate a model dynamically, or at a minimum, generate a model based on the result.
My intent is to create a reusable function that is agnostic to the data: I will not know the fields being returned, and I want to take what the stored procedure returns, get the field names, and put the data in an object with those field names.
How can I dynamically discover the columns in a result set returned from a stored procedure and then create an object to match?
I was able to figure this out. I got a list of the column names from the returned data, created an object by name, and set the object's properties/attributes by string.
def callProc(sqlString, clsName):
    # assumes an open DB-API connection and 'import sys' at module level
    cursor = connection.cursor()
    dataResults = []
    try:
        cursor.execute(sqlString)
        # get data results
        results = cursor.fetchall()
        # get column names
        columns = [column[0] for column in cursor.description]
        # populate class
        for row in results:
            # look up the class by name and create a new instance per row
            # (without the trailing () every row would mutate the class itself)
            p = getattr(sys.modules[__name__], clsName)()
            for i, x in enumerate(columns):
                # set the property matching the column name
                setattr(p, x, row[i])
            dataResults.append(p)
    except Exception as ex:
        print(ex)
    finally:
        cursor.close()
    return dataResults
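A hypothetical usage sketch, assuming a Person class defined in this module whose attributes are created from the returned column names:

# 'dbo.GetPeople', 'Person' and 'first_name' are placeholders
people = callProc("EXEC dbo.GetPeople", "Person")
for person in people:
    print(person.first_name)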
I am trying to create a function that returns data frames based on a SQL query. My function is:
def request_data(str, conn):
    cur = conn.cursor()
    cur.execute(str)
    data = pd.read_sql_query(str, conn)
    return data
When I apply my function with the append method, I get the result I expect:
Tables_to_initialize = ['sales', 'stores', 'prices']
Tables = []
for x in Tables_to_initialize:
    Tables.append(request_data("SELECT * FROM {i} WHERE d_cr_bdd = "
                               "(SELECT MAX(d_cr_bdd) FROM {i});".format(i=x), conn))
But Tables is a list that contains all the resulting data frames based on my query. What I really want is to assign every element in my list Tables to its name. For example,
Tables_to_initialize[0] = 'sales'
and I want Tables[0] to be available as an object (data frame) named sales.
Is there any method to assign objects inside the function or with append automatically? Or any other solution?
To get a list of data frames based on the given query:

table = [request_data("SELECT * FROM {i} WHERE d_cr_bdd = "
                      "(SELECT MAX(d_cr_bdd) FROM {i});".format(i=i), conn)
         for i in Tables_to_initialize]
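If the goal is to refer to each data frame by its table name, a dictionary keyed by name is a safer alternative to creating variables dynamically. A minimal sketch reusing the question's request_data and conn:

tables = {name: request_data("SELECT * FROM {i} WHERE d_cr_bdd = "
                             "(SELECT MAX(d_cr_bdd) FROM {i});".format(i=name), conn)
          for name in Tables_to_initialize}
# e.g. tables['sales'] is the 'sales' data frame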
I am using SQLAlchemy 0.8 and I want to get the column names of the input only, not all the columns in the table.
here is the code:
rec = raw_input("Enter keyword to search: ")
res = session.query(test.__table__).filter(test.fname == rec).first()
data = ','.join(map(str, res)) +","
print data
# saw this here on SO, but it's not the one I wanted; it displays all of the columns
columns = [m.key for m in data.columns]
print columns
You can just query for the columns you want. If you had some model MyModel, you can do:

session.query(MyModel.wanted_column1, ...)  # rest of the query

This selects only the columns mentioned there.
You can use the select syntax.
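A sketch using the select construct (0.8-style, taking a list of columns; test and rec are from the question's code):

from sqlalchemy import select

# fetch only the fname column for the matching row
stmt = select([test.__table__.c.fname]).where(test.fname == rec)
result = session.execute(stmt).first()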
Or, if you still want the model object to be returned with certain columns left unloaded, you can use deferred column loading.
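A minimal sketch of deferred loading, assuming a hypothetical lname column on the test model; the query still returns model instances, but lname is not fetched until it is first accessed:

from sqlalchemy.orm import defer

res = session.query(test).options(defer('lname')).filter(test.fname == rec).first()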
I've got an ESRI Point Shape file with (amongst others) a nMSLINK field and a DIAMETER field. The MSLINK is not unique, because of a spatial join. What I want to achieve is to keep only the features in the shapefile that have a unique MSLINK and the smallest DIAMETER value, together with the corresponding values in the other fields. I can use a search cursor to achieve this (looping through all features and removing each feature that does not comply), but this takes ages (> 75,000 features). I was wondering if e.g. numpy could do the trick faster in ArcMap/arcpy.
I think that kind of processing would definitely be a lot faster if you work in memory instead of interacting with ArcGIS, for example by first reading all the rows into a Python object (a namedtuple would probably be a good option here). Then you can work out which rows you want to delete or insert.
The fastest approach depends on your data: a) if you have a lot of repeated (MSLINK) rows, the fastest option is to insert just the rows you need into a new layer; or b) if the rows to be deleted are few compared to the total, deleting them is faster.
For a) you'll need to fetch all fields into the tuple, including the point coordinates, so that you can create a new feature class and insert the new rows.
# Example of Variant a:
from collections import namedtuple
import arcpy

# assuming the following:
# source_fc  - contains the name of the feature class
# the_path   - contains the path to the shape
# cleaned_fc - the name of the cleaned feature class

# use all fields of source_fc plus the shape token to get a tuple with xy
# coordinates (using 'mslink' and 'diam' here to simplify the example)
fields = ['mslink', 'diam', 'field3', ... ]
all_fields = fields + ['SHAPE@XY']

# define a namedtuple to hold and work with the rows; use the name 'point' to
# hold the coordinates tuple
Row = namedtuple('Row', fields + ['point'])

data = []
with arcpy.da.SearchCursor(source_fc, all_fields) as sc:
    for r in sc:
        # unpack the values from each row into a new Row (namedtuple) and
        # append it to data
        data.append(Row(*r))

# now just delete the rows we don't want; the easiest way is probably to
# sort the list first by mslink and then by diameter ...
data = sorted(data, key=lambda x: (x.mslink, x.diam))

# ... then keep only the first entry (smallest diameter) for each mslink
to_keep = []
last_mslink = None
for d in data:
    if last_mslink != d.mslink:
        last_mslink = d.mslink
        to_keep.append(d)

# create a new feature class with the same fields as the source_fc
arcpy.CreateFeatureclass_management(
    out_path=the_path, out_name=cleaned_fc, template=source_fc)

with arcpy.da.InsertCursor(cleaned_fc, all_fields) as ic:
    for r in to_keep:
        ic.insertRow(r)
And for alternative b) I would fetch just three fields: a unique ID, MSLINK, and the diameter. Then build a delete list (here you only need the unique IDs). Then loop through the feature class again and delete the rows whose ID is on your delete list, as in the sketch below. Just to be sure, I would duplicate the feature class first and work on a copy.
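A minimal sketch of variant b), assuming the same hypothetical field names as above and using the 'OID@' token as the unique ID; build the delete set first, then remove the rows in a second pass:

keep = {}          # mslink -> (oid, diam) with the smallest diameter seen so far
delete_ids = set()
with arcpy.da.SearchCursor(source_fc, ['OID@', 'mslink', 'diam']) as sc:
    for oid, mslink, diam in sc:
        if mslink in keep and diam >= keep[mslink][1]:
            delete_ids.add(oid)                  # a smaller diameter already exists
        else:
            if mslink in keep:
                delete_ids.add(keep[mslink][0])  # replace the previous candidate
            keep[mslink] = (oid, diam)

with arcpy.da.UpdateCursor(source_fc, ['OID@']) as uc:
    for (oid,) in uc:
        if oid in delete_ids:
            uc.deleteRow()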
There are a few steps you can take to accomplish this task more efficiently. First and foremost, using the data access (arcpy.da) cursors instead of the older cursors will speed up your process; this assumes you are working in 10.1 or later. You can then employ Summary Statistics, namely its ability to find a minimum value based on a case field. For yours, the case field would be nMSLINK.
The code below first creates a statistics table with all unique 'nMSLINK' values and their corresponding minimum 'DIAMETER' values. I then use Table Select to keep only the rows whose 'FREQUENCY' field is not 1. From there I iterate through the new table and build a list of strings that will make up the final SQL statement. After this iteration, I use Python's join function to create an SQL string that looks something like this:
("nMSLINK" = 'value1' AND "DIAMETER" <> 624.0) OR ("nMSLINK" = 'value2' AND "DIAMETER" <> 1302.0) OR ("nMSLINK" = 'value3' AND "DIAMETER" <> 1036.0) ...
The SQL selects rows where the nMSLINK value is not unique and the DIAMETER value is not the minimum. Using this SQL, I select by attributes and delete the selected rows.
This SQL statement is written assuming your feature class is in a file geodatabase and that 'nMSLINK' is a string field and 'DIAMETER' is a numeric field.
The code has the following inputs:

Feature: the feature class to be analyzed
Workspace: a folder that will temporarily store a couple of intermediate tables
TempTableName1: a name for the first temporary table
TempTableName2: a name for the second temporary table
Field1: the non-unique field
Field2: the field with the numeric values you wish to find the lowest of
Code:
# Import modules
from arcpy import *
import os
# Local variables
#Feature to analyze
Feature = r"C:\E1B8\ScriptTesting\Workspace\Workspace.gdb\testfeatureclass"
#Workspace to export table of identicals
Workspace = r"C:\E1B8\ScriptTesting\Workspace"
#Name of temp DBF table file
TempTableName1 = "Table1"
TempTableName2 = "Table2"
#Field names
Field1 = "nMSLINK" #nonunique
Field2 = "DIAMETER" #field with numeric values
#Make layer to allow selection
MakeFeatureLayer_management (Feature, "lyr")
#Path for first temp table
Table = os.path.join (Workspace, TempTableName1)
#Create statistics table with min value
Statistics_analysis (Feature, Table, [[Field2, "MIN"]], [Field1])
#SQL Select rows with frequency not equal to one
sql = '"FREQUENCY" <> 1'
# Path for second temp table
Table2 = os.path.join (Workspace, TempTableName2)
# Select rows with Frequency not equal to one
TableSelect_analysis (Table, Table2, sql)
#Empty list for sql bits
li = []
# Iterate through second table
cursor = da.SearchCursor (Table2, [Field1, "MIN_" + Field2])
for row in cursor:
    # Add SQL bit to list
    sqlbit = '("' + Field1 + '" = \'' + row[0] + '\' AND "' + Field2 + '" <> ' + str(row[1]) + ")"
    li.append (sqlbit)
del row
del cursor
#Create SQL for selection of unwanted features
sql = " OR ".join (li)
print sql
#Select based on SQL
SelectLayerByAttribute_management ("lyr", "", sql)
#Delete selected features
DeleteFeatures_management ("lyr")
#delete temp files
Delete_management ("lyr")
Delete_management (Table)
Delete_management (Table2)
This should be quicker than a straight-up cursor. Let me know if this makes sense. Good luck!
I am fetching the results of a query on a table:

def getdata(self):
    self.cursor.execute("....")
    fetchall = self.cursor.fetchall()
    result = {}
    for row in fetchall:
        detail1 = row['mysite']
        detail2 = row['url']
        result[detail1] = row
    return result
Now I need to process the result set as generated:

def genXML(self):
    data = self.getdata()
    doc = Document()  # create XML tree structure

such that data holds all the rows fetched by the query and I can extract each column's values from it. Somehow I am not getting the desired output. My requirement is to fetch a result set via a DB query and store it in a placeholder so that I can easily access it later in other methods.
================================================================================
I tried the technique below, but in the method getXML() I am still unable to get each dict row so that I can traverse and manipulate it:
fetchall = self.cursor.fetchall()
results = []
result = {}
for row in fetchall:
    result['mysite'] = row['mysite']
    result['mystart'] = row['mystart']
    ..................................
    results.append(result)
return results
def getXML(self):
    doc = Document()
    charts = doc.createElement("charts")
    doc.appendChild(charts)
    chartData = self.grabChartData()
    for site in chartData:
        print site[??]
So how do I get each chartData row's values so that I can loop over them?
Note: I found that only the last fetched row's values end up in chartData. Say I know that 2 rows are returned by the query; if I print the list in the getXML() method as below, both rows are the same:
chartData[0]
chartData[1]
How can I uniquely add each result to the list?
Here you are modifying and appending the same dict to results over and over again:
result = {}
for row in fetchall:
    result['mysite'] = row['mysite']
    result['mystart'] = row['mystart']
    ..................................
    results.append(result)
Create the dictionary inside the loop to solve this:
for row in fetchall:
    result = {}
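Put together, a corrected sketch of the loop from the question, creating a fresh dict on each iteration:

results = []
for row in fetchall:
    result = {}                     # new dict for every row
    result['mysite'] = row['mysite']
    result['mystart'] = row['mystart']
    results.append(result)          # each appended dict is now a distinct object
return results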
Is there any way to get the column names from the pymssql results? If I specify as_dict=True I get back a dictionary which does contain all the column headers, but since it is a dictionary, they are not ordered.
pymssql claims to support the Python DB-API, so you should be able to get the .description attribute from your cursor object.
.description
This read-only attribute is a sequence of 7-item
sequences.
Each of these sequences contains information describing
one result column:
(name,
type_code,
display_size,
internal_size,
precision,
scale,
null_ok)
So, the first item in each of the "inner" sequences is the name for each column.
You can create a list of ordered column names using list comprehension on the cursor description attribute:
column_names = [item[0] for item in cursor.description]
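If you also want each row as a mapping that preserves the column order, you can zip the names with each row yourself; a sketch using collections.OrderedDict:

from collections import OrderedDict

column_names = [item[0] for item in cursor.description]
rows = [OrderedDict(zip(column_names, row)) for row in cursor.fetchall()]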
To get the column names on a single comma-separated line:

colNames = ""
for i in range(len(cursor.description)):
    desc = cursor.description[i]
    if i == 0:
        colNames = str(desc[0])
    else:
        colNames += ',' + str(desc[0])
print colNames
Alternatively, collect the column names in a list and use .join to get them as a string:

colNameList = []
for i in range(len(cursor.description)):
    desc = cursor.description[i]
    colNameList.append(desc[0])
colNames = ','.join(colNameList)
print colNames
It's a basic solution and needs optimizing, but the example below returns both the column headers and the column values in a list of dicts.

import pymssql

# server, user, password and database_name are assumed to be defined elsewhere
def return_mssql_dict(sql):
    try:
        con = pymssql.connect(server, user, password, database_name)
        cur = con.cursor()
        cur.execute(sql)

        # pair each column name from cur.description with its value in the row
        def return_dict_pair(row_item):
            return_dict = {}
            for column_name, row in zip(cur.description, row_item):
                return_dict[column_name[0]] = row
            return return_dict

        return_list = []
        for row in cur:
            row_item = return_dict_pair(row)
            return_list.append(row_item)
        con.close()
        return return_list
    except Exception, e:
        print '%s' % (e)
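A hypothetical usage sketch (my_table is a placeholder table name):

# each element of rows is a dict keyed by column name
rows = return_mssql_dict("SELECT TOP 5 * FROM my_table")
for r in rows:
    print r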