Getting the column name of a specific data with sqlalchemy - python

I am using sqlalchemy 0.8 and I want to get the column name of the input only, not all the column in the table.
here is the code:
rec = raw_input("Enter keyword to search: ")
res = session.query(test.__table__).filter(test.fname == rec).first()
data = ','.join(map(str, res)) +","
print data
#saw this here # SO but not the one I wanted. It displays all of the columns
columns = [m.key for m in data.columns]
print columns

You can just query for the columns you want. Like if you had some model MyModel
You can do:
session.query(MyModel.wanted_column1, ...) ... # rest of the query
This would only select all the columns mentioned there.
You can use the select syntax.
Or if you still want the model object to be returned and certain columns not loaded, you can use deferred column loading.

Related

How to use variable column name in filter in Django ORM?

I have two tables BloodBank(id, name, phone, address) and BloodStock(id, a_pos, b_pos, a_neg, b_neg, bloodbank_id). I want to fetch all the columns from two tables where the variable column name (say bloodgroup) which have values like a_pos or a_neg... like that and their value should be greater than 0. How can I write ORM for the same?
SQL query is written like this to get the required results.
sql="select * from public.bloodbank_bloodbank as bb, public.bloodbank_bloodstock as bs where bs."+blood+">0 and bb.id=bs.bloodbank_id order by bs."+blood+" desc;"
cursor = connection.cursor()
cursor.execute(sql)
bloodbanks = cursor.fetchall()
You could be more specific in your questions, but I believe you have a variable called blood which contains the string name of the column and that the columns a_pos, b_pos, etc. are numeric.
You can use a dictionary to create keyword arguments from strings:
filter_dict = {bloodstock__blood + '__gt': 0}
bloodbanks = Bloodbank.objects.filter(**filter_dict)
This will get you Bloodbank objects that have a related bloodstock with a greater than zero value in the bloodgroup represented by the blood variable.
Note that the way I have written this, you don't get the bloodstock columns selected, and you may get duplicate bloodbanks. If you want to get eliminate duplicate bloodbanks you can add .distinct() to your query. The bloodstocks are available for each bloodbank instance using .bloodstock_set.all().
The ORM will generate SQL using a join. Alternatively, you can do an EXISTS in the where clause and no join.
from django.db.models import Exists, OuterRef
filter_dict = {blood + '__gt': 0}
exists = Exists(Bloodstock.objects.filter(
bloodbank_id=OuterRef('id'),
**filter_dict
)
bloodbanks = Bloodbank.objects.filter(exists)
There will be no need for a .distinct() in this case.

Select specific columns to read from PostgreSQL based on python list

I have two lists : one contains the column names of categorical variables and the other numeric as shown below.
cat_cols = ['stat','zip','turned_off','turned_on']
num_cols = ['acu_m1','acu_cnt_m1','acu_cnt_m2','acu_wifi_m2']
These are the columns names in a table in Redshift.
I want to pass these as a parameter to pull only numeric columns from a table in Redshift(PostgreSql),write that into a csv and close the csv.
Next I want to pull only cat_cols and open the csv and then append to it and close it.
my query so far:
#1.Pull num data:
seg = ['seg1','seg2']
sql_data = str(""" SELECT {num_cols} """ + """FROM public.""" + str(seg) + """ order by random() limit 50000 ;""")
df_data = pd.read_sql(sql_data, cnxn)
# Write to csv.
df_data.to_csv("df_sample.csv",index = False)
#2.Pull cat data:
sql_data = str(""" SELECT {cat_cols} """ + """FROM public.""" + str(seg) + """ order by random() limit 50000 ;""")
df_data = pd.read_sql(sql_data, cnxn)
# Append to df_seg.csv and close the connection to csv.
with open("df_sample.csv",'rw'):
## Append to the csv ##
This is the first time I am trying to do selective querying based on python lists and hence stuck on how to pass the list as column names to select from table.
Can someone please help me with this?
If you want, to make a query in a string representation, in your case will be better to use format method, or f-strings (required python 3.6+).
Example for the your case, only with built-in format function.
seg = ['seg1', 'seg2']
num_cols = ['acu_m1','acu_cnt_m1','acu_cnt_m2','acu_wifi_m2']
query = """
SELECT {} FROM public.{} order by random() limit 50000;
""".format(', '.join(num_cols), seg)
print(query)
If you want use only one item from the seg array, use seg[0] or seg[1] in format function.
I hope this will help you!

How can I get column name and type from an existing table in SQLAlchemy?

Suppose I have the table users and I want to know what the column names are and what the types are for each column.
I connect like this;
connectstring = ('mssql+pyodbc:///?odbc_connect=DRIVER%3D%7BSQL'
'+Server%7D%3B+server%3D.....')
engine = sqlalchemy.create_engine(connectstring).connect()
md = sqlalchemy.MetaData()
table = sqlalchemy.Table('users', md, autoload=True, autoload_with=engine)
columns = table.c
If I call
for c in columns:
print type(columns)
I get the output
<class 'sqlalchemy.sql.base.ImmutableColumnCollection'>
printed once for each column in the table.
Furthermore,
print columns
prints
['users.column_name_1', 'users.column_name_2', 'users.column_name_3'....]
Is it possible to get the column names without the table name being included?
columns have name and type attributes
for c in columns:
print c.name, c.type
Better use "inspect" to obtain only information from the columns, with that you do not reflect the table.
import sqlalchemy
from sqlalchemy import inspect
engine = sqlalchemy.create_engine(<url>)
insp = inspect(engine)
columns_table = insp.get_columns(<table_name>, <schema>) #schema is optional
for c in columns_table :
print(c['name'], c['type'])
You could call the column names from the Information Schema:
SELECT *
FROM information_schema.columns
WHERE TABLE_NAME = 'users'
Then sort the result as an array, and loop through them.

ParsePy - Filter results that contain string

I have a table stored in Parse.com, and I'm using ParsePy to get and filter the data in my Python Django program.
My table has three columns, objectId (string), name (string), and type (array). I want to query the name column and return any objects that contain the partial term xyz. For example, if I search for amp, and there's a row where name: Example Name this row should be returned.
Here's my code so far:
def searchResults(self, searchTerm):
register('parseKey', 'parseRestKey')
myParseObject = ParseObject()
allData = myParseObject.Query.filter(name = searchTerm)
return allData
The problem with this code is it only works if searchTerm is exactly the same as what's in the name column. The Parse REST API says that the queries accept regex parameters, but I'm not sure how to use them in ParsePy.
Yes, you have to use regex for that. And, this is how it would be:
allData = myParseObject.Query.filter(name__regex = "<your_regex>")

Keep smallest value for each unique ID with arcpy/numpy

I've got a ESRI Point Shape file with (amongst others) a nMSLINK field and a DIAMETER field. The MSLINK is not unique, because of a spatial join. What I want to achieve is to keep only the features in the shapefile that have a unique MSLINK and the smallest DIAMETER value, together with the corresponding values in the other fields. I can use a searchcursor to achieve this (looping through all features and removing each feature that does not comply, but this takes ages (> 75000 features). I was wondering if eg. numpy could do the trick faster in ArcMap/arcpy.
I think, making that kind of processing would definitely be a lot faster if you work on memory instead of interacting with arcgis. For example, by putting all the rows first into a python object (probably a namedtuple would be a good option here). Then you can find out which rows you want to delete or insert.
The fastest approach depends on a) if you have a lot of (MSLINK) repeated rows, then the fastest would be inserting just the ones you need in a new layer. Or b) if the rows to be deleted are just a few compared to the total of rows, then deleting is faster.
For a) you'll need to fetch all fields into the tuple, including the point coordinates, so that you can just create a new feature class and insert the new rows.
# Example of Variant a:
from collections import namedtuple
# assuming the following:
source_fc # contains name of the fclass
the_path # contains path to the shape
cleaned_fc # the name of the cleaned fclass
# use all fields of source_fc plus the shape token to get a touple with xy
# coordinates (using 'mslink' and 'diam' here to simplify the example)
fields = ['mslink', 'diam', 'field3', ... ]
all_fields = fields + ['SHAPE#XY']
# define a namedtuple to hold and work with the rows, use the name 'point' to
# hold the coordinates-tuple
Row = namedtuple('Row', fields + ['point'])
data = []
with arcpy.da.SearchCursor(source_fc, fields) as sc:
for r in sc:
# unzip the values from each row into a new Row (namedtuple) and append
# to data
data.append(Row(*r))
# now just delete the rows we don't want, for this, the easiest way, is probably
# to order the tuple first after MSLINK and then after the diamater...
data = sorted(data, key = lambda x : (x.mslink, x.diam))
# ... now just keep the first ones for each mslink
to_keep = []
last_mslink = None
for d in data:
if last_mslink != d.mslink:
last_mslink = d.mslink
to_keep.append(d)
# create a new feature class with the same fields as the source_fc
arcpy.CreateFeatureclass_management(
out_path=the_path, out_name=cleaned_fc, template=source_fc)
with arcpy.da.InsertCursor(cleaned_fc, all_fields) as ic:
for r in to_keep:
ic.insertRow(*r)
And for alternative b) I would just fetch 3 fields, a unique ID, MSLINK and the diameter. Then make a delete list (here you only need the unique ids). Then loop again through the feature class and delete the rows with the id on your delete-list. Just to be sure, I would duplicate the feature class first, and work on a copy.
There are a few steps you can take to accomplish this task more efficiently. First and foremost, making use of the data analyst cursor as opposed to the older version of cursor will increase the speed of your process. This assumes you are working in 10.1 or beyond. Then you can employ summary statistics, namely its ability to find a minimum value based off a case field. For yours, the case field would be nMSLINK.
The code below first creates a statistics table with all unique 'nMSLINK' values, and its corresponding minimum 'DIAMETER' value. I then use a table select to select out only rows in the table whose 'FREQUENCY' field is not 1. From here I iterate through my new table and start to build a list of strings that will make up a final sql statement. After this iteration, I use the python join function to create an sql string that looks something like this:
("nMSLINK" = 'value1' AND "DIAMETER" <> 624.0) OR ("nMSLINK" = 'value2' AND "DIAMETER" <> 1302.0) OR ("nMSLINK" = 'value3' AND "DIAMETER" <> 1036.0) ...
The sql selects rows where nMSLINK values are not unique and where DIAMETER values are not the minimum. Using this SQL, I select by attribute and delete selected rows.
This SQL statement is written assuming your feature class is in a file geodatabase and that 'nMSLINK' is a string field and 'DIAMETER' is a numeric field.
The code has the following inputs:
Feature: The feature to be analyzed
Workspace: A folder that will store a couple intermediate tables temporarily
TempTableName1: A name for one temporary table.
TempTableName2: A name for a second temporary table
Field1 = The nonunique field
Field2 = The field with the numeric values that you wish to find the lowest of
Code:
# Import modules
from arcpy import *
import os
# Local variables
#Feature to analyze
Feature = r"C:\E1B8\ScriptTesting\Workspace\Workspace.gdb\testfeatureclass"
#Workspace to export table of identicals
Workspace = r"C:\E1B8\ScriptTesting\Workspace"
#Name of temp DBF table file
TempTableName1 = "Table1"
TempTableName2 = "Table2"
#Field names
Field1 = "nMSLINK" #nonunique
Field2 = "DIAMETER" #field with numeric values
#Make layer to allow selection
MakeFeatureLayer_management (Feature, "lyr")
#Path for first temp table
Table = os.path.join (Workspace, TempTableName1)
#Create statistics table with min value
Statistics_analysis (Feature, Table, [[Field2, "MIN"]], [Field1])
#SQL Select rows with frequency not equal to one
sql = '"FREQUENCY" <> 1'
# Path for second temp table
Table2 = os.path.join (Workspace, TempTableName2)
# Select rows with Frequency not equal to one
TableSelect_analysis (Table, Table2, sql)
#Empty list for sql bits
li = []
# Iterate through second table
cursor = da.SearchCursor (Table2, [Field1, "MIN_" + Field2])
for row in cursor:
# Add SQL bit to list
sqlbit = '("' + Field1 + '" = \'' + row[0] + '\' AND "' + Field2 + '" <> ' + str(row[1]) + ")"
li.append (sqlbit)
del row
del cursor
#Create SQL for selection of unwanted features
sql = " OR ".join (li)
print sql
#Select based on SQL
SelectLayerByAttribute_management ("lyr", "", sql)
#Delete selected features
DeleteFeatures_management ("lyr")
#delete temp files
Delete_management ("lyr")
Delete_management (Table)
Delete_management (Table2)
This should be quicker than a straight-up cursor. Let me know if this makes sense. Good luck!

Categories