How to use variable column name in filter in Django ORM? - python

I have two tables, BloodBank(id, name, phone, address) and BloodStock(id, a_pos, b_pos, a_neg, b_neg, bloodbank_id). I want to fetch all columns from both tables where a variable column name (say bloodgroup, which holds values such as a_pos or a_neg) has a value greater than 0. How can I write the ORM query for this?
The SQL query that gets the required results is written like this:
sql="select * from public.bloodbank_bloodbank as bb, public.bloodbank_bloodstock as bs where bs."+blood+">0 and bb.id=bs.bloodbank_id order by bs."+blood+" desc;"
cursor = connection.cursor()
cursor.execute(sql)
bloodbanks = cursor.fetchall()

You could be more specific in your question, but I believe you have a variable called blood that contains the string name of the column, and that the columns a_pos, b_pos, etc. are numeric.
You can use a dictionary to create keyword arguments from strings:
filter_dict = {'bloodstock__' + blood + '__gt': 0}
bloodbanks = Bloodbank.objects.filter(**filter_dict)
This will get you Bloodbank objects that have a related bloodstock with a greater than zero value in the bloodgroup represented by the blood variable.
Note that the way I have written this, you don't get the bloodstock columns selected, and you may get duplicate bloodbanks. If you want to eliminate duplicate bloodbanks, you can add .distinct() to your query. The bloodstocks are available for each bloodbank instance via .bloodstock_set.all().
The ORM will generate SQL using a join. Alternatively, you can do an EXISTS in the where clause and no join.
from django.db.models import Exists, OuterRef
filter_dict = {blood + '__gt': 0}
exists = Exists(Bloodstock.objects.filter(
    bloodbank_id=OuterRef('id'),
    **filter_dict
))
bloodbanks = Bloodbank.objects.filter(exists)
There will be no need for a .distinct() in this case.
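The original SQL also orders the results by the chosen column, descending. The same string-building trick works for order_by; a minimal sketch using the join-based variant from above:
blood_order = '-bloodstock__' + blood
filter_kwargs = {'bloodstock__' + blood + '__gt': 0}
bloodbanks = Bloodbank.objects.filter(**filter_kwargs).order_by(blood_order)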

Related

How to filter if this id does not exist in another table?

I'm trying to filter rows whose id in table A does not exist in table B with this code.
query = db.session.query().select_from(Spare_Parts, Vendors, Replacement)\
.filter(Vendors.vendor_code == Spare_Parts.vendor_code,\
~ exists().where(Spare_Parts.spare_part_code == Replacement.spare_part_code))
I want to query the rows from Spare_Parts whose id does not exist in Replacement as a foreign key, but I got an error like this.
Select statement 'SELECT *
FROM spare_parts, replacement
WHERE spare_parts.spare_part_code = replacement.spare_part_code' returned no FROM clauses due to auto-correlation; specify correlate(<tables>) to control correlation manually.
So what is the problem and how do I fix it?
Try using a subquery like this instead, to filter the spare_part_code values from spare_parts that are not in the replacement table:
SELECT *
FROM spare_parts
WHERE spare_parts.spare_part_code not in
(select distinct
replacement.spare_part_code
FROM replacement)
Or you can use NOT EXISTS:
SELECT *
FROM spare_parts
WHERE not exists
(select 1
FROM replacement
where spare_parts.spare_part_code = replacement.spare_part_code)
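Since the question uses SQLAlchemy, the NOT IN version can also be written without raw SQL. A minimal sketch, assuming the Spare_Parts and Replacement models and the db.session from the question:
# subquery of spare_part_code values that appear in Replacement
replaced = db.session.query(Replacement.spare_part_code).distinct()
# keep only Spare_Parts rows whose code is not in that subquery
query = db.session.query(Spare_Parts)\
    .filter(Spare_Parts.spare_part_code.notin_(replaced))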

Pass column values into selection for SQL query using pandas.io.sql

I have multiple sql queries I need to run (via pandas.io.sql / .read_sql) that have a very similar structure so I am attempting to parameterize them.
I am wondering if there is a way to pass in column values using .format (which works for strings).
My query (truncated to simplify this post):
sql= '''
SELECT DISTINCT
CAST(report_suite AS STRING) AS report_suite, post_pagename,
COUNT(DISTINCT(CONCAT(post_visid_high,post_visid_low))) AS unique_visitors
FROM
FOO.db
WHERE
date_time BETWEEN '{0}' AND '{1}'
AND report_suite = '{2}'
GROUP BY
report_suite, post_pagename
ORDER BY
unique_visitors DESC
'''.format(*parameters)
What I would like to do is be able to parameterize the COUNT(DISTINCT(CONCAT(post_visid_high, post_visid_low))) AS unique_visitors part, like this somehow:
COUNT(DISTINCT({3})) as {'4'}
The problem I can't seem to get around is that doing this would require storing the column names as something other than strings to avoid the quotes. Are there any good ways around this?
Consider the following approach:
sql_dynamic_parms = dict(
func1='CONCAT(post_visid_high,post_visid_low)',
name1='unique_visitors'
)
sql= '''
SELECT DISTINCT
CAST(report_suite AS STRING) AS report_suite, post_pagename,
COUNT(DISTINCT({func1})) AS {name1}
FROM
FOO.db
WHERE
date_time BETWEEN %(date_from)s AND %(date_to)s
AND report_suite = %(report_suite)s
GROUP BY
report_suite, post_pagename
ORDER BY
unique_visitors DESC
'''.format(**sql_dynamic_parms)
params = dict(
date_from=pd.to_datetime('2017-01-01'),
date_to=pd.to_datetime('2017-12-01'),
report_suite=111
)
df = pd.read_sql(sql, conn, params=params)
P.S. You may want to read PEP 249 to see what kinds of parameter placeholders are accepted.
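For instance, the placeholder syntax depends on the driver's paramstyle; a small illustrative sketch (the connections, table, and column names here are hypothetical):
# pyformat style (e.g. psycopg2): named %(name)s placeholders
df = pd.read_sql("SELECT * FROM foo WHERE report_suite = %(rs)s",
                 pg_conn, params={'rs': 111})
# qmark style (e.g. sqlite3): positional ? placeholders
df = pd.read_sql("SELECT * FROM foo WHERE report_suite = ?",
                 sqlite_conn, params=(111,))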

Write dictionary with tuple containing parameters as unique value for features into PostgreSQL table

In Python 2.7, I have a dictionary with features' IDs as keys.
There are thousands of features.
Each feature has a single value, but this value is a tuple containing 6 parameters for the feature (for example: size, color, etc.).
On the other hand, I have a PostgreSQL table in a database where these feature parameters must be saved.
The features' IDs are already set in the table (as well as other information about these features).
The IDs are unique (they are random (thus not serial) but unique numbers).
There are 6 empty columns with names "param1", "param2", "param3", ..., "param6".
I already have a tuple containing these names:
columns = ("param1", "param2", "param3", ..., "param6")
The code I have doesn't work for saving these parameters in their respective columns for each feature:
# "view" is the dictionary with features's ID as keys()
# and their 6 params stored in values().
values = [view[i] for i in view.keys()]
columns = ("param1","param2","param3","param4","param5","param6")
conn = psycopg2.connect("dbname=mydb user=username password=password")
curs = conn.cursor()
curs.execute("DROP TABLE IF EXISTS mytable;")
curs.execute("CREATE TABLE IF NOT EXISTS mytable (LIKE originaltable including defaults including constraints including indexes);")
curs.execute("INSERT INTO mytable SELECT * from originaltable;")
insertstatmnt = 'INSERT INTO mytable (%s) values %s'
alterstatement = ('ALTER TABLE mytable '+
'ADD COLUMN param1 text,'+
'ADD COLUMN param2 text,'+
'ADD COLUMN param3 real,'+
'ADD COLUMN param4 text,'+
'ADD COLUMN param5 text,'+
'ADD COLUMN param6 text;'
)
curs.execute(alterstatement) # It's working up to this point.
curs.execute(insertstatmnt, (psycopg2.extensions.AsIs(','.join(columns)), tuple(values))) # The problem seems to be here.
conn.commit() # Making change to DB !
curs.close()
conn.close()
Here's the error I have:
curs.execute(insert_statement, (psycopg2.extensions.AsIs(','.join(columns)), tuple(values)))
ProgrammingError: INSERT has more expressions than target columns
I must be missing something.
How do I do this properly?
When using '%s' to build the statement the way I think you want, you just need to change a couple of things.
Ignoring curs.execute(), this statement is by no means wrong, but it does not produce what you are looking for. Using my own version, this is what I got with that statement. I also ignored psycopg2.extensions.AsIs() because it is just an adapter conforming to the ISQLQuote protocol, useful for objects whose string representation is already valid as an SQL representation.
>>> values = [i for i in range(0, 5)]  # since I don't know the keys, I just made up values
>>> insertstatmnt, (','.join(columns), tuple(values))
>>> ('INSERT INTO mytable (%s) values %s', ('param1,param2,param3,param4,param5,param6', (0, 1, 2, 3, 4)))
As you can see, what you entered returns a tuple with the values.
>>> insertstatmnt % (','.join(columns), tuple(values))
>>> 'INSERT INTO mytable (param1,param2,param3,param4,param5,param6) values (0, 1, 2, 3, 4)'
Whereas this returns a string that the SQL engine can actually read. The values obviously do not match the specified ones. I believe the problem you have lies in how you are creating your string.
Reference for psycopg2: http://initd.org/psycopg/docs/extensions.html
As I took the syntax of the psycopg2 command from this thread:
Insert Python Dictionary using Psycopg2
and as my values dictionary doesn't exactly follow the same structure as the mentioned example (I also have one key as ID, like in that example, but mine has only one corresponding value, a tuple containing my 6 parameters, thus "nested 1 level deeper" instead of directly 6 values corresponding to the keys), I need to loop through all features and execute one SQL statement per feature:
[curs.execute(insertstatmnt, (psycopg2.extensions.AsIs(', '.join(columns)), i)) for i in tuple(values)].
This is working.
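A possibly tidier variant of the same per-feature insert, sketched under the assumption that view still maps feature IDs to 6-tuples and columns holds the six column names, is to let psycopg2 drive the loop with executemany:
insert_sql = 'INSERT INTO mytable ({}) VALUES (%s, %s, %s, %s, %s, %s)'.format(
    ', '.join(columns))
curs.executemany(insert_sql, list(view.values()))
conn.commit()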

Dynamically add filter to SQLAlchemy TextClause

Assume I have a SQLAlchemy table which looks like:
class Country:
name = VARCHAR
population = INTEGER
continent = VARCHAR
num_states = INTEGER
My application allows seeing name and population for all countries. So I have a TextClause that looks like
"select name, population from Country"
I allow raw queries in my application, so I don't have the option to change this to a selectable.
At runtime, I want to allow my users to choose a field name and a field value to filter on. E.g. a user can say: I only want to see name and population for countries where continent is Asia. So I want to dynamically add the filter
.where(Country.c.continent == 'Asia')
But I can't add .where to a TextClause.
Similarly, my user may choose to see name and population for countries where num_states is greater than 10. So I dynamically want to add the filter
.where(Country.c.num_states > 10)
But again I can't add .where to a TextClause.
What are the options I have to solve this problem?
Could subquery help here in any way?
Add a filter based on the conditions; filter() is used for adding WHERE conditions in SQLAlchemy.
Country.query.filter(Country.num_states > 10).all()
You can also do this:
query = Country.query.filter(Country.continent == 'Asia')
if user_input == 'states':
    query = query.filter(Country.num_states > 10)
query = query.all()
This is not doable in a general sense without parsing the query. In relational algebra terms, the user applies projection and selection operations to a table, and you want to apply selection operations to it. Since the user can apply arbitrary projections (e.g. user supplies SELECT id FROM table), you are not guaranteed to be able to always apply your filters on top, so you have to apply your filters before the user does. That means you need to rewrite it to SELECT id FROM (some subquery), which requires parsing the user's query.
However, we can sort of cheat depending on the database that you are using, by having the database engine do the parsing for you. The way to do this is with CTEs, by basically shadowing the table name with a CTE.
Using your example, it looks like the following. User supplies query
SELECT name, population FROM country;
You shadow country with a CTE:
WITH country AS (
SELECT * FROM country
WHERE continent = 'Asia'
) SELECT name, population FROM country;
Unfortunately, because of the way SQLAlchemy's CTE support works, it is tough to get it to generate a CTE for a TextClause. The solution is to basically generate the string yourself, using a custom compilation extension, something like this:
from sqlalchemy import select, text
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Executable, ClauseElement

class WrappedQuery(Executable, ClauseElement):
    def __init__(self, name, outer, inner):
        self.name = name
        self.outer = outer
        self.inner = inner

@compiles(WrappedQuery)
def compile_wrapped_query(element, compiler, **kwargs):
    return "WITH {} AS ({}) {}".format(
        element.name,
        compiler.process(element.outer),
        compiler.process(element.inner))

c = Country.__table__
cte = select(["*"]).select_from(c).where(c.c.continent == "Asia")
query = WrappedQuery("country", cte, text("SELECT name, population FROM country"))
session.execute(query)
From my tests, this only works in PostgreSQL. SQLite and SQL Server both treat it as recursive instead of shadowing, and MySQL does not support CTEs.
I couldn't find anything nice for this in the documentation. I ended up resorting to pretty much just string processing... but at least it works!
from sqlalchemy.sql import text

query = """select name, population from Country"""
if continent is not None:
    additional_clause = """ WHERE continent = '{continent}';"""
    query = query + additional_clause
    text_clause = text(
        query.format(
            continent=continent,
        ),
    )
else:
    text_clause = text(query)

with sql_connection() as conn:
    results = conn.execute(text_clause)
You could also chain this logic with more clauses, although you'll have to create a boolean flag for the first WHERE clause and then use AND for the subsequent ones.
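A minimal sketch of that chaining idea, assuming hypothetical continent and min_states inputs and the same string interpolation as above (so it shares the same SQL-injection caveats):
query = """select name, population from Country"""
clauses = []
if continent is not None:
    clauses.append("continent = '{}'".format(continent))
if min_states is not None:
    clauses.append("num_states > {}".format(min_states))
if clauses:
    # the first condition gets WHERE, the rest are joined with AND
    query = query + " WHERE " + " AND ".join(clauses)
text_clause = text(query)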

Keep smallest value for each unique ID with arcpy/numpy

I've got an ESRI point shapefile with (amongst others) a nMSLINK field and a DIAMETER field. The MSLINK is not unique, because of a spatial join. What I want to achieve is to keep only the features in the shapefile that have a unique MSLINK and the smallest DIAMETER value, together with the corresponding values in the other fields. I can use a search cursor to achieve this (looping through all features and removing each feature that does not comply), but this takes ages (> 75,000 features). I was wondering if e.g. numpy could do the trick faster in ArcMap/arcpy.
I think that kind of processing would definitely be a lot faster if you work in memory instead of interacting with ArcGIS, for example by first putting all the rows into a Python object (a namedtuple is probably a good option here). Then you can find out which rows you want to delete or insert.
The fastest approach depends on your data: a) if you have a lot of repeated (MSLINK) rows, then the fastest option is to insert just the rows you need into a new layer; or b) if the rows to be deleted are few compared to the total, then deleting is faster.
For a) you'll need to fetch all fields into the tuple, including the point coordinates, so that you can just create a new feature class and insert the new rows.
# Example of Variant a:
import arcpy
from collections import namedtuple

# assuming the following:
source_fc   # contains the name of the feature class
the_path    # contains the path to the shapefile
cleaned_fc  # the name of the cleaned feature class

# use all fields of source_fc plus the shape token to get a tuple with xy
# coordinates (using 'mslink' and 'diam' here to simplify the example)
fields = ['mslink', 'diam', 'field3', ... ]
all_fields = fields + ['SHAPE@XY']

# define a namedtuple to hold and work with the rows, use the name 'point' to
# hold the coordinates tuple
Row = namedtuple('Row', fields + ['point'])
data = []
with arcpy.da.SearchCursor(source_fc, all_fields) as sc:
    for r in sc:
        # unpack the values from each row into a new Row (namedtuple)
        # and append it to data
        data.append(Row(*r))

# now just drop the rows we don't want; the easiest way is probably to sort
# the tuples first by MSLINK and then by diameter...
data = sorted(data, key=lambda x: (x.mslink, x.diam))

# ... then keep only the first row for each mslink
to_keep = []
last_mslink = None
for d in data:
    if last_mslink != d.mslink:
        last_mslink = d.mslink
        to_keep.append(d)

# create a new feature class with the same fields as the source_fc
arcpy.CreateFeatureclass_management(
    out_path=the_path, out_name=cleaned_fc, template=source_fc)
with arcpy.da.InsertCursor(cleaned_fc, all_fields) as ic:
    for r in to_keep:
        ic.insertRow(r)
And for alternative b) I would just fetch 3 fields: a unique ID, the MSLINK, and the diameter. Then make a delete list (here you only need the unique IDs). Then loop through the feature class again and delete the rows whose ID is on your delete list (a rough sketch follows below). Just to be sure, I would duplicate the feature class first and work on a copy.
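A rough sketch of variant b), assuming the object ID token OID@ as the unique ID, the same 'mslink' and 'diam' field names as above, and a copy of the feature class in copy_fc:
# first pass: per mslink, remember the row with the smallest diameter;
# every other row's object ID goes on the delete list
keep = {}          # mslink -> (oid, diam) of the smallest diameter seen so far
to_delete = set()
with arcpy.da.SearchCursor(copy_fc, ['OID@', 'mslink', 'diam']) as sc:
    for oid, mslink, diam in sc:
        if mslink not in keep or diam < keep[mslink][1]:
            if mslink in keep:
                to_delete.add(keep[mslink][0])
            keep[mslink] = (oid, diam)
        else:
            to_delete.add(oid)
# second pass: delete the rows on the delete list
with arcpy.da.UpdateCursor(copy_fc, ['OID@']) as uc:
    for (oid,) in uc:
        if oid in to_delete:
            uc.deleteRow()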
There are a few steps you can take to accomplish this task more efficiently. First and foremost, using the data access (arcpy.da) cursors rather than the older cursors will speed up your process; this assumes you are working in 10.1 or later. Then you can employ Summary Statistics, namely its ability to find a minimum value based on a case field. For yours, the case field would be nMSLINK.
The code below first creates a statistics table with all unique 'nMSLINK' values, and its corresponding minimum 'DIAMETER' value. I then use a table select to select out only rows in the table whose 'FREQUENCY' field is not 1. From here I iterate through my new table and start to build a list of strings that will make up a final sql statement. After this iteration, I use the python join function to create an sql string that looks something like this:
("nMSLINK" = 'value1' AND "DIAMETER" <> 624.0) OR ("nMSLINK" = 'value2' AND "DIAMETER" <> 1302.0) OR ("nMSLINK" = 'value3' AND "DIAMETER" <> 1036.0) ...
The sql selects rows where nMSLINK values are not unique and where DIAMETER values are not the minimum. Using this SQL, I select by attribute and delete selected rows.
This SQL statement is written assuming your feature class is in a file geodatabase and that 'nMSLINK' is a string field and 'DIAMETER' is a numeric field.
The code has the following inputs:
Feature: The feature to be analyzed
Workspace: A folder that will store a couple intermediate tables temporarily
TempTableName1: A name for one temporary table.
TempTableName2: A name for a second temporary table
Field1: The non-unique field
Field2: The field with the numeric values of which you wish to find the lowest
Code:
# Import modules
from arcpy import *
import os
# Local variables
#Feature to analyze
Feature = r"C:\E1B8\ScriptTesting\Workspace\Workspace.gdb\testfeatureclass"
#Workspace to export table of identicals
Workspace = r"C:\E1B8\ScriptTesting\Workspace"
#Name of temp DBF table file
TempTableName1 = "Table1"
TempTableName2 = "Table2"
#Field names
Field1 = "nMSLINK" #nonunique
Field2 = "DIAMETER" #field with numeric values
#Make layer to allow selection
MakeFeatureLayer_management (Feature, "lyr")
#Path for first temp table
Table = os.path.join (Workspace, TempTableName1)
#Create statistics table with min value
Statistics_analysis (Feature, Table, [[Field2, "MIN"]], [Field1])
#SQL Select rows with frequency not equal to one
sql = '"FREQUENCY" <> 1'
# Path for second temp table
Table2 = os.path.join (Workspace, TempTableName2)
# Select rows with Frequency not equal to one
TableSelect_analysis (Table, Table2, sql)
#Empty list for sql bits
li = []
# Iterate through second table
cursor = da.SearchCursor (Table2, [Field1, "MIN_" + Field2])
for row in cursor:
    # Add SQL bit to list
    sqlbit = '("' + Field1 + '" = \'' + row[0] + '\' AND "' + Field2 + '" <> ' + str(row[1]) + ")"
    li.append (sqlbit)
del row
del cursor
#Create SQL for selection of unwanted features
sql = " OR ".join (li)
print sql
#Select based on SQL
SelectLayerByAttribute_management ("lyr", "", sql)
#Delete selected features
DeleteFeatures_management ("lyr")
#delete temp files
Delete_management ("lyr")
Delete_management (Table)
Delete_management (Table2)
This should be quicker than a straight-up cursor. Let me know if this makes sense. Good luck!
