I am using SQLAlchemy with the ORM paradigm, and I can't find a way to write a CASE WHEN expression, nor any information about it on the web.
Is it possible?
See the sqlalchemy.sql.expression.case function and further examples on the documentation page. It would look like this (verbatim from the linked documentation):
case([(orderline.c.qty > 100, item.c.specialprice),
      (orderline.c.qty > 10, item.c.bulkprice)
      ], else_=item.c.regularprice)

case(value=emp.c.type, whens={
    'engineer': emp.c.salary * 1.1,
    'manager': emp.c.salary * 3,
})
edit-1 (answering the comment): Sure you can, see the example below:
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True, autoincrement=True)
    first_name = Column(String)
    last_name = Column(String)

xpr = case([(User.first_name != None, User.first_name + " " + User.last_name)],
           else_=User.last_name).label("full_name")

qry = session.query(User.id, xpr)
for _usr in qry:
    print(_usr.full_name)
Also see Using a hybrid for an example of case used in the hybrid properties.
I got this to work with an aggregate function, in this case func.sum
My Example Code
from sqlalchemy import func, case

my_case_stmt = case(
    [
        (MyTable.hit_type.in_(['easy', 'medium']), 1),
        (MyTable.hit_type == 'hard', 3)
    ]
)

score = db.session.query(
    func.sum(my_case_stmt)
).filter(
    MyTable.success == 1
)

return score.scalar()
My Use Case
MyTable looks like this:
| hit_type | success |
|----------|---------|
| easy     | 1       |
| medium   | 1       |
| easy     | 0       |
| hard     | 1       |
| easy     | 0       |
| easy     | 1       |
| medium   | 1       |
| hard     | 1       |
score is computed as such:
score = num_easy_hits + num_medium_hits + (3 * num_hard_hits)
Four successful easy/medium hits and two successful hard hits give (4 + (2 * 3)) = 10.
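The arithmetic can be double-checked in plain Python against the table above:

```python
# Rows from the example table above: (hit_type, success)
rows = [
    ('easy', 1), ('medium', 1), ('easy', 0), ('hard', 1),
    ('easy', 0), ('easy', 1), ('medium', 1), ('hard', 1),
]

# Mirror the CASE expression: easy/medium count 1, hard counts 3,
# and the filter keeps only successful hits.
WEIGHTS = {'easy': 1, 'medium': 1, 'hard': 3}
score = sum(WEIGHTS[hit_type] for hit_type, success in rows if success == 1)
print(score)  # 10
```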
Here is the link to the docs:
http://docs.sqlalchemy.org/en/latest/core/sqlelement.html?highlight=case#sqlalchemy.sql.expression.Case
but those examples confused me, and there is no runnable code there.
I tried many times and ran into many kinds of problems.
Finally, I found two ways to implement "CASE WHEN" with SQLAlchemy.
The first way:
By the way, my use case is that I need to mask the phone field depending on whether the user has logged in.
@staticmethod
def requirement_list_common_query(user=None):
    phone_mask = case(
        [
            (db.true() if user else db.false(), Requirement.temp_phone),
        ],
        else_=func.concat(func.left(Requirement.temp_phone, 3), '****', func.right(Requirement.temp_phone, 4))
    ).label('temp_phone')
    query = db.session.query(Requirement.company_id,
                             Company.uuid.label('company_uuid'),
                             Company.name.label('company_name'),
                             Requirement.uuid,
                             Requirement.title,
                             Requirement.content,
                             Requirement.level,
                             Requirement.created_at,
                             Requirement.published_at,
                             Requirement.end_at,
                             Requirement.status,
                             # Requirement.temp_phone,
                             phone_mask,
                             User.name.label('user_name'),
                             User.uuid.label('user_uuid')
                             )
    query = query.join(Company, Company.id == Requirement.company_id) \
        .join(User, User.id == Requirement.user_id)
    return query
Requirement is one of my models.
The user argument of requirement_list_common_query is the logged-in user, if the user has logged in.
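The else_ branch is just string surgery; in plain Python the mask amounts to the following (the sample number below is made up for illustration):

```python
def mask_phone(phone):
    # Keep the first 3 and last 4 digits, hide the middle, mirroring
    # func.concat(func.left(..., 3), '****', func.right(..., 4)).
    return phone[:3] + '****' + phone[-4:]

print(mask_phone('13812345678'))  # 138****5678
```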
The second way:
here the use case is that I want to classify employees by their income.
the models are:
class Dept(Base):
    __tablename__ = 'dept'

    deptno = Column(Integer, primary_key=True)
    dname = Column(String(14))
    loc = Column(String(13))

    def __repr__(self):
        return str({
            'deptno': self.deptno,
            'dname': self.dname,
            'loc': self.loc
        })

class Emp(Base):
    __tablename__ = 'emp'

    empno = Column(Integer, primary_key=True)
    ename = Column(String(10))
    job = Column(String(9))
    mgr = Column(Integer)
    hiredate = Column(Date)
    sal = Column(DECIMAL(7, 2))
    comm = Column(DECIMAL(7, 2))
    deptno = Column(Integer, ForeignKey('dept.deptno'))

    def __repr__(self):
        return str({
            'empno': self.empno,
            'ename': self.ename,
            'job': self.job,
            'deptno': self.deptno,
            'comm': self.comm
        })
Here is the code:
from sqlalchemy import text

income_level = case(
    [
        (text('(emp.sal + ifnull(emp.comm,0)) < 1500'), 'LOW_INCOME'),
        # Branches are evaluated in order, so reaching this one already implies
        # income >= 1500 (a chained '1500<=x<3500' is not a range check in SQL).
        (text('(emp.sal + ifnull(emp.comm,0)) < 3500'), 'MIDDLE_INCOME'),
        (text('(emp.sal + ifnull(emp.comm,0)) >= 3500'), 'HIGH_INCOME'),
    ], else_='UNKNOWN'
).label('income_level')

emps = sess.query(Emp.ename, label('income', Emp.sal + func.ifnull(Emp.comm, 0)),
                  income_level).all()

for item in emps:
    print(item.ename, item.income, item.income_level)
Why did I use text? Because I couldn't get code like the following, which @van showed above, to work in SQLAlchemy 1.2.8, no matter how long I tried:
case([(orderline.c.qty > 100, item.c.specialprice),
      (orderline.c.qty > 10, item.c.bulkprice)
      ], else_=item.c.regularprice)

case(value=emp.c.type, whens={
    'engineer': emp.c.salary * 1.1,
    'manager': emp.c.salary * 3,
})
Hope it helps!
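For what it's worth, the part that won't translate is the chained comparison 1500 <= x < 3500: Python cannot turn chained comparisons into a SQL expression, which is likely why text() seemed necessary. Splitting the range with and_() usually avoids text() altogether. A sketch (using newer SQLAlchemy, 1.4+, where the WHEN tuples are positional; in 1.x they would go in a list), with column() stand-ins so it is self-contained:

```python
from sqlalchemy import and_, case, column, func

# Stand-ins for Emp.sal and Emp.comm.
sal = column('sal')
comm = column('comm')
income = sal + func.ifnull(comm, 0)

income_level = case(
    (income < 1500, 'LOW_INCOME'),
    (and_(income >= 1500, income < 3500), 'MIDDLE_INCOME'),
    (income >= 3500, 'HIGH_INCOME'),
    else_='UNKNOWN',
)

print(str(income_level))  # renders as a CASE WHEN ... END expression
```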
Related
I am currently working on a delivery app. Essentially, what I am trying to achieve is a counter that displays any potentially duplicated jobs the customer might have accidentally entered twice.
The criterion for a job to be considered a duplicate is:
it has to have the same delivery_address and the same pickup_date.
These are my PostgreSQL tables (Django models):
class Order(models.Model):
    id_order = models.AutoField(primary_key=True)

class OrderDelivery(models.Model):
    order = models.ForeignKey(Order, on_delete=models.SET_NULL, null=True, blank=True)
    delivery_address = models.TextField()

class OrderPickup(models.Model):
    order = models.ForeignKey(Order, on_delete=models.SET_NULL, null=True, blank=True)
    pickup_date = models.DateField(blank=True, null=True)
This is what I have come up with so far:
def dashboard_duplicated_job(orders, month):
    # This function finds jobs that have
    # (1) the exact same pickup_date
    # (2) the exact same delivery address
    # and marks them as duplicated jobs.

    # Find current month, if not take current year
    now = timezone.localtime(timezone.now())
    month = "12" if month and int(month) == 0 else month
    if month == 12 or month == '12':
        now = timezone.localtime(timezone.now())
        last_month = now.today() + relativedelta(months=-1)
        start_date = last_month.replace(day=1).strftime('%Y-%m-%d')
        year = last_month.replace(day=1).strftime('%Y')
        month = last_month.replace(day=1).strftime('%m')
        last_day = calendar.monthrange(int(year), int(month))[1]
        string_start = str(year) + "-" + str(month) + "-01"
        string_end = str(year) + "-" + str(month) + "-" + str(last_day)
        start_date = datetime.strptime(string_start + " 00:00:00", "%Y-%m-%d %H:%M:%S")
        end_date = datetime.strptime(string_end + " 23:59:59", "%Y-%m-%d %H:%M:%S")
    else:
        year = now.year
        last = calendar.monthrange(year, int(month))[1]
        string_start = str(year) + "-" + str(month) + "-01"
        string_end = str(year) + "-" + str(month) + "-" + str(last)
        start_date = datetime.strptime(string_start + " 00:00:00", "%Y-%m-%d %H:%M:%S")
        end_date = datetime.strptime(string_end + " 23:59:59", "%Y-%m-%d %H:%M:%S")

    # Filter pickup_date of OrderPickup to display only orders related to the current month
    opu = OrderPickup.objects.filter(
        order_id__in=orders,
        pickup_date__range=(start_date, end_date)
    ).values(
        'order_id',
        'pickup_date',
    )

    # Filter OrderDelivery based on pickup_date range
    ods = OrderDelivery.objects.filter(
        order_id__in=opu.values('order_id')
    ).values(
        'order_id',
        'delivery_address',
        # 'reference_no'
    )

    # Find duplicated delivery_address
    dup_ods = ods.values(
        'delivery_address'
    ).annotate(
        duplicated_delivery_address=Count('delivery_address')
    ).filter(
        duplicated_delivery_address__gt=1
    )

    # Extract the IDs of the duplicated delivery_address from dup_ods
    dup_ods_id = ods.filter(
        delivery_address__in=[item['delivery_address'] for item in dup_ods]
    ).values(
        'order_id'
    )

    # Find duplicated pickup_date based on duplicated_address <not working as intended>
    dup_opu = opu.filter(
        order_id__in=dup_ods_id
    ).values(
        'pickup_date'
    ).annotate(
        duplicated_pickup_date=Count('pickup_date')
    ).filter(
        duplicated_pickup_date__gt=1
    )

    dup_opu_id = opu.filter(
        pickup_date__in=[item['pickup_date'] for item in dup_opu]
    ).order_by()

    orders = orders.filter(id_order__in=dup_opu_id)
    return orders
With what I have come up with, orders that have the same delivery_address but a different pickup_date are also showing up.
example (correct):
| delivery_address | pickup_date |
|------------------|-------------|
| here             | 08-03-2022  |
| here             | 08-03-2022  |
| there            | 09-03-2022  |
| there            | 09-03-2022  |
example (incorrect, currently displaying):
| delivery_address | pickup_date |
|------------------|-------------|
| here             | 08-03-2022  |
| here             | 08-03-2022  |
| here             | 09-03-2022  |
| there            | 09-03-2022  |
| there            | 09-03-2022  |
Please advise, thank you.
UPDATE
I have managed to solve my problem. Below is my solution:
dup_job = orders.filter(
    orderpickup__pickup_date__range=(start_date, end_date)
).values(
    'id_order',
    'orderdelivery__delivery_address',
    'orderpickup__pickup_date'
).annotate(
    duplicated=Count('orderdelivery__delivery_address')
).filter(
    duplicated__gt=1
)
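In plain Python terms, the duplicate criterion (same delivery_address and same pickup_date) boils down to counting (address, date) pairs, which is a useful mental model for checking the queryset above against the example tables:

```python
from collections import Counter

# Sample rows shaped like the values() output (dates kept as strings for brevity),
# matching the "incorrect" example table above.
rows = [
    {'delivery_address': 'here',  'pickup_date': '08-03-2022'},
    {'delivery_address': 'here',  'pickup_date': '08-03-2022'},
    {'delivery_address': 'here',  'pickup_date': '09-03-2022'},
    {'delivery_address': 'there', 'pickup_date': '09-03-2022'},
    {'delivery_address': 'there', 'pickup_date': '09-03-2022'},
]

# Count each (address, date) pair; only pairs seen more than once are duplicates.
pair_counts = Counter((r['delivery_address'], r['pickup_date']) for r in rows)
duplicates = [r for r in rows
              if pair_counts[(r['delivery_address'], r['pickup_date'])] > 1]

print(len(duplicates))  # 4 -- the lone ('here', '09-03-2022') row is excluded
```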
I've run into an issue after following the SQLAlchemy guide here.
Given the following simplified module:
class _Base():
    id_ = Column(Integer, primary_key=True, autoincrement=True)

Base = declarative_base(cls=_Base)

class BlgMixin():
    @declared_attr
    def __table_args__(cls):
        return {'schema': "belgarath_backup", "extend_existing": True}

class DataAccessLayer():
    def __init__(self):
        conn_string = "mysql+mysqlconnector://root:root@localhost/"
        self.engine = create_engine(conn_string)

    def create_session(self):
        Base.metadata.create_all(self.engine)
        Session = sessionmaker()
        Session.configure(bind=self.engine)
        self.session = Session()

class Player(Base, BlgMixin):
    __tablename__ = "player"
    name_ = Column(String(100))
    match = relationship("MatchResult")

class MatchResult(Base, BlgMixin):
    __tablename__ = "match_result"
    p1_id = Column(Integer, ForeignKey(f"{BlgMixin.__table_args__.get('schema')}.player.id_"))
    p2_id = Column(Integer, ForeignKey(f"{BlgMixin.__table_args__.get('schema')}.player.id_"))
    p1 = relationship("Player", foreign_keys=f"{BlgMixin.__table_args__.get('schema')}.player.id_")
    p2 = relationship("Player", foreign_keys=f"{BlgMixin.__table_args__.get('schema')}.player.id_")
I am attempting to build a query using:
dal = DataAccessLayer()
dal.create_session()
player_1 = aliased(Player)
player_2 = aliased(Player)
matches = dal.session.query(MatchResult.p1_id, player_1.name_, MatchResult.p2_id, player_2.name_)
matches = matches.join(player_1)
matches = matches.join(player_2)
Why am I getting the following error?
Could not determine join condition between parent/child tables on relationship Player.match - there are multiple foreign key paths linking the tables. Specify the 'foreign_keys' argument, providing a list of those columns which should be counted as containing a foreign key reference to the parent table.
I was pretty sure I'd specified the two foreign key relationships?
Update:
I've tried the following combination, as I believe was suggested in the comments, but got the same error:
p1 = relationship("Player", foreign_keys=[p1_id])
p2 = relationship("Player", foreign_keys=[p2_id])
Update 2:
Added some details on what the output should look like:
player table:
+-----+-------+
| id_ | name_ |
+-----+-------+
| 1 | foo |
| 2 | bar |
| 3 | baz |
| 4 | zoo |
+-----+-------+
match_result table:
+-----+-------+-------+
| id_ | p1_id | p2_id |
+-----+-------+-------+
| 1 | 1 | 2 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 4 | 1 | 4 |
+-----+-------+-------+
Query output:
+-------+---------+-------+---------+
| p1_id | p1_name | p2_id | p2_name |
+-------+---------+-------+---------+
| 1 | foo | 2 | bar |
| 2 | bar | 1 | foo |
| 3 | baz | 1 | foo |
| 1 | foo | 4 | zoo |
+-------+---------+-------+---------+
The two-way relationship and multiple join paths prevent SQLAlchemy from automatically determining the joins, and the fact that the relationships in both tables emit very similar error messages makes it difficult to understand where the problems lie (and whether a given change makes any progress in solving them). I found the simplest approach was to comment out the relationship in Player until MatchResult was working properly.
The changes to MatchResult are the same as those specified in the multiple join paths docs referenced in the question. To get the relationship in Player to work, I specified the primary join condition so that SQLAlchemy could determine how to join to MatchResult.
class Player(Base):
    __tablename__ = 'player'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100))
    matches = orm.relationship('MatchResult',
        primaryjoin="or_(Player.id == MatchResult.p1_id, Player.id == MatchResult.p2_id)")

class MatchResult(Base):
    __tablename__ = 'match_result'
    id = sa.Column(sa.Integer, primary_key=True)
    p1_id = sa.Column(sa.Integer, sa.ForeignKey('player.id'))
    p2_id = sa.Column(sa.Integer, sa.ForeignKey('player.id'))
    p1 = orm.relationship("Player", foreign_keys=[p1_id])
    p2 = orm.relationship("Player", foreign_keys=[p2_id])
Once these changes have been made, basic querying can be done without any explicit aliasing or joins.
ms = session.query(MatchResult)
for r in ms:
    print(r.p1_id, r.p1.name, r.p2_id, r.p2.name)

p1 = session.query(Player).filter(Player.name == 'bar').one()
for m in p1.matches:
    print(m.p1.name, m.p2.name)
The above code, for clarity and usefulness to other readers, does not include the inheritance, mixin, and session-management code that is specific to the OP's application. This version includes all of these.
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base, declared_attr
from sqlalchemy import orm

class _Base():
    id_ = sa.Column(sa.Integer, primary_key=True, autoincrement=True)

Base = declarative_base(cls=_Base)

class BlgMixin():
    @declared_attr
    def __table_args__(cls):
        return {'schema': "belgarath_backup", "extend_existing": True}

class DataAccessLayer():
    def __init__(self):
        conn_string = "mysql+mysqlconnector://root:root@localhost/"
        self.engine = sa.create_engine(conn_string)

    def create_session(self):
        Base.metadata.create_all(self.engine)
        Session = orm.sessionmaker()
        Session.configure(bind=self.engine)
        self.session = Session()

class Player(Base, BlgMixin):
    __tablename__ = 'player'
    name = sa.Column(sa.String(100))
    match = orm.relationship('MatchResult',
        primaryjoin="or_(Player.id_ == MatchResult.p1_id, Player.id_ == MatchResult.p2_id)")

class MatchResult(Base, BlgMixin):
    __tablename__ = 'match_result'
    p1_id = sa.Column(sa.Integer, sa.ForeignKey(f'{BlgMixin.__table_args__.get("schema")}.player.id_'))
    p2_id = sa.Column(sa.Integer, sa.ForeignKey(f'{BlgMixin.__table_args__.get("schema")}.player.id_'))
    p1 = orm.relationship("Player", foreign_keys=[p1_id])
    p2 = orm.relationship("Player", foreign_keys=[p2_id])

dal = DataAccessLayer()
Base.metadata.drop_all(bind=dal.engine)
Base.metadata.create_all(bind=dal.engine)

names = ['foo', 'bar', 'baz', 'zoo']
dal.create_session()
ps = [Player(name=n) for n in names]
dal.session.add_all(ps)
dal.session.flush()

p1, p2, p3, p4 = ps
m1 = MatchResult(p1_id=p1.id_, p2_id=p2.id_)
m2 = MatchResult(p1_id=p2.id_, p2_id=p1.id_)
m3 = MatchResult(p1_id=p3.id_, p2_id=p1.id_)
m4 = MatchResult(p1_id=p1.id_, p2_id=p4.id_)
dal.session.add_all([m1, m2, m3, m4])
dal.session.commit()

ms = dal.session.query(MatchResult)
for r in ms:
    print(r.p1_id, r.p1.name, r.p2_id, r.p2.name)
print()

p1 = dal.session.query(Player).filter(Player.name == 'bar').one()
for m in p1.match:
    print(m.p1.name, m.p2.name)

dal.session.close()
The issue is with the definition of the relationship match = relationship("MatchResult") in the Player class. If you completely remove that line and use the definitions below for the relationships, all the queries you mentioned should work as expected:
class Player(Base, BlgMixin):
    __tablename__ = "player"
    name_ = Column(String(100))

class MatchResult(Base, BlgMixin):
    __tablename__ = "match_result"
    p1_id = Column(ForeignKey(Player.id_))
    p2_id = Column(ForeignKey(Player.id_))
    p1 = relationship(Player, foreign_keys=p1_id)
    p2 = relationship(Player, foreign_keys=p2_id)
In fact, the desired select query can also be constructed, but you need to specify the relationships explicitly on JOINs:
player_1 = aliased(Player)
player_2 = aliased(Player)

q = (
    dal.session
    .query(
        MatchResult.p1_id,
        player_1.name_,
        MatchResult.p2_id,
        player_2.name_,
    )
    .join(player_1, MatchResult.p1)  # explicitly specify which relationship/FK to join on
    .join(player_2, MatchResult.p2)  # explicitly specify which relationship/FK to join on
)
I would, however, make a few more changes to the model to make it even more user-friendly:
add backref to the relationship so that it can be navigated back from the Player
add a property to show all the matches of one player for both sides
Model definitions:
class Player(Base, BlgMixin):
    __tablename__ = "player"
    name_ = Column(String(100))

    @property
    def all_matches(self):
        return self.matches_home + self.matches_away

class MatchResult(Base, BlgMixin):
    __tablename__ = "match_result"
    p1_id = Column(ForeignKey(Player.id_))
    p2_id = Column(ForeignKey(Player.id_))
    p1 = relationship(Player, foreign_keys=p1_id, backref="matches_home")
    p2 = relationship(Player, foreign_keys=p2_id, backref="matches_away")
This will allow navigating the relationships as per below example:
p1 = session.query(Player).get(1)
print(p1)
for match in p1.all_matches:
    print(" ", match)
I am learning Django, and have gotten quite a long way using the documentation and various other posts on StackOverflow, but I am a bit stuck now. Essentially, I want to query the database as follows:
SELECT
    w.wname,
    w.act_owner_id,
    wi.act_code,
    wi.act_provider,
    SUM(ft.quantity) AS "position",
    prices.Current,
    prices.MonthEnd,
    prices.YearEnd,
    cost.avgcost,
    sec.securityName AS "security"
FROM finance_wrapperinstance AS wi
INNER JOIN finance_wrapper AS w
    ON (w.id = wi.wrapperType_id)
LEFT OUTER JOIN finance_transaction AS ft
    ON (wi.id = ft.investwrapperID_id)
INNER JOIN finance_security AS sec
    ON (ft.security_id = sec.id)
LEFT OUTER JOIN (
    SELECT
        hp.security_id AS secid,
        MAX(CASE WHEN hp.date = '2019-11-18' THEN hp.price END) AS 'Current',
        MAX(CASE WHEN hp.date = '2019-10-30' THEN hp.price END) AS 'MonthEnd',
        MAX(CASE WHEN hp.date = '2018-12-31' THEN hp.price END) AS 'YearEnd'
    FROM finance_historicprice AS hp
    GROUP BY hp.security_id
) AS prices
    ON (prices.secid = ft.security_id)
INNER JOIN (
    SELECT
        trans.security_id AS secid,
        trans.investwrapperID_id AS iwID,
        SUM((CASE WHEN trans.buysell = 'b' THEN trans.quantity ELSE 0 END) * trans.price) /
        SUM(CASE WHEN trans.buysell = 'b' THEN trans.quantity ELSE 0 END) AS avgCost
    FROM finance_transaction AS trans
    GROUP BY
        trans.security_id,
        trans.investwrapperID_id
) AS cost
    ON (cost.secid = ft.security_id AND cost.iwID = wi.id)
GROUP BY
    w.wname,
    wi.wrapperType_id,
    wi.act_code,
    wi.act_provider,
    ft.security_id
but I don't know how to use the Django ORM to build the prices or cost subqueries.
The models look like this:
class Wrapper(models.Model):
    wname = models.CharField(max_length=50, null=False, verbose_name="Wrapper Name")
    act_owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)

class Wrapperinstance(models.Model):
    wrapperType = models.ForeignKey(Wrapper, on_delete=models.CASCADE)
    act_code = models.CharField(max_length=50, null=False, verbose_name="Account Code")
    act_provider = models.CharField(max_length=50, null=False, verbose_name="Account Provider")

class Security(models.Model):
    securityName = models.CharField(max_length=200, null=False, verbose_name="Security Name")
    securityType = models.ForeignKey(InstrumentType, on_delete=models.CASCADE)

class Transaction(models.Model):
    BUY = 'b'
    SELL = 's'
    BUY_SELL_CHOICES = [
        (BUY, 'Buy'),
        (SELL, 'Sell'),
    ]
    security = models.ForeignKey(Security, default=1, on_delete=models.CASCADE)
    investwrapperID = models.ForeignKey(Wrapperinstance, default=1, on_delete=models.CASCADE)
    quantity = models.DecimalField(max_digits=14, decimal_places=4)
    buysell = models.CharField(max_length=2, choices=BUY_SELL_CHOICES, default=BUY)
    price = models.DecimalField(max_digits=14, decimal_places=2)

class HistoricPrice(models.Model):
    security = models.ForeignKey(Security, default=1, on_delete=models.CASCADE)
    date = models.DateField()
    price = models.DecimalField(max_digits=14, decimal_places=2)
Any help or pointers would be greatly appreciated. As an additional point, I have functions that choose the correct dates to feed into the SQL query, which again makes me think the raw-SQL route may be the way to go.
I am learning SQLAlchemy in Python.
Below is an example I am using.
First I generate a data file containing puppy information like below:
import random
from datetime import datetime, timedelta
from random import randint

class Puppy(Base):
    __tablename__ = 'puppy'
    id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)
    gender = Column(String(6), nullable=False)
    dateOfBirth = Column(Date)
    shelter_id = Column(Integer, ForeignKey('shelter.id'))
    weight = Column(Numeric(10))

male_names = ["Bailey", "Max", ...just some names..., "Luke", "Henry"]
female_names = ['Bella', 'Lucy', ...just some names..., 'Honey', 'Dakota']

def CreateRandomAge():
    today = datetime.today()
    days_old = randint(0, 540)
    birthday = today - timedelta(days=days_old)
    return birthday

def CreateRandomWeight():
    return random.uniform(1.0, 40.0)

for i, x in enumerate(male_names):
    new_puppy = Puppy(name=x, gender="male", dateOfBirth=CreateRandomAge(), weight=CreateRandomWeight())
    session.add(new_puppy)
    session.commit()

for i, x in enumerate(female_names):
    new_puppy = Puppy(name=x, gender="female", dateOfBirth=CreateRandomAge(), weight=CreateRandomWeight())
    session.add(new_puppy)
    session.commit()
Now I want to filter some kinds of puppies as below:
testpuppy = session.query(Puppy).filter_by(name='Lucy')
print(testpuppy)

birthdate = datetime.today() - timedelta(days=180)
smallpuppy = session.query(Puppy).filter_by(dateOfBirth < birthdate)
print(smallpuppy)
It is strange: the testpuppy query works and I can get Lucy, but the dateOfBirth one does not. Every time I try to get these small puppies, I just get an error:
NameError: name 'dateOfBirth' is not defined
I really cannot understand why my filter only works on some attributes. What is wrong?
The problem is that you need to use filter instead of filter_by like this:
smallpuppy = session.query(Puppy).filter(Puppy.dateOfBirth < birthdate)
For filter, the criterion accesses the column as ClassName.propertyName, so comparison operators such as < or > work.
For filter_by, the criterion is a propertyName=value keyword argument, so you cannot use < or >.
Please refer to this answer, it will give you more details about the difference between filter and filter_by.
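A minimal runnable illustration of the difference, using an in-memory SQLite database (the import paths assume SQLAlchemy 1.4+; the model is trimmed to the relevant columns):

```python
from datetime import date
from sqlalchemy import Column, Date, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Puppy(Base):
    __tablename__ = 'puppy'
    id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)
    dateOfBirth = Column(Date)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Puppy(name='Lucy', dateOfBirth=date(2023, 1, 1)),
    Puppy(name='Max', dateOfBirth=date(2023, 6, 1)),
])
session.commit()

# filter_by: keyword equality only -- no < or > possible.
lucy = session.query(Puppy).filter_by(name='Lucy').one()

# filter: full column expressions, so comparisons work.
older = session.query(Puppy).filter(Puppy.dateOfBirth < date(2023, 3, 1)).all()
print(lucy.name, len(older))
```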
I'm using SQLAlchemy to define my tables. The tables describe seismic events and are arranged as Event, Origin, Magnitude, Real_Quantity, and Time_Quantity, following the QuakeML standard. The Event table is related to Origin through .preferredOriginID and .publicID, and Origin is related to Real_Quantity through .latitude_id and .id.
I want to find all longitudes and latitudes within a specified radius, but the problem is that latitude and longitude values both live in the same Real_Quantity column, and the Origin table is where they are distinguished.
This is the query I want to implement, but it is in MySQL:
SELECT
id,
(
acos(
(
cos(radians(37))
* cos(radians(lat))
* cos(radians(lng) - radians(-122))
)
+ (
sin(radians(37))
* sin(radians(lat))
)
) * 3959
) AS distance
FROM markers
HAVING distance < 25
ORDER BY distance
LIMIT 0, 20;
This is what I did, but it can only use the latitudes, and I want to use latitudes together with longitudes:
z = self.session.query(Event) \
    .join(Origin) \
    .join(RealQuantity, Origin.latitude) \
    .filter(
        Event.preferredOriginID == Origin.publicID,
        RealQuantity.id == Origin.latitude_id
    ) \
    .group_by(Event, Origin.latitude, RealQuantity.value) \
    .having(func.cos(RealQuantity.value) < 50)
Event:

| id | publicID | preferredOriginID | preferredMagnitudeID | type | ... |

Origin:

| id | publicID | time_id | latitude_id | longitude_id | depth_id | ... |

Real_Quantity:

| id | value | ... |

The Origin columns are just pointers; their values live in Real_Quantity.
My models are:
class Event(Base):
    __tablename__ = 'event'

    id = Column(Integer, primary_key=True)
    publicID = Column(String)
    preferredOriginID = Column(String)
    preferredMagnitudeID = Column(String)
    type = Column(String)
    typeCertainty = Column(String)
    creationInfo_id = Column(Integer, ForeignKey('creation_info.id'))
    creationInfo = relationship(CreationInfo, backref=backref('event', uselist=False))

class Origin(Base):
    __tablename__ = 'origin'

    id = Column(Integer, primary_key=True)
    publicID = Column(String)
    time_id = Column(Integer, ForeignKey('time_quantity.id'))
    time = relationship(TimeQuantity, backref=backref('origin', uselist=False))
    latitude_id = Column(Integer, ForeignKey('real_quantity.id'))
    latitude = relationship(RealQuantity, foreign_keys=[latitude_id],
                            backref=backref('origin_lat', uselist=False))
    longitude_id = Column(Integer, ForeignKey('real_quantity.id'))
    longitude = relationship(RealQuantity, foreign_keys=[longitude_id],
                             backref=backref('origin_lon', uselist=False))
    depth_id = Column(Integer, ForeignKey('real_quantity.id'))
    depth = relationship(RealQuantity, foreign_keys=[depth_id],
                         backref=backref('origin_depth', uselist=False))
    creationInfo_id = Column(Integer, ForeignKey('creation_info.id'))
    creationInfo = relationship(CreationInfo, backref=backref('origin', uselist=False))
    event_id = Column(Integer, ForeignKey('event.id'))
    event = relationship('Event', backref=backref('origin', uselist=True))

class RealQuantity(Base):
    __tablename__ = 'real_quantity'

    id = Column(Integer, primary_key=True)
    value = Column(Float)
    uncertainty = Column(Float)
    lowerUncertainty = Column(Float)
    upperUncertainty = Column(Float)
    confidenceLevel = Column(Float)
Not a solution (yet), just some comments:
For every query, you are doing a complex calculation on every entry in the Origin table. As the number of entries increases, this will become very slow (computationally expensive).
Think of a circle (x=lon, y=lat, r=distance) projected onto the globe. You can calculate min and max latitude easily; min and max longitude can also be done, although the math is a bit trickier.
If you have properly indexed the Origin table by latitude and longitude, you can do a very fast (computationally cheap) initial box-select on min_lat <= lat <= max_lat and min_lon <= lon <= max_lon, which should trivially discard 99% of the entries (depending on the radius and the clustering of the Origin points); the remaining entries have roughly an 80% chance of belonging to the desired data set, and you only need to run the expensive calculation on those.
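A sketch of the box-select math in plain Python (illustrative only; 3959 is the Earth radius in miles already used in the question's query, and the longitude bound ignores pole/antimeridian wrap-around):

```python
import math

EARTH_RADIUS_MILES = 3959.0

def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) for a cheap pre-filter."""
    # One radian of arc along a great circle spans the Earth's radius,
    # so the radius converts directly into degrees of latitude.
    dlat = math.degrees(radius_miles / EARTH_RADIUS_MILES)
    # Longitude lines converge toward the poles, so a degree of longitude
    # shrinks by cos(latitude).
    dlon = math.degrees(radius_miles /
                        (EARTH_RADIUS_MILES * math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

# 25-mile box around (37, -122), matching the question's query.
min_lat, max_lat, min_lon, max_lon = bounding_box(37.0, -122.0, 25.0)
```

Rows passing min_lat <= lat <= max_lat AND min_lon <= lon <= max_lon (an index-friendly filter) would then go through the exact acos() distance check.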
I would strongly recommend writing this as a stored procedure.