py2neo ogm Relationship with Multiple Node Labels - python

I would like to use OGM of py2neo to represent a relationship from one node type to two node types.
I have a solution (below) that works only to store nodes/relationships in the DB, and I could not find one that works properly when retrieving relationships.
This is my example. Consider the relationship OWNS from a Person to a Car:
from py2neo.ogm import GraphObject, Property, RelatedTo
from py2neo import Graph

class Person(GraphObject):
    name = Property()
    Owns = RelatedTo("Car")

class Car(GraphObject):
    model = Property()

g = Graph(host="localhost", user="neo4j", password="neo4j")
# Create Pete
p = Person()
p.name = "Pete"
# Create Ferrari
c = Car()
c.model = "Ferrari"
# Pete OWNS Ferrari
p.Owns.add(c)
# Store
g.push(p)
This works fine. Now, let's assume that a Person OWNS a House as well (this code continues from the one above):
class House(GraphObject):
    city = Property()

# Create House
h = House()
h.city = "New York"
# Pete OWNS House in New York
p.Owns.add(h)
# Update
g.push(p)
The "to" end of the relationship OWNS is supposed to point to a Car, not a House. But apparently py2neo does not care that much and stores everything in the DB as expected: a Person, a Car and a House connected via OWNS relationships.
Now the problem is to use the above classes to retrieve nodes and relationships. While node properties are loaded correctly, relationships are not:
p = Person.select(g).where(name="Pete").first()
for n in list(p.Owns):
    print(type(n).__name__)
This results in:
Car
Car
This behavior is consistent with the class objects.
How can I model "Person OWNS Car" and "Person OWNS House" with the same class in py2neo.ogm? Is there any known solution or workaround that I can use here?

The issue is that "Owns" is set up as a relationship to the "Car" node only. You need to set up a second relationship for owning a house. If you want both relationships to have the type "OWNS" in Neo4j, pass it as the second argument to RelatedTo. This is covered in the Py2Neo documentation (http://py2neo.org/v3/) in chapter 3.
class Person(GraphObject):
    name = Property()
    OwnsCar = RelatedTo("Car", "OWNS")
    OwnsHouse = RelatedTo("House", "OWNS")

class Car(GraphObject):
    model = Property()

class House(GraphObject):
    city = Property()
I do want to say that Rick's answer addressed something I was trying to figure out with labeling with the Py2Neo OGM. So thanks Rick!

I had essentially the same question. I was unable to find an answer and tried to come up with a solution to this using both py2neo and neomodel.
Just a Beginner
It is important to note that I am definitely not answering this as an expert in either one of these libraries but rather as someone trying to evaluate what might be the best one to start a simple project with.
End Result
The end result is that I found a workaround in py2neo that seems to work. I also got a result in neomodel that I was even happier with. I ended up a little frustrated by both libraries but found neomodel the more intuitive to a newcomer.
An Asset Label is the Answer Right?
I thought that the answer would be to create an "Asset" label and add this label to House and Car and create the [:OWNS] relationship between Person and Asset. Easy right? Nope, apparently not. There might be a straightforward answer but I was unable to find it. The only solution that I got to work in py2neo was to drop down to the lower-level (not OGM) part of the library.
Here's what I did in py2neo:
from py2neo import Relationship
from py2neo.ogm import GraphObject, Property, Label

class Person(GraphObject):
    name = Property()

class Car(GraphObject):
    name = Property()
    model = Property()
    asset = Label("Asset")

class House(GraphObject):
    name = Property()
    city = Property()
    asset = Label("Asset")

g = graph  # an existing py2neo Graph connection
# Create Pete
p = Person()
p.name = "Pete"
g.push(p)
# Create Ferrari
c = Car()
c.name = "Ferrari"
c.asset = True
g.push(c)
# Create House
h = House()
h.name = "White House"
h.city = "New York"
h.asset = True
g.push(h)
# Drop down a level and grab the actual nodes
pn = p.__ogm__.node
cn = c.__ogm__.node
# Pete OWNS Ferrari (lower level py2neo)
ap = Relationship(pn, "OWNS", cn)
g.create(ap)
# Pete OWNS House (lower level py2neo)
hn = h.__ogm__.node
ah = Relationship(pn, "OWNS", hn)
g.create(ah)
# Grab & Print
query = """MATCH (a:Person {name:'Pete'})-[:OWNS]->(n)
RETURN labels(n) as labels, n.name as name"""
data = g.data(query)
for asset in data:
    print(asset)
This results in:
{'name': 'White House', 'labels': ['House', 'Asset']}
{'name': 'Ferrari', 'labels': ['Car', 'Asset']}
Neomodel Version
py2neo seems to do some clever tricks with the class names to do its magic and the library seems to exclude Labels from this magic. (I hope I am wrong about this but as I said, I could not solve it). I decided to try neomodel.
from neomodel import StructuredNode, StringProperty, RelationshipTo

class Person(StructuredNode):
    name = StringProperty(unique_index=True)
    owns = RelationshipTo('Asset', 'OWNS')
    likes = RelationshipTo('Car', "LIKES")

class Asset(StructuredNode):
    __abstract_node__ = True
    __label__ = "Asset"
    name = StringProperty(unique_index=True)

class Car(Asset):
    pass

class House(Asset):
    city = StringProperty()

# Create Person, Car & House
pete = Person(name='Pete').save()
car = Car(name="Ferrari").save()
house = House(name="White House", city="Washington DC").save()
# Pete likes Car
pete.likes.connect(car)
# Pete owns a House and Car
pete.owns.connect(house)
pete.owns.connect(car)
After these objects are created they are relatively simple to work with:
for l in pete.likes.all():
    print(l)
Result:
{'name': 'Ferrari', 'id': 385}
With the "abstract" relationship the result is an object of that type, in this case Asset.
for n in pete.owns.all():
    print(n)
    print(type(n))
Result:
{'id': 389}
<class '__main__.Asset'>
There seems to be a way to "inflate" these objects to the desired type but I gave up trying to figure that out in favor of just using Cypher. (Would appreciate some help understanding this...)
Dropping down to the Cypher level, we get exactly what we want:
from neomodel import db

query = "MATCH (a:Person {name:'Pete'})-[:OWNS]->(n) RETURN n"
results, meta = db.cypher_query(query)
for n in results:
    print(n)
Result:
[<Node id=388 labels={'Asset', 'Car'} properties={'name': 'Ferrari'}>]
[<Node id=389 labels={'Asset', 'House'} properties={'city': 'Washington DC', 'name': 'White House'}>]
Conclusion
The concept of Labels is very intuitive for a lot of the problems I would like to solve. I found py2neo's treatment of Labels confusing. Your workaround might be to drop down to the "lower-level" of py2neo. I personally thought the neomodel syntax was more friendly and suggest checking it out. HTH.

Related

How do I increment a count field automatically within a many to many intermediate table in django models?

So I have the following three models for creating an icecream (which consists of a flavour and the cone of the icecream):
from django.core.validators import MinValueValidator
from django.db import models

not_less_than_zero = MinValueValidator(0)

class Flavour(models.Model):
    name = models.CharField(max_length=255)
    price = models.DecimalField(
        max_digits=5,
        decimal_places=2,
        validators=[not_less_than_zero]
    )

class Icecream(models.Model):
    cone = models.ForeignKey(
        Cone,
        on_delete=models.PROTECT
    )
    scoops = models.ManyToManyField(
        Flavour,
        through='quickstart.IcecreamFlavour'
    )
    quantity = models.IntegerField(default=1, validators=[not_less_than_zero])

class IcecreamFlavour(models.Model):
    icecream = models.ForeignKey(Icecream, on_delete=models.PROTECT)
    flavour = models.ForeignKey(Flavour, on_delete=models.PROTECT)
    count = models.IntegerField(blank=True, default=1, validators=[not_less_than_zero])
I would like the count field in IcecreamFlavour to be updated automatically whenever a flavour is added to an Icecream that already has that flavour. Example in the Python shell:
>>> from .models import Cone, Flavour, Icecream
>>>
>>> # defining some flavours in the database
>>> strawberry = Flavour.objects.create(name="Strawberry", price=0.75)
>>> chocolate = Flavour.objects.create(name="Chocolate", price=0.80)
>>>
>>> # for the purpose of this example, lets say there is already a predefined icecream in the database
>>> my_icecream = Icecream.objects.get(pk=1)
>>> my_icecream.scoops.add(chocolate)
>>> my_icecream.scoops.add(strawberry)
>>> my_icecream.scoops.count() # beware! this is not the count field of IcecreamFlavour model
2
>>> # so far so good
>>> my_icecream.scoops.add(strawberry)
>>> my_icecream.scoops.count()
2
So after seeing this I presumed that this was because there already exists an entry in the intermediate table IcecreamFlavour that connects the Flavour and Icecream models, therefore I added a count to the intermediate table to keep track of how many scoops of a single flavour are added to an icecream.
Unfortunately I have a hard time updating this count whenever a duplicate value is added for that Icecream.
I tried messing around with the m2m_changed signal and adding a callback to check which scoops already exist on the icecream, but unless I read the documentation wrong, I see no way to check which flavour is being added to the scoops of the icecream through this signal.
Basically this is what I would want to happen:
...
>>> from .models import IcecreamFlavour
>>> # let's say the id of the my_icecream variable is 4
>>> # and the id of the chocolate flavour is 2
>>> icecream_flavour = IcecreamFlavour.objects.get(icecream_id=4, flavour_id=2)
>>> icecream_flavour.count
1
>>> # at this point there is one chocolate scoop on my_icecream
>>> my_icecream.scoops.add(chocolate)
>>> icecream_flavour.count
2
...
After adding a scoop of the same flavour, the count should be automatically updated for the icecream flavour.
Is there any way I can do this inside the models defined in Django, or do I actually need to write a custom trigger in the database to handle this behavior (which I would prefer not to do, tbh)?
It could be that I missed something really obvious. I just started learning Django last week and I'm trying things out with some practice projects, but I can't find a solution in the docs for this particular problem. Any help is greatly appreciated.
Thanks for reading in advance anyways :-).

SWRL rules in OWL 2

I'm currently discovering all the possibilities of the Owlready library.
Right now I'm trying to process some SWRL rules and so far it's been going very well, but I'm stuck at one point.
I've defined some rules in my ontology and now I want to see all the results (so, everything inferred from a rule).
For example, if I had a rule
has_brother(David, ?b) ^ has_child(?b, ?s) -> has_uncle(?s, David)
and David has two brothers, John and Pete, and John's kid is Anna and Pete's kid is Simon, I would like to see something like:
has_brother(David, John) ^ has_child(John, Anna) -> has_uncle(Anna, David)
has_brother(David, Pete) ^ has_child(Pete, Simon) -> has_uncle(Simon, David)
Is this possible in any way?
I thought that maybe if I run the reasoner, I could see it in its output, but I can't find this anywhere.
I appreciate any help possible!
This is my solution:
import owlready2 as owl

onto = owl.get_ontology("http://test.org/onto.owl")

with onto:
    class Person(owl.Thing):
        pass

    class has_brother(owl.ObjectProperty, owl.SymmetricProperty, owl.IrreflexiveProperty):
        domain = [Person]
        range = [Person]

    class has_child(Person >> Person):
        pass

    class has_uncle(Person >> Person):
        pass

    rule1 = owl.Imp()
    rule1.set_as_rule(
        "has_brother(?p, ?b), has_child(?p, ?c) -> has_uncle(?c, ?b)"
    )
    # This rule gives "irreflexive transitivity",
    # i.e. transitivity, as long as it does not lead to has_brother(?a, ?a)
    rule2 = owl.Imp()
    rule2.set_as_rule(
        "has_brother(?a, ?b), has_brother(?b, ?c), differentFrom(?a, ?c) -> has_brother(?a, ?c)"
    )

    david = Person("David")
    john = Person("John")
    pete = Person("Pete")
    anna = Person("Anna")
    simon = Person("Simon")
    owl.AllDifferent([david, john, pete, anna, simon])
    david.has_brother.extend([john, pete])
    john.has_child.append(anna)
    pete.has_child.append(simon)

print("Uncles of Anna:", anna.has_uncle) # -> []
print("Uncles of Simon:", simon.has_uncle) # -> []
owl.sync_reasoner(infer_property_values=True)
print("Uncles of Anna:", anna.has_uncle) # -> [onto.Pete, onto.David]
print("Uncles of Simon:", simon.has_uncle) # -> [onto.John, onto.David]
Notes:
One might think has_brother is:
- symmetric, i.e. has_brother(A, B) ⇒ has_brother(B, A)
- transitive, i.e. has_brother(A, B) + has_brother(B, C) ⇒ has_brother(A, C)
- irreflexive, i.e. no one is his own brother.
However, transitivity only holds if the unique name assumption holds. Otherwise A could be the same individual as C, which conflicts with irreflexivity. Thus I used a rule for this kind of "weak transitivity".
Once has_brother works as expected, the uncle rule does too. Of course, the reasoner must be run first.
Update: I published the solution in this Jupyter notebook (which also contains the output of the execution).
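To see the individual rule instantiations the question asks for, the two rules can also be replayed by hand. The following is only a plain-Python illustration of the logic, not Owlready2 API: the function names and the tuple representation of facts are assumptions made for this sketch, and it does not use the reasoner at all.

```python
from itertools import product

# Asserted facts, mirroring the individuals in the example above.
brothers = {("David", "John"), ("David", "Pete")}
children = {("John", "Anna"), ("Pete", "Simon")}

def close_brothers(pairs):
    """Apply symmetry and the "weak transitivity" rule until nothing new appears."""
    closed = set(pairs)
    while True:
        new = {(b, a) for a, b in closed}  # symmetry
        new |= {(a, c)
                for (a, b), (b2, c) in product(closed, closed)
                if b == b2 and a != c}     # transitivity without has_brother(a, a)
        if new <= closed:
            return closed
        closed |= new

def uncle_instantiations(brothers, children):
    """Yield every (parent, brother, child) binding of the uncle rule body."""
    for p, b in sorted(close_brothers(brothers)):
        for p2, c in sorted(children):
            if p == p2:
                yield p, b, c

for p, b, c in uncle_instantiations(brothers, children):
    print(f"has_brother({p}, {b}) ^ has_child({p}, {c}) -> has_uncle({c}, {b})")
```

This prints one line per binding, which makes it easy to check the reasoner's inferred has_uncle values against the facts that produced them.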

Any built-in Python lib or function for best (most exact) string matching like a routing table? If not, is my code the most efficient?

Code example:
# Something
dept_dict = {
    # Generally, employee code starting with '15' means Department A, '16' B, '17' C.
    '15': 'Department A',
    '16': 'Department B',
    '17': 'Department C',
    # Exception: sub dept '15.233' and '17.312' belonged to dept A and C but now B.
    '15.233': 'Department B',
    '17.312': 'Department B',
    # Exception: employees who had transferred to another department.
    '15.233.19305': 'Department C',
    '15.330.19306': 'Department B',
}
# Requirement: use exception (exact matched) item if it exists, otherwise "fall" to general item.
# Is there any built-in function implement the following?
def get_dept(emp_code):
    dept_name = None
    for i in range(len(emp_code), -1, -1):
        dept_name = dept_dict.get(emp_code[0:i])
        if dept_name:
            break
    return dept_name
# Test code:
print(get_dept('15.233.19305')) # The employee who transferred to Dept C from Dept A
print(get_dept('15.233.19300')) # The employee who belongs to a sub dept, all employees of which have transferred to Dept B from Dept A
print(get_dept('15.147.13500')) # The employee who belongs to Dept A just like most of the employees
print(get_dept(''))
result:
Department C
Department B
Department A
None
For the function "get_dept", is there any built-in function that already implements it? Did I reinvent the wheel?
I've read some posts with the title "most exact matching" on this site, but most of them are about "fuzzy searching", for example matching "department" with ["depatment", "depart"], which is not what I want. When I search for "routing table", I get posts about "URL matching", which is not what I want either.
It seems to be an underlying technology used by "routing table".
If there is no such built-in function, the question would be: is my implementation most efficient? Should I use, for example binary search or something else?
(Edited)
Thanks to someone who commented (but somehow deleted it). The comment said that building a Trie might be an option. If the dict is not frequently changed and there are lots of queries, overhead of building a Trie could be ignored.
Splitting the code on "." and dropping one segment at a time reduces your worst case from 11 dictionary lookups to 3. Assuming there are no silver bullets, you might consider something along these lines.
Example:
def get_dept(emp_code):
    emp_sub_codes = emp_code.split(".")
    while emp_sub_codes:
        dept_name = dept_dict.get(".".join(emp_sub_codes))
        if dept_name:
            return dept_name
        emp_sub_codes = emp_sub_codes[:-1]
    return None
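For the Trie option mentioned in the question's edit, here is a minimal sketch using nested dicts. The names build_trie and longest_prefix_value are made up for this example; it is one possible way to do longest-prefix matching, not a built-in or a library API.

```python
def build_trie(mapping):
    """Build a character trie from a dict; the key None on a node holds its value."""
    root = {}
    for key, value in mapping.items():
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[None] = value
    return root

def longest_prefix_value(trie, text):
    """Return the value of the longest stored key that is a prefix of text, else None."""
    node = trie
    best = node.get(None)
    for ch in text:
        node = node.get(ch)
        if node is None:
            break
        best = node.get(None, best)  # remember the deepest match seen so far
    return best

dept_dict = {
    '15': 'Department A',
    '16': 'Department B',
    '17': 'Department C',
    '15.233': 'Department B',
    '17.312': 'Department B',
    '15.233.19305': 'Department C',
    '15.330.19306': 'Department B',
}
dept_trie = build_trie(dept_dict)  # built once, reused for every query

print(longest_prefix_value(dept_trie, '15.233.19305'))  # Department C
print(longest_prefix_value(dept_trie, '15.233.19300'))  # Department B
print(longest_prefix_value(dept_trie, '15.147.13500'))  # Department A
print(longest_prefix_value(dept_trie, ''))              # None
```

Each query walks the code once from left to right, so no substrings are built or hashed. Whether that beats three dict lookups in practice would need measuring.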

Django annotation on (model → FK → model) relation

Galaxies across the universe host millions/billions of stars, each belonging to a specific type, depending on its physical properties (Red stars, Blue Supergiant, White Dwarf, etc). For each Star in my database, I'm trying to find the number of distinct galaxies that are also home for some star of that same type.
class Galaxy(Model):
    ...

class Star(Model):
    galaxy = ForeignKey(Galaxy, related_name='stars')
    type = CharField(...)
Performing this query individually for each Star might be comfortably done by:
star = <some_Star>
desired_galaxies = Galaxy.objects.filter(stars__type=star.type).distinct()
desired_count = desired_galaxies.count()
Or even, albeit more redundantly:
desired_count = Star.objects.filter(galaxy__stars__type=star.type).values('galaxy').distinct().count()
This gets a little fuzzier when I try to get the count result for all the stars in a "single" query:
all_stars = Star.objects.annotate(desired_count=...)
The main reason I want to do that is to be capable of sorting Star.objects.order_by('desired_count') in a clean way.
What I've tried so far:
Star.objects.annotate(desired_count=Count('galaxy', filter=Q(galaxy__stars__type=F('type')), distinct=True))
But this annotates 1 for every star. I guess I'll have to go for OuterRef and Subquery here, but I'm not sure how.
You can use GROUP BY to get the count of distinct galaxies per type:
Star.objects.values('type').annotate(desired_count=Count('galaxy', distinct=True)).values('type', 'desired_count')
Django doesn't yet provide a way to define multi-valued relationships between models that don't involve foreign keys. If it did, you could do something like
class Galaxy(Model):
    ...

class Star(Model):
    galaxy = ForeignKey(Galaxy, related_name='stars')
    type = CharField(...)
    same_type_stars = Relation(
        'self', from_fields=('type',), to_fields=('type',)
    )

Star.objects.annotate(
    galaxies_count=Count('same_type_stars__galaxy', distinct=True)
)
Which would result in something along the lines of
SELECT
    star.*,
    COUNT(DISTINCT same_star_type.galaxy_id) galaxies_count
FROM star
LEFT JOIN star same_star_type ON (same_star_type.type = star.type)
GROUP BY star.id
If you want to achieve something similar, you'll need to use a subquery for now
Star.objects.annotate(
    galaxies_count=Subquery(Star.objects.filter(
        type=OuterRef('type'),
    ).values('type').values(
        inner_count=Count('galaxy', distinct=True),
    ))
)
Which would result in something along the lines of
SELECT
    star.*,
    (
        SELECT COUNT(DISTINCT inner_star.galaxy_id)
        FROM star inner_star
        WHERE inner_star.type = star.type
        GROUP BY inner_star.type
    ) galaxies_count
FROM star
This will likely perform poorly on databases that don't materialize correlated subqueries (e.g. MySQL). In all cases, make sure you index Star.type, otherwise you'll get bad performance no matter what. A composite index on ('type', 'galaxy') might be even better, as it might allow an index-only scan (e.g. on PostgreSQL).
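To sanity-check what the correlated subquery returns, the SQL can be run against a throwaway in-memory SQLite database. This is a sketch only: the table layout, the sample rows and the index name are invented for illustration, and the table Django generates would be named differently (typically prefixed with the app label).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE star (id INTEGER PRIMARY KEY, galaxy_id INTEGER, type TEXT);
    -- Hypothetical sample data: red stars live in galaxies 1 and 2.
    INSERT INTO star (id, galaxy_id, type) VALUES
        (1, 1, 'red'), (2, 1, 'red'), (3, 2, 'red'),
        (4, 2, 'blue'), (5, 3, 'white');
    -- Composite index as suggested above.
    CREATE INDEX star_type_galaxy ON star (type, galaxy_id);
""")

rows = conn.execute("""
    SELECT star.id, star.type,
           (SELECT COUNT(DISTINCT inner_star.galaxy_id)
              FROM star inner_star
             WHERE inner_star.type = star.type
             GROUP BY inner_star.type) AS galaxies_count
      FROM star
     ORDER BY galaxies_count DESC, star.id
""").fetchall()

for row in rows:
    print(row)
# (1, 'red', 2)
# (2, 'red', 2)
# (3, 'red', 2)
# (4, 'blue', 1)
# (5, 'white', 1)
```

Every star carries the number of distinct galaxies hosting its type, so ordering by that column works the same way order_by('galaxies_count') would on the annotated queryset.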

Optimizing py2neo's cypher insertion

I am using py2neo to import several hundred thousand nodes. I've created a defaultdict to map neighborhoods to cities. One motivation was to more efficiently import these relationships having been unsuccessful with Neo4j's load tool.
Because the batch documentation suggests avoiding it, I veered away from an implementation like the OP of this post. Instead the documentation suggests I use Cypher. However, I like being able to create nodes from the defaultdict I have created. Plus, I found it too difficult to import this information the way the first link demonstrates.
To speed up the import, should I create a Cypher transaction (and submit every 10,00) instead of the following loop?
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    city_node = graph.find_one(label="City", property_key="Name", property_value=city_name)
    for neighborhood_name in neighborhood_names:
        neighborhood_node = Node("Neighborhood", Name=neighborhood_name)
        rel = Relationship(neighborhood_node, "IN", city_node)
        graph.create(rel)
I get a time-out, and it appears to be pretty slow when I do the following. Even when I break up the transaction so it commits every 1,000 neighborhoods, it still processes very slowly.
tx = graph.cypher.begin()
statement = "MERGE (city {Name:{City_Name}}) CREATE (neighborhood { Name : {Neighborhood_Name}}) CREATE (neighborhood)-[:IN]->(city)"
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    for neighborhood_name in neighborhood_names:
        tx.append(statement, {"City_Name": city_name, "Neighborhood_Name": neighborhood_name})
tx.commit()
It would be fantastic to save pointers to each city so I don't need to look it up each time with the merge.
It may be faster to do this in two runs, i.e. CREATE all nodes first with unique constraints (which should be very fast) and then CREATE the relationships in a second round.
Constraints first, use Labels City and Neighborhood, faster MATCH later:
graph.schema.create_uniqueness_constraint('City', 'Name')
graph.schema.create_uniqueness_constraint('Neighborhood', 'Name')
Create all nodes:
tx = graph.cypher.begin()
city_statement = "CREATE (:City {Name: {name}})"
neighborhood_statement = "CREATE (:Neighborhood {Name: {name}})"
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    tx.append(city_statement, {"name": city_name})
    # create all neighborhoods for this city
    for neighborhood_name in neighborhood_names:
        tx.append(neighborhood_statement, {"name": neighborhood_name})
tx.commit()
Relationships should be fast now (fast MATCH due to constraints/index):
tx = graph.cypher.begin()
statement = "MATCH (city:City {Name: {City_Name}}), (n:Neighborhood {Name: {Neighborhood_Name}}) CREATE (n)-[:IN]->(city)"
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    for neighborhood_name in neighborhood_names:
        tx.append(statement, {"City_Name": city_name, "Neighborhood_Name": neighborhood_name})
tx.commit()
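If a single transaction grows too large, the parameter dicts can be grouped into fixed-size batches so that each batch gets its own begin/append/commit cycle. The sketch below covers only the batching part in plain Python 3. The helper names iter_params and batched and the sample map are assumptions for illustration, and the py2neo calls are left as a comment because they depend on the installed version.

```python
from itertools import islice

def iter_params(city_neighborhood_map):
    """Yield one parameter dict per (city, neighborhood) pair."""
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            yield {"City_Name": city_name, "Neighborhood_Name": neighborhood_name}

def batched(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# Hypothetical sample data standing in for the real defaultdict.
city_neighborhood_map = {
    "Springfield": ["North", "South", "East"],
    "Shelbyville": ["Old Town"],
}

for number, batch in enumerate(batched(iter_params(city_neighborhood_map), 2), start=1):
    # Real import: begin a transaction, append the statement once per
    # parameter dict in `batch`, then commit before the next batch.
    print(number, len(batch))
```

With the real data the batch size would be something like 1,000 rather than 2. Which size works best depends on the server and is worth measuring.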
