Updating a node with merge using Py2Neo - python

I'm trying to merge and then update a graph using the py2neo library. My code looks roughly like
from py2neo import Graph, Node, Relationship
graph = Graph(host, auth=(user, password,))
tx = graph.begin()
alice = Node("Person", name="Alice")
bob = Node("Person", name="Bob")
KNOWS = Relationship(alice, "KNOWS", bob)
tx.create(KNOWS)
graph.commit(tx)
This creates the nodes and edges as expected as
(:Person {name: "Alice"})-[:KNOWS]->(:Person {name: "Bob"})
If I try to modify alice in a new transaction though, I get no change
e.g.
new_tx = graph.begin()
alice["age"] = 32
new_tx.merge(alice, "Person", "name")
graph.commit(new_tx)
I suspect I have misunderstood how the Transaction works here. I would expect the above to be equivalent to either finding Alice and updating with the new property or creating a new node.
Update: I have discovered the Graph.push method, but would still appreciate advice on best practice.

You need to define a primary key so that MERGE knows which property identifies the node. From the documentation:
The primary property key used for Cypher MATCH and MERGE operations.
If undefined, the special value of "id" is used to hinge
uniqueness on the internal node ID instead of a property. Note that
this alters the behaviour of operations such as Graph.create() and
Graph.merge() on GraphObject instances.
It's probably best practice to define a custom class for every node type and define the primary key there.
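from py2neo.ogm import GraphObject, Property, RelatedTo

# Movie is assumed to be another GraphObject subclass defined elsewhere.
class Person(GraphObject):
    __primarykey__ = "name"
    name = Property()
    born = Property()
    acted_in = RelatedTo(Movie)
    directed = RelatedTo(Movie)
    produced = RelatedTo(Movie)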
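To make the "find and update, or create" behaviour the question expects concrete, here is a minimal sketch of merging on a primary key. It assumes a hypothetical in-memory `TinyGraph` class; it is not py2neo and does not talk to Neo4j, it only illustrates how a label plus a primary key decides between updating and creating.

```python
from typing import Dict, List


class TinyGraph:
    """Hypothetical in-memory stand-in for a graph; not part of py2neo."""

    def __init__(self) -> None:
        self.nodes: List[Dict] = []

    def merge(self, label: str, primary_key: str, properties: Dict) -> Dict:
        """Match a node on (label, primary key value); update it or create it."""
        key_value = properties[primary_key]
        for node in self.nodes:
            if node["label"] == label and node["props"].get(primary_key) == key_value:
                node["props"].update(properties)  # matched: add or overwrite properties
                return node
        node = {"label": label, "props": dict(properties)}
        self.nodes.append(node)  # no match: create a new node
        return node


graph = TinyGraph()
graph.merge("Person", "name", {"name": "Alice"})
graph.merge("Person", "name", {"name": "Alice", "age": 32})  # updates Alice
graph.merge("Person", "name", {"name": "Bob"})  # creates Bob
```

Without a primary key there is nothing to match on, which is why the label and key have to be supplied either in the merge call or on the class.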

Related

Neomodel Most Efficient Way to Query Relationship Data

Suppose I have the following models from the neomodel documentation.
class FriendRel(StructuredRel):
    since = DateTimeProperty(
        default=lambda: datetime.now(pytz.utc)
    )
    met = StringProperty()

class Person(StructuredNode):
    name = StringProperty()
    friends = RelationshipTo('Person', 'FRIEND', model=FriendRel)
And I create the following data.
bob = Person(name='bob').save()
frank = Person(name='frank').save()
rel = bob.friends.connect(frank, {'since': dt.datetime.now(), 'met': 'Germany'})
Now my question is how I should go about retrieving both the friends of an object and the corresponding FriendRel objects for those friendships.
The Neomodel docs seem to say to do the following.
>>> bob = Person.nodes.get(name='bob')
>>> frank = bob.friends[0] # get bob's friend frank using database query?
>>> rel = bob.friends.relationship(frank) # query database again?
>>> rel.met
'Germany'
When doing this, it really feels like there would be a better way of retrieving relationship objects without another database query. I would expect these relationship objects to already be cached when you retrieve a node's friends?
So in a loop, would this be the best way to retrieve all of a Person's friends and the FriendRel objects for those friendships?
# source: https://stackoverflow.com/questions/67821341/retrieve-the-relationship-object-in-neomodel
for friend in bob.friends:
    rel = bob.friends.relationship(friend)
This seems quite inefficient, as doesn't it require another database query for each relationship? Or am I not understanding correctly?
With cypher, I would just do the following:
MATCH(i:Person{name: 'bob'})-[j:FRIEND]->(k) RETURN i,j,k
So my question: is there a way, using neomodel, to retrieve a node's relationships and the objects for those relationships both at the same time?
I've checked the neomodel source code and there doesn't seem to be a way to achieve what I want in a more efficient way than what I found in this stackoverflow answer.
But I now know how to do this using cypher queries like so:
from neomodel import db
from models import Person, FriendRel
bob = Person.nodes.get(name='bob')
# Only one database query. Yay!
results, cols = db.cypher_query(f"""MATCH (node)-[rel]-(neighbor)
WHERE id(node)={bob.id}
RETURN node, rel, neighbor""")
rels = {} # friendships mapped to neighbor node ids
neighbors = []
for row in results:
    neighbor = Person.inflate(row[cols.index('neighbor')])
    neighbors.append(neighbor)
    rel = FriendRel.inflate(row[cols.index('rel')])
    rels[neighbor.id] = rel
Then, now that you've stored all neighbors and the relationships between them, you can loop through them like so:
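# rels maps neighbor ids to relationships, so iterate over its items
for neighbor_id, rel in rels.items():
    print(f"bob has a friendship with the node whose id is {neighbor_id}.")
    print(f"They've been friends since {rel.since}")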
Or like so:
for neighbor in neighbors:
    rel = rels[neighbor.id]
Thanks to everyone's helpful advice!
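The row-unpacking step can be tried without a database. The sketch below assumes rows shaped like the `(results, cols)` pair described above, but uses plain dicts as stand-ins for the inflated `Person` and `FriendRel` objects; `pair_neighbors` is a hypothetical helper name, not a neomodel function.

```python
from typing import Any, List, Tuple


def pair_neighbors(results: List[list], cols: List[str]) -> List[Tuple[Any, Any]]:
    """Return (neighbor, rel) pairs from rows shaped like a Cypher query result."""
    neighbor_idx = cols.index("neighbor")  # resolve column positions once
    rel_idx = cols.index("rel")
    return [(row[neighbor_idx], row[rel_idx]) for row in results]


# Stand-in rows: plain dicts instead of inflated Person / FriendRel objects.
cols = ["node", "rel", "neighbor"]
results = [
    [{"name": "bob"}, {"met": "Germany"}, {"name": "frank"}],
    [{"name": "bob"}, {"met": "France"}, {"name": "alice"}],
]

pairs = pair_neighbors(results, cols)
for neighbor, rel in pairs:
    print(f"bob met {neighbor['name']} in {rel['met']}")
```

Keeping each neighbor next to its relationship in a list of pairs avoids a second lookup by id when looping.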

Retrieve the Relationship object in Neomodel

I am using Neomodel and Python for my project. I have a number of nodes defined and am storing relevant information on the relationships between them. However I can't seem to find a mechanism for retrieving the relationship object itself to be able to use the attributes - I can only filter by the relationship attribute to return the Nodes.
class MyRelationship(StructuredRel):
    source = StringProperty()

class Person(StructuredNode):
    uid = UniqueIdProperty()
    first_name = StringProperty()
    last_name = StringProperty()
    people = RelationshipTo('Person', "PERSON_RELATIONSHIP", model=MyRelationship)
I have a number of relationships of the same type [PERSON_RELATIONSHIP] between the same two nodes, but they differ by attribute. I want to be able to iterate through them and print out the to node and the attribute.
Given an Object person of type Person
for p in person.people:
gives me the Person objects
person.people.relationship(p).source always gives me the value for the first relationship only
A Traversal also seems to give me the Person objects as well
The only way it seems to get a Relationship object is on .connect.
Any clues? Thanks.
I just stumbled over the same problem and managed to solve it as shown below, but I am not sure if it is the most performant solution.
If you already have a Person node object in variable person:
for p in person.people:
    r = person.people.relationship(p)
Or iterating over all Person nodes:
for person in Person.nodes.all():
    for p in person.people:
        r = person.people.relationship(p)
I've checked the neomodel source code and there doesn't seem to be a way to achieve what you want more efficiently than what Roman said.
But you could always use Cypher queries.
from neomodel import db
from models import Person, MyRelationship
john = Person.nodes.get(first_name='John')  # the Person model above has first_name, not name
results, cols = db.cypher_query(f"""MATCH (node)-[rel]-(neighbor)
WHERE id(node)={john.id}
RETURN node, rel, neighbor""")
rels = {}
neighbors = []
for row in results:
    neighbor = Person.inflate(row[cols.index('neighbor')])
    neighbors.append(neighbor)
    rel = MyRelationship.inflate(row[cols.index('rel')])
    rels[neighbor.id] = rel
Then, now that you've stored all neighbors and the relationships between them, you can loop through them like so:
# rels maps neighbor ids to relationships, so iterate over its items
for neighbor_id, rel in rels.items():
    print(f"john has a relationship with node {neighbor_id} which has the source {rel.source}")
Hope this helps!
Ethan
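One caveat for the original question: a dict keyed by neighbor id holds only one relationship per neighbor, so when several relationships connect the same two nodes, later rows overwrite earlier ones. A minimal sketch of grouping instead, using plain tuples and dicts as stand-ins for the inflated objects (`group_rels_by_neighbor` is a hypothetical helper, not a neomodel function):

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def group_rels_by_neighbor(rows: Iterable[Tuple[Any, Any]]) -> Dict[Any, List[Any]]:
    """Collect every relationship per neighbor id instead of keeping only the last."""
    grouped = defaultdict(list)
    for neighbor_id, rel in rows:
        grouped[neighbor_id].append(rel)
    return dict(grouped)


# Stand-in rows: (neighbor id, relationship properties).
rows = [
    (7, {"source": "census"}),
    (7, {"source": "interview"}),
    (9, {"source": "census"}),
]

grouped = group_rels_by_neighbor(rows)
for neighbor_id, rels_for_neighbor in grouped.items():
    for rel in rels_for_neighbor:
        print(f"neighbor {neighbor_id}: source={rel['source']}")
```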

neo4j CYPHER - at ON MATCH SET create new nodes on condition

To import XML data into a neo4j DB I first parse the XML to a python dictionary and then use CYPHER queries:
WITH $pubmed_dict as pubmed_article
UNWIND pubmed_article as particle
...
FOREACH (author IN particle.MedlineCitation.Article.AuthorList.Author |
  MERGE (a:Author {last_name: COALESCE(author.LastName, 'LAST NAME MISSING!')})
  ON CREATE SET a.first_name = author.ForeName, a.affiliation = author.AffiliationInfo.Affiliation
  ON MATCH SET a.first_name = author.ForeName, a.affiliation = author.AffiliationInfo.Affiliation
  MERGE (p)<-[:WROTE]-(a)
)
Unfortunately, Authors don't have unique IDs in the database, so it might be that different authors have the same last names but different initials or affiliations.
...
<Author ValidYN="Y">
  <LastName>Smith</LastName>
  <ForeName>A L</ForeName>
  <Initials>AL</Initials>
  <AffiliationInfo>
    <Affiliation>University X</Affiliation>
  </AffiliationInfo>
</Author>
...
<Author ValidYN="Y">
  <LastName>Smith</LastName>
  <ForeName>A L</ForeName>
  <Initials>AL</Initials>
  <AffiliationInfo>
    <Affiliation>University BUMBABU</Affiliation>
  </AffiliationInfo>
</Author>
My intention was to MERGE on author.LastName but ON MATCH check if the author has the same ForeName OR the same Affiliation and if not create a new node instead.
How would I do that using CYPHER queries?
EDIT 1
Node Key constraints are the solution, which is an Enterprise Edition feature, though. Looking for a workaround for that.
EDIT 2
This code is working almost perfectly:
WITH $pubmed_dict as pubmed_article
UNWIND pubmed_article as particle
MERGE (p:Publication {pmid: particle.MedlineCitation.PMID.text})
ON CREATE SET p.title = COALESCE (particle.MedlineCitation.Article.Journal.Title, particle.MedlineCitation.Article.ArticleTitle)
ON MATCH SET p.title = COALESCE (particle.MedlineCitation.Article.Journal.Title, particle.MedlineCitation.Article.ArticleTitle)
FOREACH (author IN particle.MedlineCitation.Article.AuthorList.Author |
  MERGE (a:Author {last_name: COALESCE(author.LastName, 'LAST NAME MISSING!'), first_name: COALESCE(author.ForeName, 'FIRST NAME MISSING!')})
  MERGE (p)<-[:WROTE]-(a)
)
To sum it up:
For every author I want to create a new author IF LastName OR ForeName OR Affiliation are different. I also need NEW Nodes for authors where LAST NAME MISSING! and FIRST NAME MISSING!
Is it possible to achieve this result WITHOUT Key Node Constraints? (because this is an Enterprise Edition feature...)
The authors do have a unique ID in Neo4j: the node ID. That can be used to identify the node and then to set the properties. Maybe something like this:
Match (a:Author{LastName:'xxx',ForeName:'yyy'})
with a, id(a) as ID
where ID > -1
match (b) where id(b)=ID set b.first_name = author.ForeName, b.affiliation = author.AffiliationInfo.Affiliation
The node's ID is not necessarily stable or predictable, so you have to access it directly before using it.
Because you are using Python code, you might do better with a global query to pull down the author node data:
match (a:Author{LastName:'xxx',ForeName:'yyy'}) return a.LastName,a.ForeName,id(a) as ID
Then you can write a CSV file to bulk upload the desired info. The CSV could look like this:
"ID","ForeName","LastName","Affiliation"
"26","David","Smith","Johns Hopkins"
etc.
The python code could do the filtering of nodes that do not need processing.
Then load the file:
LOAD CSV WITH HEADERS FROM 'file:///xxx.csv' AS line
MATCH (a) WHERE id(a) = toInteger(line.ID)
SET a.Affiliation = toString(line.Affiliation)
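The CSV file described in this answer could be produced with Python's standard `csv` module. This is a minimal sketch: `authors_to_csv` is a hypothetical helper, and the row data simply repeats the example values from the answer. It writes to an in-memory buffer; swapping in an open file object would write to disk instead.

```python
import csv
import io
from typing import Dict, List

FIELDNAMES = ["ID", "ForeName", "LastName", "Affiliation"]


def authors_to_csv(rows: List[Dict]) -> str:
    """Serialise author rows with every field quoted, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDNAMES,
        quoting=csv.QUOTE_ALL,  # quote every field, matching the sample layout
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"ID": 26, "ForeName": "David", "LastName": "Smith", "Affiliation": "Johns Hopkins"},
]
csv_text = authors_to_csv(rows)
print(csv_text)
```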
You can use constraints, then neo4j will check uniqueness for you.
From documentation:
To create a Node Key ensuring that all nodes with a particular label have a set of defined properties whose combined value is unique, and where all properties in the set are present
CREATE CONSTRAINT ON (author:Author) ASSERT (author.first_name, author.last_name, author.affiliation) IS NODE KEY
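Without the Enterprise-only constraint, one possible workaround is to MERGE on all three properties at once, since MERGE matches on the whole pattern and so creates a new node whenever any of the values differs. MERGE rejects null property values, so the defaults have to be filled in first. The sketch below does that normalisation on the Python side, before the data is sent to Neo4j. `author_identity` is a hypothetical helper, and the `'AFFILIATION MISSING!'` placeholder is an assumption modelled on the two placeholders in the question.

```python
from typing import Dict


def author_identity(author: Dict) -> Dict[str, str]:
    """Build the property map that identifies one author, with no null values."""
    affiliation_info = author.get("AffiliationInfo") or {}
    return {
        "last_name": author.get("LastName") or "LAST NAME MISSING!",
        "first_name": author.get("ForeName") or "FIRST NAME MISSING!",
        # Placeholder text is an assumption, not taken from the question.
        "affiliation": affiliation_info.get("Affiliation") or "AFFILIATION MISSING!",
    }


smith_x = {
    "LastName": "Smith",
    "ForeName": "A L",
    "AffiliationInfo": {"Affiliation": "University X"},
}
smith_bumbabu = {
    "LastName": "Smith",
    "ForeName": "A L",
    "AffiliationInfo": {"Affiliation": "University BUMBABU"},
}

# Same name, different affiliation: two distinct identities, hence two nodes.
print(author_identity(smith_x) != author_identity(smith_bumbabu))
```

Note that authors with missing names would all share the same placeholder identity under this scheme, so it does not by itself give each of them a separate node.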

referencing an entity by its key before it gets saved to the ndb

I would like to be able to relate an entity of one class to another entity at the moment both entities are created (one entity will have the other as its parent, and the parent will hold a key pointing to the child). It seems I am unable to obtain the key of an entity before it gets saved to the Datastore. Is there any way to achieve the above without having to save one of the entities twice?
Below is the example:
class A(ndb.Model):
    key_of_b = ndb.KeyProperty(kind='B')

class B(ndb.Model):
    pass
What I am trying to do:
a = A()
b = B(parent=a.key)
a.key_of_b = b.key
a.put()
b.put()
If the key doesn't get assigned before the entity is saved, is there any way I could construct it myself? Or would the only solution be to save one of the entities twice?
You could do this with named keys but then you have to be sure you can name the two entities with unique keys:
# It is possible to construct a key for an entity that does not yet exist.
keyname_a = 'abc'
keyname_b = 'def'
key_a = ndb.Key(A, keyname_a)
key_b = ndb.Key(A, keyname_a, B, keyname_b)
a = A(id=keyname_a)
a.key_of_b = key_b
b = B(id=keyname_b, parent=key_a)
a.put()
b.put()
However, I would suggest thinking about why you would need the key_of_b property in the first place. If you only set A as the parent of B then you will always be able to navigate from A to B and the other way around:
# If you have the A entity from somewhere and want to find B.
b = B.query(ancestor=entity_a.key).get()
# You have the B entity from somewhere and want to find A.
a = entity_b.key.parent().get()
This also gives you the opportunity to create one-to-many relationships between A and B.
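The reason a named key can be built before anything is saved is that a key is just a path of (kind, id) pairs, with the parent's pairs coming first. The sketch below models that idea with plain tuples; `make_key` and `parent_of` are hypothetical helpers for illustration and are not part of ndb.

```python
from typing import Optional, Tuple

KeyPath = Tuple[Tuple[str, str], ...]


def make_key(*flat: str) -> KeyPath:
    """Build a key path from alternating kind, id arguments."""
    if not flat or len(flat) % 2 != 0:
        raise ValueError("expected one or more kind/id pairs")
    return tuple((flat[i], flat[i + 1]) for i in range(0, len(flat), 2))


def parent_of(key: KeyPath) -> Optional[KeyPath]:
    """Drop the last (kind, id) pair; a root key has no parent."""
    return key[:-1] or None


# Both keys exist as values before any entity is stored.
key_a = make_key("A", "abc")
key_b = make_key("A", "abc", "B", "def")

print(parent_of(key_b) == key_a)
```

Because the child's path contains the parent's path, going from B to A needs no stored property, which is the point the answer makes about key_of_b.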

Understanding ndb key class vs KeyProperty

I've looked through the documentation and SO questions and answers and am still struggling to understand a small piece of this. Which should you choose and when?
This is what I've read so far (just sample):
ndb documentation
movie database structure on SO
Parent Key issues
The key class seems pretty straightforward to me. When you create an ndb entity, the datastore automatically creates a key for you, usually in the form of key(Kind, id), where the id is created for you.
So say you have these two models:
class Blah(ndb.Model):
    last_name = ndb.StringProperty()

class Blah2(ndb.Model):
    first_name = ndb.StringProperty()
    blahkey = ndb.KeyProperty()
So, using just the key class, suppose you want to make Blah a parent (or have several family members with the same last name):
lname = Blah(last_name = "Bonaparte")
l_key = lname.put() **OR**
l_key = lname.key.id() # spits out some long id
fname_key = l_key **OR**
fname_key = ndb.Key('Blah', lname.last_name) # which is more readable..
then:
lname = Blah2( parent=fname_key, first_name = "Napoleon")
lname.put()
lname2 = Blah2( parent=fname_key, first_name = "Lucien")
lname2.put()
So far so good (I think). Now about the KeyProperty for Blah2. Assume Blah is still the same.
lname3 = Blah2( first_name = "Louis", blahkey = fname_key)
lname3.put()
Is this correct ?
How to query various things
Query Last Name:
Blah.query() # all last names
Blah.query(last_name='Bonaparte') # That specific entity.
First Name:
Blah2.query()
napol = Blah2.query(first_name = "Napoleon")
bonakey = napol.key.parent().get() # returns Bonaparte's key ??
bona = bonakey.get() # I think this might be redundant
this is where I get lost. How to look for Bonaparte from first name by using either key or keyproperty. I didn't add it here and perhaps should have and that is the discussion of parents, grand parents, great grand parents since Keys keep track of ancestors/parents.
How and why would you use KeyProperty vs the inherent key class. Also imagine you had 3 sensors s1, s2, s3. Each sensor had thousands of readings but you want to keep readings associated with s1 so that you could graph say All readings for today for s1. Which would you use? KeyProperty or the key class ? I apologize if this has been answered elsewhere but I didn't see a clear example/guide about choosing which and why/how.
I think the confusion comes from using a Key. A Key is not associated with any properties inside of an entity; it is only a unique identifier to locate a single entity. Its ID can be either a number or a string.
Fortunately, all your code looks good except for this one line:
fname_key = ndb.Key('Blah', lname.last_name) # which is more readable..
Constructing a Key takes a unique ID, which is not the same as a property. That is, it won't associate the variable lname.last_name with the property last_name. Instead, you can create your record like this:
lname = Blah(id = "Bonaparte")
lname.put()
lname_key = ndb.Key('Blah', "Bonaparte")
You are guaranteed to have only one Blah entity with that ID. In fact, if you use a string like last_name as the ID, you don't need to store it as a separate property. Think of the entity ID as an extra string property that is unique.
Next, be careful not to assume that Blah.last_name and Blah2.first_name are unique in your queries:
lname = Blah2( parent=fname_key, first_name = "Napoleon")
lname.put()
If you do this more than once, there will be multiple entities with a first_name of Napoleon (all with the same parent key).
Continuing with your code from above:
napol = Blah2.query(first_name = "Napoleon")
bonakey = napol.key.parent().get() # returns Bonaparte's key ??
bona = bonakey.get() # I think this might be redundant
napol holds a Query, not a result. You need to call napol.fetch() to get all entities with "Napoleon" (or napol.get() if you're sure there is just one entity).
bonakey is the opposite: because of the get(), it holds the parent entity and not Bonaparte's key. If you left the .get() off, then bona would correctly have the parent.
Finally, your question about sensors. You may not need KeyProperty or "inherent" keys. If you have a Readings class like this:
class Readings(ndb.Model):
    sensor = ndb.StringProperty()
    reading = ndb.IntegerProperty()
then you can store them all in a single table without keys. (You may want to include a timestamp or other attribute.) Later, you can retrieve them with this query:
s1_readings = Readings.query(Readings.sensor == 'S1').fetch()
I'm also new to NDB and don't understand all of it yet, but I think that when you create a Blah2 with a parent for Napoleon, you need the parent to query it, or it will not appear. For example:
napol = Blah2.query(first_name = "Napoleon")
will not return anything (and it is not in the right format for NDB), but using the parent will:
napol = Blah2.query(ancestor=fname_key).filter(Blah2.first_name == "Napoleon").get()
I don't know if this sheds some light on your question.
