I have a scenario in which I either have to create the nodes and a new relationship, or, if the nodes and a relationship already exist, replace the existing relationship with a new one. Only one relationship should ever exist between two nodes.
The commands below don't seem to work when I call them from the Python client using GDB.query:
match (a:user)-[r]->(b:user)
where a.id='3' and b.id='5'
merge (a)-[r2:test]->(b)
SET r2 = r SET r2.percentage = 80
WITH r
DELETE r
return r
MATCH (a:user),(b:user)
WHERE a.id='3' AND b.id='5'
MERGE (a)-[r:test]->(b)
RETURN r
If you want to replace an existing relationship of a particular type with a new one:
match (a:user {id:'3'})
match (b:user {id:'5'})
merge (a)-[newRel:NEW_TYPE]->(b) //create the new rel if missing
set newRel.percentage = 80
match (a)-[oldRel:OLD_TYPE]->(b) //match the old rel
delete oldRel //and delete it
But if you just want to set a property on an existing relationship and create it if missing:
match (a:user {id:'3'})
match (b:user {id:'5'})
merge (a)-[rel:REL_TYPE]->(b) //creates a new rel if it doesn't exist
set rel.percentage = 80
Finally, I got the right query. First we execute the match-based query; if it doesn't find anything, we execute the second query, which does a create. If the relationship already exists, that second query doesn't create anything new.
match (a:user)-[r]->(b:user)
where a.id=3 and b.id=5
merge (a)-[r2:test4]->(b)
set r2.percentage = 50
delete r
return a,b, r2
MERGE (a:user {id:3})-[r:test]->(b:user {id:5})
ON CREATE
SET r.percentage = 55
ON MATCH
SET r.percentage = 55
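For completeness, a minimal sketch of driving that two-step approach from Python. This assumes the official neo4j driver rather than the unspecified GDB.query wrapper used above, uses the test4 type from the last query, and adds a type(r) guard (my addition) so the freshly merged relationship is not deleted again:
from neo4j import GraphDatabase

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

REPLACE_EXISTING = """
MATCH (a:user)-[r]->(b:user)
WHERE a.id = $a_id AND b.id = $b_id AND type(r) <> 'test4'
MERGE (a)-[r2:test4]->(b)
SET r2.percentage = $pct
DELETE r
RETURN a, b, r2
"""

CREATE_IF_MISSING = """
MERGE (a:user {id: $a_id})-[r:test4]->(b:user {id: $b_id})
ON CREATE SET r.percentage = $pct
ON MATCH SET r.percentage = $pct
RETURN r
"""

with driver.session() as session:
    # Try to replace an existing relationship first ...
    record = session.run(REPLACE_EXISTING, a_id=3, b_id=5, pct=50).single()
    if record is None:
        # ... otherwise create the nodes and the relationship from scratch.
        record = session.run(CREATE_IF_MISSING, a_id=3, b_id=5, pct=50).single()
    print(record)
driver.close()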
For a university project I am using Neo4j together with Flask and py2neo for a shift scheduling algorithm. On saving the scheduled shifts to Neo4j I realized that relationships go missing: out of 330, only 91 get inserted.
On printing them before/after inserting, they are all in the list to be inserted, and I also moved the transaction around to check whether that changes the result.
I have the following structure:
(w:Worker)-[r:works_during]->(s:Shift)
with r.day, r.month, r.year set as properties on the relationship, and multiple connections between each worker and each shift, which can then be filtered via the relationship.
My code looks like the following:
from py2neo import Relationship  # needed for the Relationship objects below

header = df.columns.tolist()
header.remove("index")
header.remove("worker")
tuplelist = []
for index, row in df.iterrows():
    for i in header:
        worker = self.driver.nodes.match("Worker", id=int(row["worker"])).first()
        if row[i] == 1:
            # Shifts are in the format {day}_{shift_of_day}
            shift_id = str(i).split("_")[1]
            shift_day = str(i).split("_")[0]
            shift = self.driver.nodes.match("Shift", id=int(shift_id)).first()
            rel = Relationship(worker, "works_during", shift)
            rel["day"] = int(shift_day)
            rel["month"] = int(month)  # month and year are defined elsewhere in the class
            rel["year"] = int(year)
            tuplelist.append(rel)
print(len(tuplelist))

for i in tuplelist:
    connection = self.driver.begin()
    connection.create(i)
    connection.commit()
Is there any special behaviour in py2neo which I need to be aware of that could cause this issue?
py2neo allows just one relationship of a given type between node A and node B.
If multiple relationships of the same type (even with different attributes) are needed, you have to fall back to plain Cypher queries, as py2neo will merge these edges into a single edge.
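For example, a rough sketch of that plain-Cypher route with py2neo (connection details and the parameter dicts are placeholders). CREATE, unlike MERGE, always adds a new edge, so several works_during relationships with different day/month/year properties can coexist between the same Worker and Shift:
from py2neo import Graph

graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholder connection

create_rel = """
MATCH (w:Worker {id: $worker_id}), (s:Shift {id: $shift_id})
CREATE (w)-[:works_during {day: $day, month: $month, year: $year}]->(s)
"""

# One dict per scheduled shift, built the same way as tuplelist above (hypothetical values).
shifts = [
    {"worker_id": 7, "shift_id": 3, "day": 14, "month": 2, "year": 2023},
]

tx = graph.begin()
for p in shifts:
    tx.run(create_rel, p)
tx.commit()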
I need to find all projects and shared projects within a GitLab group with subgroups. I managed to list the names of all projects like this:
group = gl.groups.get(11111, lazy=True)
# find all projects, also in subgroups
projects = group.projects.list(include_subgroups=True, all=True)
for prj in projects:
    print(prj.attributes['name'])
print("")
What I am missing is how to also list the shared projects within the group. Or, to put it in other words: how do I find all projects where my group is a member? Is this possible with the Python API?
So, inspired by sytech's answer, I found out that my original attempt was not working because the shared projects were still hidden in the subgroups. I came up with the following code, which digs through all the levels of subgroups to find all shared projects. I assume this can be written far more elegantly, but it works for me:
# group definition
main_group_id = 11111

# create empty list that will contain the final result
list_subgroups_id_all = []
# create empty list that acts as temporary storage of the results outside the function
list_subgroups_id_stored = []

# function to create a list of subgroups of a group (id)
def find_subgroups(group_id):
    # retrieve group object
    group = gl.groups.get(group_id)
    # create empty list to store ids of subgroups
    list_subgroups_id = []
    # iterate through the group to find the ids of all subgroups
    for sub in group.subgroups.list():
        list_subgroups_id.append(sub.id)
    return list_subgroups_id

# function to iterate over the various groups for subgroup detection
def iterate_subgroups(group_id, list_subgroups_id_all):
    # for a given id, find the existing subgroups (ids) and store them in a list
    list_subgroups_id = find_subgroups(group_id)
    # add the found items to the list storage variable, so that the results are not overwritten
    list_subgroups_id_stored.append(list_subgroups_id)
    # for each found subgroup_id, test whether it is already part of the total id list;
    # if not, store it and test for more subgroups
    for test_id in list_subgroups_id:
        if test_id not in list_subgroups_id_all:
            # add it to the total subgroup id list (final results list)
            list_subgroups_id_all.append(test_id)
            # check whether test_id contains more subgroups
            list_subgroups_id_tmp = iterate_subgroups(test_id, list_subgroups_id_all)
            # if so, append to the stored subgroup list that is currently checked
            list_subgroups_id_stored.append(list_subgroups_id_tmp)
    return list_subgroups_id_all

# find all subgroups, sub-subgroups, etc. and store their ids in a list
list_subgroups_id_all = iterate_subgroups(main_group_id, list_subgroups_id_all)
print("***ids of all subgroups***")
print(list_subgroups_id_all)
print("")

print("***names of all subgroups***")
list_names = []
for ids in list_subgroups_id_all:
    group = gl.groups.get(ids)
    group_name = group.attributes['name']
    list_names.append(group_name)
print(list_names)
print("")

# print all directly integrated projects of the main group, also those in subgroups
print("***integrated projects***")
group = gl.groups.get(main_group_id)
projects = group.projects.list(include_subgroups=True, all=True)
for prj in projects:
    print(prj.attributes['name'])
print("")

# print all shared projects
print("***shared projects***")
for sub in list_subgroups_id_all:
    group = gl.groups.get(sub)
    for shared_prj in group.shared_projects:
        print(shared_prj['path_with_namespace'])
print("")
One question that remains - at the very beginning I retrieve the main group by its id (here: 11111), but can I actually also get this id by looking for the name of the group? Something like: group_id = gl.group.get(attribute={'name','foo'}) (not working)?
You can get the shared projects by the .shared_projects attribute:
group = gl.groups.get(11111)
for proj in group.shared_projects:
    print(proj['path_with_namespace'])
However, you cannot use the lazy=True argument to gl.groups.get.
>>> group = gl.groups.get(11111, lazy=True)
>>> group.shared_projects
AttributeError: shared_projects
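As for the remaining question about looking the group up by name instead of by id: a hedged sketch, assuming the search parameter of the groups list endpoint is available in your python-gitlab/GitLab versions ('foo' is a placeholder name):
# list groups whose name/path matches the search term, then take the id of the first hit
matches = gl.groups.list(search='foo')
for g in matches:
    print(g.id, g.full_path)

group_id = matches[0].id if matches else None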
I am currently caching data from an API by storing it all in a temporary table and merging it into a non-temp table where ID/UPDATED_AT is unique.
ID/UPDATED_AT example:
MERGE
INTO vet_data_patients_stg
USING vet_data_patients_temp_stg
ON vet_data_patients_stg.updated_at=vet_data_patients_temp_stg.updated_at
AND vet_data_patients_stg.id=vet_data_patients_temp_stg.id
WHEN NOT MATCHED THEN
INSERT
(
id,
updated_at,
<<<my_other_fields>>>
)
VALUES
(
vet_data_patients_temp_stg.id,
vet_data_patients_temp_stg.updated_at,
<<<my_other_fields>>>
)
My issue is that this method also leaves the older ID/UPDATED_AT rows in the table, but I only want each ID with its most recent UPDATED_AT: the older UPDATED_AT rows should be removed so the table only contains unique IDs.
Can I accomplish this by modifying my merge statement?
My Python way of auto-generating the string is:
merge_string = (
    f'MERGE INTO {str.upper(tablex)}_{str.upper(envx)} '
    f'USING {str.upper(tablex)}_TEMP_{str.upper(envx)} ON '
    + ' AND '.join(f'{str.upper(tablex)}_{str.upper(envx)}.{x}={str.upper(tablex)}_TEMP_{str.upper(envx)}.{x}' for x in keysx)
    + f' WHEN NOT MATCHED THEN INSERT ({field_columnsx}) VALUES ('
    + ','.join(f'{str.upper(tablex)}_TEMP_{str.upper(envx)}.{x}' for x in fieldsx)
    + ')'
)
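For illustration, with some made-up inputs (all of the values below are hypothetical) the generator assembles a statement along these lines:
tablex, envx = 'vet_data_patients', 'stg'
keysx = ['id', 'updated_at']
fieldsx = ['id', 'updated_at', 'field']
field_columnsx = ','.join(fieldsx)

# merge_string then reads roughly:
# MERGE INTO VET_DATA_PATIENTS_STG USING VET_DATA_PATIENTS_TEMP_STG
# ON VET_DATA_PATIENTS_STG.id=VET_DATA_PATIENTS_TEMP_STG.id AND VET_DATA_PATIENTS_STG.updated_at=VET_DATA_PATIENTS_TEMP_STG.updated_at
# WHEN NOT MATCHED THEN INSERT (id,updated_at,field)
# VALUES (VET_DATA_PATIENTS_TEMP_STG.id,VET_DATA_PATIENTS_TEMP_STG.updated_at,VET_DATA_PATIENTS_TEMP_STG.field)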
EDIT - Examples to more clearly illustrate the goal:
So if my TABLE_STG has:
ID|UPDATED_AT|FIELD
0|2018-01-01|X
1|2020-01-01|A
2|2020-02-01|B
And my API gets the following in TABLE_TEMP_STG:
ID|UPDATED_AT|FIELD
1|2020-02-01|A
2|2020-02-01|B
I currently end up with:
ID|UPDATED_AT|FIELD
0|2018-01-01|X
1|2020-01-01|A
1|2020-02-01|A
2|2020-02-01|B
But I really want to remove the older UPDATED_AT rows and end up with:
ID|UPDATED_AT|FIELD
0|2018-01-01|X
1|2020-02-01|A
2|2020-02-01|B
We can do deletes in the MATCHED branch of a MERGE statement. Your code needs to look like this:
MERGE
INTO vet_data_patients_stg
USING vet_data_patients_temp_stg
ON vet_data_patients_stg.updated_at=vet_data_patients_temp_stg.updated_at
AND vet_data_patients_stg.id=vet_data_patients_temp_stg.id
WHEN NOT MATCHED THEN
INSERT
(
id,
updated_at,
<<<my_other_fields>>>
)
VALUES
(
vet_data_patients_temp_stg.id,
vet_data_patients_temp_stg.updated_at,
<<<my_other_fields>>>
)
WHEN MATCHED THEN
UPDATE
SET some_other_field = vet_data_patients_temp_stg.some_other_field
DELETE WHERE 1 = 1
This will delete all the rows which are updated, i.e. every row that hits the MATCHED branch.
Note that you need to include the UPDATE clause even though you want to delete all of them. The DELETE logic is applied only to records which are updated, but the syntax doesn't allow us to leave it out.
There is a proof of concept on db<>fiddle.
Re-writing the python code to generate this statement is left as an exercise for the reader :)
The Seeker hasn't posted a representative test case providing sample sets of input data and a desired outcome derived from those samples. So it may be that this doesn't do what they are expecting.
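Since the re-write of the Python generator is left as an exercise, here is one hedged sketch of how it might be extended; build_merge and its update_colx argument (a column the UPDATE clause has to touch before the DELETE) are assumptions, not part of the original code:
def build_merge(tablex, envx, keysx, fieldsx, field_columnsx, update_colx):
    target = f'{str.upper(tablex)}_{str.upper(envx)}'
    source = f'{str.upper(tablex)}_TEMP_{str.upper(envx)}'
    on_clause = ' AND '.join(f'{target}.{x}={source}.{x}' for x in keysx)
    insert_values = ','.join(f'{source}.{x}' for x in fieldsx)
    return (
        f'MERGE INTO {target} USING {source} ON {on_clause} '
        f'WHEN NOT MATCHED THEN INSERT ({field_columnsx}) VALUES ({insert_values}) '
        f'WHEN MATCHED THEN UPDATE SET {update_colx} = {source}.{update_colx} '
        f'DELETE WHERE 1 = 1'
    )
Calling it with the same table, environment, keys and fields as before should produce roughly the statement shown in this answer.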
I would like to join the output of two different MapReduce jobs. I want to be able to do something like what I have below, but I cannot figure out how to reuse the results from the previous jobs and join them. How could I do this?
Job1:
Andrea Vanzo, c288f70f-f417-4a96-8528-25c61372cae7, 125
Job2:
c288f70f-f417-4a96-8528-25c61372cae7, 071e1103-1b06-4671-8324-a9beb3e90d18, 25
Result:
Andrea Vanzo, c288f70f-f417-4a96-8528-25c61372cae7, 25
You can use JobControl to set up a workflow between your MapReduce jobs. Alternatively, reading job1's and job2's output into one follow-up job (using MultipleInputs) would also solve your problem.
In the mapper, pick the processing method based on the path the data comes from and write the records accordingly:
mapper
  input from job1's path => split the line, emit key = data[1] (the UUID), value = data[0] + "tagjob1"
  input from job2's path => split the line, emit key = data[0] (the UUID), value = data[2] + "tagjob2"
reducer
  each key now has its set of values
  group the values into two lists by their tag
  write the key together with every pair from the Cartesian product of the two lists
Hope this helps.
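A minimal, framework-free Python sketch of that reduce-side join idea (the records are the sample rows from the question; in a real job the tagging would happen in the mapper based on the input split's path):
from collections import defaultdict
from itertools import product

# Mapper side: tag each record with its source job, keyed by the shared UUID.
job1 = [("Andrea Vanzo", "c288f70f-f417-4a96-8528-25c61372cae7", "125")]
job2 = [("c288f70f-f417-4a96-8528-25c61372cae7", "071e1103-1b06-4671-8324-a9beb3e90d18", "25")]

grouped = defaultdict(lambda: {"job1": [], "job2": []})
for name, uuid, _ in job1:
    grouped[uuid]["job1"].append(name)
for uuid, _, amount in job2:
    grouped[uuid]["job2"].append(amount)

# Reducer side: per key, emit the Cartesian product of the two tagged lists.
for uuid, lists in grouped.items():
    for name, amount in product(lists["job1"], lists["job2"]):
        print(f"{name}, {uuid}, {amount}")  # Andrea Vanzo, c288f70f-..., 25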
I am confused about how to import the data.
I have a CSV from DHCP with _time, hostname, and IP_addr columns.
I would like to add any changed IPs as new relationships, but keep the old IP relationships with a status attribute of inactive; I also think I want to limit it to the last 10.
I am not sure of the easiest way to do this in Cypher, or whether I should be in Python for this complexity:
maybe an always-add (remove duplicates) CSV import,
a second query to deactivate any old IPs (how do I query for the non-current ones if I have the time as an attribute of the relationship?),
and a third query to remove relationships if more than 10 previous IPs are hanging off a host.
Any help or thoughts would be greatly appreciated.
Sounds like fun. I'm not sure whether every host-IP combination appears only once in the CSV or also at later times, as a "still-here" update.
Import Statement
LOAD CSV WITH HEADERS FROM "url" AS row
MERGE (h:Host {name: row.hostname})
MERGE (ip:IP {name: row.IP_addr})
MERGE (h)-[rel:IP]->(ip)
  ON CREATE SET rel.created = row._time, rel.status = 1
  // optional for pre-existing/previous rels
  ON MATCH SET rel.status = 0
SET rel.updated = row._time;
Cleanup statement
MATCH (h:Host) WHERE size( (h)-[:IP]->() ) > 1
MATCH (h)-[rel:IP]->(:IP)
WITH h,rel ORDER BY rel.updated DESC
WITH h, collect(rel) as rels
// not necessary when the status is set above
FOREACH (r IN rels[1..10] | SET r.status = 0)
FOREACH (r IN rels[10..] | DELETE r)
When the status is set correctly in the load statement:
MATCH (h:Host)-[rel:IP {status: 0}]->(:IP)
WITH h, rel ORDER BY rel.updated DESC
WITH h, collect(rel) AS rels
FOREACH (r IN rels[9..] | DELETE r)
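And if you would rather drive it from Python, as the question wondered, a rough sketch using the official neo4j driver to run the import and the cleanup; the URI, credentials, and CSV location are placeholders:
from neo4j import GraphDatabase

LOAD = """
LOAD CSV WITH HEADERS FROM $url AS row
MERGE (h:Host {name: row.hostname})
MERGE (ip:IP {name: row.IP_addr})
MERGE (h)-[rel:IP]->(ip)
  ON CREATE SET rel.created = row._time, rel.status = 1
  ON MATCH SET rel.status = 0
SET rel.updated = row._time
"""

CLEANUP = """
MATCH (h:Host)-[rel:IP {status: 0}]->(:IP)
WITH h, rel ORDER BY rel.updated DESC
WITH h, collect(rel) AS rels
FOREACH (r IN rels[9..] | DELETE r)
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run(LOAD, url="file:///dhcp.csv")
    session.run(CLEANUP)
driver.close()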