Neo4j import csv / DHCP data controlling duplication

Neo4j import csv / DHCP data controlling duplication - python

i am confused on how to import data
I have a csv from DHCP with _time, hostname, IP_addr
I would like to add any changed IPs as new relationships, but keep the old ip relationships with a status attribute inactive, also think I want to limt to the last 10.
I am not sure the easiest way to do this in cypher, or should I be in python for this complexity
maybe an always add (remove duplicates)/csv import
and a second query to deactivate any old ips (how do I query non current if i have time as an attribute of relationship)
and a third query to remove relationships that if more that 10 previous ips are hanging off it.
any help or thoughts would be greatly appreciated

Sounds like fun. Not sure if every host-ip combination appears only once in a csv or also at later times like an "still-here" update
Import Statement
LOAD CSV FROM "url" AS row
MERGE (h:Host {name:row.hostname})
MERGE (ip:IP {name:row.IP_addr})
MERGE (h)-[:IP]->(ip) ON CREATE SET rel.created = row._time, rel.status = 1
// optional for pre-existing/previous rels
ON MATCH SET rel.status = 0
SET rel.updated = row._time;
Cleanup statement
MATCH (h:Host) WHERE size( (h)-[:IP]->() ) > 1
MATCH (h)-[rel:IP]->(:IP)
WITH h,rel ORDER BY rel.updated DESC
WITH h, collect(rel) as rels
// not necessary when the status is set above
FOREACH (r in rels[1..9] | SET r.status=0)
FOREACH (r IN rels[10..-1] | DELETE r)
When the status is set correctly in the load statement
MATCH (h:Host)-[rel:IP {status:0}]->(:IP)
WITH h,rel ORDER BY rel.updated DESC
WITH h, collect(rel) as rels
FOREACH (r IN rels[9..-1] | DELETE r)

Related

How to overwrite older existing ID's when merging into new table?

I currently am cacheing data from an API by storing all data to a temporary table and merging into a non-temp table where ID/UPDATED_AT is unique.
ID/UPDATED_AT example:
MERGE
INTO vet_data_patients_stg
USING vet_data_patients_temp_stg
ON vet_data_patients_stg.updated_at=vet_data_patients_temp_stg.updated_at
AND vet_data_patients_stg.id=vet_data_patients_temp_stg.id
WHEN NOT matched THEN
INSERT
(
id,
updated_at,
<<<my_other_fields>>>
)
VALUES
(
vet_data_patients_temp_stg.id,
vet_data_patients_temp_stg.updated_at,
<<<my_other_fields>>>
)
My issue is that this method will leave older ID's/UPDATED_AT's also in the table, but I only want the ID with the most recent UPDATED_AT, to remove the older UPDATED_AT's, and only have unique ID's in the table.
Can I accomplish this by modifying my merge statement?
My python way of auto-generating the string is:
merge_string = f'MERGE INTO {str.upper(tablex)}_{str.upper(envx)}
USING {str.upper(tablex)}_TEMP_{str.upper(envx)}
ON '+' AND '.join(f'{str.upper(tablex)}_{str.upper(envx)}.{x}={str.upper(tablex)}_TEMP_{str.upper(envx)}.{x}' for x in keysx) + f'
WHEN NOT MATCHED THEN INSERT ({field_columnsx})
VALUES ' + '(' + ','.join(f'{str.upper(tablex)}_TEMP_{str.upper(envx)}.{x}' for x in fieldsx) + ')'
EDIT - Examples to more clearly illustrate goal -
So if my TABLE_STG has:
ID|UPDATED_AT|FIELD
0|2018-01-01|X
1|2020-01-01|A
2|2020-02-01|B
And my API gets the following in TABLE_TEMP_STG:
ID|UPDATED_AT|FIELD
1|2020-02-01|A
2|2020-02-01|B
I currently end up with:
ID|UPDATED_AT|FIELD
0|2018-01-01|X
1|2020-01-01|A
1|2020-02-01|A
2|2020-02-01|B
But I really want tp remove the older updated_at's and end up with:
ID|UPDATED_AT|FIELD
0|2018-01-01|X
1|2020-02-01|A
2|2020-02-01|B

We can do deletes in the MATCHED branch of a MERGE statement. Your code needs to look like this:
MERGE
INTO vet_data_patients_stg
USING vet_data_patients_temp_stg
ON vet_data_patients_stg.updated_at=vet_data_patients_temp_stg.updated_at
AND vet_data_patients_stg.id=vet_data_patients_temp_stg.id
WHEN NOT matched THEN
INSERT
(
id,
updated_at,
<<<my_other_fields>>>
)
VALUES
(
vet_data_patients_temp_stg.id,
vet_data_patients_temp_stg.updated_at,
<<<my_other_fields>>>
)
WHEN matched THEN
UPDATE
SET some_other_field = vet_data_patients_temp_stg.some_other_field
DELETE WHERE 1 = 1
This will delete all the rows which are updated, that is all the updated rows.
Note that you need to include the UPDATE clause even though you want to delete all of them. The DELETE logic is applied only to records which are updated, but the syntax doesn't allow us to leave it out.
There is a proof of concept on db<>fiddle.
Re-writing the python code to generate this statement is left as an exercise for the reader :)
The Seeker hasn't posted a representative test case providing sample sets of input data and a desired outcome derived from those samples. So it may be that this doesn't do what they are expecting.

Problem when deleting records from mongodb using pymongo

So I have some 50 document ID's. My python veno list contains document ID's as shown below.
5ddfc565bd293f3dbf502789
5ddfc558bd293f3dbf50263b
5ddfc558bd293f3dbf50264f
5ddfc558bd293f3dbf50264d
5ddfc565bd293f3dbf502792
But when I am trying to delete those 50 document ID's then I am finding a hard time. Let me explain - I need to run my python script over and over again in order to delete all the 50 documents. The first time I run my script it will delete some 10, the next time I run then it deletes 18 and so on. My for loop is pretty simple as shown below
for i in veno:
vv = i[0]
db.Products2.delete_many({'_id': ObjectId(vv)})

If your list is just the ids, then you want:
for i in veno:
db.Products2.delete_many({'_id': ObjectId(i)})
full example:
from pymongo import MongoClient
from bson import ObjectId
db = MongoClient()['testdatabase']
# Test data setup
veno = [str(db.testcollection.insert_one({'a': 1}).inserted_id) for _ in range(50)]
# Quick peek to see we have the data correct
for x in range(3): print(veno[x])
print(f'Document count before delete: {db.testcollection.count_documents({})}')
for i in veno:
db.testcollection.delete_many({'_id': ObjectId(i)})
print(f'Document count after delete: {db.testcollection.count_documents({})}')
gives:
5ddffc5ac9a13622dbf3d88e
5ddffc5ac9a13622dbf3d88f
5ddffc5ac9a13622dbf3d890
Document count before delete: 50
Document count after delete: 0

I dont have any mongo instance to test but what about
veno = [
'5ddfc565bd293f3dbf502789',
'5ddfc558bd293f3dbf50263b',
'5ddfc558bd293f3dbf50264f',
'5ddfc558bd293f3dbf50264d',
'5ddfc565bd293f3dbf502792',
]
# Or for your case (Whatever you have in **veno**)
veno = [vv[0] for vv in veno]
####
db.Products2.delete_many({'_id': {'$in':[ObjectId(vv) for vv in veno]}})
If this doesnt work, then maybe this
db.Products2.remove({'_id': {'$in':[ObjectId(vv) for vv in veno]}})
From what I understand, delete_many's first argument is filter, so
its designed in such a way, that you dont delete particular documents
but instead documents that satisfies particular condition.
In above case, best is delete all documents at once by saying -> delete all documents whose _id is in ($in) the list [ObjectId(vv) for vv in veno]

Neo4J Query Create new or Replace existing Relationship

I have a scenario in which I will have to create nodes for a new relationship or if nodes exist and a relationship exists, I have to replace the existing relationship with new one. Only one relationship will exist between 2 nodes.
Below commands doesn't seem to be working when I call from Python client using GDB.query:
match (a:user)-[r]->(b:user)
where a.id='3' and b.id='5'
merge (a)-[r2:test]->(b)
SET r2 = r SET r2.percentage = 80
WITH r
DELETE r
return r
MATCH (a:user),(b:user)
WHERE a.id='3' AND b.id='5'
MERGE (a)-[r:test]->(b)
RETURN r

If you want to replace an existing relationship of a particular type with a new one:
match (a:user {id:'3'})
match (b:user {id:'5'})
merge (a)-[newRel:NEW_TYPE]->(b) //create the new rel if missing
set newRel.percentage = 80
match (a)-[oldRel:OLD_TYPE]->(b) //match the old rel
delete oldRel //and delete it
But if you just want to set a property on an existing relationship and create it if missing:
match (a:user {id:'3'})
match (b:user {id:'5'})
merge (a)-[rel:REL_TYPE]->(b) //creates a new rel if it doesn't exist
set rel.percentage = 80

Finally, got the right query. First we execute the match and if it doesn't work, we execute the second query which does a create, if it already exists, it doesn't do anything.
match (a:user)-[r]->(b:user)
where a.id=3 and b.id=5
merge (a)-[r2:test4]->(b)
set r2.percentage = 50
delete r
return a,b, r2
MERGE (a:user {id:3})-[r:test]->(b:user {id:5})
ON CREATE
SET r.percentage = 55
ON MATCH
SET r.percentage = 55

Filtering a set of data based on indices in line

I have a python script that pulls data from an external servers SQL database and sum's the values based on transaction numbers. I've gotten some assistance in cleaning up the result sets - which have been a huge help, but now I've hit another problem.
My original query:
SELECT th.trans_ref_no, th.doc_no, th.folio_yr, th.folio_mo, th.transaction_date, tc.prod_id, tc.gr_gals FROM TransHeader th, TransComponents tc WHERE th.term_id="%s" and th.source="L" and th.folio_yr="%s" and th.folio_mo="%s" and (tc.prod_id="TEXLED" or tc.prod_id="103349" or tc.prod_id="103360" or tc.prod_id="103370" or tc.prod_id="113107" or tc.prod_id="113093")and th.trans_ref_no=tc.trans_ref_no;
Returns a set of data that I've copied a snippet here:
"0520227370","0001063257","2014","01","140101","113107","000002000"
"0520227370","0001063257","2014","01","140101","TEXLED","000002550"
"0520227378","0001063265","2014","01","140101","113107","000001980"
"0520227378","0001063265","2014","01","140101","TEXLED","000002521"
"0520227380","0001063267","2014","01","140101","113107","000001500"
"0520227380","0001063267","2014","01","140101","TEXLED","000001911"
"0520227384","0001063271","2014","01","140101","113107","000003501"
"0520227384","0001063271","2014","01","140101","TEXLED","000004463"
"0520227384","0001063271","2014","01","140101","113107","000004000"
"0520227384","0001063271","2014","01","140101","TEXLED","000005103"
"0520227385","0001063272","2014","01","140101","113107","000007500"
"0520227385","0001063272","2014","01","140101","TEXLED","000009565"
"0520227388","0001063275","2014","01","140101","113107","000002000"
"0520227388","0001063275","2014","01","140101","TEXLED","000002553"
The updated query runs this twice and JOINS the trans_ref_no, which is the first position in the result set, so the first 6 lines get condensed into three, and the last four lines get condensed into two. The problem I'm having is getting transaction number 0520227384 to get condensed to two lines.
SELECT t1.trans_ref_no, t1.doc_no, t1.folio_yr, t1.folio_mo, t1.transaction_date, t1.prod_id, t1.gr_gals, t2.prod_id, t2.gr_gals FROM (SELECT th.trans_ref_no, th.doc_no, th.folio_yr, th.folio_mo, th.transaction_date, tc.prod_id, tc.gr_gals FROM Tms6Data.TransHeader th, Tms6Data.TransComponents tc WHERE th.term_id="00000MA" and th.source="L" and th.folio_yr="2014" and th.folio_mo="01" and (tc.prod_id="103349" or tc.prod_id="103360" or tc.prod_id="103370" or tc.prod_id="113107" or tc.prod_id="113093") and th.trans_ref_no=tc.trans_ref_no) t1 JOIN (SELECT th.trans_ref_no, th.doc_no, th.folio_yr, th.folio_mo, th.transaction_date, tc.prod_id, tc.gr_gals FROM Tms6Data.TransHeader th, Tms6Data.TransComponents tc WHERE th.term_id="00000MA" and th.source="L" and th.folio_yr="2014" and th.folio_mo="01" and tc.prod_id="TEXLED" and th.trans_ref_no=tc.trans_ref_no) t2 ON t1.trans_ref_no = t2.trans_ref_no;
Here is what the new query returns for transaction number 0520227384:
"0520227384","0001063271","2014","01","140101","113107","000003501","TEXLED","000004463"
"0520227384","0001063271","2014","01","140101","113107","000003501","TEXLED","000005103"
"0520227384","0001063271","2014","01","140101","113107","000004000","TEXLED","000004463"
"0520227384","0001063271","2014","01","140101","113107","000004000","TEXLED","000005103"
What I need to get out of this is a set of condensed lines where, in this group, the seconds and third need to be removed:
"0520227384","0001063271","2014","01","140101","113107","000003501","TEXLED","000004463"
"0520227384","0001063271","2014","01","140101","113107","000004000","TEXLED","000005103"
How can I go about filtering these lines from the updated query result set?

i think, the answer is:
(... your heavy sql ..) group by 7
or
(... your heavy sql ..) group by t1.gr_gals

Loop on Select_Analysis tool (Python and ArcGIS 9.3)

First, I'm new in Python and I work on Arc GIS 9.3.
I'd like to realize a loop on the "Select_Analysis" tool. Indeed I have a layer "stations" composed of all the bus stations of a city.
The layer has a field "rte_id" that explains on what line a station is located.
And I'd like to save in distinct layers all the stations with "rte_id" = 1, the stations with "rte_id" = 2 and so on. Hence the use of the tool select_analysis.
So, I decided to make a loop (I have 70 different "rte_id" .... so 70 different layers to create!). But it does not work and I'm totally lost!
Here is my code:
import arcgisscripting, os, sys, string
gp = arcgisscripting.create(9.3)
gp.AddToolbox("C:/Program Files (x86)/ArcGIS/ArcToolbox/Toolboxes/Data Management Tools.tbx")
stations = "d:/Travaux/NantesMetropole/Traitements/SIG/stations.shp"
field = "rte_id"
for i in field:
gp.Select_Analysis (stations, "d:/Travaux/NantesMetropole/Traitements/SIG/stations_" + i + ".shp", field + "=" + i)
i = i+1
print "ok"
And here is the error message:
gp.Select_Analysis (stations, "d:/Travaux/NantesMetropole/Traitements/SIG/stations_" + i + ".shp", field + "=" + i)
TypeError: can only concatenate list (not "str") to list
Have you got any ideas to solve my problem?
Thanks in advance!
Julien

The main problem here is in the string
for i in field:
You are trying to iterate a string - field name ("rte_id").
This is not correct.
You need to iterate all possible values of field "rte_id".
Easiest solution:
if you know that field "rte_id" have values 1 - 70 (for example) then you can try
for i in range(1, 71):
shp_name = "d:/Travaux/NantesMetropole/Traitements/SIG/stations_" + str(i) + ".shp"
expression = '{0} = {1}'.format(field, i)
gp.Select_Analysis (stations, shp_name , expression)
print "ok"
More sophisticated solution:
You need to get a list of all unique values of field "rte_id" in terms of SQL - to perform GROUP BY.
I think it is not actually possible to perform GROUP BY operation on SHP files with one tool.
You can use SearchCursor, iterate through all features and generate a list of unique values of you field. But this is more complex task.
Another way is to use the Summarize option on the shapefile table in ArcMap (open table, right click on the column header). You will get dbf table with unique values which you can read in your script.
I hope it will help you to start!
Don't have ArcGIS right now and can't write and check any script.

You will need to make substantial changes to this code in order to get it to do what you want. You may just want to download the Split Layer By Attribute Code from ArcGIS online which does the exact same thing.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Neo4j import csv / DHCP data controlling duplication - python

Related

How to overwrite older existing ID's when merging into new table?

Problem when deleting records from mongodb using pymongo

Neo4J Query Create new or Replace existing Relationship

Filtering a set of data based on indices in line

Loop on Select_Analysis tool (Python and ArcGIS 9.3)

Categories

Resources