py2neo: depending batch insertion

py2neo: depending batch insertion - python

I use py2neo (v 1.9.2) to write data to a neo4j db.
batch = neo4j.WriteBatch(graph_db)
current_relationship_index = graph_db.get_or_create_index(neo4j.Relationship, "Current_Relationship")
touched_relationship_index = graph_db.get_or_create_index(neo4j.Relationship, "Touched_Relationship")
get_rel = current_relationship_index.get(some_key1, some_value1)
if len(get_rel) == 1:
batch.add_indexed_relationship(touched_relationship_index, some_key2, some_value2, get_rel[0])
elif len(get_rel) == 0:
created_rel = current_relationship_index.create(some_key3, some_value3, (my_start_node, "KNOWS", my_end_node))
batch.add_indexed_relationship(touched_relationship_index, some_key4, "touched", created_rel)
batch.submit()
Is there a way to replace current_relationship_index.get(..) and current_relationship_index.create(...) with a batch command? I know that there is one, but the problem is, that I need to act depending on the return of these commands. And I would like to have all statements in a batch due to performance.
I have read that it is rather uncommon to index relationships but the reason I do it is the following: I need to parse some (text) file everyday and then need to check if any of the relations have changed towards the previous day, i.e. if a relation does not exist in the text file anymore I want to mark it with a "replaced" property in neo4j. Therefore, I add all "touched" relationships to the appropriate index, so I know that these did not change. All relations that are not in the touched_relationship_index obviously do not exist anymore so I can mark them.
I can't think of an easier way to do so, even though I'm sure that py2neo offers one.
EDIT: Considering Nigel's comment I tried this:
my_rel = batch.get_or_create_indexed_relationship(current_relationship_index, some_key, some_value, my_start_node, my_type, my_end_node)
batch.add_indexed_relationship(touched_relationship_index, some_key2, some_value2, my_rel)
batch.submit()
This obviously does not work, because i can't refer to "my_rel" in the batch. How can I solve this? Refer with "0" to the result of the previous batch statement? But consider that the whole thing is supposed to run in a loop, so the numbers are not fixed. Maybe use some variable "batch_counter" which refers to the current batch statement and is always incremented, whenever a statement is added to the batch??

Have a look at WriteBatch.get_or_create_indexed_relationship. That can conditionally create a relationship based on whether or not one currently exists and operates atomically. Documentation link below:
http://book.py2neo.org/en/latest/batches/#py2neo.neo4j.WriteBatch.get_or_create_indexed_relationship
There are a few similar uniqueness management facilities in py2neo that I recently blogged about here that you might want to read about.

Related

How to check if a key already exists in the database using Firebase and Python?

Essentially, I'll be using a database of this structure:
to keep track of the users' xp. Under the xp_data section, there will be multiple timestamps and xp numbers for each timestamp. A function will run every 24 hours, that will log the users' XP. I want to have some way to check if the player is already in the database (and if so, add to their existing xp count) and if not, create a new node for them. Here is my code for writing to the server:
db_ref = db.reference('/')
for i in range(100):
tom = await mee6API.levels.get_leaderboard_page(i)
if xp_trigger:
break
this_lb_list = {}
for l in tom['players']:
if l['xp'] < 300:
xp_trigger = True
break
this_lb_list.update({l['id']: {'name': l['username'], 'xp_data': {time.strftime(time_format_str, time.gmtime()): l['xp']}}})
details += [{ int(l['id']) : l['xp']}]
print(i)
db_ref.update(this_lb_list)
Basically, this code loops through each page in the leaderboard, obtains the XP for each user, and appends it to a dict, which is then used to update the database. there are two problems with this code, one is that it does not check if the user already exists, meaning that, and this is the second problem, that it overwrites the user's existing data. I've also attempted to write the data for each player individually, but problem 1 was still an issue, and it was painfully slow. What can I do to rectify this?

When you pass a value for a property in update(), that value replaces the entire existing value of the property in the database. So while update() leaves the properties you don't specify in the call unmodified, it does completely replace any property you do specify.
To add a value to an existing property, you'll want to specify the entire path as the key, separating the various child nodes with /.
So something like:
this_lb_list.update({'xp_data/13-Auth-2021': l['xp']})
This will write only the 13-Auth-2021 of xp_data, leaving all other child nodes of xp_data unmodified.
You'll of course want to use a variable for the date/time, but the important thing is that you specify it in the key, and not in the value of the dictionary.

Dynamo Revit set formula for a parameter in a family

I am trying to add a formula to a parameter within a Revit Family.
Currently I have multiple families in a project. I run Dynamo from within that project then I extract the families that I want to modify using Dynamo standard nodes.
Then I use a python script node that goes through every selected family and find the parameter I am interested in, and assign a formula for it.
That seemed fine until I noticed that it is not assigning the formula, but it is entering it as a string — as in it is in quotes. And sure enough, the code i am using will only work with Text type parameters.
Can someone shed the light on how to assign a formula to a parameter using dynamo?
see line 32 in code below
Thanks
for family in families:
TransactionManager.Instance.ForceCloseTransaction()
famdoc = doc.EditFamily(family)
FamilyMan = famdoc.FamilyManager
found.append(family.Name)
TransactionManager.Instance.EnsureInTransaction(famdoc)
check = 0
# Loop thru the list of parameters to assign formula values to them... these are given as imput
for r in range(len(param_name_lst)):
# Loop thru the list of parameters in the current family per the families outter loop above.
for param in FamilyMan.Parameters:
#for param in FamilyMan.get_Parameter(param_name_lst[r]):
# for each of the parameters get their name and store in paramName.
paramName = param.Definition.Name
# Check if we have a match in parameter name.
if param_name_lst[r] in paramName:
if param.CanAssignFormula:
canassignformula.append(param_name_lst[r])
else:
cannotassignformula.append(param_name_lst[r])
try:
# Make sure that the parameter is not locked.
if FamilyMan.IsParameterLocked(param):
FamilyMan.SetParameterLocked(param,False)
locked.append(paraName)
# Enter formula value to parameter.
FamilyMan.SetFormula(param, param_value_lst[r])
check += 1
except:
failed.append(paramName)
else:
continue

Actually, you can access the family from the main project, and you can assign a formula automatically.... That's what i currently do, i load all the families i want in one project and run the script.
After a lot of work, i was able to figure out what i was doing wrong, and in it is not in my code... my code was fine.
The main problem is that i need to have all of my formula's dependencies lined up.... just like in manual mode.
so if my formula is:
size_lookup(MY_ID_tbl, "MY_VAR", "MY_DefaultValue", ND1,ND2)
then i need to have the following:
MY_ID_tbl should exist and be assigned a valid value, in this case it should have a csv filename. Moreover, that file should be also loaded. This is important for the next steps.
MY_VAR should be defined in that csv file, so Does ND1, ND2
The default value (My_Default_Value) should match what that csv file says about that variable...in this case, it is a text.
Needless to say, i did not have all of the above lined up as it should be, once i fixed that, my setFormula code did its job. And i had to change my process altogether, cause i have to first create the MY_ID_tbl and load the csv file which i also do using dynamo, then i go and enter the formulas using dynamo.

Revit parameters can only be assigned to a formula inside the family editor only, that is the first point, so you should run your dynamo script inside the family editor for each family which will be a waste of time and you just edit the parameter's formula manually inside each family.
and the second point, I don't even think that it is possible to set a certain parameter's formula automatically, it must be done manually ( I haven't seen anything for it in the Revit API docs).

NDB .order returns an empty result

I have two entities in my database which are connected. We'll call them A and B. I have an instance of A in memory (we'll call him a), and the following query currently works:
B.query(B.parent == a.key).fetch(limit=None)
But the following code returns en empty set, even in dev mode with indexes being automatically created:
B.query(B.parent == a.key).order(B.foo, B.bar).fetch(limit=None)
I've tried every combination I can think of, and I'm completely stumped.

Turns out the fields in question were made as TextProperty by a previous dev, which are un-indexable, and thus un-searchable.

This is what you want:
B.query(ancestor=a.key)
I don't believe any of the snippets you posted will even work.

What is the most efficient way to do a ONE:ONE relation on google app engine datastore

Even with all I do know about the AppEngine datastore, I don't know the answer to this. I'm trying to avoid having to write and run all the code it would take to figure it out, hoping someone already knows the answer.
I have code like:
class AddlInfo(db.Model)
user = db.ReferenceProperty(User)
otherstuff = db.ListProperty(db.Key, indexed=False)
And create the record with:
info = AddlInfo(user=user)
info.put()
To get this object I can do something like:
# This seems excessively wordy (even though that doesn't directly translate into slower)
info = AddlInfo.all().filter('user =', user).fetch(1)
or I could do something like:
class AddlInfo(db.Model)
# str(user.key()) is the key to this record
otherstuff = db.ListProperty(db.Key, indexed=False)
Creation looks like:
info = AddlInfo(key_name=str(user.key()))
info.put()
And then get the info with:
info = AddlInfo.get(str(user.key()))
I don't need the reference_property in the AddlInfo, (I got there using the user object in the first place). Which is faster/less resource intensive?
==================
Part of why I was doing it this way is that otherstuff could be a list of 100+ keys and I only need them sometimes (probably less than 50% of the time) I was trying to make it more efficient by not having to load those 100+ keys on every request.....

Between those 2 options, the second is marginally cheaper, because you're determining the key by inference rather than looking it up in a remote index.
As Wooble said, it's cheaper still to just keep everything on one entity. Consider an Expando if you just need a way to store a bunch of optional, ad-hoc properties.

The second approach is the better one, with one modification: There's no need to use the whole key of the user as the key name of this entity - just use the same key name as the User record.

GeoModel with Google App Engine - queries

I'm trying to use GeoModel python module to quickly access geospatial data for my Google App Engine app.
I just have a few general questions for issues I'm running into.
There's two main methods, proximity_fetch and bounding_box_fetch, that you can use to return queries. They actually return a result set, not a filtered query, which means you need to fully prepare a filtered query before passing it in. It also limits you from iterating over the query set, since the results are fetched, and you don't have the option to input an offset into the fetch.
Short of modifying the code, can anyone recommend a solution for specifying an offset into the query? My problem is that I need to check each result against a variable to see if I can use it, otherwise throw it away and test the next. I may run into cases where I need to do an additional fetch, but starting with an offset.

You can also work directly with the location_geocells of your model.
from geospatial import geomodel, geocell, geomath
# query is a db.GqlQuery
# location is a db.GeoPt
# A resolution of 4 is box of environs 150km
bbox = geocell.compute_box(geocell.compute(geo_point.location, resolution=4))
cell = geocell.best_bbox_search_cells (bbox, geomodel.default_cost_function)
query.filter('location_geocells IN', cell)
# I want only results from 100kms.
FETCHED=200
DISTANCE=100
def _func (x):
x.dist = geomath.distance(geo_point.location, x.location)
return x.dist
results = sorted(query.fetch(FETCHED), key=_func)
results = [x for x in results if x.dist <= DISTANCE]

There's no practical way to do this, because a call to geoquery devolves into multiple datastore queries, which it merges together into a single result set. If you were able to specify an offset, geoquery would still have to fetch and discard all the first n results before returning the ones you requested.
A better option might be to modify geoquery to support cursors, but each query would have to return a set of cursors, not a single one.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

py2neo: depending batch insertion - python

Related

How to check if a key already exists in the database using Firebase and Python?

Dynamo Revit set formula for a parameter in a family

NDB .order returns an empty result

What is the most efficient way to do a ONE:ONE relation on google app engine datastore

GeoModel with Google App Engine - queries

Categories

Resources