ArangoDB best way to get_or_create a document - python

I'm performing what I imagine is a common pattern when indexing into graph databases: my data is a list of edges, and I want to "stream" the upload of this data. That is, for each edge I want to create the two nodes on either side and then create the edge between them; I don't want to upload all the nodes first and link them afterwards. A naive implementation would obviously produce a lot of duplicate nodes, so I want to implement some sort of "get_or_create" to avoid the duplication.
My current implementation is below, using pyArango:
def get_or_create_graph(self):
    db = self._get_db()
    if db.hasGraph('citator'):
        self.g = db.graphs["citator"]
        self.judgment = db["judgment"]
        self.citation = db["citation"]
    else:
        self.judgment = db.createCollection("judgment")
        self.citation = db.createCollection("citation")
        self.g = db.createGraph("citator")

def get_or_create_node_object(self, name, vertex_data):
    object_list = self.judgment.fetchFirstExample(
        {"name": name}
    )
    if object_list:
        node = object_list[0]
    else:
        node = self.g.createVertex('judgment', vertex_data)
        node.save()
    return node
My problems with this solution are:
Since the application, not the database, is checking existence, another insertion could happen between the existence check and the creation. I have found duplicate nodes in practice, and I suspect this race condition is why.
It isn't very fast, probably because it potentially hits the database twice per node.
I am wondering whether there is a faster and/or more atomic way to do this, ideally a native ArangoDB query. Suggestions? Thank you.
Update
As requested, calling code shown below. It's in a Django context, where Link is a Django model (ie data in a database):
... # Class definitions etc
links = Link.objects.filter(dirty=True)
for i, batch in enumerate(batch_iterator(links, limit=LIMIT, batch_size=ITERATOR_BATCH_SIZE)):
    for link in batch:
        source_name = cleaner.clean(link.case.mnc)
        target_name = cleaner.clean(link.citation.case.mnc)
        if source_name == target_name:
            continue
        source_data = _serialize_node(link.case)
        target_data = _serialize_node(link.citation.case)
        populate_pair(citation_manager, source_name, source_data, target_name, target_data, link)

def populate_pair(citation_manager, source_name, source_data, target_name, target_data, link):
    source_node = citation_manager.get_or_create_node_object(
        source_name,
        source_data
    )
    target_node = citation_manager.get_or_create_node_object(
        target_name,
        target_data
    )
    description = source_name + " to " + target_name
    citation_manager.populate_link(source_node, target_node, description)
    link.dirty = False
    link.save()
And here's a sample of what the data looks like after cleaning and serializing:
source_data: {'name': 'P v R A Fu', 'court': 'ukw', 'collection': 'uf', 'number': 'CA 139/2009', 'tag': 'NA', 'node_id': 'uf89638', 'multiplier': '5.012480529547776', 'setdown_year': 0, 'judgment_year': 0, 'phantom': 'false'}
target_data: {'name': 'Ck v R A Fu', 'court': 'ukw', 'collection': 'uf', 'number': '10/22147', 'tag': 'NA', 'node_id': 'uf67224', 'multiplier': '1.316227766016838', 'setdown_year': 0, 'judgment_year': 0, 'phantom': 'false'}
source_name: [2010] ZAECGHC 9
target_name: [2012] ZAGPJHC 189

I don't know how to do it with the Python driver, but it can be done using AQL:
FOR doc IN judgment
    FILTER doc.name == "name"
    LIMIT 1
    INSERT MERGE(vertexObject, { _from: doc._id }) INTO citator
The vertexObject needs to be an AQL object with at least the _to value.
Note: there may be typos, I'm answering from my phone.
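A more direct route for an atomic get-or-create is AQL's UPSERT, which performs the lookup and the insert in a single server-side statement, removing the application-side race (on a single server; cluster deployments have extra caveats). A minimal sketch, assuming the python-arango driver (not pyArango) and the judgment collection from the question; the query and helper names are mine:

```python
# Sketch only: assumes the python-arango driver and the "judgment"
# collection from the question; query text and helper names are mine.

GET_OR_CREATE_AQL = """
UPSERT { name: @name }
INSERT @vertex_data
UPDATE {}
IN judgment
RETURN NEW
"""

def get_or_create_bind_vars(name, vertex_data):
    """Build bind variables, making sure the lookup key is also inserted."""
    data = dict(vertex_data)
    data.setdefault("name", name)
    return {"name": name, "vertex_data": data}

# Usage (needs a live ArangoDB connection):
# from arango import ArangoClient
# db = ArangoClient().db("mydb", username="root", password="...")
# cursor = db.aql.execute(
#     GET_OR_CREATE_AQL,
#     bind_vars=get_or_create_bind_vars("P v R A Fu", {"court": "ukw"}),
# )
# node = next(cursor)  # the existing document, or the one just inserted
```

RETURN NEW hands back the matched-or-inserted document, so one round trip replaces the fetch-then-create pair.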


How to update this for NetworkX 2.0?

I am trying to write this, but I just realized attr_dict is not supported in the new version of NetworkX, which is why I am not getting the desired rows from the code below.
Can someone tell me how to update this piece of code? Adding the row part is the issue, since attr_dict is no longer supported.
if not this_user_id in G:
    G.add_node(this_user_id, attr_dict={
        'followers': row['followers'],
        'age': row['age'],
    })
This is the code in context
# Gather the data out of the row
#
this_user_id = row['author']
author = row['retweet_of']
followers = row['followers']
age = row['age']
rtfollowers = row['rtfollowers']
rtage = row['rtage']
#
# Is the sender of this tweet in our network?
#
if not this_user_id in G:
    G.add_node(this_user_id, attr_dict={
        'followers': row['followers'],
        'age': row['age'],
    })
#
# If this is a retweet, is the original author a node?
#
if author != "" and not author in G:
    G.add_node(author, attr_dict={
        'followers': row['rtfollowers'],
        'age': row['rtage'],
    })
#
# If this is a retweet, add an edge between the two nodes.
#
if author != "":
    if G.has_edge(author, this_user_id):
        G[author][this_user_id]['weight'] += 1
    else:
        G.add_weighted_edges_from([(author, this_user_id, 1.0)])

nx.write_gexf(G, 'tweets1.gexf')
In NetworkX 2.x, the add_node function accepts the node attributes directly as keyword arguments, so you can reformulate the call as:
G.add_node(this_user_id, followers=row['followers'], age=row['age'])
or, if you have the attributes saved in a dict named my_attr_dict, you can say:
G.add_node(this_user_id, **my_attr_dict)
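Both forms can be checked in a few lines; this runnable sketch uses invented node names and attribute values:

```python
# Runnable sketch of both 2.x-style calls (node names and values invented):
import networkx as nx

G = nx.Graph()
row = {'followers': 120, 'age': 3}

# Keyword-argument form:
G.add_node('alice', followers=row['followers'], age=row['age'])

# Dict-unpacking form:
my_attr_dict = {'followers': 45, 'age': 7}
G.add_node('bob', **my_attr_dict)

# In 2.x, attributes are read back via G.nodes (not G.node):
print(G.nodes['alice']['followers'])  # 120
```

Note the read side changed too: `G.node[n]` from 1.x became `G.nodes[n]` in 2.x.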

Create batch records with Odoo create method

I am trying to create multiple invoices from an array of dictionaries with the create values in Odoo 13.
Creating one record at a time is okay, but when I try with the batch of records I get the error can't adapt type 'dict'.
I have tried looping through the array and creating a record for each item in it, but this error persists.
I am currently reading about the @api.model_create_multi decorator but haven't fully grasped it yet.
What I want is, for each line in visa_line (same as an order line), to create an invoice from it. Some fields needed for creating an invoice are missing, but that should not be the issue.
When I print the record in the final function, it prints the dict with the values correctly.
Here is my code, thank you in advance
def _prepare_invoice(self):
    journal = self.env['account.move'].with_context(
        default_type='out_invoice')._get_default_journal()
    invoice_vals = {
        'type': 'out_invoice',
        'invoice_user_id': self.csa_id and self.csa_id.id,
        'source_id': self.id,
        'journal_id': journal.id,
        'state': 'draft',
        'invoice_date': self.date,
        'invoice_line_ids': []
    }
    return invoice_vals

def prepare_create_invoice(self):
    invoice_val_dicts = []
    invoice_val_list = self._prepare_invoice()
    for line in self.visa_line:
        invoice_val_list['invoice_partner_bank_id'] = line.partner_id.bank_ids[:1].id,
        invoice_val_list['invoice_line_ids'] = [0, 0, {
            'name': line.code,
            'account_id': 1,
            'quantity': 1,
            'price_unit': line.amount,
        }]
        invoice_val_dicts.append(invoice_val_list)
    return invoice_val_dicts

@api.model_create_multi
def create_invoice(self, invoices_dict):
    invoices_dict = self.prepare_create_invoice()
    for record in invoices_dict:
        print(record)
        records = self.env['account.move'].create(record)
I fixed this issue by explicitly converting each record to a dict, using a normal create method without the @api.model_create_multi decorator.
def create_invoice(self):
    invoices_dict = self.prepare_create_invoice()
    for record in invoices_dict:
        records = self.env['account.move'].create(dict(record))
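For what it's worth, two things in prepare_create_invoice would also produce odd batches: the same invoice_val_list dict is mutated and re-appended on every iteration (so all entries end up identical), and one2many values need a list of (0, 0, vals) command tuples rather than a bare [0, 0, {...}] list. A driver-free sketch of building the batch, with field names borrowed from the question:

```python
def build_invoice_vals(base_vals, lines):
    """Build one fresh vals dict per line, with (0, 0, vals) one2many commands."""
    batch = []
    for line in lines:
        vals = dict(base_vals)                 # fresh copy for every invoice
        vals['invoice_line_ids'] = [(0, 0, {   # command tuple inside a list
            'name': line['code'],
            'quantity': 1,
            'price_unit': line['amount'],
        })]
        batch.append(vals)
    return batch

base = {'type': 'out_invoice', 'state': 'draft'}
lines = [{'code': 'A', 'amount': 10.0}, {'code': 'B', 'amount': 20.0}]
batch = build_invoice_vals(base, lines)
# In Odoo 13 the whole batch can then go into a single call:
# self.env['account.move'].create(batch)
print(len(batch), batch[0]['invoice_line_ids'][0][2]['name'])  # 2 A
```

Since Odoo 13, create() accepts a list of vals dicts directly, so the whole batch goes to the ORM in one call instead of one create per record.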

Is it possible to make a transaction in Odoo v10? If yes, How to do it?

Well, I have many operations inside my function: there is a loop with a self.create() call inside, and inside that another loop containing another create call, and so on. But the records depend on each other through their IDs.
Here is a sample of code (minified)
@api.multi
def create_report(self):
    id_report = None
    reports = [ResumeReport(resume) for resume in data]  # a tab containing many reports to be created
    repo = self.env["module1"].search([("date", "=", str(date))])
    if repo:
        for r in repo:
            id_report = r
    else:
        id_report = self.env["module1"].create({
            'name': name
        })
    for rep in reports:
        self.env["module2"].create({
            'report_id': id_report.id,
            'name': name
        })
So what I want is to put this code inside a transaction. It will be a pretty big SQL operation, so when an error occurs I want everything rolled back, cancelling all operations (a transaction!); otherwise, commit it. But I don't know if this is possible in Odoo 10, as I have not found any documentation about transactions.
Could you help me please?
SOLVED
@api.multi
def create_report(self):
    self._cr.autocommit(False)
    try:
        id_report = None
        reports = [ResumeReport(resume) for resume in data]  # a tab containing many reports to be created
        repo = self.env["module1"].search([("date", "=", str(date))])
        if repo:
            for r in repo:
                id_report = r
        else:
            id_report = self.env["module1"].create({
                'name': name
            })
        for rep in reports:
            self.env["module2"].create({
                'report_id': id_report.id,
                'name': name
            })
        self._cr.commit()
    except:
        self._cr.rollback()
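The commit/rollback pair in the solution is standard Python DB-API behaviour rather than anything Odoo-specific. A self-contained illustration with sqlite3 standing in for Odoo's cursor (table and values are made up):

```python
# Illustration of the try/commit/except/rollback pattern with sqlite3
# (stand-in for Odoo's cursor; table and values are made up).
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.commit()

try:
    cur.execute("INSERT INTO report (name) VALUES (?)", ("first",))
    cur.execute("INSERT INTO report (name) VALUES (?)", (None,))  # NOT NULL violation
    conn.commit()
except sqlite3.Error:
    conn.rollback()  # both inserts are undone together

cur.execute("SELECT COUNT(*) FROM report")
print(cur.fetchone()[0])  # 0: the failed transaction left no rows behind
```

Inside Odoo itself, note that each request already runs in a single transaction, so explicit commit() calls are rarely needed; for partial rollback within a request, the cursor's savepoint() context manager is worth a look.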

Software Design in Python. Should I use one module or separate?

I wrote two scripts in Python:
The first appends the keys of a dictionary to a list.
The second one uses that list to create columns on a MySQL database.
Originally I wrote them in the same module. However, is it better if I separate them in two different modules or keep them together?
If it's better separating them, what is a pythonic way to use lists from one file in another? I know importing variables is not recommended.
Here is the code:
import pymysql
# This script first extracts the dictionary keys, then creates columns from those keys.
# login into mysql
conn = pymysql.connect("localhost", "*****", "******", "tutorial")
# creating a cursor object
c = conn.cursor()
# update here table from mysql
table_name = "table01"
# data
dicts = {'id': 5141, 'product_id': 193, 'price_ex_tax': 90.0000, 'wrapping_cost_tax': 0.0000, 'type': 'physical', 'ebay_item_id': 444, 'option_set_id': 38, 'total_inc_tax': 198.0000, 'quantity': 2, 'price_inc_tax': 99.0000, 'cost_price_ex_tax': 0.0000, 'name': 'UQ Bachelor Graduation Gown Set', 'configurable_fields': [], 'base_cost_price': 0.0000, 'fixed_shipping_cost': 0.0000, 'wrapping_message': '', 'order_address_id': 964, 'total_ex_tax': 180.0000, 'refund_amount': 0.0000, 'event_name': None, 'cost_price_inc_tax': 0.0000, 'cost_price_tax': 0.0000, 'wrapping_cost_inc_tax': 0.0000, 'wrapping_name': '', 'price_tax': 9.0000, 'is_bundled_product ': False, 'ebay_transaction_id': 4444, 'bin_picking_number': 4444, 'parent_order_product_id': None, 'event_date': '', 'total_tax': 18.0000, 'wrapping_cost_ex_tax': 0.0000, 'base_total': 198.0000, 'product_options': [{'id': 4208, 'display_name': 'Gown size (based on height)', 'name': 'Bachelor gown size', 'display_value': 'L (175-182cm)', 'display_style': 'Pick list', 'type': 'Product list', 'option_id': 19, 'value': 77, 'product_option_id': 175, 'order_product_id': 5141}, {'id': 4209, 'display_name': 'Hood', 'name': 'H-QLD-BAC-STD', 'display_value': 'UQ Bachelor Hood', 'display_style': 'Pick list', 'type': 'Product list', 'option_id': 42, 'value': 119, 'product_option_id': 176, 'order_product_id': 5141}, {'id': 4210, 'display_name': 'Trencher size (based on head circumference)', 'name': 'Trencher size', 'display_value': 'M (53-54cm)', 'display_style': 'Pick list', 'type': 'Product list', 'option_id': 20, 'value': 81, 'product_option_id': 177, 'order_product_id': 5141}], 'base_price': 99.0000, 'sku': 'S-QLD-BAC-STD', 'return_id': 0, 'applied_discounts': [{'id': 'coupon', 'amount': 30}], 'quantity_shipped': 0, 'base_wrapping_cost': 0.0000, 'is_refunded': False, 'weight': 2.0000, 'order_id': 615496} # noqa
# creating empty lists
int_keys_lists = []
str_keys_lists = []
list_keys_lists = []
def extractDictKeys():
    # Loop through the dictionary, appending each key to the int or the str
    # list depending on the type of its value
    for i, j in enumerate(dicts):
        k, v = list(dicts.items())[i]
        if type(dicts[j]) != str:
            int_keys_lists.append(k)
        else:
            str_keys_lists.append(k)

def createColumnStrKeys():
    # Create a column for each str key in the list
    for i, j in enumerate(str_keys_lists):
        c.execute("ALTER TABLE {0} ADD COLUMN {1} VARCHAR(255)".format(table_name, str_keys_lists[i]))
        conn.commit()

def createColumnIntKeys():
    # Create a column for each int or float key in the list
    for i, j in enumerate(int_keys_lists):
        c.execute("ALTER TABLE {0} ADD COLUMN {1} int(30)".format(table_name, int_keys_lists[i]))
        conn.commit()

extractDictKeys()
createColumnStrKeys()
createColumnIntKeys()
There are some problems in your design.
Functions should not use global variables. A function receives its inputs as parameters and returns a value (which may be nothing in some cases).
For example:
def extract_dict_keys(dictionary):
    key_list = list()
    int_key_list = list()
    str_key_list = list()
    # Your code here
    ..........
    return key_list, int_key_list, str_key_list

def create_col_str_keys(conn, dictionary, key_list):
    cursor = conn.cursor()
    # Your code here
    ..........
    # Commit outside of the loop
    conn.commit()  # Though it's usually not recommended to commit in the function

def create_col_int_keys(conn, dictionary, key_list):
    cursor = conn.cursor()
    # Your code here
    ..........
    conn.commit()
Only after you have made all your pieces of code independent of each other should you structure them into modules.
How to structure your code into modules depends on how reusable each piece is and how the pieces relate to each other. For example, I would put the global variables into a main file that gets executed, utility functions into one module, and SQL-related functions into another:
main.py
import pymysql

from utilities import extract_dict_keys
from sql import create_col_str_keys, create_col_int_keys

# login into mysql
conn = pymysql.connect("localhost", "*****", "******", "tutorial")

# data
data = {...} # noqa

keys, int_keys, str_keys = extract_dict_keys(data)
create_col_int_keys(conn, data, int_keys)
create_col_str_keys(conn, data, str_keys)
utilities.py
def extract_dict_keys(dictionary):
    key_list = list()
    int_key_list = list()
    str_key_list = list()
    # Your code here
    ..........
    return key_list, int_key_list, str_key_list
sql.py
import pymysql

def create_col_str_keys(conn, dictionary, key_list):
    cursor = conn.cursor()
    # Your code here
    ..........
    conn.commit()

def create_col_int_keys(conn, dictionary, key_list):
    cursor = conn.cursor()
    # Your code here
    ..........
    conn.commit()
Also, concerning the SQL injection vulnerability: you should not build a query string yourself and pass it to cursor.execute(a_built_string).
Consider this situation:
cursor.execute("DELETE FROM users WHERE id = {uid}".format(uid=user_id))
And someone clever enough to input user_id="1 or 1 = 1". The result would be:
cursor.execute("DELETE FROM users WHERE id = 1 or 1 = 1")
Which will probably delete all users from your precious users table.
Therefore, use this instead (with pymysql the placeholder is always %s, and the parameters are passed as a tuple):
cursor.execute("DELETE FROM users WHERE id = %s", (user_id,))
This does not build, compile, and execute a string; the SQL statement is compiled first and the variable is then bound as a value, which is a lot safer.
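The same point can be demonstrated end to end with the standard-library sqlite3 module (which uses ? placeholders where pymysql uses %s; table and values are invented):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, 'a'), (2, 'b')])

malicious = "1 OR 1 = 1"  # would match every row if spliced into the SQL string

# Parameterized: the driver treats the input as one value, never as SQL
cur.execute("DELETE FROM users WHERE id = ?", (malicious,))
cur.execute("SELECT COUNT(*) FROM users")
print(cur.fetchone()[0])  # 2: no row has that id, so the table survives
```

With string formatting, the same input would have become part of the WHERE clause and deleted every row.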
There are two things I'd suggest researching a bit before asking yourself this question.
First, you're relying on a bunch of globals in module scope. This problem you're seeing of making those globals visible in other modules is only going to get worse as you write more complex programs. You should start thinking about involving classes a bit, importing those, instantiating them and looking at design patterns such as singletons. It's much easier and cleaner to define a bunch of important lists, variables, etc. under classes and then import those and pass instances of them around between functions. Being object-oriented is pythonic.
Second, your functions conceptually look more like subroutines. It would be better to think of functions as... well... functions. They should take some arguments (inputs) and produce (i.e. return) some output. Steer away from manipulating globals, which easily becomes problematic, and think more about using inputs to create and return output.
Research and combine these concepts and the answer to where code should live will become more clear.

How to add multiple entries to Mongodb document at once using MongoEngine in Flask?

How can I add several entries to a document at once in Flask using MongoEngine/Flask-MongoEngine?
I tried to iterate over the dictionary that contains my entries. I simplified the example a bit, but originally the data is an RSS file that my WordPress spits out and that I parsed via feedparser.
But the problem obviously is that I cannot dynamically generate variables that hold my entries before being saved to the database.
Here is what I tried so far.
How can I add the entries to my MongoDB database in bulk?
# model
class Entry(db.Document):
    created_at = db.DateTimeField(
        default=datetime.datetime.now, required=True)
    title = db.StringField(max_length=255, required=True)
    link = db.StringField(required=True)
# dictionary with entries
e = {'entries': [{'title': u'title1',
                  'link': u'http://www.me.com'},
                 {'title': u'title2',
                  'link': u'http://www.me.com/link/'}]}
# multiple entries via views
i = 0
while i < len(e['entries']):
    post[i] = Entry(title=e['entries'][i]['title'], link=e['entries'][i]['link'])
    post[i].save()
    i += 1
Edit 1:
I thought about skipping the variables altogether and translating the dictionary into the form that MongoEngine can understand, because when I create a list manually, I can insert the entries into MongoDB in bulk:
newList = [RSSPost(title="test1", link="http://www.google.de"),
           RSSPost(title="test2", link="http://www.test2.com")]
RSSPost.objects.insert(newList)
This works, but I could not translate it completely to my problem.
I tried
f = []
for x in e['entries']:
    f.append("insert " + x['link'] + " and " + x['title'])
But as you see I could not recreate the list I need.
How to do it correctly?
How is your data/case different from the examples you posted? As long as I'm not missing something you should be able to instantiate Entry objects like:
entries = []
for entry in e['entries']:
    new_entry = Entry(title=entry['title'], link=entry['link'])
    entries.append(new_entry)

Entry.objects.insert(entries)
Quick and easy way:
for i in e['entries']:
    new_e = Entry(**i)
    new_e.save()
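Both answers rest on the same two ideas: keyword unpacking with ** and building the whole list before a single bulk call. A driver-free sketch, where a plain class stands in for the MongoEngine Document:

```python
# Driver-free sketch: a plain class stands in for the MongoEngine Document,
# with the same constructor shape as Entry in the question.
class Entry:
    def __init__(self, title, link):
        self.title = title
        self.link = link

e = {'entries': [{'title': u'title1', 'link': u'http://www.me.com'},
                 {'title': u'title2', 'link': u'http://www.me.com/link/'}]}

# Build every object first, then hand the whole list to one bulk call
# (with MongoEngine this would be Entry.objects.insert(entries)).
entries = [Entry(**d) for d in e['entries']]
print(len(entries), entries[0].title)  # 2 title1
```

The bulk insert does one round trip to MongoDB for the whole list, whereas save() in a loop does one per entry.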
