py2neo not enforcing uniqueness constraints in Neo4j database - python

I have a Neo4j database with nodes labelled "Program" and "Session". In the database I've enforced uniqueness constraints on the properties "name" and "href". From the :schema command:
Constraints
ON (program:Program) ASSERT program.href IS UNIQUE
ON (program:Program) ASSERT program.name IS UNIQUE
ON (session:Session) ASSERT session.name IS UNIQUE
ON (session:Session) ASSERT session.href IS UNIQUE
I want to periodically query another API (thus storing the name and API endpoint href as properties), and only add new nodes when they're not already in the database.
This is how I'm creating the nodes:
newprogram, = graph_db.create(node(name=programname, href=programhref))
newprogram.add_labels('Program')
newsession, = graph_db.create(node(name=sessionname, href=sessionhref))
newsession.add_labels('Session')
I'm running into the following error:
Traceback (most recent call last):
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/Users/jedc/appfolder/applicationapis.py", line 42, in post
newprogram.add_labels('Program')
File "/Users/jedc/appfolder/py2neo/util.py", line 99, in f_
return f(*args, **kwargs)
File "/Users/jedc/appfolder/py2neo/core.py", line 1638, in add_labels
if err.response.status_code == BAD_REQUEST and err.cause.exception == 'ConstraintViolationException':
AttributeError: 'ConstraintViolationException' object has no attribute 'exception'
My thought was that if I try to add the nodes and they're already in the database they just won't be added.
I've wrapped the creation/add_labels lines in a try/except AttributeError block, but when I did that I managed to duplicate everything that was already in the database, even though I had the constraints shown. (?!?) (How can py2neo manage to violate those constraints??)
I'm really confused, and would appreciate any help in figuring out how to add nodes only when they don't already exist.

The problem seems to be that you create the nodes without a label first and only add the label afterwards.
That is
graph_db.create(node(name = programname, href = programhref))
and
graph_db.create(node(name = sessionname, href = sessionhref))
This first creates nodes without any labels, which means the nodes satisfy the constraint conditions, since those only apply to nodes with the labels Program and Session.
Once you call newprogram.add_labels('Program') and newsession.add_labels('Session'), Neo4j attempts to add the labels to the nodes and raises an exception because the constraint assertions can no longer be met.
That is also why py2neo appears to create duplicate nodes: if you inspect them, you'll find one set of nodes has the labels and the other set does not.
Can you use py2neo in a way that it adds the label at the same time as creation?
Otherwise you could use a Cypher query
CREATE (program:Program{name: {programname}, href: {programhref}})
CREATE (session:Session{name: {sessionname}, href: {sessionhref}})
Using py2neo you should be able to do this as suggested in the docs:
graph_db = neo4j.GraphDatabaseService()
qs = '''CREATE (program:Program {name: {programname}, href: {programhref}})
CREATE (session:Session {name: {sessionname}, href: {sessionhref}})'''
query = neo4j.CypherQuery(graph_db, qs)
query.execute(programname=programname, programhref=programhref,
              sessionname=sessionname, sessionhref=sessionhref)
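Note that CREATE will still raise an error once a node with the same name or href already exists. Since you only want to add nodes that aren't already in the database, Cypher's MERGE (available from Neo4j 2.0, the same release that introduced these constraints) matches an existing node or creates it if missing. A sketch using the same CypherQuery API; be aware that MERGE matches on the whole pattern, so a node with the same name but a different href would still trigger a constraint violation:
qs = '''MERGE (program:Program {name: {programname}, href: {programhref}})
MERGE (session:Session {name: {sessionname}, href: {sessionhref}})'''
query = neo4j.CypherQuery(graph_db, qs)
query.execute(programname=programname, programhref=programhref,
              sessionname=sessionname, sessionhref=sessionhref)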

Firstly, the stack trace that you've shown highlights a bug that should be fixed in the latest version of py2neo (1.6.4 at the time of writing). There was an issue whereby the error detail dropped an expected "exception" key; this has now been fixed, so upgrading should give you a better error message.
However, this only addresses the error reporting bug. On the constraint question itself, it is correct that node creation and label application are necessarily carried out in two steps. This is due to a limitation in the REST API, which provides no direct method for creating a node with labels.
The next version of py2neo will make this possible in a single step via batching. For now, you probably want to use a Cypher statement to carry out the creation and labelling, as mentioned in the other answer here.

Python: yahoo_fin.stock_info.get_quote_table() not returning table

Goal:
The goal is to build a bot on Replit that iteratively scrapes Yahoo pages like this Amazon page and tracks the dynamic 'Volume' datapoint for abnormally large changes. I'm currently trying to pull this exact datapoint down reliably, and I have been using the yahoo_fin API to do so. I have also considered using bs4, but I'm not sure whether it is possible to use BS4 to extract dynamic data. (I'd greatly appreciate it if you happen to know the answer to this: can bs4 extract dynamic data?)
Problem:
The script seems to work, but it does not stay online due to what appears to be an error in yahoo_fin. Usually within around 5 minutes of turning the bot on, it throws the following error:
File "/home/runner/goofy/scrape.py", line 13, in fetchCurrentVolume
table = si.get_quote_table(ticker)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/yahoo_fin/stock_info.py", line 293, in get_quote_table
tables = pd.read_html(requests.get(site, headers=headers).text)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 1098, in read_html
return _parse(
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 926, in _parse
raise retained
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 906, in _parse
tables = p.parse_tables()
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 222, in parse_tables
tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 552, in _parse_tables
raise ValueError("No tables found")
ValueError: No tables found
However, this usually happens after a number of tables have already been found.
Here is the fetchCurrentVolume function:
import yahoo_fin.stock_info as si

def fetchCurrentVolume(ticker):
    table = si.get_quote_table(ticker)
    currentVolume = table['Volume']
    return currentVolume
and the API documentation is found above under Goal. Whenever this error message is displayed, the bot exits a #tasks.loop and goes offline. If you know of a way to fix the current use of yahoo_fin, OR any other way to obtain the dynamic data found at this xpath: '//div[@id="quote-summary"]/div/table/tbody/tr', then you will have pulled me out of a 3-week-long debacle with this issue! Thank you.
If you are able to retrieve some data before it cuts out, it is probably a rate limit. Try adding a sleep of a few seconds between each request.
see here for how to use sleep
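For example, a minimal sketch of spacing out the polls (the ticker and the 10-second delay are arbitrary choices):
import time
import yahoo_fin.stock_info as si

while True:
    print(si.get_quote_table("AMZN")['Volume'])  # "AMZN" is just an example ticker
    time.sleep(10)  # pause between requests to stay under any rate limit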
Maybe the web server bonks out when the tables are being re-written every so often, or something like that.
If you use a try/except, wait a few seconds, and then try again before bailing out to a failure, that might work if it is just a hiccup once in a while:
import time
import yahoo_fin.stock_info as si

def fetchCurrentVolume(ticker):
    try:
        table = si.get_quote_table(ticker)
        currentVolume = table['Volume']
    except ValueError:
        # hopefully this was just a hiccup and it will be back up in 5 seconds
        time.sleep(5)
        table = si.get_quote_table(ticker)
        currentVolume = table['Volume']
    return currentVolume
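If the outage sometimes lasts longer than a single retry, a bounded retry loop is a little more robust (a sketch; the three attempts and the 5-second delay are arbitrary):
import time
import yahoo_fin.stock_info as si

def fetchCurrentVolume(ticker, retries=3, delay=5):
    for attempt in range(retries):
        try:
            # pandas raises ValueError("No tables found") when the page comes back empty
            return si.get_quote_table(ticker)['Volume']
        except ValueError:
            if attempt == retries - 1:
                raise  # still failing after the last attempt: give up
            time.sleep(delay)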

How to call an Odoo model's method with no parameter(except self) on a specific record through xmlrpc in Odoo 13?

I am developing a script to create records in an Odoo model, and I need to run the model's methods on specific records. In my case the method I need to run on a specific record doesn't take any parameter (just self). I want to know how I can run the method on a specific record of the model through an xmlrpc call from the client to the Odoo server. Below is how I tried to call the method and pass the id of a specific record:
xmlrpc_object.execute('test_db', user, 'admin', 'test.test', 'action_check_constraint', [record_id])
action_check_constraint checks some constraints on each record of the model and, if all the constraints pass, changes the state of the record or raises validation errors. But the above method call over xmlrpc raises the error below:
xmlrpc.client.Fault: <Fault cannot marshal None unless allow_none is enabled: 'Traceback (most recent call last):\n File "/home/ibrahim/workspace/odoo13/odoo/odoo/addons/base/controllers/rpc.py", line 60, in xmlrpc_1\n response = self._xmlrpc(service)\n File "/home/ibrahim/workspace/odoo13/odoo/odoo/addons/base/controllers/rpc.py", line 50, in _xmlrpc\n return dumps((result,), methodresponse=1, allow_none=False)\n File "/usr/local/lib/python3.8/xmlrpc/client.py", line 968, in dumps\n data = m.dumps(params)\n File "/usr/local/lib/python3.8/xmlrpc/client.py", line 501, in dumps\n dump(v, write)\n File "/usr/local/lib/python3.8/xmlrpc/client.py", line 523, in __dump\n f(self, value, write)\n File "/usr/local/lib/python3.8/xmlrpc/client.py", line 527, in dump_nil\n raise TypeError("cannot marshal None unless allow_none is enabled")\nTypeError: cannot marshal None unless allow_none is enabled\n'>
> /home/ibrahim/workspace/scripts/automate/automate_record_creation.py(328)create_record()
Can anyone help with the correct and best way of calling a model's method (with no parameter except self) on a specific record through an xmlrpc client to the Odoo server?
That error is raised because, by default, the xmlrpc library does not allow None as a return value. You can change that behaviour by simply allowing it.
The following line is from Odoo's external API documentation, extended to allow None as a return value:
models = xmlrpc.client.ServerProxy(
    '{}/xmlrpc/2/object'.format(url), allow_none=True)
For more information about xmlrpc ServerProxy, look into the Python documentation.
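Putting it together with the call from the question (a sketch; the url, the uid lookup and the 'admin' password are assumptions based on the question's setup):
import xmlrpc.client

url = 'http://localhost:8069'  # assumed Odoo server address
common = xmlrpc.client.ServerProxy('{}/xmlrpc/2/common'.format(url))
uid = common.authenticate('test_db', 'admin', 'admin', {})  # assumed credentials
models = xmlrpc.client.ServerProxy(
    '{}/xmlrpc/2/object'.format(url), allow_none=True)
# record_id comes from the question; the target ids are wrapped in the args list
models.execute_kw('test_db', uid, 'admin',
                  'test.test', 'action_check_constraint', [[record_id]])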
You can get the error because action_check_constraint does not return anything (None by default).
Try running the server with the log-level option set to debug_rpc_answer to get more details.
After a lot of searching and trying, I first used this fix to solve the error, but I don't think that fix is best practice. Then I found OdooRPC, which does the same job but handles the above case: there is no such error for model methods that return None. Using OdooRPC solved my problem, and I could do everything I needed over RPC in Odoo.
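For reference, the OdooRPC equivalent is roughly (a sketch; host, port and credentials are assumptions):
import odoorpc

odoo = odoorpc.ODOO('localhost', port=8069)  # assumed host and port
odoo.login('test_db', 'admin', 'admin')      # assumed credentials
record = odoo.env['test.test'].browse(record_id)
record.action_check_constraint()  # a None return value is handled without errors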

PyravenDB wrong query parsing

I'm using pyravendb + RavenDB in order to store webpages. The main issue is that when the url is of this form:
http://www.somedomain.com/nicepage.html?stuff=param&id=021345
pyravendb seems to get lost and tries to find the index 021345 (which obviously doesn't exist).
An example would be the following :
the url
http://www.example.com/ebx/LinkResolverServlet?classofcontent=Standard&id=63935
the query
session.query().where_equals("url",url).select("Id","html","date","metadata")
gives this stack trace:
File "/home/myusername/***********/somepythonfile.py", line 60, in getDocumentbyURL
query_result = list(session.query().where_equals("url",url).select("Id","html","date","metadata"))
File "/usr/local/lib/python3.5/dist-packages/pyravendb/store/session_query.py", line 92, in __iter__
return self._execute_query().__iter__()
File "/usr/local/lib/python3.5/dist-packages/pyravendb/store/session_query.py", line 332, in _execute_query
includes=self.includes)
File "/usr/local/lib/python3.5/dist-packages/pyravendb/d_commands/database_commands.py", line 286, in query
raise exceptions.ErrorResponseException(response["Error"][:100])
pyravendb.custom_exceptions.exceptions.ErrorResponseException: Could not find index named: 63935
Could not find index named: 63935
Which is normal, since there is no index 63935; pyravendb seems to be mistaking the url's id parameter for a query parameter.
Any help on how to fix this?
Thank you!
Fixed via a new version of RavenDB!

Does appengine-mapreduce have a limit on operations?

I am working on a project that requires a big knowledgebase to be constructed from word co-occurrences in text. As far as I have researched, a similar approach has not been tried on App Engine. I would like to use App Engine's flexibility and scalability to serve the knowledgebase to a wide range of users and do reasoning on it.
So far I have come up with a mapreduce implementation based on the demo app for the pipeline. The source texts are stored in the blobstore as zipped files, each containing one xml document with a variable number of articles (as many as 30000).
The first step was to adapt the existing BlobstoreZipLineInputReader so that it parses the xml file and retrieves the relevant information from it. The XMLParser class uses the lxml iterparse approach from http://www.ibm.com/developerworks/xml/library/x-hiperfparse/ to retrieve the xml elements to process, and returns an iterator.
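The XMLParser class itself isn't shown here; a minimal sketch of that iterparse pattern, assuming the articles live in <article> elements (the tag name is a guess, and my actual parser wraps each element in an object with body and id attributes), looks roughly like:
from lxml import etree

class XMLParser(object):
    def parseXML(self, fileobj):
        # iterparse streams the document instead of loading it whole
        for _event, elem in etree.iterparse(fileobj, tag='article'):
            yield elem
            # clear handled elements so memory stays flat across thousands of articles
            elem.clear()
            while elem.getprevious() is not None:
                del elem.getparent()[0]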
The modified class BlobstoreXMLZipLineInputReader has a slightly different next function:
def next(self):
    if not self._filestream:
        if not self._zip:
            self._zip = zipfile.ZipFile(self._reader(self._blob_key))
            self._entries = self._zip.infolist()[self._start_file_index:
                                                 self._end_file_index]
            self._entries.reverse()
        if not self._entries:
            raise StopIteration()
        entry = self._entries.pop()
        parser = XMLParser()
        # the result here is an iterator over the individual articles
        self._filestream = parser.parseXML(self._zip.open(entry.filename))
    try:
        article = self._filestream.next()
        self._article_index += 1
    except StopIteration:
        article = None
    if not article:
        self._filestream.close()
        self._filestream = None
        self._start_file_index += 1
        self._initial_offset = 0
        return self.next()
    return ((self._blob_key, self._start_file_index, self._article_index),
            article)
The map function then receives each of these articles, splits it into sentences, and then splits those into words:
def map_function(data):
    """Word count map function."""
    (entry, article) = data
    for s in split_into_sentences(article.body):
        for w in split_into_words(s.lower()):
            if w not in STOPWORDS:
                yield (w, article.id)
And the reducer aggregates words, and joins the ids for the articles they appear on:
def reduce_function(key, values):
    """Word count reduce function."""
    yield "%s: %s\n" % (key, list(set(values)))
This works beautifully on both the dev server and the live setup, up to around 10000 texts (there are not that many words in them). It generally takes no more than 10 seconds. The problem is when it goes a bit over that: mapreduce seems to hang, processing the job continuously. The number of processed items per shard just keeps incrementing, and my write-op limits are soon reached.
Q1. Is there a limit on how many map operations the mapreduce pipeline can do before it starts "behaving badly"?
Q2. Would there be a better approach to my problem?
Q3. I know this has been asked before, but can I circumvent the temporary mapreduce datastore writes? They're killing me...
P.S.: here's my main mapreduce call:
class XMLArticlePipeline(base_handler.PipelineBase):
    def run(self, filekey, blobkey):
        output = yield mapreduce_pipeline.MapreducePipeline(
            "process_xml",
            "backend.build_knowledgebase.map_function",
            "backend.build_knowledgebase.reduce_function",
            "backend.build_knowledgebase.BlobstoreXMLZipLineInputReader",
            "mapreduce.output_writers.BlobstoreOutputWriter",
            mapper_params={
                "blob_keys": [blobkey],
            },
            reducer_params={
                "mime_type": "text/plain",
            },
            shards=12)
        yield StoreOutput(filekey, output)
EDIT: I get some weird errors on the dev server when running a never-ending job:
[App Instance] [0] [dev_appserver_multiprocess.py:821] INFO Exception in HandleRequestThread
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver_multiprocess.py", line 819, in run
HandleRequestDirectly(request, client_address)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver_multiprocess.py", line 957, in HandleRequestDirectly
HttpServer(), request, client_address)
File "/usr/local/Cellar/python/2.7.2/lib/python2.7/SocketServer.py", line 310, in process_request
self.finish_request(request, client_address)
File "/usr/local/Cellar/python/2.7.2/lib/python2.7/SocketServer.py", line 323, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 2579, in __init__
BaseHTTPServer.BaseHTTPRequestHandler.__init__(self, *args, **kwargs)
File "/usr/local/Cellar/python/2.7.2/lib/python2.7/SocketServer.py", line 641, in __init__
self.finish()
File "/usr/local/Cellar/python/2.7.2/lib/python2.7/SocketServer.py", line 694, in finish
self.wfile.flush()
File "/usr/local/Cellar/python/2.7.2/lib/python2.7/socket.py", line 303, in flush
self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe

Google app engine key value error

I am writing a Google App Engine app, and I am getting this key error on incoming requests. From the backtrace, the KeyError is raised simply by accessing
self.request.headers
The entire code snippet is here; I just forward the headers unmodified:
response = fetch(
    "%s%s?%s" % (
        self.getApiServer(),
        self.request.path.replace("/twitter/", ""),
        self.request.query_string,
    ),
    self.request.body,
    method,
    self.request.headers,
)
and get method handling the request calling proxy()
# handle http get
def get(self, *args):
    parameters = self.convertParameters(self.request.query_string)
    # self.prepareHeader("GET", parameters)
    self.request.query_string = "&".join("%s=%s" % (quote(key), quote(value))
                                         for key, value in parameters.items())
    self.proxy(GET, *args)

def convertParameters(self, source):
    parameters = {}
    for pairs in source.split("&"):
        item = pairs.split("=")
        if len(item) == 2:
            parameters[item[0]] = unquote(item[1])
    return parameters
The error backtrace:
'CONTENT_TYPE'
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 513, in __call__
handler.post(*groups)
File "/base/data/home/apps/waytosing/1.342850593213842824/com/blogspot/zizon/twitter/RestApiProxy.py", line 67, in post
self.proxy(POST, *args)
File "/base/data/home/apps/waytosing/1.342850593213842824/com/blogspot/zizon/twitter/RestApiProxy.py", line 47, in proxy
self.request.headers,
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 240, in fetch
allow_truncated, follow_redirects)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 280, in make_fetch_call
for key, value in headers.iteritems():
File "/base/python_runtime/python_dist/lib/python2.5/UserDict.py", line 106, in iteritems
yield (k, self[k])
File "/base/python_runtime/python_lib/versions/1/webob/datastruct.py", line 40, in __getitem__
return self.environ[self._trans_name(item)]
KeyError: 'CONTENT_TYPE'
Any idea why this happens, or is this a known bug?
This looks weird. The docs mention that response Headers objects "do not raise an error when you try to get or delete a key that isn't in the wrapped header list. Getting a nonexistent header just returns None". It's not clear from the request documentation whether request.headers are also objects of this class, but even if they were regular dictionaries, iteritems seems to be misbehaving. So this might be a bug.
It might be worth inspecting self.request.headers before calling fetch, and seeing 1) its actual type, 2) its keys, and 3) whether getting self.request.headers['CONTENT_TYPE'] raises an error then.
But, if you simply want to solve your problem and move forward, you can try to bypass it like:
if 'CONTENT_TYPE' not in self.request.headers:
    self.request.headers['CONTENT_TYPE'] = None
(I'm suggesting setting it to None, because that's what a response Header object should return on non-existing keys)
Here's my observation about this problem:
When the content-type is application/x-www-form-urlencoded and the POST data is empty (e.g. jQuery.ajax GET, Twitter's favorite and retweet APIs...), the content-type header is dropped by Google App Engine.
You can add:
self.request.headers.update({'content-type':'application/x-www-form-urlencoded'})
before urlfetch.
Edit: indeed, looking at the error more carefully, it doesn't seem to be related to convertParameters, as the OP points out in the comments. I'm retiring this answer.
I'm not entirely sure what you mean by "just forward the headers unmodified", but have you taken a look at self.request.query_string before and after you call convertParameters? More to the point, you're leaving out any (valid) GET parameters of the form "key=" (that is, keys with empty values).
Maybe your original query_string had a value like "CONTENT_TYPE=", and your convertParameters is stripping it out.
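For instance, a more permissive convertParameters that keeps every key, whatever its value, could look like this (a sketch):
def convertParameters(self, source):
    parameters = {}
    for pairs in source.split("&"):
        # partition splits on the first '=' only, so "key=", a bare "key",
        # and values that themselves contain '=' are all preserved
        key, _sep, value = pairs.partition("=")
        if key:
            parameters[key] = unquote(value)
    return parameters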
This is a known issue (http://code.google.com/p/googleappengine/issues/detail?id=3427), with potential workarounds here: http://code.google.com/p/googleappengine/issues/detail?id=2040
