Can I have a temporary/alternative neo4j graph for testing?

I'm using Py2neo in a project. Most of the time the neo4j server runs on localhost so in order to connect to the graph I just do:
g = Graph()
But when I run tests I'd like to connect to a different graph, preferably one I can trash without any consequences.
I'd like to have a "production" graph, possibly set up in such a way that even though it also runs on localhost, the tests won't have access to it.
Can this be done?
UPDATE 0 - A better way to put this question might have been: how can I get my localhost Neo4j to serve up two databases on two different ports? Once I've got that working it's trivial to use the REST client to connect to one or the other. I'm running the latest .deb version of Neo4j on an Ubuntu workstation (if that matters).

You can have multiple instances of Neo4j running on the same machine by configuring them to use different ports, e.g. 7474 for development and 7473 for tests.
Graph() defaults to http://localhost:7474/db/data/ but you can also pass a connection URI explicitly:
dev = Graph()
test = Graph("http://localhost:7473/db/data/")
prod = Graph("https://remotehost.com:6789/db/data/")
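One way to keep tests away from the development graph is to decide the connection URI in a single place. The following is a minimal sketch, assuming a hypothetical environment variable named NEO4J_URI (py2neo does not read it itself; the name is made up for this example) and Python 3:

```python
import os

# py2neo's default URI, as stated in the answer above.
DEFAULT_URI = "http://localhost:7474/db/data/"


def graph_uri(environ=None):
    """Return the graph URI, preferring the hypothetical NEO4J_URI variable."""
    environ = os.environ if environ is None else environ
    return environ.get("NEO4J_URI", DEFAULT_URI)


# In the application or the test suite (requires py2neo, not imported here):
#     from py2neo import Graph
#     g = Graph(graph_uri())
print(graph_uri({"NEO4J_URI": "http://localhost:7473/db/data/"}))
```

The test runner would then export NEO4J_URI before starting, while normal runs fall back to the default.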

You can run neo4j server on a different machine and access it through REST service.
Inside the neo4j-server.properties, you can uncomment the line where it says IP address of 0.0.0.0
This allows the server to be accessed from anywhere. I don't know how this is done in Python, but in Java I access that server using the Java REST library for Neo4j. Take a look here:
https://github.com/rash805115/bookeeping/blob/master/src/main/java/database/service/impl/Neo4JRestServiceImpl.java
Update 0: There are three ways to do what you want.
Method 1: Start a neo4j instance on a separate machine, then access that instance through its REST API. To allow remote access, open conf/neo4j-server.properties, find this line and uncomment it.
#org.neo4j.server.webserver.address=0.0.0.0
Method 2: Start two neo4j instances on the same machine but on different ports, and use the REST service to access them. To do this, copy the neo4j distribution into two separate folders. Then edit these lines in conf/neo4j-server.properties and change the ports in at least one of them.
First Instance - org.neo4j.server.webserver.port=7474
org.neo4j.server.webserver.https.port=7473
Second Instance - org.neo4j.server.webserver.port=8484
org.neo4j.server.webserver.https.port=8483
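Editing the copied properties file can be scripted. This is a rough sketch that only rewrites the two port settings named above; the function name set_ports is made up for this example, and it works on the file's text so that reading and writing the file is left to the caller:

```python
import re


def set_ports(properties_text, http_port, https_port):
    """Return properties_text with the two webserver port settings replaced."""
    text = re.sub(r"(?m)^(org\.neo4j\.server\.webserver\.port)=.*$",
                  r"\g<1>=%d" % http_port, properties_text)
    text = re.sub(r"(?m)^(org\.neo4j\.server\.webserver\.https\.port)=.*$",
                  r"\g<1>=%d" % https_port, text)
    return text


sample = ("org.neo4j.server.webserver.port=7474\n"
          "org.neo4j.server.webserver.https.port=7473\n")
print(set_ports(sample, 8484, 8483))
```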
Method 3: From your comments it appears this is what you want, and it is indeed the easiest method: have two separate databases on the same Neo4j instance. You don't have to change any configuration files for this, just one line in your code. I have not done this in Python, but I have done the same in Java. Here is the Java code, so you can see how easy it is.
Production Code:
package rash.experiments.neo4j;
import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
public class Neo4JEmbedded
{
    public static void main(String args[])
    {
        GraphDatabaseService graphDatabaseService = new GraphDatabaseFactory().newEmbeddedDatabase("db/productiondata/");
        ExecutionEngine executionEngine = new ExecutionEngine(graphDatabaseService);
        try(Transaction transaction = graphDatabaseService.beginTx())
        {
            executionEngine.execute("create (node:Person {userId: 1})");
            transaction.success();
        }
        ExecutionResult executionResult = executionEngine.execute("match (node) return count(node)");
        System.out.println(executionResult.dumpToString());
    }
}
Test Code:
package rash.experiments.neo4j;
import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
public class Neo4JEmbedded
{
    public static void main(String args[])
    {
        GraphDatabaseService graphDatabaseService = new GraphDatabaseFactory().newEmbeddedDatabase("db/testdata/");
        ExecutionEngine executionEngine = new ExecutionEngine(graphDatabaseService);
        try(Transaction transaction = graphDatabaseService.beginTx())
        {
            executionEngine.execute("create (node:Person {userId: 1})");
            transaction.success();
        }
        ExecutionResult executionResult = executionEngine.execute("match (node) return count(node)");
        System.out.println(executionResult.dumpToString());
    }
}
Note the difference in line:
GraphDatabaseService graphDatabaseService = new GraphDatabaseFactory().newEmbeddedDatabase("db/testdata/");
This creates two separate folders, db/productiondata and db/testdata. Each folder contains its own data, and your code can use either one as required.
I am pretty sure, in your python code you have to do almost the same thing. Something like (Note that this code might not be correct):
g = Graph("/db/productiondata")
g = Graph("/db/testdata")

Unfortunately, this is a problem without a perfect solution right now. There are however a few options available which may suffice for what you need.
First, have a look at the py2neo build script: https://github.com/nigelsmall/py2neo/blob/release/2.0.5/bau
This is a bash script that spawns a new database instance for each version that needs testing, starting up with an empty store beforehand and closing down afterwards. It uses the default port 7474 but it should be an easy change to tweak this automatically in the properties file. Specifically here, you'll probably want to look at the test, neo4j_start and neo4j_stop functions.
Additionally, py2neo provides an extension called neobox:
http://py2neo.org/2.0/ext/neobox.html
This is intended to be a quick and simple way to set up new database instances running on free ports and might be helpful in this case.
Note that generally speaking, clearing down the data store between tests is a bad idea as this is a slow operation and can seriously impact the running time of your test suite. For that reason, a test database that lives for all tests is a better idea although requires a little thought when writing tests so as they don't overlap.
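One way to stop tests overlapping in a long-lived test database is to tag everything a test creates with a value unique to that test. This is only a sketch of that idea: the property name testRun and both helper names are made up for this example, and the call that actually sends data to Neo4j is left out.

```python
import uuid


def unique_run_id():
    """Return a random identifier used to tag every node one test creates."""
    return uuid.uuid4().hex


def tagged(properties, run_id):
    """Return a copy of properties with the (hypothetical) testRun tag added."""
    tagged_properties = dict(properties)
    tagged_properties["testRun"] = run_id
    return tagged_properties


run_id = unique_run_id()
print(tagged({"userId": 1}, run_id))
```

A test would then create and match nodes only with its own testRun value, so data left behind by other tests does not affect its assertions.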
Going forward, Neo4j will gain DROP functionality to help with this kind of work but it will likely be a few releases before this appears.

Related

Using an API / WebService in Python instead of C#

I have to use a web service: a script on my own web server should make GET requests to it regularly. Documentation exists with multiple C# examples. This should work (I could not get it running on my Windows PC).
https://integration.questback.com/integration.svc
You have created a service.
To test this service, you will need to create a client and use it to call the service. You can do this using the svcutil.exe tool from the command line with the following syntax:
svcutil.exe https://integration.questback.com/Integration.svc?wsdl
This will generate a configuration file and a code file that contains the client class. Add the two files to your client application and use the generated client class to call the Service. For example:
C#
class Test
{
    static void Main()
    {
        QuestBackIntegrationLibraryClient client = new QuestBackIntegrationLibraryClient();
        // Use the 'client' variable to call operations on the service.
        // Always close the client.
        client.Close();
    }
}
Since the server is Linux based and I don't know any C# or XML, I wanted to ask whether there is a way to make this run on a Linux server, preferably with Python (I know this question is quite vague, I'm sorry).
Thank you!

How (in what form) to share (deliver) a Python function?

The final outcome of my work should be a Python function that takes a JSON object as its only input and returns another JSON object as output. To be more specific, I am a data scientist, and the function I am speaking about is derived from data and delivers predictions (in other words, it is a machine learning model).
So, my question is how to deliver this function to the "tech team" that is going to incorporate it into a web service.
At the moment I face a few problems. First, the tech team does not necessarily work in a Python environment, so they cannot just "copy and paste" my function into their code. Second, I want to make sure that my function runs in the same environment as mine. For example, I can imagine that I use some library that the tech team does not have, or that they have a version that differs from the one I use.
ADDED
As a possible solution I am considering the following. I start a Python process that listens on a socket, accepts incoming strings, transforms them into JSON, gives the JSON to the "published" function and returns the output JSON as a string. Does this solution have disadvantages? In other words, is it a good idea to "publish" a Python function as a background process listening on a socket?

You have the right idea with using a socket, but there are tons of frameworks that do exactly what you want. Like hleggs, I suggest you check out Flask to build a microservice. This will let the other team post JSON objects in an HTTP request to your Flask application and receive JSON objects back. No knowledge of the underlying system or additional requirements required!
Here's a template for a Flask app that accepts and responds with JSON
from flask import Flask, request, jsonify
app = Flask(__name__)
@app.route('/', methods=['POST'])
def index():
    # your_function is your own prediction function
    json = request.json
    return jsonify(your_function(json))
if __name__=='__main__':
    app.run(host='0.0.0.0', port=5000)
Edit: embedded my code directly as per Peter Britain's advice

My understanding of your question boils down to:
How can I share a Python library with the rest of my team, that may not be using Python otherwise?
And how can I make sure my code and its dependencies are what the receiving team will run?
And that the receiving team can install things easily mostly anywhere?
This is a simple question with no straightforward answer... as you just mentioned that this may be integrated in some webservice, but you do not know the actual platform for this service.
You also ask:
As a possible solution I am considering the following. I start a Python process that listens on a socket, accepts incoming strings, transforms them into JSON, gives the JSON to the "published" function and returns the output JSON as a string. Does this solution have disadvantages? In other words, is it a good idea to "publish" a Python function as a background process listening on a socket?
For the simplest case, and as a starting point, I would generally say no. Starting network servers such as an HTTP server (which is built into Python) is super easy. But a service (even if qualified as "micro") means infrastructure, means security, etc.
What if the port you expect is not available on the deployment machine?
What happens when you restart that machine?
How will your server start or restart when there is a failure?
Would you eventually also need to provide an upstart or systemd service (on Linux)?
Will your simple socket or web server support multiple concurrent requests?
Is there a security risk in exposing a socket?
Etc, etc. When deployed, my experience with "simple" socket servers is that they end up being not so simple after all.
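For reference, the "super easy" built-in HTTP server mentioned above looks roughly like this. It is a sketch only, assuming Python 3 and a stand-in your_function (hypothetical); it has no error handling, and every caveat in the list above still applies to it.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def your_function(data):
    """Hypothetical stand-in for the real prediction function."""
    return {"echo": data}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, run the function, send JSON back.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length).decode("utf-8"))
        body = json.dumps(your_function(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def post_json(host, port, data):
    """POST data as JSON to the server and return the decoded JSON reply."""
    connection = HTTPConnection(host, port, timeout=5)
    try:
        connection.request("POST", "/", body=json.dumps(data).encode("utf-8"),
                           headers={"Content-Type": "application/json"})
        return json.loads(connection.getresponse().read().decode("utf-8"))
    finally:
        connection.close()


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(post_json("127.0.0.1", server.server_port, {"x": 1}))
    server.shutdown()
    server.server_close()
```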
In most cases, it will be simpler to avoid redistributing a socket service at first. And the proposed approach here could be used to package a whole service at a later stage in a simpler way if you want.
What I suggest instead is a simple command line interface nicely packaged for installation.
The minimal set of things to consider would be:
provide a portable mechanism to call your function on many OSes
ensure that you package your function such that it can be installed with all the correct dependencies
make it easy to install and of course provide some doc!
Step 1. The simplest common denominator would be to provide a command line interface that accepts the path to a JSON file and spits JSON on the stdout.
This would run on Linux, Mac and Windows.
The instructions here should work on Linux or Mac and would need a slight adjustment for Windows (only for the configure.sh script further down)
A minimal Python script could be:
#!/usr/bin/env python
"""
Simple wrapper for calling a function accepting JSON and returning JSON.
Save to predictor.py and use this way::
    python predictor.py sample.json
    [
      "a",
      "b",
      4
    ]
"""
from __future__ import absolute_import, print_function
import json
import sys
def predict(json_input):
    """
    Return predictions as a JSON string based on the provided `json_input` JSON
    string data.
    """
    # this will error out immediately if the JSON is not valid
    validated = json.loads(json_input)
    # <....> your code there
    with_predictions = validated
    # return a pretty-printed JSON string
    return json.dumps(with_predictions, indent=2)
def main():
    """
    Print the JSON string results of a prediction, loading an input JSON file from a
    file path provided as a command line argument.
    """
    args = sys.argv[1:]
    json_input = args[0]
    with open(json_input) as inp:
        print(predict(inp.read()))
if __name__ == '__main__':
    main()
Passing the path to a JSON file lets you process potentially large inputs.
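On the consuming side, all the tech team needs is to start a process and read its standard output. A sketch of that call follows; it is written in Python for consistency with the rest of this answer, whereas the tech team would use the equivalent in their own language. The helper name call_predictor is made up for this example.

```python
import json
import subprocess
import sys


def call_predictor(command, json_path):
    """Run the command-line predictor on json_path and return its parsed JSON output."""
    output = subprocess.check_output(list(command) + [json_path])
    return json.loads(output.decode("utf-8"))


# Example use, assuming predictor.py and sample.json exist in the current directory:
#     result = call_predictor([sys.executable, "predictor.py"], "sample.json")
```

check_output raises an exception when the script exits with a non-zero status, so invalid input shows up as an error in the caller instead of as empty output.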
Step 2. Package your function. In Python this is achieved by creating a setup.py script. This also takes care of installing any dependencies from PyPI, which ensures that the versions of the libraries you depend on are the ones you expect. Here I added nltk as an example of a dependency. Add yours: this could be scikit-learn, pandas, numpy, etc. This setup.py also automatically creates a bin/predict script, which will be your main command line interface:
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
from __future__ import absolute_import, print_function
from setuptools import setup
from setuptools import find_packages
setup(
    name='predictor',
    version='1.0.0',
    license='public domain',
    description='Predict your life with JSON.',
    packages=find_packages(),
    # add all your direct requirements here
    install_requires=['nltk >= 3.2, < 4.0'],
    # add all your command line entry points here
    entry_points={'console_scripts': ['predict = prediction.predictor:main']}
)
In addition as is common for Python and to make the setup code simpler I created a "Python package" directory moving the predictor inside this directory.
Step 3. You now want to package things such that they are easy to install. A simple configure.sh script does the job. It installs virtualenv, pip and setuptools, then creates a virtualenv in the same directory as your project and then installs your prediction tool in there (pip install . is essentially the same as python setup.py install). With this script you ensure that the code that will be run is the code you want to be run with the correct dependencies. Furthermore, you ensure that this is an isolated installation with minimal dependencies and impact on the target system. This is tested with Python 2 but should work quite likely on Python 3 too.
#!/bin/bash
#
# configure and installs predictor
#
ARCHIVE=15.0.3.tar.gz
mkdir -p tmp/
wget -O tmp/venv.tgz https://github.com/pypa/virtualenv/archive/$ARCHIVE
tar --strip-components=1 -xf tmp/venv.tgz -C tmp
/usr/bin/python tmp/virtualenv.py .
. bin/activate
pip install .
echo ""
echo "Predictor is now configured: run it with:"
echo " bin/predict <path to JSON file>"
At the end you have a fully configured, isolated and easy to install piece of code with a simple highly portable command line interface.
You can see it all in this small repo: https://github.com/pombredanne/predictor
You just clone or fetch a zip or tarball of the repo, then go through the README and you are in business.
Note that for more complex applications there is a more involved approach that vendors the dependencies, so that installation is easy and does not depend on the network. You can see it in https://github.com/nexB/scancode-toolkit, which I also maintain.
And if you really want to expose a web service, you could reuse this approach and package that with a simple web server (like the one built-in in the Python standard lib or bottle or flask or gunicorn) and provide configure.sh to install it all and generate the command line to launch it.

Your task is, in general terms, about productionizing a machine learning model where the consumer of the model may not be working in the same environment as the one used to develop it. I've been trying to tackle this problem for the past few years. Many companies face it, and it is aggravated by mismatches in skill set, objectives and environment (languages, runtime) between data scientists and developers. In my experience, the following options are available, each with its own advantages and downsides.
Option 1 : Build the prediction part of your model as a standalone web service using any lightweight tool in Python (for example, Flask). You should try to decouple the model development/training and prediction part as much as possible. The model that you have developed, must be serialized to some form so that the web server can use it.
How frequently is your machine learning model updated? If it is not done very frequently, the serialized model file (example: Python pickle file) can be saved to a common location accessible to the web server (say s3), loaded in memory. The standalone web server should offer APIs for prediction.
Please note that exposing a single model prediction using Flask would be simple. But scaling this web server if needed, configuring it with right set of libraries, authentication of incoming requests are all non-trivial tasks. You should choose this route only if you have dev teams ready to help with these.
If the model gets updated frequently, versioning your model file would be a good option. So in fact, you can piggyback on top of any version control system by checking in the whole model file if it is not too large. The web server can de-serialize (pickle.load) this file at startup/update and convert to a Python object on which you can call prediction methods.
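The serialize-then-load cycle described above can be sketched as follows. The model here is a plain dict holding a version string and the weights of a linear model, which is purely an assumption for illustration; a real model object from a machine learning library would be pickled the same way, provided the server has the same library versions installed.

```python
import pickle


def predict(model, features):
    """Score one feature vector with a linear model stored as a plain dict."""
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]


# Hypothetical model produced by training, with a version label for tracking.
model = {"version": "v1", "weights": [0.5, -1.0], "bias": 2.0}

blob = pickle.dumps(model)     # what training would write to shared storage
restored = pickle.loads(blob)  # what the web server would do at startup or update
print(restored["version"], predict(restored, [4.0, 1.0]))  # prints: v1 3.0
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.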
Option 2 : Use the Predictive Model Markup Language. PMML was developed specifically for this purpose: a data interchange format for predictive models that is independent of environment. The data scientist develops a model and exports it to a PMML file. The web server used for prediction can then consume the PMML file to make predictions. You should definitely check the open scoring project, which lets you expose machine learning models via REST APIs for deploying models and making predictions.
Pros: PMML is a standardized format, and open scoring is a mature project with a good development history.
Cons: PMML may not support all models. Open scoring is primarily useful if your tech team's development platform of choice is the JVM. Exporting machine learning models from Python is not straightforward, but R has good support for exporting models as PMML files.
Option 3 : There are some vendors offering dedicated solutions for this problem. You will have to evaluate cost of licensing, cost of hardware as well as stability of the offerings for taking this route.
Whichever option you choose, please consider the long term costs of supporting that option. If your work is in a proof of concept stage, Python flask based web server + pickled model files will be the best route. Hope this answer helps you!

As already suggested in other answers, the best option would be to create a simple web service. Besides Flask, you may want to try bottle, which is a very thin one-file web framework. Your service may look as simple as:
from bottle import route, run, request
@route('/', method='POST')
def index():
    # bottle serializes a returned dict to JSON automatically
    return my_function(request.json)
run(host='0.0.0.0', port=8080)
To keep the environments the same, look at virtualenv, which creates an isolated environment that avoids conflicts with already installed packages, and at pip, which installs exact versions of packages into that virtual environment.

I guess you have 3 possibilities :
convert python function to javascript function:
Assuming the "tech team" uses JavaScript for the web service, you may try to convert your Python function directly to a JavaScript function (which will be really easy to integrate into a web page) using empythoned (based on emscripten).
The downside of this method is that each time you need to update or upgrade your Python function, you also need to convert it to JavaScript again, then check and validate that the function continues to work.
simple API server + JQuery
If the conversion method is impossible, I agree with @justin-bell: you may use Flask.
getting JSON as input > JSON to your function parameter > run python function > convert function result to JSON > serve the JSON result
Assuming you choose the Flask solution, the "tech team" will only need to send an asynchronous GET/POST request containing all the arguments as a JSON object when they need a result from your Python function.
websocket server + socket.io
You can also take a look at WebSocket to dispatch to the web service (look at flask + websocket for your side and socket.io for the web service side).
=> WebSocket is really useful when you need to push or receive data with low cost and latency to (or from) a lot of users (I am not sure that WebSocket is the best fit for your need).
Regards

Amazon EC2 file structure / web app with separate Python backend?

I'm currently running a t2.micro instance on EC2 right now. I have the html/web interface side of it working, along with a MySQL database.
The site allows users to register and stores them in the DB via a PHP script.
I want there to be an actual Python application that queries the MySQL database and returns user data, to then be executed in a Python script.
What I cannot find is whether I host this Python application as a totally separate instance or if it can exist on the same instance, in a different directory. I ultimately just need to query the database, which makes me think it must exist on the same instance.
Could someone please provide some guidance?
Let me just be clear: this is not a Python web app. This Python backend is entirely separate except making queries against the database.

Either approach is possible, but there are pros & cons to each.
Running separate Python app on the same server:
Pros:
Setting up local access to the database is fairly simple
Only need to handle backups or making snapshots, etc. for a single instance
Cons:
Harder to scale up individual pieces if you need more memory, processing power, etc. in the future
Running the Python app on a separate server:
Pros:
Separate pieces means you can scale up & down the hardware each piece is running on, according to their individual needs
If you're using all micro instances, you get more resources to work with, without any extra costs (assuming you're still meeting all the other 'free tier eligible' criteria)
Cons:
In general, more pieces == more time spent on configuration, administration tasks, etc.
You have to open up the database to non-local access
Simplest: open up the database to access from anywhere (e.g. all remote IP addresses), and have the Python app log in via the internet
Somewhat safer, more complex: set the Python app server up with an elastic IP, open up the database to access only from that address
Much safer, more complex: set up your own virtual private cloud (VPC), and allow connections to the database only from within the VPC. You'd have to configure public access for each of the servers for whatever public traffic you'll have, presumably ports 80 and/or 443.

Neo4j ImpermanentDatabase in python unittests

I am trying to create unit tests for a python project that will interface with a Neo4j Graph database.
Currently, I am implementing the embedded graph database, but will likely migrate to a REST interface if I choose to deploy this to a web application.
I have installed v1.9rc2 of the embedded neo4j project, installed via pip in a virtual environment.
There are mentions of a Java class org.neo4j.test.TestGraphDatabaseFactory, here, which sounds perfect for what I have in mind. I am currently reading and writing to a database on file, which is OK, but I am having trouble finding a way to properly clean up after each test that doesn't include a call to shutil.rmtree ... or is that how it should be done?
Another possible method is to create and shutdown the database for each test, via the setUp and tearDown methods of my TestCase.
>>> import neo4j
>>> print neo4j.__version__
'1.9.c2'

The best practice is to create and shutdown the database individually for each test using setUp/tearDown - exactly as you've mentioned.
Side note: 1.9rc2 is rather outdated; consider upgrading to the latest stable release, since a couple of bugs have been fixed since then.
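A per-test setUp/tearDown pair could look like the following sketch. It only manages a fresh temporary store directory; the calls that open and shut down the embedded database are shown as comments because they depend on the neo4j-embedded package, and the class name GraphTestCase is made up for this example.

```python
import os
import shutil
import tempfile
import unittest


class GraphTestCase(unittest.TestCase):
    def setUp(self):
        # Every test gets its own empty store directory.
        self.db_path = tempfile.mkdtemp(prefix="neo4j-test-")
        # Open the embedded database on self.db_path here
        # (assumed step; not executed in this sketch).

    def tearDown(self):
        # Shut the database down first, then delete its store directory.
        shutil.rmtree(self.db_path, ignore_errors=True)

    def test_store_directory_is_fresh(self):
        self.assertEqual(os.listdir(self.db_path), [])


if __name__ == "__main__":
    unittest.main()
```

Calling shutil.rmtree in tearDown keeps the cleanup in one place, so individual tests never need to call it themselves.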

This is the way they do it at the official Python Neo4j Driver, it should probably be considered "a good example" considering where it's coming from.
class ServerTestCase(TestCase):
    """ Base class for test cases that use a remote server.
    """
    known_hosts = KNOWN_HOSTS
    known_hosts_backup = known_hosts + ".backup"
    def setUp(self):
        if isfile(self.known_hosts):
            if isfile(self.known_hosts_backup):
                remove(self.known_hosts_backup)
            rename(self.known_hosts, self.known_hosts_backup)
    def tearDown(self):
        if isfile(self.known_hosts_backup):
            if isfile(self.known_hosts):
                remove(self.known_hosts)
            rename(self.known_hosts_backup, self.known_hosts)
Here's the full source file: https://github.com/neo4j/neo4j-python-driver/blob/1.1/test/util.py

When do I generate new GUID's for COM Servers? (Examples in Python)

I’m new at making COM servers and working with COM from Python so I want to clarify a few things I could not find explicit answers for:
Creating GUID’s Properly for COM servers
Do I generate:
The GUID for my intended COM server manually, copy it and use that # for the server from then on in? Therefore, when I distribute the application the other users will use the same GUID I created during development.
A new GUID each time the application or COM server object is initialized?
A new GUID on a per computer basis, only once during the initial setup then have the app save the GUID # and on future loads it pulls that # from the users setup file?
Example Scenario 1:
i) print(pythoncom.CreateGuid()) #in interpreter
ii) _reg_clsid_ = copy above GUID into your app
Example Scenario 2:
i)_reg_clsid_ = pythoncom.CreateGuid()
Example Scenario 3:
if os.path.isfile(url):
    GUID = load_previous_generated_GUID(url)
else:
    # first time running the application, or the setup file is missing
    GUID = pythoncom.CreateGuid()
    save_GUID_to_setup_file(GUID)
Can I use GUID’s for tracking program/COM server versions?
If scenario #1 above is correct then:
When I make an upgrade I can test for the old GUID so I can interact with it correctly?
TODO: How do I get the GUID of a COM server from Python and/or VBA?

You definitely want scenario 1. You should generate a CLSID once during development, and use that until you change the set of functions your class exports (to COM).
Not only can you use CLSIDs for tracking versions, you must change the CLSIDs whenever you release a new version, or else you break COM identity rules; breaking the rules usually results in obscure failure modes when invoking your object out-of-process.
Typically, however, you don't expose CLSIDs directly to code, you expose PROGIDs instead, which are human-readable strings with embedded versioning information, and use the Win32 API CLSIDFromProgID to convert between the two. (VBA will do this for you; Python may do this too.)
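Scenario 1 can be sketched as follows. The CLSID and ProgID values are placeholders for illustration only, and the standard library uuid module stands in for pythoncom.CreateGuid so that the snippet runs without pywin32; the class attributes follow the pywin32 COM server naming used in the question (_reg_clsid_).

```python
import uuid

# Placeholder CLSID for illustration only: generate your own once and paste it here.
CLSID_V1 = "{3F2504E0-4F89-41D3-9A0C-0305E82C3301}"


def new_clsid():
    """Return a fresh GUID in the braced, upper-case form used for CLSIDs."""
    return "{%s}" % str(uuid.uuid4()).upper()


class ExampleServer(object):
    _reg_clsid_ = CLSID_V1             # fixed during development (scenario 1)
    _reg_progid_ = "Example.Server.1"  # hypothetical versioned ProgID
    _public_methods_ = ["ping"]

    def ping(self):
        return "pong"


# Run new_clsid() once per released version and paste the result into the source;
# never call it at import time, or the CLSID changes on every run (scenario 2).
print(new_clsid())
```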
