Graph library API [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I am creating a library to support some standard graph traversals. Some of the graphs are defined explicitly: i.e., all edges are added by providing a data structure, or by repeatedly calling a relevant method. Some graphs are only defined implicitly: i.e., I can only provide a function that, given a node, will return its children (in particular, all the infinite graphs I traverse must be defined implicitly, of course).
The traversal generator needs to be highly customizable. For example, I should be able to specify whether I want DFS post-order/pre-order/in-order, BFS, etc.; in which order the children should be visited (if I provide a key that sorts them); whether the set of visited nodes should be maintained; whether the back-pointer (pointer to parent) should be yielded along with the node; etc.
I am struggling with the API design for this library (the implementation is not complicated at all, once the API is clear). I want it to be elegant, logical, and concise. Is there any graph library that meets these criteria that I can use as a template (doesn't have to be in Python)?
Of course, if there's a Python library that already does all of this, I'd like to know, so I can avoid coding my own.
(I'm using Python 3.)

If you need to handle infinite graphs, then you are going to need some kind of functional interface to graphs (as you say in the question). So I would make that the standard representation and provide helper functions that take other representations and generate a functional representation.
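A minimal sketch of that idea, assuming a "functional representation" means a plain function from a node to its successors. The names `Successors`, `from_adjacency` and `bfs` are hypothetical and chosen only for illustration:

```python
from collections import deque
from itertools import islice
from typing import Callable, Hashable, Iterable, Iterator, Mapping

# Assumption: a graph is just a function from a node to its successors.
Successors = Callable[[Hashable], Iterable[Hashable]]


def from_adjacency(adj: Mapping[Hashable, Iterable[Hashable]]) -> Successors:
    """Wrap an explicit adjacency mapping as a successor function."""
    return lambda node: adj.get(node, ())


def bfs(successors: Successors, start: Hashable) -> Iterator[Hashable]:
    """Yield nodes in breadth-first order, visiting each node once."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node
        for child in successors(node):
            if child not in seen:
                seen.add(child)
                queue.append(child)


# Explicit graph, adapted to the functional form.
explicit = from_adjacency({"a": ["b", "c"], "b": ["d"], "c": ["d"]})
print(list(bfs(explicit, "a")))  # ['a', 'b', 'c', 'd']

# Implicit infinite graph: n -> 2n and 2n + 1. Only a prefix is consumed.
infinite = lambda n: (2 * n, 2 * n + 1)
print(list(islice(bfs(infinite, 1), 7)))  # [1, 2, 3, 4, 5, 6, 7]
```

Because the traversal is a generator, the caller decides how much of an infinite graph to consume. Note that the `seen` set still grows with every node visited.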
For the results, maybe you can yield (you imply a generator, and I think that is a good idea) a series of result objects, each of which represents a node. If the user wants more info, like backlinks, they call a method on that object, and the extra information is provided (calculated lazily where possible, so that you avoid that cost for people who don't need it).
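One way this could look, as a sketch only: each yielded object keeps a single reference to the visit that discovered it, and the full path is rebuilt only when someone asks for it. The `Visit` class and `bfs_visits` function are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Visit:
    """One traversal result: a node plus a link to the visit that reached it."""
    node: Hashable
    parent: Optional["Visit"] = None

    def path(self) -> List[Hashable]:
        """Rebuild the path from the start node; computed only on request."""
        nodes = []
        visit: Optional["Visit"] = self
        while visit is not None:
            nodes.append(visit.node)
            visit = visit.parent
        return nodes[::-1]


def bfs_visits(
    successors: Callable[[Hashable], Iterable[Hashable]], start: Hashable
) -> Iterator[Visit]:
    """Breadth-first traversal that yields Visit objects instead of bare nodes."""
    seen = {start}
    queue = deque([Visit(start)])
    while queue:
        visit = queue.popleft()
        yield visit
        for child in successors(visit.node):
            if child not in seen:
                seen.add(child)
                queue.append(Visit(child, visit))


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
for v in bfs_visits(lambda n: graph.get(n, ()), "a"):
    if v.node == "d":
        print(v.path())  # ['a', 'b', 'd']
```

Callers that only need `v.node` pay for one extra reference per visit; the path list is never built for them.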
You don't mention whether the graph is directed or not. Obviously you can treat all graphs as directed and return both directions, but then the implementation is not as efficient. Typically (e.g. JGraphT), libraries have different interfaces for different kinds of graph.
(I suspect you're going to have to iterate a lot on this before you get a good balance between an elegant API and efficiency.)
Finally, are you aware of the functional graph library? I am not sure how it will help, but I remember thinking (years ago!) that the API there was a nice one.

The traversal algorithm and the graph data structure implementation should be separate, and should talk to each other only through the standard API. (If they are coupled, each traversal algorithm would have to be rewritten for every implementation.)
So my question really has two parts:
How to design the API for the graph data structure (used by graph algorithms such as traversals and by client code that creates/accesses graphs)
How to design the API for the graph traversal algorithms (used by client code that needs to traverse a graph)
I believe C++ Boost Graph Library answers both parts of my question very well. I would expect it can be (theoretically) rewritten in Python, although there may be some obstacles I don't see until I try.
Incidentally, I found a website that deals with question 1 in the context of Python: http://wiki.python.org/moin/PythonGraphApi. Unfortunately, it hasn't been updated since Aug 2011.

Related

Python example of 3D bin packing problem and visualization

I am wondering if there are any python-based applications and examples about the 3D bin packing problem? I am facing a problem of planning the loading of thousands of items/boxes/pallets into the ocean containers (mainly 40HC).
Manual planning, as most companies currently do it, is very inefficient and painful. I am very interested to know if there is any Python-based optimization toolkit that helps to solve such a problem in a relatively accurate manner.
Allocating the items to containers is relatively easy, as I see from an example here: https://developers.google.com/optimization/bin/bin_packing#complete_programs
What is challenging is to generate a concrete plan telling people how to load each item into a specific container, such as which box needs to be placed inside first and which item needs to be placed on top of others, so that the container storage space is maximally utilized.
I also hope to have a visual output of the loading plan. Are there any good coding examples?
I work on the Bin Packing problem and the only open-source python implementations that I have seen are those of academic research papers or heuristics. These have various limitations though such as not handling a large number of items or various physical constraints, and lacking a practical simulation guide for packing.
If you don't necessarily need access to the code, there is a growing number of sophisticated software tools for your use case, such as DeepPack by InstaDeep, which is currently open for free access. It determines an optimal packing sequence using reinforcement learning, and shows a 3D simulation of the item packing order.

NLP to match with previous requests [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
I am working on a requirement where I have a history of previous requests. Requests may be like "Send me a report of .." or "Get me this doc"; each one gets assigned to someone, and that person responds.
I need to build an app which will analyse the previous requests, so that when a new request arrives and matches any of the previous ones, I can recommend the previous request's solution.
I am trying to implement the above using Python, and after some research I found that doc2vec is one approach: convert the previous requests to vectors and match them against the vector of the new request. I want to know, is this the right approach, or are better approaches available?
There are several different approaches to your problem. There is no right or wrong answer, only the one that best fits your data, objectives and expected results. To mention a few:
Vectorization (doc2vec)
This approach builds a vector representation of a document from the vectors of its individual words, taken from a pretrained source (these so-called embeddings can be more general, with worse results in narrow contexts, or more specific, fitting a particular type of text better).
To match a new request against this vector representation of your document, the new request has to share words with a closely related vector representation; otherwise it won't work.
Keyword matching (or topicalization)
A simpler approach, where a document is classified by its most representative keywords (using techniques such as TF-IDF or an even simpler word distribution).
To match, a new request has to include the keywords of the document.
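As a rough illustration of keyword matching, here is a small TF-IDF and cosine-similarity sketch using only the standard library. The tokenizer, the smoothed IDF formula and the sample requests are assumptions made for this example; a real system would more likely use an established library and proper text preprocessing.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that terms present in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def best_match(query, docs):
    """Index of the stored request most similar to the new request."""
    vectors, idf = tfidf_vectors(docs)
    # Words never seen in the history carry no information and are dropped.
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [cosine(q, v) for v in vectors]
    return max(range(len(docs)), key=scores.__getitem__)


history = [
    "send me the monthly sales report",
    "get me the onboarding document",
    "reset my vpn password",
]
print(best_match("please send the sales report for march", history))  # 0
```

The returned index can then be used to look up the solution that was given for the matched request.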
Graph Based Approach
I've worked with this approach for Question Answering in my Master's research. In it, each document is modeled as a graph node connected to its keywords (which are also nodes). Each word in the graph is related to other words, and together they form a network through which the document is accessed.
To match a new request, the keywords from the request are retrieved and "spread" using one of many network traversal techniques, attempting to reach the closest document in the graph. You can see how I documented my approach here. However, this approach requires either an already existing set of inter-word relations (WordNet for a simpler approach) or a good deal of time spent annotating word relations.
Final Words
However, if you're interested in matching "this document" to "Annex A from e-mail 5", that's a whooooole other problem, one that is actually not solved. You can attempt to use coreference resolution for references inside the same paragraph or phrase, but that won't work across different documents (e-mails). If you want to win some notoriety in NLP (actually NLU - Natural Language Understanding), that's a research area to delve into.

Cyclomatic complexity metric practices for Python [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
I have a relatively large Python project that I work on, and we don't have any cyclomatic complexity tools as a part of our automated test and deployment process.
How important are cyclomatic complexity tools in Python? Do you or your project use them and find them effective? I'd like a nice before/after story if anyone has one so we can take a bit of the subjectiveness out of the answers (i.e. before we didn't have a cyclo-comp tool either, and after we introduced it, good thing A happened, bad thing B happened, etc). There are a lot of other general answers to this type of question, but I didn't find one for Python projects in particular.
I'm ultimately trying to decide whether or not it's worth it for me to add it to our processes, and what particular metric and tool/library is best for large Python projects. One of our major goals is long term maintenance.
We used the RADON tool in one of our projects which is related to Test Automation.
RADON
Depending on new features and requirements, we need to add, modify, update or delete code in that project. Also, 4-5 people were working on it. So, as part of the review process, we identified and used the RADON tool, since we want our code to be maintainable and readable.
Depending on the RADON tool's output, there were several times we refactored our code, added more methods and modified the looping.
Please let me know if this is useful to you.
Python isn't special when it comes to cyclomatic complexity. CC measures how much branching logic is in a chunk of code.
Experience shows that when the branching is "high", that code is harder to understand and change reliably than code in which the branching is lower.
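To make "how much branching" concrete, here is a rough sketch that approximates cyclomatic complexity by counting decision points in the syntax tree. It is not the exact algorithm used by radon or mccabe; the set of counted node types is an assumption of this example.

```python
import ast

# Node types counted as one extra decision point each (an assumption of this
# sketch; real tools also handle comprehensions, match statements and more).
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                 ast.ExceptHandler, ast.IfExp)


def approx_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            score += len(node.values) - 1
    return score


SAMPLE = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
'''

print(approx_complexity(SAMPLE))  # 5
```

The sample scores 5: the base path, two `if` statements, one `and`, and one `for` loop.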
With metrics, it typically isn't absolute values that matter; it is relative values as experienced by your organization. What you should do is to measure various metrics (CC is one) and look for a knee in the curve that relates that metric to bugs-found-in-code. Once you know where the knee is, ask coders to write modules whose complexity is below the knee. This is the connection to long-term maintenance.
What you don't measure, you can't control.
wemake-python-styleguide supports both radon and mccabe implementations of Cyclomatic Complexity.
There are also different complexity metrics that are not covered by just Cyclomatic Complexity, including:
Number of function decorators; lower is better
Number of arguments; lower is better
Number of annotations; higher is better
Number of local variables; lower is better
Number of returns, yields, awaits; lower is better
Number of statements and expressions; lower is better
Read more about why it is important to obey them: https://sobolevn.me/2019/10/complexity-waterfall
They are all covered by wemake-python-styleguide.
Repo: https://github.com/wemake-services/wemake-python-styleguide
Docs: https://wemake-python-stylegui.de
You can also use the mccabe library. It counts only McCabe complexity, and can be integrated into your flake8 linter.

Python - side effects/purity analysis tools? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Are there any existing tools for side effects/purity analysis in Python, similar to http://jppa.sourceforge.net in Java?
I don't know of any that exist, but here are some general approaches to making one:
Analysing source files as text - using regular expressions to find things that show a function definitely isn't pure - e.g. the global keyword. For practical purposes, most decently written functions that only have a return statement in the body are likely to be pure. On the other hand, if a function doesn't have a return statement, it is either useless, or impure.
Analysing functions in a source file as code. If testing a function in isolation produces a NameError, you know that it is either impure (because it doesn't have access to variables at a higher level) or has a mistake in it (referring to a variable before it is defined, or some such). The latter case should be covered by normal testing, however. The inspect module's isfunction function may be useful if you want to do this.
For each function you test, if it has a relatively small domain (e.g. one input that can only be 1, 2, 3 or 4), then you could exhaustively test all possible inputs and get a certain answer this way. If it has a bounded but large domain (e.g. all the real numbers between 0 and 1000, which is infinite but bounded, or all the integers between -12345 and 67890), then you could try sampling a selection of inputs from that domain and use that to estimate a probability of purity. However, this approach may not be very useful, as the domain of the function is unlikely to be specified, so you may only be able to check it if you wrote the function, in which case you may not need to analyse it anyway.
Doing something clever, possibly in combination with the above techniques. For instance, making a neural network, with the input as the text of a function, and the output as the likelihood of it being pure. You could then train the network on examples of functions you know to be pure or impure, and then use it on functions of unknown purity.
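The exhaustive-testing idea above can be sketched as a repeatability check: call the function several times on every input in a small domain and report inputs whose results differ. The helper name `unrepeatable_inputs` is hypothetical, and passing this check does not prove purity, since side effects such as printing or writing files are invisible to it.

```python
def unrepeatable_inputs(func, domain, repeats=3):
    """Return the inputs for which repeated calls gave differing results."""
    suspicious = []
    for x in domain:
        results = [func(x) for _ in range(repeats)]
        if any(r != results[0] for r in results[1:]):
            suspicious.append(x)
    return suspicious


def square(x):
    return x * x


_calls = 0


def drifting(x):
    """Deliberately impure: the result depends on hidden global state."""
    global _calls
    _calls += 1
    return x + _calls


print(unrepeatable_inputs(square, range(1, 5)))    # []
print(unrepeatable_inputs(drifting, range(1, 5)))  # [1, 2, 3, 4]
```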
Edit:
After someone downvoted, I came back to this question with new knowledge! The ast module should make it relatively easy to write your own analysis tool like this, as it gives you access to the abstract syntax tree of the code. It should be fairly easy to walk through this tree and see if there is anything preventing purity. This is a much better approach than analysing source files as text, and I might have a go at it at some point.
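A minimal sketch of such an ast-based check is below. The rules are deliberately crude assumptions: it flags `global`/`nonlocal` statements, calls to a short illustrative list of I/O built-ins, and assignments to attributes or items. It will produce both false positives (mutating a local list) and false negatives (calling an impure helper).

```python
import ast

# Illustrative and incomplete list of built-ins with obvious side effects.
IMPURE_CALLS = {"print", "input", "open", "exec", "eval"}


def purity_warnings(source: str) -> dict:
    """Map each function name to a list of reasons it may be impure."""
    report = {}
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        reasons = []
        for node in ast.walk(func):
            if isinstance(node, (ast.Global, ast.Nonlocal)):
                reasons.append(f"line {node.lineno}: global/nonlocal statement")
            elif (isinstance(node, ast.Call)
                  and isinstance(node.func, ast.Name)
                  and node.func.id in IMPURE_CALLS):
                reasons.append(f"line {node.lineno}: calls {node.func.id}()")
            elif isinstance(node, (ast.Assign, ast.AugAssign)):
                targets = (node.targets if isinstance(node, ast.Assign)
                           else [node.target])
                if any(isinstance(t, (ast.Attribute, ast.Subscript))
                       for t in targets):
                    reasons.append(
                        f"line {node.lineno}: assigns to an attribute or item")
        report[func.name] = reasons
    return report


SAMPLE = '''
counter = 0

def add(a, b):
    return a + b

def bump():
    global counter
    counter += 1
    print(counter)
'''

report = purity_warnings(SAMPLE)
for name, reasons in report.items():
    print(name, "looks pure" if not reasons else reasons)
```

An empty list of reasons only means that none of these particular patterns were found, not that the function is proven pure.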
Finally, this question might also be useful, and also this one, which is basically a duplicate of this question.

Good geometry library in python? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I am looking for a good and well developed library for geometrical manipulations and evaluations in python, like:
evaluate the intersection between two lines in 2D and 3D (if present)
evaluate the point of intersection between a plane and a line, or the line of intersection between two planes
evaluate the minimum distance between a line and a point
find the orthonormal to a plane passing through a point
rotate, translate, mirror a set of points
find the dihedral angle defined by four points
I have a compendium book for all these operations, and I could implement it but unfortunately I have no time, so I would enjoy a library that does it. Most operations are useful for gaming purposes, so I am sure that some of these functionalities can be found in gaming libraries, but I would prefer not to include functionalities (such as graphics) I don't need.
Any suggestions? Thanks.
Perhaps take a look at SymPy.
Shapely is a nice python wrapper around the popular GEOS library.
I found pyeuclid to be a great simple general purpose euclidean math package. Though the library may not contain exactly the problems that you mentioned, its infrastructure is good enough to make it easy to write these on your own.
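To give a sense of how little code some of the requested operations need when written by hand, here is a pure-Python sketch of three of them on 3D tuples. It does not use pyeuclid's API; all function names here are hypothetical, and the dihedral angle is returned unsigned because sign conventions vary.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def point_line_distance(p, a, d):
    """Minimum distance from point p to the line through a with direction d."""
    return norm(cross(sub(p, a), d)) / norm(d)


def line_plane_intersection(a, d, q, n, eps=1e-12):
    """Point where the line a + t*d meets the plane through q with normal n.

    Returns None when the line is parallel to the plane.
    """
    denom = dot(d, n)
    if abs(denom) < eps:
        return None
    t = dot(sub(q, a), n) / denom
    return tuple(x + t * y for x, y in zip(a, d))


def dihedral_angle(p0, p1, p2, p3):
    """Unsigned dihedral angle (radians) between planes p0p1p2 and p1p2p3."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    return math.atan2(norm(cross(n1, n2)), dot(n1, n2))


print(point_line_distance((0, 1, 0), (0, 0, 0), (1, 0, 0)))  # 1.0
print(line_plane_intersection((0, 0, -1), (0, 0, 1), (0, 0, 0), (0, 0, 1)))
print(math.degrees(dihedral_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))))
```

The sketch skips input validation (zero-length directions, degenerate point sets), which a maintained library would handle.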
CGAL has Python bindings too.
I really want a good answer to this question, and the ones above left me dissatisfied. However, I just came across pythonocc which looks great, apart from lacking good docs and still having some trouble with installation (not yet pypi compatible). The last update was 4 days ago (June 19th, 2011). It wraps OpenCascade which has a ton of geometry and modeling functionality. From the pythonocc website:
pythonOCC is a 3D CAD/CAE/PLM development framework for the Python programming language. It provides features such as advanced topological and geometrical operations, data exchange (STEP, IGES, STL import/export), 2D and 3D meshing, rigid body simulation, parametric modeling.
[EDIT: I've now downloaded pythonocc and began working through some of the examples]
I believe it can perform all of the tasks mentioned, but I found it to be unintuitive to use. It is created almost entirely from SWIG wrappers, and as a result, introspection of the commands becomes difficult.
geometry-simple has classes Point Line Plane Movement in ~ 300 lines, using only numpy; take a look.
You may be interested in Python module SpaceFuncs from OpenOpt project, http://openopt.org
SpaceFuncs is a tool for 2D, 3D and N-dimensional geometric modeling, with support for parametrized calculations, numerical optimization and solving systems of geometrical equations.
Python Wild Magic is another SWIG-wrapped library. It is a gaming library, but you could manipulate the SWIG library file to exclude any undesired graphics stuff from the Python API.
