Python - side effects/purity analysis tools? [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Are there any existing tools for side effects/purity analysis in Python, similar to http://jppa.sourceforge.net in Java?

I don't know of any that exist, but here are some general approaches to making one:
Analysing source files as text - using regular expressions to find constructs that show a function definitely isn't pure, e.g. the global keyword. As a practical heuristic, a decently written function whose body consists only of a return statement is likely to be pure. Conversely, a function with no return statement at all is either useless or impure.
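A minimal sketch of the text-based approach (the patterns and the helper name are my own illustration; a real checker would need far more patterns, e.g. for print, attribute mutation, and I/O):

```python
import re

GLOBAL_RE = re.compile(r"^\s*global\s+\w+", re.MULTILINE)
RETURN_RE = re.compile(r"^\s*return\b", re.MULTILINE)

def textual_purity_hint(source):
    """Crude text-level screening, as described above: 'impure' if a
    global statement appears, 'useless-or-impure' if there is no return
    statement at all, otherwise 'maybe-pure'. Only a heuristic."""
    if GLOBAL_RE.search(source):
        return "impure"
    if not RETURN_RE.search(source):
        return "useless-or-impure"
    return "maybe-pure"

print(textual_purity_hint("def f(x):\n    return x + 1\n"))         # maybe-pure
print(textual_purity_hint("def g():\n    global n\n    n += 1\n"))  # impure
```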
Analysing functions in a source file as code. If calling a function in isolation raises a NameError, you know it is either impure (it relies on variables from an enclosing scope) or contains a mistake (referring to a variable before it is defined, say); the latter case should be caught by normal testing anyway. The inspect module's isfunction may be useful if you want to do this.
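A sketch of this isolation test (the helper name and approach are my own illustration, not an existing tool): rebuild the function with empty module globals, call it, and treat a NameError as a red flag.

```python
import builtins
import types

def probe_isolation(fn, *args):
    """Re-create fn with empty module globals (builtins only) and call it.
    A NameError suggests fn reads names outside its own scope, which is
    one of the impurity signals described above. A sketch only: default
    arguments and closures are passed through unchanged."""
    stripped = types.FunctionType(
        fn.__code__,
        {"__builtins__": builtins},  # only builtins are visible
        fn.__name__,
        fn.__defaults__,
        fn.__closure__,
    )
    try:
        stripped(*args)
        return True   # ran fine in isolation
    except NameError:
        return False  # leaned on an outer name

offset = 10

def leaky(x):
    return x + offset      # reads a module-level name

def clean(x):
    return abs(x) * 2      # uses only its argument and builtins

print(probe_isolation(leaky, 1))   # False
print(probe_isolation(clean, -3))  # True
```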
For each function you test: if it has a relatively small domain (e.g. one input that can be 1, 2, 3 or 4), you can exhaustively test all possible inputs and get a definite answer. If the domain is finite but large (e.g. all the integers between -12345 and 67890), or infinite but bounded (e.g. all the real numbers between 0 and 1000), you can sample a selection of inputs from it and derive a probability of purity. This approach may not be very useful in practice, though: the domain of a function is rarely specified, so you may only be able to check functions you wrote yourself, in which case you probably don't need the analysis anyway.
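A sketch of testing over a small domain (names are illustrative; note that agreement across repeated calls is only evidence of purity, not proof, since hidden side effects such as writing to a file would go unnoticed):

```python
import random

def looks_deterministic(fn, domain, trials=3):
    """Call fn repeatedly on every input in a small, finite domain; any
    disagreement between runs proves impurity. Agreement everywhere is
    only probabilistic evidence of purity."""
    for x in domain:
        results = [fn(x) for _ in range(trials)]
        if any(r != results[0] for r in results[1:]):
            return False
    return True

assert looks_deterministic(lambda x: x * x, [1, 2, 3, 4])
assert not looks_deterministic(lambda x: x + random.random(), [1, 2, 3, 4])
```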
Doing something clever, possibly in combination with the above techniques. For instance, you could build a neural network whose input is the text of a function and whose output is the likelihood of it being pure, train it on examples of functions you know to be pure or impure, and then use it on functions of unknown purity.
Edit:
I came back to this question after someone downvoted it, armed with new knowledge! The ast module should make it relatively easy to write your own analysis tool like this, as it gives you access to the abstract syntax tree of the code. It should be fairly easy to walk through this tree and see if there is anything preventing purity. This is a much better approach than analysing source files as text, and I might have a go at it at some point.
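A minimal sketch of such an ast-based checker (the red flags chosen here, global/nonlocal statements and assignments to attributes or subscripts, are just a starting point; their absence does not prove purity):

```python
import ast

IMPURE_NODES = (ast.Global, ast.Nonlocal)

def purity_red_flags(source):
    """Walk the AST of some source code and collect constructs that rule
    out purity. A sketch only: calls to impure functions, I/O, augmented
    assignment to outer names, etc. would need deeper analysis."""
    flags = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, IMPURE_NODES):
            flags.append(f"line {node.lineno}: {type(node).__name__.lower()} statement")
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, (ast.Attribute, ast.Subscript)):
                    flags.append(f"line {node.lineno}: assignment to attribute/subscript")
    return flags

src = """
def bump(counter):
    global total
    total += 1
    counter['n'] = total
"""
print(purity_red_flags(src))
```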
Finally, this question might also be useful, and also this one, which is basically a duplicate of this question.

Related

Resources on how to find the appropriate algorithm for a minimization problem?

I have a Python function where I want to find the global minimum.
I am looking for a resource on how to choose the appropriate algorithm for it.
When looking at the scipy.optimize.minimize documentation, I can find only some not very specific statements like:
"has proven good performance even for non-smooth optimizations"
"Suitable for large-scale problems"
"recommended for medium and large-scale problems"
Apart from that I am unsure whether for my function basinhopping might be better or even bayesian-optimization. For the latter I am unsure whether that is only meant for machine learning cost functions or can be used generally.
I am looking for a cheat sheet like the one pictured above (image not shown), but just for minimization problems.
Does something like that already exist?
If not, how would I be able to choose the most appropriate algorithm based on my constraints (that are: time consuming to compute, 4 input variables, known boundaries)?
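For the constraints described (an expensive objective, four input variables, known boundaries), a global optimizer that needs only bounds is a reasonable first try. A hedged sketch with scipy.optimize.differential_evolution, using a stand-in quadratic objective (your real, expensive function would replace it):

```python
from scipy.optimize import differential_evolution

def objective(v):
    """Stand-in for the expensive 4-variable objective; its minimum
    is at (1, -2, 0, 3) with value 0."""
    x, y, z, w = v
    return (x - 1) ** 2 + (y + 2) ** 2 + z ** 2 + (w - 3) ** 2

# Known boundaries for all four variables (made up for illustration)
bounds = [(-5, 5)] * 4

# seed fixes the stochastic search for reproducibility; the default
# local "polish" step refines the best candidate at the end
result = differential_evolution(objective, bounds, seed=0, maxiter=50)
print(result.x, result.fun)
```

For a very expensive objective you would cap the evaluation budget (maxiter, popsize) and consider surrogate-based methods such as Bayesian optimization, which are not limited to machine-learning cost functions.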

Cyclomatic complexity metric practices for Python [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
I have a relatively large Python project that I work on, and we don't have any cyclomatic complexity tools as a part of our automated test and deployment process.
How important are cyclomatic complexity tools in Python? Do you or your project use them and find them effective? I'd like a nice before/after story if anyone has one so we can take a bit of the subjectiveness out of the answers (i.e. before we didn't have a cyclo-comp tool either, and after we introduced it, good thing A happened, bad thing B happened, etc). There are a lot of other general answers to this type of question, but I didn't find one for Python projects in particular.
I'm ultimately trying to decide whether or not it's worth it for me to add it to our processes, and what particular metric and tool/library is best for large Python projects. One of our major goals is long term maintenance.
We used the RADON tool in one of our projects, which is related to test automation.
RADON
Depending on new features and requirements, we need to add/modify/update/delete code in that project. Also, almost 4-5 people were working on it. So, as part of our review process, we identified and adopted the RADON tool, since we want our code to be maintainable and readable.
Depending on the RADON tool's output, there were several times we refactored our code, added more methods, and modified the looping.
Please let me know if this is useful to you.
Python isn't special when it comes to cyclomatic complexity. CC measures how much branching logic is in a chunk of code.
Experience shows that when the branching is "high", that code is harder to understand and change reliably than code in which the branching is lower.
With metrics, it typically isn't absolute values that matter; it is relative values as experienced by your organization. What you should do is to measure various metrics (CC is one) and look for a knee in the curve that relates that metric to bugs-found-in-code. Once you know where the knee is, ask coders to write modules whose complexity is below the knee. This is the connection to long-term maintenance.
What you don't measure, you can't control.
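As an illustration of the branching count involved, here is a simplified McCabe-style counter using only the stdlib ast module (real tools such as radon and mccabe count more constructs and handle per-function reporting, so treat this as a sketch):

```python
import ast

# Decision points counted in this simplified version
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source):
    """Approximate McCabe complexity: 1 plus one per decision point
    found anywhere in the source."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(tree))

src = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for _ in range(3):
        pass
    return "positive"
"""
print(cyclomatic_complexity(src))  # 4: base 1 + if + elif + for
```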
wemake-python-styleguide supports both radon and mccabe implementations of Cyclomatic Complexity.
There are also different complexity metrics that are not covered by just Cyclomatic Complexity, including:
Number of function decorators; lower is better
Number of arguments; lower is better
Number of annotations; higher is better
Number of local variables; lower is better
Number of returns, yields, awaits; lower is better
Number of statements and expressions; lower is better
Read more about why it is important to obey them: https://sobolevn.me/2019/10/complexity-waterfall
They are all covered by wemake-python-styleguide.
Repo: https://github.com/wemake-services/wemake-python-styleguide
Docs: https://wemake-python-stylegui.de
You can also use the mccabe library. It counts only McCabe complexity, and it can be integrated into your flake8 linter.
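For instance, with the mccabe plugin installed, flake8 gains a max-complexity option (disabled by default). A sketch of enabling it project-wide, assuming a setup.cfg-style configuration:

```ini
# setup.cfg (or tox.ini / .flake8)
[flake8]
# Report any function whose McCabe complexity exceeds 10
max-complexity = 10
```

Runs of `flake8` will then emit a C901 warning for each function over the threshold.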

Refetch Values or Tote Them Around

My programs are growing larger and more sophisticated. As a result I am using more and more functions. My question is, should I "fetch" a value from a function once and then "tote" it around, sending it into other functions as a parameter, or just call, "fetch," the value again from within the other function(s)?
I am sure resources, and speed, are a factor, but what is the general rule, if any?
For example, should I call my sigmoid function, and then use that value as a parameter in a call to the next function that uses it, or just call the sigmoid function again from within that next function?
I know that this question borders on opinion, but I did not attend a CS school, and so find myself wondering what the "norm" for some things are.
Thanks.
You are correct that this question relates more to software engineering theory than just a language (Python). There are programming paradigms which promote one variant over the other but the most general rule of thumb you should aim for is:
High cohesion and low coupling
i.e., within a software module (which roughly corresponds to a Python module, if you are using them), functions may depend on each other, and it is fine to call them again to fetch a value. Across modules, however, you should avoid direct function calls: tie the modules together at a higher level (or in the main function) by fetching values from one module and passing them to the other.
See also: Memoization.
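When the function is pure, memoization largely dissolves the fetch-versus-tote trade-off: callers can refetch freely while the computation is paid for only once. A sketch using the stdlib functools.lru_cache, with the sigmoid example from the question:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def sigmoid(x):
    """Pure function of its input, so results can be cached safely:
    later callers may simply 'fetch' the value again and hit the cache
    instead of recomputing."""
    return 1.0 / (1.0 + math.exp(-x))

a = sigmoid(0.5)
b = sigmoid(0.5)  # cache hit, no recomputation
assert a == b
print(sigmoid.cache_info())
```

This keeps call sites simple (no value threading through parameters) without repeating the work.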

Graph library API [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I am creating a library to support some standard graph traversals. Some of the graphs are defined explicitly: i.e., all edges are added by providing a data structure, or by repeatedly calling a relevant method. Some graphs are only defined implicitly: i.e., I only can provide a function that, given a node, will return its children (in particular, all the infinite graphs I traverse must be defined implicitly, of course).
The traversal generator needs to be highly customizable. For example, I should be able to specify whether I want DFS post-order/pre-order/in-order, BFS, etc.; in which order the children should be visited (if I provide a key that sorts them); whether the set of visited nodes should be maintained; whether the back-pointer (pointer to parent) should be yielded along with the node; etc.
I am struggling with the API design for this library (the implementation is not complicated at all, once the API is clear). I want it to be elegant, logical, and concise. Is there any graph library that meets these criteria that I can use as a template (doesn't have to be in Python)?
Of course, if there's a Python library that already does all of this, I'd like to know, so I can avoid coding my own.
(I'm using Python 3.)
If you need to handle infinite graphs, then you are going to need some kind of functional interface to graphs (as you say in the question). So I would make that the standard representation, and provide helper functions that take other representations and generate a functional one.
For the results, you could yield (you imply a generator, and I think that is a good idea) a series of result objects, each of which represents a node. If the user wants more information, like back-pointers, they call a method on that object and the extra information is provided (calculated lazily where possible, so that people who don't need it avoid the cost).
You don't mention whether the graph is directed or not. Obviously you can treat all graphs as directed and return both directions, but then the implementation is not as efficient. Typically (e.g. JGraphT), libraries have different interfaces for different kinds of graph.
(I suspect you're going to have to iterate a lot on this before you get a good balance between an elegant API and efficiency.)
Finally, are you aware of the Functional Graph Library? I am not sure how it will help, but I remember thinking (years ago!) that its API was a nice one.
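The functional-interface idea above can be sketched as a generator-based traversal over an implicitly defined graph (names are illustrative; a real library would add the ordering keys, back-pointers, and traversal variants the question asks for):

```python
import itertools
from collections import deque

def bfs(children, start):
    """Breadth-first traversal of an implicit graph: `children` is a
    function mapping a node to an iterable of its successors. Nodes are
    yielded lazily, so this also works on infinite graphs as long as
    the caller stops consuming. A visited set guards against cycles."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node
        for child in children(node):
            if child not in seen:
                seen.add(child)
                queue.append(child)

# Infinite binary tree defined implicitly: node n has children 2n and 2n+1
first_ten = list(itertools.islice(bfs(lambda n: (2 * n, 2 * n + 1), 1), 10))
print(first_ten)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```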
The traversal algorithm and the graph data structure implementation should be separate, and should talk to each other only through the standard API. (If they are coupled, each traversal algorithm would have to be rewritten for every implementation.)
So my question really has two parts:
How to design the API for the graph data structure (used by graph algorithms such as traversals and by client code that creates/accesses graphs)
How to design the API for the graph traversal algorithms (used by client code that needs to traverse a graph)
I believe the C++ Boost Graph Library answers both parts of my question very well. I would expect it can (theoretically) be rewritten in Python, although there may be obstacles I won't see until I try.
Incidentally, I found a website that deals with question 1 in the context of Python: http://wiki.python.org/moin/PythonGraphApi. Unfortunately, it hasn't been updated since Aug 2011.

Any Naive Bayesian Classifier in python? [closed]

Closed 8 years ago.
I have tried the Orange Framework for Naive Bayesian classification.
The methods are extremely unintuitive, and the documentation is extremely unorganized. Does anyone here have another framework to recommend?
I use mostly NaiveBayesian for now.
I was thinking of using NLTK's naive Bayes classifier, but it seems it cannot handle continuous variables.
What are my options?
scikit-learn has an implementation of a Gaussian naive Bayes classifier. In general, the goal of this library is to provide a good trade-off between code that is easy to read and use, and efficiency. Hopefully it should be a good library to learn how the algorithms work.
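A minimal sketch with scikit-learn's GaussianNB on made-up continuous data (the values are purely illustrative), showing that it handles continuous features directly:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data: two well-separated classes with continuous features
X = np.array([[1.0, 2.0], [1.2, 1.9], [8.0, 9.0], [8.2, 9.1]])
y = np.array([0, 0, 1, 1])

clf = GaussianNB().fit(X, y)
pred = clf.predict([[1.1, 2.1], [8.1, 9.0]])
print(pred)  # [0 1]
```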
This might be a good place to start. It's the full source code (the text parser, the data storage, and the classifier) for a Python implementation of a naive Bayesian classifier. Although it's complete, it's still small enough to digest in one session. I think the code is reasonably well written and well commented. This is part of the source code files for the book Programming Collective Intelligence.
To get the source, click the link, download and unpack the zip, and from the main folder 'PCI_Code' go to the folder 'chapter 6', which has a Python source file 'docclass.py'. That's the complete source code for a Bayesian spam filter. The training data (emails) are persisted in an SQLite database, also included in the same folder ('test.db'). The only external library you need is the Python bindings to SQLite (pysqlite); you also need SQLite itself if you don't already have it installed.
If you're processing natural language, check out the Natural Language Toolkit.
If you're looking for something else, here's a simple search on PyPI.
pebl appears to handle continuous variables.
I found Divmod Reverend to be the simplest and easiest to use Python Bayesian classifier.
I just took Paul Graham's LISP stuff and converted it to Python:
http://www.paulgraham.com/spam.html
There’s also SpamBayes, which I think can be used as a general naive Bayesian classifier, instead of just for spam.
