Any Naive Bayesian Classifier in Python? [closed]

I have tried the Orange Framework for Naive Bayesian classification.
The methods are extremely unintuitive, and the documentation is extremely disorganized. Does anyone here have another framework to recommend?
I mostly use naive Bayes for now.
I was thinking of using NLTK's NaiveBayesClassifier, but it doesn't handle continuous variables.
What are my options?

scikit-learn has an implementation of the Gaussian naive Bayes classifier. In general, the goal of this library is to provide a good trade-off between code that is easy to read and use, and efficiency. Hopefully it should be a good library to learn how the algorithms work.
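For instance, a minimal sketch; the continuous feature values here are made up, purely for illustration:

```python
# Minimal Gaussian naive Bayes sketch; the feature values are made up.
from sklearn.naive_bayes import GaussianNB

X = [[1.0, 2.1], [1.2, 1.9], [8.0, 8.2], [7.9, 8.1]]  # continuous features
y = [0, 0, 1, 1]                                      # class labels

clf = GaussianNB()
clf.fit(X, y)
print(clf.predict([[1.1, 2.0]]))  # expected: class 0
```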

This might be a good place to start. It's the full source code (the text parser, the data storage, and the classifier) for a Python implementation of a naive Bayesian classifier. Although it's complete, it's still small enough to digest in one session. I think the code is reasonably well written and well commented. It is part of the source code for the book Programming Collective Intelligence.
To get the source, click the link, download and unpack the zip, and from the main folder 'PCI_Code' go to the folder 'chapter 6', which has a Python source file, 'docclass.py'. That's the complete source code for a Bayesian spam filter. The training data (emails) are persisted in an SQLite database, which is also included in the same folder ('test.db'). The only external library you need is the Python bindings to SQLite (pysqlite); you also need SQLite itself if you don't already have it installed.

If you're processing natural language, check out the Natural Language Toolkit.
If you're looking for something else, here's a simple search on PyPI.
pebl appears to handle continuous variables.

I found Divmod Reverend to be the simplest and easiest to use Python Bayesian classifier.
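From memory of its documented examples, usage looks roughly like the following; treat the import path and method names as assumptions to verify against the version you install:

```python
# Hedged sketch of Reverend's classic API (verify against your installed
# version): train on labeled text, then guess the label of new text.
from reverend.thomas import Bayes

guesser = Bayes()
guesser.train('french', 'le la les de je il elle')
guesser.train('english', 'the it she he they to of')
print(guesser.guess('they went to the store'))  # [(label, probability), ...]
```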

I just took Paul Graham's LISP stuff and converted it to Python
http://www.paulgraham.com/spam.html

There's also SpamBayes, which I think can be used as a general naive Bayesian classifier, instead of just for spam.

Related

Cyclomatic complexity metric practices for Python [closed]

I have a relatively large Python project that I work on, and we don't have any cyclomatic complexity tools as a part of our automated test and deployment process.
How important are cyclomatic complexity tools in Python? Do you or your project use them and find them effective? I'd like a nice before/after story if anyone has one so we can take a bit of the subjectiveness out of the answers (i.e. before we didn't have a cyclo-comp tool either, and after we introduced it, good thing A happened, bad thing B happened, etc). There are a lot of other general answers to this type of question, but I didn't find one for Python projects in particular.
I'm ultimately trying to decide whether or not it's worth it for me to add it to our processes, and what particular metric and tool/library is best for large Python projects. One of our major goals is long term maintenance.
We used the RADON tool in one of our projects related to test automation.
RADON
Depending on new features and requirements, we need to add/modify/update/delete code in that project, and 4-5 people were working on it. So, as part of the review process, we adopted RADON because we want our code to be maintainable and readable.
Depending on the RADON output, there were several times we refactored our code, added more methods, and modified the loops.
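For illustration, a sketch driving radon from Python; the sample function is hypothetical, just to give radon something branchy to score, and the equivalent CLI is `radon cc -s yourmodule.py`:

```python
# Sketch of radon's Python API; cc_visit scores each function/class block.
# Equivalent CLI: radon cc -s yourmodule.py
from radon.complexity import cc_visit

source = '''
def classify(x):
    if x < 0:
        return 'negative'
    elif x == 0:
        return 'zero'
    elif x < 10:
        return 'small'
    else:
        return 'large'
'''

for block in cc_visit(source):
    print(block.name, block.complexity)  # e.g. classify 4
```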
Please let me know if this is useful to you.
Python isn't special when it comes to cyclomatic complexity. CC measures how much branching logic is in a chunk of code.
Experience shows that when the branching is "high", that code is harder to understand and change reliably than code in which the branching is lower.
With metrics, it typically isn't absolute values that matter; it is relative values as experienced by your organization. What you should do is to measure various metrics (CC is one) and look for a knee in the curve that relates that metric to bugs-found-in-code. Once you know where the knee is, ask coders to write modules whose complexity is below the knee. This is the connection to long-term maintenance.
What you don't measure, you can't control.
wemake-python-styleguide supports both radon and mccabe implementations of Cyclomatic Complexity.
It also checks other complexity metrics that cyclomatic complexity alone does not cover, including:
Number of function decorators; lower is better
Number of arguments; lower is better
Number of annotations; higher is better
Number of local variables; lower is better
Number of returns, yields, awaits; lower is better
Number of statements and expressions; lower is better
Read more about why it is important to obey them: https://sobolevn.me/2019/10/complexity-waterfall
They are all covered by wemake-python-styleguide.
Repo: https://github.com/wemake-services/wemake-python-styleguide
Docs: https://wemake-python-stylegui.de
You can also use the mccabe library. It counts only McCabe complexity, and can be integrated into your flake8 linter.
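A sketch of how that looks in practice; the file name and thresholds are placeholders:

```python
# mymodule.py -- a deliberately branchy example to trip the checkers.
# Run either of these (thresholds are placeholders):
#   flake8 --max-complexity 3 mymodule.py
#   python -m mccabe --min 3 mymodule.py
def dispatch(event):
    if event == 'start':
        return 1
    elif event == 'stop':
        return 2
    elif event == 'pause':
        return 3
    return 0
```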

Python frameworks for NLP? [closed]

I am working on a project wherein I have to extract the following information from a set of articles (the articles could be on anything):
People: find the names of any people present, like "Barack Obama"
Topic: the topic or related tags of the article, like "Parliament", "World Energy"
Company/Organisation: I should be able to obtain the names of any companies or organisations mentioned, like "Apple" or "Google"
Is there an NLP framework/library of this sort available in Python which would help me accomplish this task?
#sel and #3kt gave really good answers. OP, you are looking for entity extraction, commonly referred to as named entity recognition (NER). There exist many APIs to perform this. But the first question you need to ask yourself is:
What is the structure of my DATA? or rather,
Are my sentences good English sentences?
That is, figure out whether the data you are working with is consistently grammatically correct, well capitalized, and well structured. These factors are paramount when it comes to extracting entities. The data I worked with were tweets. ABSOLUTE NIGHTMARE!! I performed a detailed analysis of the performance of various APIs on entity extraction, and I shall share with you what I found.
Here are the APIs that perform fabulous entity extraction:
NLTK has a handy reference book which talks in depth about its functions, with multiple examples. NLTK does not perform well on noisy data (tweets) because it has been trained on structured data. NLTK is absolute garbage for badly capitalized words (e.g., DUCK, Verb, CHAIR). Moreover, it is slightly less precise when compared to other APIs. It is great for structured or curated data from news articles and scholarly reports. It is a great learning tool for beginners.
Alchemy is simpler to implement and performs very well in categorizing the named entities. It has great precision when compared to the other APIs I have mentioned. However, it has a certain transaction cost: you can only perform 1000 queries in a day! It identifies twitter handles and can handle awkward capitalization.
IMHO, spaCy is probably the best. It's open source. It outperforms the Alchemy API but is not quite as precise, categorizing entities almost as well as Alchemy.
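For what it's worth, a minimal spaCy sketch, assuming the small English model has been installed:

```python
# Minimal spaCy NER sketch; assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(u"Barack Obama met executives from Apple and Google in Parliament.")

for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "Barack Obama PERSON", "Apple ORG"
```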
Choosing an API should be a simple problem for you now that you know how each one is likely to behave on the data you have.
EXTRA -
POLYGLOT is yet another API.
Here is a blog post that performs entity extraction in NLTK.
There is a beautiful paper by Alan Ritter that might go over your head. But it is the standard for entity extraction (particularly in noisy data) at a professional level. You could refer to it every now and then to understand complex concepts like LDA or SVM for capitalisation.
What you are actually looking for is known in the literature as 'named entity recognition' or NER.
You might like to take a look at this tutorial:
http://textminingonline.com/how-to-use-stanford-named-entity-recognizer-ner-in-python-nltk-and-other-programming-languages
One easy way of partially solving this problem is using regular expressions to extract words matching the patterns that you can find in this paper for extracting people's names. This, of course, might lead to extracting all the categories you are looking for, i.e., the topics and the company names as well.
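A rough sketch of the idea; this naive pattern (two adjacent capitalized words) over-matches badly and is only meant to illustrate the approach, not the patterns from the paper:

```python
# Naive regex sketch: two adjacent capitalized words as a name candidate.
# Illustrative only -- it both over- and under-matches real names.
import re

text = "Barack Obama spoke while Steve Jobs watched from Apple headquarters."
candidates = re.findall(r'\b[A-Z][a-z]+ [A-Z][a-z]+\b', text)
print(candidates)  # ['Barack Obama', 'Steve Jobs']
```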
There is also an API that you can use, that actually gives the same results you are looking for, called Alchemy. Unfortunately, no documentation is available to explain the method they use to extract the topics or the people's names.
Hope this helps.
You should take a look at NLTK.
Finding names and companies can be achieved by tagging the recovered text and extracting proper nouns (tagged NNP). Finding the topic is a bit more tricky and may require some machine learning on a given set of articles.
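A minimal sketch of that route, assuming the required NLTK models have been downloaded:

```python
# Minimal NLTK sketch: tokenize, POS-tag, then chunk named entities.
# Requires one-time downloads: nltk.download('punkt'),
# 'averaged_perceptron_tagger', 'maxent_ne_chunker', and 'words'.
import nltk

sentence = "Barack Obama addressed Parliament about Apple and Google."
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# ne_chunk groups tagged tokens into PERSON/ORGANIZATION/GPE subtrees
for subtree in nltk.ne_chunk(tagged):
    if hasattr(subtree, 'label'):  # plain (word, tag) tuples lack .label()
        print(subtree.label(), ' '.join(word for word, tag in subtree))
```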
Also, since we're talking about articles, I recommend the newspaper module, which can recover them from their URLs and do some basic NLP operations (summary, keywords).

Python - side effects/purity analysis tools? [closed]

Are there any existing tools for side effects/purity analysis in Python, similar to http://jppa.sourceforge.net in Java?
I don't know of any that exist, but here are some general approaches to making one:
Analysing source files as text - using regular expressions to find things that show a function definitely isn't pure - e.g. the global keyword. For practical purposes, most decently written functions that only have a return statement in the body are likely to be pure. On the other hand, if a function doesn't have a return statement, it is either useless, or impure.
Analysing functions in a source file as code. If testing a function in isolation produces a NameError, you know that it is either impure (because it doesn't have access to variables at a higher level), or has a mistake in it (referring to a variable before it is defined or some such), however the latter case should be covered by normal testing. The inspect module's function isfunction may be useful if you want to do this.
For each function you test, if it has a relatively small domain (e.g. one input that can either be 1, 2, 3 or 4) then you could exhaustively test all possible inputs, and get a certain answer this way. If it has a limited, or finite but large domain (e.g. all the real numbers between 0 and 1000 (infinite but limited), or all the integers between -12345 and 67890) then you could try sampling a selection of inputs in that domain, and use that to get a probability of purity. However, this approach may not be very useful, as the domain of the function is unlikely to be specified, so you may only be able to check it if you wrote the function, in which case you may not need to analyse it anyway.
Doing something clever, possibly in combination with the above techniques. For instance, making a neural network, with the input as the text of a function, and the output as the likelihood of it being pure. You could then train the network on examples of functions you know to be pure or impure, and then use it on functions of unknown purity.
Edit:
I came back to this question, after someone downvoted it, with new knowledge! The ast module should make it relatively easy to write your own analysis tool like this, as it gives you access to the abstract syntax tree of the code. It should be fairly easy to walk the tree and see if there is anything preventing purity. This is a much better approach than analysing source files as text, and I might have a go at it at some point.
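As a deliberately incomplete sketch of that idea, flagging only a couple of constructs that obviously rule out purity:

```python
# Rough sketch: walk a function's syntax tree and flag constructs that
# obviously rule out purity. Incomplete by design -- it misses mutation
# through arguments, calls to impure functions, I/O, and much else.
import ast

IMPURE_NODES = (ast.Global, ast.Nonlocal)

def obviously_impure(source):
    return any(isinstance(node, IMPURE_NODES)
               for node in ast.walk(ast.parse(source)))

print(obviously_impure("def f(x):\n    global n\n    n += x"))  # True
print(obviously_impure("def f(x):\n    return x * 2"))          # False
```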
Finally, this question might also be useful, and also this one, which is basically a duplicate of this question.

Good geometry library in python? [closed]

I am looking for a good and well developed library for geometrical manipulations and evaluations in python, like:
evaluate the intersection between two lines in 2D and 3D (if present)
evaluate the point of intersection between a plane and a line, or the line of intersection between two planes
evaluate the minimum distance between a line and a point
find the normal to a plane passing through a point
rotate, translate, mirror a set of points
find the dihedral angle defined by four points
I have a compendium book for all these operations, and I could implement it but unfortunately I have no time, so I would enjoy a library that does it. Most operations are useful for gaming purposes, so I am sure that some of these functionalities can be found in gaming libraries, but I would prefer not to include functionalities (such as graphics) I don't need.
Any suggestions? Thanks
Perhaps take a look at SymPy.
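A small sketch of a few of the requested operations in sympy.geometry; the module also provides Point3D, Line3D, and Plane for the 3D cases:

```python
# A few of the requested operations, sketched with sympy.geometry.
from sympy import Line, Line3D, Plane, Point, Point3D

l1 = Line(Point(0, 0), Point(1, 1))
l2 = Line(Point(0, 1), Point(1, 0))
print(l1.intersection(l2))       # [Point2D(1/2, 1/2)]
print(l1.distance(Point(0, 1)))  # minimum distance between a line and a point

plane = Plane(Point3D(0, 0, 0), normal_vector=(0, 0, 1))
l3 = Line3D(Point3D(0, 0, -1), Point3D(1, 1, 1))
print(plane.intersection(l3))    # point where the line pierces the plane
```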
Shapely is a nice python wrapper around the popular GEOS library.
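A tiny sketch; note that Shapely handles planar (2D) geometry only:

```python
# Tiny Shapely sketch; Shapely is 2D only.
from shapely.geometry import LineString, Point

a = LineString([(0, 0), (2, 2)])
b = LineString([(0, 2), (2, 0)])
print(a.intersection(b))        # POINT (1 1)
print(a.distance(Point(0, 2)))  # minimum distance from the segment to a point
```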
I found pyeuclid to be a great simple general purpose euclidean math package. Though the library may not contain exactly the problems that you mentioned, its infrastructure is good enough to make it easy to write these on your own.
CGAL has Python bindings too.
I really want a good answer to this question, and the ones above left me dissatisfied. However, I just came across pythonocc which looks great, apart from lacking good docs and still having some trouble with installation (not yet pypi compatible). The last update was 4 days ago (June 19th, 2011). It wraps OpenCascade which has a ton of geometry and modeling functionality. From the pythonocc website:
pythonOCC is a 3D CAD/CAE/PLM development framework for the Python programming language. It provides features such as advanced topological and geometrical operations, data exchange (STEP, IGES, STL import/export), 2D and 3D meshing, rigid body simulation, parametric modeling.
[EDIT: I've now downloaded pythonocc and began working through some of the examples]
I believe it can perform all of the tasks mentioned, but I found it to be unintuitive to use. It is created almost entirely from SWIG wrappers, and as a result, introspection of the commands becomes difficult.
geometry-simple has classes Point, Line, Plane, and Movement in ~300 lines, using only numpy; take a look.
You may be interested in the Python module SpaceFuncs from the OpenOpt project, http://openopt.org
SpaceFuncs is a tool for 2D, 3D, and N-dimensional geometric modeling, with possibilities for parametrized calculations, numerical optimization, and solving systems of geometrical equations.
Python Wild Magic is another SWIG-wrapped library. It is, however, a gaming library, but you could manipulate the SWIG library file to exclude any undesired graphics stuff from the Python API.

Looking for a Good Reference on Neural Networks [closed]

Duplicate
What are some good resources for learning about Artificial Neural Networks?
I'm looking for a good (beginner level) reference book (or website) on different types of Neural Nets/their applications/examples. I don't have any particular application in mind, I'm just curious as to how I can make use of them. I'm specifically interested in using them with Python, but any language, or even just theory would do fine.
There is quite an extensive series of courses available at Heaton Research. The courses are for C# (also available for Java); however, they explain the concepts at length, so I suggest you take a look even if you will code in Python yourself.
The courses are in video format, but the most important concepts are also written down.
See the below three links for Neural Networks using Python:
An Introduction to Neural Networks
Weave a Neural Net with Python
Neural Networks in Pyro
Ron Stephens
"Programming collective intelligence" by Toby Segaran has a chapter about NN and also some examples in Python.
You might want to try out A Brief Introduction to Neural Networks by David Kriesel. It's a richly illustrated ebook, and it's available for free. It covers lots of network paradigms and is less theoretical than Rojas's ebook. It seems to be the best freely available resource on the web.
AI-Junkie has a very good intuitive tutorial about neural networks. The site is designed to minimize the required mathematics so that the tutorial is accessible.
I am currently using this site as a primer - not python, but a good feed-forward network example and pretty straightforward to follow.
At the same time I have been reading The Essence of Neural Networks by Robert Callan (ISBN 0-13-908732-X) which has a wide range of network architectures and applications and is an easy read.
Since you mention Python, I should direct you to this IBM site, which I found very useful; the underlying code is in Python here. Credit should go to Neil Schemenauer.
I should also mention that I took the python code and ported it to numpy because it ran very slowly. I was unsuccessful, but before I rubbish numpy I have to say that I suspect my implementation was not very good and I'm sure there is a vectorised way of doing forward passes and backpropagations, I just didn't find it.
What I ended up doing was implementing a simple port of the Python code in Java. This only took an hour or so, and it runs about 100 times faster. I think this is more proof that I don't know what I'm doing with numpy, but if you are starting from scratch, I would question whether raw Python is the right language for you. You may be better off coding this sort of thing in C or C++ if you have to use Python.
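For reference, a vectorised forward pass in numpy looks something like the following minimal sketch; the layer sizes and activation are illustrative:

```python
# Minimal sketch of a vectorised forward pass: one matrix multiply per
# layer processes the whole batch at once, instead of per-neuron loops.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 4))   # batch of 32 inputs, 4 features each
W1 = rng.standard_normal((4, 8))   # input -> hidden weights
W2 = rng.standard_normal((8, 1))   # hidden -> output weights

hidden = np.tanh(X @ W1)           # (32, 8): whole batch in one matmul
output = np.tanh(hidden @ W2)      # (32, 1)
print(output.shape)
```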
Best of luck.
I think you have the python bit covered with the answers given above. As for the "or even just theory", Raúl Rojas has a hefty ebook you can download from his wiki page.
The best reference is "Neural Networks for Pattern Recognition", by Bishop. Another good book is "Neural Networks and Learning Machines", by Haykin.
More practical references include the user guides of the Neural Network Toolbox for Matlab or the Open Source Neural Networks C++ Library Flood.
the ANN FAQ
the comp.ai.neural-nets newsgroup archives, usable online or offline
