Visualizing LDA topic models using Stephen Hansen's topicmodels package - python

I'm using Stephen Hansen's topicmodels package to create a topic model, starting with the tutorial data available at that link. The tutorial uses Python 2, but it only requires one change (replacing xrange() with range()) to work properly with Python 3. I was able to run through the tutorial fully, but I'm struggling to represent the results visually. First, does anyone have any resources on visualizing LDA models built with this package? I haven't been able to locate any.
Primarily, I'm trying to create two kinds of graphics. The first is a word cloud for each topic. The second is something that looks like this: (full paper available here)
Running the tutorial produced a CSV document that has all the information needed to do this. It looks like the following:
The issue is that I'm unsure how to go about creating this document. I'm also unsure how I could use it to create word clouds by topic. topicmodels isn't widely used in Python, and I don't have much experience creating graphics with Python. I'm lost trying to figure out how to convert my output into something I can use with wordcloud or any other visualization.
If anyone wants to replicate this data, you just need a C++ compiler installed on your computer: download the tutorial, replace xrange() with range(), and run it through.
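To get from the CSV to word clouds, the main step is regrouping the output into one word-to-weight mapping per topic. The sketch below assumes a long-format CSV with hypothetical columns `topic`, `word`, and `weight` (one row per topic/word pair); the real layout of the tutorial's output may differ, so adjust the column names or the grouping to match. It uses only the standard library.

```python
import csv
import io
from collections import defaultdict


def topic_frequencies(csv_file, topic_col="topic", word_col="word", weight_col="weight"):
    """Group long-format rows (one word per row) into {topic: {word: weight}}.

    The column names are hypothetical defaults; pass the ones in your CSV header.
    """
    freqs = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        freqs[row[topic_col]][row[word_col]] = float(row[weight_col])
    return dict(freqs)


# Hypothetical sample in the assumed layout; for a real file use
# open("your_output.csv", newline="") instead of io.StringIO.
sample = io.StringIO(
    "topic,word,weight\n"
    "0,bank,0.12\n"
    "0,rate,0.08\n"
    "1,inflation,0.10\n"
)
per_topic = topic_frequencies(sample)
print(per_topic["0"])  # {'bank': 0.12, 'rate': 0.08}
```

Each inner dict has the shape that `WordCloud().generate_from_frequencies(...)` in the third-party `wordcloud` package accepts, so a loop over `per_topic.items()` can produce one cloud per topic and save each with `.to_file(...)`.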

Related

Python and Bridge for JS

I'm starting a new project for personal purposes!
I work in finance, so I decided to create personal chart-viewing software to suit my needs. I thought I'd write a good part of the backend in Python, as it is a language I know quite well, use JavaScript for the graphics side, and use a webview in a Windows form to make everything available as if it were desktop software.
For the graphics I thought of using Lightweight Library for JS; I find it the most cutting-edge library available today compared to the classic Plotly and Matplotlib.
I'd use pywebview as a bridge between Python and JS and to display everything in a Windows form.
However, I find this library (pywebview) a bit difficult to use: the documentation is sparse and not very clear to me (my JS level is really basic). Still, I believe it is one of the most convenient solutions.
I also considered PyScript, but it still seems too early to use it in production.
Questions:
Do you think such a job is feasible?
Do you know of other libraries or better solutions for this kind of job?
What kind of approach would you use if you were to do such work?
What I am trying to achieve is to write most of the functions in Python and use JS only to make calls through buttons or to get data from various inputs.
While searching the web I found a project already partially built by this developer (if you are reading, thanks Filipe, you have been very helpful! His project is hosted on GitHub here), but unfortunately it is difficult for me to work with code I didn't write myself.

How to quickly familiarize yourself with the usage of a python third party Library

While learning Python, I've run into a problem. Python has rich third-party libraries, but as a novice I find them difficult to use. For example, with Matplotlib I know in general what it can do, but when I want to draw a complex diagram I don't know where to start: there are many functions and I don't know where to find them, and the official manual sometimes feels a little abstract. Searching Google for a specific function often doesn't give me the result I want. So how do you quickly get started with a third-party library?
One way to get familiar quickly with any third-party Python library is to go through the Getting Started / Quickstart section of its documentation.
If that doesn't help, the two sites below have always given me a quick intro and basic hands-on practice for most Python libraries:
Real Python (https://realpython.com)
Tutorialspoint (https://www.tutorialspoint.com/index.htm)
Full-stack Python (https://www.fullstackpython.com) is another site I refer to when I have to find a new Python library.
These sites cover almost all the well-known Python libraries.
Most well-known libraries' documentation sites also link to some sort of community (Discord, Gitter, or another site) where you can get further help.
Example: NumPy
Learn section with a Quickstart and other example-based tutorials
Community section with links to several groups

Implementing a CNN Deep Learning Model in C++

I apologize in advance if anything about the structure of my question or the way I’m asking it isn’t exactly correct; this is my first time asking a question on here.
The Problem
I’ve built a GUI application for my research position that interfaces with a radar sensor’s API in order to perform real-time imaging in a variety of formats (this uses C++/Qt). My next step will be architecting and implementing a CNN that will essentially take in image data retrieved by the sensor and perform binary classification on it. For the past week or so, I’ve had an absolutely horrible time attempting to include any kind of mainstream deep learning framework in my pre-existing app.
What I’ve Tried
TensorFlow
My first thought was to employ TensorFlow (due to its popularity) to construct my network in Python and then load the model into my C++ app. Unfortunately, I’ve been completely unable to include TensorFlow in my app properly due to a lack of any kind of clear documentation or instruction on how to do so.
PyTorch
After beating my head against a wall for a few days with TensorFlow, I figured I’d try a similar approach with PyTorch because I heard plenty of people commending it for being more user-friendly and intuitive. Once again, trying to include the PyTorch C++ API in my app has been a total nightmare. All I can find in the docs are a couple of tutorials in which CMake is used to generate a new project, but for obvious reasons this wouldn’t work for my use case.
I feel like I’m chasing my tail at this point; my next thought is to try again with another kind of deep learning framework, but I feel like I’ll fall right back into this same issue of being unable to include the library in my pre-existing app. Any recommendations/guidance would be greatly appreciated!

Passing Python strings to Mallet for topic modelling

I'm building a corpus of texts harvested alongside some metadata from HTML with BeautifulSoup. It would be really helpful if I could call Mallet from within Python, and have it model topics from Python strings, rather than from text files in a directory. That way I could put the n keywords located by Mallet into each file.
I get a message saying that Mallet has been recognised when I run:
from nltk.classify import mallet
from subprocess import call
mallet.config_mallet("malletdir/mallet-2.0.7/bin")
But I haven't had any luck with the next steps, and am not even sure if Mallet accepts anything other than saved files.
I have not been able to turn up any documentation that I can really understand. Has anybody seen digestible documentation for this? (The NLTK book doesn't get into Mallet.) I would also be happy to learn of any other means of topic modelling within Python that I could operationalise without a really deep knowledge of Python.
Sorry, this is my first rodeo.
In case you are still looking for a solution: Gensim (a Python topic modeling/machine learning package) has a wrapper for Mallet which is easy to use and well documented (note that Gensim 4.0 removed its third-party wrappers, including the Mallet one, so this applies to Gensim 3.x). Here are some Gensim tutorials and a specific tutorial for the Mallet wrapper. You may also want to read some installation instructions (mostly the part about setting Java memory) here, and then you'd be ready to go.
I once tried using Mallet in an NLTK project and I too ran into dead end after dead end. I think the main thing to keep in mind here is that Mallet is Java-based while NLTK is written in Python.
You already knew that, but my point is that I personally struggled with mixing the technologies because I don't have a strong Java background. I've received the same feedback from coworkers about using Mallet with Python: "Be ready to spend a lot of time debugging."
Since then I've been using the sklearn library for Python. It is aimed at machine learning in general rather than NLP specifically, but it can be used for NLP quite nicely. It comes with a very large selection of modelling tools, and most of it relies on NumPy, so it should be pretty fast. I've used it quite a bit and can say that it is very well written and documented.
I don't want to discourage you from using Mallet, especially just because I said so. But if you are open to alternatives, I think you will find that when building projects with NLTK it's far easier to use Python modules, since NLTK itself is written in Python. I hope this helps!

Data visualization in python - after connecting to a database

Can you help me connect to my PostgreSQL database with Python? I need to create a graphical interface in Python that will visualize shapefile data from my database (I have about 50 polygons in shapefile format in that database). Can you help me create such an application? I am a beginner in Python.
For communicating with the database, use psycopg2. It's quick, easy and efficient if you are familiar with basic DB concepts.
You have several options from here. You can use shpUtils, which is supposed to be a nice package for parsing shapefiles. You can then visualize the data using one of numerous Python graphics packages, like PIL.
PIL image source code here.
Every option suits a different need, depending on what you mean by "create graphic interface". If you need simple graphical output, build the polygons from text using one of the graphics utilities mentioned above. If you need a professional-looking image, try Mapnik (mentioned in some other answers), which easily reads shapefiles. If you need a fully functional GUI, it's probably not a beginner's task - you should start with programming basic GUI applications before diving into shapefiles and polygons.
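A minimal sketch of the "simple graphical output" route with psycopg2 and PIL, assuming the polygons live in a PostGIS-enabled table (PostGIS's `ST_AsGeoJSON` turns each geometry into GeoJSON text) and that Pillow, the maintained PIL fork, is installed. The table name `polygons`, the column name `geom`, and the helper names are hypothetical; only exterior rings are drawn, and holes are ignored.

```python
import json

from PIL import Image, ImageDraw


def fetch_rings(dsn, table="polygons", geom_col="geom"):
    """Read exterior rings from a PostGIS table (table/column names are hypothetical)."""
    import psycopg2  # imported lazily so draw_rings() works without a database
    from psycopg2 import sql

    query = sql.SQL("SELECT ST_AsGeoJSON({}) FROM {}").format(
        sql.Identifier(geom_col), sql.Identifier(table)
    )
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(query)
            rows = cur.fetchall()
    finally:
        conn.close()

    rings = []
    for (text,) in rows:
        geom = json.loads(text)
        if geom["type"] == "Polygon":
            rings.append(geom["coordinates"][0])  # first ring is the exterior
        elif geom["type"] == "MultiPolygon":
            rings.extend(poly[0] for poly in geom["coordinates"])
    return rings


def draw_rings(rings, size=400, margin=10):
    """Scale all rings into a size x size image and draw their outlines."""
    xs = [pt[0] for ring in rings for pt in ring]
    ys = [pt[1] for ring in rings for pt in ring]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1.0  # avoid dividing by 0
    scale = (size - 2 * margin) / extent
    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    for ring in rings:
        # Flip y: map coordinates grow upward, image rows grow downward.
        points = [
            (margin + (pt[0] - min_x) * scale, size - margin - (pt[1] - min_y) * scale)
            for pt in ring
        ]
        draw.polygon(points, outline="black")
    return img
```

With a real database, something like `draw_rings(fetch_rings("dbname=gis user=postgres")).save("polygons.png")` (the connection string is a placeholder) writes all polygons to one image; `img.show()` opens it in the default viewer instead.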
If, however, you just need to view the polygons - skip python and just use qgis, which will very easily visualize your polygons. It also comes with a handful of other nice features, like colors, zooms and so on.
I would approach this by breaking it up into smaller problems and solving each of them:
a) How do I connect to a postgresql database with python?
https://stackoverflow.com/search?q=postgresql+database+python - Looks like psycopg2 is a good option as Adam Matan suggested.
b) Drawing shapefile data in python
postgresql and python
Mapnik is great for drawing maps. It can handle various formats and shapefiles, too. As far as I know it also supports PostgreSQL (at least PostGIS).
And last but not least: it comes with a Python interface (see Getting started).
