Can anyone please point me to some techniques to do a Point of View Analysis on novel text?
I'm basically looking for methods to determine how many words were written from each character's point of view in a given novel, preferably using Python.
Something like this: Statistical Analysis of WoT
Maybe this could help you.
You could start by storing every word and then counting each occurrence.
If you also need to plot the counts, you could draw a histogram with matplotlib, as shown here.
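For instance, here is a minimal sketch of that idea in Python, assuming you can already tag each chunk of text with its POV character (the chapter_pov list below is made-up toy data):
from collections import Counter
import matplotlib.pyplot as plt

# Toy data: (POV character, chapter text) pairs -- purely illustrative.
chapter_pov = [
    ("Rand", "The wind rose in the mountains and swept down toward the village ..."),
    ("Egwene", "The White Tower stood ahead of her, gleaming in the sun ..."),
    ("Rand", "He stepped through the gateway and onto the cold plain ..."),
]

# Count words written from each character's point of view.
word_counts = Counter()
for character, text in chapter_pov:
    word_counts[character] += len(text.split())

# Bar chart of the totals with matplotlib.
plt.bar(range(len(word_counts)), list(word_counts.values()))
plt.xticks(range(len(word_counts)), list(word_counts.keys()), rotation=45)
plt.ylabel("Words from this POV")
plt.tight_layout()
plt.show()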
Hope it helps!
I am working on Python 2.7. I want to create nomograms based on the data of various variables in order to predict one variable. I am looking into and have installed PyNomo package.
However, from the documentation here and here and from the examples, it seems that nomograms can only be made when you have equation(s) relating these variables, not from the data. For example, the examples here show how to use equations to create nomograms. What I want is to create a nomogram from the data and use that to predict things. How do I do that? In other words, how do I make the nomograph take data as input rather than a function? Is it even possible?
Any input would be helpful. If PyNomo cannot do it, please suggest some other package (in any language). For example, I am trying the nomogram function from the rms package in R, but I'm not having luck figuring out how to use it properly. I have asked a separate question for that here.
The term "nomogram" has become somewhat confused of late as it now refers to two entirely different things.
A classic nomogram performs a full calculation - you mark two scales, draw a straight line across the marks and read your answer from a third scale. This is the type of nomogram that pynomo produces, and as you correctly say, you need a formula. As mentioned above, producing nomograms like this is definitely a two-step process.
The other use of the term (very popular recently) is to refer to regression nomograms. These are graphical depictions of regression models (usually logistic regression models). For these, a group of parallel predictor variables is depicted with a common scale on the bottom; for each predictor you read the 'score' from the scale and add these up. These types of nomograms have become very popular in the last few years, and that's what the rms package will draw. I haven't used it, but my understanding is that it works directly from the data.
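To make the "read a score and add it up" idea concrete, here is a rough Python sketch of how a regression nomogram assigns points, using a hypothetical logistic model with made-up coefficients (this is not the rms package, just the arithmetic behind it):
# Hypothetical coefficients and predictor ranges -- purely illustrative.
betas = {"age": 0.04, "bmi": 0.10, "smoker": 0.80}
ranges = {"age": (20, 90), "bmi": (18, 45), "smoker": (0, 1)}

# Maximum possible contribution of each predictor to the linear predictor.
spans = {k: betas[k] * (hi - lo) for k, (lo, hi) in ranges.items()}
max_span = max(spans.values())

def points(predictor, value):
    # Scale so the strongest predictor spans 0-100 points.
    lo, _ = ranges[predictor]
    return 100.0 * betas[predictor] * (value - lo) / max_span

# Total points for one hypothetical patient; the total maps back to a predicted probability.
total = points("age", 65) + points("bmi", 30) + points("smoker", 1)
print("total points:", round(total, 1))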
Hope this is of some use! :-)
Objective: I am trying to do a project on Natural Language Processing (NLP), where I want to extract information and represent it in graphical form.
Description:
I am considering news article as input to my project.
Removing unwanted data from the input and putting it in a clean format.
Performing NLP & Extracting Information/Knowledge
Representing Information/Knowledge in Graphical Format.
Is it Possible?
If you want to use nltk, you can start here. It has some explanation about tokenizing, part-of-speech tagging, parsing and more.
Check this page for an example of named entity detection using nltk.
The graphical representation can be done using igraph or matplotlib.
Also, scikit-learn has great text feature extraction methods, in case you want to run some more sophisticated models.
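As a small sketch of that nltk pipeline (the sentence is made up, and the nltk resource names may differ between versions):
import nltk

# One-time downloads; resource names may vary with your nltk version.
for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(pkg, quiet=True)

sentence = "Apple is looking at buying a startup in London for 1 billion dollars."
tokens = nltk.word_tokenize(sentence)          # tokenizing
tagged = nltk.pos_tag(tokens)                  # part-of-speech tagging
tree = nltk.ne_chunk(tagged)                   # named entity detection

# Pull out (entity text, entity label) pairs from the chunk tree.
entities = [(" ".join(word for word, tag in subtree.leaves()), subtree.label())
            for subtree in tree.subtrees() if subtree.label() != "S"]
print(entities)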
The first step is to try and do this job yourself by hand with a pencil. Try it on not just one but a collection of news stories. You really do have to do this and not just think about it. Draw the graphics just as you'd want the computer to.
What this does is force you to create rules about how information is transformed into graphics. This is NOT always possible, so doing it by hand is a good test. If you can't do it by hand, you can't program a computer to do it.
Assuming you have found a paper-and-pencil method, what I like to do is work BACKWARDS. Your method starts with the text. No. Start with the numbers you need to draw the graphic. Then think about where those numbers are in the stories and which words you have to look at to get them. Your job is now more like a hunting trip: you know the data is there, but you have to find it.
Sorry for the lack of detail, but without knowing your exact problem this is the best I can offer, and it works in every case: first learn to do the job yourself on paper, then work backwards from the output to the input.
If you try to design this software in the forward direction, you soon get stuck: you can't know what to do with your text because you don't know what you need. It's like pushing a rope; it doesn't work. Go to the other end and pull the rope. Do the graphic work FIRST, then pull the needed data from the news stories.
My Question is as follows:
I know a little bit about ML in Python (using NLTK), and it works OK so far. I can get predictions given certain features. But I want to know: is there a way to display the best features to achieve a label? I mean the direct opposite of what I've been doing so far (put in all the circumstances and get a label for them).
I'll try to make my question clear via an example:
Let's say I have a database with Soccer games.
The Labels are e.g. 'Win', 'Loss', 'Draw'.
The Features are e.g. 'Windspeed', 'Rain or not', 'Daytime', 'Fouls committed' etc.
Now I want to know: under which circumstances will a team achieve a Win, Loss or Draw? Basically, I want to get back something like this:
Best conditions for Win: Windspeed=0, No Rain, Afternoon, Fouls=0 etc
Best conditions for Loss: ...
Is there a way to achieve this?
My paint skills aren't the best!
All I know is theory, so you'll have to look up the code yourself.
If you have only one case (the best point for each situation), the diagram becomes something like a plot with one benchmark point per outcome: green for Win, orange for Draw, red for Lose (it won't necessarily be 2-D).
Now, if you want to predict whether the team wins, loses or draws, you have (at least) two ways to classify:
Linear separation, where the separator is the perpendicular bisector of the line joining the two benchmark points.
K-nearest neighbours, where you calculate the distance from all the points and classify the new point the same as its closest neighbour(s).
So, for example, if you have new data and have to classify it, here's how:
We have a new point with certain attributes.
We classify it by calculating which side of the line the point falls on (or how far it is from our benchmark situations).
Note: you will have to give some weight to each factor for better accuracy.
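Here is a minimal Python sketch of the nearest-benchmark idea described above (all numbers, features and benchmark points are made up for illustration):
import numpy as np

# Feature order: [windspeed, rain (0/1), hour of day, fouls] -- illustrative only.
benchmarks = {
    "Win":  np.array([0.0, 0.0, 15.0, 0.0]),
    "Draw": np.array([10.0, 0.0, 18.0, 8.0]),
    "Loss": np.array([25.0, 1.0, 21.0, 15.0]),
}

# Per-feature weights; tuning these is the weighting mentioned in the note above.
weights = np.array([1.0, 5.0, 0.5, 1.0])

def classify(x):
    # Return the outcome whose benchmark point is closest under weighted Euclidean distance.
    return min(benchmarks,
               key=lambda label: np.sqrt(np.sum(weights * (x - benchmarks[label]) ** 2)))

print(classify(np.array([3.0, 0.0, 16.0, 2.0])))  # -> "Win" for this toy input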
You could compute how well each feature separates the classes via feature weighting. The most common method for feature selection (and therefore feature weighting) in text classification is chi^2 (chi-squared). This measure will tell you which features are better; based on that information you can analyse the specific values that are best for every case. I hope this helps.
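For example, a small sketch of chi^2 feature weighting with scikit-learn (feature names and numbers are invented; note that chi2 requires non-negative features such as counts):
import numpy as np
from sklearn.feature_selection import chi2

# Toy, non-negative feature matrix: columns are rain (0/1), fouls, goals_for.
X = np.array([
    [0, 1, 3],
    [1, 9, 0],
    [0, 2, 2],
    [1, 7, 1],
])
y = np.array(["Win", "Loss", "Win", "Loss"])

scores, p_values = chi2(X, y)
for name, score in zip(["rain", "fouls", "goals_for"], scores):
    print(name, round(score, 2))  # higher score = feature separates the classes better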
Not sure if you have to do this in Python, but if not, I would suggest Weka. If you're unfamiliar with it, here's a link to a set of tutorials: https://www.youtube.com/watch?v=gd5HwYYOz2U
Basically, you'd just need to write a program to extract your features and labels and then output a .arff file. Once you've generated a .arff file, you can feed this to Weka and run myriad different classifiers on it to figure out what model best fits your data. If necessary, you can then program this model to operate on your data. Weka has plenty of ways to analyze your results and to graphically display said results. It's truly amazing.
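A small sketch of that extraction step, assuming your program has already produced rows of features plus a label (the attribute names and values below are placeholders):
# Write the extracted features and labels as a minimal .arff file for Weka.
rows = [
    (12.0, "yes", 3, "Loss"),   # windspeed, rain, fouls, outcome -- placeholder data
    (0.0,  "no",  0, "Win"),
]

with open("games.arff", "w") as f:
    f.write("@RELATION soccer\n\n")
    f.write("@ATTRIBUTE windspeed NUMERIC\n")
    f.write("@ATTRIBUTE rain {yes,no}\n")
    f.write("@ATTRIBUTE fouls NUMERIC\n")
    f.write("@ATTRIBUTE class {Win,Loss,Draw}\n\n")
    f.write("@DATA\n")
    for windspeed, rain, fouls, label in rows:
        f.write("%s,%s,%s,%s\n" % (windspeed, rain, fouls, label))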
Alright everyone, this one is super niche:
I am attempting to use the earworm.py code to analyze the timbre and pitch features of very short mp3s/tracks (1 second minimum); however, the code is returning no features and an empty graph.
The issue seems to stem from the function get_central(analysis, member='segments'). With short tracks, member = getattr(analysis, member) comes back empty.
Why is this? Is there a quick fix I could use like changing "member='segments'" to something that is more fine-grained?
Is there a way to extract timbre and pitch features from such short tracks using EchoNest?
I have some US demographic and firmographic data.
I would like to plot zipcode areas in a state or a smaller region (e.g. city). Each area would be annotated by color and/or text specific to that area. The output would be similar to http://maps.huge.info/ but a) with annotated text; b) pdf output; c) scriptable in R or Python.
Is there any package and code that allows me to do this?
I am assuming you want static maps.
(Example output map; source: eduardoleoni.com)
1) Get the shapefiles of the zip boundaries and state boundaries at census.gov:
2) Use the plot.heat function I posted in this SO question.
For example (this assumes you have the Maryland shapefiles in the maps subdirectory):
library(maptools)
##substitute your shapefiles here
state.map <- readShapeSpatial("maps/st24_d00.shp")
zip.map <- readShapeSpatial("maps/zt24_d00.shp")
## this is the variable we will be plotting
zip.map@data$noise <- rnorm(nrow(zip.map@data))
## put the lab point x y locations of the zip codes in the data frame for easy retrieval
labelpos <- data.frame(do.call(rbind, lapply(zip.map@polygons, function(x) x@labpt)))
names(labelpos) <- c("x","y")
zip.map@data <- data.frame(zip.map@data, labelpos)
## plot it
png(file="map.png")
## plot colors
plot.heat(zip.map,state.map,z="noise",breaks=c(-Inf,-2,-1,0,1,2,Inf))
## plot text
with(zip.map@data[sample(1:nrow(zip.map@data), 10),], text(x, y, NAME))
dev.off()
There are many ways to do this in R (see the spatial view); many of these depend on the "maps" package.
Check out this cool example of the US 2004 election.
Here's a slightly ugly example of a model that uses the "maps" package with "lattice".
Andrew Gelman made some very nice plots like this. See, for instance, this blog post on red states/blue states and this follow up post.
Here's a very simple example using the "gmaps" package, which shows a map of murder arrests per 100,000 people by state (from the USArrests dataset):
require(gmaps)
data(USArrests)
attach(USArrests)
grid.newpage()
grid.frame(name="map")
grid.pack("map",USALevelPlot(states=rownames(USArrests),levels=Murder,col.fun=reds),height=unit(1,'null'))
grid.pack("map",gradientLegendGrob(at=quantile(Murder),col.fun=reds),side="bottom",height=unit(.2,'npc'))
detach(USArrests)
Someone may have something more direct for you, but I found O'Reilly's 'Data Mashups in R' very interesting... in part, it's a spatial mapping of home foreclosure auctions.
http://oreilly.com/catalog/9780596804770/
In Python, you can use shapefiles from the US census along with the basemap package. Here is an example of filling in states according to population.
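Roughly, that basemap approach looks like the sketch below, assuming you have downloaded the census state shapefile (here called st99_d00) into the working directory and that its records carry a NAME field; the population numbers are placeholders:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

# Lambert conformal map of the lower 48 states.
m = Basemap(llcrnrlon=-119, llcrnrlat=22, urcrnrlon=-64, urcrnrlat=49,
            projection="lcc", lat_1=33, lat_2=45, lon_0=-95)
m.readshapefile("st99_d00", "states", drawbounds=True)

# Hypothetical populations (millions); in practice join your own data here.
populations = {"California": 37.3, "Texas": 25.1, "New York": 19.4}

ax = plt.gca()
# m.states holds the projected polygon coordinates, m.states_info the shapefile records.
for coords, info in zip(m.states, m.states_info):
    name = info["NAME"]
    if name in populations:
        shade = min(populations[name] / 40.0, 1.0)
        ax.add_patch(Polygon(coords, facecolor="red", alpha=shade))
plt.show()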
There is a rich and sophisticated set of packages in R for plotting, analysis, and other GIS-related functions. One place to get started is the CRAN task view on Spatial Data.
This is a complex and sometimes arcane world, and takes some work to understand.
If you are looking for a free, very functional mapping application, may I suggest:
MapWindow (mapwindow.com)
Daniel Levine at TechCrunch Trends has done nice things with the maps package in R. He has code available on his site, too.
Paul's suggestion of looking into Processing - which Ben Fry used to make zipdecode - is also a good one, if you're up for learning a (Java-like) new language.
Depending on your application, a long way around might be to use something like this:
http://googlemapsmania.blogspot.com/2006/07/new-google-maps-us-zip-code-mashups.html
to map your data. If that isn't quite what you wanted, you can get raw zip code shapefiles from census.gov and do it manually, which is quite a pain.
Also, if you haven't seen it, this is a neat way to interact with similar data, and might offer some pointers:
http://benfry.com/zipdecode/
Check out this excellent online visualization tool by IBM
http://manyeyes.alphaworks.ibm.com/manyeyes/
EDIT: FYI, ManyEyes uses the Prefuse visualization toolkit for some of its visualizations. Even though it is a Java-based framework, they also provide a Flash/ActionScript tool for the web.