I need to create a specific scatterplot using three data columns, x, y, z (height, weight, ID number), as the basic inputs.
I have height, weight, and a unique identifier for each of ~2000 individuals in the set. I want users to be able to highlight “their location” precisely within the scatterplot of all the data points. To do that among 2000 data points, they’ll need to type their unique ID into a text box, which would then alter the graph by:
a) “accenting” the input data point (e.g., changing that individual’s point to red while the other data points remain gray)
b) acting as a sort of “tool tip”: showing the exact values of height, weight, and ID number in a readable “box” somewhere in or near the graph area, preferably in an open area of the graph’s “state space”. This is mainly to let them note their recorded values in our dataset. (Yes, they presumably know their own height and weight… but imagine they’d like to check whether we have mis-entered their values in the dataset.)
I figure there’s an interactive graph package that allows this filter-by-typed-value-of-z option, but I have only seen filter options for a small number of predefined categories. For example, what I have seen offers a drop-down box to filter the data by z. The problem is that my drop-down for ID would have as many values as there are data points, and that’s thousands… so it’s unwieldy compared to my text box idea.
I would like to do this in R or in a package that is easy to use (I’m mainly a stats user, so my programming skills are limited to writing basic batch programs, .do files, with canned procedures). A non-R package that would let me easily create, edit, and slap this into a webpage would certainly do.
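A minimal sketch of the non-interactive logic, written in Python for illustration (the same steps translate directly to R): given the three columns and a typed ID, color the matching point red and the rest gray, build the readable label, and pick the emptiest quadrant of the plot for the label box. The function names and sample data are hypothetical; the text box and the redraw itself would come from an interactive framework such as R Shiny or Plotly Dash, which this sketch does not cover.

```python
from typing import Dict, List, Tuple

# Each record is (height, weight, id); the sample data below is made up.
Record = Tuple[float, float, str]


def highlight(records: List[Record], target_id: str) -> Tuple[List[str], str]:
    """Return one color per point (red for the typed ID, gray otherwise)
    and a readable label with that individual's recorded values."""
    colors = ["red" if rid == target_id else "gray" for _, _, rid in records]
    matches = [rec for rec in records if rec[2] == target_id]
    if not matches:
        return colors, f"ID {target_id} not found"
    height, weight, rid = matches[0]
    return colors, f"ID: {rid}\nHeight: {height}\nWeight: {weight}"


def emptiest_corner(records: List[Record]) -> str:
    """Name the plot quadrant (height on x, weight on y) holding the
    fewest points, as a place to draw the label box."""
    xs = [h for h, _, _ in records]
    ys = [w for _, w, _ in records]
    mid_x = (min(xs) + max(xs)) / 2
    mid_y = (min(ys) + max(ys)) / 2
    counts: Dict[str, int] = {"upper left": 0, "upper right": 0,
                              "lower left": 0, "lower right": 0}
    for x, y in zip(xs, ys):
        vertical = "upper" if y >= mid_y else "lower"
        horizontal = "right" if x >= mid_x else "left"
        counts[f"{vertical} {horizontal}"] += 1
    # On ties, min() keeps the first quadrant in the dict's order.
    return min(counts, key=counts.get)


data = [(170, 65, "A1"), (180, 80, "B2"), (175, 72, "C3"), (160, 55, "D4")]
colors, label = highlight(data, "C3")  # only C3's point becomes red
corner = emptiest_corner(data)
```

Because taller people tend to be heavier, the upper-left region (short and heavy) is often sparse, which is why a data-driven corner choice tends to work better than a fixed label position.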
I've implemented a function which, given a selection of cells, color-codes them based on their formulas. This is fairly fast, because we can retrieve the formula information and then write the color data to the sheet in batches (one batch per color, subject to the constraint that the range function has a 255-character limit on its argument).
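A sketch of the batching step described above, assuming colors are stored as RGB tuples and that a batch is a comma-joined address string such as "A1,C3". `batch_by_color` is a hypothetical helper; each resulting string would be passed to whatever range-writing call you already use. If the original colors were saved in the same dictionary shape, the same function could batch the restore step too.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
MAX_LEN = 255  # the character limit on the range argument mentioned above


def batch_by_color(cell_colors: Dict[str, RGB],
                   max_len: int = MAX_LEN) -> List[Tuple[RGB, str]]:
    """Group addresses by color and join each group into comma-separated
    address strings of at most max_len characters."""
    groups: Dict[RGB, List[str]] = defaultdict(list)
    for address, color in cell_colors.items():
        groups[color].append(address)

    batches: List[Tuple[RGB, str]] = []
    for color, addresses in groups.items():
        current = ""
        for address in addresses:
            candidate = f"{current},{address}" if current else address
            if current and len(candidate) > max_len:
                batches.append((color, current))  # flush the full chunk
                current = address
            else:
                current = candidate
        if current:
            batches.append((color, current))
    return batches
```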
The end user might then want to recover their existing formatting (e.g., if their Excel sheet is already color-coded, my function is helpful for spotting mistakes in formulas, but they won't want to lose the existing colors). For this, I'd need to retrieve the existing color information before executing my function.
However, I cannot work out how to do this efficiently: it seems that we can only get color information one cell at a time, using mysheet.range(...).color=... or .api.Interior.ColorIndex, which would require iterating through the entire range and reading each cell's color. This is quite slow for large ranges, as it requires interacting with Excel once per cell.
Is it possible to get cell color information in batches, or otherwise minimize the number of calls to Excel? Or are there other potential workarounds?
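One possible workaround is recursive subdivision: ask for the fill of a whole block, and split the block only when it is mixed. A sheet with large uniformly colored areas then needs far fewer calls than one per cell. This rests on an assumption to verify in your own Excel/xlwings setup: that querying a multi-cell range (e.g. through `.api.Interior.Color`) returns a single value when every cell shares the fill, and a null/`None` result when the fills differ. The `get_block_color` callback is hypothetical and would wrap that query, translating a mixed result into the `MIXED` sentinel.

```python
from typing import Callable, Dict, Tuple

MIXED = object()  # sentinel returned when cells in a block differ

# Hypothetical callback: (r0, c0, r1, c1) -> shared color, or MIXED.
BlockQuery = Callable[[int, int, int, int], object]


def read_colors(get_block_color: BlockQuery, r0: int, c0: int, r1: int, c1: int,
                out: Dict[Tuple[int, int], object]) -> None:
    """Fill `out` with {(row, col): color} for the inclusive block
    r0..r1 x c0..c1, splitting only blocks reported as MIXED.
    The callback must never return MIXED for a single cell."""
    color = get_block_color(r0, c0, r1, c1)
    if color is not MIXED:
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                out[(r, c)] = color
        return
    if r1 - r0 >= c1 - c0:  # split the longer side in half
        mid = (r0 + r1) // 2
        read_colors(get_block_color, r0, c0, mid, c1, out)
        read_colors(get_block_color, mid + 1, c0, r1, c1, out)
    else:
        mid = (c0 + c1) // 2
        read_colors(get_block_color, r0, c0, r1, mid, out)
        read_colors(get_block_color, r0, mid + 1, r1, c1, out)
```

The number of calls grows with the length of the boundaries between differently colored areas rather than with the number of cells, so it helps most on sheets with a few large colored blocks and least on checkerboard-like formatting.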
I have a set of co-ordinates (latitudes and longitudes) of different buildings in a city. The sample size is around 16,000. I plan to use these co-ordinates as the central points of their localities/neighbourhoods and do some analysis on the different neighbourhoods of the city. The "radius/size" of each neighbourhood is still undecided.
However, many of these co-ordinates are very close to each other, so many of them actually represent the same locality/neighbourhood.
As a result, I want to select a smaller sample (say, 3-6k) of co-ordinates that is more evenly spread out.
Example: if two of the co-ordinates represent two neighbouring buildings, I don't want to include both, as they pretty much represent the same area. So we should select only one of them.
This way, I was hoping to reduce the population to a smaller size while still covering most of the city with the remaining co-ordinates.
One way I imagined the solution is to plot these co-ordinates on a 2D graph (for visualisation). Then we could try different values of "radius" to see how many co-ordinates would remain. But I do not know how to implement such a "graph".
I am doing this analysis in Python. Is there a way to obtain a sample of these co-ordinates that is evenly distributed with minimal overlap?
Thanks for your help.
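A minimal sketch of the radius idea in plain Python: greedily keep a building only if no already-kept building lies within `radius_m`, then repeat for several radii to see how many co-ordinates survive. A lat/lon grid limits each distance check to nearby cells instead of all kept points. The function names are hypothetical, the result depends on the input order, and longitude wrap-around at ±180° is ignored, which is fine within one city.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def thin(points: Sequence[LatLon], radius_m: float) -> List[int]:
    """Indices of points kept by greedy thinning: a point is kept only if
    no previously kept point lies within radius_m of it."""
    if not points:
        return []
    m_per_deg = math.pi * EARTH_RADIUS_M / 180  # about 111 km per degree of latitude
    max_abs_lat = max(abs(lat) for lat, _ in points)
    # Cells slightly wider than radius_m, so any kept point within radius_m
    # of a candidate sits in the candidate's cell or one of its 8 neighbours.
    dlat = 1.01 * radius_m / m_per_deg
    dlon = 1.01 * radius_m / (m_per_deg * math.cos(math.radians(max_abs_lat)))
    grid: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    kept: List[int] = []
    for i, (lat, lon) in enumerate(points):
        row, col = math.floor(lat / dlat), math.floor(lon / dlon)
        nearby = [k for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                  for k in grid.get((row + dr, col + dc), [])]
        if all(haversine_m(points[i], points[k]) > radius_m for k in nearby):
            grid[(row, col)].append(i)
            kept.append(i)
    return kept


def sweep(points: Sequence[LatLon], radii: Sequence[float]) -> Dict[float, int]:
    """Number of points remaining after thinning at each radius."""
    return {r: len(thin(points, r)) for r in radii}
```

Plotting the counts from `sweep` against the radii (e.g. with matplotlib) gives a simple version of the "graph" described above, from which you can pick the radius that leaves roughly 3-6k co-ordinates.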
It seems that, for your use case, you might need clustering rather than sampling to reduce your analysis set.
Given that you want to reduce your "houses" data to "neighborhoods" data, I'd suggest exploring geospatial clustering to group houses that are close together, and then taking your ~3-4K clusters as your dataset to begin with.
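As a very simple stand-in for proper clustering, you could bin houses into roughly square grid cells of a chosen size and use each non-empty cell's mean position as a neighborhood center. This is grid binning, not a real clustering algorithm (scikit-learn's `DBSCAN` with the haversine metric is a more common choice for geospatial clustering); `grid_centroids` is a hypothetical helper, and the cell size plays the role of the still-undecided neighborhood size.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

LatLon = Tuple[float, float]


def grid_centroids(points: Sequence[LatLon], cell_m: float) -> List[LatLon]:
    """Bin (lat, lon) points into cells roughly cell_m metres on a side and
    return the mean position of each non-empty cell."""
    if not points:
        return []
    m_per_deg = math.pi * 6_371_000 / 180  # metres per degree of latitude
    mean_lat = sum(lat for lat, _ in points) / len(points)
    dlat = cell_m / m_per_deg
    dlon = cell_m / (m_per_deg * math.cos(math.radians(mean_lat)))
    bins: Dict[Tuple[int, int], List[LatLon]] = defaultdict(list)
    for lat, lon in points:
        bins[(math.floor(lat / dlat), math.floor(lon / dlon))].append((lat, lon))
    return [(sum(lat for lat, _ in group) / len(group),
             sum(lon for _, lon in group) / len(group))
            for group in bins.values()]
```

Unlike a true clustering method, cell boundaries are arbitrary, so two adjacent houses can land in different cells; varying `cell_m` until about 3-4K centers remain is the quickest way to use it.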
That being said, if your objective is still to remove houses that are close together, you can create an N*N matrix of the geospatial distances between every pair of houses and, for each pair whose distance is within (0, X], where X is your threshold, remove one of the two houses.
I have (financial) data that I get in real time from an API, and I'd like to display it in a customised way (a bit like what JavaScript code would produce). For example, if I want to display 10x10 prices, update them as new data arrives, and colour each one green if it is higher than the previous price or red if it is lower, how should I go about it, and what should I use?
I assume there is a way to do this in Python, but I can't phrase my request concisely, so search engines only give me results that confuse me more...
Could someone help me by explaining where I can get started with that?
I'll give you an overview, because what you want is a general approach that most (if not all) UI packages should be able to handle. First, you need to pick a package to write your UI with; there are a number of these available for Python. I'm not sure what your other requirements are, so you'll have to choose one yourself. Once you've picked it, you'll create a grid structure composed of individual cells, each containing a price. You'll then attach an "on-change" event handler to each cell that fires when its value changes. If the new value is greater than the old one, color the cell green; if it's less, color it red. You may also want to add a timer for each cell so that the color fades after a period of time.
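A sketch of the per-cell logic described above, independent of any UI toolkit. `PriceGrid`, its method names, and the two-second fade are hypothetical choices; the optional `now` argument only exists to make the timing testable. A GUI (for example tkinter or PyQt) would call `update` whenever new data arrives and repaint each cell with the string returned by `color`.

```python
import time
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]


class PriceGrid:
    """Prices for a rows x cols grid plus each cell's highlight color:
    green after a rise, red after a fall, reset once the fade period ends."""

    def __init__(self, rows: int, cols: int, fade_seconds: float = 2.0) -> None:
        self.prices: Dict[Cell, Optional[float]] = {
            (r, c): None for r in range(rows) for c in range(cols)}
        self.direction: Dict[Cell, str] = {}
        self.changed_at: Dict[Cell, float] = {}
        self.fade_seconds = fade_seconds

    def update(self, row: int, col: int, price: float,
               now: Optional[float] = None) -> None:
        """Record a new price (the 'on-change' step) and note its direction."""
        now = time.monotonic() if now is None else now
        old = self.prices[(row, col)]
        if old is not None and price != old:
            self.direction[(row, col)] = "green" if price > old else "red"
            self.changed_at[(row, col)] = now
        self.prices[(row, col)] = price

    def color(self, row: int, col: int, now: Optional[float] = None) -> str:
        """Color to paint the cell with; 'default' once the highlight fades."""
        now = time.monotonic() if now is None else now
        changed = self.changed_at.get((row, col))
        if changed is not None and now - changed < self.fade_seconds:
            return self.direction[(row, col)]
        return "default"
```

Keeping this state separate from the widgets means the same class works with whichever UI package you choose, and the repaint can run on a timer rather than on every incoming tick.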
I am doing agent-based modeling and currently have it set up in Python, but I can switch over to Java if necessary.
I have a Twitter dataset (11 million nodes and 85 million directed edges), and I have set up a dictionary/hashmap whose key is a specific user A and whose value is a list of all of A's followers (people who follow user A). The "nodes" are just unique integer ID numbers; there is no other data. I want to visualize this data through some form of clustering. Not every individual node has to be visualized, but I want the n nodes with the most followers to be shown clearly, with the area surrounding each of them representing all the people who follow it. I'm modeling the spread of something across the map, so the nodes and the areas around them need to change colors. Ideally the visualization would be continuous, but I don't mind it just taking snapshots at every ith iteration.
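With the dictionary described above (each user mapped to a list of followers), selecting the n most-followed users and measuring how far the spread has reached among each one's followers takes only the standard library. The names `top_hubs`, `hub_shares`, and the `infected` set are hypothetical; the share could drive the color of each hub's surrounding region in every snapshot.

```python
import heapq
from typing import Dict, List, Set


def top_hubs(followers: Dict[int, List[int]], n: int) -> List[int]:
    """The n users with the most followers, most-followed first."""
    return heapq.nlargest(n, followers, key=lambda user: len(followers[user]))


def hub_shares(followers: Dict[int, List[int]], hubs: List[int],
               infected: Set[int]) -> Dict[int, float]:
    """For each hub, the fraction of its followers currently in the
    (hypothetical) 'infected' state; 0.0 for a hub with no followers."""
    shares: Dict[int, float] = {}
    for hub in hubs:
        fans = followers[hub]
        shares[hub] = sum(1 for f in fans if f in infected) / len(fans) if fans else 0.0
    return shares
```

`heapq.nlargest` avoids sorting all 11 million users when only the top n are needed.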
Additionally, I was thinking of having the clusters be separated such that:
if person A and person B each have enough followers to be visualized individually, and A and B are connected (one follows the other, or maybe both ways), then they are both visualized but kept visually separated from each other despite being connected, so that the visualization is clearer.
Anyway, I was wondering whether there is a package in Python (preferably) or Java that would allow one to do this fairly easily.
Gephi has a very nice GUI and an associated Java toolkit. You can experiment with visual layout in the GUI until you have everything looking the way you like and then code up your own version using the toolkit.
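To experiment in Gephi's GUI first, one option is to export only the edges pointing into the chosen hubs as a CSV edge list rather than the full 85 million edges. This sketch assumes Gephi's spreadsheet importer, which recognizes `Source` and `Target` column headers for edges; `write_edges_csv` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, List, TextIO


def write_edges_csv(followers: Dict[int, List[int]], hubs: Iterable[int],
                    out: TextIO) -> int:
    """Write follower -> hub edges for the chosen hubs as a CSV edge list
    with Source/Target headers; returns the number of edges written."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target"])
    count = 0
    for hub in hubs:
        for follower in followers[hub]:
            writer.writerow([follower, hub])  # direction: follower follows hub
            count += 1
    return count
```

For example, open a file with `open("edges.csv", "w", newline="")` and pass it as `out`, using the hub list from whatever top-n selection you prefer.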
I'm trying to figure out how to calculate how many boxes can fit into a shipping container for my work. Does anyone know how I can do this with Python? There's an example here:
http://www.searates.com/reference/stuffing/
Basically, the user enters the dimensions, quantities, and weights of several different boxes. The program would then return how full the container is, or how many more of each box could be added to it. It would also calculate the weight of the entire unit.
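A minimal sketch for a single box type, assuming boxes are stacked in an aligned grid and may be rotated into any of their six orientations; it checks both the space limit and a payload limit. This heuristic gives a lower bound, not an optimal packing: mixing different box types is a 3D bin-packing problem, which is NP-hard and usually handled with heuristics or a dedicated library. The function names are hypothetical, and the container figures in the usage note are placeholders, not real container specifications.

```python
from itertools import permutations
from typing import Dict, Tuple

Dims = Tuple[float, float, float]  # (length, width, height), same unit throughout


def volume(d: Dims) -> float:
    return d[0] * d[1] * d[2]


def best_fit(container: Dims, box: Dims) -> Tuple[int, Dims]:
    """Most identical boxes that fit in an aligned grid, trying all six
    orientations of the box; returns (count, orientation used)."""
    best: Tuple[int, Dims] = (0, box)
    for orient in sorted(set(permutations(box))):
        count = 1
        for space, side in zip(container, orient):
            count *= int(space // side)
        if count > best[0]:
            best = (count, orient)
    return best


def load_summary(container: Dims, max_payload: float, box: Dims,
                 box_weight: float, quantity: int) -> Dict[str, object]:
    """Capacity under both the space and payload limits, plus fill and weight."""
    fit, orient = best_fit(container, box)
    capacity = min(fit, int(max_payload // box_weight))
    loaded = min(quantity, capacity)
    return {
        "capacity": capacity,
        "loaded": loaded,
        "more_boxes_possible": capacity - loaded,
        "fill_by_volume": loaded * volume(box) / volume(container),
        "total_weight": loaded * box_weight,
        "orientation": orient,
    }
```

For example, `load_summary((590, 235, 239), 20000, (50, 40, 30), 10, 100)` (hypothetical values in cm and kg) reports a capacity of 392 boxes, so 292 more could be added. Drawing the packed layout would need a plotting library on top of the chosen orientation.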
Even better would be a picture showing how it's packed.