Apologies if this is not the right place for this question.
I've recently started using MIT's MEEP software (Python 3, on Linux). I am quite new to it and would like to use it mostly for photovoltaics projects. Shapes that show up fairly often in this field are "inverted pyramid" and slanted (oblique) cone structures. Creating shapes in MEEP generally seems to be done with the GeometricObject class, but neither of these structures appears to be directly supported. Is there any way around this, or is my only real option to simulate these structures by stacking small Block objects?
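By stacking Blocks I mean something along these lines (a rough sketch only; the dimensions, resolution, and the silicon-like medium are placeholders I made up):

import meep as mp

silicon = mp.Medium(epsilon=12)   # placeholder material

def inverted_pyramid_blocks(base_width, depth, n_layers, top_z):
    """Approximate an inverted pyramid by a stack of progressively narrower Blocks."""
    blocks = []
    dz = depth / n_layers
    for i in range(n_layers):
        width = base_width * (1.0 - i / n_layers)   # narrower as we go down
        center_z = top_z - (i + 0.5) * dz           # each slab sits below the previous one
        blocks.append(mp.Block(size=mp.Vector3(width, width, dz),
                               center=mp.Vector3(0, 0, center_z),
                               material=silicon))
    return blocks

geometry = inverted_pyramid_blocks(base_width=1.0, depth=0.7, n_layers=20, top_z=0.35)
sim = mp.Simulation(cell_size=mp.Vector3(2, 2, 2), resolution=40, geometry=geometry)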
As described in my own "answer" posted below, it's not too difficult to define these geometric objects myself, write a function that checks whether a point is inside the object, and return the appropriate material. How would I go about converting this into a MEEP GeometricObject, instead of converting it to a material_func as I've done?
No responses, so I thought I'd post my hacky way around it. There are two solutions. The first, as mentioned in the question, is to just stack MEEP's Block objects. The other approach is to define my own Pyramid class, which works basically the same way as described here. I then convert a list of my own objects and MEEP's shape objects into a function that takes a vector and returns a material, and this is fed as material_func to MEEP's Simulation object. So far it seems to work, hence I'm posting it as an answer. However, it substantially slows down subpixel averaging (and maybe the rest of the simulation, though I haven't done an actual analysis), so I'm not very happy with it.
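Stripped down, the second approach looks roughly like this (a sketch, not my exact code; the Pyramid geometry test and all dimensions are simplified, and in my install the Simulation keyword is spelled material_function):

import meep as mp

class Pyramid:
    """Inverted pyramid defined by an apex point, a base centre and a base width."""
    def __init__(self, apex, base_center, base_width, material):
        self.apex = apex
        self.base_center = base_center
        self.base_width = base_width
        self.material = material

    def inside(self, p):
        height = self.base_center.z - self.apex.z
        if height == 0:
            return False
        frac = (p.z - self.apex.z) / height        # 0 at the apex, 1 at the base
        if frac < 0 or frac > 1:
            return False
        half = 0.5 * self.base_width * frac        # half-width of the cross-section at p.z
        return (abs(p.x - self.base_center.x) <= half and
                abs(p.y - self.base_center.y) <= half)

def make_material_func(shapes, default=mp.Medium(epsilon=1)):
    """Turn a list of objects with .inside()/.material into a point -> Medium function."""
    def material_func(p):
        for s in shapes:
            if s.inside(p):
                return s.material
        return default
    return material_func

pyramids = [Pyramid(apex=mp.Vector3(0, 0, -0.35), base_center=mp.Vector3(0, 0, 0.35),
                    base_width=1.0, material=mp.Medium(epsilon=12))]
sim = mp.Simulation(cell_size=mp.Vector3(2, 2, 2), resolution=40,
                    material_function=make_material_func(pyramids))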
I'm not sure which is "better", but the second method does feel more precise, insofar as you actually have pyramids, not just a stack of Blocks.
I can't seem to find the code for numpy argmax.
The source link in the docs leads me here, which doesn't have any actual code.
I went through every function that mentions argmax using the github search tool and still no luck. I'm sure I'm missing something.
Can someone lead me in the right direction?
Thanks
Numpy is written in C. It uses a template engine that parses special comments to generate many versions of the same generic function (typically one for each supported type). This tool is very helpful for generating fast code, since the C language does not provide (proper) templates, unlike C++ for example. However, it also makes the code more cryptic than necessary, since the names of the functions are often generated. For example, a generic function name can look like #TYPE#_#OP#, where #TYPE# and #OP# are two macros that can each take different values. On top of all of this, the CPython binding also makes the code more complex, since the C functions have to be wrapped so they can be called from CPython code with complex arrays (possibly with a high number of dimensions and custom user types) and with CPython arguments to decode.
_PyArray_ArgMinMaxCommon is quite a good entry point, but it is only a wrapper function and not the main computing one. It is mainly useful if you plan to change the prototype of the Numpy function as seen from Python.
The main computational function can be found here. The comment just above the function is the one used to generate the variants of the function (e.g. CDOUBLE_argmax). Note that there are some type-specific implementations below the main one, such as OBJECT_argmax, since CPython objects and strings must be handled a bit differently. Thank you for contributing to Numpy.
As mentioned in the comments, you'll likely find what you are searching for in the C implementation (here, under _PyArray_ArgMinMaxCommon). The code itself can be very convoluted, so if your intent was to open an issue on numpy with a broad idea, I would do it on the page you linked anyway.
I have taught myself Python in quite a haphazard way, so my question perhaps won't be very pythonic.
When I write functions in classes, I often lose the overview of what each function does. I do try to name them properly, but there are still smaller pieces of code where it seems arbitrary which function to put them in. So whenever I want to make changes, I still need to go through the entire file to figure out how my functions actually flow.
From what I understand, we have objects and we have functions, and these are our units for structuring code. But this only gives you a flat structure. It doesn't allow any kind of nesting over multiple levels, like in a tree diagram. In particular, the code in my file doesn't automatically order itself so that the functions that are called first and most often end up near the top, while helper functions end up further down in the document, or even nested.
In fact, even being able to visually nest lower-order subroutines "inside" the higher-order functions that call them would seem helpful. But that's not something Python's syntax supports. (Plus, it wouldn't quite suffice, because a subroutine might be used by several higher-order functions.)
I imagine it would be useful to see all the functions in my code visualized as a tree, or as a concept map: each function a dot, with the call order shown by arrows. That way I would also easily see which functions are central and which are outliers, or even orphaned.
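To make this concrete, here is a very rough sketch of the kind of analysis I have in mind, using nothing but the standard ast module (it only recognises plain foo() calls, not methods, and the file name is made up):

import ast

# Rough sketch: build a "who calls whom" mapping for the functions defined in a file.
def call_graph(source):
    tree = ast.parse(source)
    graph = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = sorted(callees)
    return graph

if __name__ == "__main__":
    with open("my_module.py") as f:          # hypothetical file name
        for caller, callees in call_graph(f.read()).items():
            print(caller, "->", ", ".join(callees) or "(leaf)")

Feeding the output into something like graphviz would then give exactly the dots-and-arrows picture described above.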
Yet perhaps this isn't even a case for another tool. Perhaps it's more a case of me not understanding proper coding, and there is something I could do differently to get an intuitive overview of how my program works without needing another tool.
Firstly, I am not quite sure why this isn't asked more often. Reading code is not intuitive at all! We should be able to visualize the evolution of a process or function so well that nearly everyone can understand its behavior. In the 60s or so, people had to be reasonably sure their programs would run, because getting access to the computer took time; today we execute or compile our program, run tests if we have them, and know immediately whether it works. What has happened is that there is less mental effort now: we execute a bit less code in our heads, and the computer a bit more. But we must still think about how the program behaves mid-execution in order to debug it. In the future, it would be nice if the computer could just tell us how the program behaves.
You propose looking at a sort of tree of the program as a solution, and after all, the abstract syntax tree is literally a tree, but I don't think this is where we ought to spend our efforts when it comes to visualizing systems. What would be preferable is an interactive view of how the program changes its intermediate data structures as a function of time.
Currently we have debuggers, but that's akin to studying a function by asking for its value at many points, when we would much rather look at its graph. A lot of programming is done by writing something you feel is right, observing whether the behavior is correct, and, if it's not, making modifications by reacting to and correcting that behavior.
Bret Victor discusses this topic in his essay Learnable Programming. I highly recommend it; even though it won't help you right now, maybe you can help others in the future by making these ideas more prevalent.
Onwards, then, to what you can do right now. In his book Clean Code, Robert C. Martin proposes structuring code much like a newspaper article is laid out.
Think of a well-written newspaper article. You read it vertically. At the top you expect a headline that will tell you what the story is about and allows you to decide whether it is something you want to read. The first paragraph gives you a synopsis of the whole story, hiding all the details while giving you the broad-brush concepts. As you continue downward, the details increase until you have all the dates, names, quotes, claims, and other minutia.
What is proposed is to organize your program top-down, with high-level procedures that call mid-level procedures, which in turn call low-level procedures. At any place in the code it should be obvious that (1) you are at the appropriate level of abstraction, and (2) you are looking at the part of the program implementing the behavior you want to modify.
This means storing state at the level where it belongs and exposing it nowhere else. Procedures should take only the parameters they need, because more parameters means more things to keep in mind when reasoning about the code.
This is the primary reason for decomposing procedures into smaller procedures. For example, here is some code I've written previously. It's not perfect, but you can see very clearly which part of the program you need to go to if you want to change anything.
Of course, the higher-order procedures are listed before any others: I'm telling you what I'm going to do before I show you how I do it.
function formatLinks(matches, stripped) {
  let formatted_links = []
  for (const match of matches) {
    let diff = difference(match, stripped)
    if (isSimpleLink(diff)) {
      formatted_links.push(formatAsSimpleLink(diff))
    } else if (hasPrefix(diff)) {
      formatted_links.push(formatAsPrefixedLink(diff))
    } else if (hasSuffix(diff)) {
      formatted_links.push(formatAsSuffixedLink(diff))
    }
  }
  // There might be multiple links within a word,
  // in which case we join them and strip any duplicated parts
  // (which are otherwise interpreted as suffixes)
  if (formatted_links.length > 1) {
    return combineLinks(formatted_links)
  }
  // Default case
  return formatted_links[0]
}
If JavaScript were a typed language, and if we could see an image of the decisions made in the code as a function of input and time, this could be even better.
I think Quokka.js and VS Code Debug Visualizer are both doing interesting work in this area.
We're trying to assess the feasibility of this idea:
We have a pretty deep stack of HasTraits objects in a modeling program. For example, if we are modeling two materials, we could access various attributes on these with:
Layer.Material1.Shell.index_of_refraction
Layer.Material5.Medium.index_of_refraction
We've used this code for simulations, where we merely increment the value of a trait. For example, we could run a simulation where the index_of_refraction of one of these materials varies from 1.3 to 1.6 over 10 iterations. It actually works quite nicely.
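In code, a sweep like that is roughly (a stripped-down stand-in written just for this question; the class bodies are made up to mirror the attribute path above):

import numpy as np
from traits.api import HasTraits, Float, Instance

class Shell(HasTraits):
    index_of_refraction = Float(1.3)

class Material(HasTraits):
    Shell = Instance(Shell, ())

class Layer(HasTraits):
    Material1 = Instance(Material, ())

layer = Layer()
for value in np.linspace(1.3, 1.6, 10):            # 10 steps from 1.3 to 1.6
    layer.Material1.Shell.index_of_refraction = value
    # ... run the model for this value ...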
The problem is in selecting the desired traits for the simulation. Users aren't going to know all of these trait variable names, so we wanted to present a hierarchical/tree view of the entire trait structure of the program. For the above two traits, it might look like:
Layer
  - Material1
    - Shell
      - index_of_refraction
  - Material5
    - Medium
      - index_of_refraction
Etc...
I know that traitsui supports TreeEditors, but are there any examples of building a TreeEditor based on inspecting a stack of HasTraits objects like this? What is the most straightforward way to get the stack of traits from an object? Essentially, is this idea feasible, or should I go back to the drawing board?
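For what it's worth, the kind of inspection I have in mind would look roughly like this (an untested sketch; it simply recurses into any trait value that is itself a HasTraits instance):

from traits.api import HasTraits

def walk_traits(obj, name="root", depth=0):
    """Print a tree of trait names, descending into nested HasTraits objects."""
    print("  " * depth + name)
    for trait_name in obj.trait_names():
        if trait_name.startswith("trait_"):        # skip HasTraits' bookkeeping traits
            continue
        value = getattr(obj, trait_name)
        if isinstance(value, HasTraits):
            walk_traits(value, trait_name, depth + 1)
        else:
            print("  " * (depth + 1) + trait_name)

Called on the layer object from the snippet above, this prints a nesting much like the tree sketched earlier; what I don't know is how to feed such a structure into a TreeEditor.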
Thanks
The ValueEditor does this. You can take a look at how it configures the TreeEditor to do this here:
https://github.com/enthought/traitsui/blob/master/traitsui/value_tree.py
Here is an image from Robert's solution.
Followup Discussion
Robert, say I have a custom TreeEditor. TraitsUI doesn't seem to let me use it directly:
Item('myitem', editor=TreeEditor())
I get:
traits.trait_errors.TraitError: The 'adapter' trait of an ITreeNodeAdapterBridge instance must be an implementor of, or can be adapted to implement, ITreeNode or None, but a value of [<pame.gensim.LayerSimulation object at 0x7fb623bf0830>] <class 'traits.trait_handlers.TraitListObject'> was specified.
I've tried this with _ValueTree, ValueTree, value_tree_editor, value_tree_editor_with_root, _ValueEditor and ValueEditor.
The only one that works is ValueEditor. So even though I understand how to subclass TraitsNode, it doesn't seem like this is going to work unless I hook everything up through an EditorFactory, i.e. the behavior we want to customize is all the way down in TreeEditor, and that's buried under _ValueEditor, ValueEditor, EditorFactory, etc.
Does this make any sense?
I'm looking for a Reportlab wrapper which does the heavy lifting for me.
I found this one, which looks promising.
It looks cumbersome to me to deal with the low-level API of Reportlab (especially positioning of elements, etc.), and a library should facilitate at least this part.
My code for creating PDFs is currently a maintenance hell, consisting of positioning elements, taking care of which things should stick together, and logic to deal with input strings of varying length.
For example, while creating PDF invoices, I have to give the user the ability to adjust the distance between two paragraphs. Currently I grab this value from the UI and then recalculate the positions of paragraphs A and B based on the input.
Besides looking for a wrapper to help me with this, it would be great if someone could point me to (or provide) a best-practice example of how to deal with positioning of elements, varying lengths of input strings, etc.
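To illustrate, this is roughly the direction I would like a wrapper to push me in, using reportlab's own platypus flowables instead of manual coordinates, so that the gap between two paragraphs is just a value rather than a recalculation (a simplified, untested sketch; the content and the gap value are made up):

from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import mm
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer

def build_invoice(path, paragraph_gap_mm=6):
    """Let platypus handle positioning; the user-adjustable gap is just a Spacer height."""
    styles = getSampleStyleSheet()
    story = [
        Paragraph("Invoice 2021-001", styles["Heading1"]),
        Paragraph("Dear customer, ...", styles["Normal"]),
        Spacer(1, paragraph_gap_mm * mm),            # gap taken from the UI
        Paragraph("Thank you for your business.", styles["Normal"]),
    ]
    SimpleDocTemplate(path, pagesize=A4).build(story)

build_invoice("invoice.pdf", paragraph_gap_mm=10)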
For future reference:
Having tested the lib PDFDocument, I can only recommend it. It takes away a lot of complexity, provides a lot of helper functions, and helps to keep your code clean. I found this resource really helpful to get started.
I have a scientific data management problem which seems general, but for which I can't find an existing solution, or even a description of it, despite having long puzzled over it. I am about to embark on a major rewrite (Python), but I thought I'd cast about one last time for existing solutions, so I can scrap my own and get back to the biology, or at least learn some appropriate vocabulary for better googling.
The problem:
I have expensive (hours to days to calculate) and big (GBs) data attributes that are typically built as transformations of one or more other data attributes. I need to keep track of exactly how this data is built, so I can reuse it as input for another transformation if it fits (i.e. was built with the right specification values), or construct new data as needed. Although it shouldn't matter, I typically start with 'value-added', somewhat heterogeneous molecular biology info, for example genomes with genes and proteins annotated by other researchers' processes. I need to combine and compare these data to make my own inferences. A number of intermediate steps are often required, and these can be expensive. In addition, the end results can become the input for further transformations. All of these transformations can be done in multiple ways: restricting to different initial data (e.g. using different organisms), using different parameter values in the same inference, or using different inference models, etc. The analyses change frequently and build on each other in unplanned ways. I need to know what data I have (what parameters or specifications fully define it), both so I can reuse it if appropriate and for general scientific integrity.
My efforts in general:
I design my Python classes with this problem of description in mind. All data attributes built by a class object are described by a single set of parameter values. I call these defining parameters or specifications the 'def_specs', and the def_specs together with their values the 'shape' of the data atts. The entire global parameter state for the process might be quite large (e.g. a hundred parameters), but the data atts provided by any one class require only a small number of these, at least directly. The goal is to check whether previously built data atts are appropriate by testing whether their shape is a subset of the global parameter state.
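The subset test itself is trivial; the hard part is collecting the shape in the first place. In simplified form (the real def_spec values can also be lists or dicts, and the example parameters are invented):

def shape_fits(shape, global_state):
    """A data att can be reused if every one of its def_specs matches the current state."""
    return all(key in global_state and global_state[key] == value
               for key, value in shape.items())

global_state = {"organism": "S. cerevisiae", "evalue_cutoff": 1e-5, "model": "hmm"}
shape = {"organism": "S. cerevisiae", "evalue_cutoff": 1e-5}
assert shape_fits(shape, global_state)               # this data att can be reused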
Within a class it is easy to find the needed def_specs that define the shape by examining the code. The rub arises when a module needs a data att from another module. Those data atts will have their own shape, perhaps passed as arguments by the calling object, but more often filtered from the global parameter state. The calling class should be augmented with the shape of its dependencies in order to maintain a complete description of its own data atts.
In theory this could be done manually by examining the dependency graph, but the graph can get deep, there are many modules which I am constantly changing and adding, and ... I'm too lazy and careless to do it by hand.
So the program dynamically discovers the complete shape of the data atts by tracking calls to other classes' attributes and pushing their shape back up to the caller(s) through a managed stack of __get__ calls. As I rewrite, I find that I need to strictly control attribute access to my builder classes to prevent arbitrary info from influencing the data atts. Fortunately, Python makes this easy with descriptors.
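Stripped to its bare bones, the descriptor bookkeeping looks something like this (a sketch of the idea only, with the 'managed stack' flattened into a single shared dict; the example def_specs are invented):

class TrackedAtt:
    """Descriptor that merges its own def_specs into the owner's accumulated shape."""
    def __init__(self, name, def_specs, build):
        self.name = name
        self.def_specs = def_specs          # parameters this att depends on directly
        self.build = build                  # function(owner) -> the expensive data

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Record the direct dependencies; nested TrackedAtt accesses inside
        # self.build() add their own def_specs the same way, so the full shape
        # propagates up to whichever attribute was asked for first.
        obj.shape.update({k: obj.params[k] for k in self.def_specs})
        cache = obj.__dict__.setdefault("_cache", {})
        if self.name not in cache:
            cache[self.name] = self.build(obj)
        return cache[self.name]

class Builder:
    def __init__(self, params):
        self.params = params                # the global parameter state (or a slice of it)
        self.shape = {}                     # accumulated def_specs -> values

    genome = TrackedAtt("genome", ["organism"],
                        lambda self: "genome of %s" % self.params["organism"])
    hits = TrackedAtt("hits", ["evalue_cutoff"],
                      lambda self: "search %s at %g" % (self.genome,
                                                        self.params["evalue_cutoff"]))

b = Builder({"organism": "S. cerevisiae", "evalue_cutoff": 1e-5})
b.hits                                      # touching hits also touches genome ...
print(b.shape)                              # ... so shape now contains both def_specs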
I store the shape of the data atts in a DB so that I can query whether appropriate data (i.e. data whose shape is a subset of the current parameter state) already exists. In my rewrite I am moving from MySQL (via the great SQLAlchemy) to an object DB (ZODB or CouchDB?), because the table for each class has to be altered whenever additional def_specs are discovered, which is a pain, and because some of the def_specs are Python lists or dicts, which are a pain to translate to SQL.
I don't think this data management can be separated from my data transformation code, because of the need for strict attribute control, though I am trying to separate it as much as possible. I can use existing classes by wrapping them with a class that provides their def_specs as class attributes and DB management via descriptors, but such classes are terminal in that no further discovery of dependency shape can take place in them.
If the data management cannot easily be separated from the data construction, I guess it is unlikely that there is an out-of-the-box solution, but rather a thousand specific ones. Perhaps there is an applicable pattern? I'd appreciate any hints on how to go about looking, or how to better describe the problem. To me it seems a general issue, though managing deeply layered data is perhaps at odds with the prevailing winds of the web.
I don't have specific python-related suggestions for you, but here are a few thoughts:
You're encountering a common challenge in bioinformatics. The data is large, heterogeneous, and comes in constantly changing formats as new technologies are introduced. My advice is to not overthink your pipelines, as they're likely to be changing tomorrow. Choose a few well defined file formats, and massage incoming data into those formats as often as possible. In my experience, it's also usually best to have loosely coupled tools that do one thing well, so that you can chain them together for different analyses quickly.
You might also consider taking a version of this question over to the bioinformatics stack exchange at http://biostar.stackexchange.com/
ZODB has not been designed to handle massive data; it is intended more for web-based applications, and in any case it is a flat-file based database.
I recommend you try PyTables, a Python library for handling HDF5 files, which is a format used in astronomy and physics to store the results of big calculations and simulations. It can be used as a hierarchical database and also has an efficient way to pickle Python objects. By the way, the author of PyTables explained that ZODB was too slow for what he needed to do, and I can confirm that. If you are interested in HDF5, there is also another library, h5py.
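For what it's worth, here is a minimal sketch (using h5py, mentioned above) of keeping a big result together with the parameters that define it; the dataset name and attribute values are made up:

import h5py
import numpy as np

# Store a large computed array together with its defining parameters.
with h5py.File("results.h5", "w") as f:
    dset = f.create_dataset("alignment/scores", data=np.random.rand(1000, 1000),
                            compression="gzip")
    dset.attrs["organism"] = "S. cerevisiae"
    dset.attrs["evalue_cutoff"] = 1e-5

# Later: reopen and check whether the stored parameters match the current ones.
with h5py.File("results.h5", "r") as f:
    print(dict(f["alignment/scores"].attrs))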
As a tool for managing the versioning of your different calculations, you could try Sumatra, which is something like an extension of git/trac designed for simulations.
You should ask this question on Biostar; you will find better answers there.