I am still getting used to PVLIB and figuring out how to use the methods associated with it. I'd like to be able to model the growth of localised hotspots in a PV module due to current mismatch that results from partial shading.
I'd be surprised if I was the first person to do this, so I'm wondering what other solutions exist to do this in a straightforward way from PVLIB.
You can use the single-diode model functions from pvlib to build up the electrical simulation for this scenario, and thereby determine how much electrical power the affected cell absorbs.
There isn't a thermal model in pvlib to tell you how hot it would get, but as a first approximation you could adapt one of the existing module/cell temperature functions quite easily. There is a local variable called heat_input to which you can add the electrical power.
Use PVMismatch. It doesn't do self heating, but it will calculate mismatch for arbitrary cell patterns and string layouts with arbitrary irradiance, temperature and cell properties. Once calculated you can iterate temperature to find equilibrium
Related
I am new to OpenMM and I would appreciate some guidance on the following matter:
Currently I am not interested in running molecular dynamics simulations, for starters I would just like to compute what are the forces or free energies between individual pairs of atoms using OpenMMs AMBER force field for example. Essentially I would like to end up with a heat map which represents forces between atom pairs something like this:
Where numbers represent strength of the force or value of free energy.
I have trouble finding out how to access such lower level functionality of OpenMM where I could write a custom script that calculates only desired forces provided the 3D coordinates of atoms and their types. In their tutorials I have just found how to run fully fledged simulations by providing force field data and PDB files of molecular systems.
Preferably I would like to achieve this with python.
Any concrete example or guidance is much appreciated.
I have found an answer in the Openmm's issue tracker on GitHub.
In short: There is no API to achieve exactly that in OpenMM as what I am trying to do is not well defined from purely physical/chemical perspective. My best bet is to compute something that looks like an energy based only on pairwise inter-atom distances which can be quarried from an openmm state like this (as suggested in the discussion referenced above):
state = simulation.context.getState(getPositions=True)
positions = state.getPositions(asNumpy=True).value_in_unit(nanometer)
I am using lifelines package to do Cox Regression. After trying to fit the model, I checked the CPH assumptions for any possible violations and it returned some problematic variables, along with the suggested solutions.
One of the solution that I would like to try is the one suggested here:
https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Introduce-time-varying-covariates
However, the example written here is using CoxTimeVaryingFitter which, unlike CoxPHFitter, does not have concordance score, which will help me gauge the model performance. Additionally, CoxTimeVaryingFitter does not have check assumption feature. Does this mean that by putting it into episodic format, all the assumptions are automatically satisfied?
Alternatively, after reading a SAS textbook on survival analysis, it seemed like their solution is to create the interaction term directly (multiplying the problematic variable with the survival time) without changing the format to episodic format (as shown in the link). This way, I was hoping to just keep using CoxPHFitter due to its model scoring capability.
However, after doing this alternative, when I call check_assumptions again on the model with the time-interaction variable, the CPH assumption on the time-interaction variable is violated.
Now I am torn between:
Using CoxTimeVaryingFitter without knowing what the model performance is (seems like a bad idea)
Using CoxPHFitter, but the assumption is violated on the time-interaction variable (which inherently does not seem to fix the problem)
Any help regarding to solve this confusion is greatly appreciated
Here is one suggestion:
If you choose the CoxTimeVaryingFitter, then you need to somehow evaluate the quality of your model. Here is one way. Use the regression coefficients B and write down your model. I'll write it as S(t;x;B), where S is an estimator of the survival, t is the time, and x is a vector of covariates (age, wage, education, etc.). Now, for every individual i, you have a vector of covariates x_i. Thus, you have the survival function for each individual. Consequently, you can predict which individual will 'fail' first, which 'second', and so on. This produces a (predicted) ranking of survival. However, you know the real ranking of survival since you know the failure times or times-to-event. Now, quantify how many pairs (predicted survival, true survival) share the same ranking. In essence, you would be estimating the concordance.
If you opt to use CoxPHFitter, I don't think it was meant to be used with time-varying covariates. Instead, you could use two other approaches. One is to stratify your variable, i.e., cph.fit(dataframe, time_column, event_column, strata=['your variable to stratify']). The downside is that you no longer obtain a hazard ratio for that variable. The other approach is to use splines. Both of these methods are explained in here.
Hello old faithful community,
This might be a though one as I can barely find any material on this.
The Problem
I have a data set of crimes committed in NSW Australia by council, and have merged this with average house prices by council. I'm now looking to produce a linear regression to try and predict said house price by the crime in the neighbourhood. The issue is, I have 49 crimes, and only want the best ones (statistically speaking) to be used in my model.
I've run the regression score over all and some variables (using correlation), and had results from .23 - .38 but I want to perfect this to the best possible - if there is a way to do this of course.
I've thought about looping over every possible combination, but this would end up by couple of million according to google.
So, my friends - how can I python this dataframe to get the best columns?
If I might add, you may want to take a look at the Python package mlxtend, http://rasbt.github.io/mlxtend.
It is a package that features several forward/backward stepwise regression algorithms, while still using the regressors/selectors of sklearn.
There is no gold standard to solving this problem and you are right, selecting every combination is computational not feasible most of the time -- especially with 49 variables. One method would be to implement a forward or backward selection by adding/removing variables based on a user specified p-value criteria (this is the statistically relevant criteria you mention). For python implementations using statsmodels, check out these links:
https://datascience.stackexchange.com/questions/24405/how-to-do-stepwise-regression-using-sklearn/24447#24447
http://planspace.org/20150423-forward_selection_with_statsmodels/
Other approaches that are less 'statistically valid' would be to define a model evaluation metric (e.g., r squared, mean squared error, etc) and use a variable selection approach such as LASSO, random forest, genetic algorithm, etc to identify the set of variables that optimize the metric of choice. I find that in practice, ensembling these techniques in a voting-type scheme works the best as different techniques work better for certain types of data. Check out the links below from sklearn to see some options that you can code up pretty quickly with your data:
Overview of techniques: http://scikit-learn.org/stable/modules/feature_selection.html
A stepwise procedure: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html
Select best features based on model: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFromModel.html
If you are up for it, I would try a few techniques and see if the answers converge to the same set of features -- This will give you some insight into the relationships between your variables.
I was analyzing some geographical data and attempting to predict/forecast next occurrence of event with respect to time and it geographical position. The data was in following order (with sample data)
Timestamp Latitude Longitude Event
13307266 102.86400972 70.64039541 "Event A"
13311695 102.8082912 70.47394645 "Event A"
13314940 102.82240522 70.6308513 "Event A"
13318949 102.83402128 70.64103035 "Event A"
13334397 102.84726242 70.66790352 "Event A"
First step was classifying it into 100 zones, so that reduces dimensions and complexity.
Timestamp Zone
13307266 47
13311695 65
13314940 51
13318949 46
13334397 26
Next step was to do time series analysis then I got stuck here for 2 months, read around a lot of literature and figured these were my options
* ARIMA (auto-regression method)
* Machine Learning
I wanted to utilize Machine learning to forecast using python but couldn't really figure out how.Specifically are there any python libraries/open-source-code specific for use case, which I can build upon.
EDIT 1:
To clarify, data is loosely dependent on past data but over a period of time is uniformly distributed.
The best way to visualize the data would be, to imagine N number of agents controlled by a algorithm which allots them task of picking resource from grids. Resources are function of socioeconomic structure of society and also strongly dependent on geography. Its in interest of " algorithm " to be able to predict demand zone and time wise.
p.s:
For Auto-regressive models like ARIMA Python already has a library http://pypi.python.org/pypi/statsmodels .
Without example data or existing code I can't offer you anything concrete.
However, often it's helpful to re-phrase your problem in the nomenclature of the field you want to explore. In ML terms:
Your problem's features: How your inputs are specified. Timestamp is continuous, geographic zone is discrete.
Your problems's target label: an event, precisely whether or not a given event has occurred.
Your problem is supervised: target labels for previous data are available. You have previous instances of (timestamp, geographic zone) to event mappings.
The target label is discrete, so this is a classification problem (as opposed to a regression problem, where the output is continuous).
So I'd say you have a supervised classification problem. As an aside you may want to do some sort of time regularisation first; I'm guessing there are going to be patterns of the events depending on what time of the day, day of the month, or month of the year it is, and you may want to represent this as an additional feature.
Taking a look at one of the popular Python ML libraries available, scikit-learn, here:
http://scikit-learn.org/stable/supervised_learning.html
and consulting a recent posting on a cheatsheet for scikit-learn by one of the contributors:
http://peekaboo-vision.blogspot.de/2013/01/machine-learning-cheat-sheet-for-scikit.html
Your first good bet would be to try Support Vector Machines (SVM), and if that fails maybe give k Nearest Neighbours (kNN) a shot as well. Note that using an ensemble classifier is usually superior than using just one instance of a given SVM/kNN.
How, exactly, to apply SVM/kNN with time as a feature may require more research, since AFAIK (and others will probably correct me) SVM/kNN require bounded inputs with a mean of zero (or normalised to have a mean of zero). Just doing some random Googling you may be able to find certain SVM kernels, for example a Fourier kernel, that can transform a time-series feature for you:
SVM Kernels for Time Series Analysis
http://www.stefan-rueping.de/publications/rueping-2001-a.pdf
scikit-learn handily allows you to specify a custom kernel for an SVM. See:
http://scikit-learn.org/stable/auto_examples/svm/plot_custom_kernel.html#example-svm-plot-custom-kernel-py
With your knowledge of ML nomenclature, and example data in hand, you may want to consider posting the question to Cross Validated, the statistics Stack Exchange.
EDIT 1: Thinking about this problem more you need to really understand if your features and corresponding labels are independent and identically distributed (IID) or not. For example what if you were modelling how forest fires spread over time. It's clear that the likelihood of a given zone catches fire is contingent on its neighbours being on fire or not. AFAIK SVM and kNN assume the data is IID. At this point I'm starting to get out of my depth, but I think you should at least give several ML methods a shot and see what happens! Remember to cross-validate! (scikit-learn does this for you).
I am trying to devise an iterative markov decision process (MDP) agent in Python with the following characteristics:
observable state
I handle potential 'unknown' state by reserving some state space
for answering query-type moves made by the DP (the state at t+1 will
identify the previous query [or zero if previous move was not a query]
as well as the embedded result vector) this space is padded with 0s to
a fixed length to keep the state frame aligned regardless of query
answered (whose data lengths may vary)
actions that may not always be available at all states
reward function may change over time
policy convergence should incremental and only computed per move
So the basic idea is the MDP should make its best guess optimized move at T using its current probability model (and since its probabilistic the move it makes is expectedly stochastic implying possible randomness), couple the new input state at T+1 with the reward from previous move at T and reevaluate the model. The convergence must not be permanent since the reward may modulate or the available actions could change.
What I'd like to know is if there are any current python libraries (preferably cross-platform as I necessarily change environments between Windoze and Linux) that can do this sort of thing already (or may support it with suitable customization eg: derived class support that allows redefining say reward method with one's own).
I'm finding information about on-line per-move MDP learning is rather scarce. Most use of MDP that I can find seems to focus on solving the entire policy as a preprocessing step.
Here is a python toolbox for MDPs.
Caveat: It's for vanilla textbook MDPs and not for partially observable MDPs (POMDPs), or any kind of non-stationarity in rewards.
Second Caveat: I found the documentation to be really lacking. You have to look in the python code if you want to know what it implements or you can quickly look at their documentation for a similar toolbox they have for MATLAB.