I have a set of data like this:
numpy.array([[3, 7],[5, 8],[6, 19],[8, 59],[10, 42],[12, 54], [13, 32], [14, 19], [99, 19]])
which I want to split into a number of chunks with a percentage of overlap, for each column separately. For example, splitting the first column into 3 chunks with 50% overlap gives a 2-D array:
[[3, 5, 6, 8],
[6, 8, 10, 12],
[10, 12, 13, 14]]
(ignoring the last chunk, [13, 14, 99], which is not the same size as the rest).
I'm trying to write a function that takes the array, the number of chunks and the overlap percentage, and returns the result.
That's a window function, so use skimage.util.view_as_windows:
from skimage.util import view_as_windows
# in_arr is the array from the question; windows of 4 values, advancing by 2
out = view_as_windows(in_arr[:, 0], window_shape=4, step=2)
If you need numpy only, you can use this recipe
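The question also asks for a function driven by the number of chunks and the overlap percentage rather than by a window length and step. Below is a NumPy-only sketch of that, assuming NumPy >= 1.20 (for np.lib.stride_tricks.sliding_window_view). The helper name chunk_with_overlap is hypothetical, and the rule for deriving the window length (the largest one that fits n_chunks chunks into the rows) is an assumption, not something stated in the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def chunk_with_overlap(arr, n_chunks, overlap):
    """Split every column of a 2-D array into n_chunks overlapping, equal-length chunks.

    overlap is a fraction in [0, 1), e.g. 0.5 for 50%. Returns an array of shape
    (n_columns, n_chunks, window); rows that do not fill a whole chunk are dropped.
    """
    n_rows = arr.shape[0]
    # n chunks of length w, each starting w * (1 - overlap) rows after the previous one,
    # span w + (n - 1) * w * (1 - overlap) rows: take the largest w that fits.
    window = int(n_rows // (1 + (n_chunks - 1) * (1 - overlap)))
    if window < 1:
        raise ValueError("not enough rows for this many chunks")
    # Round the step down so that all n_chunks chunks fit in the array.
    step = max(1, int(window * (1 - overlap)))
    # All windows along axis 0 (read-only view): shape (n_rows - window + 1, n_columns, window)
    windows = sliding_window_view(arr, window, axis=0)
    # Every step-th window, keeping the first n_chunks of them
    chunks = windows[::step][:n_chunks]
    if chunks.shape[0] < n_chunks:
        raise ValueError("could not fit the requested number of chunks")
    # Reorder to (column, chunk, element)
    return chunks.transpose(1, 0, 2)


arr = np.array([[3, 7], [5, 8], [6, 19], [8, 59], [10, 42],
                [12, 54], [13, 32], [14, 19], [99, 19]])
out = chunk_with_overlap(arr, n_chunks=3, overlap=0.5)
print(out[0])  # first column: the 3 x 4 array from the question
```

Because the step is rounded down, the actual overlap can be slightly larger than requested when window * (1 - overlap) is not a whole number.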
For NumPy only, a quite fast approach is:
import numpy as np

def rolling(a, window, step):
    # Assumes a is a contiguous 1-D array: the strides are built from a.itemsize
    shape = ((a.size - window)//step + 1, window)
    strides = (step*a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
And you can call it like so:
rolling(arr[:,0].copy(), 4, 2)
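Remark: I got unexpected output for rolling(arr[:,0], 4, 2), so I took a copy instead. The reason is that rolling builds its strides from a.itemsize, which only holds for a contiguous array, and the column view arr[:,0] is not contiguous; .copy() makes it contiguous.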
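Alternatively, the copy can be avoided by reading the stride from the array itself (a.strides[0]) instead of assuming a.itemsize. A sketch of that variant; the name rolling_strided is hypothetical, and writeable=False is a precaution against accidentally writing through overlapping windows.

```python
import numpy as np


def rolling_strided(a, window, step):
    """Windows of length `window`, advancing by `step`, over a 1-D array.

    Uses the array's own stride, so it also works on non-contiguous views
    such as a column slice arr[:, 0].
    """
    a = np.asarray(a)
    n_windows = (a.shape[0] - window) // step + 1
    s = a.strides[0]  # bytes between consecutive elements of this (possibly strided) view
    return np.lib.stride_tricks.as_strided(
        a, shape=(n_windows, window), strides=(step * s, s), writeable=False
    )


arr = np.array([[3, 7], [5, 8], [6, 19], [8, 59], [10, 42],
                [12, 54], [13, 32], [14, 19], [99, 19]])
print(rolling_strided(arr[:, 0], 4, 2))  # no .copy() needed
```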
So I have lots of data in a single, flat array that is grouped into irregularly sized chunks. The sizes of these chunks are given in another array. What I need to do is rearrange the chunks based on a third index array (think fancy indexing).
These chunks are always >= 3 long, usually 4, but technically unbounded, so it's not feasible to pad up to a max length and mask. Also, for technical reasons I only have access to NumPy, so nothing like SciPy or pandas.
To make it easier to read, the data in this example is easily grouped. In the real data, the numbers can be anything and do not follow this pattern.
[EDIT] Updated with less confusing data
data = np.array([1,2,3,4, 11,12,13, 21,22,23,24, 31,32,33,34, 41,42,43, 51,52,53,54])
chunkSizes = np.array([4, 3, 4, 4, 3, 4])
newOrder = np.array([0, 5, 4, 5, 2, 1])
The expected output in this case would be:
np.array([1,2,3,4, 51,52,53,54, 41,42,43, 51,52,53,54, 21,22,23,24, 11,12,13])
Since the real data can be millions long, I'm hoping for some kind of numpy magic that can do this without python loops.
Approach #1
Here's a vectorized approach based on creating a regular (padded) array and masking:
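def chunk_rearrange(data, chunkSizes, newOrder):
    # Mask of a padded (n_chunks, max_size) layout: row i has chunkSizes[i] True values
    m = chunkSizes[:,None] > np.arange(chunkSizes.max())
    # Scatter the flat data into the padded 2-D array (padding cells stay uninitialized)
    d1 = np.empty(m.shape, dtype=data.dtype)
    d1[m] = data
    # Reorder the rows, then read back only the valid (masked) cells
    return d1[newOrder][m[newOrder]]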
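Output for the original sample data from before the question's edit (data = np.array([0,0,0,0, 1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4, 5,5,5,5])):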
In [4]: chunk_rearrange(data, chunkSizes, newOrder)
Out[4]: array([0, 0, 0, 0, 5, 5, 5, 5, 4, 4, 4, 5, 5, 5, 5, 2, 2, 2, 2, 1, 1, 1])
Approach #2
Another vectorized approach, based on cumsum, with a smaller memory footprint for very ragged chunk sizes:
def chunk_rearrange_cumsum(data, chunkSizes, newOrder):
    # New chunk lengths
    newlens = chunkSizes[newOrder]
    # Setup ID array that will hold specific values at those interval starts,
    # such that a final cumsum gives the indices which, when used to index
    # the input array, give the re-arranged output. Its length is the output
    # length, which differs from len(data) when newOrder repeats or drops chunks.
    idar = np.ones(newlens.sum(), dtype=int)
    # Original chunk intervals
    c = np.r_[0,chunkSizes[:-1].cumsum()]
    # Indices from original order that form the interval starts in new arrangement
    d1 = c[newOrder]
    # Starts of chunks in new arrangement where those from d1 are to be assigned
    c2 = np.r_[0,newlens[:-1].cumsum()]
    # Offset required for the starts in new arrangement for final cumsum to work
    diffs = np.diff(d1)+1-np.diff(c2)
    idar[c2[1:]] = diffs
    idar[0] = d1[0]
    # Final cumsum and indexing leads to desired new arrangement
    out = data[idar.cumsum()]
    return out
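The same index construction can also be written more explicitly with np.repeat: every output element's source index is the start of its source chunk plus its position inside that chunk. A sketch of that variant (the name chunk_rearrange_repeat is hypothetical), run on the updated sample data:

```python
import numpy as np


def chunk_rearrange_repeat(data, chunkSizes, newOrder):
    # Start offset of every chunk in the original flat array
    starts = np.concatenate(([0], np.cumsum(chunkSizes)[:-1]))
    # Lengths of the chunks in their new order, and the output length
    newlens = chunkSizes[newOrder]
    total = newlens.sum()
    # Start offset of every chunk in the output array
    newstarts = np.concatenate(([0], np.cumsum(newlens)[:-1]))
    # Position of each output element inside its chunk: 0, 1, ..., len-1
    within = np.arange(total) - np.repeat(newstarts, newlens)
    # Source index = start of the source chunk + position inside it
    idx = np.repeat(starts[newOrder], newlens) + within
    return data[idx]


data = np.array([1,2,3,4, 11,12,13, 21,22,23,24, 31,32,33,34, 41,42,43, 51,52,53,54])
chunkSizes = np.array([4, 3, 4, 4, 3, 4])
newOrder = np.array([0, 5, 4, 5, 2, 1])
result = chunk_rearrange_repeat(data, chunkSizes, newOrder)
print(result)
```

This allocates a few integer arrays as long as the output, so its memory use is of the same order as the cumsum version.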
You can use np.split to create views into your data array corresponding to the chunkSizes, if you build up the indices with np.cumsum. You can then reorder the views according to the newOrder indices using fancy indexing. This should be reasonably efficient since the data is only copied to the new array when you call np.concatenate on the reordered views:
import numpy as np
data = np.array([0,0,0,0, 1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4, 5,5,5,5])
chunkSizes = np.array([4, 3, 4, 4, 3, 4])
newOrder = np.array([0, 5, 4, 5, 2, 1])
cumIndices = np.cumsum(chunkSizes)
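# dtype=object because the chunks have different lengths (NumPy >= 1.24 raises an error for ragged input without it)
splitArray = np.array(np.split(data, cumIndices[:-1]), dtype=object)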
targetArray = np.concatenate(splitArray[newOrder])
# >>> targetArray
# array([0, 0, 0, 0, 5, 5, 5, 5, 4, 4, 4, 5, 5, 5, 5, 2, 2, 2, 2, 1, 1, 1])
I'm starting to learn GEKKO. To practise, I am solving a knapsack problem, but this time I get the error "'int' object is not subscriptable". Can you look at this code? What is the source of the problem, and how should I define the 1x10 arrays?
from gekko import GEKKO
import numpy as np
m = GEKKO(remote=False)
x = m.Var((10),lb=0,ub=1,integer=True)
#x = m.Array(m.Var,(1,10),lb=0,ub=1,integer=True)
v=np.array([2, 2, 7, 8, 2, 1, 7, 9, 4, 10])
w=np.array([2, 2, 2, 2, 2, 1, 6, 7, 3, 3])
capacity=16
for j in range(10):
    m.Maximize(v[j]*x[j])
for i in range(10):
    m.Equation(m.sum(x[i]*w[i])<=capacity)
m.options.solver = 1
m.solve()
#print('Objective Function: ' + str(m.options.objfcnval))
print(x)
My second question: MATLAB has a function called showproblem(). Does GEKKO have an equivalent?
Thanks for the help.
A follow-up question based on the answer:
Can I write it in this style instead? It doesn't work as written; if it can be made to work, please show a working version. I'd prefer this style because I find it easier to understand:
for i in range(10):
    xw = x[i]*w[i]
    m.Equation(m.sum(xw)<=capacity)
instead of this:
xw = [x[i]*w[i] for i in range(10)]
m.Equation(m.sum(xw)<=capacity)
Here is a modified version that solves the mixed integer problem in gekko.
from gekko import GEKKO
import numpy as np
m = GEKKO(remote=False)
x = m.Array(m.Var,10,lb=0,ub=1,integer=True)
v=np.array([2, 2, 7, 8, 2, 1, 7, 9, 4, 10])
w=np.array([2, 2, 2, 2, 2, 1, 6, 7, 3, 3])
capacity=16
for j in range(10):
    m.Maximize(v[j]*x[j])
xw = [x[i]*w[i] for i in range(10)]
m.Equation(m.sum(xw)<=capacity)
m.options.solver = 1
m.solve()
print('Objective Function: ' + str(-m.options.objfcnval))
print(x)
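Your problem formulation was close. m.Var((10)) creates a single variable (with initial value 10), not an array of 10 variables, so you needed m.Array(m.Var, 10, ...) to create x, plus a list xw that you use to form the capacity constraint.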
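If you want to use a loop instead of a list comprehension, I recommend the following instead of xw = [x[i]*w[i] for i in range(10)]. In your version the equation is created inside the loop, so each constraint contains only one term x[i]*w[i] rather than the sum of all ten; build the full list first and then add a single constraint: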
xw = []
for i in range(10):
    xw.append(x[i]*w[i])
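For a problem this small, the optimal value can be cross-checked without GEKKO by enumerating all 2**10 = 1024 selections in plain Python. This is only a sanity check of the formulation, not a replacement for the solver; when several selections tie for the optimum, the one GEKKO returns may differ from best_x.

```python
from itertools import product

v = [2, 2, 7, 8, 2, 1, 7, 9, 4, 10]
w = [2, 2, 2, 2, 2, 1, 6, 7, 3, 3]
capacity = 16

best_value, best_x = -1, None
# Try every 0/1 assignment of the 10 items
for x in product((0, 1), repeat=len(v)):
    weight = sum(xi * wi for xi, wi in zip(x, w))
    if weight <= capacity:
        value = sum(xi * vi for xi, vi in zip(x, v))
        if value > best_value:
            best_value, best_x = value, x

print(best_value, best_x)
```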
I am trying to get the median of each row of a 2D torch.tensor, but the result is not what I expect when compared to working with a standard list or NumPy:
import torch
import numpy as np
from statistics import median
print(torch.__version__)
>>> 0.4.1
y = [[1, 2, 3, 5, 9, 1],[1, 2, 3, 5, 9, 1]]
median(y[0])
>>> 2.5
np.median(y,axis=1)
>>> array([2.5, 2.5])
yt = torch.tensor(y,dtype=torch.float32)
yt.median(1)[0]
>>> tensor([2., 2.])
It looks like this is the intended behaviour of Torch, as mentioned in these links:
https://github.com/pytorch/pytorch/issues/1837
https://github.com/torch/torch7/pull/182
The reasoning, as stated in the link above:
Median returns 'middle' element in case of odd-many elements, otherwise one-before-middle element (could also do the other convention to take mean of the two around-the-middle elements, but that would be twice more expensive, so I decided for this one).
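You can emulate the NumPy median with PyTorch:
import torch
import numpy as np
y = [1, 2, 3, 5, 9, 1]
print("numpy=", np.median(y))
print(sorted(y))
yt = torch.tensor(y, dtype=torch.float32)
ymax = yt.max().reshape(1)  # 1-element tensor holding the maximum
print("torch=", yt.median())
# Appending the maximum shifts torch's lower-middle pick up by one position;
# averaging the two medians reproduces numpy's interpolated median
print("torch_fixed=", (torch.cat((yt, ymax)).median() + yt.median()) / 2.)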
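Why the trick works can be checked without PyTorch: statistics.median_low from Python's standard library follows the same lower-middle convention described in the quote above. A small sketch on the same list:

```python
from statistics import median, median_low

y = [1, 2, 3, 5, 9, 1]

# Even count: the lower of the two middle values (torch's convention)
low = median_low(y)
# Appending the maximum makes the count odd and moves the middle up by one,
# so this picks the upper of the two original middle values
high = median_low(y + [max(y)])
print(low, high, (low + high) / 2, median(y))  # prints: 2 3 2.5 2.5
```

For an odd count, appending the maximum leaves the lower-middle value unchanged, so the average is still the true median.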
I am plotting two lists with the matplotlib Python library. The two arrays, x and y, look like this when plotted: [plot not included] (sorry, I don't have enough reputation to post pictures here).
The code I used is:
import matplotlib.pyplot as plt
plt.plot(x,y,"bo")
plt.fill(x,y,'#99d8cp')
It plots the points and then connects them with a line. The problem is that it does not connect the points correctly: points 0 and 2 on the x axis are connected, instead of 1 and 2. Similarly, at the other end it connects points 17 and 19 instead of 18 and 19. I also tried plotting a simple line graph using:
plt.plot(x,y)
But it still connected the points wrongly. I would really appreciate it if anyone could point me in the right direction as to why this is happening and what can be done to resolve it.
Thanks!!
Matplotlib connects the points of a line in exactly the order you give them, so it expects the coordinates to be sorted. That is why your points are joined in a 'strange' way (although exactly as you told matplotlib to, e.g. from (0,1) to (3,2)). You can fix this by simply sorting the data by x before plotting.
#! /usr/bin/env python
import matplotlib.pyplot as plt
x = [20, 21, 22, 23, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]
y = [ 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1]
# Sort the (x, y) pairs by x, then unzip them back into two sequences
x2,y2 = zip(*sorted(zip(x,y),key=lambda p: p[0]))
plt.plot(x2,y2)
plt.show()
That should give you what you want.
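If x and y are NumPy arrays, the same reordering can be done with np.argsort, which computes the sorting permutation once and applies it to both arrays. A sketch that only builds the sorted arrays; pass x2 and y2 to plt.plot or plt.fill as above.

```python
import numpy as np

x = np.array([20, 21, 22, 23, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18])
y = np.array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1])

# Indices that put x in ascending order; applying the same permutation to y
# keeps every (x, y) pair together
order = np.argsort(x, kind="stable")
x2, y2 = x[order], y[order]
print(x2)
print(y2)
```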