structuring data in numpy for LSTM (examples) - python

I am having trouble understanding how data should be prepared for different models:
One to many
Many to one
Many to many(A)
Many to many(B)
Is this the right way to think about it? The shape numbers are not relevant and do not match the ones in the picture; I am just trying to understand the logic behind it:
import numpy as np
#1. one to many
# X for input y for output
X = np.ones([10,1,5])
y = np.zeros([10,3]) # 3 represents the size of the output vector
#2. many to one
X = np.ones([10,5,5])
y = np.zeros([10,1])
#3. many to many
X = np.ones([10,5,5])
y = np.zeros([10,5])
# in this case cell should be different than y. It must be bigger to shift some data
#4. many to many
X = np.ones([10,5,5])
y = np.zeros([10,5])
# in this case cell is the same shape as y
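One way to sanity-check these shapes is sketched below. It assumes the common Keras-style convention where LSTM inputs are (samples, timesteps, features) and sequence outputs are (samples, timesteps, output_features). It also assumes that "many to many (A)" means the encoder-decoder layout, where the output length may differ from the input length, and that "(B)" means one output per input step. All sizes and variable names are illustrative only.

```python
import numpy as np

# Illustrative sizes (assumptions, not taken from the picture).
n_samples, n_features, n_out = 10, 5, 1

# 1. one to many: a single input step, a sequence of 3 output steps
X_one_to_many = np.ones((n_samples, 1, n_features))
y_one_to_many = np.zeros((n_samples, 3, n_out))

# 2. many to one: 5 input steps, a single output vector per sample
X_many_to_one = np.ones((n_samples, 5, n_features))
y_many_to_one = np.zeros((n_samples, n_out))

# 3. many to many (A): output sequence length (3) differs from input length (5)
X_many_to_many_a = np.ones((n_samples, 5, n_features))
y_many_to_many_a = np.zeros((n_samples, 3, n_out))

# 4. many to many (B): one output per input step, so the time axes match
X_many_to_many_b = np.ones((n_samples, 5, n_features))
y_many_to_many_b = np.zeros((n_samples, 5, n_out))

for name, X, y in [
    ("one to many", X_one_to_many, y_one_to_many),
    ("many to one", X_many_to_one, y_many_to_one),
    ("many to many (A)", X_many_to_many_a, y_many_to_many_a),
    ("many to many (B)", X_many_to_many_b, y_many_to_many_b),
]:
    print(f"{name}: X {X.shape} -> y {y.shape}")
```

The point of the sketch is that the first axis (samples) always matches between X and y, while the time axis of y depends on the model type.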

Related

Re-distributing 2d data with max in middle

Hey all, I have a set of seemingly random 2D data that I want to reorder. This is more for an image with specific values at each pixel, but the concept will be the same.
I have a large 2D array that looks very random, say:
x = 100
y = 120
np.random.random((x,y))
I want to redistribute the 2D matrix so that the maximum value is in the center and the remaining values surround it in decreasing order, giving a sort of Gaussian fall-off from the center.
Small example:
output = [[0.0,0.5,1.0,1.0,1.0,0.5,0.0],
          [0.0,1.0,1.0,1.5,1.0,0.5,0.0],
          [0.5,1.0,1.5,2.0,1.5,1.0,0.5],
          [0.0,1.0,1.0,1.5,1.0,0.5,0.0],
          [0.0,0.5,1.0,1.0,1.0,0.5,0.0]]
I know it won't really be a Gaussian; I am just trying to give a visualization of what I would like. I was thinking of sorting the 2D array into a list from max to min and then using that to create a new 2D array, but I'm not sure how to distribute the values to fill the matrix the way I want.
Thank you very much!
If anyone looks at this in the future and needs help, here is some advice on how to do this effectively for a lot of data. The code is posted below.
import itertools
import numpy as np

def datasort(inputarray,spot_in_x,spot_in_y):
    #get the data read
    center_of_y = spot_in_y
    center_of_x = spot_in_x
    M = len(inputarray[0]) #number of columns
    N = len(inputarray) #number of rows
    l_list = list(itertools.chain(*inputarray)) #listed data
    l_sorted = sorted(l_list,reverse=True) #sorted listed data
    #Reorder
    to_reorder = list(np.arange(0,len(l_sorted),1))
    x = np.linspace(-1,1,M)
    y = np.linspace(-1,1,N)
    centerx = int(M/2 - center_of_x)*0.01
    centery = int(N/2 - center_of_y)*0.01
    [X,Y] = np.meshgrid(x,y)
    R = np.sqrt((X+centerx)**2 + (Y+centery)**2) #distance of each cell from the (offset) center
    R_list = list(itertools.chain(*R))
    values = zip(R_list,to_reorder)
    sortedvalues = sorted(values) #flat indices ordered by distance, nearest first
    unzip = list(zip(*sortedvalues))
    unzip2 = unzip[1]
    l_reorder = zip(unzip2,l_sorted) #pair nearest cell with largest value
    l_reorder = sorted(l_reorder) #back to flat-index order
    l_unzip = list(zip(*l_reorder))
    l_unzip2 = l_unzip[1]
    sorted_list = np.reshape(l_unzip2,(N,M))
    return sorted_list
This code takes your data and flattens it into a list sorted from largest to smallest. It then zips the cell indices together with a list of distances based on a circular distribution. Using the zip and sort commands, the largest values are assigned to the cells closest to the center, which gives the distribution of data defined by your distribution function; in my case it is a circle that can be offset.
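The same idea can be sketched more compactly with np.argsort. This is a variation rather than the code above: it measures distance in pixel units from a chosen (row, column) center instead of using the normalized linspace grid and 0.01 offset, and the function name redistribute_by_radius and its parameters are hypothetical.

```python
import numpy as np

def redistribute_by_radius(arr, center_row=None, center_col=None):
    """Place the largest values nearest to the chosen center cell."""
    arr = np.asarray(arr)
    n_rows, n_cols = arr.shape
    # Default to the middle cell when no center is given.
    if center_row is None:
        center_row = n_rows // 2
    if center_col is None:
        center_col = n_cols // 2
    rows, cols = np.indices(arr.shape)
    dist = np.hypot(rows - center_row, cols - center_col)
    # Flat indices of the cells, ordered from nearest to farthest.
    order = np.argsort(dist, axis=None, kind="stable")
    # Values ordered from largest to smallest.
    descending = np.sort(arr, axis=None)[::-1]
    out = np.empty(arr.size, dtype=arr.dtype)
    out[order] = descending  # nearest cell gets the largest value
    return out.reshape(arr.shape)

rng = np.random.default_rng(0)
image = rng.random((5, 7))
result = redistribute_by_radius(image)
print(np.unravel_index(result.argmax(), result.shape))
```

Because only the positions change, the output contains exactly the same values as the input.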

ValueError: x and y must have same first dimension, but have shapes (101,) and (1,) [duplicate]

I am new to coding and to Jupyter Notebook, and I wanted to ask how to graph x (any time t) = (0,10,101) and y (acceleration) = -2.2. Those are the values given to us by our professor, but when I try to plot them I get the error: ValueError: x and y must have same first dimension, but have shapes (101,) and (1,). Thank you.
Your description wasn't clear. Next time you post, I strongly suggest including an example of the code that is causing the problem. Have a look at how others frame their questions. Anyway, I will try my best to help you.
We know that:
x = 0.5*a*t^2 + V0*t
Where:
x: position
a: acceleration
V0: initial velocity
t: time
In real life time is continuous. However, a truly continuous variable is impossible in programming, so the next best thing is to use a range with a very small step size.
Let's start by assuming that the initial velocity is zero --> x = 0.5*a*t*t
Now that we have simplified the equation, let's tackle the problem of time.
import numpy as np
import matplotlib.pyplot as plt
# acceleration is constant
a = -2.2
# get array for the time
t = np.arange(0,10,0.1)
# calculate position at each time and store in array
x = 0.5*a*t*t
plt.plot(t,x)
plt.show()
Above we calculated a value of x for each value in the time array. As you can see, in order to plot position against time, the lengths of the two arrays need to be the same. We can check the lengths of the arrays using the len function:
print(f"length of time: {len(t)} ")
print(f"length of position: {len(x)}" )
out:
length of time: 100
length of position: 100
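The general formula with a nonzero initial velocity can be sketched as a small helper. The function name position and the sample value v0=3.0 are illustrative assumptions; only a = -2.2 and the 101 time samples from 0 to 10 come from the question.

```python
import numpy as np

def position(t, a, v0=0.0):
    """Displacement x = 0.5*a*t^2 + v0*t for constant acceleration a."""
    t = np.asarray(t, dtype=float)
    return 0.5 * a * t**2 + v0 * t

t = np.linspace(0, 10, 101)      # 101 samples from 0 to 10 inclusive
x = position(t, a=-2.2, v0=3.0)  # v0 is an illustrative value
print(t.shape, x.shape)          # both arrays have the same length
```

Because x is computed element-wise from t, the two arrays always have matching shapes and can be passed straight to plt.plot(t, x).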
Here are some sources to help you get started with learning python:
Great free Course covering all the basics by Microsoft
List Comprehension
Functions in python
Some channels on Youtube that I recommend:
Real Python
Corey Schafer
DataCamp
When you want to plot x versus y data, the x and y arrays need to have matching shapes.
So, in order to plot a horizontal line at y = -2.2 for x from 0 to 10 with 101 points, instead of
y = (-2.2)
you need to use
y = np.full(101, -2.2)
or, better,
y = np.full(x.shape, -2.2)
so that y has shape (101,), matching the shape of x.
Use this:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,10,101)
y = np.repeat(-2.2,101) # map y constant value
plt.plot(x,y)
plt.show()

Using Machine Learning in Python to load custom datasets?

Here's the problem:
The model takes two input variables and predicts a result.
For example: price and volume as inputs and a decision to buy/sell as a result.
I tried implementing this using K-Neighbors with no success. How would you go about this?
X = cleanedData['ES1 End Price'] # only accounts for 1 variable; I don't know how to add another input
y = cleanedData["Result"]
print(X.shape, y.shape)
kmm = KNeighborsClassifier(n_neighbors = 5)
kmm.fit(X,y) #ValueError for size inconsistency, but both are same size.
Thanks!
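X needs to be a matrix (2D array) in which each column stands for a feature, which is not the case in your code, where X is a one-dimensional Series. Try reshaping X to 2D by converting it to a NumPy array and adding an axis (indexing the Series itself with [:,None] no longer works in recent pandas versions):
kmm.fit(X.to_numpy()[:,None], y)
Or, to avoid reshaping altogether, always use a list of column names when extracting features from a data frame: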
X = cleanedData[['ES1 End Price']]
Or with more than one column:
X = cleanedData[['ES1 End Price', 'volume']]
Then X is two-dimensional (one column per feature) and can be used directly in fit:
kmm.fit(X, y)
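A minimal end-to-end sketch of the two-feature case follows. The data frame contents are made up for illustration, and the column name 'volume' is taken from the snippet above rather than from the asker's real data.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Made-up data standing in for cleanedData (an assumption for illustration).
cleanedData = pd.DataFrame({
    "ES1 End Price": [10.0, 10.5, 11.0, 20.0, 20.5, 21.0],
    "volume": [100, 110, 105, 300, 310, 305],
    "Result": ["buy", "buy", "buy", "sell", "sell", "sell"],
})

# Double brackets select a DataFrame with one column per feature.
X = cleanedData[["ES1 End Price", "volume"]]
y = cleanedData["Result"]
print(X.shape, y.shape)  # X is 2D, y is 1D

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# New observations must use the same columns in the same order.
new_point = pd.DataFrame({"ES1 End Price": [10.2], "volume": [102]})
prediction = knn.predict(new_point)
print(prediction)
```

Since K-Neighbors is distance based, features on very different scales (such as price and volume) may need scaling before fitting; that step is left out here to keep the sketch short.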

Printing remaining features in Feature Reduction

I am running a feature reduction (from 500 to around 30) for a random forest classifier. I can reduce the number of features, but I want to see which features are left at every step of the reduction. As you can see below, I have made an attempt, but it does not work.
X does not contain the column names. Ideally, X could also carry the column names while fitting only on the data rows; then printing X would be possible, I think.
I am sure there is a much better way, though...
Does anybody know how to do this?
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# read one feature name per line
readThisFile = r'C:\ManyFeatures.txt'
with open(readThisFile) as featuresFile:
    FEATURES = featuresFile.read().split('\n')
Location = r'C:\MASSIVE.xlsx'
data = pd.read_excel(Location)
X = np.array(data[FEATURES])
y = data['_MiniTARGET'].values
for x in range(533, 10,-100):
    X = SelectKBest(f_classif, k=x).fit_transform(X, y)
    #U=pd.DataFrame(X)
    #print (U.feature_importances_)
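One possible approach, sketched here on synthetic data, is to keep the list of column names alongside the data and filter it with the boolean mask returned by the selector's get_support() method after each step. The feature names, target, and values of k are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the real spreadsheet (an assumption).
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(200, 12)),
    columns=[f"feat_{i}" for i in range(12)],
)
y = (data["feat_0"] + data["feat_1"] > 0).astype(int)

remaining = list(data.columns)  # names of the features still in play
history = {}
for k in (9, 6, 3):
    selector = SelectKBest(f_classif, k=k).fit(data[remaining], y)
    mask = selector.get_support()  # boolean mask over the current columns
    remaining = [name for name, keep in zip(remaining, mask) if keep]
    history[k] = remaining
    print(k, remaining)
```

Selecting columns from the data frame by name at every step keeps the names and the data in sync, so there is no need to store the names inside X.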

combining/merging multiple 2d arrays into single array by using python

I have four two-dimensional NumPy arrays, each of shape (203, 135). I want to join all of these arrays into one single array with respect to latitude and longitude.
I have used the code below to read the data:
import pandas as pd
import numpy as np
import os
import glob
from pyhdf import SD
import datetime
import mpl_toolkits.basemap.pyproj as pyproj
DATA = ({})
files = glob.glob('MOD04*')
files.sort()
for n, f in enumerate(files):
    SDS_NAME='Deep_Blue_Aerosol_Optical_Depth_550_Land'
    hdf=SD.SD(f)
    lat = hdf.select('Latitude')
    latitude = lat[:]
    min_lat=latitude.min()
    max_lat=latitude.max()
    lon = hdf.select('Longitude')
    longitude = lon[:]
    min_lon=longitude.min()
    max_lon=longitude.max()
    sds=hdf.select(SDS_NAME)
    data=sds.get()
    p = pyproj.Proj(proj='utm', zone=45, ellps='WGS84')
    x,y = p(longitude, latitude)
def set_element(elements, x, y, data):
    # Set element with two coordinates.
    elements[x + (y * 10)] = data
elements = []
set_element(elements,x,y,data)
But I got the error: only integer arrays with one element can be converted to an index
You can find the data here: https://drive.google.com/open?id=0B2rkXkOkG7ExMElPRDd5YkNEeDQ
I have created toy datasets for this problem, as requested.
What I want is to get one single array from the four arrays (a,b,c,d), whose dimensions should be something like (406, 270).
a = (np.random.rand(27405)).reshape(203,135)
b = (np.random.rand(27405)).reshape(203,135)
c = (np.random.rand(27405)).reshape(203,135)
d = (np.random.rand(27405)).reshape(203,135)
a_x = (np.random.uniform(10,145,27405)).reshape(203,135)
a_y = (np.random.uniform(204,407,27405)).reshape(203,135)
d_x = (np.random.uniform(150,280,27405)).reshape(203,135)
d_y = (np.random.uniform(204,407,27405)).reshape(203,135)
b_x = (np.random.uniform(150,280,27405)).reshape(203,135)
b_y = (np.random.uniform(0,202,27405)).reshape(203,135)
c_x = (np.random.uniform(10,145,27405)).reshape(203,135)
c_y = (np.random.uniform(0,202,27405)).reshape(203,135)
Any help?
This should be a comment, but the comment space is not large enough for these questions, so I am posting it here:
You say that you have 4 input arrays (a,b,c,d) which are somehow to be integrated into an output array. As far as I understand, two of these arrays contain positional information (x,y) such as longitude and latitude. The only place in your code where you combine several input arrays is here:
def set_element(elements, x, y, data):
    # Set element with two coordinates.
    elements[x + (y * 10)] = data
Here you have four input variables (elements, x, y, data), which I assume to be your input arrays (a,b,c,d). In this operation, however, you do not combine them; you overwrite one element of elements (index: x + 10y) with a new value (data).
Therefore, I do not understand your target output.
When I was asking for toy data, I had something like this in mind:
a = [[1,2]]
b = [[3,4]]
c = [[5,6]]
d = [[7,8]]
With an example this simple, you could easily say:
What I want is this:
res = [[[1,2],[3,4]],[[5,6],[7,8]]]
Then we could help you find an answer.
So please provide more information about the operation that you want to perform, either in mathematical notation (such as x = a + b*c + d) or with toy data, so that we can deduce the function you are asking for.
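If the goal is simply to tile the four arrays into one (406, 270) mosaic, one possible reading can be sketched with np.block. This assumes the four tiles sit on a regular, non-overlapping 2x2 grid and that the row index increases with y. The layout below is a guess based on the coordinate ranges of the toy data (c: low x, low y; b: high x, low y; a: low x, high y; d: high x, high y), and constant-valued tiles are used so the placement is easy to verify.

```python
import numpy as np

shape = (203, 135)
# Constant tiles standing in for the real data (an assumption).
a = np.full(shape, 1.0)
b = np.full(shape, 2.0)
c = np.full(shape, 3.0)
d = np.full(shape, 4.0)

# Assumed layout: top row of blocks holds the low-y tiles (c, b),
# bottom row holds the high-y tiles (a, d).
mosaic = np.block([[c, b],
                   [a, d]])
print(mosaic.shape)
```

If the real tiles overlap or are not aligned to a common grid, simple tiling will not be enough and the data would need to be resampled onto a shared latitude/longitude grid instead.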
