Shape manipulation of numpy array - python

Each 'd' is given; it does not matter how it was obtained.
I want to get 'result' in the required shape.
I tried the following, but I'm stuck.
import numpy as np
data = [np.ones((300,1)), np.ones((300,5)), np.ones((300,3))]
result = []
for d in data:
    print(np.shape(np.array(d)))
    result.append(d)
print(np.shape(np.array(result)))
The result should be in this shape:
(300, 1+5+3) = (300,9)
Can someone help me?
I got
ValueError: could not broadcast input array from shape (300,1) into shape (300)
EDIT:
data exists only for this question; it stands in for my larger program. What is actually given is each d: arrays of different shapes are generated one at a time by the for loop.

Three 2d arrays that differ only in the last dimension can be joined on that dimension:
np.concatenate(data, axis=1)
hstack does the same.
In my comment I suggested axis 0, but that was a quick response and I didn't have a chance to test it.
When you try ideas and they fail, show us what went wrong. You list a ValueError but don't show where it occurred, i.e. which operation raised it.
Your comments make a big deal about d, but you don't show how d might differ from the elements of data.
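A quick check of both calls on the question's data. This sketch also shows that axis=0 cannot work here, since joining along rows would require every array to have the same number of columns.

```python
import numpy as np

data = [np.ones((300, 1)), np.ones((300, 5)), np.ones((300, 3))]

# Join along the last axis (columns): 1 + 5 + 3 = 9 columns.
joined = np.concatenate(data, axis=1)
stacked = np.hstack(data)
print(joined.shape, stacked.shape)      # (300, 9) (300, 9)
print(np.array_equal(joined, stacked))  # True

# axis=0 needs matching column counts (1, 5 and 3 differ), so it raises.
try:
    np.concatenate(data, axis=0)
except ValueError as exc:
    print("axis=0 fails:", exc)
```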

You can also try numpy.column_stack, which essentially does numpy.concatenate under the hood.
Example use
In [1]: import numpy as np
In [2]: data = [np.ones((300,1)), np.ones((300,5)), np.ones((300,3))]
In [3]: out = np.column_stack(data)
In [4]: out.shape
Out[4]: (300, 9)
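column_stack and hstack only behave differently for 1-D inputs: column_stack turns each 1-D array into a column, while hstack joins 1-D arrays end to end. A small sketch with made-up arrays x and y:

```python
import numpy as np

x = np.arange(3)        # shape (3,)
y = np.arange(3, 6)     # shape (3,)

# column_stack treats each 1-D array as a column.
cols = np.column_stack([x, y])   # shape (3, 2)
# hstack concatenates 1-D arrays along axis 0.
flat = np.hstack([x, y])         # shape (6,)

# For the 2-D arrays in the question, both give the same (300, 9) result.
data = [np.ones((300, 1)), np.ones((300, 5)), np.ones((300, 3))]
same = np.array_equal(np.column_stack(data), np.hstack(data))
print(cols.shape, flat.shape, same)  # (3, 2) (6,) True
```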

Your result is a Python list. In fact it is a list with the exact same contents as the original data. You are trying to concatenate arrays horizontally (along the second dimension), so you need to use numpy.hstack:
import numpy as np
data = []
for d in some_source:
    data.append(d)
result = np.hstack(data)
print(result.shape)
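If some_source is already a list or tuple, you can do this even more concisely:
result = np.hstack(some_source)
If it is a generator or another iterator, use np.hstack(list(some_source)) instead, because NumPy's stacking functions expect a sequence.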
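A minimal sketch, assuming some_source is a generator; the generator function below is a hypothetical stand-in for whatever produces each d. Materialising it into a list keeps the np.hstack call valid.

```python
import numpy as np

def some_source():
    # Hypothetical producer: yields arrays with 1, 5 and 3 columns.
    for width in (1, 5, 3):
        yield np.ones((300, width))

# Convert the generator to a list before stacking.
result = np.hstack(list(some_source()))
print(result.shape)  # (300, 9)
```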

You want to stack the elements horizontally (if you imagine each element as a matrix with 300 rows and variable number of columns), i.e.
import numpy as np
data = [np.ones((300,1)), np.ones((300,5)), np.ones((300,3))]
result = np.hstack(data)
If you only have access to an iterator that generates elements d you can achieve the same as follows:
result = np.hstack([d for d in some_iterator_that_generates_your_ds])

Try:
import numpy as np
data = [np.ones((300,1)), np.ones((300,5)), np.ones((300,3))]
result = []
print(len(data))
for d in data:
    # np.vstack of a single 2-D array returns an array with the same shape
    result.append(np.vstack(d))
# result is a list (it has no .shape), so inspect its last element
print(result[-1].shape)
You can also try:
import numpy as np
data = np.ones((300,1)), np.ones((300,5)), np.ones((300,3))
result = np.vstack(data[-1])
print(result.shape)
Both print (300, 3), the shape of the last array in data.
If you're looking for (300, 9), you can do as follows:
result = np.hstack(data)
Finally, if you'd like the result as a nested Python list rather than a numpy.ndarray, call result.tolist().
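For reference, a sketch of what the stacking functions do when given a single 2-D array versus the whole tuple, reusing the question's data:

```python
import numpy as np

data = (np.ones((300, 1)), np.ones((300, 5)), np.ones((300, 3)))
last = data[-1]                    # shape (300, 3)

# Given one 2-D array, vstack re-stacks its rows, so the shape is unchanged.
print(np.vstack(last).shape)       # (300, 3)
# hstack joins those same rows end to end, flattening them.
print(np.hstack(last).shape)       # (900,)

# Stacking the whole tuple side by side gives the requested shape.
result = np.hstack(data)
print(result.shape)                # (300, 9)

# tolist() gives nested Python lists: 300 inner lists of 9 floats.
as_list = result.tolist()
print(len(as_list), len(as_list[0]))  # 300 9
```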

Related

How to concatenate numpy arrays to create a 2d numpy array

I'm working on using AI to give me better odds at winning Keno. (don't laugh lol)
My issue is that when I gather my data it comes in the form of 1d arrays of drawings at a time. I have different files that have gathered the data and formatted it as well as performed simple maths on the data set. Now I'm trying to get the data into a certain shape for my Neural Network layers and am having issues.
import numpy as np
master_list = []
formatted_list = file.readlines()
#remove newline chars
formatted_list = list(filter(("\n").__ne__, formatted_list))
#iterate through each drawing, format the ends and split into list of ints
for i in formatted_list:
    i = i[1:]
    i = i[:-2]
    i = [int(j) for j in i.split(",")]
    #convert to numpy array
    temp = np.array(i)
    #t1 = np.reshape(temp, (-1, len(temp)))
    #print(np.shape(t1))
    #append to master list
    master_list.append(temp)
print(np.shape(master_list))
This gives an output of "(292,)", which is correct: there are 292 rows of data, but each row also contains 20 columns. If I uncomment "#t1 = np.reshape(temp, (-1, len(temp)))" and "#print(np.shape(t1))", the output is "(1,20)(1,20)(1,20)(1,20)(1,20)(1,20)(1,20)(1,20)", etc. I want all of those rows stacked together while keeping the columns the same, i.e. (292,20). How can this be accomplished?
I've tried reshaping the final list and many other things with no luck. It either puts every number of every row into the first dimension, i.e. (5840,), or keeps the single dimension. I was expecting to be able to append each new drawing to a master list, convert it to a numpy array and reshape it to 292 rows of 20 columns. I've also tried numpy.concat, with no luck. Thank you.
You can use vstack to concatenate your master_list.
master_list = []
for array in formatted_list:
    master_list.append(array)
master_array = np.vstack(master_list)
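A minimal sketch using synthetic rows in place of the parsed Keno drawings (the rows list is made up for illustration). np.array, np.stack and np.vstack all give (292, 20) when every row has 20 entries; if the shape comes out wrong, checking the set of row lengths is a quick first diagnostic.

```python
import numpy as np

# Hypothetical stand-in for 292 parsed drawings of 20 numbers each.
rows = [np.arange(20) + i for i in range(292)]

a = np.array(rows)
b = np.stack(rows)
c = np.vstack(rows)
print(a.shape, b.shape, c.shape)  # (292, 20) (292, 20) (292, 20)

# Every row must have the same length for a clean 2-D result.
lengths = {len(r) for r in rows}
print(lengths)  # {20}
```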
Alternatively, if you know how many arrays formatted_list contains and how long each one is, you can preallocate master_array and fill it row by row:
import numpy as np
formatted_list = [np.random.rand(20)]*292
master_array = np.zeros((len(formatted_list), len(formatted_list[0])))
for i, array in enumerate(formatted_list):
    master_array[i,:] = array
Edit:
As mentioned by hpaulj in the comments, np.array(), np.stack() and np.vstack() worked with this input and produced a numpy array with shape (7,20).

Multidimensional numpy array appending with Python

In Python, I can concatenate two lists like below:
myArray = []
myArray += [["Value1", "Value2"]]
I can then access the data by indexing the 2 dimensions of this array like so
print(myArray[0][0])
which will output:
Value1
How would I go about achieving the same thing with Numpy?
I tried the numpy append function but that only ever results in single dimensional arrays for me, no matter how many square brackets I put around the value I'm trying to append.
If you know the number of columns of the final array, you could use np.vstack:
>>> import numpy as np
>>> a = np.array([]).reshape(0,2)
>>> b = np.array([['Value1', 'Value2']])
>>> np.vstack([a,b])
array([['Value1', 'Value2']], dtype='<U32')
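np.append flattens both inputs unless an axis is given, which is likely why the attempts above came out 1-D. A sketch that keeps the array 2-D by passing axis=0 (the "<U10" dtype, strings of up to 10 characters, is an assumption for this example):

```python
import numpy as np

# Start with 0 rows and 2 columns of short strings.
my_array = np.empty((0, 2), dtype="<U10")

# With axis=0, np.append adds a row and the result stays 2-D.
my_array = np.append(my_array, [["Value1", "Value2"]], axis=0)
print(my_array.shape)   # (1, 2)
print(my_array[0][0])   # Value1

# Without axis, both inputs are flattened first, giving a 1-D array.
flat = np.append(my_array, [["Value3", "Value4"]])
print(flat.shape)       # (4,)
```

np.append returns a new array each time, so when adding many rows it is usually cheaper to collect them in a Python list and convert once with np.array.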

Efficient way to remove sections of Numpy array

I am working with a numpy array of features in the following format
[[feat1_channel1,feat2_channel1...feat6_channel1,feat1_channel2,feat2_channel2...]] (so each channel has 6 features and the array shape is 1 x (number channels*features_per_channel) or 1 x total_features)
I am trying to remove specified channels from the feature array, ex: removing channel 1 would mean removing features 1-6 associated with channel 1.
My current method is shown below:
reshaped_features = current_feature.reshape((-1,num_feats))
desired_channels = np.delete(reshaped_features,excluded_channels,axis=0)
current_feature = desired_channels.reshape((1,-1))
where I reshape the array to be number_of_channels x number_of_features, remove the rows corresponding to the channels I want to exclude, and then reshape the array with the desired variables into the original format of being 1 x total_features.
The problem with this method is that it tremendously slows down my code because this process is done 1000s of times so I was wondering if there were any suggestions on how to speed this up or alternative approaches?
As an example, given the following array of features:
[[0,1,2,3,4,5,6,7,8,9,10,11...48,49,50,51,52,53]]
I reshape it to:
[[0,1,2,3,4,5],
 [6,7,8,9,10,11],
 [12,13,14,15,16,17],
 ...,
 [48,49,50,51,52,53]]
and, as an example, if I want to remove the first two channels then the resulting output should be:
[[12,13,14,15,16,17],
 ...,
 [48,49,50,51,52,53]]
and finally:
[[12,13,14,15,16,17...48,49,50,51,52,53]]
I found a solution that did not use np.delete(), which was the main culprit of the slowdown, building off the answer from msi_gerva.
I found the channels I wanted to keep using a list comprehension:
import numpy as np
all_chans = [1,2,3,4,5,6,7,8,9,10]
features_per_channel = 5
my_data = np.arange(len(all_chans)*features_per_channel)
chan_to_exclude = [1,3,5]
channels_to_keep = [i for i in range(len(all_chans)) if i not in chan_to_exclude]
Then I reshaped the array:
reshaped = my_data.reshape((-1,features_per_channel))
Then I selected the channels I wanted to keep:
desired_data = reshaped[channels_to_keep]
And finally I reshaped it to the desired shape:
final_data = desired_data.reshape((1,-1))
These changes made the code ~2x faster than the original method.
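Since the same channels are removed thousands of times, another option (a sketch, not benchmarked here) is to compute the column indices to keep once and then select them directly from the 1 x total_features array, skipping both reshapes. The sizes below match the question's 54-feature example; excluded_channels is a hypothetical setting.

```python
import numpy as np

features_per_channel = 6
num_channels = 9
excluded_channels = [0, 1]  # hypothetical: drop the first two channels

# Build the kept column indices once; reuse them for every feature vector.
keep_mask = np.ones(num_channels, dtype=bool)
keep_mask[excluded_channels] = False
keep_cols = (np.flatnonzero(keep_mask)[:, None] * features_per_channel
             + np.arange(features_per_channel)).ravel()

# Per call: a single fancy-indexing step on the original 1 x N layout.
current_feature = np.arange(num_channels * features_per_channel).reshape(1, -1)
reduced = current_feature[:, keep_cols]
print(reduced.shape)  # (1, 42)
```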
With the numerical example you provided, I would go with:
import numpy as np
arrays = np.arange(54).reshape(54 // 6, 6)
newarrays = arrays.copy()
remove = [1, 3, 5]
take = [0, 2, 4, 6, 7, 8]
# option 1: delete the unwanted rows
arrays = np.delete(arrays, remove, axis=0)
# option 2: index only the rows to keep
newarrays = newarrays[take]
arrays = list(arrays.flatten())
newarrays = list(newarrays.flatten())

Python “for” problem - Can only tuple-index with a MultiIndex

I want to take array C2, of size (N, 1), and make an array B, of size (N-1, 1), such that:
B[0] = C2[1]
B[1] = C2[2]
and so on. My code is:
import numpy as np
import pandas as pd
fields = "B:D"
data = pd.read_excel(r'C:\Users\file.xlsx', "Sheet2", usecols=fields)
N = 2
# Covariance calculation
C1 = data.cov()
C2 = data.var()
B = np.zeros(shape=(N,1))
for i in B:
    B[i,1] = C2[i+1,1]
But the error is:
ValueError: Can only tuple-index with a MultiIndex
I know it is a simple mistake, but I can't find where :S (new Python user)
First, are you sure you need to be using numpy arrays? This seems like a job for python lists.
Next, what do you mean to be doing with for i in B:? What type is i?
In this case, iterating over B sets i to each row of B, e.g. [0.], and you can now see that the next line is going to fail in the substitution:
B[[0.],i] = C2[[0.]+1,1]
In addition, data.var() returns a 1-d Series, so it takes a single index; indexing it with two values (C2[i+1,1]) is what triggers the MultiIndex error.
I think you want to iterate over range(N) and write into column 0, since B has only one column:
for i in range(N):
    # .iloc selects by position; C2's index holds the column names
    B[i,0] = C2.iloc[i+1]
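If the goal is just B[k] = C2[k+1] for every k, the loop can be replaced by a slice. A sketch with a made-up DataFrame standing in for the Excel columns B:D (the values are invented), taking N as the length of C2 so that B has N-1 rows as described in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the columns read from the Excel sheet.
data = pd.DataFrame({"B": [1.0, 2.0, 3.0, 4.0],
                     "C": [2.0, 4.0, 6.0, 8.0],
                     "D": [1.0, 1.0, 2.0, 2.0]})

C2 = data.var()   # 1-D Series: one variance per column
N = len(C2)

# B[k] = C2[k + 1]: drop the first entry and make it a column vector.
B = C2.to_numpy()[1:].reshape(N - 1, 1)
print(B.shape)    # (2, 1)
```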

Selecting axis from multidimensional arrays with an array

I am trying to select a subset of a multidimensional array using another array, so for example, if I have:
a=np.linspace(1,30,30)
a=a.reshape(5,3,2)
I would like to take the subset [:,0,1], which I can do by saying
a_subset=a[:,0,1]
but is there any way to define an array/list specifying that subset and then extract it? The idea is to do something like:
b=[:,0,1]
a_subset=a[b]
which does not work as ":" is not accepted as item ("SyntaxError: invalid syntax")
You can do this using numpy.index_exp as follows:
import numpy as np
a = np.linspace(1, 30, 30)
a = a.reshape(5, 3, 2)
b = np.index_exp[:,0,1]
a_subset = a[b]
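np.index_exp simply turns the bracket contents into a tuple of index objects, so the same index can also be written with np.s_ or built by hand with slice(None) standing in for ":". A short sketch:

```python
import numpy as np

a = np.linspace(1, 30, 30).reshape(5, 3, 2)

# index_exp returns a plain tuple of index objects.
b = np.index_exp[:, 0, 1]
print(b)  # (slice(None, None, None), 0, 1)

# np.s_ gives the same tuple here, and it can also be built by hand.
same = (slice(None), 0, 1)
print(b == np.s_[:, 0, 1] == same)  # True

# Indexing with the stored tuple matches writing a[:, 0, 1] directly.
a_subset = a[b]
print(np.array_equal(a_subset, a[:, 0, 1]))  # True
```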
