Create a 2-dimensional NumPy array with 1 row and 2 columns - python

Is it possible to create a 2-dimensional NumPy array with 1 row and 2 columns (row vector)?
This is what I'm doing (from the documentation), but I'd like to know if it's possible to do it in one (easier) step:
X_new2 = np.array([8.5,156])
X_new2 = X_new2[np.newaxis, :]
I've also tried:
X_new2 = np.array([[8.5], [156]])
But this is returning a column instead.

You can use the following syntax to achieve the same result as in your example:
X_new2 = np.array([[8.5,156]])
(Notice the extra [ and ] to make the array the correct shape.)

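Try np.expand_dims, which adds a leading axis to the 1-D array (here x is the 1-D array np.array([8.5, 156])):
y = np.expand_dims(x, axis=0)
print(y.shape)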
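For comparison, the approaches mentioned in this thread can be checked side by side. This is a minimal sketch; reshape(1, -1) is an extra option not mentioned in the answers, and the variable names are only illustrative.

```python
import numpy as np

x = np.array([8.5, 156])         # 1-D array, shape (2,)

a = np.array([[8.5, 156]])       # nested list literal
b = x[np.newaxis, :]             # add a leading axis by indexing
c = np.expand_dims(x, axis=0)    # same thing via a function call
d = x.reshape(1, -1)             # one row, column count inferred

for arr in (a, b, c, d):
    print(arr.shape)             # each one is (1, 2)
```

All four produce the same 1 x 2 row vector, so the choice is mostly a matter of readability.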

Related

How to get specific index of np.array of np.arrays fast

At the most basic I have the following dataframe:
a = {'possibility' : np.array([1,2,3])}
b = {'possibility' : np.array([4,5,6])}
df = pd.DataFrame([a,b])
This gives me a dataframe of size 2x1, like so:
row 1: np.array([1,2,3])
row 2: np.array([4,5,6])
I have another vector of length 2. Like so:
[1,2]
These represent the index I want from each row.
So if I have [1,2] I want: from row 1: 2, and from row 2: 6.
Ideally, my output is [2,6] in a vector form, of length 2.
Is this possible? I can easily run through a for loop, but I am looking for FAST approaches, ideally vectorized ones, since the data is already in pandas/NumPy.
In the actual use case this needs to work on roughly 300k-400k rows, and it runs inside optimization problems (hence the need for speed).
You could transform to a multi-dimensional numpy array and take_along_axis:
v = np.array([1,2])
a = np.vstack(df['possibility'])
np.take_along_axis(a.T, v[None], axis=0)[0]
output: array([2, 6])
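A related option, sketched here under the assumption that every row's array has the same length, is integer (fancy) indexing: pair a row index with each requested column index. The names a, v, and picked are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{'possibility': np.array([1, 2, 3])},
                   {'possibility': np.array([4, 5, 6])}])
v = np.array([1, 2])                        # index wanted from each row

a = np.vstack(df['possibility'].tolist())   # 2-D array, shape (2, 3)
picked = a[np.arange(len(v)), v]            # picks a[0, 1] and a[1, 2]
print(picked)                               # [2 6]
```

In both variants the expensive part is likely to be stacking the object column into one 2-D array, so it may be worth storing the data as a plain 2-D array from the start.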

Add data from one column to another column on every other row

I have two data frames:
import pandas as pd
import numpy as np
sgRNA = pd.Series(["ABL1_sgABL1_130854834","ABL1_sgABL1_130862824","ABL1_sgABL1_130872883","ABL1_sgABL1_130884018"])
sequence = pd.Series(["CTTAGGCTATAATCACAATG","GGTTCATCATCATTCAACGG","TCAGTGATGATATAGAACGG","TTGCTCCCTCGAAAAGAGCG"])
df1=pd.DataFrame(sgRNA,columns=["sgRNA"])
df1["sequence"]=sequence
df2=pd.DataFrame(columns=["column"],
index=np.arange(len(df1) * 2))
I want to add values from both columns from df1 to df2 every other row, like this:
ABL1_sgABL1_130854834
CTTAGGCTATAATCACAATG
ABL1_sgABL1_130862824
GGTTCATCATCATTCAACGG
ABL1_sgABL1_130872883
TCAGTGATGATATAGAACGG
ABL1_sgABL1_130884018
TTGCTCCCTCGAAAAGAGCG
To do this for df1["sgRNA"] I used this code:
df2.iloc[0::2, :]=df1["sgRNA"]
But I get this error:
ValueError: could not broadcast input array from shape (4,) into shape (4,1).
What am I doing wrong?
I think you're looking for DataFrame.stack():
df2["column"] = df1.stack().reset_index(drop=True)
print(df2)
Prints:
column
0 ABL1_sgABL1_130854834
1 CTTAGGCTATAATCACAATG
2 ABL1_sgABL1_130862824
3 GGTTCATCATCATTCAACGG
4 ABL1_sgABL1_130872883
5 TCAGTGATGATATAGAACGG
6 ABL1_sgABL1_130884018
7 TTGCTCCCTCGAAAAGAGCG
Besides Andrej Kesely's superior solution, to answer the question of what went wrong in the code: it's really minor.
df1["sgRNA"] is a Series (one-dimensional), while df2.iloc[0::2, :] is a DataFrame (two-dimensional).
The solution is to make the df2 side one-dimensional by selecting its one and only column, instead of selecting a slice of all (one) columns:
df2.iloc[0::2, 0] = df1["sgRNA"]
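Another way to get the interleaved layout, shown as a sketch rather than as the method used in the answers: NumPy flattens a 2-D array row by row, so flattening the two columns already yields sgRNA, sequence, sgRNA, sequence, and so on. Shortened placeholder strings stand in for the real data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "sgRNA": ["guide_1", "guide_2"],      # placeholder names
    "sequence": ["CTTAGG", "GGTTCA"],     # placeholder sequences
})

# Row-major flattening interleaves the two columns.
interleaved = df1[["sgRNA", "sequence"]].to_numpy().ravel()
df2 = pd.DataFrame({"column": interleaved})
print(df2)
```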

Efficient way to remove sections of Numpy array

I am working with a numpy array of features in the following format
[[feat1_channel1,feat2_channel1...feat6_channel1,feat1_channel2,feat2_channel2...]] (so each channel has 6 features and the array shape is 1 x (number channels*features_per_channel) or 1 x total_features)
I am trying to remove specified channels from the feature array. For example, removing channel 1 would mean removing features 1-6 associated with channel 1.
My current method is shown below:
reshaped_features = current_feature.reshape((-1,num_feats))
desired_channels = np.delete(reshaped_features,excluded_channels,axis=0)
current_feature = desired_channels.reshape((1,-1))
where I reshape the array to be number_of_channels x number_of_features, remove the rows corresponding to the channels I want to exclude, and then reshape the array with the desired variables into the original format of being 1 x total_features.
The problem with this method is that it slows down my code tremendously, because this process is done thousands of times. Are there any suggestions on how to speed this up, or alternative approaches?
As an example, given the following array of features:
[[0,1,2,3,4,5,6,7,8,9,10,11...48,49,50,51,52,53]]
I reshape it to the following:
[[0,1,2,3,4,5],
[6,7,8,9,10,11],
[12,13,14,15,16,17],
.
.
.
[48,49,50,51,52,53]]
and, as an example, if I want to remove the first two channels then the resulting output should be:
[[12,13,14,15,16,17],
.
.
.
[48,49,50,51,52,53]]
and finally:
[[12,13,14,15,16,17...48,49,50,51,52,53]]
I found a solution that does not use np.delete(), which was the main culprit of the slowdown, building off the answer from msi_gerva.
I found the channels I wanted to keep using a list comprehension:
all_chans = [1,2,3,4,5,6,7,8,9,10]
features_per_channel = 5
my_data = np.arange(len(all_chans)*features_per_channel)
chan_to_exclude = [1,3,5]
channels_to_keep = [i for i in range(len(all_chans)) if i not in chan_to_exclude]
Then reshaped the array
reshaped = my_data.reshape((-1,features_per_channel))
Then selected the channels I wanted to keep
desired_data = reshaped[channels_to_keep]
And finally reshaped to the desired shape
final_data = desired_data.reshape((1,-1))
These changes made the code ~2x faster than the original method.
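The same keep-instead-of-delete idea can also be written with a boolean mask, which avoids building the list of kept channels in a Python loop. This is a sketch with made-up sizes (9 channels, 6 features each, matching the example in the question); it makes no timing claim.

```python
import numpy as np

num_channels = 9
features_per_channel = 6
features = np.arange(num_channels * features_per_channel).reshape(1, -1)

excluded_channels = [0, 1]                 # channels to drop (0-based)

keep = np.ones(num_channels, dtype=bool)   # start by keeping every channel
keep[excluded_channels] = False            # switch off the excluded ones

# One row per channel, select the kept rows, flatten back to 1 x N.
result = features.reshape(num_channels, features_per_channel)[keep].reshape(1, -1)
print(result.shape)                        # (1, 42)
```

If the set of excluded channels stays the same across the thousands of calls, the mask can be built once and reused.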
With the numerical example you provided, I would go with either np.delete or plain index selection:
import numpy as np
arrays = np.arange(54).reshape(-1, 6)  # 9 channels x 6 features
newarrays = arrays.copy()
remove = [1, 3, 5]          # channels (rows) to drop
take = [0, 2, 4, 6, 7, 8]   # channels (rows) to keep
arrays = np.delete(arrays, remove, axis=0)  # option 1: delete the unwanted rows
newarrays = newarrays[take]                 # option 2: select the rows to keep
arrays = list(arrays.flatten())
newarrays = list(newarrays.flatten())

Find minimum values of numpy columns

Looking to print the minimum values of numpy array columns.
I am using a loop in order to do this.
The array is shaped (20, 3) and I want to find the min values of columns, starting with the first (i.e. col_value=0)
I have coded
col_value=0
for col_value in X:
    print(X[:, col_value].min)
    col_value += 1
However, it is coming up with an error
"arrays used as indices must be of integer (or boolean) type"
How do I fix this?
Let me suggest an alternative approach that you might find useful. NumPy's min() has an axis argument that you can use to find the minimum values along a given dimension.
Example:
X = np.random.randn(20, 3)
print(X.min(axis=0))
This prints a NumPy array with the minimum value of each column of X.
You don't need col_value=0, nor do you need col_value += 1.
import numpy as np
x = np.array([1,23,4,6,0])
print(x.min())
EDIT:
Sorry, I didn't see that you wanted to iterate through the columns.
X = np.array([[1,2], [3,4]])
for col in X.T:
    print(col.min())
Transposing the matrix is one of the best solutions.
X = np.array([[11, 2, 14],
              [5, 15, 7],
              [8, 9, 20]])
X = X.T  # transpose the array so that iterating yields columns
for i in X:
    print(min(i))
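For reference, the error in the question comes from "for col_value in X", which iterates over the rows of X, so each col_value is a float array rather than an integer index. A sketch of the loop with integer column indices, using random data purely for illustration:

```python
import numpy as np

X = np.random.randn(20, 3)

# Loop over the column indices 0, 1, 2 rather than over the rows of X.
mins = []
for col in range(X.shape[1]):
    mins.append(X[:, col].min())   # min() needs parentheses to be called

# The loop gives the same values as the vectorized form.
print(np.allclose(mins, X.min(axis=0)))
```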

How to declare and fill an array in NumPy?

I need to create an empty array in Python and fill it in a loop method.
data1 = np.array([ra,dec,[]])
Here is what I have. The ra and dec portions are from another array I've imported. What I am having trouble with is filling the other columns.
For example, let's say that to fill the 3rd column I do this:
for i in range (0,56):
    data1[i,3] = 32
The error I am getting is "IndexError: invalid index" for the second line in the code sample above.
Additionally, when I check the shape of the array I created, it will come out at (3,). The data that I have already entered into this is intended to be two columns with 56 rows of data.
So where am I messing up here? Should I transpose the array?
You could do:
data1 = np.zeros((56,4))
to get a 56 by 4 array. If you don't want the array to start out filled with 0, you could use np.ones, np.empty, or np.ones((56, 4)) * np.nan.
Also, for performance reasons it is usually best to avoid Python loops when they aren't needed.
As an example, this does the same as your loop:
data1[:, 3] = 32
data1 = np.array([ra,dec,[32]*len(ra)])
gives a single-line solution to your problem (note that the result has shape (3, len(ra)), so transpose it with .T if you want one row per entry). For efficiency, though, allocating an empty array first and then copying in the relevant parts would be preferable, since that avoids constructing the dummy list.
One thing that nobody has mentioned is that in Python, indexing starts at 0, not 1.
This means that if you want to look at the third column of the array, you actually should address [:,2], not [:,3].
Good luck!
Assuming ra and dec are vectors (1-d):
data1 = np.concatenate([ra[:, None], dec[:, None], np.zeros((len(ra), 1))+32], axis=1)
Or
data1 = np.empty((len(ra), 3))
data1[:, 0] = ra
data1[:, 1] = dec
data1[:, 2] = 32
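np.column_stack is another way to assemble 1-D vectors as columns. The sketch below uses made-up stand-ins for ra and dec, since the real arrays are not shown in the question.

```python
import numpy as np

# Hypothetical stand-ins for the imported ra and dec vectors (56 entries each).
ra = np.linspace(0.0, 55.0, 56)
dec = np.linspace(-10.0, 10.0, 56)

# Each 1-D input becomes one column of the result.
data1 = np.column_stack([ra, dec, np.full(len(ra), 32.0)])
print(data1.shape)   # (56, 3)
```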
If you just want to fill an array with the same number, you can add that number to an array of zeros or ones:
x_2 = np.ones((1000))+1
This example gives an array of 1000 entries, all equal to 2.
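np.full expresses the same idea directly and also covers the NaN-filled case mentioned in an earlier answer. A minimal sketch; the names x_2 and nan_table are illustrative only.

```python
import numpy as np

x_2 = np.full(1000, 2.0)               # 1000 entries, all equal to 2.0
nan_table = np.full((56, 4), np.nan)   # 56 x 4 array filled with NaN

print(x_2[:3])
print(nan_table.shape)
```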
