Hey guys, I want to write a function that applies a z-score transformation to a single column of a 2D array and returns an array where the specified column is "transformed" and the other columns remain the same. My approach was to first delete the column I want to transform using np.delete(), then perform the transformation, and finally put the transformed column back into the array using np.insert(). However, all the elements in the transformed column are 0. What can I do?
I have attached an image so you can view the incorrect output as well.
import numpy as np

x1 = np.array([[4,3,12],[1,5,20],[1,2,3],[10,20,40],[7,2,44]])
def myfunc(array, scalar):
    total_result = np.delete(array, scalar, axis=1)
    z_score = ((array - array.mean())/array.std())[:,1]
    answer = np.insert(total_result, scalar, z_score, axis=1)
    return answer
myfunc(x1, 1)
Your array has an integer dtype, while your z-scores are floats. When you insert floats into an integer array, np.insert casts them to the array's integer type, and since all of your z-scores lie between -1 and 1, they are truncated to 0. You need to convert your array to float first. Also, deleting and inserting is not the right way to do this: simply assign the new values to the column you want. (Note too that your code hard-codes [:,1] instead of using scalar.) Here is how to do it:
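import numpy as np

def myfunc(array, scalar):
    # z-score using the mean and std of the whole array, then keep only column `scalar`
    z_score = ((array - array.mean())/array.std())[:,scalar]
    # assign in place (this modifies `array`, so it must already be a float array)
    array[:,scalar] = z_score
    return array

x1 = x1.astype(np.float64, copy=False)
myfunc(x1, 1)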
output:
[[ 4. -0.64344154 12. ]
[ 1. -0.49380397 20. ]
[ 1. -0.71826033 3. ]
[10. 0.62847778 40. ]
[ 7. -0.71826033 44. ]]
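Note that both versions above standardize column 1 with the mean and standard deviation of the whole array (11.6 and about 13.37 for x1), not of the column itself. If the intent is the usual per-column z-score, a minimal sketch that uses the column's own statistics and leaves the input untouched could look like this (the name `zscore_column` is just illustrative):

```python
import numpy as np

def zscore_column(array, col):
    """Return a float copy of `array` with column `col` replaced by its z-scores.

    Uses the column's own mean and (population) standard deviation.
    """
    # Float copy: avoids the integer truncation and does not modify the input
    out = np.asarray(array, dtype=np.float64).copy()
    column = out[:, col]
    out[:, col] = (column - column.mean()) / column.std()
    return out

x1 = np.array([[4, 3, 12], [1, 5, 20], [1, 2, 3], [10, 20, 40], [7, 2, 44]])
print(zscore_column(x1, 1))
```

Whether to use the whole-array statistics or the column's own depends on what the transformation is meant to do; the column-wise version gives the transformed column mean 0 and standard deviation 1.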
Related
I have a dataframe with 10 columns, where the name of each column is a string in a list.
I have a 1x10 row vector with values.
I would like to put this vector into the dataframe, so that the names in the list are the column names and the values of the vector form a single row.
How could I do that? The only way I found was appending everything into one column and renaming the index with the names in my list, but I want a 1x10 dataframe instead of a 10x1 one.
import numpy as np
import pandas as pd

names = ['agmi', 'aglo', 'mwmi', 'mcha', 'mcmi', 'mclo', 'bkdr', 'mchi', 'melo', 'mwlo']
alt_GPS = np.array([[ 0. , 0. , 0.85253906, 6.12797546, 5.49960327,
11.00892639, 0. , 2.08251953, 0. , 2.4508667 ]])
alt = pd.DataFrame(columns=names)
Your alt_GPS array is already 2D with shape (1, 10), so isn't this simply what you want:
alt = pd.DataFrame(alt_GPS, columns=names)
Output:
agmi aglo mwmi mcha mcmi mclo bkdr mchi melo mwlo
0 0.0 0.0 0.852539 6.127975 5.499603 11.008926 0.0 2.08252 0.0 2.450867
If you want to create the dataframe first and add the row afterwards:
alt = pd.DataFrame(columns=names)
alt.loc[0] = alt_GPS[0]
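If the values instead come as a 1D vector of shape (10,), passing it straight to pd.DataFrame together with 10 column names raises a shape error, because pandas treats a 1D array as a single column. A minimal sketch, assuming a hypothetical 1D vector `vec`, is to reshape it into one row first:

```python
import numpy as np
import pandas as pd

names = ['agmi', 'aglo', 'mwmi', 'mcha', 'mcmi', 'mclo', 'bkdr', 'mchi', 'melo', 'mwlo']
# Hypothetical 1D vector with one value per column name
vec = np.arange(10, dtype=float)

# reshape(1, -1) turns shape (10,) into (1, 10): one row, ten columns
alt = pd.DataFrame(vec.reshape(1, -1), columns=names)
print(alt.shape)  # (1, 10)
```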
I have the following output sample:
[[-5.53759409e-01 -2.68382610e-01 4.06747784e+00]
[-1.66055379e+00 -8.08889466e-01 7.06720368e+01]
[ 2.92172488e-01 8.17347290e-01 3.18001189e+00]
[ 1.89072607e+00 -6.68502526e-01 9.08233869e+01]
[-1.31451627e+00 1.61831269e+00 5.41709058e+00]
[ 1.15886824e+00 3.31177259e-01 5.14391851e+00]
[ 1.87270676e+00 1.24100260e+00 2.64360316e+01]
[ 1.93323801e+00 -5.64255644e-02 7.28368451e+01]
[ 1.33014215e+00 1.96282476e+00 2.96295301e-01]]
The minimum function value at generation 10 is [0.2962953]
I concatenated two arrays, the coordinate array (columns 0 and 1) and the function values (column 2), to form the array above.
However, I would like to display not only the minimum function value, e.g. 0.2962953, but also the coordinates associated with it, i.e. the corresponding row of the array above.
Any ideas how I would approach this?
In this case, I would need the bottom row of the above array and a way to highlight the coordinates and function value.
Problem fixed! Just used: printValues = array[np.argmin(array[:, 2]), (0,1)]
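The same np.argmin idea can return the coordinates and the function value together, since they live in the same row. A small sketch using a few rows of the output above, rounded (the name `results` stands in for the concatenated array):

```python
import numpy as np

# A few rounded rows from the output above:
# columns 0 and 1 are coordinates, column 2 is the function value
results = np.array([[-0.55, -0.27, 4.07],
                    [ 0.29,  0.82, 3.18],
                    [ 1.33,  1.96, 0.30]])

best = np.argmin(results[:, 2])   # row index of the smallest function value
coords = results[best, :2]        # coordinates in that row
value = results[best, 2]          # the minimum function value itself
print(f"Minimum {value} at coordinates {coords}")
```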
Why does a[:,[x]] create a column vector from an array? What does the [ ] represent?
Could anyone explain the principle to me?
import numpy as np
import torch

a = np.random.randn(5,6)
a = a.astype(np.float32)
print(a)
c = torch.from_numpy(a[:,[1]])
print(c)
[[-1.6919796 0.3160475 0.7606999 0.16881375 1.325092 0.71536326]
[ 1.217861 0.35804042 0.0285245 0.7097111 -2.1760604 0.992101 ]
[-1.6351479 0.6607222 0.9375339 0.5308735 -1.9699149 -2.002803 ]
[-1.1895325 1.1744579 -0.5980689 -0.8906375 -0.00494479 0.51751447]
[-1.7642071 0.4681248 1.3938268 -0.7519176 0.5987852 -0.5138923 ]]
###########################################
tensor([[0.3160],
[0.3580],
[0.6607],
[1.1745],
[0.4681]])
The [ ] means you are indexing with a list, which keeps the column dimension instead of dropping it. Check the shape attribute to see the difference:
a[:,1].shape
output:
(5,)
with [ ]:
a[:,[1]].shape
output:
(5, 1)
That syntax is numpy array indexing, where arrays are indexed as a[rows, columns, pages, ... (higher dimensions)].
Selecting a specific row/column/page is done by giving a specific number or range of numbers. So when you use a[1,2], numpy gets the element from row 1, column 2 (indices start at 0).
You can select several specific indices by giving that dimension a list of values. So a[[1,3],1] gets you both elements (1,1) and (3,1).
The : tells numpy to get everything from that specific array dimension. So when you use a[:,1], numpy gets every row in column 1. Alternatively, a[1,:] gets every column in row 1.
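To tie this back to the question: an integer index removes that axis from the result, while a list index (even a one-element list) is advanced indexing and keeps the axis, with one entry per list element. A quick check on an array with the question's 5x6 shape (filled with arange instead of random numbers so the values are predictable):

```python
import numpy as np

a = np.arange(30).reshape(5, 6)  # same 5x6 shape as in the question

print(a[:, 1].shape)    # (5,)   integer index drops the column axis
print(a[:, [1]].shape)  # (5, 1) list index keeps it, giving a column vector
print(a[[1, 3], 1])     # elements (1, 1) and (3, 1)
print(a[1, :].shape)    # (6,)   every column of row 1
```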
I have trouble figuring out what would be the most efficient way to do the following:
import numpy as np
M = 10
K = 10
ind = np.array([0,1,0,1,0,0,0,1,0,0])
full = np.random.rand(sum(ind),K)
output = np.zeros((M,K))
output[1,:] = full[0,:]
output[3,:] = full[1,:]
output[7,:] = full[2,:]
I want to build output, which is a sparse matrix, whose rows are given in a dense matrix (full) and the row indices are specified through a binary vector.
Ideally, I want to avoid a for-loop. Is that possible? If not, I'm looking for the most efficient way to for-loop this.
I need to perform this operation quite a few times. ind and full will keep changing, hence I've just provided some exemplar values for illustration.
I expect ind to be pretty sparse (at most 10% ones), and both M and K to be large numbers (10e2 - 10e3). Ultimately, I might need to perform this operation in pytorch, but a decent procedure for numpy would already get me quite far.
Please also help me find a more appropriate title for the question, or more appropriate categories, if you have any.
Many thanks,
Max
output[ind.astype(bool)] = full
By converting the integer values in ind to boolean values, you can do boolean indexing to select the rows in output that you want to populate with values in full.
example with a 4x4 array:
import numpy as np

M = 4
K = 4
ind = np.array([0,1,0,1])
full = np.random.rand(sum(ind),K)
output = np.zeros((M,K))
output[ind.astype(bool)] = full
print(output)
[[ 0. 0. 0. 0. ]
[ 0.32434109 0.11970721 0.57156261 0.35839647]
[ 0. 0. 0. 0. ]
[ 0.66038644 0.00725318 0.68902177 0.77145089]]
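Boolean indexing is one option; an equivalent is to turn the 0/1 vector into integer row indices with np.flatnonzero, which can be handy if the indices themselves are needed elsewhere. A minimal sketch comparing the two (the seeded generator is only there to make the example repeatable):

```python
import numpy as np

M, K = 10, 4
ind = np.array([0, 1, 0, 1, 0, 0, 0, 1, 0, 0])
rng = np.random.default_rng(0)
full = rng.random((ind.sum(), K))

# Option 1: boolean mask, as in the answer above
out_mask = np.zeros((M, K))
out_mask[ind.astype(bool)] = full

# Option 2: integer row indices of the ones in ind
rows = np.flatnonzero(ind)  # array([1, 3, 7])
out_idx = np.zeros((M, K))
out_idx[rows] = full

print(np.array_equal(out_mask, out_idx))  # True
```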
I have a matrix tempsyntheticGroup2 with 6 columns. I want to change the values of columns (0,1,2,3,5) from float to int. This is my code:
tempsyntheticGroup2=tempsyntheticGroup2[:,[0,1,2,3,5]].astype(int)
but it doesn't work properly and I lose the other columns.
I don't think you can have a numpy array where some elements are ints and some are floats (a regular array has only one dtype). But if you just want to round down to whole numbers while keeping all elements as floats, you can do this:
import numpy as np

# define dummy example matrix
t = np.random.rand(3,4) + np.arange(12).reshape((3,4))
array([[ 0.68266426, 1.4115732 , 2.3014562 , 3.5173022 ],
[ 4.52399807, 5.35321628, 6.95888015, 7.17438118],
[ 8.97272076, 9.51710983, 10.94962065, 11.00586511]])
# round some columns down to whole numbers (values stay floats)
t[:,[0,2]] = np.floor(t[:,[0,2]])
# or
t[:,[0,2]] = t[:,[0,2]].astype(int)
array([[ 0. , 1.4115732 , 2. , 3.5173022 ],
[ 4. , 5.35321628, 6. , 7.17438118],
[ 8. , 9.51710983, 10. , 11.00586511]])
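A caveat on the two variants: np.floor rounds toward negative infinity, while .astype(int) truncates toward zero, so they give the same result only for non-negative values such as those in this example. A quick check:

```python
import numpy as np

v = np.array([-1.5, -0.2, 0.7, 2.9])

print(np.floor(v))    # rounds down: -2., -1., 0., 2.
print(v.astype(int))  # truncates toward zero: -1, 0, 0, 2
```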
Otherwise, you probably need to split your original array into two arrays: one containing the columns that stay floats, the other containing the columns that become ints.
t_int = t[:,[0,2]].astype(int)
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
t_float = t[:,[1,3]]
array([[ 1.4115732 , 3.5173022 ],
[ 5.35321628, 7.17438118],
[ 9.51710983, 11.00586511]])
Note that you'll have to change your indexing accordingly to access your elements...
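Applied to the question's 6-column matrix, the first approach means assigning the converted values back into the selected columns rather than overwriting the whole variable (which is what dropped the other columns). A sketch with a random stand-in for tempsyntheticGroup2, assuming whole-number floats are acceptable:

```python
import numpy as np

rng = np.random.default_rng(1)
# Random stand-in for tempsyntheticGroup2 (4 rows, 6 float columns)
tempsyntheticGroup2 = rng.random((4, 6)) * 10

cols = [0, 1, 2, 3, 5]
# Assign back into the selected columns; column 4 is left untouched.
# The array stays float64, so the converted columns hold whole-number floats.
tempsyntheticGroup2[:, cols] = tempsyntheticGroup2[:, cols].astype(int)
print(tempsyntheticGroup2)
```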
I think you are using the wrong syntax to get the column data.
Read this question:
How do you extract a column from a multi-dimensional array?