I've finished one part of this assignment I have to do. There's only one part of the assignment that doesn't make any sense to me.
I'm doing a LinearRegression model and according to others I need to apply ans[i,:] = y_poly at the very end, but I never got an answer as to why.
Can someone please explain to me what [i,:] means? I haven't found any explanations online.
It's specific to the numpy module, used in most data science modules.
ans[i,:] = y_poly
This assigns a vector to a slice of a 2D numpy array (slice assignment). Self-contained example:
>>> import numpy
>>> a = numpy.array([[0,0,0],[1,1,1]])
>>> a[0,:] = [3,4,5]
>>> a
array([[3, 4, 5],
       [1, 1, 1]])
There is also slice assignment in base Python, using only one dimension (a[:] = [1,2,3]).
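To illustrate the base-Python case, here is a small sketch (the names a and b are made up for illustration) suggesting that a[:] = ... changes the existing list in place instead of rebinding the name:

```python
a = [0, 0, 0]
b = a              # b is a second name for the same list object

a[:] = [1, 2, 3]   # slice assignment: replaces the contents in place

print(b)           # b sees the new contents
print(a is b)      # still the same object
```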
I guess you are also using numpy to manipulate data (as a matrix)?
If based on numpy, ans[i,:] means to pick the ith 'row' of ans with all of its 'columns'.
Note: when dealing with numpy arrays, we should (almost) always use [i, j] instead of [i][j]. This might be counter-intuitive if you've used nested lists in Python or arrays in Java to manipulate matrices before.
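As a rough sketch of why ans[i,:] = y_poly tends to appear at the end of such a loop: each pass produces one vector of predictions, and that vector is stored as row i of a 2D result array. The names x and degrees, and the expression standing in for the model's predictions, are assumptions made for illustration; they are not taken from the assignment.

```python
import numpy as np

# Hypothetical setup: one row of results per polynomial degree.
x = np.linspace(0, 1, 5)
degrees = [1, 2, 3]
ans = np.zeros((len(degrees), x.size))

for i, degree in enumerate(degrees):
    y_poly = x ** degree   # placeholder for the fitted model's predictions
    ans[i, :] = y_poly     # store the whole vector as row i of ans

print(ans.shape)
```

The right-hand side has to match the length of a row of ans (or be broadcastable to it), otherwise numpy raises an error.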
I think in this case [] is the indexing operator of a class object, which can be customized by defining the __getitem__ method:
class A:
    def __getitem__(self, key):
        pass
key can be literally anything. In your case, for "[1,:]", the key is a tuple consisting of 1 and slice(None, None, None). Such a key can be useful if your class represents multi-dimensional data which you want to access via the [] operator. As suggested by other answers, this could be a numpy array.
Here is a quick example of how such multi-dimensional indexing could work:
class A:
    values = [[1,2,3,4], [4,5,6,7]]
    def __getitem__(self, key):
        i, j = key
        if isinstance(i, int):
            i = slice(i, i + 1)
        if isinstance(j, int):
            j = slice(j, j + 1)
        for row in self.values[i]:
            print(row[j])
>>> a = A()
>>> a[:,2:4]
[3, 4]
[6, 7]
>>> a[1,1]
[5]
>>> a[:, 2]
[3]
[6]
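To see exactly what the [] operator hands over, a tiny probe class (ShowKey is a made-up name used only for this sketch) can return the key unchanged:

```python
class ShowKey:
    """Hypothetical helper: returns whatever key [] receives."""
    def __getitem__(self, key):
        return key

probe = ShowKey()
key = probe[1, :]
print(key)            # the tuple that __getitem__ received
print(probe[2:4])     # a single slice, not wrapped in a tuple
```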
What's an elegant way to map func over each value in a 2D list?
The best I have so far:
result = [[func(value) for value in row] for row in grid]
It's a shame I have to nest square brackets, and have a lot of list comprehensions. Is there a less verbose or more elegant way?
result = map2d(func, grid)
"It's a shame I have to nest square brackets" - No it's not, it's a Pythonic way to go and makes it clear that result is a list of lists (as long as you don't go overboard with even more nested list comprehensions). If you really don't want to see that in the code just wrap it in your own function:
def map2d(func, grid):
    return [[func(value) for value in row] for row in grid]
result = map2d(func, grid)
I think your initial approach is obvious, readable, error-free and short enough for the task at hand. It can be refactored into a "map2d" function, but then just create this function and put that code in, as in: map2d = lambda func, data: [[func(element) for element in row] for row in data]
What might feel strange about nesting the brackets is that the "list inside list" approach for 2D data, although straightforward, has some drawbacks.
It is easy to have other forms of 2D data structures in Python, by implementing a new class with custom __getitem__, __setitem__ and __delitem__ methods that allow the syntax mydata[x, y]. And then, if you derive your class, for example, from collections.abc.Mapping, you can iterate over it, or over the return value of mydata.items(), to reach all data points with a linear for loop.
Another advantage is that a custom class can then know about its width and height and check boundaries, and can return a default value for unfilled elements, working as a sparse matrix.
Another approach is simply using a dictionary, with no class creation, for your 2D data, and retrieve data with the get method in order to have a default value at hand:
from random import randint
W, H = 10, 10
grid = dict()
# fill in data:
for x in range(W):
    for y in range(H):
        grid[x,y] = randint(0, 100)
# loop through all values:
for coord in grid:
    print(grid[coord])
    # change value (new_value stands for whatever function you want to apply):
    grid[coord] = new_value(grid[coord])
This approach won't let you do the "map" as a one-liner, except by explicitly calling grid.__setitem__. With a custom class, as suggested above, you can simply add a .map method that does that.
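A minimal sketch of the custom-class idea might look like the following. The class name SparseGrid, its attributes and its map method are all hypothetical, and the bounds check and default value are just one possible design:

```python
class SparseGrid:
    """Hypothetical 2D container addressed as grid[x, y]."""

    def __init__(self, width, height, default=0):
        self.width = width
        self.height = height
        self.default = default
        self._cells = {}          # only explicitly set cells are stored

    def _check(self, key):
        x, y = key
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError(f"{key} is outside a {self.width}x{self.height} grid")

    def __getitem__(self, key):
        self._check(key)
        return self._cells.get(key, self.default)

    def __setitem__(self, key, value):
        self._check(key)
        self._cells[key] = value

    def map(self, func):
        """Apply func in place to every explicitly stored cell."""
        for key, value in self._cells.items():
            self._cells[key] = func(value)


grid = SparseGrid(3, 2)
grid[1, 1] = 5
grid.map(lambda value: value * 2)
print(grid[1, 1], grid[0, 0])
```

Note that this map only touches cells that were set; whether unfilled cells should be mapped as well is a design decision left open here.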
You could use map twice, though I'm not sure this is more elegant than your solution:
result = list(map(lambda row: list(map(func, row)), grid))
There's nothing wrong about the nested listcomp, it is my preferred solution. But you could "vectorize" the function you want to apply.
def vectorize(f):
    def f_vectorized(args): # iterable args, check against str in real code
        return [f(x) for x in args]
    return f_vectorized
func_vec = vectorize(func)
result = list(map(func_vec, grid))
Demo:
>>> def func(x):
...     return x + 1
>>> grid = [[1, 2], [3, 4]]
>>> func_vec = vectorize(func)
>>> func_vec_vec = vectorize(func_vec)
>>> list(map(func_vec, grid))
[[2, 3], [4, 5]]
>>> func_vec_vec(grid)
[[2, 3], [4, 5]]
You can define your own -
def map2d(func, grid):
    out_grid = []
    for row in grid:
        out_grid.append(list(map(func, row)))
    return out_grid
x = [[1, 1, 2], [2, 2, 3], [3, 4, 3]]
map2d(lambda z: z**2, x)
#[[1, 1, 4], [4, 4, 9], [9, 16, 9]]
I am trying to find the elements of arr1 that are missing from arr2, but I'm not sure what the issue with the code is or why it's not working. Please suggest a fix.
def miss2(arr1, arr2):
    arr3=arr1
    for i in arr1:
        # print(i)
        for j in arr2:
            # print(i,j)
            if i == j:
                arr3.remove(j)
    print(arr3)
arr1=[1,2,3,4]
arr2=[1,2]
miss2(arr1,arr2)
Result: [2, 3, 4] instead of [3, 4]
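Objects in Python are stored by reference, which means arr3 = arr1 didn't copy the values of arr1 into arr3; it made arr3 point to the same list object. Removing an item from arr3 therefore shrinks the list you are iterating over, so the loop skips the element right after each removal, which is why 2 survives. You can use the is operator to test whether two names refer to the same object in memory.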
Sequences can be copied by slicing so you can use this to copy a list:
arr3 = arr1[:]
Also you can use
arr3 = list(arr1)
Or you can use the copy() function from the copy module:
from copy import copy
arr3 = copy(arr1)
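A short sketch of the difference between a second name and a real copy (alias and clone are illustrative names, not from the question):

```python
arr1 = [1, 2, 3, 4]
alias = arr1       # second name for the same list object
clone = arr1[:]    # new list with the same elements

alias.remove(1)    # also changes arr1, because it is the same object

print(arr1)                           # changed through the alias
print(clone)                          # the copy is unaffected
print(alias is arr1, clone is arr1)   # identity check with `is`
```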
By the way, you can try this:
print([i for i in arr1 if i not in arr2])
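Putting these ideas together, one possible corrected version could look like the sketch below. The name miss2_fixed is made up, and the set used for lookups assumes the elements are hashable; it returns a new list and leaves both inputs untouched.

```python
def miss2_fixed(arr1, arr2):
    """Return the elements of arr1 that do not appear in arr2, in order."""
    exclude = set(arr2)   # assumption: elements are hashable
    return [item for item in arr1 if item not in exclude]


arr1 = [1, 2, 3, 4]
arr2 = [1, 2]
print(miss2_fixed(arr1, arr2))
```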
McGrady is right.
Here is an article that tells you more about the reference problem:
Is Python call-by-value or call-by-reference?
And you can use "set"(consider the math concept) data structure instead of "list", here:
x = set([1,2,3,4])
y = set([1,2])
x - y
I have a rather basic question about the NumPy module in Python 2, particularly the version on trinket.io. I do not see how to replace values in a multidimensional array several layers in, regardless of the method. Here is an example:
a = numpy.array([1,2,3])
a[0] = 0
print a
a = numpy.array([[1,2,3],[1,2,3]])
a[0][0] = a[1][0] = 0
print a
Result:
array([0, 2, 3], '<class 'int'>')
array([[1, 2, 3], [1, 2, 3]], '<class 'int'>')
I need the ability to change individual values, my specific code being:
a = numpy.empty(shape = (8,8,2),dtype = str)
for row in range(a.shape[0]):
    for column in range(a.shape[1]):
        a[row][column][1] = 'a'
Thank you for your time and any help provided.
To change individual values you can simply do something like:
a[1,2] = 'b'
If you want to change all the array, you can do:
a[:,:] = 'c'
Use commas (array[a,b]) instead of (array[a][b])
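Applied to the (8, 8, 2) array from the question, comma indexing might look like this. It is a sketch written for standard NumPy under Python 3, so behaviour on trinket.io may differ:

```python
import numpy as np

# dtype=str yields an array of one-character strings here
a = np.empty(shape=(8, 8, 2), dtype=str)

a[:, :, 1] = 'a'    # replaces the nested loops from the question
a[0, 0, 0] = 'b'    # change a single element

print(a[0, 0])
```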
With numpy version 1.11.0, I get
[[0 2 3]
 [0 2 3]]
when I run your code. I guess your numpy version is newer and better.
As user3408085 said, the correct thing is to use a[0,0] = 0 to change one element, or a[:,0]=0 if you actually want to zero the entire first column.
The reason a[0][0]=0 does not modify a (at least in your version of numpy) is that a[0] is a new array there. If you break down your command a[0][0]=0 into 2 lines:
b=a[0]
b[0]=0
Then the fact that this modifies a (as it does in standard NumPy, where a[0] is a view of a) is counterintuitive.
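For comparison, here is how this plays out in standard NumPy (not necessarily on trinket.io): basic indexing such as a[0] gives a view, while indexing with a list gives a copy. A small sketch, with b and c as illustrative names:

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 2, 3]])

b = a[0]        # basic indexing: a view onto the first row
b[0] = 0        # writes through to a
print(np.shares_memory(a, b))

c = a[[0]]      # indexing with a list: an independent copy
c[0, 0] = 99    # does not touch a

a[1, 0] = 0     # comma indexing changes the element directly
print(a)
```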
How can I set the same value in multiple rows of a matrix, with different column numbers for each row, without a for loop?
For example for matrix a:
a=matrix([[1,2,3],
          [8,2,9],
          [1,8,7]])
row = [1,2,3]
col = [[1,2],
       [1,3],
       [2,3]]
I want to set a[1,1],a[1,2],a[2,1],a[2,3],a[3,2],a[3,3] to the same value.
I know I can use a for loop:
for i in xrange(len(row)):
    a[row[i],col[i]] = setvalue
But is there any way to do this without a for loop?
Using numpy, you can avoid loops:
import numpy as np
from numpy.matlib import repmat
a = np.array([[1,2,3],
              [8,2,9],
              [1,8,7]])
row = np.array([[1],
                [2],
                [3]])
col = np.array([[1,2],
                [1,3],
                [2,3]])
row = repmat(row,1,col.shape[1])
setvalue = 0
a[row.ravel(),col.ravel()] = setvalue
However, it's important to note that in Python indexing starts at 0, so you should actually do
a[row-1,col-1] = setvalue
Or even better, use the correct (zero-based) indices to initialise your row and col arrays.
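With zero-based indices, NumPy's broadcasting can pair each row index with its columns directly, so repmat is not strictly needed. A sketch assuming the same 3x3 array and a set value of 0:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [8, 2, 9],
              [1, 8, 7]])

row = np.array([[0], [1], [2]])             # shape (3, 1)
col = np.array([[0, 1], [0, 2], [1, 2]])    # shape (3, 2)

setvalue = 0
a[row, col] = setvalue    # row broadcasts against col to shape (3, 2)

print(a)
```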
Case 1: Use list comprehension
You can do like this:
value = 2
col_length = 3
line_length = 3
a = [[value for x in range(col_length)] for x in range(line_length)]
If you print a,
[[2, 2, 2], [2, 2, 2], [2, 2, 2]]
EDIT: Case 2 : Use map()
I am not very familiar with this one, but you can find more information about its performance here. General idea: it seems faster when used with one function and no lambda expression.
You'll have to use a for loop.
Usually you want to avoid for loops (by using comprehensions) when following the functional paradigm, by building new instances instead of mutating the old one. As your goal is to mutate the old one, somewhere you will need a loop. The best you can do is to wrap it up in a function:
def set_items_to(mx, indices, value=0):
    for row,cols in indices:
        for col in cols:
            mx[row, col] = value
a = matrix([[1,2,3],[4,5,6],[7,8,9]])
set_items_to(a, [
    [0, [0,1]],
    [1, [0,2]],
    [2, [1,2]]
], setvalue)
EDIT
In case it is a programming challenge, there are ways to accomplish that without explicit for loops by using one of the built-in aggregator functions. But this approach makes the code neither clearer nor shorter. Just for completeness, it would look something like this:
def set_items_to(mx, indices, value=0):
    sum(map(lambda item: [0,
        sum(map(lambda col: [0,
            mx.__setitem__((item[0], col), value)
        ][0], item[1]))
    ][0], indices))
In Python we can get the index of a value in a list by using .index().
But with a NumPy array, when I try to do:
decoding.index(i)
I get:
AttributeError: 'numpy.ndarray' object has no attribute 'index'
How could I do this on a NumPy array?
Use np.where to get the indices where a given condition is True.
Examples:
For a 2D np.ndarray called a:
i, j = np.where(a == value) # when comparing arrays of integers
i, j = np.where(np.isclose(a, value)) # when comparing floating-point arrays
For a 1D array:
i, = np.where(a == value) # integers
i, = np.where(np.isclose(a, value)) # floating-point
Note that this also works for conditions like >=, <=, != and so forth...
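To get something closer to list.index(), meaning only the first match and an error when there is none, one could wrap the lookup in a small helper. first_index is a hypothetical name, and np.flatnonzero is used here in place of np.where because it returns a single flat array of positions:

```python
import numpy as np

def first_index(arr, value):
    """Hypothetical helper: position of the first match in a 1D array."""
    hits = np.flatnonzero(arr == value)
    if hits.size == 0:
        raise ValueError(f"{value!r} is not in the array")
    return int(hits[0])


decoding = np.array([1, 2, 3, 4, 4, 4, 5])
print(first_index(decoding, 4))
```

For floating-point data the comparison arr == value could be replaced by np.isclose(arr, value), as in the examples above.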
You can also create a subclass of np.ndarray with an index() method:
class myarray(np.ndarray):
    def __new__(cls, *args, **kwargs):
        return np.array(*args, **kwargs).view(myarray)
    def index(self, value):
        return np.where(self == value)
Testing:
a = myarray([1,2,3,4,4,4,5,6,4,4,4])
a.index(4)
#(array([ 3, 4, 5, 8, 9, 10]),)
You can convert a numpy array to a list and get its index.
For example:
tmp = [1,2,3,4,5] #python list
a = numpy.array(tmp) #numpy array
i = list(a).index(2) # i will return index of 2, which is 1
This is just what you wanted.
I'm torn between these two ways of implementing an index of a NumPy array:
idx = list(classes).index(var)
idx = np.where(classes == var)
Both take the same number of characters, but the first method returns an int instead of a numpy.ndarray.
This problem can be solved efficiently using the numpy_indexed library (disclaimer: I am its author); which was created to address problems of this type. npi.indices can be viewed as an n-dimensional generalisation of list.index. It will act on nd-arrays (along a specified axis); and also will look up multiple entries in a vectorized manner as opposed to a single item at a time.
a = np.random.rand(50, 60, 70)
i = np.random.randint(0, len(a), 40)
b = a[i]
import numpy_indexed as npi
assert all(i == npi.indices(a, b))
This solution has better time complexity (n log n at worst) than any of the previously posted answers, and is fully vectorized.
You can use the function numpy.nonzero(), or the nonzero() method of an array:
import numpy as np
A = np.array([[2,4],
              [6,2]])
index = np.nonzero(A>2)
OR
(A>2).nonzero()
Output:
(array([0, 1]), array([1, 0]))
The first array in the output holds the row indices and the second array holds the corresponding column indices.
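If (row, column) pairs are easier to work with than two parallel arrays, the result can be zipped together. A small sketch using a threshold of 2, with coords as an illustrative name:

```python
import numpy as np

A = np.array([[2, 4],
              [6, 2]])

rows, cols = np.nonzero(A > 2)
coords = list(zip(rows.tolist(), cols.tolist()))   # one (row, column) pair per match

print(coords)
```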
If you are interested in the indices that would sort the array, rather than the position of one particular value, use np.argsort(a):
a = np.random.randint(0, 100, 10)
sorted_idx = np.argsort(a)
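With fixed values instead of random ones, a quick sketch shows what argsort returns: positions in the original array, ordered so that indexing with them yields the sorted values.

```python
import numpy as np

a = np.array([30, 10, 20])
sorted_idx = np.argsort(a)

print(sorted_idx)      # positions of the smallest, middle and largest value
print(a[sorted_idx])   # the values in ascending order
```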