Accessing required number of indices in an array

I have an array like:
a = np.array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
              44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
              68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79])
Requirement:
I would like to access 10 indices of the array.
The array above has length 60, and 60/10 = 6, so I need every 6th index of array a.
Required output: [0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60]

NumPy is powerful; I would recommend reading the documentation about indexing in NumPy.
everySixthEntry=a[np.arange(0,a.shape[0],6)]

You can generate the indexes for any array a with np.arange(len(a)). To access every 6th index, use a slice a[start:stop:step]. Jack posted one way; here is a slightly more detailed version.
import numpy as np
# define your data. a = [20, ..., 79]
a = np.arange(60) + 20
# generate indexes for the array; they run from 0 to len(a)-1
indexes = np.arange(len(a))
# reduce the indexes to every 6th index
indexes = indexes[::6] # [start:stop:step]
print(indexes)
# -> array([ 0, 6, 12, 18, 24, 30, 36, 42, 48, 54])
# 60 isn't included because the array has 60 elements, so the last valid index is 59
The same result can be obtained slightly differently: you can also pass a step to np.arange.
# the same result, using the step argument of np.arange
indexes = np.arange(0, len(a), 6) # (start,stop,step)
print(indexes)
# -> array([ 0, 6, 12, 18, 24, 30, 36, 42, 48, 54])
And in case you want to access the values of your original array:
print(a[indexes])
# -> array([20, 26, 32, 38, 44, 50, 56, 62, 68, 74])
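The step of 6 above comes from dividing the array length by the number of indices wanted. A minimal sketch of that arithmetic follows; the helper name evenly_spaced_indices is made up for illustration and is not a NumPy function.

```python
import numpy as np

def evenly_spaced_indices(n_items, n_samples):
    """Return n_samples indices spaced n_items // n_samples apart, starting at 0."""
    if not 0 < n_samples <= n_items:
        raise ValueError("n_samples must be between 1 and n_items")
    step = n_items // n_samples          # 60 // 10 == 6 for the array above
    return np.arange(0, step * n_samples, step)

a = np.arange(60) + 20                   # same data as above: 20, 21, ..., 79
idx = evenly_spaced_indices(len(a), 10)
print(idx)      # [ 0  6 12 18 24 30 36 42 48 54]
print(a[idx])   # [20 26 32 38 44 50 56 62 68 74]
```

When the length is not a multiple of the requested count, this sketch simply ignores the leftover elements at the end of the array.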
Basics of slicing
a[start:stop:step] is equivalent to a[slice(start, stop, step)]. If you don't want to specify one of start, stop, or step, set it to None. start and stop are positions in the array (0 is the first element, and stop itself is excluded); negative values count from the end of the array.
Some Slice Examples:
step = 20
a[slice(None, None, step)], a[slice(0, -1, step)], a[0: -1: step], a[::step]
# all -> array([20, 40, 60])
# the first 4 elements
step = 1
start = 0 # or None
end = 4 # stop is exclusive, so this selects indices 0 to 3
a[slice(start, end, step)], a[slice(start, end)] , a[start: end: step] , a[start:end]
# all -> array([20, 21, 22, 23])
# the last 4 elements
step = 1
start = -4
end = None # -1 would cut off the last entry
a[slice(start, end, step)], a[slice(start, end)] , a[start: end: step] , a[start:end]
# all -> array([76, 77, 78, 79])

I think you meant to say:
The required index values are [0,6,12,18,24,30,36,42,48,54,60]
Corresponding output values are [20, 26, 32, 38, 44, 50, 56, 62, 68, 74]
The code below should give you the values for every 6th index.
a=np.array([20,21,22,23,24,25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,])
Out=[]
for i in range(10):
    Out.append(a[6*i])
print(Out)
Output is :
[20, 26, 32, 38, 44, 50, 56, 62, 68, 74]
If the index values are required, do the following:
Out1=[]
for i in range(0,11): # i goes from 0 to 10, which gives 11 index values (index 60 is past the end of a)
    print(6*i)
    Out1.append(6*i)
print("The required index values is : {}".format(Out1))
This gives an output :
0
6
12
18
24
30
36
42
48
54
60
The required index values is : [0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60]

Related

finding consecutive numbers in a matrix with python numpy

I am practicing some exercises and have been going in circles trying to figure it out. The first part is to create a 5x5 matrix with NumPy giving it random values, and I’ve already solved it.
Now I need to see if the matrix, either horizontally or vertically, has consecutive numbers
(for example: The matrix of the attached image does not have consecutive numbers).
Here there are 4 consecutive numbers in the first column:
[[50 92 78 84 36]
 [51 33 94 73 32]
 [52 94 35 47 9]
 [53 5 60 55 67]
 [83 51 56 99 18]]
Here there are 4 consecutive numbers in the last row:
[[50 92 78 84 36]
 [41 33 94 73 32]
 [72 94 35 47 9]
 [55 5 60 55 67]
 [84 85 86 87 18]]
The last step is to continue randomizing the array until you find those consecutive numbers.

Here is a naive approach to check whether each row/column of a given matrix has a given amount (4 in this case) of consecutive numbers:
import numpy as np
def has_consecutive_number(M, num_consecutive=4):
    for v in np.vstack((M, M.T)): # You need to check both columns and rows
        count = 1 # Record how many consecutive numbers found
        current_num = 0 # Recording 1 or -1
        for i in range(1, len(v)):
            diff = v[i] - v[i-1]
            if diff == 1: # if diff is 1
                if current_num != 1: # if previous diff is not 1, reset counter
                    count = 1
                current_num = 1
                count += 1
            elif diff == -1:
                if current_num != -1: count = 1
                current_num = -1
                count += 1
            else: # reset counter
                current_num = 0
                count = 1
            if count == num_consecutive:
                return True
    return False
M1 = np.array([ [10, 43, 74, 32, 69],
[20, 19, 69, 83, 8],
[89, 31, 62, 61, 17],
[35, 3, 77, 22, 29],
[52, 59, 86, 55, 73] ])
print(has_consecutive_number(M1, 4))
M2 = np.array([ [10, 43, 74, 32, 69],
[20, 19, 69, 83, 8],
[89, 31, 62, 61, 17],
[35, 3, 77, 22, 29],
[52, 53, 54, 55, 73] ])
print(has_consecutive_number(M2, 4))
The output is False for the first matrix and True for the second matrix:
False
True
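The question also asks to keep randomizing the matrix until a run shows up, which the function above does not cover. Below is a rough sketch of that loop together with a shorter check based on np.diff. The names has_consecutive_run and randomize_until_run, the value range 0 to 99, the seed, and the max_tries cap are all assumptions made for illustration.

```python
import numpy as np

def has_consecutive_run(M, run_length=4):
    """True if any row or column holds run_length numbers that step by +1 or by -1."""
    M = np.asarray(M)
    window = run_length - 1                    # a run of k numbers has k-1 equal steps
    for lines in (M, M.T):                     # rows first, then columns
        steps = np.diff(lines, axis=1)         # difference between neighbours in each line
        for direction in (1, -1):              # ascending and descending runs
            for start in range(steps.shape[1] - window + 1):
                block = steps[:, start:start + window]
                if np.any(np.all(block == direction, axis=1)):
                    return True
    return False

def randomize_until_run(rng, size=5, high=100, run_length=4, max_tries=1_000_000):
    """Draw random size x size matrices until one contains a run; return (matrix, tries)."""
    for tries in range(1, max_tries + 1):
        M = rng.integers(0, high, size=(size, size))
        if has_consecutive_run(M, run_length):
            return M, tries
    raise RuntimeError("no matrix with a run found within max_tries")

rng = np.random.default_rng(0)                 # seed chosen only for reproducibility
M, tries = randomize_until_run(rng, run_length=3)
print(tries)
print(M)
```

The demo call uses run_length=3 so it finishes quickly; a run of 4 in a random 5x5 matrix is rare, so expect many more draws with the default.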

Access x_train columns after train test split function

After splitting my data, I'm trying a feature ranking, but when I try to access X_train.columns I get this error: 'numpy.ndarray' object has no attribute 'columns'.
from sklearn.model_selection import train_test_split
y=df['DIED'].values
x=df.drop('DIED',axis=1).values
X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=42)
print('X_train',X_train.shape)
print('X_test',X_test.shape)
print('y_train',y_train.shape)
print('y_test',y_test.shape)
bestfeatures = SelectKBest(score_func=chi2, k="all")
fit = bestfeatures.fit(X_train,y_train)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X_train.columns)
I know that train_test_split returns a NumPy array here, but how should I deal with it?

Maybe this code makes it clear:
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
# here I imitate your example data
df = pd.DataFrame(data = np.random.randint(100, size = (50,5)), columns = ['DIED']+[f'col_{i}' for i in range(4)])
df.head()
Out[1]:
DIED col_0 col_1 col_2 col_3
0 36 0 23 43 55
1 81 59 83 37 31
2 32 86 94 50 87
3 10 69 4 69 27
4 1 16 76 98 74
# df here is a DataFrame, with all its attributes, like df.columns
y=df['DIED'].values
x=df.drop('DIED',axis=1).values # <- .values returns a 2-D NumPy array (not a DataFrame), so it has no column names
x
Out[2]:
array([[ 0, 23, 43, 55],
[59, 83, 37, 31],
[86, 94, 50, 87],
[69, 4, 69, 27],
[16, 76, 98, 74],
[17, 50, 52, 31],
[95, 4, 56, 68],
[82, 35, 67, 76],
.....
# now you can access columns by index, like this:
x[:,2] # <- gives you access to the 3rd column
Out[3]:
array([43, 37, 50, 69, 98, 52, 56, 67, 81, 64, 48, 68, 14, 41, 78, 65, 11,
86, 80, 1, 11, 32, 93, 82, 93, 81, 63, 64, 47, 81, 79, 85, 60, 45,
80, 21, 27, 37, 87, 31, 97, 16, 59, 91, 20, 66, 66, 3, 9, 88])
# or you can convert the 2-D array back to a DataFrame
pd.DataFrame(data = x, columns = df.columns[1:])
Out[4]:
col_0 col_1 col_2 col_3
0 0 23 43 55
1 59 83 37 31
2 86 94 50 87
3 69 4 69 27
....
The same approach works for all your variables: X_train, X_test, y_train, y_test.
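For the feature ranking itself, one option is to save the column labels before calling .values and pair them with the scores afterwards. This is only a sketch: the toy data and column names are made up, and the per-column variance is a placeholder for fit.scores_ from the question.

```python
import numpy as np
import pandas as pd

# Toy data shaped like the example above; the column names are made up.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(50, 5)),
                  columns=['DIED'] + [f'col_{i}' for i in range(4)])

features = df.drop('DIED', axis=1)
feature_names = features.columns          # remember the labels before dropping to NumPy
x = features.values                       # plain ndarray, no .columns attribute

# Placeholder scores, one number per column. In the question this would be fit.scores_.
scores = x.var(axis=0)

ranking = pd.DataFrame({'feature': feature_names, 'score': scores})
print(ranking.sort_values('score', ascending=False))
```

As far as I know, train_test_split also accepts DataFrames directly and then returns DataFrames, so leaving out .values on x should keep X_train.columns available.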

Get second minimum values per column in 2D array

How can I get the second minimum value from each column? I have this array:
A = [[72 76 44 62 81 31]
[54 36 82 71 40 45]
[63 59 84 36 34 51]
[58 53 59 22 77 64]
[35 77 60 76 57 44]]
I wish to have output like:
A = [54 53 59 36 40 44]

Try this, in just one line:
[sorted(i)[1] for i in zip(*A)]
in action:
In [12]: A = [[72, 76, 44, 62, 81, 31],
...: [54 ,36 ,82 ,71 ,40, 45],
...: [63 ,59, 84, 36, 34 ,51],
...: [58, 53, 59, 22, 77 ,64],
...: [35 ,77, 60, 76, 57, 44]]
In [18]: [sorted(i)[1] for i in zip(*A)]
Out[18]: [54, 53, 59, 36, 40, 44]
zip(*A) transposes your list of lists, so the columns become rows.
And if you have duplicate values, for example:
In [19]: A = [[72, 76, 44, 62, 81, 31],
...: [54 ,36 ,82 ,71 ,40, 45],
...: [63 ,59, 84, 36, 34 ,51],
...: [35, 53, 59, 22, 77 ,64], # 35
...: [35 ,77, 50, 76, 57, 44],] # 35
If you need to skip both 35s, you can use set():
In [29]: [sorted(list(set(i)))[1] for i in zip(*A)]
Out[29]: [54, 53, 50, 36, 40, 44]

Operations on numpy arrays should be done with numpy functions, so look at this one:
np.sort(A, axis=0)[1, :]
Out[61]: array([54, 53, 59, 36, 40, 44])
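A full sort is not strictly needed to get the second-smallest value. The sketch below uses np.partition instead, and np.unique for the case where duplicates of the minimum should be skipped; the helper name second_distinct_min is made up for illustration.

```python
import numpy as np

A = np.array([[72, 76, 44, 62, 81, 31],
              [54, 36, 82, 71, 40, 45],
              [63, 59, 84, 36, 34, 51],
              [58, 53, 59, 22, 77, 64],
              [35, 77, 60, 76, 57, 44]])

# After partitioning around position 1, row 1 holds each column's second-smallest value.
second_min = np.partition(A, 1, axis=0)[1]
print(second_min)   # [54 53 59 36 40 44]

def second_distinct_min(column):
    """Second-smallest distinct value; np.unique returns the sorted unique values."""
    return np.unique(column)[1]

B = A.copy()
B[3, 0] = 35        # make the minimum of column 0 appear twice
print(np.partition(B, 1, axis=0)[1])                    # column 0 now gives 35
print(np.apply_along_axis(second_distinct_min, 0, B))   # column 0 gives 54 again
```

Note that second_distinct_min raises an IndexError for a column whose values are all equal, since there is no second distinct value to return.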

You can use heapq.nsmallest:
from heapq import nsmallest
[nsmallest(2, e)[-1] for e in zip(*A)]
output:
[54, 53, 50, 36, 40, 44]
I added a simple benchmark to compare the performance of the different solutions already posted:
import heapq
from heapq import nsmallest
from random import randint
import numpy as np
from simple_benchmark import BenchmarkBuilder
b = BenchmarkBuilder()
@b.add_function()
def MehrdadPedramfar(A):
    return [sorted(i)[1] for i in zip(*A)]
@b.add_function()
def NicolasGervais(A):
    return np.sort(A, axis=0)[1, :]
@b.add_function()
def imcrazeegamerr(A):
    rotated = zip(*A[::-1])
    result = []
    for arr in rotated:
        # sort each 1d array from min to max
        arr = sorted(list(arr))
        # add the second minimum value to result array
        result.append(arr[1])
    return result
@b.add_function()
def Daweo(A):
    return np.apply_along_axis(lambda x:heapq.nsmallest(2,x)[-1], 0, A)
@b.add_function()
def kederrac(A):
    return [nsmallest(2, e)[-1] for e in zip(*A)]
@b.add_arguments('Number of row/cols (A is square matrix)')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp
        yield size, [[randint(0, 1000) for _ in range(size)] for _ in range(size)]
r = b.run()
r.plot()
Using zip with the sorted function is the fastest solution for small 2D lists, while using zip with heapq.nsmallest turns out to be the best on big 2D lists.

I hope I understood your question correctly, but either way here's my solution. I'm sure there is a more elegant way of doing this, but it works:
A = [[72,76,44,62,81,31]
,[54,36,82,71,40,45]
,[63,59,84,36,34,51]
,[58,53,59,22,77,64]
,[35,77,50,76,57,44]]
#rotate the array 90deg
rotated = zip(*A[::-1])
result = []
for arr in rotated:
    # sort each 1d array from min to max
    arr = sorted(list(arr))
    # add the second minimum value to result array
    result.append(arr[1])
print(result)

Assuming that A is a numpy.array (if so, please consider adding the numpy tag to your question), you might use apply_along_axis in the following way:
import heapq
import numpy as np
A = np.array([[72, 76, 44, 62, 81, 31],
[54, 36, 82, 71, 40, 45],
[63, 59, 84, 36, 34, 51],
[58, 53, 59, 22, 77, 64],
[35, 77, 60, 76, 57, 44]])
second_mins = np.apply_along_axis(lambda x:heapq.nsmallest(2,x)[-1], 0, A)
print(second_mins) # [54 53 59 36 40 44]
Note that I used heapq.nsmallest because it does only as much sorting as is required to get the 2 smallest elements, unlike sorted, which does a complete sort.

>>> A = np.arange(30).reshape(5,6).tolist()
>>> A
[[0, 1, 2, 3, 4, 5],
[6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]]
Updated:
Use set to remove duplicates, and transpose the list using zip(*A):
>>> [sorted(set(items))[1] for items in zip(*A)]
[6, 7, 8, 9, 10, 11]
Old: second minimum item in each row
>>> [sorted(set(items))[1] for items in A]
[1, 7, 13, 19, 25]

A NumPy equivalent of pandas read_clipboard?

For example, if a question/answer you encounter posts an array like this:
[[ 0 1 2 3 4 5 6 7]
[ 8 9 10 11 12 13 14 15]
[16 17 18 19 20 21 22 23]
[24 25 26 27 28 29 30 31]
[32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47]
[48 49 50 51 52 53 54 55]
[56 57 58 59 60 61 62 63]]
How would you load it into a variable in a REPL session without having to add commas everywhere?

For a one-time occasion, I might do this:
Copy the text containing the array to the clipboard.
In an ipython shell, enter s = """, but do not hit return.
Paste the text from the clipboard.
Type the closing triple quote.
That gives me:
In [16]: s = """[[ 0 1 2 3 4 5 6 7]
...: [ 8 9 10 11 12 13 14 15]
...: [16 17 18 19 20 21 22 23]
...: [24 25 26 27 28 29 30 31]
...: [32 33 34 35 36 37 38 39]
...: [40 41 42 43 44 45 46 47]
...: [48 49 50 51 52 53 54 55]
...: [56 57 58 59 60 61 62 63]]"""
Then use np.loadtxt() as follows:
In [17]: a = np.loadtxt([line.lstrip(' [').rstrip(']') for line in s.splitlines()], dtype=int)
In [18]: a
Out[18]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63]])

If you have Pandas, pyperclip or something else to read from the clipboard you could use something like this:
from pandas.io.clipboard import clipboard_get
# import pyperclip
import numpy as np
import re
import ast
def numpy_from_clipboard():
    inp = clipboard_get()
    # inp = pyperclip.paste()
    inp = inp.strip()
    # if it starts with "array(" we just need to remove the
    # leading "array(" and remove the optional ", dtype=xxx)"
    if inp.startswith('array('):
        inp = re.sub(r'^array\(', '', inp)
        dtype = re.search(r', dtype=(\w+)\)$', inp)
        if dtype:
            return np.array(ast.literal_eval(inp[:dtype.start()]), dtype=dtype.group(1))
        else:
            return np.array(ast.literal_eval(inp[:-1]))
    else:
        # In case it's the string representation it's a bit harder.
        # We need to remove all spaces between closing and opening brackets
        inp = re.sub(r'\]\s+\[', '],[', inp)
        # We need to remove all whitespaces following an opening bracket
        inp = re.sub(r'\[\s+', '[', inp)
        # and all leading whitespaces before closing brackets
        inp = re.sub(r'\s+\]', ']', inp)
        # replace all remaining whitespaces with ","
        inp = re.sub(r'\s+', ',', inp)
        return np.array(ast.literal_eval(inp))
And then read what you saved in the clipboard:
>>> numpy_from_clipboard()
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63]])
This should be able to parse (most) arrays (str as well as repr of arrays) from your clipboard. It should even work for multi-line arrays (where np.loadtxt fails):
[[ 0.34866207 0.38494993 0.7053722 0.64586156 0.27607369 0.34850162
0.20530567 0.46583039 0.52982216 0.92062115]
[ 0.06973858 0.13249867 0.52419149 0.94707951 0.868956 0.72904737
0.51666421 0.95239542 0.98487436 0.40597835]
[ 0.66246734 0.85333546 0.072423 0.76936201 0.40067016 0.83163118
0.45404714 0.0151064 0.14140024 0.12029861]
[ 0.2189936 0.36662076 0.90078913 0.39249484 0.82844509 0.63609079
0.18102383 0.05339892 0.3243505 0.64685352]
[ 0.803504 0.57531309 0.0372428 0.8308381 0.89134864 0.39525473
0.84138386 0.32848746 0.76247531 0.99299639]]
>>> numpy_from_clipboard()
array([[ 0.34866207, 0.38494993, 0.7053722 , 0.64586156, 0.27607369,
0.34850162, 0.20530567, 0.46583039, 0.52982216, 0.92062115],
[ 0.06973858, 0.13249867, 0.52419149, 0.94707951, 0.868956 ,
0.72904737, 0.51666421, 0.95239542, 0.98487436, 0.40597835],
[ 0.66246734, 0.85333546, 0.072423 , 0.76936201, 0.40067016,
0.83163118, 0.45404714, 0.0151064 , 0.14140024, 0.12029861],
[ 0.2189936 , 0.36662076, 0.90078913, 0.39249484, 0.82844509,
0.63609079, 0.18102383, 0.05339892, 0.3243505 , 0.64685352],
[ 0.803504 , 0.57531309, 0.0372428 , 0.8308381 , 0.89134864,
0.39525473, 0.84138386, 0.32848746, 0.76247531, 0.99299639]])
However, I'm not too good with regexes, so this probably isn't foolproof, and using ast.literal_eval feels a bit awkward (but it avoids doing the parsing yourself).
Feel free to suggest improvements.
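If only the printed form of a 2-D numeric array needs to be handled, a shorter variant without ast.literal_eval might look like the sketch below. The function name parse_printed_2d is made up, and it takes the text as an argument instead of reading the clipboard so that it can be tried anywhere.

```python
import re
import numpy as np

def parse_printed_2d(text, dtype=float):
    """Parse the str() form of a 2-D numeric array into a NumPy array."""
    # Each innermost [...] is one row, even if the row is wrapped over several lines.
    rows = re.findall(r'\[([^\[\]]*)\]', text)
    values = [[float(token) for token in row.split()] for row in rows]
    return np.array(values).astype(dtype)

s = """[[ 0.34866207  0.38494993  0.7053722
   0.64586156]
 [ 0.06973858  0.13249867  0.52419149
   0.94707951]]"""
a = parse_printed_2d(s)
print(a.shape)   # (2, 4)
```

This sketch does not handle the repr form starting with array(, higher-dimensional arrays, or output that NumPy has abbreviated with "...".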

using the in operator and changing lists

I'm trying to produce a table with one row of numbers increasing by one and another with their respective composites, with the limit being 100, like:
Numbers----------composites
x---------------numbers 1-100 divisible by x
x+1---------------numbers 1-100 divisible by x+1 but aren't in x
x+2---------------numbers 1-100 divisible by x+2 but aren't in x or x+1
x+3---------------numbers 1-100 divisible by x+3 but aren't in x, x+1, or x+2
etc.
numbers is a permanent list that starts off as 2-100, which I whittle down as I pull out every composite number within the function; at the end it should only contain prime numbers.
composites is a list I fill with the composites of a certain number (2, 3, 4, etc.) that I then wish to check against the current numbers list to make sure there are no duplicates. I print what's left, empty the list, increase the current variable by 1, and repeat.
This is the code I've come up with. I understand it's very sloppy, but I know next to nothing about the subject, my professor likes us to learn through trial by fire, and this is what I've managed to scrounge up from the textbook. My main concern is the adding and removing of elements from certain lists.
def main():
    x=2
    n=2
    print("numbers"" ""composite")
    print("------------------------")
    cross_out(n,x)
def cross_out(n,x):
    composites=[]
    prime=[]
    numbers=[]
    while x<101:
        numbers.append(x)
        x=x+1
    x=2
    for x in range(2,102):
        if x==101:
            search=composites[0]
            index=0
            while index<=len(composites):
                if search in numbers:
                    search=search+1
                    index=index+1
                else:
                    if search in composites:
                        composites.remove(search)
                    else:
                        pass
            print(n,"--->",composites)
            x=2
            composites=[]
            n=n+1
            index=0
        elif x%n==0:
            composites.append(x)
            if x in numbers:
                numbers.remove(x)
            else:
                pass
            x=x+1
        else:
            x=x+1
main()
cross_out()

I think I understand your description correctly, and this is what I came up with.
I used a set to keep track of the numbers you have already added to composites. This makes the problem pretty simple. Also, a piece of advice when writing functions: do not overwrite your parameters. For example, in cross_out, you are doing x = <value> and n = <value> several times.
def cross_out(n,x):
    composites=[]
    seen = set()
    numbers = range(x, 101)
    primes = []
    for num in numbers:
        for val in numbers:
            if val % num == 0 and val not in seen:
                composites.append(val)
                seen.add(val)
        if composites:
            primes.append(composites[0])
        print(num,'--->',composites)
        composites = []
    print(primes)
def main():
    print("numbers composite")
    print("------------------------")
    cross_out(2, 2)
main()
Sample Output
numbers composite
------------------------
2 ---> [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100]
3 ---> [3, 9, 15, 21, 27, 33, 39, 45, 51, 57, 63, 69, 75, 81, 87, 93, 99]
4 ---> []
5 ---> [5, 25, 35, 55, 65, 85, 95]
6 ---> []
7 ---> [7, 49, 77, 91]
8 ---> []
9 ---> []
10 ---> []
11 ---> [11]
12 ---> []
13 ---> [13]
14 ---> []
15 ---> []
16 ---> []
17 ---> [17]
18 ---> []
19 ---> [19]
20 ---> []
21 ---> []
22 ---> []
23 ---> [23]
24 ---> []
25 ---> []
Primes
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
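The inner loop above tests every value for divisibility. A possible variation, sketched below, walks only the multiples of each number with a stepped range and returns the table instead of printing it. The name cross_out_table and the returned dictionary are assumptions for illustration, not part of the answer above.

```python
def cross_out_table(limit=100):
    """Map each number 2..limit to its multiples that no smaller number has claimed."""
    seen = set()
    table = {}
    for num in range(2, limit + 1):
        # range(num, limit + 1, num) visits num, 2*num, 3*num, ... up to limit
        new = [m for m in range(num, limit + 1, num) if m not in seen]
        seen.update(new)
        table[num] = new
    # A number with a non-empty row was never crossed out earlier, so it is prime.
    primes = [num for num, new in table.items() if new]
    return table, primes

table, primes = cross_out_table(100)
print(table[7])     # [7, 49, 77, 91]
print(primes[:5])   # [2, 3, 5, 7, 11]
```

Returning the data rather than printing inside the function keeps the table reusable; printing the rows is then a short loop over table.items().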
