A NumPy equivalent of pandas read_clipboard?

A NumPy equivalent of pandas read_clipboard? - python

For example, if a question/answer you encounter posts an array like this:
[[ 0 1 2 3 4 5 6 7]
[ 8 9 10 11 12 13 14 15]
[16 17 18 19 20 21 22 23]
[24 25 26 27 28 29 30 31]
[32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47]
[48 49 50 51 52 53 54 55]
[56 57 58 59 60 61 62 63]]
How would you load it into a variable in a REPL session without having to add commas everywhere?

For a one-time occasion, I might do this:
Copy the text containing the array to the clipboard.
In an ipython shell, enter s = """, but do not hit return.
Paste the text from the clipboard.
Type the closing triple quote.
That gives me:
In [16]: s = """[[ 0 1 2 3 4 5 6 7]
...: [ 8 9 10 11 12 13 14 15]
...: [16 17 18 19 20 21 22 23]
...: [24 25 26 27 28 29 30 31]
...: [32 33 34 35 36 37 38 39]
...: [40 41 42 43 44 45 46 47]
...: [48 49 50 51 52 53 54 55]
...: [56 57 58 59 60 61 62 63]]"""
Then use np.loadtxt() as follows:
In [17]: a = np.loadtxt([line.lstrip(' [').rstrip(']') for line in s.splitlines()], dtype=int)
In [18]: a
Out[18]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63]])

If you have Pandas, pyperclip or something else to read from the clipboard you could use something like this:
from pandas.io.clipboard import clipboard_get
# import pyperclip
import numpy as np
import re
import ast
def numpy_from_clipboard():
inp = clipboard_get()
# inp = pyperclip.paste()
inp = inp.strip()
# if it starts with "array(" we just need to remove the
# leading "array(" and remove the optional ", dtype=xxx)"
if inp.startswith('array('):
inp = re.sub(r'^array\(', '', inp)
dtype = re.search(r', dtype=(\w+)\)$', inp)
if dtype:
return np.array(ast.literal_eval(inp[:dtype.start()]), dtype=dtype.group(1))
else:
return np.array(ast.literal_eval(inp[:-1]))
else:
# In case it's the string representation it's a bit harder.
# We need to remove all spaces between closing and opening brackets
inp = re.sub(r'\]\s+\[', '],[', inp)
# We need to remove all whitespaces following an opening bracket
inp = re.sub(r'\[\s+', '[', inp)
# and all leading whitespaces before closing brackets
inp = re.sub(r'\s+\]', ']', inp)
# replace all remaining whitespaces with ","
inp = re.sub(r'\s+', ',', inp)
return np.array(ast.literal_eval(inp))
And then read what you saved in the clipboard:
>>> numpy_from_clipboard()
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63]])
This should be able to parse (most) arrays (str as well as repr of arrays) from your clipboard. It should even work for multi-line arrays (where np.loadtxt fails):
[[ 0.34866207 0.38494993 0.7053722 0.64586156 0.27607369 0.34850162
0.20530567 0.46583039 0.52982216 0.92062115]
[ 0.06973858 0.13249867 0.52419149 0.94707951 0.868956 0.72904737
0.51666421 0.95239542 0.98487436 0.40597835]
[ 0.66246734 0.85333546 0.072423 0.76936201 0.40067016 0.83163118
0.45404714 0.0151064 0.14140024 0.12029861]
[ 0.2189936 0.36662076 0.90078913 0.39249484 0.82844509 0.63609079
0.18102383 0.05339892 0.3243505 0.64685352]
[ 0.803504 0.57531309 0.0372428 0.8308381 0.89134864 0.39525473
0.84138386 0.32848746 0.76247531 0.99299639]]
>>> numpy_from_clipboard()
array([[ 0.34866207, 0.38494993, 0.7053722 , 0.64586156, 0.27607369,
0.34850162, 0.20530567, 0.46583039, 0.52982216, 0.92062115],
[ 0.06973858, 0.13249867, 0.52419149, 0.94707951, 0.868956 ,
0.72904737, 0.51666421, 0.95239542, 0.98487436, 0.40597835],
[ 0.66246734, 0.85333546, 0.072423 , 0.76936201, 0.40067016,
0.83163118, 0.45404714, 0.0151064 , 0.14140024, 0.12029861],
[ 0.2189936 , 0.36662076, 0.90078913, 0.39249484, 0.82844509,
0.63609079, 0.18102383, 0.05339892, 0.3243505 , 0.64685352],
[ 0.803504 , 0.57531309, 0.0372428 , 0.8308381 , 0.89134864,
0.39525473, 0.84138386, 0.32848746, 0.76247531, 0.99299639]])
However I'm not too good with regexes so this probably isn't foolproof and using ast.literal_eval feels a bit awkard (but it avoids doing the parsing yourself).
Feel free to suggest improvements.

Related

Sort for Matrix

I have a problem with matrix sort.
I need to create a matrix (MxM) from input. And create nested lists using randrange.
matrix_size = int(input("Enter size of the matrix: "))
matrix = [[randrange(1, 51) for column in range(matrix_size)] for row in range(matrix_size)]
Next step i should find sum of each column of matrix. So i do this thing:
for i in range(matrix_size):
sum_column = 0
for j in range(matrix_size):
sum_column += matrix[j][i]
print(f'{matrix[i][j]:>5}', end='')
print(f'{sum_column:>5}')
So problem is... that i should add sum row in the end of a matrix. But what happens to me:
Enter the size of the matrix: 5
15 23 14 22 20 73
7 26 26 27 27 160
17 36 9 13 42 104
1 32 41 2 29 113
33 43 14 49 12 130
Yeah. It counting right but how i can add it to the end of matrix. And sort ascending to the sums of columns. Hope some of you will understand what i need. Thanks

Do you mean something like this?
import numpy as np
matrix = np.array(matrix)
rowsum = matrix.sum(axis=1) # sum of rows
idx = np.argsort(rowsum) # permutation that makes rowsum sorted
result = np.hstack([matrix, rowsum[:, None]]) # join matrix and roswum
result = result[idx] # sort rows in ascending order
for matrix
array([[31, 13, 29, 5, 1],
[21, 9, 34, 31, 22],
[13, 38, 29, 20, 50],
[21, 12, 26, 5, 15],
[19, 24, 38, 44, 41]])
would the output be:
array([[ 31, 13, 29, 5, 1, 79],
[ 21, 12, 26, 5, 15, 79],
[ 21, 9, 34, 31, 22, 117],
[ 13, 38, 29, 20, 50, 150],
[ 19, 24, 38, 44, 41, 166]])

Accessing required number of indices in an array

I have an array like:
a=np.array([20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
68 69 70 71 72 73 74 75 76 77 78 79])
requirement:
I would to like to access 10 indices in an array
the above array length is 60,60/10=6. So, i need every 6th indices in an array a.
required output:[0,6,12,18,24,30,36,42,48,64,60]

Numpy is powerful i would recommend to read the Documentation about indexing in numpy
everySixthEntry=a[np.arange(0,a.shape[0],6)]

You can generate the indexes for any array a with np.arange(len(a)). To access every 6th index use the a slice a[start:stop:step]. Jack posted one way, here a bit more detailed.
import numpy as np
# define your data. a = [20, ..., 79]
a = np.arange(60) + 20
# generate indexes for the array, index start at 0 till len(a)-1
indexes = np.arange(len(a))
# reduce the indexes to every 6th index
indexes = indexes[::6] # [start:stop:step]
print(indexes)
# -> array([ 0, 6, 12, 18, 24, 30, 36, 42, 48, 54])
# 60 isn't included as the array is only 59 long
The same result a bit different. You can also use np.arange steps.
# the same result a bit different
indexes = np.arange(0, len(a), 6) # (start,stop,step)
print(indexes)
# -> array([ 0, 6, 12, 18, 24, 30, 36, 42, 48, 54])
and in case you want to access the values of your original array
print(a[indexes])
# -> array([20, 26, 32, 38, 44, 50, 56, 62, 68, 74])
Basics of slicing
a[start:stop:step] is equivalent to a[slice(start, stop, step)]. If you don't want to specify any of start, stop, step set it to None. start and stop takes values from 0 to len(a)-1 and negative represents the position from the end of the array.
Some Slice Examples:
step = 20
a[slice(None, None, step)], a[slice(0, -1, step)], a[0: -1: step], a[::step]
# all -> array([20, 40, 60])
# the first 4 elements
step = 1
start = 0 # or None
end = 5
a[slice(start, end, step)], a[slice(start, end)] , a[start: end: step] , a[start:end]
# all -> array([20, 21, 22, 23])
# the last 4 elements
step = 1
start = -4
end = None # -1 will cute the last entry
a[slice(start, end, step)], a[slice(start, end)] , a[start: end: step] , a[start:end]
# all -> array([76, 77, 78, 79]

I think you meant to say:
The required index values are [0,6,12,18,24,30,36,42,48,64,60]
Corresponding output values are [20, 26, 32, 38, 44, 50, 56, 62, 68, 74]
The code below should give you the values for every 6th index.
a=np.array([20,21,22,23,24,25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,])
Out=[]
for i in range(10):
Out.append(a[6*i])
print(Out)
Output is :
[20, 26, 32, 38, 44, 50, 56, 62, 68, 74]
If the Index values are required: Do the following
Out1=[]
for i in range(0,11): #for only 10 indices (going from 0 to 10)
print(6*i)
Out1.append(6*i)
print("The required index values is : {}".format(Out1))
This gives an output :
0
6
12
18
24
30
36
42
48
54
60
The required index values is : [0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60]

Access x_train columns after train test split function

After the splitting of my data, im trying a feature ranking but when im trying to access the X_train.columns im getting this 'numpy.ndarray' object has no attribute 'columns'.
from sklearn.model_selection import train_test_split
y=df['DIED'].values
x=df.drop('DIED',axis=1).values
X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=42)
print('X_train',X_train.shape)
print('X_test',X_test.shape)
print('y_train',y_train.shape)
print('y_test',y_test.shape)
bestfeatures = SelectKBest(score_func=chi2, k="all")
fit = bestfeatures.fit(X_train,y_train)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X_train.columns)
i know that train test split returns a numpy array, but how i should deal with it?

May be this code makes it clear:
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
# here i imitate your example of data
df = pd.DataFrame(data = np.random.randint(100, size = (50,5)), columns = ['DIED']+[f'col_{i}' for i in range(4)])
df.head()
Out[1]:
DIED col_0 col_1 col_2 col_3
0 36 0 23 43 55
1 81 59 83 37 31
2 32 86 94 50 87
3 10 69 4 69 27
4 1 16 76 98 74
#df here is a DataFrame, with all attributes, like df.columns
y=df['DIED'].values
x=df.drop('DIED',axis=1).values # <- here you get values, so the type of structure is array of array now (not DataFrame), so it hasn't any columns name
x
Out[2]:
array([[ 0, 23, 43, 55],
[59, 83, 37, 31],
[86, 94, 50, 87],
[69, 4, 69, 27],
[16, 76, 98, 74],
[17, 50, 52, 31],
[95, 4, 56, 68],
[82, 35, 67, 76],
.....
# now you can access to columns by index, like this:
x[:,2] # <- gives you access to the 3rd column
Out[3]:
array([43, 37, 50, 69, 98, 52, 56, 67, 81, 64, 48, 68, 14, 41, 78, 65, 11,
86, 80, 1, 11, 32, 93, 82, 93, 81, 63, 64, 47, 81, 79, 85, 60, 45,
80, 21, 27, 37, 87, 31, 97, 16, 59, 91, 20, 66, 66, 3, 9, 88])
# or you able to convert array of array back to DataFrame
pd.DataFrame(data = x, columns = df.columns[1:])
Out[4]:
col_0 col_1 col_2 col_3
0 0 23 43 55
1 59 83 37 31
2 86 94 50 87
3 69 4 69 27
....
The same approach with all your variables: X_train, X_test, Y_train, Y_test

Get second minimum values per column in 2D array

How can I get the second minimum value from each column? I have this array:
A = [[72 76 44 62 81 31]
[54 36 82 71 40 45]
[63 59 84 36 34 51]
[58 53 59 22 77 64]
[35 77 60 76 57 44]]
I wish to have output like:
A = [54 53 59 36 40 44]

Try this, in just one line:
[sorted(i)[1] for i in zip(*A)]
in action:
In [12]: A = [[72, 76, 44, 62, 81, 31],
...: [54 ,36 ,82 ,71 ,40, 45],
...: [63 ,59, 84, 36, 34 ,51],
...: [58, 53, 59, 22, 77 ,64],
...: [35 ,77, 60, 76, 57, 44]]
In [18]: [sorted(i)[1] for i in zip(*A)]
Out[18]: [54, 53, 59, 36, 40, 44]
zip(*A) will transpose your list of list so the columns become rows.
and if you have duplicate value, for example:
In [19]: A = [[72, 76, 44, 62, 81, 31],
...: [54 ,36 ,82 ,71 ,40, 45],
...: [63 ,59, 84, 36, 34 ,51],
...: [35, 53, 59, 22, 77 ,64], # 35
...: [35 ,77, 50, 76, 57, 44],] # 35
If you need to skip both 35s, you can use set():
In [29]: [sorted(list(set(i)))[1] for i in zip(*A)]
Out[29]: [54, 53, 50, 36, 40, 44]

Operations on numpy arrays should be done with numpy functions, so look at this one:
np.sort(A, axis=0)[1, :]
Out[61]: array([54, 53, 59, 36, 40, 44])

you can use heapq.nsmallest
from heapq import nsmallest
[nsmallest(2, e)[-1] for e in zip(*A)]
output:
[54, 53, 50, 36, 40, 44]
I added a simple benchmark to compare the performance of the different solutions already posted:
from simple_benchmark import BenchmarkBuilder
from heapq import nsmallest
b = BenchmarkBuilder()
#b.add_function()
def MehrdadPedramfar(A):
return [sorted(i)[1] for i in zip(*A)]
#b.add_function()
def NicolasGervais(A):
return np.sort(A, axis=0)[1, :]
#b.add_function()
def imcrazeegamerr(A):
rotated = zip(*A[::-1])
result = []
for arr in rotated:
# sort each 1d array from min to max
arr = sorted(list(arr))
# add the second minimum value to result array
result.append(arr[1])
return result
#b.add_function()
def Daweo(A):
return np.apply_along_axis(lambda x:heapq.nsmallest(2,x)[-1], 0, A)
#b.add_function()
def kederrac(A):
return [nsmallest(2, e)[-1] for e in zip(*A)]
#b.add_arguments('Number of row/cols (A is square matrix)')
def argument_provider():
for exp in range(2, 18):
size = 2**exp
yield size, [[randint(0, 1000) for _ in range(size)] for _ in range(size)]
r = b.run()
r.plot()
Using zip with sorted function is the fastest solution for small 2d lists while using zip with heapq.nsmallest shows to be the best on big 2d lists

I hope I understood your question correctly but either way here's my solution, im sure there is a more elegent way of doing this but it works
A = [[72,76,44,62,81,31]
,[54,36,82,71,40,45]
,[63,59,84,36,34,51]
,[58,53,59,22,77,64]
,[35,77,50,76,57,44]]
#rotate the array 90deg
rotated = zip(*A[::-1])
result = []
for arr in rotated:
# sort each 1d array from min to max
arr = sorted(list(arr))
# add the second minimum value to result array
result.append(arr[1])
print(result)

Assuming that A is numpy.array (if this holds true please consider adding numpy tag to your question) then you might use apply_along_axis for that following way:
import heap
import numpy as np
A = np.array([[72, 76, 44, 62, 81, 31],
[54, 36, 82, 71, 40, 45],
[63, 59, 84, 36, 34, 51],
[58, 53, 59, 22, 77, 64],
[35, 77, 60, 76, 57, 44]])
second_mins = np.apply_along_axis(lambda x:heapq.nsmallest(2,x)[-1], 0, A)
print(second_mins) # [54 53 59 36 40 44]
Note that I used heapq.nsmallest as it does as much sorting as required to get 2 smallest elements, unlike sorted which does complete sort.

>>> A = np.arange(30).reshape(5,6).tolist()
>>> A
[[0, 1, 2, 3, 4, 5],
[6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]]
Updated:
Use set to prevent from duplicate and transpose list using zip(*A)
>>> [sorted(set(items))[1] for items in zip(*A)]
[6, 7, 8, 9, 10, 11]
old: second minimum item in each row
>>> [sorted(set(items))[1] for items in A]
[1, 7, 13, 19, 25]

Iterating elements in 3D array gives wrong element

I have a numpy array (of an image), the 3rd dimension is of length 3. An example of my array is below. I am attempting to iterate it so I access/print the last dimension of the array. But each of the techniques below accesses each individual value in the 3d array rather than the whole 3d array.
How can I iterate this numpy array at the 3d array level?
My array:
src = cv2.imread('./myimage.jpg')
# naive/shortened example of src contents (shape=(1, 3, 3))
[[[117 108 99]
[115 105 98]
[ 90 79 75]]]
When iterating my objective is print the following values each iteration:
[117 108 99] # iteration 1
[115 105 98] # iteration 2
[ 90 79 75] # iteration 3
# Attempt 1 to iterate
for index,value in np.ndenumerate(src):
print(src[index]) # src[index] and value = 117 when I was hoping it equals [117 108 99]
# Attempt 2 to iterate
for index,value in enumerate(src):
print(src[index]) # value = is the entire row

Solution
You could use any of the following two methods. However, Method-2 is more robust and the justification for that has been shown in the section: Detailed Solution below.
import numpy as np
src = [[117, 108, 99], [115, 105, 98], [ 90, 79, 75]]
src = np.array(src).reshape((1,3,3))
Method-1
for row in src[0,:]:
print(row)
Method-2
Robust method.
for e in np.transpose(src, [2,0,1]):
print(e)
Output:
[117 108 99]
[115 105 98]
[90 79 75]
Detailed Solution
Let us make an array of shape (3,4,5). So, if we iterate over the 3rd dimension, we should find 5 items, each with a shape of (3,4). You could achieve this by using numpy.transpose as shown below:
src = np.arange(3*4*5).reshape((3,4,5))
for e in np.transpose(src, [2,0,1]):
print(row)
Output:
[[ 0 5 10 15]
[20 25 30 35]
[40 45 50 55]]
[[ 1 6 11 16]
[21 26 31 36]
[41 46 51 56]]
[[ 2 7 12 17]
[22 27 32 37]
[42 47 52 57]]
[[ 3 8 13 18]
[23 28 33 38]
[43 48 53 58]]
[[ 4 9 14 19]
[24 29 34 39]
[44 49 54 59]]
Here the array src is:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])

General advice: When working with numpy, explicit python loops should be a last resort. Numpy is an extremely powerful tool which covers most use cases. Learn how to use it properly! If it helps, you can think of numpy as almost its own mini-language within a language.
Now, onto the code. I chose here to keep only the subarrays whose values are all below 100, but of course this is completely arbitrary and serves only to demonstrate the code.
import numpy as np
arr = np.array([[[117, 108, 99], [115, 105, 98], [90, 79, 75]], [[20, 3, 99], [101, 250, 30], [75, 89, 83]]])
cond_mask = np.all(a=arr < 100, axis=2)
arr_result = arr[cond_mask]
Let me know if you have any questions about the code :)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

A NumPy equivalent of pandas read_clipboard? - python

Related

Sort for Matrix

Accessing required number of indices in an array

Access x_train columns after train test split function

Get second minimum values per column in 2D array

Iterating elements in 3D array gives wrong element

Categories

Resources