Python threads and the global interpreter lock when accessing the same object

I am trying to understand Python threads and how they work.
From my understanding, there is a GIL (Global Interpreter Lock) that prevents two threads from accessing memory at the same time.
This is pretty reasonable, even though it slows down the program.
But the code below shows an unexpected result.
import thread, time
mylist = [[0,1]]
def listTo300Elem(id):
    while len(mylist) < 300:
        mylist.append([id, mylist[-1][1]+1])
thread.start_new_thread(listTo300Elem, (1,))
thread.start_new_thread(listTo300Elem, (2,))
thread.start_new_thread(listTo300Elem, (3,))
thread.start_new_thread(listTo300Elem, (4,))
thread.start_new_thread(listTo300Elem, (5,))
thread.start_new_thread(listTo300Elem, (6,))
thread.start_new_thread(listTo300Elem, (7,))
time.sleep(5)
print mylist
print len(mylist)
And the result is
[[0, 1], [1, 2], [1, 3], [1, 4], [1, 5], [1, 6], [1, 7], [1, 8], [1, 9], [1, 10], [1, 11], [1, 12], [1, 13], [1, 14], [1, 15], [1, 16], [1, 17], [1, 18], [1, 19], [1, 20], [1, 21], [1, 22], [1, 23], [1, 24], [1, 25], [1, 26], [1, 27], [1, 28], [1, 29], [1, 30], [1, 31], [1, 32], [1, 33], [1, 34], [1, 35], [1, 36], [1, 37], [1, 38], [1, 39], [1, 40], [1, 41], [1, 42], [1, 43], [1, 44], [1, 45], [1, 46], [1, 47], [1, 48], [1, 49], [1, 50], [1, 51], [1, 52], [1, 53], [1, 54], [1, 55], [1, 56], [1, 57], [1, 58], [1, 59], [1, 60], [1, 61], [1, 62], [1, 63], [1, 64], [1, 65], [1, 66], [1, 67], [1, 68], [1, 69], [1, 70], [1, 71], [1, 72], [2, 73], [2, 74], [2, 75], [2, 76], [2, 77], [2, 78], [2, 79], [5, 80], [5, 81], [5, 82], [5, 83], [5, 84], [5, 85], [5, 86], [5, 87], [5, 88], [5, 89], [5, 90], [5, 91], [5, 92], [5, 93], [5, 94], [3, 95], [3, 96], [3, 97], [3, 98], [3, 99], [3, 100], [3, 101], [3, 102], [3, 103], [3, 104], [3, 105], [3, 106], [3, 107], [3, 108], [3, 109], [3, 110], [3, 111], [3, 112], [3, 113], [3, 114], [3, 115], [3, 116], [3, 117], [7, 118], [7, 119], [7, 120], [7, 121], [7, 122], [7, 123], [7, 124], [2, 80], [2, 81], [2, 82], [2, 83], [2, 84], [2, 85], [2, 86], [2, 87], [2, 88], [2, 89], [2, 90], [2, 91], [2, 92], [2, 93], [2, 94], [2, 95], [2, 96], [2, 97], [2, 98], [2, 99], [2, 100], [2, 101], [2, 102], [2, 103], [2, 104], [2, 105], [2, 106], [2, 107], [2, 108], [2, 109], [2, 110], [2, 111], [2, 112], [2, 113], [2, 114], [2, 115], [2, 116], [2, 117], [2, 118], [2, 119], [2, 120], [2, 121], [2, 122], [2, 123], [2, 124], [2, 125], [2, 126], [2, 127], [2, 128], [2, 129], [2, 130], [2, 131], [2, 132], [2, 133], [2, 134], [2, 135], [2, 136], [2, 137], [2, 138], [2, 139], [2, 140], [2, 141], [7, 125], [7, 126], [7, 127], [7, 128], [7, 129], [7, 130], [7, 131], [7, 132], [7, 133], [7, 134], [7, 135], [7, 136], [7, 137], [7, 138], [7, 139], [7, 140], [7, 141], [7, 142], [7, 143], [7, 144], [7, 145], [7, 146], [7, 147], [7, 148], [7, 149], [7, 150], [7, 151], [7, 152], [7, 153], [7, 154], [7, 155], [7, 156], [7, 157], [7, 158], [7, 159], [7, 160], [7, 161], [7, 162], [7, 163], [7, 164], [7, 165], [7, 166], [7, 167], [7, 168], [7, 169], [7, 170], [7, 171], [6, 172], [6, 173], [6, 174], [6, 175], [6, 176], [6, 177], [6, 178], [3, 179], [3, 180], [3, 181], [3, 182], [3, 183], [3, 184], [3, 185], [3, 186], [3, 187], [3, 188], [3, 189], [3, 190], [3, 191], [3, 192], [3, 193], [3, 194], [3, 195], [3, 196], [3, 197], [3, 198], [3, 199], [3, 200], [3, 201], [7, 202], [7, 203], [7, 204], [7, 205], [7, 206], [7, 207], [7, 208], [7, 209], [1, 210], [1, 211], [1, 212], [1, 213], [1, 214], [1, 215], [1, 216], [1, 217], [1, 218], [1, 219], [1, 220], [1, 221], [1, 222], [1, 223], [1, 224], [1, 225], [1, 226], [1, 227], [1, 228], [1, 229], [1, 230], [1, 231], [1, 232], [1, 233], [1, 234], [1, 235], [1, 236], [1, 237], [1, 238], [3, 239], [5, 240], [2, 142], [6, 179]]
304
From my understanding, the result should be in order because of the GIL, but it is not.
Can someone explain this example and suggest further topics to study?

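What did you expect? That one thread adds one item, then another thread adds the next one, and so on? Then why use so many threads, if they only work one at a time? Here the threads are all working on one object at the same time. The GIL only guarantees that one thread at a time executes Python bytecode; the interpreter can switch threads between any two bytecodes, for example after a thread has checked len(mylist) < 300 or read mylist[-1] but before it has called append. That is why several threads compute the same next value (both [5, 80] and [2, 80] appear) and the list ends up with more than 300 items. The GIL is not a tool for parallel computing or for synchronization, so the result is this messy.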
To get a better understanding of how the GIL works, you can add logging:
import logging
logging.basicConfig(format="%(levelname)-8s [%(asctime)s] %(threadName)-12s %(message)s", level=logging.DEBUG, filename='log.log')
def listTo300Elem(id):
    list_len = len(mylist)
    while list_len < 300:
        item = mylist[-1][1] + 1
        mylist.append([id, item])
        # threadName in the format shows which thread added which item
        logging.debug('Len = {}, item {} added'.format(list_len, item))
        list_len = len(mylist)
    logging.debug('Len = {}, exit'.format(list_len))
So, threading in Python is not suitable for all cases.
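For comparison, here is a minimal sketch, assuming Python 3's threading module rather than the Python 2 thread module used in the question, in which a threading.Lock turns the length check and the append into one step. While one thread holds the lock, no other thread can append between its check and its append, so the list stops at exactly 300 items and no value is repeated. join() replaces the time.sleep(5) guess; list_to_300_elem is just a renamed version of the question's function.

```python
import threading

mylist = [[0, 1]]
lock = threading.Lock()  # guards the length check and the append together


def list_to_300_elem(thread_id):
    while True:
        with lock:
            if len(mylist) >= 300:
                return
            # no other thread can append between the check above and this line
            mylist.append([thread_id, mylist[-1][1] + 1])


threads = [threading.Thread(target=list_to_300_elem, args=(n,)) for n in range(1, 8)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every worker instead of sleeping a fixed time

print(len(mylist))  # 300
```

Which thread appends which item still depends on scheduling; the lock only guarantees that the list itself stays consistent.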

Related

Convert Bytes into BufferedReader object in Python?

The title of this question is the same as this one, and I have voted to reopen that question.
I want to convert a bytes object into a BufferedReader, and here are my attempts (after referring to many articles):
import numpy as np
from PIL import Image as PILImage
import io
img_np = np.asarray([[[16, 16, 16], [2, 2, 2], [0, 0, 0], [6, 6, 6], [8, 8, 8], [0, 0, 0], [21, 21, 21], [3, 3, 3], [0, 0, 0], [62, 62, 62]], [[0, 0, 0], [71, 71, 71], [142, 142, 142], [107, 107, 107], [99, 99, 99], [101, 101, 101], [4, 4, 4], [86, 86, 86], [99, 99, 99], [146, 146, 146]], [[162, 162, 162], [203, 203, 203], [192, 192, 192], [228, 228, 228], [191, 191, 191], [178, 178, 178], [222, 222, 222], [200, 200, 200], [198, 198, 198], [182, 182, 182]], [[117, 117, 117], [178, 178, 178], [199, 199, 199], [214, 214, 214], [222, 222, 222], [208, 208, 208], [255, 255, 255], [251, 251, 251], [219, 219, 219], [255, 255, 255]], [[0, 0, 0], [0, 0, 0], [80, 80, 80], [169, 169, 169], [193, 193, 193], [238, 238, 238], [239, 239, 239], [243, 243, 243], [254, 254, 254], [230, 230, 230]], [[20, 20, 20], [20, 20, 20], [9, 9, 9], [1, 1, 1], [130, 130, 130], [194, 194, 194], [216, 216, 216], [255, 255, 255], [252, 252, 252], [255, 255, 255]], [[9, 9, 9], [0, 0, 0], [0, 0, 0], [0, 0, 0], [3, 3, 3], [44, 44, 44], [191, 191, 191], [217, 217, 217], [248, 248, 248], [225, 225, 225]], [[0, 0, 0], [11, 11, 11], [3, 3, 3], [11, 11, 11], [6, 6, 6], [15, 15, 15], [0, 0, 0], [153, 153, 153], [255, 255, 255], [253, 253, 253]], [[0, 0, 0], [5, 5, 5], [1, 1, 1], [4, 4, 4], [8, 8, 8], [4, 4, 4], [3, 3, 3], [0, 0, 0], [159, 159, 159], [241, 241, 241]], [[10, 10, 10], [9, 9, 9], [6, 6, 6], [2, 2, 2], [0, 0, 0], [0, 0, 0], [3, 3, 3], [20, 20, 20], [0, 0, 0], [185, 185, 185]]])
im = PILImage.fromarray(img_np.astype(np.uint8))
# im.save('./temp.jpeg', "JPEG")
# f = open('./temp.jpeg', 'rb')
# print(type(f))
#
b_handle = io.BytesIO()
im.save(b_handle, format="JPEG")
# b = im.tobytes()
print(type(b_handle))
b = b_handle.read()
print(type(b))
print(b)
im.save(b_handle, format="JPEG")
b_br = io.BufferedReader(b_handle)
print(type(b_br))
b = b_br.read()
print(type(b))
print(b)
The output is as below:
<class '_io.BytesIO'>
<class 'bytes'>
b''
<class '_io.BufferedReader'>
<class 'bytes'>
b''
It seems that the file-like objects are empty. I know that I can get the contents of b_handle with b_handle.getvalue(), but the BufferedReader doesn't behave like a file object.
How can I convert a byte string into a BufferedReader object, the same as when I open a file?
You are almost there. After you save the image bytes into the buffer, the stream position is at the end of the data, so you need to seek (change the stream position) back to byte offset 0 before calling read().
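import io
b_handle = io.BytesIO()
im.save(b_handle, format="JPEG")
b_handle.seek(0)  # rewind: saving left the position at the end of the buffer
b_handle.name = "temp.jpeg"  # optional: a file name for code that expects a .name attribute
b_br = io.BufferedReader(b_handle)
b = b_br.read()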
For example:
>>> from io import BytesIO, BufferedReader
>>>
>>> b_handle = BytesIO()
>>> b_handle.write(b"Hello World")
11
>>> b_handle.seek(0) # This is important.
0
>>> br = BufferedReader(b_handle)
>>> br
<_io.BufferedReader>
>>> br.read()
b'Hello World'
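The reason the original reads returned b'' is the stream position: writing (which is what im.save does) leaves the position at the end of the buffer, read() starts from the current position, and getvalue() ignores the position entirely. The sketch below shows that, plus a hypothetical helper, bytes_to_reader, for the case where you already have a bytes object: io.BytesIO(data) starts at position 0, so no seek is needed.

```python
import io

buf = io.BytesIO()
buf.write(b"Hello World")
print(buf.tell())      # 11: the position is now at the end of the data
print(buf.read())      # b'': nothing left to read from the end
print(buf.getvalue())  # b'Hello World': getvalue() ignores the position


def bytes_to_reader(data: bytes) -> io.BufferedReader:
    """Hypothetical helper: wrap an existing bytes object in a BufferedReader."""
    # A BytesIO created from data starts at position 0, so no seek() is needed.
    return io.BufferedReader(io.BytesIO(data))


reader = bytes_to_reader(buf.getvalue())
print(reader.read())   # b'Hello World'
```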

How to do cumulative sum of array for 3 dimensions? (for loop for 3 dimensions)

I have a three-dimensional array
x[i,j,k]=[[[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]], [[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]], [[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]], [[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]]]
And I need cumulative sum like the following
y[i][j][k]=[[[1, 21], [3, 28], [6, 36], [10, 45], [15, 55]], [[1, 21], [3, 28], [6, 36], [10, 45], [15, 55]], [[1, 21], [3, 28], [6, 36], [10, 45], [15, 55]], [[1, 21], [3, 28], [6, 36], [10, 45], [15, 55]]]
I have tried
for k in range(0,1):
    for j in range(0,5):
        for i in range(0,4):
            y[i][j][k]=sum(sum(x[i][j][k] for jj in range(0,5) if jj<=j)for kk in range(0,1) if kk<=k)
but I got
y[i][j][k]=[[[1, 12], [3, 26], [6, 42], [10, 60], [15, 80]], [[1, 12], [3, 26], [6, 42], [10, 60], [15, 80]], [[1, 12], [3, 26], [6, 42], [10, 60], [15, 80]], [[1, 12], [3, 26], [6, 42], [10, 60], [15, 80]]]
How can I write the for loops to get this result?
I have
x[0][0][0]=1
x[0][1][0]=2
x[0][2][0]=3
x[0][3][0]=4
x[0][4][0]=5
x[0][0][1]=6
x[0][1][1]=7
x[0][2][1]=8
x[0][3][1]=9
x[0][4][1]=10
I need to do
y[0][0][0]=x[0][0][0]=1
y[0][1][0]=x[0][0][0]+x[0][1][0]=3
y[0][2][0]=x[0][0][0]+x[0][1][0]+x[0][2][0]=6
y[0][3][0]=x[0][0][0]+x[0][1][0]+x[0][2][0]+x[0][3][0]=10
y[0][4][0]=x[0][0][0]+x[0][1][0]+x[0][2][0]+x[0][3][0]+x[0][4][0]=15
y[0][0][1]=x[0][0][0]+x[0][1][0]+x[0][2][0]+x[0][3][0]+x[0][4][0]+x[0][0][1]=21
y[0][1][1]=x[0][0][0]+x[0][1][0]+x[0][2][0]+x[0][3][0]+x[0][4][0]+x[0][0][1]+x[0][1][1]=28
y[0][2][1]=x[0][0][0]+x[0][1][0]+x[0][2][0]+x[0][3][0]+x[0][4][0]+x[0][0][1]+x[0][1][1]+x[0][2][1]=36
y[0][3][1]=x[0][0][0]+x[0][1][0]+x[0][2][0]+x[0][3][0]+x[0][4][0]+x[0][0][1]+x[0][1][1]+x[0][2][1]+x[0][3][1]=45
y[0][4][1]=x[0][0][0]+x[0][1][0]+x[0][2][0]+x[0][3][0]+x[0][4][0]+x[0][0][1]+x[0][1][1]+x[0][2][1]+x[0][3][1]+x[0][4][1]=55
You can do the following, using a bit of transposition trickery:
from itertools import accumulate, chain
result = []
for l in x:
    a = [*accumulate(chain(*zip(*l)))]  # [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
    result.append([*map(list, zip(a[:len(l)], a[len(l):]))])
[[[1, 21], [3, 28], [6, 36], [10, 45], [15, 55]],
[[1, 21], [3, 28], [6, 36], [10, 45], [15, 55]],
[[1, 21], [3, 28], [6, 36], [10, 45], [15, 55]],
[[1, 21], [3, 28], [6, 36], [10, 45], [15, 55]]]
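Since the question asks for a for loop, here is a plain nested-loop sketch of the same calculation. For each i it keeps one running total and visits k in the outer loop and j in the inner loop, which is the order spelled out in the y[0][...] equations in the question. cumulative_k_major is a hypothetical name, and the code assumes every 2-D block has the same shape.

```python
x = [[[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]] for _ in range(4)]


def cumulative_k_major(x):
    """Running sum over each 2-D block, visiting k (outer) then j (inner)."""
    y = [[[0] * len(row) for row in block] for block in x]
    for i, block in enumerate(x):
        total = 0  # the running sum restarts for every i
        for k in range(len(block[0])):
            for j in range(len(block)):
                total += block[j][k]
                y[i][j][k] = total
    return y


print(cumulative_k_major(x))
```

The explicit loops are slower than the itertools version but make the summation order easy to see and change.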

Sort nested lists by list

How can I efficiently sort nested lists by the first element of each nested list matching the order given in order list?
List of lists: [[97, 2, 0, 2], [97, 2, 0, 2], [98, 1, 2, 3], [99, 3, 3, 6], [99, 3, 3, 6], [99, 3, 3, 6], [101, 1, 6, 7], [100, 1, 7, 8]]
Order list: [97, 98, 99, 99, 101, 100, 97, 99]
Desired list: [[97, 2, 0, 2], [98, 1, 2, 3], [99, 3, 3, 6], [99, 3, 3, 6], [101, 1, 6, 7], [100, 1, 7, 8], [97, 2, 0, 2], [99, 3, 3, 6]]
Try creating a dict keyed to the first value from your nested list. Then build the output list from that dict:
nl = [[97, 2, 0, 2], [97, 2, 0, 2], [98, 1, 2, 3], [99, 3, 3, 6], [99, 3, 3, 6],
[99, 3, 3, 6], [101, 1, 6, 7], [100, 1, 7, 8]]
# Associate first value in list to the list
d = {v[0]: v for v in nl}
order_lst = [97, 98, 99, 99, 101, 100, 97, 99]
# Grab the list associated to each value in order_list from d
out = [d[v] for v in order_lst]
print(out)
out:
[[97, 2, 0, 2], [98, 1, 2, 3], [99, 3, 3, 6], [99, 3, 3, 6], [101, 1, 6, 7],
[100, 1, 7, 8], [97, 2, 0, 2], [99, 3, 3, 6]]
*Note: this assumes that any sub-list with a given first value is acceptable for that value, because a dict cannot hold multiple variants of the same key (the last one wins).
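If different sub-lists can share the same first value and each entry in the order list should use up one of them, a dict of queues avoids that limitation. This is a minimal sketch of a different technique than the dict above, assuming the order list never asks for a key more often than it occurs in the nested list (otherwise popleft() raises IndexError); sort_by_order is a hypothetical name.

```python
from collections import defaultdict, deque


def sort_by_order(nested, order):
    """Return the sub-lists arranged by their first element, following `order`."""
    buckets = defaultdict(deque)
    for sub in nested:
        buckets[sub[0]].append(sub)  # keeps the original order within each key
    # each value in order consumes the next unused sub-list with that key
    return [buckets[v].popleft() for v in order]


nl = [[97, 2, 0, 2], [97, 2, 0, 2], [98, 1, 2, 3], [99, 3, 3, 6], [99, 3, 3, 6],
      [99, 3, 3, 6], [101, 1, 6, 7], [100, 1, 7, 8]]
order_lst = [97, 98, 99, 99, 101, 100, 97, 99]
print(sort_by_order(nl, order_lst))
```

Grouping is one pass over the nested list and each lookup is O(1), so the whole thing stays linear.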

python parallel processing - joblib for nested loops - track original indices for the input

I was using joblib to process a list (>500k rows) in parallel to find duplicates in the file, so I need to track the indices of the input list. However, the returned indices are local to each worker call, not the original indices in the list (range 0 to 500k+). How can I track the original indices of the input during parallel processing? Thank you.
import time
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
from joblib import Parallel, delayed
start_time = time.time()
texts = a_list
def match_name(texts):
    result = []
    for i, text in enumerate(texts):
        for j, name in enumerate(texts[i+1:]):
            fratio = fuzz.token_set_ratio(text, name)
            if fratio>=75:
                result.append([i,j, fratio])
    return result
results2 = Parallel(n_jobs=200, verbose=5, backend="loky")(map(delayed(match_name), texts))
print(time.time() - start_time)
The actual result is:
[[[1, 1, 100],
[1, 4, 100],
[1, 6, 100],
[2, 2, 100],
[2, 4, 100],
[3, 2, 100],
[3, 4, 100],
[5, 1, 100],
[6, 1, 100]],
[[0, 14, 100],
[1, 6, 100],
[1, 14, 100],
[2, 9, 100],
[2, 14, 100],
[8, 7, 100],
[9, 0, 100],
[9, 12, 100],
[10, 11, 100],
[12, 4, 100],
[13, 9, 100]],
[[1, 24, 100],
[3, 21, 100],
[5, 7, 100],
[6, 17, 100],
[9, 1, 100],
[9, 9, 100],
[11, 7, 100],
[12, 2, 100],
[17, 4, 100]],
[[0, 18, 100],
[0, 19, 100],
[2, 5, 100],
...]
The expected result ranges 0 to 500k+, which is the length of the list.
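The indices come from enumerate inside each call: i counts positions in whatever list that call received, and j counts positions in the slice texts[i+1:], so even without parallelism the original index of name is i + 1 + j. One way to keep global indices is to give every worker the full list plus a (start, stop) range of i values and let j run over range(i + 1, len(texts)). A minimal sketch under those assumptions: match_chunk, chunk_bounds and score are hypothetical names, score uses difflib.SequenceMatcher as a stand-in for fuzz.token_set_ratio so the example runs without fuzzywuzzy, and the chunks are processed serially here.

```python
from difflib import SequenceMatcher


def score(a, b):
    # stand-in for fuzz.token_set_ratio; returns an int in 0..100
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def match_chunk(texts, start, stop, threshold=75):
    """Compare texts[i] for i in [start, stop) with every later text.
    Both indices in the result are positions in the full `texts` list."""
    result = []
    for i in range(start, stop):
        for j in range(i + 1, len(texts)):  # j is already a global index
            ratio = score(texts[i], texts[j])
            if ratio >= threshold:
                result.append([i, j, ratio])
    return result


def chunk_bounds(n, n_chunks):
    """Split range(n) into at most n_chunks contiguous (start, stop) pairs."""
    if n == 0:
        return []
    size = -(-n // n_chunks)  # ceiling division
    return [(s, min(s + size, n)) for s in range(0, n, size)]


texts = ["apple inc", "apple inc.", "banana", "apple inc", "bananas"]
bounds = chunk_bounds(len(texts), 2)
# serial stand-in for the parallel call; results are flattened in chunk order
pairs = [p for start, stop in bounds for p in match_chunk(texts, start, stop)]
print(pairs)
```

With joblib, the serial comprehension would become something like `Parallel(n_jobs=8)(delayed(match_chunk)(texts, start, stop) for start, stop in chunk_bounds(len(texts), 8))` followed by flattening the per-chunk lists (n_jobs=8 is only an example value). Because the ranges are contiguous slices of the original list, no index remapping is needed afterwards. Equal-sized ranges are unbalanced here, since small values of i are compared with many more later texts than large ones.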

How can I get elements from 3D matrix using specified indices in numpy?

I have a 3D array, data_matrix; in the example below its shape is (5, 4, 2).
I also have an index array, indx_array, of shape (5, 4), where each row gives the positions of the elements to take from the corresponding row of data_matrix.
I don't know how to get required_output. I'm trying to rearrange the 2-element entries of each row according to indx_array.
I don't want to use for loops!
import numpy as np
data_matrix = np.array([
[[0, 1], [2, 3], [4, 5], [6, 7]],
[[8, 9], [10, 11], [12, 13], [14, 15]],
[[16, 17], [18, 19], [20, 21], [22, 23]],
[[24, 25], [26, 27], [28, 29], [30, 31]],
[[32, 33], [34, 35], [36, 37], [38, 39]]
])
indx_array = np.array([[3,2,1,0], [0,1,2,3], [1,0,3,2], [0,3,1,2], [1,2,3,0]])
# I want following result:
required_output = [
[[6, 7], [4, 5], [2, 3], [0, 1]],
[[8, 9], [10, 11], [12, 13], [14, 15]],
[[18, 19], [16, 17], [22, 23], [20, 21]],
[[24, 25], [30, 31], [26, 27], [28, 29]],
[[34, 35], [36, 37], [38, 39], [32, 33]]
]
EDIT: Updated the indx_array to better illustrate the situation.
This can be done with a little bit of handling of the index array.
import numpy as np
_x = np.repeat(np.arange(indx_array.shape[0]),indx_array.shape[1])
_y = indx_array.ravel()
output = data_matrix[_x, _y].reshape(data_matrix.shape)
which results in the expected numpy array
array([[[ 6, 7],
[ 4, 5],
[ 2, 3],
[ 0, 1]],
[[ 8, 9],
[10, 11],
[12, 13],
[14, 15]],
[[18, 19],
[16, 17],
[22, 23],
[20, 21]],
[[24, 25],
[30, 31],
[26, 27],
[28, 29]],
[[34, 35],
[36, 37],
[38, 39],
[32, 33]]])
Numpy: Indexing
Numpy: Indexing Multi-dimensional arrays
In [637]: data_matrix.shape
Out[637]: (5, 4, 2)
In [638]: indx_array.shape
Out[638]: (5, 4)
You need advanced indexing on the first 2 dimensions. The first-dimension index array needs to broadcast with the second one, which is (5,4). To do that, I make a (5,1) arange:
In [639]: data_matrix[np.arange(5)[:,None], indx_array]
Out[639]:
array([[[ 6, 7],
[ 4, 5],
[ 2, 3],
[ 0, 1]],
[[ 8, 9],
[10, 11],
[12, 13],
[14, 15]],
[[18, 19],
[16, 17],
[22, 23],
[20, 21]],
[[24, 25],
[30, 31],
[26, 27],
[28, 29]],
[[34, 35],
[36, 37],
[38, 39],
[32, 33]]])
Contrast my (5,1) index with the accepted _x (which is (5,4) ravelled):
In [640]: np.arange(5)[:,None]
Out[640]:
array([[0],
[1],
[2],
[3],
[4]])
In [641]: _x = np.repeat(np.arange(indx_array.shape[0]),indx_array.shape[1])
In [643]: _x
Out[643]: array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
With broadcasting, _x doesn't need to be repeated to (5,4); a (5,1) array is enough.
Broadcasting does a virtual repetition. This can be illustrated with the broadcast_to function:
In [648]: np.broadcast_to(np.arange(5)[:,None],(5,4))
Out[648]:
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4]])
In [649]: _.strides
Out[649]: (8, 0)
It's that 0 stride that repeats the values without making copies. as_strided is the most useful stride_tricks function, especially for things like moving windows. Usually we just let automatic broadcasting do the work without worrying too much about how it happens.
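Another option, assuming NumPy 1.15 or newer, is np.take_along_axis, which picks entries along one axis using an index array with the same number of dimensions as the data. In this sketch the (5, 4) index gets a trailing length-1 axis so that it broadcasts across the last dimension of size 2, and the result is checked against the broadcasting answer above.

```python
import numpy as np

data_matrix = np.arange(40).reshape(5, 4, 2)  # same values as in the question
indx_array = np.array([[3, 2, 1, 0], [0, 1, 2, 3], [1, 0, 3, 2], [0, 3, 1, 2], [1, 2, 3, 0]])

# indx_array[:, :, None] has shape (5, 4, 1); its last axis broadcasts against size 2
taken = np.take_along_axis(data_matrix, indx_array[:, :, None], axis=1)

# the broadcasting answer: a (5, 1) row index paired with the (5, 4) column index
fancy = data_matrix[np.arange(5)[:, None], indx_array]

print(taken.shape)                   # (5, 4, 2)
print(np.array_equal(taken, fancy))  # True
```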
