Multiprocessing Pooling Fails at Dask Functions - python

I am trying to take two arrays, "day 1", ranging from 0 to 11 (incremented by +1), and "day 2", ranging from 11 to 0 (incremented by -1), and sum them element-wise. However, I want to use multiprocessing and dask arrays to speed up the process (I will be working with bigger numbers later). I want to split day 1 and day 2 into four equal parts (day 1: [0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11] and day 2: [11, 10, 9], [8, 7, 6], [5, 4, 3], [2, 1, 0]) and have four processes, each adding one pair of corresponding parts (e.g., day 1's [0, 1, 2] with day 2's [11, 10, 9] to get [11, 11, 11]). After all four processes are done, I hope to combine the results into one big list of [11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11]. However, at the bolded step inside the function, the code never finishes: it appears to be stuck in an infinite loop or an endless calculation.
Code:
import numpy as np
import dask.array as da
from dask import delayed
import threading
import multiprocessing as mp
NUM_WORKERS = 4
# create list from 0 to 11
day1 = list(range(12))
# create list from 11 to 0
day2 = day1[::-1]
def get_sum(i, base):
    z = []
    x = day1[i * length: i * length + length]
    y = day2[i * length: i * length + length]
    z.append(x)
    z.append(y)
    converted = da.from_array(z, chunks = NUM_WORKERS)
    # the bolded step: this call never returns
    summed = da.sum(converted, axis = 0).compute()
    list_concatenate = np.concatenate((base, summed), axis=0)
    all_sum = sum(list_concatenate)
process_list = []
for i in range(NUM_WORKERS):
    process_list = mp.Process(target = get_sum, args = (i, process_list))
    process_list.start()
    process_list.join()
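One way to sidestep the hang, offered as a sketch rather than a diagnosis: keep Dask out of the worker processes and let multiprocessing.Pool hand each chunk index to a plain function, then join the returned pieces in the parent. The names LENGTH, chunk_sum, and parallel_sum are illustrative and not part of the question, and LENGTH assumes the lists divide evenly among the workers.

```python
import multiprocessing as mp

NUM_WORKERS = 4
day1 = list(range(12))   # 0 .. 11
day2 = day1[::-1]        # 11 .. 0
# Assumption: the lists divide evenly among the workers.
LENGTH = len(day1) // NUM_WORKERS


def chunk_sum(i):
    """Add the i-th slice of day1 to the i-th slice of day2, element-wise."""
    start = i * LENGTH
    xs = day1[start:start + LENGTH]
    ys = day2[start:start + LENGTH]
    return [x + y for x, y in zip(xs, ys)]


def parallel_sum():
    """Run chunk_sum in NUM_WORKERS processes and flatten the results in order."""
    with mp.Pool(NUM_WORKERS) as pool:
        parts = pool.map(chunk_sum, range(NUM_WORKERS))
    return [value for part in parts for value in part]


if __name__ == "__main__":
    print(parallel_sum())
```

Because pool.map returns the pieces in input order, the workers never need to share a list, which also removes the need to pass process_list into the target function.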

Related

summing a column in python

I have a list of lists which consists of numerical data, a kind of matrix.
I'd like to create a function to sum up any column I later choose (1+2+9+10=?, 3+4+11+12=?, etc.)
The constraints are that I want to accomplish this using for loops and old-school Python: no numpy, and preferably without the zip function.
Outside the loop, I'd like to calculate the average within each column.
What would be the simplest way to accomplish that?
Here's what I came up with thus far:
data = [[1, 3, 5, 7], [2, 4, 6, 8], [9, 11, 13, 15], [10, 12, 14, 16]]
def calc_avg(data, column):
    total = 0
    for row in data:
        total += ....
    avg = total / len(calc_avg)
Later on, I would print the average for the column I choose.
Introduce a variable nr to keep count of number of rows added as you loop.
def calc_avg(data, column):
    total = 0
    nr = 0
    for row in data:
        nr += 1
        total += row[column]
    return total / nr
You'd probably need some counter to keep track of the "denominator" for your average -
data = [[1, 3, 5, 7], [2, 4, 6, 8], [9, 11, 13, 15], [10, 12, 14, 16]]
def calc_avg(data, column):
    total = 0
    counter = 0
    for row in data:
        total += row[column]
        counter += 1
    avg = total / counter
    return avg
You can write a simple function to collect all column values and perform a math op.
E.g.:
def get_sum_avg(chosen_column, dataset):
    # filter the column values (chosen_column is 1-based). Ignore rows with no such col
    chosen_column_values = [element[chosen_column - 1] for element in dataset if len(element) >= chosen_column]
    # find sum
    col_sum = sum(chosen_column_values)
    # find avg
    average = col_sum / len(chosen_column_values) if len(chosen_column_values) > 0 else 0
    return col_sum, average
data = [[1, 3, 5, 7], [2, 4, 6, 8], [9, 11, 13, 15], [10, 12, 14, 16]]
print(get_sum_avg(1, data))
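If the average of every column is wanted at once, the same counting idea can be extended with a second plain loop. This is only a sketch: column_averages is a made-up name, and it assumes every row has the same number of entries.

```python
data = [[1, 3, 5, 7], [2, 4, 6, 8], [9, 11, 13, 15], [10, 12, 14, 16]]


def column_averages(rows):
    """Return the average of each column, using plain for loops only."""
    if not rows:
        return []
    # Assumption: all rows are as long as the first one.
    totals = [0] * len(rows[0])
    for row in rows:
        for col in range(len(totals)):
            totals[col] += row[col]
    averages = []
    for total in totals:
        averages.append(total / len(rows))
    return averages


print(column_averages(data))  # [5.5, 7.5, 9.5, 11.5]
```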

Alternative to for loops for calculating 15^6 combinations in Python

Today, I have a nested for loop in python to calculate the value of all different combinations in a horse racing card consisting of six different races; i.e. six different arrays (of different lengths, but up to 15 items per array). It can be up to 11 390 625 combinations (15^6).
For each horse in each race, I calculate a value (EV) which I want to multiply.
Array 1: 1A,1B,1C,1D,1E,1F
Array 2: 2A,2B,2C,2D,2E,2F
Array 3: 3A,3B,3C,3D,3E,3F
Array 4: 4A,4B,4C,4D,4E,4F
Array 5: 5A,5B,5C,5D,5E,5F
Array 6: 6A,6B,6C,6D,6E,6F
1A * 1B * 1C * 1D * 1E * 1F = X,XX
.... .... .... .... ... ...
6A * 6B * 6C * 6D * 6E * 6F = X,XX
Doing four levels is OK. It takes me about 3 minutes.
I have not yet been able to do six levels.
I need help in creating a better way of doing this, and have no idea how to proceed. Does numpy perhaps offer help here? Pandas? I've tried compiling the code with Cython, but it did not help much.
My function takes in a list containing the horses in numerical order and their EV. (Since horse starting numbers do not start with zero, I add 1 to the index). I iterate through all the different races, and save the output for the combination into a dataframe.
def calculateCombos(horses_in_race_1,horses_in_race_2,horses_in_race_3,horses_in_race_4,horses_in_race_5,horses_in_race_6,totalCombinations, df):
    totalCombinations = 0
    for idx1, hr1_ev in enumerate(horses_in_race_1):
        hr1_no = idx1 + 1
        for idx2, hr2_ev in enumerate(horses_in_race_2):
            hr2_no = idx2 + 1
            for idx3, hr3_ev in enumerate(horses_in_race_3):
                hr3_no = idx3 + 1
                for idx4, hr4_ev in enumerate(horses_in_race_4):
                    hr4_no = idx4 + 1
                    for idx5, hr5_ev in enumerate(horses_in_race_5):
                        hr5_no = idx5 + 1
                        for idx6, hr6_ev in enumerate(horses_in_race_6):
                            hr6_no = idx6 + 1
                            totalCombinations = totalCombinations + 1
                            combinationEV = hr1_ev * hr2_ev * hr3_ev * hr4_ev * hr5_ev * hr6_ev
                            new_row = {'Race1':str(hr1_no),'Race2':str(hr2_no),'Race3':str(hr3_no),'Race4':str(hr4_no),'Race5':str(hr5_no),'Race6':str(hr6_no), 'EV':combinationEV}
                            df = appendCombinationToDF(df, new_row)
    return df
Why don't you try this and see if you can run the function without any issues? This works on my laptop (I'm using PyCharm). If you can't run this, then I would say that you need a better PC perhaps. I did not encounter any memory error.
Assume that we have the following:
horses_in_race_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
horses_in_race_2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
horses_in_race_3 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
horses_in_race_4 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
horses_in_race_5 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
horses_in_race_6 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
I have rewritten the function as follows, changing the enumeration to start at 1. I am also not using df, since I do not know what the function appendCombinationToDF does.
def calculateCombos(horses_in_race_1,horses_in_race_2,horses_in_race_3,horses_in_race_4,horses_in_race_5,horses_in_race_6):
    for idx1, hr1_ev in enumerate(horses_in_race_1, start = 1):
        for idx2, hr2_ev in enumerate(horses_in_race_2, start = 1):
            for idx3, hr3_ev in enumerate(horses_in_race_3, start = 1):
                for idx4, hr4_ev in enumerate(horses_in_race_4, start = 1):
                    for idx5, hr5_ev in enumerate(horses_in_race_5, start = 1):
                        for idx6, hr6_ev in enumerate(horses_in_race_6, start = 1):
                            combinationEV = hr1_ev * hr2_ev * hr3_ev * hr4_ev * hr5_ev * hr6_ev
                            new_row = {'Race1':str(idx1),'Race2':str(idx2),'Race3':str(idx3),'Race4':str(idx4),'Race5':str(idx5),'Race6':str(idx6), 'EV':combinationEV}
                            l.append(new_row)
                            #df = appendCombinationToDF(df, new_row)
l = [] # df = ...
calculateCombos(horses_in_race_1, horses_in_race_2, horses_in_race_3, horses_in_race_4, horses_in_race_5, horses_in_race_6)
Executing len(l), I get:
11390625 # maximum combinations possible. This means that above function ran successfully and computation succeeded.
If the above runs, replace the list l with df and check whether the function still completes without a memory error. The above ran in roughly 20-30 seconds for me.
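The six nested loops can also be written with itertools.product, which accepts any number of races. This is a sketch on a tiny made-up card, not a benchmark: combo_evs and the example EV values are hypothetical, and math.prod requires Python 3.8 or later.

```python
from itertools import product
from math import prod


def combo_evs(*races):
    """Yield (horse numbers, combined EV) for every pick of one horse per race."""
    # Pair each EV with its 1-based starting number, as in the loops above.
    numbered = [list(enumerate(evs, start=1)) for evs in races]
    for combo in product(*numbered):
        numbers = tuple(number for number, _ in combo)
        yield numbers, prod(ev for _, ev in combo)


# Hypothetical EVs for three short races, just to show the output shape.
races = [[1.5, 2.0], [0.5, 4.0], [3.0]]
for numbers, ev in combo_evs(*races):
    print(numbers, ev)
```

Because this is a generator, rows can be collected into a list in one go and turned into a DataFrame at the end, rather than appending to a DataFrame once per combination.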

how to create a multidimensional array on the fly using python?

I have a loop which generates a value_list each time it runs. At the end of each iteration, I want to append the lists generated so far into one multidimensional array.
I have:
value_list = [1,2,3,4] in 1st iteration
value_list = [5,6,7,8] in 2nd iteration
value_list = [9,10,11,12] in 3rd iteration
etc...
At the end of each iteration I want one multi dimensional array like
value_list_copy = [[1,2,3,4]] in the 1st iteration
value_list_copy = [[1,2,3,4],[5,6,7,8]] in the 2nd iteration
value_list_copy = [[1,2,3,4],[5,6,7,8],[9,10,11,12]]
etc...
How could I achieve this?
Thanks
You can use a nested comprehension and itertools.count:
from itertools import count, islice
cols = 4
rows = 5
c = count(1)
matrix = [[next(c) for _ in range(cols)] for _ in range(rows)]
# [[1, 2, 3, 4],
# [5, 6, 7, 8],
# [9, 10, 11, 12],
# [13, 14, 15, 16],
# [17, 18, 19, 20]]
The cool kids might also want to zip the count iterator with itself:
list(islice(zip(*[c]*cols), rows))
# [(1, 2, 3, 4),
# (5, 6, 7, 8),
# (9, 10, 11, 12),
# (13, 14, 15, 16),
# (17, 18, 19, 20)]
If you are using Python 3.8 or later, you can use the walrus operator (:=):
count=0
rows=5
cols=4
[[(count:=count+1) for _ in range(cols)] for _ in range(rows)]
Output:
[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20]]
Without using :=.
rows=5
cols=4
[list(range(i,i+cols)) for i in range(1,rows*cols,cols)]
Try this:
limit = 10
length_of_elements_in_each_list = 4
[list(range(i, i + length_of_elements_in_each_list)) for i in range(1, limit, length_of_elements_in_each_list)]
You can set limit and length_of_elements_in_each_list according to your needs.
Try this:
value_list_copy = []
for i in range(n):  # assuming n is the number of times your loop runs
    value_list_copy.append(value_list)  # append your value_list to value_list_copy in every iteration
print(value_list_copy)
Here you will get an array of arrays.
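A runnable sketch of the same idea. The tuple of start values is a hypothetical stand-in for whatever the real loop computes, and the copy made with list(...) is an added precaution so that later changes to value_list do not alter rows already stored.

```python
value_list_copy = []

# Hypothetical stand-in for the real loop: each pass produces a new value_list.
for start in (1, 5, 9):
    value_list = list(range(start, start + 4))
    # Store a copy so the saved row is independent of value_list.
    value_list_copy.append(list(value_list))
    print(value_list_copy)
```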
Here are two other possible solutions:
Double for loop approach:
rows, cols, start = 3, 4, 1
value_list_copy = []
for j in range(rows):
    value_list = []
    for i in range(start, cols + start):
        value_list.append((j*cols)+i)
    value_list_copy.append(value_list)
    print(
        f'value_list = {value_list}\n'
        f'value_list_copy = {value_list_copy}\n'
    )
List comp method:
rows, cols, start = 3, 4, 1
value_list_copy_2 = [
    [
        (j*cols)+i for i in range(start, cols + start)
    ] for j in range(rows)
]
print(f'value_list_copy_2 = {value_list_copy_2}')

Python array logic

I am trying to create a list of lists with the input of m and n, where m is the number of lists within the main list and n is the number of elements within each given list. The grid should contain the integers from start to start + rows * cols - 1 and be ascending. But, every odd numbered row should be descending instead.
The code I've written is returning the expected results, but my automated tester is saying it's incorrect. Maybe my logic is messed up somewhere?
inputs:
start = 1, m = 3, n = 5
expected:
[[1,2,3,4,5],[10,9,8,7,6],[11,12,13,14,15]]
result = []
mylist = []
start = 1
for x in range(0, rows):
    for x in range(0, cols):
        result.append(start)
        start += 1
for y in range(0, rows):
    if y%2 != 0:
        mylist.append(result[cols - 1::-1])
        del result[cols - 1::-1]
    else:
        mylist.append(result[0:cols])
        del result[0:cols]
return mylist
One possible solution, using itertools.count:
from itertools import count
def build(m, n, start=1):
    lst, c = [], count(start)
    for i in range(m):
        lst.append([next(c) for j in range(n)][::-1] if i % 2 else [next(c) for j in range(n)])
    return lst
print(build(3, 5, 1))
Prints:
[[1, 2, 3, 4, 5], [10, 9, 8, 7, 6], [11, 12, 13, 14, 15]]
print(build(3, 0, 1))
Prints:
[[], [], []]
Just generate the list of numbers you need, which will be n * m of them; in your case that would be 0 to 14 with Python's range function. However, as we want to start at 1, we need to add the start offset to the range end too.
Now that we can generate all the numbers we need, we just need to think about how to group them.
We can add numbers to the current list until it reaches size n, and then start a new list. However, if the list we just finished is an even-numbered row, we need to reverse it.
def build_lists(m, n, start=1):
    data =[[]]
    for i in range(start, n * m + start):
        if len(data[-1]) < n:
            data[-1].append(i)
        else:
            if len(data) % 2 == 0:
                data[-1] = data[-1][::-1]
            data.append([i])
    if len(data) % 2 == 0:
        data[-1] = data[-1][::-1]
    return data
print(build_lists(3, 5))
print(build_lists(6, 3))
print(build_lists(6, 2, 100))
OUTPUT
[[1, 2, 3, 4, 5], [10, 9, 8, 7, 6], [11, 12, 13, 14, 15]]
[[1, 2, 3], [6, 5, 4], [7, 8, 9], [12, 11, 10], [13, 14, 15], [18, 17, 16]]
[[100, 101], [103, 102], [104, 105], [107, 106], [108, 109], [111, 110]]
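The same grid can also be built one row at a time from its first value, which avoids tracking list lengths. This is a sketch: snake_grid is a made-up name, and the parameter order (start, rows, cols) is an assumption based on the inputs in the question.

```python
def snake_grid(start, rows, cols):
    """Build rows of consecutive integers, reversing every second row."""
    grid = []
    for r in range(rows):
        first = start + r * cols            # first value belonging to row r
        row = list(range(first, first + cols))
        if r % 2 == 1:                      # rows at index 1, 3, 5, ... descend
            row.reverse()
        grid.append(row)
    return grid


print(snake_grid(1, 3, 5))
# [[1, 2, 3, 4, 5], [10, 9, 8, 7, 6], [11, 12, 13, 14, 15]]
```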

Python list slicing

I can't work out what to do here. Can someone help?
I have a few lists:
array = [7,8,2,3,4,10,5,6,7,10,8,9,10,4,5,12,13,14,1,2,15,16,17]
slice = [2, 4, 6, 8, 10, 12, 15, 17, 20, 22]
intervals = [12, 17, 22]
output = []
intermediate = []
slice is a list of indices I need to use for slicing array. intervals is a list of indices at which the slicing stops, that is, when slice[i] equals intervals[j], where i and j are loop variables.
I need to form a list of lists from array based on slice and intervals. As long as slice[i] is not intervals[j]:
intermediate =intermediate + array[slice[i]:slice[i+1]+1]
Here, in my case, slice[i] and intervals[j] are first equal at the value 12, so I need to combine these slices of array:
intermediate = array[slice[0]:slice[0+1]+1] + array[slice[2]:slice[2+1]+1] + array[slice[4]:slice[4+1]+1]
which is
intermediate = array[2:(4+1)] + array[6:(8+1)] + array[10:(12+1)]
When slice[i] is intervals[j], intermediate is added to output and the slicing continues:
output = output + [intermediate]
which is
output = output + [array[2:(4+1)] + array[6:(8+1)] + array[10:(12+1)]]
Now the next value in intervals is 17, so until we reach 17 in slice we form another list from array[slice[6]:slice[6+1]+1] and add it to output. This continues.
The final output should be:
output = [array[slice[0]:slice[0+1]+1] + array[slice[2]:slice[2+1]+1] + array[slice[4]:slice[4+1]+1] , array[slice[6]:slice[6+1]+1], array[slice[8]:slice[8+1]+1]]
which is
output = [[2, 3, 4, 5, 6, 7, 8, 9, 10], [12, 13, 14], [15, 16, 17]]
A straightforward solution:
array_ = [7,8,2,3,4,10,5,6,7,10,8,9,10,4,5,12,13,14,1,2,15,16,17]
slice_ = [2, 4, 6, 8, 10, 12, 15, 17, 20, 22]
intervals = [12, 17, 22]
output = []
intermediate = []
for i in range(0, len(slice_), 2):
    intermediate.extend(array_[slice_[i]:slice_[i+1]+1])
    if slice_[i+1] in intervals:
        output.append(intermediate)
        intermediate = []
print(output)
# [[2, 3, 4, 5, 6, 7, 8, 9, 10], [12, 13, 14], [15, 16, 17]]
I have changed some variable names to avoid conflicts.
On large data, you may convert intervals to a set.
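A sketch of that variant, with the intervals converted to a set so each membership test takes constant time on average. The function name group_slices and the pairing of indices with zip are illustrative choices, not part of the answer above.

```python
def group_slices(array, slice_indices, intervals):
    """Concatenate inclusive slices of array, closing a group at each stop index."""
    stops = set(intervals)  # set membership is O(1) on average
    output, intermediate = [], []
    # Pair the indices as (start, end): (2, 4), (6, 8), (10, 12), ...
    for start, end in zip(slice_indices[::2], slice_indices[1::2]):
        intermediate.extend(array[start:end + 1])
        if end in stops:
            output.append(intermediate)
            intermediate = []
    return output


array_ = [7, 8, 2, 3, 4, 10, 5, 6, 7, 10, 8, 9, 10, 4, 5, 12, 13, 14, 1, 2, 15, 16, 17]
slice_ = [2, 4, 6, 8, 10, 12, 15, 17, 20, 22]
intervals = [12, 17, 22]
print(group_slices(array_, slice_, intervals))
# [[2, 3, 4, 5, 6, 7, 8, 9, 10], [12, 13, 14], [15, 16, 17]]
```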
Here is a recursive solution that goes through the index list once, dynamically checks whether each index is in intervals, and appends the sliced results to a list accordingly:
def slicing(array, index, stops, sliced):
    # if the length of index is smaller than two, stop
    if len(index) < 2:
        return
    # if the first element of the index in the intervals, create a new list in the result
    # accordingly and move one index forward
    elif index[0] in stops:
        if len(index) >= 3:
            sliced += [[]]
        slicing(array, index[1:], stops, sliced)
    # if the second element of the index is in the intervals, append the slice to the last
    # element of the list, create a new sublist and move two indexes forward accordingly
    elif index[1] in stops:
        sliced[-1] += array[index[0]:(index[1]+1)]
        if len(index) >= 4:
            sliced += [[]]
        slicing(array, index[2:], stops, sliced)
    # append the new slice to the last element of the result list and move two index
    # forward if none of the above conditions satisfied:
    else:
        sliced[-1] += array[index[0]:(index[1]+1)]
        slicing(array, index[2:], stops, sliced)
sliced = [[]]
slicing(array, slice_, intervals, sliced)
sliced
# [[2, 3, 4, 5, 6, 7, 8, 9, 10], [12, 13, 14], [15, 16, 17]]
Data:
array = [7,8,2,3,4,10,5,6,7,10,8,9,10,4,5,12,13,14,1,2,15,16,17]
slice_ = [2, 4, 6, 8, 10, 12, 15, 17, 20, 22]
intervals = [12, 17, 22]
