Array organizing in Python - python

I have python code below:
ht_24 = []
ht_23 = []
ht_22 = []
...
all_arr = [ht_24, ht_23, ht_22, ht_21, ht_20, ht_19, ht_18, ht_17, ht_16, ht_15, ht_14, ht_13, ht_12, ht_11, ht_10, ht_09, ht_08, ht_07, ht_06, ht_05, ht_04, ht_03, ht_02, ht_01]
i = 0
j = 0
while i < 24:
    while j < 24864:
        all_arr[i].append(read_matrix[j+i])
        j += 24
        print(j)
    i += 1
    print(i)
where read_matrix is an array of shape 24864, 17.
I want to read every 24th line, starting from each of the offsets 0 to 23, and append those lines to the corresponding array. Please help, this is so hard!

Two things to learn in Python:
ONE: for loops -- when you know ahead of time how many times you're going through the loop. Your while loops above are both this type. Try these instead:
for i in range(24):
    for j in range(0, 24864, 24):
        all_arr[i].append(read_matrix[j+i])
        print(j)
    print(i)
It's better when you let the language handle the index values for you.
TWO: List comprehensions: sort of a for loop inside a list construction. Your entire posted code can turn into a single statement:
all_arr = [[read_matrix[j+i]
            for j in range(0, 24864, 24)]
           for i in range(24)]
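As a small-scale sketch of the same idea, extended slicing with a step says "every Nth row starting at offset i" directly. This assumes read_matrix supports slicing (lists and numpy arrays both do). The toy data and the step of 3 are illustrative stand-ins for the real 24864 rows and step of 24.

```python
# Toy stand-in for read_matrix: 12 "rows", each just a label (illustrative only).
read_matrix = [f"row{k}" for k in range(12)]
step = 3  # plays the role of 24 in the question

# Comprehension with explicit indices, as in the answer above.
by_index = [[read_matrix[j + i] for j in range(0, len(read_matrix), step)]
            for i in range(step)]

# Same grouping via extended slicing: start at i, take every step-th row.
by_slice = [read_matrix[i::step] for i in range(step)]

print(by_slice[0])  # ['row0', 'row3', 'row6', 'row9']
```

Both forms build the same nested list. The slice version also avoids hard-coding the total row count.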

Your question is a little unclear, but I think
list(zip(*zip(*[iter(read_matrix)]*24)))
may be what you're looking for.
list(zip(*zip(*[iter(range(24864))]*24)))[0][:5]
The above just looks at the indices, and the first few elements of the first sublist are
(0, 24, 48, 72, 96)
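The one-liner above is dense, so here is a hedged step-by-step sketch of what it does, using a toy list of 12 items and a group size of 3 in place of 24864 and 24. Note that zip silently drops a trailing incomplete group. That is harmless in the question because 24864 is exactly 24 * 1036.

```python
rows = list(range(12))   # toy data standing in for read_matrix
n = 3                    # group size; 24 in the question

# [iter(rows)] * n repeats the SAME iterator n times, so zip pulls
# n consecutive items per output tuple: (0, 1, 2), (3, 4, 5), ...
chunks = list(zip(*[iter(rows)] * n))

# Transposing the chunks gives "every n-th item from each start offset".
columns = list(zip(*chunks))

print(chunks)   # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]
print(columns)  # [(0, 3, 6, 9), (1, 4, 7, 10), (2, 5, 8, 11)]
```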

Can the numpy library do what you want?
import numpy as np
# 24864 rows, 17 columns
read_matrix = np.arange(24864*17).reshape(24864,17)
new_matrices = [[] for i in range(24)]
for i in range(24):
    # a has 17 columns
    a = read_matrix[slice(i,None,24)]
    new_matrices[i].append(a)
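If the row count is an exact multiple of the step, the same regrouping can probably be done without a loop by reshaping. The sketch below uses toy sizes (12 rows, 2 columns, step 3) rather than the real 24864, 17 and 24, and checks the reshaped result against the strided slices.

```python
import numpy as np

rows, cols, step = 12, 2, 3          # toy sizes; the question uses 24864, 17, 24
read_matrix = np.arange(rows * cols).reshape(rows, cols)

# Strided slices: one (rows // step, cols) array per starting offset.
sliced = [read_matrix[i::step] for i in range(step)]

# Same data as one 3-D array: split into blocks of `step` rows,
# then swap axes so that axis 0 is the starting offset.
stacked = read_matrix.reshape(rows // step, step, cols).swapaxes(0, 1)

print(stacked.shape)  # (3, 4, 2)
```

Here stacked[i] holds the same rows as read_matrix[i::step].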

Related

Any easy way to transform a missing number sequence to its range?

Suppose I have a list that goes like :
'''
[1,2,3,4,9,10,11,20]
'''
I need the result to be like :
'''
[[4,9],[11,20]]
'''
I have defined a function that goes like this :
def get_range(lst):
    i=0
    seqrange=[]
    for new in lst:
        a=[]
        start=new
        end=new
        if i==0:
            i=1
            old=new
        else:
            if new - old >1:
                a.append(old)
                a.append(new)
        old=new
        if len(a):
            seqrange.append(a)
    return seqrange
Is there any other easier and efficient way to do it? I need to do this in the range of millions.
You can use numpy arrays and the diff function that comes along with them. Numpy is so much more efficient than looping when you have millions of rows.
Slight aside:
Why are numpy arrays so fast? Because they are arrays of data instead of arrays of pointers to data (which is what Python lists are), because they offload a whole bunch of computations to a backend written in C, and because they leverage the SIMD paradigm to run a Single Instruction on Multiple Data simultaneously.
Now back to the problem at hand:
The diff function gives us the difference between consecutive elements of the array. Pretty convenient, given that we need to find where this difference is greater than a known threshold!
import numpy as np
threshold = 1
arr = np.array([1,2,3,4,9,10,11,20])
deltas = np.diff(arr)
# There's a gap wherever the delta is greater than our threshold
gaps = deltas > threshold
gap_indices = np.argwhere(gaps)
gap_starts = arr[gap_indices]
gap_ends = arr[gap_indices + 1]
# Finally, stack the two arrays horizontally
all_gaps = np.hstack((gap_starts, gap_ends))
print(all_gaps)
# Output:
# [[ 4 9]
# [11 20]]
You can access all_gaps like a 2D matrix: all_gaps[0, 1] would give you 9, for example. If you really need the answer as a list-of-lists, simply convert it like so:
all_gaps_list = all_gaps.tolist()
print(all_gaps_list)
# Output: [[4, 9], [11, 20]]
Comparing the runtime of the iterative method from @happydave's answer with the numpy method:
import random
import timeit
import numpy as np
def gaps1(arr, threshold):
    deltas = np.diff(arr)
    gaps = deltas > threshold
    gap_indices = np.argwhere(gaps)
    gap_starts = arr[gap_indices]
    gap_ends = arr[gap_indices + 1]
    all_gaps = np.hstack((gap_starts, gap_ends))
    return all_gaps
def gaps2(lst, thr):
    seqrange = []
    for i in range(len(lst)-1):
        if lst[i+1] - lst[i] > thr:
            seqrange.append([lst[i], lst[i+1]])
    return seqrange
test_list = [i for i in range(100000)]
for i in range(100):
    # Pop by position; remove() by value could pick a value that is already gone
    test_list.pop(random.randrange(len(test_list)))
test_arr = np.array(test_list)
# Make sure both give the same answer:
assert np.all(gaps1(test_arr, 1) == gaps2(test_list, 1))
t1 = timeit.timeit('gaps1(test_arr, 1)', setup='from __main__ import gaps1, test_arr', number=100)
t2 = timeit.timeit('gaps2(test_list, 1)', setup='from __main__ import gaps2, test_list', number=100)
print(f"t1 = {t1}s; t2 = {t2}s; Numpy gives ~{t2 // t1}x speedup")
On my laptop, this gives:
t1 = 0.020834800001466647s; t2 = 1.2446780000027502s; Numpy gives ~59.0x speedup
My word that's fast!
There is also an iterator-based solution. It lets you get the intervals one by one:
flist = [1,2,3,4,9,10,11,20]
def get_range(lst):
    start_idx = lst[0]
    for current_idx in lst[1:]:
        if current_idx > start_idx+1:
            yield [start_idx, current_idx]
        start_idx = current_idx
for interval in get_range(flist):
    print(interval)
I don't think there's anything inefficient about the solution, but you can clean up the code quite a bit:
seqrange = []
for i in range(len(lst)-1):
    if lst[i+1] - lst[i] > 1:
        seqrange.append([lst[i], lst[i+1]])
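The same neighbour comparison can also be sketched without index arithmetic by zipping the list with itself shifted by one. The function name find_gaps and the threshold parameter are illustrative choices, not part of the original code.

```python
def find_gaps(lst, threshold=1):
    """Return [left, right] pairs of neighbours that differ by more than threshold."""
    # zip(lst, lst[1:]) yields (lst[0], lst[1]), (lst[1], lst[2]), ...
    return [[a, b] for a, b in zip(lst, lst[1:]) if b - a > threshold]

print(find_gaps([1, 2, 3, 4, 9, 10, 11, 20]))  # [[4, 9], [11, 20]]
```

For a list or a single element this returns an empty list, since there are no neighbouring pairs to compare.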
I think this is a bit cleaner. Be aware that every "in" test scans the whole list, so for millions of elements you would want to convert lst to a set first.
def func(lst):
    ans=0
    final=[]
    sol=[]
    for i in range(1,lst[-1]+1):
        if(i not in lst):
            ans+=1
            final.append(i)
        elif(i in lst and ans>0):
            final=[final[0]-1,i]
            sol.append(final)
            ans=0
            final=[]
        else:
            final=[]
    return(sol)

customize step in loop through pandas

I know this question was asked a few times, but I couldn't understand the answers or apply them to my case.
I'm trying to iterate over a dataframe and, for each row, add one to a counter if column A is 1; if it is 0, the row is not counted (but it is not skipped either).
When the counter reaches 10, take all the rows seen so far, put them in an array and restart the counter. After searching a bit, it seems that generators could do the trick, but I have a bit of trouble with them. So far I have something like this, thanks to the help of the SO community!
import numpy as np
import pandas as pd
data = pd.DataFrame(np.random.randint(0,50,size=(50, 4)), columns=list('ABCD'))
data['C'] = np.random.randint(2, size=50)
data
counter = 0
chunk = 10
arrays = []
for x in range(0, len(data), chunk):
    array = data.iloc[x: x+chunk]
    arrays.append(array)
    print(array)
The idea looks something like this:
while counter <= 10:
    if data['A'] == 1:
        counter += 1
        yield counter
    if counter > 10:
        counter = 0
But I don't know how to combine this pseudo code with my current for loop.
When using pandas, try to avoid explicit for loops. Based on your question, you can use groupby:
arrays=[frame for _,frame in data.groupby(data.A.eq(1).cumsum().sub(1)//10)]
Explanation:
data.A.eq(1) marks the rows where A is 1, and cumsum turns those marks into a running count: each 1 increases the count, and each 0 keeps the same count as the previous row. Subtracting 1 and floor-dividing by 10 (//) then maps the running count to a chunk number, which splits the dataframe in steps of 10 ones. For example, (10-1)//10 returns 0 and (11-1)//10 returns 1, so the eleventh 1 starts a new chunk.
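To make the grouping key easier to follow, here is a plain-Python sketch of the same arithmetic. It swaps pandas for itertools.accumulate and itertools.groupby, so it illustrates the idea rather than the pandas call itself. The flags list and the chunk size of 2 are toy assumptions standing in for data.A.eq(1) and 10.

```python
from itertools import accumulate, groupby

flags = [1, 0, 1, 1, 0, 1, 1, 0, 1]   # toy stand-in for data.A.eq(1)
per_chunk = 2                          # 10 in the question

# Running count of ones, shifted by 1 and floor-divided: rows share a key
# until per_chunk further ones have been seen.
running = list(accumulate(flags))
keys = [(c - 1) // per_chunk for c in running]

# Collect row positions by key (keys never decrease, so groupby is enough).
chunks = [[i for i, _ in grp]
          for _, grp in groupby(enumerate(keys), key=lambda pair: pair[1])]

print(keys)    # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Each chunk here contains exactly two flagged rows, and rows with a 0 stay attached to the chunk of the preceding 1.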

nested loop in python with lists input

I want to build an array of signals calculated by a function, looping through different parameter combinations, something like below:
fast_w = list(range(20, 100, 20))
slow_w = list(range(100, 200, 20))
tradeSignal = np.zeros((len(fast_w), len(slow_w)))
for i in fast_w:
    for j in slow_w:
        tradeSignal[i][j] = signalTrade(i, j, stock_price, end_date)
However, "tradeSignal[i][j]" is incorrect, because i and j are the values in the fast_w and slow_w lists, whereas they are supposed to be indices into the tradeSignal array.
So what is the right way to write such code?
I'm new to Python and its packages. Thanks for the help.
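How about this? I'm not certain that it works, since we're missing part of your code. Note that np.fromfunction calls the function once with arrays of indices (0 to 3 and 0 to 4 here, as floats by default), not with the values in fast_w and slow_w, so signalTrade would have to accept index arrays and map them to window sizes itself.
trade_signal = np.fromfunction(lambda i, j: signalTrade(i, j, stock_price, end_date), shape=(4, 5))
When working with numpy, you should avoid explicit loops and iteration unless absolutely necessary.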
You can always use builtin function enumerate().
enumerate(iterable) will let you iterate getting tuples (index, value_from_iterable).
For example:
for idx, x in enumerate(range(5, 8)):
    print(idx, x)
This code will produce the following result:
0 5
1 6
2 7
Hence, your code will look like this:
fast_w = list(range(20, 100, 20))
slow_w = list(range(100, 200, 20))
tradeSignal = np.zeros((len(fast_w), len(slow_w)))
for i, fast_val in enumerate(fast_w):
    for j, slow_val in enumerate(slow_w):
        tradeSignal[i][j] = signalTrade(fast_val, slow_val, stock_price, end_date)
You can find more information in the Python documentation for enumerate().
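If the indices are only needed to place each result, a nested list comprehension is another option that avoids index bookkeeping altogether. In this sketch signal_trade is a hypothetical stand-in for the asker's signalTrade (it just returns the difference between the two windows), so only the looping pattern carries over.

```python
def signal_trade(fast, slow):
    # Hypothetical stand-in for signalTrade; returns the gap between windows.
    return slow - fast

fast_w = list(range(20, 100, 20))    # [20, 40, 60, 80]
slow_w = list(range(100, 200, 20))   # [100, 120, 140, 160, 180]

# Rows follow fast_w, columns follow slow_w.
trade_signal = [[signal_trade(f, s) for s in slow_w] for f in fast_w]

print(trade_signal[0][0])   # 80  (fast=20, slow=100)
print(trade_signal[3][4])   # 100 (fast=80, slow=180)
```

Passing the nested list to np.array would turn it into a 4 by 5 numpy array if one is needed afterwards.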

How to append randomized float values into array within loop

I have a set of randomized float values that are to be arranged into an array at the end of each loop that produces 67 of them, however, there are 64 total loops.
As an example, if I had 4 values per loop and 3 total loops of integers, I would like it to be like this:
values = [[0, 4, 5, 1],[6, 6, 5, 3],[0,0,0,7]]
such that I could identify them as separate arrays, however, I am unsure of the best way to append the values after they are created, but am aware of how to return them. Forgive me as I am unskilled with the logic.
import math
import random
funcs = []
coord = []
pi = math.pi
funcAmt = 0
coordAmt = 0
repeatAmt = 0
coordPass = 0
while funcAmt < 64:
    while coordAmt < 67:
        coordAmt += 1
        uniform = round(random.uniform(-pi, pi), 2)
        print("Coord [",coordAmt,"] {",uniform,"} Func:", funcAmt + 1)
        if uniform in coord:
            repeatAmt += 1
            print("Repeat Found!")
            coordAmt -= 1
            print("Repeat [",repeatAmt,"] Resolved")
            pass
        else:
            coordPass += 1
            coord.append(uniform)
    #<<<Append Here>>>
    funcAmt += 1
    coord.clear()
    coordAmt = 0
In my given code above, it would be similar to:
func = [
[<67 items>],
...63 more times
]
Your "append here" logic should append the coordinate list and then clear that list for the next iteration of the outer loop:
funcs.append(coord[:]) # The slice notation makes a copy of the list
coord.clear() # or simply coord = []
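The copy matters because appending a list stores a reference to it, not a snapshot. A minimal sketch of the difference, using made-up values in place of the random coordinates:

```python
coord = []
aliased, copied = [], []

for batch in range(3):
    coord.extend([batch, batch + 10])  # fill the working list
    aliased.append(coord)              # stores a reference to the same list
    copied.append(coord[:])            # stores an independent snapshot
    coord.clear()                      # empties the working list in place

print(aliased)  # [[], [], []]
print(copied)   # [[0, 10], [1, 11], [2, 12]]
```

Rebinding with coord = [] instead of coord.clear() would also keep the earlier batches intact, because the old list object is then left untouched.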
You should learn to use a for loop. This will simplify your looping: you don't have to maintain the counts yourself. For instance:
for funcAmt in range(64):
    for coordAmt in range(67):
        ...
You might also look up how to make a "list comprehension", which can reduce your process to a single line of code -- a long, involved line, but readable with proper white space.
Does that get you moving?
There are a couple of ways around this. Instead of using while loops and counters, you could just use for loops. Or at least do that for the outer loop, since it looks like you still want to check for repeats. Here's an example using the dimensions of 3 and 4 from your example:
from math import pi
import random
coord_sets = 3
coords = 4
biglist = []
for i in range(coord_sets):
    coords_set = []
    non_repeating_coords = 0
    while non_repeating_coords < coords:
        new_coord = round(random.uniform(-1.0*pi, pi), 2)
        if new_coord not in coords_set:
            coords_set.append(new_coord)
            non_repeating_coords += 1
    biglist.append(coords_set)
print(biglist)
You can use sets because they don't allow duplicate values:
from math import pi
import random
funcs = []
funcAmt = 0
while funcAmt < 64: # This is the number of loops
    myset = set()
    while len(myset) < 67: # This is the length of each set
        uniform = round(random.uniform(-pi, pi), 2)
        myset.add(uniform)
    funcs.append(list(myset)) # Append randomly generated set as a list
    funcAmt += 1
print(funcs)
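One caveat with the set approach is that converting a set to a list does not keep the values in the order they were drawn. If the order matters, a possible variation keeps a list for the order and a set for the fast duplicate check. The helper name unique_coords and the fixed seed are illustrative assumptions.

```python
import random
from math import pi

def unique_coords(count, rng):
    """Draw `count` distinct values in [-pi, pi], rounded to 2 decimals, in draw order."""
    seen = set()
    ordered = []
    while len(ordered) < count:
        value = round(rng.uniform(-pi, pi), 2)
        if value not in seen:      # set membership test, no list scan
            seen.add(value)
            ordered.append(value)
    return ordered

rng = random.Random(0)             # seeded only to make the sketch repeatable
funcs = [unique_coords(67, rng) for _ in range(64)]
print(len(funcs), len(funcs[0]))   # 64 67
```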
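Maybe you can benefit from numpy arrays:
import numpy as np
funcs = np.random.uniform(-np.pi, np.pi, (64, 67))
This creates an array of shape (64, 67), i.e. 64 rows of 67 values each, drawn uniformly between -pi and pi. Unlike the loops above, the values are not rounded to two decimals or checked for repeats.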

Randomly iterating through a 2D list in Python

My first attempt to accomplish this resulted in:
def rand_Random(self):
    randomRangeI = list(range(self.gridWidth))
    shuffle(randomRangeI)
    randomRangeJ = list(range(self.gridHeight))
    shuffle(randomRangeJ)
    for i in randomRangeI:
        for j in randomRangeJ:
            if self.grid[i][j] != 'b':
                print(i, j)
                self.grid[i][j].colour = self.rand_Land_Picker()
Which has the issue of going through one inner list at a time:
[1][1..X]
[2][1..X]
What I'd like to be able to do is iterate through the 2d array entirely at random (with no repeats).
Anyone have any solutions to this problem?
Edit: Thanks for the replies, it appears the way I view 2d arrays in my mind is different to most!
Create a list of all possible coordinate pairs, shuffle it and iterate through it as normal.
import random
coords = [(x,y) for x in range(self.gridWidth) for y in range(self.gridHeight)]
random.shuffle(coords)
for i,j in coords:
    if self.grid[i][j] != 'b':
        print(i, j)
        self.grid[i][j].colour = self.rand_Land_Picker()
You can treat the 2D array as a 1D array and iterate through it randomly.
def rand_Random(self):
    randomRange = list(range(self.gridWidth*self.gridHeight))
    shuffle(randomRange)
    for index in randomRange:
        # grid[i][j] has gridWidth outer lists of gridHeight cells each
        i = index // self.gridHeight
        j = index % self.gridHeight
        if self.grid[i][j] != 'b':
            print(i, j)
            self.grid[i][j].colour = self.rand_Land_Picker()
You can do something like:
randomRange = list(range(w*h))
shuffle(randomRange)
for n in randomRange:
    i = n//w
    j = n%w
Here randomRange enumerates all the cells from 0 to w*h-1, where w is the length of each inner list.
Even prettier, i and j can be found in one statement:
i,j = divmod(n, w)
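Putting the pieces together, here is a small self-contained Python 3 sketch of the flat-index approach. It assumes the layout from the question, where grid[i][j] has i running over the width and j over the height, so the divisor is the length of the inner lists (height here). The toy grid of counters is only there to show that every cell is visited exactly once.

```python
import random

width, height = 3, 2                       # toy grid; i < width, j < height
grid = [[0] * height for _ in range(width)]

order = list(range(width * height))        # list() so shuffle can reorder in place
random.shuffle(order)

visited = []
for n in order:
    i, j = divmod(n, height)               # i in range(width), j in range(height)
    grid[i][j] += 1                        # "visit" the cell
    visited.append((i, j))

print(visited)                             # all 6 cells, in random order
```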
