Splitting python list between 2 values - python

[0,
100,
7,
27,
34,
40,
41,
48,
58,
65,
75,
78,
79,
96,
126,
127,
0,
0,
2,
45,
54,
56,
57,
59,
66,
67,
82,
86,
102,
124,
133,
0,
0,
35,
39,
52,
53,
60,
61,
80,
81,
83,
87,
97,
98,
101,
109,
0,
0,
15,
28,
29,
30,
31,
32,
33,
37,
38,
49,
50,
51,
71,
95,
0,
0,
3,
16,
22,
23,
44,
72,
73,
74,
90,
110,
131,
0,
0,
10,
11,
18,
19,
36,
55,
89,
93,
94,
108,
113,
114,
0,
0,
1,
5,
6,
9,
12,
17,
24,
43,
64,
77,
85,
88,
91,
92,
111,
112,
130,
0,
0,
13,
20,
42,
62,
68,
84,
99,
104,
116,
119,
125,
128,
129,
132,
0,
0,
8,
14,
26,
63,
69,
70,
103,
105,
123,
0,
0,
4,
21,
25,
46,
47,
106,
107,
115,
117,
118,
120,
121,
122,
0,
0,
76,
0]
I have this list of values and I want to split it between every two consecutive zeroes.
So my list will look like this:
[0, 100, 7, 27, 34, 40, 41, 48, 58, 65, 75, 78, 79, 96, 126, 127, 0],
[0, 2, 45, 54, 56, 57, 59, 66, 67, 82, 86, 102, 124, 133, 0],
[0, 35, 39, 52, 53, 60, 61, 80, 81, 83, 87, 97, 98, 101, 109, 0],
[0, 15, 28, 29, 30, 31, 32, 33, 37, 38, 49, 50, 51, 71, 95, 0],
[0, 3, 16, 22, 23, 44, 72, 73, 74, 90, 110, 131, 0],
[0, 10, 11, 18, 19, 36, 55, 89, 93, 94, 108, 113, 114, 0],
[0, 1, 5, 6, 9, 12, 17, 24, 43, 64, 77, 85, 88, 91, 92, 111, 112, 130, 0],
[0, 13, 20, 42, 62, 68, 84, 99, 104, 116, 119, 125, 128, 129, 132, 0],
[0, 8, 14, 26, 63, 69, 70, 103, 105, 123, 0],
[0, 4, 21, 25, 46, 47, 106, 107, 115, 117, 118, 120, 121, 122, 0],
[0, 76, 0]
Can someone help me out?

You may use a simple for loop with the built-in function zip:
# l is your list
result = [[l[0]]]
for i, j in zip(l[1:], l):
    if i == 0 == j:
        result.append([i])
    else:
        result[-1].append(i)
result
output:
[[0, 100, 7, 27, 34, 40, 41, 48, 58, 65, 75, 78, 79, 96, 126, 127, 0],
[0, 2, 45, 54, 56, 57, 59, 66, 67, 82, 86, 102, 124, 133, 0],
[0, 35, 39, 52, 53, 60, 61, 80, 81, 83, 87, 97, 98, 101, 109, 0],
[0, 15, 28, 29, 30, 31, 32, 33, 37, 38, 49, 50, 51, 71, 95, 0],
[0, 3, 16, 22, 23, 44, 72, 73, 74, 90, 110, 131, 0],
[0, 10, 11, 18, 19, 36, 55, 89, 93, 94, 108, 113, 114, 0],
[0, 1, 5, 6, 9, 12, 17, 24, 43, 64, 77, 85, 88, 91, 92, 111, 112, 130, 0],
[0, 13, 20, 42, 62, 68, 84, 99, 104, 116, 119, 125, 128, 129, 132, 0],
[0, 8, 14, 26, 63, 69, 70, 103, 105, 123, 0],
[0, 4, 21, 25, 46, 47, 106, 107, 115, 117, 118, 120, 121, 122, 0],
[0, 76, 0]]
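For reuse, the same zip idea can be wrapped in a function. This is only a sketch: the name split_on_double_zero and the empty-list guard are my own additions, not part of the answer above, and the short input list is made up for illustration.

```python
def split_on_double_zero(values):
    """Split `values` wherever two zeros sit next to each other."""
    if not values:
        # Assumed behaviour for an empty input: return no sub-lists.
        return []
    result = [[values[0]]]
    # Pair every element with its predecessor.
    for current, previous in zip(values[1:], values):
        if current == 0 == previous:
            result.append([current])   # second zero of a pair starts a new sub-list
        else:
            result[-1].append(current)
    return result


routes = split_on_double_zero([0, 7, 3, 0, 0, 5, 0, 0, 9, 2, 0])
print(routes)  # [[0, 7, 3, 0], [0, 5, 0], [0, 9, 2, 0]]
```

The guard matters because the loop version indexes l[0] directly, which would raise an IndexError on an empty list.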

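I think the easiest way is to do this:
outputList = []
start = 0
for i in range(len(myList) - 1):
    curr = myList[i]
    nxt = myList[i+1]  # renamed from "next" to avoid shadowing the built-in
    if curr == 0 and nxt == 0:
        outputList.append(myList[start:i+1])
        start = i+1
outputList.append(myList[start:])  # the last sub-list has no double zero after it
I think that will do the job, let me know if that worked! :D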

Related

Python - Reverse every other slice of an array

I wrote some code to reverse every other slice of rows.
import numpy as np
test_arr = np.arange(120).reshape(12, 10)
test_arr = test_arr.tolist()
def rev_rows(matrix):
    for i in range(len(matrix)):  # did this to get the index of each row
        if (i // 4) % 2 == 1:  # select rows whose index, floor-divided by 4, is odd
            print("flip")
            matrix[i].reverse()  # flip said row in place
        print(matrix[i])
rev_rows(test_arr)
There has to be an easier and more efficient way of doing this. I was thinking another way would be to use list operators like slices, but I can't think of one which is faster than this. Is there an easier way with numpy?
Note: the length of the matrix will always be divisible by 4, i.e. (4x10), (8x10), ...
EDIT:
Sorry about the ambiguous usage of "slice". What I meant by a slice is a set of rows (like test_arr[3] -> test_arr[7]). So, reversing every other slice would be reversing every row between indexes 3 and 7. In my little blurb about the slicing operator, I was talking about this operator -> [3:7]. I don't have experience with them, and I read somewhere that they are called slicing, my bad.
Update
The question wasn't very clear, so my original answer didn't solve it. Here is a working example.
With a loop
The loop version's performance is more predictable, because it's not always clear when a reshape will trigger a copy.
>>> test_arr = np.arange(120).reshape(12, 10)
>>> for i in range(4, 8):
...     test_arr[i::8] = test_arr[i::8, ::-1]
...
>>> test_arr
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[ 49, 48, 47, 46, 45, 44, 43, 42, 41, 40],
[ 59, 58, 57, 56, 55, 54, 53, 52, 51, 50],
[ 69, 68, 67, 66, 65, 64, 63, 62, 61, 60],
[ 79, 78, 77, 76, 75, 74, 73, 72, 71, 70],
[ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
[110, 111, 112, 113, 114, 115, 116, 117, 118, 119]])
>>>
Without a loop
A loopless version, as asked by @KellyBundy.
>>> test_arr = np.arange(120).reshape(12, 10)
>>> temp_arr = test_arr.reshape(test_arr.shape[0]//4, 4, test_arr.shape[1])
>>> temp_arr[1::2] = temp_arr[1::2,:,::-1]
>>> test_arr = temp_arr.reshape(*test_arr.shape)
>>> test_arr
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[ 49, 48, 47, 46, 45, 44, 43, 42, 41, 40],
[ 59, 58, 57, 56, 55, 54, 53, 52, 51, 50],
[ 69, 68, 67, 66, 65, 64, 63, 62, 61, 60],
[ 79, 78, 77, 76, 75, 74, 73, 72, 71, 70],
[ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
[110, 111, 112, 113, 114, 115, 116, 117, 118, 119]])
>>>
Original answer
You can do this with slicing:
test_arr[::2] = test_arr[::2,::-1] or test_arr[1::2] = test_arr[1::2,::-1].
See the examples:
>>> import numpy as np
>>> test_arr = np.arange(120).reshape(12, 10)
>>> test_arr
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[ 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[ 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
[110, 111, 112, 113, 114, 115, 116, 117, 118, 119]])
>>> test_arr[::2] = test_arr[::2,::-1]
>>> test_arr
array([[ 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 29, 28, 27, 26, 25, 24, 23, 22, 21, 20],
[ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[ 49, 48, 47, 46, 45, 44, 43, 42, 41, 40],
[ 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 69, 68, 67, 66, 65, 64, 63, 62, 61, 60],
[ 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[ 89, 88, 87, 86, 85, 84, 83, 82, 81, 80],
[ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
[109, 108, 107, 106, 105, 104, 103, 102, 101, 100],
[110, 111, 112, 113, 114, 115, 116, 117, 118, 119]])
>>>
If, instead, you wanted to reverse rows with odd indices, you'd do
>>> test_arr = np.arange(120).reshape(12, 10)
>>> test_arr[1::2] = test_arr[1::2,::-1]
>>> test_arr
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 19, 18, 17, 16, 15, 14, 13, 12, 11, 10],
[ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[ 39, 38, 37, 36, 35, 34, 33, 32, 31, 30],
[ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[ 59, 58, 57, 56, 55, 54, 53, 52, 51, 50],
[ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[ 79, 78, 77, 76, 75, 74, 73, 72, 71, 70],
[ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[ 99, 98, 97, 96, 95, 94, 93, 92, 91, 90],
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
[119, 118, 117, 116, 115, 114, 113, 112, 111, 110]])
>>>
Slice after reshape:
>>> test_arr = np.arange(120).reshape(12, 10)
>>> test_arr
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[ 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[ 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
[110, 111, 112, 113, 114, 115, 116, 117, 118, 119]])
>>> reshaped = test_arr.reshape(-1, 4, 10)
>>> reshaped[1::2] = reshaped[1::2, ..., ::-1]
>>> test_arr
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[ 49, 48, 47, 46, 45, 44, 43, 42, 41, 40],
[ 59, 58, 57, 56, 55, 54, 53, 52, 51, 50],
[ 69, 68, 67, 66, 65, 64, 63, 62, 61, 60],
[ 79, 78, 77, 76, 75, 74, 73, 72, 71, 70],
[ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
[110, 111, 112, 113, 114, 115, 116, 117, 118, 119]])
Note:
This relies on the fact that ndarray.reshape generally returns a view rather than a copy of the array. However, the discussion in this question gives some details about reshape:
After reshaping the values in the data buffer need to be in a contiguous order, either 'C' or 'F'.
And the condition for returning a copy:
It will do a copy if the initial order is so 'messed up' that it can't return values like this.
If a copy is returned, this method will obviously fail, because the assignment would change the copy and leave test_arr untouched. But here is a conjecture of my own: if we simply add a dimension to the array, rather than doing something that affects data contiguity such as flattening a transposed array, the data should stay contiguous, so this method may always be effective.
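One way to check the view-versus-copy question for a particular array, rather than relying on the conjecture, is np.shares_memory. The snippet below is a small sketch using the same 12x10 test array; the variable names reshaped and flattened are only illustrative.

```python
import numpy as np

test_arr = np.arange(120).reshape(12, 10)

# Adding a dimension to a C-contiguous array: reshape can return a view.
reshaped = test_arr.reshape(-1, 4, 10)
print(np.shares_memory(test_arr, reshaped))   # True

# Flattening a transposed array cannot be expressed as a view, so NumPy copies.
flattened = test_arr.T.reshape(-1)
print(np.shares_memory(test_arr, flattened))  # False
```

If the check prints False for your array, assigning into the reshaped result will not modify the original, and the loop version is the safer choice.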

Permutations on lists within lists sequentially

I have two lists and I would like to calculate the permutations between the two. I have been able to successfully do this using itertools, but am having trouble taking it further.
I have two nested lists:
list_1 = [0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125][10, 216, 67, 120, 70, 717, 42, 43, 445, 14, 87, 289, 125]
list_2 = [10, 9, 2, 1, 0][10, 216, 7, 10, 70, 717, 42, 3, 445, 14, 162, 87, 289, 125]
The first entry of list_1 ([0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125]) needs to be permuted with the first entry of list_2 ([10, 9, 2, 1, 0]). Then I need to get the permutations of the second entry of list_1 with the second entry of list_2, etc.
The issue is that there will be no set number of entries in each list, so it is not feasible to simply make variables for list_1[0], list_2[0], etc.
What would be the simplest way to do this?
import itertools
list_1 = ([0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125],
          [10, 216, 67, 120, 70, 717, 42, 43, 445, 14, 87, 289, 125])
list_2 = ([10, 9, 2, 1, 0],
          [10, 216, 7, 10, 70, 717, 42, 3, 445, 14, 162, 87, 289, 125])
count = 0
for list1_item, list2_item in zip(list_1, list_2):
    print(f"{list1_item=} {list2_item=}")
    for permutation in itertools.permutations(itertools.chain(list1_item, list2_item)):
        if count % 10**8 == 0:  # print once in a while
            print(permutation)
        count += 1
print(count)
print(f"last permutation: {permutation}")
gives
list1_item=[0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125] list2_item=[10, 9, 2, 1, 0]
(0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125, 10, 9, 2, 1, 0)
(0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 210, 125, 10, 9, 114, 22, 0, 87, 2, 1, 16, 28)
(0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 28, 16, 210, 22, 125, 9, 1, 2, 87, 10, 114, 0)
...
list1_item=[10, 216, 67, 120, 70, 717, 42, 43, 445, 14, 87, 289, 125] list2_item=[10, 216, 7, 10, 70, 717, 42, 3, 445, 14, 162, 87, 289, 125]
(10, 216, 67, 120, 70, 717, 42, 43, 445, 14, 87, 289, 125, 10, 216, 7, 10, 70, 717, 42, 3, 445, 14, 162, 87, 289, 125)
...
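To see why the code above only prints once every 10**8 permutations, it helps to count how many there are before iterating. This is a rough sketch using the first pair of entries from the question; n_items and n_permutations are names I made up for the illustration.

```python
import math

list1_item = [0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125]
list2_item = [10, 9, 2, 1, 0]

# itertools.permutations over n items yields n! tuples. Items are treated as
# distinct by position, so repeated values (the two 0s here) do not reduce the count.
n_items = len(list1_item) + len(list2_item)
n_permutations = math.factorial(n_items)
print(n_items, n_permutations)
```

With 22 items the count is above 10**21, so the loop will not finish in any practical amount of time; it is only usable if you stop early or work with much shorter lists.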

How to use tf dataset sharding AFTER a shuffle operation, but not repeat entries

I am trying to solve the following problem:
I have 128 files that I want to break into 4 subsets. Each time around, I want the division to be different.
If I do tf.data.Dataset.list_files('glob_pattern', shuffle=False), the dataset has the right number of files. Sharding this works as expected, but each shard only ever has the same files.
I want to shard and end up with a different division of the files on each pass through the data. However, if I set shuffle=True, then each shard seems to have its own copy of the original dataset, meaning that I can see the same file multiple times before seeing all the files once.
Is there an idiomatic way of splitting these files?
Basically, I'm wondering why the original list_files dataset is able to have some files show up multiple times before all the files have been seen.
Here is some TF2.0 code to see the problem:
import tensorflow as tf
ds = tf.data.Dataset.from_tensor_slices([f'train_{str(i).zfill(5)}' for i in range(128)])
ds = ds.shuffle(128)
n_splits = 4
sub_datasets = [ds.shard(n_splits, i) for i in range(n_splits)]
output = []
# go through each of the subsets
for i in range(n_splits):
    results = [x.numpy().decode() for x in sub_datasets[i]]
    output.extend(results)
print(len(set(output)), 'is the number of unique files seen (128 desired)')
Here's an answer based on what I understand of your question. To generate a new set of 4 randomly shuffled sub-datasets (shards) each time, you can use the following code.
import numpy as np
import tensorflow as tf
# ####################
# I used numbers for visualization ... feel free to replace with your demo code
# ####################
# ds = tf.data.Dataset.from_tensor_slices([f'train_{str(i).zfill(5)}' for i in range(128)])
# ####################
arr = np.arange(128)
ds = tf.data.Dataset.from_tensor_slices(arr)
def get_four_datasets(original_ds, window_size=32, shuffle_size=128):
    """Every time you call this function you will get four new datasets."""
    return original_ds.shuffle(shuffle_size).window(window_size)
remake_ds_1 = list()
remake_ds_2 = list()
for i, (dataset_1, dataset_2) in enumerate(zip(get_four_datasets(ds), get_four_datasets(ds))):
    print(f"\n\nDATASET #1-{i+1}")
    ds_subset = [value for value in dataset_1.as_numpy_iterator()]
    print("\t", ds_subset)
    remake_ds_1.extend(ds_subset)
    print(f"\nDATASET #2-{i+1}")
    ds_subset_2 = [value for value in dataset_2.as_numpy_iterator()]
    print("\t", ds_subset_2)
    remake_ds_2.extend(ds_subset_2)
print("\n\nCounts\n")
print("DS 1 ALL: ", len(remake_ds_1))
print("DS 1 UNIQUE: ", len(set(remake_ds_1)))
print("DS 2 ALL: ", len(remake_ds_2))
print("DS 2 UNIQUE: ", len(set(remake_ds_2)))
OUTPUT
DATASET #1-1
[96, 4, 66, 120, 42, 54, 110, 57, 67, 7, 13, 9, 69, 86, 122, 88, 10, 55, 27, 106, 77, 107, 114, 87, 59, 81, 1, 49, 118, 17, 36, 11]
DATASET #2-1
[47, 26, 122, 10, 110, 31, 86, 34, 52, 121, 36, 112, 55, 48, 50, 108, 100, 103, 113, 68, 58, 29, 32, 84, 124, 15, 38, 51, 6, 66, 24, 41]
DATASET #1-2
[56, 80, 94, 124, 52, 109, 83, 90, 112, 35, 6, 101, 20, 84, 73, 74, 100, 99, 108, 15, 14, 12, 89, 24, 8, 29, 68, 85, 125, 3, 33, 58]
DATASET #2-2
[125, 127, 74, 97, 12, 39, 109, 126, 98, 40, 99, 93, 35, 107, 91, 88, 45, 13, 106, 120, 19, 73, 83, 11, 105, 61, 16, 114, 79, 95, 94, 44]
DATASET #1-3
[105, 38, 43, 60, 0, 26, 127, 65, 22, 18, 123, 82, 121, 71, 51, 23, 113, 30, 63, 40, 2, 61, 16, 98, 64, 25, 41, 28, 45, 19, 117, 39]
DATASET #2-3
[75, 64, 1, 17, 7, 42, 80, 92, 3, 9, 54, 33, 82, 56, 118, 102, 115, 43, 28, 90, 60, 119, 0, 57, 123, 62, 22, 72, 65, 23, 30, 87]
DATASET #1-4
[48, 62, 31, 102, 111, 46, 103, 44, 116, 79, 21, 50, 53, 78, 93, 32, 95, 34, 92, 126, 104, 47, 119, 37, 5, 70, 97, 91, 76, 75, 72, 115]
DATASET #2-4
[4, 85, 21, 116, 78, 27, 117, 2, 59, 111, 69, 46, 63, 20, 49, 5, 81, 53, 18, 37, 8, 76, 71, 89, 14, 104, 25, 96, 67, 101, 77, 70]
Counts
DS 1 ALL: 128
DS 1 UNIQUE: 128
DS 2 ALL: 128
DS 2 UNIQUE: 128
If you just want a dataset that yields shuffled batches of 32 examples, and you want to iterate over it multiple times with a new set of batches on every pass, you can do the following.
import numpy as np
import tensorflow as tf
arr = np.arange(128)
N_REPEATS = 10
ds = tf.data.Dataset.from_tensor_slices(arr)
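ds = ds.shuffle(128).batch(32).repeat(N_REPEATS)
# Printing loop assumed here; the original snippet did not show how the output below was produced.
for i, batch in enumerate(ds):
    print(f"BATCH {i+1}: {batch.numpy().tolist()}")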
OUTPUT
BATCH 1: [92, 94, 76, 38, 58, 9, 44, 16, 86, 28, 64, 7, 60, 42, 31, 0, 46, 1, 83, 57, 18, 102, 67, 110, 113, 101, 93, 61, 96, 17, 105, 6]
BATCH 2: [59, 15, 121, 3, 72, 100, 50, 52, 45, 23, 87, 43, 33, 29, 62, 25, 74, 65, 75, 68, 4, 56, 117, 47, 73, 109, 106, 35, 88, 91, 119, 66]
BATCH 3: [98, 78, 125, 24, 99, 51, 14, 114, 26, 22, 54, 89, 79, 63, 30, 124, 20, 13, 2, 34, 95, 41, 85, 39, 37, 77, 90, 107, 104, 118, 27, 97]
BATCH 4: [49, 5, 53, 115, 126, 40, 108, 48, 8, 84, 120, 32, 82, 11, 112, 55, 80, 69, 12, 70, 111, 123, 81, 116, 71, 122, 36, 21, 103, 19, 127, 10]
BATCH 5: [74, 61, 97, 6, 127, 119, 65, 15, 78, 72, 99, 18, 41, 76, 79, 33, 0, 105, 103, 46, 14, 50, 113, 26, 43, 45, 100, 90, 28, 48, 19, 9]
BATCH 6: [35, 20, 3, 64, 5, 96, 114, 34, 126, 85, 124, 69, 110, 54, 109, 24, 104, 32, 73, 92, 11, 13, 58, 107, 84, 88, 59, 75, 95, 40, 16, 101]
BATCH 7: [93, 66, 106, 44, 102, 125, 7, 30, 12, 116, 87, 111, 81, 56, 83, 37, 31, 77, 67, 21, 118, 1, 120, 36, 86, 62, 71, 98, 82, 52, 25, 27]
BATCH 8: [112, 68, 60, 70, 115, 117, 29, 91, 57, 10, 121, 89, 4, 2, 122, 39, 51, 22, 53, 63, 108, 94, 42, 17, 8, 23, 80, 38, 55, 49, 47, 123]
BATCH 9: [67, 20, 101, 123, 109, 4, 39, 65, 34, 71, 22, 62, 73, 81, 114, 112, 66, 35, 43, 49, 92, 68, 1, 54, 27, 103, 46, 12, 82, 6, 119, 99]
BATCH 10: [86, 69, 13, 44, 16, 50, 75, 61, 58, 104, 64, 47, 95, 10, 79, 70, 97, 63, 45, 17, 56, 74, 87, 53, 91, 21, 48, 76, 9, 51, 28, 126]
...
...
...
BATCH 40: [10, 41, 29, 39, 57, 127, 101, 106, 55, 62, 72, 76, 124, 81, 66, 126, 53, 24, 33, 49, 102, 75, 34, 61, 47, 15, 21, 121, 8, 94, 52, 13]
Please let me know if I misunderstood and I can update accordingly.
You need to set reshuffle_each_iteration=False:
ds = ds.shuffle(128, reshuffle_each_iteration=False)
Full code:
import tensorflow as tf
ds = tf.data.Dataset.from_tensor_slices([f'train_{str(i).zfill(5)}' for i in range(128)])
ds = ds.shuffle(128, reshuffle_each_iteration=False)
n_splits = 4
sub_datasets = [ds.shard(n_splits, i) for i in range(n_splits)]
output = []
# go through each of the subsets
for i in range(n_splits):
    results = [x.numpy().decode() for x in sub_datasets[i]]
    output.extend(results)
print(len(set(output)), 'is the number of unique files seen (128 desired)')
128 is the number of unique files seen (128 desired)
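As I understand it, the duplicates in the question appear because each shard iterates the shuffled dataset separately, and with the default reshuffle_each_iteration=True every one of those iterations gets a new order. The sketch below is a plain-Python analogy of that behaviour, not TensorFlow code; the shard helper and the variable names are made up for the illustration.

```python
import random

files = [f"train_{i:05d}" for i in range(128)]
n_splits = 4


def shard(items, num_shards, index):
    """Keep every num_shards-th item starting at index (mimics Dataset.shard)."""
    return items[index::num_shards]


# Analogy for reshuffling on every iteration: each shard is cut from its own
# fresh ordering, so the four shards do not partition the file list.
rng = random.Random(0)
seen_reshuffled = set()
for i in range(n_splits):
    order = rng.sample(files, len(files))  # a new shuffle for every shard
    seen_reshuffled.update(shard(order, n_splits, i))

# Analogy for a fixed shuffle: one ordering shared by all shards.
order = random.Random(0).sample(files, len(files))
seen_fixed = set()
for i in range(n_splits):
    seen_fixed.update(shard(order, n_splits, i))

print(len(seen_reshuffled), len(seen_fixed))
```

The second number is always 128, because slicing one ordering at offsets 0 to 3 covers every element exactly once. The first is usually smaller. To get a different division on every pass while keeping this property, one option would be to build a new fixed shuffle per pass, though I have not tested that with tf.data.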

Split a python list if it contains two consecutive zero's

I have a python list that represents cities, by integers. City 0 is HQ. For example, a possible route list would be:
[0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0,
0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0,
0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]
This list contains three routes. How do I split this list wherever it has two consecutive zeros? So basically my expected output is a nested list where all the elements are separate routes, like so:
[[0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0],
[0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0],
[0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]]
I've tried a number of list comprehensions but I cannot seem to find the right solution...
routes = []
current_route = []
in_route = False
for city in entire_trip:  # entire_trip is your initial list
    if not in_route:
        if city != 0:
            in_route = True
            current_route.append(0)
            current_route.append(city)
    else:
        if city == 0:
            current_route.append(0)
            routes.append(current_route)
            current_route = []
            in_route = False
        else:
            current_route.append(city)
print(routes)
Just goes to show, 10 ways to do anything ...
Here is a super simple approach with minimal loops, if's and variables:
from itertools import zip_longest
# Setup testing list.
l = [0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0,
0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0,
0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]
# Initialise.
d = {}
k = 0
# Split list based on requirements.
for i, j in zip_longest(l, l[1:]):
    d.setdefault(k, []).append(i)
    if all([i == 0, j == 0]):
        k += 1
# Unpack results.
result = [v for v in d.values()]
Output:
[[0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0],
[0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0],
[0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]]
It is not the best solution but...
numbers = [0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0,
0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0,
0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]
n = []
pointer = 0
for i in range(len(numbers) - 1):
    if numbers[i] == 0 and numbers[i+1] == 0:
        x = numbers[pointer:i+1]
        pointer = i+1
        n.append(x)
x = numbers[pointer:]
n.append(x)
print(n)
Because you care about an element's relation to its neighbor, I wouldn't recommend using a list comprehension.
nums=[0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0,
0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0,
0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]
split_on = []
for i in range(len(nums)-1):
    if nums[i] == 0 and nums[i+1] == 0:
        split_on.append(i+1)
Now that we have the "split points", we want to grab the different sub-lists at those indices.
current_ind = 0
visits = []
for index in split_on:
    visits.append(nums[current_ind:index])
    current_ind = index
Now we have all but the last leg of the trip, so we add it:
visits.append(nums[current_ind:])
x = [0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0,
0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0,
0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]
item = []
final = []
for i in x:
    if i != 0:
        item.append(i)
    else:
        item.append(i)
        if len(item) > 1:
            final.append(item)
            item = []
print(final)
Output:
[[0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0], [0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0], [0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]]
numpy approach:
import numpy as np
x=[0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0,
0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0,
0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]
res=np.array(x)
res=np.split(res, np.arange(1, len(res))[np.logical_and(res[:-1]==0, res[1:]==0)])
Outputs:
>>> res
[array([ 0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126,
78, 0]), array([ 0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54,
0]), array([ 0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109,
39, 53, 0])]
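The one-liner above packs several steps together. Here is a sketch that spells them out on a shorter, made-up list; the names double_zero, split_points and chunks are mine.

```python
import numpy as np

res = np.array([0, 7, 3, 0, 0, 5, 0, 0, 9, 2, 0])

# True at position i when res[i] and res[i + 1] are both zero.
double_zero = np.logical_and(res[:-1] == 0, res[1:] == 0)

# np.arange(1, len(res)) lines up index i + 1 with mask position i,
# so the selected indices are where each new chunk starts.
split_points = np.arange(1, len(res))[double_zero]
print(split_points)  # [4 7]

chunks = [chunk.tolist() for chunk in np.split(res, split_points)]
print(chunks)  # [[0, 7, 3, 0], [0, 5, 0], [0, 9, 2, 0]]
```

Calling tolist() on each chunk is optional; it only converts the arrays back to plain lists, matching the output format asked for in the question.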
def break_after_two_zero(array):
    curr_list = []
    for index, value in enumerate(array[:-1]):
        curr_list.append(value)
        if value == array[index + 1] == 0:
            yield curr_list
            curr_list = list()
    # The loop stops one element short, so add the final element here.
    if array:
        curr_list.append(array[-1])
    if curr_list:
        yield curr_list
data = [0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0,
        0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0,
        0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]
print(list(break_after_two_zero(data)))
# >>> [[0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0],
#      [0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0],
#      [0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]]
data = [0, 0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0, 0]
print(list(break_after_two_zero(data)))
# >>> [[0], [0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0], [0]]
data = [0, 7, 40, 41, 34, 96, 75, 0, 0, 0, 127, 48, 65, 79, 27, 126, 78, 0]
print(list(break_after_two_zero(data)))
# >>> [[0, 7, 40, 41, 34, 96, 75, 0],
#      [0],
#      [0, 127, 48, 65, 79, 27, 126, 78, 0]]
Other possible solutions:
res = [[]]
for i in range(len(data)):
    res[-1].append(data[i])
    if data[i] == 0 and i < len(data) - 1 and data[i + 1] == 0:
        res.append([])
and
it = iter(data)
res = [[0, *iter(it.__next__, 0), 0] for _ in it]
print(res)
Both give the same output:
[
[0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0],
[0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0],
[0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0],
]
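The second version relies on the two-argument form of iter, which may be unfamiliar. The sketch below walks through a single step of it on a shorter, made-up list; first_route_body and remaining are illustrative names.

```python
data = [0, 7, 3, 0, 0, 5, 0, 0, 9, 2, 0]
it = iter(data)

next(it)  # consume the leading 0 of the first route (the "for _ in it" part)

# iter(callable, sentinel) keeps calling `callable` until it returns the
# sentinel; the sentinel itself is consumed but not yielded.
first_route_body = list(iter(it.__next__, 0))
print(first_route_body)  # [7, 3]

# The shared iterator now sits just after the closing 0 of the first route.
remaining = list(it)
print(remaining)  # [0, 5, 0, 0, 9, 2, 0]
```

Because the comprehension re-adds a 0 at both ends of every route, it assumes each route is wrapped in exactly one pair of zeros, as in the question's data.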
Another possible solution:
Convert the list to a string, then process the string.
aa = [0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0,
0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0,
0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]
bb = map(str, aa)
str_city = "-".join(bb)
routes = str_city.replace('-0-0','+').split('+')
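# list(...) is needed in Python 3, where map returns a lazy iterator.
# Caveat: strip('0') would also eat trailing zeros of a city such as 40 at the end of a chunk.
current_routes = [list(map(int, "0-{0}-0".format(route.strip('0').strip('-')).split('-'))) for route in routes]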
print(current_routes)
result:
[[0, 7, 40, 41, 34, 96, 75, 127, 48, 65, 79, 27, 126, 78, 0],
[0, 56, 45, 2, 67, 66, 124, 59, 82, 133, 102, 57, 54, 0],
[0, 64, 97, 81, 87, 80, 61, 98, 52, 101, 83, 60, 109, 39, 53, 0]]

Combine numpy subarrays of varying dimensions

I have a nested numpy array (dtype=object). It contains 333 arrays that increase consistently in size from 52x1 to 52x333.
I would like to effectively extract and concatenate these arrays so that I have a single 52x55611 array.
I imagine this may be straightforward, but my attempts using numpy.reshape have been unsuccessful.
If you want to stack them along the second axis, you can use numpy.hstack.
list_of_arrays = [ array_1, ..., array_n] #all these arrays have same shape[0]
big_array = np.hstack( list_of_arrays)
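If I have understood you correctly, you could use numpy.concatenate. It joins along the first axis by default; pass axis=1 to join along the second axis, as in the question.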
>>> import numpy as np
>>> a = np.array([range(52)])
>>> b = np.array([range(52,104), range(104, 156)])
>>> np.concatenate((a,b))
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51],
[ 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103],
[104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155]])
>>>
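To check either suggestion against the shapes in the question, here is a small sketch. The nested array is a hypothetical stand-in built with np.full, since the real data is not shown.

```python
import numpy as np

# Hypothetical stand-in for the nested array: 333 arrays of shape (52, k), k = 1..333.
nested = np.empty(333, dtype=object)
for k in range(1, 334):
    nested[k - 1] = np.full((52, k), k)

# Turn the object array into a plain list of 2-D arrays, then join along axis 1.
combined = np.hstack(list(nested))
print(combined.shape)  # (52, 55611)

# np.concatenate with axis=1 gives the same result.
same = np.concatenate(list(nested), axis=1)
print(np.array_equal(combined, same))  # True
```

The column count works out because 1 + 2 + ... + 333 = 333 * 334 / 2 = 55611.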
