Combine numpy subarrays of varying dimensions

Combine numpy subarrays of varying dimensions - python

I have a nested numpy array (dtype=object), it contains 333 arrays that increase consistently from size 52x1 to size 52x333
I would like to effectively extract and concatenate these arrays so that I have a single 52x55611 array
I imagine this may be straightforward but my attempts using numpy.reshape have been unsuccesful

If you want to stack them along the second axis, you can use numpy.hstack.
list_of_arrays = [ array_1, ..., array_n] #all these arrays have same shape[0]
big_array = np.hstack( list_of_arrays)

if I have understood you correctly, you could use numpy.concatenate.
>>> import numpy as np
>>> a = np.array([range(52)])
>>> b = np.array([range(52,104), range(104, 156)])
>>> np.concatenate((a,b))
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51],
[ 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103],
[104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155]])
>>>

Related

Graph contraction not working as expected

My input is a graph G, written as an adjacency list (each row of G starts with a vertex and all further values are the vertices adjacent to it).
There are 200 vertices in total.
I'm trying to contract the graph randomly, until only two vertices are left (part of the Karger algorithm).
My issue is that after several iterations, v2's index can't be found in G.
It appears that my code merging v2 into v1 doesn't work, as the same vertex is picked multiple times as v2, but I've no idea why.
removed = []
n = len(G) # number of vertices in G
while n > 2:
# Randomly choosing two vertices (and their index)
iv1 = random.randint(0, n - 1)
v1 = G[iv1][0]
v2 = random.choice(G[iv1][1:])
iv2 = None
for index, sublist in enumerate(G):
if sublist[0] is v2:
iv2 = index
# Debug code
removed.append(v2)
if iv2 is None:
print("===")
print("removed=", removed)
print("len set=", len(set(removed)), " len list=", len(removed))
print("G[iv1]=", G[iv1])
print("v1=", v1, " iv1=", iv1, " v2=", v2, " iv2=", iv2, "n=", n)
print("===")
break
# Graph Contraction (v1 and v2 merged, v1 becomes the merged vertex)
G[iv2].remove(v1) # Removing self-loops
G[iv1] += G[iv2][1:] # Vertices adjacent to v2 now adjacent to v1 (1/2)
G[iv1].remove(v2) # Removing self-loops
del G[iv2]
n -= 1
for i in range(n):
if G[i][0] is not v1:
G[i] = [v1 if vert is v2 else vert for vert in G[i]] # (2/2)
return len(G[0])
Here's an output example :
===
removed= [91, 98, 173, 23, 169, 179, 85, 54, 89, 110, 180, 2, 37, 17, 73, 43, 77, 34, 66, 19, 51, 178, 61, 99, 26, 52, 162, 111, 22, 149, 57, 118, 120, 30, 4, 28, 5, 27, 147, 188, 75, 136, 32, 40, 156, 145, 70, 138, 36, 12, 41, 140, 55, 152, 105, 60, 81, 64, 142, 45, 7, 148, 164, 49, 183, 165, 78, 74, 158, 160, 24, 146, 141, 182, 97, 116, 86, 96, 177, 186, 65, 135, 76, 9, 108, 3, 88, 151, 115, 42, 167, 185, 8, 190, 189, 175, 194, 184, 153, 196, 126, 195, 197, 107, 58, 6, 104, 117, 56, 199, 82, 168, 130, 29, 87, 121, 109, 90, 18, 132, 163, 198, 125, 13, 21, 154, 103, 72, 174, 187, 171, 80, 161, 191, 150, 137, 106, 79, 192, 1, 50, 155, 159, 35, 172, 176, 139, 20, 63, 38, 84, 119, 69, 94, 68, 193, 10, 95, 130]
len set= 158 len list= 159
G[iv1]= [123, 92, 92, 92, 129, 92, 11, 92, 92, 92, 92, 33, 47, 92, 92, 129, 92, 92, 92, 92, 69, 69, 33, 92, 129, 13, 128, 134, 92, 69, 92, 134, 92, 13, 114, 47, 13, 13, 128, 44, 134, 33, 123, 44, 181, 69, 33, 92, 16, 69, 134, 33, 157, 44, 83, 47, 181, 33, 92, 44, 92, 92, 181, 134, 129, 170, 92, 47, 129, 47, 44, 16, 181, 92, 44, 134, 157, 92, 11, 33, 181, 33, 92, 48, 92, 33, 13, 134, 130, 47, 92, 69, 92, 92, 134, 134, 92, 47, 123, 69, 92, 129, 130, 92, 114, 69, 69, 92, 44, 129, 157, 123, 92, 44, 134, 13, 11, 47, 13, 47, 92, 181, 134, 123, 47, 128, 92, 181, 92, 44, 48, 123, 134, 69, 33, 92, 129, 33, 123, 16, 130, 33, 92, 44, 92, 13, 44, 92, 157, 129, 114, 181, 47, 69, 92, 92]
v1= 123 iv1= 27 v2= 130 iv2= None n= 42
===

How to use tf dataset sharding AFTER a shuffle operation, but not repeat entries

I am trying to solve the following problem:
I have 128 files that I want to break into 4 subsets. Each time around, I want the division to be different.
If I do tf.data.Dataset.list_files('glob_pattern', shuffle=False), the dataset has the right number of files. Sharding this works as expected, but each shard only ever has the same files.
I want to shard and end up with a different division of the files each go-through the data. However, if I turn shuffle=True, then each shard seems to have its own copy of the original dataset, meaning that I can see the same file multiple times before seeing all the files once.
Is there an idiomatic way of splitting these files?
Basically, I'm wondering why the original list_files dataset is able to have some files show up multiple times before all the files have been seen.
Here is some TF2.0 code to see the problem:
ds = tf.data.Dataset.from_tensor_slices([f'train_{str(i).zfill(5)}' for i in range(128)])
ds = ds.shuffle(128)
n_splits = 4
sub_datasets = [ds.shard(n_splits, i) for i in range(n_splits)]
output = []
# go through each of the subsets
for i in range(n_splits):
results = [x.numpy().decode() for x in sub_datasets[i]]
output.extend(results)
print(len(set(output)), 'is the number of unique files seen (128 desired)')

Here's an answer from what I can understand of your question. To generate a new subset of 4 datasets (shared) each time randomly shuffled, you can use the following code.
import numpy as np
import tensorflow as tf
# ####################
# I used numbers for visualization ... feel free to replace with your demo code
# ####################
# ds = tf.data.Dataset.from_tensor_slices([f'train_{str(i).zfill(5)}' for i in range(128)])
# ####################
arr = np.arange(128)
ds = tf.data.Dataset.from_tensor_slices(arr)
def get_four_datasets(original_ds, window_size=32, shuffle_size=128):
""" Every time you call this function you will get a new four datasets """
return original_ds.shuffle(shuffle_size).window(window_size)
remake_ds_1 = list()
remake_ds_2 = list()
for i, (dataset_1, dataset_2) in enumerate(zip(get_four_datasets(ds), get_four_datasets(ds))):
print(f"\n\nDATASET #1-{i+1}")
ds_subset = [value for value in dataset_1.as_numpy_iterator()]
print("\t", ds_subset)
remake_ds_1.extend(ds_subset)
print(f"\nDATASET #2-{i+1}")
ds_subset_2 = [value for value in dataset_2.as_numpy_iterator()]
print("\t", ds_subset_2)
remake_ds_2.extend(ds_subset_2)
print("\n\nCounts\n")
print("DS 1 ALL: ", len(remake_ds_1))
print("DS 1 UNIQUE: ", len(set(remake_ds_1)))
print("DS 2 ALL: ", len(remake_ds_2))
print("DS 2 UNIQUE: ", len(set(remake_ds_2)))
OUTPUT
DATASET #1-1
[96, 4, 66, 120, 42, 54, 110, 57, 67, 7, 13, 9, 69, 86, 122, 88, 10, 55, 27, 106, 77, 107, 114, 87, 59, 81, 1, 49, 118, 17, 36, 11]
DATASET #2-1
[47, 26, 122, 10, 110, 31, 86, 34, 52, 121, 36, 112, 55, 48, 50, 108, 100, 103, 113, 68, 58, 29, 32, 84, 124, 15, 38, 51, 6, 66, 24, 41]
DATASET #1-2
[56, 80, 94, 124, 52, 109, 83, 90, 112, 35, 6, 101, 20, 84, 73, 74, 100, 99, 108, 15, 14, 12, 89, 24, 8, 29, 68, 85, 125, 3, 33, 58]
DATASET #2-2
[125, 127, 74, 97, 12, 39, 109, 126, 98, 40, 99, 93, 35, 107, 91, 88, 45, 13, 106, 120, 19, 73, 83, 11, 105, 61, 16, 114, 79, 95, 94, 44]
DATASET #1-3
[105, 38, 43, 60, 0, 26, 127, 65, 22, 18, 123, 82, 121, 71, 51, 23, 113, 30, 63, 40, 2, 61, 16, 98, 64, 25, 41, 28, 45, 19, 117, 39]
DATASET #2-3
[75, 64, 1, 17, 7, 42, 80, 92, 3, 9, 54, 33, 82, 56, 118, 102, 115, 43, 28, 90, 60, 119, 0, 57, 123, 62, 22, 72, 65, 23, 30, 87]
DATASET #1-4
[48, 62, 31, 102, 111, 46, 103, 44, 116, 79, 21, 50, 53, 78, 93, 32, 95, 34, 92, 126, 104, 47, 119, 37, 5, 70, 97, 91, 76, 75, 72, 115]
DATASET #2-4
[4, 85, 21, 116, 78, 27, 117, 2, 59, 111, 69, 46, 63, 20, 49, 5, 81, 53, 18, 37, 8, 76, 71, 89, 14, 104, 25, 96, 67, 101, 77, 70]
Counts
DS 1 ALL: 128
DS 1 UNIQUE: 128
DS 2 ALL: 128
DS 2 UNIQUE: 128
If you just want to generate a dataset where every 32 examples pulled from the dataset is shuffled and you want to iterate over the dataset multiple times getting new 32-set samples every time, you can do the following.
import numpy as np
import tensorflow as tf
arr = np.arange(128)
N_REPEATS = 10
ds = tf.data.Dataset.from_tensor_slices(arr)
ds = ds.shuffle(128).batch(32).repeat(N_REPEATS)
OUTPUT
BATCH 1: [92, 94, 76, 38, 58, 9, 44, 16, 86, 28, 64, 7, 60, 42, 31, 0, 46, 1, 83, 57, 18, 102, 67, 110, 113, 101, 93, 61, 96, 17, 105, 6]
BATCH 2: [59, 15, 121, 3, 72, 100, 50, 52, 45, 23, 87, 43, 33, 29, 62, 25, 74, 65, 75, 68, 4, 56, 117, 47, 73, 109, 106, 35, 88, 91, 119, 66]
BATCH 3: [98, 78, 125, 24, 99, 51, 14, 114, 26, 22, 54, 89, 79, 63, 30, 124, 20, 13, 2, 34, 95, 41, 85, 39, 37, 77, 90, 107, 104, 118, 27, 97]
BATCH 4: [49, 5, 53, 115, 126, 40, 108, 48, 8, 84, 120, 32, 82, 11, 112, 55, 80, 69, 12, 70, 111, 123, 81, 116, 71, 122, 36, 21, 103, 19, 127, 10]
BATCH 5: [74, 61, 97, 6, 127, 119, 65, 15, 78, 72, 99, 18, 41, 76, 79, 33, 0, 105, 103, 46, 14, 50, 113, 26, 43, 45, 100, 90, 28, 48, 19, 9]
BATCH 6: [35, 20, 3, 64, 5, 96, 114, 34, 126, 85, 124, 69, 110, 54, 109, 24, 104, 32, 73, 92, 11, 13, 58, 107, 84, 88, 59, 75, 95, 40, 16, 101]
BATCH 7: [93, 66, 106, 44, 102, 125, 7, 30, 12, 116, 87, 111, 81, 56, 83, 37, 31, 77, 67, 21, 118, 1, 120, 36, 86, 62, 71, 98, 82, 52, 25, 27]
BATCH 8: [112, 68, 60, 70, 115, 117, 29, 91, 57, 10, 121, 89, 4, 2, 122, 39, 51, 22, 53, 63, 108, 94, 42, 17, 8, 23, 80, 38, 55, 49, 47, 123]
BATCH 9: [67, 20, 101, 123, 109, 4, 39, 65, 34, 71, 22, 62, 73, 81, 114, 112, 66, 35, 43, 49, 92, 68, 1, 54, 27, 103, 46, 12, 82, 6, 119, 99]
BATCH 10: [86, 69, 13, 44, 16, 50, 75, 61, 58, 104, 64, 47, 95, 10, 79, 70, 97, 63, 45, 17, 56, 74, 87, 53, 91, 21, 48, 76, 9, 51, 28, 126]
...
...
...
BATCH 40: [10, 41, 29, 39, 57, 127, 101, 106, 55, 62, 72, 76, 124, 81, 66, 126, 53, 24, 33, 49, 102, 75, 34, 61, 47, 15, 21, 121, 8, 94, 52, 13]
Please let me know if I misunderstood and I can update accordingly.

You need to set reshuffle_each_iteration=False:
ds = ds.shuffle(128, reshuffle_each_iteration=False)
Full code:
import tensorflow as tf
ds = tf.data.Dataset.from_tensor_slices([f'train_{str(i).zfill(5)}' for i in range(128)])
ds = ds.shuffle(128, reshuffle_each_iteration=False)
n_splits = 4
sub_datasets = [ds.shard(n_splits, i) for i in range(n_splits)]
output = []
# go through each of the subsets
for i in range(n_splits):
results = [x.numpy().decode() for x in sub_datasets[i]]
output.extend(results)
print(len(set(output)), 'is the number of unique files seen (128 desired)')
128 is the number of unique files seen (128 desired)

How to return a list of strings stored in an HDF5 file

Hope someone can shed some light on this. I am trying to learn my way around with HDF5 files. Somehow this list of strings gets encoded into the file as a array of integers but I'm not able to figure out how to go about decoding it. I can plug the file back into pandas using the read_hdf function, but that's not the point - I am trying to understand the encoding logic. Summarized here is the example I was working with.
smiles.txt =
structure
[11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24
[11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24
[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F
[11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3
[11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3
>>> import pandas as pd
>>> df = pd.read_csv('smiles.txt', header=0)
>>> df.to_hdf('smiles.h5', 'table')
I then explore the structure of the newly created HDF5 file:
>>> import h5py
>>> with h5py.File('smiles.h5',"r") as f:
>>> f.visit(print)
table
table/axis0
table/axis1
table/block0_items
table/block0_values
>>> with h5py.File('smiles_temp', 'r') as f:
>>> print(list(f.keys()))
>>> print(f['/thekey/axis0'][:])
>>> print(f['/thekey/axis1'][:])
>>> print(f['/thekey/block0_items'][:])
>>> print(f['/thekey/block0_values'][:])
['thekey']
[b'structure']
[0 1 2 3 4]
[b'structure']
[array([128, 4, 149, 123, 1, 0, 0, 0, 0, 0, 0, 140, 21,
110, 117, 109, 112, 121, 46, 99, 111, 114, 101, 46, 109, 117,
108, 116, 105, 97, 114, 114, 97, 121, 148, 140, 12, 95, 114,
101, 99, 111, 110, 115, 116, 114, 117, 99, 116, 148, 147, 148,
140, 5, 110, 117, 109, 112, 121, 148, 140, 7, 110, 100, 97,
114, 114, 97, 121, 148, 147, 148, 75, 0, 133, 148, 67, 1,
98, 148, 135, 148, 82, 148, 40, 75, 1, 75, 5, 75, 1,
134, 148, 104, 3, 140, 5, 100, 116, 121, 112, 101, 148, 147,
148, 140, 2, 79, 56, 148, 75, 0, 75, 1, 135, 148, 82,
148, 40, 75, 3, 140, 1, 124, 148, 78, 78, 78, 74, 255,
255, 255, 255, 74, 255, 255, 255, 255, 75, 63, 116, 148, 98,
137, 93, 148, 40, 140, 41, 91, 49, 49, 67, 72, 50, 93,
49, 78, 67, 67, 78, 50, 67, 91, 67, 64, 64, 72, 93,
51, 67, 67, 67, 91, 67, 64, 64, 72, 93, 51, 99, 52,
99, 99, 99, 99, 49, 99, 50, 52, 148, 140, 40, 91, 49,
49, 67, 72, 50, 93, 49, 78, 67, 67, 78, 50, 91, 67,
64, 64, 72, 93, 51, 67, 67, 67, 91, 67, 64, 64, 72,
93, 51, 99, 52, 99, 99, 99, 99, 49, 99, 50, 52, 148,
140, 54, 91, 49, 49, 67, 72, 51, 93, 99, 49, 99, 99,
99, 40, 99, 99, 49, 41, 99, 50, 99, 99, 40, 110, 110,
50, 99, 51, 99, 99, 99, 40, 99, 99, 51, 41, 83, 40,
61, 79, 41, 40, 61, 79, 41, 78, 41, 67, 40, 70, 41,
40, 70, 41, 70, 148, 140, 44, 91, 49, 49, 67, 72, 51,
93, 99, 49, 99, 99, 99, 99, 99, 49, 79, 91, 67, 64,
72, 93, 40, 91, 67, 64, 64, 72, 93, 50, 67, 78, 67,
67, 79, 50, 41, 99, 51, 99, 99, 99, 99, 99, 51, 148,
140, 44, 91, 49, 49, 67, 72, 51, 93, 99, 49, 99, 99,
99, 99, 99, 49, 83, 91, 67, 64, 72, 93, 40, 91, 67,
64, 64, 72, 93, 50, 67, 78, 67, 67, 79, 50, 41, 99,
51, 99, 99, 99, 99, 99, 51, 148, 101, 116, 148, 98, 46],
dtype=uint8)]
How does one go about returning the list of strings using h5py?

Just to clarify, the dataframe displays as:
In [2]: df = pd.read_csv('stack63452223.csv', header=0)
In [3]: df
Out[3]:
structure
0 [11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24
1 [11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24
2 [11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)...
3 [11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3
4 [11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3
In [11]: df._values
Out[11]:
array([['[11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24'],
['[11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24'],
['[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F'],
['[11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3'],
['[11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3']], dtype=object)
or as a list of strings:
In [24]: df['structure'].to_list()
Out[24]:
['[11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24',
'[11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24',
'[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F',
'[11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3',
'[11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3']
The h5 is written by pytables, which is different from h5py; generally h5py can read pytables, but the details can be complicated.
The top level keys:
['axis0', 'axis1', 'block0_items', 'block0_values']
A dataframe has axes (row and column). On another occasion I looked at how a dataframe stores its values, and found that it uses blocks, each holding columns with a common dtype. Here you have 1 column, and it is object dtype, since it contains strings.
Strings are bit awkward in HDF5, especially unicode. numpy arrays use a unicode string dtype; pandas uses object dtype, referencing Python strings (stored outside the dataframe). I suspect then that in saving such a frame pytables is using a more complex referencing scheme (that isn't immediately obvious via h5py).
Guess that's a long answer to just say I don't know.
Pandas own h5 load:
In [19]: pd.read_hdf('stack63452223.h5', 'table')
Out[19]:
structure
0 [11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24
1 [11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24
2 [11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)...
3 [11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3
4 [11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3
The h5 objects also have attrs,
In [38]: f['table'].attrs.keys()
Out[38]: <KeysViewHDF5 ['CLASS', 'TITLE', 'VERSION', 'axis0_variety', 'axis1_variety', 'block0_items_variety', 'encoding', 'errors', 'nblocks', 'ndim', 'pandas_type', 'pandas_version']>
Fiddling around I found that:
In [66]: x=f['table']['block0_values'][0]
In [67]: b''.join(x.view('S1').tolist())
Out[67]: b'\x80\x04\x95y\x01\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x05K\x01\x86\x94h\x03\x8c\x05dtype\x94\x93\x94\x8c\x02O8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01|\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK?t\x94b\x89]\x94(\x8c)[11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24\x94\x8c([11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24\x94\x8c6[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F\x94\x8c,[11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3\x94\x8c,[11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3\x94et\x94b.'
Looks like your strings are there. uint8 is a single byte dtype, which can be viewed as byte. Joining them I see your strings, concatenated in some fashion.
reformating:
Out[67]: b'\x80\x04\x95y\x01\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x05K\x01\x86\x94h\x03\x8c\x05dtype\x94\x93\x94\x8c\x02O8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01|\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK?t\x94b\x89]\x94(\x8c)
[11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24\x94\x8c(
[11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24\x94\x8c6
[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F\x94\x8c,
[11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3\x94\x8c,
[11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3\x94et\x94b.'

Doing operations with only Nth element of a list

I want to multiply only every second number on a list(0-100) but I just can't get it to work.

[x*10 for x in range(100) if x%2==0] # take vale to be multiply instead of 10, use x if you want to multiply with number it self
below code will only multiply even elements and keep other as it is.
def iterate_lis(get_list):
ls = []
for x in get_list:
if x%2==0:
ls.append(x*2)
else:
ls.append(x)
print(ls)
return ls
iterate_count = 5 # list will be iterate 5 times
for i in range(iterate_count):
if i ==0:
get_lis = iterate_lis(range(100))
else:
get_lis = iterate_lis(get_lis)
result for iterate_count=5 will be as follow:
>>>
[0, 1, 4, 3, 8, 5, 12, 7, 16, 9, 20, 11, 24, 13, 28, 15, 32, 17, 36, 19, 40, 21, 44, 23, 48, 25, 52, 27, 56, 29, 60, 31, 64, 33, 68, 35, 72, 37, 76, 39, 80, 41, 84, 43, 88, 45, 92, 47, 96, 49, 100, 51, 104, 53, 108, 55, 112, 57, 116, 59, 120, 61, 124, 63, 128, 65, 132, 67, 136, 69, 140, 71, 144, 73, 148, 75, 152, 77, 156, 79, 160, 81, 164, 83, 168, 85, 172, 87, 176, 89, 180, 91, 184, 93, 188, 95, 192, 97, 196, 99]
[0, 1, 8, 3, 16, 5, 24, 7, 32, 9, 40, 11, 48, 13, 56, 15, 64, 17, 72, 19, 80, 21, 88, 23, 96, 25, 104, 27, 112, 29, 120, 31, 128, 33, 136, 35, 144, 37, 152, 39, 160, 41, 168, 43, 176, 45, 184, 47, 192, 49, 200, 51, 208, 53, 216, 55, 224, 57, 232, 59, 240, 61, 248, 63, 256, 65, 264, 67, 272, 69, 280, 71, 288, 73, 296, 75, 304, 77, 312, 79, 320, 81, 328, 83, 336, 85, 344, 87, 352, 89, 360, 91, 368, 93, 376, 95, 384, 97, 392, 99]
[0, 1, 16, 3, 32, 5, 48, 7, 64, 9, 80, 11, 96, 13, 112, 15, 128, 17, 144, 19, 160, 21, 176, 23, 192, 25, 208, 27, 224, 29, 240, 31, 256, 33, 272, 35, 288, 37, 304, 39, 320, 41, 336, 43, 352, 45, 368, 47, 384, 49, 400, 51, 416, 53, 432, 55, 448, 57, 464, 59, 480, 61, 496, 63, 512, 65, 528, 67, 544, 69, 560, 71, 576, 73, 592, 75, 608, 77, 624, 79, 640, 81, 656, 83, 672, 85, 688, 87, 704, 89, 720, 91, 736, 93, 752, 95, 768, 97, 784, 99]
[0, 1, 32, 3, 64, 5, 96, 7, 128, 9, 160, 11, 192, 13, 224, 15, 256, 17, 288, 19, 320, 21, 352, 23, 384, 25, 416, 27, 448, 29, 480, 31, 512, 33, 544, 35, 576, 37, 608, 39, 640, 41, 672, 43, 704, 45, 736, 47, 768, 49, 800, 51, 832, 53, 864, 55, 896, 57, 928, 59, 960, 61, 992, 63, 1024, 65, 1056, 67, 1088, 69, 1120, 71, 1152, 73, 1184, 75, 1216, 77, 1248, 79, 1280, 81, 1312, 83, 1344, 85, 1376, 87, 1408, 89, 1440, 91, 1472, 93, 1504, 95, 1536, 97, 1568, 99]
[0, 1, 64, 3, 128, 5, 192, 7, 256, 9, 320, 11, 384, 13, 448, 15, 512, 17, 576, 19, 640, 21, 704, 23, 768, 25, 832, 27, 896, 29, 960, 31, 1024, 33, 1088, 35, 1152, 37, 1216, 39, 1280, 41, 1344, 43, 1408, 45, 1472, 47, 1536, 49, 1600, 51, 1664, 53, 1728, 55, 1792, 57, 1856, 59, 1920, 61, 1984, 63, 2048, 65, 2112, 67, 2176, 69, 2240, 71, 2304, 73, 2368, 75, 2432, 77, 2496, 79, 2560, 81, 2624, 83, 2688, 85, 2752, 87, 2816, 89, 2880, 91, 2944, 93, 3008, 95, 3072, 97, 3136, 99]

You can`t multiply with 0 , because the result will be always 0.
result=1
for i in range (1 , 10 ):
if i%2==0:
result*=i
print(result)

import numpy as np
l = range(100)
np.product(l[0::2])
This will give you every second element of your list and multiply all.

I want to change to list so: 1,2,3,4,5,6,7,8 becomes 1,4,3,8,5,16 etc.
Though I don't understand how you multiply 6 to end up with 16, I assume this is what you need:
new_list = []
for x in range(1,100):
if x % 2 == 0: new_list.append(x*2)
else: new_list.append(x)
print(new_list)
If the number is divisible by 2, you multiply it with 2 and append it to a new list. If not, you just append it without multiplying.
Running this program, you get the following output:
[1, 4, 3, 8, 5, 12, 7, 16, 9, 20, 11, 24, 13, 28, 15, 32, 17, 36, 19, 40, 21, 44, 23, 48, 25, 52, 27, 56, 29, 60, 31, 64, 33, 68, 35, 72, 37, 76, 39, 80, 41, 84, 43, 88, 45, 92, 47, 96, 49, 100, 51, 104, 53, 108, 55, 112, 57, 116, 59, 120, 61, 124, 63, 128, 65, 132, 67, 136, 69, 140, 71, 144, 73, 148, 75, 152, 77, 156, 79, 160, 81, 164, 83, 168, 85, 172, 87, 176, 89, 180, 91, 184, 93, 188, 95, 192, 97, 196, 99]

Use range() to generate the indexes of the entries you want to change...
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
for i in range(1, len(numbers), 2):
numbers[i] *= 2
will result in numbers containing the list
[1, 4, 3, 8, 5, 12, 7, 16]

Iterating Swap Algorithm Python

I have an algorithm. I want that last solution of the algorithm if respect certains conditions become the first solution. In my case I have this:
First PArt
Split the multidimensional array q in 2 parts
split_at = q[:,3].searchsorted([1,random.randrange(LB,UB-I)])
D = numpy.split(q, split_at)
Change and rename the splitted matrix:
S=B[1]
SF=B[2]
S2=copy(SF)
S2[:,3]=S2[:,3]+I
Define a function f:
f=sum(S[:,1]*S[:,3])+sum(S2[:,1]*S2[:,3])
This first part is an obligated passage.
Second Passage
Then I split again the array in 2 parts:
split_at = q[:,3].searchsorted([1,random.randrange(LB,UB-I)])
D = numpy.split(q, split_at)
I rename and change parts of the matrix(like in the first passage:
T=D[1]
TF=D[2]
T2=copy(TF)
T2[:,3]=T2[:,3]+I
u=random.sample(T[:],1) #I random select an array from T
v=random.sample(T2[:],1) #random select an array from T2
u=array(u)
v=array(v)
Here is my first problem: I want to continue the algorithm only if v[0,0]-u[0,0]+T[-1,3]<=UB, if not I want to repeat Second Passage until the condition is verified.
Now I swap 1 random array from T with another from T2:
x=numpy.where(v==T2)[0][0]
y=numpy.where(u==T)[0][0]
l=np.copy(T[y])
T[y],T2[x]=T2[x],T[y]
T2[x],l=l,T2[x]
I modified and recalculate some in the matrix:
E=np.copy(T)
E2=np.copy(T2)
E[:,3]=np.cumsum(E[:,0])
E2[:,3]=np.cumsum(E2[:,0])+I
Define f2:
f2=sum(E[:,1]*E[:,3])+sum(E2[:,1]*E2[:,3])
Here my second and last problem. I need to iterate this algorithm. If f-f2<0 my new starting solution has to be E and E2 and my new f has to be f2 and iterate excluding last choice the algorithm (recalcultaing a new f and f2).
Thank you for the patience. I'm a noob :D
EDIT:
I have an example here(this part goes before the code I have written on top)
import numpy as np
import random
p=[ 29, 85, 147, 98, 89, 83, 49, 7, 48, 88, 106, 97, 2,
107, 33, 144, 123, 84, 25, 42, 17, 82, 125, 103, 31, 110,
34, 100, 36, 46, 63, 18, 132, 10, 26, 119, 133, 15, 138,
113, 108, 81, 118, 116, 114, 130, 134, 86, 143, 126, 104, 52,
102, 8, 90, 11, 87, 37, 68, 75, 69, 56, 40, 70, 35,
71, 109, 5, 131, 121, 73, 38, 149, 20, 142, 91, 24, 53,
57, 39, 80, 79, 94, 136, 111, 78, 43, 92, 135, 65, 140,
148, 115, 61, 137, 50, 77, 30, 3, 93]
w=[106, 71, 141, 134, 14, 53, 57, 128, 119, 6, 4, 2, 140,
63, 51, 126, 35, 21, 125, 7, 109, 82, 95, 129, 67, 115,
112, 31, 114, 42, 91, 46, 108, 60, 97, 142, 85, 149, 28,
58, 52, 41, 22, 83, 86, 9, 120, 30, 136, 49, 84, 38,
70, 127, 1, 99, 55, 77, 144, 105, 145, 132, 45, 61, 81,
10, 36, 80, 90, 62, 32, 68, 117, 64, 24, 104, 131, 15,
47, 102, 100, 16, 89, 3, 147, 48, 148, 59, 143, 98, 88,
118, 121, 18, 19, 11, 69, 65, 123, 93]
p=array(p,'double')
w=array(w,'double')
r=p/w
LB=12
UB=155
I=9
j=p,w,r
j=transpose(j)
k=j[j[:,2].argsort()]
c=np.cumsum(k[:,0])
q=k[:,0],k[:,1],k[:,2],c
q=transpose(q)
o=sum(q[:,1]*q[:,3])
split_at = q[:,3].searchsorted([1,UB-I])
B = numpy.split(q, split_at)
S=B[1]
SF=B[2]
S2=copy(SF)
S2[:,3]=S2[:,3]+I
f=sum(S[:,1]*S[:,3])+sum(S2[:,1]*S2[:,3])
split_at = q[:,3].searchsorted([1,random.randrange(LB,UB-I)])
D = numpy.split(q, split_at)
T=D[1]
TF=D[2]
T2=copy(TF)
T2[:,3]=T2[:,3]+I
u=random.sample(T[:],1)
v=random.sample(T2[:],1)
u=array(u)
v=array(v)
x=numpy.where(v==T2)[0][0]
y=numpy.where(u==T)[0][0]
l=np.copy(T[y])
T[y],T2[x]=T2[x],T[y]
T2[x],l=l,T2[x]
E=np.copy(T)
E2=np.copy(T2)
E[:,3]=np.cumsum(E[:,0])
E2[:,3]=np.cumsum(E2[:,0])+I
f2=sum(E[:,1]*E[:,3])+sum(E2[:,1]*E2[:,3])
I tried:
def DivideRandom(T,T2):
split_at = q[:,3].searchsorted([1,random.randrange(LB,UB-I)])
D = numpy.split(q, split_at)
T=D[1]
TF=D[2]
T2=copy(TF)
T2[:,3]=T2[:,3]+I
Divide(T,T2)
def SelectJob(u,v):
u=random.sample(T[:],1)
v=random.sample(T2[:],1)
u=array(u)
v=array(v)
SelectJob(u,v)
d=v[0,0]-u[0,0]+T[-1,3]
def Swap(u,v):
x=numpy.where(v==T2)[0][0]
y=numpy.where(u==T)[0][0]
l=np.copy(T[y])
T[y],T2[x]=T2[x],T[y]
T2[x],l=l,T2[x]
E=np.copy(T)
E2=np.copy(T2)
E[:,3]=np.cumsum(E[:,0])
E2[:,3]=np.cumsum(E2[:,0])+I
f2=sum(E[:,1]*E[:,3])+sum(E2[:,1]*E2[:,3])
while True:
if d<=UB
Swap(u,v)
if d>UB
DivideRandom(T,T2)
SelectJob(u,v)
if d<UB:
break

You can iterate indefinitely using while True, then stop whenever your conditions are met using break:
count = 0
while True:
count += 1
if count == 10:
break
So for your second example you can try:
while True:
...
if f - f2 < 0:
# use new variables
f, E = f2, E2
else:
break
Your first problem is similar; loop, test, reset the appropriate variables.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Combine numpy subarrays of varying dimensions - python

If you want to stack them along the second axis, you can use numpy.hstack. list_of_arrays = [ array_1, ..., array_n] #all these arrays have same shape[0] big_array = np.hstack( list_of_arrays)

Related

Graph contraction not working as expected

How to use tf dataset sharding AFTER a shuffle operation, but not repeat entries

How to return a list of strings stored in an HDF5 file

Doing operations with only Nth element of a list

Iterating Swap Algorithm Python

Categories

Resources