Iterating Swap Algorithm Python - python

I have an algorithm. I want that last solution of the algorithm if respect certains conditions become the first solution. In my case I have this:
First PArt
Split the multidimensional array q in 2 parts
split_at = q[:,3].searchsorted([1,random.randrange(LB,UB-I)])
D = numpy.split(q, split_at)
Change and rename the splitted matrix:
S=B[1]
SF=B[2]
S2=copy(SF)
S2[:,3]=S2[:,3]+I
Define a function f:
f=sum(S[:,1]*S[:,3])+sum(S2[:,1]*S2[:,3])
This first part is an obligated passage.
Second Passage
Then I split again the array in 2 parts:
split_at = q[:,3].searchsorted([1,random.randrange(LB,UB-I)])
D = numpy.split(q, split_at)
I rename and change parts of the matrix(like in the first passage:
T=D[1]
TF=D[2]
T2=copy(TF)
T2[:,3]=T2[:,3]+I
u=random.sample(T[:],1) #I random select an array from T
v=random.sample(T2[:],1) #random select an array from T2
u=array(u)
v=array(v)
Here is my first problem: I want to continue the algorithm only if v[0,0]-u[0,0]+T[-1,3]<=UB, if not I want to repeat Second Passage until the condition is verified.
Now I swap 1 random array from T with another from T2:
x=numpy.where(v==T2)[0][0]
y=numpy.where(u==T)[0][0]
l=np.copy(T[y])
T[y],T2[x]=T2[x],T[y]
T2[x],l=l,T2[x]
I modified and recalculate some in the matrix:
E=np.copy(T)
E2=np.copy(T2)
E[:,3]=np.cumsum(E[:,0])
E2[:,3]=np.cumsum(E2[:,0])+I
Define f2:
f2=sum(E[:,1]*E[:,3])+sum(E2[:,1]*E2[:,3])
Here my second and last problem. I need to iterate this algorithm. If f-f2<0 my new starting solution has to be E and E2 and my new f has to be f2 and iterate excluding last choice the algorithm (recalcultaing a new f and f2).
Thank you for the patience. I'm a noob :D
EDIT:
I have an example here(this part goes before the code I have written on top)
import numpy as np
import random
p=[ 29, 85, 147, 98, 89, 83, 49, 7, 48, 88, 106, 97, 2,
107, 33, 144, 123, 84, 25, 42, 17, 82, 125, 103, 31, 110,
34, 100, 36, 46, 63, 18, 132, 10, 26, 119, 133, 15, 138,
113, 108, 81, 118, 116, 114, 130, 134, 86, 143, 126, 104, 52,
102, 8, 90, 11, 87, 37, 68, 75, 69, 56, 40, 70, 35,
71, 109, 5, 131, 121, 73, 38, 149, 20, 142, 91, 24, 53,
57, 39, 80, 79, 94, 136, 111, 78, 43, 92, 135, 65, 140,
148, 115, 61, 137, 50, 77, 30, 3, 93]
w=[106, 71, 141, 134, 14, 53, 57, 128, 119, 6, 4, 2, 140,
63, 51, 126, 35, 21, 125, 7, 109, 82, 95, 129, 67, 115,
112, 31, 114, 42, 91, 46, 108, 60, 97, 142, 85, 149, 28,
58, 52, 41, 22, 83, 86, 9, 120, 30, 136, 49, 84, 38,
70, 127, 1, 99, 55, 77, 144, 105, 145, 132, 45, 61, 81,
10, 36, 80, 90, 62, 32, 68, 117, 64, 24, 104, 131, 15,
47, 102, 100, 16, 89, 3, 147, 48, 148, 59, 143, 98, 88,
118, 121, 18, 19, 11, 69, 65, 123, 93]
p=array(p,'double')
w=array(w,'double')
r=p/w
LB=12
UB=155
I=9
j=p,w,r
j=transpose(j)
k=j[j[:,2].argsort()]
c=np.cumsum(k[:,0])
q=k[:,0],k[:,1],k[:,2],c
q=transpose(q)
o=sum(q[:,1]*q[:,3])
split_at = q[:,3].searchsorted([1,UB-I])
B = numpy.split(q, split_at)
S=B[1]
SF=B[2]
S2=copy(SF)
S2[:,3]=S2[:,3]+I
f=sum(S[:,1]*S[:,3])+sum(S2[:,1]*S2[:,3])
split_at = q[:,3].searchsorted([1,random.randrange(LB,UB-I)])
D = numpy.split(q, split_at)
T=D[1]
TF=D[2]
T2=copy(TF)
T2[:,3]=T2[:,3]+I
u=random.sample(T[:],1)
v=random.sample(T2[:],1)
u=array(u)
v=array(v)
x=numpy.where(v==T2)[0][0]
y=numpy.where(u==T)[0][0]
l=np.copy(T[y])
T[y],T2[x]=T2[x],T[y]
T2[x],l=l,T2[x]
E=np.copy(T)
E2=np.copy(T2)
E[:,3]=np.cumsum(E[:,0])
E2[:,3]=np.cumsum(E2[:,0])+I
f2=sum(E[:,1]*E[:,3])+sum(E2[:,1]*E2[:,3])
I tried:
def DivideRandom(T,T2):
split_at = q[:,3].searchsorted([1,random.randrange(LB,UB-I)])
D = numpy.split(q, split_at)
T=D[1]
TF=D[2]
T2=copy(TF)
T2[:,3]=T2[:,3]+I
Divide(T,T2)
def SelectJob(u,v):
u=random.sample(T[:],1)
v=random.sample(T2[:],1)
u=array(u)
v=array(v)
SelectJob(u,v)
d=v[0,0]-u[0,0]+T[-1,3]
def Swap(u,v):
x=numpy.where(v==T2)[0][0]
y=numpy.where(u==T)[0][0]
l=np.copy(T[y])
T[y],T2[x]=T2[x],T[y]
T2[x],l=l,T2[x]
E=np.copy(T)
E2=np.copy(T2)
E[:,3]=np.cumsum(E[:,0])
E2[:,3]=np.cumsum(E2[:,0])+I
f2=sum(E[:,1]*E[:,3])+sum(E2[:,1]*E2[:,3])
while True:
if d<=UB
Swap(u,v)
if d>UB
DivideRandom(T,T2)
SelectJob(u,v)
if d<UB:
break

You can iterate indefinitely using while True, then stop whenever your conditions are met using break:
count = 0
while True:
count += 1
if count == 10:
break
So for your second example you can try:
while True:
...
if f - f2 < 0:
# use new variables
f, E = f2, E2
else:
break
Your first problem is similar; loop, test, reset the appropriate variables.

Related

How to use tf dataset sharding AFTER a shuffle operation, but not repeat entries

I am trying to solve the following problem:
I have 128 files that I want to break into 4 subsets. Each time around, I want the division to be different.
If I do tf.data.Dataset.list_files('glob_pattern', shuffle=False), the dataset has the right number of files. Sharding this works as expected, but each shard only ever has the same files.
I want to shard and end up with a different division of the files each go-through the data. However, if I turn shuffle=True, then each shard seems to have its own copy of the original dataset, meaning that I can see the same file multiple times before seeing all the files once.
Is there an idiomatic way of splitting these files?
Basically, I'm wondering why the original list_files dataset is able to have some files show up multiple times before all the files have been seen.
Here is some TF2.0 code to see the problem:
ds = tf.data.Dataset.from_tensor_slices([f'train_{str(i).zfill(5)}' for i in range(128)])
ds = ds.shuffle(128)
n_splits = 4
sub_datasets = [ds.shard(n_splits, i) for i in range(n_splits)]
output = []
# go through each of the subsets
for i in range(n_splits):
results = [x.numpy().decode() for x in sub_datasets[i]]
output.extend(results)
print(len(set(output)), 'is the number of unique files seen (128 desired)')
Here's an answer from what I can understand of your question. To generate a new subset of 4 datasets (shared) each time randomly shuffled, you can use the following code.
import numpy as np
import tensorflow as tf
# ####################
# I used numbers for visualization ... feel free to replace with your demo code
# ####################
# ds = tf.data.Dataset.from_tensor_slices([f'train_{str(i).zfill(5)}' for i in range(128)])
# ####################
arr = np.arange(128)
ds = tf.data.Dataset.from_tensor_slices(arr)
def get_four_datasets(original_ds, window_size=32, shuffle_size=128):
""" Every time you call this function you will get a new four datasets """
return original_ds.shuffle(shuffle_size).window(window_size)
remake_ds_1 = list()
remake_ds_2 = list()
for i, (dataset_1, dataset_2) in enumerate(zip(get_four_datasets(ds), get_four_datasets(ds))):
print(f"\n\nDATASET #1-{i+1}")
ds_subset = [value for value in dataset_1.as_numpy_iterator()]
print("\t", ds_subset)
remake_ds_1.extend(ds_subset)
print(f"\nDATASET #2-{i+1}")
ds_subset_2 = [value for value in dataset_2.as_numpy_iterator()]
print("\t", ds_subset_2)
remake_ds_2.extend(ds_subset_2)
print("\n\nCounts\n")
print("DS 1 ALL: ", len(remake_ds_1))
print("DS 1 UNIQUE: ", len(set(remake_ds_1)))
print("DS 2 ALL: ", len(remake_ds_2))
print("DS 2 UNIQUE: ", len(set(remake_ds_2)))
OUTPUT
DATASET #1-1
[96, 4, 66, 120, 42, 54, 110, 57, 67, 7, 13, 9, 69, 86, 122, 88, 10, 55, 27, 106, 77, 107, 114, 87, 59, 81, 1, 49, 118, 17, 36, 11]
DATASET #2-1
[47, 26, 122, 10, 110, 31, 86, 34, 52, 121, 36, 112, 55, 48, 50, 108, 100, 103, 113, 68, 58, 29, 32, 84, 124, 15, 38, 51, 6, 66, 24, 41]
DATASET #1-2
[56, 80, 94, 124, 52, 109, 83, 90, 112, 35, 6, 101, 20, 84, 73, 74, 100, 99, 108, 15, 14, 12, 89, 24, 8, 29, 68, 85, 125, 3, 33, 58]
DATASET #2-2
[125, 127, 74, 97, 12, 39, 109, 126, 98, 40, 99, 93, 35, 107, 91, 88, 45, 13, 106, 120, 19, 73, 83, 11, 105, 61, 16, 114, 79, 95, 94, 44]
DATASET #1-3
[105, 38, 43, 60, 0, 26, 127, 65, 22, 18, 123, 82, 121, 71, 51, 23, 113, 30, 63, 40, 2, 61, 16, 98, 64, 25, 41, 28, 45, 19, 117, 39]
DATASET #2-3
[75, 64, 1, 17, 7, 42, 80, 92, 3, 9, 54, 33, 82, 56, 118, 102, 115, 43, 28, 90, 60, 119, 0, 57, 123, 62, 22, 72, 65, 23, 30, 87]
DATASET #1-4
[48, 62, 31, 102, 111, 46, 103, 44, 116, 79, 21, 50, 53, 78, 93, 32, 95, 34, 92, 126, 104, 47, 119, 37, 5, 70, 97, 91, 76, 75, 72, 115]
DATASET #2-4
[4, 85, 21, 116, 78, 27, 117, 2, 59, 111, 69, 46, 63, 20, 49, 5, 81, 53, 18, 37, 8, 76, 71, 89, 14, 104, 25, 96, 67, 101, 77, 70]
Counts
DS 1 ALL: 128
DS 1 UNIQUE: 128
DS 2 ALL: 128
DS 2 UNIQUE: 128
If you just want to generate a dataset where every 32 examples pulled from the dataset is shuffled and you want to iterate over the dataset multiple times getting new 32-set samples every time, you can do the following.
import numpy as np
import tensorflow as tf
arr = np.arange(128)
N_REPEATS = 10
ds = tf.data.Dataset.from_tensor_slices(arr)
ds = ds.shuffle(128).batch(32).repeat(N_REPEATS)
OUTPUT
BATCH 1: [92, 94, 76, 38, 58, 9, 44, 16, 86, 28, 64, 7, 60, 42, 31, 0, 46, 1, 83, 57, 18, 102, 67, 110, 113, 101, 93, 61, 96, 17, 105, 6]
BATCH 2: [59, 15, 121, 3, 72, 100, 50, 52, 45, 23, 87, 43, 33, 29, 62, 25, 74, 65, 75, 68, 4, 56, 117, 47, 73, 109, 106, 35, 88, 91, 119, 66]
BATCH 3: [98, 78, 125, 24, 99, 51, 14, 114, 26, 22, 54, 89, 79, 63, 30, 124, 20, 13, 2, 34, 95, 41, 85, 39, 37, 77, 90, 107, 104, 118, 27, 97]
BATCH 4: [49, 5, 53, 115, 126, 40, 108, 48, 8, 84, 120, 32, 82, 11, 112, 55, 80, 69, 12, 70, 111, 123, 81, 116, 71, 122, 36, 21, 103, 19, 127, 10]
BATCH 5: [74, 61, 97, 6, 127, 119, 65, 15, 78, 72, 99, 18, 41, 76, 79, 33, 0, 105, 103, 46, 14, 50, 113, 26, 43, 45, 100, 90, 28, 48, 19, 9]
BATCH 6: [35, 20, 3, 64, 5, 96, 114, 34, 126, 85, 124, 69, 110, 54, 109, 24, 104, 32, 73, 92, 11, 13, 58, 107, 84, 88, 59, 75, 95, 40, 16, 101]
BATCH 7: [93, 66, 106, 44, 102, 125, 7, 30, 12, 116, 87, 111, 81, 56, 83, 37, 31, 77, 67, 21, 118, 1, 120, 36, 86, 62, 71, 98, 82, 52, 25, 27]
BATCH 8: [112, 68, 60, 70, 115, 117, 29, 91, 57, 10, 121, 89, 4, 2, 122, 39, 51, 22, 53, 63, 108, 94, 42, 17, 8, 23, 80, 38, 55, 49, 47, 123]
BATCH 9: [67, 20, 101, 123, 109, 4, 39, 65, 34, 71, 22, 62, 73, 81, 114, 112, 66, 35, 43, 49, 92, 68, 1, 54, 27, 103, 46, 12, 82, 6, 119, 99]
BATCH 10: [86, 69, 13, 44, 16, 50, 75, 61, 58, 104, 64, 47, 95, 10, 79, 70, 97, 63, 45, 17, 56, 74, 87, 53, 91, 21, 48, 76, 9, 51, 28, 126]
...
...
...
BATCH 40: [10, 41, 29, 39, 57, 127, 101, 106, 55, 62, 72, 76, 124, 81, 66, 126, 53, 24, 33, 49, 102, 75, 34, 61, 47, 15, 21, 121, 8, 94, 52, 13]
Please let me know if I misunderstood and I can update accordingly.
You need to set reshuffle_each_iteration=False:
ds = ds.shuffle(128, reshuffle_each_iteration=False)
Full code:
import tensorflow as tf
ds = tf.data.Dataset.from_tensor_slices([f'train_{str(i).zfill(5)}' for i in range(128)])
ds = ds.shuffle(128, reshuffle_each_iteration=False)
n_splits = 4
sub_datasets = [ds.shard(n_splits, i) for i in range(n_splits)]
output = []
# go through each of the subsets
for i in range(n_splits):
results = [x.numpy().decode() for x in sub_datasets[i]]
output.extend(results)
print(len(set(output)), 'is the number of unique files seen (128 desired)')
128 is the number of unique files seen (128 desired)

How to return a list of strings stored in an HDF5 file

Hope someone can shed some light on this. I am trying to learn my way around with HDF5 files. Somehow this list of strings gets encoded into the file as a array of integers but I'm not able to figure out how to go about decoding it. I can plug the file back into pandas using the read_hdf function, but that's not the point - I am trying to understand the encoding logic. Summarized here is the example I was working with.
smiles.txt =
structure
[11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24
[11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24
[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F
[11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3
[11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3
>>> import pandas as pd
>>> df = pd.read_csv('smiles.txt', header=0)
>>> df.to_hdf('smiles.h5', 'table')
I then explore the structure of the newly created HDF5 file:
>>> import h5py
>>> with h5py.File('smiles.h5',"r") as f:
>>> f.visit(print)
table
table/axis0
table/axis1
table/block0_items
table/block0_values
>>> with h5py.File('smiles_temp', 'r') as f:
>>> print(list(f.keys()))
>>> print(f['/thekey/axis0'][:])
>>> print(f['/thekey/axis1'][:])
>>> print(f['/thekey/block0_items'][:])
>>> print(f['/thekey/block0_values'][:])
['thekey']
[b'structure']
[0 1 2 3 4]
[b'structure']
[array([128, 4, 149, 123, 1, 0, 0, 0, 0, 0, 0, 140, 21,
110, 117, 109, 112, 121, 46, 99, 111, 114, 101, 46, 109, 117,
108, 116, 105, 97, 114, 114, 97, 121, 148, 140, 12, 95, 114,
101, 99, 111, 110, 115, 116, 114, 117, 99, 116, 148, 147, 148,
140, 5, 110, 117, 109, 112, 121, 148, 140, 7, 110, 100, 97,
114, 114, 97, 121, 148, 147, 148, 75, 0, 133, 148, 67, 1,
98, 148, 135, 148, 82, 148, 40, 75, 1, 75, 5, 75, 1,
134, 148, 104, 3, 140, 5, 100, 116, 121, 112, 101, 148, 147,
148, 140, 2, 79, 56, 148, 75, 0, 75, 1, 135, 148, 82,
148, 40, 75, 3, 140, 1, 124, 148, 78, 78, 78, 74, 255,
255, 255, 255, 74, 255, 255, 255, 255, 75, 63, 116, 148, 98,
137, 93, 148, 40, 140, 41, 91, 49, 49, 67, 72, 50, 93,
49, 78, 67, 67, 78, 50, 67, 91, 67, 64, 64, 72, 93,
51, 67, 67, 67, 91, 67, 64, 64, 72, 93, 51, 99, 52,
99, 99, 99, 99, 49, 99, 50, 52, 148, 140, 40, 91, 49,
49, 67, 72, 50, 93, 49, 78, 67, 67, 78, 50, 91, 67,
64, 64, 72, 93, 51, 67, 67, 67, 91, 67, 64, 64, 72,
93, 51, 99, 52, 99, 99, 99, 99, 49, 99, 50, 52, 148,
140, 54, 91, 49, 49, 67, 72, 51, 93, 99, 49, 99, 99,
99, 40, 99, 99, 49, 41, 99, 50, 99, 99, 40, 110, 110,
50, 99, 51, 99, 99, 99, 40, 99, 99, 51, 41, 83, 40,
61, 79, 41, 40, 61, 79, 41, 78, 41, 67, 40, 70, 41,
40, 70, 41, 70, 148, 140, 44, 91, 49, 49, 67, 72, 51,
93, 99, 49, 99, 99, 99, 99, 99, 49, 79, 91, 67, 64,
72, 93, 40, 91, 67, 64, 64, 72, 93, 50, 67, 78, 67,
67, 79, 50, 41, 99, 51, 99, 99, 99, 99, 99, 51, 148,
140, 44, 91, 49, 49, 67, 72, 51, 93, 99, 49, 99, 99,
99, 99, 99, 49, 83, 91, 67, 64, 72, 93, 40, 91, 67,
64, 64, 72, 93, 50, 67, 78, 67, 67, 79, 50, 41, 99,
51, 99, 99, 99, 99, 99, 51, 148, 101, 116, 148, 98, 46],
dtype=uint8)]
How does one go about returning the list of strings using h5py?
Just to clarify, the dataframe displays as:
In [2]: df = pd.read_csv('stack63452223.csv', header=0)
In [3]: df
Out[3]:
structure
0 [11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24
1 [11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24
2 [11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)...
3 [11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3
4 [11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3
In [11]: df._values
Out[11]:
array([['[11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24'],
['[11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24'],
['[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F'],
['[11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3'],
['[11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3']], dtype=object)
or as a list of strings:
In [24]: df['structure'].to_list()
Out[24]:
['[11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24',
'[11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24',
'[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F',
'[11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3',
'[11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3']
The h5 is written by pytables, which is different from h5py; generally h5py can read pytables, but the details can be complicated.
The top level keys:
['axis0', 'axis1', 'block0_items', 'block0_values']
A dataframe has axes (row and column). On another occasion I looked at how a dataframe stores its values, and found that it uses blocks, each holding columns with a common dtype. Here you have 1 column, and it is object dtype, since it contains strings.
Strings are bit awkward in HDF5, especially unicode. numpy arrays use a unicode string dtype; pandas uses object dtype, referencing Python strings (stored outside the dataframe). I suspect then that in saving such a frame pytables is using a more complex referencing scheme (that isn't immediately obvious via h5py).
Guess that's a long answer to just say I don't know.
Pandas own h5 load:
In [19]: pd.read_hdf('stack63452223.h5', 'table')
Out[19]:
structure
0 [11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24
1 [11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24
2 [11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)...
3 [11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3
4 [11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3
The h5 objects also have attrs,
In [38]: f['table'].attrs.keys()
Out[38]: <KeysViewHDF5 ['CLASS', 'TITLE', 'VERSION', 'axis0_variety', 'axis1_variety', 'block0_items_variety', 'encoding', 'errors', 'nblocks', 'ndim', 'pandas_type', 'pandas_version']>
Fiddling around I found that:
In [66]: x=f['table']['block0_values'][0]
In [67]: b''.join(x.view('S1').tolist())
Out[67]: b'\x80\x04\x95y\x01\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x05K\x01\x86\x94h\x03\x8c\x05dtype\x94\x93\x94\x8c\x02O8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01|\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK?t\x94b\x89]\x94(\x8c)[11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24\x94\x8c([11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24\x94\x8c6[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F\x94\x8c,[11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3\x94\x8c,[11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3\x94et\x94b.'
Looks like your strings are there. uint8 is a single byte dtype, which can be viewed as byte. Joining them I see your strings, concatenated in some fashion.
reformating:
Out[67]: b'\x80\x04\x95y\x01\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x05K\x01\x86\x94h\x03\x8c\x05dtype\x94\x93\x94\x8c\x02O8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01|\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK?t\x94b\x89]\x94(\x8c)
[11CH2]1NCCN2C[C##H]3CCC[C##H]3c4cccc1c24\x94\x8c(
[11CH2]1NCCN2[C##H]3CCC[C##H]3c4cccc1c24\x94\x8c6
[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F\x94\x8c,
[11CH3]c1ccccc1O[C#H]([C##H]2CNCCO2)c3ccccc3\x94\x8c,
[11CH3]c1ccccc1S[C#H]([C##H]2CNCCO2)c3ccccc3\x94et\x94b.'

Combine numpy subarrays of varying dimensions

I have a nested numpy array (dtype=object), it contains 333 arrays that increase consistently from size 52x1 to size 52x333
I would like to effectively extract and concatenate these arrays so that I have a single 52x55611 array
I imagine this may be straightforward but my attempts using numpy.reshape have been unsuccesful
If you want to stack them along the second axis, you can use numpy.hstack.
list_of_arrays = [ array_1, ..., array_n] #all these arrays have same shape[0]
big_array = np.hstack( list_of_arrays)
if I have understood you correctly, you could use numpy.concatenate.
>>> import numpy as np
>>> a = np.array([range(52)])
>>> b = np.array([range(52,104), range(104, 156)])
>>> np.concatenate((a,b))
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51],
[ 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103],
[104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155]])
>>>

Removing elements from a list by index from another list of integers

So I have two lists:
num = [
3, 22, 23, 25, 28, 29, 56, 57, 67, 68, 73, 78, 79, 82, 83, 89, 90, 91,
92, 98, 99, 108, 126, 127, 128, 131, 132, 133
]
details = [
'num=10,time=088', 'num=10,time=084', 'num=10,time=080', 'num=10,time=076',
'num=10,time=072', 'num=10,time=068', 'num=10,time=064', 'num=10,time=060',
'num=10,time=056', 'num=10,time=052', 'num=10,time=048', 'num=10,time=044',
.
.
.
.
'num=07,time=280', 'num=07,time=276', 'num=05,time=508', 'num=05,time=504',
'num=05,time=500', 'num=05,time=496'
]
num has 28 elements and details has 134 elements. I want to remove elements in details by index based on values from num. For example elements with index 3, 22, 23, 25, 28... (these are numbers from num list) should be removed from details.
When I use .pop() as it is described here it gives me an error saying:
AttributeError: 'str' object has no attribute 'pop'
similarily when I use del details[] it gives me an error saying:
IndexError: list assignment index out of range
Here is my code:
for an in details:
an.pop(num)
This should do what you want (delete from details every element indexed by the values in num):
for i in reversed(num):
del details[i]
It iterates over the list backwards so that the indexing of future things to delete doesn't change (otherwise you'd delete 3 and then the element formerly indexed as 22 would be 21)--this is probably the source of your IndexError.
Hmm. Two things. First your loop not quite right. Instead of
for an in details:
an.pop(num)
you want
for an in num: # step through every item in num list
details.pop(an) # remove the item with index an from details list
Second, you need to make sure to pop() items from details in reverse order so that your indexes are good. For example if you pop() index 3 from details, then everything else in details is reordered and when you go to remove index 22 it will be the wrong "cell".
I've simplified details to be a list containing numbers 0 to 133, but this code should work just fine on your real list
num = [3, 22, 23, 25, 28, 29, 56, 57, 67, 68, 73, 78, 79, 82, 83, 89, 90, 91, 92, 98,
99, 108, 126, 127, 128, 131, 132, 133]
details = list(range(134))
# sort indexes in descending order (sort in place)
num.sort(reverse = True)
for an in num:
details.pop(an)
print(details)
output
[0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 24,
26, 27, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 58, 59, 60, 61, 62, 63, 64, 65, 66, 69, 70, 71,
72, 74, 75, 76, 77, 80, 81, 84, 85, 86, 87, 88, 93, 94, 95, 96, 97, 100, 101, 10
2, 103, 104, 105, 106, 107, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 11
9, 120, 121, 122, 123, 124, 125, 129, 130]

Working with a list, performing arithmetic logic in Python

Suppose I have made a large list of numbers, and I want to make another one which I will add, pairwise, with the first list.
Here's the first list, A:
[109, 77, 57, 34, 94, 68, 96, 72, 39, 67, 49, 71, 121, 89, 61, 84, 45, 40, 104, 68, 54, 60, 68, 62, 91, 45, 41, 118, 44, 35, 53, 86, 41, 63, 111, 112, 54, 34, 52, 72, 111, 113, 47, 91, 107, 114, 105, 91, 57, 86, 32, 109, 84, 85, 114, 48, 105, 109, 68, 57, 78, 111, 64, 55, 97, 85, 40, 100, 74, 34, 94, 78, 57, 77, 94, 46, 95, 60, 42, 44, 68, 89, 113, 66, 112, 60, 40, 110, 89, 105, 113, 90, 73, 44, 39, 55, 108, 110, 64, 108]
And here's B:
[35, 106, 55, 61, 81, 109, 82, 85, 71, 55, 59, 38, 112, 92, 59, 37, 46, 55, 89, 63, 73, 119, 70, 76, 100, 49, 117, 77, 37, 62, 65, 115, 93, 34, 107, 102, 91, 58, 82, 119, 75, 117, 34, 112, 121, 58, 79, 69, 68, 72, 110, 43, 111, 51, 102, 39, 52, 62, 75, 118, 62, 46, 74, 77, 82, 81, 36, 87, 80, 56, 47, 41, 92, 102, 101, 66, 109, 108, 97, 49, 72, 74, 93, 114, 55, 116, 66, 93, 56, 56, 93, 99, 96, 115, 93, 111, 57, 105, 35, 99]
How might I generate the arithmetic addition logic, processing each pairwise value one by one (A[0] and B[0], through A[99], B[99]) and producing the list C (A[0] + B[0] through A[99]+ B[99])?
result = [(x + y) for x, y in itertools.izip(A, B)]
Or:
result = map(operator.add, itertools.izip(A, B))
Here is two possible options:
Use list comprehension.
Use NumPy.
I will be using shortened versions of your lists for convenience, and the element-wise sum will go into c.
List comprehension
a = [109, 77, 57, 34, 94, 68, 96]
b = [35, 106, 55, 61, 81, 109, 82]
c = [a_el + b_el for a_el,b_el in zip(a, b)]
NumPy
import numpy as np
a = np.array([109, 77, 57, 34, 94, 68, 96])
b = np.array([35, 106, 55, 61, 81, 109, 82])
c = a + b
With a list comprehension:
C = [A[i]+ B[i] for i in range(len(A))]
And even safer:
C = [A[i]+ B[i] for i in range(len(A)) if len(A) == len(B)]

Categories