I want to extract a substring from a string that conforms to a certain regex. The regex is:
(\[\s*(\d)+ byte(s)?\s*\](\s*|\d|[A-F]|[a-f])+)
In effect, all of these strings are accepted:
[4 bytes] 66 74 79 70 33 67 70 35
[ 4 bytes ] 66 74 79 70 33 67 70 35
[1 byte] 66 74 79 70 33 67 70 35
I want to extract only the number of bytes (just the number) from such a string. I thought of doing this with re.search, but I'm not sure whether that will work. What would be the cleanest and most performant way of doing this?
Use match.group to get the groups your regular expression defines:
import re
s = """[4 bytes] 66 74 79 70 33 67 70 35
[ 4 bytes ] 66 74 79 70 33 67 70 35
[1 byte] 66 74 79 70 33 67 70 35"""
r = re.compile(r"(\[\s*(\d)+ byte(s)?\s*\](\s*|\d|[A-F]|[a-f])+)")
for line in s.split("\n"):
    m = r.match(line)
    if m:
        print(m.group(2))
The first group matches the whole line (e.g. [4 bytes] 66 74 79 70 33 67 70 35); the second matches only the 4.
Output:
4
4
1
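One caveat with the pattern as written: in (\d)+ the + repeats a one-digit group, and a repeated group only keeps its last repetition, so a hypothetical line such as "[12 bytes] ..." would give group(2) == "2". A minimal sketch, assuming byte counts can have more than one digit, moves the + inside the group. Because re.match only anchors at the start and the trailing hex part of the original pattern can match an empty string, that part never decides whether a line matches, so it is left out of the fixed pattern here.

```python
import re

lines = [
    "[4 bytes] 66 74 79 70 33 67 70 35",
    "[ 4 bytes ] 66 74 79 70 33 67 70 35",
    "[1 byte] 66 74 79 70 33 67 70 35",
    "[12 bytes] 66 74 79 70 33 67 70 35",  # hypothetical multi-digit case
]

# Original pattern: (\d)+ repeats a one-digit group, so only the last digit survives
original = re.compile(r"(\[\s*(\d)+ byte(s)?\s*\](\s*|\d|[A-F]|[a-f])+)")
# Quantifier moved inside the group so the whole number is captured
fixed = re.compile(r"\[\s*(\d+) bytes?\s*\]")


def byte_count(pattern, group, line):
    """Return the captured byte count as an int, or None if the line doesn't match."""
    m = pattern.match(line)
    return int(m.group(group)) if m else None


print([byte_count(original, 2, line) for line in lines])  # [4, 4, 1, 2]
print([byte_count(fixed, 1, line) for line in lines])     # [4, 4, 1, 12]
```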
I am trying to understand the tf.data.experimental.group_by_window() method in TensorFlow 2, but I am having some difficulty.
For a reproducible example, I use the one presented in the documentation:
import numpy as np
import tensorflow as tf
components = np.arange(100).astype(np.int64)
dataset20 = tf.data.Dataset.from_tensor_slices(components)
dataset20 = dataset20.apply(tf.data.experimental.group_by_window(
    key_func=lambda x: x % 2,
    reduce_func=lambda _, els: els.batch(10),
    window_size=100))
i = 0
for elem in dataset20:
    print('i is {0}\n'.format(i))
    print('elem is {0}'.format(elem.numpy()))
    i += 1
    print('\n--------------------------------\n')
Output:
i is 0
elem is [0 2 4 6 8]
--------------------------------
i is 1
elem is [1 3 5 7 9]
--------------------------------
Part of the confusion may be that the output shown doesn't correspond to the example code. Running this:
import numpy as np
import tensorflow as tf
components = np.arange(100).astype(np.int64)
dataset20 = tf.data.Dataset.from_tensor_slices(components)
dataset20 = dataset20.apply(tf.data.experimental.group_by_window(key_func=lambda x: x%2, reduce_func=lambda _,els: els.batch(10), window_size=100))
for i, d in enumerate(dataset20):
    print(i, d.numpy())
produces:
0 [ 0 2 4 6 8 10 12 14 16 18]
1 [20 22 24 26 28 30 32 34 36 38]
2 [40 42 44 46 48 50 52 54 56 58]
3 [60 62 64 66 68 70 72 74 76 78]
4 [80 82 84 86 88 90 92 94 96 98]
5 [ 1 3 5 7 9 11 13 15 17 19]
6 [21 23 25 27 29 31 33 35 37 39]
7 [41 43 45 47 49 51 53 55 57 59]
8 [61 63 65 67 69 71 73 75 77 79]
9 [81 83 85 87 89 91 93 95 97 99]
As described in the documentation here, key_func separates the data into groups, each with an associated key value. In the example, key_func splits the data [0, 99] into an even group and an odd group. reduce_func then operates on each (key, group) pair to produce another dataset. Note, though, that reduce_func only ever receives windows of at most window_size elements from a group. In the example, the window size is larger than either group (100 vs. 50 elements), so it has no effect, and all the evens are emitted in batches of 10, followed by all the odds. If the window size is changed to a value less than 50, it does have an effect. For example, if the window size is changed to 5 and the batching is moved outside the group_by_window call (i.e. replacing the apply line of the code above with):
dataset20 = dataset20.apply(tf.data.experimental.group_by_window(key_func=lambda x: x%2, reduce_func=lambda _, els: els, window_size=5)).batch(10)
then the following output is produced:
0 [0 2 4 6 8 1 3 5 7 9]
1 [10 12 14 16 18 11 13 15 17 19]
2 [20 22 24 26 28 21 23 25 27 29]
3 [30 32 34 36 38 31 33 35 37 39]
4 [40 42 44 46 48 41 43 45 47 49]
5 [50 52 54 56 58 51 53 55 57 59]
6 [60 62 64 66 68 61 63 65 67 69]
7 [70 72 74 76 78 71 73 75 77 79]
8 [80 82 84 86 88 81 83 85 87 89]
9 [90 92 94 96 98 91 93 95 97 99]
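To make the windowing behaviour concrete without TensorFlow, here is a simplified pure-Python model of what group_by_window does (an illustration, not TensorFlow's implementation): elements are buffered per key, and a key's buffer is handed to reduce_func as soon as it holds window_size elements. Whatever is left in partially filled windows is flushed at the end of the input; flushing them in first-seen key order is an assumption that happens to match the outputs above. The names group_by_window_sim, batch and parity are hypothetical helpers.

```python
def group_by_window_sim(elements, key_func, reduce_func, window_size):
    """Simplified model of group_by_window; not TensorFlow's actual code."""
    buffers = {}  # key -> pending elements; dicts keep first-seen key order
    for x in elements:
        key = key_func(x)
        buffers.setdefault(key, []).append(x)
        if len(buffers[key]) == window_size:
            # Window is full: hand it to reduce_func and start a new one
            yield from reduce_func(key, buffers[key])
            buffers[key] = []
    # End of input: flush the partially filled windows
    for key, window in buffers.items():
        if window:
            yield from reduce_func(key, window)


def batch(items, size):
    """Split an iterable into consecutive lists of at most `size` items."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def parity(x):
    return x % 2


# window_size=100, batching inside reduce_func (first output above)
case1 = list(group_by_window_sim(range(100), parity,
                                 lambda _, els: batch(els, 10), 100))
# window_size=5, identity reduce_func, batching afterwards (second output)
case2 = batch(group_by_window_sim(range(100), parity,
                                  lambda _, els: els, 5), 10)

for i, b in enumerate(case2):
    print(i, b)
```

In case1 no window ever fills (50 < 100), so both groups are only flushed at the end and come out as five even batches followed by five odd ones; in case2 each key's window fills every 5 elements, which interleaves evens and odds.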
I have a 3D array and I want to rearrange it so that the data at each position still corresponds across all three dimensions, just in a different order.
Its shape is:
array = [32,30,96]
I have already rank-ordered the 96 values of the third dimension and found that:
ranks = [57 23 68 58 25 91 70 83 77 75 89 34 49 79 66 54 67 44 63 52 46 20 64 10 80 33 30 29 28 26 17 27 50 51 92 86 69 47 0 7 3 85 18 11 13 53 8 78 82 81 14 74 59 32 42 39 1 31 36 19 24 5 38 9 73 71 76 87 41 55 94 93 84 16 90 62 48 43 72 95 65 45 61 22 21 15 37 88 2 40 56 6 12 60 4 35]
I want to rearrange the whole 3D array to reflect the rank order. So I still want the same associations across the array, but in a different order.
So I want a new array with the same dimensions, newarray = [32,30,96], but ordered along the last dimension, i.e. [32,30,ranks]. I don't care about the specific values: position 0 doesn't have to become 57; I want all the data corresponding to position 57 to now be at position 0.
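A minimal NumPy sketch, assuming ranks is a permutation of 0..95 where ranks[j] is the old position whose data should move to position j (so ranks[0] == 57 moves the data at 57 to position 0). Integer-array indexing on the last axis does this in one step and keeps every [i, k, :] slice aligned across the first two dimensions. Random data and a random permutation stand in for the real array and ranks.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the real data: an array of shape (32, 30, 96) and a permutation of 0..95
array = rng.random((32, 30, 96))
ranks = rng.permutation(96)

# newarray[:, :, j] holds what array[:, :, ranks[j]] held
newarray = array[:, :, ranks]

print(newarray.shape)  # (32, 30, 96)
```

If ranks instead gave the new position of each old entry, np.argsort(ranks) would turn it into the index order used here.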
The following code prints number sequences that run up to about 100, one for each number in a list. A fair number of the sequences go above 100. I want to know how to print only the sequences that add up to exactly 100. I have tried collecting the results in a list, without luck. I tried putting in if and else statements to filter the results, but with no luck. I looked at list comprehensions, but I know those don't use while loops, so I don't know how to get the same results with a for loop. The only information I can find online is basic lessons on how to use a while loop and print a list of numbers. I could not find anything about how to sort through a list of printed numbers.
Here is the code:
import itertools
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in list1:
    a = 0
    num1 = 2
    num2 = i
    seq = [a]
    it = itertools.cycle((num1,num2))
    while a < 100:
        a += next(it)
        print(a, end = " ")
        seq.append(a)
    print()
    print("Here are the numbers", num1, "&", num2, "added together in a sequence")
    print()
Output:
2 3 5 6 8 9 11 12 14 15 17 18 20 21 23 24 26 27 29 30 32 33 35 36 38 39 41 42 44 45 47 48 50 51 53 54 56 57 59 60 62 63 65 66 68 69 71 72 74 75 77 78 80 81 83 84 86 87 89 90 92 93 95 96 98 99 101
Here are the numbers 2 & 1 added together in a sequence
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100
Here are the numbers 2 & 2 added together in a sequence
2 5 7 10 12 15 17 20 22 25 27 30 32 35 37 40 42 45 47 50 52 55 57 60 62 65 67 70 72 75 77 80 82 85 87 90 92 95 97 100
Here are the numbers 2 & 3 added together in a sequence
2 6 8 12 14 18 20 24 26 30 32 36 38 42 44 48 50 54 56 60 62 66 68 72 74 78 80 84 86 90 92 96 98 102
Here are the numbers 2 & 4 added together in a sequence
2 7 9 14 16 21 23 28 30 35 37 42 44 49 51 56 58 63 65 70 72 77 79 84 86 91 93 98 100
Here are the numbers 2 & 5 added together in a sequence
2 8 10 16 18 24 26 32 34 40 42 48 50 56 58 64 66 72 74 80 82 88 90 96 98 104
Here are the numbers 2 & 6 added together in a sequence
2 9 11 18 20 27 29 36 38 45 47 54 56 63 65 72 74 81 83 90 92 99 101
Here are the numbers 2 & 7 added together in a sequence
2 10 12 20 22 30 32 40 42 50 52 60 62 70 72 80 82 90 92 100
Here are the numbers 2 & 8 added together in a sequence
2 11 13 22 24 33 35 44 46 55 57 66 68 77 79 88 90 99 101
Here are the numbers 2 & 9 added together in a sequence
2 12 14 24 26 36 38 48 50 60 62 72 74 84 86 96 98 108
Here are the numbers 2 & 10 added together in a sequence
What I want is:
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100
Here are the numbers 2 & 2 added together in a sequence
2 5 7 10 12 15 17 20 22 25 27 30 32 35 37 40 42 45 47 50 52 55 57 60 62 65 67 70 72 75 77 80 82 85 87 90 92 95 97 100
Here are the numbers 2 & 3 added together in a sequence
2 7 9 14 16 21 23 28 30 35 37 42 44 49 51 56 58 63 65 70 72 77 79 84 86 91 93 98 100
Here are the numbers 2 & 5 added together in a sequence
2 10 12 20 22 30 32 40 42 50 52 60 62 70 72 80 82 90 92 100
Here are the numbers 2 & 8 added together in a sequence
Any and all help on this will be greatly appreciated.
Well, you only know whether your running total lands on exactly 100 once the loop is done, so you can't start printing before that point. This should do the job:
import itertools
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in list1:
    a = 0
    num1 = 2
    num2 = i
    seq = []  # start empty so the initial 0 is not printed
    it = itertools.cycle((num1,num2))
    while a < 100:
        a += next(it)
        seq.append(a)
    if seq[-1] == 100: # -1 as an index gets the last entry in a list
        print(" ".join([str(val) for val in seq]))
        print("Here are the numbers", num1, "&", num2, "added together in a sequence")
        print()
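Since the question mentions list comprehensions: the while loop can stay inside a small helper function (cycle_sequence is a hypothetical name), and a dict comprehension can then keep only the sequences that end exactly at 100. This is a sketch of the same approach as the answer, split up so that generating the sequences and filtering them are separate steps.

```python
import itertools


def cycle_sequence(num1, num2, limit=100):
    """Running totals from alternately adding num1 and num2 until reaching limit."""
    total = 0
    seq = []
    steps = itertools.cycle((num1, num2))
    while total < limit:
        total += next(steps)
        seq.append(total)
    return seq


# Build every sequence, then keep only those whose last total is exactly 100
sequences = {n: cycle_sequence(2, n) for n in range(1, 11)}
hits = {n: seq for n, seq in sequences.items() if seq[-1] == 100}

for n, seq in hits.items():
    print(" ".join(str(v) for v in seq))
    print("Here are the numbers", 2, "&", n, "added together in a sequence")
    print()
```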
Is there a way to find and rank rows in a Pandas DataFrame by their similarity to a row from another DataFrame?
My understanding of your question: you have two data frames, hopefully with the same column count. You want to rate the members of the first data frame (the subject data frame) by how close, i.e. similar, they are to any of the members of the second one (the target data frame).
I am not aware of a built-in method.
It is probably not the most efficient way, but here is how I'd approach it:
#! /usr/bin/python3
import pandas as pd
import numpy as np
import pprint
pp = pprint.PrettyPrinter(indent=4)
# Simulate data
df_subject = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD')) # This is the one we're iterating over to check similarity to target.
df_target = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD')) # This is the one we're checking distance to
# This will hold the min distances.
distances=[]
# Loop to iterate over subject DF
for ix1,subject in df_subject.iterrows():
    distances_cur=[]
    # Loop to iterate over target DF
    for ix2,target in df_target.iterrows():
        distances_cur.append(np.linalg.norm(target-subject))
    # Get the minimum distance for the subject set member.
    distances.append(min(distances_cur))
# Distances to df
distances=pd.DataFrame(distances)
# Normalize.
distances=0.5-(distances-distances.mean(axis=0))/distances.max(axis=0)
# Column index joining, ordering and beautification.
Proximity_Ratings_name='Proximity Ratings'
distances=distances.rename(columns={0: Proximity_Ratings_name})
df_subject=df_subject.join(distances)
pp.pprint(df_subject.sort_values(Proximity_Ratings_name,ascending=False))
It should yield something like the table below. A higher rating means there is a more similar member in the target data frame:
A B C D Proximity Ratings
55 86 21 91 78 0.941537
38 91 31 35 95 0.901638
43 49 89 49 6 0.878030
98 28 98 98 36 0.813685
77 67 23 78 84 0.809324
35 52 16 36 58 0.802223
54 2 25 61 44 0.788591
95 76 3 60 46 0.766896
5 55 39 88 37 0.756049
52 79 71 90 70 0.752520
66 52 27 82 82 0.751353
41 45 67 55 33 0.739919
76 12 93 50 62 0.720323
94 99 84 39 63 0.716123
26 62 6 97 60 0.715081
40 64 50 37 27 0.714042
68 70 21 8 82 0.698824
47 90 54 60 65 0.676680
7 85 95 45 71 0.672036
2 14 68 50 6 0.661113
34 62 63 83 29 0.659322
8 87 90 28 74 0.647873
75 14 61 27 68 0.633370
60 9 91 42 40 0.630030
4 46 46 52 35 0.621792
81 94 19 82 44 0.614510
73 67 27 34 92 0.608137
30 92 64 93 51 0.608137
11 52 25 93 50 0.605770
51 17 48 57 52 0.604984
.. .. .. .. .. ...
64 28 56 0 9 0.397054
18 52 84 36 79 0.396518
99 41 5 32 34 0.388519
27 19 54 43 94 0.382714
92 69 56 73 93 0.382714
59 1 29 46 16 0.374878
58 2 36 8 96 0.362525
69 58 92 16 48 0.361505
31 27 57 80 35 0.349887
10 59 23 47 24 0.345891
96 41 77 76 33 0.345891
78 42 71 87 65 0.344398
93 12 31 6 27 0.329152
23 6 5 10 42 0.320445
14 44 6 43 29 0.319964
6 81 51 44 15 0.311840
3 17 60 13 22 0.293066
70 28 40 22 82 0.251549
36 95 72 35 5 0.249354
49 78 10 30 18 0.242370
17 79 69 57 96 0.225168
46 42 95 86 81 0.224742
84 58 81 59 86 0.221346
9 9 62 8 30 0.211659
72 11 51 74 8 0.159265
90 74 26 80 1 0.138993
20 90 4 6 5 0.117652
50 3 12 5 53 0.077088
42 90 76 42 1 0.075284
45 94 46 88 14 0.054244
I hope I understood correctly. Don't use this if performance matters; I'm sure there's an algebraic approach (matrix multiplication) that would run much faster.
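As a sketch of the faster approach hinted at above, NumPy broadcasting can compute all subject/target distances at once instead of the nested iterrows() loops (this swaps in broadcasting; it is not the answer's original code). It builds an intermediate array of shape (n_subject, n_target, n_columns), so memory grows with the product of the two row counts; scipy.spatial.distance.cdist is an alternative if SciPy is available. The helper name min_distances is hypothetical, and the same normalization as the loop version is applied.

```python
import numpy as np
import pandas as pd


def min_distances(df_subject, df_target):
    """Euclidean distance from each subject row to its nearest target row."""
    subj = df_subject.to_numpy(dtype=float)  # (n_subject, n_cols)
    targ = df_target.to_numpy(dtype=float)   # (n_target, n_cols)
    # Broadcast to (n_subject, n_target, n_cols), then take the norm over columns
    dists = np.linalg.norm(subj[:, None, :] - targ[None, :, :], axis=2)
    return pd.Series(dists.min(axis=1), index=df_subject.index)


rng = np.random.default_rng(0)
df_subject = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))
df_target = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))

d = min_distances(df_subject, df_target)
# Same normalization as the loop version
ratings = 0.5 - (d - d.mean()) / d.max()
result = df_subject.assign(**{"Proximity Ratings": ratings})
print(result.sort_values("Proximity Ratings", ascending=False).head())
```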
The output I'm getting looks like this:
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
100
I'm trying to get it to display in rows of 10.
row=1
column=1
while int(row) <= 10:
    while int(column) <= 10:
        print('{:3}'.format(int(row-1)*10+int(column)), end =' ')
        column = column +1
        if (column%10==0):
            print('\n')
    column = 1
    row = row + 1
This is the code that produces the output above. I have been able to get the desired result with a single while loop (below), but I can't get the same result with these nested while loops.
number = 1
while number <= 100:
    print ('{:3}'.format(number), end=' ')
    if (number%10==0):
        print ("\n")
    number = number +1
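In the nested version, column is incremented before the column%10==0 check, so the line break fires right after the 9th number of each row (when column becomes 10), and the 10th number ends up at the start of the next line. A minimal sketch of one fix keeps the nested while loops but ends the row after the inner loop finishes instead. Here each row is collected into a string (a choice made so the result is easy to check) rather than printed number by number; grid_rows is a hypothetical name.

```python
def grid_rows(rows=10, cols=10):
    """Return the numbers 1..rows*cols as text lines of `cols` numbers each."""
    lines = []
    row = 1
    while row <= rows:
        column = 1
        cells = []
        while column <= cols:
            cells.append('{:3}'.format((row - 1) * cols + column))
            column = column + 1
        # The inner loop has collected exactly `cols` numbers, so end the row here
        lines.append(' '.join(cells))
        row = row + 1
    return lines


for line in grid_rows():
    print(line)
```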