Extract data from line and add to specific numbered boxes - python

Sorry, but I need your help if possible. I am a complete beginner in Python and I am completely stuck. I would like to know what would be the first steps to solve my problem.
I have (many pages) of lines with the following structure:
pg10_65 * 3.2200 * 22 24 28 30 33 34 36 37
pg10_116 * 3.2420 * 24 28 30 33 34 37
pg10_118 * 3.1500 * 19 24 28 30 33 34 36
pg10_120 * 3.1230 * 24 28 30 33 34 36 37
pg74_32 * 3.0350 * 17 28 30 33 34 36 37 38
On each line, between the * symbols there is a value (one digit, a dot, and four decimals), and after the last * symbol there is a series of numbers between 1 and 68, though not all of them appear.
I have 68 boxes.
In this example, and for the first line, I want to add 3.2200 to boxes 22, 24, ..., 36, 37. If there is a 0 add 3.2200 to 0, if there is another value, add to that value.
For the second line, I want to add the values 3.2420 to boxes 24, 28, ..., 34, 37. If there is a 0 add to 0, if there is another value, add 3.2420 to that value.
And so on for each of the lines.
In the end I would have 68 boxes, each holding the sum of all the values assigned to that box.
I am completely stuck on this.
Thanks a lot to everyone for your advice.
José

I came up with a solution to your problem.
def add_to_boxes(input_string, all_boxes):
    '''
    Takes a list of boxes and your string input and does the additions you requested
    '''
    _, float_value, box_list = input_string.split('*')
    float_value = float(float_value)  # convert from string to float
    box_list = [int(s) for s in box_list.split()]  # convert from string to list of integers
    print("Adding ", float_value, " to ", box_list)
    for box in box_list:
        all_boxes[box - 1] += float_value  # box n is stored at index n-1 (indices 0 to 67)

boxes = [0 for i in range(68)]  # 68 boxes with the value 0
input_str = "pg10_65 * 3.2200 * 22 24 28 30 33 34 36 37"
add_to_boxes(input_str, boxes)
print(boxes)
Please let me know if you have any questions regarding how it works.
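Since the question mentions many pages of such lines, here is a minimal sketch of how the same idea could be applied to every line in turn. The function name accumulate_boxes and the file name in the closing comment are hypothetical, and the sketch assumes every non-empty line has exactly two * separators.

```python
def accumulate_boxes(lines, n_boxes=68):
    """Add each line's value to every box number listed on that line."""
    boxes = [0.0] * n_boxes              # all boxes start at 0
    for line in lines:
        line = line.strip()
        if not line:                     # skip blank lines
            continue
        _, value_text, numbers_text = line.split("*")
        value = float(value_text)
        for number in numbers_text.split():
            boxes[int(number) - 1] += value   # box n lives at index n-1
    return boxes


sample = [
    "pg10_65 * 3.2200 * 22 24 28 30 33 34 36 37",
    "pg10_116 * 3.2420 * 24 28 30 33 34 37",
]
boxes = accumulate_boxes(sample)
for number, total in enumerate(boxes, start=1):
    if total:
        print(number, round(total, 4))

# For a real file (hypothetical name), a file object can be passed directly:
# with open("lines.txt") as handle:
#     boxes = accumulate_boxes(handle)
```

Boxes that are never mentioned simply stay at 0, which matches the "if there is a 0, add to 0" rule in the question.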

Related

The very fast way to find repeating combinations in Python using pandas?

I have this "DrawsDB.csv" sample file as input:
Day,Hour,N1,N2,N3,N4,N5,N6,N7,N8,N9,N10,N11,N12,N13,N14,N15,N16,N17,N18,N19,N20
1996-03-18,15:00,4,9,10,16,21,22,23,26,27,34,35,41,42,48,62,66,68,73,76,78
1996-03-19,15:00,6,12,15,19,28,33,35,39,44,48,49,59,62,63,64,67,69,71,75,77
1996-03-21,15:00,2,4,6,7,15,16,17,19,20,26,28,45,48,52,54,69,72,73,75,77
1996-03-22,15:00,3,8,15,17,19,25,30,33,34,35,36,38,44,49,60,61,64,67,68,75
1996-03-25,15:00,2,10,11,14,18,22,26,27,29,30,42,44,45,55,60,61,66,67,75,79
2022-01-01,15:00,1,9,12,17,33,34,36,37,38,44,45,46,53,56,58,60,62,63,70,72
2022-01-01,22:50,1,3,4,14,19,22,24,27,32,33,35,36,44,48,53,55,69,70,76,78
2022-01-02,15:00,13,15,16,19,22,24,31,37,38,43,47,58,64,66,70,72,73,75,76,78
2022-01-02,22:50,5,10,11,14,16,28,29,36,41,53,54,56,58,59,61,67,68,71,73,77
2022-01-03,15:00,8,9,10,11,15,20,21,22,26,30,35,36,39,42,52,58,63,64,73,80
2022-01-03,22:50,4,9,17,21,22,32,33,34,36,37,38,41,48,49,50,60,64,69,70,75
2022-01-04,15:00,4,5,7,9,11,16,17,21,22,25,30,37,38,39,44,49,52,60,65,78
2022-01-04,22:50,17,18,22,26,27,30,31,40,43,49,55,62,63,64,65,71,72,73,76,80
2022-01-05,15:00,1,5,8,14,15,20,23,25,26,33,34,35,37,47,54,59,67,70,72,76
2022-01-05,22:50,6,7,14,15,16,18,26,37,39,41,45,51,52,54,55,59,61,70,71,80
2022-01-06,15:00,9,10,11,17,28,30,32,41,42,44,45,49,50,51,55,65,67,72,76,78
2022-01-06,22:50,1,2,6,9,11,15,21,26,31,37,40,43,47,51,52,54,67,68,73,75
This is just a sample. The real csv file is more than 50.000 rows in total.
Columns N1 to N20 contain random values that do not repeat within a row, and they are sorted from the smallest (N1) to the largest (N20).
I want to get repeating combos (e.g. of 5 numbers let's say) across all rows from the DataFrame from columns N1 to N20.
So, for the entire .csv file posted above the output should be:
(6, 15, 26, 52, 54) 3
(17, 33, 34, 36, 38) 3
(17, 33, 34, 36, 60) 3
(17, 33, 34, 38, 60) 3
(17, 33, 36, 38, 60) 3
(17, 34, 36, 38, 60) 3
(33, 34, 36, 38, 60) 3
...
This is the full output, which I'm not posting here because of text size limitations:
https://pastebin.com/4EVXXSn1
Please check it out.
Sorry for making such long output, I tried to create a shorter one but didn't succeed in getting representative combos for it.
This is the Python code I wrote to accomplish what I need: (please read its commented lines too)
import pandas as pd
from itertools import combinations
from collections import Counter

df = pd.read_csv("DrawsDB.csv")
# looping through db using method found here:
# https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
df = df.reset_index()  # make sure indexes pair with number of rows
draws = []
# please read this: https://stackoverflow.com/a/55557758/7710871 (Conclusion:iter is very slow)
for index, row in df.iterrows():
    draws.append(
        [row['N1'], row['N2'], row['N3'], row['N4'], row['N5'], row['N6'], row['N7'], row['N8'], row['N9'], row['N10'],
         row['N11'], row['N12'], row['N13'], row['N14'], row['N15'], row['N16'], row['N17'], row['N18'], row['N19'],
         row['N20']])
# comparing to each other in order to check for repeating combos:
repeating_combos = []
for i in range(len(draws)):
    for j in draws[i + 1:]:
        repeating_combos.append(sorted(list(set(draws[i]).intersection(j))))
# e.g. getting any repeating combo of 5 across all rows:
combos_of_5 = []
for each in repeating_combos:
    if len(each) == 5:
        combos_of_5.append(tuple(each))
        # print(each)
    elif len(each) > 5:
        # e.g. a repeating sequence of 6 numbers means in fact 6 combos taken by 5 numbers in this case.
        # e.g. a repeating sequence of 7 numbers means in fact 21 combos of 5 numbers and so on.
        # Combinations(k, n)
        for cmb in combinations(each, 5):
            combos_of_5.append(tuple(sorted(list(set(cmb)))))
# count how many times each combo appear:
x = Counter(combos_of_5)
sorted_x = dict(sorted(x.items(), key=lambda item: item[1], reverse=True))
for k, v in sorted_x.items():
    print(k, v)
It works as expected, but there is one problem: for a bigger DataFrame it takes a long time to finish. Worse, if you want repeating combinations of more than 5 numbers (say 6, 7, 8 or 9), it takes forever to run.
How can I do this in pure pandas, in a much faster and smarter way than I did?
Also, please note that I deliberately do not generate every possible combo first and then search the DataFrame for each one, because that would take even longer.
Thank you very much in advance!
P.S. What if the numbers from N1 to N20 were not sorted? Will this make any difference?
I have read this topic and many others already, but none asks for the same thing, so I think this is not a duplicate, and it could help many others who have the same or a very similar problem.
Proof of work:
Given this part of your dataframe:
index  Day         Hour   N1  N2  N3  N4  N5  N6  N7  N8  N9  N10  N11  N12  N13  N14  N15  N16  N17  N18
0      1996-03-18  15:00  4   9   10  16  21  22  23  26  27  34   35   41   42   48   62   66   68   73
1      1996-03-19  15:00  6   12  15  19  28  33  35  39  44  48   49   59   62   63   64   67   69   71
2      1996-03-21  15:00  2   4   6   7   15  16  17  19  20  26   28   45   48   52   54   69   72   73
3      1996-03-22  15:00  3   8   15  17  19  25  30  33  34  35   36   38   44   49   60   61   64   67
You can update your code with something similar to the one below:
check = [6,15]
df['check'] = df.iloc[:,2:].apply(lambda r: all(s in r.values for s in check), axis=1)
true_count = df.check.sum()
print(f'The following numbers {check} appear {true_count} time(s) in the dataframe.')
Result:
The following numbers [6, 15] appear 2 time(s) in the dataframe.
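The same check can also be written without a row-wise apply. The sketch below is only an illustration under stated assumptions: the helper name count_rows_containing is hypothetical, the number columns are assumed to be the ones whose names start with "N", and it relies on the question's statement that numbers never repeat within a row. It uses DataFrame.isin to mark matching cells and then counts the rows where every requested number was found.

```python
import pandas as pd


def count_rows_containing(df, numbers, prefix="N"):
    """Count rows whose number columns contain all of `numbers`."""
    wanted = set(numbers)
    # Only look at the draw columns (N1, N2, ...), not Day/Hour/index.
    number_cols = [c for c in df.columns if str(c).startswith(prefix)]
    # One True per matching cell; rows have no duplicates, so a row contains
    # all wanted numbers exactly when it has len(wanted) matching cells.
    hits = df[number_cols].isin(list(wanted)).sum(axis=1)
    return int((hits == len(wanted)).sum())


# First five numbers of the first four sample rows from the question.
sample = pd.DataFrame({
    "Day": ["1996-03-18", "1996-03-19", "1996-03-21", "1996-03-22"],
    "N1": [4, 6, 2, 3],
    "N2": [9, 12, 4, 8],
    "N3": [10, 15, 6, 15],
    "N4": [16, 19, 7, 17],
    "N5": [21, 28, 15, 19],
})
print(count_rows_containing(sample, [6, 15]))
```

On this small sample the numbers 6 and 15 occur together in the second and third rows, which agrees with the count of 2 reported above.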

how to sequentially assign two numbers in an array?

I am trying to pair up numbers that lie diagonally to each other in a matrix, following a fixed order.
First, the 1st number in the penultimate row is paired with the 2nd number in the last row; then the 1st number in the row above is paired with the 2nd number in the penultimate row, and so on. The sequence is shown in the examples below. The matrix does not always have the same size.
Example
a=np.array([[11,12,13],
[21,22,23],
[31,32,33]])
required output:
21 32
11 22
11 33
22 33
12 23
or
a=np.array([[11,12,13,14],
[21,22,23,24],
[31,32,33,34],
[41,42,43,44]])
required output:
31 42
21 32
21 43
32 43
11 22
11 33
11 44
22 33
22 44
12 23
12 34
23 34
13 24
Is it possible?
Here's an iterative solution, assuming a square matrix. Modifying this for non-square matrices shouldn't be hard.
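import numpy as np
a = np.array([[11,12,13,14],
              [21,22,23,24],
              [31,32,33,34],
              [41,42,43,44]])
w, h = a.shape
# diagonals starting in the first column, from the bottom up to the main diagonal
for y0 in range(1, h):
    y = h - y0 - 1
    for x in range(h - y - 1):
        print(a[y+x, x], a[y+x+1, x+1])
# diagonals starting in the first row, to the right of the main diagonal
for x in range(1, w - 1):
    for y in range(w - x - 1):
        print(a[y, x+y], a[y+1, x+y+1])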
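The required output in the question also lists non-adjacent pairs on the same diagonal, such as 11 33. As an alternative sketch (not the approach of the answer above), each diagonal can be taken with ndarray.diagonal and every pair on it formed with itertools.combinations. The function name diagonal_pairs is hypothetical. For the 4x4 example this would also yield 33 44, which the question's listing does not show.

```python
import numpy as np
from itertools import combinations


def diagonal_pairs(a):
    """Return all pairs of elements sharing a diagonal, lowest diagonal first."""
    n_rows, n_cols = a.shape
    pairs = []
    # Negative offsets are diagonals below the main one, positive ones above.
    for offset in range(-(n_rows - 1), n_cols):
        diag = a.diagonal(offset).tolist()
        # A single-element diagonal produces no pairs.
        pairs.extend(combinations(diag, 2))
    return pairs


a = np.array([[11, 12, 13],
              [21, 22, 23],
              [31, 32, 33]])
for first, second in diagonal_pairs(a):
    print(first, second)
```

Because the offsets are derived from both dimensions of the shape, the same function should also work for non-square matrices.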

How to read one column data as one by one row in csv file using python

I have a dataset with three input columns: x1, x2 and x3. I want to read only the x2 column and go through its values one row at a time.
I wrote the code below, but it only prints letters.
Here is my code
data = pd.read_csv('data6.csv')
row_num =0
x=[]
for col in data:
if (row_num==1):
x.append(col[0])
row_num =+ 1
print(x)
result : x1,x2,x3
What I expected output is:
expected output x2 (read one by one row)
65
32
14
25
85
47
63
21
98
65
21
47
48
49
46
43
48
25
28
29
37
Subset of my csv file :
x1 x2 x3
6 65 78
5 32 59
5 14 547
6 25 69
7 85 57
8 47 51
9 63 26
3 21 38
2 98 24
7 65 96
1 21 85
5 47 94
9 48 15
4 49 27
3 46 96
6 43 32
5 48 10
8 25 75
5 28 20
2 29 30
7 37 96
Can anyone help me to solve this error?
If you want a list of the values in x2, use:
x = data['x2'].tolist()
I am not sure I even get what you're trying to do from your code.
What you're doing (after fixing the indentation to make it somewhat correct):
Iterate through all columns of your dataframe
Take the first character of the column name if row_num is equal to 1.
Based on this guess:
import pandas as pd
data = pd.read_csv("data6.csv")
row_num = 0
x = []
for col in data:
    if row_num == 1:
        x.append(col[0])
    row_num = +1
print(x)
What you probably want to do:
import pandas as pd
data = pd.read_csv("data6.csv")
# Make a list containing the values in column 'x2'
x = list(data['x2'])
# Print all values at once:
print(x)
# Print one value per line:
for val in x:
    print(val)
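To make the two points above concrete, here is a small self-contained sketch. It assumes the file is comma-separated and uses an in-memory copy of the first three data rows instead of data6.csv. It shows that iterating over a DataFrame yields the column names, and that `row_num =+ 1` assigns +1 rather than incrementing.

```python
import io
import pandas as pd

# In-memory stand-in for data6.csv (assumed to be comma-separated).
csv_text = "x1,x2,x3\n6,65,78\n5,32,59\n5,14,547\n"
data = pd.read_csv(io.StringIO(csv_text))

# Iterating over a DataFrame gives column names, not rows.
column_names = [col for col in data]
print(column_names)

# "=+" is plain assignment of +1, so repeating it never counts upwards.
row_num = 0
row_num =+ 1
row_num =+ 1
print(row_num)

# Selecting the column by name gives its values row by row.
x2_values = data["x2"].tolist()
for value in x2_values:
    print(value)
```

This explains why the original loop only ever appended the letter "x": col was a column name such as "x2", and col[0] is its first character.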
If you are already using pandas, you can select the column and pass it to list() to convert its values directly into a list. No for loop is needed.
import pandas as pd
data = pd.read_csv('data6.csv')
print(list(data['x2']))

Random positioning of pawns on chessboard

I want to (pseudo-)randomly position a number of points on a grid. Think of it as a 10 x 10 chessboard with 100 squares, and think of the points as pawns that occupy one square each. What I want is for the pawns to be "evenly" distributed over the board. The middle-square method is fine for generating the random numbers.
00 01 02 03 04 05 06 07 08 09
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 ...
My problem is that if I store the grid squares as a list, and draw them from top left to bottom right, then square 9 is not next to 10, as you would expect. 10 is next to 0 and 20. What's the best way around this?
Element numbering in lists starts at zero, not 1. So your first square has number 0 (you do see it, right?) and the 10th square has number 9. That is why the 1st square in the 2nd row has number 10 and not 11: 0 + 10 = 10. If you want to print the numbers starting from 1, add 1 to the square number when printing:
print (square[number] + 1)
As I cannot see your code, I cannot provide a code example that would fit it exactly.
Just because you store the final squares as a list, that doesn't mean you have to generate random numbers in a linear range. If you know you have, say, a 100x100 grid, then choose points by choosing X and Y coordinates separately, and then fill in square number 100 * X + Y.
If you want the points distributed randomly, then just make sure you choose both X and Y from a uniform random distribution 0..99. If you want the points distributed evenly, use something like a Halton sequence.
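As a rough illustration of the Halton idea for the 10 x 10 board from the question, the sketch below uses base 2 for X and base 3 for Y and maps each point to square number size * X + Y. The function names halton and halton_squares are hypothetical, and skipping squares that are already occupied is an assumption about what "one pawn per square" should mean here.

```python
def halton(index, base):
    """Return element `index` (starting at 1) of the Halton sequence in `base`."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_squares(count, size=10):
    """Pick `count` distinct square numbers on a size x size board."""
    if count > size * size:
        raise ValueError("more pawns than squares")
    squares = []
    index = 1
    while len(squares) < count:
        x = int(halton(index, 2) * size)   # coordinate from base 2
        y = int(halton(index, 3) * size)   # coordinate from base 3
        square = size * x + y
        if square not in squares:          # keep one pawn per square
            squares.append(square)
        index += 1
    return squares


print(halton_squares(10))
```

Choosing X and Y separately like this sidesteps the adjacency problem of the flat list, because neighbouring squares are defined by their coordinates rather than by consecutive list indices.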

Printing a rather specific matrix

I have a list consisting of 148 entries. Each entry is a four digit number. I would like to print out the result as this:
1 14 27 40
2 15 28 41
3 16 29 42
4 17 30 43
5 18 31 44
6 19 32 45
7 20 33 46
8 21 34 47
9 22 35 48
10 23 36 49
11 24 37 50
12 25 38 51
13 26 39 52
53
54
55... and so on
I have some code that works for the first 13 rows and 4 columns:
kort_identifier = [my_list_with_the_entries]
print_val = 0
print_num_1 = 0
print_num_2 = 13
print_num_3 = 26
print_num_4 = 39
while (print_val <= 36):
    print kort_identifier[print_num_1], '%10s' % kort_identifier[print_num_2], '%10s' % kort_identifier[print_num_3], '%10s' % kort_identifier[print_num_4]
    print_val += 1
    print_num_1 += 1
    print_num_2 += 1
    print_num_3 += 1
    print_num_4 += 1
I feel this is an awful solution and there has to be a better and simpler way of doing this. I have searched through here (searched for printing tables and matrices) and tried those solution but none seems to work with this odd table/matrix behaviour that I need.
Please point me in the right direction.
A bit tricky, but here you go. I opted to manipulate the list until it had the right shape, instead of messing around with indexes.
lst = range(1, 149)
lst = [lst[i:i+13] for i in xrange(0, len(lst), 13)]
# Python 2: zip() stops at the shortest column, so rows beyond a partial last column are dropped
lst = zip(*[lst[i] + lst[i+4] + lst[i+8] for i in xrange(4)])
for row in lst:
    for col in row:
        print col,
    print
It might be overkill, but you could just make a numpy array.
import numpy as np
# reshape(2, 13, 4) requires exactly 2 * 13 * 4 = 104 entries
x = np.array(kort_identifier).reshape(2, 13, 4)
for subarray in x:
    for row in subarray:
        print row
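For a list whose length is not a multiple of 52, such as the 148 entries in the question, a plain slicing approach may be easier to reason about. The following is a Python 3 sketch, not a port of either answer. The function name format_pages and the column width of 10 are assumptions. Entries run down each column, with 4 columns of 13 rows per block, and a short final block is printed as far as it goes.

```python
def format_pages(entries, rows=13, cols=4, width=10):
    """Lay out entries column by column, `cols` columns of `rows` per block."""
    lines = []
    page_size = rows * cols
    for start in range(0, len(entries), page_size):
        page = entries[start:start + page_size]
        for r in range(rows):
            # Row r of a block holds items r, r + rows, r + 2 * rows, ...
            cells = page[r::rows]
            if cells:
                line = "".join(str(c).ljust(width) for c in cells)
                lines.append(line.rstrip())
    return lines


for line in format_pages(list(range(1, 149))):
    print(line)
```

The step slice page[r::rows] does the column-major indexing, so no separate counters per column are needed.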
