Python: looping back in a range when the stop exceeds a value?

I have a range like the one below. What I am trying to do is loop back to 0 if the range goes past a certain value (96 in this example). I can simply loop through the range as I did below, but is there a better way to do this with Python's range?
my_range = range(90, 100)
tmp_list = []
for i in range(90, 100):
    if i >= 96:
        tmp_list.append(i - 96)
    else:
        tmp_list.append(i)
print(tmp_list)
[90, 91, 92, 93, 94, 95, 0, 1, 2, 3]

Check out itertools.cycle:
from itertools import cycle
def clipped_cycle(start, end):
    c = cycle(range(0, end))  # cycles 0, 1, ..., end - 1, 0, 1, ...
    # Discard till start
    for _ in range(start):
        next(c)
    return c
c = clipped_cycle(90, 96)
for i in c:
    print(i)
What you get is an infinite output stream that keeps cycling:
90
91
92
93
94
95
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
.
.
.
To get a limited number of outputs, start from a fresh iterator (the loop above never ends):
c = clipped_cycle(90, 96)
n = 7
for _ in range(n):
    print(next(c))
gives
90
91
92
93
94
95
0
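A more compact variant of the same idea, offered as a sketch: itertools.islice can do both the discarding and the limiting, so no manual next() calls are needed. The function name wrapped_values and its parameters are illustrative, not part of any library.

```python
from itertools import cycle, islice


def wrapped_values(start, count, modulus=96):
    """Return `count` values beginning at `start`, wrapping back to 0 after modulus - 1."""
    # islice(iterable, start, stop) skips the first `start` items of the endless
    # cycle 0, 1, ..., modulus - 1, 0, 1, ... and stops after `count` more.
    return list(islice(cycle(range(modulus)), start, start + count))


print(wrapped_values(90, 7))  # [90, 91, 92, 93, 94, 95, 0]
```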

First, I don't understand why you define my_range = range(90, 100) if you never use it.
You can use the modulo operator (%) in cases like this.
Try this; it's short and effective:
xlist = [i % 96 for i in range(90, 100)]
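One detail worth noting: i % 96 and the question's i - 96 only agree while i < 192; past that, the modulo keeps wrapping while the subtraction does not. A lazy generator version, as a sketch (the name wrapped_range is hypothetical):

```python
def wrapped_range(start, stop, modulus):
    """Like range(start, stop), but every value is reduced modulo `modulus`."""
    # A generator expression: values are produced one at a time, no list is built.
    return (i % modulus for i in range(start, stop))


print(list(wrapped_range(90, 100, 96)))  # [90, 91, 92, 93, 94, 95, 0, 1, 2, 3]
```

For example, 192 % 96 is 0, whereas 192 - 96 is 96, which is why the two formulas diverge for long ranges.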

Related

A very fast way to find repeating combinations in Python using pandas?

I have this "DrawsDB.csv" sample file as input:
Day,Hour,N1,N2,N3,N4,N5,N6,N7,N8,N9,N10,N11,N12,N13,N14,N15,N16,N17,N18,N19,N20
1996-03-18,15:00,4,9,10,16,21,22,23,26,27,34,35,41,42,48,62,66,68,73,76,78
1996-03-19,15:00,6,12,15,19,28,33,35,39,44,48,49,59,62,63,64,67,69,71,75,77
1996-03-21,15:00,2,4,6,7,15,16,17,19,20,26,28,45,48,52,54,69,72,73,75,77
1996-03-22,15:00,3,8,15,17,19,25,30,33,34,35,36,38,44,49,60,61,64,67,68,75
1996-03-25,15:00,2,10,11,14,18,22,26,27,29,30,42,44,45,55,60,61,66,67,75,79
2022-01-01,15:00,1,9,12,17,33,34,36,37,38,44,45,46,53,56,58,60,62,63,70,72
2022-01-01,22:50,1,3,4,14,19,22,24,27,32,33,35,36,44,48,53,55,69,70,76,78
2022-01-02,15:00,13,15,16,19,22,24,31,37,38,43,47,58,64,66,70,72,73,75,76,78
2022-01-02,22:50,5,10,11,14,16,28,29,36,41,53,54,56,58,59,61,67,68,71,73,77
2022-01-03,15:00,8,9,10,11,15,20,21,22,26,30,35,36,39,42,52,58,63,64,73,80
2022-01-03,22:50,4,9,17,21,22,32,33,34,36,37,38,41,48,49,50,60,64,69,70,75
2022-01-04,15:00,4,5,7,9,11,16,17,21,22,25,30,37,38,39,44,49,52,60,65,78
2022-01-04,22:50,17,18,22,26,27,30,31,40,43,49,55,62,63,64,65,71,72,73,76,80
2022-01-05,15:00,1,5,8,14,15,20,23,25,26,33,34,35,37,47,54,59,67,70,72,76
2022-01-05,22:50,6,7,14,15,16,18,26,37,39,41,45,51,52,54,55,59,61,70,71,80
2022-01-06,15:00,9,10,11,17,28,30,32,41,42,44,45,49,50,51,55,65,67,72,76,78
2022-01-06,22:50,1,2,6,9,11,15,21,26,31,37,40,43,47,51,52,54,67,68,73,75
This is just a sample. The real CSV file has more than 50,000 rows in total.
Columns N1 to N20 contain random values that never repeat within the same row (there are no duplicates in a row), and they are sorted from the smallest (N1) to the largest (N20).
I want to find repeating combos (say, of 5 numbers) across all rows of the DataFrame, using columns N1 to N20.
So, for the entire .csv file posted above, the output should be:
(6, 15, 26, 52, 54) 3
(17, 33, 34, 36, 38) 3
(17, 33, 34, 36, 60) 3
(17, 33, 34, 38, 60) 3
(17, 33, 36, 38, 60) 3
(17, 34, 36, 38, 60) 3
(33, 34, 36, 38, 60) 3
...
This is the full output, which I'm not posting here because of text size limitations:
https://pastebin.com/4EVXXSn1
Please check it out.
Sorry for such a long output; I tried to create a shorter one but couldn't get representative combos from it.
This is the Python code I wrote to accomplish what I need (please read its comments too):
import pandas as pd
from itertools import combinations
from collections import Counter
df = pd.read_csv("DrawsDB.csv")
# looping through db using method found here:
# https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
df = df.reset_index()  # make sure indexes pair with number of rows
draws = []
# please read this: https://stackoverflow.com/a/55557758/7710871 (Conclusion: iter is very slow)
for index, row in df.iterrows():
    draws.append(
        [row['N1'], row['N2'], row['N3'], row['N4'], row['N5'], row['N6'], row['N7'], row['N8'], row['N9'], row['N10'],
         row['N11'], row['N12'], row['N13'], row['N14'], row['N15'], row['N16'], row['N17'], row['N18'], row['N19'],
         row['N20']])
# comparing to each other in order to check for repeating combos:
repeating_combos = []
for i in range(len(draws)):
    for j in draws[i + 1:]:
        repeating_combos.append(sorted(list(set(draws[i]).intersection(j))))
# e.g. getting any repeating combo of 5 across all rows:
combos_of_5 = []
for each in repeating_combos:
    if len(each) == 5:
        combos_of_5.append(tuple(each))
        # print(each)
    elif len(each) > 5:
        # e.g. a repeating sequence of 6 numbers means in fact 6 combos taken by 5 numbers in this case.
        # e.g. a repeating sequence of 7 numbers means in fact 21 combos of 5 numbers and so on.
        # Combinations(k, n)
        for cmb in combinations(each, 5):
            combos_of_5.append(tuple(sorted(list(set(cmb)))))
# count how many times each combo appear:
x = Counter(combos_of_5)
sorted_x = dict(sorted(x.items(), key=lambda item: item[1], reverse=True))
for k, v in sorted_x.items():
    print(k, v)
It works very well, as expected, but there is one problem: for a bigger DataFrame it takes a long time to finish. Moreover, if you want to get repeating combinations of more than 5 numbers (say 6, 7, 8 or 9), it takes forever to run.
How can I do this in pure pandas, in a much faster and smarter way than I did?
Also, please note that the solution should not first generate every possible combo and then search the DataFrame for each of them, because that would take even longer.
Thank you very much in advance!
P.S. What if the numbers from N1 to N20 were not sorted? Would that make any difference?
I have already read this topic and many others, but none asks for the same thing, so I don't think this is a duplicate, and it could help others with the same or a very similar problem.
Proof of work:
Given this part of your dataframe:
index  Day         Hour   N1  N2  N3  N4  N5  N6  N7  N8  N9  N10 N11 N12 N13 N14 N15 N16 N17 N18
0      1996-03-18  15:00  4   9   10  16  21  22  23  26  27  34  35  41  42  48  62  66  68  73
1      1996-03-19  15:00  6   12  15  19  28  33  35  39  44  48  49  59  62  63  64  67  69  71
2      1996-03-21  15:00  2   4   6   7   15  16  17  19  20  26  28  45  48  52  54  69  72  73
3      1996-03-22  15:00  3   8   15  17  19  25  30  33  34  35  36  38  44  49  60  61  64  67
You can update your code with something similar to the one below:
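check = [6, 15]
# look only at the number columns N1..N20, not at Day/Hour (or an index column added by reset_index)
df['check'] = df.loc[:, 'N1':'N20'].apply(lambda r: all(s in r.values for s in check), axis=1)
true_count = df.check.sum()
print(f'The following numbers {check} appear {true_count} time(s) in the dataframe.')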
Result:
The following numbers [6, 15] appear 2 time(s) in the dataframe.
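The slow part of the question's code is the pure-Python loop over every pair of rows. A sketch of one way to speed that up, assuming every value is an integer between 1 and 80 (as in the sample): encode each row as a 0/1 vector, let NumPy compute how many numbers every pair of rows shares with a matrix product, and only build combinations for pairs that share at least k numbers. It keeps the question's counting rule (each pair of rows sharing a combo adds 1), processes rows in chunks so the full n x n overlap matrix is never held in memory at once, and it is NumPy rather than pure pandas. Because each row is sorted inside the function, the order of N1..N20 does not matter. The function and parameter names are illustrative.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def count_repeating_combos(draws, k=5, max_number=80, chunk_size=256):
    """Count k-number combos shared by pairs of rows.

    Same counting rule as the question's loop: every pair of rows that shares a
    combo adds 1 to that combo. Assumes all values are ints in 1..max_number.
    """
    rows = [sorted(set(r)) for r in draws]  # order inside a row does not matter
    n = len(rows)
    # One-hot matrix: onehot[r, v - 1] == 1 when value v appears in row r.
    # float32 lets NumPy use fast BLAS matrix products; small counts stay exact.
    onehot = np.zeros((n, max_number), dtype=np.float32)
    for r, values in enumerate(rows):
        onehot[r, np.asarray(values, dtype=np.intp) - 1] = 1.0
    counts = Counter()
    for start in range(0, n, chunk_size):
        # overlap[a, j] = how many numbers row (start + a) shares with row j
        overlap = onehot[start:start + chunk_size] @ onehot.T
        for a, j in zip(*np.nonzero(overlap >= k)):
            i = start + int(a)
            if j <= i:
                continue  # count each unordered pair once and skip self-matches
            common = sorted(set(rows[i]).intersection(rows[int(j)]))
            counts.update(combinations(common, k))
    return counts
```

With the question's DataFrame, draws = df.loc[:, 'N1':'N20'].to_numpy().tolist() can replace the iterrows() loop, and count_repeating_combos(draws, k=5).most_common() lists the combos by count. This still compares every pair of rows (quadratic work), but that part runs inside NumPy; larger k values usually make it cheaper, because fewer pairs pass the overlap filter.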

How can I efficiently put text into sublists inside a list? (Python)

I wrote some code to calculate the maximum path sum of a triangle. This is the triangle:
75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
So, taking the largest number in each row, the sum is: 75+95+82+87+82 = 421
This is my code to calculate it:
lst = [[75],
       [95, 64],
       [17, 47, 82],
       [18, 35, 87, 10],
       [20, 4, 82, 47, 65]]
something = 1
i = 0
mid = 0
while something != 0:
    for x in lst:
        new = max(lst[i])
        print(new)
        i += 1
        mid += new
    something = 0
print(mid)
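Note that taking the largest number in each row is not the same as the maximum path sum: on a path, each step must move to one of the two numbers directly below, so after 95 the path can only continue with 17 or 47, not 82. A minimal bottom-up dynamic-programming sketch (the function name is illustrative) gives 390 for this triangle, along the path 75 + 64 + 82 + 87 + 82:

```python
def max_path_sum(triangle):
    """Largest top-to-bottom sum, stepping to one of the two adjacent numbers below."""
    best = list(triangle[-1])  # best sum of a path starting at each cell of the bottom row
    for row in reversed(triangle[:-1]):
        # a cell's best path continues through the better of its two children
        best = [value + max(best[i], best[i + 1]) for i, value in enumerate(row)]
    return best[0]


triangle = [[75], [95, 64], [17, 47, 82], [18, 35, 87, 10], [20, 4, 82, 47, 65]]
print(max_path_sum(triangle))             # 390
print(sum(max(row) for row in triangle))  # 421, the sum of the row maxima
```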
As you can see, I wrote every row of the triangle down as a list and put those lists in a (head) list. These are not a lot of numbers, but what if I have a bigger triangle? Doing it manually is a lot of work. So my question is: how can I efficiently put the numbers from the triangle into sublists inside a head list?
If your input starts with a line containing the number of rows in the triangle, followed by that many rows of numbers, read the first number and use it as the limit of a range(). Then use a list comprehension to build the list of sublists.
rows = int(input())
lst = [list(map(int, input().split())) for _ in range(rows)]
For instance, to read your sample triangle, the input would be:
5
75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
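If the triangle is stored with one row per line and no row-count line, the count is not needed at all. A sketch, where parse_triangle is an illustrative name and triangle.txt a hypothetical file name:

```python
def parse_triangle(text):
    """Turn text with one row of whitespace-separated numbers per line into a list of int lists."""
    # int() accepts leading zeros in strings, so "04" becomes 4; blank lines are skipped
    return [[int(n) for n in line.split()] for line in text.splitlines() if line.strip()]


sample = """75
95 64
17 47 82
18 35 87 10
20 04 82 47 65"""
print(parse_triangle(sample))

# Reading from a file instead (triangle.txt is a hypothetical name):
# with open("triangle.txt") as f:
#     lst = parse_triangle(f.read())
```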

Trying to construct a greedy algorithm with python

So I'm trying to create a greedy algorithm for a knapsack problem. The text file below is knap20.txt. The first line gives the number of items, in this case 20. The last line gives the capacity of the knapsack, in this case 524. The remaining lines give the index, value and weight of each item.
Ideally, my function should return the solution as a list, together with its total value.
From what I can tell from my results, my program is working correctly. Is it working as you would expect, and how can I improve it?
txt file
20
1 91 29
2 60 65
3 61 71
4 9 60
5 79 45
6 46 71
7 19 22
8 57 97
9 8 6
10 84 91
11 20 57
12 72 60
13 32 49
14 31 89
15 28 2
16 81 30
17 55 90
18 43 25
19 100 82
20 27 19
524
python file
import os
import matplotlib.pyplot as plt
def get_optimal_value(capacity, weights, values):
    value = 0.
    numItems = len(values)
    valuePerWeight = sorted([[values[i] / weights[i], weights[i]] for i in range(numItems)], reverse=True)
    while capacity > 0 and numItems > 0:
        maxi = 0
        idx = None
        for i in range(numItems):
            if valuePerWeight[i][1] > 0 and maxi < valuePerWeight[i][0]:
                maxi = valuePerWeight[i][0]
                idx = i
        if idx is None:
            return 0.
        if valuePerWeight[idx][1] <= capacity:
            value += valuePerWeight[idx][0]*valuePerWeight[idx][1]
            capacity -= valuePerWeight[idx][1]
        else:
            if valuePerWeight[idx][1] > 0:
                value += (capacity / valuePerWeight[idx][1]) * valuePerWeight[idx][1] * valuePerWeight[idx][0]
                return values, value
        valuePerWeight.pop(idx)
        numItems -= 1
    return value
def read_kfile(fname):
    print('file started')
    with open(fname) as kfile:
        print('fname found', fname)
        lines = kfile.readlines()  # reads the whole file
    n = int(lines[0])
    c = int(lines[n+1])
    vs = []
    ws = []
    lines = lines[1:n+1]  # Removes the first and last line
    for l in lines:
        numbers = l.split()  # Converts the string into a list
        vs.append(int(numbers[1]))  # Appends value, need to convert to int
        ws.append(int(numbers[2]))  # Appends weight, need to convert to int
    return n, c, vs, ws
dir_path = os.path.dirname(os.path.realpath(__file__))  # Get the directory where the file is located
os.chdir(dir_path)  # Change the working directory so we can read the file
knapfile = 'knap20.txt'
nitems, capacity, values, weights = read_kfile(knapfile)
val1, val2 = get_optimal_value(capacity, weights, values)
print('values', val1)
print('value', val2)
result
values [91, 60, 61, 9, 79, 46, 19, 57, 8, 84, 20, 72, 32, 31, 28, 81, 55, 43, 100, 27]
value 733.2394366197183
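A few observations, with a sketch. The code implements the fractional knapsack (it takes a fraction of the last item that does not fit), and the 733.24 total matches that greedy strategy for this file. However, the values it returns is just the input list, not the chosen items, and whenever the loop ends without splitting an item (or idx is None) the function returns a single number, so val1, val2 = ... would raise a TypeError. The sketch below, with illustrative names and assuming positive weights, returns the chosen item numbers (1-based, as in knap20.txt) together with the fraction taken of each. If the task is the 0/1 knapsack, where items cannot be split, ordering by value/weight ratio is only a heuristic and is not guaranteed to be optimal.

```python
def greedy_fractional_knapsack(capacity, values, weights):
    """Greedy fractional knapsack: take items by value/weight ratio, best first.

    Returns (chosen, total) where chosen is a list of (item_number, fraction_taken),
    item_number being 1-based like the first column of knap20.txt.
    """
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    chosen = []
    total = 0.0
    for i in order:
        if capacity <= 0:
            break
        if weights[i] <= capacity:        # the whole item fits
            chosen.append((i + 1, 1.0))
            total += values[i]
            capacity -= weights[i]
        else:                             # take only the part that fits, then stop
            fraction = capacity / weights[i]
            chosen.append((i + 1, fraction))
            total += fraction * values[i]
            capacity = 0
    return chosen, total


values = [91, 60, 61, 9, 79, 46, 19, 57, 8, 84, 20, 72, 32, 31, 28, 81, 55, 43, 100, 27]
weights = [29, 65, 71, 60, 45, 71, 22, 97, 6, 91, 57, 60, 49, 89, 2, 30, 90, 25, 82, 19]
chosen, total = greedy_fractional_knapsack(524, values, weights)
print(total)  # about 733.24
```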

How to read a file word by word

I have a PPM file that I need to do certain operations on. The file is structured as in the following example. The first line, 'P3', just says what kind of document it is. The second line gives the pixel dimensions of the image, so in this case it's telling us that the image is 480x640. The third line declares the maximum value any color can take. After that come the pixel values: each group of three integers gives the RGB value of one pixel. So in this example, the first pixel has RGB value 49, 49, 49, the second pixel has RGB value 48, 48, 48, and so on.
P3
480 640
255
49 49 49 48 48 48 47 47 47 46 46 46 45 45 45 42 42 42 38 38
38 35 35 35 23 23 23 8 8 8 7 7 7 17 17 17 21 21 21 29 29
29 41 41 41 47 47 47 49 49 49 42 42 42 33 33 33 24 24 24 18 18
...
Now, as you may notice, this particular picture is supposed to be 640 pixels wide, which means 640*3 integers make up the first row of pixels. But the first line of the file is very, very far from containing 640*3 integers, so the line breaks in this file are meaningless, hence my problem.
The usual way to read files in Python is line by line, but I need to collect these integers into groups of 640*3 and treat each group as a line. How would one do this? I know I could read the file line by line and append every line to some list, but that list would be massive, and I assume that would place an unacceptable burden on the device's memory. Other than that, I'm out of ideas. Help would be appreciated.
To read three space-separated words at a time from a file:
with open(filename, 'rb') as file:
    kind, dimensions, max_color = map(next, [file]*3)  # read the 3 header lines
    # one shared generator passed 3 times to zip() -> consecutive (r, g, b) triples
    rgbs = list(zip(*[(int(word) for line in file for word in line.split())] * 3))
Output
[(49, 49, 49),
(48, 48, 48),
(47, 47, 47),
(46, 46, 46),
(45, 45, 45),
(42, 42, 42),
...
See What is the most “pythonic” way to iterate over a list in chunks?
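To avoid creating the whole list at once, iterate over the zip() object directly inside the with block (in Python 2, use itertools.izip(), since zip() there always builds a list); that way one RGB value is read at a time.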
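In Python 3, zip() is already lazy, so the same trick can stream the file without building a list, as long as the values are consumed while the file is still open. A sketch that also groups the pixels into rows; it assumes the simple header layout shown in the question (three header lines, no '#' comment lines). Note that the PPM format stores the width first and the height second. The function name read_ppm_rows is illustrative.

```python
from itertools import islice


def read_ppm_rows(path):
    """Lazily yield each row of a plain (P3) PPM file as a list of (r, g, b) tuples."""
    with open(path) as f:
        magic = f.readline().strip()                     # e.g. "P3" (not checked here)
        width, height = map(int, f.readline().split())   # PPM header: width, then height
        max_color = int(f.readline())                    # read only to skip the line
        numbers = (int(word) for line in f for word in line.split())
        pixels = zip(numbers, numbers, numbers)          # consecutive (r, g, b) triples
        for _ in range(height):
            row = list(islice(pixels, width))            # next `width` pixels, whatever the line breaks
            if len(row) < width:
                break                                    # stop at incomplete trailing data
            yield row
```

Only one row of pixels is held in memory at a time, so for row in read_ppm_rows("image.ppm"): ... (a hypothetical file name) works for large images.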
Probably not the most 'pythonic' way, but...
Iterate through the lines containing integers.
Keep four counters: color_code_count (0-3), numbers_processed (up to 1920), col (0-639) and row (0-479).
For each integer you encounter, store it in a temporary list at index color_code_count, then increment color_code_count and numbers_processed.
Once color_code_count reaches 3, turn the temporary list into a 3-tuple (a triplet such as (49, 49, 49) for the first pixel) and store it in a structure of 640 columns and 480 rows: pixels[col][row].
Increment col.
Reset color_code_count.
numbers_processed keeps incrementing until it reaches 1920.
Once it hits 1920, you've reached the end of the first row.
Reset numbers_processed and col to zero, and increment row by 1.
By this point, row zero should hold 640 triplets, starting with (49, 49, 49), (48, 48, 48), (47, 47, 47), etc., and you're now inserting pixel values at row 1, column 0.
Like I said, this is probably not the most 'pythonic' way; there are probably better ways using join and map, but I think this should work. This 'solution', if you want to call it that, doesn't care how many integers are on any line, since you count how many numbers you expect (1920) before starting a new row.
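The counting steps above can be sketched as a generator that tracks list lengths instead of separate counters. The names are illustrative, and lines is assumed to be the open file object after the three header lines have been read (any iterable of text lines works).

```python
def iter_pixel_rows(lines, width):
    """Yield rows of (r, g, b) tuples, ignoring where the line breaks fall.

    lines: any iterable of text lines holding the pixel integers (header already consumed).
    width: number of pixels per row (so each row uses width * 3 integers).
    """
    color = []  # components of the pixel being built (plays the role of color_code_count)
    row = []    # pixels of the row being built (plays the role of col)
    for line in lines:
        for word in line.split():
            color.append(int(word))
            if len(color) == 3:        # one pixel complete
                row.append(tuple(color))
                color = []
                if len(row) == width:  # one row complete
                    yield row
                    row = []
```

Yielding each finished row instead of filling a pixels[col][row] structure keeps memory use to one row at a time; collect the rows into a list only if the whole image is needed at once.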
A possible way to go through each word is to iterate over each line and then .split() it into words.
the_file = open("file.txt", "r")
for line in the_file:
    for word in line.split():
        pass  # -----Your Code-----
From there you can do whatever you want with your "words." You can add an if-statement to process only the lines that contain digits:
for line in the_file:
    if any(char.isdigit() for char in line):
        for word in line.split():
            pass  # -----Your Code-----
Or you can test each word itself. Note, though, that split() with no arguments never returns empty strings or newline characters, so a check like this one is always true and can usually be dropped:
for line in the_file:
    for word in line.split():
        if len(word) != 0 and word != "\n":
            pass  # -----Your Code-----
I would recommend adding each of your new "lines" to a new document.
I am a C programmer, so sorry if this code looks C-style:
f = open("pixel.ppm", "r")
type = f.readline()
height, width = f.readline().split()
height, width = int(height), int(width)
max_color = int(f.readline());
colors = []
count = 0
col_count = 0
line = []
while(col_count < height):
    count = 0
    i = 0
    row = []
    while(count < width * 3):
        temp = f.readline().strip()
        if(temp == ""):
            col_count = height
            break
        temp = temp.split()
        line.extend(temp)
        i = 0
        while(i + 2 < len(line)):
            row.append({'r':int(line[i]),'g':int(line[i+1]),'b':int(line[i+2])})
            i = i+3
            count = count +3
            if(count >= width *3):
                break
        if(i < len(line)):
            line = line[i:len(line)]
        else:
            line = []
    col_count += 1
    colors.append(row)
for row in colors:
    for rgb in row:
        print(rgb)
    print("\n")
You can tweak this according to your needs. I tested it on this file:
P4
3 4
256
4 5 6 4 7 3
2 7 9 4
2 4
6 8 0
3 4 5 6 7 8 9 0
2 3 5 6 7 9 2
2 4 5 7 2
2
This seems to do the trick:
from re import findall
def _split_list(lst, i):
    return lst[:i], lst[i:]
def iter_ppm_rows(path):
    with open(path) as f:
        ftype = f.readline().strip()
        h, w = (int(s) for s in f.readline().split())
        maxcolor = int(f.readline())
        rlen = w * 3
        row = []
        next_row = []
        for line in f:
            # raw string avoids an invalid-escape warning; r'\d+' also catches a last number with no trailing whitespace
            line_ints = [int(i) for i in findall(r'\d+', line)]
            if not row:
                row, next_row = _split_list(line_ints, rlen)
            else:
                rest_of_row, next_row = _split_list(line_ints, rlen - len(row))
                row += rest_of_row
            if len(row) == rlen:
                yield row
                row = next_row
                next_row = []
It isn't very pretty, but it allows for varying whitespace between numbers in the file, as well as varying line lengths.
I tested it on a file that looked like the following:
P3
120 160
255
0 1 2 3 4 5 6 7
8 9 10 11 12 13
14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
[...]
9993 9994 9995 9996 9997 9998 9999
That file used random line lengths, but printed the numbers in order, so it was easy to tell at what value each row began and ended. Note that its dimensions are different from those in the question's example file.
Using the following test code...
for row in iter_ppm_rows('mock_ppm.txt'):
print(len(row), row[0], row[-1])
...the result was the following, which seems to not be skipping over any data and returning rows of the right size.
480 0 479
480 480 959
480 960 1439
480 1440 1919
480 1920 2399
480 2400 2879
480 2880 3359
480 3360 3839
480 3840 4319
480 4320 4799
480 4800 5279
480 5280 5759
480 5760 6239
480 6240 6719
480 6720 7199
480 7200 7679
480 7680 8159
480 8160 8639
480 8640 9119
480 9120 9599
As can be seen, trailing data at the end of the file that can't represent a complete row was not yielded, which was expected but you'd likely want to account for it somehow.

Printing a rather specific matrix

I have a list of 148 entries. Each entry is a four-digit number. I would like to print the result like this:
1 14 27 40
2 15 28 41
3 16 29 42
4 17 30 43
5 18 31 44
6 19 32 45
7 20 33 46
8 21 34 47
9 22 35 48
10 23 36 49
11 24 37 50
12 25 38 51
13 26 39 52
53
54
55... and so on
I have some code that works for the first 13 rows and 4 columns:
kort_identifier = [my_list_with_the_entries]
print_val = 0
print_num_1 = 0
print_num_2 = 13
print_num_3 = 26
print_num_4 = 39
while (print_val <= 36):
    print kort_identifier[print_num_1], '%10s' % kort_identifier[print_num_2], '%10s' % kort_identifier[print_num_3], '%10s' % kort_identifier[print_num_4]
    print_val += 1
    print_num_1 += 1
    print_num_2 += 1
    print_num_3 += 1
    print_num_4 += 1
I feel this is an awful solution and there has to be a better and simpler way of doing this. I have searched here (for printing tables and matrices) and tried those solutions, but none seems to work with the odd table/matrix behaviour that I need.
Please point me in the right direction.
A bit tricky, but here you go. I opted to manipulate the list until it had the right shape, instead of messing around with indexes.
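from itertools import zip_longest
lst = list(range(1, 149))
lst = [lst[i:i+13] for i in range(0, len(lst), 13)]  # chunks of 13, one per printed column
# column i holds chunks i, i+4, i+8, ...; zip_longest keeps the short last column instead of truncating
lst = zip_longest(*[sum(lst[i::4], []) for i in range(4)], fillvalue='')
for row in lst:
    print(*row)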
It might be overkill, but you could just make a numpy array.
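import numpy as np
# pad with '' up to a multiple of 52 (13 rows x 4 columns) so the list can be reshaped
n_blocks = -(-len(kort_identifier) // 52)  # ceiling division
padded = list(kort_identifier) + [''] * (n_blocks * 52 - len(kort_identifier))
# (block, column, row) -> (block, row, column): fills each block column by column
x = np.array(padded, dtype=object).reshape(n_blocks, 4, 13).transpose(0, 2, 1)
for subarray in x:
    for row in subarray:
        print(' '.join(str(v) for v in row))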
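A sketch without index bookkeeping or padding: take each block of 13 x 4 = 52 entries and read it row by row with a stride of 13, so a short last block simply produces shorter rows. The function name column_blocks is illustrative, range(1, 149) stands in for kort_identifier, and '%10s' mirrors the formatting in the question.

```python
def column_blocks(items, rows=13, cols=4):
    """Arrange items column by column in blocks of rows x cols; return the list of printed rows."""
    items = list(items)
    table = []
    for start in range(0, len(items), rows * cols):
        block = items[start:start + rows * cols]
        for r in range(rows):
            row = block[r::rows]  # items r, r + rows, r + 2*rows, ... -> one row across the columns
            if row:               # a short final block may have fewer than `rows` rows
                table.append(row)
    return table


for row in column_blocks(range(1, 149)):  # stand-in for kort_identifier
    print(''.join('%10s' % value for value in row))
```

With 148 entries this prints 39 rows: two full blocks, then a third block whose last column holds only 144 to 148.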
