Find lowest latest and average milk production? - python

I have a set of lines which I read from a file into a list. Each row is a new record, and each row consists of 3 numbers and 1 letter. The data follows 6 conditions, listed below the sample:
507 W 1000 1
1 M 6 2
1 W 1400 3
1 M 8 8
1 T 101 10
507 M 4 12
1 W 1700 15
1 M 7 16
507 M 8 20
1) The first element is the cow ID, a unique number representing a cow in the data set.
2) The second element is an action code: 'W', 'M' or 'T'.
3) If the code is 'W', the 3rd element is the latest weight of the cow.
4) If the code is 'M', the 3rd element is the amount of milk the cow produced.
5) If the code is 'T', the 3rd element is the current temperature of the cow.
6) IMPORTANT: if a cow doesn't have at least one W and at least one M record, exclude it from the output.
output: (id , lowest weight , max weight , average milk)
507 1000 1000 6
1 1400 1700 7
My output is correct, but how should I apply the 6th condition in my code?
My code:
import sys
filename = sys.argv[1]
arr = []
with open(filename, "r") as fileToProcess:
    for line in fileToProcess:
        arr.append(line.strip().split(' '))
#print(L)
if not arr:
    print("EMPTY")
else:
    lst2 = [item[0] for item in arr]
    # print(lst2)
    mylist = list(set(lst2))
    # print(mylist[0])
    sum_1_M = 0
    sum_1_W = 0
    list_1 = []
    count = 0
    for i in range(len(mylist)):
        for x in arr:
            if x[0] == mylist[i] and x[1] == 'M':
                sum_1_M += int(x[2])
                count = count + 1
            elif x[0] == mylist[i] and x[1] == 'W':
                sum_1_W += int(x[2])
                list_1.append(int(x[2]))
        list_1.sort()
        print('{} {} {} {}'.format(mylist[i], list_1[0], list_1[len(list_1) - 1], int(sum_1_M / count)))
        sum_1_M = 0
        sum_1_W = 0
        list_1 = []
        count = 0

I think you can actually calculate everything while reading; the key is to use a dictionary and update each entry while parsing line by line. Take a look at this code I made for you:
import sys
filename = sys.argv[1]
dic = {}
with open(filename, "r") as fileToProcess:
    for line in fileToProcess:
        arr = line.strip().split(' ')
        if arr[0] not in dic:
            dic[arr[0]] = {
                'min_weight': 99999999,
                'max_weight': 0,
                'total_milk': 0,
                'count_milk': 0
            }
        if arr[1] == 'W':
            if dic[arr[0]]['min_weight'] >= int(arr[2]):
                dic[arr[0]]['min_weight'] = int(arr[2])
            if dic[arr[0]]['max_weight'] <= int(arr[2]):
                dic[arr[0]]['max_weight'] = int(arr[2])
        elif arr[1] == 'M':
            dic[arr[0]]['total_milk'] += int(arr[2])
            dic[arr[0]]['count_milk'] += 1
for k, v in dic.items():
    if v['max_weight'] > 0 and v['total_milk'] > 0:
        print('({}, {}, {}, {})'.format(
            k,
            v['min_weight'],
            v['max_weight'],
            v['total_milk']/v['count_milk']
        ))
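Condition 6 can also be applied by collecting the W and M values per cow and only reporting cows that appear in both collections. The following is a minimal sketch of that idea, not the original poster's method; the function name `summarize` and the in-memory `records` list are assumptions made for illustration, and the average is truncated to an integer to match the sample output in the question.

```python
def summarize(lines):
    """Return (cow_id, min_weight, max_weight, avg_milk) per qualifying cow."""
    weights, milk = {}, {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        cow_id, code, value = parts[0], parts[1], int(parts[2])
        if code == "W":
            weights.setdefault(cow_id, []).append(value)
        elif code == "M":
            milk.setdefault(cow_id, []).append(value)
        # 'T' (temperature) records are not needed for this report
    # Condition 6: keep only cows with at least one W and at least one M record.
    return [
        (cow_id, min(w), max(w), sum(milk[cow_id]) // len(milk[cow_id]))
        for cow_id, w in weights.items()
        if cow_id in milk
    ]

# Sample data from the question (assumed to be already read into a list).
records = [
    "507 W 1000 1", "1 M 6 2", "1 W 1400 3", "1 M 8 8", "1 T 101 10",
    "507 M 4 12", "1 W 1700 15", "1 M 7 16", "507 M 8 20",
]
for row in summarize(records):
    print(*row)
```

Because a cow is only reported when it has an entry in both dictionaries, the division can never hit an empty milk list.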

Related

Find index of less or equal value in list recursively (python)

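I got a task to find the indexes of a value and of double that value in an input list.
In the input, the first line is the list length, the second is the list of values, and the third is the value to find.
The output is 2 numbers: the index of the first element equal to or higher than the value, and the index of the first element equal to or higher than double the value. If there is none, return -1.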
Example input:
6
1 2 4 4 6 8
3
Example output:
3 5
What I have so far is a standard binary search function, but I don't get how to make it search not only for the exact number but also for the nearest higher one.
def binarySearch(arr, x, left, right):
    if right <= left:
        return -1
    mid = (left + right) // 2
    if arr[mid] >= x:
        return mid
    elif x < arr[mid]:
        return binarySearch(arr, x, left, mid)
    else:
        return binarySearch(arr, x, mid + 1, right)
def main():
    n = int(input())
    k = input().split()
    q = []
    for i in k:
        q.append(int(i))
    s = int(input())
    res1 = binarySearch(q, s, q[0], (n-1))
    res2 = binarySearch(q, (s*2), q[0], (n-1))
    print(res1, res2)
if __name__ == "__main__":
    main()
The input is:
6
1 2 4 4 6 8
3
And output:
3 4
Here's a modified binary search which returns the zero-based index of a value if it is found, or otherwise the index of the next highest value in the list (-1 if there is none).
def bsearch(lst, x):
    L = 0
    R = len(lst) - 1
    while L <= R:
        m = (L + R) // 2
        if (v := lst[m]) == x:
            return m
        if v < x:
            L = m + 1
        else:
            R = m - 1
    return L if L < len(lst) else -1
data = list(map(int, '1 2 4 4 6 8'.split()))
for x in range(10):
    print(x, bsearch(data, x))
Output:
0 0
1 0
2 1
3 2
4 2
5 4
6 4
7 5
8 5
9 -1
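The same "first element greater than or equal to x" lookup is also available in the standard library as `bisect.bisect_left`. Here is a minimal sketch, assuming the list is sorted and that the expected output in the question uses 1-based indexes; the helper name `first_at_least` is made up for this example.

```python
from bisect import bisect_left

def first_at_least(sorted_values, x):
    """1-based index of the first element >= x, or -1 if there is none."""
    pos = bisect_left(sorted_values, x)  # leftmost insertion point for x
    return pos + 1 if pos < len(sorted_values) else -1

data = [1, 2, 4, 4, 6, 8]
target = 3
print(first_at_least(data, target), first_at_least(data, 2 * target))  # prints: 3 5
```

Unlike a search that stops at the first match it happens to probe, `bisect_left` always lands on the leftmost qualifying element, which matters when the list contains duplicates.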

List Comprehension nested in Dict Comprehension

I want to create a dict with lists as values, where the content of each list depends on whether or not the key (numbers 1 to 100) is divisible by 3, 5 and/or 7.
The output would be like this:
{
1: ['nodiv3', 'nodiv5', 'nodiv7'],
3: ['div3', 'nodiv5', 'nodiv7'],
15: ['div3', 'div5', 'nodiv7'],
}
Similar questions were about filtering the lists/values, not creating them.
dict_divider = {}
for x in range(0,101):
    div_list= []
    if x % 3 == 0:
        div_list.append('div3')
    else:
        div_list.append('nodiv3')
    if x % 5 == 0:
        div_list.append('div5')
    else:
        div_list.append('nodiv5')
    if x % 7 == 0:
        div_list.append('div7')
    else:
        div_list.append('nodiv7')
    dict_divider[x] = div_list
This works just fine, but is there a way to do this with a pythonic one- or two-liner?
Something along the lines of this: d = dict((val, range(int(val), int(val) + 2)) for val in ['1', '2', '3'])
Pythonic is not about one- or two-liners. In my opinion it is (mainly) about readability; perhaps this could be considered more pythonic:
def label(n, divisor):
    return f"{'' if n % divisor == 0 else 'no'}div{divisor}"
def find_divisors(n, divisors=(3, 5, 7)):  # tuple default avoids a shared mutable default argument
    return [label(n, divisor) for divisor in divisors]
dict_divider = {x: find_divisors(x) for x in range(1, 101)}
print(dict_divider)
You don't actually need to do all these brute-force divisions. Every third number is divisible by three, every seventh number is divisible by seven, etc:
0 1 2 3 4 5 6 7 8 9 ... <-- range(10)
0 1 2 0 1 2 0 1 2 0 ... <-- mod 3
0 1 2 3 4 5 6 7 8 9 ... <-- range(10)
0 1 2 3 4 5 6 0 1 2 ... <-- mod 7
So the best approach should take advantage of that fact, using the repeating patterns of modulo. Then, we can just zip the range with however many iterators you want to use.
import itertools
def divs(n):
    L = [f"div{n}"] + [f"nodiv{n}"] * (n - 1)
    return itertools.cycle(L)
repeaters = [divs(n) for n in (3, 5, 7)]
d = {x: s for x, *s in zip(range(101), *repeaters)}
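As a quick sanity check on the repeating-pattern idea, the cycle-based dict can be compared against one built with plain modulo tests. This is only an illustrative sketch; the names `by_cycle` and `by_modulo` are made up for the comparison.

```python
import itertools

def divs(n):
    # One 'div' label followed by n - 1 'nodiv' labels, repeated forever.
    return itertools.cycle([f"div{n}"] + [f"nodiv{n}"] * (n - 1))

divisors = (3, 5, 7)

# Built from the repeating patterns, without any division.
by_cycle = {x: labels for x, *labels in zip(range(101), *(divs(n) for n in divisors))}

# Built with an explicit modulo test per key and divisor.
by_modulo = {
    x: [f"div{n}" if x % n == 0 else f"nodiv{n}" for n in divisors]
    for x in range(101)
}

print(by_cycle == by_modulo)  # prints: True
print(by_cycle[15])           # prints: ['div3', 'div5', 'nodiv7']
```

Note that the patterns only line up because the range starts at 0, which is divisible by every divisor; starting the range at 1 would require advancing each cycle by one element first.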
The list for each key can actually be built in one line that isn't even that complicated :)
my_dict = {}
for i in range(101):
    my_dict[i] = ['div' + str(n) if i % n == 0 else 'nodiv' + str(n) for n in [3,5,7]]
You could write a second loop so that you have to write the if...else only once:
dict_divider = {}
div_check_lst = [3, 5, 7]
for x in range(0,101):
    div_list= []
    for div_check in div_check_lst:
        if x % div_check == 0:
            div_list.append(f'div{str(div_check)}')
        else:
            div_list.append(f'nodiv{str(div_check)}')
    dict_divider[x] = div_list
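or, as a nested comprehension (the list comprehension sits inside the dict comprehension, and the comparison is parenthesised so that 'no' is repeated once or zero times):
dict_divider = {x: [f"{'no' * (x % div_check != 0)}div{div_check}" for div_check in div_check_lst] for x in range(0, 101)}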

Deleting a row from an array

I'm working on an array called numbers, which is created with 4 columns: the first three are called x, y and z respectively, and the fourth is used elsewhere in the program.
I want that, if the x and y values of two rows coincide, one of them is deleted from the main array based on their z values (a "0" z value removes "1", a "1" z value removes "2" and a "2" z value removes "0").
The original array looks like:
[[12 15 2 0]
[65 23 0 0]
[24 66 2 0]
[65 23 1 0]
[24 66 0 0]]
The problem is that when I try to run the following program I do not get the required array at the end. The expected output array would look like:
[[12 15 2 0]
[65 23 0 0]
[24 66 2 0]]
I have given an extract from the program below
import numpy as np
#Array
numbers = np.array([[12,15,2,0],[65,23,0,0],[24,66,2,0],[65,23,1,0],[24,66,0,0]])
#Original Array
print(numbers)
#Lists to store x, y and z values
xs = []
ys = []
zs = []
#Any removed row is added into this list
removed = []
#Code to delete a row
for line1 in numbers:
    for line2 in numbers:
        if line1[0] == line2[0]:
            if line2[1] == line2[1]:
                if line1[2] == 1 and line2[2] == 0:
                    removed.append(line1)
                if line1[2] == 0 and line2[2] == 2:
                    removed.append(line1)
                if line1[2] == 2 and line2[2] == 1:
                    removed.append(line1)
for i in removed:
    numbers = np.delete(numbers,i,axis=0)
for line in numbers:
    xs.append(line[0])
    ys.append(line[1])
    zs.append(line[2])
#Update the original Array
for i in removed:
    print(removed)
print()
print("x\n", xs)
print("y\n", ys)
print("z\n", zs)
print()
#Updated Array
print(numbers)
Test array
import numpy as np
import pandas as pd
a = lifeforms = np.array([[12,15,2,0],
                          [13,13,0,0],
                          [13,13,1,0],
                          [13,13,2,0],
                          [65,23,1,0],
                          [24,66,2,0],
                          [14,14,1,0],
                          [14,14,1,1],
                          [14,14,1,2],
                          [14,14,2,0],
                          [15,15,3,2],
                          [15,15,2,0],
                          [65,23,0,0],
                          [24,66,0,0]])
Function that implements color selection.
test_one = np.array([[0,1],[1,0],[1,2],[2,1]])
test_two = np.array([[0,2],[2,0]])
def f(g):
    a = g.loc[:,2].unique()
    if np.any(np.all(a == test_one, axis=1)):
        idx = (g[2] == g[2].min()).idxmax()
    elif np.any(np.all(a == test_two, axis=1)):
        idx = (g[2] == g[2].max()).idxmax()
    else:
        raise ValueError('group colors outside bounds')
    return idx
Groupby first two columns; iterate over groups; save indices of desired rows; use those indices to select rows from the DataFrame.
df = pd.DataFrame(a)
gb = df.groupby([0,1])
sep = '-' * 20  # separator for the diagnostic prints; undefined in the original snippet, so this value is an assumption
indices = []
for k,g in gb:
    if g.loc[:,2].unique().shape[0] > 2:
        #print(f'(0,1,2) - dropping indices {g.index}')
        continue
    if g.shape[0] == 1:
        indices.extend(g.index.to_list())
        #print(f'unique - keeping index {g.index.values}')
        continue
    #print(g.loc[:,2])
    try:
        idx = f(g)
    except ValueError as e:
        print(sep)
        print(e)
        print(g)
        print(sep)
        continue
    #print(f'keeping index {idx}')
    indices.append(idx)
    #print(sep)
print(df.loc[indices,:])
If you can use pandas, you can do the following:
x = np.array([[12,15,2,0],[65,23,0,1],[24,66,2,0],[65,23,1,0],[24,66,0,0]])
df = pd.DataFrame(x)
new_df = df.iloc[df.loc[:,(0,1)].drop_duplicates().index]
print(new_df)
0 1 2 3
0 12 15 2 0
1 65 23 0 1
2 24 66 2 0
What it does is the following:
It transforms the array into a pandas DataFrame.
df.loc[:,(0,1)].drop_duplicates().index returns the indices of the rows you wish to keep (based on the first and second columns).
df.iloc returns the sliced DataFrame.
Edit based on the OP's questions in the comments and @wwii's remarks:
You can return to a NumPy array using .to_numpy(), so just do arr = new_df.to_numpy()
You can try the following:
xx = np.array([[12,15,2,0],[65,23,1,0],[24,66,2,0],[65,23,0,0],[24,66,0,0]])
df = pd.DataFrame(xx)
df_new = df.groupby([0,1], group_keys=False).apply(lambda x: x.loc[x[2].idxmin()])
df_new.reset_index(drop=True, inplace=True)
0 1 2 3
0 12 15 2 0
1 24 66 0 0
2 65 23 0 0
When there is a special heuristic to consider one can do the following:
import pandas as pd
import numpy as np
def f_(x):
    vals = x[2].tolist()
    if len(vals)==2:
        # print(vals)
        if vals[0] == 0 and vals[1] == 1:
            return vals[0]
        elif vals[0] == 1 and vals[1] == 0:
            return vals[1]
        elif vals[0] == 1 and vals[1] == 2:
            return vals[0]
        elif vals[0] == 2 and vals[1] == 0:
            return vals[0]
    elif len(vals) > 2:
        return -1
    else:
        return x[2]
xx = np.array([[12,15,2,0],[65,23,1,0],[24,66,2,0],[65,23,0,0],[24,66,0,0]])
df = pd.DataFrame(xx)
df_new = df.groupby([0,1], group_keys=False).apply(lambda x: x.loc[x[2] == f_(x)])
df_new.reset_index(drop=True, inplace=True)
print(df_new)
0 1 2 3
0 12 15 2 0
1 24 66 2 0
2 65 23 0 0
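The removal rule from the question ("0" removes "1", "1" removes "2", "2" removes "0") can also be sketched in plain Python, without pandas. This is a minimal illustration rather than any of the posted answers; the names `REMOVES` and `drop_removed_rows` are made up, and the rows are assumed to be a list of lists (for a NumPy array, `numbers.tolist()` gives that form).

```python
# REMOVES[z] is the z value that a row with value z removes (rule from the question).
REMOVES = {0: 1, 1: 2, 2: 0}

def drop_removed_rows(rows):
    """Keep every row unless another row with the same (x, y) removes its z value."""
    kept = []
    for x1, y1, z1, *rest1 in rows:
        removed = any(
            x1 == x2 and y1 == y2 and REMOVES.get(z2) == z1
            for x2, y2, z2, *rest2 in rows
        )
        if not removed:
            kept.append([x1, y1, z1, *rest1])
    return kept

numbers = [[12, 15, 2, 0], [65, 23, 0, 0], [24, 66, 2, 0], [65, 23, 1, 0], [24, 66, 0, 0]]
for row in drop_removed_rows(numbers):
    print(row)
```

On the question's array this keeps the three rows listed as the expected output. It compares every pair of rows, so it is quadratic in the number of rows, and if all three z values share one (x, y) position, each is removed by another and none survive.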

printing maximum number of consecutive 1's in python

I'm trying to print the maximum number of consecutive 1's in Python, but I'm getting stuck here. I don't know why I'm getting a syntax error. Can anyone help me out?
li2 = []
t = int(input())
for i in range(0, t): //testcases
    n = int(input())
    for i in range(n): //length of list(binary array)
        li = list(map(int, input().strip().split())
    count = 0
    max_count=0
    for i in range(len(li)):
        if (li[i] == 0):
            count = 0
        else:
            count += 1
            max_count = max(max_count,count)
    li2.append(max_count)
for i in range(len(li2)):
    print(li2[i])
File "<ipython-input-2-f159cb61e247>", line 7
count = 0
^
SyntaxError: invalid syntax
Corrections:
li2 = []
t = int(input())
for i in range(0, t): #testcases
    n = int(input())
    li = list(map(int, input().strip().split())) #<--- before that loop is removed.
    count = 0
    max_count=0
    for i in range(len(li)):
        if (li[i] != 1): #<------- Here
            count = 0
        else:
            count += 1
        max_count = max(max_count,count) #<--- here
    li2.append(max_count)
print()
for i in range(len(li2)):
    print(li2[i])
3
5
2 1 1 1 1
3
2 3 4
7
1 2 3 1 1 1 1
4
0
4
An improved version of the above answer that uses built-in string functions instead of an explicit counting loop:
li2 = []
t = int(input())
for i in range(0, t): #testcases
    n = int(input())
    tokens = input().split()  # keep whole tokens so that e.g. '11' is not read as two 1's
    bits = ['1' if tok == '1' else '0' for tok in tokens] # replace every token other than '1' with '0'
    max_count = max(map(len, ''.join(bits).split('0'))) # split by '0' and take the longest run of 1's
    li2.append(max_count)
print()
for i in range(len(li2)):
    print(li2[i])
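Another way to express the same computation is with `itertools.groupby`, which groups consecutive equal elements. The following is a small sketch working on lists that are already parsed; the function name `max_consecutive_ones` is an assumption for this example, and the three test lists are the ones from the sample run above.

```python
from itertools import groupby

def max_consecutive_ones(values):
    """Length of the longest run of consecutive 1s in values (0 if there is none)."""
    return max(
        (sum(1 for _ in run) for value, run in groupby(values) if value == 1),
        default=0,  # used when the list contains no 1 at all
    )

for case in ([2, 1, 1, 1, 1], [2, 3, 4], [1, 2, 3, 1, 1, 1, 1]):
    print(max_consecutive_ones(case))  # prints 4, then 0, then 4
```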

Python counting lines in files using exact locations

I know this is straightforward but I am not quite understanding how to make my for loop work.
My first file is a long list of two columns of data:
ROW VALUE
0 165
1 115
2 32
3 14
4 9
5 0
6 89
7 26
. .
406369 129
406370 103
My second file is a list of important row numbers:
1
43
192
so on
All I want to do is go to the row number of interest in file 1, and then walk down, row by row, until the value column hits zero. The output will then simply be a list of the important row numbers, each followed by the count of lines until the value in the first file reaches zero. For instance, the output for important row number "1" from file #2 should be 3, because there are three lines before the value reaches 0 in file #1. I appreciate any help! I have some script I have started and can post it in an edit if that is helpful. THANK YOU!
EDIT:
Some script I have started:
for line in important_rows_file:
    line = line.strip().split()
    positive_starts.append(int(line[2]))
countsfile = []
for line in file:
    line = line.strip().split()
    countsfile.append([line[0]] + [line[1]])
count = 0
i = 0
for i in range(0, len(countsfile)):
    for start in positive_starts:
        if int(countsfile[start + i][1]) > 0:
            count = count + 1
        else:
            count = count
.... not sure what is next
Here are two ways to do it.
The first way builds a dictionary in memory for all row numbers. This would be a good way to do it if (a) you are going to re-use this same data over and over (you can store it and read it back in), or (b) you are going to process a lot of rows from the second file (i.e. most of the rows need this done). The second way just does a one-off lookup for a given row number.
Given this as the input file:
ROW VALUE
0 165
1 115
2 32
3 14
4 9
5 0
6 89
7 26
8 13
9 0
Method 1.
ref_dict = {}
with open("so_cnt_file.txt") as infile:
    next(infile)  # skip the header line
    cur_start_row = 0
    cur_rows = []
    for line in infile:
        row, col = [int(val) for val in line.strip().split(" ") if val]
        if col == 0:
            # a zero closes the current run: record the distance for every row in it
            for cur_row in cur_rows:
                ref_dict[cur_row] = row - cur_row - 1
            cur_start_row = row
            cur_rows = []
            continue
        cur_rows.append(row)
print(ref_dict)
OUTPUT
{0: 4, 1: 3, 2: 2, 3: 1, 4: 0, 6: 2, 7: 1, 8: 0}
Method 2
def get_count_for_row(row=1):
    with open("so_cnt_file.txt") as infile:
        # skip the header plus every line up to and including the row of interest
        for i in range(0, row + 2):
            next(infile)
        cnt = 0
        for line in infile:
            row, col = [int(val) for val in line.strip().split(" ") if val]
            if col == 0:
                return cnt
            cnt += 1
print(get_count_for_row(1))
print(get_count_for_row(6))
OUTPUT
3
2
Here is a solution that takes all of the rows of interest in a single call.
def get_count_for_rows(*rows):
    rows = sorted(rows)
    counts = []
    with open("so_cnt_file.txt") as infile:
        cur_row = 0
        for i in range(cur_row, 2):
            next(infile)
        while rows:
            inrow = rows.pop(0)
            # advance to the next row of interest
            for i in range(cur_row, inrow):
                next(infile)
            cnt = 0
            for line in infile:
                row, col = [int(val) for val in line.strip().split(" ") if val]
                if col == 0:
                    counts.append((inrow, cnt))
                    break
                cnt += 1
            cur_row = row
    return counts
print(get_count_for_rows(1, 6))
OUTPUT
[(1, 3), (6, 2)]
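If the VALUE column fits in memory, the count can also be taken directly from a list, which avoids re-reading the file for every row of interest. This is a minimal sketch under the assumption that row numbers equal list indexes (as in the sample file); the function name `count_until_zero` is made up for this example.

```python
def count_until_zero(values, start):
    """Count the rows after `start` that come before the next zero value.

    Returns None if no zero follows `start`.
    """
    count = 0
    for value in values[start + 1:]:
        if value == 0:
            return count
        count += 1
    return None

# VALUE column of the sample input file, indexed by ROW.
values = [165, 115, 32, 14, 9, 0, 89, 26, 13, 0]
important_rows = [1, 6]
print([(row, count_until_zero(values, row)) for row in important_rows])  # prints: [(1, 3), (6, 2)]
```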
