Count consecutive duplicates in Python?

Here is the input:
a = [1,1,2,3,4,1,1]
and I want to get output like:
out = [1,2,3,4,1]
count = [2,1,1,1,2]
This is different from the numpy.unique function.
Here is my code; are there any better solutions?
def unique_count(input):
    tmp = None
    count = 0
    count_list = []
    value_list = []
    for i in input:
        if i == tmp:
            count += 1
        else:
            if tmp is not None:
                count_list.append(count)
                value_list.append(tmp)
            count = 1
            tmp = i
    count_list.append(count)
    value_list.append(tmp)
    return (value_list, count_list)

What you want is itertools.groupby:
from itertools import groupby
a = [1,1,2,3,4,1,1]
group_counts = [(k, len(list(g))) for k, g in groupby(a)]
out, count = map(list, zip(*group_counts))
print(out)
print(count)
Or all in one line:
out, count = map(list, zip(*((k, len(list(g))) for k, g in groupby(a))))
Output:
[1, 2, 3, 4, 1]
[2, 1, 1, 1, 2]
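One thing to watch with the zip(*...) unpacking is that it raises a ValueError when the input list is empty, because there are no groups to unpack. A minimal sketch of a wrapper that avoids this is shown here; the name run_lengths is hypothetical and not part of the original answer, and returning two empty lists for empty input is an assumption about the desired behaviour.

```python
from itertools import groupby


def run_lengths(values):
    """Return (out, count) for consecutive runs in `values`.

    `run_lengths` is a hypothetical helper name used for illustration.
    """
    out, count = [], []
    for key, group in groupby(values):
        out.append(key)
        # sum(1 for ...) counts the run without building a temporary list
        count.append(sum(1 for _ in group))
    return out, count


print(run_lengths([1, 1, 2, 3, 4, 1, 1]))  # ([1, 2, 3, 4, 1], [2, 1, 1, 1, 2])
print(run_lengths([]))                     # ([], [])
```

Building the two lists directly keeps the function usable on any iterable, including strings.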

If you want to see what is going on inside, here is a manual version; after that you can try any library or other shorter/smarter solution. This is also a linear-time solution, by the way.
arr = [1,1,2,3,4,1,1]
def customDupCounter(a):
    if not a:  # nothing to count in an empty list
        return [], []
    result = [a[0]]
    counter = [1]
    curr_index = 0
    for i in range(1,len(a)):
        if a[i] == a[i-1]:
            counter[curr_index] += 1
        else:
            curr_index += 1
            result.append(a[i])
            counter.append(1)
    return result, counter
result, counter = customDupCounter(arr)
print(result)
print(counter)

a = [1, 1, 2, 3, 4, 1, 1]
uniques = []
count = []
curCount = 0
for i, num in enumerate(a):
    if i > 0 and a[i - 1] == num:
        curCount += 1
    else:
        # a new run starts here; store the finished run's count first
        if i > 0:
            count.append(curCount)
        uniques.append(num)
        curCount = 1
if a:
    count.append(curCount)
print(uniques)
print(count)
Here, we go through each number in the list and add it to the uniques list if it is the first element or the previous number was different. We also keep a running count, which is stored and reset to 1 whenever the number changes. The i > 0 checks matter: at i == 0, a[i - 1] is a[-1], the last element, so comparing against it would give wrong counts whenever the first and last elements differ.

Related

Find the longest increasing subarray

For example: input = [1,2,3,1,5,7,8,9], output = [1,5,7,8,9]
I want to find the longest continuous increasing subarray.
I have tried on my own like this:
def longsub(l):
    newl = []
    for i in range(len(l)) :
        if l[i] < l[i+1] :
            newl.append(l[i])
        else :
            newl = []
    return newl
But it raises an error because the list index is out of range. (There is no value after the last value.)
def longsub(l):
    newl = []
    for i in range(len(l)) :
        if l[i] > l[i-1] :
            newl.append(l[i])
        else :
            newl = []
    return newl
Then I tried this, but the result is missing the first value of the increasing subarray.
How should I fix my code? Thanks!
Suppose that you had this helper at your disposal:
def increasing_length_at(l, i):
    """Returns number of increasing values found at index i.
    >>> increasing_length_at([7, 6], 0)
    1
    >>> increasing_length_at([3, 7, 6], 0)
    2
    """
    val = l[i] - 1
    for j in range(i, len(l)):
        if l[j] <= val: # if non-increasing
            return j - i
        val = l[j]
    return len(l) - i # the run reaches the end of the list
How could you use that as part of a solution?
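One possible way to use such a helper is sketched here. The driver name longest_increasing_run is hypothetical, and the helper is rewritten with plain comparisons (an assumption, so that it also works for values that do not support subtraction). The idea is to jump from run to run and remember the longest one seen.

```python
def increasing_length_at(l, i):
    """Length of the strictly increasing run that starts at index i."""
    j = i + 1
    while j < len(l) and l[j] > l[j - 1]:
        j += 1
    return j - i


def longest_increasing_run(l):
    """Hypothetical driver: jump from run to run and keep the longest one."""
    best_start, best_len = 0, 0
    i = 0
    while i < len(l):
        n = increasing_length_at(l, i)
        if n > best_len:
            best_start, best_len = i, n
        i += n  # the next run starts right after this one
    return l[best_start:best_start + best_len]


print(longest_increasing_run([1, 2, 3, 1, 5, 7, 8, 9]))  # [1, 5, 7, 8, 9]
```

Because every element belongs to exactly one run, each index is visited once and the whole thing stays linear. On ties, the first of the longest runs is returned.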
You could use 2 loops (the first iterates over the input and the second iterates from the index of the first loop until the end):
inp = [1,2,3,1,5,7,8,9]
output = [1,5,7,8,9]
i, res = 0, []
while i < len(inp):
    tempResult = [startNum := inp[i]] # Python>=3.8: walrus operator
    for j in range(i+1, len(inp)):
        if startNum > inp[j]:
            i = j-1 # skip already compared items!
            break
        tempResult.append(startNum := inp[j]) # Python>=3.8: walrus operator
    if len(tempResult) > len(res):
        res = tempResult
    i += 1
print(res, res == output)
Out:
[1, 5, 7, 8, 9] True
Firstly, you can use len(l) - 1 to avoid the IndexError. However, your approach would still be wrong, since it only returns the last increasing run. Here's my approach:
def longsub(l):
    res, newl = [], []
    for i in range(len(l)-1):
        newl.append(l[i])
        if l[i] >= l[i+1]:  # the current run ends here
            res.append(newl)
            newl = []
    if l:
        newl.append(l[-1])  # the last element always closes the final run
        res.append(newl)
    return max(res, key=len, default=[])
data = [1,2,3,4,5,1,5,7,8,9]
print(longsub(data))
Output:
>>> [1, 2, 3, 4, 5]

find 1's in the row

I have a task to do:
a = [0,1,0,1,0,0,1,1,1,1,0]
I have the list - a - randomly generated each time the program runs.
Task 1: find the longest 1-row (here it is [1,1,1,1]) and output its starting index number.
Task 2: find 1,1 in a; how many times it occurs? 1,1,1 doesn't count, only exact matches are taken into account.
a = [1,0,0,1,1,0,1,1,1,1]
counter = 1
for i in range(len(a)):
    if a[i] == 1:
        a[i] = counter
        counter += 1
print(a)
b = []
one_rows = []
for i in a:
    if i > 0:
        one_rows.append(i)
    if i == 0:
        b.append([one_rows])
        one_rows.clear()
print(b)
If I've understood your question right, you can use itertools.groupby to group the list and count the number of 1s:
from itertools import groupby
a = [0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0]
max_len, double_ones, max_idx = float("-inf"), 0, 0
for v, g in groupby(enumerate(a), lambda k: k[1]):
    if v == 1:
        idxs = [i for i, _ in g]
        double_ones += len(idxs) == 2
        if len(idxs) > max_len:
            max_len = len(idxs)
            max_idx = idxs[0]
print("Longest 1-row:", max_len, "Index:", max_idx)
print("How many 1,1:", double_ones)
Prints:
Longest 1-row: 4 Index: 6
How many 1,1: 0
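Since the list is randomly generated on each run, it may help to wrap the same idea in a function and try it on more than one input. This is only a sketch: the name analyse_ones is hypothetical, and returning None as the start index when the list has no 1 at all is an assumption.

```python
from itertools import groupby


def analyse_ones(a):
    """Return (longest_len, start_index, exact_pairs) for runs of 1s in `a`.

    Hypothetical helper; start_index is None when `a` contains no 1.
    """
    longest_len, start_index, exact_pairs = 0, None, 0
    pos = 0  # index where the current run starts
    for value, group in groupby(a):
        length = sum(1 for _ in group)
        if value == 1:
            if length == 2:
                exact_pairs += 1  # only runs of exactly two 1s count
            if length > longest_len:
                longest_len, start_index = length, pos
        pos += length
    return longest_len, start_index, exact_pairs


print(analyse_ones([0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0]))  # (4, 6, 0)
print(analyse_ones([1, 0, 0, 1, 1, 0, 1, 1, 1, 1]))     # (4, 6, 1)
```

The second call uses the list from the question's own attempt, which does contain one exact 1,1 pair.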

calculation of measures of descriptive statistics

In this program you CANNOT USE python libraries (pandas, numpy, etc), nor python functions (sum, etc).
Fulfilling all this, I would like to know how I could calculate these measures of my quantitative variable: mean, median and mode.
This is the data reading of my quantitative variable.
# we enter people's salaries
def salary(n):
    L=[]
    for elem in range(n):
        print("enter the person's salary:")
        L.append(float(input()))
    return(L)
You need to compute each measure separately, and first sort the list of numbers (the following example assumes that the list you pass in is unordered).
median: take the middle element of the sorted list (or the average of the two middle elements when the length is even)
mode: distinguish between having no mode at all and having several modes
mean: sum the values and divide by the length. Try this:
def get_sort_lst(lst):
    # insertion sort, so that no built-in sorting is needed
    n = len(lst)
    for i in range(1, n):
        tmp, j = lst[i], i - 1
        while j >= 0 and lst[j] > tmp:
            lst[j + 1] = lst[j]
            j -= 1
        lst[j + 1] = tmp
    return lst
def get_median(lst):
    if len(lst) % 2 == 0:
        n = len(lst) // 2
        return (lst[n-1] + lst[n]) / 2
    else:
        return lst[len(lst)//2]
def get_mean(lst):
    res = 0
    for item in lst:
        res += item
    return res / len(lst)
def get_mode(lst):
    res, modes = {}, []
    # count how often each value occurs
    for item in lst:
        if item not in res:
            res[item] = 1
        else:
            res[item] += 1
    # keep every value that shares the highest count
    for k, v in res.items():
        if not modes:
            modes.append(k)
        else:
            if v > res[modes[0]]:
                modes = [k]
            elif k not in modes and v == res[modes[0]]:
                modes.append(k)
    if res[modes[0]] == 1:
        return "No mode"
    else:
        return modes
def salary(lst):
    lst = get_sort_lst(lst)
    print("Mean: {}, Median: {}, Mode: {}".format(get_mean(lst), get_median(lst), get_mode(lst)))
salary([1, 2, 3, 4, 5, 5])
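You could try something like this:
total = 0
count = 0
for i in L:
    total += i
    count += 1
Mean:
mean = total/count
Median (this assumes L is already sorted; for an even count it takes the upper of the two middle values rather than their average):
median = L[count//2]
You can see this post to calculate the mode.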
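For the median and the mode under the no-libraries restriction, a minimal sketch could look like the following. The function names insertion_sorted, median and modes are hypothetical, and the sketch assumes that len(), dict methods and list methods are still allowed while sum(), sorted() and max() are not. It also assumes a non-empty list for the median.

```python
def insertion_sorted(values):
    """Return a sorted copy without calling sorted() or list.sort()."""
    result = []
    for v in values:
        j = len(result)
        result.append(v)
        # shift larger items one slot to the right
        while j > 0 and result[j - 1] > v:
            result[j] = result[j - 1]
            j -= 1
        result[j] = v
    return result


def median(values):
    """Middle value of the sorted data; assumes `values` is not empty."""
    s = insertion_sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 0:
        return (s[mid - 1] + s[mid]) / 2
    return s[mid]


def modes(values):
    """Return every value with the highest count; [] if no value repeats."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    best = 0
    for c in counts.values():
        if c > best:
            best = c
    if best <= 1:
        return []
    return [v for v, c in counts.items() if c == best]


salaries = [3.0, 1.0, 2.0, 5.0, 5.0, 4.0]
print(median(salaries))  # 3.5
print(modes(salaries))   # [5.0]
```

Returning a list from modes covers both the case of several modes and, with an empty list, the case where every value occurs only once.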

How to make loop calculate faster

I want to make this code run faster. It has too many loops and I want to reduce them. How can I minimize the for and while loops? The code splits the English words in a string (text) into groups of 3 consecutive characters and counts the frequency of each group. The function returns a dict, where each key is a group of three characters from text and each value is the frequency of that group. Each group must come from a single word, and the frequency count is case insensitive ('ant' and 'Ant' are treated as the same). If a word is shorter than 3 characters, the whole word is used as the key, such as 'a' or 'in'.
def main():
    text = "Thank you for help me"
    print(three_letters_count(text))
def three_letters_count(text):
    d = dict()
    res = []
    li = list(text.lower().split())
    for i in li:
        if len(i) < 3:
            res.append(i)
        while len(i) >= 3:
            res.append(i[:3])
            i = i[1:]
    for i in res:
        d[i] = res.count(i)
    return d
if __name__ == '__main__':
    main()
As promised, just an alternative to the accepted answer:
def main():
    text = "Thank you for help me thank you really so much"
    print(three_letters_count(text))
def three_letters_count(text):
    res = {}
    li = list(text.lower().split())
    for i in li:
        if len(i) < 3:
            if (i in res):
                res[i] = res[i] + 1
            else:
                res[i] = 1
        # stop 2 short of the end so every chunk has exactly 3 characters
        for startpos in range(0, len(i) - 2):
            chunk = i[startpos:startpos + 3]
            if (chunk in res):
                res[chunk] = res[chunk] + 1
            else:
                res[chunk] = 1
    return res
if __name__ == '__main__':
    main()
It yields (with the modified input):
{'tha': 2, 'han': 2, 'ank': 2, 'you': 2, 'for': 1, 'hel': 1, 'elp': 1, 'me': 1, 'rea': 1, 'eal': 1, 'all': 1, 'lly': 1, 'so': 1, 'muc': 1, 'uch': 1}
You could adjust your while loop and switch it out for a for loop.
See the adjusted function below.
def three_letters_count(text):
    d = dict()
    res = []
    li = list(text.lower().split())
    for i in li:
        if len(i) < 3:
            res.append(i)
        for index in range(0, len(i)):
            three_letter = i[index:index+3]
            if(len(three_letter) >= 3):
                res.append(three_letter)
    for i in res:
        d[i] = res.count(i)
    return d
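Most of the cost in the list-based versions comes from res.count(i), which rescans the whole list for every entry. A sketch of a variant that counts while it slices, using collections.Counter from the standard library, might look like this; it is one possible rewrite, not the accepted answer.

```python
from collections import Counter


def three_letters_count(text):
    """Count 3-character windows per word; shorter words count whole."""
    counts = Counter()
    for word in text.lower().split():
        if len(word) < 3:
            counts[word] += 1
        else:
            # one window per start position that still has 3 characters
            counts.update(word[i:i + 3] for i in range(len(word) - 2))
    return dict(counts)


print(three_letters_count("Thank you for help me"))
```

Each word is scanned once, so the work grows with the total number of characters instead of with the square of the number of chunks.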

How to count the frequency of characters in a string in a row? [duplicate]

input = 'XXYXYYYXYXXYYY'
output = [2,1,1,3,1,1,2,3]
How would I count the number of consecutive X's and Y's in a string, in the order in which they appear, and then put those counts in a list?
import itertools
numbers = []
input = 'XXYXYYYXYXXYYY'
split_string = [''.join(g) for k, g in itertools.groupby(input)]
for i in split_string:
    numbers.append(len(i))
print(numbers)
Output:
[2, 1, 1, 3, 1, 1, 2, 3]
You could do this with a while loop that walks over the whole string.
s = 'XXYXYYYXYXXYYY'
i = 0
output = []
k = 1
while i < len(s) - 1:
    if s[i] == s[i+1]:
        k = k + 1
    else:
        output.append(k)
        k = 1
    i = i + 1
output.append(k)
print(output)
Output
[2, 1, 1, 3, 1, 1, 2, 3]
Try using itertools.groupby:
from itertools import groupby
s = 'XXYXYYYXYXXYYY'
print([len(list(i)) for _, i in groupby(s)])
Short solution using regex
import re
s = 'XXYXYYYXYXXYYY'
l = [len(m.group()) for m in re.finditer(r'(.)\1*', s)]
Based on this answer
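One caveat with the pattern: by default '.' does not match a newline, so newline characters in the input would be skipped rather than counted. A small sketch that passes re.DOTALL to cover that case is shown here; the function name run_lengths_regex is hypothetical.

```python
import re


def run_lengths_regex(s):
    """Lengths of consecutive runs; `run_lengths_regex` is a hypothetical name."""
    # re.DOTALL lets '.' match '\n' too, so newline runs are counted as well
    return [len(m.group()) for m in re.finditer(r'(.)\1*', s, flags=re.DOTALL)]


print(run_lengths_regex('XXYXYYYXYXXYYY'))  # [2, 1, 1, 3, 1, 1, 2, 3]
print(run_lengths_regex('XX\n\nY'))         # [2, 2, 1]
```

For single-line input such as the string in the question, the flag makes no difference.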
Here's what you can try
test = 'XXYXYYYXYXXYYY'
count = 1
result_list = list()
prev_char = test[0]
for char in test[1:]:
    if char == prev_char:
        count+=1
        prev_char = char
    else:
        result_list.append(count)
        count=1
        prev_char = char
result_list.append(count)
print(result_list)
Output
[2, 1, 1, 3, 1, 1, 2, 3]
Without any libs it will be like this:
string = 'XXYXYYYXYXXYYY'
res = []
current = ''
for char in string:
    if current == char:
        res[-1] += 1
    else:
        res.append(1)
        current = char
print('res', res) # [2,1,1,3,1,1,2,3]
Try this.
input1 = 'XXYXYYYXYXXYYY'
output_list = []
count = 1
for index in range(len(input1)-1):
    if input1[index] == input1[index+1]:
        count += 1
    else:
        output_list.append(count)
        count = 1
# the loop never stores the final run, so append its count here
output_list.append(count)
print(output_list)
The basic approach is to count occurrences and start a new count when a new character appears. The code is below.
def consec_occur(strr):
    i = 0
    cc = []
    while i < len(strr):
        count = 1
        # extend the current run while the next character matches
        while i + 1 < len(strr) and strr[i] == strr[i+1]:
            i += 1
            count += 1
        cc.append(count)
        i += 1
    return cc
if __name__ == "__main__":
    print(consec_occur('XXYXYYYXYXXYYY'))
You can change the code according to your needs. If you want the list to be available globally, make cc global, remove the return statement, and print cc instead.
