Count consecutive duplicates in Python?

Here is the input:
a = [1,1,2,3,4,1,1]
and I want to get output like this:
out = [1,2,3,4,1]
count = [2,1,1,1,2]
This is different from the numpy.unique function, because only consecutive duplicates are grouped.
Here is my code. Are there any better solutions?
def unique_count(values):
    tmp = None
    count = 0
    count_list = []
    value_list = []
    for i in values:
        if i == tmp:
            count += 1
        else:
            if tmp is not None:
                count_list.append(count)
                value_list.append(tmp)
            count = 1
            tmp = i
    count_list.append(count)
    value_list.append(tmp)
    return value_list, count_list

What you want is itertools.groupby:
from itertools import groupby
a = [1,1,2,3,4,1,1]
group_counts = [(k, len(list(g))) for k, g in groupby(a)]
out, count = map(list, zip(*group_counts))
print(out)
print(count)
Or all in one line:
out, count = map(list, zip(*((k, len(list(g))) for k, g in groupby(a))))
Output:
[1, 2, 3, 4, 1]
[2, 1, 1, 1, 2]
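One caveat with the unpacking form: for an empty list, `zip(*...)` yields nothing, so `out, count = ...` raises a ValueError. A minimal sketch of a wrapper that also covers that case is shown here; the name `run_lengths` is hypothetical and not part of itertools.

```python
from itertools import groupby


def run_lengths(values):
    """Return (values, counts) for each run of equal neighbours."""
    out, count = [], []
    for key, group in groupby(values):
        out.append(key)
        # Count the run without building an intermediate list.
        count.append(sum(1 for _ in group))
    return out, count


print(run_lengths([1, 1, 2, 3, 4, 1, 1]))  # ([1, 2, 3, 4, 1], [2, 1, 1, 1, 2])
print(run_lengths([]))                      # ([], [])
```

Building the two lists explicitly trades the one-liner for predictable behaviour on empty input.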

If you want to know what's going on inside, you can take a look at this. Then you can try any library or other shorter/smarter solution. This is also a linear-time solution, by the way.
arr = [1,1,2,3,4,1,1]
def customDupCounter(a):
    if not a:  # nothing to count in an empty list
        return [], []
    result = [a[0]]
    counter = [1]
    curr_index = 0
    for i in range(1,len(a)):
        if a[i] == a[i-1]:
            counter[curr_index] += 1
        else:
            curr_index += 1
            result.append(a[i])
            counter.append(1)
    return result, counter
result, counter = customDupCounter(arr)
print(result)
print(counter)

a = [1, 1, 2, 3, 4, 1, 1]
uniques = []
count = []
curCount = 0
for i, num in enumerate(a):
    if i == 0 or a[i - 1] != num:
        # A new run starts here; checking i == 0 first avoids comparing with a[-1].
        uniques.append(num)
        if i > 0:
            count.append(curCount)
        curCount = 1
    else:
        curCount += 1
if a:
    count.append(curCount)
print(uniques)
print(count)
Here, we go through each number in the list and add it to the uniques list if the previous number in the list was different. We also have a variable to keep track of the count, which resets to 1 whenever the previous number was different.

Related

Find the longest increasing subarray

For example: input = [1,2,3,1,5,7,8,9], output = [1,5,7,8,9]
I want to find the longest continuous increasing subarray.
I have tried it on my own like this:
def longsub(l):
    newl = []
    for i in range(len(l)) :
        if l[i] < l[i+1] :
            newl.append(l[i])
        else :
            newl = []
    return newl
But this raises an error because the list index goes out of range (there is no value after the last one). Then I tried this:
def longsub(l):
    newl = []
    for i in range(len(l)) :
        if l[i] > l[i-1] :
            newl.append(l[i])
        else :
            newl = []
    return newl
With this version, the result is missing the first value of the increasing subarray.
How should I fix my code? Thanks!

Suppose that you had this helper at your disposal:
def increasing_length_at(l, i):
    """Returns number of increasing values found at index i.
    >>> increasing_length_at([7, 6], 0)
    1
    >>> increasing_length_at([3, 7, 6], 0)
    2
    """
    val = l[i] - 1
    for j in range(i, len(l)):
        if l[j] <= val:  # if non-increasing
            return j - i
        val = l[j]
    return len(l) - i  # the run continues to the end of the list
How could you use that as part of a solution?
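One possible way to answer that, as a sketch rather than the only solution: call the helper at the start of each run, remember the longest run seen, and jump ahead by the run length. The helper is restated here in a compact form so the block runs on its own; the driver name `longest_increasing_subarray` is hypothetical.

```python
def increasing_length_at(l, i):
    """Length of the strictly increasing run that starts at index i."""
    for j in range(i + 1, len(l)):
        if l[j] <= l[j - 1]:  # the run is broken at j
            return j - i
    return len(l) - i  # the run reaches the end of the list


def longest_increasing_subarray(l):
    """Jump from run to run and keep the longest one."""
    best_start, best_len = 0, 0
    i = 0
    while i < len(l):
        n = increasing_length_at(l, i)
        if n > best_len:
            best_start, best_len = i, n
        i += n  # the next run starts right after this one
    return l[best_start:best_start + best_len]


print(longest_increasing_subarray([1, 2, 3, 1, 5, 7, 8, 9]))  # [1, 5, 7, 8, 9]
```

Because every element is visited once by the helper and never revisited, this stays linear in the length of the list.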

You could use 2 loops (the first iterates over the input and the second iterates from the index of the first loop until the end):
inp = [1,2,3,1,5,7,8,9]
output = [1,5,7,8,9]
i, res = 0, []
while i < len(inp):
    tempResult = [startNum := inp[i]]  # Python>=3.8: walrus operator
    for j in range(i+1, len(inp)):
        if startNum > inp[j]:
            i = j-1  # skip already compared items!
            break
        tempResult.append(startNum := inp[j])  # Python>=3.8: walrus operator
    if len(tempResult) > len(res):
        res = tempResult
    i += 1
print(res, res == output)
Out:
[1, 5, 7, 8, 9] True

Firstly, you can use len(l) - 1 to avoid the IndexError. However, your approach still would not work, since it only returns the last increasing subarray. Here's my approach:
def longsub(l):
    res, newl = [], []
    for i in range(len(l) - 1):
        newl.append(l[i])
        if l[i] >= l[i + 1]:
            # The run ends at l[i]; store it and start a new one.
            res.append(newl)
            newl = []
    if l:
        # The last element always belongs to the run that is still open.
        newl.append(l[-1])
        res.append(newl)
    return max(res, key=len, default=[])
data = [1,2,3,4,5,1,5,7,8,9]
print(longsub(data))
Output:
[1, 2, 3, 4, 5]

find 1's in the row

I have a task to do:
a = [0,1,0,1,0,0,1,1,1,1,0]
I have the list - a - randomly generated each time the program runs.
Task 1: find the longest 1-row (here it is [1,1,1,1]) and output its starting index number.
Task 2: find 1,1 in a; how many times it occurs? 1,1,1 doesn't count, only exact matches are taken into account.
a = [1,0,0,1,1,0,1,1,1,1]
counter = 1
for i in range(len(a)):
    if a[i] == 1:
        a[i] = counter
        counter += 1
print(a)
b = []
one_rows = []
for i in a:
    if i > 0:
        one_rows.append(i)
    if i == 0:
        b.append([one_rows])
        one_rows.clear()
print(b)

If I've understood your question right, you can use itertools.groupby to group the list and count the number of 1s:
from itertools import groupby
a = [0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0]
max_len, double_ones, max_idx = float("-inf"), 0, 0
for v, g in groupby(enumerate(a), lambda k: k[1]):
    if v == 1:
        idxs = [i for i, _ in g]
        double_ones += len(idxs) == 2
        if len(idxs) > max_len:
            max_len = len(idxs)
            max_idx = idxs[0]
print("Longest 1-row:", max_len, "Index:", max_idx)
print("How many 1,1:", double_ones)
Prints:
Longest 1-row: 4 Index: 6
How many 1,1: 0
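Since the list is regenerated on every run, it may help to wrap the same groupby idea in a function and check it against both lists from the question. This is only a sketch: the name `analyse_ones` and the tuple it returns are assumptions made for illustration.

```python
from itertools import groupby


def analyse_ones(a):
    """Return (longest run length, its start index, number of runs of exactly two 1s)."""
    max_len, max_idx, double_ones = 0, None, 0
    # Pair each value with its index, then group neighbours by value.
    for v, g in groupby(enumerate(a), key=lambda pair: pair[1]):
        if v != 1:
            continue
        idxs = [i for i, _ in g]
        if len(idxs) == 2:
            double_ones += 1
        if len(idxs) > max_len:
            max_len, max_idx = len(idxs), idxs[0]
    return max_len, max_idx, double_ones


print(analyse_ones([0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0]))  # (4, 6, 0)
print(analyse_ones([1, 0, 0, 1, 1, 0, 1, 1, 1, 1]))     # (4, 6, 1)
```

The second list contains one isolated 1,1 pair at indices 3 and 4, so the pair counter is exercised as well. When the list holds no 1s at all, the start index is reported as None.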

calculation of measures of descriptive statistics

In this program you CANNOT USE python libraries (pandas, numpy, etc), nor python functions (sum, etc).
Fulfilling all this, I would like to know how I could calculate these measures of my quantitative variable: mean, median and mode.
This is the data reading of my quantitative variable.
# we enter people's salaries
def salary(n):
    L=[]
    for elem in range(n):
        print("enter the person's salary:")
        L.append(float(input()))
    return(L)

You have to compute each measure separately, and first sort the list of numbers (the following example assumes that the list you pass in is unordered).
median: take the middle value of the sorted list (or the average of the two middle values when the count is even)
mode: handle both the case where there is no mode and the case where there are several modes
mean: sum the values and divide by the count. Try this:
def get_sort_lst(lst):
    # Insertion sort, done in place
    n = len(lst)
    for i in range(1, n):
        tmp, j = lst[i], i - 1
        while j >= 0 and lst[j] > tmp:
            lst[j + 1] = lst[j]
            j -= 1
        lst[j + 1] = tmp
    return lst
def get_median(lst):
    if len(lst) % 2 == 0:
        n = len(lst) // 2
        return (lst[n-1] + lst[n]) / 2
    else:
        return lst[len(lst)//2]
def get_mean(lst):
    res = 0
    for item in lst:
        res += item
    return res / len(lst)
def get_mode(lst):
    res, mode = {}, []
    # Count how often each value occurs
    for item in lst:
        if item not in res:
            res[item] = 1
        else:
            res[item] += 1
    # Keep every value that shares the highest count
    for k, v in res.items():
        if not mode:
            mode.append(k)
        else:
            if v > res[mode[0]]:
                mode = [k]
            elif k not in mode and v == res[mode[0]]:
                mode.append(k)
    if res[mode[0]] == 1:
        return "No mode"
    else:
        return mode
def salary(lst):
    lst = get_sort_lst(lst)
    print("Mean: {}, Median: {}, Mode: {}".format(get_mean(lst), get_median(lst), get_mode(lst)))
salary([1, 2, 3, 4, 5, 5])

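You may try something like this:
total = 0
count = 0
for i in L:
    total += i
    count += 1
Mean:
mean = total/count
Median (this assumes L is already sorted and has an odd number of elements; for an even count, average the two middle values):
median = L[count//2]
You can see this post to calculate mode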
How to make loop calculate faster

I want to make this code run faster. It has too many loops and I would like to reduce them. How can I minimize the for and while loops? The code splits the English words appearing in a string, text, into groups of 3 consecutive characters and counts the frequency of each group. The function returns a dict where each key is a group of three characters from text and each value is the frequency of that group. Groups must come from within a single word, and counting is case-insensitive ('ant' and 'Ant' are treated as the same). If a word is shorter than 3 characters, the whole word is used as the key, such as 'a' or 'in'.
def main():
    text = "Thank you for help me"
    print(three_letters_count(text))
def three_letters_count(text):
    d = dict()
    res = []
    li = list(text.lower().split())
    for i in li:
        if len(i) < 3:
            res.append(i)
        while len(i) >= 3:
            res.append(i[:3])
            i = i[1:]
    for i in res:
        d[i] = res.count(i)
    return d
if __name__ == '__main__':
    main()

As promised, just an alternative to the accepted answer:
def main():
    text = "Thank you for help me thank you really so much"
    print(three_letters_count(text))
def three_letters_count(text):
    res = {}
    li = list(text.lower().split())
    for i in li:
        if len(i) < 3:
            if (i in res):
                res[i] = res[i] + 1
            else:
                res[i] = 1
        # Only full 3-character windows; the range is empty for shorter words.
        for startpos in range(0, len(i) - 2):
            chunk = i[startpos:startpos + 3]
            if (chunk in res):
                res[chunk] = res[chunk] + 1
            else:
                res[chunk] = 1
    return res
if __name__ == '__main__':
    main()
It yields (with the modified input):
{'tha': 2, 'han': 2, 'ank': 2, 'you': 2, 'for': 1, 'hel': 1, 'elp': 1, 'me': 1, 'rea': 1, 'eal': 1, 'all': 1, 'lly': 1, 'so': 1, 'muc': 1, 'uch': 1}

You could adjust your while loop and switch it out for a for loop.
See the adjusted function below.
def three_letters_count(text):
    d = dict()
    res = []
    li = list(text.lower().split())
    for i in li:
        if len(i) < 3:
            res.append(i)
        for index in range(0, len(i)):
            three_letter = i[index:index+3]
            if(len(three_letter) >= 3):
                res.append(three_letter)
    for i in res:
        d[i] = res.count(i)
    return d
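The remaining cost in this version is the final counting loop: `res.count(i)` rescans the whole list for every entry, so that step grows quadratically with the number of chunks. A minimal sketch of a single-pass alternative using collections.Counter from the standard library is shown here; the function name `three_letters_count_fast` is hypothetical.

```python
from collections import Counter


def three_letters_count_fast(text):
    """Count 3-character windows per word; shorter words count as a whole."""
    counts = Counter()
    for word in text.lower().split():
        if len(word) < 3:
            counts[word] += 1
        else:
            # Every window of three consecutive characters inside the word.
            counts.update(word[k:k + 3] for k in range(len(word) - 2))
    return dict(counts)


print(three_letters_count_fast("Thank you for help me"))
# {'tha': 1, 'han': 1, 'ank': 1, 'you': 1, 'for': 1, 'hel': 1, 'elp': 1, 'me': 1}
```

Each chunk is counted as it is produced, so no intermediate list and no second pass are needed.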

How to count the frequency of characters in a string in a row? [duplicate]

input = 'XXYXYYYXYXXYYY'
output = [2,1,1,3,1,1,2,3]
How would I count the number of consecutive X's and Y's in a string, in the order in which they appear, and then put those values in a list?

import itertools
numbers = []
input = 'XXYXYYYXYXXYYY'
split_string = [''.join(g) for k, g in itertools.groupby(input)]
for i in split_string:
    numbers.append(len(i))
print(numbers)
Output:
[2, 1, 1, 3, 1, 1, 2, 3]

You could do this using a while loop that iterates over the whole string.
s = 'XXYXYYYXYXXYYY'
i = 0
output = []
k = 1
while i < len(s) - 1:
    if s[i] == s[i+1]:
        k = k + 1
    else:
        output.append(k)
        k = 1
    i = i + 1
output.append(k)
print(output)
Output
[2, 1, 1, 3, 1, 1, 2, 3]

Try using itertools.groupby:
from itertools import groupby
s = 'XXYXYYYXYXXYYY'
print([len(list(i)) for _, i in groupby(s)])

Short solution using regex
import re
s = 'XXYXYYYXYXXYYY'
l = [len(m.group()) for m in re.finditer(r'(.)\1*', s)]
Based on this answer

Here's what you can try
test = 'XXYXYYYXYXXYYY'
count = 1
result_list = list()
prev_char = test[0]
for char in test[1:]:
    if char == prev_char:
        count+=1
        prev_char = char
    else:
        result_list.append(count)
        count=1
        prev_char = char
result_list.append(count)
print(result_list)
Output
[2, 1, 1, 3, 1, 1, 2, 3]

Without any libs it will be like this:
string = 'XXYXYYYXYXXYYY'
res = []
current = ''
for char in string:
    if current == char:
        res[-1] += 1
    else:
        res.append(1)
        current = char
print('res', res) # [2,1,1,3,1,1,2,3]

Try this.
input1 = 'XXYXYYYXYXXYYY'
output_list = []
count = 1
for index in range(len(input1)-1):
    if input1[index] == input1[index+1]:
        count += 1
    else:
        output_list.append(count)
        count = 1
# The last run is still open when the loop ends, so record it here.
output_list.append(count)
print(output_list)

The basic approach is to count occurrences and stop when a new character comes. The code is below.
def consec_occur(strr):
    i = 0
    cc = []
    while i < len(strr):
        count = 1
        # Extend the run while the next character exists and matches.
        while i + 1 < len(strr) and strr[i] == strr[i + 1]:
            i += 1
            count += 1
        cc.append(count)
        i += 1
    return cc
if __name__ == "__main__":
    print(consec_occur('XXYXYYYXYXXYYY'))
You can change the code according to your needs. If you want the list to be available globally, make cc global, remove the return statement and print cc instead.
