How to make loop calculate faster - python

I want to make this code run faster. It has too many loops and I want to reduce them. How can I cut down on the for and while loops? The code splits the English words in a string (text) into chunks of 3 consecutive characters and counts how often each chunk occurs. The function returns a dict in which each key is a three-character chunk from text and each value is the frequency of that chunk. Chunks are taken within a word only, and the frequency count is case-insensitive ('ant' and 'Ant' are counted as the same). If a word is shorter than 3 characters, the whole word is used as a key, such as 'a' or 'in'.
def main():
    text = "Thank you for help me"
    print(three_letters_count(text))
def three_letters_count(text):
    d = dict()
    res = []
    li = list(text.lower().split())
    for i in li:
        if len(i) < 3:
            res.append(i)
        while len(i) >= 3:
            res.append(i[:3])
            i = i[1:]
    for i in res:
        d[i] = res.count(i)
    return d
if __name__ == '__main__':
    main()

As promised, just an alternative to the accepted answer:
def main():
    text = "Thank you for help me thank you really so much"
    print(three_letters_count(text))
def three_letters_count(text):
    res = {}
    li = list(text.lower().split())
    for i in li:
        if len(i) < 3:
            # Short words are counted whole.
            if i in res:
                res[i] = res[i] + 1
            else:
                res[i] = 1
        else:
            # Stop at len(i) - 2 so every chunk has exactly 3 characters.
            for startpos in range(0, len(i) - 2):
                chunk = i[startpos:startpos + 3]
                if chunk in res:
                    res[chunk] = res[chunk] + 1
                else:
                    res[chunk] = 1
    return res
if __name__ == '__main__':
    main()
It yields (with the modified input):
{'tha': 2, 'han': 2, 'ank': 2, 'you': 2, 'for': 1, 'hel': 1, 'elp': 1, 'me': 1, 'rea': 1, 'eal': 1, 'all': 1, 'lly': 1, 'so': 1, 'muc': 1, 'uch': 1}

You could replace your while loop with a for loop.
See the adjusted function below.
def three_letters_count(text):
    d = dict()
    res = []
    li = list(text.lower().split())
    for i in li:
        if len(i) < 3:
            res.append(i)
        for index in range(0, len(i)):
            three_letter = i[index:index+3]
            if len(three_letter) >= 3:
                res.append(three_letter)
    for i in res:
        d[i] = res.count(i)
    return d
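Note that the function above still calls res.count(i) once per entry, and each call rescans the whole list. A minimal sketch of a single-pass version, assuming collections.Counter from the standard library is acceptable here (the name three_letters_count_fast is made up for illustration):

```python
from collections import Counter

def three_letters_count_fast(text):
    """Count 3-character chunks per word, case-insensitively (hypothetical helper)."""
    counts = Counter()
    for word in text.lower().split():
        if len(word) < 3:
            # Short words are counted whole, e.g. 'me'.
            counts[word] += 1
        else:
            # Slide a 3-character window across the word.
            for start in range(len(word) - 2):
                counts[word[start:start + 3]] += 1
    return dict(counts)

print(three_letters_count_fast("Thank you for help me"))
```

Each chunk is tallied as it is produced, so there is no second loop over the results.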

Related

Python sort() with lambda key - list of nums vs chars

I am trying to apply the solution of sorting a list of numbers by frequency of each number, but to a list of chars.
My solution for sorting the numbers is:
def num_sort_by_freq(list_of_nums):
    num_count = {}
    for num in list_of_nums:
        if num not in num_count:
            num_count[num] = 1
        else:
            num_count[num] += 1
    list_of_nums.sort(key = lambda x:num_count[x])
    return list_of_nums
print(num_sort_by_freq([1,1,1,2,2,2,2,3,3,3]))
Output: [1, 1, 1, 3, 3, 3, 2, 2, 2, 2]
Trying to sort the chars:
def char_sort_by_freq(string_to_be_list):
    list_of_chars = list(string_to_be_list)
    char_count = {}
    for char in list_of_chars:
        if char not in char_count:
            char_count[char] = 1
        else:
            char_count[char] += 1
    list_of_chars.sort(key = lambda x:char_count[x])
    return "".join(list_of_chars)
print(char_sort_by_freq("asdfasdfasdddffffff"))
Output: asasasdddddffffffff
Expected output: aaasssdddddffffffff
I've gone through it too many times and cannot understand why the output's 'a's and 's's are jumbled together, rather than sequential.
Any help is appreciated.
edit: Thanks so much for the help! Lambda functions are new territory for me.
You can change your key function to return a tuple, so that ties in frequency are broken by the character itself:
def char_sort_by_freq(string_to_be_list):
    list_of_chars = list(string_to_be_list)
    char_count = {}
    for char in list_of_chars:
        if char not in char_count:
            char_count[char] = 1
        else:
            char_count[char] += 1
    list_of_chars.sort(key = lambda x:(char_count[x], x))
    #                                 ^^^^^^^^^^^^^^^^^^
    return "".join(list_of_chars)
print(char_sort_by_freq("asdfasdfasdddffffff"))
Output:
aaasssdddddffffffff
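The reason the original output is jumbled: Python's sort is stable, so elements with equal keys keep their original relative order. 'a' and 's' both occur 3 times, so with the count as the only key they stay interleaved as in the input. A compact sketch of the same fix, assuming collections.Counter is acceptable for the counting step:

```python
from collections import Counter

def char_sort_by_freq(text):
    counts = Counter(text)
    # Sort by (frequency, character): the character breaks ties between
    # equally frequent letters, which a stable sort would otherwise leave
    # in their original interleaved order.
    return "".join(sorted(text, key=lambda ch: (counts[ch], ch)))

print(char_sort_by_freq("asdfasdfasdddffffff"))  # aaasssdddddffffffff
```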

Count consistent duplicates in python?

here is the input:
a = [1,1,2,3,4,1,1]
and I want to get the output like:
out = [1,2,3,4,1]
count = [2,1,1,1,2]
This is different from numpy.unique function.
here is my code, any better solutions?
def unique_count(input):
    tmp = None
    count = 0
    count_list = []
    value_list = []
    for i in input:
        if i == tmp:
            count += 1
        else:
            if tmp != None:
                count_list.append(count)
                value_list.append(tmp)
            count = 1
            tmp = i
    count_list.append(count)
    value_list.append(tmp)
    return value_list, count_list
What you want is itertools.groupby:
from itertools import groupby
a = [1,1,2,3,4,1,1]
group_counts = [(k, len(list(g))) for k, g in groupby(a)]
out, count = map(list, zip(*group_counts))
print(out)
print(count)
Or all in one line:
out, count = map(list, zip(*((k, len(list(g))) for k, g in groupby(a))))
Output:
[1, 2, 3, 4, 1]
[2, 1, 1, 1, 2]
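This works because groupby only groups adjacent equal items, which is exactly the difference from a global tally. A small sketch contrasting the two (collections.Counter is used here only for comparison; it is not part of the answer above):

```python
from collections import Counter
from itertools import groupby

a = [1, 1, 2, 3, 4, 1, 1]

# Runs of adjacent equal values: the two runs of 1 stay separate.
runs = [(value, sum(1 for _ in group)) for value, group in groupby(a)]

# Global tally: both runs of 1 are merged into a single count.
totals = Counter(a)

print(runs)       # [(1, 2), (2, 1), (3, 1), (4, 1), (1, 2)]
print(totals[1])  # 4
```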
If you want to know what's going on inside, you can take a look at this. Afterwards you can try any library or other shorter/smarter solution. This is also a linear-time solution, by the way.
arr = [1,1,2,3,4,1,1]
def customDupCounter(a):
    result = [a[0]]
    counter = [1]
    curr_index = 0
    for i in range(1,len(a)):
        if a[i] == a[i-1]:
            counter[curr_index] += 1
        else:
            curr_index += 1
            result.append(a[i])
            counter.append(1)
    return result, counter
result, counter = customDupCounter(arr)
print(result)
print(counter)
a = [1, 1, 2, 3, 4, 1, 1]
uniques = []
count = []
curCount = 0
for i, num in enumerate(a):
    if i == 0 or a[i - 1] != num:
        uniques.append(num)
        # Close the previous run, if there is one. At i == 0 there is no
        # previous element, so a[i - 1] must not be compared.
        if i > 0:
            count.append(curCount)
        curCount = 1
    else:
        curCount += 1
count.append(curCount)
print(uniques)
print(count)
Here, we go through each number in the list and add it to the uniques list if the previous number was different. We also keep a running count, which resets to 1 whenever the previous number was different.

How to count the frequency of characters in a string in a row? [duplicate]

This question already has answers here:
Count consecutive characters
(15 answers)
Closed 3 years ago.
input = 'XXYXYYYXYXXYYY'
output = [2,1,1,3,1,1,2,3]
How would I count the number of consecutive X's and Y's in a string, in the order in which they appear, and then put those counts in a list?
import itertools
numbers = []
input = 'XXYXYYYXYXXYYY'
split_string = [''.join(g) for k, g in itertools.groupby(input)]
for i in split_string:
    numbers.append(len(i))
print(numbers)
Output:
[2, 1, 1, 3, 1, 1, 2, 3]
You could do this using a while loop that iterates over the whole string.
s = 'XXYXYYYXYXXYYY'
i = 0
output = []
k = 1
while i < len(s) - 1:
    if s[i] == s[i+1]:
        k = k + 1
    else:
        output.append(k)
        k = 1
    i = i + 1
output.append(k)
print(output)
Output
[2, 1, 1, 3, 1, 1, 2, 3]
Try using itertools.groupby:
from itertools import groupby
s = 'XXYXYYYXYXXYYY'
print([len(list(i)) for _, i in groupby(s)])
Short solution using regex
import re
s = 'XXYXYYYXYXXYYY'
l = [len(m.group()) for m in re.finditer(r'(.)\1*', s)]
Based on this answer
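To see why the pattern works, it helps to print the matched runs as well as their lengths. A short sketch (the variable names runs and lengths are only for illustration):

```python
import re

s = 'XXYXYYYXYXXYYY'

# (.) captures one character; \1* then matches zero or more repeats of
# that same character, so each match is one full run.
runs = [m.group() for m in re.finditer(r'(.)\1*', s)]
lengths = [len(run) for run in runs]

print(runs)     # ['XX', 'Y', 'X', 'YYY', 'X', 'Y', 'XX', 'YYY']
print(lengths)  # [2, 1, 1, 3, 1, 1, 2, 3]
```

One caveat: by default . does not match a newline, so a string containing newlines would need the re.DOTALL flag.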
Here's what you can try
test = 'XXYXYYYXYXXYYY'
count = 1
result_list = list()
prev_char = test[0]
for char in test[1:]:
    if char == prev_char:
        count+=1
        prev_char = char
    else:
        result_list.append(count)
        count=1
        prev_char = char
result_list.append(count)
print(result_list)
Output
[2, 1, 1, 3, 1, 1, 2, 3]
Without any libs it will be like this:
string = 'XXYXYYYXYXXYYY'
res = []
current = ''
for char in string:
    if current == char:
        res[-1] += 1
    else:
        res.append(1)
        current = char
print('res', res) # [2,1,1,3,1,1,2,3]
Try this.
input1 = 'XXYXYYYXYXXYYY'
output_list = []
count = 1
for index in range(len(input1)-1):
    if input1[index] == input1[index+1]:
        count += 1
    else:
        output_list.append(count)
        count = 1
# The loop never appends the final run, so add its count here.
output_list.append(count)
print(output_list)
The basic approach is to count occurrences and stop when a new character appears. The code is below.
def consec_occur(strr):
    i = 0
    cc = []
    while i < len(strr):
        count = 1
        # Extend the run while the next character matches the current one.
        while i + 1 < len(strr) and strr[i] == strr[i+1]:
            i += 1
            count += 1
        cc.append(count)
        i += 1
    return cc
if __name__ == "__main__":
    print(consec_occur('XXYXYYYXYXXYYY'))
You can change the code according to your needs. If you want the list to be available globally, make cc global, remove the return statement, and print cc instead.

List of strings, get common substring of n elements, Python

My problem is maybe similar to this, but another situation.
Consider this list in input :
['ACCCACCCGTGG','AATCCC','CCCTGAGG']
The other input is n, a number: the length of the substring that must be common to every element of the list. The output has to be the substring with the maximum number of occurrences, together with that number, similar to this:
{'CCC' : 4}
4 because it occurs twice in the first element of the list and once in each of the other two strings. CCC because it is the substring of 3 characters that repeats at least once per string.
I started in that way :
def get_n_repeats_list(n,seq_list):
    max_substring={}
    list_seq=list(seq_list)
    for i in range(0,len(list_seq)):
        if i+1<len(list_seq):
            #Idea : to get elements in common,comparing two strings at time
            #in_common=set(list_seq[i])-set(list_seq[i+1])
            #max_substring...
            pass  # placeholder so the unfinished block is valid Python
    return max_substring
Maybe this is a solution:
import operator
LL = ['ACCCACCCGTGG','AATCCC','CCCTGAGG']
def createLenList(n,LL):
    stubs = {}
    for l in LL:
        for i,e in enumerate(l):
            stub = l[i:i+n]
            if len(stub) == n:
                if stub not in stubs: stubs[stub] = 1
                else: stubs[stub] += 1
    # items() replaces the Python 2-only iteritems()
    maxKey = max(stubs.items(), key=operator.itemgetter(1))[0]
    return [maxKey,stubs[maxKey]]
maxStub = createLenList(3,LL)
print(maxStub)
So this is my take on it. It is definitely not the prettiest thing on the planet, but it should work just fine.
a = ['ACCCWCCCGTGG', 'AATCCC', 'CCCTGAGG']
def occur(the_list, a_substr):
    i_found = 0
    for a_string in the_list:
        for i_str in range(len(a_string) - len(a_substr) + 1):
            #print('Comparing {:s} to {:s}'.format(a_substr, a_string[i_str:i_str + len(a_substr)]))
            if a_substr == a_string[i_str:i_str + len(a_substr)]:
                i_found += 1
    return i_found
def found_str(original_List, n):
    result_dict = {}
    if n > min(map(len, original_List)):
        print("The substring has to be shorter than the shortest string!")
        exit()
    specialChar = '|'
    b = specialChar.join(item for item in original_List)
    str_list = []
    # + 1 so the last window of length n is included
    for i in range(len(b) - n + 1):
        currStr = b[i:i+n]
        if specialChar not in currStr:
            str_list.append(currStr)
        else:
            continue
    str_list = set(str_list)
    for sub_strs in str_list:
        i_found = 0
        for strs in original_List:
            if sub_strs in strs:
                i_found += 1
        if i_found == len(original_List):
            #print("entered with sub = {:s}".format(sub_strs))
            #print(occur(original_List, sub_strs))
            result_dict[sub_strs] = occur(original_List, sub_strs)
    if result_dict == {}:
        print("No common substrings of length {:} were found".format(n))
    return result_dict
end = found_str(a, 3)
print(end)
returns: {'CCC': 4}
def long_substr(data):
    substr = ''
    if len(data) > 1 and len(data[0]) > 0:
        for i in range(len(data[0])):
            for j in range(len(data[0])-i+1):
                if j > len(substr) and is_substr(data[0][i:i+j], data):
                    substr = data[0][i:i+j]
    return substr
def is_substr(find, data):
    if len(data) < 1 and len(find) < 1:
        return False
    for i in range(len(data)):
        if find not in data[i]:
            return False
    return True
input_list = ['A', 'ACCCACCCGTGG','AATCCC','CCCTGAGG']
longest_common_str = long_substr(input_list)
if longest_common_str:
    frequency = 0
    for common in input_list:
        frequency += common.count(longest_common_str)
    print (longest_common_str, frequency)
else:
    print ("nothing common")
Output
A 6
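For the question as stated (a fixed length n, and the substring must appear in every string), one possible sketch counts overlapping n-character windows per string with collections.Counter and keeps only the windows shared by all strings. The function name most_common_shared_ngram is hypothetical, and ties between equally frequent substrings are resolved arbitrarily:

```python
from collections import Counter

def most_common_shared_ngram(n, seq_list):
    """Return {substring: total_count} for the most frequent length-n
    substring that occurs in every string (hypothetical helper)."""
    # One Counter of overlapping length-n windows per input string.
    per_string = [
        Counter(s[i:i + n] for i in range(len(s) - n + 1))
        for s in seq_list
    ]
    if not per_string:
        return {}
    # Keep only the windows that occur in every string.
    shared = set(per_string[0])
    for counts in per_string[1:]:
        shared &= set(counts)
    if not shared:
        return {}
    # Sum the counts across strings and pick the largest total.
    totals = {g: sum(c[g] for c in per_string) for g in shared}
    best = max(totals, key=totals.get)
    return {best: totals[best]}

print(most_common_shared_ngram(3, ['ACCCACCCGTGG', 'AATCCC', 'CCCTGAGG']))  # {'CCC': 4}
```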

I want to take characters from a string and populate those characters in 2 different sets in python?

comp = "{1},{2},{3},{1,2},{2,3}"
I want the above string to be distributed into 2 different lists of sets: S0_list = [{1},{2},{3}] and S1_list = [{1,2},{2,3}].
The code basically has to run through each character in my string above (comp) and put the single-element sets ({1},{2}) in S0 and the two-element sets ({1,2},{2,3}) in S1.
This is the code I have so far.
S0_list = []
S1_list = []
new_set = set()
comp = "{1},{2},{3},{1,2},{2,3}"
pos = 0
S0_list = S1_list = [0]
while pos < len(comp):
    if comp[pos] == '{':
        pos = pos + 1
        count = 1
        while comp[pos] != '}':
            pos = pos + 1
            if comp[pos] == ',':
                count = count + 1
        if count == 1:
            S0_list.append(new_set)
        elif count == 2:
            S1_list.append(new_set)
    pos = pos + 1
Can someone please help without marking me down for bad format of questions.
You can try this: find the groups inside {}, split each one on commas, and check its length.
import re
S0_list = []
S1_list = []
comp = "{1},{2},{3},{1,2},{2,3}"
# Raw string with escaped braces; the captured elements are strings such as '1'.
for proto_set in re.findall(r'\{([\d,]+)\}', comp):
    set_elements = set(proto_set.split(","))
    if len(set_elements) > 1:
        S1_list.append(set_elements)
    else:
        S0_list.append(set_elements)
Note that the syntax in comp is basically valid Python already, so we can eval() it (assuming that it is not untrusted input from a user!):
>>> comp = "{1},{2},{3},{1,2},{2,3}"
>>> values = eval('(' + comp + ')')
>>> values
(set([1]), set([2]), set([3]), set([1, 2]), set([2, 3]))
>>> S0_list = [x for x in values if len(x) == 1]
>>> S1_list = [x for x in values if len(x) == 2]
>>> S0_list
[set([1]), set([2]), set([3])]
>>> S1_list
[set([1, 2]), set([2, 3])]
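If the input might not be trusted, a safer variant of the same idea is ast.literal_eval, which only evaluates literals (set literals included, in Python 3) and refuses arbitrary expressions. A sketch, assuming Python 3, where sets print as {1} rather than set([1]):

```python
import ast

comp = "{1},{2},{3},{1,2},{2,3}"

# Wrapping in parentheses turns the string into a tuple of set literals.
values = ast.literal_eval('(' + comp + ')')

S0_list = [s for s in values if len(s) == 1]
S1_list = [s for s in values if len(s) == 2]

print(S0_list)  # [{1}, {2}, {3}]
print(S1_list)  # [{1, 2}, {2, 3}]
```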
Another option:
import sys
comp = "{1},{2},{3},{1,2},{2,3}"
set_size1 = set()
set_size2 = set()
[getattr(sys.modules[__name__], "set_size{}".format(s.count(",") + 1)).add(s)
 for s in [v if idx == 0 else "{" + v for idx, v in enumerate(comp.split(",{"))]]
print(set_size1, set_size2)
Output (the order of elements within each set may vary):
{'{3}', '{1}', '{2}'} {'{1,2}', '{2,3}'}
A little explanation: we use a list comprehension to iterate over the string and split it. Then we count the number of commas in each piece, and with that number we look up the right set object by its name using getattr and add the piece to it.
You can add more set sizes to collect different elements. For example, let's add a 3-element option:
import sys
comp = "{1},{2},{3},{1,2},{2,3},{2,3,4}"
set_size1 = set()
set_size2 = set()
set_size3 = set()
[getattr(sys.modules[__name__], "set_size{}".format(s.count(",") + 1)).add(s)
 for s in [v if idx == 0 else "{" + v for idx, v in enumerate(comp.split(",{"))]]
print(set_size1, set_size2, set_size3)
Output (the order of elements within each set may vary):
{'{3}', '{1}', '{2}'} {'{1,2}', '{2,3}'} {'{2,3,4}'}
