Finding characters with minimum frequency


As the title says, I am trying to print ALL characters with the minimum frequency.
least_occurring is holding only ONE value, so I think that is the issue, but I can't figure it out. It could be something obvious I am missing here, but I am out of brain cells :)
Example input: aabbccdddeeeffff
Expected output:
Least occurring character : a,b,c <=== this is what I can't figure out :(
repeated 2 time(s)
--------------------------
Character Frequency
--------------------------
a 2
b 2
c 2
d 3
e 3
f 4
Results I am getting:
Least occurring character is: a
It is repeated 2 time(s)
--------------------------
Character Frequency
--------------------------
a 2
b 2
c 2
d 3
e 3
f 4
My code:
# Get string from user
string = input("Enter some text: ")
# Set frequency as empty dictionary
frequency_dict = {}
tab = "\t\t\t\t\t"
for character in string:
    if character in frequency_dict:
        frequency_dict[character] += 1
    else:
        frequency_dict[character] = 1
least_occurring = min(frequency_dict, key=frequency_dict.get)
# Displaying result
print("\nLeast occurring character is: ", least_occurring)
print("Repeated %d time(s)" % (frequency_dict[least_occurring]))
# Displaying result
print("\n--------------------------")
print("Character\tFrequency")
print("--------------------------")
for character, frequency in frequency_dict.items():
    print(f"{character + tab + str(frequency)}")

The Counter class from the collections module is ideal for this. However, in this trivial case, just use a dictionary.
s = 'aabbccdddeeeffff'
counter = {}
# count the occurrences of the individual characters
for c in s:
    counter[c] = counter.get(c, 0) + 1
# find the lowest value
min_ = min(counter.values())
# create a list of all characters whose count matches the previously calculated minimum
lmin = [k for k, v in counter.items() if v == min_]
# print the results
print('Least occurring character : ', end='')
print(*lmin, sep=', ')
Output:
Least occurring character : a, b, c
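Since the answer mentions collections.Counter, here is a minimal sketch of the same idea using it. The function name least_frequent is only for illustration. It relies on Counter remembering first-seen order (Python 3.7+), so tied characters come out in the order they appear in the input.

```python
from collections import Counter

def least_frequent(text):
    """Return (characters, count) for every character tied for the lowest count."""
    counts = Counter(text)  # character -> number of occurrences
    if not counts:          # empty input: nothing to report
        return [], 0
    lowest = min(counts.values())
    # Keep every character whose count equals the minimum, in first-seen order.
    return [ch for ch, n in counts.items() if n == lowest], lowest

chars, times = least_frequent('aabbccdddeeeffff')
print('Least occurring character:', ', '.join(chars))  # a, b, c
print('Repeated %d time(s)' % times)                   # 2
```

Returning the list and the count, instead of printing inside the function, makes it easy to format the result however the program needs.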

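You are very close!
You already have the minimum, so why not just iterate over your dictionary and print every key whose value equals it? Note that least_occurring holds a character, not a count, so compare against frequency_dict[least_occurring]:
for k, v in frequency_dict.items():
    if v == frequency_dict[least_occurring]:
        print(k)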

Related

How to add or subtract 1 to a sum based on a certain value in a text?

After reading a text, I need to add 1 to a sum if I find a ( character, and subtract 1 if I find a ) character in the text. I can't figure out what I'm doing wrong.
This is what I tried at first:
file = open("day12015.txt")
sum = 0
up = "("
for item in file:
    if item is up:
        sum += 1
    else:
        sum -= 1
print(sum)
I have a long text like the following example: (((())))((((( .... If I find a ), I need to subtract 1; if I find a (, I need to add 1. How can I solve it? I always get 0 as output, even if I change my file manually.

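Your for loop iterates over the lines of the file, not over its characters, so you have to loop through each line to get your desired output. Also compare with ==, not is: is checks object identity, not equality.
Example .txt:
(((())))(((((
Full code:
file = open("Data.txt")
total = 0
up = "("
for line in file:
    for item in line:
        if item == up:
            total += 1
        elif item == ")":  # ignore newlines and any other characters
            total -= 1
print(total)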
Output
5
Hope this helps. Happy coding :)

So you need to add 1 for each "(" character and subtract 1 for each ")".
Do it directly, specifying what should happen when you encounter each character. Also, iterating over an open file gives you lines, so you need to loop over the characters of each line. In your code, you are subtracting one for every case that is not "(".
total = 0
with open("day12015.txt") as file:
    for line in file:
        for character in line:
            if character == "(":
                total += 1
            elif character == ")":
                total -= 1
print(total)

That's simply a matter of counting each character in the text. The sum is the difference between those counts. Look:
from pathlib import Path
file = Path('day12015.txt')
text = file.read_text()
total = text.count('(') - text.count(')')
For the string you posted, for example, we have this:
>>> p = '(((())))((((('
>>> p.count('(') - p.count(')')
5
>>>
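Another compact variant, sketched here as an alternative rather than taken from the answers: map each bracket to +1 or -1 and let the built-in sum() add up the steps, treating every other character (such as a trailing newline) as 0. This is also a good reason not to name a variable sum, as the question's code does, since that hides the built-in. The name paren_balance is illustrative.

```python
def paren_balance(text):
    # +1 for "(", -1 for ")", 0 for anything else (newlines, spaces, ...).
    step = {"(": 1, ")": -1}
    return sum(step.get(c, 0) for c in text)

print(paren_balance("(((())))((((("))  # 5
```

With a file, pass it the whole text, for example the string returned by file.read_text() in the answer above.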
Just for comparison and out of curiosity, I timed the str.count() approach and a loop approach, 1,000 times each, using a string of 1,000,000 random ( and ) characters. Here is what I found:
import random
from timeit import timeit
random.seed(0)
p = ''.join(random.choice('()') for _ in range(1_000_000))
def f():
    return p.count('(') - p.count(')')
def g():
    a, b = 0, 0
    for c in p:
        if c == '(':
            a = a + 1
        else:
            b = b + 1
    return a - b
print('f: %5.2f s' % timeit(f, number=1_000))
print('g: %5.2f s' % timeit(g, number=1_000))
f: 8.19 s
g: 49.34 s
That means the loop approach is about 6 times slower, even though the str.count() approach iterates over p twice to compute the result.

Python recognize most likely to occur in a repeating pattern of letters/numbers

I'm looking for a Python solution to extract, from a series of letters/numbers, the most frequently repeated pattern that ends with a given outcome and has a specific length.
Problem: when is search most likely to occur, given a 4-digit block (Block Length) of digits/letters? (So the block has to END with search.)
Example:
Input: 0010000101010001011010011101001101000011100010100101010111
Search: 1
Block Length: 4
---
Answer: 0101
Appeared: 5 times
In the above case "1" is more likely to appear when 010 comes before 1.
001000 [0101] 0100 [0101] 1010011101001101000011100 [0101] 0 [0101] [0101] 11
So the answer is 0101, and it appeared 5 times.
NOTE:
This could return 0001 but that only appeared 4 times while 0101 appeared 5 times.
Changing the length would result in:
Input: 0010000101010001011010011101001101000011100010100101010111 (same as above)
Search: 1
Block Length: 5
---
Answer: 00101
Appeared: 4 times
Because:
00100 [00101] 010 [00101] 101001110100110100001110 [00101][00101] 010111
NOTE:
The second example could return 00001 but that only appeared 2 times while 00101 appeared 4 times.
If there are multiple outcomes, e.g. 0101 and 0111 appearing equally often, both outcomes should be shown.
I'm at the point where I can find the most repeated substring, but I don't know how to specify the length:
def find_most_repetitive_substring(string):
    max_counter = 1
    position, substring_length, times = 0, 0, 0
    for i in range(len(string)):
        for j in range(len(string) - i):
            counter = 1
            if j == 0:
                continue
            while True:
                if string[i + counter * j: i + (counter + 1) * j] != string[i: i + j] or i + (counter + 1) * j > len(string):
                    if counter > max_counter:
                        max_counter = counter
                        position, substring_length, times = i, j, counter
                    break
                else:
                    counter += 1
    return string[position: position + substring_length * times]

I've used re here since you're dealing with text, but you can use other techniques to create overlapping blocks of length N:
import re
def f(text, search, length):
    # Get unique blocks of length N, including overlaps
    overlaps = set(re.findall(f'(?=(.{{{length}}}))', text))
    # Prioritise blocks ending with search, then the count of non-overlapping occurrences, then the block itself
    return max((block.endswith(search), text.count(block), block) for block in overlaps)
S = '0010000101010001011010011101001101000011100010100101010111'
f(S, '1', 4)
# (True, 5, '0101')
f(S, '1', 5)
# (True, 4, '00101')

This might help you with part of your question (i.e., getting the counts); iterating and storing things in a lookup table (dictionary):
def find_most_repetitive_substring(string, substring_length, ending='1'):
    """
    Finds the most repetitive substring in a given string.
    :param string: String to search for repetitions.
    :param substring_length: Length of the substring to search for.
    :param ending: Character that the pattern must end with. Default is '1'.
    :return: Most repetitive substring and its number of occurrences.
    """
    substring_count = {}
    for i in range(len(string) - substring_length + 1):
        substring = string[i:i + substring_length]
        if substring[-1] == ending:  # added for ending
            if substring in substring_count:
                substring_count[substring] += 1
            else:
                substring_count[substring] = 1
    max_substr = max(substring_count, key=substring_count.get)
    return max_substr, substring_count[max_substr]
find_most_repetitive_substring('0010000101010001011010011101001101000011100010100101010111', 4)
And if you want to get all the keys with the maximum value, you can return a list instead, changing the last lines to something like this:
    max_substr = max(substring_count, key=substring_count.get)
    max_substrs = [k for k, v in substring_count.items() if v == substring_count[max_substr]]
    return max_substrs, substring_count[max_substr]
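The question also asks for all tied outcomes to be shown. A minimal sketch that combines the ideas above: a sliding window to collect the candidate blocks, str.count() for the non-overlapping counts that match the question's examples (5 for 0101), and a list so that every tie is returned. The name most_frequent_blocks is hypothetical.

```python
def most_frequent_blocks(text, search, length):
    """Return (blocks, count) for every block of `length` characters that ends
    with `search` and ties for the highest str.count() (non-overlapping) count."""
    # Slide a window over the text to collect each distinct candidate block once.
    candidates = {text[i:i + length] for i in range(len(text) - length + 1)}
    candidates = {block for block in candidates if block.endswith(search)}
    if not candidates:
        return [], 0
    counts = {block: text.count(block) for block in candidates}
    best = max(counts.values())
    # Return all tied blocks, sorted so the result is deterministic.
    return sorted(block for block, n in counts.items() if n == best), best

S = '0010000101010001011010011101001101000011100010100101010111'
print(most_frequent_blocks(S, '1', 4))  # (['0101'], 5)
print(most_frequent_blocks(S, '1', 5))  # (['00101'], 4)
```

Sorting the tied blocks is a design choice for reproducible output; drop it if the order does not matter.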

You could use Counter:
from collections import Counter
def find_most_repetitive_substring(string, size, search):
    # + 1 so that the last block of the string is included too
    res = Counter([string[i:i+size] for i in range(len(string) - size + 1)])
    return max(res, key=lambda x: (x.endswith(search), res.get(x)))
Example run:
inp = "0010000101010001011010011101001101000011100010100101010111"
s = find_most_repetitive_substring(inp, 4, "1")
print(s)  # 0101
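Note that the answers count different things. The dictionary and Counter approaches count every window, so overlapping matches are included, while str.count() (used in the re answer) counts non-overlapping matches, which is what the question's examples use. The helper count_overlapping below is an illustrative sketch, not part of any answer.

```python
def count_overlapping(text, block):
    # Count every starting position where `block` matches, so matches may overlap.
    return sum(1 for i in range(len(text) - len(block) + 1)
               if text[i:i + len(block)] == block)

S = '0010000101010001011010011101001101000011100010100101010111'
print(count_overlapping(S, '0101'))  # 7 (windows may share characters)
print(S.count('0101'))               # 5 (each match starts after the previous one ends)
```

So the dictionary-based find_most_repetitive_substring reports 7 for 0101 rather than the 5 in the question; both are valid, depending on which definition you need.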

How to efficiently count word occurrences in Python without additional modules

Background
I'm working on a HackerRank problem, Word Order. The task is to:
Read the following input from stdin:
4
bcdef
abcdefg
bcde
bcdef
Produce output that reflects:
The number of unique words, on the first line
The count of occurrences for each unique word, in order of first appearance
Example:
3 # Number of unique words
2 1 1 # Count of occurrences per word; 'bcdef' appears twice = 2
Problem
I've coded two solutions. The second one passes the initial tests but fails the rest by exceeding the time limit. The first one would also work, but I was unnecessarily sorting the output (and it would hit the time limit too).
Notes
In the first solution I was unnecessarily sorting values; this is fixed in the second solution.
I'm keen to make better (proper) use of standard Python data structures and list/dictionary comprehensions. I would be particularly keen to receive a solution that doesn't import any additional modules, with the exception of import os if needed.
Code
import os
def word_order(words):
    # Output no of distinct words
    distinct_words = set(words)
    n_distinct_words = len(distinct_words)
    print(str(n_distinct_words))
    # Count occurrences of each word
    occurrences = []
    for distinct_word in distinct_words:
        n_word_appearances = 0
        for word in words:
            if word == distinct_word:
                n_word_appearances += 1
        occurrences.append(n_word_appearances)
    occurrences.sort(reverse=True)
    print(*occurrences, sep=' ')
    # for o in occurrences:
    #     print(o, end=' ')
def word_order_two(words):
    '''
    Run through all words and only count multiple occurrences, do the maths
    to calculate unique words, etc. Attempt to construct a dictionary to make
    the operation more memory efficient.
    '''
    # Construct a count of word occurrences
    dictionary_words = {word:words.count(word) for word in words}
    # Unique words are equivalent to dictionary keys
    unique_words = len(dictionary_words)
    # Obtain sorted dictionary values
    # sorted_values = sorted(dictionary_words.values(), reverse=True)
    result_values = " ".join(str(value) for value in dictionary_words.values())
    # Output results
    print(str(unique_words))
    print(result_values)
    return 0
if __name__ == '__main__':
    q = int(input().strip())
    inputs = []
    for q_itr in range(q):
        s = input()
        inputs.append(s)
    # word_order(words=inputs)
    word_order_two(words=inputs)

Those nested loops are very bad performance-wise (they make your algorithm quadratic) and quite unnecessary. You can get all counts in a single iteration. You could use a plain dict or the dedicated collections.Counter:
from collections import Counter
def word_order(words):
    c = Counter(words)
    print(len(c))
    # Counter remembers insertion order (Python 3.7+), so counts come out in order of first appearance
    print(" ".join(str(v) for v in c.values()))
The "manual" implementation that shows the workings of the Counter:
def word_order(words):
    c = {}
    for word in words:
        c[word] = c.get(word, 0) + 1
    print(len(c))
    print(" ".join(str(v) for v in c.values()))
    # print(" ".join(map(str, c.values())))
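The same concern applies to the question's second solution: words.count(word) inside the dictionary comprehension rescans the whole list for every word, so it is quadratic as well, which would plausibly explain the time-limit failures. A rough comparison sketch; the function names and input sizes are arbitrary, and timings will vary by machine.

```python
from timeit import timeit

def counts_quadratic(words):
    # words.count(word) rescans the whole list for each of the n words: O(n^2).
    return {word: words.count(word) for word in words}

def counts_linear(words):
    # A single pass with dict.get: O(n) on average.
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

# Arbitrary test data: 2,000 words drawn from 200 distinct values.
words = [str(i % 200) for i in range(2_000)]
assert counts_quadratic(words) == counts_linear(words)
print("quadratic: %.3f s" % timeit(lambda: counts_quadratic(words), number=3))
print("linear:    %.3f s" % timeit(lambda: counts_linear(words), number=3))
```

Both return the same dictionary; only the amount of work differs, and the gap grows with the number of input words.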

Without any imports, you could count unique elements by
len(set(words))
and count their occurrences by
def counter(words):
    count = dict()
    for word in words:
        if word in count:
            count[word] += 1
        else:
            count[word] = 1
    return count.values()

You can use Counter, then print the output like below:
>>> from collections import Counter
>>> def counter_words(words):
...     cnt = Counter(words)
...     print(len(cnt))
...     print(*cnt.values())
...
>>> inputs = ['bcdef', 'abcdefg', 'bcde', 'bcdef']
>>> counter_words(inputs)
3
2 1 1

Trouble printing things out so that they are aligned

I'm trying to print out a list of results I have, but I want them to be aligned. They currently look like this:
table
word: frequency:
i 9
my 2
to 2
test 2
it 2
hate 1
stupid 1
accounting 1
class 1
because 1
its 1
from 1
six 1
seven 1
pm 1
how 1
is 1
this 1
helping 1
becuase 1
im 1
going 1
do 1
a 1
little 1
on 1
freind 1
ii 1
I want the frequencies to be aligned with each other so they aren't going in this weird zig-zag form. I tried adding things to the format string, but it didn't work. This is what my code looks like:
import string
from collections import OrderedDict
f=open('mariah.txt','r')
a=f.read() # read the whole text file into one string
# print(a)
c=a.lower() # convert everything in the text file to lowercase
# print(c)
y=c.translate(str.maketrans('','',string.punctuation)) # get rid of any punctuation
# print(y)
words_in_novel=y.split() # split every word apart; by default, split() splits on any whitespace. Why, when I split on " ", does it give me \n?
#print(words_in_novel)
count={}
for word in words_in_novel:
    #print(word)
    if word in count: # if the word from words_in_novel is already in count, add one to its counter
        count[word]+=1
    else:
        count[word]=1 # if this is the first time the word appears, set it to 1
print(count)
print("\n\n\n\n\n\n\n")
# this orders the dictionary items by the second term, where t[1] refers to the value after the colon
# reverse so we are sorting from greatest to least values
g=(sorted(count.items(), key=lambda t: t[1], reverse=True))
# g=OrderedDict(sorted(count.items(), key=lambda t: t[1]))
print(g)
print("\n\n\n\n\n\n\n")
print("{:^20}".format("table"))
print("{}{:>20}".format("word:","frequency:"))
for i in g:
    # z=g[i]
    # print(i)
    # a=len(i[0])
    # print(a)
    # c=50+a
    # print(c)
    print("{}{:>20}".format(i[0],i[1]))
Does anyone know how to make them line up in a straight column?

You need to adjust the width/alignment of your 1st column, not the 2nd.
The right way:
...
print("{:<20}{}".format("word:","frequency:"))
for i in g:
print("{:<20}{}".format(i[0],i[1]))
The output would look like this:
word:               frequency:
i                   9
my                  2
...
accounting          1
class               1
because             1
...

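OK, for this part of your code:
for i in g:
    r = " " * 25
    # print("{}{:>20}".format(i[0], i[1]))
    # strings are immutable, so build a new string instead of assigning to a slice
    r = i[0] + r[len(i[0]):]
    r = r[:22] + str(i[1])
    print(r)
It should work, as long as no word is longer than 22 characters.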
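Because Python strings are immutable, a slice of a string cannot be assigned to; the padded row has to be built as a new string. A minimal sketch of the same fixed-width idea using str.ljust, which pads on the right with spaces. The helper name row and the width 22 are only for illustration.

```python
def row(word, frequency, width=22):
    # Pad the word with spaces up to `width` characters, then append the count.
    return word.ljust(width) + str(frequency)

g = [("i", 9), ("my", 2), ("accounting", 1)]
for word, frequency in g:
    print(row(word, frequency))
```

If a word is longer than width, ljust returns it unchanged, so that row's count is no longer aligned; pick a width larger than the longest word.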

If you ever find that the frequency is greater than a single digit, you could try something like this:
max_len = max(len(i[0]) for i in g)
format_str = "{{:<{}}}{{:>{}}}".format(max_len, 20 - max_len)
for i in g:
    print(format_str.format(i[0], i[1]))

Align the words too, by left-aligning the first column and right-aligning the second:
print("{:<10}{:>10}".format(i[0],i[1]))
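To avoid hard-coding widths, you can compute them from the data. A sketch, assuming g is the list of (word, count) pairs from the question; format_table is a hypothetical helper that returns the lines so they can be printed or checked.

```python
def format_table(rows, headers=("word:", "frequency:")):
    # Column widths: the longest entry in each column, header included.
    word_w = max([len(headers[0])] + [len(word) for word, _ in rows])
    freq_w = max([len(headers[1])] + [len(str(n)) for _, n in rows])
    lines = [f"{headers[0]:<{word_w}}  {headers[1]:>{freq_w}}"]
    for word, n in rows:
        # Left-align words, right-align counts so multi-digit numbers line up too.
        lines.append(f"{word:<{word_w}}  {n:>{freq_w}}")
    return lines

g = [("i", 9), ("my", 2), ("accounting", 1), ("test", 12)]
print("\n".join(format_table(g)))
```

The nested {word_w} inside the format spec is how f-strings accept a width computed at run time.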

Creating a Letter a Histogram

So I want to create a histogram.
Here is my code:
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
def print_hist(h):
    for c in h:
        print(c, h[c])
It gives me this:
>>> h = histogram('parrot')
>>> print_hist(h)
a 1
p 1
r 2
t 1
o 1
But I want this:
a: 1
o: 1
p: 1
r: 2
t: 1
So how can I get my histogram in alphabetical order, make it case insensitive (so "a" and "A" are the same), and list the whole alphabet (so letters that are not in the string just get a zero)?

Use an ordered dictionary, which stores keys in the order they were put in.
from collections import OrderedDict
import string
def count(s):
    histogram = OrderedDict((c, 0) for c in string.ascii_lowercase)
    for c in s:
        if c in string.ascii_letters:
            histogram[c.lower()] += 1
    return histogram
for letter, c in count('parrot').items():
    print('{}:{}'.format(letter, c))
Result:
a:1
b:0
c:0
d:0
e:0
f:0
g:0
h:0
i:0
j:0
k:0
l:0
m:0
n:0
o:1
p:1
q:0
r:2
s:0
t:1
u:0
v:0
w:0
x:0
y:0
z:0

Just use collections.Counter for this, unless you really want your own:
>>> import collections
>>> c = collections.Counter('parrot')
>>> sorted(c.items(), key=lambda c: c[0])
[('a', 1), ('o', 1), ('p', 1), ('r', 2), ('t', 1)]
EDIT: As commenters pointed out, your last sentence indicates you want data on all the letters of the alphabet that do not occur in your word. Counter is good for this also since, as the docs indicate:
Counter objects have a dictionary interface except that they return a zero count for missing items instead of raising a KeyError.
So you can just iterate through something like string.ascii_lowercase:
>>> import string
>>> for letter in string.ascii_lowercase:
... print('{}: {}'.format(letter, c[letter]))
...
a: 1
b: 0
c: 0
d: 0
e: 0
f: 0
g: 0
h: 0
i: 0
j: 0
k: 0
l: 0
m: 0
n: 0
o: 1
p: 1
q: 0
r: 2
s: 0
t: 1
u: 0
v: 0
w: 0
x: 0
y: 0
z: 0
Finally, rather than implementing something complicated to merge the results of upper- and lowercase letters, just normalize your input first:
c = collections.Counter('PaRrOt'.lower())

A trivial answer would be (where s is the input string):
import string
for letter in string.ascii_lowercase:
    print(letter + ':', s.lower().count(letter))
(highly inefficient, as it goes through the string 26 times)
You can also use a Counter:
from collections import Counter
import string
cnt = Counter(s.lower())
for letter in string.ascii_lowercase:
    print(letter + ':', cnt[letter])
Much neater.

If you want it ordered, then you are going to have to use an OrderedDict.
You are also going to need to add the letters to the dictionary in order.
It is not entirely clear to me, but I think you want a case-insensitive result, so we need to get all letters into one case.
from collections import OrderedDict as od
import string
def histogram(s):
First we need to create the dictionary that has all of the lowercase letters. string.ascii_lowercase provides exactly the 26 letters a-z (in Python 2, string.lowercase is locale-dependent and may contain extra characters, which is why you would otherwise need to slice it).
    d = od()
    for each_letter in string.ascii_lowercase:
        d[each_letter] = 0
Once the dictionary is created, we just need to iterate through the word after it has been lowercased. Please note that a word with a number or a space would otherwise blow up with a KeyError. You may or may not want to add numbers and spaces to your dictionary. One way to keep it from blowing up is to try to add to the count and, if the character is not in the dictionary, just ignore it.
    for c in s.lower():
        try:
            d[c] += 1
        except KeyError:
            pass
    return d

If you want to list the whole (Latin-only) alphabet anyway, you could use a list of length 26:
hist = [0] * 26
for c in s.lower():
    if 'a' <= c <= 'z':  # skip anything that is not an ASCII letter
        hist[ord(c) - ord('a')] += 1
To get the desired output:
for x in range(26):
    print(chr(x + ord('a')) + ':', hist[x])

This function prints your histogram in alphabetical order:
def print_hist(h):
    for c in sorted(h):
        print('{}: {}'.format(c, h[c]))
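Putting the three requirements together (alphabetical order, case-insensitive counting, and every letter listed with a zero when absent), here is a minimal Python 3 sketch. It assumes only the 26 ASCII letters should be counted and relies on plain dicts keeping insertion order (guaranteed since Python 3.7), so no OrderedDict is needed. The name letter_histogram is illustrative.

```python
import string

def letter_histogram(s):
    # Every letter a-z starts at zero, inserted in alphabetical order.
    hist = dict.fromkeys(string.ascii_lowercase, 0)
    for c in s.lower():          # lowercase first so "A" and "a" are counted together
        if c in hist:            # skip digits, spaces, punctuation, non-ASCII letters
            hist[c] += 1
    return hist

for letter, n in letter_histogram('PaRrOt').items():
    print(f"{letter}: {n}")
```

Checking membership with `in` avoids the try/except entirely, since the dictionary already holds exactly the characters worth counting.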
