I've been having a hard time trying to solve this recently (although this looks like a trivial matter).
I have these 3 dictionaries:
letters_words = {'A': ['allow', 'arise'], 'B': ['bring', 'buy']}
words_cxns = {'allow': ['CXN1', 'CXN2'], 'arise': ['CXN1', 'CXN3'], 'bring': ['CXN2', 'CXN3'], 'buy': ['CXN3']}
cxns_ids = {'CXN1': 1, 'CXN2': 2, 'CXN3': 3}
Every letter has a few words, every word is associated with certain constructions, every construction has an id.
In the end I want to get this:
A
allow
CXN1, 1
CXN2, 2
arise
CXN1, 1
CXN3, 3
B
bring
CXN2, 2
CXN3, 3
buy
CXN3, 3
The spaces and punctuation don't matter... The main thing is that it gets listed right.
Here is what I'm currently doing:
for letter, words in zip(letters_words.keys(), letters_words.values()):
    print(letter)
    for word in words:
        print(word)
        for w, cnxs in zip(words_cxns.keys(), words_cxns.values()):
            if w == word:
                for c in cxns:
                    for cxn, ix in zip(cxns_ids.keys(), cxns_ids.values()):
                        if cxn == c:
                            print(c, ix)
However, my output looks like this at the moment:
A
allow
CXN1 1
CXN2 2
CXN3 3
arise
CXN1 1
CXN2 2
CXN3 3
B
bring
CXN1 1
CXN2 2
CXN3 3
buy
CXN1 1
CXN2 2
CXN3 3
What am I missing? :/
You do not need zip for this task, as the construction merely depends on the word, not on the iteration of words. Here is a possible solution that produces your desired output:
for letter, words in letters_words.items():
    print('\n' + letter)
    for word in words:
        print('\n' + word)
        cxns = words_cxns[word]
        for cxn in cxns:
            cxn_id = cxns_ids[cxn]
            print(cxn, ',', cxn_id)
No need to zip:
letters_words = {'A': ['allow', 'arise'], 'B': ['bring', 'buy']}
words_cxns = {'allow': ['CXN1', 'CXN2'], 'arise': ['CXN1', 'CXN3'], 'bring': ['CXN2', 'CXN3'], 'buy': ['CXN3']}
cxns_ids = {'CXN1': 1, 'CXN2': 2, 'CXN3': 3}
for k,v in letters_words.items():
    print("\n" + k + "\n")
    for w in v:
        print(w)
        for word in words_cxns[w]:
            print(word, cxns_ids[word])
Output:
A
allow
CXN1 1
CXN2 2
arise
CXN1 1
CXN3 3
B
bring
CXN2 2
CXN3 3
buy
CXN3 3
Try this, the idea is to get the cxns directly from the dictionary instead of using a second zip object. I commented on the relevant row.
for letter, words in zip(letters_words.keys(), letters_words.values()):
    print(letter)
    for word in words:
        print(word)
        # no need to create a new zip object, get value from dict instead
        for cxns in words_cxns[word]:
            print(cxns, cxns_ids[cxns])
That's embarrassing, but I've made a typo which I couldn't find for 2 days! On line 6 of my code suggestion, I've written cnxs instead of cxns. Once I changed it, everything worked!
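For reference, here is a minimal sketch of the lookup-based approach the answers describe, wrapped in a function so the result can be checked. The function name format_listing and the choice to skip unknown words or constructions through dict.get are assumptions made for illustration; they are not part of the original code.

```python
def format_listing(letters_words, words_cxns, cxns_ids):
    """Return the nested listing as a list of lines (hypothetical helper)."""
    lines = []
    for letter, words in letters_words.items():
        lines.append(letter)
        for word in words:
            lines.append(word)
            # look the constructions up directly; unknown words yield no rows
            for cxn in words_cxns.get(word, []):
                if cxn in cxns_ids:
                    lines.append(f"{cxn}, {cxns_ids[cxn]}")
    return lines


letters_words = {'A': ['allow', 'arise'], 'B': ['bring', 'buy']}
words_cxns = {'allow': ['CXN1', 'CXN2'], 'arise': ['CXN1', 'CXN3'],
              'bring': ['CXN2', 'CXN3'], 'buy': ['CXN3']}
cxns_ids = {'CXN1': 1, 'CXN2': 2, 'CXN3': 3}

print("\n".join(format_listing(letters_words, words_cxns, cxns_ids)))
```

Because every lookup goes through the dictionary key, a misspelled variable name raises a NameError right away instead of silently printing every construction.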
Related
As the title says, I am trying to print ALL characters with the minimum frequency:
least_occurring holds only ONE value, so I think that is the issue, but I can't figure it out. It could be something obvious I am missing here, but I am out of brain cells :)
ex: aabbccdddeeeffff
expected output:
Least occuring character : a,b,c <=== this is what I can't figure out :(
repeated 2 time(s)
--------------------------
Character Frequency
--------------------------
a 2
b 2
c 2
d 3
e 3
f 4
results I am getting:
Least occurring character is: a
It is repeated 2 time(s)
--------------------------
Character Frequency
--------------------------
a 2
b 2
c 2
d 3
e 3
f 4
my code:
# Get string from user
string = input("Enter some text: ")
# Set frequency as empty dictionary
frequency_dict = {}
tab="\t\t\t\t\t"
for character in string:
    if character in frequency_dict:
        frequency_dict[character] += 1
    else:
        frequency_dict[character] = 1
least_occurring = min(frequency_dict, key=frequency_dict.get)
# Displaying result
print("\nLeast occuring character is: ", least_occurring)
print("Repeated %d time(s)" %(frequency_dict[least_occurring]))
# Displaying result
print("\n--------------------------")
print("Character\tFrequency")
print("--------------------------")
for character, frequency in frequency_dict.items():
    print(f"{character + tab + str(frequency)}")
The Counter class from the collections module is ideal for this. However, in this trivial case, just use a dictionary.
s = 'aabbccdddeeeffff'
counter = {}
# count the occurrences of the individual characters
for c in s:
    counter[c] = counter.get(c, 0) + 1
# find the lowest value
min_ = min(counter.values())
# create a list of all characters where the count matches the previously calculated minimum
lmin = [k for k, v in counter.items() if v == min_]
# print the results
print('Least occuring character : ', end='')
print(*lmin, sep=', ')
Output:
Least occuring character : a, b, c
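Since the Counter class is mentioned above, here is a minimal sketch of the same idea using it. The helper name least_common_chars and the handling of an empty string are assumptions for illustration only.

```python
from collections import Counter


def least_common_chars(text):
    """Return (characters with the lowest count, that count)."""
    counts = Counter(text)
    if not counts:
        return [], 0
    lowest = min(counts.values())
    # Counter keeps first-seen order, so ties come out in input order
    return [ch for ch, n in counts.items() if n == lowest], lowest


chars, n = least_common_chars('aabbccdddeeeffff')
print('Least occurring character :', ', '.join(chars))
print(f'repeated {n} time(s)')
```

The key point is the same as in the dictionary version: compute the minimum count once, then collect every character that matches it.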
You are very close!
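Since you already have a key with the minimum count, why not iterate over your dictionary and print every key whose value equals that minimum? Note that least_occurring holds a character, not a count, so compare against frequency_dict[least_occurring]:
min_count = frequency_dict[least_occurring]
for k, v in frequency_dict.items():
    if v == min_count:
        print(k)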
How can I use the split() function to figure out how many vowels there are in total?
How can I print the number of a, e, i, o, and u in each of these sentences?
The sentence is
'I study Python programming at KAIST Center For Gifted Education'
Umm... Counter is not working for me. I mean, I would like the steps spelled out one by one rather than a built-in Python helper, so that the approach can also work in other coding programs.
I would suggest using collections.Counter(), e.g. like this:
from collections import Counter
sentence = 'I study Python programming at KAIST Center For Gifted Education'
counts = Counter(sentence)
print(counts['a'])
# 3
print(counts['A'])
# 1
print(counts['e'])
# 3
print(counts['E'])
# 1
print(counts['i'])
# 3
print(counts['I'])
# 2
print(counts['o'])
# 4
print(counts['O'])
# 0
print(counts['u'])
# 2
print(counts['U'])
# 0
If you'd like to count the vowels case-independently you can call .lower() on the sentence before passing it to Counter(), e.g.:
from collections import Counter
sentence = 'I study Python programming at KAIST Center For Gifted Education'
counts = Counter(sentence.lower())
print(counts['a'])
# 4
print(counts['e'])
# 4
print(counts['i'])
# 5
print(counts['o'])
# 4
print(counts['u'])
# 2
EDIT:
If for some reason you cannot use the collections library, strings have a count() method:
sentence = 'I study Python programming at KAIST Center For Gifted Education'
print(sentence.count('a'))
# 3
print(sentence.count('e'))
# 3
print(sentence.count('i'))
# 3
print(sentence.count('o'))
# 4
print(sentence.count('u'))
# 2
In case you'd like to count more than just vowels, it may be more efficient to "manually" count the sub-strings (i.e. vowels in your case), e.g.:
sentence = 'I study Python programming at KAIST Center For Gifted Education'
# Initialise counters:
vowels = {
    'a': 0,
    'e': 0,
    'i': 0,
    'o': 0,
    'u': 0,
}
for char in sentence:
    if char in vowels:
        vowels[char] += 1
print(vowels['a'])
# 3
print(vowels['e'])
# 3
print(vowels['i'])
# 3
print(vowels['o'])
# 4
print(vowels['u'])
# 2
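The question also asks about split() and the total number of vowels. A minimal sketch under the assumption that vowels should be counted case-insensitively; the names VOWELS, count_vowels and per_word are made up for this example and no imports are used.

```python
sentence = 'I study Python programming at KAIST Center For Gifted Education'
VOWELS = 'aeiou'


def count_vowels(word):
    """Count the vowels in one word, ignoring case."""
    return sum(1 for ch in word.lower() if ch in VOWELS)


# split() breaks the sentence into words on whitespace
per_word = [(word, count_vowels(word)) for word in sentence.split()]
total = sum(n for _, n in per_word)

for word, n in per_word:
    print(word, n)
print('total:', total)
# total: 19
```

The total matches the case-independent Counter result above (4 + 4 + 5 + 4 + 2).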
Check this?
from collections import Counter
x='I study Python programming at KAIST Center For Gifted Education'
x=[a for a in x]
vowels=['a','e','i','o','u']
check_list=[]
for check in x:
    if check.lower() in vowels:
        check_list.append(check.lower())
count=Counter(check_list)
print(count)
I'm trying to print out a list of results i have but I want them to be aligned. They currently look like:
table
word: frequency:
i 9
my 2
to 2
test 2
it 2
hate 1
stupid 1
accounting 1
class 1
because 1
its 1
from 1
six 1
seven 1
pm 1
how 1
is 1
this 1
helping 1
becuase 1
im 1
going 1
do 1
a 1
little 1
on 1
freind 1
ii 1
I want the frequency to be aligned with each other so they aren't going in this weird zig zag form. I tried playing around with adding things to the format but it didn't work. This is what my code looks like:
import string
from collections import OrderedDict
f=open('mariah.txt','r')
a=f.read() # read the text file like it would normal look ie no \n or anything
# print(a)
c=a.lower() # convert everything in the textfile to lowercase
# print(c)
y=c.translate(str.maketrans('','',string.punctuation)) # get rid of any punctuation
# print(y)
words_in_novel=y.split() # splitting every word apart. the default for split is on white-space characters. Why when i split like " " for the spacing is it then giving me \n?
#print(words_in_novel)
count={}
for word in words_in_novel:
    #print(word)
    if word in count: # if the word from the word_in_novel is already in count then add a one to that counter
        count[word]+=1
    else:
        count[word]=1 # if the word is the first time in count set it to 1
print(count)
print("\n\n\n\n\n\n\n")
# this orders the dictionary items by the second term, where t[1] refers to the value after the colon
# reverse so we are sorting from greatest to least values
g=(sorted(count.items(), key=lambda t: t[1], reverse=True))
# g=OrderedDict(sorted(count.items(), key=lambda t: t[1]))
print(g)
print("\n\n\n\n\n\n\n")
print("{:^20}".format("table"))
print("{}{:>20}".format("word:","frequency:"))
for i in g:
    # z=g[i]
    # print(i)
    # a=len(i[0])
    # print(a)
    # c=50+a
    # print(c)
    print("{}{:>20}".format(i[0],i[1]))
Does anyone know how to make them going in a straight line?
You need to adjust the width/alignment of your 1st column, not the 2nd.
The right way:
...
print("{:<20}{}".format("word:","frequency:"))
for i in g:
    print("{:<20}{}".format(i[0],i[1]))
The output would look as:
word:               frequency:
i                   9
my                  2
...
accounting          2
class               1
because             1
...
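OK, for this part of your code, pad each word to a fixed width before appending the count. Strings are immutable, so build a new string instead of assigning to a slice:
for i in g:
    #print("{}{:>20}".format(i[0],i[1]))
    # pad the word with spaces to 22 characters, then append the count
    r = i[0].ljust(22) + str(i[1])
    print(r)
This should line the counts up.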
If you ever find that the frequency is greater than a single digit, you could try something like this:
max_len = max(len(i[0]) for i in g)
format_str = "{{:<{}}}{{:>{}}}".format(max_len, 20 - max_len)
for i in g:
    print(format_str.format(i[0], i[1]))
Align words too
print("{:<10}{:>10}".format(i[0],i[1]))
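To tie the answers together, here is a minimal sketch that computes the first column's width from the data instead of hard-coding it. The helper name format_table, the gap parameter and the three sample pairs are assumptions for illustration.

```python
def format_table(pairs, gap=2):
    """Return aligned lines for (word, frequency) pairs (hypothetical helper)."""
    headers = ("word:", "frequency:")
    # the first column must fit the longest word and the header
    width = max([len(headers[0])] + [len(word) for word, _ in pairs]) + gap
    lines = [f"{headers[0]:<{width}}{headers[1]}"]
    for word, freq in pairs:
        # left-align the word in a fixed-width field, then print the count
        lines.append(f"{word:<{width}}{freq}")
    return lines


g = [("i", 9), ("my", 2), ("accounting", 1)]
print("\n".join(format_table(g)))
```

Left-aligning the first column in a fixed-width field is what keeps the second column straight; the width of the second column does not matter for that.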
Let's say I have a text file with thousands of lines of the following form:
Word Number1 Number2
In this text file, the "Word" is indeed some word that changes from one line to another, and the numbers are likewise changing numbers. However, some of these words are the same... Consider the following example:
Hello 5 7
Hey 3 2
Hi 7 3
Hi 5 2
Hello 1 4
Hey 5 2
Hello 8 1
What would be a python script that reads the text file and keeps only the lines that contain the highest Number1 for any given Word (deleting all lines that do not satisfy this condition)? The output for the above example with such a script would be:
Hi 7 3
Hey 5 2
Hello 8 1
Note: the order of the lines in the output is irrelevant; all that matters is that the above condition is satisfied. Also, if for a given Word the highest Number1 is the same for two or more lines, the output should keep only one of them, such that there is only one occurrence of any Word in the output.
I've no clue how to approach the deletion aspect, but I can guess (perhaps incorrectly) that the first step would be to make a list from all the lines in the text file, i.e.
List1 = open("textfile.txt").readlines()
At any rate, many thanks in advance for the help!
You can try this:
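# read every non-empty line as [word, number1, number2]
with open('the_file.txt') as fh:
    f = [i.split() for i in fh if i.strip()]
# list(...) is needed in Python 3, where map() returns a lazy iterator
other_f = {i[0]: list(map(int, i[1:])) for i in f}
for i in f:
    if other_f[i[0]][0] < int(i[1]):
        other_f[i[0]] = list(map(int, i[1:]))
# overwrite the file with one line per word
with open('the_file.txt', 'w') as new_f:
    for a, b in other_f.items():
        new_f.write(a + " " + ' '.join(map(str, b)) + "\n")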
Output:
Hi 7 3
Hello 8 1
Hey 5 2
You can store the lines in a dict, with the words as keys. To make things easier, you can store a tuple with the value of the first numeric field (converted to integer, otherwise you would sort by lexicographic order) and the line.
We use dict.setdefault in case we encounter the word for the first time.
highest = {}
with open('text.txt') as f:
    for line in f:
        name, val, _ = line.split(' ', 2)
        val = int(val)
        if val > highest.setdefault(name, (val, line))[0]:
            highest[name] = (val, line)
out = [tup[1] for name, tup in highest.items()]
print(''.join(out))
# Hey 5 2
# Hello 8 1
# Hi 7 3
First sort the list with the 1st and 2nd columns as the key, from high to low, then remove the duplicate items:
list1 = open(r'textfile.txt').read().splitlines()
output = sorted(list1, key=lambda x:(x.split()[0], int(x.split()[1])), reverse=True)
uniq_key = []
# iterate over a copy, because items are removed from output inside the loop
for i in list(output):
    key = i.split()[0]
    if key in uniq_key:
        output.remove(i)
    else:
        uniq_key.append(key)
>>> output
['Hi 7 3', 'Hey 5 2', 'Hello 8 1']
Because file objects are iterable, it is not necessary to do the readlines up front. So let's open the file and then just iterate over it using a for loop.
fin = open('sometext.txt')
We create a dictionary to hold the results, as we go.
topwords = dict()
Iterating now, over the lines in the file:
for line in fin:
We strip off the new line characters and split the lines into individual strings, based on where the spaces are (the default behavior for split()).
    word, val1, val2 = line.strip().split()
    val1 = int(val1)
We check to see if we have already seen the word, if yes, we then check to see if the first value is greater than the first value previously stored.
    if word in topwords:
        if val1 > topwords[word][0]:
            topwords[word] = [val1, val2]
    else:
        topwords[word] = [val1, val2]
Once we finish parsing all the words, we go back and iterate over the top words and print the results to the screen.
for word in topwords:
    output = '{} {} {}'.format(word, *topwords[word])
    print(output)
The final script looks like this:
fin = open('sometext.txt')
topwords = dict()
for line in fin:
    word, val1, val2 = line.strip().split()
    val1 = int(val1)
    if word in topwords:
        if val1 > topwords[word][0]:
            topwords[word] = [val1, val2]
    else:
        topwords[word] = [val1, val2]
for word in topwords:
    output = '{} {} {}'.format(word, *topwords[word])
    print(output)
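The same dictionary idea can be sketched as a function that works on any iterable of lines, which makes it easy to try without a file. The name keep_highest and the decision to skip blank or malformed lines are assumptions for illustration.

```python
def keep_highest(lines):
    """Return one line per word: the one with the largest Number1."""
    best = {}  # word -> (number1, original line)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines (an assumption)
        word, number1 = parts[0], int(parts[1])
        # strict '>' keeps the first line seen when Number1 ties
        if word not in best or number1 > best[word][0]:
            best[word] = (number1, line.strip())
    return [line for _, line in best.values()]


sample = ["Hello 5 7", "Hey 3 2", "Hi 7 3", "Hi 5 2",
          "Hello 1 4", "Hey 5 2", "Hello 8 1"]
print("\n".join(keep_highest(sample)))
```

To apply it to a file, an open file object can be passed directly, since iterating over it yields the lines one by one.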
I am looking for a Python program that counts the frequency of each word in a text and outputs each word with its count and the line numbers where it appears.
We define a word as a contiguous sequence of non-white-space characters. (hint: split())
Note: different capitalizations of the same character sequence should be considered same word, e.g. Python and python, I and i.
The input will be several lines with the empty line terminating the text. Only alphabet characters and white spaces will be present in the input.
The output is formatted as follows:
Each line begins with a number indicating the frequency of the word, a white space, then the word itself, and a list of line numbers containing this word.
Sample Input
Python is a cool language but OCaml
is even cooler since it is purely functional
Sample Output
3 is 1 2
1 a 1
1 but 1
1 cool 1
1 cooler 2
1 even 2
1 functional 2
1 it 2
1 language 1
1 ocaml 1
1 purely 2
1 python 1
1 since 2
PS.
I am not a student I am learning Python on my own..
Using collections.defaultdict, collections.Counter and string formatting:
from collections import Counter, defaultdict
data = """Python is a cool language but OCaml
is even cooler since it is purely functional"""
result = defaultdict(lambda: [0, []])
for i, l in enumerate(data.splitlines()):
    for k, v in Counter(l.split()).items():
        result[k][0] += v
        result[k][1].append(i+1)
for k, v in result.items():
    print('{1} {0} {2}'.format(k, *v))
Output:
1 since [2]
3 is [1, 2]
1 a [1]
1 it [2]
1 but [1]
1 purely [2]
1 cooler [2]
1 functional [2]
1 Python [1]
1 cool [1]
1 language [1]
1 even [2]
1 OCaml [1]
If the order matters, you can sort the result this way:
items = sorted(result.items(), key=lambda t: (-t[1][0], t[0].lower()))
for k, v in items:
    print('{1} {0} {2}'.format(k, *v))
Output:
3 is [1, 2]
1 a [1]
1 but [1]
1 cool [1]
1 cooler [2]
1 even [2]
1 functional [2]
1 it [2]
1 language [1]
1 OCaml [1]
1 purely [2]
1 Python [1]
1 since [2]
Frequency tabulations are often best solved with a counter.
from collections import Counter
word_count = Counter()
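with open('input', 'r') as f:
    for line in f:
        # split() with no argument also drops the newline and empty strings
        for word in line.split():
            word_count[word.lower()] += 1
# Python 3: items() and print() replace iteritems() and the print statement
for word, count in word_count.items():
    print("word: {}, count: {}".format(word, count))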
Ok, so you've already identified split to turn your string into a list of words. You want to list the lines on which each word occurs, however, so you should split the string first into lines, then into words. Then, you can create a dictionary, where keys are the words (put to lowercase first), and the values can be a structure containing the number of occurrences and the lines of occurrence.
You may also want to put in some code to check whether something is a valid word (e.g. whether it contains numbers), and to sanitise a word (remove punctuation). I'll leave these up to you.
def wsort(item):
    # sort descending by count, then ascending alphabetically
    word, freq = item
    return -freq['count'], word
def wfreq(text):
    words = {}
    # split by line, then by word
    lines = [line.split() for line in text.split('\n')]
    for i in range(len(lines)):
        for word in lines[i]:
            # if the word is not in the dictionary, create the entry
            word = word.lower()
            if word not in words:
                words[word] = {'count':0, 'lines':set()}
            # update the count and add the line number to the set
            words[word]['count'] += 1
            words[word]['lines'].add(i+1)
    # convert from a dictionary to a sorted list using wsort to give the order
    return sorted(words.items(), key=wsort)
inp = "Python is a cool language but OCaml\nis even cooler since it is purely functional"
for word, freq in wfreq(inp):
    # generate the desired list format (sorted, because sets are unordered)
    lines = " ".join(str(l) for l in sorted(freq['lines']))
    print("%i %s %s" % (freq['count'], word, lines))
This should provide the exact same output as in your sample:
3 is 1 2
1 a 1
1 but 1
1 cool 1
1 cooler 2
1 even 2
1 functional 2
1 it 2
1 language 1
1 ocaml 1
1 purely 2
1 python 1
1 since 2
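A compact sketch of the same structure (a dictionary keyed by the lowercased word, holding a count and the line numbers), written as a function that returns the formatted lines. The name word_report is hypothetical, and the input is assumed to be an already collected list of lines.

```python
def word_report(lines):
    """Return 'count word line-numbers' strings for the given lines."""
    stats = {}  # word -> [count, list of line numbers]
    for number, line in enumerate(lines, start=1):
        for word in line.lower().split():
            entry = stats.setdefault(word, [0, []])
            entry[0] += 1
            # record each line number only once per word
            if number not in entry[1]:
                entry[1].append(number)
    # most frequent first, then alphabetical
    ordered = sorted(stats.items(), key=lambda kv: (-kv[1][0], kv[0]))
    return [f"{count} {word} {' '.join(map(str, nums))}"
            for word, (count, nums) in ordered]


text = ["Python is a cool language but OCaml",
        "is even cooler since it is purely functional"]
report = word_report(text)
print("\n".join(report))
```

For the interactive version described in the question, the lines could first be gathered with input() until an empty line is entered, then passed to the function.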
First of all, find all the words that are present in the text, using split().
If the text is in a file, first read it into a string called text:
filin = open('file', 'r')
di = filin.readlines()
text = ''
for i in di:
    text += i
# split() also discards the \n at the end of each line
words_list = set(text.split())
Now check the number of times each word appears in the text. We will deal with the line numbers later.
dicts = {}
for i in words_list:
    dicts[i] = 0
# compare whole words; a substring test would also count "is" inside "this"
for w in text.split():
    dicts[w] += 1
Now we have a dictionary with the words as keys and the number of times each word appears in the text as values.
Now for the line numbers:
dicts2 = {}
for i in words_list:
    dicts2[i] = ()
for i in words_list:
    filin.seek(0)
    count = 1
    for j in filin:
        if i in j.split():
            dicts2[i] += (count,)
        count += 1
Now dicts2 has the words as keys and, as values, tuples of the line numbers each word appears on.
If the data is already in a string, you just need to split it on the \n characters:
di = string_containing_text.split('\n')
Then use the string itself as text and loop over di instead of filin; everything else will be the same.
I am sure you can format the output.