How to count instances of words in multiple lines?

I'm new to Python and I'm learning it slowly. I'm trying to code a simple word counter that tracks instances of words across multiple lines. I'm attempting to split each line into a list of words and then track each word in a dictionary, removing each word from the list as the dictionary is updated. So far I have:
dic = {}
count = ''
liste = line.split()
listes = liste[0]
num = 0
while line:
    while not liste:
        listes = liste[0]
        if listes in dic:
            count = str(dic[listes])
            count = count.rstrip("]")
            count = count.lstrip("[")
            count = int(count) + 1
            liste.pop(0)
        else:
            skadoing = 1
            dic [listes] = [skadoing]
    line = input("Enter line: ")
for word in sorted(dic):
    print(word, dic[word])
When run, it currently outputs the following:
Enter line: which witch
Enter line: is which
Enter line:
which ['']
I need it to output this:
Enter line: which witch
Enter line: is which
Enter line:
is 1
which 2
witch 1
liste is the list of words from the inputted line and listes is the word that I'm trying to update in the dictionary.
Any ideas?

I believe this is what you're looking to achieve:
dic = {}
line = input("Enter line: ")
while line:
    for word in line.split(" "):
        if word not in dic:
            dic[word] = 1
        else:
            dic[word] += 1
    line = input("Enter line: ")
for word in sorted(dic):
    print(word, dic[word])
Output:
Enter line: hello world
Enter line: world
Enter line:
hello 1
world 2
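One detail worth noting about the loop above: line.split(" ") splits on every single space, so doubled spaces yield empty strings that get counted as words, whereas line.split() with no argument splits on runs of whitespace. A minimal sketch of the difference, assuming the lines are passed in as a list instead of read with input() (the helper name count_words_in_lines and the sample lines are made up for illustration):

```python
def count_words_in_lines(lines):
    """Count whitespace-separated words across an iterable of lines."""
    counts = {}
    for line in lines:
        # split() with no argument collapses runs of whitespace,
        # so doubled spaces do not produce empty "words"
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

print("which  witch".split(" "))  # ['which', '', 'witch']
print("which  witch".split())     # ['which', 'witch']

counts = count_words_in_lines(["which  witch", "is which"])
for word in sorted(counts):
    print(word, counts[word])
```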

If you really want to implement this by yourself and count the words, then it would be great to use defaultdict:
from collections import defaultdict
sentence = '''this is a test for which is witch and which
because of which'''
words = sentence.split()
d = defaultdict(int)
for word in words:
    d[word] = d[word] + 1
print(dict(d))  # convert to a plain dict so it prints as shown below
Output:
{'this': 1, 'is': 2, 'a': 1, 'test': 1, 'for': 1, 'which': 3, 'witch': 1, 'and': 1, 'because': 1, 'of': 1}

You can also use Counter from the collections module to do the job:
from collections import Counter
line = input("Enter line: ")
words = line.split(" ")
word_count = dict(Counter(words))
print(word_count)
Output:
Enter line: hi how are you are you fine
{'hi': 1, 'how': 1, 'are': 2, 'you': 2, 'fine': 1}
Hope this helps!!
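The Counter answer above reads a single line. To cover the multi-line case from the question, a Counter can be updated once per line. A sketch, assuming the lines arrive as any iterable of strings (the function name count_across_lines is hypothetical):

```python
from collections import Counter

def count_across_lines(lines):
    """Accumulate word counts over several lines with one Counter."""
    totals = Counter()
    for line in lines:
        # Counter.update adds to the existing counts instead of replacing them
        totals.update(line.split())
    return totals

totals = count_across_lines(["which witch", "is which"])
for word in sorted(totals):
    print(word, totals[word])
# is 1
# which 2
# witch 1
```

In an interactive script, the list could be replaced by lines collected from input() until an empty line is entered.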

Related

How to update list value in dictionary in Python

Beginner here. I'm currently writing a program that will turn every word in a "movie reviews" text file into a key, storing a list value containing the review number and the number of times the word has been seen. For example:
4 I loved it
1 I hated it
... might look like this as a dictionary:
words['i'] = [5,2]
words['loved'] = [4,1]
words['it'] = [5,2]
words['hated'] = [1,1]
However, this is the output I've been getting:
{'i': [1, 2], 'loved': [4, 1], 'it': [1, 2], 'hated': [1, 1]}
I figured out the counter part, but I can't figure out how to update the review number. Here is my code so far:
def main():
    reviews = open("testing.txt", "r")
    data = reviews.read()
    reviews.close()
    # create new dictionary
    words = {}
    # iterate over every review in text file
    splitlines = data.split("\n")
    for line in splitlines:
        lower = line.lower()
        value = lower.split()
        rev = int(value[0])
        for word in value:
            if word.isalpha():
                count = 1
                if word not in words:
                    words[word] = [rev, count]
                else:
                    words[word] = [rev, count + 1]
How can I update the review number count?

This is pretty easy to do. Assuming each key has only 2 items in the value list:
if word not in words:
    words[word] = [rev, 1]
else:
    temp = words[word][1]
    words[word] = [rev, temp + 1]

When updating the count, you're using count + 1, but count will always be 1 here; you need to retrieve the existing count first, using something like: count = words[word][1]
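Note that the expected output in the question (words['i'] = [5,2] for reviews scored 4 and 1) suggests the first element is meant to be a running total of the review scores, not just the latest score. Assuming that reading, here is a sketch that accumulates both values. The function name tally_reviews is hypothetical, and the reviews are passed in as a list of lines instead of being read from testing.txt:

```python
def tally_reviews(lines):
    """Map each word to [sum of review scores, number of occurrences]."""
    words = {}
    for line in lines:
        parts = line.lower().split()
        if not parts:
            continue  # skip blank lines, e.g. a trailing newline in the file
        score = int(parts[0])
        for word in parts[1:]:
            if word.isalpha():
                if word not in words:
                    words[word] = [score, 1]
                else:
                    # update the existing entry in place
                    words[word][0] += score
                    words[word][1] += 1
    return words

print(tally_reviews(["4 I loved it", "1 I hated it"]))
# {'i': [5, 2], 'loved': [4, 1], 'it': [5, 2], 'hated': [1, 1]}
```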

How to write two dictionaries into an output file "txt"

This is actually a 4 part question:
1) Returns a dictionary in which each key is a word length and its value is the number of words with that length.
e.g. if the input file's text is "Hello Python people Welcome to the world of Python", then the dictionary should be:
{2: 2, 3: 1, 5: 2, 6: 3, 7: 1}
2) Returns a dictionary in which each key is a word and its value is the number of occurrences of that word.
e.g. {'hello': 1, 'of': 1, 'people': 1, 'python': 2, 'the': 1, 'to': 1, 'welcome': 1, 'world': 1}
I already completed the first two parts using the code below.
def make_length_wordcount(x):
    filename=x+'.txt'
    infile=open(filename)
    wordlist=infile.read().split()
    counter1={}
    for word in wordlist:
        if len(word) in counter1:
            counter1[len(word)]+=1
        else:
            counter1[len(word)]=1
    infile.close()
    print(counter1)
def make_word_count(string):
    words=string.lower().split()
    dictionary={}
    for word in words:
        dictionary[word]=0
    for word in words:
        dictionary[word]+=1
    print(dictionary)
I'm having trouble figuring out how to do parts 3) and 4):
3) Uses the two functions above - make_length_wordcount() and make_word_count() - to construct (i) the length-wordcount dictionary and (ii) the word count dictionary.
Opens a new output file "FILE_analyzed_FIRST_LAST.txt" and writes the two dictionaries into this file (in the format below). The output file name is
"test_analyzed_HYUN_KANG.txt" and it should contain the following lines:
Words of length 2 : 2
Words of length 3 : 1
Words of length 5 : 2
Words of length 6 : 3
Words of length 7 : 1
to : 1
of : 1
people : 1
the : 1
python : 2
welcome : 1
hello : 1
world : 1
4) In "hw2_FIRST_LAST.py" file, run the analyze_text() function three times with the following inputs:
a. "nasdaq.txt"
b. "raven.txt"
c. "frankenstein.txt"
Your hw2.py code should generate the following three files:
"nasdaq_analyzed_FIRST_LAST.txt", "raven_analyzed_FIRST_LAST.txt",
"frankenstein_analyzed_FIRST_LAST.txt"
My instructor didn't really teach us anything about writing files, so this is very confusing to me.

A few things first:
1) You can avoid the
if len(word) in counter1:
    counter1[len(word)] += 1
else:
    counter1[len(word)] = 1
by using defaultdict or Counter from the collections module:
from collections import Counter, defaultdict
counter1 = defaultdict(int)
for word in wordlist:
    counter1[len(word)] += 1
The same applies to make_word_count:
def make_word_count(string):
    words = string.lower().split()
    dictionary = Counter(words)
    print(dictionary.most_common(10))
For your third point (I didn't test this, but you get the idea):
def make_text_wordlen(counter1):
    text = ''
    for wordlen, value in counter1.items():
        text += f'Words of length {wordlen} : {value}\n'
    with open('your_file.txt', 'w') as f:
        f.write(text)
def make_text_wordcount(dictionary):
    text = ''
    for word, count in dictionary.items():
        text += f'{word} : {count}\n'
    with open('your_file.txt', 'a') as f:  # 'a' parameter to append to existing file
        f.write(text)
I'll let you figure out the 4th point.
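For the third and fourth parts, one way to put the pieces together is sketched below. It assumes the two counting helpers return their dictionaries instead of printing them and take a list of words as input. The parameters first and last are placeholders derived from the assignment text, not a confirmed signature. The length counts are sorted for a stable layout; the word counts are written in first-seen order.

```python
from collections import Counter

def make_length_wordcount(words):
    """Return {word length: number of words with that length}."""
    return Counter(len(word) for word in words)

def make_word_count(words):
    """Return {lowercased word: number of occurrences}."""
    return Counter(word.lower() for word in words)

def analyze_text(filename, first="FIRST", last="LAST"):
    """Write both dictionaries to '<name>_analyzed_<first>_<last>.txt'."""
    with open(filename) as infile:
        words = infile.read().split()
    lengths = make_length_wordcount(words)
    counts = make_word_count(words)
    # "nasdaq.txt" -> "nasdaq_analyzed_FIRST_LAST.txt"
    stem = filename[:-4] if filename.endswith(".txt") else filename
    outname = f"{stem}_analyzed_{first}_{last}.txt"
    with open(outname, "w") as outfile:
        for length in sorted(lengths):
            outfile.write(f"Words of length {length} : {lengths[length]}\n")
        for word, count in counts.items():
            outfile.write(f"{word} : {count}\n")
    return outname
```

Part 4 would then be a loop such as: for name in ("nasdaq.txt", "raven.txt", "frankenstein.txt"): analyze_text(name), assuming those files sit next to the script.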

Adding list to dictionary while updating occurrences of a string with a counter

I am having trouble updating my dictionary and matching the key-value pairs.
My program should split a string into a list of words. Then it should update a dictionary that keeps track of each unique word in the list along with its count.
For example, the output should resemble something like this:
string = "asdf asdf asdf hello hello hello world"
then my program would print
{'asdf': 3, 'hello': 3, 'world': 1}
my code looks like this:
dicto = {}
user = input("enter some text: ")
listo = []
listo = user.split()
for i in range (len(listo)):
    count = 1
    dicto = {listo[i]: count}
    if listo[i] in dicto:
        count = count + 1
print(dicto)
and the output for my string example is:
{'world': 1}

There is already a tool that does exactly that:
from collections import Counter
string = "asdf asdf asdf hello hello hello world"
c = Counter(string.split())
print(c)
This yields:
Counter({'asdf': 3, 'hello': 3, 'world': 1})
Counter has many useful methods, for example most_common(), which lists the words ordered by count (so its last entry is the least common word).
https://docs.python.org/3.7/library/collections.html#collections.Counter
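As a small illustration of the Counter methods mentioned above, here is a sketch using the same sample string. It relies on Counter.most_common, which returns (element, count) pairs from highest to lowest count:

```python
from collections import Counter

c = Counter("asdf asdf asdf hello hello hello world".split())

# most_common(n) returns the n highest counts as (word, count) pairs
print(c.most_common(2))     # [('asdf', 3), ('hello', 3)]

# with no argument it returns every pair, so the last one is the least common
print(c.most_common()[-1])  # ('world', 1)
```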

somestr = "asdf asdf asdf hello hello hello world"
words = somestr.split(" ")
unique_words = set(words)
words_counts = {}
for word in unique_words:
    # count whole words; somestr.count(word) would also match substrings
    words_counts[word] = words.count(word)
print(words_counts)
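As for why the code in the question prints only {'world': 1}: the line dicto = {listo[i]: count} builds a brand-new one-entry dictionary on every pass, so earlier words are thrown away. A sketch of the same loop that updates a single dictionary instead (the function name count_words is made up here, and the text is passed in as an argument instead of being read with input()):

```python
def count_words(text):
    """Count each whitespace-separated word in text using a plain dict."""
    dicto = {}
    for word in text.split():
        # get() returns 0 for a word not seen yet, so one line
        # handles both the first and the repeated occurrence
        dicto[word] = dicto.get(word, 0) + 1
    return dicto

print(count_words("asdf asdf asdf hello hello hello world"))
# {'asdf': 3, 'hello': 3, 'world': 1}
```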

How to count the number of letters in a word?

I'm trying to create a program where if you input a word, it will print out each letter of the word and how many times the letter appears in that word.
E.g., when I input "aaaarggh", the output should be "a 4 r 1 g 2 h 1".
def compressed (word):
    count = 0
    index = 0
    while index < len(word):
        letter = word[index]
        for letter in word:
            index = index + 1
            count = count + 1
            print(letter, count)
        break
print("Enter a word:")
word = input()
compressed(word)
So far it just prints out each letter and position in the word.
Any help appreciated, thank you!
(no using dict method)

Just type (for Python 2.7+):
import collections
dict(collections.Counter('aaaarggh'))
which gives:
{'a': 4, 'g': 2, 'h': 1, 'r': 1}

a="aaaarggh"
d={}
for char in set(a):
    d[char]=a.count(char)
print(d)
Output:
{'a': 4, 'h': 1, 'r': 1, 'g': 2}

Try this. You can use Counter, which returns a dict subclass:
from collections import Counter
print(Counter("aaaarggh"))

One way of implementing it using a dict:
def compressed(word):
    letters = dict()
    for c in word:
        letters[c] = letters.get(c, 0) + 1
    for key, value in letters.items():
        # letter first, then its count, to match the requested "a 4 r 1 ..." format
        print(f'{key} {value}', end=' ')

As others have suggested, you can do this easily with a dict!
test_input = "aaaarggh"
def compressed(word):
    letter_dict = {}
    for letter in word:  # iterate over the argument, not the global test_input
        if letter not in letter_dict:
            letter_dict[letter] = 1
        else:
            letter_dict[letter] = letter_dict[letter]+1
    return letter_dict
print(compressed(test_input))
Outputs:
{'a': 4, 'r': 1, 'g': 2, 'h': 1}

Counter is concise. But here's an alternative using defaultdict, which is a subclass of dict.
from collections import defaultdict
test_input = "aaaarggh"
d = defaultdict(int)
for letter in test_input:
    d[letter] += 1
https://docs.python.org/3.6/library/collections.html#defaultdict-examples

def counter(word):
    dic = {}
    for i in [*word]:
        n = word.count(i)  # renamed so it no longer shadows the function name
        d = {i: n}
        dic.update(d)
    return dic
counter("aaaarggh")
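Since the question asks for a solution without dict, here is a sketch that keeps only a list of letters already reported and uses str.count for the tally. It returns the string in the format from the question. The function name compressed is reused from the question; everything else is an assumption about what "no dict" allows.

```python
def compressed(word):
    """Return 'letter count' pairs in order of first appearance, without a dict."""
    seen = []    # letters already reported
    parts = []
    for letter in word:
        if letter not in seen:
            seen.append(letter)
            # str.count tallies every occurrence of this single letter
            parts.append(f"{letter} {word.count(letter)}")
    return " ".join(parts)

print(compressed("aaaarggh"))  # a 4 r 1 g 2 h 1
```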

How to stop having duplicate words in the list when using a list of positions in python

I have code that saves all the words in a sentence to one text file and the list of their positions to another.
Rather than saving every word to the list, I'm trying to find a way to save each word only once to avoid duplicates.
Additionally, when a word appears more than once, my list of positions reuses the position of its first appearance, which is fine, but it then skips a number. For example, I get [1,2,3,2,5] where the last position should be 4 rather than 5, because there is no position 4.
I don't expect anyone to do this for me, but is there a method I should be using, e.g. "if word in sentence: do x", or enumerate()?
Here is my code:
#SUBROUTINES
def saveItem():
    #save an item into a new file
    print("creating a text file with the write() method")
    textfile=open("positions.txt","w")
    textfile.write(positions)
    textfile.write("\n")
    textfile.close()
    print("The file has been added!")
#SUBROUTINES
def saveItem2():
    #save an item into a new file
    print("creating a text file with the write() method")
    textfile=open("words.txt","w")
    textfile.write(str(words))
    textfile.write("\n")
    textfile.close()
    print("The file has been added!")
#mainprogram
sentence = input("Write your sentence here ")
words = sentence.split()
positions = str([words.index(word) + 1 for word in words])
print (sentence)
print (positions)
#we have finished with the file now.
a=True
while a:
    print("what would you like to do?:\n\
1.Save a list of words?\n\
2.Save a list of positions?\n\
3.quit?\n\:")
    z=int(input())
    if z == 1:
        saveItem()
    elif z==2:
        saveItem2()
    elif z ==3:
        print("Goodbye!!!")
        a=False
    else:
        print("incorrect option")
Sample input sentence:
Programming is great Programming is so much fun
Sample list of words stored in text file:
['Programming','is','great','Programming','is','so','much','fun']
(the words are repeated)
Sample positions:
[1,2,3,1,2,6,7,8]
Instead I'd like the list to be stored like:
['Programming','is','great','so','much','fun']
and the list of positions like:
[1,2,3,1,2,4,5,6]

Haven't tested it, but I think this should work:
from collections import Counter
sentence = input(">>> ")
words, positions, d = [], [], {}
for i, word in enumerate(sentence.split(' ')):
    if word not in d:
        d[word] = i
        words.append(word)
    positions.append(d[word])
print("Words list:", words)
print("Positions list:", positions)
# To further process the list
c, new_positions = Counter(positions), []
cnt = list(i for i in range(len(positions)+1) if not(i in c and c[i]>1))
new_positions = [p if c[p]>1 else cnt.pop(0) for p in positions]
print("New Positions list:", new_positions)
# store the positions result
with open('positions.txt','w') as f:
    f.write(' '.join(map(str,new_positions)))
# store the words result
with open('words.txt','w') as w:
    w.write(' '.join(words))
Output:
$ ./test.py
>>> Programming is great Programming is so much fun
Words list: ['Programming', 'is', 'great', 'so', 'much', 'fun']
Positions list: [0, 1, 2, 0, 1, 5, 6, 7]
New Positions list: [0, 1, 2, 0, 1, 3, 4, 5]
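The renumbering step in the answer above can be avoided by numbering each word at the moment it is first seen. The sketch below does that in one pass and uses 1-based positions to match the output requested in the question. The function name dedupe_with_positions is hypothetical, and writing the two files is left out.

```python
def dedupe_with_positions(sentence):
    """Return (unique words in first-seen order, position of each word in that list)."""
    words = []        # each distinct word once
    first_seen = {}   # word -> its 1-based slot in `words`
    positions = []
    for word in sentence.split():
        if word not in first_seen:
            words.append(word)
            first_seen[word] = len(words)
        positions.append(first_seen[word])
    return words, positions

words, positions = dedupe_with_positions(
    "Programming is great Programming is so much fun"
)
print(words)      # ['Programming', 'is', 'great', 'so', 'much', 'fun']
print(positions)  # [1, 2, 3, 1, 2, 4, 5, 6]
```

Because every word gets its slot directly, inputs where all words repeat (for example "a a b b", which gives [1, 1, 2, 2]) need no second pass.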
