I want to print the number of words in a .txt file that have 1 to 20 letters.
I tried this, but it prints 20 zeroes instead. Any idea?
Edit: in the end the program should output 20 numbers, each one being the number of words in the file with 1 letter, 2 letters, and so on up to 20 letters.
fin = open('words.txt')
for i in range(20):
    counter = 0
    for line in fin:
        word = line.strip()
        if len(word) == i:
            counter = counter + 1
    print counter,
EDIT
To produce individual counts for each word length you can use a collections.Counter:
from collections import Counter

def word_lengths(f):
    for line in f:
        for word in line.split():  # does not ignore punctuation
            yield len(word)

with open('words.txt') as fin:
    counts = Counter(length for length in word_lengths(fin) if length <= 20)
This uses a generator to read the file and produce a sequence of word lengths. The filtered word lengths are fed into a Counter. You could perform the length filtering on the Counter instead.
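To filter on the Counter instead, you can count every length first and then keep only the keys you want. Below is a minimal sketch. A small in-memory list of lines stands in for words.txt so that the example is self-contained.

```python
from collections import Counter

def word_lengths(lines):
    for line in lines:
        for word in line.split():  # does not ignore punctuation
            yield len(word)

# in-memory stand-in for the contents of words.txt
sample = ["a tiny example", "with one extraordinarily-long-hyphenated-word"]

all_counts = Counter(word_lengths(sample))
# filter the Counter after counting, instead of filtering inside the generator expression
counts = Counter({length: n for length, n in all_counts.items() if length <= 20})
print(counts)
```

Filtering afterwards keeps all_counts around, which is handy if you later also want the counts for the longer words.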
If you want to ignore punctuation you could look at using str.translate() to remove unwanted characters, or possibly re.split(r'\W+', line) instead of line.split().
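Here is a minimal sketch of both options. It assumes that "punctuation" means the ASCII characters in string.punctuation. The helper names lengths_translate and lengths_regex are hypothetical.

```python
import re
import string
from collections import Counter

# translation table that deletes every ASCII punctuation character
STRIP_PUNCT = str.maketrans('', '', string.punctuation)

def lengths_translate(text):
    """Word lengths after deleting punctuation with str.translate()."""
    return [len(w) for w in text.translate(STRIP_PUNCT).split()]

def lengths_regex(text):
    r"""Word lengths using re.split(r'\W+'); empty strings at the edges are dropped."""
    return [len(w) for w in re.split(r'\W+', text) if w]

sample = "Hello, world! It's here."
print(Counter(lengths_translate(sample)))
print(Counter(lengths_regex(sample)))
```

The two approaches handle apostrophes differently. translate() turns "It's" into one word, "Its", while re.split() breaks it into "It" and "s". Pick whichever matches your definition of a word.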
Try it like this:
with open('words.txt') as fin:
    counter = 0
    for line in fin:
        for word in line.split():
            if len(word) <= 20:
                counter = counter + 1
    print(counter)
This could be simplified to:
with open('words.txt') as fin:
    counter = sum([1 for line in fin
                   for word in line.split() if len(word) <= 20])
but that's playing code golf.
You can also use a collections.Counter if it is practical to read the entire file into memory:
from collections import Counter

with open('words.txt') as fin:
    c = Counter(fin.read().split())
counter = sum(c[k] for k in c if len(k) <= 20)
And no doubt there are many other ways to do it. None of the above expects or handles punctuation.
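The edit to the question asks for 20 separate numbers. To get them, you can read the per-length counts back out for lengths 1 to 20. Below is a minimal sketch. It assumes words are separated by whitespace and that punctuation is not removed. The name length_histogram and its max_len parameter are hypothetical.

```python
from collections import Counter

def length_histogram(lines, max_len=20):
    """Return [count of 1-letter words, ..., count of max_len-letter words]."""
    counts = Counter(len(word) for line in lines for word in line.split())
    # a Counter returns 0 for missing keys, so lengths that never occur still appear
    return [counts[n] for n in range(1, max_len + 1)]

sample = ["a bb ccc", "dddd a"]
print(length_histogram(sample, max_len=5))  # [2, 1, 1, 1, 0]
```

With the real file, pass the open file object, for example print(length_histogram(fin)) inside a with open('words.txt') as fin: block. The file is read only once, which avoids the exhausted-iterator problem in the original loop.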
It should be like this. counter shouldn't be reset inside the for loop, and you can use len() to get the length of each word:
with open("test") as f:
    counter = 0
    for line in f:
        for word in line.split():
            if len(word) <= 20:
                counter += 1
    print(counter)
Or my way:
import re
with open("file") as f:
    # split on runs of whitespace; drop the empty strings re.split() can produce at the edges
    print(len([w for w in re.split(r'\s+', f.read()) if 0 < len(w) <= 20]))
Hope this helps.
Using regular expressions:
import re

REGEX = r"(\b\S{1,20}\b)"
finder = re.compile(REGEX)

with open("words.txt") as out:
    data = out.read()

matches = finder.findall(data)
lst = [0] * 20  # lst[0] counts 1-letter words, ..., lst[19] counts 20-letter words
for m in matches:
    lst[len(m) - 1] += 1  # len(m) is 1..20, so shift by one to stay in range
print(lst)
Related
I need to know which English words were used in the Italian chat and to count how many times they were used.
But in the output I also get the words that were not used in the example chat (e.g. 'baby-blue-eyes': 0).
english_words = {}
with open("dizionarioen.txt") as f:
    for line in f:
        for word in line.strip().split():
            english_words[word] = 0

with open("_chat.txt") as f:
    for line in f:
        for word in line.strip().split():
            if word in english_words:
                english_words[word] += 1

print(english_words)
You can simply iterate over your result and remove all elements that have value 0:
english_words = {}
with open("dizionarioen.txt") as f:
    for line in f:
        for word in line.strip().split():
            english_words[word] = 0

with open("_chat.txt") as f:
    for line in f:
        for word in line.strip().split():
            if word in english_words:
                english_words[word] += 1

result = {key: value for key, value in english_words.items() if value}
print(result)
Here is another solution that counts the words with a Counter:
from collections import Counter

with open("dizionarioen.txt") as f:
    all_words = set(word for line in f for word in line.split())

with open("_chat.txt") as f:
    result = Counter([word for line in f for word in line.split() if word in all_words])

print(result)
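The Counter version can be wrapped in a function that takes any iterables of lines, which makes it easy to try out without files. Below is a minimal sketch. The name count_known_words is hypothetical. Like the code above, it matches words exactly, with no lowercasing and no punctuation removal.

```python
from collections import Counter

def count_known_words(dictionary_lines, chat_lines):
    """Count how often each dictionary word appears in the chat.

    Only words that actually occur are returned, because a Counter
    built from a stream of matches never stores zero counts.
    """
    known = {word for line in dictionary_lines for word in line.split()}
    return Counter(word for line in chat_lines for word in line.split()
                   if word in known)

dictionary = ["hello baby-blue-eyes", "weekend"]
chat = ["ciao hello", "buon weekend hello"]
print(count_known_words(dictionary, chat))  # Counter({'hello': 2, 'weekend': 1})
```

Open file objects can be passed directly in place of the two lists.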
If you want to remove the words without occurrence after indexing, just delete these entries:
for w in list(english_words.keys()):
    if english_words[w] == 0:
        del english_words[w]
Then, your dictionary only contains words that occurred. Was that the question?
I am trying to create a function that opens a .txt file and counts the words whose length equals a number specified by the user.
The .txt file is:
This is a random text document. How many words have a length of one?
How many words have the length three? We have the power to figure it out!
Is a function capable of doing this?
I'm able to open and read the file, but I am unable to exclude punctuation and find the length of each word.
def samplePractice(number):
    fin = open('sample.txt', 'r')
    lstLines = fin.readlines()
    fin.close()
    count = 0
    for words in lstLines:
        words = words.split()
        for i in words:
            if len(i) == number:
                count += 1
    return count
You can call replace() on the string once for each punctuation character you want to remove, replacing it with an empty string ("").
It would look something like this:
puncstr = "Hello!"
nopuncstr = puncstr.replace(".", "").replace("?", "").replace("!", "")
I have written some sample code that removes punctuation and counts the words. Modify it according to your requirements.
import re
fin = """This is a random text document. How many words have a length of one? How many words have the length three? We have the power to figure it out! Is a function capable of doing this?"""
fin = re.sub(r'[^\w\s]','',fin)
print(len(fin.split()))
The above code prints the total number of words. Hope this helps!
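To answer the original question, the same re.sub() cleaning can be combined with a length check. Below is a minimal sketch. The sample text is inlined rather than read from sample.txt, and count_words_of_length is a hypothetical name.

```python
import re

TEXT = ("This is a random text document. How many words have a length of one? "
        "How many words have the length three? We have the power to figure it out! "
        "Is a function capable of doing this?")

def count_words_of_length(text, number):
    """Count words with exactly `number` characters once punctuation is removed."""
    cleaned = re.sub(r'[^\w\s]', '', text)  # drop everything that is not a word char or whitespace
    return sum(1 for word in cleaned.split() if len(word) == number)

print(count_words_of_length(TEXT, 4))  # 8
```

To use the file instead, read sample.txt inside a with block and pass its contents as text.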
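Instead of cascading replace() calls, just use a single strip() call per word. strip() only removes characters from the ends of a word, which is where the punctuation is here.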
Edit: a cleaner version
pl = '?!."\''  # punctuation list

def samplePractice(number):
    with open('sample.txt', 'r') as fin:
        words = fin.read().split()
    # clean words
    words = [w.strip(pl) for w in words]
    count = 0
    for word in words:
        if len(word) == number:
            print(word, end=', ')
            count += 1
    return count

result = samplePractice(4)
print('\nResult:', result)
Output:
This, text, many, have, many, have, have, this,
Result: 8
Your code is almost OK. The second for block needs to be nested inside the first one, and each word needs its punctuation stripped:
pl = '?!."\''  # punctuation list

def samplePractice(number):
    fin = open('sample.txt', 'r')
    lstLines = fin.readlines()
    fin.close()  # the parentheses are needed to actually close the file
    count = 0
    for words in lstLines:
        words = words.split()
        for i in words:
            i = i.strip(pl)  # clean the word by strip
            if len(i) == number:
                count += 1
    return count

result = samplePractice(4)
print(result)
Output:
8
This is what I have so far:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    infile = open(filename)
    lines = infile.readlines()
    words = infile.read()
    chars = infile.read()
    infile.close()
    print("line count:", len(lines))
    print("word count:", len(words.split()))
    print("character counter:", len(chars))
When executed, it prints the number of lines correctly, but 0 for both the word and character counts. Not sure why...
You can iterate through the file once and count lines, words and chars without seeking back to the beginning multiple times, which you would need to do with your approach because you exhaust the iterator when counting lines:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    lines = chars = 0
    words = []
    with open(filename) as infile:
        for line in infile:
            lines += 1
            words.extend(line.split())
            chars += len(line)
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words
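Alternatively, you can rely on the fact that the file position is at the end after the loop and use it for chars. On a text file, tell() returns an opaque position, which in practice is the byte offset. It therefore matches the character count only for single-byte encodings without newline translation: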
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    words = []
    lines = 0  # stays 0 if the file is empty
    with open(filename) as infile:
        for lines, line in enumerate(infile, 1):
            words.extend(line.split())
        chars = infile.tell()
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words
You need to go back to the beginning of the file with infile.seek(0). After a read, the position is at the end of the file, and seek(0) resets it to the start so that you can read again.
infile = open('data')
lines = infile.readlines()
infile.seek(0)
print(lines)
words = infile.read()
infile.seek(0)
chars = infile.read()
infile.close()
print("line count:", len(lines))
print("word count:", len(words.split()))
print("character counter:", len(chars))
Output:
line count: 2
word count: 19
character counter: 113
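The small sketch below shows why the original code printed 0. It uses io.StringIO in place of the real 'data' file, purely so the example is self-contained. After readlines(), the position is at the end, so read() returns an empty string until seek(0) rewinds it.

```python
import io

# in-memory stand-in for the 'data' file; StringIO supports the same read/seek calls
infile = io.StringIO("first line here\nsecond line\n")

lines = infile.readlines()
after_readlines = infile.read()   # '' because the position is already at the end
infile.seek(0)                    # rewind to the start
words = infile.read()             # now the full text is returned again

print(repr(after_readlines))           # ''
print(len(lines), len(words.split()))  # 2 5
```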
Another way of doing it:
from collections import Counter
from itertools import chain

with open('data') as infile:
    lines = infile.readlines()
cnt_lines = len(lines)
words = list(chain.from_iterable(x.split() for x in lines))
cnt_words = len(words)
# note: this counts only non-whitespace characters
cnt_chars = len([c for word in words for c in word])
# show words frequency
print(Counter(words))
You exhausted the file iterator with your call to readlines(). You can seek back to the start, but you really don't need to read the whole file into memory at all:
def stats(filename):
    chars, words, dupes = 0, 0, False
    i = 0  # line count; stays 0 if the file is empty
    seen = set()
    with open(filename) as f:
        for i, line in enumerate(f, 1):
            chars += len(line)
            spl = line.split()
            words += len(spl)
            if not dupes:
                # a duplicate either repeats an earlier word or appears twice on this line
                if not seen.isdisjoint(spl) or len(set(spl)) < len(spl):
                    dupes = True
                else:
                    seen.update(spl)
    return i, chars, words, dupes
Then assign the values by unpacking:
no_lines, no_chars, no_words, has_dupes = stats("your_file")
You may want to use chars += len(line.rstrip('\n')) if you don't want to include the line endings. This code keeps only the data it needs: the running counts, plus the set of words seen until the first duplicate is found. Using readlines(), read(), dicts of the full data, etc. means your code won't be very practical for large files.
File_Name = 'file.txt'
line_count = 0
word_count = 0
char_count = 0

with open(File_Name, 'r') as fh:
    # This will produce a list of lines.
    # Each line of the file will be an element of the list.
    data = fh.readlines()

# Count of total number of list elements == total number of lines.
line_count = len(data)
for line in data:
    word_count = word_count + len(line.split())
    char_count = char_count + len(line)

print('Line Count : ', line_count)
print('Word Count : ', word_count)
print('Char Count : ', char_count)
I need a bit of help with Python code that counts the consonants in each word and then reports how many words have each consonant count. Consider the following sample input:
"There is no new thing under the sun."
Then the required output would be:
1 : 2
2 : 3
3 : 2
4 : 1
as there are 2 words with 1 consonant, 3 words with 2 consonants, 2 words with 3 consonants and 1 word with 4 consonants.
The following code does a similar job, but instead of consonants it counts the frequency of word lengths in text files. I think only a small change is needed, one that loops deeper into each word.
import re

def freqCounter(file1, file2):
    freq_dict = {}
    # get rid of punctuation
    punctuation = re.compile(r'[.?!,"\':;]')  # re.compile() builds a reusable pattern object
    try:
        with open(file1, "r") as infile, open(file2, "r") as infile2:  # open two files at once
            text1 = infile.read()  # read the files
            text2 = infile2.read()
        joined = " ".join((text1, text2))
        for word in joined.lower().split():
            # remove punctuation marks
            word = punctuation.sub("", word)
            l = len(word)  # assign l to be the word's length
            # if this word length is not yet in the dict, start its count at 0
            if l not in freq_dict:
                freq_dict[l] = 0
            freq_dict[l] += 1  # then increase the count for this length by 1
    except IOError as e:  # exception catch for error while reading the file
        print('Operation failed: %s' % e.strerror)
    return freq_dict  # return the dictionary
Any help will be much appreciated!
I would try a simpler approach:
from collections import Counter

words = 'There is no new thing under the sun.'
# drop punctuation first, then the vowels (you are welcome to replace this with a smart regex)
words = ''.join(ch for ch in words if ch.isalpha() or ch.isspace())
for vowel in 'aeiouAEIOU':
    words = words.replace(vowel, '')
# Now words have no more vowels i.e. only consonants
word_lengths = map(len, words.split(' '))
freq_dict = dict(Counter(word_lengths))
A simple solution:
def freqCounter(_str):
    _txt = _str.split()
    freq_dict = {}
    for word in _txt:
        c = 0
        for letter in word:
            # count letters that are not vowels; lower() so capital vowels are not counted
            if letter.isalpha() and letter.lower() not in "aeiou":
                c += 1
        freq_dict[c] = freq_dict.get(c, 0) + 1
    return freq_dict

txt = "There is no new thing under the sun."
table = freqCounter(txt)
for k in table:
    print(k, ":", table[k])
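The "smart regex" idea mentioned in an earlier answer can be sketched as follows. The sketch assumes English letters only and treats y as a consonant, as the next answer does. A case-insensitive character class matches consonants, so punctuation and capital vowels need no special handling. The name consonant_histogram is hypothetical.

```python
import re
from collections import Counter

# b-d, f-h, j-n, p-t, v-z: every letter except a, e, i, o, u
CONSONANTS = re.compile(r'[b-df-hj-np-tv-z]', re.IGNORECASE)

def consonant_histogram(text):
    """Map number of consonants -> number of words with that many consonants."""
    return Counter(len(CONSONANTS.findall(word)) for word in text.split())

table = consonant_histogram("There is no new thing under the sun.")
for k in sorted(table):
    print(k, ":", table[k])
```

For the sample sentence this prints the expected 1 : 2, 2 : 3, 3 : 2, 4 : 1. Sorting the keys keeps the output in consonant-count order.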
How about this?
with open('conts.txt', 'w') as fh:
    fh.write('oh my god becky look at her butt it is soooo big')

consonants = "bcdfghjklmnpqrstvwxyz"

def count_cons(_file):
    results = {}
    with open(_file, 'r') as fh:
        for line in fh:
            for word in line.split(' '):
                conts = sum([1 if letter in consonants else 0 for letter in word])
                if conts in results:
                    results[conts] += 1
                else:
                    results[conts] = 1
    return results

print(count_cons('conts.txt'))
Forgot to post the results:
{1: 5, 2: 5, 3: 1, 4: 1}
import random
dictionary = open('word_list.txt', 'r')
for line in dictionary:
    for i in range(0, len(line)):
        if i >= 5:
            word = random.choice(line)
dictionary.close()
This code doesn't seem to work for me.
Here is a link to the file, if it helps:
http://vlm1.uta.edu/~athitsos/courses/cse1310_summer2013/assignments/assignment8/word_list.txt
import random
with open('word_list.txt', 'r') as f:
    words = [word.rstrip() for word in f if len(word) > 5]
print(random.choice(words))
As @ashwini-chaudhary correctly pointed out, word at each step of the iteration has a newline (\n) at the end. That's why you need to use rstrip().
Assuming each word is on its own line, such as:
word
word2
word3
...
Then you can do this:
from random import choice
with open("word_list.txt") as file:
    print(choice([line.rstrip() for line in file if len(line) > 5]))
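Both answers test len() on the raw line, which still contains the newline. As a result, len(line) > 5 keeps words of 5 or more letters, provided the line ends in a newline. Below is a minimal sketch that strips the line first and makes the threshold explicit. The function random_long_word and its min_len=5 default are assumptions, so adjust the threshold to what you actually need.

```python
import random

def random_long_word(lines, min_len=5, rng=random):
    """Pick a random word with at least `min_len` letters from an iterable of lines."""
    # strip the newline (and any other surrounding whitespace) before measuring,
    # so len() counts only the characters of the word itself
    candidates = [w for w in (line.strip() for line in lines) if len(w) >= min_len]
    if not candidates:
        raise ValueError("no word is long enough")
    return rng.choice(candidates)

sample = ["cat\n", "elephant\n", "giraffe\n", "dog\n"]
print(random_long_word(sample, rng=random.Random(0)))
```

With the real file, pass the open file object, for example random_long_word(f) inside a with open('word_list.txt') as f: block. Passing a seeded random.Random makes the choice reproducible, which helps when testing.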