Python: get the most common words with more than 3 characters

Hi, I'm quite new to Python.
I'm trying to figure out how to get the most common words in the clean.txt file, but only words longer than 3 characters.
`
import re
from collections import Counter
words = re.findall(r'\w+', open('clean.txt', 'r', encoding='utf-8').read().lower())
count = Counter(words).most_common(100)
# define a sort key
def sort_key(count):
    return count[1]
def read_data():
    f = open('clean.txt', 'r', encoding='utf-8')
    s = f.read()
    x = s.split()
    for i in x:
        if len(i) > 5:
            print(i)
count.sort(key=sort_key, reverse=True)
print(count)
`
I tried calling read_data(), but it just lists all the words without showing how many times each one occurs.
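One possible approach is to filter the words by length before handing them to Counter, so that the counts and the length filter are applied to the same list. This is a minimal sketch, assuming that a "word" is any run of `\w` characters and that "greater than 3" means at least 4 characters. The function name `most_common_long_words`, its parameters, and the sample text are illustrative and not part of the question.

```python
import re
from collections import Counter


def most_common_long_words(text, min_length=4, top_n=100):
    """Return (word, count) pairs for words with at least min_length characters."""
    # Lower-case first so that "The" and "the" are counted together.
    words = re.findall(r"\w+", text.lower())
    # Drop short words before counting, so they never reach the Counter.
    long_words = (w for w in words if len(w) >= min_length)
    # most_common() already returns the pairs sorted by descending count.
    return Counter(long_words).most_common(top_n)


sample = "The quick brown fox jumps over the lazy dog. The quick dog naps."
result = most_common_long_words(sample, top_n=3)
print(result)
```

To run it on the file from the question, the text could be read with `open('clean.txt', encoding='utf-8').read()` and passed in as `text`.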

Related

How to add digits in output

I have a file that I'm reading with Python. From this file I select certain numbers, which are displayed as a list in the output, and I want to add these numbers up. Here is the code I'm using:
with open ("C:/xampp/htdocs/Final/uploads/file.dist", 'r') as rf:
    g = [rf.replace(' ', '') for rf in rf]
k=[]
for e in g[1::47]:
    r=(e[:12])
    s=(r[:2])
    i.append(s)
m= Counter(i)
for letter in m:
    t= m[letter]
    print(t)
This gives me output as follows:
80
80
80
80
I want to add these numbers so that the final output is 320 (80+80+80+80). I've tried building a list and importing the math library, but neither gives me the required output. Any help will be highly appreciated.
Use += instead of = to add the values of m[letter] to t:
from collections import Counter
with open("C:/path/file.dist", 'r') as rf:
    # Remove spaces from every line of the file
    g = [line.replace(' ', '') for line in rf]
i = []
for e in g[1::47]:
    r = e[:12]
    s = r[:2]
    i.append(s)
m = Counter(i)
t = 0
for letter in m:
    # Accumulate the counts instead of overwriting t
    t += m[letter]
print(t)
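As a shorter alternative to the explicit loop, the counts stored in a Counter can be summed directly with the built-in `sum()`. This is only a sketch: the `prefixes` list is made-up sample data standing in for the two-character values read from file.dist.

```python
from collections import Counter

# Hypothetical stand-in for the list `i` built from the file.
prefixes = ["ab", "cd", "ab", "ef", "cd", "ab"]

m = Counter(prefixes)
# Counter.values() yields the per-key counts; sum() adds them up.
total = sum(m.values())
print(total)
```

Because every item contributes exactly one to some count, the total equals the length of the original list.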

Find frequency of words line by line in txt file Python (how to format properly)

I'm trying to make a simple program that can find the frequency of occurrences in a text file line by line. I have it outputting everything correctly except for when more than one word is on a line in the text file. (More information below)
The text file looks like this:
Hello
Hi
Hello
Good Day
Hi
Good Day
Good Night
I want the output to be: (Doesn't have to be in the same order)
Hello: 2
Hi: 2
Good Day: 2
Good Night: 1
What it's currently outputting:
Day: 2
Good: 3
Hello: 2
Hi: 2
Night: 1
My code:
file = open("test.txt", "r")
text = file.read() # reads file (I've tried .readline() & .readlines())
word_list = text.split(None)
word_freq = {} # Declares empty dictionary
for word in word_list:
    word_freq[word] = word_freq.get(word, 0) + 1
keys = sorted(word_freq.keys())
for word in keys:
    final=word.capitalize()
    print(final + ': ' + str(word_freq[word])) # Line that prints the output
You want to preserve the lines. Don't split. Don't capitalize. Don't sort.
Use a Counter:
from collections import Counter
c = Counter()
with open('test.txt') as f:
    for line in f:
        c[line.rstrip()] += 1
for k, v in c.items():
    print('{}: {}'.format(k, v))
Instead of splitting the text by None, split it by each line break so you get each line into a list.
file = open("test.txt", "r")
text = file.read() # reads file (I've tried .readline() & .readlines())
word_list = text.split('\n')
word_freq = {} # Declares empty dictionary
for word in word_list:
    word_freq[word] = word_freq.get(word, 0) + 1
keys = sorted(word_freq.keys())
for word in keys:
    final=word.capitalize()
    print(final + ': ' + str(word_freq[word])) # Line that prints the output
You can make this much easier for yourself by using a Counter object. If you want to count the occurrences of full lines, you can simply do:
from collections import Counter
with open('file.txt') as f:
    c = Counter(f)
print(c)
Edit
Since you asked for a way without modules:
counter_dict = {}
with open('file.txt') as f:
    l = f.readlines()
for line in l:
    if line not in counter_dict:
        counter_dict[line] = 0
    counter_dict[line] += 1
print(counter_dict)
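Note that iterating over a file (or calling readlines()) yields each line with its trailing newline, so the keys in both snippets above look like 'Hello\n'. A minimal sketch of stripping the line terminator before counting follows. The `lines` list is a hypothetical in-memory stand-in for the file contents from the question.

```python
from collections import Counter

# Stand-in for the lines of test.txt, each with its newline still attached.
lines = ["Hello\n", "Hi\n", "Hello\n", "Good Day\n", "Hi\n", "Good Day\n", "Good Night\n"]

# Remove the newline and skip blank lines before counting.
counts = Counter(line.rstrip("\n") for line in lines if line.strip())

for phrase, n in counts.items():
    print(f"{phrase}: {n}")
```

With a real file, `lines` could be replaced by the open file object, since it iterates line by line in the same way.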
Thank you all for the answers, most of the code produces the desired output just in different ways. The code I ended up using with no modules was this:
file = open("test.txt", "r")
text = file.read() # reads file (I've tried .readline() & .readlines())
word_list = text.split('\n')
word_freq = {} # Declares empty dictionary
for word in word_list:
    word_freq[word] = word_freq.get(word, 0) + 1
keys = sorted(word_freq.keys())
for word in keys:
    final=word.capitalize()
    print(final + ': ' + str(word_freq[word])) # Line that prints the output
The code I ended up using with modules was this:
from collections import Counter
c = Counter()
with open('live.txt') as f:
    for line in f:
        c[line.rstrip()] += 1
for k, v in c.items():
    print('{}: {}'.format(k, v))

Python: How to increment the count when a variable repeats

I have a txt file which has following entries:
Rx = 34 // Counter gets incremented = 1, since the Rx was found for the first time
Rx = 2
Rx = 10
Tx = 2
Tx = 1
Rx = 3 // Counter gets incremented = 2, since the Rx was found for the first time after Tx
Rx = 41
Rx = 3
Rx = 19
I want to increment the count only the first time 'Rx' appears in a run of consecutive Rx lines, not for every Rx in the text file. My code is as follows:
import re
f = open("test.txt","r")
count = 0
for lines in f:
    m = re.search(r"Rx = \d{1,2}", lines)
    if m:
        count += 1
print(count)
But this gives me the count of all the Rx lines in the text file. I want the output to be 2, not 7.
Please help me out!
import re
f = open("test.txt","r")
count = 0
for lines in f:
    m = re.search(r"Rx = \d{1,2}", lines)
    if m:
        count += 1
        if count >= 2:
            break
print(m.group(0))
Break out of the loop, since you only need to find the repeats.
import re
f = open("test.txt","r")
count = 0
for lines in f:
    m = re.search(r"Rx = \d{1,2}", lines)
    if m:
        count += 1
        if count >= 2:
            break
print(count)
With if m:, count keeps incrementing for every line where the pattern matches (re.search returns None otherwise). If you'd like to stop at the first 2, you need to introduce some additional logic, such as the break above.
If you want the total count for the Rx lines that appear more than once:
rx_count = {}
with open("test.txt", "r") as f:
    for lines in f:
        # Count each distinct Rx line, ignoring surrounding whitespace
        if lines.startswith('Rx'):
            key = lines.strip()
            rx_count[key] = rx_count.get(key, 0) + 1
Now you have a counter dictionary in rx_count. Keep only the entries whose count is greater than 1, sum those counts, and print the total:
rx_count = {k: v for k, v in rx_count.items() if v > 1}
count = sum(rx_count.values())
print(count)
To do exactly what you want, you're going to need to keep track of which strings you've already seen.
You can do this by using a set to record the strings you have seen until there is a duplicate, and then counting only occurrences of that string.
This example would do that:
import re
count = 0
matches = set()
with open("test.txt", "r") as f:
    for line in f:
        m = re.search(r"Rx = \d{1,2}", line)
        if not m:
            # Skip the rest if no match
            continue
        if m.group(0) not in matches:
            matches.add(m.group(0))
        else:
            # First string we saw twice
            first = m.group(0)
            count = 2
            break
    # Continue reading from where the first loop stopped
    for line in f:
        m = re.search(r"Rx = \d{1,2}", line)
        ## This or whatever check you want to do
        if m and m.group(0) == first:
            count += 1
print(count)
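Going by the annotations in the sample data, the counter should go up once at the first Rx and once more at the first Rx after the Tx lines. One way to read that is "count the runs of consecutive Rx lines". The sketch below implements that reading only. It assumes that any line not starting with Rx (including a blank line) ends the current run, and the function name `count_rx_runs` is illustrative.

```python
def count_rx_runs(lines):
    """Count the runs of consecutive lines that start with 'Rx'."""
    runs = 0
    in_run = False  # True while the previous line was an Rx line
    for line in lines:
        is_rx = line.lstrip().startswith("Rx")
        # A new run starts when an Rx line follows a non-Rx line (or the start).
        if is_rx and not in_run:
            runs += 1
        in_run = is_rx
    return runs


sample = [
    "Rx = 34", "Rx = 2", "Rx = 10",
    "Tx = 2", "Tx = 1",
    "Rx = 3", "Rx = 41", "Rx = 3", "Rx = 19",
]
print(count_rx_runs(sample))  # 2
```

An open file object can be passed in place of `sample`, since the function only iterates over its argument line by line.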

Function won't work when using a list created from a file

I am trying to create a list of words read from a file and then delete all words that contain duplicate letters. I was able to do it successfully with a list of words that I entered by hand; however, when I use the same code on the list created from a file, the result still includes words with duplicate letters.
This works:
words = ['word','worrd','worrrrd','wordd']
alpha = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
x = 0
while x in range(0, len(alpha)):
    i = 0
    while i in range(0, len(words)):
        if words[i].count(alpha[x]) > 1:
            del(words[i])
            i = i - 1
        else:
            i = i + 1
    x = x + 1
print(words)
This is how I'm trying to do it when reading the file:
words = []
length = 5
file = open('dictionary.txt')
for word in file:
    if len(word) == length+1:
        words.append(word.splitlines())
alpha = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
x = 0
while x in range(0, len(alpha)):
    i = 0
    while i in range(0, len(words)):
        if words[i].count(alpha[x]) > 1:
            del(words[i])
            i = i - 1
        else:
            i = i + 1
    x = x + 1
print(words)
Try something like this. First, the string module is not deprecated, though it is rarely needed these days. Luckily for you, it defines some useful constants, such as string.ascii_lowercase, that save you from typing all those quotes and commas.
Next, use the with open('filespec') as ... context for reading files: that's what it was put there for!
Finally, be aware of how iteration works for text files: for line in file: reads lines, including any trailing newlines. Strip those off. If you don't have one-word-per-line, you'll have to split the lines after you read them.
# Read words (possibly >1 per line) from dictionary.txt into Lexicon.
# Convert the words to lower case.
import string
Lexicon = []
with open('dictionary.txt') as file:
    for line in file:
        words = line.strip().lower().split()
        Lexicon.extend(words)
# Deleting from a list while indexing over it skips items and can raise
# IndexError, so rebuild the list for each letter instead.
for ch in string.ascii_lowercase:
    Lexicon = [word for word in Lexicon if word.count(ch) <= 1]
print('\n'.join(Lexicon))
How about this:
# This more comprehensive sample allows me to reproduce the file-reading
# problem in the script itself (before I changed the code "tee" would
# print, for instance)
words = ['green','word','glass','worrd','door','tee','wordd']
outlist = []
for word in words:
    chars = [c for c in word]
    # a `set` only contains unique characters, so if it is shorter than the
    # `word` itself, we found a word with duplicate characters, so we skip
    # it and keep looping
    if len(set(chars)) < len(chars):
        continue
    else:
        outlist.append(word)
print(outlist)
Result:
['word']
import string
words = ['word','worrd','worrrrd','wordd','5word']
new_words = [x for x in words if len(x) == len(set(x)) if all(i in string.ascii_letters for i in x)]
print(new_words)
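A likely reason the file-based version in the question keeps words with repeated letters is that word.splitlines() returns a list such as ['hello'], not a string. The words list then holds one-element lists, and list.count('l') compares whole elements instead of counting characters. The sketch below illustrates the difference and one way around it. The `raw_lines` data and the helper name `has_unique_letters` are hypothetical stand-ins for dictionary.txt and are not from the question.

```python
line = "hello\n"

as_list = line.splitlines()      # ['hello'] : a list with one string in it
print(as_list.count("l"))        # counts elements equal to "l"

as_str = line.strip()            # 'hello' : the word itself
print(as_str.count("l"))         # counts the character "l" inside the word


def has_unique_letters(word):
    """True when no character appears more than once in word."""
    return len(set(word)) == len(word)


# Stand-in for the lines of dictionary.txt, newline included.
raw_lines = ["world\n", "hello\n", "tests\n", "crane\n"]
length = 5
words = [w.strip() for w in raw_lines if len(w.strip()) == length]
unique_words = [w for w in words if has_unique_letters(w)]
print(unique_words)
```

Stripping the line first also means the length check no longer needs the `length+1` adjustment for the newline.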

Counting number of occurrence of a string in a text file

I have a text file containing:
Rabbit:Grass
Eagle:Rabbit
Grasshopper:Grass
Rabbit:Grasshopper
Snake:Rabbit
Eagle:Snake
I want to count the number of occurrences of each string, that is, the number of times each animal occurs in the text file, and print the counts. Here's my code:
fileName = input("Enter the name of file:")
foodChain = open(fileName)
table = []
for line in foodChain:
    contents = line.strip().split(':')
    table.append(contents)
def countOccurence(l):
    count = 0
    for i in l:
        #I'm stuck here#
        count +=1
    return count
I'm unsure how to make Python count the occurrences in a text file. The output I want is:
Rabbit: 4
Eagle: 2
Grasshopper: 2
Snake: 2
Grass: 2
I just need some help on the counting part and I will be able to manage the rest of it. Regards.
What you need is a dictionary.
dictionary = {}
for line in table:
    for animal in line:
        if animal in dictionary:
            dictionary[animal] += 1
        else:
            dictionary[animal] = 1
for animal, occurences in dictionary.items():
    print(animal, ':', occurences)
The solution using str.split(), re.sub() functions and collections.Counter subclass:
import re, collections
with open(filename, 'r') as fh:
    # setting space as a common delimiter
    contents = re.sub(r':|\n', ' ', fh.read()).split()
counts = collections.Counter(contents)
# iterating through `animal` counts
for a in counts:
    print(a, ':', counts[a])
The output:
Snake : 2
Rabbit : 4
Grass : 2
Eagle : 2
Grasshopper : 2
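Use in to test whether a value is an element of a list (on a string, in tests for a substring instead). Since table is a list of rows, check each row in turn:
def countOccurence(l):
    count = 0
    for row in table:
        # Count the rows in which the name l appears
        if l in row:
            count += 1
    return count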
from collections import defaultdict
dd = defaultdict(int)
with open(fpath) as f:
    for line in f:
        # Strip the newline so 'Grass\n' and 'Grass' count as the same word
        words = line.strip().split(':')
        for word in words:
            dd[word] += 1
for k,v in dd.items():
    print(k+': '+str(v))
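The answers above can also be condensed by flattening the table and passing it straight to a Counter. This is a sketch only: the `lines` list is an in-memory copy of the sample file from the question, and the use of itertools.chain is a choice made here, not something any of the answers prescribes.

```python
from collections import Counter
from itertools import chain

# In-memory copy of the sample file contents.
lines = [
    "Rabbit:Grass", "Eagle:Rabbit", "Grasshopper:Grass",
    "Rabbit:Grasshopper", "Snake:Rabbit", "Eagle:Snake",
]

# Same shape as `table` in the question: one [left, right] pair per line.
table = [line.strip().split(":") for line in lines]

# chain.from_iterable flattens the pairs into one stream of names.
counts = Counter(chain.from_iterable(table))

for name, n in counts.most_common():
    print(f"{name}: {n}")
```

Because whole names are counted, Grass and Grasshopper stay separate, which a substring search over the raw text would not guarantee.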
