I'm trying to print out a list of results I have, but I want them to be aligned. They currently look like:
table
word: frequency:
i 9
my 2
to 2
test 2
it 2
hate 1
stupid 1
accounting 1
class 1
because 1
its 1
from 1
six 1
seven 1
pm 1
how 1
is 1
this 1
helping 1
becuase 1
im 1
going 1
do 1
a 1
little 1
on 1
freind 1
ii 1
I want the frequencies to be aligned with each other so they don't zigzag like this. I tried adding things to the format string, but it didn't work. This is what my code looks like:
import string
from collections import OrderedDict
f=open('mariah.txt','r')
a=f.read() # read the text file like it would normal look ie no \n or anything
# print(a)
c=a.lower() # convert everything in the textfile to lowercase
# print(c)
y=c.translate(str.maketrans('','',string.punctuation)) # get rid of any punctuation
# print(y)
words_in_novel=y.split() # splitting every word apart. the default for split is on white-space characters. Why when i split like " " for the spacing is it then giving me \n?
#print(words_in_novel)
count={}
for word in words_in_novel:
    #print(word)
    if word in count: # if the word from the word_in_novel is already in count then add a one to that counter
        count[word]+=1
    else:
        count[word]=1 # if the word is the first time in count set it to 1
print(count)
print("\n\n\n\n\n\n\n")
# this orders the dictionary items, sorting them by the second term, where t[1] refers to the term after the colon
# reverse so we are sorting from greatest to least values
g=(sorted(count.items(), key=lambda t: t[1], reverse=True))
# g=OrderedDict(sorted(count.items(), key=lambda t: t[1]))
print(g)
print("\n\n\n\n\n\n\n")
print("{:^20}".format("table"))
print("{}{:>20}".format("word:","frequency:"))
for i in g:
    # z=g[i]
    # print(i)
    # a=len(i[0])
    # print(a)
    # c=50+a
    # print(c)
    print("{}{:>20}".format(i[0],i[1]))
Does anyone know how to make them line up in a straight column?
You need to adjust the width/alignment of your 1st column, not the 2nd.
The right way:
...
print("{:<20}{}".format("word:","frequency:"))
for i in g:
    print("{:<20}{}".format(i[0],i[1]))
The output would look as:
word:               frequency:
i                   9
my                  2
...
accounting          1
class               1
because             1
...
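OK, for this part of your code, pad each word to a fixed width before appending the count (strings are immutable, so you cannot assign into a slice of one):
for i in g:
    #print("{}{:>20}".format(i[0],i[1]))
    r = i[0].ljust(22)    # pad the word with spaces to 22 characters
    r = r + str(i[1])     # the count then always starts at column 22
    print(r)
That should work.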
If you ever find that a frequency has more than one digit, you could try something like this, which right-aligns the counts:
max_len = max(len(i[0]) for i in g)
format_str = "{{:<{}}}{{:>{}}}".format(max_len, 20 - max_len)
for i in g:
    print(format_str.format(i[0], i[1]))
Align words too
print("{:<10}{:>10}".format(i[0],i[1]))
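For anyone who would rather not hard-code the column widths, here is a minimal sketch that derives them from the data. The function name `format_table` and the `gap` parameter are my own, not something from the question; it assumes `g` is a list of `(word, count)` pairs like the one produced by `sorted(count.items(), ...)` above.

```python
def format_table(pairs, gap=2):
    """Return aligned 'word / frequency' lines for (word, count) pairs."""
    pairs = list(pairs)
    # First column: as wide as the longest word (or the header), plus a gap.
    word_width = max([len("word:")] + [len(w) for w, _ in pairs]) + gap
    # Second column: as wide as the widest count (or the header).
    count_width = max([len("frequency:")] + [len(str(c)) for _, c in pairs])
    lines = [f"{'word:':<{word_width}}{'frequency:':>{count_width}}"]
    for word, count in pairs:
        # Left-align the word, right-align the count.
        lines.append(f"{word:<{word_width}}{count:>{count_width}}")
    return lines


g = [("i", 9), ("accounting", 1), ("my", 2)]
print("\n".join(format_table(g)))
```

Right-aligning the counts keeps the digits lined up even when some frequencies have more digits than others.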
After reading a text, I need to add 1 to a sum if I find a ( character, and subtract 1 if I find a ) character in the text. I can't figure out what I'm doing wrong.
This is what I tried at first:
file = open("day12015.txt")
sum = 0
up = "("
for item in file:
    if item is up:
        sum += 1
    else:
        sum -= 1
print(sum)
I have this long text like the following example (((())))((((( .... If I find a ), I need to subtract 1, if I find a (, I need to add 1. How can I solve it? I'm always getting 0 as output even if I change my file manually.
Your for loop iterates over the file line by line, so each item is a whole line rather than a single character. You have to loop through the characters of each line to get your desired output.
Example .txt
(((())))(((((
Full Code
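file = open("Data.txt")
sum = 0
up = "("
down = ")"
for string in file:         # each item is one line of the file
    for item in string:     # each item is one character of that line
        if item == up:      # compare with ==, not `is`
            sum += 1
        elif item == down:  # ignore other characters such as "\n"
            sum -= 1
print(sum)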
Output
5
Hope this helps. Happy coding :)
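To see the line-versus-character point without needing a file on disk, here is a small sketch that uses `io.StringIO` as a stand-in for the opened file. The helper name `count_parens` is made up for this illustration.

```python
import io


def count_parens(lines):
    """Return (number of '(') minus (number of ')') over an iterable of lines."""
    total = 0
    for line in lines:      # each item is a whole line, e.g. "(()\n"
        for ch in line:     # each item is a single character
            if ch == "(":
                total += 1
            elif ch == ")":
                total -= 1
            # anything else, such as the trailing "\n", is ignored
    return total


fake_file = io.StringIO("(((())))(((((\n")  # behaves like an open text file
print(count_parens(fake_file))
```

Because the function only needs something it can iterate over line by line, the same code should work when you pass it a real file object from `open(...)`.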
So you need to add 1 for each "(" character and subtract 1 for each ")".
Do it directly by specifying what should happen when you encounter each of those characters. You also need to loop over the characters of every line you read from the file. In your code, you are subtracting one for every character that is not "(", including the newline.
file = open("day12015.txt")
total = 0
for line in file:
    for character in line:
        if character == "(":
            total += 1
        elif character == ")":
            total -= 1
print(total)
That's simply a matter of counting each character in the text. The sum is the difference between those counts. Look:
from pathlib import Path
file = Path('day12015.txt')
text = file.read_text()
total = text.count('(') - text.count(')')
For the string you posted, for example, we have this:
>>> p = '(((())))((((('
>>> p.count('(') - p.count(')')
5
>>>
Just for comparison and out of curiosity, I timed the str.count() and loop approaches, 1,000 runs each, using a string composed of 1,000,000 random ( and ) characters. Here is what I found:
import random
from timeit import timeit
random.seed(0)
p = ''.join(random.choice('()') for _ in range(1_000_000))
def f():
    return p.count('(') - p.count(')')
def g():
    a, b = 0, 0
    for c in p:
        if c == '(':
            a = a + 1
        else:
            b = b + 1
    return a - b
print('f: %5.2f s' % timeit(f, number=1_000))
print('g: %5.2f s' % timeit(g, number=1_000))
f: 8.19 s
g: 49.34 s
This means the loop approach is about 6 times slower, even though the str.count() version iterates over p twice to compute the result.
As the title says, I am trying to print ALL characters with the minimum frequency:
least_occurring is holding only ONE value so I think that is the issue but can't figure it out..it could be something obvious I am missing here but I am out of brain cells :)
ex: aabbccdddeeeffff
expected output:
Least occuring character : a,b,c <=== this is what I can't figure out :(
repeated 2 time(s)
--------------------------
Character Frequency
--------------------------
a 2
b 2
c 2
d 3
e 3
f 4
results I am getting:
Least occurring character is: a
It is repeated 2 time(s)
--------------------------
Character Frequency
--------------------------
a 2
b 2
c 2
d 3
e 3
f 4
my code:
# Get string from user
string = input("Enter some text: ")
# Set frequency as empty dictionary
frequency_dict = {}
tab="\t\t\t\t\t"
for character in string:
    if character in frequency_dict:
        frequency_dict[character] += 1
    else:
        frequency_dict[character] = 1
least_occurring = min(frequency_dict, key=frequency_dict.get)
# Displaying result
print("\nLeast occuring character is: ", least_occurring)
print("Repeated %d time(s)" %(frequency_dict[least_occurring]))
# Displaying result
print("\n--------------------------")
print("Character\tFrequency")
print("--------------------------")
for character, frequency in frequency_dict.items():
    print(f"{character + tab + str(frequency)}")
The Counter class from the collections module is ideal for this. However, in this trivial case, just use a dictionary.
s = 'aabbccdddeeeffff'
counter = {}
# count the occurrences of the individual characters
for c in s:
    counter[c] = counter.get(c, 0) + 1
# find the lowest value
min_ = min(counter.values())
# create a list of all characters where the count matches the previously calculated minimum
lmin = [k for k, v in counter.items() if v == min_]
# print the results
print('Least occuring character : ', end='')
print(*lmin, sep=', ')
Output:
Least occuring character : a, b, c
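Since the answer mentions `collections.Counter`, here is a rough sketch of the same idea using it. The function name `least_common_chars` is my own choice, and returning the characters sorted is an assumption about what output is wanted.

```python
from collections import Counter


def least_common_chars(text):
    """Return (sorted list of the least frequent characters, their count)."""
    counts = Counter(text)
    if not counts:
        return [], 0
    lowest = min(counts.values())
    # Keep every character whose count ties for the minimum.
    return sorted(ch for ch, n in counts.items() if n == lowest), lowest


chars, times = least_common_chars("aabbccdddeeeffff")
print("Least occurring character(s):", ", ".join(chars))
print(f"Repeated {times} time(s)")
```

The key point is the same as in the dictionary version: find the minimum count first, then collect every key that has that count, instead of asking `min()` for a single key.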
You are very close!
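If you have the minimum count, why not just iterate over your dictionary and print every key whose value equals it? Note that least_occurring holds a character, not a count, so look up its count first:
min_count = frequency_dict[least_occurring]
for k, v in frequency_dict.items():
    if v == min_count:
        print(k)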
So I want to create a histogram.
Here is my code:
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
def print_hist(h):
    for c in h:
        print c, h[c]
It gives me this:
>>> h = histogram('parrot')
>>> print_hist(h)
a 1
p 1
r 2
t 1
o 1
But I want this:
a: 1
o: 1
p: 1
r: 2
t: 1
So how can I get my histogram in alphabetical order, make it case insensitive (so "a" and "A" are the same), and list the whole alphabet (so letters that are not in the string just get a zero)?
Use an ordered dictionary, which stores keys in the order they were put in.
from collections import OrderedDict
import string
def count(s):
    histogram = OrderedDict((c,0) for c in string.lowercase)
    for c in s:
        if c in string.letters:
            histogram[c.lower()] += 1
    return histogram
for letter, c in count('parrot').iteritems():
    print '{}:{}'.format(letter, c)
Result:
a:1
b:0
c:0
d:0
e:0
f:0
g:0
h:0
i:0
j:0
k:0
l:0
m:0
n:0
o:1
p:1
q:0
r:2
s:0
t:1
u:0
v:0
w:0
x:0
y:0
z:0
Just use collections.Counter for this, unless you really want your own:
>>> import collections
>>> c = collections.Counter('parrot')
>>> sorted(c.items(), key=lambda c: c[0])
[('a', 1), ('o', 1), ('p', 1), ('r', 2), ('t', 1)]
EDIT: As commenters pointed out, your last sentence indicates you want data on all the letters of the alphabet that do not occur in your word. Counter is good for this also since, as the docs indicate:
Counter objects have a dictionary interface except that they return a zero count for missing items instead of raising a KeyError.
So you can just iterate through something like string.ascii_lowercase:
>>> import string
>>> for letter in string.ascii_lowercase:
... print('{}: {}'.format(letter, c[letter]))
...
a: 1
b: 0
c: 0
d: 0
e: 0
f: 0
g: 0
h: 0
i: 0
j: 0
k: 0
l: 0
m: 0
n: 0
o: 1
p: 1
q: 0
r: 2
s: 0
t: 1
u: 0
v: 0
w: 0
x: 0
y: 0
z: 0
Finally, rather than implementing something complicated to merge the results of upper- and lowercase letters, just normalize your input first:
c = collections.Counter('PaRrOt'.lower())
A trivial answer would be:
import string
for letter in string.ascii_lowercase:
    print letter, ': ', h.lower().count(letter)
(highly inefficient as you go through the string 26 times)
Can also use a Counter
from collections import Counter
import string
cnt = Counter(h.lower())
for letter in string.ascii_lowercase:
    print letter, ': ', cnt[letter]
Much neater.
If you want it ordered, then you are going to have to use an OrderedDict. You are also going to need to add the letters in order when you build the dictionary. It is not entirely clear to me, but I think you want a case-insensitive result, so we need to get all letters into one case.
from collections import OrderedDict as od
import string
def histogram(s):
    # First create a dictionary that has all of the lowercase letters.
    # In Python 2, string.lowercase is locale-dependent and may contain more
    # than a-z, so only use the first 26 characters.
    d = od()
    for each_letter in string.lowercase[0:26]:
        d[each_letter] = 0
    # Then iterate through the word after it has been lowercased. Any character
    # that is not in the dictionary (a digit or a space, say) raises KeyError,
    # so catch that and ignore the character instead of blowing up. Add digits
    # or spaces as keys above if you want them counted.
    for c in s.lower():
        try:
            d[c] += 1
        except KeyError:
            pass
    return d
If you want to list the whole (Latin-only) alphabet anyway, you could use a list of length 26:
hist = [0] * 26
for c in s.lower():
    if 'a' <= c <= 'z':  # skip anything that is not a plain Latin letter
        hist[ord(c) - ord('a')] += 1
To get the desired output:
for x in range(26):
    print chr(x + ord('a')), ":", hist[x]
Try this function to get the output in sorted order:
def print_hist(h):
    for c in sorted(h):
        print c, h[c]
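Most of the snippets in this thread are Python 2. For readers on Python 3, the ideas from the answers above (lowercase the input, count, then walk the whole alphabet) can be combined roughly like this. The name `letter_histogram` is mine, and restricting the count to `string.ascii_lowercase` is an assumption about what "the whole alphabet" means here.

```python
import string
from collections import Counter


def letter_histogram(text):
    """Return [(letter, count)] for a-z, case-insensitively, in alphabetical order."""
    # Count only plain ASCII letters; digits, spaces and punctuation are skipped.
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # A Counter returns 0 for missing keys, so absent letters show up as zero.
    return [(letter, counts[letter]) for letter in string.ascii_lowercase]


for letter, n in letter_histogram("Parrot"):
    print(f"{letter}: {n}")
```

This walks the input once and the alphabet once, so it avoids scanning the string 26 times.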
I am looking for a Python program that counts the frequency of each word in a text, and outputs each word with its count and the line numbers where it appears.
We define a word as a contiguous sequence of non-white-space characters. (hint: split())
Note: different capitalizations of the same character sequence should be considered same word, e.g. Python and python, I and i.
The input will be several lines with the empty line terminating the text. Only alphabet characters and white spaces will be present in the input.
The output is formatted as follows:
Each line begins with a number indicating the frequency of the word, a white space, then the word itself, and a list of line numbers containing this word.
Sample Input
Python is a cool language but OCaml
is even cooler since it is purely functional
Sample Output
3 is 1 2
1 a 1
1 but 1
1 cool 1
1 cooler 2
1 even 2
1 functional 2
1 it 2
1 language 1
1 ocaml 1
1 purely 2
1 python 1
1 since 2
PS.
I am not a student I am learning Python on my own..
Using collections.defaultdict, collections.Counter and string formatting:
from collections import Counter, defaultdict
data = """Python is a cool language but OCaml
is even cooler since it is purely functional"""
result = defaultdict(lambda: [0, []])
for i, l in enumerate(data.splitlines()):
    for k, v in Counter(l.split()).items():
        result[k][0] += v
        result[k][1].append(i+1)
for k, v in result.items():
    print('{1} {0} {2}'.format(k, *v))
Output:
1 since [2]
3 is [1, 2]
1 a [1]
1 it [2]
1 but [1]
1 purely [2]
1 cooler [2]
1 functional [2]
1 Python [1]
1 cool [1]
1 language [1]
1 even [2]
1 OCaml [1]
If the order matters, you can sort the result this way:
items = sorted(result.items(), key=lambda t: (-t[1][0], t[0].lower()))
for k, v in items:
    print('{1} {0} {2}'.format(k, *v))
Output:
3 is [1, 2]
1 a [1]
1 but [1]
1 cool [1]
1 cooler [2]
1 even [2]
1 functional [2]
1 it [2]
1 language [1]
1 OCaml [1]
1 purely [2]
1 Python [1]
1 since [2]
Frequency tabulations are often best solved with a counter.
from collections import Counter
word_count = Counter()
with open('input', 'r') as f:
    for line in f:
        for word in line.split():  # split() with no argument drops empty tokens
            word_count[word.lower()] += 1
for word, count in word_count.iteritems():
    print "word: {}, count: {}".format(word, count)
Ok, so you've already identified split to turn your string into a list of words. You want to list the lines on which each word occurs, however, so you should split the string first into lines, then into words. Then, you can create a dictionary, where keys are the words (put to lowercase first), and the values can be a structure containing the number of occurrences and the lines of occurrence.
You may also want to put in some code to check whether something is a valid word (e.g. whether it contains numbers), and to sanitise a word (remove punctuation). I'll leave these up to you.
def wsort(item):
    # sort descending by count, then ascending alphabetically
    word, freq = item
    return -freq['count'], word
def wfreq(text):
    words = {}
    # split by line, then by word
    lines = [line.split() for line in text.split('\n')]
    for i in range(len(lines)):
        for word in lines[i]:
            # if the word is not in the dictionary, create the entry
            word = word.lower()
            if word not in words:
                words[word] = {'count':0, 'lines':set()}
            # update the count and add the line number to the set
            words[word]['count'] += 1
            words[word]['lines'].add(i+1)
    # convert from a dictionary to a sorted list using wsort to give the order
    return sorted(words.iteritems(), key=wsort)
inp = "Python is a cool language but OCaml\nis even cooler since it is purely functional"
for word, freq in wfreq(inp):
    # generate the desired list format; sort the set so line numbers come out in order
    lines = " ".join(str(l) for l in sorted(freq['lines']))
    print "%i %s %s" % (freq['count'], word, lines)
This should provide the exact same output as in your sample:
3 is 1 2
1 a 1
1 but 1
1 cool 1
1 cooler 2
1 even 2
1 functional 2
1 it 2
1 language 1
1 ocaml 1
1 purely 2
1 python 1
1 since 2
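The answer above leaves word validation and sanitising as an exercise. One possible way to fill that in, written for Python 3, is sketched below. The helper names (`clean_word`, `is_valid_word`, `word_lines`) are hypothetical, and the rules used (strip ASCII punctuation from both ends, accept only purely alphabetic tokens) are just one reasonable reading of "valid word".

```python
import string


def clean_word(raw):
    """Lowercase a token and strip leading/trailing ASCII punctuation."""
    return raw.lower().strip(string.punctuation)


def is_valid_word(word):
    """Treat a token as a word only if it is non-empty and purely alphabetic."""
    return word.isalpha()


def word_lines(text):
    """Map each valid word to (count, sorted list of line numbers)."""
    index = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for raw in line.split():
            word = clean_word(raw)
            if not is_valid_word(word):
                continue  # skip numbers, mixed tokens and bare punctuation
            entry = index.setdefault(word, [0, set()])
            entry[0] += 1
            entry[1].add(lineno)
    return {w: (n, sorted(lines)) for w, (n, lines) in index.items()}


result = word_lines("Python is cool, is it?\nPython3 is 42")
for word in sorted(result, key=lambda w: (-result[w][0], w)):
    count, lines = result[word]
    print(count, word, " ".join(str(n) for n in lines))
```

The sort key mirrors the ordering used in the answer: highest count first, then alphabetical.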
First of all, find all the words that are present in the text, using split().
If the text is in a file, first read it into a single string and call it text. The \n characters need no special handling, because split() treats them as whitespace.
filin = open('file', 'r')
di = filin.readlines()
text = ''
for i in di:
    text += i
words_list = set(text.split())
Now count the number of times each word occurs in the text. We will deal with the line numbers later.
dicts = {}
for i in words_list:
    dicts[i] = 0
for i in words_list:
    for j in range(len(text)):
        if text[j:j+len(i)] == i:
            dicts[i] += 1
Now we have a dictionary with the words as keys and, as values, the number of times each word appears in the text. Note that this is a substring match, so a short word such as is will also be counted inside this.
Now for the line numbers:
dicts2 = {}
for i in words_list:
    dicts2[i] = ()
for i in words_list:
    filin.seek(0)
    count = 1
    for j in filin:
        if i in j:
            dicts2[i] += (count,)
        count += 1
Now dicts2 has the words as keys and, as values, a tuple of the numbers of the lines each word appears in.
If the data is already in a string, you just need to split it on the \n characters:
di = string_containing_text.split('\n')
and everything else will be the same.
I am sure you can format the output yourself.
If there's a window list
text='abcdefg'
window_list=[text[i:i+3] for i in range(len(text)-3)]
print window_list
['abc', 'bcd', 'cde', 'def']
for i in window_list:
    for j,k in zip(range(len(text)),i):
        print j,k
0 a
1 b
2 c
0 b
1 c
2 d
0 c
1 d
2 e
0 d
1 e
2 f
I'm trying to make it so that when
(j==0 and k=='c') and (j==1 and k=='d') and (j==2 and k=='e')
it gives me the start and end positions where that occurs in the string text,
so it would give me
[2-4]
Have you thought to do it in this way?
>>> text='abcdefg'
>>> window_list=[text[i:i+3] for i in range(len(text)-3)]
>>> ["-".join([str(i),str(i+len(w))]) for i,w in enumerate(window_list) if w == 'cde'] #for single item
['2-5']
>>> ["-".join([str(i),str(i+len(w))]) for i,w in enumerate(window_list) if w in ['cde','def']] # for multiple items
['2-5', '3-6']
>>>
Note: enumerate the list and search for the items that match the condition. Return the index followed by the end position (which is index + length of the sub-sequence). Please note that each result is a string, and that the end position is exclusive, so you get '2-5' rather than the [2-4] you are expecting.
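If numeric positions with an inclusive end (the `[2-4]` style from the question) are what is wanted, a small variation might look like this in Python 3. The function name `find_windows` is made up for this sketch; it slides over the text directly rather than building `window_list` first, and it uses `len(text) - size + 1` so the last window is checked as well.

```python
def find_windows(text, pattern):
    """Return (start, end) index pairs, with end inclusive, for each match."""
    size = len(pattern)
    if size == 0:
        return []  # an empty pattern has no meaningful position
    return [
        (i, i + size - 1)
        for i in range(len(text) - size + 1)
        if text[i:i + size] == pattern
    ]


print(find_windows("abcdefg", "cde"))
```

Returning tuples of integers leaves the formatting, for example `"{}-{}".format(start, end)`, up to the caller.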
import re
seq = 'abcdefabcdefabcdefg'
for match in re.finditer('abc', seq):
    print match.start(), match.end()
Logic:
All you need to do is find out whether your pattern is in the text. This can be done easily using Python's built-in string find method. Once you have the start position of the pattern, add the length of the pattern (minus one) to get the end position. That's it! Job done ;)
The Code:
text = "abcdefghifjklmnopqrstuvwxyz"
pattern = "abc"
start_position = text.find(pattern)
if start_position > -1:
    # the end position is inclusive, hence the - 1
    end_position = start_position + len(pattern) - 1
    print start_position, "-", end_position
else:
    print "Pattern not found"
The Output:
0 - 2
Reference:
Check Official Python String Functions Documentation
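`str.find` only reports the first occurrence. If every occurrence is needed, one option is to call it repeatedly with a start offset, as in this Python 3 sketch. The generator name `find_all` is my own, and the inclusive end position follows the convention used in the answer above.

```python
def find_all(text, pattern):
    """Yield (start, end) for every occurrence of pattern, with end inclusive."""
    if not pattern:
        return  # nothing sensible to report for an empty pattern
    start = text.find(pattern)
    while start != -1:
        yield start, start + len(pattern) - 1
        # Resume one character later so overlapping matches are found too.
        start = text.find(pattern, start + 1)


for start, end in find_all("abcdefabcdefabcdefg", "abc"):
    print(f"{start} - {end}")
```

Stepping by one character rather than by the pattern length is a design choice: it reports overlapping matches, which may or may not be what you want.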