I wrote a function that reads a text file and returns its contents as a list of characters, excluding several characters such as ('\n',' ','!','.','#').
I tested it with a sample text file containing: I love Computer Science sooooooooooooooo much!!!!!
I expect my output to look like this:
['I', 'l', 'o', 'v', 'e', 'C', 'o', 'm', 'p', 'u', 't', 'e', 'r', 'S', 'c', 'i', 'e', 'n', 'c', 'e', 's', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'm', 'u', 'c', 'h']
but my output returns
['I', 'l', 'o', 'v', 'e', 'C', 'o', 'm', 'p', 'u', 't', 'e', 'r', 'S', 'c', 'i', 'e', 'n', 'c', 'e', 's', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'm', 'u', 'c', 'h', '!', '!']
even though my code is supposed to remove every '!', including the two left at the end.
What should I change in my code?
Here is my code:
def reverse(filename):
    s = open(filename, 'r')
    content = s.read()
    g = list(content)
    for x in g:
        if x in ('\n',' ','!','.','#','#'):
            g.remove(x)
    return g
You can't modify a list while you are iterating through it. Well, you can, but it gets the iterator pointers screwed up. The right answer is to create a new list with the things you want to keep. And you don't have to convert the file to a list in order to iterate its contents. Strings are iterables, just like lists.
def reverse(filename):
    s = open(filename, 'r')
    g = []
    for c in s.read():
        if c not in '\n !.##':
            g.append(c)
    return g
Or:
def reverse(filename):
    s = open(filename, 'r')
    return [c for c in s.read() if c not in '\n !.##']
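To see why the original loop leaves two '!' behind, here is a small sketch. The sample string and the names chars and visited are made up for illustration. Each remove shifts the remaining items one position to the left, so the iterator's next step jumps over whatever slid into the current slot.

```python
# Illustrative only: removing from a list while iterating over it skips elements.
chars = list("much!!!!!")
visited = []
for x in chars:
    visited.append(x)      # record what the loop actually looks at
    if x == '!':
        chars.remove(x)    # shifts every later item left by one

print(visited)  # only three of the five '!' are ever visited
print(chars)    # the '!' that were never visited are still in the list
```

Building a new list, as in the answer above, avoids the problem because the sequence being iterated is never changed.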
You can't safely iterate over a list while removing items from that same list, because the indexes shift after each removal. One workaround is to iterate over the list in reverse order.
The solution is:
def reverse(filename):
    s = open(filename, 'r')
    content = s.read()
    g = list(content)
    for x in reversed(g):
        if x in ('\n',' ','!','.','#','#'):
            g.remove(x)
    return g
OUTPUT :
['I','l','o','v','e','C','o','m','p','u','t','e','r','S','c','i','e','n','c','e','s','o','o','o','o','o','o','o','o','o','o','o','o','o','o','o','m', 'u','c','h']
As Tim's answer says, the principled way to handle this is to create a new list with the things you want to keep. That said, you might find it interesting that it is technically possible to remove items from the list without messing up the iterator pointer if you traverse the list backwards.
So with that said, here's a modification of your code that doesn't generate a separate list.
def reverse(filename):
    s = open(filename, 'r')
    content = s.read()
    g = list(content)
    for k in range(len(g))[::-1]:
        if g[k] in ('\n',' ','!','.','#','#'):
            g.pop(k)
    return g
Note that, unlike "remove", "pop" is given the index directly, so it does not have to search the list for a matching value.
Alternatively, if you do want to use "remove", you can let it do the searching for you: loop over the unwanted characters and keep removing each one until none are left. Consider the following:
def reverse(filename):
    s = open(filename, 'r')
    content = s.read()
    g = list(content)
    for c in ('\n',' ','!','.','#','#'):
        while c in g:
            g.remove(c)
    return g
Notably, the c in g check for the while condition consists of an extra search. We can avoid this by handling the exception of the remove function in the case that the character isn't found.
def reverse(filename):
    s = open(filename, 'r')
    content = s.read()
    g = list(content)
    for c in ('\n',' ','!','.','#','#'):
        while True:
            try:
                g.remove(c)
            except ValueError:
                break
    return g
Might be more efficient to remove with str.replace:
def reverse(filename):
    with open(filename) as f:
        s = f.read()
    for c in '\n !.##':
        s = s.replace(c, '')
    return list(s)
That goes over the string a few times, but such string methods are very fast.
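If the repeated passes are a concern, a single-pass variant is possible with str.translate. This is only a sketch: the helper name strip_chars and its unwanted parameter are hypothetical, and it takes the text directly rather than a filename so it can be tried without a file.

```python
def strip_chars(text, unwanted='\n !.#'):
    # With three arguments, str.maketrans maps every character in the
    # third argument to None, which translate() treats as "delete".
    table = str.maketrans('', '', unwanted)
    return list(text.translate(table))

print(strip_chars("so much!!!\n"))
```

The table can be built once and reused for many strings.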
Related
I'm trying to replace the characters of the reversed alphabet with those of the alphabet. This is what I've got:
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
rev_alphabet = alphabet[::-1]
sample = "wrw blf hvv ozhg mrtsg'h vkrhlwv?"
def f(alph, rev_alph):
    return (alph, rev_alph)
char_list_of_tups = list(map(f, alphabet, rev_alphabet))
for alph, rev_alph in char_list_of_tups:
    sample = sample.replace(rev_alph, alph)
print(sample)
expected output: did you see last night's episode?
actual output: wrw you svv ozst nrtst's vprsowv?
I understand that I'm printing the last "replacement" of the whole iteration. How can I avoid this without appending it to a list and then running into problems with the spacing of the words?
Your problem here is that you lose data as you perform each replacement; for a simple example, consider an input of "az". On the first replacement pass, you replace 'z' with 'a', and now have "aa". When you get to replacing 'a' with 'z', it becomes "zz", because you can't tell the difference between an already replaced character and one that's still unchanged.
For single character replacements, you want to use the str.translate method (and the not strictly required, but useful helper function, str.maketrans), to do character by character transliteration across the string in a single pass.
from string import ascii_lowercase # No need to define the alphabet; Python provides it
# You can use the original str form, no list needed
# Do this once up front, and reuse it for as many translate calls as you like
trans_map = str.maketrans(ascii_lowercase[::-1], ascii_lowercase)
sample = sample.translate(trans_map)
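The difference between the two approaches can be checked on the "az" example. The sketch below is for illustration only; sequential_replace is a hypothetical name that mirrors the loop from the question.

```python
from string import ascii_lowercase

# Map each letter to its mirror image in the alphabet (a<->z, b<->y, ...)
trans_map = str.maketrans(ascii_lowercase[::-1], ascii_lowercase)

def sequential_replace(text):
    # The question's approach: 26 whole-string replacements, one after another.
    # Later replacements also hit characters produced by earlier ones.
    for plain, mirrored in zip(ascii_lowercase, ascii_lowercase[::-1]):
        text = text.replace(mirrored, plain)
    return text

print(sequential_replace("az"))      # both letters end up the same
print("az".translate(trans_map))     # each letter is mapped exactly once

sample = "wrw blf hvv ozhg mrtsg'h vkrhlwv?"
print(sample.translate(trans_map))
```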
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
# or
alphabet = [chr(97 + i) for i in range(0,26)]
sample = "wrw blf hvv ozhg mrtsg'h vkrhlwv?"
res = []
for ch in sample:
    if ch in alphabet:
        res.append(alphabet[-1 - alphabet.index(ch)])
    else:
        res.append(ch)
print("".join(res))
Another way, if you are OK with creating a new string instead:
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
dictRev = dict(zip(alphabet, alphabet[::-1]))
sample = "wrw blf hvv ozhg mrtsg'h vkrhlwv?"
s1="".join([dictRev.get(char, char) for char in sample])
print(s1)
"did you see last night's episode?"
I've been experimenting with itertools, combinations and enchant to find all possible (English) words from a list of characters, up to a set (x) amount of words, with no character limit for each word. Can't seem to find/create what I am looking for. Not looking for a handout or freebie, just genuinely stuck on a DnD cipher my friend passed along to me.
Basically, if I have:
char_list = ['i', 't', 'c', 'r', 'r', 's', 'f', 'o', 'k', 'p', 'a', 'e', 'u', 'a']
I'm trying to print:
possible_combos = [["xxx", "xxx", "xxx", "xxx"], ...]
Please don't laugh, but this is what I've been working with. I know it's not right, but I'm having a really hard time understanding exactly what I'm missing.
import itertools
lst = ['i', 't', 'c', 'r', 'r', 's', 'f', 'o', 'k', 'p', 'a', 'e', 'u', 'a']
combinatorics = itertools.product([True, False], repeat=len(lst) - 1)
solution = []
for combination in combinatorics:
    i = 0
    one_such_combination = [lst[i]]
    for slab in combination:
        i += 1
        if not slab: # there is a join
            one_such_combination[-1] += lst[i]
        else:
            one_such_combination += [lst[i]]
    solution.append(one_such_combination)
print(solution)
I have a long list of words that I'm trying to go through, removing any word that contains a specific character. However, the solution I thought would work doesn't remove any words:
l3 = ['b', 'd', 'e', 'f', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']
firstcheck = ['poach', 'omnificent', 'aminoxylol', 'teetotaller', 'kyathos', 'toxaemic', 'herohead', 'desole', 'nincompoophood', 'dinamode']
validwords = []
for i in l3:
    for x in firstchect:
        if i not in x:
            validwords.append(x)
            continue
        else:
            break
If a word from firstcheck has a character from l3 I want it removed or not added to this other list. I tried it both ways. Can anyone offer insight on what could be going wrong? I'm pretty sure I could use some list comprehension but I'm not very good at that.
The accepted answer makes use of np.sum, which means importing a huge numerical library to perform a simple task that plain Python can easily do by itself:
validwords = [w for w in firstcheck if all(c not in w for c in l3)]
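A set-based variant of the same idea is sketched below. The name banned is hypothetical, and the short firstcheck list is sample data chosen so that the result is not empty. set.isdisjoint accepts any iterable, so a word can be passed in directly and is checked character by character.

```python
l3 = ['b', 'd', 'e', 'f', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']
firstcheck = ['a', 'z', 'poach', 'omnificent']  # sample data for illustration

banned = set(l3)  # set membership tests are faster than scanning a list

# Keep a word only if it shares no character with the banned set
validwords = [w for w in firstcheck if banned.isdisjoint(w)]
print(validwords)
```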
You can use a list comprehension:
import numpy as np
[w for w in firstcheck if np.sum([c in w for c in l3])==0]
It seems all the words contain at least 1 char from l3 and the output of above is an empty list.
If firstcheck is defined as below:
firstcheck = ['a', 'z', 'poach', 'omnificent']
The code should output:
['a', 'z']
If you want to avoid the nested loops, you can use re directly.
import re
l3 = ['b', 'd', 'e', 'f', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']
firstcheck = ['azz', 'poach', 'omnificent', 'aminoxylol', 'teetotaller', 'kyathos', 'toxaemic', 'herohead', 'desole', 'nincompoophood', 'dinamode']
# Create a regex string to remove.
strings_to_remove = "[{}]".format("".join(l3))
validwords = [x for x in firstcheck if re.sub(strings_to_remove, '', x) == x]
print(validwords)
Output:
['azz']
There were a couple of mistakes in the code; the rest was fine:
l3 = ['b', 'd', 'e', 'f', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']
firstcheck = ['aza', 'ca', 'poach', 'omnificent', 'aminoxylol', 'teetotaller', 'kyathos', 'toxaemic', 'herohead', 'desole', 'nincompoophood', 'dinamode']
validwords = []
flag=1
for x in firstcheck:
    for i in l3:
        if i not in x:
            flag=1
        else:
            flag=0
            break
    if(flag==1):
        validwords.append(x)
print(validwords)
The first mistake was the order of the for loops: we need to iterate through the words first and then through l3, to avoid adding the same word more than once.
Next, firstcheck was misspelled as firstchect in the inner loop (for x in firstchect), which raises a NameError.
I also added a flag, so that a word is appended to validwords only if the flag is still 1 after checking every character in l3.
To check, I added the new elements 'aza' and 'ca', and the output is now correctly ['aza', 'ca'].
Hope this helps.
I am supposed to write a script that asks the user for a sentence, discards every character that is not a lowercase letter, and prints the lowercase letters as a list like this: ['m', 'y', 'p', 'a', 's', 's', 'w', 'o', 'r', 'd'].
My script:
#!/usr/bin/python3
sentence = input("Enter a sentence: ")
for letter in sentence:
    if letter.islower():
        print(letter)
and this is the output:
o
e
s
h
i
s
w
r
k
Seems like you want to produce a list, you have list comprehensions to make life easy:
l = ['P', 'm', 'y', 'H', 'p', 'a', 's', 's', 'w', 'o', 'r', 'd']
out = [i for i in l if i.islower()]
print(out)
# ['m', 'y', 'p', 'a', 's', 's', 'w', 'o', 'r', 'd']
Which is equivalent to:
out = []
for i in l:
    if i.islower():
        out.append(i)
print(out)
# ['m', 'y', 'p', 'a', 's', 's', 'w', 'o', 'r', 'd']
You might be looking for end = ",":
sentence = input("Enter a sentence: ")
for letter in sentence:
    if letter.islower():
        print(letter, end=",")
        #             ^^^
Your program is almost OK, only instead of printing every lowercase character, append it to a list, and finally print only that list:
sentence = input("Enter a sentence: ")
lowercases = [] # prepare an empty list
for letter in sentence:
    if letter.islower():
        lowercases.append(letter)
print(lowercases) # print the filled list
Test:
Enter a sentence: The End of Universe.
['h', 'e', 'n', 'd', 'o', 'f', 'n', 'i', 'v', 'e', 'r', 's', 'e']
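The same result can be had by applying a list comprehension to the sentence directly, since a string can be iterated character by character. A minimal sketch, with lowercase_letters as a hypothetical helper name and a fixed sentence in place of input() so it runs unattended:

```python
def lowercase_letters(sentence):
    # Keep only the characters for which str.islower() is true
    return [ch for ch in sentence if ch.islower()]

print(lowercase_letters("The End of Universe."))
```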
I am new to NLP and NLTK, and I want to find ambiguous words, meaning words with at least n different tags. I have this method, but the output is more than confusing.
Code:
def MostAmbiguousWords(words, n):
    # wordsUniqeTags holds a list of unique tags that have been observed for a given word
    wordsUniqeTags = {}
    for (w,t) in words:
        if wordsUniqeTags.has_key(w):
            wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
        else:
            wordsUniqeTags[w] = set([t])
    # Starting to count
    res = []
    for w in wordsUniqeTags:
        if len(wordsUniqeTags[w]) >= n:
            res.append((w, wordsUniqeTags[w]))
    return res
MostAmbiguousWords(brown.tagged_words(), 13)
Output:
[("what's", set(['C', 'B', 'E', 'D', 'H', 'WDT+BEZ', '-', 'N', 'T', 'W', 'V', 'Z', '+'])),
("who's", set(['C', 'B', 'E', 'WPS+BEZ', 'H', '+', '-', 'N', 'P', 'S', 'W', 'V', 'Z'])),
("that's", set(['C', 'B', 'E', 'D', 'H', '+', '-', 'N', 'DT+BEZ', 'P', 'S', 'T', 'W', 'V', 'Z'])),
('that', set(['C', 'D', 'I', 'H', '-', 'L', 'O', 'N', 'Q', 'P', 'S', 'T', 'W', 'CS']))]
Now I have no idea what B, C, Q, etc. could represent. So, my questions:
What are these?
What do they mean? (In case they are tags)
I think they are not tags, because who's and what's don't have the WH tag indicating "wh question words".
I'll be happy if someone could post a link that includes a mapping of all possible tags and their meaning.
It looks like you have a typo. In this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
you should have set([t]) (not set(t)), like you do in the else case.
This explains the behavior you're seeing because t is a string and set(t) is making a set out of each character in the string. What you want is set([t]) which makes a set that has t as its element.
>>> t = 'WHQ'
>>> set(t)
set(['Q', 'H', 'W']) # bad
>>> set([t])
set(['WHQ']) # good
By the way, you can correct the problem and simplify things by just changing that line to:
wordsUniqeTags[w].add(t)
But, really, you should make use of the setdefault method on dict and list comprehension syntax to improve the method overall. So try this instead:
def most_ambiguous_words(words, n):
    # wordsUniqeTags holds a set of unique tags that have been observed for a given word
    wordsUniqeTags = {}
    for (w,t) in words:
        wordsUniqeTags.setdefault(w, set()).add(t)
    # Starting to count (iteritems is Python 2; use .items() on Python 3)
    return [(word,tags) for word,tags in wordsUniqeTags.iteritems() if len(tags) >= n]
You are splitting your POS tags into single characters in this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
set('AT') results in set(['A', 'T']).
How about making use of the Counter and defaultdict functionality in the collections module?
from collections import defaultdict, Counter
def most_ambiguous_words(words, n):
    counts = defaultdict(Counter)
    for (word,tag) in words:
        counts[word][tag] += 1
    # keep words that were seen with at least n distinct tags
    return [(w, counts[w].keys()) for w in counts if len(counts[w]) >= n]
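For anyone on Python 3, where has_key and iteritems no longer exist, the same idea might be sketched as follows. The tagged pairs in sample are hand-made for illustration and are not real Brown corpus counts; in practice the input would be something like brown.tagged_words().

```python
from collections import defaultdict

def most_ambiguous_words(tagged_words, n):
    # Collect the set of distinct tags observed for each word
    tags_by_word = defaultdict(set)
    for word, tag in tagged_words:
        tags_by_word[word].add(tag)   # add() keeps the whole tag string intact
    # Keep the words that have at least n distinct tags
    return [(word, tags) for word, tags in tags_by_word.items() if len(tags) >= n]

# Hypothetical sample input: (word, tag) pairs
sample = [("that", "CS"), ("that", "DT"), ("that", "WPS"),
          ("run", "VB"), ("run", "NN"), ("the", "AT")]
result = most_ambiguous_words(sample, 2)
print(result)
```

Using add() instead of set(t) is what keeps a tag such as 'WPS' from being split into 'W', 'P' and 'S'.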