Python code to compute anagrams without repetitions


I am solving a Python exercise that asks me to find all the possible anagrams of a word, excluding the anagrams that occur more than once.
Here is the code I have come up with:
from itertools import permutations
seq = getString('Insert ')
seq = permutations(seq) # this gives us all anagrams WITH repetitions
unique = []
unique2 = []
for x in seq:
    unique.append(x)
i = 0
while i < len(unique):
    if unique[i] in unique[i+1:]:
        i = i + 1
        continue
    if unique[i] not in unique[i+1:]:
        unique2.append(unique[i])
    i = i + 1
What the while loop does is basically this: consider the i-th anagram and check whether the rest of the list contains an anagram identical to the one we are checking. If it does, increase the counter "i" and move on; otherwise append that anagram (which we are now sure has no later repetition) to a new list.
This continues until i equals the length of the list of permutations with repetitions.
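For comparison, here is a minimal sketch of the same deduplication done with a set, which avoids rescanning the rest of the list for every permutation. The function name unique_anagrams is hypothetical, and the input word is passed in directly instead of being read with getString.

```python
from itertools import permutations

def unique_anagrams(word):
    # A set keeps only one copy of each permutation, so repeated
    # letters no longer produce repeated anagrams.
    return sorted({"".join(p) for p in permutations(word)})

print(unique_anagrams("aab"))  # ['aab', 'aba', 'baa']
```

Sorting is optional; it is only there to make the output order predictable.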

Related

List index out of range in indented for loops

I am trying to make a program that searches a list of strings, using a second list. The first list is a list of random jumbled-up letters and the second is a list of words. The program is supposed to detect whether each word in the second list can be found in any of the strings in the first. If it can, the word that was found is added to an empty third list.
counter = 0
for a in FirstList:
    for b in SecondList:
        if SecondList[counter] in FirstList[counter]:
            ThirdList += [SecondList[counter]]
        counter += 1
        if counter >= len(SecondList):
            break
return ThirdList
However, it throws up an error on the first if statement claiming that the list index is out of range. I don't quite understand what I am missing, because I am not editing the contents of any of the lists that I am iterating through, which is what the reason for this error was in other posts about the same error.

The issue here is that you are using a single counter that only stops at the size of SecondList, but if FirstList is smaller, "counter" will be out of bounds for FirstList.
I think this may be closer to what you are trying to achieve:
ThirdList = [word for word in SecondList for string in FirstList if word in string]
This is a list comprehension which is running through both lists and outputting any values from the second list which appear in any of the values in the first list.
So for example with the code:
FirstList = ["aadada", "asdtest", "hasdk"]
SecondList = ["test", "data"]
ThirdList = [word for word in SecondList for string in FirstList if word in string]
print(ThirdList)
You would receive the output ['test'], as "test" is found in one of the strings in FirstList.
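One thing to keep in mind is that the nested comprehension adds a word once for every string that contains it. If each word should appear at most once, a variant with any() is one possible sketch; the function name words_found and the sample data are made up for illustration.

```python
def words_found(words, strings):
    # Keep a word if at least one string contains it; any() stops at
    # the first match, so each word is added at most once.
    return [word for word in words if any(word in s for s in strings)]

first_list = ["aadada", "asdtest", "xxtestxx"]
second_list = ["test", "data"]
print(words_found(second_list, first_list))  # ['test']
```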

You should not use one shared counter to index both lists. Your counter ends up exceeding the number of elements in the lists. If I understand correctly what you are trying to do, you could use the following code:
ThirdList = []
for word in SecondList:
    for random_word in FirstList:
        if word in random_word and word not in ThirdList:
            ThirdList.append(word)
return ThirdList
You can remove the second condition of the if statement if you have no problem with duplicate elements in ThirdList.

Time limit exceeded error. Word Ladder leetcode

I am trying to solve this LeetCode problem (https://leetcode.com/problems/word-ladder/description/):
Given two words (beginWord and endWord), and a dictionary's word list, find the length of shortest transformation sequence from beginWord to endWord, such that:
Only one letter can be changed at a time.
Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
Note:
Return 0 if there is no such transformation sequence.
All words have the same length.
All words contain only lowercase alphabetic characters.
You may assume no duplicates in the word list.
You may assume beginWord and endWord are non-empty and are not the same.
Input:
beginWord = "hit",
endWord = "cog",
wordList = ["hot","dot","dog","lot","log","cog"]
Output:
5
Explanation:
As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog", return its length 5.
import queue
class Solution:
    def isadjacent(self,a, b):
        count = 0
        n = len(a)
        for i in range(n):
            if a[i] != b[i]:
                count += 1
            if count > 1:
                return False
        if count == 1:
            return True
    def ladderLength(self,beginWord, endWord, wordList):
        word_queue = queue.Queue(maxsize=0)
        word_queue.put((beginWord,1))
        while word_queue.qsize() > 0:
            queue_last = word_queue.get()
            index = 0
            while index != len(wordList):
                if self.isadjacent(queue_last[0],wordList[index]):
                    new_len = queue_last[1]+1
                    if wordList[index] == endWord:
                        return new_len
                    word_queue.put((wordList[index],new_len))
                    wordList.pop(index)
                    index-=1
                index+=1
        return 0
Can someone suggest how to optimise it and prevent the time limit error?

The basic idea is to find the adjacent words faster. Instead of considering every word in the list (even one that has already been filtered by word length), construct each possible neighbor string and check whether it is in the dictionary. To make those lookups fast, make sure the word list is stored in something like a set that supports fast membership tests.
To go even faster, you could store two sorted word lists, one sorted by the reverse of each word. Then look for possibilities involving changing a letter in the first half in the reversed list and for the latter half in the normal list. All the existing neighbors can then be found without making any non-word strings. This can even be extended to n lists, each sorted by omitting one letter from all the words.
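The first suggestion (build each possible neighbor and test it against a set) could be sketched roughly as follows. This is only an illustration under the problem's stated assumptions (lowercase words of equal length); the function name ladder_length is hypothetical and the code is not the answerer's own.

```python
from collections import deque
from string import ascii_lowercase

def ladder_length(begin_word, end_word, word_list):
    remaining = set(word_list)          # fast membership tests
    if end_word not in remaining:
        return 0
    remaining.discard(begin_word)
    queue = deque([(begin_word, 1)])    # (word, length of the ladder so far)
    while queue:
        word, length = queue.popleft()
        # Build every string that differs from `word` in exactly one position.
        for i in range(len(word)):
            for letter in ascii_lowercase:
                candidate = word[:i] + letter + word[i + 1:]
                if candidate not in remaining:
                    continue
                if candidate == end_word:
                    return length + 1
                remaining.remove(candidate)   # never visit a word twice
                queue.append((candidate, length + 1))
    return 0

print(ladder_length("hit", "cog", ["hot", "dot", "dog", "lot", "log", "cog"]))  # 5
```

Each dequeued word costs 26 times its length in set lookups, regardless of how many words are in the list.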

Intro to Python - Lists questions

We've started doing lists in our class and I'm a bit confused, so I'm coming here since previous questions/answers have helped me in the past.
The first question was to sum up all negative numbers in a list, I think I got it right but just want to double check.
import random
def sumNegative(lst):
    sum = 0
    for e in lst:
        if e < 0:
            sum = sum + e
    return sum
lst = []
for i in range(100):
    lst.append(random.randrange(-1000, 1000))
print(sumNegative(lst))
For the 2nd question, I'm a bit stuck on how to write it. The question was:
Count how many words occur in a list up to and including the first occurrence of the word “sap”. I'm assuming it's a random list but wasn't given much info so just going off that.
I know the ending would be similar, but I have no idea how the initial part would look since it deals with strings as opposed to numbers.
I wrote code for an in-class problem, which was to count how many odd numbers are in a list (it was a random list there, so I'm assuming it's random for this question as well), and got:
import random
def countOdd(lst):
    odd = 0
    for e in lst:
        if e % 2 = 0:
            odd = odd + 1
    return odd
lst = []
for i in range(100):
    lst.append(random.randint(0, 1000))
print(countOdd(lst))
How exactly would I change this to fit the criteria for the 2nd question? I'm just confused on that part. Thanks.

The code to sum negative numbers looks fine! I might suggest testing it on a list that you can manually check, such as:
print(sumNegative([1, -1, -2]))
The same logic would apply to your random list.
A note about your countOdd function, it appears that you are missing an = (== checks for equality, = is for assignment) and the code seems to count even numbers, not odd. The code should be:
def countOdd(lst):
    odd = 0
    for e in lst:
        if e%2 == 1: # Odd%2 == 1
            odd = odd + 1
    return odd
As for your second question, you can use a very similar function:
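def countWordsBeforeSap(inputList):
    numWords = 0
    for word in inputList:
        if word.lower() != "sap":
            numWords = numWords + 1
        else:
            # Found "sap": return the number of words before it
            # (add 1 here to count "sap" itself as well)
            return numWords
    # "sap" never appeared, so every word was counted
    return numWords
inputList = ["trees", "produce", "sap"]
print(countWordsBeforeSap(inputList))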
To explain the above, the countWordsBeforeSap function:
Starts iterating through the words.
If the word is anything other than "sap" it increments the counter and continues
If the word IS "sap" then it returns early from the function
The function could be more general by passing in the word that you wanted to check for:
def countWordsBefore(inputList, wordToCheckFor):
    numWords = 0
    for word in inputList:
        if word.lower() != wordToCheckFor:
            numWords = numWords + 1
        else:
            return numWords
    # wordToCheckFor never appeared, so every word was counted
    return numWords
inputList = ["trees", "produce", "sap"]
print(countWordsBefore(inputList, "sap"))
If the words that you are checking come from a single string then you would initially need to split the string into individual words like so:
inputString = "Trees produce sap"
inputList = inputString.split(" ")
Which splits the initial string into words that are separated by spaces.
Hope this helps!
Tom

def count_words(lst, end="sap"):
    """Note that I added an extra input parameter.
    This input parameter has a default value of "sap" which is the actual question.
    However you can change this input parameter to any other word if you want to by
    just doing count_words(lst, "another_word").
    """
    words = []
    # First we need to loop through each item in the list.
    for item in lst:
        # We append the item to our "words" list first thing in this loop,
        # as this will make sure we will count up to and INCLUDING.
        words.append(item)
        # Now check if we have reached the 'end' word.
        if item == end:
            # Break out of the loop prematurely, as we have reached the end.
            break
    # Our 'words' list now has all the words up to and including the 'end' variable.
    # 'len' will return how many items there are in the list.
    return len(words)
lst = ["something", "another", "woo", "sap", "this_wont_be_counted"]
print(count_words(lst))
Hope this helps you understand lists better!

You can make effective use of list/generator comprehensions. The snippets below are fast and memory efficient.
1. Sum of negatives:
print(sum(i for i in lst if i < 0))
2. Count of words before sap: Like your sample list, this assumes the list contains no numbers. Add 1 to include "sap" itself in the count.
print(lst.index('sap'))
If it's a random (mixed) list, filter out the strings first, then find the index of sap:
l = ['a','b',1,2,'sap',3,'d']
l = [x for x in l if isinstance(x, str)]  # filter() returns an iterator in Python 3, which has no index()
print(l.index('sap'))
3. Count of odd numbers:
print(sum(i%2 != 0 for i in lst))

How to count number of occurrences of permutation (overlapping) in large text in python3?

I have a list of words and I'd like to find out how many times each permutation occurs in this list of words.
I'd also like to count overlapping permutations, so count() doesn't seem to be appropriate.
For example, the permutation aba appears twice in this string:
ababa
However, count() would say one.
So I designed this little script, but I am not sure it is efficient. The array of words comes from an external file; I just removed that part to make it simpler.
import itertools
# Occurrence counting function (counts overlapping matches)
def occ(string, sub):
    count = start = 0
    while True:
        start = string.find(sub, start) + 1
        if start > 0:
            count+=1
        else:
            return count
# permutation generator
abc="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
permut = [''.join(p) for p in itertools.product(abc,repeat=2)]
# Transform osd7 in array
arrayofWords=['word1',"word2","word3","word4"]
dict_output = {}  # assumed: the snippet as posted never creates this dictionary
dict_output['total']=0
# create the array
for perm in permut:
    dict_output[perm]=0
# iterate over the arrayofWords and permutation
for word in arrayofWords:
    for perm in permut:
        dict_output[perm]=dict_output[perm]+occ(word,perm)
        dict_output['total']=dict_output['total']+occ(word,perm)
It is working, but it takes a looonnnggg time. If I replace product(abc,repeat=2) with product(abc,repeat=3) or product(abc,repeat=4)... it will take a full week!
The question: Is there a more efficient way?

Very simple: count only what you need to count.
from collections import defaultdict
quadrigrams = defaultdict(lambda: 0)
for word in arrayofWords:
    for i in range(len(word) - 3):
        quadrigrams[word[i:i+4]] += 1
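The same idea can be generalized to substrings of any length. The following is a rough sketch using collections.Counter; the function name count_ngrams and the sample words are made up for illustration.

```python
from collections import Counter

def count_ngrams(words, n):
    # Slide a window of length n over every word and count each
    # substring that actually occurs; overlapping windows are included.
    counts = Counter()
    for word in words:
        counts.update(word[i:i + n] for i in range(len(word) - n + 1))
    return counts

counts = count_ngrams(["ababa", "abab"], 3)
print(counts["aba"])         # 3
print(sum(counts.values()))  # 5, the total over all substrings
```

Substrings that never occur simply have a count of 0, so there is no need to generate every possible letter combination first.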

You can use the re module to count overlapping matches.
import re
print(len(re.findall(r'(?=(aba))','ababa')))
Output:
2
More generally,
print(len(re.findall(r'(?=(<pattern>))','<input_string>')))
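As a small sketch, the lookahead trick can be wrapped in a helper. The name count_overlapping is hypothetical, and re.escape is added here on the assumption that the substring should be matched literally instead of being treated as a regular expression.

```python
import re

def count_overlapping(text, sub):
    # A lookahead matches without consuming characters, so the search
    # resumes one position later and overlapping hits are all found.
    return len(re.findall(f"(?={re.escape(sub)})", text))

print(count_overlapping("ababa", "aba"))  # 2
```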

How to reference the next item in a list in Python?

I'm fairly new to Python, and am trying to put together a Markov chain generator. The bit that's giving me problems is focused on adding each word in a list to a dictionary, associated with the word immediately following.
def trainMarkovChain():
    """Trains the Markov chain on the list of words, returning a dictionary."""
    words = wordList()
    Markov_dict = dict()
    for i in words:
        if i in Markov_dict:
            Markov_dict[i].append(words.index(i+1))
        else:
            Markov_dict[i] = [words.index(i+1)]
    print Markov_dict
wordList() is a previous function that turns a text file into a list of words. Just what it sounds like. I'm getting an error saying that I can't concatenate strings and integers, referring to words.index(i+1), but if that's not how to refer to the next item then how is it done?

You can also do it as:
for a,b in zip(words, words[1:]):
This will assign a as an element in the list and b as the next element.
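Applied to the Markov chain from the question, the zip approach might look like the sketch below. The function name train_markov_chain and the sample sentence are assumptions; the word list is passed in as an argument instead of being loaded with wordList().

```python
from collections import defaultdict

def train_markov_chain(words):
    chain = defaultdict(list)
    # zip stops at the shorter sequence, so the last word is never
    # paired with a missing successor.
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)

print(train_markov_chain("the cat saw the dog".split()))
# {'the': ['cat', 'dog'], 'cat': ['saw'], 'saw': ['the']}
```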

The following code, simplified a bit, should produce what you require. I'll elaborate more if something needs explaining.
words = 'Trains the Markov chain on the list of words, returning a dictionary'.split()
chain = {}
for i, word in enumerate(words):
    # ensure there's a record
    next_words = chain.setdefault(word, [])
    # break on the last word
    if i + 1 == len(words):
        break
    # append the next word
    next_words.append(words[i + 1])
print(words)
print(chain)
assert len(chain) == 11
assert chain['the'] == ['Markov', 'list']
assert chain['dictionary'] == []

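def markov_chain(words):
    markov = {}
    for index, word in enumerate(words):
        if index < len(words) - 1:
            # collect every following word instead of overwriting earlier ones
            markov.setdefault(word, []).append(words[index + 1])
    return markov
The code above takes a list as input and returns the corresponding Markov chain as a dictionary that maps each word to the list of words that follow it.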

You can use loops to get that, but it's wasteful to put the rest of your code in a loop when you only need the next element.
There are two nice options to avoid this:
Option 1 - if you know the next index, just use it:
my_list[my_index]
Most of the time you won't know the index, though, and you may still want to avoid the for loop.
Option 2 - use iterators:
my_iterator = iter(my_list)
next(my_iterator) # no loop required
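To illustrate the iterator option a little further, here is a short sketch with a made-up list. Each call to next() advances by one element, and passing a default as the second argument avoids a StopIteration error once the list is exhausted.

```python
my_list = ["a", "b", "c"]
my_iterator = iter(my_list)

first = next(my_iterator)            # "a"
second = next(my_iterator)           # "b", the element after "a"
third = next(my_iterator)            # "c"
after_end = next(my_iterator, None)  # None, because nothing is left

print(first, second, third, after_end)
```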
