Find consecutive combinations [duplicate] - python

Closed 10 years ago.
Possible Duplicate:
Rolling or sliding window iterator in Python
I'm new to programming and am learning Python. I'm looking for an efficient/pythonic way to solve a problem.
I'd like a function that returns a list of iterables containing the combinations of a parent iterable, as long as the elements in each combination appear in the same consecutive order as in the original parent iterable.
I'm not sure if "consecutive" is the right word to describe this concept, as 'consecutive' typically means 'the same element repeated', e.g. [1,1,1], 'aaa', etc.
I mean that given the list [1,2,3,4,5]:
[1,2,3] is consecutive but [1,2,4] is not. (Is there a word for this?)
Here's a function consecutive_combinations() I created and the expected behavior:
def consecutive_combinations(iterable, consec):
    begin = 0
    chunks = len(iterable) + 1 - consec
    return [iterable[x + begin: x + consec] for x in xrange(chunks)]
def test():
    t = (1,2,3,4,5)
    s = "The quick brown fox jumps over the lazy dog."
    CC = consecutive_combinations
    assert CC(t, 2) == [(1, 2), (2, 3), (3, 4), (4, 5)]
    assert CC(t, 3) == [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
    assert CC(t, 4) == [(1, 2, 3, 4), (2, 3, 4, 5)]
    assert CC(t, 5) == [(1, 2, 3, 4, 5)]
    assert CC(s, 3) == ['The', 'he ', 'e q', ' qu', 'qui', 'uic', 'ick', 'ck ', 'k b', ' br', 'bro', 'row', 'own', 'wn ', 'n f', ' fo', 'fox', 'ox ', 'x j', ' ju', 'jum', 'ump', 'mps', 'ps ', 's o', ' ov', 'ove', 'ver', 'er ', 'r t', ' th', 'the', 'he ', 'e l', ' la', 'laz', 'azy', 'zy ', 'y d', ' do', 'dog', 'og.']
    assert CC('', 3) == []
    print "All tests passed!"
test()
Is this an efficient solution? Is there something in itertools or some other pre-built module that would do this sort of thing?

I like the pragmatic zip approach:
n = 3
s = "The quick brown fox jumps over the lazy dog."
zip(*(s[i:] for i in xrange(n)))
It's not super-efficient and it only works for sequences, but often enough it does the job.
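In Python 3, xrange no longer exists and zip returns a lazy iterator, so the same trick needs range and an explicit list(...). A minimal sketch (the name windows is just for illustration, not from the answer); note that for a string you get tuples of characters, which ''.join turns back into substrings:

```python
def windows(seq, n):
    """Return consecutive length-n windows of a sequence as tuples."""
    # seq[i:] is the sequence shifted left by i; zip stops at the shortest
    # of them, so the last window ends exactly at the end of seq.
    return list(zip(*(seq[i:] for i in range(n))))

print(windows((1, 2, 3, 4, 5), 3))  # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
print([''.join(w) for w in windows("The quick", 3)])
# ['The', 'he ', 'e q', ' qu', 'qui', 'uic', 'ick']
```

If n is larger than the sequence, every shifted slice past the end is empty, so the result is simply an empty list.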
The corresponding itertools solution is a pretty straightforward transformation of the above:
from itertools import izip, islice, tee
def slices(iterable, n):
    return izip(*(islice(it, i, None) for i, it in enumerate(tee(iterable, n))))
So many i's...
Nevertheless, this one should work for any iterable (but may be slower for plain sequences like lists or strings).
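The same itertools approach carries over to Python 3 with izip replaced by the built-in zip (which is already lazy there). A sketch, shown on a generator to illustrate the "any iterable" point, since a generator has no len() and can't be sliced. The recipes section of the itertools documentation also contains a deque-based sliding_window recipe, if you'd rather use that.

```python
from itertools import islice, tee

def slices(iterable, n):
    """Yield consecutive length-n windows from any iterable, as tuples."""
    # tee makes n independent iterators over the same input; iterator i is
    # advanced by i items with islice, then zip walks them in lockstep.
    iterators = tee(iterable, n)
    return zip(*(islice(it, i, None) for i, it in enumerate(iterators)))

squares = (x * x for x in range(6))  # 0, 1, 4, 9, 16, 25
print(list(slices(squares, 3)))  # [(0, 1, 4), (1, 4, 9), (4, 9, 16), (9, 16, 25)]
```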

Your solution is fine. You could however shorten it a little bit. For example:
def subsequences(iterable, length):
    return [iterable[i: i + length] for i in xrange(len(iterable) - length + 1)]

Related

checking for a down diagonal win in tic tac toe?

Sorry if this is really basic! I'm in first-year computer science.
I am trying to write a function to check if there is a win on a tic-tac-toe board of NxN size (so I can't hardcode any values); the win has to be from the top left to the bottom right.
I've already written a function for the upwards diagonal, so I've based it around that, but with a list like this: [ [ 'X' , ' ' ] , [ ' ' , ' ' ] ] the function returns True, which it definitely should not.
Here's what I have right now, though I have tried many things:
# these can be anything, but here's just an example of what I'm using to test
cur = [['X',' '],[' ',' ']]
player= 'X'
def check_down_diag(cur, player):
    columncount = 0
    for row in cur:
        if row[columncount] != player:
            columncount += 1
            return False
        else:
            return True
The first step to figure out is the logic. You have a player symbol (here 'X') and a Tic-Tac-Toe size ('N', usually 3 for TTT). So to see if there is a diagonal winner, you would check the two diagonals:
(0, 0), (1, 1), (2, 2), ... (N-1, N-1)
and:
(N-1, 0), (N-2, 1), (N-3, 2) ... (0, N-1)
which are the indices into the fields to check.
So now you know the two sets of locations to check, and you know which symbol to check for. So write a function which accepts the board, its size N and the symbol for the player, and you should be good to go. You can check both diagonals in one loop, or you can check them in separate loops to start. If you use one loop, you have to keep going until both checks have failed (or you reach the end). If you do one diagonal per loop, break out of the loop as soon as you find a single mismatch, to improve performance.
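A minimal sketch of the one-loop variant, assuming the board is a square list of lists (the name has_diagonal_win is hypothetical). It tracks each diagonal with a flag and returns early once both have failed, instead of always running to the end:

```python
def has_diagonal_win(board, player):
    """Return True if player fills either diagonal of a square N x N board."""
    n = len(board)
    down_ok = True  # main diagonal: (0, 0), (1, 1), ..., (N-1, N-1)
    up_ok = True    # other diagonal: (N-1, 0), (N-2, 1), ..., (0, N-1)
    for i in range(n):
        if board[i][i] != player:
            down_ok = False
        if board[n - 1 - i][i] != player:
            up_ok = False
        if not (down_ok or up_ok):
            return False  # both checks have failed; no need to keep looping
    return down_ok or up_ok
```

With the asker's example [['X', ' '], [' ', ' ']] and 'X', this returns False: the other diagonal fails on the first step and the main diagonal on the second.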
A while back I wrote a function for pattern finding in 2d arrays specifically for something like this:
def find_pattern(array, pattern, value):
    for row in range(len(array)):
        for column in range(len(array[0])):
            try:
                for y, x in pattern:
                    _value = array[y + row][x + column]
                    if value != _value:
                        break
                else:
                    return pattern, (pattern[0][0] + row, pattern[0][1] + column)
            except IndexError:
                break
    return None
The patterns are formed like so (these four cover every win condition on a standard tic-tac-toe board, since the function slides each pattern across the whole grid):
win_patterns = [
    ((0, 0), (0, 1), (0, 2)),
    ((0, 0), (1, 0), (2, 0)),
    ((0, 0), (1, 1), (2, 2)),
    ((0, 2), (1, 1), (2, 0)),
]
The way it works is simple: each pattern is just a tuple of (row, column) coordinates in the grid, starting from 0, so the first pattern searches for horizontal lines, the second for vertical lines, and the third and fourth for the two diagonals.
Then you can loop over the patterns and pass each one to the function, along with the 2D array of the board and the value that has to be found in this pattern (the ' ' values don't mean anything; they are there so that the array is more readable):
board = [
    ['x', ' ', 'x', ' ', ' '],
    [' ', 'x', ' ', ' ', ' '],
    [' ', ' ', 'x', ' ', 'x'],
    [' ', ' ', ' ', ' ', ' '],
    [' ', ' ', 'x', ' ', ' '],
]
for pattern in win_patterns:
    found = find_pattern(board, pattern, 'x')
    if found:
        print(found)
        break
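The four patterns above are hard-coded for three in a row. For the asker's N x N board, where a win needs a full line, a small helper (hypothetical, not part of the original answer) can build the same four shapes for any length n, using the same (row, column) convention that find_pattern uses:

```python
def make_win_patterns(n):
    """Build horizontal, vertical and both diagonal patterns of length n."""
    rng = range(n)
    return [
        tuple((0, i) for i in rng),          # horizontal line
        tuple((i, 0) for i in rng),          # vertical line
        tuple((i, i) for i in rng),          # top left to bottom right
        tuple((i, n - 1 - i) for i in rng),  # top right to bottom left
    ]

print(make_win_patterns(3)[3])  # ((0, 2), (1, 1), (2, 0))
```

For n = 3 this reproduces win_patterns exactly; passing make_win_patterns(len(board)) instead makes the search look for full-length lines.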
For the main diagonal, check that every value whose row index equals its column index (i.e. the main diagonal) is equal to the player's symbol:
def check_down_diag1(cur, player):
    return all(cur[i][i] == player for i in range(len(cur)))
For the other diagonal, simply invert one of the dimensions:
def check_down_diag2(cur, player):
    return all(cur[i][-1-i] == player for i in range(len(cur)))
You could also check both diagonals at the same time:
def check_down_diags(cur, player):
    return all(cur[i][i] == player for i in range(len(cur))) \
        or all(cur[i][-1-i] == player for i in range(len(cur)))

Can someone explain how this loop is printing this tuple?

I'm trying to understand the solutions to question 5 from pythonchallenge, but I don't understand how the for loop is printing that data from the tuple. The solution is from here
Data contains a list of tuples, eg. data = [[(' ', 95)], [(' ', 14), ('#', 5), (' ', 70), ('#', 5), (' ', 1) ...]]
for line in data:
    print("".join([k * v for k, v in line]))
What should be printed out is an ASCII graphic made up of '#'.
This one is sneaky. It's a list of lists of tuples. Each inner list is a row on the terminal, and each tuple is a character followed by the number of times that character should be printed.
It iterates through the list and, for each tuple, prints tuple[0] tuple[1] times.
It prints ' ' 95 times, then ' ' 14 times, then '#' 5 times, etc., inserting a newline between each inner list.
Consider:
>>> line = [(' ', 3), ('#', 5), (' ', 3), ('#', 5)]
>>> strs = [k * v for k, v in line]
Then:
>>> strs
['   ', '#####', '   ', '#####']
Furthermore:
>>> ''.join(strs)
'   #####   #####'
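The whole decoding step can be seen in a small self-contained sketch; the two-row data here is made-up sample input, not the real challenge data. A generator expression inside join works just as well as the list comprehension:

```python
# Each inner list is one output row; each tuple is (character, repeat count).
data = [
    [('#', 3), (' ', 2), ('#', 3)],
    [(' ', 1), ('#', 6), (' ', 1)],
]

rows = []
for line in data:
    # k * v repeats the string k, v times, e.g. '#' * 3 == '###'
    rows.append("".join(k * v for k, v in line))

print("\n".join(rows))
```

This prints '###  ###' on the first line and ' ###### ' on the second.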

python: tokenize list of tuples without for loop

I have got a list of 2 million tuples with the first element being text and the second an integer. e.g.
list_of_tuples = [('here is some text', 1), ('this is more text', 5), ('a final tuple', 12)]
I would like to tokenize the first item in each tuple and attach all of the lists of words to a flattened list, so the desired output would be:
list_of_tokenized_tuples = [(['here', 'is', 'some', 'text'], 1), (['this', 'is', 'more', 'text'], 5), (['a', 'final', 'tuple'], 12)]
list_of_all_words = ['here', 'is', 'some', 'text', 'this', 'is', 'more', 'text', 'a', 'final', 'tuple']
So far, I believe I have found a way to achieve this with a for loop; however, due to the length of the list, it's really time-intensive. Is there any way I can tokenize the first item in the tuples and/or flatten the list of all words in a way that doesn't involve loops?
list_of_tokenized_tuples = []
list_of_all_words = []
for text, num in list_of_tuples:
    tokenized_text = list(word_tokenize(text))
    tokenized_tuples = (tokenized_text, num)
    list_of_all_words.append(tokenized_text)
    list_of_tokenized_tuples.append(tokenized_tuples)
list_of_all_words = [val for sublist in list_of_all_words for val in sublist]
Using itertools you could write it as:
from itertools import chain, imap
chain.from_iterable(imap(lambda (text,_): word_tokenize(text), list_of_tuples))
Testing this:
from itertools import chain, imap
def word_tokenize(text):
    return text.split()  # insert your tokenizer here
ts = [('here is some text', 1), ('this is more text', 5), ('a final tuple', 12)]
print list( chain.from_iterable(imap(lambda (t,_): word_tokenize(t), ts)) )
Output
['here', 'is', 'some', 'text', 'this', 'is', 'more', 'text', 'a', 'final', 'tuple']
I'm not sure what this buys you though as there are for loops in the implementation of the itertools functions.
TL;DR
>>> from itertools import chain
>>> list_of_tuples = [('here is some text', 1), ('this is more text', 5), ('a final tuple', 12)]
# Split up your list(str) from the int
>>> texts, nums = zip(*list_of_tuples)
# Go into each string and split by whitespaces,
# Then flatten the list of list of str to list of str
>>> list_of_all_words = list(chain(*map(str.split, texts)))
>>> list_of_all_words
['here', 'is', 'some', 'text', 'this', 'is', 'more', 'text', 'a', 'final', 'tuple']
If you need to use word_tokenize, then:
list_of_all_words = list(chain(*map(word_tokenize, texts)))
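If you want both outputs from the question in one pass (Python 3), a sketch follows. str.split stands in for word_tokenize here as an assumption, and tokenize_all is a hypothetical name; swap in your own tokenizer. There is still a loop (something has to visit all two million tuples), but each text is tokenized only once and list.extend flattens as it goes, so the separate flattening pass disappears:

```python
def tokenize_all(list_of_tuples, tokenize=str.split):
    """Return (list_of_tokenized_tuples, list_of_all_words) in a single pass."""
    list_of_tokenized_tuples = []
    list_of_all_words = []
    for text, num in list_of_tuples:
        tokens = tokenize(text)
        list_of_tokenized_tuples.append((tokens, num))
        list_of_all_words.extend(tokens)  # flatten as we go
    return list_of_tokenized_tuples, list_of_all_words

pairs = [('here is some text', 1), ('this is more text', 5), ('a final tuple', 12)]
tokenized, words = tokenize_all(pairs)
```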
I wrote this generator for you. If you want to create a list, there isn't much else you can do (except a list comprehension). With that in mind, please see below: depending on the m argument, it yields either the list of (words, number) tuples ('list') or the flattened list of all words ('tokenize'). I'm sure you could change it a bit to suit your needs or preferences.
import timeit, random
list_of_tuples = [('here is some text', 1), ('this is more text', 5), ('a final tuple', 12)]
big_list = [random.choice(list_of_tuples) for x in range(1000)]
def gen(lot=big_list, m='tokenize'):
    list_all_words = []
    tokenised_words = []
    i1 = 0
    i2 = 0
    i3 = 0
    lol1 = len(lot)
    while i1 < lol1:
        # yield lot[i1]
        lol2 = len(lot[i1])
        while i2 < lol2:
            if type(lot[i1][i2]) == str:
                # pair the tokens with the tuple's own integer, as in the desired output
                list_all_words.append((lot[i1][i2].split(), lot[i1][1]))
            i2 += 1
        i1 += 1
        i2 = 0
    # print(list_all_words)
    lol3 = len(list_all_words)
    while i3 < lol3:
        tokenised_words += list_all_words[i3][0]
        i3 += 1
    if m == 'list':
        yield list_all_words
    if m == 'tokenize':
        yield tokenised_words
for x in gen():
    print(x)
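print(timeit.timeit(gen))
# Output of timeit: 0.2610903770813007
# Caveat: timeit(gen) only measures creating the generator object; the loops
# inside gen() don't run until the generator is iterated, e.g. list(gen()).
# This should be unnoticeable on system resources, I would have thought.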

Finding the original positions of words in a sentence when the word occurs more than once

I need to find the positions of words in a sentence the user inputs, and if a word occurs more than once, only print the position of its first occurrence.
I have this code so far:
sentence=input("Enter a sentence: ")
sentence=sentence.lower()
words=sentence.split()
place=[]
for c,a in enumerate(words):
    if words.count(a)>2 :
        place.append(words.index(a+1))
    else:
        place.append(c+1)
print(sentence)
print(place)
But it prints the positions of the individual words in the sentence rather than repeating the original position of a word that occurs more than once.
Can anyone help me with this?
If you are using Python 2, use raw_input instead of input, otherwise the input will be eval'ed. That isn't a problem, just an observation (you're probably using Python 3, so I'll leave it that way).
You could create a dict to track word counts and positions found. This is basically a dict of lists. The dict being a map of words to a list of positions.
sentence=input("Enter a sentence: ")
sentence=sentence.lower()
words=sentence.split()
place={}
for pos, word in enumerate(words):
    try:
        place[word].append(pos)
    except KeyError:
        place[word] = [pos]
print(sentence)
print(place)
Also, if you wanted to do something a little more advanced with your sentence parsing, you could do:
import re
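words = re.split(r'\W+', sentence)
This uses any run of non-word characters (anything other than letters, digits and underscore: spaces, commas, colons, etc.) as a separator. Just note you can get an empty-string entry this way (e.g. at the end, if the sentence ends with punctuation).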
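The try/except KeyError pattern in the dict-of-lists code is what collections.defaultdict(list) does for you. A sketch, using a fixed sentence instead of input() so it runs unattended; the first position of each word is then just the first entry of its list:

```python
from collections import defaultdict

sentence = "hello world hello people hello everyone"
words = sentence.lower().split()

place = defaultdict(list)  # missing keys start out as an empty list
for pos, word in enumerate(words):
    place[word].append(pos)

# The first occurrence of each word is the first entry in its list.
first = {word: positions[0] for word, positions in place.items()}
```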
Your code needs some modifications to achieve what you are trying to do:
if words.count(a)>2 : It should be if words.count(a)>1 since count would be more than 1 if the word is repeated.
place.append(words.index(a+1)) : It should be place.append(words.index(a)+1) since you want to find the index of a and then add 1 to it.
The modified code based on the suggestions:
sentence=input("Enter a sentence: ")
sentence=sentence.lower()
words=sentence.split()
place=[]
for c,a in enumerate(words):
    if words.count(a)>1 :
        place.append(words.index(a)+1)
    else:
        place.append(c+1)
print(sentence)
print(place)
Output:
Enter a sentence: "hello world hello people hello everyone"
hello world hello people hello everyone
[1, 2, 1, 4, 1, 6]
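One thing to be aware of: words.count(a) and words.index(a) each scan the whole list, so that loop is quadratic in the number of words. For long inputs, a dict that remembers the first (1-based) position of each word gives the same result in one pass. A sketch, again with a fixed sentence in place of input():

```python
sentence = "hello world hello people hello everyone"
words = sentence.lower().split()

first_pos = {}
place = []
for pos, word in enumerate(words, 1):  # 1-based positions, like the code above
    first_pos.setdefault(word, pos)    # stored only the first time word is seen
    place.append(first_pos[word])

print(place)  # [1, 2, 1, 4, 1, 6]
```

This also removes the need for the count check: for a word that occurs once, its first position is simply its own position.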
Split the string
>>> s = '''and but far and so la ti but'''
>>> s = s.split()
>>> s
['and', 'but', 'far', 'and', 'so', 'la', 'ti', 'but']
Use set to find the unique words (a set has no particular order) and use the list.index method to find the first position of each unique word.
>>> map(s.index, set(s))
[0, 5, 2, 1, 4, 6]
Zip the result of that with the unique words to associate each word with its position.
>>> zip(set(s),map(s.index, set(s)))
[('and', 0), ('la', 5), ('far', 2), ('but', 1), ('so', 4), ('ti', 6)]
>>>
I suppose a list comprehension might be easier to read:
>>> s = '''and but far and so la ti but'''
>>> s = s.split()
>>> result = [(word, s.index(word)) for word in set(s)]
>>> result
[('and', 0), ('la', 5), ('far', 2), ('but', 1), ('so', 4), ('ti', 6)]
>>>
Sort on position
>>> import operator
>>> position = operator.itemgetter(1)
>>> result.sort(key = position)
>>> result
[('and', 0), ('but', 1), ('far', 2), ('so', 4), ('la', 5), ('ti', 6)]
>>>

Ordered tally of the cumulative number of unique words seen by a given position

I have a list of words given below (example):
['the', 'counter', 'starts', 'the', 'starts', 'for']
I want to process this list in order and generate a pair (x,y) where x is incremented with each word and y is incremented only when it sees a word it hasn't seen before.
So for the given example, my output should be: [(1,1), (2,2), (3,3), (4,3), (5,3), (6,4)]
I am not sure how to do this in Python. It would be great if I could get some insight on how to do this.
Thanks.
Try this:
>>> from collections import Counter
>>> data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
>>> tally = Counter()
>>> for elem in data:
...     tally[elem] += 1
...
>>> tally
Counter({'starts': 2, 'the': 2, 'counter': 1, 'for': 1})
from here: http://docs.python.org/2/library/collections.html
Of course, this results in a dictionary, not a list. I don't know of a way to convert this dict into the list you want (some zip function?).
Hope it's of some help to someone.
>>> words = ['the', 'counter', 'starts', 'the', 'starts', 'for']
>>> uniq = set()
>>> result = []
>>> for i, word in enumerate(words, 1):
...     uniq.add(word)
...     result.append((i, len(uniq)))
...
>>> result
[(1, 1), (2, 2), (3, 3), (4, 3), (5, 3), (6, 4)]
Use collections.Counter for counting occurrences:
I appreciate this doesn't directly answer your question, but it presents the canonical, pythonic way to count things, as a response to the manual Counter loop in another answer.
from collections import Counter
data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
counter = Counter(data)
The result is a dict-like object whose counts can be accessed via the keys:
>>> counter['the']
2
You can also call Counter.items() to generate an unordered list of (element, count) pairs:
>>> counter.items()
[('starts', 2), ('the', 2), ('counter', 1), ('for', 1)]
The output you want is slightly weird; it might be worth rethinking why you need the data in that format.
Like this:
>>> seen = set()
>>> words = ['the', 'counter', 'starts', 'the', 'starts', 'for']
>>> for x, w in enumerate(words, 1):
...     seen.add(w)
...     print(x, len(seen))
...
(1, 1)
(2, 2)
(3, 3)
(4, 3)
(5, 3)
(6, 4)
In actual practice, I'd make a generator function that successively yields the counts, instead of printing them:
def uniq_count(lst):
    seen = set()
    for w in lst:
        seen.add(w)
        yield len(seen)
counts = list(enumerate(uniq_count(words), 1))
Note here that I have also separated the logic of the two counts. Since enumerate does just what you need for the first number in each pair, it's easier just to handle the second number in the generator and let enumerate handle the first.
data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
print [(i, len(set(data[:i]))) for i, v in enumerate(data, 1)]
A dictionary like the one mentioned in your comment can be created as follows:
data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
print {j: data.count(j) for j in set(data)}
