Sequence Generation with Number applied to string


I have tried sequence generators such as lambdas, list comprehensions and others, but I can't seem to get what I really want. My final goal is to print sequences of words from a string, like string[1:3].
What I am looking for:
a = [0,13,26,39]
b = [12,25,38,51]
str = 'If you are done with the file, move to the command area across from the file name in the RL screen and type'
read = str.split()
read[0:12]
['If', 'you', 'are', 'done', 'with', 'the', 'file,', 'move', 'to', 'the', 'command', 'area']
read[13:25]
['from', 'the', 'file', 'name', 'in', 'the', 'RL', 'screen', 'and', 'type']

Use zip:
>>> a = [0,13,26,39]
>>> b = [12,25,38,51]
>>> strs = 'If you are done with the file, move to the command area across from the file name in the RL screen and type'
>>> spl = strs.split()
>>> for x,y in zip(a,b):
...     print spl[x:y]
...
['If', 'you', 'are', 'done', 'with', 'the', 'file,', 'move', 'to', 'the', 'command', 'area']
['from', 'the', 'file', 'name', 'in', 'the', 'RL', 'screen', 'and', 'type']
[]
[]
In Python 2, zip returns a list of tuples, where each tuple contains the items at the same index in each of the iterables passed to it:
>>> zip(a,b)
[(0, 12), (13, 25), (26, 38), (39, 51)]
Use itertools.izip if you want a memory-efficient solution, as it returns an iterator. (In Python 3, zip itself returns an iterator and izip no longer exists.)
You can use str.join if you want to create a string from that sliced list:
>>> for x,y in zip(a,b):
...     print " ".join(spl[x:y])
...
If you are done with the file, move to the command area
from the file name in the RL screen and type
Update: Creating a and b:
>>> n = 5
>>> a = range(0, 13*n, 13)
>>> b = [ x + 12 for x in a]
>>> a
[0, 13, 26, 39, 52]
>>> b
[12, 25, 38, 51, 64]
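If the chunk width (12 words) and the stride (13 words) are fixed, as the a and b lists above suggest, the index pairs do not have to be stored at all. A minimal Python 3 sketch, assuming those two numbers; word_slices is a hypothetical helper name:

```python
def word_slices(text, width=12, stride=13):
    """Yield successive word slices: words[0:12], words[13:25], ...

    width and stride default to the values implied by the question's a/b lists
    (an assumption). range() stops at the last word, so no empty lists appear.
    """
    words = text.split()
    for start in range(0, len(words), stride):
        yield words[start:start + width]


text = 'If you are done with the file, move to the command area across from the file name in the RL screen and type'
for chunk in word_slices(text):
    print(" ".join(chunk))
```

Because the loop is bounded by the number of words, the trailing [] results produced by the fixed index lists do not show up.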

Do you mean:
>>> [read[i:j] for i, j in zip(a,b)]
[['If', 'you', 'are', 'done', 'with', 'the', 'file,', 'move', 'to', 'the',
'command', 'area'], ['from', 'the', 'file', 'name', 'in', 'the', 'RL',
'screen', 'and', 'type'], [], []]
or
>>> ' '.join([read[i:j] for i, j in zip(a,b)][0])
'If you are done with the file, move to the command area'
>>> ' '.join([read[i:j] for i, j in zip(a,b)][1])
'from the file name in the RL screen and type'

a = [0,13,26,39]
b = [12,25,38,51]
str = 'If you are done with the file, move to the command area across from the file name in the RL screen and type'
read = str.split()
extra_lists = [read[start:end] for start,end in zip(a,b)]
print extra_lists

You mentioned a lambda, so:
f = lambda s, i, j: s.split()[i:j]
>>> f("hello world how are you",0,2)
['hello', 'world']
It seems you're keeping the slice indices in two separate lists; might I suggest a dictionary or a list of tuples instead?
text = 'If you are done with the file, move to the command area across from the file name in the RL screen and type'
slices = [(0, 12), (13, 25)]
dslices = {0: 12, 13: 25}
for pair in slices:
    print f(text, pair[0], pair[1])
for key in dslices:
    print f(text, key, dslices[key])
I'm not a fan of using zip when you have the option of just formatting your data better.
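Python's built-in slice objects go one step further in that direction: each one bundles a start and a stop, so the pairs cannot drift out of sync. A Python 3 sketch reusing the question's first two index pairs:

```python
text = 'If you are done with the file, move to the command area across from the file name in the RL screen and type'
words = text.split()

# Each slice object holds one (start, stop) pair; words[s] is the same as words[start:stop].
slices = [slice(0, 12), slice(13, 25)]

for s in slices:
    print(" ".join(words[s]))
```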

Related

python: tokenize list of tuples without for loop

I have a list of 2 million tuples, where the first element is text and the second is an integer, e.g.:
list_of_tuples = [('here is some text', 1), ('this is more text', 5), ('a final tuple', 12)]
I would like to tokenize the first item in each tuple and also collect all of the words into a single flattened list, so the desired output would be:
list_of_tokenized_tuples = [(['here', 'is', 'some', 'text'], 1), (['this', 'is', 'more', 'text'], 5), (['a', 'final', 'tuple'], 12)]
list_of_all_words = ['here', 'is', 'some', 'text', 'this', 'is', 'more', 'text', 'a', 'final', 'tuple']
So far, I believe I have found a way to achieve this with a for loop; however, due to the length of the list, it's really time-intensive. Is there any way that I can tokenize the first item in the tuples and/or flatten the list of all words in a way that doesn't involve loops?
from nltk.tokenize import word_tokenize  # assumed: the tokenizer in use is NLTK's
list_of_tokenized_tuples = []
list_of_all_words = []
for text, num in list_of_tuples:
    tokenized_text = list(word_tokenize(text))
    tokenized_tuples = (tokenized_text, num)
    list_of_all_words.append(tokenized_text)
    list_of_tokenized_tuples.append(tokenized_tuples)
list_of_all_words = [val for sublist in list_of_all_words for val in sublist]

Using itertools you could write it as:
from itertools import chain, imap
chain.from_iterable(imap(lambda (text,_): word_tokenize(text), list_of_tuples))
Testing this:
from itertools import chain, imap
def word_tokenize(text):
    return text.split() # insert your tokenizer here
ts = [('here is some text', 1), ('this is more text', 5), ('a final tuple', 12)]
print list( chain.from_iterable(imap(lambda (t,_): word_tokenize(t), ts)) )
Output
['here', 'is', 'some', 'text', 'this', 'is', 'more', 'text', 'a', 'final', 'tuple']
I'm not sure what this buys you, though, as the itertools functions still loop internally.
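The imap function and the tuple-parameter lambda used above exist only in Python 2 (Python 3 removed both, and its map is already lazy). A minimal Python 3 sketch that builds both of the outputs the question asks for, with str.split standing in for the real tokenizer (an assumption; swap in word_tokenize):

```python
from itertools import chain


def word_tokenize(text):
    # Stand-in tokenizer (assumption): replace with your real tokenizer.
    return text.split()


list_of_tuples = [('here is some text', 1), ('this is more text', 5), ('a final tuple', 12)]

# Tokenize each text once, keeping its integer next to the tokens.
list_of_tokenized_tuples = [(word_tokenize(text), num) for text, num in list_of_tuples]

# Flatten the per-tuple token lists into one list of words.
list_of_all_words = list(chain.from_iterable(tokens for tokens, _ in list_of_tokenized_tuples))
print(list_of_all_words)
```

This still loops (a comprehension is a loop), but each text is tokenized only once and those tokens are reused for the flat list.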

TL;DR
>>> from itertools import chain
>>> list_of_tuples = [('here is some text', 1), ('this is more text', 5), ('a final tuple', 12)]
# Split up your list(str) from the int
>>> texts, nums = zip(*list_of_tuples)
# Go into each string and split by whitespaces,
# Then flatten the list of list of str to list of str
>>> list_of_all_words = list(chain(*map(str.split, texts)))
>>> list_of_all_words
['here', 'is', 'some', 'text', 'this', 'is', 'more', 'text', 'a', 'final', 'tuple']
If you need to use word_tokenize, then:
list_of_all_words = list(chain(*map(word_tokenize, texts)))

I wrote this generator for you. If you want to create a list, there isn't much else you can do (except a list comprehension). With that in mind, see below: depending on the m parameter, it yields either the tokenized tuples (m='list') or the flat list of all words (m='tokenize'), as two separate outputs. I doubt that matters too much, and I'm sure you could change it a bit to suit your needs or preferences.
import timeit, random
list_of_tuples = [('here is some text', 1), ('this is more text', 5), ('a final tuple', 12)]
big_list = [random.choice(list_of_tuples) for x in range(1000)]
def gen(lot=big_list, m='tokenize'):
    list_all_words = []
    tokenised_words = []
    i1 = 0
    i2 = 0
    i3 = 0
    lol1 = len(lot)
    while i1 < lol1:
        # yield lot[i1]
        lol2 = len(lot[i1])
        while i2 < lol2:
            if type(lot[i1][i2]) == str:
                # pair the tokens with the tuple's own integer, not its position
                list_all_words.append((lot[i1][i2].split(), lot[i1][1]))
            i2 += 1
        i1 += 1
        i2 = 0
    # print(list_all_words)
    lol3 = len(list_all_words)
    while i3 < lol3:
        tokenised_words += list_all_words[i3][0]
        i3 += 1
    if m == 'list':
        yield list_all_words
    if m == 'tokenize':
        yield tokenised_words
for x in gen():
    print(x)
# Note: timeit.timeit(gen) only times creating the generator object;
# gen's body does not run until the generator is iterated.
print(timeit.timeit(gen))
# Output of timeit: 0.2610903770813007
# This should be unnoticeable on system resources, I would have thought.
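Because calling gen() only returns a generator object and runs none of its body until it is iterated, timeit.timeit(gen) measures generator creation rather than the tokenizing. A rough Python 3 sketch that times the actual work; flatten_words is a hypothetical helper, and number=100 is an arbitrary choice whose results will vary by machine:

```python
import random
import timeit
from itertools import chain

list_of_tuples = [('here is some text', 1), ('this is more text', 5), ('a final tuple', 12)]
big_list = [random.choice(list_of_tuples) for _ in range(1000)]


def flatten_words(lot):
    # Build the full flat word list, so the timing includes the real work.
    return list(chain.from_iterable(text.split() for text, _ in lot))


# The lambda forces each timed call to produce the complete result.
elapsed = timeit.timeit(lambda: flatten_words(big_list), number=100)
print("100 runs: %.4f s" % elapsed)
```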

How to get number of occurences of items after certain item in list

I have a list of strings like the one below, written in Python. What I want to do now is get the number of occurrences of the string 'you' after each string 'hello'. The output should then be something like 0 for the first two 'hello's, 2 for the third 'hello', 1 for the fourth 'hello', and so on.
Does anyone know how to do that exactly?
my_list = ['hello', 'hello', 'hello', 'you', 'you', 'hello', 'you', 'hello',
'you', 'you', 'you', ...]
Update:
Solved it myself, though Karan Elangovan's approach also works.
This is how I did it:
list_counter = []
counter = 0
# I reverse the list because the loop below counts the number of
# occurrences of 'you' behind each 'hello', not in front of it
my_list_rev = reversed(my_list)
for m in my_list_rev:
    if m == 'you':
        counter += 1
    elif m == 'hello':
        list_counter.append(counter)
        counter = 0
# reverse the output to match it with my_list
list_counter = list(reversed(list_counter))
print(list_counter)
This outputs:
[0, 0, 2, 1, 3]
for:
my_list = ['hello', 'hello', 'hello', 'you', 'you', 'hello', 'you', 'hello',
'you', 'you', 'you']
Maybe not the best approach, as you have to reverse both the original list and the list with the results to get the correct output, but it works for this problem.

Try this one-liner:
from functools import reduce  # needed in Python 3; reduce is a builtin in Python 2
reduce(lambda acc, cur: acc + ([] if cur[1] == 'you' else [next((i[0] - cur[0] - 1 for i in list(enumerate(my_list))[cur[0]+1:] if i[1] == 'hello'), len(my_list) - cur[0] - 1)]), enumerate(my_list), [])
It will give a list where the nth element is the number of 'you's following the nth occurrence of 'hello'.
e.g.
If my_list = ['hello', 'hello', 'hello', 'you', 'you', 'hello', 'you', 'hello', 'you', 'you', 'you'],
it will give: [0, 0, 2, 1, 3]
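The answers above either reverse the list twice or rescan the list for every 'hello'. A single forward pass can do the same job by opening a new counter at each 'hello' and adding to the most recent one. A minimal Python 3 sketch; count_after_each is a hypothetical name, and any 'you' before the first 'hello' is ignored (an assumption):

```python
def count_after_each(items, marker='hello', target='you'):
    """Return, for each marker, how many targets follow it before the next marker."""
    counts = []
    for item in items:
        if item == marker:
            counts.append(0)        # start a fresh count for this marker
        elif item == target and counts:
            counts[-1] += 1         # credit the most recent marker
    return counts


my_list = ['hello', 'hello', 'hello', 'you', 'you', 'hello', 'you', 'hello',
           'you', 'you', 'you']
print(count_after_each(my_list))  # [0, 0, 2, 1, 3]
```

Each item is visited once, so the work grows linearly with the length of the list.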

python Split list based on delimiting list value

I've dug through countless other questions but none of them seem to work for me. I've also tried a ton of different things but I don't understand what I need to do. I don't know what else to do.
list:
split_me = ['this', 'is', 'my', 'list', '--', 'and', 'thats', 'what', 'it', 'is!', '--', 'Please', 'split', 'me', 'up.']
I need to:
Split this into a new list every time it finds a "--"
Name each new list after the first value following the "--"
Not include the "--" in the new lists.
So it becomes this:
this=['this', 'is', 'my', 'list']
and=['and', 'thats', 'what', 'it', 'is!']
please=['Please', 'split', 'me', 'up.']
current attempt (Work in progress):
for value in split_me:
    if firstrun:
        newlist=list(value)
        firstrun=False
        continue
    if value == "--":
        #restart? set firstrun to false?
        firstrun=False
        continue
    else:
        newlist.append(value)
print(newlist)

This more or less works, although I had to change some words to get around the reserved-word problem (it's a bad idea to call a variable 'and').
split_me = ['This', 'is', 'my', 'list', '--', 'And', 'thats', 'what', 'it', 'is!', '--', 'Please', 'split', 'me', 'up.']
retval = []
actlist = []
for e in split_me:
    if (e == '--'):
        retval.append(actlist)
        actlist = []
        continue
    actlist.append(e)
if len(actlist) != 0:
    retval.append(actlist)
for l in retval:
    name = l[0]
    cmd = name + " = " + str(l)
    exec( cmd )
print This
print And
print Please

Utilizing itertools.groupby():
from itertools import groupby
dash = "--"
phrases = [list(y) for x, y in groupby(split_me, lambda z: z == dash) if not x]
Initialize a dict and map each list to the first word in that list:
myDict = {}
for phrase in phrases:
    myDict[phrase[0].lower()] = phrase
myDict will then contain:
{'this': ['this', 'is', 'my', 'list'],
 'and': ['and', 'thats', 'what', 'it', 'is!'],
 'please': ['Please', 'split', 'me', 'up.']}

This will actually create global variables named the way you want. Unfortunately, it will not work for Python keywords such as 'and', so I am replacing 'and' with 'And':
split_me = ['this', 'is', 'my', 'list', '--', 'And', 'thats', 'what', 'it',
            'is!', '--', 'Please', 'split', 'me', 'up.']
new = True
while split_me:
    current = split_me.pop(0)
    if current == '--':
        new = True
        continue
    if new:
        globals()[current] = [current]
        newname = current
        new = False
        continue
    globals()[newname].append(current)
A more elegant approach, based on Mangohero1's answer, would be:
from itertools import groupby
dash = '--'
phrases = [list(y) for x, y in groupby(split_me, lambda z: z == dash) if not x]
for l in phrases:
    if not l:
        continue
    globals()[l[0]] = l
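Writing into globals() works, but a dictionary keeps the names contained and sidesteps the keyword problem entirely, since 'and' is a perfectly valid dict key. A Python 3 sketch without itertools; split_on_delimiter is a hypothetical helper, and the keys are lowercased to match the question's this/and/please naming (an assumption):

```python
def split_on_delimiter(items, delimiter='--'):
    """Split items at each delimiter and key each group by its lowercased first word."""
    groups = {}
    current = []
    for item in list(items) + [delimiter]:  # trailing sentinel flushes the final group
        if item == delimiter:
            if current:  # skip empty groups from leading or repeated delimiters
                groups[current[0].lower()] = current
            current = []
        else:
            current.append(item)
    return groups


split_me = ['this', 'is', 'my', 'list', '--', 'and', 'thats', 'what', 'it', 'is!', '--', 'Please', 'split', 'me', 'up.']
phrases = split_on_delimiter(split_me)
print(phrases['and'])  # 'and' works as a dict key, unlike as a variable name
```

If two groups start with the same word, the later group overwrites the earlier one.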

I would try something like
" ".join(split_me).split(' -- ') # as a start
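Taken one step further, the join-and-split idea can give back lists of words. A Python 3 sketch, which assumes no item contains a space and that '--' never appears as the first or last item (otherwise the separator is not surrounded by spaces and the pieces no longer line up with the original items):

```python
split_me = ['this', 'is', 'my', 'list', '--', 'and', 'thats', 'what', 'it', 'is!', '--', 'Please', 'split', 'me', 'up.']

# Join into one string, cut at the " -- " separators, then re-split each piece into words.
chunks = [part.split() for part in " ".join(split_me).split(' -- ')]
print(chunks)
```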

Split strings in a list of lists

I currently have a list of lists:
[['Hi my name is'],['What are you doing today'],['Would love some help']]
And I would like to split the strings in the lists while keeping them in their current positions. For example:
[['Hi','my','name','is'], ...]
How can I do this?
Also, if I want to use a specific one of the lists after searching for it (say I search for "Doing") and then append something to that list, how would I go about doing that?

You can use a list comprehension to create a new list of lists with all the sentences split:
list_of_lists = [lst[0].split() for lst in list_of_lists]
Now you can loop through this and find the list matching a condition:
for sublist in list_of_lists:
    if 'doing' in sublist:
        sublist.append('something')
Or, to search case-insensitively, use any() and a generator expression; this will only test as many words as needed to find a match:
for sublist in list_of_lists:
    if any(w.lower() == 'doing' for w in sublist):
        sublist.append('something')
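To cover the second part of the question directly (finding the list that contains "Doing" and appending to it), the split and the case-insensitive search can be combined. A Python 3 sketch; find_sublist is a hypothetical helper that returns the first matching sublist, or None:

```python
list_of_lists = [['Hi my name is'], ['What are you doing today'], ['Would love some help']]
list_of_lists = [lst[0].split() for lst in list_of_lists]


def find_sublist(lists, word):
    """Return the first sublist containing word (case-insensitive), or None."""
    needle = word.lower()
    return next((sub for sub in lists if any(w.lower() == needle for w in sub)), None)


target = find_sublist(list_of_lists, 'Doing')
if target is not None:
    target.append('something')  # mutates the sublist in place inside list_of_lists
print(list_of_lists)
```

Because lists are mutable, appending to the returned sublist also changes list_of_lists; no index bookkeeping is needed.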

list1 = [['Hi my name is'],['What are you doing today'],['Would love some help']]
Use:
[i[0].split() for i in list1]
and you will get output like:
[['Hi', 'my', 'name', 'is'], ['What', 'are', 'you', 'doing', 'today'], ['Would', 'love', 'some', 'help']]

l = [['Hi my name is'],['What are you doing today'],['Would love some help']]
for x in l:
    l[l.index(x)] = x[0].split(' ')
print l
Or simply:
l = [x[0].split(' ') for x in l]
Output
[['Hi', 'my', 'name', 'is'], ['What', 'are', 'you', 'doing', 'today'], ['Would', 'love', 'some', 'help']]

Python: Split list based on first character of word

I'm kind of stuck on an issue, and I've gone round and round with it until I've confused myself.
What I am trying to do is take a list of words:
['About', 'Absolutely', 'After', 'Aint', 'Alabama', 'AlabamaBill', 'All', 'Also', 'Amos', 'And', 'Anyhow', 'Are', 'As', 'At', 'Aunt', 'Aw', 'Bedlam', 'Behind', 'Besides', 'Biblical', 'Bill', 'Billgone']
Then sort them alphabetically, grouped under their first letter:
A
About
Absolutely
After
B
Bedlam
Behind
etc...
Is there an easy way to do this?

Use itertools.groupby() to group your input by a specific key, such as the first letter:
from itertools import groupby
from operator import itemgetter
for letter, words in groupby(sorted(somelist), key=itemgetter(0)):
    print letter
    for word in words:
        print word
    print
If your list is already sorted, you can omit the sorted() call. The itemgetter(0) callable will return the first letter of each word (the character at index 0), and groupby() will then yield that key plus an iterable that consists only of those items for which the key remains the same. In this case that means looping over words gives you all items that start with the same character.
Demo:
>>> somelist = ['About', 'Absolutely', 'After', 'Aint', 'Alabama', 'AlabamaBill', 'All', 'Also', 'Amos', 'And', 'Anyhow', 'Are', 'As', 'At', 'Aunt', 'Aw', 'Bedlam', 'Behind', 'Besides', 'Biblical', 'Bill', 'Billgone']
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
>>> for letter, words in groupby(sorted(somelist), key=itemgetter(0)):
...     print letter
...     for word in words:
...         print word
...     print
...
A
About
Absolutely
After
Aint
Alabama
AlabamaBill
All
Also
Amos
And
Anyhow
Are
As
At
Aunt
Aw
B
Bedlam
Behind
Besides
Biblical
Bill
Billgone
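itertools.groupby only merges adjacent items with equal keys, which is why the input is sorted first. Note that itemgetter(0) is case-sensitive and plain sorted() puts every uppercase word before every lowercase one, so a word like 'apple' would not land under 'A'. If case should be ignored (an assumption about the desired output), a Python 3 sketch; group_by_initial is a hypothetical name, and it assumes no empty strings:

```python
from itertools import groupby


def group_by_initial(words):
    """Map each upper-cased initial letter to its words, sorted case-insensitively."""
    ordered = sorted(words, key=str.lower)
    return {letter: list(group)
            for letter, group in groupby(ordered, key=lambda w: w[0].upper())}


groups = group_by_initial(['Bill', 'about', 'Aw', 'bedlam'])
for letter, words in groups.items():
    print(letter)
    for word in words:
        print(word)
```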

Here is an approach without any library imports or anything fancy. The logic:
def splitLst(x):
    dictionary = dict()
    for word in x:
        f = word[0]
        if f in dictionary:
            dictionary[f].append(word)
        else:
            dictionary[f] = [word]
    return dictionary
splitLst(['About', 'Absolutely', 'After', 'Aint', 'Alabama', 'AlabamaBill', 'All', 'Also', 'Amos', 'And', 'Anyhow', 'Are', 'As', 'At', 'Aunt', 'Aw', 'Bedlam', 'Behind', 'Besides', 'Biblical', 'Bill', 'Billgone'])
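The dictionary approach above can be shortened with collections.defaultdict, which creates the empty list the first time a key is accessed. A Python 3 sketch; split_by_initial is a hypothetical name, and printing the sorted keys reproduces the A/B headings from the question:

```python
from collections import defaultdict


def split_by_initial(words):
    """Group words by their first character, preserving input order within each group."""
    groups = defaultdict(list)
    for word in words:
        groups[word[0]].append(word)  # a missing key starts as an empty list
    return dict(groups)


groups = split_by_initial(['About', 'Absolutely', 'Bedlam', 'After', 'Behind'])
for letter in sorted(groups):
    print(letter)
    print("\n".join(groups[letter]))
```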

def split(n):
    n2 = []
    for i in n:
        if i[0] not in n2:
            n2.append(i[0])
    n2.sort()
    for j in n:
        z = j[0]
        z1 = n2.index(z)
        n2.insert(z1+1, j)
    return n2
word_list = ['be','have','do','say','get','make','go','know','take','see','come','think',
'look','want','give','use','find','tell','ask','work','seem','feel','leave','call']
print(split(word_list))
