python Split list based on delimiting list value


I've dug through countless other questions, but none of the answers seem to work for me. I've also tried a lot of different things, but I don't understand what I need to do.
The list:
split_me = ['this', 'is', 'my', 'list', '--', 'and', 'thats', 'what', 'it', 'is!', '--', 'Please', 'split', 'me', 'up.']
I need to:
Split this into a new list every time it finds a "--"
Name each new list after its first value (the first value after the "--")
Not include the "--" in the new lists.
So it becomes this:
this=['this', 'is', 'my', 'list']
and=['and', 'thats', 'what', 'it', 'is!']
please=['Please', 'split', 'me', 'up.']
Current attempt (work in progress):
for value in split_me:
    if firstrun:
        newlist=list(value)
        firstrun=False
        continue
    if value == "--":
        #restart? set firstrun to false?
        firstrun=False
        continue
    else:
        newlist.append(value)
print(newlist)

This more or less works, although I had to capitalize some of the words to get around the keyword problem (`and` is a Python keyword, so it can't be used as a variable name).
split_me = ['This', 'is', 'my', 'list', '--', 'And', 'thats', 'what', 'it', 'is!', '--', 'Please', 'split', 'me', 'up.']
retval = []
actlist = []
for e in split_me:
    if e == '--':
        retval.append(actlist)  # a delimiter closes the current group
        actlist = []
        continue
    actlist.append(e)
if len(actlist) != 0:
    retval.append(actlist)  # keep the trailing group
for l in retval:
    name = l[0]
    cmd = name + " = " + str(l)
    exec(cmd)  # defines a module-level variable named after the group's first word
print(This)
print(And)
print(Please)
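The accumulate-and-flush loop above can also be packaged as a small generator, which avoids `exec` entirely and sidesteps the keyword problem because the groups stay plain data in a dict. A minimal sketch; the name `split_on` is hypothetical, not a library function:

```python
def split_on(items, delimiter):
    """Yield sub-lists of items, splitting wherever delimiter occurs (delimiter dropped)."""
    current = []
    for item in items:
        if item == delimiter:
            yield current  # flush the group collected so far
            current = []
        else:
            current.append(item)
    if current:  # flush the trailing group, if any
        yield current


split_me = ['this', 'is', 'my', 'list', '--', 'and', 'thats', 'what', 'it', 'is!',
            '--', 'Please', 'split', 'me', 'up.']

# key each group by its lowercased first word; `if group` skips empty groups
# produced by consecutive delimiters
groups = {group[0].lower(): group for group in split_on(split_me, '--') if group}
print(groups['and'])  # ['and', 'thats', 'what', 'it', 'is!']
```

Looking a group up with `groups['and']` works even though `and` could never be a variable name.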

Using itertools.groupby():
from itertools import groupby
dash = "--"
phrases = [list(y) for x, y in groupby(split_me, lambda z: z == dash) if not x]
Then initialize a dict and map each list to its first word, lowercased:
myDict = {}
for phrase in phrases:
    myDict[phrase[0].lower()] = phrase
Afterwards, myDict contains:
{'this': ['this', 'is', 'my', 'list'],
 'and': ['and', 'thats', 'what', 'it', 'is!'],
 'please': ['Please', 'split', 'me', 'up.']}
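To see why the `if not x` filter is needed, it helps to look at what `groupby` actually yields: with the key `z == dash`, each run of consecutive items gets a boolean key, and the delimiter runs come out with the key `True`. A short demonstration:

```python
from itertools import groupby

split_me = ['this', 'is', 'my', 'list', '--', 'and', 'thats', 'what', 'it', 'is!',
            '--', 'Please', 'split', 'me', 'up.']
dash = "--"

# groupby yields (key, run) pairs: key is True for runs of "--", False otherwise
runs = [(key, list(run)) for key, run in groupby(split_me, lambda z: z == dash)]
for key, run in runs:
    print(key, run)
# False ['this', 'is', 'my', 'list']
# True ['--']
# False ['and', 'thats', 'what', 'it', 'is!']
# True ['--']
# False ['Please', 'split', 'me', 'up.']
```

Dropping the `True` runs leaves exactly the phrases.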

This will actually create global variables with the names you want. Unfortunately, a Python keyword such as and can't be used as a variable name, so I am replacing 'and' with 'And':
split_me = ['this', 'is', 'my', 'list', '--', 'And', 'thats', 'what', 'it',
            'is!', '--', 'Please', 'split', 'me', 'up.']
new = True
while split_me:
    current = split_me.pop(0)
    if current == '--':
        new = True
        continue
    if new:
        globals()[current] = [current]
        newname = current
        new = False
        continue
    globals()[newname].append(current)
A more elegant approach, based on @Mangohero1's answer, would be:
from itertools import groupby
dash = '--'
phrases = [list(y) for x, y in groupby(split_me, lambda z: z == dash) if not x]
for l in phrases:
    if not l:
        continue
    globals()[l[0]] = l

I would try something like this as a start:
" ".join(split_me).split(' -- ')

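Carried through, the join-and-split idea might look like the sketch below. It assumes no element of `split_me` contains a space and that the list neither starts nor ends with `--`, since the approach round-trips through a single string and only splits on ` -- ` with spaces on both sides.

```python
split_me = ['this', 'is', 'my', 'list', '--', 'and', 'thats', 'what', 'it', 'is!',
            '--', 'Please', 'split', 'me', 'up.']

# Join into one string, split on the delimiter, then split each chunk back into words.
# Assumption: no element contains a space, otherwise words would be split apart.
chunks = " ".join(split_me).split(" -- ")
phrases = [chunk.split() for chunk in chunks]
by_first_word = {phrase[0].lower(): phrase for phrase in phrases if phrase}
print(by_first_word['please'])  # ['Please', 'split', 'me', 'up.']
```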
Related

How to extract specific words from a string?

I have to extract two things from a string: A list that contains stop-words, and another list that contains the rest of the string.
text = 'he is the best when people in our life'
stopwords = ['he', 'the', 'our']
contains_stopwords = []
normal_words = []
for i in text.split():
    for j in stopwords:
        if i in j:
            contains_stopwords.append(i)
        else:
            normal_words.append(i)
if text.split() in stopwords:
    contains_stopwords.append(text.split())
else:
    normal_words.append(text.split())
print("contains_stopwords:", contains_stopwords)
print("normal_words:", normal_words)
Output:
contains_stopwords: ['he', 'he', 'the', 'our']
normal_words: ['he', 'is', 'is', 'is', 'the', 'the', 'best', 'best', 'best', 'when', 'when', 'when', 'people', 'people', 'people', 'in', 'in', 'in', 'our', 'our', 'life', 'life', 'life', ['he', 'is', 'the', 'best', 'when', 'people', 'in', 'our', 'life']]
Desired result:
contains_stopwords: ['he', 'the', 'our']
normal_words: ['is', 'best', 'when', 'people', 'in', 'life']

One answer could be:
text = 'he is the best when people in our life'
stopwords = ['he', 'the', 'our']
contains_stopwords = set()  # a set guarantees there won't be any duplicates (its order is arbitrary)
normal_words = []
for word in text.split():
    if word in stopwords:
        contains_stopwords.add(word)
    else:
        normal_words.append(word)
print("contains_stopwords:", contains_stopwords)
print("normal_words:", normal_words)

You seem to have chosen the most difficult path. The code below should do the trick:
for word in text.split():
    if word in stopwords:
        contains_stopwords.append(word)
    else:
        normal_words.append(word)
First, we split the text into a list of words with split(), then we iterate over it and check whether each word is in the list of stopwords (yes, Python lets you test list membership with in). If it is, we append it to contains_stopwords; if not, we append it to normal_words.
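The difference matters because the original `if i in j` compared each word against a single stopword *string*, and `in` between two strings is a substring test. That is why `'he'` matched both `'he'` and `'the'`. Against a list (or a set), `in` checks for an equal element instead:

```python
word = 'he'

# `in` between two strings is a substring test...
substring_hit = word in 'the'         # True: 'he' occurs inside 'the'
# ...while `in` against a list tests for an equal element.
element_hit = word in ['the', 'our']  # False: no element equals 'he'

print(substring_hit, element_hit)     # True False
```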

Use list comprehensions, and eliminate duplicates by turning each list into dictionary keys with dict.fromkeys() and converting the result back to a list:
itext = 'he is the best when people in our life'
stopwords = ['he', 'the', 'our']
split_words = itext.split(' ')
contains_stopwords = list(dict.fromkeys([word for word in split_words if word in stopwords]))
normal_words = list(dict.fromkeys([word for word in split_words if word not in stopwords]))
print("contains_stopwords:", contains_stopwords)
print("normal_words:", normal_words)
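`dict.fromkeys` is a good fit here because dictionaries preserve insertion order (guaranteed since Python 3.7), so the deduplicated list keeps the order in which words first appeared. A `set` also removes duplicates but gives no ordering guarantee. A quick comparison:

```python
words = ['he', 'is', 'he', 'the', 'he', 'our', 'the']

# dict keys keep first-seen order, so this deduplicates without reordering
ordered = list(dict.fromkeys(words))
print(ordered)  # ['he', 'is', 'the', 'our']

# a set also deduplicates, but its iteration order is not guaranteed,
# so it is sorted here just to get a stable result
via_set = sorted(set(words))
print(via_set)  # ['he', 'is', 'our', 'the']
```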

A set comprehension removes duplicates from the stopword matches. I converted the set back to a list as per your question, but you can leave it as a set:
text = 'he is the best when people in our life he he he'
stopwords = ['he', 'the', 'our']
list1 = list({item for item in text.split(" ") if item in stopwords})
list2 = [item for item in text.split(" ") if item not in list1]
Output (the order of list1 may vary, since it comes from a set):
list1: ['he', 'the', 'our']
list2: ['is', 'best', 'when', 'people', 'in', 'life']

text = 'he is the best when people in our life'
# I suggest making `stopwords` a set, because a membership
# test (`in`) on a set takes O(1) time on average
stopwords = {'he', 'the', 'our'}
contains_stopwords = []
normal_words = []
for word in text.split():
    if word in stopwords:  # membership check
        contains_stopwords.append(word)
    else:
        normal_words.append(word)
print("contains_stopwords:", contains_stopwords)
print("normal_words:", normal_words)

Python: list/array, remove doubled words string

I want to keep only the elements that consist of a single word and put them in a new array.
How could I do this in Python?
Array:
['somewhat', 'all', 'dictator', 'was called', 'was', 'main director', 'in']
NewArray should be:
['somewhat', 'all', 'dictator', 'was', 'in']

Try this:
a = ['somewhat', 'all', 'dictator', 'was called', 'was', 'main director', 'in']
print([i for i in a if " " not in i])
Output:
['somewhat', 'all', 'dictator', 'was', 'in']

Filter the list with a list comprehension:
old_list = ['somewhat', 'all', 'dictator', 'was called', 'was', 'main director', 'in']
new_list = [x for x in old_list if len(x.split()) == 1]
Returns:
['somewhat', 'all', 'dictator', 'was', 'in']
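The two tests above are not quite equivalent: `" " not in i` only looks for a literal space, while `len(x.split()) == 1` splits on any run of whitespace and ignores leading or trailing whitespace. A small comparison using some made-up inputs containing a tab, padding, and an empty string:

```python
items = ['was called', 'tab\tseparated', '  padded  ', '', 'single']

no_space = [i for i in items if " " not in i]
one_token = [x for x in items if len(x.split()) == 1]

print(no_space)   # ['tab\tseparated', '', 'single']
print(one_token)  # ['  padded  ', 'single']
```

Which behavior is right depends on whether tabs and padding can appear in your data.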

Using re.match and filter:
import re
MATCH_SINGLE_WORD = re.compile(r"^\w+$")
inp = ['somewhat', 'all', 'dictator', 'was called', 'was', 'main director', 'in']
out = filter(MATCH_SINGLE_WORD.match, inp)
print(list(out))  # in Python 3, filter() returns a lazy iterator that can only be consumed once
This also rejects elements containing an embedded tab or newline, not just a space, because \w never matches whitespace. Note that it rejects punctuation such as hyphens and apostrophes too.
If you also want to tolerate leading and trailing whitespace, strip each element first:
import re
from operator import methodcaller
MATCH_SINGLE_WORD = re.compile(r"^\w+$")
inp = ['somewhat', 'all', 'dictator', 'was called', 'was', 'main director', 'in']
out = filter(MATCH_SINGLE_WORD.match, map(methodcaller("strip"), inp))
print(list(out))  # yields the stripped strings
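One caveat with the `^\w+$` pattern: without `re.MULTILINE`, `$` also matches just before a trailing newline, so an element like `'was\n'` still passes. If that matters, `re.fullmatch` (available since Python 3.4) requires the pattern to cover the whole string:

```python
import re

MATCH_SINGLE_WORD = re.compile(r"^\w+$")
SINGLE_WORD = re.compile(r"\w+")

# Without re.MULTILINE, `$` also matches just before a final newline
trailing_newline_passes = bool(MATCH_SINGLE_WORD.match("was\n"))  # True
# fullmatch() requires the pattern to cover the entire string
fullmatch_rejects = SINGLE_WORD.fullmatch("was\n") is None          # True
fullmatch_accepts = SINGLE_WORD.fullmatch("was") is not None        # True

print(trailing_newline_passes, fullmatch_rejects, fullmatch_accepts)
```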

IndexError: list index out of range and python(With array 2D)

title_list = [['determined', 'by', 'saturation', 'transfer', '31P', 'NMR'], ['Interactions', 'of', 'the', 'F1', 'ATPase', 'subunits', 'from', 'Escherichia', 'coli', 'detected', 'by', 'the', 'yeast', 'two', 'hybrid', 'system']]
pc_title_list = [[]]
print(title_list[1][0].isalpha() == True)
for i in range(len(title_list)):
    for j in range(len(title_list[i])):
        if (title_list[i][j].isalpha() == True):
            pc_title_list[i].append(title_list[i][j].lower())
And now I'm stuck on this: IndexError: list index out of range.

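The error isn't caused by range() being 0-based: range(len(x)) already yields exactly the valid indices 0 to len(x) - 1. The real problem is that pc_title_list = [[]] holds only one inner list, so as soon as i reaches 1, pc_title_list[1] doesn't exist. Subtracting 1 from the ranges, as in the loop below, only avoids the error by skipping the last title and the last word of each title, so data goes missing. You don't need indices at all (you can just do for i in title_list); see the list comprehension option below: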
title_list = [['determined', 'by', 'saturation', 'transfer', '31P', 'NMR'],
              ['Interactions', 'of', 'the', 'F1', 'ATPase', 'subunits', 'from',
               'Escherichia', 'coli', 'detected', 'by', 'the', 'yeast', 'two',
               'hybrid', 'system']]
pc_title_list = [[]]
print(title_list[1][0].isalpha() == True)
for i in range(len(title_list) - 1):
    for j in range(len(title_list[i]) - 1):
        if (title_list[i][j].isalpha() == True):
            pc_title_list[i].append(title_list[i][j].lower())
print('for loop: ', pc_title_list)  # items are missing: "- 1" skips the last title and each title's last word
# list comprehension version, much more concise
pc_title_list2 = [[j.lower()
                   for j in i
                   if j.isalpha()]
                  for i in title_list]
print('list comprehension: ', pc_title_list2)
Output:
True
for loop: [['determined', 'by', 'saturation', 'transfer']]
list comprehension: [['determined', 'by', 'saturation', 'transfer', 'nmr'], ['interactions', 'of', 'the', 'atpase', 'subunits', 'from', 'escherichia', 'coli', 'detected', 'by', 'the', 'yeast', 'two', 'hybrid', 'system']]
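If you prefer to keep the index-based loop, the fix is to create one inner list per title instead of starting from `[[]]`, and to keep the full `range(len(...))` bounds without subtracting 1. A minimal sketch of that corrected loop, which produces the same result as the list comprehension:

```python
title_list = [['determined', 'by', 'saturation', 'transfer', '31P', 'NMR'],
              ['Interactions', 'of', 'the', 'F1', 'ATPase', 'subunits', 'from',
               'Escherichia', 'coli', 'detected', 'by', 'the', 'yeast', 'two',
               'hybrid', 'system']]

pc_title_list = []
for i in range(len(title_list)):      # already 0 .. len-1, no "- 1" needed
    pc_title_list.append([])          # create row i before appending to it
    for j in range(len(title_list[i])):
        if title_list[i][j].isalpha():
            pc_title_list[i].append(title_list[i][j].lower())

print(pc_title_list)
```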

Python: Split list based on first character of word

I'm kind of stuck on an issue, and I've gone round and round with it until I've confused myself.
What I am trying to do is take a list of words:
['About', 'Absolutely', 'After', 'Aint', 'Alabama', 'AlabamaBill', 'All', 'Also', 'Amos', 'And', 'Anyhow', 'Are', 'As', 'At', 'Aunt', 'Aw', 'Bedlam', 'Behind', 'Besides', 'Biblical', 'Bill', 'Billgone']
Then sort them under alphabetical headings, like this:
A
About
Absolutely
After
B
Bedlam
Behind
etc...
Is there an easy way to do this?

Use itertools.groupby() to group your input by a specific key, such as the first letter:
from itertools import groupby
from operator import itemgetter
for letter, words in groupby(sorted(somelist), key=itemgetter(0)):
    print(letter)
    for word in words:
        print(word)
    print()
If your list is already sorted, you can omit the sorted() call. The itemgetter(0) callable will return the first letter of each word (the character at index 0), and groupby() will then yield that key plus an iterable that consists only of those items for which the key remains the same. In this case that means looping over words gives you all items that start with the same character.
Demo:
>>> somelist = ['About', 'Absolutely', 'After', 'Aint', 'Alabama', 'AlabamaBill', 'All', 'Also', 'Amos', 'And', 'Anyhow', 'Are', 'As', 'At', 'Aunt', 'Aw', 'Bedlam', 'Behind', 'Besides', 'Biblical', 'Bill', 'Billgone']
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
>>> for letter, words in groupby(sorted(somelist), key=itemgetter(0)):
...     print(letter)
...     for word in words:
...         print(word)
...     print()
...
A
About
Absolutely
After
Aint
Alabama
AlabamaBill
All
Also
Amos
And
Anyhow
Are
As
At
Aunt
Aw
B
Bedlam
Behind
Besides
Biblical
Bill
Billgone
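The `sorted()` call matters whenever the input is not already ordered: `groupby()` only groups *consecutive* items with the same key, so an unsorted list yields the same letter more than once. A small illustration with a made-up unsorted list:

```python
from itertools import groupby
from operator import itemgetter

words = ['Bill', 'About', 'Behind', 'After']

# without sorting, each change of first letter starts a new group
unsorted_keys = [letter for letter, _ in groupby(words, key=itemgetter(0))]
# after sorting, all words sharing a first letter are adjacent
sorted_keys = [letter for letter, _ in groupby(sorted(words), key=itemgetter(0))]

print(unsorted_keys)  # ['B', 'A', 'B', 'A']
print(sorted_keys)    # ['A', 'B']
```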

Here is a version that doesn't use any library imports or anything fancy:
def splitLst(x):
    dictionary = dict()
    for word in x:
        f = word[0]  # the first letter is the grouping key
        if f in dictionary:
            dictionary[f].append(word)
        else:
            dictionary[f] = [word]
    return dictionary
print(splitLst(['About', 'Absolutely', 'After', 'Aint', 'Alabama', 'AlabamaBill', 'All', 'Also', 'Amos', 'And', 'Anyhow', 'Are', 'As', 'At', 'Aunt', 'Aw', 'Bedlam', 'Behind', 'Besides', 'Biblical', 'Bill', 'Billgone']))
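Still without any imports, `dict.setdefault` can replace the `if`/`else` branch: it returns the existing list for a key, or inserts an empty one and returns that. A sketch; the name `group_by_first_letter` is just illustrative:

```python
def group_by_first_letter(words):
    groups = {}
    for word in words:
        # setdefault returns groups[word[0]], creating an empty list first if needed
        groups.setdefault(word[0], []).append(word)
    return groups


result = group_by_first_letter(['About', 'Absolutely', 'Bedlam', 'After', 'Behind'])
for letter in sorted(result):
    print(letter)
    for word in result[letter]:
        print(word)
# prints A, About, Absolutely, After, B, Bedlam, Behind (one per line)
```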

def split(n):
    n2 = []
    for i in n:
        if i[0] not in n2:
            n2.append(i[0])  # collect each distinct first letter once
    n2.sort()
    for j in n:
        z = j[0]
        z1 = n2.index(z)
        # insert the word right after its letter; within a group,
        # words therefore end up in reverse input order
        n2.insert(z1+1, j)
    return n2
word_list = ['be','have','do','say','get','make','go','know','take','see','come','think',
             'look','want','give','use','find','tell','ask','work','seem','feel','leave','call']
print(split(word_list))

Sequence Generation with Number applied to string

I have tried sequence generators such as lambdas, list comprehensions and others, but I can't seem to get what I really want. My final goal is to print sequences of words from a string, using slices like string[1:3].
What I am looking for:
a = [0,13,26,39]
b = [12,25,38,51]
str = 'If you are done with the file, move to the command area across from the file name in the RL screen and type'
read = str.split()
read[0:12]
['If', 'you', 'are', 'done', 'with', 'the', 'file,', 'move', 'to', 'the', 'command', 'area']
read[13:25]
['from', 'the', 'file', 'name', 'in', 'the', 'RL', 'screen', 'and', 'type']

Use zip:
>>> a = [0,13,26,39]
>>> b = [12,25,38,51]
>>> strs = 'If you are done with the file, move to the command area across from the file name in the RL screen and type'
>>> spl = strs.split()
>>> for x,y in zip(a,b):
...     print(spl[x:y])
...
['If', 'you', 'are', 'done', 'with', 'the', 'file,', 'move', 'to', 'the', 'command', 'area']
['from', 'the', 'file', 'name', 'in', 'the', 'RL', 'screen', 'and', 'type']
[]
[]
In Python 3, zip returns an iterator of tuples, where each tuple contains the items at the same index in the iterables passed to it; wrap it in list() to see them all at once:
>>> list(zip(a,b))
[(0, 12), (13, 25), (26, 38), (39, 51)]
Because that iterator is lazy, it is already memory efficient (in Python 2, zip built a full list and itertools.izip was the iterator-returning alternative).
You can use str.join if you want to create a string from that sliced list:
>>> for x,y in zip(a,b):
...     print(" ".join(spl[x:y]))
...
If you are done with the file, move to the command area
from the file name in the RL screen and type
Update: creating a and b:
>>> n = 5
>>> a = list(range(0, 13*n, 13))
>>> b = [ x + 12 for x in a]
>>> a
[0, 13, 26, 39, 52]
>>> b
[12, 25, 38, 51, 64]

Do you mean:
>>> [read[i:j] for i, j in zip(a,b)]
[['If', 'you', 'are', 'done', 'with', 'the', 'file,', 'move', 'to', 'the',
'command', 'area'], ['from', 'the', 'file', 'name', 'in', 'the', 'RL',
'screen', 'and', 'type'], [], []]
or
>>> ' '.join([read[i:j] for i, j in zip(a,b)][0])
'If you are done with the file, move to the command area'
>>> ' '.join([read[i:j] for i, j in zip(a,b)][1])
'from the file name in the RL screen and type'

a = [0,13,26,39]
b = [12,25,38,51]
text = 'If you are done with the file, move to the command area across from the file name in the RL screen and type'
read = text.split()
extra_lists = [read[start:end] for start,end in zip(a,b)]
print(extra_lists)

You mentioned a lambda, so:
f = lambda s, i, j: s.split()[i:j]
>>> f("hello world how are you",0,2)
['hello', 'world']
It seems you're keeping the slice bounds in two separate lists; might I suggest a dictionary or a list of tuples instead?
text = 'If you are done with the file, move to the command area across from the file name in the RL screen and type'
slices = [(0, 12), (13, 25)]
dslices = {0: 12, 13: 25}
for pair in slices:
    print(f(text, pair[0], pair[1]))
for key in dslices:
    print(f(text, key, dslices[key]))
I'm not a fan of using zip when you have the option of just formatting your data better.
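Python also has a built-in `slice` type, so the bounds can be stored as ready-made slice objects and applied directly with `read[s]`. A sketch using the slice bounds from the question:

```python
text = ('If you are done with the file, move to the command area across '
        'from the file name in the RL screen and type')
read = text.split()

# slice(start, stop) behaves exactly like read[start:stop]
slices = [slice(0, 12), slice(13, 25)]
for s in slices:
    print(" ".join(read[s]))
# If you are done with the file, move to the command area
# from the file name in the RL screen and type
```

Keeping each pair together in one object means the start and stop can't drift out of sync the way two parallel lists can.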
