Swapping characters within a list of strings in Python

Given a list of strings such as: ['Math is cool', 'eggs and bacon']
How would one swap words from one list item to the other to turn them into something like
['bacon is cool', 'eggs and Math']
I would post code if I had any but I really have no idea where to start with this. Thanks.
I'm using Python 3

Start by creating your lists.
text1 = 'Math is cool'
text2 = 'eggs and bacon'
mylist = []
mylist.append(text1.split())
mylist.append(text2.split())
print(mylist)
Output:
[['Math', 'is', 'cool'], ['eggs', 'and', 'bacon']]
Now that you have the lists, play with them. Use append() to add texts that the user enters, etc.
I think that you can see where to go from here.
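For instance, swapping the first word of the first sentence with the last word of the second is a single assignment on the nested lists. This is just a sketch of one possible next step, matching the example in the question:

```python
text1 = 'Math is cool'
text2 = 'eggs and bacon'
mylist = [text1.split(), text2.split()]

# Swap 'Math' (first word of sentence 1) with 'bacon' (last word of sentence 2)
mylist[0][0], mylist[1][-1] = mylist[1][-1], mylist[0][0]

result = [' '.join(words) for words in mylist]
print(result)  # ['bacon is cool', 'eggs and Math']
```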

You haven't provided much information about your general purpose (to say the least)... so the answer below addresses only the specific example given in your question, to help you get started:
myList = ['Math is cool', 'eggs and bacon']
list0 = myList[0].split(' ')
list1 = myList[1].split(' ')
newList = [list1[-1] + ' ' + ' '.join(list0[1:]), ' '.join(list1[:-1]) + ' ' + list0[0]]

import random
def swap_words(s1, s2):
    s1 = s1.split()  # split each string into a list of words
    s2 = s2.split()
    w1 = random.randrange(len(s1))  # decide which words to swap
    w2 = random.randrange(len(s2))
    s1[w1], s2[w2] = s2[w2], s1[w1]  # do the swap
    return " ".join(s1), " ".join(s2)
then swap_words('Math is cool', 'eggs and bacon') returns sentences like
('Math and cool', 'eggs is bacon')
('bacon is cool', 'eggs and Math')

Related

How to remove sentences/phrases of a certain length from a list?

So this is a smaller version of my sentences/phrase list:
search = ['More than words', 'this way', 'go round', 'the same', 'end With', 'be sure', 'care for', 'see you in hell', 'see you', 'according to', 'they say', 'to go', 'stay with', 'Golf pro', 'Country Club']
What I would like to do is remove any terms that are more or less than 2 words in length. I basically just want another list with all 2 word terms. Is there a way to do this in Python? From my searching, I have only managed to find how to erase words of a certain number of characters and not entire phrases.
You can get 2 word phrases using this:
ans = list(filter(lambda s: len(s.split()) == 2, search))
Another way is using list comprehension:
ans = [w for w in search if len(w.split()) == 2]
As OP asked for removing duplicates in comment, adding it here:
ans = list(set(filter(lambda s: len(s.split()) == 2, search)))
Well if you take a simple approach and say every word must be separated by a space, then any two-word strings will only have 1 space character, three-word strings have 2 space characters, and so on (general rule, an n-word string has (n-1) spaces).
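The (n-1)-spaces rule is easy to sanity-check on a few of the sample phrases (assuming single-space separation, as stated):

```python
# Check the rule: an n-word phrase has (n-1) spaces
for phrase in ['this way', 'More than words', 'see you in hell']:
    n_words = len(phrase.split())
    assert phrase.count(' ') == n_words - 1
print('rule holds')
```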
You can just break the two lists up like this (if you want one with strings <= two words, and one with strings > 2 words):
twoWords = []
moreThanTwo = []
for s in search:
    if s.count(" ") == 1:
        twoWords.append(s)
    else:
        moreThanTwo.append(s)
Or to simplify with a lambda expression and just extract a single list of all two-word strings:
twoWords = list(filter(lambda s: s.count(" ") == 1, search))
As you want a new list containing only the phrases of exactly two words, use a list comprehension:
new_list = [sentence for sentence in search if len(sentence.split(' ')) == 2]
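Any of the variants above gives the same result on the sample list; for example:

```python
search = ['More than words', 'this way', 'go round', 'the same', 'end With',
          'be sure', 'care for', 'see you in hell', 'see you', 'according to',
          'they say', 'to go', 'stay with', 'Golf pro', 'Country Club']

# keep only the phrases with exactly two words
new_list = [sentence for sentence in search if len(sentence.split()) == 2]
print(new_list)
# ['this way', 'go round', 'the same', 'end With', 'be sure', 'care for',
#  'see you', 'according to', 'they say', 'to go', 'stay with', 'Golf pro',
#  'Country Club']
```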

split string by using regex in python

What is the best way to split a string like
text = "hello there how are you"
in Python?
So I'd end up with an array like such:
['hello there', 'there how', 'how are', 'are you']
I have tried this:
liste = re.findall(r'((\S+\W*){' + str(2) + '})', text)
for a in liste:
    print(a[0])
But I'm getting:
hello there
how are
you
How can I make the findall function move only one token when searching?
Here's a solution with re.findall:
>>> import re
>>> text = "hello there how are you"
>>> re.findall(r"(?=(?:(?:^|\W)(\S+\W\S+)(?:$|\W)))", text)
['hello there', 'there how', 'how are', 'are you']
Have a look at the Python docs for re: https://docs.python.org/3/library/re.html
(?=...) Lookahead assertion
(?:...) Non-capturing regular parentheses
If regex isn't required you could do something like:
l = text.split(' ')
out = []
for i in range(len(l)):
    try:
        out.append(l[i] + ' ' + l[i+1])
    except IndexError:
        continue
Explanation:
First split the string on the space character. The result will be a list where each element is a word in the sentence. Instantiate an empty list to hold the result. Loop over the list of words, appending each two-word combination, separated by a space, to the output list. This will throw an IndexError when trying to access the word after the last one; just catch it and continue, since you don't seem to want that lone word in your result anyway.
I don't think you actually need regex for this.
I understand you want a list in which each element contains two words, with the second word of each element also being the first word of the following element. We can do this easily like this:
string = "Hello there how are you"
liste = string.split(" ")
for i in range(len(liste) - 1):
    liste[i] = liste[i] + " " + liste[i+1]
# remove the last element, which would otherwise be left holding a lone word
liste.pop(-1)
I don't know if it's mandatory for you to use regex, but I'd do it this way.
First, you can get the list of words with the str.split() method.
>>> sentence = "hello there how are you"
>>> splited_sentence = sentence.split(" ")
>>> splited_sentence
['hello', 'there', 'how', 'are', 'you']
Then, you can make pairs.
>>> output = []
>>> for i in range(1, len(splited_sentence)):
...     output += [splited_sentence[i-1] + ' ' + splited_sentence[i]]
...
>>> output
['hello there', 'there how', 'how are', 'are you']
An alternative is just to split, zip, then join like so...
sentence = "Hello there how are you"
words = sentence.split()
[' '.join(i) for i in zip(words, words[1:])]
Another possible solution using findall.
>>> liste = list(map(''.join, re.findall(r'(\S+(?=(\s+\S+)))', text)))
>>> liste
['hello there', 'there how', 'how are', 'are you']

Separating a string into two Python arrays

I'm trying to split a string of words into two lists of words using the query below. The string up until 'a' should go into begin, and the rest into remainder. But the while loop somehow keeps running regardless of the fact that begin already contains 'a'. Thanks a lot for your help!
random_string = 'hello this is a test string'
split = {}
split = random_string.split()
begin = []
remainder = []
while 'a' not in begin:
    for word in split:
        storage = word
        begin.append(storage)
print(begin)
So your problem here is that the while loop condition is only checked after the for loop has completed. Essentially this is what happens:
1. 'a' is not in begin, so the loop body starts
2. The for loop adds every word in split to begin
3. Only then is 'a' in begin checked again
You could try something like:
for word in split:
    if 'a' in begin:
        remainder.append(word)
    else:
        begin.append(word)
where the 'a' condition is checked on every iteration of the loop or follow the slicing techniques listed in other answers
Try using slices and index, there is no need to run a loop for catching the 'a':
random_string = 'hello this is a test string'
split = random_string.split(' ')
index = split.index('a') + 1
array_1 = split[:index]
array_2 = split[index:]
print(array_1, array_2)
You should be looking at the .index method of the list; there is no need for a loop.
random_string = 'hello this is a test string'
split = random_string.split()
begin = []
remainder = []
index = split.index('a')
begin = split[:index]
remainder = split[index:]
print(begin)
print(remainder)
Code snippet above will print:
['hello', 'this', 'is']
['a', 'test', 'string']
Just slice the list you get by splitting your string:
To take the sentence until a specific word:
>>> words = "one two three four"
>>> words.split()[:words.split().index("three")+1]
['one', 'two', 'three']
>>> words.split()[words.split().index("three")+1:]
['four']
To take half the sentence:
(Your post seemed ambiguous to me about what you wanted.)
>>> words = "one two three four"
>>> words.split()[:len(words.split())//2]
['one', 'two']
>>> words.split()[len(words.split())//2:]
['three', 'four']
Try this, just two lines: one to get the words with 'a', another to get the words without 'a'.
random_string = 'hello this is a test string, tap task'
begin = [x for x in random_string.split(' ') if 'a' in x]
remainder = [x for x in random_string.split(' ') if 'a' not in x]
print(begin, remainder)
which will print:
['a', 'tap', 'task']
['hello', 'this', 'is', 'test', 'string,']
Use builtin string routines:
>>> s = 'hello this is a test string'
>>> s.split('a')
['hello this is ', ' test string']
>>> begin, end = s.split('a')
>>> begin_words = begin.split()
>>> begin_words
['hello', 'this', 'is']
>>> end_words = end.split()
>>> end_words
['test', 'string']
The default of split is to split on whitespace, but as you can see, it works with other strings as well.
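One caveat worth flagging: `split('a')` splits on every letter 'a', not on the word 'a', so a sentence whose other words contain an 'a' would be cut in the wrong places. Splitting on the word with its surrounding spaces is a safer variant (a sketch, not part of the original answer):

```python
s = 'hello this is a banana test'
print(s.split('a'))     # also splits inside 'banana'
parts = s.split(' a ')  # split on the word 'a' with its surrounding spaces
print(parts)            # ['hello this is', 'banana test']
```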

Erasing list of phrases from list of texts in python

I am trying to erase specific words found in a list. Lets say that I have the following example:
a= ['you are here','you are there','where are you','what is that']
b = ['you','what is']
The desired output should be the following:
['are here', 'are there', 'where are', 'that']
I created the following code for that task:
import re
def _find_word_and_remove(w, strings):
    """
    w: (string)
    strings: (string)
    """
    temp = re.sub(r'\b({0})\b'.format(w), '', strings).strip()  # removes word from string
    return re.sub(r"\s{1,}", " ", temp)  # removes double spaces

def find_words_and_remove(words, strings):
    """
    words: (list)
    strings: (list)
    """
    if len(words) == 1:
        return [_find_word_and_remove(words[0], word_a) for word_a in strings]
    else:
        temp = [_find_word_and_remove(words[0], word_a) for word_a in strings]
        return find_words_and_remove(words[1:], temp)
find_words_and_remove(b, a)
>>> ['are here', 'are there', 'where are', 'that']
It seems that I am over-complicating the 'things' by using recursion for this task. Is there a more simple and readable way to do this task?
You can use list comprehension:
def find_words_and_remove(words, strings):
    return [" ".join(word for word in string.split() if word not in words) for string in strings]
That will work only when there are single words in b, but because of your edit and comment, I now know that you really do need _find_word_and_remove(). Your recursion way isn't really too bad, but if you don't want recursion, do this:
def find_words_and_remove(words, strings):
    strings_copy = strings[:]
    for word in words:
        for i, string in enumerate(strings_copy):
            strings_copy[i] = _find_word_and_remove(word, string)
    return strings_copy
The simple way is to use regex:
import re
a= ['you are here','you are there','where are you','what is that']
b = ['you','what is']
here you go:
def find_words_and_remove(b, a):
    return [re.sub("|".join(b), "", x).strip()
            if len(re.sub("|".join(b), "", x).strip().split(" ")) < len(x.split(" "))
            else x
            for x in a]
find_words_and_remove(b, a)
>> ['are here', 'are there', 'where are', 'that']
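For comparison, a non-recursive sketch of the same task using one word-boundary-anchored substitution per phrase. This mirrors, but is not part of, the answers above; `re.escape` guards against phrases containing regex metacharacters:

```python
import re

a = ['you are here', 'you are there', 'where are you', 'what is that']
b = ['you', 'what is']

result = a
for phrase in b:
    # \b anchors ensure only whole-word/phrase matches are removed
    pattern = r'\b{}\b'.format(re.escape(phrase))
    result = [re.sub(pattern, '', s).strip() for s in result]
result = [re.sub(r'\s+', ' ', s) for s in result]  # collapse double spaces
print(result)  # ['are here', 'are there', 'where are', 'that']
```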

How to eliminate duplicate list entries in Python while preserving case-sensitivity?

I'm looking for a way to remove duplicate entries from a Python list but with a twist; The final list has to be case sensitive with a preference of uppercase words.
For example, between cup and Cup I only need to keep Cup, not cup. Unlike other common solutions, which suggest using lower() first, I'd prefer to maintain the strings' case here; in particular, I'd prefer keeping the one with the uppercase letter over the lowercase one.
Again, I am trying to turn this list:
['Hello', 'hello', 'world', 'world', 'poland', 'Poland']
into this:
['Hello', 'world', 'Poland']
How should I do that?
Thanks in advance.
This does not preserve the order of words, but it does produce a list of "unique" words with a preference for capitalized ones.
In [34]: words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland', ]
In [35]: wordset = set(words)
In [36]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[36]: ['world', 'Poland', 'Hello']
If you wish to preserve the order as they appear in words, then you could use a collections.OrderedDict:
In [43]: wordset = collections.OrderedDict()
In [44]: wordset = collections.OrderedDict.fromkeys(words)
In [46]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[46]: ['Hello', 'world', 'Poland']
Using set to track seen words:
def uniq(words):
    seen = set()
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible. (3.3+)
        if l in seen:
            continue
        seen.add(l)
        yield word
Usage:
>>> list(uniq(['Hello', 'hello', 'world', 'world', 'Poland', 'poland']))
['Hello', 'world', 'Poland']
UPDATE
The previous version does not take care of the preference for uppercase over lowercase. In the updated version I use `min`, as @TheSoundDefense did.
import collections
def uniq(words):
    seen = collections.OrderedDict()  # Use {} if the order is not important.
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible (3.3+)
        seen[l] = min(word, seen.get(l, word))
    return seen.values()
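Run end-to-end (wrapping the return value in `list()`, since `values()` is a view on Python 3), the updated version keeps first-seen order along with the uppercase preference:

```python
import collections

def uniq(words):
    seen = collections.OrderedDict()
    for word in words:
        l = word.lower()
        # min() keeps the capitalized variant, since 'P' < 'p' in ASCII
        seen[l] = min(word, seen.get(l, word))
    return list(seen.values())

print(uniq(['hello', 'Hello', 'world', 'world', 'poland', 'Poland']))
# ['Hello', 'world', 'Poland']
```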
Since an uppercase letter is "smaller" than a lowercase letter in a comparison, I think you can do this:
orig_list = ["Hello", "hello", "world", "world", "Poland", "poland"]
unique_list = []
for word in orig_list:
    for i in range(len(unique_list)):
        if unique_list[i].lower() == word.lower():
            unique_list[i] = min(word, unique_list[i])
            break
    else:
        unique_list.append(word)
The min will have a preference for words with uppercase letters earlier on.
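That ordering claim is easy to verify: in ASCII, uppercase letters compare less than lowercase ones, so `min` picks the capitalized variant:

```python
print(min('poland', 'Poland'))  # Poland  ('P' is 80, 'p' is 112 in ASCII)
```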
There are some better answers here, but hopefully this is something simple, different and useful. This code satisfies the conditions of your test (sequential pairs of matching words) but would fail on anything more complicated, such as non-sequential pairs, non-pairs or non-strings. For anything more complicated I'd take a different approach.
p1 = ['Hello', 'hello', 'world', 'world', 'Poland', 'poland']
p2 = ['hello', 'Hello', 'world', 'world', 'Poland', 'Poland']
def pref_upper(p):
    q = []
    a = 0
    b = 1
    for x in range(len(p) // 2):
        if p[a][0].isupper() and p[b][0].isupper():
            q.append(p[a])
        if p[a][0].isupper() and p[b][0].islower():
            q.append(p[a])
        if p[a][0].islower() and p[b][0].isupper():
            q.append(p[b])
        if p[a][0].islower() and p[b][0].islower():
            q.append(p[b])
        a += 2
        b += 2
    return q

print(pref_upper(p1))
print(pref_upper(p2))
