How to get each letter of word list python - python

l = ['hello', 'world', 'monday']
for i in range(n):
    word = input()
    l.append(word)
for j in l[0]:
    print(j)
Output : h e l l o
I would like to do this for every word in l.
I want to keep my list intact because I need to get len() of each word, and I won't know in advance how many words I will get.
I don't know if I'm being clear enough. If you need more information, let me know. Thanks!

def split_into_letters(word):
    return ' '.join(word)
lst = ['hello', 'world', 'monday']
lst_2 = list(map(split_into_letters, lst))
print(lst_2)
You can map each word to a function that splits it into letters.

l = ['hello', 'world', 'monday']
list(map(list, l))
#output
[['h', 'e', 'l', 'l', 'o'],
['w', 'o', 'r', 'l', 'd'],
['m', 'o', 'n', 'd', 'a', 'y']]

from itertools import chain
lst = ['hello', 'world', 'monday']
# Print all letters of all words separated by spaces
print(*chain.from_iterable(lst))
# Print all letters of all words separated by spaces,
# with each word on a new line
for word in lst:
    print(*word)
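For the original goal (every letter of every word, with the list left intact and len() still available), a plain loop over the list is probably enough. This is only a minimal sketch; the helper name letters_and_lengths is made up for illustration and does not come from any of the answers above:

```python
def letters_and_lengths(words):
    """Return (word, length, letters) for each word without modifying `words`."""
    result = []
    for word in words:
        # list(word) builds a new list of characters; the original string is untouched
        result.append((word, len(word), list(word)))
    return result


lst = ['hello', 'world', 'monday']
for word, length, letters in letters_and_lengths(lst):
    # prints the word, its length, then its letters separated by spaces
    print(word, length, *letters)
```

Because strings are immutable and the loop only reads from lst, the list is the same after the loop as before it, however many words it holds.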

Related

Split words in string into nested lists of characters

Can anyone help me with a list comprehension to split a string into a nested list of words and characters? i.e:
mystring = "this is a string"
Wanted output:
[['t','h','i','s'],['i','s'],['a'],['s','t','r','i','n','g']]
I've tried the following, but it doesn't split 'x' into a nested list:
mylist = [x.split() for x in mystring.split(' ')]
print(mylist)
[['this'],['is'],['a'],['string']]
[list(x) for x in mystring.split(' ')]
You can use a nested list comprehension:
[[j for j in i] for i in mystring.split()]
Yields:
[['t', 'h', 'i', 's'], ['i', 's'], ['a'], ['s', 't', 'r', 'i', 'n', 'g']]
You need list(x) instead of x.split():
[list(x) for x in mystring.split()]
Slightly similar to the other answers:
map(list,mystring.split(" "))
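One caveat about the map version: under Python 3, map returns a lazy iterator rather than a list, so it needs a list() call before the nested lists become visible. A short sketch, assuming Python 3:

```python
mystring = "this is a string"

lazy = map(list, mystring.split(" "))  # a map object in Python 3, not a list
nested = list(lazy)                    # materialise it into a list of lists
print(nested)
```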

Find all possible substrings beginning with characters from capturing group

I have for example the string BANANA and want to find all possible substrings beginning with a vowel. The result I need looks like this:
"A", "A", "A", "AN", "AN", "ANA", "ANA", "ANAN", "ANANA"
I tried this: re.findall(r"([AIEOU]+\w*)", "BANANA")
but it only finds "ANANA" which seems to be the longest match.
How can I find all the other possible substrings?
s="BANANA"
vowels = 'AIEOU'
sorted(s[i:j] for i, x in enumerate(s) for j in range(i + 1, len(s) + 1) if x in vowels)
This is a simple way of doing it. Sure there's an easier way though.
def subs(txt, startswith):
    for i in xrange(len(txt)):
        for j in xrange(1, len(txt) - i + 1):
            if txt[i].lower() in startswith.lower():
                yield txt[i:i + j]
s = 'BANANA'
vowels = 'AEIOU'
print sorted(subs(s, vowels))
A more pythonic way:
>>> def grouper(s):
...     return [s[i:i+j] for j in range(1,len(s)+1) for i in range(len(s)-j+1)]
...
>>> vowels = {'A', 'I', 'O', 'U', 'E', 'a', 'i', 'o', 'u', 'e'}
>>> [t for t in grouper(s) if t[0] in vowels]
['A', 'A', 'A', 'AN', 'AN', 'ANA', 'ANA', 'ANAN', 'ANANA']
Benchmark with accepted answer:
from timeit import timeit
s1 = """
sorted(s[i:j] for i, x in enumerate(s) for j in range(i + 1, len(s) + 1) if x in vowels)
"""
s2 = """
def grouper(s):
    return [s[i:i+j] for j in range(1,len(s)+1) for i in range(len(s)-j+1)]
[t for t in grouper(s) if t[0] in vowels]
"""
print '1st: ', timeit(stmt=s1,
                      number=1000000,
                      setup="vowels = 'AIEOU'; s = 'BANANA'")
print '2nd : ', timeit(stmt=s2,
                       number=1000000,
                       setup="vowels = {'A', 'I', 'O', 'U', 'E', 'a', 'i', 'o', 'u', 'e'}; s = 'BANANA'")
result :
1st: 6.08756995201
2nd : 5.25555992126
As already mentioned in the comments, Regex would not be the right way to go about this.
Try this:
def get_substr(string):
    holder = []
    for ix, elem in enumerate(string):
        if elem.lower() in "aeiou":
            for r in range(len(string[ix:])):
                holder.append(string[ix:ix+r+1])
    return holder
print get_substr("BANANA")
## ['A', 'AN', 'ANA', 'ANAN', 'ANANA', 'A', 'AN', 'ANA', 'A']
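The answers above are written for Python 2 (xrange, print statement). For readers on Python 3, the same idea can be sketched as a generator; the function name vowel_substrings is just an illustrative choice, not taken from any answer:

```python
def vowel_substrings(text, vowels="AEIOU"):
    """Yield every substring of `text` that starts with one of `vowels`."""
    for start, char in enumerate(text):
        if char.upper() in vowels:
            # every end position after `start` gives one substring
            for end in range(start + 1, len(text) + 1):
                yield text[start:end]


result = sorted(vowel_substrings("BANANA"))
print(result)
# ['A', 'A', 'A', 'AN', 'AN', 'ANA', 'ANA', 'ANAN', 'ANANA']
```

Sorting the generator's output gives the same list the question asks for.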

How to get all substrings in a list of characters (python)

I want to iterate over a list of characters
temp = ['h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd']
so that I can obtain two strings, "hello" and "world"
My current way to do this is:
#temp is the name of the list
#temp2 is the starting index of the first alphabetical character found
for j in range(len(temp)):
    if temp[j].isalpha() and temp[j-1] != '#':
        temp2 = j
        while (temp[temp2].isalpha() and temp2 < len(temp)-1):
            temp2 += 1
        print(temp[j:temp2+1])
        j = temp2
The issue is that this prints out
['h', 'e', 'l', 'l', 'o']
['e', 'l', 'l', 'o']
['l', 'l', 'o']
['l', 'o']
['o']
etc. How can I print out only the full valid string?
Edit: I should have been more specific about what constitutes a "valid" string. A string is valid as long as all characters within it are either alphabetical or numerical. I didn't include the "isnumerical()" method within my check conditions because it isn't particularly relevant to the question.
If you only want hello and world and your words are always separated by #, you can easily do it using join and split:
>>> temp = ['h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd']
>>> "".join(temp).split('#')
['hello', 'world']
Furthermore, if you need to print the full valid string, join the pieces with a space:
>>> t = "".join(temp).split('#')
>>> print(' '.join(t))
hello world
You can do it like this:
''.join(temp).split('#')
Lists have an index method, which returns the position of an element. You can then use slicing to join the characters on either side.
In [10]: temp = ['h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd']
In [11]: pos = temp.index('#')
In [14]: ''.join(temp[:pos])
Out[14]: 'hello'
In [17]: ''.join(temp[pos+1:])
Out[17]: 'world'
An alternate, itertools-based solution:
>>> temp = ['h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd']
>>> import itertools
>>> ["".join(str)
...  for isstr, str in itertools.groupby(temp, lambda c: c != '#')
...  if isstr]
['hello', 'world']
itertools.groupby groups consecutive items according to whether or not they are equal to #. The list comprehension discards the groups containing only # and joins the non-# groups.
The only advantage is that you don't have to build the full string just to split it afterward. This is probably only relevant if the string is really long.
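To make the grouping step a bit more concrete, here is a small sketch that prints the runs groupby produces before they are filtered and joined. The variable names runs and is_word are only illustrative:

```python
from itertools import groupby

temp = ['h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd']

# Each group is a run of consecutive items that share the same key value.
runs = [(is_word, list(chars)) for is_word, chars in groupby(temp, lambda c: c != '#')]
for is_word, chars in runs:
    print(is_word, chars)

# Keep only the runs whose key is True (the non-# runs) and join them.
words = ["".join(chars) for is_word, chars in runs if is_word]
print(words)
```

The key alternates True, False, True here: one run for hello, one for the single #, one for world.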
If you just want alphabetic characters, use isalpha() to replace the # and any other non-letters with a space, and then split if you want a list of words:
print("".join(x if x.isalpha() else " " for x in temp).split())
If you want both words in a single string, replace the # with a space using the conditional expression and join the result:
print("".join(x if x.isalpha() else " " for x in temp))
hello world
To do it using a loop like your own code, iterate over the items and add each one to the output string if it is alphabetic; otherwise add a space:
out = ""
for s in temp:
    if s.isalpha():
        out += s
    else:
        out += " "
Using a loop to get a list of words:
words = []
out = ""
for s in temp:
    if s.isalpha():
        out += s
    else:
        words.append(out)
        out = ""
if out:
    # the last word has no trailing separator, so append it after the loop
    words.append(out)
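The edit to the question says a valid string may contain letters or digits. A hedged sketch of the same loop using str.isalnum() instead of isalpha(); the function name split_valid and the sample input with a trailing '2' are assumptions made for illustration:

```python
def split_valid(chars):
    """Group consecutive alphanumeric characters into words."""
    words = []
    current = []
    for ch in chars:
        if ch.isalnum():
            current.append(ch)
        elif current:
            # a separator ends the current word; empty runs are skipped
            words.append("".join(current))
            current = []
    if current:
        # flush the last word, which has no trailing separator
        words.append("".join(current))
    return words


print(split_valid(['h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd', '2']))
# ['hello', 'world2']
```

Collecting characters in a list and joining once per word avoids repeated string concatenation, though for input this small the difference is negligible.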

Finding sentences that contain one of an array of keywords using Python

I'm using Python 2.7
I want to go through a .txt file and only keep the sentences that contain one or more of a list of keywords.
After that I want to go through the remaining text once more with another list of keywords and repeat the process.
I want to save the result in that .txt; the rest can be deleted.
I'm new to Python (but loving it!), so don't worry about hurting my feelings: you're free to assume little knowledge on my side and dumb it down a bit :)
This is what I have so far:
import re
f = open('C:\\Python27\\test\\A.txt')
text = f.read()
define_words = 'contractual'
print re.findall(r"([^.]*?%s[^.]*\.)" % define_words,text)
That works insofar as it extracts any sentence with 'contractual' in it. If I put 'contractual obligation' there, it extracts only the sentences that have those two words next to each other.
What I'm stuck on is how to change that into an array of words that are each considered separately, like 'contractual', 'obligation', 'law', 'employer', etc.
EDIT regarding applepi's answer:
I've done some testing with a small test:
"The quick brown fox jumps over the lazy dog.
New line.
Yet another nice new line."
I only get a sentence if I put 2 words in that sentence in the string. Like ['quick', 'brown']
OUTPUT: ['T', 'h', 'e', ' ', 'q', 'u', 'i', 'c', 'k', ' ', 'b', 'r', 'o', 'w', 'n', ' ', 'f', 'o', 'x', 'y', ' ', 'j', 'u', 'm', 'p', 's', ' ', 'o', 'v', 'e', 'r', ' ', 't', 'h', 'e', ' ', 'l', 'a', 'z', 'y', ' ', 'd', 'o', 'g', '.']
So ['quick', 'another'] comes up with nothing.
['Yet', 'another'] will come up with:
OUTPUT: [' ', '\n', '\n', 'Y', 'e', 't', ' ', 'a', 'n', 'o', 't', 'h', 'e', 'r', ' ', 'n', 'i', 'c', 'e', ' ', 'n', 'e', 'w', ' ', 'l', 'i', 'n', 'e', '.']
Why not use list comprehension?
print [sent for sent in text.split('.')
       if any(word in sent for word in define_words.split())]
or, if you change define_words to a list of strings:
# define_words = ['contractual', 'obligations']
define_words = 'contractual obligations'.split()
print [sent for sent in text.split('.')
       if any(word in sent for word in define_words)]
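def init_contains_useful_word(words_to_search_for):
    def contains_useful_word(sentence):
        return any(map(lambda x: x in sentence, words_to_search_for))
    # return the inner function so it can be used as a filter predicate
    return contains_useful_word
# filename and list_of_lists (one keyword list per pass) must be defined beforehand
with open(filename, 'r') as f:
    text = f.read()
sentences = text.split(".")
for words in list_of_lists:
    contains_useful_word = init_contains_useful_word(words)
    sentences = filter(contains_useful_word, sentences)
with open(filename, 'w') as f:
    # join is a str method, not a list method; "." restores the periods removed by split
    f.write(".".join(sentences))
Actually, you can replace contains_useful_word with your re expression if you'd like.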
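The repeated filtering described in the question (one keyword list, then another on whatever is left) can be sketched without any file handling. This is written for Python 3 rather than the asker's 2.7, and the function name keep_sentences and the sample text are assumptions made for illustration:

```python
def keep_sentences(text, keyword_lists):
    """Keep sentences that contain at least one keyword from each list, applied in turn."""
    # naive sentence split on periods, dropping empty fragments
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    for keywords in keyword_lists:
        sentences = [s for s in sentences if any(k in s for k in keywords)]
    return sentences


sample = "The employer signed. The law is clear. A contractual law applies. Nothing here."
kept = keep_sentences(sample, [["contractual", "law"], ["applies"]])
print(kept)
# ['A contractual law applies']
```

Note that this is plain substring matching, so a keyword such as law would also match lawyer; a word-boundary regex would be needed to avoid that.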
I couldn't comment (I don't have enough reputation), so this answer isn't technically an answer.
I am not very familiar with regex, but assuming your re.findall() is successful, you can use the following code:
import re, itertools
from collections import Counter
f = open('C:\\Python27\\test\\A.txt')
text = f.read()
everything = []
define_words = ['contractual', 'obligation', 'law', 'employer']
for k in define_words:
    everything.append(re.findall(r"([^.]*?%s[^.]*\.)" % k,text))
everything = list(itertools.chain(*everything))
counts = Counter(everything)
everything = [value for value, count in counts.items() if count > 1]
everything = list(itertools.chain(*everything))
print everything
This loops over the array list and adds the values to a list, making a list of lists. Then I keep only the duplicates (the good values), and I convert the list of lists into one list.
ERROR: the real error was that everything was a list of lists, which Counter(everything) did not allow. Thus, I have stripped it before the Counter().
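Another way to get "any of these words" while keeping the asker's original regex is to join the keywords into one alternation group. This is a sketch in Python 3, not what the answer above does; the sample text reuses the test sentences from the question's edit, placed on one line:

```python
import re

define_words = ['quick', 'another']
text = "The quick brown fox jumps over the lazy dog. New line. Yet another nice new line."

# re.escape guards against keywords that contain regex metacharacters
alternation = "|".join(re.escape(word) for word in define_words)
pattern = r"[^.]*?(?:%s)[^.]*\." % alternation

# findall returns whole matches because the group is non-capturing
matches = [m.strip() for m in re.findall(pattern, text)]
print(matches)
# ['The quick brown fox jumps over the lazy dog.', 'Yet another nice new line.']
```

Each sentence is returned once and as a whole string, so no flattening or duplicate counting is needed afterwards.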

Is there a function in python to split a word into a list? [duplicate]

This question already has answers here:
How do I split a string into a list of characters?
(15 answers)
Closed 2 years ago.
Is there a function in python to split a word into a list of single letters? e.g:
s = "Word to Split"
to get
wordlist = ['W', 'o', 'r', 'd', ' ', 't', 'o', ' ', 'S', 'p', 'l', 'i', 't']
>>> list("Word to Split")
['W', 'o', 'r', 'd', ' ', 't', 'o', ' ', 'S', 'p', 'l', 'i', 't']
The easiest way is probably just to use list(), but there is at least one other option as well:
s = "Word to Split"
wordlist = list(s) # option 1,
wordlist = [ch for ch in s] # option 2, list comprehension.
They should both give you what you need:
['W','o','r','d',' ','t','o',' ','S','p','l','i','t']
As stated, the first is likely preferable for your example, but the latter can be handy for more complex cases, such as applying some arbitrary function to each item:
[doSomethingWith(ch) for ch in s]
The list function will do this
>>> list('foo')
['f', 'o', 'o']
Abuse of the rules, same result:
(x for x in 'Word to split')
Strictly speaking this is a generator (an iterator), not a list, but it's likely you won't really care.
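The difference does matter in one case: a generator can only be consumed once. A small sketch to illustrate (the variable names are arbitrary):

```python
gen = (x for x in 'Word to split')

print(isinstance(gen, list))  # False: it is a generator object, not a list
first_pass = list(gen)        # consuming it yields the characters
second_pass = list(gen)       # the generator is now exhausted
print(first_pass)
print(second_pass)            # []
```

If the characters are needed more than once, or need indexing or len(), list() is the safer choice.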
text = "just trying out"
word_list = []
for i in range(len(text)):
    word_list.append(text[i])
print(word_list)
Output:
['j', 'u', 's', 't', ' ', 't', 'r', 'y', 'i', 'n', 'g', ' ', 'o', 'u', 't']
The easiest option is to just use list(). However, if you don't want to use it, or it does not work for some bizarre reason, you can always use this method.
word = 'foo'
splitWord = []
for letter in word:
    splitWord.append(letter)
print(splitWord) #prints ['f', 'o', 'o']
def count():
    # named letters rather than list, to avoid shadowing the built-in
    letters = 'oixfjhibokxnjfklmhjpxesriktglanwekgfvnk'
    word_list = []
    # dict = {}
    for i in range(len(letters)):
        word_list.append(letters[i])
    # word_list1 = sorted(word_list)
    # bubble sort the letters in place
    for i in range(len(word_list) - 1, 0, -1):
        for j in range(i):
            if word_list[j] > word_list[j + 1]:
                temp = word_list[j]
                word_list[j] = word_list[j + 1]
                word_list[j + 1] = temp
    print("final count of arrival of each letter is : \n", dict(map(lambda x: (x, word_list.count(x)), word_list)))
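If the goal is only to count how often each letter occurs, the standard library's collections.Counter does the splitting and counting in one step. A minimal sketch on a short sample string chosen for illustration (not the string used above):

```python
from collections import Counter

letters = 'banana'  # short sample input, chosen for illustration

counts = Counter(letters)  # iterates over the string one character at a time
for letter in sorted(counts):
    print(letter, counts[letter])
```

This avoids both the manual sort and the repeated word_list.count(x) calls, each of which rescans the whole list.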
