How to process element list by using rstrip and lower() - python

I use this code:
test_list = ['A','a','word','Word','If,','As:']
for a in test_list:
    str(a)
    a.rstrip('.' or '?' or '!' or '"' or ":" or ',' or ';')
    a.lower()
    print(a)
print(test_list)
I got result like this:
A
a
word
Word
If,
As:
['A', 'a', 'word', 'Word', 'If,', 'As:']
And I was looking for something like:
a
a
word
word
if
as
['a','a','word','word','if','as']
I want to convert all elements in a list to lowercase and strip the punctuation marks off, so that only the words are left for me to process.

The following should do everything you requested by using a list comprehension:
# Test List
test_list = ['A', 'a', 'word', 'Word', 'If,', 'As:']
# Remove certain characters and convert characters to lower case in test list
test_list = [str(a).strip('.?!":,;').lower() for a in test_list]
# Print test list
print(test_list)
Output:
['a', 'a', 'word', 'word', 'if', 'as']
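As for why the loop in the question appeared to do nothing, here is a small sketch (the variable names are only for illustration). Strings are immutable, so rstrip() and lower() return new strings that have to be assigned, and an expression like '.' or '?' evaluates to just '.', so only a trailing period would ever be stripped:

```python
# The `or` chain short-circuits to its first truthy operand.
chars = '.' or '?' or '!'
print(chars)       # .

word = 'If,'
word.rstrip(',')   # the result is discarded; `word` is unchanged
print(word)        # If,

# Assign the result, and pass all characters to strip as one string.
cleaned = word.rstrip('.?!":,;').lower()
print(cleaned)     # if
```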

Related

[Python] Does Python have string comprehension like list comprehension?

I am studying Python and I got curious about string formatting.
I learned that there is a list comprehension to manipulate or create a list in Python.
For example,
li1 = [i for i in range(10)]
# this will create a list named li1
# and li1 contains following:
print(li1) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
So, my question is, if I have the code below, is there any solution to solve this? Like using list comprehension?
# The task I need to do is remove all the punctuation from the string, replacing it with an empty string.
text = input() # for example: "A! Lion? is crying..,!" is given as input
punctuations = [",", ".", "!", "?"]
punc_removed_str = text.replace(p, "") for p in punctuations
# above line is what I want to do..
print(punc_removed_str)
# Then result will be like below:
# Output: A Lion is crying
There is no string comprehension. However, you can pass a generator expression (the same syntax used inside a list comprehension) to join():
text = ''.join(x for x in text if x not in punctuations)
print(text) # A Lion is crying
Python already has a complete set of punctuations in the standard library.
from string import punctuation
punctuation is the string !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~.
Docs
So you can create a list from your given input which checks whether or not each characters from your input is in the punctuation string.
>>> [char for char in text if char not in punctuation]
['A', ' ', 'L', 'i', 'o', 'n', ' ', 'i', 's', ' ', 'c', 'r', 'y', 'i', 'n', 'g']
You can pass the resulting list in the built-in str.join method.
>>> "".join([char for char in text if char not in punctuation])
'A Lion is crying'
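Another option, sketched here as an alternative rather than something the answers above proposed, is str.translate with a table built by str.maketrans. The sample text is the one from the question:

```python
import string

text = "A! Lion? is crying..,!"

# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans('', '', string.punctuation)
cleaned = text.translate(table)
print(cleaned)  # A Lion is crying
```

This avoids the per-character Python loop, which may matter for long strings.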

Finding the index of beginning and end of a matched string in a word array

I would like to find a string match from an array of words, and get the index.
my variables
myString = "I heard a tapping"
wordList = ['Soon', 'again', 'I', 'heard', 'a', 'tapping', 'somewhat', 'louder', 'than', 'before']
The index I'm looking for, for this example is 2 and 5
I honestly don't even know where to begin
You could use index and iterate through myString
which when split looks like
['I', 'heard', 'a', 'tapping']
Using "in"
indexes = [wordList.index(words) for words in myString.split() if words in wordList]
The above will iterate through myString and check if any word is in wordList. If so it will append the index value to a new list called indexes.
output
[2, 3, 4, 5]
Another way you could accomplish this is by using enumerate
indexes = [count for count,elements in enumerate(wordList) if elements in myString]
output
[2, 3, 4, 5]
Using enumerate
Edit
Because you are looking for the start and end you can just index the output lists like list[0] (first) and list[-1](last)
Split the string into a list.
Then search wordList for a slice that matches this.
myString = "I heard a tapping"
wordList = ['Soon', 'again', 'I', 'heard', 'a', 'tapping', 'somewhat', 'louder', 'than', 'before']
stringList = myString.split()
listLen = len(stringList)
for start in range(len(wordList) - listLen + 1):
    if wordList[start:start+listLen] == stringList:
        end = start + listLen - 1
        print(start, end)
        break
else:
    print("Not found")
Adding to @BuddyBobiii's answer:
You can try using:
def findStringInWordArray(string, wordArray):
    a = [wordArray.index(word) for word in string.split() if word in wordArray]
    return [a[0], a[-1]]
Then you can use it:
myString = "I heard a tapping"
wordList = ['Soon', 'again', 'I', 'heard', 'a', 'tapping', 'somewhat', 'louder', 'than', 'before']
findStringInWordArray(myString, wordList)
You will get [2, 5] as the first and last index.
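Note that looking up each word separately reports positions even when the words appear out of order or are not adjacent. If the phrase has to match as a contiguous run, a slice comparison can be wrapped in a function. This is a minimal sketch; find_phrase is a hypothetical name, not from any answer here:

```python
def find_phrase(phrase, words):
    """Return (start, end) indexes of `phrase` as a contiguous run in `words`, or None."""
    target = phrase.split()
    n = len(target)
    if n == 0:
        return None
    # "+ 1" so that a match ending at the last element is also checked.
    for start in range(len(words) - n + 1):
        if words[start:start + n] == target:
            return start, start + n - 1
    return None


wordList = ['Soon', 'again', 'I', 'heard', 'a', 'tapping',
            'somewhat', 'louder', 'than', 'before']
print(find_phrase("I heard a tapping", wordList))  # (2, 5)
print(find_phrase("tapping heard", wordList))      # None
```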

How can I check the first letter of each item in an array?

I'm building a pig latin translator and I can't figure out how to identify the first letter of the entered words. I've converted the input to an array with each item being a new word, but how do I select each first letter of each item to determine if it's a consonant/vowel/etc.?
a = ['This', 'is', 'a', 'sentence']
for word in a:
    print(word[0])
Output:
T
i
a
s
words = ['apple', 'bike', 'cow']
Use list comprehension, that is, building a list from the contents of another:
firsts = [w[0] for w in words]
firsts
Output
['a','b','c']
Using a list comprehension that also skips empty strings:
a = ['This', 'is', '', 'sentence']
[w[0] for w in a if w]
Output :
['T', 'i', 's']
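For the pig latin case, the first letter then needs to be classified. A small sketch, assuming only a, e, i, o, u count as vowels (starts_with_vowel and VOWELS are illustrative names):

```python
VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant


def starts_with_vowel(word):
    """Return True if the first character of `word` is a vowel (case-insensitive)."""
    return bool(word) and word[0].lower() in VOWELS


words = ['This', 'is', 'a', 'sentence']
print([(w, starts_with_vowel(w)) for w in words])
# [('This', False), ('is', True), ('a', True), ('sentence', False)]
```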

What regex will emulate the default behavior of split() in python?

Using split() I can easily create from a string the list of tokens that are divided by space:
>>> 'this is a test 200/2002'.split()
['this', 'is', 'a', 'test', '200/2002']
How do I do the same using re.compile and re.findall? I need something similar to the following example but without splitting the "200/2002".
>>> test = re.compile('\w+')
>>> test.findall('this is a test 200/2002')
['this', 'is', 'a', 'test', '200', '2002']
This should output the desired list:
>>> test = re.compile('\S+')
>>> test.findall('this is a test 200/2002')
['this', 'is', 'a', 'test', '200/2002']
\S is anything but a whitespace (space, tab, newline, ...).
From str.split() documentation :
If sep is not specified or is None, a different splitting algorithm is
applied: runs of consecutive whitespace are regarded as a single
separator, and the result will contain no empty strings at the start
or end if the string has leading or trailing whitespace. Consequently,
splitting an empty string or a string consisting of just whitespace
with a None separator returns [].
findall() with the above regex should have the same behaviour :
>>> test.findall(" a\nb\tc d ")
['a', 'b', 'c', 'd']
>>> " a\nb\tc d ".split()
['a', 'b', 'c', 'd']
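A quick way to convince yourself is to compare both approaches on a few inputs, including the empty and whitespace-only cases mentioned in the documentation excerpt. The sample strings below are only illustrative:

```python
import re

samples = ["this is a test 200/2002", " a\nb\tc d ", "", "   "]
for s in samples:
    # Both approaches should produce the same tokens for these inputs.
    assert re.findall(r"\S+", s) == s.split()
print("all samples match")
```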

Split a string at uppercase letters

What is the pythonic way to split a string before the occurrences of a given set of characters?
For example, I want to split
'TheLongAndWindingRoad'
at any occurrence of an uppercase letter (possibly except the first), and obtain
['The', 'Long', 'And', 'Winding', 'Road'].
Edit: It should also split single occurrences, i.e.
from 'ABC' I'd like to obtain
['A', 'B', 'C'].
Unfortunately, before Python 3.7 it was not possible to split on a zero-width match with re.split. But you can use re.findall instead:
>>> import re
>>> re.findall('[A-Z][^A-Z]*', 'TheLongAndWindingRoad')
['The', 'Long', 'And', 'Winding', 'Road']
>>> re.findall('[A-Z][^A-Z]*', 'ABC')
['A', 'B', 'C']
Here is an alternative regex solution. The problem can be rephrased as "how do I insert a space before each uppercase letter, before doing the split":
>>> s = "TheLongAndWindingRoad ABC A123B45"
>>> re.sub( r"([A-Z])", r" \1", s).split()
['The', 'Long', 'And', 'Winding', 'Road', 'A', 'B', 'C', 'A123', 'B45']
This has the advantage of preserving all non-whitespace characters, which most other solutions do not.
Use a lookahead and a lookbehind:
In Python 3.7, you can do this:
re.split('(?<=.)(?=[A-Z])', 'TheLongAndWindingRoad')
And it yields:
['The', 'Long', 'And', 'Winding', 'Road']
You need the look-behind to avoid an empty string at the beginning.
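A short sketch of how that pattern behaves on the other inputs from the question, assuming Python 3.7 or later:

```python
import re

# Requires Python 3.7+, where re.split accepts zero-width matches.
pattern = re.compile(r'(?<=.)(?=[A-Z])')

print(pattern.split('TheLongAndWindingRoad'))  # ['The', 'Long', 'And', 'Winding', 'Road']
print(pattern.split('ABC'))                    # ['A', 'B', 'C']
print(pattern.split('aboutTheRoad'))           # ['about', 'The', 'Road']
```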
>>> import re
>>> re.findall('[A-Z][a-z]*', 'TheLongAndWindingRoad')
['The', 'Long', 'And', 'Winding', 'Road']
>>> re.findall('[A-Z][a-z]*', 'SplitAString')
['Split', 'A', 'String']
>>> re.findall('[A-Z][a-z]*', 'ABC')
['A', 'B', 'C']
If you want "It'sATest" to split to ["It's", 'A', 'Test'] change the regex to "[A-Z][a-z']*"
A variation on @ChristopheD's solution:
s = 'TheLongAndWindingRoad'
pos = [i for i,e in enumerate(s+'A') if e.isupper()]
parts = [s[pos[j]:pos[j+1]] for j in range(len(pos)-1)]
print(parts)
I think that a better answer might be to split the string up into words that do not end in a capital. This would handle the case where the string doesn't start with a capital letter.
re.findall('.[^A-Z]*', 'aboutTheLongAndWindingRoad')
example:
>>> import re
>>> re.findall('.[^A-Z]*', 'aboutTheLongAndWindingRoadABC')
['about', 'The', 'Long', 'And', 'Winding', 'Road', 'A', 'B', 'C']
Pythonic way could be:
"".join([(" "+i if i.isupper() else i) for i in 'TheLongAndWindingRoad']).strip().split()
['The', 'Long', 'And', 'Winding', 'Road']
Works good for Unicode, avoiding re/re2.
"".join([(" "+i if i.isupper() else i) for i in 'СуперМаркетыПродажаКлиент']).strip().split()
['Супер', 'Маркеты', 'Продажа', 'Клиент']
import re
list(filter(None, re.split("([A-Z][^A-Z]*)", "TheLongAndWindingRoad")))
or
[s for s in re.split("([A-Z][^A-Z]*)", "TheLongAndWindingRoad") if s]
src = 'TheLongAndWindingRoad'
glue = ' '
result = ''.join(glue + x if x.isupper() else x for x in src).strip(glue).split(glue)
Another without regex and the ability to keep contiguous uppercase if wanted
def split_on_uppercase(s, keep_contiguous=False):
    """
    Args:
        s (str): string
        keep_contiguous (bool): flag to indicate we want to
                                keep contiguous uppercase chars together
    Returns:
        list of str: the parts of the string
    """
    string_length = len(s)
    is_lower_around = (lambda: s[i-1].islower() or
                       string_length > (i + 1) and s[i + 1].islower())
    start = 0
    parts = []
    for i in range(1, string_length):
        if s[i].isupper() and (not keep_contiguous or is_lower_around()):
            parts.append(s[start: i])
            start = i
    parts.append(s[start:])
    return parts
>>> split_on_uppercase('theLongWindingRoad')
['the', 'Long', 'Winding', 'Road']
>>> split_on_uppercase('TheLongWindingRoad')
['The', 'Long', 'Winding', 'Road']
>>> split_on_uppercase('TheLongWINDINGRoadT', True)
['The', 'Long', 'WINDING', 'Road', 'T']
>>> split_on_uppercase('ABC')
['A', 'B', 'C']
>>> split_on_uppercase('ABCD', True)
['ABCD']
>>> split_on_uppercase('')
['']
>>> split_on_uppercase('hello world')
['hello world']
Alternative solution (if you dislike explicit regexes):
s = 'TheLongAndWindingRoad'
pos = [i for i,e in enumerate(s) if e.isupper()]
parts = []
for j in range(len(pos)):
    try:
        parts.append(s[pos[j]:pos[j+1]])
    except IndexError:
        parts.append(s[pos[j]:])
print(parts)
Replace every uppercase letter, say 'L', in the given string with a space plus that letter, ' L'. We can do this with a list comprehension, or we can define a function to do it, as follows.
s = 'TheLongANDWindingRoad ABC A123B45'
''.join([char if (char.islower() or not char.isalpha()) else ' '+char for char in list(s)]).strip().split()
>>> ['The', 'Long', 'A', 'N', 'D', 'Winding', 'Road', 'A', 'B', 'C', 'A123', 'B45']
If you choose to go by a function, here is how.
def splitAtUpperCase(text):
    result = ""
    for char in text:
        if char.isupper():
            result += " " + char
        else:
            result += char
    return result.split()
For an input that contains an abbreviation:
print(splitAtUpperCase('TheLongANDWindingRoad'))
>>>['The', 'Long', 'A', 'N', 'D', 'Winding', 'Road']
But when splitting at uppercase letters, we usually want to keep abbreviations, which are typically a continuous run of uppercase letters, intact. The code below does that.
def splitAtUpperCase(s):
    for i in range(len(s)-1)[::-1]:
        if s[i].isupper() and s[i+1].islower():
            s = s[:i]+' '+s[i:]
        if s[i].isupper() and s[i-1].islower():
            s = s[:i]+' '+s[i:]
    return s.split()
splitAtUpperCase('TheLongANDWindingRoad')
>>> ['The', 'Long', 'AND', 'Winding', 'Road']
Thanks.
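The same idea of keeping abbreviations together can also be expressed with a regex. This is only a sketch under the assumption that words are ASCII letters; split_keep_acronyms is a hypothetical name:

```python
import re


def split_keep_acronyms(s):
    """Split before an uppercase letter that starts a new word, keeping runs like 'AND' together."""
    # Boundary 1: lowercase followed by uppercase ("g|A").
    # Boundary 2: uppercase followed by uppercase+lowercase ("D|Wi").
    boundary = r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])'
    return re.sub(boundary, ' ', s).split()


print(split_keep_acronyms('TheLongANDWindingRoad'))
# ['The', 'Long', 'AND', 'Winding', 'Road']
```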
An alternative way without using regex or enumerate:
word = 'TheLongAndWindingRoad'
chars = [x for x in word]  # renamed from `list` to avoid shadowing the built-in
for char in chars:
    if char != chars[0] and char.isupper():
        chars[chars.index(char)] = ' ' + char
fin_list = ''.join(chars).split(' ')
I think it is clearer and simpler without chaining too many methods or using a long list comprehension that can be difficult to read.
This is possible with the more_itertools.split_before tool.
import more_itertools as mit
iterable = "TheLongAndWindingRoad"
[ "".join(i) for i in mit.split_before(iterable, pred=lambda s: s.isupper())]
# ['The', 'Long', 'And', 'Winding', 'Road']
It should also split single occurrences, i.e. from 'ABC' I'd like to obtain ['A', 'B', 'C'].
iterable = "ABC"
[ "".join(i) for i in mit.split_before(iterable, pred=lambda s: s.isupper())]
# ['A', 'B', 'C']
more_itertools is a third-party package with 60+ useful tools including implementations for all of the original itertools recipes, which obviates their manual implementation.
An alternate way using enumerate and isupper()
Code:
strs = 'TheLongAndWindingRoad'
ind = 0
new_lst = []
for index, val in enumerate(strs[1:], 1):
    if val.isupper():
        new_lst.append(strs[ind:index])
        ind = index
if ind < len(strs):
    new_lst.append(strs[ind:])
print(new_lst)
Output:
['The', 'Long', 'And', 'Winding', 'Road']
Sharing what came to mind when I read the post. Different from other posts.
strs = 'TheLongAndWindingRoad'
# grab index of uppercase letters in strs
start_idx = [i for i,j in enumerate(strs) if j.isupper()]
# create empty list
strs_list = []
# initiate counter
cnt = 1
for pos in start_idx:
    start_pos = pos
    # use counter to grab the next position; the last word runs to the end of the string
    try:
        end_pos = start_idx[cnt]
    except IndexError:
        end_pos = len(strs)
    # append to list
    strs_list.append(strs[start_pos:end_pos])
    cnt += 1
You might also wanna do it this way
def camelcase(s):
    words = []
    for char in s:
        if char.isupper():
            words.append(':'+char)
        else:
            words.append(char)
    words = ((''.join(words)).split(':'))
    return words
This will output as follows
s = 'oneTwoThree'
print(camelcase(s))
# ['one', 'Two', 'Three']
def solution(s):
    st = ''
    for c in s:
        if c == c.upper():
            st += ' '
        st += c
    return st
I'm using list
def split_by_upper(x):
    i = 0
    lis = list(x)
    while True:
        if i == len(lis)-1:
            if lis[i].isupper():
                lis.insert(i,",")
            break
        if lis[i].isupper() and i != 0:
            lis.insert(i,",")
            i+=1
        i+=1
    return "".join(lis).split(",")
OUTPUT:
data = "TheLongAndWindingRoad"
print(split_by_upper(data))
>> ['The', 'Long', 'And', 'Winding', 'Road']
My solution for splitting on capitalized letters - keeps capitalized words
import re
text = 'theLongAndWindingRoad ABC'
result = re.sub('(?<=.)(?=[A-Z][a-z])', r" ", text).split()
print(result)
#['the', 'Long', 'And', 'Winding', 'Road', 'ABC']
A little late to the party, but:
In [1]: camel = "CamelCaseConfig"
In [2]: parts = "".join([
f"|{c}" if c.isupper() else c
for c in camel
]).lstrip("|").split("|")
In [3]: screaming_snake = "_".join([
part.upper()
for part in parts
])
In [4]: screaming_snake
Out[4]: 'CAMEL_CASE_CONFIG'
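The same conversion can be sketched in one step with re.sub. The function name camel_to_screaming_snake is made up for this example, and the pattern assumes every uppercase letter starts a new word:

```python
import re


def camel_to_screaming_snake(name):
    """Insert '_' before every uppercase letter except the first character, then uppercase."""
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).upper()


print(camel_to_screaming_snake("CamelCaseConfig"))  # CAMEL_CASE_CONFIG
```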
Part of my answer is based on other people's answers here.
def split_string_after_upper_case(word):
    word_lst = [x for x in word]
    index = 0
    for char in word[1:]:
        index += 1
        if char.isupper():
            word_lst.insert(index, ' ')
            index += 1
    return ''.join(word_lst).split(" ")
k = split_string_after_upper_case('TheLongAndWindingRoad')
print(k)
