Merge the matches from regular expressions into a single list - python

I am trying to split a CamelCase string into a single list of words.
I managed to separate the words with a regular expression,
but I am clueless about how to create a single list of all the matches.
I tried concatenating the lists and appending to them, but I don't think that would work in my case.
import re

n = "SafaNeelHelloAByeSafaJasleen"
patt = re.compile(r'([A-Z][a-z]*|[a-z$])')
matches = patt.finditer(n)
for match in matches:
    a = match.group()
    words = a.split()  # builds a new one-element list on every iteration
    print(words)
output:
['Safa']
['Neel']
['Hello']
['A']
['Bye']
['Safa']
['Jasleen']
Desired output:
['Safa','Neel','Hello','A','Bye','Safa','Jasleen']

You're looking for re.findall(), not re.finditer():
>>> import re
>>> string = "SafaNeelHelloAByeSafaJasleen"
>>> pattern = re.compile(r"([A-Z][a-z]*|[a-z$])")
>>> pattern.findall(string)
['Safa', 'Neel', 'Hello', 'A', 'Bye', 'Safa', 'Jasleen']

You can append each match to a new list:
new_list = []
for match in matches:
    a = match.group()
    new_list.append(a)
Output of new_list:
['Safa', 'Neel', 'Hello', 'A', 'Bye', 'Safa', 'Jasleen']
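The append loop above can also be written as a list comprehension. This is only a minimal sketch that reuses the pattern and input string from the question; the name `words` is introduced here for illustration.

```python
import re

n = "SafaNeelHelloAByeSafaJasleen"
patt = re.compile(r"([A-Z][a-z]*|[a-z$])")  # pattern copied from the question

# finditer() yields match objects; group() extracts the matched text of each one.
words = [match.group() for match in patt.finditer(n)]
print(words)
```

For this pattern, which has a single capturing group around the whole expression, the result is the same list that findall() returns.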

Related

Splitting a single index list into multiple list indexes?

I have a list:
lst = ['words in a list']
and I was hoping to split each one of these words in the string into their own separate indexes. So for example, it would look something like this:
lst = ['words','in','a','list']
I'm wondering if this is possible? I initially thought this would be a simple lst.split() with a loop, but that throws an error.
Thanks for the help!
Use this:
print(lst[0].split())
If the list has more elements:
print([x for i in lst for x in i.split()])
split() only works on strings, so you need to index the list item first and then split it.
lst = lst[0].split()
Use this when you have a list of strings, or a single string inside a list:
lst = ['this is string1', 'this is string2', 'this is string3']
result =' '.join(lst).split()
print(result)
# output : ['this', 'is', 'string1', 'this', 'is', 'string2', 'this', 'is', 'string3']
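The answers above all split each string and flatten the pieces into one list. A small sketch of that idea wrapped in a function follows; the name `split_all` is hypothetical and not part of any library.

```python
def split_all(strings):
    """Split every string in `strings` on whitespace and flatten the result."""
    return [word for s in strings for word in s.split()]


lst = ['this is string1', 'this is string2', 'this is string3']
print(split_all(lst))
print(split_all(['words in a list']))
```

Unlike lst[0].split(), this also handles an empty list and lists with more than one element.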

Split string into array in Python

I have a string with the following structure.
string = "[abcd, abc, a, b, abc]"
I would like to convert that into an array. I keep using Python's split function, but I get spaces and the brackets at the start and end of my new array. I tried working around it with some if statements, but I keep losing letters from the end of some words.
Keep in mind that I don't know the length of the elements in the string. It could be 1, 2, 3 etc.
Assuming your elements never end or start with spaces or square brackets, you could just strip them out (the bracket can be stripped out before splitting):
arr = [ x.strip() for x in string.strip('[]').split(',') ]
It gives the expected result:
print(arr)
['abcd', 'abc', 'a', 'b', 'abc']
The nice part with strip is that it leaves all inner characters untouched. With:
string = "[ab cd, a[b]c, a, b, abc]"
You get: ['ab cd', 'a[b]c', 'a', 'b', 'abc']
You can also do this:
>>> s = string[1:-1].split(", ")
>>> s
['abcd', 'abc', 'a', 'b', 'abc']
If the values in this list are themselves variables (it looks that way because they're not quoted), the easiest way to convert this string to the equivalent list is
string = eval(string)
Caution: if the values in your list should be strings, this will not work. Also note that eval executes arbitrary code, so only use it on input you trust.
Another way to solve this problem:
string = "[abcd, abc, a, b, abc]"
result = string[1:-1].split(", ")
print(result)
Hope this helps
First remove [ and ] from your string, then split on commas, then remove spaces from resulting items (using strip).
If you do not want to use strip, it can be done in the following, rather clumsy, way:
arr = [e[1:] for e in string.split(',')]
arr[-1] = arr[-1].replace(']', '')
print(arr)
['abcd', 'abc', 'a', 'b', 'abc']
I would suggest the following:
[list_element.strip() for list_element in string.strip("[]").split(",")]
This first removes the brackets, then splits on commas and strips the whitespace from each element.
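The strip-then-split approach can be wrapped in a small helper. The sketch below assumes, as the answers do, that elements never start or end with spaces or square brackets; the name `parse_bracketed` is hypothetical, and the empty-input check is an addition not covered by the answers above.

```python
def parse_bracketed(text):
    """Turn a string like "[abcd, abc, a]" into ['abcd', 'abc', 'a']."""
    inner = text.strip().strip('[]')
    if not inner.strip():
        # "[]" would otherwise give [''] because ''.split(',') == ['']
        return []
    return [item.strip() for item in inner.split(',')]


print(parse_bracketed("[abcd, abc, a, b, abc]"))
print(parse_bracketed("[ab cd, a[b]c, a, b, abc]"))
print(parse_bracketed("[]"))
```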

Python re: if string has one word AND any one of a list of words?

I want to find if a string matches on this rule using a regular expression:
list_of_words = ['a', 'boo', 'blah']
if 'foo' in temp_string and any(word in temp_string for word in list_of_words)
The reason I want it in a regular expression is that I have hundreds of rules like it and different from it so I want to save them all as patterns in a dict.
The only one I could think of is this but it doesn't seem pretty:
re.search(r'foo.*(a|boo|blah)|(a|boo|blah).*foo', temp_string)
You can join the array elements using | to construct a lookahead assertion regex:
>>> import re
>>> list_of_words = ['a', 'boo', 'blah']
>>> reg = re.compile( r'^(?=.*\b(?:' + "|".join(list_of_words) + r')\b).*foo' )
>>> print(reg.pattern)
^(?=.*\b(?:a|boo|blah)\b).*foo
>>> reg.findall(r'abcd foo blah')
['abcd foo']
As you can see, we have constructed the regex ^(?=.*\b(?:a|boo|blah)\b).*foo, which asserts the presence of at least one word from list_of_words and then matches foo anywhere in the string.
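Since the question mentions storing hundreds of such rules in a dict, here is a rough sketch of how the lookahead pattern might be generated per rule. The helper `build_rule` and the `rules` dict are hypothetical names; re.escape is added on the assumption that some words may contain regex metacharacters.

```python
import re


def build_rule(required, alternatives):
    """Compile a pattern needing `required` plus one whole word from `alternatives`."""
    # re.escape() keeps characters such as '.' or '+' from being read as regex syntax.
    words = "|".join(re.escape(word) for word in alternatives)
    return re.compile(r"^(?=.*\b(?:" + words + r")\b).*" + re.escape(required))


# Hypothetical rule table: rule name -> compiled pattern.
rules = {"foo_rule": build_rule("foo", ["a", "boo", "blah"])}

for text in ["abcd foo blah", "foo only", "blah then foo"]:
    print(text, "->", bool(rules["foo_rule"].search(text)))
```

Note that \b makes the alternatives match as whole words only, which is stricter than the substring test `word in temp_string` from the question.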

Capitalisation of First Letter of Each Word in a List; Bar All-Caps Words

I need to write a program that can capitalise each word in a sentence (which is stored as a list of words), without affecting the capitalisation of other parts of the sentence.
Let's say, for example, the sentence is 'hello, i am ROB ALSOD'. In a list, this would be:
['hello,','i','am','ROB','ALSOD']
I understand that I could loop through and use the str.title() method to title them, but this would result in:
['Hello,','I','Am','Rob','Alsod']
Notice the difference? The effect I am going for is:
['Hello,','I','Am','ROB','ALSOD']
In other words, I want to keep other capitalised letters the same.
That's a one-liner, using list comprehensions and string slicing:
>>> words = ['hello,','i','am','ROB','ALSOD']
>>> [word[:1].upper() + word[1:] for word in words]
['Hello,', 'I', 'Am', 'ROB', 'ALSOD']
It uppercases word[:1] (everything up to and including the first character) rather than word[0] (the first character itself) in order to avoid an error if your list contains the empty string ''.
While Zero Piraeus's answer is correct, I'd be down for a more functional syntax that avoids loops.
>>> words = ['hello,','i','am','ROB','ALSOD']
>>> def up(word): return word[:1].upper() + word[1:]
>>> list(map(up, words))  # list() is needed in Python 3, where map() returns an iterator
['Hello,', 'I', 'Am', 'ROB', 'ALSOD']
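Both answers rely on the same slicing trick. The sketch below puts it in a named function and includes an empty string to show why word[:1] is used instead of word[0]; the name `capitalise_first` is made up for this example.

```python
def capitalise_first(word):
    """Uppercase the first character and leave the rest of the word untouched."""
    return word[:1].upper() + word[1:]


words = ['hello,', 'i', 'am', 'ROB', 'ALSOD', '']
print([capitalise_first(word) for word in words])
print(list(map(capitalise_first, words)))
```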

How to convert a string with comma-delimited items to a list in Python?

How do you convert a string into a list?
Say the string is like text = "a,b,c". After the conversion, text == ['a', 'b', 'c'] and hopefully text[0] == 'a', text[1] == 'b'?
Like this:
>>> text = 'a,b,c'
>>> text = text.split(',')
>>> text
['a', 'b', 'c']
Just to add on to the existing answers: hopefully, you'll encounter something more like this in the future:
>>> word = 'abc'
>>> L = list(word)
>>> L
['a', 'b', 'c']
>>> ''.join(L)
'abc'
But for what you're dealing with right now, go with @Cameron's answer.
>>> word = 'a,b,c'
>>> L = word.split(',')
>>> L
['a', 'b', 'c']
>>> ','.join(L)
'a,b,c'
The following Python code will turn your string into a list of strings:
import ast
teststr = "['aaa','bbb','ccc']"
testarray = ast.literal_eval(teststr)
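ast.literal_eval only accepts Python literals, so it raises an exception on input such as unquoted names. A minimal sketch of guarding against that follows; the wrapper name `parse_list_literal` and the choice to return None on failure are assumptions made for illustration.

```python
import ast


def parse_list_literal(text):
    """Parse a string containing a Python list literal; return None if it is not one."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, list) else None


print(parse_list_literal("['aaa','bbb','ccc']"))
print(parse_list_literal("[abcd, abc]"))  # unquoted names are not literals
```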
I don't think you need to
In Python you seldom need to convert a string to a list, because strings and lists are very similar.
Changing the type
If you really have a string which should be a character array, do this:
In [1]: x = "foobar"
In [2]: list(x)
Out[2]: ['f', 'o', 'o', 'b', 'a', 'r']
Not changing the type
Note that strings are very much like lists in Python.
Strings have accessors, like lists
In [3]: x[0]
Out[3]: 'f'
Strings are iterable, like lists
In [4]: for ch in x:
   ...:     print(ch)
   ...:
f
o
o
b
a
r
TLDR
Strings are lists. Almost.
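The "almost" matters mainly when you want to change a character. A short sketch of the difference, using the same "foobar" string as above (the name `chars` is introduced here):

```python
x = "foobar"
chars = list(x)

chars[0] = "F"  # lists are mutable, so item assignment works
print("".join(chars))

try:
    x[0] = "F"  # strings are immutable, so this raises TypeError
except TypeError as exc:
    print("TypeError:", exc)
```

So converting to a list is still worthwhile when you need to modify individual characters in place.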
In case you want to split by spaces, you can just use .split():
a = 'mary had a little lamb'
z = a.split()
print(z)
Output:
['mary', 'had', 'a', 'little', 'lamb']
If you actually want arrays (Python 2 only; the 'c' typecode was removed in Python 3):
>>> from array import array
>>> text = "a,b,c"
>>> text = text.replace(',', '')
>>> myarray = array('c', text)
>>> myarray
array('c', 'abc')
>>> myarray[0]
'a'
>>> myarray[1]
'b'
If you do not need arrays and only want to look up characters by index, remember that a string is a sequence just like a list, except that it is immutable:
>>> text = "a,b,c"
>>> text = text.replace(',', '')
>>> text[0]
'a'
m = '[[1,2,3],[4,5,6],[7,8,9]]'
m = eval(m.split()[0])  # eval runs arbitrary code; only use it on trusted input
print(m)
# [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
All the answers are good. Another way of doing it is with a list comprehension; see the solution below.
u = "UUUDDD"
lst = [x for x in u]
For a comma-separated list, do the following:
u = "U,U,U,D,D,D"
lst = [x for x in u.split(',')]
I usually use:
l = [ word.strip() for word in text.split(',') ]
The strip() call removes whitespace around each word.
To convert a string of the form a="[[1, 3], [2, -6]]", I wrote this not yet optimized code:
matrixAr = []
mystring = "[[1, 3], [2, -4], [19, -15]]"
b = mystring.replace("[[", "").replace("]]", "")  # remove the leading [[ and trailing ]]
for line in b.split('], ['):
    # int() tolerates surrounding spaces, so ' 3' converts cleanly
    row = list(map(int, line.split(',')))
    matrixAr.append(row)
print(matrixAr)
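As an alternative to the manual replace-and-split above, a string of nested integer lists in this form also happens to be valid JSON. The sketch below swaps in the standard json module; it is a different technique from the answer's code and assumes the input contains only numbers and square brackets.

```python
import json

mystring = "[[1, 3], [2, -4], [19, -15]]"

# json.loads parses nested lists of numbers directly into Python lists of ints.
matrix = json.loads(mystring)
print(matrix)
```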
split() is your friend here. I will cover a few aspects of split() that are not covered by other answers.
If no arguments are passed to split(), it splits the string on whitespace characters (space, tab, and newline). Leading and trailing whitespace is ignored, and consecutive whitespace characters are treated as a single delimiter.
Example:
>>> " \t\t\none two three\t\t\tfour\nfive\n\n".split()
['one', 'two', 'three', 'four', 'five']
When a single-character delimiter is passed, split() behaves quite differently from its default behavior: leading and trailing delimiters are not ignored, and repeated delimiters are not "coalesced" into one.
Example:
>>> ",,one,two,three,,\n four\tfive".split(',')
['', '', 'one', 'two', 'three', '', '\n four\tfive']
So, if stripping of whitespaces is desired while splitting a string based on a non-whitespace delimiter, use this construct:
words = [item.strip() for item in string.split(',')]
When a multi-character string is passed as the delimiter, it is taken as a single delimiter and not as a character class or a set of delimiters.
Example:
>>> "one,two,three,,four".split(',,')
['one,two,three', 'four']
To coalesce multiple delimiters into one, you would need to use the re.split(regex, string) approach. See the related posts below.
Related
string.split() - Python documentation
re.split() - Python documentation
Split string based on regex
Split string based on a regular expression
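To illustrate the re.split approach mentioned above, here is a small sketch that treats any run of commas and whitespace as one delimiter. The character class `[,\s]+` is an assumption about which delimiters you want to merge; adjust it to your data.

```python
import re

text = ",,one,two,three,,\n four\tfive"

# [,\s]+ treats any run of commas and/or whitespace as a single delimiter.
parts = re.split(r"[,\s]+", text)
# A leading or trailing delimiter still produces an empty string, so drop those.
words = [part for part in parts if part]
print(words)
```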
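# to strip `,` and `.` from a string in Python 3 ->
>>> 'a,b,c.'.translate(str.maketrans('', '', ',.'))
'abc'
# in Python 2 the equivalent call was 'a,b,c.'.translate(None, ',.')
You should use the built-in translate method for strings.
Type help('abc'.translate) at the Python shell for more info.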
Using functional Python:
text = list(filter(lambda x: x != ',', map(str, text)))  # list() is needed in Python 3, where filter() is lazy
Example 1
>>> email = "myemailid#gmail.com"
>>> email.split()
['myemailid#gmail.com']
Example 2
>>> email = "myemailid#gmail.com, someonsemailid#gmail.com"
>>> email.split(',')
['myemailid#gmail.com', ' someonsemailid#gmail.com']
