Split string into array in Python

I have a string with the following structure.
string = "[abcd, abc, a, b, abc]"
I would like to convert that into an array. I keep using the split function in Python, but I get spaces, plus the brackets at the start and the end of my new array. I tried working around it with some if statements, but I keep losing letters at the end of some words.
Keep in mind that I don't know the length of the elements in the string. It could be 1, 2, 3, etc.

Assuming your elements never end or start with spaces or square brackets, you could just strip them out (the bracket can be stripped out before splitting):
arr = [ x.strip() for x in string.strip('[]').split(',') ]
As expected, it gives:
print(arr)
['abcd', 'abc', 'a', 'b', 'abc']
The nice part with strip is that it leaves all inner characters untouched. With:
string = "[ab cd, a[b]c, a, b, abc]"
You get: ['ab cd', 'a[b]c', 'a', 'b', 'abc']

You can also do this
>>> s = string[1:len(string)-1].split(", ")
>>> s
['abcd', 'abc', 'a', 'b', 'abc']

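If the values in this list are variables themselves (it looks that way because they are not quoted), the easiest way to convert this string to the equivalent list is
string = eval(string)
Caution: If the values in your list should be strings, this will not work: eval raises a NameError for every name that is not defined. eval also executes arbitrary code, so only use it on input you trust.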
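To illustrate the caution above, here is a minimal sketch. The `names` dictionary and its values are invented for this example, and passing an empty `__builtins__` mapping is just one way to keep the evaluated text away from built-in functions; it is not a full sandbox.

```python
string = "[abcd, abc, a, b, abc]"

# Hypothetical variable values, made up for this example.
names = {"abcd": 1, "abc": 2, "a": 3, "b": 4}

# With every name defined, eval returns the list of their values.
values = eval(string, {"__builtins__": {}}, names)
print(values)  # [1, 2, 3, 4, 2]

# Without definitions, the bare words are unknown names.
try:
    eval(string, {"__builtins__": {}}, {})
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__
print(error_name)  # NameError
```

If the elements are meant to be plain strings, the strip and split approach from the other answers is the better fit.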

Another way to solve this problem:
string = "[abcd, abc, a, b, abc]"
result = string[1:len(string)-1].split(", ")
print(result)
Hope this helps

First remove [ and ] from your string, then split on commas, then remove spaces from resulting items (using strip).
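The three steps above can be sketched one at a time. The intermediate variable names are only illustrative.

```python
string = "[abcd, abc, a, b, abc]"

# Step 1: remove the surrounding brackets.
without_brackets = string.strip("[]")

# Step 2: split on commas (items still carry a leading space).
pieces = without_brackets.split(",")

# Step 3: strip the whitespace around each item.
arr = [piece.strip() for piece in pieces]

print(without_brackets)  # abcd, abc, a, b, abc
print(pieces)            # ['abcd', ' abc', ' a', ' b', ' abc']
print(arr)               # ['abcd', 'abc', 'a', 'b', 'abc']
```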

If you do not want to use strip, it can be done in the following, rather clumsy, way:
arr = [e[1:] for e in string.split(',')]  # drop the leading '[' or space of each item
arr[-1] = arr[-1].replace(']', '')  # drop the closing bracket from the last item
print(arr)
['abcd', 'abc', 'a', 'b', 'abc']

I would suggest the following.
[list_element.strip() for list_element in string.strip("[]").split(",")]
First remove the brackets, then split on commas and strip each element.

Related

Merge the matches from regular expressions into a single list

I am trying to split a CamelCase string into a single list of words.
I managed to separate the words with a regular expression,
but I am clueless about how to create a single list of all the matches.
I tried concatenating and appending the lists, but I don't think that works in my case.
import re
n = "SafaNeelHelloAByeSafaJasleen"
patt = re.compile(r'([A-Z][a-z]*|[a-z$])')
matches = patt.finditer(n)
for match in matches:
    a = match.group()
    list = a.split()
    print(list)
output:
['Safa']
['Neel']
['Hello']
['A']
['Bye']
['Safa']
['Jasleen']
Desired output:
['Safa','Neel','Hello','A','Bye','Safa','Jasleen']
You're looking for re.findall(), not re.finditer():
>>> string = "SafaNeelHelloAByeSafaJasleen"
>>> pattern = re.compile(r"([A-Z][a-z]*|[a-z$])")
>>> pattern.findall(string)
['Safa', 'Neel', 'Hello', 'A', 'Bye', 'Safa', 'Jasleen']
You can append the matches to a new list:
new_list = []
for match in matches:
    a = match.group()
    new_list.append(a)
Output of new_list:
['Safa', 'Neel', 'Hello', 'A', 'Bye', 'Safa', 'Jasleen']
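As a quick check that the two answers above agree, the sketch below runs both on the string from the question, reusing the question's pattern unchanged. The variable names `from_findall` and `from_finditer` are only illustrative.

```python
import re

n = "SafaNeelHelloAByeSafaJasleen"
patt = re.compile(r"([A-Z][a-z]*|[a-z$])")

# findall returns the matched strings directly.
from_findall = patt.findall(n)

# finditer yields match objects; collect each match's text.
from_finditer = [match.group() for match in patt.finditer(n)]

print(from_findall)
print(from_findall == from_finditer)  # True
```

Note that `finditer` returns an iterator, so it can be looped over only once; call it again if the matches are needed a second time.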

How to print the strings having repeating characters?

The question is that:
Suppose I have a string S='ABC'; then I want the output to be this list: ['AAA','BBB','CCC','AAB','ABB','AAC','ACC','BBC','BCC']
How do I achieve this result?
Edit: Thanks to @Breno Monteiro, I came up with a solution based on the example he showed. I first produced the list ['AAA','BBB','CCC'] by repeating each character 3 times. Then, in each of those elements, I replaced the characters at index 0 and index 1, one after the other, with the next character of the string, wrapping around: 'A' is replaced by 'B', 'B' by 'C', and 'C' by 'A'. So the actual output came out as ['AAA', 'BBB', 'CCC', 'BAA', 'BBA', 'CBB', 'CCB', 'ACC', 'AAC']
My code:
string='ABC'
K=3
output=['AAA','BBB','CCC','AAB','ABB','AAC','ACC','BBC','BCC']
s=""
exp_output,temp=[],[]
ind=1
#including all repeating characters in the string
for i in string:
    s+=i*K
    exp_output.append(s)
    s=""
#including all repeating characters by the first and second index
for i in exp_output:
    for j in range(K-1):
        i=i.replace(i[j],string[ind%len(string)],1)
        temp.append(i)
        #print(temp)
    ind+=1
exp_output.extend(temp)
print(exp_output)
The simplest way to repeat characters in Python is:
character = 'A'
repeat_times = 3
print(character * repeat_times)
Output: AAA
You can also iterate over a Python string character by character, like this:
characters = 'ABC'
repeat_times = 3
for character in characters:
    print(character*repeat_times)
Output (one per line): AAA, BBB, CCC
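The answer above covers only the 'AAA', 'BBB', 'CCC' part of the desired list. The question does not state the rule behind the rest, so the sketch below rests on an assumption: the desired strings are all length-3 combinations, with letters kept in the order of the input, that use at most two distinct characters. Under that reading, `itertools.combinations_with_replacement` plus a filter gives the same nine strings, though in a different order than the question lists them.

```python
from itertools import combinations_with_replacement

string = "ABC"
K = 3

# Assumed rule: keep combinations that use at most two distinct characters.
result = [
    "".join(combo)
    for combo in combinations_with_replacement(string, K)
    if len(set(combo)) <= 2
]

print(result)
# ['AAA', 'AAB', 'AAC', 'ABB', 'ACC', 'BBB', 'BBC', 'BCC', 'CCC']
```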

Splitting a single index list into multiple list indexes?

I have a list:
lst = ['words in a list']
and I was hoping to split each one of these words in the string into their own separate indexes. So for example, it would look something like this:
lst = ['words','in','a','list']
I'm wondering if this is possible. I initially thought this would be a simple lst.split() with a loop, but that throws an error.
Thanks for the help!
Use this:
print(lst[0].split())
If the list has more elements:
print([x for i in lst for x in i.split()])
Split only works for a string type. So you need to index the list item first and then split.
lst = lst[0].split()
Use this when you have a list of strings, or a single string inside a list:
lst = ['this is string1', 'this is string2', 'this is string3']
result =' '.join(lst).split()
print(result)
# output : ['this', 'is', 'string1', 'this', 'is', 'string2', 'this', 'is', 'string3']
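To make the point about split concrete, here is a small sketch showing the error from calling split on the list itself, followed by the per-item split from the answers above. The variable names are only illustrative.

```python
lst = ['words in a list']

# A list has no split method, so this raises AttributeError.
try:
    lst.split()
    error = None
except AttributeError as exc:
    error = str(exc)
print(error)

# Split each string in the list and flatten the pieces into one list.
words = [word for item in lst for word in item.split()]
print(words)  # ['words', 'in', 'a', 'list']
```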

Sort text based on last 3rd character

I am using the sorted() function to sort the text based on last character
which works perfectly
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-1]
    return sorted(strings,key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))
Output
['a', 'from', 'hello', 'letter', 'last']
My requirement is to sort based on the third-to-last character. The problem is that a few of the words have fewer than 3 characters; in that case they should be sorted on the next available character (the second-to-last if present, otherwise the last). I am looking for a Pythonic way to do this.
At present I am getting
IndexError: string index out of range
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-3]
    return sorted(strings,key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))
You can use:
return sorted(strings,key=lambda x: x[max(0,len(x)-3)])
Here we compute len(x) - 3, which is the index of the third-to-last character. If the string is shorter than three characters, that index would be negative, so max(0, ...) clamps it to 0 and we take the first character instead, which is then the second-to-last or the last one.
This works as long as every string has at least one character. It produces:
>>> sorted(["hello","from","last","letter","a"],key=lambda x: x[max(0,len(x)-3)])
['last', 'a', 'hello', 'from', 'letter']
If you do not mind that ties on the third-to-last character are then broken by the characters after it (so strings such as 'a' and 'abc' may end up in a different relative order than above), you can use a more elegant approach:
from operator import itemgetter
return sorted(strings,key=itemgetter(slice(-3,None)))
Here we build a slice covering the last three characters (or the whole string if it is shorter) and compare those substrings. This gives:
>>> sorted(strings,key=itemgetter(slice(-3,None)))
['a', 'last', 'hello', 'from', 'letter']
Since we compare with:
['a', 'last', 'hello', 'from', 'letter']
# ['a', 'ast', 'llo', 'rom', 'ter'] (comparison key)
You can simply use the minimum of the string length and 3:
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-min(len(s), 3)]
    return sorted(strings,key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))
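The three key functions suggested above can be compared side by side on the question's input. This is only a sketch; the names `by_index`, `by_min` and `by_slice` are illustrative.

```python
words = ["hello", "from", "last", "letter", "a"]

# Clamp the index at 0 for strings shorter than three characters.
by_index = sorted(words, key=lambda s: s[max(0, len(s) - 3)])

# Count back min(len(s), 3) characters from the end.
by_min = sorted(words, key=lambda s: s[-min(len(s), 3)])

# Compare the whole tail of up to three characters.
by_slice = sorted(words, key=lambda s: s[-3:])

print(by_index)  # ['last', 'a', 'hello', 'from', 'letter']
print(by_min)    # ['last', 'a', 'hello', 'from', 'letter']
print(by_slice)  # ['a', 'last', 'hello', 'from', 'letter']
```

The first two use a single character as the key and keep tied items in their input order, while the slice key also compares the characters that follow, which is why 'a' moves ahead of 'last'.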

How to convert a string with comma-delimited items to a list in Python?

How do you convert a string into a list?
Say the string is like text = "a,b,c". After the conversion, text == ['a', 'b', 'c'] and hopefully text[0] == 'a', text[1] == 'b'?
Like this:
>>> text = 'a,b,c'
>>> text = text.split(',')
>>> text
['a', 'b', 'c']
Just to add on to the existing answers: hopefully, you'll encounter something more like this in the future:
>>> word = 'abc'
>>> L = list(word)
>>> L
['a', 'b', 'c']
>>> ''.join(L)
'abc'
But for what you're dealing with right now, go with @Cameron's answer.
>>> word = 'a,b,c'
>>> L = word.split(',')
>>> L
['a', 'b', 'c']
>>> ','.join(L)
'a,b,c'
The following Python code will turn your string into a list of strings:
import ast
teststr = "['aaa','bbb','ccc']"
testarray = ast.literal_eval(teststr)
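A short sketch of what ast.literal_eval accepts: it parses Python literals only, so a list of quoted strings works, while a list of bare words is rejected with a ValueError. The variable names are only illustrative.

```python
import ast

# Quoted items form a valid list literal.
quoted = "['aaa','bbb','ccc']"
parsed = ast.literal_eval(quoted)
print(parsed)  # ['aaa', 'bbb', 'ccc']

# Bare words are names, not literals, so they are rejected.
unquoted = "[abcd, abc, a, b, abc]"
try:
    ast.literal_eval(unquoted)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```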
I don't think you need to
In python you seldom need to convert a string to a list, because strings and lists are very similar
Changing the type
If you really have a string which should be a character array, do this:
In [1]: x = "foobar"
In [2]: list(x)
Out[2]: ['f', 'o', 'o', 'b', 'a', 'r']
Not changing the type
Note that Strings are very much like lists in python
Strings have accessors, like lists
In [3]: x[0]
Out[3]: 'f'
Strings are iterable, like lists
In [4]: for i in range(len(x)):
   ...:     print(x[i])
   ...:
f
o
o
b
a
r
TLDR
Strings are lists. Almost.
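The "almost" matters mainly for mutation. A minimal sketch: indexing and iteration work on a string just as on a list, but item assignment does not, so converting with list() is still needed when characters have to be changed in place.

```python
x = "foobar"

# Indexing and iteration behave like a list.
first = x[0]
chars = [ch for ch in x]

# Item assignment fails because strings are immutable.
try:
    x[0] = "F"
    mutable = True
except TypeError:
    mutable = False
print(mutable)  # False

# Convert to a list to modify, then join back into a string.
as_list = list(x)
as_list[0] = "F"
joined = "".join(as_list)
print(joined)  # Foobar
```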
In case you want to split by spaces, you can just use .split():
a = 'mary had a little lamb'
z = a.split()
print(z)
Output:
['mary', 'had', 'a', 'little', 'lamb']
If you actually want arrays (this example is Python 2; the 'c' typecode was removed from the array module in Python 3):
>>> from array import array
>>> text = "a,b,c"
>>> text = text.replace(',', '')
>>> myarray = array('c', text)
>>> myarray
array('c', 'abc')
>>> myarray[0]
'a'
>>> myarray[1]
'b'
If you do not need arrays and only want to look up characters by index, remember that a string is a sequence just like a list, except that it is immutable:
>>> text = "a,b,c"
>>> text = text.replace(',', '')
>>> text[0]
'a'
m = '[[1,2,3],[4,5,6],[7,8,9]]'
m = eval(m.split()[0])
print(m)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
All the answers are good. Another way is a list comprehension; see the solution below.
u = "UUUDDD"
lst = [x for x in u]
For a comma-separated string, do the following:
u = "U,U,U,D,D,D"
lst = [x for x in u.split(',')]
I usually use:
l = [ word.strip() for word in text.split(',') ]
The strip removes spaces around the words.
To convert a string of the form a="[[1, 3], [2, -6]]" I wrote this code, which is not yet optimized:
matrixAr = []
mystring = "[[1, 3], [2, -4], [19, -15]]"
b = mystring.replace("[[","").replace("]]","")  # remove the leading [[ and trailing ]]
for line in b.split('], ['):
    row = list(map(int, line.split(',')))  # int() tolerates the spaces around the numbers
    matrixAr.append(row)
print(matrixAr)
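As an alternative to the manual parsing above, the same string could likely be handled by ast.literal_eval, which another answer on this page uses for a list of strings. This is a swapped-in technique, not the author's method, and it assumes the string is a valid Python literal.

```python
import ast

mystring = "[[1, 3], [2, -4], [19, -15]]"

# Parse the nested list literal, negative numbers included.
matrix = ast.literal_eval(mystring)

print(matrix)        # [[1, 3], [2, -4], [19, -15]]
print(matrix[1][1])  # -4
```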
split() is your friend here. I will cover a few aspects of split() that are not covered by other answers.
If no arguments are passed to split(), it would split the string based on whitespace characters (space, tab, and newline). Leading and trailing whitespace is ignored. Also, consecutive whitespaces are treated as a single delimiter.
Example:
>>> " \t\t\none two three\t\t\tfour\nfive\n\n".split()
['one', 'two', 'three', 'four', 'five']
When a single character delimiter is passed, split() behaves quite differently from its default behavior. In this case, leading/trailing delimiters are not ignored, repeating delimiters are not "coalesced" into one either.
Example:
>>> ",,one,two,three,,\n four\tfive".split(',')
['', '', 'one', 'two', 'three', '', '\n four\tfive']
So, if stripping of whitespaces is desired while splitting a string based on a non-whitespace delimiter, use this construct:
words = [item.strip() for item in string.split(',')]
When a multi-character string is passed as the delimiter, it is taken as a single delimiter and not as a character class or a set of delimiters.
Example:
>>> "one,two,three,,four".split(',,')
['one,two,three', 'four']
To coalesce multiple delimiters into one, you would need to use the re.split(regex, string) approach. See the related posts below.
Related
string.split() - Python documentation
re.split() - Python documentation
Split string based on regex
Split string based on a regular expression
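A minimal sketch of the re.split approach mentioned above, contrasted with plain split. The patterns are just one reasonable choice and the sample strings are made up for the example.

```python
import re

text = "one,two,three,,four"

# str.split keeps the empty string between the doubled commas.
plain = text.split(",")
print(plain)  # ['one', 'two', 'three', '', 'four']

# A run of one or more commas is treated as a single delimiter.
coalesced = re.split(r",+", text)
print(coalesced)  # ['one', 'two', 'three', 'four']

# Also swallow any whitespace around the commas.
messy = "one , two,,  three"
cleaned = re.split(r"\s*,+\s*", messy)
print(cleaned)  # ['one', 'two', 'three']
```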
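# to strip `,` and `.` from a string (Python 3) ->
>>> 'a,b,c.'.translate(str.maketrans('', '', ',.'))
'abc'
You should use the built-in translate method for strings. In Python 2 the equivalent call was 'a,b,c.'.translate(None, ',.').
Type help('abc'.translate) at the Python shell for more info.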
Using functional Python (in Python 3, filter returns an iterator, so wrap it in list):
text = list(filter(lambda x: x != ',', map(str, text)))
Example 1
>>> email = "myemailid@gmail.com"
>>> email.split()
#OUTPUT
['myemailid@gmail.com']
Example 2
>>> email = "myemailid@gmail.com, someonsemailid@gmail.com"
>>> email.split(', ')
#OUTPUT
['myemailid@gmail.com', 'someonsemailid@gmail.com']
