String splitting in Python - python

Is there a way to split a string in Python using multiple delimiters instead of one? split seems to take in only one parameter as delimiter.
Also, I cannot import the re module. (This is the main stumbling block really.)
Any suggestions on how I should do it?
Thanks!

In order to split on multiple sequences you could simply replace all of the sequences you need to split on with just one sequence and then split on that one sequence.
So
s = s.replace("z", "s")
s.split("s")
Will split on s and z.

Here is a generic approach for a list of splitters. Can someone write this with less code?
Initializing vars:
>>> splits = ['.', '-', ':', ',']
>>> s='hola, que: tal. be'
Splitting:
>>> from functools import reduce  # reduce is not a builtin in Python 3
>>> r = [s]
>>> for p in splits:
...     r = reduce(lambda x, y: x + y, map(lambda z: z.split(p), r))
Results:
>>> r
['hola', ' que', ' tal', ' be']
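A shorter variant of the same idea, as a minimal sketch: rewrite every delimiter to the first one with str.replace, then split once. The name split_on_any is made up for illustration, and the sketch assumes at least one delimiter is given and that no delimiter contains another.

```python
def split_on_any(text, delimiters):
    """Split text on any of the given delimiters without using re.

    split_on_any is a hypothetical helper name, not a standard function.
    """
    first, *rest = delimiters  # assumes at least one delimiter
    for delim in rest:
        # Normalise every other delimiter to the first one.
        text = text.replace(delim, first)
    return text.split(first)


parts = split_on_any('hola, que: tal. be', ['.', '-', ':', ','])
print(parts)  # ['hola', ' que', ' tal', ' be']
```

The surrounding spaces are kept, as in the reduce version; call strip() on each piece if they are unwanted.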

Related

How to replace a character within a string in a list?

I have a list whose elements are strings. Each item contains unwanted characters that I want to remove. For example, given list = ["string1.", "string2."], the unwanted character is ".", so my desired list is list = ["string1", "string2"]. I have to remove several special characters, so the code must be reusable. Any help?
hola = ["holamundoh","holah","holish"]
print(hola[0])
print(hola[0][0])
for i in range(0,len(hola),1):
    for j in range(0,len(hola[i]),1):
        if (hola[i][j] == "h"):
            hola[i] = hola[i].translate({ord('h'): None})
print(hola)
However, I have an error in the conditional if: "string index out of range". Any help? thanks
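Strings are immutable in Python, so every "modification" actually builds a new string. In your loop, hola[i] becomes shorter once translate removes characters, but range(0, len(hola[i]), 1) was computed from the original length, so j eventually runs past the end of the shortened string. Build the new string with replace instead: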
list_ = ["string1.", "string2."]
for i, s in enumerate(list_):
    list_[i] = s.replace('.', '')
Or, without a loop:
list_ = ["string1.", "string2."]
list_ = list(map(lambda s: s.replace('.', ''), list_))
You can define the function for removing an unwanted character.
def remove_unwanted(original, unwanted):
    return [x.replace(unwanted, "") for x in original]
Then you can call this function like the following to get the result.
print(remove_unwanted(hola, "."))
Use str.replace for simple replacements:
lst = [s.replace('.', '') for s in lst]
Or use re.sub for more powerful and more complex regular expression-based replacements:
import re
lst = [re.sub(r'[.]', '', s) for s in lst]
Here are a few examples of more complex replacements that you may find useful, e.g., replace everything that is not a word character:
import re
lst = [re.sub(r'[\W]+', '', s) for s in lst]
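When several different characters have to go and re is not wanted, str.translate with a table from str.maketrans is another option. The following is only a sketch; strip_chars is a hypothetical helper name, and the second sample list is invented for illustration.

```python
def strip_chars(items, unwanted):
    """Return a new list with every character in `unwanted` removed.

    strip_chars is an illustrative name, not a library function.
    """
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans('', '', unwanted)
    return [s.translate(table) for s in items]


cleaned = strip_chars(["string1.", "string2."], ".")
print(cleaned)  # ['string1', 'string2']

# Made-up sample with several unwanted characters at once.
cleaned_many = strip_chars(["h.o,l!a", "mun.do"], ".,!")
print(cleaned_many)  # ['hola', 'mundo']
```

The table is built once and reused for every element, which suits the "must be used several times" requirement.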

Possible occurrences of splitting a string by delimiter

I have a string: str = "Quote_Policy_Generalparty_NameInfo"
I am splitting the string as str.split("_") which gives me a list in python.
Any help in getting the output as below is appreciated.
[ Quote, Quote_Policy, Quote_Policy_Generalparty, Quote_Policy_Generalparty_NameInfo ]
You can use range(len(list)) to create slices list[:1], list[:2], etc. and then "_".join(...) to concatenate every slice
text = "Quote_Policy_Generalparty_NameInfo"
data = text.split('_')
result = []
for x in range(len(data)):
    part = data[:x+1]        # first x+1 tokens
    part = "_".join(part)    # glue them back together
    result.append(part)
print(result)
input = "Quote_Policy_Generalparty_NameInfo"
tokenized = input.split("_")
combined = [
    "_".join(tokenized[:i])
    for i, token in enumerate(tokenized, 1)
]
The value of combined above will be
['Quote', 'Quote_Policy', 'Quote_Policy_Generalparty', 'Quote_Policy_Generalparty_NameInfo']
You could use accumulate from itertools. Its optional second argument is a function that decides how to combine the running result with the next element:
from itertools import accumulate
input = "Quote_Policy_Generalparty_NameInfo"
output = [*accumulate(input.split('_'), lambda str1, str2 : '_'.join([str1,str2])),]
which gives :
Out[11]:
['Quote',
'Quote_Policy',
'Quote_Policy_Generalparty',
'Quote_Policy_Generalparty_NameInfo']
If you find the above answers too clean and satisfactory, you can also consider regular expressions:
>>> import regex as re # For `overlapped` support
>>> x = "Quote_Policy_Generalparty_NameInfo"
>>> list(map(lambda s: s[::-1], re.findall('(?<=_).*$', '_' + x[::-1], overlapped=True)))
['Quote_Policy_Generalparty_NameInfo', 'Quote_Policy_Generalparty', 'Quote_Policy', 'Quote']
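For comparison, the same prefixes can be taken directly as slices of the original string, cutting just before each underscore. This is a minimal sketch, not one of the posted answers; the variable names are illustrative.

```python
text = "Quote_Policy_Generalparty_NameInfo"

# Slice up to (not including) each underscore, then add the full string.
prefixes = [text[:i] for i, ch in enumerate(text) if ch == '_'] + [text]

print(prefixes)
```

Unlike the regular-expression version, this yields the prefixes from shortest to longest, which is the order the question asks for.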

Splitting lists at the commas

Currently I have a long list that has elements like this:
['01/01/2013 06:31, long string of characters,Unknown'].
How would I split each element into:
['01/01/2013 06:31], [long string of characters],[Unknown]? Can I even do that?
I tried variable.split(","), but I get "AttributeError: 'list' object has no attribute 'split'".
Here's my code:
def sentiment_analysis():
    f = open('C:\path', 'r')
    write_to_list = f.readlines()
    write_to_list = map(lambda write_to_list: write_to_list.strip(), write_to_list)
    [e.split(',') for e in write_to_list]
    print write_to_list[0:2]
    f.close()
    return
I'm still not getting it, I'd appreciate any help!
Solution
You are given this:
['01/01/2013 06:31, long string of characters,Unknown']
Alright. If you know that there is only this one long string in this list, just extract the only element:
>>> x = ['01/01/2013 06:31, long string of characters,Unknown']
>>>
>>> y = x[0].split(",") # extract only element and split by comma
>>> print(y) # list of strings, with one depth
['01/01/2013 06:31', ' long string of characters', 'Unknown']
Now, for whatever reason, you actually want each element of the outer list to be a list with one string in it. That is easy enough to do: simply use map and an anonymous function (in Python 3, wrap the call in list(), because map returns a lazy iterator):
... # continuation from snippet above
...
>>> z = list(map(lambda s: [s], y)) # encapsulates each elem of y in a list
>>> print(z)
[['01/01/2013 06:31'], [' long string of characters'], ['Unknown']]
There you have it.
One-Liner Conclusion
No list comprehensions, no for loops, no generators. Just really simple functional programming and anonymous functions.
Given the original list l,
res = list(map(lambda s: [s],
               l[0].split(",")))
List comprehension!
>>> variable = ['01/01/2013 06:31, long string of characters,Unknown']
>>> [x.split(',') for x in variable]
[['01/01/2013 06:31', ' long string of characters', 'Unknown']]
But wait, that's nested more than you wanted...
>>> import itertools
>>> itertools.chain.from_iterable(x.split(',') for x in variable)
<itertools.chain object at 0x109180fd0>
>>> list(itertools.chain.from_iterable(x.split(',') for x in variable))
['01/01/2013 06:31', ' long string of characters', 'Unknown']
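Note that the list comprehension in the question's code builds a new list and then throws it away; the result has to be assigned to a name. A minimal sketch for a list with many lines follows, which also strips the stray spaces around each field. The second row is made-up sample data, and rows and split_rows are illustrative names.

```python
rows = [
    '01/01/2013 06:31, long string of characters,Unknown',
    '02/01/2013 07:45, another string,Positive',  # invented sample row
]

# Keep the result: split each line on commas and trim each field.
split_rows = [[field.strip() for field in row.split(',')] for row in rows]

for fields in split_rows:
    print(fields)
```

If the fields can themselves contain commas inside quotes, plain split(',') is not enough and the csv module would be the safer choice.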

Python split string every n char

I want to split a string every n characters and print it like this:
MISSISSIPPI => MI*SS*IS*SI*PP*I
I've written a program, but I don't know how to replace the , with a *. Here is the code:
n=input('chunk size')
s=input('Add word')
import re
r=[s[i:i+n] for i in range(0, len(s), n)]
print (r)
This is the output:
['MI', 'SS', 'IS', 'SI', 'PP', 'I']
but I want it to be like this:
MI*SS*IS*SI*PP*I
You could use str.join() for this:
>>> '*'.join(r)
'MI*SS*IS*SI*PP*I'
What this does is iterate over the strings in r, and join them, inserting '*'.
You could also use the re module (note that this pattern is hard-coded for chunks of two characters):
import re
r = '*'.join(re.findall('..|.$', s))
Output:
'MI*SS*IS*SI*PP*I'
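The pattern above only handles chunks of two. A sketch that generalises it to any chunk size builds the quantifier from n; star_chunks is a hypothetical name, and re.DOTALL is added on the assumption that newlines in the input should count as ordinary characters.

```python
import re


def star_chunks(s, n):
    """Join chunks of up to n characters with '*' (illustrative helper)."""
    # .{1,n} greedily takes n characters, or fewer for the final chunk.
    pattern = r'.{1,%d}' % n
    return '*'.join(re.findall(pattern, s, flags=re.DOTALL))


print(star_chunks('MISSISSIPPI', 2))  # MI*SS*IS*SI*PP*I
print(star_chunks('MISSISSIPPI', 3))  # MIS*SIS*SIP*PI
```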
Well at the point that you're at, you just have 1 more line to add:
r = '*'.join(r)
So then your program becomes
n=int(input('chunk size'))  # int() is needed in Python 3, where input() returns a string
s=input('Add word')
r=[s[i:i+n] for i in range(0,len(s),n)]
r = '*'.join(r)
print(r)
Unpack it and then use a custom separator:
>>> print(*r, sep='*')
MI*SS*IS*SI*PP*I
If you want the brackets in the output, use string formatting instead.
>>> print('[{}]'.format('*'.join(r)))
[MI*SS*IS*SI*PP*I]
We can use the split and join methods of strings.
x = 'MI*SS*IS*SI*PP*I'
xlist = x.split('*')
'*'.join(xlist)

Python List Replace [duplicate]

In Python I know that when you're handling strings you can use "replace" to replace a certain character in a string. For example:
h = "2,3,45,6"
h = h.replace(',','\n')
print h
Returns:
2
3
45
6
Is there any way to do this with a list? For example, replace all the "," in a list with "\n"?
A list like:
h = ["hello","goodbye","how are you"]
And the script should output something like this:
"hello"
"goodbye"
"how are you"
Any suggestions would be helpful!
Judging by your example, str.join is probably what you want:
>>> h
['2', '3', '45', '6']
>>> print '\n'.join(str(i) for i in h)
2
3
45
6
Similarly, for your second example:
>>> h = ["hello","goodbye","how are you"]
>>> print '\n'.join(str(i) for i in h)
hello
goodbye
how are you
If you really want the quotation marks around strings, you can use the following:
>>> h = ["hello","goodbye","how are you"]
>>> print '\n'.join('"{0}"'.format(i) if isinstance(i,str) else str(i) for i in h)
"hello"
"goodbye"
"how are you"
>>>
You could use list comprehension for that:
>>> search = 'foo'
>>> replace = 'bar'
>>> lst = ['my foo', 'foo', 'bip']
>>> print [x.replace(search, replace) for x in lst]
['my bar', 'bar', 'bip']
In a list like your h = [2,5,6,8,9], there really are no commas to replace in the list itself. The list contains the items 2, 5 and so on, the commas are merely part of the external representation to make it easier to separate the items visually.
So, to generate some output form from the list but without the commas, you can use any number of techniques. For instance, to join them all up into a single string without commas, use:
"".join([str(x) for x in h])
This will evaluate to the string '25689'.
for each in h: print each
In 3.x:
for each in h: print(each)
A list is simply a representation of data. You can only affect the way it looks in the output.
You can replace the ',' in a string because the ',' is part of the string itself. You cannot replace the ',' in a list, because it is not an item in the list; it is what Python uses, together with the opening and closing square brackets, to delimit the items when displaying the list. It is like asking whether you could replace the '"' used to create a string. On the other hand, if ',' is an item in the list and you want to replace it with a newline item, you could use a list comprehension like:
['\n' if x=="," else x for x in yourlist]
or if you want to print each item on a single line you could use:
for item in yourlist:
    print(item)
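The two cases discussed in this thread can be put side by side in a short Python 3 sketch: joining the items with newlines for display, versus replacing commas that really are inside the strings. The second list, items, is invented for illustration.

```python
h = ["hello", "goodbye", "how are you"]

# Case 1: the commas are only part of how the list is displayed,
# so join the items with the separator you want instead.
as_lines = "\n".join(h)
print(as_lines)

# Case 2: the commas are characters inside the elements themselves
# (made-up sample data), so str.replace applies to each element.
items = ["2,3", "45,6"]
replaced = [item.replace(",", "\n") for item in items]
print(replaced)  # ['2\n3', '45\n6']
```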
