For example, if I have a list of strings
alist=['a_name1_1', 'a_name1_2', 'a_name1_3']
How do I get this:
alist_changed = ['a_n1_1', 'a_n1_2', 'a_n1_3']
alist_changed = [s.replace("ame", "") for s in alist]
If the replacement really needs to be pattern-based, you can use Python's re module and call re.sub to substitute whatever matches a regular expression:
import re
alist=['a_name1_1', 'a_name1_2', 'a_name1_3']
alist_changed = []
pattern = r'_\w*_'
for x in alist:
    y = re.sub(pattern, '_n1_', x, count=1)
    #print(y)
    alist_changed.append(y)
print(alist_changed)
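A more general variant can be sketched with a capture group, so the replacement is derived from the match instead of being hard-coded as '_n1_'. This is only an illustration: the function name abbreviate is hypothetical, and the pattern assumes the part to shorten is a run of lowercase letters that follows an underscore and precedes a digit.

```python
import re

# Assumption: the word to shorten is lowercase letters, right after "_" and right before a digit.
WORD_BEFORE_DIGIT = re.compile(r'(?<=_)([a-z])[a-z]+(?=\d)')

def abbreviate(strings):
    """Keep only the first letter of each matching word (hypothetical helper)."""
    return [WORD_BEFORE_DIGIT.sub(r'\1', s) for s in strings]

alist = ['a_name1_1', 'a_name1_2', 'a_name1_3']
alist_changed = abbreviate(alist)
print(alist_changed)
```

Because the first letter is captured and reused, the same call would also turn a name such as 'a_size2_1' into 'a_s2_1' without changing the code.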
I have a list whose elements are strings. Each item contains unwanted characters that I want to remove. For example, I have the list = ["string1.", "string2."]. The unwanted character is ".", so I don't want that character in any element of the list. My desired list should look like list = ["string1", "string2"]. Any help? I have to remove several special characters, so the code must be reusable.
hola = ["holamundoh","holah","holish"]
print(hola[0])
print(hola[0][0])
for i in range(0,len(hola),1):
    for j in range(0,len(hola[i]),1):
        if (hola[i][j] == "h"):
            hola[i] = hola[i].translate({ord('h'): None})
print(hola)
However, I have an error in the conditional if: "string index out of range". Any help? thanks
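Strings are immutable in Python, so every "modification" actually builds a new string. In your code the inner range is computed from the original length of hola[i], but the string gets shorter as soon as an "h" is removed, so later values of j run past its end. Replace the characters in one step instead: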
list_ = ["string1.", "string2."]
for i, s in enumerate(list_):
    list_[i] = s.replace('.', '')
Or, without a loop:
list_ = ["string1.", "string2."]
list_ = list(map(lambda s: s.replace('.', ''), list_))
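To make the cause of the "string index out of range" error concrete, here is a small sketch using the hola data from the question. The variable names original_length, shortened, and cleaned are only illustrative.

```python
words = ["holamundoh", "holah", "holish"]

# The inner loop in the question iterates over range(original_length) ...
original_length = len(words[0])

# ... but after the first removal the string is shorter than that range,
# so a later index no longer exists in the new string.
shortened = words[0].replace("h", "")
print(original_length, len(shortened))

# Building each cleaned string in a single step avoids indexing altogether.
cleaned = [w.replace("h", "") for w in words]
print(cleaned)
```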
You can define a function that removes an unwanted character:
def remove_unwanted(original, unwanted):
    return [x.replace(unwanted, "") for x in original]
Then you can call this function like the following to get the result.
print(remove_unwanted(hola, "."))
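Since the question mentions several special characters, one possible extension is a translation table that deletes all of them in a single pass. This is a sketch, not part of the answer above; the name remove_chars and the sample data are made up for illustration.

```python
def remove_chars(strings, unwanted):
    """Return new strings with every character in `unwanted` deleted (hypothetical helper)."""
    # str.maketrans with three arguments maps each character of the third one to None.
    table = str.maketrans("", "", unwanted)
    return [s.translate(table) for s in strings]

sample = ["string1.", "str,ing2!"]
print(remove_chars(sample, ".,!"))
```

The table is built once per call, so the same function can be reused for any set of characters without writing a separate replace for each one.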
Use str.replace for simple replacements:
lst = [s.replace('.', '') for s in lst]
Or use re.sub for more powerful and more complex regular expression-based replacements:
import re
lst = [re.sub(r'[.]', '', s) for s in lst]
Here are a few examples of more complex replacements that you may find useful, e.g., replace everything that is not a word character:
import re
lst = [re.sub(r'[\W]+', '', s) for s in lst]
I have a string: str = "Quote_Policy_Generalparty_NameInfo"
I am splitting the string as str.split("_") which gives me a list in python.
Any help in getting the output as below is appreciated.
[ Quote, Quote_Policy, Quote_Policy_Generalparty, Quote_Policy_Generalparty_NameInfo ]
You can use range(len(list)) to create slices list[:1], list[:2], etc. and then "_".join(...) to concatenate every slice
text = "Quote_Policy_Generalparty_NameInfo"
data = text.split('_')
result = []
for x in range(len(data)):
    part = data[:x+1]
    part = "_".join(part)
    result.append(part)
print(result)
input = "Quote_Policy_Generalparty_NameInfo"
tokenized = input.split("_")
combined = [
    "_".join(tokenized[:i])
    for i, token in enumerate(tokenized, 1)
]
The value of combined above will be
['Quote', 'Quote_Policy', 'Quote_Policy_Generalparty', 'Quote_Policy_Generalparty_NameInfo']
You could use accumulate from itertools. Its optional second argument is a function that decides how two elements are combined:
from itertools import accumulate
input = "Quote_Policy_Generalparty_NameInfo"
output = [*accumulate(input.split('_'), lambda str1, str2 : '_'.join([str1,str2])),]
which gives:
['Quote',
'Quote_Policy',
'Quote_Policy_Generalparty',
'Quote_Policy_Generalparty_NameInfo']
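Another way to get the same prefixes, sketched here as an alternative to split and join, is to slice the original string at each separator position. The function name prefixes is hypothetical, and the sketch assumes a single-character separator and that surrounding whitespace should be ignored.

```python
def prefixes(text, sep="_"):
    """Return every prefix of `text` that ends just before a separator, plus the full text."""
    text = text.strip()  # assumption: stray leading/trailing whitespace is not wanted
    # Positions of each separator, followed by the end of the string.
    cuts = [i for i, ch in enumerate(text) if ch == sep] + [len(text)]
    return [text[:i] for i in cuts]

print(prefixes("Quote_Policy_Generalparty_NameInfo "))
```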
If you find the above answers too clean and satisfactory, you can also consider regular expressions:
>>> import regex as re # For `overlapped` support
>>> x = "Quote_Policy_Generalparty_NameInfo"
>>> list(map(lambda s: s[::-1], re.findall('(?<=_).*$', '_' + x[::-1], overlapped=True)))
['Quote_Policy_Generalparty_NameInfo', 'Quote_Policy_Generalparty', 'Quote_Policy', 'Quote']
import re
name = 'propane'
a = []
Alkane = re.findall(r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)', name)
if Alkane != a:
    print(Alkane)
As you can see, when the regular expression is given 'propane', the output contains two empty strings.
[('', '', 'prop', 'ane')]
For these types of inputs, I want to remove the empty strings from the output. I don't know what kind of form this output is in though, it doesn't look like a regular list.
You can use str.split() and str.join() to remove empty strings from your output:
>>> import re
>>> name = 'propane'
>>> Alkane = re.findall(r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)', name)
>>> Alkane
[('', '', 'prop', 'ane')]
>>> [tuple(' '.join(x).split()) for x in Alkane]
[('prop', 'ane')]
Or using filter():
[tuple(filter(None, x)) for x in Alkane]
You can use filter to remove empty strings:
import re
name = 'propane'
a = []
Alkane = list(map(lambda m: tuple(filter(bool, m)), re.findall(r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)', name)))
if Alkane != a:
    print(Alkane)
Or you can use list/tuple comprehension:
import re
name = 'propane'
a = []
Alkane = [tuple(i for i in m if i) for m in re.findall(r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)', name)]
if Alkane != a:
    print(Alkane)
Both output:
[('prop', 'ane')]
The documentation for re.findall states that empty matches are included:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
This means you will need to filter out empty compounds yourself. Use falsiness of the empty string for that.
import re
name = 'propane'
alkanes = re.findall(r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)', name)
alkanes = [tuple(comp for comp in a if comp) for a in alkanes]
print(alkanes) # [('prop', 'ane')]
Also, avoid using capitalized variable names as those are generally reserved for class names.
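The same filtering can be sketched with re.finditer and match objects, reusing the pattern from the question. The names ALKANE and alkane_parts are hypothetical. One difference worth noting: match.groups() reports a group that did not take part in the match as None rather than '', and the truthiness test below drops both.

```python
import re

# Pattern taken from the question; compiled once so it can be reused.
ALKANE = re.compile(r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)')

def alkane_parts(name):
    """Return one tuple per match, keeping only the groups that actually matched."""
    return [tuple(g for g in m.groups() if g) for m in ALKANE.finditer(name)]

print(alkane_parts('propane'))
print(alkane_parts('2-methylpropane'))
```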
I have a list in which some elements are English text and others are Hindi. I want to remove all elements written in English. How can I achieve that?
Example: How to remove hello from list L below?
L = ['मैसेज','खेलना','दारा','hello','मुद्रण']
for i in range(len(L)):
    print(L[i])
Expected Output:
मैसेज
खेलना
दारा
मुद्रण
You can use the isalpha() function:
l = ['मैसेज', 'खेलना', 'दारा', 'hello', 'मुद्रण']
for word in l:
    if not word.isalpha():
        print(word)
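For this list it will give you the result below. In Python 3 this relies on the Hindi words containing vowel signs, which isalpha() does not count as letters, so it is not a general language test: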
मैसेज
खेलना
दारा
मुद्रण
How about a simple list comprehension:
>>> import re
>>> i = ['मैसेज','खेलना','दारा','hello','मुद्रण']
>>> [w for w in i if not re.match(r'[A-Z]+', w, re.I)]
['मैसेज', 'खेलना', 'दारा', 'मुद्रण']
You can use filter with regex match:
import re
list(filter(lambda w: not re.match(r'[a-zA-Z]+', w), ['मैसेज','खेलना','दारा','hello','मुद्रण']))
You can use Python's regular expression module.
import re
l=['मैसेज','खेलना','दारा','hello','मुद्रण']
for string in l:
    if not re.search(r'[a-zA-Z]', string):
        print(string)
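Since the question asks for the English elements to be removed from the list rather than just skipped while printing, the search-based check can be wrapped in a function that returns a new list. This is a sketch under the assumption that "English" means "contains at least one ASCII letter"; the names ASCII_LETTER and drop_english are hypothetical.

```python
import re

# Assumption: any word containing an ASCII letter counts as English.
ASCII_LETTER = re.compile(r'[A-Za-z]')

def drop_english(words):
    """Return the words that contain no ASCII letters."""
    # search() looks anywhere in the word; match() would only test its beginning.
    return [w for w in words if not ASCII_LETTER.search(w)]

L = ['मैसेज', 'खेलना', 'दारा', 'hello', 'मुद्रण']
for word in drop_english(L):
    print(word)
```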
I am trying to extract a value from each string in this list:
l1 = [u'/worldcup/archive/southafrica2010/index.html', u'/worldcup/archive/germany2006/index.html', u'/worldcup/archive/edition=4395/index.html', u'/worldcup/archive/edition=1013/index.html', u'/worldcup/archive/edition=84/index.html', u'/worldcup/archive/edition=76/index.html', u'/worldcup/archive/edition=68/index.html', u'/worldcup/archive/edition=59/index.html', u'/worldcup/archive/edition=50/index.html', u'/worldcup/archive/edition=39/index.html', u'/worldcup/archive/edition=32/index.html', u'/worldcup/archive/edition=26/index.html', u'/worldcup/archive/edition=21/index.html', u'/worldcup/archive/edition=15/index.html', u'/worldcup/archive/edition=9/index.html', u'/worldcup/archive/edition=7/index.html', u'/worldcup/archive/edition=5/index.html', u'/worldcup/archive/edition=3/index.html', u'/worldcup/archive/edition=1/index.html']
I'm starting with a regular expression like the one below
m = re.search(r"\d+", l)
print(m.group())
but I want the value between "archive/" and "/index.html".
I googled and tried something like (?<=archive/\/index.html).*(?=\/index.html:)
but it didn't work for me. How can I get my result list like this?
result = ['germany2006','edition=4395','edition=1013' , ...]
If you know for sure that the pattern will always match, you can use this:
import re
print([re.search("archive/(.*?)/index.html", l).group(1) for l in l1])
Or you can simply split, like this:
print([l.rsplit("/", 2)[-2] for l in l1])
The code below shows the idea on a single string:
>>> import re
>>> p = '/worldcup/archive/southafrica2010/index.html'
>>> r = re.compile('archive/(.*?)/index.html')
>>> m = r.search(p)
>>> m.group(1)
'southafrica2010'
Look-arounds are what you need. Use them like this:
>>> [re.search(r"(?<=archive/).*?(?=/index.html)", s).group() for s in l1]
[u'southafrica2010', u'germany2006', u'edition=4395', u'edition=1013', u'edition=84', u'edition=76', u'edition=68', u'edition=59', u'edition=50', u'edition=39', u'edition=32', u'edition=26', u'edition=21', u'edition=15', u'edition=9', u'edition=7', u'edition=5', u'edition=3', u'edition=1']
The following regular expression
m = re.search(r'(?<=archive\/).+(?=\/index.html)', s)
solves this, assuming s is a string from your list; the value you want is then m.group().
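If some entries might not contain "archive/.../index.html" at all, re.search returns None and calling .group() on it raises an AttributeError. A defensive variant could look like the sketch below; the names ARCHIVE_ITEM and archive_names are hypothetical, and the last sample path is made up to show the non-matching case.

```python
import re

# Capture whatever sits between "archive/" and "/index.html" (dot escaped to match literally).
ARCHIVE_ITEM = re.compile(r'archive/(.*?)/index\.html')

def archive_names(paths):
    """Return the captured segment for every path that matches; skip the rest."""
    names = []
    for path in paths:
        m = ARCHIVE_ITEM.search(path)
        if m:  # search() returns None when the pattern is absent
            names.append(m.group(1))
    return names

sample = [
    '/worldcup/archive/southafrica2010/index.html',
    '/worldcup/archive/edition=1/index.html',
    '/worldcup/other/page.html',  # hypothetical path without the pattern
]
print(archive_names(sample))
```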