I have a list in which some elements are in English and others are in Hindi. I want to remove all elements written in English. How can I achieve that?
Example: how do I remove hello from the list L below?
L = ['मैसेज','खेलना','दारा','hello','मुद्रण']
for i in range(len(L)):
    print(L[i])
Expected Output:
मैसेज
खेलना
दारा
मुद्रण
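You can use the isalpha() method. This works for the sample list, but note that in Python 3 isalpha() is Unicode-aware, so a Hindi word consisting only of letters, with no vowel signs, would also be treated as alphabetic and dropped.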
l = ['मैसेज', 'खेलना', 'दारा', 'hello', 'मुद्रण']
for word in l:
    if not word.isalpha():
        print(word)
This will give you the result:
मैसेज
खेलना
दारा
मुद्रण
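As a hedged aside on how isalpha() treats Devanagari text in Python 3: consonants count as letters, while vowel signs are combining marks and do not. The sketch below adds the word 'कमल', which is not in the original list and is included only to illustrate the point, and then filters on ASCII letters instead. The helper name has_ascii_letter is made up for this example.

```python
import string

# 'कमल' is an illustrative word (an assumption, not from the question):
# it consists of consonants only, so it has no combining vowel signs.
words = ['मैसेज', 'कमल', 'hello']

# str.isalpha() is True only if every character is a letter.
alpha_flags = [w.isalpha() for w in words]
print(alpha_flags)  # [False, True, True]

def has_ascii_letter(word):
    """Return True if the word contains at least one of a-z or A-Z."""
    return any(ch in string.ascii_letters for ch in word)

# Keep only the words without any ASCII letter.
hindi_only = [w for w in words if not has_ascii_letter(w)]
print(hindi_only)  # ['मैसेज', 'कमल']
```

Testing for ASCII letters directly avoids depending on which Unicode categories the Hindi words happen to contain.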
How about a simple list comprehension:
>>> import re
>>> i = ['मैसेज','खेलना','दारा','hello','मुद्रण']
>>> [w for w in i if not re.match(r'[A-Z]+', w, re.I)]
['मैसेज', 'खेलना', 'दारा', 'मुद्रण']
You can use filter with regex match:
import re
list(filter(lambda w: not re.match(r'[a-zA-Z]+', w), ['मैसेज','खेलना','दारा','hello','मुद्रण']))
You can use Python's regular expression module.
import re
l=['मैसेज','खेलना','दारा','hello','मुद्रण']
for string in l:
    if not re.search(r'[a-zA-Z]', string):
        print(string)
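The regex answers above differ in one detail: re.match only tests the start of the string, while re.search scans the whole string. Here is a minimal sketch of the difference, assuming a mixed word 'दाराabc' that is not in the original list:

```python
import re

words = ['खेलना', 'hello', 'दाराabc']  # 'दाराabc' is a made-up mixed word

# re.match anchors at position 0, so only words that START with a Latin letter are dropped.
kept_by_match = [w for w in words if not re.match(r'[a-zA-Z]', w)]

# re.search looks anywhere, so any word CONTAINING a Latin letter is dropped.
kept_by_search = [w for w in words if not re.search(r'[a-zA-Z]', w)]

print(kept_by_match)   # ['खेलना', 'दाराabc']
print(kept_by_search)  # ['खेलना']
```

Which one is appropriate depends on whether mixed-script elements should count as English.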
Related
I have several strings like this:
mylist = ['pearsapple','grapevinesapple','sinkandapple'...]
I want to parse the parts before apple and then append to a new list:
new = ['pears','grapevines','sinkand']
Is there a way other than finding starting points of 'apple' in each string and then appending before the starting point?
Use slicing in combination with the index method of strings:
>>> [x[:x.index('apple')] for x in mylist]
['pears', 'grapevines', 'sinkand']
You could also use a regular expression
>>> import re
>>> [re.match('(.*?)apple', x).group(1) for x in mylist]
['pears', 'grapevines', 'sinkand']
I don't see why though.
If the word apple always comes at the end of each string (a fixed-length suffix), we can slice off the last five characters:
second_list = [item[:-5] for item in mylist]
If some elements in the list don't contain 'apple' at the end of the string, this regex leaves the string untouched:
>>> import re
>>> mylist = ['pearsapple','grapevinesapple','sinkandapple', 'test', 'grappled']
>>> [re.sub('apple$', '', word) for word in mylist]
['pears', 'grapevines', 'sinkand', 'test', 'grappled']
You can also use string split in a list comprehension:
>>> new = [x.split('apple')[0] for x in mylist]
>>> new
['pears', 'grapevines', 'sinkand']
One way to do it would be to iterate through every string in the list, use the split() string method, and collect the results in a new list.
new = []
for word in mylist:
    new.append(word.split("apple")[0])
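The approaches above behave differently when a string does not end in 'apple'. The sketch below compares them on the extended list from the regex answer. strip_suffix is a hypothetical helper name, not something taken from the answers.

```python
mylist = ['pearsapple', 'grapevinesapple', 'sinkandapple', 'test', 'grappled']

def strip_suffix(word, suffix='apple'):
    """Remove suffix only when word actually ends with it."""
    return word[:-len(suffix)] if word.endswith(suffix) else word

stripped = [strip_suffix(w) for w in mylist]
print(stripped)
# ['pears', 'grapevines', 'sinkand', 'test', 'grappled']

# split() cuts at the first 'apple' anywhere in the string, so 'grappled' becomes 'gr'.
split_result = [w.split('apple')[0] for w in mylist]
print(split_result)
# ['pears', 'grapevines', 'sinkand', 'test', 'gr']

# index() raises ValueError when 'apple' is absent.
try:
    'test'.index('apple')
except ValueError:
    print("'test' does not contain 'apple'")
```

If every string is guaranteed to end in 'apple', all of the answers give the same result; the differences only matter for irregular input.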
For example, if I have a list of strings
alist=['a_name1_1', 'a_name1_2', 'a_name1_3']
How do I get this:
alist_changed = ['a_n1_1', 'a_n1_2', 'a_n1_3']
alist_changed = [s.replace("ame", "") for s in alist]
If you need something that is actually pattern-based, you can use Python's re module and substitute the part matched by a regular expression with what you want.
import re
alist=['a_name1_1', 'a_name1_2', 'a_name1_3']
alist_changed = []
pattern = r'_\w*_'
for x in alist:
    y = re.sub(pattern, '_n1_', x, count=1)
    #print(y)
    alist_changed.append(y)
print(alist_changed)
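If the digits after name can vary, a capture group lets the replacement reuse whatever was matched. This is only a sketch, assuming the goal is to shorten name followed by digits to n followed by the same digits. The extra element 'b_name22_3' is invented for illustration.

```python
import re

alist = ['a_name1_1', 'a_name1_2', 'a_name1_3', 'b_name22_3']  # last item is made up

# (\d+) captures the digits after 'name'; \1 puts them back after 'n'.
alist_changed = [re.sub(r'name(\d+)', r'n\1', s) for s in alist]
print(alist_changed)
# ['a_n1_1', 'a_n1_2', 'a_n1_3', 'b_n22_3']
```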
I am trying the following code, but it seems that I am doing something wrong.
import re
lista = ["\\hola\\01\\02Jan\\05\\03",
"\\hola\\01\\02Dem\\12",
"\\hola\\01\\02March\\12\\04"]
for l in lista:
    m = re.search("\\\\\d{2,2}\\\\\d{2,2}[a-zA-Z]+\\\\\d{2,2}\s", l)
    if m:
        print(m.group(0))
The result should be the second string.
I have tried it without \s, but then the pattern matches all of the strings.
You can try this regex:
>>> lista = [r"\hola\01\02Jan\05\03", r"\hola\01\02Dem\12", r"\hola\01\02March\12\04"]
>>> for l in lista:
...     m = re.search(r"\\\d{2,2}\\\d{2,2}[a-zA-Z]+\\\d{2}$", l)
...     if m:
...         print(m.group())
...
Output:
\01\02Dem\12
Use the r"..." form to declare both the regex and the input as raw strings.
Use the anchor $ so that the pattern only matches at the end of the string.
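To see what the $ anchor changes, the sketch below runs the same pattern with and without it over the list from the question. The variable names anchored and unanchored are chosen for this illustration only.

```python
import re

lista = [r"\hola\01\02Jan\05\03", r"\hola\01\02Dem\12", r"\hola\01\02March\12\04"]

# Same pattern twice; only the first one requires the match to end the string.
anchored = re.compile(r"\\\d{2}\\\d{2}[a-zA-Z]+\\\d{2}$")
unanchored = re.compile(r"\\\d{2}\\\d{2}[a-zA-Z]+\\\d{2}")

with_anchor = [s for s in lista if anchored.search(s)]
without_anchor = [s for s in lista if unanchored.search(s)]

print(len(without_anchor))        # 3
print(with_anchor == [lista[1]])  # True
```

Without the anchor, the first and third strings match on their leading part and the trailing segment is simply ignored.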
You can use the following code without regex:
>>> for l in lista:
...     totalNo = l.count('\\')
...     if totalNo == 4:
...         print(l)
Just for fun, I want to know whether it is possible to do this in a list comprehension. Something like:
text = "give the most longest word"
def LongestWord(text):
    l = 0
    words = list()
    for x in text.split():
        word = ''.join(y for y in x if y.isalnum())
        words.append(word)
    for word in words:
        if l < len(word):
            l = len(word)
            r = word
    return r
Not one but two list comprehensions:
s = 'ab, c d'
cmpfn = lambda x: -len(x)
sorted([''.join(y for y in x if y.isalnum()) for x in s.split()], key=cmpfn)[0]
Zero list comprehensions:
import re
def longest_word(text):
    return sorted(re.findall(r'\w+', text), key=len, reverse=True)[0]
print(longest_word("this is an example.")) # prints "example"
Or, if you insist, the same thing but with a list comprehension:
def longest_word(text):
    return [w for w in sorted(re.findall(r'\w+', text), key=len, reverse=True)][0]
No need for a list comprehension, really.
import re
my_longest_word = max(re.findall(r'\w+', text), key=len)
Alternatively, if you don't want to import re, you can avoid a lambda expression and use a single list comprehension, again with max:
my_longest_word = max([''.join(l for l in word if l.isalnum())
                       for word in text.split()], key=len)
How this works:
Splits the text on whitespace, then uses a list comprehension with isalnum() to drop the non-alphanumeric characters from each word.
Takes the max of the cleaned words by length.
How the regex solution works:
\w+ matches every run of one or more word characters (letters, digits and underscore).
findall() places the matches in a list of strings.
max() finds the element with the maximum length in the list.
Outputs (in both cases):
>>> text = "give the most longest word"
>>> my_longest_word
'longest'
>>> text = "what!! is ??with !##$ these special CharACTERS?"
>>> my_longest_word
'CharACTERS'
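Both max-based variants raise ValueError when the text contains no words at all. The following sketch wraps them in functions and uses the default argument of max (available since Python 3.4) to return an empty string instead. The function name longest_word_no_re is made up here, and returning '' for empty input is an assumption about the desired behavior.

```python
import re

def longest_word(text):
    """Return the longest run of word characters, or '' if there is none."""
    return max(re.findall(r'\w+', text), key=len, default='')

def longest_word_no_re(text):
    """Same idea without re: keep only alphanumerics in each whitespace-separated token."""
    cleaned = (''.join(ch for ch in token if ch.isalnum()) for token in text.split())
    return max(cleaned, key=len, default='')

print(longest_word("give the most longest word"))       # longest
print(longest_word_no_re("what!! is ??with !##$ these special CharACTERS?"))  # CharACTERS
print(repr(longest_word("")))                            # ''
```

When several words share the maximum length, max returns the first one it encounters.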
I am trying to extract a value from each string in this list:
l1 = [u'/worldcup/archive/southafrica2010/index.html', u'/worldcup/archive/germany2006/index.html', u'/worldcup/archive/edition=4395/index.html', u'/worldcup/archive/edition=1013/index.html', u'/worldcup/archive/edition=84/index.html', u'/worldcup/archive/edition=76/index.html', u'/worldcup/archive/edition=68/index.html', u'/worldcup/archive/edition=59/index.html', u'/worldcup/archive/edition=50/index.html', u'/worldcup/archive/edition=39/index.html', u'/worldcup/archive/edition=32/index.html', u'/worldcup/archive/edition=26/index.html', u'/worldcup/archive/edition=21/index.html', u'/worldcup/archive/edition=15/index.html', u'/worldcup/archive/edition=9/index.html', u'/worldcup/archive/edition=7/index.html', u'/worldcup/archive/edition=5/index.html', u'/worldcup/archive/edition=3/index.html', u'/worldcup/archive/edition=1/index.html']
I started with a regular expression like the one below
m = re.search(r"\d+", l)
print m.group()
but I want the value between "archive/" and "/index.html".
I googled and tried something like (?<=archive/\/index.html).*(?=\/index.html:)
but it didn't work for me. How can I get my result list like this?
result = ['germany2006','edition=4395','edition=1013' , ...]
If you know for sure that the pattern will always match, you can use this:
import re
print([re.search("archive/(.*?)/index.html", l).group(1) for l in l1])
Or you can simply split like this:
print([l.rsplit("/", 2)[-2] for l in l1])
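To make the split variant easier to follow, this short sketch shows the intermediate list that rsplit produces for one of the paths from the question:

```python
path = '/worldcup/archive/edition=4395/index.html'

# Split at most twice, starting from the right-hand side.
parts = path.rsplit('/', 2)
print(parts)      # ['/worldcup/archive', 'edition=4395', 'index.html']

# The second-to-last piece is the segment just before 'index.html'.
print(parts[-2])  # edition=4395
```

This relies on the wanted value always being the second-to-last path segment, which holds for every entry in l1.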
The following code shows one way to do it:
>>> import re
>>> p = '/worldcup/archive/southafrica2010/index.html'
>>> r = re.compile('archive/(.*?)/index.html')
>>> m = r.search(p)
>>> m.group(1)
'southafrica2010'
Look-arounds are what you need. Use them like this:
>>> [re.search(r"(?<=archive/).*?(?=/index.html)", s).group() for s in l1]
[u'southafrica2010', u'germany2006', u'edition=4395', u'edition=1013', u'edition=84', u'edition=76', u'edition=68', u'edition=59', u'edition=50', u'edition=39', u'edition=32', u'edition=26', u'edition=21', u'edition=15', u'edition=9', u'edition=7', u'edition=5', u'edition=3', u'edition=1']
The regular expression
m = re.search(r'(?<=archive\/).+(?=\/index.html)', s)
can solve this, assuming that s is a string from your list.
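The regex answers call .group() directly on the result of re.search, which fails with AttributeError if a path does not match. Here is a hedged sketch that skips such paths instead. The names ARCHIVE_RE and archive_names, and the third sample path, are made up for illustration.

```python
import re

# Capture whatever sits between 'archive/' and '/index.html' (non-greedy).
ARCHIVE_RE = re.compile(r'archive/(.*?)/index\.html')

def archive_names(paths):
    """Collect the captured segment from each path, skipping non-matching paths."""
    matches = (ARCHIVE_RE.search(p) for p in paths)
    return [m.group(1) for m in matches if m]

sample = [
    '/worldcup/archive/southafrica2010/index.html',
    '/worldcup/archive/edition=84/index.html',
    '/worldcup/teams/index.html',  # made-up path without 'archive/'
]
print(archive_names(sample))  # ['southafrica2010', 'edition=84']
```

Whether silently skipping non-matching paths is appropriate depends on the data; raising an error may be preferable if every path is expected to match.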