How do I get the characters on either side of a given letter in a string, e.g. email#gmail.com?
If the desired letter were ".", it should print "l" and "c" (from "gmail" and "com").
I don't think indexing with [] alone would work, as I expect the algorithm to be more complicated.
Use index().
def getNeighbors(mystring, desired):
    index = mystring.index(desired)
    return mystring[index-1], mystring[index+1]
mystring = 'email#gmail.com'
desired = '.'
print(getNeighbors(mystring, desired)) # >>> ('l', 'c')
A couple notes:
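This returns the characters around the first instance of '.' only. It does no bounds checking, so a match at the very start or end of the string gives a wrong neighbour or an IndexError. Finally, it does not check that the character actually exists in the string; index() raises ValueError if it does not.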
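The caveats above can be handled explicitly. The following is a minimal sketch, not part of the original answer: the name get_neighbors_safe and the choice to report a missing neighbour as None are assumptions made for illustration.

```python
def get_neighbors_safe(text, desired):
    """Return (left, right) neighbours of the first occurrence of `desired`.

    A neighbour that does not exist (match at either end) is reported as None.
    Returns None if `desired` does not occur in `text` at all.
    """
    index = text.find(desired)  # find() returns -1 instead of raising ValueError
    if index == -1:
        return None
    end = index + len(desired)
    left = text[index - 1] if index > 0 else None
    right = text[end] if end < len(text) else None
    return left, right


print(get_neighbors_safe('email#gmail.com', '.'))  # ('l', 'c')
print(get_neighbors_safe('email#gmail.com', 'e'))  # (None, 'm')
print(get_neighbors_safe('email#gmail.com', 'z'))  # None
```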
One possible solution using re module:
s = 'email#gmail.com'
l = '.'
import re
print( re.findall(r'(.)?{}(.)?'.format(re.escape(l)), s))
Prints:
[('l', 'c')]
EDIT: To get only first match you can use re.search:
s = 'email#gmail.com'
l = 'l'
import re
print( re.search(r'(.)?{}(.)?'.format(re.escape(l)), s).groups() )
Prints:
('i', '#')
Suppose I have the following string:
trend = '(A|B|C)_STRING'
I want to expand this to:
A_STRING
B_STRING
C_STRING
The OR condition can be anywhere in the string. i.e STRING_(A|B)_STRING_(C|D)
would expand to
STRING_A_STRING_C
STRING_B_STRING_C
STRING_A_STRING_D
STRING_B_STRING_D
I also want to cover the case of an empty conditional:
(|A_)STRING would expand to:
A_STRING
STRING
Here's what I've tried so far:
def expandOr(trend):
    parenBegin = trend.index('(') + 1
    parenEnd = trend.index(')')
    orExpression = trend[parenBegin:parenEnd]
    originalTrend = trend[0:parenBegin - 1]
    expandedOrList = []
    for oe in orExpression.split("|"):
        expandedOrList.append(originalTrend + oe)
But this is obviously not working.
Is there any easy way to do this using regex?
Here's a pretty clean way. You'll have fun figuring out how it works :-)
def expander(s):
    import re
    from itertools import product
    pat = r"\(([^)]*)\)"
    pieces = re.split(pat, s)
    pieces = [piece.split("|") for piece in pieces]
    for p in product(*pieces):
        yield "".join(p)
Then:
for s in ('(A|B|C)_STRING',
          '(|A_)STRING',
          'STRING_(A|B)_STRING_(C|D)'):
    print(s, "->")
    for t in expander(s):
        print(" ", t)
displays:
(A|B|C)_STRING ->
  A_STRING
  B_STRING
  C_STRING
(|A_)STRING ->
  STRING
  A_STRING
STRING_(A|B)_STRING_(C|D) ->
  STRING_A_STRING_C
  STRING_A_STRING_D
  STRING_B_STRING_C
  STRING_B_STRING_D
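For readers who would rather not figure it out alone, the intermediate values of the expander approach can be printed step by step. This is only an illustrative sketch; the variable names choices and expanded are not from the answer.

```python
import re
from itertools import product

s = 'STRING_(A|B)_STRING_(C|D)'

# With a capturing group in the pattern, re.split keeps the captured text,
# so literal pieces and the contents of each (...) group alternate.
pieces = re.split(r"\(([^)]*)\)", s)
print(pieces)   # ['STRING_', 'A|B', '_STRING_', 'C|D', '']

# Splitting every piece on "|" turns literals into one-element lists
# and groups into lists of alternatives.
choices = [piece.split("|") for piece in pieces]
print(choices)  # [['STRING_'], ['A', 'B'], ['_STRING_'], ['C', 'D'], ['']]

# product picks one entry from each list; joining them gives one expansion.
expanded = ["".join(combo) for combo in product(*choices)]
print(expanded)
```

One consequence of this design is that a "|" outside parentheses would also be split on, so the approach assumes "|" only appears inside groups.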
import exrex
trend = '(A|B|C)_STRING'
trend2 = 'STRING_(A|B)_STRING_(C|D)'
>>> list(exrex.generate(trend))
[u'A_STRING', u'B_STRING', u'C_STRING']
>>> list(exrex.generate(trend2))
[u'STRING_A_STRING_C', u'STRING_A_STRING_D', u'STRING_B_STRING_C', u'STRING_B_STRING_D']
I would do this to extract the groups:
def extract_groups(trend):
    l_parens = [i for i,c in enumerate(trend) if c == '(']
    r_parens = [i for i,c in enumerate(trend) if c == ')']
    assert len(l_parens) == len(r_parens)
    return [trend[l+1:r].split('|') for l,r in zip(l_parens,r_parens)]
And then you can evaluate the product of those extracted groups using itertools.product:
expr = 'STRING_(A|B)_STRING_(C|D)'
from itertools import product
list(product(*extract_groups(expr)))
Out[92]: [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
Now it's just a question of splicing those back onto your original expression. I'll use re for that :)
# Python 3.3+
import re

def _gen(it):
    yield from it

p = re.compile(r'\(.*?\)')
for tup in product(*extract_groups(expr)):
    gen = _gen(tup)
    print(p.sub(lambda x: next(gen), expr))
STRING_A_STRING_C
STRING_A_STRING_D
STRING_B_STRING_C
STRING_B_STRING_D
There's probably a more readable way to get re.sub to sequentially substitute things from an iterable, but this is what came off the top of my head.
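One possibly more readable variant is to wrap each combination in a plain iterator and let the replacement function pull from it. This is a sketch under that assumption; the function name expand is hypothetical, and it collects the groups with finditer rather than with extract_groups.

```python
import re
from itertools import product


def expand(expr):
    """Yield every expansion of the (a|b|...) groups in `expr`."""
    group = re.compile(r'\(([^)]*)\)')
    options = [m.group(1).split('|') for m in group.finditer(expr)]
    for combo in product(*options):
        replacements = iter(combo)
        # re.sub calls the function once per match, left to right,
        # so each group receives the next item of the combination.
        yield group.sub(lambda m: next(replacements), expr)


print(list(expand('STRING_(A|B)_STRING_(C|D)')))
```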
It is easy to achieve with sre_yield module:
>>> import sre_yield
>>> trend = '(A|B|C)_STRING'
>>> strings = list(sre_yield.AllStrings(trend))
>>> print(strings)
['A_STRING', 'B_STRING', 'C_STRING']
The goal of sre_yield is to efficiently generate all values that can match a given regular expression, or count possible matches efficiently... It does this by walking the tree as constructed by sre_parse (same thing used internally by the re module), and constructing chained/repeating iterators as appropriate. There may be duplicate results, depending on your input string though -- these are cases that sre_parse did not optimize.
I have a list of characters
a = ["s", "a"]
I have some words.
b = "asp"
c= "lat"
d = "kasst"
I know that the characters in the list can appear only once or consecutively (or at most one small set can appear inside a bigger one).
I would like to split my words by putting the elements of a in the middle and the rest on the left or on the right (and put a "=" if there is nothing)
so b = ["*", "as", "p"]
If a word contains a bigger set of these characters:
d = ["k", "ass", "t"]
I know that the combinations can be at most of length 4.
So I have divided the possible combinations depending on the length:
import itertools
c4 = [''.join(i) for i in itertools.product(a, repeat = 4)]
c3 = [''.join(i) for i in itertools.product(a, repeat = 3)]
c2 = [''.join(i) for i in itertools.product(a, repeat = 2)]
c1 = [''.join(i) for i in itertools.product(a, repeat = 1)]
For each c, starting with the greater
For simplicity, let's say I start with c3 in this case and not with length 4.
I have to do this with a lot of data.
Is there a way to simplify the code ?
You can do something similar using a regular expression:
>>> import re
>>> p = re.compile(r'([sa]{1,4})')
p matches the characters 's' or 'a' repeated between 1 and 4 times.
To split a given string at this pattern, use p.split. The use of capturing parentheses in the pattern leads to the pattern itself being included in the result.
>>> p.split('asp')
['', 'as', 'p']
>>> p.split('lat')
['l', 'a', 't']
>>> p.split('kasst')
['k', 'ass', 't']
Use regex ?
import re
a = ["s", "a"]
text = "kasst"
pattern = re.compile("[" + "".join(a) + "]{1,4}")
match = pattern.search(text)
parts = [text[:match.start()], text[match.start():match.end()], text[match.end():]]
parts = [part if part else "*" for part in parts]
However, note that this won't handle the case where none of the elements of a occur in the text: pattern.search then returns None, so match.start() raises an AttributeError.
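The no-match case can be covered with an explicit check. A minimal sketch follows; the function name split_around, its placeholder parameter, and the decision to return None when nothing matches are assumptions, not part of the answer.

```python
import re


def split_around(text, chars, placeholder="*"):
    """Split `text` into [left, middle, right] around the first run of `chars`.

    Returns None when none of the characters occur in `text`.
    """
    pattern = re.compile("[" + re.escape("".join(chars)) + "]{1,4}")
    match = pattern.search(text)
    if match is None:
        return None
    parts = [text[:match.start()], match.group(), text[match.end():]]
    return [part or placeholder for part in parts]


print(split_around("asp", ["s", "a"]))    # ['*', 'as', 'p']
print(split_around("kasst", ["s", "a"]))  # ['k', 'ass', 't']
print(split_around("xyz", ["s", "a"]))    # None
```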
I would do a regular expression to simplify the matching.
import re
# a is the list of characters from the question; s is the word to split
splitters = ''.join(a)
pattern = re.compile("([^%s]*)([%s]+)([^%s]*)" % (splitters, splitters, splitters))
words = [v if v else '=' for v in pattern.match(s).groups() ]
This doesn't allow the characters in the first or last group, so not all strings will match correctly (and a string that does not match at all will throw an exception). You can allow them if you want. Feel free to modify the regular expression to better match what you want it to do.
Also, you only need to run re.compile once, not for every string you are trying to match.
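As a rough illustration of compiling once and reusing the pattern across many words, here is a sketch; the helper name split_word and the sample word list are made up for the example, and returning None for a non-matching word is an assumption.

```python
import re

a = ["s", "a"]
splitters = re.escape("".join(a))

# Compile once, outside the loop, and reuse the compiled pattern.
pattern = re.compile("([^%s]*)([%s]+)([^%s]*)" % (splitters, splitters, splitters))


def split_word(word):
    """Return the three groups with '=' for empty ones, or None if no match."""
    match = pattern.match(word)
    if match is None:
        return None
    return [v if v else "=" for v in match.groups()]


for word in ["asp", "lat", "kasst", "xyz"]:
    print(word, "->", split_word(word))
```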
I have the following string:
s = "<X> First <Y> Second"
and I can match any text right after <X> and <Y> (in this case "First" and "Second"). This is how I already did it:
import re
s = "<X> First <Y> Second"
pattern = r'\<([XxYy])\>([^\<]+)' # lower and upper case X/Y will be matched
items = re.findall(pattern, s)
print(items)
>>> [('X', ' First '), ('Y', ' Second')]
What I am now trying to match is the case without <>:
s = "X First Y Second"
I tried this:
pattern = r'([XxYy]) ([^\<]+)'
>>> [('X', 'First Y Second')]
Unfortunately it's not producing the right result. What am I doing wrong? I want to match X or x or Y or y PLUS one whitespace (for instance "X "). How can I do that?
EDIT: this is a possible string too:
s = "<X> First one <Y> Second <X> More <Y> Text"
Output should be:
>>> [('X', ' First one '), ('Y', ' Second '), ('X', ' More '), ('Y', ' Text')]
EDIT2:
pattern = r'([XxYy]) ([^ ]+)'
s = "X First text Y Second"
produces:
[('X', 'First'), ('Y', 'Second')]
but it should be:
[('X', 'First text'), ('Y', 'Second')]
How about something like: <?[XY]>? ([^<>XY$ ]+)
Example in javascript:
const re = /<?[XY]>? ([^<>XY$ ]+)/ig
console.info('<X> First <Y> Second'.match(re))
console.info('X First Y Second'.match(re))
If you know which whitespace char to match, you can just add it to your expression.
If you want any whitespace to match, you can use \s
pattern = r'\<([XxYy])\>([^\<]+)'
would then be
pattern = r'\<([XxYy])\>\s([^\<]+)'
Always keep in mind that the expression within the () is what will be returned as your result.
Assuming that the whitespace token to match is a single space character, the pattern is:
pattern = r'([XxYy]) ([^ ]+)'
So I came up with this solution:
pattern = r"([XxYy]) (.*?)(?= [XxYy] |$)"
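To see what this pattern does: the lazy (.*?) grows until the lookahead finds either another marker surrounded by spaces or the end of the string. A short check against the bracket-free inputs from the question (the second test string is the question's longer example with the angle brackets removed, which is an assumption about the intended input):

```python
import re

pattern = re.compile(r"([XxYy]) (.*?)(?= [XxYy] |$)")

first = pattern.findall("X First text Y Second")
print(first)   # [('X', 'First text'), ('Y', 'Second')]

second = pattern.findall("X First one Y Second X More Y Text")
print(second)  # [('X', 'First one'), ('Y', 'Second'), ('X', 'More'), ('Y', 'Text')]
```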
Given a string like this:
>>> s = "X/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<Y/PROPN/pobj_>,/PUNCT/punct"
First I want to split the string by underscores, i.e.:
>>> s.split('_')
['X/NOUN/dobj>',
'hold/VERB/ROOT',
'<membership/NOUN/dobj',
'<with/ADP/prep',
'<Y/PROPN/pobj',
'>,/PUNCT/punct']
We assume that the underscore is solely used as the delimiter and never exist as part of the substring we want to extract.
Then I need to check whether each of these "nodes" in the split list above starts or ends with '>' or '<', remove it, and put the appropriate bracket at the end of the sublist, something like:
result = []
nodes = s.split('_')
for node in nodes:
    if node.endswith('>'):
        result.append( node[:-1].split('/') + ['>'] )
    elif node.startswith('>'):
        result.append( node[1:].split('/') + ['>'] )
    elif node.startswith('<'):
        result.append( node[1:].split('/') + ['<'] )
    elif node.endswith('<'):
        result.append( node[:-1].split('/') + ['<'] )
    else:
        result.append( node.split('/') + ['-'] )
And if it doesn't start or end with an angle bracket, then we append - to the end of the sublist.
[out]:
[['X', 'NOUN', 'dobj', '>'],
['hold', 'VERB', 'ROOT', '-'],
['membership', 'NOUN', 'dobj', '<'],
['with', 'ADP', 'prep', '<'],
['Y', 'PROPN', 'pobj', '<'],
[',', 'PUNCT', 'punct', '>']]
Given the original input string, is there a less verbose way to get to the result? Maybe with regex and groups?
s = 'X/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<Y/PROPN/pobj_>,/PUNCT/punct'
def get_sentinal(node):
    if not node:
        return '-'
    # Assuming the node won't contain both '<' and '>' at the same time
    for index in [0, -1]:
        if node[index] in '<>':
            return node[index]
    return '-'

results = [
    node.strip('<>').split('/') + [get_sentinal(node)]
    for node in s.split('_')
]
print(results)
This does not make it significantly shorter, but personally I'd think it's somehow a little bit cleaner.
Use this:
import re
s_split = "X/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<Y/PROPN/pobj_>,/PUNCT/punct".split('_')
for i, text in enumerate(s_split):
    Left, Mid, Right = re.search('^([<>]?)(.*?)([<>]?)$', text).groups()
    s_split[i] = Mid.split('/') + [Left+Right or '-']
print(s_split)
I can't find a shorter answer.
Use "or" to shorten the code: it returns its first truthy operand, so print(None or "a") will print a. Also use a regex to parse the occurrence of <> easily.
Yes, although it's not pretty:
s = "X/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<Y/PROPN/pobj_>,/PUNCT/punct"
import re
out = []
for part in s.split('_'):
    Left, Mid, Right = re.search('^([<>]|)(.*?)([<>]|)$', part).groups()
    tail = ['-'] if not Left+Right else [Left+Right]
    out.append(Mid.split('/') + tail)
print(out)
Try online: https://repl.it/Civg
It relies on two main things:
a regex pattern which always makes three groups ()()() where the edge groups only look for characters <, > or nothing ([<>]|), and the middle matches everything (non-greedy) (.*?). The whole thing is anchored at the start (^) and end ($) of the string so it consumes the whole input string.
Assuming that you will never have angles on both ends of the string, then the combined string Left+Right will either be an empty string plus the character to put at the end, one way or the other, or a completely empty string indicating a dash is required.
Instead of my other answer with regexes, you can drop a lot of lines and a lot of slicing, if you know that string.strip('<>') will strip either character from both ends of the string, in one move.
This code is about halfway between your original and my regex answer in linecount, but is more readable for it.
s = "X/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<Y/PROPN/pobj_>,/PUNCT/punct"
result = []
for node in s.split('_'):
    if node.startswith('>') or node.startswith('<'):
        tail = node[0]
    elif node.endswith('>') or node.endswith('<'):
        tail = node[-1]
    else:
        tail = '-'
    result.append( node.strip('<>').split('/') + [tail])
print(result)
Try online: https://repl.it/Civr
Edit: how much less verbose do you want to get?
result = [node.strip('<>').split('/') + [(''.join(char for char in node if char in '<>') + '-')[0]] for node in s.split('_')]
print(result)
This is quite neat, you don't have to check which side the <> is on, or whether it's there at all. One step strip()s either angle bracket whichever side it's on, the next step filters only the angle brackets out of the string (whichever side they're on) and adds the dash character. This is either a string starting with any angle bracket from either side or a single dash. Take char 0 to get the right one.
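The same three steps can be unpacked into named intermediate values, which may be easier to follow than the one-liner. A sketch only; the function name node_to_row is hypothetical.

```python
def node_to_row(node):
    """Split one node into its fields and append '<', '>' or '-'."""
    # Step 1: strip an angle bracket from either end, then split the fields.
    fields = node.strip('<>').split('/')
    # Step 2: keep only the angle brackets found anywhere in the node.
    brackets = ''.join(ch for ch in node if ch in '<>')
    # Step 3: append a dash and take the first character, so the result
    # is the bracket if there was one and '-' otherwise.
    marker = (brackets + '-')[0]
    return fields + [marker]


s = "X/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj"
rows = [node_to_row(node) for node in s.split('_')]
print(rows)
```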
Even shorter with a list comprehension and some regex magic:
import re
s = "X/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<Y/PROPN/pobj_>,/PUNCT/punct"
rx = re.compile(r'([<>])|/')
items = [list(filter(None, match))
         for item in s.split('_')
         for match in [rx.split(item)]]
print(items)
# [['X', 'NOUN', 'dobj', '>'], ['hold', 'VERB', 'ROOT'], ['<', 'membership', 'NOUN', 'dobj'], ['<', 'with', 'ADP', 'prep'], ['<', 'Y', 'PROPN', 'pobj'], ['>', ',', 'PUNCT', 'punct']]
Explanation:
The code splits the items by _, splits it again with the help of the regular expression rx and filters out empty elements in the end.
See a demo on ideone.com.
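Note that the output of this approach keeps each bracket where it was found and adds no '-' for plain nodes, which differs from the layout requested in the question. One way to adapt it, offered as a sketch rather than as the answer's own method, is to separate the bracket tokens from the fields after splitting:

```python
import re

s = "X/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<Y/PROPN/pobj_>,/PUNCT/punct"
rx = re.compile(r'([<>])|/')

items = []
for item in s.split('_'):
    # rx.split yields fields, captured brackets, and None or '' fillers.
    tokens = [t for t in rx.split(item) if t]
    brackets = [t for t in tokens if t in ('<', '>')]
    fields = [t for t in tokens if t not in ('<', '>')]
    # Put the bracket (or '-' when there is none) at the end of the row.
    items.append(fields + (brackets[:1] or ['-']))

print(items)
```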
I did not use regex or groups, but this can be a shorter solution.
>>> result=[]
>>> nodes=['X/NOUN/dobj>','hold/VERB/ROOT','<membership/NOUN/dobj',
...        '<with/ADP/prep','<Y/PROPN/pobj','>,/PUNCT/punct']
>>> for node in nodes:
...     nd=node.replace(">",("/>" if node.endswith(">") else ">/"))
...     nc=nd.replace("<",("/<" if nd.endswith("<") else "</"))
...     result.append(nc.split("/"))
...
>>> nres=[inner for outer in result for inner in outer] # nres flattens all results into a single list. If you don't need a single list, you can use result.
How can I get the positions of a character inside a string in Python and list them in reverse order? Also, how can I make the search match both uppercase and lowercase characters?
E.g., if I input AvaCdefh and look for 'a' (both uppercase and lowercase), I want the positions of 'a' in the original string. In this example 'a' is at positions 0 and 2, so how can I make Python return them as '2 0' (separated by a space)?
This is easily achieved using the re module:
import re
x = "AvaCdefh"
" ".join([str(m.start()) for m in re.finditer("[Aa]",x)][::-1])
... which produces:
'2 0'
The list is reversed before constructing the string using the method described in the second answer to How can I reverse a list in python?.
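To search for an arbitrary character rather than a hard-coded [Aa], the same idea can be wrapped in a function using the re.IGNORECASE flag. A sketch; the function name positions_reversed is made up for illustration.

```python
import re


def positions_reversed(text, char):
    """Return the positions of `char` in `text`, ignoring case, last first."""
    pattern = re.compile(re.escape(char), re.IGNORECASE)
    positions = [m.start() for m in pattern.finditer(text)]
    return " ".join(str(p) for p in reversed(positions))


print(positions_reversed("AvaCdefh", "a"))  # 2 0
```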
You can use string.index() to find the first occurrence of a character.
w= "AvaCdefh"
To change the string to upper case:
print(w.upper()) #Output: AVACDEFH
To change the string to lower case:
print(w.lower()) #Output: avacdefh
To find the first occurrence of a character using Python's built-in function:
print(w.lower().index('a')) #Output: 0
print(w.index('a')) #Output: 2
To reverse a word:
print(w[::-1]) #Output: hfedCavA
But you can do this using a list comprehension:
char='a'
# Finding a character in the word
findChar= [(c,index) for index,c in enumerate(list(w.lower())) if char==c ]
# Finding a character in the reversed word
inverseFindChar = [(c,index) for index,c in enumerate(list(w[::-1].lower())) if char==c ]
print(findChar) #Output: [('a', 0), ('a', 2)]
print(inverseFindChar) #Output: [('a', 5), ('a', 7)]
Another way to do it, using lambda:
l = [index for index,c in enumerate(list(w.lower())) if char==c ]
ll= list(map(lambda x:w[x], l))
print(ll) #Output: ['A', 'a']
Then, you can wrap this as functions:
def findChar(char):
    return " ".join([str(index) for index,c in enumerate(list(w.lower())) if char==c ])
def findCharInReversedWord(char):
    return " ".join([str(index) for index,c in enumerate(list(w[::-1].lower())) if char==c ])
print(findChar('a'))
print(findChar('c'))
print(findCharInReversedWord('a'))
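Note that findCharInReversedWord reports positions within the reversed word (5 7), whereas the question asks for the original positions listed in reverse order (2 0). A regex-free sketch of the latter, with the word passed as an argument instead of read from the global w; the name find_positions and its reverse parameter are assumptions for illustration.

```python
def find_positions(word, char, reverse=False):
    """Return the space-separated positions of `char` in `word`, ignoring case."""
    target = char.lower()
    indexes = [i for i, c in enumerate(word.lower()) if c == target]
    if reverse:
        indexes.reverse()  # same positions, listed last to first
    return " ".join(str(i) for i in indexes)


print(find_positions("AvaCdefh", "a"))                # 0 2
print(find_positions("AvaCdefh", "a", reverse=True))  # 2 0
```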