RegEx for removing non ASCII characters from both ends

I have to loop multiple times using this code, is there a better way?
item = '!##$abc-123-4;5.def)(*&^;\n'
or
'!##$abc-123-4;5.def)(*&^;\n_'
or
'!##$abc-123-4;5.def)_(*&^;\n_'
My attempt below did not work:
item = re.sub('^\W|\W$', '', item)
Expected:
abc-123-4;5.def
The final goal is to remove anything not in [a-zA-Z0-9] from both ends while keeping every character in between. The first and last characters of the result must be in the class [a-zA-Z0-9].

This expression is not anchored on the left side, and it might perform faster if all the characters you want to keep are similar to those in the example in your question:
([a-z0-9;.-]+)(.*)
Here, we're assuming that you only want to filter out the special characters on the left and right of your input strings.
You can add other characters and boundaries to the expression, or simplify it into a faster expression if you wish.
If you wish to add a boundary on the right side, you can simply append $:
([a-z0-9;.-]+)(.*)$
Or you can even list your special characters on both the left and the right of the capturing group.
JavaScript Test
const regex = /([a-z0-9;.-]+)(.*)$/gm;
const str = `!##\$abc-123-4;5.def)(*&^;\\n`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
Performance Test
This JavaScript snippet shows the performance of that expression using a simple loop.
const repeat = 1000000;
const start = Date.now();
for (var i = repeat; i >= 0; i--) {
    const string = '!##\$abc-123-4;5.def)(*&^;\\n';
    const regex = /([!##$)(*&^;]+)([a-z0-9;.-]+)(.*)$/gm;
    var match = string.replace(regex, "$2");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
Python Test
import re
regex = r"([a-z0-9;.-]+)(.*)$"
test_str = "!##$abc-123-4;5.def)(*&^;\\n"
print(re.findall(regex, test_str))
Output
[('abc-123-4;5.def', ')(*&^;\\n')]

You can accomplish this by using the caret ^ character at the beginning of a character set to negate its contents. [^a-zA-Z0-9] will match anything that isn't an ASCII letter or digit.
^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$
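As a minimal Python sketch of the pattern above, applied to the three sample strings from the question (the names EDGES and trim_non_alnum are made up for illustration):

```python
import re

# A leading run or a trailing run of characters outside [a-zA-Z0-9].
EDGES = re.compile(r'^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$')

def trim_non_alnum(item):
    """Strip non-alphanumeric characters from both ends only."""
    return EDGES.sub('', item)

samples = [
    '!##$abc-123-4;5.def)(*&^;\n',
    '!##$abc-123-4;5.def)(*&^;\n_',
    '!##$abc-123-4;5.def)_(*&^;\n_',
]
for s in samples:
    print(trim_non_alnum(s))  # each line prints abc-123-4;5.def
```

The attempt from the question, ^\W|\W$, removes only one character per end, and \W does not match the underscore, which is why it needed to be looped.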

To trim non-word characters (uppercase \W) from the start and end, and also the underscore, which belongs to the word characters [A-Za-z0-9_], you can put the _ into a character class together with \W.
^[\W_]+|[\W_]+$
This is very similar to @CAustin's answer and @sln's comment.
To do the inverse and match everything from the first to the last alphanumeric character:
[^\W_](?:.*[^\W_])?
Or with alternation (|[^\W_] covers strings that contain just one alphanumeric character):
[^\W_].*[^\W_]|[^\W_]
Use both with re.DOTALL for multiline strings. In regex flavors without such a flag, try [\s\S]* instead of .*
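A short Python sketch of the "inverse" approach, which extracts the part to keep instead of deleting the edges. The re.ASCII flag is an addition not in the answer above; it is assumed here so that \W means exactly [^a-zA-Z0-9_] rather than the Unicode-aware default for str patterns.

```python
import re

# First alphanumeric, optionally followed by anything up to the last alphanumeric.
# re.DOTALL lets .* cross the embedded newline; re.ASCII keeps \W ASCII-only.
CORE = re.compile(r'[^\W_](?:.*[^\W_])?', re.DOTALL | re.ASCII)

item = '!##$abc-123-4;5.def)_(*&^;\n_'
m = CORE.search(item)
core = m.group() if m else ''  # empty string when there is no alphanumeric at all
print(core)  # abc-123-4;5.def
```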

First of all, you can cut off some very special cases by removing literal escape sequences (a backslash followed by a, b, n, r or t):
item = re.sub(r'\\[abnrt]', '', item)
After that, let's stop treating _ as a word character, which turns \W into [^a-zA-Z0-9].
Your final regular expression will be: (^[^a-zA-Z0-9]+)|([^a-zA-Z0-9]+$)
item = re.sub(r'(^[^a-zA-Z0-9]+)|([^a-zA-Z0-9]+$)', '', item)

Related

regex python catch selective content inside curly braces, including curly sublevels and \n chars

The best explanation is a minimal representative example (as you can see, it is a .bib file, for those who know LaTeX). Here is the representative raw input text:
text = """
#book{book1,
title={tit1},
author={aut1}
}
#article{art2,
title={tit2},
author={aut2}
}
#article{art3,
title={tit3},
author={aut3}
}
"""
and here is my (failed) attempt to extract the content inside the curly braces, only for the #article entries. Note that there are \n line breaks inside that I also want to capture.
regexpresion = r'\#article\{[.*\n]+\}'
result = re.findall(regexpresion, text)
and this is actually what I wanted to obtain,
>>> result
['art2,\ntitle={tit2},\nauthor={aut2}', 'art3,\ntitle={tit3},\nauthor={aut3}']
Many thanks for your experience

You might use a 2 step approach, first matching the parts that start with #article, and then in the second step remove the parts that you don't want in the result.
The pattern to match all the parts:
^#article{.*(?:\n(?!#\w+{).*)+(?=\n}$)
Explanation
^ Start of a line (the pattern is used with re.M)
#article{.* Match #article{ and the rest of the line
(?: Non-capturing group
\n(?!#\w+{).* Match a newline and the rest of the line, if the line does not start with #, one or more word characters and {
)+ Close the non-capturing group and repeat it to match all lines
(?=\n}$) Positive lookahead asserting a newline followed by } at the end of a line
The pattern used in the replacement matches either #article{ or (using the pipe char |) one or more spaces after a newline.
#article{|(?<=\n)[^\S\n]+
Example
import re
pattern = r"^#article{.*(?:\n(?!#\w+{).*)+(?=\n}$)"
s = ("#book{book1,\n"
" title={tit1},\n"
" author={aut1}\n"
"}\n"
"#article{art2,\n"
" title={tit2},\n"
" author={aut2}\n"
"}\n"
"#article{art3,\n"
" title={tit3},\n"
" author={aut3}\n"
"}")
res = [re.sub(r"#article{|(?<=\n)[^\S\n]+", "", m) for m in re.findall(pattern, s, re.M)]
print(res)
Output
['art2,\ntitle={tit2},\nauthor={aut2}', 'art3,\ntitle={tit3},\nauthor={aut3}']
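As an alternative sketch (not the answer's method), a single findall with one capturing group can produce the same list. It assumes, as in the question's sample text, that every entry ends with a } on its own unindented line and that the fields themselves are not indented; the variable names are only illustrative.

```python
import re

text = """
#book{book1,
title={tit1},
author={aut1}
}
#article{art2,
title={tit2},
author={aut2}
}
#article{art3,
title={tit3},
author={aut3}
}
"""

# re.M lets ^ and $ match at line boundaries; re.S lets . cross newlines.
# The lazy (.*?) stops at the first line that consists of a lone closing brace.
pattern = re.compile(r'^#article\{(.*?)\n\}$', re.M | re.S)
result = pattern.findall(text)
print(result)
# ['art2,\ntitle={tit2},\nauthor={aut2}', 'art3,\ntitle={tit3},\nauthor={aut3}']
```

If the fields are indented, as in the answer's example string, the captured text would keep that indentation and would need the same clean-up step as above.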

Try this:
results = re.findall(r'{(.*?)}', text)
The output is the following (note that it returns the fields of every entry, not only the #article ones):
['tit1', 'aut1', 'tit2', 'aut2', 'tit3', 'aut3']

Here is my solution for the regular expression. It's basic and not very elegant.
regexpression = r'\#article\{\w+,\n\s+\w+\=\{.*?\},\n\s+\w+\=\{.*?\}'
Explanatory breakdown of regexpression:
r'\#article\{\w+,\n # catches the article field, 1st line
\s+\w+\=\{.*?\},\n # title sub-field, comma, new line,
\s+\w+\=\{.*?\} # author sub-field

Separate string starting from a parenthesis occurrence (regex)

How can I achieve something like this: "Ca(OH)2" => "Ca" and "(OH)2"
In python, it can be achieved like this:
import re
compound = "Ca(OH)2"
segments = re.split(r'(\([A-Za-z0-9]*\)[0-9]*)', compound)
print(segments)
Output: ['Ca', '(OH)2', '']
I am following this tutorial from https://medium.com/swlh/balancing-chemical-equations-with-python-837518c9075b (except that I wanted to do it in Java)
(\([A-Za-z0-9]*\)[0-9]*) To break down the regex: the outermost parentheses (near the single quotes) indicate our capture group, which is what we want to keep. The inner parentheses with a backslash before them mean that we want to match literal parentheses (this is called escaping). The [A-Za-z0-9] indicates that we accept any letter (of any case) or digit within our parentheses, and the asterisk after the square brackets is a quantifier: it means that zero or more letters or digits may appear within our parentheses. The [0-9]* near the end indicates that we want to include all digits to the right of our parentheses in our split.
I tried to do it in Java but the output was not what I wanted:
String compound = "Ca(OH)2";
String[] segments = compound.split("(\\([A-Za-z0-9]*\\)[0-9]*)");
System.out.println(Arrays.toString(segments));
Output: [Ca]

In Java, unlike Python's re.split method, String#split does not keep the captured parts.
You can use the following code in Java:
String s = "Ca(OH)2";
Pattern p = Pattern.compile("\\([A-Za-z0-9]+\\)[0-9]*|[A-Za-z0-9]+");
Matcher m = p.matcher(s);
List<String> res = new ArrayList<>();
while (m.find()) {
    res.add(m.group());
}
System.out.println(res); // => [Ca, (OH)2]
Here, the \([A-Za-z0-9]+\)[0-9]*|[A-Za-z0-9]+ regex matches:
\([A-Za-z0-9]+\)[0-9]* - (, one or more ASCII letters/digits, ) and then zero or more digits
| - or
[A-Za-z0-9]+ - one or more ASCII letters/digits.
It can also be written as
Pattern p = Pattern.compile("\\(\\p{Alnum}+\\)\\d*|\\p{Alnum}+");
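For comparison, the same "match the pieces instead of splitting" idea can be sketched in Python with re.findall, which avoids the trailing empty string that re.split produced in the question:

```python
import re

compound = "Ca(OH)2"
# Either a parenthesised group with its trailing digits,
# or a plain run of ASCII letters/digits.
segments = re.findall(r'\([A-Za-z0-9]+\)[0-9]*|[A-Za-z0-9]+', compound)
print(segments)  # ['Ca', '(OH)2']
```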

Try this:
String[] segments = compound.split("([^\\w*])");
The output should be:
[Ca, OH, 2]
Note that this drops the parentheses. Hopefully it will help you!

Splitting a string based on condition [duplicate]

I'm new to regular expressions and would appreciate your help. I'm trying to put together an expression that will split the example string using all spaces that are not surrounded by single or double quotes. My last attempt looks like this: (?!") and isn't quite working. It's splitting on the space before the quote.
Example input:
This is a string that "will be" highlighted when your 'regular expression' matches something.
Desired output:
This
is
a
string
that
will be
highlighted
when
your
regular expression
matches
something.
Note that "will be" and 'regular expression' retain the space between the words.

I don't understand why all the others are proposing such complex regular expressions or such long code. Essentially, you want to grab two kinds of things from your string: sequences of characters that aren't spaces or quotes, and sequences of characters that begin and end with a quote, with no quotes in between, for two kinds of quotes. You can easily match those things with this regular expression:
[^\s"']+|"([^"]*)"|'([^']*)'
I added the capturing groups because you don't want the quotes in the list.
This Java code builds the list, adding the capturing group if it matched to exclude the quotes, and adding the overall regex match if the capturing group didn't match (an unquoted word was matched).
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    if (regexMatcher.group(1) != null) {
        // Add double-quoted string without the quotes
        matchList.add(regexMatcher.group(1));
    } else if (regexMatcher.group(2) != null) {
        // Add single-quoted string without the quotes
        matchList.add(regexMatcher.group(2));
    } else {
        // Add unquoted word
        matchList.add(regexMatcher.group());
    }
}
If you don't mind having the quotes in the returned list, you can use much simpler code:
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}
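A rough Python equivalent of the first Java listing, using the same regular expression. The names TOKEN and split_outside_quotes are invented for this sketch.

```python
import re

# Unquoted word | "double-quoted" (group 1) | 'single-quoted' (group 2).
TOKEN = re.compile(r"""[^\s"']+|"([^"]*)"|'([^']*)'""")

def split_outside_quotes(subject):
    """Return words and quoted phrases, with the quotes stripped."""
    tokens = []
    for m in TOKEN.finditer(subject):
        if m.group(1) is not None:      # double-quoted phrase
            tokens.append(m.group(1))
        elif m.group(2) is not None:    # single-quoted phrase
            tokens.append(m.group(2))
        else:                           # unquoted word
            tokens.append(m.group())
    return tokens

subject = ('This is a string that "will be" highlighted when your '
           "'regular expression' matches something.")
print(split_outside_quotes(subject))
```

The groups are compared against None rather than tested for truthiness so that an empty quoted string such as "" still yields an (empty) token.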

There are several questions on StackOverflow that cover this same question in various contexts using regular expressions. For instance:
parsings strings: extracting words and phrases
Best way to parse Space Separated Text
UPDATE: Sample regex to handle single and double quoted strings. Ref: How can I split on a string except when inside quotes?
m/('.*?'|".*?"|\S+)/g
Tested this with a quick Perl snippet and the output was as reproduced below. Also works for empty strings or whitespace-only strings if they are between quotes (not sure if that's desired or not).
This
is
a
string
that
"will be"
highlighted
when
your
'regular expression'
matches
something.
Note that this does include the quote characters themselves in the matched values, though you can remove that with a string replace, or modify the regex to not include them. I'll leave that as an exercise for the reader or another poster for now, as 2am is way too late to be messing with regular expressions anymore ;)

If you want to allow escaped quotes inside the string, you can use something like this:
(?:(['"])(.*?)(?<!\\)(?>\\\\)*\1|([^\s]+))
Quoted strings will be group 2, single unquoted words will be group 3.
You can try it on various strings here: http://www.fileformat.info/tool/regex.htm or http://gskinner.com/RegExr/

The regex from Jan Goyvaerts is the best solution I have found so far, but it also creates empty (null) matches, which he excludes in his program. These empty matches also appear in regex testers (e.g. rubular.com).
If you turn the search order around (first look for the quoted parts and then for the space-separated words), you might do it in one go with:
("[^"]*"|'[^']*'|[\S]+)+

(?<!\G".{0,99999})\s|(?<=\G".{0,99999}")\s
This will match the spaces not surrounded by double quotes.
I have to use min,max {0,99999} because Java doesn't support * and + in lookbehind.

It'll probably be easier to search the string, grabbing each part, vs. split it.
The reason being, you can have it split at the spaces before and after "will be", but I can't think of any way to make a split ignore the space inside the quotes.
(not actual Java)
string = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
regex = "\"(\\\"|(?!\\\").)+\"|[^ ]+"; // search for a quoted or non-spaced group
final = new Array();
while (string.length > 0) {
string = string.trim();
if (Regex(regex).test(string)) {
final.push(Regex(regex).match(string)[0]);
string = string.replace(regex, ""); // progress to next "word"
}
}
Also, capturing single quotes could lead to issues:
"Foo's Bar 'n Grill"
//=>
"Foo"
"s Bar "
"n"
"Grill"

String.split() is not helpful here because there is no way to distinguish between spaces within quotes (don't split) and those outside (split). Matcher.lookingAt() is probably what you need:
String str = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
str = str + " "; // add trailing space
int len = str.length();
Matcher m = Pattern.compile("((\"[^\"]+?\")|('[^']+?')|([^\\s]+?))\\s++").matcher(str);
for (int i = 0; i < len; i++)
{
    m.region(i, len);
    if (m.lookingAt())
    {
        String s = m.group(1);
        if ((s.startsWith("\"") && s.endsWith("\"")) ||
            (s.startsWith("'") && s.endsWith("'")))
        {
            s = s.substring(1, s.length() - 1);
        }
        System.out.println(i + ": \"" + s + "\"");
        i += (m.group(0).length() - 1);
    }
}
which produces the following output:
0: "This"
5: "is"
8: "a"
10: "string"
17: "that"
22: "will be"
32: "highlighted"
44: "when"
49: "your"
54: "regular expression"
75: "matches"
83: "something."

I liked Marcus's approach, however, I modified it so that I could allow text near the quotes, and support both " and ' quote characters. For example, I needed a="some value" to not split it into [a=, "some value"].
"(?<!\\G\\S{0,99999}[\"'].{0,99999})\\s|(?<=\\G\\S{0,99999}\".{0,99999}\"\\S{0,99999})\\s|(?<=\\G\\S{0,99999}'.{0,99999}'\\S{0,99999})\\s"

Jan's approach is great but here's another one for the record.
If you actually wanted to split as mentioned in the title, keeping the quotes in "will be" and 'regular expression', then you could use this method which is straight out of Match (or replace) a pattern except in situations s1, s2, s3 etc
The regex:
'[^']*'|\"[^\"]*\"|( )
The two left alternations match complete 'quoted strings' and "double-quoted strings". We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expressions on the left. We replace those with SplitHere then split on SplitHere. Again, this is for a true split case where you want "will be", not will be.
Here is a full working implementation (see the results on the online demo).
import java.util.*;
import java.io.*;
import java.util.regex.*;
import java.util.List;
class Program {
    public static void main (String[] args) throws java.lang.Exception {
        String subject = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
        Pattern regex = Pattern.compile("\'[^']*'|\"[^\"]*\"|( )");
        Matcher m = regex.matcher(subject);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) m.appendReplacement(b, "SplitHere");
            else m.appendReplacement(b, m.group(0));
        }
        m.appendTail(b);
        String replaced = b.toString();
        String[] splits = replaced.split("SplitHere");
        for (String split : splits) System.out.println(split);
    } // end main
} // end Program

If you are using C#, you can use
string input = "This is a string that \"will be\" highlighted when your 'regular expression' matches <something random>";
List<string> list1 =
    Regex.Matches(input, @"(?<match>\w+)|\""(?<match>[\w\s]*)""|'(?<match>[\w\s]*)'|<(?<match>[\w\s]*)>").Cast<Match>().Select(m => m.Groups["match"].Value).ToList();
foreach (var v in list1)
    Console.WriteLine(v);
I have specifically added "|<(?<match>[\w\s]*)>" to highlight that you can specify any char to group phrases. (In this case I am using < > to group.)
Output is :
This
is
a
string
that
will be
highlighted
when
your
regular expression
matches
something random

1st one-liner using String.split()
String s = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
String[] split = s.split( "(?<!(\"|').{0,255}) | (?!.*\\1.*)" );
[This, is, a, string, that, "will be", highlighted, when, your, 'regular expression', matches, something.]
don't split at the blank, if the blank is surrounded by single or double quotes
split at the blank when the 255 characters to the left and all characters to the right of the blank are neither single nor double quotes
adapted from original post (handles only double quotes)

I'm reasonably certain this is not possible using regular expressions alone. Checking whether something is contained inside some other tag is a parsing operation. This seems like the same problem as trying to parse XML with a regex -- it can't be done correctly. You may be able to get your desired outcome by repeatedly applying a non-greedy, non-global regex that matches the quoted strings, then once you can't find anything else, split it at the spaces... that has a number of problems, including keeping track of the original order of all the substrings. Your best bet is to just write a really simple function that iterates over the string and pulls out the tokens you want.

A couple hopefully helpful tweaks on Jan's accepted answer:
(['"])((?:\\\1|.)+?)\1|([^\s"']+)
Allows escaped quotes within quoted strings
Avoids repeating the pattern for the single and double quote; this also simplifies adding more quoting symbols if needed (at the expense of one more capturing group)

You can also try this:
String str = "This is a string that \"will be\" highlighted when your 'regular expression' matches something";
String[] ss = str.split("\"|\'");
for (int i = 0; i < ss.length; i++) {
    if ((i % 2) == 0) { // even index: text outside quotes
        String[] part1 = ss[i].split(" ");
        for (String pp1 : part1) {
            System.out.println("" + pp1);
        }
    } else { // odd index: text that was inside quotes
        System.out.println("" + ss[i]);
    }
}

The following returns an array of arguments. Arguments are the variable 'command' split on spaces, unless included in single or double quotes. The matches are then modified to remove the single and double quotes.
using System.Linq;
using System.Text.RegularExpressions;
var args = Regex.Matches(command, "[^\\s\"']+|\"([^\"]*)\"|'([^']*)'").Cast<Match>()
    .Select(iMatch => iMatch.Value.Replace("\"", "").Replace("'", "")).ToArray();

This helped when I came across a pattern like this:
String str = "2022-11-10 08:35:00,470 RAV=REQ YIP=02.8.5.1 CMID=caonaustr CMN=\"Some Value Pyt Ltd\"";
String[] str1 = str.split("\\s(?=(([^\"]*\"){2})*[^\"]*$)\\s*");
System.out.println("Value of split string is " + Arrays.toString(str1));
This results in: [2022-11-10, 08:35:00,470, RAV=REQ, YIP=02.8.5.1, CMID=caonaustr, CMN="Some Value Pyt Ltd"]
This regex matches a space ONLY if it is followed by an even number of double quotes.
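The same "even number of quotes ahead" idea can be sketched in Python with re.split. The pattern below is a slightly simplified variant of the Java one (non-capturing groups, \s+ for the separator), and the name OUTSIDE_QUOTES is only illustrative. It assumes the double quotes in the input are balanced.

```python
import re

line = ('2022-11-10 08:35:00,470 RAV=REQ YIP=02.8.5.1 '
        'CMID=caonaustr CMN="Some Value Pyt Ltd"')

# Split on whitespace only when an even number of double quotes follows,
# i.e. when the whitespace is not inside a quoted value.
OUTSIDE_QUOTES = re.compile(r'\s+(?=(?:[^"]*"[^"]*")*[^"]*$)')
parts = OUTSIDE_QUOTES.split(line)
print(parts)
```

Because the lookahead scans to the end of the string for every candidate space, this approach can get slow on very long inputs; a token-matching pattern is usually the safer choice there.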

Match a line if there is something before a group of characters, at the start of the line [duplicate]

The following should be matched:
AAA123
ABCDEFGH123
XXXX123
can I do: ".*123" ?

Yes, you can. That should work.
. = any char except newline
\. = the actual dot character
.? = .{0,1} = match any char except newline zero or one times
.* = .{0,} = match any char except newline zero or more times
.+ = .{1,} = match any char except newline one or more times
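A small Python sketch of the difference between .* and .+ on the sample lines, using re.fullmatch so that the whole string has to match. The extra test string "123" is added here only to show the edge case.

```python
import re

tests = ["AAA123", "ABCDEFGH123", "XXXX123", "123"]
results = {}
for t in tests:
    star = bool(re.fullmatch(r'.*123', t))  # zero or more chars before 123
    plus = bool(re.fullmatch(r'.+123', t))  # at least one char before 123
    results[t] = (star, plus)
    print(f'{t}: .*123 -> {star}, .+123 -> {plus}')
```

Only the bare "123" separates the two patterns: .*123 accepts it, while .+123 requires something in front of it.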

Yes that will work, though note that . will not match newlines unless you pass the DOTALL flag when compiling the expression:
Pattern pattern = Pattern.compile(".*123", Pattern.DOTALL);
Matcher matcher = pattern.matcher(inputStr);
boolean matchFound = matcher.matches();

Use the pattern . to match any character once, .* to match any character zero or more times, .+ to match any character one or more times.

The most common way I have seen to encode this is with a character class whose members form a partition of the set of all possible characters.
Usually people write that as [\s\S] (whitespace or non-whitespace), though [\w\W], [\d\D], etc. would all work.
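To illustrate in Python (the sample text is made up for this sketch): a plain dot stops at a newline, while [\s\S] or the re.DOTALL flag does not.

```python
import re

text = "AAA123\nABCDEFGH123"

dot_only = re.fullmatch(r'.*', text)              # . does not match "\n"
char_class = re.fullmatch(r'[\s\S]*', text)       # whitespace or non-whitespace
dot_all = re.fullmatch(r'.*', text, re.DOTALL)    # . matches "\n" as well

print(dot_only is None)        # True
print(char_class is not None)  # True
print(dot_all is not None)     # True
```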

.* and .+ match any characters except for newlines.
Double Escaping
In case you also want to include newlines, the following expressions might work for languages in which double escaping is required, such as Java or C++:
[\\s\\S]*
[\\d\\D]*
[\\w\\W]*
for zero or more times, or
[\\s\\S]+
[\\d\\D]+
[\\w\\W]+
for one or more times.
Single Escaping:
Double escaping is not required in some languages such as C#, PHP, Ruby, Perl, Python and JavaScript:
[\s\S]*
[\d\D]*
[\w\W]*
[\s\S]+
[\d\D]+
[\w\W]+
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression {
    public static void main(String[] args) {
        final String regex_1 = "[\\s\\S]*";
        final String regex_2 = "[\\d\\D]*";
        final String regex_3 = "[\\w\\W]*";
        final String string = "AAA123\n\t"
            + "ABCDEFGH123\n\t"
            + "XXXX123\n\t";
        final Pattern pattern_1 = Pattern.compile(regex_1);
        final Pattern pattern_2 = Pattern.compile(regex_2);
        final Pattern pattern_3 = Pattern.compile(regex_3);
        final Matcher matcher_1 = pattern_1.matcher(string);
        final Matcher matcher_2 = pattern_2.matcher(string);
        final Matcher matcher_3 = pattern_3.matcher(string);
        if (matcher_1.find()) {
            System.out.println("Full Match for Expression 1: " + matcher_1.group(0));
        }
        if (matcher_2.find()) {
            System.out.println("Full Match for Expression 2: " + matcher_2.group(0));
        }
        if (matcher_3.find()) {
            System.out.println("Full Match for Expression 3: " + matcher_3.group(0));
        }
    }
}
Output
Full Match for Expression 1: AAA123
ABCDEFGH123
XXXX123
Full Match for Expression 2: AAA123
ABCDEFGH123
XXXX123
Full Match for Expression 3: AAA123
ABCDEFGH123
XXXX123
If you wish to explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.

There are lots of sophisticated regex testing and development tools, but if you just want a simple test harness in Java, here's one for you to play with:
String[] tests = {
"AAA123",
"ABCDEFGH123",
"XXXX123",
"XYZ123ABC",
"123123",
"X123",
"123",
};
for (String test : tests) {
    System.out.println(test + " " + test.matches(".+123"));
}
Now you can easily add new testcases and try new patterns. Have fun exploring regex.
See also
regular-expressions.info/Tutorial

No, * will match zero-or-more characters. You should use +, which matches one-or-more instead.
This expression might work better for you: [A-Z]+123

Specific solution to the example problem:
[A-Z]*123$ will match 123, AAA123 and ASDFRRF123. In case you need at least one character before 123, use [A-Z]+123$.
General solution to the question (how to match "any character" in a regular expression):
If you are looking for anything including whitespace, you can try [\w\W]{min_char_to_match,} (a | inside a character class is a literal pipe, so it is not needed there).
If you are trying to match anything except whitespace, you can try [\S]{min_char_to_match,}.

Try the regex .{3,}. This will match all characters except a new line.

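In JavaScript, [^] matches any character, including newline. [^CHARS] matches all characters except for those in CHARS; if CHARS is empty, it matches all characters. In many other flavors a ] directly after [^ is taken as a literal character, so this trick does not carry over to them.
JavaScript example:
/a[^]*Z/.test("abcxyz \0\r\n\t012789ABCXYZ") // Returns true.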

I like the following:
[!-~]
This matches every printable ASCII character from ! (0x21) to ~ (0x7E): the special characters as well as the normal A-Z, a-z and 0-9, but not the space.
https://www.w3schools.com/charsets/ref_html_ascii.asp
E.g. faker.internet.password(20, false, /[!-~]/)
Will generate a password like this: 0+>8*nZ\\*-mB7Ybbx,b>

This worked for me. The dot does not always mean any character; it only does so in single-line (DOTALL) mode. \p{all} should match any character:
String value = "|°¬<>!\"#$%&/()=?'\\¡¿/*-+_#[]^^{}";
String expression = "[a-zA-Z0-9\\p{all}]{0,50}";
if (value.matches(expression)) {
    System.out.println("true");
} else {
    System.out.println("false");
}

regexp: match character group or end of line

How do you match ^ (begin of line) and $ (end of line) in a [] (character group)?
simple example
haystack string: zazty
rules:
match any "z" or "y"
if preceded by
an "a", "b"; or
at the beginning of the line.
pass:
match the first two "z"
a regexp that would work is:
(?:^|[aAbB])([zZyY])
But I keep thinking it would be much cleaner with something that meant beginning/end of line inside the character group:
[^aAbB]([zZyY])
(this example assumes the ^ means beginning of line, and not what it really is there, a negation of the character group)
Note: I'm using Python, but knowing how to do this in bash and vim would be good too.
Update: reading the manual again, it says that inside a set of characters everything loses its special meaning, except the character classes (e.g. \w).
Further down the list of character classes there is \A for the beginning of the string, but this does not work: [\AaAbB]([zZyY])
Any idea why?

You can't match a ^ or $ within a [] because the only characters with special meaning inside a character class are ^ (as in "everything but") and - (as in "range") (and the character classes). \A and \Z just don't count as character classes.
This is for all (standard) flavours of regex, so you're stuck with (^|[stuff]) and ($|[stuff]) (which aren't all that bad, really).

Concatenate the character 'a' to the beginning of the string. Then use [aAbB]([zZyY]).

Try this one:
(?<![^abAB])([yzYZ])
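A quick Python check of the two working approaches on the haystack from the question: the explicit alternation, and the double-negative lookbehind, which reads "not preceded by a character other than a/A/b/B" and is therefore also satisfied at the very start of the string. The variable names are illustrative only.

```python
import re

haystack = "zazty"

alternation = re.compile(r'(?:^|[aAbB])([zZyY])')
lookbehind = re.compile(r'(?<![^aAbB])([zZyY])')

# Collect (matched letter, position) for group 1 of each pattern.
hits_alternation = [(m.group(1), m.start(1)) for m in alternation.finditer(haystack)]
hits_lookbehind = [(m.group(1), m.start(1)) for m in lookbehind.finditer(haystack)]

print(hits_alternation)  # [('z', 0), ('z', 2)]
print(hits_lookbehind)   # [('z', 0), ('z', 2)]
```

One practical difference: the lookbehind version does not consume the preceding character, so the overall match is just the z or y itself.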

Why not try the escape character \? ([\^\$])
UPDATE:
If you want to find all z's and y's preceded by "a" (or at the beginning of the line), then you can use a positive lookbehind. There is probably no way to specify wildcards in character groups (because wildcards are characters too). (If there is, I would be pleased to know about it.)
private static final Pattern PATTERN = Pattern.compile("(?<=(?:^|[aA]))([zZyY])");
public static void main(String[] args) {
    Matcher matcher = PATTERN.matcher("zazty");
    while (matcher.find()) {
        System.out.println("matcher.group(0) = " + matcher.group(0));
        System.out.println("matcher.start() = " + matcher.start());
    }
}
Output:
matcher.group(0) = z
matcher.start() = 0
matcher.group(0) = z
matcher.start() = 2
