This question already has answers here:
How to split strings into text and number?
(11 answers)
Closed 2 years ago.
I have strings like 'S10', 'S11', etc.
How can I split them into ['S', '10'], ['S', '11']?
Example:
import re
s = 'S10'
pattern = re.compile(...)  # which pattern goes here?
result = pattern.split(s)
print(result)
# desired output: ['S', '10']
Resolved at: How to split strings into text and number?
This should do the trick.
I'm using capture groups (the parentheses) to put the alphabetical part in the first group and the digits in the second group.
Code:
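import re
str_data = 'S10'
# ([A-Za-z]+) captures the letters, (\d+) captures the digits.
# A raw string keeps the backslashes intact for the regex engine.
exp = r"([A-Za-z]+)(\d+)"
match = re.match(exp, str_data)
result = match.groups()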
Output:
('S', '10')
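A slightly more general sketch, assuming every input is one or more ASCII letters followed by one or more digits (the helper name `split_prefix_number` is made up for illustration). It anchors the pattern with `re.fullmatch`, so a string such as '10S' returns None instead of a partial match, and it returns a list as the question asks. Since Python 3.7, `re.split` can also split on a zero-width lookaround, which is closer to the `re.split` call in the question.

```python
import re

# Hypothetical helper: letters, then digits, covering the whole string.
_PREFIX_NUM = re.compile(r'([A-Za-z]+)(\d+)')

def split_prefix_number(s):
    m = _PREFIX_NUM.fullmatch(s)
    if m is None:
        return None  # not of the form letters-then-digits
    return list(m.groups())

for s in ['S10', 'S11', 'AB123', '10S']:
    print(s, split_prefix_number(s))

# Zero-width split between a letter and a digit (needs Python 3.7+).
print(re.split(r'(?<=[A-Za-z])(?=\d)', 'S10'))  # ['S', '10']
```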
This question already has answers here:
How to replace the first occurrence of a regular expression in Python?
(2 answers)
Closed 6 months ago.
Simple regex question. I have a string in the following format:
string = """陣頭には見るも<RUBY text="いかめ">厳</RUBY>しい、厚い鎧姿の武士達が立つ。
分厚い鉄甲、長大な太刀――彼らの<RUBY text="かも">醸</RUBY>し出す威圧感
は、一騎のみでも背後の兵全てに優る戦力たり得ると
いう事実を、何より雄弁に物語っている。"""
What is the regular expression to find the first occurrence of <RUBY text="something">something</RUBY> and replace it with something like HELLO, i.e.:
陣頭には見るもHELLOしい、厚い鎧姿の武士達が立つ。
分厚い鉄甲、長大な太刀――彼らの<RUBY text="かも">醸</RUBY>し出す威圧感
は、一騎のみでも背後の兵全てに優る戦力たり得ると
いう事実を、何より雄弁に物語っている。
I tried it with (<R(.*?)/RUBY>){0} but this didn't work.
string = re.sub("(\<R(.*?)\/RUBY>){0}", "HELLO", string)
print(string)
Can be done like this:
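import re
string = """陣頭には見るも<RUBY text="いかめ">厳</RUBY>しい、厚い鎧姿の武士達が立つ。
分厚い鉄甲、長大な太刀――彼らの<RUBY text="かも">醸</RUBY>し出す威圧感
は、一騎のみでも背後の兵全てに優る戦力たり得ると
いう事実を、何より雄弁に物語っている。"""
try:
    # Non-greedy .*? so only the first <RUBY ...>...</RUBY> element is captured
    first_match = re.findall(r'<RUBY text=.*?</RUBY>', string)[0]
    parts = string.split(first_match)
    # Replace the first occurrence, then restore any later identical ones
    result = f'{parts[0]}HELLO{first_match.join(parts[1:])}'
except IndexError:
    # No <RUBY> element found: leave the string unchanged
    result = string
print(result)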
Result:
陣頭には見るもHELLOしい、厚い鎧姿の武士達が立つ。
分厚い鉄甲、長大な太刀――彼らの<RUBY text="かも">醸</RUBY>し出す威圧感
は、一騎のみでも背後の兵全てに優る戦力たり得ると
いう事実を、何より雄弁に物語っている。
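A shorter sketch using `re.sub` with its `count` argument, which limits the substitution to the first match. The `{0}` in the question is a quantifier meaning "exactly zero times", so `(...){0}` matches the empty string at every position rather than selecting the first occurrence. The variable is renamed `text` here only to avoid shadowing the standard `string` module; the tag pattern assumes the attribute value contains no double quotes.

```python
import re

text = """陣頭には見るも<RUBY text="いかめ">厳</RUBY>しい、厚い鎧姿の武士達が立つ。
分厚い鉄甲、長大な太刀――彼らの<RUBY text="かも">醸</RUBY>し出す威圧感
は、一騎のみでも背後の兵全てに優る戦力たり得ると
いう事実を、何より雄弁に物語っている。"""

# [^"]* stays inside the attribute value; the non-greedy .*? stops at the
# first closing tag; count=1 makes re.sub replace only the first match.
result = re.sub(r'<RUBY text="[^"]*">.*?</RUBY>', 'HELLO', text, count=1)
print(result)
```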
This question already has answers here:
Regular Expressions: Is there an AND operator?
(14 answers)
Closed 2 years ago.
I have a list of strings like:
1,-102a
1,123-f
1943dsa
-da238,
-,dwjqi92
How can I write a regex in Python that matches whenever a string contains both a comma (,) and a hyphen (-), regardless of their order or where they appear?
I would use the following regex alternation:
,.*-|-.*,
Sample script:
import re
inp = ['1,-102a', '1,123-f', '1943dsa', '-da238,', '-,dwjqi92']
output = [x for x in inp if re.search(r',.*-|-.*,', x)]
print(output)
This prints:
['1,-102a', '1,123-f', '-da238,', '-,dwjqi92']
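Another way to express "AND" in a regex is to stack lookaheads at the start of the string: each `(?=.*X)` checks independently that X occurs somewhere later, so the order never has to be spelled out. This is a sketch; `re.DOTALL` is added as an assumption so that `.` also crosses newlines, which only matters for multi-line input. For single characters, plain `in` tests give the same result without a regex.

```python
import re

# Each lookahead succeeds if its character appears anywhere after the start;
# both must succeed for the search to match.
both = re.compile(r'^(?=.*,)(?=.*-)', re.DOTALL)

inp = ['1,-102a', '1,123-f', '1943dsa', '-da238,', '-,dwjqi92']
output = [x for x in inp if both.search(x)]
print(output)  # ['1,-102a', '1,123-f', '-da238,', '-,dwjqi92']

# Equivalent check without a regex.
plain = [x for x in inp if ',' in x and '-' in x]
print(plain == output)  # True
```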
This question already has answers here:
Splitting a string where it switches between numeric and alphabetic characters
(4 answers)
Closed 4 years ago.
I have been looking for a way to split a string by digits, for instance:
st = "abc4ert"
from string import digits
st = st.split(digits)  # splits only on the literal substring '0123456789'
# desired result: ['abc', 'ert']
Is there a way to do that (without including the numbers in the list)?
Use Regex.
Ex:
import re
st = "abc4ert"
print(re.findall(r"[A-Za-z]+", st))
Output:
['abc', 'ert']
Use re.split:
import re
st = "abc4ert"
st = re.split(r'\d+', st)
print(st)
Output:
['abc', 'ert']
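One caveat with `re.split`: a digit run at the very start or end of the string produces an empty string in the result. The sketch below (the helper name `split_on_digits` is hypothetical) filters those out. Wrapping the pattern in a capture group instead makes `re.split` keep the digits as separate items, which is what the linked duplicate asks for.

```python
import re

def split_on_digits(text):
    # re.split leaves '' where a digit run touches either end; drop those.
    return [part for part in re.split(r'\d+', text) if part]

print(re.split(r'\d+', '4abc42ert7'))  # ['', 'abc', 'ert', '']
print(split_on_digits('4abc42ert7'))   # ['abc', 'ert']

# A capture group makes re.split keep the separators themselves.
print(re.split(r'(\d+)', 'abc4ert'))   # ['abc', '4', 'ert']
```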
This question already has answers here:
Printing Lists as Tabular Data
(20 answers)
Closed 5 years ago.
import re
from collections import Counter
with open('test01_cc_sharealike.txt') as f:
    words = re.findall(r'\w+', f.read().lower())
count = Counter(words).most_common(10)
print(count)
How can I change the code so it prints the results in this format instead of as a list:
word    number
word    number
That is, each line should show the word, then four spaces, then the number of times the word appears in the text.
Just use a for loop, so instead of print(count), you could use:
for word, n in count:
    print(word + "    " + str(n))
However, for formatting purposes, you would probably prefer to align the numbers, so you should use:
indent = 1 + max(len(p[0]) for p in count)
for p in count:
    print(p[0].rjust(indent) + str(p[1]))
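An alternative sketch using f-string width specifiers: it left-aligns the words to the longest word's width and then adds the four-space gap from the question, so the counts start in one column. The sample text and the helper name `format_counts` are hypothetical stand-ins for the file in the question; `max(..., default=0)` avoids a ValueError when the list is empty.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the file's contents.
text = "the cat and the hat and the bat"
count = Counter(re.findall(r'\w+', text.lower())).most_common(3)

def format_counts(pairs, gap=4):
    # Pad every word to the longest word's width, then add `gap` spaces,
    # so each count begins in the same column.
    width = max((len(word) for word, _ in pairs), default=0)
    return "\n".join(f"{word:<{width}}{' ' * gap}{n}" for word, n in pairs)

print(format_counts(count))
```

With this sample, `most_common(3)` keeps words with equal counts in the order they were first seen, so the three lines are the word "the", "and", and "cat" with their counts.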
This question already has an answer here:
Detect repetitions in string
(1 answer)
Closed 8 years ago.
Let's suppose I have this string
s = '123123123'
I can see that the sub-string '123' is repeated.
For s = '1234', the sub-string would be '1234', since nothing repeats.
For s = '11111', the sub-string would be '1'.
How can I get this with Python? Any hints?
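import re
strings = ['123123123', '1234', '11111']
# (.+?) lazily captures a chunk; \1+ requires that chunk to repeat right after it
pattern, result = re.compile(r'(.+?)\1+'), []
for item in strings:
    # findall returns the captured chunk(s); fall back to the whole item if nothing repeats
    result.extend(pattern.findall(item) or [item])
print(result)
# ['123', '1234', '1']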
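Because that pattern is not anchored, `findall` can return a chunk that repeats only somewhere inside the string (for '12123' it returns ['12']) or several chunks (for '112112' it returns ['1', '1']). A sketch that requires the repeated unit to cover the whole string uses `re.fullmatch` with `\1*`; the lazy group grows until some unit, repeated, spans the entire input. The helper name `repeating_unit` is hypothetical.

```python
import re

# fullmatch forces the lazily grown unit, plus zero or more copies of it,
# to span the entire string, so the shortest such unit is returned.
_UNIT = re.compile(r'(.+?)\1*', re.DOTALL)

def repeating_unit(s):
    m = _UNIT.fullmatch(s)
    # Only the empty string fails to match; return it unchanged.
    return m.group(1) if m else s

for s in ['123123123', '1234', '11111', '112112', '12123']:
    print(s, '->', repeating_unit(s))
```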