I want to add quotes around all hyphenated words in a string.
Given an example string, the desired function add_quotes() should behave like this:
>>> s = '{name = first-name}'
>>> add_quotes(s)
'{name = "first-name"}'
I know how to find all occurrences of hyphenated words using this regex, but I don't know how to add quotes around each of those occurrences in the original string.
>>> import re
>>> s = '{name = first-name}'
>>> re.findall(r'\w+(?:-\w+)+', s)
['first-name']
This can be done with a regex, using the re module from Python's standard library:
import re
def add_quotes(s):
    return re.sub(r'\w+(?:-\w+)+', r'"\g<0>"', s)
s = '{name = first-name}'
add_quotes(s) # returns '{name = "first-name"}'
The hyphenated words are found with the same pattern as in the question, and \g<0> in the replacement string refers to the whole match.
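The replacement can also be a function that receives each match object, which is handy if the quoting rule later needs extra logic. A minimal sketch that behaves like the re.sub version above; HYPHENATED is just an illustrative constant name:

```python
import re

# One or more word characters followed by one or more "-word" groups
HYPHENATED = re.compile(r'\w+(?:-\w+)+')

def add_quotes(s):
    # re.sub calls the lambda once per match; m.group(0) is the whole hyphenated word
    return HYPHENATED.sub(lambda m: f'"{m.group(0)}"', s)

print(add_quotes('{name = first-name}'))   # {name = "first-name"}
print(add_quotes('a-b c d-e-f'))           # "a-b" c "d-e-f"
```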
I have the string "ProductId%3D967164%26Colour%3Dbright-royal" and I want to extract data using a regex so that the output is 967164bright-royal.
I have tried (?:ProductId%3D|Colour%3D)(.*) in Python, but I get 967164%26Colour%3Dbright-royal as the output.
Can anyone help me find the right regex?
You don't need a regex here; use the urllib.parse module:
from urllib.parse import parse_qs, unquote
qs = "ProductId%3D967164%26Colour%3Dbright-royal"
d = parse_qs(unquote(qs))
print(d)
# Output:
{'ProductId': ['967164'], 'Colour': ['bright-royal']}
Final output:
>>> ''.join(i[0] for i in d.values())
'967164bright-royal'
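Since Python 3.7, dicts keep insertion order, so joining d.values() returns the values in the order they appear. If you prefer to make that order explicit, urllib.parse.parse_qsl returns a list of (key, value) pairs in input order. A sketch; like the answer above, it assumes the values themselves contain no encoded & or = characters, because the whole string is unquoted before parsing:

```python
from urllib.parse import parse_qsl, unquote

qs = "ProductId%3D967164%26Colour%3Dbright-royal"

# unquote turns %3D into "=" and %26 into "&": "ProductId=967164&Colour=bright-royal"
pairs = parse_qsl(unquote(qs))   # [('ProductId', '967164'), ('Colour', 'bright-royal')]

# Join the values in the order they appear in the string
print(''.join(value for _key, value in pairs))   # 967164bright-royal
```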
Update: the same result with a regex:
>>> import re
>>> ''.join(re.findall(r'%3D(\S*?)(?=%26|$)', qs))
'967164bright-royal'
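With your pattern, the alternation matches at ProductId%3D and the greedy (.*) then consumes the rest of the string, so a single match cannot give you the two separate parts.
If you want to capture each value in a capture group, use a lazy quantifier that stops before the next %26 or the end of the string: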
(?:ProductId|Colour)%3D(\S*?)(?=%26|$)
import re
pattern = r"(?:ProductId|Colour)%3D(\S*?)(?=%26|$)"
s = "ProductId%3D967164%26Colour%3Dbright-royal"
print(''.join(re.findall(pattern, s)))
Output
967164bright-royal
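If you also need to know which value belongs to which key, a second capture group for the key makes findall return (key, value) pairs that feed straight into dict. A sketch; the (?:^|%26) prefix makes each match consume its separator, since otherwise \w+ could start matching at the 26 of %26 and return 26Colour as a key:

```python
import re

s = "ProductId%3D967164%26Colour%3Dbright-royal"

# (?:^|%26) anchors each match at the start or right after a separator,
# (\w+) captures the key, and (.*?) lazily captures the value up to the next %26 or the end
pattern = r"(?:^|%26)(\w+)%3D(.*?)(?=%26|$)"

fields = dict(re.findall(pattern, s))
print(fields)                                   # {'ProductId': '967164', 'Colour': 'bright-royal'}
print(fields['ProductId'] + fields['Colour'])   # 967164bright-royal
```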
If you must use a regular expression and you can guarantee that the string will always be formatted the way you expect, you could try this.
import re
pattern = r"ProductId%3D(\d+)%26Colour%3D(.*)"
string = "ProductId%3D967164%26Colour%3Dbright-royal"
matches = re.match(pattern, string)
print(f"{matches[1]}{matches[2]}")
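re.match returns None when the string does not start with the pattern, and indexing None raises a TypeError. A guarded sketch that returns None instead; extract_values is a hypothetical helper name:

```python
import re

# Same pattern as above: digits for ProductId, then everything after Colour%3D
PATTERN = re.compile(r"ProductId%3D(\d+)%26Colour%3D(.*)")

def extract_values(string):
    """Return ProductId and Colour joined together, or None if the format differs."""
    match = PATTERN.match(string)
    if match is None:
        return None
    return match.group(1) + match.group(2)

print(extract_values("ProductId%3D967164%26Colour%3Dbright-royal"))  # 967164bright-royal
print(extract_values("Colour%3Dbright-royal"))                       # None
```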
How can I get the word example from a string like this:
str = "http://test-example:123/wd/hub"
I wrote something like this:
print(str[10:str.rfind(':')])
but it doesn't work correctly if the string is something like
"http://tests-example:123/wd/hub"
You can use lookarounds to capture the value that is preceded by - and followed by :
(?<=-).+(?=:)
Python code:
import re
str = "http://test-example:123/wd/hub"
print(re.search(r'(?<=-).+(?=:)', str).group())
Output:
example
A non-regex way to get the same result is to use two splits:
str = "http://test-example:123/wd/hub"
print(str.split(':')[1].split('-')[1])
Prints:
example
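Because .+ is greedy, the lookaround pattern above runs to the last colon in the string, so a colon later in the path would be swept into the result. A sketch comparing it with [^:]+, which cannot cross a colon; the third URL is a made-up input used only to show the difference:

```python
import re

urls = [
    "http://test-example:123/wd/hub",
    "http://tests-example:123/wd/hub",
    "http://test-example:123/wd:hub",   # hypothetical URL with a colon in the path
]

for url in urls:
    # Greedy: .+ extends to the last ':' in the string
    greedy = re.search(r'(?<=-).+(?=:)', url).group()
    # Bounded: [^:]+ stops at the first ':' after the hyphen
    bounded = re.search(r'(?<=-)[^:]+(?=:)', url).group()
    print(greedy, bounded)
```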
If you know the word is always seven letters long, as example is, you can use this non-regex approach:
s.split('-')[1][:7]
For an arbitrary word, that changes to:
s.split('-')[1].split(':')[0]
There are many ways to do this.
Using splitting:
example_str = str.split('-')[-1].split(':')[0]
This is fragile, and could break if there are more hyphens or colons in the string.
Using a regex:
import re
pattern = re.compile(r'-(.*):')
example_str = pattern.search(str).group(1)
This still expects a particular format, but is more easily adaptable (if you know how to write regexes).
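If the input is always a URL, another option is to parse it with urllib.parse.urlsplit first, so hyphens or colons in the path cannot interfere. A sketch, assuming the word you want is everything after the first hyphen of the host name (note that .hostname is lowercased); word_from_url is a hypothetical name:

```python
from urllib.parse import urlsplit

def word_from_url(url):
    # .hostname drops the scheme, port and path, e.g. "test-example"
    host = urlsplit(url).hostname or ""
    # partition splits at the first hyphen only; [2] is the text after it ("" if none)
    return host.partition("-")[2]

print(word_from_url("http://test-example:123/wd/hub"))   # example
print(word_from_url("http://tests-example:123/wd/hub"))  # example
```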
I'm not sure why you want to extract a particular word from a string. I guess you want to check whether the word appears in the given string.
If that is the case, the code below can be used.
import re
str1 = "http://tests-example:123/wd/hub"
matched = re.findall('example',str1)
Split on the -, and then on :
s = "http://test-example:123/wd/hub"
print(s.split('-')[1].split(':')[0])
#example
Using re:
import re
text = "http://test-example:123/wd/hub"
m = re.search('(?<=-).+(?=:)', text)
if m:
    print(m.group())
Python strings have a built-in method, find:
a="http://test-example:123/wd/hub"
b="http://test-exaaaample:123/wd/hub"
print(a.find('example'))
print(b.find('example'))
prints:
12
-1
find returns the index of the substring, or -1 if the substring is not found. You can also use the in keyword:
>>> 'example' in 'http://test-example:123/wd/hub'
True
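find also accepts an optional start index, which is enough to slice the word out of the URL without splitting or a regex, close to the asker's own rfind attempt. A sketch, assuming the word sits between the first hyphen and the next colon after it (if there were no hyphen, find would return -1 and start would fall back to 0):

```python
s = "http://test-example:123/wd/hub"

start = s.find('-') + 1      # index just after the first hyphen
end = s.find(':', start)     # first colon at or after start
print(s[start:end])          # example
```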
As the title says, the string is '="24-digit number"' and I want to extract the number between the quotes (example: ="000021484123647598423458" should give me '000021484123647598423458').
There are existing answers that show how to get data between quotes, but in my case I also need to confirm that =" exists without capturing it (there are also other "\d{24}" strings, but they are used for other things).
I couldn't modify these answers to get what I need.
My latest regex was ((?<=\")\d{24}(?=\")) and string is ="000021484123647598423458".
UPDATE: I think I will settle on the pattern r'^(?:\=\")(\d{24})(?:\")' because I just want to capture the digits.
import re
word = '="000021484123647598423458"'
pattern = r'^(?:\=\")(\d{24})(?:\")'
match = re.findall(pattern, word)[0]
Thank you all for suggestions.
You could use a backreference to the opening quote:
=(['"])(\d{24})\1
See a demo on regex101.com.
In Python:
import re
string = '="000021484123647598423458"'
rx = re.compile(r'''=(['"])(\d{24})\1''')
print(rx.search(string).group(2))
# 000021484123647598423458
Any one of the following finds the quoted number, although the match includes the quote marks and the leading = is not checked:
>>> st = '="000021484123647598423458"'
>>> import re
>>> re.findall(r'".*\d+.*"',st)
['"000021484123647598423458"']
or
>>> re.findall(r'".*\d{24}.*"',st)
['"000021484123647598423458"']
or
>>> re.findall(r'"\d{24}"',st)
['"000021484123647598423458"']
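The asker's own lookbehind only checked for the quote. Putting =" inside the lookbehind confirms the =" prefix without capturing it, and the lookahead checks the closing quote, so findall returns just the digits. A sketch; the second 24-digit string is a made-up example of a number that should be skipped:

```python
import re

# Lookbehind requires '="' right before the digits; lookahead requires the closing '"'.
# Neither is part of the match, so only the 24 digits are returned.
pattern = re.compile(r'(?<==")\d{24}(?=")')

text = '="000021484123647598423458" "111111111111111111111111"'
print(pattern.findall(text))   # ['000021484123647598423458']
```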
I am trying to write code that takes a string and extracts specific data from it. I know that the data will look like the line below, and I only need the data within the " " marks, not the marks themselves.
inputString = 'type="NN" span="123..145" confidence="1.0" '
Is there a way to take a substring of a string, using two characters to mark the start and stop points?
You can extract all the text between pairs of " characters using regular expressions:
import re
inputString = 'type="NN" span="123..145" confidence="1.0" '
pat = re.compile('"([^"]*)"')
strings = []
while True:
    mat = pat.search(inputString)
    if mat is None:
        break
    strings.append(mat.group(1))
    # Continue searching after the end of this match
    inputString = inputString[mat.end():]
print(strings)
or, easier:
import re
inputString='type="NN" span="123..145" confidence="1.0" '
strings=re.findall('"([^"]*)"', inputString)
print(strings)
Output for both versions:
['NN', '123..145', '1.0']
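If you also want to keep track of which value belongs to which attribute, a second capture group for the name makes findall return (name, value) pairs, which dict accepts directly. A sketch, assuming the names are plain word characters and the values never contain an escaped quote:

```python
import re

inputString = 'type="NN" span="123..145" confidence="1.0" '

# (\w+) captures the attribute name, ([^"]*) the text between the quotes
attrs = dict(re.findall(r'(\w+)="([^"]*)"', inputString))
print(attrs)   # {'type': 'NN', 'span': '123..145', 'confidence': '1.0'}
```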
Without a regex, you can split on the quote character; the values end up at the odd indices:
fields = inputString.split('"')
print(fields[1], fields[3], fields[5])
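Splitting on " puts every quoted value at an odd index, so an extended slice collects all of them without hard-coding the indices. This sketch assumes the quotes are balanced and never escaped:

```python
inputString = 'type="NN" span="123..145" confidence="1.0" '

fields = inputString.split('"')
# fields == ['type=', 'NN', ' span=', '123..145', ' confidence=', '1.0', ' ']
values = fields[1::2]   # start at index 1, take every second item
print(values)           # ['NN', '123..145', '1.0']
```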
You could split the string at each space to get a list of 'key="value"' substrings and then use regular expressions to parse the substrings.
Using your input string:
>>> input_string = 'type="NN" span="123..145" confidence="1.0" '
>>> input_string_split = input_string.split()
>>> print(input_string_split)
['type="NN"', 'span="123..145"', 'confidence="1.0"']
Then use regular expressions:
>>> import re
>>> pattern = r'"([^"]+)"'
>>> for substring in input_string_split:
...     match_obj = re.search(pattern, substring)
...     print(match_obj.group(1))
...
NN
123..145
1.0
The regular expression '"([^"]+)"' matches anything within quotation marks (provided there is at least one character). The round brackets indicate the bit of the regular expression that you are interested in.
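The standard library's shlex module can also do this split without a regex: shlex.split follows shell-like quoting rules, so type="NN" becomes type=NN, and partition then separates the name from the value. This swaps in a different technique from the answer above; it assumes the values contain no backslashes or nested quotes, which shlex would interpret:

```python
import shlex

input_string = 'type="NN" span="123..145" confidence="1.0" '

# shlex.split removes the quotes: ['type=NN', 'span=123..145', 'confidence=1.0']
tokens = shlex.split(input_string)

# partition('=') splits each token at the first '=' into (name, '=', value)
values = {name: value for name, _, value in (t.partition('=') for t in tokens)}
print(values)   # {'type': 'NN', 'span': '123..145', 'confidence': '1.0'}
```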
I'm looking for a way to search a text file for quotes made by an author and then print them out. My script so far:
import re
#searches end of string
print(re.search('"$', 'i am searching for quotes"'))
#searches start of string
print(re.search('^"', '"i am searching for quotes"'))
What I would like to do:
import re
## load text file
quotelist = open('A.txt','r').read()
## search for strings contained with quotation marks
re.search ("-", quotelist)
## Store in list or Dict
Dict = quotelist
## Print quotes
print(Dict)
I also tried:
import re
buffer = open('bbc.txt','r').read()
quotes = re.findall(r'.*"[^"].*".*', buffer)
for quote in quotes:
    print(quote)
# Add quotes to list
l = []
for quote in quotes:
    print(quote)
    l.append(quote)
Write a regular expression that matches the characters you expect to see inside a quoted string, then use re.findall to find all occurrences of the match.
import re
buffer = open('file.txt','r').read()
quotes = re.findall(r'"[^"]*"',buffer)
for quote in quotes:
    print(quote)
Searching between " and ” requires a Unicode-aware pattern such as:
quotes = re.findall(r'"[^\u201d]*\u201d', buffer)
For a document that uses " and ” interchangeably to close quotations:
quotes = re.findall(r'"[^"\u201d]*["\u201d]', buffer)
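The same idea as a Python 3 sketch: reading the file inside a with block closes it afterwards, and a capture group returns the quoted text without the quote marks. Like the last pattern above, it assumes quotations open with a straight " and close with either " or ”; io.StringIO stands in for the real file so the example runs on its own, and find_quotes is a hypothetical helper name:

```python
import io
import re

# Opening straight quote, the quoted text (captured), then a straight or curly closing quote
QUOTE_RE = re.compile(r'"([^"\u201d]*)["\u201d]')

def find_quotes(file_obj):
    """Return every quotation found in an open text file."""
    return QUOTE_RE.findall(file_obj.read())

# Stand-in for: with open('A.txt', encoding='utf-8') as f: quotes = find_quotes(f)
sample = io.StringIO('He said "hello" and later "goodbye\u201d to everyone.')
for quote in find_quotes(sample):
    print(quote)
```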
You don't need regular expressions to find static strings. You should use this Python idiom for finding strings:
>>> haystack = 'this is the string to search!'
>>> needle = '!'
>>> if needle in haystack:
...     print('Found', needle)
...
Found !
Creating a list is easy enough:
>>> matches = []
Storing matches is easy too:
>>> matches.append('add this string to matches')
This should be enough to get you started. Good luck!
An addendum to address a comment:
l = []
for quote in matches:
    print(quote)
    l.append(quote)