I want to run a for loop that builds a new regex for each item in a list and checks it against a string. For example:
import re
K = "MS-85409/LN-85409/L-1"
le = ["L-1","L-11","L-112"]
for i in le:
    s="[A-Z]+-+[0-9]+/"+i+"$"
    x = re.search(r's,K)
    if x:
        # do something
        pass
    else:
        pass
What I want is: on the first pass of the loop, s should be the pattern [A-Z]+-+[0-9]+/L-1$, used as x = re.search(s, K); if it matches K, do something. On the second pass, s becomes [A-Z]+-+[0-9]+/L-11$ and is searched in K the same way, and so on.
Change those 2 lines:
s="[A-Z]+-+[0-9]+/"+i+"$"
x = re.search(r's,K)
into:
s=r"[A-Z]+-+[0-9]+/"+i+"$"
x = re.search(s,K)
I have a list I need to join into a string, with some characters appended:
my_list = ['3.3.3.3', '2.2.2.3', '2.2.2.2']
my_list.append(')"')
my_list.insert(0,'"(')
hostman = '|'.join('{0}'.format(w) for w in my_list)
#my_list.pop()
print(hostman)
print(my_list)
My output is "(|3.3.3.3|2.2.2.3|2.2.2.2|)"
but I need it to be "(3.3.3.3|2.2.2.3|2.2.2.2)".
How can I strip the first and last | from the string?
You are making it harder than it needs to be: because '"(' and ')"' are elements of the list, join() puts a | separator around them too. Just use join() directly with the original list:
my_list = ['3.3.3.3', '2.2.2.3', '2.2.2.2']
s = '"(' + '|'.join(my_list) + ')"'
# s is "(3.3.3.3|2.2.2.3|2.2.2.2)"
# with quotes as part of the string
Or, if you prefer format():
s = '"({})"'.format('|'.join(my_list))
Try this:
hostman = "("+"|".join(my_list)+")"
OUTPUT :
'(3.3.3.3|2.2.2.3|2.2.2.2)'
I know that the following is how to replace a string with another string:
line.replace(x, y)
But I only want to replace the second instance of x in the line. How do you do that?
Thanks
EDIT
I thought I would be able to ask this question without going into specifics, but unfortunately none of the answers worked in my situation. I'm writing to a text file and using the following piece of code to change it:
import fileinput

with fileinput.FileInput("Player Stats.txt", inplace=True, backup='.bak') as file:
    for line in file:
        print(line.replace(chosenTeam, teamName), end='')
But if chosenTeam occurs multiple times, then all of them are replaced.
How can I replace only the nth instance in this situation?
That's actually a little tricky. First use str.find to get an index just past the start of the first occurrence. Then slice and apply the replace with count 1, so that only one occurrence is replaced.
>>> x = 'na'
>>> y = 'banana'
>>> pos = y.find(x) + 1
>>> y[:pos] + y[pos:].replace(x, 'other', 1)
'banaother'
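Applied to the question's fileinput loop, a sketch under the assumption that only the second occurrence per line should change (the chosenTeam/teamName values here are hypothetical):

import fileinput

chosenTeam, teamName = "Arsenal", "Gunners"  # hypothetical values
with fileinput.FileInput("Player Stats.txt", inplace=True, backup='.bak') as file:
    for line in file:
        pos = line.find(chosenTeam) + 1  # 0 when the team is absent
        if pos and chosenTeam in line[pos:]:
            line = line[:pos] + line[pos:].replace(chosenTeam, teamName, 1)
        print(line, end='')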
Bonus: this is a method to replace the nth occurrence in a string (index counts occurrences from 0):
def nth_replace(s, search, repl, index):
    split = s.split(search, index + 1)
    if len(split) <= index + 1:
        return s  # fewer than index + 1 occurrences: nothing to replace
    return search.join(split[:-1]) + repl + split[-1]
Example (occurrences inside words count too, so the "a" between "piano" and "a house" is the fourth occurrence, index 3):
nth_replace("Played a piano a a house", "a", "in", 3) # gives "Played a piano in a house"
You can try this (it swaps in y for the second occurrence, i == 1 below):
import itertools
line = "hello, hi hello how are you hello"
x = "hello"
y = "something"
parts = line.split(x)
# rebuild the string, using y instead of x as the second separator
new_data = ''.join(itertools.chain.from_iterable(
    (a, y if i == 1 else x) for i, a in enumerate(parts[:-1])
)) + parts[-1]
print(new_data)  # hello, hi something how are you hello
Can you use values from a script to build regexes dynamically?
For example:
import random
import re

base_pattern = r'\s*(([\d.\w]+)[ \h]+)'
n_rep = random.randint(1, 9)
new_pattern = base_pattern + '{n_rep}'
line_matches = re.findall(new_pattern, some_text)
I keep running into problems trying to get the grouping to work.
Explanation
I am attempting to find the most common number of repetitions of a regex pattern in a text file, in order to find table-type data within files.
I have the idea to make a regex such as this:
import re
import numpy as np
from collections import Counter

base_pattern = r'\s*(([\d.\w]+)[ \h]+)'
line_matches = np.array([re.findall(base_pattern, line) for line_num, line in enumerate(some_text.split("\n"))])
# Find where the text has similar number of words/data in each line
where_same_pattern= np.where(np.diff([len(x) for x in line_matches])==0)
line_matches_where_same = line_matches[where_same_pattern]
# Extract out just the lines which have data
interesting_lines = np.array([x for x in line_matches_where_same if x != []])
# Find how many words in each line of interest
len_of_lines = [len(l) for l in interesting_lines]
# Use the most prevalent as the most likely number of columns of data
n_cols = Counter(len_of_lines).most_common()[0][0]
# Rerun the data through a regex to find the columns
new_pattern = base_pattern + '{n_cols}'
line_matches = np.array([re.findall(new_pattern, line) for line_num, line in enumerate(some_text.split("\n"))])
You need to use the value of the variable, not a string literal containing the variable's name, e.g.:
new_pattern = base_pattern + '{' + str(n_cols) + '}'
Your pattern is just a string, so all you need is to convert your number into a string. You can use format (for example, https://infohost.nmt.edu/tcc/help/pubs/python/web/new-str-format.html) to do that:
base_pattern = r'\s*(([\d.\w]+)[ \h]+)'
n_rep = random.randint(1, 9)
new_pattern = base_pattern + '{{{0}}}'.format(n_rep)
print(new_pattern)  # e.g. \s*(([\d.\w]+)[ \h]+){6}
Note that the first two and the last two curly braces produce literal braces in the new pattern, while {0} is replaced by the number n_rep.
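An f-string sketch of the same idea (Python 3.6+). Note that Python's re module does not support \h (newer versions reject it as a bad escape), so [ \t] stands in for horizontal whitespace here:

import random

base_pattern = r'\s*(([\d.\w]+)[ \t]+)'
n_rep = random.randint(1, 9)
# doubled braces {{ }} become literal braces; {n_rep} interpolates the count
new_pattern = f"{base_pattern}{{{n_rep}}}"
print(new_pattern)  # e.g. \s*(([\d.\w]+)[ \t]+){6}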
I'm trying to build a list of domain names from an Enom API call. I get back a lot of information and need to locate the domain name related lines, and then join them together.
The string that comes back from Enom looks somewhat like this:
SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1
I'd like to build a list from that which looks like this:
[domain1.com, domain2.org, domain3.co.uk, domain4.net]
To find the different domain name components I've tried the following (where "enom" is the string above) but have only been able to get the SLD and TLD matches.
re.findall("^.*(SLD|TLD).*$", enom, re.M)
Every time I see a question asking for a regular-expression solution, I have this bizarre urge to try to solve it without regular expressions. Most of the time it's more efficient than using a regex; I encourage the OP to test which of the solutions is most efficient.
Here is the naive approach:
a = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""
b = a.split("\n")
c = [x.split("=")[1] for x in b if x != 'TLDOverride=1']
for x in range(0, len(c), 2):
    print(".".join(c[x:x+2]))
>> domain1.com
>> domain2.org
>> domain3.co.uk
>> domain4.net
You have a capturing group in your expression. re.findall documentation says:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
That's why only the content of the capturing group is returned.
Try:
re.findall(r"^.*((?:SLD|TLD)\d*)=(.*)$", enom, re.M)
This would return a list of tuples:
[('SLD1', 'domain1'), ('TLD1', 'com'), ('SLD2', 'domain2'), ('TLD2', 'org'), ('SLD3', 'domain3'), ('TLD4', 'co.uk'), ('SLD5', 'domain4'), ('TLD5', 'net')]
Combining SLDs and TLDs is then up to you.
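Combining them could look like this, a sketch assuming enom holds the response shown in the question and that each SLD match is immediately followed by its matching TLD:

import re

matches = re.findall(r"^.*((?:SLD|TLD)\d*)=(.*)$", enom, re.M)
# the list alternates SLD and TLD entries, so pair them up two at a time
domains = ["{0}.{1}".format(matches[i][1], matches[i + 1][1])
           for i in range(0, len(matches), 2)]
print(domains)  # ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']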
This works for your example:
>>> sld_list = re.findall(r"^.*SLD[0-9]*?=(.*?)$", enom, re.M)
>>> tld_list = re.findall(r"^.*TLD[0-9]*?=(.*?)$", enom, re.M)
>>> list(map(lambda x: x[0] + '.' + x[1], zip(sld_list, tld_list)))
['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
I'm not sure why you are talking about regular expressions. I mean, why don't you just run a for loop?
A famous quote seems to be appropriate here:
Some people, when confronted with a problem, think “I know, I'll use
regular expressions.” Now they have two problems.
domains = []
components = []
for line in enom.split('\n'):
    k, v = line.split('=')
    if k == 'TLDOverride':
        continue
    components.append(v)
    if k.startswith('TLD'):
        domains.append('.'.join(components))
        components = []
P.S. I'm not sure what this TLDOverride is, so the code just ignores it.
Here's one way:
import re
print(list(map('.'.join, zip(*[iter(re.findall(r'^(?:S|T)LD\d+=(.*)$', text, re.M))]*2))))
# ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
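The zip(*[iter(...)]*2) part is the standard pairing idiom: both arguments to zip are the same iterator, so each zip step consumes two consecutive items. A tiny illustration:

vals = ['domain1', 'com', 'domain2', 'org']
pairs = zip(*[iter(vals)] * 2)  # one iterator, consumed twice per step
print(list(pairs))  # [('domain1', 'com'), ('domain2', 'org')]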
Just for fun, map -> filter -> map:
input = """
SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
"""
splited = list(map(lambda x: x.split("="), data.split()))
slds = filter(lambda x: x[1][0].startswith('SLD'), enumerate(splited))
print(list(map(lambda x: '.'.join([x[1][1], splited[x[0] + 1][1]]), slds)))
# ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
This appears to do what you want:
domains = re.findall(r'SLD\d+=(.+)', re.sub(r'\nTLD\d+=', '.', enom))
It assumes that the lines are sorted and SLD always comes before its TLD. If that may not be the case, try this slightly more verbose code without regexes:
d = dict(x.split('=') for x in enom.strip().splitlines())
domains = [
d[key] + '.' + d.get('T' + key[1:], '')
for key in d if key.startswith('SLD')
]
You need to use a regex that matches across lines for this. This is similar to this post.
data = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""
# note: the TLD group allows dots so that values like "co.uk" are captured fully
domain_seq = re.compile(r"SLD\d=(\w+)\nTLD\d=([\w.]+)", re.M)
for item in domain_seq.finditer(data):
    domain, tld = item.group(1), item.group(2)
    print("%s.%s" % (domain, tld))
As some other answers already said, there's no need to use a regular expression here. A simple split and some filtering will do nicely:
lines = data.split("\n")  # assuming data contains your input string
lines = [x for x in lines if not x.startswith("TLDOverride")]  # drop the override markers
sld, tld = [[x.split("=")[1] for x in lines if x[:3] == t] for t in ("SLD", "TLD")]
result = [x + "." + y for x, y in zip(sld, tld)]  # ['domain1.com', 'domain2.org', ...]