split string every nth character and append ':'

split string every nth character and append ':' - python

I've read some switch MAC address table into a file and for some reason the MAC address if formatted as such:
'aabb.eeff.hhii'
This is not what a MAC address should be, it should follow: 'aa:bb:cc:dd:ee:ff'
I've had a look at the top rated suggestions while writing this and found an answer that may fit my needs but it doesn't work
satomacoto's answer
The MACs are in a list, so when I run for loop I can see them all as such:
Current Output
['8424.aa21.4er9','fa2']
['94f1.3002.c43a','fa1']
I just want to append ':' at every 2nd nth character, I can just remove the '.' with a simple replace so don't worry about that
Desired output
['84:24:aa:21:4e:r9','fa2']
['94:f1:30:02:c4:3a','fa1']
My code
info = []
newinfo = []
file = open('switchoutput')
newfile = file.read().split('switch')
macaddtable = newfile[3].split('\\r')
for x in macaddtable:
if '\\n' in x:
x = x.replace('\\n', '')
if carriage in x:
x = x.replace(carriage, '')
if '_#' in x:
x = x.replace('_#', '')
x.split('/r')
info.append(x)
for x in info:
if "Dynamic" in x:
x = x.replace('Dynamic', '')
if 'SVL' in x:
x = x.replace('SVL', '')
newinfo.append(x.split(' '))
for x in newinfo:
for x in x[:1]:
if '.' in x:
x = x.replace('.', '')
print(x)

Borrowing from the solution that you linked, you can achieve this as follows:
macs = [['8424.aa21.4er9','fa2'], ['94f1.3002.c43a','fa1']]
macs_fixed = [(":".join(map(''.join, zip(*[iter(m[0].replace(".", ""))]*2))), m[1]) for m in macs]
Which yields:
[('84:24:aa:21:4e:r9', 'fa2'), ('94:f1:30:02:c4:3a', 'fa1')]

If you like regular expressions:
import re
dotted = '1234.3456.5678'
re.sub('(..)\.?(?!$)', '\\1:', dotted)
# '12:34:34:56:56:78'
The template string looks for two arbitrary characters '(..)' and assigns them to group 1. It then allows for 0 or 1 dots to follow '\.?' and makes sure that at the very end there is no match '(?!$)'. Every match is then replaced with its group 1 plus a colon.
This uses the fact that re.sub operates on nonoverlapping matches.

x = '8424.aa21.4er9'.replace('.','')
print(':'.join(x[y:y+2] for y in range(0, len(x) - 1, 2)))
>> 84:24:aa:21:4e:r9
Just iterate through the string once you've cleaned it, and grab 2 string each time you loop through the string. Using range() third optional argument you can loop through every second elements. Using join() to add the : in between the two elements you are iterating.

You can use re module to achieve your desired output.
import re
s = '8424.aa21.4er9'
s = s.replace('.','')
groups = re.findall(r'([a-zA-Z0-9]{2})', s)
mac = ":".join(groups)
#'84:24:aa:21:4e:r9'
Regex Explanation
[a-zA-Z0-9]: Match any alphabets or number
{2}: Match at most 2 characters.
This way you can get groups of two and then join them on : to achieve your desired mac address format

wrong_mac = '8424.aa21.4er9'
correct_mac = ''.join(wrong_mac.split('.'))
correct_mac = ':'.join(correct_mac[i:i+2] for i in range(0, len(correct_mac), 2))
print(correct_mac)

Related

how to run for loop to check for new pattern in string

I want to run for loop to generate new regex for checking in the file for example
import re
K = "MS-85409/LN-85409/L-1"
le = ["L-1","L-11","L-112"]
for i in le:
s="[A-Z]+-+[0-9]+/"+i+"$"
x = re.search(r's,K)
if x:
do something
else:
pass
where i want to do is,
on first loop of for s be "[A-Z]+-+[0-9]+/"+L-1$" and put that in x=re.search(r'[A-Z]+-+[0-9]+/"+L-1$',K)
and search it in K if matches do something and then second loop s will be
"[A-Z]+-+[0-9]+/"+L-11$" and put that in x=re.search(r'[A-Z]+-+[0-9]+/"+L-11$',K)
and search it in K if matches do something and so on..

Change those 2 lines:
s="[A-Z]+-+[0-9]+/"+i+"$"
x = re.search(r's,K)
into:
s=r"[A-Z]+-+[0-9]+/"+i+"$"
x = re.search(s,K)

Python Joining List and adding and removing characters

I have a list i need to .join as string and append characters
my_list = ['3.3.3.3', '2.2.2.3', '2.2.2.2']
my_list.append(')"')
my_list.insert(0,'"(')
hostman = '|'.join('{0}'.format(w) for w in my_list)
#my_list.pop()
print(hostman)
print(my_list)
My output = "(|3.3.3.3|2.2.2.3|2.2.2.2|)"
I need the output to be = "(3.3.3.3|2.2.2.3|2.2.2.2)"
how can i strip the first and last | from the string

You are making it harder than it needs to be. You can just use join() directly with the list:
my_list = ['3.3.3.3', '2.2.2.3', '2.2.2.2']
s = '"(' + '|'.join(my_list) + ')"'
# s is "(3.3.3.3|2.2.2.3|2.2.2.2)"
# with quotes as part of the string
or if you prefer format:
s = '"({})"'.format('|'.join(my_list))

Try this :
hostman = "("+"|".join(my_list)+")"
OUTPUT :
'(3.3.3.3|2.2.2.3|2.2.2.2)'

Replace specific instance in string - Python

I know that the following is how to replace a string with another string i
line.replace(x, y)
But I only want to replace the second instance of x in the line. How do you do that?
Thanks
EDIT
I thought I would be able to ask this question without going into specifics but unfortunately none of the answers worked in my situation. I'm writing into a text file and I'm using the following piece of code to change the file.
with fileinput.FileInput("Player Stats.txt", inplace=True, backup='.bak') as file:
for line in file:
print(line.replace(chosenTeam, teamName), end='')
But if chosenTeam occurs multiple times then all of them are replaced.
How can I replace only the nth instance in this situation.

That's actually a little tricky. First use str.find to get an index beyond the first occurrence. Then slice and apply the replace (with count 1, so as to replace only one occurrence).
>>> x = 'na'
>>> y = 'banana'
>>> pos = y.find(x) + 1
>>> y[:pos] + y[pos:].replace(x, 'other', 1)
'banaother'

Bonus, this is a method to replace "NTH" occurrence in a string
def nth_replace(str,search,repl,index):
split = str.split(search,index+1)
if len(split)<=index+1:
return str
return search.join(split[:-1])+repl+split[-1]
example:
nth_replace("Played a piano a a house", "a", "in", 1) # gives "Played a piano in a house"

You can try this:
import itertools
line = "hello, hi hello how are you hello"
x = "hello"
y = "something"
new_data = line.split(x)
new_data = ''.join(itertools.chain.from_iterable([(x, a) if i != 2 else (y, a) for i, a in enumerate(new_data)]))

regex number of repetitions from code

Can you use values from script to inform regexs dynamically how to operate?
For example:
base_pattern = r'\s*(([\d.\w]+)[ \h]+)'
n_rep = random.randint(1, 9)
new_pattern = base_pattern + '{n_rep}'
line_matches = re.findall(new_pattern, some_text)
I keep getting problems with trying to get the grouping to work
Explanation
I am attempting to find the most common number of repetitions of a regex pattern in a text file in order to find table type data within files.
I have the idea to make a regex such as this:
base_pattern = r'\s*(([\d.\w]+)[ \h]+)'
line_matches = np.array([re.findallbase_pattern, line) for line_num, line in enumerate(some_text.split("\n"))])
# Find where the text has similar number of words/data in each line
where_same_pattern= np.where(np.diff([len(x) for x in line_matches])==0)
line_matches_where_same = line_matches[where_same_pattern]
# Extract out just the lines which have data
interesting_lines = np.array([x for x in line_matches_where_same if x != []])
# Find how many words in each line of interest
len_of_lines = [len(l) for l in interesting_lines]
# Use the most prevalent as the most likely number of columns of data
n_cols = Counter(len_of_lines).most_common()[0][0]
# Rerun the data through a regex to find the columns
new_pattern = base_pattern + '{n_cols}'
line_matches = np.array([re.findall(new_pattern, line) for line_num, line in enumerate(some_text.split("\n"))])

you need to use the value of the variable, not a string literal with the name of the variable, e.g.:
new_pattern = base_pattern + '{' + n_cols + '}'

Your pattern is just a string. So, all you need is to convert your number into a string. You can use format (for example, https://infohost.nmt.edu/tcc/help/pubs/python/web/new-str-format.html) to do that:
base_pattern = r'\s*(([\d.\w]+)[ \h]+)'
n_rep = random.randint(1, 9)
new_pattern = base_pattern + '{{{0}}}'.format(n_rep)
print new_pattern ## '\\s*(([\\d.\\w]+)[ \\h]+){6}'
Note that the two first and the two last curly braces are creating the curly braces in the new pattern, while {0} is being replaced by the number n_rep

Regular Expression in Python

I'm trying to build a list of domain names from an Enom API call. I get back a lot of information and need to locate the domain name related lines, and then join them together.
The string that comes back from Enom looks somewhat like this:
SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1
I'd like to build a list from that which looks like this:
[domain1.com, domain2.org, domain3.co.uk, domain4.net]
To find the different domain name components I've tried the following (where "enom" is the string above) but have only been able to get the SLD and TLD matches.
re.findall("^.*(SLD|TLD).*$", enom, re.M)

Edit:
Every time I see a question asking for regular expression solution I have this bizarre urge to try and solve it without regular expressions. Most of the times it's more efficient than the use of regex, I encourage the OP to test which of the solutions is most efficient.
Here is the naive approach:
a = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""
b = a.split("\n")
c = [x.split("=")[1] for x in b if x != 'TLDOverride=1']
for x in range(0,len(c),2):
print ".".join(c[x:x+2])
>> domain1.com
>> domain2.org
>> domain3.co.uk
>> domain4.net

You have a capturing group in your expression. re.findall documentation says:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
That's why only the conent of the capturing group is returned.
try:
re.findall("^.*((?:SLD|TLD)\d*)=(.*)$", enom, re.M)
This would return a list of tuples:
[('SLD1', 'domain1'), ('TLD1', 'com'), ('SLD2', 'domain2'), ('TLD2', 'org'), ('SLD3', 'domain3'), ('TLD4', 'co.uk'), ('SLD5', 'domain4'), ('TLD5', 'net')]
Combining SLDs and TLDs is then up to you.

this works for you example,
>>> sld_list = re.findall("^.*SLD[0-9]*?=(.*?)$", enom, re.M)
>>> tld_list = re.findall("^.*TLD[0-9]*?=(.*?)$", enom, re.M)
>>> map(lambda x: x[0] + '.' + x[1], zip(sld_list, tld_list))
['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']

I'm not sure why are you talking about regular expressions. I mean, why don't you just run a for loop?
A famous quote seems to be appropriate here:
Some people, when confronted with a problem, think “I know, I'll use
regular expressions.” Now they have two problems.
domains = []
components = []
for line in enom.split('\n'):
k,v = line.split('=')
if k == 'TLDOverride':
continue
components.append(v)
if k.startswith('TLD'):
domains.append('.'.join(components))
components = []
P.S. I'm not sure what's this TLDOverride so the code just ignores it.

Here's one way:
import re
print map('.'.join, zip(*[iter(re.findall(r'^(?:S|T)LD\d+=(.*)$', text, re.M))]*2))
# ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']

Just for fun, map -> filter -> map:
input = """
SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
"""
splited = map(lambda x: x.split("="), input.split())
slds = filter(lambda x: x[1][0].startswith('SLD'), enumerate(splited))
print map(lambda x: '.'.join([x[1][1], splited[x[0] + 1][1], ]), slds)
>>> ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']

This appears to do what you want:
domains = re.findall('SLD\d+=(.+)', re.sub(r'\nTLD\d+=', '.', enom))
It assumes that the lines are sorted and SLD always comes before its TLD. If that can be not the case, try this slightly more verbose code without regexes:
d = dict(x.split('=') for x in enom.strip().splitlines())
domains = [
d[key] + '.' + d.get('T' + key[1:], '')
for key in d if key.startswith('SLD')
]

You need to use multiline regex for this. This is similar to this post.
data = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""
domain_seq = re.compile(r"SLD\d=(\w+)\nTLD\d=(\w+)", re.M)
for item in domain_seq.finditer(data):
domain, tld = item.group(1), item.group(2)
print "%s.%s" % (domain,tld)

As some other answers already said, there's no need to use a regular expression here. A simple split and some filtering will do nicely:
lines = data.split("\n") #assuming data contains your input string
sld, tld = [[x.split("=")[1] for x in lines if x[:3] == t] for t in ("SLD", "TLD")]
result = [x+y for x, y in zip(sld, tld)]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

split string every nth character and append ':' - python

wrong_mac = '8424.aa21.4er9' correct_mac = ''.join(wrong_mac.split('.')) correct_mac = ':'.join(correct_mac[i:i+2] for i in range(0, len(correct_mac), 2)) print(correct_mac)

Related

how to run for loop to check for new pattern in string

Python Joining List and adding and removing characters

Replace specific instance in string - Python

regex number of repetitions from code

Regular Expression in Python

Categories

Resources