Replacing a certain number of characters only - python

I was wondering if anyone could help provide some insight on the following problem that I am currently struggling with.
Let's assume that you have a file that contains the following characters:
|**********|
You have another file that contains a pattern, such as:
-
/-\
/---\
/-----\
/-------\
How would you replace the characters in the pattern with the characters from the first file, while printing only as many *'s as the first file contains?
Once you have printed, say, 10 stars in total, you have to STOP printing.
So it would be something like:
*
***
*****
*
Any hints or tips or help would be greatly appreciated.
I have been using .replace() to replace all of the characters in the pattern with the '*' but I am unable to print the specific amount only.
for ch in ['-', '/', '\\']:
    if ch in form:
        form = form.replace(ch, '*')

Here's my asterisk file (aestricks.txt), which contains:
************
And pattern file (pattern.txt), which contains:
-
/-\
/---\
/-----\
/-------\
And here's the code. I know it can be optimized a little more, but I am posting the basic one:
file1 = open("aestricks.txt","r")
file1 = file1.read()
t_c = len(file1)
form = open("pattern.txt","r")
form = form.read()
form1 = form
count = 0
for ch in form1:
    if ch in ['-','/', '\\']:
        form = form.replace(ch, '*', 1)
        count += 1
        if count == t_c:
            break
for ch in form1:
    if ch in ['-','/', '\\']:
        form = form.replace(ch, '')
print(form)
OUTPUT:
*
***
*****
***

You can use regular expressions and the sub() function from the re module.
sub() takes an optional count argument that indicates the maximal number of pattern occurrences to be replaced.
import re
with open('asterisks.txt') as asterisks_file, open('ascii_art.txt') as ascii_art_file:
    pattern = re.compile(r'['   # match one from within square brackets:
                         r'\\'  # either backslash
                         r'/'   # or slash
                         r'-'   # or hyphen
                         r']')
    # let n be the number of asterisks from the first file
    n = asterisks_file.read().count('*')
    # replace first n matches of our pattern (one of /\- characters)
    replaced_b = pattern.sub('*', ascii_art_file.read(), n)
    # replace rest of the /\- characters with spaces (based on your example)
    result = pattern.sub(' ', replaced_b)
    print(result)
OUTPUT:
*
***
*****
*
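One pitfall with the count argument: count=0 means "no limit", so an empty asterisks file would cause every pattern character to be replaced. A minimal sketch that guards against this, working on in-memory strings instead of files (the function name fill_pattern and the filler parameter are illustrative, not part of the answer above):

```python
import re

# any one of '-', '/' or '\'
DRAWABLE = re.compile(r"[-/\\]")

def fill_pattern(pattern_text: str, stars: str, filler: str = " ") -> str:
    """Replace at most stars.count('*') pattern characters with '*'."""
    budget = stars.count("*")
    # count=0 would mean "replace everything", so skip the call entirely
    if budget > 0:
        pattern_text = DRAWABLE.sub("*", pattern_text, count=budget)
    # whatever pattern characters remain are blanked out
    return DRAWABLE.sub(filler, pattern_text)

art = "-\n/-\\\n/---\\\n/-----\\"
print(fill_pattern(art, "|**********|"))
```

With ten stars, the first three rows are filled completely and the fourth row receives the single remaining star.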

Instead of replacing every character at once, you can replace items one at a time and keep a count of the number of replacements.
A str object doesn't support item assignment at a specific index, so you have to convert the str into a list first, do your operations, and then convert it back to a str.
You can write something like this:
characters = ['-', '/', '\\']
count = 0
a = list(form)  # convert your string to list
for i in range(len(a)):  # iterate through each character
    if a[i] in characters and count < 10:
        a[i] = '*'  # replace with '*'
        count += 1  # increment count
result = "".join(a)  # convert list back into str
print(result)
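The loop above hard-codes the limit of 10. A small sketch of the same list-based idea with the limit passed in as a parameter (replace_first_n and its argument names are made up for illustration):

```python
def replace_first_n(text: str, targets: str, n: int, new: str = "*") -> str:
    """Replace the first n characters of text that occur in targets."""
    chars = list(text)  # strings are immutable, so work on a list
    for i, ch in enumerate(chars):
        if n == 0:
            break  # budget used up, leave the rest untouched
        if ch in targets:
            chars[i] = new
            n -= 1
    return "".join(chars)

print(replace_first_n("/-\\/-", "-/\\", 3))
```

The budget would typically come from the first file, for example stars.count('*').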

import re
file1 = open("file1.txt", "r")
s = file1.read()
starcount = s.count('*')
file2 = open("file2.txt", "r")
# inside a character class '|' is a literal pipe, so list only the three characters
line = re.sub(r'[-\\/]', '*', file2.read(), starcount)
line = re.sub(r'[-\\/]', '', line)
print(line)
Syntax of sub
>>> import re
>>> help(re.sub)
Help on function sub in module re:
sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl. repl can be either a string or a callable;
if a string, backslash escapes in it are processed. If it is
a callable, it's passed the match object and must return
a replacement string to be used.
Output
*
***
*****
*
Demo
https://repl.it/repls/ObeseNoteworthyDevelopment

You just need to keep track of the number of * in the input line and then continue to replace the dashes until the counter runs out. Once the counter runs out then replace the remaining dashes with empty strings.
def replace(p, s):
    counter = len(s) - 2
    chars = ['\\', '/', '-']
    i = 0
    for c in p:
        if c in chars:
            p = p.replace(c, '*', 1)
            i += 1
            if i == counter:
                break
    p = p.replace('\\', '')
    p = p.replace('/', '')
    p = p.replace('-', '')
    return p
if __name__ == '__main__':
    stars = '|**********|'
    pyramid = r'''
-
/-\
/---\
/-----\
/-------\ '''
    print(pyramid)
    print(replace(pyramid, stars))
OUTPUT
*
***
*****
*

import re
inp = open('stars.txt', 'r').read()
count = len(inp.strip('|')) #stripping off the extra characters from either end
pattern = open('pattern.txt', 'r').read() # read the entire pattern
# replace any of '-', '/' or '\' with '*', at most count times
out = re.sub(r'-|/|\\', '*', pattern, count=count)
out = re.sub(r'-|/|\\', '', out)  # remove the leftover characters
print(out)
The second re.sub line removes any leftover pattern characters.

Related

Replace ip partially with x in python

I have several ip addresses like
162.1.10.15
160.15.20.222
145.155.222.1
I am trying to mask the IPs like this:
162.x.xx.xx
160.xx.xx.xxx
145.xxx.xxx.x
How can I achieve this in Python?
Here’s a slightly simpler solution
import re
txt = "192.1.2.3"
x = txt.split(".", 1) # ['192', '1.2.3']
y = x[0] + "." + re.sub(r"\d", "x", x[1])
print(y) # 192.x.x.x
We can use re.sub with a callback function here:
import re
def repl(m):
    # keep the first octet; replace every character of the other three with 'x'
    return m.group(1) + '.' + re.sub(r'.', 'x', m.group(2)) + '.' + re.sub(r'.', 'x', m.group(3)) + '.' + re.sub(r'.', 'x', m.group(4))
inp = "160.15.20.222"
output = re.sub(r'\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b', repl, inp)
print(output) # 160.xx.xx.xxx
In the callback, the idea is to use re.sub to surgically replace each digit by x. This keeps the same width of each original number.
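The same callback idea can be written more compactly. This is only a sketch: mask_ip is an illustrative name, and the pattern assumes the input is a plain dotted IPv4 string with nothing else around it.

```python
import re

def mask_ip(ip: str) -> str:
    # (?<=\.) only lets a digit run match right after a dot,
    # so the first octet is never touched
    return re.sub(r"(?<=\.)\d+", lambda m: "x" * len(m.group()), ip)

for ip in ["162.1.10.15", "160.15.20.222", "145.155.222.1"]:
    print(mask_ip(ip))
```

The lambda receives each match object and returns as many x characters as the matched octet has digits, which preserves the width.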
This is not the optimal solution, but it works for me.
import re
Ip_string = "160.15.20.222"
Ip_string = Ip_string.split('.')
Ip_String_x = ""
flag = False  # becomes True once the first octet has been copied
for num in Ip_string:
    if flag:
        num = re.sub(r'\d', 'x', num)
        Ip_String_x = Ip_String_x + '.' + num
    else:
        flag = True
        Ip_String_x = num
print(Ip_String_x)  # 160.xx.xx.xxx
Solution 1
Other answers are good, and this single regex works, too:
import re
strings = [
'162.1.10.15',
'160.15.20.222',
'145.155.222.1',
]
for string in strings:
    print(re.sub(r'(?:(?<=\.)|(?<=\.\d)|(?<=\.\d\d))\d', 'x', string))
output:
162.x.xx.xx
160.xx.xx.xxx
145.xxx.xxx.x
Explanation
(?<=\.) matches a position immediately preceded by a dot.
(?<=\.\d) matches a position preceded by a dot and one digit.
(?<=\.\d\d) matches a position preceded by a dot and two digits.
\d matches a digit.
So every digit that is preceded by a dot and zero, one, or two digits is replaced with 'x'.
(?<=\.\d{0,2}) and similar patterns are not allowed, because Python's re module requires a look-behind ((?<=...)) to have a fixed width.
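The fixed-width restriction can be checked directly. The sketch below only tests whether each pattern compiles with the standard re module (compiles is a helper name chosen for this example):

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# variable-width look-behind: rejected by re
print(compiles(r"(?<=\.\d{0,2})\d"))
# three fixed-width look-behinds joined by alternation: accepted
print(compiles(r"(?:(?<=\.)|(?<=\.\d)|(?<=\.\d\d))\d"))
```

This is why the answer spells out the three look-behinds separately instead of using a single quantified one.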
Solution 2
Without the re module and regex:
for string in strings:
    first, *rest = string.split('.')
    print('.'.join([first, *map(lambda x: 'x' * len(x), rest)]))
The code above gives the same result.
There are multiple ways to go about this. Regex is the most versatile way to write string-manipulation code, but you can also do it with plain for-loops and the split and join functions.
ip = "162.1.10.15"
# Splitting the IPv4 address using '.' as the delimiter
ip = ip.split(".")
# Converting the substrings to x's except the 1st string
for i, val in enumerate(ip[1:]):
    cnt = 0
    for x in val:
        cnt += 1
    ip[i+1] = "x" * cnt
# Combining the substrings back into an ip
ip = ".".join(ip)
print(ip)
I highly recommend checking Regex but this is also a valid way to go about this task.
Hope you find this useful!
Pass an array of IPs to this function:
def replace_ips(ip_list):
    r_list = []
    for i in ip_list:
        first, *other = i.split(".", 3)
        r_item = []
        r_item.append(first)
        for i2 in other:
            r_item.append("x" * len(i2))
        r_list.append(".".join(r_item))
    return r_list
In case of your example:
print(replace_ips(["162.1.10.15","160.15.20.222","145.155.222.1"]))#==> expected output: ["162.x.xx.xx","160.xx.xx.xxx","145.xxx.xxx.x"]
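One-liner, FYI (note that it replaces each of the last three octets with a single x, so the original widths are not preserved):
import re
ips = ['162.1.10.15', '160.15.20.222', '145.155.222.1']
pattern = r'\d{1,3}'
replacement_sign = 'x'
# reverse the string so that count=3 applies to the last three octets
res = [re.sub(pattern, replacement_sign, ip[::-1], 3)[::-1] for ip in ips]
print(res)  # ['162.x.x.x', '160.x.x.x', '145.x.x.x']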

How to replace every third word in a string with the # length equivalent

Input:
string = "My dear adventurer, do you understand the nature of the given discussion?"
expected output:
string = 'My dear ##########, do you ########## the nature ## the given ##########?'
How can you replace every third word in a string with a run of # characters of the same length, without counting special characters such as apostrophes ('), quotation marks ("), full stops (.), commas (,), exclamation marks (!), question marks (?), colons (:) and semicolons (;)?
I took the approach of converting the string to a list of elements, but I am having difficulty filtering out the special characters and replacing the words with the # equivalent. Is there a better way to go about it?
I solved it with:
s = "My dear adventurer, do you understand the nature of the given discussion?"
def replace_alphabet_with_char(word: str, replacement: str) -> str:
    new_word = []
    alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    for c in word:
        if c in alphabet:
            new_word.append(replacement)
        else:
            new_word.append(c)
    return "".join(new_word)
every_nth_word = 3
s_split = s.split(' ')
result = " ".join([replace_alphabet_with_char(s_split[i], '#') if i % every_nth_word == every_nth_word - 1 else s_split[i] for i in range(len(s_split))])
print(result)
Output:
My dear ##########, do you ########## the nature ## the given ##########?
There are more efficient ways to solve this question, but I hope this is the simplest!
My approach is:
Split the sentence into a list of the words
Using that, make a list of every third word.
Remove unwanted characters from this
Replace third words in original string with # times the length of the word.
Here's the code (explained in comments) :
# original line
line = "My dear adventurer, do you understand the nature of the given discussion?"
# printing original line
print(f'\n\nOriginal Line:\n"{line}"\n')
# printing a header to indicate that the next few prints show what happens after each line
print('\n\nStages of parsing:')
# splitting by spaces, into list
wordList = line.split(' ')
# printing wordlist
print(wordList)
# making list of every third word
thirdWordList = [wordList[i-1] for i in range(1,len(wordList)+1) if i%3==0]
# printing third-word list
print(thirdWordList)
# characters that you don't want hashed
unwantedCharacters = ['.','/','|','?','!','_','"',',','-','#','\n','\\',':',';','(',')','<','>','{','}','[',']','%','*','&','+']
# replacing these characters by empty strings in the list of third-words
for unwantedchar in unwantedCharacters:
    for i in range(0,len(thirdWordList)):
        thirdWordList[i] = thirdWordList[i].replace(unwantedchar,'')
# printing third word list, now without punctuation
print(thirdWordList)
# replacing with #
# note: str.replace substitutes every occurrence of word, including inside other words
for word in thirdWordList:
    line = line.replace(word,len(word)*'#')
# Voila! Printing the result:
print(f'\n\nFinal Output:\n"{line}"\n\n')
Hope this helps!
The following works and does not use regular expressions:
special_chars = {'.','/','|','?','!','_','"',',','-','#','\n','\\'}
def format_word(w, fill):
    # keep a single trailing special character, if there is one
    if w[-1] in special_chars:
        return fill*(len(w) - 1) + w[-1]
    else:
        return fill*len(w)
def obscure(string, every=3, fill='#'):
    return ' '.join(
        (format_word(w, fill) if (i+1) % every == 0 else w)
        for (i, w) in enumerate(string.split())
    )
Here are some example usages:
In [15]: obscure(string)
Out[15]: 'My dear ##########, do you ########## the nature ## the given ##########?'
In [16]: obscure(string, 4)
Out[16]: 'My dear adventurer, ## you understand the ###### of the given ##########?'
In [17]: obscure(string, 3, '?')
Out[17]: 'My dear ??????????, do you ?????????? the nature ?? the given ???????????'
With help of some regex. Explanation in the comments.
import re
imp = "My dear adventurer, do you understand the nature of the given discussion?"
every_nth = 3 # in case you want to change this later
out_list = []
# split the input at spaces, enumerate the parts for looping
for idx, word in enumerate(imp.split(' ')):
    # only do the special logic for multiples of n (0-indexed, thus +1)
    if (idx + 1) % every_nth == 0:
        # find how many special chars there are in the current segment
        len_special_chars = len(re.findall(r'[.,!?:;\'"]', word))
        # ^ add more special chars here if needed
        # subtract the number of special chars from the length of segment
        str_len = len(word) - len_special_chars
        # repeat '#' for every non-special char, then re-attach the special chars
        # (assumed to trail the word); the parentheses keep words without
        # special chars from being replaced by an empty string
        out_list.append('#'*str_len + (word[-len_special_chars:] if len_special_chars > 0 else ''))
    else:
        # if the index is not a multiple of n, just add the word
        out_list.append(word)
print(' '.join(out_list))
A mix of regex and string manipulation:
import re
string = "My dear adventurer, do you understand the nature of the given discussion?"
new_string = []
for i, s in enumerate(string.split()):
    if (i+1) % 3 == 0:
        s = re.sub(r'[^\.:,;\'"!\?]', '#', s)
    new_string.append(s)
new_string = ' '.join(new_string)
print(new_string)
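For comparison, here is a sketch without regex that masks only letters and digits and leaves every other character in place. The function name obscure_every_nth is illustrative, and treating "special character" as "anything str.isalnum() rejects" is an assumption rather than something the question specifies.

```python
def obscure_every_nth(text: str, n: int = 3, fill: str = "#") -> str:
    """Mask the alphanumeric characters of every n-th space-separated word."""
    words = text.split(" ")
    # indices n-1, 2n-1, ... are the n-th, 2n-th, ... words
    for i in range(n - 1, len(words), n):
        words[i] = "".join(fill if ch.isalnum() else ch for ch in words[i])
    return " ".join(words)

sentence = "My dear adventurer, do you understand the nature of the given discussion?"
print(obscure_every_nth(sentence))
```

Because punctuation is kept wherever it occurs, this also handles words such as "don't" without assuming the special character is at the end.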

Python extract string starting with index up to character

Say I have an incoming string that varies a little:
" 1 |r|=1.2e10 |v|=2.4e10"
" 12 |r|=-2.3e10 |v|=3.5e-04"
"134 |r|= 3.2e10 |v|=4.3e05"
I need to extract the numbers (ie. 1.2e10, 3.5e-04, etc)... so I would like to start at the end of '|r|' and grab all characters up to the ' ' (space) after it. Same for '|v|'
I've been looking for something that would:
Extract a substring from a string starting at an index and ending on a specific character...
But have not found anything remotely close.
Ideas?
NOTE: Added new scenario, which is the one that is causing lots of head-scratching...
To keep it elegant and generic, let's use split:
First, we split on ' |' to get tokens.
Then, for each token that contains an equals sign, we parse the key and the value.
sabich = "134 |r| = 3.2e10 |v|=4.3e05"
parts = sabich.split(' |')
values = {}
for p in parts:
    if '=' in p:
        k, v = p.split('=')
        values[k.replace('|', '').strip()] = v.strip(' ')
# {'r': '3.2e10', 'v': '4.3e05'}
print(values)
This can be converted to a one-liner:
sabich = "134 |r| = 3.2e10 |v|=4.3e05"
values = {t[0].replace('|', '').strip() : t[1].strip(' ') for t in [tuple(p.split('=')) for p in sabich.split(' |') if '=' in p]}
# {'r': '3.2e10', 'v': '4.3e05'}
print(values)
You can solve it with a regular expression.
import re
strings = [
" 1 |r|=1.2e10 |v|=2.4e10",
" 12 |r|=-2.3e10 |v|=3.5e-04"
]
out = []
pattern = r'(?P<name>\|[\w]+\|)=(?P<value>-?\d+(?:\.\d*)(?:e-?\d*)?)'
for s in strings:
    out.append(dict(re.findall(pattern, s)))
print(out)
Output
[{'|r|': '1.2e10', '|v|': '2.4e10'}, {'|r|': '-2.3e10', '|v|': '3.5e-04'}]
And if you want to convert the strings to numbers:
out = []
pattern = r'(?P<name>\|[\w]+\|)=(?P<value>-?\d+(?:\.\d*)(?:e-?\d*)?)'
for s in strings:
    # out.append(dict(re.findall(pattern, s)))
    out.append({
        name: float(value)
        for name, value in re.findall(pattern, s)
    })
Output
[{'|r|': 12000000000.0, '|v|': 24000000000.0}, {'|r|': -23000000000.0, '|v|': 0.00035}]
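The third sample line in the question has a space after the equals sign ("|r|= 3.2e10"), which the pattern above does not allow for. A sketch of a slightly more tolerant variant, where FIELD and parse_fields are names invented for this example and the number syntax is assumed to be an optional sign, digits, an optional fraction and an optional exponent:

```python
import re

# |name| followed by '=', with optional whitespace around it, then a float literal
FIELD = re.compile(r"\|(\w+)\|\s*=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)")

def parse_fields(line: str) -> dict:
    """Return {name: float} for every |name|=value pair found in line."""
    return {name: float(value) for name, value in FIELD.findall(line)}

for line in [" 1 |r|=1.2e10 |v|=2.4e10",
             " 12 |r|=-2.3e10 |v|=3.5e-04",
             "134 |r|= 3.2e10 |v|=4.3e05"]:
    print(parse_fields(line))
```

Here the capture group excludes the surrounding pipes, so the keys are 'r' and 'v' rather than '|r|' and '|v|'.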

Python regular expression to extract optional number at the end of string

I'm trying to write a Python regular expression that can parse strings of the type "<name>(<number>)", where <number> is optional.
For example, if I pass 'sclkout', then there is no number at the end, so it should just match 'sclkout'. If the input is 'line7', then it should match 'line' and '7'. The name can also contain numbers inside it, so if I give it 'dx3f', then the output should be 'dx3f', but for 'dx3b0' it should match 'dx3b' and '0'.
This is what I first tried:
import re
def do_match(signal):
    match = re.match(r'(\w+)(\d+)?', signal)
    assert match
    print("Input = " + signal)
    print("group1 = " + match.group(1))
    if match.lastindex == 2:
        print("group2 = " + match.group(2))
    print("")
# should match 'sclkout'
do_match("sclkout")
# should match 'line' and '7'
do_match("line7")
# should match 'dx4f'
do_match("dx4f")
# should match 'dx3b' and '0'
do_match("dx3b0")
This is of course wrong because of greedy matching in the (\w+) group, so I tried setting that to non-greedy:
match = re.match(r'(\w+?)(\d+)?', signal)
This however only matches the first letter of the string.
You don't need regex for this:
from itertools import takewhile
def do_match(s):
    num = ''.join(takewhile(str.isdigit, reversed(s)))[::-1]
    return s[:s.rindex(num)], num
...
>>> do_match('sclkout')
('sclkout', '')
>>> do_match('line7')
('line', '7')
>>> do_match('dx4f')
('dx4f', '')
>>> do_match('dx3b0')
('dx3b', '0')
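You can use a lazy (non-greedy) quantifier together with the ^ and $ anchors, like this (Python spells named groups as (?P<name>...); the linked demo uses Ruby's (?<name>...) syntax):
^(?P<name>\w+?)(?P<number>\d+)?$
Or ^(\w+?)(\d+)?$, if you don't want the named capture groups.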
See live demo here: http://rubular.com/r/44Ntc4mLDY
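The non-greedy attempt in the question matched only the first letter because nothing forced the match to reach the end of the string. A short Python sketch of the anchored version, using re.fullmatch in place of explicit ^ and $ (split_signal is a name chosen for this example, and \d* is used so the number group is an empty string when absent):

```python
import re

def split_signal(signal: str):
    """Split 'name<number>' into (name, number); number may be ''."""
    # \w+? grows one character at a time until \d* can consume the rest
    match = re.fullmatch(r"(\w+?)(\d*)", signal)
    if match is None:
        raise ValueError("not a valid signal name: " + repr(signal))
    return match.group(1), match.group(2)

for name in ["sclkout", "line7", "dx4f", "dx3b0"]:
    print(split_signal(name))
```

Since the whole string must match, the lazy name group is extended exactly far enough that only trailing digits remain for the second group.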
([a-zA-Z0-9]*[a-zA-Z]+)([0-9]*) is what you want.
import re
test = ["sclkout", "line7", "dx4f", "dx3b0"]
ans = [("sclkout", ""), ("line", "7"), ("dx4f", ""), ("dx3b", "0")]
for t, a in zip(test, ans):
    m = re.match(r'([a-zA-Z0-9]*[a-zA-Z]+)([0-9]*)', t)
    if m.groups() == a:
        print("OK")
    else:
        print("NG")
output:
OK
OK
OK
OK

Recursive function dies with Memory Error

Say we have a function that translates the morse symbols:
. -> -.
- -> ...-
If we apply this function twice, we get e.g:
. -> -. -> ...--.
Given an input string and a number of repetitions, we want to know the length of the final string. (Problem 1 from the Flemish Programming Contest VPW, taken from these slides which provide a solution in Haskell).
For the given input file
4
. 4
.- 2
-- 2
--... 50
We expect the solution
44
16
20
34028664377246354505728
Since I don't know Haskell, this is my recursive solution in Python that I came up with:
def encode(msg, repetition, morse={'.': '-.', '-': '...-'}):
    if isinstance(repetition, str):
        repetition = eval(repetition)
    while repetition > 0:
        newmsg = ''.join(morse[c] for c in msg)
        return encode(newmsg, repetition-1)
    return len(msg)
def problem1(fn):
    with open(fn) as f:
        f.next()
        for line in f:
            print encode(*line.split())
which works for the first three inputs but dies with a memory error for the last input.
How would you rewrite this in a more efficient way?
Edit
Rewrite based on the comments given:
def encode(p, s, repetition):
    while repetition > 0:
        p,s = p + 3*s, p + s
        return encode(p, s, repetition-1)
    return p + s
def problem1(fn):
    with open(fn) as f:
        f.next()
        for line in f:
            msg, repetition = line.split()
            print encode(msg.count('.'), msg.count('-'), int(repetition))
Comments on style and further improvements still welcome
Consider that you don't actually have to output the resulting string, only its length. Also consider that the order of '.' and '-' in the string does not affect the final length (e.g. ".- 3" and "-. 3" produce the same final length).
Thus, I would give up on storing the entire string and instead store the number of '.' and the number of '-' as integers.
In your starting string, count the number of dots and dashes. Then apply this:
repetitions = 4
dots = 1
dashes = 0
for i in range(repetitions):
    dots, dashes = dots + 3 * dashes, dashes + dots
Think about why this works: each '.' becomes one dash and one dot, and each '-' becomes three dots and one dash.
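One way to convince yourself is to compare the counting approach against literal string expansion for small inputs. This is a sketch for checking only; the function names are illustrative:

```python
MORSE = {".": "-.", "-": "...-"}

def length_by_expansion(msg: str, reps: int) -> int:
    """Build the full string; only feasible for small reps."""
    for _ in range(reps):
        msg = "".join(MORSE[c] for c in msg)
    return len(msg)

def length_by_counting(msg: str, reps: int) -> int:
    """Track only the number of dots and dashes."""
    dots, dashes = msg.count("."), msg.count("-")
    for _ in range(reps):
        # '.' -> '-.' adds one dash and one dot; '-' -> '...-' adds three dots and one dash
        dots, dashes = dots + 3 * dashes, dots + dashes
    return dots + dashes

for msg, reps in [(".", 4), (".-", 2), ("--", 2)]:
    print(msg, reps, length_by_expansion(msg, reps), length_by_counting(msg, reps))
```

The counting version needs only two integers per step, so 50 repetitions pose no memory problem, whereas the expanded string would have on the order of 10^22 characters.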
Per @Hammar (I had the same idea, but he explained it better than I could have ;-):
from sympy import Matrix
t = Matrix([[1, 3], [1, 1]])
def encode(dots, dashes, reps):
    # the row vector [dashes, dots] times t gives [dashes + dots, 3*dashes + dots]
    res = Matrix([[dashes, dots]]) * t**reps
    return res[0, 0] + res[0, 1]
In each iteration, the new dot count is dots + 3 * dashes and the new dash count is dots + dashes:
def encode(dots, dashes, repetitions):
    while repetitions > 0:
        dots, dashes = dots + 3 * dashes, dots + dashes
        repetitions -= 1
    return dots + dashes
def problem1(fn):
    with open(fn) as f:
        count = int(next(f))  # the first line holds the number of test cases
        for i in range(count):
            line = next(f)
            msg, repetition = line.strip().split()
            print(encode(msg.count('.'), msg.count('-'), int(repetition)))
