Remove nonalpha, edit, add nonalpha - python

i want to edit a string after removing nonalpha chars, then move them in the proper place again.
eg:
import re
string = input()
alphastring = re.sub('[^0-9a-zA-Z]+', '', string)
alphastring = alphastring[::2]
i want it to be as following:
string = "heompuem ykojua'rje awzeklvl."
alphastring = heompuemykojuarjeawzeklvl
alphastring = hopeyourewell
?????? = hope you're well.
I tried fixing the problem with different sollutions but none give me the right output. I resorted to using RegEx which I'm not very familiar with.
Any help would be greatly welcomed.

I can't think of any reasonable way to do this with a regexp. Just use a loop that copies from input to output, skipping every other alphanumeric character.
copy_flag = True
string = input()
alphastring = ''
for c in string:
if c.isalnum():
if copy_flag:
alphastring += c
copy_flag = not copy_flag
else:
alphastring += c

Here is an approach:
string = "heompuem ykojua'rje awzeklvl."
result, i = "", 0
for c, alph in ((c, c.isalnum()) for c in string)
if not (alph and i%2): # skip odd alpha-numericals
result += c
i += alph # only increment counter for alpha-numericals
result
# "hope you're well."

Related

How to replace every third word in a string with the # length equivalent

Input:
string = "My dear adventurer, do you understand the nature of the given discussion?"
expected output:
string = 'My dear ##########, do you ########## the nature ## the given ##########?'
How can you replace the third word in a string of words with the # length equivalent of that word while avoiding counting special characters found in the string such as apostrophes('), quotations("), full stops(.), commas(,), exclamations(!), question marks(?), colons(:) and semicolons (;).
I took the approach of converting the string to a list of elements but am finding difficulty filtering out the special characters and replacing the words with the # equivalent. Is there a better way to go about it?
I solved it with:
s = "My dear adventurer, do you understand the nature of the given discussion?"
def replace_alphabet_with_char(word: str, replacement: str) -> str:
new_word = []
alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
for c in word:
if c in alphabet:
new_word.append(replacement)
else:
new_word.append(c)
return "".join(new_word)
every_nth_word = 3
s_split = s.split(' ')
result = " ".join([replace_alphabet_with_char(s_split[i], '#') if i % every_nth_word == every_nth_word - 1 else s_split[i] for i in range(len(s_split))])
print(result)
Output:
My dear ##########, do you ########## the nature ## the given ##########?
There are more efficient ways to solve this question, but I hope this is the simplest!
My approach is:
Split the sentence into a list of the words
Using that, make a list of every third word.
Remove unwanted characters from this
Replace third words in original string with # times the length of the word.
Here's the code (explained in comments) :
# original line
line = "My dear adventurer, do you understand the nature of the given discussion?"
# printing original line
print(f'\n\nOriginal Line:\n"{line}"\n')
# printing somehting to indicate that next few prints will be for showing what is happenning after each lone
print('\n\nStages of parsing:')
# splitting by spaces, into list
wordList = line.split(' ')
# printing wordlist
print(wordList)
# making list of every third word
thirdWordList = [wordList[i-1] for i in range(1,len(wordList)+1) if i%3==0]
# pritning third-word list
print(thirdWordList)
# characters that you don't want hashed
unwantedCharacters = ['.','/','|','?','!','_','"',',','-','#','\n','\\',':',';','(',')','<','>','{','}','[',']','%','*','&','+']
# replacing these characters by empty strings in the list of third-words
for unwantedchar in unwantedCharacters:
for i in range(0,len(thirdWordList)):
thirdWordList[i] = thirdWordList[i].replace(unwantedchar,'')
# printing third word list, now without punctuation
print(thirdWordList)
# replacing with #
for word in thirdWordList:
line = line.replace(word,len(word)*'#')
# Voila! Printing the result:
print(f'\n\nFinal Output:\n"{line}"\n\n')
Hope this helps!
Following works and does not use regular expressions
special_chars = {'.','/','|','?','!','_','"',',','-','#','\n','\\'}
def format_word(w, fill):
if w[-1] in special_chars:
return fill*(len(w) - 1) + w[-1]
else:
return fill*len(w)
def obscure(string, every=3, fill='#'):
return ' '.join(
(format_word(w, fill) if (i+1) % every == 0 else w)
for (i, w) in enumerate(string.split())
)
Here are some example usage
In [15]: obscure(string)
Out[15]: 'My dear ##########, do you ########## the nature ## the given ##########?'
In [16]: obscure(string, 4)
Out[16]: 'My dear adventurer, ## you understand the ###### of the given ##########?'
In [17]: obscure(string, 3, '?')
Out[17]: 'My dear ??????????, do you ?????????? the nature ?? the given ???????????'
With help of some regex. Explanation in the comments.
import re
imp = "My dear adventurer, do you understand the nature of the given discussion?"
every_nth = 3 # in case you want to change this later
out_list = []
# split the input at spaces, enumerate the parts for looping
for idx, word in enumerate(imp.split(' ')):
# only do the special logic for multiples of n (0-indexed, thus +1)
if (idx + 1) % every_nth == 0:
# find how many special chars there are in the current segment
len_special_chars = len(re.findall(r'[.,!?:;\'"]', word))
# ^ add more special chars here if needed
# subtract the number of special chars from the length of segment
str_len = len(word) - len_special_chars
# repeat '#' for every non-special char and add the special chars
out_list.append('#'*str_len + word[-len_special_chars] if len_special_chars > 0 else '')
else:
# if the index is not a multiple of n, just add the word
out_list.append(word)
print(' '.join(out_list))
A mixed of regex and string manipulation
import re
string = "My dear adventurer, do you understand the nature of the given discussion?"
new_string = []
for i, s in enumerate(string.split()):
if (i+1) % 3 == 0:
s = re.sub(r'[^\.:,;\'"!\?]', '#', s)
new_string.append(s)
new_string = ' '.join(new_string)
print(new_string)

Split string with delimiter as whitespace but it preserve whitespace within doubleqoutes and also doubleqoutes in Python

Split string with delimiter as whitespace but it should preserve whitespace within doubleqoutes and also doubleqoutes in Python
a='Append ",","te st1",input To output'
output list should be like below
['Append', '",","te st1",input', 'To', 'output']
I found a solution using regular expressions:
re.findall("(?:\".*?\"|\S)+", a)
gives
['Append', '",","te st1",input', 'To', 'output']
Update: Improved pattern that includes escaping:
re.findall("(?:\".*?[^\\\"]\"|\S)+", a)
Please note that this also matches the empty string "" by means of the \S part of the pattern.
Note: Old answer below for archival purposes:
The obvious answer would be to use shlex like this:
>>> shlex.split('Append ",","te st1",input To output')
['Append', ',,te st1,input', 'To', 'output']
This will remove the quotes, unfortunately. Anyways, this kind of problem can be solved with a simple state machine. The performance might be sub-par, but it works:
#!/usr/bin/env python2
import string
def split_string_whitespace(s):
current_token = []
result = []
state = 0
for c in s + " ":
if state == 0:
if c in string.whitespace:
if current_token:
result.append("".join(current_token))
current_token = []
else:
current_token.append(c)
if c == '"':
state = 1
else:
current_token.append(c)
if c == '"':
state = 0
return result
print split_string_whitespace('Append ",","te st1",input To output')
The script yields:
['Append', '",","te st1",input', 'To', 'output']
I'm pretty sure one could construct something with the re submodule, so I'm waiting for that answer, too :)
A very simple generator function, maintaining the current "quotation state":
def splitter(s):
i, quoted = 0, False
for n, c in enumerate(s+' '):
if c == '"':
quoted = not quoted
elif c == ' ' and not quoted:
if n > i:
yield s[i:n]
i = n+1
list(splitter(a))
# ['Append', '",","te st1",input', 'To', 'output']

Python Regex use during specific string search in a text file

I have to find an expression in a text file like : StartTime="4/11/2013 8:11:20:965" and EndTime="4/11/2013 8:11:22:571"
So I used the regex expression
r'(\w)="(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}:\d{2,3})"'
Thanks again to eumiro for his help earlier (Retrieve randomly preformatted text from Text File)
But I can't find anything in my file, and I checked it was there.
I can't go trhough 'GetDuration lvl 1' with it actually.
I tried to simplify my regex as r'(\d)', and it worked to lvl 4, so I thought it could be and issue with eventually protected " but I didn't see anything about this in python doc.
What am I missing ?
Regular_Exp = r'(\w)="(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}:\d{2,3})"'
def getDuration(timeCode1, timeCode2)
duration =0
c = ''
print 'GetDuration lvl 0'
for c in str(timeCode1) :
m = re.search(Regular_Exp, c)
print 'GetDuration lvl 1'
if m:
print 'GetDuration lvl 2'
for text in str(timeCode2) :
print 'GetDuration lvl 3'
n = re.search(Regular_Exp, c)
if n:
print 'GetDuration lvl 4'
timeCode1Split = timeCode1.split(' ')
timeCode1Date = timeCode1Split[0].split('/')
timeCode1Heure = timeCode1Split[1].split(':')
timeCode2Split = timeCode2.split(' ')
timeCode2Date = timeCode2Split[0].split('/')
timeCode2Heure = timeCode2Split[1].split(':')
timeCode1Date = dt.datetime(timeCode1Date[0], timeCode1Date[1], timeCode1Date[2], timeCode1Heure[0], timeCode1Heure[0], timeCode1Heure[0], tzinfo=utc)
timeCode2Date = dt.datetime(timeCode2Date[0], timeCode2Date[1], timeCode2Date[2], timeCode2Heure[0], timeCode2Heure[0], timeCode2Heure[0], tzinfo=utc)
print 'TimeCode'
print timeCode1Date
print timeCode2Date
duration += timeCode1Date - timeCode2Date
return duration
for c in str(timeCode1) :
m = re.search(Regular_Exp, c)
...
for x in str(something) means you're iterating something character by character (one character=1 length str at a time), and no regex can match with that.
Maybe this exp should help:
"(\w+?)=\"(.+?)\""
TO use:
>>> string = u'StartTime="4/11/2013 8:11:20:965" and EndTime="4/11/2013 8:11:22:571"'
>>> regex = re.compile("(\w+?)=\"(.+?)\"")
# Run findall
>>> regex.findall(string)
[(u'StartTime', u'4/11/2013 8:11:20:965'), (u'EndTime', u'4/11/2013 8:11:22:571')]
Also, for c in str(timeCode1), try printing c, you are going one character at a time, not a good idea with regex..

Cesar Cipher on Python beginner level

''' Cesar Cipher '''
def encrypt(word, shift):
word = word.lower()
for i in word:
r = chr(ord(i)+shift)
if r > "z":
r = chr(ord(i) - 26 + shift)
word = word.replace(i, r)
return word
if __name__ == "__main__": print encrypt("programming", 3)
This gives me wrong answers on shifts higher than 1 and words longer then 2. I can't figure out why. Any help please?
Thilo explains the problem exactly. Let's step through it:
''' Cesar Cipher '''
def encrypt(word, shift):
word = word.lower()
for i in word:
r = chr(ord(i)+shift)
if r > "z":
r = chr(ord(i) - 26 + shift)
word = word.replace(i, r)
return word
Try encrypt('abc', 1) and see what happens:
First loop:
i = 'a'
r = chr(ord('a')+1) = 'b'
word = 'abc'.replace('a', 'b') = 'bbc'
Second loop:
i = 'b'
r = chr(ord('b')+1) = 'c'
word = 'bbc'.replace('b', 'c') = 'ccc'
Third loop:
i = 'c'
r = chr(ord('c')+1) = 'd'
word = 'ccc'.replace('c', 'd') = 'ddd'
You don't want to replace every instance of i with r, just this one. How would you do this? Well, if you keep track of the index, you can just replace at that index. The built-in enumerate function lets you get each index and each corresponding value at the same time.
for index, ch in enumerate(word):
r = chr(ord(ch)+shift)
if r > "z":
r = chr(ord(ch) - 26 + shift)
word = new_word_replacing_one_char(index, r)
Now you just have to write that new_word_replacing_one_char function, which is pretty easy if you know slicing. (If you haven't learned slicing yet, you may want to convert the string into a list of characters, so you can just say word[index] = r, and then convert back into a string at the end.)
I don't know how Python likes replacing characters in the word while you are iterating over it, but one thing that seems to be a problem for sure is repeated letters, because replace will replace all occurrences of the letter, not just the one you are currently looking at, so you will end up shifting those repeated letters more than once (as you hit them again in a later iteration).
Come to think of it, this will also happen with non-repeated letters. For example, shifting ABC by 1 will become -> BBC -> CCC -> DDD in your three iterations.
I had this assignment as well. The hint is you have to keep track of where the values wrap, and use that to your advantage. I also recommend using the upper function call so everything is the same case, reduces the number of checks to do.
In Python, strings are immutable - that is they cannot be changed. Lists, however, can be. So to use your algorithm, use a list instead:
''' Cesar Cipher '''
def encrypt(word, shift):
word = word.lower()
# Convert the word to a list
word = list(word)
# Iterate over the word by index
for i in xrange(len(word)):
# Get the character at i
c = word[i]
# Apply shift algorithm
r = chr(ord(c)+shift)
if r > "z":
r = chr(ord(c) - 26 + shift)
# Replace the character at i
word[i] = r
# Convert the list back to a string
return ''.join(word)
if __name__ == "__main__": print encrypt("programming", 3)

How can I increment a char?

I'm new to Python, coming from Java and C. How can I increment a char? In Java or C, chars and ints are practically interchangeable, and in certain loops, it's very useful to me to be able to do increment chars, and index arrays by chars.
How can I do this in Python? It's bad enough not having a traditional for(;;) looper - is there any way I can achieve what I want to achieve without having to rethink my entire strategy?
In Python 2.x, just use the ord and chr functions:
>>> ord('c')
99
>>> ord('c') + 1
100
>>> chr(ord('c') + 1)
'd'
>>>
Python 3.x makes this more organized and interesting, due to its clear distinction between bytes and unicode. By default, a "string" is unicode, so the above works (ord receives Unicode chars and chr produces them).
But if you're interested in bytes (such as for processing some binary data stream), things are even simpler:
>>> bstr = bytes('abc', 'utf-8')
>>> bstr
b'abc'
>>> bstr[0]
97
>>> bytes([97, 98, 99])
b'abc'
>>> bytes([bstr[0] + 1, 98, 99])
b'bbc'
"bad enough not having a traditional for(;;) looper"?? What?
Are you trying to do
import string
for c in string.lowercase:
...do something with c...
Or perhaps you're using string.uppercase or string.letters?
Python doesn't have for(;;) because there are often better ways to do it. It also doesn't have character math because it's not necessary, either.
Check this: USING FOR LOOP
for a in range(5):
x='A'
val=chr(ord(x) + a)
print(val)
LOOP OUTPUT: A B C D E
I came from PHP, where you can increment char (A to B, Z to AA, AA to AB etc.) using ++ operator. I made a simple function which does the same in Python. You can also change list of chars to whatever (lowercase, uppercase, etc.) is your need.
# Increment char (a -> b, az -> ba)
def inc_char(text, chlist = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
# Unique and sort
chlist = ''.join(sorted(set(str(chlist))))
chlen = len(chlist)
if not chlen:
return ''
text = str(text)
# Replace all chars but chlist
text = re.sub('[^' + chlist + ']', '', text)
if not len(text):
return chlist[0]
# Increment
inc = ''
over = False
for i in range(1, len(text)+1):
lchar = text[-i]
pos = chlist.find(lchar) + 1
if pos < chlen:
inc = chlist[pos] + inc
over = False
break
else:
inc = chlist[0] + inc
over = True
if over:
inc += chlist[0]
result = text[0:-len(inc)] + inc
return result
There is a way to increase character using ascii_letters from string package which ascii_letters is a string that contains all English alphabet, uppercase and lowercase:
>>> from string import ascii_letters
>>> ascii_letters[ascii_letters.index('a') + 1]
'b'
>>> ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
Also it can be done manually;
>>> letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> letters[letters.index('c') + 1]
'd'
def doubleChar(str):
result = ''
for char in str:
result += char * 2
return result
print(doubleChar("amar"))
output:
aammaarr
For me i made the fallowing as a test.
string_1="abcd"
def test(string_1):
i = 0
p = ""
x = len(string_1)
while i < x:
y = (string_1)[i]
i=i+1
s = chr(ord(y) + 1)
p=p+s
print(p)
test(string_1)

Categories