How to safely truncate a quoted string?

I have the following string:
Customer sale 88% in urm 50
Quoted with urllib.parse.quote, it becomes:
Customer%20sale%2088%25%20in%20urm%2050
Then I need to limit its length to a maximum of 30 characters, so I use value[:30].
The problem is that it becomes "Customer%20sale%2088%25%20in%2", which is not valid:
the trailing %2 is a fragment of a %20 escape from the quoted string, which makes the result an invalid quoted string.
I don't have control over the original string, and the final result needs to have a maximum length of 30, so I can't truncate it beforehand.
What approach would be feasible?

urllib.parse.quote uses percent-encoding as defined in RFC 3986. This means that an encoded character will always be of the form "%" HEXDIG HEXDIG.
So you can simply delete any trailing remnant of an escape by looking for a % sign in the last two characters.
For example:
>>> from urllib.parse import quote
>>> s=quote("Customer sale 88% in urm 50")[:30]
>>> n=s.find('%', -2)
>>> s if n < 0 else s[:n]
'Customer%20sale%2088%25%20in'
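The same idea can be wrapped in a small helper. This is only a sketch: the name truncate_escaped and its limit parameter are made up for illustration, and it assumes the input really is the output of quote (so every % starts a complete %HH escape before truncation).

```python
from urllib.parse import quote

def truncate_escaped(quoted, limit):
    """Cut an already-quoted string to `limit` characters without
    leaving a partial %HH escape at the end (hypothetical helper)."""
    cut = quoted[:limit]
    # A complete escape has its "%" three characters from the end,
    # so a "%" within the last two characters must be a fragment.
    pos = cut.find("%", -2)
    return cut if pos < 0 else cut[:pos]

print(truncate_escaped(quote("Customer sale 88% in urm 50"), 30))
# prints Customer%20sale%2088%25%20in
```

Note that for non-ASCII input a single character becomes several escapes (for example %C3%A9), and this helper may still cut between them.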

What about looking for dangling percentage marks?
value = value[:30]
# slices instead of indexes, so short or empty strings do not raise IndexError
if value[-1:] == "%":
    value = value[:-1]
elif value[-2:-1] == "%":
    value = value[:-2]
print(value)

The encoded string will always be in the format %HH. You want the string to be at most 30 characters long with a valid encoding. So, probably the best solution I can think of:
from urllib.parse import quote
string = "Customer sale 88% in urm 50"
string = quote(string)
string = string[:string[:30].rfind("%")]
print(string)
Output:
Customer%20sale%2088%25%20in
Solution:
After the encoding you may get a string of any length; the following one line of code is enough to truncate it:
string = string[:string[:30].rfind("%")]
Explanation:
It first takes the first 30 characters of the quoted string, then searches for % from the right end. The position of that % is used as the cut point. Voilà, you have your result. Note that this always cuts at the last %, even when its escape is complete, and it assumes the first 30 characters contain at least one % (otherwise rfind returns -1 and the slice drops only the final character of the whole string).
Alternate approach:
Instead of string = string[:string[:30].rfind("%")] you can also write string = string[:string.rfind("%", 0, 30)].
Note: I stored the truncated string back in the variable to show how it works. If you do not want to store it, you can simply use print(string[:string[:30].rfind("%")]) to display the result.
Hope it helps...
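The edge cases of the rfind one-liner are easy to check with a small sketch. Both function names below are invented for this illustration; the second one is a possible variant, not part of the answer, that only trims when the last % is actually a fragment.

```python
from urllib.parse import quote

def cut_at_last_percent(quoted, limit=30):
    # The rfind one-liner, wrapped in a function for experimentation.
    return quoted[:quoted[:limit].rfind("%")]

def cut_only_if_dangling(quoted, limit=30):
    # Hypothetical variant: trim only an incomplete trailing escape.
    cut = quoted[:limit]
    pos = cut.rfind("%")
    # A complete escape needs two characters after the "%".
    if pos != -1 and pos > len(cut) - 3:
        return cut[:pos]
    return cut

print(cut_at_last_percent("a%20b"))         # prints a      (complete escape lost)
print(len(cut_at_last_percent("a" * 40)))   # prints 39     (limit not enforced)
print(cut_only_if_dangling("a%20b"))        # prints a%20b
print(len(cut_only_if_dangling("a" * 40)))  # prints 30
```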

How about putting the individual characters in a list and then count and strip?
Rough example:
from urllib.parse import quote
s = 'Customer sale 88% in urm 50'
res = []
for c in s:
    res.append(quote(c))
print(res) # ['C', 'u', 's', 't', 'o', 'm', 'e', 'r', '%20', 's', 'a', 'l', 'e', '%20', '8', '8', '%25', '%20', 'i', 'n', '%20', 'u', 'r', 'm', '%20', '5', '0']
print(len(res))
current_length = 0
for item in res:
    current_length += len(item)
print(current_length) # 39
while current_length > 30:
    res = res[:-1]
    current_length = 0
    for item in res:
        current_length += len(item)
print("".join(res)) # Customer%20sale%2088%25%20in
That way you will not end up cutting in the middle of a quoted character. And in case you need a different length in the future, you just need to modify the while loop. The code can be made cleaner as well ;)
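A more compact take on the same per-character idea, as a sketch: the function name truncate_quoted and its limit parameter are assumptions made for this example. Because each character is quoted as a unit, a multi-byte character such as é (which becomes %C3%A9) is either kept whole or dropped whole.

```python
from urllib.parse import quote, unquote

def truncate_quoted(text, limit=30):
    """Quote `text` character by character and stop before the
    result would exceed `limit` characters (hypothetical helper)."""
    pieces = []
    length = 0
    for ch in text:
        encoded = quote(ch)          # one character, possibly several %HH escapes
        if length + len(encoded) > limit:
            break                    # the next character would not fit
        pieces.append(encoded)
        length += len(encoded)
    return "".join(pieces)

result = truncate_quoted("Customer sale 88% in urm 50")
print(result)           # prints Customer%20sale%2088%25%20in
print(unquote(result))  # prints Customer sale 88% in
```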

Related

Recreating the strip() method using list comprehensions but output returns unexpected result

I am trying to 'recreate' the str.strip() method in Python for fun.
def ourveryownstrip(string, character):
    newString = [n for n in string if n != character]
    return newString
print("The string is", ourveryownstrip(input("Enter a string.\n"), input("enter character to remove\n")))
The way it works is that I create a function that takes two arguments: 1) the first is the supplied string, 2) the second is a string or char that the person wants to remove (for example whitespace) from the string. Then I use a list comprehension with a condition to store the 'modified' string as a new list. Then it returns the modified string as a list.
The output however, returns the entire thing as an array with every character in the string separated by a comma.
Actual output:
Boeing 747 Boeing 787
enter character to removeBoeing
The string is ['B', 'o', 'e', 'i', 'n', 'g', ' ', '7', '4', '7', ' ', 'B', 'o', 'e', 'i', 'n', 'g', ' ', '7', '8', '7']
How can I fix this?

What you have set up checks each individual character of the string to see whether it equals 'Boeing', which will never be true, so it always returns the whole input. It comes back as a list because a list comprehension always builds a list. As @BrutusForcus said, this can be solved using string slicing and the str.index() method like this:
def ourveryownstrip(string,character):
    while character in string:
        string = string[:string.index(character)] + string[string.index(character)+len(character):]
    return string
This will first check if the value you want removed is in your string. If it is then string[:string.index(character)] will get all of the string before the first occurrence of the character variable value and string[string.index(character)+len(character):] will get everything in the string after the first occurrence of the variable value. That will keep happening until the variable value doesn't occur in the string anymore.
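For comparison, here is a sketch of two shorter alternatives. The function names are invented for this example. The first keeps the question's comprehension but joins the result back into a string (it only works when the thing to remove is a single character); the second leans on the built-in str.replace, which removes every non-overlapping occurrence in one pass.

```python
def remove_char(text, char):
    # Per-character filter, joined back into a string instead of a list.
    return "".join(c for c in text if c != char)

def remove_substring(text, unwanted):
    # str.replace handles multi-character substrings such as "Boeing".
    return text.replace(unwanted, "")

print(remove_char("Boeing 747", " "))                         # prints Boeing747
print(repr(remove_substring("Boeing 747 Boeing 787", "Boeing")))  # prints ' 747  787'
```

One difference from the while loop: the loop keeps going if a removal creates a new match, while str.replace does a single pass.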

Why does this function result in ['h', '-', '-', '-', '-']?

The function hangman_guessed(guessed, secret) is supposed to take a string of guessed characters and a list of "secret" characters.
The function checks every character in the secret list and compares it with each character in the guessed character string to check if the character is in both. If the characters are not the same then the function places a - in a temporary list equal to the secret list (so that we can still compare other characters in the guessed list to the original secret list later).
def hangman_guessed(guessed, secret):
    modified = secret
    for i1 in range(len(secret)):
        for i2 in range(len(guessed)):
            if secret[i1] == guessed[i2]:
                modified[i1] = secret[i1]
                break
            else:
                modified[i1] = '-'
    return modified
For example, when I run hangman_guessed('hl', ['h','e','l','l','o']), it should return ['h', '-', 'l', 'l', '-'], but currently it returns ['h', '-', '-', '-', '-'].
The problem here is that only the first character in the guessed string is considered, but I do not know why. In this case, I expect the program to reach the 'l' characters in ['h','e','l','l','o'] and first set the corresponding entries in the temporary list modified to -. To my understanding, when the inner for loop then continues and compares against 'l', it should overwrite the - in modified, so the result should contain the 'l' characters rather than -.

A list-comprehension is perfectly suited to what you want to do. We want to create a list of each character (let this be i) in secret if i is in guessed else we want to have a hyphen ("-").
def hangman_guessed(guessed, secret):
return [i if i in guessed else "-" for i in secret]
and a test to show it works:
>>> hangman_guessed('hl', ['h','e','l','l','o'])
['h', '-', 'l', 'l', '-']
As you get more used to the flow of Python, you will find that comprehensions in general are extremely useful as well as being very readable for a whole variety of things.
If for some reason however, you had to use nested for-loops and weren't allowed to use the really simple in operator, then you need to / can make a couple of corrections to your current code:
make a copy of the secret list first
iterate through the characters in guessed, rather than the indexes
After making these two corrections, the function will look something like:
def hangman_guessed(guessed, secret):
    modified = secret[:]
    for i in range(len(secret)):
        for g in guessed:
            if secret[i] == g:
                modified[i] = secret[i]
                break
            else:
                modified[i] = '-'
    return modified
which now works:
>>> hangman_guessed('hl', ['h','e','l','l','o'])
['h', '-', 'l', 'l', '-']
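Why the copy matters can be seen in a few lines. This is a minimal sketch with made-up variable names: plain assignment gives a second name for the same list, so writing a - through one name also changes what the other name sees, while slicing with [:] creates an independent list.

```python
secret_a = ['h', 'e', 'l', 'l', 'o']
alias = secret_a          # a second name for the same list object
alias[2] = '-'
print(secret_a)           # prints ['h', 'e', '-', 'l', 'o']

secret_b = ['h', 'e', 'l', 'l', 'o']
copied = secret_b[:]      # a new list with the same items
copied[2] = '-'
print(secret_b)           # prints ['h', 'e', 'l', 'l', 'o']
```

In the original function, the 'l' in secret had already been overwritten with - by the time it was compared against the guessed 'l'.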

Recursion, out of memory?

I wrote a function with two parameters. One is an empty string and the other is a string word. My assignment is to use recursion to reverse the word and place it in the empty string. Just as I thought I'd got it, I received an "out of memory" error. I wrote the code so that it takes the word, turns it into a list, flips it backwards, places the first letter in the empty string, and then deletes that letter from the list so the recursion can handle each letter in turn. Then it compares the length of the original word to the length of the empty string (I made a list so they can be compared) so that the recursion ends when they're equal, but I don't know what is going wrong.
def reverseString(prefix, aStr):
    num = 1
    if num > 0:
        #prefix = ""
        aStrlist = list(aStr)
        revaStrlist = list(reversed(aStrlist))
        revaStrlist2 = list(reversed(aStrlist))
        prefixlist = list(prefix)
        prefixlist.append(revaStrlist[0])
        del revaStrlist[0]
        if len(revaStrlist2)!= len(prefixlist):
            aStr = str(revaStrlist)
            return reverseString(prefix,aStr)

When writing something recursive I try to think about two things:
The condition to stop the recursion.
What I want one iteration to do and how I can pass that progress to the next iteration.
I'd also recommend getting one iteration working first and only then worrying about the function calling itself again. Otherwise it can be harder to debug.
Applying this to your logic:
Stop when the length of the output string matches the length of the input string.
In each iteration, add one letter to the new list in reverse order. To maintain progress, pass the list accumulated so far to the next call.
I wanted to modify your code only slightly, as I thought that would help you learn the most, but I was having a hard time with that, so I wrote what I would do with your logic.
Hopefully you can still learn something from this example.
def reverse_string(input_string, output_list=None):
    # start with a fresh list on the first call (a mutable default would be shared between calls)
    if output_list is None:
        output_list = []
    # condition to keep going: while the lengths don't match we still have work to do, otherwise output the result
    if len(output_list) < len(input_string):
        # let's see how much we have done so far.
        # use the length of the current output list to find the character we are on
        # as we are reversing, take the last index of the string minus the number of characters already done
        # the last index is len(input_string) - 1 because indexing starts at zero
        character_index = len(input_string)-1 - len(output_list)
        # then add it to our output list
        output_list.append(input_string[character_index])
        # output_list is our progress so far, pass it to the next iteration
        return reverse_string(input_string, output_list)
    else:
        # combine the output list back into a string when we are all done
        return ''.join(output_list)
if __name__ == '__main__':
    print(reverse_string('hello'))
This is what the recursion will look like for this code
1.
character_index = 5-1 - 0
character_index is set to 4
output_list so far = ['o']
reverse_string('hello', ['o'])
2.
character_index = 5-1 - 1
character_index is set to 3
output_list so far = ['o', 'l']
reverse_string('hello', ['o', 'l'])
3.
character_index = 5-1 - 2
character_index is set to 2
output_list so far = ['o', 'l', 'l']
reverse_string('hello', ['o', 'l', 'l'])
4.
character_index = 5-1 - 3
character_index is set to 1
output_list so far = ['o', 'l', 'l', 'e']
reverse_string('hello', ['o', 'l', 'l', 'e'])
5.
character_index = 5-1 - 4
character_index is set to 0
output_list so far = ['o', 'l', 'l', 'e', 'h']
reverse_string('hello', ['o', 'l', 'l', 'e', 'h'])
6. lengths match just print what we have!
olleh
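If the assignment really requires the two-parameter shape from the question (an initially empty prefix plus the word), one possible sketch looks like this. The function name and the exact parameter meaning are assumptions based on the question's description, not the asker's actual assignment text.

```python
def reverse_with_prefix(prefix, word):
    """Move letters from the front of `word` to the front of `prefix`
    until `word` is empty; `prefix` then holds the reversed word."""
    # Stop condition: nothing left to move.
    if not word:
        return prefix
    # One step: take the first letter of word and put it before the
    # letters collected so far, then recurse on the rest of the word.
    return reverse_with_prefix(word[0] + prefix, word[1:])

print(reverse_with_prefix("", "hello"))  # prints olleh
```

Each call shortens word by one character, so the recursion is guaranteed to reach the stop condition.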

Removing all elements containing (",") from a list

muutujad = list(input("Muutujad (sisesta formaadis A,B,C,...): "))
while "," in muutujad == True:
    muutujad.remove(",")
print (muutujad)
My brain says that this code should remove all the commas from the list, so that in the end the list contains only ["A","B","C" ....], but it still contains all the elements. When I tried to visualize the code online, it said that [ "," in muutujad ] is always False, but when I check the same expression in the console it says it is True. I know it is a simple question, but I would like to understand the basics.

You can use a list comprehension instead of a while loop:
muutujad = [elem for elem in muutujad if elem != ',']
Your while test itself is also wrong. You never need to test for == True in an if or while condition; that is what the condition already does. But in your case you actually test the following:
("," in muutujad) and (muutujad == True)
which is always going to be False. In Python, comparison operators like in and == are chained. Leaving off the == True would make your while loop work much better.
I'm not sure you understand what happens when you call list() on a string though; it'll split it into individual characters:
>>> list('Some,string')
['S', 'o', 'm', 'e', ',', 's', 't', 'r', 'i', 'n', 'g']
If you wanted to split the input into elements separated by a comma, use the .split() method instead, and you won't have to remove the commas at all:
>>> 'Some,string'.split(',')
['Some', 'string']
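The chained comparison described above can be checked directly. The snippet below is a small sketch with an arbitrary sample list; it compares the chained form with the parenthesised form the question intended, and then runs the loop without the == True.

```python
items = list("A,B,C")

# Chained form: evaluated as ("," in items) and (items == True)
chained = "," in items == True
# Parenthesised form: what the question intended
intended = ("," in items) == True
print(chained, intended)   # prints False True

# The loop works once the comparison with True is dropped.
while "," in items:
    items.remove(",")
print(items)               # prints ['A', 'B', 'C']
```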

The best option here is to simply parse the string in a better way:
>>> muutujad = input("Muutujad (sisesta formaadis A,B,C,...): ").split(",")
Muutujad (sisesta formaadis A,B,C,...): A, B, C
>>> muutujad
['A', ' B', ' C']
str.split() is a much better option for what you are trying to do here.

What about list(input("Muutujad (sisesta formaadis A,B,C,...): ").replace(',', ''))
Downvoter: what I meant is that this is how you remove commas from a string.
You do not convert your input from a string to a list and then remove the commas from the list; that's absurd.
You do: list(input('...').replace(',', ''))
or you use split, as pointed out above.

Convert string / character to integer in python

I want to convert a single character of a string into an integer, add 2 to it, and then convert it back to a string. Hence, A becomes C, K becomes M, etc.

This is done through the chr and ord functions. E.g., chr(ord(ch)+2) does what you want. These are fully described in the Python documentation on built-in functions.

This sounds a lot like homework, so I'll give you a couple of pieces and let you fill in the rest.
To access a single character of string s, it's s[x] where x is an integer index. Indices start at 0.
To get the integer value of a character it is ord(c) where c is the character. To cast an integer back to a character it is chr(x). Be careful of letters close to the end of the alphabet!
Edit: if you have trouble coming up with what to do for Y and Z, leave a comment and I'll give a hint.

Normally, just ord, add 2 and chr back (Y and Z will give you unexpected results: "[" and "\").
>>> chr(ord("A")+2)
'C'
If you want to change Y, Z to A, B, you could do like this.
>>> chr((ord("A")-0x41+2)%26+0x41)
'C'
>>> chr((ord("Y")-0x41+2)%26+0x41)
'A'
>>> chr((ord("Z")-0x41+2)%26+0x41)
'B'
Here is A to Z
>>> [chr((i-0x41+2)%26+0x41) for i in range(0x41,0x41+26)]
['C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'A', 'B']
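The wrap-around arithmetic above can be generalised to both cases. This is a sketch only: the function name shift_letter and the decision to leave non-letters untouched are assumptions for illustration.

```python
def shift_letter(ch, offset=2):
    """Shift a single ASCII letter by `offset`, wrapping within its case."""
    if "A" <= ch <= "Z":
        base = ord("A")
    elif "a" <= ch <= "z":
        base = ord("a")
    else:
        return ch  # digits, spaces and punctuation are left unchanged
    # Map to 0..25, add the offset, wrap with % 26, map back to a letter.
    return chr((ord(ch) - base + offset) % 26 + base)

print("".join(shift_letter(c) for c in "Zappa"))  # prints Bcrrc
```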

http://docs.python.org/library/functions.html
ord(c)
Given a string of length one, return an integer representing the Unicode code point of the character when the argument is a unicode object, or the value of the byte when the argument is an 8-bit string. For example, ord('a') returns the integer 97, ord(u'\u2020') returns 8224. This is the inverse of chr() for 8-bit strings and of unichr() for unicode objects. If a unicode argument is given and Python was built with UCS2 Unicode, then the character’s code point must be in the range [0..65535] inclusive; otherwise the string length is two, and a TypeError will be raised.

"ord" is only part of the solution. The puzzle you mentioned there rotates, so that "X"+3 rotates to "A". The most famous of these is rot-13, which rotates 13 characters. Applying rot-13 twice (rotating 26 characters) brings the text back to itself.
The easiest way to handle this is with a translation table.
import string
def rotate(letters, n):
    return letters[n:] + letters[:n]
from_letters = string.ascii_lowercase + string.ascii_uppercase
to_letters = rotate(string.ascii_lowercase, 2) + rotate(string.ascii_uppercase, 2)
# str.maketrans is the Python 3 replacement for string.maketrans
translation_table = str.maketrans(from_letters, to_letters)
message = "g fmnc wms bgblr"
print(message.translate(translation_table))
Not a single ord() or chr() in here. That's because I'm answering a different question than what was asked. ;)

Try ord(), should do the trick :)

For a whole string this would be:
>>> s = "Anne"
>>> ''.join([chr(ord(i)+2) for i in s])
'Cppg'
It's difficult for 'Y', 'Z' ...
>>> s = "Zappa"
>>> ''.join([chr(ord(i)+2) for i in s])
'\\crrc'
Functions: chr, ord

For those who need to perform the operation on each character of the string, another way of handling this is by converting the str object to/from a bytes object which will take advantage of the fact that a bytes object is just a sequence of integers.
import numpy
old_text_str = "abcde" # Original text
old_num_list = list(old_text_str.encode()) # Integer values of the original text
new_num_list = numpy.add(old_num_list, 2).tolist() # Add +2 to the integer values
new_text_str = bytes(new_num_list).decode() # Updated text
print(f"{old_text_str=}")
print(f"{old_num_list=}")
print(f"{new_num_list=}")
print(f"{new_text_str=}")
Output:
old_text_str='abcde'
old_num_list=[97, 98, 99, 100, 101]
new_num_list=[99, 100, 101, 102, 103]
new_text_str='cdefg'
Related topics:
How do I convert a list of ascii values to a string in python?
How can I convert a character to a integer in Python, and viceversa?
How to add an integer to each element in a list?
