Find all floats or ints in a given string - python

Given a string, "Hello4.2this.is random 24 text42", I want to return all ints or floats, [4.2, 24, 42]. All the other questions have solutions that return just 24. I want to return a float even if non-digit characters are next to the number. Since I am new to Python, I am trying to avoid regex or other complicated imports. I have no idea how to start. Please help. Here are some research attempts: Python: Extract numbers from a string, this didn't work since it doesn't recognize 4.2 and 42. There are other questions like the one mentioned, none of which sadly recognize 4.2 and 42.

A regex from perldoc perlretut:
import re
re_float = re.compile(r"""(?x)
^
[+-]?\ * # first, match an optional sign *and space*
( # then match integers or f.p. mantissas:
\d+ # start out with a ...
(
\.\d* # mantissa of the form a.b or a.
)? # ? takes care of integers of the form a
|\.\d+ # mantissa of the form .b
)
([eE][+-]?\d+)? # finally, optionally match an exponent
$""")
m = re_float.match("4.5")
print(m.group(0))
# -> 4.5
To get all numbers from a string:
str = "4.5 foo 123 abc .123"
print(re.findall(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?", str))
# -> ['4.5', ' 123', ' .123']
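The matches above are strings, and some carry a leading space because the pattern allows spaces after the sign. A minimal sketch of turning the matches into actual ints and floats, assuming the optional spaces can be dropped from the pattern (the names NUMBER and find_numbers are made up for this example):

```python
import re

# Same pattern as above, minus the " *" that let matches start with spaces.
NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")

def find_numbers(text):
    """Return every number found in text as an int or a float."""
    tokens = NUMBER.findall(text)
    # A token with a decimal point or an exponent becomes a float, the rest ints.
    return [float(t) if any(ch in t for ch in ".eE") else int(t) for t in tokens]

print(find_numbers("Hello4.2this.is random 24 text42"))  # [4.2, 24, 42]
```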

Using regular expressions is likely to give you the most concise code for this problem. It is hard to beat the conciseness of
re.findall(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?", str)
from pythad's answer.
However, you say "I am trying to avoid regex", so here's a solution that does not use regular expressions. It is obviously a bit longer than a solution using a regular expression (and probably much slower), but it is not complicated.
The code loops through the input character by character.
As it pulls each character from the string, it appends it to current (a string that holds the number currently being parsed) if appending it still maintains a valid number. When it encounters a character that cannot be appended to current, current is saved to a list of numbers, but only if current itself isn't one of '', '.', '-' or '-.'; these are strings that could potentially begin a number but are not themselves valid numbers.
When current is saved, a trailing 'e', 'e-' or 'e+' is removed. That will happen with a string such as '1.23eA'. While parsing that string, current will eventually become '1.23e', but then 'A' is encountered, which means the string does not contain a valid exponential part, so the 'e' is discarded.
After saving current, it is reset. Usually current is reset to '', but when the character that triggered current to be saved was '.' or '-', current is set to that character, because those characters could be the beginning of a new number.
Here's the function extract_numbers(s). The line before return numbers converts the list of strings to a list of integers and floating point values. If you want just the strings, remove that line.
def extract_numbers(s):
    """
    Extract numbers from a string.
    Examples
    --------
    >>> extract_numbers("Hello4.2this.is random 24 text42")
    [4.2, 24, 42]
    >>> extract_numbers("2.3+45-99")
    [2.3, 45, -99]
    >>> extract_numbers("Avogadro's number, 6.022e23, is greater than 1 million.")
    [6.022e+23, 1]
    """
    numbers = []
    current = ''
    for c in s.lower() + '!':
        if (c.isdigit() or
            (c == 'e' and ('e' not in current) and (current not in ['', '.', '-', '-.'])) or
            (c == '.' and ('e' not in current) and ('.' not in current)) or
            (c == '+' and current.endswith('e')) or
            (c == '-' and ((current == '') or current.endswith('e')))):
            current += c
        else:
            if current not in ['', '.', '-', '-.']:
                if current.endswith('e'):
                    current = current[:-1]
                elif current.endswith('e-') or current.endswith('e+'):
                    current = current[:-2]
                numbers.append(current)
            if c == '.' or c == '-':
                current = c
            else:
                current = ''
    # Convert from strings to actual python numbers.
    numbers = [float(t) if ('.' in t or 'e' in t) else int(t) for t in numbers]
    return numbers

If you want to get integers or floats from a string, follow pythad's approach.
If you want to get both integers and floats from a single string, do this:
import re
string = "These are floats: 10.5, 2.8, 0.5; and these are integers: 2, 1000, 1975, 308 !! :D"
numbers = []
for token in string.split():
    # tokens containing a period are searched for floats, the rest for integers
    if "." in token:
        numbers += re.findall(r'\d+\.\d+', token)
    else:
        numbers += re.findall(r'\d+', token)

Related

How to use dictionaries to translate inputs to numbers/letters

I want Python code that translates numbers to specific letters and back (not only alphabetically), and I wonder whether there is a way of doing that using dictionaries, and how to specify the number of digits to translate. For example, the input would be: 1203
So how do I control whether the program translates each digit individually, in pairs, or in more complicated ways?
You can use ASCII codes and the built-in function chr(number):
chr(32)
result:
' '
But chr converts only one code at a time, so for a longer input you could do something like this.
Disclaimer: the code isn't efficient, it's just an example:
number = '1234'
dgs = []       # completed codes, each at most 255
digits = ''    # digits collected for the code currently being built
for digit in number:
    if digits == '' or int(digits + digit) <= 255:
        # the digit still fits, so keep building the current code
        digits += digit
    else:
        # one more digit would exceed 255: store the code, start a new one
        dgs.append(int(digits))
        digits = digit
if digits != '':
    dgs.append(int(digits))
decoded = [chr(x) for x in dgs]
print(dgs, decoded)
output:
[123, 4] ['{', '\x04']
If you want the result as a single string, you can join the characters together using the join method:
string_stdout = ''.join(decoded)
print(string_stdout)
out:
{♦
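Since the question asks about dictionaries specifically, here is a minimal sketch of that approach. It assumes the input is cut into fixed-width groups of digits and each group is looked up in a table; the function name translate, the tables, and the '?' placeholder for unknown groups are all made up for illustration.

```python
def translate(number, table, width=1):
    """Translate a digit string by looking up each group of `width` digits."""
    # Cut the input into consecutive groups of `width` characters.
    chunks = [number[i:i + width] for i in range(0, len(number), width)]
    # Groups missing from the table are shown as '?'.
    return ''.join(table.get(chunk, '?') for chunk in chunks)

single_digits = {'0': 'z', '1': 'a', '2': 'b', '3': 'c'}   # hypothetical table
digit_pairs = {'12': 'L', '03': 'C'}                       # hypothetical table

print(translate('1203', single_digits))          # abzc
print(translate('1203', digit_pairs, width=2))   # LC
```

Changing width (and supplying a table with keys of that length) controls whether digits are translated individually or in pairs. Translating in the opposite direction only needs the inverted table, for example {v: k for k, v in single_digits.items()}.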

Why does my code remove 999 in my replacement code?

I have the code below to replace all punctuation with 999 and all alphabet characters with their position in the alphabet. I have included a print statement that confirms the punctuation is being replaced. However, the rest of my code, which replaces the other characters, seems to override that replacement.
import string
def encode(text):
    punct = '''!()-[]{};:'"\,<>./?##$%^&*_~'''
    for x in text.lower():
        if x in punct:
            text = text.replace(x, ".999")
    print(text)
    nums = [str(ord(x) - 96)
            for x in text.lower()
            if x >= 'a' and x <= 'z'
            ]
    return ".".join(nums)
print(encode(str(input("Enter Text: "))))
Input: 'Morning! \n'
Output: '13.15.18.14.9.14.7 \n'
Expected Output: 13.15.18.14.9.14.7.999
No, you have two independent logical "stories" here. One replaces punctuation with 999. The other filters out all the letters and builds an independent list of their alphabetic positions.
nums = [str(ord(x) - 96)
        for x in text.lower()
        if x >= 'a' and x <= 'z'
        ]
return ".".join(nums)
Note that this does nothing to alter text, and it takes nothing but letters from text. If you want to include the numbers, do so:
nums = [str(ord(x) - 96)
        if x >= 'a' and x <= 'z'
        else x
        for x in text.lower()
        ]
return ".".join(nums)
Output of print(encode("[hello]")):
..9.9.9.8.5.12.12.15...9.9.9
nums = [str(ord(x) - 96)
        for x in text.lower()
        if x >= 'a' and x <= 'z'
        ]
This means: take every character from the lowercase version of the string, and only if it is between 'a' and 'z', convert the value and put the result in nums.
In the first step, you replace a bunch of punctuation with text that includes '.' and '9' characters. But neither '9' nor '.' is between 'a' and 'z', so of course neither is preserved in the second step.
Now that I understand what you are going for: you have fundamentally the wrong approach to splitting up the problem. You are trying to separate the two halves of the rule for "encoding" a given part of the input. What you should do instead is separate the whole rule for encoding a single element from the process of applying that single-element rule to the whole input. After all, that is what list comprehensions do.
This is the concept of separation of concerns. The two business rules are part of the same concern - because implementing one rule doesn't help you implement the other. Being able to encode one input character, though, does help you encode the whole string, because there is a tool for that exact job.
We can have a complicated rule for single characters - no problem. Just put it in a separate function, so that we can give it a meaningful name and keep things simple to understand. Conceptually, our individual-character encoding is a numeric value, so we will consistently encode as a number, and then let the string-encoding process do the conversion.
def encode_char(c):
    if c in '''!()-[]{};:'"\,<>./?##$%^&*_~''':
        return 999
    if 'a' <= c.lower() <= 'z':
        # lower-case before ord() so that 'A' and 'a' both give 1
        return ord(c.lower()) - 96
    # You should think about what to do in other cases!
    # In particular, you don't want digit symbols 1 through 9 to be
    # confused with letters A through I.
    # So I leave the rest up to you, depending on your requirements.
Now we can apply the overall encoding process: we want a string that puts '.' in between the string representations of the values. That's straightforward:
def encode(text):
    return '.'.join(str(encode_char(c)) for c in text)
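For a complete run, the "other cases" need some decision. The sketch below is one possibility, not the only one: it assumes that anything that is neither punctuation nor a letter (spaces, digits, newlines) is simply dropped, and it swaps the hand-written punctuation string for the standard library's string.punctuation, which covers a slightly different set of characters.

```python
import string

def encode_char(c):
    """Return the numeric code for one character, or None to skip it."""
    if c in string.punctuation:
        return 999
    if 'a' <= c.lower() <= 'z':
        return ord(c.lower()) - 96
    # Assumption for this sketch: every other character is dropped.
    return None

def encode(text):
    codes = (encode_char(c) for c in text)
    return '.'.join(str(code) for code in codes if code is not None)

print(encode('Morning! \n'))  # 13.15.18.14.9.14.7.999
```

Because the single-character rule lives in one function, changing the policy for digits or whitespace later only touches encode_char.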

Codecademy Hurricane

I'm a newbie at Codecademy and I'm currently doing the Hurricane Analysis project. The question I'm stuck on requires me to take a string (such as "1.76B") and convert it to a real number.
I've tried parsing the string, but I can't figure out how to convert the B part and apply it to the float.
def convert_damages(damages):
    for damage in damages:
        if damage[-1] == 'B':
            split_damages = damage.split('B')
        elif damage[-1] == 'M':
            split_damages = damage.split('M')
        else:
            return
So if I have a list containing something like 1.76M, 2.35B and 3.11M it should return something like 1760000, 2350000000, 3110000
Another option is to use a dictionary to hold the conversion factors rather than conditionals
def convert_damages(damages):
    " convert damages to a number using following steps "
    # Table (dictionary) of conversion lookups
    # i.e. conversion_factors['B'] = 10^9, conversion_factors['K'] = 10^3, etc.
    conversion_factors = {'B': 1E9,'K': 1E3, 'M': 1E6}
    # Get the numeric part of string
    num_string = damages[:-1]  # number part is everything but last character
    # get the factor part of string (last character)
    factor = damages[-1]  # last character is conversion
    new_number = float(num_string) * conversion_factors[factor]
    return new_number
Test
for num in ['1.76B', '1.76K', '1.76M']:
    print(num, ' => ', convert_damages(num))
Output
1.76B => 1760000000.0
1.76K => 1760.0
1.76M => 1760000.0
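The function above converts one string at a time, while the question starts from a list. A minimal sketch of applying the same dictionary lookup to every element, assuming each entry ends in one of the known suffixes (the names convert_damage_string and convert_damage_list are made up for this example):

```python
CONVERSION_FACTORS = {'K': 1e3, 'M': 1e6, 'B': 1e9}

def convert_damage_string(damage):
    """Convert one string such as '1.76B' to a float."""
    # Everything but the last character is the number, the last is the factor.
    return float(damage[:-1]) * CONVERSION_FACTORS[damage[-1]]

def convert_damage_list(damages):
    """Apply the single-string conversion to every entry of a list."""
    return [convert_damage_string(damage) for damage in damages]

print(convert_damage_list(['1.76M', '2.35B', '3.11M']))
```

An entry with an unknown suffix raises KeyError here; whether to skip such entries or pass them through unchanged depends on the data.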
Try the following changes:
Make sure that every case returns something - I've added a return at the end.
Note that split returns a list; If you want the first element, take it with [0].
split returns a list of strings, but you can convert your string to a float by calling float(the_string).
Finally, don't forget to multiply by 10 to the relevant power (M is 6 and B is 9 where I come from; change yours appropriately).
The function is as follows:
def convert_damages(damages):
    converted = []  # collect one number per input string
    for damage in damages:
        if damage[-1] == 'B':
            converted.append(float(damage.split('B')[0]) * (10 ** 9))
        elif damage[-1] == 'M':
            converted.append(float(damage.split('M')[0]) * (10 ** 6))
        else:
            return
    return converted
Good luck!
You can use regex to help you with this. The first step is creating a pattern that matches your expected string. You should match one or more digits, followed by 0 or 1 period, and then 1 or more digits. Finally, you want some factor at the end of your string. Since you only mentioned M and B, that's all I've included here. The regex pattern used would be ^(\d+(\.?\d+)?)([MB])$.
Explanation of the pattern:
^ Matches the start of the string
(\d+(\.?\d+)?) Matches one or more digits followed by (optionally) a decimal and one or more digits.
([MB]) Matches M or B literally.
$ Matches end of the string
Because of how our pattern is set up, match.group(0) will be the entire match (this is always true). match.group(1) will be the full number. match.group(2) will be either empty or the decimal point and additional digits. Note that group 2 is inside group 1 (look at the parentheses), so we don't need to access group 2 at all. The entire number is in group 1. match.group(3) will be the letter at the end of the string, if it is M or B.
Finally, we need some kind of map between the alphabetic factor (M or B) and the numeric factor (1 million or 1 billion). A dictionary is perfect for this. We multiply the number found in match.group(1) with the numerical factor, and that is our result.
import re
def convert_damage(damage):
    damage_pattern = re.compile(r"^(\d+(\.?\d+)?)([MB])$")
    match = damage_pattern.match(damage)
    if match is None:
        print(f"The provided damage '{damage}' is not in the appropriate format.")
        return
    number = float(match.group(1))  # This is the number
    factor = match.group(3)  # This is the factor string (M or B)
    factor_map = {"M": 1e6, "B": 1e9}
    actual_damage = number * factor_map[factor]
    return actual_damage
def convert_damages(damages):
    output = []
    for damage in damages:
        output.append(convert_damage(damage))
    return output
### Converting single damages
d1 = "1.76M"
print(f"{d1} -> {convert_damage(d1)}")
d2 = "3.52B"
print(f"{d2} -> {convert_damage(d2)}")
### Converting a list of damages
tests = [d1, d2, "6B", "1..5M", "0.2B"]
results = convert_damages(tests)
for damage_str, damage_num in zip(tests, results):
    print(f"{damage_str} -> {damage_num}")
Output:
1.76M -> 1760000.0
3.52B -> 3520000000.0
The provided damage '1..5M' is not in the appropriate format.
1.76M -> 1760000.0
3.52B -> 3520000000.0
6B -> 6000000000.0
1..5M -> None
0.2B -> 200000000.0

Different results when returning multiple values in Python (Cryptopals challenges)

I'm working on problem 3(set 1) of the cryptopals challenges (https://cryptopals.com/sets/1/challenges/3)
I've already found the key ('x') and decrypted the message ('Cooking mcs like a pound of bacon')
Here is my code:
from hexToBase64 import hexToBinary
from fixedXOR import xorBuffers
def binaryToChar(binaryString):
    asciiValue = 0
    for i in range(int(len(binaryString))-1,-1,-1):
        if(binaryString[i] == '1'):
            asciiValue = asciiValue + 2**(7-i)
    return chr(asciiValue)
def decimalToBinary(number):
    binaryString = ""
    while (number != 0):
        bit = number % 2
        binaryString = str(bit) + binaryString
        number = int(number/2)
    while(len(binaryString) < 8):
        binaryString = "0" + binaryString
    return binaryString
def breakSingleByteXOR(cipherString):
    decryptedMess = ""
    lowestError = 10000
    realKey = ""
    for i in range(0,128):
        errorChar = 0
        tempKey = decimalToBinary(i)
        tempMess = ""
        for j in range(0,len(cipherString),2):
            #Take each byte of the cipherString
            cipherChar = hexToBinary(cipherString[j:j+2])
            decryptedChar = binaryToChar(xorBuffers(cipherChar,tempKey))
            asciiValue = ord(decryptedChar)
            if (not ((asciiValue >= 65) and (asciiValue <= 90)) \
                or ((asciiValue >= 90) and (asciiValue <= 122)) \
                or ( asciiValue == 32 )):
                # if the character is not one of the characters ("A-Z" or "a-z"
                # or " ") consider it as an "error"
                errorChar += 1
            tempMess = tempMess + decryptedChar
        if(errorChar < lowestError):
            lowestError = errorChar
            decryptedMess = tempMess
            realKey = chr(i)
    return (realKey,decryptedMess)
if __name__ == "__main__":
    print(breakSingleByteXOR("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"))
The problem is that when I use the function breakSingleByteXOR to return one value (decryptedMess), it comes out okay: "cOOKING mcS LIKE A POUND OF BACON"
But when I return 2 values from the function (as in the code above: (key, decryptedMess)), I get a weird result: ('x', 'cOOKING\x00mc\x07S\x00LIKE\x00A\x00POUND\x00OF\x00BACON'). Can anybody explain to me why this is the case?
To be honest, I'm learning Python as I'm doing the challenges, so hopefully I don't trigger anyone with this code... I'd also really appreciate it if anyone could give me some advice on writing good Python code.
Thanks guys :D
It's true that the reason for the difference in the printed string is a quirk of the print function.
The deeper problem with that program is that it's not producing the correct answer. That's because the big ugly if that tries to decide whether a decrypted character is in the acceptable range is incorrect.
It's incorrect in two ways. The first is that (asciiValue >= 90) should be (asciiValue >= 97). A better way to write all of those expressions, which would have avoided this error, is to express them as (asciiValue >= ord('a')) and (asciiValue == ord(' ')) and so on, avoiding the inscrutable numbers.
The second way is that the expressions are not properly grouped. As they stand they do this:
character is not in the range 'A' to 'Z',
or character is in the range 'a' to 'z',
or character is 'space',
then count this as an error
so some of the characters that should be good (specifically 'a' through 'z' and space) are counted as bad. To fix, you need to rework the parentheses so that the condition is:
character is not in the range 'A' to 'Z',
and character is not in the range 'a' to 'z',
and character is not space,
then count this as an error
or (this is the style you were trying for)
character is not (in the range 'A' to 'Z'
or in the range 'a' to 'z'
or a space)
I'm not going to give you the exact drop-in expression to fix the program, it'll be better for you to work it out for yourself. (A good way to deal with this kind of complexity is to move it into a separate function that returns True or False. That makes it easy to test that your implementation is correct, just by calling the function with different characters and seeing that the result is what you wanted.)
When you get the correct expression, you'll find that the program discovers a different "best key" and the decrypted string for that key contains no goofy out-of-range characters that behave strangely with print.
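One possible shape for the True/False helper suggested above, as a sketch rather than a drop-in fix for the program (the names is_expected_char and count_errors are made up for this example). It shows how such a helper can be checked by calling it on known strings:

```python
def is_expected_char(ch):
    """Return True if ch is an ASCII letter or a space."""
    return ('A' <= ch <= 'Z') or ('a' <= ch <= 'z') or ch == ' '

def count_errors(message):
    """Count the characters of message that are not letters or spaces."""
    return sum(1 for ch in message if not is_expected_char(ch))

# The apostrophe is the only unexpected character in the plain text.
print(count_errors("Cooking MC's like a pound of bacon"))  # 1
# The string from the question contains seven control characters.
print(count_errors('cOOKING\x00mc\x07S\x00LIKE\x00A\x00POUND\x00OF\x00BACON'))  # 7
```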
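The print function is the culprit: when it is given a string, it writes the characters \x00 and \x07 to the terminal as raw control characters rather than as escape sequences. Specifically, this only occurs when passing a string directly to the print function; a container such as your tuple is printed using the repr() of each element, which shows the escapes.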
This is an example:
>>> s = 'This\x00string\x00is\x00an\x00\x07Example.'
>>> s
'This\x00string\x00is\x00an\x00\x07Example.'
>>> print(s)
This string is an Example.
If you were to put the string s inside a container (tuple, set, or list), print shows the repr() of s, with the escape sequences intact:
>>> s_list = [s]
>>> print(s_list) # List
['This\x00string\x00is\x00an\x00\x07Example.']
>>> print(set(s_list)) # Set
{'This\x00string\x00is\x00an\x00\x07Example.'}
>>> print(tuple(s_list)) # Tuple
('This\x00string\x00is\x00an\x00\x07Example.',)
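The same difference can be checked without relying on how a terminal renders control characters. A small sketch comparing str() and repr() directly (the variable names are made up for this example):

```python
message = 'cOOKING\x00mc\x07S'
pair = ('x', message)

# str() of a string is the string itself, control characters included.
print(str(message) == message)                       # True
# repr() escapes non-printable characters.
print(repr(message))                                 # 'cOOKING\x00mc\x07S'
# The text printed for a tuple is built from the repr() of its elements.
print(str(pair) == "('x', " + repr(message) + ")")   # True
```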
Edit
Because the \x00 and \x07 bytes are ASCII control characters (\x00 being NUL and \x07 being BEL), they have no printable representation. One way to strip these characters from the string itself is the .replace() method; but since the \x00 bytes are being shown as spaces by the terminal, you would have to use s.replace('\x00', ' ') to get the same output, which changes the true content of the string.
Otherwise, when building the string, you could implement some logic to check for ASCII control characters and either not add them to tempMess or add a different character, such as a space, instead.
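A minimal sketch of that kind of check, using the built-in str.isprintable() to detect the control characters (the function name replace_control_chars and the default replacement are assumptions for this example):

```python
def replace_control_chars(text, replacement=' '):
    """Return text with every non-printable character swapped for replacement."""
    return ''.join(ch if ch.isprintable() else replacement for ch in text)

raw = 'cOOKING\x00mc\x07S\x00LIKE'
print(replace_control_chars(raw))        # cOOKING mc S LIKE
print(replace_control_chars(raw, ''))    # cOOKINGmcSLIKE
```

As noted above, this changes the content of the string, so it is only suitable for display.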
References
ASCII Wiki: https://en.wikipedia.org/wiki/ASCII
Curses Module: https://docs.python.org/3.7/library/curses.ascii.html?highlight=ascii#module-curses.ascii (Might be useful if you wish to implement any logic).

Recursive function to convert characters

I am trying to write a program in Python which uses a recursive function to convert all the lower-case characters in a string to the next character. Here's my attempt:
def convert(s):
    if len(s) < 1:
        return ""
    else:
        return convert(chr(ord(s[0+1])))
print(convert("hello"))
When I try to run this program, it gives me the error: string index out of range. Could anyone please help me correct this? I'm not even sure if my program is coded correctly to give the required output :/
You want to return the shifted character and then call your convert function on the remainder of the string. If you must use recursion, you need to check if the string is exhausted (if not s is the same as if len(s) == 0 here because '' is equivalent to False) and bail:
def convert(s):
    if not s:
        return ''
    c = s[0]
    i = ord(c)
    if 96 < i < 123:
        # for lower-case characters permute a->b, b->c, ... y->z, z->a
        c = chr(((i-97)+1)%26 + 97)
    return c + convert(s[1:])
print(convert('hello'))
print(convert('abcdefghijklmnopqrstuvwxyz'))
Output:
ifmmp
bcdefghijklmnopqrstuvwxyza
The ASCII codes for 'a' and 'z' are 97 and 122 respectively, so we only apply the shift to characters whose codes, i, are in this range. Don't forget to wrap if the character is z: you can do this with modular arithmetic: ((i-97)+1)%26 + 97.
EDIT explanation: Subtract 97 so that the code becomes 0 to 25, then add 1 mod 26 such that 0+1 = 1, 1+1 = 2, ..., 24+1 = 25, 25+1=0. Then add back on 97 so that the code represents a letter between a and z. This way your letters will cycle round
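The wrap-around arithmetic described above can be tried on its own. A small sketch for a single lower-case letter (the name shift_lowercase is made up for this example, and ord('a') stands in for the literal 97):

```python
def shift_lowercase(c):
    """Return the letter after c, wrapping from 'z' back to 'a'."""
    offset = ord(c) - ord('a')        # 0 for 'a' ... 25 for 'z'
    shifted = (offset + 1) % 26       # 25 + 1 wraps around to 0
    return chr(shifted + ord('a'))

for c in 'ayz':
    print(c, '->', shift_lowercase(c))
# a -> b
# y -> z
# z -> a
```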
You are trying to index the second character each time; Python indexes start at 0 so 0+1 is 1 is the second character. Your len() test doesn't guard against that, it only tests for empty strings.
You also pass in just one character to the recursive call, so you always end up with a string of length 1, which doesn't have a second character.
So your test with 'hello' does this:
convert('hello')
    len('hello') < 1 -> False
    s[0+1] == s[1] == 'e'; chr(ord('e')) is 'e'
    return convert('e')
        len('e') < 1 -> False
        s[0+1] == s[1] -> 'e'[1] raises an index error
If you wanted to use recursion, then you need to decide how to detect the end of the recursion path correctly. You could test for strings shorter than 2 characters, for example, as there is no next character to use in that case.
You also need to decide what to delegate to the recursive call. For a conversion like this, you could pass in the remainder of the string.
Last but not least, you need to test if the character you are going to replace is actually lowercase.
