Python: Extract multiple float numbers from string

Forgive me, I'm new to Python.
Given a string that starts with a float of indeterminate length and ends with another, how can I extract both of them into a list, or just the one if there is only a single float?
Example:
"38.00,SALE ,15.20"
"69.99"
I'd like to return:
[38.00, 15.20]
[69.99]

You could also use a regex to do this:
import re
s = "38.00,SALE ,15.20"
p = re.compile(r'\d+\.\d+')  # Compile a pattern that matches digits, a dot, then digits
floats = [float(i) for i in p.findall(s)]  # Convert the matched strings to float
print(floats)
Output:
[38.0, 15.2]
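The pattern above only matches numbers written as digits, a dot, and more digits, so it would skip values such as 15, -3.5 or .25. As a rough sketch (the broader pattern and the names FLOAT_PATTERN and find_floats are my own assumptions, not part of the answer), the regex could be widened to accept an optional sign and an optional fractional part:

```python
import re

# Hypothetical, broader pattern: optional sign, then "12.34", "12.", ".34" or "12".
FLOAT_PATTERN = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+|\d+)')


def find_floats(text):
    """Return every number-like token found in text as a float."""
    return [float(token) for token in FLOAT_PATTERN.findall(text)]


print(find_floats("38.00,SALE ,15.20"))
print(find_floats("-3.5, 7, .25, 34."))
```

Because the pattern matches digits anywhere in the string, a token such as SALE2 would still contribute 2.0. Splitting on commas first avoids that.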

def extract_nums(text):
    for item in text.split(','):
        try:
            yield float(item)
        except ValueError:
            pass  # Skip fields that are not numbers, such as "SALE "
print(list(extract_nums("38.00,SALE ,15.20")))
print(list(extract_nums("69.99")))
Output:
[38.0, 15.2]
[69.99]
However, converting to float can lose precision: trailing zeros are dropped and most decimal fractions cannot be represented exactly in binary. If you want to keep the values exactly as written, you can use decimal:
import decimal
def extract_nums(text):
    for item in text.split(','):
        try:
            yield decimal.Decimal(item)
        except decimal.InvalidOperation:
            pass  # Skip fields that are not numbers
print(list(extract_nums("38.00,SALE ,15.20")))
print(list(extract_nums("69.99")))
Output:
[Decimal('38.00'), Decimal('15.20')]
[Decimal('69.99')]
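To make the precision point concrete, here is a small sketch (the sample values are chosen by me for illustration) comparing float and Decimal arithmetic on the same text input:

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum is slightly off.
float_sum = float("0.1") + float("0.2")
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum == 0.3)
print(decimal_sum == Decimal("0.3"))

# Decimal also remembers the trailing zeros from the original text.
print(str(float("38.00")), str(Decimal("38.00")))
```

For prices, Decimal is usually the safer choice, at the cost of slower arithmetic.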

You said you're only interested in floats at the start and end of the string, so assuming it's comma-delimited:
items = the_string.split(',')
first = second = None  # Stay None if the corresponding field is not a float
try:
    first = float(items[0])
except (ValueError, IndexError):
    pass
try:
    second = float(items[-1])
except (ValueError, IndexError):
    pass
We wrap each conversion in an exception handler because the value might not be a valid float (ValueError) or the index might not exist in the list (IndexError).
This handles the cases where one or both of the floats are omitted. Note that if the string contains a single field, items[0] and items[-1] are the same element, so first and second will be equal.
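If the goal is the list shape from the question ([38.0, 15.2] or [69.99]), the same idea can be wrapped in a function. This is only a sketch under the comma-delimited assumption; the name first_and_last_floats is hypothetical:

```python
def first_and_last_floats(text):
    """Return the floats found in the first and last comma-separated fields."""
    items = text.split(',')
    # With a single field, first and last are the same element, so check it once.
    candidates = items if len(items) == 1 else [items[0], items[-1]]
    result = []
    for candidate in candidates:
        try:
            result.append(float(candidate))
        except ValueError:
            pass  # Field is not a float, leave it out
    return result


print(first_and_last_floats("38.00,SALE ,15.20"))
print(first_and_last_floats("69.99"))
```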

You can try something like:
the_string = "38.00,SALE ,15.20"
floats = []
for possible_float in the_string.split(','):
    try:
        floats.append(float(possible_float.strip()))
    except ValueError:
        pass  # Not a number, skip it
print(floats)

Try a list comprehension:
just_floats = [i for i in your_list.split(',') if i.count('.') == 1]
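First you split the string at the commas, then you filter out the pieces that don't contain exactly one decimal point. Note that the results are still strings (pass each one to float() to convert it), and a non-numeric piece that happens to contain a single '.' would also get through.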

import re
list(map(float, filter(lambda x: re.match(r"\s*\d+.?\d+\s*", x), input.split(","))))
Input: input = '38.00,SALE ,15.20' Output: [38.0, 15.2]
Input: input = '38.00,SALE ,15.20, 15, 34.' Output: [38.0, 15.2, 15.0, 34.0]
Explanation:
The idea is to split the string first: list_to_filter = input.split(",") splits on the commas.
Then use the regex to keep only the strings that look like real numbers: filtered_list = filter(<lambda>, list_to_filter). An item is included in the output of filter if the lambda expression is true for it, so when re.match(r"\s*\d+.?\d+\s*", x) matches the string x, filter keeps it.
Finally, convert to float with map(float, filtered_list), which applies the float() function to each element of the list. In Python 3, map returns an iterator, hence the outer list().

Split the input, check whether each element is numeric once the periods are removed, and convert it to float if it is.
def to_float(text):
    return [float(x) for x in text.split(",") if x.replace(".", "").isdecimal()]

Related

Error while working with construction try-except ValueError

result = []
try:
    for i in range(len(ass)):
        int(df['sku'][i])
except ValueError:
    result.append(df['sku'][i])
I need to collect all the errors in a list. The code above only adds the first error, but I need all of them.
After iterating over all sku values, only those that cannot be converted to int should be in the list.
You can move the try...except inside the loop:
result = []
for i in range(len(ass)):
    try:
        int(df['sku'][i])
    except ValueError:
        result.append(df['sku'][i])
You can also use isdigit() with a list comprehension as follows:
result = [val for val in df['sku'] if not val.isdigit()]
However, note that isdigit() does not agree with int() in some cases, e.g. strings with leading signs.
For example, '+1' converts to an integer fine with int() but isdigit() returns False for it. Similarly, '-1' converts fine with int() but isdigit() returns False.
Further information can be found in the documentation:
str.isdigit()
Return true if all characters in the string are digits and there is at least one character, false otherwise.
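The difference between the two checks can be seen side by side. This is a small sketch with made-up sample strings; is_int_string is a hypothetical helper that simply wraps the try/except approach:

```python
def is_int_string(text):
    """Return True if int() accepts the string, False otherwise."""
    try:
        int(text)
        return True
    except ValueError:
        return False


samples = ['42', '+1', '-1', '4.2', 'abc']
# Map each sample to (isdigit() result, int() acceptance).
comparison = {s: (s.isdigit(), is_int_string(s)) for s in samples}

for sample, (digit_check, int_check) in comparison.items():
    print(sample, digit_check, int_check)
```

The two checks disagree only on the signed values here, which is why the try/except version is the safer way to collect SKUs that really fail conversion.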
You'd want the try-except in the loop:
result = []
for i in range(len(ass)):
    try:
        int(df['sku'][i])
    except ValueError:
        result.append(df['sku'][i])
But if it's really a list of non-digit SKUs you want,
result = [sku for sku in df['sku'] if not sku.isdigit()]
This should work:
result = []
for i in range(len(ass)):
    try:
        int(df['sku'][i])
    except ValueError:
        result.append(df['sku'][i])

pd.to_numeric could not convert string to float

def openfiles():
    file1 = tkinter.filedialog.askopenfilename(filetypes=(("Text Files",".csv"),("All files","*")))
    read_text=pd.read_csv(file1)
    displayed_file.insert(tk.END,read_text)
    read_text['OPCODE'] = pd.to_numeric(read_text['OPCODE'],errors = 'coerce').fillna(0.0)
    read_text['ADDRESS'] = pd.to_numeric(read_text['ADDRESS'],errors = 'coerce').fillna(0.0)
    classtype1=np.argmax(model.predict(read_text), axis=-1)
    tab2_display_text.insert(tk.END,read_text)
When I run this code, it fails with "could not convert string to float".
Link to the CSV file that is used as the dataframe: https://github.com/Yasir1515/Learning/blob/main/Book2%20-%20Copy.csv
Link to the complete code (the problematic code is at lines 118-119): https://github.com/Yasir1515/Learning/blob/main/PythonApplication1.py
In your data, ADDRESS is a hexadecimal number and OPCODE is a list of hexadecimal numbers. I don't know why you would want to convert hex numbers to float; you should convert them to integers.
The to_numeric method is not suited to converting a hex string to an integer, or to handling a list of hex numbers. You need to write helper functions:
def hex2int(x):
    try:
        return int(x, 16)
    except (TypeError, ValueError):
        return 0  # Fall back to 0 for missing or malformed values
def hex_list2int_list(zz):
    return [hex2int(el) for el in zz.split()]
Now replace relevant lines:
read_text['OPCODE'] = read_text['OPCODE'].apply(hex_list2int_list)
read_text['ADDRESS'] = read_text['ADDRESS'].apply(hex2int)
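To see why a float conversion is a poor fit for hex strings, here is a small standard-library sketch. The sample values are made up, not taken from the CSV, and parse_both_ways is a hypothetical helper:

```python
def parse_both_ways(value):
    """Return (float_result, hex_result); None marks a failed conversion."""
    try:
        as_float = float(value)
    except ValueError:
        as_float = None
    try:
        as_hex = int(value, 16)
    except ValueError:
        as_hex = None
    return as_float, as_hex


for sample in ["FF", "88", "1A"]:
    print(sample, parse_both_ways(sample))
```

Values containing letters fail the float conversion outright, while a value such as "88" converts without error but to 88.0 rather than the intended 136, which is arguably the worse outcome because it goes unnoticed.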
I looked at your CSV file. The OPCODE column contains a long string of numbers separated by spaces (' '), so you cannot cast that value to a numeric type (the string '88 99 77 66' is not a number). One option is to split those values so that each one gets its own row, apply to_numeric, do your manipulation, and then convert back to the original form afterwards.
What I suggest is:
read_text = pd.read_csv(file1)
new_df = pd.concat([pd.Series(row['ADDRESS'], row['OPCODE'].split(' '))
                    for _, row in read_text.iterrows()]).reset_index()
new_df.columns = ['OPCODE', 'ADDRESS']  # reset_index() names them 'index' and 0 by default
new_df['OPCODE'] = pd.to_numeric(new_df['OPCODE'], errors='coerce').fillna(0.0)

Remove punctuations from list and convert string value to float in python

I want to remove the dollar signs and commas from the column and cast the values to float.
This is what I have so far. It didn't work; nothing actually changed.
The data look like ["$200,00","$1,000.00"..."$50.00"]
import pandas as pd
import string
y_train = train.iloc[:,-1]
needtoclean=y_train.to_list()#''.join(y_train.to_list())
to_delete = set(string.punctuation) - {'$',','}
clean = [x for x in needtoclean if x not in to_delete]
list_ = ['$58.00', '$60.00']  # Your list
new_list = []  # Initialise the new list
for elem in list_:  # Iterate over the original list's elements
    elem = elem.replace("$", '')  # Remove the `$` sign
    new_list.append(float(elem))  # Append the value converted to float
Try this. Next time, please post your code.
Iterate over the list by index so that you can modify the values in place:
1). Remove $
2). Cast to float
for i in range(len(your_list)):
    your_list[i] = float(your_list[i].replace("$", ""))
It can easily be solved with a list comprehension.
unclean = ['$58.00', '$125.00'] # your data
clean = [float(value[1:]) for value in unclean if value.startswith('$')]
# you can remove "if value.startswith('$')" if you are sure
# that all values start with $
If you want it as a function:
unclean = ['$58.00', '$125.00']
def to_clean_float(unclean):
    return [float(value[1:]) for value in unclean if value.startswith('$')]
print(to_clean_float(unclean)) # Gives: [58.0, 125.0]
If you don't need a fully built list but want to keep processing the data, you could return a generator expression instead.
For a huge list, this can save a lot of memory.
unclean = ['$58.00', '$125.00']
def to_clean_float(unclean):
    return (float(value[1:]) for value in unclean if value.startswith('$'))
clean_generator = to_clean_float(unclean)
print(list(clean_generator)) # Gives: [58.0, 125.0]
If the dollar sign is always in the same place in those strings, this should do the job.
I assume that you are using a pandas DataFrame.
df["needtoclean"] = df["needtoclean"].apply(lambda x: float(x[1:].replace(",", "")))
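Most of the answers above only strip the dollar sign, while the question also mentions commas. A plain-Python sketch that handles both, assuming the comma is a thousands separator (currency_to_float and the sample prices are hypothetical):

```python
def currency_to_float(text):
    """Strip a dollar sign and thousands separators, then convert to float."""
    return float(text.replace("$", "").replace(",", ""))


prices = ["$1,000.00", "$50.00", "$58.00"]
cleaned = [currency_to_float(price) for price in prices]
print(cleaned)
```

Under that assumption a value such as "$200,00" from the question would be read as 20000.0, so if the comma is meant as a decimal mark there, the data needs a different rule.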

Converting str items in a dictionary to int and converting datetime.datetime variable to int

I now have a for loop that looks like this:
for i in my_dict[hostname]:
    try:
        if i == '':
    except ValueError:
        pass
    i = int(i)
    print(type(i))
It is giving me a syntax error and I'm unsure where or why.
Not sure I really understand your purpose, but converting a string that holds an int is straightforward in Python:
>>> s = '123'
>>> int(s)
123
To convert a datetime to an int, you can convert it to a timestamp and then to an int:
timestamp = datetime.timestamp(d)
The timestamp is already a number that you can do operations on:
>>> d.timestamp()
1562855175.285529
>>> d.timestamp() - 1
1562855174.285529
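Since timestamp() returns a float, the final step to an int still has to be made explicitly. A minimal sketch, using a made-up UTC datetime so the result does not depend on the local timezone:

```python
from datetime import datetime, timezone

# Hypothetical example value; a timezone-aware datetime avoids local-time surprises.
d = datetime(2020, 1, 1, 0, 0, 0, 500000, tzinfo=timezone.utc)

seconds = int(d.timestamp())               # drops the fractional part
milliseconds = int(d.timestamp() * 1000)   # keeps millisecond resolution

print(seconds, milliseconds)
```

int() truncates rather than rounds, so multiply first if sub-second resolution matters.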
Like @EuclidianHike said, you can easily convert a str to an int with int().
Make a second list containing the values of 'scores' in integer form by doing:
int_scores = []
for i in my_dict['scores']:
    int_s = int(i)
    int_scores.append(int_s)
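As for the syntax error in the question: the `if i == '':` line has no body, so the parser hits `except` where it expects an indented statement. A sketch of what the loop may have been meant to do, assuming my_dict maps a hostname to a list of strings (the data below is hypothetical):

```python
# Hypothetical data: a hostname mapped to a list of strings.
my_dict = {"host1": ["1", "", "23", "x7"]}
hostname = "host1"

converted = []
for item in my_dict[hostname]:
    if item == "":
        continue  # Skip empty strings explicitly
    try:
        converted.append(int(item))
    except ValueError:
        pass  # Skip values that are not integers, such as "x7"

print(converted)
```

Collecting the results in a new list matters because assigning to the loop variable, as in `i = int(i)`, does not change the list stored in the dictionary.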

Django - Field validation to ensure the data is a list of tuples of float

I have a CharField where the user must enter a list of tuples of floats (without the brackets), like: (0,1),(0.43,54),(24.2,4)
What would be the way to ensure, first, that the input is a list of tuples and, second, that the tuples are made of floats only?
What I have tried so far:
def clean_dash_array(self):
    data = self.cleaned_data['dash_array']
    try:
        data_list = eval("[%s]" % data) #transform string into list
        for t in data_list:
            if type(t) != tuple:
                raise forms.ValidationError("If: You must enter tuple(s) of float delimited with comma - Ex: (1,1),(2,2)")
    except:
        raise forms.ValidationError("Except: You must enter tuple(s) of float delimited with comma - Ex: (1,1),(2,2)")
    return data
This is not complete because it can't validate that the tuples contain only floats.
Edit:
def clean_dash_array(self):
    data = self.cleaned_data['dash_array']
    try:
        data_cleaned = [tuple(float(i) for i in el.strip('()').split(',')) for el in data.split('),(')]
    except:
        raise forms.ValidationError("Except: You must enter tuple(s) of int or float delimited with comma - Ex: (1,1),(2,2)")
    return data
This clean method seems to work and does not use eval(), as suggested by Iain Shelvington.
Do you think this will validate the data for any kind of erroneous input?
If I understand you correctly, this should do it:
def clean_dash_array(self):
    data = self.cleaned_data['dash_array']
    for tuple_array in data:
        if type(tuple_array) == tuple:
            for tuple_data in tuple_array:
                if type(tuple_data) == float:
                    pass  # Do something with this
                else:
                    return "Error: Not a float"
        else:
            return "Error: Not a tuple."
This solution is working:
def clean_dash_array(self):
    data = self.cleaned_data['dash_array']
    try:
        data_cleaned = [tuple(float(i) for i in el.strip('()').split(',')) for el in data.split('),(')]
    except:
        raise forms.ValidationError("You must enter tuple(s) of int or float delimited with commas - Ex: (1,1),(2,2)")
    return data
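On the question of whether the split-based check catches every bad input: it does not, since strings such as "1,2" or "((1,2))" still pass because strip('()') removes any number of parentheses. One stricter alternative, sketched here outside Django with only the standard library, is ast.literal_eval, which parses literals without executing code. The function name parse_float_tuples is hypothetical, and in a form's clean method the ValueError would be turned into a forms.ValidationError:

```python
import ast


def parse_float_tuples(text):
    """Parse '(0,1),(0.43,54)' into a list of float tuples or raise ValueError."""
    try:
        parsed = ast.literal_eval("[%s]" % text)
    except (ValueError, SyntaxError):
        raise ValueError("input is not a comma-separated list of tuples") from None
    result = []
    for item in parsed:
        if not isinstance(item, tuple):
            raise ValueError("every element must be a tuple")
        for value in item:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError("tuples may only contain numbers")
        result.append(tuple(float(value) for value in item))
    return result


print(parse_float_tuples("(0,1),(0.43,54),(24.2,4)"))
```

This sketch accepts ints as well as floats and converts them, matching the "int or float" wording of the error message above.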
