Converting string values to float and removing strings from list - python

I have a list that looks like this
lst = ['a','b','43.23','c','9','22']
I would like to remove the elements that cannot be represented as floats and hence I am doing the following (Attempt 1):
for i,j in enumerate(lst):
    try:
        lst[i]=float(j)
    except:
        lst.remove(j)
Which leaves the list looking like this
lst = ['b', 43.23, '9', 22.0]
whereas what I need is this
lst = [43.23, 9.0 , 22.0]
And so I'm doing the following:
for i,j in enumerate(lst):
    try:
        lst[i]=float(j)
    except:
        pass
lst = [i for i in lst if type(i) != str]
Is there a cleaner way to do this?
EDIT: Changed the name of example list from 'list' to 'lst' based on the recommendations below.

You can use the following function from this stackoverflow post:
def isfloat(value):
    try:
        float(value)
        return True
    except ValueError:
        return False
And, then use it in a list comprehension:
>>> l = ['a','b','43.23','c','9','22']
>>> [float(x) for x in l if isfloat(x)]
# [43.23, 9.0, 22.0]
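The comprehension above parses each matching string twice (once inside isfloat, once in float). A minimal sketch that parses each element only once, assuming Python 3.8+ for the := operator; the helper name to_float_or_none is hypothetical:

```python
def to_float_or_none(value):
    """Hypothetical helper: return float(value), or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


lst = ['a', 'b', '43.23', 'c', '9', '22']
# The walrus operator binds the parsed value so float() runs once per element;
# comparing with None (instead of truthiness) keeps values such as 0.0.
floats = [f for x in lst if (f := to_float_or_none(x)) is not None]
print(floats)  # [43.23, 9.0, 22.0]
```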

First, you shouldn't name your variable list; it would shadow the built-in list class. You can use a simple function to do this:
>>> lst = ['a','b','43.23','c','9','22']
>>> def is_float(el):
...     try:
...         float(el)
...         # return a bool: returning float(el) would make '0' (0.0) look falsy
...         return True
...     except ValueError:
...         return False
...
>>> [i for i in lst if is_float(i)]
['43.23', '9', '22']
>>> [float(i) for i in lst if is_float(i)] # to return a list of floating point numbers
[43.23, 9.0, 22.0]
The problem with your code is that you are modifying the list while iterating over it. Instead, you can iterate over the original list and remove the unconvertible values from a copy:
lst = ['a','b','43.23','c','9','22']
lst_copy = lst.copy()
for el in lst:
    try:
        float(el)
    except ValueError:
        lst_copy.remove(el)
# lst_copy is now ['43.23', '9', '22'] (the values are still strings)
Of course, this is less efficient than the list comprehension with a predicate, because you first need to copy the original list, and each remove() call scans the copy again.

You shouldn't modify the list you're iterating over (and you shouldn't call it list either, since that would shadow the built-in list), because doing so messes up the indexes.
The reason why 'b' shows up in your output is that during the first iteration, 'a' is not a float, so it gets removed. Thus your list becomes:
['b','43.23','c','9','22']
and 'b' becomes list[0]. However, the next iteration looks at list[1], thus skipping 'b'.
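To see the skipping directly, here is a small sketch that reruns Attempt 1 and records which element each iteration actually looks at (the visited list is added only for illustration):

```python
lst = ['a', 'b', '43.23', 'c', '9', '22']
visited = []  # element seen by each iteration, for illustration only

for i, j in enumerate(lst):
    visited.append(j)
    try:
        lst[i] = float(j)
    except ValueError:
        # Removing shifts every later element one position to the left,
        # so the element that slides into position i is never visited.
        lst.remove(j)

print(visited)  # ['a', '43.23', 'c', '22']
print(lst)      # ['b', 43.23, '9', 22.0]
```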
To avoid such an issue, you can define a second list and append the suitable values to that:
l1 = ['a','b','43.23','c','9','22']
l2 = []
for item in l1:
    try:
        l2.append(float(item))
    except ValueError:  # bare except clauses are bad practice too!
        pass

It would be better to use iterators, which make more efficient use of memory. Here is my take on the solution:
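def func(x):
    try:
        return float(x)
    except ValueError:
        pass  # implicitly returns None

# compare with None so that '0' (parsed as the falsy 0.0) is kept
filter(lambda x: x is not None, map(func, lst))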
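In Python 3, filter and map are lazy, so the filter(...) expression above produces an iterator rather than a list. Here is a sketch of the same idea written as a generator function, assuming the input is any iterable of strings (the name iter_floats is hypothetical). It yields every parsed value, so a string like '0' (which parses to the falsy 0.0) is kept:

```python
from typing import Iterable, Iterator


def iter_floats(values: Iterable[str]) -> Iterator[float]:
    """Lazily yield every value that float() can parse, skipping the rest."""
    for value in values:
        try:
            yield float(value)
        except ValueError:
            continue


lazy = iter_floats(['a', '0', '43.23', 'c'])  # nothing is parsed yet
print(list(lazy))  # [0.0, 43.23]
```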

Borrowing the idea from this post (python: restarting a loop), the first attempt can be fixed with a simple while loop:
lst = ['a','b','43.23','c','9','22']
temp = 0
while temp<len(lst):
    try:
        lst[temp] = float(lst[temp])
        temp+=1
    except ValueError:
        lst.remove(lst[temp])
        temp = 0
which leaves me with the desired result (by resetting the loop index)
lst = [43.23, 9.0 , 22.0]
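Resetting temp to 0 works, but it rescans the list from the start after every removal. Because deleting an element shifts the next one into the current position, a sketch that simply does not advance the index after a removal gives the same result without restarting (del is used here to remove by position rather than by value):

```python
lst = ['a', 'b', '43.23', 'c', '9', '22']
index = 0
while index < len(lst):
    try:
        lst[index] = float(lst[index])
        index += 1  # advance only when the element was kept
    except ValueError:
        # Deleting shifts the next element into lst[index], so don't advance.
        del lst[index]

print(lst)  # [43.23, 9.0, 22.0]
```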

Related

Filtering out lists and only keeping integers in python

I am trying to take a list with a mix of integers and strings, filter out the strings in the list, and only keep the integers. An example of a list I might have to filter:
filter_list([1,2,'a','b'])
Here is my code:
new_list = list()
def filter_list(l):
    for step in l:
        if type(l[step]) == int:
            new_list.append(int(l[step]))
        else:
            pass
    return new_list
However, I am getting the error:
Traceback (most recent call last):
  File "tests.py", line 3, in <module>
    test.assert_equals(filter_list([1,2,'a','b']),[1,2])
  File "/workspace/default/solution.py", line 4, in filter_list
    if type(l[step]) is int:
TypeError: list indices must be integers or slices, not str
What am I doing wrong? The traceback is coming from running the test file which tests to see if my code works, but the actual error is from my code.
Other people have already provided solutions, but they didn't explain why yours didn't work.
Your list is [1,2,'a','b']. Python raises the exception when step takes the value 'a', because step is then used as an index. If you run this code:
lst = [1,2,'a','b']
for step in lst:
    print(step)
Your output would be:
1
2
a
b
As you can see, step takes the values of the elements themselves, and the third element of your list is a string. Using it as an index (l[step]) is why you get
TypeError: list indices must be integers or slices, not str
To get the index of each element, use enumerate:
lst = [1,2,'a','b']
for index, value in enumerate(lst):
    print(f"{index}: {value}")
Output:
0: 1
1: 2
2: a
3: b
Your code with enumerate:
new_list = list()
def filter_list(l):
    for step, value in enumerate(l):
        if type(l[step]) == int:
            new_list.append(int(l[step]))
        else:
            pass
    return new_list
A for loop in python iterates over the elements, not the indices, of an iterable object. That means that step takes on the values 1, 2, 'a', 'b', not 0, 1, 2, 3. In other languages, this sort of iteration is sometimes called foreach.
While the first two list elements are valid indices, the last two are not. Also, l[2] is 'a', which is not an integer. The most basic fix is to use each item directly instead of using it as an index. The example below also initializes the new list inside the function, since you probably don't want to keep appending to the same global list (run the function twice to see what I mean).
def filter_list(l):
    new_list = []
    for item in l:
        if type(item) == int:
            new_list.append(item)
    return new_list
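To see why the list should be created inside the function, here is a sketch of the original global-list pattern (renamed filter_list_global, a hypothetical name, so it doesn't clash with the fixed version) called twice:

```python
new_list = []  # module-level list shared by every call


def filter_list_global(l):
    for item in l:
        if type(item) == int:
            new_list.append(item)
    return new_list


first = filter_list_global([1, 2, 'a', 'b'])
second = filter_list_global([1, 2, 'a', 'b'])
print(second)           # [1, 2, 1, 2]: results from the first call are still there
print(first is second)  # True: both calls return the same list object
```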
You can also make your check of integerness more idiomatic. If you only want to keep objects that are already integers, use isinstance(item, int) rather than type(item) == int. If you want to allow things like the string '123' to pass through, use int(item) in a try-except instead:
def filter_list(l):
    new_list = []
    for item in l:
        try:
            new_list.append(int(item))
        except (TypeError, ValueError):
            pass
    return new_list
The most idiomatic way to express the version with a conditional is to use a list comprehension:
def filter_list(l):
    return [item for item in l if isinstance(item, int)]
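One caveat with isinstance: bool is a subclass of int, so isinstance(True, int) is True and the comprehension above keeps True and False. A sketch that excludes booleans, assuming that is the desired behavior:

```python
def filter_list(l):
    # bool subclasses int, so exclude it explicitly.
    return [item for item in l if isinstance(item, int) and not isinstance(item, bool)]


print(filter_list([1, 2, 'a', 'b', True]))  # [1, 2]
```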
Try this instead:
list(filter(lambda x: isinstance(x, int), [1,2,'a','b']))
Python's built-in filter function takes two arguments: the first is the function used to test each element, and the second is the iterable being filtered. In Python 3, filter returns an iterator, which is why the result is wrapped in list().
output:
[1, 2]

how to convert types while keeping the same list structure

So I have this list here:
[['Afghanistan', '2.66171813', '7.460143566', '0.490880072', '52.33952713', '0.427010864', '-0.106340349', '0.261178523'], ['Albania', '4.639548302', '9.373718262', '0.637698293', '69.05165863', '0.74961102', '-0.035140377', '0.457737535'], ['Algeria', '5.248912334', '9.540244102', '0.806753874', '65.69918823', '0.436670482', '-0.194670126', ''], ['Argentina', '6.039330006', '9.843519211', '0.906699121', '67.53870392', '0.831966162', '-0.186299905', '0.305430293'], ['Armenia', '4.287736416', '9.034710884', '0.697924912', '65.12568665', '0.613697052', '-0.132166177', '0.246900991'], ['Australia', '7.25703764', '10.71182728', '0.949957848', '72.78334045', '0.910550177', '0.301693261', '0.45340696']]
My aim is to loop through the list of lists and convert number string values to integers.
I tried
for li in main_li:
    for element in li:
        if element == li[0]:
            continue
        else:
            element = int(element)
My problem is how can I get this back into the same list format I had above without the numbers being strings.
You can do it by making a small change to your code:
for li in main_li:
    for i in range(1,len(li)):
        try:
            # int() can't parse '2.66171813' directly, so go through float() first
            li[i] = int(float(li[i]))
        except ValueError:
            pass
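Note that int() cannot parse a string such as '2.66171813' directly; it raises ValueError, so decimal strings have to go through float(). A sketch of a hypothetical helper to_number that returns an int for integer strings, a float for other numeric strings, and the original value otherwise:

```python
def to_number(value):
    """Hypothetical helper: int if possible, else float, else value unchanged."""
    for convert in (int, float):
        try:
            return convert(value)
        except (TypeError, ValueError):
            pass
    return value


row = ['Algeria', '5', '9.540244102', '']
print([to_number(v) for v in row])  # ['Algeria', 5, 9.540244102, '']
```

This keeps the decimal part instead of truncating it; use int(float(value)) instead if whole numbers are really what you want.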
You shouldn't (not saying it's not possible) change list values while you loop over them. You'll have to create a new list. Fortunately, you can do it very easily with a small modification to your original code:
newlist = []
for li in main_li:
    newli = []
    for element in li:
        if element == li[0]:
            newli.append(element)
        else:
            try:
                newli.append(int(float(element)))
            except ValueError:
                newli.append(0) # This is added because not everything in your list can be converted to int.
    newlist.append(newli)
newlist will be your modified list of lists.
Alternatively, you can use list comprehension:
newlist = [[p[0]] + [int(float(x)) for x in p[1:]] for p in main_li]
Note that this requires all of your strings to be correctly formatted: the empty string in the Algeria row, for example, would raise a ValueError.
Your list elements are decimal numbers, so convert them to float (int() cannot parse a string like '2.66171813'):
import re
pattern = re.compile(r'[-+]?\d+(\.\d+)?') # optional sign, digits, optional decimal part
new_list = []
for nest in main_li:
    temp_list = []
    for val in nest:
        if pattern.fullmatch(val): # check if the whole element is a number
            temp_list.append(float(val))
            continue
        temp_list.append(val)
    new_list.append(temp_list)
print(new_list)
[['Afghanistan', 2.66171813, 7.460143566, 0.490880072, 52.33952713, 0.427010864, -0.106340349, 0.261178523], ['Albania', 4.639548302, 9.373718262, 0.637698293, 69.05165863, 0.74961102, -0.035140377, 0.457737535], ['Algeria', 5.248912334, 9.540244102, 0.806753874, 65.69918823, 0.436670482, -0.194670126, ''], ['Argentina', 6.039330006, 9.843519211, 0.906699121, 67.53870392, 0.831966162, -0.186299905, 0.305430293], ['Armenia', 4.287736416, 9.034710884, 0.697924912, 65.12568665, 0.613697052, -0.132166177, 0.246900991], ['Australia', 7.25703764, 10.71182728, 0.949957848, 72.78334045, 0.910550177, 0.301693261, 0.45340696]]
Simply convert the elements that can be converted to float, replacing them in the current list by iterating over the list indices:
for i in range(len(main_li)):
    for j in range(len(main_li[i])):
        try:
            main_li[i][j] = float(main_li[i][j])
        except ValueError:
            continue
main_li # [['Afghanistan', 2.66171813, 7.460143566, 0.490880072, 52.33952713, 0.427010864, -0.106340349, 0.261178523], ['Albania', 4.639548302, 9.373718262, 0.637698293, 69.05165863, 0.74961102, -0.035140377, 0.457737535], ['Algeria', 5.248912334, 9.540244102, 0.806753874, 65.69918823, 0.436670482, -0.194670126, ''], ['Argentina', 6.039330006, 9.843519211, 0.906699121, 67.53870392, 0.831966162, -0.186299905, 0.305430293], ['Armenia', 4.287736416, 9.034710884, 0.697924912, 65.12568665, 0.613697052, -0.132166177, 0.246900991], ['Australia', 7.25703764, 10.71182728, 0.949957848, 72.78334045, 0.910550177, 0.301693261, 0.45340696]]
Most of the answers assume a fixed level of nesting (list of lists). If applicable, you could use the following code, which uses recursion to handle deeper nested lists (lists of lists of lists, etc.).
def nested_str_to_float(value):
    """ Converts a string to float. Value may be a single value, a list
        or even a nested list. If value is a (nested) list, all
        values in the (nested) list are evaluated to floats wherever possible.
    Args:
        value: single value of any type or a list
    Returns:
        a copy of value with all float-convertible items converted to float
    """
    # Test if value is a list, if so, recursively call nested_str_to_float
    if isinstance(value, list):
        return [nested_str_to_float(item) for item in value]
    # Try to convert to float; if possible, return float, else return value
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers values such as None that float() rejects outright
        return value
By the way, check this SO answer to see what Python considers floats...
You can use star unpacking in a list comprehension:
l = [['A', '1.1', '1.2'], ['B', '2.1', '2.2']]
[[i, *map(float, j)] for i, *j in l]
# [['A', 1.1, 1.2], ['B', 2.1, 2.2]]
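On the question's data, the star-unpacking version raises ValueError on the empty string in the Algeria row, since float('') fails. A sketch that keeps the same shape but maps unparseable values to None, using a hypothetical helper float_or_none and a shortened copy of the data:

```python
def float_or_none(value):
    """Hypothetical helper: float(value), or None if it cannot be parsed."""
    try:
        return float(value)
    except ValueError:
        return None


main_li = [['Algeria', '5.248912334', '-0.194670126', ''],
           ['Armenia', '4.287736416', '-0.132166177', '0.246900991']]
# name takes the first element of each row, values collects the rest
converted = [[name, *map(float_or_none, values)] for name, *values in main_li]
print(converted)
# [['Algeria', 5.248912334, -0.194670126, None], ['Armenia', 4.287736416, -0.132166177, 0.246900991]]
```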

Uniqueify returning an empty list

I'm new to python and trying to make a function Uniqueify(L) that will be given either a list of numbers or a list of strings (non-empty), and will return a list of the unique elements of that list.
So far I have:
def Uniquefy(x):
    a = []
    for i in range(len(x)):
        if x[i] in a == False:
            a.append(x[i])
    return a
It looks like the if str(x[i]) in a == False: check is failing, and that's causing the function to return an empty list.
Any help you guys can provide?
Comparison operators (including in) all have the same precedence and are chained. This means that this line:
if x[i] in a == False:
is evaluated as follows:
if (x[i] in a) and (a == False):
This is obviously not what you want.
The solution is to remove the second relational operator:
if x[i] not in a:
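A quick way to confirm the chaining is to compare the chained form with its expansion; with an empty list both evaluate to False, while the intended condition is True:

```python
a = []
x = 1

chained = x in a == False            # parsed as (x in a) and (a == False)
expanded = (x in a) and (a == False)
intended = x not in a                # what the original condition meant

print(chained, expanded, intended)   # False False True
```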
You can just create a set based on the list which will only contain unique values:
>>> s = ["a", "b", "a"]
>>> print set(s)
set(['a', 'b'])
The best option here is to use a set instead! By definition, sets only contain unique items and putting the same item in twice will not result in two copies.
If you need to create it from a list and need a list back, try this. However, if there's not a specific reason you NEED a list, then just pass around a set instead (that would be the duck-typing way anyway).
def uniquefy(x):
    return list(set(x))
You can use the built in set type to get unique elements from a collection:
x = [1,2,3,3]
unique_elements = set(x)
You should use set() here. Membership tests with in take constant time on average for a set, instead of linear time for a list:
def Uniquefy(x):
    a = set()
    for item in x:
        if item not in a:
            a.add(item)
    return list(a)
Or equivalently:
def Uniquefy(x):
    return list(set(x))
If order matters:
def uniquefy(x):
    s = set()
    return [i for i in x if i not in s and s.add(i) is None]
Else:
def uniquefy(x):
    return list(set(x))
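If order matters, another option in Python 3.7+ (where dicts are guaranteed to preserve insertion order) is dict.fromkeys, which keeps the first occurrence of each element. Like the set-based versions, it assumes the elements are hashable:

```python
def uniquefy(x):
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(x))


print(uniquefy(['b', 'a', 'b', 'c', 'a']))  # ['b', 'a', 'c']
```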

Python: Compare more numbers

I would like to search for numbers in an existing list. If one of these numbers is repeated, then set a variable's value to True and break out of the for loop.
list = [3, 5, 3] # numbers in list
So if the function finds the same number twice, it should break out of the loop; in this case, 3 is repeated.
How to do that?
First, don't name your list list. That is a Python built-in, and using it as a variable name can give undesired side effects. Let's call it L instead.
You can solve your problem by comparing the list to a set version of itself.
Edit: You want true when there is a repeat, not the other way around. Code edited.
def testlist(L):
    return sorted(set(L)) != sorted(L)
You could look into sets. You loop through your list, and either add the number to a support set, or break out of the loop.
>>> l = [3, 5, 3]
>>> s = set()
>>> s
set([])
>>> for x in l:
...     if x not in s:
...         s.add(x)
...     else:
...         break
You could also take a step further and make a function out of this code, returning the first duplicated number you find (or None if the list doesn't contain duplicates):
def get_first_duplicate(l):
    s = set()
    for x in l:
        if x not in s:
            s.add(x)
        else:
            return x
get_first_duplicate([3, 5, 3])
# returns 3
Otherwise, if you want to get a boolean answer to the question "does this list contain duplicates?", you can return it instead of the duplicate element:
def has_duplicates(l):
    s = set()
    for x in l:
        if x not in s:
            s.add(x)
        else:
            return True
    return False
has_duplicates([3, 5, 3])
# returns True
senderle pointed out:
there's an idiom that people sometimes use to compress this logic into a couple of lines. I don't necessarily recommend it, but it's worth knowing:
s = set(); has_dupe = any(x in s or s.add(x) for x in l)
You can use collections.Counter() and any():
>>> from collections import Counter
>>> lis=[3,5,3]
>>> c=Counter(lis)
>>> any(x>1 for x in c.values()) # True means yes some value is repeated
True
>>> lis=range(10)
>>> c=Counter(lis)
>>> any(x>1 for x in c.values()) # False means all values only appeared once
False
or use sets and match lengths:
In [5]: lis=[3,3,5]
In [6]: not (len(lis)==len(set(lis)))
Out[6]: True
In [7]: lis=range(10)
In [8]: not (len(lis)==len(set(lis)))
Out[8]: False
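With the Counter approach, you can also list which numbers are repeated, not just whether any are; a small sketch:

```python
from collections import Counter

lis = [3, 5, 3, 7, 5]
counts = Counter(lis)
# keep each value whose count is greater than one
repeated = [value for value, count in counts.items() if count > 1]
print(repeated)  # [3, 5]
```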
You should never give the name list to a variable - list is a type in Python, and you can give yourself all kinds of problems masking built-in names like that. Give it a descriptive name, like numbers.
That said ... you could use a set to keep track of which numbers you've already seen:
def first_double(seq):
    """Return the first item in seq that appears twice."""
    found = set()
    for item in seq:
        if item in found:
            return item  # return will terminate the function, so no need for 'break'.
        else:
            found.add(item)
numbers = [3, 5, 3]
number = first_double(numbers)
Without additional memory (though in quadratic time, since count() scans the whole list for every element):
any(l.count(x) > 1 for x in l)

Test for list membership and get index at the same time in Python

It seems silly to write the following:
L = []
if x in L:
    L[x] = something
else:
    L[x] = something_else
Doesn't this perform the look-up for x twice? I tried using index(), but this gives an error when the value is not found.
Ideally I would like to say something like:
if x is in L, save that index and:
...
I can appreciate that this might be a beginner python idiom, but it seems rather un-search-able. Thanks.
Another option is try/except:
d = {}
try:
    d[x]  # the lookup raises KeyError if x is not a key yet
    d[x] = something
except KeyError:
    d[x] = something_else
Same result as your code.
Edit: Okay, fast moving target. Same idiom for a list, different exception (IndexError).
Do you mean you want setdefault(key[, default])?
a = {}
a['foo'] # KeyError
a.setdefault('foo', 'bar') # key doesn't exist: sets a['foo'] = 'bar' and returns 'bar'
a.setdefault('foo', 'x') # key exists: returns 'bar' and leaves a['foo'] unchanged
If you have a list you can use index, catching the ValueError if it is thrown:
yourList = []
try:
    i = yourList.index(x)
except ValueError:
    i = None
Then you can test the value of i:
if i is not None:
    pass  # Do things if the item was found.
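To get the index in a single scan without exceptions, here is a sketch using enumerate() with next() and a default of None (find_index is a hypothetical name):

```python
def find_index(seq, x):
    """Return the index of the first element equal to x, or None if absent."""
    return next((i for i, value in enumerate(seq) if value == x), None)


letters = ['a', 'b', 'c']
i = find_index(letters, 'b')
if i is not None:
    letters[i] = 'B'  # the index is already known, no second search
print(letters)                   # ['a', 'B', 'c']
print(find_index(letters, 'z'))  # None
```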
I think your question confused many because you've mixed your syntax between dict and list.
If:
L = [] # L is a list; [] (square brackets) creates an empty list, like list()
Here you are looking for a value in a list, not a key nor a value in a dict:
if x in L:
And then you use x seemingly as a key, but for a list it's an integer index, and if x in L: doesn't test whether the index is in L but whether the value is in L:
L[x]=value
So if you intend to see if a value is in L a list do:
L = [] # that's a list and empty; and x will NEVER be in an empty list.
if x in L: # that looks for a value in the list, not an index
    # to test whether x is a valid index instead, check 0 <= x < len(L)
    idx = L.index(x)
    L[idx] = something # that makes L[index]=value not L[key]=value
else:
    # x is not in L so you either append it or append something_else
    L.append(x)
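Since in tests values rather than positions, a separate check is needed when x is meant to be an index. A sketch, assuming only non-negative integer indices should count as valid (is_valid_index is a hypothetical name):

```python
def is_valid_index(seq, x):
    """True if x can be used as a non-negative index into seq."""
    return isinstance(x, int) and 0 <= x < len(seq)


L = ['p', 'q', 'r']
print(is_valid_index(L, 2))  # True
print(is_valid_index(L, 3))  # False: L[3] would raise IndexError
```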
If you use L[x] = something together with if x in L:, then it would only make sense for a list whose values are also valid indices, such as L=[ 0, 1, 2, 3, 4, ...]. Float values like 1.0 cannot be used as list indices.
But I'd offer this:
L = []
L.extend(my_iterable)
coder0 = 'farr'
coder1 = 'Mark Byers'
if coder0 not in L:
    L.append(coder1)
Weird logic
