Converting strings within a list into floats - python

I have a list of numerical values that are of type "string" right now. Some of the elements in this list have more than one value, e.g.:
AF=['0.056', '0.024, 0.0235', '0.724', '0.932, 0.226, 0.634']
Some of the elements might also be just a single '.' character.
With that being said, I've been trying to convert the elements of this list into floats (while keeping the values grouped together when an element holds more than one), but I keep getting the following error:
ValueError: could not convert string to float: .
I've tried a LOT of things to solve this, with the latest one being:
for x in AF:
    if "," in x: #if there are multiple values for one AF
        elements= x.split(",")
        for k in elements: #each element of the sub-list
            if k != '.':
                k= map(float, k)
                print(k) #check to see if there are still "."
            else:
                pass
But when I run that, I still get the same error. So I printed k from the above loop and sure enough, there were still . in the list, despite me stating NOT to include those in the string-to-float conversion.
This is my desired output:
AF=[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]
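The error in the attempt above most likely comes from map(float, k) itself rather than from a leftover '.' element: when k is a string, map iterates over its characters, so map(float, '0.024') calls float('0') and then float('.'), which raises exactly this ValueError. A second trap is that split(",") keeps the space after each comma, so a stray ' .' never equals '.'. The sketch below only demonstrates both points; parse_values is a hypothetical helper name, not part of the question.

```python
# map() over a *string* iterates its characters, so float() would see '.' on its own
chars = list("0.024")
print(chars)  # ['0', '.', '0', '2', '4']

# split(",") keeps the space after each comma; strip() removes it
parts = "0.024, .".split(",")
print(parts)                        # ['0.024', ' .']
print([p.strip() for p in parts])   # ['0.024', '.']

def parse_values(field):
    """Hypothetical helper: convert one comma-separated field to floats, skipping '.'."""
    stripped = (part.strip() for part in field.split(","))
    return [float(p) for p in stripped if p != "."]

print(parse_values("0.024, 0.0235"))    # [0.024, 0.0235]
print(parse_values("0.932, ., 0.634"))  # [0.932, 0.634]
```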

def convert(l):
    new = []
    for line in l:
        if ',' in line:
            new.append([float(j) for j in line.split(',')])
        else:
            try:
                new.append(float(line))
            except ValueError:
                pass
    return new
>>> convert(AF)
[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]

If you try this:
result = []
for item in AF:
    if item != '.':
        values = list(map(float, item.split(', ')))
        result.append(values)
You get:
[[0.056], [0.024, 0.0235], [0.724], [0.932, 0.226, 0.634]]
You can simplify this using a list comprehension:
result = [list(map(float, item.split(', ')))
          for item in AF
          if item != '.']
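The comprehension above wraps every value in a list, while the desired output uses a bare float for single values. A minimal sketch of a post-processing step that also tolerates '.' inside a multi-value entry, assuming values are separated by commas with optional spaces (to_floats is a hypothetical name, and the sample list is a variant of AF made up for illustration):

```python
AF = ['0.056', '0.024, 0.0235', '.', '0.724', '0.932, ., 0.634']

def to_floats(entries):
    """Hypothetical helper: a float for single values, a list for several, '.' dropped."""
    result = []
    for entry in entries:
        # strip the spaces left by split(",") and drop the '.' placeholders
        values = [float(p) for p in (s.strip() for s in entry.split(",")) if p != "."]
        if not values:
            continue  # the whole entry was '.'
        result.append(values[0] if len(values) == 1 else values)
    return result

print(to_floats(AF))
# [0.056, [0.024, 0.0235], 0.724, [0.932, 0.634]]
```

One trade-off of this choice: an entry such as '0.5, .' collapses to the bare float 0.5 rather than a one-element list.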

With re.findall() function (on extended input list):
import re
AF = ['0.056', '0.024, 0.0235, .', '.', '0.724', '0.932, 0.226, 0.634', '.']
result = []
for s in AF:
    items = re.findall(r'\b\d+\.\d+\b', s)
    if items:
        result.append(float(items[0]) if len(items) == 1 else list(map(float, items)))
print(result)
The output:
[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]

Related

How can I cut a list into a list of lists based on the presence of a particular string?

I'll try my best to explain.
Say I have this; it represents a username (ex: jjo), an optional real name (ex: josh) and it's always followed by a "remove".
list_of_people = ['jjo','josh','remove','flor30','florentina','remove','mary_h','remove','jasoncel3','jason celora','remove', 'lashit', 'remove']
My goal is to achieve this:
cut_list = [ ['jjo','josh'], ['flor30', 'florentina'], ['mary_h'], ['jasoncel3', 'jason celora'], ['lashit']]
The problem here is that the real name is optional and therefore, it's not always a perfect "trio". In other words, I need to use the presence of "remove" as a pivot to cut my list.
Verbally speaking, I would say that the code would be:
if you meet "remove", go backwards and store everything until you meet another "remove"
One issue is that there's no "remove" at the start (although I could manually add it), but my main issue is the logic. I can't get it right.
Here's my "best" shot so far and what it gives:
list_of_people = ['jjo','josh','remove','flor30','florentina','remove','mary_h','remove','jasoncel3','jason celora','remove', 'lashit', 'remove']
#Add the first 2 items
#If "remove" is there (means there was no real name), remove it
#Turn list into a list of lists
cut_list = list_of_people[0:2]
if "remove" in cut_list:
    cut_list.remove("remove")
cut_list = [cut_list]
#Loop through and cut based on the presence of "remove"
for i in range(2, len(list_of_people)):
    if list_of_people[i] == 'remove':
        first_back = list_of_people[i-1]
        if list_of_people.append(list_of_people[i-2]) != 'remove':
            second_back = list_of_people[i-2]
    cut_list.append([first_back, second_back])
print(cut_list)
# #Should give:
# ##cut_list = [ ['jjo','josh'], ['flor30', 'florentina'], ['mary_h'], ['jasoncel3', 'jason celora'], ['lashit']]
[['jjo', 'josh'], ['josh', 'jjo'], ['josh', 'jjo'], ['josh', 'jjo'],
 ['florentina', 'flor30'], ['florentina', 'flor30'], ['mary_h', 'remove'], ['mary_h', 'remove'], ['mary_h', 'remove'],
 ['jason celora', 'jasoncel3'], ['jason celora', 'jasoncel3'], ['lashit', 'remove']]
I chose to keep this simple and iterate once through the list, using "remove" as the marker that triggers the additional processing.
list_of_people = ['jjo','josh','remove','flor30','florentina','remove','mary_h','remove','jasoncel3','jason celora','remove', 'lashit', 'remove']
result = []
user = []
for name in list_of_people:
    if name != "remove":
        # Add the name to the current user's group
        user.append(name)
    else:
        # Found a "remove": store the current group, then start a new one
        result.append(user)
        user = []
print(result)
from itertools import groupby
sentence = ['jjo', 'josh', 'remove', 'flor30', 'florentina', 'remove', 'mary_h',
'remove', 'jasoncel3', 'jason celora', 'remove', 'lashit', 'remove']
i = (list(g) for _, g in groupby(sentence, key='remove'.__ne__))
l = [a + b for a, b in zip(i, i)]
N = 'remove'
res = [[ele for ele in sub if ele != N] for sub in l]
print(res)
Try:
from itertools import groupby
out = [
list(g) for v, g in groupby(list_of_people, lambda x: x != "remove") if v
]
print(out)
Prints:
[['jjo', 'josh'], ['flor30', 'florentina'], ['mary_h'], ['jasoncel3', 'jason celora'], ['lashit']]
Alternate version of @Jarvis's answer:
a = ['jjo','josh','remove','flor30','florentina','remove','mary_h','remove','jasoncel3','jason celora','remove', 'lashit', 'remove']
b = []
start_index = 0
for i, j in enumerate(a):
    if j == 'remove':
        b.append(a[start_index:i])
        start_index = i + 1
print(b)
Logic:
Use append to add the slice of the list from start_index (which is 0 initially) up to the index of 'remove'.
Then reset start_index to the index of 'remove' + 1.
Adding 1 to the index of 'remove' keeps 'remove' itself out of the output.
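The loops above only emit a group when they reach a "remove", so any names after the last marker would be silently dropped (the sample list happens to end with "remove"). A small generic sketch that also keeps a trailing group and skips empty ones; split_on is a hypothetical helper name, and the shorter sample list is made up for illustration:

```python
def split_on(items, sep):
    """Hypothetical helper: split `items` into sublists at every `sep`."""
    groups, current = [], []
    for item in items:
        if item == sep:
            if current:  # skip empty groups, e.g. two markers in a row
                groups.append(current)
            current = []
        else:
            current.append(item)
    if current:  # keep names that follow the last marker
        groups.append(current)
    return groups

people = ['jjo', 'josh', 'remove', 'mary_h', 'remove', 'lashit']
print(split_on(people, 'remove'))
# [['jjo', 'josh'], ['mary_h'], ['lashit']]
```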

python edit tuple duplicates in a list

My goal is this: while looping over a list, I would like to check for duplicates and, if there are any, append a number to them. See the following example.
My list output, as an example:
[('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
In the loop I would like to edit those duplicates so that, instead of the second microsoft, I get microsoft1 (and if there were three microsoft guys, the third one would get microsoft2).
With the code below I can find the duplicates, but I don't know how to edit them directly in the list.
list = [('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
names = []
double = []
for u in list[1:]:
    names.append(u[1])
list_size = len(names)
for i in range(list_size):
    k = i + 1
    for j in range(k, list_size):
        if names[i] == names[j] and names[i] not in double:
            double.append(names[i])
This is one approach using collections.defaultdict.
Ex:
from collections import defaultdict
lst = [('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
seen = defaultdict(int)
result = []
for k, v in lst:
    if seen[v]:
        result.append((k, "{}_{}".format(v, seen[v])))
    else:
        result.append((k, v))
    seen[v] += 1
print(result)
Output:
[('name', 'company'),
('someguy', 'microsoft'),
('anotherguy', 'microsoft_1'),
('thirdguy', 'amazon')]
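The question asks for microsoft1 (no underscore) and for the list to be edited directly. Because tuples are immutable, each duplicate tuple has to be replaced by a new tuple at the same index. A minimal sketch under those assumptions, skipping the header row; the fourth employee is an extra row added only to show the counter reaching 2:

```python
from collections import Counter

rows = [('name', 'company'), ('someguy', 'microsoft'),
        ('anotherguy', 'microsoft'), ('thirdguy', 'amazon'),
        ('fourthguy', 'microsoft')]  # last row added for illustration

seen = Counter()
# iterate over a copy of the rows after the header; start=1 keeps indices aligned with `rows`
for i, (person, company) in enumerate(rows[1:], start=1):
    if seen[company]:
        # tuples are immutable, so replace the whole tuple at index i
        rows[i] = (person, f"{company}{seen[company]}")
    seen[company] += 1

print(rows)
# [('name', 'company'), ('someguy', 'microsoft'), ('anotherguy', 'microsoft1'),
#  ('thirdguy', 'amazon'), ('fourthguy', 'microsoft2')]
```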

Find and replace some elements in a list using python

I have to search all elements in a list and replace all occurrences of one element with another. What is the best way to do this?
For example, suppose my list has the following elements:
data = ['a34b3f8b22783cf748d8ec99b651ddf35204d40c',
'baa6cb4298d90db1c375c63ee28733eb144b7266',
'CommitTest.txt',
'=>',
'text/CommitTest.txt',
'0',
'README.md',
'=>',
'text/README.md',
'0']
and I need to replace every '=>' element with the concatenation of the element before it, the '=>' itself and the element after it, so the output I need is:
data = ['a34b3f8b22783cf748d8ec99b651ddf35204d40c',
'baa6cb4298d90db1c375c63ee28733eb144b7266',
'CommitTest.txt=>text/CommitTest.txt',
'0',
'README.md=>text/README.md',
'0']
This is the code I have written so far:
ind = data.index("=>")
item_to_replace = data[ind]
combine = data[ind-1]+data[ind]+data[ind+1]
replacement_value = combine
indices_to_replace = [i for i,x in enumerate(data) if x==item_to_replace]
for i in indices_to_replace:
    data[i] = replacement_value
data
However, I get this unwanted output:
data = ['a34b3f8b22783cf748d8ec99b651ddf35204d40c',
'baa6cb4298d90db1c375c63ee28733eb144b7266',
'CommitTest.txt',
'CommitTest.txt=>text/CommitTest.txt',
'text/CommitTest.txt',
'0',
'README.md',
'CommitTest.txt=>text/CommitTest.txt',
'text/README.md',
'0']
Is there a better way?
Your general algorithm is correct.
However, data.index("=>") only finds the index of the first occurrence of "=>".
You need to find all occurrences of "=>", store their indices in a list, and then combine the neighbouring elements and do the replacement for each occurrence.
To find the indices of all occurrences of "=>", you can use:
indices = [i for i, x in enumerate(data) if x == "=>"]
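Alternatively, list.index accepts a start position, so it can still find every occurrence as long as each search begins just past the previous hit. A sketch of that loop (find_all is a hypothetical name; the hashes in the sample data are shortened for readability):

```python
def find_all(seq, value):
    """Hypothetical helper: all indices of `value` in `seq`, via list.index(value, start)."""
    indices, start = [], 0
    while True:
        try:
            i = seq.index(value, start)
        except ValueError:  # no more occurrences
            return indices
        indices.append(i)
        start = i + 1

data = ['a34b3f8b', 'baa6cb42', 'CommitTest.txt', '=>', 'text/CommitTest.txt', '0',
        'README.md', '=>', 'text/README.md', '0']
print(find_all(data, '=>'))  # [3, 7]
```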
As @alpha_989 suggested, first find the indices of the => elements and then do the replacement for each occurrence. Hope this helps:
>>> indices = [i for i, x in enumerate(data) if x == "=>"]
>>> for i in indices:  # merge the element before "=>", the "=>" itself and the element after it
...     data[i-1] = data[i-1] + data[i] + data[i+1]
...
>>> while "=>" in data:  # then delete each "=>" together with the element after it
...     j = data.index("=>")
...     del data[j:j+2]
...
>>> data
['a34b3f8b22783cf748d8ec99b651ddf35204d40c', 'baa6cb4298d90db1c375c63ee28733eb144b7266', 'CommitTest.txt=>text/CommitTest.txt', '0', 'README.md=>text/README.md', '0']
It was correctly pointed out to you that data.index only returns the index of the first occurrence of an element. Furthermore, your code does not remove the entries before and after the "=>".
For a solution that mutates your list, you could use del, but I recommend using this neat slicing syntax that Python offers.
indices = [i for i, val in enumerate(data) if val == '=>']
for i in reversed(indices):  # go backwards so the earlier indices stay valid
    data[i-1: i+2] = [data[i-1] + data[i] + data[i+1]]
I also suggest you attempt an implementation that generates a new list in a single pass. Mutating a list can be a bad practice and has no real advantage over creating a new list like so.
new_data = []
i = 0
while i < len(data):
    if i + 1 < len(data) and data[i + 1] == "=>":
        new_data.append(data[i] + data[i+1] + data[i+2])
        i += 3
    else:
        new_data.append(data[i])
        i += 1
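Another single-pass variant looks backwards instead of forwards: when it meets "=>", it merges the last element already emitted with the next element pulled from the iterator. This is a sketch under the assumption that every "=>" has a neighbour on both sides (it raises ValueError otherwise); merge_around is a hypothetical name, and the hashes are shortened for readability.

```python
def merge_around(items, sep="=>"):
    """Hypothetical helper: join previous element, `sep` and next element for every `sep`."""
    result = []
    it = iter(items)
    for item in it:
        if item != sep:
            result.append(item)
            continue
        nxt = next(it, None)  # consume the element after the separator
        if not result or nxt is None:
            raise ValueError(f"{sep!r} needs an element on both sides")
        result[-1] = result[-1] + sep + nxt
    return result

data = ['a34b3f8b', 'baa6cb42', 'CommitTest.txt', '=>', 'text/CommitTest.txt', '0',
        'README.md', '=>', 'text/README.md', '0']
print(merge_around(data))
# ['a34b3f8b', 'baa6cb42', 'CommitTest.txt=>text/CommitTest.txt', '0', 'README.md=>text/README.md', '0']
```

Because it merges into result[-1], a chain such as a, =>, b, =>, c ends up as the single string a=>b=>c.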
Below is my little experiment: I wrapped the logic in a function you can call. You can check it:
data = ['a34b3f8b22783cf748d8ec99b651ddf35204d40c',
'baa6cb4298d90db1c375c63ee28733eb144b7266',
'CommitTest.txt',
'=>',
'text/CommitTest.txt',
'0',
'README.md',
'=>',
'text/README.md',
'0']
def convert_list():
    ind = [i for i, x in enumerate(data) if x == "=>"]
    # every "=>" needs an element on both sides
    if any(i == 0 or i == len(data) - 1 for i in ind):
        print("Invalid element location")
        return
    new_data = []
    index_start = 0
    while index_start < len(data):
        for ind_index in ind:
            if index_start == ind_index - 1:
                index_start += 3
                new_data.append(data[ind_index - 1] + data[ind_index] + data[ind_index + 1])
        if index_start < len(data):  # a merge may have consumed the last element
            new_data.append(data[index_start])
        index_start += 1
    return new_data
print(convert_list())
The indices of the elements to delete are saved first, and those elements are then skipped when building the new list.
delete_index = []
for i, d in enumerate(data):
    if d == "=>":
        data[i] = data[i-1] + data[i] + data[i+1]
        delete_index.append(i-1)
        delete_index.append(i+1)
new_data = []
for i, d in enumerate(data):
    if i not in delete_index:
        new_data.append(d)
print(new_data)

Merge nested list items based on a repeating value

Although poorly written, this code:
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
for i in range(len(marker_array)):
    if marker_array[i-1][1] != marker_array[i][1]:
        marker_array_DS.append(marker_array[i])
print(marker_array_DS)
Returns:
[['hard', '2', 'soft'], ['fast', '3'], ['turtle', '4', 'wet']]
It accomplishes part of the task which is to create a new list containing all nested lists except those that have duplicate values in index [1]. But what I really need is to concatenate the matching index values from the removed lists creating a list like this:
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
The values in index [1] must not be concatenated. I kind of managed to do the concatenation part using a tip from another post:
newlist = [i + n for i, n in zip(list_a, list_b)]
But I am struggling with figuring out the way to produce the desired result. The "marker_array" list will be already sorted in ascending order before being passed to this code. All like-values in index [1] position will be contiguous. Some nested lists may not have any values beyond [0] and [1] as illustrated above.
Quick stab at it: use itertools.groupby to do the grouping for you, but run it over a generator that pads each 2-element list to 3 elements.
from itertools import groupby
from operator import itemgetter
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
def my_group(iterable):
    # pad each entry to 3 elements so 2-element lists can be grouped too
    temp = ((el + [''])[:3] for el in iterable)
    for k, g in groupby(temp, key=itemgetter(1)):
        fst, snd = map(' '.join, zip(*map(itemgetter(0, 2), g)))
        # drop empty parts, e.g. the missing third value of ['fast', '3']
        yield list(filter(None, [fst, k, snd]))
print(list(my_group(marker_array)))
from collections import defaultdict
d1 = defaultdict(list)
d2 = defaultdict(list)
for pxa in marker_array:
    d1[pxa[1]].extend(pxa[:1])
    d2[pxa[1]].extend(pxa[2:])
res = [[' '.join(d1[x]), x, ' '.join(d2[x])] for x in sorted(d1)]
If you really need 2-element lists where there is no third value (which I think is unlikely):
for p in res:
    if not p[-1]:
        p.pop()
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
marker_array_hit = []
for i in range(len(marker_array)):
    if marker_array[i][1] not in marker_array_hit:
        marker_array_hit.append(marker_array[i][1])
for i in marker_array_hit:
    lists = [item for item in marker_array if item[1] == i]
    temp = []
    first_part = ' '.join([str(item[0]) for item in lists])
    temp.append(first_part)
    temp.append(i)
    second_part = ' '.join([str(item[2]) for item in lists if len(item) > 2])
    if second_part != '':
        temp.append(second_part)
    marker_array_DS.append(temp)
print(marker_array_DS)
I learned python for this because I'm a shameless rep whore
marker_array = [
['hard','2','soft'],
['heavy','2','light'],
['rock','2','feather'],
['fast','3'],
['turtle','4','wet'],
]
data = {}
for arr in marker_array:
    # pad a copy instead of appending '' to marker_array's own lists
    (first, index, last) = (arr + [''])[:3]
    firsts, lasts = data.setdefault(index, [[], []])
    firsts.append(first)
    lasts.append(last)
results = []
for key in sorted(data.keys()):
    current = [
        " ".join(data[key][0]),
        key,
        " ".join(data[key][1])
    ]
    if current[-1] == '':
        current = current[:-1]
    results.append(current)
print(results)
--output:--
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
A different solution based on itertools.groupby:
from itertools import groupby
# normalizes the list of markers so all markers have 3 elements
def normalized(markers):
    for marker in markers:
        yield marker + [""] * (3 - len(marker))
def concatenated(markers):
    # use groupby to iterate over lists of markers sharing the same key
    for key, markers_in_category in groupby(normalized(markers), lambda m: m[1]):
        # get separate lists of left and right words
        lefts, rights = zip(*[(m[0], m[2]) for m in markers_in_category])
        # remove empty strings from both lists
        lefts, rights = filter(bool, lefts), filter(bool, rights)
        # yield the concatenated entry for this key (also removing the empty string at the end, if necessary)
        yield list(filter(bool, [" ".join(lefts), key, " ".join(rights)]))
The generator concatenated(markers) will yield the results. This code correctly handles the ['fast', '3'] case and doesn't return an additional third element in such cases.
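The zip tip from the question can also be applied column by column to a whole group. A sketch assuming Python 3 and, as the question states, that like values in index 1 are contiguous; merge_markers is a hypothetical name. itertools.zip_longest pads shorter lists with '', and empty columns are dropped at the end so ['fast', '3'] stays a 2-element list.

```python
from itertools import groupby, zip_longest
from operator import itemgetter

marker_array = [['hard', '2', 'soft'], ['heavy', '2', 'light'], ['rock', '2', 'feather'],
                ['fast', '3'], ['turtle', '4', 'wet']]

def merge_markers(markers):
    """Hypothetical helper: merge contiguous lists that share the same value at index 1."""
    merged = []
    for key, group in groupby(markers, key=itemgetter(1)):
        # transpose the group so each column can be joined; '' fills missing cells
        columns = zip_longest(*group, fillvalue='')
        row = [key if i == 1 else ' '.join(filter(None, col))
               for i, col in enumerate(columns)]
        merged.append([cell for cell in row if cell])  # drop empty columns
    return merged

print(merge_markers(marker_array))
# [['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
```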

Removing values from a list in python

I have a large file of names and values on a single line separated by a space:
name1 name2 name3....
Following the long list of names is a list of values corresponding to the names. The values can be 0-4 or na. What I want to do is consolidate the data file by removing each name and its value whenever the value is na.
For instance, the end of the line in this file looks like this:
namenexttolast nameonemore namethelast 0 na 2
I would like the following output:
namenexttolast namethelast 0 2
How would I do this using Python?
Let's say you read the names into one list, then the values into another. Once you have a names and values list, you can do something like:
result = [n for n, v in zip(names, values) if v != 'na']
result is now a list of all names whose value is not "na".
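Put together for a single line in the question's format, this could look like the sketch below. It assumes the line holds exactly as many values as names (so the split point is the midpoint) and that the output should list the kept names followed by their values; filter_na is a hypothetical name.

```python
def filter_na(line):
    """Hypothetical helper: drop every name whose value is 'na', keeping names-then-values order."""
    tokens = line.split()
    half = len(tokens) // 2  # first half are names, second half are values
    names, values = tokens[:half], tokens[half:]
    kept = [(n, v) for n, v in zip(names, values) if v != 'na']
    kept_names = [n for n, _ in kept]
    kept_values = [v for _, v in kept]
    return ' '.join(kept_names + kept_values)

print(filter_na("namenexttolast nameonemore namethelast 0 na 2"))
# namenexttolast namethelast 0 2
```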
s = "name1 name2 name3 v1 na v2"
s = s.split(' ')
names = s[:len(s)//2]
values = s[len(s)//2:]
names_and_values = zip(names, values)
names, values = [], []
[(names.append(n) or values.append(v)) for n, v in names_and_values if v != "na"]
names.extend(values)
print(' '.join(names))
Update
Minor improvement after suggestion from Paul. I'm sure the list comprehension is fairly unpythonic, as it leverages the fact that list.append returns None, so both append expressions will be evaluated and a list of None values will be constructed and immediately thrown away.
I agree with Justin that using zip is a good idea. The problem is how to put the data into two different lists. Here is a proposal that should work OK.
with open('input.txt') as reader, open('output.txt', 'w') as writer:
    names, nums = [], []
    row = reader.read().split()
    x = len(row) // 2
    for (a, b) in [(n, v) for n, v in zip(row[:x], row[x:]) if v != 'na']:
        names.append(a)
        nums.append(b)
    writer.write(' '.join(names))
    writer.write(' ')
    writer.write(' '.join(nums))
    # writer.write(' '.join(names + nums)) is shorter, but builds a concatenated list first
Or say you have a string which you have read from a file; let's call it s:
words = list(filter(lambda x: x != "na", s.split()))
should give you all the strings except for "na".
Edit: the code above obviously doesn't do what you want it to do.
The one below should work, though:
d = s.split()
keys = d[:len(d)//2]
vals = d[len(d)//2:]
w = " ".join((k + " " + v) if v != "na" else "" for k, v in zip(keys, vals))
print(" ".join([" ".join(w.split()[::2]), " ".join(w.split()[1::2])]))
strlist = 'namenexttolast nameonemore namethelast 0 na 2'.split()
vals = ('0', '1', '2', '3', '4', 'na')
key_list = [s for s in strlist if s not in vals]
val_list = [s for s in strlist if s in vals]
#print([(key_list[i], v) for i, v in enumerate(val_list) if v != 'na'])
filtered_keys = [key_list[i] for i, v in enumerate(val_list) if v != 'na']
filtered_vals = [v for v in val_list if v != 'na']
print(filtered_keys + filtered_vals)
If you'd rather group the vals, you could create a list of tuples instead (commented out line)
Here is a solution that uses just iterators plus a single buffer element, with no calls to len and no other intermediate lists created. (This is written for Python 3, where map, filter and zip are already lazy; in Python 2 you would use imap, ifilter and izip from itertools instead.)
def iterStartingAt(cond, seq):
    # advance it2 in step with it1 until it1 hits the first value, so it2 starts exactly there
    it1, it2 = iter(seq), iter(seq)
    while not cond(next(it1)):
        next(it2)
    for item in it2:
        yield item
dataline = "namenexttolast nameonemore namethelast 0 na 2"
datalinelist = dataline.split()
valueset = set("0 1 2 3 4 na".split())
print(" ".join(map(" ".join,
                   zip(*filter(lambda nv: nv[1] != 'na',
                               zip(iter(datalinelist),
                                   iterStartingAt(lambda s: s in valueset,
                                                  datalinelist)))))))
Prints:
namenexttolast namethelast 0 2
