I have a large file of names and values on a single line separated by a space:
name1 name2 name3....
Following the long list of names is a list of values corresponding to the names. The values can be 0-4 or na. What I want to do is consolidate the data file and remove all the names and and values when the value is na.
For instance, the final line of name in this file is like so:
namenexttolast nameonemore namethelast 0 na 2
I would like the following output:
namenexttolast namethelast 0 2
How would I do this using Python?
Let's say you read the names into one list, then the values into another. Once you have a names and values list, you can do something like:
result = [n for n, v in zip(names, values) if v != 'na']
result is now a list of all names whose value is not "na".
s = "name1 name2 name3 v1 na v2"
s = s.split(' ')
names = s[:len(s)/2]
values = s[len(s)/2:]
names_and_values = zip(names, values)
names, values = [], []
[(names.append(n) or values.append(v)) for n, v in names_and_values if v != "na"]
names.extend(values)
print ' '.join(names)
Update
Minor improvement after suggestion from Paul. I'm sure the list comprehension is fairly unpythonic, as it leverages the fact that list.append returns None, so both append expressions will be evaluated and a list of None values will be constructed and immediately thrown away.
I agree with Justin than using zip is a good idea. The problems is how to put the data into two different lists. Here is a proposal that should work ok.
reader = open('input.txt')
writer = open('output.txt', 'w')
names, nums = [], []
row = reader.read().split(' ')
x = len(row)/2
for (a, b) in [(n, v) for n, v in zip(row[:x], row[x:]) if v!='na']:
names.append(a)
nums.append(b)
writer.write(' '.join(names))
writer.write(' ')
writer.write(' '.join(nums))
#writer.write(' '.join(names+nums)) is nicer but cause list to be concat
or say you have a string which you have read from a file. Let's call this string as "s"
words = filter(lambda x: x!="na", s.split())
should give you all the strings except for "na"
edit: the code above obviously doesn't do what you want it to do.
the one below should work though
d = s.split()
keys = d[:len(d)/2]
vals = d[len(d)/2:]
w = " ".join(map(lambda (k,v): (k + " " + v) if v!="na" else "", zip(keys, vals)))
print " ".join([" ".join(w.split()[::2]), " ".join(w.split()[1::2])])
strlist = 'namenexttolast nameonemore namethelast 0 na 2'.split()
vals = ('0', '1', '2', '3', '4', 'na')
key_list = [s for s in strlist if s not in vals]
val_list = [s for s in strlist if s in vals]
#print [(key_list[i],v) for i, v in enumerate(val_list) if v != 'na']
filtered_keys = [key_list[i] for i, v in enumerate(val_list) if v != 'na']
filtered_vals = [v for v in val_list if v != 'na']
print filtered_keys + filtered_vals
If you'd rather group the vals, you could create a list of tuples instead (commented out line)
Here is a solution that uses just iterators plus a single buffer element, with no calls to len and no other intermediate lists created. (In Python 3, just use map and zip, no need to import imap and izip from itertools.)
from itertools import izip, imap, ifilter
def iterStartingAt(cond, seq):
it1,it2 = iter(seq),iter(seq)
while not cond(it1.next()):
it2.next()
for item in it2:
yield item
dataline = "namenexttolast nameonemore namethelast 0 na 2"
datalinelist = dataline.split()
valueset = set("0 1 2 3 4 na".split())
print " ".join(imap(" ".join,
izip(*ifilter(lambda (n,v): v != 'na',
izip(iter(datalinelist),
iterStartingAt(lambda s: s in valueset,
datalinelist))))))
Prints:
namenexttolast namethelast 0 2
Related
my target is:
while for looping a list I would like to check for duplicates and if there are some i would like to append a number to it see following example
my list output as an example:
[('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
in a loop i would like to edit those duplicates so instead of the 2nd microsoft i would like to have microsoft1 (if there would be 3 microsoft guys so the third guy would have microsoft2)
with this i can filter the duplicates but i dont know how to edit them directly in the list
list = [('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
names = []
double = []
for u in list[1:]:
names.append(u[1])
list_size = len(names)
for i in range(list_size):
k = i + 1
for j in range(k, list_size):
if names[i] == names[j] and names[i] not in double:
double.append(names[i])
This is one approach using collections.defaultdict.
Ex:
from collections import defaultdict
lst = [('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
seen = defaultdict(int)
result = []
for k, v in lst:
if seen[v]:
result.append((k, "{}_{}".format(v, seen[v])))
else:
result.append((k,v))
seen[v] += 1
print(result)
Output:
[('name', 'company'),
('someguy', 'microsoft'),
('anotherguy', 'microsoft_1'),
('thirdguy', 'amazon')]
I have 2 lists
mainlist=[['RD-12',12,'a'],['RD-13',45,'c'],['RD-15',50,'e']] and
sublist=[['RD-12',67],['RD-15',65]]
if i join both the list based on 1st element condition by using below code
def combinelist(mainlist,sublist):
dict1 = { e[0]:e[1:] for e in mainlist }
for e in sublist:
try:
dict1[e[0]].extend(e[1:])
except:
pass
result = [ [k] + v for k, v in dict1.items() ]
return result
Its results in like below
[['RD-12',12,'a',67],['RD-13',45,'c',],['RD-15',50,'e',65]]
as their is no element in for 'RD-13' in sublist, i want to empty string on that.
The final output should be
[['RD-12',12,'a',67],['RD-13',45,'c'," "],['RD-15',50,'e',65]]
Please help me.
Your problem can be solved using a while loop to adjust the length of your sublists until it matches the length of the longest sublist by appending the wanted string.
for list in result:
while len(list) < max(len(l) for l in result):
list.append(" ")
You could just go through the result list and check where the total number of your elements is 2 instead of 3.
for list in lists:
if len(list) == 2:
list.append(" ")
UPDATE:
If there are more items in the sublist, just subtract the lists containing the 'keys' of your lists, and then add the desired string.
def combinelist(mainlist,sublist):
dict1 = { e[0]:e[1:] for e in mainlist }
list2 = [e[0] for e in sublist]
for e in sublist:
try:
dict1[e[0]].extend(e[1:])
except:
pass
for e in dict1.keys() - list2:
dict1[e].append(" ")
result = [[k] + v for k, v in dict1.items()]
return result
You can try something like this:
mainlist=[['RD-12',12],['RD-13',45],['RD-15',50]]
sublist=[['RD-12',67],['RD-15',65]]
empty_val = ''
# Lists to dictionaries
maindict = dict(mainlist)
subdict = dict(sublist)
result = []
# go through all keys
for k in list(set(list(maindict.keys()) + list(subdict.keys()))):
# pick the value from each key or a default alternative
result.append([k, maindict.pop(k, empty_val), subdict.pop(k, empty_val)])
# sort by the key
result = sorted(result, key=lambda x: x[0])
You can set up your empty value to whatever you need.
UPDATE
Following the new conditions, it would look like this:
mainlist=[['RD-12',12,'a'], ['RD-13',45,'c'], ['RD-15',50,'e']]
sublist=[['RD-12',67], ['RD-15',65]]
maindict = {a:[b, c] for a, b, c in mainlist}
subdict = dict(sublist)
result = []
for k in list(set(list(maindict.keys()) + list(subdict.keys()))):
result.append([k, ])
result[-1].extend(maindict.pop(k, ' '))
result[-1].append(subdict.pop(k, ' '))
sorted(result, key=lambda x: x[0])
Another option is to convert the sublist to a dict, so items are easily and rapidly accessible.
sublist_dict = dict(sublist)
So you can do (it modifies the mainlist):
for i, e in enumerate(mainlist):
data: mainlist[i].append(sublist_dict.get(e[0], ""))
#=> [['RD-12', 12, 'a', 67], ['RD-13', 45, 'c', ''], ['RD-15', 50, 'e', 65]]
Or a one liner list comprehension (it produces a new list):
[ e + [sublist_dict.get(e[0], "")] for e in mainlist ]
If you want to skip the missing element:
for i, e in enumerate(mainlist):
data = sublist_dict.get(e[0])
if data: mainlist[i].append(data)
print(mainlist)
#=> [['RD-12', 12, 'a', 67], ['RD-13', 45, 'c'], ['RD-15', 50, 'e', 65]]
I have a list of numerical values that are of type "string" right now. Some of the elements in this list have more than one value, e.g.:
AF=['0.056', '0.024, 0.0235', '0.724', '0.932, 0.226, 0.634']
The other thing is that some of the elements might be a .
With that being said, I've been trying to convert the elements of this list into floats (while still conserving the tuple if there's more than one value), but I keep getting the following error:
ValueError: could not convert string to float: .
I've tried a LOT of things to solve this, with the latest one being:
for x in AF:
if "," in x: #if there are multiple values for one AF
elements= x.split(",")
for k in elements: #each element of the sub-list
if k != '.':
k= map(float, k)
print(k) #check to see if there are still "."
else:
pass
But when I run that, I still get the same error. So I printed k from the above loop and sure enough, there were still . in the list, despite me stating NOT to include those in the string-to-float conversion.
This is my desired output:
AF=[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]
def convert(l):
new = []
for line in l:
if ',' in line:
new.append([float(j) for j in line.split(',')])
else:
try:
new.append(float(line))
except ValueError:
pass
return new
>>> convert(AF)
[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]
If you try this:
result = []
for item in AF:
if item != '.':
values = list(map(float, item.split(', ')))
result.append(values)
You get:
[[0.056], [0.024, 0.0235], [0.724], [0.932, 0.226, 0.634]]
You can simplify using a comprehension list:
result = [list(map(float, item.split(', ')))
for item in AF
if item != '.']
With re.findall() function (on extended input list):
import re
AF = ['0.056', '0.024, 0.0235, .', '.', '0.724', '0.932, 0.226, 0.634', '.']
result = []
for s in AF:
items = re.findall(r'\b\d+\.\d+\b', s)
if items:
result.append(float(items[0]) if len(items) == 1 else list(map(float, items)))
print(result)
The output:
[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]
I have got two lists. The first one contains names and second one names and corresponding values. The names of the first list in a subset of the name of the name of the second lists. The values are a true or false. I want to find the co-occurrences of the names of both lists and count the true values. My code:
data1 = [line.strip() for line in open("text_files/first_list.txt", 'r')]
ins = open( "text_files/second_list.txt", "r" ) # the "r" is not really needed - default
parseTable = []
for line in ins:
row = line.rstrip().split(' ') # <- note use of rstrip()
parseTable.append(row)
new_data = []
indexes = []
for index in range(len(parseTable)):
new_data.append(parseTable[index][0])
indexes.append(parseTable[index][1])
in1 =return_indices_of_a(new_data, data1)
def return_indices_of_a(a, b):
b_set = set(b)
return [i for i, v in enumerate(a) if v in b_set] #return the co-occurrences
I am reading both text files which containing the lists, i found the co-occurrences and then I want to keep from the parseTable[][1] only the in1 indices . Am I doing it right? How can I keep the indices I want? My two lists:
['SITNC', 'porkpackerpete', 'teensHijab', '1DAlert', 'IsmodoFashion',....
[['SITNC', 'true'], ['1DFAMlLY', 'false'], ['tibi', 'true'], ['1Dneews', 'false'], ....
Here's a one liner to get the matches:
matches = [(name, dict(values)[name]) for name in set(names) if name in dict(values)]
and then to get the number of true matches:
len([name for (name, value) in matches if value == 'true'])
Edit
You might want to move dict(values) into a named variable:
value_map = dict(values)
matches = [(name, value_map[name]) for name in set(names) if name in value_map]
There are two ways, one is what Andrey suggests (you may want to convert names to set), or, alternatively, convert the second list into a dictionary:
mapping = dict(values)
sum_of_true = sum(mapping[n] for n in names)
The latter sum works because bool is essentially int in Python (True == 1).
If you need just the sum of true values, then use in operator and list comprehension:
In [1]: names = ['SITNC', 'porkpackerpete', 'teensHijab', '1DAlert', 'IsmodoFashion']
In [2]: values = [['SITNC', 'true'], ['1DFAMlLY', 'false'], ['tibi', 'true'], ['1Dneews', 'false']]
In [3]: sum_of_true = len([v for v in values if v[0] in names and v[1] == "true"])
In [4]: sum_of_true
Out[4]: 1
To get also indices of co-occurrences, this one-liner may come in handy:
In [6]: true_indices = [names.index(v[0]) for v in values if v[0] in names and v[1] == "true"]
In [7]: true_indices
Out[7]: [0]
Although poorly written, this code:
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
for i in range(len(marker_array)):
if marker_array[i-1][1] != marker_array[i][1]:
marker_array_DS.append(marker_array[i])
print marker_array_DS
Returns:
[['hard', '2', 'soft'], ['fast', '3'], ['turtle', '4', 'wet']]
It accomplishes part of the task which is to create a new list containing all nested lists except those that have duplicate values in index [1]. But what I really need is to concatenate the matching index values from the removed lists creating a list like this:
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
The values in index [1] must not be concatenated. I kind of managed to do the concatenation part using a tip from another post:
newlist = [i + n for i, n in zip(list_a, list_b]
But I am struggling with figuring out the way to produce the desired result. The "marker_array" list will be already sorted in ascending order before being passed to this code. All like-values in index [1] position will be contiguous. Some nested lists may not have any values beyond [0] and [1] as illustrated above.
Quick stab at it... use itertools.groupby to do the grouping for you, but do it over a generator that converts the 2 element list into a 3 element.
from itertools import groupby
from operator import itemgetter
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
def my_group(iterable):
temp = ((el + [''])[:3] for el in marker_array)
for k, g in groupby(temp, key=itemgetter(1)):
fst, snd = map(' '.join, zip(*map(itemgetter(0, 2), g)))
yield filter(None, [fst, k, snd])
print list(my_group(marker_array))
from collections import defaultdict
d1 = defaultdict(list)
d2 = defaultdict(list)
for pxa in marker_array:
d1[pxa[1]].extend(pxa[:1])
d2[pxa[1]].extend(pxa[2:])
res = [[' '.join(d1[x]), x, ' '.join(d2[x])] for x in sorted(d1)]
If you really need 2-tuples (which I think is unlikely):
for p in res:
if not p[-1]:
p.pop()
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
marker_array_hit = []
for i in range(len(marker_array)):
if marker_array[i][1] not in marker_array_hit:
marker_array_hit.append(marker_array[i][1])
for i in marker_array_hit:
lists = [item for item in marker_array if item[1] == i]
temp = []
first_part = ' '.join([str(item[0]) for item in lists])
temp.append(first_part)
temp.append(i)
second_part = ' '.join([str(item[2]) for item in lists if len(item) > 2])
if second_part != '':
temp.append(second_part);
marker_array_DS.append(temp)
print marker_array_DS
I learned python for this because I'm a shameless rep whore
marker_array = [
['hard','2','soft'],
['heavy','2','light'],
['rock','2','feather'],
['fast','3'],
['turtle','4','wet'],
]
data = {}
for arr in marker_array:
if len(arr) == 2:
arr.append('')
(first, index, last) = arr
firsts, lasts = data.setdefault(index, [[],[]])
firsts.append(first)
lasts.append(last)
results = []
for key in sorted(data.keys()):
current = [
" ".join(data[key][0]),
key,
" ".join(data[key][1])
]
if current[-1] == '':
current = current[:-1]
results.append(current)
print results
--output:--
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
A different solution based on itertools.groupby:
from itertools import groupby
# normalizes the list of markers so all markers have 3 elements
def normalized(markers):
for marker in markers:
yield marker + [""] * (3 - len(marker))
def concatenated(markers):
# use groupby to iterator over lists of markers sharing the same key
for key, markers_in_category in groupby(normalized(markers), lambda m: m[1]):
# get separate lists of left and right words
lefts, rights = zip(*[(m[0],m[2]) for m in markers_in_category])
# remove empty strings from both lists
lefts, rights = filter(bool, lefts), filter(bool, rights)
# yield the concatenated entry for this key (also removing the empty string at the end, if necessary)
yield filter(bool, [" ".join(lefts), key, " ".join(rights)])
The generator concatenated(markers) will yield the results. This code correctly handles the ['fast', '3'] case and doesn't return an additional third element in such cases.