Splitting a list by first character of each element - python

I have a Python list mylist in which each element is a sublist containing a single string made up of a letter and a number. How can I split mylist by the first character of each string without writing a separate statement or case for each character?
Say I want to split mylist into lists a, b, c:
mylist = [['a1'],['a2'],['c1'],['b1']]
a = [['a1'],['a2']]
b = [['b1']]
c = [['c1']]
It is important that I keep them as a list-of-lists (even though it's only a single element in each sublist).

This will work:
import itertools as it
mylist = [['a1'],['a2'],['c1'],['b1']]
keyfunc = lambda x: x[0][0]
mylist = sorted(mylist, key=keyfunc)
a, b, c = [list(g) for k, g in it.groupby(mylist, keyfunc)]
The line where sorted() is used is necessary only if the elements in mylist are not already sorted by the character at the start of the string.
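To see why the sort matters: itertools.groupby only merges adjacent elements that share a key, so an unsorted list can yield the same key more than once. A minimal sketch, using a reordered copy of mylist as assumed sample data (the names first_char, unsorted_keys and sorted_keys are illustrative):

```python
from itertools import groupby

# Assumed sample data: same items as mylist, but not sorted by first character.
mylist = [['a1'], ['c1'], ['a2'], ['b1']]
first_char = lambda x: x[0][0]

# Without sorting, 'a' shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(mylist, first_char)]
# After sorting by the same key, each character forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(mylist, key=first_char), first_char)]

print(unsorted_keys)  # ['a', 'c', 'a', 'b']
print(sorted_keys)    # ['a', 'b', 'c']
```

With the unsorted list there are four groups, so unpacking into a, b, c would raise a ValueError.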
EDIT :
As pointed out in the comments, a more general solution (one that does not restrict the number of variables to just three) would be using dictionary comprehensions (available in Python 2.7+) like this:
result_dict = {k: list(g) for k, g in it.groupby(mylist, keyfunc)}
Now the answer is keyed in the dictionary by the first character:
result_dict['a']
> [['a1'],['a2']]
result_dict['b']
> [['b1']]
result_dict['c']
> [['c1']]

Using a dictionary could work too
mylist = [['a1'],['a2'],['c1'],['b1']]
from collections import defaultdict
dicto = defaultdict(list)
for ele in mylist:
    dicto[ele[0][0]].append(ele)
Result:
>>> dicto
defaultdict(<type 'list'>, {'a': [['a1'], ['a2']], 'c': [['c1']], 'b': [['b1']]})
It does not give the exact result you were asking for; however, it is quite easy to access a list of lists associated with each letter
>>> dicto['a']
[['a1'], ['a2']]
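If separate variables are still wanted, the grouped lists can be unpacked from the dictionary afterwards. A sketch, assuming exactly three distinct first characters are present (the name groups is illustrative):

```python
from collections import defaultdict

mylist = [['a1'], ['a2'], ['c1'], ['b1']]

groups = defaultdict(list)
for ele in mylist:
    groups[ele[0][0]].append(ele)

# Unpack in sorted key order; this raises ValueError unless there are exactly three keys.
a, b, c = (groups[key] for key in sorted(groups))
print(a, b, c)
```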

You can also get these sublists by using a simple function:
def get_items(mylist, letter):
    return [item for item in mylist if item[0][0] == letter]
The expression item[0][0] takes the first character of the first element of the current item. You can then call the function for each letter:
a = get_items(mylist, 'a')
b = get_items(mylist, 'b')
c = get_items(mylist, 'c')
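To avoid writing one call per letter by hand, the letters can be collected from the data first. A sketch that repeats get_items so it runs on its own; the names letters and by_letter are assumptions for illustration:

```python
def get_items(mylist, letter):
    return [item for item in mylist if item[0][0] == letter]

mylist = [['a1'], ['a2'], ['c1'], ['b1']]

# Distinct first characters found in the data, in sorted order.
letters = sorted({item[0][0] for item in mylist})
# One get_items call per distinct letter.
by_letter = {letter: get_items(mylist, letter) for letter in letters}
print(by_letter)
```

Note that this scans mylist once per distinct letter, so for many letters a single-pass dictionary is likely to be cheaper.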

Related

How can I take specific items from a list of lists?

For example, we have:
L = [["Ak","154"],["Bm","200"],["Ck","250"], ["Ad","500"],["Ac","600"]]
I want to select the sublists whose first element starts with 'A' and collect their values, which are the second elements. The output should look like
["154","500","600"] or like [["154"],["500"],["600"]]
Filter and map with a list comprehension:
[b for a, b in L if a[0] == "A"]
Or, if you need to search for prefixes of more than one character:
[b for a, b in L if a.startswith("A")]
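str.startswith also accepts a tuple of prefixes, which helps when several prefixes should match. A short sketch using the sample list; the chosen prefixes and the names values and nested are only for illustration:

```python
L = [["Ak", "154"], ["Bm", "200"], ["Ck", "250"], ["Ad", "500"], ["Ac", "600"]]

# Match any of several prefixes by passing a tuple to startswith.
values = [b for a, b in L if a.startswith(("Ak", "Ac"))]
# Wrap each value in its own list for the nested output format from the question.
nested = [[b] for a, b in L if a.startswith("A")]

print(values)  # ['154', '600']
print(nested)  # [['154'], ['500'], ['600']]
```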
Another solution, using the map() and filter() functions:
L = [["Ak","154"],["Bm","200"],["Ck","250"], ["Ad","500"],["Ac","600"]]
k = list(map(lambda y: y[1], filter(lambda x: x[0][0] == 'A', L)))
Output:
['154', '500', '600']

Filter list based on the element in nested list at specific index

I have a list of lists containing:
[['4.2','3.4','G'],['2.4','1.2','H'],['8.7','5.4','G']]
and I want to obtain the values from each sublist by referring to the letter in its third position.
For example, I want Python to print the elements marked by the letter 'G' for every item in the list of lists.
output = [4.2,3.4]
[8.7,5.4]
Here's what I've tried:
L = [['4.2','3.4','G'],['2.4','1.2','H'],['8.7','5.4','G']]
newList = []
for line in L:
    if line[0][2] == 'G'
        newList.append([float(i) for i in line[0:2]])
print(newList)
My error would be on line 5, as I'm not sure whether I can do it this way. Regards.
Simple list comprehension:
L = [['4.2','3.4','G'],['2.4','1.2','H'],['8.7','5.4','G']]
newList = [l[0:2] for l in L if l[2] == 'G']
print(newList)
The output:
[['4.2', '3.4'], ['8.7', '5.4']]
I would suggest using a collections.defaultdict, as a multi-value dictionary:
from collections import defaultdict
d = defaultdict(list)
for x in L:
    d[x[2]].append(x[:2])
Now you can use d['G'] to get what you wanted, but also d['H'] to get the result for 'H'!
Edit: Source append multiple values for one key in Python dictionary
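The same grouping can convert the number strings to floats along the way, matching the output format shown in the question. A sketch under that assumption, written for Python 3 because it uses starred unpacking in the for loop:

```python
from collections import defaultdict

L = [['4.2', '3.4', 'G'], ['2.4', '1.2', 'H'], ['8.7', '5.4', 'G']]

d = defaultdict(list)
for *numbers, letter in L:
    # numbers holds everything before the trailing letter, e.g. ['4.2', '3.4'].
    d[letter].append([float(n) for n in numbers])

print(d['G'])  # [[4.2, 3.4], [8.7, 5.4]]
```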
There are 2 issues in your code:
1. line = ['4.2', '3.4', 'G'] on the 1st iteration,
so to check for 'G', test line[2] == 'G' instead of line[0][3] == 'G'.
2. Use 'G' instead of 'house'.
>>> for line in L:
...     if line[2] == 'G':
...         newList.append([float(i) for i in line[0:2]])
...
>>> newList
[[4.2, 3.4], [8.7, 5.4]]
You can use a dictionary, built by iterating over the list of lists:
lst = [['4.2','3.4','G'],['2.4','1.2','H'],['8.7','5.4','G']]
dict1 = {}
for l in lst:
    if l[2] in dict1:
        dict1[l[2]].append(l[0:2])
    else:
        dict1[l[2]] = [l[0:2]]
    print(l[0:2])  # show each pair as it is processed
print(dict1['G'])
A list comprehension will do this:
newList = [[float(j) for j in i[:-1]] for i in L if i[2]=='G']
@Electric, your edited code:
L = [['4.2','3.4','G'],['2.4','1.2','H'],['8.7','5.4','G']]
newList = []
# line = ['4.2','3.4','G']
for line in L:
    if line[2] == 'G':  # ':' was missing.
        newList.append(line[:2])  # line[:2] => ['4.2','3.4']
print(newList)
You may create a function that returns the sub-lists for a given letter, using a list comprehension together with map:
def get_element_by_alpha(alpha, data_list):
    # map returns an iterator in Python 3.x, hence the conversion to `list`;
    # map(float, ...) converts each number string to `float`
    return [list(map(float, s[:2])) for s in data_list if s[2] == alpha]
Sample Runs:
>>> my_list = [['4.2','3.4','G'],['2.4','1.2','H'],['8.7','5.4','G']]
>>> get_element_by_alpha('G', my_list)
[[4.2, 3.4], [8.7, 5.4]]
>>> get_element_by_alpha('H', my_list)
[[2.4, 1.2]]
>>> get_element_by_alpha('A', my_list) # 'A' not in the list
[]

Pair strings in list based on containing text in Python

I'm looking to take a list of strings and create a list of tuples that groups items based on whether they contain the same text.
For example, say I have the following list:
MyList=['Apple1','Pear1','Apple3','Pear2']
I want to pair them based on all but the last character of their string, so that I would get:
ListIWant=[('Apple1','Apple3'),('Pear1','Pear2')]
We can assume that only the last character of the string is used to identify. Meaning I'm looking to group the strings by the following unique values:
>>> list(set([x[:-1] for x in MyList]))
['Pear', 'Apple']
In [69]: from itertools import groupby
In [70]: MyList=['Apple1','Pear1','Apple3','Pear2']
In [71]: [tuple(v) for k, v in groupby(sorted(MyList, key=lambda x: x[:-1]), lambda x: x[:-1])]
Out[71]: [('Apple1', 'Apple3'), ('Pear1', 'Pear2')]
Consider this code:
def alphagroup(lst):
    # Group the strings by their lower-cased first letter.
    results = {}
    for i in lst:
        letter = i[0].lower()
        if letter not in results:
            results[letter] = [i]
        else:
            results[letter].append(i)
    output = []
    for k in results:
        res = results[k]
        output.append(res)
    return output
arr = ["Apple1", "Pear", "Apple2", "Pack"]
print(alphagroup(arr))
This groups the strings by their first letter, which gives the result you want as long as each group starts with a different letter. If each element must be a tuple, use the tuple() builtin to convert each group. Hope this helps; I tested the code.
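For the grouping the question describes (everything but the last character), a dictionary keyed on s[:-1] avoids the sort that groupby needs. A sketch for Python 3.7+, where dicts keep insertion order; the name pair_by_stem is hypothetical:

```python
from collections import defaultdict

def pair_by_stem(strings):
    """Group strings by all but their last character, in first-seen order."""
    groups = defaultdict(list)
    for s in strings:
        groups[s[:-1]].append(s)
    return [tuple(group) for group in groups.values()]

MyList = ['Apple1', 'Pear1', 'Apple3', 'Pear2']
print(pair_by_stem(MyList))  # [('Apple1', 'Apple3'), ('Pear1', 'Pear2')]
```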

Python:reduce list but keep details

Say I have a list of items, some of which are similar up to a point
but then differ by a number after a dot:
['abc.1',
'abc.2',
'abc.3',
'abc.7',
'xyz.1',
'xyz.3',
'xyz.11',
'ghj.1',
'thj.1']
I want to produce from this list a new list that collapses the multiples but preserves some of their data, namely the number suffixes,
so the above list should produce a new list:
[('abc',('1','2','3','7')),
('xyz',('1','3','11')),
('ghj',('1',)),
('thj',('1',))]
What I have thought is that the first list can be split by the dot into pairs,
but then how do I group the pairs by the first part without losing the second?
I'm sorry if this question is noobish, and thanks in advance.
...
Wow, I didn't expect so many great answers so fast, thanks.
from collections import defaultdict
d = defaultdict(list)
for el in elements:  # elements is the list of strings from the question
    key, nr = el.split(".")
    d[key].append(nr)
# convert the dict back to a list of (key, values) pairs
newlist = list(d.items())
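To get the tuple-of-suffixes shape from the question, each value list can be converted afterwards. A sketch assuming every item contains exactly one dot and that dicts keep insertion order (Python 3.7+); elements here is the sample list from the question:

```python
from collections import defaultdict

elements = ['abc.1', 'abc.2', 'abc.3', 'abc.7',
            'xyz.1', 'xyz.3', 'xyz.11', 'ghj.1', 'thj.1']

d = defaultdict(list)
for el in elements:
    key, nr = el.split(".")  # raises ValueError if there is not exactly one dot
    d[key].append(nr)

# Turn each list of suffixes into a tuple, keeping first-seen key order.
newlist = [(key, tuple(numbers)) for key, numbers in d.items()]
print(newlist)
```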
Map the list with a separator function, use itertools.groupby with a key that takes the first element, and collect the second element into the result.
from itertools import groupby
list1 = ["abc.1", "abc.2", "abc.3", "abc.7", "xyz.1", "xyz.3", "xyz.11", "ghj.1", "thj.1"]
def break_up(s):
    a, b = s.split(".")
    return a, int(b)
def prefix(broken_up): return broken_up[0]
def suffix(broken_up): return broken_up[1]
result = []
# groupby only merges adjacent items, so list1 must already be grouped by prefix
for key, sub in groupby(map(break_up, list1), prefix):
    result.append((key, tuple(map(suffix, sub))))
print(result)
Output:
[('abc', (1, 2, 3, 7)), ('xyz', (1, 3, 11)), ('ghj', (1,)), ('thj', (1,))]

Get list based on occurrences in unknown number of sublists

I'm looking for a way to turn a list of lists (a below) into a single list (b below) under 2 conditions:
The order of the new list (b) is based on the number of times each value occurs across the lists in a.
A value can only appear once.
Basically turn a into b:
a = [[1,2,3,4], [2,3,4], [4,5,6]]
# value 4 occurs 3 times in list a and gets first position
# value 2 occurs 2 times in list a and get second position and so on...
b = [4,2,3,1,5,6]
I figure one could do this with set and some list magic, but I can't get my head around it when a can contain any number of lists. The a list is created from user input (I guess it can contain between 1 and 20 lists with up to 200-300 items in each list).
I'm trying something along the lines of [set(l) for l in a], but I don't know how to perform set(l) & set(l).... to get all matched items.
Is it possible without having a for loop iterate sublist count * items in sublist times?
I think this is probably the closest you're going to get:
from collections import defaultdict
d = defaultdict(int)
for sub in a:
    for val in sub:
        d[val] += 1
print(sorted(d.keys(), key=lambda k: d[k], reverse=True))
# Output: [4, 2, 3, 1, 5, 6]
There is an off chance that the order of elements that appear an identical number of times may be indeterminate - the output of d.keys() is not ordered.
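On Python 3.7+, collections.Counter gives a deterministic order for ties: most_common() lists elements with equal counts in the order they were first encountered. A sketch of the same counting idea under that assumption:

```python
from collections import Counter
from itertools import chain

a = [[1, 2, 3, 4], [2, 3, 4], [4, 5, 6]]

# Count every value across all sublists in a single pass.
counts = Counter(chain.from_iterable(a))
# most_common() sorts by count, highest first; ties keep first-seen order.
b = [value for value, _ in counts.most_common()]
print(b)  # [4, 2, 3, 1, 5, 6]
```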
import itertools
all_items = set(itertools.chain(*a))
b = sorted(all_items, key = lambda y: -sum(x.count(y) for x in a))
Try this -
a = [[1,2,3,4], [2,3,4], [4,5,6]]
s = set()
for l in a:
    s.update(l)
print(s)
#set([1, 2, 3, 4, 5, 6])
b = list(s)
This adds each list to the set, which gives you a unique set of all elements in all the lists, if that is what you are after.
Edit. To preserve the order of elements in the original list, you can't use sets.
a = [[1,2,3,4], [2,3,4], [4,5,6]]
b = []
for l in a:
    for i in l:
        if i not in b:
            b.append(i)
print(b)
#[1,2,3,4,5,6] - The same order as the set in this case, since that's the order they appear in the list
import itertools
from collections import defaultdict
def list_by_count(lists):
    data_stream = itertools.chain.from_iterable(lists)
    counts = defaultdict(int)
    for item in data_stream:
        counts[item] += 1
    return [item for (item, count) in
            sorted(counts.items(), key=lambda x: (-x[1], x[0]))]
Having x[0] in the sort key ensures that items with the same count come out in ascending order of value.
