1d list indexing python: enhance MaskableList - python

A common problem of mine is the following:
As input I have (n is some int >1)
W = numpy.array(...)
L = list(...)
where
len(W) == n
>> true
shape(L)[0] == n
>> true
And I want to sort the list L regarding the values of W and a comparator. My idea was to do the following:
def my_zip_sort(W,L):
    srt = argsort(W)
    return zip(L[srt],W[srt])
This should work like this:
a = ['a', 'b', 'c', 'd']
b = zeros(4)
b[0]=3;b[1]=2;b[2]=1;b[3]=4
my_zip_sort(b,a)
>> [('c', 1), ('b', 2), ('a', 3), ('d', 4)]
But this does not, because
TypeError: only integer arrays with one element can be converted to an index
thus, I need to do another loop:
def my_zip_sort(W,L):
    srt = argsort(W)
    res = list()
    for i in range(len(L)):
        res.append((L[srt[i]],W[srt[i]]))
    return res
I found a thread about a MaskableList, but this does not work for me (as you can read in the comments), because I would not only need to hold or discard particular values of my list, but also need to re-order them:
a.__class__
>> msk.MaskableList
srt = argsort(b)
a[srt]
>> ['a', 'b', 'd']
Concluding:
I want to find a way to sort a list of objects by constraints in an array. I found a way myself, which is kind of nice except for the list-indexing. Can you help me to write a class that works likewise to MaskableList for this task, which has a good performance?

You don't need to extend list to avoid the for-loop. A list comprehension is sufficient and probably the best you can do here, if you expect a new list of tuples:
def my_zip_sort(W, L):
    srt = np.argsort(W)
    return [(L[i], W[i]) for i in srt]
Example:
n = 5
W = np.random.randint(10,size=5)
L = [chr(ord('A') + i) for i in W]
L # => ['A', 'C', 'H', 'G', 'C']
srt = np.argsort(W)
result = [(L[i], W[i]) for i in srt]
print result
[('A', 0), ('C', 2), ('C', 2), ('G', 6), ('H', 7)]
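For readers on Python 3, here is a minimal sketch of the same idea applied to the data from the question. It assumes NumPy is installed. The stable sort kind and the .item() conversion are my additions, not part of the answer above: the first keeps tied weights in input order, the second turns NumPy scalars into plain Python numbers so the result prints cleanly.

```python
import numpy as np

def my_zip_sort(W, L):
    """Pair each item of L with its weight in W, ordered by ascending weight."""
    # Indices that would sort W; "stable" keeps equal weights in input order.
    srt = np.argsort(W, kind="stable")
    # A plain list cannot be indexed with an array, so index element by element.
    return [(L[i], W[i].item()) for i in srt]

W = np.array([3, 2, 1, 4])
L = ['a', 'b', 'c', 'd']
result = my_zip_sort(W, L)
print(result)  # [('c', 1), ('b', 2), ('a', 3), ('d', 4)]
```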

Related

Split tuple / collection into instances of pattern

I have a tuple [but it can be any collection] that contains elements:
tple = (1.02, 'a', 'b', 1.02, 'a', 'b')
I'm trying to find a way to count the number of times the pattern (1.02, 'a', 'b') occurs within the tuple. The pattern may not exist at all, in which case I would want to return 0.
Is there such a way?
One approach using itertools:
from itertools import tee

def wise(iterable):
    """Iterate over contiguous overlapping chunks"""
    a, b, c = tee(iterable, 3)
    next(b, None)
    next(c, None)
    next(c, None)
    return zip(a, b, c)

tple = (1.02, 'a', 'b', 1.02, 'a', 'b')
pattern = (1.02, 'a', 'b')
res = sum(i == pattern for i in wise(tple))
print(res)
Output
2
The function wise is a generalization of itertools.pairwise. For the above example it returns something similar to:
[(1.02, 'a', 'b'), ('a', 'b', 1.02), ('b', 1.02, 'a'), (1.02, 'a', 'b')]
Note that by using itertools tple can be any collection. The expression:
res = sum(i == pattern for i in wise(tple))
is equivalent to the following for-loop:
res = 0
for i in wise(tple):
    if pattern == i:
        res += 1
print(res)
If you want to iterate in chunks of different lengths, use the following general wise function:
def wise(iterable, n=3):
    """Iterate over contiguous overlapping chunks"""
    its = tee(iterable, n)
    for i, it in enumerate(its):
        for _ in range(i):
            # the default avoids StopIteration when the input is shorter than n
            next(it, None)
    return zip(*its)
UPDATE
The general function can be made linear, as suggested by @KellyBundy:
def wise(iterable, n=3):
    """Iterate over contiguous overlapping chunks"""
    its = []
    for _ in range(n):
        iterable, it = tee(iterable)
        its.append(it)
        next(iterable, None)
    return zip(*its)
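As a rough sketch, the general wise function can be wrapped so that the window size follows the pattern length. The name count_pattern is hypothetical (it does not appear in the answer), and the pattern is converted to a tuple because zip yields tuples.

```python
from itertools import tee

def wise(iterable, n=3):
    """Iterate over contiguous overlapping chunks of length n."""
    its = []
    for _ in range(n):
        iterable, it = tee(iterable)
        its.append(it)
        next(iterable, None)  # shift the remaining stream by one element
    return zip(*its)

def count_pattern(collection, pattern):
    """Count (possibly overlapping) occurrences of pattern in collection."""
    pattern = tuple(pattern)
    return sum(window == pattern for window in wise(collection, len(pattern)))

tple = (1.02, 'a', 'b', 1.02, 'a', 'b')
print(count_pattern(tple, (1.02, 'a', 'b')))  # 2
print(count_pattern(tple, (1.02,)))           # 2
print(count_pattern(tple, ('x', 'y')))        # 0
```

Because only iterators are used, collection can be a list, a generator, or any other iterable, not just a tuple.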
You can also do something like this if you're trying out for the One-Line Olympics:
>>> tple = (1.02, 'a', 'b', 1.02, 'a', 'b')
>>> pattern = (1.02, 'a', 'b')
>>> sum(tple[i:i+len(pattern)] == pattern for i in range(len(tple)-len(pattern)+1))
2
It should also work for patterns of any length:
>>> pattern = (1.02,)
>>> sum(tple[i:i+len(pattern)] == pattern for i in range(len(tple)-len(pattern)+1))
2
It will return 0 if the pattern is longer than the tuple, or if the pattern is not found:
>>> pattern = (1.02, 'a', 'b', 1.02, 'a', 'b', 'c')
>>> sum(tple[i:i+len(pattern)] == pattern for i in range(len(tple)-len(pattern)+1))
0
>>> pattern = (1.03,)
>>> sum(tple[i:i+len(pattern)] == pattern for i in range(len(tple)-len(pattern)+1))
0
Yes, there's a simple and straightforward way to do so:
# collection = ...
count = 0
for ind in range(2, len(collection)):
    if collection[ind-2] == 1.02 and collection[ind-1] == 'a' and collection[ind] == 'b':
        count += 1
print(count)
Note that this code works with any type that is subscriptable and can be passed to len(). If the collection has fewer than 3 elements, the loop body never runs and the count stays 0.

Count elements in a nested list in an elegant way

I have nested tuples in a list like
l = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'e', 'a')]
I want to know how many 'a' and 'b' in the list in total. So I currently use the following code to get the result.
amount_a_and_b = len([None for _, elem2, elem3 in l if elem2 == 'a' or elem3 == 'b'])
But I got amount_a_and_b = 1, so how to get the right answer?
Also, is there a more elegant way (less code or higher performance or using builtins) to do this?
I'd flatten the list with itertools.chain.from_iterable() and pass it to a collections.Counter() object:
from collections import Counter
from itertools import chain
counts = Counter(chain.from_iterable(l))
amount_a_and_b = counts['a'] + counts['b']
Or use sum() to count how many times a value appears in the flattened sequence:
from itertools import chain
amount_a_and_b = sum(1 for v in chain.from_iterable(l) if v in {'a', 'b'})
The two approaches are pretty much comparable in speed on Python 3.5.1 on my Macbook Pro (OS X 10.11):
>>> from timeit import timeit
>>> from collections import Counter
>>> from itertools import chain
>>> l = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'e', 'a')] * 1000 # make it interesting
>>> def counter():
...     counts = Counter(chain.from_iterable(l))
...     counts['a'] + counts['b']
...
>>> def summing():
...     sum(1 for v in chain.from_iterable(l) if v in {'a', 'b'})
...
>>> timeit(counter, number=1000)
0.5640139860006457
>>> timeit(summing, number=1000)
0.6066895100011607
You want to avoid putting data in a temporary data structure. The [...] syntax constructs a new list and fills it with the content you put in ..., after which the length of the list is taken and the list is never used again. If the list is very large, this uses a lot of memory, and it is inelegant in general. You can instead use iterators to loop over the existing data structure, e.g., like so:
sum(sum(c in ('a', 'b') for c in t) for t in l)
The c in ('a', 'b') predicate is a bool which evaluates to a 0 or 1 when cast to an int, causing the sum() to only count the tuple entry if the predicate evaluates to True.
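A small sketch of why summing booleans works, using the list from the question. In Python, bool is a subclass of int, so True counts as 1 and False as 0 when added up.

```python
l = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'e', 'a')]

# bool is a subclass of int: True behaves as 1, False as 0.
print(isinstance(True, int))  # True
print(True + True)            # 2

# Inner sum: matches within one tuple; outer sum: total over all tuples.
per_tuple = [sum(c in ('a', 'b') for c in t) for t in l]
total = sum(per_tuple)
print(per_tuple)  # [2, 1, 1]
print(total)      # 4
```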
Just for fun, functional method using reduce:
>>> l = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'e', 'a')]
>>> from functools import reduce
>>> reduce(lambda x, y: (1 if 'a' in y else 0) + (1 if 'b' in y else 0) + x, l, 0)
4
You can iterate over both the list and the sub-lists in one list comprehension:
len([i for sub_list in l for i in sub_list if i in ("a", "b")])
I think that's fairly concise.
To avoid creating a temporary list, you could use a generator expression to create a sequence of 1s and pass that to sum:
sum(1 for sub_list in l for i in sub_list if i in ("a", "b"))
Although this question already has an accepted answer, I wonder why all of the answers are so complex. I would think that this would suffice:
>>> l = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'e', 'a')]
>>> total = sum(tup.count('a') + tup.count('b') for tup in l)
>>> total
4
Or
>>> total = sum(1 for tup in l for v in tup if v in {'a', 'b'})

How to get a split up a list of numbers and insert into another list

Currently I have a file with 6 rows of numbers and each row containing 9 numbers. The point is to test each row of numbers in the file if it completes a magic square. So for example, say a row of numbers from the file is 4 3 8 9 5 1 2 7 6. The first three numbers need to be the first row in a matrix. The next three numbers need to be the second row, and same for the third.
Therefore you would need to end up with a matrix of:
[['4','3','8'],['9','5','1'],['2','7','6']]
I need to test the matrix to see if it is a valid magic square (Rows add up to 15, columns add to 15, and diagonals add to 15).
My code is currently:
def readfile(fname):
    """Return a list of lines from the file"""
    f = open(fname, 'r')
    lines = f.read()
    lines = lines.split()
    f.close()
    return lines

def assignValues(lines):
    magicSquare = []
    rows = 3
    columns = 3
    for row in range(rows):
        magicSquare.append([0] * columns)
    for row in range(len(magicSquare)):
        for column in range(len(magicSquare[row])):
            magicSquare[row][column] = lines[column]
    return magicSquare

def main():
    lines = readfile(input_fname)
    matrix = assignValues(lines)
    print(matrix)
Whenever I run my code to test it, I'm getting:
[['4', '3', '8'], ['4', '3', '8'], ['4', '3', '8']]
So as you can see I am only getting the first 3 numbers into my matrix.
Finally, my question is how would I go by continuing my matrix with the following 6 numbers of the line of numbers? I'm not sure if it is something I can do in my loop, or if I am splitting my lines wrong, or am I completely on the wrong track?
Thanks.
To test if each row in your input file contains magic square data you need to re-organize the code slightly. I've used a different technique to Francis to fill the matrix. It might be a bit harder to understand how zip(*[iter(seq)] * size) works, but it's a very useful pattern. Please let me know if you need an explanation for it.
My code uses a list of tuples for the matrix, rather than a list of lists, but tuples are more suitable here anyway, since the data in the matrix doesn't need to be modified. Also, I convert the input data from str into int, since you need to do arithmetic on the numbers to test if matrix is a magic square.
#! /usr/bin/env python

def make_square(seq, size):
    return zip(*[iter(seq)] * size)

def main():
    fname = 'mydata'
    size = 3
    with open(fname, 'r') as f:
        for line in f:
            nums = [int(s) for s in line.split()]
            matrix = make_square(nums, size)
            print matrix
            #Now call the function to test if the data in matrix
            #really is a magic square.
            #test_square(matrix)

if __name__ == '__main__':
    main()
Here's a modified version of make_square() that returns a list of lists instead of a list of tuples, but please bear in mind that a list of tuples is actually better than a list of lists if you don't need the mutability that lists give you.
def make_square(seq, size):
    square = zip(*[iter(seq)] * size)
    return [list(t) for t in square]
I suppose I should mention that there's actually only one possible 3 x 3 magic square that uses all the numbers from 1 to 9, not counting rotations and reflections. But I guess there's no harm in doing a brute-force demonstration of that fact. :)
Also, I have Python code that I wrote years ago (when I was first learning Python) which generates magic squares of size n x n for odd n >= 5. Let me know if you'd like to see it.
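The answer above leaves test_square() undefined, so here is one possible sketch of such a check, written for Python 3 (where zip returns an iterator, hence the list() call). The name is_magic and the choice to take the target sum from the first row are my assumptions, not part of the original answer.

```python
def make_square(seq, size):
    """Group a flat sequence into rows of length `size` (a list of tuples)."""
    return list(zip(*[iter(seq)] * size))

def is_magic(square):
    """Return True if all rows, columns and both diagonals share one sum."""
    size = len(square)
    target = sum(square[0])  # 15 for a 3 x 3 square using 1..9
    rows_ok = all(sum(row) == target for row in square)
    cols_ok = all(sum(col) == target for col in zip(*square))  # zip(*...) transposes
    diag_ok = sum(square[i][i] for i in range(size)) == target
    anti_ok = sum(square[i][size - 1 - i] for i in range(size)) == target
    return rows_ok and cols_ok and diag_ok and anti_ok

line = "4 3 8 9 5 1 2 7 6"
square = make_square([int(s) for s in line.split()], 3)
print(square)            # [(4, 3, 8), (9, 5, 1), (2, 7, 6)]
print(is_magic(square))  # True
```

Note that this only compares sums; it does not verify that the square uses each of the numbers 1 to 9 exactly once.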
zip and iterator objects
Here's some code that briefly illustrates what the zip() and iter() functions do.
''' Fun with zip '''
numbers = [1, 2, 3, 4, 5, 6]
letters = ['a', 'b', 'c', 'd', 'e', 'f']
#Using zip to create a list of tuples containing pairs of elements of numbers & letters
print zip(numbers, letters)
#zip works on other iterable objects, including strings
print zip(range(1, 7), 'abcdef')
#zip can handle more than 2 iterables
print zip('abc', 'def', 'ghi', 'jkl')
#zip can be used in a for loop to process two (or more) iterables simultaneously
for n, l in zip(numbers, letters):
    print n, l
#Using zip in a list comprehension to make a list of lists
print [[l, n] for n, l in zip(numbers, letters)]
#zip stops if one of the iterables runs out of elements
print [[n, l] for n, l in zip((1, 2), letters)]
print [(n, l) for n, l in zip((3, 4), letters)]
#Turning an iterable into an iterator object using the iter function
iletters = iter(letters)
#When we take some elements from an iterator object it remembers where it's up to
#so when we take more elements from it, it continues from where it left off.
print [[n, l] for n, l in zip((1, 2, 3), iletters)]
print [(n, l) for n, l in zip((4, 5), iletters)]
#This list will just contain a single tuple because there's only 1 element left in iletters
print [(n, l) for n, l in zip((6, 7), iletters)]
#Rebuild the iletters iterator object
iletters = iter('abcdefghijkl')
#See what happens when we zip multiple copies of the same iterator object.
print zip(iletters, iletters, iletters)
#It can be convenient to put multiple copies of an iterator object into a list
iletters = iter('abcdefghijkl')
gang = [iletters] * 3
#The gang consists of 3 references to the same iterator object
print gang
#We can pass each iterator in the gang to zip as a separate argument
#by using the "splat" syntax
print zip(*gang)
#A more compact way of doing the same thing:
print zip(* [iter('abcdefghijkl')]*3)
Here's the same code running in the interactive interpreter so you can easily see the output of each statement.
>>> numbers = [1, 2, 3, 4, 5, 6]
>>> letters = ['a', 'b', 'c', 'd', 'e', 'f']
>>>
>>> #Using zip to create a list of tuples containing pairs of elements of numbers & letters
... print zip(numbers, letters)
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f')]
>>>
>>> #zip works on other iterable objects, including strings
... print zip(range(1, 7), 'abcdef')
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f')]
>>>
>>> #zip can handle more than 2 iterables
... print zip('abc', 'def', 'ghi', 'jkl')
[('a', 'd', 'g', 'j'), ('b', 'e', 'h', 'k'), ('c', 'f', 'i', 'l')]
>>>
>>> #zip can be used in a for loop to process two (or more) iterables simultaneously
... for n, l in zip(numbers, letters):
...     print n, l
...
1 a
2 b
3 c
4 d
5 e
6 f
>>> #Using zip in a list comprehension to make a list of lists
... print [[l, n] for n, l in zip(numbers, letters)]
[['a', 1], ['b', 2], ['c', 3], ['d', 4], ['e', 5], ['f', 6]]
>>>
>>> #zip stops if one of the iterables runs out of elements
... print [[n, l] for n, l in zip((1, 2), letters)]
[[1, 'a'], [2, 'b']]
>>> print [(n, l) for n, l in zip((3, 4), letters)]
[(3, 'a'), (4, 'b')]
>>>
>>> #Turning an iterable into an iterator object using the iter function
... iletters = iter(letters)
>>>
>>> #When we take some elements from an iterator object it remembers where it's up to
... #so when we take more elements from it, it continues from where it left off.
... print [[n, l] for n, l in zip((1, 2, 3), iletters)]
[[1, 'a'], [2, 'b'], [3, 'c']]
>>> print [(n, l) for n, l in zip((4, 5), iletters)]
[(4, 'd'), (5, 'e')]
>>>
>>> #This list will just contain a single tuple because there's only 1 element left in iletters
... print [(n, l) for n, l in zip((6, 7), iletters)]
[(6, 'f')]
>>>
>>> #Rebuild the iletters iterator object
... iletters = iter('abcdefghijkl')
>>>
>>> #See what happens when we zip multiple copies of the same iterator object.
... print zip(iletters, iletters, iletters)
[('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i'), ('j', 'k', 'l')]
>>>
>>> #It can be convenient to put multiple copies of an iterator object into a list
... iletters = iter('abcdefghijkl')
>>> gang = [iletters] * 3
>>>
>>> #The gang consists of 3 references to the same iterator object
... print gang
[<iterator object at 0xb737eb8c>, <iterator object at 0xb737eb8c>, <iterator object at 0xb737eb8c>]
>>>
>>> #We can pass each iterator in the gang to zip as a separate argument
... #by using the "splat" syntax
... print zip(*gang)
[('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i'), ('j', 'k', 'l')]
>>>
>>> #A more compact way of doing the same thing:
... print zip(* [iter('abcdefghijkl')]*3)
[('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i'), ('j', 'k', 'l')]
>>>
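The session above is Python 2, where zip returns a list. As a hedged sketch of how the grouping idiom behaves in Python 3: zip is lazy there, so it needs list() to show its contents, and leftover elements that do not fill a whole group are dropped. Using itertools.zip_longest to keep the remainder is my addition, not something the answer mentions.

```python
from itertools import zip_longest

# Three references to ONE iterator: each tuple consumes three consecutive items.
groups = list(zip(*[iter('abcdefghijkl')] * 3))
print(groups)
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i'), ('j', 'k', 'l')]

# zip stops at the first exhausted argument, so an incomplete group is dropped.
truncated = list(zip(*[iter('abcdefg')] * 3))
print(truncated)  # [('a', 'b', 'c'), ('d', 'e', 'f')]

# zip_longest pads the last group instead of dropping it.
padded = list(zip_longest(*[iter('abcdefg')] * 3))
print(padded)  # [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', None, None)]
```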
It always gets only the first 3 values because of this line:
magicSquare[row][column] = lines[column]
The index into lines has to take the row (and the input line) into account as well, thus:
def assignValues(lines):
    rows = 3
    columns = 3
    squares = []  # one 3 x 3 matrix per line of the input file
    # since the input is already split, the size of 'lines' divided by 9
    # is equal to the number of rows of numbers in the file
    for line in range(len(lines) // 9):
        magicSquare = []
        for row in range(rows):
            magicSquare.append([0] * columns)
        for row in range(rows):
            for column in range(columns):
                # convert to int so the sums can be tested later
                magicSquare[row][column] = int(lines[(9*line)+(3*row)+column])
        squares.append(magicSquare)
    return squares
Note that (3*row)+column moves 3 positions to the right for every new row,
and that (9*line)+(3*row)+column moves 9 positions (a whole line of input) to the right for every new line.
Once you have this, you are ready to test for the magic square:
def testMagicSquare(matrix):
    # matrix is a list of 3 x 3 squares, one per line of input
    for a in range(len(matrix)):
        test1 = 1  # stays 1 only if every row sums to 15
        test2 = 1  # stays 1 only if every column sums to 15
        test3 = 0  # set to 1 if both diagonals sum to 15
        for b in range(3):
            if sum(matrix[a][b]) != 15:
                test1 = 0
            if (matrix[a][0][b] + matrix[a][1][b] + matrix[a][2][b]) != 15:
                test2 = 0
        if ((matrix[a][0][0] + matrix[a][1][1] + matrix[a][2][2]) == 15 and
                (matrix[a][0][2] + matrix[a][1][1] + matrix[a][2][0]) == 15):
            test3 = 1
        if test1 > 0 and test2 > 0 and test3 > 0:
            print('line ' + str(a) + ' is a magic square')
        else:
            print('line ' + str(a) + ' is not a magic square')

Command to choose next tuple in the list

I was wondering if there are any commands to automatically select the next item in the tuple without me having to type it out?
eg.
nul = 0
noofvalue = 5
value = ['a', 'b', 'c', 'd', 'e']
for nul < noofvalue:
file.write(value[0])
what command can i use here to add 1 to 'value' such that when the file loops, instead of using value[0], it uses value[1] instead?
nul = nul + 1
I've googled for the answer and searched, but i don't understand what they are talking about since i'm extremely new to computer coding, so please forgive my ignorance.
I think what you want is enumerate(). I'll add my own example, since your example is a bit weird:
>>> L = ['a', 'b', 'c', 'd', 'e']
>>> for index, value in enumerate(L):
...     try:
...         print L[index+1] # Prints the next item in the list
...     except IndexError:
...         print 'End of the list!'
...
b
c
d
e
End of the list!
In Python, you can iterate over a list or tuple in the same way:
for x in value:
    do_something(x)
First, value = ['a', 'b', 'c', 'd', 'e'] is not a tuple, it is a list. In Python, you can iterate with a for loop simply like this:
for v in value:
    print v # file.write(v)
(I think you have a C background, where an index is needed to access elements and iterate over arrays.)
If you also want the index, use the enumerate(any_sequence) function, which yields (index, value) pairs:
>>> list(enumerate(value))
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
so you could write:
for i, v in enumerate(value):
    print i, v
Of course, if you want to maintain the index explicitly, you can do this:
index = 0
for v in value:
    print index, v
    index += 1
but this is not the Pythonic way, so it is not preferable in general.
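To tie this back to the question, here is a minimal Python 3 sketch that writes each value in turn without any manual counter. io.StringIO is only a stand-in I chose so the example runs without touching the disk; a real file object from open() is used the same way.

```python
import io

value = ['a', 'b', 'c', 'd', 'e']

out = io.StringIO()  # stand-in for a file opened with open(..., 'w')
for v in value:
    # v is 'a' on the first pass, 'b' on the second, and so on
    out.write(v)
written = out.getvalue()
print(written)  # abcde

# With enumerate, the position is available too, starting at 0.
pairs = [(i, v) for i, v in enumerate(value)]
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
```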

Clone elements of a list

Let's say I have a Python list that looks like this:
list = [ a, b, c, d]
I am looking for the most efficient way, performance-wise, to get this:
list = [ a, a, a, a, b, b, b, c, c, d ]
So if the list is N elements long then the first element is cloned N-1 times, the second element N-2 times, and so forth... the last element is cloned N-N times or 0 times. Any suggestions on how to do this efficiently on large lists?
Note that I am testing speed, not correctness. If someone wants to edit in a unit test, I'll get around to it.
pyfunc_fastest: 152.58769989 usecs
pyfunc_local_extend: 154.679298401 usecs
pyfunc_iadd: 158.183312416 usecs
pyfunc_xrange: 162.234091759 usecs
pyfunc: 166.495800018 usecs
Ignacio: 238.87629509 usecs
Ishpeck: 311.713695526 usecs
FabrizioM: 456.708812714 usecs
JohnKugleman: 519.239497185 usecs
Bwmat: 1309.29429531 usecs
Test code here. The second revision is trash because I was rushing to get everybody tested that posted after my first batch of tests. These timings are for the fifth revision of the code.
Here's the fastest version that I was able to get.
def pyfunc_fastest(x):
    t = []
    lenList = len(x)
    extend = t.extend  # local alias avoids an attribute lookup on every iteration
    for l in xrange(0, lenList):
        extend([x[l]] * (lenList - l))
    return t
Oddly, a version that I modified to avoid indexing into the list by using enumerate ran slower than the original.
>>> items = ['a', 'b', 'c', 'd']
>>> [item for i, item in enumerate(items) for j in xrange(len(items) - i)]
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
First we use enumerate to pull out both indexes and values at the same time. Then we use a nested for loop to iterate over each item a decreasing number of times. (Notice that the variable j is never used. It is junk.)
This should be near optimal, with minimal memory usage thanks to the use of the enumerate and xrange generators.
How about this - A simple one
>>> x = ['a', 'b', 'c', 'd']
>>> t = []
>>> lenList = len(x)
>>> for l in range(0, lenList):
...     t.extend([x[l]] * (lenList - l))
...
>>> t
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
>>>
Lazy mode:
import itertools
l = ['foo', 'bar', 'baz', 'quux']
for i in itertools.chain.from_iterable(itertools.repeat(e, len(l) - i)
                                       for i, e in enumerate(l)):
    print i
Just shove it through list() if you really do need a list instead.
list(itertools.chain.from_iterable(itertools.repeat(e, len(l) - i)
                                   for i, e in enumerate(l)))
My first instinct..
l = ['a', 'b', 'c', 'd']
nl = []
i = 0
while len(l[i:])>0:
    nl.extend( [l[i]]*len(l[i:]) )
    i+=1
print nl
The trick is in using repeat from itertools
from itertools import repeat
alist = "a b c d".split()
print [ x for idx, value in enumerate(alist) for x in repeat(value, len(alist) - idx) ]
>>>['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
Use a generator: it's O(1) memory and O(N^2) cpu, unlike any solution that produces the final list which uses O(N^2) memory and cpu. This means it'll be massively faster as soon as the input list is large enough that the constructed list fills memory and swapping starts. It's unlikely you need to have the final list in memory unless this is homework.
def triangle(seq):
    for i, x in enumerate(seq):
        # len(seq) - i copies in total, matching the output shown in the question
        for _ in xrange(len(seq) - i):
            yield x
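A Python 3 sketch of the same lazy approach, assuming the output shown in the question is what is wanted (the first element appears N times in total, the last element once). It uses range because xrange no longer exists in Python 3, and itertools.islice to show that only the requested items are ever produced.

```python
from itertools import islice

def triangle(seq):
    """Lazily yield seq[0] N times, seq[1] N-1 times, ..., seq[-1] once."""
    n = len(seq)
    for i, x in enumerate(seq):
        for _ in range(n - i):
            yield x

small = list(triangle(['a', 'b', 'c', 'd']))
print(small)  # ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']

# For a large input, take just the first few items; the full result
# is never built in memory.
first_three = list(islice(triangle(range(10**6)), 3))
print(first_three)  # [0, 0, 0]
```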
To create that new list, list = [ a, a, a, a, b, b, b, c, c, d ] would require O(4n) = O(n) time since for every n elements, you are creating 4n elements in the second array. aaronasterling gives that linear solution.
You could cheat and just not create the new list. Simply, get the index value as input. Divide the index value by 4. Use the result as the index value of the original list.
In pseudocode:
function getElement(int i)
{
int trueIndex = i / 4;
return list[trueIndex]; // Note: that integer division will lead us to the correct index in the original array.
}
fwiw:
>>> lst = list('abcd')
>>> [i for i, j in zip(lst, range(len(lst), 0, -1)) for _ in range(j)]
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
def gen_indices(list_length):
    for index in range(list_length):
        for _ in range(list_length - index):
            yield index
new_list = [list[i] for i in gen_indices(len(list))]
untested but I think it'll work
