Create list based on index of existing list - python

I know how to create a new list based on the values of an existing list, eg casting
numspec = [float(x) for x in textspec]
Now I have a list of numbers where I need to subtract a value based on the index of a list. I have calculated an a and b value and ended up doing
peakadj = []
for i in range(len(peakvalues)):
    val = peakvalues[i] - (i*a + b)
    peakadj.append(val)
This works, but I don't like the feel of it. Is there a more Pythonic way of doing this?

Use the builtin enumerate function and a list comprehension.
peakadj = [val-(i*a+b) for i, val in enumerate(peakvalues)]

Perhaps faster:
from itertools import count
peakadj = [val-iab for val, iab in zip(peakvalues, count(b, a))]
Or:
from itertools import count
from operator import sub
peakadj = [*map(sub, peakvalues, count(b, a))]
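A minimal sketch for checking that the three variants agree, using made-up integer inputs (the values of peakvalues, a and b here are assumptions, not the asker's data). One caveat: itertools.count builds its values by repeated addition, so with a floating-point a the count-based versions can differ from i*a+b in the last few bits.

```python
from itertools import count
from operator import sub

# Hypothetical sample data; integers keep the comparison exact.
peakvalues = [10, 20, 30, 40]
a, b = 2, 1

# Index-based version from the first answer.
with_enumerate = [val - (i*a + b) for i, val in enumerate(peakvalues)]

# count(b, a) yields b, b+a, b+2a, ... and zip/map stop at the shorter input.
with_zip = [val - iab for val, iab in zip(peakvalues, count(b, a))]
with_map = [*map(sub, peakvalues, count(b, a))]

print(with_enumerate)  # [9, 17, 25, 33]
```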

Related

python efficient way to compare nested lists and append matches to new list

I wish to compare two nested lists. If there is a match between the first element of each sublist, I wish to add the matched element to a new list for further operations. Below is an example and what I've tried so far:
Example:
x = [['item1','somethingelse1'], ['item2', 'somethingelse2']...]
y = [['item1','somethingelse3'], ['item3','somethingelse4']...]
What I've tried so far:
match = []
for itemx in x:
    for itemy in y:
        if itemx[0] == itemy[0]:
            match.append(itemx)
The code above does append the matched items to the new list, but I have two very long nested lists, and this approach is very slow on them. Is there a more efficient way to get the matching items between two nested lists?
Yes, use a data structure with constant-time membership testing. So, using a set, for example:
seen = set()
for first, _ in x:
    seen.add(first)
matched = []
for first, _ in y:
    if first in seen:
        matched.append(first)
Or, more succinctly using set/list comprehensions:
seen = {first for first,_ in x}
matched = [first for first,_ in y if first in seen]
(This was before the OP changed the question from append(itemx[0]) to append(itemx)...)
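For the edited question, which appends the whole sublist itemx rather than just its first element, the same set-based idea can be sketched as follows. The sample data is made up. Note that, unlike the nested loops, this appends each item of x at most once even if its key occurs several times in y.

```python
# Hypothetical sample data in the shape described in the question.
x = [['item1', 'somethingelse1'], ['item2', 'somethingelse2']]
y = [['item1', 'somethingelse3'], ['item3', 'somethingelse4']]

# Build the set of first elements of y once; membership tests are then O(1) on average.
keys_in_y = {item[0] for item in y}

# Keep the full sublists of x whose first element also appears in y.
match = [item for item in x if item[0] in keys_in_y]
print(match)  # [['item1', 'somethingelse1']]
```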
>>> {a[0] for a in x} & {b[0] for b in y}
{'item1'}
Or if the inner lists are always pairs:
>>> dict(x).keys() & dict(y)
{'item1'}
IIUC using numpy:
import numpy as np
y=[l[0] for l in y]
x=np.array(x)
x[np.isin(x[:, 0], y)]

Python List Indexing or Appending?

What is the best way to add values to a list in terms of processing time and memory usage, and which is generally the better programming practice?
list = []
for i in anotherArray:
    list.append(i)
or
list = range(len(anotherArray))
for i in list:
    list[i] = anotherArray[i]
Assume that anotherArray is, for example, an array of tuples. (This is just a simple example.)
It really depends on your use case. There is no generic answer here as it depends on what you are trying to do.
In your example, it looks like you are just trying to create a copy of the array, in which case the best way to do this would be to use copy:
from copy import copy
list = copy(anotherArray)
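A small sketch of what copy does for a list, with made-up data: it creates a new outer list (the same result as list(anotherArray) or anotherArray[:]), but the elements themselves are shared rather than duplicated. The name new_list is an assumption used here instead of list, to avoid shadowing the builtin.

```python
from copy import copy

# Hypothetical sample data.
anotherArray = [(1, 'a'), (2, 'b')]
new_list = copy(anotherArray)

same_contents = new_list == anotherArray          # True: equal element by element
same_object = new_list is anotherArray            # False: a distinct list object
shared_items = new_list[0] is anotherArray[0]     # True: shallow copy shares elements

new_list.append((3, 'c'))                         # does not touch the original
print(len(anotherArray), len(new_list))           # 2 3
```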
If you are trying to transform the array into another array you should use list comprehension.
list = [i[0] for i in anotherArray] # get the first item from tuples in anotherArray
If you are trying to use both indexes and objects, you should use enumerate:
for i, j in enumerate(list):
    ...  # i is the index, j is the value
which is much better than your second example.
You can also use generators, lambdas, maps, filters, etc. All of these possibilities exist because each is "better" for different reasons. The writers of Python are pretty big on "one right way", so trust me, if there were one generic way that was always better, it would be the only way that existed in Python.
Edit: I ran a performance test for a tuple swap. Here are the results (total seconds for 200 runs, as reported by timeit):
comprehension: 2.682028295999771
enumerate: 5.359116118001111
for in append: 4.177091988000029
for in indexes: 4.612594166001145
As you can tell, comprehension is usually the best bet. Using enumerate is expensive.
Here is the code for the above test:
from timeit import timeit
some_array = [(i, 'a', True) for i in range(0,100000)]
def use_comprehension():
    return [(b, a, i) for i, a, b in some_array]
def use_enumerate():
    lst = []
    for j, k in enumerate(some_array):
        i, a, b = k
        lst.append((b, a, i))
    return lst
def use_for_in_with_append():
    lst = []
    for i in some_array:
        i, a, b = i
        lst.append((b, a, i))
    return lst
def use_for_in_with_indexes():
    lst = [None] * len(some_array)
    for j in range(len(some_array)):
        i, a, b = some_array[j]
        lst[j] = (b, a, i)
    return lst
print('comprehension:', timeit(use_comprehension, number=200))
print('enumerate:', timeit(use_enumerate, number=200))
print('for in append:', timeit(use_for_in_with_append, number=200))
print('for in indexes:', timeit(use_for_in_with_indexes, number=200))
Edit2:
It was pointed out to me that the OP just wanted to know the difference between "indexing" and "appending". Really, those serve two different use cases as well. Indexing is for replacing objects, whereas appending is for adding. However, in a case where the list starts empty, appending will always be better because indexing has the overhead of creating the list initially. You can see from the results above that indexing is slightly slower, mostly because you have to create the first list.
The best way is a list comprehension:
my_list=[i for i in anotherArray]
But depending on your problem, you can use a generator expression, which is more efficient than a list comprehension when you just want to loop over the items and don't need list operations such as indexing or len():
my_list=(i for i in anotherArray)
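One thing to keep in mind with the generator version, sketched here with made-up data: a generator expression produces its items lazily and can be consumed only once, and it supports neither len() nor indexing. The name my_gen is an assumption used to keep the two variants apart.

```python
# Hypothetical sample data.
anotherArray = [(1, 'a'), (2, 'b'), (3, 'c')]

my_list = [i for i in anotherArray]   # list: reusable, supports len() and indexing
my_gen = (i for i in anotherArray)    # generator: lazy, single pass

first_pass = list(my_gen)             # consumes the generator
second_pass = list(my_gen)            # nothing left to yield

print(len(my_list))            # 3
print(first_pass == my_list)   # True
print(second_pass)             # []
```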
I would actually say the best is a combination of index loops and value loops with enumeration:
for i, j in enumerate(list): # i is the index, j is the value, can't go wrong

Pythonic way of handling string formatted loop

I have the following code, and the idea is to be able to iterate over root for each string in some_list[j]. The goal is to stay away from nested for loops and learn a more pythonic way of doing this. I would like to return each value for the first item in some_list then repeat for the next item in some_list.
for i, value in enumerate(root.iter('{0}'.format(some_list[j]))):
    return value
Any ideas?
EDIT: root is
tree = ElementTree.parse(self._file)
root = tree.getroot()
I think what you're trying to do is this:
values = ('{0}'.format(root.iter(item)) for item in some_list)
for i, value in enumerate(values):
    # ...
But really, '{0}'.format(foo) is silly; it's just going to do the same thing as str(foo) but more slowly and harder to understand. Most likely you already have strings, so all you really need is:
values = (root.iter(item) for item in some_list)
for i, value in enumerate(values):
    # ...
You could merge those into a single line, or replace the genexpr with map(root.iter, some_list), etc., but that's the basic idea.
At any rate, there are no nested loops here. There are two loops, but they're just interleaving—you're still only running the inner code once for each item in some_list.
So, given List A, which contains multiple lists, you want to return the first element of each list? I may not be understanding you correctly, but if so you can use a list comprehension...very "Pythonic" ;)
In [1]: some_list = [[1,2,3],[4,5,6],[7,8,9]]
In [2]: new = [x[0] for x in some_list]
In [3]: new
Out[3]: [1, 4, 7]
You could try using chain() from the itertools module. It would look something like:
from itertools import chain
all_items = chain(*[root.iter('{0}'.format(x)) for x in some_list])
for i, value in enumerate(all_items):
    return value # or yield i, value, or whatever you need to do with value
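A self-contained sketch of the chain idea, with a small made-up XML document standing in for the asker's file (the tag names in some_list are assumptions too). It uses chain.from_iterable instead of argument unpacking, so the per-tag iterators are consumed lazily, one after another.

```python
import xml.etree.ElementTree as ET
from itertools import chain

# Hypothetical document; the real root comes from ElementTree.parse(...).getroot().
root = ET.fromstring("<root><a>1</a><b>2</b><a>3</a><c><b>4</b></c></root>")
some_list = ["a", "b"]

# One root.iter(tag) per tag, flattened into a single stream of elements.
all_items = chain.from_iterable(root.iter(tag) for tag in some_list)

# All <a> elements in document order first, then all <b> elements.
texts = [element.text for element in all_items]
print(texts)  # ['1', '3', '2', '4']
```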

What is the simplest and most efficient function to return a sublist based on an index list?

Say I have a list l:
['a','b','c','d','e']
and a list of indexes idx:
[1,3]
What is the simplest and most efficient function that will return:
['b','d']
Try using this:
[l[i] for i in idx]
You want operator.itemgetter.
In my first example, I'll show how you can use itemgetter to construct a callable which you can use on any indexable object:
from operator import itemgetter
items = itemgetter(1,3)
items(yourlist) #('b', 'd')
Now I'll show how you can use argument unpacking to store your indices as a list
from operator import itemgetter
a = ['a','b','c','d','e']
idx = [1,3]
items = itemgetter(*idx)
print(items(a))  # ('b', 'd')
Of course, this gives you a tuple, not a list, but it's trivial to construct a list from a tuple if you really need to.
Here is an option using a list comprehension:
[v for i, v in enumerate(l) if i in idx]
This will be more efficient if you convert idx to a set first.
An alternative with operator.itemgetter:
import operator
operator.itemgetter(*idx)(l)
As noted in comments, [l[i] for i in idx] will probably be your best bet here, unless idx may contain indices greater than the length of l, or if idx is not ordered and you want to keep the same order as l.
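The approaches above can be compared side by side in a short sketch, using the sample list from the question. It also shows two edge cases: the enumerate filter follows the order of l and silently skips out-of-range indices, and itemgetter with exactly one index returns the bare item rather than a tuple.

```python
from operator import itemgetter

l = ['a', 'b', 'c', 'd', 'e']
idx = [1, 3]

by_comprehension = [l[i] for i in idx]
by_itemgetter = list(itemgetter(*idx)(l))   # itemgetter returns a tuple here

# Filtering with enumerate: result is in the order of l, not of the indices,
# and an index that is out of range (99) is simply never matched.
idx_set = {3, 1, 99}
by_filter = [v for i, v in enumerate(l) if i in idx_set]

# Caveat: with a single index, itemgetter returns the item itself.
single = itemgetter(*[1])(l)

print(by_comprehension, by_itemgetter, by_filter, single)
```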

python: how to know the index when you randomly select an element from a sequence with random.choice(seq)

I know very well how to select a random item from a list with random.choice(seq) but how do I know the index of that element?
import random
l = ['a','b','c','d','e']
i = random.choice(range(len(l)))
print(i, l[i])
You could first choose a random index, then get the list element at that location to have both the index and value.
>>> import random
>>> a = [1, 2, 3, 4, 5]
>>> index = random.randint(0,len(a)-1)
>>> index
0
>>> a[index]
1
You can do it using the randrange function from the random module:
import random
l = ['a','b','c','d','e']
i = random.randrange(len(l))
print(i, l[i])
The most elegant way to do so is random.randrange:
index = random.randrange(len(MY_LIST))
value = MY_LIST[index]
One can also do this in python3, less elegantly (but still better than .index) with random.choice on a range object:
index = random.choice(range(len(MY_LIST)))
value = MY_LIST[index]
The only valid solutions are this solution and the random.randint solutions.
The ones which use list.index not only are slow (O(N) per lookup rather than O(1); gets really bad if you do this for each element, you'll have to do O(N^2) comparisons) but ALSO you will have skewed/incorrect results if the list elements are not unique.
One would think that this is slow, but it turns out to only be slightly slower than the other correct solution random.randint, and may be more readable. I personally consider it more elegant because one doesn't have to do numerical index fiddling and use unnecessary parameters as one has to do with randint(0,len(...)-1), but some may consider this a feature, though one needs to know the randint convention of an inclusive range [start, stop].
Proof of speed for random.choice: The only reason this works is that the range object is OPTIMIZED for indexing. As proof, you can do random.choice(range(10**12)); if it iterated through the entire list your machine would be slowed to a crawl.
edit: I had overlooked randrange because the docs seemed to say "don't use this function" (but actually meant "this function is pythonic, use it"). Thanks to martineau for pointing this out.
You could of course abstract this into a function:
def randomElement(sequence):
    index = random.randrange(len(sequence))
    return index, sequence[index]
i,value = randomElement(range(10**15)) # try THAT with .index, heh
# (don't, your machine will die)
# use xrange if using python2
# i,value = (268840440712786, 268840440712786)
If the values are unique in the sequence, you can always say: list.index(value)
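A short sketch of why the uniqueness condition matters, using made-up data with repeated values: drawing the index first always gives a matching index and value, whereas list.index returns the position of the first equal element, which may not be the position that was drawn.

```python
import random

# Hypothetical data with duplicates.
seq = ['a', 'b', 'a', 'c', 'a']

# Draw the index first; the value then follows from it.
index = random.randrange(len(seq))
value = seq[index]

# index() always reports the first match, so 'a' maps to 0
# even when the random draw landed on position 2 or 4.
first_a = seq.index('a')
print(index, value, first_a)
```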
Using randrange() as has been suggested is a great way to get the index. By building a dictionary with a comprehension, you can reduce this code to one line, as shown below. Note that since this dictionary has only one element, popitem() returns the index and value together as a tuple.
import random
letters = "abcdefghijklmnopqrstuvwxyz"
# dictionary created via comprehension
idx, val = {i: letters[i] for i in [random.randrange(len(letters))]}.popitem()
print("index {} value {}".format(idx, val))
We can also use the sample() method if you want to randomly select n elements from a list:
import random
l, n = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 2
index_list = random.sample(range(len(l)), n)
index_list will have unique indexes.
I prefer sample() over choices() because sample() draws without replacement, so no index is picked twice.
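A minimal sketch of the difference, reusing the list from this answer: sampling from range(len(l)) gives n distinct indexes that can be paired with their values, while choices draws with replacement and may return the same index more than once. The names pairs and with_repeats are assumptions added for illustration.

```python
import random

l, n = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 2

# Without replacement: n distinct indexes.
index_list = random.sample(range(len(l)), n)
pairs = [(i, l[i]) for i in index_list]

# With replacement: n indexes, possibly with repeats.
with_repeats = random.choices(range(len(l)), k=n)

print(pairs, with_repeats)
```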
