Convert list to list of ids - python

Python 3.3:
What is the easiest way to obtain from the list:
input = ["A", 112, "apple", 74, 112]
following list:
output = [0, 1, 2, 3, 1]
That is, assign an automatically incremented id, starting from 0, to every unique entry, and convert the original list to the list of these ids.
I am aware that I can cheaply obtain the number of classes with
number_of_classes = len(set(input))
But how to create correctly ordered output?

You could use list comprehension to create a list of indexes of when the element first occurs in that list.
For an input list i = ["A", 112, "apple", 74, 112]:
>>> [i.index(value) for value in i]
[0, 1, 2, 3, 1]

In addition to @ajcr's answer, which is fine for small lists, here is a solution with linear computational complexity (using list.index() is O(n^2)):
data = ["A", 112, "apple", 74, 112]
index = {val: i for i, val in reversed(list(enumerate(data)))}
indexes = [index[x] for x in data]
indexed = [(x, index[x]) for x in data]
print(index)
print(indexes)
print(indexed)
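One detail worth checking: a mapping from value to first-occurrence index matches the requested ids only while no duplicate appears before a new value. For ["A", "A", "B"] it gives [0, 0, 2], whereas consecutive ids would be [0, 0, 1]. A minimal sketch of a single forward pass that hands out consecutive ids, assuming the items are hashable (the name to_ids is made up for illustration):

```python
def to_ids(items):
    """Map each item to a 0-based id assigned in order of first appearance."""
    ids = {}
    # setdefault stores len(ids) only when the item is new,
    # so new items receive 0, 1, 2, ... in the order they are first seen.
    return [ids.setdefault(item, len(ids)) for item in items]


print(to_ids(["A", 112, "apple", 74, 112]))  # [0, 1, 2, 3, 1]
print(to_ids(["A", "A", "B"]))               # [0, 0, 1]
```

This stays linear in the length of the list, since each dictionary lookup is constant time on average.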

Just keep another list with the keys and use the list.index() method to get the index of the item:
input = ["A", 112, "apple", 74, 112]
keys = []
output = []
for item in input:
    if item not in keys:
        keys.append(item)
    output.append(keys.index(item))
print(output)

Using dictionary:
input = ["A", 112, "apple", 74, 112]
dictMap = dict((i[1],i[0]) for i in enumerate(set(input)))
print([dictMap[i] for i in input])
Output (the ids follow the set's arbitrary iteration order, so they can differ between runs):
[0, 1, 3, 2, 1]

>>> input = ["A", 112, "apple", 74, 112]
>>> my_dict = {x:i for i,x in enumerate(sorted(set(input),key=input.index))} # sorting is needed because a set doesn't remember order
>>> my_dict
{'A': 0, 112: 1, 74: 3, 'apple': 2}
>>> [ my_dict[x] for x in input ]
[0, 1, 2, 3, 1]

Related

find local maxima and their indices in a python list

I have a large data set and I am trying to find the local maxima and their indices. I have made it to get the local maxima but can't find a way to get their indices.
The thing is that I need the maxima only for positive values and we should ignore the negative values. In other words, it would be like splitting the list in several segments with positive values and getting those maxima.
For example, the list can be something like this:
test_data = [2, 35, -45, 56, 5, 67, 21, 320, 55, -140, -45, -98, -23, -23, 35, 67, 34, -30, -86, 4, -93, 35, 88, 32, 98, -6]
My code is:
```
def global_peaks(test_data):
    counter1 = []
    index = []
    global_peak = []
    global_idx = []
    for idx, data in enumerate(test_data):
        if data > 0:
            counter1.append(data)
            index.append(idx)
        else:
            if(len(counter1) != 0):
                global_peak.append(max(counter1))
                index.append(idx)
                global_idx.append(index)
                counter1.clear()
                index.clear()
    return global_peak, global_idx

global_peaks(test_data)
```
```
([35, 320, 67, 4, 98], [[], [], [], [], []])
```
The results are correct when it comes to the values, but not the indices.
def global_peaks(test_data):
    counter1 = []
    index = []
    global_peak = []
    global_idx = []
    for idx, data in enumerate(test_data):
        if data > 0:
            counter1.append(data)
            index.append(idx)
        else:
            if(len(counter1) != 0):
                global_peak.append(max(counter1))
                index.append(idx)
                global_idx.append(index)
                counter1.clear()
                index.clear()
                index.append(1) # <-- for demonstration
    return global_peak, global_idx
global_peaks(test_data)
# Output
([35, 320, 67, 4, 98], [[1], [1], [1], [1], [1]])
One problem comes with appending a list (global_idx.append(index)), which is a mutable object. You are appending a reference to this list so your output will show whatever is within this list at the moment of outputting.
What you need to use instead is a copy of that list (index.copy()), though this still does not give you the result you need.
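To see the difference between appending a reference and appending a copy in isolation, here is a small sketch. The variable names are made up for illustration and are not part of the original function:

```python
shared = []
by_reference = []
by_copy = []

for n in range(3):
    shared.append(n)
    by_reference.append(shared)    # stores the same list object every time
    by_copy.append(shared.copy())  # stores a snapshot of the current contents
    shared.clear()                 # empties the one shared list

print(by_reference)  # [[], [], []]
print(by_copy)       # [[0], [1], [2]]
```

Every entry of by_reference is the same object as shared, so clearing shared empties all of them at once, while the copies keep what they held at the time they were taken.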
Why should these lines give you the corresponding index?
index.append(idx)
global_idx.append(index)
This should give you its corresponding index:
max_idx = index[counter1.index(max(counter1))]
One comment on your general approach: Be aware that if you have 2 or more local maxima within a region of only positive values, you would only find a single one. Is that really what you want?
The full code looks like this:
def global_peaks(test_data):
    counter1 = []
    index = []
    global_peak = []
    global_idx = []
    for idx, data in enumerate(test_data):
        if data > 0:
            counter1.append(data)
            index.append(idx)
        else:
            if(len(counter1) != 0):
                global_peak.append(max(counter1))
                max_idx = index[counter1.index(max(counter1))] # <- changed
                global_idx.append(max_idx) # <- changed
                counter1.clear()
                index.clear()
    return global_peak, global_idx
# Output
global_peaks(test_data)
([35, 320, 67, 4, 98], [1, 7, 15, 19, 24])
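As an alternative sketch (not the approach used in the question), itertools.groupby can split the data into runs of positive and non-positive values, which also handles a positive run at the very end of the list. The loop-based version only records a peak when it meets a non-positive value, so a trailing positive run would be dropped. The function name positive_run_peaks is made up for illustration:

```python
from itertools import groupby


def positive_run_peaks(values):
    """Return (peaks, indices): the maximum of each run of positive values
    and the position of that maximum in the original list."""
    peaks, indices = [], []
    # Group consecutive (index, value) pairs by whether the value is positive.
    for is_positive, run in groupby(enumerate(values), key=lambda pair: pair[1] > 0):
        if is_positive:
            idx, val = max(run, key=lambda pair: pair[1])  # first maximum on ties
            peaks.append(val)
            indices.append(idx)
    return peaks, indices


test_data = [2, 35, -45, 56, 5, 67, 21, 320, 55, -140, -45, -98, -23, -23,
             35, 67, 34, -30, -86, 4, -93, 35, 88, 32, 98, -6]
print(positive_run_peaks(test_data))
# ([35, 320, 67, 4, 98], [1, 7, 15, 19, 24])
```

Like the original, this reports a single maximum per positive run, so several local maxima within one run collapse to the largest.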

Convert list of lists to unique dictionary in python

I have the following list of lists: [[100, 23], [101, 34], [102, 35] ... ]
How can I convert it to a dictionary like this: {100: 23, 101: 34, 102: 35 ... }
Here is what I tried:
myDict = dict(map(reversed, myList))
This code works, but it gives the opposite of what I want in the dictionary: {23: 100, 34: 101, ...}
You can pass a list with this format to dict() and it will convert it for you.
myList = [[100, 23], [101, 34], [102, 35]]
myDict = dict(myList)
print(myDict)
According to the docs page for dict():
. . . the positional argument must be an iterable object. Each item in the iterable must itself be an iterable with exactly two objects. The first object of each item becomes a key in the new dictionary, and the second object the corresponding value.
Here is an example from the same doc page:
...
>>> d = dict([('two', 2), ('one', 1), ('three', 3)])
returns a dictionary equal to {"one": 1, "two": 2, "three": 3}
L = []  # your list of [key, value] pairs
A = {}
for x, y in L:
    A[x] = y
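Since the title asks for a dictionary with unique keys, it may help to see what happens when a key repeats. A minimal sketch with made-up sample data (the extra [100, 99] pair is an assumption added for illustration): dict() keeps the last value seen for a key, while a loop with setdefault keeps the first.

```python
pairs = [[100, 23], [101, 34], [102, 35], [100, 99]]

# dict() overwrites earlier values, so the last pair for a key wins.
last_wins = dict(pairs)

# setdefault only stores a value when the key is not present yet,
# so the first pair for a key wins.
first_wins = {}
for key, value in pairs:
    first_wins.setdefault(key, value)

print(last_wins)   # {100: 99, 101: 34, 102: 35}
print(first_wins)  # {100: 23, 101: 34, 102: 35}
```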

How to iterate unequal nested lists to create a new list Python

I'm stuck on iterating over several nested lists in order to calculate call options using a Python module, mibian.
I use mibian to calculate made-up European call options:
import mibian as mb
mb.BS([stock price, strike price, interest rate, days to maturity], volatility)
my_list = [[20, 25, 30, 35, 40, 45],
[50, 52, 54, 56, 58, 60, 77, 98, 101],
[30, 40, 50, 60]]
To calculate multiple call options, I first create a range.
If I select, say, the first nested list, my_list[0], and run a for-loop, I get all the call options for the stock.
range_list = list(range(len(my_list)))
range_list
# [0, 1, 2]
data = dict()
for x in range_list:
    data[x] = option2 = []
    for i in my_list[0]:
        c = mb.BS([120, i, 1, 20 ], 10)
        option2.append(c.callPrice)
option2
This gives the 6 call prices of the first nested list from my_list.
Output:
[100.01095590221843,
95.013694877773034,
90.016433853327641,
85.019172828882233,
80.021911804436854,
75.024650779991447]
What I'm trying to figure out, is how I can iterate all the nested lists in one go, and get a new list of nested lists that contain the call option prices for my_list[0], my_list[1], and my_list[2].
I'd like this output in one go for all three nested lists.
Output:
[[100.01095590221843,
  95.013694877773034,
  90.016433853327641,
  85.019172828882233,
  80.021911804436854,
  75.024650779991447],
 [70.027389755546068,
  68.028485345767905,
  66.029580935989742,
  64.030676526211579,
  62.03177211643343,
  60.032867706655267,
  43.042180223540925,
  22.05368392087027,
  19.055327306203068],
 [90.016433853327641,
  80.021911804436854,
  80.021911804436854,
  70.027389755546068]]
Can anyone help? I'm sure it's something very simple that I'm missing.
Many thanks.
P.S. I can't get the indentation right when editing my code on here.
Let's start with your current approach:
range_list = list(range(len(my_list)))
data = dict()
for x in range_list:
    data[x] = option2 = []
    for i in my_list[0]:
        c = mb.BS([120, i, 1, 20 ], 10)
        option2.append(c.callPrice)
The first thing you should note is that there is enumerate to get the index and the part at the same time, so you can omit the range_list variable:
data = dict()
for x, sublist in enumerate(my_list):
    data[x] = option2 = []
    for i in my_list[0]:
        c = mb.BS([120, i, 1, 20 ], 10)
        option2.append(c.callPrice)
This also takes care of the problem with the "dynamic indexing" because you can just iterate over the sublist:
data = dict()
for x, sublist in enumerate(my_list):
    data[x] = option2 = []
    for i in sublist:
        c = mb.BS([120, i, 1, 20 ], 10)
        option2.append(c.callPrice)
Then you can use a list comprehension to replace the inner loop:
data = dict()
for x, sublist in enumerate(my_list):
    data[x] = [mb.BS([120, i, 1, 20 ], 10).callPrice for i in sublist]
and if you feel like you want this shorter (not recommended but some like it) then use a dict comprehension instead of the outer loop:
data = {x: [mb.BS([120, i, 1, 20 ], 10).callPrice for i in sublist]
        for x, sublist in enumerate(my_list)}
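The question asks for a list of lists rather than a dictionary, which a nested list comprehension produces directly. The sketch below shows only the iteration pattern: call_price is a hypothetical stand-in (simple intrinsic value with an assumed spot price of 120) so the example runs without mibian installed. In real use the stand-in would be replaced by mb.BS([120, strike, 1, 20], 10).callPrice as in the code above.

```python
def call_price(strike, spot=120):
    # Hypothetical stand-in for mb.BS([spot, strike, 1, 20], 10).callPrice;
    # it returns the intrinsic value only and is NOT a Black-Scholes price.
    return max(spot - strike, 0)


my_list = [[20, 25, 30, 35, 40, 45],
           [50, 52, 54, 56, 58, 60, 77, 98, 101],
           [30, 40, 50, 60]]

# The outer comprehension walks the sublists, the inner one walks the strikes,
# so the result keeps the shape of my_list even though the sublists differ in length.
prices = [[call_price(strike) for strike in sublist] for sublist in my_list]

print([len(row) for row in prices])  # [6, 9, 4]
print(prices[0])                     # [100, 95, 90, 85, 80, 75]
```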
Provided that
my_nested_list = [[1,2,3], [4,5,6,7], [8,9]]
[i for i in my_nested_list]
returns
[[1, 2, 3], [4, 5, 6, 7], [8, 9]]
something along the lines of
my_list = [[20, 25, 30, 35, 40, 45], [50, 52, 54, 56, 58, 60, 77, 98, 101],
[30, 40, 50, 60]]
[[mb.BS([120, i, 1, 20 ], 10).callPrice for i in sublist] for sublist in my_list]
should return what you expect.

Can I use the slice method to return a list that excludes ranges in the middle of the original list?

Is there a way to slice through a whole list while excluding a range of values or multiple range of values in the middle of the list?
For example:
list = [1,2,3,4,5,6,7,8,9,0]
print list[......] #some code inside
I'd like the above code to print the list while excluding a range of values so the output would be: [1,2,3,8,9,0] or excluding multiple value ranges so the output would be: [1,2,6,7,0] by using the slice notation or any other simple method you can suggest.
Use list comprehensions:
>>> mylist = [1,2,3,4,5,6,7,8,9,0]
>>> print [i for i in mylist if i not in xrange(4,8)]
[1, 2, 3, 8, 9, 0]
Or if you want to exclude numbers in two different ranges:
>>> print [i for i in mylist if i not in xrange(4,8) and i not in xrange(1,3)]
[3, 8, 9, 0]
By the way, it's not good practice to name a list list. It is already a built-in function/type.
If the list was unordered and was a list of strings, you can use map() along with sorted():
>>> mylist = ["2", "5", "3", "9", "7", "8", "1", "6", "4"]
>>> print [i for i in sorted(map(int,mylist)) if i not in xrange(4,8)]
[1, 2, 3, 8, 9]
>>> nums = [1,2,3,4,5,6,7,8,9,0]
>>> exclude = set(range(4, 8))
>>> [n for n in nums if n not in exclude]
[1, 2, 3, 8, 9, 0]
Another example
>>> exclude = set(range(4, 8) + [1] + range(0, 2))
>>> [n for n in nums if n not in exclude]
[2, 3, 8, 9]
Using a method, and an exclude list
def method(l, exclude):
    return [i for i in l if not any(i in x for x in exclude)]
r = method(range(100), [range(5,10), range(20,50)])
print r
>>>
[0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
My example uses ranges with ints. But this method can be any list of items, with any number of exclude lists with other items, as long as the items have an equals comparison.
Edit:
A much faster method:
def method2(l, exclude):
    '''
    l is a list of items; exclude is a list of items, or a list of lists of items.
    Return the items of l that do not appear in exclude.
    '''
    if exclude and isinstance(exclude[0], (list, set)):
        # flatten the nested exclude lists into a single set
        x = set(i for j in exclude for i in j)
    else:
        x = set(exclude)
    return [i for i in l if i not in x]
Given my_list = [1,2,3,4,5,6,7,8,9,0], in one line, with enumerate() and range() (or xrange() in Python 2.x):
[n for i, n in enumerate(my_list) if i not in range(3, 7)]
I was wondering if using this was also valid:
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
print my_list[:3] + my_list[7:]
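Concatenating two slices is valid, but note that it excludes by position, whereas the comprehensions that test `i not in range(4, 8)` exclude by value. On the list from the question the two happen to agree because the values are in order. A small sketch with a shuffled list (sample data made up for illustration) shows the difference:

```python
data = [5, 6, 7, 1, 2, 3, 4, 8, 9, 0]

# Exclude by value: drops every element whose value is 4, 5, 6 or 7.
by_value = [x for x in data if x not in range(4, 8)]

# Exclude by position: drops whatever sits at indexes 3, 4, 5 and 6.
by_position = data[:3] + data[7:]

print(by_value)     # [1, 2, 3, 8, 9, 0]
print(by_position)  # [5, 6, 7, 8, 9, 0]
```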
Here's a function that takes multiple slice objects and gives you a list containing only the items included by those slices. You can exclude items by specifying around what you want to exclude.
from itertools import chain
def sliceAndDice(sequence, *slices):
    # s (rather than slice) is the loop variable so the built-in slice is not shadowed
    return list(chain(*[sequence[s] for s in slices]))
So, If you have the list [0,1,2,3,4,5,6,7,8,9] and you want to exclude the 4,5,6 in the middle, you could do this:
sliceAndDice([0,1,2,3,4,5,6,7,8,9], slice(0,4), slice(7,None))
This would return [0, 1, 2, 3, 7, 8, 9].
It works with lists of things that aren't numbers: sliceAndDice(['Amy','John','Matt','Joey','Melissa','Steve'], slice(0,2), slice(4,None)) will leave out 'Matt' and 'Joey', resulting in ['Amy', 'John', 'Melissa', 'Steve']
It won't work right if you pass in slices that are out of order or overlapping.
It also creates the whole list at once. A better (but more complicated) solution would be to create an iterator class that iterates over only the items you wish to include. The solution here is good enough for relatively short lists.
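A lazy variant does not necessarily need a custom iterator class. One possible sketch, assuming non-negative slice bounds (itertools.islice does not accept negative indexes), chains islice objects so that items are produced one at a time. The name iter_slices is made up for illustration:

```python
from itertools import chain, islice


def iter_slices(sequence, *slices):
    """Lazily yield the items selected by each slice, in the order given."""
    return chain.from_iterable(
        islice(sequence, s.start, s.stop, s.step) for s in slices
    )


numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
kept = iter_slices(numbers, slice(0, 4), slice(7, None))
print(list(kept))  # [0, 1, 2, 3, 7, 8, 9]
```

No intermediate list is built, but each islice still steps through the skipped items from the start of the sequence, so this trades memory for some extra iteration.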

remove None value from a list without removing the 0 value

This was my source I started with.
My List
L = [0, 23, 234, 89, None, 0, 35, 9]
When I run this:
L = filter(None, L)
I get these results:
[23, 234, 89, 35, 9]
But this is not what I need; what I really need is:
[0, 23, 234, 89, 0, 35, 9]
Because I'm calculating a percentile of the data, the 0 values make a lot of difference.
How to remove the None value from a list without removing 0 value?
>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> [x for x in L if x is not None]
[0, 23, 234, 89, 0, 35, 9]
Just for fun, here's how you can adapt filter to do this without using a lambda, (I wouldn't recommend this code - it's just for scientific purposes)
>>> from operator import is_not
>>> from functools import partial
>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> list(filter(partial(is_not, None), L))
[0, 23, 234, 89, 0, 35, 9]
A list comprehension is likely the cleanest way:
>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> [x for x in L if x is not None]
[0, 23, 234, 89, 0, 35, 9]
There is also a functional programming approach but it is more involved:
>>> from operator import is_not
>>> from functools import partial
>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> list(filter(partial(is_not, None), L))
[0, 23, 234, 89, 0, 35, 9]
Using list comprehension this can be done as follows:
l = [i for i in my_list if i is not None]
The value of l is:
[0, 23, 234, 89, 0, 35, 9]
For Python 2.7 (See Raymond's answer, for Python 3 equivalent):
Wanting to know whether something "is not None" is so common in python (and other OO languages), that in my Common.py (which I import to each module with "from Common import *"), I include these lines:
def exists(it):
    return (it is not None)
Then to remove None elements from a list, simply do:
filter(exists, L)
I find this easier to read, than the corresponding list comprehension (which Raymond shows, as his Python 2 version).
@jamylak's answer is quite nice; however, if you don't want to import a couple of modules just to do this simple task, write your own lambda in place:
>>> L = [0, 23, 234, 89, None, 0, 35, 9]
>>> filter(lambda v: v is not None, L)
[0, 23, 234, 89, 0, 35, 9]
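The session above shows Python 2 behaviour, where filter() returns a list. In Python 3 it returns a lazy iterator, so the result has to be wrapped in list() to print or reuse it. A small sketch of that difference, with variable names chosen for illustration:

```python
L = [0, 23, 234, 89, None, 0, 35, 9]

lazy = filter(lambda v: v is not None, L)  # Python 3: an iterator, nothing computed yet
first_pass = list(lazy)                    # consumes the iterator
second_pass = list(lazy)                   # the iterator is now exhausted

print(first_pass)   # [0, 23, 234, 89, 0, 35, 9]
print(second_pass)  # []
```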
Iteration versus space usage could be an issue. In different situations profiling may show either approach to be "faster" and/or "less memory" intensive.
# first
>>> L = [0, 23, 234, 89, None, 0, 35, 9, ...]
>>> [x for x in L if x is not None]
[0, 23, 234, 89, 0, 35, 9, ...]
# second
>>> L = [0, 23, 234, 89, None, 0, 35, 9, ...]
>>> for i in range(L.count(None)): L.remove(None)
>>> L
[0, 23, 234, 89, 0, 35, 9, ...]
The first approach (as also suggested by @jamylak, @Raymond Hettinger, and @Dipto) creates a duplicate list in memory, which could be costly in memory for a large list with few None entries.
The second approach goes through the list once, and then again each time until a None is reached. This could be less memory intensive, and the list will get smaller as it goes. The decrease in list size could have a speed up for lots of None entries in the front, but the worst case would be if lots of None entries were in the back.
The second approach would likely always be slower than the first approach. That does not make it an invalid consideration.
Parallelization and in-place techniques are other approaches, but each has its own complications in Python. Knowing the data and the runtime use cases, as well as profiling the program, is where to start for intensive operations or large data.
Choosing either approach will probably not matter in common situations. It becomes more of a preference of notation. In fact, in those uncommon circumstances, numpy (for example, if L is a numpy.array: L = L[L != numpy.array(None)]) or cython may be worthwhile alternatives instead of attempting to micromanage Python optimizations.
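For completeness, an in-place removal does not have to rescan the list for every None. A possible sketch (not taken from any of the answers here) compacts the list in one pass with a write position and then truncates the tail. The function name remove_none_in_place is made up for illustration:

```python
def remove_none_in_place(items):
    """Remove every None from items without building a second list."""
    write = 0
    for value in items:
        if value is not None:
            # write never runs ahead of the read position, so no unread
            # element is overwritten while the loop is still iterating.
            items[write] = value
            write += 1
    del items[write:]  # drop the leftover tail in one step
    return items


L = [0, 23, 234, 89, None, 0, 35, 9]
remove_none_in_place(L)
print(L)  # [0, 23, 234, 89, 0, 35, 9]
```

Whether this beats the list comprehension in practice would need profiling on the actual data, as noted above.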
Say the list is like below
iterator = [None, 1, 2, 0, '', None, False, {}, (), []]
This will return only those items whose bool(item) is True
print filter(lambda item: item, iterator)
# [1, 2]
This is equivalent to
print [item for item in iterator if item]
To just filter None:
print filter(lambda item: item is not None, iterator)
# [1, 2, 0, '', False, {}, (), []]
Equivalent to:
print [item for item in iterator if item is not None]
To get all the items that evaluate to False
print filter(lambda item: not item, iterator)
# Will print [None, 0, '', None, False, {}, (), []]
from operator import is_not
from functools import partial
filter_null = partial(filter, partial(is_not, None))
# A test case
L = [1, None, 2, None, 3]
L = list(filter_null(L))
If it is a list of lists, you could modify @Raymond's answer:
L = [ [None], [123], [None], [151] ]
no_none_val = list(filter(None.__ne__, [x[0] for x in L] ) )
For Python 2, however:
no_none_val = [x[0] for x in L if x[0] is not None]
""" Both return [123, 151] """
The general pattern is: [variable[0] for variable in L if variable[0] is not None]
L = [0, 23, 234, 89, None, 0, 35, 9]
result = list(filter(lambda x: x is not None, L))
If the list has NoneType and pandas._libs.missing.NAType objects, then use (with pandas imported as pd):
[i for i in lst if pd.notnull(i)]
