Indexing a list where there is no match in Python

I have two separate lists (of lists) that can potentially have overlapping values. Each list has a date. I would like to modify mylist1 so that the date (item[3]) is the minimum of the date from mylist1 and the corresponding date in mylist2. The problem I am running into is that when there is no match, I get an error: IndexError: list index out of range. In these cases, I would like to skip the minimum and just use the date in mylist1. Is there some sort of "if error" clause that I can put around it?
My Code:
mylist1 = [[u'AAA', None, u'111', u'1/1/2015'],
[u'BBB', None, u'222', u'1/1/2012'],
[u'CCC', None, u'333', u'1/1/2012']]
mylist2 = [(u'111', u'1/1/2011'),
(u'333', u'2013-11-10'),
(u'444', u'1/1/2017')]
for key, item in enumerate(mylist1):
    mylist1[key] = [item[0], item[1], item[2],
                    min(item[3], [x for x in mylist2 if x[0] == item[2]][0][1])]
Desired output:
[[u'AAA', None, u'111', u'1/1/2011'],
[u'BBB', None, u'222', u'1/1/2012'],
[u'CCC', None, u'333', u'1/1/2012']]

If I got this right, your mylist2 is kinda used as a dictionary. Why not just make it one:
mylist1 = [[u'AAA', None, u'111' ,u'1/1/2015'], [u'BBB', None, u'222' ,u'1/1/2012'], [u'CCC', None, u'333' ,u'1/1/2012']]
mylist2 = [(u'111', u'1/1/2011'), (u'333', u'2013-11-10'), (u'444', u'1/1/2017')]
# assuming, you are not responsible for the form of mylist2,
# we will change it into a dictionary here:
mydict = dict(mylist2) # easy :)
for elem_list in mylist1:
    elem_list[3] = min(elem_list[3], mydict.get(elem_list[2], elem_list[3]))
elem_list is a reference to the list, so we don't need the index of it in mylist1 as a change in it will persist. This makes the rest of the last line easier, as we do not have to re-build the original list. The mydict.get prevents the error, if the desired element is not available.
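One caveat with the approach above: min() on date strings compares them character by character, which only happens to give the desired result for this sample data. A minimal sketch of comparing real dates instead, assuming the dates only ever appear as M/D/YYYY or YYYY-MM-DD (parse_date is a hypothetical helper, not part of the question):

```python
from datetime import datetime

def parse_date(text):
    """Hypothetical helper: parse 'M/D/YYYY' or 'YYYY-MM-DD' (assumed formats)."""
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised date format: %r" % text)

mylist1 = [['AAA', None, '111', '1/1/2015'],
           ['BBB', None, '222', '1/1/2012'],
           ['CCC', None, '333', '1/1/2012']]
mylist2 = [('111', '1/1/2011'), ('333', '2013-11-10'), ('444', '1/1/2017')]

lookup = dict(mylist2)
for row in mylist1:
    # Fall back to the row's own date when there is no match in mylist2.
    other = lookup.get(row[2], row[3])
    # key=parse_date compares as dates but keeps the original string.
    row[3] = min(row[3], other, key=parse_date)

print(mylist1)
```

The strings themselves are left untouched; only the comparison goes through datetime.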

mylist1 = [[u'AAA', None, u'111' ,u'1/1/2015'], [u'BBB', None, u'222' ,u'1/1/2012'], [u'CCC', None, u'333' ,u'1/1/2012']]
mylist2 = [(u'111', u'1/1/2011'), (u'333', u'2013-11-10'), (u'444', u'1/1/2017')]
for key, item in enumerate(mylist1):
    try:
        mylist1[key] = [item[0], item[1], item[2], min(item[3], [x for x in mylist2 if x[0] == item[2]][0][1])]
    except IndexError:
        # No match in mylist2: put whatever you want to happen here, e.g.
        # print("An error happened but I don't care")
        pass  # or simply ignore the error, which keeps the date from mylist1
With try/except you decide what happens when an IndexError is raised.
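If you would rather not raise the exception in the first place, next() with a default can replace the `[...][0]` indexing. This is only a sketch of that alternative, reusing the sample data from the question:

```python
mylist1 = [['AAA', None, '111', '1/1/2015'],
           ['BBB', None, '222', '1/1/2012'],
           ['CCC', None, '333', '1/1/2012']]
mylist2 = [('111', '1/1/2011'), ('333', '2013-11-10'), ('444', '1/1/2017')]

for item in mylist1:
    # First matching date in mylist2, or the row's own date if nothing matches.
    match = next((date for key, date in mylist2 if key == item[2]), item[3])
    # Same string comparison as in the question.
    item[3] = min(item[3], match)

print(mylist1)
```

Because the default is the row's own date, a row without a match keeps its date unchanged.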

This is a cool way to do it if you have unique keys:
# produce dictionaries whose keys are the matching values (have to be unique)
mylist1dict = {x[2]: x for x in mylist1}
mylist2dict = {x[0]: x for x in mylist2}
# replace where needed, only use common keys
for k in set(mylist2dict) & set(mylist1dict):
    # keep the smaller of the two dates in position 3
    mylist1dict[k][3] = min(mylist1dict[k][3], mylist2dict[k][1])
# this is the result (the rows are the same list objects as in mylist1)
mylist1dict.values()

In some cases the list comprehension returns an empty list, so indexing it raises the IndexError. You can check for that with an if statement before indexing:
>>> for key, item in enumerate(mylist1):
...     matches = [x for x in mylist2 if x[0] == item[2]]
...     if matches:
...         mylist1[key] = [item[0], item[1], item[2], min(item[3], matches[0][1])]
...
>>> mylist1
[[u'AAA', None, u'111', u'1/1/2011'],
[u'BBB', None, u'222', u'1/1/2012'],
[u'CCC', None, u'333', u'1/1/2012']]

Related

Get all min values from dictionary list

I have the following snippet:
list = [{"num":1,"test":"A"},{"num":6,"test":"B"},{"num":5,"test":"c"},{"num":1,"test":"D"}]
min = None
for x in list:
    if x["num"]<min or min==None:
        min=x["num"]
print(min)
print([index for index, element in enumerate(list)
       if min == element["num"]])
Which doesn't really output anything useful, my objective was to output, as said in the title, the dictionaries with "1" in num.
A noob question I know, but this is my first contact with the language.
Thanks!
min() takes a key argument that lets you specify how to calculate the min. This will let you find an object with the min num value. You can then use that to find all of them with a list comprehension (or similar method).
l = [{"num":1,"test":"A"},{"num":6,"test":"B"},{"num":5,"test":"c"},{"num":1,"test":"D"}]
m = min(l, key=lambda d: d['num'])
# {'num': 1, 'test': 'A'}
[item for item in l if item['num'] == m['num']]
# [{'num': 1, 'test': 'A'}, {'num': 1, 'test': 'D'}]
You need to set min to an arbitrarily large number at the beginning of the program. I set it to 500 (float('inf') would be a safer choice if the values can exceed that). You then have to make sure you're checking whether the "num" value is less than or equal to min, otherwise it will not grab both 1 values.
list = [{"num":1,"test":"A"},{"num":6,"test":"B"},{"num":5,"test":"c"},{"num":1,"test":"D"}]
min = 500
for x in list:
    if x["num"]<=min or min==None:
        min=x["num"]
        print(x)
print(min)
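Note that printing inside the loop only shows the right dictionaries when the smallest value happens to come first in the list. A sketch of a single-pass version that does not depend on ordering (the names records, smallest and matches are mine, chosen to avoid shadowing the built-ins list and min):

```python
records = [{"num": 1, "test": "A"}, {"num": 6, "test": "B"},
           {"num": 5, "test": "c"}, {"num": 1, "test": "D"}]

smallest = float("inf")  # larger than any real number
matches = []
for rec in records:
    if rec["num"] < smallest:
        # New minimum found: start the list of matches over.
        smallest = rec["num"]
        matches = [rec]
    elif rec["num"] == smallest:
        # Ties with the current minimum are collected as well.
        matches.append(rec)

print(smallest)
print(matches)
```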
You can try with this:
list_=[{"num":1,"test":"A"},{"num":6,"test":"B"},{"num":5,"test":"c"},{"num":1,"test":"D"}]
min_=min(list_,key=lambda x: x["num"])
min_ = min_["num"]
l=list(filter(lambda x: x["num"]==min_,list_))
print(l)

How to structure dictionary to apply to function with enumerate

I am trying to re-build a simple function that asks for a dictionary as input. No matter what I try, I cannot figure out a minimum working example of a dictionary to pass through this function. I've read up on dictionaries and there is not much room to create one differently, hence I do not know what the problem is.
I've tried to apply following minimum dictionary examples:
import nltk
#Different dictionaries to try as minimum working examples:
comments1 = {1 : 'Rockies', 2: 'Red Sox'}
comments2 = {'key1' : 'Rockies', 'key2': 'Red Sox'}
comments3 = dict([(1, 3), (2, 3)])
#Function:
def tokenize_body(comments):
    tokens = {}
    for idx, com_id in enumerate(comments):
        body = comments[com_id]['body']
        tokenized = [x.lower() for x in nltk.word_tokenize(body)]
        tokens[com_id] = tokenized
    return tokens
tokens = tokenize_body(comments1)
I know that with enumerate I am basically calling the index and the key, I can not figure out how to call the 'body', i.e the strings that I want to tokenize.
For both comments1 and comments2 with strings as inputs I receive the error: TypeError: string indices must be integers.
If I apply integers instead of strings, comments3, I receive the error:
TypeError: 'int' object is not subscriptable.
This may seem trivial to you, but I can not figure out what I am doing wrong. If you could provide a minimum working example, that would be highly appreciated.
In order to loop through a dictionary in python, you need to use the items method to get both keys and values:
import nltk
comments = {"key1": "word", "key2": "word2"}
def tokenize_body(comments):
    tokens = {}
    for key, value in comments.items():
        # values - word, word2
        # keys - key1, key2
        tokens[key] = [x.lower() for x in nltk.word_tokenize(value)]
    return tokens
enumerate is used for lists, in order to get the index of an element:
l = ['a', 'b']
for index, elm in enumerate(l):
    print(index) # => 0, 1
You might be looking for .items(), e.g.:
for idx, item in enumerate(comments1.items()):
    print(idx, item)
This will print
0 (1, 'Rockies')
1 (2, 'Red Sox')
See a demo on ideone.com.
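For completeness, the original function indexes each value with ['body'], so it appears to expect a dictionary of dictionaries. A minimal sketch of such an input, under that assumption; the tokenize parameter is hypothetical and defaults to str.split as a stand-in for nltk.word_tokenize so the example runs without NLTK data:

```python
def tokenize_body(comments, tokenize=str.split):
    """Lower-cased tokens per comment id; expects {id: {'body': text}}."""
    tokens = {}
    for com_id, comment in comments.items():
        # Each value is itself a dict, so ['body'] is a valid lookup here.
        tokens[com_id] = [word.lower() for word in tokenize(comment['body'])]
    return tokens

# Assumed shape of the input: comment id -> dict with a 'body' string.
comments = {1: {'body': 'Go Rockies'}, 2: {'body': 'Red Sox'}}
tokens = tokenize_body(comments)
print(tokens)
```

Passing tokenize=nltk.word_tokenize should restore the original tokenizer once NLTK and its tokenizer data are installed.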

how to make list comprehension using while in loop

I have such loop:
ex = [{u'white': []},
{u'yellow': [u'9241.jpg', []]},
{u'red': [u'241.jpg', []]},
{u'blue': [u'59241.jpg', []]}]
for i in ex:
    while not len(i.values()[0]):
        break
    else:
        print i
        break
I always need to return the first dict whose value has a length greater than 0,
but I want to do it with a list comprehension.
A list comprehension would produce a whole list, while you want just one item.
Use a generator expression instead, and have the next() function iterate to the first value:
next((i for i in ex if i.values()[0]), None)
I've given next() a default to return as well; if there is no matching dictionary, None is returned instead.
Demo:
>>> ex = [{u'white': []},
... {u'yellow': [u'9241.jpg', []]},
... {u'red': [u'241.jpg', []]},
... {u'blue': [u'59241.jpg', []]}]
>>> next((i for i in ex if i.values()[0]), None)
{u'yellow': [u'9241.jpg', []]}
You should, however, rethink your data structure. Dictionaries with just one key-value pair suggest to me you wanted a different type instead; tuples perhaps:
ex = [
(u'white', []),
(u'yellow', [u'9241.jpg', []]),
(u'red', [u'241.jpg', []]),
(u'blue', [u'59241.jpg', []]),
]
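The same next() pattern carries over to the tuple layout, and to Python 3, where dict.values() returns a view that cannot be indexed with [0]. A sketch of both, using the data from the question:

```python
ex_tuples = [
    ('white', []),
    ('yellow', ['9241.jpg', []]),
    ('red', ['241.jpg', []]),
    ('blue', ['59241.jpg', []]),
]
# First pair whose second element (the list) is non-empty.
first_pair = next((pair for pair in ex_tuples if pair[1]), None)

ex_dicts = [{'white': []}, {'yellow': ['9241.jpg', []]}]
# Python 3: take the single value via iter() instead of values()[0].
first_dict = next((d for d in ex_dicts if next(iter(d.values()))), None)

print(first_pair)
print(first_dict)
```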

Python list set value at index if index does not exist

Is there a way, lib, or something in python that I can set value in list at an index that does not exist?
Something like runtime index creation at list:
l = []
l[3] = 'foo'
# [None, None, None, 'foo']
And further, with multi-dimensional lists:
l = []
l[0][2] = 'bar'
# [[None, None, 'bar']]
Or with an existing one:
l = [['xx']]
l[0][1] = 'yy'
# [['xx', 'yy']]
There isn't a built-in, but it's easy enough to implement:
class FillList(list):
    def __setitem__(self, index, value):
        try:
            super().__setitem__(index, value)
        except IndexError:
            for _ in range(index-len(self)+1):
                self.append(None)
            super().__setitem__(index, value)
Or, if you need to change existing vanilla lists:
def set_list(l, i, v):
    try:
        l[i] = v
    except IndexError:
        for _ in range(i-len(l)+1):
            l.append(None)
        l[i] = v
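A variant of the helper above, shown with the cases from the question. This is only a sketch: it pads with extend() instead of a loop, and the fill parameter is an addition that is not in the original answer.

```python
def set_list(lst, index, value, fill=None):
    """Assign lst[index] = value, padding with `fill` if the list is too short."""
    if index >= len(lst):
        # Grow the list so that `index` becomes a valid position.
        lst.extend([fill] * (index - len(lst) + 1))
    lst[index] = value

l = []
set_list(l, 3, 'foo')
print(l)

nested = [['xx']]
# For nested lists, apply the helper to the inner list.
set_list(nested[0], 1, 'yy')
print(nested)
```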
Not foolproof, but it seems like the easiest way to do this is to initialize a list much larger than you will need, i.e.
l = [None] * some_large_number
l[3] = 'foo'
# [None, None, None, 'foo', None, None, None ... ]
If you really want the syntax in your question, defaultdict is probably the best way to get it:
from collections import defaultdict
def rec_dd():
    return defaultdict(rec_dd)
l = rec_dd()
l[3] = 'foo'
print(l)
# essentially {3: 'foo'}
l = rec_dd()
l[0][2] = 'xx'
l[1][0] = 'yy'
print(l)
# (long output because of defaultdict, but essentially)
# {0: {2: 'xx'}, 1: {0: 'yy'}}
It isn't exactly a 'list of lists' but it works more or less like one.
You really need to specify the use case though... the above has some advantages (you can access indices without checking whether they exist first), and some disadvantages. For example, l[2] on a normal dict will raise a KeyError, but on a defaultdict it just creates a blank defaultdict, adds it, and then returns it.
Other possible implementations to support different syntactic sugars could involve custom classes etc, and will have other tradeoffs.
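The side effect of reading a missing key can be seen in a short sketch (rec_dd is the same factory as in the answer; the rest is illustrative):

```python
from collections import defaultdict

def rec_dd():
    return defaultdict(rec_dd)

l = rec_dd()
l[0][2] = 'xx'

before = 2 in l   # key 2 has never been touched
_ = l[2]          # merely reading l[2] inserts an empty defaultdict
after = 2 in l

plain = {}
try:
    plain[2]
    raised = False
except KeyError:
    # A regular dict refuses the lookup instead of creating the key.
    raised = True

print(before, after, raised)
```

So a membership test or .get() is the safer way to probe a defaultdict without growing it.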
You cannot create a list with gaps. You could use a dict or this quick little guy:
def set_list(i,v):
    l = []
    x = 0
    while x < i:
        l.append(None)
        x += 1
    l.append(v)
    return l
print(set_list(3, 'foo'))
# [None, None, None, 'foo']

Getting the first non None value from list

Given a list, is there a way to get the first non-None value? And, if so, what would be the pythonic way to do so?
For example, I have:
a = objA.addreses.country.code
b = objB.country.code
c = None
d = 'CA'
In this case, if a is None, then I would like to get b. If a and b are both None, then I would like to get d.
Currently I am doing something along the lines of (((a or b) or c) or d), is there another way?
You can use next():
>>> a = [None, None, None, 1, 2, 3, 4, 5]
>>> next(item for item in a if item is not None)
1
If the list contains only Nones, it will raise a StopIteration exception. If you want a default value in this case, do this:
>>> a = [None, None, None]
>>> next((item for item in a if item is not None), 'All are Nones')
'All are Nones'
I think this is the simplest way when dealing with a small set of values:
firstVal = a or b or c or d
This will always return the first non-"falsey" value, which works in some cases (provided you don't expect any values that could evaluate to false, as @GrannyAching points out below).
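To make that caveat concrete, here is a small sketch with made-up values in which 0 is a legitimate result: the `or` chain skips it, while an explicit `is not None` test keeps it.

```python
a, b, c, d = None, 0, None, 5

# `or` skips every falsy value, so the legitimate 0 is lost.
via_or = a or b or c or d

# An explicit None check returns the first value that is not None.
via_next = next((v for v in (a, b, c, d) if v is not None), None)

print(via_or, via_next)
```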
first_true is an itertools recipe found in the Python 3 docs:
def first_true(iterable, default=False, pred=None):
    """Returns the first true value in the iterable.
    If no true value is found, returns *default*
    If *pred* is not None, returns the first item
    for which pred(item) is true.
    """
    # first_true([a,b,c], x) --> a or b or c or x
    # first_true([a,b], x, f) --> a if f(a) else b if f(b) else x
    return next(filter(pred, iterable), default)
One may choose to implement the latter recipe or import more_itertools, a library that ships with itertools recipes and more:
> pip install more_itertools
Use:
import more_itertools as mit
a = [None, None, None, 1, 2, 3, 4, 5]
mit.first_true(a, pred=lambda x: x is not None)
# 1
a = [None, None, None]
mit.first_true(a, default="All are None", pred=lambda x: x is not None)
# 'All are None'
Why use the predicate?
"First non-None" item is not the same as "first True" item, e.g. [None, None, 0] where 0 is the first non-None, but it is not the first True item. The predicate allows first_true to be useable, ensuring any first seen, non-None, falsey item in the iterable is still returned (e.g. 0, False) instead of the default.
a = [None, None, None, False]
mit.first_true(a, default="All are None", pred=lambda x: x is not None)
# False
When the items in your list are expensive to calculate such as in
first_non_null = next((calculate(x) for x in my_list if calculate(x)), None)
# or, when receiving possibly None-values from a dictionary for each list item:
first_non_null = next((my_dict[x] for x in my_list if my_dict.get(x)), None)
then you might want to avoid the repetitive calculation and simplify to:
first_non_null = next(filter(bool, map(calculate, my_list)), None)
# or:
first_non_null = next(filter(bool, map(my_dict.get, my_list)), None)
Because map() and filter() return lazy iterators in Python 3, the calculations are only executed for the items up to and including the first one that produces a truthy value.
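The laziness can be checked with a small sketch in Python 3; calculate here is a hypothetical stand-in that records each call:

```python
calls = []

def calculate(x):
    """Stand-in for an expensive function; records every call."""
    calls.append(x)
    return x if x > 2 else None

my_list = [1, 2, 3, 4, 5]
first_non_null = next(filter(bool, map(calculate, my_list)), None)

print(first_non_null)
print(calls)  # items after the first truthy result are never computed
```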
Adapt from the following (you could one-liner it if you wanted):
values = (a, b, c, d)
not_None = (el for el in values if el is not None)
value = next(not_None, None)
This takes the first non None value, or returns None instead.
First of all, I want to mention that such a function exists in SQL and is called coalesce. I found no such thing in Python, so I made my own, using the recipe from @alecxe.
def first_not_none(*values):
    return next((v for v in values if v is not None), None)
Really helps in cases like this:
attr = 'title'
document[attr] = first_not_none(cli_args.get(attr), document_item.get(attr),
defaults_item.get(attr), '')
