Remove Duplicate Items in Dictionary


I'm trying to remove duplicate items from a list by using a dictionary:
def RemoveDuplicates(list):
    d = dict()
    for i in xrange(0, len(list)):
        dict[list[i]] = 1  # <------- error here
    return d.keys()
But it raises the following error:
TypeError: 'type' object does not support item assignment
What is the problem?

You should have written:
d[list[i]] = 1
But why not do this?
def RemoveDuplicates(l):
    return list(set(l))
Also, don't use built-in function names as variable names. It can lead to confusing bugs.
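To illustrate the kind of bug that shadowing a built-in name can cause, here is a minimal sketch. The function names are hypothetical and chosen only for this example; they are not from the question.

```python
def remove_duplicates_shadowing(list):
    # The parameter name 'list' hides the built-in type inside this function,
    # so list(...) below tries to call the argument itself.
    return list(set(list))


def remove_duplicates(items):
    # With a different parameter name, the built-in list is still reachable.
    return list(set(items))


try:
    remove_duplicates_shadowing([1, 2, 2, 3])
    shadowing_failed = False
except TypeError:
    # Raised because a list instance is not callable.
    shadowing_failed = True

unique = remove_duplicates([1, 2, 2, 3])
```

The first function fails only when it is called, which is why this kind of mistake can be confusing to track down.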

In addition to what others have said, it is unpythonic to do this:
for i in xrange(0, len(lst)):
    do stuff with lst[i]
when you can do this instead:
for item in lst:
    do stuff with item

dict is the type; you meant d[list[i]] = 1.
Addition: This points out the actual error in your code, but the other answers show a better way to achieve what you are aiming for.

def remove_duplicates(myList):
    return list(set(myList))
From looking at your code, it seems that you are not concerned about the ordering of the elements, only about their uniqueness. In such a case, a set() could be a better data structure.
The fix for your code is simply to use a function argument name that is not the name of the built-in type list, and to index the variable d rather than the type dict in the expression dict[list[i]].

Note that using list(set(seq)) will likely change the ordering of the remaining items. If retaining their order is important, you need to build a copy of the list while tracking the items already seen:
items = set()
copy = []
for item in seq:
    if item not in items:
        copy.append(item)  # lists grow with append()
        items.add(item)    # sets grow with add()
seq = copy
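For the order-preserving case, a shorter alternative is to rely on dictionary keys being unique. This is a minimal sketch that assumes Python 3.7 or later (where dicts keep insertion order) and hashable items; the function name is made up for the example.

```python
def dedupe_keep_order(seq):
    # dict keys are unique and, in Python 3.7+, keep insertion order,
    # so the first occurrence of each item determines its position.
    return list(dict.fromkeys(seq))


result = dedupe_keep_order([3, 1, 3, 2, 1])
print(result)  # [3, 1, 2]
```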

Related

Shortest way to get first item of `OrderedDict` in Python 3

What's the shortest way to get first item of OrderedDict in Python 3?
My best:
list(ordered_dict.items())[0]
Quite long and ugly.
I can think of:
next(iter(ordered_dict.items())) # Fixed, thanks Ashwini
But it's not very self-describing.
Any better suggestions?

Programming Practices for Readability
In general, if you feel like code is not self-describing, the usual solution is to factor it out into a well-named function:
def first(s):
    '''Return the first element from an ordered collection
    or an arbitrary element from an unordered collection.
    Raise StopIteration if the collection is empty.
    '''
    return next(iter(s))
With that helper function, the subsequent code becomes very readable:
>>> extension = {'xml', 'html', 'css', 'php', 'xhmtl'}
>>> one_extension = first(extension)
Patterns for Extracting a Single Value from Collection
The usual ways to get an element from a set, dict, OrderedDict, generator, or other non-indexable collection are:
for value in some_collection:
    break
and:
value = next(iter(some_collection))
The latter is nice because the next() function lets you specify a default value to return if the collection is empty, or you can omit the default and let it raise an exception. The next() function is also explicit that it is asking for the next item.
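A small sketch of both behaviours of next(), using an OrderedDict with made-up contents:

```python
from collections import OrderedDict

ordered = OrderedDict([("first", 123), ("second", 234)])
empty = OrderedDict()

# No default: the first (key, value) pair, or StopIteration if empty.
first_pair = next(iter(ordered.items()))

# With a default: the second argument is returned instead of raising.
missing = next(iter(empty.items()), None)

print(first_pair)  # ('first', 123)
print(missing)     # None
```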
Alternative Approach
If you actually need indexing and slicing and other sequence behaviors (such as indexing multiple elements), it is a simple matter to convert to a list with list(some_collection) or to use itertools.islice():
from itertools import islice
s = list(some_collection)
print(s[0], s[1])
s = list(islice(some_collection, n))  # the first n elements
print(s)

Use popitem(last=False), but keep in mind that it removes the entry from the dictionary, i.e. is destructive.
from collections import OrderedDict
o = OrderedDict()
o['first'] = 123
o['second'] = 234
o['third'] = 345
first_item = o.popitem(last=False)
# first_item is now ('first', 123)
For more details, have a look at the manual on collections. It also works with Python 2.x.

Subclassing OrderedDict and adding a method would address the clarity issue:
>>> o = ExtOrderedDict([('a', 1), ('b', 2)])
>>> o.first_item()
('a', 1)
The implementation of ExtOrderedDict:
from collections import OrderedDict
class ExtOrderedDict(OrderedDict):
    def first_item(self):
        return next(iter(self.items()))

Code that's readable, leaves the OrderedDict unchanged and doesn't needlessly generate a potentially large list just to get the first item:
for item in ordered_dict.items():
    return item
If ordered_dict is empty, None would be returned implicitly.
An alternate version for use inside a stretch of code:
for first in ordered_dict.items():
    break  # Leave the name 'first' bound to the first item
else:
    raise IndexError("Empty ordered dict")
The Python 2.x code corresponding to the first example above would need to use iteritems() instead:
for item in ordered_dict.iteritems():
    return item

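You might want to consider using SortedDict (from the third-party sortedcontainers package) instead of OrderedDict. Keep in mind that it orders entries by key rather than by insertion order.
It provides SortedDict.peekitem to peek at an item without removing it.
Runtime complexity: O(log(n))
>>> sd = SortedDict({'a': 1, 'b': 2, 'c': 3})
>>> sd.peekitem(0)
('a', 1)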

If you need a one-liner:
ordered_dict[[*ordered_dict.keys()][0]]
It creates a list of the dict's keys, picks the first one, and uses it as the key to look up the value in the dictionary. Note that this yields only the value, not the (key, value) pair.

First record:
[key for key, value in ordered_dict.items()][0]
Last record:
[key for key, value in ordered_dict.items()][-1]

Python: How to traverse a List[Dict{List[Dict{}]}]

I was just wondering if there is a simple way to do this. I have a particular structure that is parsed from a file and the output is a list of a dict of a list of a dict. Currently, I just have a bit of code that looks something like this:
for i in xrange(len(data)):
    for j, k in data[i].iteritems():
        for l in xrange(len(data[i]['data'])):
            for m, n in data[i]['data'][l].iteritems():
                dostuff()
I just wanted to know if there was a function that would traverse a structure and internally figure out whether each entry was a list or a dict and if it is a dict, traverse into that dict and so on. I've only been using Python for about a month or so, so I am by no means an expert or even an intermediate user of the language. Thanks in advance for the answers.
EDIT: Even if it's possible to simplify my code at all, it would help.

You never need to iterate through xrange(len(data)). You iterate either through data (for a list) or through data.items() (or data.values()) for a dict.
Your code should look like this:
for elem in data:
    for item in elem['data']:
        for key, value in item.iteritems():
            dostuff()
which is quite a bit shorter.

Will, if you're looking to descend an arbitrary structure of lists and dicts, then you can create a function that does so based on isinstance() checks.
def traverse_it(it):
    if isinstance(it, list):
        for item in it:
            traverse_it(item)
    elif isinstance(it, dict):
        for key in it.keys():
            traverse_it(it[key])
    else:
        do_something_with_real_value(it)
Note that the average object-oriented guru will tell you not to do this, and instead to create a class tree where one class is based on a list and another on a dict, each with a method of the same name (i.e., a virtual function) that processes its own contents. In other words, if/else trees based on types are "bad", while methods that can be called on an object to deal with its contents in its own way are "good".
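A variation on the recursive approach is to write the traversal as a generator, so that the caller decides what to do with each value. This is a sketch for Python 3; the function name and the sample data are assumptions made for illustration, with the sample shaped like the list of dict of list of dict described in the question.

```python
def iter_leaves(node):
    """Yield every non-list, non-dict value found inside node, depth first."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_leaves(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_leaves(item)
    else:
        yield node


# Hypothetical sample: list -> dict -> list -> dict.
data = [
    {"name": "a", "data": [{"x": 1, "y": 2}, {"x": 3}]},
    {"name": "b", "data": [{"x": 4}]},
]

leaves = list(iter_leaves(data))
print(leaves)  # ['a', 1, 2, 3, 'b', 4]
```

Because the function only yields values, the same traversal can feed a loop, a filter, or a sum without being rewritten.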

I think this is what you're trying to do. There is no need to use xrange() to pull out the index from the list since for iterates over each value of the list. In my example below d1 is therefore a reference to the current data[i].
for d1 in data:  # iterate over outer list, d1 is a dictionary
    for x in d1:  # iterate over keys in d1 (the x var is unused)
        for d2 in d1['data']:  # iterate over the list
            # iterate over (key, value) pairs in the innermost dict
            for k, v in d2.iteritems():
                dostuff()
You're also using the name l twice (intentionally or not), but beware of how the scoping works.

Well, the question is quite old. However, out of curiosity, I would like to offer an answer that I just tried.
Suppose the dictionary looks like: dict1 = { 'a':5,'b': [1,2,{'a':100,'b':100}], 'dict 2' : {'a':3,'b':5}}
Solution:
dict1 = { 'a':5,'b': [1,2,{'a':100,'b':100}], 'dict 2' : {'a':3,'b':5}}
def recurse(obj):
    if isinstance(obj, dict):
        for key in obj:
            recurse(obj[key])
    elif isinstance(obj, list):
        for element in obj:
            recurse(element)  # handles dicts, nested lists, and plain values
    else:
        print(obj)
recurse(dict1)

How to add items to a dictionary

I'm a Python newbie. I have the two methods below in a class.
def add_item(self, itemID, itemlist):
    lines = []
    self.itemID = itemID
    self.itemlist = itemlist
    for line in self.itemID, itemlist:
        lines.append(line)
and
def get_keys(self):
    i = []
    i.append(self.itemID)
    return i
If I do
example.add_item('abc', itemlist)
example.add_item('abcd', itemlist)
example.add_item('abce', itemlist)
then when I do
example.get_keys()
it should give:
['abc', 'abcd', 'abce']
but mine only gives the latest one, that is ['abce'].
Can anyone please let me know how to fix this?

If I understand correctly, you want to add several pairs of key and item list to your example, and be able to retrieve the keys you have added so far? The easiest way is to store the keys and the item lists in two lists.
Assuming that you initialize your object like this
def __init__(self, *args, **kwargs):
    self.itemID = []
    self.itemlist = []
    ...
your add_item can be simplified to
def add_item(self, itemID, itemlist):
    self.itemID.append(itemID)
    self.itemlist.append(itemlist)
and your get_keys is only:
def get_keys(self):
    return self.itemID
Note that get_keys is essentially the one you suggested, just simpler (no need to create a temporary list).
When you do
lines = []
for line in self.itemID, itemlist:
    lines.append(line)
line first takes the value self.itemID, then itemlist. In the end, your lines is just [self.itemID, itemlist]. Probably not what you had in mind.
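Since the question title asks about a dictionary, here is a minimal sketch of the same class built on a single dict instead of two parallel lists. The class name ItemStore and the attribute name items are assumptions made for this example; the question does not show the real class.

```python
class ItemStore:
    def __init__(self):
        # One dict holds every (item ID -> item list) pair added so far.
        self.items = {}

    def add_item(self, item_id, item_list):
        # Assigning to a new key adds an entry; an existing key is overwritten.
        self.items[item_id] = item_list

    def get_keys(self):
        # Keys come back in insertion order on Python 3.7+.
        return list(self.items)


example = ItemStore()
example.add_item("abc", [1, 2])
example.add_item("abcd", [3])
example.add_item("abce", [])
print(example.get_keys())  # ['abc', 'abcd', 'abce']
```

One difference from the two-list version: adding the same ID twice replaces the earlier item list rather than storing both.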

To add a new key to a dictionary, just assign to it (here d is your dictionary instance, not the dict type):
d['new_key'] = 'value'

Perhaps
i.extend(self.itemID)
might be what you are looking for. Note that if self.itemID is a string, extend() adds its characters one by one.

It looks like you are overwriting the item each time you add it.
When you call add_item, you create the variable "lines", which is never used again, and self.itemID and self.itemlist are overwritten with the new inputs.

If example is a dict, you could also use its built-in update method:
example.update({'Key':'value'})

def add_item(self, itemID, itemlist):
    lines = []
You are initializing your lines variable with an empty list.
So each time you invoke this method, it creates a new list and adds the item to it.
You can instead return lines from this method and store it in a variable at the call site.
Or just declare lines as an instance variable:
example = Example()
example.lines = []
example.lines.extend(example.add_item(itemId1, item_list1))
example.lines.extend(example.add_item(itemId2, item_list2))
Or you can add your itemId and list to the instance's __dict__ dictionary:
example.__dict__[itemId] = value
NOTE: I just saw that you have not used your for loop correctly. You don't iterate over two iterables like this.
You should go through a good Python book, or rather the Python documentation.

First thing I see: you are trying to iterate over two elements at once, which is usually done with zip(), at least if both elements are lists. Otherwise, just use the container you want to loop over.
for id, line in zip(self.itemID, itemlist):
    lines.append(line)
But I don't see any dict...

How to compare an element of a tuple (int) to determine if it exists in a list

I have the two following lists:
# List of tuples representing the index of resources and their unique properties
# Format of (ID,Name,Prefix)
resource_types=[('0','Group','0'),('1','User','1'),('2','Filter','2'),('3','Agent','3'),('4','Asset','4'),('5','Rule','5'),('6','KBase','6'),('7','Case','7'),('8','Note','8'),('9','Report','9'),('10','ArchivedReport',':'),('11','Scheduled Task',';'),('12','Profile','<'),('13','User Shared Accessible Group','='),('14','User Accessible Group','>'),('15','Database Table Schema','?'),('16','Unassigned Resources Group','#'),('17','File','A'),('18','Snapshot','B'),('19','Data Monitor','C'),('20','Viewer Configuration','D'),('21','Instrument','E'),('22','Dashboard','F'),('23','Destination','G'),('24','Active List','H'),('25','Virtual Root','I'),('26','Vulnerability','J'),('27','Search Group','K'),('28','Pattern','L'),('29','Zone','M'),('30','Asset Range','N'),('31','Asset Category','O'),('32','Partition','P'),('33','Active Channel','Q'),('34','Stage','R'),('35','Customer','S'),('36','Field','T'),('37','Field Set','U'),('38','Scanned Report','V'),('39','Location','W'),('40','Network','X'),('41','Focused Report','Y'),('42','Escalation Level','Z'),('43','Query','['),('44','Report Template ','\\'),('45','Session List',']'),('46','Trend','^'),('47','Package','_'),('48','RESERVED','`'),('49','PROJECT_TEMPLATE','a'),('50','Attachments','b'),('51','Query Viewer','c'),('52','Use Case','d'),('53','Integration Configuration','e'),('54','Integration Command','f'),('55','Integration Target','g'),('56','Actor','h'),('57','Category Model','i'),('58','Permission','j')]
# This is a list of resource ID's that we do not want to reference directly, ever.
unwanted_resource_types=[0,1,3,10,11,12,13,14,15,16,18,20,21,23,25,27,28,32,35,38,41,47,48,49,50,57,58]
I'm attempting to compare the two in order to build a third list containing the 'Name' of each unique resource type that currently exists in unwanted_resource_types. e.g. The final result list should be:
result = ['Group','User','Agent','ArchivedReport','ScheduledTask','...','...']
I've tried the following that (I thought) should work:
result = []
for res in resource_types:
    if res[0] in unwanted_resource_types:
        result.append(res[1])
and when that failed to populate result I also tried:
result = []
for res in resource_types:
    for type in unwanted_resource_types:
        if res[0] == type:
            result.append(res[1])
also to no avail. Is there something I'm missing? I believe this would be the right place to use a list comprehension, but that's still in my grey basket of understanding (the Python docs are a bit too succinct for me in this case).
I'm also open to completely rethinking this problem, but I do need to retain the list of tuples as it's used elsewhere in the script. Thank you for any assistance you may provide.

Your resource types are using strings, and your unwanted resources are using ints, so you'll need to do some conversion to make it work.
Try this:
result = []
for res in resource_types:
    if int(res[0]) in unwanted_resource_types:
        result.append(res[1])
or using a list comprehension:
result = [item[1] for item in resource_types if int(item[0]) in unwanted_resource_types]

The numbers in resource_types are numbers contained within strings, whereas the numbers in unwanted_resource_types are plain numbers, so your comparison is failing. This should work:
result = []
for res in resource_types:
    if int(res[0]) in unwanted_resource_types:
        result.append(res[1])

The problem is that your triples contain strings and your unwanted resources contain numbers. Either change the data to
resource_types=[(0,'Group','0'), ...
or use int() to convert the strings to ints before the comparison, and it should work. Your result can be computed with a list comprehension, as in
result=[rt[1] for rt in resource_types if int(rt[0]) in unwanted_resource_types]
If you change ('0', ...) into (0, ... you can leave out the int() call.
Additionally, you may change the unwanted_resource_types variable into a set, like
unwanted_resource_types=set([0,1,3, ... ])
to speed up the membership tests (this only matters if speed is an issue).
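Putting the int() conversion and the set together, a runnable sketch looks like this. The data is a shortened, made-up subset in the question's (ID, Name, Prefix) format, not the full list.

```python
# Shortened sample data in the question's (ID, Name, Prefix) format.
resource_types = [
    ("0", "Group", "0"),
    ("1", "User", "1"),
    ("2", "Filter", "2"),
    ("3", "Agent", "3"),
    ("10", "ArchivedReport", ":"),
]
# A set gives constant-time membership tests on average.
unwanted_resource_types = {0, 1, 3, 10}

# Unpack each triple and convert the string ID to an int before the test.
result = [name for rid, name, _prefix in resource_types
          if int(rid) in unwanted_resource_types]
print(result)  # ['Group', 'User', 'Agent', 'ArchivedReport']
```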

The one-liner:
result = map(lambda x: dict(map(lambda a: (int(a[0]), a[1]), resource_types))[x], unwanted_resource_types)
without any explicit loop does the job.
Ok - you don't want to use this in production code - but it's fun. ;-)
Comment:
The inner dict(map(lambda a: (int(a[0]), a[1]), resource_types)) creates a dictionary from the input data:
{0: 'Group', 1: 'User', 2: 'Filter', 3: 'Agent', ...
The outer map chooses the names from the dictionary.

Do I have to cause a ValueError in Python

I have this code:
chars = #some list
try:
    indx = chars.index(chars)
except ValueError:
    #doSomething
else:
    #doSomethingElse
I want to be able to do this instead, because I don't like knowingly causing exceptions:
chars = #some list
indx = chars.index(chars)
if indx == -1:
    #doSomething
else:
    #doSomethingElse
Is there a way I can do this?

Note that the latter approach goes against the generally accepted "pythonic" philosophy of EAFP ("it is Easier to Ask for Forgiveness than Permission"), while the former follows it.

if element in mylist:
    index = mylist.index(element)
    # ... do something
else:
    # ... do something else

For the specific case where your list is a sequence of single-character strings you can get what you want by changing the list to be searched to a string in advance (eg. ''.join(chars)).
You can then use the .find() method, which does work as you want. However, there's no corresponding method for lists or tuples.
Another possible option is to use a dictionary instead. eg.
d = dict((x, loc) for (loc,x) in enumerate(chars))
...
index = d.get(chars_to_find, -1) # Second argument is default if not found.
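This may also perform better if you're doing a lot of searches on the list. If it's just a single search on a throwaway list, though, it's not worth doing. Note that if chars contains duplicates, this dictionary maps each value to its last index, whereas list.index() returns the first.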
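If the goal is simply a lookup that returns -1 instead of raising, two small helpers can hide the details. This is a sketch with made-up function names: the first wraps the try/except once, and the second builds a lookup table that keeps the first index of each element, matching what list.index() would return.

```python
def find_index(seq, target, default=-1):
    """Return the index of the first match in seq, or default if absent."""
    try:
        return seq.index(target)
    except ValueError:
        return default


def first_index_map(seq):
    """Map each element to the index of its first occurrence."""
    positions = {}
    for loc, x in enumerate(seq):
        positions.setdefault(x, loc)  # keep the earliest index only
    return positions


chars = ["a", "b", "c", "b"]
print(find_index(chars, "b"))               # 1
print(find_index(chars, "z"))               # -1
print(first_index_map(chars).get("b", -1))  # 1
```

The helper keeps the exception handling in one place, so calling code can use the `if indx == -1` style from the question.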
