Python: how to merge a list into clusters?

I have a list of tuples:
[(3,4), (18,27), (4,14)]
and need code that merges tuples sharing numbers, producing another list in which each number appears in only one element. The result should be sorted by the length of the tuples, e.g.:
>>> MergeThat([(3,4), (18,27), (4,14)])
[(3,4,14), (18,27)]
>>> MergeThat([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
[(57,66,76,85), (1,3,10), (15,21)]
I understand it's something similar to hierarchical clustering algorithms, which I've read about, but can't figure them out.
Is there a relatively simple code for a MergeThat() function?

I tried hard to figure this out, but only after I tried the approach suggested in Ian's answer (thanks!) did I realize what the theoretical problem is: the input is a list of edges and so defines a graph. We are looking for the connected components of this graph. It's as simple as that.
While you could implement this efficiently yourself, there is actually no reason to: just import a good graph library!
import networkx as nx
# one of your examples
g1 = nx.Graph([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
print nx.connected_components(g1) # [[57, 66, 76, 85], [1, 10, 3], [21, 15]]
# my own test case
g2 = nx.Graph([(1,2),(2,10), (20,3), (3,4), (4,10)])
print nx.connected_components(g2) # [[1, 2, 3, 4, 10, 20]]

import itertools

def merge_it(lot):
    merged = [set(x) for x in lot]  # operate on sets only
    finished = False
    while not finished:
        finished = True
        for a, b in itertools.combinations(merged, 2):
            if a & b:
                # we merged in this iteration, we may have to do one more
                finished = False
                if a in merged: merged.remove(a)
                if b in merged: merged.remove(b)
                merged.append(a.union(b))
                break  # don't inflate 'merged' with intermediate results
    return merged

if __name__ == '__main__':
    print merge_it( [(3,4), (18,27), (4,14)] )
    # => [set([18, 27]), set([3, 4, 14])]
    print merge_it( [(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)] )
    # => [set([21, 15]), set([1, 10, 3]), set([57, 66, 76, 85])]
    print merge_it( [(1,2), (2,3), (3,4), (4,5), (5,9)] )
    # => [set([1, 2, 3, 4, 5, 9])]
Here's a snippet (including doctests): http://gist.github.com/586252

def collapse(L):
    """ The input L is a list that contains tuples of various sizes.
        If any tuples have shared elements,
        exactly one instance of the shared and unshared elements is merged into the first tuple with a shared element.
        This function returns a new list that contains the merged tuples and an int that represents how many merges were performed."""
    answer = []
    merges = 0
    seen = []  # a list of all the numbers that we've seen so far
    for t in L:
        tAdded = False
        for num in t:
            pleaseMerge = True
            if num in seen and pleaseMerge:
                answer += merge(t, answer)
                merges += 1
                pleaseMerge = False
                tAdded = True
            else:
                seen.append(num)
        if not tAdded:
            answer.append(t)
    return (answer, merges)

def merge(t, L):
    """ The input L is a list that contains tuples of various sizes.
        The input t is a tuple that contains an element that is contained in another tuple in L.
        Return a new list that is similar to L but contains the new elements in t added to the tuple with which t has a common element."""
    answer = []
    while L:
        tup = L[0]
        tupAdded = False
        for i in tup:
            if i in t:
                try:
                    L.remove(tup)
                    newTup = set(tup)
                    for i in t:
                        newTup.add(i)
                    answer.append(tuple(newTup))
                    tupAdded = True
                except ValueError:
                    pass
        if not tupAdded:
            L.remove(tup)
            answer.append(tup)
    return answer

def sortByLength(L):
    """ L is a list of n-tuples, where n>0.
        This function will return a list with the same contents as L
        except that the tuples are sorted in non-ascending order by length"""
    lengths = {}
    for t in L:
        if len(t) in lengths.keys():
            lengths[len(t)].append(t)
        else:
            lengths[len(t)] = [(t)]
    l = lengths.keys()[:]
    l.sort(reverse=True)
    answer = []
    for i in l:
        answer += lengths[i]
    return answer

def MergeThat(L):
    answer, merges = collapse(L)
    while merges:
        answer, merges = collapse(answer)
    return sortByLength(answer)

if __name__ == "__main__":
    print 'starting'
    print MergeThat([(3,4), (18,27), (4,14)])
    # [(3, 4, 14), (18, 27)]
    print MergeThat([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
    # [(57, 66, 76, 85), (1, 10, 3), (15, 21)]

Here's another solution that doesn't use itertools and takes a different, slightly more verbose, approach. The tricky bit of this solution is the merging of cluster sets when t0 in index and t1 in index.
import doctest

def MergeThat(a):
    """ http://stackoverflow.com/questions/3744048/python-how-to-merge-a-list-into-clusters
    >>> MergeThat([(3,4), (18,27), (4,14)])
    [(3, 4, 14), (18, 27)]
    >>> MergeThat([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
    [(57, 66, 76, 85), (1, 3, 10), (15, 21)]
    """
    index = {}
    for t0, t1 in a:
        if t0 not in index and t1 not in index:
            index[t0] = set()
            index[t1] = index[t0]
        elif t0 in index and t1 in index:
            index[t0] |= index[t1]
            oldt1 = index[t1]
            for x in index.keys():
                if index[x] is oldt1:
                    index[x] = index[t0]
        elif t0 not in index:
            index[t0] = index[t1]
        else:
            index[t1] = index[t0]
        assert index[t0] is index[t1]
        index[t0].add(t0)
        index[t0].add(t1)
    return sorted([tuple(sorted(x)) for x in set(map(frozenset, index.values()))], key=len, reverse=True)

if __name__ == "__main__":
    doctest.testmod()

The code others have written will surely work, but here's another option, maybe simpler to understand and with lower algorithmic complexity.
Keep a dictionary from numbers to the cluster (implemented as a Python set) they're a member of. Also include that number in the corresponding set. Process an input pair as one of four cases:
Neither element is in the dictionary: create a new set, hook up the dictionary links appropriately.
One element, but not both, is in the dictionary: add the yet-unseen element to the set of its brother, and point its dictionary link at that set.
Both elements have been seen before, but in different sets: take the union of the old sets and update all dictionary links to the new set.
Both elements have been seen before, and they're in the same set: do nothing.
Afterward, simply collect the unique values from the dictionary and sort them in descending order of size. This portion of the job is O(m log n) and thus will not dominate the runtime.
This should work in a single pass. Writing the actual code is left as an exercise for the reader; a rough sketch follows below.
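A minimal sketch of the approach just described (not part of the original answer; plain dicts and sets, names chosen for illustration):
def merge_clusters(pairs):
    cluster_of = {}  # number -> the set holding that number's cluster
    for a, b in pairs:
        if a not in cluster_of and b not in cluster_of:
            # neither seen yet: start a new cluster
            cluster_of[a] = cluster_of[b] = {a, b}
        elif a in cluster_of and b in cluster_of:
            if cluster_of[a] is not cluster_of[b]:
                # seen in different clusters: merge them and repoint every member
                merged = cluster_of[a] | cluster_of[b]
                for n in merged:
                    cluster_of[n] = merged
            # both already in the same cluster: nothing to do
        else:
            # exactly one seen: add the newcomer to its brother's cluster
            seen, new = (a, b) if a in cluster_of else (b, a)
            cluster_of[seen].add(new)
            cluster_of[new] = cluster_of[seen]
    unique = {id(s): s for s in cluster_of.values()}.values()
    return sorted((tuple(sorted(s)) for s in unique), key=len, reverse=True)

print(merge_clusters([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)]))
# [(57, 66, 76, 85), (1, 3, 10), (15, 21)]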

This is not efficient for huge lists.
def merge_that(lot):
    final_list = []
    while len(lot) > 0:
        temp_set = set(lot[0])
        deletable = [0]  # indices of all tuples consumed by temp_set
        for i, tup2 in enumerate(lot[1:], start=1):
            if tup2[0] in temp_set or tup2[1] in temp_set:
                deletable.append(i)
                temp_set = temp_set.union(tup2)
        for d in reversed(deletable):  # delete from the back so indices stay valid
            del lot[d]
        deletable = []
        # Some of the tuples consumed later might have missed their brothers,
        # so loop again after deleting the consumed tuples
        for i, tup2 in enumerate(lot):
            if tup2[0] in temp_set or tup2[1] in temp_set:
                deletable.append(i)
                temp_set = temp_set.union(tup2)
        for d in reversed(deletable):
            del lot[d]
        final_list.append(tuple(temp_set))
    return final_list
It looks ugly but works.

Related

Finding value in dict given an integer that can be found in between dictionary's tuple key

Given an x dictionary of tuple keys and string values:
x = {(0, 4): 'foo', (4,9): 'bar', (9,10): 'sheep'}
The task is to write a function that finds the value for a given number, e.g. if the user inputs 3, it should return 'foo'. We can assume that there are no overlapping numbers in the keys.
Another example: if the user inputs 9, it should return 'bar'.
I've tried converting the x dict to a list and writing the function as follows, but it's suboptimal if the range of values in the keys is extremely large:
from itertools import chain

mappings = [None] * max(chain(*x))
for k in x:
    for i in range(k[0], k[1]):
        mappings[i] = x[k]

def myfunc(num):
    return mappings[num]
How else can the myfunc function be written?
Is there a better data structure to keep the mapping?
You can convert your keys into a NumPy array and use numpy.searchsorted to search for a query. Since the keys are left-open, the open end of each key is incremented by 1 in the array.
Each query is of order O(log(n)).
Create the array (plus a parallel list of values):
import numpy as np

x = {(0, 4): 'foo', (4, 9): 'bar', (9, 10): 'sheep'}
keys = sorted(x)                 # interval bounds in sorted order
vals = [x[k] for k in keys]      # values aligned with the rows of A
A = np.array([[k1 + 1, k2] for k1, k2 in keys])
>>> A
array([[ 1,  4],
       [ 5,  9],
       [10, 10]])
Function to search a query:
def myfunc(num):
    ind1 = np.searchsorted(A[:, 0], num, 'right')
    ind2 = np.searchsorted(A[:, 1], num, 'left')
    if ind1 == 0 or ind2 == A.shape[0] or ind1 <= ind2:
        return None
    return vals[ind2]
Prints:
>>> myfunc(3)
'foo'
Iterate over the dictionary comparing to the keys:
x = {(0, 4): 'foo', (4, 9): 'bar', (9, 10): 'sheep'}
def find_tuple(dct, num):
    for tup, val in dct.items():
        if tup[0] <= num < tup[1]:
            return val
    return None
print(find_tuple(x, 3))
# foo
print(find_tuple(x, 9))
# sheep
print(find_tuple(x, 11))
# None
A better data structure would be a dictionary with just the left boundaries of the intervals (as keys) and the corresponding values. Then you can use bisect as the other answers mention.
import bisect
import math

x = {
    -math.inf: None,
    0: 'foo',
    4: 'bar',
    9: 'sheep',
    10: None,
}

def find_tuple(dct, num):
    idx = bisect.bisect_right(list(dct.keys()), num)
    return list(dct.values())[idx-1]
print(find_tuple(x, 3))
# foo
print(find_tuple(x, 9))
# sheep
print(find_tuple(x, 11))
# None
You could simply iterate through keys and compare the values (rather than creating a mapping). This is a bit more efficient than creating a mapping first, since you could have a key like (0, 100000) which will create needless overhead.
Edited answer based on comments from OP
x = {(0, 4): 'foo', (4,9): 'bar', (9,10): 'sheep'}

def find_value(k):
    for t1, t2 in x:
        if k > t1 and k <= t2:  # edited based on comments
            return x[(t1, t2)]
    # if we end up here, we can't find a match
    # do whatever appropriate, e.g. return None or raise exception
    return None
Note: it's unclear from your tuple keys whether the ranges are inclusive of their endpoints. E.g. if a user inputs 4, should they get 'foo' or 'bar'? This affects the comparison in the function above (see the edit; it should fulfill your requirement).
In the example above, an input of 4 returns 'foo', since it fulfills the condition k > 0 and k <= 4 and returns before continuing the loop.
Edit: wording and typo fix
Here's one solution using pandas.IntervalIndex and pandas.cut. Note, I "tweaked" the last key to (10, 11), because I'm using closed="left" in my IntervalIndex. You can change this if you want the intervals closed on different sides (or both):
import pandas as pd
x = {(0, 4): "foo", (4, 9): "bar", (10, 11): "sheep"}
bins = pd.IntervalIndex.from_tuples(x, closed="left")
result = pd.cut([3], bins)[0]
print(x[(result.left, result.right)])
Prints:
foo
Other solution using bisect module (assuming the ranges are continuous - so no "gaps"):
from bisect import bisect_left
x = {(0, 4): "foo", (4, 9): "bar", (10, 10): "sheep"}
bins, values = [], []
for k in sorted(x):
bins.append(k[1]) # intervals are closed "right", eg. (0, 4]
values.append(x[k])
idx = bisect_left(bins, 4)
print(values[idx])
Prints:
foo

Complex heapq lazy merge (minimize space used)

Given a list of sorted lists, I wish to produce a sorted list of output.
This would be easy:
nums : List[List[int]]
h = heapq.merge(nums)
However, I also wish to tag each element of the output, with the index of the inner list from which it originated. Eg.
nums = [[1,3,5], [2,6], [4]]
h = _ # ???
for x in h:
    print(x)
# Outputs
# (1,0)
# (2,1)
# (3,0)
# (4,2)
# (5,0)
# (6,1)
I have written a version that works,
h = heapq.merge(*map(lambda l: map(lambda x: (x,l[0]), l[1]), enumerate(nums)))
But I'm afraid I might have lost a desirable space-complexity guarantee; how can I know whether the (transformed) inner lists are being manifested or not? (and what exactly does the * do in my attempt?)
Since default tuple comparison is lexicographic, you can convert each innermost element x to (x, i), where i is the index of the list containing x in nums. For example,
import heapq
from itertools import repeat
nums = [[1,3,5], [2,6], [4]]
nums_with_inds = (zip(lst, repeat(i)) for i, lst in enumerate(nums))
res = heapq.merge(*nums_with_inds)
for tup in res:
    print(tup)
# (1, 0)
# (2, 1)
# (3, 0)
# (4, 2)
# (5, 0)
# (6, 1)
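On the two side questions, assuming Python 3: zip() and the outer generator expression are both lazy, so the tagged inner streams are never materialized as lists; heapq.merge pulls one item at a time from each of them. The * simply unpacks the outer generator so that each tagged stream arrives as a separate positional argument to merge. A small sketch (names are illustrative) showing that this also works when the inner sequences are themselves iterators rather than lists:
import heapq
from itertools import repeat

def tagged_merge(iterables):
    # Each zip(...) is a lazy stream of (value, source_index) pairs;
    # * unpacks the outer generator into separate arguments for heapq.merge.
    return heapq.merge(*(zip(it, repeat(i)) for i, it in enumerate(iterables)))

nums = [iter([1, 3, 5]), iter([2, 6]), iter([4])]   # iterators, not lists
print(list(tagged_merge(nums)))
# [(1, 0), (2, 1), (3, 0), (4, 2), (5, 0), (6, 1)]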

Find indexes of repeated elements in an array (Python, NumPy)

Assume, I have a NumPy-array of integers, as:
[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
I want to find the start and end indices of runs where a value is repeated more than x times (say 5 times). So in the case above, those values are 22 and 6. The start index of the repeated 22 is 3 and the end index is 8; the same goes for the repeating 6.
Is there a special tool in Python that is helpful?
Otherwise, I would loop through the array index by index and compare the current value with the previous one.
Regards.
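For reference, the index-by-index scan the question describes might look like this (a sketch, not from the thread; plain Python, no NumPy):
def find_runs(a, min_len=5):
    # Compare each value with the previous one and record runs of at least
    # min_len equal values as (start, end) index pairs.
    runs = []
    start = 0
    for i in range(1, len(a) + 1):
        if i == len(a) or a[i] != a[i - 1]:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = i
    return runs

a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
print(find_runs(a))
# [(3, 8), (15, 22)]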
Using np.diff and the method given by @WarrenWeckesser for finding runs of zeros in an array:
import numpy as np

def zero_runs(a):  # from link
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges

a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
zero_runs(np.diff(a))
Out[87]:
array([[ 3, 8],
[15, 22]], dtype=int32)
This can then be filtered on the difference between the start & end of the run:
runs = zero_runs(np.diff(a))
runs[runs[:, 1]-runs[:, 0]>5] # runs of 7 or more, to illustrate filter
Out[96]: array([[15, 22]], dtype=int32)
Here is a solution using Python's native itertools.
Code
import itertools as it
def find_ranges(lst, n=2):
    """Return ranges for `n` or more repeated values."""
    groups = ((k, tuple(g)) for k, g in it.groupby(enumerate(lst), lambda x: x[-1]))
    repeated = (idx_g for k, idx_g in groups if len(idx_g) >= n)
    return ((sub[0][0], sub[-1][0]) for sub in repeated)
lst = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
list(find_ranges(lst, 5))
# [(3, 8), (15, 22)]
Tests
import nose.tools as nt
def test_ranges(f):
    """Verify list results identifying ranges."""
    nt.eq_(list(f([])), [])
    nt.eq_(list(f([0, 1,1,1,1,1,1, 2], 5)), [(1, 6)])
    nt.eq_(list(f([1,1,1,1,1,1, 2,2, 1, 3, 1,1,1,1,1,1], 5)), [(0, 5), (10, 15)])
    nt.eq_(list(f([1,1, 2, 1,1,1,1, 2, 1,1,1], 3)), [(3, 6), (8, 10)])
    nt.eq_(list(f([1,1,1,1, 2, 1,1,1, 2, 1,1,1,1], 3)), [(0, 3), (5, 7), (9, 12)])
test_ranges(find_ranges)
This example pairs each element of lst with its index, then groups the pairs by element. Only groups with at least n pairs are retained. Finally, the first and last pairs of each retained group are sliced, yielding its (start, end) indices.
See also this post for finding ranges of indices using itertools.groupby.
There really isn't a great short-cut for this. You can do something like:
mult = 5
for i, elem in enumerate(val_list):
    target = [elem] * mult
    if val_list[i:i + mult] == target:
        found_at = i
        break
I leave not-found handling and longer-sequence detection to you.
If you're looking for value repeated n times in list L, you could do something like this:
def find_repeat(value, n, L):
    look_for = [value for _ in range(n)]
    for i in range(len(L)):
        if L[i] == value and L[i:i+n] == look_for:
            return i, i+n
Here is a relatively quick, errorless solution which also tells you how many copies were in the run. Some of this code was borrowed from KAL's solution.
# Return the start and (1-past-the-end) indices of the first instance of
# at least min_count copies of element value in container l
def find_repeat(value, min_count, l):
    for i in range(len(l)):
        count = 0
        while i + count < len(l) and l[i + count] == value:
            count += 1
        if count >= min_count:
            return i, i + count
I had a similar requirement. This is what I came up with, using np.unique and a dict built from a generator expression:
import numpy as np

A = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
Find the unique values and return their indices:
_, ind = np.unique(A, return_index=True)
np.unique sorts the array, so sort the indices to get them back in the original order:
ind = np.sort(ind)
ind now contains the index of the first element of each repeating group, visible as non-consecutive indices.
Their diff gives the number of elements in a group. Filtering with np.diff(ind) > 5 gives a boolean array with True at the starting indices of groups, and ind contains the end index of each group just after each True in the filtered list.
Create a dict with the repeating element as key and a tuple of that group's start and end indices as value:
rep_groups = dict((A[ind[i]], (ind[i], ind[i+1]-1)) for i, v in enumerate(np.diff(ind) > 5) if v)
# {22: (3, 8), 6: (15, 22)}

Python Distribute List of Tuple Items by Matching Recursively

I'm trying to write a recursive function that will look for matching items in a list of tuples, and distribute them into groups. An example list could be:
l = [(1,2),(2,3),(4,6),(3,5),(7,9),(6,8)]
And the intent is to end with three sublists in the form of:
l = [[(1,2),(2,3),(3,5)], [(4,6),(6,8)], [(7,9)]]
This appears to be an ideal situation for a recursive function, but I haven't made it work yet. Here is what I have written so far:
count = 0

def network(index_list, tuples_list, groups, count):
    if len(index_list) > 0:
        i = 0
        for j in range(len(index_list)):
            match = set(tuples_list[groups[count][i]]) & set(tuples_list[index_list[j]])
            if len(match) > 0:
                groups[count].append(index_list[j])
                index_list.pop(j)
                i += 1
            else:
                count += 1
                groups.append([])
                groups[count].append(index_list[0])
        network(index_list, tuples_list, groups, count)
    else:
        return groups
Also I'm pretty certain this question is different than the one marked duplicate. I'm looking for a recursive solution that keeps all of the tuples intact, in this case by pushing around their indices with append and pop. I'm positive there's an elegant and recursive solution to this problem.
I'm not sure recursion is helpful here. It was an interesting thought experiment, so I tried it out, but my recursive function only ever needs one run-through.
Modified based on comment to only match the first two data points of a tuple
import copy

# Initial 1-item groups
def groupTuples(tupleList):
    groups = []
    for t in tupleList:
        groups.append([t])
    return groups

# Merge two groups
def splice(list1, list2, index):
    for item in list2:
        list1.insert(index, item)
        index += 1
    return list1

def network(groups = []):
    # Groups get modified by reference; keep a shallow copy to detect changes
    oldGroups = copy.copy(groups)
    for group in groups:
        for index, secondGroup in enumerate(groups):
            if group == secondGroup:
                # Splice can get caught in a loop if passed the same item twice.
                continue
            # Last item of first group matches first item of second group
            if group[-1][1] == secondGroup[0][0]:
                splice(group, secondGroup, len(group))
                groups.pop(index)
            # First item of first group matches last item of second group
            elif group[0][0] == secondGroup[-1][1]:
                splice(group, secondGroup, 0)
                groups.pop(index)
    if groups == oldGroups:
        return groups  # No change.
    else:
        return network(groups)

l = [(1,2,0),(2,3,1),(4,6,2),(3,5,3),(1,7,4),(7,9,5),(6,8,6),(12, 13,7),(5, 12,8)]
result = network(groupTuples(l))
print result
Result: [[(1, 2, 0), (2, 3, 1), (3, 5, 3), (5, 12, 8), (12, 13, 7)], [(1, 7, 4), (7, 9, 5)], [(4, 6, 2), (6, 8, 6)]]
It does run the matching algorithm over the dataset multiple times, but I've yet to see a configuration of data where the second iteration yields any changes, so not sure recursion is necessary here.

Summing Consecutive Ranges Pythonically

I have a sumranges() function, which sums all the ranges of consecutive numbers found in a tuple of tuples. To illustrate:
def sumranges(nums):
    return sum([sum([1 for j in range(len(nums[i])) if
                     nums[i][j] == 0 or
                     nums[i][j - 1] + 1 != nums[i][j]]) for
                i in range(len(nums))])
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print sumranges(nums)
7
As you can see, it returns the number of ranges of consecutive digits within the tuple, that is, the ranges (1, 2, 3, 4), (1), (5, 6), (19, 20), (24), (29), (400), for a total of 7. The tuples are always ordered.
My problem is that my sumranges() is terrible. I hate looking at it. I'm currently just iterating through the tuple and each subtuple, assigning a 1 if the number is not (1 + previous number), and summing the total. I feel like I am missing a much easier way to accomplish my stated objective. Does anyone know a more pythonic way to do this?
Edit: I have benchmarked all the answers given thus far. Thanks to all of you for your answers.
The benchmarking code is as follows, using a sample size of 100K:
from time import time
from random import randrange

nums = [sorted(list(set(randrange(1, 10) for i in range(10)))) for
        j in range(100000)]

for func in sumranges, alex, matt, redglyph, ephemient, ferdinand:
    start = time()
    result = func(nums)
    end = time()
    print ', '.join([func.__name__, str(result), str(end - start) + ' s'])
Results are as follows. Actual answer shown to verify that all functions return the correct answer:
sumranges, 250281, 0.54171204567 s
alex, 250281, 0.531121015549 s
matt, 250281, 0.843333005905 s
redglyph, 250281, 0.366822004318 s
ephemient, 250281, 0.805964946747 s
ferdinand, 250281, 0.405596971512 s
RedGlyph does edge out in terms of speed, but the simplest answer is probably Ferdinand's, and probably wins for most pythonic.
My 2 cents:
>>> sum(len(set(x - i for i, x in enumerate(t))) for t in nums)
7
It's basically the same idea as described in Alex's post, but using a set instead of itertools.groupby, resulting in a shorter expression. Since sets are implemented in C and len() of a set runs in constant time, this should also be pretty fast.
Consider:
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> flat = [[(x - i) for i, x in enumerate(tu)] for tu in nums]
>>> print flat
[[1, 1, 1, 1], [1, 4, 4], [19, 19, 22, 26, 396]]
>>> import itertools
>>> print sum(1 for tu in flat for _ in itertools.groupby(tu))
7
>>>
we "flatten" the "increasing ramps" of interest by subtracting the index from the value, turning them into consecutive "runs" of identical values; then we identify and count the "runs" with the precious itertools.groupby. This seems to be a pretty elegant (and speedy) solution to your problem.
Just to show something closer to your original code:
def sumranges(nums):
    return sum( (1 for i in nums
                   for j, v in enumerate(i)
                   if j == 0 or v != i[j-1] + 1) )
The idea here was to:
avoid building intermediate lists but use a generator instead, it will save some resources
avoid using indices when you already have selected a subelement (i and v above).
The remaining sum() is still necessary with my example though.
Here's my attempt:
def ranges(ls):
    for l in ls:
        consec = False
        for (a, b) in zip(l, l[1:] + (None,)):
            if b == a + 1:
                consec = True
            if b is not None and b != a + 1:
                consec = False
            if consec:
                yield 1
'''
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print sum(ranges(nums))
7
'''
It looks at the numbers pairwise, checking if they are a consecutive pair (unless it's at the last element of the list). Each time there's a consecutive pair of numbers it yields 1.
This could probably be put together in a more compact form, but I think clarity would suffer:
def pairs(seq):
    for i in range(1, len(seq)):
        yield (seq[i-1], seq[i])

def isadjacent(pair):
    return pair[0]+1 == pair[1]

def sumrange(seq):
    return 1 + sum([1 for pair in pairs(seq) if not isadjacent(pair)])

def sumranges(nums):
    return sum([sumrange(seq) for seq in nums])

nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print sumranges(nums)  # prints 7
You could probably do this better if you had an IntervalSet class because then you would scan through your ranges to build your IntervalSet, then just use the count of set members.
Some tasks don't always lend themselves to neat code, particularly if you need to write the code for performance.
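For illustration, a minimal stand-in for such an IntervalSet might look like this (the class name and API are assumptions, not an existing library; it relies on the tuples being sorted, as stated in the question):
class IntervalSet:
    """Collects maximal runs of consecutive integers fed to it in sorted order."""
    def __init__(self, nums=()):
        self.intervals = []
        for n in nums:
            self.add(n)

    def add(self, n):
        # extend the last run if n continues it, otherwise start a new run
        if self.intervals and self.intervals[-1][1] + 1 == n:
            self.intervals[-1][1] = n
        else:
            self.intervals.append([n, n])

    def __len__(self):
        return len(self.intervals)

nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print(sum(len(IntervalSet(t)) for t in nums))  # 7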
There is a formula for this: the sum of the first n numbers is 1 + 2 + ... + n = n(n+1)/2. If you want the sum of i through j inclusive, it is j(j+1)/2 - (i-1)i/2, which you can simplify further if you like. It might not be pythonic, but it is what I would use.
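A quick sketch of that closed form (note the (i-1) in the lower term, which keeps i itself in the sum):
def range_sum(i, j):
    # sum of the consecutive integers i..j inclusive, via the closed form
    return j * (j + 1) // 2 - (i - 1) * i // 2

assert range_sum(3, 7) == sum(range(3, 8))  # 3+4+5+6+7 == 25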
