python ElementTree.Element missing text? - python

So, I'm parsing this xml file of moderate size (about 27K lines). Not far into it, I'm seeing unexpected behavior from ElementTree.Element where I get Element.text for one entry but not the next, yet it's there in the source XML as you can see:
<!-- language: lang-xml -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:enumeration value="24">
<xs:annotation>
<xs:documentation>UPC12 (item-specific) on cover 2</xs:documentation>
<xs:documentation>AKA item/price; ‘cover 2’ is defined as the inside front cover of a book</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="25">
<xs:annotation>
<xs:documentation>UPC12+5 (item-specific) on cover 2</xs:documentation>
<xs:documentation>AKA item/price; ‘cover 2’ is defined as the inside front cover of a book</xs:documentation>
</xs:annotation>
</xs:enumeration>
When I encounter an enumeration tag I call this function:
import xml.etree.cElementTree as ElementTree
...
def _parse_list_item(xmlns: str, list_id: int, itemElement: ElementTree.Element) -> ListItem:
if isinstance(itemElement, ElementTree.Element):
if itemElement.attrib['value'] is not None:
item_id = itemElement.attrib['value'] # string
if list_id == 6 and (item_id == '25' or item_id=='24'):
print(list_id, item_id) # <== debug break point here
desc = None
notes = ""
for child in itemElement:
if child.tag == (xmlns + 'annotation'):
for grandchild in child:
if grandchild.tag == (xmlns + 'documentation'):
if desc is None:
desc = grandchild.text
else:
if len(notes)>0:
notes += " " # add a space
notes += grandchild.text or ""
if item_id is not None and desc is not None:
return Codex.ListItem({'itemId': item_id, 'listId': list_id, 'description': desc, 'notes': notes})
If I place a breakpoint at the print statement, when I get to the enumeration node for "24" I can look at the text for the grandchild nodes and they are as shown in the XML, i.e. "UPC12..." or "AKA item...", but when I get to the enumeration node for "25", and look at the grandchild text, it's None.
When I remove the xs: namespace by pre-filtering the XML file, the grandchild text comes through fine.
Is it possible I'm over some size limit or is there some syntax problem?
Sorry for less-than-pythonic code but I wanted to be able to examine all the intermediate values in pycharm. It's python 3.6.
Thanks for any insights you may have!

In the for loop, this condition is never met: if child.tag == (xmlns + 'annotation'):.
Why?
Try to output the child's tag. If we suppose your namespace (xmlns) is 'Steve' then:
print(child.tag) will output: {Steve}annotation, not Steveannotation.
So given this fact, if child.tag == (xmlns + 'annotation'): is always False.
You should change it to: if child.tag == ('{'+xmlns+'}annotation'):
With the same logic, you will find out you will also have to change this condition:
if grandchild.tag == (xmlns + 'documentation'):
to:
if grandchild.tag == ('{'+xmlns+'}documentation'):

So, ultimately, I solved my problem by running a pre-process on the XML file to remove the xs: namespace from all of the open/close XML tags and then I was able to successfully process the file using the function as defined above. Not sure why namespaces are causing problems, but perhaps there is a bug in cElementTree for namespace prefixes in large XML files. To #mzjn - I expect that it would be difficult to construct a minimal example as it does process hundreds of items correctly before it fails, so I would at least have to provide a fairly large XML file. Nevertheless, thanks for being a sounding board.

Related

How to handle XML element iter nested attributes with the same tag

I am trying to parse NPORT-P XML SEC submission. My code (Python 3.6.8) with a sample XML record:
import xml.etree.ElementTree as ET
content_xml = '<?xml version="1.0" encoding="UTF-8"?><edgarSubmission xmlns="http://www.sec.gov/edgar/nport" xmlns:com="http://www.sec.gov/edgar/common" xmlns:ncom="http://www.sec.gov/edgar/nportcommon" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><headerData></headerData><formData><genInfo></genInfo><fundInfo></fundInfo><invstOrSecs><invstOrSec><name>N/A</name><lei>N/A</lei><title>US 10YR NOTE (CBT)Sep20</title><cusip>N/A</cusip> <identifiers> <ticker value="TYU0"/> </identifiers> <derivativeInfo> <futrDeriv derivCat="FUT"> <counterparties> <counterpartyName>Chicago Board of Trade</counterpartyName> <counterpartyLei>549300EX04Q2QBFQTQ27</counterpartyLei> </counterparties><payOffProf>Short</payOffProf> <descRefInstrmnt> <otherRefInst> <issuerName>U.S. Treasury 10 Year Notes</issuerName> <issueTitle>U.S. Treasury 10 Year Notes</issueTitle> <identifiers> <cusip value="N/A"/><other otherDesc="USER DEFINED" value="TY_Comdty"/> </identifiers> </otherRefInst> </descRefInstrmnt> <expDate>2020-09-21</expDate> <notionalAmt>-2770555</notionalAmt> <curCd>USD</curCd> <unrealizedAppr>-12882.5</unrealizedAppr></futrDeriv> </derivativeInfo> </invstOrSec> </invstOrSecs> <signature> </signature> </formData></edgarSubmission>'
content_tree = ET.ElementTree(ET.fromstring(bytes(content_xml, encoding='utf-8')))
content_root = content_tree.getroot()
for edgar_submission in content_root.iter('{http://www.sec.gov/edgar/nport}edgarSubmission'):
for form_data in edgar_submission.iter('{http://www.sec.gov/edgar/nport}formData'):
for genInfo in form_data.iter('{http://www.sec.gov/edgar/nport}genInfo'):
None
for fundInfo in form_data.iter('{http://www.sec.gov/edgar/nport}fundInfo'):
None
for invstOrSecs in form_data.iter('{http://www.sec.gov/edgar/nport}invstOrSecs'):
for invstOrSec in invstOrSecs.iter('{http://www.sec.gov/edgar/nport}invstOrSec'):
myrow = []
myrow.append(getattr(invstOrSec.find('{http://www.sec.gov/edgar/nport}name'), 'text', ''))
myrow.append(getattr(invstOrSec.find('{http://www.sec.gov/edgar/nport}lei'), 'text', ''))
security_title = getattr(invstOrSec.find('{http://www.sec.gov/edgar/nport}title'), 'text', '')
myrow.append(security_title)
myrow.append(getattr(invstOrSec.find('{http://www.sec.gov/edgar/nport}cusip'), 'text', ''))
for identifiers in invstOrSec.iter('{http://www.sec.gov/edgar/nport}identifiers'):
if identifiers.find('{http://www.sec.gov/edgar/nport}isin') is not None:
myrow.append(identifiers.find('{http://www.sec.gov/edgar/nport}isin').attrib['value'])
else:
myrow.append('')
if security_title == "US 10YR NOTE (CBT)Sep20":
print("No ISIN")
if identifiers.find('{http://www.sec.gov/edgar/nport}ticker') is not None:
myrow.append(identifiers.find('{http://www.sec.gov/edgar/nport}ticker').attrib['value'])
else:
myrow.append('')
if security_title == "US 10YR NOTE (CBT)Sep20":
print("No Ticker")
if identifiers.find('{http://www.sec.gov/edgar/nport}other') is not None:
myrow.append(identifiers.find('{http://www.sec.gov/edgar/nport}other').attrib['value'])
else:
myrow.append('')
if security_title == "US 10YR NOTE (CBT)Sep20":
print("No Other")
The output from this code is:
No ISIN
No Other
No ISIN
No Ticker
This working fine aside from the fact that the identifiers iter invstOrSec.iter('{http://www.sec.gov/edgar/nport}identifiers') finds identifiers under formData>invstOrSecs>invstOrSec but also other identifiers under a nested tag under formData>invstOrSecs>invstOrSec>derivativeInfo>futrDeriv>descRefInstrmnt>otherRefInst. How can I restrict my iter or the find to the right level? I have unsuccessfully tried to get the parent but I am not finding how to do this using the {namespace}tag notation. Any ideas?
So I switched from ElementTree to lxml using an import like this to avoid code changes:
from lxml import etree as ET
Make sure you check https://lxml.de/1.3/compatibility.html to avoid compatibility issues. In my case lxml worked without issues.
And I then I was able to use the getparent() method to be able to only read the identifiers from the right part of the XML file:
if identifiers.getparent().tag == '{http://www.sec.gov/edgar/nport}invstOrSec':

Python/LXML - Children node returning 'NoneType'

Here is a sample of the XML tree I am traversing:
<entry dataset="Swiss-Prot" created="1993-07-01+01:00" modified="2013-04-03+01:00" version="144">
<accession>P31750</accession>
<accession>Q62274</accession>
<accession>Q6GSA6</accession>
<name>AKT1_MOUSE</name>
<protein>
<recommendedName>
<fullName>RAC-alpha serine/threonine-protein kinase</fullName>
<ecNumber>2.7.11.1</ecNumber>
</recommendedName>
<alternativeName>
<fullName>AKT1 kinase</fullName>
</alternativeName><alternativeName>
<fullName>Protein kinase B</fullName>
<alternativeName>
<fullName>Some other value</fullName>
</alternativeName><alternativeName>
..........
I am trying to get to alternativeName. I do not have any trouble getting to recommended name, so I try the same approach with alternativeName. However, the Python interpretor outputs the following error message:
for child in protein.find("{http://uniprot.org/uniprot}alternativeName"):
TypeError: 'NoneType' object is not iterable
Here is the Python code I am using to get at these elements. Again, the code works fine for recommendedName but NOT for alternativeName. Thanks for any help!
alt_shortnames = []
alt_fullnames = []
protein = e.find("{http://uniprot.org/uniprot}protein")
for child in protein.find("{http://uniprot.org/uniprot}alternativeName"):
if child.tag == "{http://uniprot.org/uniprot}fullName":
alt_fullnames.append(child.text)
if child.tag == "{http://uniprot.org/uniprot}shortName":
alt_shortnames.append(child.text)
temp_dict["alternativeFullNames"] = alt_fullnames
temp_dict["alternativeShortNames"] = alt_shortnames
You are using protein.find(); the .find() method returns either the found element or None if nothing is found.
If you expect to find a sequence of elements, use .findall(). That method always returns an iterable (possibly empty):
for altName in protein.findall("{http://uniprot.org/uniprot}alternativeName"):
for child in altName:
if child.tag == "{http://uniprot.org/uniprot}fullName":
alt_fullnames.append(child.text)
if child.tag == "{http://uniprot.org/uniprot}shortName":
alt_shortnames.append(child.text)

parse a special xml in python

I have s special xml file like below:
<alarm-dictionary source="DDD" type="ProxyComponent">
<alarm code="402" severity="Alarm" name="DDM_Alarm_402">
<message>Database memory usage low threshold crossed</message>
<description>dnKinds = database
type = quality_of_service
perceived_severity = minor
probable_cause = thresholdCrossed
additional_text = Database memory usage low threshold crossed
</description>
</alarm>
...
</alarm-dictionary>
I know in python, I can get the "alarm code", "severity" in tag alarm by:
for alarm_tag in dom.getElementsByTagName('alarm'):
if alarm_tag.hasAttribute('code'):
alarmcode = str(alarm_tag.getAttribute('code'))
And I can get the text in tag message like below:
for messages_tag in dom.getElementsByTagName('message'):
messages = ""
for message_tag in messages_tag.childNodes:
if message_tag.nodeType in (message_tag.TEXT_NODE, message_tag.CDATA_SECTION_NODE):
messages += message_tag.data
But I also want to get the value like dnkind(database), type(quality_of_service), perceived_severity(thresholdCrossed) and probable_cause(Database memory usage low threshold crossed
) in tag description.
That is, I also want to parse the content in the tag in xml.
Could anyone help me with this?
Thanks a lot!
Once you have the text from the description tag, it's nothing to do with XML parsing. You just need do simple string-parsing to get the type = quality_of_service keys/values strings into something nicer to use in Python like a dictionary
With some slightly simpler parsing thanks to ElementTree, it would look like this
messages = """
<alarm-dictionary source="DDD" type="ProxyComponent">
<alarm code="402" severity="Alarm" name="DDM_Alarm_402">
<message>Database memory usage low threshold crossed</message>
<description>dnKinds = database
type = quality_of_service
perceived_severity = minor
probable_cause = thresholdCrossed
additional_text = Database memory usage low threshold crossed
</description>
</alarm>
...
</alarm-dictionary>
"""
import xml.etree.ElementTree as ET
# Parse XML
tree = ET.fromstring(messages)
for alarm in tree.getchildren():
# Get code and severity
print alarm.get("code")
print alarm.get("severity")
# Grab description text
descr = alarm.find("description").text
# Parse "thing=other" into dict like {'thing': 'other'}
info = {}
for dl in descr.splitlines():
if len(dl.strip()) > 0:
key, _, value = dl.partition("=")
info[key.strip()] = value.strip()
print info
I'm not completely sure on Python, but after quick research.
Seeing as you can already get all of the content from the description tag in XML, can you not split by line breaks, and then split each line using the str.split() function on the equals signs to give you name / value separately?
e.g.
for messages_tag in dom.getElementsByTagName('message'):
messages = ""
for message_tag in messages_tag.childNodes:
if message_tag.nodeType in (message_tag.TEXT_NODE, message_tag.CDATA_SECTION_NODE):
messages += message_tag.data
tag = str.split('=');
tagName = tag[0]
tagValue = tag[1]
(I haven't taken into account splitting each line up and looping)
But that should get you on the right track :)
AFAIK there is no library to handle the text as DOM elements.
You can however (after you have the message in the message variable) do:
description = {}
messageParts = message.split("\n")
for part in messageParts:
descInfo = part.split("=")
description[descInfo[0].strip()] = descInfo[1].strip()
then you'll have inside description the information you need in the form of a key-value map.
You should also add error handling on my code...

Python replace XML content with Etree

I'd like to parse and compare 2 XML files with the Python Etree parser as follows:
I have 2 XML files with loads of data. One is in English (the source file), the other one the corresponding French translation (the target file).
E.g.:
source file:
<AB>
<CD/>
<EF>
<GH>
<id>123</id>
<IJ>xyz</IJ>
<KL>DOG</KL>
<MN>dogs/dog</MN>
some more tags and info on same level
<metadata>
<entry>
<cl>Translation</cl>
<cl>English:dog/dogs</cl>
</entry>
<entry>
<string>blabla</string>
<string>blabla</string>
</entry>
some more strings and entries
</metadata>
</GH>
</EF>
<stuff/>
<morestuff/>
<otherstuff/>
<stuffstuff/>
<blubb/>
<bla/>
<blubbbla>8</blubbla>
</AB>
The target file looks exactly the same, but has no text at some places:
<MN>chiens/chien</MN>
some more tags and info on same level
<metadata>
<entry>
<cl>Translation</cl>
<cl></cl>
</entry>
The French target file has an empty cross-language reference where I'd like to put in the information from the English source file whenever the 2 macros have the same ID.
I already wrote some code in which I replaced the string tag name with a unique tag name in order to identify the cross-language reference. Now I want to compare the 2 files and if 2 macros have the same ID, exchange the empty reference in the French file with the info from the English file. I was trying out the minidom parser before but got stuck and would like to try Etree now. I have hardly any knowledge about programming and find this very hard.
Here is the code I have so far:
macros = ElementTree.parse(english)
for tag in macros.getchildren('macro'):
id_ = tag.find('id')
data = tag.find('cl')
id_dict[id_.text] = data.text
macros = ElementTree.parse(french)
for tag in macros.getchildren('macro'):
id_ = tag.find('id')
target = tag.find('cl')
if target.text.strip() == '':
target.text = id_dict[id_.text]
print (ElementTree.tostring(macros))
I am more than clueless and reading other posts on this confuses me even more. I'd appreciate it very much if someone could enlighten me :-)
There is probably more details to be clarified. Here is the sample with some debug prints that shows the idea. It assumes that both files have exactly the same structure, and that you want to go only one level below the root:
import xml.etree.ElementTree as etree
english_tree = etree.parse('en.xml')
french_tree = etree.parse('fr.xml')
# Get the root elements, as they support iteration
# through their children (direct descendants)
english_root = english_tree.getroot()
french_root = french_tree.getroot()
# Iterate through the direct descendants of the root
# elements in both trees in parallel.
for en, fr in zip(english_root, french_root):
assert en.tag == fr.tag # check for the same structure
if en.tag == 'id':
assert en.text == fr.text # check for the same id
elif en.tag == 'string':
if fr.text is None:
fr.text = en.text
print en.text # displaying what was replaced
etree.dump(french_tree)
For more complex structures of the file, the loop through the direct children of the node can be replaced by iteration through all the elements of the tree. If the structures of the files are exactly the same, the following code will work:
import xml.etree.ElementTree as etree
english_tree = etree.parse('en.xml')
french_tree = etree.parse('fr.xml')
for en, fr in zip(english_tree.iter(), french_tree.iter()):
assert en.tag == fr.tag # check if the structure is the same
if en.tag == 'id':
assert en.text == fr.text # identification must be the same
elif en.tag == 'string':
if fr.text is None:
fr.text = en.text
print en.text # display the inserted text
# Write the result to the output file.
with open('fr2.xml', 'w') as fout:
fout.write(etree.tostring(french_tree.getroot()))
However, it works only in cases when both files have exactly the same structure. Let's follow the algorithm that would be used when the task is to be done manually. Firstly, we need to find the French translation that is empty. Then it should be replaced by the English translation from the GH element with the same identification. A subset of XPath expressions is used in the case when searching for the elements:
import xml.etree.ElementTree as etree
def find_translation(tree, id_):
# Search fot the GH element with the given identification, and return
# its translation if found. Otherwise None is returned implicitly.
for gh in tree.iter('GH'):
id_elem = gh.find('./id')
if id_ == id_elem.text:
# The related GH element found.
# Find metadata entry, extract the translation.
# Warning! This is simplification for the fixed position
# of the Translation entry.
me = gh.find('./metadata/entry')
assert len(me) == 2 # metadata/entry has two elements
cl1 = me[0]
assert cl1.text == 'Translation'
cl2 = me[1]
return cl2.text
# Body of the program. --------------------------------------------------
english_tree = etree.parse('en.xml')
french_tree = etree.parse('fr.xml')
for gh in french_tree.iter('GH'): # iterate through the GH elements only
# Get the identification of the GH section
id_elem = gh.find('./id')
id_ = id_elem.text
# Find and check the metadata entry, extract the French translation.
# Warning! This is simplification for the fixed position of the Translation
# entry.
me = gh.find('./metadata/entry')
assert len(me) == 2 # metadata/entry has two elements
cl1 = me[0]
assert cl1.text == 'Translation'
cl2 = me[1]
fr_translation = cl2.text
# If the French translation is empty, put there the English translation
# from the related element.
if cl2.text is None:
cl2.text = find_translation(english_tree, id_)
with open('fr2.xml', 'w') as fout:
fout.write(etree.tostring(french_tree.getroot()).decode('utf-8'))

Search and remove element with elementTree in Python

I have an XML document in which I want to search for some elements and if they match some criteria
I would like to delete them
However, I cannot seem to be able to access the parent of the element so that I can delete it
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"
props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
type = prop.attrib.get('type', None)
if type == 'json':
value = json.loads(prop.attrib['value'])
if value['name'] == 'Page1.Button1':
#here I need to access the parent of prop
# in order to delete the prop
Is there a way I can do this?
Thanks
You can remove child elements with the according remove method. To remove an element you have to call its parents remove method. Unfortunately Element does not provide a reference to its parents, so it is up to you to keep track of parent/child relations (which speaks against your use of elem.findall())
A proposed solution could look like this:
root = elem.getroot()
for child in root:
if child.name != "prop":
continue
if True:# TODO: do your check here!
root.remove(child)
PS: don't use prop.attrib.get(), use prop.get(), as explained here.
You could use xpath to select an Element's parent.
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"
props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
type = prop.get('type', None)
if type == 'json':
value = json.loads(prop.attrib['value'])
if value['name'] == 'Page1.Button1':
# Get parent and remove this prop
parent = prop.find("..")
parent.remove(prop)
http://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax
Except if you try that it doesn't work: http://elmpowered.skawaii.net/?p=74
So instead you have to:
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"
search = './/{0}prop'.format(namespace)
# Use xpath to get all parents of props
prop_parents = elem.findall(search + '/..')
for parent in prop_parents:
# Still have to find and iterate through child props
for prop in parent.findall(search):
type = prop.get('type', None)
if type == 'json':
value = json.loads(prop.attrib['value'])
if value['name'] == 'Page1.Button1':
parent.remove(prop)
It is two searches and a nested loop. The inner search is only on Elements known to contain props as first children, but that may not mean much depending on your schema.
I know this is an old thread but this kept popping up while I was trying to figure out a similar task. I did not like the accepted answer for two reasons:
1) It doesn't handle multiple nested levels of tags.
2) It will break if multiple xml tags are deleted in the same level one-after-another. Since each element is an index of Element._children you shouldn't delete while forward iterating.
I think a better more versatile solution is this:
import xml.etree.ElementTree as et
file = 'test.xml'
tree = et.parse(file)
root = tree.getroot()
def iterator(parents, nested=False):
for child in reversed(parents):
if nested:
if len(child) >= 1:
iterator(child)
if True: # Add your entire condition here
parents.remove(child)
iterator(root, nested=True)
For the OP, this should work - but I don't have the data you're working with to test if it's perfect.
import xml.etree.ElementTree as et
file = 'test.xml'
tree = et.parse(file)
namespace = "{http://somens}"
props = tree.findall('.//{0}prop'.format(namespace))
def iterator(parents, nested=False):
for child in reversed(parents):
if nested:
if len(child) >= 1:
iterator(child)
if prop.attrib.get('type') == 'json':
value = json.loads(prop.attrib['value'])
if value['name'] == 'Page1.Button1':
parents.remove(child)
iterator(props, nested=True)
A solution using lxml module
from lxml import etree
root = ET.fromstring(xml_str)
for e in root.findall('.//{http://some.name.space}node'):
parent = e.getparent()
for child in parent.find('./{http://some.name.space}node'):
try:
parent.remove(child)
except ValueError:
pass
Using the fact that every child must have a parent, I'm going to simplify #kitsu.eb's example. f using the findall command to get the children and parents, their indices will be equivalent.
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"
search = './/{0}prop'.format(namespace)
# Use xpath to get all parents of props
prop_parents = elem.findall(search + '/..')
props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
type = prop.attrib.get('type', None)
if type == 'json':
value = json.loads(prop.attrib['value'])
if value['name'] == 'Page1.Button1':
#use the index of the current child to find
#its parent and remove the child
prop_parents[props.index[prop]].remove(prop)
I also used XPath for this issue, but in a different way:
root = elem.getroot()
elementName = "YourElement"
#this will find all the parents of the elements with elementName
for elementParent in root.findall(".//{}/..".format(elementName)):
#this will find all the elements under the parent, and remove them
for element in elementParent.findall("{}".format(elementName)):
elementParent.remove(element)
I like to use an XPath expression for this kind of filtering. Unless I know otherwise, such an expression must be applied at the root level, which means I can't just get a parent and apply the same expression on that parent. However, it seems to me that there is a nice and flexible solution that should work with any supported XPath, as long as none of the sought nodes is the root. It goes something like this:
root = elem.getroot()
# Find all nodes matching the filter string (flt)
nodes = root.findall(flt)
while len(nodes):
# As long as there are nodes, there should be parents
# Get the first of all parents to the found nodes
parent = root.findall(flt+'/..')[0]
# Use this parent to remove the first node
parent.remove(nodes[0])
# Find all remaining nodes
nodes = root.findall(flt)
I would like only to add a comment on the accepted answer, but my lack of reputation doesn't allow me to. I wanted to add that it is important to add .findall("*")to the iterator to avoid issues, as stated in the documentation:
Note that concurrent modification while iterating can lead to problems, just like when iterating and modifying Python lists or dicts. Therefore, the example first collects all matching elements with root.findall(), and only then iterates over the list of matches.
Therefore, in the accepted answer the iteration should be for child in root.findal("*"):instead of for child in root:. Not doing so made my code skip some elements from the list.

Categories