Intrusion Detection Working Group                              D. Curry
Internet Draft                                                      ISS
Document: draft-ietf-idwg-idmef-xml-02.txt                December 2000
Category: Informational

Intrusion Detection Message Exchange Format
IDMEF Data Model
Extensible Markup Language (XML) Document Type Definition

   David A. Curry
   Internet Security Systems, Inc.
   davy@iss.net

   Herve Debar
   France Telecom R&D
   herve.debar@francetelecom.fr

   Ming-Yuh Huang
   The Boeing Company
   Ming-Yuh.Huang@boeing.com

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Curry                Informational - Expires June 2001         [Page 1]
Internet Draft       IDMEF Data Model and XML DTD         December 2000

Abstract

   The purpose of the Intrusion Detection Message Exchange Format (IDMEF) is to define data formats and exchange procedures for sharing information of interest to intrusion detection and response systems, and to the management systems that may need to interact with them. The goals and requirements of the IDMEF are described in [2].

   This Internet-Draft describes a proposed data model to represent the information exported by the intrusion-detection systems, including the rationale for this model, and a proposed implementation of this data model, using the Extensible Markup Language (XML) [3].
   The rationale for choosing XML is explained, a Document Type Definition (DTD) is developed, and examples are provided.

   An earlier version of this implementation was reviewed, along with other proposed implementations, by the IDWG at its September 1999 and February 2000 meetings. At the February meeting, it was decided that the XML solution was best at fulfilling the IDWG requirements. The rationale for this decision is presented in [4].

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Rationale for the IDMEF Data Model
   3.1. Problems addressed by the data model
   3.2. Design goals
   3.2.1. Representing events
   3.2.2. Content driven
   3.2.3. Relationship between alerts
   3.3. UML Overview
   3.3.1. Relationships
   3.3.1.1. Inheritance Relationship
   3.3.1.2. Aggregation Relationship
   3.3.1.3. Multiplicity Indicator
   3.3.2. Types and default values
   4. The Data Model
   4.1. Data model overview
   4.2. The core of the data model
   4.2.1. The ALERT class
   4.2.2. The TOOLALERT class
   4.2.3. The CORRELATIONALERT class
   4.2.4. The OVERFLOWALERT class
   4.2.5. The ANALYZER class
   4.2.6. The CLASSIFICATION class
   4.2.7. The ADDITIONALDATA class
   4.2.8. The TARGET class
   4.2.9. The SOURCE class
   4.2.10. The support classes
   4.2.10.1. The IDENT class
   4.2.10.2. The ADDRESS class
   4.2.10.3. The USER class
   4.2.10.4. The NODE class
   4.2.10.5. The PROCESS class
   4.2.11. The SERVICE class
   4.2.11.1. The WEBSERVICE class
   4.2.11.2. The SNMPSERVICE class
   4.3. Extension of the data model
   4.3.1. Extension by aggregation
   4.3.2. Extension by subclassing
   5. Arguments for the realization of the class hierarchy in XML
   5.1. The Extensible Markup Language
   5.2. Rationale for Implementing IDMEF in XML
   5.3. Relationship to the IDMEF Class Hierarchy
   6. Use of XML in the IDMEF
   6.1. The IDMEF Document Prolog
   6.1.1. XML Declaration
   6.1.2. XML Document Type Definition (DTD)
   6.1.3. IDMEF DTD Formal Public Identifier
   6.1.4. IDMEF DTD Document Type Declaration
   6.2. Character Data Processing in XML and IDMEF
   6.2.1. Character Entity References
   6.2.2. Character Code References
   6.2.3. White Space Processing
   6.3. Languages in XML and IDMEF
   6.4. Unrecognized Tags in IDMEF Messages
   6.5. Digital Signatures
   7. IDMEF Data Types
   8. Structure of an IDMEF Message
   8.1. The IDMEF-Message Root Element
   8.2. The Message Type Elements
   8.2.1. Alert
   8.2.1.1. CorrelationAlert
   8.2.1.2. OverflowAlert
   8.2.1.3. ToolAlert
   8.2.2. Heartbeat
   8.3. Time Elements
   8.3.1. Time
   8.3.2. DetectTime
   8.3.3. AnalyzerTime
   8.4. High-Level Entity Identification Elements
   8.4.1. Analyzer
   8.4.2. Target
   8.4.3. Source
   8.5. Low-Level Entity Identification Elements
   8.5.1. Address
   8.5.2. Classification
   8.5.3. Node
   8.5.4. Process
   8.5.5. Service
   8.5.5.1. SNMPService
   8.5.5.2. WebService
   8.5.6. User
   8.6. Simple Elements
   8.6.1. address
   8.6.2. alertid
   8.6.3. Arguments
   8.6.3.1. arg
   8.6.4. buffer
   8.6.5. cgi
   8.6.6. command
   8.6.7. community
   8.6.8. date
   8.6.9. dport
   8.6.10. Environment
   8.6.10.1. env
   8.6.11. gid
   8.6.12. group
   8.6.13. location
   8.6.14. method
   8.6.15. name
   8.6.16. netmask
   8.6.17. ntpstamp
   8.6.18. oid
   8.6.19. path
   8.6.20. pid
   8.6.21. portlist
   8.6.22. program
   8.6.23. protocol
   8.6.24. serial
   8.6.25. size
   8.6.26. sport
   8.6.27. time
   8.6.28. url
   8.6.29. uid
   8.7. Providing Additional Information
   8.7.1. AdditionalData
   9. Examples
   9.1. Denial of Service Attacks
   9.1.1. The "teardrop" Attack
   9.1.2. The "ping of death" Attack
   9.2. Port Scanning Attacks
   9.2.1. Connection to a Disallowed Service
   9.2.2. Simple Port Scanning
   9.3. Local Attacks
   9.3.1. The "loadmodule" Attack
   9.3.2. The "phf" Attack
   9.4. System Policy Violation
   9.5. Correlated Alerts
   9.6. Heartbeat
   10. Extending the IDMEF
   10.1. Extending an Existing Attribute
   10.2. Adding an Attribute
   10.3. Adding an Element
   11. The IDMEF Document Type Definition
   12. Security Considerations
   13. References
   14. Acknowledgments
   15. Author's Addresses

1. Introduction

   The Intrusion Detection Message Exchange Format (IDMEF) [2] is intended to be a standard data format that automated intrusion detection systems can use to report alerts about events that they have deemed suspicious. The development of this standard format will enable interoperability among commercial, open source, and research systems, allowing users to mix-and-match the deployment of these systems according to their strong and weak points to obtain an optimal implementation.

   The most obvious place to implement the IDMEF is in the data channel between an intrusion-detection "analyzer" (or "sensor") and the "manager" (or "console") to which it sends alarms.
   But there are other places where the IDMEF can be useful:

   + A single database system that could store the results from a variety of intrusion detection products would make it possible for data analysis and reporting activities to be performed on "the whole picture" instead of just a part of it;

   + An event correlation system that could accept alerts from a variety of intrusion detection products would be capable of performing more sophisticated cross-correlation and cross-confirmation calculations than one that is limited to a single product;

   + A graphical user interface that could display alerts from a variety of intrusion detection products would enable the user to monitor all of the products from a single screen, and require him or her to learn only one interface, instead of several; and

   + A common data exchange format would make it easier for different organizations (users, vendors, response teams, law enforcement) to not only exchange data, but also communicate about it.

2. Conventions used in this document

   The keywords "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", and "MAY" in this document are to be interpreted as described in RFC-2119 [5].

3. Rationale for the IDMEF Data Model

3.1. Problems addressed by the data model

   The reasons for proposing an object-oriented model as the data representation format of the IDWG are:

   + Alert information is inherently heterogeneous. Certain alerts are defined with very little information, such as origin, destination, name and time of the event. Other alerts provide much more context, such as ports or services, processes, user information, and others. Therefore, it is important that the data representation proposed is flexible enough to accommodate different needs. An object-oriented model has a natural extensibility via subclassing.
     If an implementation of the data model extends it with new classes, either by aggregation or subclassing, an implementation that does not understand these extensions will still be able to understand the subset of information that is defined by the data model. Subclassing and aggregation provide extensibility while preserving the consistency of the model.

   + Tool environments are different. Some tools detect attacks by analyzing network traffic while others use operating system logs, or application audit information. The same attack reported by tools with different information sources will not contain the same information. The data model defines support classes that accommodate the differences in data sources among tools. In particular, the notion of target and source for the alert are represented by the combination of NODE, USER, PROCESS and SERVICE classes.

   + Tool capabilities are different. Depending on the environment, one may install a lightweight tool providing little information. More complex tools that will have a greater impact on the running system provide more detailed information about the alerts observed by the intrusion detection system. The data model must allow for conversion to formats used by tools other than intrusion detection sensors, for the purpose of further processing the alert information. The data model defines extensions to the basic schema that allow carrying both simple and complex alerts. Extensions are either done through subclassing or association of new classes.

   + Operating environments are different. Depending on the kind of network, or operating system used, attacks will be observed and reported with different characteristics. The data model should accommodate these differences. The reporting flexibility is brought by the definition of the NODE and SERVICE support classes.
     If additional information must be reported, subclasses should be defined that extend the data model with the additional attributes.

   + Commercial vendor objectives are different. Depending on the constraints set forth for the development of the tool, or on the operating environment, vendors may wish to deliver more or less information about certain attacks. Vendors may want to provide more information about alerts. Again, the object-oriented approach allows this flexibility while specifying how the subclassing mechanism must be used.

3.2. Design goals

3.2.1. Representing events

   The goal of the data model is to provide a standard representation of the information that an intrusion-detection analyzer reports when it detects an occurrence of some unusual activity. These alerts may be simple or complex, depending on the capabilities of the analyzer that created them.

3.2.2. Content driven

   The design of the data model is content-driven. This means that new objects are introduced to accommodate additional content, not semantic differences between the alerts. This is an important goal as the task of classifying and naming computer vulnerabilities is extremely difficult and subjective.

   The data model MUST be unambiguous. This means that we allow tools to be more or less precise than one another, e.g., one tool may report more information about an event than another. However, we do not allow them to produce contradictory information in two alerts describing the same event; e.g., in the previous case, the common set of information reported by the two tools MUST be identical and inserted in the same placeholder of the data structure. Of course, it is always possible to insert all interesting information about an event in the extensions to an alert instead of using pre-defined fields; doing this reduces interoperability and MUST be avoided as much as possible.

3.2.3. Relationship between alerts

   Intrusion detection alerts can be transmitted at several levels. This draft applies to both very simple alerts (those alerts that are the result of a single action or operation in the system, such as a failed login report) and to more complex alerts (the aggregation of several actions in the system to generate the alert, or the aggregation of several simple alerts).

   As such, the data model must provide a way to describe the relationship between low-level and high-level alerts.

3.3. UML Overview

   The data model is described using the Unified Modeling Language (UML) [6]. UML provides a simple framework to represent entities and their relationships. UML defines entities as classes. In this document we have identified the classes with the associated attributes. The symbols used in this document to represent class and attributes are:

   +---------------------+
   |        CLASS        |    -> Class name (in uppercase)
   +---------------------+
   | Attribute           |    -> Name of attribute 1
   | ...                 |
   | Attribute           |    -> Name of attribute N
   +---------------------+

   Please note that attributes for a class do not appear in all diagrams that use the class.

3.3.1. Relationships

   This data model currently uses only two relationships, the inheritance relationship and the aggregation relationship.

3.3.1.1. Inheritance Relationship

   Inheritance denotes a superclass, subclass type of relationship where the subclass inherits all the attributes, operations and relationships of the superclass. This type of relationship is also referred to as an "is-a" or a "kind-of" relationship. The subclasses have additional attributes or operations, which apply only to the subclass and not to the superclass.

   In this document, inheritance is represented by the /_\ symbol. In the example below we are stating that an EXTENDED ALERT is a kind of ALERT.
   It contains all the attributes of Alert as well as any attributes contained in the extended alert class. (Note: EXTENDED ALERT does not have any particular role elsewhere in this document).

   +---------------------+
   |        ALERT        |
   +---------------------+
             /_\
              |
              |
   +---------------------+
   |   EXTENDED_ALERT    |
   +---------------------+

3.3.1.2. Aggregation Relationship

   Aggregation is a form of association in which the whole is related to its parts. This type of relationship is also referred to as a "part-of" relationship. In this case the aggregate class contains all of its own attributes and as many of the attributes associated with its parts as required and specified in the multiplicity indicators (discussed in the next paragraph). In this document the symbol <> is used to indicate aggregation. It is placed at the end of the association line closest to the aggregate (whole) class.

   +---------------------+           1+---------------------+
   |        ALERT        |<>----------|      ANALYZER       |
   +---------------------+            +---------------------+
   |                     |
   |                     |           1+---------------------+
   |                     |<>----------|        NAME         |
   |                     |            +---------------------+
   |                     |
   |                     |        0..*+---------------------+
   |                     |<>----------|       TARGET        |
   |                     |            +---------------------+
   |                     |
   |                     |        0..*+---------------------+
   |                     |<>----------|       SOURCE        |
   |                     |            +---------------------+
   |                     |
   |                     |        0..*+---------------------+
   |                     |<>----------|   ADDITIONALDATA    |
   +---------------------+            +---------------------+

3.3.1.3. Multiplicity Indicator

   Multiplicity defines the number of objects within a class that are linked to one another by an aggregation relationship (Section 3.3.1.2). Typically multiplicity indicators are placed at each end of the association line. In this document if a multiplicity number is left off it is assumed to be 1 (one).
   Standard symbols, as used in this document, are:

      1    = Exactly One
      0..* = Zero or More
      1..* = One or More
      0..1 = Zero or One
      5..8 = Specific Range (5, 6, 7, and 8)

   In the example above an Alert contains all the attributes of its own class, the attributes of exactly one Analyzer, and the attributes of exactly one Name. It may contain the attributes of zero or more target(s), zero or more source(s), and zero or more items of additional data.

3.3.2. Types and default values

   Attributes of the classes defined by the data model are typed. This type information is not an exact requirement on the implementation; it is rather intended to convey to the reader what kind of data the data model is expecting for this attribute. The exact representation is left to the XML DTD starting in Section 7. For example, the INTEGER type indicates that the data model views this attribute as an integer. It might actually be encoded as a binary 32-bit integer, a binary 64-bit integer, or a string in an XML representation.

   Name       Definition

   BOOLEAN    Boolean = (TRUE, FALSE)
   INTEGER    The value provided MUST be an integer
   CHARACTER  UTF-8 or UTF-16 character
   STRING     An ordered sequence of characters of known length
   BYTE       Byte (8-bits, no parity)
   TIME       A structure or schema carrying time representation.
              This is defined in the XML representation in Section 7
              and is understood as a generic time representation for
              the data model.
   ENUM       An enumerated type consisting of an ordered list of
              acceptable values. When an attribute is specified as
              ENUM, a table describing the acceptable values for the
              ENUM follows the attribute definition table. Each value
              has a rank (number) and a representing keyword.

   The default values are specified per class for each attribute. Only mandatory attributes have default values.

4. The Data Model

4.1. Data model overview

   An overview of the data model is presented in Figure 1.
   The main component is the ALERT class, which bears minimum required information along with the ANALYZER and CLASSIFICATION classes. The ANALYZER class describes the sender of the alert, i.e. the analyzer; every alert must be associated with one and only one analyzer. The CLASSIFICATION class describes the subject of the alert, i.e. the reason for sending it; at least one of them MUST be provided in the alert.

   +-------+       1+----------------+
   | ALERT |<>------|    ANALYZER    |
   |       |        +----------------+                   0..1+---------+
   |       |                                          +------|  USER   |
   |       |    1..*+----------------+                |      +---------+
   |       |<>------| CLASSIFICATION |                |  0..1+---------+
   |       |        +----------------+                +------| PROCESS |
   |       |                                          |      +---------+
   |       |    0..*+----------+    0..1+---------+   |  0..1+---------+
   |       |<>------|  TARGET  |<>------|  NODE   |<>-+------| SERVICE |
   |       |        +----------+        +---------+          +---------+
   |       |
   |       |    0..*+----------+    0..1+---------+      0..1+---------+
   |       |<>------|  SOURCE  |<>------|  NODE   |<>-+------|  USER   |
   |       |        +----------+        +---------+   |      +---------+
   |       |                                          |  0..1+---------+
   |       |    0..*+----------------+                +------| PROCESS |
   |       |<>------| ADDITIONALDATA |                       +---------+
   +-------+        +----------------+

                     Figure 1: Data Model Overview

   In addition, each alert is associated with zero or more TARGETs, and with zero or more SOURCEs. Each TARGET and SOURCE is described by a number of attributes, as described in sections 4.2.8 and 4.2.9.

   Provision of additional alert data is done by subclassing any of the model classes. Standard extensions and examples are given in section 4.3.

   It is important to note that this data model does not specify how an alert should be classified or identified. For example, a port scan may be determined as a single attack against multiple targets by one sensor where another sensor may see it as multiple attacks by a single source. The taxonomy for this analysis lies in each IDS. Once the alert type is determined this data model provides the standard structure for formatting the alert.
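
   The multiplicities in the overview (exactly one ANALYZER, one or more CLASSIFICATIONs, zero or more TARGETs, SOURCEs, and items of ADDITIONALDATA) can be read as constraints on an in-memory alert object. The following is a minimal sketch in Python and is not part of the IDMEF specification. The class and field names (Alert, Analyzer, Classification, Target, Source, additional_data) are illustrative assumptions, and the NODE, USER, PROCESS, and SERVICE parts are reduced to optional strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Analyzer:
    ident: int  # analyzer identification token


@dataclass
class Classification:
    name: str                # the associated name
    url: str                 # where more information can be found
    origin: str = "unknown"  # origin of the name; "unknown" is the default


@dataclass
class Target:
    # Each part is optional (multiplicity 0..1); plain strings are a
    # simplification made for this sketch.
    node: Optional[str] = None
    user: Optional[str] = None
    process: Optional[str] = None
    service: Optional[str] = None


@dataclass
class Source:
    node: Optional[str] = None
    user: Optional[str] = None
    process: Optional[str] = None


@dataclass
class Alert:
    analyzer: Analyzer                                          # exactly 1
    classifications: List[Classification]                       # 1..*
    targets: List[Target] = field(default_factory=list)         # 0..*
    sources: List[Source] = field(default_factory=list)         # 0..*
    additional_data: List[bytes] = field(default_factory=list)  # 0..*

    def __post_init__(self) -> None:
        # Enforce the two non-optional aggregations of the overview.
        if not isinstance(self.analyzer, Analyzer):
            raise ValueError("an alert needs exactly one analyzer")
        if not self.classifications:
            raise ValueError("an alert needs at least one classification")


# Hypothetical usage: one analyzer, one classification, one target.
alert = Alert(
    analyzer=Analyzer(ident=1),
    classifications=[Classification(name="teardrop", url="http://example.com/teardrop")],
    targets=[Target(node="host-a")],
)
```

   Checking the constraints at construction time is one design choice among several; an implementation could equally defer validation until the alert is serialized.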

4.2. The core of the data model

   The core of the data model is the ALERT class. Every alert is associated with a single analyzer that generated it, and a single time. It is also associated with a list of 0 or more targets, and a list of 0 or more sources. This relationship is illustrated in Figure 2.

      +--------------------+       1..1+----------------------+
      | ALERT              |<>---------| ANALYZER             |
      |--------------------|           +----------------------+
      | INTEGER version=1  |           | INTEGER ident        |
      | INTEGER alertID    |           | NODE host            |
      | ENUM impact        |           | PROCESS process      |
      | TIME time          |           +----------------------+
      |                    |       1..*+----------------------+
      |                    |<>---------| CLASSIFICATION       |
      |                    |           +----------------------+
      |                    |           | ENUM origin          |
      |                    |           | STRING name          |
      |                    |           | STRING url           |
      |                    |           +----------------------+
      |                    |       0..*+----------------------+
      |                    |<>---------| ADDITIONALDATA       |
      |                    |           +----------------------+
      |                    |           | STRING meaning       |
      |                    |           | ENUM type            |
      |                    |           | BYTE[] data          |
      |                    |           +----------------------+
      |                    |       0..*+----------------------+
      |                    |<>---------| TARGET               |
      |                    |           +----------------------+
      |                    |       0..*+----------------------+
      |                    |<>---------| SOURCE               |
      +--------------------+           +----------------------+
                       /_\
           +------------+----------+
           |            |          |
   +------------------+ | +-----------------+
   | TOOLALERT        | | | OVERFLOWALERT   |
   |------------------| | |-----------------|
   |STRING name       | | | INTEGER size    |
   |STRING command    | | | BYTE buffer     |
   |INTEGER[] alertIDs| | | STRING program  |
   +------------------+ | +-----------------+
              +--------------------+
              | CORRELATIONALERT   |
              +--------------------+
              | INTEGER[] alertIDs |
              +--------------------+

                       Figure 2: Data model core

   Figure 2 contains three classes that extend the ALERT class.
   These three classes have been included here because the information they carry appears frequently during the operation of intrusion-detection systems.

4.2.1. The ALERT class

   The ALERT class is the central component of the data model. An IDWG-compliant intrusion-detection analyzer must generate at a minimum this set of information.

   The ALERT class defines the following attributes:

   Attribute  Type      Definition

   version    INTEGER   The version of the class hierarchy used. The
                        current version is 1. This attribute is
                        MANDATORY. The default value is 1.
   alertID    INTEGER   A serial number for the alert. This number MUST
                        be unique for every alert generated by a given
                        analyzer. This attribute is MANDATORY. The
                        default value is 0 and indicates that the
                        analyzer cannot generate this information
                        reliably.
   time       TIME      The time of the alert. The format used is
                        defined in Section 7. This attribute is
                        MANDATORY and does not have a default value.
   impact     ENUM      The evaluated impact of the alert on the
                        system. The range of acceptable values is
                        ["unknown", "bad-unknown", "not-suspicious",
                        "attempted-admin", "successful-admin",
                        "attempted-dos", "successful-dos",
                        "attempted-recon", "successful-recon-limited",
                        "successful-recon-largescale",
                        "attempted-user", "successful-user"], further
                        defined in Section 8.2.1. This attribute is
                        MANDATORY.
                        The default value is "unknown".

   The definition of the acceptable values for the impact attribute is as follows:

   #   keyword                      Definition

   0   unknown                      Event's impact not known to analyzer
   1   bad-unknown                  Event is unpleasant in an unknown way
   2   not-suspicious               Event is not suspicious in any way
   3   attempted-admin              Attempt to obtain administrator access
   4   successful-admin             Successful compromise of admin access
   5   attempted-dos                Attempted denial-of-service
   6   successful-dos               Successful denial-of-service
   7   attempted-recon              Attempted reconnaissance probe
   8   successful-recon-limited     Successful reconnaissance probe of
                                    limited scope (e.g., one machine)
   9   successful-recon-largescale  Successful reconnaissance probe of
                                    large scale
   10  attempted-user               Attempt to obtain user-level access
   11  successful-user              Successful compromise of user access

4.2.2. The TOOLALERT class

   The TOOLALERT class carries additional information related to the use of attack tools or Trojan horses.

   Attribute  Type      Definition

   name       STRING    The reason for grouping the alerts, for example
                        an attack tool name (trinoo). This attribute is
                        MANDATORY. This attribute does not have a
                        default value.
   command    STRING    The command or operation that the tool was
                        asked to perform, for example a BackOrifice
                        ping. This attribute is OPTIONAL. The default
                        value is the empty STRING.
   alerts     INTEGER[] The list of alert identifiers that are related
                        to the name. This attribute is OPTIONAL. The
                        default value is the empty list.

4.2.3. The CORRELATIONALERT class

   The CORRELATIONALERT class carries additional information related to the correlation of alert information.

   Attribute  Type      Definition

   name       STRING    The reason for grouping the alerts, for example
                        a correlation method. This attribute is
                        MANDATORY. This attribute does not have a
                        default value.
   alerts     INTEGER[] The list of alert identifiers that are related
                        to the name. This attribute is OPTIONAL.
                        The default value is the empty list.

4.2.4. The OVERFLOWALERT class

   The OVERFLOWALERT class carries additional information related to overflow attacks. These include but are not limited to the infamous buffer overflow attacks, in which an attacker overflows a fixed-size buffer and has code run under higher privileges than his own.

   Attribute  Type      Definition

   size       INTEGER   The size of the overflowing buffer. This
                        attribute is MANDATORY. This attribute does not
                        have a default value.
   program    STRING    The program that the overflow attempted to run.
                        This attribute is MANDATORY. This attribute
                        does not have a default value.
   buffer     BYTE[]    A buffer containing the overflowing data
                        partially or entirely. This attribute is
                        OPTIONAL. The default value is the empty array.

4.2.5. The ANALYZER class

   The ANALYZER class identifies the intrusion detection analyzer that provided the alert. At the minimum, this is a unique identifier such as a serial number (unique over the organization where the IDS system is deployed). Additional identification information is provided.

   Attribute  Type      Definition

   ident      INTEGER   Analyzer identification token. This attribute
                        is MANDATORY. This attribute does not have a
                        default value. This token MUST be unique within
                        the communicating analyzers and managers.
   host       NODE      Identification of the equipment on which the
                        analyzer resides. This attribute is OPTIONAL.
                        The default value is EMPTY.
   process    PROCESS   Process information concerning the analyzer.
                        This attribute is OPTIONAL. The default value
                        is EMPTY.

4.2.6. The CLASSIFICATION class

   The CLASSIFICATION class names the vulnerability associated with the alert. One name MUST be provided, and additional, equivalent names MAY be added.

   Attribute  Type      Definition

   origin     ENUM      The origin of the name. The range of acceptable
                        values is ["unknown", "bugtraqid", "cve",
                        "vendor-specific"]. This attribute is
                        MANDATORY.
                      The default value is "unknown".
name       STRING     The associated name. This attribute is
                      MANDATORY. The default value is the empty
                      STRING.
url        STRING     The URL at which the manager can find more
                      information about the alert. This information
                      can consist of a signature pattern, a
                      description of the attack, appropriate
                      countermeasures, or any other information deemed
                      relevant by the vendor. This attribute is
                      MANDATORY. This attribute does not have a
                      default value.

The definition of the acceptable values for the origin attribute is as
follows:

#  keyword          Definition
0  unknown          No origin known.
1  bugtraqid        The Bugtraq ID naming scheme [7].
2  cve              The Common Vulnerability Enumeration naming
                    scheme [8].
3  vendor-specific  A vendor-specific name.

4.2.7. The ADDITIONALDATA class

The ADDITIONALDATA class provides a way to carry vendor-specific or
implementation-specific information.

Attribute  Type       Definition
meaning    STRING     A string that describes the meaning of the data
                      in this element. These strings will be
                      implementation-dependent. This attribute is
                      OPTIONAL.
type       ENUM       The type of the data in this element. The range
                      of acceptable values is ["byte", "boolean",
                      "character", "date", "integer", "ntpstamp",
                      "real", "string", "time"]. This attribute is
                      MANDATORY. This attribute does not have a
                      default value.
data       BYTE[]     The data.

The definition of the acceptable values for the type attribute is as
follows:

#  keyword    Definition
0  unknown    MUST NOT be used. Reserved for future use.
1  byte       A single byte.
2  character  A single character.
3  boolean    True/false or yes/no.
4  integer    An integer.
5  real       A floating-point number.
6  date       A date string formatted as per Section 7.
7  ntpstamp   An NTP timestamp as per Section 7.
8  time       A time string formatted as per Section 7.
9  string     A string of characters.

4.2.8. The TARGET class

The TARGET class contains information about the target of the alert,
i.e. the recipient of the malicious or anomalous activity that has
been spotted by the intrusion detection system. The target itself
consists of an identifier and an ENUM indicating whether the target is
real or the target information is unreliable. The relationship diagram
for TARGET is shown in Figure 3.

+------------------+               0..1+---------------------+
| TARGET           |<>-----------------| NODE                |
|------------------|                   +---------------------+
| INTEGER targetID |
| ENUM decoy       |               0..1+---------------------+
|                  |<>-----------------| USER                |
|                  |                   +---------------------+
|                  |
|                  |               0..1+---------------------+
|                  |<>-----------------| PROCESS             |
|                  |                   +---------------------+
|                  |
|                  |               0..1+---------------------+
|                  |<>-----------------| SERVICE             |
+------------------+                   +---------------------+

                     Figure 3: The TARGET class

The TARGET class defines the following attributes:

Attribute  Type       Definition
decoy      ENUM       Indicates if the data associated with the target
                      is considered real as far as the analyzer can
                      decide, a decoy, or undetermined. The range of
                      acceptable values is ["unknown", "yes", "no"].
                      This attribute is MANDATORY. The default value
                      is "unknown".
targetID   INTEGER    Target reference token. This token identifies
                      and refers to a previously defined target. This
                      attribute is OPTIONAL. The default value is 0
                      and indicates that the analyzer does not support
                      this facility.

The definition of the acceptable values for the decoy attribute is as
follows:

#  keyword  Definition
0  unknown  No information.
1  yes      This is not the true target.
2  no       This is the true target.

4.2.9. The SOURCE class

The SOURCE class contains information about the possible source or
sources of the alert, i.e. the party or parties generating the
anomalous data.
The source itself contains a reference number (to refer to previously
transmitted or defined sources) and an indicator of whether the source
is real or spoofed. The relationship diagram for SOURCE is given in
Figure 4.

+------------------+               0..1+---------------------+
| SOURCE           |<>-----------------| NODE                |
|------------------|                   +---------------------+
| INTEGER sourceID |
| ENUM spoofed     |               0..1+---------------------+
|                  |<>-----------------| USER                |
|                  |                   +---------------------+
|                  |
|                  |               0..1+---------------------+
|                  |<>-----------------| PROCESS             |
+------------------+                   +---------------------+

                     Figure 4: The SOURCE class

The SOURCE class defines the following attributes:

Attribute  Type       Definition
spoofed    ENUM       Indicates if the data associated with the source
                      is considered real as far as the analyzer can
                      decide, spoofed, or impossible to tell. The
                      range of acceptable values is ["unknown", "yes",
                      "no"]. This attribute is MANDATORY. The default
                      value is "unknown".
sourceID   INTEGER    Source reference token. This attribute is
                      OPTIONAL. The default value is 0, meaning that
                      this facility is not supported by the analyzer.

The definition of the acceptable values for the spoofed attribute is
as follows:

#  keyword  Definition
0  unknown  No information.
1  yes      This is not the true source.
2  no       This is the true source.

4.2.10. The support classes

The support classes represent entities in the data model that have an
important role. Their relationship is described in Figure 5. The
following entities have been identified: nodes, processes, users and
services. In addition, an address entity has been defined to enhance
user and node.
               +---------------+
               | IDENT         |
               |---------------|
               | INTEGER ident |
               +---------------+
                      /_\
                       |
                       +------------------------------------+
                       |                                    |
+------------------+   |   +----------------+               |
|PROCESS           |---+---|NODE            |               |
|------------------+   |   |----------------+   0..*+---------------+
|INTEGER pid       |   |   |STRING name     |<>-----|ADDRESS        |
|STRING name       |   |   |STRING location |       |---------------+
|STRING path       |   |   |INTEGER domain  |       |ENUM category  |
|STRING[] arguments|   |   +----------------+   0..*|STRING address |
|STRING[] environ  |   |                       +----|STRING netmask |
+------------------+   |                       |    +---------------+
                       |                       |
+------------------+   |   +----------------+  |
|SERVICE           |---+---| USER           |<>+
|------------------+       +----------------+
|STRING name       |       | ENUM category  |
|INTEGER dport     |       |                |   1..*+---------------+
|INTEGER sport     |       |                |<>-----| USERID        |
|STRING protocol   |       |                |       +---------------+
+------------------+       +----------------+       | ENUM type     |
        /_\                                         | STRING name   |
         |                                          | INTEGER id    |
         +-------------------+                      +---------------+
         |                   |
 +---------------+ +-------------------+
 | WEBSERVICE    | | SNMPSERVICE       |
 |---------------| +-------------------+
 | STRING url    | | STRING Oid        |
 | STRING cgi    | | STRING Community  |
 | STRING method | | STRING Command    |
 | STRING args   | +-------------------+
 +---------------+

                 Figure 5: Support classes diagram

4.2.10.1. The IDENT class

All support classes inherit from the IDENT class. The IDENT class
provides a reference to an object predefined by the analyzer and the
manager. Instead of sending the complete description of the object,
the IDMEF message MAY contain the reference identifier for this
object.

The IDENT class defines the following attributes:

Attribute  Type       Definition
ident      INTEGER    The shared reference number by which the
                      analyzer and the manager identify the object.
                      This attribute is OPTIONAL. The default value is
                      0 and indicates that the analyzer does not
                      support this facility.
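
The reference mechanism can be illustrated with a short Python sketch.
It is not part of the data model: the names Ident, Node, User and
IdentRegistry, the UNSUPPORTED constant, and the choice to raise
ValueError are all assumptions made for this example. The sketch shows
a manager-side table that resolves a shared ident to a full object
description, treats 0 as "facility not supported", and refuses to bind
one ident to two different objects.

```python
from dataclasses import dataclass
from typing import Dict, Optional

UNSUPPORTED = 0  # default ident: the analyzer does not use references


@dataclass
class Ident:
    """Base of the support classes; carries the shared reference number."""
    ident: int = UNSUPPORTED


@dataclass
class Node(Ident):
    name: str = ""


@dataclass
class User(Ident):
    category: str = "unknown"


class IdentRegistry:
    """Hypothetical manager-side table: ident -> full object description."""

    def __init__(self) -> None:
        self._objects: Dict[int, Ident] = {}

    def register(self, obj: Ident) -> None:
        if obj.ident == UNSUPPORTED:
            return  # nothing to share for this object
        known = self._objects.get(obj.ident)
        if known is not None and known != obj:
            # One ident names exactly one instance, whatever its class.
            raise ValueError(f"ident {obj.ident} is already in use")
        self._objects[obj.ident] = obj

    def resolve(self, ident: int) -> Optional[Ident]:
        """Return the object behind a reference, or None if unknown."""
        return self._objects.get(ident)
```

Once a NODE has been registered under a given ident, a later message
could carry only that integer, and the manager would look the full
description up with resolve().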
The justification for having an ident is twofold. First, this
information can serve as a correlation tool. When two
intrusion-detection systems have different views of objects (e.g. a
network-based system that associates them with IP addresses and a
host-based system that associates them with names), providing them
with a single identifier will help the manager receiving the alerts
understand that they are related to the same object. This is
particularly useful for target objects, which are likely to be well
identified by the organization deploying the intrusion-detection
system. This mechanism is also useful when the same object has
multiple interfaces.

Second, it means that any NODE, USER, ADDRESS, PROCESS or SERVICE
information can be exchanged as an integer. However, the implementor
MUST NOT reuse an identification previously used for another instance.
It is explicitly forbidden to share the same identification for two
instances even if they are specializations of two different subclasses
(e.g. a NODE and a USER).

4.2.10.2. The ADDRESS class

The ADDRESS support class carries address information. The address in
question can be, for example, a network address, a hardware address or
an application address.

Attribute  Type       Definition
category   ENUM       The kind of address information. The range of
                      acceptable values is ["unknown", "atm",
                      "e-mail", "lotus-notes", "mac", "sna", "vm",
                      "ipv4-addr", "ipv4-addr-hex", "ipv4-net",
                      "ipv4-net-mask", "ipv6-addr", "ipv6-net",
                      "ipv6-net-mask"]. This attribute is MANDATORY.
                      This attribute does not have a default value.
address    BYTE[]     The address information itself. The type
                      information MUST allow transcription of the byte
                      string into its original format. This attribute
                      is MANDATORY. This attribute does not have a
                      default value.
netmask    BYTE[]     The netmask information, if appropriate (depends
                      on the category attribute). This attribute is
                      OPTIONAL. The default value is the empty array.
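
How the category attribute governs the interpretation of address and
netmask can be sketched in Python. This is an illustration under
stated assumptions, not a definition: the Address class and its
as_network() helper are hypothetical, the address and netmask are held
as text rather than BYTE[], and only three of the IPv4 categories are
interpreted, using the standard ipaddress module.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Keywords taken from the range of acceptable values for "category".
CATEGORIES = {
    "unknown", "atm", "e-mail", "lotus-notes", "mac", "sna", "vm",
    "ipv4-addr", "ipv4-addr-hex", "ipv4-net", "ipv4-net-mask",
    "ipv6-addr", "ipv6-net", "ipv6-net-mask",
}


@dataclass
class Address:
    category: str      # MANDATORY, no default value
    address: str       # MANDATORY, no default value (text here, by assumption)
    netmask: str = ""  # OPTIONAL, empty by default

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown address category: {self.category!r}")
        if not self.address:
            raise ValueError("address is MANDATORY")

    def as_network(self) -> Optional[ipaddress.IPv4Network]:
        """Interpret the IPv4 categories; return None for all others."""
        if self.category == "ipv4-addr":
            return ipaddress.IPv4Network(self.address + "/32")
        if self.category == "ipv4-net":  # aaa.bbb.ccc.ddd/nn
            return ipaddress.IPv4Network(self.address, strict=False)
        if self.category == "ipv4-net-mask":  # address plus separate mask
            return ipaddress.IPv4Network(
                f"{self.address}/{self.netmask}", strict=False)
        return None
```

Keeping the category next to the raw value is what lets a receiver
transcribe the address back into its original format; a value whose
category is not interpreted here (for example "mac") is simply carried
through unchanged.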

The definition of the acceptable values for the category attribute is
as follows:

#   keyword        Definition
0   unknown        Type not known [SHOULD be avoided]
1   atm            Asynchronous Transfer Mode network address
2   e-mail         Internet electronic mail address (RFC 822)
3   lotus-notes    Lotus Notes address
4   mac            Media Access Control (MAC) address
5   sna            IBM Systems Network Architecture (SNA) address
6   vm             IBM "VM" (PROFS) electronic mail address
7   ipv4-addr      IPv4 host address in dotted-decimal notation
                   (aaa.bbb.ccc.ddd)
8   ipv4-addr-hex  IPv4 host address in hexadecimal
9   ipv4-net       IPv4 network address in dotted-decimal notation,
                   slash, significant bits (aaa.bbb.ccc.ddd/nn)
10  ipv4-net-mask  IPv4 network address and associated network mask
11  ipv6-addr      IPv6 host (equipment) address
12  ipv6-net       IPv6 network address
13  ipv6-net-mask  IPv6 network address and associated network mask

4.2.10.3. The USER class

The USER class indicates the different ways by which a user can be
identified. It contains information about which kind of user it is,
and then one or multiple USERID objects which contain the actual user
information.

+---------------+    1..n+--------------+
| USER          |<>------| USERID       |
+---------------+        +--------------+
| ENUM category |        | ENUM type    |
|               |        | STRING name  |
+---------------+        | INTEGER id   |
                         +--------------+

Attribute  Type       Definition
category   ENUM       The category of the user, representing the scope
                      of user information. The range of acceptable
                      values is ["unknown", "application",
                      "os-device"]. The default value is "unknown".

The definition of the acceptable values for the category attribute is
as follows:

#  keyword      Definition
0  unknown      User category unknown. SHOULD be avoided.
1  application  User identity at the application level.
                This is for example an HTTP authentication or an
                Exchange user name.
2  os-device    This is a login on a distributed network of
                workstations or equipment.

The USERID class defines the following attributes:

Attribute  Type       Definition
type       ENUM       The type of user information carried by the
                      USERID object. The range of acceptable values is
                      ["original-user", "current-user", "target-user",
                      "user-privs", "current-group", "group-privs"].
                      The default value is "original-user".
name       STRING     A string representing the user by name.
number     INTEGER    A numerical representation of the user, such as
                      a UNIX id.

Note that there are constraints on the type of USERID associated with
the USER. Only one of each "original-user", "current-user",
"target-user" or "current-group" may be specified with a given USER.
Multiple USERIDs of type "user-privs" and "group-privs" are allowed.

The definition of the acceptable values for the type attribute is as
follows:

#  keyword        Definition
0  unknown        MUST NOT be used, reserved for future use.
1  original-user  The actual identity of the user or process being
                  reported on. On those systems that (a) do some type
                  of auditing and (b) support extracting a user id
                  from the "audit id" token, that value should be
                  used. On those systems that do not support this, and
                  where the user has logged into the system, the
                  "login id" should be used.
2  current-user   The current user id being used by the user or
                  process. On Unix systems, this would be the "real"
                  user id, in general.
3  target-user    The user id the user or process is attempting to
                  become. This would apply, for example, when the user
                  attempts to use "su," "rlogin," "telnet," etc.
4  current-group  The current group id (if applicable) being used by
                  the user or process. On Unix systems, this would be
                  the "real" group id, in general.
5  user-privs     Other user ids the user or process has the ability
                  to use.
                  On Unix systems, this would be the "effective" user
                  id. A list is allowed, for operating
                  systems/applications that allow more than one.
6  group-privs    Other group ids (if applicable) the user or process
                  has the ability to use. On Unix systems, this would
                  be the "effective" group id. On BSD-derived Unix
                  systems, it would also include all the group ids on
                  the "group list."

4.2.10.4. The NODE class

The NODE class indicates the different ways by which a host or
equipment on the network can be identified.

Attribute  Type       Definition
name       STRING     The machine's fully qualified domain name. This
                      attribute is MANDATORY unless associated ADDRESS
                      information is provided. The default value is
                      the empty STRING.
location   STRING     The location of the equipment. This attribute is
                      OPTIONAL. The default value is the empty STRING.
domain     ENUM       The domain to which the equipment belongs, if
                      relevant. This attribute is OPTIONAL. The range
                      of acceptable values is ["unknown", "ads",
                      "afs", "coda", "dfs", "dns", "kerberos", "nds",
                      "nis", "nisplus", "nt", "wfw"]. The default
                      value is "unknown".

The definition of the acceptable values for the domain attribute is as
follows:

#   keyword   Definition
0   unknown   No relevant domain
1   ads       Windows 2000 ADS
2   afs       Andrew File System
3   coda      CODA distributed file system
4   dfs       DFS distributed file system
5   dns       Domain Name System
6   kerberos  Kerberos realm
7   nds       Novell Netware
8   nis       Network Information Service (Yellow Pages)
9   nisplus   Network Information Services Plus
10  nt        Windows NT domain
11  wfw       Windows for Workgroups

4.2.10.5. The PROCESS class

The PROCESS class gathers information about the process that is being
run.

Attribute  Type       Definition
name       STRING     The name of the program being run. This is a
                      short name, e.g. sendmail or explorer. Options
                      and path information are provided by additional
                      attributes. This attribute is MANDATORY.
                      This attribute does not have a default value.
pid        INTEGER    The process identifier of the process being run.
                      This attribute is OPTIONAL. The default value is
                      0.
path       STRING     The path of the program. This attribute is
                      OPTIONAL. The default value is the empty STRING.
                      Provision of the most meaningful path
                      information is left to the judgment of the
                      implementer in this version of the data model.
                      One could nevertheless imagine that it SHOULD
                      include the server name in a Windows NT
                      environment, or MAY include the mount point in a
                      Unix environment.
arguments  STRING[]   The arguments passed to the program or the
                      system call, in the order in which they appear
                      on the command line. This attribute is OPTIONAL.
                      The default value is the empty array. This
                      version of the data model does not differentiate
                      between command line flags (e.g. -x) and their
                      associated values.
environ    STRING[]   The environment strings with which the process
                      is being run. This attribute is OPTIONAL. The
                      default value is the empty array. For example,
                      the string "PATH=/bin:/usr/bin" is an acceptable
                      value.

4.2.11. The SERVICE class

The SERVICE class identifies a network service request being carried
out over the network. In particular, this class should be used to
report not only open services, but also connections and connection
attempts. To do so, the class provides for identification of the
source port from which the connection originated. In general, a
service is a resource available from the network.

This SERVICE class is also related to process and user information.
Process and user information are aggregated at the source or target.

Attribute  Type       Definition
name       STRING     The name of the service. The name of the service
                      is independent of the destination port (dport
                      attribute). A combination of "name=http" and
                      "dport=8080" is perfectly valid.
                      Implementers MUST use the name listed in the
                      IANA list of well-known ports if applicable.
dport      INTEGER    The port to which the connection request is
                      addressed. In many situations, this will be a
                      well-known port in the IANA list, associated
                      with a name.
sport      INTEGER    The source port from which the connection
                      originated. In many situations, this will be a
                      high-numbered port.
protocol   STRING     The name of the protocol used. More information
                      concerning the protocol is still needed. The
                      service name does not necessarily match the
                      application-level protocol used to interpret the
                      event: one may connect using TELNET to an HTTP
                      port, and there may be protocol encapsulation.
                      The original meaning of the protocol was
                      tcp/udp; a class with a hierarchy of levels may
                      now be needed (IP/ICMP/ARP/EGP/OSPF/...;
                      TCP/UDP/?; others on top). Open questions: is
                      this the same thing as name, and is there
                      something on top of application-layer protocols?

4.2.11.1. The WEBSERVICE class

The WEBSERVICE class carries additional information related to web
traffic. Note that the data model does not enforce coherence between
the usage of this class and the information contained in the SERVICE
class, because the two can be unrelated (examples of ports used for
web traffic include but are not limited to 80, 443, 8080, 8484 and
8888).

Attribute  Type       Definition
url        STRING     The URL in the request. This attribute is
                      MANDATORY. This attribute does not have a
                      default value.
cgi        STRING     The CGI script in the request (without
                      arguments). This attribute is OPTIONAL. The
                      default value is the empty STRING.
args       STRING     The arguments passed to the CGI script. This
                      attribute is OPTIONAL. The default value is the
                      empty STRING.
method     STRING     The method used for the request. This attribute
                      is OPTIONAL. The default value is the empty
                      STRING.

4.2.11.2. The SNMPSERVICE class

The SNMPSERVICE class carries additional information related to SNMP
traffic. Note that the data model does not enforce coherence between
the usage of this class and the information contained in the SERVICE
class, because the two can be unrelated.

Attribute  Type       Definition
oid        STRING     The object identifier used for the request. This
                      attribute is OPTIONAL. The default value is the
                      empty STRING.
community  STRING     The object's community string. This attribute is
                      OPTIONAL. The default value is the empty STRING.
command    STRING     The command sent to the SNMP daemon (e.g. GET,
                      SET, ...). This attribute is OPTIONAL. The
                      default value is the empty STRING.

Note that even though it is possible to generate an alert of class
SNMPSERVICE with only default values, analyzers MUST NOT do that.

4.3. Extension of the data model

It is expected that the model will have to be extended by vendors to
carry additional information relevant to the alerts they need to
transport. When a manager receives information from an analyzer that
it cannot understand, the unknown information MUST be ignored until
the manager has been enriched with the appropriate data definition and
semantics. When the vendor extensions mature, they can be incorporated
in the data model.

Depending on the kind of extension needed, two mechanisms can be used.
Note that these mechanisms extend the data model only, and that
additional mechanisms are provided in the XML DTD to transport
additional information which does not fit in the current data model;
see section 8.7.

4.3.1. Extension by aggregation

Extension by aggregation consists of aggregating a new class to one of
the existing classes of the data model. This is the mechanism used for
example to associate the NAME class with the ALERT class.
This type of extension allows propagation of the additional
information to all alerts sent by the analyzer that uses the
extension. Two methods for realizing this type of extension for the
XML DTD are described in Sections 10.2 and 10.3.

For example, if an analyzer decides to send the time of the analyzer
in addition to the time already stored in the alert, the IDS vendor
defines a new class aggregated with the ALERT class that carries the
appropriate time information. The model would then look like Figure 6.

+----------+       1+--------------+
| ALERT    |<>------| ANALYZER     |
|          |        +--------------+
|          |
|          |    1..*+--------------+
|          |<>------|CLASSIFICATION|
|          |        +--------------+
|          |
|          |    1..*+--------------+
|          |<>------| EXTRATIME    |
|          |        +--------------+
|          |
|          |    0..*+--------------+
|          |<>------| TARGET       |
|          |        +--------------+
|          |
|          |    0..*+--------------+
|          |<>------| SOURCE       |
+----------+        +--------------+

          Figure 6: Insertion of the EXTRATIME class

4.3.2. Extension by subclassing

The other extension possibility consists of specializing one of the
classes defined by the model. This is the mechanism used for example
to specialize the SERVICE class into the WEBSERVICE class, or the
ALERT class into the TOOLALERT class. This is the preferred mode of
extension because it not only preserves the data structure, it also
preserves the operations executed on them (i.e. the methods).

5. Arguments for the realization of the class hierarchy in XML

5.1. The Extensible Markup Language

The Extensible Markup Language (XML) [3] is a simplified version of
the Standard Generalized Markup Language (SGML), a text markup syntax
defined by the ISO 8879 standard. XML is gaining widespread attention
as a language for representing and exchanging documents and data on
the Internet, and as the solution to most of the problems inherent in
HyperText Markup Language (HTML).
XML was published as a recommendation by the World Wide Web Consortium
(W3C) on February 10, 1998.

XML is a metalanguage -- a language for describing other languages --
that enables an application to define its own markup. XML allows the
definition of customized markup languages for different types of
documents and different applications. This differs from HTML, in which
there is a fixed set of tags with preset meanings that must be
"adapted" for specialized uses. Both XML and HTML use tags
(identifiers delimited by '<' and '>') and attributes (of the form
"name='value'"). But where "<p>" always means "paragraph" in HTML, it
may mean "paragraph," "person," "price," or "platypus" in XML, or it
might have no meaning at all, depending on the particular application.

The publication of XML was followed by the publication of a second
recommendation [9] by the World Wide Web Consortium, defining the use
of namespaces in XML documents. An XML namespace is a collection of
names, identified by a Uniform Resource Identifier (URI). It allows
documents of different types, that use tags with the same names, to be
merged with no confusion. When using namespaces, each tag is
identified with the namespace it comes from, allowing tags from
different namespaces with the same names to occur in the same
document. For example, a single document could contain both
"usa:football" and "europe:football" tags, each with different
meanings.

In anticipation of the widespread use of XML namespaces, this memo
includes the definition of the URI to be used to identify the IDMEF
namespace.

5.2. Rationale for Implementing IDMEF in XML

XML-based applications are being used or developed for a wide variety
of uses, including electronic data interchange in a variety of fields,
financial data interchange, electronic business cards, calendar and
scheduling, enterprise software distribution, web "push" technology,
and markup languages for chemistry, mathematics, music, molecular
dynamics, astronomy, book and periodical publishing, web publishing,
weather observations, real estate transactions, and many others.

XML's flexibility makes it a good choice for these applications; that
same flexibility makes it a good choice for implementing the IDMEF as
well. Other, more specific reasons for choosing XML to implement the
IDMEF are:

+ XML allows a custom language to be developed specifically for the
  purpose of describing intrusion detection alerts.
  It also defines a standard way to extend this language, either for
  later revisions of this document ("standard" extensions), or for
  vendor-specific use ("non-standard" extensions).

+ Software tools for processing XML documents are widely available, in
  both commercial and open source forms. A variety of tools and APIs
  for parsing and/or validating XML are available in a variety of
  languages, including Java, C, C++, Tcl, Perl, Python, and GNU Emacs
  Lisp. Widespread access to tools will make adoption of the IDMEF by
  product developers easier, and hopefully, faster.

+ XML meets IDMEF Requirement 5.1, that message formats support full
  internationalization and localization. The XML standard specifies
  support for both the UTF-8 and UTF-16 encodings of ISO 10646
  (Unicode), making IDMEF compatible with both one- and two-byte
  character sets. XML also provides support for specifying, on a
  per-element basis, the language in which the element's content is
  written, making IDMEF easy to adapt to "Natural Language Support"
  versions of a product.

+ XML meets IDMEF Requirement 5.2, that message formats must support
  filtering and aggregation. XML's integration with XSL, a style
  language, allows messages to be combined, discarded, and rearranged.

+ Ongoing XML development projects, in the W3C and elsewhere, will
  provide object-oriented extensions, database support, and other
  useful features. If implemented in XML, the IDMEF immediately gains
  these features as well.

+ XML is free, with no license, no license fees, and no royalties.

5.3. Relationship to the IDMEF Class Hierarchy

This implementation follows the model described in Section 4 almost
exactly, with the following exceptions and restrictions:

+ XML tags have the names given to the various classes in the model,
  with a few minor exceptions where changes were made to deal with XML
  scoping rules or to increase consistency with the rest of the
  implementation.

+ XML does not support "inheritance"; tags may only be used at the
  level at which they are declared. Subclasses are implemented by
  making the tags for those classes child tags of the tags for the
  parent classes.

+ Some extensions have been made, represented by the following
  elements: , , , , and .

These changes make little difference in the overall usefulness of the
model, or of XML as an implementation language.

6. Use of XML in the IDMEF

This section describes how some of XML's features and requirements
will impact the IDMEF.

6.1. The IDMEF Document Prolog

The "prolog" of an IDMEF document, that part that precedes anything
else, consists of the XML declaration and the document type
declaration.

6.1.1. XML Declaration

Every XML document (and therefore every IDMEF document) starts with an
XML declaration. The XML declaration specifies the version of XML
being used; it may also specify the character set being used.

The XML declaration looks like:

   <?xml version="1.0"?>

If a character encoding is specified, the declaration looks like:

   <?xml version="1.0" encoding="charset"?>

where "charset" is the name of the character encoding in use (see
section 6.2). If no encoding is specified, UTF-8 is assumed.

IDMEF documents being exchanged between IDMEF applications MUST begin
with an XML declaration, and MUST specify the XML version in use.
Specification of the encoding in use is RECOMMENDED.

IDMEF applications MAY choose to omit the XML declaration internally
to conserve space, adding it only when the message is sent to another
destination (e.g., a web browser). This practice is NOT RECOMMENDED
unless it can be accomplished without loss of each message's version
and encoding information.

6.1.2. XML Document Type Definition (DTD)

The Document Type Definition (DTD) specifies the exact syntax of an
XML document.
It defines the various tags that may be used in the document, how the
tags are related to each other, which tags are mandatory and which are
optional, and so forth. The IDMEF Document Type Definition is listed
in its entirety in Section 11.

It is expected that IDMEF applications will not normally include the
IDMEF DTD itself in their communications. Instead, the DTD will be
referenced in the document type declaration in the document entity
(see below). Such IDMEF documents will be well-formed and valid as
defined in [3].

Other IDMEF documents will be specified that do not include the
document prolog (e.g., entries in an IDMEF-format database). Such
IDMEF documents will be well-formed but not valid.

Generally, well-formedness implies that a document has a single
element that contains everything else (e.g., "<book>"), and that all
the other elements nest nicely within each other without any
overlapping (e.g., a "chapter" does not start in the middle of another
"chapter").

Validity further implies that not only is the document well-formed,
but it also follows specific rules (contained in the Document Type
Definition) about which elements are "legal" in the document, how
those elements nest within other elements, and so on (e.g., a
"chapter" does not begin in the middle of a "title"). A document
CANNOT be valid unless it references a DTD (see Section 6.1.4).

XML processors are required to be able to parse any well-formed
document, valid or not. The purpose of validation is to make the
processing of that document (what's done with the data after it's
parsed) easier. Without validation, a document may contain elements in
nonsense order, elements "invented" by the author that the processing
application doesn't understand, and so forth.

IDMEF documents MUST be well-formed. IDMEF documents SHOULD be valid
whenever both possible and practical.

6.1.3. IDMEF DTD Formal Public Identifier

The formal public identifier (FPI) for the Document Type Definition
described in this memo is:

   "-//IETF//DTD RFCxxxx IDMEF v0.1//EN"

NOTE: The "RFCxxxx" text in the FPI value will be replaced with the
actual RFC number, if this memo is published as an RFC.

This FPI MUST be used in the document type declaration within an XML
document referencing the DTD defined by this memo, as shown in the
following section.

6.1.4. IDMEF DTD Document Type Declaration

The document type declaration for an XML document referencing the DTD
defined by this memo will usually be specified in one of the following
ways:

The last component of the document type declaration is the formal
public identifier (FPI) specified in the previous section.

The last component of the document type declaration is a URL that
points to a copy of the Document Type Definition.

To be valid (see above), an XML document must contain a document type
declaration. However, this represents significant overhead to an IDMEF
application, both in the bandwidth it consumes and in the requirements
it places on the XML parser (not only to parse the declaration itself,
but also to parse the DTD it references).

Implementers MAY decide, therefore, to have analyzers and managers
agree out-of-band on the particular document type definition they will
be using (the standard one as defined here, or one with extensions),
and then omit the document type declaration from IDMEF messages. Great
care must be taken in doing this, however, as the manager may have to
accept messages from analyzers using DTDs with different sets of
extensions.

6.2. Character Data Processing in XML and IDMEF

The XML standard requires that XML processors support the UTF-8 and
UTF-16 encodings of ISO 10646 (Unicode), making XML compatible with
both one- and two-byte character sets.
While many XML processing applications may support other character sets, only UTF-8 and UTF-16 can be relied upon from a portability viewpoint.

A document's XML declaration (see Section 6.1.1) specifies the character encoding to be used in the document, as follows:

   <?xml version="1.0" encoding="charset" ?>

where "charset" is the name of the character set, as registered with the Internet Assigned Numbers Authority (IANA), see [10].

Consistent with the XML standard, if no encoding is specified for an IDMEF message, UTF-8 SHALL be assumed.

IDMEF applications MUST NOT use, and IDMEF messages MUST NOT be encoded in, character encodings other than UTF-8 and UTF-16. Note that since ASCII is a subset of UTF-8, it MAY be used to encode IDMEF messages.

Per the XML standard, IDMEF documents encoded in UTF-16 MUST begin with the Byte Order Mark described by ISO/IEC 10646 Annex E and Unicode Appendix B (the "ZERO WIDTH NO-BREAK SPACE" character, #xFEFF).

6.2.1. Character Entity References

Within XML documents, certain characters have special meanings in some contexts. To include the actual character itself in one of these contexts, a special escape sequence, called an entity reference, must be used.

The characters that sometimes need to be escaped, and their entity references, are:

   Character    Entity Reference
   ---------------------------------
       &            &amp;
       <            &lt;
       >            &gt;
       "            &quot;
       '            &apos;

It is RECOMMENDED that IDMEF applications use the entity reference form whenever writing these characters in data, to avoid any possibility of misinterpretation.

6.2.2. Character Code References

Any character defined by the ISO/IEC 10646 standard may be included in an XML document by the use of a character reference. A character reference is started with the characters '&' and '#', and ended with the character ';'. Between these characters, the character code for the character is inserted.

If the character code is preceded by an 'x', it is interpreted in hexadecimal (base 16); otherwise, it is interpreted in decimal (base 10). For instance, the ampersand (&) is encoded as &#38; or &#x26; and the less-than sign (<) is encoded as &#60; or &#x3C;.

Any one- or two-byte character specified in the Unicode standard can be included in a document using this technique.

6.2.3. White Space Processing

XML preserves white space by default. The XML processor passes all white space characters to the application unchanged. This differs from HTML (and SGML), in which, although the space/no space distinction is meaningful, the one space/many spaces distinction is not.

XML allows tags to identify the importance of white space in their content by using the "xml:space" attribute:

   <tag xml:space="action">

where "action" is either "default" or "preserve."

If "action" is "preserve," the application MUST treat all white space in the tag's content as significant. If "action" is "default," the application is free to do whatever it normally would with white space in the tag's content.

The intent declared with the "xml:space" attribute is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of "xml:space" on another element within that content.

All IDMEF tags support the "xml:space" attribute.

6.3. Languages in XML and IDMEF

XML allows tags to identify the language their content is written in by using the "xml:lang" attribute:

   <tag xml:lang="langcode">

where "langcode" is a language tag as described in RFC 1766 [11].

The intent declared with the "xml:lang" attribute is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of "xml:lang" on another element within that content.
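
The scoping rule for "xml:lang" can be illustrated with a short Python sketch. It is not part of the IDMEF specification: the element names "message" and "note" are hypothetical, and the helper effective_languages() is an assumption made for illustration. It uses only the standard xml.etree.ElementTree module, which exposes "xml:lang" under the reserved XML namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved "xml:" prefix as this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None):
    """Yield (element, language) pairs in document order.

    An element's own xml:lang wins; otherwise the value declared on the
    nearest enclosing element applies (None if nothing was declared).
    """
    lang = element.get(XML_LANG, inherited)
    yield element, lang
    for child in element:
        yield from effective_languages(child, lang)


# Hypothetical document: "message" and "note" are illustrative names only.
doc = ET.fromstring(
    '<message xml:lang="en">'
    "<note>disk full</note>"
    '<note xml:lang="fr">disque plein</note>'
    "</message>"
)

langs = [(el.tag, lang) for el, lang in effective_languages(doc)]
for tag, lang in langs:
    print(tag, lang)
```

The first "note" inherits "en" from its parent, while the second overrides it with "fr". A caller that finds no declaration at all (None) would still need to apply whatever default the message format prescribes.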
IDMEF applications SHOULD specify the language in which their contents are encoded; in general this can be done by specifying the "xml:lang" attribute for the top-level tag.

If no language is specified for an IDMEF message, English SHALL be assumed.

All IDMEF tags support the "xml:lang" attribute.

6.4. Unrecognized Tags in IDMEF Messages

On occasion, an IDMEF application may receive a well-formed, or even well-formed and valid, IDMEF message containing tags that it does not understand. The tags may be either:

   + Recognized as "legitimate" (a valid document), but the application does not know the semantic meaning of the tag's content; or

   + Not recognized at all.

IDMEF applications MUST continue to process IDMEF messages that contain unknown tags, provided that such messages meet the well-formedness requirement of Section 6.1.2. It is up to the individual application to decide how to process any content from the unknown tag(s).

6.5. Digital Signatures

The joint IETF/W3C XML Signature Working Group is currently working to specify XML digital signature processing rules and syntax [12]. XML Signatures provide integrity, message authentication, and/or signer authentication services for data of any type, whether located within the XML that includes the signature or elsewhere.

The IDMEF requirements document assigns responsibility for message integrity and authentication to the communications protocol, not the message format. However, in situations where IDMEF messages are exchanged over other, less secure protocols, or in cases where the digital signatures must be archived for later use, the inclusion of digital signatures within an IDMEF message itself may be desirable.

Specifications for the use of digital signatures within IDMEF messages are outside the scope of this document.
However, use of the XML Signature standard is RECOMMENDED if such functionality is needed.

7. IDMEF Data Types

XML is a typeless language; everything is simply a stream of bytes, and it is left to the application to extract meaning from them. That being said, this specification makes the following rules to allow interoperability in interpreting IDMEF documents:

1. Integer data MUST be encoded in either Base 10 or Base 16. Base 10 encoding uses the digits '0' through '9' and an optional sign ('+' or '-'). Base 16 encoding uses the digits '0' through '9' and 'a' through 'f' (or their upper case equivalents), and is preceded by the characters "0x". For example, the number one hundred twenty-three would be encoded as "123" in Base 10, or "0x7b" in Base 16.

2. Floating-point (real) data MUST be encoded in Base 10. The encoding is that of the POSIX strtod() function: an optional sign ('+' or '-') followed by a non-empty sequence of digits optionally containing a radix character, then an optional exponent part. An exponent part consists of an 'e' or 'E', followed by an optional sign, followed by one or more decimal digits.

3. Character and character string data does not require quoting, as the IDMEF tags provide that functionality.

4. Dates, as used in the element, MUST be encoded as a four-digit year, two-digit month, and two-digit day, separated by dashes. The two-digit day and its corresponding dash MAY be omitted to represent an entire month. For example, March 13, 2000 would be encoded as "2000-03-13", and December, 1999 would be encoded as "1999-12".

5. Time of day, as used in the

, , , and elements MUST be unique for each particular combination of elements and sub-elements. Each time the same entity is identified in the same way, this number SHOULD be the same. Note, though, that if the same entity is identified in two different ways (e.g., once by host name and once by IP address), two different id numbers MUST be generated. A value of 0 indicates that the analyzer is not capable of identifying these entities uniquely.

   - An easy, albeit overly complex, way to accomplish this would be to compute the cryptographic checksum of the element and its sub-elements.

   - The above does not apply to the "alertid" attribute of the <Alert> element, the "heartbeatid" attribute of the <Heartbeat> element, or the "ident" attribute of the attribute, which have different rules.

8. Structure of an IDMEF Message

This section describes the individual elements and attributes that make up an IDMEF message. It is organized in a somewhat "top down" manner, in that the more significant elements are described first, followed by the less significant ones.

A description of the element is provided, followed by a list of the element's attributes, and then a list of the sub-elements that the element may contain.

For attributes, the notation "required" indicates that a value for the attribute must be specified, while "optional" indicates that a value is not required. All optional attributes have default values that the manager (XML parser) will assume if no other value is provided.

For sub-elements, the number of times the element may occur is given. Possible values are "exactly one," "zero or one," "zero or more," and "one or more." Except as stated otherwise, sub-elements must occur in the order shown.

8.1. The IDMEF-Message Root Element

An IDMEF message (document) contains one or more alerts and other message types (see below).
The <IDMEF-Message> element represents this; all other elements are sub-elements of <IDMEF-Message>. Put another way, <IDMEF-Message> is the root element of an IDMEF document.

The <IDMEF-Message> element has one attribute:

   version
      Optional. The version of the IDMEF message specification this message conforms to; messages conforming to the format described in this memo MUST use "0.1" as the value for this attribute.

The <IDMEF-Message> element may contain the following sub-elements, in any order:

   <Alert>
      Zero or more. Description of an intrusion detection alert. The data model specifies the contents of this element type.

   <Heartbeat>
      Zero or more. Data about the "health" of the analyzer. This is an extension element, not specified in the data model.

8.2. The Message Type Elements

There are two types of IDMEF message: <Alert> and <Heartbeat>. These elements are sub-elements of the <IDMEF-Message> element.

8.2.1. Alert

The <Alert> element implements the ALERT class described in Section 4.2.1. It is used to describe an alert. It contains the name of the analyzer that generated the alert, the event that caused the alert to be generated, and information about the source(s) and target(s) of the event.

The <Alert> element has the following attributes:

   alertid
      Required. A serial number for the alert. This number MUST be unique for every alert generated by a given analyzer. The default value is 0, which indicates that the analyzer cannot provide this information reliably.

   impact
      Required. The evaluated impact of the event on the system. The list of acceptable values is [unknown, bad-unknown, not-suspicious, attempted-admin, successful-admin, attempted-dos, successful-dos, attempted-recon, successful-recon-limited, successful-recon-largescale, attempted-user, successful-user]; refer to Section 4.2.1 for the significance of the keywords.

   version
      Required. The version of the class hierarchy used. Messages conforming to the format described in this memo MUST use "1" as the value for this attribute.
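
As a rough illustration of the attribute rules above, the following Python sketch builds an <IDMEF-Message> root with a single <Alert> child and rejects impact keywords outside the listed set. The function name build_alert() and the sample values are assumptions made for this example. The sub-elements of <Alert> are deliberately omitted, so the result is well-formed but should not be expected to validate against the IDMEF DTD.

```python
import xml.etree.ElementTree as ET

# Impact keywords copied from the "impact" attribute description.
IMPACT_VALUES = {
    "unknown", "bad-unknown", "not-suspicious",
    "attempted-admin", "successful-admin",
    "attempted-dos", "successful-dos",
    "attempted-recon", "successful-recon-limited",
    "successful-recon-largescale",
    "attempted-user", "successful-user",
}


def build_alert(alertid, impact):
    """Return an IDMEF-Message element wrapping one bare Alert element.

    Hypothetical helper: only the three Alert attributes are set, and no
    sub-elements are added.
    """
    if impact not in IMPACT_VALUES:
        raise ValueError("unsupported impact keyword: %r" % (impact,))
    # Root element; "0.1" is the message specification version.
    message = ET.Element("IDMEF-Message", {"version": "0.1"})
    # Alert element; "1" is the class hierarchy version.
    ET.SubElement(
        message,
        "Alert",
        {"alertid": str(alertid), "impact": impact, "version": "1"},
    )
    return message


message = build_alert(42, "attempted-recon")
alert = message.find("Alert")
print(ET.tostring(message, encoding="unicode"))
```

Keeping the keyword set in one place lets an analyzer fail early on a misspelled impact value instead of emitting a message that a validating manager would reject.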
The <Alert> element has the following sub-elements: