CIDF Specification: Version 0.6 Page 1

                THE COMMON INTRUSION DETECTION FRAMEWORK

                           CIDF working group
              http://seclab.cs.ucdavis.edu/cidf/members.html

Contents

   0: Preamble
      0.1 Introduction
      0.2 Organization of this document
   1 Architecture
      1.1 Introduction
      1.2 Functional decomposition (E-boxes, A-boxes etc.)
      1.3 Layering scheme
      1.4 Naming and locating components
   2 Gidos and S-expressions
      2.1 Introduction to the gido format
      2.2 Gido Requirements and Rationale
      2.3 GIDO S-expression format
      2.4 Parts of a GIDO payload
      2.5 Detailed Examples
      2.6 Rules and Guidelines for Defining SIDs
      2.7 Example CIDF Module GIDO Sets
      2.8 Negotiation
   3 Encoding Gidos in Bytes
      3.1 Introduction
      3.2 Gido header
      3.3 S-expression encoding
   4 CIDF Communication
      4.1 CIDF message layer formats
      4.2 CIDF Message Processing
      4.3 CIDF Directory Services
   5 APIs
      5.1 Introduction
      5.2 General APIs
      5.3 Crypto APIs
      5.4 Event generator API
      5.5 Event analyzer API
      5.6 GIDO database API
      5.7 Response unit API
   A. Primitive Type Definitions
   B. SIDS List
   C. LDAP Background
   D. Conformance Profiles

========================================================================
= 0.1: Introduction to CIDF
========================================================================

The goal of the Common Intrusion Detection Framework is a set of specifications that allow

  * different intrusion detection systems to interoperate and share information as richly as possible, and
  * components of intrusion detection systems to be easily reused in contexts different from those they were designed for.

The CIDF working group first came together in January 1997 at the behest of Teresa Lunt at DARPA, in order to develop standards that accomplish the goals outlined above. She was particularly concerned that the various intrusion detection efforts she was funding be usable and reusable together, and have lasting value to customers of intrusion detection systems.
During the life of the effort, it became clear that this work was of value beyond DARPA contractors, and the group was broadened to include representatives from a number of government, commercial, and academic organizations. After the first few months, membership in the CIDF working group was open to any individual or organization that wished to contribute. No cost was involved (except to defray meeting expenses).

Major decisions were made at regular meetings of the working group, held every few months. Those decisions were made by rough consensus of all attendees. That is, the meeting facilitator attempted to reach consensus, but where only one or two individuals protested a decision, they were overruled in the interest of efficiency. No decisions were taken in the face of opposition from a sizeable minority; instead, the issue was tabled for further consideration. Meetings were fun and the working group had a good time doing this (well, most of them, anyway).

In between meetings, most of the writing was done by small subgroups or individuals, and their text was brought back to meetings for approval or changes. Discussions were also carried on in the working group mailing list, but few decisions were made that way.

The CIDF working group is now seeking to become an IETF working group.

========================================================================
= 0.2: Organization of the CIDF Spec
========================================================================

This section describes the organization of the rest of the CIDF specification. CIDF consists of the following things:

1) A set of architectural conventions for how different parts of intrusion detection systems can be modeled as CIDF components.

2) A way to represent gidos (generalized intrusion detection objects).
   Gidos can
     * describe events that have happened in the systems monitored by an IDS,
     * instruct an IDS to carry out some action,
     * query an IDS as to what has happened, and
     * describe an IDS component.

3) A way to encode gidos into streams of bytes suitable for transmission over a network or storage in a file.

4) Protocols for CIDF components to find each other over a network and exchange gidos.

5) Application Programming Interfaces (APIs) for reusing CIDF components.

Each of these major areas forms one section of this document, numbered as shown above. The organization of each section is described at its beginning.

0.2.1: Format

This document complies with the requirements of RFC 1543, the format for ASCII Internet RFCs. In summary, lines are at most 72 characters long and are terminated with a carriage-return, line-feed pair. Pages are at most 58 lines long and are terminated with a form-feed character. Paragraphs are single spaced and are separated by blank lines. Lines in the text beginning with "#" denote editorial comments, which should be removed before the final version.

The document is divided into sections, which are further divided into subsections, subsubsections, and so on. The numbering convention is illustrated by "3.4.1", which denotes the first subsubsection of the fourth subsection of the third section. Appendices are lettered, so an appendix subsection might be B.4.2.
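The layout rules above are mechanical enough to check automatically. The following is a minimal Python sketch, assuming the whole document is available as one string; the function name and its messages are hypothetical, and it checks only the three constraints stated above (line length, CR LF line terminators, and lines per form-feed-terminated page).

```python
from typing import List


def check_rfc1543_layout(text: str, max_cols: int = 72, max_lines: int = 58) -> List[str]:
    """Hypothetical checker: return layout problems; an empty list means the text passes."""
    problems: List[str] = []
    # Pages are terminated by form-feed characters.
    for page_no, page in enumerate(text.split("\f"), start=1):
        # Every line must end in CR LF, so any CR or LF left over after
        # removing the CR LF pairs belongs to a wrongly terminated line.
        leftover = page.replace("\r\n", "")
        if "\r" in leftover or "\n" in leftover:
            problems.append(f"page {page_no}: line not terminated by CR LF")
        lines = page.split("\r\n")
        if lines and lines[-1] == "":
            lines.pop()  # the final CR LF does not start another line
        if len(lines) > max_lines:
            problems.append(f"page {page_no}: {len(lines)} lines (max {max_lines})")
        for line_no, line in enumerate(lines, start=1):
            if len(line) > max_cols:
                problems.append(
                    f"page {page_no}, line {line_no}: {len(line)} chars (max {max_cols})"
                )
    return problems
```

For example, a page holding an 80-character line yields a single complaint naming that page and line.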
========================================================================
========================================================================
=
= 1: CIDF Architecture
=
========================================================================
========================================================================

========================================================================
= 1.1: Introduction
========================================================================

This section introduces the architectural framework that CIDF assumes will structure an intrusion detection system. This framework is the scaffolding around which the interfaces and communication protocols are organized. CIDF does not mandate that conformant intrusion detection systems be organized exactly this way, but they must support interfaces that are so organized.

Section 1.2 introduces the kinds of components that CIDF considers necessary in IDS systems. Section 1.3 covers the communication layering scheme, and section 1.4 discusses how components are named and located.

========================================================================
= 1.2: CIDF Functional Decomposition
========================================================================

All CIDF components deal in *gidos* (generalized intrusion detection objects), which are represented in a standard common format. Gidos are the data that move around in the intrusion detection system. Gidos can represent events that occurred in the system, analyses of those events, prescriptions to be carried out, or queries about events.

CIDF defines four interfaces that CIDF components may implement:

         || Push-style             | Pull-style
=========++========================+=======================
         || Produces gidos when it | Produces gidos
Producer || wants to, typically in | when queried.
         || response to events.    |
---------++------------------------+-----------------------
         || Mates with push-style  | Mates with pull-
Consumer || producer.              | style producer.
         ||                        |

Each of these interfaces takes two forms: a callable form, which permits reuse of the component, and a protocol form, which permits the component to interoperate with other CIDF components.

CIDF defines several types of preferred components:

  * Event generators
  * Analyzers
  * Databases
  * Response units

Figure 1.1 presents a schematic view of these components in a hypothetical intrusion detection system. The solid boxes labeled E1, E2, A1, A2, D, etc., represent the various components of that system. It is convenient to think of these as objects in the object-oriented programming sense (this does not dictate an implementation in an object-oriented language or framework).

                |         |         |
    ,-----------|---------|---------|-----------------.
    |           |         |         |                 |
    |           V         V         V                 |
    |       ,------.  ,------.  ,------.              |
    |       |  E1  |  |  E2  |  |  E3  |              |
    |       `------'  `------'  `------'              |
    |           ^         ^         ^                 |
    |           |         | ,-------'                 |
    |           | ,-------' |                         |
    |           | | ,-------'             ,------.    |
    |           V V V             ,------>|  A1  |    |
    |         ,------.            |       `------'    |
<------------>|  C   |<-----------'                   |
    |         |      |<-----------.                   |
    |         `------'            |       ,------.    |
    |           ^  ^              `------>|  A2  |    |
    |           |  |                      `------'    |
    |           |  `----------.                       |
    |           |             |                       |
    |           V             V                       |
    |        ,------.      ,------.                   |
    |        |  D   |      |  R   |                   |
    |        `------'      `------'                   |
    |                         |                       |
    `-------------------------|-----------------------'
                              |
                              V

                 Figure 1.1: Types of CIDF components

Whether the individual components are separate processes or images, or merely conceptually separate parts of the code in a single image, is not specified; both possibilities are covered by the CIDF specification.

CIDF allows components to be aggregated together to masquerade as a single component. In other words, a large number of (possibly distributed) components can be tied together and present themselves to the outside world through a single CIDF interface.
#####################################################################
#
#  Stuart comment:
#    It is not clear at present how this last requirement is to
#    be achieved.
#
#####################################################################

1.2.1 Matchmaking Service

The gray box (labelled C in Figure 1.1) represents the configuration and directory services that tie components together via their standard CIDF interfaces. These services are collectively termed the CIDF "matchmaker".

A component initiating communication may bypass the matchmaker if it knows how to address its target directly, or if it uses broadcast or other (non-CIDF) means to do so. Otherwise, the matchmaker allows a component either to look up its target by name or to find its communication "partners" by looking up "gido classes". Gido classes specify types of data that may be exchanged between components. Components that wish to receive certain kinds of gidos describe what they want; components that produce event records describe what they produce. The matchmaker then takes care of associating GIDO producers with appropriate GIDO consumers. In this mode of use, components are relieved of the burden of identifying or locating their partners in the intrusion detection system.

1.2.2 Event Generators

The boxes labelled Ei in Figure 1.1 are event generators. Their role is to obtain events from the larger computational environment outside the intrusion detection system (symbolized by the fat arrows coming from outside the dashed box), and to provide them in the CIDF standard gido format to the rest of the system.

For example, an event generator might be a simple filter that takes C2 audit trails and converts them into the standard format. Another event generator may passively monitor a network and generate events based on the traffic it sees. A third might be application code in an SQL database program that generates events describing database transactions.
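To make the push-style producer interface concrete, here is a minimal Python sketch of an event generator in the spirit of the audit-trail filter described above. Everything in it is a hypothetical assumption for illustration: the class names, the three-field "user action host" audit line, and the representation of a gido as a nested tuple with placeholder SIDs (Initiator, UserName, HostName). Real gidos use the S-expression format of section 2 and the SIDs listed in Appendix B.

```python
from typing import Callable, List

# Hypothetical stand-in for a gido: a nested tuple shaped like an S-expression.
Gido = tuple
Consumer = Callable[[Gido], None]


class PushProducer:
    """Push-style producer: emits gidos when it chooses, to every subscribed consumer."""

    def __init__(self) -> None:
        self._consumers: List[Consumer] = []

    def subscribe(self, consumer: Consumer) -> None:
        self._consumers.append(consumer)

    def emit(self, gido: Gido) -> None:
        for consumer in self._consumers:
            consumer(gido)


class AuditLineEventGenerator(PushProducer):
    """Hypothetical event generator: turns 'user action host' audit lines into gidos."""

    def feed(self, line: str) -> None:
        fields = line.split()
        if len(fields) != 3:
            return  # not a record this filter understands; drop it
        user, action, host = fields
        # Raw event only: no analysis or prescription is attached (placeholder SIDs).
        self.emit((action.capitalize(), ("Initiator", ("UserName", user)), ("HostName", host)))
```

Any callable can act as a consumer, so `gen.subscribe(received.append)` followed by `gen.feed("joe login first.example.com")` pushes one gido into `received` immediately; the generator itself stores nothing.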
It seems likely that event generators will be reusable: because CIDF has a standard data format, converting features of typical computational environments into that format is a task that many groups will need to perform. Hence, it is useful to specify a preferred way to configure and use event generators.

Preferred event generators implement the push-style producer interface. They create only gidos describing raw events, not gidos describing analyses or prescriptions. Preferred event generators provide events as soon as they occur (apart from transport queuing delays). Storage of events is handled by gido databases.

1.2.3 Event Analyzers

Analyzers are labeled Ai in Figure 1.1. They are the components we typically think of in the intrusion detection context. They obtain gidos from other components, analyze them, and return new gidos (which ideally represent some kind of synthesis or summary of the input).

For example, an analyzer might be a statistical profiling tool that examines whether the events being supplied to it now are statistically unlikely to come from the same time series as events supplied to it in the past. Another example is a signature tool that examines sequences of events looking for particular patterns that represent known misuse of the system. A third example would be a correlator that examines events, attempts to determine whether they are causally related to one another, and then combines related events into composite events that can be further analyzed. Simple analyzers might be just filters that throw away events matching certain patterns, or caches that only forward events dissimilar from recently seen events.

A preferred event analyzer implements the push-style consumer interface, through which it obtains input, and the push-style producer interface, through which it reports analyses. The gidos it produces are analysis results, neither raw events nor prescriptions. Like preferred event generators, preferred analyzers pass gidos through immediately (apart from processing delay). No provision is made for storage of gidos by analyzers.

1.2.4 Event Databases

Databases are labeled D in Figure 1.1. These components exist simply to give persistence to CIDF gidos where that is necessary. The interfaces allow other components to pass gidos to the database, and to query the database for gidos that it is holding. Databases are not expected to change or process the gidos in any way (or at least they maintain the illusion that they don't).

A preferred gido database implements the push-style consumer interface, through which it receives any sort of gido, and the pull-style producer interface, through which it responds to queries. The database is not assumed to be a complex application (such as a relational database); it may simply be a file.

1.2.5 Response Units

Response units are the soldier ants of the CIDF ant-heap. They carry out prescriptions: gidos that instruct them to act on behalf of other CIDF components. This is where functionality such as killing processes or resetting connections would reside. Response units are not expected to produce output except acknowledgements.

A preferred response unit implements the push-style consumer interface, through which it receives prescriptions. It may also implement the push-style producer interface, through which it reports on its efforts to carry out the prescriptions.

1.2.6 Other Components

Many other useful types of component are compatible with CIDF. For example, a subsystem may record events in a non-CIDF format but implement the pull-style producer interface, so that CIDF components can query its record of events. A component may record gidos for archival purposes, needing only a push-style consumer interface. A component may observe the world and do some analysis or filtering before creating gidos.
Such a component implements the push-style producer interface. An event analyzer may consult a gido database; such an analyzer would need a pull-style consumer interface in addition to the usual push-style producer and push-style consumer interfaces. A component may carry out responses, like a response unit, but also produce analyses, like an event analyzer.

========================================================================
= 1.3: Communication Layers
========================================================================

1.3.1: Background

CIDF supports both interoperability and reusability of components. As such, a component may communicate with another across the network, or as part of the same executable. In addition, to the extent feasible, CIDF avoids specifying a particular language or choice of network protocols. To support this flexibility, the design is structured in layers, as shown in Figure 1.2.

                        ------------------
                        |      APIs      |
                        |----------------|
                        |   Gido layer   |
                        |----------------|
                        |    message     |
                        |     layer      |
                        |----------------|
                        |  (negotiated)  |
                        |   transport    |
                        |     layer      |
                        ------------------
                            Figure 1.2

1.3.2: API Layer

At the top of Figure 1.2 is an API layer, representing code-based interfaces to the layers below. Application programmers require a clean and uniform way to call upon functions that may be either local or remote, without having to deal with the details of exactly how each function is provided. APIs hide information and simplify a programmer's task: if the underlying structure of one of the lower layers changes, the programmer does not have to rewrite the application program.

The specification is in principle neutral regarding the language used for APIs. Of course, the APIs must be instantiated for each specific language, and the instantiations will differ between languages.
However, the semantics of what is passed across each interface will be common, and to the extent feasible, the APIs will be conceptually similar. The APIs are discussed in detail in Section 5 of this document.

1.3.3: Gido layer

Independent of programming language, network protocols, etc., CIDF defines common formats for intrusion detection data. This data comes in discrete packages called gidos (generalized intrusion detection objects). The organization of the data, its semantics for an IDS component, and a way to encode it in bytes are all defined at this level.

The rationale for this is to separate the issue of how data is organized and what it means (gido layer) from how it is gotten in and out of components (API layer) and how it is moved across networks (message and transport layers). In the case of components that are linked together into a single executable, there may be no layer below the gido layer. Gidos are discussed in sections 2 and 3 of this document.

1.3.4: Message layer

Gidos must be moved across networks. Certain features of this process (such as cryptography and CIDF addressing) must be present for CIDF purposes but may not be provided by the underlying transport mechanisms. The CIDF message layer is intended to provide this functionality. This layer is addressed in section 4. Use of this layer is mandated for CIDF components that are to interoperate across a network.

1.3.5: Transport layer

Figure 1.3 illustrates the notion of two independently developed CIDF modules that are built to a common interface specification. For the two modules to communicate, they are required to employ the same transport protocols, which establish the communication channel and handle message passing. The transport layer is chosen during the integration phase, as module developers negotiate and agree upon a common transport channel.
For example, both developers may agree that sockets will be used for this communication session, while other developers may decide to employ secure RPC for a different session. CIDF provides the flexibility to use different transport mechanisms, and a negotiation mechanism to choose amongst them.

The transport layer is kept independent of, and below, the message layer because the only requirement is that components understand the messages; this is independent of the way in which the messages are transmitted. Different applications will require different transport mechanisms.

All components are required to support a default transport mechanism, namely UDP. This guarantees that any two components can communicate at least well enough to negotiate which other transport mechanism they might prefer.

------------------------------------------------------------------------
Interoperation Among Independently-Developed Intrusion-Detection Modules

+-------------+ +---+                       +---+ +-------------+
| Intrusion   | | T |                       | T | | Intrusion   |
| Detection   | | R |                       | R | | Detection   |
| Module X    | | A |      communication    | A | | Module Y    |
|             | | N |        interface      | N | |             |
| Developer 1 | | S | <-------------------->| S | | Developer 2 |
|             | | P |       negotiated      | P | |             |
| Language A  | | O |         during        | O | | Language B  |
| OS X        | | R |       integration     | R | | OS Y        |
+-------------+ | T |         phase         | T | +-------------+
       ^        +---+                       +---+        ^
      / \                                               / \
       |                                                 |
       |                                                 |
       |  Build-to    +------------------+      Build-to |
       |              |                  |               |
       +--------------| Common Interface |---------------+
                      |  Specification   |
                      +------------------+
------------------------------------------------------------------------
                              Figure 1.3

========================================================================
= 1.4: Naming and Locating Components
========================================================================

1.4.1: Background

In an intrusion-detection system
of any scale, naming components has the potential to become a boundless headache. Components that "know" the identity of other components will require modification to work with other partners if redeployed in new contexts. Such components might also not be informed at all of changes in the system (such as the addition or removal of components of interest to them) that could affect their own operation.

So each component has the option of specifying the classes of gidos it is interested in, rather than naming other components. A producer of gidos can announce the classes of gidos it produces. A gido consumer can request the classes of gidos it wants to receive. Communication between components is then characterized not by the addresses or other identifying information of the endpoints involved, but by some feature of the data the communication is to carry.

Components also have the option of naming other components explicitly, either with or without the use of the CIDF infrastructure described in the next sections. Finally, components may also use network broadcast to indicate their willingness to accept specific gidos.

1.4.2: Associations

Enabling the feature-based communication described above is the role of the CIDF Matchmaking Service, or matchmaker, which forms and maintains associations between components. To discuss associations further, it is useful to divide CIDF components into gido producers (E-boxes, A-boxes, D-boxes) and gido consumers (A-boxes, D-boxes, R-boxes). Note also that gidos may enter the ID & R system from other sources (e.g., humans) and leave the system bound for other recipients.

A component contacts the matchmaker to announce its presence and ask for associates. This call returns a set of communication endpoints identifying potential partners for the transfer of gidos of interest to the caller. Each producer-consumer pair subsequently established is the basis for an association.
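A toy, in-memory Python sketch of the announce-and-ask-for-associates step may help fix the idea. It is not the directory-based matchmaker the specification has in mind: the class name, the method signature, the gido class strings, and the endpoint strings are all hypothetical. A component announces the gido classes it produces or wants, and receives the endpoints of already-registered partners.

```python
from typing import Dict, Iterable, Set


class ToyMatchmaker:
    """Hypothetical in-memory stand-in for the CIDF matchmaker."""

    def __init__(self) -> None:
        self._producers: Dict[str, Set[str]] = {}  # gido class -> producer endpoints
        self._consumers: Dict[str, Set[str]] = {}  # gido class -> consumer endpoints

    def announce(self, endpoint: str, produces: Iterable[str] = (),
                 wants: Iterable[str] = ()) -> Set[str]:
        """Register `endpoint` and return the endpoints of its potential partners."""
        partners: Set[str] = set()
        for cls in set(produces):
            self._producers.setdefault(cls, set()).add(endpoint)
            partners |= self._consumers.get(cls, set())
        for cls in set(wants):
            self._consumers.setdefault(cls, set()).add(endpoint)
            partners |= self._producers.get(cls, set())
        partners.discard(endpoint)  # a component is not its own partner
        return partners
```

Each returned endpoint identifies one potential producer-consumer pair. The sketch omits association teardown and change notification entirely.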
The caller has the option of being notified (via a callback) when new potential partners enter the system or when old ones leave. Note, though, that individual components or CIDF platforms may choose not to support dynamic addition of associations, e.g. due to resource constraints. Components may also restrict the number of concurrent associations they will enter into.

Associates are sought with a single API call, and individual associations may be torn down with a second call. However, each of these calls may induce a larger number of lower-level interactions. At the message layer, setting up an association involves directory operations (optionally including authentication). Maintaining a request for associates may involve keepalive functions, also implemented in the message layer. Negotiation between producer and consumer, to determine what kinds of gidos the consumer will receive, occurs in the API layer. The specification returns to these lower-level operations in later sections.

1.4.3: Gido Classes

For feature-based communication to work on a large scale, ways of classifying gidos must be established in advance. Every legal class of gido will fall under a category, which is a set of values for a particular attribute of the gido. The attribute need not be an explicit field in the gido; it could be an attribute of the gido producer, or of the host the producer monitors. When an attribute does form part of a gido, it corresponds to a semantic identifier (SID), as defined elsewhere in this document.

So a category can denote something like:

  * an IP subnet
  * a DNS domain or subdomain
  * a physical subnet
  * a functional grouping of hosts, like:
      + a department
      + a project

Wildcarding will be allowed, but not arbitrary wildcarding. For instance, only the last two elements of an IP address might be allowed to be wildcarded. Each kind of category is hierarchical and will be used to organize the directory used by the matchmaker.
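As an illustration of restricted wildcarding and of the hierarchy it induces, here is a Python sketch assuming IP-subnet categories written as dotted patterns such as "10.1.*.*", where only the last two octets may be "*" and a wildcard may not precede a literal octet. The pattern syntax and the function names are hypothetical; the specification does not fix a notation.

```python
from typing import Tuple


def parse_ip_category(pattern: str) -> Tuple[str, ...]:
    """Split a dotted pattern, enforcing that only the last two octets may be '*'."""
    octets = tuple(pattern.split("."))
    if len(octets) != 4:
        raise ValueError(f"expected four octets: {pattern!r}")
    wild = [octet == "*" for octet in octets]
    # Reject wildcards in the first two octets, and a literal after a wildcard.
    if wild[0] or wild[1] or (wild[2] and not wild[3]):
        raise ValueError(f"unsupported wildcard pattern: {pattern!r}")
    return octets


def ip_in_category(address: str, pattern: str) -> bool:
    """True if the dotted-quad `address` falls within the category `pattern`."""
    parts = address.split(".")
    if len(parts) != 4:
        return False
    return all(p == "*" or p == a for p, a in zip(parse_ip_category(pattern), parts))


def category_contains(outer: str, inner: str) -> bool:
    """True if every address matched by `inner` is also matched by `outer`."""
    return all(o == "*" or o == i
               for o, i in zip(parse_ip_category(outer), parse_ip_category(inner)))
```

The containment test is what would let such categories nest in a directory tree: "10.1.*.*" sits above "10.1.2.*", but not the other way round.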
Each category will be the responsibility of a single server. A gido class may specify more than one category; it may also specify attributes that are not categories at all. These attributes will be applied when negotiating what a given producer will actually send to a consumer.

The matchmaker can be significantly simplified by building it atop LDAP-compliant infrastructure. The matchmaker then becomes a set of LDAP-compliant servers (DSAs), plus an LDAP client (DUA) and additional intelligence local to each component. Where required, this "normal" client environment will have to be replaceable with a simpler equivalent. Appendix C says more about our proposed use of LDAP.

1.4.4: Limitations

Though the matchmaker can do a great deal to make connections between components intuitive and flexible, two key limitations of the approach (or indeed of any approach built atop a hierarchical directory service) should be noted.

Looking up a target in a hierarchical directory service is appropriate if the target of the lookup lends itself to hierarchical naming, and if its part of the directory hierarchy is believed to be trustworthy (or at least not believed a priori to be untrustworthy). However, there are interesting classes of gidos for which one or both of these assertions do not hold.

First are gidos that concern things lacking hierarchical names. Some examples are:

  * public keys
  * programs or other bad bundles of bits
  * attack profiles, like Stephanie Forrest's tuples of system calls

Second are gidos that describe some characteristic of an attack or of an attacker. If one wants to know about attacks emanating from a given subnet, or authored by a given principal, using a hierarchical directory service to locate related gidos would lead one to a server operating out of the (hypothetical) attacker's domain, and hence likely to be compromised.
In either case, whether the gidos describe an object lacking a hierarchical name or one whose name leads into an administrative domain that cannot be trusted to provide accurate information, a hierarchical directory service seems inappropriate.

========================================================================
= 2.1: Introduction to the gido format
========================================================================

2.1.1: Overview

This chapter specifies a standard gido format for use by CIDF components. These components shall use this standard for disseminating event records, analysis results, and countermeasure directives to IDS modules. This chapter both defines the syntactic structure of these messages and provides a method for defining the semantic content necessary for interpreting the various data elements embedded within that structure.

2.1.2: Organization

This section is organized as follows. Section 2.2 discusses the requirements for the gido format and the rationale for our choice. Section 2.3 summarizes S-expressions as we define them and use them for gidos. Section 2.4 begins the detailed discussion of the semantic identifiers we use, and of how to put gido sentences together. Section 2.5 provides some detailed examples. Section 2.6 contains rules and guidelines for defining new SIDs. Section 2.7 identifies the recommended set of GIDOs (primarily internal status information) that all CIDF-compliant modules should be able to produce. Section 2.8 discusses requirements for gido format negotiation protocols.

Readers will probably also wish to consult Appendices A and B, which list all the currently defined data types and SIDs. Appendix D on conformance profiles is also related to this chapter.

========================================================================
= 2.2: Gido Requirements and Rationale
========================================================================

Under the CIDF data sharing model, components receive an input stream, use this input to drive their internal analytical processing, and pass the results to other components within an overall intrusion detection architecture. The output of one component may be the input of another. Therefore, this specification closely coordinates the structures of event records, analysis reports, and countermeasure prescriptions. In many cases, current state information must also be used in order to fully understand the meaning of events, so this information is also encoded in gidos.

Adopting a single standard for E-, A-, and R-boxes alike significantly reduces interface complexity. In addition, this approach provides great flexibility as intrusion detection objectives move from component analysis, to systems analysis, to system-of-systems analysis.

However, this relationship between event records and analysis results does not necessarily extend beyond the specification of identical gido structures. Event records, analysis results, and countermeasure prescriptions remain dissimilar in significant ways:

  o Event records represent the operational activity of the analysis target, and may be produced in large volumes. Minor losses of event records, while potentially damaging, will not necessarily imply a significant compromise to operational security.

  o Analysis results represent significant conclusions derived from an analytical review of an event stream, and should be much smaller in volume than the event stream. Minor losses of analysis results are far more critical to the operational security of the target system than losses of event records.

  o Countermeasure prescriptions likewise should be low in volume, and are sensitive to loss.
Thus, while gidos encode events, analysis results, and countermeasure prescriptions identically, other processing layers such as transport may handle them differently. For example, specifications for event transport may emphasize performance (e.g., stateless UDP transmission), while protocols for disseminating analysis results may emphasize guaranteed delivery and accurate reassembly over performance (e.g., TCP transmission). Protocols for event dissemination and analysis reporting may also handle other issues, such as security requirements, differently.

The GIDO structure contains the actual data representing the event records, analysis results, and countermeasure directives produced by the respective CIDF components. The encoding scheme must be able to express complex, self-defining data structures, while providing efficient high-volume transmission of predefined structures. This specification uses S-expressions as the basic payload format.

S-expressions are a self-defining formatting scheme for representing arbitrarily complex data structures. This specification employs a very simplified form of S-expressions to represent event records, analysis reports, and countermeasure directives. One motivation for this choice is that S-expressions in general allow an impressive degree of reasoning and formalism.

The design goals for the gido format are:

  -- generality: Gidos should be capable of representing arbitrarily complex data.

  -- self-defining: Extensions to payload formatting should be semantically defined within the payload itself. Consumers should be able to learn or adjust to alterations in the expected format, or to comprehend an entirely new payload format.

  -- simplicity: The encoding scheme should produce messages that do not force complex parsing logic upon IDS module developers if that is not necessary in their application.
   The encoding scheme should be easily understandable, and gidos should have a human-readable representation.

-- efficiency: Payload expressions should represent data compactly. The overhead of semantic self-definitions should be removable when predefined messages are transported in bulk.

-- flexible: Payload expressions must be open to modification and to extension with new data types, semantic information, and data structures.

-- independent of call semantics: Payload expressions must support both messages that embed their data (call by value) and messages that refer to data held elsewhere (call by reference).

========================================================================
= 2.3 GIDO S-expression format
========================================================================

2.3.1 Preamble

In this section, we define how S-expressions are put together at a low level in CIDF. This is the human-readable format; the wire format is defined in terms of it in section 3.3.

In addition to questions of encoding format, this specification also enumerates a set of CIDF-compliant default primitive data types and semantic identifiers (SIDs) used when expressing individual payload fields. How SIDs should be combined into S-expressions that form meaningful gidos is discussed in section 2.4.

The primitive data types, presented in Appendix A, define the available encodings for field representation. Semantic identifiers (SIDs), listed in Appendix B, provide standard identifiers that gido consumers may use to interpret the various data fields within a payload expression.

2.3.2 S-Expression Grammar

Following is the grammar for CIDF S-expressions in BNF. Terminal symbols are represented in upper case. Literal characters are enclosed in quotes (").

   ::= | "(" item-list ")" | ::= "(" ")" | "(" "def" SID ")" ::= | "(" ")" ::= | ::= SID | TYPE | NAME ::= | ::= DATA | "(" ")"

Using this grammar, data fields are coupled with semantic identifiers parenthetically.
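As a non-normative illustration of how a consumer might load this human-readable notation, the following Python sketch reads a payload into nested lists. It assumes that a token is a parenthesis, a double-quoted string literal with backslash escapes, or a bare atom (a SID, TYPE, NAME, number, or unquoted constant); quoted literals are wrapped in a small Str class so they are not mistaken for SIDs. The names tokenize, parse, and Str are illustrative and are not defined by CIDF, and the sketch implements no semantics (def, extensions, referents) beyond reading.

```python
import re
from dataclasses import dataclass

# One token: "(", ")", a double-quoted literal (backslash escapes allowed),
# or a bare atom such as a SID, a TYPE name, a number, or a constant.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


@dataclass(frozen=True)
class Str:
    """A quoted data literal, kept distinct from bare atoms such as SIDs."""
    value: str


def tokenize(text):
    tokens, pos = [], 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1                      # skip inter-token whitespace
        if pos == len(text):
            return tokens
        match = TOKEN.match(text, pos)
        if match is None:                 # e.g. an unterminated string
            raise ValueError(f"bad token at offset {pos}")
        tokens.append(match.group(0))
        pos = match.end()


def parse(text):
    """Return the list of top-level S-expressions in a payload."""
    tokens, pos = tokenize(text), 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos == len(tokens):
                raise ValueError("missing ')'")
            pos += 1                      # consume the closing ")"
            return items
        if token == ")":
            raise ValueError("unexpected ')'")
        if token.startswith('"'):
            # Drop the quotes and undo backslash escapes such as \" and \\.
            return Str(re.sub(r"\\(.)", r"\1", token[1:-1]))
        return token                      # bare atom

    exprs = []
    while pos < len(tokens):
        exprs.append(read())
    return exprs


if __name__ == "__main__":
    payload = '(Initiator (RealName "Joe Cool") (UserName "joe") (UserID "1618"))'
    print(parse(payload))
    # [['Initiator', ['RealName', Str(value='Joe Cool')], ['UserName', Str(value='joe')], ['UserID', Str(value='1618')]]]
```

Keeping quoted literals in their own type means a later stage can tell the SID UserName apart from the string "UserName" without consulting Appendix B.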
A SID indicates both how its associated data element is syntactically represented and what the data element means. A collection of parenthetical SID/Data tuples can itself be grouped together in outer parentheses, indicating an explicit *association* of the SID/Data tuples (i.e., they represent attributes of a larger element in the expression). SID grouping is discussed further, with illustrations, in Section 2.4.

A SID is a unique token for a semantic identifier. TYPE is one of the primitive types specified in Appendix A. NAME identifies a named element of a structure. DATA is a data literal.

2.3.3: GIDO S-expression Examples

The following sections illustrate low-level ways of using S-expressions to encode gido data structures. We give these examples for concreteness; see the next section for more information on how to form gidos.

2.3.3.1: Embedded Semantics and Data Payload Example

This form is used for expressing field-oriented lists of data, where the data is embedded within the message. The format consists of a series of tuples, one tuple per data field. Each tuple consists of a semantic identifier followed by its associated data item:

   Format: (SID-1 data-exp-1) (SID-2 data-exp-2) . . . (SID-N data-exp-N)

In this format, there is a SID with each data item, providing a self-defining message format. A consumer can parse the message for those SIDs it understands and wishes to analyze, and discard data fields containing unknown or unwanted SIDs. As discussed in Appendix B, each SID has an associated data type, which completes the self-definition of the message. Thus, by parsing the SID tokens, the consumer knows both how to interpret each data element semantically and how the data elements are syntactically represented.

2.3.3.2 Pre-defined Constant Payload Format

This form allows the semantics of predefined message structures to be conveyed to consumers once.
From that point forward, consumers can receive and interpret raw data structures without the overhead of embedded SIDs. This form is highly efficient for transporting high volumes of the same message type. It is also used for enumerating a predefined set of CIDF E-box and A-box messages (see Section 2.7).

A gido producer begins the message exchange by sending the consumer a message definition statement:

   Format1: (def SID arg-list sid-exp-1)

The "def" keyword defines a new SID that can be used subsequently. SID indicates the semantic identifier being defined; SIDs are special identifiers in the language, and attempting to define a SID that is already defined is an error. arg-list is a list of dummy arguments that are matched with the actual arguments supplied when the S-expression is evaluated. sid-exp-1 defines the SID in terms of SIDs and TYPEs that are already defined: it may only contain SIDs that are predefined, either because they are included in an appendix to this document or because they were defined in a prior definition.

#####################################################################
# Editor's Comment: The event subgroup has not resolved the
# issue of scope for dynamically defined SIDs.
#####################################################################

========================================================================
= 2.4 Parts of a GIDO payload
========================================================================

2.4.1 Introduction

A GIDO consists of the GIDO header, which gives information pertaining to the encapsulation of the GIDO (such as its version number, its length, and so forth), and the GIDO payload. The GIDO header is discussed in section 3.2. In this section, we describe how SIDs are put together to compose the GIDO payload, using the S-expressions described in the previous section.

A well-formed GIDO payload consists of one or more top-level *sentences*.
Sentences are S-expressions that can be said to "assert" something. A typical sentence might describe the state of a machine at a given time, report that a given event has taken place, or recommend that an action be taken to counter an attack.

A sentence may be composed of other sentences, connected in some way; such a sentence is called a *compound sentence*. A sentence that is not compound is called a *simple sentence*. Broadly speaking, a simple sentence contains a *verb*, which describes what happened, and other S-expressions that describe who verbed what, where, when, how, and so forth. In the following sections, we examine how each of these may be denoted and described, and finally how they are put together to form a complete sentence.

2.4.2. Verb SIDs

At the heart of a sentence is the *verb*. Normally, we think of verbs as denoting some action (which may sound somewhat event-centric), but they may also denote, for instance, a recommendation or a description of state. Each sentence has one main verb. An example of a verb SID is "Execute".

Verb SIDs, unlike most other SIDs, do not take a concrete data type as an argument. Instead, they take a sequence of one or more S-expressions, which describe the various "players" for the verb. In the case of "Execute", we would be interested in what (program) was executed, who executed it, where and when it was executed, and so on.

2.4.3. Role SIDs

A verb has little value until we describe who and what it applies to. This is accomplished using *role* SIDs. A role denotes the part that an entity, or set of attributes, plays in a sentence. Examples of roles are "Initiator" and "Operand".

Role SIDs, like verb SIDs, take a sequence of one or more S-expressions as their argument. These S-expressions describe, roughly speaking, the object that is playing that role in the sentence.
Example:

   (Initiator (RealName "Joe Cool") (UserName "joe") (UserID "1618"))

denotes a user, with real name "Joe Cool", user name "joe", and user ID "1618", acting as Initiator. (Typically, an Initiator is someone who causes an action to take place, such as executing a program.) An S-expression headed by a role SID is called a *role clause*.

2.4.4. Extension SIDs

No component is expected to understand all SIDs. A component concerned with Unix notions will often not care about X.500-related SIDs. Nevertheless, many X.500-related SIDs have counterparts in the Unix world, and the Unix component will want to capture this information, even if it is not aware of the exact use of this information in the X.500 world. For instance, a user's real name is a user's real name, although in Unix it might be the name in /etc/passwd associated with the user's account, and in X.500 it may be a Common Name. If these two concepts were expressed with two completely distinct SIDs, we would lose many of the benefits of data sharing.

Extension SIDs are designed to address this. They allow one to specify information in a relatively generic fashion, while giving more specialized receivers extra information that specifies more precisely how a SID is to be used. For instance, an X.500 Common Name would be expressed as follows:

   (RealName (ExtendedBy X500CommonName) "Joe Cool")

Most components would be able to understand the RealName SID, and would capture the fact that a user with the real name "Joe Cool" is in question here. Additionally, any component that understands X.500 would implement the X500CommonName extension, so that it knows the real name is registered as a Common Name, along with any implications of that fact.

In general, a SID is *extended* by following it with a sequence of one or more SID-pairs, each of which is tagged with the ExtendedBy SID.
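A consumer can exploit this layering by walking the extension chain from the most specific SID back toward the base SID and stopping at the first one it understands. The Python sketch below assumes a clause has already been read into a nested list such as ["RealName", ["ExtendedBy", "X500CommonName"], "Joe Cool"]. The helper names are hypothetical, and treating the last ExtendedBy pair as the most specific assumes that each extension refines the one before it, as in the ObjectName example later in this section.

```python
def split_extensions(clause):
    """Split a clause into (base SID, extension chain, remaining arguments).

    ExtendedBy pairs must immediately follow the SID they extend, so the
    chain is read from the front of the argument list and stops at the
    first element that is not an ExtendedBy pair.
    """
    base, rest = clause[0], list(clause[1:])
    chain = []
    while (rest and isinstance(rest[0], list) and len(rest[0]) == 2
           and rest[0][0] == "ExtendedBy"):
        chain.append(rest.pop(0)[1])
    return base, chain, rest


def most_specific_known(clause, known_sids):
    """Return the most specific SID in the clause that the consumer knows.

    Falls back toward the base SID; returns None when none is understood,
    in which case the consumer should ignore the whole clause.
    """
    base, chain, _ = split_extensions(clause)
    for sid in reversed([base] + chain):
        if sid in known_sids:
            return sid
    return None


if __name__ == "__main__":
    name = ["RealName", ["ExtendedBy", "X500CommonName"], "Joe Cool"]
    print(most_specific_known(name, {"RealName"}))                    # RealName
    print(most_specific_known(name, {"RealName", "X500CommonName"}))  # X500CommonName
    print(split_extensions(name))  # ('RealName', ['X500CommonName'], ['Joe Cool'])
```

Either way the data value "Joe Cool" is recovered, which is the point of giving an extended SID the same type as its base.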
An extension SID MUST follow the SID or extension that it extends. For example, the following is well-formed:

   (ObjectName (ExtendedBy DeviceName) (ExtendedBy UnixFullDeviceName) ... )

where the ellipsis stands for the data value of the extended ObjectName SID.

An extended SID always takes the same *type* as the unextended (base) SID. In fact, if the producer knows that a message will *only* be used by someone who recognizes the extension, then it may omit the base SID altogether and refer only to the extension. For instance, one could write

   (X500CommonName "Joe Cool")

2.4.5. Conjunction SIDs

Conjunction SIDs join sentences at the same "level" together. Two sentences that are simply juxtaposed are presumed to mean that both hold. That is,

   <Sentence1> <Sentence2>

means that both Sentence1 and Sentence2 hold. Other relationships are indicated by the appropriate conjunction SID. For instance, to indicate that Sentence1, Sentence2, and Sentence3 all had a common cause, one writes

   (CommonCause <Sentence1> <Sentence2> <Sentence3>)

2.4.6. Open S-Expressions

An open S-expression is one in which not all the data values are "filled in", so to speak. It is used to express concepts such as "<user> removed <file>." Its only currently defined usage is in the def construct, as follows:

   (def RemoveFile ($username $filename)
        (Remove
            (Initiator (UserName $username))
            (Operand (ObjectType file) (ObjectName $filename))
        )
   )

In later usage, we can express "The user with user name joe removed the file /etc/passwd" in this way:

   (RemoveFile "joe" "/etc/passwd")

Its general format is

   (def SID (arg-list) sid-exp)

2.4.7. Referent SIDs

There is one last special type of SID, called Referent SIDs. They are described last because they are not restricted to the construction of a single sentence; instead, they allow one to link two or more sentences together (though they are often used to refer to other parts of the same sentence).
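Returning briefly to the open S-expressions of Section 2.4.6: the def construct behaves much like a macro with positional parameters. The Python below is a rough sketch of how a consumer might expand such definitions, assuming payloads have already been read into nested lists and that dummy arguments appear as bare atoms such as $username. The DefTable name is hypothetical. The sketch refuses to redefine a SID, as Section 2.3.3.2 requires, rejects recursive expansion, and does not address the unresolved question of scope for dynamically defined SIDs.

```python
class DefTable:
    """Hypothetical store for SIDs introduced with (def SID (args) body)."""

    def __init__(self):
        self._defs = {}

    def define(self, sid, params, body):
        if sid in self._defs:
            raise ValueError(f"SID {sid} is already defined")
        self._defs[sid] = (list(params), body)

    def expand(self, expr, _active=()):
        """Recursively replace uses of defined SIDs with their bodies."""
        if not isinstance(expr, list) or not expr:
            return expr
        head = expr[0]
        if isinstance(head, str) and head in self._defs:
            if head in _active:
                raise ValueError(f"recursive definition of {head}")
            params, body = self._defs[head]
            args = expr[1:]
            if len(args) != len(params):
                raise ValueError(
                    f"{head} expects {len(params)} arguments, got {len(args)}")
            bound = _substitute(body, dict(zip(params, args)))
            return self.expand(bound, _active + (head,))
        return [self.expand(e, _active) for e in expr]


def _substitute(expr, bindings):
    """Replace each dummy-argument atom (e.g. $username) with its value."""
    if isinstance(expr, list):
        return [_substitute(e, bindings) for e in expr]
    if isinstance(expr, str) and expr in bindings:
        return bindings[expr]
    return expr


if __name__ == "__main__":
    table = DefTable()
    table.define("RemoveFile", ["$username", "$filename"],
                 ["Remove",
                  ["Initiator", ["UserName", "$username"]],
                  ["Operand", ["ObjectType", "file"],
                              ["ObjectName", "$filename"]]])
    print(table.expand(["RemoveFile", "joe", "/etc/passwd"]))
    # ['Remove', ['Initiator', ['UserName', 'joe']], ['Operand', ['ObjectType', 'file'], ['ObjectName', '/etc/passwd']]]
```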
The two referent SIDs are ReferAs and ReferTo. They take a string as their data type. A SID-pair headed by a referent SID is called a *referent clause*. A referent clause may be placed into either a sentence or a role clause, and its interpretation varies depending on where it appears:

* If a ReferAs clause is placed into a sentence, it is said to *refer* to that sentence, *except* for any ReferAs clauses. (It is considered bad form to use more than one ReferAs clause at the top level of the same sentence.) Thereafter, the corresponding ReferTo clause can be used in place of that sentence (but see the warning below).

* If a ReferAs clause is placed into a role clause, it is said to refer to the object described by the sequence of S-expressions following that role, *except* for any ReferAs clauses. (It is considered bad form to use more than one ReferAs clause in the same role clause.) Thereafter, the corresponding ReferTo clause can be used in place of that object description (again, see the warning below).

* WARNING. The referent SIDs MAY carry actual semantics; they are not simply macros. If a ReferAs clause is placed into a sentence, and that sentence refers to an event (say), then the ReferTo clause refers to that specific event, and not simply to any event with the same attributes (which, after all, may not be uniquely identifying). Similarly, if a ReferAs clause is placed into a role clause, and that role clause describes an object (say), then the ReferTo clause refers specifically to the same object, and not simply to an object with the same attributes. Of course, if no specific item is denoted by the ReferAs clause, this warning does not apply. For example, if ReferAs occurs in an assertion of state, it can be interpreted simply as a macro, since no unique item is being denoted.
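One way a consumer might honor this warning is to remember the very node that carries a ReferAs clause and hand back that node, rather than a copy, whenever the matching ReferTo is seen. The Python sketch below does this, assuming S-expressions are nested lists and keying each referent by the originator ID and thread ID from the GIDO header, per the scope rule given later in this section. ReferentTable and the header values in the example are hypothetical; a reader of the stored node is expected to skip its ReferAs children, since the referent excludes them.

```python
def _is_clause(expr, sid):
    """True for a two-element clause such as ["ReferAs", "JoesDeletion"]."""
    return isinstance(expr, list) and len(expr) == 2 and expr[0] == sid


class ReferentTable:
    """Hypothetical map from (originator ID, thread ID, name) to a node."""

    def __init__(self):
        self._nodes = {}

    def record(self, originator_id, thread_id, expr):
        """Register every sentence or role clause in expr that has a ReferAs."""
        if not isinstance(expr, list):
            return
        for child in expr:
            if _is_clause(child, "ReferAs"):
                key = (originator_id, thread_id, child[1])
                if key in self._nodes:
                    raise ValueError(f"referent {child[1]!r} reused in thread")
                # Keep the node itself: ReferTo denotes this specific item.
                self._nodes[key] = expr
            else:
                self.record(originator_id, thread_id, child)

    def resolve(self, originator_id, thread_id, refer_to):
        """Return the node named by a ReferTo clause, or None if unknown."""
        if not _is_clause(refer_to, "ReferTo"):
            raise ValueError("expected a (ReferTo name) clause")
        return self._nodes.get((originator_id, thread_id, refer_to[1]))


if __name__ == "__main__":
    deletion = ["Remove",
                ["Initiator", ["RealName", "Joe Cool"]],
                ["Operand", ["FileName", ["ExtendedBy", "UnixPathName"],
                             "/etc/passwd"]],
                ["ReferAs", "JoesDeletion"]]
    refs = ReferentTable()
    refs.record("orig-1", "thread-7", deletion)   # hypothetical header values
    target = refs.resolve("orig-1", "thread-7", ["ReferTo", "JoesDeletion"])
    print(target is deletion)                                               # True
    print(refs.resolve("orig-1", "thread-8", ["ReferTo", "JoesDeletion"]))  # None
```

Comparing with `is` rather than `==` is what distinguishes "this deletion" from "a deletion with the same attributes".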
As an example, consider the following sequence:

   (Remove
       (Initiator (RealName "Joe Cool"))
       (Operand (FileName (ExtendedBy UnixPathName) "/etc/passwd"))
       (AtTime (Time "1998 Feb 25 12:40:32 PST"))
       (ReferAs "JoesDeletion")
   )

followed by

   (HelpedCause
       (ReferTo "JoesDeletion")
       (Login
           (Initiator (RealName "Mary Worth"))
           (To (HostName "host.work.com"))
           (Outcome (ReturnCode (ExtendedBy UnixErrno) 13))
       )
   )

This indicates that the act of Joe Cool deleting /etc/passwd later helped to prevent Mary Worth from logging in to host.work.com. Note that this specific instance of Joe Cool deleting /etc/passwd is referred to here. Even if (by resetting the clock, say) Joe Cool were to delete /etc/passwd a second time with the same attributes, this construction would still show that it was the *first* deletion that helped prevent Mary Worth from logging in.

Since referent SIDs act across GIDOs, and hence potentially (though not necessarily) across multiple messages, the question of scope arises. The scope rule for referent SIDs is as follows: the value of a referent clause is (roughly speaking) the verb or role within which it is found, provided that verb or role is in the same thread. A thread is defined by the combination of the originator ID and thread ID fields in the GIDO header. A producer MUST NOT reuse a referent (such as "JoesDeletion") within the same thread, in perpetuity.

2.4.8. Guidelines for Putting SIDs Together to Form Sentences

In this section, we describe how to use verb SIDs, role SIDs, conjunction SIDs, and other kinds of SIDs to construct sentences.

2.4.8.1. Basic Organization

As noted above, a simple sentence is an S-expression headed by a verb SID (which may be extended). This verb SID is followed by a sequence of one or more S-expressions that describe the various entities that play parts in the sentence, or that qualify the verb.
The S-expressions denoting the roles of the sentence are headed by a role SID, which may also be extended. This role SID is in turn followed by a sequence of one or more S-expressions that may describe attributes of the entity playing that role; they may also describe a sentence that plays a role within the enclosing sentence.

A BNF-like grammar that specifies this structure is as follows.

   ::= | ::= "(" ")" | "(" ") | "(" ") | "(" "def" ")" ::= | ::= "(" ") | "(" ") | "(" ")" | "(" "ReferAs" ")" ::= | ::= "(" "ExtendedBy" ")" ::= | ::= "(" "ReferTo" ")"

In English: a GIDO payload is a SentenceList, which is a list of Sentences. A Sentence may be a ConjunctionSID followed by a list of the Sentences it conjoins, or a VerbSID followed by a list of Qualifiers of that VerbSID. A Qualifier may be a RoleSID followed by a list of Qualifiers of that RoleSID, or an AtomSID followed by its data. Any list of Qualifiers may contain a ReferAs clause; thereafter, the corresponding ReferTo clause may stand in for that list of Qualifiers. Any SID may be followed by a list of Extensions.

2.4.8.2. Understanding Sentences and the Principle of Connectedness

The Principle of Connectedness states that when a component reading a GIDO encounters a SID it does not understand, the component must ignore the entire S-expression that the SID heads. The component MUST NOT reject the GIDO on this ground.
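The rule translates directly into a pruning pass: keep an S-expression only if its head SID is understood, and descend into its children only in that case. The following Python is an illustrative sketch, assuming S-expressions are nested lists whose first element is the SID and whose non-list elements are data; the function name prune and the sample set of known SIDs are hypothetical. It never raises on an unknown SID, in keeping with the rule that the GIDO must not be rejected.

```python
def prune(expr, known_sids):
    """Return the understood, connected part of expr, or None.

    An S-expression headed by an unknown SID is dropped together with
    everything beneath it, even children whose SIDs are known.
    """
    if not isinstance(expr, list):
        return expr                        # data under an understood SID
    if not expr or not isinstance(expr[0], str) or expr[0] not in known_sids:
        return None
    kept = [expr[0]]
    for child in expr[1:]:
        sub = prune(child, known_sids)
        if sub is not None:
            kept.append(sub)
    return kept


if __name__ == "__main__":
    gido = ["InOrder",
            ["Delete",
             ["Initiator", ["FullName", "Joe Hacker"]],
             ["Operand", ["ObjectType", "file"], ["ObjectName", "/etc/passwd"]]],
            ["Execute",
             ["Initiator", ["UserName", "sysadmin"]],
             ["Operand", ["ObjectType", "program"], ["ProgramName", "SystemCheck"]]]]
    known = {"InOrder", "Execute", "Initiator", "Operand",
             "FullName", "UserName", "ObjectType", "ProgramName"}
    print(prune(gido, known))
    # ['InOrder', ['Execute', ['Initiator', ['UserName', 'sysadmin']], ['Operand', ['ObjectType', 'program'], ['ProgramName', 'SystemCheck']]]]
```

Although Initiator, Operand, and FullName are all known, the Delete subtree disappears entirely because its head is not.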
For instance, consider the following example:

   (InOrder
       (Delete
           (Initiator (FullName "Joe Hacker"))
           (Operand (ObjectType file) (ObjectName "/etc/passwd"))
       )
       (Execute
           (Initiator (UserName "sysadmin"))
           (Operand (ObjectType program) (ProgramName "SystemCheck"))
       )
   )

If a component does not understand the Delete verb SID, it must not make use of the Initiator and Operand SIDs within that sentence, even if it understands them, because it will not understand what they are the Initiator and Operand *of*.

This is called the Principle of Connectedness because the portion of the GIDO that is understood must form a connected tree. If a parent is not understood, its children should not be interpreted, as their relation to the portion of the tree containing the parent is unknown.

2.4.8.3. Rules and Guidelines for Using SIDs

Whenever a component puts a SID into a GIDO, the SID MUST be used with the number of arguments (usually one) that the SID's definition calls for (see the definitions in Appendix B). The SID's argument(s) MUST have the syntax and meaning that the SID's definition calls for; otherwise the component is OUT OF CONFORMANCE with the SID's definition. A component that generates GIDOs MUST generate them in conformance with all of the SID definitions in this specification.

Whenever the above rule permits, a component generating a GIDO SHOULD use a SID from this specification and SHOULD avoid the SIDs defined in the Uninterpreted SIDs section. If the only suitable SID in this specification is in the Uninterpreted SIDs section, then an implementation MAY use it or define a new SID; defining a new SID is usually better.

If a component generating GIDOs uses a SID from a particular specification, and that specification defines two applicable SIDs, one of which is strictly more specific than the other, then the component SHOULD use the more specific one.
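A producer could check the first of these rules mechanically before emitting a GIDO. The Python sketch below walks a sentence and reports SIDs that are undefined, verb or role SIDs that lack sub-expressions, and atom SIDs whose argument count or type disagrees with a table of definitions. The table shown is a made-up excerpt standing in for Appendix B, and encoding a SID's syntax as an (arity, Python type) pair is an assumption of the sketch. ExtendedBy pairs are skipped when counting arguments, since an extended SID takes the same type as its base.

```python
# Hypothetical excerpt of a SID table; the normative definitions are in
# Appendix B.  "verb"/"role" SIDs take one or more S-expressions; other
# entries give (number of arguments, expected Python type).
SID_TABLE = {
    "Execute": "verb",
    "Initiator": "role",
    "Operand": "role",
    "UserName": (1, str),
    "ProgramName": (1, str),
    "ProcessID": (1, int),
}


def conformance_errors(expr, table):
    """Return a list of problems found in expr; empty means conformant."""
    errors = []

    def walk(node):
        if not (isinstance(node, list) and node and isinstance(node[0], str)):
            errors.append(f"not a SID-headed S-expression: {node!r}")
            return
        sid = node[0]
        # Extensions do not change a SID's type, so leave them out of the count.
        args = [a for a in node[1:]
                if not (isinstance(a, list) and a and a[0] == "ExtendedBy")]
        spec = table.get(sid)
        if spec is None:
            errors.append(f"undefined SID {sid}")
        elif spec in ("verb", "role"):
            if not args or not all(isinstance(a, list) for a in args):
                errors.append(f"{sid} takes one or more S-expressions")
            for a in args:
                if isinstance(a, list):
                    walk(a)
        else:
            arity, kind = spec
            if len(args) != arity or not all(isinstance(a, kind) for a in args):
                errors.append(
                    f"{sid} expects {arity} argument(s) of type {kind.__name__}")

    walk(expr)
    return errors


if __name__ == "__main__":
    good = ["Execute", ["Initiator", ["UserName", "sysadmin"]],
            ["Operand", ["ProgramName", "SystemCheck"]]]
    bad = ["Execute", ["Initiator", ["ProcessID", "5345"]]]
    print(conformance_errors(good, SID_TABLE))  # []
    print(conformance_errors(bad, SID_TABLE))
    # ['ProcessID expects 1 argument(s) of type int']
```

A relaying component that passes a sentence along verbatim, as allowed in the next paragraph, would not need to run such a check.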
If CIDF component X creates a sentence, and CIDF component Y later has a copy of the sentence and passes it verbatim to CIDF component Z, then Y MAY do so even if the sentence violates the above rules and guidelines. The sentence MUST be passed verbatim and SHOULD be clearly ascribed to its originator. This provision frees D-boxes and similar components from having to thoroughly understand and validate every GIDO they process. However, if a CIDF component modifies any part of the sentence, then it is responsible for the sentence's compliance with the above rules and guidelines.

========================================================================
= 2.5. Detailed Examples
========================================================================

Now that the basic components of an S-expression have been presented, we illustrate how to use these components to express various record structures and messages that intrusion detection systems may need to convey. In the following examples, we walk the reader through the process of translating raw event structures, analysis results, and other candidate message structures into S-expressions.

2.5.1. Translating a Basic Security Module Audit Record

One very well-known form of security audit record is that introduced in Sun Microsystems' SunOS 4.1.X Basic Security Module (BSM). There are a variety of ways to translate BSM audit records into S-expressions, depending on the data elements that a CIDF module may be directed to filter or incorporate within its GIDOs. In this example, we demonstrate the translation of a BSM audit record generated as a result of a successful rlogin request.

2.5.1.1. BSM Record Description

The raw BSM record describes an event in which an external user performs a successful remote login to target.machine.com from source.machine.com.
A session is established in which the resulting real and effective user IDs are set to thomas, the real and effective group IDs are set to staff, terminal 6 is assigned to the session, and the process and session IDs are set to 5345. The event is captured by the audit daemon on target.machine.com, which records the event as follows:

   Raw BSM Audit Record

   [header,86,2,login - rlogin,,Sat Jul 29 20:43:01 1995, + 280009000 msec
    subject,thomas,thomas,staff,thomas,staff,5345,5345,0 6,source.machine.com
    text,successful login
    return,success,0]

2.5.1.2. BSM to S-Expression Translation Process

We now illustrate the rationale used to translate a common event structure, such as a BSM audit record, into a CIDF S-expression. As discussed in Section 2.4.2, we begin our S-expression construction by defining the verb of our sentence in its most general form. In this case, the operation recorded in the BSM audit record is the establishment of a communication session between two entities via rlogin. Scanning the candidate verb SIDs in Appendix B.2, we find that the SID most closely matching the rlogin operation is BeginSession. While BeginSession captures the underlying action in the audit record well, a Unix-specific extension is available for further refinement (as discussed in Section 2.4.4). The resulting S-expression is as follows:

   Example 2.5.1.2a BSM Rlogin S-Expression:

   (BeginSession (ExtendedBy UnixRlogin)
       :
       :
       :
   )

The next step is to qualify the verb with supporting S-expressions that further enumerate the attributes of the event. In this case, the verb BeginSession takes a series of supporting role clauses that can be derived from the BSM record (Section 2.4.3).
These role clauses include:

o the observer by which the event was recorded
o the initiator of the BeginSession operation
o the entity to which the BeginSession was directed
o the resulting state changes, or resource(s) produced or destroyed by the operation (in our case, the attributes of the session established by the rlogin)
o the outcome of the event

From the above categories of attributes, we augment the S-expression with the following relevant role clauses:

   Example 2.5.1.2b BSM Rlogin S-Expression:

   (BeginSession (ExtendedBy UnixRlogin)
       (Observer (S-expression ...) )
       (Initiator (S-expression ...) )
       (To (S-expression ...) )
       (Operand (S-expression ...) )
       (Outcome (S-expression ...) )
   )

Role clauses are chosen so as to group associated datafields under a common context in the S-expression sentence. At this point, we turn our attention to incorporating the associated datafields within the above role clauses. Datafields that cannot correctly be associated with the context of one of the available role clauses can still be incorporated as independent simple sentences within the S-expression.

In our example, the Observer clause groups all datafields that describe attributes of the observer, including when, where, and by what means (i.e., BSM data) the observation was recorded. The Initiator clause groups datafields that describe the entity responsible for the event; here the BSM record provides very little information other than the hostname from which the request was sent. Similarly, the BSM record provides only the hostname of the recipient, which we record in the To clause. The Operand clause describes the object affected by the event, which in this case is the newly created session.
From the BSM audit record, we can include under the Operand clause the session's associated user attributes, group attributes, process/session attributes, and the device through which the session is supported. Lastly, we enumerate the attributes of the outcome.

   Example 2.5.1.2.c Final BSM Rlogin S-Expression:

                                                           Section Ref.
                                                           ------------
   (BeginSession (ExtendedBy UnixRlogin)                   -- B.2.5
       (Observer                                           -- B.3.7
           (AtTime (Time "Sat Jul 29 20:43:01 PDT 1995"))  -- B.3.2
           (HostName "target.machine.com")                 -- B.5.4
           (ObservationSourceType "BSM-SunOS")             -- B.5.1
       )
       (Initiator                                          -- B.3.1
           (HostName "source.machine.com")                 -- B.5.4
       )
       (To                                                 -- B.3.3
           (HostName "target.machine.com")                 -- B.5.4
       )
       (Operand                                            -- B.3.1
           (UnixAUserName "thomas")                        -- B.5.9.6
           (UnixUserName "thomas")                         -- "
           (UnixEUserName "thomas")                        -- "
           (UnixGroupName "staff")                         -- "
           (UnixEGroupName "staff")                        -- "
           (ProcessID 5345)                                -- B.5.2
           (SessionID 5345)                                -- B.5.2
           (Through                                        -- B.3.3
               (ObjectName                                 -- B.5.1
                   (ExtendedBy UnixFullDeviceName)         -- B.5.1
                   "/dev/tty06")
           )
       )
       (Outcome                                            -- B.3.6
           (Severity 3)                                    -- B.5.1
           (ReturnCode                                     -- B.5.1
               (ExtendedBy UnixErrno)                      -- B.5.1
               0)                                          -- B.5.1
           (Comment "successful login")                    -- B.5.1
       )
   )

2.5.2. Translating a TCP/IP Packet

In the next example, we show how to translate the contents of an FTP connection request captured by a TCP/IP packet sniffer. Here the TCP/IP packet is observed being sent from an external client to the target host's FTP control port. The packet is translated by a CIDF module that attempts to describe the transaction from the perspective of analyzing data sent to the application-layer (i.e., FTP) network service.

2.5.2.1. TCP/IP Packet Description

The observer in this example is a CIDF E-box that parses sniffed packets from a Sun Microsystems Solaris machine.
The observer's host platform is named snoopmachine.machine.com, and from this machine the observer attempts to capture and translate traffic to and from the FTP control port of server.machine.com using the Solaris snoop(1) command:

   snoopmachine% snoop -v -d le0 -t a host server port 21

The following is an example snoop-formatted packet produced by the observer:

   Raw TCP/IP Packet

   ETHER:  ----- Ether Header -----
   ETHER:
   ETHER:  Packet 7 arrived at 8:59:49.05
   ETHER:  Packet size = 70 bytes
   ETHER:  Destination = 0:01:02:03:04:05, Western Digital
   ETHER:  Source      = 0:aa:bb:cc:dd:ee,
   ETHER:  Ethertype = 0800 (IP)
   ETHER:
   IP:   ----- IP Header -----
   IP:
   IP:   Version = 4
   IP:   Header length = 20 bytes
   IP:   Type of service = 0x00
   IP:         xxx. .... = 0 (precedence)
   IP:         ...0 .... = normal delay
   IP:         .... 0... = normal throughput
   IP:         .... .0.. = normal reliability
   IP:   Total length = 56 bytes
   IP:   Identification = 63187
   IP:   Flags = 0x4
   IP:         .1.. .... = do not fragment
   IP:         ..0. .... = last fragment
   IP:   Fragment offset = 0 bytes
   IP:   Time to live = 38 seconds/hops
   IP:   Protocol = 6 (TCP)
   IP:   Header checksum = 69a3
   IP:   Source address = 999.998.997.996, client.machine.com
   IP:   Destination address = 111.121.131.141, server.machine.com
   IP:   No options
   IP:
   TCP:  ----- TCP Header -----
   TCP:
   TCP:  Source port = 12406
   TCP:  Destination port = 21 (FTP)
   TCP:  Sequence number = 820300070
   TCP:  Acknowledgement number = 3095138926
   TCP:  Data offset = 20 bytes
   TCP:  Flags = 0x18
   TCP:        ..0. .... = No urgent pointer
   TCP:        ...1 .... = Acknowledgement
   TCP:        .... 1... = Push
   TCP:        .... .0.. = No reset
   TCP:        .... ..0. = No Syn
   TCP:        .... ...0 = No Fin
   TCP:  Window = 61320
   TCP:  Checksum = 0x4e8d
   TCP:  Urgent pointer = 0
   TCP:  No options
   TCP:
   FTP:  ----- FTP: -----
   FTP:  "USER anonymous\r\n"

The packet consists of four layers of structure: the Ethernet header, the IP header, the TCP header, and the FTP data portion.
Working from the bottom up, we see that the packet represents an FTP "USER anonymous" request, which for FTP is equivalent to a BeginSession request for an anonymous FTP session. Above the FTP portion are the TCP fields, containing, among other things, the source and destination ports (note that the destination port is port 21, the FTP control port). Above the TCP layer are the IP and Ethernet headers, both containing datafields that could be used to further identify the initiator and recipient of the FTP request.

2.5.2.2. TCP/IP Packet to S-Expression Translation Process

As with the BSM example, we begin our S-expression by defining the verb of our sentence. In this example, the E-box is monitoring traffic to the FTP control port when it encounters a TCP/IP packet that contains an FTP USER command requesting anonymous access. As a result, we again choose BeginSession as the verb. The resulting S-expression is as follows:

   Example 2.5.2.2a FTP BeginSession S-Expression Example:

   (BeginSession
       :
       :
       :
   )

Next, we qualify the verb with supporting S-expressions that further enumerate the attributes of the event. As with BeginSession in our BSM example, we can derive a series of role clauses from the information in our FTP packet. These role clauses include:

o the observer by which the event was recorded
o the initiator of the BeginSession operation
o the entity to which the BeginSession was directed
o the resulting state changes, or resource(s) produced or destroyed by the operation (in our case, the attributes of the session requested by the FTP USER command)
o the command or tool used in the event
o the outcome of the event

From the above categories of attributes, we augment the S-expression with the following relevant role clauses:

   Example 2.5.2.2b FTP BeginSession S-Expression Example:

   (BeginSession
       (Observer (S-expression ...) )
       (Initiator (S-expression ...) )
       (To (S-expression ...) )
       (Operand (S-expression ...) )
       (Using (S-expression ...) )
       (Outcome (S-expression ...) )
   )

The Observer clause can include a variety of datafield attributes, including the timestamp and the host platform of the sniffer.

The attributes of the initiator of the BeginSession could equally be viewed as attributes of the location from which the request was sent. Because the "Initiator" and "From" roles both provide accurate context for the set of attributes that represent the entity responsible for the BeginSession event, we link the two clauses using referent SIDs (Section 2.4.7). The entity responsible for the event can be described through a variety of attributes within the packet, including the Ethernet address, IP address, TCP port, and hostname. The recipient can be identified from a similar set of corresponding datafields.

Unlike the BSM record, the packet contains very little information describing the session, other than that the session will be associated with the anonymous user account. The means used in this event is an FTP command, "USER".

Lastly, we identify the outcome of this event as pending, since at this point we cannot determine whether the BeginSession will succeed. The outcome will be determined by subsequent GIDOs, which are associated with this S-expression through a common thread ID defined in their GIDO headers. We use the CIDFReturnCode extension of ReturnCode to express this condition. The GIDO recipient must consult the other GIDOs in the thread until it encounters an Outcome with a ReturnCode that is not pending.

   Example 2.5.2.2.c Final FTP BeginSession S-Expression:

                                                     Section Ref.
                                                     ------------
   (BeginSession (ExtendedBy FtpCommand) "USER"      -- B.2.5
       (Observer                                     -- B.3.7
           (AtTime (Time "08:59:49.1 PDT"))          -- B.3.2
           (HostName "snoopmachine.machine.com")     -- B.5.4
           (ObservationSourceType "Packet")          -- B.5.1
       )
       (Initiator                                    -- B.3.1
           (ReferTo "the-client")                    -- B.5.1
       )
       (From
           (ReferAs "the-client")                    -- B.5.1
           (HostName "client.machine.com")           -- B.5.5
           (EthernetAddress 0:aa:bb:cc:dd:ee)        -- B.5.5.1
           (IPv4Address 999.998.997.996)             -- B.5.5.2
           (TCPPort 12406)                           -- B.5.5.3
       )
       (To                                           -- B.3.3
           (EthernetAddress 0:01:02:03:04:05)        -- B.5.5.1
           (IPv4Address 111.121.131.141)             -- B.5.5.2
           (HostName "server.machine.com")           -- B.5.5
           (TCPPort 21)                              -- B.5.5.3
       )
       (Operand                                      -- B.3.1
           (UserName                                 -- B.5.4
               (ExtendedBy UnixUserName)             -- B.5.9.6
               "anonymous")
       )
       (Using                                        -- B.3.1
           (FTPCommand "USER")                       -- B.5.9.5
       )
       (Outcome                                      -- B.3.6
           (ReturnCode                               -- B.5.1
               (ExtendedBy CIDFReturnCode)           -- B.5.1
               pending)                              -- B.5.1
       )
   )

========================================================================
= 2.6 Rules and Guidelines for Defining SIDs
========================================================================

Other specifications MAY define SIDs for use with the CIDF framework. If a CIDF component generates or uses those SIDs, those SIDs MUST be defined in conformance with the rules here and SHOULD be defined in conformance with the guidelines here.

o Every SID MUST have a unique name.
o Every SID's definition MUST include precise syntax.
o Every SID's definition SHOULD include precise semantics.
o The SID description must fully explain the intended use of the SID (i.e., the intended data arguments must be described).

# Editor's note: The Event Subgroup is investigating naming
# conventions and rules for SID enumeration to eliminate the
# potential for SID reuse.
Specifiers SHOULD avoid defining a SID whose meaning overlaps that of
another SID, unless one SID is strictly more specific than the other
(that is, unless the first one provides all the information that the
second one provides, and more).

A SID MUST be defined so that, when the SID heads an S-expression, the
truth of that S-expression is independent of its peer S-expressions,
the containing S-expression's peers, the peers of the container of the
containing S-expression, and so on. Thus, an S-expression cannot
*modify* the meaning of a peer S-expression; it can only augment the
peer S-expression. (The logical relationship between peer
S-expressions is conjunction.) This is critical because a consumer may
ignore some peer S-expressions.

Specifiers should be wary when defining a set of closely related SIDs,
since a consumer may understand some of the SIDs and not others. If two
data items can be properly understood together but cannot be properly
understood singly, then it is advisable to define a single SID that
takes both data items as arguments.

========================================================================
= 2.7: Example CIDF Module GIDO Sets
========================================================================

This section enumerates example sets of internal status messages that
each CIDF-compliant E-box, A-box, and R-box may choose to support.
These message sets are not mandatory, but are recommended as a
consistent way of conveying internal module information.

#####################################################################
# Editor's Comment: Recommendations for R-box message sets are
# forthcoming.
#####################################################################

2.7.1 Recommended E-Box Message Set

E-boxes can employ the following messages for basic internal
information transfer to consumers.
These messages are all formatted using pre-defined constant payload
expressions (see Section 2.4.3.3, Format1), and contain E-box internal
operation information. (See Appendix A for the SID-to-data-type
listing, and Section 3.2.3 for the list of Class ID codes.)

   Message ID:   EB-Owner
   Description:  Returns the hostname of the machine where the E-box is
                 running, the machine's IP address, the port number
                 assigned to the E-box (-1 if not applicable), the
                 E-box process ID, the identification of the E-box
                 developer, and the revision number of the E-box.
   Priority:     5
   Msg. Format:  (def EB-Owner (struct HostName IP_Address Port PID
                    DeveloperID RevisionNo))

   Message ID:   EB-Target
   Description:  Returns the hostname of the monitoring target, the IP
                 address of the target, the port number assigned to the
                 target if it is a network service, the process ID of
                 the target, and an identifier indicating the type of
                 event stream through which the target is being
                 monitored.
   Priority:     5
   Msg. Format:  (def EB-Target (struct HostName IP_Address Port PID
                    EventStreamID))

   Message ID:   EB-Status
   Description:  Returns a timestamp indicating uptime for the E-box,
                 the method used to transfer messages to the consumer
                 (synchronous polling, asynchronous forwarding, trap,
                 other), events parsed per second, bytes parsed per
                 second, records sent since start-up, bytes sent since
                 start-up, and internal E-box errors produced since
                 start-up.
   Priority:     5
   Msg. Format:  (def EB-Status (struct UpTime ReportMethod RecsPerSec
                    BytesPerSec SentRecsCnt SentByteCnt ErrorCount))

   Message ID:   EB-Transport
   Description:  Returns an identifier for the current transport
                 mechanism being used, the revision number of the
                 transport software, and the list of available
                 transport mechanisms for this E-box.
   Priority:     5
   Msg. Format:  (def EB-Transport (struct CurrentTrans RevisionNo
                    AvailableTrans))

   Message ID:   EB-Error
   Description:  Returns an internal error code produced by the E-box,
                 a textual description of the error, and a severity
                 code (e.g., fatal, non-fatal, potential data loss).
   Priority:     3
   Msg. Format:  (def EB-Error (struct EB-ErrorCode ErrorDesc Severity))

   Message ID:   EB-Warning
   Description:  Returns an internal warning code produced by the E-box
                 and a textual description of the warning.
   Priority:     4
   Msg. Format:  (def EB-Warning (struct EB-WarnCode WarnDesc))

   Message ID:   EB-FilterStatus
   Description:  Returns the current filter array that identifies which
                 of the available events the E-box is currently
                 generating and returning to the consumer.
   Priority:     5
   Msg. Format:  (def EB-FilterStatus CurrentFilterArray)

2.7.2 Recommended A-Box Message Set

A-boxes can employ the following messages for basic internal
information transfer to their consumers. These messages are all
formatted using pre-defined constant payload expressions (see Section
2.4.3.3, Format1), and contain A-box internal operation information.

   Message ID:   AB-Owner
   Description:  Returns the hostname of the machine where the A-box is
                 running, the machine's IP address, the port number
                 assigned to the A-box (-1 if not applicable), the
                 A-box process ID, the identification of the A-box
                 developer, and the revision number of the A-box.
   Priority:     5
   Msg. Format:  (def AB-Owner (struct HostName IP_Address Port PID
                    DeveloperID RevisionNo))

   Message ID:   AB-Target
   Description:  Returns the hostname of the analysis target, the IP
                 address of the target, the port number assigned to the
                 target if it is a network service, the process ID of
                 the target, and the module identity of the E-box
                 through which the target's operational activity is
                 being monitored.
   Priority:     5
   Msg. Format:  (def AB-Target (struct HostName IP_Address Port PID
                    ModuleIdentity))

   (Question: ModuleIdentity assumes a single E-to-A relationship.
   Need to handle multi-E-box analyses?)

   Message ID:   AB-Status
   Description:  Returns a timestamp indicating uptime for the A-box,
                 the method used to transfer messages to the consumer
                 (synchronous polling, asynchronous forwarding, trap,
                 other), event records parsed per second, bytes parsed
                 per second, reports sent since start-up, bytes sent
                 since start-up, and internal A-box errors produced
                 since start-up.
   Priority:     5
   Msg. Format:  (def AB-Status (struct UpTime ReportMethod RecsPerSec
                    BytesPerSec SentRecsCnt SentByteCnt ErrorCount))

   Message ID:   AB-Transport
   Description:  Returns an identifier for the current transport being
                 used, the revision number of the transport software,
                 and the list of available transport mechanisms for
                 this A-box.
   Priority:     5
   Msg. Format:  (def AB-Transport (struct CurrentTrans RevisionNo
                    AvailableTrans))

   Message ID:   AB-Error
   Description:  Returns an internal error code produced by the A-box,
                 a textual description of the error, and a severity
                 code.
   Priority:     3
   Msg. Format:  (def AB-Error (struct AB-ErrorCode ErrorDesc Severity))

   Message ID:   AB-Warning
   Description:  Returns an internal warning code produced by the A-box
                 and a textual description of the warning.
   Priority:     4
   Msg. Format:  (def AB-Warning (struct AB-WarnCode WarnDesc))

   Message ID:   AB-FilterStatus
   Description:  Returns the current filter array that identifies which
                 of the available analysis reports the A-box is
                 currently building and returning to the consumer.
   Priority:     5
   Msg. Format:  (def AB-FilterStatus CurrentReportingArray)

========================================================================
= 2.8 Negotiation
========================================================================

We would like to enable CIDF to support adaptations in ID systems.
For example:

   - adding or removing components (i.e., E-, A-, R-, or D-boxes) on
     the fly,
   - adding new capabilities to components via software modifications
     or by adding new data such as signatures,
   - responding to specific situations such as identification of a
     possible threat.

We therefore want components to be able to change, on the fly, the
information they exchange via CIDF without prearrangement. For
example, a new E-box brought into a system could broadcast its
capabilities, and A-boxes could then request that the E-box start
sending them some subset of the newly available data.

The goal stated above is a research problem and is not amenable to a
near-term solution. However, there are some specific objectives that
we feel a dynamic negotiation protocol could accomplish in the near
term and that would begin to address the more general problem. These
are:

   1. Identify the parties to participate in communication.
   2. Specify and distribute the packages to be used in communication.
   3. Specify and distribute filters to be used by producer(s).

These functions can be implemented as a preamble to normal CIDF
communication. In more detail:

1. Identify the parties to participate in communication.

   We would like to begin to address the question of how ID components
   locate other components to communicate with. This could be done by
   prearrangement outside of CIDF, but we would like CIDF to address
   this problem. A component can broadcast a message providing its
   identity, how it can be contacted, and a description of its
   capabilities, and then await replies from other components. This
   addresses both bringing new components on board and components
   restarting after being offline.

2. Specify and distribute the packages to be used in communication.

   Packages are collections of SID definitions. Typically, an ID
   component deals in SIDs from a "small" number of packages. The
   packages to be used must be known to all producers and consumers of
   a collection of gidos.
   We want to specify a means by which the parties to communication
   agree on the packages to be used and can obtain the necessary
   packages if they do not already have them.

3. Specify and distribute filters to be used by producer(s).

   Filters are agents that run on behalf of a consuming component in a
   producing component and change a gido into a different form. They
   make communication more efficient by limiting the data sent to that
   which the consumer can actually use. We want to provide a means by
   which a consumer can specify, or actually send, a filter to a
   producer.

################################################################
# We would like to prioritize these three functions and then
# proceed to specify a protocol to address each of them in order
# of priority. Comments and suggestions for priority are requested.
################################################################

========================================================================
========================================================================
=
= 3: Encoding Gidos
=
========================================================================
========================================================================

========================================================================
= 3.1: Introduction to Gido Encoding
========================================================================

Encoding a gido into actual bytes for storage, transmission, etc.
involves two things. First, every gido is accompanied (in perpetuity)
by a static format header that contains basic information about that
gido. This header format is described in Section 3.2. Second, the
S-expression that forms the payload of the gido must also be encoded.
The method for doing this is covered in Section 3.3.

========================================================================
= 3.2: Gido Header
========================================================================

3.2.1: Introduction

The header definition presented in this section consists of a series
of constant fields that gido consumers can reliably parse to read basic
data common to all gidos. The gido S-expression payload, presented in
a preceding section, contains the actual IDS component-specific data
structures, including semantic identifiers that allow gido consumers
to decode and interpret individual fields. The gido header is used to
convey information about the gido itself, rather than details of the
event, analysis report, or response prescription (which are captured
in the payload).

Each CIDF-compliant gido generated by any component MUST contain these
fields in this order (for this version). Consult Appendix A for details
on type definitions.

3.2.2: The Header Fields

   1. Version ID (type revision).
      Indicates the format revision used to encode this gido.
      Initially, the Version ID will indicate CIDF Version 1.0
      (major = 1, minor = 0). This Version ID will be incremented as
      future versions are introduced. All current and future versions
      of this specification must reserve the first field of the gido
      header for the Version ID. Gido consumers may reliably use this
      field to detect the format of the remainder of the gido.

   #####################################################################
   # Editor's Comment: This field suggests that CIDF revision
   # identifiers will follow a major.minor format. The CIDF working
   # group must decide if this is the proper revision format, and must
   # then define the meaning of major and minor revision indicators.
   #####################################################################

   2. Gido Length (4 octets, big-endian).
      Indicates the byte length of the entire gido, including this
      header but excluding any optional digital signature. This field
      may be used to cross-check gido completeness.

   3. Time Stamp (4 octets, big-endian).
      Indicates the number of seconds since the Unix epoch (January 1,
      1970). This time refers to the moment that this report or request
      was generated. Specifically, it does not refer to the time that
      any events were first detected, or when they occurred; these (if
      they are known) are to be placed in the message payload itself.

   4. Thread ID (4 octets).
      Used to identify gidos with some common thread; all gidos about a
      given event (e.g., a first report followed by successive updates)
      would share the same Thread ID.

   5. Class ID (2 octets).
      Indicates the category that the event, analysis, or response
      generator believes the gido falls under. Class IDs are defined in
      Section 3.2.3. This field is intended to allow receivers to
      process high-priority gidos in a given field of expertise before
      all others. Note that some codes are reserved for user-defined
      Class IDs; the receiver must check whether prior agreement exists
      between sender and receiver on these codes.

   6. Originator ID (unknown type).
      A unique identifier associated with the component generating this
      gido.

   #####################################################################
   # Editor's Comment: The format and semantics of the Originator ID
   # are an open issue that requires resolution by the CIDF working
   # group. Specifically, how will CIDF modules be uniquely
   # distinguished from other CIDF modules?
   #####################################################################

   7. Flags (1 octet).
      The bits of this flags octet are to be interpreted according to
      the following table:

         Bit         Meaning
         ---         -------
         0 (LSB)     set   = optional signature present (see below)
                     clear = no optional signature
         1-7 (MSB)   reserved

         (LSB = least significant bit, MSB = most significant bit)

The gido payload, plain or compressed, immediately follows the header.
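To make the field order above concrete, the following Python sketch packs and unpacks a gido header with the standard `struct` module. It is a sketch under stated assumptions, not a normative layout: this section leaves the size of the Version ID (type revision) to Appendix A and the Originator ID to an open issue, so the code assumes two octets (major, minor) for the Version ID and an opaque 4-octet Originator ID, and treats every multi-octet field as big-endian. The names `GidoHeader` and `build_gido` are hypothetical.

```python
import struct
from typing import NamedTuple

# Hypothetical wire layout, in the field order of Section 3.2.2.
# ASSUMPTIONS (not fixed by this section):
#   - Version ID ("type revision") is two octets: major, minor.
#   - Originator ID (type still open) is an opaque 4-octet value.
#   - Thread ID, Class ID and Originator ID are big-endian like the rest.
# Fields: major, minor, length, time stamp, thread, class, originator, flags
HEADER_FMT = ">BBIIIHIB"
HEADER_LEN = struct.calcsize(HEADER_FMT)   # 21 octets under these assumptions

FLAG_SIGNATURE = 0x01                      # bit 0 (LSB): signature present


class GidoHeader(NamedTuple):
    major: int
    minor: int
    gido_length: int      # whole gido incl. header, excl. signature
    timestamp: int        # seconds since the Unix epoch
    thread_id: int
    class_id: int
    originator_id: int
    flags: int

    def pack(self) -> bytes:
        return struct.pack(HEADER_FMT, *self)

    @classmethod
    def unpack(cls, data: bytes) -> "GidoHeader":
        return cls(*struct.unpack_from(HEADER_FMT, data))

    @property
    def has_signature(self) -> bool:
        return bool(self.flags & FLAG_SIGNATURE)


def build_gido(payload: bytes, timestamp: int, thread_id: int,
               class_id: int, originator_id: int,
               signed: bool = False) -> bytes:
    """Prefix an already-encoded payload with a version 1.0 header.

    Only the flag is set here; the signature structure itself is
    not produced by this sketch.
    """
    length = HEADER_LEN + len(payload)
    flags = FLAG_SIGNATURE if signed else 0
    hdr = GidoHeader(1, 0, length, timestamp, thread_id, class_id,
                     originator_id, flags)
    return hdr.pack() + payload
```

With these assumptions the header is 21 octets, so a 7-octet payload yields a Gido Length of 28; checking `has_signature` tells a receiver whether a signature structure follows the payload.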
If bit 0 in Flags is clear, indicating no optional signature, the gido
ends with the payload (as indicated by the Gido Length header field).
Otherwise, if bit 0 is set, indicating that a digital signature of the
content is present, the signature is contained in a structure following
the gido payload. Recall that the Gido Length header field indicates
the end of the gido payload, not including the signature structure.
The signature structure has the following fields:

   1. Signature Length (2 octets).
      Indicates the length of the signature structure in octets,
      including this Signature Length field.

   2. Key ID (type unknown).
      Uniquely identifies the key used to generate the signature. If
      the gido is sent one-to-one, this ID may be meaningful only to
      the intended receiver. This field also implies the signature
      algorithm.

   #####################################################################
   # Editor's Comment: This issue is tied up with that of the
   # Originator ID.
   #####################################################################

   3. Signature data.
      The entire gido represented by the Gido Length header field is
      passed through a gido digest, resulting in a short, fixed-length
      quantity. This quantity is then signed using the applicable
      encryption/signature algorithm, and the result of this operation
      is placed in this field.

3.2.3 Class ID Codes

The following default Class ID codes are defined for events and
analysis results. Under this scheme, Class IDs 0 through 15 are
reserved for CIDF event priorities, and 16 through 31 are reserved for
analysis report priorities. In addition, Class IDs 32 through 127 are
reserved for future CIDF extensions. IDS developers may use the
remaining range (128 through 255) for application-specific purposes.
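A receiver might first sort an incoming Class ID into one of the four ranges described above before consulting the detailed table. The Python sketch below does that. The names are hypothetical, and rejecting values above 255 (the field is 2 octets wide, but only 0 through 255 are assigned a meaning here) is a design choice of the sketch rather than a requirement of this section.

```python
from enum import Enum


class ClassRange(Enum):
    EVENT = "CIDF event priority"               # 0-15
    ANALYSIS = "analysis report priority"       # 16-31
    RESERVED = "reserved for future CIDF use"   # 32-127
    APPLICATION = "application-specific"        # 128-255


def class_range(class_id: int) -> ClassRange:
    """Map a Class ID to its range under the Section 3.2.3 scheme."""
    if not 0 <= class_id <= 255:
        # Sketch's choice: values outside 0-255 have no defined meaning.
        raise ValueError(f"Class ID {class_id} is outside 0-255")
    if class_id <= 15:
        return ClassRange.EVENT
    if class_id <= 31:
        return ClassRange.ANALYSIS
    if class_id <= 127:
        return ClassRange.RESERVED
    return ClassRange.APPLICATION


def needs_prior_agreement(class_id: int) -> bool:
    """Application-specific codes only mean something by prior agreement."""
    return class_range(class_id) is ClassRange.APPLICATION
```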
   (Default Event Class IDs)
      00 - Complete Event
      01 - Intermediate Event
      02 - Incomplete Event
      03 - E-box Internal Error Report
      04 - E-box Internal Warning Report
      05 - E-box Internal Status Message
      06 - Reserved for E
      07 - Reserved for E
       :
      15 - Reserved for E

   (Default Analysis Class IDs)
      16 - Critical Security Violation
      17 - Potential Security Violation
      18 - Suspicious Report
      19 - Warning Report
      20 - Intermediate Result
      21 - Informational Report
      22 - A-box Internal Error Report
      23 - A-box Internal Warning Report
      24 - Reserved for A
      25 - Reserved for A
       :
      31 - Reserved for A

   (Reserved Priority Code Range)

   #####################################################################
   # Editor's Comment: Class ID code range 32-48 is reserved for
   # R-Box countermeasure directives.
   #####################################################################

      32 - Reserved for future use
      33 - Reserved for future use
       :
      127 - Reserved for future use

   (Undefined Priority Codes)
      128 - Undefined
       :
       :    (Undefined values may be employed for
       :     application-specific purposes.)
      255 - Undefined

========================================================================
= 3.3: Encoding S-Expressions
========================================================================

GIDO payloads consist of S-expressions. However, these S-expressions
are translated to an octet encoding for efficient transmission or
storage. This section describes how to transform an S-expression into
the appropriate octet encoding. This encoding is designed to meet the
following objectives:

   * It must indicate the structure, so that a component ignorant of
     the elements within the S-expressions will still be able to parse
     the S-expressions.
   * It must allow for pre-defined and distributed-out-of-band SIDs.
   * It should be as compact as possible.
3.3.1: Octet Codes

The following codes will be used to represent various octet values in
the succeeding encoding specifications. They are *not* S-expression
atoms.

   Code     Value    Interpretation
   ----     -----    --------------
   SEP      0xff     Used as separator.
   SOPEN    0xfe     S-expression open.
   PTR      0xfd     Pointer (referred to as @).
   SID      0xfc     Prelude to SID 2-octet code.
   TYPE     0xfb     Indicates concrete syntax type.

3.3.2: Encoding of S-Expression Grammar

What follows is the grammar for CIDF S-expressions. After each line we
give the encoding applicable to that line.

   ::=
      E() = E()

   ::= ( )
      E() = E()

   ::=
      E() = E() E()

   ::= ( )
      E() = SOPEN E(length{E() E()}) E() E()
      E(length{X}) = var_encode(X)

   ::= ( @ )
      E() = SOPEN PTR E() E()
      E() = ascii_encode()

   ::= ( def )
      E() = SOPEN E(length{E(def) E() E() E() E(def) E() E() E()
      E() = SID sid_encode()
      E() = ascii_encode()

   ::=
      E() = sid_encode()

   ::= ':
      E() = TYPE type_encode() sid_encode()

   ::= ( )
      E() = SOPEN E(length{E()}) E()

   ::=
      E() = E()

   ::=
      E() = E() E()

   ::=
      E() = E()

   ::=
      E() = E() E()

   ::=
      E() = E()

   ::= ( )
      E() = SOPEN E(length{E() E()}) E() E()

3.3.3: Auxiliary Functions

The following functions are used in the above syntax and encoding:

   ascii_encode()   returns the ASCII encoding of its argument.

   short_encode()   returns the big-endian representation of its
                    argument. (E.g., short_encode(1234) = 0x04d2.)

   sid_encode()     returns the 2-octet code for the given SID.

   type_encode()    returns the SEP-terminated code for the given type.

   var_encode()     encodes an arbitrarily long integer, as follows:

                       L1 | value

                    where L1 is one octet containing the length, in
                    octets, of the value, and the value itself follows
                    in big-endian order.

3.3.4: Encoding Data

Data may be encoded in one of two ways. If the applicable SID has a
fixed-length data type, then the data is encoded exactly as specified
by the type; e.g., a ulong is encoded as four octets in big-endian
order.
Otherwise, the data is encoded as follows:

   var_encode(length{Data}) | Data

Thus, if Data is a variable-length data structure that is 84,000 bytes
long (84,000 = 0x014820), then it is encoded as follows:

   03 01 48 20 xx xx xx ...

3.3.5: SID Codes

SIDs are ordinarily encoded as 2-octet values. A list of pre-defined
SIDs is given in Appendix B; if one exists for the purpose, it SHOULD
be used. However, this encoding also makes it possible to define new
SIDs, should no applicable one exist, using the "def" operative. For
the purposes of encoding, "def" is treated as a SID as well (i.e., it
has its own 2-octet code).

As noted in Section 3.3.2, this requires one to define a new SID code.
The choice of these SID codes is not strictly restricted, but the codes
should conform to the following standard:

   * The code is a 2-octet value, as stated above.
   * The MSB (bit 7) of the first octet is the DYNAMIC bit. If this bit
     is set, this is a dynamically-defined SID, and the code for the
     actual SID is given by bit 5 of the first octet through the LSB
     (bit 0) of the second octet. If it is clear, this is a
     statically-defined SID, and the code for the SID is as given in
     the appendix.
   * If the DYNAMIC bit is set, the 2-octet value is followed by a
     4-octet value representing the UUID of the SID designer. Also,
     the next bit (bit 6 of the first octet) is the EXPERIMENTAL bit.
     If *this* bit is set, then the SID is ephemeral and cannot be
     relied on in future encodings. If it is clear, then this is a
     stable SID.
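The following Python sketch illustrates var_encode and the length-prefixed data rule from Sections 3.3.3 and 3.3.4, together with the SID code bit layout from Section 3.3.5. It assumes non-negative integers, encodes zero as a single 0x00 octet (a case the text does not cover), and refuses integers longer than 255 octets because L1 is a single octet. `encode_variable_data` and `decode_sid_code` are hypothetical helper names.

```python
DYNAMIC_BIT = 0x80        # bit 7 of the first SID octet
EXPERIMENTAL_BIT = 0x40   # bit 6 of the first SID octet (dynamic SIDs only)


def var_encode(n: int) -> bytes:
    """One length octet L1, then n in big-endian order using L1 octets."""
    if n < 0:
        raise ValueError("sketch only handles non-negative integers")
    # Assumption: zero is encoded with one octet (01 00).
    body = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    if len(body) > 255:
        raise ValueError("integer too long for a one-octet length prefix")
    return bytes([len(body)]) + body


def encode_variable_data(data: bytes) -> bytes:
    """Variable-length data: var_encode(length{Data}) | Data."""
    return var_encode(len(data)) + data


def decode_sid_code(code: bytes) -> dict:
    """Split a 2-octet SID code into the fields of Section 3.3.5."""
    if len(code) != 2:
        raise ValueError("SID codes are 2 octets")
    first, second = code
    if not first & DYNAMIC_BIT:
        # Static SID: the 2-octet value is the Appendix B code.
        return {"dynamic": False, "code": (first << 8) | second}
    return {
        "dynamic": True,
        "experimental": bool(first & EXPERIMENTAL_BIT),
        # Bit 5 of the first octet through bit 0 of the second: 14 bits.
        "code": ((first & 0x3F) << 8) | second,
    }
```

For example, `var_encode(84000)` yields the octets 03 01 48 20 shown above. For a dynamic SID, the 4-octet designer UUID that follows the 2-octet code is left for the caller to read.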
========================================================================
========================================================================
=
= 4: CIDF Communication
=
========================================================================
========================================================================

========================================================================
= 4.1: Message Layer
========================================================================

4.1.1: Rationale for Message Layer

The CIDF message layer was developed to solve problems of
synchronization (i.e., blocking vs. non-blocking processes) and
problems of different data formats on different operating systems. It
also addresses the fact that different groups will use different
programming languages. In other words, the use of a messaging format
achieves the following goals:

   * Independence from blocking/non-blocking processes
   * Data format independence
   * Operating system independence
   * Programming language independence

4.1.2: Objectives of the CIDF Message Layer

The top-level objectives for the CIDF message layer are to:

   * Provide an open architecture.
   * Avoid imposing architectural constraints or assumptions on the
     systems or modules.
   * Allow messaging independent of language, operating system, and
     network protocol.
   * Support easy addition of new components to the CIDF.
   * Support security requirements for authentication and privacy.
   * Support devices that don't want to fully support CIDF.

4.1.3: Message Format

This message structure resides on top of the negotiated transport layer
service. Note that all reserved fields are set to 0 on transmission and
ignored on receipt.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    | Control Byte  |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Next Header  |                   Reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Length                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Time Stamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Destination Address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Options (variable)                       |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Payload Data (variable)                    |
   ~                                                               ~
   |                                                               |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |          Privacy Trailer* (variable)          |
   +-+-+-+-+-+-+-+-+                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * if the privacy option is used

Options all have a common type-length-value format, described below.

   * Version - 1 octet. CIDF message-layer version (1 for this initial
     version).

   * Control Byte - 1 octet. Used by the message layer to support
     reliable transmission, flow control, and security association
     management. The following values are defined:
        - Acknowledgement of a delivered message (1).
        - Message received, but not delivered because of lack of
          resources (2).
        - Message received, but the supplied security association was
          not available to allow processing (4).

   * Checksum - 2 octets. A checksum across the entire CIDF message,
     computed prior to application of cryptographic mechanisms (i.e.,
     privacy and authentication transforms). The checksum is computed
     as specified in the TCP standard (RFC 793).

   * Next Header - 1 octet. Defines the type of either the next message
     layer option or the application. The following are the currently
     defined types:
        - Application Header (1)
        - Route List (4)
        - Privacy Header (50)
        - Authentication Header (51)

   * Length - 4 octets. Length of the CIDF message, including the
     message header.

   * Sequence Number - 4 octets. Message layer sequence number used for
     message reliability (acknowledgement and duplicate removal) and to
     support protection against message replay.

   * Time Stamp - 4 octets. Used to provide loose time synchronization
     between communicating CIDF parties and to support detection of
     tardy delivery (e.g., resulting from denial of service).

   * Destination Address - 4 octets. IP address of the target of this
     message. This field identifies the eventual recipient of the CIDF
     message and is used to route CIDF messages through intermediate
     CIDF nodes that cannot be traversed by normal network routing
     (e.g., firewalls).

4.1.4: Message Layer Protocol Options

Except for the CIDF privacy option, CIDF message options use the
following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Next Header  |    Length     |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                    Option Data (variable)                     |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * Next Header - 1 octet. Defines the type of either the next message
     layer option or the application, with the same permitted values as
     defined above.

   * Length - 1 octet. Specifies the number of 32-bit words in this
     option, including the Next Header and Length fields.

   * Option Data - variable length. The option data field is always
     padded to a 32-bit aligned size.

4.1.4.1: Route List Option

Route List is a variable-length field that specifies the CIDF nodes
through which the message is to be routed for source routing, and
through which the message has been routed for recorded routing. The
Subtype field indicates whether this is a source route or a recorded
route. The Route List has the following format.
The Route List option is used when the message destination and source
are separated by CIDF nodes that cannot be traversed by normal network
routing (e.g., firewalls).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Next Header  |    Length     |    Subtype    |     Index     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Route Data (variable)                     |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * Next Header and Length are defined above.

   * Subtype - 1 octet. Specifies whether this is a recorded route or a
     source route.
        - Recorded Route (1)
        - Source Route (2)

   * Index - 1 octet. Index into the array of addresses, specifying the
     current address to be processed. For source routing, this is the
     address of the next CIDF hop. For recorded routes, this is the
     address of the last transmitting CIDF node.

   * Route Data - variable length. This field is an array of Internet
     addresses, each 32 bits (4 octets) long. For a source route, if
     the index is greater than the length, the source route is empty
     and routing is to be based on the Destination Address field. For a
     recorded route, if the index is greater than the length, the
     recorded route list is full.

4.1.4.2: Privacy Option

The CIDF privacy option supports both unicast and multicast privacy.
For multicast privacy, one node of the multicast group is selected to
generate the keys, which are then distributed to each multicast group
member. For unicast privacy, each node generates its own privacy keys,
which are distributed to the remote party.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Key Generator Identity                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Security Parameters Index (SPI)                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Payload Data* (variable)                    |
   ~                                                               ~
   |                                                               |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |         Padding (0-255 bytes)                 |
   +-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |  Pad Length   |  Next Header  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * If the cryptographic algorithm requires an initialization vector,
     that vector is placed in clear text between the SPI and the
     Payload Data.

   * Key Generator Identity - 4 octets. This value identifies the CIDF
     entity that generated the key. Initially, this field specifies
     either the key generator's IP address or, for multicast
     applications, the multicast address of the multicast group using
     this security association.

   * Security Parameters Index (SPI) - 4 octets. The SPI is an
     arbitrary 32-bit value that uniquely identifies the Security
     Association for this message, relative to the key generator
     identity.

   * Padding - variable length. The transmitter may add up to 255 bytes
     of padding if required to support the block size of the
     cryptographic algorithm. Padding is also required to ensure that,
     after the privacy option is applied, the message ends on a 4-byte
     boundary.

   * Pad Length - 1 octet. The number of padding bytes immediately
     preceding it. The range of valid values is 0-255, where a value of
     zero indicates that no Padding bytes are present.

   * Next Header is defined above.
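Two pieces of alignment arithmetic appear in the option formats above: the 1-octet Length of a generic option counts 32-bit words including its 2-octet prefix, and the privacy option's Padding must bring the message to a 4-byte boundary and, where needed, to the cipher's block size. The Python sketch below assumes, in the style of IPsec ESP, that the block-size constraint applies to Payload Data, Padding, Pad Length and Next Header taken together, that everything before the Payload Data is already 32-bit aligned, and that padding octets are zero; none of these details is fixed by this section, and the function names are hypothetical.

```python
from math import gcd


def option_length_words(option_data_len: int) -> int:
    """Length field of a generic option: 32-bit words, counting the
    Next Header and Length octets, after padding to a word boundary."""
    words = (2 + option_data_len + 3) // 4
    if words > 255:
        raise ValueError("option too long for a 1-octet Length field")
    return words


def build_option(next_header: int, option_data: bytes) -> bytes:
    words = option_length_words(len(option_data))
    body = bytes([next_header, words]) + option_data
    return body + bytes(words * 4 - len(body))   # zero padding (assumed)


def privacy_pad_length(payload_len: int, block_size: int = 1) -> int:
    """Padding octets so that Payload | Padding | Pad Length | Next Header
    is a multiple of both 4 and the cipher block size (ESP-style guess)."""
    align = 4 * block_size // gcd(4, block_size)   # lcm(4, block_size)
    pad = -(payload_len + 2) % align
    if pad > 255:
        raise ValueError("required padding exceeds 255 octets")
    return pad


def privacy_trailer(payload_len: int, next_header: int,
                    block_size: int = 1) -> bytes:
    """Padding (zeros, assumed), then Pad Length, then Next Header."""
    pad = privacy_pad_length(payload_len, block_size)
    return bytes(pad) + bytes([pad, next_header])
```

For instance, an 11-octet payload needs 3 padding octets, since 11 + 3 + 2 = 16 is a multiple of 4 (and of an 8-octet block).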
4.1.4.3: Authentication Header Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Next Header  |    Length     |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Key Generator Identity                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Security Parameters Index (SPI)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Authentication Data (variable)                 |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

* Next Header and Length are defined above.

* Key Generator Identity - 4 octets. Identifies the CIDF entity that generated the key. Initially, this field holds either the key generator's IP address or, for multicast applications, the multicast address of the multicast group using this security association.

* Security Parameters Index (SPI) - 4 octets. An arbitrary 32-bit value that uniquely identifies the Security Association for this message, relative to the Key Generator Identity.

* Authentication Data - variable number of 32-bit words. The data (e.g., a digital signature or keyed hash) used to provide cryptographic authentication.

4.1.5: Cryptographic Mechanisms

The CIDF message layer protocol provides data integrity and source authentication services for the negotiation phase of CIDF communication. This enables components to establish communication reliably with minimal security overhead. During the negotiation phase, the client and server determine the specific cryptographic services to be provided for further communication. The message layer provides the cryptographic mechanisms as options, so that lower-level services (e.g., IPSEC), CIDF-specific mechanisms, or no cryptographic services can be used, depending on application requirements.
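The Authentication Header Option of 4.1.4.3 is one such CIDF-specific mechanism, and the Python sketch below assembles and checks one. It is illustrative only: the keyed hash (HMAC-SHA256), the reading of Length as the size of the whole option in 32-bit words, network byte order, and the helper names are all assumptions, since this section leaves algorithm choice to key distribution and defines Length elsewhere. Which bytes are covered (`protected`) is likewise left to the caller.

```python
import hashlib
import hmac
import ipaddress
import struct

HEADER_WORDS = 3  # Next Header/Length/Reserved, Key Generator Identity, SPI


def build_auth_option(next_header: int, key_generator: str, spi: int,
                      key: bytes, protected: bytes) -> bytes:
    """Assemble an Authentication Header Option (hypothetical helper).

    Assumptions: HMAC-SHA256 over `protected` as the keyed hash, and a Length
    field counting the whole option in 32-bit words.
    """
    auth_data = hmac.new(key, protected, hashlib.sha256).digest()  # 32 bytes = 8 words
    length_words = HEADER_WORDS + len(auth_data) // 4
    return (struct.pack("!BBH", next_header, length_words, 0)  # Reserved = 0
            + ipaddress.IPv4Address(key_generator).packed       # 4 octets
            + struct.pack("!I", spi)                             # 32-bit SPI
            + auth_data)


def verify_auth_option(option: bytes, key: bytes, protected: bytes) -> bool:
    """Check the option's length, then recompute and compare the keyed hash."""
    if len(option) < 4 * HEADER_WORDS:
        return False
    _next_header, length_words, _reserved = struct.unpack("!BBH", option[:4])
    if length_words * 4 != len(option):
        return False
    expected = hmac.new(key, protected, hashlib.sha256).digest()
    return hmac.compare_digest(option[4 * HEADER_WORDS:], expected)
```

`hmac.compare_digest` is used for the final comparison so that the check does not leak timing information about how many bytes matched.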
The mechanisms used are determined by the client, based on the mechanisms supported by the server. The message layer mechanisms provide the fields necessary to (1) determine which cryptographic services (if any) have been applied, (2) determine the cryptographic context, and (3) provide timeliness and replay protection.

4.1.6: Negotiation Mechanism

4.1.6.1: Introduction

Our approach is to use the simplest reliable transport mechanism available (i.e., reliable CIDF messaging over UDP) as the default CIDF transport protocol. This simple protocol can then be used to negotiate a more or less complex protocol for components that require additional transport-layer services. Simple devices can thus participate easily, while complex devices can take full advantage of other transport-layer mechanisms.

The message layer provides optional services to compensate for weaknesses in the transport layer. Combining the CIDF message layer with transport-layer options yields a range of communication capabilities that can support different application requirements. The following types of transport/messaging are initially envisioned:

* No assured delivery over a connectionless transport: the CIDF message layer, without acknowledgement and retransmission, directly over UDP.
* Assured delivery over a connectionless transport: the CIDF message layer with reliable delivery (acknowledgement, retransmission, and duplicate removal) over UDP.
* Assured delivery over a connection-oriented transport: the CIDF message layer directly over TCP.
* Object-oriented transport: CIDF operations over CORBA.

To support components that must use minimal communication infrastructure, the default transport mechanism is based on UDP. The following sections define the default transport layer protocol, CIDF security services, and the transport negotiation mechanisms.
4.1.6.1.1: Rationale for negotiated transport layer

The simplest approach would be to mandate a single transport protocol, but no one protocol can adapt to the varying requirements of all anticipated CIDF applications. Depending on whether an application is concerned with real-time traffic or simply with accumulating a database of events, different transport mechanisms are appropriate. Specifically, some CIDF applications require a very lightweight communication channel without the resource usage of current TCP implementations, while others require a flexible and robust channel such as TCP. Other requirements include application-specific support for multicast, which TCP does not provide. We therefore have requirements for connectionless communication, reliable connectionless communication, and reliable connection-oriented communication.

Requirements for security services also vary. In some applications and environments, the infrastructure provides adequate security services; in others, CIDF-layer security services are required for authentication, privacy, or both.

Nevertheless, communication cannot begin between two specific components until a channel is agreed upon. At the very least, this implies that if we don't agree on a single channel for all transport, we need to agree on a single channel for transport negotiation. This channel must be widely supported and freely available. Components may share data on whatever channel they wish, but they must support channel negotiation on the common mechanism.

To support this range of requirements, we provide a protocol, based on the reliable UDP variant of CIDF, that enables applications to agree upon the desired transport protocol plus the desired CIDF message-layer security services.
This exchange is only necessary if the participants have not previously agreed upon a transport mechanism by external means (e.g., local configuration settings or the CIDF directory service).

4.1.6.2: Default Transport Layer

The default transport layer protocol for CIDF messages is reliable CIDF messaging over UDP. Other transport layer protocols may be used once the CIDF client and server have negotiated, over this default transport, the protocols and services that each requires and supports. Until a well-known CIDF port number is assigned, 0x0CDF (decimal 3295) will be used as the CIDF port. The CIDF message layer listens on this port for incoming CIDF messages.

4.1.6.3: Conformant transport options

* CIDF message layer without acknowledgement and retransmission directly over UDP.
* CIDF message layer with acknowledgement and retransmission over UDP.
* CIDF message layer directly over TCP.

4.1.6.4: Option Negotiation Message Formats

Negotiation of more advanced communication services occurs over a UDP channel using only the CIDF message layer, with authentication mechanisms enabled. This allows components that do not support TCP to participate in CIDF. Negotiation begins with the client querying the server's capabilities. In response, the server specifies the class of CIDF operations supported, the message services supported, and whether extensions are supported. The client then selects the services and message mechanisms. This information can also be provided by the directory server.

The CIDF transport negotiation protocol resides directly over the CIDF message layer. The query-response data format is shown below. We assume that, for cryptographic services, negotiation of the specific algorithms and modes is handled by the key distribution mechanism.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Option Request (variable)                   |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

* Type - 1 octet. Specifies the type of request. For option negotiation messages, this value is 1.

* Length - 1 octet. Specifies the number of 32-bit words in this message, including the Type and Length fields.

Option Requests are formatted as follows.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Request    |    Length     |    Option     |   Selection   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Option Parameters (variable)                  |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

* Request - 1 octet. Specifies the type of request. The following request types are currently supported:
    - Want (1) - Preferred service.
    - Can (2) - Sender is capable of using this service.

* Length - 1 octet. Specifies the number of 32-bit words in this option request, including the Request and Length fields.

* Option - 1 octet. The option being negotiated. The following option types are currently supported:
    - Transport (1)
    - Privacy (2)
    - Authentication (3)

* Selection - 1 octet. The option value being negotiated. The meaning of this field depends on the option being negotiated. The following selection values are currently supported.

  For Transport negotiation:
    - None (0). Used to reject communication with another CIDF node when no acceptable options are received.
    - UDP (1)
    - Reliable UDP (2)
    - TCP (3)

  For Privacy negotiation:
    - None (0)
    - IPSEC (1)
    - SSL (2)
    - CIDF (3)

  For Authentication negotiation:
    - None (0)
    - IPSEC (1)
    - SSL (2)
    - CIDF (3)

Currently, the only option parameter specified is the selection of the TCP/UDP port number for transport negotiation, which is formatted as follows.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |     Transport Port Number     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

* Type - 1 octet. Specifies the type of option parameter. For port numbers, this value is 1.

* Length - 1 octet. Specifies the number of 32-bit words in this option parameter, including the Type and Length fields.

* Transport Port Number - 2 octets. Specifies the port number on which the sender of the message will listen once negotiation is complete. Each end of the channel selects its own port.

4.1.6.5: Protocol Description

The remote CIDF component's IP address is identified either through manual configuration or through the CIDF directory service. Note that the directory service may also indicate the component's capabilities (can) and preferences (want) for transport and security services.

When sender S wishes to communicate with receiver R, and the two components have not yet agreed on a transport mechanism, S must initiate transport mechanism negotiation. S sends a negotiation message to R on the CIDF well-known port indicating the services preferred (if any) and permitted. S includes a separate option request for each supported option, indicating the preferred option (if any).

When R receives an option negotiation, R selects the desired value as follows: R's local preference, if S supports it; otherwise S's preference, if R supports it locally; otherwise a value from the intersection of R's and S's capabilities, if the preferences are not specified or not supported.
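The message formats of 4.1.6.4 and R's selection rule above can be sketched together in Python. This is an illustrative reading, not a reference implementation: network (big-endian) byte order, treating a sender's Want value as also supported, and breaking ties within the intersection by R's own preference order are assumptions not stated in the text, and the function names are hypothetical. For the Transport option, a result of None (0) corresponds to the "communication is not feasible" response.

```python
import struct
from typing import Optional, Sequence

# Code points from 4.1.6.4.
NEGOTIATION = 1                               # message Type
WANT, CAN = 1, 2                              # Request
TRANSPORT, PRIVACY, AUTHENTICATION = 1, 2, 3  # Option
NONE, UDP, RELIABLE_UDP, TCP = 0, 1, 2, 3     # Transport selections
PORT_PARAMETER = 1                            # option parameter Type


def option_request(request: int, option: int, selection: int,
                   port: Optional[int] = None) -> bytes:
    """Encode one Option Request, optionally carrying a port parameter."""
    params = b""
    if port is not None:
        params = struct.pack("!BBH", PORT_PARAMETER, 1, port)  # one 32-bit word
    words = 1 + len(params) // 4  # includes the Request/Length word
    return struct.pack("!BBBB", request, words, option, selection) + params


def negotiation_message(requests: Sequence[bytes]) -> bytes:
    """Prefix the option requests with the Type/Length/Reserved word."""
    body = b"".join(requests)
    words = 1 + len(body) // 4
    if words > 255:
        raise ValueError("message too long for the 1-octet Length field")
    return struct.pack("!BBH", NEGOTIATION, words, 0) + body


def select(local_want: Optional[int], local_can: Sequence[int],
           s_want: Optional[int], s_can: Sequence[int]) -> int:
    """R's choice for one option; returns NONE (0) if nothing is common."""
    s_supported = set(s_can) | ({s_want} if s_want is not None else set())
    local_supported = set(local_can) | ({local_want} if local_want is not None else set())
    if local_want is not None and local_want in s_supported:
        return local_want      # R's preference, supported by S
    if s_want is not None and s_want in local_supported:
        return s_want          # S's preference, supported by R
    for value in local_can:    # assumed: local_can is listed in R's preference order
        if value in s_supported:
            return value       # first value in the intersection
    return NONE                # no common capability
```

For example, `select(UDP, [UDP, TCP], TCP, [TCP])` returns TCP: R's preferred UDP is not supported by S, but S's preferred TCP is supported by R.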
If the local and remote capabilities do not permit communication, R selects a transport option of None, indicating that communication is not feasible. R responds with only the selected options for transport, privacy, and authentication, identified as preferred options.

========================================================================
= 4.2: Message Layer Processing
========================================================================

4.2.1: Introduction

This section describes the processing of CIDF message layer messages. The standard procedures apply to CIDF messages independent of the transport layer. The reliable transmission procedures describe additional processing used when the transport mechanism is reliable UDP. The CIDF privacy and authentication procedures describe how CIDF-layer privacy and authentication, respectively, are provided.

4.2.2: Standard Procedures

Each CIDF message uses the standard CIDF header.

4.2.2.1: Outbound Message Processing

When the application layer requests transmission of a CIDF message, the CIDF message layer shall build the message header and append the message to it.

If the application indicates that this message requires source routing, the CIDF message layer shall use the supplied source route list. If the application indicates that this message requires recorded routing, the CIDF message layer shall initialize the record route list, placing the outgoing IP address as the first entry on the route list.

The CIDF message layer shall insert the current CIDF version number, the application-provided destination, and the current time as the CIDF header time stamp. The CIDF message layer shall insert a new sequence number; the sequence number is initialized to 0 and incremented for each message sent by the CIDF message layer. The CIDF message layer shall compute the total message length and insert it into the Length field.
The CIDF message layer should compute and insert the checksum prior to message transmission. The checksum is inserted before any CIDF privacy or authentication mechanism is applied.

If CIDF privacy or authentication is being used, the CIDF message layer shall encrypt the message and/or generate its authentication data, as required, based on the current security association in use with the recipient. If CIDF privacy or authentication is being used and no security association exists, the message transmission request should be rejected.

4.2.2.2: Inbound Message Processing

If the Version field is not a valid CIDF version number (currently 1), the CIDF message layer shall discard the message.

If CIDF privacy or authentication is being used, the CIDF message layer shall decrypt and authenticate the message, and discard the message on failure. If the failure is due to the lack of a valid security association, the CIDF message layer should send a response to the source. The response is the CIDF message layer header with the Control Byte set to 4.

If the Checksum field is not 0, the CIDF message layer shall compute the message checksum (using the method described in RFC 793) and discard the message if the checksum check fails.

If the Time Stamp field indicates an unexpected delay, the CIDF message layer should notify the application.

If the Destination Address is not the local CIDF node (i.e., the destination matches neither the local node's address nor any multicast address that the local node is using), the CIDF message layer shall determine the next CIDF hop (using the source route, if provided) and forward the message after adjusting the Sequence Number and Time Stamp. If the message includes a record route option, the CIDF message layer shall enter its outgoing IP address, if there is sufficient room in the record route structure, and increment the route index.
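Two of the per-hop steps above can be sketched in Python. The checksum function implements the 16-bit one's complement sum used by TCP (RFC 793; computation techniques in RFC 1071), applied here to the message bytes alone; TCP's pseudo-header is not included, since this section does not say one is used. The record-route helper rests on an assumption: the Index field description can be read more than one way, so the sketch treats Index as a 1-based pointer to the next free 4-octet slot, which matches the rule that the list is full once the index exceeds the length. Function names are hypothetical.

```python
import ipaddress
from typing import List, Tuple


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length message with one zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def checksum_ok(message: bytes) -> bool:
    """True if a message whose Checksum field is already filled in verifies."""
    # A correct checksum makes the folded sum 0xFFFF, so its complement is 0.
    return internet_checksum(message) == 0


def add_recorded_hop(slots: List[bytes], index: int,
                     outgoing_ip: str) -> Tuple[List[bytes], int]:
    """Enter this node's outgoing address into a recorded route, if room remains.

    Assumption: `index` is 1-based and names the next free slot, so the
    list is full once index > len(slots).
    """
    if index < 1:
        raise ValueError("route index must be at least 1 under this reading")
    if index > len(slots):
        return slots, index  # full: forward without recording
    updated = list(slots)
    updated[index - 1] = ipaddress.IPv4Address(outgoing_ip).packed
    return updated, index + 1
```

For example, the RFC 1071 sample words 0x0001, 0xF203, 0xF4F5, 0xF6F7 sum to 0xDDF2, giving a checksum of 0x220D; appending 0x220D to those bytes makes `checksum_ok` return True.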
After processing, the CIDF node should compute the checksum as specified in RFC 793 and place it in the Checksum field. Finally, the message layer shall apply the privacy and authentication transforms for the next CIDF hop and transmit the message.

4.2.3: Reliable Transmission Procedures

4.2.3.1: Outbound Message Processing

For reliable message transmission, the CIDF message layer shall maintain the round-trip latency and mean deviation values for each node with which the local component communicates