Flow Reassembly

From BroWiki


This page provides a walk-through of the parts of Bro that deal with flow reassembly. It explains the code paths from the point of packet arrival (from either the network or a trace file) to the triggering of application-layer events.

After startup, and depending on the configuration settings and command-line arguments, Net.cc/net_init() creates the necessary IOSource instances that will process I/O in Bro's event loop. Packet sources (see PktSrc.h) are responsible for feeding packets to the event loop. Packet sources trigger packet processing through PktSrc::Process(), which calls NetSessions::DispatchPackets(). DispatchPackets() determines which packet should actually be processed next, since packets on multiple interfaces may arrive slightly out of order, and dispatches it to NetSessions::NextPacket(). After some basic processing, the call proceeds to NetSessions::DoNextPacket(), where additional filtering, checksum verification, and IP fragment reassembly are performed and the transport-layer protocol of the packet is determined. Depending on the protocol, a lookup in Bro's internal connection state tables is performed to see whether the connection the packet belongs to is already known. If not, the appropriate NetSessions::NewConn() method is called and a new connection control block is allocated and registered. Finally, the packet is dispatched to Connection::NextPacket(), which is appropriately specialized in classes derived from Connection.
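The lookup-or-create step at the end of this path can be illustrated with a minimal sketch. This is not Bro's code: FlowKey, Canonical(), SketchConn and SketchSessions are hypothetical stand-ins, and the assumption that both directions of a flow are mapped to one table entry by ordering the address/port pairs is ours, made only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical 5-tuple key; Bro's real connection tables use their own key types.
struct FlowKey {
    uint32_t src_addr, dst_addr;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

inline bool operator<(const FlowKey& a, const FlowKey& b) {
    return std::tie(a.src_addr, a.dst_addr, a.src_port, a.dst_port, a.proto) <
           std::tie(b.src_addr, b.dst_addr, b.src_port, b.dst_port, b.proto);
}

// Assumption: order the two (address, port) pairs so that packets travelling
// in either direction produce the same key.
inline FlowKey Canonical(FlowKey k) {
    if (std::tie(k.dst_addr, k.dst_port) < std::tie(k.src_addr, k.src_port)) {
        std::swap(k.src_addr, k.dst_addr);
        std::swap(k.src_port, k.dst_port);
    }
    return k;
}

// Stand-in for a connection control block.
struct SketchConn {
    int packets_seen = 0;
    void NextPacket() { ++packets_seen; }
};

// Stand-in for the session table: look the connection up, create it if it is
// unknown (the "NewConn" step), then hand the packet to the connection.
class SketchSessions {
public:
    SketchConn& DoNextPacket(const FlowKey& raw_key) {
        const FlowKey key = Canonical(raw_key);
        auto it = conns_.find(key);
        if (it == conns_.end())
            it = conns_.emplace(key, SketchConn{}).first;
        assert(it != conns_.end());
        it->second.NextPacket();
        return it->second;
    }

    std::size_t NumConns() const { return conns_.size(); }

private:
    std::map<FlowKey, SketchConn> conns_;
};
```

In this sketch, a packet from 1:1000 to 2:80 and the reply from 2:80 to 1:1000 both land in the same SketchConn, so the table holds one entry that has seen two packets.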

The base class for connections is Connection (Conn.h), which is further specialized into TCP_Connection (TCP.h), UDP_Connection (UDP.h), and ICMP_Connection (ICMP.h). A UDP "connection" is assumed to be all traffic using the same 5-tuple; an ICMP "connection" is a matching request/reply pair.

Connection: The base class for connections

The most important method here is the pure virtual Connection::NextPacket(). Connection::ConnTransport() can be used to identify what kind of transport layer connection you are dealing with.
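As a rough illustration of this design, the following sketch shows a base class with a pure virtual NextPacket() and a transport query, plus two specializations. All names (SketchConnection, SketchTransport, SketchTcpConn, SketchUdpConn, Dispatch) are hypothetical, the signatures are simplified, and making ConnTransport() virtual is an assumption of the sketch rather than a statement about Conn.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class SketchTransport { TCP, UDP, ICMP };

// Simplified stand-in for the connection base class.
class SketchConnection {
public:
    virtual ~SketchConnection() = default;

    // Every transport-specific connection must say how it consumes a packet.
    virtual void NextPacket(const std::string& payload) = 0;

    // Lets callers find out which kind of connection they hold.
    virtual SketchTransport ConnTransport() const = 0;
};

class SketchTcpConn : public SketchConnection {
public:
    std::size_t bytes = 0;
    void NextPacket(const std::string& payload) override { bytes += payload.size(); }
    SketchTransport ConnTransport() const override { return SketchTransport::TCP; }
};

class SketchUdpConn : public SketchConnection {
public:
    int datagrams = 0;
    void NextPacket(const std::string&) override { ++datagrams; }
    SketchTransport ConnTransport() const override { return SketchTransport::UDP; }
};

// The dispatcher only needs the base-class interface.
inline void Dispatch(SketchConnection* conn, const std::string& payload) {
    assert(conn != nullptr);
    conn->NextPacket(payload);
}
```

The point of the pattern is that the session layer can call Dispatch() without knowing the concrete connection type; the virtual call selects the TCP or UDP behaviour.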

TCP_Connection: TCP connection handling

A TCP_Connection makes the two endpoints of the connection explicit as TCP_Endpoint instances (TCP_Endpoint.h): orig (originator) and resp (responder). Endpoints have two main components: content processors (instances of TCP_Contents, TCP_Contents.h) and connection analyzers. TCP_Contents is specialized into many processors, for example SSL_ProxyEndpoint, TCP_Contents_DNS, TCP_Contents_RPC, TCP_Contents_SMB, and especially TCP_ContentLine for line-based protocols; see

http://www.cl.cam.ac.uk/~cpk25/bro/doxy/classTCP__Contents.html

TCP_ContentLine in turn is specialized into processors for HTTP, Rlogin, network virtual terminals, etc. Content line processors report new lines back to the underlying connection via TCP_Connection::NewLine(). Confusingly, these content line processors are often named "...Endpoint", even though they are not derived from TCP_Endpoint. See
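The idea behind a line-based content processor can be sketched as follows. SketchContentLine is a hypothetical class, not TCP_ContentLine itself: it buffers in-order stream data, and for each complete line it invokes a callback that plays the role of TCP_Connection::NewLine(). Stripping a trailing carriage return is an assumption made here for CRLF-terminated protocols.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical line splitter fed with already-reassembled stream data.
class SketchContentLine {
public:
    using LineCallback = std::function<void(const std::string&)>;

    explicit SketchContentLine(LineCallback on_line) : on_line_(std::move(on_line)) {
        assert(on_line_);  // a line processor without a receiver is a bug
    }

    // Append new stream data and report every line that is now complete.
    void Deliver(const std::string& data) {
        buf_ += data;
        std::size_t pos;
        while ((pos = buf_.find('\n')) != std::string::npos) {
            std::string line = buf_.substr(0, pos);
            if (!line.empty() && line.back() == '\r')
                line.pop_back();          // tolerate CRLF line endings
            buf_.erase(0, pos + 1);       // keep only the unfinished remainder
            on_line_(line);
        }
    }

private:
    LineCallback on_line_;
    std::string buf_;
};
```

A line that is split across two deliveries (for example "GET / HT" followed by "TP/1.0\r\n") is reported once, after the second delivery, and any trailing partial line stays buffered.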

http://www.cl.cam.ac.uk/~cpk25/bro/doxy/classTCP__ContentLine.html

Analyzers for the connection exist at two levels: connection analyzers (TCP_Analyzer, TCP.h) and endpoint analyzers (TCP_EndpointAnalyzer, TCP_Endpoint.h). The connection-level analyzer is typically used to maintain state that is not endpoint-specific. For example, Bro's stepping-stone analyzer consists of analyzers for the endpoints, derived from TCP_EndpointAnalyzer, and a SteppingStoneAnalyzer derived from TCP_Analyzer, which tracks both endpoint analyzers and maintains additional state (see SteppingStone.h). The main difference between the two types of analyzers and the content processors is that content processors operate at the application layer, i.e., on the reassembled byte stream, whereas the analyzers' input is at the granularity of individual packets.

TCP_Connection::NextPacket() first updates the TCP state machine of the endpoint sending the current packet, and then passes the segment payload to TCP_Endpoint::DataSent(). This in turn calls the content processor's DataSent() method if a content processor is associated, and then similarly calls the DataSent() method of all registered TCP_EndpointAnalyzers. Finally, it may record the payload in a contents file and match rules on the stream originating from the endpoint.
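The fan-out performed by the endpoint can be sketched as below. SketchSink, RecordingSink and SketchEndpoint are hypothetical names; the sketch only models the ordering described above (optional content processor first, then every registered endpoint analyzer) and leaves out the state machine update, contents-file recording and rule matching.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Common interface for anything that receives segment payload in this sketch.
struct SketchSink {
    virtual ~SketchSink() = default;
    virtual void DataSent(const std::string& payload) = 0;
};

// Test helper: records which sink saw which payload, in call order.
struct RecordingSink : SketchSink {
    RecordingSink(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {}
    void DataSent(const std::string& payload) override {
        log_.push_back(name_ + ":" + payload);
    }
    std::string name_;
    std::vector<std::string>& log_;
};

// Stand-in for an endpoint with an optional content processor and analyzers.
class SketchEndpoint {
public:
    void SetContents(SketchSink* contents) { contents_ = contents; }

    void AddAnalyzer(SketchSink* analyzer) {
        assert(analyzer != nullptr);
        analyzers_.push_back(analyzer);
    }

    void DataSent(const std::string& payload) {
        if (contents_)                       // content processor, if any, goes first
            contents_->DataSent(payload);
        for (SketchSink* a : analyzers_)     // then all registered analyzers
            a->DataSent(payload);
    }

private:
    SketchSink* contents_ = nullptr;         // not owned
    std::vector<SketchSink*> analyzers_;     // not owned
};
```

With one content processor and two analyzers attached, a single DataSent() call produces three notifications, with the content processor's always first.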

TCP_Contents::DataSent() is where TCP flow reassembly happens. TCP_Contents has a TCP_Reassembler to which it feeds the new data when executing DataSent(). If the segment payload can be reassembled into new flow data, the reassembler calls TCP_Contents::DeliverBlock() with the new data. This may trigger a "tcp_contents" event before the new data are passed on to TCP_Contents::Deliver(). This is the method that content processors derived from TCP_Contents override to process new stream data according to their needs. The base implementation, TCP_Contents::Deliver(), calls the underlying TCP_Connection's Deliver() method, which has an empty implementation. For example, TCP_Contents_RPC::Deliver() parses the RPC structure out of the stream and triggers events which are then processed in the policy layer.
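To make the reassembly step concrete, here is a minimal sketch of an in-order reassembler. It is not TCP_Reassembler: SketchReassembler and its callback are hypothetical, sequence numbers are widened to 64 bits so that wraparound can be ignored (real TCP sequence numbers are 32-bit and wrap), and overlapping data is resolved by simply keeping the bytes that were delivered first. Segments ahead of a hole are buffered; once the hole is filled, every contiguous block is passed to the callback, which stands in for DeliverBlock().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

class SketchReassembler {
public:
    using BlockCallback = std::function<void(const std::string&)>;

    SketchReassembler(uint64_t initial_seq, BlockCallback deliver)
        : next_seq_(initial_seq), deliver_(std::move(deliver)) {
        assert(deliver_);
    }

    // Feed one segment: seq is the sequence number of data[0].
    void DataSent(uint64_t seq, std::string data) {
        if (seq < next_seq_) {                    // (partly) retransmitted data
            const uint64_t skip = next_seq_ - seq;
            if (skip >= data.size())
                return;                           // nothing new in this segment
            data.erase(0, skip);
            seq = next_seq_;
        }
        if (data.empty())
            return;

        // Buffer the segment; for equal start positions keep the longer one.
        auto it = pending_.find(seq);
        if (it == pending_.end() || it->second.size() < data.size())
            pending_[seq] = std::move(data);

        Flush();
    }

    uint64_t NextSeq() const { return next_seq_; }

private:
    // Deliver buffered segments for as long as they continue the stream.
    void Flush() {
        while (!pending_.empty()) {
            auto first = pending_.begin();
            if (first->first > next_seq_)
                break;                            // hole: wait for missing bytes
            const uint64_t skip = next_seq_ - first->first;
            std::string block = std::move(first->second);
            pending_.erase(first);
            if (skip >= block.size())
                continue;                         // fully covered already
            block.erase(0, skip);
            next_seq_ += block.size();
            deliver_(block);
        }
    }

    uint64_t next_seq_;
    BlockCallback deliver_;
    std::map<uint64_t, std::string> pending_;     // out-of-order segments by seq
};
```

For example, with an initial sequence number of 100, feeding "world" at 105 delivers nothing; feeding "hello" at 100 then delivers "hello" followed by "world".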

  • So when is it better to define new TCP_Contents implementations, and when is it better to move most of the analysis to a new connection type? --CPK

UDP_Connection: UDP connection handling

The UDP case is much simpler. UDP_Connection::NextPacket() treats datagram payload going from originator to responder as a request, and payload going the other way as a reply. After verifying checksums, it triggers either UDP_Connection::Request() or UDP_Connection::Reply(), which are overridden by classes derived from UDP_Connection, such as UDP_DNS (DNS.h) or UDP_RPC (RPC.h). These derived classes parse the request/reply payloads as required and trigger events accordingly.
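The direction-based dispatch can be sketched as follows. SketchUdpFlow, Direction and SketchCountingUdp are hypothetical names, the checksum result is passed in as a flag instead of being computed, and silently dropping datagrams with a bad checksum is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

enum class Direction { OrigToResp, RespToOrig };

// Stand-in for a UDP "connection": direction decides request versus reply.
class SketchUdpFlow {
public:
    virtual ~SketchUdpFlow() = default;

    void NextPacket(Direction dir, const std::string& payload, bool checksum_ok) {
        if (!checksum_ok)
            return;                       // assumption: bad datagrams are ignored
        switch (dir) {
        case Direction::OrigToResp:
            Request(payload);
            return;
        case Direction::RespToOrig:
            Reply(payload);
            return;
        }
        assert(false && "unknown direction");
    }

protected:
    // Protocol-specific subclasses override these to parse the payload.
    virtual void Request(const std::string&) {}
    virtual void Reply(const std::string&) {}
};

// Toy subclass standing in for a protocol parser such as a DNS analyzer.
class SketchCountingUdp : public SketchUdpFlow {
public:
    int requests = 0;
    int replies = 0;

protected:
    void Request(const std::string&) override { ++requests; }
    void Reply(const std::string&) override { ++replies; }
};
```

A real subclass would decode the payload in Request() and Reply() and raise events; the counting subclass only shows where those hooks are called.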
