DynamicProtocolDetection
Concept
The main idea of the new analyzer framework is to decouple protocol-decoding from connections. Traditionally, Bro decided with the very first packet of a connection which protocol analyzer to choose (based on the connection's destination port), instantiated a corresponding connection object and started analyzing. This decision was fixed until the termination of the connection. In the new framework, we associate an /analyzer tree/ with every connection. This tree can contain an arbitrary number of analyzers in various constellations and can be modified during the whole life-time of a connection, i.e., we can enable/disable analyzers on the fly. Most importantly, this gives us two new capabilities:
- We can perform protocol analysis independently of ports. By using a set of signatures which match typical protocol dialogues, Bro is now able to look at payload to find the correct analyzers. When such a signature matches, it turns on the corresponding analyzer.
- We can turn off analyzers when it becomes obvious that they are parsing the wrong protocol. This allows us to use rather loose protocol signatures and, if in doubt, try multiple analyzers in parallel.
There's more about all this in the Usenix paper.
Using the new analyzer framework
By default, there's no change from the user's point of view. Protocol analysis is still performed solely based on ports. The only difference is that now the set of ports that trigger the use of a particular analyzer can be modified on the script level. Each analyzer script registers its ports with the dpd_config, like this:
global ftp_ports: set[port] = { 21/tcp } &redef;
redef dpd_config += { [ANALYZER_FTP] = [$ports = ftp_ports] };
To change the ports, just do
redef ftp_ports = { 21/tcp, 10021/tcp };
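The port registration above can be pictured as a lookup table from analyzer to port set. The following C++ sketch is only an illustration of that idea: ToyDpdConfig, SetPorts, and AnalyzersFor are hypothetical names, not part of Bro's source.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A port is modelled as (number, transport), e.g. (21, "tcp").
using Port = std::pair<int, std::string>;

// Hypothetical stand-in for the dpd_config table; not Bro's implementation.
class ToyDpdConfig {
public:
    // Replace the port set of an analyzer, like a redef of ftp_ports.
    void SetPorts(const std::string& analyzer, const std::set<Port>& ports) {
        ports_[analyzer] = ports;
    }

    // Return every analyzer that registered the given port.
    std::vector<std::string> AnalyzersFor(const Port& p) const {
        std::vector<std::string> out;
        for (const auto& entry : ports_)
            if (entry.second.count(p))
                out.push_back(entry.first);
        return out;
    }

private:
    std::map<std::string, std::set<Port>> ports_;
};
```

With only 21/tcp registered, a lookup for 10021/tcp returns nothing; after the port set is replaced with one that also contains 10021/tcp, the same lookup returns the FTP entry.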
To activate signature-based protocol detection, load the script dpd. The signatures DPD uses to identify protocols are listed in policy/sigs/dpd.sig. See below for how to use these signatures. Note that we don't have signatures for all Bro-supported protocols yet; the current status of DPD support in Bro is also listed below. Feedback about the signatures in terms of quality/improvements/additions is greatly appreciated.
Since DPD-enabled analyzers are generally able to determine whether the connections they are processing actually are the ones intended, the DPD signatures may tolerate some false positives. Think of the signatures as an initial filtering stage. However, they still should be tight enough to feed only reasonable candidates to the protocol analyzer. Also, overall performance depends quite a bit on the analyzer's ability to detect that it is in fact parsing the wrong protocol, and often signature and analyzer code need to be tuned in parallel to achieve optimal performance (see below for how to adjust the analyzer code).
Note: The dpd.bro script does not modify the BPF filter; Bro keeps building that from the (primarily port-based) capture_filters defined in the analyzer scripts. When loading dpd you may need to change the filter to include all packets, e.g. on the command line:
bro -f "tcp or udp or icmp" ...
There are a couple of additional new scripts accompanying the new framework which make use of the new capabilities (sometimes only to perform their protocol analysis on all ports):
- detect-protocols.bro
- Reports (confirmed) uses of protocols on non-standard ports. Raises ProtocolFound for corresponding connections, and ServerFound for destination hosts running a service on a non-standard port. The script detect-protocols-http.bro can be loaded in addition. It examines HTTP sessions to further classify the payload, e.g., it tries to identify a couple of standard P2P protocols.
- dyn-disable.bro
- As said above, Bro can disable an analyzer on the fly if the analyzer finds that it cannot parse a connection's payload, which most probably means the protocol detection went wrong. However, by default Bro never disables an analyzer; it only raises protocol_violation events in such cases. The script dyn-disable.bro catches these events and then turns the complaining analyzers off via the new built-in function disable_analyzer. (If an analyzer has raised the event but is not disabled, it will probably generate lots of weirds subsequently.)
- irc-bot.bro
- Searches for bots on IRC connections.
- proxy.bro
- Matches incoming HTTP requests with outgoing ones to find proxies.
- ftp.bro
- Not new, but extended: if Bro is compiled with libmagic support, it can now identify the content-type of FTP data transfers. This works by using the new built-in function expect_connection to associate a future connection with a particular analyzer (in this case the FileAnalyzer, which is also new).
Programming the new analyzer framework
Class Layout
Compared with Bro's old class hierarchy, the main difference is that there's now only a single Connection class. No other class is derived from Connection anymore. Instead, all analyzers derive from Analyzer.
We associate an /analyzer tree/ with each connection which reflects the data-flow during packet analysis, in terms of which analyzers are to perform their analysis. Each packet is first passed to the root node of the tree which passes its (potentially transformed) input on to all of its children. Each child in turn passes the data on to its successors.
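The data flow through such a tree can be sketched in a few lines of C++. This is a toy model under stated assumptions: ToyAnalyzer, ToyTransport, and demo_tree are hypothetical names, and the "transformation" at the root is simply stripping a made-up four-byte header.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical analyzer node; not Bro's Analyzer class.
class ToyAnalyzer {
public:
    explicit ToyAnalyzer(std::string name) : name_(std::move(name)) {}
    virtual ~ToyAnalyzer() = default;

    // Attach a child and return a non-owning pointer to it.
    ToyAnalyzer* AddChild(std::unique_ptr<ToyAnalyzer> child) {
        children_.push_back(std::move(child));
        return children_.back().get();
    }

    // Process the input, record what this node saw, then pass the
    // (possibly transformed) data on to every child.
    void Deliver(const std::string& data, std::vector<std::string>& trace) {
        std::string out = Transform(data);
        trace.push_back(name_ + ":" + out);
        for (auto& child : children_)
            child->Deliver(out, trace);
    }

protected:
    virtual std::string Transform(const std::string& data) { return data; }

private:
    std::string name_;
    std::vector<std::unique_ptr<ToyAnalyzer>> children_;
};

// Toy root node: drops a 4-byte "header" before passing data on.
class ToyTransport : public ToyAnalyzer {
public:
    ToyTransport() : ToyAnalyzer("transport") {}

protected:
    std::string Transform(const std::string& data) override {
        return data.size() > 4 ? data.substr(4) : std::string();
    }
};

// Build root -> {pia, http -> file} and push one chunk through the tree.
std::vector<std::string> demo_tree() {
    ToyTransport root;
    root.AddChild(std::unique_ptr<ToyAnalyzer>(new ToyAnalyzer("pia")));
    ToyAnalyzer* http =
        root.AddChild(std::unique_ptr<ToyAnalyzer>(new ToyAnalyzer("http")));
    http->AddChild(std::unique_ptr<ToyAnalyzer>(new ToyAnalyzer("file")));

    std::vector<std::string> trace;
    root.Deliver("HDR:GET /", trace);
    return trace;
}
```

Every node below the root sees the payload without the header, and the grandchild receives whatever its parent passed on.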
The root node must always be of type TransportLayerAnalyzer. There are such analyzers for TCP, UDP, and ICMP. Application-layer analyzers are either derived from TCP_ApplicationAnalyzer (for TCP protocols) or from the general Analyzer (for all non-TCP protocols).
When a connection begins, the initial analyzer tree is instantiated by the global DPM. The initial tree always contains a corresponding TransportLayerAnalyzer. For TCP and UDP it also contains an instance of class PIA_TCP or PIA_UDP, respectively. The PIAs are responsible for detecting protocols as the connection progresses. Most importantly, they perform the signature matching. Depending on whether any well-known ports are in use, the initial tree may or may not contain any application-layer analyzers right away.
Analyzers can support one of two input methods (or both): packet-wise or stream-wise. An Analyzer can accept input via one method (e.g., packet-wise) and pass it on to its children via the other (e.g., stream-wise). The TCP_Analyzer for example reassembles packets into a byte-stream and thus all TCP_ApplicationAnalyzers only see stream-wise input.
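To make the packet-wise versus stream-wise distinction concrete, here is a deliberately simplified C++ sketch of a node that takes chunks tagged with sequence numbers and produces an in-order byte stream, recording holes it had to skip. ToyReassembler is a hypothetical class; Bro's TCP_Analyzer is far more involved, and this toy ignores overlapping data.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy reassembler; not Bro's TCP reassembly code.
class ToyReassembler {
public:
    // Packet-wise input: a chunk of bytes starting at sequence number seq.
    void DeliverPacket(int seq, const std::string& data) {
        pending_[seq] = data;
        Drain();
    }

    // Give up on missing bytes: record each hole as (seq, len), then
    // deliver the data that follows it.
    void Flush() {
        while (!pending_.empty()) {
            auto first = pending_.begin();
            if (first->first < next_) {
                pending_.erase(first);  // stale chunk; overlaps are ignored here
                continue;
            }
            if (first->first > next_) {
                gaps_.emplace_back(next_, first->first - next_);
                next_ = first->first;
            }
            Drain();
        }
    }

    const std::string& Stream() const { return stream_; }
    const std::vector<std::pair<int, int>>& Gaps() const { return gaps_; }

private:
    // Stream-wise output: append every buffered chunk that is now in order.
    void Drain() {
        auto it = pending_.find(next_);
        while (it != pending_.end()) {
            stream_ += it->second;
            next_ += static_cast<int>(it->second.size());
            pending_.erase(it);
            it = pending_.find(next_);
        }
    }

    int next_ = 0;                        // next expected sequence number
    std::map<int, std::string> pending_;  // out-of-order chunks by seq
    std::string stream_;                  // what children would see
    std::vector<std::pair<int, int>> gaps_;
};
```

A child attached to such a node only ever sees the contents of Stream(); the entries in Gaps() correspond to what would be reported through Undelivered.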
Relevant Methods
Any Analyzer-derived class can override the following virtual methods:
- void Init()
- Initialization of the analyzer. Called before any data processing is performed.
- void Done()
- Clean-up of the analyzer. Called just before the instance is destroyed.
- void DeliverPacket(int len, const u_char* data, bool orig, int seq, const IP_Hdr* ip, int caplen)
- Interface for packet-wise input (or, more generally, chunk-wise in-order input as the parent analyzer does not necessarily need to pass full packets around).
- len
- Length of data.
- data
- Pointer to data.
- orig
- True if data is from connection originator, false for responder.
- seq
- >=0 if there's a sequence number associated with the data. -1 if not.
- ip
- Pointer to packet header if there's a packet associated with the data. 0 if not.
- caplen
- Length of the captured packet if ip is non-zero.
- void DeliverStream(int len, const u_char* data, bool orig)
- Interface for stream-wise input.
- len
- Length of data.
- data
- Pointer to data.
- orig
- True if data is from connection originator, false for responder.
- void Undelivered(int seq, int len, bool orig)
- Interface for input which is supposed to be stream-wise but could not be fitted into a continuous stream (e.g., parts of a TCP stream which could not be reassembled).
- seq
- Sequence number of the non-contiguous chunk.
- len
- Length of the non-contiguous chunk.
- orig
- True if from connection originator, false for responder.
- unsigned int MemoryAllocation() const
- Returns number of memory bytes currently used by the analyzer.
In addition, analyzers need to have two static methods:
- static Analyzer* InstantiateAnalyzer(Connection* conn)
- Returns new instance of the analyzer class.
- static bool Available()
- Returns false if the analyzer is completely disabled and not to be considered for any connections. (Typically, this is the case if there are no event handlers defined for the analyzer.)
Any TCP_ApplicationAnalyzer-derived class can override the following virtual methods in addition to those of Analyzer:
- void EndpointEOF(bool is_orig)
- The given endpoint's data delivery is complete.
- void ConnectionFinished(int half_finished)
- Called whenever an endpoint enters TCP_CLOSED or TCP_RESET.
- void ConnectionReset()
- Called when the connection is reset.
- void PacketWithRST()
- Called whenever a RST packet is seen (sometimes the invocation of ConnectionReset is delayed).
Note: Whenever overriding one of these methods, call the parent class's implementation first before doing anything else.
The classes Analyzer, TCP_ApplicationAnalyzer, and DPM provide a couple of methods for passing data on to child analyzers, manipulating the analyzer trees, generating events, etc. Read the source for them. :-)
There is one more thing: SupportAnalyzers which encapsulate common but protocol-independent tasks (e.g., line-splitting for line-based ASCII protocols). While also derived from Analyzer, support analyzers are conceptually different in the sense that
- they are directly associated with a particular parent analyzer. If the parent gets destroyed, all its support analyzers are deleted as well,
- they don't have children, and
- they handle only one direction of the connection's data, i.e., either the originator side or the responder side. If a parent analyzer wants to leverage a support analyzer for both directions, it needs to instantiate two of them.
All the support analyzers of a particular parent analyzer form a list (one list per direction). Every packet/stream-chunk which is handed to the parent first passes through this list. The output of the last support analyzer is then delivered via the parent's Deliver{Packet,Stream}. The most important support analyzer currently is the ContentLine_Analyzer which performs the mentioned line-splitting in ASCII protocols. It ensures that only full lines are passed to the parent's DeliverStream().
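The line-splitting idea can be sketched as follows. This is a minimal illustration in the spirit of ContentLine_Analyzer, not its implementation: ToyLineSplitter and demo_lines are hypothetical, and the callback stands in for the parent's DeliverStream().

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical one-direction line splitter; not Bro's ContentLine_Analyzer.
class ToyLineSplitter {
public:
    using LineHandler = std::function<void(const std::string&)>;

    explicit ToyLineSplitter(LineHandler deliver)
        : deliver_(std::move(deliver)) {}

    // Accept an arbitrary stream chunk; forward only complete lines and
    // keep any trailing partial line buffered until its newline arrives.
    void DeliverStream(const std::string& chunk) {
        buffer_ += chunk;
        std::string::size_type pos;
        while ((pos = buffer_.find('\n')) != std::string::npos) {
            std::string line = buffer_.substr(0, pos);
            if (!line.empty() && line.back() == '\r')
                line.pop_back();  // accept CRLF as well as LF
            deliver_(line);
            buffer_.erase(0, pos + 1);
        }
    }

private:
    LineHandler deliver_;
    std::string buffer_;
};

// Feed two lines split across three chunks and collect what the parent sees.
std::vector<std::string> demo_lines() {
    std::vector<std::string> lines;
    ToyLineSplitter splitter(
        [&lines](const std::string& line) { lines.push_back(line); });
    splitter.DeliverStream("USER an");
    splitter.DeliverStream("on\r\nPASS gue");
    splitter.DeliverStream("st\r\n");
    return lines;
}
```

Because a support analyzer handles a single direction, a parent that wants line-splitting for both originator and responder data would hold two such objects.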
HOWTOs
Implementing an Application Analyzer
These are the main steps to write an application analyzer for protocol Foo:
- Add a tag for Foo in AnalyzerTags.h.
- Add a line to analyzer_configs in Analyzer.cc. Take care to keep the correct order of the entries.
- Derive a new class from either TCP_ApplicationAnalyzer (for TCP) or Analyzer. Override the methods which you need and don't forget to call the parent's implementation first.
- Add the #include directive to include the header file that declares the new class into Analyzer.cc.
- Add InstantiateAnalyzer and Available methods:
static Analyzer* InstantiateAnalyzer(Connection* conn)
{ return new Foo_Analyzer(conn); }
static bool Available() { return foo_event_1 || foo_event_2; }
- In the constructor, add support analyzers if required, e.g.:
Foo_Analyzer::Foo_Analyzer(Connection* conn)
: TCP_ApplicationAnalyzer(AnalyzerTag::Foo, conn)
{
AddSupportAnalyzer(new ContentLine_Analyzer(conn));
}
- Add calls to Analyzer::ProtocolViolation() at points where the analyzer believes it is parsing the wrong protocol. Don't be too strict though as it's going to see a lot of crud even with the right protocol ...
- Add calls to Analyzer::ProtocolConfirmation() at points where the analyzer can be pretty sure to parse the right protocol.
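The last two steps can be illustrated with a self-contained sketch. Everything here is an assumption made for illustration: the "FOO " line format, the tolerance of three bad lines, and the ToyFooAnalyzer class, whose ProtocolConfirmation and ProtocolViolation methods merely count calls and only borrow the names of the real ones.

```cpp
#include <string>

// Hypothetical analyzer for a made-up protocol whose lines start with "FOO ".
class ToyFooAnalyzer {
public:
    // Called once per complete line of payload.
    void DeliverLine(const std::string& line) {
        if (line.compare(0, 4, "FOO ") == 0) {
            // A well-formed command: confirm the protocol once.
            if (!confirmed_) {
                confirmed_ = true;
                ProtocolConfirmation();
            }
            return;
        }
        // Tolerate some crud; complain only after several bad lines.
        if (++bad_lines_ > kMaxBadLines && !violated_) {
            violated_ = true;
            ProtocolViolation("too many non-FOO lines");
        }
    }

    int confirmations() const { return confirmations_; }
    int violations() const { return violations_; }
    const std::string& last_reason() const { return last_reason_; }

private:
    // Stand-ins that only record the call; the real methods raise events.
    void ProtocolConfirmation() { ++confirmations_; }
    void ProtocolViolation(const std::string& reason) {
        ++violations_;
        last_reason_ = reason;
    }

    static constexpr int kMaxBadLines = 3;  // arbitrary tolerance
    bool confirmed_ = false;
    bool violated_ = false;
    int bad_lines_ = 0;
    int confirmations_ = 0;
    int violations_ = 0;
    std::string last_reason_;
};
```

Where to set the tolerance is a tuning question: too strict and the analyzer gives up on real traffic, too lenient and it keeps parsing the wrong protocol for longer than necessary.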
Determining Analyzer Activation
Analyzers can use one of three ways to be fed new connections:
- use a preconfigured set of ports, thus triggering on all connections using any of the registered ports;
- use content signatures, thus triggering on all connections that match the relevant signatures;
- use hardcoding to trigger on all connections, when signatures won't cut it and the protocol uses arbitrary ports.
We now explain how to do each in turn.
If the analyzer is primarily supposed to work on a fixed set of ports, then add an entry to dpd_config in the analyzer's policy script:
global foo_ports: set[port] = { 12345/tcp, 54321/tcp } &redef;
redef dpd_config += { [ANALYZER_FOO] = [$ports = foo_ports] };
If you want to activate the analyzer via signatures (thus making it port-independent), add them to policy/sigs/dpd.sig. Below is the signature pair used for HTTP. It leverages the requires-reverse-signature statement to make the signature more reliable, and triggers the HTTP Analyzer via the enable "http" statement. Here, "http" refers to the textual name the analyzer is registered under in its entry in Analyzer::analyzer_configs in Analyzer.cc.
signature dpd_http_client {
ip-proto == tcp
payload /^[[:space:]]*(GET|HEAD|POST)[[:space:]]*/
tcp-state originator
}
signature dpd_http_server {
ip-proto == tcp
payload /^HTTP\/[0-9]/
tcp-state responder
requires-reverse-signature dpd_http_client
enable "http"
}
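The effect of pairing the two signatures can be approximated outside Bro with ordinary regular expressions. The sketch below is not Bro's signature engine: would_enable_http is a hypothetical helper, it assumes the POSIX class [[:space:]] is what the client pattern intends, and it looks only at the first payload bytes seen in each direction.

```cpp
#include <regex>
#include <string>

// Hypothetical check mirroring dpd_http_client / dpd_http_server:
// the analyzer is enabled only if the responder side looks like HTTP
// AND the originator side already matched the client pattern.
bool would_enable_http(const std::string& orig_payload,
                       const std::string& resp_payload) {
    static const std::regex client("^[[:space:]]*(GET|HEAD|POST)[[:space:]]*");
    static const std::regex server("^HTTP/[0-9]");

    // match_continuous forces the match to start at the first byte.
    const auto flags = std::regex_constants::match_continuous;
    const bool client_ok = std::regex_search(orig_payload, client, flags);
    const bool server_ok = std::regex_search(resp_payload, server, flags);
    return client_ok && server_ok;
}
```

Requiring both directions to match is what keeps a stray "GET" in some other protocol's payload from turning on the HTTP analyzer by itself.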
If you want to activate the analyzer on all connections, you manually need to hook the analyzer into the analyzer tree, in DPM::BuildInitialAnalyzerTree. For example, for a stream-based TCP content analyzer, you might use this...
// Around line 296:
if ( tcp )
{
// ...
if ( Foo_Analyzer::Available() )
tcp->AddChildAnalyzer(new Foo_Analyzer(conn));
// ...
}
... while for a packet-based one, you could use this:
if ( Foo_Analyzer::Available() )
tcp->AddChildPacketAnalyzer(new Foo_Analyzer(conn));
I suppose one could also use an all-matching signature, but I haven't tried this.
What To Do If It Does Not Work
If your analyzer is not activated when you expect it to be, try any of the below:
- If you are using signatures, make sure the signature actually matches. You can do so by adding an event "<explanation>" statement to the signature, and running Bro on your traffic with the signatures.bro policy.
- Make sure Bro actually processes your traffic. By adding print-filter at the end of your Bro invocation, the resulting BPF filter for your configuration will be printed to the console at startup. As mentioned above, one possible way to widen the filter is by passing -f "tcp or udp or icmp" to Bro.
- Build Bro with debugging support (./configure --enable-debug), and run it with the DPD debug stream enabled (by passing -B dpd at the command line). After completion, have a look at the resulting debug.log file and see whether it provides any clues.
General Caveats
Note: If you add a new file in the src/ or policy/ directories, you need to add it to the Makefile.am in the respective directory, and re-run autogen.sh, ./configure, and potentially make.
Note: TCP_ApplicationAnalyzers can access the state of the parent TCP_Analyzer by calling the method TCP(). However, they should be coded in a way that they can also work without having a TCP parent (i.e., TCP() returns 0). That will later allow us to use them with decapsulated tunnels. If that is not possible, they should at least do an assert(TCP()) so that one notices if the analyzer is used in the wrong way.
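A minimal sketch of the "work without a TCP parent" pattern, using hypothetical stand-in types (ToyTcpState, ToyAppAnalyzer) rather than Bro's classes, and assuming that parsing should simply proceed when no parent is present:

```cpp
// Hypothetical stand-in for the state a TCP parent would expose.
struct ToyTcpState {
    bool established;
};

// Hypothetical application analyzer that may or may not have a TCP parent.
class ToyAppAnalyzer {
public:
    explicit ToyAppAnalyzer(const ToyTcpState* tcp) : tcp_(tcp) {}

    // Returns a null pointer when there is no TCP parent.
    const ToyTcpState* TCP() const { return tcp_; }

    // Consult the parent's state only if it exists; without a parent
    // (e.g., payload from a decapsulated tunnel) go ahead and parse.
    bool ReadyToParse() const {
        const ToyTcpState* tcp = TCP();
        return tcp ? tcp->established : true;
    }

private:
    const ToyTcpState* tcp_;
};
```

An analyzer that cannot be written this way would instead assert on TCP() at the point of use, so that misuse fails loudly instead of dereferencing a null pointer.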
Comments
That's it. If you feel anything important is missing here let Robin know, or just go ahead and add/clarify it. If there are any questions, feel free to ask.
Implementation Status
This table lists the state of the application-layer protocol analyzers: whether they have already been adapted to the new framework, whether we already have signatures to detect them independently of the port (any signature contributions are appreciated!), and whether the new implementation has already been tested at least somewhat ("not tested" means no tests at all yet, mainly for lack of a trace at hand).
| Analyzer | Adapted | Signatures |
|---|---|---|
| DCE_RPC | Yes | No |
| DNS | Yes | No |
| Finger | Yes | No |
| Gnutella | Yes | No |
| FTP | Yes | Yes |
| HTTP | Yes | Yes |
| Ident | Yes | No |
| IRC | Yes | Yes |
| NetbiosSSN | Yes | No |
| NCP | Yes | No |
| NFS | Yes | No |
| NTP | Yes | No |
| POP3 | Yes | Yes |
| Portmapper | Yes | No |
| RPC | Yes | No |
| RSH | Yes | No |
| Rlogin | Yes | No |
| SMB | Yes | No |
| SSH | Yes | Yes |
| SSL | Yes | No |
| SMTP | Yes | Yes |
| Telnet | Yes | No |

