<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<title>WinPcap: NPF driver internals manual</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<link href="style.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<!-- Generated by Doxygen 1.6.1 -->
<div class="navigation" id="top">
<div class="tabs">
<ul>
<li><a href="main.html"><span>Main Page</span></a></li>
<li><a href="pages.html"><span>Related Pages</span></a></li>
<li><a href="modules.html"><span>Modules</span></a></li>
<li><a href="annotated.html"><span>Data Structures</span></a></li>
<li><a href="files.html"><span>Files</span></a></li>
</ul>
</div>
</div>
<div class="contents">
<h1>NPF driver internals manual<br/>
<small>
[<a class="el" href="group__internals.html">WinPcap internals</a>]</small>
</h1><table border="0" cellpadding="0" cellspacing="0">
<tr><td colspan="2"><h2>Modules</h2></td></tr>
<tr><td class="memItemLeft" align="right" valign="top"> </td><td class="memItemRight" valign="bottom"><a class="el" href="group__NPF__ioctl.html">NPF I/O control codes</a></td></tr>
<tr><td class="memItemLeft" align="right" valign="top"> </td><td class="memItemRight" valign="bottom"><a class="el" href="group__NPF__include.html">NPF structures and definitions</a></td></tr>
<tr><td class="memItemLeft" align="right" valign="top"> </td><td class="memItemRight" valign="bottom"><a class="el" href="group__NPF__code.html">NPF functions</a></td></tr>
<tr><td class="memItemLeft" align="right" valign="top"> </td><td class="memItemRight" valign="bottom"><a class="el" href="group__NPF__jitter.html">NPF Just-in-time compiler definitions</a></td></tr>
</table>
<hr/><a name="_details"></a><h2>Detailed Description</h2>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title></title>
</head>
<body>
<p>This section documents the internals of the Netgroup Packet Filter (NPF), the kernel
portion of WinPcap. Normal users are probably interested in how to use WinPcap rather
than in its internal structure, so the information in this module is intended mainly
for WinPcap developers and maintainers, and for people interested in how the driver
works. In particular, a good knowledge of operating systems, networking, Win32 kernel
programming and device driver development is needed to read this section profitably.</p>
<p>NPF is the WinPcap component that does the hard work, processing the packets
that transit the network and exporting capture, injection and analysis
capabilities to user level.</p>
<p>The following paragraphs describe the interaction of NPF with the
OS and its basic structure.</p>
<h2>NPF and NDIS</h2>
<p>NDIS (Network Driver Interface Specification) is a standard that defines the
communication between a network adapter (or, better, the driver that manages it)
and the protocol drivers (which implement, for example, TCP/IP). The main purpose
of NDIS is to act as a wrapper that allows protocol drivers to send and receive
packets on a network (LAN or WAN) without caring about either the particular
adapter or the particular Win32 operating system.</p>
<p>NDIS supports three types of network drivers:</p>
<ol>
<li><strong>Network interface card or NIC drivers</strong>. NIC drivers
directly manage network interface cards, referred to as NICs. NIC drivers
interface directly with the hardware at their lower edge and, at their upper
edge, present an interface that allows upper layers to send packets on the
network, handle interrupts, reset the NIC, halt the NIC, and query and set
the operational characteristics of the driver. NIC drivers can be either
miniports or legacy full NIC drivers.
<ul>
<li>Miniport drivers implement only the hardware-specific operations
necessary to manage a NIC, including sending and receiving data on the
NIC. Operations common to all lowest-level NIC drivers, such as
synchronization, are provided by NDIS. Miniports do not call operating
system routines directly; their interface to the operating system is
NDIS.<br>
A miniport does not keep track of bindings. It merely passes packets up
to NDIS, and NDIS makes sure that these packets are passed to the correct
protocols.</li>
<li>Full NIC drivers have been written to perform both the hardware-specific
operations and all the synchronization and queuing operations usually
done by NDIS. Full NIC drivers, for instance, maintain their own binding
information for indicating received data.</li>
</ul>
</li>
<li><strong>Intermediate drivers</strong>. Intermediate drivers sit
between an upper-level driver, such as a protocol driver, and a miniport. To
the upper-level driver, an intermediate driver looks like a miniport; to a
miniport, it looks like a protocol driver. An intermediate driver can layer
on top of another intermediate driver, although such layering could have a
negative effect on system performance. A typical reason for developing an
intermediate driver is to perform media translation between an existing
legacy protocol driver and a miniport that manages a NIC for a new media
type unknown to the protocol driver. For instance, an intermediate driver
could translate from LAN protocol to ATM protocol. An intermediate driver
cannot communicate with user-mode applications, but only with other NDIS
drivers.</li>
<li><b>Transport drivers or protocol drivers</b>. A protocol driver implements
a network protocol stack such as IPX/SPX or TCP/IP, offering its services
over one or more network interface cards. A protocol driver services
application-layer clients at its upper edge and connects to one or more NIC
drivers or intermediate NDIS drivers at its lower edge.</li>
</ol>
<p>NPF is implemented as a protocol driver. This is not the best possible choice
from the performance point of view, but it allows reasonable independence from the
MAC layer as well as complete access to the raw traffic.</p>
<p>Notice that the various Win32 operating systems have different versions of
NDIS: NPF is NDIS 5 compliant under Windows 2000 and its derivations (like
Windows XP), and NDIS 3 compliant on the other Win32 platforms.</p>
<p>The next figure shows the position of NPF inside the NDIS stack:</p>
<p align="center"><img border="0" src="npf-ndis.gif"></p>
<p align="center"><b>Figure 1: NPF inside NDIS.</b></p>
<p>The interaction with the OS is normally asynchronous. This means that the
driver provides a set of callback functions that are invoked by the system when
some operation is required of NPF. NPF exports callback functions for all the I/O
operations of the applications: open, close, read, write, ioctl, etc.</p>
<p>The interaction with NDIS is asynchronous as well: events
like the arrival of a new packet are notified to NPF through a callback
function (Packet_tap() in this case). Furthermore, the interaction with NDIS and
the NIC driver always takes place by means of non-blocking functions: when NPF
invokes an NDIS function, the call returns immediately; when the processing ends,
NDIS invokes a specific NPF callback to signal that the operation has finished.
The driver exports a callback for every low-level operation, like sending packets,
setting or requesting parameters on the NIC, etc.</p>
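<p>This completion-callback pattern can be sketched in plain C. The sketch below only illustrates the calling convention (a call that returns immediately plus a completion notification); the names and signatures are hypothetical and deliberately much simpler than the real NDIS ones.</p>

```c
#include <stddef.h>

/* Hypothetical completion callback type, mirroring the idea of an NDIS
 * "send complete" handler. The real NDIS signatures are different. */
typedef void (*send_complete_fn)(void *context, int status);

/* A simulated asynchronous send: it returns at once, and the completion
 * callback is invoked when the operation finishes. Here "later" is
 * simulated by completing inline before returning. */
static int async_send(const void *packet, size_t len,
                      send_complete_fn on_complete, void *context)
{
    (void)packet;
    int status = (len > 0) ? 0 : -1;  /* pretend the send succeeded */
    /* ...a driver would queue the packet here and return... */
    on_complete(context, status);     /* completion notification */
    return 1;                         /* 1 = operation queued/pending */
}

/* Example completion handler: count successful completions. */
static void count_completion(void *context, int status)
{
    int *completions = context;
    if (status == 0)
        (*completions)++;
}
```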
<h2>NPF structure basics</h2>
<p>The next figure shows the structure of WinPcap, with particular reference to the
NPF driver.</p>
<p align="center"><img border="0" src="npf-npf.gif" width="500" height="412"></p>
<p align="center"><b>Figure 2: NPF device driver.</b></p>
<p>NPF is able to perform a number of different operations: capture, monitoring,
dump to disk and packet injection. The following paragraphs briefly describe each of
these operations.</p>
<h4>Packet Capture</h4>
<p>The most important operation of NPF is packet capture.
During a capture, the driver sniffs the packets using a network interface and
delivers them intact to the user-level applications.</p>
<p>The capture process relies on two main components:</p>
<ul>
<li>
<p>A packet filter that decides whether an incoming packet has to be accepted
and copied to the listening application. Most applications using NPF reject far
more packets than they accept, therefore a versatile and efficient packet filter
is critical for good overall performance. A packet filter is a function with a
boolean output that is applied to a packet: if the value of the function is true,
the capture driver copies the packet to the application; if it is false, the
packet is discarded. The NPF packet filter is a bit more complex, because it
determines not only whether the packet should be kept, but also the number of
bytes to keep. The filtering system adopted by NPF derives from the
<b>BSD Packet Filter</b> (BPF), a virtual processor able to execute filtering
programs expressed in a pseudo-assembler and created at user level. The
application takes a user-defined filter (e.g. &quot;pick up all UDP packets&quot;)
and, using wpcap.dll, compiles it into a BPF program (e.g. &quot;if the packet is
IP and the <i>protocol type</i> field is equal to 17, then return true&quot;).
Then, the application uses the <i>BIOCSETF</i> IOCTL to inject the filter into
the kernel. At this point, the program is executed for every incoming packet,
and only the conformant packets are accepted. Unlike traditional solutions, NPF
does not <i>interpret</i> the filters, but <i>executes</i> them: for performance
reasons, before using the filter NPF feeds it to a JIT compiler that translates
it into a native 80x86 function. When a packet is captured, NPF calls this
native function instead of invoking the filter interpreter, and this makes the
process very fast. The concept behind this optimization is very similar to that
of Java JIT compilers.</p></li>
<li>
<p>A circular buffer to store the packets and avoid loss. A packet is stored in
the buffer with a header that maintains information like the timestamp and the
size of the packet. Moreover, alignment padding is inserted between the packets
in order to speed up access to their data by the applications. Groups of packets
can be copied with a single operation from the NPF buffer to the applications;
this improves performance because it minimizes the number of reads. If the
buffer is full when a new packet arrives, the packet is discarded and hence
lost. Both the kernel and the user buffer can be resized at runtime for maximum
versatility: packet.dll and wpcap.dll provide functions for this purpose.</p></li>
</ul>
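<p>To give a concrete flavor of what a compiled filter computes, the example above (&quot;if the packet is IP and the protocol type field is equal to 17, then return true&quot;) can be written directly as a C predicate. This is not the BPF instruction set nor NPF's JIT output, just the same test expressed by hand; note that, like an NPF filter, it returns an accept length rather than a plain boolean. The Ethernet and IPv4 field offsets are the standard ones.</p>

```c
#include <stddef.h>

/* Return the number of bytes to keep (the "snap length") for a packet
 * matching "IP and UDP", or 0 to discard it. Offsets: EtherType at
 * bytes 12-13 of the Ethernet header (0x0800 = IPv4); the IPv4
 * protocol field is byte 9 of the IP header, i.e. offset 14+9 = 23
 * (17 = UDP). */
static unsigned int filter_ip_udp(const unsigned char *pkt, size_t len,
                                  unsigned int snaplen)
{
    if (len < 34)                            /* Ethernet (14) + min IPv4 (20) */
        return 0;
    if (pkt[12] != 0x08 || pkt[13] != 0x00)  /* not IPv4 */
        return 0;
    if (pkt[23] != 17)                       /* not UDP */
        return 0;
    return (unsigned int)(len < snaplen ? len : snaplen);
}
```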
<p>The size of the user buffer is very important because it determines the
<i>maximum</i> amount of data that can be copied from kernel space to user space
within a single system call. On the other hand, the <i>minimum</i> amount of data
that can be copied in a single call is extremely important as well. In the presence
of a large value for this parameter, the kernel waits for the arrival of several
packets before copying the data to the user. This guarantees a low number of system
calls, i.e. low processor usage, which is a good setting for applications like
sniffers. Conversely, a small value means that the kernel will copy the packets as
soon as the application is ready to receive them. This is excellent for real-time
applications (for example, ARP redirectors or bridges) that need the best
responsiveness from the kernel. From this point of view, NPF has a configurable
behavior that allows users to choose between best efficiency and best
responsiveness (or any intermediate situation).</p>
<p>The wpcap library includes a couple of calls that can be used to set both the
timeout after which a read expires and the minimum amount of data that will be
transferred to the application. By default, the read timeout is 1 second, and the
minimum amount of data copied between the kernel and the application is 16K.</p>
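<p>The buffer layout described above (per-packet header, then data, padded so the next header is aligned) can be sketched as follows. The structure names and the 4-byte alignment are illustrative assumptions, not NPF's actual on-the-wire layout, and the circular wraparound is omitted for brevity.</p>

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-packet record header; the real NPF header differs. */
struct rec_hdr {
    uint32_t ts_sec;    /* capture timestamp, seconds      */
    uint32_t ts_usec;   /* capture timestamp, microseconds */
    uint32_t caplen;    /* bytes actually stored           */
};

#define REC_ALIGN(x) (((x) + 3u) & ~3u)  /* pad each record to 4 bytes */

/* Append one packet record to a flat buffer; returns the new write
 * offset, or 'off' unchanged if the record does not fit -- exactly as
 * NPF discards a packet when its kernel buffer is full. */
static size_t buf_put(unsigned char *buf, size_t bufsize, size_t off,
                      uint32_t ts_sec, uint32_t ts_usec,
                      const unsigned char *pkt, uint32_t caplen)
{
    size_t need = REC_ALIGN(sizeof(struct rec_hdr) + caplen);
    if (off + need > bufsize)
        return off;                       /* buffer full: drop the packet */
    struct rec_hdr h = { ts_sec, ts_usec, caplen };
    memcpy(buf + off, &h, sizeof h);
    memcpy(buf + off + sizeof h, pkt, caplen);
    return off + need;                    /* next record starts aligned */
}
```

<p>Because every record ends on an aligned boundary, an application can walk a whole batch of records copied out in one read, which is what keeps the number of system calls low.</p>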
<h4>Packet injection</h4>
<p>NPF allows applications to write raw packets to the network. To send data, a
user-level application performs a WriteFile() system call on the NPF device file.
The data is sent to the network as is, without being encapsulated in any protocol,
therefore the application has to build the various headers of each packet. The
application usually does not need to generate the FCS, because it is calculated by
the network adapter hardware and attached automatically to the end of the packet
before it is sent to the network.</p>
<p>In normal situations, the sending rate of packets to the network is not very
high, because a system call is needed for each packet. For this reason, the
possibility to send a single packet more than once with a single write system call
has been added. The user-level application can set, with an IOCTL call (code
pBIOCSWRITEREP), the number of times a single packet will be repeated: for example,
if this value is set to 1000, every raw packet written by the application to the
driver's device file will be sent 1000 times. This feature can be used to generate
high-speed traffic for testing purposes: the overhead of context switches is no
longer present, so performance is remarkably better.</p>
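<p>The repeat-write semantics can be illustrated in portable C. The real mechanism is a DeviceIoControl on the NPF device followed by WriteFile; everything below is a simulation of the behavior, with hypothetical names, not the driver API.</p>

```c
#include <stddef.h>

/* Simulated transmit hook: in the driver this would hand the packet to
 * NDIS; here it just lets the caller observe each transmission. */
typedef void (*tx_fn)(const unsigned char *pkt, size_t len, void *ctx);

/* Model of one user-level write when the write-repeat count has been
 * set (pBIOCSWRITEREP-style): the single buffer crossing the
 * user/kernel boundary is put on the wire 'nwrites' times. Returns the
 * number of packets sent. */
static unsigned int write_with_repeat(const unsigned char *pkt, size_t len,
                                      unsigned int nwrites,
                                      tx_fn tx, void *ctx)
{
    unsigned int i;
    for (i = 0; i < nwrites; i++)
        tx(pkt, len, ctx);
    return nwrites;
}

/* Example transmit hook: count how many packets reached "the wire". */
static void count_tx(const unsigned char *pkt, size_t len, void *ctx)
{
    (void)pkt; (void)len;
    (*(unsigned int *)ctx)++;
}
```

<p>The point of the design is visible in the loop: one user-to-kernel copy and one context switch amortized over many transmitted packets.</p>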
<h4>Network monitoring</h4>
<p>WinPcap offers a kernel-level programmable monitoring module, able to calculate
simple statistics on the network traffic. As shown in Figure 2, the statistics can
be gathered without the need to copy the packets to the application, which simply
receives and displays the results obtained from the monitoring engine. This avoids
a large part of the capture overhead in terms of memory and CPU cycles.</p>
<p>The monitoring engine is made of a <i>classifier</i> followed by a
<i>counter</i>. The packets are classified using the filtering engine of NPF,
which provides a configurable way to select a subset of the traffic. The data that
pass the filter go to the counter, which keeps variables like the number of packets
and the number of bytes accepted by the filter and updates them with the data of
the incoming packets. These variables are passed to the user-level application at
regular intervals, whose period can be configured by the user. No buffers are
allocated at kernel or user level.</p>
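<p>The classifier-plus-counter pipeline can be sketched in a few lines of C. In NPF the classifier is the same BPF filtering engine used for capture; here it is any boolean predicate, and the structure and function names are hypothetical.</p>

```c
#include <stdint.h>
#include <stddef.h>

/* Counters kept by the monitoring engine (hypothetical layout). */
struct mon_stats {
    uint64_t npackets;   /* packets accepted by the filter */
    uint64_t nbytes;     /* bytes accepted by the filter   */
};

/* Classifier: any boolean predicate over the packet. */
typedef int (*classifier_fn)(const unsigned char *pkt, size_t len);

/* Feed one packet to the engine: classify, then update the counters.
 * Note that the packet itself is never copied anywhere. */
static void mon_update(struct mon_stats *s, classifier_fn match,
                       const unsigned char *pkt, size_t len)
{
    if (match(pkt, len)) {
        s->npackets++;
        s->nbytes += len;
    }
}

/* Example classifier: accept broadcast frames (dst MAC ff:ff:ff:ff:ff:ff). */
static int is_broadcast(const unsigned char *pkt, size_t len)
{
    size_t i;
    if (len < 6) return 0;
    for (i = 0; i < 6; i++)
        if (pkt[i] != 0xff) return 0;
    return 1;
}
```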
<h4>Dump to disk</h4>
<p>The dump-to-disk capability can be used to save the network data to disk
directly from kernel mode.</p>
<p align="center"><img border="0" src="npf-dump.gif" width="400" height="187"></p>
<p align="center"><b>Figure 3: packet capture versus kernel-level dump.</b></p>
<p>In traditional systems, the path covered by the packets that are saved to disk
is the one followed by the black arrows in Figure 3: every packet is copied several
times, and normally 4 buffers are allocated: the one of the capture driver, the one
in the application that keeps the captured data, the one of the stdio functions (or
similar) that the application uses to write to the file, and finally the one of the
file system.</p>
<p>When the kernel-level traffic logging feature of NPF is enabled, the capture
driver addresses the file system directly, hence the path covered by the packets is
the one of the red dotted arrow: only two buffers and a single copy are necessary,
the number of system calls is drastically reduced, and therefore the performance is
considerably better.</p>
<p>The current implementation dumps the packets to disk in the widely used libpcap
format. It also gives the possibility to filter the traffic before the dump
process, in order to select the packets that will go to the disk.</p>
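<p>The libpcap on-disk format mentioned above is simple: one global file header, then one record header plus the packet data for each saved packet. The structures below follow the published format (magic 0xa1b2c3d4, version 2.4, linktype 1 for Ethernet); they are shown only to make the dump format concrete, not as NPF's actual dump code.</p>

```c
#include <stdint.h>

/* libpcap global file header. */
struct pcap_file_hdr {
    uint32_t magic;          /* 0xa1b2c3d4 */
    uint16_t version_major;  /* 2 */
    uint16_t version_minor;  /* 4 */
    int32_t  thiszone;       /* GMT-to-local correction, usually 0 */
    uint32_t sigfigs;        /* timestamp accuracy, usually 0 */
    uint32_t snaplen;        /* max bytes stored per packet */
    uint32_t linktype;       /* 1 = Ethernet (DLT_EN10MB) */
};

/* Per-packet record header, immediately followed by incl_len bytes. */
struct pcap_rec_hdr {
    uint32_t ts_sec;         /* capture timestamp, seconds */
    uint32_t ts_usec;        /* capture timestamp, microseconds */
    uint32_t incl_len;       /* bytes actually saved in the file */
    uint32_t orig_len;       /* original packet length on the wire */
};

/* Fill in a standard libpcap file header for an Ethernet capture. */
static void pcap_init_file_hdr(struct pcap_file_hdr *h, uint32_t snaplen)
{
    h->magic = 0xa1b2c3d4u;
    h->version_major = 2;
    h->version_minor = 4;
    h->thiszone = 0;
    h->sigfigs = 0;
    h->snaplen = snaplen;
    h->linktype = 1;         /* Ethernet */
}
```

<p>Writing this header followed by (record header, data) pairs with plain file writes on a little-endian machine yields a file readable by any libpcap-based tool.</p>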
<h2>Further reading</h2>
<p>The structure of NPF and its filtering engine derive directly from those of
the BSD Packet Filter (BPF), so if you are interested in the subject you can read
the following papers:</p>
<p>- S. McCanne and V. Jacobson, <a href="ftp://ftp.ee.lbl.gov/papers/bpf-usenix93.ps.Z">The
BSD Packet Filter: A New Architecture for User-level Packet Capture</a>.
Proceedings of the 1993 Winter USENIX Technical Conference (San Diego, CA, Jan.
1993), USENIX.</p>
<p>- A. Begel, S. McCanne, S. L. Graham, <a href="http://www.acm.org/pubs/articles/proceedings/comm/316188/p123-begel/p123-begel.pdf">BPF+: Exploiting
Global Data-flow Optimization in a Generalized Packet Filter Architecture</a>.
Proceedings of ACM SIGCOMM '99, pages 123-134, Conference on Applications,
Technologies, Architectures, and Protocols for Computer Communication, August
30 - September 3, 1999, Cambridge, USA.</p>
<h2>Note</h2>
<p>The code documented in this manual is that of the Windows NTx version of
NPF. The Windows 9x code is very similar, but it is less efficient and
lacks advanced features like the kernel-mode dump.</p>
</body>
</html>
</div>
<hr>
<p align="right"><img border="0" src="winpcap_small.gif" align="absbottom" width="91" height="27">
documentation. Copyright (c) 2002-2005 Politecnico di Torino. Copyright (c) 2005-2009
CACE Technologies. All rights reserved.</p>
</body>
</html>