Discussion:
libpcap on Mac OS X 10.6 Snow Leopard
Marco De Angelis
2010-01-31 09:07:59 UTC
Hi.

We have an application that uses libpcap on many Linux versions and on Mac OS X Leopard with excellent results. When tested on Snow Leopard (10.6.2), it stopped working. I googled a lot and found out about the BPF issues that you mention in many posts, such as http://www.mail-archive.com/wireshark-***@wireshark.org/msg16294.html

I'm not monitoring my own packets, and anyway, giving read and write permissions to group and to everybody didn't help. I also ruled out wireless card problems by attaching directly to the router. Nada.

Since Mac OS X 10.6 ships with libpcap 1.0.0, I tried using the new interface with pcap_create and pcap_activate, which also allows buffer customization that was previously unavailable. After many tests and combinations, it worked with this strange trick: reducing the buffer size to 128 bytes, so that only 1 packet can be held in the system's buffer and is thus delivered to the application as soon as the next packet arrives. Changing any of the other settings (timeouts, packet count in pcap_dispatch, etc.) does not affect the results.

Of course, the last packet never gets delivered. If I give the buffer enough space for 10 packets, I can see that the last 10 packets of what I'm monitoring are not delivered, so I suppose they sit in the buffer and are not delivered by pcap_dispatch. I also tried using pcap_loop, without any change. Here's the creation of the session.


// Using PCAP 1.0.0 features on Mac OS X Snow Leopard
#if defined(PCAP_HAS_CREATE)
    if ((pcapSession = pcap_create(iface->getName().c_str(), errbuf)) == NULL)
    {
        LOG_STATIC_TRACE(util::logging::METHOD_EXIT_FAIL<<" error <"<<errbuf<<">");
        RAISE_EXCEPTION_WITH_MSG(CreateSessionErrorException, errbuf);
    }
    if (pcap_set_snaplen(pcapSession, snapLen) != 0)
    {
        std::string error = pcap_geterr(pcapSession);
        pcap_close(pcapSession);
        LOG_STATIC_TRACE(util::logging::METHOD_EXIT_FAIL<<" error <"<<error<<">");
        RAISE_EXCEPTION_WITH_MSG(CreateSessionErrorException, error);
    }
    if (pcap_set_promisc(pcapSession, promisc ? 1 : 0) != 0)
    {
        std::string error = pcap_geterr(pcapSession);
        pcap_close(pcapSession);
        LOG_STATIC_TRACE(util::logging::METHOD_EXIT_FAIL<<" error <"<<error<<">");
        RAISE_EXCEPTION_WITH_MSG(CreateSessionErrorException, error);
    }
    if (pcap_set_timeout(pcapSession, 1000) != 0)
    {
        std::string error = pcap_geterr(pcapSession);
        pcap_close(pcapSession);
        LOG_STATIC_TRACE(util::logging::METHOD_EXIT_FAIL<<" error <"<<error<<">");
        RAISE_EXCEPTION_WITH_MSG(CreateSessionErrorException, error);
    }
    if (pcap_set_rfmon(pcapSession, 0) != 0)
    {
        std::string error = pcap_geterr(pcapSession);
        pcap_close(pcapSession);
        LOG_STATIC_TRACE(util::logging::METHOD_EXIT_FAIL<<" error <"<<error<<">");
        RAISE_EXCEPTION_WITH_MSG(CreateSessionErrorException, error);
    }
    // FIXME: This is where the workaround takes place! Increase the buffer and packets
    // are proportionally not delivered!
    if (pcap_set_buffer_size(pcapSession, 128) != 0)
    {
        std::string error = pcap_geterr(pcapSession);
        pcap_close(pcapSession);
        LOG_STATIC_TRACE(util::logging::METHOD_EXIT_FAIL<<" error <"<<error<<">");
        RAISE_EXCEPTION_WITH_MSG(CreateSessionErrorException, error);
    }
    if (pcap_activate(pcapSession) != 0)
    {
        std::string error = pcap_geterr(pcapSession);
        pcap_close(pcapSession);
        LOG_STATIC_TRACE(util::logging::METHOD_EXIT_FAIL<<" error <"<<error<<">");
        RAISE_EXCEPTION_WITH_MSG(CreateSessionErrorException, error);
    }
#else
    if ((pcapSession = pcap_open_live(iface->getName().c_str(), snapLen, promisc ? 1 : 0, 1000, errbuf)) == NULL)
    {
        LOG_STATIC_TRACE(util::logging::METHOD_EXIT_FAIL<<" error <"<<errbuf<<">");
        RAISE_EXCEPTION_WITH_MSG(CreateSessionErrorException, errbuf);
    }
#endif

The call to dispatch is very simple; this is a snippet:

while (true)
{
    int32_t ret = pcap_dispatch(m_impl->pcapSession, 100,
        detail::PacketCaptureSession_pcap_handler, (u_char*)this);

    if (ret == -1)
    {
        std::string error = pcap_geterr(m_impl->pcapSession);
        LOG_TRACE(util::logging::METHOD_EXIT_FAIL<<" error <"<<error<<">");
        RAISE_EXCEPTION_WITH_MSG(PacketCaptureSessionException, error);
    }
    ...


Any ideas that could point me toward resolving the issue? Have you ever seen this behaviour before? The application works fine on all the other OSes, which run older pcap versions. I recompiled tcpdump 4.0.0 on my machine, and it works! Therefore I should be able to capture correctly.

Best regards,
Marco
Guy Harris
2010-01-31 20:30:03 UTC
The issue described in that message is fixed in 10.6.2.

The other BPF issue - timeouts < 1 second not working - is also fixed in 10.6.2.

These are both BPF issues; libpcap 1.0.0 didn't *introduce* them - 1.0.0 won't have them on pre-10.6 OS X, and 1.0.0 and earlier versions will also have the first of those issues on 10.6 and 10.6.1, and the second of those issues on all 10.6.x releases (the BPF issue was worked around in libpcap; the workaround is also in the main Git branch from tcpdump.org).

In addition, you're specifying a 1-second timeout, so the second issue wouldn't affect you (tcpdump works, and it uses a timeout of 1000, i.e. 1000ms = 1s).
Post by Marco De Angelis
I'm not monitoring my own packets, and anyway, giving read and write permissions to group and to everybody didn't help. I ruled out also the wireless card problems by attaching directly to the router. Nada.
So what is the exact problem you're seeing? What is the difference you see between Leopard and Snow Leopard? (PF_PACKET sockets work differently from BPF, so differences between Linux and {Leopard,Snow Leopard,*BSD} are less interesting here.)
This is the tcpdump-workers list.
Visit https://cod.sandelman.ca/ to unsubscribe.
Marco De Angelis
2010-02-01 15:28:36 UTC
Post by Guy Harris
The issue described in that message is fixed in 10.6.2.
Thanks so much for replying (Sorry if this reply arrives twice, I had problems
in subscribing). That is good to know. I have 10.6.2, but I still experience
problems (packets not dispatched).
Post by Guy Harris
These are both BPF issues; libpcap 1.0.0 didn't *introduce* them -
I was just looking at my dependencies, without being sure whether I should investigate
a Snow Leopard bug or the libpcap side.
Post by Guy Harris
So what is the exact problem you're seeing? What is the difference you see
between Leopard and Snow Leopard? (PF_PACKET sockets work differently from
BPF, so differences between Linux and {Leopard,Snow Leopard,*BSD} are less
interesting here.)
The problem is that the packets are not delivered to the application. More
specifically, it seems that libpcap captures them, but pcap_dispatch (and
pcap_loop as well) does not deliver packets to the pcap_handler. Packets seem
to remain in the buffer and get delivered only when the buffer is full.

With a buffer of 128 bytes (which can hold only one packet), the packets are
delivered to the application immediately.
With a buffer of 1280 bytes, I get the packets delivered in bursts of ten, only
when the next ten are collected. Of course, that also means that the last group
of packets remains in the buffer and is never delivered.
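
The behaviour described above can be modelled with a few lines of C++. This is only a toy sketch under stated assumptions: the name simulate_no_timer and the struct DeliveryCount are hypothetical, packets are counted rather than measured in bytes, and the model assumes a buffer that is handed to the reader only when an arriving packet no longer fits, with no timer ever firing. It is not libpcap or kernel code.

```cpp
#include <cassert>
#include <cstddef>

// Toy model (an assumption, not libpcap or kernel code): a store buffer
// holds up to `capacity` packets and is handed to the reader only when
// an arriving packet no longer fits. No read timer ever fires.
struct DeliveryCount {
    std::size_t delivered;  // packets the application has seen
    std::size_t stranded;   // packets still sitting in the buffer
};

DeliveryCount simulate_no_timer(std::size_t capacity, std::size_t arrivals)
{
    assert(capacity > 0);
    std::size_t in_buffer = 0;
    std::size_t delivered = 0;
    for (std::size_t i = 0; i < arrivals; ++i) {
        if (in_buffer == capacity) {
            // Buffer is full: hand it to the reader and start a new one.
            delivered += in_buffer;
            in_buffer = 0;
        }
        ++in_buffer;
    }
    return DeliveryCount{delivered, in_buffer};
}
```

With a capacity of one packet, every packet except the last is delivered, each one when its successor arrives. With a capacity of ten, 25 arrivals leave five packets stranded, and 20 arrivals leave a full group of ten stranded.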

The problem is, the same code is working perfectly on all other OSes. Can you
suggest something to try out?
Post by Guy Harris
Post by Marco De Angelis
I recompiled tcpdump 4.0.0 on my machine, and it works!
On which machine? The Snow Leopard machine? If so, does the tcpdump 4.0.0
that comes with Snow Leopard *not* work?

The original tcpdump on Snow Leopard (the one that comes with the OS) worked
fine, and so did the one I downloaded and recompiled. I recompiled it just to be
sure that they didn't do some "trick" to make it work.
Maybe I just don't trust the Authority :)

Regards,
Marco
Carter Bullard
2010-02-01 16:44:21 UTC
Gentle people,
I am also seeing similar behavior with libpcap-1.0.0 on Snow Leopard (10.6.2).
It seems that this started very recently, possibly with the upgrade to 10.6.2,
but I'm not sure about that.

In my application, which uses pcap_dispatch() in non-blocking mode and uses
select() to be notified when to read all available packets, I am receiving packets from
wireless interfaces in big "chunks", where the time between chunks can be rather
large (> 10-30 seconds) even though there are packets every, let's say, 0.25 seconds.

I have not had time to verify if this is seen on all interfaces.

I open the interface using pcap_open_live(), with a 0.1 second timeout value. All
other parameters are default.

Is there any additional information I can provide to assist?

Carter
Guy Harris
2010-02-01 23:25:25 UTC
Post by Carter Bullard
Gentle people,
I also am seeing similar behavior with libpcap-1.0.0 on Snow Leopard (10.6.2).
Seems that this just started very recently, possible with the upgrade to 10.6.2
but not sure about that.
In my application, which uses pcap_dispatch() in non-blocking mode, and uses
select() to be notified when to read all available packets,
Whether select() works correctly with BPF devices depends on the version of the OS you're using. It works in "newer" versions of *BSD, for various values of "newer" (I don't have a complete list handy, but the fix went into FreeBSD in 4.6); it works in "no" version of OS X, for the simplest value of "no". :-)

The problem is that a select(), in the OSes where it doesn't work, doesn't start a timer, so the wakeup occurs *only* when the store buffer in the kernel fills up.

The workaround is to

1) put the BPF descriptor into non-blocking mode;

2) pass the desired timeout (same as was specified in pcap_open_live() or pcap_set_timeout()) as the timeout for select() (or, if you're already using a timeout for select(), pass the minimum of the timeout you want and the desired BPF timeout);

3) read from the BPF device with pcap_dispatch() (or pcap_next() or pcap_next_ex()) regardless of whether select() says it has anything to read or not;

as that'll mean you'll check it periodically - and the read will just return -1 with EAGAIN if there are no packets available and return what packets are available if some are.
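
The three steps above can be sketched as follows. This is a minimal illustration, not libpcap code: the helper names set_nonblocking, select_timeout_ms and wait_then_drain are hypothetical, and the sketch works on a plain file descriptor so that it compiles without libpcap. In a real capture program the descriptor would come from pcap_get_selectable_fd(), non-blocking mode would be set with pcap_setnonblock(), and the drain callback would call pcap_dispatch().

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Step 1: put the descriptor into non-blocking mode.
bool set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Step 2: never let select() sleep longer than the BPF read timeout.
long select_timeout_ms(long app_timeout_ms, long bpf_timeout_ms)
{
    assert(app_timeout_ms >= 0 && bpf_timeout_ms >= 0);
    return std::min(app_timeout_ms, bpf_timeout_ms);
}

// Step 3: wait at most timeout_ms, then call drain() whether or not
// select() reported the descriptor as readable.
template <typename Drain>
long wait_then_drain(int fd, long timeout_ms, Drain drain)
{
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    // A real error from select() is reported; EINTR is treated like a timeout.
    if (select(fd + 1, &readable, NULL, NULL, &tv) == -1 && errno != EINTR)
        return -1;

    return drain();  // deliberately unconditional
}
```

Because drain() runs on every pass, the descriptor is polled at least once per timeout interval even on systems where select() on a BPF device only wakes up when the store buffer fills.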

That's not new in Snow Leopard, however; that's been the case since Day One. (And, yes, there's a bug filed on that.)
Post by Carter Bullard
Is any additional information I can provide to assist?
If you're already doing that workaround, is your program running 32-bit or 64-bit?
Guy Harris
2010-01-31 22:19:40 UTC
Post by Marco De Angelis
I recompiled tcpdump 4.0.0 on my machine, and it works!
On which machine? The Snow Leopard machine? If so, does the tcpdump 4.0.0 that comes with Snow Leopard *not* work?
Marco De Angelis
2010-02-01 09:08:07 UTC
Hi Guy.

Thanks so much for replying.
Post by Guy Harris
The issue described in that message is fixed in 10.6.2.
That is good to know. I have 10.6.2, but I still experience problems
Post by Guy Harris
These are both BPF issues; libpcap 1.0.0 didn't *introduce* them -
I was just looking at my dependencies, without being sure whether I should investigate
a Snow Leopard bug or the libpcap side.
Post by Guy Harris
So what is the exact problem you're seeing? What is the difference you see between Leopard and Snow Leopard?
(PF_PACKET sockets work differently from BPF, so differences between Linux and {Leopard,Snow Leopard,*BSD} are less interesting here.)
The problem is that the packets are not delivered to the application. More specifically,
it seems that libpcap captures them, but pcap_dispatch (and pcap_loop as well) does
not deliver packets to the pcap_handler. Packets seem to remain in the buffer and
get delivered only when the buffer is full.

With a buffer of 128 bytes (which can hold only one packet), the packets are delivered to
the application immediately.
With a buffer of 1280 bytes, I get the packets delivered in groups of ten, only when the
next ten are collected. Of course, that also means that the last group of packets
remains in the buffer and is never delivered.

The problem is, the same code is working perfectly on all other OSes. Can you suggest something
to try out?
Post by Guy Harris
Post by Marco De Angelis
I recompiled tcpdump 4.0.0 on my machine, and it works!
On which machine? The Snow Leopard machine? If so, does the tcpdump 4.0.0 that comes with
Snow Leopard *not* work?
The tcpdump that comes with the OS worked on Snow Leopard, and so did the one I downloaded and
recompiled. I recompiled it just to be sure that they didn't do some "trick" to make it work.
Maybe I just don't trust the Authority :)

Regards,
Marco
Guy Harris
2010-02-01 23:17:30 UTC
Post by Marco De Angelis
The problem is that the packets are not delivered to the application. More specifically,
it seems that libpcap captures them, but the pcap_dispatch (and pcap_loop as well) does
not deliver packets to the pcap_handler.
What do you mean by "libpcap captures them"? Do you mean that libpcap reads the packets into the userland buffer attached to the pcap_t, or that *BPF* captures them (i.e., they get put into the *kernel* buffer for the BPF device) but libpcap doesn't read them into its userland buffer?
Post by Marco De Angelis
Packets seems to remain in the buffer and they
get delivered only when the buffer is full.
If you're referring to the BPF kernel buffer, that sounds as if the timeout mechanism isn't working. That was a bug that happened in 10.6 and 10.6.1 for 64-bit programs specifying sub-second timeouts, but that's fixed in 10.6.2 - *if* you're using libpcap (rather than using raw BPF; the bug in BPF isn't fixed, it's just worked around in libpcap).

Is your program built as a 32-bit program or a 64-bit program?
Post by Marco De Angelis
Tcpdump worked on Snow Leopard (the one that comes with the O.S.), and also the one I downloaded and
recompiled. I recompiled it just to be sure that they didn't do some "trick" to make it work.
Maybe I just don't trust the Authority :)
Which authority? The one at

http://www.opensource.apple.com/source/tcpdump/tcpdump-27/

(you might have to sign up for ADC to view that, but it's free)?

Presumably the tcpdump you downloaded and recompiled was recompiled on Snow Leopard, which means that, unless your machine has a 32-bit processor, it was compiled, by default, 64-bit. Perhaps something's broken with 32-bit programs, now (it shouldn't be, as the workaround in libpcap shouldn't change what libpcap does in 32-bit mode, but perhaps there's some other issue).
Marco De Angelis
2010-02-03 13:03:20 UTC
Post by Guy Harris
Post by Marco De Angelis
it seems that libpcap captures them, but the pcap_dispatch (and pcap_loop as
well) does not deliver packets to the pcap_handler.
What do you mean by "libpcap captures them"? Do you mean that libpcap reads
the packets into the userland buffer attached to the pcap_t, or that *BPF*
captures them (i.e., they get put into the *kernel* buffer for the BPF
device) but libpcap doesn't read them into its userland buffer?
Good question. Do you know how I could verify which buffer they stay in? Is there
some printout I could add before calling pcap_dispatch to see what's in the
kernel buffer and what's in the userland buffer?
Post by Guy Harris
Post by Marco De Angelis
Packets seems to remain in the buffer and they
get delivered only when the buffer is full.
If you're referring to the BPF kernel buffer, that sounds as if the timeout
mechanism isn't working. That was a bug that happened in 10.6 and 10.6.1 for
64-bit programs specifying sub-second timeouts, but that's fixed in 10.6.2 -
*if* you're using libpcap (rather than using raw BPF; the bug in BPF isn't
fixed, it's just worked around in libpcap).
I'm only using libpcap to access the device, I never perform direct calls on the
underlying device (the application has to remain platform-neutral). I'm running
on 10.6.2. The timeouts are all set to 1 second.
Post by Guy Harris
Is your program built as a 32-bit program or a 64-bit program?
I was compiling for the native os. The lipo -info command says it is i386. Just
to be sure, I removed all other architectures (ppc and x86_64) from the list of
compilation targets and recompiled. Nothing has changed. I will try building
it for x86_64 and see if anything changes.
Post by Guy Harris
Presumably the tcpdump you downloaded and recompiled was recompiled on Snow
Leopard, which means that, unless your machine has a 32-bit processor

It's a 64-bit processor. Anyway, tcpdump works fine. My line of reasoning is: if
tcpdump works correctly and it always uses libpcap, then I should be able to
capture using the same filter. I just cannot understand what my code (posted
earlier) is doing differently from tcpdump.

Thanks for the support
Marco
Guy Harris
2010-02-03 20:45:43 UTC
Post by Marco De Angelis
Post by Guy Harris
it seems that libpcap captures them, but the pcap_dispatch (and pcap_loop as well) does not deliver packets to the pcap_handler.
What do you mean by "libpcap captures them"? Do you mean that libpcap reads the packets into the userland
buffer attached to the pcap_t, or that *BPF* captures them (i.e., they get put into the *kernel* buffer for
the BPF device) but libpcap doesn't read them into its userland buffer?
Good question. Do you know how could I verify the buffer they stay in? Is there
some printout I could add before calling pcap_dispatch to see what's in the
kernel buffer and what in the userland buffer?
Yes, but you'd have to add it to libpcap. :-)
Post by Marco De Angelis
Post by Guy Harris
Is your program built as a 32-bit program or a 64-bit program?
I was compiling for the native os.
"The native OS" supports both 32-bit and 64-bit userland code on 64-bit processors (regardless of whether the kernel is running 32-bit or 64-bit). Snow Leopard's GCC, if not given any -arch flags, builds 32-bit on 32-bit machines and 64-bit on 64-bit machines.
Post by Marco De Angelis
The lipo -info command says it is i386.
If it said something such as

Non-fat file: XXX is architecture: i386

then that means it's 32-bit only. If it said something such as

Architectures in the fat file: XXX are: x86_64 i386

then it's both 64-bit and 32-bit, and will run 64-bit on 64-bit machines.
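
Besides lipo, a program can report at run time which slice it is executing as. The following is a small sketch, assuming it is compiled into the program being debugged; pointer_bits and timeval_bytes are hypothetical helper names. The size of struct timeval is included because the BPF timeout ioctls discussed in this thread take a timeval-style argument whose size differs between 32-bit and 64-bit builds.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <sys/time.h>

// Width of a pointer, in bits, for the architecture slice actually running.
int pointer_bits()
{
    int bits = static_cast<int>(sizeof(void*) * CHAR_BIT);
    assert(bits == 32 || bits == 64);  // the only widths discussed in this thread
    return bits;
}

// Size in bytes of struct timeval as seen by this build.
std::size_t timeval_bytes()
{
    return sizeof(struct timeval);
}
```

Printing both values at startup, for example with printf("%d-bit, timeval is %zu bytes\n", pointer_bits(), timeval_bytes()), removes any doubt about which architecture of a fat binary was actually launched.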
Post by Marco De Angelis
Just
to be sure, I removed all other architectures (ppc and x86_64) from the list of
compilation targets
Are you using an Xcode project or a Makefile or something else (including "manually running gcc from the command line")? If it's an Xcode project, I'm not sure what the default behavior would be.
Post by Marco De Angelis
and recompiled. Nothing has changed. I will retry to build
it for x86_64 and see if anything changes.
That would be interesting, because...
Post by Marco De Angelis
Post by Guy Harris
Presumably the tcpdump you downloaded and recompiled was recompiled on Snow
Leopard, which means that, unless your machine has a 32-bit processor
It's a 64-bit processor.
...that means that if you just did a configure-and-make on tcpdump, it'd be 64-bit, and...
Post by Marco De Angelis
Anyway, tcpdump works fine.
...it might be that 64-bit code works, even with sub-second timeouts, as of 10.6.2, but 32-bit code is broken, so...
Post by Marco De Angelis
My line of reasoning is: if
tcpdump works correctly and it always uses libpcap, then I should be able to
capture using the same filter. I just cannot understand what my code (posted
earlier) is doing differently from tcpdump.
...what it might be doing differently is "running 32-bit".

I'll look at what happens with 32-bit code as well.
Marco De Angelis
2010-02-03 22:49:56 UTC
Post by Guy Harris
Post by Marco De Angelis
some printout I could add before calling pcap_dispatch to see what's in the
kernel buffer and what in the userland buffer?
Yes, but you'd have to add it to libpcap.
Post by Marco De Angelis
Post by Guy Harris
Is your program built as a 32-bit program or a 64-bit program?
To summarize, the "lipo -info /usr/sbin/tcpdump" command reports, as
expected:

Architectures in the fat file: /usr/sbin/tcpdump are: x86_64 i386

Run on the tcpdump that I recompiled on the same machine using the Makefile, it reports:

Non-fat file: tcpdump is architecture: x86_64

I recompiled the test application "inspector" (which uses libpcap to capture) for
architecture i386, and no packets were delivered by libpcap. I recompiled for
x86_64, and now I get:

Non-fat file: inspector is architecture: x86_64

But the problem is still the same.

I am compiling with Xcode 3.2.1; the gcc version is 4.2.1. The project is generated
by cmake, and the default build architecture is i386. I also tried generating a
Makefile with cmake and using it to compile the application, but without benefit.

I also tried changing snaplen to high and low values, timeout values greater
than 1000, promisc mode on/off, etc. None of these changes seems to modify the
behaviour except the aforementioned pcap_set_buffer_size(128).
Marco De Angelis
2010-02-09 10:15:36 UTC
Post by Guy Harris
Post by Marco De Angelis
Good question. Do you know how could I verify the buffer
they stay in? Is there
some printout I could add before calling pcap_dispatch to see
what's in the kernel buffer and what in the userland buffer?
Yes, but you'd have to add it to libpcap.
I made an interesting test.
By collecting pcap_stats() after every call to pcap_dispatch and
printing the pcap_stat values out, I could verify that the packets
are received.
E.g. if I filter for ICMP packets, by launching "ping" commands
I can see "ps_recv" increase rapidly.

Now, I don't know what "received" means (in userland? in kernel
buffer?), but maybe you do :)

Thanks,
Marco
Carter Bullard
2010-02-09 17:41:19 UTC
Hey Marco,
This may help you if you are not doing it. It seemed to help me on Snow Leopard.

Just after the call to pcap_open_live(), I set this ioctl. You may not need the pcap_setnonblock() for
your application.

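if ((pd = pcap_open_live(device->name, snaplen, !pflag, 100, errbuf)) != NULL) {
    pcap_setnonblock(pd, 1, errbuf);

#if defined(__APPLE_CC__) || defined(__APPLE__)
    /* Deliver packets as soon as they arrive instead of buffering them.
       BIOCIMMEDIATE needs <sys/ioctl.h> and <net/bpf.h>. */
    int v = 1;
    ioctl(pcap_fileno(pd), BIOCIMMEDIATE, &v);
#endif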


Carter
Guy Harris
2010-02-10 05:37:18 UTC
Post by Carter Bullard
Just after the call to pcap_open_live(), I set this ioctl. You may not need the pcap_setnonblock() for
your application.
if ((pd = pcap_open_live(device->name, snaplen, !pflag, 100, errbuf)) != NULL) {
That's a sub-second timeout, which definitely *won't* work in 64-bit mode on 10.6 and 10.6.1, but should work in 10.6.2 (and in 10.5.x and earlier). On 10.6 and 10.6.1, you'd see the symptoms you described, because the timeout would be set to, in effect, 0, meaning "no timeout", so the packets won't be delivered until the "store buffer" fills up with packets. Setting BIOCIMMEDIATE mode means that packets will not be buffered at all, so they're delivered as soon as they arrive.

It seems unlikely that, at least in 64-bit mode, the problem would have *started* with an upgrade from 10.6 or 10.6.1 to 10.6.2; that upgrade fixed a similar problem with Wireshark. If it's still there with 10.6.2, as you and Marco note is the case, something else is happening; I'll look into that.
Marco De Angelis
2010-02-10 21:15:37 UTC
Post by Carter Bullard
Hey Marco,
This may help you if you are not doing it. It seemed to help me on Snow Leopard.
Carter, thank you so much! It works nicely with this addition. I understand that
BIOCIMMEDIATE changes the behaviour and avoids buffering, therefore I will
confine the changes to Snow Leopard only.

Still, I am puzzled by the fact that tcpdump is not invoking ioctl() and yet
it works perfectly fine on my machine.

Still, thank you again.
Marco
Guy Harris
2010-02-10 08:47:33 UTC
Post by Marco De Angelis
I made an interesting test.
By collecting pcap_stats() after every call to pcap_dispatch and
printing the pcap_stat values out, I could verify that the packets
are received.
E.g. if I filter for ICMP packets, by launching "ping" commands
I can see "ps_recv" increase rapidly.
Now, I don't know what "received" means (in userland? in kernel
buffer?), but maybe you do :)
I know it depends on the platform. :-)

In BPF-based systems such as *BSD and OS X, it counts packets that are seen by the BPF mechanism, regardless of whether they pass the capture filter or not, so it can count packets that aren't even put into the *kernel* buffer. If you have no capture filter, so that all packets "pass the filter", it counts packets put into the kernel buffer, regardless of whether they've been read into userland.

So it sounds as if, for some reason, the timer isn't expiring and causing packets to be delivered.

Your code snippet shows pcap_dispatch() being called at the beginning of a "loop forever" loop, so I presume you're not doing a select() to wait for packets to arrive (that has a problem in older versions of *BSD and still has a problem in OS X).

Could you - and Carter - put, into your programs, the following includes (if they're not already there):

#include <string.h>
#include <errno.h>
#include <sys/ioctl.h>

and, before the include of pcap.h, add

#define PCAP_DONT_INCLUDE_PCAP_BPF_H

and then, after the include of pcap.h, add

#include <net/bpf.h>

and then, in the routine/method that calls pcap_open_live() or pcap_activate(), add

char errbuf[PCAP_ERRBUF_SIZE];
struct BPF_TIMEVAL t;

and, after the pcap_open_live() or pcap_activate() call, do

    if (ioctl(pcap_fileno(pd), BIOCGRTIMEOUT, &t) == -1) {
        fprintf(stderr, "bpftest: BIOCGRTIMEOUT failed: %s\n",
            strerror(errno));
        return 2;
    }
    printf("BIOCGRTIMEOUT = %#08lx, t.tv_sec = %d, t.tv_usec = %d\n",
        (unsigned long)BIOCGRTIMEOUT, t.tv_sec, t.tv_usec);

where:

1) "pd" is the return value from pcap_create() or pcap_open_live() (pcapSession, in Marco's code snippet; pd, in Carter's);

2) the printf() call can be replaced by a C++ equivalent, if the program is in C++, and if the program isn't something that runs from the command line, the code can be modified to arrange that the output be somehow visible.

Then run the program and reply with the output it produces.
Marco De Angelis
2010-02-10 21:42:28 UTC
Post by Guy Harris
Your code snippet shows pcap_dispatch() being called at the
beginning of a "loop forever" loop, so I presume you're not
doing a select() to wait for packets to arrive (that has a problem
in older versions of *BSD and still has a problem in OS X).
So the call to pcap_dispatch not preceded by a select() could still
cause problems in 10.6.2?

I also tried replacing the call to pcap_dispatch with pcap_loop,
without luck. Higher timeout values (2000) for pcap_open_live
did not change the behaviour.
Post by Guy Harris
Then run the program and reply with the output it produces.
This is the output on my machine:

BIOCGRTIMEOUT = 0x4008426e, t.tv_sec = 1, t.tv_usec = 0

Although Carter's suggestion works fine, I could see that tcpdump
is not performing any ioctl() call (nor any select() call). That still
puzzles me, as tcpdump works perfectly.

Thanks for the kind support,
Marco
Guy Harris
2010-02-11 08:20:13 UTC
Post by Marco De Angelis
So the call to pcap_dispatch not preceded by a select() could still
cause problems in 10.6.2?
It *shouldn't* cause problems, but, from what you and Carter are reporting, it *does* cause problems.
Post by Marco De Angelis
BIOCGRTIMEOUT = 0x4008426e, t.tv_sec = 1, t.tv_usec = 0
OK, that's a 1-second timeout, and that's the right ioctl code for BIOCGRTIMEOUT in a world where its argument is a 32-bit "struct timeval" (which is not correct in 64-bit-land on OS X, which is the problem I mentioned in 10.6 and 10.6.1, but libpcap works around that in 10.6.2, and in the top-of-Git-tree version on tcpdump.org).

So it appears that the timeout is correct.
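
For readers wondering how an ioctl code reveals the size of its argument, here is a small decoding sketch. It assumes the BSD-style request layout from <sys/ioccom.h> (direction in the top bits, a 13-bit argument size, then a group character and a command number); the constants are redefined under hypothetical names so the sketch compiles anywhere, and encode_ioctl, decode_ioctl and IoctlFields are likewise illustrative names rather than system APIs.

```cpp
#include <cassert>
#include <stdint.h>

// BSD-style ioctl request layout, redefined locally (hypothetical names).
const uint32_t kIocOut = 0x40000000u;   // kernel copies data out to the caller
const uint32_t kIocParmMask = 0x1fffu;  // 13-bit argument-size field

struct IoctlFields {
    uint32_t direction;  // top three bits of the request
    uint32_t size;       // size in bytes of the ioctl argument
    char group;          // subsystem letter, 'B' for BPF
    uint32_t number;     // command number within the group
};

// Builds a request code from its parts.
uint32_t encode_ioctl(uint32_t direction, char group, uint32_t number,
                      uint32_t size)
{
    assert(size <= kIocParmMask && number <= 0xffu);
    return direction | (size << 16) |
           (static_cast<uint32_t>(static_cast<uint8_t>(group)) << 8) | number;
}

// Splits a request code into its parts.
IoctlFields decode_ioctl(uint32_t request)
{
    IoctlFields f;
    f.direction = request & 0xe0000000u;
    f.size = (request >> 16) & kIocParmMask;
    f.group = static_cast<char>((request >> 8) & 0xffu);
    f.number = request & 0xffu;
    return f;
}
```

Decoding 0x4008426e gives direction "out", size 8, group 'B' and number 110. The 8-byte size matches a timeval made of two 32-bit fields. Encoding the same command with a 16-byte argument would give 0x4010426e instead, so the printed code alone shows which struct size the program was built with.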

Can you cut your application down to the smallest code snippet that shows the problem, and send that to me? You don't have to print the contents of the packets, just report when they arrive (or don't arrive, in the case you're dealing with). I'll try running it on 10.6.2 and see what the heck is going on; there's nothing obvious in 10.6.2 that should be causing this problem, and, in fact, tcpdump isn't exhibiting the problem.
Marco De Angelis
2010-02-12 19:02:04 UTC
Post by Guy Harris
Can you cut your application down to the smallest code
snippet that shows the problem, and send that to me?
I managed to extract the core. It's a little messy because
of the many tests I made recently and the 80-character line
limit, but it shows the original problem.
If you uncomment the call to ioctl with
BIOCIMMEDIATE setting, packets get delivered immediately.
I tested it with "libpcaptest en1 icmp" and while running
"ping google.com".

#include <iostream>
#include <stdio.h>
#include <string>
#include <vector>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/time.h>
#include <net/bpf.h>
#include <pcap.h>

pcap_t* create(const std::string& name,
const std::string& pcapFilter,
uint32_t snapLen, bool promisc);
bool capture(pcap_t * pcapSession);
void close(pcap_t* pcapSession);


int main(int argc, char** argv)
{
if (argc != 3)
{
std::cerr << "Usage: libpcaptest <interface> <filter>"
<< std::endl;
return 1;
}

std::string name(argv[1]), filter(argv[2]);
std::cout << "Capturing from '" << name << "' with filter '"
<< filter << "'" << std::endl;

pcap_t * pcapSession = create(name, filter, 128, true);
capture(pcapSession);
close(pcapSession);
return 0;
}

/**
This is the callback
**/
void test_pcap_handler(u_char* user, const struct pcap_pkthdr* header,
const u_char* pkt_data)
{
std::cout << "Packet captured" << std::endl;
}

/**
Temporarily used, since on Windows pcap_compile does not declare
the filter string it takes as 'const char*'...
**/
void duplicateFilterString(const std::string& pcapFilter,
std::vector<char>& dupFilter)
{
dupFilter.clear();
dupFilter.resize(pcapFilter.size()+1, 0);

for (uint32_t i=0; i<pcapFilter.size(); ++i)
dupFilter[i] = pcapFilter[i];
}

void close(pcap_t* pcapSession)
{
if (pcapSession)
{
pcap_close(pcapSession);
}
}


pcap_t* create(const std::string& name,
const std::string& pcapFilter,
uint32_t snapLen, bool promisc)
{
char errbuf[PCAP_ERRBUF_SIZE];
pcap_t* pcapSession;

struct BPF_TIMEVAL t;

if ((pcapSession = pcap_open_live(name.c_str(),
snapLen, promisc ? 1 : 0, 1000, errbuf)) == NULL)
{
std::cerr << "Failed pcap_open_live because <"
<<errbuf<<">" << std::endl;
return NULL;
}

if (ioctl(pcap_fileno(pcapSession), BIOCGRTIMEOUT, &t) == -1) {
// ioctl() reports failure through errno; it does not fill in errbuf
const int err = errno;
std::cerr << "Failed ioctl with BIOCGRTIMEOUT because <"
<<strerror(err)<<">" << std::endl;
pcap_close(pcapSession);
return NULL;
}
printf("BIOCGRTIMEOUT = %#08lx, t.tv_sec = %d, t.tv_usec = %d\n",
(unsigned long)BIOCGRTIMEOUT, (int)t.tv_sec, (int)t.tv_usec);

// compile the filter if it's been supplied or snapLen is provided
if (pcapFilter.empty()==false || snapLen<65535)
{
// get netmask
// zero-initialized so the mask is defined even if pcap_lookupnet fails
bpf_u_int32 pcapNetaddr = 0, pcapMask = 0;
pcap_lookupnet(name.c_str(), &pcapNetaddr, &pcapMask, errbuf);

struct bpf_program pcapFilterProgram;
std::vector<char> filterDup;
duplicateFilterString(pcapFilter, filterDup);

if (pcap_compile(pcapSession, &pcapFilterProgram,
&filterDup[0], 1, pcapMask) == -1)
{
std::string error = pcap_geterr(pcapSession);
pcap_close(pcapSession);
std::cerr << "Failed pcap_compile because <"
<<error<<">" << std::endl;
return NULL;
}

if (pcap_setfilter(pcapSession, &pcapFilterProgram) == -1)
{
std::string error = pcap_geterr(pcapSession);
pcap_freecode(&pcapFilterProgram);
pcap_close(pcapSession);
std::cerr << "Failed pcap_setfilter because <"
<<error<<">" << std::endl;
return NULL;
}

pcap_freecode(&pcapFilterProgram);
}

// set session in non blocking mode
if (pcap_setnonblock(pcapSession, 1, errbuf)!=0)
{
pcap_close(pcapSession);
std::cerr << "Failed pcap_setnonblock because <"
<<errbuf<<">" << std::endl;
return NULL;
}

/*
Enable this for immediate delivery of packets through callback.

uint32_t v = 1;
if (ioctl(pcap_fileno(pcapSession), BIOCIMMEDIATE, &v) < 0) {
pcap_close(pcapSession);
std::cerr << "Failed ioctl BIOCIMMEDIATE" << std::endl;
return NULL;
}
*/

int dlt;
const char *dlt_name;
dlt = pcap_datalink(pcapSession);
dlt_name = pcap_datalink_val_to_name(dlt);
if (dlt_name == NULL) {
(void)fprintf(stderr,
"listening on %s, link-type %d, capture size %u bytes\n",
name.c_str(), dlt, snapLen);
} else {
(void)fprintf(stderr,
"listening on %s, link-type %s (%s), capture size %u bytes\n",
name.c_str(), dlt_name,
pcap_datalink_val_to_description(dlt), snapLen);
}

return pcapSession;
}

bool capture(pcap_t * pcapSession)
{
struct pcap_stat pcapStats;

while (true)
{
int32_t ret = pcap_dispatch(pcapSession, 100,
test_pcap_handler, (u_char*)NULL);
std::cout << "Read " << ret << " packets" << std::endl;
if (pcap_stats(pcapSession, &pcapStats) != 0)
{
std::string error = pcap_geterr(pcapSession);
std::cerr << "Failed pcap_stats because <"
<<error<<">" << std::endl;
return false;
}
std::cout << "ReceivedPackets " << pcapStats.ps_recv <<
" DroppedPackets " << pcapStats.ps_drop <<
" I/F DroppedPackets " << pcapStats.ps_ifdrop << std::endl;


if (ret==-1)
{
std::string error = pcap_geterr(pcapSession);
std::cerr << "Failed pcap_dispatch because <"<<error<<">" << std::endl;
return false;
}


sleep(5);
}
return true;
}
Guy Harris
2010-02-13 00:52:05 UTC
Permalink
Post by Marco De Angelis
Post by Guy Harris
Can you cut your application down to the smallest code
snippet that shows the problem, and send that to me?
I managed to extract the core. It's a little messy because
of the many tests I made recently and the 80-character line
limit, but it shows the original problem.
If you uncomment the ioctl call that sets
BIOCIMMEDIATE, packets get delivered immediately.
...and if I "#if 0" out the code that puts the pcap into non-blocking mode, packets don't get delivered immediately, but they *do* arrive, so it appears to be an issue with non-blocking mode.
From looking at both the FreeBSD and OS X BPF code path in the kernel, I suspect the problem would also show up on FreeBSD 7.x and 8.x, at least, as well as 9.0-CURRENT; there are probably older versions that have it as well. However, if you use select() to wait for packets, it probably won't occur on FreeBSD, but would still occur on OS X. Top-of-tree NetBSD looks as if it fixes this, and I don't know if OpenBSD ever had it (the top-of-tree code looks as if the change that introduced this was never added). DragonFly BSD appears to be the same as FreeBSD here.
If you have an Apple Developer Connection account, go to

http://developer.apple.com/bugreporter/

and log in to Apple Bug Reporter and file a bug; please send me the bug number when you do so. If you don't have an ADC account, that page has a "Sign up to become a free ADC Member" link.

This problem was introduced in Snow Leopard as a result of picking up from FreeBSD a fix to another bug; the existence in FreeBSD of that fix is the reason why I suspect the same problem would show up there (and the existence in FreeBSD of a fix to the way BPF works with select() is the reason why I suspect using select() to wait for packets would prevent the bug from appearing there - that fix is *not* in OS X).

Anybody else seeing this problem should also file a bug and send me the bug number. (The more bugs filed, the more likely I suspect it'll be to get fixed in a software update.)

If it shows up in FreeBSD, I'll look at submitting fixes for it and DragonFly BSD as well. (If it doesn't, I need to think about this a little more.)
Guy Harris
2010-02-13 08:45:58 UTC
Permalink
Post by Guy Harris
If it shows up in FreeBSD, I'll look at submitting fixes for it and DragonFly BSD as well.
It shows up in FreeBSD 7.0 as well, as I suspected. I've submitted a FreeBSD bug, kern/143855, and a DragonFly BSD bug.
Marco De Angelis
2010-02-15 23:55:40 UTC
Permalink
Post by Guy Harris
...and if I "#if 0" out the code that puts the pcap into
non-blocking mode, packets don't get delivered
immediately, but they *do* arrive, so it appears to be an
issue with non-blocking mode.
Oh, I see.

I have set non-blocking mode to 0, expecting
the call to pcap_dispatch to hang while no packets are
collected. Instead, I see many printouts (Read 0 packets),
which indicate that pcap_dispatch returned without
dispatching any packets. So, is non-blocking mode buggy, or
am I misunderstanding how it works?

marco$ sudo ./libpcaptest en1 icmp
Capturing from 'en1' with filter 'icmp'
BIOCGRTIMEOUT = 0x4008426e, t.tv_sec = 1, t.tv_usec = 0
listening on en1, link-type EN10MB (Ethernet), capture size 128 bytes
Packet captured
Read 1 packets
ReceivedPackets 4 DroppedPackets 0 I/F DroppedPackets 0
Read 0 packets
ReceivedPackets 4 DroppedPackets 0 I/F DroppedPackets 0
Read 0 packets
ReceivedPackets 4 DroppedPackets 0 I/F DroppedPackets 0
Read 0 packets
ReceivedPackets 4 DroppedPackets 0 I/F DroppedPackets 0
Read 0 packets
ReceivedPackets 4 DroppedPackets 0 I/F DroppedPackets 0
Packet captured
Read 1 packets
ReceivedPackets 9 DroppedPackets 0 I/F DroppedPackets 0
^C
Post by Guy Harris
If you have an Apple Developer Connection account, go to
http://developer.apple.com/bugreporter/
I have an account. I'll file the bug report and send you the code.
Guy Harris
2010-02-19 18:30:47 UTC
Permalink
Post by Marco De Angelis
I have set non-blocking mode to 0, expecting
the call to pcap_dispatch to hang while no packets are
collected. Instead, I see many printouts (Read 0 packets),
which indicate that pcap_dispatch returned without
dispatching any packets. So, is non-blocking mode buggy, or
am I misunderstanding how it works?
Non-blocking mode is, indeed, buggy in Snow Leopard; a non-blocking read from a BPF device will return "no data available" unless the "store buffer" fills up. It's also buggy in FreeBSD and DragonFly BSD, but, on sufficiently-up-to-date versions of those systems, *if* you do a select() on a BPF device:

1) the select will return if the "store buffer" fills up *or* the timer expires;

2) either of those two will cause the "store buffer" to be rotated to the "hold buffer", so a non-blocking read will return data.

Neither of those is true in Snow Leopard. (The first of those isn't true in *any* OS X release - the select() only returns if the "store buffer" fills up - and is also not true in older versions of various BSDs.)
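
The difference between the two behaviours can be illustrated with a toy model. This is only a sketch, not kernel code: packets are plain ints, the capacity counts packets rather than bytes, and the class name and the "rotateOnTimeout" switch are invented for the illustration. With the switch off, a non-blocking read only sees data once the store buffer has filled; with it on, a timer expiry also rotates the buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a BPF device's "store" and "hold" buffers (hypothetical names).
class BpfBufferModel {
public:
    BpfBufferModel(std::size_t capacity, bool rotateOnTimeout)
        : capacity_(capacity), rotateOnTimeout_(rotateOnTimeout)
    {
        assert(capacity > 0);
    }

    // A packet matching the filter arrives from the network.
    void packetArrives(int id)
    {
        if (store_.size() >= capacity_ && !rotate())
            return;  // both buffers are occupied: the packet is dropped
        store_.push_back(id);
        if (store_.size() >= capacity_)
            rotate();
    }

    // The read timeout expires. Only the fixed behaviour rotates here.
    void timerExpires()
    {
        if (rotateOnTimeout_)
            rotate();
    }

    // Non-blocking read: hands over the hold buffer; empty means "no data".
    std::vector<int> readNonBlocking()
    {
        std::vector<int> delivered;
        delivered.swap(hold_);
        return delivered;
    }

private:
    // Move the store buffer to the hold buffer if the hold buffer is free.
    bool rotate()
    {
        if (!hold_.empty() || store_.empty())
            return false;
        hold_.swap(store_);
        return true;
    }

    std::size_t capacity_;
    bool rotateOnTimeout_;
    std::vector<int> store_;
    std::vector<int> hold_;
};
```

In this model, a device built with capacity 1 and rotateOnTimeout false still delivers every packet on the next read, which matches the small-buffer trick described at the start of the thread, while a larger capacity leaves the most recent packets sitting in the store buffer.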

I've submitted bug reports for FreeBSD and DragonFly BSD; Matt Dillon said that he picked up my fix for DFly BSD. I've also attached my fix to the OS X bug you filed.

A read on a BPF device that's *not* in non-blocking mode will block if no packets are available. *However*, if you've set a timeout on the BPF device - as libpcap does if you specify a timeout in pcap_open_live(), or in pcap_set_timeout() before pcap_activate() - it won't block *forever*; if the timeout expires, and no packets have arrived, it'll return, saying no packets arrived. *That's* why, even without turning non-blocking mode on, you *eventually* get "Read 0 packets" indications - you've set a timeout of 1 second. However, the pcap_dispatch() call *will* block for a second.
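
For the capture loop, this means a return value of 0 is not an error. A minimal sketch of how the return value of pcap_dispatch() could be classified, assuming the documented return codes (a positive packet count, 0 when nothing was read, -1 on error, -2 after pcap_breakloop()); the enum and function names are hypothetical:

```cpp
#include <cassert>

// Hypothetical classification of a pcap_dispatch() return value.
enum DispatchOutcome {
    PacketsDelivered,  // ret > 0: that many packets went to the callback
    NothingYet,        // ret == 0: timeout expired, or nothing buffered
    LoopBroken,        // ret == -2: pcap_breakloop() was called
    CaptureError       // ret == -1: details are available from pcap_geterr()
};

DispatchOutcome classifyDispatchResult(int ret)
{
    if (ret > 0)
        return PacketsDelivered;
    if (ret == 0)
        return NothingYet;
    if (ret == -2)
        return LoopBroken;
    return CaptureError;
}
```

A loop would keep going on PacketsDelivered and NothingYet, and stop only on the other two outcomes.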

Note that there is no guarantee that, on all platforms, pcap_dispatch() will return after the timeout expires even if no packets have arrived. That is *not* the case on Solaris, for example, and it might or might not work on Linux.
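
One way to avoid depending on the read timeout is to wait on the capture descriptor with select() and a timeout of your own before calling pcap_dispatch(). The helper below is a sketch of that wait; the function name is made up, and the descriptor would typically come from pcap_get_selectable_fd(). As described above, on Snow Leopard select() on a BPF device only reports readiness once the "store buffer" fills up, so this bounds the wait but does not by itself work around the bug there.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Wait until fd is readable or timeoutMs milliseconds have passed.
// Returns true if fd is readable, false on timeout or on a select() error
// (including interruption by a signal).
bool waitReadable(int fd, int timeoutMs)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    assert(timeoutMs >= 0);

    fd_set readFds;
    FD_ZERO(&readFds);
    FD_SET(fd, &readFds);

    struct timeval tv;
    tv.tv_sec = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;

    const int ready = select(fd + 1, &readFds, NULL, NULL, &tv);
    return ready > 0 && FD_ISSET(fd, &readFds);
}
```

A capture loop would call waitReadable() with the pcap descriptor, call pcap_dispatch() only when it returns true, and otherwise do its periodic work and wait again.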