Pablo Neira Ayuso | 4 May 12:50 2015

[PATCH 0/4] Netfilter ingress support (v3)


Another round of the patchset to add Netfilter ingress support. This new
patchset introduces the necessary updates in 2 steps:

1) Add minimalistic ingress hook infrastructure that allows registering one
   client at a time, so you hit -EBUSY in case the hook is in use. Basically,
   we have an RCU-protected function pointer that invokes the corresponding
   filter framework. This has minimal performance impact in the critical
   ingress path and avoids polluting it further. This patch also ports the
   ingress qdisc on top of this.

   As a result, most of the qdisc ingress code that used to be embedded
   in net/core/dev.c can now be placed in net/sched/sch_ingress.c, which
   should allow getting rid of the Qdisc->enqueue() call.

2) Add Netfilter ingress support using the minimalistic hook infrastructure.
   There is some extra memory consumption (24 bytes) in net_device but pahole
   reports here a hole due to ____cacheline_aligned_in_smp to get the transmit
   path area in a different cache line. So I'm not sure it's worth the effort
   to reduce this to 8 bytes at the cost of getting the hook code a bit more

As already said, this opens the window to existing nftables core features that
are not present in qdisc ingress and that can be used out-of-the-box, most

1) Multi-dimensional key dictionary lookups.
2) Arbitrary stateful flow tables.
3) Transactions.
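
The single-client registration described in step 1 of the patchset outline
can be sketched in userspace C as follows. This is only an illustration under
stated assumptions, not the code from the patchset: all names
(ingress_hook_register, ingress_run, struct pkt, drop_empty) are hypothetical,
and a plain pointer stands in for the RCU-protected one, so the sketch is
single-threaded only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet; not a kernel type. */
struct pkt { int len; };

typedef int (*ingress_hook_fn)(struct pkt *p);

/* Single slot. The kernel version would be an RCU-protected pointer. */
static ingress_hook_fn ingress_hook;

/* Register a client; only one client may own the hook at a time. */
static int ingress_hook_register(ingress_hook_fn fn)
{
	assert(fn != NULL);
	if (ingress_hook != NULL)
		return -EBUSY;
	ingress_hook = fn;
	return 0;
}

/* Release the slot so that another client can register. */
static void ingress_hook_unregister(void)
{
	ingress_hook = NULL;
}

/* Fast path: one pointer load and, if a client is set, one indirect call. */
static int ingress_run(struct pkt *p)
{
	ingress_hook_fn fn = ingress_hook;

	return fn ? fn(p) : 0;
}

/* Example client: returns 1 (drop) for empty packets, 0 (pass) otherwise. */
static int drop_empty(struct pkt *p)
{
	return p->len == 0;
}
```

With no client registered, ingress_run() costs a NULL check and nothing else,
which is the property the cover letter is after. A second call to
ingress_hook_register() fails with -EBUSY until the first client unregisters.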

Daniel Borkmann | 4 May 12:23 2015

[PATCH iptables] libxt_CT: add support for flextuples

This adds iptables user space part for configuration of flextuples.
I also noticed that ct_print*() zone reporting had a whitespace bug,
which is fixed here as well (it needs to be prefixed).

Signed-off-by: Daniel Borkmann <daniel <at>>
Signed-off-by: Thomas Graf <tgraf <at>>
Signed-off-by: Madhu Challa <challa <at>>
 extensions/libxt_CT.c           | 41 +++++++++++++++++++++++++++++++++++++++--
 extensions/         |  7 +++++++
 include/linux/netfilter/xt_CT.h |  2 ++
 3 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/extensions/libxt_CT.c b/extensions/libxt_CT.c
index 6b28fe1..e7df772 100644
--- a/extensions/libxt_CT.c
+++ b/extensions/libxt_CT.c
@@ -40,6 +40,7 @@ enum {

 #define s struct xt_ct_target_info
@@ -51,6 +52,7 @@ static const struct xt_option_entry ct_opts[] = {
 	{.name = "expevents", .id = O_EXPEVENTS, .type = XTTYPE_STRING},
 	{.name = "zone", .id = O_ZONE, .type = XTTYPE_UINT16,
 	 .flags = XTOPT_PUT, XTOPT_POINTER(s, zone)},
+	{.name = "flextuple", .id = O_FLEXTUPLE, .type = XTTYPE_STRING},

Daniel Borkmann | 4 May 12:23 2015

[PATCH nf-next] netfilter: conntrack: add support for flextuples

This patch adds support for doing NAT with conflicting IP
address/port tuples from multiple, isolated tenants, represented
as network namespaces and netfilter zones. For such internal VRFs,
traffic is directed to a single or shared pool of public IP
address/port range for the external/public VRF.

Or in other words, this allows for doing NAT *between* VRFs
instead of *inside* VRFs without requiring each tenant to NAT
twice or to use its own dedicated IP address to SNAT to. As a
side effect, there is no need to expose a unique per-tenant
marker in the data center to the public.

Simplified example scheme:

  +--- VRF A ---+  +--- CT Zone 1 --------+
  |  +--+ ESTABLISHED |
  +-------------+  +--+-------------------+
                   | L3  +-SNAT-[]--eth0
  +-- VRF B ----+  +--- CT Zone 2 --------+
  |  +--+ ESTABLISHED |
  +-------------+  +----------------------+

VRF A and VRF B are two tenants, e.g. represented as a network
namespace. The connection state for each VRF is tracked separately
to implement differing policies, and thus results in one zone per
VRF. The operator does L3 between VRFs using any kind of L3 routing

Florian Westphal | 3 May 22:06 2015

[PATCH -next] netfilter: bridge: free nf_bridge info on xmit

nf_bridge information is only needed for -m physdev, so we can always free
it after POST_ROUTING.  This has the advantage that allocation and free will
typically happen on the same cpu.
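
The "drop the reference, then clear the pointer" shape of the helper this
patch adds can be sketched in userspace C as below. It is a rough
illustration only: struct bridge_info, struct buf and the function names are
hypothetical stand-ins for nf_bridge_info, sk_buff and nf_bridge_put(), and
the reference count here is a plain int rather than an atomic one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel structures. */
struct bridge_info { int refcnt; };
struct buf { struct bridge_info *info; };

/* Allocate an info block holding one reference, or return NULL. */
static struct bridge_info *bridge_info_alloc(void)
{
	struct bridge_info *bi = calloc(1, sizeof(*bi));

	if (bi)
		bi->refcnt = 1;
	return bi;
}

/* Drop one reference and free the block when the last one goes away. */
static void bridge_info_put(struct bridge_info *bi)
{
	assert(bi->refcnt > 0);
	if (--bi->refcnt == 0)
		free(bi);
}

/* Same shape as the helper in the diff: put, then clear the pointer. */
static void buf_info_free(struct buf *b)
{
	if (b->info) {
		bridge_info_put(b->info);
		b->info = NULL;
	}
}
```

Clearing the pointer after the put makes the helper safe to call more than
once on the same buffer, and later code cannot reach the released block
through it.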

Signed-off-by: Florian Westphal <fw <at>>
The last changed line had spaces instead of a tab, that's why it appears in the diff.

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 13973da..2b0e8bb 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -129,6 +129,14 @@ static struct nf_bridge_info *nf_bridge_info_get(const struct sk_buff *skb)
 	return skb->nf_bridge;
 }
 
+static void nf_bridge_info_free(struct sk_buff *skb)
+{
+	if (skb->nf_bridge) {
+		nf_bridge_put(skb->nf_bridge);
+		skb->nf_bridge = NULL;
+	}
+}
+
 static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
 {
 	struct net_bridge_port *port;
@@ -841,6 +849,7 @@ static int br_nf_push_frag_xmit(struct sock *sk, struct sk_buff *skb)
 	skb_copy_to_linear_data_offset(skb, -data->size, data->mac, data->size);
 	__skb_push(skb, data->encap_size);

Florian Westphal | 3 May 22:05 2015

[PATCH -next] netfilter: bridge: neigh_head and physoutdev can't be used at same time

The neigh_header is only needed when we detect DNAT after prerouting
and neigh cache didn't have a mac address for us.

The output port has not been chosen yet so we can re-use the storage
area, bringing struct size down to 32 bytes on x86_64.
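
The storage sharing can be illustrated with a reduced struct. This is a
sketch under assumptions, not the real nf_bridge_info layout: only the three
members relevant to the union are kept, the struct names info_separate and
info_shared are made up, and the sizes it yields are not the 32 bytes quoted
above.

```c
#include <assert.h>
#include <stddef.h>

struct net_device;	/* opaque here; only pointers to it are used */

/* Before: each member occupies its own storage. */
struct info_separate {
	struct net_device *physindev;
	struct net_device *physoutdev;
	char neigh_header[8];
};

/* After: the two members that are never live at the same time overlap. */
struct info_shared {
	struct net_device *physindev;
	union {
		struct net_device *physoutdev;
		char neigh_header[8];
	};
};

/* Compile-time check (C11) that overlapping really shrinks the struct. */
static_assert(sizeof(struct info_shared) < sizeof(struct info_separate),
	      "sharing storage should shrink the struct");

/* Bytes saved on this platform by overlapping the two members. */
static size_t bytes_saved(void)
{
	return sizeof(struct info_separate) - sizeof(struct info_shared);
}
```

The union is only safe because, as the commit message argues, neigh_header is
needed before the output port has been chosen; once physoutdev is written the
header bytes are gone.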

Signed-off-by: Florian Westphal <fw <at>>
 include/linux/skbuff.h    | 8 +++++---
 net/bridge/br_netfilter.c | 2 ++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 66e374d..2cea360 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -170,12 +170,14 @@ struct nf_bridge_info {
-	} orig_proto;
+	} orig_proto:8;
 	bool			pkt_otherhost;
 	unsigned int		mask;
 	struct net_device	*physindev;
-	struct net_device	*physoutdev;
-	char			neigh_header[8];
+	union {
+		struct net_device *physoutdev;
+		char neigh_header[8];

Harald Welte | 3 May 16:23 2015

[PATCH] Add --without-{mysql,pgsql}

In some cases you may not want to build a certain output plugin, even
if the headers/libraries actually exist on the build host.
--- | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/ b/
index c814bec..1a7f8de 100644
--- a/
+++ b/
@@ -85,7 +85,10 @@ if [! test "x$enable_nfacct" = "xyes"]; then

+AC_ARG_WITH([pgsql], AS_HELP_STRING([--without-pgsql], [Build without postgresql output plugin [default=test]]))
+AS_IF([test "x$with_pgsql" != "xno"], [
 if test "x$PQLIBPATH" != "x"; then
@@ -93,7 +96,10 @@ else

+AC_ARG_WITH([mysql], AS_HELP_STRING([--without-mysql], [Build without mysql output plugin [default=test]]))
+AS_IF([test "x$with_mysql" != "xno"], [

Liu Hua | 3 May 11:50 2015

[PATCH] netfilter: fix dependency issues between IPv6 defragmentation and ip6tables

Commit f6318e558806c925029dc101f14874be9f9fa78f fixed some related issues
when ip6tables is enabled. But when IP6_NF_IPTABLES is disabled and
NETFILTER_XT_TARGET_TPROXY is enabled, the build fails with:
"net/built-in.o: In function `tproxy_tg_init':
net/netfilter/xt_TPROXY.c:588: undefined reference to `nf_defrag_ipv6_enable'"
So this patch changes the Kconfig to do what IPv4 does.

Signed-off-by: Liu Hua <sdu.liu <at>>
 net/netfilter/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index f70e34a..34f54a8 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -865,7 +865,7 @@ config NETFILTER_XT_TARGET_TPROXY
 	depends on (IPV6 || IPV6=n)
 	depends on IP_NF_MANGLE
 	select NF_DEFRAG_IPV4
+	select NF_DEFRAG_IPV6
 	  This option adds a `TPROXY' target, which is somewhat similar to
 	  REDIRECT.  It can only be used in the mangle table and is useful



Felix Janda | 2 May 21:51 2015

[iptables PATCH 2/2 RFC] Remove Libc5 support code

Current code makes the assumption that !defined(__GLIBC__) means libc5,
which is very unlikely to be the case nowadays.

Fixes compile error because of conflict between kernel and musl headers.
If libc5 is considered still relevant, I could try to come up with an
autoconf test.
 include/libiptc/ipt_kernel_headers.h | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/include/libiptc/ipt_kernel_headers.h b/include/libiptc/ipt_kernel_headers.h
index 18861fe..a5963e9 100644
--- a/include/libiptc/ipt_kernel_headers.h
+++ b/include/libiptc/ipt_kernel_headers.h
@@ -5,7 +5,6 @@

 #include <limits.h>

-#if defined(__GLIBC__) && __GLIBC__ == 2
 #include <netinet/ip.h>
 #include <netinet/in.h>
 #include <netinet/ip_icmp.h>
@@ -13,15 +12,4 @@
 #include <netinet/udp.h>
 #include <net/if.h>
 #include <sys/types.h>
-#else /* libc5 */
-#include <sys/socket.h>
-#include <linux/ip.h>

Felix Janda | 2 May 21:51 2015

[iptables PATCH 1/2] Consistently use <errno.h>

On glibc, <sys/errno.h> is a synonym for <errno.h>.
<errno.h> is specified by POSIX, so use that.

Fixes compilation error with musl libc
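
For reference, a small portable use of <errno.h> looks like the sketch below.
The helper name try_open is made up for this illustration and is not part of
iptables.

```c
#include <assert.h>
#include <errno.h>	/* the header specified by POSIX, not <sys/errno.h> */
#include <stdio.h>

/* Try to open a path read-only; return 0 on success, else the errno value. */
static int try_open(const char *path)
{
	FILE *f;

	assert(path != NULL);
	errno = 0;
	f = fopen(path, "r");
	if (!f)
		return errno;
	fclose(f);
	return 0;
}
```

Including only <errno.h> keeps this independent of whether the C library
ships a <sys/errno.h> wrapper.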
 iptables/ip6tables-restore.c | 2 +-
 iptables/ip6tables-save.c    | 2 +-
 iptables/iptables-restore.c  | 2 +-
 iptables/iptables-save.c     | 2 +-
 iptables/iptables-xml.c      | 2 +-
 iptables/xtables-restore.c   | 2 +-
 iptables/xtables-save.c      | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/iptables/ip6tables-restore.c b/iptables/ip6tables-restore.c
index 0f4dd97..9393924 100644
--- a/iptables/ip6tables-restore.c
+++ b/iptables/ip6tables-restore.c
@@ -9,7 +9,7 @@

 #include <getopt.h>
-#include <sys/errno.h>
+#include <errno.h>
 #include <stdbool.h>
 #include <string.h>
 #include <stdio.h>
diff --git a/iptables/ip6tables-save.c b/iptables/ip6tables-save.c
index 56e5afb..f35e921 100644
--- a/iptables/ip6tables-save.c

Jozsef Kadlecsik | 2 May 19:27 2015

[PATCH 00/34] ipset patches for nf-next

Hi Pablo,

Please consider applying the next bunch of patches for ipset. The patchset
contains the RCU introduction in ipset, split into six parts for easier
review. There are also some bugfixes and a lot of small corrections as well.

* Remove rbtree from ip_set_hash_netiface.c in order to introduce RCU.
* Replace rwlock_t with spinlock_t in "struct ip_set", change the locking
  in the core and simplifications in the timeout routines.
* Introduce RCU locking in bitmap:* types with a slight modification in the
  logic on how an element is added.
* Introduce RCU locking in hash:* types. This is the most complex part of
  the changes.
* Introduce RCU locking in list type where standard rculist is used.
* Fix parallel resizing and listing of the same set so that the original
  set is kept for the whole dumping.
* Fix the sparse warning: cast to restricted __be32
* Use MSEC_PER_SEC consistently instead of the number
* Give a better name to a macro in ip_set_core.c
* Missing rcu protection in mtype_list() fixed.
* Make sure listing doesn't grab a set which is just being destroyed.
* Make ip_set_get_ip*_port use skb_network_offset, from Alexander Drozdov.
* Fix cidr handling for hash:*net* types, reported by Jonathan Johnson.
* Properly calculate extensions offsets and total length so that memory
  is not wasted, from Sergey Popovich.
* Make sure bit operations are not reordered in ip_set_hash_gen.h.
* Remove unnecessary nomatch bitfield from Sergey Popovich.
* Preprocessor directives cleanup from Sergey Popovich.
* Return ipset error instead of bool in uadt functions from Sergey Popovich.
* Use SET_WITH_*() helpers to test set extensions from Sergey Popovich.

Linus Lüssing | 1 May 04:56 2015

Matching MLD with ip6tables


According to RFC4890 ("Recommendations for Filtering ICMPv6
Messages in Firewalls"), page 35, a rule like this should match
MLD packets:

$ ip6tables -A icmpv6-filter -p icmpv6 --icmpv6-type {130,131,132,143} ...

However, this does not seem to work for me. My guess is that it
does not match because --protocol is not 'icmpv6' but actually
the hop-by-hop-option first. Is this a bug in the RFC (and if so,
should I report it on some IETF mailing list?)?

Also, is there a way to somehow match IPv6 protocols with IPv6
options in between?

Cheers, Linus
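
For context on the question above: MLD messages are sent with a Router Alert
option inside a Hop-by-Hop Options header, so the Next Header field of the
fixed IPv6 header is 0 rather than 58 (ICMPv6). The sketch below shows how the
upper-layer protocol can be found by walking the extension header chain. It is
an illustration only and says nothing about what ip6tables itself does; the
function name ipv6_upper_proto and the constants are defined locally, and
Fragment, AH and ESP headers are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEXTHDR_HOP	0	/* Hop-by-Hop Options */
#define NEXTHDR_ROUTING	43	/* Routing header */
#define NEXTHDR_ICMPV6	58	/* ICMPv6 */
#define NEXTHDR_DEST	60	/* Destination Options */
#define IPV6_HDR_LEN	40	/* fixed IPv6 header length */

/*
 * Return the upper-layer protocol number of an IPv6 packet, skipping
 * hop-by-hop, routing and destination-options headers.
 * Returns -1 if the packet is too short to contain the headers it claims.
 */
static int ipv6_upper_proto(const uint8_t *pkt, size_t len)
{
	size_t off = IPV6_HDR_LEN;
	uint8_t nh;

	assert(pkt != NULL);
	if (len < IPV6_HDR_LEN)
		return -1;
	nh = pkt[6];	/* Next Header field of the fixed header */

	while (nh == NEXTHDR_HOP || nh == NEXTHDR_ROUTING ||
	       nh == NEXTHDR_DEST) {
		size_t ext_len;

		if (len < off + 2)
			return -1;
		/* Hdr Ext Len counts 8-octet units beyond the first 8 octets. */
		ext_len = ((size_t)pkt[off + 1] + 1) * 8;
		if (len < off + ext_len)
			return -1;
		nh = pkt[off];	/* Next Header field of this extension */
		off += ext_len;
	}
	return nh;
}
```

For a packet whose fixed header has Next Header 0 followed by an 8-byte
hop-by-hop header with Next Header 58, the function returns NEXTHDR_ICMPV6,
whereas a check of the fixed header alone would see protocol 0.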