Toerless Eckert | 1 Nov 2011 03:12

LXC L3 network isolation, yes/no ?, how ?

I am trying to understand if (and if so how) i can use LXC (or any
other comparable lightweight container option) to effectively
run applications on a linux system with two separate IP interfaces
as if they each had only access to a single IP interface.

Eg:
    eth0 with address and default-router learned by DHCP
    eg: address 10.1.1.2/24, default-router 10.1.1.254
    DNS prefix and DNS domain name for eth0 of course also learned by DHCP.

    eth1 with address and default-router learned by DHCP
    eg: address 10.2.1.2/24, default-router 10.2.1.254
    DNS prefix and DNS domain name for eth1 of course also learned by DHCP.

    (no need for overlapping addresses).

So, i configure LXC accordingly (how...) for one eth0container, and one
eth1container. All processes running in eth0container will have all their
traffic use only eth0, all the ones in eth1container will only use eth1.

If this works, i'd love to get a pointer to an example config. The
ones i could find on the web looked as if they were using bridging
to attach multiple containers to ultimately the same single IP subnet
with the same default router (and thereby the same DNS prefix and DNS servers).

I can't see how LXC can make my case work without some additional kernel
support because when either process1 or process2 opens, let's say, a
client socket and just calls connect(), then (AFAIK) the default linux routing
logic takes place which would (AFAIK) first figure out where to route the
destination to (eth0 or eth1) and then pick the local IP address of that
[...]
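
A minimal sketch of the behaviour described above, where connect() without a prior bind() leaves the choice of source address to the kernel's route lookup. It assumes a host with a loopback interface; the helper name source_address_for is made up for this illustration. A UDP socket is used because connect() on it sends no packets and only fixes the local address.

```python
import socket


def source_address_for(destination: str, port: int = 9) -> str:
    """Return the local address the kernel picks for traffic to `destination`.

    connect() on a UDP socket sends nothing; it only performs the route
    lookup and fixes the local (source) address, which getsockname() reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((destination, port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # With two uplinks in one namespace, the result depends on the routing
    # table, not on which application is asking.
    print(source_address_for("127.0.0.1"))
```

Run inside a namespace that owns only one interface, the same call can only ever return an address of that interface.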

Eric W. Biederman | 1 Nov 2011 04:19

Re: LXC L3 network isolation, yes/no ?, how ?

Toerless Eckert
<Toerless.Eckert@...> writes:

> I am trying to understand if (and if so how) i can use LXC (or any
> other comparable lightweight container option) to effectively
> run applications on a linux system with two separate IP interfaces
> as if they each had only access to a single IP interface.
>
> Eg:
>     eth0 with address and default-router learned by DHCP
>     eg: address 10.1.1.2/24, default-router 10.1.1.254
>     DNS prefix and DNS domain name for eth0 of course also learned by DHCP.
>
>     eth1 with address and default-router learned by DHCP
>     eg: address 10.2.1.2/24, default-router 10.2.1.254
>     DNS prefix and DNS domain name for eth1 of course also learned by DHCP.
>
>     (no need for overlapping addresses).

That sounds like L2 level isolation.

ip link set eth1 netns XXXX.

will let you move a network device into a chosen network namespace.

That is the easy trivial case.  Most people don't have the multiple
physical interfaces so tricky things have to happen.

Does that sound like what you are looking for?
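
The command above can be wrapped in a small dry-run helper. This is only a sketch: it assumes an iproute2 build that supports named namespaces ("ip netns add"), and the namespace name e1procs as well as the function names are invented for the example. Nothing is executed unless dry_run is switched off, which would need root.

```python
import subprocess


def move_device_commands(device: str, namespace: str) -> list[list[str]]:
    """Build the iproute2 commands that create `namespace` and move `device` into it."""
    return [
        ["ip", "netns", "add", namespace],
        ["ip", "link", "set", device, "netns", namespace],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; also execute it when dry_run is False (needs root)."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(move_device_commands("eth1", "e1procs"))
```

Once moved, the device is no longer visible in the namespace it came from.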

[...]

Toerless Eckert | 1 Nov 2011 05:32

Re: LXC L3 network isolation, yes/no ?, how ?

Thanks, Eric

How do i configure eg: an LXC container to use a specific network name space XXXX ?

Also: if an app within some LXC container does a socket() and then a 
bind(..INADDR_ANY...) how does the kernel know which subset of IP interfaces
it should bind to ? does the process context have a network name space ?

And how do i create per namespace routing tables ?

Example or pointer to docs would be great. or just walk me through the rough
outline of my use case...:

  - create container e0procs, configure just the physical eth0 interface into it ??
    - without assigning an IP address ?
    - run a dhcp daemon from within container e0procs and that
      will correctly get ip address/mask and default route configured in a
      routing table solely used by container e0procs ?
    - container e0procs DHCPd will also populate containerized /etc/resolv.conf with
      eth0 domain prefix/DNS-servers...

  - same approach for container c1procs, configure phys eth1 interface into it,
    start DHCP daemon inside it, get routing table and DNS
    for container c1procs from it.

Is that it ? If not, then how. If yes, then what type of routing table would
i actually see outside of the containers ? And back to the original question,
would socket(), bind(INADDR_ANY) from inside the containers work correctly ?

Thanks
[...]
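
The outline above can be sketched as a list of commands run inside each namespace. Assumptions, none of which come from the thread: the interface has already been moved into a named namespace, "ip netns exec" is available, and the DHCP client is ISC dhclient. The function name is hypothetical, and the code only builds and prints the commands.

```python
def dhcp_in_namespace_commands(namespace: str, device: str) -> list[list[str]]:
    """Build commands that bring up `device` and run a DHCP client inside `namespace`."""
    prefix = ["ip", "netns", "exec", namespace]
    return [
        prefix + ["ip", "link", "set", "lo", "up"],    # loopback starts down in a new namespace
        prefix + ["ip", "link", "set", device, "up"],
        prefix + ["dhclient", device],                 # address and default route land in this namespace only
        prefix + ["ip", "route", "show"],              # inspect the namespace-private routing table
    ]


if __name__ == "__main__":
    for ns, dev in (("e0procs", "eth0"), ("c1procs", "eth1")):
        for argv in dhcp_in_namespace_commands(ns, dev):
            print(" ".join(argv))
```

Each namespace keeps its own routing table, so the routes learned here would not show up in "ip route show" on the host side.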

Eric W. Biederman | 1 Nov 2011 13:20

Re: LXC L3 network isolation, yes/no ?, how ?

Toerless Eckert
<Toerless.Eckert@...> writes:

> Thanks, Eric
>
> How do i configure eg: an LXC container to use a specific network name space XXXX ?
>
> Also: if an app within some LXC container does a socket() and then a 
> bind(..INADDR_ANY...) how does the kernel know which subset of IP interfaces
> it should bind to ? does the process context have a network name space
> ?

The network namespace.
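
Put differently: every process belongs to exactly one network namespace, a socket lives in the namespace it was created in, and bind(INADDR_ANY) therefore covers only the addresses configured there. A small sketch that shows both pieces from user space; it assumes Linux for the /proc lookup (the function returns None elsewhere), and the helper names are made up.

```python
import os
import socket
from typing import Optional, Tuple


def current_netns() -> Optional[str]:
    """Return an identifier for this process's network namespace, if exposed.

    On current Linux kernels /proc/self/ns/net is a symlink whose target
    looks like 'net:[<inode>]'; return None where that is not available.
    """
    try:
        return os.readlink("/proc/self/ns/net")
    except OSError:
        return None


def bind_any() -> Tuple[str, int]:
    """Bind a TCP socket to INADDR_ANY and return the bound (address, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("0.0.0.0", 0))  # 0.0.0.0 is INADDR_ANY; port 0 asks for any free port
        return sock.getsockname()


if __name__ == "__main__":
    print("namespace:", current_netns())
    print("bound to:", bind_any())
```

Two processes in different namespaces would print different namespace identifiers, while the bind call itself is unchanged.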

> And how do i create per namespace routing tables ?

Just like normal.  From inside the network namespace you set up your
routing tables.

> Example or pointer to docs would be great. or just walk me through the rough
> outline of my use case...:
>
>   - create container e0procs, configure just the physical eth0 interface into it ??
>     - without assigning an IP address ?
>     - run a dhcp daemon from within container e0procs and that
>       will correctly get ip address/mask and default route configured in a
>       routing table solely used by container e0procs ?
>     - container e0procs DHCPd will also populate containerized /etc/resolv.conf with
>       eth0 domain prefix/DNS-servers...
>
[...]

Toerless Eckert | 1 Nov 2011 16:26

Re: LXC L3 network isolation, yes/no ?, how ?

Thanks for replying,

Sorry for asking what probably are a lot of naive questions, my excuse is
that the documentation is somewhat scattered/incomplete ? ;-))

I am trying to figure out how to minimize the virtualization to just the network
name space and instantiate it in a lightweight fashion that can easily
be retrofitted into some existing system.

What i would like to have is some simple program like "run-ns XXXX <program> <args>"
that would run program <args> within namespace XXXX.

So i was looking for some system call like set_ns(XXXX), but it seems there
is no API like that. Instead i guess i would need to have a "server" process
with pid XXXX that does an unshare(CLONE_NEWNET) and then listens for requests
to fork client programs, and run-ns would need to send a request to that XXXX
process to fork off <program> <args> and make sure that it can transfer all
the pre-existing context of run-ns like pid/gid(s), cwd, environment, and i don't
even know all the other context a linux process has these days. And then of course
communicate exit status of <program> back from XXXX to run-ns.

Meaning: it's great to have something like network name spaces, but without
some setns(XXXX) system call, it's really difficult to use these network name
spaces outside of a concept like LXC - which is a shame, because otherwise
the network name space would exactly be what i am looking for.

I guess i will have to look how much of an isolated network behavior i can
get by using fwmarks. Alas, there is no process-level fwmark context, but
it has to be set via setsockopt(SO_MARK) AFAIK, so one would need some
LD_PRELOAD library or the like to use it.
[...]
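
For reference, the per-socket marking mentioned above looks roughly like this. It is a sketch, not a complete solution: SO_MARK is Linux-specific, setting it normally needs CAP_NET_ADMIN, and the helper name try_mark is invented. The mark only has an effect once a policy-routing rule matches it, for instance something along the lines of "ip rule add fwmark 1 table 100" (mark and table number chosen arbitrarily here).

```python
import socket

# SO_MARK is Linux-specific; 36 is its value on most Linux architectures and is
# used only as a fallback when this Python build does not export the constant.
SO_MARK = getattr(socket, "SO_MARK", 36)


def try_mark(sock: socket.socket, mark: int) -> bool:
    """Try to tag `sock` with a firewall mark; return False if that fails."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_MARK, mark)
        return True
    except OSError:
        # Typically a permission error without CAP_NET_ADMIN, or an
        # unsupported option on non-Linux systems.
        return False


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("marked" if try_mark(s, 1) else "could not set SO_MARK")
```

Because the mark is a property of each socket, every socket an application opens has to be tagged, which is why the message above reaches for an LD_PRELOAD shim.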

Daniel Lezcano | 1 Nov 2011 16:55

Re: LXC L3 network isolation, yes/no ?, how ?

On 11/01/2011 04:26 PM, Toerless Eckert wrote:
> Thanks for replying,
>
> Sorry for asking what probably are a lot of naive questions, my excuse is
> that the documentation is somewhat scattered/incomplete ? ;-))
>
> I am trying to figure out how to minimize the virtualization to just the network
> name space and instantiate it in a lightweight fashion that can easily
> be retrofitted into some existing system.
>
> What i would like to have is some simple program like "run-ns XXXX <program> <args>"
> that would run program <args> within namespace XXXX.

Did you look at the lxc-execute command ?

http://lxc.sourceforge.net/man/lxc.html

the "Quick Start" section, third line.

   -- Daniel
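
For the use case in this thread, a container configuration for lxc-execute might look like the output of the sketch below. The keys are the lxc.network.* names used by LXC releases of that period (later releases renamed them to lxc.net.0.*), so treat them as an assumption about the installed version; the function name and the container name are made up. No address is configured because the DHCP client is meant to run inside the container.

```python
def phys_container_config(name: str, device: str) -> str:
    """Render an LXC config that hands the physical `device` to container `name`."""
    lines = [
        f"lxc.utsname = {name}",
        "lxc.network.type = phys",        # move an existing host interface into the container
        f"lxc.network.link = {device}",   # which host interface to move
        "lxc.network.flags = up",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(phys_container_config("e0procs", "eth0"), end="")
```

Saved as, say, e0procs.conf, this would be used as "lxc-execute -n e0procs -f e0procs.conf -- <program> <args>".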

Eric W. Biederman | 1 Nov 2011 18:17

Re: LXC L3 network isolation, yes/no ?, how ?

Toerless Eckert
<Toerless.Eckert@...> writes:

> Thanks for replying,
>
> Sorry for asking what probably are a lot of naive questions, my excuse is
> that the documentation is somewhat scattered/incomplete ? ;-))
>
> I am trying to figure out how to minimize the virtualization to just the network
> name space and instantiate it in a lightweight fashion that can easily
> be retrofitted into some existing system.
>
> What i would like to have is some simple program like "run-ns XXXX <program> <args>"
> that would run program <args> within namespace XXXX.
>
> So i was looking for some system call like set_ns(XXXX), but it seems there
> is no API like that. Instead i guess i would need to have a "server" process
> with pid XXXX that does an unshare(CLONE_NEWNET) and then listens for requests
> to fork client programs, and run-ns would need to send a request to that XXXX
> process to fork off <program> <args> and make sure that it can transfer all
> the pre-existing context of run-ns like pid/gid(s), cwd, environment, and i don't
> even know all the other context a linux process has these days. And then of course
> communicate exit status of <program> back from XXXX to run-ns.
>
> Meaning: it's great to have something like network name spaces, but without
> some setns(XXXX) system call, it's really difficult to use these network name
> spaces outside of a concept like LXC - which is a shame, because otherwise
> the network name space would exactly be what i am looking for.

Definitely old docs.
[...]
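
The rest of this reply is cut off, but for context: Linux 3.0 added a setns(2) system call, which makes the "run-ns" helper asked for above a few lines long. The sketch below assumes a namespace pinned by "ip netns add NAME" under /var/run/netns, a libc that exports setns(), and sufficient privileges (CAP_SYS_ADMIN); the function names are hypothetical.

```python
import ctypes
import os
import sys
from typing import List

CLONE_NEWNET = 0x40000000  # namespace-type flag for network namespaces


def netns_path(name: str) -> str:
    """Path where `ip netns add NAME` pins a named network namespace."""
    return f"/var/run/netns/{name}"


def run_in_netns(name: str, argv: List[str]) -> None:
    """Join network namespace `name`, then replace this process with argv."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(netns_path(name), os.O_RDONLY)
    try:
        if libc.setns(fd, CLONE_NEWNET) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
    # exec keeps pid, uid/gid, cwd and environment, and the caller's parent
    # sees the program's exit status directly.
    os.execvp(argv[0], argv)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: run-ns NAMESPACE PROGRAM [ARGS...]
    run_in_netns(sys.argv[1], sys.argv[2:])
```

Since the helper execs the target program instead of asking a server process to fork it, none of the process context listed in the quoted message has to be transferred by hand.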

Will Drewry | 1 Nov 2011 21:02

Re: [RFC] cgroup: syscalls limiting subsystem

Some much delayed comments - thanks for the post.  It's cool to see
all the different ways to approach system call filtering!

On Thu, Oct 20, 2011 at 4:32 PM, Łukasz Sowa <luksow <at> gmail.com> wrote:
> First of all, thanks a lot for your reply.
>
>> On Tue, Oct 18, 2011 at 5:21 PM, Łukasz Sowa <luksow <at> gmail.com> wrote:
>>> Hi,
>>>
>>> Currently, I'm writing BSc thesis about security in modern Linux.
>>> Together with my thesis mentor, I decided that as practical part of my
>>> work I'll implement cgroup subsystem that allows to limit particular
>>> (defined by number) syscalls for groups of processes. The patch is ready
>>> for first review and we decided that I may try to push it to the
>>> mainline - so here it is.

Have you considered doing this as a system call namespace instead of a
cgroup?  (Just curious!)

>>
>> The major problem with this approach is that denying/allowing system
>> calls based purely on the syscall number is very coarse. I'd guess
>> that most vulnerabilities in a system can be exploited just using
>> system calls that almost all applications need in order to get regular
>> work done (open, write, exec, mmap, etc) which limits the utility of
>> only being able to turn them off by syscall number. So overall I don't
>> think you'll find very much support for this patch.
>
> I have to disagree. Have you read through the link to the article at LWN
> I gave? If not, please do so. There are a few people being interested in
[...]

Tejun Heo | 2 Nov 2011 00:46

[PATCHSET] cgroup: stable threadgroup during attach & subsys methods consolidation

Hello,

NOT FOR THIS MERGE WINDOW.

This patchset is a combination of the following two patchsets.

 [1] cgroup: extend threadgroup locking
 [2] cgroup: introduce cgroup_taskset and consolidate subsys methods, take#2

Changes from the last postings are

* 0001-cgroup-add-cgroup_root_mutex.patch replaces mutex ordering
  reversal patch, which Oleg found out to be broken.  Instead, a new
  sub-mutex cgroup_root_mutex is introduced to break circular
  dependency.

* Rebased on top of the current linus/master.

* Other minor changes to reflect comments from reviews.

* Reviewed/Acked-by's added.

This patchset addresses the following two issues.

1. cgroup currently only blocks new threads from joining the target
   threadgroup during migration, and on-going migration could race
   against exec and exit leading to interesting problems - the
   symmetry between various attach methods, task exiting during method
   execution, ->exit() racing against attach methods, migrating task
   switching basic properties during exec and so on.
[...]

Tejun Heo | 2 Nov 2011 00:46

[PATCH 01/10] cgroup: add cgroup_root_mutex

cgroup wants to make threadgroup stable while modifying cgroup
hierarchies which will introduce locking dependency on
cred_guard_mutex from cgroup_mutex.  This unfortunately completes a
circular dependency.

 A. cgroup_mutex -> cred_guard_mutex -> s_type->i_mutex_key -> namespace_sem
 B. namespace_sem -> cgroup_mutex

B is from cgroup_show_options() and this patch breaks it by
introducing another mutex cgroup_root_mutex which nests inside
cgroup_mutex and protects cgroupfs_root.
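
The circular dependency can be checked mechanically by treating "A is held while B is taken" as a directed edge and looking for a cycle. The sketch below is only an illustration of that reasoning, not kernel code; the AFTER graph reflects one reading of the description above (cgroup_show_options() taking cgroup_root_mutex instead of cgroup_mutex), which is an assumption.

```python
from typing import Dict, List, Optional


def find_cycle(order: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a lock-order graph as a list of locks, or None.

    `order[a]` lists the locks that may be taken while holding `a`.
    """
    visiting: List[str] = []  # current depth-first path
    done = set()              # locks fully explored without finding a cycle

    def visit(lock: str) -> Optional[List[str]]:
        if lock in visiting:
            return visiting[visiting.index(lock):] + [lock]
        if lock in done:
            return None
        visiting.append(lock)
        for nxt in order.get(lock, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(lock)
        return None

    for lock in list(order):
        cycle = visit(lock)
        if cycle:
            return cycle
    return None


BEFORE = {
    "cgroup_mutex": ["cred_guard_mutex"],           # chain A
    "cred_guard_mutex": ["s_type->i_mutex_key"],
    "s_type->i_mutex_key": ["namespace_sem"],
    "namespace_sem": ["cgroup_mutex"],              # chain B
}
AFTER = {
    **BEFORE,
    "cgroup_mutex": ["cred_guard_mutex", "cgroup_root_mutex"],  # new mutex nests inside
    "namespace_sem": ["cgroup_root_mutex"],                     # B no longer needs cgroup_mutex
}

if __name__ == "__main__":
    print("before:", find_cycle(BEFORE))
    print("after: ", find_cycle(AFTER))
```

In the AFTER graph cgroup_root_mutex has no outgoing edges, so no path leads back to cgroup_mutex.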

Signed-off-by: Tejun Heo <tj@...>
Cc: Oleg Nesterov <oleg@...>
---
 kernel/cgroup.c |   64 ++++++++++++++++++++++++++++++++++++-------------------
 1 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 453100a..efa5886 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -63,7 +63,24 @@

 #include <linux/atomic.h>

+/*
+ * cgroup_mutex is the master lock.  Any modification to cgroup or its
+ * hierarchy must be performed while holding it.
+ *
[...]

