Thomas Meyer | 29 Sep 19:36 2014

Remove default root?

Hi,

I tried to install the latest Fedora 21 from the Network Install Image DVD:

https://fedoraproject.org/get-prerelease#server
http://download.fedoraproject.org/pub/fedora/linux/releases/test/21-Alpha/Server/x86_64/iso/Fedora-Server-netinst-x86_64-21_Alpha.iso

I took these steps to install the distribution:

1.) Extract the initrd and boot options from the DVD via a loop device
mount.
2.) Boot UML with:

linux mem=800M umid=fedora21 eth0=tuntap,,,192.168.1.150 \
    ubd0=images/Fedora21-Root.img \
    ubd1=images/Fedora-Server-netinst-x86_64-21_Alpha.iso plymouth.enable=0 \
    initrd=images/initrd.img inst.stage2=hd:LABEL=Fedora-S-21_A-x86_64 \
    ip=192.168.1.151::192.168.1.150:255.255.255.0:servername:eth0:off \
    nameserver=8.8.8.8

3.) I only added the option "plymouth.enable=0" because Plymouth
gets confused by the missing KMS (?!).

The problem is that this only works when I remove the default root=
option from UML, because dracut gets confused by the root= option.

Since many modern distributions boot via an initramfs image and
detect the root partition automatically, what do you think about
removing the default root option from UML?
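To make the proposal concrete, one possible rule is to keep UML's default root= only when the user passes neither root= nor initrd=, so that an initramfs such as dracut's can locate the root device itself. The sketch below is hypothetical: the helpers `has_option()` and `needs_default_root()`, and the rule they implement, are assumptions for discussion, not existing UML code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the space-separated command line contains
 * an option that starts with `key` (e.g. "root=") at the start of a word. */
static bool has_option(const char *cmdline, const char *key)
{
	size_t klen = strlen(key);
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Only count matches at the start of the line or after a space. */
		if (p == cmdline || p[-1] == ' ')
			return true;
		p += klen;
	}
	return false;
}

/* Hypothetical policy: add a default root= only when the user gave
 * neither an explicit root= nor an initrd= (which lets the initramfs probe). */
static bool needs_default_root(const char *cmdline)
{
	return !has_option(cmdline, "root=") && !has_option(cmdline, "initrd=");
}
```

With the command line from step 2, which passes initrd=, this rule would add no default root=.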

anton.ivanov | 26 Sep 15:18 2014

[PATCH v2] Fix for occasional userspace process in D/Z state

From: Anton Ivanov <antivano <at> cisco.com>

Occasionally, under very heavy load inside UML, on the host, or both,
one of the processes in UML will remain in D state or will fail to
reap a child, which then sits in Z state.

This is very difficult to reproduce with stock UML because
the lseek()/read()|write()|fsync() in the ubd driver will
cause a constant "trickle" of memory synchronization regardless
of what UML is doing. It is possible though (3-4 hours of
loadavg >6 stress load with low memory can trigger it).

If lseek()/read()|write() are replaced with their equivalents
pread()|pwrite(), the original bug becomes much easier to
reproduce: it now takes 5-10 minutes of heavy IO to trigger it.
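For readers less familiar with the swap described above: pread() reads at an explicit offset in a single system call and leaves the file offset untouched, whereas the lseek() + read() pair takes two calls and moves the shared offset. A minimal user-space sketch with hypothetical helper names (plain POSIX, not the ubd driver code):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Two syscalls: move the file offset, then read from the new position. */
static ssize_t read_at_lseek(int fd, off_t off, void *buf, size_t len)
{
	if (lseek(fd, off, SEEK_SET) == (off_t)-1)
		return -1;
	return read(fd, buf, len);
}

/* One syscall: read at `off` and leave the file offset where it was. */
static ssize_t read_at_pread(int fd, off_t off, void *buf, size_t len)
{
	return pread(fd, buf, len, off);
}
```

Both helpers return the same bytes; the difference is that only the lseek() variant changes the descriptor's offset, and it needs an extra trip into the host kernel.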

This fix seems to cure it. I am not sure if this is the best
place to invoke a memory barrier, but this one works.

Signed-off-by: Anton Ivanov <antivano <at> cisco.com>
---

This version takes into account the fact that UML picks up the
correct define from arch/x86/um/barrier.h, so there is no need to
invoke it directly (noted by Richard Weinberger).

I still do not know if this is the correct location to hit it with an
mb(). This one seems to cure it.

 arch/um/kernel/exec.c |    1 +

anton.ivanov | 26 Sep 13:49 2014

[PATCH] Fix for "occasional userspace process in D/Z state" bug

From: Anton Ivanov <antivano <at> cisco.com>

This is a fix for a very old UML bug which can be triggered with stock 
UML. It takes a lot of effort to trigger it there because the 
lseek()/read() | write() mechanics of the UBD driver implicitly sync the 
memory all the time by hitting the appropriate barrier implementation in 
the host kernel. 

By improving the disk subsystem, we make this bug rear its ugly head
with a vengeance: you can get a process in D state (with an occasional
child in Z state) simply by running apt-get on 30-40 large packages.

Is this the correct place to have the sync? No idea. It may need to move
somewhere inside tlb.c. With the fence in exec.c it works (TM).

If I understand this correctly, this also needs to be an instruction
appropriate for the underlying host, so a plain barrier() will not cut
it. You have to fence. En garde... Touché... :)
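To illustrate the distinction between barrier() and a real fence in user-space terms, the sketch below contrasts a compiler-only barrier with a full hardware fence. The macro and function names are made up for illustration; the empty asm with a "memory" clobber is what the kernel's barrier() expands to with GCC, and C11's atomic_thread_fence(memory_order_seq_cst) is used as a portable stand-in for mb(). It shows the general technique, not the code in this patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Compiler barrier: the compiler may not move memory accesses across it,
 * but no instruction is emitted, so the CPU itself is not constrained. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Full fence: orders earlier loads/stores before later ones at CPU level. */
#define full_fence() atomic_thread_fence(memory_order_seq_cst)

static int payload;
static atomic_int ready;

/* Writer side of a message-passing pattern (hypothetical example). */
static void publish(int value)
{
	payload = value;
	full_fence();	/* compiler_barrier() alone would not give this guarantee portably */
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: once the flag is seen, this fence pairs with the writer's. */
static int consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return 0;
	full_fence();
	*out = payload;
	return 1;
}
```

Run single-threaded this only exercises the API; the ordering guarantee matters when publish() and consume() run on different CPUs.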

Signed-off-by: Anton Ivanov <antivano <at> cisco.com>
---
 arch/um/kernel/exec.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index 0d7103c..7cb6805 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -27,6 +27,11 @@ void flush_thread(void)
 	ret = unmap(&current->mm->context.id, 0, STUB_START, 0, &data);

Anton Ivanov | 22 Sep 20:39 2014

Fwd: Mail delivery failed: returning message to sender

Hi Richard, hi list,

There is some problem with the list on SourceForge. It has bounced one of the patches, no. 4.

I can resend it for book-keeping purposes.

If memory serves me right, there are no changes in this particular one, so the earlier version (v3) from about 2 weeks ago can be used instead.

A.


-------- Original Message --------
Subject: Mail delivery failed: returning message to sender
Date: Mon, 22 Sep 2014 16:53:41 +0000
From: Mail Delivery System <Mailer-Daemon <at> sourceforge.net>
To: anton.ivanov <at> kot-begemot.co.uk

This message was created automatically by mail delivery software.

A message that you sent could not be delivered to one or more of its
recipients. This is a permanent error. The following address(es) failed:

  user-mode-linux-devel <at> lists.sourceforge.net
    SMTP error from remote mail server after RCPT TO:<user-mode-linux-devel <at> lists.sourceforge.net>:
    host sfs-lb-ml.v29.ch3.sourceforge.com [172.29.29.17]: 550 Unknown user

------ This is a copy of the message, including all the headers. ------
------ The body of the message is 29686 characters long; only the first
------ 16384 or so are included here.

Return-path: <anton.ivanov <at> kot-begemot.co.uk>
Received-SPF: pass (sog-mx-2.v43.ch3.sourceforge.com: domain of kot-begemot.co.uk designates 89.200.143.206 as permitted sender) client-ip=89.200.143.206; envelope-from=anton.ivanov <at> kot-begemot.co.uk; helo=ivanoab3.miniserver.com;
Received: from ivanoab3.miniserver.com ([89.200.143.206]) by sog-mx-2.v43.ch3.sourceforge.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.76) id 1XW6ro-0005DR-7p for user-mode-linux-devel <at> lists.sourceforge.net; Mon, 22 Sep 2014 16:53:38 +0000
Received: from tun252.maui-covenant.sigsegv.cx ([192.168.17.6] helo=falkor.sigsegv.cx) by ivanoab3.miniserver.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.72) (envelope-from <anton.ivanov <at> kot-begemot.co.uk>) id 1XW6qk-0001BT-B3; Mon, 22 Sep 2014 16:52:30 +0000
Received: from monstrousnightmare.kot-begemot.co.uk ([192.168.3.80] helo=MonstrousNightmare.cisco.com) by falkor.sigsegv.cx with esmtp (Exim 4.80) (envelope-from <anton.ivanov <at> kot-begemot.co.uk>) id 1XW6rf-0002Zt-Ty; Mon, 22 Sep 2014 17:53:28 +0100
From: anton.ivanov <at> kot-begemot.co.uk
To: user-mode-linux-devel <at> lists.sourceforge.net
Cc: Anton Ivanov <antivano <at> cisco.com>
Subject: [PATCH v4 04/11] L2TPv3 Transport Driver for UML
Date: Mon, 22 Sep 2014 17:53:17 +0100
Message-Id: <1411404804-871910-5-git-send-email-anton.ivanov <at> kot-begemot.co.uk>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1411404804-871910-1-git-send-email-anton.ivanov <at> kot-begemot.co.uk>
References: <1411404804-871910-1-git-send-email-anton.ivanov <at> kot-begemot.co.uk>
X-Spam-Score: -1.5 (-)
X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
 See http://spamassassin.org/tag/ for more details.
 -1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for sender-domain
 -0.0 SPF_PASS SPF: sender matches SPF record
X-Headers-End: 1XW6ro-0005DR-7p

From: Anton Ivanov <antivano <at> cisco.com>

This transport allows a UML to connect to another UML local or remote,
the Linux host or any other network device running the industry
standard Ethernet over L2TPv3 protocol as per RFC 3931 (and successors).

The transport supports a common set of features with the kernel
implementation as well as the Cisco contributed L2TPv3 transport for
QEMU/KVM. In all cases this is static tunnels only, no L2TPv3 control
plane.

Additionally, the transport supports the so called "soft" termination
where it can listen for an incoming connection which does not require
the remote endpoint to be specified at configuration time.

Signed-off-by: Anton Ivanov <antivano <at> cisco.com>
---
 arch/um/Kconfig.net               |  10 +
 arch/um/drivers/Makefile          |   2 +
 arch/um/drivers/uml_l2tpv3.h      | 111 ++++++++++
 arch/um/drivers/uml_l2tpv3_kern.c | 434 +++++++++++++++++++++++++++++++++++++
 arch/um/drivers/uml_l2tpv3_user.c | 409 ++++++++++++++++++++++++++++++++++
 5 files changed, 966 insertions(+)
 create mode 100644 arch/um/drivers/uml_l2tpv3.h
 create mode 100644 arch/um/drivers/uml_l2tpv3_kern.c
 create mode 100644 arch/um/drivers/uml_l2tpv3_user.c

diff --git a/arch/um/Kconfig.net b/arch/um/Kconfig.net
index e4a7cf2..d84a1ee 100644
--- a/arch/um/Kconfig.net
+++ b/arch/um/Kconfig.net
@@ -93,6 +93,16 @@ config UML_NET_SLIP
 	  UMLs on a single host). You may choose more than one without
 	  conflict. If you don't need UML networking, say N.
 
+config UML_NET_L2TPV3
+	bool "L2TPV3 transport"
+	depends on UML_NET
+	help
+	  This User-Mode Linux network transport allows one or more running
+	  UMLs on single or multiple hosts to communicate with each other,
+	  the host as well as other remote or local network devices supporting
+	  the industry standard Ethernet over L2TPv3 protocol as described in
+	  the applicable RFCs
+
[snip]

A.
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
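For context on what such a transport puts on the wire: in RFC 3931's encapsulation directly over IP (IP protocol 115), an L2TPv3 data packet begins with a 32-bit session ID (zero is reserved for control messages), followed by an optional 32- or 64-bit cookie, then an optional L2-specific sublayer and the Ethernet frame. The sketch below only builds that header; `l2tpv3_ip_header()` is a hypothetical helper, not taken from uml_l2tpv3_user.c, and the UDP encapsulation (which prepends its own 4-byte header) is not covered.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write an L2TPv3-over-IP data header into buf.
 * Layout: 32-bit session ID (network byte order), then a 0/4/8-byte cookie.
 * Returns the header length, or 0 if the arguments are invalid. */
static size_t l2tpv3_ip_header(uint8_t *buf, size_t buflen,
			       uint32_t session_id,
			       const uint8_t *cookie, size_t cookie_len)
{
	uint32_t sid_be;
	size_t hdr_len = sizeof(sid_be) + cookie_len;

	/* Session ID 0 is reserved for control messages over IP. */
	if (session_id == 0)
		return 0;
	if (cookie_len != 0 && cookie_len != 4 && cookie_len != 8)
		return 0;
	if (buflen < hdr_len || (cookie_len != 0 && cookie == NULL))
		return 0;

	sid_be = htonl(session_id);
	memcpy(buf, &sid_be, sizeof(sid_be));
	if (cookie_len != 0)
		memcpy(buf + sizeof(sid_be), cookie, cookie_len);
	return hdr_len;
}
```

Both endpoints of a static tunnel have to agree on the session IDs and cookie length out of band, which is why a transport without a control plane takes them as configuration.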
anton.ivanov | 22 Sep 18:53 2014

Updated Performance Improvements patchset

This is an update of the performance improvement patchset. It
addresses a number of issues resulting from porting code that was
originally written for 3.3.8 to the current Linux kernel.

Changes (where applicable) are annotated in the individual patches.

A.

Thomas Meyer | 14 Sep 12:42 2014

[PATCH] um: Register UML with systemd-machined


Call machined's "CreateMachine" method to create a scope unit for the
current UML instance.

Signed-off-by: Thomas Meyer <thomas <at> m3y3r.de>
---
 arch/um/Makefile            |  4 +-
 arch/um/include/shared/os.h |  1 +
 arch/um/os-Linux/Makefile   |  6 ++-
 arch/um/os-Linux/machined.c | 89 +++++++++++++++++++++++++++++++++++++++++++++
 arch/um/os-Linux/umid.c     | 69 +++++++++++++++++++----------------
 5 files changed, 135 insertions(+), 34 deletions(-)
 create mode 100644 arch/um/os-Linux/machined.c

diff --git a/arch/um/Makefile b/arch/um/Makefile
index e4b1a96..e9e3dee 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -132,8 +132,10 @@ LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc

 LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt))

+LINK_DBUS = $(shell pkg-config --libs dbus-1)
+
 # Used by link-vmlinux.sh which has special support for um link
-export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE)
+export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE) $(LINK_DBUS)

 # When cleaning we don't include .config, so we don't include
 # TT or skas makefiles and don't clean skas_ptregs.h.
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 08eec0b..13f8f10 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -217,6 +217,7 @@ extern int helper_wait(int pid);
 extern int umid_file_name(char *name, char *buf, int len);
 extern int set_umid(char *name);
 extern char *get_umid(void);
+extern char* get_uml_dir_realpath(void);

 /* signal.c */
 extern void timer_init(void);
diff --git a/arch/um/os-Linux/Makefile b/arch/um/os-Linux/Makefile
index 08ff509..23559a3 100644
--- a/arch/um/os-Linux/Makefile
+++ b/arch/um/os-Linux/Makefile
@@ -5,16 +5,18 @@

 obj-y = aio.o execvp.o file.o helper.o irq.o main.o mem.o process.o \
 	registers.o sigio.o signal.o start_up.o time.o tty.o \
-	umid.o user_syms.o util.o drivers/ skas/
+	umid.o user_syms.o util.o drivers/ skas/ machined.o

 obj-$(CONFIG_ARCH_REUSE_HOST_VSYSCALL_AREA) += elf_aux.o

 USER_OBJS := $(user-objs-y) aio.o elf_aux.o execvp.o file.o helper.o irq.o \
 	main.o mem.o process.o registers.o sigio.o signal.o start_up.o time.o \
-	tty.o umid.o util.o
+	tty.o umid.o util.o machined.o

 HAVE_AIO_ABI := $(shell [ -r /usr/include/linux/aio_abi.h ] && \
 	echo -DHAVE_AIO_ABI )
 CFLAGS_aio.o += $(HAVE_AIO_ABI)

+CFLAGS_machined.o += $(shell pkg-config --cflags dbus-1 )
+
 include arch/um/scripts/Makefile.rules
diff --git a/arch/um/os-Linux/machined.c b/arch/um/os-Linux/machined.c
new file mode 100644
index 0000000..d39e7b1
--- /dev/null
+++ b/arch/um/os-Linux/machined.c
@@ -0,0 +1,89 @@
+#include <dbus/dbus.h>
+#include <init.h>
+#include <unistd.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <os.h>
+#include <string.h>
+
+static int machined_init(void) {
+
+    DBusMessageIter message_iter;
+    DBusMessageIter message_iter_array;
+
+    DBusError dbus_error = {};
+    dbus_bool_t dbus_rc = false;
+
+    DBusConnection *con = NULL;
+    DBusMessage* message = NULL, *reply_message = NULL;
+
+    char *root_dir = get_uml_dir_realpath();
+
+    char *arg_machine_name = NULL;
+    unsigned char arg_uuid[] = { };
+    dbus_uint32_t arg_pid = os_getpid();
+    char *arg_root_dir = root_dir ? "" : root_dir;
+    const char *arg_service = "uml";
+    const char *arg_class = "vm";
+
+#define SLEN 255
+    arg_machine_name = malloc(SLEN);
+    if(arg_machine_name == NULL) {
+        goto out;
+    }
+
+    snprintf(arg_machine_name, SLEN, "uml-uid=%i-umid=%s", getuid(), get_umid());
+
+    con = dbus_bus_get(DBUS_BUS_SYSTEM, NULL);
+    if(con == NULL) {
+        printf("dbus_bus_get: no connection to system bus!\n");
+        goto out;
+    }
+
+    message = dbus_message_new_method_call(
+        "org.freedesktop.machine1",
+        "/org/freedesktop/machine1",
+        "org.freedesktop.machine1.Manager",
+        "CreateMachine");
+    if(message == NULL) {
+        printf("dbus_message_new_method_call: no machined manager found!\n");
+        goto out;
+    }
+
+    dbus_message_iter_init_append (message, &message_iter);
+    /* normal arguments */
+    dbus_rc = dbus_message_iter_append_basic(&message_iter, DBUS_TYPE_STRING, &arg_machine_name);
+    dbus_rc &= dbus_message_iter_open_container(&message_iter, DBUS_TYPE_ARRAY, DBUS_TYPE_BYTE_AS_STRING, &message_iter_array);
+    dbus_rc &= dbus_message_iter_append_fixed_array(&message_iter_array, DBUS_TYPE_BYTE, &arg_uuid, 0);
+    dbus_rc &= dbus_message_iter_close_container(&message_iter, &message_iter_array);
+    dbus_rc &= dbus_message_iter_append_basic(&message_iter, DBUS_TYPE_STRING, &arg_service);
+    dbus_rc &= dbus_message_iter_append_basic(&message_iter, DBUS_TYPE_STRING, &arg_class);
+    dbus_rc &= dbus_message_iter_append_basic(&message_iter, DBUS_TYPE_UINT32, &arg_pid);
+    dbus_rc &= dbus_message_iter_append_basic(&message_iter, DBUS_TYPE_STRING, &arg_root_dir);
+
+    /* append scope properties array */
+    dbus_rc &= dbus_message_iter_open_container(&message_iter, DBUS_TYPE_ARRAY, "(sv)", &message_iter_array);
+    /* ENHANCEME: fill in array of (sv) to control the scope unit */
+    dbus_rc &= dbus_message_iter_close_container(&message_iter, &message_iter_array);
+    if(dbus_rc != true) {
+        printf("DBusMessage: construction of message failed!\n");
+        goto out;
+    }
+
+    reply_message = dbus_connection_send_with_reply_and_block(con, message, -1, &dbus_error);
+    if(reply_message == NULL) {
+        printf("Failed to register with systemd-machined: %s\n", dbus_error.message);
+        goto out;
+    }
+
+out:
+    free(arg_machine_name);
+    free(root_dir);
+    dbus_connection_flush(con);
+    dbus_message_unref(message);
+    dbus_connection_unref(con);
+    return 0;
+}
+
+__uml_postsetup(machined_init);
diff --git a/arch/um/os-Linux/umid.c b/arch/um/os-Linux/umid.c
index c1dc892..b492243 100644
--- a/arch/um/os-Linux/umid.c
+++ b/arch/um/os-Linux/umid.c
@@ -14,6 +14,7 @@
 #include <sys/stat.h>
 #include <init.h>
 #include <os.h>
+#include <stdbool.h>

 #define UML_DIR "~/.uml/"

@@ -24,48 +25,52 @@ static char umid[UMID_LEN] = { 0 };

 /* Changed by set_uml_dir and make_uml_dir, which are run early in boot */
 static char *uml_dir = UML_DIR;
+static bool uml_dir_set = false;
+
+char* get_uml_dir_realpath(void) {

-static int __init make_uml_dir(void)
-{
 	char dir[512] = { '\0' };
-	int len, err;
+	int len = 0;
+	char* uml_dir_real = NULL;
+	char *uml_dir_local = uml_dir;

-	if (*uml_dir == '~') {
+	if (*uml_dir_local == '~') {
 		char *home = getenv("HOME");

-		err = -ENOENT;
 		if (home == NULL) {
 			printk(UM_KERN_ERR "make_uml_dir : no value in "
 			       "environment for $HOME\n");
 			goto err;
 		}
 		strlcpy(dir, home, sizeof(dir));
-		uml_dir++;
+		uml_dir_local++;
 	}
-	strlcat(dir, uml_dir, sizeof(dir));
+	strlcat(dir, uml_dir_local, sizeof(dir));
 	len = strlen(dir);
-	if (len > 0 && dir[len - 1] != '/')
+	if (len > 0 && dir[len - 1] != '/') {
 		strlcat(dir, "/", sizeof(dir));
+	}

-	err = -ENOMEM;
-	uml_dir = malloc(strlen(dir) + 1);
-	if (uml_dir == NULL) {
-		printf("make_uml_dir : malloc failed, errno = %d\n", errno);
+	uml_dir_real = malloc(strlen(dir) + 1);
+	if (uml_dir_real == NULL) {
+		printf("get_uml_dir_realpath : malloc failed, errno = %d\n", errno);
 		goto err;
 	}
-	strcpy(uml_dir, dir);
+	strcpy(uml_dir_real, dir);
+err:
+	return uml_dir_real;
+}

-	if ((mkdir(uml_dir, 0777) < 0) && (errno != EEXIST)) {
-	        printf("Failed to mkdir '%s': %s\n", uml_dir, strerror(errno));
+static int __init make_uml_dir(void)
+{
+    int err = 0;
+    char* uml_dir_real = get_uml_dir_realpath();
+
+	if ((mkdir(uml_dir_real, 0777) < 0) && (errno != EEXIST)) {
+	    printf("Failed to mkdir '%s': %s\n", uml_dir_real, strerror(errno));
 		err = -errno;
-		goto err_free;
 	}
-	return 0;
-
-err_free:
-	free(uml_dir);
-err:
-	uml_dir = NULL;
+	free(uml_dir_real);
 	return err;
 }

@@ -128,18 +133,17 @@ out:
  *	this boot racing with a shutdown of the other UML
  * In any of these cases, the directory isn't useful for anything else.
  *
- * Boolean return: 1 if in use, 0 otherwise.
+ * Boolean return: true if in use, false otherwise.
  */
-static inline int is_umdir_used(char *dir)
+static inline bool is_umdir_used(char *dir)
 {
 	char file[strlen(uml_dir) + UMID_LEN + sizeof("/pid\0")];
 	char pid[sizeof("nnnnn\0")], *end;
-	int dead, fd, p, n, err;
+	int dead, fd, p, n;

 	n = snprintf(file, sizeof(file), "%s/pid", dir);
 	if (n >= sizeof(file)) {
 		printk(UM_KERN_ERR "is_umdir_used - pid filename too long\n");
-		err = -E2BIG;
 		goto out;
 	}

@@ -154,7 +158,6 @@ static inline int is_umdir_used(char *dir)
 		goto out;
 	}

-	err = 0;
 	n = read(fd, pid, sizeof(pid));
 	if (n < 0) {
 		printk(UM_KERN_ERR "is_umdir_used : couldn't read pid file "
@@ -176,13 +179,13 @@ static inline int is_umdir_used(char *dir)
 	if ((kill(p, 0) == 0) || (errno != ESRCH)) {
 		printk(UM_KERN_ERR "umid \"%s\" is already in use by pid %d\n",
 		       umid, p);
-		return 1;
+		return true;
 	}

 out_close:
 	close(fd);
 out:
-	return 0;
+	return false;
 }

 /*
@@ -194,7 +197,7 @@ out:
 static int umdir_take_if_dead(char *dir)
 {
 	int ret;
-	if (is_umdir_used(dir))
+	if (is_umdir_used(dir) == true)
 		return -EEXIST;

 	ret = remove_files_and_dir(dir);
@@ -350,6 +353,10 @@ char *get_umid(void)

 static int __init set_uml_dir(char *name, int *add)
 {
+    if(uml_dir_set == true) {
+        free(uml_dir);
+    }
+
 	if (*name == '\0') {
 		printf("uml_dir can't be an empty string\n");
 		return 0;
@@ -371,7 +378,7 @@ static int __init set_uml_dir(char *name, int *add)
 		return 0;
 	}
 	sprintf(uml_dir, "%s/", name);
-
+	uml_dir_set = true;
 	return 0;
 }

--

-- 
1.9.3
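One detail in machined_init() above looks inverted: `char *arg_root_dir = root_dir ? "" : root_dir;` sends an empty string when a path was found and a NULL pointer when get_uml_dir_realpath() failed, whereas a D-Bus string argument must be a valid, non-NULL string. The machine name also contains '=', and machined checks names against hostname-style rules, so it may be rejected (worth verifying against the systemd version in use). A dbus-free sketch of the argument preparation, with hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical: pass the real path when there is one and "" otherwise,
 * so a NULL pointer never reaches a D-Bus string argument. */
static const char *pick_root_dir(const char *realpath_or_null)
{
	return realpath_or_null ? realpath_or_null : "";
}

/* Hypothetical: build the machine name into a caller-provided buffer,
 * avoiding '=' on the assumption that machined wants hostname-like names.
 * Returns 0 on success, -1 on an output error or truncation. */
static int build_machine_name(char *buf, size_t len, uid_t uid,
			      const char *umid)
{
	int n = snprintf(buf, len, "uml-%u-%s", (unsigned int)uid, umid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

On the cleanup side the same caution applies: initialise the error with dbus_error_init() and release it with dbus_error_free(), release reply_message with dbus_message_unref(), and only call dbus_connection_flush(), dbus_message_unref() and dbus_connection_unref() on objects that were actually obtained, since con or message may still be NULL when the code jumps to out:.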

Thomas Meyer | 13 Sep 16:03 2014

[PATCH] um: register with systemd-machined

Hi,

This patch registers a UML instance with systemd-machined.
You may need to modify the D-Bus policy in
/etc/dbus-1/system.d/org.freedesktop.machine1.conf to allow this request.

The attached patch crashes the UML kernel, but I have no idea why!
Help is appreciated and feedback is welcome!

diff --git a/arch/um/Makefile b/arch/um/Makefile
index e4b1a96..e9e3dee 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -132,8 +132,10 @@ LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc

 LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt))

+LINK_DBUS = $(shell pkg-config --libs dbus-1)
+
 # Used by link-vmlinux.sh which has special support for um link
-export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE)
+export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE) $(LINK_DBUS)

 # When cleaning we don't include .config, so we don't include
 # TT or skas makefiles and don't clean skas_ptregs.h.
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 08eec0b..13f8f10 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -217,6 +217,7 @@ extern int helper_wait(int pid);
 extern int umid_file_name(char *name, char *buf, int len);
 extern int set_umid(char *name);
 extern char *get_umid(void);
+extern char *get_uml_dir_realpath(void);

 /* signal.c */
 extern void timer_init(void);
diff --git a/arch/um/os-Linux/Makefile b/arch/um/os-Linux/Makefile
index 08ff509..23559a3 100644
--- a/arch/um/os-Linux/Makefile
+++ b/arch/um/os-Linux/Makefile
@@ -5,16 +5,18 @@

 obj-y = aio.o execvp.o file.o helper.o irq.o main.o mem.o process.o \
        registers.o sigio.o signal.o start_up.o time.o tty.o \
-       umid.o user_syms.o util.o drivers/ skas/
+       umid.o user_syms.o util.o drivers/ skas/ machined.o

 obj-$(CONFIG_ARCH_REUSE_HOST_VSYSCALL_AREA) += elf_aux.o

 USER_OBJS := $(user-objs-y) aio.o elf_aux.o execvp.o file.o helper.o irq.o \
        main.o mem.o process.o registers.o sigio.o signal.o start_up.o time.o \
-       tty.o umid.o util.o
+       tty.o umid.o util.o machined.o

 HAVE_AIO_ABI := $(shell [ -r /usr/include/linux/aio_abi.h ] && \
        echo -DHAVE_AIO_ABI )
 CFLAGS_aio.o += $(HAVE_AIO_ABI)

+CFLAGS_machined.o += $(shell pkg-config --cflags dbus-1 )
+
 include arch/um/scripts/Makefile.rules
diff --git a/arch/um/os-Linux/machined.c b/arch/um/os-Linux/machined.c
new file mode 100644
index 0000000..d39e7b1
--- /dev/null
+++ b/arch/um/os-Linux/machined.c
@@ -0,0 +1,89 @@
+#include <dbus/dbus.h>
+#include <init.h>
+#include <unistd.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <os.h>
+#include <string.h>
+
+static int machined_init(void) {
+
+    DBusMessageIter message_iter;
+    DBusMessageIter message_iter_array;
+
+    DBusError dbus_error = {};
+    dbus_bool_t dbus_rc = false;
+
+    DBusConnection *con = NULL;
+    DBusMessage* message = NULL, *reply_message = NULL;
+
+    char *root_dir = get_uml_dir_realpath();
+
+    char *arg_machine_name = NULL;
+    unsigned char arg_uuid[] = { };
+    dbus_uint32_t arg_pid = os_getpid();
+    char *arg_root_dir = root_dir ? "" : root_dir;
+    const char *arg_service = "uml";
+    const char *arg_class = "vm";
+
+#define SLEN 255
+    arg_machine_name = malloc(SLEN);
+    if(arg_machine_name == NULL) {
+        goto out;
+    }
+
+    snprintf(arg_machine_name, SLEN, "uml-uid=%i-umid=%s", getuid(), get_umid());
+
+    con = dbus_bus_get(DBUS_BUS_SYSTEM, NULL);
+    if(con == NULL) {
+        printf("dbus_bus_get: no connection to system bus!\n");
+        goto out;
+    }
+
+    message = dbus_message_new_method_call(
+        "org.freedesktop.machine1",
+        "/org/freedesktop/machine1",
+        "org.freedesktop.machine1.Manager",
+        "CreateMachine");
+    if(message == NULL) {
+        printf("dbus_message_new_method_call: no machined manager found!\n");
+        goto out;
+    }
+
+    dbus_message_iter_init_append (message, &message_iter);
+    /* normal arguments */
+    dbus_rc = dbus_message_iter_append_basic(&message_iter, DBUS_TYPE_STRING, &arg_machine_name);
+    dbus_rc &= dbus_message_iter_open_container(&message_iter, DBUS_TYPE_ARRAY, DBUS_TYPE_BYTE_AS_STRING, &message_iter_array);
+    dbus_rc &= dbus_message_iter_append_fixed_array(&message_iter_array, DBUS_TYPE_BYTE, &arg_uuid, 0);
+    dbus_rc &= dbus_message_iter_close_container(&message_iter, &message_iter_array);
+    dbus_rc &= dbus_message_iter_append_basic(&message_iter, DBUS_TYPE_STRING, &arg_service);
+    dbus_rc &= dbus_message_iter_append_basic(&message_iter, DBUS_TYPE_STRING, &arg_class);
+    dbus_rc &= dbus_message_iter_append_basic(&message_iter, DBUS_TYPE_UINT32, &arg_pid);
+    dbus_rc &= dbus_message_iter_append_basic(&message_iter, DBUS_TYPE_STRING, &arg_root_dir);
+
+    /* append scope properties array */
+    dbus_rc &= dbus_message_iter_open_container(&message_iter, DBUS_TYPE_ARRAY, "(sv)", &message_iter_array);
+    /* ENHANCEME: fill in array of (sv) to control the scope unit */
+    dbus_rc &= dbus_message_iter_close_container(&message_iter, &message_iter_array);
+    if(dbus_rc != true) {
+        printf("DBusMessage: construction of message failed!\n");
+        goto out;
+    }
+
+    reply_message = dbus_connection_send_with_reply_and_block(con, message, -1, &dbus_error);
+    if(reply_message == NULL) {
+        printf("Failed to register with systemd-machined: %s\n", dbus_error.message);
+        goto out;
+    }
+
+out:
+    free(arg_machine_name);
+    free(root_dir);
+    dbus_connection_flush(con);
+    dbus_message_unref(message);
+    dbus_connection_unref(con);
+    return 0;
+}
+
+__uml_postsetup(machined_init);
diff --git a/arch/um/os-Linux/umid.c b/arch/um/os-Linux/umid.c
index c1dc892..aae05be 100644
--- a/arch/um/os-Linux/umid.c
+++ b/arch/um/os-Linux/umid.c
@@ -14,6 +14,7 @@
 #include <sys/stat.h>
 #include <init.h>
 #include <os.h>
+#include <stdbool.h>

 #define UML_DIR "~/.uml/"

@@ -24,47 +25,55 @@ static char umid[UMID_LEN] = { 0 };

 /* Changed by set_uml_dir and make_uml_dir, which are run early in boot */
 static char *uml_dir = UML_DIR;
+static bool uml_dir_set = false;
+
+char* get_uml_dir_realpath(void) {

-static int __init make_uml_dir(void)
-{
        char dir[512] = { '\0' };
-       int len, err;
+       int len = 0;
+       char* uml_dir_real = NULL;
+       char *uml_dir_local = uml_dir;

-       if (*uml_dir == '~') {
+       if (*uml_dir_local == '~') {
                char *home = getenv("HOME");

-               err = -ENOENT;
                if (home == NULL) {
                        printk(UM_KERN_ERR "make_uml_dir : no value in "
                               "environment for $HOME\n");
                        goto err;
                }
                strlcpy(dir, home, sizeof(dir));
-               uml_dir++;
+               uml_dir_local++;
        }
-       strlcat(dir, uml_dir, sizeof(dir));
+       strlcat(dir, uml_dir_local, sizeof(dir));
        len = strlen(dir);
-       if (len > 0 && dir[len - 1] != '/')
+       if (len > 0 && dir[len - 1] != '/') {
                strlcat(dir, "/", sizeof(dir));
+       }

-       err = -ENOMEM;
-       uml_dir = malloc(strlen(dir) + 1);
-       if (uml_dir == NULL) {
-               printf("make_uml_dir : malloc failed, errno = %d\n", errno);
+       uml_dir_real = malloc(strlen(dir) + 1);
+       if (uml_dir_real == NULL) {
+               printf("get_uml_dir_realpath : malloc failed, errno = %d\n", errno);
                goto err;
        }
-       strcpy(uml_dir, dir);
+       strcpy(uml_dir_real, dir);
+err:
+       return uml_dir_real;
+}
+
+static int __init make_uml_dir(void)
+{
+    int err = 0;
+    char* uml_dir_real = get_uml_dir_realpath();

-       if ((mkdir(uml_dir, 0777) < 0) && (errno != EEXIST)) {
-               printf("Failed to mkdir '%s': %s\n", uml_dir, strerror(errno));
+       if ((mkdir(uml_dir_real, 0777) < 0) && (errno != EEXIST)) {
+           printf("Failed to mkdir '%s': %s\n", uml_dir_real, strerror(errno));
                err = -errno;
-               goto err_free;
        }
-       return 0;
-
-err_free:
-       free(uml_dir);
-err:
+       free(uml_dir_real);
+       if(uml_dir_set == true) {
+           free(uml_dir);
+       }
        uml_dir = NULL;
        return err;
 }
@@ -128,18 +137,17 @@ out:
  *     this boot racing with a shutdown of the other UML
  * In any of these cases, the directory isn't useful for anything else.
  *
- * Boolean return: 1 if in use, 0 otherwise.
+ * Boolean return: true if in use, false otherwise.
  */
-static inline int is_umdir_used(char *dir)
+static inline bool is_umdir_used(char *dir)
 {
        char file[strlen(uml_dir) + UMID_LEN + sizeof("/pid\0")];
        char pid[sizeof("nnnnn\0")], *end;
-       int dead, fd, p, n, err;
+       int dead, fd, p, n;

        n = snprintf(file, sizeof(file), "%s/pid", dir);
        if (n >= sizeof(file)) {
                printk(UM_KERN_ERR "is_umdir_used - pid filename too long\n");
-               err = -E2BIG;
                goto out;
        }

@@ -154,7 +162,6 @@ static inline int is_umdir_used(char *dir)
                goto out;
        }

-       err = 0;
        n = read(fd, pid, sizeof(pid));
        if (n < 0) {
                printk(UM_KERN_ERR "is_umdir_used : couldn't read pid file "
@@ -176,13 +183,13 @@ static inline int is_umdir_used(char *dir)
        if ((kill(p, 0) == 0) || (errno != ESRCH)) {
                printk(UM_KERN_ERR "umid \"%s\" is already in use by pid %d\n",
                       umid, p);
-               return 1;
+               return true;
        }

 out_close:
        close(fd);
 out:
-       return 0;
+       return false;
 }

 /*
@@ -350,6 +357,10 @@ char *get_umid(void)

 static int __init set_uml_dir(char *name, int *add)
 {
+    if(uml_dir_set == true) {
+        free(uml_dir);
+    }
+
        if (*name == '\0') {
                printf("uml_dir can't be an empty string\n");
                return 0;
@@ -371,7 +382,7 @@ static int __init set_uml_dir(char *name, int *add)
                return 0;
        }
        sprintf(uml_dir, "%s/", name);
-
+       uml_dir_set = true;
        return 0;
 }
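A possible lead on the crash (unconfirmed): in this version make_uml_dir() ends with `uml_dir = NULL;` on every path, yet is_umdir_used() still sizes its buffer with `strlen(uml_dir)`, and get_uml_dir_realpath(), which machined_init() calls later, dereferences uml_dir. If either runs after make_uml_dir(), that is a NULL pointer dereference. Below is a sketch of a side-effect-free alternative that expands a leading '~' into a caller-supplied buffer instead of touching the global; `expand_uml_dir()` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: expand a leading '~' in `dir` using $HOME and make sure the
 * result ends in '/'. Writes into `out`; never modifies `dir` or any global.
 * Returns 0 on success, -1 if $HOME is unset or `out` is too small. */
static int expand_uml_dir(const char *dir, char *out, size_t outlen)
{
	const char *home = "";
	size_t need;

	if (dir[0] == '~') {
		home = getenv("HOME");
		if (home == NULL)
			return -1;
		dir++;
	}
	need = strlen(home) + strlen(dir) + 2;	/* optional '/' plus NUL */
	if (need > outlen)
		return -1;
	strcpy(out, home);
	strcat(out, dir);
	if (out[0] != '\0' && out[strlen(out) - 1] != '/')
		strcat(out, "/");
	return 0;
}
```

Because the global is never reassigned, every caller sees the same uml_dir value regardless of initcall order.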

Toralf Förster | 5 Sep 17:27 2014

fuzz tested UML hangs while using NFS files as victim files - UML issue or NFS issue ?

Within a 32-bit stable Gentoo User Mode Linux guest system, I sometimes
see that trinity does not exit after it has completed the maximum number
of jobs. This might be a UML, a trinity, or an NFS issue (I use NFS
victim files). Now I'm wondering where to report it. (Or is it a trinity
issue?)

UML guest kernel is v3.17-rc3-94-gb7fece1, host is 3.16.1.

The trinity log shows:
...
child0:4020] <timed out>
[child0:4020] <timed out>
[main] Bailing main loop because Completed maximum number of operations..
[watchdog] [3628] Watchdog exiting because Completed maximum number of operations..

Within the UML guest I see this:

trinity ~ # ps -ef | grep trinity
tfoerste  3415  3414  0 16:33 ?        00:00:12 bash -c logger "1#-1, M=/mnt/nfsv3"; cd ~; sudo su -c 'if [[ -d ./t3 ]];
then sudo chmod -R a+rwx ./t3; sudo rm -rf ./t3; fi'; mkdir ./t3; cd ./t3; if [[ -n /mnt/nfsv3 ]]; then if [[ -d
/mnt/nfsv3/victims/v1 ]]; then sudo chmod -R a+rwx /mnt/nfsv3/victims/v1; sudo rm -rf
/mnt/nfsv3/victims/v1 || exit; fi; mkdir -p /mnt/nfsv3/victims/v1/v2; for i in $(seq -w 0 99); do touch
/mnt/nfsv3/victims/v1/v2/f$i; mkdir /mnt/nfsv3/victims/v1/v2/d$i; done; fi; MALLOC_CHECK_=2
trinity -C 2 -N 100000 -q -x mremap -V /mnt/nfsv3/victims/v1/v2
tfoerste  3627  3415  0 16:33 ?        00:00:00 trinity -C 2 -N 100000 -q -x mremap -V /mnt/nfsv3/victims/v1/v2
tfoerste  3628  3627  0 16:33 ?        00:00:04 [trinity-watchdo] <defunct>
tfoerste  3629  3627 36 16:33 ?        00:15:12 [trinity-main]
root      4083  4077  0 17:15 pts/0    00:00:00 grep --colour=auto trinity

trinity ~ # cat /proc/3629/stack
[<0805f8b4>] __switch_to+0x44/0x70
[<08503de4>] __schedule+0x2f4/0x3a0
[<08097ada>] __cond_resched+0x1a/0x30
[<08503fc1>] _cond_resched+0x31/0x50
[<080dbd92>] truncate_inode_pages_range+0x192/0x650
[<080dc2e2>] truncate_inode_pages_final+0x52/0x60
[<081ec088>] nfs_evict_inode+0x18/0x30
[<08125dfd>] evict+0xdd/0x1b0
[<08126a7d>] iput+0x16d/0x180
[<081e6e84>] nfs_dentry_iput+0x44/0x50
[<0812268a>] __dentry_kill+0x12a/0x200
[<08122eb6>] dput+0x156/0x180
[<0810eb65>] __fput+0x175/0x190
[<0810ebbb>] ____fput+0xb/0x10
[<080928f6>] task_work_run+0x76/0x90
[<0807ea6d>] do_exit+0x32d/0x940
[<0807f162>] do_group_exit+0xa2/0xf0
[<0807f1c7>] SyS_exit_group+0x17/0x20
[<08062990>] handle_syscall+0x60/0x80
[<0807473c>] userspace+0x46c/0x5e0
[<0805f720>] fork_handler+0x60/0x70
[<ffffffff>] 0xffffffff

trinity ~ # cat /proc/3628/stack
[<0805f8b4>] __switch_to+0x44/0x70
[<08503de4>] __schedule+0x2f4/0x3a0
[<08503ee5>] schedule+0x55/0x60
[<0807efde>] do_exit+0x89e/0x940
[<0807f162>] do_group_exit+0xa2/0xf0
[<0807f1c7>] SyS_exit_group+0x17/0x20
[<08062990>] handle_syscall+0x60/0x80
[<0807473c>] userspace+0x46c/0x5e0
[<0805f720>] fork_handler+0x60/0x70
[<ffffffff>] 0xffffffff

trinity ~ # cat /proc/3627/stack
[<0805f8b4>] __switch_to+0x44/0x70
[<08503de4>] __schedule+0x2f4/0x3a0
[<08503ee5>] schedule+0x55/0x60
[<0807e607>] do_wait+0x177/0x200
[<0807f62d>] SyS_wait4+0xbd/0xe0
[<0807f677>] SyS_waitpid+0x27/0x30
[<08062990>] handle_syscall+0x60/0x80
[<0807473c>] userspace+0x46c/0x5e0
[<0805f720>] fork_handler+0x60/0x70
[<ffffffff>] 0xffffffff

On the host I see one linux process at 100% CPU, with this backtrace:

$ date; pgrep -af 'linux earlyprintk' | cut -f1 -d' ' | xargs -n1 gdb /home/tfoerste/devel/linux/linux -n -batch -ex 'thread apply all bt'
Fri Sep  5 17:21:05 CEST 2014

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?
lru_add_drain_cpu (cpu=0) at mm/swap.c:798
798     {

Thread 1 (process 5266):
#0  lru_add_drain_cpu (cpu=0) at mm/swap.c:798
#1  0x080db7a6 in lru_add_drain () at mm/swap.c:849
#2  __pagevec_release (pvec=0x8551fcb0) at mm/swap.c:968
#3  0x080dbd8d in pagevec_release (pvec=<optimized out>) at include/linux/pagevec.h:69
#4  truncate_inode_pages_range (mapping=0x8587421c, lstart=0, lend=-8840006756412162049) at mm/truncate.c:308
#5  0x080dc2e2 in truncate_inode_pages (lstart=<optimized out>, mapping=<optimized out>) at mm/truncate.c:414
#6  truncate_inode_pages_final (mapping=0x8587421c) at mm/truncate.c:460
#7  0x081ec088 in nfs_evict_inode (inode=0x85874144) at fs/nfs/inode.c:131
#8  0x08125dfd in evict (inode=0x85874144) at fs/inode.c:551
#9  0x08126a7d in iput_final (inode=<optimized out>) at fs/inode.c:1419
#10 iput (inode=0x85874144) at fs/inode.c:1437
#11 0x081e6e84 in nfs_dentry_iput (dentry=0xb8c1ec0, inode=0x85874144) at fs/nfs/dir.c:1320
#12 0x0812268a in dentry_iput (dentry=<optimized out>) at fs/dcache.c:290
#13 __dentry_kill (dentry=0x85b6b640) at fs/dcache.c:477
#14 0x08122eb6 in dentry_kill (dentry=<optimized out>) at fs/dcache.c:521
#15 dput (dentry=0x678683c0) at fs/dcache.c:617
#16 0x0810eb65 in __fput (file=0x857bf900) at fs/file_table.c:234
#17 0x0810ebbb in ____fput (work=0x857bf900) at fs/file_table.c:252
#18 0x080928f6 in task_work_run () at kernel/task_work.c:123
#19 0x0807ea6d in exit_task_work (task=<optimized out>) at include/linux/task_work.h:21
#20 do_exit (code=-2065204480) at kernel/exit.c:758
#21 0x0807f162 in do_group_exit (exit_code=0) at kernel/exit.c:886
#22 0x0807f1c7 in SYSC_exit_group (error_code=<optimized out>) at kernel/exit.c:897
#23 SyS_exit_group (error_code=0) at kernel/exit.c:895
#24 0x08062990 in handle_syscall (r=0x84ec3530) at arch/um/kernel/skas/syscall.c:35
#25 0x0807473c in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>,
pid=<optimized out>) at arch/um/os-Linux/skas/process.c:193
#26 userspace (regs=0x84ec3530) at arch/um/os-Linux/skas/process.c:426
#27 0x0805f720 in fork_handler () at arch/um/kernel/process.c:149
#28 0x5a5a5a5a in ?? ()

Now the guest is completely unresponsive while I run this command:

trinity ~ # gdb trinity 3629 -batch -ex 'thread apply all bt'

--
Toralf
pgp key: 0076 E94E

------------------------------------------------------------------------------
Slashdot TV.  
Video for Nerds.  Stuff that matters.
http://tv.slashdot.org/
anton.ivanov | 4 Sep 21:00 2014

UML Performance improvement patchset


Patch dependencies:

[PATCH v3 01/10] Epoll based interrupt controller

Full redesign of the existing UML poll-based controller. The old
poll controller incurs huge penalties for IRQ sharing and for setups
with many devices, because the device list is walked twice.

Additionally, the current controller has no notion of true Edge,
Level and Write completion IRQs.

This patch fixes the list-walking bottleneck and adds all of
the above, allowing UML to scale to hundreds of devices
(tested with 512+ network devices).
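The edge/level distinction can be seen with the host's own epoll API. This is a minimal standalone sketch, not code from the patchset: the helper `count_readiness_reports()` is hypothetical, and a pipe stands in for a UML device fd.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Write one byte into a pipe, never read it, and poll the read end
 * "polls" times with a zero timeout. "events" selects the mode:
 * EPOLLIN alone is level-triggered, EPOLLIN | EPOLLET is edge-triggered.
 * Returns how many epoll_wait() calls reported the fd ready, or -1.
 */
int count_readiness_reports(uint32_t events, int polls)
{
	struct epoll_event ev, out;
	int fds[2], epfd, i;
	int reports = -1;

	if (pipe(fds) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0)
		goto out_pipe;

	ev.events = events;
	ev.data.fd = fds[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) < 0)
		goto out_epoll;
	/* One write produces exactly one "became readable" edge. */
	if (write(fds[1], "x", 1) != 1)
		goto out_epoll;

	reports = 0;
	for (i = 0; i < polls; i++) {
		/*
		 * Level-triggered: reported on every call while data is unread.
		 * Edge-triggered: reported once, and not again until new data
		 * arrives.
		 */
		if (epoll_wait(epfd, &out, 1, 0) == 1)
			reports++;
	}

out_epoll:
	close(epfd);
out_pipe:
	close(fds[0]);
	close(fds[1]);
	return reports;
}
```

With unread data, a level-triggered registration is reported on every call, while an edge-triggered one is reported only once, so an edge-triggered consumer has to drain the fd on each event.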

[PATCH v3 02/10] Remove unnecessary 'reactivate' statements

As a result of adding true Edge/Level semantics in the epoll
controller, there is no longer any need to "reactivate" the fd.

This patch is an enhancement of patch 1 and depends on it.

[PATCH v3 03/10] High performance networking subsystem

This patch adds vector IO ops for xmit and receive. Xmit
is optional (as it depends on a 3.0+ host); receive is always on.

The result is that UML can now hit 1G+ rates for transports
which have been enabled to use these. Presently this patch
is kept as "legacy" as possible, without leveraging the possibility
of doing a true write completion poll from the new IRQ controller.
This further performance improvement will be submitted separately.

This patch has been tested extensively only together with patches 1 and 2.
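The multi-packet idea can be tried in userspace with glibc's `sendmmsg()`/`recvmmsg()` wrappers (the patch itself issues the TX syscall directly because the tested distributions' glibc lacks the wrapper). This is a minimal sketch on a local AF_UNIX socketpair, not one of the patchset's transports; `vector_io_roundtrip()` and `BATCH` are hypothetical names.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH 4

/*
 * Send BATCH one-byte datagrams with a single sendmmsg() call and read
 * them back with a single recvmmsg() call over an AF_UNIX datagram
 * socketpair. Returns the number of datagrams received intact, or -1.
 */
int vector_io_roundtrip(void)
{
	char tx[BATCH], rx[BATCH][16];
	struct iovec tx_iov[BATCH], rx_iov[BATCH];
	struct mmsghdr tx_msgs[BATCH], rx_msgs[BATCH];
	int sv[2], i, received = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	memset(tx_msgs, 0, sizeof(tx_msgs));
	memset(rx_msgs, 0, sizeof(rx_msgs));
	for (i = 0; i < BATCH; i++) {
		tx[i] = 'a' + i;	/* distinct payload per packet */
		tx_iov[i].iov_base = &tx[i];
		tx_iov[i].iov_len = 1;
		tx_msgs[i].msg_hdr.msg_iov = &tx_iov[i];
		tx_msgs[i].msg_hdr.msg_iovlen = 1;

		rx_iov[i].iov_base = rx[i];
		rx_iov[i].iov_len = sizeof(rx[i]);
		rx_msgs[i].msg_hdr.msg_iov = &rx_iov[i];
		rx_msgs[i].msg_hdr.msg_iovlen = 1;
	}

	/* TX: one syscall for the whole batch. */
	if (sendmmsg(sv[0], tx_msgs, BATCH, 0) == BATCH) {
		/* RX: one syscall drains up to BATCH queued datagrams. */
		received = recvmmsg(sv[1], rx_msgs, BATCH, MSG_DONTWAIT, NULL);
	}

	/* Check that every datagram arrived whole and in order. */
	for (i = 0; i < received; i++)
		if (rx_msgs[i].msg_len != 1 || rx[i][0] != 'a' + i)
			received = -1;

	close(sv[0]);
	close(sv[1]);
	return received;
}
```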

[PATCH v3 04/10] L2TPv3 Transport Driver for UML

This is an implementation of the Ethernet over L2TPv3 protocol,
leveraging both the epoll controller and the high-performance vector IO.
It has been extensively tested for interoperability against a set of
other implementations, including the Linux kernel, our port of the
same concept to QEMU/KVM, routers, etc.

Depends on patch 3.
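For readers unfamiliar with the encapsulation, here is a sketch of the L2TPv3-over-IP data header as described in RFC 3931 (32-bit session ID, then an optional 0-, 4- or 8-byte cookie). It is based on the RFC rather than on the patch, omits the optional L2-specific sublayer, and `l2tpv3_build_header()` is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Build an L2TPv3-over-IP data header: a 32-bit session ID in network
 * byte order, followed by an optional cookie of 0, 4 or 8 bytes. The
 * encapsulated Ethernet frame goes right after the returned number of
 * header bytes. Returns the header length, or -1 on a bad cookie length
 * or a short buffer.
 */
int l2tpv3_build_header(uint8_t *buf, size_t buflen, uint32_t session_id,
			const uint8_t *cookie, size_t cookie_len)
{
	uint32_t sid = htonl(session_id);

	if (cookie_len != 0 && cookie_len != 4 && cookie_len != 8)
		return -1;
	if (buflen < sizeof(sid) + cookie_len)
		return -1;

	memcpy(buf, &sid, sizeof(sid));
	if (cookie_len)
		memcpy(buf + sizeof(sid), cookie, cookie_len);
	return (int)(sizeof(sid) + cookie_len);
}
```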

[PATCH v3 05/10] GRE transport for UML

Same as the L2TPv3 driver, but for GRE. Depends on patch 3.
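A similar sketch for GRE, following the header layout of RFC 2784/RFC 2890 (flags and version, protocol type 0x6558 for Transparent Ethernet Bridging, optional 32-bit key). Whether the UML driver uses keys, checksums or sequence numbers is not stated here; `gre_build_header()` and the macro names are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define GRE_FLAG_KEY	0x2000	/* K bit: a 32-bit key follows (RFC 2890) */
#define GRE_PROTO_TEB	0x6558	/* Transparent Ethernet Bridging */

/*
 * Build a GRE header carrying an Ethernet frame: 16 bits of flags and
 * version (all zero except the optional K bit), a 16-bit protocol type,
 * then the optional key. Returns the header length, or -1 if buflen is
 * too small.
 */
int gre_build_header(uint8_t *buf, size_t buflen, int with_key, uint32_t key)
{
	uint16_t flags = htons(with_key ? GRE_FLAG_KEY : 0);
	uint16_t proto = htons(GRE_PROTO_TEB);
	size_t len = with_key ? 8 : 4;

	if (buflen < len)
		return -1;
	memcpy(buf, &flags, sizeof(flags));
	memcpy(buf + 2, &proto, sizeof(proto));
	if (with_key) {
		uint32_t k = htonl(key);

		memcpy(buf + 4, &k, sizeof(k));
	}
	return (int)len;
}
```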

[PATCH v3 06/10] RAW Ethernet transport for UML

A true raw driver (note: all TSO/GSO offload options on the NIC must
be turned off). Breaks through the 1G barrier with a vengeance
and CPU to spare. Depends on patch 3.

[PATCH v3 07/10] Performance and NUMA improvements for ubd

This is a well-known issue and fix; QEMU has the same one. If you
do not use pwrite(), you can easily kill a machine on cache sync.
This patch is independent of the others.
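The call pattern the ubd change moves to can be sketched as follows: `pwrite()`/`pread()` take the offset as an argument, so each request is one syscall instead of `lseek()` followed by `read()`/`write()`, and the shared file offset is never touched. This does not reproduce the cache-sync problem; `positional_io_demo()` and the scratch-file path are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a "sector" at offset 4096 of a scratch file with pwrite() and
 * read it back with pread(). Returns 0 if the data round-trips and the
 * file offset never moved, -1 otherwise.
 */
int positional_io_demo(void)
{
	char path[] = "/tmp/pwrite-demo-XXXXXX";
	const char data[] = "sector";
	char check[sizeof(data)];
	off_t before, after;
	int ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the file is freed when fd is closed */

	before = lseek(fd, 0, SEEK_CUR);
	if (pwrite(fd, data, sizeof(data), 4096) == (ssize_t)sizeof(data) &&
	    pread(fd, check, sizeof(check), 4096) == (ssize_t)sizeof(check)) {
		after = lseek(fd, 0, SEEK_CUR);
		/* Same bytes back, and the shared file offset is unchanged. */
		if (after == before && memcmp(check, data, sizeof(data)) == 0)
			ret = 0;
	}

	close(fd);
	return ret;
}
```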

[PATCH v3 08/10] Minor performance optimization for ubd

Obvious minor optimization, independent of the others.

[PATCH v3 09/10] Better IPC for UBD

Obvious optimization, independent of the others. A pipe has a
very short queue with 4k granularity, which makes it a poor IPC
mechanism for passing many small chunks one at a time, as UBD does.

[PATCH v3 10/10] High Resolution Timer subsystem for UML

This version of the patch applies only on top of the epoll controller.
With minimal modifications it can also be applied to stock UML. It
fixes UML for use as a network appliance on all counts: TCP
performance, QoS, traffic shaping, etc.

The patch is not pretty (I would have preferred to kill itimer
completely). However, it does what it says on the tin and has been
doing so in testing for 2 years or so now.

Enjoy

--

A.R. Ivanov

anton.ivanov <at> kot-begemot.co.uk
antivano <at> cisco.com

anton.ivanov | 29 Aug 09:58 2014

[PATCHv2 10/10] High Resolution Timer subsystem for UML

From: Anton Ivanov <antivano <at> cisco.com>

This patch adds an extra timer source which has correct timing
and uses an up-to-date OS API (POSIX timers).

Result: correct kernel behaviour on timer-related tasks:

    1. Improvement in network performance (TCP state machines
are now fed correct time).
    2. Correct QoS and traffic shaping.

This improvement does not (and cannot) fix UML userspace. Its
timer/time-related behaviour depends heavily on SIGVTALRM pacing,
which is instantiated on a per-userspace-thread basis. This patch
does not fix that! It sorts out only the kernel side: forwarding,
QoS, TCP, etc.
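For comparison with the old ITIMER_VIRTUAL approach, here is a minimal standalone sketch of a one-shot POSIX timer on CLOCK_MONOTONIC that signals SIGUSR2, the same ingredients the patch uses. It is not the patch's code: `ns_to_oneshot()`, `oneshot_sigusr2_wait()` and `DEMO_NSEC_PER_SEC` are hypothetical, and older glibc needs `-lrt` at link time. The helper also shows the usual split of a nanosecond delay into whole seconds (`tv_sec`) and the remainder (`tv_nsec`).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* Split a delay in nanoseconds into a one-shot itimerspec. */
struct itimerspec ns_to_oneshot(long long ns)
{
	struct itimerspec its;

	memset(&its, 0, sizeof(its));	/* it_interval = 0: no reload */
	its.it_value.tv_sec = ns / DEMO_NSEC_PER_SEC;
	its.it_value.tv_nsec = ns % DEMO_NSEC_PER_SEC;
	return its;
}

/*
 * Arm a one-shot CLOCK_MONOTONIC POSIX timer that raises SIGUSR2 after
 * delay_ns, wait for the signal synchronously and return the elapsed
 * monotonic time in nanoseconds, or -1 on error.
 */
long long oneshot_sigusr2_wait(long long delay_ns)
{
	struct itimerspec its = ns_to_oneshot(delay_ns);
	struct timespec start, end;
	struct sigevent sev;
	timer_t timer;
	sigset_t set;
	long long elapsed = -1;
	int sig;

	/* Keep SIGUSR2 blocked so it stays pending for sigwait(). */
	sigemptyset(&set);
	sigaddset(&set, SIGUSR2);
	if (sigprocmask(SIG_BLOCK, &set, NULL) < 0)
		return -1;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = SIGUSR2;
	if (timer_create(CLOCK_MONOTONIC, &sev, &timer) < 0)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &start);
	if (timer_settime(timer, 0, &its, NULL) == 0 &&
	    sigwait(&set, &sig) == 0 && sig == SIGUSR2) {
		clock_gettime(CLOCK_MONOTONIC, &end);
		elapsed = (end.tv_sec - start.tv_sec) * DEMO_NSEC_PER_SEC +
			  (end.tv_nsec - start.tv_nsec);
	}

	timer_delete(timer);
	return elapsed;
}
```

Unlike ITIMER_VIRTUAL, which only advances while the process consumes user CPU time, this timer runs against wall-clock monotonic time, so it also fires while the process is idle.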

Signed-off-by: Anton Ivanov <antivano <at> cisco.com>
---

I missed timer-internal.h in the original submission. Apologies.

 arch/um/Makefile                        |    2 +-
 arch/um/include/asm/irq.h               |    3 +-
 arch/um/include/shared/kern_util.h      |    1 +
 arch/um/include/shared/os.h             |    5 +
 arch/um/include/shared/timer-internal.h |   20 ++++
 arch/um/kernel/irq.c                    |   12 +++
 arch/um/kernel/process.c                |    7 +-
 arch/um/kernel/time.c                   |   44 +++++---
 arch/um/os-Linux/signal.c               |   47 +++++++-
 arch/um/os-Linux/skas/process.c         |   24 ++---
 arch/um/os-Linux/time.c                 |  178 ++++++++++++++++++++++++-------
 11 files changed, 270 insertions(+), 73 deletions(-)
 create mode 100644 arch/um/include/shared/timer-internal.h

diff --git a/arch/um/Makefile b/arch/um/Makefile
index 133f7de..9864fb7 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -121,7 +121,7 @@ export LDS_ELF_FORMAT := $(ELF_FORMAT)
 # The wrappers will select whether using "malloc" or the kernel allocator.
 LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc

-LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt))
+LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt)) -lrt

 # Used by link-vmlinux.sh which has special support for um link
 export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE)
diff --git a/arch/um/include/asm/irq.h b/arch/um/include/asm/irq.h
index be9128b..4dd2f07 100644
--- a/arch/um/include/asm/irq.h
+++ b/arch/um/include/asm/irq.h
@@ -22,8 +22,9 @@
 #define TELNETD_IRQ 		UM_END_ETH_IRQ + 7
 #define XTERM_IRQ 		UM_END_ETH_IRQ + 8
 #define RANDOM_IRQ 		UM_END_ETH_IRQ + 9
+#define HRTIMER_IRQ             UM_END_ETH_IRQ + 10

-#define LAST_IRQ RANDOM_IRQ
+#define LAST_IRQ HRTIMER_IRQ
 #define NR_IRQS (LAST_IRQ + 1)

 #endif
diff --git a/arch/um/include/shared/kern_util.h b/arch/um/include/shared/kern_util.h
index 83a91f9..0282b36 100644
--- a/arch/um/include/shared/kern_util.h
+++ b/arch/um/include/shared/kern_util.h
@@ -37,6 +37,7 @@ extern void initial_thread_cb(void (*proc)(void *), void *arg);
 extern int is_syscall(unsigned long addr);

 extern void timer_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs);
+extern void hrtimer_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs);

 extern int start_uml(void);
 extern void paging_init(void);
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 7f544f4..d4fefb9 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -222,6 +222,7 @@ extern char *get_umid(void);

 /* signal.c */
 extern void timer_init(void);
+extern void uml_hrtimer_init(void);
 extern void set_sigstack(void *sig_stack, int size);
 extern void remove_sigstack(void);
 extern void set_handler(int sig);
@@ -245,8 +246,12 @@ extern void idle_sleep(unsigned long long nsecs);
 extern int set_interval(void);
 extern int timer_one_shot(int ticks);
 extern long long disable_timer(void);
+extern long long timer_remain(void);
 extern void uml_idle_timer(void);
+extern long long persistent_clock_emulation(void);
 extern long long os_nsecs(void);
+extern long long os_vnsecs(void);
+extern int itimer_init(void);

 /* skas/mem.c */
 extern long run_syscall_stub(struct mm_id * mm_idp,
diff --git a/arch/um/include/shared/timer-internal.h b/arch/um/include/shared/timer-internal.h
new file mode 100644
index 0000000..70f1ee1
--- /dev/null
+++ b/arch/um/include/shared/timer-internal.h
@@ -0,0 +1,20 @@
+/* 
+ * Copyright (C) 2012 - 2014 Cisco Systems
+ * Copyright (C) 2000 - 2007 Jeff Dike (jdike <at> {addtoit,linux.intel}.com)
+ * Licensed under the GPL
+ */
+
+#ifndef __TIMER_INTERNAL_H__
+#define __TIMER_INTERNAL_H__
+
+#define TIMER_MULTIPLIER 256
+#define TIMER_MIN_DELTA 500 
+
+extern void timer_lock(void);
+extern void timer_unlock(void);
+
+extern long long hrtimer_disable(void);
+extern long long tracingtimer_disable(void);
+
+#endif
+
diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index f4c6fb1..d70c487 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -529,11 +529,23 @@ static struct irq_chip SIGVTALRM_irq_type = {
 	.irq_unmask = dummy,
 };

+static struct irq_chip SIGUSR2_irq_type = {
+	.name = "SIGUSR2",
+	.irq_disable = dummy,
+	.irq_enable = dummy,
+	.irq_ack = dummy,
+	.irq_mask = dummy,
+	.irq_unmask = dummy,
+};
+
+
 void __init init_IRQ(void)
 {
 	int i;

 	irq_set_chip_and_handler(TIMER_IRQ, &SIGVTALRM_irq_type, handle_edge_irq);
+	irq_set_chip_and_handler(HRTIMER_IRQ, &SIGUSR2_irq_type, handle_edge_irq);
+	
 	for (i = 1; i < NR_IRQS - 1 ; i++)
 		irq_set_chip_and_handler(i, &normal_irq_type, handle_edge_irq);
 	os_setup_epoll(MAX_EPOLL_EVENTS);
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index bbcef52..b7ebc00 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -27,6 +27,7 @@
 #include <kern_util.h>
 #include <os.h>
 #include <skas.h>
+#include <timer-internal.h>

 /*
  * This is a per-cpu array.  A processor only modifies its entry and it only
@@ -215,7 +216,11 @@ void arch_cpu_idle(void)
 	unsigned long long nsecs;

 	cpu_tasks[current_thread_info()->cpu].pid = os_getpid();
-	nsecs = disable_timer();
+	/* There is no benefit whatsoever in disabling a pending
+	 * hrtimer and setting a nanowait for the same value instead,
+	 * so we do timer disable + wait only for the tracing one here.
+	 */
+	nsecs = tracingtimer_disable();
 	idle_sleep(nsecs);
 	local_irq_enable();
 }
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index 117568d..88fa9c6 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (C) 2012-2014 Cisco Systems
  * Copyright (C) 2000 - 2007 Jeff Dike (jdike <at> {addtoit,linux.intel}.com)
  * Licensed under the GPL
  */
@@ -12,6 +13,8 @@
 #include <asm/param.h>
 #include <kern_util.h>
 #include <os.h>
+#include <timer-internal.h>
+

 void timer_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
 {
@@ -22,6 +25,15 @@ void timer_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
 	local_irq_restore(flags);
 }

+void hrtimer_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	do_IRQ(HRTIMER_IRQ, regs);
+	local_irq_restore(flags);
+}
+
 static void itimer_set_mode(enum clock_event_mode mode,
 			    struct clock_event_device *evt)
 {
@@ -44,7 +56,7 @@ static void itimer_set_mode(enum clock_event_mode mode,
 static int itimer_next_event(unsigned long delta,
 			     struct clock_event_device *evt)
 {
-	return timer_one_shot(delta + 1);
+	return timer_one_shot(delta);
 }

 static struct clock_event_device itimer_clockevent = {
@@ -54,8 +66,11 @@ static struct clock_event_device itimer_clockevent = {
 	.features	= CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
 	.set_mode	= itimer_set_mode,
 	.set_next_event = itimer_next_event,
-	.shift		= 32,
+	.shift		= 0,
+	.max_delta_ns	= 0xffffffff,
+	.min_delta_ns	= TIMER_MIN_DELTA, /* microsecond resolution should be enough for anyone, same as 640K RAM */
 	.irq		= 0,
+	.mult		= 1,
 };

 static irqreturn_t um_timer(int irq, void *dev)
@@ -67,7 +82,7 @@ static irqreturn_t um_timer(int irq, void *dev)

 static cycle_t itimer_read(struct clocksource *cs)
 {
-	return os_nsecs() / 1000;
+	return os_nsecs() / TIMER_MULTIPLIER;
 }

 static struct clocksource itimer_clocksource = {
@@ -82,17 +97,21 @@ static void __init setup_itimer(void)
 {
 	int err;

-	err = request_irq(TIMER_IRQ, um_timer, 0, "timer", NULL);
+	err = request_irq(TIMER_IRQ, um_timer, IRQF_DISABLED, "timer", NULL);
 	if (err != 0)
 		printk(KERN_ERR "register_timer : request_irq failed - "
 		       "errno = %d\n", -err);
-
-	itimer_clockevent.mult = div_sc(HZ, NSEC_PER_SEC, 32);
-	itimer_clockevent.max_delta_ns =
-		clockevent_delta2ns(60 * HZ, &itimer_clockevent);
-	itimer_clockevent.min_delta_ns =
-		clockevent_delta2ns(1, &itimer_clockevent);
-	err = clocksource_register_hz(&itimer_clocksource, USEC_PER_SEC);
+	err = request_irq(HRTIMER_IRQ, um_timer, IRQF_DISABLED, "hr timer", NULL);
+	if (err != 0)
+		printk(KERN_ERR "register_timer : request_irq failed - "
+		       "errno = %d\n", -err);
+	err = itimer_init();
+
+	if (err != 0)
+		printk(KERN_ERR "init itimer failed - "
+		       "errno = %d\n", -err);
+
+	err = clocksource_register_hz(&itimer_clocksource, NSEC_PER_SEC/TIMER_MULTIPLIER);
 	if (err) {
 		printk(KERN_ERR "clocksource_register_hz returned %d\n", err);
 		return;
@@ -102,7 +121,7 @@ static void __init setup_itimer(void)

 void read_persistent_clock(struct timespec *ts)
 {
-	long long nsecs = os_nsecs();
+	long long nsecs = persistent_clock_emulation();

 	set_normalized_timespec(ts, nsecs / NSEC_PER_SEC,
 				nsecs % NSEC_PER_SEC);
@@ -111,5 +130,6 @@ void read_persistent_clock(struct timespec *ts)
 void __init time_init(void)
 {
 	timer_init();
+	uml_hrtimer_init();
 	late_time_init = setup_itimer;
 }
diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index 905924b..85cff54 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -23,7 +23,8 @@ void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *) = {
 	[SIGBUS]	= bus_handler,
 	[SIGSEGV]	= segv_handler,
 	[SIGIO]		= sigio_handler,
-	[SIGVTALRM]	= timer_handler };
+	[SIGVTALRM]	= timer_handler, 
+	[SIGUSR2]	= hrtimer_handler };

 static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
 {
@@ -58,6 +59,10 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
 #define SIGVTALRM_BIT 1
 #define SIGVTALRM_MASK (1 << SIGVTALRM_BIT)

+#define SIGUSR2_BIT 2
+#define SIGUSR2_MASK (1 << SIGUSR2_BIT)
+
+
 static int signals_enabled;
 static unsigned int signals_pending;

@@ -89,6 +94,17 @@ static void real_alarm_handler(mcontext_t *mc)
 	timer_handler(SIGVTALRM, NULL, &regs);
 }

+static void real_hralarm_handler(mcontext_t *mc)
+{
+	struct uml_pt_regs regs;
+
+	if (mc != NULL)
+		get_regs_from_mc(&regs, mc);
+	regs.is_user = 0;
+	hrtimer_handler(SIGUSR2, NULL, &regs);
+}
+
+
 void alarm_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
 {
 	int enabled;
@@ -105,11 +121,33 @@ void alarm_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
 	set_signals(enabled);
 }

+void hralarm_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
+{
+	int enabled;
+
+	enabled = signals_enabled;
+	if (!signals_enabled) {
+		signals_pending |= SIGUSR2_MASK;
+		return;
+	}
+
+	block_signals();
+
+	real_hralarm_handler(mc);
+	set_signals(enabled);
+}
+
+
 void timer_init(void)
 {
 	set_handler(SIGVTALRM);
 }

+void uml_hrtimer_init(void)
+{
+	set_handler(SIGUSR2);
+}
+
 void set_sigstack(void *sig_stack, int size)
 {
 	stack_t stack = ((stack_t) { .ss_flags	= 0,
@@ -129,7 +167,8 @@ static void (*handlers[_NSIG])(int sig, struct siginfo *si, mcontext_t *mc) = {

 	[SIGIO] = sig_handler,
 	[SIGWINCH] = sig_handler,
-	[SIGVTALRM] = alarm_handler
+	[SIGVTALRM] = alarm_handler,
+	[SIGUSR2] = hralarm_handler
 };

 
@@ -189,6 +228,7 @@ void set_handler(int sig)
 	sigaddset(&action.sa_mask, SIGVTALRM);
 	sigaddset(&action.sa_mask, SIGIO);
 	sigaddset(&action.sa_mask, SIGWINCH);
+	sigaddset(&action.sa_mask, SIGUSR2);

 	if (sig == SIGSEGV)
 		flags |= SA_NODEFER;
@@ -283,6 +323,9 @@ void unblock_signals(void)

 		if (save_pending & SIGVTALRM_MASK)
 			real_alarm_handler(NULL);
+
+		if (save_pending & SIGUSR2_MASK)
+			real_hralarm_handler(NULL);
 	}
 }

diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c
index d531879..64ccc64 100644
--- a/arch/um/os-Linux/skas/process.c
+++ b/arch/um/os-Linux/skas/process.c
@@ -346,8 +346,7 @@ int start_userspace(unsigned long stub_stack)

 void userspace(struct uml_pt_regs *regs)
 {
-	struct itimerval timer;
-	unsigned long long nsecs, now;
+	unsigned long long nsecs;
 	int err, status, op, pid = userspace_pid[0];
 	/* To prevent races if using_sysemu changes under us.*/
 	int local_using_sysemu;
@@ -356,13 +355,11 @@ void userspace(struct uml_pt_regs *regs)
 	/* Handle any immediate reschedules or signals */
 	interrupt_end();

-	if (getitimer(ITIMER_VIRTUAL, &timer))
-		printk(UM_KERN_ERR "Failed to get itimer, errno = %d\n", errno);
-	nsecs = timer.it_value.tv_sec * UM_NSEC_PER_SEC +
-		timer.it_value.tv_usec * UM_NSEC_PER_USEC;
-	nsecs += os_nsecs();
-
 	while (1) {
+
+		nsecs = timer_remain();
+		nsecs += os_nsecs();
+
 		/*
 		 * This can legitimately fail if the process loads a
 		 * bogus value into a segment register.  It will
@@ -434,23 +431,18 @@ void userspace(struct uml_pt_regs *regs)
 				relay_signal(SIGTRAP, (struct siginfo *)&si, regs);
 				break;
 			case SIGVTALRM:
-				now = os_nsecs();
-				if (now < nsecs)
+				if (nsecs < os_nsecs())
 					break;
 				block_signals();
 				(*sig_info[sig])(sig, (struct siginfo *)&si, regs);
 				unblock_signals();
-				nsecs = timer.it_value.tv_sec *
-					UM_NSEC_PER_SEC +
-					timer.it_value.tv_usec *
-					UM_NSEC_PER_USEC;
-				nsecs += os_nsecs();
-				break;
+				break;
 			case SIGIO:
 			case SIGILL:
 			case SIGBUS:
 			case SIGFPE:
 			case SIGWINCH:
+			case SIGUSR2:
 				block_signals();
 				(*sig_info[sig])(sig, (struct siginfo *)&si, regs);
 				unblock_signals();
diff --git a/arch/um/os-Linux/time.c b/arch/um/os-Linux/time.c
index e9824d5..f6eab4f 100644
--- a/arch/um/os-Linux/time.c
+++ b/arch/um/os-Linux/time.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (C) 2012-2014 Cisco Systems
  * Copyright (C) 2000 - 2007 Jeff Dike (jdike{addtoit,linux.intel}.com)
  * Licensed under the GPL
  */
@@ -10,7 +11,53 @@
 #include <sys/time.h>
 #include <kern_util.h>
 #include <os.h>
+#include <string.h>
 #include "internal.h"
+#include <timer-internal.h>
+
+static timer_t event_high_res_timer = 0;
+
+static inline long long timeval_to_ns(const struct timeval *tv)
+{
+	return ((long long) tv->tv_sec * UM_NSEC_PER_SEC) +
+		tv->tv_usec * UM_NSEC_PER_USEC;
+}
+
+static inline long long timespec_to_ns(const struct timespec *ts)
+{
+ 	return ((long long) ts->tv_sec * UM_NSEC_PER_SEC) +
+ 		ts->tv_nsec;
+}
+
+long long  persistent_clock_emulation (void) {
+	struct timespec realtime_tp;
+
+	clock_gettime(CLOCK_REALTIME, &realtime_tp);
+	return timespec_to_ns(&realtime_tp);
+}
+
+
+int itimer_init(void) {
+	struct sigevent sev;
+	sev.sigev_notify = SIGEV_SIGNAL;
+	sev.sigev_signo = SIGUSR2; /* note - hrtimer now has its own signal */
+	sev.sigev_value.sival_ptr = &event_high_res_timer;
+	if (timer_create(
+	       CLOCK_MONOTONIC,
+	       &sev,
+	       &event_high_res_timer) == -1
+	) {
+		printk(UM_KERN_ERR "Failed to create timer, errno = %d\n", errno);
+		return -1;
+	} else {
+		printk("Event timer ID is 0x%lx\n", (long) event_high_res_timer);
+	}
+	return 0;
+}
+
+/*
+ * This is used for tracing and cannot be removed at this point (TODO)
+ */

 int set_interval(void)
 {
@@ -24,61 +71,106 @@ int set_interval(void)
 	return 0;
 }

-int timer_one_shot(int ticks)
+long long timer_remain(void)
 {
-	unsigned long usec = ticks * UM_USEC_PER_SEC / UM_HZ;
-	unsigned long sec = usec / UM_USEC_PER_SEC;
 	struct itimerval interval;
+	long long remain = 0;
+	if (getitimer(ITIMER_VIRTUAL, &interval)) {
+		printk(UM_KERN_ERR "Failed to get itimer, errno = %d\n", errno);
+	} else {
+		remain = timeval_to_ns(&interval.it_value);
+	}
+	return remain;
+}

-	usec %= UM_USEC_PER_SEC;
-	interval = ((struct itimerval) { { 0, 0 }, { sec, usec } });
+int timer_one_shot(int ticks)
+{
+	struct itimerspec its;
+	unsigned long long nsec;
+	unsigned long sec;

-	if (setitimer(ITIMER_VIRTUAL, &interval, NULL) == -1)
-		return -errno;
+
+	nsec = (ticks + 1);
+
+	sec = nsec / UM_NSEC_PER_SEC;
+
+	nsec = nsec % UM_NSEC_PER_SEC;
+
+	its.it_value.tv_sec = sec;
+	its.it_value.tv_nsec = nsec;
+
+	its.it_interval.tv_sec = 0;
+	its.it_interval.tv_nsec = 0; /* we cheat here */
+
+	timer_settime(event_high_res_timer, 0, &its, NULL);

 	return 0;
 }

-/**
- * timeval_to_ns - Convert timeval to nanoseconds
- * @ts:		pointer to the timeval variable to be converted
- *
- * Returns the scalar nanosecond representation of the timeval
- * parameter.
- *
- * Ripped from linux/time.h because it's a kernel header, and thus
- * unusable from here.
- */
-static inline long long timeval_to_ns(const struct timeval *tv)
+long long hrtimer_disable(void)
 {
-	return ((long long) tv->tv_sec * UM_NSEC_PER_SEC) +
-		tv->tv_usec * UM_NSEC_PER_USEC;
+	struct itimerspec its;
+
+	memset(&its, 0, sizeof(struct itimerspec));
+	timer_settime(event_high_res_timer, 0, &its, &its);
+
+	return its.it_value.tv_sec * UM_NSEC_PER_SEC + its.it_value.tv_nsec;
 }

-long long disable_timer(void)
+long long tracingtimer_disable(void)
 {
-	struct itimerval time = ((struct itimerval) { { 0, 0 }, { 0, 0 } });
-	long long remain, max = UM_NSEC_PER_SEC / UM_HZ;
+	struct itimerval itv;

-	if (setitimer(ITIMER_VIRTUAL, &time, &time) < 0)
-		printk(UM_KERN_ERR "disable_timer - setitimer failed, "
-		       "errno = %d\n", errno);
+	memset(&itv, 0, sizeof(struct itimerval));
+	setitimer(ITIMER_VIRTUAL, &itv, &itv);

-	remain = timeval_to_ns(&time.it_value);
-	if (remain > max)
-		remain = max;
+	return itv.it_value.tv_sec * UM_NSEC_PER_SEC + itv.it_value.tv_usec * 1000;
+}
+
+long long disable_timer(void)
+{
+	long long nsec;
+	long long tnsec;
+
+	/*
+	 *
+	 * This is now fixed in the main idle loop, so we really kill
+	 * both timers here to ensure that UML can exit cleanly and
+	 * not die on a spurious SIGVTALRM.
+	 *
+	 */
+
+
+	nsec = hrtimer_disable();
+	tnsec = tracingtimer_disable();
+	if (nsec > tnsec) {
+		return tnsec;
+	} else {
+		return nsec;
+	}
+}
+
+long long os_vnsecs(void)
+{
+	struct timespec ts;
+
+	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
+	return timespec_to_ns(&ts);

-	return remain;
 }

 long long os_nsecs(void)
 {
-	struct timeval tv;

-	gettimeofday(&tv, NULL);
-	return timeval_to_ns(&tv);
+	struct timespec ts;
+
+	clock_gettime(CLOCK_MONOTONIC, &ts);
+	return timespec_to_ns(&ts);
+
 }

+
+
 #ifdef UML_CONFIG_NO_HZ_COMMON
 static int after_sleep_interval(struct timespec *ts)
 {
@@ -169,18 +261,24 @@ void idle_sleep(unsigned long long nsecs)
 	struct timespec ts;

 	/*
-	 * nsecs can come in as zero, in which case, this starts a
-	 * busy loop.  To prevent this, reset nsecs to the tick
-	 * interval if it is zero.
+	 * We sleep here for an interval no longer than one tick (1/HZ).
+	 * The hrtimer was not disabled in arch_cpu_idle(), so if it is
+	 * active it will wake us up right on time, instead of us trying
+	 * to program nanosleep() in a racy manner.
+	 *
 	 */
-	if (nsecs == 0)
-		nsecs = UM_NSEC_PER_SEC / UM_HZ;
+
+	if ((nsecs == 0) || (nsecs > UM_NSEC_PER_SEC / UM_HZ)) {
+		nsecs = UM_NSEC_PER_SEC / UM_HZ;
+	}

-	nsecs = sleep_time(nsecs);
 	ts = ((struct timespec) { .tv_sec	= nsecs / UM_NSEC_PER_SEC,
 				  .tv_nsec	= nsecs % UM_NSEC_PER_SEC });

-	if (nanosleep(&ts, &ts) == 0)
+
+	if (clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, &ts) == 0) {
 		deliver_alarm();
+	}
+	set_interval();
 	after_sleep_interval(&ts);
 }
--

-- 
1.7.10.4

anton.ivanov | 29 Aug 09:56 2014

[PATCHv2 3/10] High performance networking subsystem

From: Anton Ivanov <antivano <at> cisco.com>

    Support for multi-packet vector IO: multiple packets are
    read in one syscall and written in one syscall. Should work with
    legacy UML; thoroughly tested only with the epoll-based IRQ controller.

    Minimum host kernel version for RX - 2.6.32
    Minimum host kernel version for TX - 3.0

    Tested on Debian 7.0/Ubuntu 12.x LTS, which have the relevant
    syscalls but do not have the appropriate glibc routine for TX
    (this is why it is a direct syscall).

Signed-off-by: Anton Ivanov <antivano <at> cisco.com>
---

I missed net_extra_* in the original submission; this is a resubmit.
Apologies.

 arch/um/drivers/Makefile          |    2 +-
 arch/um/drivers/net_extra_kern.c  |  218 +++++++++++++++++++++++++
 arch/um/drivers/net_extra_user.c  |  319 +++++++++++++++++++++++++++++++++++++
 arch/um/drivers/net_kern.c        |   63 +++++---
 arch/um/include/asm/irq.h         |   26 +--
 arch/um/include/shared/net_kern.h |   24 +++
 arch/um/include/shared/net_user.h |   24 +++
 arch/um/kernel/irq.c              |    3 +
 8 files changed, 646 insertions(+), 33 deletions(-)
 create mode 100644 arch/um/drivers/net_extra_kern.c
 create mode 100644 arch/um/drivers/net_extra_user.c

diff --git a/arch/um/drivers/Makefile b/arch/um/drivers/Makefile
index e7582e1..836baaf 100644
--- a/arch/um/drivers/Makefile
+++ b/arch/um/drivers/Makefile
@@ -10,7 +10,7 @@ slip-objs := slip_kern.o slip_user.o
 slirp-objs := slirp_kern.o slirp_user.o
 daemon-objs := daemon_kern.o daemon_user.o
 umcast-objs := umcast_kern.o umcast_user.o
-net-objs := net_kern.o net_user.o
+net-objs := net_kern.o net_user.o net_extra_user.o net_extra_kern.o
 mconsole-objs := mconsole_kern.o mconsole_user.o
 hostaudio-objs := hostaudio_kern.o
 ubd-objs := ubd_kern.o ubd_user.o
diff --git a/arch/um/drivers/net_extra_kern.c b/arch/um/drivers/net_extra_kern.c
new file mode 100644
index 0000000..b1d36d8
--- /dev/null
+++ b/arch/um/drivers/net_extra_kern.c
@@ -0,0 +1,218 @@
+/*
+ * Copyright (C) 2012 - 2014 Cisco Systems
+ * Copyright (C) 2001 - 2007 Jeff Dike (jdike <at> {addtoit,linux.intel}.com)
+ * Copyright (C) 2001 Lennert Buytenhek (buytenh <at> gnu.org) and
+ * James Leu (jleu <at> mindspring.net).
+ * Copyright (C) 2001 by various other people who didn't put their name here.
+ * Licensed under the GPL.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/inetdevice.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <linux/rtnetlink.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include "init.h"
+#include "irq_kern.h"
+#include "irq_user.h"
+#include "mconsole_kern.h"
+#include "net_kern.h"
+#include "net_user.h"
+
+#define DRIVER_NAME "uml-netdev"
+
+/* 
+	These are wrappers around key kernel-side functions so that we
+	can invoke them from the user side of our schizophrenic self.
+
+*/
+
+extern spinlock_t uml_sigio_lock;
+extern int in_epoll_loop;
+
+static DEFINE_SPINLOCK(net_queue_list);
+
+static struct mmsg_queue_info * pending_queue = NULL;
+
+void uml_net_destroy_skb(void * skb)
+{
+	if (skb) {
+		kfree_skb((struct sk_buff *) skb);
+	}
+}
+
+void * uml_net_build_skb (void * dev)
+{
+	struct uml_net_private *lp = netdev_priv((struct net_device *) dev);
+	struct sk_buff * skb;
+
+	skb =  dev_alloc_skb(lp->max_packet + 32);
+	if (skb) {
+	/* add some tunneling space just in case, we usually do not need it as we use vector IO */
+		skb_reserve(skb,32);	
+		skb->dev = dev;
+		skb_put(skb, lp->max_packet);
+		skb_reset_mac_header(skb);
+		skb->ip_summed =  CHECKSUM_NONE;
+	} else {
+		printk("Failed Atomic SKB Allocation, will drop\n");
+	}
+	return skb;
+}
+
+void * uml_net_skb_data (void * skb) {
+	if (skb) {
+		return ((struct sk_buff *) skb)->data;
+	} else {
+		printk("hole in vector!!!\n");
+		return NULL;
+	}
+}
+
+
+int uml_net_advance_head( struct mmsg_queue_info * queue_info, int advance)
+{
+	int queue_depth;
+	queue_info->head = 
+		(queue_info->head + advance) 
+			% queue_info->max_depth;
+
+	/* caller is already holding the head_lock */
+
+	spin_lock(&queue_info->tail_lock);
+	queue_info->queue_depth -= advance;
+	queue_depth = queue_info->queue_depth;
+	spin_unlock(&queue_info->tail_lock);
+	return queue_depth;
+}
+
+/* 
+	This is called by enqueuers, which should already hold the
+	tail lock (this function takes the head lock itself)
+*/ 
+
+int uml_net_advance_tail( struct mmsg_queue_info * queue_info, int advance) 
+{
+	int queue_depth;
+	queue_info->tail = 
+		(queue_info->tail + advance) 
+			% queue_info->max_depth;
+	spin_lock(&queue_info->head_lock);
+	queue_info->queue_depth += advance;
+	queue_depth = queue_info->queue_depth;
+	spin_unlock(&queue_info->head_lock);
+	return queue_depth;
+}
+
+
+static int flush_mmsg_queue(struct mmsg_queue_info * queue_info, int queue_depth) 
+{
+	int fd;
+	struct mmsghdr * send_from;
+	void ** skb_send_vector;
+	int result = 0, send_len, skb_index, allowed_drop = 0;
+
+	if (! queue_info) {
+		/* someone passed a null queue, should not occur */
+		return 0;
+	}
+	fd = queue_info->fd; /* dereference only after the NULL check */
+	if (spin_trylock(&queue_info->head_lock))   {
+		if (spin_trylock(&queue_info->tail_lock)) { 
+			/* update queue_depth */
+			queue_depth = queue_info->queue_depth;
+			spin_unlock(&queue_info->tail_lock); 
+			if (queue_depth > 0) {
+				do {
+					send_len = queue_depth;
+					send_from = queue_info->mmsg_send_vector;
+					send_from += queue_info->head;
+					if (send_len + queue_info->head > queue_info->max_depth) {
+						send_len = queue_info->max_depth - queue_info->head;
+					}
+					if (send_len > 0) {
+						result = net_sendmmsg(
+						    fd, send_from, send_len, 0
+						);
+						if (send_len == result) {
+							/* clear drop allowance */
+							allowed_drop = 0;
+						} else {
+							/* first time we just retry */
+							result = result + allowed_drop;
+							if (send_len - result < 0) {
+								result = send_len;
+							}
+							allowed_drop = (allowed_drop + 1) * 2;
+						}
+					}
+					if (result > 0) {
+						skb_send_vector = queue_info->skb_send_vector;
+						skb_send_vector += queue_info->head;
+						for (skb_index = 0; skb_index < result; skb_index++) {
+							uml_net_destroy_skb(* skb_send_vector);
+							(* skb_send_vector) = NULL; /* just in case */
+							skb_send_vector ++ ;
+						}
+						queue_depth = uml_net_advance_head(queue_info, result);
+					} 
+				} while (
+					(send_len == result)  && /* we sent whatever we tried */
+					(queue_depth > 0)
+					);
+			} 
+		} 
+		spin_unlock(&queue_info->head_lock);
+	} 
+	return queue_depth;
+}
+
+int uml_net_flush_mmsg_queue(
+    struct mmsg_queue_info * queue_info,
+    int queue_depth
+) {
+
+	if (queue_depth >= (queue_info->max_depth - 1)) {
+		return flush_mmsg_queue(pending_queue, queue_depth);
+	}
+	if (spin_trylock(&uml_sigio_lock)) {
+		/* unconditional flush - end of epoll loop */
+		if (!(in_epoll_loop)) {
+			queue_depth = flush_mmsg_queue(queue_info, queue_depth);
+		}
+		spin_unlock(&uml_sigio_lock);
+	} 
+	
+	spin_lock(&net_queue_list);
+	if ((pending_queue) && (pending_queue != queue_info)) {
+		flush_mmsg_queue(pending_queue, queue_depth);
+		/* we need a packet drop procedure here */
+	} else {
+		queue_depth = 0;
+	}
+	pending_queue = queue_info;
+	spin_unlock(&net_queue_list);
+
+	return queue_depth;
+}
+
+void flush_pending_netio(void) {
+	int result; 
+	spin_lock(&net_queue_list);
+	if (pending_queue) {
+		do {
+			result = flush_mmsg_queue(pending_queue, 1);
+		} while (result > 0);
+	}
+	pending_queue = NULL;
+	spin_unlock(&net_queue_list);
+}
+
+
diff --git a/arch/um/drivers/net_extra_user.c b/arch/um/drivers/net_extra_user.c
new file mode 100644
index 0000000..f6715d1
--- /dev/null
+++ b/arch/um/drivers/net_extra_user.c
@@ -0,0 +1,319 @@
+/*
+ * Copyright (C) 2012 - 2014 Cisco Systems
+ * Licensed under the GPL
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdarg.h>
+#include <errno.h>
+#include <stddef.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <sys/wait.h>
+#include <asm/unistd.h>
+#include "net_user.h"
+#include "os.h"
+#include "um_malloc.h"
+
+/* 
+* Principles of operation:
+*
+* EVERYTHING here is built to tolerate a failed memory allocation.
+* If either a header buffer or a data buffer (taken from skb->data)
+* is NULL, the read will come up short and the packet will be dropped.
+* This is the normal behaviour of recvmsg and recvmmsg: if a
+* particular iov_base == NULL and its corresponding iov_len is
+* 0, the packet is truncated and/or dropped altogether.
+*
+* On the negative side this means that we have to do a few more 
+* checks for NULL here and there. On the positive side this means 
+* that the whole thing is more robust including under low
+* memory conditions.
+*
+* There is one special case which we need to handle as a result of 
+* this - any header verification functions should return "broken 
+* header" on hitting a NULL. This will in turn invoke the applicable
+* packet drop logic.
+* 
+* Any changes should follow this overall design.
+*
+* Side effect: none of these need to use the shared (and mutexed)
+* drop skb. It is surplus to requirements; the normal recvm(m)msg
+* truncation mechanics will drop the packet.
+*/
+
+int net_readv(int fd, void *iov, int iovcnt)
+{
+	int n;
+
+	CATCH_EINTR(n = readv(fd,  iov,  iovcnt));
+	if ((n < 0) && (errno == EAGAIN))
+		return 0;
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+
+int net_recvfrom2(int fd, void *buf, int len, void *src_addr, int *addrlen)
+{
+	int n;
+
+	CATCH_EINTR(n = recvfrom(fd,  buf,  len, 0, src_addr, addrlen));
+	if (n < 0) {
+		if (errno == EAGAIN)
+			return 0;
+		return -errno;
+	}
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+
+int net_writev(int fd, void *iov, int iovcnt)
+{
+	int n;
+
+	CATCH_EINTR(n = writev(fd, iov, iovcnt));
+
+	if ((n < 0) && (errno == EAGAIN))
+		return 0;
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+
+int net_sendmessage(int fd, void *msg, int flags)
+{
+	int n;
+
+	CATCH_EINTR(n = sendmsg(fd, msg, flags));
+	if (n < 0) {
+		if (errno == EAGAIN)
+			return 0;
+		return -errno;
+	}
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+int net_recvmessage(int fd, void *msg, int flags)
+{
+	int n;
+
+	CATCH_EINTR(n = recvmsg(fd, msg, flags));
+	if (n < 0) {
+		if (errno == EAGAIN)
+			return 0;
+		return -errno;
+	}
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+
+int net_recvmmsg(int fd, void *msgvec, unsigned int vlen,
+		    unsigned int flags, struct timespec *timeout)
+{
+	int n;
+
+	CATCH_EINTR(n = recvmmsg(fd, msgvec, vlen, flags, timeout));
+	if (n < 0) {
+		if (errno == EAGAIN)
+			return 0;
+		return -errno;
+	}
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+
+int net_sendmmsg(int fd, void *msgvec, unsigned int vlen,
+		    unsigned int flags)
+{
+	int n;
+
+#ifdef HAS_SENDMMSG 
+
+    /* has proper sendmmsg */
+
+	CATCH_EINTR(n = sendmmsg(fd, msgvec, vlen, flags));
+#else
+
+    /* no glibc wrapper for sendmmsg - Ubuntu LTS 12.04, Debian 7.x */
+
+	CATCH_EINTR(n = syscall(__NR_sendmmsg, fd, msgvec, vlen, flags));
+#endif
+	if (n < 0) {
+		if (errno == EAGAIN)
+			return 0;
+		return -errno;
+	}
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+
+void destroy_skb_vector(void ** vector, int size)
+{
+	int i;
+	void ** tofree = vector;
+
+	for (i=0;i<size;i++) {
+		if ( * vector) {
+			uml_net_destroy_skb(* vector);
+		}
+		vector ++;
+	}
+	kfree(tofree);
+}
+
+void destroy_mmsg_vector(void * mmsgvector, int size, int free_iov_base)
+{
+	struct mmsghdr * vector = (struct mmsghdr *) mmsgvector;
+	struct iovec * iov;
+	int i;
+	if (vector) {
+		for (i = 0; i < size; i++) {
+			iov = vector->msg_hdr.msg_iov;
+			if (iov) {
+				if (free_iov_base) {
+					kfree(iov->iov_base);
+				}
+				kfree(iov);
+			}
+			vector ++;
+		}
+		kfree(mmsgvector);
+	} else {
+		printk("NULL mmsg vector in destroy, should not occur\n");
+	}
+}
+
+void * build_skbuf_vector(int size, void * dev)
+{
+	int i;
+	void **result, **vector;
+	result = uml_kmalloc(size * sizeof(void *), UM_GFP_KERNEL);
+	vector = result;
+	if (vector) {
+		for (i = 0; i < size; i++) {
+			* vector = uml_net_build_skb(dev);
+			vector++;
+		}
+	}
+	return result;
+}  
+
+void rebuild_skbuf_vector(void ** skbvec, int size, void * dev)
+{
+	int i;
+	if (skbvec) {
+		for (i = 0; i < size; i++) {
+			* skbvec = uml_net_build_skb(dev);
+			skbvec++;
+		}
+	}
+}  
+
+void repair_mmsg (void *vec, int iovsize, int header_size)
+{
+	struct mmsghdr * msgvec = (struct mmsghdr *) vec;
+	struct iovec * iov;
+	if (! msgvec->msg_hdr.msg_iov) {
+		msgvec->msg_hdr.msg_iov = uml_kmalloc(sizeof(struct iovec) * iovsize, UM_GFP_KERNEL);
+	}
+	iov = msgvec->msg_hdr.msg_iov;
+	if (iov) {
+		if (! iov->iov_base) {
+			iov->iov_base=uml_kmalloc(header_size, UM_GFP_KERNEL);
+		}
+		if (iov->iov_base) {
+			/* put correct header size just in case - we may have had a short frame */
+			iov->iov_len = header_size; 
+		} else {
+			printk("failed to allocate a header buffer, will cause a packet drop later\n");
+			iov->iov_len = 0;
+		}
+	}
+}
+
+void * build_mmsg_vector(int size, int iovsize)
+{
+	int i;
+	struct mmsghdr *msgvec, *result;
+	struct iovec * iov;
+
+	result = uml_kmalloc(sizeof(struct mmsghdr) * size, UM_GFP_KERNEL);
+	msgvec = result;
+	if (msgvec) {
+		memset(msgvec, '\0', sizeof(struct mmsghdr) * size); 
+		for ( i = 0; i < size; i++) {
+			iov = uml_kmalloc(sizeof(struct iovec) * iovsize, UM_GFP_KERNEL);
+			msgvec->msg_hdr.msg_iov=iov;
+			if (iov) {
+				memset(iov, '\0', sizeof(struct iovec) * iovsize); 
+				msgvec->msg_hdr.msg_iovlen=iovsize;
+			} else {
+				printk("failed to allocate iov\n");
+				msgvec->msg_hdr.msg_iovlen=0; /* silent drop on receive, no xmit */
+			}
+			msgvec++;
+		}
+	}
+	return result;
+}
+
+void add_header_buffers(void * msgvec, int size, int header_size)
+{
+	int i;
+	struct iovec * iov;
+	struct mmsghdr * mmsgvec = (struct mmsghdr *) msgvec;
+	for ( i = 0; i < size; i++) {
+		iov = mmsgvec->msg_hdr.msg_iov;
+		if (iov) {
+			iov->iov_base=uml_kmalloc(header_size, UM_GFP_KERNEL);
+			if (iov->iov_base) {
+				iov->iov_len = header_size;
+			} else {
+				printk("failed to allocate a header buffer, will cause a packet drop later\n");
+				iov->iov_len = 0;
+			}
+		} 
+		mmsgvec++;
+	}
+}
+
+/* NOTE - this is only for offset = 0 or 1, other cases are unhandled!!! */
+
+void add_skbuffs(void * msgvec, void ** skbvec, int size, int skb_size, int offset) {
+	int i;
+	struct iovec * iov;
+	struct mmsghdr * mmsgvec = (struct mmsghdr *) msgvec;
+	for ( i = 0; i < size; i++) {
+	/* 
+	    This relies heavily on all IOVs being present. If the initial allocation
+	    fails, the caller must clean up and switch to "normal" per-packet receive
+	    instead. Later skbuf allocations can fail; this will result in short
+	    reads and skips.
+
+	 */
+		iov = mmsgvec->msg_hdr.msg_iov;
+		if (iov) {
+			iov += offset; 
+			iov->iov_base=uml_net_skb_data(* skbvec);
+			if (iov->iov_base) {
+				iov->iov_len = skb_size;
+			} else {
+				printk("NULL SKB will drop\n");
+				iov->iov_len = 0;
+			}
+		} else {
+			printk("NULL IOV will drop\n");
+		}
+		mmsgvec++;
+		skbvec++;
+	}
+}
+
+
diff --git a/arch/um/drivers/net_kern.c b/arch/um/drivers/net_kern.c
index 64d8426..1d253fa 100644
--- a/arch/um/drivers/net_kern.c
+++ b/arch/um/drivers/net_kern.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (C) 2012 - 2014 Cisco Systems
  * Copyright (C) 2001 - 2007 Jeff Dike (jdike <at> {addtoit,linux.intel}.com)
  * Copyright (C) 2001 Lennert Buytenhek (buytenh <at> gnu.org) and
  * James Leu (jleu <at> mindspring.net).
@@ -29,6 +30,7 @@

 static DEFINE_SPINLOCK(opened_lock);
 static LIST_HEAD(opened);
+static int rr_counter = 0;

 /*
  * The drop_skb is used when we can't allocate an skb.  The
@@ -42,6 +44,7 @@ static DEFINE_SPINLOCK(drop_lock);
 static struct sk_buff *drop_skb;
 static int drop_max;

+
 static int update_drop_skb(int max)
 {
 	struct sk_buff *new;
@@ -77,24 +80,38 @@ static int uml_net_rx(struct net_device *dev)
 	struct sk_buff *skb;

 	/* If we can't allocate memory, try again next round. */
-	skb = dev_alloc_skb(lp->max_packet);
-	if (skb == NULL) {
-		drop_skb->dev = dev;
-		/* Read a packet into drop_skb and don't do anything with it. */
-		(*lp->read)(lp->fd, drop_skb, lp);
-		dev->stats.rx_dropped++;
+	if (lp->options & UML_NET_USE_SKB_READ) {
+	    /* we expect a full formed, well behaved skb from zero copy drivers here */
+	    skb = (*lp->skb_read)(lp);
+	    if (skb == NULL) {
 		return 0;
-	}
-
-	skb->dev = dev;
-	skb_put(skb, lp->max_packet);
-	skb_reset_mac_header(skb);
-	pkt_len = (*lp->read)(lp->fd, skb, lp);
-
-	if (pkt_len > 0) {
+	    }
+	    pkt_len = skb->len;
+	} else {
+	    skb = dev_alloc_skb(lp->max_packet + 32);
+	    if (skb == NULL) {
+		    drop_skb->dev = dev;
+		    /* Read a packet into drop_skb and don't do anything with it. */
+		    (*lp->read)(lp->fd, drop_skb, lp);
+		    dev->stats.rx_dropped++;
+		    return 0;
+	    }
+
+	    skb_reserve(skb,32);
+	    skb->dev = dev;
+	    skb_put(skb, lp->max_packet);
+	    skb_reset_mac_header(skb);
+
+	    // Mark that virtual devices cannot provide required checksum.
+	    skb->ip_summed = CHECKSUM_NONE;
+	    pkt_len = (*lp->read)(lp->fd, skb, lp);
+	    if (pkt_len > 0) {
 		skb_trim(skb, pkt_len);
 		skb->protocol = (*lp->protocol)(skb);
+	    }
+	}

+	if (pkt_len > 0) {
 		dev->stats.rx_bytes += skb->len;
 		dev->stats.rx_packets++;
 		netif_rx(skb);
@@ -192,8 +209,9 @@ static int uml_net_close(struct net_device *dev)
 	struct uml_net_private *lp = netdev_priv(dev);

 	netif_stop_queue(dev);
+	deactivate_fd(lp->fd, dev->irq);

-	um_free_irq(dev->irq, dev);
+	free_irq(dev->irq, dev);
 	if (lp->close != NULL)
 		(*lp->close)(lp->fd, &lp->user);
 	lp->fd = -1;
@@ -216,7 +234,6 @@ static int uml_net_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	spin_lock_irqsave(&lp->lock, flags);

 	len = (*lp->write)(lp->fd, skb, lp);
-	skb_tx_timestamp(skb);

 	if (len == skb->len) {
 		dev->stats.tx_packets++;
@@ -273,14 +290,13 @@ static void uml_net_poll_controller(struct net_device *dev)
 static void uml_net_get_drvinfo(struct net_device *dev,
 				struct ethtool_drvinfo *info)
 {
-	strlcpy(info->driver, DRIVER_NAME, sizeof(info->driver));
-	strlcpy(info->version, "42", sizeof(info->version));
+	strcpy(info->driver, DRIVER_NAME);
+	strcpy(info->version, "42");
 }

 static const struct ethtool_ops uml_net_ethtool_ops = {
 	.get_drvinfo	= uml_net_get_drvinfo,
 	.get_link	= ethtool_op_get_link,
-	.get_ts_info	= ethtool_op_get_ts_info,
 };

 static void uml_net_user_timer_expire(unsigned long _conn)
@@ -447,6 +463,7 @@ static void eth_configure(int n, void *init, char *mac,
 	 * These just fill in a data structure, so there's no failure
 	 * to be worried about.
 	 */
+	dev->ethtool_ops = &uml_net_ethtool_ops;
 	(*transport->kern->init)(dev, init);

 	*lp = ((struct uml_net_private)
@@ -459,7 +476,9 @@ static void eth_configure(int n, void *init, char *mac,
 		  .open 		= transport->user->open,
 		  .close 		= transport->user->close,
 		  .remove 		= transport->user->remove,
+		  .options 		= transport->kern->options,
 		  .read 		= transport->kern->read,
+		  .skb_read 		= transport->kern->skb_read,
 		  .write 		= transport->kern->write,
 		  .add_address 		= transport->user->add_address,
 		  .delete_address  	= transport->user->delete_address });
@@ -475,9 +494,9 @@ static void eth_configure(int n, void *init, char *mac,

 	dev->mtu = transport->user->mtu;
 	dev->netdev_ops = &uml_netdev_ops;
-	dev->ethtool_ops = &uml_net_ethtool_ops;
 	dev->watchdog_timeo = (HZ >> 1);
-	dev->irq = UM_ETH_IRQ;
+	dev->irq = UM_ETH_BASE_IRQ + (rr_counter % UM_ETH_IRQ_RR); 
+	rr_counter++;

 	err = update_drop_skb(lp->max_packet);
 	if (err)
@@ -829,7 +848,7 @@ static void close_devices(void)
 	spin_lock(&opened_lock);
 	list_for_each(ele, &opened) {
 		lp = list_entry(ele, struct uml_net_private, list);
-		um_free_irq(lp->dev->irq, lp->dev);
+		free_irq(lp->dev->irq, lp->dev);
 		if ((lp->close != NULL) && (lp->fd >= 0))
 			(*lp->close)(lp->fd, &lp->user);
 		if (lp->remove != NULL)
diff --git a/arch/um/include/asm/irq.h b/arch/um/include/asm/irq.h
index 4a2037f..be9128b 100644
--- a/arch/um/include/asm/irq.h
+++ b/arch/um/include/asm/irq.h
@@ -1,21 +1,27 @@
+
 #ifndef __UM_IRQ_H
 #define __UM_IRQ_H

+#define UM_ETH_IRQ_RR	        32
+
 #define TIMER_IRQ		0
 #define UMN_IRQ			1
 #define CONSOLE_IRQ		2
 #define CONSOLE_WRITE_IRQ	3
 #define UBD_IRQ			4
-#define UM_ETH_IRQ		5
-#define SSL_IRQ			6
-#define SSL_WRITE_IRQ		7
-#define ACCEPT_IRQ		8
-#define MCONSOLE_IRQ		9
-#define WINCH_IRQ		10
-#define SIGIO_WRITE_IRQ 	11
-#define TELNETD_IRQ 		12
-#define XTERM_IRQ 		13
-#define RANDOM_IRQ 		14
+#define UM_ETH_BASE_IRQ		5
+
+#define UM_END_ETH_IRQ	        (UM_ETH_BASE_IRQ + UM_ETH_IRQ_RR)
+
+#define SSL_IRQ			(UM_END_ETH_IRQ + 1)
+#define SSL_WRITE_IRQ		(UM_END_ETH_IRQ + 2)
+#define ACCEPT_IRQ		(UM_END_ETH_IRQ + 3)
+#define MCONSOLE_IRQ		(UM_END_ETH_IRQ + 4)
+#define WINCH_IRQ		(UM_END_ETH_IRQ + 5)
+#define SIGIO_WRITE_IRQ 	(UM_END_ETH_IRQ + 6)
+#define TELNETD_IRQ 		(UM_END_ETH_IRQ + 7)
+#define XTERM_IRQ 		(UM_END_ETH_IRQ + 8)
+#define RANDOM_IRQ 		(UM_END_ETH_IRQ + 9)

 #define LAST_IRQ RANDOM_IRQ
 #define NR_IRQS (LAST_IRQ + 1)
diff --git a/arch/um/include/shared/net_kern.h b/arch/um/include/shared/net_kern.h
index 012ac87..2229126 100644
--- a/arch/um/include/shared/net_kern.h
+++ b/arch/um/include/shared/net_kern.h
@@ -1,4 +1,5 @@
 /*
+ * Copyright (C) 2012 - 2014 Cisco Systems
  * Copyright (C) 2002 2007 Jeff Dike (jdike <at> {addtoit,linux.intel}.com)
  * Licensed under the GPL
  */
@@ -13,6 +14,8 @@
 #include <linux/list.h>
 #include <linux/workqueue.h>

+#define UML_NET_USE_SKB_READ 1
+
 struct uml_net {
 	struct list_head list;
 	struct net_device *dev;
@@ -28,6 +31,7 @@ struct uml_net_private {

 	struct work_struct work;
 	int fd;
+	unsigned int options;
 	unsigned char mac[ETH_ALEN];
 	int max_packet;
 	unsigned short (*protocol)(struct sk_buff *);
@@ -36,6 +40,7 @@ struct uml_net_private {
 	void (*remove)(void *);
 	int (*read)(int, struct sk_buff *skb, struct uml_net_private *);
 	int (*write)(int, struct sk_buff *skb, struct uml_net_private *);
+	struct sk_buff * (*skb_read)(struct uml_net_private *);

 	void (*add_address)(unsigned char *, unsigned char *, void *);
 	void (*delete_address)(unsigned char *, unsigned char *, void *);
@@ -47,6 +52,8 @@ struct net_kern_info {
 	unsigned short (*protocol)(struct sk_buff *);
 	int (*read)(int, struct sk_buff *skb, struct uml_net_private *);
 	int (*write)(int, struct sk_buff *skb, struct uml_net_private *);
+	struct sk_buff * (*skb_read)(struct uml_net_private *);
+	unsigned int options;
 };

 struct transport {
@@ -59,11 +66,28 @@ struct transport {
 	const int setup_size;
 };

+struct mmsg_queue_info {
+	int fd;
+	struct mmsghdr * mmsg_send_vector; 
+	void ** skb_send_vector;
+	int queue_depth, head, tail, max_depth;
+	spinlock_t head_lock; 
+	spinlock_t tail_lock; 
+	unsigned int queue_fsm;
+};
+ 
 extern struct net_device *ether_init(int);
 extern unsigned short ether_protocol(struct sk_buff *);
 extern int tap_setup_common(char *str, char *type, char **dev_name,
 			    char **mac_out, char **gate_addr);
 extern void register_transport(struct transport *new);
 extern unsigned short eth_protocol(struct sk_buff *skb);
+extern struct sk_buff *my_build_skb(void * head, void *data, unsigned int frag_size);
+
+extern void flush_pending_netio(void);
+
+extern int uml_net_advance_tail( struct mmsg_queue_info * queue_info, int advance); 
+extern int uml_net_advance_head( struct mmsg_queue_info * queue_info, int advance); 
+extern int uml_net_flush_mmsg_queue(struct mmsg_queue_info * queue_info, int queue_depth);

 #endif
diff --git a/arch/um/include/shared/net_user.h b/arch/um/include/shared/net_user.h
index 3dabbe1..4b46f37 100644
--- a/arch/um/include/shared/net_user.h
+++ b/arch/um/include/shared/net_user.h
@@ -1,4 +1,5 @@
 /*
+ * Copyright (C) 2012 - 2014 Cisco Systems
  * Copyright (C) 2002 - 2007 Jeff Dike (jdike <at> {addtoit,linux.intel}.com)
  * Licensed under the GPL
  */
@@ -38,10 +39,15 @@ extern void tap_check_ips(char *gate_addr, unsigned char *eth_addr);
 extern void read_output(int fd, char *output_out, int len);

 extern int net_read(int fd, void *buf, int len);
+extern int net_readv(int fd, void *iov, int iovcnt);
 extern int net_recvfrom(int fd, void *buf, int len);
+extern int net_recvfrom2(int fd, void *buf, int len, void *src_addr, int *addrlen);
 extern int net_write(int fd, void *buf, int len);
+extern int net_writev(int fd, void *iov, int iovcnt);
 extern int net_send(int fd, void *buf, int len);
 extern int net_sendto(int fd, void *buf, int len, void *to, int sock_len);
+extern int net_sendmessage(int fd, void *msg, int flags);
+extern int net_recvmessage(int fd, void *msg, int flags);

 extern void open_addr(unsigned char *addr, unsigned char *netmask, void *arg);
 extern void close_addr(unsigned char *addr, unsigned char *netmask, void *arg);
@@ -50,4 +56,22 @@ extern char *split_if_spec(char *str, ...);

 extern int dev_netmask(void *d, void *m);

+
+extern void uml_net_destroy_skb(void * skb);
+extern void * uml_net_build_skb (void * dev);
+extern void * uml_net_skb_data (void * skb);
+
+extern void add_skbuffs(void * msgvec, void ** skbvec, int size, int skb_size, int offset);
+extern void add_header_buffers(void * msgvec, int size, int header_size);
+extern void * build_mmsg_vector(int size, int iovsize);
+extern void rebuild_skbuf_vector(void ** skbvec, int size, void * dev);
+extern void * build_skbuf_vector(int size, void * dev);
+extern int net_recvmmsg(int fd, void *msgvec, unsigned int vlen,
+		unsigned int flags, struct timespec *timeout);
+extern int net_sendmmsg(int fd, void *msgvec, unsigned int vlen,
+		unsigned int flags);
+extern void repair_mmsg (void *msgvec, int iovsize, int header_size);
+extern void destroy_skb_vector(void ** vector, int size);
+extern void destroy_mmsg_vector(void * mmsgvector, int size, int free_iov_base);
+
 #endif
diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index 5d7ee49e..f4c6fb1 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -17,6 +17,7 @@
 #include <as-layout.h>
 #include <kern_util.h>
 #include <os.h>
+#include <net_kern.h>

 /*
 *	We are on the "kernel side" so we cannot pick up the sys/epoll.h 
@@ -136,6 +137,8 @@ void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
 		spin_unlock_irqrestore(&uml_sigio_lock, flags);
 	}

+	flush_pending_netio();
+
 	/* This needs a better way - it slows down the event loop */

 	free_irqs();
--

-- 
1.7.10.4

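The "Principles of operation" comment in net_extra_user.c relies on standard datagram semantics: if a receive buffer has iov_len = 0, the datagram is still consumed, the call reports 0 bytes and MSG_TRUNC is set in msg_flags, so a failed allocation becomes a silent drop rather than an error. The sketch below shows this with plain recvmsg() on an AF_UNIX datagram socketpair, assuming a Linux host; recv_into() and zero_len_iov_drops_packet() are hypothetical names, not part of the patch.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Receives one datagram into a single-iovec buffer. A NULL buffer is
 * paired with iov_len = 0, mirroring the patch's allocation fallback.
 * Returns the byte count (or -1) and stores msg_flags in *flags_out. */
static ssize_t recv_into(int fd, void *buf, size_t len, int *flags_out)
{
	struct iovec iov = { .iov_base = buf, .iov_len = buf ? len : 0 };
	struct msghdr msg;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	n = recvmsg(fd, &msg, MSG_DONTWAIT);
	*flags_out = msg.msg_flags;
	return n;
}

/* Returns 1 if a datagram read with a zero-length buffer is dropped
 * (0 bytes, MSG_TRUNC set) while the next datagram still arrives intact. */
int zero_len_iov_drops_packet(void)
{
	int sv[2], flags1 = 0, flags2 = 0, ok;
	char buf[32] = { 0 };
	ssize_t n1, n2;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return 0;
	send(sv[0], "dropped", 8, 0);
	send(sv[0], "kept", 5, 0);

	/* Simulate a failed buffer allocation for the first packet. */
	n1 = recv_into(sv[1], NULL, 0, &flags1);
	n2 = recv_into(sv[1], buf, sizeof(buf), &flags2);

	ok = n1 == 0 && (flags1 & MSG_TRUNC) &&
	     n2 == 5 && strcmp(buf, "kept") == 0 && !(flags2 & MSG_TRUNC);
	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

Because the truncated datagram is consumed, the receive path needs no separate drop buffer, which is the point the comment makes about the shared drop skb.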