1 Apr 2010 01:28

### Re: Re: Vector and Seq

2010/4/1 Jesper Nordenberg
Daniel Sobral skrev 2010-03-31 23:59:

O(log32 n) is different from O(log n). You only ignore multiplicative
and additive constant factors. Perhaps you meant to say that
O(log 32 * n) == O(log n)?

You really need to read up on big O notation. log32(n) = log(n) / log(32), so O(log32(n)) = O(log(n) / log(32)) = O(log(n)).
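The change-of-base identity is easy to check numerically. A small sketch (the object and method names here are just for illustration):

```scala
object LogBase {
  // Change of base: log base 32 of n = ln(n) / ln(32).
  def log32(n: Double): Double = math.log(n) / math.log(32)

  // The ratio log32(n) / ln(n) is the constant 1 / ln(32) for every n,
  // which is exactly why O(log32 n) and O(log n) are the same class:
  // they differ only by a multiplicative constant.
  def ratioToLn(n: Double): Double = log32(n) / math.log(n)
}
```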

As for O() notation being irrelevant for length limited data types,
given that all computers can only deal with limited size collections in
their memories, does that make O() notation irrelevant for in-memory
algorithms? That's nonsensical, and untrue anyway.

Big O notation is a mathematical concept to describe, for example, the running time and memory usage of algorithms as the number of elements grows towards infinity. If you have a limited maximum number of elements, it can only be used as a rough estimate to compare data structures and algorithms.

Right now, 2^30 elements in an in-memory collection of any kind is
downright impossible on a 32-bit JVM, and very tough on current 64-bit
systems. When the access-time ratio between 100 and 2^30 elements is
about 1/2, it doesn't make any sense to nit-pick the difference. The
slower rate would have been good enough even for the smaller
collection, given the other nice properties of this collection, and it
is by this token that we can say it is O(1). If you want _the_ fastest
access speed possible, then just go with arrays.

I'm not disputing that Vector is a useful and fast implementation, but if we agree that its branching factor is at most 32, element access time is O(log n), not O(1), as the number of elements grows towards infinity. If you talk about a length-limited implementation, big O notation is not applicable.

Now, enter the world of computers, where everything is limited, yet we somehow manage to make use of big oh notation!

While theory is important, one should also keep an eye on practical considerations. Here is a well-known example of that. Disjoint sets: the optimal solution is O(m α(m, n)), where α(m, n) is the inverse Ackermann function. Theoretically, this is by no means O(1). Yet, "for all practical purposes, α(m, n) is a constant not larger than four" (quoting Tarjan). Four! Even if we had a problem the size of the universe, this would remain four! This important insight is not apparent through the glasses of big oh notation, which could not run any deeper than a (completely meaningless) comparison of O(1) and O(α(m, n)). These are mere tools; one has to know when it makes sense to use them.
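For the curious, the structure behind that bound can be sketched in a few lines. This is a generic textbook version with union by rank and path compression, not tied to any particular library:

```scala
// Disjoint sets (union-find) with union by rank and path compression.
// A sequence of m operations on n elements costs O(m a(m, n)), where
// a is the inverse Ackermann function -- at most 4 for any practical n.
class DisjointSets(n: Int) {
  private val parent = Array.tabulate(n)(identity)
  private val rank = Array.fill(n)(0)

  // Find the representative of x's set, flattening the path as we go.
  def find(x: Int): Int = {
    if (parent(x) != x) parent(x) = find(parent(x))
    parent(x)
  }

  // Merge the sets containing x and y, attaching the shallower tree
  // under the deeper one to keep paths short.
  def union(x: Int, y: Int): Unit = {
    val rx = find(x)
    val ry = find(y)
    if (rx != ry) {
      if (rank(rx) < rank(ry)) parent(rx) = ry
      else if (rank(rx) > rank(ry)) parent(ry) = rx
      else { parent(ry) = rx; rank(rx) += 1 }
    }
  }
}
```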

Dimitris

PS:  Random linguistic fact: "Big Omicron" hides an oxymoron in Greek. It literally means "big oh, small". "Big Omega" is also fun: "Big oh, big" (micron = small, mega = big). Very big indeed.

/Jesper Nordenberg

1 Apr 2010 01:30

### Re: Re: Vector and Seq

On Wed, Mar 31, 2010 at 7:43 PM, Jesper Nordenberg wrote:
Daniel Sobral skrev 2010-03-31 23:59:

O(log32 n) is different from O(log n). You only ignore multiplicative
and additive constant factors. Perhaps you meant to say that
O(log 32 * n) == O(log n)?

You really need to read up on big O notation. log32(n) = log(n) / log(32), so O(log32(n)) = O(log(n) / log(32)) = O(log(n)).

Nah, just on logs.

As for O() notation being irrelevant for length limited data types,
given that all computers can only deal with limited size collections in
their memories, does that make O() notation irrelevant for in-memory
algorithms? That's nonsensical, and untrue anyway.

Big O notation is a mathematical concept to describe, for example, the running time and memory usage of algorithms as the number of elements grows towards infinity. If you have a limited maximum number of elements, it can only be used as a rough estimate to compare data structures and algorithms.

Everyone has a limited number of elements, unless you happen to own a real Turing machine, but whatever.

Right now, 2^30 elements in an in-memory collection of any kind is
downright impossible on a 32-bit JVM, and very tough on current 64-bit
systems. When the access-time ratio between 100 and 2^30 elements is
about 1/2, it doesn't make any sense to nit-pick the difference. The
slower rate would have been good enough even for the smaller
collection, given the other nice properties of this collection, and it
is by this token that we can say it is O(1). If you want _the_ fastest
access speed possible, then just go with arrays.

I'm not disputing that Vector is a useful and fast implementation, but if we agree that its branching factor is at most 32, element access time is O(log n), not O(1), as the number of elements grows towards infinity. If you talk about a length-limited implementation, big O notation is not applicable.

/Jesper Nordenberg

--
Daniel C. Sobral

I travel to the future all the time.
1 Apr 2010 04:15

### Splitting scala.collections.immutable into .persistent and .immutable

```Please excuse any misused terms here, and my even suggesting this change before
2.8 is released :), but here goes:

After writing an (unreleased) wrapper for some of the Google
Collections/Guava classes, it appears that it could be useful to distinguish
between immutable and persistent data structures:

http://en.wikipedia.org/wiki/Persistent_data_structure

The immutable Google Collections/Guava classes are not persistent, so when I
wrapped them I overrode the + and - operators to throw an
UnsupportedOperationException: building an entire new collection to add or
subtract a single item would be very inefficient, and I wanted to ensure
people wouldn't use such an operation.

For my server-based code there are many places where I build an immutable data
structure and don't care about the previous versions of it that were built to
generate the final version; plus, with the Google collections there is no
synchronization at all, so I don't need to pay for that either.

So it seems that further separating the Scala collections from mutable and
immutable into immutable, persistent, and mutable may be useful.  The persistent
traits would extend the immutable traits and would supply the relatively
efficient + and - operators.

Blair

```
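A hypothetical sketch of the kind of wrapper Blair describes (the class name and methods are illustrative, not the actual unreleased wrapper): an immutable but non-persistent set view that fails loudly on the copy-everything operators instead of silently paying for them.

```scala
// Immutable view over a set that deliberately forbids + and -,
// because on a non-persistent backing collection those operators
// would have to rebuild the entire collection for a single element.
class NonPersistentSet[A](underlying: Set[A]) {
  def contains(a: A): Boolean = underlying.contains(a)
  def size: Int = underlying.size

  def +(a: A): Nothing =
    throw new UnsupportedOperationException("+ would copy the whole set")
  def -(a: A): Nothing =
    throw new UnsupportedOperationException("- would copy the whole set")
}
```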
1 Apr 2010 04:40

### RE: Splitting scala.collections.immutable into .persistent and .immutable

> it could be useful to distinguish
> between immutable and persistent data structures:
>
> http://en.wikipedia.org/wiki/Persistent_data_structure

Sigh. If only "persistent" didn't also mean "persistent"...

A

1 Apr 2010 08:32

### Re: Splitting scala.collections.immutable into .persistent and .immutable

```Blair Zajac skrev 2010-04-01 04:15:
> So it seems like further separating the Scala Collections from mutable
> and immutable into immutable, persistent and mutable may be useful. The
> persistent traits would extend the immutable traits and would supply the
> relatively efficient + and - operators.

The problem is that just because a collection is persistent doesn't mean
that all operations are efficient. Take Scala's List, for example: it's
typically considered persistent, but most operations on it are horribly
inefficient. So I really don't see how a persistent interface would be
useful in practice.

/Jesper Nordenberg

```
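Jesper's point can be made concrete: List is persistent, yet xs(i) has to walk i cons cells, so random access is O(n). A small sketch that counts the cells traversed (the object and method names are illustrative):

```scala
object ListAccessCost {
  // Hand-rolled indexed lookup that also reports how many cons cells
  // were traversed -- the hidden linear cost behind List.apply(i).
  def applyCounting[A](xs: List[A], i: Int): (A, Int) = {
    var cur = xs
    var steps = 0
    while (steps < i) { cur = cur.tail; steps += 1 }
    (cur.head, steps)
  }
}
```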
1 Apr 2010 13:52

### Re: Re: Vector and Seq

O(1), O(log32 n), O(log n) are all valid characterizations, as are O(n), O(n^2), O(n^n). And none of them
tell you much, because big-O is just an upper bound, and it hides a constant that can be 0.00001 as well as 100000.

The important piece of information is that iterating over a large vector by indexed access can be about 10
times slower than using an iterator. Ismael Juma posted some nice performance diagrams here:
http://blog.juma.me.uk/2009/10/26/new-jvm-options-and-scala-iteration-performance/ and you
can also compare numbers directly here: http://lamp.epfl.ch/~rompf/vector2/vector-summary.txt
(look for BenchBVectorForeachFast and BenchBVectorIndexedFast). Of course it also depends on whether
-optimize is able to inline things well.

The reason why Vector is not an IndexedSeq is that IndexedSeqLike implements most basic collection
operations using indexed access. This is the fastest method for arrays and strings, but not for Vectors.
So if Vector were an IndexedSeq, we would have had to override most of the methods from IndexedSeqLike with
verbatim copies of the corresponding IterableLike code, which would hardly be a good design.

- Tiark
```
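Tiark's comparison can be sketched roughly as follows; the object name is illustrative, and for real measurements a proper benchmarking harness should be used rather than naive timing:

```scala
object VectorAccessBench {
  // Sum via indexed access: each v(i) descends the 32-ary tree anew.
  def sumIndexed(v: Vector[Int]): Long = {
    var i = 0
    var s = 0L
    while (i < v.length) { s += v(i); i += 1 }
    s
  }

  // Sum via iterator: the tree walk is amortized across elements.
  def sumIterated(v: Vector[Int]): Long = {
    var s = 0L
    val it = v.iterator
    while (it.hasNext) s += it.next()
    s
  }
}
```

Both compute the same result; the interesting part is only the relative timing, which depends on the JVM and on whether -optimize can inline the calls.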
1 Apr 2010 16:33

### Re: Splitting scala.collections.immutable into .persistent and .immutable

Probably not all that useful. Take List for example.

List is persistent for tail and :: (prepend)
List is not persistent for init and :+ (append)
List is half persistent for ++ (concatenate)

Other structures will make different tradeoffs.  It's really a slight misnomer to say a structure is persistent - we can only say it is or isn't persistent under a particular operation.

On Wed, Mar 31, 2010 at 7:15 PM, Blair Zajac wrote:
Please excuse any misused terms here, and my even suggesting this change before 2.8 is released :), but here goes:

After writing an (unreleased) wrapper for some of the Google Collections/Guava classes, it appears that it could be useful to distinguish between immutable and persistent data structures:

http://en.wikipedia.org/wiki/Persistent_data_structure

The immutable Google Collections/Guava classes are not persistent, so when I wrapped them I overrode the + and - operators to throw an UnsupportedOperationException: building an entire new collection to add or subtract a single item would be very inefficient, and I wanted to ensure people wouldn't use such an operation.

For my server-based code there are many places where I build an immutable data structure and don't care about the previous versions of it that were built to generate the final version; plus, with the Google collections there is no synchronization at all, so I don't need to pay for that either.

So it seems that further separating the Scala collections from mutable and immutable into immutable, persistent, and mutable may be useful.  The persistent traits would extend the immutable traits and would supply the relatively efficient + and - operators.

Blair

1 Apr 2010 16:46

### Re: Splitting scala.collections.immutable into .persistent and .immutable

I can't understand what you mean by "persistent", "not persistent", "half persistent" here. Who defines these terms and where?

(Partially/fully/confluently persistent, as defined by Robert Tarjan and others).

2010/4/1 James Iry
Probably not all that useful. Take List for example.

List is persistent for tail and :: (prepend)
List is not persistent for init and :+(append)
List is half persistent for ++ (concatenate)

Other structures will make different tradeoffs.  It's really a slight misnomer to say a structure is persistent - we can only say it is or isn't persistent under a particular operation.

On Wed, Mar 31, 2010 at 7:15 PM, Blair Zajac wrote:
Please excuse any misused terms here, and my even suggesting this change before 2.8 is released :), but here goes:

After writing an (unreleased) wrapper for some of the Google Collections/Guava classes, it appears that it could be useful to distinguish between immutable and persistent data structures:

http://en.wikipedia.org/wiki/Persistent_data_structure

The immutable Google Collections/Guava classes are not persistent, so when I wrapped them I overrode the + and - operators to throw an UnsupportedOperationException: building an entire new collection to add or subtract a single item would be very inefficient, and I wanted to ensure people wouldn't use such an operation.

For my server-based code there are many places where I build an immutable data structure and don't care about the previous versions of it that were built to generate the final version; plus, with the Google collections there is no synchronization at all, so I don't need to pay for that either.

So it seems that further separating the Scala collections from mutable and immutable into immutable, persistent, and mutable may be useful.  The persistent traits would extend the immutable traits and would supply the relatively efficient + and - operators.

Blair

1 Apr 2010 17:03

### Re: Splitting scala.collections.immutable into .persistent and .immutable

I think it depends on whether you require that persistence cost less than copying everything to really count.

In that context, James' comments make sense:

val a = Nil
val b = 1 :: a
val c = 2 :: b
c.tail == b  // Of course they're the same
c.tail eq b  // Returns true => persistent, we didn't copy everything

val a = Nil
val b = 1 :: a
val c = b :+ 2
c.init == b  // Of course they're the same
c.init eq b  // Returns false; we actually copied everything twice here :(
// This isn't what I'd call persistent, at least not if I'm distinguishing it from immutable!

val a = Nil
val b = 1 :: a
val c = 2 :: a
val d = b ++ c
d.drop(1) == c  // True, of course
d.drop(1) eq c  // True, c wasn't copied, yay!
d.take(1) == b  // True, of course
d.take(1) eq b  // False, we actually copied twice :(
// This is parsimonious with c but not with b => "half-persistent"

Whether we call the failures "not persistent" or "persistent but maximally inefficient", the point is that saying that a _persistent_ data structure is usefully distinct from an _immutable_ data structure relies upon deeper knowledge of which operations you're going to support.  You can't really map "performs better but is safe" onto "persistent", but the former is what library users are more likely to care about.

--Rex

On Thu, Apr 1, 2010 at 10:46 AM, Dimitris Andreou wrote:
I can't understand what you mean by "persistent", "not persistent", "half persistent" here. Who defines these terms and where?

(Partially/fully/confluently persistent, as defined by Robert Tarjan and others).

2010/4/1 James Iry

Probably not all that useful. Take List for example.

List is persistent for tail and :: (prepend)
List is not persistent for init and :+ (append)
List is half persistent for ++ (concatenate)

Other structures will make different tradeoffs.  It's really a slight misnomer to say a structure is persistent - we can only say it is or isn't persistent under a particular operation.

On Wed, Mar 31, 2010 at 7:15 PM, Blair Zajac wrote:
Please excuse any misused terms here, and my even suggesting this change before 2.8 is released :), but here goes:

After writing an (unreleased) wrapper for some of the Google Collections/Guava classes, it appears that it could be useful to distinguish between immutable and persistent data structures:

http://en.wikipedia.org/wiki/Persistent_data_structure

The immutable Google Collections/Guava classes are not persistent, so when I wrapped them I overrode the + and - operators to throw an UnsupportedOperationException: building an entire new collection to add or subtract a single item would be very inefficient, and I wanted to ensure people wouldn't use such an operation.

For my server-based code there are many places where I build an immutable data structure and don't care about the previous versions of it that were built to generate the final version; plus, with the Google collections there is no synchronization at all, so I don't need to pay for that either.

So it seems that further separating the Scala collections from mutable and immutable into immutable, persistent, and mutable may be useful.  The persistent traits would extend the immutable traits and would supply the relatively efficient + and - operators.

Blair

1 Apr 2010 17:31

### Re: Splitting scala.collections.immutable into .persistent and .immutable

2010/4/1 Rex Kerr
I think it depends on whether you require that persistence cost less than copying everything to really count.

The thing is, "persistent" (as in persistent data structures) says/requires nothing about performance. Even a persistent structure with slow operations is persistent. Overloading the meaning of technical terms with ad-hoc interpretations makes the terms less useful.

scala.List is a <fully persistent data structure>. Simple. One can even look up the term and immediately and precisely understand what I'm talking about. "On a separate account, some operations of it are fast / O(1) and some operations are slow / O(N)". Also simple.

In that context, James' comments make sense:

val a = Nil
val b = 1 :: a
val c = 2 :: b
c.tail == b  // Of course they're the same
c.tail eq b  // Returns true => persistent, we didn't copy everything

val a = Nil
val b = 1 :: a
val c = b :+ 2
c.init == b  // Of course they're the same
c.init eq b  // Returns false; we actually copied everything twice here :(
// This isn't what I'd call persistent, at least not if I'm distinguishing it from immutable!

val a = Nil
val b = 1 :: a
val c = 2 :: a
val d = b ++ c
d.drop(1) == c  // True, of course
d.drop(1) eq c  // True, c wasn't copied, yay!
d.take(1) == b  // True, of course
d.take(1) eq b  // False, we actually copied twice :(
// This is parsimonious with c but not with b => "half-persistent"

Whether we call the failures "not persistent" or "persistent but maximally inefficient", the point is that saying that a _persistent_ data structure is usefully distinct from an _immutable_ data structure relies upon deeper knowledge of which operations you're going to support.  You can't really map "performs better but is safe" onto "persistent", but the former is what library users are more likely to care about.

--Rex

On Thu, Apr 1, 2010 at 10:46 AM, Dimitris Andreou wrote:
I can't understand what you mean by "persistent", "not persistent", "half persistent" here. Who defines these terms and where?

(Partially/fully/confluently persistent, as defined by Robert Tarjan and others).

2010/4/1 James Iry

Probably not all that useful. Take List for example.

List is persistent for tail and :: (prepend)
List is not persistent for init and :+ (append)
List is half persistent for ++ (concatenate)

Other structures will make different tradeoffs.  It's really a slight misnomer to say a structure is persistent - we can only say it is or isn't persistent under a particular operation.

On Wed, Mar 31, 2010 at 7:15 PM, Blair Zajac wrote:
Please excuse any misused terms here, and my even suggesting this change before 2.8 is released :), but here goes:

After writing an (unreleased) wrapper for some of the Google Collections/Guava classes, it appears that it could be useful to distinguish between immutable and persistent data structures:

http://en.wikipedia.org/wiki/Persistent_data_structure

The immutable Google Collections/Guava classes are not persistent, so when I wrapped them I overrode the + and - operators to throw an UnsupportedOperationException: building an entire new collection to add or subtract a single item would be very inefficient, and I wanted to ensure people wouldn't use such an operation.

For my server-based code there are many places where I build an immutable data structure and don't care about the previous versions of it that were built to generate the final version; plus, with the Google collections there is no synchronization at all, so I don't need to pay for that either.

So it seems that further separating the Scala collections from mutable and immutable into immutable, persistent, and mutable may be useful.  The persistent traits would extend the immutable traits and would supply the relatively efficient + and - operators.

Blair
