Jonathan Larmour | 2 Oct 2009 17:51

NAND technical review

As per my ecos-discuss mail just now, I would like to get going straight 
away with a public discussion of the _technical_ merits of both NAND 
implementations. There is a risk of rehashing old ground, but I'm sure in 
both cases things have moved on a bit since the last time round, not least 
in response to comments, so it would also be good to clarify the current 
state.

I think at first the ball is really in Ross/eCosCentric's court to give 
the technical rationale for the decision, so I'd like to ask him first to 
give his rationale and his own perspective of the comparison of the 
pros/cons. I think the primary onus of the legwork is on eCosCentric, not 
least because they saw Rutger's version before implementation - although 
that was an early version, so it's entirely possible things have changed 
now. Obviously I would especially like Rutger's view on whether any 
purported benefits of eCosCentric's implementation are really the case, 
and any claimed disadvantages of his own are plausible. I suspect some of 
this will come down to subjective opinions, of course.

But this is an open discussion, so I'd appreciate anyone's views. I'd 
especially value Simon Kallweit's views as someone who has actually used 
both code implementations which gives him a very good perspective. 
Although if anyone wants to contribute, please keep it on topic, within 
this thread, and technical.

Thanks. Over to Ross....

Jifl
--

-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

(Continue reading)

Ross Younger | 6 Oct 2009 15:51

Re: NAND technical review

Jonathan Larmour wrote:
> I think at first the ball is really in Ross/eCosCentric's court to give
> the technical rationale for the decision, so I'd like to ask him first
> to give his rationale and his own perspective of the comparison of the
> pros/cons.

Here goes with a comparison between the two in something close to their
current states (my 26/08 push to bugzilla 1000770, and Rutger's r659).
For brevity, I will refer to the two layers as "E" (eCosCentric) and "R"
(Rutger) from time to time.

Note that this is only really a comparison of the two NAND layers. I have
not attempted to compare the two YAFFS porting layers, though I do mention
them in a couple of places where it seemed relevant.

BTW: I will be off-net tomorrow and all next week, so please don't think I
am ignoring the discussion...

1. NAND 101 -------------------------------------------------------------

(Those familiar with NAND chips can skip this section, but I appreciate
that not everybody on-list is in the business of writing NAND device
drivers :-) )

(i) Conceptual

A chip comprises a number of blocks (a round power of two).

Each block comprises a number of pages (another power of two).
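
Because both counts are powers of two, converting between a flat page
number and a (block, page-within-block) pair needs only shifts and masks.
A minimal sketch, assuming a hypothetical `nand_geometry` struct whose
names are illustrative and not taken from either implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry descriptor; field names are illustrative only. */
struct nand_geometry {
    unsigned blocks_log2;          /* log2(number of blocks on the chip) */
    unsigned pages_per_block_log2; /* log2(number of pages in each block) */
};

/* Total number of pages on the chip. */
static uint32_t nand_total_pages(const struct nand_geometry *g)
{
    return (uint32_t)1 << (g->blocks_log2 + g->pages_per_block_log2);
}

/* Block that contains the given flat page number. */
static uint32_t nand_page_to_block(const struct nand_geometry *g, uint32_t page)
{
    assert(page < nand_total_pages(g));
    return page >> g->pages_per_block_log2;
}

/* Offset of the page within its block. */
static uint32_t nand_page_in_block(const struct nand_geometry *g, uint32_t page)
{
    assert(page < nand_total_pages(g));
    return page & (((uint32_t)1 << g->pages_per_block_log2) - 1);
}
```

For a chip with 1024 blocks of 64 pages, flat page 130 would land in
block 2 at offset 2.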

(Continue reading)

Jonathan Larmour | 7 Oct 2009 05:12

Re: NAND technical review

Hi Ross,

First, thanks very much for all this. Quite a bit to digest but only 
because it's extremely useful. Sorry for the number of questions I have - 
it's not meant to be inquisitorial, but obviously I need to get to the 
bottom of certain issues.

I've added Rutger to the CC as he may be able to comment on some of the 
issues I raise.

You can assume tacit acceptance/understanding of whatever I haven't 
commented on.

Ross Younger wrote:
> Here goes with a comparison between the two in something close to their
> current states (my 26/08 push to bugzilla 1000770, and Rutger's r659).

FWIW, Rutger is now up to r666.

> However, not all chips are quite the same. The ONFI initiative is an attempt
> to standardise chip protocols and most new chips should comply with it. A
> number of chips on the market are _nearly_ ONFI-compliant: deviations
> typically occur over the format of the ReadID response and that of an
> address. I believe that older chips did their own thing entirely.

Good ONFI support should be the highest priority as that's the way 
everything is likely to go, although we do need the others too. OTOH, my 
experience of NOR flash chip interfaces is that standard specs are all 
well and good, but manufacturers still like to add their own touches. So I 
suspect ONFI will probably correspond to a common subset of functionality, 
(Continue reading)

Jürgen Lambrecht | 7 Oct 2009 11:40

Re: Re: NAND technical review

Ross Younger wrote:
> Jonathan Larmour wrote:
>   
>> I think at first the ball is really in Ross/eCosCentric's court to give
>> the technical rationale for the decision, so I'd like to ask him first
>> to give his rationale and his own perspective of the comparison of the
>> pros/cons.
>>     
>
> Here goes with a comparison between the two in something close to their
> current states (my 26/08 push to bugzilla 1000770, and Rutger's r659).
> For brevity, I will refer to the two layers as "E" (eCosCentric) and "R"
> (Rutger) from time to time.
>
> Note that this is only really a comparison of the two NAND layers. I have
> not attempted to compare the two YAFFS porting layers, though I do mention
> them in a couple of places where it seemed relevant.
>
> BTW: I will be off-net tomorrow and all next week, so please don't think I
> am ignoring the discussion...
>   
<snip>

> (a) Partitions
>
> E's application interface also provides logic implementing partitions.
> That is to say, all access to a NAND array must be via a `partition';
> the NAND layer sanity-checks whether the requested flash page or block
> address is within the given partition. This is quite a lightweight
> layer and hasn't added much overhead of either code footprint or
(Continue reading)
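
The partition sanity check quoted above might look roughly like this. It
is only a sketch: the `nand_partition` struct and the function names are
hypothetical and are not E's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical partition descriptor: an inclusive range of block numbers. */
struct nand_partition {
    uint32_t first_block;
    uint32_t last_block;
    unsigned pages_per_block_log2;
};

/* True if the block number lies inside the partition. */
static bool part_block_ok(const struct nand_partition *p, uint32_t block)
{
    assert(p->first_block <= p->last_block);
    return block >= p->first_block && block <= p->last_block;
}

/* True if the flat page number lies inside the partition. */
static bool part_page_ok(const struct nand_partition *p, uint32_t page)
{
    return part_block_ok(p, page >> p->pages_per_block_log2);
}
```

A check like this is two comparisons per access, which is in line with
the remark that the layer is lightweight.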

Rutger Hofman | 7 Oct 2009 14:14

Re: NAND technical review

Ross Younger wrote:
[snip]
> Getting data into and out of the chip involves a simple protocol sequence.
> 
> Commands are single bytes; addresses are sequences of a few bytes depending
> on the chip size and the operation invoked.
> 
> For example, the sequence to read a page of data on the spec sheet I have to hand is:
> * Write 0x00 into the command latch
> * Write the four address bytes in turn into the address latch
> * Write 0x30 into the command latch
> * Chip signals Busy; wait for it to signal Ready
> * Read out (up to) 2112 bytes of data.
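
The quoted read sequence can be sketched in C as below. The `nand_bus`
hooks are hypothetical stand-ins for whatever a board port provides, and
the split of the four address bytes into two column bytes followed by two
row bytes is an assumption (the quoted text only says there are four). The
`trace` fake exists purely so the sketch can be tried on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low-level hooks a board port would supply. */
struct nand_bus {
    void (*write_cmd)(void *ctx, uint8_t cmd);        /* command latch */
    void (*write_addr)(void *ctx, uint8_t addr_byte); /* address latch */
    void (*wait_ready)(void *ctx);                    /* poll Busy -> Ready */
    void (*read_data)(void *ctx, uint8_t *buf, size_t len);
    void *ctx;
};

#define NAND_PAGE_TOTAL 2112u  /* upper limit from the quoted spec sheet */

/* Read up to one page, starting at column 0. */
static void nand_read_page(const struct nand_bus *bus, uint32_t page,
                           uint8_t *buf, size_t len)
{
    assert(len <= NAND_PAGE_TOTAL);
    bus->write_cmd(bus->ctx, 0x00);
    bus->write_addr(bus->ctx, 0x00);                   /* column, low (assumed) */
    bus->write_addr(bus->ctx, 0x00);                   /* column, high (assumed) */
    bus->write_addr(bus->ctx, (uint8_t)(page & 0xFF)); /* row, low (assumed) */
    bus->write_addr(bus->ctx, (uint8_t)((page >> 8) & 0xFF)); /* row, high */
    bus->write_cmd(bus->ctx, 0x30);
    bus->wait_ready(bus->ctx);
    bus->read_data(bus->ctx, buf, len);
}

/* Host-side fake that records the latch writes; not real hardware access. */
struct trace { uint8_t bytes[8]; size_t n; int waited; };

static void t_put(void *c, uint8_t v)
{
    struct trace *t = c;
    if (t->n < sizeof t->bytes)
        t->bytes[t->n++] = v;
}
static void t_wait(void *c) { ((struct trace *)c)->waited = 1; }
static void t_read(void *c, uint8_t *buf, size_t len)
{
    (void)c;
    for (size_t i = 0; i < len; i++)
        buf[i] = 0xFF;  /* pretend the page is erased */
}
```

Wiring `t_put` into both `write_cmd` and `write_addr` gives a byte trace
that can be compared against the datasheet's timing diagram.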

AFAIK, there are two kinds of chips on the market: Large-page chips (2K 
data pages) and Small-page chips (512B pages). These speak a different 
command language, but in their wiring they are the same. The large-page 
chips are (nearly) ONFI-compliant; the small-page chip command language 
is different. Ancient chips aside, if a chip gives its Device Type Byte, 
NAND flash code can look up in its tables what the chip parameters are 
(page size, block size, number of blocks, 8 or 16 bit data bus, etc). 
Miracle: Device Type Bytes are shared across manufacturers, so the table 
is limited in size.
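
Such a lookup could be sketched as follows. The struct layout and function
names are hypothetical and do not come from R's code, and the three table
entries are illustrative values that should be checked against the
relevant datasheets before being relied upon.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device parameter record. */
struct nand_chip_params {
    uint8_t  device_id;   /* Device Type Byte from ReadID */
    uint16_t page_size;   /* data bytes per page */
    uint32_t block_size;  /* data bytes per block */
    uint32_t num_blocks;
    uint8_t  bus_width;   /* 8 or 16 */
};

/* Illustrative entries only; verify against datasheets. */
static const struct nand_chip_params nand_id_table[] = {
    { 0x76,  512,  16 * 1024, 4096, 8 },  /* small-page example */
    { 0xF1, 2048, 128 * 1024, 1024, 8 },  /* large-page example */
    { 0xDA, 2048, 128 * 1024, 2048, 8 },  /* large-page example */
};

/* Return the table entry for a Device Type Byte, or NULL if unknown. */
static const struct nand_chip_params *nand_lookup(uint8_t device_id)
{
    size_t n = sizeof nand_id_table / sizeof nand_id_table[0];
    for (size_t i = 0; i < n; i++)
        if (nand_id_table[i].device_id == device_id)
            return &nand_id_table[i];
    return NULL;
}

/* Total data capacity in bytes for a known chip. */
static uint64_t nand_chip_size(const struct nand_chip_params *p)
{
    assert(p != NULL);
    return (uint64_t)p->block_size * p->num_blocks;
}
```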

I saw an announcement of 4K-page chips, but the datasheets are 
confidential. Is there anybody who can comment on these?

> However, not all chips are quite the same. The ONFI initiative is an attempt
> to standardise chip protocols and most new chips should comply with it. A
> number of chips on the market are _nearly_ ONFI-compliant: deviations
(Continue reading)

Rutger Hofman | 7 Oct 2009 18:26

Re: NAND technical review

I should have stated this in my first mail...

I am not at all qualified to say anything about E's work, because I 
didn't have time to do any kind of review of it. So, I will mainly limit 
myself to comments on things that concern R's work, and where I say 
anything on E it will be based on E's mails on the list.

Jonathan Larmour wrote:
[snip]
> A device number does seem to be a bit limiting, and less deterministic. 
> OTOH, a textual name arguably adds a little extra complexity.

This will be straightforward to change either way.

> I note Rutger's layer needs an explicit init call, whereas yours DTRT using a constructor, which is good.

I followed flash v2 in this. If the experts think a constructor is 
better, that's easy to change too.
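
To make the two styles concrete, here is a minimal sketch. It assumes
GCC's `constructor` function attribute as a stand-in; it is not the
mechanism either NAND layer actually uses, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static bool nand_ready = false;

/* Style 1: the application must remember to call this before any I/O. */
void nand_init_explicit(void)
{
    nand_ready = true;
}

/* Style 2: with GCC, the constructor attribute makes the toolchain run
 * this before main(), so no application call is needed. */
__attribute__((constructor))
static void nand_init_auto(void)
{
    nand_init_explicit();
}

/* Any later API entry point can then assume initialisation has happened. */
bool nand_is_ready(void)
{
    return nand_ready;
}

void nand_require_ready(void)
{
    assert(nand_ready);
}
```

Switching between the styles only means adding or removing the attributed
wrapper, which fits the remark that the change would be easy.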

> Does your implementation currently _require_ a BBT? 
> For simpler NAND usage, it may be overkill e.g. an application where the 
> number of rewrites is very small, so the factory bad markers may be 
> considered sufficient.

This is a bit hairy in my opinion, and one reason is that there is no 
Standard Layout for the spare areas. One case where a BBT is forced: my 
BlackFin NFC can be used to boot from NAND, but it enforces a spare 
layout that is incompatible with MTD or anybody. It is even incompatible 
with most chips' specification that the first byte of spare in the first 
page of the block is the Bad Block Marker. BlackFin's boot layout uses 
(Continue reading)
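
For chips that do follow the convention mentioned above (first spare byte
of the first page of each block is the Bad Block Marker), a scan for
factory-marked blocks might be sketched like this. The `read_spare_fn`
hook and all names are hypothetical; treating any value other than 0xFF
as "bad" is the usual convention but should be confirmed per datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: read the first `len` spare bytes of a page. */
typedef void (*read_spare_fn)(void *ctx, uint32_t page,
                              uint8_t *spare, size_t len);

/* A block is factory-bad if its marker byte is anything but 0xFF. */
static bool block_factory_bad(read_spare_fn rd, void *ctx,
                              uint32_t block, unsigned ppb_log2)
{
    uint8_t marker = 0xFF;
    rd(ctx, block << ppb_log2, &marker, 1);  /* first page of the block */
    return marker != 0xFF;
}

/* Fill a one-bit-per-block bitmap; returns the number of bad blocks. */
static uint32_t scan_factory_bad(read_spare_fn rd, void *ctx,
                                 uint32_t num_blocks, unsigned ppb_log2,
                                 uint8_t *bitmap)
{
    uint32_t bad = 0;
    assert(bitmap != NULL);
    for (uint32_t b = 0; b < num_blocks; b++) {
        uint8_t bit = (uint8_t)(1u << (b % 8));
        if (block_factory_bad(rd, ctx, b, ppb_log2)) {
            bitmap[b / 8] |= bit;
            bad++;
        } else {
            bitmap[b / 8] &= (uint8_t)~bit;
        }
    }
    return bad;
}

/* Host-side fake: ctx points at one marker byte per block, 64 pages/block. */
static void fake_read_spare(void *ctx, uint32_t page,
                            uint8_t *spare, size_t len)
{
    const uint8_t *markers = ctx;
    if (len > 0)
        spare[0] = markers[page >> 6];
}
```

A controller that imposes its own spare layout, as described for the
BlackFin boot case, would break the assumption in `block_factory_bad`.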

Rutger Hofman | 7 Oct 2009 18:31

Re: NAND technical review

Jürgen Lambrecht wrote:
> Ross Younger wrote:
>> Jonathan Larmour wrote:
[snip]

> Is it possible that R's model better follows the "general" structure of 
> drivers in eCos?
> I mean: (I follow our CVS, could maybe differ from the final commit of 
> Rutger to eCos)
> 1. with the low-level chip-specific code in /devs 
> (devs/flash/arm/at91/[board] and devs/flash/arm/at91/nfc, and 
> devs/flash/micron/nand)
> 2. with the "middleware" in /io (io/flash_nand/current/src and there 
> /anc, /chip, /controller)
> 3. with the high-level code in /fs

As far as I know, this has been the case for some releases already.

> Is it correct that R's abstraction makes it possible to add partitioning 
> easily?
> (because that is an interesting feature of E's implementation)

I think it would not be hard to add. It might involve a change in API 
though, which is no problem as long as the number of clients is small, 
and all the more when those clients desire it.

Rutger

Jürgen Lambrecht | 8 Oct 2009 09:15

Re: Re: NAND technical review

Rutger Hofman wrote:

<snip>
>>> - R's model shares the command sequence logic amongst all chips,
>>> differentiating only between small- and large-page devices. (I do not
>>> know
>>> whether this is correct for all current chips, though going forwards
>>> seems
>>> less likely to be an issue as fully-ONFI-compliant chips become the
>>> norm.)
>>>       
>> Hmm. Nevertheless, this is a concern for me with R's. I'm concerned it
>> may be too prescriptive to be robustly future-proof.
>>     
>
> Well, there is no way I can see into the future, but I definitely think
> that the wire command model for NAND chips is going to stay -- it is in
> ONFI, after all. Besides, all except the 1 or 2 most pioneering museum
> NAND chips use it too. There are chips that use a different interface,
> like SSD or MMC or OneNand, but then these chips come with on-chip bad
> block management, wear leveling of some kind, and are completely
> different in the way they must be handled. I'd say E's and R's
> implementations are concerned only with 'raw' NAND chips.
>
>   
Correct, only for raw NAND chips to be soldered on a board. The others 
have an embedded controller and are already packaged.
>> One could say that makes it a more realistic emulation. But yes I can
>> see disadvantages with a somewhat rigid world view. Thinking out loud, I
>> wonder if Rutger's layer could work with something like Samsung OneNAND.
(Continue reading)

Jürgen Lambrecht | 8 Oct 2009 10:16

Re: Re: NAND technical review

Just some explanatory remarks below, hardware related.

Ross Younger wrote:

<snip>
> 1. NAND 101 -------------------------------------------------------------
>
> (Those familiar with NAND chips can skip this section, but I appreciate
> that not everybody on-list is in the business of writing NAND device
> drivers :-) )
>
> (i) Conceptual
>   
<snip>
>
> Now, I mentioned ECC data. NAND technology has a number of underlying
> limitations, importantly that it has reliability issues. I don't have a full
> picture - the manufacturers seem to be understandably coy - but my
> understanding is that on each page, a driver ought to be able to cope with a
> single bit having flipped either on programming or on reading. The
>   
Such a "broken bit" is because the transistor that contains the bit is 
physically broken, and is stuck at 1 or at 0 (I don't know if it can be 
both). So you can no longer erase it (flip it back to 1) or program it 
(flip to 0).

I thought only programming or erasing could break it, not reading?
Is somebody sure about this?
> recommended way to achieve this is by storing an ECC in the spare area: the
> algorithm published by Samsung is popular, requiring 22 bits of ECC per 256
(Continue reading)

Ross Younger | 8 Oct 2009 14:31

Re: NAND technical review

Rutger Hofman wrote:
> R has part-read and part-write support. One thing that has always
> puzzled me is how this interacts with ECC. ECC often works on a complete
> subpage, like 256 bytes on a 2KB page chip; then I understand. But what
> if the read/write is not of such a subpage?

This is a very good question - I revisited it the other day when working on
hardware ECC support for the customer port I'm working on - and I don't have
a particularly good answer for it.

If the read is less than an ECC stride[*], one could perhaps fill in the ECC
calculation by reading the rest of that stride's worth anyway and not
passing it to the caller. Similarly, a write that is less than a stride
could be "filled in" with 0xFF for the purposes of computing its ECC. How
this would be achieved efficiently is an exercise for the reader as a bit of
refactoring is likely to be involved...

[*] I'm using "stride" here to mean the amount of data that an ECC
calculation operates over. The Samsung algorithm which computes 22 bits of
ECC over 256 bytes of data is common, not least because that's the
one used by the Linux MTD layer.
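
The "fill in with 0xFF" idea for a short write can be sketched as below.
Note the assumptions: `ecc22()` is a generic 22-bit Hamming-style parity
over a 256-byte stride written for illustration, and is NOT bit-compatible
with the Samsung algorithm or the Linux MTD layout; the function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECC_STRIDE 256u

/* Illustrative 22-bit Hamming-style ECC over one 256-byte stride.
 * Low 11 bits: parity of set bits whose address has bit k set.
 * High 11 bits: parity of set bits whose address has bit k clear. */
static uint32_t ecc22(const uint8_t data[ECC_STRIDE])
{
    uint32_t addr_xor = 0;  /* XOR of the 11-bit addresses of all set bits */
    uint32_t parity = 0;    /* overall parity of the stride */
    for (uint32_t i = 0; i < ECC_STRIDE; i++) {
        for (uint32_t b = 0; b < 8; b++) {
            if (data[i] & (1u << b)) {
                addr_xor ^= (i << 3) | b;
                parity ^= 1u;
            }
        }
    }
    uint32_t even = addr_xor ^ (parity ? 0x7FFu : 0u);
    return (even << 11) | addr_xor;
}

/* ECC for a write shorter than a stride: pad the remainder with 0xFF,
 * i.e. treat the unwritten bytes as still erased. */
static uint32_t ecc22_partial(const uint8_t *data, size_t len)
{
    uint8_t stride[ECC_STRIDE];
    assert(len <= ECC_STRIDE);
    memset(stride, 0xFF, sizeof stride);
    memcpy(stride, data, len);
    return ecc22(stride);
}
```

With this scheme, XORing the stored and recomputed ECC after a single-bit
flip gives the bit's address in the low 11 bits and its complement in the
high 11 bits, which is how a one-bit error would be located. The copy into
a temporary buffer is the inefficiency that the suggested refactoring
would need to remove.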

I did wonder about not supporting less-than-page reads and writes at all,
but my code currently tries its best on the grounds of being liberal in what
it accepts.

In passing, I note that some large page devices allow the data and spare
areas to be written in subpages (e.g. this Samsung K9 chip to hand - 2048
main + 64 spare per page - allows writes in units of 512 main and 16 spare);
there might be a use to be found here in allowing an application to treat a
(Continue reading)

