The first question I have in mind is: how do you define a lock benchmark? Is your goal to minimize overhead? To minimize the latency of a successful uncontended acquire? To minimize the bus load on other CPUs while three CPUs are waiting for the spin lock?
What we're measuring is not well defined.
Looking at  (wow, so little has changed in SMP architectures since 1990!) and  gives a few options:
1) Measure the time it takes to perform a critical section N times by n CPUs concurrently.
2) Measure overhead. Time a single CPU doing a task N times with no locks. Then time n CPUs doing the same task, this time protected by a spin lock. The overhead is the time taken with the spinlock minus the time taken without it.
3) Measure the overhead imposed on m other CPUs doing a different task while n CPUs are contending for a critical section.
4) Measure latency. This is a bit tricky to measure. What  suggests is measuring the average time between two reads of the spin lock. If I understand that correctly, they measure the time it takes just to check whether the spinlock is taken.
5) What I had in mind is measuring how much time passes between one CPU releasing the lock and another CPU acquiring it. This is tricky to do, since CPU clocks are not always synchronized, or cheap enough to read.
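To make options 1 and 2 concrete, here is a minimal sketch in C11 with pthreads. Everything in it is my own assumption rather than something taken from the references: the test-and-set lock built on `atomic_flag`, the "task" (incrementing a shared counter), the 64-thread cap, and the function names `time_locked`, `time_unlocked`, `lock_overhead` and `last_counter`. It measures wall-clock time with `clock_gettime(CLOCK_MONOTONIC)`, not cycles, and the locked timing also includes thread creation and joining, so treat the numbers as rough.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical test-and-set spinlock, for illustration only. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

/* The "task": incrementing a shared counter (an assumption). */
static volatile long shared_counter;

static void spin_lock(void)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire)) {
    }
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

/* Wall-clock time in seconds from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

static void *worker(void *arg)
{
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        spin_lock();
        shared_counter++;          /* the critical section */
        spin_unlock();
    }
    return NULL;
}

/* Option 1: time for nthreads threads to run total_iters critical
 * sections in total. Includes thread creation and join time.
 * Returns -1.0 if nthreads is outside 1..64. */
double time_locked(int nthreads, long total_iters)
{
    pthread_t tids[64];
    if (nthreads < 1 || nthreads > 64)
        return -1.0;
    long per_thread = total_iters / nthreads;  /* remainder is dropped */
    shared_counter = 0;
    double start = now_sec();
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, &per_thread);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return now_sec() - start;
}

/* Baseline for option 2: one thread, same task, no lock. */
double time_unlocked(long total_iters)
{
    shared_counter = 0;
    double start = now_sec();
    for (long i = 0; i < total_iters; i++)
        shared_counter++;
    return now_sec() - start;
}

/* Option 2: overhead = locked time minus unlocked single-thread time. */
double lock_overhead(int nthreads, long total_iters)
{
    return time_locked(nthreads, total_iters) - time_unlocked(total_iters);
}

/* Lets a caller check that no increments were lost. */
long last_counter(void)
{
    return shared_counter;
}
```

Calling `lock_overhead(4, 4000000)` from a `main` (built with `-pthread`) would give one data point for option 2. A real benchmark would presumably pin threads to CPUs, start them behind a barrier, and repeat the run many times, none of which this sketch does.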
The devil's in the details, so even though  have a bird's eye view description of the methodology, there are still missing details (e.g., did they measure time or cycles?).
I'm far from an expert on these matters, so I wonder:
Is there an industry standard for benchmarking a spinlock? Something like Octane for JavaScript: a benchmark that mimics many real-world scenarios?
If indeed there is, is there an open source implementation?
If there isn't, are there better papers describing spinlock benchmarking methodology?