The Notebook Review forums were hosted by TechTarget, which shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information posted on the forums was preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.

    Benchmarks

    Discussion in 'Hardware Components and Aftermarket Upgrades' started by Renee, Jan 28, 2011.

  1. Renee

    Renee Notebook Virtuoso

    Reputations:
    610
    Messages:
    2,645
    Likes Received:
    0
    Trophy Points:
    55
    I keep hearing negative things about benchmarks, and I can understand why.

    Benchmarks are made to measure something or to answer a question. But they are written by people who don't always ask the right question, or who don't supply a benchmark that answers it. At one time, I was the technical lead for DC's benchmark team at one of the major computer manufacturers. If there are any comments or questions about benchmarks, I thought this might be a good time to answer them.

    Renee
     
  2. tilleroftheearth

    tilleroftheearth Wisdom listens quietly...

    Reputations:
    5,398
    Messages:
    12,692
    Likes Received:
    2,717
    Trophy Points:
    631
    Renee,

    Benchmarks can answer any question you ask them to (if written properly). However, that doesn't mean the question (asked) was appropriate in the first place.

    While a 'score' may be double, triple or even an order of magnitude different on a certain benchmark, it can have very little correlation with the real-world use of that product with real workloads and actual apps/programs.

    While isolating all the different factors offers us a chance to concentrate on a specific variable we may be interested in (or think we're interested in...), this is exactly what disconnects benchmarks from reality!

    I don't buy new parts to boast/test how good/great/untouchable their benchmark 'scores' are; I buy/try new products to increase my productivity in tangible ways that will get my computer work done faster.

    Although my heaviest use of computers is editing RAW image files (up to a few thousand at a time), I also appreciate other aspects of a well-balanced computer system, including how responsive it is (at idle and/or full load).

    As hinted above, my 'benchmark' is to swap a new part in, use my computer as I've been using it for the last few years, and see if the work is completed faster, slower or the same - or even whether I notice nuances like: 'although it was only slightly faster (a minute on a 60-minute run, for example), the system was a lot more 'connected' to my inputs during that time than without '______' (the new piece of equipment I was trying/testing)'.


    What I would like to see in a 'benchmark' is this:
    instead of using benchmarks that isolate and emphasize certain parameters (for example: maximum 4K random R/W's), I would rather have a benchmark program that records a day of my (specific) computer usage and gives me (simply 'reports', not influences) all those numbers/scores for the specific hardware/software/data set I just used.

    Now, when I switch in the new part(s), I want it to play back exactly what I did and then show me any differences in any specific 'scores' and/or total time.


    Note: I don't want the 'script' to just blow through in 1/2 Hr what it took me to do in 8 Hrs - I want it to intelligently analyse where any bottlenecks were encountered and what the new part was able to do to minimize them.

    I'm not holding my breath for any such program - this is why I'm doing this manually (still). :)
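
    A minimal sketch of that manual approach, in Python - the workload function below is a made-up placeholder; substitute whatever job you actually run (a RAW batch export or a database compaction, for example):

    import time
    import statistics

    def run_workload():
        """Placeholder for the real job, e.g. exporting a batch of RAW files."""
        total = 0
        for i in range(10_000_000):  # stand-in for real work
            total += i * i
        return total

    # Run this script before the hardware swap, note the median, then run it
    # again after the swap on the same data and compare the two numbers.
    times = []
    for _ in range(3):
        start = time.perf_counter()
        run_workload()
        times.append(time.perf_counter() - start)

    print(f"median run time: {statistics.median(times):.2f}s over {len(times)} runs")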

    People say that 'manually' is too close to 'subjective' and that the conclusion is, or can be, faulty. It may be a faulty conclusion for others (if they wrongly guess that their workflow matches mine), but for my specific case, I haven't been proved wrong yet.


    My question to you then is this: is such a benchmark available?

    Benchmark Software that:

    1) Simply reports key aspects/'scores' of a fully functional (work) system.

    2) Does not influence the scores reported (nor can manufacturers influence those scores either, by 'faking' running those benchmark scores and simply returning a good/great 'value').

    3) Is reliable/consistent between runs to less than a tenth of a percent (no reason it shouldn't be so...).

    4) Can report on estimated time saved over a month/quarter/year (previous run vs. current run).

    5) Can take the $$ amount of the 'upgrade' and give a cost/benefit ratio over a month/quarter/year time scale (a back-of-envelope sketch of items 4 and 5 follows this list).
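
    Items 4 and 5 are really just arithmetic once you have before/after run times. A back-of-envelope sketch, with every input a made-up example value - plug in your own measured times and costs:

    old_run_s = 3600.0       # measured: a 60-minute job on the old part
    new_run_s = 3540.0       # measured: 59 minutes on the new part
    runs_per_day = 8         # how often the job runs
    work_days_per_month = 22
    upgrade_cost = 250.00    # $$ paid for the new part
    hourly_rate = 40.00      # value placed on an hour of work time

    saved_hours = (old_run_s - new_run_s) * runs_per_day * work_days_per_month / 3600
    value_per_month = saved_hours * hourly_rate

    print(f"time saved: {saved_hours:.1f} h/month")
    print(f"cost/benefit: ${upgrade_cost:.2f} upgrade vs ${value_per_month:.2f}/month"
          f" => payback in {upgrade_cost / value_per_month:.1f} months")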



    Renee, I think I've summarized my position fairly.

    The only conclusion (unless you have some very compelling arguments against) is that current benchmarks belong in the labs at the development stage of a product.

    For a finished product (and a user looking to significantly/tangibly improve from where they currently are), benchmarks simply confuse the issue of whether product 'a' is better than product 'b'. Instead, the question gets mutated into whether product 'a' has a better 'score' than product 'b' - and that is an entirely different question (and answer, in most cases) from what most people think they're getting an answer to.


    To give a concrete example:

    Three 500GB 2.5" HDD's:

    Momentus XT,
    Hitachi 7K500,
    Scorpio Black.

    While the Scorpio Black is indisputably the 'benchmark' king, in actual use on my identical systems with exact, identical setups the Scorpio Black is a distant third-place finisher for the tasks I want/need my systems for. In some cases, it takes almost twice as long as the XT Hybrid to complete a given task (compacting a database, for example).

    While it trades places with the Hitachi a few times, the only 'best' it can boast is a better price - not performance (overall). The XT is easily the fastest platter-based solution (if we don't look at 'benchmarks' and lose our focus).

    Hope to hear from others on their views of benchmarks (with some real life examples too).
     
  3. Renee

    Renee Notebook Virtuoso

    Reputations:
    610
    Messages:
    2,645
    Likes Received:
    0
    Trophy Points:
    55
    I totally disagree, but remember that I was a benchmark technical lead in an era when mini-computers were the only game in town.

    "However, that doesn't mean the question (asked) was appropriate in the first place"

    Customers often gave "mandatory" benchmarks to a company. It made no difference whether the benchmark answered a question or not; the object of the game was to pass the benchmark given to you. At other times, benchmarks really were germane. Still other times, customers really did set out to answer a question.

    What we saw was that, at times, benchmarks were written to exclude a particular operating system or approach - like a benchmark that required timing the initialization of 3,000 disks. Where does one find 3,000 disks for mini-computers, even if one works for a vendor? That sounds as if the benchmark was made to preclude minis altogether.

    Renee

    "What I would like to see in a 'benchmark' is this:
    instead of using benchmarks that isolate and emphasize certain parameters (for example: maximum 4K random R/W's), I would rather have a benchmark program that records a day of my (specific) computer usage and gives me (simply 'reports', not influences) all those numbers/scores for the specific hardware/software/data set I just used."

    While what you excluded is also useful in certain circumstances, certainly you would want a benchmark that measures aggregate throughput, as you proposed. Your approach looks at total system throughput.

    Renee
     
  4. Renee

    Renee Notebook Virtuoso

    Reputations:
    610
    Messages:
    2,645
    Likes Received:
    0
    Trophy Points:
    55
    1) Simply reports key aspects/'scores' of a fully functional (work) system.

    This would be the “total system throughput” benchmark.

    2) Does not influence the scores reported (nor can manufacturers influence those scores either, by 'faking' running those benchmark scores and simply returning a good/great 'value').

    You asked an interesting question here. Simply typing in results wouldn't do any good, because you had to turn in the code as well - and the code had to work, of course. That code was where the output of a benchmark was to be found.

    3) Is reliable/consistent between runs to less than a tenth of a percent (no reason it shouldn't be so...).

    Oh, of course there was a reason. These were real multiuser systems, and often there was a certain amount of variance in the scores - the same as one would observe on a PC today.
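
    That variance is easy to quantify. A minimal sketch that computes the coefficient of variation across repeated runs (the run times below are made-up example values):

    import statistics

    run_times = [412.1, 411.8, 412.6, 413.0, 411.9]  # repeated runs, in seconds

    mean = statistics.mean(run_times)
    cv_percent = 100 * statistics.stdev(run_times) / mean
    print(f"mean {mean:.1f}s, CV {cv_percent:.3f}%")

    # On a multiuser system (or a busy desktop) a CV well above 0.1% is normal;
    # background activity alone can move the numbers more than that.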

    4) Can report on estimated time saved over a month/quarter/year (previous run vs. current run).

    That’s almost an accounting question. That question, because of its nature, would be answered by sales.

    5) Can take the $$ amount of the 'upgrade' and give a cost/benefit ratio over a month/quarter/year time scale.

    Again, this is a “salesy” question. It’s not technical, and would best be answered by the person dealing with the customer “strategically”.

    Renee
     
  5. Renee

    Renee Notebook Virtuoso

    Reputations:
    610
    Messages:
    2,645
    Likes Received:
    0
    Trophy Points:
    55
    "To give a concrete example:

    Three 500GB 2.5" HDD's:

    Momentus XT,
    Hitachi 7K500,
    Scorpio Black.

    While the Scorpio Black is indisputably the 'benchmark' king, in actual use on my identical systems with exact, identical setups the Scorpio Black is a distant third-place finisher for the tasks I want/need my systems for. In some cases, it takes almost twice as long as the XT Hybrid to complete a given task (compacting a database, for example).

    While it trades places with the Hitachi a few times, the only 'best' it can boast is a better price - not performance (overall). The XT is easily the fastest platter-based solution (if we don't look at 'benchmarks' and lose our focus)."

    THIS is interesting. I wonder why? Are they the same speed (rpm)? What are their seek times?
    Renee
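
    One rough way to answer the seek-time question empirically is to time small reads at random offsets in a large file on each drive. A minimal sketch for a Unix-like system - the test file path is a hypothetical placeholder, and the file should be much larger than RAM so OS caching doesn't hide the mechanical latency:

    import os
    import random
    import time

    PATH = "testfile.bin"  # hypothetical: a multi-GB file on the drive under test
    READ_SIZE = 4096
    SAMPLES = 200

    size = os.path.getsize(PATH)
    fd = os.open(PATH, os.O_RDONLY)
    latencies = []
    for _ in range(SAMPLES):
        offset = random.randrange(0, size - READ_SIZE)
        start = time.perf_counter()
        os.pread(fd, READ_SIZE, offset)  # one small read at a random offset
        latencies.append(time.perf_counter() - start)
    os.close(fd)

    latencies.sort()
    print(f"median random access: {latencies[len(latencies) // 2] * 1000:.2f} ms")

    Note that a hybrid like the XT can serve repeated hot reads from its flash cache, so results on it will depend heavily on the access pattern.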