Originally Posted by Rob Williams
There were high times, low times, and times when I wanted to jump out the window and run into traffic. But in the end, we got an exhaustive and accurate benchmark out of the deal. I regret not deciding to jump on this well over a month ago... I just had no idea what we were in for :S
Indeed, lol. And even yesterday we were still picking glass from the floor. This thing consumes us in more than one way. Uff!
Ok, here's the promised description of the CPU2006 benchmark suite, for future reference. I'll try to make this short and to the point.
First and foremost, Rob already linked to it, but here it is again: the website of the suite, for anyone wishing to dig deeper.
What is CPU2006?
CPU2006 is a benchmark suite composed of 29 individual benchmarks that test a CPU's integer and floating-point performance, as well as its memory subsystem. Every individual benchmark is built from real-world algorithms (algorithms used in various types of applications, from video encoding to scientific computing or file compression) and was specifically developed to stress test the CPU.
The CPU2006 benchmarks are divided into two distinct groups:
- CINT2006 measures compute-intensive integer performance and is composed of 12 tests, developed in either C or C++.
- CFP2006 measures compute-intensive floating-point performance and is composed of 17 tests, developed in C, C++ or Fortran.
Results are presented separately for the integer and floating-point groups.
What kinds of metrics are there?
CPU2006 can perform two different types of benchmarks:
- Speed Benchmarks: How fast the CPU performs a task.
- Rate Benchmarks: How many tasks can a CPU perform in a given time.
Techgage will concern itself only with Speed Benchmarks and these are the results that will be shown in future CPU reviews.
How is the data reported?
Rob is still working on the best way to present this data. But he's also going to publish the results to the SPEC website. So we can discuss those, knowing that Rob will use them as the base for whatever cool graphs and presentation he cooks up for us in future CPU review articles.
A typical SPEC report can be seen here.
As you can see at the top right, this is an integer benchmark results file. That is, this is a CINT2006 results file. A floating-point (CFP2006) results file is laid out exactly the same way, only the tests are different. One example (for the exact same system) can be found here.
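The results at the top right are the overall scores for the whole suite (SPEC computes them as the geometric mean of the individual benchmark ratios, with each benchmark's ratio being the median of its three runs). Individual benchmark results can be seen in the graph below and in the Results Table further down.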
CPU2006 runs the tests 3 times: Run #1, Run #2 and Run #3. Each of these Runs can still be configured to run the same test more than once. These individual runs inside a Run # are called Copies. So a Run # can be made up of 1 or more Copies (1 or more executions of the benchmark).
CPU2006 can be configured to run two Sets of benchmarks: Base and Peak. Each of these Sets performs 3 Runs and their results are calculated separately.
- Base: Base Run #1 is executed. It executes as many Copies as it was configured to. The final result for Base Run #1 is calculated by averaging the results of every Copy executed. Then Base Run #2 starts, then Base Run #3.
- Peak: After the whole Base benchmark is done, the Peak benchmark starts. It executes just as Base did above. The difference is that the results are calculated differently. Instead of averaging the results of every executed Copy, Peak locates the highest performing Copy and uses that as the result for that Run #.
Peak is optional and doesn't actually need to be performed if we so wish. Its values can be collected from the Base run by simply grabbing the highest speed Copy from that run. But in that case, if the benchmark is configured to run just one Copy, the results from Peak and Base will coincide for every Run.
Techgage isn't planning at this time to run more than one Copy. So there are no plans to include Peak in Techgage benchmarks.
Results are published in Seconds and as a Ratio:
- Seconds: The average time of an executed Copy, for Base. Or the fastest time of an executed Copy, for Peak.
- Ratio: CPU2006 uses a reference machine (a 296 MHz UltraSPARC II) as the basis for the ratio calculation. A ratio of 20 means the tested machine is 20 times faster than this reference machine.
It's this Ratio result that serves as the basis for CPU2006 reporting, as you can see from the reports linked above.
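To make the Ratio a bit more concrete, here's a minimal sketch in Python. It assumes the ratio is simply the reference machine's time divided by the tested machine's time for the same benchmark. The function name `spec_ratio` and the timings are my own, made up purely for illustration; they are not real SPEC reference times.

```python
def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """Return how many times faster the tested machine is than the reference."""
    if measured_seconds <= 0:
        raise ValueError("measured time must be positive")
    return reference_seconds / measured_seconds


# Hypothetical timings, for illustration only (not real SPEC numbers).
reference_seconds = 9000.0  # time on the reference machine
measured_seconds = 450.0    # time on the machine being tested

print(spec_ratio(reference_seconds, measured_seconds))  # prints 20.0
```

So a benchmark that takes 450 seconds here, against a 9000 second reference time, would be reported with a ratio of 20.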
How is the benchmark suite executed?
SPEC has a very stringent set of procedures and rules that every tester must follow in order for the benchmarks to be sanctioned and validated by SPEC (and allowed to be posted to their website).
The machine being benchmarked must have a compiler suite installed. For Linux this is usually GCC, while for Windows this is usually the Intel compiler. Other compilers can be used, but they must conform to certain C99 rules that the SPEC source code makes use of (which on Windows excludes Microsoft's VC++, for instance).
The tester can, however, use pre-compiled binaries of the individual tests. But this is far from ideal, because those binaries may have been compiled for a different CPU and may not fully reflect the CPU architecture being tested.
Techgage will always compile and never use pre-built binaries. Any exception will be clearly noted and the reasons detailed.
The tester must configure the compilation process in conformance with the SPEC run rules, a very complex rule set, as you can see from the link. Fortunately, compiler configuration files already exist that make this process somewhat easier, and the tester is left with the task of tweaking these to their needs... following the rules at all times. Here you can find an example of a configuration file.
A utility called runspec executes the benchmark once everything is properly configured. This can take several hours. On the Intel 2600K Rob used, it takes around 13 hours to execute the whole benchmark (3 Runs of Base and Peak, 1 Copy each).
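For a rough idea of what a runspec call looks like, here's a small Python sketch that only assembles and prints the command line, without running anything. The helper `build_runspec_command` and the config file name `techgage-example.cfg` are hypothetical. The `--config`, `--tune` and `--iterations` options and the `int`/`fp` suite names are the ones I know from the runspec documentation, but do check them against the docs shipped with your copy of the suite.

```python
from typing import List


def build_runspec_command(config: str, suite: str,
                          tune: str = "base", iterations: int = 3) -> List[str]:
    """Assemble a runspec command line as a list of arguments (nothing is executed)."""
    if suite not in ("int", "fp", "all"):
        raise ValueError("suite must be 'int', 'fp' or 'all'")
    if tune not in ("base", "peak", "all"):
        raise ValueError("tune must be 'base', 'peak' or 'all'")
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return [
        "runspec",
        f"--config={config}",          # compiler/flags configuration file
        f"--tune={tune}",              # which Set to run: base, peak or both
        f"--iterations={iterations}",  # number of Runs per benchmark
        suite,                         # int = CINT2006, fp = CFP2006
    ]


# Hypothetical config file name, for illustration only.
command = build_runspec_command("techgage-example.cfg", "int")
print(" ".join(command))
```

This would print something like `runspec --config=techgage-example.cfg --tune=base --iterations=3 int`, which you'd then type in the shell that the suite sets up for you.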
Once the benchmark is executed, result files are automatically generated and everything is checked against SPEC rules. If the benchmark conforms to these rules, the results files are flagged as valid and can be submitted to SPEC for inclusion on their results page. Otherwise they are flagged as Invalid.
Invalid results aren't bad results (all results are good, since the measurements are always accurate). They are simply results that don't conform to SPEC rules, and that may have been intentional, depending on the needs and requirements of the tester. Such results just cannot be submitted to SPEC, since they don't meet SPEC's requirements for a standard benchmark procedure.
Does Techgage follow SPEC rules?
To the letter! All benchmarks done by Techgage in the context of its CPU or computer review articles will be submitted to SPEC. Any exceptions will be clearly mentioned in the article and the reasons for them explained.
Techgage will use Windows for CPU2006 benchmarking. The source code build tools will be the Intel Compiler v12 and Visual Studio 2008. The configuration files have all already been tested and fully conform to SPEC rules.
Visual Studio is required because the Intel Compiler uses Microsoft's C and C++ standard libraries. The reason 2008 is being used and not 2010 is that there's currently a bug in the Intel Compiler that generates a header conflict between math.h and Intel's own mathimf.h (source). Because SPEC does not allow changes to the source code of the individual benchmarks, we adopted VS 2008.