13 Comments
HibyPrime1 - Monday, March 21, 2011 - link
I don't know if it's me not seeing something, but I've always thought that these PC-wide synthetic benchmarks are useful to exactly no one.
3DMark is useful in that it'll give you an idea of how fast a game will run; a storage benchmark will tell you how fast you can transfer files or load apps. A PC-wide benchmark (SYSmark and PCMark types), on the other hand, tells you very little about how fast your computer is at doing anything.
Take these three examples (fictional scores):
A dual-socket server with two 6-core Xeons, no GPU, hooked up to a massive RAID server. Say this is able to score 100 points.
A quad-core Sandy Bridge CPU, 3-way SLI GTX 580s, 1TB 7200rpm HDD. This is able to score 100 points also.
A dual-core notebook, integrated GPU, with a mid-range SSD. This is able to score 50 points.
Obviously these three systems are intended for completely different tasks, but the scores tell us absolutely nothing about how fast they are at any of them. If you were to pick a web-browsing scenario, the notebook might be the best choice, as it will be just as fast yet far more efficient at such a simple task. Yet the notebook scored far lower.
JarredWalton - Monday, March 21, 2011 - link
Your above scenario is a hypothetical problem, but unless we go and run those systems through the PCMark tests and show the results, we can't actually discuss whether the testing is meaningful or not. Very likely, the dual-socket server would perform very well and score much higher than the other systems, because PCMark does virtually nothing with the GPU. Unfortunately, the notebook would also score very high in most areas because of the SSD. Anyway, I'll have to wait and see the actual tests and results in PCMark 7 to decide if I like what they're doing.
As for Vantage, the individual test suites show the finer points of detail, while the overall score is a weighted average of all the individual results. It's one of the reasons I didn't like the PCMark Vantage results -- take a look at this review, for instance:
http://www.anandtech.com/show/4202/10
The only component being changed in all of those tests is the SSD. The overall scores should change, sure, but why does "Memories" show a difference? Obviously, the test isn't just about memory performance. TV and Movies shows the least change, which means it's probably video encoding/decoding where the SSD/HDD performance matters little. Gaming is a joke, unless by "gaming" they mean "loading a game level but not playing the game"; the same goes for music--a workload that usually doesn't need anything beyond storage.
Ultimately, all of the PCMark Vantage tests hit most parts of a system, just to varying degrees, and when you look at the whitepaper where the specific tests are listed you can see why "Gaming" is heavily influenced by SSD/HDD choice. That doesn't mean the test as a whole is bad, but you need to understand what it's actually testing. If you use a notebook with an HDD and switch to an SSD and PCMark Vantage scores improve by 25% (which is typical), that difference is very much reflected in real-world usage. CPU and GPU limited testing won't change much, but for everyday tasks the SSD will certainly help a lot.
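To make the "weighted average" point concrete, here is a minimal Python sketch of how an overall score can hide very different machines. The sub-score names, the weights, and every number below are invented for illustration; they are not PCMark's actual suites, weighting, or results.

```python
# Hypothetical weights per test area (NOT PCMark's real weighting).
WEIGHTS = {"cpu": 2, "gpu": 1, "storage": 2}


def overall_score(subscores, weights=WEIGHTS):
    """Weighted average of the per-area sub-scores."""
    total_weight = sum(weights.values())
    weighted_sum = sum(subscores[area] * w for area, w in weights.items())
    return weighted_sum / total_weight


# Made-up sub-scores loosely modelled on the three systems discussed above.
server = {"cpu": 150, "gpu": 10, "storage": 95}
desktop = {"cpu": 100, "gpu": 200, "storage": 50}
notebook = {"cpu": 40, "gpu": 10, "storage": 80}

for name, system in [("server", server), ("desktop", desktop), ("notebook", notebook)]:
    print(name, overall_score(system), system)
```

With these made-up inputs the server and the desktop land on the same overall number even though their sub-scores look nothing alike, which is why the individual suite results are the part worth reading.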
HibyPrime1 - Monday, March 21, 2011 - link
So it seems a review of PCMark 7 is in order, then. That sounds very odd - a review of a benchmarking program. A benchmark of a benchmark :D
Maybe it's just me, then; personally, an abstraction layer between me and the results only tends to confuse things. I suppose for a person who isn't interested in finding out what a good level of IOPS (or otherwise) is, but still wants to try to find the best for his/her money, it would be useful.
choirbass - Tuesday, March 22, 2011 - link
"I suppose for a person who isn't interested in finding out what a good level of IOPS (or otherwise) is, but still wants to try to find the best for his/her money, it would be useful."
You more or less answered your own concern, really. For instance, if I was interested in how quickly I could start iTunes, I would look towards SSD performance primarily, as that seems to have the greatest impact above any other component. Whether that's advertised, though, is another issue entirely.
It's really just a matter of ignoring the overall score from a general POV (more is better, etc.), dissecting the parts that matter 'to you' as a consumer, and seeing how they're tested individually. It's a matter of 'why' you would want something to be faster: if something scores really high but is no faster at all, why would you buy it?
I know I'm just rehashing common-sense things. It's really just knowing what speeds 'said things' up for you (gaming, etc.) and ignoring the remaining results, especially if they really aren't relevant (CD drive reading speed when you aren't using an optical drive for anything).
choirbass - Tuesday, March 22, 2011 - link
Wish I could edit my posts, and maybe you can, but I'm missing it.
I really should have just addressed what you mentioned about IOPS, and even deleted my initial post, because I'm sure this isn't quite necessary to add either. But, as far as IOPS go, you should probably ask why they even matter before just dismissing them (http://www.orcsweb.com/blog/brad/what-are-storage-...).
That link explains things better, though. Pretty short explanation too.
""IO" stands for input/output. IOPs stands for input/output operations per second.
Every time something is written to, or read from, a storage solution - that generates an IO operation. Physical disks have a limited number of IO operations per second (IOPs) that they can handle. Storage devices - including servers with local drives - often include multiple physical disks so the IOPs capacity would be a calculated combination of those resources, taking into account RAID level overhead and other factors.
If your application, or a combination of all the applications using that storage system, generate IOPs traffic in excess of the systems maximum IOPs capacity, the requests start to queue up and wait - meaning everything starts to run just a bit slower. Then as IOPs load continues to increase, things run slower and slower until performance is no longer acceptable.
It is very important to properly scale your system in a way that can support IOPs well in excess of the expected load - to allow for both traffic increases and to handle short bursts when they arise."
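The queueing behaviour described in that quote can be sketched in a few lines of Python. This is a deliberately simplified model under assumed numbers: the per-second request counts and the 200 IOPS capacity are hypothetical, and real storage does not serve requests in neat one-second batches.

```python
def simulate_backlog(load_per_second, capacity_iops):
    """Return the number of I/O requests still queued at the end of each second."""
    backlog = 0
    history = []
    for arriving in load_per_second:
        pending = backlog + arriving
        served = min(pending, capacity_iops)  # the device cannot exceed its capacity
        backlog = pending - served            # the remainder waits for the next second
        history.append(backlog)
    return history


# Hypothetical load (requests arriving each second) against a 200 IOPS device.
load = [100, 150, 250, 300, 120, 50]
print(simulate_backlog(load, capacity_iops=200))  # [0, 0, 50, 150, 70, 0]
```

While the load stays under capacity the queue is empty; the burst in the third and fourth seconds builds a backlog that takes two quieter seconds to drain, which is the "everything starts to run just a bit slower" effect and the reason for sizing capacity above the expected load.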
Oxaqata - Monday, March 21, 2011 - link
Wonder if this one will be an Intel biased dual-core loving joke also! Cannot believe you have kept the older version in your suite for so long.
Gonemad - Tuesday, March 22, 2011 - link
...they won't bump into tests that cause chip meltdown, which could potentially happen to non-synthetic loads. I'm talking about Furmark here.
Or if they do, I hope people act responsibly about it and throttle it where it is due, not by detecting the benchmark tool, but by actually detecting the potential overload. Some people said the throttled driver wouldn't detect Furmark if its executable was renamed. I can't confirm or deny it, but that's not the Right Thing to do.
GullLars - Tuesday, March 22, 2011 - link
Just a semantics note here, but the Right Thing is a relative thing, depending on who it's for and what their goals are. The ethical thing to do, or the honest thing to do, is a better phrasing.
Gonemad - Thursday, March 24, 2011 - link
I stand corrected. I meant the ethical and honest thing to do. Just to leave no doubts.
Thank you, nonetheless.
JarredWalton - Wednesday, March 23, 2011 - link
Futuremark actually asked me for some details on my concerns with 3DMark, and since I wrote them a lengthy response I figured some of the readers might want to hear the details as well. So, here's the major portion of the email content:
One concern with 3DMark (all iterations) is that they're simply not real games. They are 3D benchmarks that are repeatable, and as such they're somewhat useful as a baseline comparison. However, the only way to really tell how well certain games run is to go and benchmark the games themselves. Batman and HAWX, for example, are far less stressful on hardware than Crysis and Metro 2033; looking at just a 3DMark result, you can't really tell which games will run well on a system and which will have performance issues.
Of course, running gaming benchmarks doesn't really tell you how well other games will run either. A benchmark of Crysis really only tells you how Crysis will run; other games might have similar performance characteristics, but without testing you won't actually know. Thus, the best thing to truly quantify gaming performance is to grab as many games as you can and run benchmarks on all of them. So, even if 3DMark were to switch to the latest Unreal Engine and id Tech engines for the graphics core, while the results may (or may not) correlate better with games using UE/idTech, they still don't do more than give a baseline of performance.
The other concern is probably the bigger issue, and unfortunately it's mostly something Futuremark can't control. When there's a popular benchmark out there, the hardware manufacturers will often spend a lot more effort optimizing to run well on that benchmark rather than working on real performance problems. NVIDIA and ATI/AMD have been doing high performance GPUs long enough that they don't seem to worry about 3DMarks that much these days, but one look at the Intel IGP results tells you that they're focusing on the wrong areas.
Take the comparison between an Arrandale i5-520M with HD Graphics and an Athlon II P340 with HD 4250: http://www.anandtech.com/bench/Product/232?vs=336
If we average the four 3DMark scores, HD Graphics with Core i5 comes out 32% ahead of HD 4250. Do the same thing for the six games we tested, and the Intel "lead" drops to 1.4%.
Or perhaps even better, how about Sandy Bridge i7-2820QM with Intel HD 3000 Graphics against i5-460M with GT 420M: http://www.anandtech.com/bench/Product/327?vs=238
The GT 420M setup wins in 3DMarks on average by 28%; in actual gaming results at low detail, the lead is 37%. Bump up to our medium detail settings, however, and the lead is now 57%; at high (which is generally unplayable on either system), the lead is 69%.
The gap between how well Intel has optimized for 3DMark and how well it has optimized for actual games is pretty clear (the GMA 4500MHD was worse back in the Core 2 days), but more importantly, these optimizations take time away from real problems. The fact is, Intel still has driver compatibility issues with quite a few games, but they focus on the stuff people are benchmarking rather than doing the admittedly large amount of work to really make their drivers 100% (or at least 99%) compatible. Having Futuremark certify drivers only does so much here, at least as far as I can tell, and it's almost impossible to catch every "cheat" that might be done to improve scores (as opposed to a driver enhancement that affects a large number of 3D rendering tasks).
So basically, the more weight we put on 3DMark scores, the less meaningful they become because of the optimization games that companies will play. PCMark is certainly susceptible to this phenomenon as well, but in general my experience is that the PCMark test suite uses more in the way of real-world scenarios. If a manufacturer tries to improve their platform's productivity or multimedia scores, that will show up in actual office and multimedia applications as well.
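The comparison behind the percentages above (average lead in synthetic tests versus average lead in games) can be sketched as follows. This assumes the "average lead" is the mean of the per-test percentage leads, which the comment does not spell out, and every score below is a made-up placeholder, not a figure from the AnandTech Bench database.

```python
def percent_lead(a, b):
    """How far result a is ahead of result b, in percent (negative if behind)."""
    return (a - b) / b * 100


def average_lead(results_a, results_b):
    """Mean of the per-test percentage leads of system A over system B."""
    leads = [percent_lead(a, b) for a, b in zip(results_a, results_b)]
    return sum(leads) / len(leads)


# Placeholder numbers only: two synthetic scores and two game frame rates.
synthetic_a, synthetic_b = [1200, 2400], [1000, 2000]
games_a, games_b = [30, 22], [30, 20]

print(round(average_lead(synthetic_a, synthetic_b), 1))  # 20.0
print(round(average_lead(games_a, games_b), 1))          # 5.0
```

A large gap between the two numbers, as in this placeholder data, is the pattern being described: a lead in the synthetic suite that mostly disappears once real games are measured.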
tipoo - Thursday, March 24, 2011 - link
Wow, awesome. Have they responded? Let us know when they do!
Intel certainly seems like the most shoddy player, although AMD and Nvidia are hardly clean players either.
lili94 - Wednesday, March 23, 2011 - link
welcome