AMD's Bulldozer Disappoints: Why That's Good News
#1
Posted 14 October 2011 - 01:31 PM
#2
Posted 14 October 2011 - 01:54 PM
#3
Posted 14 October 2011 - 03:07 PM
Why isn't Turbo Core like CrossFireX GPU scaling? Each core should be 60% faster or more from Turbo Core. Anybody know why it isn't?
#4
Posted 14 October 2011 - 04:23 PM
#5
Posted 14 October 2011 - 05:11 PM
#6
Posted 14 October 2011 - 05:58 PM
Intel's fastest quad, the 2600, is 3.6 GHz with 1333 MHz RAM and 9 MB cache. The 8 core AMD model is 245 dollars. It's better in every way except that it doesn't have hyperthreading, which doesn't matter in single thread performance. Who did these benchmarks, and how did they post them before final silicon was ever produced? Many of these benchmarks were posted a year ago and have been re-posted many times after someone points out that the numbers in them don't match what should be there. Quick Photoshop and they're back.
#7
Posted 14 October 2011 - 06:21 PM
Scottyugs7, on 14 October 2011 - 05:58 PM, said:
Intel's fastest quad, the 2600, is 3.6 GHz with 1333 MHz RAM and 9 MB cache. The 8 core AMD model is 245 dollars. It's better in every way except that it doesn't have hyperthreading, which doesn't matter in single thread performance. Who did these benchmarks, and how did they post them before final silicon was ever produced? Many of these benchmarks were posted a year ago and have been re-posted many times after someone points out that the numbers in them don't match what should be there. Quick Photoshop and they're back.
Wow.. you really are clueless. The 4170 has FOUR - read that again, 4MB cache. NOT 12.
Also, the 4 core chip is REALLY only 2. Two and a half if you really want to count the dual integer units per module. However, all the key components are SHARED per module. Meaning ONE decode stage PER PAIR of 'cores', ONE Floating Point processor PER PAIR of 'cores'. In other words, this is the AMD equivalent to HyperThreading. Only, with this, Windows doesn't see the extra cores as 'virtual cores' and doesn't manage them correctly as a result.
AMD's processors also cannot keep up in work done PER CLOCK. The Intel i5 SMOKES the new ULTRA HIGH END 8150 from AMD. Rolls it up and smokes it!
The new AMD processors perform the WORST in comparison when in SINGLE THREADED applications. Actually performing WORSE than the AMD PHENOM II!
Now the question:
Do you actually believe the crap you post, or are you trolling?
#8
Posted 14 October 2011 - 07:49 PM
The real reason that Bulldozer did not stack up in the benchmarks is the compiler used for each of the benchmarks. All of these closed-source benchmarks are compiled on the standard Intel compiler with the Intel libraries. It is not optimized to support any instructions beyond SSE3 for any processor other than Intel chips. SSE4.1, SSE4.2, AVX, and FMA4 significantly increase the floating point performance of AMD processors, but are not used by code compiled on an Intel compiler.
If you look at the integer performance of the benchmarks, AMD almost always out-performs the Intel chips and shows a 15-30% increase in performance over the Phenom II x6 processors. If the compiler used was completely optimized for both Intel and AMD, floating point performance would also show similar gains.
Lastly, under full load where all of the threads are being used, the Intel chip is not physically capable of beating the AMD chip. 4 cores that complete one instruction each per cycle cannot physically beat 8 cores completing 1 instruction each per cycle, when threads are continually running.
#9
Posted 14 October 2011 - 07:58 PM
BryanMeyers, on 14 October 2011 - 07:49 PM, said:
Apparently not much of an engineer.
You have absolutely no farking clue.
Quote
If you look at the integer performance of the benchmarks, AMD almost always out-performs the Intel chips and shows a 15-30% increase in performance over the Phenom II x6 processors. If the compiler used was completely optimized for both Intel and AMD, floating point performance would also show similar gains.
Lastly, under full load where all of the threads are being used, the Intel chip is not physically capable of beating the AMD chip. 4 cores that complete one instruction each per cycle cannot physically beat 8 cores completing 1 instruction each per cycle, when threads are continually running.
AMD lost on every benchmark available, except light multithreading loads with very few floating point operations. Not because of compilers. Because of poor design choices.
Since you don't understand why those choices have such an impact, I suggest a little light reading that will give you a quick overview and some understanding. Anandtech does a surprisingly good job with that.
Also, since you have made it clear you have no concept at all: Intel processors since the Core 2 have all been capable of FOUR instructions per clock cycle PER CORE. Again, do the research. Don't bullshit. You will get caught.
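For anyone following the arithmetic in this exchange, here is a rough sketch of the peak-throughput estimate both sides are arguing about: cores times instructions per clock times clock rate. The core counts and per-clock figures are the ones claimed in the thread; the shared 3.6 GHz clock and the function name are assumptions made up for illustration, and no real chip sustains its peak issue rate.

```python
def peak_instructions_per_second(cores, instructions_per_clock, clock_ghz):
    """Theoretical upper bound: every core issues at full width on every cycle."""
    return cores * instructions_per_clock * clock_ghz * 1e9

# Figures claimed in the thread; the common 3.6 GHz clock is an assumption.
eight_narrow = peak_instructions_per_second(cores=8, instructions_per_clock=1, clock_ghz=3.6)
four_wide = peak_instructions_per_second(cores=4, instructions_per_clock=4, clock_ghz=3.6)

print(f"8 cores x 1 per clock: {eight_narrow / 1e9:.1f} billion instructions/s")
print(f"4 cores x 4 per clock: {four_wide / 1e9:.1f} billion instructions/s")
```

Under these assumed numbers the four wide cores have twice the theoretical peak of the eight narrow ones, so core count alone does not settle the argument.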
#10
Posted 14 October 2011 - 09:14 PM
While the RISC specifics of Bulldozer have not been released, in the K10 architecture used by Phenom, 9 functional operations and 4 memory operations could be dispatched each cycle.
Since a modified floating point unit, 4th integer pipeline, and second integer core were added to the Bulldozer module, I would expect to see 16 integer operations issued across 8 pipelines and 3-4 floating point operations per Bulldozer module.
Core issues 4 CISC instructions per cycle to the uOp schedulers. K10 issued 3 CISC instructions per cycle to its schedulers. The number of issues per cycle for Bulldozer has increased beyond the 3 CISC limitation of K10 to avoid starving the new pipelines.
I am not "bullshitting"; we were simply not referring to the same thing, and I was incorrect about how many issues per cycle were allotted to each design. I am well aware of the impact of each of these design choices, as my graduate work centered around the design of superscalar architectures with multiple issue schedulers.
As for the compiler issue, this is a known problem that has existed since the 386 days. Futuremark is compiled entirely using the Intel compiler and has always shown favoritism towards Intel. Open-source benchmarks compiled on GNU compilers show this bias very plainly. There is a reason that SPEC is still used in the industry as a benchmarking tool: it relies on unbiased compilers. Don't believe me? Do a quick Google search for "Intel's Cripple AMD function"
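A minimal sketch of the kind of dispatch being described, for readers who have not seen it. This is not Intel's actual dispatcher code; the function names, path names, and feature sets are hypothetical. It only contrasts choosing a code path by CPU vendor string with choosing it by the instruction sets the CPU reports.

```python
def dispatch_by_vendor(vendor, features):
    """Vendor-gated dispatch: the fast path is taken only for one vendor string."""
    if vendor == "GenuineIntel" and "avx" in features:
        return "avx_path"
    return "baseline_path"

def dispatch_by_features(vendor, features):
    """Feature-gated dispatch: any CPU that reports the feature gets the fast path."""
    if "avx" in features:
        return "avx_path"
    return "baseline_path"

# Hypothetical CPU descriptions; the feature sets are illustrative, not measured.
intel_cpu = ("GenuineIntel", {"sse2", "sse3", "sse4_1", "sse4_2", "avx"})
amd_cpu = ("AuthenticAMD", {"sse2", "sse3", "sse4_1", "sse4_2", "avx", "fma4", "xop"})

for vendor, features in (intel_cpu, amd_cpu):
    print(vendor, dispatch_by_vendor(vendor, features), dispatch_by_features(vendor, features))
```

With vendor gating, the AMD entry falls back to the baseline path even though it reports the same AVX flag; with feature gating, both entries take the fast path.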
#11
Posted 14 October 2011 - 09:17 PM
#12
Posted 15 October 2011 - 12:59 AM
BryanMeyers, on 14 October 2011 - 09:14 PM, said:
While the RISC specifics of Bulldozer have not been released, in the K10 architecture used by Phenom, 9 functional operations and 4 memory operations could be dispatched each cycle.
Since a modified floating point unit, 4th integer pipeline, and second integer core were added to the Bulldozer module, I would expect to see 16 integer operations issued across 8 pipelines and 3-4 floating point operations per Bulldozer module.
That is a great deal closer to the reality. In the end though, expectations and end results are two different things. Now, ignore the benchmarks for a moment. Break out the real world applications. Looking at iTunes, you find that the new cores have terrible single threaded performance dealing with audio re-compression, and this is not going to be related to the compiler in any way. If that isn't enough, look at HandBrake, an open source application that we know isn't compiled with a compromised compiler. This is a very heavily multithreaded application. When re-compressing video with x264 (again, open source software), the processor tells us what it is really made of, coming in considerably slower than the old Phenom IIs.
Quote
I am not "bullshitting"; we were simply not referring to the same thing, and I was incorrect about how many issues per cycle were allotted to each design. I am well aware of the impact of each of these design choices, as my graduate work centered around the design of superscalar architectures with multiple issue schedulers.
As for the compiler issue, this is a known problem that has existed since the 386 days. Futuremark is compiled entirely using the Intel compiler and has always shown favoritism towards Intel. Open-source benchmarks compiled on GNU compilers show this bias very plainly. There is a reason that SPEC is still used in the industry as a benchmarking tool: it relies on unbiased compilers. Don't believe me? Do a quick Google search for "Intel's Cripple AMD function"
Now, one last note regarding the compiler, since I have seen this before: Intel was FORCED to remove the BIAS. The compiler is now designed to use the optimal code path for "known" processors. This creates its own problems, of course, as they can sit on their butts as long as they want to before optimizing for a new CPU.
http://www.agner.org...g/read.php?i=49
#13
Posted 15 October 2011 - 05:38 AM
As for HandBrake: Sandy Bridge has a built-in H.264 transcoder because of its onboard graphics. This is part of the reason that AMD has insisted that AMD graphics cards be used in the testing process. UVD 1 and higher have H.264 transcoding capabilities, and it has been demonstrated that even a mid-range AMD card (5750 or 6770) is capable of besting the Sandy Bridge chips without the aid of the processor. It is as important to test the platform as it is to test the chip itself. Had a 6990 been running in the AMD test systems, a significant change in performance would have been observed. Even though AMD supports NVIDIA hardware, it is designed to run best with its own products. Had the FX series been an APU and not just a CPU, these benchmarks would provide a much different picture as well.
As for Intel optimizing their compiler for other architectures, the compiler may have been fixed, but the libraries are still crippled. It is still an issue in the newer libraries and for new instruction sets like FMA4 and XOP. There may never be support in the Intel libraries for those extensions because they are AMD specific and will never be implemented in Intel architectures. That said, even GNU isn't capable of handling the new extensions.
That means every benchmark does not express the true performance of floating point operations on Bulldozer. FMA4 is critical to proper scheduling on the new floating point unit. I will also point back to the improved integer performance. Every benchmark I have seen shows that integer performance is up on the last generation. If separate graphs had been used for floating point and integer performance in the benchmark reviews, the 8150 would have consistently been on the top for integer performance in almost every benchmark.
#14
Posted 15 October 2011 - 11:35 AM
BryanMeyers, on 15 October 2011 - 05:38 AM, said:
As for HandBrake: Sandy Bridge has a built-in H.264 transcoder because of its onboard graphics. This is part of the reason that AMD has insisted that AMD graphics cards be used in the testing process. UVD 1 and higher have H.264 transcoding capabilities, and it has been demonstrated that even a mid-range AMD card (5750 or 6770) is capable of besting the Sandy Bridge chips without the aid of the processor. It is as important to test the platform as it is to test the chip itself. Had a 6990 been running in the AMD test systems, a significant change in performance would have been observed. Even though AMD supports NVIDIA hardware, it is designed to run best with its own products. Had the FX series been an APU and not just a CPU, these benchmarks would provide a much different picture as well.
Intel Quick Sync is very finicky. I would have to double check the platform used in testing, but I can say, without a doubt, that if a 'P' series chipset is used, then Quick Sync is immediately disabled. 'H' series chipsets support Quick Sync if no other video card is used, and only the 'Z' series would really let them get away with using Quick Sync and a dedicated video card.
The real question though: how did the new 8 core chip end up slower than last generation's lower-clocked 6 core chip?
Quote
That means every benchmark does not express the true performance of floating point operations on Bulldozer. FMA4 is critical to proper scheduling on the new floating point unit. I will also point back to the improved integer performance. Every benchmark I have seen shows that integer performance is up on the last generation. If separate graphs had been used for floating point and integer performance in the benchmark reviews, the 8150 would have consistently been on the top for integer performance in almost every benchmark.
Benchmarks maybe. Real world use, not hardly.
In the real world, the new processor has been all over the place. Very specific multi-threading scenarios work out quite well for AMD, getting up to i5 and almost i7 performance levels. The rest of the time, it lags anywhere from slow to what-the-hell-happened SLOW.
There is something terribly wrong with this chip design.
#15
Posted 15 October 2011 - 12:34 PM
I began to suspect this to be the case when the Phenom II x6 chips launched, especially when the SPEC benchmarks gave very different pictures on Windows 7 and Ubuntu 11.04. I can speak to the consistency of compilation between Windows and Ubuntu because Intel was benching the same on both sides according to SPEC. This was not the case with the x6 processor, which saw a near-linear increase in performance on Ubuntu but diminishing returns on Windows.
#16
Posted 15 October 2011 - 02:17 PM
BryanMeyers, on 15 October 2011 - 12:34 PM, said:
I began to suspect this to be the case when the Phenom II x6 chips launched, especially when the SPEC benchmarks gave very different pictures on Windows 7 and Ubuntu 11.04. I can speak to the consistency of compilation between Windows and Ubuntu because Intel was benching the same on both sides according to SPEC. This was not the case with the x6 processor, which saw a near-linear increase in performance on Ubuntu but diminishing returns on Windows.
The problem we are dealing with here, more than anything else, is how the cores represent themselves to Windows. Windows 7 uses core parking, and will attempt to load up 'real' cores before the 'virtual' cores. AMD chose to tell Windows it has 8 full cores on a system with only 4 complete cores. So Windows loads up cores 1, 2, and 3 on a lightly multi-threaded application, where the peak performance would be realized on 1, 3, and 5 OR 2, 4, and 6. That is where most of the multi-threading performance hit is taken. Take Prime95, for example: it creates multiple, independent, single core floating point workloads that will quickly and easily show the limitations of the new AMD processors. Interestingly, while Prime95 would be a perfect example, I have yet to see anyone test with it.
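The placement problem described above can be sketched in a few lines. This is not how the Windows scheduler is implemented; it is a toy model that assumes cores are numbered from 1 and that each consecutive pair (1-2, 3-4, ...) shares one module. All function names are made up for illustration.

```python
def spread_across_modules(thread_count, module_count, cores_per_module=2):
    """Assign threads to 1-based core numbers, filling one core per module first."""
    placement = []
    for slot in range(cores_per_module):      # first core of each module, then the second
        for module in range(module_count):
            if len(placement) == thread_count:
                return placement
            placement.append(module * cores_per_module + slot + 1)
    return placement

def pack_in_order(thread_count):
    """Naive placement: use cores 1, 2, 3, ... regardless of shared hardware."""
    return list(range(1, thread_count + 1))

def modules_used(cores, cores_per_module=2):
    """Number of distinct modules touched by a set of 1-based core numbers."""
    return len({(core - 1) // cores_per_module for core in cores})

naive = pack_in_order(3)
spread = spread_across_modules(3, module_count=4)
print(naive, "uses", modules_used(naive), "modules")
print(spread, "uses", modules_used(spread), "modules")
```

In this model, three threads packed in order land on two modules, so two of them compete for one module's shared front end and floating point unit, while the spread placement gives each thread a module to itself.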
#17
Posted 15 October 2011 - 11:10 PM
No wonder PC makers are struggling with their profit margins and companies like HP are trying to sell off their PC business.
And with Win 8 not necessarily requiring people to buy a new PC, unlike Win 7, the industry has a very bleak future.
#18
Posted 16 October 2011 - 05:43 AM
#19
Posted 16 October 2011 - 12:10 PM
Adame2qf, on 16 October 2011 - 05:43 AM, said:
People like me, who still prefer quality over quantity. I rip using the absolute highest quality settings available. I do not buy 160 kb/s audio because it sounds like garbage.
#20
Posted 17 October 2011 - 08:04 AM
Speaking specifically as a gamer: most games out there aren't multi-core optimized. They say they are, but that just means picking the core with the lowest overhead.
Having 3k cores at 1 MHz isn't the same as having 1 core at 3.0 GHz.
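A rough way to see why, using an Amdahl's law style estimate. The workload size and the 50% parallel fraction are assumptions picked for illustration, and the function name is hypothetical.

```python
def run_time_seconds(work_cycles, clock_hz, cores, parallel_fraction):
    """Amdahl-style estimate: the serial part runs on one core, the rest splits evenly."""
    serial_cycles = work_cycles * (1.0 - parallel_fraction)
    parallel_cycles = work_cycles * parallel_fraction
    return serial_cycles / clock_hz + parallel_cycles / (clock_hz * cores)

# Illustrative workload: 3 billion cycles of work, half of it parallelizable (an assumption).
work = 3e9
many_slow = run_time_seconds(work, clock_hz=1e6, cores=3000, parallel_fraction=0.5)
one_fast = run_time_seconds(work, clock_hz=3e9, cores=1, parallel_fraction=0.5)

print(f"3000 cores at 1 MHz: {many_slow:.1f} s")
print(f"1 core at 3 GHz:     {one_fast:.1f} s")
```

The two setups only tie when the work is perfectly parallel (parallel_fraction=1.0). Any serial portion runs at the single-core clock, which is where the many slow cores lose badly.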