PCWorld Forums


AMD's Bulldozer Disappoints: Why That's Good News

#1 User is offline   PCWorld 

  • Advanced Member
  • PipPipPipPipPipPipPipPip
  • Group: PCWorld BOT
  • Posts: 103,725
  • Joined: 01-August 07

Posted 14 October 2011 - 01:31 PM

Post your comments for AMD's Bulldozer Disappoints: Why That's Good News here
0

#2 User is offline   ClaudeD 

  • Advanced Member
  • PipPipPipPip
  • Group: Members
  • Posts: 493
  • Joined: 01-January 07

  Posted 14 October 2011 - 01:54 PM

Holy Moses, another slanted comparison between Intel and AMD where the AMD processor is faster than the Intel CPU, with an "our stuff only tests 2 CPU processors" exception. So what good is the comparison? Let's see some real-world application test reviews.
0

#3 User is offline   Linker11 

  • Newbie
  • Pip
  • Group: New Member
  • Posts: 2
  • Joined: 11-March 11

  Posted 14 October 2011 - 03:07 PM

Why isn't Turbo Core like CrossFireX GPU scaling? Each core should be 60% faster or more with Turbo Core. Does anybody know why it isn't?
0

#4 User is offline   campdude 

  • Newbie
  • Pip
  • Group: New Member
  • Posts: 9
  • Joined: 24-May 11

  Posted 14 October 2011 - 04:23 PM

They benched the FX with Turbo Core enabled. If you never looked at the benches and only read this article, you would think, "Oh, it's got Turbo Core, that changes things." But they benched it with that enabled.
0

#5 User is offline   waldojim 

  • Elite
  • PipPipPipPipPipPipPipPip
  • Group: Members
  • Posts: 15,066
  • Joined: 29-October 08
  • Location:Texas

Posted 14 October 2011 - 05:11 PM

We have been waiting for multi-threaded apps since Intel released the Core2Quad... or the original Phenom depending on who you are. They have yet to show. And I don't see it happening just because AMD released another so-so chip that can ONLY work well with heavily threaded applications that don't actually load down the chip.
"There is a cult of ignorance in the United States, and there always has been. The strain of anti-intellectualism has been a constant thread winding its way through our political and cultural life, nurtured by the false notion that democracy means that 'my ignorance is just as good as your knowledge.'" -- Isaac Asimov

Lenovo W520 CTO Intel i7-2620m, 8GB Patriot ram @ 1333Mhz, Nvidia Quadro 1000m with 2GB GDRR3, Plextor M3 256GB SSD, 1080P wide color display, Windows 8 Pro
Media Center: Intel Core i5 760 @ 3.1Ghz, 4GB DDR3, Corsair GS600PSU, EVGA Geforce 550ti, EVGA P55 SLI, 3x 1TB raid 5, 1x 1TB boot drive, Windows 8 Pro, Win TV 950(USB), Pioneer BR.
Server: AMD Phenom X4 945 @ 3.0Ghz, MSI 790FX-GD70, 16gb ddr3 RAM @ 1333mhz, 2TB Seagate HDD, 64GB Patriot SSD, Asus Silent Gefore 210
The Green machine: AMD Sempron 145EE Unlocked and OC'd to 4.1Ghz, Gigabyte GD970A-DS3, 8GB ram @ 1600mhz, Nvidia 550Ti, Thermaltake BlueOrb, Antec EW385
Samsung Galaxy Nexus, Paranoid Android 4.2 Rom http://www.speedtest...d/315465831.png
0

#6 User is offline   Scottyugs7 

  • Newbie
  • Pip
  • Group: New Member
  • Posts: 3
  • Joined: 12-October 11

  Posted 14 October 2011 - 05:58 PM

Bulldozer is boss. The quad-core AMD costs 115 bucks (FACT); the 4170 does like 4 GHz per core, with an 1866 MHz RAM controller and 12 MB cache.
Intel's fastest quad, the 2600, is 3.6 GHz with 1333 MHz RAM and 9 MB cache. The 8-core AMD model is 245 dollars. It's better in every way except that it doesn't have Hyper-Threading, which doesn't matter in single-thread performance. Who did these benchmarks, and how did they post them before final silicon was ever produced? Many of these benchmarks were posted a year ago and have been re-posted many times after someone points out that the numbers in them don't match what should be there. Quick Photoshop and they're back.
0

#7 User is offline   waldojim 

  • Elite
  • PipPipPipPipPipPipPipPip
  • Group: Members
  • Posts: 15,066
  • Joined: 29-October 08
  • Location:Texas

Posted 14 October 2011 - 06:21 PM

Scottyugs7, on 14 October 2011 - 05:58 PM, said:

Bulldozer is boss. The quad-core AMD costs 115 bucks (FACT); the 4170 does like 4 GHz per core, with an 1866 MHz RAM controller and 12 MB cache.
Intel's fastest quad, the 2600, is 3.6 GHz with 1333 MHz RAM and 9 MB cache. The 8-core AMD model is 245 dollars. It's better in every way except that it doesn't have Hyper-Threading, which doesn't matter in single-thread performance. Who did these benchmarks, and how did they post them before final silicon was ever produced? Many of these benchmarks were posted a year ago and have been re-posted many times after someone points out that the numbers in them don't match what should be there. Quick Photoshop and they're back.

Wow.. you really are clueless. The 4170 has FOUR - read that again, 4MB cache. NOT 12.
Also, the 4 core chip, is REALLY only 2. Two and a half if you really want to count the dual integer units per core. However, all the key components, are SHARED per module. Meaning ONE decode stage PER PAIR of 'cores', ONE Floating Point processor PER PAIR of 'cores'. In other words, this is the AMD equivalent to HyperThreading. Only, with this, Windows doesn't see the extra cores as 'virtual cores' and doesn't manage them correctly as a result.

AMD's processors also cannot keep up in work done PER CLOCK. The Intel i5 SMOKES the new ULTRA HIGH END 8150 from AMD. Rolls it up and smokes it!

The new AMD processors perform the WORST in comparison when in SINGLE THREADED applications. Actually performing WORSE than the AMD PHENOM II!

Now the question:

Do you actually believe the crap you post, or are you trolling?
0

#8 User is offline   BryanMeyers 

  • Newbie
  • Pip
  • Group: New Member
  • Posts: 5
  • Joined: 14-October 11

  Posted 14 October 2011 - 07:49 PM

@Waldojim: I am a computer engineer and can honestly say that you don't have a clue what you are talking about. One decode and issue stage does not have to equate to one instruction issued per cycle. In fact the bulldozer modules dispatch two instructions every cycle and can do up to 8 integer operations per cycle. In addition, the "shared" floating point unit allows for 2 - 128 bit floating point operations, 4 - 64 bit floating point operations, or 8 - 32-bit floating point calculations per cycle, as well as a mix of those numbers.
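A quick sanity check of those lane counts, as a minimal sketch: all three figures are consistent with a shared unit that handles 2 x 128 = 256 bits per cycle. The 256-bit total is inferred from the numbers in this post, not taken from AMD documentation, and `lanes_per_cycle` is a name made up for illustration.

```python
# Assumption: the shared FPU handles 2 x 128 bits per cycle, as stated in the post.
TOTAL_FP_BITS = 2 * 128


def lanes_per_cycle(operand_bits: int) -> int:
    """Number of operands of the given width that fit in the shared FP width."""
    if operand_bits <= 0 or TOTAL_FP_BITS % operand_bits != 0:
        raise ValueError("operand width must evenly divide the total width")
    return TOTAL_FP_BITS // operand_bits


for width in (128, 64, 32):
    print(f"{width}-bit operations per cycle: {lanes_per_cycle(width)}")
```

This only checks that the quoted figures agree with each other; it says nothing about how often real code keeps every lane busy.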

The real reason that Bulldozer did not stack up in the benchmarks is the compiler used for each of the benchmarks. All of these closed-source benchmarks are compiled on the standard Intel compiler with the Intel libraries. It is not optimized to support any instructions beyond SSE3 for any processor other than Intel chips. SSE4.1, SSE4.2, AVX, and FMA4 significantly increase the floating point performance of AMD processors, but are not used by code compiled on an Intel compiler.

If you look at the integer performance of the benchmarks, AMD almost always out-performs the intel chips and shows a 15-30% increase in performance over the Phenom II x6 processors. If the compiler used was completely optimized for both Intel and AMD, floating point performance would also show similar gains.

Lastly, under full load where all of the threads are being used, the Intel chip is not physically capable of beating the AMD chip. 4 cores that complete one instruction each per cycle cannot physically beat 8 cores completing 1 instruction each per cycle, when threads are continually running.
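The full-load argument reduces to cores x instructions per cycle x clock. Here is a rough sketch of that arithmetic; the clock and the per-core rates are placeholder values for illustration, not measurements, and `peak_instructions_per_second` is a hypothetical helper. It shows that the conclusion depends entirely on the per-core rate assumed.

```python
def peak_instructions_per_second(cores: int, ipc: float, clock_hz: float) -> float:
    """Theoretical upper bound; real workloads fall below it."""
    return cores * ipc * clock_hz


CLOCK_HZ = 3.6e9  # placeholder clock, applied to both chips so only cores and IPC differ

# Premise of the post: one instruction per cycle per core on both chips.
four_cores = peak_instructions_per_second(4, 1.0, CLOCK_HZ)
eight_cores = peak_instructions_per_second(8, 1.0, CLOCK_HZ)
print(eight_cores / four_cores)

# Hypothetical: if the 4-core chip sustained 2.5 instructions per cycle per core,
# its bound would exceed that of 8 cores at 1 instruction per cycle.
four_wide_cores = peak_instructions_per_second(4, 2.5, CLOCK_HZ)
print(four_wide_cores > eight_cores)
```

Under the one-instruction-per-cycle premise the 8-core bound is exactly double; change the per-core rate and the ordering can flip.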
0

#9 User is offline   waldojim 

  • Elite
  • PipPipPipPipPipPipPipPip
  • Group: Members
  • Posts: 15,066
  • Joined: 29-October 08
  • Location:Texas

Posted 14 October 2011 - 07:58 PM

BryanMeyers, on 14 October 2011 - 07:49 PM, said:

@Waldojim: I am a computer engineer and can honestly say that you don't have a clue what you are talking about. One decode and issue stage does not have to equate to one instruction issued per cycle. In fact the bulldozer modules dispatch two instructions every cycle and can do up to 8 integer operations per cycle. In addition, the "shared" floating point unit allows for 2 - 128 bit floating point operations, 4 - 64 bit floating point operations, or 8 - 32-bit floating point calculations per cycle, as well as a mix of those numbers.

Apparently not much of an engineer.
You have absolutely no farking clue.

Quote

The real reason that Bulldozer did not stack up in the benchmarks is the compiler used for each of the benchmarks. All of these closed-source benchmarks are compiled on the standard Intel compiler with the Intel libraries. It is not optimized to support any instructions beyond SSE3 for any processor other than Intel chips. SSE4.1, SSE4.2, AVX, and FMA4 significantly increase the floating point performance of AMD processors, but are not used by code compiled on an Intel compiler.

If you look at the integer performance of the benchmarks, AMD almost always out-performs the intel chips and shows a 15-30% increase in performance over the Phenom II x6 processors. If the compiler used was completely optimized for both Intel and AMD, floating point performance would also show similar gains.

Lastly, under full load where all of the threads are being used, the Intel chip is not physically capable of beating the AMD chip. 4 cores that complete one instruction each per cycle cannot physically beat 8 cores completing 1 instruction each per cycle, when threads are continually running.

AMD lost on every benchmark available, except light multithreading loads with very few floating point operations. Not because of compilers. Because of poor design choices.

Since you don't understand why those choices have such an impact, I suggest a little light reading that will give you a quick overview and some understanding. Anandtech does a surprisingly good job with that.

Also, since you have made it clear you have no concept at all. Intel processors since the Core 2 have all been capable of FOUR instructions per clock cycle PER CORE. Again, do the research. Don't bullshit. You will get caught.
0

#10 User is offline   BryanMeyers 

  • Newbie
  • Pip
  • Group: New Member
  • Posts: 5
  • Joined: 14-October 11

  Posted 14 October 2011 - 09:14 PM

I was referring to CISC instructions, not the so-called microOps that are dispatched for the RISC cores in the Core architecture. An Intel Core chip is able to dispatch 8 functional uOps + 3 memory uOps every cycle, if the buffers contain enough instructions.

While the RISC specifics of Bulldozer have not been released, in the K10 architecture used by Phenom, 9 functional operations and 4 memory operations could be dispatched each cycle.

Since a modified floating point unit, 4th integer pipeline, and second integer core were added to the Bulldozer module, I would expect to see 16 integer operations issued across 8 pipelines and 3-4 floating point operations per Bulldozer module.

Core issues 4 CISC instructions per cycle to the uOp schedulers. K10 issued 3 CISC instructions per cycle to it's compilers. the number of issues per cycle for Bulldozer has increased beyond the 3 CISC limitation of K10 to avoid starving the new pipelines.

I am not "bullshitting"; we were simply not referring to the same thing, and I was incorrect about how many issues per cycle were allotted to each design. I am well aware of the impact of each of these design choices, as my graduate work centered around the design of superscalar architectures with multiple issue schedulers.

As for the compiler issue, this is a known problem that has existed since the 386 days. Futuremark is compiled entirely using the Intel compiler and has always shown favoritism towards Intel. Open-source benchmarks compiled on GNU compilers show this bias very plainly. There is a reason that SPEC is still used in the industry as a benchmarking tool: It relies on unbiased compilers. Don't believe me? Do a quick google search for "Intel's Cripple AMD function"
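The dispatch behavior being described can be sketched roughly as follows. This is an illustrative model only, not Intel's actual code: the `Cpu` type, the path names, and both dispatcher functions are hypothetical, and the feature list is simplified. The two vendor strings are the ones the chips report through CPUID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    vendor: str          # CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    features: frozenset  # instruction-set extensions the chip reports

# Hypothetical optimized code paths, ordered from fastest to slowest.
PATHS = [("avx_path", "avx"), ("sse4_path", "sse4.2")]


def dispatch_by_features(cpu: Cpu) -> str:
    """Vendor-neutral model: pick the best path the chip says it supports."""
    for path, needed in PATHS:
        if needed in cpu.features:
            return path
    return "baseline_path"


def dispatch_by_vendor(cpu: Cpu) -> str:
    """Vendor-gated model: any non-Intel chip gets the baseline path."""
    if cpu.vendor != "GenuineIntel":
        return "baseline_path"
    return dispatch_by_features(cpu)


amd = Cpu("AuthenticAMD", frozenset({"sse4.2", "avx"}))
intel = Cpu("GenuineIntel", frozenset({"sse4.2", "avx"}))
print(dispatch_by_vendor(amd), dispatch_by_features(amd))
print(dispatch_by_vendor(intel), dispatch_by_features(intel))
```

In this model the two chips report identical features, yet only the vendor-gated dispatcher treats them differently, which is the complaint in a nutshell.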
0

#11 User is offline   BryanMeyers 

  • Newbie
  • Pip
  • Group: New Member
  • Posts: 5
  • Joined: 14-October 11

  Posted 14 October 2011 - 09:17 PM

"K10 issued 3 CISC instructions per cycle to it's compilers." I meant to say schedulers not compilers, sorry for typing too fast
0

#12 User is offline   waldojim 

  • Elite
  • PipPipPipPipPipPipPipPip
  • Group: Members
  • Posts: 15,066
  • Joined: 29-October 08
  • Location:Texas

Posted 15 October 2011 - 12:59 AM

BryanMeyers, on 14 October 2011 - 09:14 PM, said:

I was referring to CISC instructions, not the so-called microOps that are dispatched for the RISC cores in the Core architecture. An Intel Core chip is able to dispatch 8 functional uOps + 3 memory uOps every cycle, if the buffers contain enough instructions.

While the RISC specifics of Bulldozer have not been released, in the K10 architecture used by Phenom, 9 functional operations and 4 memory operations could be dispatched each cycle.

Since a modified floating point unit, 4th integer pipeline, and second integer core were added to the Bulldozer module, I would expect to see 16 integer operations issued across 8 pipelines and 3-4 floating point operations per Bulldozer module.

That is a great deal closer to the reality. In the end, though, expectations and end results are two different things. Now, ignore the benchmarks for a moment and break out the real-world applications. Looking at iTunes, you find that the new cores have terrible single-threaded performance dealing with audio re-compression; this is not going to be related to the compiler in any way. If that isn't enough, look at Handbrake, an open source application that we know isn't compiled with a compromised compiler. This is a very heavily multithreaded application. When re-compressing video with x264 (again, open source software), the processor tells us what it is really made of, coming in considerably slower than the old Phenom IIs.

Quote

Core issues 4 CISC instructions per cycle to the uOp schedulers. K10 issued 3 CISC instructions per cycle to it's compilers. the number of issues per cycle for Bulldozer has increased beyond the 3 CISC limitation of K10 to avoid starving the new pipelines.

I am not "bullshitting"; we were simply not referring to the same thing, and I was incorrect about how many issues per cycle were allotted to each design. I am well aware of the impact of each of these design choices, as my graduate work centered around the design of superscalar architectures with multiple issue schedulers.

As for the compiler issue, this is a known problem that has existed since the 386 days. Futuremark is compiled entirely using the Intel compiler and has always shown favoritism towards Intel. Open-source benchmarks compiled on GNU compilers show this bias very plainly. There is a reason that SPEC is still used in the industry as a benchmarking tool: It relies on unbiased compilers. Don't believe me? Do a quick google search for "Intel's Cripple AMD function"


Now, one last note regarding the compiler. Since I have seen this before. Intel was FORCED to remove the BIAS. The compiler is now designed to use the most optimum code path for "known" processors. This creates its own problems, of course, as they can sit on their butts as long as they want to before optimizing for a new CPU.

http://www.agner.org...g/read.php?i=49
0

#13 User is offline   BryanMeyers 

  • Newbie
  • Pip
  • Group: New Member
  • Posts: 5
  • Joined: 14-October 11

  Posted 15 October 2011 - 05:38 AM

I can't speak to the iTunes performance because I am not familiar with its underlying organization. But I would tend to think it would be more Intel-optimized because that is Apple's native platform. We often forget that iTunes is available on a PC as a courtesy and not because they want to.

As for Handbrake: Sandy Bridge has a built-in H.264 transcoder because of its onboard graphics. This is part of the reason that AMD has insisted that AMD graphics cards be used in the testing process. UVD 1 and higher have H.264 transcoding capabilities, and it has been demonstrated that even a mid-range AMD card (5750 or 6770) is capable of besting the Sandy Bridge chips without the aid of the processor. It is as important to test the platform as it is to test the chip itself. Had a 6990 been running in the AMD test systems, a significant change in performance would have been observed. Even though AMD supports NVIDIA hardware, it is designed to run best with its own products. Had the FX series been an APU and not just a CPU, these benchmarks would provide a much different picture as well.

As for Intel optimizing their compiler for other architectures, the compiler may have been fixed, but the libraries are still crippled. It is still an issue in the newer libraries and for new instruction sets like FMA4 and XOP. There may never be support in the Intel libraries for those extensions because they are AMD specific and will never be implemented in Intel architectures. That said, even GNU isn't capable of handling the new extensions.

That means every benchmark does not express the true performance of floating point operations on Bulldozer. FMA4 is critical to proper scheduling on the new floating point unit. I will also point back to the improved integer performance. Every benchmark I have seen shows that integer performance is up on the last generation. If separate graphs had been used for floating point and integer performance in the benchmark reviews, the 8150 would have consistently been on the top for integer performance in almost every benchmark.
0

#14 User is offline   waldojim 

  • Elite
  • PipPipPipPipPipPipPipPip
  • Group: Members
  • Posts: 15,066
  • Joined: 29-October 08
  • Location:Texas

Posted 15 October 2011 - 11:35 AM

BryanMeyers, on 15 October 2011 - 05:38 AM, said:

I can't speak to the iTunes performance because I am not familiar with its underlying organization. But I would tend to think it would be more Intel-optimized because that is Apple's native platform. We often forget that iTunes is available on a PC as a courtesy and not because they want to.

As for Handbrake: Sandy Bridge has a built-in H.264 transcoder because of its onboard graphics. This is part of the reason that AMD has insisted that AMD graphics cards be used in the testing process. UVD 1 and higher have H.264 transcoding capabilities, and it has been demonstrated that even a mid-range AMD card (5750 or 6770) is capable of besting the Sandy Bridge chips without the aid of the processor. It is as important to test the platform as it is to test the chip itself. Had a 6990 been running in the AMD test systems, a significant change in performance would have been observed. Even though AMD supports NVIDIA hardware, it is designed to run best with its own products. Had the FX series been an APU and not just a CPU, these benchmarks would provide a much different picture as well.

Intel Quick Sync is very finicky. I would have to double check the platform used in testing, but I can say, without a doubt, that if a 'P' series chipset is used, then Quick Sync is immediately disabled. The 'H' series supports Quick Sync if no other video card is used, and only the 'Z' series would really let them get away with using Quick Sync and a dedicated video card.

The real question, though: how did the new 8-core chip end up slower than last generation's lower-clocked 6-core chip?

Quote

As for Intel optimizing their compiler for other architectures, the compiler may have been fixed, but the libraries are still crippled. It is still an issue in the newer libraries and for new instruction sets like FMA4 and XOP. There may never be support in the Intel libraries for those extensions because they are AMD specific and will never be implemented in Intel architectures. That said, even GNU isn't capable of handling the new extensions.

That means every benchmark does not express the true performance of floating point operations on Bulldozer. FMA4 is critical to proper scheduling on the new floating point unit. I will also point back to the improved integer performance. Every benchmark I have seen shows that integer performance is up on the last generation. If separate graphs had been used for floating point and integer performance in the benchmark reviews, the 8150 would have consistently been on the top for integer performance in almost every benchmark.

Benchmarks maybe. Real world use, not hardly.

In the real world, the new processor has been all over the place. Very specific multi-threading scenarios work out quite well for AMD, getting up to i5 and almost i7 performance levels. The rest of the time, it lags anywhere from slow to what-the-hell-happened SLOW.

There is something terribly wrong with this chip design.
0

#15 User is offline   BryanMeyers 

  • Newbie
  • Pip
  • Group: New Member
  • Posts: 5
  • Joined: 14-October 11

  Posted 15 October 2011 - 12:34 PM

Some of the stuff I've been reading today also points to issues with the Windows 7 kernel. I can say with certainty that the scheduling for the x64 kernel was not optimized for systems with more than 4 cores, although it can support more. The kernel used for Server 2008 is a whole different bag of tricks, mind you.

I began to suspect this to be the case when the Phenom II x6 chips launched, especially when performance between Windows 7 and Ubuntu 11.04 for the SPEC benchmarks provided very different pictures. I can speak to the consistency of compilation between Windows and Ubuntu because Intel was benching the same on both sides according to SPEC. This was not the case with the x6 processor, which saw a near-linear increase in performance on Ubuntu but saw diminishing returns on Windows.
0

#16 User is offline   waldojim 

  • Elite
  • PipPipPipPipPipPipPipPip
  • Group: Members
  • Posts: 15,066
  • Joined: 29-October 08
  • Location:Texas

Posted 15 October 2011 - 02:17 PM

BryanMeyers, on 15 October 2011 - 12:34 PM, said:

Some of the stuff I've been reading today also points to issues with the Windows 7 kernel. I can say with certainty that the scheduling for the x64 kernel was not optimized for systems with more than 4 cores, although it can support more. The kernel used for Server 2008 is a whole different bag of tricks, mind you.

I began to suspect this to be the case when the Phenom II x6 chips launched, especially when performance between Windows 7 and Ubuntu 11.04 for the SPEC benchmarks provided very different pictures. I can speak to the consistency of compilation between Windows and Ubuntu because Intel was benching the same on both sides according to SPEC. This was not the case with the x6 processor, which saw a near-linear increase in performance on Ubuntu but saw diminishing returns on Windows.


The problem we are dealing with here, more than anything else, is how the cores represent themselves to Windows. Windows 7 uses core parking, and will attempt to load up 'real' cores before the 'virtual' cores. AMD chose to tell Windows it has 8 full cores on a system with only 4 complete cores. So Windows loads up 1, 2, and 3 on a lightly multi-threaded application, where the peak performance would be realized on 1, 3, and 5 OR 2, 4, and 6. That is where most of the multi-threading performance hit is taken. EG - Prime 95. Prime 95 creates multiple, independent, single core floating point workloads that will quickly and easily show the limitations of the new AMD processors. Interestingly, while Prime would be a perfect example, I have yet to see anyone test with Prime.
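The placement problem described above can be modeled in a few lines. This sketch assumes logical cores are numbered 1 to 8 with consecutive pairs (1 and 2, 3 and 4, and so on) sharing a module, as the post describes; `module_of` and `threads_sharing_a_module` are hypothetical helpers, and this is not how the Windows scheduler is implemented.

```python
from collections import Counter
from typing import Iterable

CORES_PER_MODULE = 2  # assumption from the post: two integer cores share one module


def module_of(core: int) -> int:
    """Map a 1-based core number to a 0-based module index."""
    return (core - 1) // CORES_PER_MODULE


def threads_sharing_a_module(cores: Iterable[int]) -> int:
    """Count threads placed on a module that also hosts another thread."""
    per_module = Counter(module_of(core) for core in cores)
    return sum(count for count in per_module.values() if count > 1)


for placement in ([1, 2, 3], [1, 3, 5], [2, 4, 6]):
    print(placement, "threads contending for shared units:",
          threads_sharing_a_module(placement))
```

Placing three threads on cores 1, 2, and 3 leaves two of them competing for one module's shared front end and FPU, while 1, 3, 5 or 2, 4, 6 gives each thread a module to itself.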
0

#17 User is offline   karthiq 

  • Expert
  • PipPipPipPipPipPip
  • Group: Members
  • Posts: 1,331
  • Joined: 04-August 10

  Posted 15 October 2011 - 11:10 PM

I didn't know PC processors alone cost hundreds of dollars... I am not much into hardware :)

No wonder PC makers are struggling with their profit margins and companies like HP are trying to sell off their PC business.

And with Windows 8 not necessarily requiring people to buy a new PC, unlike Windows 7, the industry has a very bleak future.
0

#18 User is offline   Adame2qf 

  • Newbie
  • Pip
  • Group: New Member
  • Posts: 3
  • Joined: 12-October 11

  Posted 16 October 2011 - 05:43 AM

Who still uses iTunes for compression? I cannot remember the last time I bought a CD, or borrowed one from the library.
0

#19 User is offline   waldojim 

  • Elite
  • PipPipPipPipPipPipPipPip
  • Group: Members
  • Posts: 15,066
  • Joined: 29-October 08
  • Location:Texas

Posted 16 October 2011 - 12:10 PM

Adame2qf, on 16 October 2011 - 05:43 AM, said:

Who still uses iTunes for compression? I cannot remember the last time I bought a CD, or borrowed one from the library.


People like me, who still prefer quality over quantity. I rip using the absolute highest quality settings available. I do not buy 160 Kb/s audio because it sounds like garbage.
0

#20 User is offline   xyberviri 

  • Senior Member
  • PipPipPipPipPip
  • Group: Members
  • Posts: 662
  • Joined: 15-March 10

  Posted 17 October 2011 - 08:04 AM

Until multi-threaded, multi-core architecture becomes mandatory at the application level or natively supported at the kernel level of the operating system, these advancements mean little.

Speaking specifically as a gamer: most games out there aren't multi-core optimized. They say they are, but that just means picking the core with the lowest overhead.

Having 3,000 cores at 1 MHz isn't the same as having 1 core at 3.0 GHz.
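One way to make that concrete is Amdahl's law: only the parallel part of a job benefits from extra cores. A rough sketch, assuming a job measured in clock cycles with a fixed serial fraction; the job size and the 10% serial fraction are made-up illustration values, and `run_time_seconds` is a hypothetical helper.

```python
def run_time_seconds(total_cycles: float, serial_fraction: float,
                     cores: int, clock_hz: float) -> float:
    """Serial part runs on one core; the rest is split evenly across all cores."""
    serial = total_cycles * serial_fraction
    parallel = total_cycles * (1.0 - serial_fraction)
    return (serial + parallel / cores) / clock_hz


JOB_CYCLES = 3.0e9      # made-up job size
SERIAL_FRACTION = 0.10  # made-up: 10% of the work cannot be parallelized

many_slow = run_time_seconds(JOB_CYCLES, SERIAL_FRACTION, cores=3000, clock_hz=1.0e6)
one_fast = run_time_seconds(JOB_CYCLES, SERIAL_FRACTION, cores=1, clock_hz=3.0e9)
print(f"3000 cores at 1 MHz: {many_slow:.1f} s")
print(f"1 core at 3 GHz:     {one_fast:.1f} s")
```

With these numbers the many-core setup takes about 301 seconds against about 1 second for the single fast core, even though the combined clock rate is the same. Only with a serial fraction of zero do the two tie.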
0
