Hello you fine Internet folks, for today we have a video and an article for y’all.
Unlike our prior Granite Rapids coverage, where we just had a video, we have had hands-on time with Turin, specifically the AMD EPYC 9575F, thanks to Jordan from StorageReview.
This article is going to be a little different from our usual. It’s going to be shorter than usual because we have already covered the Zen 5 core both in mobile and in desktop and the differences between them, so this article will be focused on the memory subsystem changes that Turin has.
Serve the Home has an excellent article that has the slides that AMD has put out for the launch of Turin. But because we have our own data, I thought that our data would be more interesting to dive into.
First, looking at the 1T results, we see that the 9575F can pull around 52 GB/s of memory read bandwidth, 48 GB/s of memory write bandwidth, and 95 GB/s of memory add (Read-Modify-Write) bandwidth.
And looking at the results for how much memory bandwidth a single CCD can get, we can see that a single core can use just under half the total CCD memory read bandwidth, about 55% of the total CCD memory write bandwidth, and over two-thirds of the total CCD memory add bandwidth.
Looking a bit closer at these results, you’ll notice that the 9575F has significantly higher bandwidth to a CCD compared to the desktop Zen 5 parts. The reason for this is that the 9575F has GMI3-W, which means that it has 2 GMI links to the IO die instead of the single GMI link that the 9950X gets.
And this is not the only change to the GMI links on server. The GMI write link is now 32B per GMI link instead of the 16B per GMI link that you’d see on desktop Zen 5.
Before moving to the full socket memory performance for the 9575F, let’s clear up the memory speeds that Turin supports. Turin has 12 channels of memory that can run up to DDR5-6400MT/s, however 6400MT/s is only supported on specific validated systems and only for 1 DIMM per channel.
The system we had access to was running 6000MT/s for its memory, and DDR5-6000 MT/s is what most systems will support in a 1 DIMM per channel configuration. Should you want to run 2 DIMMs per channel, then your memory speeds drop to 4400 MT/s; and if you run 1 DIMM per channel in a motherboard with 2 DIMM slots per channel, then expect 5200 MT/s for your memory speed.
Now, actually diving into the memory bandwidth of the full 9575F, we see that we can get nearly 99% of the theoretical 576 GB/s of memory bandwidth using reads. Writes and adds are still an impressive 435 GB/s and 453 GB/s respectively.
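For anyone wondering where the 576 GB/s figure comes from, here is a minimal sketch of the arithmetic. It assumes each DDR5 channel moves 8 bytes (64 data bits) per transfer and that 1 GB is 10^9 bytes; the function names are made up for illustration and are not part of our test suite.

```python
# Theoretical peak DRAM bandwidth = channels * transfers per second * bytes per transfer.
CHANNELS = 12            # Turin memory channels, per the article
BYTES_PER_TRANSFER = 8   # assumption: 64 data bits per DDR5 channel


def peak_bandwidth_gbs(mt_per_s, channels=CHANNELS):
    """Return theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    # MT/s * bytes gives MB/s per channel; divide by 1000 for GB/s.
    return channels * mt_per_s * BYTES_PER_TRANSFER / 1000


def efficiency(measured_gbs, mt_per_s):
    """Fraction of the theoretical peak that a measured result achieves."""
    return measured_gbs / peak_bandwidth_gbs(mt_per_s)


if __name__ == "__main__":
    # Memory speeds mentioned in the article for the various DIMM configurations.
    for speed in (6400, 6000, 5200, 4400):
        print(f"DDR5-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s peak")
    # Measured write and add results on the DDR5-6000 test system.
    for name, measured in (("write", 435), ("add", 453)):
        print(f"{name}: {efficiency(measured, 6000):.1%} of peak")
```

Running the same function with the 2 DIMM per channel speed shows how much theoretical bandwidth you give up in exchange for capacity.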
We also tested the socket to socket bandwidth on AMD’s Volcano Platform, which only has 3 GMI links between the two 9575Fs.
These results are very similar to our Bergamo results, which isn’t surprising because that system also had the same 3 GMI link setup.
Moving to memory latency, we see that Turin’s unloaded memory latency is very similar to Genoa’s unloaded memory latency.
At Hot Chips 2024, Ampere Computing showed a graph demonstrating the loaded memory latency of an AmpereOne chip and AMD’s Genoa CPU. Now Chester wanted to produce a test similar to this, so he made a loaded latency test.
The way that this test works is that it runs our memory bandwidth benchmark on either 7 cores on a CCD or 7 CCDs on the 9575F. This ensures that the IOD to CCD link or the whole memory system is fully loaded. Then, with the 8th core or 8th CCD, we run the memory latency test to see what the latency of the fully loaded system is.
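To make the two configurations concrete, here is a rough sketch of how the cores could be divided up between the bandwidth load and the latency probe. This is not Chester's actual test code: the function name is hypothetical, it assumes core IDs are numbered contiguously within each CCD (real numbering should be read from the OS topology), and it assumes every core on the 7 loaded CCDs runs the bandwidth test in the whole-system case.

```python
def loaded_latency_plan(mode, ccds=8, cores_per_ccd=8):
    """Return (bandwidth_cores, latency_core) for a loaded latency run.

    mode="ccd":    load 7 cores of one CCD, probe latency from its 8th core.
    mode="socket": load every core on 7 CCDs, probe from a core on the 8th CCD.
    Assumption: core IDs are contiguous per CCD (cores 0-7 on CCD 0, and so on).
    """
    if mode == "ccd":
        load = list(range(cores_per_ccd - 1))   # first 7 cores of CCD 0
        probe = cores_per_ccd - 1               # last core of CCD 0
    elif mode == "socket":
        first_free = (ccds - 1) * cores_per_ccd
        load = list(range(first_free))          # all cores on CCDs 0 through 6
        probe = first_free                      # first core of the last CCD
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return load, probe


if __name__ == "__main__":
    for mode in ("ccd", "socket"):
        load, probe = loaded_latency_plan(mode)
        print(f"{mode}: {len(load)} bandwidth cores, latency probe on core {probe}")
```

In a real harness, each ID in the first list would get a pinned bandwidth thread and the probe core would run the pointer-chasing latency test once the load has ramped up.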
When a single CCD is loaded up on the 9575F, we see about a 39 nanosecond increase between the unloaded and loaded latency.
When the whole system is loaded, we see about a 31 nanosecond increase between unloaded and loaded system latency.
Looking at the single CCD results compared to the fully loaded system results, the 9575F has very similar memory latency behavior regardless of whether a single CCD is loaded or the whole system is loaded.
And lastly, everyone’s favorite graph, the core to core latency graph.
For convenience, the list below has the numbers from the chart because a chart this large can be difficult to read.
This is a latency increase compared to Genoa, especially within a CCD.
Now, a note about the clock speeds we saw with the 9575F. All 64 cores could hit up to 5GHz in single threaded tests. This is quite impressive on its own, but we were also able to get all 8 cores in a CCD to run at 5GHz in our memory bandwidth testing.
And with all 128 threads chugging away on Cinebench 2024, we saw the 9575F sticking around the 4.3GHz range. Wendell from Level1Techs saw about 4.9GHz all core on a web server/TLS transaction workload, which is a less vectorized workload.
Realistically, AMD’s Turin is the generational update you’d normally expect. Not only does AMD have high core count SKUs (9755, 9965), which the hyperscalers will be picking up, they now also have lower core count, very high frequency SKUs (9575F) which the traditional enterprise market will appreciate. Apparently we now think 64 cores is ‘low core count’. What a world we live in.
Turin isn’t the step-function revolution that Naples to Rome was; it’s more akin to the evolution we saw with Milan to Genoa, which was a memory bandwidth increase, a core count increase, and a core update. Nonetheless, this generation is set to excite a lot of people, as there’s lots of value here in a very competitive ecosystem.
If you like our articles and journalism, and you want to support us in our endeavors, then consider heading over to our Patreon or our PayPal if you want to toss a few bucks our way. If you would like to talk with the Chips and Cheese staff and the people behind the scenes, then consider joining our Discord.