AMD's Ryzen CPUs (Ryzen/TR/Epyc) & Vega/Polaris GPUs

Discussion in 'Hardware Components and Aftermarket Upgrades' started by Rage Set, Dec 14, 2016.

  1. hmscott

    hmscott Notebook Nobel Laureate

    Reputations:
    6,102
    Messages:
    19,270
    Likes Received:
    23,984
    Trophy Points:
    931
    Ryzen 3000-Series EXPLAINED! Time For Great AMD Notebooks?
    HardwareCanucks
    Published on Apr 9, 2019
    The AMD Ryzen 3000 series will be broken into two very different architectures: one for the desktop (Zen 2) and another for notebooks and Chromebooks (Zen+). Let's go over some of the differences with the AMD mobile / notebook processors first before getting into the upcoming desktop series sometime in the near future.


    andi klein 1 month ago
    "Hope, that Lenovo brings out a premium AMD Ryzen-H Thinkpad."

    Jake Ross 1 month ago
    "The AMD notebooks we already have would be great, if the cooling were better. Current Ryzen APUs are pretty stellar. It's a shame the industry has gotten so thin and weak."

    WiseSilverWolf 2 days ago
    "Ryzen 3000 is taking too long :( Navi is taking too long :( it feels like AMD is moving at a snails pace for new releases."
     
    custom90gt likes this.
  2. ajc9988

    ajc9988 Death by a thousand paper cuts

    Reputations:
    1,442
    Messages:
    5,462
    Likes Received:
    7,769
    Trophy Points:
    681
    This is a good question, but it takes some nuance to explain.

    First, let's start by examining why the WX Threadripper series has memory issues, both the claims and the truth. Then, we will examine the structure of the upcoming chips related to the memory controller and CPU core dies. Then, we shall finish up by addressing the rumors and why they are misinterpreting the data.

    To start, Threadripper WX was designed with the memory controllers on two of the four dies shut off. That meant that every time a core on one of those dies needed a memory call, the request had to go over the Infinity Fabric (IF) to a die with a controller, then out to memory, and then make the return trip. So you had two IF crossings as well as the memory access. At first blush, this seems roughly like one of the two dies with memory controllers using the controller on the other die (and for the most part, it is). But it also means that the "average" memory latency is higher than on chips where every die has direct memory access.
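
    To put a rough shape on the "average" point above, here is a minimal sketch. The latency figures are hypothetical placeholders (not measurements), and the model assumes every die issues the same number of requests and that a die with its own controller never has to reach for the other one, which is a simplification.

```python
# Toy model of average memory latency on a four-die package.
# All latency figures are hypothetical placeholders, not measurements.

LOCAL_NS = 65.0    # assumed: die with its own memory controller
REMOTE_NS = 105.0  # assumed: request and reply each cross the fabric once


def average_latency(dies_with_controller, total_dies=4,
                    local_ns=LOCAL_NS, remote_ns=REMOTE_NS):
    """Mean latency across dies, assuming each die issues the same
    number of requests and dies with a controller always use their own."""
    if not 0 < dies_with_controller <= total_dies:
        raise ValueError("need between 1 and total_dies controllers")
    remote_dies = total_dies - dies_with_controller
    total = dies_with_controller * local_ns + remote_dies * remote_ns
    return total / total_dies
```

    With these made-up numbers, average_latency(4) stays at the local figure, while average_latency(2) lands halfway between local and remote. The exact values don't matter; the point is only that shutting off controllers pulls the package-wide average up.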

    Now, the first claim was that these chips did not have enough memory bandwidth to support 24 and 32 cores. This is true to a degree, but not entirely. Later experimentation found that a 32-core Epyc 7551P, which has eight memory channels, exhibited the exact same behavior as its quad-channel Threadripper counterpart. So although there is a chance that the CPU would benefit from more memory bandwidth per core, that wasn't the cause of the problem.

    So, what was causing the problem? First I will cover a less discussed issue which I feel is a problem but has not been confirmed; then I will cover the actual problem that Level1Techs and Anandtech addressed with CorePrio and the scheduler.

    My hypothesis is that there was an issue of stale data. What do I mean by that? Well, for cross-core communications, each CCX is connected to its mirror CCX. So, if a core needed data from a core that was not on its mirror CCX, the request would have to jump to the corresponding CCX, then use the Infinity Fabric to go to the other CCX, then make the round trip. That is four Infinity Fabric crossings, resulting in core-to-core communication in the area of 180-230ns. Considering the speed of cycles, plus the memory jump it might then need before something could complete its calculation, there are times when, by the time it was done, the data it was working on was no longer needed. The data then has to be retired, etc.

    Now, the effects of stale data can be combated with a good scheduler with node awareness. And this is where the confirmed problem lies (although it should be noted that AMD said this is not the whole of the problem, but is related). What was found is that Microsoft's scheduler has a problem: it wants to move all the processing onto the main core. Because it is constantly moving threads to core 0, you get thread thrashing. It inefficiently keeps moving tasks from other threads, and sometimes other dies, back to core 0. This causes a slowdown in processing that destroys efficiency.

    But the problem doesn't stop there. Microsoft's scheduler does not have good NUMA node awareness. What I mean by that is the scheduler is designed more for 2P systems and was modified for an old Intel chip that had a shared memory controller but two core dies on one package. So Microsoft's scheduler was made to overflow one node, meaning that if one set of core dies is filled, it can schedule over onto another node. That seems fine until you realize that there are four nodes on this one chip. Because of that, the scheduler could not properly schedule tasks on a chip with four dies. To address this, AMD has Dynamic Local Mode, cutting the cores in half, and there is CorePrio to try to reset the thread scheduler to help.
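
    As a rough illustration of what node awareness buys you, here is a toy sketch comparing two placement policies. To be clear, this is not how the Windows scheduler, Dynamic Local Mode, or CorePrio actually work; the 4-node by 8-core topology and every function name are made up for the example, and it assumes all nodes have the same number of cores.

```python
# Toy comparison of two thread placement policies on a multi-node chip.
# The topology (4 nodes x 8 cores) is a hypothetical example.

def build_topology(nodes=4, cores_per_node=8):
    """Map node id -> list of core ids (equal-sized nodes assumed)."""
    return {n: list(range(n * cores_per_node, (n + 1) * cores_per_node))
            for n in range(nodes)}


def place_node_first(num_threads, topology):
    """Fill one node completely before spilling onto the next."""
    cores = [c for node in sorted(topology) for c in topology[node]]
    if num_threads > len(cores):
        raise ValueError("more threads than cores")
    return cores[:num_threads]


def place_round_robin(num_threads, topology):
    """Deal threads across nodes one at a time, ignoring locality."""
    total = sum(len(cores) for cores in topology.values())
    if num_threads > total:
        raise ValueError("more threads than cores")
    order = sorted(topology)
    placement = []
    for i in range(num_threads):
        node = order[i % len(order)]
        placement.append(topology[node][i // len(order)])
    return placement


def nodes_used(placement, topology):
    """Count how many distinct nodes a placement touches."""
    owner = {c: n for n, cores in topology.items() for c in cores}
    return len({owner[c] for c in placement})
```

    For eight threads that share data, the node-first policy keeps them all on one node, while the locality-blind policy spreads them over all four, so every bit of sharing has to cross the fabric.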

    Now that we have discussed the issue in more depth, let's look at what the new chips bring to the table. First, the memory controllers moved off the core dies and onto the I/O die. That means every memory call will have to go over Infinity Fabric. Second, they improved the memory controller and can bin the I/O dies, which means that higher in the stack they are likely to have better memory controllers, and the same goes for core dies. Third, they doubled the bandwidth of IF with gen 2, along with lowering the IF2 latency. Fourth, on Epyc chips, they standardized the latency more between the chips, which means certain latencies will be higher and other latencies will be lower, but you will have fewer issues with stale data. They also worked on the retire pipeline on Zen 2 (the standardized latency and the changes in prediction and store/retire pipelines point to addressing my critique on stale data above). There are still questions about whether core dies communicate with each other directly or whether all traffic travels through the I/O die. We should know more on that either in a week OR we may have to wait for Hot Chips in August to get the nitty-gritty details on it (please, AMD, record and put that keynote on the AMD YouTube page!!!).

    So, now to the rumors and leaks. Because you have to go over IF for EVERY memory call, people assume the additive latency of a memory call will be higher than that of the first gen and second gen Ryzen chips. It is a logical assumption since you have to go over both to get the data. What they miss is that with the lower latency IF2 and the improved memory controller, the latency does not necessarily have to be higher, and could be lower.

    AdoredTV a while ago analyzed a sample on UserBenchmark which had low latency out to 16MB (showing that the L3 cache was larger), but which also showed that with a single channel filled, the latency was around 100ns, which is higher than first and second gen Ryzen. With the recent leak of the 12-core in a dual-channel memory configuration, using memory which seemed to be at stock around 2666, we saw a latency of 80ns, which is roughly in line with first and second gen Ryzen memory latency until memory is overclocked and timings are tightened (my latency can be between 58ns and 68ns, depending on timings and settings on my 1950X). Now, the new chips will also allow IF2 to run at half the memory's single-rate clock (so 2666 memory would be 1333 single rate rather than the doubled figure, and IF2 would run at half of that). With slower speeds, you lower bandwidth and increase latency. But because of this option, you can now clock your memory higher, like 4000MHz. So you would be lowering the real latency in ns of the memory, but increasing the latency of IF2. So the question is, on balance, will the faster RAM offset the latency increase on IF2? Either way, it means that IF2 will not hold back the memory clocks.
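
    The clock arithmetic above can be sketched in a few lines. The only hard fact used is that DDR memory transfers twice per clock, so the real clock is half the rated figure; the 2:1 divider and the CL values below are illustrative assumptions, and CAS latency is only one slice of the total round-trip latency a benchmark reports.

```python
# Back-of-the-envelope DDR4 / fabric clock arithmetic.
# The 2:1 divider and the CL values are illustrative assumptions.

def memory_clock_mhz(data_rate_mts):
    """DDR transfers twice per clock, so the real clock is half the rating."""
    return data_rate_mts / 2


def fabric_clock_mhz(data_rate_mts, divider=1):
    """Fabric clock when it runs at the memory clock divided by `divider`."""
    return memory_clock_mhz(data_rate_mts) / divider


def cas_latency_ns(data_rate_mts, cas_cycles):
    """CAS latency in nanoseconds: cycles divided by the clock in GHz."""
    return cas_cycles * 1000 / memory_clock_mhz(data_rate_mts)
```

    So 2666 memory has a 1333MHz real clock, and a halved fabric would sit around 666MHz, while 4000 memory with a halved fabric would sit at 1000MHz. With a hypothetical CL16 at 2666 versus CL18 at 4000, the CAS portion drops from about 12ns to 9ns, which is the "faster RAM" side of the trade-off; what the divider costs on the fabric side is the part we need hard data for.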

    To be frank, the assumption that latency on Zen 2 will increase is based on a logical deduction, but one made without examining the features of the new processors which cut against the increased latency hypothesis. Because of that, don't worry about it until we have hard data in hand, which will be between 1 and 6 weeks from now.
     
    Deks likes this.
  3. TANWare

    TANWare Just This Side of Senile, I think. Super Moderator

    Reputations:
    2,421
    Messages:
    9,178
    Likes Received:
    4,439
    Trophy Points:
    431
    You have to remember too that if IF is halved to allow 5GHz memory, it will increase latency as well. As far as waiting, well, we need the hardware and some analysis time. It very well could end up that TR3 has even worse latency issues, maybe not on a 32 core dual CCX but on any true quad CCX (another reason for delays?).

    I honestly think the Zen2 will not be bringing the shovel to Intel's grave as many hoped. It hopefully will be an improvement but I do not see it being a major game changer. Now Epyc and the extreme amount of cores, that could very well be a game changer.
     
    ajc9988, tilleroftheearth and hmscott like this.
  4. hmscott

    hmscott Notebook Nobel Laureate

    IDK if AMD is thoughtless enough to "increase" latency beyond reasonable / acceptable limits as part of improving memory performance - I'm sure AMD is hyper-sensitive to latency issues given the Ryzen 1/1+ bugaboo with latency being the limiting factor for gaming performance.

    Also, the only reason I can think of for Threadripper being delayed is the reason AMD has already stated as being a limiting factor for 7nm production - AMD has to allocate products from the same 7nm production pool, so some things have to come before others.

    Threadripper releases have already lagged behind the Ryzen CPU releases in the last 2 generations, so it makes sense Threadripper 3 will "lag" behind Ryzen 3 as well.

    Except for AMD trimming Threadripper 3 off the end of the product release map, nothing has been officially said about Threadripper 3, so maybe we should just be patient and let AMD work their magic?

    As for me, I'm hoping for this kind of magic with Navi+Ryzen 3 to continue to unfold: AMD multi-track drifting through Intel and Nvidia - poof their gone.JPG
     
    Last edited: May 27, 2019
    ajc9988 likes this.
  5. Papusan

    Papusan JOKEBOOK's Sucks! Dont waste your $$$ on FILTHY

    Reputations:
    20,687
    Messages:
    22,471
    Likes Received:
    38,530
    Trophy Points:
    931
  6. hmscott

    hmscott Notebook Nobel Laureate

  7. TANWare

    TANWare Just This Side of Senile, I think. Super Moderator

    AMD probably knows how much the new Epyc will be in demand. Where before it only took a small share of the server market, it may soon command a much larger stake. This will mean a lot of CPUs at 4 CCX per CPU. Just thinking of the few supercomputers announced, the fabs will have to be running fast and furiously.

    Whenever you halve the speed, there is the latency added by the step-down itself and then the added latency of the slower access speed after the step-down. It is inevitable, but hopefully the higher RAM speed would help somewhat.
     
    ajc9988 and hmscott like this.
  8. bennyg

    bennyg Notebook Virtuoso

    Reputations:
    1,433
    Messages:
    2,226
    Likes Received:
    2,191
    Trophy Points:
    181
    Oooooof shots fired, 16 core Zen2 @ 4.2GHz scores 4,278 in R15!!

    So says Jim from AdoredTV (claims to have a screenshot, but doesn't show it)

     
  9. Talon

    Talon Notebook Virtuoso

    Reputations:
    1,129
    Messages:
    3,114
    Likes Received:
    3,540
    Trophy Points:
    331
    ajc9988 likes this.
  10. Papusan

    Papusan JOKEBOOK's Sucks! Dont waste your $$$ on FILTHY

    Rumored AMD Navi Radeon RX 3080 Branding Is Risky And Ridiculous Hothardware.com | May 21, 2019

    One of the more recent rumors zeroed in on a $330 price tag for the alleged Radeon RX 3080 XT with performance that would match the NVIDIA GeForce RTX 2070. Considering that the GeForce RTX 2070 retails for around $500, that would make for a powerhouse performance bargain. But reality has sunken in and there's no way that AMD could possibly undercut NVIDIA by that much on its new 7nm process node... even if it was having remarkable success with initial yields.
     
    Zymphad likes this.