Ryzen vs i7 (Mainstream); Threadripper vs i9 (HEDT); X299 vs X399; Xeon vs Epyc

Discussion in 'Hardware Components and Aftermarket Upgrades' started by ajc9988, Jun 7, 2017.

  1. hmscott

    hmscott Notebook Nobel Laureate

    Reputations:
    6,592
    Messages:
    19,933
    Likes Received:
    24,745
    Trophy Points:
    931
    Yeah, I was hoping to redirect the "Forget Insecure Intel, Let's all buy AMD CPU's!" OT posts out of that thread and bring them over here where it's on topic...

    I might as well point out the NetCAT news:

    CPU Vulnerabilities, Meltdown and Spectre, Kernel Page Table Isolation Patches, and more
    http://forum.notebookreview.com/thr...atches-and-more.812424/page-128#post-10950250
     
    Last edited: Sep 13, 2019
    ajc9988 likes this.
  2. ole!!!

    ole!!! Notebook Prophet

    Reputations:
    2,184
    Messages:
    5,697
    Likes Received:
    3,610
    Trophy Points:
    431
    what is the memory penalty of having quad channel vs dual channel on your current TR system, do you know?

    also, is it possible for you to go 3 channels instead?
     
  3. ajc9988

    ajc9988 Death by a thousand paper cuts

    Reputations:
    1,526
    Messages:
    5,688
    Likes Received:
    8,028
    Trophy Points:
    681
    The penalty can be very severe in some cases. Try changing the interleaving on a Threadripper so it only uses half the memory bandwidth, then run AES. It cuts performance drastically. I even told you encryption was part of my interest and showed my SiSoft benchmarks on it. So go on. You aren't worth dealing with. I've tried multiple times to explain things to you.

    And 3 channels was used LONG ago. Look up old 6-DIMM triple-channel motherboards. That is why Intel currently uses six channels on its big-socket server chips.
     
    Last edited by a moderator: Sep 14, 2019
  4. ole!!!

    ole!!! Notebook Prophet

    Reputations:
    2,184
    Messages:
    5,697
    Likes Received:
    3,610
    Trophy Points:
    431
    I forgot about the NUMA stuff with first- and second-gen TR due to its design. I'm more looking for the memory penalty of dual channel vs quad channel, like on Intel's HEDT systems and, in this case, TR3000.

    TR3000 will have a similar chiplet design to the 3900X, which means higher memory latency by default compared to Zen+. Going quad channel means more latency than dual channel, but I want to find out how much of a penalty there is with TR3000 quad vs dual and Intel HEDT quad vs dual.
     
    Last edited by a moderator: Sep 14, 2019
  5. ajc9988

    ajc9988 Death by a thousand paper cuts

    Reputations:
    1,526
    Messages:
    5,688
    Likes Received:
    8,028
    Trophy Points:
    681
    So you will ignore overall performance in favor of a figure that is divorced from it (latency), while not understanding the impact of the huge L3 cache, nor the fact that bandwidth impact is task dependent and about keeping the cache fed. If you want to know, test it yourself. As I've said, I have tried to explain the importance of the different subsystems and their interplay to you before. The fact that you now don't understand what I've explained to you before, while asking a question so ambiguous that I cannot answer it because you haven't given all the necessary information, suggests you have ulterior motives here.
     
  6. ole!!!

    ole!!! Notebook Prophet

    Reputations:
    2,184
    Messages:
    5,697
    Likes Received:
    3,610
    Trophy Points:
    431
    Let's not put words in my mouth. Regardless of the architectural penalty and who gets the better advantage, I simply wish to know what the penalty is going from 2 channels to 4 channels on both Intel and AMD systems. This is why @tilleroftheearth and I have such a hard time talking to you: you keep making strawman arguments.

    I asked you because I know you have a TR1 system, and I'm asking your assistance to check memory latency on your quad channel setup, then test it in dual channel to see the difference. If you don't know or don't wish to answer, just say so and I'll find out myself or ask others. Stop accusing others of things they haven't done or might do, stop putting words in others' mouths, stop overthinking it, and sometimes take what we ask and say at face value; it'll help you a lot.
     
    tilleroftheearth likes this.
  7. ajc9988

    ajc9988 Death by a thousand paper cuts

    Reputations:
    1,526
    Messages:
    5,688
    Likes Received:
    8,028
    Trophy Points:
    681
    Go look up my AES scores in Geekbench 5. There is NO penalty going to quad channel, but tons of penalty going to dual channel. The specific AES variant in that test is the one implemented in BitLocker, among other programs. You'll see scores of around 12.5 GB/s and around 25 GB/s. In fact, their implementation this time lowered my memory bandwidth for that test: GB3 showed 33 GB/s, GB4 was 30 GB/s, and GB5 is 25 GB/s, while the 7980XE is getting 33 GB/s without having RAM as fast as my system's. But that can also be affected by software optimizations. As I said, whether bandwidth is utilized depends on the task, but how well memory calls keep the cache fed is also driver- and software-optimization related.
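    If anyone wants to sanity-check that behavior outside of Geekbench, here is a minimal sketch of the kind of AES-XTS throughput probe you could run yourself (it is not the Geekbench workload, just a rough check; it assumes Python with the third-party cryptography package installed). Run it once per memory configuration and compare the aggregate number:

```python
# Rough AES-XTS throughput probe (XTS being the mode BitLocker and the GB5 AES test use).
# Not the Geekbench workload itself, just a way to see if the memory config moves the needle.
# Assumes: pip install cryptography
import multiprocessing as mp
import os
import time

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

CHUNK = os.urandom(4 * 1024 * 1024)   # 4 MiB "sector", encrypted over and over
ITERS = 128                           # 512 MiB of traffic per worker

def encrypt_pass(_):
    key, tweak = os.urandom(64), os.urandom(16)   # AES-256-XTS: 512-bit key, 128-bit tweak
    for _ in range(ITERS):
        # fresh encryptor per chunk, mimicking per-sector disk encryption
        Cipher(algorithms.AES(key), modes.XTS(tweak)).encryptor().update(CHUNK)
    return len(CHUNK) * ITERS

if __name__ == "__main__":
    workers = os.cpu_count()          # one worker per logical CPU to stress bandwidth
    start = time.perf_counter()
    with mp.Pool(workers) as pool:
        total = sum(pool.map(encrypt_pass, range(workers)))
    elapsed = time.perf_counter() - start
    print(f"~{total / elapsed / 1e9:.1f} GB/s aggregate across {workers} workers")
    # Re-run after changing channel count / interleaving in the BIOS and compare.
```

    A single worker won't show much of a difference (one core can't come close to saturating even dual channel); it's the all-core aggregate where the channel count starts to matter.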

    Now if you are talking non-linear scaling, referring to rank stress on the memory controller, etc., then that is a deeper discussion.

    But since I already told you that you didn't give enough information for me to answer, and you want me to guess what you're looking for, go find it yourself!
     
    ole!!! likes this.
  8. ole!!!

    ole!!! Notebook Prophet

    Reputations:
    2,184
    Messages:
    5,697
    Likes Received:
    3,610
    Trophy Points:
    431
    I always thought going from dual channel to quad would increase latency because of overhead, kind of like SSD RAID; a dual-core has higher single-core latency than a single-core CPU, bonded internet vs a single link, etc.

    Are you sure there's zero penalty, or is the penalty just too small to matter much?
     
  9. D2 Ultima

    D2 Ultima Livestreaming Master

    Reputations:
    4,329
    Messages:
    11,800
    Likes Received:
    9,732
    Trophy Points:
    931
    I looked it over, and I seem to remember this is the guy who did the original testing, but a couple of things he said make no sense to me...

    He went on about thread limiting before, but now says it's completely unnecessary. I don't understand that bit, and I would really have liked to see a comparison of how his 7980XE did against the 3900X directly, and whether the 3900X (as he makes it sound) overcomes problems his 7980XE has (which also confuses me, considering the quad channel benefit on the latter versus dual channel only on the former). He also said that "it can stream anything out the box without issue or tweaking", but then also says "if you do 1080/60 streaming it's going to run the CPU at full speed and you are going to need better cooling". That one's also pretty weird, since... first of all, duh, and second of all, if it really is maxing out the load, the games should be suffering a fair amount. And I KNOW that AMD has bigger trouble with load balancing than Intel does when games want high single-thread usage while a large CPU load is running, due to much lower single-core speeds, which... seems to be the reverse case for him? I really want to see a head-to-head between the two, especially with utilization data in real time as he tests (like an Afterburner layout properly done, showing per-thread utilization).

    I still do feel NVENC is the better way to go overall, though. I don't think my stance on that one is going to change anytime soon, but I'd really like to see a head-to-head "this one wins where this one loses", even if he doesn't understand the reason. He knows a fair bit about streaming, I will not discount that. One of the most knowledgeable people about the technicalities I've come across, but he makes some weird statements/choices to me.
     
    hmscott and ajc9988 like this.
  10. ajc9988

    ajc9988 Death by a thousand paper cuts

    Reputations:
    1,526
    Messages:
    5,688
    Likes Received:
    8,028
    Trophy Points:
    681
    This first one is with the interleaving set up wrong, resulting in a reduction in performance (it acts like dual channel UMA in the test). This was right after I had flashed the newest BIOS and was setting things back up, and I had set the channel interleaving wrong in the BIOS. This is not setting it up purely for dual channel the way going to "game mode" does within Ryzen Master, but it is pretty close. Also shown is the run after the interleaving was correctly set in the BIOS so that it was fully utilizing the quad channel memory.
    upload_2019-9-14_5-0-59.png



    Geekbench removed the memory testing from this version of their benchmark. But you can still see, test by test, which ones are more memory sensitive and which are not. Here, AES-XTS, machine learning, and speech recognition are all HEAVILY hit by the memory settings and bandwidth. Less affected tasks, as coded by Geekbench, include text rendering and navigation. Beyond that, tests such as text compression, horizon detection, image inpainting, and HDR trail off, staying within about a sixth of the score. Those are likely more about the boost the bandwidth gives by keeping the cache fed, but not so much that they are starved like the heavily hit categories I mentioned. Many of the rest are margin-of-error type stuff, where so long as the cache is efficiently fed, the difference is negligible (except maybe camera, but I haven't dug deep enough into this new benchmark yet to run down all of its nuance).

    Single core performance is different. Why? Because the system doesn't have to split the bandwidth between many cores; it only has to keep the one active core well fed (hence why I mentioned bandwidth per core previously, before you attacked me and I wrote you off).

    upload_2019-9-14_5-9-21.png
    https://browser.geekbench.com/v5/cpu/compare/25528?baseline=25528
    As can be seen, there are some effects from quad channel relating to the overhead you mentioned, but the vast majority are de minimis. Now, this also depends on how you have the RAM set up for interleaving, which is where a discussion of channel and rank interleaving comes in.

    Say you are comparing channel interleaving between a dual channel and a quad channel kit, both with single-rank DIMMs. You will be interleaving between memory channels more with the quad channel, which can combat certain overhead, but with Zen and Zen+ you then have to use the farther memory channels, which adds to the latency (memory access latency is roughly uniform on Zen 2, so it is about the same regardless of channel used, unlike prior gens). When you use dual-rank DIMMs on dual channel with rank interleaving, you are now writing to four alternating ranks, with operations able to access one rank while the rank just accessed is recovering. That is closer to what you see with quad channel single-rank memory, but without the near/far dynamic of earlier Zen and Zen+.

    The point is that when the bandwidth is not split among the majority of cores and the cache is being kept fed, the performance gap closes or shifts. For multi-tasking, depending on the task and memory needs, with high core-count utilization, there is without a doubt a benefit to extra memory bandwidth on specific workloads. This comes down, in part, to what the task is, how the software is optimized, and bandwidth per core.
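    If you want to see the interleaving effect directly rather than inferring it from Geekbench subtests, a crude bandwidth probe is enough to catch the big differences. This is just a sketch (assumes Python with numpy; it is nowhere near as rigorous as STREAM or SiSoft Sandra, and a single thread won't fully saturate quad channel):

```python
# Crude memory-bandwidth probe: time a large array copy.
# Good enough to spot large channel/interleaving differences, not a real STREAM run.
import time
import numpy as np

N = 512 * 1024 * 1024 // 8           # 512 MiB worth of float64 per array
src = np.random.rand(N)
dst = np.empty_like(src)

best = float("inf")
for _ in range(5):                    # take the best of a few passes
    start = time.perf_counter()
    np.copyto(dst, src)               # streams 512 MiB of reads + 512 MiB of writes
    best = min(best, time.perf_counter() - start)

gb_moved = 2 * src.nbytes / 1e9       # count read and write traffic
print(f"~{gb_moved / best:.1f} GB/s (single-threaded copy)")
```

    Run it once per BIOS interleaving setting (or channel population) and compare; launching several copies at once gets you closer to the aggregate bandwidth picture.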

    I also was trying to explain the nuance of how a mainstream system with eight cores and dual channel memory has roughly the same memory bandwidth per core as a 16-core system with four memory channels, while a quad-core with dual channel memory has double that bandwidth per core, just like a 16-core with eight memory channels does. And that is where you got rude.
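    To put rough numbers on that, here is the back-of-the-envelope math (hypothetical figures, assuming DDR4-3200 at about 25.6 GB/s theoretical peak per channel; sustained numbers are lower, but the ratios are what matter):

```python
# Back-of-the-envelope bandwidth-per-core comparison.
# Assumes DDR4-3200 at ~25.6 GB/s theoretical peak per channel.
PER_CHANNEL_GBPS = 25.6

configs = {
    "4 cores,  2 channels (mainstream)": (4, 2),
    "8 cores,  2 channels (mainstream)": (8, 2),
    "16 cores, 4 channels (HEDT)":       (16, 4),
    "16 cores, 8 channels (server)":     (16, 8),
}

for name, (cores, channels) in configs.items():
    per_core = channels * PER_CHANNEL_GBPS / cores
    print(f"{name}: ~{per_core:.1f} GB/s per core")

# 8c/2ch and 16c/4ch both land around 6.4 GB/s per core;
# 4c/2ch and 16c/8ch both land around 12.8 GB/s per core.
```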

    Now, as you can see, once a single core has enough memory bandwidth to keep its cache fed, extra bandwidth doesn't mean ANYTHING; it is useless unless a specific task needs it. At that point, if the programs you use are not memory-bandwidth limited, you would do better taking the lower bandwidth of the mainstream platform's dual channel, as many of those CPUs, especially on the Intel side, have higher boost speeds or can be overclocked higher, leading to more performance (single and lightly multi-threaded workloads benefit more from the frequency at that point than from increased memory bandwidth). But because this is task specific and comes down to the programming of each application being evaluated, leaving the question of which is better open-ended creates a situation where you get an "it depends" answer, which is no help to anybody.

    So I'm pretty sure on this one and gave a more nuanced answer with pictures to show that your instincts are correct for single threaded workloads (where the effects you spoke of would appear), but that for multi-tasking and heavier workloads, those worries can disappear in quite significant ways depending on the task.

    There are a couple points that might make his findings easier to understand:
    1) AMD's task balancing, specifically on streaming, sides towards devoting more CPU resources to the encoding/streaming side than Intel. This is why, in reviews of Zen and Zen+, AMD would often get hit worse on frame rates on the streamer side (actual game play) than Intel would. But, at certain loads, it could do better at not losing frames for the viewer side. I am assuming this has carried forward in some way to Zen 2. This behavior is also seen in some multi-tasking benchmarks for the release of Zen and Zen+, where for certain workloads TR would chew through both by allotting resources better for that multi-tasking, whereas other times Intel just laid the smackdown with multi-tasking.
    2) AMD worked with Microsoft to change the scheduler behavior on AMD CPUs. This was to address thread thrashing as well as thread propagation, making new threads use cores on the same CCX to lower the latency of cross-CCX communications, which is especially important since all inter-CCX communications now go through the I/O die. The scheduler changes would affect certain tasks, although I do not have a Zen 2 chip to check whether streaming is one of the tasks that benefits (a rough sketch of what keeping a process on one CCX/CCD looks like in practice is below, after this list).
    3) Some types of rendering are performed from cache, some from memory, some from I/O (meaning storage). Tile-based rendering, for example, is done largely from cache, which is something AMD excels at, hence the good performance in things like V-Ray and Cinebench. Other rendering isn't as clear-cut. If you look through Puget Systems' testing of the 3900X, there are times, even in Photoshop, that it now beats the Intel 9900K. They even recommend it in some cases for Premiere and Resolve over the Intel CPU.

    This gets to the above discussion, in part, on bandwidth keeping the cache fed. There is a point on some programs where the issue isn't memory bandwidth, and in those cases the extra memory channels will not show a significant uplift in performance, instead favoring core speed and IPC. Which brings me to point 4.
    4) AMD has a higher IPC than Intel's current chips, so at 4.15-4.2GHz AMD's CPU performs roughly like Intel's chips at 4.4-4.6GHz. I don't know what he has his 7980XE overclocked to, if at all. If it isn't overclocked, then the performance per thread would be higher on the 12-core than the 18-core; if it is, the single-thread performance is roughly tied, but as mentioned in point 1, AMD devotes more resources to the encoding side. So if the performance has finally reached the threshold where it doesn't ding frame rates too heavily for the streamer, that really can be a boon for streaming. Further, with 6 fewer cores and 12 fewer threads, limiting OBS is less necessary since it is sharing resources with a game: the game takes enough threads to keep OBS from hitting its thread cap, potentially reducing the setup time and the need to limit threads. This also would be related to the scheduler changes described in point 2.
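    On point 2, Windows now handles that thread placement automatically on Zen 2, but purely as an illustration of what "keep threads on one CCX/CCD" means, here is a hypothetical sketch of doing it by hand with process affinity (the process name and CPU numbering are assumptions; Windows usually lists SMT siblings as adjacent pairs, so check your own topology with something like Coreinfo or Ryzen Master first):

```python
# Hypothetical sketch: restrict a process to the logical CPUs of one CCD so its
# threads stop hopping across the I/O-die boundary. CPU numbering is an ASSUMPTION
# (SMT siblings adjacent, logical CPUs 0-11 = first CCD on a 3900X); verify yours.
# Requires the third-party "psutil" package: pip install psutil
import psutil

TARGET_NAME = "obs64.exe"           # hypothetical target process name
FIRST_CCD_CPUS = list(range(12))    # assumed: logical CPUs 0-11 = first CCD

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] == TARGET_NAME:
        proc.cpu_affinity(FIRST_CCD_CPUS)   # pin scheduling to those CPUs only
        print(f"pinned PID {proc.pid} to CPUs {FIRST_CCD_CPUS}")
```

    Whether that actually helps a given game/stream mix is exactly the kind of thing the head-to-head data would show.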

    I do agree, I would like to see some hard data on the topic from him, and I would love to see the head-to-head comparison to better understand what he is seeing. He may do that closer to the 3950X launch; I hope AMD seeds him a chip for review.

    And I can agree on both the NVENC statement and wanting the head to head. I know he is knowledgeable, but also I am not knowledgeable enough to fully question his choices (hence why I like to have you chime in on the topic because I know you are WAYYYY more knowledgeable on streaming than I am).
     
    Last edited: Sep 14, 2019
    hmscott likes this.