By now, you have probably heard about the Meltdown and Spectre family of vulnerabilities, and are probably quite busy dealing with the aftermath (evaluating patches, applying patches, reversing patches). You are also probably wondering how on earth you will get back the performance that your datacenters lost to the patches. Here we will explore a few ways to squeeze more performance out of your existing infrastructure to regain the performance you lost to Meltdown/Spectre, and even get a little bit more to reduce the need to buy more servers. This advice applies to datacenters big or small. And by the way, many of these tips will make you look great in front of your CEO, CIO and CFO!
Brief recap of Meltdown and Spectre
In order to understand why Meltdown and Spectre are important, why they reduce performance, and how to regain some of that performance, it is important to do a brief, focused recap.
Meltdown (CVE-2017-5754) and Spectre (CVE-2017-5715, CVE-2017-5753) are an industry-wide “family” of vulnerabilities that affects, to varying degrees, many processor architectures from many manufacturers. As far back as 1995 the possibility of exploits like these (on X86-32/64) was pointed out [1], but it was not until 2016 that a practical way to exploit them began to emerge [2].
Meltdown affects all Intel processors with Out-of-Order Execution (OOE) and, more importantly, Speculative Execution, perhaps going back to the original Pentium Pro, and all Atom processors made after 2013 (the original Atoms were in-order). AMD processors are immune [3], and Via (remember Via?) has remained silent. Meltdown also affects other µarchitectures: several ARM processors, including the up-and-coming Cortex-A75 (intended for datacenter use), as well as many others used in cellphones and appliances [5], and IBM’s POWER7+, POWER8 and POWER9 [4]. But this paper is not concerned with other architectures.
Spectre is an industry-wide “family” of vulnerabilities with two variants (so far) that affects pretty much all microprocessors with OOE, in one or both forms: X86-32/64 processors from Intel and AMD [3], and perhaps Via. Other architectures are also affected, like ARM [5], IBM POWER7+, POWER8 and POWER9 [4], and SPARCv9 [6]. But, again, this paper is not concerned with other architectures.
You need to patch. No ifs (the guys telling you to use firewalls and to evaluate any loss of performance vis-à-vis security requirements are lawyers, not engineers).
Do not pay attention to people saying to weigh risk versus performance in order to decide whether to apply the patches, nor to appliance sellers saying that no patches are coming because “we only run our own code”. Meltdown and Spectre can be combined with other vulnerabilities to inject code, worm-like [7][8]. There are also known cases of tampering with the build process or update channels of software makers to inject contaminated (and signed) code [9][10], so even if you only use software provided by trusted entities, or made by the maker of the appliance, you are still at risk. Patch, and pressure your suppliers to provide patches! You do not want to become collateral damage in a war between two nation states just because you happen to use a specific appliance. Let Stuxnet be a warning.
Of course, right now the patches, especially the microcode patches, seem to be unstable. It is fair to evaluate them, or to give them a little time to settle and mature. But eventually you need to patch. Do not, under any circumstances, declare that a machine which has patches available will not be patched.
All of these vulnerabilities exploit OOE, using various mechanisms, to read or guess kernel data that should not be readable. This data may include sensitive passwords and cryptographic keys. Proof-of-concept code does exist, but it has not been weaponized as of this writing. The mitigations involve measures that significantly reduce performance: the more your workload calls the kernel, the more your performance is affected. There are some modern features and instructions that can be used to soften the hit, so if your processor is older than, say, Haswell, you lose even more performance. [11]
In order to fix Meltdown, one has to resort to a technique called Kernel Page Table Isolation (KPTI). The problem is, once you separate the page tables of the kernel and the apps, every time you switch from one mode to the other you flush a small (but critical) cache called the Translation Lookaside Buffer (TLB). There is a feature and an instruction in Intel CPUs that reduce the need to flush the whole TLB every time you switch from user mode to kernel mode (and vice versa): the feature is called PCID (Process Context ID) and the instruction is called INVPCID (invalidate PCID). The first processor generation to have both was Haswell, and both are needed for the lower-overhead patches to work; otherwise, you only get the patches that cost you a lot of performance.
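On a Linux host you can check for these two capabilities yourself. Here is a minimal sketch, assuming the standard /proc/cpuinfo layout; the flag names "pcid" and "invpcid" are the ones the kernel reports:

```python
# Minimal sketch: check whether a Linux host advertises the PCID and INVPCID
# CPU features that the lower-overhead KPTI patches rely on.
def cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags reported for the first core."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for feature in ("pcid", "invpcid"):
    status = "present" if feature in flags else "MISSING"
    print(f"{feature}: {status}")
```

If either flag is missing, expect the heavier-handed KPTI variant and plan your pools accordingly (more on that in Recommendation #3).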
In order to fix Spectre variant two, one has to fudge with branch prediction and another small (but critical) cache called the Branch Target Buffer (BTB) [11]. In order to minimize the impact on the BTB, one needs certain new processor controls called IBRS (“indirect branch restricted speculation”), STIBP (“single thread indirect branch predictors”) and IBPB (“indirect branch prediction barrier”), enabled by a microcode update. The older the server/processor combo in question is, the less likely it is to receive a microcode update via firmware or OS. On machines that do not get a microcode update, other techniques, like Google’s “retpoline” [12], may be used.
NOTE: Spectre variant one is mitigated on an app-by-app basis, for example by reducing the resolution of certain OS and application timers.
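On kernels new enough to expose it (mainline 4.15 and later), you can ask the kernel directly what mitigations are active for each of the three issues. A minimal sketch; older or vendor kernels may not have this sysfs directory:

```python
# Minimal sketch: read the mitigation status the kernel reports for
# Meltdown and both Spectre variants (Linux 4.15+).
import os

VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"

if os.path.isdir(VULN_DIR):
    for name in sorted(os.listdir(VULN_DIR)):
        with open(os.path.join(VULN_DIR, name)) as f:
            print(f"{name}: {f.read().strip()}")
else:
    print("Kernel does not expose mitigation status; it is probably unpatched.")
```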
Now you know roughly what Meltdown and Spectre are, why it is imperative to patch EVERY SINGLE AFFECTED INSTANCE, and why you get slowdowns.
But you need to regain your lost performance now, and you need to
regain even more performance to minimize (but not eliminate) the purchase of
new servers until 2022. So, we move on to many ways to squeeze more performance
out of your existing server fleet.
Beware. Hardware with these bugs really fixed will NOT arrive until 2022 (at least).
Granted, you may think you can get out of this by buying new hardware and redeploying your VMs with less oversubscription. But the problem with that approach is that you will end up with a lot of new hardware that is “patched at the factory” instead of truly fixed. And you will have to live with that hardware for five (or more) years.
Why do I say hardware with the bugs really fixed will not arrive until 2022? Quite simply, designing a new microprocessor generation takes about four years, plus a little more time to manufacture those processors in volume and put them inside new server generations. All processor makers were notified of Meltdown and Spectre in July 2017. So one has to wait until about 2022 (being optimistic) for any hope of buying a server with a processor designed from the ground up without these vulnerabilities, as opposed to one “patched at the factory”. And pretty much every processor inside every server you buy from now until sometime in 2022 will be “patched at the factory”. And you, dear reader, will have to live with those servers for five years (or more). Do not take my word for it; Linus Torvalds (creator and benevolent dictator of Linux) had this to say about the next crop of “patched at the factory” processors: « As it is, the patches are COMPLETE AND UTTER GARBAGE.» [13]
Why is this important? Well, as we saw previously, Meltdown and Spectre are just the tip of the iceberg of a class of vulnerabilities related to speculative execution. As we speak, researchers (good, bad, white hats, black hats, friendly nation states and unfriendly nation states [depending on your perspective]) are exploring that rabbit hole for even more security implications. It is much, much better to get a server with a processor redesigned not to have the flaw in the first place than to have one with the flaws patched at the factory, plus the lurking threat of yet another flaw being discovered and yet another round of patches.
Of course, it is a fiction to believe that one can completely eliminate all server purchases between now and 2022. But servers you buy, say, two years from now will not only be patched at the factory, they will be benchmarked with those patches already installed, making planning easier, and priced according to those benchmarks.
Next are a few tips to squeeze more performance out of your current infrastructure, minimizing (but not eliminating) cash outlays and server purchases as much as possible. Oh, and as we said before, these tips will make you look great in front of your CEO, CIO and CFO!
Recommendation #1: Move Workloads from physical machines to virtual ones
I know what you are thinking. This first piece of advice seems completely counterintuitive! Doesn’t putting a workload on a VM incur a performance penalty? And won’t that penalty come on top of the performance penalty incurred by patching Meltdown and Spectre? The answer to both questions is yes. And still, I stand by this advice.
Yes, there is a performance penalty for going from physical to virtual servers. In the old times (2005), that penalty was between 5 and 30% depending on the workload [14], but, with more than 10 years of optimizations, hypervisors have improved their performance a lot. Nowadays, the penalty is much lower [15][16].
Now, there are many system administration advantages to moving a workload from physical servers to virtual ones, and those should be considered a bonus. But in our case, we are interested in two effects:
a.) Unlocking the idle capacity inside those servers. If a server is using only 50% of its CPU capacity and we virtualize it, then even if virtualization overhead eats another 30%, we still have 20% of the CPU capacity left over for other uses.
b.) Increasing the size of the pool of machines available for virtualization. This makes it more likely that we will find a machine suited to a given workload: say, a Haswell or newer machine for a kernel-intensive workload, and a pre-Haswell machine for a workload that does not call the kernel much (remember, the more your application calls the kernel, the bigger the performance hit it takes).
At this point, most likely, your organization is already using virtualization, hopefully bare-metal virtualization, and if not, I urge you to start using it. But I am quite certain that there are still some workloads on physical servers. There are valid technical reasons to keep certain workloads on physical servers; in the past there were many, but nowadays there are very few. The reason many workloads that could be virtualized are still on physical servers is corporate inertia (you will see this phrase a lot from here on), plain and simple. We all know that stubborn sysadmin, or the person who does not keep current and is still thinking about that 2007 paper [14], or the manager who is too conservative for his (and the organization’s) own good. Well, now you have the impetus for change.
Recommendation #2: Move some of your workloads from other Hypervisors to KVM (Kernel-based Virtual Machine)
There are differences in performance between hypervisors. And while it is true that over the years those differences have shrunk as hypervisor code has matured, and every hypervisor has adopted and adapted the best techniques from the others, it is still true that KVM, due to the way it was implemented, has an edge in terms of performance for most workloads.
So, move as many of your VMs and hosts as possible from VMware, Xen, and Hyper-V to KVM. Do not get me wrong, all four are great hypervisors with many strong points, with VMware being the “gold standard”, but if your main concern is squeezing as much performance as possible out of your current infrastructure, you need to go to KVM, warts and all.
Chances are your organization already has KVM. For instance, most OpenStack deployments use KVM as their hypervisor, and Red Hat and many other Linux distros have it integrated. And if you are not using KVM, I urge you to integrate it into your environment (perhaps at the expense of some other, more expensive hypervisor). It is not that expensive if you use Ubuntu, CentOS or Rogue Wave (for example).
If you have been a good sysadmin, you created all your VMs using OVF 2.0 (and if not, you had better have a very good technical reason), therefore moving them to another hypervisor is relatively simple (unless you have been using some proprietary management and instrumentation functions and APIs of your hypervisor). If you have not been using OVF 2.0, you should start in earnest. And if you have been using proprietary APIs and management functions, you should really look at cross-platform solutions.
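The disk-image part of such a move is usually the least painful step. As a minimal sketch, assuming the exported disk is a VMDK and that qemu-img (part of the QEMU package) is installed on the destination KVM host; the file names are placeholders, not your real paths:

```python
# Minimal sketch: convert an exported VMware disk image to qcow2 for use
# under KVM, by shelling out to qemu-img. Paths are hypothetical examples.
import subprocess

src = "exported-vm-disk1.vmdk"   # disk exported from the source hypervisor
dst = "exported-vm-disk1.qcow2"  # destination image for the KVM host

subprocess.run(
    ["qemu-img", "convert", "-p", "-f", "vmdk", "-O", "qcow2", src, dst],
    check=True,
)
print(f"Converted {src} -> {dst}; attach it to a new libvirt/KVM domain.")
```

The VM definition itself (CPU, memory, NICs) still has to be recreated or imported on the KVM side, which is exactly where OVF 2.0 saves you the most work.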
Of course, do this ONLY if it makes financial sense. If you are a huge Microsoft shop and you are getting Hyper-V for free due to your licensing terms, it makes no sense to bring in a paid hypervisor, no matter how much performance you regain. Also, if you have many Oracle databases, the money Oracle charges for running its database on other hypervisors (as opposed to its own Xen derivative) makes it untenable to move those workloads to KVM. But, on the other hand, if you are paying high licensing costs for some other hypervisor due to corporate inertia, perhaps this advice will not only get you more performance, but reduce your licensing costs to boot!
So, if you can, expand your KVM pool of VMs and hosts, to recoup as much performance as you can.
Recommendation #3: Partition your server pools wisely
Most hypervisors allow you to move VMs from a physical host of one processor generation to a host of a different processor generation as needed. And if you have been a good sysadmin, you have enabled this feature. The way hypervisors do this is to make all the processors in the host pool report themselves to the VMs as belonging to the oldest generation available in the pool, hiding any capabilities not supported by that generation.
If you recall our analysis of Meltdown and Spectre, you will realize that, in order to get security at the hardware level, one needs processors with the PCID feature, the INVPCID instruction, and the IBRS, STIBP and IBPB controls. The first two are present in Haswell and later processors, while the other three come with a microcode update. Which means that you need to partition your fleet into at least three groups: Haswell or later with the microcode update, Haswell or later without the microcode update, and pre-Haswell.
If you do not do it like that, then all the microcode updates will be for naught, as the new capabilities will be hidden from the VMs, which will fall back to the less efficient patches.
Note 1: Considering that Haswell was announced in 2013, it is highly unlikely that anything older than Haswell will receive a microcode update to get IBRS, STIBP and IBPB.
Note 2: This discussion intentionally leaves out AMD processors, as those are “somewhat less vulnerable” to Meltdown and Spectre and handle things in a slightly different fashion, but the advice of a smart split of your AMD server pool (between Zen and the various generations of Bulldozer) still stands.
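To make the three-way split concrete, here is a minimal sketch driven by the CPU feature flags a Linux host reports (see the flag check in the recap). The names "pcid" and "invpcid" are standard; the flag names for the microcode-provided controls (here "spec_ctrl", "ibrs", "ibpb", "stibp") vary between kernel versions, so treat them as assumptions to adjust for your distribution:

```python
# Minimal sketch: classify a host into one of the three pools described above,
# given the set of flags from /proc/cpuinfo.
def classify_host(flags):
    """Return the pool a host belongs to, given its set of CPU flags."""
    haswell_or_newer = {"pcid", "invpcid"} <= flags
    has_microcode = bool(flags & {"spec_ctrl", "ibrs", "ibpb", "stibp"})
    if haswell_or_newer and has_microcode:
        return "pool-1: Haswell+ with microcode update"
    if haswell_or_newer:
        return "pool-2: Haswell+ without microcode update"
    return "pool-3: pre-Haswell"

# Example usage with a hand-written flag set:
print(classify_host({"pcid", "invpcid", "ibpb", "spec_ctrl"}))
```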
And now you see why Recommendation #1 was not such a contradiction. By expanding the pool of physical servers available for virtualization, you make it easier on yourself to partition your pools along those lines.
Recommendation #4: Deploy your workloads in the proper pools
This one should be evident by now. Do you have a workload that calls the kernel a lot? Deploy it on your pool of Haswell (or later) machines with microcode updates. Have a workload that invokes the kernel very little? Deploy it on the pre-Haswell, no-microcode pool.
Recommendation #5: If you have workloads that can be moved from VMs to Containers, Just Do It!
Containers are the new kids in town. As such, many conservative sysadmins and managers distrust them. But many applications are now ready to move to containers, and are supported there by their developers and by commercial entities too.
As you may know, containers have even less overhead than bare-metal hypervisors. So, in our context, containers allow us to recover even more performance from our infrastructure.
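For a workload that already ships as a container image, running it is a one-liner. A minimal sketch, assuming the Docker SDK for Python (the "docker" package on PyPI) and a local Docker daemon; the image name and port mapping are placeholders, not a recommendation:

```python
# Minimal sketch: start an existing containerized workload via the Docker SDK.
import docker

client = docker.from_env()
container = client.containers.run(
    "nginx:stable",            # example image; use your application's image
    detach=True,
    ports={"80/tcp": 8080},    # expose container port 80 on host port 8080
    restart_policy={"Name": "unless-stopped"},
)
print(f"Started container {container.short_id}")
```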
If any of your workloads has a container-ready implementation, with adequate support, move it now. If the only reason for not doing it was corporate inertia, now you have the impetus needed to “make it so”.
Recommendation #6: Be on the lookout for inefficient workloads
Here is a personal anecdote: in one of my previous jobs as a sysadmin (more like the senior manager of the sysadmins), a friend programmed a critical ETL application in Java on Windows, to be deployed in Java on Linux. The application worked on a Windows server, consuming 40% of the CPU. The guy went on vacation before launch. As soon as it was moved to Linux, CPU usage jumped to 100%. Since it was never tested on Linux, and it was consuming 100% CPU, I declared that a showstopper and refused to pass it into production. But the CEO personally said that the application had to go online that day (Dec 23, 2003). I questioned my friend’s teammates, and it turned out that the application checked a directory for a file to process; if the file was not there it checked again immediately, and if the file was still not there, it checked again, and if… you get the drift. I instructed his teammates to put a timer between checks. Answer: we do not know how to do that in Java. So I instructed them to break the loop and execute only once, and I put the application in cron to execute every 5 seconds (I had to roll up my sleeves for this, as well as develop a watchdog for the application, which that team “conveniently” forgot). Result? Processor usage on Linux measured at 5%, and the application was stable for 18 months, until it was replaced.
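The fix, in essence, was trivial: poll with a delay instead of spinning. A minimal sketch (in Python rather than the Java of the anecdote), where the directory path and the processing step are placeholders for the real ETL logic:

```python
# Minimal sketch: poll a directory for work, but sleep between checks
# instead of spinning at 100% CPU.
import os
import time

WATCH_DIR = "/var/spool/etl/incoming"   # hypothetical drop directory
POLL_INTERVAL = 5.0                      # seconds between checks

def process(path):
    print(f"processing {path}")          # stand-in for the real ETL work
    os.remove(path)

while True:
    for name in os.listdir(WATCH_DIR):
        process(os.path.join(WATCH_DIR, name))
    time.sleep(POLL_INTERVAL)             # the missing timer from the anecdote
```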
There are many applications in this world programmed in an inefficient manner, calling the kernel excessively (among other sins). Sometimes the sin is yours, my dear admin colleague: a workload provisioned in a VM with less memory than it needs, forcing it to hit the swapfile constantly, thrashing the VM, and the SAN to boot.
If you spot some of those among your workloads, have a word with the
developer (be it an individual or group in your organization, or a company, or
a customer) in order to reach a more efficient implementation.
Recommendation #7: If you are an ISP and are using a private cloud for your workloads, and a public one to sell, unify them
I know many ISPs in LatAm, and many of them have two (or more) clouds: one (or more) to sell, and one (or more) for their internal workloads. More often than not, those clouds use the same server models, the same technology and the same software. The reason for the complete separation is merely administrative and cultural (corporate inertia).
If you unify those clouds, you achieve many benefits:
a.) You expand your pool of machines. And as we saw in Recommendation #3, having a bigger pool means an easier time partitioning your servers in a smart way.
b.) You get access to idle capacity in one of your clouds that you may need in the other. (At one of the ISPs I know, the internal OpenStack cloud was at full usage from day one, because it was designed that way, while the customer-facing cloud was underutilized, because it was bought with its full capacity of blade servers from day one, even though the ISP had to attract customers to it over the course of a year.)
c.) You transmit a powerful message to your customers: «the cloud I offer you is so good that we trust it with our own workloads. We will support it as if it were ours, because our workloads are running on it. If there is a glitch, we go down with you. Your pain is our pain. In the words of Microsoft: “We eat our own dog food”». That inspires more confidence in a prospective customer than: «Our cloud is very secure, stable, and technologically advanced, but, just in case, we run our workloads in a completely separate cloud from the one we sell to you».
Recommendation #8: If you are a normal Company, use public clouds for some of your workloads
If your company is not selling a public cloud, it should probably use one. Take a close look at your workloads, see which ones can be moved to public clouds without major security implications or a costly redesign, and move them. This will free up resources in your datacenters for your more sensitive workloads. Let the public cloud providers be the ones buying more “tainted” servers for the time being; they can get terms from the server makers that we can only dream about.
Recommendation #9: Upgrade the processors in your servers
If, after all this, you still need more performance, you could consider upgrading the processors inside your existing servers. Yes, upgrading the processor is possible on servers; it is not reserved for desktops and workstations.
It may seem ironic that I recommend buying new processors, even knowing that those processors still have the vulnerabilities, and that many of them will not have the features needed to reduce the performance hit. But hey! This will be MUCH less expensive than buying new servers: servers with the vulnerabilities still in them that will stick around in your datacenter for five years (or more). But, and this is very important, look at the TCO of doing this before you decide. You have to take into account parts, labor, disruption, power and cooling (newer servers tend to be more efficient), the remaining useful life of the upgraded server versus a new one, and many other factors. Develop a solid business case analysis before doing it.
Many manufacturers allow you to upgrade your server processors in the field, and they publish guidelines to do so. Check [17] for an example.
Some tips:
a.) If the equipment is still under warranty, do not do it. Wait until the warranty is over. It is not worth the hassle.
b.) If the equipment is still supported, call the manufacturer and work with them. They will be happy to help. And this way you ensure that you do not lose support for the equipment due to unauthorized work performed.
c.) If the equipment has reached end-of-sale, end-of-life or end-of-support, still contact the manufacturer. If asked kindly, the support personnel can at least orient you as to which parts work and which do not, and will point you to the right documents and firmware updates.
d.) Update every single firmware in the machine before upgrading the processor.
e.) If the machine has a vacant socket, populate it, and redistribute/complete the DIMMs accordingly.
f.) If you administer a small-scale datacenter, upgrade on a case by case basis.
g.) If you administer a mid-scale datacenter, upgrade in batches. Upgrade all the spare parts and the development and testing servers. Then move the upgraded parts to production, upgrade the parts you just removed and lather, rinse, repeat.
h.) If you administer a large-scale datacenter, you know better than anyone how to handle things like this, and do not need my advice. But I need yours. I would use the method in point g.), but also, upgrade whenever I need to repair damaged boards, as part of the repair process. Let me know what you think about that.
i.) If at all possible, upgrade to a next-generation processor. Many manufacturers have boards that support processors from two generations. Even if you do not make the cut to a Haswell processor with firmware updates, newer processors are at least faster per clock cycle and more energy efficient, which reduces power and cooling requirements.
j.) Involve the manufacturer of the server. This repetition was on purpose.
Recommendation #10: Consider AMD servers
Let’s face it, it is impossible to go four or more years without buying more servers. But when buying servers, many organizations tend to compare only Intel-based servers from different manufacturers. With the arrival of Zen-based processors from AMD, that is a mistake, as AMD has become competitive again.
The new processors based on AMD’s Zen architecture are immune to Meltdown, and somewhat less susceptible to Spectre [3]. AMD processors and servers also tend to be less expensive, and come with many goodies, like a higher maximum number of cores per socket, more PCIe lanes and more memory bandwidth. Granted, they also have some drawbacks (mainly in the way they share their caches among the cores, and in raw IPC).
Do not get me wrong, Intel-based servers are great, and Meltdown/Spectre is an industry-wide problem. But from a security standpoint, and in general, it may behoove you to include AMD alongside Intel in any server purchase process you initiate in the future. Simply stated, tell server vendors that if they do not have both Intel and AMD in their lineup, they are not invited to your future RFPs.
As you can see, recommendations 1 through 8 aim to recoup some of the performance hidden in your current infrastructure and to let you run a more favorable ratio of VMs per physical host, as well as deploy workloads in the most adequate server pools. The last two recommendations (9 and 10) aim to minimize the impact that additional horsepower will have on your budget and security until 2022. I really hope this info and this set of tips and tricks has been useful to you; please let me know whether that is the case. Questions or comments are welcome.
More Resources:
[1] "The Intel 80x86 Processor Architecture: Pitfalls for Secure Systems" May 1995. IEEE Symposium on Security and Privacy
[2] https://www.wired.com/story/meltdown-spectre-bug-collision-intel-chip-flaw-discovery/
[3] https://www.amd.com/en/corporate/speculative-execution
[4] https://www.ibm.com/blogs/psirt/potential-impact-processors-power-family/
[5] https://developer.arm.com/support/security-update
[6] https://www.theregister.co.uk/2018/01/16/oracle_quarterly_patches_jan_2018/
[8] https://www.symantec.com/connect/blogs/linux-worm-targeting-hidden-devices
[9] http://blog.talosintelligence.com/2017/09/avast-distributes-malware.html
[10] https://www.welivesecurity.com/2017/07/04/analysis-of-telebots-cunning-backdoor/
[12] https://support.google.com/faqs/answer/7625886
[13] https://lkml.org/lkml/2018/1/21/192
[14] https://www.vmware.com/pdf/hypervisor_performance.pdf
[15] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and Linux containers,”
In Proc. 2015 IEEE Int. Symp. Perform. Anal. Syst. Software (ISPASS 2015).Philadelphia, PA, USA: IEEE Press, 29-31 Mar. 2015, pp. 171–172.
[16] K.-T. Seo, H.-S. Hwang, I.-Y. Moon, O.-Y. Kwon, and B.-J. Kim, “Performance comparison analysis of Linux container and virtual
machine for building Cloud,” Adv. Sci. Technol. Lett. , vol. 66, pp. 105–111, Dec. 2014
[17] https://support.hpe.com/hpsc/doc/public/display?docId=c03911173