The Future of our Storage World is Autonomous

We were honored to present at the recent SNIA Storage Developer Conference, which took place on September 27th and 28th of this year. Rather than write a shortened recap of our presentation, we wanted to share it in full so that you can explore the exciting work that Magnition and team are doing to accelerate innovation in storage and memory performance.

The industry has been struggling with a seemingly intractable problem in modeling application workloads. This is further exacerbated by the inability to model and map the effect of adjusting storage and memory parameters within the underlying systems to improve performance in real time.

It was intractable, until Magnition solved the challenge with our newly available CacheLab platform. When you take the power of CacheLab and deliver previously uncapturable analytics and the proprietary algorithmic capability of Magnition Miniature Simulation Technology,

Welcome, and thank you for attending this session on fully autonomous storage and memory hierarchies. My name is Irfan Ahmad and I am the CEO and founder of Magnition. We are a company focused on providing technology and algorithms for the future of storage and memory systems, in particular for fully automated and fully autonomous storage systems. Prior to this, I was a founder of CloudPhysics, which has recently been acquired by Hewlett Packard Enterprise. Before that, I worked at VMware as a tech lead in the operating systems and distributed resource scheduling teams, where we brought to market lights-out-style data center management products, including load balancing, automatic data movement, and real-time I/O scheduling products.

The key question to start with is: do the storage and memory industries need fully autonomous systems now? Obviously, what’s been happening in the last few years in the use of AI, machine learning, and wide-scale data analytics is a trend towards automating tasks for consumers.

Today we have self-driving autopilot and autonomous systems that are being prototyped and tested on California roads. This is the state of the art when it comes to autonomous systems in cars. However, enterprise storage customers and data center operators do not enjoy the same level of sophistication. In fact, we’re quite far from being able to build fully autonomous data centers. The machine learning problems associated with fully autonomous data centers are actually quite unique and in some cases much more challenging than what is found in other domains.

But we can change how the customers operating these products manage these systems and make a significant difference in operational cost and efficiency. It is fairly well known that storage, memories, and caches are very complex stateful resources. The combination of increasingly dynamic workloads and the stateful nature of the resource itself doesn’t easily lend itself to rapid changes in resource allocation policies, because data movement is involved, warm-up times are required, and so on. And of course, this makes automation quite a difficult task, and it’s not for lack of trying.

Researchers have been trying to implement autopilot systems for storage for several decades, with limited results. Remarkably, both the automotive industry and the computer industry have been trying to achieve autonomous operations for about the same length of time.

So on the right-hand side, you’ll see some visualizations for LiDAR and LiDAR-based object detection while vehicles are in motion in multiple directions, as well as static obstacles mapped to their locations using hints from GPS locations on maps and so on. On the left, we see quite dynamic behavior from storage workloads and storage systems. In the background you see some visualizations of access patterns for disk workloads taken from real production customers. Also on the left are heat-map visualizations covering large chunks of time, showing varying levels of activity, reuse distance, cacheability, working sets, and so on.

So they tend to be quite dynamic, with a lot of variance that you can see in some of the colors. Now, if we visualize these in just the right way and apply the right AI/ML tools, we can actually become quite predictive.

And that’s where the most recent research in the field has been taking us. So if we look at the plot that is constantly moving on the left, that’s another specific example. I’ll talk a little bit more about that in a bit. But this one is looking at the highest, most expensive tiers in a storage system, which are the tier zero caches. This particular case is a customer workload that exhibits quite a dramatic shift in its working set over a very short period of time.

So short that it would require quite a dynamic algorithm to maintain SLOs, and to date those algorithms have not been possible. To get there, we need to build autonomous systems that are self-aware and that perform real-time modeling and adaptive mitigations. The equivalent of this in self-driving cars would be pedestrians crossing the road, sudden unexpected traffic, or a highway undergoing construction, where the car now has to figure out some evasion strategy. Whereas in the automotive domain those models are getting mature, in the world of storage and memory resource management we’re just now starting to get those abilities in our tool belt.

With this background, let’s talk about the current state of the art, which is manually managed storage and memories. As you’re seeing in the industry, that has started to become infeasible. On the one hand, it is not uncommon for a modern application to undergo actual code changes hourly, especially in microservice-oriented architectures, where applications are spread out over large networks. Similarly, the access patterns and workload behaviors that storage systems have to deal with change by the minute, and their frequency, intensity, variance, and volatility are starting to increase.

On the other hand, thankfully, we have more hardware variety than ever to be able to handle these variances. Unfortunately, along with those additional tiers of memory and storage naturally comes complexity. So we are going through a period when hardware complexity is actually making things more challenging, and at this point in time it is infeasible to manage storage and memory allocations manually and to do so efficiently.

So the net result is that we are increasingly vulnerable to seemingly insurmountable technical challenges, including issues like thrashing, interference, unpredictable availability, and so on. From a business point of view, these limit what efficiencies and cost reductions can be had.

And technically, these are today leading to overprovisioning, which results in high cost. And obviously there’s a lack of control over performance, leading to much higher risk. So the only way out of this impasse that we find ourselves in at this juncture in time is to rethink our approach as an industry. I believe we’ll need to build and bring to market fully autonomous storage and fully autonomous memory hierarchies. And we will need to build an ecosystem of technology, suppliers, and standards to be able to accomplish this.

So that’s really my goal in this conference talk: to establish the business case and the need associated with this in our industry. Now in the world of storage, we can learn from other domains and try to build a taxonomy of the models and the types of systems and characteristics that have to be learned in an autonomous memory hierarchy. I personally characterize these into two very high-level, 10,000-foot-level domains. The first is a class of models that have to do with self-awareness.

And then the other class is environmental awareness. Now, of course, we can compare and contrast with the self-driving cars in the automotive industry. So here a self-awareness example is how a car accelerates. And remarkably, that’s not a static notion. It’s actually very dynamic.

So as the car gets older or as it undergoes different stresses and different conditions, the acceleration is variable, so the model between an input and the output is actually contingent on many other factors. So a car should be self-aware of its dynamic ability to accelerate, as a model. Braking has a similar characteristic, as do steering and other types of control conditions. Even battery discharge is quite dependent on conditions and requires predictive models.

Now environmental awareness for the automotive industry would include GPS maps, the ability to detect and understand static obstacles versus dynamic obstacles, and the ability to look at the terrain, the distances, and the relative object velocities and map those to what’s already known about the environment. Applying this to the world of autonomous data infrastructure, we can also do a self-awareness versus environmental awareness categorization. For example, with self-awareness, we have to be able to understand the tiers of memories, disk, flash, etcetera, that exist in a hardware system. I think of this like brakes and acceleration capabilities.

We have caches that provide tremendous acceleration, and different types of memories with different characteristics such as asymmetries, statefulness, persistence, and throughput limits. The data paths in our system that govern latencies, for example, may not have redundancies. These are all the types of things that go towards self-awareness in the storage domain. Now, of course there are environmental awareness concerns as well.

So in the case of storage and memories, workloads are extremely dynamic. They change possibly at the microsecond scale. I believe these systems and their environmental components are much more dynamic per unit of time than, for example, a self-driving vehicle. Being able to honor QoS constraints while dealing with congestion and other factors is quite challenging.

Another environmental issue is cost being dynamic. We know from cloud workloads and cloud economics that cost may not be static, and interesting optimizations can be achieved as a result of this. So all of this goes towards environmental awareness. This allows us to start building a bit of a vocabulary around what the underlying models are and how we might go about creating them, which we’ll get into in a moment.

But we can start to build an appreciation for how such models need to be built in real time for memories and storage, and then they need to adapt. They need to be self-aware and adapt so that at the end of the day, storage and memory systems are fully autonomous. So here’s some background to be able to understand autonomous systems. Typically, autonomous systems will operate in some sort of an OODA loop. The idea of these loops goes back to Colonel John Boyd of the US Air Force, who was a very famous fighter pilot. He developed these theories and emphasized that a faster rhythm from observations to orientations to decisions to actions is a significant advantage in systems of many different types.

So as a result, a lot of autonomous systems run one, or often many, OODA loops. Observations come from instrumentation and being able to record that instrumentation and make sense of it. Orientation has to do with modeling the bits and bytes of data that are coming from the sensors, putting them in context, and adjusting the models of how those systems behave under various types of stress and load. That is then converted into a decision, which reflects an expectation of being able to adjust the behavior and the course of events in the future.

Well, that decision has to take many things into account: cost-benefit analysis, confidence in the accuracy of the models, the stability period before the system predicts that the next phase transition might happen, and so on. Finally, an action has to be taken, and that connects to the actuation capability to make a change in the system.
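To make the four stages concrete, here is a minimal Python sketch of one observe-orient-decide-act pass for a single cache. It is an illustration under stated assumptions, not Magnition’s implementation: the names `Model`, `decide`, and `ooda_step`, the exponential smoothing used for orientation, and the simple hold band used for stability are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical running model: a smoothed estimate of the miss ratio."""
    miss_ratio: float = 0.0
    alpha: float = 0.5  # weight given to the newest observation

    def orient(self, observed_miss_ratio: float) -> None:
        # Orient: fold the new observation into what we already believe.
        self.miss_ratio = (self.alpha * observed_miss_ratio
                           + (1.0 - self.alpha) * self.miss_ratio)


def decide(model: Model, target: float, cache_gb: int, step_gb: int = 1) -> int:
    """Decide: choose the next cache size from the model and the target."""
    if model.miss_ratio > target:
        return cache_gb + step_gb      # missing the target: grow
    if model.miss_ratio < 0.5 * target and cache_gb > step_gb:
        return cache_gb - step_gb      # comfortably ahead: reclaim cost
    return cache_gb                    # inside the band: hold for stability


def ooda_step(model: Model, observed_miss_ratio: float,
              target: float, cache_gb: int) -> int:
    """Observe (the argument), orient, decide; the return value is the action."""
    model.orient(observed_miss_ratio)
    return decide(model, target, cache_gb)


if __name__ == "__main__":
    model, size = Model(), 4
    for observation in [0.40, 0.40, 0.10, 0.02, 0.02]:
        size = ooda_step(model, observation, target=0.10, cache_gb=size)
        print(f"observed={observation:.2f} model={model.miss_ratio:.3f} size={size} GB")
```

The hold band in `decide` is a crude stand-in for the stability analysis mentioned above: it keeps the controller from resizing on every small fluctuation, since each resize of a stateful cache costs warm-up time.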

So this is a fairly fundamental, high-level understanding of how the system might behave. I thought maybe we could zoom in and talk about a high-level architecture of a fully autonomous memory hierarchy. So we have this central OODA loop, fundamental to any autonomous system. We have legacy applications, or microservices, or container-based Kubernetes, or maybe other orchestrated applications that are distributed, and they are running against a storage system, which itself is complex, multitier, and multilayer. It could be a cloud storage system, a database, a dedupe engine, a single SSD, or a cache, and more often than not some combination of these.

So the first thing, going from left to right, is instrumentation, which crosses over between Observe and Orient. For example, in the case of Magnition’s technology, that instrumentation could be at the device, operating system, event bus, or message level. In the case of low-level devices, think of it as an I/O or memory access pattern that we are able to instrument in a very lightweight manner. Obviously, we cannot take every single memory request and instrument it. But the most recent research in this area, including ours, has opened up a new set of tools that include very careful sampling strategies.
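One way to picture a careful sampling strategy is hash-based spatial sampling: a request is instrumented only if a hash of its block address falls below a threshold, so every access to a sampled block is recorded and every access to an unsampled block is skipped. The talk does not say which strategy is used, so treat this as an assumption for illustration; the names `is_sampled` and `sample_trace`, the 1% rate, and the choice of hash are hypothetical.

```python
import hashlib

MODULUS = 1 << 24  # size of the hash space used for the sampling decision


def is_sampled(block_address: int, rate: float = 0.01) -> bool:
    """Return True if this block belongs to the sampled subset.

    The decision depends only on the address, so a given block is either
    always instrumented or never instrumented.
    """
    digest = hashlib.blake2b(block_address.to_bytes(8, "little"),
                             digest_size=8).digest()
    bucket = int.from_bytes(digest, "little") % MODULUS
    return bucket < int(rate * MODULUS)


def sample_trace(trace, rate: float = 0.01):
    """Keep only the accesses that touch sampled blocks."""
    return [address for address in trace if is_sampled(address, rate)]


if __name__ == "__main__":
    kept = sample_trace(range(100_000), rate=0.01)
    print(f"instrumented {len(kept)} of 100000 requests")
```

Because the decision is a pure function of the address, the reuse pattern of each sampled block is preserved intact, which is what lets a model built from the sample stand in for the full workload.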

So we first worked on instrumentation in our product so that we can maintain ongoing observation at extremely low cost. Now I’m going to talk about orientation. This is always going to be about modeling. So how do we take those specific measured vectors of performance and availability and fit them to models we have already learned, and furthermore update those models with the new knowledge being gained from the information arriving? Once we have this orientation, we can enter the predict phase, which is where we use these models and the real-time observations to create a series of what-ifs, from which we decide what to do to improve the situation, to correct some imbalance that might be happening, or to bring the system back into compliance with quantitative service targets.

So in this step, obviously we have very specific algorithms that need to be implemented, including cost-benefit analysis, stability analysis, and so on. Once we have that done, the decisions that are selected obviously need a set of actuators in the storage and memory hierarchy to actually make the changes in the parameters. Now, these parameters typically already exist in code in these systems. As we know as engineers, these parameters are in there because the engineers working on these products are aware that the best setting for any one of them depends on too many conditions to just guess at.

The difference is that today we tune these parameters in the lab before every major release, but they’re not really dynamically tunable today. So what we want to do is move towards having a larger number of actuators that can influence the different behaviors of our systems, and make them available by exposing them to these new autonomous controllers.

Now, of course, this is an OODA loop, so any decision will affect future observations, and future runs through the cycle will ensure that the system can continuously enforce these targets, these SLAs, and continue to optimize and turn these knobs to hit the desired outcomes at all times at the lowest possible cost. For this sort of loop to operate at the high throughput required in a memory hierarchy or storage system, it has to be exceedingly efficient.

What are the types of use cases that we could offer if we had fully autonomous QoS? I picked just two and I’ll go through them. I’m focusing here on performance autonomy use cases because of the limitation of time and my particular expertise. So the first one I want to talk about is autonomous SLAs, which in many ways represent the Holy Grail in these systems: latency guarantees.

So the question is, could you take a storage system and dial in a performance guarantee of a certain number of milliseconds at a very high percentile? If you could do that, what could be built with it? Well, several core value propositions exist around this capability. A user could dial in a certain latency or throughput target with a dollar budget and have an autonomous system auto-allocate the right amount of capacity throughout the platform, throughout the tiers, to hit those SLAs. Now, if we had such a system, we could set it and forget it and achieve lower overall risk of service disruptions and revenue disruptions.

Such features would also allow higher margins in products, because a business can do more consolidation with a smaller BOM (Bill of Materials) or more sharing of resources while still being guaranteed that business-critical functions achieve the service latencies that have been specified.

So overall, this really boils down to achieving lower OPEX and lower risk profiles by dialing in the desired policies in the system, making sure that those are achieved or if they cannot be achieved, then providing feedback to the operator that those requests for quality of service are not feasible within budget.

Another really interesting use case approaches the system from a completely different angle: autonomously optimizing for cost or for performance, whichever is the primary driver of the business objective function. So instead of thinking of it as a QoS problem, for other workload types we can think of it as a pure cost optimization challenge.

So here, using real-time workload modeling and resource allocation procedures, we’re able to dynamically adjust resources and provide isolation among different clients and end users, to achieve completely autonomous rightsizing dynamically for all the tenants of that system. Now, if we are able to achieve this, then the value propositions again are fascinating.
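As a rough illustration of rightsizing across tenants, the sketch below hands out cache one unit at a time to whichever tenant’s predicted miss ratio curve says it would avoid the most misses per second. This greedy rule is an assumption chosen for brevity, not the method described in the talk, and it is only guaranteed optimal when every curve is convex. The function name `partition_cache` and the sample curves and request rates are hypothetical.

```python
def partition_cache(curves, rates, total_units):
    """Greedily split `total_units` of cache among tenants.

    curves[t][k] is the predicted miss ratio of tenant t with k units.
    rates[t] is the request rate of tenant t in requests per second.
    Returns a dict mapping each tenant to its number of units.
    """
    alloc = {tenant: 0 for tenant in curves}
    for _ in range(total_units):
        best, best_gain = None, 0.0
        for tenant, curve in curves.items():
            k = alloc[tenant]
            if k + 1 >= len(curve):
                continue  # the curve has no data point beyond k units
            # Misses per second avoided by giving this tenant one more unit.
            gain = rates[tenant] * (curve[k] - curve[k + 1])
            if gain > best_gain:
                best, best_gain = tenant, gain
        if best is None:
            break  # nobody benefits from more cache
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    curves = {
        "db":   [1.0, 0.5, 0.3, 0.2, 0.15],    # cache-friendly tenant
        "logs": [1.0, 0.95, 0.92, 0.9, 0.89],  # mostly streaming tenant
    }
    rates = {"db": 1000, "logs": 3000}
    print(partition_cache(curves, rates, total_units=4))
```

Rerunning the partitioning whenever the curves are refreshed is what would make the rightsizing dynamic: a tenant whose curve flattens stops winning units, and the capacity flows to whoever benefits more.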

We could achieve provably optimal lowest cost of ownership for whatever workload the customer throws at the system. The workload could be something we’ve never seen in the lab as part of our performance tests, but a fully autonomous system is able to navigate and figure out exactly how to drive the TCO down to the lowest possible. Or we could use this system to eliminate noisy neighbor problems to gain a huge improvement in PC for infrastructure teams.

Finally, there are operations and predictive planning use cases here, where a fully autonomous system can plan for itself how far ahead it needs to allocate more resources, buy more hardware, release hardware from the cloud, or plan for many other technical situations like this.

So having seen the technical use cases, the value propositions, and a high-level vision of autonomous memory hierarchies, we can now look at some of the technical underpinnings of how this could be accomplished. Again, I’m focused on performance autonomy. What we see here are very interesting, dynamic, constantly changing patterns. To give you a little introduction to what’s happening here: on the X-axis, we have a model of the resource allocation for a particular workload. Think of this as the tier zero memory, allocated in gigabytes, for this workload.

Now the Y-axis represents the performance. The performance in this case is plotted as a miss ratio, so lower is better. And of course we can plot latency directly from that as well. But for this visualization we plot miss ratios. The curve that you see then relates the performance of the system on the Y-axis to the cost being allocated to achieve that performance on the X-axis. The constantly transforming curve actually represents the performance prediction for that particular application if we were, at any given moment in time, to give it more or less of that tier zero memory.
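For readers who have not built one, the following Python sketch computes a miss ratio curve exactly from an access trace using reuse (stack) distances. It assumes an LRU cache, which the talk does not specify, and it is a slow offline calculation meant only to show what the curve contains, not the real-time technique discussed here. The function name `lru_miss_ratio_curve` is hypothetical, and the trace must be non-empty.

```python
def lru_miss_ratio_curve(trace, max_size):
    """Return the LRU miss ratio at cache sizes 1, 2, ..., max_size.

    Sizes are counted in blocks. Element i of the result is the miss ratio
    of a cache holding i + 1 blocks.
    """
    stack = []  # blocks ordered by recency; the most recent is at the end
    hits_at_distance = [0] * (max_size + 1)
    for block in trace:
        if block in stack:
            # Stack distance: number of distinct blocks touched since the
            # last access to this block, counting the block itself.
            distance = len(stack) - stack.index(block)
            if distance <= max_size:
                hits_at_distance[distance] += 1
            stack.remove(block)
        stack.append(block)

    curve, hits = [], 0
    for size in range(1, max_size + 1):
        # An LRU cache of `size` blocks hits every access whose stack
        # distance is at most `size`.
        hits += hits_at_distance[size]
        curve.append(1.0 - hits / len(trace))
    return curve


if __name__ == "__main__":
    trace = ["a", "b", "a", "b", "c", "a"]
    for size, miss_ratio in enumerate(lru_miss_ratio_curve(trace, 3), start=1):
        print(f"cache of {size} block(s): miss ratio {miss_ratio:.3f}")
```

A single pass over the trace yields the miss ratio at every cache size at once, which is why the curve works as a what-if model: it answers the question of what would happen with more or less memory without running the workload at each size.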

So this is behaving quite radically, right? You would think that an application would have lower latency, which is good, the more money you’re willing to spend on buying memory for it. And that’s generally the case.

But in this workload there is a burst of activity that happens over the course of a very short period of a few minutes, where the workload all of a sudden needs a lot more memory than it did before just to be able to hit its normal performance target. So if we draw the latency target as a horizontal line, what we see is that as the curve constantly shifts, its intersection with that latency target moves quite a bit to the right. In other words, the runtime requirements associated with being able to hit that latency target increase during that burst.
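The intersection described here can be sketched in a few lines of Python. The sketch assumes a simple two-level model in which mean latency is a weighted blend of a hit latency and a miss latency; the targets discussed in this talk are percentile latencies, which would need a richer model. The function names, the latency constants, and the two sample curves are hypothetical.

```python
def expected_latency_ms(miss_ratio, hit_ms=0.1, miss_ms=10.0):
    """Mean latency under an assumed two-level model (hit tier plus backing tier)."""
    return (1.0 - miss_ratio) * hit_ms + miss_ratio * miss_ms


def min_allocation_for_target(mrc, target_ms, hit_ms=0.1, miss_ms=10.0):
    """Find where the latency target crosses the predicted curve.

    mrc is a list of (size_gb, miss_ratio) pairs sorted by increasing size.
    Returns the smallest size that meets the target, or None if no size
    on the curve is enough.
    """
    for size_gb, miss_ratio in mrc:
        if expected_latency_ms(miss_ratio, hit_ms, miss_ms) <= target_ms:
            return size_gb
    return None


if __name__ == "__main__":
    normal = [(1, 0.50), (2, 0.20), (4, 0.10), (8, 0.05)]
    burst = [(1, 0.80), (2, 0.60), (4, 0.30), (8, 0.10)]
    for name, curve in [("normal", normal), ("burst", burst)]:
        print(name, min_allocation_for_target(curve, target_ms=1.5))
```

With these made-up curves the same 1.5 ms target needs twice the memory during the burst, which is the rightward shift of the intersection point. A `None` result is the case where the target is not reachable on the curve at all, so the controller has nothing to allocate and can only report that back.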

As you can imagine, a burst carries with it a much larger working set. Let’s say, for a database, many more rows are needed to process the transactions that are being requested, so that generally will increase the working set.

So in this real-world customer example, what we see is that the nature of the working set is quite variable: at times much larger amounts of memory are needed to hit those targets, and at other times less memory is quite good enough. So first, notice that this plot is quite dynamic, as I mentioned, and to be able to have fully autonomous systems, we need to have this model available in real time.

Now, the availability of that model up until today has been quite difficult and not real-time, and therefore systems haven’t typically been built to be able to exploit a knowledge of this nature because that knowledge was quite hard to come by.

So with Magnition technology, we’re able to, for example, take a continuous optimization approach. The curve that you see here is similar in nature to the previous one, although for a different customer workload, but here I’ve just frozen it in time so we can study it.

So at any given instant in time, if this were the predicted model of the performance of the system, then what we know is that of course latency gets lower the more money you’re willing to spend, the more resources you give to this particular application. That makes a lot of sense.

But these models are not just linear. They have these very interesting behaviors that have to be modeled. Otherwise the system would end up making a lot of mistakes. So in this example, if we have this underlying hidden capability in our fully autonomous system, so that it can predict the performance of its highest tier, the most important or most expensive tier of memory, and do so in real time, then we can actually, for the first time in the industry, build latency targets and hold on to them completely autonomously.

Hands off, the lights-out model. So again, the Y-axis is performance, and lower is better because we’re dealing with latency. The X-axis is the total amount of resources given to this workload dynamically. There could be a much larger amount of memory, and we are deciding to give a particular workload a smaller chunk of it, and that’s what we’re showing here. So that curve basically tells us what performance would be achieved at any given resource allocation.

Now, if we’re able to draw this red curve in real time, we can actually convert it into a resource allocation decision for that workload completely dynamically and autonomously. That would result in achieving latency SLOs. So if we pin this workload at seven milliseconds at the 95th percentile, then even as the workload evolves and its working set increases, we will be able to follow along and keep up with it. It’s a very powerful capability that has just now become feasible and is one of the key inputs into assembling a fully autonomous storage system of the future.

Now, one particular tier’s memory allocation is obviously not good enough. The previous example was a very powerful one, but a simple one. In fact, many tiers are involved. So here, without going into the details, I hint at how a fully autonomous system could use these models to construct the exact allocation of resources in various tiers to hit the desired objectives. Hopefully we can come together as an industry and build consensus around these concepts.

The examples I’ve given barely scratch the surface, but from them we can start to build a system that in the end game is able to finally deliver completely lights-out, self-managing, policy-driven systems to the world of storage and memory hierarchies, the database industries, and so on. If we are able to achieve that, we’ll finally be able to throw away all of these control knobs that we have today. We can throw away the steering wheel and have the data center, in a self-aware manner, follow the policy that we provided, and finally allow our customers to enjoy with their enterprise products, applications, and systems the experience that they are starting to get used to with their consumer products.

So I look forward to collaborations across the industry around this. Please get in touch; I look forward to fruitful conversations. Thank you very much.
