The Story of Storlets: IBM’s Computational Storage Model
Cloud storage vendors generally use commodity servers as the underlying storage nodes that serve large data sets to users. IBM leverages storage node processing capabilities to execute computational modules, namely storlets, close to where the data is stored. The Storlet Engine provides cloud storage with the capability to dynamically upload storlets and execute them in sandboxes that insulate the storlets from the rest of the system and from other storlets.
Interestingly, IBM proposes a Storlets Marketplace that would serve as a repository of storlets from different vendors. An application built on top of the storage could mash up storlets from the marketplace to compose its logic and functionality.
Running computation where the data lives offers several benefits:
- Reduced WAN bandwidth, because fewer bytes cross the network
- Enhanced security through decreased data exposure
- Cost savings from a smaller infrastructure footprint
- Compliance support through improved provenance tracking
A storlet is a bit elusive to define, but it is best described as a unit of computation in which the required computation is brought to the data, instead of the other way around. Storlets are dynamically loaded code that can analyze each object and extract its metadata, including size, subject, resolution, format, and more. Michael Factor, perhaps the primary researcher and developer of storlets and a Distinguished Engineer and expert on storage systems at IBM Research in Haifa, Israel, explains in his own words:
“A new method of storing information is called object storage. This approach stores information as objects. Each object contains the data (the bits and bytes of our documents, movies, images, and so forth), together with metadata that holds user- and system-defined tags. These smart data objects include rich information – or metadata – that describes the content of the data, how the object is related to other objects, how the data should be handled, replicated, or backed up, and more.
Although object storage can store objects, manage them, protect them, and so on – it doesn’t by itself dramatically increase the rate at which we can extract value from objects. But what if we could turn a software-defined object store into a smart storage platform?
Storlets bring the computation to the data
A new research prototype called “storlets” holds the promise of greatly increasing the value we get out of storage and the speed at which we can access what we need. A new software-defined mechanism, storlets allow object storage to move the computation to the data, instead of the system having to move the data to a server to carry out the computation.”
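The metadata-extraction idea described above can be sketched in a few lines of Python. This is a toy, self-contained illustration: `ExtractMetadataStorlet` and `FakeOutputStream` are invented names, and the `__call__(in_files, out_files, params)` entry point is modeled loosely on the open-sourced OpenStack Storlets Python interface rather than quoted from it.

```python
# Illustrative sketch of a metadata-extracting storlet. The stream
# classes are hypothetical stand-ins so the example runs on its own;
# a real storlet would receive streams from the engine's sandbox.
import io
import json


class ExtractMetadataStorlet:
    """Reads an object, emits size/format metadata alongside the data."""

    def __call__(self, in_files, out_files, params):
        data = in_files[0].read()
        metadata = {
            "size": len(data),
            # A real storlet might sniff many formats from magic bytes;
            # here we only check for a JPEG header as an illustration.
            "format": "jpeg" if data[:2] == b"\xff\xd8" else "unknown",
        }
        out_files[0].set_metadata(metadata)
        out_files[0].write(data)


class FakeOutputStream:
    """In-memory stand-in for a storlet output stream."""

    def __init__(self):
        self.buffer = io.BytesIO()
        self.metadata = {}

    def write(self, chunk):
        self.buffer.write(chunk)

    def set_metadata(self, md):
        self.metadata = md


# Usage: run the storlet against an in-memory object.
obj = io.BytesIO(b"\xff\xd8 fake jpeg bytes")
out = FakeOutputStream()
ExtractMetadataStorlet()(in_files=[obj], out_files=[out], params={})
print(json.dumps(out.metadata))  # {"size": 18, "format": "jpeg"}
```

The point of the sketch is the shape of the computation: the storlet sees the object's bytes where they already live and writes enriched metadata back, with no round trip to an external compute server.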
The significance of storlets might not be immediately apparent, but the real value is that the technology lets you process data where it is stored. Because the data never has to cross the network to reach a compute server, you save both time and money.
“Our vision is to reduce costs, increase flexibility and improve security by turning the object store into a platform, and allowing the functionality of the object store to be extended using software.”
Also known as “computational storage,” storlets introduce the equivalent of stored procedures for the storage cloud: the ability to run computations safely and securely, close to the data. Storlets run in a sandbox; they are loaded as objects and triggered by events on objects (e.g., PUT/GET) or on their associated metadata attributes.
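The event-triggered model can be sketched with a toy in-memory object store. Everything here is invented for illustration: `ToyObjectStore` and its `register` hook stand in for the real engine's header-driven storlet selection, and nothing below attempts to reproduce the actual sandboxing.

```python
# Toy object store that runs registered "storlets" on PUT and GET,
# mimicking event-triggered computation at the storage node. In a
# real deployment the storlet would run in an isolated sandbox; here
# it is just a callable applied to the object's bytes.
import zlib


class ToyObjectStore:
    def __init__(self):
        self.objects = {}
        self.storlets = {"put": [], "get": []}

    def register(self, event, storlet):
        """Attach a storlet to be triggered on 'put' or 'get' events."""
        self.storlets[event].append(storlet)

    def put(self, name, data):
        for storlet in self.storlets["put"]:
            data = storlet(data)  # computation happens at the store
        self.objects[name] = data

    def get(self, name):
        data = self.objects[name]
        for storlet in self.storlets["get"]:
            data = storlet(data)
        return data


store = ToyObjectStore()
store.register("put", zlib.compress)    # compress on ingest
store.register("get", zlib.decompress)  # decompress on retrieval

store.put("report.txt", b"raw bytes " * 1000)
# Stored form is far smaller than the 10,000-byte original,
# yet clients still read back the exact original bytes.
assert store.get("report.txt") == b"raw bytes " * 1000
```

The design point this illustrates is that the client never sees the transformation: the PUT and GET triggers fire inside the storage layer, which is what lets storlets cut WAN traffic and keep raw data from leaving the store.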
You can view the full video (1 hour 31 minutes) covering IBM’s software-defined storage offerings and research.
I’ve been compensated to contribute to this program, but the opinions expressed in this post are my own and don’t necessarily represent IBM’s positions, strategies or opinions.