One of the sensitive areas when it comes to Docker is persistent storage… A typical service upgrade involves shutting down the “V1” container and pulling/starting the “V2” container. If you take no action, all the data inside the old container is lost… That is not really the scenario we want, of course!
So today we’ll go over several variants of data persistence:
- Default : No Data Persistence
- Data Volumes : Container Persistence
- Data Only Container : Container Persistence
- Host Mapped Volume : Container Persistence
- Host Mapped Volume, backed by Shared Storage : Host Persistence
- Convoy Volume Plugin : Host Persistence
What do I mean by the different (self-invented) persistence levels?
- Container : An upgrade of the container will not wipe the data
- Host : A host failure will not result in data loss
So let’s go through the different variants, shall we?
Default
The most basic implementation… We created our container without any notion of volumes, so the data resides within the container itself. As mentioned in the introduction, we’ll suffer data loss during an upgrade… While this may not be an issue for some containers, there are stateful implementations that do want to keep their data (for instance, databases).
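To make the data loss tangible, here is a minimal sketch of such an "upgrade". The container name app, the alpine image and the file /data.txt are my own illustrative choices, and both "versions" use the same image to keep things short. The commands are wrapped in a function so nothing runs until you call it.

```shell
# Sketch only: "app", "alpine" and /data.txt are illustrative assumptions.
demo_no_persistence() {
  # "V1": write a file into the container's own writable layer (no volume)
  docker run --name app alpine sh -c 'echo "important" > /data.txt'

  # Upgrade: throw away "V1"...
  docker rm -f app

  # ...and start "V2". It gets a fresh filesystem, so the file is gone
  # and cat fails.
  docker run --rm --name app alpine cat /data.txt
}
```

Calling demo_no_persistence on a host with Docker installed should end with cat complaining that the file does not exist.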
Data Volumes
One step up from the “default” is to add a volume to your container implementation. This ensures that a given directory inside the container is mapped to a data volume. These volumes reside on the host system and remain untouched during service upgrades.
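As a rough sketch of this variant, assuming a named volume called appdata mounted at /data (both names are mine, not a convention), the same "upgrade" now keeps the file:

```shell
# Sketch only: the volume name "appdata", the container name "app"
# and the mount path /data are illustrative assumptions.
demo_data_volume() {
  # Create a named data volume; Docker stores it on the host
  docker volume create appdata

  # "V1": write into the volume instead of the container filesystem
  docker run --name app -v appdata:/data alpine \
    sh -c 'echo "important" > /data/file.txt'

  # Upgrade: remove "V1" (the named volume is left alone)
  docker rm -f app

  # "V2": mount the same volume and read the file back
  docker run --rm --name app -v appdata:/data alpine cat /data/file.txt
}
```

If you want to see where the data lives on the host, docker volume inspect appdata reports the mount point.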
Data Only Container
A slight variation on the typical data volume is to use a data-only container. Here you create a container (typically based on busybox or alpine) that declares the volumes you would have used in the “data volume” variation. When starting our main container, we use the “--volumes-from” parameter to ensure that all the volumes from our “data-only container” are mapped into our main container. So this is a typical “sidekick” pattern.
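A minimal sketch of the sidekick pattern could look like this. The container names appdata and app, and the /data path, are assumptions for illustration only.

```shell
# Sketch only: container names "appdata" / "app" and the /data path
# are illustrative assumptions.
demo_data_only_container() {
  # Data-only container: it declares the volume and never has to run
  docker create --name appdata -v /data alpine /bin/true

  # "V1" of the main container borrows every volume of the sidekick
  docker run --name app --volumes-from appdata alpine \
    sh -c 'echo "important" > /data/file.txt'

  # Upgrade: replace the main container, leave the sidekick untouched
  docker rm -f app

  # "V2" sees the same data through the same sidekick
  docker run --rm --name app --volumes-from appdata alpine cat /data/file.txt
}
```

The important design point is that only the main container is removed during the upgrade; as long as the data-only container exists, its volume stays around.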
Host Mapped Volume
Another variation on the “data volume” pattern is mapping a volume to a directory at host level. With the “data volume”, the data physically lives in a directory under Docker’s default volume location (typically /var/lib/docker/volumes). With the “host mapped volume”, you map a directory (volume) inside the container directly to a directory of your choice on the host. In essence, you get the same advantages as with the data volume, and more hybrid scenarios become possible… The main disadvantage is keeping the permissions (uid/gid) aligned between the container and the host.
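Here is a small sketch of a host mapped volume, including one possible way to deal with the uid/gid mismatch: running the container process as the calling user via --user. The default host directory and the /data path are my own assumptions, and --user is just one option (adjusting ownership of the host directory is another).

```shell
# Sketch only: the default host directory and the /data path are
# illustrative assumptions.
demo_host_mapped_volume() {
  # Host directory to map; bind mounts need an absolute path
  host_dir="${1:-$PWD/appdata}"
  mkdir -p "$host_dir"

  # Run as the calling user's uid:gid so files written to the mapped
  # directory are owned by that user on the host as well
  docker run --rm --user "$(id -u):$(id -g)" -v "$host_dir:/data" alpine \
    sh -c 'echo "important" > /data/file.txt'

  # The file is now directly visible on the host
  ls -l "$host_dir"
}
```

Usage would be something like demo_host_mapped_volume /srv/appdata; without --user the file would typically end up owned by root on the host.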
Host Mapped Volume, backed by Shared Storage
We can crank it up a notch… and use a folder that is backed by shared storage. Think NFS, Gluster, or whatever you prefer. The main advantage here is that you will not suffer any data loss in case of a host-level failure.
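Sketched for NFS, this could look as follows. The export nfs.example.com:/export/appdata and the mount point /mnt/appdata are placeholders I made up; the mount step needs root and an NFS client on the host, and every host that may run the container needs the same mount.

```shell
# Sketch only: the NFS export and mount point are hypothetical
# placeholders; mounting requires root and an NFS client on the host.
demo_shared_storage() {
  nfs_export="${1:-nfs.example.com:/export/appdata}"
  mount_point="${2:-/mnt/appdata}"

  # Mount the shared storage on the host (repeat on every host)
  mkdir -p "$mount_point"
  mount -t nfs "$nfs_export" "$mount_point"

  # From Docker's point of view this is a plain host mapped volume
  docker run --rm -v "$mount_point:/data" alpine ls /data
}
```

Because the data lives on the NFS server, a replacement host only has to repeat the mount before starting the container again.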
Convoy Volume Plugin
Mapping to host level still feels a bit “static”. You have to set up all your hosts in the same way, or you’ll hit a wall at some point. Another implementation is “Convoy”… Put very disrespectfully, Convoy runs as a Docker volume plugin and behaves like an intermediate container. This intermediate container provides the link to your shared storage. At the moment, the main backends are NFS & Gluster; others have been touted, over a beer, as coming “in the near future”…
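From the container's side, using a volume plugin mostly comes down to naming the driver. The sketch below assumes Convoy is already installed, running, and registered with Docker under the driver name convoy, with a shared storage backend configured; the volume name appdata and the /data path are illustrative.

```shell
# Sketch only: assumes a running Convoy setup registered as the
# "convoy" volume driver; "appdata" and /data are illustrative.
demo_convoy_volume() {
  # Write through a volume that is managed by the plugin
  docker run --rm --volume-driver convoy -v appdata:/data alpine \
    sh -c 'echo "important" > /data/file.txt'

  # A later container asks the same driver for the same volume name
  docker run --rm --volume-driver convoy -v appdata:/data alpine \
    cat /data/file.txt
}
```

Compared to the host mapped variant, the containers only know a volume name and a driver, not a host path, which is what makes this feel less "static".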
There are probably even more patterns… If you have any, feel free to give me a ping!
I’m aware of Flocker, though I must admit I have not experimented with it. The concept looks very nice, though!
- Data persistence is possible with Docker!
- Be aware that there are different implementations, each with its own advantages & disadvantages.
- Pro Tip : Always test your deployments! Double-check all aspects related to resilience & performance.