During initial development, a Grainite application typically runs on a single-node machine, such as a laptop or development VM. Once the application is ready for staging and eventually for production, it is deployed within a Kubernetes cluster with 3 or more nodes.
- 3-node Kubernetes cluster
- Each node must meet the following hardware requirements:
- CPU: 8 cores (logical)
- RAM: 32GB
- Persistent Disk/Volumes (SSD): 2256 GiB
- The cluster should be deployed in 3 availability zones.
- Example: Node 0 in AZ0, Node 1 in AZ1, and Node 2 in AZ2
A Grainite cluster consists of 3 nodes which run in Kubernetes as pods. Each Kubernetes node must run on a VM or server instance with the hardware specified above. Today, we provide scripts to make it easy to deploy a cluster on Amazon's EKS (AWS), Google's GKE (GCP), Azure Kubernetes Service (AKS), and VMware Tanzu Kubernetes Grid, which can run on-premises.
Grainite is deployed as a 3-node cluster in production to provide high data durability and availability. The above cluster configuration is rated for a throughput of up to 250 external events per second, at up to 4K event size, with 4 additional messages generated internally in the cluster per event (lookups of or writes to other grains, topic appends, etc.).
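To see what this rating implies for total cluster load, a quick back-of-the-envelope check can be useful. The figures below come from the rating above; the aggregate bandwidth estimate is our own illustration, not an official Grainite number.

```python
# Back-of-the-envelope load check for the standard 3-node configuration.
external_events_per_sec = 250   # rated external event throughput
internal_msgs_per_event = 4     # grain lookups/writes, topic appends, etc.
event_size_kib = 4              # up to 4K per event

# Each external event fans out into 4 internal messages, so the cluster
# processes 5 messages total per external event.
total_msgs_per_sec = external_events_per_sec * (1 + internal_msgs_per_event)
approx_bandwidth_mib_s = total_msgs_per_sec * event_size_kib / 1024

print(total_msgs_per_sec)                 # 1250 messages/sec cluster-wide
print(round(approx_bandwidth_mib_s, 2))   # ~4.88 MiB/s if every message is 4 KiB
```

In other words, the rated 250 external events per second translate to roughly 1,250 messages per second flowing through the cluster.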
Larger cluster configurations are also available, including a Medium and a Large configuration (coming shortly). Please contact your sales rep (or [email protected]) to determine whether our standard cluster size works for you, or to inquire about these vertically or horizontally scaled larger clusters.
- **Availability:** Limited by the availability of the region; typically < 99.99%.
- **Durability:** Up to 11 nines, assuming redundant media (such as EBS).
- **Recovery:** Survives the loss of one zone. Automatic fault recovery with zero RPO and a 3-second RTO.
- **Licensing cost:** 1 cluster.
This is the simplest deployment and is supported today. All nodes run in the same region; as mentioned before, the nodes should be spread across three availability zones to prevent data loss in case a zonal failure occurs. To prevent data loss in the case of a regional failure, backups should be run with the destination residing in a separate region.

After a regional failure, an operator would follow the steps to create a new cluster and perform a restore operation on it to recover the data from the original cluster. The recovery point is determined by the most recent backup, so more frequent backups yield a better RPO at the expense of cost. The recovery time is directly related to the amount of data involved. One can conservatively expect 20 MB/s for data transfer from blob storage in a public cloud environment, plus an additional 30 minutes to create the new Grainite cluster. So for a cluster that has accumulated 100 GB of backed-up data, one can expect to have a new cluster available for serving in another region in just under two hours.
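The two-hour figure above follows directly from the stated assumptions. A minimal sketch of the arithmetic, assuming ~20 MB/s restore throughput from blob storage and ~30 minutes of cluster-creation time (the helper name `estimate_rto_minutes` is our own, for illustration only):

```python
TRANSFER_MB_PER_SEC = 20    # conservative blob-storage transfer rate
CLUSTER_CREATE_MIN = 30     # time to stand up the new Grainite cluster

def estimate_rto_minutes(backup_gb: float) -> float:
    """Estimated recovery time (minutes) to restore a given backup size."""
    transfer_min = backup_gb * 1000 / TRANSFER_MB_PER_SEC / 60
    return CLUSTER_CREATE_MIN + transfer_min

print(round(estimate_rto_minutes(100)))  # ~113 minutes, just under two hours
```

The same function also makes the backup-frequency trade-off concrete: backup size and frequency set the RPO, while data volume dominates the RTO.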