Data disks
Managed disks can be attached to data nodes for use as each node's data directory. The ARM template can attach Standard HDD disks or, for those VM SKUs that support them, Premium managed disks:
- storageAccountType
  The performance tier of managed disks. Standard will use Standard HDD disks, whilst Default will use Premium managed disks for those VM SKUs that support them, and Standard HDD disks for those that do not. The default is Default.
- vmDataDiskSize
  The size of each attached managed disk. Choose between:
    32TiB: 32 Tebibytes
    16TiB: 16 Tebibytes
    8TiB: 8 Tebibytes
    4TiB: 4 Tebibytes
    2TiB: 2 Tebibytes
    1TiB: 1 Tebibyte
    512GiB: 512 Gibibytes
    256GiB: 256 Gibibytes
    128GiB: 128 Gibibytes
    64GiB: 64 Gibibytes
    32GiB: 32 Gibibytes
  The default is 1TiB.
- vmDataDiskCount
  The number of managed disks to attach to each data node. The total number of managed disks will be vmDataNodeCount * vmDataDiskCount. If the number of disks selected is more than can be attached to the data node VM SKU, the maximum number of disks that can be attached for that SKU will be used. This is equivalent to Math.min(vmDataDiskCount, data node VM SKU maximum attached disks). Must be greater than or equal to 0. The default is the maximum number of disks supported by the data node VM SKU.
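The clamping rule and the cluster-wide total for vmDataDiskCount can be sketched as follows. This is a minimal illustration in Python, not the template's own logic: the function names and the sku_max_data_disks argument are hypothetical, and the per-SKU maximum has to be looked up for the chosen VM size. The sketch also assumes that the total is computed from the clamped count rather than from the requested one.

```python
from typing import Optional


def effective_disk_count(vm_data_disk_count: Optional[int],
                         sku_max_data_disks: int) -> int:
    """Disks attached to each data node after clamping to the SKU maximum.

    None stands in for "parameter not supplied" (an assumption of this
    sketch); in that case the SKU maximum is used, matching the default.
    """
    if vm_data_disk_count is None:
        return sku_max_data_disks
    if vm_data_disk_count < 0:
        raise ValueError("vmDataDiskCount must be greater than or equal to 0")
    # Same idea as Math.min(vmDataDiskCount, SKU maximum attached disks).
    return min(vm_data_disk_count, sku_max_data_disks)


def total_managed_disks(vm_data_node_count: int, disks_per_node: int) -> int:
    """Total managed disks across the cluster: nodes * disks per node."""
    return vm_data_node_count * disks_per_node


# Hypothetical SKU that allows at most 16 data disks; 20 were requested.
per_node = effective_disk_count(20, sku_max_data_disks=16)
print(per_node)                          # 16
print(total_managed_disks(3, per_node))  # 48
```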
Disks are partitioned with fdisk when less than 2TB, and with parted when larger, with an ext4 filesystem and 4096 byte block size.
Data is striped across the disks attached to each data node in a RAID 0 array, using mdadm on Linux. When only one managed disk is attached, no RAID 0 array is configured. When vmDataDiskCount is 0, the data node will use the temp storage of the VM.
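The three cases above (no disks, one disk, several disks) can be summarised in a small decision function. This is only a sketch of the rule as described here; the function name and the returned labels are hypothetical and do not come from the template or its scripts.

```python
def data_directory_layout(attached_disks: int) -> str:
    """Return a label for where a data node keeps its data.

    attached_disks is the number of managed disks actually attached
    to the node.
    """
    if attached_disks < 0:
        raise ValueError("disk count must be greater than or equal to 0")
    if attached_disks == 0:
        return "temp storage"  # ephemeral storage on the host machine
    if attached_disks == 1:
        return "single disk"   # one managed disk, no RAID 0 array
    return "raid0"             # striped across all disks with mdadm


for count in (0, 1, 4):
    print(count, data_directory_layout(count))
```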
Temp storage, with filesystem /dev/sdb1 mounted on /mnt in Ubuntu, is present on the physical machine hosting the VM. It is ephemeral: a VM can move to a different host at any point in time for various reasons, including hardware failures. When this happens, the VM will be created on the new host using the OS disk from the storage account, and new temp storage will be created on the new host, so any data held on the previous temp storage is lost.
Using temp storage can be a cost-effective way of running an Elasticsearch cluster on Azure with decent performance, so long as you understand the tradeoffs and mitigate them by snapshotting frequently and ensuring adequate data redundancy through sufficient replica shards.
Striping data across attached disks is recommended to improve Input/Output operations per second (IOPS) performance, since the IOPS and throughput limits of the individual disks can be combined. Premium disks offer higher IOPS than Standard HDD disks, so Premium disks are recommended where application performance is paramount.
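To illustrate how per-disk limits combine under striping, the sketch below multiplies one disk's limits by the number of disks in the array. The DiskLimits type, the striped_limits function and the figures used are hypothetical placeholders chosen for easy arithmetic, not published Azure limits. Treat the result as an upper bound, since limits imposed elsewhere (for example by the VM size) are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskLimits:
    iops: int             # IOPS limit
    throughput_mbps: int  # throughput limit in MB/s


def striped_limits(disk: DiskLimits, disk_count: int) -> DiskLimits:
    """Upper bound on combined limits for identical disks in RAID 0."""
    if disk_count < 1:
        raise ValueError("striping needs at least one disk")
    return DiskLimits(
        iops=disk.iops * disk_count,
        throughput_mbps=disk.throughput_mbps * disk_count,
    )


# Placeholder per-disk figures, for illustration only.
example_disk = DiskLimits(iops=1000, throughput_mbps=100)
print(striped_limits(example_disk, 4))
# DiskLimits(iops=4000, throughput_mbps=400)
```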