VMAX3 Architecture — Theory

Abid
Sep 27, 2019

VMAX3 is EMC's flagship enterprise storage array, succeeding the VMAX 10K/20K/40K models. VMAX3 can be configured in three models: VMAX 100K, 200K, and 400K. According to EMC, the VMAX3 family with the new HYPERMAX OS 5977 release delivers a number of revolutionary changes.

New features of VMAX3:

  • All-new HYPERMAX OS 5977, which replaces Enginuity 5876.
  • Dynamic Virtual Matrix Architecture (DVM) — CPU pooling concept provides on-demand cores.
  • 100% Virtually Provisioned — Factory pre-configured.
  • Embedded NAS (eNAS).
  • New TimeFinder SnapVX — Point in time replication technology.
  • Numerous other enhancements and increased operational capacities.

VMAX3 Array model comparisons:

VMAX3 Array Components

VMAX3 Engine

VMAX3 inherits the engine-based concept of VMAX; the engine is the basic building block of a VMAX3 array.

An engine consists of two redundant director boards (controllers) that house global memory, front-end connectivity, back-end connectivity, and internal network communication components.

Components of each Director:

  1. Management Module
  2. Front-end I/O Module
  3. Back-end I/O Module
  4. Flash I/O Module
  5. Memory Module (Global Memory Component)
  6. Power Supply
  7. Fans
  8. Infiniband (IB) module — Connects to IB interconnect switches
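
To keep these relationships straight, here is a minimal data-model sketch of the engine layout just described. It is purely illustrative: the class and field names are invented, and real module counts vary by configuration.

```python
# Minimal data-model sketch of the engine layout described above.
# Purely illustrative; module counts vary by configuration.

from dataclasses import dataclass, field

@dataclass
class Director:
    management_module: str = "MM or MMCS"
    front_end_io_modules: list = field(default_factory=list)
    back_end_io_modules: list = field(default_factory=list)
    flash_io_modules: list = field(default_factory=list)
    memory_modules: list = field(default_factory=list)   # global memory DIMMs
    ib_module: str = "fabric I/O module (connects to the IB interconnect switches)"

@dataclass
class Engine:
    """Two redundant directors form one engine."""
    director_a: Director
    director_b: Director

# The first engine carries an MMCS in each director (see the MMCS & MM section below).
engine_1 = Engine(Director(management_module="MMCS-A"),
                  Director(management_module="MMCS-B"))
print(engine_1.director_a.management_module)   # MMCS-A
```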

Port Numbering:

Physical port numbering within the engine is used to determine how cables are connected to the front-end, back-end, and fabric I/O modules. The physical port numbering on each director board follows the same rules.

  • I/O module slots are numbered 0–10 from left to right
  • I/O modules that have 4 ports are numbered 0–3 from bottom to top
  • I/O modules that have 2 ports are numbered 0–1 from bottom to top

Logical port numbering within the engine is used to determine how the HYPERMAX OS interprets what is connected to the front-end, back-end, and fabric ports. The logical port numbering on each director board follows the same rules.

  • Supports 32 logical ports (ports 0–31)
  • Logical ports are numbered from left to right, bottom to top
  • Ports 0–3 and 20–23 are internally dedicated to the vault I/O modules
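
To make the "left to right, bottom to top" counting rule concrete, here is a small Python sketch that numbers ports for a hypothetical slot layout. The slot population below is invented purely so the totals add up to 32 logical ports; it is not an actual VMAX3 director layout.

```python
# Illustrative sketch of the "left to right, bottom to top" counting rule.
# The slot layout is hypothetical; a real director's slot population
# depends on the configuration (vault flash, front-end, back-end modules).

def assign_logical_ports(ports_per_slot):
    """Number ports left to right (slot order), bottom to top (port order)."""
    mapping = {}          # (slot, physical_port) -> logical_port
    logical = 0
    for slot, port_count in enumerate(ports_per_slot):
        for phys_port in range(port_count):   # 0 = bottom port, counting upward
            mapping[(slot, phys_port)] = logical
            logical += 1
    return mapping

# Hypothetical example: eleven slots (0-10), mixing 2-port and 4-port modules
# so that the director ends up with 32 logical ports (0-31).
layout = [4, 4, 2, 2, 4, 2, 2, 4, 2, 2, 4]   # sums to 32; purely illustrative
ports = assign_logical_ports(layout)

print(ports[(0, 0)])    # slot 0, bottom port -> logical port 0
print(ports[(10, 3)])   # slot 10, top port   -> logical port 31
```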

MMCS & MM: There are two types of management modules: the Management Module Control Station (MMCS) and the standard Management Module (MM). The first engine of each system is deployed with an MMCS in each director. Each subsequent engine (engines 2–8) is deployed with a management module in each director, in place of an MMCS. The MMCS combines the management module and control station (service processor) hardware into a single module.

MMCS functions:

  1. Environmental monitoring capabilities for power, cooling & connectivity.
  2. Each MMCS monitors one of the system standby power supplies (SPS) through an RS232 connection.
  3. Each MMCS is also connected to both internal Ethernet switches within the system.
  4. Provides support functionality: each MMCS connects to the LAN and provides remote connectivity for the EMC support team.
  5. Can be connected to an external laptop.

Management Module (MM): a subset of the MMCS that does not include control station functionality. Each management module has an RS232 connection to an SPS. Management module A connects only to Ethernet switch A, and management module B connects only to Ethernet switch B; both are responsible for monitoring and reporting.

Global memory technology overview:

Global memory is a crucial component in the architecture. All read and write operations are transferred to or from global memory. Transfers between the host and global memory can be processed at much greater speeds than transfers involving physical drives. HYPERMAX OS uses sophisticated statistical prefetch algorithms that adjust to conditions on the array; these intelligent algorithms constantly monitor, evaluate, and optimize cache decisions in response to the workload.

DDR3 DRAM technology (16 DIMM slots per director) allows the array to be configured with up to 2 TB of mirrored memory per engine, and up to 16 TB mirrored per array.
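
As a quick sanity check, the per-array figure follows directly from the per-engine figure and the maximum engine count of eight:

```python
# The per-array figure is the per-engine figure scaled by engine count
# (VMAX3 scales from one engine up to eight, as noted in the MMCS section).

MIRRORED_GM_PER_ENGINE_TB = 2    # from the text above

for engines in range(1, 9):
    print(f"{engines} engine(s): {engines * MIRRORED_GM_PER_ENGINE_TB} TB mirrored global memory")
# 8 engines -> 16 TB mirrored per array, matching the figure above
```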

Shared physical memory:

Global memory is accessible by any director within the array.

  • If an array has a single engine, physical memory pairs are internal to the engine
  • If an array has multiple engines, physical memory is paired across engines

Dual-write technology is maintained by the array. In the event of a director or memory failure, the data continues to be available from the redundant copy.
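
The dual-write idea can be pictured with a short conceptual sketch. The class names and structure below are invented for illustration and have nothing to do with HYPERMAX OS internals.

```python
# Conceptual sketch of dual-write to mirrored global memory segments.
# Names and structure are invented for illustration only.

class MemorySegment:
    """A slice of global memory owned by one director."""
    def __init__(self, director):
        self.director = director
        self.data = {}
        self.failed = False

class MirroredPair:
    """Every write lands on both segments; reads fall back to the survivor."""
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, key, value):
        # Dual-write: the slot is updated on both directors, so a single
        # director or memory failure never loses the data.
        for segment in (self.primary, self.secondary):
            if not segment.failed:
                segment.data[key] = value

    def read(self, key):
        for segment in (self.primary, self.secondary):
            if not segment.failed and key in segment.data:
                return segment.data[key]
        raise RuntimeError("both mirror copies unavailable")

# Single-engine case: the pair lives on the two directors of one engine.
pair = MirroredPair(MemorySegment("director-1A"), MemorySegment("director-1B"))
pair.write("track-0x42", b"host data")
pair.primary.failed = True            # simulate a director failure
print(pair.read("track-0x42"))        # still served from the redundant copy
```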

Front-end connectivity:

The front-end I/O modules provide channel connectivity, with 32 ports (16 Gb/s) per engine. Different types of front-end I/O modules allow connectivity to different interfaces, including SAN, SRDF, and embedded NAS (eNAS).
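
For a rough sense of scale, the aggregate front-end line rate per engine follows from those two numbers:

```python
# Aggregate front-end line rate per engine, from the figures above.
PORTS_PER_ENGINE = 32
PORT_SPEED_GBPS = 16

print(PORTS_PER_ENGINE * PORT_SPEED_GBPS)   # 512 Gb/s of front-end connectivity per engine
```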

SAS back-end connectivity:

The system’s architecture incorporates a 6 Gb/s SAS (Serial Attached SCSI) back-end design to ensure high performance and full redundancy. SAS is a reliable, high-end protocol that uses a connectionless tree structure with unique paths to individual devices. The paths are stored in routing tables, which are built during a discovery phase and are used to route I/O to the desired endpoint.
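
The discover-then-route behaviour can be illustrated with a toy sketch: walk the expander tree once, record a unique path to every end device, then route I/O by table lookup. The topology below is made up and greatly simplified compared with real SAS discovery.

```python
# Toy illustration of SAS-style discovery: walk the expander tree once,
# record the path to each end device, then route I/O by table lookup.
# The topology is invented and greatly simplified.

topology = {
    "director-BE-port": ["expander-A", "expander-B"],
    "expander-A": ["drive-0", "drive-1"],
    "expander-B": ["drive-2", "drive-3"],
}

def discover(root):
    """Build a routing table: end device -> unique path from the root."""
    table = {}
    def walk(node, path):
        children = topology.get(node)
        if not children:                 # leaf = end device (a drive)
            table[node] = path + [node]
            return
        for child in children:
            walk(child, path + [node])
    walk(root, [])
    return table

routing_table = discover("director-BE-port")
print(routing_table["drive-2"])
# ['director-BE-port', 'expander-B', 'drive-2'] -- the path used to route I/O
```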

The SAS back-end subsystem provides independent redundant paths to the data stored on physical drives. This provides seamless access to information, even in the event of a component failure and/or replacement.

Flash I/O Module: The flash I/O modules are utilized during the vaulting sequence.

As cache sizes have grown, the time required to move all array data to a persistent state has also increased. Vaulting is designed to limit the time needed to power the system down when it must switch to a battery supply. Unlike previous platforms, the VMAX3 and VMAX All Flash platforms do not vault to back-end drives. Data is instead vaulted to dedicated I/O modules, known as flash I/O modules, saving disk space. Vaulting to flash also expedites the vaulting process and centralizes it in the engine components, removing the need for battery backup of the disks and creating an overall denser configuration than previous systems.

Vault triggers: State changes that require the system to vault are referred to as vault triggers. There are two types of vault triggers.

  1. Internal availability triggers: Internal availability triggers are initiated when global memory data becomes compromised due to component unavailability. Once these components become unavailable, the system enters the Need to Vault (NTV) state, and vaulting occurs. There are three internal triggers:

a. Vault flash availability — The flash I/O modules are used to store metadata under normal conditions, as well as any data being saved during the vaulting process. When the overall available flash space in the flash I/O modules becomes the same size as N copies of global memory, the NTV process triggers. This ensures that all of the data is saved before any potential further loss of vault flash space occurs.

b. Global memory (GM) availability — When both directors of any mirrored pair are unhealthy, either logically or environmentally, NTV triggers because of GM unavailability.

c. Fabric availability — When both the fabric switches are environmentally unhealthy, NTV triggers because of fabric unavailability.

2. External availability triggers: External availability triggers are initiated when global memory data is not compromised, but system preservation is improved by vaulting. Vaulting in this context is used as a mechanism to stop host activity, facilitate easy recovery, or proactively prevent potential data loss. There are two external triggers:

a. Engine trigger — When an entire engine fails, the system vaults.

b. DAE trigger — If the system loses access to an entire DAE (or DAEs), including through a dual-initiator failure, and that loss of access makes configured RAID members inaccessible, the system vaults.
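
Putting the two trigger families together, the decision logic described above can be summarised in a few lines of illustrative Python. This is a simplification of the described behaviour, not actual HYPERMAX OS logic.

```python
# Simplified summary of the vault-trigger logic described above.
# Purely illustrative; this is not actual HYPERMAX OS behaviour.

def needs_to_vault(state):
    # Internal availability triggers: global memory data is at risk.
    internal = (
        state["vault_flash_space"] <= state["n_gm_copies_size"]   # vault flash availability
        or state["mirrored_pair_both_unhealthy"]                  # GM availability
        or state["both_fabric_switches_unhealthy"]                # fabric availability
    )
    # External availability triggers: data is intact, but vaulting
    # improves system preservation.
    external = (
        state["engine_failed"]                                    # engine trigger
        or state["raid_members_lost_to_dae_failure"]              # DAE trigger
    )
    return internal or external

state = {
    "vault_flash_space": 4,            # illustrative units
    "n_gm_copies_size": 4,
    "mirrored_pair_both_unhealthy": False,
    "both_fabric_switches_unhealthy": False,
    "engine_failed": False,
    "raid_members_lost_to_dae_failure": False,
}
print(needs_to_vault(state))           # True: flash space has shrunk to the NTV threshold
```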

InfiniBand (IB) module: Each director's IB (fabric) I/O module connects to dual InfiniBand switches housed in the matrix interface board enclosures (MIBEs), also referred to as the fabric, which form the interconnect of the Dynamic Virtual Matrix.

Dynamic Virtual Matrix

DVM enables hundreds of CPU cores to be pooled and allocated on-demand to meet the performance requirements for dynamic mixed workloads and is architected for agility and efficiency at scale. Resources are dynamically apportioned to host applications, data services, and storage pools to meet application service levels. This enables the system to automatically respond to changing workloads and optimize itself to deliver the best performance available from the current hardware.

The Dynamic Virtual Matrix provides a fully redundant architecture with fully shared resources within a dual-controller node and across multiple controllers. DVM provides a dynamic load distribution architecture and is essentially the BIOS of the VMAX operating software: a truly scalable multi-controller architecture that scales and is managed from two fully redundant storage controllers up to sixteen fully redundant storage controllers, all sharing common I/O, processing, and cache resources.

DVM Functioning: The Dynamic Virtual Matrix uses InfiniBand (56 Gb/s) technology to carry control, metadata, and user data through the system. This technology connects all of the engines in the system to provide a powerful form of redundancy and performance. This allows the engines to share resources and act as a single entity while communicating. In any system that has two or more engines, there are two redundant matrix interface board enclosures (MIBEs) that connect to the fabric I/O modules of each director board. The purpose of the Dynamic Virtual Matrix is to create a communication interconnect between all of the engines, so a single-engine system does not require a Dynamic Virtual Matrix or MIBEs.

The VMAX 100K and 200K contain two 12-port MIBEs, whereas the 400K contains two 18-port MIBEs.

VMAX3 — Multi Core Technology:

The VMAX3 system can focus hardware resources (cores) where data services need them. The previous VMAX architecture (10K, 20K and 40K) dedicates a single, hard-wired core to each dual port for FE or BE access, regardless of changes in data service demand.

The VMAX3 architecture instead provides a CPU pooling concept: a set of threads runs on a pool of cores, and each pool provides a service for FE access, BE access, or a data service such as replication. The default configuration is balanced across FE ports, BE ports, and data services.

A unique feature of VMAX3 allows the system to provide the best possible performance even when the workload is not well distributed across the various ports, drives, and central data services, for example when a single FE port pair is under 100% load. In that case, all of the FE cores can be devoted to the heavily utilized port pair for a period of time.

There are three core allocation policies:

  • Balanced (default).
  • Front-end.
  • Back-end.

EMC Services can shift the ‘bias’ of the pools between balanced, front-end (e.g., lots of small host I/Os and high cache hits), and back-end (e.g., write-heavy workloads); EMC has indicated that this will become dynamic and automated over time. This change can only be made by EMC personnel.
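
A rough sketch of what pooled, biased core allocation might look like is shown below. The pool ratios and core count are invented to illustrate the idea and do not reflect EMC's actual scheduling.

```python
# Rough illustration of pooled core allocation with a configurable bias.
# Pool names mirror the three policies above; the ratios and core count are invented.

BIAS_RATIOS = {
    "balanced":  {"front_end": 1, "back_end": 1, "data_services": 1},
    "front_end": {"front_end": 2, "back_end": 1, "data_services": 1},
    "back_end":  {"front_end": 1, "back_end": 2, "data_services": 1},
}

def allocate_cores(total_cores, policy):
    """Split a director's cores across the three pools according to the bias."""
    ratios = BIAS_RATIOS[policy]
    weight = sum(ratios.values())
    pools = {name: total_cores * r // weight for name, r in ratios.items()}
    # Hand any rounding remainder to the front-end pool so every core is assigned.
    pools["front_end"] += total_cores - sum(pools.values())
    return pools

print(allocate_cores(24, "balanced"))   # {'front_end': 8, 'back_end': 8, 'data_services': 8}
print(allocate_cores(24, "back_end"))   # {'front_end': 6, 'back_end': 12, 'data_services': 6}
```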
