VSP G1000 Architecture & Theory of Operation

Abid
Aug 30, 2019 · 14 min read


Along with the product hardware and architecture, I am going to discuss the detailed technical functioning of each subsystem. This blog is a bit lengthy, but it will serve as an architecture primer.

VSP G1000 is a high-performance unified block & file enterprise storage system offering three times the I/O throughput of VSP and capable of delivering 4.8 million IOPS. G1000 incorporates the Storage Virtualization Operating System (SVOS), which abstracts virtual layers from the hardware and enables multiple physical storage systems to be managed as a single system. SVOS also provides non-disruptive data movement between hardware platforms. G1000 also provides many other features such as Global-Active Device (GAD), Virtual Storage Machine (VSM) and enhanced operational capabilities compared to VSP.

The G1000 architecture inherits the HiStar-E Network and Virtual Storage Directors (VSDs) from VSP, which provide unparalleled performance and fault tolerance.

Let’s look at the hardware of a fully populated G1000 configuration.

  • G1000 can be configured with one or two controller chassis and can scale from a single-rack system to a maximum of six racks.
  • A fully configured G1000 can have up to 12 drive chassis and 2 controller chassis.
  • Each controller rack (DKC rack) contains the controller chassis and up to 2 drive chassis. Each additional rack (DKU/disk rack) can contain 2 drive chassis, which can be an intermix of 16U SFF or LFF drive chassis.
  • The controller chassis contains the control logic, processors, memory and the interfaces to hosts & drives. The drive chassis contains drives, power supplies and the interface circuitry that connects to the controller.
  • VSP G1000 (6 racks) is configured as 2 modules: the Basic Module and the Optional Module.
  • The Basic Module and Optional Module are referred to as Module 0 and Module 1, respectively.
  • Module-0 and Module-1 are connected via the Grid Switches (GSW).

DKC — Disk Controller Unit | DKU — Disk Unit

Each frame has a two-digit identifying number. The two DKC racks are RK-00 and RK-10, respectively. The HDU racks associated with a DKC rack use the same left digit followed by a 1 or 2, depending on their position relative to the DKC rack.

System configuration starts from Controller Chassis 0 and scales to the right with 2 disk chassis racks, then DKC 1, scaling towards the left with 2 more disk chassis racks. This scaling approach works well when a customer starts from a minimal system with a single DKC and scales all the way to a fully populated configuration. From a performance perspective, a system configured with 2 DKCs scales on both sides simultaneously, balancing the resources.
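
As a rough illustration of the numbering and scaling rules above, here is a minimal Python sketch; only the rack-ID format comes from the text, while the function itself is my own illustration.

```python
# Minimal sketch of the G1000 rack numbering rule described above.
def rack_ids(modules=2, hdu_racks_per_module=2):
    """Return rack IDs for a configuration with the given number of modules."""
    ids = []
    for module in range(modules):                 # Module-0 (Basic), Module-1 (Optional)
        ids.append(f"RK-{module}0")               # DKC rack of this module
        for pos in range(1, hdu_racks_per_module + 1):
            ids.append(f"RK-{module}{pos}")       # HDU racks beside the DKC rack
    return ids

print(rack_ids())  # ['RK-00', 'RK-01', 'RK-02', 'RK-10', 'RK-11', 'RK-12']
```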

Below are physical-view schematics of the VSP G1000 storage array (front & rear views) from HiTrack, for reference.

Controller Chassis (DKC) Components: the Hitachi factory names and Hitachi Data Systems names of the components are listed below.

The controller chassis includes the logical components, memory, SAS drive interfaces, and host interfaces. If a two-controller system has two service processors (SVPs), both are mounted in controller chassis #0. The controller chassis includes the following maximum numbers of components.

  1. PCI Express Switch (ESW) or Grid Switch (GSW) — 4 switches
  2. Processor Boards (MPB) or Virtual Storage Directors (VSD) — 8 VSDs installed in 4 pairs.
  3. Cache Memory Adapter (CMA) or Data Cache Adapter (DCA) — 4 DCAs installed in 2 pairs.
  4. Channel Adapter (CHA) or Front End Directors (FED) — 12 FEDs installed in 6 pairs.
  5. Disk Adapter (DKA) or Back End Directors (BED) — 4 BEDs installed in 2 pairs.
  6. Service processor (SVP) — 2 SVPs, or 1 SVP and 1 Hub
  7. Control Panel
  8. Cache Backup Module (BKM) — 4 modules installed in 2 pairs.
  9. Power Supply — 4
  10. Cooling fan — 10

The controller is logically divided in the center. Each side of the controller is a cluster that works in parallel with the other cluster. The virtual storage directors, front-end directors, back-end directors, cache, and cache backup kits are installed symmetrically across the two clusters and work as one system. The VSDs and Cache Adapters are located on the front of the controller chassis. The rear side of the controller includes 12 configurable I/O slots. Four of the slots support either FEDs or BEDs, and four of the slots support either FEDs or VSD pairs.

The Hitachi factory name for the first 5 components together is “Logic Box”, and the directors are called “Logic Boards”. Hitachi uses multiple names for the same component, which often leads to confusion; for example, VSDs are shown as MPBs in the system.
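
To keep the aliases straight, here is the factory-name to HDS-name mapping from the component list above, expressed as a small Python dictionary purely for quick reference.

```python
# Hitachi factory name -> Hitachi Data Systems name, per the component list above.
FACTORY_TO_HDS = {
    "ESW": "GSW (Grid Switch)",
    "MPB": "VSD (Virtual Storage Director)",
    "CMA": "DCA (Data Cache Adapter)",
    "CHA": "FED (Front End Director)",
    "DKA": "BED (Back End Director)",
}

print(FACTORY_TO_HDS["MPB"])  # the system shows VSDs as MPBs
```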

Logic Box & Logic Boards

VSP G1000 uses 5 types of Logic Boards (directors). The HiStar-E Network, which is often referred to as the VSP G1000 architecture, is actually the Grid Switch, one of the 5 Logic Boards. The Grid Switch board interconnects the other 4 Logic Boards.

Let’s take a detailed look at each of the Logic Boards.

HiStar-E Grid Switch (GSW):

The HiStar-E Network is basically a PCI Express grid switch system. There are either 2 or 4 GSW boards per controller chassis, for a total of 8 GSW boards in a dual-controller-chassis system. In a dual-controller-chassis configuration, there are cross connections between the corresponding grid switches in each logic box.

In order to install more than 2 FED boards and 1 BED board, all 4 GSWs are required. Each GSW board has 24 full-duplex ports, used as follows (see the sketch after this list):

  • 4 ports are dedicated to the VSD boards; these ports only see job request traffic and system metadata.
  • 4 ports are dedicated to the cross connection to the corresponding GSW board in the second chassis.
  • 8 ports are dedicated to the DCA boards; these see reads/writes and control memory updates.
  • 8 ports are dedicated to the FED and BED boards; these see data and metadata.
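
The port budget can be summarized in a tiny sketch; the counts come from the list above, and the assertion simply checks that they account for all 24 ports.

```python
# Per-GSW port budget as described above; each entry is "purpose: number of ports".
GSW_PORT_BUDGET = {
    "VSD boards (job requests, system metadata)": 4,
    "cross connection to the GSW in the second chassis": 4,
    "DCA boards (reads/writes, control memory updates)": 8,
    "FED/BED boards (data and metadata)": 8,
}

assert sum(GSW_PORT_BUDGET.values()) == 24  # all 24 full-duplex ports accounted for
```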

GSWs are not inter-connected within a chassis. Every FED and BED board in a chassis is connected to two GSWs, while each VSD and DCA board is connected to all four GSWs.

Virtual Storage Director Board (VSD): the processing brain of VSP G1000

The VSD contains an Intel Xeon 2.1GHz 8-core microprocessor and is basically an I/O processor board that controls the FEDs, BEDs, local memory and communication with the SVP. VSDs are independent of FEDs and BEDs. User data never passes through a VSD board; VSDs only execute I/O requests and tell the FEDs and BEDs what to do and where in cache to operate. Each VSD board has 4GB of local DRAM and a flash memory device that contains the system microcode and the current configuration. These flash memory devices are the “boot devices” for the subsystem.

In USP V, the firmware code used to be in separate modules for each component (FEDs, BEDs, and so on), whereas from VSP onwards the firmware is loaded into the VSDs as a whole package. The system core schedules a process depending upon the nature of the job it is executing. Any function or task is executed as a process and will be one of the following 5 processes (see the sketch after this list):

  • Target process — manages host requests to and from a FED board for a particular LDEV.
  • External (virtualization) process — manages requests to or from a FED port used in virtualization mode (external storage). Here the FED port is operated as if it were a host port in order to interoperate with the external subsystem.
  • BED process — manages the staging or destaging of data between cache blocks and internal drives via a BED board.
  • HUR Initiator (MCU) process — manages the “respond” side of a Hitachi Universal Replicator connection on a FED port.
  • RCU Target (RCU) process — manages the “pull” side of a remote copy connection on a FED port.
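
As a memory aid, the five process types can be expressed as a simple enum; the member names are my own shorthand, not Hitachi identifiers.

```python
from enum import Enum

# The five VSD process types described above (names are illustrative shorthand).
class VsdProcess(Enum):
    TARGET = "host I/O to and from a FED port for a particular LDEV"
    EXTERNAL = "FED port acting as an initiator towards an external (virtualized) subsystem"
    BED = "staging/destaging between cache and internal drives via a BED board"
    HUR_INITIATOR = "respond side of a Hitachi Universal Replicator link (MCU)"
    RCU_TARGET = "pull side of a remote copy connection (RCU)"

for p in VsdProcess:
    print(p.name, "-", p.value)
```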

In addition, there is subsystem software such as HDP and housekeeping processes executed by the various VSDs. Each VSD communicates over the switched grid network with the Data Accelerator processors (DA, explained in the FED section) on each FED or BED board. The Data Accelerator processor (on FEDs and BEDs) functions as a data pump that moves data between the host and cache (FEDs) and between the cache and the drives (BEDs) under the direction of the VSD boards.

Each VSD board’s local DRAM is partitioned into 9 spaces: 8 spaces are occupied by “private work spaces” for the 8 cores, and 1 space is reserved for the “shared control memory region”. The master Control Memory, located in the first 2 slots of global cache, contains the master copy of system data and metadata. LDEVs in the system are not managed by all VSDs together; LDEV ownership is assigned to VSDs in a round-robin fashion so that each VSD takes ownership of an equal number of LDEVs. However, all LDEVs from a parity group are assigned to the same VSD. All metadata associated with the LDEVs managed by a particular VSD is located in its local Control Memory space, so metadata queries occur at very high speed from local control memory. All updates to the metadata for these LDEVs write through local memory over a GSW path into the master copy of Control Memory located in cache (the first two DCA boards).
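
A minimal sketch of that ownership rule, assuming hypothetical parity-group and VSD identifiers: parity groups are dealt out round-robin, so every LDEV of a group ends up on the same VSD.

```python
# Sketch: assign LDEV ownership to VSDs round-robin per parity group, so that
# all LDEVs of a parity group are owned by the same VSD (as described above).
def assign_ownership(parity_groups, vsd_ids):
    """parity_groups: dict mapping parity-group name -> list of LDEV IDs."""
    ownership = {}
    for i, (pg, ldevs) in enumerate(sorted(parity_groups.items())):
        owner = vsd_ids[i % len(vsd_ids)]        # round-robin over the VSD boards
        for ldev in ldevs:
            ownership[ldev] = owner
    return ownership

# Hypothetical example: four parity groups spread over two VSD boards.
pgs = {"1-1": ["00:01", "00:02"], "1-2": ["00:03"], "2-1": ["00:04"], "2-2": ["00:05"]}
print(assign_ownership(pgs, ["MPB-1A", "MPB-1B"]))
```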

A VSD will accept all I/O requests for an LDEV it owns, regardless of which FED board handled the host request or which BED is the target of the request. Each VSD interacts with all installed FEDs and BEDs (across both chassis). Adding VSD boards to a subsystem simply increases the I/O processing power (I/O jobs and software) of that subsystem. Each FED board maintains a local copy of the LDEV-to-VSD mapping tables in order to know which VSD owns which LDEV. No other VSD board is ever involved in these operations unless the owning VSD board fails. If a VSD board should fail, all of its assigned LDEVs are temporarily reassigned to the other VSD board in that same feature. The fail-over VSD reads all of the necessary Control Memory information from the shared master Control Memory tables maintained in the cache system. Upon replacement of the failed VSD board, the reverse occurs, and the original LDEV ownership reverts to the new board. As a rule of thumb, a VSD board should be kept below 70% busy to manage host latencies, and below 35% busy if providing processor headroom to maintain host performance in case one VSD board of a pair fails.
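
A rough sketch of the failover behaviour and the busy-percentage rule of thumb; the 70%/35% thresholds come from the text, while the data structures and names are hypothetical.

```python
# Sketch of the VSD failover and sizing guideline described above.
NORMAL_BUSY_LIMIT = 0.70        # keep a VSD below ~70% busy for host latency
FAILOVER_HEADROOM_LIMIT = 0.35  # ~35% if one board of the pair may fail over

def vsd_load_ok(busy_fraction, plan_for_failover=True):
    limit = FAILOVER_HEADROOM_LIMIT if plan_for_failover else NORMAL_BUSY_LIMIT
    return busy_fraction <= limit

def fail_over(ownership, failed_vsd, partner_vsd):
    """Temporarily reassign every LDEV owned by the failed VSD to its partner."""
    return {ldev: (partner_vsd if owner == failed_vsd else owner)
            for ldev, owner in ownership.items()}

print(vsd_load_ok(0.40))                               # False: no headroom for a failover
print(fail_over({"00:01": "MPB-1A"}, "MPB-1A", "MPB-1B"))
```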

Data Cache Adapter (DCA) Boards: Cache path control Adapters (CPA)

Data Cache Adapters are the memory boards that service all user read/write data. 4 DCA boards can be installed in 2 pairs per controller chassis, for a total of 8 DCA boards per dual-controller subsystem, providing 2TB of global cache. Global cache is segmented into 64 KB blocks, and an internal subsystem algorithm assigns a certain number of blocks to writes and reads. There is no fixed ratio; the system changes the ratio depending on the usage profile.

DCA boards contain the master copy of metadata, called “Control Memory”. The Control Memory region contains the subsystem configuration details, parity group/LDEV/VDEV data, tables to manage replication operations, and external I/O & HDT/HDP control information. The first 2 DCA boards in the DKC-0 chassis have a reserved region of 112GB (4 x 28GB each, mirrored) used for the subsystem master copy of Control Memory. Each DCA board also has a 500MB region reserved for the Cache Directory, the mapping table that manages pointers from LDEVs to allocated cache slots on the DCA boards. The remaining space is allocated to the VSD boards as 64KB segments for data buffers when required.
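
A back-of-the-envelope sketch of that carve-up. The 112GB Control Memory, 500MB directory and 64KB segment size come from the text; the 256GB per-board capacity (2TB across 8 boards) and the even split of Control Memory across the first pair are my assumptions for illustration.

```python
# Sketch: usable 64 KB data-cache segments per DCA board after the reserved
# regions described above. Board capacity and the Control Memory split across
# the first pair are assumed values for illustration only.
BOARD_CAPACITY_GB = 256          # assumed: 2 TB global cache / 8 DCA boards
CACHE_DIRECTORY_GB = 0.5         # ~500 MB Cache Directory on every DCA board
CONTROL_MEMORY_TOTAL_GB = 112    # master Control Memory on the first 2 DCA boards
SEGMENT_KB = 64                  # global cache is managed in 64 KB segments

def data_segments(first_pair_board=False):
    reserved = CACHE_DIRECTORY_GB
    if first_pair_board:
        reserved += CONTROL_MEMORY_TOTAL_GB / 2   # assumed even split across the pair
    usable_gb = BOARD_CAPACITY_GB - reserved
    return int(usable_gb * 1024 * 1024 // SEGMENT_KB)

print(data_segments(first_pair_board=True), data_segments(first_pair_board=False))
```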

Most control memory reads are handled within the VSD boards from a local copy of the metadata, without reaching the master copy of Control Memory on the DCA boards. The only time a processor board normally accesses the master copy of the Control Memory region in global cache is when updating table information, when taking over for a failed partner VSD (of the same installed feature), or after replacing a failed VSD board.

Memory Organization
The VSP G1000 has three parallel logical memory structures that manage access to data, metadata, control tables and subsystem software. These include:

  • Control Memory for configuration data, metadata and control tables. Kept in local DRAM on each VSD board, along with a master global copy in a region of shared cache.
  • VSD Cache Regions (primary cache region for user data blocks, including the mirrored write blocks, parity blocks, and Copy Product data blocks).
  • Local workspace for the processors on each VSD, FED and BED board.

The VSP G1000 has three types of physical memory systems, which include:

  • Cache System (on the Data Cache Adapters) — the subsystem’s primary cache space of up to 2TB per dual chassis system contains:
    - Control Memory master copy on first two DCA boards (4 x 28GB each, mirrored)
    - Data Cache Directories (500MB, one on each DCA board)
    - Data Cache Regions (one per VSD board) allocated from all of the remaining space.
  • VSD Local Memory (LM) on each VSD board, used for process execution by each Intel processor core and for a local copy of part of Control Memory.
  • FED and BED Local Memory on each FED and BED board for temporarily buffering data blocks, as well as holding control information such as the LDEV to VSD mapping tables.

Channel Adapter Boards (FEDs):

There are up to 12 FED boards, installed in 6 pairs per chassis, in a maximum configuration. FED boards control the interface between the subsystem and the hosts. VSP G1000 offers various FED port protocols and a variety of FED boards.

Protocols:

  • iSCSI
  • Fibre Channel
  • FICON (shortwave & longwave)
  • Fibre Channel over Ethernet (FCoE)

A two-controller system supports the following maximum number of connections via the FEDs:

  • 192 Fibre Channel ports (16 Gbps, 16-port)
  • 192 Fibre Channel ports (8 Gbps, 16-port)
  • 96 Fibre Channel ports (16 Gbps, 8-port)
  • 176 FICON ports (16 Gbps, 16-port) available in longwave and shortwave versions
  • 176 FICON ports (8 Gbps, 16-port) available in longwave and shortwave versions
  • 192 FCoE ports (10 Gbps, 16-port)
  • 88 iSCSI ports (10 Gbps, 8-port)

Let’s see how the FED boards function, taking the 16-port FED configuration as an example.

The 16-port open fibre option consists of 2 boards, each with 8Gbps or 16Gbps ports. Each board has 1 Data Accelerator processor, 2 Tachyon processors (port processors) and local DRAM. The FED board does not decode or execute I/O commands using any of these onboard processors. The Data Accelerator processor communicates with the VSD boards, accepting or responding to host I/O commands: it directs a host I/O request to the particular VSD managing the respective LDEV. The VSD processes the command, manages the metadata in Control Memory, and then sends instructions to the Data Accelerator processors in the FEDs, which then transfer data between the host and cache, between virtualized subsystems and cache, or between HUR operations and cache. The VSD that owns an LDEV tells the FEDs and BEDs where to read or write the data in cache. The local DRAM is used for buffering host data, maintaining the LDEV mapping tables (to the managing VSDs), and housekeeping tables.
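
A minimal sketch of that dispatch path on a FED board; the class, names and structures are hypothetical, and only the lookup-and-forward behaviour comes from the text.

```python
# Sketch: the FED's Data Accelerator does not execute I/O commands itself; it
# looks up the owning VSD in its local LDEV-to-VSD table and forwards the job.
class FrontEndDirector:
    def __init__(self, ldev_to_vsd):
        self.ldev_to_vsd = ldev_to_vsd           # local copy of the mapping table

    def handle_host_command(self, ldev, command):
        owner = self.ldev_to_vsd[ldev]           # which VSD owns this LDEV
        # The owning VSD replies with instructions (e.g. cache addresses), and
        # only then does the Data Accelerator move data between host and cache.
        return {"forward_to": owner, "ldev": ldev, "command": command}

fed = FrontEndDirector({"00:01": "MPB-1A", "00:02": "MPB-1B"})
print(fed.handle_host_command("00:01", "READ"))
```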

Disk Adapter Boards (BED):

There can be a maximum of 2 pairs of BED boards per controller chassis. Each BED board has 8 x 6Gbps full-duplex SAS links. BED boards execute all I/O jobs received from the VSD processor boards and control all reading and writing to the drives via the SAS controller chips. BEDs do not understand the notion of an LDEV; they operate on a drive ID, a block address and a cache address.

Like the FEDs, the BEDs are built around Data Accelerators. Each BED board has 2 Data Accelerator processors, 2 SPC SAS I/O controller processors, and two banks of local DRAM, so there is a lot of processing power on each BED board. The local DRAM is used for buffering data, job requests from the VSDs, and other housekeeping tables. There are four GSW switch ports, over which pass job requests to the VSDs and user data moving to or from cache.

The SPC (Stored Program Control) processor is programmed with algorithms to drive high-performance links to the drives. Each SPC processor contains a high-performance CPU that can directly control SAS or SATA drives (using SAS link protocol encapsulation) over its four full-duplex 6Gbps SAS links. The SPC creates a direct connection with one drive per link via the DKU switches when it needs to communicate with a drive; there are no loops as with Fibre Channel. The speed at which a link is driven (3Gbps or 6Gbps) depends upon the interface speed of the drive selected for that individual I/O operation: drives with a 3Gbps interface are driven at that rate by the SAS link, and 6Gbps drives are driven at the higher rate whenever a BED SAS link is communicating with them. The speed used on each SAS link is thus dynamic, and depends upon the individual connection made through the switches moment by moment.
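
The per-connection speed selection can be pictured with a tiny sketch; the 3Gbps/6Gbps values come from the text, and the rest is illustrative.

```python
# Sketch: a BED SAS link runs at the interface rate of whichever drive it is
# switched to for the current operation (no shared loop speed, unlike FC-AL).
def connect(spc_link, drive):
    """drive: dict with an 'interface_gbps' of 3 or 6, per the text above."""
    speed = drive["interface_gbps"]              # link speed follows the drive
    return {"link": spc_link, "drive": drive["id"], "speed_gbps": speed}

print(connect("BED1-SPC0-L0", {"id": "HDD-042", "interface_gbps": 6}))
print(connect("BED1-SPC0-L0", {"id": "HDD-117", "interface_gbps": 3}))
```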

Service Processor (SVP):

The SVP is a blade PC running the Windows operating system, installed in the DKC-0 chassis, and used for system configuration and reporting. It contains the Device Manager — Storage Navigator software that is used to configure and monitor the system. Connecting the SVP to a service center enables the storage system to be remotely monitored and maintained by the support team.

When a second SVP is installed, the primary SVP is the active SVP, and the secondary SVP is the hot idle SVP with active Windows. A hub facilitates the transfer of information from the VSD pairs to the primary SVP.

Cache Backup Module (BKM): Cache Flash Memory

Backup memory modules are installed in pairs, called backup memory kits. Each module contains two batteries and cache flash memory: either a 128 GB SSD flash drive (small kit) or a 256 GB SSD flash drive (large kit). If the power fails, the cache is protected from data loss by the backup batteries and the cache flash memory (SSD). The batteries keep the cache alive for up to 32 minutes while the data is copied to the SSD.

There are 4 Cache Adapters and 4 cache backup modules per controller chassis. The cache flash memory in each BKM is directly connected to its corresponding Cache Adapter blade and backs up the data in that blade if the power fails. When data that is not already stored on disk is written to the cache, it is written to one blade of the Cache Adapter and mirrored to the other. If one BKM box fails, or one phase of the power fails, the other BKM box backs up the mirrored data from its corresponding Cache Adapter blade, and no data is lost. Even in the unlikely event that a BKM box has failed and a full power failure then occurs, the other BKM box backs up the mirrored data from its Cache Adapter blade and no data is lost.
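
A simplified sketch of that protection path: dirty writes are mirrored across the two cache blades, and on power failure each surviving BKM destages its blade's dirty data to its SSD. The 32-minute hold time is from the text; everything else is an illustrative assumption.

```python
# Sketch of the cache protection path described above (illustrative only).
BATTERY_HOLD_MINUTES = 32   # batteries keep cache alive while data is copied to SSD

def write_dirty(block, blade_a, blade_b):
    blade_a["dirty"].append(block)     # primary copy in one Cache Adapter blade
    blade_b["dirty"].append(block)     # mirrored copy in the other blade

def power_fail(blades, failed_bkm=None):
    """Each surviving BKM copies its blade's dirty data to its own SSD."""
    for i, blade in enumerate(blades):
        if i != failed_bkm:
            blade["ssd"] = list(blade["dirty"])   # completes within the battery hold time
    return blades

blades = [{"dirty": [], "ssd": []}, {"dirty": [], "ssd": []}]
write_dirty("block-7", *blades)
print(power_fail(blades, failed_bkm=0))  # blade 1 still holds the mirrored copy on SSD
```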
