
Deploying a Microsoft High Performance Cluster on AWS using Infrastructure as Code.

16 May 2020 | 30 minute read

Microsoft HPC Pack brings the power of high-performance computing (HPC) to the commercial mainstream. The centralized management and deployment interface helps to simplify deployment for both large and small compute clusters and provides a simple and effective management experience that increases cluster administrator productivity.

You can find more information about HPC on the Microsoft site.

When running on Microsoft Azure, you can use the ARM templates provided by Microsoft to install an HPC cluster. You can also use PowerShell scripts found on the internet to install HPC on Virtual Machines manually, logging on to a VM and installing everything as an administrator. Nowadays, we want to use Infrastructure as Code: scripts and pipelines that install infrastructure such as Virtual Machines on Microsoft Azure or EC2 Instances on Amazon Web Services (AWS). This way, we can redeploy the same setup to each new environment without worrying about run-books, configuration differences, and manual errors.

In this blog, you will learn how to deploy an HPC cluster on AWS using CloudFormation templates (in YAML) and PowerShell scripts. I will also highlight some things to keep in mind when installing an HPC cluster.

CloudFormation templates

When you want to use Infrastructure as Code to deploy EC2 instances on AWS, you can use CloudFormation files (in YAML). You can deploy separate instances by creating a template for a single EC2 instance, or you can use an Auto Scaling Group to deploy multiple instances at the same time. You define a minimum and a maximum number of instances, and AWS makes sure the minimum number of instances is always running. We will create three Auto Scaling Groups: one for the SQL Server, one for the HeadNode, and one for the ComputeNodes.

To install the HPC software, we will use the user data option in the CloudFormation templates. This installs the software when the machine boots; we will write the scripts so that the software is only installed and configured when it is not present yet. If the installation fails, we can simply terminate the instance, and the Auto Scaling Group will automatically start a new one.
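The pattern is the same in every script: do a cheap check first and only install when nothing is there yet. A minimal sketch of that pattern, using a marker file (the marker path is an assumption; the actual scripts later in this post check for an installed Windows service instead):

# Minimal idempotency sketch: only run the installation once per instance
$marker = "C:\ProgramData\installdata\.install-done"
if (-not (Test-Path $marker)) {
    # ... install and configure the software here ...
    New-Item -ItemType File -Path $marker -Force | Out-Null
}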

Before we can install the HPC software, we need to make sure it is available on the instances. We used Packer to create Amazon Machine Images (AMIs) and Chocolatey to install the AWS Command Line Interface. With this CLI, we can run AWS commands to copy software from an S3 bucket to our AMI.
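The Packer build itself is out of scope for this blog, but the copy step is a single CLI call. A sketch of what runs inside the image build (the bucket name and prefix are assumptions; the target folder matches the $tgtdir used later in this post):

# Copy the HPC Pack installation files from an S3 bucket into the image
aws s3 cp "s3://my-install-bucket/hpc/2016/" "C:\ProgramData\installdata\HPC\2016\" --recursive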

An HPC cluster has three components we need to install: a Microsoft SQL Server, one or more HeadNodes, and one or more ComputeNodes.

SQL Server

The first thing we need is a running Microsoft SQL Server instance. We used the SQL Server AMI that Microsoft provides on AWS as the base for our own AMI, which we will use to create our database server. As mentioned before, we installed the AWS CLI on the AMI. Because the AMI comes with SQL Server already installed, we do not need to worry about installing the SQL Server software.

When downloading the HPC Pack software, Microsoft provides PowerShell scripts to create and install the databases on the SQL Server instance. The instructions, however, require you to log in to the machine and run the scripts as an administrator in PowerShell. We changed the scripts for creating the databases a little: we want to create the databases only the first time we create the SQL Server; the next time we create a SQL instance, we want to reuse the existing databases instead of starting all over again. We will create the databases on the D-drive, which lives on a separate volume. You can reuse volumes on AWS instances; you can find out how in an earlier blog I wrote.

Create the databases

Before you can create a database in a folder on a drive, you need to make sure the SQL Server user has permission to create it. First, create the folder that will store the database, and then allow SQL Server to create the database files in that folder. We will save this script as setupsqlinstance.ps1.

# Create the folder that will hold the database files
$path = "D:\SQL"
New-Item -ItemType Directory -Force -Path $path

# Grant the SQL Server service account full control on the folder
$Acl = (Get-Item $path).GetAccessControl('Access')
$Ar = New-Object System.Security.AccessControl.FileSystemAccessRule("NT SERVICE\MSSQLSERVER", "FullControl", "ContainerInherit,ObjectInherit", "None", "Allow")
$Acl.SetAccessRule($Ar)
Set-Acl $path $Acl

Next, we need a SQL script to create the databases. To run an HPC cluster, you need the following databases:

  • HPCManagement
  • HPCScheduler
  • HPCReporting
  • HPCDiagnostics
  • HPCMonitoring

This SQL script creates the HPCManagement database if it does not exist; if the database files are already on disk, it reattaches them. We build an @SQL_CREATE_DB statement to create the database, and if the database files exist, we append a FOR ATTACH clause to reuse them and register the database in SQL Server. Next, we verify the database is not already registered and execute the query. The script below creates one database; you can extend it to create all the databases you need, using the same size and growth settings. We will save this script as CreateHpcDatabase.sql.

USE master
GO

DECLARE @hpc_datapath nvarchar(256);
SET @hpc_datapath = 'D:\SQL';
DECLARE @DBNAME nvarchar(256);
DECLARE @SQL_CREATE_DB nvarchar(2000);
DECLARE @exist INT

SET @DBNAME = @hpc_datapath +'\HPCManagement.mdf'
exec master.dbo.xp_fileexist @DBNAME, @exist OUTPUT
SET @exist = CAST(@exist AS BIT)

SET @SQL_CREATE_DB = 'CREATE DATABASE HPCManagement 
ON (
	NAME = HPCManagement_data, 
	FILENAME = ''' + @hpc_datapath + '\HPCManagement.mdf'', 
	size = 1024MB, 
	FILEGROWTH  = 50% 
) 
LOG ON 
( 
	NAME = HPCManagement_log,
	FILENAME = ''' + @hpc_datapath +'\HPCManagement.ldf'',
	size = 128MB,	
	FILEGROWTH  = 50% 
)'

IF (@exist = 1)
  SET @SQL_CREATE_DB = @SQL_CREATE_DB + ' FOR ATTACH;'

IF (NOT EXISTS (SELECT name FROM master.dbo.sysdatabases WHERE ([name] = 'HPCManagement' ) ) )
IF NULLIF(@hpc_datapath, '') IS NOT NULL
EXECUTE (@SQL_CREATE_DB)

Setting Up Mixed Mode Authentication

Now that we have a SQL script to create the databases, we need a PowerShell script to run it. To connect to our SQL Server, we will use SQL Server authentication. The default authentication mode in SQL Server is Windows authentication, so we need to change this. We can do that from within SQL Server, but we need to restart the SQL Server services to activate it. We created a script containing the single line below and saved it as update_sql_mixed_mode.sql.

EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'LoginMode', REG_DWORD, 2 

The last step for the SQL Server is to connect all the pieces and create a PowerShell script we can run from the UserData option in the CloudFormation template. This PowerShell script runs our SQL scripts on the SQL Server itself, so we can simply use localhost to connect.

Creating the users

One last thing before we can create our databases: we need a user for connecting to them. Microsoft provides two scripts in the HPC Pack to create database users; we just need to call them from the PowerShell script we are creating. We want to make sure we do not store usernames or passwords in our source code repository, and you probably also want different users for different environments. We used the AWS Parameter Store to hold the encrypted secrets; by calling aws ssm get-parameter we can fetch them from the store and use them in our script. If they change, we do not need to change any code, because they are fetched every time the script runs.
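For completeness, storing such a secret is a one-time action. A sketch of how the password could be stored as an encrypted SecureString parameter (the value shown is a placeholder):

# Store the SQL service account password once as an encrypted parameter
aws ssm put-parameter --name "/sql-server/service-account/password" --type "SecureString" --value "YourStrongPasswordHere" --region eu-central-1

With the parameters in place, the script below runs the SQL scripts and creates the users in the databases.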

$ServerInstance = "localhost"

Invoke-Sqlcmd -ServerInstance $ServerInstance -InputFile "C:\programdata\installdata\SQL\update_sql_mixed_mode.sql"

# stop / start sql server to activate mixed mode authentication
net stop sqlserveragent
net stop mssqlserver 

net start mssqlserver
net start sqlserveragent 

Invoke-Sqlcmd -ServerInstance $ServerInstance -InputFile "C:\programdata\installdata\SQL\CreateHpcDatabase.sql" 

$password = aws ssm get-parameter --name "/sql-server/service-account/password" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
$HpcUser = aws ssm get-parameter --name "/sql-server/service-account/username" --query "Parameter.Value" --with-decryption --output text --region eu-central-1

$ParameterArray = "TargetAccount=$HpcUser", "PassWord=$password"
$hpcDBs = @('HPCDiagnostics', 'HPCManagement', 'HPCMonitoring', 'HPCReporting', 'HPCScheduler')
foreach($hpcdb in $hpcDBs)
{
    Invoke-Sqlcmd -ServerInstance $ServerInstance -Database $hpcdb -InputFile "C:\programdata\installdata\SQL\AddDbUserForHpcSetupUser.sql" -Variable  $ParameterArray 
    Invoke-Sqlcmd -ServerInstance $ServerInstance -Database $hpcdb -InputFile "C:\programdata\installdata\SQL\AddDbUserForHpcService.sql" -Variable  $ParameterArray 
}

Setting up the CloudFormation template

We need to make sure these PowerShell scripts run every time our SQL Server instance boots. To make that happen, we add them to the UserData option of the LaunchTemplate in the CloudFormation file. We also created a createdisks.ps1 script that connects the disks every time the server reboots; you can find the link to that blog earlier in this post. Setting the persist tag to true ensures these scripts run every time the instance is booted.

  LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: !Ref AWS::StackName
      LaunchTemplateData:
        ImageId: 'ReferenceTheAMIFile'
        InstanceType: z1d.large
        IamInstanceProfile: 'YourProfile'
        SecurityGroupIds: 'YourSecurityGroup'
        TagSpecifications:
          - ResourceType: instance
            Tags:
              - Key: Name
                Value: SQLServer
              - Key: network-interface-manager-pool
                Value: SQLServer
        UserData:
          Fn::Base64: 
            Fn::Sub: |
              <powershell>
                Write-Host "Mounting volumes and disks"
                Invoke-Expression -Command "& 'C:\\programdata\\installdata\\createdisks.ps1' "
                Write-Host "Mounting done"
                
                Write-Host "Setup SQL Instance"
                Invoke-Expression -Command "& 'C:\\programdata\\installdata\\setupsqlinstance.ps1' "

                Write-Host "Create HPC databases"
                Invoke-Expression -Command "& 'C:\\programdata\\installdata\\setuphpcdatabase.ps1' "
              </powershell>
              <persist>true</persist>

So now we have a running SQL Server instance, and we are almost ready to connect the HeadNode. We created the user, so we should be able to connect to the instance, but we are in a cluster: if the SQL Server instance is recreated, its IP address and server name will change. We need something that does not change and that we can always use to make a connection. Therefore we will add a DNS record using AWS Route53 and point it to the SQL Server; later on, we will create another one for the HeadNode. We also give the instance an IP address that never changes.

First, you need to create a NetworkInterface; the network-interface-manager-pool tag connects it to the SQL Server instance.

  SQLServerNIC:
    Type: AWS::EC2::NetworkInterface
    Properties:
      SubnetId: 'YourSubNet'
      GroupSet: 'YourSecurityGroup'
      PrivateIpAddress: !Sub
        - ${IpAddress}
        - IpAddress: !Join
            - .
            -   - !Select [0, !Split [., !Select [0, !Ref Subnets]]]
                - !Select [1, !Split [., !Select [0, !Ref Subnets]]]
                - !Select [2, !Split [., !Select [0, !Ref Subnets]]]
                - '5'
      Tags:
        - Key: Name
          Value: SQLServer
        - Key: network-interface-manager-pool
          Value: SQLServer

Now we can create a RecordSetGroup and connect it to the IP address of the NetworkInterface. The Name of the RecordSet is the DNS name we can use to connect to our SQL Server instance.

  SQLServerDNSRecord:
    Type: AWS::Route53::RecordSetGroup
    Properties:
      HostedZoneId: !Ref PrivateHostedZoneId
      RecordSets:
        - Name: 'mssql.mydomain.com'
          Type: A
          TTL: '60'
          Weight: 1
          SetIdentifier: sql-server
          ResourceRecords:
            - !GetAtt SQLServerNIC.PrimaryPrivateIpAddress

So that's it for creating the SQL Server; now let's move on to the HeadNode.

HeadNode

HPC Pack includes a highly scalable job scheduler that provides support for interactive Service-Oriented Architecture (SOA) applications using High Performance Computing for Windows Communication Foundation (HPC for WCF) and parallel jobs using the Microsoft Message Passing Interface (MS-MPI). Essential applications from key independent software providers (ISVs) can be run on the cluster to help you meet your business needs in a timely, cost-effective, and highly productive manner.

The HeadNode makes sure the jobs for the HPC cluster are distributed correctly over the ComputeNodes. It holds the network topology and the node templates you can assign to the ComputeNodes. You can find more information about the complete HPC Pack on the Microsoft site; use the link at the top of this blog.

Create a self-signed certificate

The first thing we need to make sure of when installing the HPC Pack software on the HeadNode is that the SQL Server instance is up and running, because the installation of the HPC Pack verifies the connection to the SQL Server. Next, we need a certificate to communicate over HTTPS. Microsoft provides a PowerShell script called CreateCertificate.ps1 in the installation pack of the HPC software, and that is the certificate we will use in this blog. As you might know, you can only export a certificate if you use a password; the Microsoft script uses a default password. That is not what we want, so we changed a few lines in the PowerShell script but left the rest untouched.

At the top of the script, we added the following lines. First, we changed the CommonName into something we can recognize, and we changed the Path to the location where we want the certificate saved. As with the SQL Server users, we added another password to the AWS Parameter Store. We need this password to export the certificate, but we also need it to install the HeadNode software later on.

$CommonName = "HPC Pack 2016 Communication"
$Path = "C:\ProgramData\installdata\HPCCertificate.pfx"
$password = aws ssm get-parameter --name "/certificate/hpc/password" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
$password = $password | ConvertTo-SecureString -AsPlainText -Force

At the bottom of the script, we changed the lines below to make sure the certificate is saved to the Path we need and uses the password we stored in the AWS Parameter Store.

$PFXString = $enrollment.CreatePFX([Runtime.InteropServices.Marshal]::PtrToStringAuto([Runtime.InteropServices.Marshal]::SecureStringToBSTR($Password)), 0)
Set-Content -Path $Path -Value ([Convert]::FromBase64String($PFXString)) -Encoding Byte
# Remove the certificate from Cert:\CurrentUser\My\$thumbprint store
Remove-Item Cert:\CurrentUser\My\$thumbprint -Confirm:$false -Force

Installing the HPC Pack on the HeadNode

When we created the AMI for the HeadNode, we used the AWS CLI to copy the HPC Pack software from an S3 bucket to the AMI. We will add the installation script to the UserData of the EC2 instance. The script only needs to install the software when it is not already running, so at the top we add a check that verifies whether the scheduler service is installed.

# If the HpcScheduler service exists, HPC Pack was already installed by an earlier run of this script
if (Get-Service HpcScheduler -ErrorAction SilentlyContinue)
{
    Write-Host "HPC Pack already installed, skipping..."
    return
}

So if this service is not installed, we can install the HPC Pack on the HeadNode. The HPC Pack installer provides a command-line interface, where you can provide the parameters needed to install the HPC Pack unattended. Before we run the installer, we build the command-line arguments in a single string and pass it as the argument list to the installer.

We need the information about the certificate as well as the SQL Server and the SQL user for connecting to the databases. We get them from the AWS Parameter Store and store them in variables we need later on.

$certificate_password = aws ssm get-parameter --name "/certificate/hpc/password" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
$sql_password = aws ssm get-parameter --name "/sql-server/service-account/password" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
$sql_user = aws ssm get-parameter --name "/sql-server/service-account/username" --query "Parameter.Value" --with-decryption --output text --region eu-central-1

$tgtdir = "C:\ProgramData\installdata\HPC\2016\"
$certpath = "C:\ProgramData\installdata\HPCCertificate.pfx"

$ClusterName = "HeadNode.mydomain.com"
$SQLServerInstance = "mssql.mydomain.com"

Let's pass this information into one big argument list. We need to tell the HPC Pack software that we want to install the HeadNode version and that we want to run without any user interaction, so the first two parameters are -unattend and -HeadNode. We give our cluster a name using the HeadNode DNS entry we will create in our CloudFormation file. For the installation, we also need the certificate we just created and the password we used to export it to disk.

$setupArg = "-unattend -HeadNode -ClusterName:$ClusterName -SSLPfxFilePath:$certpath -SSLPfxFilePassword:$certificate_password"

It is time to create a connection to the databases on the running SQL Server instance. We need to specify a connection string for each database separately. They all live on the same server, so we can reuse the server connection with the sql_user and sql_password variables. Next, we specify the connection string for each of the databases and add them all to the setupArg variable. The HPC Pack installer will create tables and views in the databases using these connection strings.

$secinfo = "Integrated Security=False;User ID=$sql_user;Password=$sql_password"
$mgmtConstr = "Data Source=$SQLServerInstance;Initial Catalog=HpcManagement;$secinfo"
$schdConstr = "Data Source=$SQLServerInstance;Initial Catalog=HpcScheduler;$secinfo"
$monConstr  = "Data Source=$SQLServerInstance;Initial Catalog=HPCMonitoring;$secinfo"
$rptConstr  = "Data Source=$SQLServerInstance;Initial Catalog=HPCReporting;$secinfo"
$diagConstr = "Data Source=$SQLServerInstance;Initial Catalog=HPCDiagnostics;$secinfo"
$setupArg = "$setupArg -MGMTDBCONSTR:`"$mgmtConstr`" -SCHDDBCONSTR:`"$schdConstr`" -RPTDBCONSTR:`"$rptConstr`" -DIAGDBCONSTR:`"$diagConstr`" -MONDBCONSTR:`"$monConstr`""           

The last thing we need to do is install the HPC software on the EC2 instance by calling the installer and passing setupArg as the argument list. We wait until the process finishes using the -Wait option of the Start-Process command in PowerShell. Installing the HeadNode software may take a while. It also might fail; in that case, take a look at the log files in the Windows\Temp folder, where the HPC Pack creates a new folder every time the installer is started. So make sure you are able to log on to the server or can connect using remote PowerShell; the AWS Console offers options to connect using a command-line interface.

# Exit code 0 means success; 3010 means success but a reboot is required
# (LogWrite is a small logging helper from our scripts)
$p = Start-Process -FilePath "$tgtdir\setup.exe" -ArgumentList $setupArg -PassThru -Wait
if($p.ExitCode -eq 0)
{
    LogWrite "Succeeded to install HPC Pack HeadNode"
    break
}
if($p.ExitCode -eq 3010)
{
    LogWrite "Succeeded to install HPC Pack HeadNode, a reboot is required."
    break
}
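To quickly find the log folder of the most recent installation attempt, something like the sketch below helps (the folder naming inside Windows\Temp differs per HPC Pack version, so treat the listing as a starting point):

# Show the most recently written folders under Windows\Temp, newest first
Get-ChildItem "$env:windir\Temp" -Directory |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 5 FullName, LastWriteTime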

Configure the HeadNode

After the installation of the HPC Pack software, we need to configure the HeadNode. The HPC Pack provides a PowerShell module we can use to complete the configuration. After loading the snap-in, we can get the network interfaces. In this case, there are two, because we create two in the CloudFormation file, the same way as in the SQL Server deployment. We select the first one and set the HPC network topology to Enterprise using this network interface.

Enterprise network: An organizational network connected to the HeadNode and, in some cases, to other nodes in the cluster. The enterprise network is often the public or organization network that most users log on to perform their work. All intra-cluster management and deployment traffic is carried on the enterprise network unless a private network and an optional application network also connect the cluster nodes.

Add-PSSnapin Microsoft.HPC

# Optionally verify that the scheduler service is up before configuring the network
$status = (Get-Service -Name HpcScheduler -ErrorAction SilentlyContinue -ErrorVariable StErr | Select-Object -ExpandProperty Status)

# Select the first network interface and use it for the Enterprise topology
$NIC = Get-HpcNetworkInterface -ErrorAction Stop
Set-HpcNetwork -Topology Enterprise -Enterprise $NIC.Name[0] -EnterpriseFirewall $null

We have added the network setting to the cluster; the next step is to let it use the correct credentials when starting jobs. Again we get the information we need from the AWS Parameter Store and create a PSCredential from the username and password. We pass the PSCredential to the HPC cluster configuration cmdlets Set-HpcJobCredential and Set-HpcClusterProperty.

$username = aws ssm get-parameter --name "/service-account/username" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
$password = aws ssm get-parameter --name "/service-account/password" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
$password = $password | ConvertTo-SecureString -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential($username,$password)

Set-HpcJobCredential -Credential $credential
Set-HpcClusterProperty -InstallCredential $credential

The PowerShell module offers a lot of options for configuring the HPC Pack. We will not discuss them all here, but one thing you might need is adding users to the HPC Pack using Add-HpcMember with a -Role parameter; you can use Administrator, JobAdministrator, or User.

The last step is setting up the HeadNode and bringing it online. The HeadNode needs to be online; otherwise it will not accept any ComputeNode connections.

# Add users ($item comes from a loop over the user names you want to add)
Add-HpcMember -Name $item -Role administrator -ErrorVariable Err -ErrorAction Stop -WarningAction Stop

# Give the HeadNode the BrokerNode role and bring it online
Set-HpcNode -Name $env:COMPUTERNAME -Role BrokerNode
Set-HpcNodeState -Name $env:COMPUTERNAME -State online

Share the certificate

So we are almost done setting up the HeadNode. There is only one thing left to do: make the certificate available to the ComputeNodes. Because we are using a self-signed certificate, the ComputeNodes need to install the same certificate so we can use a trusted HTTPS connection between the nodes. This leaves us with a problem, because we installed the HeadNode with a certificate that sits in a folder on our C-drive. The solution is not that complicated: during the installation of the HPC software, a share is created that the ComputeNodes can use during setup. So we copy the certificate we used to this shared folder (C:\Program Files\Microsoft HPC Pack 2016\Data\InstallShare\Certificates\); you will see this copy step in the UserData below.

Setting up the CloudFormation template

The HeadNode CloudFormation files are almost the same as the ones we used to deploy the SQL Server instance. We used Packer to create a new AMI based on the standard Windows Server image provided by Microsoft and added Chocolatey and the AWS CLI. For running the HeadNode we created an Auto Scaling Group with a launch template and gave the HeadNode a DNS name, HeadNode.mydomain.com. The only thing different in the LaunchTemplate is, of course, the user data.

First, we create the certificate by calling the slightly modified PowerShell script provided by Microsoft. Next, we install the HPC Pack using credentials from the Parameter Store, and as the last step, we copy the certificate we used to install the HeadNode to the share created by the installer.

UserData:
  Fn::Base64: 
    Fn::Sub: |
      <powershell>
        $username = aws ssm get-parameter --name "/service-account/username" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
        $password = aws ssm get-parameter --name "service-account/password" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
        $password = $password | ConvertTo-SecureString -AsPlainText -Force
        $credential = New-Object System.Management.Automation.PSCredential($username,$password)

        Write-Host "Create HPC Certificate"
        Invoke-Expression -Command "& 'C:\\programdata\\installdata\\CreateHpcCertificate.ps1' "
        Write-Host "Done Create HPC Certificate"

        Write-Host "Installing HPC Pack"
        Invoke-Command -ComputerName . -Credential $credential -File C:\\programdata\\installdata\\install_hpc_HeadNode.ps1
        Write-Host "Done Installing HPC Pack"

        $certpath = "C:\ProgramData\installdata\HPCCertificate.pfx"
        Copy-Item -Path $certpath -Destination 'C:\Program Files\Microsoft HPC Pack 2016\Data\InstallShare\Certificates\'
      </powershell>
      <persist>true</persist>

ComputeNode

The ComputeNode will run the jobs provided by the HeadNode, based on the node template configured on the ComputeNode.

Copy the certificates

Before we can install the software on the ComputeNode, we need to copy the certificates we need from the share on the HeadNode. Besides the certificate we copied in there, there is another certificate that we must use to communicate with the HeadNode. The first step is to copy them both from the share to a local folder on the ComputeNode.

We run the deployment from an AWS CodeBuild pipeline, and therefore we do not have permission to connect to the HeadNode without the correct credentials. Remember, we installed the HPC Pack on the HeadNode using specific credentials; we now use those credentials again to connect to the share with the certificates. With these credentials we create a drive using the New-PSDrive command, pointing at the share on the HeadNode, and use this drive to copy the certificates from the HeadNode to the ComputeNode.

$username = aws ssm get-parameter --name "/service-account/username" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
$password = aws ssm get-parameter --name "/service-account/password" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
$password = $password | ConvertTo-SecureString -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential($username,$password)

New-PSDrive -Name "InstallDir" -PSProvider "FileSystem" -Root "\\HeadNode.mydomain.com\REMINST" -Credential $credential
Copy-Item InstallDir:\certificates\*.* -Destination "C:\ProgramData\installdata"

Installing the HPC Pack on the ComputeNode

To install the software on the ComputeNode, you can use the same installer as on the HeadNode. The difference is that we need to tell the installer it needs to install the ComputeNode. Before we can install the software, we first need to make sure we can communicate with the HeadNode over HTTPS, so we install the .cer file from the HeadNode to make the HeadNode and the ComputeNode trust each other. We need to add this certificate to the Root store of the local machine.

$cerFileName = "C:\ProgramData\installdata\HpcHnPublicCert.cer"
Import-Certificate -FilePath $cerFileName -CertStoreLocation Cert:\LocalMachine\Root  

When we created the certificate, we used a password from the AWS Parameter Store to export it from the certificate store on the HeadNode. We now need the same password to install the software on the ComputeNode. We also need to tell the ComputeNode which HeadNode it needs to connect to. So we change the command-line arguments a bit and then call Start-Process again.

$tgtdir = "C:\ProgramData\installdata\HPC\2016\"
$certpath = "C:\ProgramData\installdata\HPCCertificate.pfx"
$certificate_password = aws ssm get-parameter --name "/certificate/hpc/password" --query "Parameter.Value" --with-decryption --output text --region eu-central-1

# Fetch the private DNS zone name (can be used to build the HeadNode address)
$dnsname = aws ssm get-parameter --name "/Vpc/Default/PrivateDns/Name" --query "Parameter.Value" --output text --region eu-central-1
$HeadNode = "HeadNode.mydomain.com"

$setupArg = "-unattend -ComputeNode:$HeadNode -SSLPfxFilePath:$certpath -SSLPfxFilePassword:$certificate_password"

# Exit code 0 means success; 3010 means success but a reboot is required
$p = Start-Process -FilePath "$tgtdir\setup.exe" -ArgumentList $setupArg -PassThru -Wait
if($p.ExitCode -eq 0)
{
    LogWrite "Succeeded to install HPC Pack ComputeNode"
    break
}
if($p.ExitCode -eq 3010)
{
    LogWrite "Succeeded to install HPC Pack ComputeNode, a reboot is required."
    break
}

Double-hop

After installing the HPC Pack on the ComputeNode, we need to make sure it can connect to the HeadNode. We run the PowerShell from a pipeline, so there is a remote connection to the EC2 instance using one set of credentials, but we need different credentials to configure the node against the HeadNode. Because the credentials we use to configure the ComputeNode are passed through to the HeadNode, we run into the double-hop issue: by default, remote PowerShell only lets you use the credentials on the machine you are connected to; you are not allowed to use them to connect to yet another machine.

To configure the ComputeNode correctly, we need to tell PowerShell it is allowed to pass the credentials on to the HeadNode. To make this work, we create a session configuration and provide the correct credentials using -RunAsCredential. Next, we call the script that configures the ComputeNode using this session configuration; the script will now always use the provided credentials, also when connecting to the HeadNode.

Register-PSSessionConfiguration -Name InstallHPC -RunAsCredential $credential -Force 
Invoke-Command -ComputerName . -Credential $credential -File C:\\programdata\\installdata\\configure_hpc_ComputeNode.ps1 -ConfigurationName InstallHPC

Configure the ComputeNode

The script for configuring the ComputeNode uses the PowerShell module for HPC. Installing the ComputeNode installs all the software needed to run one; what we still need to do is take the node from Unapproved to Online. First, we use Get-HpcNode to get information about the node. Then we assign the template we want for this ComputeNode. After assigning the template, the node state changes to Offline, and we can then set it to Online. If you open the HPC software on the HeadNode, you will see this ComputeNode as Online, and the HeadNode can use it to run jobs.

$DefaultNodeTemplate = "Default ComputeNode Template"

Add-PSSnapin Microsoft.HPC

# Find this node while it is still unapproved and assign the node template
$node = Get-HpcNode -HealthState Unapproved -Name $env:COMPUTERNAME
Assign-HpcNodeTemplate -Name $DefaultNodeTemplate -Node $node -PassThru -Confirm:$false -ErrorAction SilentlyContinue -WarningAction SilentlyContinue -ErrorVariable Err

# After the template is assigned the node goes Offline; bring it Online
$node = Get-HpcNode -HealthState OK -State Offline -Name $env:COMPUTERNAME
Set-HpcNodeState -Name $env:COMPUTERNAME -State Online -ErrorAction Stop

Setting up the CloudFormation template

Just like the SQL Server and the HeadNode, we use an AutoScalingGroup for running the ComputeNodes. Because you probably want more than one ComputeNode, we do not assign fixed IP addresses or DNS entries; we simply do not need them on the ComputeNodes.

Again, we only change the UserData option in the CloudFormation file. In the UserData we fetch the credentials we need from the AWS Parameter Store and run the certificate copy before installing the HPC Pack. We separated the installation and the configuration of the ComputeNode into two scripts. Only the configuration needs the PSSessionConfiguration; for the installation, we can just pass the credentials through using the Invoke-Command method.

UserData:
  Fn::Base64: 
    Fn::Sub: |
      <powershell>
        $username = aws ssm get-parameter --name "/service-account/username" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
        $password = aws ssm get-parameter --name "/service-account/password" --query "Parameter.Value" --with-decryption --output text --region eu-central-1
        $password = $password | ConvertTo-SecureString -AsPlainText -Force
        $credential = New-Object System.Management.Automation.PSCredential($username,$password)

        New-PSDrive -Name "InstallDir" -PSProvider "FileSystem" -Root "\\HeadNode.${PrivateHostedZoneName}\REMINST" -Credential $credential
        Copy-Item InstallDir:\certificates\*.* -Destination "C:\ProgramData\installdata"

        Invoke-Command -ComputerName . -Credential $credential -File C:\\programdata\\installdata\\install_hpc_ComputeNode.ps1 

        Register-PSSessionConfiguration -Name InstallHPC -RunAsCredential $credential -Force 
        Invoke-Command -ComputerName . -Credential $credential -File C:\\programdata\\installdata\\configure_hpc_ComputeNode.ps1 -ConfigurationName InstallHPC

        Remove-PSDrive -Name InstallDir
      </powershell>
      <persist>true</persist>

Dependencies

The tricky part of installing this HPC Pack is the dependency between all the different components. You can declare dependencies between resources, but there is a small problem: as soon as the EC2 instance is started, the CloudFormation resource is considered done, while we are not done with the software installation yet. To resolve this, you can use cfn-signal.exe to send a signal when the software installation is completed. What you need to do is change the CloudFormation template and put all the scripts in the Metadata option.

Within the Metadata you configure configSets containing the scripts you want to run; you can use multiple commands that run one after another, with the last one sending the signal when done. On the Auto Scaling Group that receives the signal, add WaitOnResourceSignals: true to the UpdatePolicy section in the CloudFormation file.

Metadata:
  AWS::CloudFormation::Init:
    configSets:
      ascending:
        - setup
        - install_hpc
    setup:
      files:
        c:\ProgramData\installdata\install_hpc.ps1:
          content: !Sub |
            # $credential must be created earlier in this script, like in the UserData examples above
            New-PSDrive -Name "InstallDir" -PSProvider "FileSystem" `
                -Root "\\HeadNode.${PrivateHostedZoneName}\REMINST" -Credential $credential
            Copy-Item InstallDir:\certificates\*.* -Destination "C:\ProgramData\installdata"
    install_hpc:
      commands:
        00-add-adminuser:
          command: powershell.exe -ExecutionPolicy Unrestricted Add-LocalGroupMember 
            -Group "Administrators" -Member "awsad\$env:computername$"
        01-install-hpc:
          command: powershell.exe -ExecutionPolicy Unrestricted C:\programdata\installdata\install_hpc.ps1
        02-signal-completion:
          command: !Sub >
            cfn-signal.exe -e %ERRORLEVEL% --resource ComputeNodeWindowsAutoScalingGroup
            --stack ${AWS::StackName} --region ${AWS::Region}

Done!!

So that's it: we created a High Performance Cluster running on Amazon Web Services without using the AWS Console and without manual actions. We can use these scripts and templates every time we need a new configuration, and we can deploy them to every environment. Because we use an AutoScalingGroup for every instance type we created, AWS will make sure the minimum number of instances is always running.
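If you want a quick check that the cluster really accepts work, a small smoke test on the HeadNode could look like the sketch below (the job name and command line are arbitrary examples, and Set-HpcJobCredential must have been run as shown earlier):

Add-PSSnapin Microsoft.HPC

# Create a job with a single trivial task and hand it to the scheduler
$job = New-HpcJob -Name "SmokeTest"
Add-HpcTask -Job $job -CommandLine "echo Hello from %COMPUTERNAME%"
Submit-HpcJob -Job $job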

About

arjan

I work as a consultant for Xebia. I am passionate about problem-solving for customers with the help of technology. I love to learn new techniques, technologies and ways to improve myself.