Failover Cluster with Shared Virtual Disk beta

In our last installment, we took a look at deploying a set of VMs with shared storage in the form of shared virtual disks. This week, we’ll go into the guests and bring the disks online and deploy Failover Cluster.

Preparing The Disks

When we attach the shared virtual disks along with our Windows template vDisk, the shared disks start out offline and unformatted. We'll rectify that here. You can use the Disk Management applet in Computer Management to do this, but we'll present it as a set of DISKPART.EXE commands, which are easy to copy, paste and reuse in a script.

No matter which approach you take, the process is the same and comes down to these steps:

  1. On the primary:
    1. Online the shared disks
    2. Explicitly assign a drive letter (optional)
    3. Format the disks
  2. On the secondary:
    1. Online the shared disks
    2. Explicitly assign a drive letter (optional)

Add the following commands to a text file (init-disks-primary.txt for example):

select disk 1
online disk
attributes disk clear readonly
create partition primary
select partition 1
format fs=NTFS LABEL="DATA"
assign letter=S

select disk 2
online disk
attributes disk clear readonly
create partition primary
select partition 1
format fs=NTFS LABEL="LOGS"
assign letter=L

These DISKPART.EXE commands bring each disk online, clear the read-only attribute, and create and format a partition on each. We also explicitly assign drive letters (S for SQL data and L for logs); the latter isn't required.

To run these commands in batch, simply redirect the contents of this file to stdin for DISKPART.EXE, which might look like this:

DISKPART.EXE < init-disks-primary.txt
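
As an aside, DISKPART can also run a script file directly via its /S switch (DISKPART /S init-disks-primary.txt), which does the same thing as the redirection above. And if you'd rather stay in PowerShell, the Storage module cmdlets can do the same job. Here's a rough sketch for the first disk (the disk number and drive letter are assumed to match the example above):

Set-Disk -Number 1 -IsOffline $false
Set-Disk -Number 1 -IsReadOnly $false
Initialize-Disk -Number 1 -PartitionStyle GPT    # a brand-new VHDX starts out raw, so initialise it first
New-Partition -DiskNumber 1 -UseMaximumSize -DriveLetter S |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "DATA"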

We do this on the primary. The secondary is a very similar process, but we don’t need to create a partition or format it (the primary will have already done so). For the secondary, add the following DISKPART.EXE commands to a file called init-disks-secondary.txt:

select disk 1
online disk
attributes disk clear readonly
select partition 1
assign letter=S

select disk 2
online disk
attributes disk clear readonly
select partition 1
assign letter=L

And run those in a similar fashion:

DISKPART.EXE < init-disks-secondary.txt

At this point, we have both disks available to both guests. It's important that we don't try to write anything to the disks yet; we need Failover Cluster in place to control which VM has write access to the disks (the active node in the cluster). If you take a look at the Disk Management applet in Computer Management, you ought to see something similar to this:

[Screenshot: disk-mgmt-clustered]

Installing and configuring Failover Cluster on both nodes is straightforward through PowerShell. First, we install the Failover Clustering feature on both VMs:

Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools
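
If you'd rather not log on to each guest separately, Install-WindowsFeature can also target remote machines. A small sketch, assuming the node names used in the example below and that WinRM remoting is available:

foreach ($node in "RSVD-SC1", "RSVD-SC2") {
    Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools -ComputerName $node
}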

And then create a new cluster from the primary node and add the secondary and the shared virtual disks. To deploy the cluster, we’ll need the following information:

  1. A unique name for the cluster (SQL-SC in the example below)
  2. A shared IP address for clients to access the cluster (192.0.2.30 in the example below)
  3. The names of the nodes/VMs in the cluster (RSVD-SC1 and RSVD-SC2 in the example below)
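
Optionally, it's worth running the cluster validation tests against both nodes before creating the cluster. This isn't part of the workflow above, just a common sanity check:

Test-Cluster -Node RSVD-SC1, RSVD-SC2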

Given those, we can create the cluster:

New-Cluster -Name SQL-SC -Node RSVD-SC1 -StaticAddress 192.0.2.30

Add the second node to the cluster:

Add-ClusterNode -Cluster SQL-SC -Name RSVD-SC2

And finally add all of the shared storage to the cluster:

Get-ClusterAvailableDisk -Cluster SQL-SC | Add-ClusterDisk -Cluster SQL-SC

At this point, you should have a two-node cluster, and Failover Cluster Manager should show the two shared virtual disks as shared cluster storage, which might look something like this:

[Screenshot: rsvd-clustered]
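
You can also confirm the same thing from PowerShell. A quick sanity check, using the cluster name from the example above, might look like this:

Get-ClusterNode -Cluster SQL-SC
Get-ClusterResource -Cluster SQL-SC | Where-Object { $_.ResourceType -like "Physical Disk" }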

With the cluster running, simply use the Microsoft SQL Server installer to install a clustered instance of SQL and point it at the shared storage. You can then point your SQL clients at the cluster name given above and live happily ever after.

[Serpens Cluster image by Robert Sullivan and used unmodified under Public Domain]

Hyper-V Shared Virtual Disks Beta

We’ve been busy here at Tintri HQ. One of the things that has taken up a chunk of my time is taking a look at some functionality that is currently in beta. Specifically Shared Virtual Disks for Hyper-V.

Overview

Shared Virtual Disks allow you to give multiple virtual machines a common virtual disk (often several) that each of the VMs can access. The primary use case for this is highly available clustered applications that require shared storage and quorum, much in the same way that a Tintri VMstore has two controllers for redundancy and common storage between the two.

These clustered VMs are configured as active-passive sets where the active VM is the one that currently has the ability to do I/O to the storage.

More Information

Shared Virtual Disk support over SMB sounds pretty simple, but there's a lot to it. It involves taking SCSI commands, such as those used to manage SCSI reservations on shared storage, and tunnelling them within SMB commands. If you're really interested in how this works, you can find out more in the MS-RSVD specification.

A big shout out to our friends at Microsoft who have been working with us for quite some time on this and helping to make sure that any ambiguities in the spec were cleared up.

Deployment

Deployment of clustered VMs is a little more involved than a standalone VM. The way we've been doing it is to take a freshly sysprepped Windows 2012r2 VHDX file, create two VMs from it, then create the shared disks and attach them to both VMs. From there, we can set them up as a Microsoft Failover Cluster and install clustered SQL.

We’ll use standard Hyper-V PowerShell cmdlets to demonstrate the process.

First, creation of the two VMs. Note that they have no disks attached yet.

$mastername = "SQL-1"
$slavename = "SQL-2"
$smbpath = "\\vmstore01-data.vmlevel.com\VMs"
$masterpath = "$smbpath\$mastername"
$slavepath = "$smbpath\$slavename"
$master = New-VM -Name $mastername -Path $masterpath -MemoryStartupBytes 8GB -Generation 2 -NoVHD
$slave = New-VM -Name $slavename -Path $slavepath -MemoryStartupBytes 8GB -Generation 2 -NoVHD

Next, we copy the operating system vDisk to each VM’s folder and add it to the VM.

$vmtemplate = "\\vmstore01-data.vmlevel.com\Templates\Windows Core 2012r2.vhdx"
copy $vmtemplate "$masterpath\$mastername.vhdx"
Add-VMHardDiskDrive -VM $master -Path "$masterpath\$mastername.vhdx" 
copy $vmtemplate "$slavepath\$slavename.vhdx"
Add-VMHardDiskDrive -VM $slave -Path "$slavepath\$slavename.vhdx"

You'll notice that because the template is on the same storage appliance as the VMs are being deployed to, the copy of the vDisk is nearly instant. This is due to local copy offload, or ODX, which is something that we've covered before. If the template were on one VMstore and being deployed to another VMstore, distributed copy offload would kick in, making the copy far quicker than usual.
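
If you're curious, you can see this for yourself by timing an offloaded copy with Measure-Command. A quick, throwaway sketch using the paths from the example above:

Measure-Command {
    Copy-Item $vmtemplate "$masterpath\odx-timing-test.vhdx"
} | Select-Object TotalSeconds
Remove-Item "$masterpath\odx-timing-test.vhdx"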

At this point, our VMs are pretty much standard. They have a single disk from a common golden image.

Next, we need to create the shared disks and attach those. In this case, we intend to have two shared disks for our SQL 2014 instance: one for data and one for logs. Here's how:

New-VHD -Path "$masterpath\$mastername-data.vhdx" -Fixed -SizeBytes 100GB
New-VHD -Path "$masterpath\$mastername-logs.vhdx" -Fixed -SizeBytes 25GB

Add-VMHardDiskDrive -VM $master -Path "$masterpath\$mastername-data.vhdx" -SupportPersistentReservations
Add-VMHardDiskDrive -VM $master -Path "$masterpath\$mastername-logs.vhdx" -SupportPersistentReservations

Add-VMHardDiskDrive -VM $slave -Path "$masterpath\$mastername-data.vhdx" -SupportPersistentReservations
Add-VMHardDiskDrive -VM $slave -Path "$masterpath\$mastername-logs.vhdx" -SupportPersistentReservations

Things to note here:

  1. The -SupportPersistentReservations option. This is where the magic happens; it's what allows these two VMs to share the same vDisk (see the quick check below).
  2. We've placed the two shared vDisks under the same directory as the master and simply pointed the slave at them, but this is arbitrary. These shared disks could be in their own directory. It is important to keep them all somewhat local to each other though.
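
As a quick way to verify the shared attachment, you can list the disks from each VM and check the SupportPersistentReservations flag (a rough sketch using the variable names from the example above):

Get-VMHardDiskDrive -VMName $mastername, $slavename |
    Select-Object VMName, ControllerType, Path, SupportPersistentReservations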

At this point, we have two virtual machines that both have common, shared storage over SMB3.

This is how it looks in Hyper-V Manager once deployed:

[Screenshot: RSVD-disk]

In a follow-up article, we’ll look at provisioning this shared storage and setting up the Failover Cluster.

Limitations

As mentioned, this functionality is still in beta and not yet generally available. When the initial support is released, it will come with some temporary limitations. The focus has been on data integrity above everything else, so to limit the development and test scope and allow more resources to prove out the data integrity side of things, Shared Virtual Disks over SMB will initially be limited to:

  • Windows 2012r2 Hosts
  • Windows 2012r2 Guests
  • SQL 2014 as a clustered guest application
  • Shared disks must be fixed VHDX
  • Snapshots and replication are currently not supported

More specific detail around these use cases will accompany the release, but we’re always interested to hear your thoughts and requirements around additional use cases and functionality.

[High Availability image by flattop341 and used unmodified under CC2.0]

Enterprise Cloud and All-You-Can-Eat Buffets

What does Enterprise Cloud have in common with an all-you-can-eat buffet?

Delegation and Self-Service

As well as automating tasks to run on a regular schedule and avoid manual handling, many tasks are being automated so that their execution can be delegated to other humans.

Consider the case where you have 10 VDI desktops deployed. As common tasks come up, such as restoring files from snapshots, diagnosing performance issues or provisioning new desktops, it’s easy to jump in and take care of matters by hand. Take that number to 1000 and you’re likely going to start to see issues maintaining those by hand as you scale. Get to 10,000 or more and it’s an entirely different class of problem.

This doesn’t just apply to VDI — DevOps deployments and Enterprise server farms are seeing the same kinds of challenges as they scale too.

In order to scale past a few systems, you need to start to delegate some number of tasks to someone else. Whether that be a helpdesk team of some kind, or a developer or application owner, or even potentially the end user of a VDI desktop.

However, delegation and self-service are not just a case of dumping a bunch of tech in front of folks and wishing them luck. In most cases, these folks won’t have the technical domain knowledge required to safely manage their portion of infrastructure. We need to identify the tasks that they need to be able to perform and package those up safely and succinctly.

Buffet!

Consider a restaurant with an all-you-can-eat buffet. One of the nice ones — we’re professionals here. Those buffets don’t have a pile of raw ingredients, knives and hotplates, yet they’re most definitely still self-service.

You’re given a selection of dishes to choose from. They’ve all been properly prepared and safely presented, so that you don’t need to worry about the food preparation yourself. There is the possibility of making some bad decisions (roast beef and custard), but you can’t really go far enough to actually do yourself any great harm.

They do this to scale. More patrons with fewer overhead costs, such as staff.

DIY Self-Service

As we deploy some kind of delegation or self-service infrastructure, we need to:

  1. Come up with a menu of tasks that we wish to allow others to perform,
  2. Work out the safety constraints around putting them in the hands of others, and
  3. Probably still keep some staff around to pour the bottomless mimosas rather than simply leaving out a tap.

We introduced these ideas in previous series of articles. In particular, #1 is a case of listing and defining one or more business problems, as we saw in the automation series. For example, users that accidentally delete or lose an important file might need a way to retrieve files from a snapshot from a few days ago. #2 above refers to taking and validating very limited user input. In the restore example above, we'd probably only allow the user to specify the day that contains the snapshot they're looking for and maybe the name of their VM.
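
As a purely hypothetical sketch of what that constrained input might look like in PowerShell (the function and parameter names here are made up for illustration), the self-service task would only accept two tightly validated values:

function Get-FileRestorePoint {
    param(
        # Only plain VM names are accepted: no paths, wildcards or spaces
        [Parameter(Mandatory)]
        [ValidatePattern('^[A-Za-z0-9-]{1,32}$')]
        [string]$VMName,

        # Limit how far back a user is allowed to reach
        [Parameter(Mandatory)]
        [ValidateRange(1, 7)]
        [int]$DaysAgo
    )

    # The real snapshot lookup and restore would go here; this sketch only
    # demonstrates validating the two pieces of user input.
    "Would look for a snapshot of $VMName from $DaysAgo day(s) ago."
}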

Public Cloud

Self-service and autonomy are among the things that Public Cloud has brought to the table at a generic level. By understanding the specifics of your own Enterprise, you can not only meet, but exceed, that Public Cloud agility within your own data centre. This can also be extended to seamlessly include Public Cloud for the hybrid case.

Next Steps

As with each of these series, we’re starting here with a high level overview and will follow that up with an illustrative example over the coming articles. We’ll build on what we’ve learned in those previous series and we’ll again use the common System Center suite to get some hands-on experience. As always, the concepts and workflow apply quite well to tools other than System Center too.

To summarise, delegation and self-service are essential for most organisations as they scale. When used to safely give other groups autonomy, they can save you and your team significant time and effort.

[Buffet picture by Kenming Wang and used unmodified under SA2.0]

 

A Very Particular Set Of Skills

Has anybody not heard of the recent ransomware attack known as WannaCry? No? Good. Hopefully you’re only aware of it through news articles, but for far too many folks, this is not the case.

We all keep our patches up to date and we all use various levels of protection to limit the attack surface and potential spread of these kinds of attacks.

Unfortunately, and for various reasons, these kinds of attacks can still wreak havoc.

When this does happen, it doesn’t have to ruin your year.

For an individual virtual machine affected by this, simply:

  1. Revert your affected VM back to a previous snapshot using SyncVM
  2. Start the VM disconnected from the network
  3. Apply any updates to close the exploited security hole
  4. Reconnect to the network
  5. Don’t pay the ransom

In cases where there are a very large number of affected VMs, a lot of this process can be automated.
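
As a rough, hypothetical sketch of how that automation might be structured on the Hyper-V side (the Restore-FromCleanSnapshot helper and the virtual switch name are placeholders; the actual revert would use the SyncVM functionality covered in our orchestration series):

$affectedVMs = Get-Content .\affected-vms.txt

foreach ($name in $affectedVMs) {
    Stop-VM -Name $name -Force

    # Step 1: revert the VM's data to a known-clean snapshot (placeholder helper)
    Restore-FromCleanSnapshot -VMName $name

    # Step 2: start the VM with its network adapters disconnected
    Get-VMNetworkAdapter -VMName $name | Disconnect-VMNetworkAdapter
    Start-VM -Name $name
}

# Step 3 (patching) happens inside each guest. Once that's done, a similar
# loop reconnects the adapters for step 4 (virtual switch name assumed):
foreach ($name in $affectedVMs) {
    Get-VMNetworkAdapter -VMName $name | Connect-VMNetworkAdapter -SwitchName "Production"
}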

To misappropriate and misquote the famous speech from the movie Taken, our customers have a very particular set of skills. Skills we’ve been assisting them with over a long career.

[Ransom Note image by Sheila Sund and used unmodified under CC BY 2.0]

Enterprise Cloud Orchestration recap

This brief article hopes to summarise and collect the recent set of articles published around orchestration in the Enterprise Cloud.

  1. In our first article, we gave an overview of orchestration in the context of the larger automation umbrella and looked at it as a way to simplify the safe execution of automated tasks.
  2. Part two in the series looked at orchestration workflows (runbooks in System Center speak), using System Center Orchestrator 2016 as an example.
  3. Article #3 looked at a Microsoft PowerShell template for calling complex PowerShell functionality from within a System Center Orchestrator runbook.
  4. In our next article, number four, we looked at the use-case specific code. Our example used Tintri SyncVM to perform some innovative and efficient data-copy management for our Test/Dev folks.
  5. Finally, article five in the series pulled it all together and allowed us to execute the orchestration runbook, and our PowerShell activity, and see the results.

This series built on our automation series to take a business problem and create an agile, automated solution suitable for safely delegating to folks outside our core infrastructure group. This could also be scheduled for regular execution within Orchestrator.

Keep your eye out for the next series, which will look at putting this in the hands of the end user through a simple self-service portal.

[La grande salle de la Philharmonie de Paris image by Jean-Pierre Dalbera and used unmodified under CC2.0]

Orchestration for Enterprise Cloud part 5

We’ve spent the past four installments in this series putting together a System Center Orchestrator runbook workflow to call into PowerShell to call Tintri Automation Toolkit cmdlets to do a bunch of stuff.

The stuff that it’s doing is solving a real business need for us — we want our developers to be able to test their code against a current copy of production data. Dump and restore operations are very expensive and error prone, so we’re taking advantage of Tintri’s SyncVM functionality to handle the data synchronisation for us. As we’ll see, this is going to take less than a minute to perform!

In this article, we'll walk through executing this runbook and show how easy it makes the task. This simplicity makes it a great candidate for a task that can be delegated to someone with less in-depth knowledge of (or access to) the cloud infrastructure. This is a big step toward self-service.

Orchestrator Web Console

If we now point a web browser at port 82 of our Orchestrator server (for example, http://scorch-2016.vmlevel.com:82/), we should be presented with the Orchestrator Web Console and should see our new runbook.

[Screenshot: scorch-runbooks]

Select the runbook and click the Start Runbook button.

[Screenshot: scorch-start]

It will prompt you for the required input — simply the name of the developer’s virtual machine. It doesn’t request any information about the VMstore that the VM is stored on, it doesn’t ask for the production VM name, it doesn’t ask which snapshot to sync from and it doesn’t ask which virtual disks to synchronise. All of that is taken care of inside the runbook. This drastically reduces the number of places we could accidentally mess something up when we’re in a hurry or if we delegate this task to someone else.

Should something go wrong with the destination VM as part of this process, the SyncVM process we’re using takes a safety snapshot automatically, so at worst, we can easily roll it back.

We’ll enter our VM name (vmlevel-devel) and kick off the runbook job.

[Screenshot: scorch-vmname]

Next we’ll click on the Jobs tab and should see a running job.

[Screenshot: scorch-jobs]

If the job doesn't have an hourglass (indicating it's running) or a green tick (indicating success), it's worth checking that the Orchestrator Runbook service is started on your runbook servers (check your Services applet). I've noticed that at times it doesn't start correctly by itself, despite being set to start automatically:

[Screenshot: scorch-services]
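
A quick way to check (and start) the service across your runbook servers from PowerShell; the server name and the service display name pattern here are assumptions for illustration:

foreach ($server in "scorch-2016") {
    Invoke-Command -ComputerName $server -ScriptBlock {
        Get-Service -DisplayName "*Orchestrator Runbook*" |
            Where-Object Status -ne "Running" |
            Start-Service
    }
}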

After a little while (it takes about 45 seconds in my lab), hit refresh and the job should have succeeded. At that point, click on View Instances and then View Details to view the details of the job.

[Screenshot: scorch-jobdetails]

If we click on the Activity Details tab and scroll down, we can see the parameters of the Run .Net Script activity that calls our PowerShell code. If you look closely, you'll see the variables we have defined. This especially includes our TraceLog variable, which, as you can see in the above output, gives us a very detailed run-down of the process executed.

Given that this has succeeded, we’ve achieved our goal. Our developer VM has our developer code and OS on it, but has a copy of the latest production data snapshot. The whole process took less than 60 seconds and the developer is now up and running with recent production data — all without costly dumps and restores.

Try it for yourself and see.

[Ovation image by Joi Ito and used unmodified under CC2.0]

Orchestration for Enterprise Cloud part 4

Leading up to this point in this series, we’ve spoken a little about System Center Orchestrator and why we might want to deploy runbooks within it (or another orchestration tool). We also looked at how to create a runbook and pass parameters between runbook activities. We then looked at a Microsoft template for calling sophisticated pieces of PowerShell as part of that runbook workflow. As we covered both here and in our automation series, we’re generally doing all of this to solve a real business problem.

In this article, we’ll look at the portion of the sample code that we haven’t looked at yet. This is the code that actually calls into the Tintri Automation Toolkit for PowerShell and performs the magic that is data-copy management through SyncVM.

The Code

The use-case specific code that’s going to solve our business need is the code from line 119 to 194. This uses the Tintri Automation Toolkit for PowerShell (free download from the Support Portal) to use SyncVM to handle our zero-copy data synchronisation.

You’ll notice that each logical section of code is surrounded by a try { …. } catch { …. } block. The Tintri cmdlets will throw exceptions when an operation fails and using try and catch allows us to correctly handle those cases and collect any information needed for our trace log and to pass back to the user.
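
As a rough illustration of that pattern (the variable names, and the server parameter on Connect-TintriServer, are assumptions here rather than lifted from the sample code), each section looks something like this:

$traceLog = @()
try {
    # Connect using the Orchestrator service account's Kerberos credentials
    $session = Connect-TintriServer -TintriServer $vmstoreName -UseCurrentUserCredentials
    $traceLog += "Connected to $vmstoreName as $env:USERNAME"
}
catch {
    $traceLog += "Failed to connect to ${vmstoreName}: $($_.Exception.Message)"
    throw
}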

The rest is all pretty straightforward and just calls the following cmdlets to get the job done:

  1. Import-Module to import the Tintri PowerShell modules. In PowerShell 3.0 and later, this should automatically happen, but by explicitly trying to import it, it’s easy to tell when the module isn’t available. This module needs to be installed on each of our Runbook Servers.
  2. Connect-TintriServer creates a session with our Tintri VMstore. Note the use of the -UseCurrentUserCredentials option. This code is run as the Orchestrator service account (specified at install time) on the runbook servers. The -UseCurrentUserCredentials option allows the use of Kerberos Single Sign On (SSO) to authenticate against the VMstore. This means no hard-coded passwords and also means that if/when we change those service account credentials, we don’t need to track down all of the scripts that use the credentials and change those too. REST and PowerShell SSO is something that we covered in detail in a previous post.
  3. Get-TintriVM on line 148 retrieves an object representing our developer VM. We’ll use that further down.
  4. Get-TintriVM (line 161), Get-TintriVMSnapshot (line 163) and Get-TintriVDisk (line 165) get objects to represent the production VM, its most recent snapshot and the set of virtual disks within that snapshot respectively.
  5. Sync-TintriVDisk on line 178 is where the magic happens. We take the development VM object and a subset of the vDisks attached to the latest production snapshot (data disks 1 and 2, skipping system disk 0), and perform the zero-copy data synchronisation. At the completion of this cmdlet, the development VM will have been booted with the production data from that latest snapshot.
  6. Disconnect-TintriServer on line 192 just closes our session to the Tintri VMstore. It’s always good practice to do so.

Tracing

Note the types of things that we’re logging to our trace log too. In the case of the Connect-TintriServer cmdlet call, this creates a connection and authenticates us. It will fail if either of those things goes wrong. As a result, we’re logging the VMstore we’re connecting to and the username we’re connecting as. On failure, we log the exception message so that we know why it failed.

In the case of the per-VM operations, we log the VM we’re operating on and the exception message.

What we’re trying to do is to leave a very clear trail of what happened leading up to a failure.

Ready To Roll

At this point, we’re ready to execute this whole workflow. We’ll demonstrate that in the next article with a bunch of screenshots just to break things up a little.

[Clones image by HJ Media Studios and used unmodified under SA2.0]