We’ve come a long way in this series of articles. Thanks for coming along for the ride.

Whilst we haven’t worked our way through this in a fashion typical of a development project (today’s topic should never come last), hopefully it’s allowed us to tackle parts of automation projects independently.

At the end of our last article, we had some fairly modular automation code to solve a particular business problem. There is one critical piece missing though. Our code only covers the happy case. If anything unexpected happens, it all falls apart. In this article, we’ll look at some ways to handle that.

This would normally never be the last step in a project and would be something we add as we go. For educational purposes (and clarity of previous examples), I’ve left it to the end as its own article.

What Could Possibly Go Wrong?

Our code currently does the following:

  1. Reads and parses a JSON file of VM mappings
  2. Loops through each mapping pair and calls our Sync-ProdRep function, which:
    1. Connects to a Tintri VMstore using SSO
    2. Retrieves an object representing the production and the reporting VMs
    3. Takes a Tintri snapshot of the production VM and retrieves the snapshot object
    4. Calls Sync-TintriVDisk to attach the production snapshot database vDisks to the reporting VM

[Note: the above language representation of the code at a high level is a good way to draft things out before firing up your favourite automation text editor]

This is all great. Except for the times that it isn’t. What happens in any of these cases:

  • The JSON file has a syntax error or typo in it?
  • The VMstore is unavailable due to network outage?
  • The current user’s credentials have expired or similar (including clock skew in the Kerberos case)?
  • The user typed the VM names incorrectly?
  • The snapshot was unable to be taken for some reason?
  • The SyncVM operation was unable to complete for some reason?

Things will break. The first you’ll probably hear about it is first thing in the morning when the angry masses are amassed outside your cube with pitchforks and torches. You’re going to need two things:

  • An answer as to what went wrong, and
  • The least amount of impact

The exact nature of how you handle issues is going to vary from project to project and operation to operation, and requires some careful consideration.

There is a lot of documentation on the syntax of various error handling constructs for PowerShell and other languages and tools. Instead of rehashing that here, we’ll concentrate more on the impact to various parts of our automation example.

Exceptions

PowerShell cmdlets in the Tintri PowerShell Toolkit, like many PowerShell cmdlets, indicate failure by throwing something called an exception. Whilst there are other ways to determine success or failure of an operation, we’ll use exceptions here to illustrate our point.

We can tell PowerShell to try a particular Tintri operation and if it fails, allow us to catch an exception object so that we can determine what to do next.

If we take our code as an example, each of the cmdlets could potentially fail for a variety of reasons. The Connect-TintriServer cmdlet could fail due to network issues, authentication issues or a range of other issues. The Get-TintriVM cmdlet might fail if the VM name was mistyped or the VM had been migrated to another storage appliance.

Let’s look at the Connect-TintriServer cmdlet specifically:

Try {
    Connect-TintriServer $vmstore -UseCurrentUserCredentials -ErrorAction Stop
} Catch {
    # Write-Warning keeps the message out of the function's return value
    Write-Warning "$(Get-Date -Format o): $($_.Exception.Message)"
    return $false
}

We’ve added the -ErrorAction Stop option to the cmdlet to tell it to halt and raise an exception, and we’ve wrapped the call in a Try/Catch block. If Connect-TintriServer fails, an exception is raised and the code inside the Catch block is executed. In this case, we show a message including a datestamp and the exception message. We use Write-Warning rather than Write-Output for this because anything a function writes with Write-Output becomes part of its return value, so the message would be captured by the caller alongside $false instead of being displayed. None of the cmdlets that follow will be of any use without a successful Tintri server session, so we use the return statement to stop our function and indicate failure. Our Foreach-Object loop will then continue on with the next VM pair.
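To see why -ErrorAction Stop matters, here is a minimal sketch that uses the built-in Get-Item cmdlet instead of the Tintri toolkit, so it can be run anywhere. The file name is made up and assumed not to exist. Many cmdlet failures are non-terminating by default: the error is reported, but the Catch block never runs.

```powershell
# Hypothetical path, assumed not to exist on this machine
$missing = Join-Path ([System.IO.Path]::GetTempPath()) 'no-such-file-for-demo.txt'

# With the default action (Continue) the error is non-terminating:
# it goes to the error stream (discarded here) and Catch is skipped.
$caughtDefault = $false
Try {
    Get-Item -Path $missing -ErrorAction Continue 2>$null
} Catch {
    $caughtDefault = $true
}

# With -ErrorAction Stop the same failure becomes an exception
# that the Catch block can handle.
$caughtStop = $false
Try {
    Get-Item -Path $missing -ErrorAction Stop
} Catch {
    $caughtStop = $true
    Write-Host "$(Get-Date -Format o): $($_.Exception.Message)"
}

Write-Host "Caught with default action: $caughtDefault"
Write-Host "Caught with Stop: $caughtStop"
```

Only the second attempt reaches its Catch block, which is the behaviour we are relying on for the Tintri cmdlets.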

Ideally, we’d take a look at each of the rest of the calls and come up with some form of action to take on error. It may just be a case of failing in the same way that we have for Connect-TintriServer, but perhaps there are cases where, instead of failing, we could try again or perform some other type of recovery action.
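As a rough sketch of the "try again" idea, the helper below retries a script block a fixed number of times before giving up. Invoke-WithRetry, its parameters and the simulated outage are all hypothetical names made up for illustration; none of this is part of the Tintri toolkit.

```powershell
# Hypothetical helper: run a script block, retrying on any exception
function Invoke-WithRetry {
    [CmdletBinding()]
    param(
        [parameter(Mandatory=$true)][ScriptBlock]$Action,
        [int]$MaxAttempts = 3,
        [int]$DelaySeconds = 5
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        Try {
            # Success: hand back whatever the action produced
            return (& $Action)
        } Catch {
            Write-Warning "$(Get-Date -Format o): attempt $attempt of $MaxAttempts failed: $($_.Exception.Message)"
            if ($attempt -eq $MaxAttempts) {
                # Out of attempts: rethrow so the caller's Try/Catch sees it
                throw
            }
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Stand-in for a flaky operation: fails twice, then succeeds
$script:calls = 0
$value = Invoke-WithRetry -DelaySeconds 0 -Action {
    $script:calls++
    if ($script:calls -lt 3) { throw "simulated outage" }
    "connected"
}
Write-Host "Result after $($script:calls) calls: $value"
```

In our script, the script block passed to -Action could be the Connect-TintriServer call with -ErrorAction Stop. Whether a retry is appropriate depends on the operation: retrying a connection is usually harmless, whereas blindly retrying a snapshot or sync deserves more thought.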

It’s also worth explicitly checking the returned value when we call the function from our Foreach-Object loop. Consider this example:

$mappings.mappings | `
   Foreach-Object {
       $result = Sync-ProdRep -Prod $_.prod -Report $_.report
       # Treat anything other than an explicit $true as a failure
       if($result -eq $true) {
           Write-Output "$(Get-Date -Format o): $($_.prod) succeeded"
       } else {
           Write-Output "$(Get-Date -Format o): $($_.prod) failed"
       }
   }

In this example, we just display a message indicating success or failure for each production VM. Note that the property name is wrapped in $( ) inside the string; a bare "$_.prod" would expand the whole $_ object and then append the literal text ".prod".

We also want to add a final line to our function to explicitly return $true if all of the operations succeed, so that the caller gets a definite answer either way rather than an empty result.
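One thing to watch when returning $true or $false from a function is that everything the function writes to the output stream becomes part of what the caller receives. The sketch below uses two throwaway functions (hypothetical names, purely for illustration) to show the difference between logging with Write-Output and logging with Write-Warning.

```powershell
# Hypothetical functions used only to illustrate the behaviour
function Test-WithOutput {
    Write-Output "something went wrong"   # lands in the caller's variable
    return $false
}
function Test-WithWarning {
    Write-Warning "something went wrong"  # goes to the warning stream instead
    return $false
}

$a = Test-WithOutput
$b = Test-WithWarning

# @(...) wraps the value in an array so that Count works for both cases
Write-Host "Write-Output version returned $(@($a).Count) item(s)"
Write-Host "Write-Warning version returned $(@($b).Count) item(s)"
```

The first function hands back both the message and $false, so a simple comparison against the result no longer behaves the way it looks. The second hands back only $false and still shows the message on screen.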

Here’s a complete version of our script with all of the modifications:

function Sync-ProdRep {
    [CmdletBinding()]
    param(
      [parameter(Mandatory=$true)][String]$prodname,
      [parameter(Mandatory=$true)][String]$reportname
    )
    # Messages are written with Write-Warning so that they are displayed
    # rather than being captured as part of the function's return value

    # Connect to our VMstore
    $vmstore = "vmstore01.vmlevel.com"
    Try {
        Connect-TintriServer $vmstore -UseCurrentUserCredentials -ErrorAction Stop
    } Catch {
        Write-Warning "$(Get-Date -Format o): $($_.Exception.Message)"
        return $false
    }

    # Retrieve a VM object for both VMs by name
    Try {
        $report = Get-TintriVM -Name $reportname -ErrorAction Stop
    } Catch {
        Write-Warning "$(Get-Date -Format o): $reportname $($_.Exception.Message)"
        return $false
    }
    Try {
        $prod = Get-TintriVM -Name $prodname -ErrorAction Stop
    } Catch {
        Write-Warning "$(Get-Date -Format o): $prodname $($_.Exception.Message)"
        return $false
    }
    # Take a snapshot of our production VM and, using the
    # returned snapshot ID, retrieve the snapshot object
    Try {
        $snapshotId = New-TintriVMSnapshot `
           -SnapshotDescription "Reporting snapshot" `
           -VM $prod -ErrorAction Stop `
           -SnapshotConsistency CRASH_CONSISTENT
        $snapshot = Get-TintriVMSnapshot -VM $prod `
           -SnapshotId $snapshotId -ErrorAction Stop
    } Catch {
        Write-Warning "$(Get-Date -Format o): snap $prodname $($_.Exception.Message)"
        return $false
    }
    # Use SyncVM's vDisk sync to synchronise the data and
    # log vDisks from the prod snapshot to the reporting VM
    Try {
        $result = Sync-TintriVDisk -VM $report `
           -SourceSnapshot $snapshot `
           -AllButFirstVDisk -ErrorAction Stop
    } Catch {
        Write-Warning "$(Get-Date -Format o): sync $prodname $($_.Exception.Message)"
        return $false
    }
    # Every step succeeded
    return $true
}

$mappings = ConvertFrom-Json "$(Get-Content 'host-mappings.json')"
$mappings.mappings | `
   Foreach-Object {
       $result = Sync-ProdRep -Prod $_.prod -Report $_.report
       # Treat anything other than an explicit $true as a failure
       if($result -eq $true) {
           Write-Output "$(Get-Date -Format o): $($_.prod) succeeded"
       } else {
           Write-Output "$(Get-Date -Format o): $($_.prod) failed"
       }
   }

So we’ve seen here how to indicate success or failure within our own functions and modules, and we’ve also seen how to catch and deal with exceptions from running PowerShell cmdlets we’ve called.

There are many ways that we could continue to improve the code above, but as it stands now, it’s a little more robust than how it was at the end of the last article.
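One such improvement would be to cover the first failure case from our list, a JSON file with a syntax error. The sketch below is self-contained: it writes a deliberately broken file to a temporary location (the file name and VM names are made up) and shows the load being wrapped in the same Try/Catch pattern.

```powershell
# Write a deliberately broken mappings file (note the missing closing brackets)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'host-mappings-broken.json'
Set-Content -Path $path -Value '{ "mappings": [ { "prod": "vm1", "report": "vm2" '

$mappings = $null
Try {
    # -Raw reads the whole file as a single string.
    # -ErrorAction Stop turns a missing or unreadable file into an exception.
    $mappings = Get-Content -Path $path -Raw -ErrorAction Stop |
        ConvertFrom-Json -ErrorAction Stop
} Catch {
    Write-Warning "$(Get-Date -Format o): could not load ${path}: $($_.Exception.Message)"
}

if ($null -eq $mappings) {
    # Nothing was parsed, so there is nothing safe to loop over
    Write-Host "No mappings loaded; nothing to sync"
}

# Clean up the temporary file
Remove-Item -Path $path
```

Failing early here means no snapshots or syncs are attempted against a half-understood mapping file, which keeps the impact as small as possible.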

[Image by Hernán Piñera and used unmodified under CC 2.0]
