Orchestration for Enterprise Cloud part 2

In our previous post, we set the scene for using System Center Orchestrator to bundle up automated functionality in a way that’s more consumable by others. We also defined the business problem we’re trying to solve and worked out which inputs we’ll rely on the user to provide and which things we’ll decide on their behalf.

In this article, we’ll start to look at the Orchestrator workflow and how to include some PowerShell automation as part of that. We’ll then follow up with some specifics around example PowerShell code, which has its share of peculiarities when being run under Orchestrator.

Orchestrator Workflows

System Center Orchestrator, like many orchestration tools, allows us to take automation modules, called Activities, and glue them together in a workflow called a Runbook. There are a bunch of pre-defined activities that ship with Orchestrator, and Microsoft provides extra activity bundles in a set of Integration Packs that are available for download. We’ll be keeping things quite simple in our example and will rely primarily on the Run .NET Script activity that comes standard.

Open Orchestrator’s Runbook Designer tool, select your Orchestrator runbook server and we’ll create a new runbook. In my example, I’ve called mine Prod-Dev Data Sync. Using the activities on the left, drag and drop the following activities into the runbook and connect them:

  • Initialize Data — this is where our runbook will start
  • Run .NET Script — this activity is where we’ll call out to PowerShell to do Tintri SyncVM magic
  • Return Data — at the end of the runbook, any returned data will be handled here and passed back to Orchestrator

It will end up looking like this:

scorch16.png

We can build far more complex runbooks than this, and indeed ours could be improved, but this will work as a starting point.

Next, right-click the Initialize Data activity, select Properties and click the Details tab. This is where we set the inputs that this runbook will take from the end user. We’ll add one string input parameter and we’ll call it VMname. It should look as follows:

scorch-init

We’re going to blindly pass this along to the next activities in the runbook, but it would be better to perform some kind of basic checks here to make sure that we’re operating on a developer VM and not a production VM. We’ll come back to that.
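
If we did want a guard rail here, it could live in an extra Run .NET Script activity between Initialize Data and our sync script. Here’s a minimal sketch, assuming developer VMs follow a naming convention (the "DEV-" prefix is purely hypothetical) and that $VMname holds the subscribed input:

# Hypothetical sanity check; the "DEV-" prefix is a placeholder for your own convention
if ($VMname -notlike "DEV-*") {
    throw "Refusing to operate on '$VMname': it doesn't look like a developer VM"
}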

Next, right-click the Run .NET Script activity, hit Properties and again select the Details tab. This is where we’ll add our PowerShell script shortly. For now, set the language to PowerShell and leave the script pane blank. Click on the Published Data tab. This is where we take any values returned by our script and pass them to the next activity in the runbook.

We’re going to add three variables to be returned:

  1. ResultStatus as an integer
  2. ErrorMessage as a string
  3. TraceLog as a string

The names don’t matter too much, but the name you use in the Variable field will need to match the variable names we use in our script, and the Name field will be plumbed into the Return Data activity shortly. The result should look like this:

scorch-scriptret
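
To give an idea of how these get populated, here’s a minimal sketch (our own assumed conventions, not the final script) of the kind of lines our PowerShell will contain; the variable names need to match the Variable column above:

# Sketch only: these names must match the Variable fields defined in Published Data.
# Treating 0 as success and non-zero as failure is our own convention for this series.
$ResultStatus = 0
$ErrorMessage = ""
$TraceLog     = "$(Get-Date -Format o): runbook started`r`n"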

The plumbing for our runbook is almost done. We just need to take care of the data that we want to return and we’re then ready to start adding our PowerShell code.

First, right-click the runbook’s title in the tab along the top, click Properties and select the Returned Data tab. This is where we’ll declare which parameters we plan to return. To begin with, these will match what we returned from the Run .NET Script activity earlier.

scorch-retdata

This defines the data we plan to return, but we still need to link that up with the previous activities in the runbook.

Right-click the Return Data activity, click Properties and select the Details tab. You’ll notice that it has already been populated with the list of return parameters we defined a moment ago.

For each parameter to be returned, we need to subscribe to a piece of data published by earlier activities. In our case, we’ll right-click the text field, select Subscribe and Published Data. You should be able to find each of the variables we defined as output from the Run .NET Script activity earlier. Orchestrator will then fill in the text fields with template text that looks like ‘{ResultStatus from “Run .Net Script”}’ as seen below:

scorch-retsubscribe

At this point, our runbook is pretty much complete. To summarise what we’ve done:

  1. Declared the VM name as an input parameter and subscribed to it
  2. Passed that VM name to a PowerShell script (still to come)
  3. Took the returned data from the script and returned that at the end of our runbook.

We’ll cover the PowerShell script in two separate articles — our next will cover the plumbing needed for PowerShell scripts in Orchestrator, and then the one after will look at the use-case specific code.

Ideas for improvement

The runbook we have will be pretty functional once we’re done with the PowerShell component. But it is about as simple as it could be. Here are a couple of ways that we could insert some more activities into the runbook to make things more robust:

  1. Have an activity after the Initialize Data activity that checks that the virtual machine name is a developer VM, rather than production or some other application.
  2. Extend that to check that the user requesting the activity owns the VM being operated on. Most hypervisors don’t have a concept of an owner of a VM in this context, so we’d need to think of ways to map that ourselves.
  3. Check the output of the script and if the operation failed, automatically email us with the trace log data and anything else helpful.

Time permitting, we’ll try to tackle one or more of these once we have the initial project up and running.
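
To give a flavour of what improvement #3 might look like, here’s a minimal sketch; the SMTP server and addresses are placeholders and would need to be swapped for something in your environment:

# Hypothetical failure notification; server and mailbox names are placeholders.
if ($ResultStatus -ne 0) {
    Send-MailMessage -SmtpServer "smtp.example.com" `
        -From "orchestrator@example.com" `
        -To "cloud-team@example.com" `
        -Subject "Prod-Dev Data Sync failed for $VMname" `
        -Body "$ErrorMessage`r`n`r`n$TraceLog"
}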

[Orchestra image by aldern82 and used without modification under SA2.0]

Orchestration For Enterprise Cloud

In our last series, we looked at taking a business problem or need and turning it into a piece of automated code. We started out by breaking the problem down into smaller pieces, we then put together a piece of automation to demonstrate the overall solution, and then made it more modular and added some error handling.

In this series, we’re going to extend upon this and integrate our automation into an orchestration framework. This approach will apply to any orchestration framework, but we’ll use System Center Orchestrator 2016 in our examples.

Why Orchestration?

Primarily for delegation of tasks. We may have written a script that we can run to perform some mundane task, but for us to be able to successfully scale, we need to start putting some of these tasks into the hands of others.

As a dependency for delegation, we also want to use automation and orchestration as a way to guarantee us consistent results and general damage prevention. Sure, we could just allow everyone access to SCVMM or vCenter to manage their virtual machines, but that’s a recipe for disaster. Orchestration gives us a way to safely grant controlled access to limited sets of functionality to make other groups or teams more self-sufficient.

The Process

Much like in our earlier automation series, we want to start by defining a business problem and breaking it down into smaller tasks from there. We also want to extend this to include the safe delegation of the task, which means thinking carefully about the input we’ll accept from the user, the information we’ll give back to the user, and the kind of diagnostic information we’ll need to collect so that if something goes wrong, we can take a look later.

The fictitious, but relevant, business problem that we’re going to solve in this series is a common DevOps problem:

Developers want to be able to test their code against real, live production data to ensure realistic test results.

The old approach would be to have someone dump a copy of the production application database and restore the dump to each of the developers’ own database instances. This is expensive in time and capacity and can adversely impact performance. It’s also error-prone.

Instead, we’ll look at making use of Tintri’s SyncVM technology to use space-efficient clones to be able to nearly-instantly make production data available to all developers. We’ll do this with some PowerShell and a runbook in System Center Orchestrator.

We can then either schedule the runbook to be executed nightly, or we can make the runbook available to the helpdesk folks, who can safely execute the runbook from the Orchestrator Web Console. [Later, we’ll look at another series that shows us how to make this available to the developers themselves — probably through a Service Manager self-service portal — but let’s not get too far ahead of ourselves]

Core Functionality

Our production VM and developer VMs all have three virtual disks:

  1. A system drive that contains the operating system, any tools or applications that the developer needs, and the application code either in production or in development.
  2. A disk to contain database transaction logs.
  3. A disk to contain the database data files.

In our workflow, we’ll want to use SyncVM to take vDisks #2 and #3 from a snapshot of the production VM, and attach them to the corresponding disk slots on the developer’s VM. We want this process to not touch vDisk #1, which contains the developer’s code and development tools.

Inputs

For us to use SyncVM (as we saw previously), we need to pass in some information to the Tintri Automation Toolkit about what to sync. Looking at previous similar code, we probably need to know the following:

  • The Tintri VMstore to connect to.
  • The production VM and the snapshot of it that we wish to sync
  • The set of disks to sync
  • The destination developer VM

In order to limit potential accidents, we probably want to limit how much of this comes from the person we’re delegating this to. In our example, we’ll assume a single VMstore and a single known production VM. We’ve also established earlier that there is a consistent and common pattern for the sets of disks to sync (always vDisk #2 and #3). The only parameter where there should be any user input is the destination VM. The rest can all be handled consistently and reliably within our automation.
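
To make that concrete, here’s a minimal sketch of how those decisions might eventually look in code; the VMstore and production VM names are placeholders rather than final values:

# The only user-supplied input is the destination developer VM.
param(
    [parameter(Mandatory=$true)][String]$VMname
)
# Everything else is decided on the user's behalf (placeholder values shown).
$vmstore = "vmstore01.vmlevel.com"   # our single, known VMstore
$prodVM  = "prod-app-01"             # our single, known production VM
# vDisks #2 and #3 are always the ones we sync, so no disk selection is exposed.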

Output

After this has been executed, we’ll need to let the user know if it succeeded or failed, along with any helpful information they’ll need. We’ll also want to track a lot of detailed information about our automation so that if an issue arises, we have some hope of resolving it.

Summary

So far, we have again defined a business need or business problem (synchronisation of production data to developer VMs), defined the set of inputs we’ll need and where those will come from, and we’ve defined the outputs.

In the next installment, we’ll start to get our hands dirty with System Center Orchestrator, followed by PowerShell and the Tintri Automation Toolkit for PowerShell.

[Orchestra image by Sean MacEntee and used unmodified under CC2.0]

Could The AWS Outage Happen To You?

No doubt we’ve all been affected by the recent Amazon Web Services outage in one form or probably many. It’s had coverage the Internet over… now that the Internet is again running. We’re also starting to get a picture of what went wrong.

This article isn’t a big I told you so about public cloud, nor is it a jab at Amazon or the poor Engineer who must have had a rough couple of days this week.

I’m hoping that this article serves as a bit of a public service announcement for the way that we all operate within our Enterprise and Private clouds so that the same doesn’t happen to those of us running the infrastructure within our organisations.

What Happened?

Amazon has posted a summary of what went down. I want to take a look at a specific excerpt:

At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.

The details are thin, but our immediate reaction might be to jump on that poor S3 team member. I hope his or her week is looking up already and I’m not sure that they’re individually to blame as much as the automation.

The industry is talking about automation and orchestration being a necessity due to the scale and complexity of the in-house systems we have. It’s unreasonable to expect that the complexity of one of the biggest pieces of technical infrastructure can be maintained in the heads of the engineers running it — especially being as fluid as it is.

Any automation or orchestration needs to put bounds around what the end user can do, whether that user is a strong, technical engineer, an end user, or someone in between.

What Can We Learn From This?

We’re all talking a lot about orchestration and automation. We’ve got self-service portals for end users and we’re putting new tools in front of helpdesk staff. We’re having to automate big chunks of what used to be standard IT stuff because the scale is too big to do these things manually anymore.

Whether the automation or orchestration is designed for the most technical of folks, or whether it’s for someone without the technical chops, it’s important to make sure that the tooling only allows stuff that should be allowed and prevents everything else. It sounds simpler than it really is, but it’s worth considering and limiting the potential use cases to avoid tragedy in-house.

A Simple, Contrived Example

To illustrate the point, let’s assume that we have some orchestration to allow an operator to control RAID arrays in servers in our data centre. One of their responsibilities is to monitor the number of write cycles to the SSDs in the RAID array. When a drive crosses a threshold, the operator marks it failed and organises a replacement drive.

[I told you that this illustrative example would be contrived…]

These fictional arrays are made up of RAID 6 stripe sets, which as we know supports two drive failures.

Now let’s say that the operator sees three drives cross the threshold at the same time and pushes the big red button on each of them. We now have an incomplete RAID set and an outage.

Sure, in the real world, we wouldn’t remove non-failed disks from an array before a replacement was there and would only ever do one at a time. However, if we expand this type of cluster-like redundancy to hosts, clusters and geographic regions, we’re starting to see how quickly we’re going to lose visibility of all of the moving parts.

The point is that if you have a piece of automation that accepts input from someone (including yourself late, late at night), try to preempt the valid use cases and restrict or limit the rest.
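
To put that in PowerShell terms, even a guard rail as blunt as this one helps; it’s a contrived sketch and the one-drive-at-a-time rule is purely illustrative:

# Contrived guard rail: refuse to fail more than one drive in a single operation.
param([string[]]$DrivesToFail)

if ($DrivesToFail.Count -gt 1) {
    throw "Refusing to fail $($DrivesToFail.Count) drives at once. Replace them one at a time."
}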

[Image by William Warby and used unmodified under CC 2.0]

Automation and Private Cloud part V

We’ve come a long way in this series of articles. Thanks for coming along for the ride.

Whilst we haven’t worked our way through this in a fashion typical of a development project (today’s topic should never come last), hopefully it’s allowed us to tackle parts of automation projects independently.

At the end of our last article, we had some fairly modular automation code to solve a particular business problem. There is one critical piece missing though. Our code only covers the happy case. If anything unexpected happens, it all falls apart. In this article, we’ll look at some ways to handle that.

This would normally never be the last step in a project and would be something we add as we go. For educational purposes (and clarity of previous examples), I’ve left it to the end as its own article.

What Could Possibly Go Wrong?

Our code currently does the following:

  1. Reads and parses a JSON file of VM mappings
  2. Loops through each mapping pair and calls our Sync-ProdRep function, which:
    1. Connects to a Tintri VMstore using SSO
    2. Retrieves an object representing the production and the reporting VMs
    3. Takes a Tintri snapshot of the production VM and retrieves the snapshot object
    4. Calls Sync-TintriVDisk to attach the production snapshot database vDisks to the reporting VM

[Note: the above plain-language, high-level representation of the code is a good way to draft things out before firing up your favourite automation text editor]

This is all great. Except for the times that it isn’t. What happens in any of these cases:

  • The JSON file has a syntax error or typo in it?
  • The VMstore is unavailable due to network outage?
  • The current user’s credentials have expired or similar (including clock skew in the Kerberos case)?
  • The user typed the VM names incorrectly?
  • The snapshot was unable to be taken for some reason?
  • The SyncVM operation was unable to complete for some reason?

Things will break. The first you’ll probably hear about it is first thing in the morning when the angry masses are amassed outside your cube with pitchforks and torches. You’re going to need two things:

  • An answer as to what went wrong, and
  • The least amount of impact

The exact nature of how you handle issues is going to vary from project to project and operation to operation and requires some careful consideration.

There is a lot of documentation on the syntax of various error handling constructs for PowerShell and other languages and tools. Instead of rehashing that here, we’ll concentrate more on the impact to various parts of our automation example.

Exceptions

PowerShell cmdlets in the Tintri PowerShell Toolkit, like many PowerShell cmdlets, indicate failure by throwing something called an exception. Whilst there are other ways to determine the success or failure of an operation, we’ll use exceptions here to illustrate our point.

We can tell PowerShell to try a particular Tintri operation and if it fails, allow us to catch an exception object so that we can determine what to do next.

If we take our code as an example, each of the cmdlets could potentially fail for a variety of reasons. The Connect-TintriServer cmdlet could fail due to network issues, authentication issues or a range of other problems. The Get-TintriVM cmdlet might fail if the VM name was mistyped or the VM had been migrated to another storage appliance.

Let’s look at the Connect-TintriServer cmdlet specifically:

Try {
    Connect-TintriServer $vmstore -UseCurrentUserCredentials -ErrorAction Stop
} Catch {
    Write-Output "$(Get-Date -Format o): $($_.Exception)"
    return $false
}

We’ve added the -ErrorAction Stop option to the cmdlet to tell it to halt and raise an exception and we’ve wrapped it in this Try/Catch block. If Connect-TintriServer fails, an exception is raised and the code inside the catch block is executed.  In this case, we show a message including a datestamp and the exception message. None of the rest of the cmdlets following this will be of any use without a successful Tintri Server session, so we use the return statement to stop our function and indicate failure. Our Foreach-Object loop will continue on with the next VM pair.

Ideally, we’d take a look at each of the rest of the calls and come up with some form of action to take on error. It may just be a case of failing in the same way that we have for Connect-TintriServer, but perhaps there’s cases where instead of failing, we could try again or perform some other type of recovery action.
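
For example, a transient network problem might justify retrying the connection a few times before giving up. Here’s a minimal sketch of that idea; the retry count and delay are arbitrary and would need tuning for your environment:

# Retry the VMstore connection a few times before giving up (sketch only).
$connected = $false
for ($attempt = 1; $attempt -le 3 -and -not $connected; $attempt++) {
    Try {
        Connect-TintriServer $vmstore -UseCurrentUserCredentials -ErrorAction Stop
        $connected = $true
    } Catch {
        Write-Output "$(Get-Date -Format o): connection attempt $attempt failed: $($_.Exception)"
        Start-Sleep -Seconds 30
    }
}
if (-not $connected) { return $false }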

It’s also worth us explicitly checking the returned value when we call it from our Foreach-Object loop. Consider this example:

$mappings.mappings | `
   Foreach-Object {
       $result = Sync-ProdRep -Prod $_.prod -Report $_.report
       if($result -eq $false) {
           Write-Output "$(Get-Date -Format o): $($_.prod) failed"
       } else {
           Write-Output "$(Get-Date -Format o): $($_.prod) succeeded"
       }
 }

In this example, we simply display a message indicating whether the sync succeeded or failed.

We probably also want to add a final line to our function to explicitly return $true if all of the operations succeed.

Here’s a complete version of our script with all of the modifications:

function Sync-ProdRep {
    [CmdletBinding()]
    param(
      [parameter(Mandatory=$true)][String]$prodname,
      [parameter(Mandatory=$true)][String]$reportname
    )
    # Connect to our VMstore
    $vmstore = "vmstore01.vmlevel.com"
    Try {
      Connect-TintriServer -UseCurrentUserCredentials $vmstore -ErrorAction Stop
    } Catch {
      Write-Output "$(Get-Date -Format o): $($_.Exception)"
      return $false
    }

    # Retrieve a VM object for both VMs by name
    Try {
        $report = Get-TintriVM -Name $reportname -ErrorAction Stop
    } Catch {
        Write-Output "$(Get-Date -Format o): $reportname $($_.Exception)"
        return $false
    }
    Try {
        $prod = Get-TintriVM -Name $prodname -ErrorAction Stop
    } Catch {
        Write-Output "$(Get-Date -Format o): $prodname $($_.Exception)"
        return $false
    }
    # Take a snapshot of our production VM and using the
    # Returned snapshot ID, retrieve the snapshot object
    Try {
      $snapshotId = New-TintriVMSnapshot `
         -SnapshotDescription "Reporting snapshot" `
         -VM $prod -ErrorAction Stop `
         -SnapshotConsistency CRASH_CONSISTENT
      $snapshot = Get-TintriVMSnapshot -VM $prod `
         -SnapshotId $snapshotId -ErrorAction Stop
    } Catch {
       Write-Output "$(Get-Date -Format o): snap $prodname $($_.Exception)"
       return $false
    }
    # Use SyncVM's vDisk sync to synchronise the data and
    # log vDisks from the prod snapshot to the reporting VM
    Try {
     $result = Sync-TintriVDisk -VM $report `
        -SourceSnapshot $snapshot `
        -AllButFirstVDisk -ErrorAction Stop
    } Catch {
       Write-Output "$(Get-Date -Format o): sync $prodname $($_.Exception)"
       return $false
    }

    # All of the operations succeeded, so indicate success explicitly
    return $true
}

$mappings = ConvertFrom-Json "$(Get-Content 'host-mappings.json')"
$mappings.mappings | `
   Foreach-Object {
       $result = Sync-ProdRep -Prod $_.prod -Report $_.report
       if($result -eq $false) {
           Write-Output "$(Get-Date -Format o): $($_.prod) failed"
       } else {
           Write-Output "$(Get-Date -Format o): $($_.prod) succeeded"
       }
 }

So we’ve seen here how to indicate success or failure within our own functions and modules, and we’ve also seen how to catch and deal with exceptions from running PowerShell cmdlets we’ve called.

There are many ways that we could continue to improve the code above, but as it stands now, it’s a little more robust than how it was at the end of the last article.

[Image by Hernán Piñera and used unmodified under CC 2.0]

Automation and Private Cloud part IV

Over the past three articles in this series, we’ve defined a business problem and broken it down into components in order to create a simple high-level design, we’ve written some initial code to perform some of the fundamental operations, and using that initial design and some abstraction, we’ve incorporated some of the rest of the design objectives.

We have some automation code now that in the usual happy case probably works just fine. But we’re not done yet. The article after this will help us take care of issues as they arise, and in this article, we’ll look at making this automation even more flexible, portable and cloud-like.

Detached Configuration

In our last article, we ended up calling our Sync-ProdRep function with the names of a production and a reporting VM to process. It looked like this in our script:

Sync-ProdRep -ProdName "sqlprod1" -ReportName "sqlrep1"
Sync-ProdRep -ProdName "sqlprod2" -ReportName "sqlrep2"
Sync-ProdRep -ProdName "sqlprod3" -ReportName "sqlrep3"

This is fine given that we have only a handful of pairs. But what about cases where we have many more? Or if we want to allow others to use our automation script? In the latter case, the instructions will be to open the script in a text editor and change all of the VM names.

Here, we’ll create a simple configuration file that contains only the host mappings, and we’ll teach our script how to parse it and call the Sync-ProdRep function for each VM pair. It may sound complicated, but we’ll create the file as a JSON format file and use the in-built JSON magic that PowerShell already has.

First create a file called host-mappings.json in a text editor and add this as the contents:

{
 "mappings": [
 { "prod": "sqlprod1", "report": "sqlrep1" },
 { "prod": "sqlprod2", "report": "sqlrep2" },
 { "prod": "sqlprod3", "report": "sqlrep3" }
 ]
}

The JSON format is well-documented across the web, but what we have is a JSON object called “mappings” that is an array (a set) of production and reporting VM name mappings.

Now we’ll replace all three Sync-ProdRep lines above with this little PowerShell excerpt:

$mappings = ConvertFrom-Json "$(Get-Content 'host-mappings.json')"
$mappings.mappings | `
   Foreach-Object { Sync-ProdRep -Prod $_.prod -Report $_.report }

This may seem daunting, but there’s nothing here that you won’t find in any of the standard PowerShell corners of the interwebs. Here’s a breakdown:

  1. We’re using Get-Content to read our JSON file into a string.
  2. ConvertFrom-Json is turning our JSON text into a PowerShell object that we’re storing in a variable called $mappings.
  3. The $mappings object contains an array called ‘mappings’ (see line 2 of our JSON file).
  4. We’re using Foreach-Object to take each item in that array (each item is the name of a production and the name of a reporting VM) and pass each in to Sync-ProdRep as a production and a reporting VM name.

This achieves exactly the same thing as the code we ended the last article with. The difference is that to add, remove or modify the list of VM pairs we process, we simply modify the JSON file and not the code itself.

Why?

This kind of abstraction allows us to grant control over different parts of the process to different users. In its simplest form, that JSON file could be modified by someone without any PowerShell experience and our code would work unchanged.

Extend that with another business case and the JSON file could be automatically generated from a list of VMs from SCVMM that have particular tags for example.

Or it could be created carefully by some automation behind a System Center (or other) Self-Service Portal. The synchronisation automation code doesn’t need to change as the business needs grow over time.
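
As a rough sketch of the SCVMM idea, and assuming the VirtualMachineManager PowerShell module is available and that a simple naming convention (sqlprodN maps to sqlrepN) holds, generating the file could look something like this:

# Hypothetical generation of host-mappings.json from SCVMM, driven by a naming convention.
$pairs = Get-SCVirtualMachine | Where-Object { $_.Name -like "sqlprod*" } |
    ForEach-Object {
        [PSCustomObject]@{
            prod   = $_.Name
            report = ($_.Name -replace "prod", "rep")
        }
    }
[PSCustomObject]@{ mappings = @($pairs) } |
    ConvertTo-Json -Depth 3 | Set-Content "host-mappings.json"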

Homework

You’ll notice that the Tintri VMstore hostname is still a hard-coded string in our Sync-ProdRep function. If we had VMs spread across multiple VM-aware storage appliances, this wouldn’t work. How would you move that hostname into the JSON configuration file and pass it into the Sync-ProdRep function?

Summary

Here’s a snapshot of where our automation code is at this point:

function Sync-ProdRep {
    [CmdletBinding()]
    param(
      [parameter(Mandatory=$true)][String]$prodname,
      [parameter(Mandatory=$true)][String]$reportname
    )
    # Connect to our VMstore
    $vmstore = "vmstore01.vmlevel.com"
    Connect-TintriServer -UseCurrentUserCredentials $vmstore

    # Retrieve a VM object for both VMs by name
    $report = Get-TintriVM -Name $reportname
    $prod = Get-TintriVM -Name $prodname

    # Take a snapshot of our production VM and using the
    # Returned snapshot ID, retrieve the snapshot object
    $snapshotId = New-TintriVMSnapshot `
       -SnapshotDescription "Reporting snapshot" `
       -VM $prod `
       -SnapshotConsistency CRASH_CONSISTENT
    $snapshot = Get-TintriVMSnapshot -VM $prod `
       -SnapshotId $snapshotId

    # Use SyncVM's vDisk sync to synchronise the data and
    # log vDisks from the prod snapshot to the reporting VM
    $result = Sync-TintriVDisk -VM $report `
       -SourceSnapshot $snapshot `
       -AllButFirstVDisk
}

$mappings = ConvertFrom-Json "$(Get-Content 'host-mappings.json')"
$mappings.mappings | `
   Foreach-Object { Sync-ProdRep -Prod $_.prod -Report $_.report }

 

[Image created by Wolfgang Maslo and used unmodified under CC 2.0]

Automation and Private Cloud part III

In our previous articles, we created a high-level design for a solution to a business problem, and we automated one part of that, all using tools that come with a Tintri VM-aware Storage appliance.

Our code currently uses SyncVM to synchronise vDisks from a production SQL VM to a reporting VM. In this article, we’ll build on that a little so that it performs that task over a number of similar VMs.

Modularity

We have code to sync vDisks between our sqlprod1 VM and our sqlrep1 VM. In our example, we have two other pairs of prod/rep VMs that we want to do the same work on. We could just copy and paste the existing code a few times into the same script and be done with it. However, if we need to update that process later, we then need to remember to change it in three places. Adding more VMs to the list only increases that problem.

Instead, we’ll package up the code we finished the last article with and call it for each of the VM pairs we want. This packaging uses PowerShell functions. I’m going to take our previous code and simply wrap it up in the function stuff. Here’s the code and we’ll walk through the changes afterwards:

function Sync-ProdRep {
    [CmdletBinding()]
    param(
      [parameter(Mandatory=$true)][String]$prodname,
      [parameter(Mandatory=$true)][String]$reportname
    )
    # Connect to our VMstore
    $vmstore = "vmstore01.vmlevel.com"
    Connect-TintriServer -UseCurrentUserCredentials $vmstore

    # Retrieve a VM object for both VMs by name
    $report = Get-TintriVM -Name $reportname
    $prod = Get-TintriVM -Name $prodname

    # Take a snapshot of our production VM and using the
    # Returned snapshot ID, retrieve the snapshot object
    $snapshotId = New-TintriVMSnapshot `
       -SnapshotDescription "Reporting snapshot" `
       -VM $prod `
       -SnapshotConsistency CRASH_CONSISTENT
    $snapshot = Get-TintriVMSnapshot -VM $prod `
       -SnapshotId $snapshotId

    # Use SyncVM's vDisk sync to synchronise the data and
    # log vDisks from the prod snapshot to the reporting VM
    $result = Sync-TintriVDisk -VM $report `
       -SourceSnapshot $snapshot `
       -AllButFirstVDisk
}

There’s a lot of text there, but very little of it has changed from what we created in the previous article. We’ll look at specifically what we’ve changed and why. There are a lot of PowerShell resources out there and I don’t want to duplicate that here, but I’ll describe a few things that are pertinent.

  1. We’ve removed the lines where we set the $prodname and $reportname variables to the names of the production and reporting VM.
  2. We’ve wrapped the whole code block in a starting and ending set of curly braces ({}) and indented each line to make it clearer to read.
  3. We’ve defined that code block as a function that we have called Sync-ProdRep. As we’ll see, this will make it trivial for us to call this code block over and over again.
  4. There’s a new line with all of that CmdletBinding() stuff in it. What this will do is allow us to pass in some information when we call Sync-ProdRep and have it automatically put into some variables for us. This may seem a little unclear, but should become clearer very shortly.
  5. We’ve got some new lines that start with [parameter….] and end with the variable names we removed (see item #1 in this list). This, combined with item #4, defines prodname and reportname as parameters that can be passed to Sync-ProdRep to tell it which production VM and which reporting VM to do our Production->Replication synchronisation to.

That may be a lot to take in if automation and PowerShell are new to you. What it means overall is that to execute the sync code within our script, we can simply add some lines to the bottom of the script (after the closing brace) to call our Sync-ProdRep code, telling it which VMs to operate on. Like this:

Sync-ProdRep -ProdName "sqlprod1" -ReportName "sqlrep1"
Sync-ProdRep -ProdName "sqlprod2" -ReportName "sqlrep2"
Sync-ProdRep -ProdName "sqlprod3" -ReportName "sqlrep3"

Not bad. We’ve abstracted our synchronisation code into its own separate module. If we need to update or change it (we will be), we do that in one single place. If we want to change which VMs we use, or how we define the list (we will be), that’s in a single place.

Are We There Yet?

We’ve done a lot to go from a business issue to a solution design and now have some code that seems like it will do the job we need.

There is still some more for us to do, but we’re getting there. I promise.

Next, we’ll look at improving maintenance of our automation and we’ll spend some time covering error handling and diagnostics. Currently we aren’t handling any error conditions or logging at all and that’s bad.

[Machinery image created by bradleyolin and used unmodified under CC 2.0]

Automation and Private Cloud part II

In our last article, we looked at a potential approach for developing automation to solve a business need. That business need was to move the daily reporting jobs, which run over a series of SQL Server VMs’ production data, to a set of secondary reporting VMs where the jobs could be run without impacting production.

In this article, we’re going to start to take a look at development of the actual script. We’ll do this in PowerShell, but the same high-level process applies regardless of the tools used.

Moving The Data

As mentioned in the last article, we have a database log disk and a database disk on each VM as well as the disk containing the operating system and applications. Let’s assume that they’re called sqlprod1, sqlprod2 and sqlprod3. Since then, we have cloned each VM and the application owners have performed any steps needed to have the reporting work within the clones. Let’s call the clones sqlrep1, sqlrep2 and sqlrep3.

What we want to do nightly is to use Tintri SyncVM to sync the database and log disks from each production VM to its corresponding reporting VM. This has no more impact on the production VMs than taking a snapshot.

Here’s how this SyncVM case would look if we weren’t automating it:

prod-report-syncvm

We’re replacing the log and data vDisks on our reporting VM with vDisks from a recent snapshot taken of our production VM. This will restart the reporting VM, but not the production VM.

Looking at the online help for the Sync-TintriVDisk PowerShell cmdlet (from the Tintri PowerShell Toolkit), we figure that we’ll need a command that looks something like this:

$result = Sync-TintriVDisk -VM $report `
    -SourceSnapshot $snapshot `
    -AllButFirstVDisk

Let’s break this down in case the specifics aren’t familiar or obvious:

  • $result = collects the result of the Sync-TintriVDisk cmdlet. This will often contain useful information about whether the cmdlet succeeded, or failed and often some detail about why. We’ll deal with this information later.
  • -VM $report is us providing the cmdlet with an object representing the VM that we’re operating on. We’ll call the object $report here and fill it in as we work backwards.
  • -SourceSnapshot $snapshot is where we provide the snapshot from which we’re synchronising vDisks. This will be a snapshot on our production VM and we’ll fill that in as we work backwards.
  • -AllButFirstVDisk is an option to allow us to sync all but the first vDisk. You’ll recall that our database VMs have the OS and applications on the first disk and then the logs on the second and database files on the third. We only want to sync the 2nd and 3rd disks.

Working Back

So we have our SyncVM command, but we still need to find our reporting VM and fill in the $report variable with it and we need to take the production snapshot and fill in $snapshot with it. Running the following commands before calling Sync-TintriVDisk in our script ought to suffice for the $report variable:

$reportname = "sqlrep1"
$report = Get-TintriVM -Name $reportname

Here, we’re setting a variable name with the name of the reporting VM, requesting that VM by name and storing the object in the $report variable we need for SyncVM. We don’t need to do this in two distinct steps — the -Name option allows us to specify “sqlrep1”  as a string. We’ll come back to why we’re doing this as we continue to build this example.
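
For reference, the single-step version would simply be:

# Equivalent single-step lookup, passing the name directly as a string
$report = Get-TintriVM -Name "sqlrep1"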

By looking at the API documentation for snapshots, it looks like breaking the snapshot process down into smaller tasks gives us these steps:

  1. Find the VM,
  2. Take the snapshot, remembering the returned snapshot ID, and
  3. Retrieve that snapshot object.

This equates to the following PowerShell commands being added to our script before the SyncVM call:

$prodname = "sqlprod1"
$prod = Get-TintriVM -Name $prodname
$snapshotId = New-TintriVMSnapshot `
    -SnapshotDescription "Reporting snapshot" `
    -VM $prod `
    -SnapshotConsistency CRASH_CONSISTENT
$snapshot = Get-TintriVMSnapshot -VM $prod -SnapshotId $snapshotId

I’m using crash-consistent snapshots here, but VM-consistent snapshots work too and may be a better choice depending on your use case.
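
If VM-consistent snapshots suit your workload better, the consistency value is the only thing that changes. VM_CONSISTENT is the value I’d expect here, but confirm it against Get-Help New-TintriVMSnapshot -Full for your toolkit version:

# Assumed VM-consistent variant; check the toolkit help for the exact consistency value.
$snapshotId = New-TintriVMSnapshot `
    -SnapshotDescription "Reporting snapshot" `
    -VM $prod `
    -SnapshotConsistency VM_CONSISTENT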

Summary

At this point, our automation script is not complete, but is starting to take some real shape. We’re currently able to use SyncVM to automatically take a snapshot of one of our production VMs, and attach the data and log disks from that snapshot to our reporting VM.

I’ll summarise all of the code that we have so far here. Given that we started at the bottom and have been working backwards, it’ll be helpful to see it all in one. I’ll also add some PowerShell comments to the code to make it easier to understand and maintain over time.

I’ve also added an extra line of code to call Connect-TintriServer to create a PowerShell API session to the Tintri VMstore. We introduced this cmdlet, and using Single-Sign-On (SSO) in an earlier article.

Keep your eye out for subsequent articles.

# Variables containing our VM names (for now)
$prodname = "sqlprod1"
$reportname = "sqlrep1"

# Connect to our VMstore
$vmstore = "vmstore01.vmlevel.com"
Connect-TintriServer -UseCurrentUserCredentials $vmstore

# Retrieve a VM object for both VMs by name
$report = Get-TintriVM -Name $reportname
$prod = Get-TintriVM -Name $prodname

# Take a snapshot of our production VM and using the
# Returned snapshot ID, retrieve the snapshot object
$snapshotId = New-TintriVMSnapshot `
   -SnapshotDescription "Reporting snapshot" `
   -VM $prod `
   -SnapshotConsistency CRASH_CONSISTENT
$snapshot = Get-TintriVMSnapshot -VM $prod `
   -SnapshotId $snapshotId

# Use SyncVM's vDisk sync to synchronise the data and
# log vDisks from the prod snapshot to the reporting VM
$result = Sync-TintriVDisk -VM $report `
   -SourceSnapshot $snapshot `
   -AllButFirstVDisk

[Machinery image used unmodified under CC 2.0 and was provided by Jori Samonen]