Glasnostic Quick Start

This guide lets you quickly evaluate Glasnostic. We’ll show you how to use Glasnostic through the example of an e-commerce application called Online Boutique, based on Google’s cloud-native microservices application of the same name.

For a general introduction to Glasnostic—what it is, who it is for, and why it is important—read this overview.

Key takeaways

In this guide, you will learn how to:

  • measure service interactions within your application;
  • visualize these interactions in Glasnostic;
  • control them in real time;
  • identify and block unexpected interactions; and
  • automatically microsegment the environment to prevent future unintended interactions.

Prerequisites

Kubernetes cluster

You need a running Kubernetes cluster. This guide has been tested with Kubernetes versions 1.16 and higher.

Kubernetes tooling

Make sure you have working installations of bash and kubectl. The commands in this guide expect to find the kubeconfig for your cluster in the standard ~/.kube/config location. If you use a different kubeconfig file, make sure to set the KUBECONFIG environment variable accordingly, e.g. by running:

$ export KUBECONFIG=path/to/correct/kubeconfig
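To make sure kubectl is talking to the right cluster, you can run a quick sanity check (standard kubectl commands; the output depends on your cluster):

$ kubectl cluster-info
$ kubectl get nodes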

Glasnostic environment

For this guide, you need a Glasnostic account, an environment named Online Boutique, and a network within that environment.

  1. Create a Glasnostic account at https://glasnostic.com/signup.
  2. To create an environment and network, see Creating an Environment.

Once set up, your environment should look like this:

Screenshot of Glasnostic console

Install Glasnostic

To install Glasnostic, run:

curl -s https://get.glasnostic.com/install-k8s | bash -s <NETWORK_ID>

Replace <NETWORK_ID> with the ID that was generated when you created your network.

This may take a minute or two.
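To verify the installation, check that the Glasnostic components have come up. A minimal check, assuming the installer places its resources in the glasnostic-system namespace (the same namespace referenced in Uninstalling Glasnostic below):

$ kubectl get pods -n glasnostic-system

All pods should reach the Running state before you continue.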

Deploy the Online Boutique application

Run this command to deploy the application:

kubectl apply -f https://get.glasnostic.com/quick-start/boutique.yaml
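Before moving on, you can wait for the application pods to become ready. A quick check with standard kubectl, assuming the manifest deploys into your current namespace (the five-minute timeout is an arbitrary choice):

$ kubectl get pods
$ kubectl wait --for=condition=ready pod --all --timeout=300s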

Once the deployment has completed, you can view it in Glasnostic. Open the Console and navigate to the Online Boutique environment you created, which should look like this:

Screenshot of Glasnostic console

See Troubleshooting below if you don’t see the environment or the application.

Application architecture

Online Boutique is a web-based e-commerce application that consists of 10 microservices, including a product catalog, cart, checkout, payment and shipping. You can browse items, add products to your cart, and purchase items at http://<HOST>/, where <HOST> is the hostname returned by this command:

kubectl get service frontend-external | awk 'NR>1 {print $4}'
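If you prefer to script against the address rather than parse column output, a jsonpath query works too. A sketch, assuming your cluster provisions a LoadBalancer IP for the service (some providers return a hostname instead, under .hostname):

$ HOST=$(kubectl get service frontend-external -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ curl -s -o /dev/null -w "%{http_code}\n" "http://$HOST/"

The second command should print 200 if the storefront is healthy.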

The application architecture is as follows:

Application architecture

The application has a web front end and nine microservices, four of which are shared services, plus a Redis cache to store shopping cart data. Our version also includes a load generator to drive requests.

Incident: Abandoned shopping carts

Scenario: The business team complains that revenue from sales on Online Boutique has gone down. The number of users appears to be mostly constant, but some users stopped buying from the site. As the lead DevOps engineer in the company, you are tasked with investigating the issue. You discover from logs that carts are abandoned during checkout in higher numbers than before. You also know that the incident happened shortly after an update to the Recommendation Service was deployed.

What’s going on with checkouts?

Since the issue has to do with the checkout process, let’s start by looking at what checkoutservice is doing.

  1. Make sure the Sources perspective is selected, then find and click the checkoutservice in the topology graph.
Screenshot of Glasnostic console
  2. Click the Metrics tab and notice how the aggregate latency ( L ) between checkoutservice on the Sources side and its dependencies on the Destinations side is abnormally high.

  3. Using the Metrics menu in the menu bar, choose Latency as the key metric. Then look at the metrics in the Sources and Destinations columns. On the Sources side, checkoutservice has an average latency of just 150 milliseconds (ms), but on the Destinations side we see that productcatalogservice takes on average 14 seconds to complete, while six other destinations incur comparatively minuscule latencies. This extremely high latency for requests going to productcatalogservice might very well be the reason for the drop in checkout completions!

Screenshot of Glasnostic console

Let’s see what’s going on with productcatalogservice.

Click Cancel for now.

The trouble with productcatalogservice

This time, we want to see who is talking to productcatalogservice and how much. Choose Destinations from the Perspective menu in the menu bar.

  1. Find and click on the productcatalogservice in the topology graph. (Click all instances if there is more than one.)

    Note: Since we are in the Destinations perspective, the Destination column is now on the left and the Sources column on the right.

  2. Note how, overall, interactions with productcatalogservice have very high latency ( L ) and concurrency ( C ). Note too that, while not as extreme as the latencies between checkoutservice and productcatalogservice, the latencies between frontend and productcatalogservice are also too high. Finally, latencies between recommendationservice and productcatalogservice are high, as well.

Screenshot of Glasnostic console
  3. Using the Metrics menu in the menu bar, choose Requests as the key metric. Notice how the number of requests between recommendationservice and productcatalogservice is unexpectedly high. Apparently, recommendationservice is hammering productcatalogservice, causing excessive concurrency between the two.
Screenshot of Glasnostic console
  4. Using the Metrics menu in the menu bar, choose Concurrency as the key metric. Notice how the concurrency between recommendationservice and productcatalogservice is also unexpectedly high.

At this point, we could dive head-first into finding the cause of this behavior and readying a patch to deploy, but that would of course take time, during which carts would continue to be abandoned during checkout. It would be much better to exert some backpressure against recommendationservice to contain the situation while the team diagnoses what’s going on.

Taking control

Checkoutservice is critical for completing purchases and thus takes precedence over recommendationservice, which merely shows related products. We’ll therefore apply some backpressure against the latter to free up productcatalogservice capacity for the former—at least until the team gets a chance to fix the resource behavior of recommendationservice.

For this action, let’s switch back and choose Sources from the Perspective menu in the menu bar.

  1. Click the Create View button and make sure the Definition tab is selected. Enter recommendation* into the Source column and productcatalog* into the Destination column, hitting Return each time. Click the Metrics tab and enter “Recommendation service backpressure” in the name box.

    Note: We are using wildcards (`*`) here because we want this view to apply to all instances — past, present, and future — and because the exact instance of the pod will naturally change over time.

  2. To set a connection-pool-aware policy for requests from recommendationservice instances to productcatalog instances, click Set Policy for concurrency ( C ) and enter “30”, then click Set (or hit Return). This policy will limit concurrent requests to 30 (a rough throughput bound this implies is sketched after this list).

    Because exerting backpressure will increase latencies and thus increase the likelihood that a response will no longer be needed by the time it arrives, let’s also shed long-running requests. Click Set Policy for latency ( L ) and enter “1000” to limit request durations to 1 second (1,000 ms).

  3. Any policies you create are committed to a git repository for auditing. As with regular git workflows, click Commit and then Push on the next screen to push the changes live.
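Taken together, these two policies also bound throughput. As a rough back-of-the-envelope check (Little’s Law: throughput ≈ concurrency / latency), capping concurrency at 30 means that once latencies toward productcatalogservice recover to around 100 ms, recommendationservice can complete at most about 30 / 0.1 s = 300 requests per second against it, no matter how aggressively it sends.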

Confirm checkoutservice has recovered

Once the policies have been pushed out onto the network, you should see their effect in the new data points as they roll in.

  1. Staying in the Recommendation service backpressure view, you should be able to see that both concurrency and latency ( C and L ) are now actively controlled as intended.
Screenshot of Glasnostic console
  2. Now let’s confirm that checkoutservice has recovered. Click the Go back button to return to the Home view. Then click Create View. Again, since we want to keep this view around for a while, enter checkoutservice* in the Source column and * in the Destination column so we capture all instances of services past, present, and future. Click the Metrics tab, name the view “Checkout interactions”, and click Commit and then Push on the next page.

    Using the Metrics menu in the menu bar, choose Latency as the key metric.

  3. The Destinations column should now confirm that productcatalogservice latency has gone down from 14 seconds to just around 100 ms and that everything appears to be running smoothly.

Screenshot of Glasnostic console

Summary

We started out by examining the checkoutservice and discovered that its productcatalogservice dependency exhibited unacceptable latencies. We then looked at which services might put undue load on productcatalogservice and identified recommendationservice as the culprit. We then exerted backpressure against recommendationservice and confirmed that this action allowed checkoutservice to recover.

That’s it! This is how you can use Glasnostic to quickly detect issues, identify their causes, and fix them.

Incident: Unexpected egress

Scenario: While exploring the topology, you notice that shippingservice is talking to an unknown service 247.173.190.239. This is unexpected, to say the least, and almost certainly undesired, so you decide to investigate the behavior a bit before putting a stop to it.

Run this command to kick this incident off:

kubectl label pod -l app=shippingservice ENABLE_CACHE_SHIPPING=true
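This labels the shippingservice pods to switch on the simulated egress. If you want to confirm the label was applied, a quick check with standard kubectl:

$ kubectl get pods -l app=shippingservice --show-labels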

Examining the situation

  1. Click back to the Home view and look at the topology graph. Within a few seconds, you should see a line running from shippingservice to 247-173-190-239. This looks wrong on many levels, but we want to examine it before we shut it down.
  2. Click the unknown endpoint, then click the Metrics tab to inspect what is going on.
Screenshot of Glasnostic console

You see that this behavior started just a few minutes ago and doesn’t amount to more than a trickle. Nevertheless, this is definitely not anything the application is supposed to do.

Take control

To stop this behavior, we’ll simply apply a zero-request policy to deny the interaction.

  1. Click Create View and name the view “Unexpected shipping service egress”.
  2. Click Set Policy for requests ( R ), enter “0” and click Set or hit Return.
  3. Activate the new policy by clicking Commit and then Push on the next page.
Screenshot of Glasnostic console

After a few seconds you can confirm that interactions along this particular route are now denied.

Screenshot of Glasnostic console

Summary

We discovered an unexpected interaction from shippingservice to an unknown endpoint and examined a bit of its history and characteristics before putting an end to it, then confirmed the interaction was in fact denied.

By doing so we have prevented a suspicious egress from a specific source to a specific destination. But what if we wanted to ensure that all unexpected interactions are denied automatically?

Automatic micro-segmentation

Glasnostic provides a micro-segmentation feature that automatically blocks unknown interactions. First, you define the set of known interactions by specifying a time frame. Glasnostic then collects all interactions during that time frame and compiles them into an automatic allowlist of known safe interactions.

Configuring automatic micro-segmentation consists of these steps:

  1. Click the Segmentation tab and click Edit.
  2. Enter the time frame that should be taken as the baseline from which to collect interactions and click Set. This generates the list of allowable interactions. If you want to allow additional interactions, enter them in the Manual Allowlist tab.

    Note: Glasnostic’s automatic micro-segmentation operates on logical service labels, not instances of services. As a result, new instances of services will be automatically allowed, whereas new logical services will not.

Screenshot of Glasnostic console
  3. Click Commit and then Push on the next page.

Automatic micro-segmentation is now active, as indicated by the LED in the Segmentation tab.

Summary

We set up automatic micro-segmentation to prevent unknown interactions from occurring. You can test whether it works by trying to talk to the Online Boutique application with a web browser. If you didn’t use the browser to talk to it during the time frame you chose as the baseline for the segmentation, the connection should now be denied and show up as a blue (denied) line in the topology graph.
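If you prefer the command line to a browser, the same test can be scripted. A sketch, reusing the HOST lookup from the Application architecture section (and assuming your machine’s requests were not part of the baseline window):

$ HOST=$(kubectl get service frontend-external | awk 'NR>1 {print $4}')
$ curl -m 10 -s -o /dev/null -w "%{http_code}\n" "http://$HOST/"

With the segment active, the request should fail or time out rather than print 200.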

Next steps

  1. Check out the Glasnostic Overview to get more ideas about what Glasnostic is and how it can help you.
  2. Learn about other metrics and features in the Glasnostic UI.

Troubleshooting

If you don’t see any data in your environment, make sure commander.glasnostic.com:443 is reachable from your cluster. As a basic connectivity check, verify that outbound HTTPS traffic works at all:

curl -o /dev/null -s -w "%{http_code}\n" https://glasnostic.com/

The command should print 200.
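Note that the command above tests connectivity from your workstation, while what matters is reachability from the cluster. One way to run the same check from inside the cluster is a throwaway pod (a sketch using the public curlimages/curl image; the pod name is arbitrary):

$ kubectl run egress-check --rm -it --restart=Never --image=curlimages/curl -- -o /dev/null -s -w "%{http_code}\n" https://glasnostic.com/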

If you encounter a problem with pod glasnosticd init time too long on MacBooks using Docker Desktop, you can try to:

  1. Uninstall Glasnostic for Kubernetes
  2. mkdir /var/local
  3. Reinstall Glasnostic for Kubernetes

Uninstalling Online Boutique

To remove Online Boutique from your cluster, run:

kubectl delete -f https://get.glasnostic.com/quick-start/boutique.yaml

Uninstalling Glasnostic

Run the following command to uninstall Glasnostic:

$ kubectl delete ns glasnostic-system
$ kubectl delete mutatingwebhookconfigurations/glasnostic-sidecar-injector