Glasnostic Quick Start

This guide lets you quickly evaluate Glasnostic. Using a Kubernetes environment, we’ll show you how to use Glasnostic through the example of an e-commerce application called Online Boutique that we based on Google’s cloud native microservices application of the same name.

For a general introduction to Glasnostic—what it is, who it is for, and why it is important—read this overview.

Key takeaways

You will learn how to:

  1. Measure service interactions within your application;
  2. Visualize these interactions in Glasnostic;
  3. Control them in real time;
  4. Identify and block unexpected interactions; and
  5. Automatically micro-segment the environment to prevent future unintended interactions.

Prerequisites

Kubernetes cluster

You need a running Kubernetes cluster, using any Kubernetes version 1.16 or higher. This guide is tested with:

Kubernetes tooling

Make sure you have a working bash and kubectl. Also, make sure you have openssl installed as the installer needs it to create certificates for communicating with the Kubernetes cluster.

Note: Please make sure your system time is set correctly to avoid issues with OpenSSL generating the required certificates.
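
The prerequisites above can be checked with a short script before you start. This is a minimal sketch, not part of the Glasnostic installer: the function names missing_tools and version_at_least are illustrative, and the version string in the example is hypothetical.

```shell
# Print every tool from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed if a Kubernetes version string such as "v1.21.3" is at least
# the given major and minor version (the guide asks for 1.16 or higher).
version_at_least() (
  version=${1#v}            # strip a leading "v"
  major=${version%%.*}      # text before the first dot
  rest=${version#*.}
  minor=${rest%%.*}         # text between the first and second dot
  minor=${minor%%[!0-9]*}   # drop suffixes such as the "+" in "21+"
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
)

missing=$(missing_tools bash kubectl openssl)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi

# Example with a hypothetical server version string:
if version_at_least "v1.21.3" 1 16; then
  echo "v1.21.3 satisfies the 1.16 requirement"
fi
```

To check a real cluster, pass the server version reported by kubectl version to version_at_least.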

The commands in this guide expect to find the proper kubeconfig for your cluster in the standard ~/.kube/config location. If you require a different kubeconfig file, make sure to set the KUBECONFIG environment variable accordingly, e.g. by running:

export KUBECONFIG=path/to/correct/kubeconfig
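
If you are unsure which kubeconfig the commands will pick up, the lookup rule described above can be sketched as follows. The function name active_kubeconfig is illustrative.

```shell
# Print the kubeconfig path kubectl will read: the value of KUBECONFIG if
# it is set and non-empty, otherwise the standard ~/.kube/config location.
active_kubeconfig() {
  printf '%s\n' "${KUBECONFIG:-$HOME/.kube/config}"
}

echo "kubectl will use: $(active_kubeconfig)"
```

Running kubectl config current-context afterwards confirms which cluster that file points to.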

Glasnostic environment

For this guide, you need a Glasnostic account, an environment named Online Boutique, and a network within that environment.

  1. Create a Glasnostic account at https://glasnostic.com/signup.
  2. To create an environment and network, see Creating an Environment.

Once set up, your environment should look like this:

Environment settings

Install Glasnostic

To install Glasnostic, run:

curl -s https://get.glasnostic.com/install-k8s | bash -s <NETWORK_ID>

where <NETWORK_ID> is the network ID that was created when you created your network.

This may take a minute or two.
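
One way to tell when the installation has settled is to look at pod status. The sketch below assumes that Glasnostic's components run in the glasnostic-system namespace (the namespace removed in the uninstall section of this guide). The function name all_pods_ready and the sample pod line are illustrative.

```shell
# Read `kubectl get pods --no-headers` output on stdin and succeed only if
# at least one pod is listed and every STATUS column (field 3) is
# Running or Completed.
all_pods_ready() {
  awk '$3 != "Running" && $3 != "Completed" { bad = 1 }
       END { if (NR == 0 || bad) exit 1 }'
}

# Demonstration with a hypothetical pod line:
printf '%s\n' 'glasnosticd-abc12   1/1   Running   0   2m' \
  | all_pods_ready && echo "all listed pods are up"

# Against a real cluster (assumed namespace):
#   kubectl get pods -n glasnostic-system --no-headers | all_pods_ready
```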

Deploy the Online Boutique application

Run this command to deploy the application:

kubectl apply -f https://get.glasnostic.com/quick-start/boutique.yaml

Note: If you are trying to reinstall the application, it’s always best to remove any leftovers from a previous install by running kubectl delete ns glasnostic-online-boutique. See Uninstalling Online Boutique, below.

Once the installation has been completed, you can view it in Glasnostic. Open the Console and navigate to the Online Boutique environment you created. Once the application has started up, the topology graph should look like this:

Screenshot of Glasnostic console

Note: The deployment creates an internet-facing load balancer by default. If you do not wish to create one or if your cluster setup does not support the creation of load balancers, please delete the frontend-external service in the glasnostic-online-boutique namespace. Alternatively, download the Kubernetes YAML manifest and edit it to disable the creation of the load balancer before deploying the application.
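
Deleting the load balancer service described in the note above might look like this. The wrapper function remove_external_lb is illustrative; the service and namespace names are the ones given in the note.

```shell
# Delete the internet-facing load balancer service created by the deployment.
# --ignore-not-found turns the command into a no-op if the service is gone.
remove_external_lb() {
  kubectl delete service frontend-external \
    -n glasnostic-online-boutique --ignore-not-found
}

# Call it once the application has been deployed:
#   remove_external_lb
```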

See troubleshooting below if you don’t see the environment or the application.

Application architecture

Online Boutique is a web-based e-commerce application that consists of 10 microservices, including a product catalog, cart, checkout, payment and shipping. You can browse items, add products to your cart, and purchase items at http://<HOST>/, where <HOST> is the hostname returned by this command:

kubectl get service frontend-external -n glasnostic-online-boutique | awk '{print $4}'
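
The awk filter above also prints the EXTERNAL-IP column header. A small variation that skips the header is sketched below. The function name external_host and the sample address are illustrative.

```shell
# Read `kubectl get service` table output on stdin and print the
# EXTERNAL-IP column (field 4) of the data rows, skipping the header row.
external_host() {
  awk 'NR > 1 { print $4 }'
}

# Hypothetical sample output; a real cluster reports its own address.
printf '%s\n' \
  'NAME                TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE' \
  'frontend-external   LoadBalancer   10.0.0.10    203.0.113.7   80:30080/TCP   5m' \
  | external_host

# Against a real cluster:
#   kubectl get service frontend-external -n glasnostic-online-boutique | external_host
```

Until the load balancer has been provisioned, Kubernetes shows <pending> in this column, so you may need to repeat the command after a short wait.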

The application architecture is as follows:

Application architecture

The application has a web front end and nine microservices, four of which are shared services, plus a Redis cache to store shopping cart data. Our version also includes a load generator to drive requests.

Incident 1: Abandoned shopping carts

Scenario: The business team complains that revenue from sales on Online Boutique has gone down. The number of users appears to be mostly constant, but some users stopped buying from the site. As the lead DevOps engineer in the company, you are tasked with investigating the issue. You discover from logs that carts are abandoned during checkout in higher numbers than before. You also know that the incident happened shortly after an update to the recommendationservice was deployed.

What’s going on with checkouts?

Since the issue has to do with the checkout process, let’s start by looking at what checkoutservice is doing.

  1. Make sure the Sources perspective is selected in the top menu, then find and click the checkoutservice in the topology graph.
Checkoutservice is selected
  2. Click on the Metrics tab and notice how the aggregate latency ( L ) between checkoutservice on the Sources side and its dependencies on the Destinations side is abnormally high.
Latency graph
  3. Using the Metrics menu in the menu bar, choose Latency as the key metric.

    Metrics dropdown
  4. Then look at the metrics in the Sources and Destinations columns. On the Sources side, checkoutservice has an average latency of 5.5 seconds, but on the Destinations side, we see that productcatalogservice takes on average 9.3 seconds to complete, while other destinations incur comparatively minuscule latencies. This extremely high latency for requests going to productcatalogservice might very well be the reason for the drop in checkout completions!

Latency graph

Let’s see what’s going on with productcatalogservice.

  1. Click Cancel for now.

The trouble with productcatalogservice

This time, we want to see who is talking to productcatalogservice and how much.

  1. Choose Destinations from the Perspective menu in the menu bar.

    Perspective menu
  2. Find and select productcatalogservice in the topology graph (select all instances if there is more than one), then select the Metrics tab.

    Note: Since we are in the Destinations perspective, the Destinations column is now on the left and the Sources column on the right. The perspective is inverted, however: for each destination, the Sources column shows which sources interact with it, ordered by the current metric.

  3. Note how, overall, interactions with productcatalogservice have very high latency ( L ) and concurrency ( C ). Note too that, while not as extreme as the latencies between checkoutservice and productcatalogservice, the latencies between frontend and productcatalogservice are also too high. Finally, latencies between recommendationservice and productcatalogservice are high as well.

Metrics
  4. Using the Metrics menu in the menu bar, choose Requests as the key metric.

  5. Notice how the number of requests between recommendationservice and productcatalogservice is unexpectedly high. Apparently, recommendationservice is hammering productcatalogservice, causing excessive concurrency between the two.

Recommendationservice requests
  6. Using the Metrics menu in the menu bar, choose Concurrency as the key metric.

  7. Notice how the concurrency between recommendationservice and productcatalogservice is also unexpectedly high.

At this point, we could dive head-first into finding the cause for this behavior and then readying a patch to deploy, but that of course would take some time—during which carts would continue to be abandoned during checkout. It would be much better to contain the situation while the team diagnoses what’s going on by exerting some backpressure against recommendationservice.

  1. Click Cancel for now.

Taking control

Checkoutservice is critical for completing purchases and thus takes precedence over recommendationservice, which merely shows related products. We’ll therefore apply some backpressure against the latter to free up productcatalogservice capacity for the former—at least until the team gets a chance to fix the resource behavior of recommendationservice.

  1. Switch back to the Sources perspective by choosing Sources from the Perspective menu in the menu bar.

  2. Click the Create View button and make sure the Definition tab is selected.

  3. Enter recommendation* into the Source column and productcatalog* into the Destination column, hitting Return each time.

    Note: We are using wildcards (*) here because we want this view to apply to all instances (past, present, and future) and because the exact instance of the pod will naturally change over time.

  4. Click the Metrics tab and enter “Recommendation service backpressure” in the name box.

  5. To set a connection pool-aware policy for requests from recommendationservice instances to productcatalogservice instances, click Set Policy for concurrency ( C ) and enter “30”, then click Set (or hit Return). This policy will limit concurrent requests to 30.

  6. Because exerting backpressure will increase latencies and thus increase the likelihood that the response will no longer be needed, let’s also shed long-running requests. Click Set Policy for latency ( L ) and enter “1000” to limit request durations to 1.0 seconds and click Set.

Set policy
  7. Any policies you create are committed to a git repository for auditing. As with regular git workflows, click Commit and then Push on the next screen to push the changes live.
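
As a rough back-of-the-envelope illustration of why a concurrency limit exerts backpressure, Little's law relates average concurrency to request rate multiplied by average latency. The sketch below applies that relationship to the two policy values set above. It is generic queueing arithmetic, not a description of how Glasnostic enforces policies, and the function name max_rps is illustrative.

```shell
# Upper bound on sustained requests per second for a given concurrency
# limit and average request latency in milliseconds (integer arithmetic).
max_rps() {
  echo $(( $1 * 1000 / $2 ))
}

max_rps 30 1000   # requests that run up to the 1.0 s latency limit
max_rps 30 100    # requests that complete in about 100 ms
```

With at most 30 requests in flight, recommendationservice can sustain roughly 30 requests per second if each request takes the full second, and roughly 300 per second at 100 ms, which bounds the load it can place on productcatalogservice.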

Confirm checkoutservice has recovered

Once the policies have been pushed out onto the network, you should see their effect in the new data points as they roll in.

  1. Staying in the Recommendation service backpressure view, you should be able to see that both concurrency and latency (C and L) are now actively controlled as intended.
Policy hit demonstration
  2. Now let’s confirm that checkoutservice has recovered. Click the back button to return to the Home view.

  3. Click Create View. Again, since we want to keep this view around for a while, enter checkoutservice* in the Source column and * in the Destination column so we capture all instances of services past, present and future.

  4. Click the Metrics tab, name it “Checkout interactions”, click Commit and then Push on the next page.

  5. Using the Metrics menu in the menu bar, choose Latency as our key metric.

  6. The Destinations column should now confirm that productcatalogservice latency has gone down from 9.3 seconds to just around 100 ms and that everything appears to be running smoothly.

Updated Latency value
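
The views in this incident use wildcard patterns such as recommendation*, productcatalog*, and checkoutservice* to cover every instance of a service. The sketch below uses the shell's own glob matching as a stand-in to show which names such patterns would select. It assumes the view patterns behave like simple shell globs, and the instance names are hypothetical.

```shell
# Succeed if NAME ($2) matches the glob PATTERN ($1), using the shell's
# pattern matching as a stand-in for the view's wildcard syntax.
matches() {
  case $2 in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}

# Hypothetical instance names:
for name in recommendationservice-7d9f productcatalogservice-5c4b checkoutservice-9a1e; do
  for pattern in 'recommendation*' 'productcatalog*' 'checkoutservice*'; do
    if matches "$pattern" "$name"; then
      echo "$pattern matches $name"
    fi
  done
done
```

A bare * matches every name, which is why it captures all destinations in the Checkout interactions view.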

Summary

We started out by examining the checkoutservice and discovered that its productcatalogservice dependency exhibited unacceptable latencies. We then looked at which services might put undue load on productcatalogservice and identified recommendationservice as the culprit. We then exerted backpressure against recommendationservice and confirmed that this action allowed checkoutservice to recover.

That’s it! This is how you can use Glasnostic to quickly detect issues, identify their causes, and fix them.

Incident 2: Unexpected egress

Scenario: While exploring the topology, you notice that shippingservice is talking to an unknown service 247.173.190.239. This is unexpected, to say the least, and almost certainly undesired, so you decide to investigate the behavior a bit before putting a stop to it.

Run this command to kick this incident off:

kubectl label -n glasnostic-online-boutique pod -l app=shippingservice ENABLE_CACHE_SHIPPING=true
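
To check that the label was applied, or to remove it again later, the standard kubectl label syntax can be used as sketched below. The function name shipping_label is illustrative. That removing the label also stops the simulated egress is an assumption, not something this guide states.

```shell
# Inspect or remove the ENABLE_CACHE_SHIPPING label on shippingservice pods.
shipping_label() (
  ns=glasnostic-online-boutique
  selector=app=shippingservice
  case $1 in
    show)   kubectl get pods -n "$ns" -l "$selector" --show-labels ;;
    # A trailing "-" after the key removes the label.
    remove) kubectl label -n "$ns" pod -l "$selector" ENABLE_CACHE_SHIPPING- ;;
    *)      echo "usage: shipping_label show|remove" >&2; exit 2 ;;
  esac
)

# Usage:
#   shipping_label show
#   shipping_label remove
```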

Examining the situation

  1. Ensure you are in the Home view.
  2. Look at the topology graph. Within a few seconds, you should see a line running from shippingservice to 247-173-190-239. This looks wrong on many levels, but we want to examine it before we shut it down.
  3. Click the unknown endpoint, then click the Metrics tab to inspect what is going on.
Unknown endpoint selected

You see that this behavior started just a few minutes ago and doesn’t amount to more than a trickle. Nevertheless, this is definitely not anything the application is supposed to do.

Take control

To stop this behavior, we’ll simply apply a zero-request policy to deny the interaction.

  1. Name the view “Unexpected shipping service egress”.
  2. Click Set Policy for requests ( R ), enter “0” and click Set or hit Return.
  3. Activate the new policy by clicking Commit and then Push on the next page.
Denied policy demonstration
  4. After a few seconds you can confirm that interactions along this particular route are now denied.
Denied policy demonstration

Summary

We discovered an unexpected interaction from shippingservice to an unknown endpoint and examined a bit of its history and characteristics before putting an end to it, then confirmed the interaction was in fact denied.

By doing so we have prevented a suspicious egress from a specific source to a specific destination. But what if we wanted to ensure that all unexpected interactions are denied automatically?

Automatic micro-segmentation

Glasnostic provides a micro-segmentation feature that automatically blocks unknown interactions. First, you define the set of known interactions by specifying a time frame. Glasnostic then collects all interactions during that time frame and compiles them into an automatic allowlist of known safe interactions.

Configuring automatic micro-segmentation consists of these steps:

  1. Click the Segmentation tab and click Edit.

  2. Enter the time frame that should be taken as the baseline from which to collect interactions and click Set. This generates the list of allowable interactions. If you want to allow additional interactions, enter them in the Manual Allowlist tab.

    Note: Glasnostic’s automatic micro-segmentation operates on logical service labels, not instances of services. As a result, new instances of services will be automatically allowed, whereas new logical services will not.

Setting automatic micro-segmentation
  3. Click Commit and then Push on the next page.

Automatic micro-segmentation is now active, as is indicated by the LED in the Segmentation tab.
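
Conceptually, the allowlist approach described above can be pictured as follows. This is only a sketch of the idea (collect baseline interactions, de-duplicate them, deny anything not on the list), not Glasnostic's implementation. The function names and the baseline data are hypothetical.

```shell
# Build an allowlist from baseline interactions given on stdin, one
# "source destination" pair per line, by sorting and de-duplicating them.
build_allowlist() {
  sort -u
}

# Succeed if the interaction "$2 $3" appears in the allowlist file "$1".
is_allowed() {
  grep -Fxq -- "$2 $3" "$1"
}

allowlist=$(mktemp)
# Hypothetical baseline observations, using logical service names.
printf '%s\n' \
  'frontend productcatalogservice' \
  'checkoutservice productcatalogservice' \
  'frontend productcatalogservice' \
  | build_allowlist > "$allowlist"

for pair in 'frontend productcatalogservice' 'shippingservice 247.173.190.239'; do
  set -- $pair   # split the pair into source ($1) and destination ($2)
  if is_allowed "$allowlist" "$1" "$2"; then
    echo "allow $1 -> $2"
  else
    echo "deny  $1 -> $2"
  fi
done
rm -f "$allowlist"
```

Because the entries are logical service names rather than instance names, a new instance of frontend would still match its entry, while a new logical service would not.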

Summary

We set up automatic micro-segmentation to prevent unknown interactions from occurring. You can test whether it works by trying to talk to the Online Boutique application with a web browser. If you didn’t use the browser to talk to it during the time frame you chose as the baseline for the segmentation, the connection should now be denied and show up as a blue (denied) line in the topology graph.

Next steps

Check out the Glasnostic Overview to learn more about what Glasnostic is and how it can help you.

Troubleshooting

If you don’t see any data in your environment, make sure commander.glasnostic.com:443 is reachable from your cluster:

curl -o /dev/null -s -w "%{http_code}\n" https://glasnostic.com/

The command should print 200.
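
Running curl on your workstation only tests your local network. To run the same check from inside the cluster, a temporary pod can be used, as sketched below. The pod name net-check and the public curlimages/curl image are illustrative choices, not part of the Glasnostic tooling.

```shell
# Classify an HTTP status code: 2xx and 3xx count as reachable.
# curl reports 000 when it receives no HTTP response at all.
is_reachable_status() {
  case $1 in
    2??|3??) return 0 ;;
    *)       return 1 ;;
  esac
}

# Run the curl check for URL ($1) from a temporary pod in the cluster.
# Assumptions: pod name "net-check" and the curlimages/curl image.
check_from_cluster() {
  kubectl run net-check --rm -i --restart=Never \
    --image=curlimages/curl --command -- \
    curl -o /dev/null -s -w '%{http_code}\n' "$1"
}

# Usage, with the URL from the command above:
#   check_from_cluster https://glasnostic.com/
```

Pass the status code printed by the pod to is_reachable_status if you want to script the check.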

If the glasnosticd pod takes too long to initialize on a MacBook running Docker Desktop, try the following:

  1. Uninstall Glasnostic for Kubernetes
  2. mkdir /var/local
  3. Reinstall Glasnostic for Kubernetes

Uninstalling Online Boutique

To remove Online Boutique from your cluster, run:

kubectl delete ns glasnostic-online-boutique

Uninstalling Glasnostic

Run the following commands to uninstall Glasnostic:

kubectl delete ns glasnostic-system
kubectl delete mutatingwebhookconfigurations/glasnostic-sidecar-injector