Service exposition

Data products expose interfaces to the outside world. These interfaces (whether UIs or APIs) can be accessed by other products or by end users. Clients accessing the interfaces can run inside or outside of the same Kubernetes cluster. For example, Apache ZooKeeper is a dependency for other products and usually needs to be accessible only from within Kubernetes, while Apache Superset is a data analysis product for end users and therefore needs to be accessible from outside the Kubernetes cluster. Users connecting to Superset can be restricted to the local company network, or they can connect over the internet, depending on the company's security policies and requirements.

This page gives an overview of the different options for service exposition, when to choose which option, and how these options are configured.

Motivation

Service exposition is such a complicated topic that Stackable has built its own operator for it: the Stackable Listener Operator. The following section explains the motivation behind implementing such an operator instead of using plain Kubernetes Services.

Products advertising their addresses

Some products require information about their external accessibility. This is important for HDFS, for example, where the namenode keeps track of which datanode serves which block. Another case is Kafka, where it is required for client bootstrapping. A common use case is an HDFS client connecting to a namenode in order to read block 42. The namenode needs to know which datanode is serving block 42, and it responds with the IP address or hostname of that datanode. For that to work, the datanode needs to know its external address on startup and report it to the namenode. (And yes, we needed to patch the Hadoop source code for that.)

The listener-operator runs as a CSI driver (the same way the secret-operator does) and places files inside the CSI volume that tell the product how it can be reached.
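As a rough sketch of what this can look like from a Pod's perspective, the fragment below requests a listener volume and mounts it into a container. The storage class name and annotation key are given as understood from the listener-operator documentation and should be verified there. The Pod name, image, and mount path are hypothetical placeholders.

```yaml
# Sketch only: Pod name, image, and mount path are made-up placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-product
spec:
  containers:
    - name: product
      image: example.invalid/product:latest  # placeholder image
      volumeMounts:
        - name: listener
          # The product reads its externally reachable address from files in this directory
          mountPath: /stackable/listener
  volumes:
    - name: listener
      ephemeral:
        volumeClaimTemplate:
          metadata:
            annotations:
              # Assumed annotation key: selects the ListenerClass to use for this volume
              listeners.stackable.tech/listener-class: external-unstable
          spec:
            # Assumed storage class name served by the listener-operator CSI driver
            storageClassName: listeners.stackable.tech
            accessModes:
              - ReadWriteMany
            resources:
              requests:
                storage: "1"
```

The names and layout of the files written into the volume are not shown here; consult the listener-operator documentation for them.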

Integration with secret-operator

If a product is secured using TLS or Kerberos, it not only needs to be reachable via the determined address, it also needs a TLS certificate or keytab issued for that address. The secret-operator integrates with the listener-operator, so the platform takes care of provisioning certificates with the correct addresses (in the form of SAN entries).

ListenerClasses

A ListenerClass describes how a product should be exposed. Please read its documentation before continuing on this page.

As a quick reminder, the platform ships with three default ListenerClasses:

cluster-internal

Used for listeners that are only accessible internally from the cluster. For example: communication between ZooKeeper nodes.

external-unstable

Used for listeners that are accessible from outside the cluster, but which do not require a stable address. For example: individual Kafka brokers.

external-stable

Used for listeners that are accessible from outside the cluster, and do require a stable address. For example: Kafka bootstrap.

Keep in mind that you are not restricted to this list; you can configure your own custom ListenerClasses.
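A custom ListenerClass is a regular Kubernetes resource. The following is a minimal sketch, assuming the `listeners.stackable.tech/v1alpha1` API version and the `serviceType` field as described in the ListenerClass documentation. The name `my-loadbalancer` is a hypothetical example.

```yaml
# Sketch of a custom ListenerClass; verify field names against the ListenerClass documentation.
apiVersion: listeners.stackable.tech/v1alpha1
kind: ListenerClass
metadata:
  name: my-loadbalancer  # hypothetical name, referenced later from a Stacklet
spec:
  # Kubernetes Service type used to expose the listener
  serviceType: LoadBalancer
```

A Stacklet would then refer to this class by its name, in the same place where one of the default ListenerClasses would otherwise be given.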

Configuring the ListenerClass for a Stacklet

The listener-operator is integrated into most of the Stackable products. Currently, only the Stackable Operator for OPA (Open Policy Agent) and the Stackable Operator for Apache Spark do not use it.

Some products have this option at the rolegroup level. One example is HDFS, where some roles require a listener service per Pod so that single instances can be accessed individually. Most products, however, configure the ListenerClass at the role level as follows:

spec:
  my-role:
    roleConfig:
      listenerClass: external-unstable
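For products that configure the ListenerClass at the rolegroup level, the setting moves down into the individual rolegroups. The fragment below is only an illustration of that idea: the role name, rolegroup names, and the exact field path (`roleGroups.<name>.config.listenerClass`) are assumptions and differ between operators, so check the documentation of the specific operator.

```yaml
# Hypothetical layout: the exact field path is operator-specific.
spec:
  my-role:
    roleGroups:
      internal:  # made-up rolegroup name
        config:
          listenerClass: cluster-internal
      external:  # made-up rolegroup name
        config:
          listenerClass: external-unstable
```

With such a layout, each rolegroup can be exposed differently, for example keeping one group reachable only inside the cluster while another is reachable from outside.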

Every operator has a documentation section called "Service exposition with ListenerClasses", which may provide details for the specific product.