Introduction

Hadron is a distributed data storage system designed to ingest data in the form of events, and to facilitate working with that data in the form of multi-stage structured workflows.

Building distributed applications can be tough. Teams might have tens, hundreds, or even thousands of microservices. Platforms may have thousands of data signals, ranging from business-critical application events to telemetry signals such as logs, traces, and metrics. All of this data is important, and now more than ever teams need a way to not only capture this data, but also to work with it in a scalable and extensible way.

Hadron offers a powerful solution to these problems using the following primitives:

  • Events - all data going into and coming out of Hadron is structured in the form of events.
  • Streams - durable logs for storing arbitrary data, with absolute ordering and horizontal scalability.
  • Pipelines - workflow orchestration for data on Streams, providing structured concurrency for arbitrarily complex multi-stage structured workflows.
  • Exchanges - ephemeral messaging used to exchange non-durable events between processes, perfect for GraphQL Subscriptions, WebSockets, Push Notifications and the like.
  • Endpoints - general-purpose RPC handlers for leveraging Hadron's powerful networking capabilities.
  • Producers - client processes connected to Hadron, written in any language, working to publish data to Hadron.
  • Consumers - client processes connected to Hadron, written in any language, working to consume data from Streams, process Pipeline stages, consume ephemeral messages from Exchanges, or even handle RPC Endpoints.

Hadron was born into the world of Kubernetes, and Kubernetes is a core expectation in the Hadron operational model. To learn more about how Hadron leverages the Kubernetes platform, see the Kubernetes chapter of this guide.

The next chapter of this guide will walk you through the process of getting Hadron up and running. See you there.

Quick Start

Hadron is designed to run within a Kubernetes cluster. This chapter will bring you up-to-speed on everything you need to know to start running your own Hadron clusters.

Kubernetes

If you already have Kubernetes clusters, then choose a cluster to use. Starting with a development oriented cluster or namespace is typically a good idea. If you do not have a Kubernetes cluster available for use, then you have a few options:

  • Get warmed up to Kubernetes using Kind - a tool for running local Kubernetes clusters using Docker container “nodes”. The Use Case: Local Development chapter in this guide is a great way to get started with Hadron and Kind for local development.
  • Managed Kubernetes. Every cloud provider has a managed Kubernetes offering, take your pick.
  • The Hadron Collider — our hosted Hadron Cloud (under construction). The only hosted and fully managed Hadron solution. Pick your cloud, pick your region, and start using Hadron. The Hadron Cloud offers deep integrations with major cloud providers, and is the best way to get started with Hadron.

Helm is the package manager for Kubernetes, and it will need to be installed and available for command-line usage to get started. If you've been using Kubernetes for any amount of time, then you are probably already using helm.

Installation

Before we install the Hadron Operator, we are going to install cert-manager. Though cert-manager is not required by Hadron, it does greatly simplify the setup of TLS certificates, which Hadron uses for its validating webhooks. Instead of manually crafting our own certs, we'll stick with cert-manager.

helm repo add jetstack https://charts.jetstack.io
helm upgrade cert-manager jetstack/cert-manager --install --set installCRDs=true

Now we are ready to install the Hadron Operator:

# Helm >= v3.7.0 is required for OCI usage.
helm install hadron-operator oci://ghcr.io/hadron-project/charts/hadron-operator --version 0.1.3

This will install the Hadron Operator along with roles, service accounts, validating webhooks, and other Kubernetes resources which the Operator requires.
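
If you'd like to verify the installation, checking the Helm release status and the newly registered Hadron CRDs is a reasonable sanity check (the grep pattern assumes the hadron.rs API group used throughout this guide):

helm status hadron-operator
kubectl get crds | grep hadron.rs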

Create a Stream

You are now ready to create a Stream. Create a file stream.yaml with the following contents:

apiVersion: hadron.rs/v1beta1
kind: Stream
metadata:
  name: events
spec:
  partitions: 3
  image: ghcr.io/hadron-project/hadron/hadron-stream:latest
  pvcVolumeSize: "5Gi"

See the Stream Reference for more details on the spec fields listed above, as well as other config options available for Streams. Now apply the file to your Kubernetes cluster as shown below using the kubectl CLI (part of the Kubernetes distribution).

kubectl apply -f stream.yaml

In this example, we are applying the Stream CR instance as an independent Kubernetes manifest. For production usage and long-term maintainability, it is recommended to include your Hadron manifests as part of a helm chart which will likely include other Streams, Pipelines and Tokens.

Applying this resource to your cluster will result in the creation of a Kubernetes StatefulSet bearing the same name, along with the creation of a few Kubernetes Services.
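
You can verify what was created with kubectl. For the Stream defined above, this looks roughly like the following; the partition pods follow standard StatefulSet naming, events-0 through events-2:

kubectl get statefulset events
kubectl get pods
kubectl get services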

Create a Token

In order to access the resources of a Hadron cluster, a Token CR must be created describing a set of permissions for the bearer of the Token. Create a file token.yaml with the following contents:

apiVersion: hadron.rs/v1beta1
kind: Token
metadata:
  name: hadron-full-access
spec:
  all: true

See the Token Reference for more details on the spec of Hadron Tokens. Now apply the file to your Kubernetes cluster as shown below.

kubectl apply -f token.yaml

Applying this resource to your cluster will result in the creation of a Kubernetes Secret in the same namespace bearing the same name. The generated secret may be mounted and used as an env var for your application deployments and the like, or it may be used by the Hadron CLI for cluster access.

CLI Access

Now that we've defined a Stream along with a Token to allow us to access that Stream, we are ready to start publishing and consuming data.

First, let's get a copy of the Token's generated Secret (the actual JWT) for later use. We will use kubectl to extract the value of the secret:

HADRON_TOKEN=$(kubectl get secret hadron-full-access -o jsonpath='{.data.token}' | base64 --decode)

With the decoded token set as an environment variable, let's now run the CLI:

kubectl run hadron-cli --rm -it \
    --env=HADRON_TOKEN=${HADRON_TOKEN} \
    --env=HADRON_URL='http://events.default.svc.cluster.local:7000' \
    --image ghcr.io/hadron-project/hadron/hadron-cli:latest

Here we are running a temporary pod which will be removed from the Kubernetes cluster when disconnected. Once the pod session is started, you should see the help text of the CLI displayed, and then you should have access to the shell prompt.

From here, we can execute CLI commands to interact with our new Stream. Let's publish a simple event:

hadron stream pub --type example.event '{"demo": "live"}'

This will publish a simple event as a JSON blob. Publishing of binary events, such as protobuf, is also fully supported. Let's create a consumer to read this event.

hadron stream sub --group demo --start-beginning

You should see some output which looks like:

handling subscription delivery id=<generated> source=hadron-cli specversion=1.0 type=example.event optattrs={} data='{"demo": "live"}'

Wrapping Up

This example shows the most basic usage of Hadron. From here, the next logical steps might be:

  • Continue reading: learn more about how Hadron leverages the Kubernetes platform in the next chapter of this guide.
  • Define Pipelines: start modeling your application workflows as code using Pipelines. See the Use Case: Service Provisioning for deeper exploration on this topic.
  • Application Integration: go to the Use Case: Local Development for more details on how to integrate with Hadron.

Hadron & Kubernetes

Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications.
~ kubernetes.io

Kubernetes has become a cornerstone of the modern cloud ecosystem, and Hadron is purpose-built for the Kubernetes platform. Hadron is native to Kubernetes: it was born there and knows the ins and outs.

Hadron is designed from the ground up to take full advantage of Kubernetes and its rich API for deploying and running applications. Building upon this foundation has enabled Hadron to achieve an operational model with a simple setup, which removes performance bottlenecks, simplifies clustering and consensus, provides predictable and clear scalability, and positions Hadron for seamless and intuitive integration with user applications and infrastructure.

Each of the above points merits deeper discussion, all of which are covered in further detail throughout the reference section of this guide. Here are the highlights.

Simple Setup

Everything related to Hadron operations is handled by the Operator and is driven via Kubernetes configuration files. Provisioning, auto-scaling, networking, and access control are all controlled through a few lines of YAML which can be versioned in source control and reviewed as code.

The Hadron Operator handles the entire lifecycle of Hadron clusters, including:

  • upgrading a cluster to a new version of Hadron,
  • horizontally scaling a cluster,
  • adding credentials and access control to a cluster.

All of this and more is declarative and fully managed, which means less operational burden for users.

Removing Performance Bottlenecks

Horizontally scaling a Hadron cluster is fully dynamic. Clients detect cluster topology changes in real-time as they take place, and, very importantly, the path which data takes from the moment of publication to the moment it is persisted to disk is direct and simple. No extra network hops. No overhead. Just an HTTP2 data stream directly from the client to the Hadron partition's internal function which writes data to disk.

Seamless and Intuitive Integration

Hadron clusters are exposed for application and infrastructure integration using canonical Kubernetes patterns for networking and access. Clients receive a stream of cluster metadata and can react in real-time to topology changes to maximize use of horizontal scaling and high-availability.

If your applications are already running in Kubernetes, then integration couldn't be simpler.

Events

A specification for describing event data in a common way
~ cloudevents.io

Events are everywhere. Everything is an event. As the industry has continued to work with event-driven data streams, we've accumulated many best practices on how to model our data in a reliable, extensible, and intuitive way. CloudEvents is at the heart of this movement, and also happens to be at the very core of Hadron.

Everything in Hadron is an event, a CloudEvents 1.0 event. Streams, Pipeline inputs and outputs, ephemeral messages, everything in Hadron is built around the CloudEvents model.
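
For reference, the event published in the Quick Start corresponds to a CloudEvents 1.0 structure roughly like the following, where the id is generated at publish time:

{
  "specversion": "1.0",
  "id": "<generated>",
  "source": "hadron-cli",
  "type": "example.event",
  "data": {"demo": "live"}
}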

Ease of Use

Hadron events are well-structured, easy to analyze, and prime for automation and application usage.

Interoperable

Hadron is positioned from the very beginning to integrate seamlessly with the greater event-driven ecosystem.

Ready to Go

Hadron clients speak CloudEvents fluently. Publishing data to, and consuming data from Hadron is clear, concise, and covers a wide range of applications.

Streams

Streams are append-only, immutable logs of data with absolute ordering per partition.

Streams are deployed as independent StatefulSets within a Kubernetes cluster, based purely on a Stream CRD (YAML) stored in Kubernetes.

Each pod of a Stream's StatefulSet acts as a leader of its own partition. Kubernetes guarantees the identity, stable storage and stable network address of each pod. Replication is based on a deterministic algorithm and does not require consensus due to these powerful identity and stability properties.

Scaling

Streams can be horizontally scaled based on the Stream's CRD. Scaling a Stream to add new partitions will cause the Hadron Operator to scale the corresponding StatefulSet.

Each pod constitutes an independent container process with its own set of system resources (CPU, RAM, storage), all of which will span different underlying virtual or hardware nodes (OS hosts) of the Kubernetes cluster. This helps to ensure availability of the Stream.

High Availability (HA)

In Hadron, the availability of a Stream is reckoned as a whole. If any partition of the Stream is alive and running, then the Stream is available. The temporary loss of an individual partition does not render the Stream as a whole unavailable.

This is a powerful property of Hadron, and is an intentional design decision which the Hadron ecosystem builds upon.

Hadron clients automatically monitor the topology of the connected Hadron cluster: they establish connections to new partitions as they become available, and fail over to healthy partitions when others become unhealthy. Hashing to specific partitions based on the subject of an event or event batch is still the norm; however, clients also dynamically take the state of their connections into account as part of that procedure.

Producers

Hadron clients which publish data to a Hadron Stream are considered to be Producers. Stream Producers establish durable long-lived connections to all partitions of a Stream, and will publish data to specific partitions based on hashing the subject of an event or event batch.

Consumers

Hadron clients which consume data from a Stream are considered to be Consumers. Stream Consumers establish durable long-lived connections to all partitions of a Stream, and consume data from all partitions.

Every event consumed includes details on the ID of the event as well as the partition from which it came. This combination establishes uniqueness, and also happens to be a core facet of the CloudEvents ecosystem.

Consuming data from a Stream is transactional in nature, and the lifetime of any transaction is tied to the client's connection to the corresponding Stream partition. If a Consumer's connection is ever lost, then any outstanding deliveries to that Consumer are considered to have failed and will ultimately be delivered to another Consumer for processing.

Groups

Stream Consumers may form groups. Groups are used to load-balance work across multiple processes working together as a logical unit.

All consumers must declare their group name when they first establish a connection to the backing Stream partitions. When multiple consumers connect to the Hadron cluster bearing the same group name, they are treated as members of the same group.

Durable or Ephemeral

Stream Consumers may be durable or ephemeral. Durable groups have their progress recorded on each respective Stream partition. Ephemeral groups only have their progress tracked in memory, which is erased, per partition, once all group members disconnect.
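
With the CLI, the difference is a single flag, as shown below (the group names are arbitrary):

# Ephemeral group: progress is tracked in memory only.
hadron stream sub --group demo-ephemeral

# Durable group: progress is recorded on each Stream partition.
hadron stream sub --group demo-durable --durable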

Data Lifecycle

Data residing within a Hadron Stream has a configurable retention policy. There are 2 retention policy strategies currently available:

  • "time": (default) this strategy will preserve the data on the Stream for a configurable amount of time (defaults to 7 days). After the data has resided on the Stream for longer than the configured period of time, it will be deleted.
  • "retain": this strategy will preserve the data on the Stream indefinitely.

These configuration options are controlled via the Stream CRD.
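
For example, a Stream which should only keep data for one day could declare the following in its spec (see the Streams reference for the full schema):

spec:
  retentionPolicy:
    strategy: "time"
    retentionSeconds: 86400 # 1 day.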

Pipelines

Pipelines are workflow orchestration for data on Streams, providing structured concurrency for arbitrarily complex multi-stage workflows.

Pipelines exist side by side with their source Stream, and Streams may have any number of associated Pipelines. Pipelines are triggered for execution when an event published to a Stream has an event type which matches one of the trigger patterns of an associated Pipeline.

Why

So, why do Pipelines exist, and what are they for?

Practically speaking, as software systems grow, they will inevitably require sequences of tasks to be executed, usually according to some logical ordering, and oftentimes these tasks will cross system/service boundaries.

When a system is young, such workflows are often simple, unnamed, and involve only one or two stages. As the system evolves, these workflows will grow in the number of stages, and orchestration often becomes more difficult.

Pipelines offer a way to name these workflows, to define them as code so that they can be versioned and reviewed. Pipelines are a way to avoid sprawl, to avoid confusion, and to bring clarity to how a software system actually functions.

Pipelines can be used to define the entire logical composition of a company's software systems. A specification of a system's functionality.

Scaling & High Availability

Pipelines exist side by side with their source Stream. All scaling and availability properties of the source Stream apply to any and all Pipelines associated with that Stream. See the Streams Overview for more details on these properties.

Publishers

Pipelines do not have their own direct publishing mechanism. Instead, data is published to the Pipeline's source Stream, and when an event on that source Stream has a type field which matches one of the Pipeline's triggers, a new Pipeline execution is started with that event as the "root event" of the execution.

Triggers

Every Pipeline may be declared with zero or more triggers. When an event is published to a Pipeline's source Stream, its type field will be compared to each of the matcher patterns in the Pipeline's triggers list. If any match is found, then a new Pipeline execution will begin for that event.

If a Pipeline is declared without any triggers, or with a trigger which is an empty string (""), then it will match every event published to its source Stream.
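
For example, a Pipeline with the following spec fields would trigger executions for events on the Stream events whose type matches either of the listed patterns (the event types here are illustrative):

spec:
  sourceStream: events
  triggers:
    - service.created
    - service.updated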

Consumers

Pipelines are consumed in terms of their stages. As Hadron client programs register as Pipeline consumers, they are required to specify the stage of the Pipeline which they intend to process. All Pipeline consumers form an implicit group per stage.

Pipeline Evolution

As software systems evolve over time, it is inevitable that Pipelines will also evolve. Pipelines may be safely updated in many different ways. The only dangerous update is removing a Pipeline stage: doing so should ALWAYS be considered to result in data loss. These semantics may change in the future, but for now such changes are best avoided.

Adding new stages, changing dependencies, changing stage ordering, all of these changes are safe and Hadron will handle them as expected. See the next section below for best practices on how to make such changes.

Pipeline stages cannot be renamed; renaming is tantamount to deleting a stage and adding a new stage with a different name.

Best Practices

As Pipelines evolve, users should take care to ensure that their applications have been updated to process any new stages added to the Pipeline. Hadron makes this very simple:

  • Before applying the changes to the Pipeline which adds new stages, first update the application's Pipeline consumers.
  • Add a consumer for any new stages.
  • Deploy the new application code. The new consumers will log errors as they attempt to connect, as Hadron will reject the consumer registry until the new stages are applied to the Pipeline. This is expected, and will not crash the client application. The client will simply back off, and retry the connection again soon.
  • Now it is safe to apply the changes to the Pipeline.

In essence: add your Pipeline stage consumers first.
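
As a minimal sketch of that ordering, using the Rust client pattern shown later in this guide (the Pipeline name, stage name, handler function, and manifest file name are all hypothetical):

// Register consumers for the new stages first. Until the updated Pipeline is
// applied, Hadron will reject these registrations and the client will simply
// back off and retry; this is expected and harmless.
client.pipeline("service-provisioning", "setup-billing", billing_handler).await?;

// Only once the new consumers are deployed:
//   kubectl apply -f pipeline.yaml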

If this protocol is not adhered to, then the only danger is that the Pipeline will eventually stop making progress, as too many parallel executions will remain in an incomplete state as they wait for the new Pipeline stages to be processed. Avoid this by deploying your updated application code first.

Data Lifecycle

All Pipeline stage outputs are stored on disk per Pipeline instance. This data is preserved per Pipeline instance until all stages of the Pipeline instance have completed, at which point all data of that Pipeline instance is deleted.

For cases where Pipelines need to produce outputs which should be transactionally written back to the source Stream of the Pipeline, the Transactional Pipeline to Stream outputs feature will solve this use case nicely.

Exchanges

These docs are currently under construction.

  • intended for ephemeral events,
  • events are dropped after being consumed or are dropped immediately if there are no consumers,
  • perfect for ephemeral event streams such as GraphQL Subscriptions, WebSockets, Push Notifications and the like,

RPC Endpoints

These docs are currently under construction.

  • an abstraction built directly on top of gRPC,
  • can be thought of as something similar to proxied gRPC,
  • clients are configured with automatic serialization & deserialization for nearly identical look and feel of plain old gRPC,

Producers & Consumers

These terms can be interchanged with Publisher and Subscriber respectively.

Hadron clients which publish data to Streams, Exchanges or Endpoints are considered to be Producers. Hadron clients which consume data from Streams, Pipelines, Exchanges or Endpoints are considered to be Consumers.

Producers and Consumers establish durable long-lived connections to backend components in the target Hadron cluster, which helps to avoid unnecessary setup and teardown of network connections.

Producers and Consumers typically exist as user-defined code within larger applications. However, they may also exist as open source projects which run independently based on runtime configuration, acting as standalone components, often both producing and consuming data. The latter are typically referred to as Connectors.

Producers and Consumers may be created in any language. The Hadron team maintains a common Rust client which is used as the shared foundation for clients written in other languages, which provides maximum performance and safety across the ecosystem.

The Hadron team also maintains the Hadron CLI, which is based upon the Rust client and which can be used for basic production and consumption of data from Hadron.

Monitoring

All Hadron components, including the Operator and Streams, are instrumented with Prometheus metrics.

Metrics exposed by Hadron components can be easily collected using ServiceMonitors & PodMonitors which come from the Prometheus Operator project. The Hadron helm chart can optionally generate these monitors for you by setting prometheusOperator.enabled=true. More details are included in the Helm chart's README.
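
For instance, assuming the flag applies to the operator chart shown in the Quick Start, the monitors can be enabled at install or upgrade time like so:

helm upgrade hadron-operator oci://ghcr.io/hadron-project/charts/hadron-operator \
    --install --version 0.1.3 \
    --set prometheusOperator.enabled=true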

Hadron components expose their metrics over standard HTTP, so metrics can also be collected using any other pattern supported by Prometheus, such as direct Prometheus discovery and scraping, the OTEL collector, and the like.

Monitoring Mixin

Hadron monitoring can also be configured using Monitoring Mixins. We are currently in the process of adding the Hadron mixin to the official mixins site, however the functionality is the same:

  • Vendor the Hadron mixin into the repo with your infrastructure config using jsonnet-bundler: jb install https://github.com/hadron-project/hadron/monitoring/hadron-mixin,
  • Generate the Hadron mixin config files (dashboards, alerts, rules &c) along with whatever customizations you would like,
  • Then apply the generated config files to your monitoring stack.

Generating the config and making customizations to it is described in the Monitoring Mixins docs here.

Out of the box, a reference Hadron dashboard is included which can help with getting started; it is based on the Pipeline Transactional Processing demo app.

Metrics

The following metrics are exposed by the various Hadron components, which are accessible on port 7002 at the path /metrics on each respective component.
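
To take a quick look at the raw metrics of a component, you can port-forward to one of its pods and curl the endpoint; for the events Stream from the Quick Start this might look like:

# In one terminal:
kubectl port-forward pod/events-0 7002:7002
# In another terminal:
curl http://localhost:7002/metrics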

Operator

# HELP hadron_operator_is_leader a gauge indicating if this node is the leader, where 1.0 indicates leadership, any other value does not
# TYPE hadron_operator_is_leader gauge
hadron_operator_is_leader

# HELP hadron_operator_num_leadership_changes the number of leadership changes in the operator consensus group
# TYPE hadron_operator_num_leadership_changes counter
hadron_operator_num_leadership_changes

# HELP hadron_operator_watcher_errors a counter of errors encountered while watching resources in the K8s API
# TYPE hadron_operator_watcher_errors counter
hadron_operator_watcher_errors

# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds

# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds

# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes

# HELP process_threads Number of OS threads in the process.
# TYPE process_threads gauge
process_threads

# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes

# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes

Stream

# HELP hadron_pipeline_active_instances the number of active pipeline instances
# TYPE hadron_pipeline_active_instances gauge
hadron_pipeline_active_instances

# HELP hadron_pipeline_last_offset_processed the last offset to be processed by the pipeline
# TYPE hadron_pipeline_last_offset_processed counter
hadron_pipeline_last_offset_processed

# HELP hadron_pipeline_stage_subscriptions the number of stage subscribers currently registered
# TYPE hadron_pipeline_stage_subscriptions gauge
hadron_pipeline_stage_subscriptions

# HELP hadron_pipelines_watcher_errors k8s watcher errors from the pipelines watcher
# TYPE hadron_pipelines_watcher_errors counter
hadron_pipelines_watcher_errors

# HELP hadron_secrets_watcher_errors k8s watcher errors from the secrets watcher
# TYPE hadron_secrets_watcher_errors counter
hadron_secrets_watcher_errors

# HELP hadron_stream_current_offset the offset of the last entry written to the stream
# TYPE hadron_stream_current_offset counter
hadron_stream_current_offset

# HELP hadron_stream_subscriber_last_offset_processed stream subscriber group last offset processed
# TYPE hadron_stream_subscriber_last_offset_processed counter
hadron_stream_subscriber_last_offset_processed

# HELP hadron_stream_subscriber_group_members stream subscriber group members count
# TYPE hadron_stream_subscriber_group_members gauge
hadron_stream_subscriber_group_members

# HELP hadron_stream_subscriber_num_groups number of subscribers currently registered on this stream
# TYPE hadron_stream_subscriber_num_groups gauge
hadron_stream_subscriber_num_groups

# HELP hadron_streams_watcher_errors k8s watcher errors from the streams watcher
# TYPE hadron_streams_watcher_errors counter
hadron_streams_watcher_errors

# HELP hadron_tokens_watcher_errors k8s watcher errors from the tokens watcher
# TYPE hadron_tokens_watcher_errors counter
hadron_tokens_watcher_errors

# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds

# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds

# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes

# HELP process_threads Number of OS threads in the process.
# TYPE process_threads gauge
process_threads

# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes

# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes

Local Development

Let's get started with Hadron for local application development using Kubernetes and Kind.

Kind is a tool for running local Kubernetes clusters using Docker container “nodes” and is the easiest way to get started with Kubernetes.

Install Kind

Follow the instructions in the Kind installation guide to ensure kind is installed locally and ready for use.

Next, create a cluster.

kind create cluster

Once your cluster is up and running, you are ready to move on to the next step.

Install Hadron

Helm is the package manager for Kubernetes, and it will need to be installed and available for command-line usage for this use case. If you've been using Kubernetes for any amount of time, then you are probably already using helm.

Before we install the Hadron Operator, we are going to install cert-manager. Though cert-manager is not required by Hadron, it does greatly simplify the setup of TLS certificates, which Hadron uses for its validating webhooks. Instead of manually crafting our own certs, we'll stick with cert-manager.

helm repo add jetstack https://charts.jetstack.io
helm upgrade cert-manager jetstack/cert-manager --install --set installCRDs=true

Now we are ready to install the Hadron Operator:

# Helm >= v3.7.0 is required for OCI usage.
helm install hadron-operator oci://ghcr.io/hadron-project/charts/hadron-operator --version 0.1.3

Install Example Resources

For this use case, let's use the example resources found in the Hadron repo. Here is the code.

Apply the code to the cluster:

wget -qO- https://raw.githubusercontent.com/hadron-project/hadron/main/charts/hadron-operator/examples/full.yaml | kubectl apply -f -

The example file is about 75 lines of code, so here we will only show the names and types of the resources for brevity.

apiVersion: hadron.rs/v1beta1
kind: Stream
metadata:
  name: events
  ...

---
apiVersion: hadron.rs/v1beta1
kind: Pipeline
metadata:
  name: service-creation
# ...

---
apiVersion: hadron.rs/v1beta1
kind: Token
metadata:
  name: hadron-full-access
# ...

---
apiVersion: hadron.rs/v1beta1
kind: Token
metadata:
  name: hadron-read-only
# ...

---
apiVersion: hadron.rs/v1beta1
kind: Token
metadata:
  name: hadron-read-write
# ...

This example code defines a Stream, a Pipeline associated with that Stream, and 3 Tokens which can be used to experiment with Hadron's authentication and authorization system.

Application Integration

Integrating your application with Hadron involves three simple steps:

  • Add the Hadron client as an application dependency. This is language dependent. In Rust, simply add hadron-client = "0.1.0-beta.0" to the [dependencies] section of your Cargo.toml.
  • Next, determine the access token which your application will use. For this use case, we'll use the hadron-full-access Token.
  • Finally, we need the URL to use for connecting to the Hadron Stream. This is always deterministic based on the name of the Stream itself, and follows the pattern: http://{streamName}.{namespaceName}.svc.{clusterApex}:7000.
    • {streamName} is events,
    • {namespaceName} is default,
    • {clusterApex} defaults to cluster.local in Kubernetes,
    • which all works out to http://events.default.svc.cluster.local:7000,
    • for details on how to connect from outside of the Kubernetes cluster, see the Streams reference.

Now that we have this information, let's define a Kubernetes Deployment which uses it. In this case we will just be using the Hadron CLI to establish Stream subscriptions, but this will sufficiently demonstrate how an application integrates using these details.

Create a file called deployment.yaml with the following contents:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-client
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-client
  template:
    metadata:
      labels:
        app: demo-client
    spec:
      containers:
      - name: cli
        image: ghcr.io/hadron-project/hadron/hadron-cli:latest
        command: ["hadron", "stream", "sub", "--group=demo-client", "--start-beginning"]
        env:
        - name: HADRON_TOKEN
          valueFrom:
            secretKeyRef:
              name: hadron-full-access
              key: token
        - name: HADRON_URL
          value: http://events.default.svc.cluster.local:7000

Now apply this file to the cluster:

kubectl apply -f deployment.yaml

This will create a new deployment with 3 replicas, and each replica pod will be running an instance of the Hadron CLI. The CLI will create a subscription to all partitions of the Stream events, print the contents of each event it receives, and then ack the event.
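
To see the consumers in action, check that the pods are running and tail their logs; events published to the Stream events (for example from the CLI session shown in the Quick Start) should then appear in the output:

kubectl get pods -l app=demo-client
kubectl logs -l app=demo-client -f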

Next Steps

From here, some good next steps may be:

  • Make some changes: you've made it pretty far through the guide! Now might be a good time to try some experimentation of your own.
  • Model application workflows: start modeling your own application workflows by encoding them as Pipelines. Start writing your client code for publishing application events to your Stream, and then write the code which will process the various stages of your Pipelines. Check out the Use Case: Service Provisioning for some deeper exploration.
  • Prepare for Production deployment: reviewing the reference sections for Hadron's resources is a great way to prepare for deploying Hadron in a production environment. Streams have various configuration options which can be tuned for scaling, storage, resource quotas and the like.

Use Case: Service Provisioning

This use case assumes some familiarity with Hadron. It is recommended to have at least read the Quick Start chapter before continuing here.

Getting Started

We've joined a new company, ExampleCloud, which offers a service where customers may provision various types of storage systems in the cloud. Though conceptually simple, there are lots of individual stages in this workflow, each of which will take different amounts of time to complete, and all having different failure conditions and data requirements.

With Hadron, defining a Pipeline to model this workflow is simple:

apiVersion: hadron.rs/v1beta1
kind: Pipeline
metadata:
  name: service-creation
spec:
  # The source Stream of this Pipeline.
  sourceStream: events
  # Event types which trigger this Pipeline.
  triggers:
    - service.created
  # When first created, the position of the Source stream to start from.
  startPoint:
    location: beginning
  # Maximum number of parallel executions per stage.
  maxParallel: 50
  stages:
    # Deploy the customer's service in Kubernetes.
    - name: deploy-service

The existence of the Stream events is assumed; review the Quick Start chapter for details.

Here we've defined a Pipeline called service-creation to model this workflow. Hadron uses this config to generate resources within its cluster to store and process the data for this new Pipeline. As shown above, we can even document the purpose and expectations of our workflow stages.

With this configuration, any time our ExampleCloud application publishes a new event of type service.created to our Stream events, Hadron will automatically trigger a new Pipeline execution which will pass that new event through the Pipeline stages defined above (right now, only 1 stage).
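
For a quick manual test, such an event can be published with the Hadron CLI from the Quick Start (the payload here is arbitrary):

hadron stream pub --type service.created '{"customer": "example", "plan": "storage-small"}'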

Client Setup

Next let's create a program which uses the Hadron Client to publish events of type service.created to our Stream events, and then we will also create a subscription to our new Pipeline which will process the stage deploy-service.


// Event producer which publishes events to stream `events`.
let client = hadron::Client::new("http://events.default:7000", /* ... params ... */)?;
let publisher = client.publisher("example-cloud-app").await?;

// Publish a new event based on some application logic.
let event = hadron::NewEvent { /* ... snip ... */ };
publisher.publish(event).await?;

// Process Pipeline stage `deploy-service`.
client.pipeline("service-creation", "deploy-service", deploy_handler).await?;

Awesome! The consumer code of deploy_handler shown above can do whatever it wants. The only requirement is that when it is done, it must return a Result<NewEvent, Error> — that is, it must return an output event for success, or an error for failure cases (resulting in a retry).

Client disconnects will trigger retries. Errors will be tracked by Hadron and exposed for monitoring. Output events from successful processing of stages are persisted to disk. Once a stage is completed successfully for an event, that stage will never be executed again for the same event.

New Requirements

Things are going well at our new company. Service creation is trucking along, and life is good. However, as it turns out, there are a few important steps which we've neglected, and our boss would like to have that fixed.

First, we forgot to actually charge the customer for their new services. Company isn't going to survive long unless we start charging, so we'll need to add a new stage to our Pipeline to handle that logic.

Next, customers have expressed that they would really like to know the state of their service, so once their service has been deployed, we'll need to deploy some monitoring for it. Let's add a new stage for that as well.

Finally, customers are also saying that it would be great to receive a notification once their service is ready. Same thing, new stage.

With all of that, we'll update the Pipeline's stages section to look like this:

  stages:
    # Deploy the customer's service in Kubernetes.
    - name: deploy-service

    # Setup billing for the customer's new service.
    - name: setup-billing
      dependencies: ["deploy-service"]

    # Setup monitoring for the customer's new service.
    - name: setup-monitoring
      dependencies: ["deploy-service"]

    # Notify the user that their service is deployed and ready.
    - name: notify-user
      dependencies: ["deploy-service"]

Pretty simple, but let's break this down. First, the original deploy-service stage is still there and unchanged. Next, we've added our 3 new stages setup-billing, setup-monitoring, and notify-user. There are a few important things to note here:

  • Each of the new stages depends upon deploy-service. This means that they will not be executed until deploy-service has completed successfully and produced an output event.
  • Once deploy-service has completed successfully, our next 3 stages will be executed in parallel. Each stage will receive a copy of the root event which triggered this Pipeline execution, as well as a copy of the output event from the deploy-service stage.

This compositional property of Hadron Pipelines sets it apart from the crowd as a powerful data processing and data integration system.

Given that we've added new stages to our Pipeline, we need to add some new stage consumers to actually handle this logic. This is nearly identical to our consumer for deploy-service; really, only the business logic in the handler will be different for each.


client.pipeline("service-creation", "setup-billing", billing_handler).await?;
client.pipeline("service-creation", "setup-monitoring", monitoring_handler).await?;
client.pipeline("service-creation", "notify-user", notify_handler).await?;

Synchronization

A subtle, yet very impactful aspect of doing parallel processing is synchronization. How do we guard against conflicting states? How do we prevent invalid actions?

At ExampleCloud, we want to provide the best experience for our users. As such, we will add one more stage to our Pipeline which we will use to synchronize the system. Here is the new stage:

    # Synchronize the state of the system.
    #
    # - Update tables using the event data from all of the upstream stages.
    # - Update the state of the new service in the database so that the
    #   user can upgrade, resize, or otherwise modify the service safely.
    - name: sync
      dependencies:
        - deploy-service
        - setup-billing
        - setup-monitoring
        - notify-user

The sync stage we've defined here will integrate the data signal from all previous stages in the Pipeline, receiving a copy of the root event and output events of each stage. There is a lot that we can do with this data, but here are a few important things that we should do:

  • Update the state of the user's new service in our database. This will ensure that users see the finished state of their services in our fancy ExampleCloud API and UI.
  • With this small amount of synchronization, we can prevent race conditions in our system by simply not allowing certain actions to be taken on a service when it is in the middle of a Pipeline.

Next Steps

From here, we could begin to model other workflows in our system as independent Pipelines triggered from different event types. Examples might be:

  • Upgrade a service: maybe the user needs a bigger or smaller service. There are plenty of granular tasks involved in making this happen, which is perfect for Pipelines.
  • Delete a service: resources originally created and associated with a service may live in a few different microservices. Have each microservice process a different stage for their own service deletion routines as part of a new Pipeline specifically for deleting services.

See the Use Case: Transactional Processing for more details on how to build powerful stateful systems while still leveraging the benefits of an event-driven architecture.

Transactional Processing

This chapter is broken into two parts:

What is Transactional Processing

This use case assumes some familiarity with Hadron. It is recommended to have at least read the Quick Start chapter before continuing here. Reading the previous chapter Use Case: Service Provisioning is a great primer as well.

Transactional processing is an algorithmic pattern used to ensure that event processing is idempotent.

In the greater context of system design and operations, errors take place all the time. This can lead to race conditions, duplicate events, invalid states, you name it. How do we deal with these issues?

In some cases, none of these issues matter. Maybe we don't care about the data, maybe we don't retry in the face of errors. However, when attempting to build a consistent system (either strong or eventual), then we need to care about these issues. Deeply.

Transactional processing is the way to deal with these issues. Before we delve more deeply into implementing a transactional processing pattern, we need to understand identity.

Identity

An event must be uniquely identifiable.

In Hadron, this is easy, as we use the CloudEvents model, and each event bears the id and source fields which together uniquely identify an event. Producers provide the id and source fields when publishing events, the only trick is in determining what the values for these fields should be.
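
As a concrete example, the CLI exposes these fields directly when publishing (the values shown are arbitrary):

hadron stream pub \
    --id 2b4f9c4e-9f6a-4f2e-8c1d-7a1c2d3e4f5a \
    --source billing-service.out_table \
    --type invoice.created \
    '{"invoice": 1234}'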

Establishing Identity

Depending on the requirements of the system being built, there are a few different ways to establish the identity of an event before publishing it to Hadron.

OutTableIdentity: this approach uses a transactional database system to generate events as part of the database transaction in which the event transpired, writing those events first to a database table, called the out-table, and then using a separate transaction to copy or move those events from the out-table into Hadron.

  • Most transactional databases provide ACID guarantees around this process (it is just a regular transaction).
  • This means that the generation of the event is "all or nothing". If the transaction is rolled-back before it is committed, then it is as though it never happened.
  • The event should have a unique ID enforced by the database as it is written to the out-table and which will map directly to the Hadron CloudEvents id field.
  • The event should also have a well established value for source, which will often correspond to the application's name, the out-table's name, or some other value which uniquely identifies the entity which generated the event.
  • The process of moving events from the out-table into Hadron could still result in duplicate events, however the ID of the event from the out-table will be preserved and firmly establishes its identity.

OccurrenceIdentity: this approach establishes an identity the moment the event occurs, typically using something like a UUID. The generated ID is embedded within the event when it is published to Hadron, and even though the publication could be duplicated due to retries, the generated ID firmly establishes its identity.

  • The generated ID will directly map to the Hadron CloudEvents id field of the event when published.
  • The event should also have a well established value for source, which will often correspond to the application's name or some other value which uniquely identifies the entity which generated the event.
  • This approach works well for systems which need to optimize for rapid ingest, and as such typically treat the occurrence itself as authoritative and will often not subject the event to dynamic stateful validation (which is what out-tables are good for).

These two patterns for establishing the identity of an event generalize quite well, and can be applied to practically any use case depending on its requirements.

In all of these cases, the id and source fields of the CloudEvents model together establish an absolute identity for the event. This identity is key in implementing a transactional processing model which is impervious to race conditions, duplicate events and the like.

Hadron is Transactional

Hadron itself is a transactional system.

For Streams, transactions apply only to ack'ing that an event or event batch was processed. For Pipeline stages, ack'ing an event requires an output event which is transactionally recorded as part of the ack.

For both Stream and Pipeline consumers, once an event or event batch has been ack'ed, those events will never be processed again by the same consumer group. However, it is important to note that Hadron does not use the aforementioned CloudEvents identity for its transactions. Hadron Stream partitions use a different mechanism to track processing, and users of Hadron should never have to worry about that.

For users of Hadron, processing should only ever take into account the id and source fields of an event to establish identity.

Implement Transactional Processing

This chapter builds upon the ideas established in the previous chapter What is Transactional Processing and it is recommended to have a firm understanding of that content before continuing here.

Armed with the knowledge of event identity, we are now ready to implement our own transactional processing system.

For this use case we will make the following starting assumptions:

  • We are using an RDBMS (like PostgreSQL or CockroachDB) for application state.
  • We are using the OutTableIdentity pattern to transactionally generate our events.
  • We are using a microservices model, so we have multiple out-tables, each in its own database schema (think of a namespace, but for a database) owned by a different microservice.
  • Every event published to our Hadron Stream is coming from one of our microservice out-tables. Let's say our Stream is named events.

What properties does this model have?

  • Every event on our Stream will have a guaranteed unique identity based on the combination of the id and source fields.
  • Duplicates may still exist, but we know they are duplicates based on the id and source field combination. Our implementation below will easily deal with such duplicates.
  • As requirements evolve, we can seamlessly add new microservices to our system following this same pattern, and our uniqueness properties will still hold.

This is not the only way to implement a pattern like this, but it is a great stepping stone which can be adapted to your specific use cases.

Transactional Consumers

Given our starting assumptions and the properties of our model, we are ready to implement a transactional processing algorithm for our consumers.

First, we'll show what this looks like for Streams, then we'll show what this looks like for Pipelines.

Stream Consumers

For any given microservice, we will need an in-table. For transactional processing, an in-table is the logical counterpart to an out-table, but is far simpler. The in-table in this case needs only two columns, id and source, corresponding to an event's id and source fields. The in-table will also have a compound primary key over both of these fields.

Time to implement. Our microservice will have a live subscription to our Stream events, and when an event is received, the algorithm of the event handler will be roughly as follows:

  • Open a new transaction with our database.
  • Attempt to write a new row to the in-table using the event's id and source fields as values for their respective columns.
    • If an error is returned indicating a primary key violation, then we know that we've already processed this event. Close the database transaction. Return a success response from the event handler. Done.
    • Else, if no error, then continue.
  • Now time for business logic. Do whatever it is your microservice needs to do. Manipulate some rows in the database using the open transaction. Whatever.
  • If your business logic needs to produce a new event as part of its business logic, then craft the new event, and write it to the out-table.
  • Commit the database transaction. If errors take place, no worries. Just return the error from the event handler and a retry will take place.
  • Finally, return a success from the event handler. Done!

For cases where a new event was generated as part of the business logic, the out-table is already setup to ship these events over to Hadron.
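
The following is a minimal sketch of such a handler in Rust, assuming sqlx with PostgreSQL; the table name, the column types, and the use of ON CONFLICT DO NOTHING (in place of catching the primary key violation directly) are illustrative choices, not part of Hadron itself:

use sqlx::PgPool;

// Hypothetical handler invoked for each Stream delivery.
async fn handle_event(pool: &PgPool, id: &str, source: &str) -> Result<(), sqlx::Error> {
    let mut tx = pool.begin().await?;

    // Record the event identity; zero rows affected means it was already processed.
    let res = sqlx::query("INSERT INTO in_table (id, source) VALUES ($1, $2) ON CONFLICT DO NOTHING")
        .bind(id)
        .bind(source)
        .execute(&mut *tx)
        .await?;
    if res.rows_affected() == 0 {
        tx.commit().await?;
        return Ok(()); // Duplicate delivery: ack without re-running business logic.
    }

    // ... business logic here, using `tx`; optionally write a new event to the out-table ...

    tx.commit().await?;
    Ok(())
}

The Pipeline stage handler described in the next section follows the same shape, with the additional stage and output columns and the stored output event returned for duplicates.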

The code implementing this model can be found here: examples/stream-transactional-processing.

Pipeline Consumers

For Pipeline consumers, the in-table needs four columns, id, source, stage and output. The in-table will have a compound primary key over the columns id, source and stage.

Let's do this. Our microservice will have a Pipeline stage subscription to whatever Pipeline stages it should process, and when an event is received, the algorithm of the event handler will be roughly as follows:

  • Open a new transaction with our database.
  • Query the in-table using the table's primary key, where id is the root event's id, source is the root event's source, and stage is the name of the stage being processed.
    • If a row is returned, then we know that we've already processed this root event for this stage, and the row contains the output event which our Pipeline stage handler needs to return. Close the database transaction. Return the output event. Done.
    • Else, if no row is found, then this event has not yet been processed. Continue.
  • Now time for business logic. Do whatever it is your microservice needs to do. Manipulate some rows in the database using the open transaction, use the root event or any of the dependency events of this Pipeline stage. Whatever.
  • Pipeline stage handlers are required to return an output event indicating success. Construct a new output event. For simplicity, use the id of the root event and simply make the source something unique to this microservice's Pipeline stage.
  • Using the open database transaction, write a new row to the in-table where id is the root event's id, source is the root event's source, stage is the name of the stage being processed, and output is the serialized bytes of our new output event.
  • Commit the database transaction. If errors take place, no worries. Just return the error from the event handler and a retry will take place.
  • Finally, return the output event from the event handler. Done!

With Pipelines, we are able to model all of the workflows of our applications, even spanning across multiple microservices, teams, and even system boundaries. Because Pipelines require an output event to be returned from stage consumers, we do not need an independent out-table process to ship these events over to Hadron.

The code implementing this model can be found here: examples/pipeline-transactional-processing.

Next Steps

Now that you've seen the algorithms for implementing a transactional processing model, it would be good to:

  • Dive into the code: follow the links provided above to see the reference implementation code.
  • Copy the code and modify it: if you need to implement your own transactional processing system, the reference code is a great stepping stone.

Tokens

Hadron uses a simple authentication and authorization model which works seamlessly with Kubernetes.

All authentication in Hadron is performed via Tokens. Tokens are Hadron CRDs, which are recognized and processed by the Operator. Token CRs result in a JWT being created with the Operator's private RSA key. The generated JWT is then written to a Kubernetes Secret in the same namespace as the Token, and will bear the same name as the Token.

apiVersion: hadron.rs/v1beta1
kind: Token
metadata:
  ## The name of this Token.
  ##
  ## The generated Kubernetes Secret will have the same name.
  name: :string
  ## The Kubernetes namespace of this Token.
  ##
  ## The generated Kubernetes Secret will live in the same namespace.
  namespace: :string
spec:
  ## Grant full access to all resources of the cluster.
  ##
  ## If this value is true, then all other values are
  ## ignored when establishing authorization.
  all: :bool

  ## Pub/Sub access for Streams by name.
  ##
  ## Permissions granted on a Stream extend to any Pipelines
  ## associated with that Stream.
  streams:
    ## The names of all Streams which this Token can publish to.
    pub: [:string]
    ## The names of all Streams which this Token can subscribe to.
    sub: [:string]

  ## Pub/Sub access for Exchanges by name.
  exchanges:
    ## The names of all Exchanges which this Token can publish to.
    pub: [:string]
    ## The names of all Exchanges which this Token can subscribe to.
    sub: [:string]

  ## Pub/Sub access for Endpoints by name.
  endpoints:
    ## The names of all Endpoints which this Token can publish to.
    pub: [:string]
    ## The names of all Endpoints which this Token can subscribe to.
    sub: [:string]
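
For example, a Token granting publish and subscribe access to only the events Stream might look like the following (the Token name is arbitrary):

apiVersion: hadron.rs/v1beta1
kind: Token
metadata:
  name: events-read-write
  namespace: default
spec:
  all: false
  streams:
    pub:
      - events
    sub:
      - events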

Streams

Streams are append-only, immutable logs of data with absolute ordering per partition.

Streams are deployed as independent StatefulSets within a Kubernetes cluster, based purely on a Stream CRD stored in Kubernetes.

Each pod of a Stream's StatefulSet acts as a leader of its own partition. Kubernetes guarantees the identity, stable storage and stable network address of each pod. Replication is based on a deterministic algorithm and does not require consensus due to these powerful identity and stability properties.

apiVersion: hadron.rs/v1beta1
kind: Stream
metadata:
  ## The name of this Stream.
  ##
  ## The generated Kubernetes StatefulSet of this Stream will have the same name.
  name: :string
  ## The Kubernetes namespace of this Stream.
  ##
  ## The generated Kubernetes StatefulSet will live in the same namespace.
  namespace: :string
spec:
  ## The number of partitions this Stream should have.
  ##
  ## This directly corresponds to the number of replicas of the
  ## backing StatefulSet of this Stream. Scaling this value up or down will
  ## cause the backing StatefulSet to be scaled identically.
  partitions: :integer

  ## When true, the Stream processes will use `debug` level logging.
  debug: :bool

  ## The data retention policy for data residing on this Stream.
  retentionPolicy:
    ## The retention strategy to use.
    ##
    ## Allowed values:
    ## - "retain": this strategy preserves the data on the Stream indefinitely.
    ## - "time": this strategy preserves the data on the Stream for the amount of time
    ##   specified in `.spec.retentionPolicy.retentionSeconds`.
    strategy: :string (default "time")
    ## The number of seconds which data is to be retained on the Stream before it is deleted.
    ##
    ## This field is optional and is only evaluated when using the "time" strategy.
    retentionSeconds: :integer (default 604800) # Default 7 days.

  ## The full image name, including tag, to use for the backing StatefulSet pods.
  image: :string

  ## The PVC volume size for each pod of the backing StatefulSet.
  pvcVolumeSize: :string

  ## The PVC storage class to use for each pod of the backing StatefulSet.
  pvcStorageClass: :string

  ## The PVC access modes to use for each pod of the backing StatefulSet.
  pvcAccessModes: [:string]

Pipelines

Pipelines are workflow orchestration for data on Streams, providing structured concurrency for arbitrarily complex multi-stage workflows.

Pipelines are defined as CRDs stored in Kubernetes. Pipelines exist side by side with their source Stream, and Streams may have any number of associated Pipelines. Pipelines are triggered for execution when an event published to a Stream has an event type which matches one of the trigger patterns of an associated Pipeline.

apiVersion: hadron.rs/v1beta1
kind: Pipeline
metadata:
  ## The name of this Pipeline.
  name: :string
  ## The Kubernetes namespace of this Pipeline.
  ##
  ## This Pipeline's associated `sourceStream` must exist within
  ## the same Kubernetes namespace.
  namespace: :string
spec:
  ## The name of the Stream which feeds this Pipeline.
  ##
  ## Events published to the source Stream which match this Pipeline's
  ## `triggers` will trigger a new Pipeline execution with the matching
  ## event as the root event.
  sourceStream: :string

  ## Patterns which must match the event `type` of an event on the
  ## source Stream in order to trigger a Pipeline execution.
  triggers: [:string]

  ## The maximum number of Pipeline executions which may be executed in parallel.
  ##
  ## This is calculated per partition.
  maxParallel: :integer

  ## The location of the source Stream which this Pipeline
  ## should start from when first created.
  startPoint:
    ## The start point location.
    location: :string # One of "beginning" | "latest" | "offset"
    ## The offset to start from, which is only evaluated when `location` is `offset`.
    ##
    ## This is applied to all partitions of the source Stream identically.
    offset: :integer

  ## Workflow stages of this Pipeline.
  stages:
    ## Each stage must have a unique name.
    - name: :string
      ## The names of other stages in this Pipeline which
      ## must be completed first before this stage may start.
      after: [:string]
      ## The names of other stages in this Pipeline
      ## which this stage depends upon for input.
      ##
      ## All dependencies listed will have their outputs delivered
      ## to this stage at execution time.
      ##
      ## When a stage is listed as a dependency, there is no need to
      ## also declare it in the `after` list.
      dependencies: [:string]
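
As an illustration of the distinction between after and dependencies, consider the following stages fragment (the stage names are hypothetical):

  stages:
    - name: validate
    # Runs after `validate`, but does not receive its output.
    - name: archive
      after: ["validate"]
    # Receives the output of `validate` at execution time; listing it under
    # `dependencies` makes a separate `after` entry unnecessary.
    - name: notify
      dependencies: ["validate"]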

Exchanges

These docs are currently under construction.

RPC Endpoints

These docs are currently under construction.

Clients

The Hadron team maintains the following Hadron clients by language.

Rust

The official Rust Hadron client library.

As Hadron matures and grows, the Hadron team will be releasing client libraries for various other languages using the Rust client as a common core, providing maximum speed, safety and efficiency for all client languages.

CLI

Hadron ships with a native CLI (Command-Line Interface), called hadron.

The best way to use the Hadron CLI is by launching a temporary pod inside of the Kubernetes cluster where your Hadron cluster is running. This makes access clean, simple, and secure because the credentials never need to leave the cluster. For a Stream named events deployed in the default namespace, and a Token named app-token, the following command will launch the Hadron CLI:

kubectl run hadron-cli --rm -it \
    --env HADRON_TOKEN=$(kubectl get secret app-token -o=jsonpath='{.data.token}' | base64 --decode) \
    --env HADRON_URL="http://events.default.svc.cluster.local:7000" \
    --image ghcr.io/hadron-project/hadron/hadron-cli:latest

Accessing Hadron from outside of the Kubernetes cluster is currently in progress, and details will be added here when ready.

Commands

The Hadron CLI

USAGE:
    hadron [FLAGS] [OPTIONS] <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
    -v               Enable debug logging

OPTIONS:
        --token <token>    Set the auth token to use for interacting with the cluster
        --url <url>        Set the URL of the cluster to interact with

SUBCOMMANDS:
    help        Prints this message or the help of the given subcommand(s)
    pipeline    Hadron pipeline interaction
    stream      Hadron stream interaction

hadron stream

Hadron stream interaction

USAGE:
    hadron stream <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    help    Prints this message or the help of the given subcommand(s)
    pub     Publish data to a stream
    sub     Subscribe to data on a stream

hadron stream pub

Publish data to a stream

USAGE:
    hadron stream pub [FLAGS] [OPTIONS] <data> --type <type>

FLAGS:
        --binary     If true, treat the data payload as a base64 encoded binary blob
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --id <id>            The ID of the new event, else a UUID4 will be generated
    -o <optattrs>...         Optional attributes to associate with the given payload
        --source <source>    The source of the new event, else `hadron-cli` will be used
        --type <type>        The type of the new event

ARGS:
    <data>    The data payload to be published

hadron stream sub

Subscribe to data on a stream

USAGE:
    hadron stream sub [FLAGS] [OPTIONS] --group <group>

FLAGS:
    -d, --durable            Make the new subscription durable
    -h, --help               Prints help information
        --start-beginning    Start from the first offset of the stream, defaults to latest
        --start-latest       Start from the latest offset of the stream, default
    -V, --version            Prints version information

OPTIONS:
    -b, --batch-size <batch-size>        The batch size to use for this subscription [default: 1]
    -g, --group <group>                  The subscription group to use
        --start-offset <start-offset>    Start from the given offset, defaults to latest

hadron pipeline

Hadron pipeline interaction

USAGE:
    hadron pipeline <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    help    Prints this message or the help of the given subcommand(s)
    sub     Subscribe to data on a stream

hadron pipeline sub

Subscribe to data on a stream

USAGE:
    hadron pipeline sub <pipeline> <stage>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <pipeline>    The pipeline to which the subscription should be made
    <stage>       The pipeline stage to process