Skip to content

A collection of runnable and self-contained examples inspired by various akka-streams (pekko-streams), Alpakka (Pekko connectors) and akka-http (pekko-http) docs, tutorials and blogs

License

Notifications You must be signed in to change notification settings

pbernet/akka_streams_tutorial

Repository files navigation

GolpoAI NotebookLM Sourcegraph Build Status Scala Steward

Pekko tutorial

This repository contains a collection of runnable and self-contained examples from various Pekko Streams, Pekko Connectors and Pekko HTTP tutorials, blogs, and postings.

Akka vs Pekko
As of umbrella release 22.10 Lightbend has changed the licensing model. Apache Pekko is the open source alternative. A BIG Thank you to the committed Pekko committers.

For now the branch migrate_pekko contains the migration (with a few losses). Currently, this is the only maintained branch. The plan is to move the content of this branch to a new pekko_tutorial repo and to support Scala 3 when Pekko Connectors is ready.

Project Description

"It's working!" a colleague used to shout across the office when yet another proof of concept was running its first few hundred meters along the happy path, aware that the real work started right there. This repo aims to provide you with exactly this feeling by offering a collection of runnable examples in Scala and Java.

Getting Started

Prerequisites

Install Java 17 or higher ( recommended: GraalVM)

Install sbt and Docker

Installation

  1. Clone the repository: git clone https://github.com/pbernet/akka_streams_tutorial.git
  2. Navigate to the project directory: cd akka_streams_tutorial
  3. Compile the project: sbt compile

Running Examples

Each example class contains instructions on how to run it from the IDE. Most examples are throttled and provide a verbose log, by searching the log you see what is happening. Some examples deliberately throw RuntimeException eg to show recovery behaviour.

Examples Overview

Featured examples with complex workflows:

Many examples deal with shared state management. While most Pekko Streams operators are stateless, the samples in package sample.stream_shared_state show some trickier stateful operators in action.

The *Echo example series implement round trips eg HttpFileEcho and WebsocketEcho

Using testcontainers allows running realistic integration test scenarios with just one click:

Examples of integrating AWS services with Pekko Connectors:

Run them via the corresponding IT test classes locally in localstack/minio or against your AWS account.

Other example resources

Element deduplication

Dropping identical (consecutive or non-consecutive) elements in an unbounded stream:

The following use case uses a local caffeine cache to avoid duplicate HTTP file downloads:

  • Process a stream of incoming messages with reoccurring TRACE_ID
  • For the first message: download a .zip file from a FileServer and add TRACE_ID→Path to the local cache
  • For subsequent messages with the same TRACE_ID: fetch file from cache to avoid duplicate downloads per TRACE_ID
  • Use time based cache eviction to get rid of old downloads
Class Description
FileServer Local HTTP FileServer for non-idempotent file download simulation
LocalFileCacheCaffeine Pekko streams client flow, with cache implemented with caffeine

Windturbine example

Working sample from the blog series 1-4 from Colin Breck where classic Actors are used to model shared state, life-cycle management and fault-tolerance in combination with Pekko Streams. Colin Breck explains these concepts and more in the 2017 Reactive Summit talk Islands in the Stream: Integrating Akka Streams and Akka Actors

Class Description
SimulateWindTurbines Starts n clients which feed measurements to the server
WindTurbineServer Start server which collects measurements

The clients communicate via websockets with the WindTurbineServer. After a restart of SimulateWindTurbines the clients are able to resume. Shutting down the WindTurbineServer results in reporting to the clients that the server is not reachable. After restarting WindTurbineServer the clients are able to resume. Since there is no persistence, the processing just continuous.

Apache Kafka WordCount

The ubiquitous word count with an additional message count. A message is a sequence of words. Start the classes in the order below and watch the console output.

Class Description
KafkaServerEmbedded Uses Embedded Kafka (= an in-memory Kafka instance). No persistence on restart
WordCountProducer pekko-streams-kafka client which feeds random words to topic wordcount-input
WordCountKStreams.java Kafka Streams DSL client to count words and messages and feed the results to wordcount-output and messagecount-output topics. Contains additional interactive queries which should yield the same results WordCountConsumer
WordCountConsumer pekko-streams-kafka client which consumes aggregated results from topic wordcount-output and messagecount-output

HL7 V2 over TCP via Kafka to Websockets

The PoC in package alpakka.tcp_to_websockets is from the E-Health domain, relaying HL7 V2 text messages in some kind of "Trophy" across these stages:

Hl7TcpClientHl7Tcp2KafkaKafkaServerKafka2WebsocketWebsocketServer

The focus is on resilience (= try not to lose messages during the restart of the stages). However, currently messages may reach the WebsocketServer unordered (due to retry in Hl7TcpClient) and in-flight messages may get lost (upon re-start of WebsocketServer).

Start each stage separately in the IDE, or together via the integration test AlpakkaTrophySpec

Analyse Wikipedia edits live stream

Find out whose Wikipedia articles were changed in (near) real time by consuming the Wikipedia Edits stream provided via SSE. The class WikipediaEditsAnalyser implements the following workflow:

Use the title as identifier to fetch the extract from the Wikipedia API, eg for Douglas Adams.

Local NER processing on the extract / content using opennlp yields personsFoundLocal, which are then added to the wikipediaedits Elasticsearch/Opensearch Index.

Also, do remote NER processing on the extract / content using OpenAI GPT_4_O_MINI to obtain personsFoundRemote.

All persons found (local and remote) can be viewed in the Index with a Browser, eg http://localhost:{mappedPort}/wikipediaedits/_search?q=personsFoundLocal:*

All content is also transformed into embeddings using LangChain4j BgeSmallEnV15QuantizedEmbeddingModel to a local InMemoryEmbeddingStore to be able to RAG chat against the content of the currently edited Wikipedia pages via a local AI Assistant http://localhost:8080/assistant

Movie subtitle translation via LLMs

SubtitleTranslator translates all blocks of an English source .srt file to a target language using LLMs via LangChain4j

Pekko streams helps with:

  • Workflow modelling
  • Scene splitting to session windows. All blocks of a scene are grouped in one session and then translated in one API call
  • Throttling to not exceed the OpenAI API rate limits
  • Continuous writing of translated blocks to the target file to avoid data loss on NW failure

About

A collection of runnable and self-contained examples inspired by various akka-streams (pekko-streams), Alpakka (Pekko connectors) and akka-http (pekko-http) docs, tutorials and blogs

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •