Announcing ZephFlow: A Lightweight Data Processing Framework Now Open Source




Bo Lei

Co-Founder & CTO, Fleak

We're excited to announce the open source release of ZephFlow, our lightweight yet powerful data processing framework. After months of internal development and refinement, we're making this tool available to the broader developer community.

Why We Built ZephFlow

Working with data processing frameworks often introduces significant challenges:

Configuring and operating processing clusters consumes substantial resources
Implementing jobs frequently requires complex configuration
Many existing solutions have steep learning curves, even for simpler use cases

ZephFlow was developed to address these pain points by providing a more accessible approach to data transformation that doesn't sacrifice capability.

What is ZephFlow?

ZephFlow is a flexible data processing framework that allows developers to build transformation pipelines using a directed acyclic graph (DAG) structure. It provides:

A simple, expressive DSL for defining data transformations
Support for both SQL and our custom Fleak Eval Expression Language (FEEL)
The ability to run as a standalone process, within a JVM application, or as an HTTP service
Powerful operators for filtering, transforming, and validating data
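To make the DAG idea concrete, here is a minimal, self-contained Java sketch of operators wired into a small graph. The `Node`, `filter`, `transform`, and `sink` names are hypothetical illustrations, not ZephFlow classes; the sketch only shows how records flow along edges, including fan-out from one node to several downstream nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of a DAG of operators; this is NOT ZephFlow's internal API.
public class DagSketch {
    // Each node maps one input record to zero or more output records
    // and forwards every output to all of its downstream nodes.
    static final class Node {
        private final Function<Map<String, Object>, List<Map<String, Object>>> op;
        private final List<Node> downstream = new ArrayList<>();

        Node(Function<Map<String, Object>, List<Map<String, Object>>> op) {
            this.op = op;
        }

        // Adds an edge this -> child and returns the child for chaining.
        Node to(Node child) {
            downstream.add(child);
            return child;
        }

        // Depth-first propagation; terminates because the graph has no cycles.
        void push(Map<String, Object> record) {
            for (Map<String, Object> out : op.apply(record)) {
                for (Node child : downstream) {
                    child.push(out);
                }
            }
        }
    }

    // Filter operator: emits the record only if the predicate holds.
    static Node filter(Predicate<Map<String, Object>> p) {
        return new Node(r -> p.test(r) ? List.of(r) : List.of());
    }

    // Transform operator: applies f to a mutable copy so sibling branches never share mutations.
    static Node transform(Function<Map<String, Object>, Map<String, Object>> f) {
        return new Node(r -> List.of(f.apply(new HashMap<>(r))));
    }

    // Sink operator: collects records into a list and emits nothing further.
    static Node sink(List<Map<String, Object>> target) {
        return new Node(r -> {
            target.add(r);
            return List.of();
        });
    }

    public static void main(String[] args) {
        List<Map<String, Object>> archive = new ArrayList<>();
        List<Map<String, Object>> alerts = new ArrayList<>();

        Node root = filter(r -> "success".equals(r.get("status")));
        Node tagged = root.to(transform(r -> { r.put("tagged", true); return r; }));
        tagged.to(sink(archive)); // fan-out: the same tagged record reaches both sinks
        tagged.to(sink(alerts));

        root.push(Map.of("status", "success", "id", 1));
        root.push(Map.of("status", "error", "id", 2));

        System.out.println(archive.size() + " " + alerts.size()); // prints "1 1"
    }
}
```

Because the graph is acyclic, depth-first propagation always terminates, and copying inside `transform` keeps one branch's changes invisible to its siblings.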

ZephFlow can adapt to your specific needs: it can run as a synchronous API backend for smaller workloads, where multiple jobs share resources, or as an asynchronous data pipeline with dedicated resources for more demanding scenarios.
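The difference between these two styles can be sketched in plain Java. This is an illustration of the general pattern under stated assumptions, not ZephFlow's runtime: `handleRequest`, `startWorker`, and `POISON_PILL` are hypothetical names, and a `BlockingQueue` stands in for whatever buffering a real asynchronous pipeline would use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of synchronous vs asynchronous execution; not ZephFlow's actual runtime.
public class ExecutionModes {
    // Sentinel that tells the worker to stop; real records must never equal this value.
    static final String POISON_PILL = "__STOP__";

    // Shared transformation logic, reused by both execution styles.
    static String transform(String record) {
        return record.trim().toLowerCase(Locale.ROOT);
    }

    // Synchronous style: the caller blocks until the batch is processed
    // and receives the results directly, like an API request/response.
    static List<String> handleRequest(List<String> batch) {
        List<String> results = new ArrayList<>();
        for (String r : batch) {
            results.add(transform(r));
        }
        return results;
    }

    // Asynchronous style: producers enqueue records and return immediately;
    // a dedicated worker thread drains the queue and writes to a sink.
    static Thread startWorker(BlockingQueue<String> queue, List<String> sink) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String r = queue.take();
                    if (POISON_PILL.equals(r)) {
                        return;
                    }
                    sink.add(transform(r));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handleRequest(List.of("  A ", "B"))); // prints [a, b]

        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> sink = Collections.synchronizedList(new ArrayList<>());
        Thread worker = startWorker(queue, sink);
        queue.put(" C ");
        queue.put("D");
        queue.put(POISON_PILL);
        worker.join(); // the demo waits only so it can print the final sink contents
        System.out.println(sink); // prints [c, d]
    }
}
```

The transformation logic is identical in both cases; only who waits for the result, and which thread does the work, changes.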

Getting Started

Here's a simple example of a ZephFlow pipeline:

// Create a flow that filters and transforms data
ZephFlow flow = ZephFlow.startFlow();
ZephFlow processedFlow = flow
    .kafkaSource("broker:9092", "input-topic", "consumer-group", EncodingType.JSON_OBJECT, null)
    .filter("$.status == 'success'")  // FEEL expression
    .eval("dict_merge($, dict(processed_at=epoch_to_ts_str($.timestamp, \"yyyy-MM-dd'T'HH:mm:ss\")))")
    .kafkaSink("broker:9092", "output-topic", null, EncodingType.JSON_OBJECT, null);

// Execute the flow
processedFlow.execute("job_id", "env", "service");
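For readers new to FEEL, the sketch below approximates in plain Java what the `filter` and `eval` steps above do to each record. It is not ZephFlow code, and it assumes that `timestamp` holds epoch milliseconds and that `epoch_to_ts_str` formats in UTC; check the FEEL documentation for the function's exact semantics.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Plain-Java approximation of the filter + eval steps, for illustration only.
// ASSUMPTIONS: "timestamp" holds epoch milliseconds and is formatted in UTC.
public class RecordStepSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss").withZone(ZoneOffset.UTC);

    // Returns the enriched record, or empty if the filter drops it.
    static Optional<Map<String, Object>> process(Map<String, Object> record) {
        // Roughly .filter("$.status == 'success'")
        if (!"success".equals(record.get("status"))) {
            return Optional.empty();
        }
        // Roughly dict_merge($, dict(processed_at=...)): keep every input field, add one more.
        long epochMillis = ((Number) record.get("timestamp")).longValue();
        Map<String, Object> merged = new HashMap<>(record);
        merged.put("processed_at", FORMAT.format(Instant.ofEpochMilli(epochMillis)));
        return Optional.of(merged);
    }

    public static void main(String[] args) {
        // prints 1970-01-01T00:00:00
        System.out.println(process(Map.of("status", "success", "timestamp", 0L)).get().get("processed_at"));
        // prints false (record dropped by the filter)
        System.out.println(process(Map.of("status", "failed", "timestamp", 0L)).isPresent());
    }
}
```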

Key Benefits

Simplicity: Define complex transformations with minimal code
Flexibility: Run anywhere, from your local development environment to production services
Resource Efficiency: Process data without excessive infrastructure overhead
Expressiveness: Leverage SQL or FEEL for powerful data manipulations

Use Cases

We've successfully deployed ZephFlow for various internal use cases:

Log processing and normalization
ETL workloads
Event streaming and transformation
Data validation and enrichment

The framework is particularly effective when you need powerful transformation capabilities without the operational complexity of larger distributed systems.

Get Involved

Check out the documentation to learn more about ZephFlow and how to use it. The source code is available on GitHub.

We welcome contributions, feedback, and feature requests. Join us in building a more efficient approach to data processing.
