
Datapipe #107

@puritanne

Description


Datapipe is a real-time, incremental Python ETL library for machine learning with record-level dependency tracking.

The library is used to describe data processing pipelines and tracks dependencies for every record that passes through them. As a result, each task in the pipeline receives only the records that have been added or modified since it last ran, instead of reprocessing the whole dataset. This makes data handling more efficient.
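The idea of record-level change tracking can be illustrated with a minimal sketch. It assumes each record has a stable string id and detects changes by comparing content hashes between runs. `record_hash`, `ChangeTracker` and `run_step` are hypothetical names chosen for illustration, not part of Datapipe's API, and hashing is only one possible change-detection strategy.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Set, Tuple

Record = Dict[str, Any]


def record_hash(record: Record) -> str:
    """Content hash of a record; keys are sorted so field order does not matter."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ChangeTracker:
    """Hypothetical helper: remembers the hash of every record id from the previous run."""

    def __init__(self) -> None:
        self._hashes: Dict[str, str] = {}

    def diff(self, batch: Dict[str, Record]) -> Tuple[Set[str], Set[str]]:
        """Return (changed_ids, deleted_ids) relative to the previous call."""
        new_hashes = {rid: record_hash(rec) for rid, rec in batch.items()}
        # New ids have no stored hash, so they are reported as changed too.
        changed = {rid for rid, h in new_hashes.items() if self._hashes.get(rid) != h}
        deleted = set(self._hashes) - set(new_hashes)
        self._hashes = new_hashes
        return changed, deleted


def run_step(
    tracker: ChangeTracker,
    batch: Dict[str, Record],
    transform: Callable[[Record], Record],
    output: Dict[str, Record],
) -> Tuple[Set[str], Set[str]]:
    """Apply `transform` only to changed records and drop outputs of deleted ones."""
    changed, deleted = tracker.diff(batch)
    for rid in changed:
        output[rid] = transform(batch[rid])
    for rid in deleted:
        output.pop(rid, None)
    return changed, deleted


# Usage: the second run re-processes only record "b", whose text changed.
calls = []


def upper(rec: Record) -> Record:
    calls.append(rec["text"])
    return {"text": rec["text"].upper()}


tracker = ChangeTracker()
out: Dict[str, Record] = {}
run_step(tracker, {"a": {"text": "x"}, "b": {"text": "y"}}, upper, out)
calls.clear()
changed, deleted = run_step(tracker, {"a": {"text": "x"}, "b": {"text": "z"}}, upper, out)
print(sorted(changed), sorted(deleted), calls)  # ['b'] [] ['z']
```

Handling deletions alongside updates matters here: if a source record disappears, its derived output is removed as well, so downstream data never refers to records that no longer exist.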

https://datapipe.dev/

Key Features:

  • Incremental Processing: Datapipe processes only new or modified data, which reduces computation time and resource usage.

  • Real-time ETL: The library supports real-time data extraction, transformation, and loading.

  • Dependency Tracking: Datapipe automatically tracks data dependencies and the processing state of each record.

  • Python Integration: Pipelines are described in idiomatic Python, so Datapipe fits naturally into existing Python applications.
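Dependency tracking and per-record processing state can be sketched for a two-step chain. This minimal, in-memory sketch assumes a logical clock: each table row carries the tick at which it last changed (`update_ts`), and each step records the tick at which it last processed each key (`process_ts`). `Table`, `Step` and both field names are hypothetical and not taken from Datapipe; the point is that a change propagates downstream only when an upstream output actually changes.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Table:
    """Keyed rows plus the logical time at which each row last changed."""
    rows: Dict[str, Any] = field(default_factory=dict)
    update_ts: Dict[str, int] = field(default_factory=dict)

    def upsert(self, key: str, value: Any, ts: int) -> None:
        # Only a real change bumps update_ts; rewriting an identical value does not.
        if key not in self.rows or self.rows[key] != value:
            self.rows[key] = value
            self.update_ts[key] = ts


@dataclass
class Step:
    """A transform that remembers when it last processed each input key."""
    fn: Callable[[Any], Any]
    source: Table
    target: Table
    process_ts: Dict[str, int] = field(default_factory=dict)
    calls: int = 0

    def run(self, now: int) -> None:
        for key, ts in self.source.update_ts.items():
            # Re-run only for rows that changed after this step last processed them.
            if self.process_ts.get(key, -1) < ts:
                self.target.upsert(key, self.fn(self.source.rows[key]), now)
                self.process_ts[key] = now
                self.calls += 1


clock = itertools.count(1)  # logical clock: each next() returns the next tick
raw, clean, features = Table(), Table(), Table()
step1 = Step(lambda s: s.strip().lower(), raw, clean)
step2 = Step(len, clean, features)


def run_pipeline() -> None:
    now = next(clock)
    step1.run(now)
    step2.run(now)


raw.upsert("a", "Hello", next(clock))
raw.upsert("b", "World", next(clock))
run_pipeline()  # first run: both steps process "a" and "b"

# "a" changes in raw, but its cleaned value is still "hello",
# so step1 re-runs for "a" while step2 has nothing to do.
raw.upsert("a", "  hello  ", next(clock))
run_pipeline()
print(step1.calls, step2.calls)  # 3 2
```

Within one run, all steps share the same tick and must execute in dependency order, so a downstream step picks up upstream changes made earlier in the same run. A production system would persist this state in a database rather than keep it in memory.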

Ideal projects for Datapipe

  • Projects with complex ML pipelines that include a human-in-the-loop component

  • ML projects that require real-time model retraining based on newly labeled data

  • Projects that require content moderation

GitHub

https://github.com/epoch8/datapipe – Datapipe Core

https://github.com/epoch8/datapipe-examples/ – Usage examples

