Introduction
Kubedoop Data Platform is a modular, Kubernetes-native platform. Through Kubedoop, users can quickly and easily deploy data infrastructure and algorithm infrastructure to address DataOps and MLOps requirements.
Kubedoop includes mainstream data processing components such as HDFS, Hive, Kafka, and Superset, and also supports data lakes and real-time data warehouses, covering migration needs from traditional Hadoop platforms to Kubernetes.
Built on the Kubernetes Operator pattern, Kubedoop automates the lifecycle management of data processing tasks, including creation, startup, monitoring, scheduling, restart, and scaling. Users define data processing tasks in simple configuration files, and Kubedoop automatically deploys them to the Kubernetes cluster and manages their lifecycle.
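As a sketch of this declarative workflow, a component deployment might be described with a custom resource like the one below. Note that the `apiVersion`, `kind`, and field names here are illustrative assumptions, not the actual Kubedoop CRD schema; consult the relevant operator's CRD for the real fields.

```yaml
# Hypothetical custom resource -- apiVersion/kind and spec fields
# are illustrative only, not the operator's real schema.
apiVersion: kafka.kubedoop.example/v1alpha1
kind: KafkaCluster
metadata:
  name: demo-kafka
spec:
  image:
    productVersion: "3.7.0"   # desired Kafka version
  brokers:
    replicas: 3               # operator scales brokers to match
```

Applying such a manifest with `kubectl apply -f kafka.yaml` hands it to the corresponding operator, which reconciles the cluster state: creating the workloads, monitoring them, and restarting or scaling them as the spec changes.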
Components
Kubedoop Product Operators:
- Kubedoop Operator for Apache Airflow
- Kubedoop Operator for Apache DolphinScheduler
- Kubedoop Operator for Apache Doris
- Kubedoop Operator for Apache Hadoop HDFS
- Kubedoop Operator for Apache HBase
- Kubedoop Operator for Apache Hive
- Kubedoop Operator for Apache Kafka
- Kubedoop Operator for Apache Kyuubi
- Kubedoop Operator for Apache NiFi
- Kubedoop Operator for Apache Spark
- Kubedoop Operator for Apache Superset
- Kubedoop Operator for Trino
- Kubedoop Operator for Apache ZooKeeper
Built-in Kubedoop Operators:
Contributing
If you would like to contribute to Kubedoop, please refer to our contribution guide. We welcome all forms of contributions, including but not limited to code, documentation, and use cases.