A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Runs as a standalone application on a single box / JVM. Also supports embedded mode.
Runs as a MapReduce application on multiple Hadoop versions. Also supports Azkaban for launching MR jobs.
Yarn / Mesos
Runs as a standalone cluster with master and worker nodes. This mode supports high availability (HA) and can also run on bare metal.
Runs as an elastic cluster on the AWS cloud (Azure and GCP support coming soon). This mode supports HA.