Docker image for the main Apache Hadoop components (YARN/HDFS)
These images are part of the flokkr big data Docker image series. All of the images share the same base image, which contains plugin scripts to launch different projects in containerized environments.
For more detailed instructions about the available environment variables, see the README in the flokkr/docker-baseimage repository.
Docker images are tested with Kubernetes
The easiest way to start is to run `kubectl apply -f .` from one of the `./examples` directories (using ephemeral storage!).
For more specific use cases, it's recommended to use Flekszible. The resource definitions can be found in this repository (`./hadoop`, `./hdfs`, `./yarn`, …).
Install Flekszible (download the binary and put it on your PATH), then:

```shell
cd /tmp
mkdir cluster
cd cluster
flekszible source add github.com/flokkr/docker-hadoop
flekszible app add hdfs
flekszible generate
kubectl apply -f .
```
You can list the available apps (after the source import):

```
flekszible app search
+---------+-------------------------------+
| path    | description                   |
+---------+-------------------------------+
| hdfs    | Apache Hadoop HDFS base setup |
| hdfs-ha | Apache Hadoop HDFS, HA setup  |
...
```
The base setup can be modified with additional transformations:
```
flekszible definitions search | grep hdfs
...
| hdfs/persistence | Add real PVC based persistence |
| hdfs/onenode     | remove scheduling rules to make it possible to run multiple datanode on the same k8s node. |
...
```
You can apply transformations by modifying the Flekszible descriptor file:
Original version:

```yaml
source:
  - url: github.com/flokkr/docker-hadoop
import:
  - path: hdfs
```
Modified:

```yaml
source:
  - url: github.com/flokkr/docker-hadoop
import:
  - path: hdfs
    transformations:
      - type: hdfs/onenode
      - type: image
        image: flokkr/hadoop:3.2.0
```
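Other definitions from the search output above can be added the same way. As a sketch (assuming `hdfs/persistence` needs no extra parameters; check its definition before use), a descriptor that enables PVC-based persistence could look like this:

```yaml
source:
  - url: github.com/flokkr/docker-hadoop
import:
  - path: hdfs
    transformations:
      # hdfs/persistence: "Add real PVC based persistence"
      # (listed by `flekszible definitions search`); assumed here
      # to work without additional parameters
      - type: hdfs/persistence
```

After any descriptor change, re-run `flekszible generate` and `kubectl apply -f .` to update the cluster resources.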