Project author: alexandrustoica

Project description:
Real-Time & Batch Data Processing Pipeline
Primary language: Python
Project URL: git://github.com/alexandrustoica/pipeoop.git
Created: 2019-12-14T20:15:35Z
Project community: https://github.com/alexandrustoica/pipeoop


pipeoop

pipeoop is a real-time and batch data processing pipeline for an industrial spray painting gun machine. Design specs: work in progress (WIP).

How to run

  1. Build our services.

    1. $ docker-compose build
  2. Run our services (you can run them in detached mode with -d).

    1. $ docker-compose up

    Read the documentation here.

    Note: Setting up our HDFS service might take a while and requires at least 8 GB of RAM.

  3. Optional: You can connect to our HDFS service by looking up the cloudera container’s ID and opening a shell in it:

    1. $ docker exec -it $(docker container ls | grep cloudera | cut -d " " -f1) bash

    Read the documentation here.

  4. Optional: You can go to http://localhost:8888/ to access the HDFS server with Hue.

    1. username: cloudera
    2. password: cloudera

5.1. To stop our Docker containers:

  1. $ docker-compose down

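5.2. To stop our Docker containers and delete their data (databases, cache, etc.):

  1. $ docker-compose down -v --rmi all

Note: --rmi expects a type, either all or local. With all, the images used by our services are removed as well; -v removes the volumes.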

Read the documentation here.
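
The build, run, and stop steps above can also be scripted. The following is a minimal Python sketch, not part of the project: the helper names compose_argv, find_container_id, and run_step are hypothetical, and it assumes docker-compose is on your PATH. find_container_id mirrors the grep and cut lookup from step 3 on text you have already captured from docker container ls.

```python
import subprocess


def compose_argv(action, detached=False, remove_data=False):
    """Build the docker-compose argument list for one of the steps above.

    Hypothetical helper: only the flags mentioned in this README are covered.
    """
    if action not in ("build", "up", "down"):
        raise ValueError(f"unsupported action: {action}")
    argv = ["docker-compose", action]
    if action == "up" and detached:
        argv.append("-d")  # detached mode, as in step 2
    if action == "down" and remove_data:
        argv.append("-v")  # also remove the volumes, as in step 5.2
    return argv


def find_container_id(ls_output, name="cloudera"):
    """Return the first field of the first listing line that mentions `name`.

    Mirrors `grep cloudera | cut -d " " -f1` from step 3; returns None when
    no line matches.
    """
    for line in ls_output.splitlines():
        if name in line:
            return line.split(" ")[0]
    return None


def run_step(action, **options):
    """Run one docker-compose step and raise CalledProcessError if it fails."""
    subprocess.run(compose_argv(action, **options), check=True)
```

For example, run_step("build") followed by run_step("up", detached=True) corresponds to steps 1 and 2, and run_step("down") corresponds to step 5.1. Keeping the argument construction separate from the subprocess call makes it possible to inspect the command without starting any containers.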

How to contribute

To run our project, you’ll need Docker and Docker Compose.

Fork the repository, make your changes, and send us a pull request. We will review your changes and apply them to the master branch shortly, provided they don’t violate our quality standards. To avoid frustration, before sending us your pull request, please run all services defined in the docker-compose.yml file and make sure everything runs according to our design specs:

  1. $ docker-compose up
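
Because the HDFS service can take a while to come up, a check run right after docker-compose up may fail simply because the services are not ready yet. The sketch below shows one possible way to wait for the Hue page at http://localhost:8888/ before checking anything else. It is an illustration under assumptions, not part of the project: the function names, the number of attempts, and the delay are all hypothetical, and a responding Hue page does not by itself prove that every service is healthy.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if `url` answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False  # refused, timed out, or not speaking HTTP yet


def wait_until_ready(url, attempts=30, delay=10.0,
                     probe=http_probe, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `attempts` run out.

    Returns the number of the successful attempt, or None if the service
    never answered. `probe` and `sleep` are injectable so the loop can be
    exercised without a running server.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait before the next try
    return None
```

Calling wait_until_ready("http://localhost:8888/") after starting the services blocks until Hue responds or the attempts are exhausted; a None result suggests the services did not start within the allotted time.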

Note: Each directory contains data configurations or programs used in our pipeline. If you need to create a new service, please place all of its required files in its own directory.

Authors

  • Alexandru Stoica - Initial work - Master