Project author: kevinmchan

Project description:
An airflow pipeline for building and scoring NBA daily fantasy models
Primary language: Jupyter Notebook
Repository: git://github.com/kevinmchan/airjordan.git
Created: 2021-02-07T01:40:15Z
Project page: https://github.com/kevinmchan/airjordan


Airjordan

An airflow pipeline for building and scoring NBA daily fantasy models, including:

  • ETL to load NBA stats
  • generating model features
  • scoring model predictions
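As a rough illustration of how the three stages above could be chained, here is a minimal sketch assuming Airflow 2.x. The DAG id, task ids, and the `run_stage` callable are hypothetical placeholders, not names taken from this repository.

```python
from datetime import datetime

# Hypothetical stage names, in the order listed above.
STAGES = ["load_nba_stats", "generate_features", "score_predictions"]


def run_stage(stage: str) -> str:
    """Placeholder for the real work of a stage; returns a status message."""
    return f"{stage} finished"


def build_dag():
    """Build a daily DAG that runs the stages one after another.

    Airflow is imported here so the module can be read without it installed.
    """
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    dag = DAG(
        dag_id="airjordan_sketch",  # hypothetical id
        start_date=datetime(2021, 2, 1),
        schedule_interval="@daily",  # Airflow 2.x argument name
        catchup=False,
    )
    previous = None
    for stage in STAGES:
        task = PythonOperator(
            task_id=stage,
            python_callable=run_stage,
            op_kwargs={"stage": stage},
            dag=dag,
        )
        if previous is not None:
            previous >> task  # run stages strictly in order
        previous = task
    return dag
```

In a file under the dags folder, you would assign `dag = build_dag()` at module level so that the scheduler can discover it.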

Dependencies

This project requires the following dependencies in order to work:

  • A PostgreSQL database to store NBA stats loaded by the ETL job: Follow these instructions to set up a postgres Cloud SQL instance on GCP.
  • A Google Cloud Storage bucket to store model objects: Follow these instructions to set up a service account to access files in the GCS bucket. You’ll also need to download the private key required to access your bucket.
  • Docker and docker compose: Learn how to install docker and install docker-compose*.
  • API key for mysportsfeed: Get an API key for NBA data feeds at mysportsfeed.com. Low cost options are available for personal use.
  • Environment file: Create a .env file in the project root directory (see the .env.template file) to store your mysportsfeed API key, postgres credentials, and the path to your GCS JSON credentials file.

* If there are issues with docker permissions, follow these instructions.
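A quick way to catch a missing setting before starting the containers is to parse the .env file and check for the expected keys. The sketch below uses only the standard library. The key names are hypothetical, so replace them with the ones listed in .env.template.

```python
from pathlib import Path

# Hypothetical key names; the real ones are defined in .env.template.
REQUIRED_KEYS = (
    "MYSPORTSFEED_API_KEY",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found in the current directory")
    else:
        print("Missing keys:", missing_keys(env_path.read_text()) or "none")
```

Running the script from the project root prints any keys that still need a value.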

This implementation of the airflow pipeline uses airflow’s CeleryExecutor, which allows for parallel execution and easy scaling up of worker nodes. Build and start the containers with:

    docker-compose up --build

The airflow webserver is available on port 8080 on the host machine.
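To confirm that the webserver is actually reachable on that port, one option is to query its health endpoint. This is a minimal sketch assuming Airflow 2.x, whose webserver serves a JSON status document at /health. The function names and the default base URL are illustrative choices.

```python
import json
from urllib.request import urlopen


def summarize_health(payload: str) -> dict:
    """Map each component in the /health JSON to its status string."""
    data = json.loads(payload)
    return {name: info.get("status", "unknown") for name, info in data.items()}


def fetch_health(base_url: str = "http://localhost:8080") -> dict:
    """Fetch and summarize the health endpoint of a running webserver."""
    with urlopen(f"{base_url}/health", timeout=5) as response:
        return summarize_health(response.read().decode("utf-8"))
```

Calling `fetch_health()` while the containers are running should report a status for the metadata database and the scheduler. A connection error usually means the webserver has not finished starting.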

Run using pipenv

Note: by default, the local development version of the airflow pipeline, run with pipenv, uses the SequentialExecutor, which does not allow for parallel execution.

    # install dependencies
    pipenv install --dev
    # activate environment
    pipenv shell
    # set up airflow
    export AIRFLOW_HOME=$(pwd)
    airflow db init
    airflow users create --username airflow --password airflow --firstname Anonymous --lastname Admin --role Admin --email admin@example.org
    # run airflow (each command blocks, so run them in separate terminals
    # with the same environment and AIRFLOW_HOME)
    airflow webserver
    airflow scheduler