
[![View the code](https://img.shields.io/badge/GitHub-View_the_Code-blue?logo=GitHub)](https://github.com/khuyentran1401/hydra_demo)

## Configure your Data Science Projects with Hydra


### Introduction

[Hydra](https://hydra.cc/) is a simple tool to manage complex configurations in Python. To install Hydra, type:

```bash
pip install hydra-core
```

The video below shows some simple features of Hydra. 

In [4]:
from IPython.display import HTML

# Youtube
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/IzEngnqOaRA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')


Imagine your YAML configuration file looks like this:

```yaml
process:
  keep_columns:
      - Income
      - Recency
      - NumWebVisitsMonth
      - Complain
      - age
      - total_purchases
      - enrollment_years
      - family_size

  remove_outliers_threshold:
    age: 90
    Income: 600000
```
To access the list under `process.keep_columns` in the configuration file, simply add the `@hydra.main` decorator to the function that uses the configuration:

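```python
import hydra
from omegaconf import DictConfig


# config_path is relative to this file; config_name is the YAML file name without ".yaml"
@hydra.main(config_path="../config", config_name="main")
def process_data(config: DictConfig):
    # Nested keys can be accessed with dot notation
    print(config.process.keep_columns)


if __name__ == "__main__":
    process_data()
```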
Output:
```bash
['Income', 'Recency', 'NumWebVisitsMonth', 'Complain', 'age', 'total_purchases', 'enrollment_years', 'family_size']
```
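
Once loaded, these values can be used like ordinary Python data. The sketch below is a hypothetical illustration of how `keep_columns` and `remove_outliers_threshold` might drive a processing step. It makes several assumptions: a plain dictionary stands in for the `DictConfig`, `keep_columns` is shortened to two entries, the thresholds are treated as exclusive upper bounds, and the function `apply_process_config` and the sample rows are invented for the example.

```python
# Plain dict standing in for the `process` section of the config (assumption:
# in a real Hydra app these values would come from `config.process`).
process_config = {
    "keep_columns": ["Income", "age"],
    "remove_outliers_threshold": {"age": 90, "Income": 600000},
}


def apply_process_config(rows, process_config):
    """Drop rows that reach any threshold, then keep only the listed columns."""
    thresholds = process_config["remove_outliers_threshold"]
    keep = process_config["keep_columns"]
    cleaned = []
    for row in rows:
        # Assumption: a row is an outlier if any value is at or above its threshold
        if all(row[col] < limit for col, limit in thresholds.items()):
            cleaned.append({col: row[col] for col in keep})
    return cleaned


# Made-up sample rows, only for illustration
rows = [
    {"Income": 58000, "age": 45, "Recency": 10},
    {"Income": 666666, "age": 40, "Recency": 3},  # Income above the threshold
    {"Income": 30000, "age": 120, "Recency": 7},  # age above the threshold
]

print(apply_process_config(rows, process_config))
```

Because the column names and thresholds live in the configuration, the function itself does not need to change when they do.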

### Group Configuration Files

In [5]:
# Youtube
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/t9hwWxBnU0o?start=55" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')

Imagine the structure of your `config` directory looks like this:

```bash
config
├── main.yaml
└── process
    ├── process_1.yaml
    ├── process_2.yaml
    ├── process_3.yaml
    └── process_4.yaml
```

Each file defines the same parameters with different values. To make the parameters in `process_2.yaml` the default, add the following to `main.yaml`:
```yaml
defaults:
  - process: process_2
  - _self_
```

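Now the parameters in `process_2.yaml` are merged with the parameters in `main.yaml`. Because `_self_` comes last in the defaults list, values defined in `main.yaml` take precedence if both files set the same key.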
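
To make the merge concrete, here is a plain-Python sketch of composing a configuration from an ordered list of parts. It is not Hydra's implementation: `deep_merge` is a hypothetical helper, and the dictionaries are shortened stand-ins for the YAML files.

```python
def deep_merge(base, override):
    """Return a new dict in which `override` is merged into `base` recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists from `override` replace the existing value
            merged[key] = value
    return merged


# Stand-in for process/process_2.yaml, placed under the `process` key
process_2 = {
    "process": {
        "keep_columns": ["Income", "age"],
        "remove_outliers_threshold": {"age": 90, "Income": 600000},
    }
}

# Stand-in for the keys defined directly in main.yaml
main = {"raw_data": {"path": "data/raw/marketing_campaign.csv"}, "flow": "all"}

# Same order as the defaults list: the `process` group first, then `_self_`
config = {}
for part in (process_2, main):
    config = deep_merge(config, part)

print(sorted(config))
```

Parts merged later win on conflicting keys, which is why the position of `_self_` in the defaults list matters.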

Running the file [`print_config.py`](https://github.com/khuyentran1401/hydra_demo/blob/master/hydra_group/src/print_config.py):

```bash
python print_config.py
```
should print:
```yaml
# From process_2.yaml
process:
  keep_columns:
  - Income
  - Recency
  - NumWebVisitsMonth
  - Complain
  - age
  - total_purchases
  - enrollment_years
  - family_size
  remove_outliers_threshold:
    age: 90
    Income: 600000
  family_size:
    Married: 2
    Together: 2
    Absurd: 1
    Widow: 1
    YOLO: 1
    Divorced: 1
    Single: 1
    Alone: 1

# From main.yaml
raw_data:
  path: data/raw/marketing_campaign.csv
intermediate:
  dir: data/intermediate
  name: scale_features.csv
  path: ${intermediate.dir}/${intermediate.name}
flow: all
image:
  kmeans: image/elbow.png
  clusters: image/cluster.png
```
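
The `path` value under `intermediate` uses the `${...}` interpolation syntax, which refers to other keys in the same configuration and is resolved when the value is accessed. The following is a rough plain-Python sketch of that substitution, written only to illustrate the idea; the `lookup` and `resolve` helpers are hypothetical and far simpler than the real resolver.

```python
import re

config = {
    "intermediate": {
        "dir": "data/intermediate",
        "name": "scale_features.csv",
        "path": "${intermediate.dir}/${intermediate.name}",
    }
}


def lookup(config, dotted_key):
    """Follow a dotted key such as 'intermediate.dir' through nested dicts."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(config, value):
    """Replace each ${a.b} reference in `value` with the value it points to."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: str(lookup(config, m.group(1))), value)


resolved_path = resolve(config, config["intermediate"]["path"])
print(resolved_path)  # data/intermediate/scale_features.csv
```

Defining `dir` and `name` once and deriving `path` from them means only one value has to change when the output location moves.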


### Override Default Parameters

In [2]:
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/t9hwWxBnU0o?start=167" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')

You can also override the default parameters on the command line. For example, to replace `process_2` with `process_1`, run the following:

```bash
python print_config.py process=process_1
```

The output should be the combination of all parameters in `main.yaml` and in `process_1.yaml`:
```yaml
# From process_1.yaml
process:
  keep_columns:
  - Income
  - Recency
  - NumWebVisitsMonth
  - AcceptedCmp3
  - AcceptedCmp4
  - AcceptedCmp5
  - AcceptedCmp1
  - AcceptedCmp2
  - Complain
  - Response
  - age
  - total_purchases
  - enrollment_years
  - family_size
  remove_outliers_threshold:
    age: 90
    Income: 600000
  family_size:
    Married: 2
    Together: 2
    Absurd: 1
    Widow: 1
    YOLO: 1
    Divorced: 1
    Single: 1
    Alone: 1
    
# From main.yaml
raw_data:
  path: data/raw/marketing_campaign.csv
intermediate:
  dir: data/intermediate
  name: scale_features.csv
  path: ${intermediate.dir}/${intermediate.name}
flow: all
image:
  kmeans: image/elbow.png
  clusters: image/cluster.png
```
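
The override `process=process_1` changes which file is selected for the `process` group before the configuration is composed. As a simplified illustration (not Hydra's parser), the sketch below applies `group=option` arguments to a dictionary of defaults; `apply_group_overrides` is a hypothetical helper that handles only this one form of override.

```python
def apply_group_overrides(defaults, overrides):
    """Return a copy of `defaults` with each `group=option` override applied."""
    selected = dict(defaults)
    for item in overrides:
        group, _, option = item.partition("=")
        if group not in selected:
            raise KeyError(f"Unknown config group: {group}")
        selected[group] = option
    return selected


# Defaults from main.yaml: the `process` group points at process_2
defaults = {"process": "process_2"}

# Arguments as they would appear after the script name on the command line
selected = apply_group_overrides(defaults, ["process=process_1"])

# The group file that would be loaded, relative to the config directory
config_file = f"process/{selected['process']}.yaml"
print(config_file)  # process/process_1.yaml
```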