How to install and setup Apache Airflow on Ubuntu 16 or 18

Create a new Ubuntu virtual machine login

# sudo su

# apt-get update


# apt install python


# python –version

Python 2.7.12


# apt-get install software-properties-common

# apt-get install python-pip
# export SLUGIFY_USERS_TEXT_UNIDECODE=yes
# pip install apache-airflow
# airflow initdb

Gets error so upgrade pip
 pip install --upgrade pip
 airflow initdb

Gets error so  hash -d pip

#  hash -d pip
# pip install apache-airflow

# airflow initdb

Gets error so down grade marshmallow-sqlalchemy
# pip uninstall marshmallow-sqlalchemy
# pip install marshmallow-sqlalchemy==0.17.1
# airflow initdb

Works!
# airflow webserver -p 8080

Open network to allow port 8080 for IP address of server

Go to browser and enter URL: <IP address>:8080

Gets an error that the scheduler does not appear to be running

To fix
# ls -al ~/airflow/
# vi ~/airflow/airflow.cfg

Search for max_threads and change from 2 to 1 because we are running sqlight for the database


# airflow webserver --help
# airflow webserver -p 8080 -D

Start airflow with -D for demon


# airflow scheduler -D

Start the scheduler in the background
# airflow worker -D
 

Does not work?

Next Steps, Coming soon

  • How to replace the SQLight database with MySQL or Postgress
  • How to change the executor to celery
  • How to add encryption to protect passwords