Running Airflow with Docker in a Developer Environment
What is Airflow:
Apache Airflow™ is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Airflow provides an extensible Python framework to build workflows that can integrate with virtually any technology, and the web UI it ships with makes managing those workflows straightforward.
Airflow can be deployed in multiple ways, from a single process on your laptop to a distributed setup that supports even the biggest workflows.
Installing Airflow:
Here I will explain a standalone Airflow setup for developers.
What you need:
- Docker
- Python [with PIP and Virtual environment ]
So, assuming you now have Docker and Python set up, let's get going.
1. Create a new Dockerfile and copy the code below into it.
FROM apache/airflow:2.5.1-python3.9
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
USER root
RUN apt-get update && apt-get install -y \
wget
COPY start.sh /start.sh
RUN chmod +x /start.sh
USER airflow
ENTRYPOINT ["/bin/bash","/start.sh"]
Let's decode it line by line:
FROM apache/airflow:2.5.1-python3.9 : use the official Airflow base image
COPY requirements.txt /requirements.txt : copy the requirements.txt file; this is helpful when we need to install packages required by our DAGs
RUN pip install --no-cache-dir -r /requirements.txt : install the packages
USER root : switch to the root user
RUN apt-get update && apt-get install -y wget : update the package index and install wget
COPY start.sh /start.sh : copy start.sh, which holds the Airflow commands we will execute
RUN chmod +x /start.sh : make start.sh executable
USER airflow : switch back to the airflow user
ENTRYPOINT ["/bin/bash","/start.sh"] : declare the entrypoint command.
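The Dockerfile above expects a requirements.txt file next to it. A minimal sketch of that file is below; the packages listed are only placeholders for whatever your DAGs actually need:

```
# Example only: list the Python packages your DAGs depend on.
requests==2.28.2
pandas==1.5.3
```

Pinning exact versions keeps the image build reproducible across machines.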
2. Create a start.sh file
#!/bin/bash
airflow standalone
The airflow standalone command initializes the database, creates a user, and starts all components. Under the hood it executes multiple commands, such as:
airflow db migrate
airflow users create \
--username admin \
--firstname Peter \
--lastname Parker \
--role Admin \
--email spiderman@superhero.org
airflow webserver --port 8080
airflow scheduler
3. Build the Docker image
docker build . -t airflow-local
4. Run the image
docker run -p 8080:8080 -v /c/local/clone/path/airflow-examples/dags:/opt/airflow/dags -d airflow-local
# Replace the path with the local path to your Airflow DAGs repository, so that changes to DAGs in that folder are picked up without restarting the container
5. After a successful run, this will initialize the Airflow database, webserver, and scheduler. The Airflow UI will be up and running at http://localhost:8080/home
6. Check the logs of the running container to note the password generated by Airflow; we need it to log in to the Airflow webserver.
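One way to find the password is to grep the container logs, e.g. `docker logs <container-id> 2>&1 | grep -i password` (get the container ID from `docker ps`). A sketch of the filtering on a sample log line (the exact log format is an assumption):

```shell
# Simulated log line; in practice this comes from `docker logs <container-id>`.
sample='standalone | Login with username: admin  password: FZnjYFMNPMgCqwCq'
# Extract just the password portion of the line.
echo "$sample" | grep -o 'password: .*'
```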

7. Visit http://localhost:8080/home and, on the login page, enter the username and password you noted from the logs.

After successful login, you should be able to see the home screen.

You can find all code at https://github.com/harshalpagar/airflow-examples.
Happy Learning :-)
Thank You.