Published on November 29, 2023

Installing Airflow on CentOS 7

Test environment:

CentOS 7

Python 3.6

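Installation and configuration:

1. Check whether gcc is installed; if it is missing, install it:

yum install gcc

If the Airflow installation fails later, run this command again so that yum updates the package. This step is important, since some of Airflow's Python dependencies may need to be compiled during installation.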
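
The check for gcc can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name find_compiler is hypothetical and not part of the original steps.

```python
import shutil


def find_compiler(name="gcc"):
    """Return the full path of the compiler if it is on PATH, else None."""
    return shutil.which(name)


path = find_compiler()
if path is None:
    print("gcc not found; install it with: yum install gcc")
else:
    print("gcc found at", path)
```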

2. Install Python and the dependencies:

yum install -y python36

yum install -y python36-pip

yum install -y python36-devel

pip3 install paramiko

Before installing Airflow, install the development libraries it depends on:

yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel

Install Airflow:

pip3 install apache-airflow

Install PyMySQL:

pip3 install pymysql
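
After the pip3 commands finish, it can be useful to confirm that the packages are importable. The sketch below assumes the import names airflow, pymysql and paramiko; the helper missing_modules is a hypothetical name introduced for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the packages installed above (assumed, see lead-in).
required = ["airflow", "pymysql", "paramiko"]
print("missing:", missing_modules(required))
```

An empty list means every package was found by the interpreter that runs the script, so it should be run with the same python3 that pip3 installs into.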

3. Configure the environment variable:

# vi /etc/profile

#airflow

export AIRFLOW_HOME=/software/airflow

# source /etc/profile
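
Airflow reads AIRFLOW_HOME from the environment and falls back to ~/airflow when it is unset. The sketch below mimics that lookup so the value can be checked from a new shell; resolve_airflow_home is a hypothetical helper, not an Airflow function.

```python
import os


def resolve_airflow_home(environ=None):
    """Return AIRFLOW_HOME from the environment, or the default ~/airflow."""
    environ = os.environ if environ is None else environ
    return environ.get("AIRFLOW_HOME") or os.path.expanduser("~/airflow")


print(resolve_airflow_home())
```

If this prints a path other than /software/airflow, the profile has probably not been sourced in the current shell.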

Initialization

1. Initialize the database tables (a local SQLite database is used by default):

[root@centos-slave1 centos]# airflow initdb

[2019-09-19 22:58:10,546] {__init__.py:51} INFO - Using executor SequentialExecutor

DB: sqlite:////software/airflow/

[2019-09-19 22:58:11,457] {:369} INFO - Creating tables

INFO [ion] Context impl SQLiteImpl.

INFO [ion] Will assume non-transactional DDL.

INFO [ion] Running upgrade -> e3a246e0dc1, current schema

INFO [ion] Running upgrade e3a246e0dc1 -> 1507a7289a2f, create is_encrypted

/usr/local/lib/python3.6/site-packages/alembic/ddl/:39: UserWarning: Skipping unsupported ALTER for creation of implicit constraint

"Skipping unsupported ALTER for "

INFO [ion] Running upgrade 1507a7289a2f -> 13eb55f81627, maintain history for compatibility with earlier migrations

INFO [ion] Running upgrade 13eb55f81627 -> 338e90f54d61, More logging into task_instance

INFO [ion] Running upgrade 338e90f54d61 -> 52d714495f0, job_id indices

INFO [ion] Running upgrade 52d714495f0 -> 502898887f84, Adding extra to Log

INFO [ion] Running upgrade 502898887f84 -> 1b38cef5b76e, add dagrun

INFO [ion] Running upgrade 1b38cef5b76e -> 2e541a1dcfed, task_duration

INFO [ion] Running upgrade 2e541a1dcfed -> 40e67319e3a9, dagrun_config

INFO [ion] Running upgrade 40e67319e3a9 -> 561833c1c74b, add password column to user

INFO [ion] Running upgrade 561833c1c74b -> 4446e08588, dagrun start end

INFO [ion] Running upgrade 4446e08588 -> bbc73705a13e, Add notification_sent column to sla_miss

INFO [ion] Running upgrade bbc73705a13e -> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field in connection

INFO [ion] Running upgrade bba5a7cfc896 -> 1968acfc09e3, add is_encrypted column to variable table

INFO [ion] Running upgrade 1968acfc09e3 -> 2e82aab8ef20, rename user table

INFO [ion] Running upgrade 2e82aab8ef20 -> 211e584da130, add TI state index

INFO [ion] Running upgrade cc1e65623dc7 -> bdaa763e6c56, Make xcom value column a large binary

INFO [ion] Running upgrade bdaa763e6c56 -> 947454bf1dff, add ti job_id index

INFO [ion] Running upgrade 947454bf1dff -> d2ae31099d61, Increase text size for MySQL (not relevant for other DBs' text types)

INFO [ion] Running upgrade d2ae31099d61 -> 0e2a74e0fc9f, Add time zone awareness

INFO [ion] Running upgrade d2ae31099d61 -> 33ae817a1ff4, kubernetes_resource_checkpointing

INFO [ion] Running upgrade 33ae817a1ff4 -> 27c6a30d7c24, kubernetes_resource_checkpointing

INFO [ion] Running upgrade 27c6a30d7c24 -> 86770d1215c0, add kubernetes scheduler uniqueness

INFO [ion] Running upgrade 86770d1215c0, 0e2a74e0fc9f -> 05f30312d566, merge heads

INFO [ion] Running upgrade 05f30312d566 -> f23433877c24, fix mysql not null constraint

INFO [ion] Running upgrade f23433877c24 -> 856955da8476, fix sqlite foreign key

INFO [ion] Running upgrade 856955da8476 -> 9635ae0956e7, index-faskfail

INFO [ion] Running upgrade 9635ae0956e7 -> dd25f486b8ea

INFO [ion] Running upgrade dd25f486b8ea -> bf00311e1990, add index to taskinstance

INFO [ion] Running upgrade 9635ae0956e7 -> 0a2a5b66e19d, add task_reschedule table

INFO [ion] Running upgrade 0a2a5b66e19d, bf00311e1990 -> 03bc53e68815, merge_heads_2

INFO [ion] Running upgrade 03bc53e68815 -> 41f5f12752f8, add superuser field

INFO [ion] Running upgrade 41f5f12752f8 -> c8ffec048a3b, add fields to dag

INFO [ion] Running upgrade c8ffec048a3b -> dd4ecb8fbee3, Add schedule interval to dag

INFO [ion] Running upgrade dd4ecb8fbee3 -> 939bb1e647c8, task reschedule fk on cascade delete

INFO [ion] Running upgrade c8ffec048a3b -> a56c9515abdc, Remove dag_stat table

INFO [ion] Running upgrade 939bb1e647c8 -> 6e96a59344a4, Make not nullable

INFO [ion] Running upgrade 6e96a59344a4 -> 74effc47d867, change datetime to datetime2(6) on MSSQL tables

INFO [ion] Running upgrade 939bb1e647c8 -> 004c1210f153, increase queue name size limit

Done.
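
To confirm that the migrations created tables, the SQLite database can be inspected with Python's sqlite3 module. This is a sketch only: it runs against an in-memory database with a stand-in table, and list_tables is a hypothetical helper; pass the path of the real database file to sqlite3.connect() to inspect the Airflow metadata.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


# Demonstration on an in-memory database with a stand-in table;
# point sqlite3.connect() at the real database file instead.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_table (id INTEGER PRIMARY KEY)")
print(list_tables(demo))
```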

2. Check the generated files:

[root@centos-slave1 centos]# cd /software/airflow/

[root@centos-slave1 airflow]# ls

logs

3. Configure the MySQL database (create the airflow database, then create a user and grant it privileges so that Airflow can access the database):

mysql> CREATE DATABASE airflow;

Query OK, 1 row affected (0.00 sec)

mysql> GRANT all privileges on airflow.* TO 'root'@'localhost' IDENTIFIED BY 'root';

ERROR 1819 (HY000): Your password does not satisfy the current policy requirements

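# This error is related to the value of validate_password_policy. The default is 1 (MEDIUM), so a new password must meet the minimum length and contain digits, both lowercase and uppercase letters, and special characters.

For local testing you may not want such a complex password; for example, you may simply want to set root's password to root.

Two global parameters must be changed:

1) First, change the value of validate_password_policy: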

mysql> set global validate_password_policy=0;

Query OK, 0 rows affected (0.00 sec)

# With this setting, a password is judged only by its length, which is controlled by the validate_password_length parameter.

mysql> select @@validate_password_length;

+----------------------------+

| @@validate_password_length |

+----------------------------+

| 8 |

+----------------------------+

1 row in set (0.00 sec)

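2) Change the validate_password_length parameter so that only a short minimum length is enforced. MySQL raises a value that is too low to its computed minimum, which is why the query below returns 4 rather than 1.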

mysql> set global validate_password_length=1;

Query OK, 0 rows affected (0.00 sec)

mysql> select @@validate_password_length;

+----------------------------+

| @@validate_password_length |

+----------------------------+

| 4 |

+----------------------------+

1 row in set (0.00 sec)
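
The value 4 comes from the lower bound MySQL applies to validate_password_length: it cannot be less than validate_password_number_count + validate_password_special_char_count + 2 * validate_password_mixed_case_count. The sketch below reproduces that arithmetic, assuming the default count of 1 for each of the three variables; effective_min_length is a hypothetical helper name.

```python
def effective_min_length(requested, number_count=1, mixed_case_count=1,
                         special_char_count=1):
    """Return the validate_password_length value MySQL ends up using."""
    # Lower bound enforced by the validate_password plugin.
    floor = number_count + special_char_count + 2 * mixed_case_count
    return max(requested, floor)


print(effective_min_length(1))  # requested 1, floor is 4
print(effective_min_length(8))  # requested 8 is already above the floor
```

With the defaults, a 4-character password such as root satisfies the length check once the policy is set to 0.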

mysql> GRANT all privileges on airflow.* TO 'root'@'localhost' IDENTIFIED BY 'root';

Query OK, 0 rows affected, 1 warning (0.35 sec)

mysql> FLUSH PRIVILEGES;

Query OK, 0 rows affected (0.01 sec)

4. Configure Airflow to use the LocalExecutor and the MySQL database:

vim airflow/

# The executor class that airflow should use. Choices include

# SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor

#executor = SequentialExecutor

executor = LocalExecutor

# The SqlAlchemy connection string to the metadata database.

# SqlAlchemy supports many different database engine, more information

# their website

#sql_alchemy_conn = sqlite:////data/airflow/

sql_alchemy_conn = mysql+pymysql://root:root@localhost:3306/airflow
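
The sql_alchemy_conn value follows the pattern dialect+driver://user:password@host:port/database. If the password contains characters such as @ or /, they need to be percent-encoded before being placed in the URL. The sketch below shows one way to assemble the string; build_mysql_conn and the sample credentials are hypothetical and only illustrate the format.

```python
from urllib.parse import quote_plus


def build_mysql_conn(user, password, host, port, database):
    """Build a mysql+pymysql SQLAlchemy URL, escaping special characters."""
    return "mysql+pymysql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# Sample credentials are placeholders, not values from this guide.
print(build_mysql_conn("airflow_user", "p@ss/word", "localhost", 3306, "airflow"))
```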

Initialize the database tables again:

airflow initdb

This fails with the following error:

Exception: Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql

Solution:

Change the MySQL configuration:

vim /etc/

[mysqld]

explicit_defaults_for_timestamp=1

Alternatively, run the following statement in the database:

set @@global.explicit_defaults_for_timestamp=on;
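
Before restarting MySQL, the configuration file can be checked for the setting. The sketch below parses my.cnf-style text with configparser; it assumes the option is spelled with underscores and sits in the [mysqld] section, and explicit_defaults_enabled is a hypothetical helper name.

```python
import configparser


def explicit_defaults_enabled(cnf_text):
    """Return True if [mysqld] turns on explicit_defaults_for_timestamp."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return False
    value = parser.get("mysqld", "explicit_defaults_for_timestamp", fallback="0")
    if value is None:
        # A bare option name with no value enables a boolean option.
        return True
    return value.strip().lower() in {"1", "on", "true"}


sample = "[mysqld]\nexplicit_defaults_for_timestamp=1\n"
print(explicit_defaults_enabled(sample))
```

To check a real file, read its contents and pass the text to the function; files that place directives before the first section header would need extra handling.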

5. Check the tables created in the airflow database:

mysql> use airflow;

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Database changed

mysql> show tables;

+-------------------+

| Tables_in_airflow |

+-------------------+

1. Start the Airflow webserver and scheduler services:

airflow webserver

airflow scheduler

[root@centos-master airflow]# airflow webserver

[2019-09-20 00:09:52,980] {:213} INFO - ure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=3168

[2019-09-20 00:09:53,111] {__init__.py:51} INFO - Using executor LocalExecutor

  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/

[2019-09-20 00:09:54,071] {:90} INFO - Filling up the DagBag from /software/airflow/dags

Running the Gunicorn Server with:

Workers: 4 sync

Host: 0.0.0.0:8080

Timeout: 120

Logfiles: - -

=================================================================

[root@centos-master airflow]# airflow scheduler

[2019-09-20 00:10:35,983] {:213} INFO - ure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=3287

[2019-09-20 00:10:36,117] {__init__.py:51} INFO - Using executor LocalExecutor

  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/

[2019-09-20 00:10:36,542] {scheduler_:1315} INFO - Starting the scheduler

[2019-09-20 00:10:36,542] {scheduler_:1323} INFO - Running execute loop for -1 seconds

[2019-09-20 00:10:36,544] {scheduler_:1324} INFO - Processing each file at most -1 times

[2019-09-20 00:10:36,544] {scheduler_:1327} INFO - Searching for files in /software/airflow/dags

[2019-09-20 00:10:36,550] {scheduler_:1329} INFO - There are 20 files in /software/airflow/dags

[2019-09-20 00:10:36,769] {scheduler_:1376} INFO - Resetting orphaned tasks for active dag runs

[2019-09-20 00:10:36,801] {dag_:545} INFO - Launched DagFileProcessorManager with pid: 3338

[2019-09-20 00:10:36,905] {:54} INFO - Configured default timezone

[2019-09-20 00:10:36,920] {:213} INFO - ure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=3338
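
Both services run in the foreground, so each normally needs its own terminal. A small launcher can start them together. The following is a sketch under the assumption that the airflow command is on PATH; build_commands and launch are hypothetical helpers, and the port matches the 8080 shown in the webserver output.

```python
import subprocess


def build_commands(port=8080):
    """Return the argument lists for the webserver and the scheduler."""
    return [
        ["airflow", "webserver", "-p", str(port)],
        ["airflow", "scheduler"],
    ]


def launch(commands):
    """Start each command as a child process and return the handles."""
    return [subprocess.Popen(cmd) for cmd in commands]


# Example (requires airflow on PATH):
#     processes = launch(build_commands())
#     for proc in processes:
#         proc.wait()
print(build_commands())
```

For anything beyond local testing, a service manager is usually a better fit than a script like this, because it can restart a process that exits.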