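Ansible Learning Path
Preface
I came to Ansible through work and actively pushed for its adoption in an enterprise production environment, where I independently designed and delivered a project called "Ansible-Based Automated Host Configuration Management". I had previously worked with Puppet and SaltStack as well. This article does not debate the merits of open source versus in-house solutions. Its focus is sharing the hands-on project experience I have accumulated with Ansible. If you run into problems, feel free to leave a comment or reach out some other way, and I will do my best to give a useful reply.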
Ansible is Simple IT Automation
Changelog
June 21, 2020 - Added Mitogen for Ansible
June 1, 2020 - Added open source automation projects built on Ansible
January 22, 2020 - Added Ansible reference articles
May 15, 2018 - First draft
Original post - https://liaojiaxin158.github.io/post/ansible/
Further Reading
ansible - https://docs.ansible.com/
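A Standard Ansible Learning Path
The number of Ansible books keeps growing, but because Ansible releases frequently and is cheap to learn, I recommend treating the official documentation as the primary source and books as a supplement.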
Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates.
Ansible’s main goals are simplicity and ease-of-use. It also has a strong focus on security and reliability, featuring a minimum of moving parts, usage of OpenSSH for transport (with other transports and pull modes as alternatives), and a language that is designed around auditability by humans–even those not familiar with the program.
We believe simplicity is relevant to all sizes of environments, so we design for busy users of all types: developers, sysadmins, release engineers, IT managers, and everyone in between. Ansible is appropriate for managing all environments, from small setups with a handful of instances to enterprise environments with many thousands of instances.
Ansible manages machines in an agent-less manner. There is never a question of how to upgrade remote daemons or the problem of not being able to manage systems because daemons are uninstalled. Because OpenSSH is one of the most peer-reviewed open source components, security exposure is greatly reduced. Ansible is decentralized–it relies on your existing OS credentials to control access to remote machines. If needed, Ansible can easily connect with Kerberos, LDAP, and other centralized authentication management systems.
This documentation covers the current released version of Ansible and also some development version features. For recent features, we note in each section the version of Ansible where the feature was added.
Ansible releases a new major release of Ansible approximately every two months. The core application evolves somewhat conservatively, valuing simplicity in language design and setup. However, the community around new modules and plugins being developed and contributed moves very quickly, adding many new modules in each release.
Ansible Lightbulb has been succeeded by Red Hat Ansible Automation Platform Workshops.
The Ansible Lightbulb project is an effort to provide a content toolkit and educational reference for effectively communicating and teaching Ansible topics.
Ansible Lightbulb - https://github.com/ansible/lightbulb
Red Hat Ansible Automation Platform Workshops - https://ansible.github.io/workshops/
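Ansible Documentation is the official Ansible documentation. My advice is not to be intimidated by the English: look things up often and type the commands yourself to build understanding.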
Ansible Documentation - http://docs.ansible.com/ansible/latest/index.html
If you need to use roles, I recommend reading Ansible Best Practices.
inventories/
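Plugins That Speed Up Ansible Execution
As is well known, Ansible is an automated configuration management tool built on SSH (other connection plugins, such as winrm, are also available). It is simple to use, and its agentless design is an advantage in many scenarios. That same design, however, makes it slower than client/server tools, which has drawn plenty of ridicule from users of those tools. In response, the Ansible project has introduced a number of speed optimizations.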
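Parameter / optimization | Description
---|---
fact cache | Caches facts in memory, Redis, or a file after they are first gathered.
gather_subset | Gathers only selected subsets of facts, such as network or hardware, instead of everything.
control_path | Enables persistent SSH sockets so that SSH connections are reused.
pipelining | Enables SSH pipelining: the managed host reads the rendered script from a pipe and executes it, rather than having a temporary file created on it first.
forks | Raises the number of hosts processed in parallel.
serial | Runs the hosts in play_hosts ① in successive batches.
strategy | Defaults to linear: once a host finishes a task, it waits for every other host to finish that task before moving on to the next one. Setting free lets each host carry on without waiting for the others (the output looks messier). There is also a host_pinned option, and I am not sure what it is for.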
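The batching behaviour of serial in the table above can be illustrated with a minimal Python sketch. It models only the plain integer form of serial (Ansible also accepts percentages and lists, which are not modelled here). The function name serial_batches and the host names are hypothetical, chosen purely for illustration.

```python
def serial_batches(play_hosts, serial):
    """Split the play's host list into consecutive batches of `serial` hosts.

    Illustrative model only: assumes `serial` is a positive integer.
    """
    if serial <= 0:
        raise ValueError("this sketch only models a positive integer serial value")
    return [list(play_hosts[i:i + serial]) for i in range(0, len(play_hosts), serial)]


# Hypothetical host names, used only to show the batching.
hosts = ["web1", "web2", "web3", "web4", "web5"]
for number, batch in enumerate(serial_batches(hosts, 2), start=1):
    print(f"batch {number}: {batch}")
```

Each batch runs the whole play before the next batch starts, which is why serial is the usual building block for rolling updates.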
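I stumbled upon Mitogen's Ansible plugin (a strategy plugin), currently at version 0.2.9. According to its documentation it can speed up execution by 1.25x to 7x, which is remarkable!
It replaces Ansible's default embedded, pure-Python shell invocation with efficient remote procedure calls. It does not make the modules themselves run faster. It only gets to the point of executing the module and collecting its result as quickly as possible ② (before a module runs there is a whole series of steps: connecting, sending data, transferring the rendered script, and so on), and that is what improves overall efficiency. Its features are as follows: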
Expect a 1.25x - 7x speedup and a CPU usage reduction of at least 2x, depending on network conditions, modules executed, and time already spent by targets on useful work. Mitogen cannot improve a module once it is executing, it can only ensure the module executes as quickly as possible.
One connection is used per target, in addition to one sudo invocation per user account. This is much better than SSH multiplexing combined with pipelining, as significant state can be maintained in RAM between steps, and system logs aren’t spammed with repeat authentication events.
A single network roundtrip is used to execute a step whose code already exists in RAM on the target. Eliminating multiplexed SSH channel creation saves 4 ms runtime per 1 ms of network latency for every playbook step.
Processes are aggressively reused, avoiding the cost of invoking Python and recompiling imports, saving 300-800 ms for every playbook step.
Code is ephemerally cached in RAM, reducing bandwidth usage by an order of magnitude compared to SSH pipelining, with around 5x fewer frames traversing the network in a typical run.
Fewer writes to the target filesystem occur. In typical configurations, Ansible repeatedly rewrites and extracts ZIP files to multiple temporary directories on the target. Security issues relating to temporary files in cross-account scenarios are entirely avoided.
The effect is most potent on playbooks that execute many short-lived actions, where Ansible’s overhead dominates the cost of the operation, for example when executing large with_items loops to run simple commands or write files.
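To get a feel for the figures quoted above, here is a back-of-the-envelope Python sketch. It takes the two per-step numbers from the quoted text (4 ms saved per 1 ms of network latency, and 300-800 ms saved by process reuse) and simply adds them. Treating the two savings as additive is my own simplifying assumption, and the function name estimated_savings_ms is hypothetical. Real gains depend on the workload.

```python
def estimated_savings_ms(steps, latency_ms, per_step_process_ms=(300, 800)):
    """Return a (low, high) estimate of milliseconds saved over a whole run.

    Assumption: the latency saving and the process-reuse saving simply add up.
    """
    channel_saving = 4 * latency_ms  # quoted figure: 4 ms per 1 ms of latency, per step
    low = steps * (channel_saving + per_step_process_ms[0])
    high = steps * (channel_saving + per_step_process_ms[1])
    return low, high


# Example: a playbook with 200 steps against hosts 10 ms away.
low, high = estimated_savings_ms(steps=200, latency_ms=10)
print(f"estimated saving: {low / 1000:.0f}s to {high / 1000:.0f}s")
```

The estimate grows linearly with the number of steps, which matches the point that playbooks made of many short-lived actions benefit the most.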
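In short: each host uses a single connection for the whole run (by default, a connection is reopened for every task or loop iteration); the rendered code is held in memory; less time is spent setting up multiplexed SSH channels; less bandwidth goes to transferring temporary files; and processes and code are reused, which avoids the cost of recompiling.
As for how it is implemented, see the explanation on the official site. I did not really follow it myself.
① play_hosts is a built-in variable that holds the list of hosts in the play currently being executed.
② "As quickly as possible" refers to the stage up to the point where the module starts running.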
- Download and extract mitogen-0.2.9.tar.gz
- Modify ansible.cfg
[defaults]
The strategy key is optional. If omitted, the ANSIBLE_STRATEGY=mitogen_linear environment variable can be set on a per-run basis. Like mitogen_linear, the mitogen_free and mitogen_host_pinned strategies exist to mimic the free and host_pinned strategies.
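The interplay between the strategy key and the ANSIBLE_STRATEGY variable can be sketched in Python as follows. This is a simplified model, not Ansible's own code: it assumes the environment variable wins over ansible.cfg and that linear is the default when neither is set. The function name resolve_strategy is hypothetical.

```python
import configparser


def resolve_strategy(cfg_text, environ):
    """Pick a strategy name: environment variable, then ansible.cfg, then the default.

    Simplified model of the lookup; the precedence shown is an assumption of this sketch.
    """
    if environ.get("ANSIBLE_STRATEGY"):
        return environ["ANSIBLE_STRATEGY"]
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # fallback covers both a missing [defaults] section and a missing strategy key
    return parser.get("defaults", "strategy", fallback="linear")


# The strategy key is omitted from the file, so the per-run variable decides.
print(resolve_strategy("[defaults]\n", {"ANSIBLE_STRATEGY": "mitogen_linear"}))
# No variable set: the value from the file is used.
print(resolve_strategy("[defaults]\nstrategy = mitogen_free\n", {}))
```

Leaving the key out of ansible.cfg and setting the variable per run makes it easy to compare a Mitogen run with a stock run of the same playbook.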
https://networkgenomics.com/ansible/
https://mitogen.networkgenomics.com/ansible_detailed.html
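Open Source Projects Built on Ansible
The first is the official Ansible open source project. The others are open source operations platforms built around Ansible. All of them are worth studying and using as references.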
Ansible - https://github.com/ansible/ansible
Jumpserver - http://www.jumpserver.org/
OpsManage - https://github.com/welliamcao/OpsManage
spug - https://github.com/openspug/spug
BigOps - http://www.bigops.com/
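Ansible in Practice
The following comes from the "Ansible-Based Automated Host Configuration Management" project. Ansible currently meets every baseline requirement in our production environment, so I believe it will be a useful reference.
Deploying Ansible
Because the production environment is physically isolated between the internal and external networks, all installation and deployment is done offline.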
# Install Packages
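ansible.cfg Configuration Explained
ansible.cfg does not change the result of a run, but a sensible configuration noticeably improves efficiency.
# Configuration file path (in order of precedence)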
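The search order for the configuration file can be sketched in Python. Ansible documents the order as: the ANSIBLE_CONFIG environment variable, ansible.cfg in the current directory, ~/.ansible.cfg, then /etc/ansible/ansible.cfg, with the first file found being used. The sketch below is a simplified model of that lookup (it ignores details such as permission checks). The function name find_ansible_cfg and the example paths are hypothetical.

```python
import os
import posixpath


def find_ansible_cfg(environ, cwd, home, exists=os.path.isfile):
    """Return the first configuration file that exists, in documented precedence order."""
    candidates = []
    if environ.get("ANSIBLE_CONFIG"):
        candidates.append(environ["ANSIBLE_CONFIG"])
    candidates += [
        posixpath.join(cwd, "ansible.cfg"),
        posixpath.join(home, ".ansible.cfg"),
        "/etc/ansible/ansible.cfg",
    ]
    for path in candidates:
        if exists(path):
            return path
    return None


# Pretend only these two files exist (hypothetical paths, no real filesystem access).
present = {"/home/ops/.ansible.cfg", "/etc/ansible/ansible.cfg"}
print(find_ansible_cfg({}, "/srv/project", "/home/ops", exists=present.__contains__))
```

Only one file is used. Settings are not merged across the candidates, so a project-local ansible.cfg completely hides the one in /etc/ansible.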
Linux
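- Server operating system: RHEL 6/7 (Windows cannot be used as the control node)
- Server Python version: 2.7.14 (verified to work after installation with no further adjustment)
- Ansible version: 2.3.3.0 (in my testing, 2.4 and later no longer support RHEL 5.5; clients need simplejson)
- Managed hosts: currently mainly RHEL 5/6/7 (use a newer Ansible version for Windows)
- Baseline standard: see 《主机岗配置基线 v1.1.xlsx》 (Host Configuration Baseline v1.1)
Server
- Operating system version: RHEL 6/7
- Python version: 2.7.14
- Installation method: offline pip installation of the dependency packages
Client
- Operating system version: RHEL 5/6/7
- No adjustment is needed unless the system was installed in minimal mode
- RHEL 5.5 requires simplejson to be installed
Core Usage
# Check whether Ansible can reach the hosts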
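The simplejson requirement most likely comes from RHEL 5 shipping Python 2.4, which predates the standard-library json module (added in Python 2.6), while Ansible modules report their results as JSON. The usual fallback import pattern looks like the sketch below. The helper name encode_result is hypothetical, and this is an illustration of the pattern rather than Ansible's actual code.

```python
try:
    import json  # in the standard library from Python 2.6 onward
except ImportError:
    import simplejson as json  # third-party package needed on Python 2.4 (RHEL 5)


def encode_result(result):
    """Serialize a module-style result dictionary to a JSON string."""
    return json.dumps(result, sort_keys=True)


print(encode_result({"changed": False, "ping": "pong"}))
```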
Windows
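- Server operating system: RHEL 6/7 (Windows cannot be used as the control node)
- Server Python version: 2.7.14 (verified to work after installation with no further adjustment)
- Ansible version: 2.5.0 (native Windows module support depends on keeping Ansible updated to newer releases)
- Managed hosts: currently mainly Windows 7/2008/2012 (XP/2003 are not supported)
- Baseline standard: see 《Windows 安全基线》 (Windows Security Baseline)
Server
- Operating system version: RHEL 6/7
- Python version: 2.7.14
- Installation method: offline pip installation of the dependency packages (pipenv is currently used to switch between the Linux and Windows environments)
Client
- Operating system version: Windows 7/2008/2012
- WinRM (Windows 7/2008 must be upgraded to PowerShell v3.0)
Core Usage
# Check whether Ansible can reach the hosts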
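The original command listing is not reproduced here. A common way to run this reachability check is an ad-hoc call to the ping module for Linux hosts and the win_ping module for Windows hosts. The Python sketch below only assembles such a command line. The helper name ping_command, the host patterns, and the inventory path are all hypothetical, and the commands actually used in the project may differ.

```python
def ping_command(pattern, inventory, windows=False):
    """Build an ad-hoc Ansible command that checks connectivity to the matched hosts."""
    module = "win_ping" if windows else "ping"
    return ["ansible", pattern, "-i", inventory, "-m", module]


# Hypothetical host patterns and inventory file name.
print(" ".join(ping_command("all", "hosts")))
print(" ".join(ping_command("windows", "hosts", windows=True)))
```

Returning the command as a list rather than a single string means it can be handed directly to subprocess.run without shell quoting issues.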
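Closing Thoughts
I am sorry that I cannot share everything for now, but that need not get in the way of technical exchange. I will gradually share valuable code that can be made public.
- Follow a what/why/how approach: understand what problems Ansible can solve, why you would choose Ansible, and how to use it to solve them.
- A low learning cost does not mean Ansible has no difficulty. Follow the official documentation and practice actively, and when the official site has no answer, make good use of Google.
- Running Ansible purely from the command line solves only part of the problem. Many more requirements have to be met by an automated operations platform built on Ansible. Embrace open source technology and do not become complacent.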