What can you do with underpowered VPSes? Try making them team up.
1. Install Ray on every VPS
apt-get install python3-pip
pip3 install -U ray
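Before moving on, it can help to confirm that the package is actually installed on each machine. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this illustration and is not part of Ray.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a pip package, or None if it is missing.

    `installed_version` is a hypothetical helper for this post, not a Ray API.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the Ray version string if `pip3 install -U ray` succeeded, else None.
print(installed_version("ray"))
```

Running this on every VPS also makes it easy to spot nodes that ended up with different Ray versions, which is worth ruling out before forming a cluster.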
2. Deploy the cluster
Pick one VPS to be the head node
ray start --head --port=6379
Output when Ray starts successfully on the head node:
Local node IP: IP-of-head-node
2020-12-23 16:12:46,527 INFO services.py:1092 -- View the Ray dashboard at http://localhost:8265

--------------------
Ray runtime started.
--------------------

Next steps
  To connect to this Ray runtime from another node, run
    ray start --address='IP-of-head-node:6379' --redis-password='5241590000000000'

  Alternatively, use the following Python code:
    import ray
    ray.init(address='auto', _redis_password='5241590000000000')

  If connection fails, check your firewall settings and network configuration.

  To terminate the Ray runtime, run
    ray stop
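Use the remaining VPSes as worker nodes
Following the hint printed by the head node above, run this on every worker node (remember to replace IP-of-head-node with the head node's actual IP):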
ray start --address='IP-of-head-node:6379' --redis-password='5241590000000000'
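The head node's output suggests checking firewall settings if a worker cannot connect. As a rough sketch of that check, the snippet below tries to open a TCP connection from a worker to the head node's port. The helper name `can_reach` and the 3-second timeout are assumptions made for this example; only port 6379 comes from the commands above.

```python
import socket


def can_reach(host, port=6379, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    `can_reach` is a hypothetical helper, not part of Ray. It only checks
    that the port accepts connections, not that a Ray head node is behind it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Usage on a worker node (replace the placeholder with the real head IP):
# print(can_reach("IP-of-head-node", 6379))
```

If this returns False on a worker, the problem is most likely a firewall rule or security group on the head node rather than anything in Ray itself.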
3. Test whether the worker nodes have joined the head node to form a cluster
Create a new Python script on the head node
vim ray_test.py
import ray
import time

# ray.init(address="auto")
ray.init(address='auto', _redis_password='5241590000000000')


@ray.remote
def f():
    time.sleep(0.01)
    return ray._private.services.get_node_ip_address()


if __name__ == "__main__":
    ips = set(ray.get([f.remote() for _ in range(1000)]))
    print(ips)
    print(len(ips))
The f function here has been modified from the one in the official example; without the change, the script fails with
AttributeError: module 'ray' has no attribute 'services'
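The error above comes from the helper having moved between Ray versions. One way to make the script less sensitive to this is to probe a few candidate locations in turn. This is only a sketch: `resolve_ip_getter` is a made-up name, and the list of locations is an assumption based on the error message (`ray.services`), the path used in this post (`ray._private.services`), and `ray.util`, which to my knowledge exposes the same function in newer releases.

```python
def resolve_ip_getter(ray_module):
    """Find get_node_ip_address on whichever attribute path this Ray has.

    `resolve_ip_getter` is a hypothetical helper. The candidate paths are
    assumptions about where different Ray versions keep the function.
    """
    candidates = ("util", "_private.services", "services")
    for path in candidates:
        obj = ray_module
        try:
            for part in path.split("."):
                obj = getattr(obj, part)
            return obj.get_node_ip_address
        except AttributeError:
            continue  # this location does not exist in this version
    raise AttributeError("get_node_ip_address not found in this Ray version")


# Usage inside the remote function, assuming `import ray` at the top:
#     return resolve_ip_getter(ray)()
```

Preferring the public `ray.util` location first keeps the script away from the private `_private` module where possible, since private modules can change without notice.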
Run the script on the head node
python3 ray_test.py
Output
2020-12-23 16:34:03,700 INFO worker.py:651 -- Connecting to existing Ray cluster at address: 192.3.231.zz:6379
{'67.198.228.xx', '104.168.89.yy', '192.3.231.zz'}
3
This shows that the head node connected to the worker nodes, ran f() on them, and got their IP addresses back. The output shows three VPSes in the cluster: one head node and two worker nodes.
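The set only shows which nodes took part, not how the 1000 tasks were spread across them. A small variation, sketched below, counts tasks per IP instead. The names `summarize` and `run_on_cluster` are invented for this example; the Ray calls are the same ones used in ray_test.py above, and Ray is imported lazily so the counting part can be tried without a cluster.

```python
from collections import Counter


def summarize(ips):
    """Count how many tasks ran on each node IP (hypothetical helper)."""
    return Counter(ips)


def run_on_cluster(num_tasks=1000):
    """Run the same test as ray_test.py but return per-node task counts."""
    import time

    import ray  # imported here so summarize() works without Ray installed

    ray.init(address="auto", _redis_password="5241590000000000")

    @ray.remote
    def f():
        time.sleep(0.01)
        return ray._private.services.get_node_ip_address()

    return summarize(ray.get([f.remote() for _ in range(num_tasks)]))


# Usage on the head node:
# for ip, count in run_on_cluster().most_common():
#     print(ip, count)
```

A very uneven distribution is not necessarily a problem, since Ray schedules tasks based on available resources on each node, but a node with zero tasks usually means it never joined the cluster.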
4. Afterword
The cluster is now up and running. A natural next step is to try some small programs that are heavy on network requests.
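As one possible shape for such a program, the sketch below fans a list of URLs out over the cluster and collects the HTTP status codes. It is an illustration under assumptions, not something tested in this post: `fetch_status` and `fetch_all` are made-up names, the 5-second timeout is arbitrary, and the connection parameters are copied from the test script above.

```python
import http.client
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return (url, HTTP status code), or (url, None) if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return url, resp.status
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError, timeouts and refused connections;
        # ValueError covers malformed URLs.
        return url, None


def fetch_all(urls):
    """Run fetch_status for every URL as a Ray task on the cluster."""
    import ray  # imported lazily so fetch_status() can be tried on its own

    ray.init(address="auto", _redis_password="5241590000000000")
    remote_fetch = ray.remote(fetch_status)
    return ray.get([remote_fetch.remote(u) for u in urls])


# Usage on the head node (the URLs are placeholders):
# print(fetch_all(["https://example.com", "https://example.org"]))
```

Because each request spends most of its time waiting on the network, spreading the tasks over several low-spec machines with separate network links is the kind of workload where even weak nodes can add up.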
References:
1. Official Ray documentation
2. Ray Distributed AI Framework Curriculum Offered on the Intel® AI DevCloud
3. Modern Parallel and Distributed Python: A Quick Tutorial on Ray