
Understanding Subprocesses in Depth (continued) (subprocesses and threads)

nanyue 2024-09-09 04:53:20 Technical Articles

How sh is implemented

sh provides a very convenient and powerful interface for shell operations:

```python
from sh import ls
print(ls('-l'))
```

Source code: https://github.com/amoffat/sh/blob/1.12.14/sh.py

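The code is over 3,500 lines long. Here I only cover the key parts of the implementation; if you are interested, read the full source. Before reading the code, it is worth skimming the Reference section of its documentation to understand how the objects relate to each other and how a command is executed, so you don't get lost in the details.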

Importing arbitrary commands from the sh module

We can import any command from the sh module, and all of these commands are generated dynamically.

Here is how that works:

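```python
# magicmodule.py
import sys
from types import ModuleType


class MagicModule(ModuleType):
    def __init__(self, name):
        super().__init__(name)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so any unknown attribute name is simply returned
        return name


sys.modules[__name__] = MagicModule(__name__)
```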

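When a module is imported, Python's import machinery creates a module object and stores it in sys.modules[__name__], and `from module import attr` then reads an attribute of that module object. If you replace the module object created by the system with one that implements __getattr__, you can import any name, and therefore any command.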

In sh, the corresponding code is the SelfWrapper class at L3333 and the sys.modules replacement at L3581.
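To see the trick in isolation, here is a minimal runnable sketch. The module name `magic_demo` and the `"command:"` prefix are made up for illustration, and the guard that rejects dunder names is my own addition rather than sh's code: it keeps the import system from treating the object as a package when it probes attributes such as `__path__`.

```python
import sys
from types import ModuleType


class MagicModule(ModuleType):
    """Module object that fabricates any attribute it is asked for."""

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Refuse dunder names so the
        # import machinery (which probes e.g. __path__) sees a plain module.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        # Hypothetical stand-in for "build a Command object for this program"
        return "command:" + name


# Register the object under a made-up module name; `from ... import ...`
# finds it in sys.modules and then simply calls getattr() on it.
sys.modules["magic_demo"] = MagicModule("magic_demo")

from magic_demo import ls, git  # noqa: E402

print(ls, git)  # command:ls command:git
```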

Elegant exception handling

sh supports usage like this:

```python
from sh import ErrorReturnCode_12
from sh import SignalException_9
from sh import SignalException_SIGHUP

try:
    some_cmd()  # placeholder for any sh command
except ErrorReturnCode_12:
    print("couldn't do X")
```

These exception classes are all generated dynamically (L433):

```python
def get_exc_from_name(name):
    """takes an exception name, like:

        ErrorReturnCode_1
        SignalException_9
        SignalException_SIGHUP

    and returns the corresponding exception.  this is primarily used for
    importing exceptions from sh into user code, for instance, to capture those
    exceptions
    """
    # ... (body omitted)
```

It uses the following regular expression (L424) to match the exception class name and extract the return code or the signal:

```python
rc_exc_regex = re.compile(r"(ErrorReturnCode|SignalException)_((\d+)|SIG[a-zA-Z]+)")
```

It then generates a class dynamically:

```python
exc = ErrorReturnCodeMeta(name, (base,), {"exit_code": rc})
```

ErrorReturnCodeMeta (L325) is a metaclass, and the dynamically generated exception classes are instances of that metaclass.
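Putting these pieces together, here is a simplified, self-contained sketch of the same technique. It is not sh's code: the class hierarchy is trimmed down, the cache dictionary and the use of `fullmatch` are my own choices, and storing signals as negative exit codes is an assumption modelled on `subprocess.Popen.returncode`. The cache matters because `except` matches classes by identity, so every lookup of a given name must return the same class object.

```python
import re
import signal

# The regex quoted above, written as a raw string
rc_exc_regex = re.compile(r"(ErrorReturnCode|SignalException)_((\d+)|SIG[a-zA-Z]+)")


class ErrorReturnCodeMeta(type):
    """Simplified stand-in for sh's metaclass."""


class ErrorReturnCode(Exception, metaclass=ErrorReturnCodeMeta):
    exit_code = None


class SignalException(ErrorReturnCode):
    pass


_exc_cache = {}  # name -> class, so every lookup returns the same class


def get_exc_from_name(name):
    if name in _exc_cache:
        return _exc_cache[name]
    m = rc_exc_regex.fullmatch(name)
    if m is None:
        return None
    kind, code = m.group(1), m.group(2)
    if code.isdigit():
        num = int(code)
    elif kind == "SignalException" and code in signal.Signals.__members__:
        num = signal.Signals[code].value
    else:
        return None  # e.g. ErrorReturnCode_SIGHUP or an unknown signal name
    if kind == "SignalException":
        base, rc = SignalException, -num  # assumption: negative, like Popen.returncode
    else:
        base, rc = ErrorReturnCode, num
    # Calling the metaclass directly creates the class, as in the line above
    exc = ErrorReturnCodeMeta(name, (base,), {"exit_code": rc})
    _exc_cache[name] = exc
    return exc


try:
    raise get_exc_from_name("ErrorReturnCode_12")("exit status 12")
except get_exc_from_name("ErrorReturnCode_12") as e:
    print("caught", type(e).__name__, e.exit_code)  # caught ErrorReturnCode_12 12
```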

The which command

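sh implements the functionality of the which command in Python (L522).

The idea is to walk through every directory listed in the PATH environment variable and find an executable file with the requested name: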

```python
# L528
def is_exe(fpath):
    # exists, is executable, and is a regular file
    return (os.path.exists(fpath) and
            os.access(fpath, os.X_OK) and
            os.path.isfile(os.path.realpath(fpath)))

# L554
for path in paths_to_search:
    exe_file = os.path.join(path, program)
    if is_exe(exe_file):
        found_path = exe_file
        break
```

Launching the subprocess later depends on this function:

```python
def resolve_command_path(program):
    # look up the executable
    path = which(program)
    if not path:
        # replace underscores with hyphens and try again
        if "_" in program:
            path = which(program.replace("_", "-"))
        if not path:
            return None
    return path


# called when a command is imported; creates the Command object
def resolve_command(name, baked_args=None):
    path = resolve_command_path(name)
    cmd = None
    if path:
        cmd = Command(path)
        if baked_args:
            cmd = cmd.bake(**baked_args)
    return cmd
```
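A compact, runnable version of this lookup, including the underscore-to-hyphen fallback, might look like the sketch below. The `paths` parameter and the shortcut for names that already contain a directory are simplifications of mine, not sh's exact logic; in everyday code the standard library's `shutil.which` already covers the basic lookup.

```python
import os


def is_exe(fpath):
    # exists, is executable, and resolves to a regular file
    return (os.path.exists(fpath)
            and os.access(fpath, os.X_OK)
            and os.path.isfile(os.path.realpath(fpath)))


def which(program, paths=None):
    """Return the full path of `program`, or None if it is not found."""
    # A name with a directory component is checked directly, not via PATH
    if os.path.dirname(program):
        return program if is_exe(program) else None
    if paths is None:
        paths = os.environ.get("PATH", "").split(os.pathsep)
    for path in paths:
        candidate = os.path.join(path, program)
        if is_exe(candidate):
            return candidate
    return None


def resolve_command_path(program, paths=None):
    # Retry with underscores turned into hyphens, because hyphenated
    # program names cannot be written as Python identifiers
    path = which(program, paths)
    if path is None and "_" in program:
        path = which(program.replace("_", "-"), paths)
    return path
```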

Implementing the Command class

First, the Command class (L1054); every command is an instance of this class.

```python
class Command(object):
    """ represents an un-run system program, like "ls" or "cd". """

    # L1065
    _call_args = {
        "fg": False,  # run command in foreground

        # run a command in the background.  commands run in the background
        # ignore SIGHUP and do not automatically exit when the parent process
        # ends
        "bg": False,

        # ... many more parameters
    }

    # L1188
    def __init__(self, path, search_paths=None):
        found = which(path, search_paths)
        # ...
        # L1209
        self._path = encode_to_py3bytes_or_py2str(found)
```

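It keeps the definitions of its many parameters in _call_args, which has several advantages:

A large number of parameters is easy to handle
Each parameter is easy to document with a comment
The code reads better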

Because users are also allowed to create Command objects directly, the constructor calls which once more.

Next, let's see how it handles arguments (L1236):

```python
@staticmethod
def _extract_call_args(kwargs):
    """ takes kwargs that were passed to a command's __call__ and extracts
    out the special keyword arguments, we return a tuple of special keyword
    args, and kwargs that will go to the execd command """

    kwargs = kwargs.copy()
    call_args = {}
    for parg, default in Command._call_args.items():
        key = "_" + parg

        if key in kwargs:
            call_args[parg] = kwargs[key]
            del kwargs[key]

    invalid_kwargs = special_kwarg_validator(call_args,
                                             Command._kwarg_validators)

    if invalid_kwargs:
        exc_msg = []
        for args, error_msg in invalid_kwargs:
            exc_msg.append("  %r: %s" % (args, error_msg))
        exc_msg = "\n".join(exc_msg)
        raise TypeError("Invalid special arguments:\n\n%s\n" % exc_msg)

    return call_args, kwargs
```

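It splits the arguments into special arguments (those starting with an underscore) and ordinary arguments. Special arguments control how the command is executed. Note also that all special arguments are validated in one place, with very clear error messages.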
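The same split-then-validate pattern can be tried out without sh. In this sketch the `CALL_ARGS` table, the `validate` function and its fg/bg rule are hypothetical stand-ins for `Command._call_args` and `special_kwarg_validator`; only the underscore-prefix convention and the error-message format follow the code above.

```python
# Hypothetical table of special arguments and their defaults
CALL_ARGS = {"fg": False, "bg": False, "iter": None, "tty_out": True}


def validate(call_args):
    """Hypothetical validator: return a list of (arg_names, message) pairs."""
    errors = []
    if call_args.get("fg") and call_args.get("bg"):
        errors.append((("fg", "bg"), "can't run in foreground and background"))
    return errors


def extract_call_args(kwargs):
    kwargs = dict(kwargs)  # don't mutate the caller's dict
    call_args = {}
    for name in CALL_ARGS:
        key = "_" + name
        if key in kwargs:
            call_args[name] = kwargs.pop(key)
    errors = validate(call_args)
    if errors:
        msg = "\n".join("  %r: %s" % (args, err) for args, err in errors)
        raise TypeError("Invalid special arguments:\n\n%s\n" % msg)
    return call_args, kwargs


special, rest = extract_call_args({"_bg": True, "l": True, "color": "never"})
print(special, rest)  # {'bg': True} {'l': True, 'color': 'never'}
```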

The bake method of a Command object works much like functools.partial:

```python
>>> from sh import ls
>>> lslh = ls.bake('-l', '-h')
>>> lslh()
total 56K
-rw-r--r-- 1 guyskk guyskk 155 May 12 13:00 aaa.json
-rw-r--r-- 1 guyskk guyskk 162 May 12 13:00 bbb.py
...
>>> ls
<Command '/usr/bin/ls'>
>>> lslh
<Command '/usr/bin/ls -l -h'>
>>>
```

The implementation of bake (L1265):

```python
def bake(self, *args, **kwargs):
    fn = type(self)(self._path)
    fn._partial = True

    call_args, kwargs = self._extract_call_args(kwargs)

    pruned_call_args = call_args
    for k, v in Command._call_args.items():
        try:
            if pruned_call_args[k] == v:
                del pruned_call_args[k]
        except KeyError:
            continue

    fn._partial_call_args.update(self._partial_call_args)
    fn._partial_call_args.update(pruned_call_args)
    fn._partial_baked_args.extend(self._partial_baked_args)
    sep = pruned_call_args.get("long_sep", self._call_args["long_sep"])
    prefix = pruned_call_args.get("long_prefix",
                                  self._call_args["long_prefix"])
    fn._partial_baked_args.extend(compile_args(args, kwargs, sep, prefix))
    return fn
```

The _partial_call_args attribute will be used later.
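Stripped of the special-argument handling, bake boils down to "copy the command and append arguments", the same idea as `functools.partial`. The sketch below shows just that part. `FakeCommand` and `compile_kwargs` are hypothetical, and the keyword-to-option rules are illustrative only; they are not guaranteed to match sh's `compile_args`.

```python
class FakeCommand:
    """Hypothetical stand-in for sh.Command that only builds argv."""

    def __init__(self, path, baked_args=()):
        self._path = path
        self._baked_args = list(baked_args)

    def bake(self, *args, **kwargs):
        # Return a NEW object: the original command is left untouched
        return FakeCommand(self._path,
                           self._baked_args + list(args) + compile_kwargs(kwargs))

    def argv(self, *args, **kwargs):
        # Baked args come first, then the arguments of this particular call
        return [self._path] + self._baked_args + list(args) + compile_kwargs(kwargs)

    def __repr__(self):
        return "<Command %r>" % " ".join([self._path] + self._baked_args)


def compile_kwargs(kwargs):
    """Hypothetical rules: one-letter keys become -k, longer ones --long-key=value."""
    out = []
    for key, value in kwargs.items():
        if value is False or value is None:
            continue  # a false flag is simply omitted
        if len(key) == 1:
            out.append("-" + key)
            if value is not True:
                out.append(str(value))
        else:
            opt = "--" + key.replace("_", "-")
            out.append(opt if value is True else "%s=%s" % (opt, value))
    return out


ls = FakeCommand("/usr/bin/ls")
lslh = ls.bake("-l", "-h")
print(ls, lslh)            # <Command '/usr/bin/ls'> <Command '/usr/bin/ls -l -h'>
print(lslh.argv("/tmp"))   # ['/usr/bin/ls', '-l', '-h', '/tmp']
```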

Command is a callable object, and its __call__ method is fairly complex (L1324):

```python
def __call__(self, *args, **kwargs):
    # ... the details in between are hard to follow, so look at the last line first
    return RunningCommand(cmd, call_args, stdin, stdout, stderr)
```

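RunningCommand is what creates the subprocess and runs the command, so the job of __call__ is mainly to process the arguments and set up the three standard I/O streams.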

Handling piped commands: if the first argument is a running command, its output is used as this command's standard input:

```python
# L1373
# check if we're piping via composition
stdin = call_args["in"]
if args:
    first_arg = args.pop(0)
    if isinstance(first_arg, RunningCommand):
        if first_arg.call_args["piped"]:
            stdin = first_arg.process
        else:
            stdin = first_arg.process._pipe_queue
    else:
        args.insert(0, first_arg)
```
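At the operating-system level, piping means the first process's output becomes the second process's standard input. The sketch below shows that wiring with plain `subprocess`, following the pattern in the subprocess documentation. It is not sh's code (in the non-`piped` case above, sh goes through an internal queue instead), and two Python one-liners stand in for real commands such as `ls | wc -l` so that it runs anywhere.

```python
import subprocess
import sys

# Producer: prints three lines (stand-in for e.g. `ls -1`)
producer = subprocess.Popen(
    [sys.executable, "-c", "print('a'); print('b'); print('c')"],
    stdout=subprocess.PIPE,
)
# Consumer: counts lines on stdin (stand-in for `wc -l`)
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(len(sys.stdin.readlines()))"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()  # lets the producer get SIGPIPE if the consumer exits early
out, _ = consumer.communicate()
producer.wait()
print(out.strip())  # 3
```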

Handling the fg (foreground) argument:

```python
if call_args["fg"]:
    if call_args["env"] is None:
        launch = lambda: os.spawnv(os.P_WAIT, cmd[0], cmd)
    else:
        launch = lambda: os.spawnve(os.P_WAIT, cmd[0], cmd, call_args["env"])

    exit_code = launch()
```

os.spawn* behaves much like os.system; the difference is that it does not need a /bin/sh shell process to run the command.
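A quick way to see this is to spawn a child directly and read its exit code. This sketch assumes a POSIX system and uses the current Python interpreter as the child; the `DEMO_CODE` variable is made up. With `os.P_WAIT` both calls block until the child exits and return its exit code directly, whereas `os.system` would hand a command string to the shell and, on Unix, return a raw wait status.

```python
import os
import sys

# The child's argv; argv[0] doubles as the program path, as in sh's code
argv = [sys.executable, "-c", "import sys; sys.exit(3)"]

# P_WAIT: block until the child exits and return its exit code
exit_code = os.spawnv(os.P_WAIT, argv[0], argv)
print(exit_code)  # 3

# spawnve additionally takes the environment for the child
env = dict(os.environ, DEMO_CODE="5")
argv2 = [sys.executable, "-c", "import os, sys; sys.exit(int(os.environ['DEMO_CODE']))"]
exit_code2 = os.spawnve(os.P_WAIT, argv2[0], argv2, env)
print(exit_code2)  # 5
```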

The out argument (err is handled in much the same way):

```python
# stdout redirection
stdout = call_args["out"]
if output_redirect_is_filename(stdout):
    stdout = open(str(stdout), "wb")
```

Implementing RunningCommand

First, the interface (L649):

```python
class RunningCommand(object):
    """this represents an executing Command object."""

    def __init__(self, cmd, call_args, stdin, stdout, stderr):
        """
        cmd is an array, where each element is encoded as bytes (PY3) or str
        (PY2)
        """
```

It is essentially the same as the Popen interface, except that most of the parameters are bundled into call_args.

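Among them, the iter parameter lets you consume the output incrementally instead of fetching all of it at once after the subprocess has exited.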

```python
# set up which stream should write to the pipe
# TODO, make pipe None by default and limit the size of the Queue
# in oproc.OProc
pipe = OProc.STDOUT
if call_args["iter"] == "out" or call_args["iter"] is True:
    pipe = OProc.STDOUT
elif call_args["iter"] == "err":
    pipe = OProc.STDERR
```
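The effect of iter can be approximated with a generator over the child's stdout. This is a plain-subprocess sketch, not sh's mechanism (sh reads the streams in background threads and passes output through a queue); `iter_lines` is a hypothetical name. The child is run with `-u` to disable its own output buffering, since a child writing to a pipe might otherwise deliver everything only at exit.

```python
import subprocess
import sys


def iter_lines(argv):
    """Yield the child's stdout line by line while it is still running."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:  # blocks only until the next line arrives
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()


# -u makes the child Python unbuffered, so each line is sent immediately
child = [sys.executable, "-u", "-c",
         "import time\nfor i in range(3):\n    print(i)\n    time.sleep(0.1)"]
for line in iter_lines(child):
    print("got", line)
```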

It creates the process that runs the command through OProc (L750) and then waits for the process to finish:

```python
if spawn_process:
    # this lock is needed because of a race condition where a background
    # thread, created in the OProc constructor, may try to access
    # self.process, but it has not been assigned yet
    process_assign_lock = threading.Lock()
    with process_assign_lock:
        self.process = OProc(self, self.log, cmd, stdin, stdout, stderr,
                             self.call_args, pipe, process_assign_lock)

    if should_wait:
        self.wait()
```
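The race described in that comment, and why holding the lock around both construction and assignment fixes it, can be reproduced in a few lines. `FakeOProc` and `FakeRunningCommand` are hypothetical stand-ins that only mimic the pattern the comment describes; they are not sh's classes. Without the lock, the background thread could read `owner.process` before the assignment and fail with `AttributeError`.

```python
import threading


class FakeOProc:
    """Hypothetical stand-in for OProc: starts a thread in its constructor."""

    def __init__(self, owner, assign_lock):
        self.owner = owner
        self.seen = None
        self.thread = threading.Thread(target=self._background,
                                       args=(assign_lock,))
        self.thread.start()  # the thread may run before __init__ returns

    def _background(self, assign_lock):
        # Wait until the owner has finished `self.process = ...`
        with assign_lock:
            self.seen = self.owner.process


class FakeRunningCommand:
    def __init__(self):
        process_assign_lock = threading.Lock()
        with process_assign_lock:
            # The lock is held while constructing AND assigning, so the
            # background thread cannot read self.process until it exists
            self.process = FakeOProc(self, process_assign_lock)
        self.process.thread.join()


cmd = FakeRunningCommand()
print(cmd.process.seen is cmd.process)  # True
```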

RunningCommand also implements __str__ and __repr__, so it looks like a string, and it implements __iter__, so you can iterate over its output:

```python
>>> ret = sh.ls('-l')
>>> type(ret)
<class 'sh.RunningCommand'>
>>> for line in ret:
...     print(line)
total 28392
-rwxr-xr-x 1 guyskk guyskk 8464 Jul 1 18:34 a.out
-rw-r--r-- 1 guyskk guyskk 421 Jun 4 22:15 app.py
```
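The string-like, iterable behaviour only needs three dunder methods. In this reduced sketch the class name `CommandResult` is hypothetical, and unlike sh it simply waits for the child with `subprocess.run` and keeps the whole output in memory. Returning the output from `__repr__` is my assumption about how a bare result ends up printing the listing at the REPL.

```python
import subprocess
import sys


class CommandResult:
    """Hypothetical, much-reduced stand-in for RunningCommand."""

    def __init__(self, argv):
        done = subprocess.run(argv, stdout=subprocess.PIPE, text=True, check=True)
        self.stdout = done.stdout

    def __str__(self):
        return self.stdout

    def __repr__(self):
        # Showing the output as the repr makes a bare `ret` at the REPL
        # print the command's output
        return str(self)

    def __iter__(self):
        return iter(self.stdout.splitlines(keepends=True))


ret = CommandResult([sys.executable, "-c", "print('a.out'); print('app.py')"])
for line in ret:
    print(line, end="")
```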

Implementing OProc

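OProc (L1678) encapsulates the logic for creating the process and communicating with it, and almost all of the special arguments are handled there. Its constructor is extremely long because it has so much logic to deal with.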

With this many special arguments, the author presumably had little choice but to add a shorthand:

```python
# convenience
ca = self.call_args
```

Here we focus on the pseudo-terminal (TTY) arguments:

_tty_in: Default value: False, meaning an os.pipe will be used.

_tty_out: Default value: True. If True, sh creates a TTY for STDOUT, otherwise it uses an os.pipe.

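By default the subprocess's input is a pipe and its output is a pseudo-terminal. When stdout is a terminal, programs built on C stdio (and Python itself) use line buffering, so the output can be read continuously as it is produced; with a pipe they typically switch to full buffering. Compare this with the earlier example of running hello.py through Popen.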

Making _tty_out default to True gives sh much better compatibility than Popen in this respect; see the sh FAQ entry "Why is _tty_out=True the default?".
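To make the difference concrete, this POSIX-only sketch (it uses the standard `pty` module, as sh does) runs a Python child once with a pipe and once with a pseudo-terminal as its stdout and lets the child report what it sees. Python's own stdout is line-buffered only when `isatty()` is True, which is the switch that a TTY on stdout flips. It is an illustration under those assumptions, not sh's code.

```python
import os
import pty
import subprocess
import sys

# The child reports whether its stdout is a terminal
CHECK = "import sys; print(sys.stdout.isatty())"


def run_with_pipe():
    done = subprocess.run([sys.executable, "-c", CHECK],
                          stdout=subprocess.PIPE, text=True)
    return done.stdout.strip()


def run_with_pty():
    master_fd, slave_fd = pty.openpty()  # (master, slave) pair
    proc = subprocess.Popen([sys.executable, "-c", CHECK], stdout=slave_fd)
    os.close(slave_fd)  # only the child keeps the slave end open
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:  # Linux raises EIO once the slave side is closed
            break
        if not data:  # other platforms may report plain EOF instead
            break
        chunks.append(data)
    proc.wait()
    os.close(master_fd)
    return b"".join(chunks).decode().strip()  # the tty may turn \n into \r\n


print("pipe:", run_with_pipe())  # pipe: False
print("pty:", run_with_pty())    # pty: True
```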

A rough look at the implementation:

```python
# pty.openpty() returns (master_fd, slave_fd); os.pipe() returns (read_fd, write_fd)

# L1770
elif ca["tty_in"]:
    self._stdin_read_fd, self._stdin_write_fd = pty.openpty()

# tty_in=False is the default
else:
    self._stdin_write_fd, self._stdin_read_fd = os.pipe()

# L1782
# tty_out=True is the default
elif ca["tty_out"]:
    self._stdout_read_fd, self._stdout_write_fd = pty.openpty()

else:
    self._stdout_read_fd, self._stdout_write_fd = os.pipe()
```

The fork and exec that follow are almost identical to Popen's, so I won't go over them again.

Afterthoughts

It took me about a week of reading to untangle the logic of sh; the code is very complex.

There are roughly four reasons:

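The inherent complexity of Unix processes, with many concepts and hidden pitfalls
It uses some little-known Python features (black magic)
All the source is in a single 3,500-line file with an unclear structure; the bottle project has the same problem
It is extremely feature-rich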

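There are many more details in the source that I haven't covered, and honestly I don't understand many of them myself, so I can only outline a few key points and trace how a command gets executed. I hope it helps ╮(￣▽￣")╭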

