[Translation] Google Code Coverage Best Practices

We have spent several decades driving software testing initiatives in various very large software companies. One of the areas that we have consistently advocated for is the use of code coverage data to assess risk and identify gaps in testing. However, the value of code coverage is a highly debated subject with strong opinions, and a surprisingly polarizing topic. Every time code coverage is mentioned in any large group of people, seemingly endless arguments ensue. These tend to lead the conversation away from any productive progress, as people securely bunker in their respective camps. The purpose of this document is to give you tools to steer people on all ends of the spectrum to find common ground so that you can move forward and use coverage information pragmatically. We put forth best practices in the domain of code coverage to work effectively with code health.

Code coverage provides significant benefits to the developer workflow. It is not a perfect measure of test quality, but it does offer a reasonable, objective, industry standard metric with actionable data. It does not require significant human interaction, it applies universally to all products, and there are ample tools available in the industry for most languages. You must treat it with the understanding that it’s a lossy and indirect metric that compresses a lot of information into a single number so it should not be your only source of truth. Instead, use it in conjunction with other techniques to create a more holistic assessment of your testing efforts.

(Translator's note: the data is actionable in the sense that, for example, when unit tests are measured against a condition coverage criterion, the results show whether any conditions remain uncovered and which ones.)

It is an open research question whether code coverage alone reduces defects, but our experience shows that efforts in increasing code coverage can often lead to culture changes in engineering excellence that in the long run reduce defects. For example, teams that give code coverage priority tend to treat testing as a first class citizen, and tend to bake stronger testability into their product design, so that they can achieve their testing goals with less effort. All this in turn leads to writing higher quality code to begin with (more modular, cleaner contracts in their APIs, more manageable code reviews, etc.). They also start caring more about their overall health, and engineering and operational excellence.

A high code coverage percentage does not guarantee high quality in the test coverage. Focusing on getting the number as close as possible to 100% leads to a false sense of security. It could also be wasteful, burning machine cycles and creating technical debt from low-value tests that now need to be maintained. Bad code being pushed to production due to missing tests could happen either because (a) your tests did not cover a specific path of code, a test gap that is easy to identify with code coverage analysis, or (b) because your tests did not cover a specific edge case in an area that did have code coverage, which is difficult or impossible to catch with code coverage analysis. Code coverage does not guarantee that the covered lines or branches have been tested correctly, it just guarantees that they have been executed by a test. Be mindful of copy/pasting tests just for the sake of increasing coverage, or adding tests with little actual value, to comply with the number. A better technique to assess whether you’re adequately exercising the lines your tests cover, and adequately asserting on failures, is mutation testing.

(Translator's note: whether the covered lines and branches actually behave correctly depends on the quality of the test cases.)
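
The gap between "executed" and "tested" can be made concrete with a small sketch. Everything here is hypothetical and written for illustration: `apply_discount`, its hand-written mutant, and the two tests are not from the original post, and real mutation testing tools generate mutants automatically rather than by hand. Both tests reach 100% line coverage of the function, but only the one with real assertions notices the mutant.

```python
def apply_discount(price: float, is_member: bool) -> float:
    """Original code: members get 10% off."""
    if is_member:
        return price * 0.9
    return price


def apply_discount_mutant(price: float, is_member: bool) -> float:
    """Hand-written mutant: the factor 0.9 has been changed to 1.1."""
    if is_member:
        return price * 1.1
    return price


def weak_test(fn) -> bool:
    """Executes every line of fn but never checks a result."""
    fn(100.0, True)
    fn(100.0, False)
    return True  # "passes" no matter what fn returns


def strong_test(fn) -> bool:
    """Executes the same lines and asserts on the results."""
    member_ok = abs(fn(100.0, True) - 90.0) < 1e-9
    guest_ok = fn(100.0, False) == 100.0
    return member_ok and guest_ok


def mutant_killed(test) -> bool:
    """A mutant is killed when the test passes on the original and fails on the mutant."""
    return test(apply_discount) and not test(apply_discount_mutant)


print("weak test kills mutant:", mutant_killed(weak_test))
print("strong test kills mutant:", mutant_killed(strong_test))
```

A surviving mutant, as with `weak_test` here, is a hint that covered code is not being asserted on, which a coverage percentage alone cannot reveal.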

But a low code coverage number does guarantee that large areas of the product are going completely untested by automation on every single deployment. This increases our risk of pushing bad code to production, so it should receive attention. In fact a lot of the value of code coverage data is to highlight not what’s covered, but what’s not covered.

There is no “ideal code coverage number” that universally applies to all products. The level of testing you want/need for a set of code should be a function of (a) business impact/criticality of the code; (b) how often you will need to touch/change the code; (c) how much longer you expect the code to live, its complexity, and domain variables. We cannot mandate every single team should have x% code coverage; this is a business decision best made by the owners of the product with domain-specific knowledge. Any mandate to reach x% code coverage should be accompanied by infrastructure investments to make testing easy, such as integrating tools into the developer workflow. Be mindful that engineers may start treating your target like a checkbox and avoid increasing coverage beyond the target, even if doing so would be prudent.

In general code coverage of a lot of products is below the bar; we should aim at significantly improving code coverage across the board. Although there is no “ideal code coverage number,” at Google we offer the general guidelines of 60% as “acceptable”, 75% as “commendable” and 90% as “exemplary.” However we like to stay away from broad top-down mandates and encourage every team to select the value that makes sense for their business needs.
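
As a minimal sketch, the guideline values above can be turned into a lookup. The function name, the inclusive boundaries, and the "below guideline" label are assumptions made for this example; the post only names the three tiers and explicitly discourages using them as a top-down mandate.

```python
def guideline_label(coverage_percent: float) -> str:
    """Map a coverage percentage to the guideline labels quoted above."""
    if not 0.0 <= coverage_percent <= 100.0:
        raise ValueError("coverage must be between 0 and 100")
    if coverage_percent >= 90.0:
        return "exemplary"
    if coverage_percent >= 75.0:
        return "commendable"
    if coverage_percent >= 60.0:
        return "acceptable"
    # Label invented for this sketch; the guideline does not name this tier.
    return "below guideline"


for value in (45.0, 60.0, 82.5, 93.0):
    print(value, "->", guideline_label(value))
```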

We should not be obsessing on how to get from 90% code coverage to 95%. The gains of increasing code coverage beyond a certain point are logarithmic. But we should be taking concrete steps to get from 30% to 70% and always making sure new code meets our desired threshold.

More important than the percentage of lines covered is human judgment over the actual lines of code (and behaviors) that aren’t being covered (analyzing the gaps in testing) and whether this risk is acceptable or not. What’s not covered is more meaningful than what is covered. Pragmatic discussions over specific lines of code not covered that take place during the code review process are more valuable than over-indexing on an arbitrary target number. We have found out that embedding code coverage into your code review process makes code reviews faster and easier. Not all code is equally important, for example testing debug log lines is often not as important, so when developers can see not just the coverage number, but each covered line highlighted as part of the code review, they will make sure that the most important code is covered.
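
One way to support this kind of review discussion is to list the uncovered lines themselves and filter out the ones a team has agreed are low value. The sketch below is an illustration under stated assumptions: the `LOW_VALUE` pattern, the `lines_to_discuss` helper, and the sample `withdraw` source are all hypothetical, and the executable and covered line sets would normally come from a coverage tool rather than be typed in by hand.

```python
from __future__ import annotations

import re

# Hypothetical pattern for lines a team considers low value to test (debug logging).
LOW_VALUE = re.compile(r"\b(log|logger|logging)\.debug\(")


def lines_to_discuss(
    source_lines: list[str], executable: set[int], covered: set[int]
) -> list[tuple[int, str]]:
    """Return (line number, text) for uncovered executable lines worth reviewing."""
    result = []
    for number, text in enumerate(source_lines, start=1):
        if number in executable and number not in covered:
            if not LOW_VALUE.search(text):
                result.append((number, text.strip()))
    return result


source = [
    "def withdraw(balance, amount):",
    "    if amount > balance:",
    "        logger.debug('insufficient funds')",
    "        raise ValueError('insufficient funds')",
    "    return balance - amount",
]
executable = {2, 3, 4, 5}  # assumed: the def line is not counted in this sketch
covered = {2, 5}           # assumed: only the happy path was exercised

for number, text in lines_to_discuss(source, executable, covered):
    print(f"line {number} is not covered: {text}")
```

Here the uncovered debug log on line 3 is skipped, while the untested error path on line 4 is surfaced for the reviewer.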

Just because your product has low code coverage doesn’t mean you can’t take concrete, incremental steps to improve it over time. Inheriting a legacy system with poor testing and poor testability can be daunting, and you may not feel empowered to turn it around, or even know where to start. But at the very least, you can adopt the ‘boy-scout rule’ (leave the campground cleaner than you found it). Over time, and incrementally, you will get to a healthy location.

(Translator's note: in software engineering terms, the boy-scout rule means making sure that the code you check in is cleaner than the code you checked out.)

Make sure that frequently changing code is covered. While project wide goals above 90% are most likely not worth it, per-commit coverage goals of 99% are reasonable, and 90% is a good lower threshold. We need to ensure that our tests are not getting worse over time.
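
Per-commit coverage can be read as coverage restricted to the lines a commit touches. The following is a rough sketch of that idea, not a description of Google's tooling: the function names and the set-based inputs are assumptions, and in practice the changed lines would come from the diff and the other two sets from a coverage report.

```python
from __future__ import annotations

from typing import Optional


def changed_line_coverage(
    changed: set[int], executable: set[int], covered: set[int]
) -> Optional[float]:
    """Percentage of changed executable lines that a test executed.

    Returns None when the commit touches no executable lines
    (for example, a comment-only change).
    """
    relevant = changed & executable
    if not relevant:
        return None
    return 100.0 * len(relevant & covered) / len(relevant)


def meets_commit_goal(percent: Optional[float], lower_threshold: float = 90.0) -> bool:
    """Apply the 90% lower threshold mentioned above; nothing to measure counts as a pass."""
    return percent is None or percent >= lower_threshold


# Lines 10-13 were changed; line 13 is a comment, line 12 has no test.
percent = changed_line_coverage(
    changed={10, 11, 12, 13},
    executable={10, 11, 12, 20},
    covered={10, 11, 20},
)
print(f"per-commit coverage: {percent:.1f}%")
print("meets goal:", meets_commit_goal(percent))
```

Measuring only the changed lines keeps the goal achievable on legacy code, since an author is held to the standard for what they touched rather than for the whole file.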

Unit test code coverage is only a piece of the puzzle. Integration/System test code coverage is important too. And the aggregate view of the coverage of all sources in your Pipeline (unit and integration) is paramount, as it gives you the bigger picture of how much of your code is not exercised by your test automation as it makes its way in your pipeline to a production environment. One thing you should be aware of is while unit tests have high correlation between executed and evaluated code, some of the coverage from integration tests and end-to-end tests is incidental and not deliberate. But incorporating code coverage from integration tests can help you avoid situations where you have a false sense of security that even though you’re not covering code in your unit tests, you think you’re covering it in your integration tests.
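
The aggregate view can be sketched as a union of the lines covered by each test suite. This is a simplified illustration with hypothetical names and hand-written line sets; real pipelines would merge the reports produced by their coverage tools. Reporting the lines reached only by integration tests separately is a design choice made here because, as noted above, some of that coverage is incidental.

```python
from __future__ import annotations


def coverage_percent(executable: set[int], covered: set[int]) -> float:
    """Percentage of executable lines that appear in the covered set."""
    return 100.0 * len(executable & covered) / len(executable)


def combined_view(executable: set[int], unit: set[int], integration: set[int]) -> dict:
    """Summarize unit, integration, and aggregate coverage for one file."""
    return {
        "unit": coverage_percent(executable, unit),
        "integration": coverage_percent(executable, integration),
        "combined": coverage_percent(executable, unit | integration),
        # Lines reached only by integration tests; possibly incidental coverage.
        "integration_only": sorted((integration - unit) & executable),
        # Lines no automated test reaches on the way to production.
        "uncovered": sorted(executable - unit - integration),
    }


view = combined_view(
    executable=set(range(1, 11)),  # lines 1..10
    unit={1, 2, 3, 4, 5, 6},
    integration={5, 6, 7, 8},
)
for key, value in view.items():
    print(key, "=", value)
```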

We should gate deployments that do not meet our code coverage standards. Teams should debate and decide which gating mechanism makes sense to them. You should however be careful that it doesn’t turn into being treated as a checkbox that is required to be filled, as it can backfire (pressure to 'hit the metric' almost never yields the desired outcome). There are many mechanisms available: gate on coverage for all code vs gate on coverage to new code only; gate on a specific hard-coded code coverage number vs gate on delta from prior version, specific parts of the code to ignore or focus on. And then, commit to upholding these as a team. Drops in code coverage violating the gate should prevent the code from being checked in and reaching production.
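
The gating mechanisms listed above behave differently on the same change, which is part of why teams need to debate the choice. The sketch below compares three of them; the `CoverageReport` type, the function names, and the default thresholds (borrowed from the guideline numbers earlier in this post) are assumptions for illustration, not a prescribed implementation.

```python
from dataclasses import dataclass


@dataclass
class CoverageReport:
    overall: float   # coverage percentage for all code
    new_code: float  # coverage percentage for lines added or changed


def gate_absolute(report: CoverageReport, minimum: float = 60.0) -> bool:
    """Gate on a hard-coded number for all code."""
    return report.overall >= minimum


def gate_delta(report: CoverageReport, previous_overall: float, max_drop: float = 0.0) -> bool:
    """Gate on the change from the prior version: coverage may not drop by more than max_drop."""
    return report.overall >= previous_overall - max_drop


def gate_new_code(report: CoverageReport, minimum: float = 90.0) -> bool:
    """Gate on coverage of new code only."""
    return report.new_code >= minimum


# A well-tested change landing in a poorly covered legacy codebase.
report = CoverageReport(overall=58.0, new_code=95.0)
print("absolute gate:", gate_absolute(report))
print("delta gate:", gate_delta(report, previous_overall=57.5))
print("new-code gate:", gate_new_code(report))
```

In this example the absolute gate blocks a change that actually improves the codebase, while the delta and new-code gates let it through.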

If you would like to learn more about Google's coverage infrastructure, we welcome you to read our paper “Code Coverage at Google.”

Source: Google Testing Blog
Original authors: Carlos Arguelles, Marko Ivanković, and Adam Bender

Original URL: https://testing.googleblog.com/2020/08/code-coverage-best-practices.html

Translator: shower

Date: 2020-08-07

Reviewers: shower, 土司阿哈