Transformer-based models like Bart Large CNN make it easy to generate text summaries. These machine learning models are easy to use but hard to scale. Let's look at how to use Bart Large CNN and how to optimize its performance.
Before Transformers and neural networks, there were already some solutions for text summarization, but none of them were truly satisfying.
In recent years, many high-performing pre-trained models have been built on the Transformer architecture. One of them is Bart Large CNN, released by Facebook, which excels at text summarization.
Here is how to use Bart Large CNN in Python. The simplest way is to download it from the Hugging Face repository and call the summarization pipeline provided by the Transformers library:
from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = """New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney's Office by Immigration and Customs Enforcement and the Department of Homeland Security's
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison. Her next court appearance is scheduled for May 18."""
summary = summarizer(article, max_length=130, min_length=30)
print(summary[0]["summary_text"])
The output:
Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.
The max_length and min_length parameters bound the length of the summary in tokens. A token can be a word or a punctuation mark. As a rule of thumb, 100 tokens correspond to roughly 75 words.
Important: the input cannot exceed 1024 tokens (about 800 words); this is an internal limit of the model. If you want to summarize a longer text, a practical workaround is to split it into chunks, summarize each chunk separately, and then concatenate the resulting summaries. You can even summarize the summaries!
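The chunking workaround above can be sketched as follows. This is a minimal sketch: `chunk_text` and `summarize_long_text` are hypothetical helper names, and splitting on whitespace words is only an approximation of the model's tokenization (so `chunk_words` is kept well under the 1024-token limit):

```python
def chunk_text(text, chunk_words=700):
    """Split a text into chunks of at most chunk_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words), chunk_words)]

def summarize_long_text(summarizer, text, chunk_words=700, **kwargs):
    """Summarize each chunk separately, then join the partial summaries."""
    parts = [summarizer(chunk, **kwargs)[0]["summary_text"]
             for chunk in chunk_text(text, chunk_words)]
    return " ".join(parts)

# Usage with the pipeline from above:
# summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# long_summary = summarize_long_text(summarizer, long_article,
#                                    max_length=130, min_length=30)
```

Passing the pipeline in as an argument keeps the chunking logic independent of the model, so you can also feed the joined partial summaries back through `summarize_long_text` to get a summary of summaries.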
Bart Large CNN has two drawbacks that cannot be ignored.
First, like many deep learning models, it needs a lot of disk space and RAM (about 1.5 GB). That said, compared with very large models such as GPT-3, GPT-J, or T5 11B, Bart Large CNN is a lightweight.
Second, inference is slow: summarizing an 800-word text takes about 20 seconds even on a powerful CPU...
The solution is to deploy Bart Large CNN on a GPU. On an NVIDIA Tesla T4, for example, you get roughly a 10x speedup, and summarizing an 800-word text takes only about 2 seconds.
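Moving the pipeline onto a GPU only requires the `device` argument of `pipeline`. By Transformers convention, 0 selects the first CUDA device and -1 the CPU; the tiny helper below (a hypothetical name, not a library function) makes the fallback explicit:

```python
def pick_device(cuda_available):
    """Map GPU availability to the pipeline(device=...) convention:
    0 = first CUDA GPU, -1 = CPU."""
    return 0 if cuda_available else -1

# Usage (requires torch and transformers):
# import torch
# from transformers import pipeline
# summarizer = pipeline("summarization", model="facebook/bart-large-cnn",
#                       device=pick_device(torch.cuda.is_available()))
```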
GPUs are still expensive today, though, so whether to provision one should depend on your actual needs.
Summarizing text with Bart Large CNN is easy to script, but what if you want to handle large volumes of requests in production?
One solution is the GPU deployment described above. Another is to delegate the task to a third-party service such as NLP Cloud, which exposes the Bart Large CNN model through an API.
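Calling such a service is an ordinary HTTP request. The sketch below only builds the request; the endpoint URL and the `Token` auth scheme are assumptions modeled on NLP Cloud's documented API, so check their docs for the exact details:

```python
import json

# Assumed endpoint shape -- verify against the NLP Cloud documentation
API_URL = "https://api.nlpcloud.io/v1/bart-large-cnn/summarization"

def build_summarization_request(text, token):
    """Build the headers and JSON body for a summarization API call (not sent here)."""
    headers = {
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"text": text})
    return headers, body

# Sending it (requires the requests package):
# import requests
# headers, body = build_summarization_request(article, "<your_api_token>")
# response = requests.post(API_URL, headers=headers, data=body)
```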
Thanks to Transformers and Bart Large CNN, generating text summaries in Python takes almost no effort.
More and more companies are building automatic text summarization into their applications. The hard part is the performance cost of such a complex model, but as we have seen, there are techniques to speed up Bart Large CNN.
Author: Julien Salinas, CTO of NLPCloud.io