
Faithful Reasoning Using Large Language Models (2022 DeepMind)

huyi / December 2022

0. Abstract

answers come from a single LM call –> opacity, and compromised performance, especially on multi-step problems
to address the limitation…
chaining together reasoning steps
each step results from calls to two fine-tuned LMs: selection + inference
advantages of the method
outperforms baselines on multi-step logical deduction and scientific question-answering datasets: ProofWriter & the question-answering version of EntailmentBank
generates humanly interpretable reasoning traces –> checkable

underlying computations mirror standard definitions of logical validity
advantage: checkable
necessity:
- we cannot tell whether the answer comes from prior knowledge or from the relevant context
- prior knowledge carries many biases: the model is trained on human data collected from the internet

forward-chaining
backbone: selection & inference (SI), with a causal structure in between
$+$ two further fine-tuned language models
- the halter terminates the reasoning process and returns an answer in the required format
- a learned value function assesses the quality of the current reasoning step –> guides a beam search over reasoning traces

fig1

SI
the halter (generating the answer), note: when there is sufficient information, the model predicts the answer in such a way that it cannot rely on knowledge embedded in its weights, but must depend on the reasoning trace.
3.1 SI

Iterate the following architecture; the inference from the last iteration is used as the input to the halter.
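A minimal sketch of this iteration, just to make the data flow concrete. The `select` and `infer` callables stand in for the two fine-tuned LMs; those names, the `"A -> B"` rule syntax and the toy context are assumptions for illustration, not the paper's implementation. Inference only sees what selection picked, and the last inference is what would go to the halter.

```python
from typing import Callable, List


def si_loop(
    context: List[str],
    question: str,
    select: Callable[[List[str], str], List[str]],
    infer: Callable[[List[str]], str],
    n_steps: int,
) -> List[str]:
    """Run n_steps of selection + inference and return the inferences made."""
    context = list(context)  # copy so the caller's list is not modified
    trace: List[str] = []
    for _ in range(n_steps):
        selection = select(context, question)  # selection sees context + question
        new_fact = infer(selection)            # inference sees ONLY the selection
        trace.append(new_fact)
        context.append(new_fact)               # new fact is usable in later steps
    return trace


# Toy stand-ins (hypothetical): facts are plain strings, rules are "A -> B".
def toy_select(context: List[str], question: str) -> List[str]:
    facts = {s for s in context if "->" not in s}
    for s in context:
        if "->" in s:
            premise, conclusion = [part.strip() for part in s.split("->")]
            if premise in facts and conclusion not in facts:
                return [premise, s]
    return []


def toy_infer(selection: List[str]) -> str:
    if not selection:
        return ""
    return selection[1].split("->")[1].strip()


trace = si_loop(["cat", "cat -> furry", "furry -> soft"], "soft?",
                toy_select, toy_infer, n_steps=2)
final_inference = trace[-1]  # this is what the halter would receive
```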

selection LM: an LM trained to refer to statements in the context by their sentence labels (fig3)
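A small sketch of why selecting by label helps: the selection output is resolved back to the original sentences by lookup, so it cannot introduce a statement that is not in the context. The `sent N` label format, the example output string and the helper names are assumptions for illustration only.

```python
import re
from typing import Dict, List


def label_context(sentences: List[str]) -> Dict[str, str]:
    """Attach a label 'sent 1', 'sent 2', ... to every context sentence."""
    return {f"sent {i}": s for i, s in enumerate(sentences, start=1)}


def resolve_selection(selection_output: str, labelled: Dict[str, str]) -> List[str]:
    """Map labels mentioned in the selection output back to context sentences."""
    labels = re.findall(r"sent \d+", selection_output)
    missing = [label for label in labels if label not in labelled]
    if missing:
        # A label that is not in the context is rejected instead of trusted.
        raise KeyError(f"labels not in context: {missing}")
    return [labelled[label] for label in labels]


labelled = label_context(["Alice is a cat.", "Cats are furry.", "Bob is a dog."])
selected = resolve_selection("sent 2. We know that sent 1", labelled)
```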

assumption: the Inference model produces logically correct inferences

SI does not know when to stop
the final inference is not in a formal answer format
two-stage Halter
1. given the question and the final inference, decide whether the question can be answered
2. if the question can be answered, use the same LM to generate the answer

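If no final result is reached after a certain number of iterations, the question is judged 'unknown'. The final experimental results exclude the questions judged unknown.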
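The two halter stages plus the iteration cap can be sketched as below. `can_answer` and `give_answer` are hypothetical stand-ins for the two uses of the halter LM, and the toy lambdas are made up for illustration; this is not the paper's code.

```python
from typing import Callable, Iterable


def run_with_halter(
    question: str,
    inferences: Iterable[str],
    can_answer: Callable[[str, str], bool],
    give_answer: Callable[[str, str], str],
    max_steps: int,
) -> str:
    """Check each new inference with the halter; give up after max_steps."""
    for step_index, inference in enumerate(inferences):
        if step_index >= max_steps:
            break
        # Stage 1: can the question be answered from this inference?
        if can_answer(question, inference):
            # Stage 2: produce the answer from the question and this inference.
            return give_answer(question, inference)
    return "unknown"


# Toy stand-ins (hypothetical): the question is answerable once "soft" is inferred.
toy_can_answer = lambda q, inf: inf == "soft"
toy_give_answer = lambda q, inf: "yes"

answered = run_with_halter("soft?", ["furry", "soft"],
                           toy_can_answer, toy_give_answer, max_steps=2)
gave_up = run_with_halter("soft?", ["furry", "soft"],
                          toy_can_answer, toy_give_answer, max_steps=1)
```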

The direction of inference is fairly random and not goal-directed, so intuitively it should produce many unknown questions.

value function (a language model LM$_{value}$): computes the value of adding a reasoning step to the current trace
correct: the step is both logically valid and on the ground-truth reasoning trace
–> SI generates p candidate steps –> keep the top b candidates according to LM$_{value}$ –> generate the next-step candidates –> b * p traces
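A rough sketch of this beam search. `propose` stands in for SI generating p candidate steps and `value` for LM$_{value}$; both names and the toy number-based scoring are assumptions for illustration. Each of the b kept traces is extended with p candidates, giving up to b * p traces, which are then pruned back to b.

```python
from typing import Callable, List

Trace = List[int]  # toy traces: each reasoning "step" is just a number


def beam_search(
    propose: Callable[[Trace, int], List[int]],
    value: Callable[[Trace], float],
    p: int,
    b: int,
    depth: int,
) -> List[Trace]:
    """Keep the b best traces at each depth, expanding each with p candidates."""
    beam: List[Trace] = [[]]
    for _ in range(depth):
        # Up to len(beam) * p candidate traces at this depth.
        candidates = [trace + [step] for trace in beam for step in propose(trace, p)]
        if not candidates:
            break
        # Highest value first; the sort is stable, so ties keep generation order.
        candidates.sort(key=value, reverse=True)
        beam = candidates[:b]
    return beam


# Toy stand-ins (hypothetical): candidate steps are 1..p, value is the sum.
toy_propose = lambda trace, p: list(range(1, p + 1))
toy_value = lambda trace: float(sum(trace))

beam = beam_search(toy_propose, toy_value, p=3, b=2, depth=2)
```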