QA-GNN (Yasunaga et al., 2021; Jure Leskovec's group)
huyi / October 2022
0. Abstract
- challenge: methods need to
  - identify relevant knowledge from large KGs
  - perform joint reasoning over the QA context and KG
- key innovations of QA-GNN
  - relevance scoring: use LMs to estimate the importance of KG nodes relative to the given QA context
  - joint reasoning: connect the QA context and KG to form a joint graph, and mutually update their representations
1. Introduction
QA systems must be able to reason over relevant knowledge
- knowledge can be implicitly encoded in large language models (LMs) pre-trained on unstructured text
- knowledge can be explicitly represented in structured knowledge graphs
- LM pros and cons:
  - pros: using LMs alone for QA already gives some good results → broad coverage of knowledge
    - RoBERTa: A Robustly Optimized BERT Pretraining Approach
    - T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  - cons: not good at structured reasoning (e.g. bad at negation)
    - Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly
- KG pros and cons:
  - pros:
    - more suited for structured reasoning
    - enable explainable predictions
  - cons:
    - can lack coverage and be noisy
challenges of combining LM and KG
- identify informative knowledge from a large KG
- capture the nuance of the QA context and the structure of the KGs to perform joint reasoning over these two sources of information
limitations of previous work
- previous works retrieve a subgraph from the KG by taking topic entities (KG entities mentioned in the given QA context) and their few-hop neighbors; this introduces many entity nodes that are semantically irrelevant to the QA context, especially when the number of topic entities or hops increases
- existing LM+KG methods for reasoning treat the QA context and KG as two separate modalities
QA-GNN
- relevance scoring: score each entity on the KG subgraph (few-hop neighbors of topic entities) for its relevance to the given QA context through a pre-trained LM
- joint reasoning: view the QA context as an additional node (QA context node)
  - joint graph → working graph: unifies the two modalities into one graph
  - augment the feature of each node with the relevance score
  - design a new attention-based GNN module for reasoning
2. Problem statement
- definition of LM + KG (broadly): $f_{head}(f_{enc}(\mathsf{x}))$
- subgraph retrieval: topic nodes $\mathcal{V}_{q,a}=\mathcal{V}_q\cup\mathcal{V}_a$
- subgraph: all nodes on the k-hop paths between nodes in $\mathcal{V}_{q,a}$ (see the sketch below)
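To make this retrieval concrete, here is a minimal Python sketch that keeps every node lying on a path of at most k hops between topic nodes. It assumes the KG is a networkx graph; the toy KG, the entity names, and the function name `retrieve_subgraph` are illustrative, not from the paper.

```python
# Sketch of subgraph retrieval: keep all nodes on paths of <= k edges
# between any pair of topic nodes in V_{q,a}.
import networkx as nx

def retrieve_subgraph(kg: nx.Graph, topic_nodes: set, k: int = 2) -> nx.Graph:
    keep = set(topic_nodes)
    for s in topic_nodes:
        for t in topic_nodes:
            if s == t:
                continue
            # all_simple_paths with cutoff=k enumerates paths of <= k edges
            for path in nx.all_simple_paths(kg, s, t, cutoff=k):
                keep.update(path)
    return kg.subgraph(keep).copy()

# toy KG; entity names are made up purely for illustration
kg = nx.Graph([("bird", "fly"), ("fly", "sky"), ("bird", "animal"), ("stone", "heavy")])
print(sorted(retrieve_subgraph(kg, {"bird", "sky"}, k=2).nodes()))
# -> ['bird', 'fly', 'sky']
```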
3. Approach: QA-GNN
QA context [q; a] as input
- LM → representation for the context + retrieve subgraph $\mathcal{G}_{sub}$
- joint graph: introduce a QA context node z that represents the QA context, and connect z to the topic entities $\mathcal{V}_{q,a}$, so that we have a joint graph $\mathcal{G}_W$ over the two sources of knowledge (# 3.1)
- calculate relevance score as an additional node feature (# 3.2)
- attention-based GNN module (# 3.3)
- make prediction (# 3.4) using
  - the LM representation
  - the QA context node representation
  - a pooled working graph representation
3.1 Joint graph representation
initialization (a construction sketch follows the list):
- QA context node: $z^{LM}=f_{enc}(\text{text}(z))$
- entity nodes: KG entity embeddings
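A minimal sketch of building $\mathcal{G}_W$, assuming entity embeddings are already in the GNN's hidden dimension (the paper projects them first); the function name `build_working_graph`, the 2 x num_edges edge-index layout, and the toy sizes are all illustrative.

```python
# Sketch of the working graph G_W: append a QA context node z, wire it
# to every topic entity, and initialize node features (entity embeddings
# for KG nodes, the LM encoding z^LM for z).
import torch

def build_working_graph(edge_index, num_entities, topic_ids, z_lm, entity_emb):
    z_id = num_entities                      # z gets the next free node index
    # connect z to each topic entity, in both directions
    z_edges = torch.tensor([[z_id] * len(topic_ids) + list(topic_ids),
                            list(topic_ids) + [z_id] * len(topic_ids)])
    edge_index = torch.cat([edge_index, z_edges], dim=1)
    x = torch.cat([entity_emb, z_lm.unsqueeze(0)], dim=0)
    return x, edge_index, z_id

entity_emb = torch.randn(5, 16)              # 5 retrieved KG nodes, dim 16
z_lm = torch.randn(16)                       # f_enc(text(z))
kg_edges = torch.tensor([[0, 1], [1, 2]])    # existing KG edges (2 x num_edges)
x, edge_index, z_id = build_working_graph(kg_edges, 5, [0, 2], z_lm, entity_emb)
print(x.shape, edge_index.shape, z_id)       # torch.Size([6, 16]) torch.Size([2, 6]) 5
```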
3.2 KG node relevance scoring
motivation: the retrieved subgraph is big and contains many nodes irrelevant to the QA context
\[\rho_v=f_{head}(f_{enc}([\text{text}(z);\text{text}(v)]))\]
open question: does this put too much weight on generic nodes? (a scoring sketch follows)
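A hedged sketch of how $\rho_v$ could be computed with an off-the-shelf encoder. The paper scores nodes with a pre-trained LM; the choice of roberta-base, the untrained linear head, and the sigmoid below are assumptions for illustration only (a real head would be trained, or replaced by an LM likelihood).

```python
# Sketch of rho_v = f_head(f_enc([text(z); text(v)])).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
enc = AutoModel.from_pretrained("roberta-base")          # stands in for f_enc
f_head = torch.nn.Linear(enc.config.hidden_size, 1)     # assumed, untrained head

def relevance_score(qa_text: str, node_text: str) -> float:
    # encode the concatenation [text(z); text(v)], score the first token
    inputs = tok(qa_text, node_text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        h = enc(**inputs).last_hidden_state[:, 0]        # first-token pooling
    return torch.sigmoid(f_head(h)).item()

qa = "Where would you most likely see a bird? (a) sky"
for node in ["bird", "sky", "stone"]:
    print(node, relevance_score(qa, node))
```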
3.3 GNN architecture
leverage and update the representation of the QA context and KG
- node type & relation-aware message
- node type, relation, and score-aware attention (see the sketch after this list)
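A simplified single-head sketch of one round of this message passing. The real QA-GNN uses multi-head GAT-style attention with richer type/relation conditioning; here everything is reduced to small linear maps, and all class names, parameter names, and shapes are illustrative.

```python
# One simplified round of type-, relation-, and score-aware attention:
# sender features carry the node type, edge relation, and relevance score.
import torch
import torch.nn.functional as F

class TypeRelScoreAttention(torch.nn.Module):
    def __init__(self, dim, num_types, num_rels):
        super().__init__()
        self.type_emb = torch.nn.Embedding(num_types, dim)
        self.rel_emb = torch.nn.Embedding(num_rels, dim)
        self.q = torch.nn.Linear(dim, dim)           # query from receiver
        self.k = torch.nn.Linear(3 * dim, dim)       # key from [h; type; rel]
        self.msg = torch.nn.Linear(3 * dim, dim)     # message from same features
        self.score_emb = torch.nn.Linear(1, dim)     # embed relevance score rho

    def forward(self, h, edge_index, node_type, edge_rel, rho):
        src, dst = edge_index
        feat = torch.cat([h[src] + self.score_emb(rho[src].unsqueeze(-1)),
                          self.type_emb(node_type[src]),
                          self.rel_emb(edge_rel)], dim=-1)
        logits = (self.q(h[dst]) * self.k(feat)).sum(-1) / h.size(-1) ** 0.5
        alpha = torch.zeros_like(logits)
        for d in dst.unique():                       # softmax per receiver node
            mask = dst == d
            alpha[mask] = F.softmax(logits[mask], dim=0)
        out = torch.zeros_like(h)
        out.index_add_(0, dst, alpha.unsqueeze(-1) * self.msg(feat))
        return out

gnn = TypeRelScoreAttention(dim=16, num_types=4, num_rels=10)
h = torch.randn(6, 16)                               # 5 entities + the z node
edge_index = torch.tensor([[0, 2, 5], [5, 5, 0]])    # toy edges
h_new = gnn(h, edge_index, torch.zeros(6, dtype=torch.long),
            torch.zeros(3, dtype=torch.long), torch.rand(6))
print(h_new.shape)                                   # torch.Size([6, 16])
```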
3.4 Inference & Learning
Given a question q and an answer choice a, the probability of a being the answer is $p(a|q)\propto\exp(\text{MLP}(z^{LM},z^{GNN},g))$
- $g$ is the pooled working graph representation (graph pooling)
- trained with cross entropy loss (a scoring/loss sketch follows)
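A minimal sketch of the choice scoring and loss, assuming the three representations per answer choice are already computed; the MLP shape and the dimensions are illustrative.

```python
# Score each answer choice from [z^LM; z^GNN; g] with an MLP, then apply
# cross entropy over the choices of one question.
import torch
import torch.nn.functional as F

dim, num_choices = 16, 5
mlp = torch.nn.Sequential(torch.nn.Linear(3 * dim, dim),
                          torch.nn.GELU(),
                          torch.nn.Linear(dim, 1))

def choice_logit(z_lm, z_gnn, g):
    # one logit per (question, answer-choice) pair
    return mlp(torch.cat([z_lm, z_gnn, g], dim=-1)).squeeze(-1)

# random vectors stand in for the real per-choice representations
logits = torch.stack([choice_logit(torch.randn(dim), torch.randn(dim),
                                   torch.randn(dim))
                      for _ in range(num_choices)])
target = torch.tensor([2])                 # index of the correct choice
loss = F.cross_entropy(logits.unsqueeze(0), target)
print(logits.shape, loss.item())           # torch.Size([5]) <scalar loss>
```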