
Graph Representation Learning

chap1 Introduction

1.1 What is a graph?

$\mathcal{G}=(\mathcal{V},\mathcal{E})$

A simple graph, a directed graph, and a directed graph with weighted edges can all be represented by an adjacency matrix.

simple graph: the edges are undirected, and no node points to itself (no self-loops)
directed graph: the adjacency matrix is simply no longer symmetric
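As a minimal illustration of the three cases (a toy 4-node graph; numpy assumed):

```python
import numpy as np

# Toy graph on 4 nodes with edges (0,1), (1,2), (2,3), (0,3).
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
n = 4

# Simple (undirected) graph: symmetric adjacency matrix, zero diagonal.
A_simple = np.zeros((n, n))
for u, v in edges:
    A_simple[u, v] = A_simple[v, u] = 1

# Directed graph: drop the symmetrization, A[u, v] = 1 only for u -> v.
A_directed = np.zeros((n, n))
for u, v in edges:
    A_directed[u, v] = 1

# Weighted directed graph: store an edge weight instead of 1.
weights = {(0, 1): 0.5, (1, 2): 2.0, (2, 3): 1.5, (0, 3): 0.3}
A_weighted = np.zeros((n, n))
for (u, v), w in weights.items():
    A_weighted[u, v] = w

assert np.allclose(A_simple, A_simple.T)          # undirected => symmetric
assert not np.allclose(A_directed, A_directed.T)  # directed => asymmetric here
```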

1.1.1 Multi-relational Graphs

different types of edges: extend the edge notation to include an edge or relation type $\tau$, e.g., $(u, \tau, v)\in\mathcal{E}$. Each type $\tau$ gets its own adjacency matrix, and stacking them gives an adjacency tensor $\mathcal{A}\in\mathbb{R}^{|\mathcal{V}|\times|\mathcal{R}|\times|\mathcal{V}|}$, where $\mathcal{R}$ is the set of relations

There are also relation types that can connect multiple types of nodes
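A small sketch of the adjacency tensor layout (toy sizes; numpy assumed):

```python
import numpy as np

n_nodes, n_rels = 4, 2
# One adjacency matrix per relation type, stacked along the middle axis:
# A[u, tau, v] = 1 iff the edge (u, tau, v) is in the graph.
A = np.zeros((n_nodes, n_rels, n_nodes))
A[0, 0, 1] = 1   # edge (0, relation 0, 1)
A[1, 1, 2] = 1   # edge (1, relation 1, 2)

# Slicing out relation tau recovers that relation's adjacency matrix.
A_tau0 = A[:, 0, :]
assert A_tau0[0, 1] == 1
```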

1.1.2 Feature Information

node-level attributes that we represent using a real-valued matrix: $\mathsf{X}\in\mathbb{R}^{|\mathcal{V}|\times m}$
$|\mathcal{V}|$ nodes, each with an $m$-dimensional feature vector. Note: the node ordering is the same as in the adjacency matrix. Edge features beyond the edge type are rarely considered

1.2 Machine learning on graphs

problem-driven

The standard supervised / unsupervised split cannot be carried over directly

Task types in machine learning on graphs:

1.2.1 Node classification

the goal is to predict the label $y_u$ associated with all the nodes $u \in \mathcal{V}$, when we are only given the true labels on a training set of nodes $\mathcal{V}_{train}\subset\mathcal{V}$.

Differences between node classification and standard supervised classification:

    nodes in standard supervised classification: independent and identically distributed
    nodes in graph machine learning: interconnected-->model the relationships between nodes

Since we have access to some information about the test nodes (they are part of the graph during training), this is not fully supervised-->semi-supervised

1.2.2 Relation prediction

Other names: link prediction, graph completion, relational inference

standard setup: we are given the nodes $\mathcal{V}$ and an incomplete set of edges $\mathcal{E}_{train}\subset\mathcal{E}$; the task is to infer the missing edges $\mathcal{E}\setminus\mathcal{E}_{train}$

1.2.3 Clustering and community detection

close to unsupervised learning

1.2.4 Graph classification, regression, and clustering

chap2 Background and Traditional Approaches

what’s used before the advent of modern deep learning approaches?

For three different learning tasks:

node and graph classification tasks: basic graph statistics, kernel methods
relation prediction: various approaches for measuring the overlap between node neighborhoods,
clustering or community detection on graphs: spectral clustering using graph Laplacians

2.1 Graph Statistics and Kernel Methods

traditional way: extracting some statistics or features—based on heuristic functions or domain knowledge—and then using these features as input to a standard machine learning classifier

–>introduce some important node-level features and statistics
–>how these node-level statistics can be generalized to graph-level statistics
–>extended to design kernel methods over graphs

==>GOAL: introduce various heuristic statistics and graph properties

2.1.1 Node-level statistics and features

Using the social network of the Medici family as an example:

what properties or statistics of the Medici node distinguish it from the rest of the graph?
what are useful properties and statistics that we can use to characterize the nodes in this graph?

Node degree

$d_u=\sum\limits_{v\in \mathcal{V}}\mathsf{A}[u,v]$ counts the number of edges incident to the node

Node centrality \(e_u=\frac{1}{\lambda}\sum\limits_{v\in \mathcal{V}}\mathsf{A}[u,v]e_v\), i.e., a node's centrality is proportional to the weighted sum of its neighbours' centralities, where $\lambda$ is a constant. From this formula, $e$ is an eigenvector of the adjacency matrix: \(\lambda e=\mathsf{A}e\)
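A quick sketch of computing degree and eigenvector centrality via power iteration (the toy graph and iteration count are arbitrary choices):

```python
import numpy as np

# Adjacency matrix of a small undirected graph (node 0 is a hub).
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

# Degrees: d_u = sum_v A[u, v].
d = A.sum(axis=1)

# Eigenvector centrality by power iteration: repeatedly apply A and
# renormalize; the iterate converges to the leading eigenvector of A.
e = np.ones(A.shape[0])
for _ in range(200):
    e = A @ e
    e = e / np.linalg.norm(e)

# Check the fixed point lambda * e = A e.
lam = e @ A @ e  # Rayleigh quotient, approximates the largest eigenvalue
assert np.allclose(A @ e, lam * e, atol=1e-6)
```

As expected, the hub node 0 (highest degree) also ends up with the highest centrality.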

There are other centrality measures as well:

The clustering coefficient: the fraction of pairs of a node's neighbours that are themselves connected (how many closed triangles the node sits in)

2.1.2 Graph-level features and graph kernels

Bag of nodes: simply aggregating node-level statistics

The Weisfeiler-Lehman kernel: iterative neighborhood aggregation, using iteration to fold in information from the whole graph (neighbours of neighbours of neighbours ...)

basic idea:

  1. assign an initial label $l^{(0)}(v)$ to each node, usually using the degree: $l^{(0)}(v)=d_v$
  2. iteratively assign a new label to each node by hashing the multiset of the current labels within the node's neighborhood:
  $l^{(i)}(v)=\text{HASH}\left(\{\!\{l^{(i-1)}(u)\ \forall u\in\mathcal{N}(v)\}\!\}\right)$
  where $\{\!\{\cdot\}\!\}$ denotes a multiset (repeated elements are allowed) and HASH maps each distinct multiset to a new label
? check whether the neighbourhood includes the node itself
  3. after K iterations, $l^{(K)}(v)$ summarizes the K-hop neighbourhood --> feature representation for the graph
This is effectively a convolution over the graph: locality & translation invariance
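The iteration can be sketched as follows (a minimal sketch: sorted label tuples stand in for the multiset, and re-indexing to small integers stands in for HASH; the node itself is excluded from its neighbourhood here, which is an assumption):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

def wl_labels(A, K):
    n = A.shape[0]
    # Step 1: initial label = node degree.
    labels = [int(A[v].sum()) for v in range(n)]
    for _ in range(K):
        pairs = []
        for v in range(n):
            # Multiset of neighbour labels (sorted tuple as a stand-in).
            neigh = tuple(sorted(labels[u] for u in range(n) if A[v, u]))
            pairs.append((labels[v], neigh))
        # "HASH": compress each distinct (label, multiset) pair to a
        # small integer label for the next round.
        uniq = {p: i for i, p in enumerate(sorted(set(pairs)))}
        labels = [uniq[p] for p in pairs]
    return labels
```

The WL kernel then compares two graphs by the histograms of these labels after K rounds.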

Graphlets and path-based methods: in the node-level discussion, one way to build graph features is to count occurrences of certain characteristic subgraphs, here called graphlets.

the graphlet kernel:

An alternative to enumerating graphlets: path-based methods

We only need to look at the different paths occurring in the graph.
e.g., the random walk kernel (2003) involves running random walks over the graph and then counting the occurrence of different degree sequences
e.g., the shortest-path kernel (2005) involves a similar idea but uses only the shortest paths between nodes (rather than random walks)
? didn't fully understand this method
? what is a degree sequence

2.2 Neighborhood Overlap Detection

Recap of the previous section:

The previous section covered node/graph-level features.

Problem with the previous section:

these features do not quantify the relationships between nodes
so they are of little use for the relation prediction task

This section's focus:

2.2.1 Local overlap measures

The two indices above effectively give low-degree common neighbours a higher weight, because intuitively a shared low-degree neighbor is more informative than a shared high-degree one.
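A hedged illustration, assuming the two indices meant here are the Adamic-Adar and Resource Allocation indices (both down-weight high-degree common neighbours, by $1/\log d$ and $1/d$ respectively):

```python
import numpy as np

# Nodes 0 and 1 share two neighbours: node 2 (degree 2) and node 3 (degree 3).
A = np.array([[0, 0, 1, 1, 0],
              [0, 0, 1, 1, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 0, 0, 1],
              [0, 0, 0, 1, 0]])

def neighbors(A, u):
    return set(np.nonzero(A[u])[0])

def adamic_adar(A, u, v):
    # Down-weight each common neighbour w by 1 / log(degree of w).
    common = neighbors(A, u) & neighbors(A, v)
    return sum(1.0 / np.log(A[w].sum()) for w in common)

def resource_allocation(A, u, v):
    # Down-weight each common neighbour w by 1 / degree of w.
    common = neighbors(A, u) & neighbors(A, v)
    return sum(1.0 / A[w].sum() for w in common)
```

Here `resource_allocation(A, 0, 1)` is 1/2 + 1/3: the low-degree shared neighbour contributes more.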

2.2.2 Global overlap measures

Considering only local (1-hop) overlap is not enough: two nodes could have no local overlap in their neighborhoods but still be members of the same community in the graph.

Katz index

$\mathsf{S}_{Katz}[u,v]=\sum\limits_{i=1}^{\infty}\beta^i\mathsf{A}^i[u,v]$

where $\mathsf{A}^i[u,v]$ counts the paths of length $i$ from $u$ to $v$.

If $\beta<1$, long paths are down-weighted.

By the theorem below, this can be rewritten in closed form: $\mathsf{S}_{Katz}=(\mathsf{I}-\beta\mathsf{A})^{-1}-\mathsf{I}$.

Geometric series of matrices

> Theorem 1. *Let $\mathsf{X}$ be a real-valued square matrix and let $\lambda_1$ denote the largest eigenvalue of $\mathsf{X}$. Then*
>
> $\mathsf{(I-X)^{-1}}=\sum\limits_{i=0}^{\infty}\mathsf{X^i}$
>
> *if and only if $\lambda_1 < 1$ and $(\mathsf{I-X})$ is non-singular.*
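A sketch checking the closed form $(\mathsf{I}-\beta\mathsf{A})^{-1}-\mathsf{I}$ against the truncated power series on a toy graph ($\beta$ chosen so that $\beta\lambda_1 < 1$, as the theorem requires):

```python
import numpy as np

# Path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

beta = 0.2  # must satisfy beta * lambda_1 < 1 for the series to converge
assert beta * np.max(np.abs(np.linalg.eigvals(A))) < 1

# Closed form from the geometric series of matrices.
I = np.eye(A.shape[0])
S_closed = np.linalg.inv(I - beta * A) - I

# Truncated series sum_{i=1}^{K} beta^i A^i for comparison.
S_series = np.zeros_like(A)
Ai = I.copy()
for i in range(1, 50):
    Ai = Ai @ A
    S_series += beta**i * Ai

assert np.allclose(S_closed, S_series, atol=1e-8)
```

Since $\mathsf{A}^i[u,v]$ counts length-$i$ paths, adjacent nodes score much higher than distant ones here.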

Leicht, Holme, and Newman (LHN) similarity

Random walk methods

stochastic matrix $P = \mathsf{A}\mathsf{D}^{-1}$; a random walk that restarts at node $u$ with probability $1-c$ satisfies

$q_u = c\,\mathsf{P}q_u + (1-c)e_u$

solution:

$q_u = (1-c)(\mathsf{I}-c\mathsf{P})^{-1}e_u$

--> similarity measure: $\mathsf{S}_{RW}[u,v] = q_u[v] + q_v[u]$
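A minimal sketch of a random-walk-with-restart similarity built from the stochastic matrix $P = \mathsf{A}\mathsf{D}^{-1}$ (the symmetrized score $q_u[v] + q_v[u]$ and the helper name `visit_probs` are assumptions here):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=0))
P = A @ np.linalg.inv(D)  # column-stochastic transition matrix P = A D^{-1}

c = 0.85  # probability of continuing the walk (restart probability is 1 - c)
n = A.shape[0]
I = np.eye(n)

def visit_probs(u):
    # Stationary visit distribution of a walk restarting at u:
    # q_u = (1 - c) (I - c P)^{-1} e_u.
    e_u = np.zeros(n)
    e_u[u] = 1.0
    return (1 - c) * np.linalg.solve(I - c * P, e_u)

# Symmetrized similarity S[u, v] = q_u[v] + q_v[u].
q0, q1 = visit_probs(0), visit_probs(1)
S_01 = q0[1] + q1[0]
```

Each $q_u$ is a probability distribution over nodes (it sums to 1), so $S_{RW}[u,v]$ is large when the two walks visit each other's nodes often.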

2.3 Graph Laplacians and Spectral Methods

goals: clustering, and learning low-dimensional embeddings of nodes

2.3.1 Graph Laplacians


PART I : Node Embeddings

chap 3: Neighborhood Reconstruction Methods


chap 4: Multi-relational Data and Knowledge Graphs


PART II : Graph Neural Network

