This repository contains the code to replicate the results of the Graph2Seq paper.
- Graph2Seq: Graph to Sequence Learning with Attention-based Neural Networks.
- Official GitHub repository: here.
- GitHub repository of the dataset: WikiSQL.
In addition to the main resources, I also checked the following references:
- PyTorch Seq2Seq Tutorial
- The corresponding data can be downloaded from here.
- PyG_gcn
- The corresponding data can be downloaded from here.
- PyG_MP_Net
`python==3.9` was used for the implementation.
- Other dependencies:
  - `torch==1.13`
  - `torch_geometric==2.2`
  - `numpy==1.23`
  - `pandas==1.5`
- Please refer to `requirements.txt` for all the dependencies.
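To double-check that the environment matches, a quick import-and-version check like the following can be used (the versions are just the ones pinned above; adjust to your environment):

```python
# Quick sanity check that the pinned dependencies are importable and roughly
# match the versions listed above.
import numpy
import pandas
import torch
import torch_geometric

for pkg in (torch, torch_geometric, numpy, pandas):
    print(pkg.__name__, pkg.__version__)
```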
The Graph2Seq model contains two main components. In what follows, an overview of the files implementing these components (and their current state) is given.
Graph2Seq:
- Graph Encoder & Graph Embedding
  - ✅ `graph_encoder.py`: Two different variations of a GNN model (GCN & Bi-GCN) are implemented. Bi-GCN follows the GNN architecture explained in the paper. The underlying convolution layer used in Bi-GCN is implemented in `conv_layer.py`. The graph encoder is complete, and its functionality can be tested separately (by running the `graph_encoder.py` file). A rough sketch of the Bi-GCN idea is given right after this list.
- Attention-Based Decoder
  - ✅ `attention_decoder.py`: This file contains the implementation of the attention-based decoder. To check its correct functionality, it has been tested in `Seq2Seq_model.py` as the decoder part of a sequence-to-sequence translation task. A sketch of a comparable attention step follows the encoder sketch below.
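For illustration, here is a minimal sketch of the bidirectional-GCN idea behind `graph_encoder.py`, assuming node features `x` and a directed `edge_index` in PyG format. The class and argument names are placeholders for this sketch, not necessarily the ones used in the repository:

```python
# A minimal Bi-GCN-style encoder sketch: one convolution over the original
# edge directions, one over the reversed edges, node embeddings by
# concatenation, and a pooled graph-level embedding.
import torch
from torch_geometric.nn import GCNConv, global_max_pool


class BiGCNEncoder(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        # Separate convolutions for the forward and backward edge directions.
        self.conv_fwd = GCNConv(in_dim, hidden_dim)
        self.conv_bwd = GCNConv(in_dim, hidden_dim)

    def forward(self, x, edge_index, batch):
        # Forward pass over the original edges, backward pass over reversed edges.
        h_fwd = torch.relu(self.conv_fwd(x, edge_index))
        h_bwd = torch.relu(self.conv_bwd(x, edge_index.flip(0)))
        # Node embeddings: concatenation of the two directions.
        node_emb = torch.cat([h_fwd, h_bwd], dim=-1)
        # Graph embedding: pooling over node embeddings (max pooling here).
        graph_emb = global_max_pool(node_emb, batch)
        return node_emb, graph_emb
```

Max pooling over the node embeddings is used here only as one simple choice for the graph-level embedding.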
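And a minimal sketch of a Bahdanau-style attention step over the node embeddings; again, the class and argument names are placeholders and may differ from what `attention_decoder.py` actually implements:

```python
# One decoding step: embed the previous token, attend over the node
# embeddings, feed the context plus token embedding to a GRU, and project
# the GRU output to vocabulary logits.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, enc_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.attn = nn.Linear(hidden_dim + enc_dim, 1)
        self.gru = nn.GRU(emb_dim + enc_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_token, hidden, node_emb):
        # prev_token: (batch,), hidden: (1, batch, hidden_dim),
        # node_emb: (batch, num_nodes, enc_dim)
        emb = self.embedding(prev_token).unsqueeze(1)               # (batch, 1, emb_dim)
        query = hidden[-1].unsqueeze(1).expand(-1, node_emb.size(1), -1)
        scores = self.attn(torch.cat([query, node_emb], dim=-1))    # (batch, num_nodes, 1)
        weights = F.softmax(scores, dim=1)
        context = (weights * node_emb).sum(dim=1, keepdim=True)     # (batch, 1, enc_dim)
        output, hidden = self.gru(torch.cat([emb, context], dim=-1), hidden)
        logits = self.out(output.squeeze(1))                        # (batch, vocab_size)
        return logits, hidden, weights
```

At generation time this step would be applied token by token, feeding the predicted (or, with teacher forcing, the ground-truth) token back in as `prev_token`.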
Other files and their functionalities are as follows:
- ✅ `params.py`: contains different parameters.
- ✅ `parser.py`: parses the required arguments.
- ✅ `utils.py`: contains some utility functions and classes.
- 👀 `main.py`: controls the main flow of the procedure (outlined in the first sketch after this list), which consists of:
  - Data loading and processing: The data should be loaded and processed into the correct format usable by the model in this part of the code. This part is incomplete 👀. However, I wrote down the assumptions about the data format, which also specify what steps I need to take to prepare the data.
  - Model definition: Here, the different components of the model, their corresponding optimizers, and the criterion are defined.
  - Training & validation: Here, the training and validation take place. The training and validation procedure is implemented in the `train.py` file.
  - Testing: Here, the trained model is tested on the test split of the data. The evaluation of the test split is implemented in the `eval.py` file.
- ✅ `train.py`: This file contains the training and validation procedure.
- ✅ `eval.py`: This file contains the evaluation procedure.
- 👀 `data_proc/data_loading.py`: Here, the data should be loaded and processed into the correct format. The original data contains natural language questions, SQL queries, and SQL tables. The SQL queries need to be converted to graphs so that they can be used by the graph encoder (see the second sketch after this list).
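The first sketch below outlines the flow that `main.py` is meant to control, under the assumptions above. Every imported function and class name is a placeholder for this sketch (the corresponding pieces are either incomplete or may be named differently in the code):

```python
# An illustrative outline of the flow main.py controls. All imported names
# below (parse_args, load_data, BiGCNEncoder, AttentionDecoder,
# train_and_validate, evaluate) are placeholders, not the repository's API.
from parser import parse_args                     # argument parsing (parser.py)
from data_proc.data_loading import load_data      # data loading (incomplete)
from graph_encoder import BiGCNEncoder            # graph encoder (graph_encoder.py)
from attention_decoder import AttentionDecoder    # decoder (attention_decoder.py)
from train import train_and_validate              # training & validation (train.py)
from eval import evaluate                         # testing (eval.py)


def main():
    args = parse_args()

    # 1. Data loading and processing
    train_set, val_set, test_set = load_data(args)

    # 2. Model definition (optimizers and the criterion would also be set up here)
    encoder = BiGCNEncoder(args.in_dim, args.hidden_dim)
    decoder = AttentionDecoder(args.vocab_size, args.emb_dim,
                               args.hidden_dim, 2 * args.hidden_dim)

    # 3. Training & validation
    train_and_validate(encoder, decoder, train_set, val_set, args)

    # 4. Testing on the test split
    evaluate(encoder, decoder, test_set, args)


if __name__ == "__main__":
    main()
```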
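The second sketch shows one possible way to turn a heavily simplified, string-valued WikiSQL-style query into a PyG graph, with one node per SQL token/clause and directed edges following the query structure. The input format and the node/edge scheme are assumptions for illustration only; the actual conversion in `data_proc/data_loading.py` is still to be decided:

```python
# Hypothetical SQL-to-graph conversion: SELECT is the root node, connected to
# the (optional) aggregation, the selected column, and a WHERE node whose
# children are the condition tokens.
import torch
from torch_geometric.data import Data


def sql_to_graph(sql, token_to_id):
    # e.g. sql = {"sel": "name", "agg": "MAX", "conds": [("age", ">", "30")]}
    nodes, edges = ["SELECT"], []

    if sql["agg"]:
        nodes.append(sql["agg"])
        edges.append((0, 1))                            # SELECT -> aggregation
    nodes.append(sql["sel"])
    edges.append((len(nodes) - 2, len(nodes) - 1))      # parent -> selected column

    where_idx = len(nodes)
    nodes.append("WHERE")
    edges.append((0, where_idx))                        # SELECT -> WHERE
    for col, op, val in sql["conds"]:
        for tok in (col, op, val):
            nodes.append(tok)
            edges.append((where_idx, len(nodes) - 1))   # WHERE -> condition token

    # Node features are token ids here; the graph encoder would embed them.
    x = torch.tensor([[token_to_id.get(tok, 0)] for tok in nodes], dtype=torch.long)
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    return Data(x=x, edge_index=edge_index)
```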