# Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis
This repository contains the code used for the paper [Leveraging Grammar and
Reinforcement Learning for Neural Program
Synthesis](https://openreview.net/forum?id=H1Xw62kRZ).
## Requirements
We recommend installing this code into a virtual environment. In order to run
the code, you first need to install pytorch, following the instructions from
[the pytorch website](http://pytorch.org/). Once this is done, you can install
this package and its dependencies by running:
```bash
pip install cython
python setup.py install
```
The experiments in the original paper were run using the dataset found at [the
Karel dataset webpage](https://msr-redmond.github.io/karel-dataset/). We
recommend you download and extract it into the `./data` directory.
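For reference, a possible setup looks like the following. This is a minimal sketch: the virtual environment tool and the dataset archive name are assumptions, so use whatever the dataset webpage actually provides.
```bash
# Create and activate a virtual environment, then run the install commands above inside it
virtualenv karel-env
source karel-env/bin/activate

# Extract the downloaded dataset into ./data
# (the archive name below is hypothetical; use the file from the Karel dataset webpage)
mkdir -p data
tar -xzf karel_dataset.tar.gz -C data/
```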
## Commands
The code is driven by two commands: `train_cmd.py` to train a model and
`eval_cmd.py` to evaluate it. This section introduces the main options; you
can also pass `--help` to either command to see everything that is available.
### Train
* `--kernel_size`, `--conv_stack`, `--fc_stack`, `--tgt_embedding_size`,
`--lstm_hidden_size`, and `--nb_lstm_layers` specify the architecture of the
model to learn; see `nps/network.py` for how they are used. `--nb_ios`
specifies how many IO pairs are used as inputs to the encoder (note that, due
to the architecture, a model trained with a given number of IO pairs can still
make predictions when a different number of IO pairs is available at test
time).
* `--use_grammar` makes the model use the handwritten syntax checker, found in
`syntax/checker.pyx`. `--learn_syntax` adds a Syntax LSTM to the model that
attempts to learn a syntax checker, jointly with the rest of the model. The
importance of this objective is controlled by the `--beta` parameter.
* `--signal` selects the training loss, among `supervised`, `rl`, and
`beam_rl`. `supervised` attempts to reproduce the ground-truth program, while
`rl` and `beam_rl` try to maximize expected reward. The reward is specified
with the `--environment` argument (Consistency evaluates the coherence of the
programs with the observed IO grids, Generalization also takes the held-out
pair into account, and Perf additionally accounts for the number of steps
taken). When the beam-search approximation is used, a Reward Combination
Function can also be specified with `--reward_comb`. The default is
`RenormExpected`, but the "bag of samples" version can be used by choosing
`X1m1BagExpected` for 1/-1 rewards or `XBagExpected` for the general case. To
fit experiments on a single GPU, you may need to adjust `--nb_rollouts` (how
many samples are drawn from the model to estimate a gradient when using `rl`)
or `--rl_beam` (the size of the beam search when using `beam_rl`). The
`--rl_inner_batch` option splits a batch into several minibatches that are
evaluated separately before a gradient step is taken.
* `--optim_alg` chooses the optimization algorithm, `--batch_size` sets the
size of the mini-batches, and `--learning_rate` adjusts the learning rate.
`--init_weights` can be used to specify a `.model` file from which to load
weights.
* `--train_file` specifies the JSON file containing the training samples and
`--val_file` points to a validation set. The validation set is used to keep
track of the best model seen so far, so as to perform early stopping. The
`--vocab` file gives the correspondence between tokens and indices in the
predictions. Setting `--nb_samples` trains on only part of the dataset (0, the
default, trains on the whole dataset). `--result_folder` indicates where the
results of the experiment should be stored. `--val_frequency` controls how
often accuracy is evaluated on the validation set.
* Specify `--use_cuda` to run everything on a GPU. You can set the
`CUDA_VISIBLE_DEVICES` environment variable to run on a specific GPU, as in
the short sketch below.
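As a complement to the full examples below, here is a minimal sketch of pinning a run to a single GPU and training on a subset of the data. The device index, sample count, and validation frequency are arbitrary, and the architecture flags are assumed to take their default values (or can be added as in the full examples).
```bash
# Hypothetical example: train on GPU 0 only, using 100000 training samples,
# and evaluate on the validation set every 2 epochs.
CUDA_VISIBLE_DEVICES=0 train_cmd.py --signal supervised \
                                    --nb_samples 100000 \
                                    --val_frequency 2 \
                                    --train_file data/1m_6ex_karel/train.json \
                                    --val_file data/1m_6ex_karel/val.json \
                                    --vocab data/1m_6ex_karel/new_vocab.vocab \
                                    --result_folder exps/supervised_subset \
                                    --use_cuda
```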
```bash
# Train a simple supervised model, using the handcoded syntax checker
train_cmd.py --kernel_size 3 \
--conv_stack "64,64,64" \
--fc_stack "512" \
--tgt_embedding_size 256 \
--lstm_hidden_size 256 \
--nb_lstm_layers 2 \
\
--signal supervised \
--nb_ios 5 \
--nb_epochs 100 \
--optim_alg Adam \
--batch_size 128 \
--learning_rate 1e-4 \
\
--train_file data/1m_6ex_karel/train.json \
--val_file data/1m_6ex_karel/val.json \
--vocab data/1m_6ex_karel/new_vocab.vocab \
--result_folder exps/supervised_use_grammar \
\
--use_grammar \
\
--use_cuda
# Train a supervised model, learning the grammar at the same time
train_cmd.py --kernel_size 3 \
--conv_stack "64,64,64" \
--fc_stack "512" \
--tgt_embedding_size 256 \
--lstm_hidden_size 256 \
--nb_lstm_layers 2 \
\
--signal supervised \
--nb_ios 5 \
--nb_epochs 100 \
--optim_alg Adam \
--batch_size 128 \
--learning_rate 1e-4 \
--beta 1e-5 \
\
--train_file data/1m_6ex_karel/train.json \
--val_file data/1m_6ex_karel/val.json \
--vocab data/1m_6ex_karel/new_vocab.vocab \
--result_folder exps/supervised_learn_grammar \
\
--learn_syntax \
\
--use_cuda
# Use a pretrained model, to fine-tune it using simple Reinforce
# Change the --environment flag if you want to use a reward including performance.
train_cmd.py --signal rl \
--environment BlackBoxGeneralization \
--nb_rollouts 100 \
\
--init_weights exps/supervised_use_grammar/Weights/best.model \
--nb_epochs 5 \
--optim_alg Adam \
--learning_rate 1e-5 \
--batch_size 16 \
\
--train_file data/1m_6ex_karel/train.json \
--val_file data/1m_6ex_karel/val.json \
--vocab data/1m_6ex_karel/new_vocab.vocab \
--result_folder exps/reinforce_finetune \
\
--use_grammar \
\
--use_cuda
# Use a pretrained model, fine-tune it using BS Expected reward
# Change the --environment flag if you want to use a reward including performance.
# Change the --reward_comb flag if you want to use one of the "bag of samples" loss
# Remove the --rl_use_ref flag if you don't want to make use of the known ground truth in
# the bag.
train_cmd.py --signal beam_rl \
--environment BlackBoxGeneralization \
--reward_comb RenormExpected \
--rl_inner_batch 8 \
--rl_use_ref \
--rl_beam 64 \
\
--init_weights exps/supervised_use_grammar/Weights/best.model \
--nb_epochs 5 \
--optim_alg Adam \
--learning_rate 1e-5 \
--batch_size 16 \
\
--train_file data/1m_6ex_karel/train.json \
--val_file data/1m_6ex_karel/val.json \
--vocab data/1m_6ex_karel/new_vocab.vocab \
--result_folder exps/beamrl_finetune \
\
--use_grammar \
\
--use_cuda
```
### Evaluation
The evaluation command is fairly similar. Any flag not described here has the
same role as for the `train_cmd.py` command. The relevant file is
`nps/evaluate.py`.
* `--model_weights` should point to the model to evaluate.
* `--dataset` should point to the json file containing the dataset you want to
evaluate against.
* `--output_path` points to where the results should be written. It is used as
a prefix for the names of all the files that will be generated.
* `--dump_programs` dumps the programs returned by the model, which is useful
for inspecting its predictions.
* `--eval_nb_ios` is analogous to `--nb_ios` during training: how many IO
pairs are used as input to the model.
* `--val_nb_samples` is analogous to `--nb_samples` and can be used to
evaluate on only part of the dataset.
* `--eval_batch_size` specifies the batch size to use during decoding. This
doesn't affect accuracy; batching only speeds up the evaluation.
* `--beam_size` controls the size of the beam search run when decoding the
programs, and `--top_k` is the largest rank for which accuracies should be
computed.
This will generate a set of files. If `--dump_programs` is passed, the `--top_k`
most likely programs for each element of the dataset will be dumped, with their
rank and their log-probability in the `generated` subfolder. This will also
include the reference program, under the name `target`.
The values at various ranks are reported in the generated files. `exactmatch`
corresponds to exactly reproducing the reference program, `semantic`
corresponds to generating a program that is correct on the observed IOs, and
`fullgeneralize` means generating a program that is correct on both the
observed AND held-out IOs. `syntax` simply indicates that the program was
syntactically correct. For example, if the file `semantic_top3.txt` contains
the number 75.00, this means that for 75.00% of the samples, one of the top 3
programs according to the model is semantically correct on the observed
samples.
```bash
# Evaluate a trained model on the validation set, dumping programs to allow for debugging.
eval_cmd.py --model_weights exps/supervised_use_grammar/Weights/best.model \
\
--vocabulary data/1m_6ex_karel/new_vocab.vocab \
--dataset data/1m_6ex_karel/val.json \
--eval_nb_ios 5 \
--eval_batch_size 8 \
--output_path exps/supervised_use_grammar/Results/ValidationSet_ \
\
--beam_size 64 \
--top_k 10 \
--dump_programs \
--use_grammar \
\
--use_cuda
# Evaluate a trained model on the test set
eval_cmd.py --model_weights exps/beamrl_finetune/Weights/best.model \
\
--vocabulary data/1m_6ex_karel/new_vocab.vocab \
--dataset data/1m_6ex_karel/test.json \
--eval_nb_ios 5 \
--eval_batch_size 8 \
--output_path exps/beamrl_finetune/Results/TestSet_ \
\
--beam_size 64 \
--top_k 10 \
--use_grammar \
\
--use_cuda
```
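Once an evaluation run finishes, the accuracy files can be inspected directly. The commands below are a minimal sketch: the exact file names are an assumption, formed by appending the metric name described above to the `--output_path` prefix, and the `generated` subfolder is assumed to sit next to the other result files.
```bash
# Inspect top-1 and top-3 semantic accuracy on the validation set
# (file names are assumed to follow the <output_path><metric>_top<k>.txt pattern)
cat exps/supervised_use_grammar/Results/ValidationSet_semantic_top1.txt
cat exps/supervised_use_grammar/Results/ValidationSet_semantic_top3.txt

# If --dump_programs was passed, look at a few of the decoded programs
ls exps/supervised_use_grammar/Results/generated/ | head
```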
## Citation
If you use this code in your research, consider citing:
```
@Article{Bunel2018,
author = {Bunel, Rudy and Hausknecht, Matthew and Devlin, Jacob and Singh, Rishabh and Kohli, Pushmeet},
title = {Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis},
journal = {ICLR},
year = {2018},
}
```