Framework Customization
The AlpacaTag framework allows customization of both the back-end model (Model Customization) and the active learning methods (AL Customization).
You can build your own model by stacking the modules already in the project.
AlpacaTag:
- alpaca_client
+ alpaca_server:
+ pytorchAPI
- active_learning
+ models:
- cnn_bilstm_crf.py
- cnn_cnn_crf.py
...
- modules:
- baseRNN.py
- CharEncoderCNN.py
- DecoderCRF.py
...
- annotation
Using the modules provided, build your own model by simply stacking them, then put your model into the models folder.
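For illustration, here is a minimal, self-contained sketch of what such a stacked model could look like in plain PyTorch. The constructor signature is chosen to match the wrapper.py call shown below; everything else (layer names, kernel sizes, max-pooling, a linear decoder in place of a CRF) is an assumption for the example, not the repository's actual module API.

import torch
import torch.nn as nn

class CUSTOMIZED_MODEL(nn.Module):
    # Hypothetical example: a char-CNN + word-BiLSTM tagger built by stacking
    # standard layers, mirroring the structure of the modules under
    # alpaca_server/pytorchAPI/modules (names and shapes here are illustrative).

    def __init__(self, word_vocab_size, word_embedding_dim, word_lstm_size,
                 char_vocab_size, char_embedding_dim, char_lstm_size,
                 label_vocab, pretrained=None):
        super().__init__()
        num_labels = len(label_vocab)  # assumes label_vocab supports len()

        # Word-level embeddings, optionally initialized from pretrained vectors.
        self.word_embed = nn.Embedding(word_vocab_size, word_embedding_dim)
        if pretrained is not None:
            self.word_embed.weight.data.copy_(torch.as_tensor(pretrained))

        # Character-level embeddings followed by a 1-D CNN encoder
        # (playing the role of a CharEncoderCNN-style module).
        self.char_embed = nn.Embedding(char_vocab_size, char_embedding_dim)
        self.char_cnn = nn.Conv1d(char_embedding_dim, char_lstm_size,
                                  kernel_size=3, padding=1)

        # Word-level BiLSTM over the concatenated word + char features.
        self.lstm = nn.LSTM(word_embedding_dim + char_lstm_size,
                            word_lstm_size, batch_first=True,
                            bidirectional=True)

        # Per-token classifier; a CRF decoder could be stacked here instead.
        self.classifier = nn.Linear(2 * word_lstm_size, num_labels)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, char_len)
        batch, seq_len, char_len = char_ids.size()

        # Encode characters and max-pool over the character dimension.
        chars = self.char_embed(char_ids).view(batch * seq_len, char_len, -1)
        chars = self.char_cnn(chars.transpose(1, 2)).max(dim=2).values
        chars = chars.view(batch, seq_len, -1)

        # Concatenate word and character features, run the BiLSTM, classify.
        feats = torch.cat([self.word_embed(word_ids), chars], dim=-1)
        out, _ = self.lstm(feats)
        return self.classifier(out)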
wrapper.py
self.model = CNN_BiLSTM_CRF(self.p.word_vocab_size,
                            self.word_embedding_dim,
                            self.word_lstm_size,
                            self.p.char_vocab_size,
                            self.char_embedding_dim,
                            self.char_lstm_size,
                            self.p._label_vocab.vocab, pretrained=embeddings)
Change the model class name to your own model's name:
self.model = CUSTOMIZED_MODEL(self.p.word_vocab_size,
                              self.word_embedding_dim,
                              self.word_lstm_size,
                              self.p.char_vocab_size,
                              self.char_embedding_dim,
                              self.char_lstm_size,
                              self.p._label_vocab.vocab, pretrained=embeddings)
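If your class lives in a new file under models (say customized_model.py, a hypothetical name), remember to import it at the top of wrapper.py as well; the module path below is an assumption about where you placed the file:

from .models.customized_model import CUSTOMIZED_MODEL  # hypothetical path; adjust to your file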
AlpacaTag:
- alpaca_client
+ alpaca_server:
+ pytorchAPI
+ active_learning:
- acquisition.py
- models
- modules
- annotation
The Acquisition class implements the active learning methods as functions.
def customize_active_learning(self, dataset, model, num_instances, batch_size=50):
    model.train(False)
    # Scores default to +Inf so that already-sampled instances sort last.
    probs = np.ones(len(dataset)) * float('Inf')
    # Restrict scoring to instances that have not been sampled yet.
    new_dataset = [datapoint for j, datapoint in enumerate(dataset) if j not in self.train_index]
    new_datapoints = [j for j in range(len(dataset)) if j not in self.train_index]
    data_batches = create_batches(new_dataset, batch_size=batch_size, str_words=True, tag_padded=False)
    probscores = []
    for data in data_batches:
        scores = customize_method(data)  # you need to change here: return one score per instance
        probscores.extend(scores)
    probs[new_datapoints] = np.array(probscores)
    # Select the num_instances instances with the lowest scores.
    test_indices = np.argsort(probs)
    cur_indices = set()
    i = 0
    self.return_index = []
    self.return_score = []
    while len(cur_indices) < num_instances:
        cur_indices.add(test_indices[i])
        self.return_index.append(test_indices[i])
        self.return_score.append(probs[test_indices[i]])
        i += 1
    # Remember which instances have now been sampled.
    self.train_index.update(cur_indices)
The Acquisition class automatically keeps track of which instances have already been sampled and which have not. To reuse this bookkeeping, follow the code above and implement your own scoring method (customize_method) that assigns a score to each instance.
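As a concrete but hypothetical example, the scoring step could use least confidence: one score per sentence, computed from per-token class probabilities. How you obtain those probabilities from your model inside customize_method is back-end specific and not shown here; the helper below only does the numpy part. Because customize_active_learning picks the instances with the lowest scores (np.argsort is ascending), any scoring function should assign low values to the instances you want annotated next.

import numpy as np

def least_confidence_scores(token_probs):
    # token_probs: one (seq_len, num_labels) numpy array of per-token class
    # probabilities per sentence in the batch (hypothetical input format).
    # Returns one score per sentence: the sentence-averaged confidence of the
    # most likely label, so less confident sentences get lower scores and are
    # selected first by customize_active_learning.
    scores = []
    for probs in token_probs:
        scores.append(float(np.mean(np.max(probs, axis=1))))
    return scores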