[QST] Train Ranking Model Using Pre-Train Model #1223

Open
mustfkeskin opened this issue Oct 26, 2023 · 0 comments

❓ Questions & Help

Details

I want to build a ranking model using pre-trained embeddings.

  1. I can train a model using embedding lookups, but then the model's input is an id-based feature.
  2. At inference time I want to pass the embedding itself to the model, not the id-based feature. How can I do this?

I followed this tutorial and had no problems training the DCN model. Once training is complete, I want to change the model's input from ids to embeddings.
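For context, a frozen embedding lookup is just row indexing into a fixed matrix, so the layers after the lookup see the same values whether they are reached through an id or handed the vector directly. The NumPy sketch below illustrates only that idea. It is not Merlin Models API, and `table`, `weights`, `head`, and `score_from_ids` are hypothetical names standing in for the trained model's parts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen embedding table: 5 encoded ids, 4 dimensions each.
table = rng.random((5, 4))

# Hypothetical stand-in for the layers after the embedding (one linear layer).
weights = rng.random((4, 1))


def head(vectors):
    """Score a batch of embedding vectors of shape (batch, dim)."""
    return vectors @ weights


def score_from_ids(ids):
    """Id-based path: look up rows by id, then apply the head."""
    return head(table[ids])


ids = np.array([3, 0, 3])
vectors = table[ids]  # what a caller would supply directly at inference time

scores_from_ids = score_from_ids(ids)
scores_from_vectors = head(vectors)

# Both paths give the same scores because the lookup is only row indexing.
same = np.allclose(scores_from_ids, scores_from_vectors)
print(same)
```

Under this view, the question is how to expose the equivalent of `head` from the trained DCN model so that it accepts the embedding vectors as input.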

My code

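import numpy as np
import nvtabular as nvt
import merlin.models.tf as mm
from merlin.schema import Tags
from nvtabular import ops

# Encode the raw "query" and "title" values as contiguous integer ids.
cat_features = ["query", "title"] >> ops.Categorify(
    dtype="int32",
    out_path="../data/categories",
    freq_threshold={"query": 0, "title": 0},
)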


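import os

from merlin.io import Dataset
from merlin.models.utils.example_utils import workflow_fit_transform

train_path = "../data/train.parquet"
valid_path = "../data/val.parquet"
output_path = "../data/integration"

# NOTE: `output` is not defined in this snippet; it is the NVTabular graph
# passed to the workflow (cat_features together with the target column).
workflow_fit_transform(output, train_path, valid_path, output_path)

# Assumption: the transformed data is written under output_path/train,
# as in the Merlin example notebooks.
train = Dataset(os.path.join(output_path, "train", "*.parquet"))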


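# Random stand-ins for the pre-trained embedding matrices (rows x dim).
query_embs = np.random.random((2000, 64))
title_embs = np.random.random((2000, 64))


# Embedding dimension per feature, taken from the pre-trained matrices.
embed_dims = {
    "query": query_embs.shape[1],
    "title": title_embs.shape[1],
}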

embeddings_init = {
    "query": mm.TensorInitializer(query_embs),
    "title": mm.TensorInitializer(title_embs),
}

embeddings_block = mm.Embeddings(
    train.schema.select_by_tag(Tags.CATEGORICAL),
    infer_embedding_sizes=True,
    embeddings_initializer=embeddings_init,
    trainable={'query': False,
               'title': False},
    dim=embed_dims,
)
input_block = mm.InputBlockV2(train.schema, categorical=embeddings_block)


# NOTE: `target_column` (the name of the binary target column) is not
# defined in this snippet.
model = mm.DCNModel(
    train.schema,
    depth=2,
    input_block=input_block,
    deep_block=mm.MLPBlock([64, 32]),
    prediction_tasks=mm.BinaryOutput(target_column),
)

model.compile(optimizer="adam")
model.fit(train, batch_size=1024, epochs=10)