Merge pull request #33 from imohitmayank/training_llm
Training LLM article added + Section rename and shuffle
imohitmayank authored Sep 4, 2024
2 parents 8daef07 + bec7795 commit c650555
Showing 11 changed files with 231 additions and 10 deletions.
Binary file added docs/imgs/nlp_trainingllm_cover.jpg
Binary file added docs/imgs/nlp_trainingllm_iterativetraining.png
Binary file added docs/imgs/nlp_trainingllms_4dparallelism.png
Binary file added docs/imgs/nlp_trainingllms_scalinglaws.png
Binary file added docs/imgs/rl_rlhf_instructgpt.png
2 changes: 2 additions & 0 deletions docs/machine_learning/interview_questions.md
@@ -120,6 +120,8 @@

Temperature controls the trade-off between exploration and exploitation in the model's predictions. It is a hyperparameter that can be adjusted during training or inference to achieve the desired level of randomness in the model's output, depending on the specific requirements of your application.

Here is a good online [tool](https://artefact2.github.io/llm-sampling/index.xhtml) to learn about the impact of temperature and other parameters on output generation.
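To make this concrete, below is a minimal sketch of temperature-scaled sampling (NumPy only, with made-up logit values): the logits are divided by the temperature before the softmax, so temperatures below 1 sharpen the distribution (more exploitation) and temperatures above 1 flatten it (more exploration).

```python
# Temperature-scaled softmax sampling, sketched with NumPy (toy logits).
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=np.random.default_rng(0)):
    """Divide logits by the temperature, softmax, then sample a token id."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                         # for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs), probs

logits = [2.0, 1.0, 0.1]                           # hypothetical next-token scores
for t in (0.5, 1.0, 2.0):
    _, probs = sample_with_temperature(logits, temperature=t)
    print(t, np.round(probs, 3))                   # low t -> peaked, high t -> flat
```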


!!! Question ""
=== "Question"
2 changes: 1 addition & 1 deletion docs/machine_learning/loss_functions.md
@@ -3,7 +3,7 @@

## Introduction

- Loss functions are the "ideal objectives" that neural networks (NN) try to optimize. In fact, they are the mathematical embodiment of what we want to achieve with the NN. As the name suggests, a loss function takes the model's output for an example and computes a loss value that indicates how far the current model is from the ideal model for that example. In an ideal world we would expect the loss value to be 0, but in reality it can only get very close to 0.
- We also have cost functions, which are nothing but the aggregation of the loss function over a batch or the complete dataset. The cost function is what we optimize in practice, as sketched in the example below.
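Here is a minimal NumPy sketch of the loss/cost distinction (the targets and predictions are made-up values): a per-example squared-error loss is computed first, then aggregated into a single batch-level cost.

```python
# Per-example loss vs. batch-level cost, sketched with NumPy (toy data).
import numpy as np

y_true = np.array([1.0, 0.0, 2.0])          # hypothetical targets
y_pred = np.array([0.9, 0.2, 1.5])          # hypothetical model predictions

loss_per_example = (y_true - y_pred) ** 2   # loss: one value per example
cost = loss_per_example.mean()              # cost: aggregation over the batch (MSE)

print(loss_per_example)  # [0.01 0.04 0.25]
print(cost)              # ~0.1
```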

!!! Hint
4 changes: 3 additions & 1 deletion docs/machine_learning/model_compression_quant.md
@@ -483,4 +483,6 @@ Fine-tuning the model can be done very easily using the `llama.cpp` library. Bel

[8] LLM.int8() - [Blog](https://huggingface.co/blog/hf-bitsandbytes-integration)

[9] GGUF/GGML - [Official Docs](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) | [Blog - Quantize Llama_2 models using GGML](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html) | [K Quants](https://github.com/ggerganov/llama.cpp/pull/1684)

[10] [A Visual Guide to Quantization](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization)
210 changes: 210 additions & 0 deletions docs/natural_language_processing/training_llm.md

Large diffs are not rendered by default.

10 changes: 9 additions & 1 deletion docs/reinforcement_learning/rlhf.md
@@ -56,6 +56,12 @@ Using human feedback in reinforcement learning has several benefits, but also pr

- Reinforcement learning from human feedback (RLHF) has shown great potential in improving natural language processing (NLP) tasks. In NLP, the use of human feedback can help to capture the nuances of language and better align the agent's behavior with the user's expectations.

<figure markdown>
![](../imgs/rl_rlhf_instructgpt.png)
<figcaption>OpenAI's PPO model trained with RLHF outperforming the SFT and base models. Source: [2]</figcaption>
</figure>


### Summarization

- One of the first examples of utilizing RLHF in NLP was proposed in [1] to improve summarization using human feedback. Summarization aims to generate summaries that capture the most important information from a longer text. In RLHF, human feedback can be used to evaluate the quality of summaries and guide the agent towards more informative and concise summaries. This is quite difficult to capture with metrics like ROUGE, as they miss human preferences; a sketch of how such preference data can be turned into a training signal follows below.
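A common way to learn from pairwise human preferences is a Bradley–Terry style reward-model loss. The sketch below is a minimal illustration, not the paper's exact implementation: the score tensors are made-up values standing in for a reward model's scalar outputs on the preferred and rejected summaries.

```python
# Pairwise reward-model loss on human preference data (PyTorch sketch).
# scores_chosen / scores_rejected stand in for a reward model's scalar
# outputs r(x, y) on the preferred and rejected summaries (toy values).
import torch
import torch.nn.functional as F

scores_chosen = torch.tensor([1.2, 0.3, 2.1])    # r(x, y_preferred)
scores_rejected = torch.tensor([0.4, 0.9, 1.0])  # r(x, y_rejected)

# Bradley-Terry: maximize P(chosen beats rejected) = sigmoid(r_chosen - r_rejected),
# i.e. minimize the negative log-sigmoid of the score margin.
loss = -F.logsigmoid(scores_chosen - scores_rejected).mean()
print(loss)  # smaller when the reward model ranks preferred summaries higher
```

Once trained this way, the reward model's score can stand in for human judgment as the reward signal during policy optimization (e.g., PPO).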
@@ -86,4 +92,6 @@

## References

[1] [Learning to summarize from human feedback](https://arxiv.org/abs/2009.01325)

[2] [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
13 changes: 6 additions & 7 deletions mkdocs.yml
@@ -83,6 +83,12 @@ nav:
# - 'ChatGPT': 'natural_language_processing/chatgpt.md'
- 'LLaMA': 'natural_language_processing/llama.md'
- 'Mamba': 'natural_language_processing/mamba.md'
- 'Large Language Models':
- 'Training LLMs': 'natural_language_processing/training_llm.md'
- 'Prompt Engineering': 'natural_language_processing/prompt_engineering.md'
- 'natural_language_processing/explainable_ai_llm.md'
- 'natural_language_processing/streaming_chatgpt_gen.md'
- 'natural_language_processing/making_llm_multilingual.md'
- 'Tasks':
- 'natural_language_processing/paraphraser.md'
- 'natural_language_processing/text_similarity.md'
@@ -93,13 +99,6 @@
- 'Named Entity Recognition' : 'natural_language_processing/named_entity_recognition.md'
- 'Natural Language Querying': 'natural_language_processing/nlq.md'
# - 'Retrieval Augmented Generation (RAG)' : 'natural_language_processing/rag.md'
# - 'Techniques':
# - 'natural_language_processing/metrics.md'
- 'Techniques':
- 'Prompt Engineering': 'natural_language_processing/prompt_engineering.md'
- 'natural_language_processing/explainable_ai_llm.md'
- 'natural_language_processing/streaming_chatgpt_gen.md'
- 'natural_language_processing/making_llm_multilingual.md'

- 'Audio Intelligence':
- 'Interview Questions': 'audio_intelligence/interview_questions.md'
