Merge pull request #33 from imohitmayank/training_llm
Training LLM article added + Section rename and shuffle
imohitmayank authored Sep 4, 2024
2 parents 8daef07 + bec7795 commit c650555
Showing 11 changed files with 231 additions and 10 deletions.
Binary file added docs/imgs/nlp_trainingllm_cover.jpg
Binary file added docs/imgs/nlp_trainingllm_iterativetraining.png
Binary file added docs/imgs/nlp_trainingllms_4dparallelism.png
Binary file added docs/imgs/nlp_trainingllms_scalinglaws.png
Binary file added docs/imgs/rl_rlhf_instructgpt.png
2 changes: 2 additions & 0 deletions docs/machine_learning/interview_questions.md
@@ -120,6 +120,8 @@

Temperature controls the trade-off between exploration and exploitation in the model's predictions. It is a hyperparameter that can be adjusted during training or inference to achieve the desired level of randomness in the model's output, depending on the specific requirements of your application.

Here is a good online [tool](https://artefact2.github.io/llm-sampling/index.xhtml) to learn about the impact of temperature and other parameters on output generation.
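To make this concrete, below is a minimal sketch of temperature-scaled sampling (NumPy only, with made-up logit values): the logits are divided by the temperature before the softmax, so temperatures below 1 sharpen the distribution (more exploitation) and temperatures above 1 flatten it (more exploration).

```python
# Temperature-scaled softmax sampling, sketched with NumPy (toy logits).
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=np.random.default_rng(0)):
    """Divide logits by the temperature, softmax, then sample a token id."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                         # for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs), probs

logits = [2.0, 1.0, 0.1]                           # hypothetical next-token scores
for t in (0.5, 1.0, 2.0):
    _, probs = sample_with_temperature(logits, temperature=t)
    print(t, np.round(probs, 3))                   # low t -> peaked, high t -> flat
```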


!!! Question ""
=== "Question"
2 changes: 1 addition & 1 deletion docs/machine_learning/loss_functions.md
@@ -3,7 +3,7 @@

## Introduction

- Loss functions are the "ideal objectives" that neural networks (NN) try to optimize. In fact, they are the mathematical embodiment of what we want to achieve with the NN. As the name suggests, a loss function takes the model's output for an example and computes a loss value that indicates how far the current model is from the ideal model for that example. In an ideal world we would expect the loss value to be 0, but in reality it can only get very close to 0.
- We also have cost functions, which are nothing but the aggregation of the loss function over a batch or the complete dataset. The cost function is what we optimize in practice, as sketched in the example below.
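Here is a minimal NumPy sketch of the loss/cost distinction (the targets and predictions are made-up values): a per-example squared-error loss is computed first, then aggregated into a single batch-level cost.

```python
# Per-example loss vs. batch-level cost, sketched with NumPy (toy data).
import numpy as np

y_true = np.array([1.0, 0.0, 2.0])          # hypothetical targets
y_pred = np.array([0.9, 0.2, 1.5])          # hypothetical model predictions

loss_per_example = (y_true - y_pred) ** 2   # loss: one value per example
cost = loss_per_example.mean()              # cost: aggregation over the batch (MSE)

print(loss_per_example)  # [0.01 0.04 0.25]
print(cost)              # ~0.1
```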

!!! Hint
4 changes: 3 additions & 1 deletion docs/machine_learning/model_compression_quant.md
@@ -483,4 +483,6 @@ Fine-tuning the model can be done very easily using the `llama.cpp` library. Bel

[8] LLM.int8() - [Blog](https://huggingface.co/blog/hf-bitsandbytes-integration)

[9] GGUF/GGML - [Official Docs](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) | [Blog - Quantize Llama_2 models using GGML](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html) | [K Quants](https://github.com/ggerganov/llama.cpp/pull/1684)

[10] [A Visual Guide to Quantization](https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization)
210 changes: 210 additions & 0 deletions docs/natural_language_processing/training_llm.md

Large diffs are not rendered by default.

10 changes: 9 additions & 1 deletion docs/reinforcement_learning/rlhf.md
@@ -56,6 +56,12 @@ Using human feedback in reinforcement learning has several benefits, but also pr

- Reinforcement learning from human feedback (RLHF) has shown great potential in improving natural language processing (NLP) tasks. In NLP, the use of human feedback can help to capture the nuances of language and better align the agent's behavior with the user's expectations.

<figure markdown>
![](../imgs/rl_rlhf_instructgpt.png)
<figcaption>OpenAI's PPO model trained with RLHF outperforming the SFT and base models. Source: [2]</figcaption>
</figure>


### Summarization

- One of the first examples of utilizing RLHF in NLP was proposed in [1] to improve summarization using human feedback. Summarization aims to generate summaries that capture the most important information from a longer text. In RLHF, human feedback can be used to evaluate the quality of summaries and guide the agent towards more informative and concise summaries. This is quite difficult to capture with metrics like ROUGE, as they miss human preferences; a sketch of how such preference data can be turned into a training signal follows below.
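A common way to learn from pairwise human preferences is a Bradley–Terry style reward-model loss. The sketch below is a minimal illustration, not the paper's exact implementation: the score tensors are made-up values standing in for a reward model's scalar outputs on the preferred and rejected summaries.

```python
# Pairwise reward-model loss on human preference data (PyTorch sketch).
# scores_chosen / scores_rejected stand in for a reward model's scalar
# outputs r(x, y) on the preferred and rejected summaries (toy values).
import torch
import torch.nn.functional as F

scores_chosen = torch.tensor([1.2, 0.3, 2.1])    # r(x, y_preferred)
scores_rejected = torch.tensor([0.4, 0.9, 1.0])  # r(x, y_rejected)

# Bradley-Terry: maximize P(chosen beats rejected) = sigmoid(r_chosen - r_rejected),
# i.e. minimize the negative log-sigmoid of the score margin.
loss = -F.logsigmoid(scores_chosen - scores_rejected).mean()
print(loss)  # smaller when the reward model ranks preferred summaries higher
```

Once trained this way, the reward model's score can stand in for human judgment as the reward signal during policy optimization (e.g., PPO).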
@@ -86,4 +92,6 @@

## References

[1] [Learning to summarize from human feedback](https://arxiv.org/abs/2009.01325)

[2] [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
13 changes: 6 additions & 7 deletions mkdocs.yml
@@ -83,6 +83,12 @@ nav:
# - 'ChatGPT': 'natural_language_processing/chatgpt.md'
- 'LLaMA': 'natural_language_processing/llama.md'
- 'Mamba': 'natural_language_processing/mamba.md'
- 'Large Language Models':
- 'Training LLMs': 'natural_language_processing/training_llm.md'
- 'Prompt Engineering': 'natural_language_processing/prompt_engineering.md'
- 'natural_language_processing/explainable_ai_llm.md'
- 'natural_language_processing/streaming_chatgpt_gen.md'
- 'natural_language_processing/making_llm_multilingual.md'
- 'Tasks':
- 'natural_language_processing/paraphraser.md'
- 'natural_language_processing/text_similarity.md'
@@ -93,13 +99,6 @@
- 'Named Entity Recognition' : 'natural_language_processing/named_entity_recognition.md'
- 'Natural Language Querying': 'natural_language_processing/nlq.md'
# - 'Retrieval Augmented Generation (RAG)' : 'natural_language_processing/rag.md'
# - 'Techniques':
# - 'natural_language_processing/metrics.md'
- 'Techniques':
- 'Prompt Engineering': 'natural_language_processing/prompt_engineering.md'
- 'natural_language_processing/explainable_ai_llm.md'
- 'natural_language_processing/streaming_chatgpt_gen.md'
- 'natural_language_processing/making_llm_multilingual.md'

- 'Audio Intelligence':
- 'Interview Questions': 'audio_intelligence/interview_questions.md'
