Commit
Merge pull request #27 from imohitmayank/feb_2024
Practical part added in Model Compression, and others.
imohitmayank authored Mar 3, 2024
2 parents 272d028 + 72cfb6b commit 63013e0
Showing 7 changed files with 467 additions and 12 deletions.
14 changes: 14 additions & 0 deletions docs/data_science_tools/python_snippets.md
@@ -499,6 +499,20 @@ def send_message_to_slack(message):
send_message_to_slack("test")
```

## Colab Snippets

- [Google Colab](https://colab.research.google.com/) is the go-to place for many data scientists and machine learning engineers who are looking to perform quick analysis or training for free. Below are some snippets that can be useful in Colab.

- If you are getting `NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968` or a similar error when running `!pip install` or other CLI commands in Google Colab, you can fix it by running the following snippet first. Note that this might break some imports, so make sure to import all the packages you need before running it.

```python linenums="1"
import locale
locale.getpreferredencoding = lambda: "UTF-8"

# CLI commands like the one below should now work
# !pip install ...
```

<!-- ## Python Classmethod vs Staticmethod
https://stackoverflow.com/questions/12179271/meaning-of-classmethod-and-staticmethod-for-beginner -->
Binary file added docs/imgs/ml_modelcompression_quant_awq.png

Binary file added docs/imgs/ml_modelcompression_quant_awq2.png

Binary file added docs/imgs/ml_quantization_thebloke_llama.png
72 changes: 72 additions & 0 deletions docs/machine_learning/ML_snippets.md
@@ -276,6 +276,10 @@ torch.cuda.get_device_name(0)
## Output: 'GeForce MX110'
```

## Monitor GPU usage

- If you want to continuously monitor the GPU usage, you can use the `watch -n 2 nvidia-smi --id=0` command. This will refresh the `nvidia-smi` output every 2 seconds.
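
- For a quick programmatic check from inside a script, PyTorch also exposes memory counters. A minimal sketch, assuming a CUDA-enabled build and device 0:

``` python linenums="1"
import torch

# memory actively allocated by tensors vs. memory reserved by the caching allocator
print(f"{torch.cuda.memory_allocated(0) / 1024**2:.1f} MiB allocated")
print(f"{torch.cuda.memory_reserved(0) / 1024**2:.1f} MiB reserved")
```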

## HuggingFace Tokenizer

- A tokenizer is a pre-processing step that converts text into a sequence of tokens. The [HuggingFace tokenizer](https://huggingface.co/docs/transformers/main_classes/tokenizer) is a wrapper around the [tokenizers library](https://github.com/huggingface/tokenizers), which contains multiple base algorithms for fast tokenization.
@@ -309,6 +313,74 @@ vocabulary = tokenizer.get_vocab()
# vocabulary['hello'] returns 7592
```

## Explore Model

- You can use Keras's `summary` method to check the model's architecture: it shows the layers, their output shapes and the number of parameters in each layer. In PyTorch, printing the model lists its layers and their configuration (see the note after the snippets for a Keras-style summary).

=== "Keras"
``` python linenums="1"
# import
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten

# create a model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

# print the model summary
model.summary()
```

=== "PyTorch"
``` python linenums="1"
# import
import torch
import torch.nn as nn
import torch.nn.functional as F

# create a model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.conv3 = nn.Conv2d(64, 64, 3, 1)
        # 64 channels * 3 * 3 spatial = 576 features after the conv stack for a 28x28 input
        self.fc1 = nn.Linear(576, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv3(x))
        x = x.view(-1, 576)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

# create an instance of the model
model = Net()
# print the model summary
print(model)
```
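
- Note that PyTorch's built-in `print` does not show output shapes or per-layer parameter counts. For a Keras-style summary, one option is the third-party [torchinfo](https://github.com/TylerYep/torchinfo) package (assuming it is installed, e.g. via `pip install torchinfo`); a minimal sketch:

``` python linenums="1"
from torchinfo import summary

# batch of 1 grayscale 28x28 image, matching the Keras example above
summary(model, input_size=(1, 1, 28, 28))
```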

- To check the named parameters of the model and their dtypes, you can use the following code,

=== "PyTorch"
``` python linenums="1"
print(f"Total number of names params: {len(list(model.named_parameters()))}")
print("They are - ")
for name, param in model.named_parameters():
print(name, param.dtype)
```
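
- Building on the above, a minimal sketch to also count the total vs. trainable parameters of the model:

=== "PyTorch"
``` python linenums="1"
# count parameters across the whole model
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total params: {total:,} | Trainable params: {trainable:,}")
```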
<!-- ## Tensor operations
- Tensors are the building blocks of any Deep Learning project. Here, let's go through some common tensor operations,
10 changes: 9 additions & 1 deletion docs/machine_learning/interview_questions.md
@@ -466,4 +466,12 @@

=== "Answer"
XGBoost (Extreme Gradient Boosting) is a specific implementation of the Gradient Boosting method that uses a more efficient tree-based model and a number of techniques to speed up the training process and reduce overfitting. XGBoost is commonly used in machine learning competitions and it's one of the most popular libraries used for gradient boosting. It's used for classification and regression problems.

!!! Question ""
=== "Question"
#### What is `group_size` in the context of quantization?

=== "Answer"
Group size is a parameter used in the quantization process that determines the number of weights or activations *(imagine weights in a row of a matrix)* that are quantized together, sharing one scale (and zero-point). A smaller group size can lead to better quantization accuracy, but it also increases the memory and computational requirements of the model, since more scales have to be stored. Group size is an important hyperparameter that needs to be tuned to achieve the best trade-off between accuracy and efficiency. Note that the default group size for GPTQ is 1024. Refer to [this interesting Reddit discussion](https://www.reddit.com/r/LocalLLaMA/comments/12rtg82/what_is_group_size_128_and_why_do_30b_models_give/?rdt=46348).
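
Below is a minimal sketch of group-wise quantization, assuming a simplified 4-bit asymmetric min-max scheme for illustration (not GPTQ itself; `quantize_per_group` is a hypothetical helper):

``` python linenums="1"
import torch

def quantize_per_group(w_row, group_size=128, bits=4):
    # each group of `group_size` weights shares one scale and zero-point
    qmax = 2 ** bits - 1
    out = torch.empty_like(w_row)
    for i in range(0, w_row.numel(), group_size):
        g = w_row[i:i + group_size]
        scale = ((g.max() - g.min()) / qmax).clamp(min=1e-8)
        zero = g.min()
        q = ((g - zero) / scale).round().clamp(0, qmax)  # integer levels 0..15
        out[i:i + group_size] = q * scale + zero         # dequantized values
    return out

w = torch.randn(4096)  # one row of a weight matrix
print((w - quantize_per_group(w)).abs().max())  # max quantization error
```

A smaller `group_size` means each scale fits its group more tightly (lower error) at the cost of storing more scales.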
