MindSpore cannot convert the data types of model parameters #313

Open
PhyllisJi opened this issue Oct 31, 2024 · 0 comments
Environment

Hardware Environment (Ascend/GPU/CPU): CPU/GPU

Software Environment:

  • MindSpore version (source or binary): 2.2.14
  • Python version (e.g., Python 3.7.5): 3.8
  • OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04
  • GCC/Compiler version (if compiled from source):

Describe the current behavior

Unlike other mainstream frameworks, MindSpore offers no flexible way to convert the data types of model parameters after a model has been built. This design has several drawbacks:

  1. Limited Memory Flexibility: Different data types require different amounts of memory, and the ability to switch types helps optimize memory usage on constrained devices. A fixed data type can lead to unnecessary memory consumption or prevent larger models from being loaded on memory-limited devices (see the sketch after this list).

  2. Reduced Compatibility and Portability: In multi-framework environments (e.g., converting models between PyTorch and MindSpore), restrictions on data types make model transfer harder. This can lead to inconsistent model behavior across platforms, affecting accuracy and reliability.

  3. Poorer User Experience: Developers often need to adjust data types to the task at hand, especially when balancing accuracy against performance. Without flexible data type conversion, users are limited in how they can train and deploy models, reducing the overall usability of the framework.
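
As a concrete illustration of point 1, here is a minimal NumPy-only sketch (not part of the original report) of how the parameter dtype drives memory footprint; the shape matches the conv kernel in the reproduction below:

import numpy as np

weights = np.random.randn(6, 1, 8, 8).astype(np.float32)
print(weights.nbytes)                     # 1536 bytes at float32
print(weights.astype(np.float16).nbytes)  # 768 bytes: half the memory
print(weights.astype(np.float64).nbytes)  # 3072 bytes: double the memory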

Describe the expected behavior

Parameter data types should be convertible after model construction, as in other mainstream frameworks (for example, PyTorch's model.half() / model.double()).
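
For example, a minimal sketch of the PyTorch behavior being compared against (not part of the original report):

import torch

net = torch.nn.Linear(4, 2)
print(next(net.parameters()).dtype)  # torch.float32
net = net.double()                   # a single call converts every parameter
print(next(net.parameters()).dtype)  # torch.float64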

Steps to reproduce the issue

import mindspore
import numpy as np
import random


class Model_ObroUQ4fAbnFbmOInLYaYRvj8vfg1MYT(mindspore.nn.Cell):
    def __init__(self):
        super(Model_ObroUQ4fAbnFbmOInLYaYRvj8vfg1MYT, self).__init__()
        self.conv1_mutated = mindspore.nn.Conv2d(in_channels=1, out_channels=6, kernel_size=(8, 8), stride=(1, 1), pad_mode="pad", padding=(0, 0, 0, 0), dilation=(1, 1), group=1, has_bias=True)
        self.tail_flatten = mindspore.nn.Flatten(start_dim=1, end_dim=-1)
        self.tail_fc = mindspore.nn.Dense(in_channels=2646, out_channels=10)

    def construct(self, input):
        conv1_output = self.conv1_mutated(input)
        tail_flatten_output = self.tail_flatten(conv1_output)
        tail_fc_output = self.tail_fc(tail_flatten_output)

        return tail_fc_output


def set_mindspore_params(ms_model, init_params):
    import mindspore

    for name, param in ms_model.parameters_and_names():
        if name in init_params:
            target_params = init_params[name]
            if len(target_params.shape) == 2:
                target_params = init_params[name].T

            if str(target_params.dtype).endswith('float16'):
                dtype = mindspore.float16
            elif str(target_params.dtype).endswith('float64'):
                dtype = mindspore.float64
            else:
                dtype = mindspore.float32

            param.set_data(mindspore.Tensor(target_params, dtype))
    return ms_model
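

# Helper (not part of the original report): inspect whether a conversion
# attempt actually changed the stored parameter dtypes.
def print_param_dtypes(model):
    for name, param in model.parameters_and_names():
        print(name, param.dtype)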


def set_paddle_params(paddle_model, init_params):
    import paddle

    for name, param in paddle_model.named_parameters():
        if name in init_params:
            param_data = init_params[name]
            if "weight" in name:
                if len(param_data.shape) == 2:
                    if param_data.shape == (param.shape[1], param.shape[0]):
                        param_data = param_data.T  # transpose the matrix to match the expected layout
            param.set_value(paddle.to_tensor(param_data, dtype=param_data.dtype, place=param.place))
    return paddle_model


# tf can produce parameters in a new dtype by casting each variable
def tf_change_model_dtype(model, is_gpu):
    import tensorflow as tf

    dtype = "float16" if is_gpu else "float64"
    # tf.cast returns a new tensor in the target dtype; appending to the list
    # returned by the model.variables property does not modify the model, so
    # the cast values are collected and returned alongside the model.
    cast_variables = [tf.cast(variable, dtype) for variable in model.variables]
    return model, cast_variables


# torch can set the data type of the model through function calls
def torch_change_model_dtype(model, is_gpu):
    if not is_gpu:
        model = model.double()
    else:
        model = model.half()

    return model


# paddle can set the data type of the model by changing the default dtype before construction
def paddle_change_model_dtype(model_class, is_gpu, init_params):
    import paddle

    paddle.set_device('gpu:1' if is_gpu else 'cpu')
    paddle.set_default_dtype('float16' if is_gpu else 'float64')
    model = model_class()
    target_dtype = 'float16' if is_gpu else 'float64'
    new_init_params = {name: param.astype(target_dtype)
                       for name, param in init_params.items()}
    model = set_paddle_params(model, new_init_params)
    paddle.set_default_dtype('float32')  # restore the default

    return model


# When we try the same approaches, we cannot change the data type of the MindSpore model
def ms_change_model_dtype(model, is_gpu, init_params):
    import mindspore

    mindspore.context.set_context(device_target='GPU' if is_gpu else 'CPU')
    # Cast every saved parameter to float16 and try to write it back.
    new_init_params = {name: param.astype('float16')
                       for name, param in init_params.items()}
    model = set_mindspore_params(model, new_init_params)
    return model


def get_mindspore_params(model):
    params = {}
    for name, param in model.parameters_and_names():
        target_params = param.numpy()
        if len(target_params.shape) == 2:
            # Transpose 2-D weights to the layout used by the other frameworks.
            target_params = target_params.T
        params[name] = target_params
    return params


ms_model = Model_ObroUQ4fAbnFbmOInLYaYRvj8vfg1MYT()
ms_input = mindspore.Tensor(np.random.randn(1, 1, 28, 28).astype(np.float16))

init_params = get_mindspore_params(ms_model)
ms_model = ms_change_model_dtype(ms_model, False, init_params)
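# Check (not in the original report): per this report, the parameters are
# expected to still be float32 after the conversion attempt, so the float16
# forward pass below fails with a dtype mismatch.
print_param_dtypes(ms_model)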
ms_model(ms_input)