# Configuration file

## The YAML schema

Here we report a schematic representation of a generic YAML file for configuring any multimodal feature extraction.

```yaml
dataset_path: <path_to_the_dataset_folder> # either relative or absolute path

gpu_list: <list_of_gpu_ids> # list of gpu ids to use during the extraction, -1 for cpu computation

visual|textual|audio:

  items|interactions:

    input_path: <input_file_or_folder> # this path is relative to dataset_path
    output_path: <output_folder> # this path is relative to dataset_path
    [item_column]: <column_for_item_descriptions> # OPTIONAL, the column name for the item description in the tsv file [1]
    [interaction_column]: <column_for_interaction_reviews> # OPTIONAL, the column name for the interaction reviews in the tsv file [2]

    model: [
      {
        name: <model_name>, # as indicated in the specific backend you are using [3]
        output_layers: <list_of_output_layers>, # as indicated in the specific backend you are using [4]
        [reshape]: <reshape_size>, # OPTIONAL, a tuple, only for the visual modality
        [clear_text]: <whether_to_clear_input_text>, # OPTIONAL, a boolean, only for the textual modality
        [backend]: <backend_for_pretrained_model>, # OPTIONAL, the backend to use for the pretrained model [3]
        [task]: <pretrained_model_task>, # OPTIONAL, only for the textual modality [5]
      },
      ...
    ]

  ...

...
```
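To make the schema concrete, here is a minimal sketch of reading such a configuration with PyYAML and walking its structure. The file name `config_demo.yml` and the printed fields are hypothetical; only the keys come from the schema above.

```python
import yaml  # PyYAML

# 'config_demo.yml' is a hypothetical file shaped like the schema above
with open('config_demo.yml') as f:
    config = yaml.safe_load(f)

print(config['dataset_path'], config['gpu_list'])

# walk only the modality/source sections the file actually defines
for modality in ('visual', 'textual', 'audio'):
    for source in ('items', 'interactions'):
        section = config.get(modality, {}).get(source)
        if section is None:
            continue
        print(f"{modality}/{source}: {section['input_path']} -> {section['output_path']}")
        for model in section['model']:
            print('  model:', model['name'], '| layers:', model.get('output_layers'))
```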

## Notes

Please refer to the [*] markers reported in the YAML schema above.

[1] In the case of textual/items, the tsv input file should be formatted as follows:

```
<ITEM_ID_COLUMN_NAME>\t<ITEM_DESCRIPTION_COLUMN_NAME>
[first_item_id]\t[first_item_description]
...
[last_item_id]\t[last_item_description]
```

where <ITEM_ID_COLUMN_NAME> and <ITEM_DESCRIPTION_COLUMN_NAME> are customizable. Note that if no item_column is provided in the configuration file, Ducho takes the last column of the tsv file (i.e., <ITEM_DESCRIPTION_COLUMN_NAME>) as the item column by default.
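As a quick sanity check, the default-column rule in [1] can be reproduced with pandas; the file name `items.tsv` below is hypothetical:

```python
import pandas as pd

# hypothetical tsv laid out as in [1]: item id first, description last
items = pd.read_csv('items.tsv', sep='\t')

# documented default: with no item_column in the config, the last column is used
item_column = items.columns[-1]
print(items[item_column].head())
```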

[2] In the case of textual/interactions, the tsv input file should be formatted as follows:

```
<USER_ID_COLUMN_NAME>\t<ITEM_ID_COLUMN_NAME>\t<REVIEW_COLUMN_NAME>
[first_user_id]\t[first_item_id]\t[first_review]
...
[last_user_id]\t[last_item_id]\t[last_review]
```

where <USER_ID_COLUMN_NAME>, <ITEM_ID_COLUMN_NAME>, and <REVIEW_COLUMN_NAME> are customizable. Note that if no interaction_column is provided in the configuration file, Ducho takes the last column of the tsv file (i.e., <REVIEW_COLUMN_NAME>) as the interaction column by default.
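Likewise, a toy interactions file matching the layout in [2] can be produced with pandas (all names here are made up):

```python
import pandas as pd

# toy data following [2]: user id, item id, review (review last, so it is the default)
interactions = pd.DataFrame({
    'user_id': ['u1', 'u2'],
    'item_id': ['i1', 'i2'],
    'review': ['great sound quality', 'arrived late'],
})
interactions.to_csv('reviews.tsv', sep='\t', index=False)
```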

[3] We provide a modality/backend table for this:

|              | Audio | Visual | Textual |
|--------------|-------|--------|---------|
| TensorFlow   |       | link   |         |
| PyTorch      | link  | link   |         |
| Transformers | link  |        | Transformers: link<br>SentenceTransformers: link |

[4] Depending on the backend you are using:

  • TensorFlow: use the exact same naming scheme obtained by calling the method summary() on the instantiated model object. For example:

```python
import tensorflow

resnet50 = getattr(tensorflow.keras.applications, 'ResNet50')()
resnet50.summary()  # summary() prints the architecture itself; no print() needed

"""
here is the final part of the console output:
...
 conv5_block3_add (Add)         (None, 7, 7, 2048)   0           ['conv5_block2_out[0][0]',
                                                                  'conv5_block3_3_bn[0][0]']

 conv5_block3_out (Activation)  (None, 7, 7, 2048)   0           ['conv5_block3_add[0][0]']

 avg_pool (GlobalAveragePooling  (None, 2048)        0           ['conv5_block3_out[0][0]']
 2D)

 predictions (Dense)            (None, 1000)         2049000     ['avg_pool[0][0]']

==================================================================================================
Total params: 25,636,712
Trainable params: 25,583,592
Non-trainable params: 53,120
__________________________________________________________________________________________________

in this case, for example, 'avg_pool' is what we are looking for.
"""
```
  • PyTorch+Visual: indicate the minimum path to reach the output layer (using the exact same names obtained by printing the instantiated model object, separated by '.'). For example:

```python
import torchvision

alexnet = getattr(torchvision.models, 'alexnet')(weights='DEFAULT')
print(alexnet)

"""
here is the console output:
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

in this case, for example, 'classifier.3' is what you are looking for.
"""
```
  • PyTorch+Audio: depending on the pre-trained model, you may be asked to indicate the layer number in ascending (i.e., [0, L-1]) or descending (i.e., [L-1, 0]) order. Once again, just instantiate the model and inspect its printed structure, as in the sketch below.
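For example, a torchaudio bundle's structure can be printed like this (WAV2VEC2_BASE is only one possible choice):

```python
import torchaudio

# pick any pre-trained bundle; WAV2VEC2_BASE is just an example
bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model()
print(model)  # read the printed structure to choose the layer number
```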

  • Transformers+Textual: you are asked to indicate the layer number in descending order (i.e., [L-1, 0]); see the sketch below. In the case of SentenceTransformers, you do not need to indicate any output layer (the backend already comes with its own fixed extraction).
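The sketch below shows where the per-layer hidden states come from in the transformers library; how Ducho maps the descending index onto them is up to the framework, so treat the indexing comment as illustrative ('bert-base-uncased' is just an example checkpoint):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

inputs = tokenizer('a sample sentence', return_tensors='pt')
outputs = model(**inputs, output_hidden_states=True)

# hidden_states holds the embedding output plus one tensor per encoder layer
print(len(outputs.hidden_states))  # 13 for BERT-base: embeddings + 12 layers
```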

  • Transformers+Audio: you are asked to indicate the layer number in ascending order (i.e., [0, L-1]).

[5] The list of available tasks is here.