Dev deepfm for check #385

Open
wants to merge 23 commits into
base: dev_deepfm
from

Commits on Apr 16, 2022

  1. Fix swin dataloader import bug (Oneflow-Inc#334)

    * fix import bug
    
    * refine
    
    * code format
    
    * fix comment
    BBuf authored Apr 16, 2022
    38e47c5

Commits on Apr 19, 2022

  1. Dlrm use of roc auc score (Oneflow-Inc#331)

    * eval graph return broadcast
    
    * fix
    
    * eval data pipeline
    
    * pred to local
    
    * dense&label to float
    
    * use flow roc_auc_score
    
    * add throughput
    
    * prefetch eval batches
    
    * datareader worker=1
    
    * use sklearn roc auc score
    
    * default value for table size array
    
    * update readme
    
    * rm dtype in to global
    
    * use of roc_auc_score in dlrm
    
    * to_global is_balanced=True
    
    * make dataset by spark
    
    * sync eval
    
    * rm is_balanced=True
    
    * fix start time
    
    * rm sklearn in requirements
    
    * fix sparse missing value
    
    * update
    ShawnXuan authored Apr 19, 2022
    1318117

Commits on Apr 21, 2022

  1. Dlrm spark tool (Oneflow-Inc#336)

    * dev_dlrm_dockerfile
    
    * rm dockerfile
    
    * add tools
    
    * update
    
    * update
    
    * fix
    ShawnXuan authored Apr 21, 2022
    3878c81

Commits on Apr 27, 2022

  1. 6e322b0
  2. Update scala tool (Oneflow-Inc#338)

    * update dlrm tool default path
    
    * makeDlrmDataset function
    
    * mod_idx->modIdx
    
    * update
    
    * fix
    
    * step by step
    ShawnXuan authored Apr 27, 2022
    8d42e41

Commits on May 20, 2022

  1. Dev deepfm multicol oneemb (Oneflow-Inc#339)

    * wdl -> dlrm
    
    * update train.py
    
    * update readme temporary
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update arguments
    
    * rm sparse optimizer
    
    * update
    
    * update
    
    * update
    
    * dot
    
    * eager 1 device, old embedding
    
    * eager consistent ok
    
    * OK for train only
    
    * rm transpose
    
    * still only train OK
    
    * use register_buffer
    
    * train and eval ok
    
    * embedding type
    
    * dense to int
    
    * log(dense+1)
    
    * eager OK
    
    * rm model type
    
    * ignore buffer
    
    * update sh
    
    * rm dropout
    
    * update module
    
    * one module
    
    * update
    
    * update
    
    * update
    
    * update
    
    * labels dtype
    
    * Dev dlrm parquet (Oneflow-Inc#282)
    
    * update
    
    * backup
    
    * parquet train OK
    
    * update
    
    * update
    
    * update
    
    * dense to float
    
    * update
    
    * add lr scheduler (Oneflow-Inc#283)
    
    * Dev dlrm eval partnum (Oneflow-Inc#284)
    
    * eval data part number
    
    * fix
    
    * support slots (Oneflow-Inc#285)
    
    * support slots
    
    * self._origin in graph
    
    * slots to consistent
    
    * format
    
    * fix speed (Oneflow-Inc#286)
    
    Co-authored-by: guo ran
    
    * Update dlrm.py
    
    bmm -> matmul
    
    * Dev dlrm embedding split (Oneflow-Inc#290)
    
    * support embedding model parallel
    
    * to consistent for embedding
    
    * update sbp derivation
    
    * fix
    
    * update
    
    * dlrm one embedding add options (Oneflow-Inc#291)
    
    * add options
    
    * add fp16 and loss_scaler (Oneflow-Inc#292)
    
    * fix (Oneflow-Inc#293)
    
    * Dev dlrm offline auc (Oneflow-Inc#294)
    
    * calculate auc offline
    
    * fix one embedding module, rm optimizer conf (Oneflow-Inc#296)
    
    * calculate auc offline
    
    * update
    
    * add auc calculator
    
    * fix
    
    * format print
    
    * add fused_interaction
    
    * fix
    
    * rm optimizer conf
    
    * fix
    
    Co-authored-by: ShawnXuan
    
    * refine embedding options (Oneflow-Inc#299)
    
    * refine options
    
    * rename args
    
    * fix arg
    
    * Dev dlrm offline eval (Oneflow-Inc#300)
    
    * update offline auc
    
    * update
    
    * merge master
    
    * Dev dlrm consistent 2 global (Oneflow-Inc#303)
    
    * consistent-
    
    * update
    
    * Dev dlrm petastorm (Oneflow-Inc#306)
    
    petastorm dataset
    
    * bce with logits (Oneflow-Inc#307)
    
    * Dev dlrm make eval ds (Oneflow-Inc#308)
    
    * fix
    
    * new val dataloader each time
    
    * rm useless
    
    * rm useless
    
    * rm useless
    
    * Dev dlrm vocab size (Oneflow-Inc#309)
    
    * fix
    
    * new val dataloader each time
    
    * rm useless
    
    * rm useless
    
    * rm useless
    
    * vocab size
    
    * fix fc(scores) init (Oneflow-Inc#310)
    
    * update dense relu (Oneflow-Inc#311)
    
    * update
    
    * use naive logger
    
    * rm logger.py
    
    * update
    
    * fix loss to local
    
    * rm useless line
    
    * remove to local
    
    * rank 0
    
    * fix
    
    * add graph_train.py
    
    * keep graph mode only in graph_train.py
    
    * rm is_global
    
    * update
    
    * train one_embedding with graph
    
    * update
    
    * rm useless files
    
    * rm more files
    
    * update
    
    * save -> save_model
    
    * update eval arguments
    
    * rm eval_save_dir
    
    * mv import oneflow before sklearn.metrics, otherwise not work on onebrain
    
    * rm useless lines
    
    * print host and device mem after eval
    
    * add auc calculation time
    
    * update
    
    * add fused_dlrm temporarily
    
    * eager train
    
    * shuffling_queue_capacity -> shuffle_row_groups
    
    * update trainer for eager
    
    * rm dataset type
    
    * update
    
    * update
    
    * parquet dataloader
    
    * rm fused_dlrm.py
    
    * update
    
    * update graph train
    
    * update
    
    * update
    
    * update lr scheduler
    
    * update
    
    * update shell
    
    * rm lr scheduler
    
    * rm useless lines
    
    * update
    
    * update one embedding api
    
    * fix
    
    * change size_factor order
    
    * fix eval loader
    
    * rm debug lines
    
    * rm train/eval subfolders
    
    * files
    
    * support test
    
    * update oneembedding initializer
    
    * update
    
    * update
    
    * update
    
    * rm useless lines
    
    * option -> options
    
    * eval barrier
    
    * update
    
    * rm column_ids
    
    * new api
    
    * fix push pull job
    
    * rm eager test
    
    * rm graph test
    
    * rm
    
    * eager_train-
    
    * rm
    
    * merge graph train to train
    
    * rm Embedding
    
    * update
    
    * rm vocab size
    
    * rm test name
    
    * rm split axis
    
    * update
    
    * train -> train_eval
    
    * update
    
    * replace class Trainer
    
    * fix
    
    * fix
    
    * merge mlp and fused mlp
    
    * pythonic
    
    * interaction padding
    
    * format
    
    * left 3 store types
    
    * left 3 store types
    
    * use capacity_per_rank
    
    * fix
    
    * format
    
    * update
    
    * update
    
    * update
    
    * use 13 and 26
    
    * update
    
    * rm size factor
    
    * update
    
    * update
    
    * update readme
    
    * update
    
    * update
    
    * modify_read
    
    * rm useless import
    
    * add requirements.txt
    
    * rm args.not_eval_after_training
    
    * rm batch size per rank
    
    * set default eval batches
    
    * every_n_iter -> interval
    
    * device_memory_budget_mb_per_rank -> cache_memory_budget_mb_per_rank
    
    * dataloader-
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update
    
    * use_fp16-
    
    * single py
    
    * disable_fusedmlp
    
    * 4 to 1
    
    * new api
    
    * add capacity
    
    * Arguments description (Oneflow-Inc#325)
    
    * Arguments description
    
    * rectify README.md
    
    * column-
    
    * make_table
    
    * MultiTableEmbedding
    
    * update store type
    
    * update
    
    * update readme
    
    * update README
    
    * update
    
    * iter->step
    
    * update README
    
    * add license
    
    * update README
    
    * install oneflow nightly
    
    * Add tools directory info to  DLRM README.md (Oneflow-Inc#328)
    
    * Add deepfm model(FM component missed)
    
    * Add FM component
    
    * Update README.md
    
    * Fix loss bug; change weight initialization methods
    
    * change lr scheduler to multistepLR
    
    * Add dropout layer to dnn
    
    * Add monitor for early stopping
    
    * Simplify early stopping schema
    
    * Normal initialization for oneembedding; Adam optimizer; h52parquet
    
    * Add logloss in eval for early stop
    
    * Fix dataloader slicing bug
    
    * Change lr schedule to reduce lr on plateau
    
    * Refine train/val/test
    
    * Add validation and test evaluation
    
    * Update readme and help message
    
    * use flow.roc_auc_score, prefetch eval batches, fix train step start time
    
    * Delete unused args;
    Change file path;
    Add Throughput measurement.
    
    * Add deepfm with MultiColOneEmbedding
    
    * remove fusedmlp; change interaction class to function; keep val graph predict in gpu
    
    * Use flow._C.binary_cross_entropy_loss;
    Remove sklearn from env requirement;
    
    * Fix early stop bug;
    Check if path valid before loading model
    
    * Change auc time and logloss time to metrics time;
    Remove last validation;
    
    * replace view with keepdim;
    replace nn.sigmoid with tensor.sigmoid
    
    * change unsqueeze to keepdim;
    use list in dataloader
    
    * Use from numpy to reduce cast time
    
    * Add early stop and save best to args
    
    * Reformat deepfm_train_eval
    
    * Use BCEWithLogitsLoss
    
    * Update readme;
    Change early_stop to disable_early_stop;
    Update train script
    
    * Update README.md
    
    * Fix early stop bugs
    
    * Refine save best model help message
    
    * Add scala script and spark launching shell script
    
    * Delete h5_to_parquet.py
    
    * Update readme.md
    
    * Use real values in table size array example;
    delete criteo_parquet.py
    
    * Add split_criteo_kaggle.py
    
    * Update readme.md
    
    * Rename training script;
    Update readme.md
    
    * Update Readme.md (fix bad links)
    
    * Update README.md
    
    * Format files
    
    * Add out_features in DNN
    
    Co-authored-by: ShawnXuan
    Co-authored-by: guo ran
    Co-authored-by: BakerMara
    Co-authored-by: BoWen Sun
    5 people authored May 20, 2022
    a0c9af5

Commits on May 23, 2022

  1. 7b88533

Commits on May 30, 2022

  1. dcn on Criteo (Oneflow-Inc#335)

    * add dcn files.
    
    * add README.md
    
    * update readme.md, requirements.txt, train.sh. pretrained models converted from pytorch are in /models-torch2flow.
    
    * deleted files
    
    * deleted files
    
    * auto format by CI
    
    * deleted .gitignore
    
    * updated files
    
    * modified nn.init.zeros_ and nn.init.xavier_normal_ in crossnet.
    
    * fix change form /scripts/swin_dataloader_compare_speed_with_pytorch.py
    
    * add processing frappe from csv to parquet format files: tools/frappe-parquet.py, tools/frappe-parquet.sh
    
    * modified frappe download link in README.md
    
    * delete tools dir
    
    * add tools dir
    
    * update dcn_graph_train_eval files
    
    * update fuxi dcn graph train and eval files , new dataset make tool based on fuxi
    
    * modified train.sh table_size_array
    
    * fix some errors in fuxi_data_util when saving csv
    
    * Criteo dcn related files
    
    * modified README.md
    
    * modified dcn_train_eval.py some arguments name
    
    * create graph when lr_decay
    
    * deleted fm_persistent
    
    * update dcn_train_eval.py
    
    * formated file by
    
    * new tool dir , and modified dcn_train_eval.py/sh fake path
    
    * add feature_map_json argument
    
    * delete unnecessary and useless code
    
    * add cast in make_criteo_parquet.py, modified dcn_train_eval.py
    
    * delete useless
    
    * add throughput
    
    * add valid test samples arg
    
    * fix batch_size and train_batch_size mismatched problem
    
    * delete useless print code
    
    * add a blank line in the bottom of dataset_config.yaml
    
    * add requirements.txt, update README.md
    
    * move loss=loss.numpy() to improve efficiency
    
    * delete fuxi code in dcn_train_eval.py, add scala related files, update README
    
    * update README
    
    * remove RecommenderSystems/dcn/tools/make_criteo_parquet.py and RecommenderSystems/dcn/tools/dataset_config.yaml, update table_size_array
    
    * simplified DNN module, modified test eval process and related README and train.sh contents
    
    * add Crossnet fuxi quote, modified directory description in Readme and ddn to dcn
    
    * name auc logloss in eval process as val_auc val_logloss, add pandas sklearn in requirements.txt, modified README
    
    * simplified train.sh and related  README contents
    
    * simplified L2,3,4 in train.sh
    
    * set size_factor default=3
    
    * add dcn structure image
    
    * update Crossnet implementation in README
    
    * update Crossnet implementation in README
    
    * update Crossnet implementation in README
    
    * update Crossnet implementation in README
    
    * update README
    
    Co-authored-by: oneflow-ci-bot
    jiangyzy and oneflow-ci-bot authored May 30, 2022
    fce11a6

Commits on Jun 7, 2022

  1. Dev ipnn pr (Oneflow-Inc#342)

    * only ipnn
    
    * ipnn only to pr
    
    * rm .gitignore
    
    * modify README
    
    * delete useless code
    
    * delete useless .py
    
    * modify README
    
    * add split_criteo_kaggle.py
    
    * modify np_to_global function
    BakerMara authored Jun 7, 2022
    7fd41f1

Commits on Jun 12, 2022

  1. ee0d6d8

Commits on Jun 13, 2022

  1. Dev xdeepfm pr (Oneflow-Inc#347)

    * modify README, delete useless code, rename files
    
    * modify model name
    
    * modify readme
    
    * modify readme
    
    * delete useless code and black
    BakerMara authored Jun 13, 2022
    b176898

Commits on Jun 16, 2022

  1. dev deepfm fused mlp (Oneflow-Inc#346)

    * Replace Dnn with fused mlp
    
    * Add disable_fusedmlp to args;
    
    * Remove duplicate args
    
    * Format deepfm_train_eval.py
    Liuxinman authored Jun 16, 2022
    578399e

Commits on Jun 22, 2022

  1. 9120eba

Commits on Jul 14, 2022

  1. Dev mmoe spark (Oneflow-Inc#351)

    * MMoe parquet script;
    Add a mmoe model draft;
    
    * Add Mmoe dataloader;
    Add MmoeModule;
    
    * Add mmoe eval part;
    Remove useless code;
    
    * Update args
    
    * Add sh script
    
    * Fix bugs in parallel
    
    * Replace table size array;
    
    * Update readme;
    Update args;
    
    * Update README.md
    
    * Change gate and tower to dnn
    
    * fix typo in mmoe_parquet.py;
    remove used import
    
    * Update README.md (dataset);
    Update mmoe_train_eval.py to deal with empty str args;
    
    * Remove sklearn and pandas dependency in mmoe_parquet.py
    
    * Fix bugs in mmoe_parquet.py
    
    * Simplify mmoe_parquet
    
    * Update readme
    
    * format mmoe_train_eval.py
    
    * Format mmoe_parquet.py
    
    * Remove num_sparse_features and num_dense_features
    Liuxinman authored Jul 14, 2022
    e6b7b42

Commits on Aug 1, 2022

  1. 1c05ca9

Commits on Aug 3, 2022

  1. 64c6eb1

Commits on Aug 5, 2022

  1. 9a7546d

Commits on Aug 8, 2022

  1. Dlrm key type (Oneflow-Inc#348)

    * add oneembedding key_type
    
    * pad dense input
    ShawnXuan authored Aug 8, 2022
    012a1f6

Commits on Aug 9, 2022

  1. 1996903

Commits on Aug 11, 2022

  1. Dev roberta and update CPT NEW (Oneflow-Inc#364)

    * test new pr
    
    * update CPT
    
    * update transformer
    
    * dev roberta
    
    * update README file
    
    * update README file
    
    * fix bug
    
    * fix roberta bug
    
    * modify according to the review
    
    * update readme file
    
    * update train_MNLI.py
    
    * update roberta
    
    * fix CPT
    
    * update readme
    
    * update file
    
    * Delete empty line
    
    * update readme
    
    * auto format by CI
    
    * Remove redundant dependencies
    
    Co-authored-by: oneflow-ci-bot
    songzetao and oneflow-ci-bot authored Aug 11, 2022
    b276b95

Commits on Aug 12, 2022

  1. KnowledgeDistillation (Oneflow-Inc#362)

    * copy as a new pr
    
    * update model.py
    
    * train teacher
    
    * add student_kd and student
    
    * add args
    
    * add infer files
    
    * update README file
    
    * add train script
    
    * Remove redundant files
    
    * add requirements and update Readme
    
    * add infer.sh
    
    * black all files
    
    * refactoring code
    
    * refactoring code directory
    
    * update readme
    
    * update comment
    
    * auto format by CI
    
    * Update KnowledgeDistillation/KnowledgeDistillation/README.md
    
    Co-authored-by: oneflow-ci-bot
    Co-authored-by: Liang Depeng
    3 people authored Aug 12, 2022
    f85f881

Commits on Sep 2, 2022

  1. MetaKD (Oneflow-Inc#358)

    * copy as a new pr
    
    * update requirements, add some bash scripts
    
    * Generate data
    
    * convert easynlp to oneflow version
    
    * generate data
    
    * process data
    
    * train teacher
    
    * student first
    
    * Perfect code for review
    
    * add readme
    
    * Adjust directory and delete redundant files
    
    * Delete redundant files
    
    * Delete redundant files again
    
    * delete files in easynlp
    
    * add requirement
    
    * delete build.sh
    
    * auto format by CI
    
    * delete files in easynlp
    
    * auto format by CI
    
    * add requirement in easynlp
    
    Co-authored-by: oneflow-ci-bot
    songzetao and oneflow-ci-bot authored Sep 2, 2022
    eccafd3

Commits on Sep 6, 2022

  1. new deepfm for check

    zhipeng.li committed Sep 6, 2022
    9ee4b07