forked from microsoft/Megatron-DeepSpeed
Refactor training pipeline and add Llama2 tokenizer support #7
Merged
552 commits
061e2cc
Update `train_aGPT_7B.sh`
saforem2 47bf9b5
Update `train_llama_alcf.sh`
saforem2 e68d270
Update `ALCF/README.md`
saforem2 4dd51dd
Merge pull request #14 from argonne-lcf/sunspot-frameworks-tests
saforem2 ac414a0
Update README.md
saforem2 9aa7fab
Fix path in `prof.export_chrome_trace()` from `pretrain_gpt_alcf.py`
saforem2 7d20359
Merge pull request #15 from argonne-lcf/fix-trace-output-path
saforem2 0508cf6
changed environment variable
zhenghh04 c4250a1
added torch profiler per step output support
zhenghh04 fa04d11
local changes
zhenghh04 bec5f9e
merge
zhenghh04 6cca87f
distributed loading
zhenghh04 62f8f56
fixed print issue
zhenghh04 2f01543
Update README.md
saforem2 13171c2
Update README.md
saforem2 06ac065
Added function for on-the-fly building the dataset
zhenghh04 120a2b5
fixed minor issue in _build_train_valid_test_datasets_single
zhenghh04 6fdbfd3
fixed variable order in Builder
zhenghh04 3e2aa23
fixed minor issue
zhenghh04 b371742
Add `setup_tokenizer_and_data()` function to `ALCF/helpers.sh`
saforem2 d93fb7f
Update `train_llama_alcf.sh`
saforem2 05d82c3
Update `train_aGPT_7B.sh`
saforem2 6de8496
Update `ALCF/README.md`
saforem2 03aa7c1
Update `ALCF/helpers.sh`
saforem2 3cd3f1a
Update `train_aGPT_7B.sh`
saforem2 bc1dbfd
Fix `--data-cache-path` in `ALCF/helpers.sh, train_llama_alcf.sh`
saforem2 c3a4451
Add `ALCF/sunspot-env-2024-04-15-002.sh`
saforem2 0fc3919
Update `train_aGPT_7B.sh`
saforem2 318d860
Merge branch 'tokenizer-tests' of https://github.com/argonne-lcf/Mega…
saforem2 c7a20cf
Merge pull request #17 from argonne-lcf/tokenizer-tests
saforem2 efb2a3a
added a barrier to make sure all the datasets are built before other …
zhenghh04 ccf8835
Update `{train_llama_alcf.sh,ALCF/helpers.sh}`
saforem2 358139f
Update `ALCF/helpers.sh`
saforem2 47585ed
Concat datasets that belongs the same corpus
zhenghh04 2b5b41f
convert MDS checkpoint to Hf Llama model
vksastry 10a34ea
fixed bugs
zhenghh04 7c80c2c
Update `ALCF/helpers.sh`
saforem2 93db2a9
optimized loading blendable dataset meta data, by loading and broadca…
zhenghh04 89f2a95
added broadcast
zhenghh04 96cb1e5
fixed overflow issue
zhenghh04 cb2f1dc
removed unnecessary mpi4py
zhenghh04 b48d6f8
Merge pull request #18 from argonne-lcf/distributed_loading_v2
zhenghh04 0dea6aa
Update dataset_utils.py
zhenghh04 f16416a
merge distributed_loading
zhenghh04 5d26dfe
fixed a minor bug
zhenghh04 3dc424f
remove unnecessary barrier
zhenghh04 60fc482
added pfw tracing for test_blendable_dataset
zhenghh04 b1f17d5
fixed bug
zhenghh04 10a3737
added more loging
zhenghh04 bc28f84
removed allreduce calls that are not needed
zhenghh04 6eb21b7
removed allreduce call that are not needed any more
zhenghh04 20a2430
fixed a bug
zhenghh04 f718694
added more logging info
zhenghh04 699bde4
Merge branch 'distributed_loading' of ../Megatron-DeepSpeed-distribut…
zhenghh04 dd3b070
Merge branch 'distributed_loading' of github.com:argonne-lcf/Megatron…
zhenghh04 b4c832e
added more logging for index_dataset
zhenghh04 1719b0e
added new log
zhenghh04 053b42d
changed things into helper
zhenghh04 52b2cca
fixed issue with dlioprofiler
zhenghh04 cbc7830
fixed some bugs
zhenghh04 03a9bfa
Merge branch 'pfw_trace' of github.com:argonne-lcf/Megatron-DeepSpeed…
zhenghh04 36a2671
fixed profiler issue
zhenghh04 5c8d376
reduced printing
zhenghh04 d9085b6
added more timing info
zhenghh04 0ef6bfd
fixed timing issue for all reduce
zhenghh04 26ee1c3
Merge pull request #20 from argonne-lcf/pfw_trace
zhenghh04 9413dc9
Merge pull request #21 from argonne-lcf/distributed_loading_v2
zhenghh04 f6363fb
changed init
zhenghh04 a55df51
reducing printing from non-root ranks
zhenghh04 a24f01b
reduce printing
zhenghh04 5a54149
reducing printing
zhenghh04 3acdda7
added MiCS as an option
zhenghh04 73f6cee
Merge branch 'mics' into distributed_loading
zhenghh04 712d08d
Update `dropout` in `ALCF/helpers.sh`
saforem2 482c235
Update {`ALCF/helpers.sh`, `train_llama_alcf.sh`}
saforem2 2e26950
Merge pull request #22 from argonne-lcf/sequence-parallel
saforem2 f4c2c16
Add `ALCF/data-lists/aurora/*.txt`
saforem2 231d2b5
Add `setup_conda_aurora` to `ALCF/helpers.sh`
saforem2 852575d
Merge pull request #23 from argonne-lcf/aurora-updates
saforem2 aaf6152
Fix `ezpz_{save,get}jobenv` in `ALCF/helpers.sh`
saforem2 56a1c37
Merge pull request #24 from argonne-lcf/ezpz-hotfix
saforem2 b905e53
Correctly set `dfl_fallback` on Aurora if no `DATA_FILE_LIST` specified
saforem2 4a07103
Merge pull request #25 from argonne-lcf/aurora-dfl-fix
saforem2 ba5f871
added warning if the file list is not provided correctly
zhenghh04 c690202
make it still compatible to previous
zhenghh04 a96bcea
added support for XPU
zhenghh04 30fe479
Update README.md
saforem2 9208eae
Update README.md
saforem2 caf82d7
Merge pull request #26 from argonne-lcf/saforem2-patch-1
saforem2 1f983f3
Create `llama-toggle` branch
saforem2 f902e91
Merge pull request #19 from argonne-lcf/checkpoint_convert
saforem2 67d6810
Update README.md
saforem2 3091871
Update `setEnv` for Aurora in `ALCF/helpers.sh`
saforem2 81fe55f
Update README.md
saforem2 983a0bd
Merge pull request #27 from argonne-lcf/saforem2-patch-1
saforem2 7d1784b
Updates to `NO_LLAMA` mode
saforem2 bf979a7
Update `pretrain_gpt_alcf.py`
saforem2 84fa77c
Update `pretrain_gpt_alcf.py`
saforem2 e6461f5
Merge pull request #28 from argonne-lcf/llama-toggle
saforem2 f138b27
added more log
zhenghh04 a7249fe
resolve conflict in file list
zhenghh04 a36569e
added warning info when XPU profiling is not available
zhenghh04 79d11a7
Create `alcf-patch-1` branch
saforem2 e058427
Update `ALCF/helpers.sh`
saforem2 1ae3768
Update `ALCF/helpers.sh`
saforem2 abead32
Update `ALCF/README.md`
saforem2 025ff3f
Update ALCF/README.md`
saforem2 d012937
Merge pull request #29 from argonne-lcf/alcf-patch-1
saforem2 ef5356b
Merge pull request #16 from argonne-lcf/distributed_loading
saforem2 732e567
Add `ALCF/data-lists/aurora/*.txt`
saforem2 0320b69
Update `ALCF/data-lists/sunspot/*.txt`
saforem2 a51fb11
Update `ALCF/data-lists/polaris/*.txt`
saforem2 9d10704
Update `.gitignore`
saforem2 ec600e5
Update `ALCF/helpers.sh`
saforem2 168cdda
Add `ALCF/requirements/requirements.txt`
saforem2 7df9329
Update `ALCF/helpers.sh`
saforem2 77ffd10
Update `ALCF/helpers.sh`
saforem2 e884f15
Update `ALCF/helpers.sh,requirements/requirements.txt}`
saforem2 10a17e2
Merge pull request #30 from argonne-lcf/distributed-data-lists
saforem2 fb49de8
Update `ALCF/helpers.sh`
saforem2 7272326
Update `ALCF/helpers.sh`
saforem2 18ca369
Merge pull request #31 from argonne-lcf/alcf-helpers-patch-1
saforem2 f826667
Update `ALCF/helpers.sh` with kvs fix on Aurora
saforem2 26b846a
Update `ALCF/helpers.sh`
saforem2 7cd5bfa
Merge pull request #32 from argonne-lcf/alcf-aurora-kvs-fix
saforem2 bc7fbc6
Update `ALCF/README.md`
saforem2 f94b845
Update `ALCF/README.md`
saforem2 6f98d5a
Merge pull request #33 from argonne-lcf/alcf-update-readme
saforem2 06357f4
Create `alcf-startup-time`
saforem2 c7a1e36
Add `ALCF/notes/deepspeed_init_time.md`
saforem2 0548bfb
Update `ALCF/notes/deepspeed_init_time.md`
saforem2 6a8f55c
Update deepspeed_init_time.md
saforem2 d0e3d79
Update `ALCF/helpers.sh`
saforem2 bb690e3
Update `pretrain_gpt_alcf.py`
saforem2 aa698da
Update `train_llama_alcf.sh`
saforem2 12baf30
Update `megatron/training.py`
saforem2 8eabb7a
Update `megatron/training.py`
saforem2 d9fc18e
Update `ALCF/helpers.sh`
saforem2 1d413c6
Update `megatron/training.py`
saforem2 93e4a51
Update `megatron/utils.py`
saforem2 99bddfa
Update `ALCF/helpers.sh`
saforem2 9a8ccfd
Update `ALCF/helpers.sh`
saforem2 c6a63bc
Merge pull request #34 from argonne-lcf/alcf-startup-time
saforem2 57ba1fb
Update `ALCF/helpers.sh`
saforem2 7388c1a
Update `ALCF/helpers.sh`
saforem2 37a7c5c
Merge pull request #36 from argonne-lcf/alcf-helpers-patch
saforem2 b511a2e
Update `ALCF/helpers.sh`
saforem2 561ddc1
Fix micro batch size on Polaris
saforem2 9ee09fe
Update `ALCF/helpers.sh`
saforem2 76209f4
Update `ALCF/helpers.sh`
saforem2 541ebf1
Update `ALCF/helpers.sh`
saforem2 d76331f
Update `ALCF/helpers.sh`
saforem2 d017b4c
Update `ALCF/helpers.sh`
saforem2 bac8aab
Update `ALCF/helpers.sh`
saforem2 911cc5c
Update `ALCF/helpers.sh`
saforem2 2ac4fb0
Update `ALCF/helpers.sh`
saforem2 4876eb8
Update `ALCF/helpers.sh` on Polaris
saforem2 7385e3b
Update `ALCF/helpers.sh`
saforem2 5f5bbd4
Update `pretrain_gpt_alcf.py`
saforem2 b38bcb6
Update `ALCF/helpers.sh`
saforem2 0999de2
Update `ALCF/requirements/requirements.txt`
saforem2 6ad3a99
Fix opt hyperparams in `ALCF/helpers.sh`
saforem2 019dc3c
Update `ALCF/helpers.sh`
saforem2 54bd608
Track grad_norm in `megatron/training.py`
saforem2 969f4c5
Update `train_aGPT_7B.sh`
saforem2 9550656
Update `train_llama_alcf.sh`
saforem2 5d96d64
Update `train_aGPT_7B.sh`
saforem2 8897dc2
Merge pull request #43 from argonne-lcf/alcf-helpers-patch-1
saforem2 bcbe75f
Update README.md
saforem2 0270321
Merge pull request #49 from argonne-lcf/saforem2-patch-2
saforem2 b7c17ca
Move `ALCF/mds_to_hf.py` to `mds_to_hf.py`
saforem2 81470e9
Merge pull request #51 from argonne-lcf/checkpoint-conversion
saforem2 5001600
fixed data loader issue for TP>1 PP>1
zhenghh04 38b2505
Update `ALCF/data-lists/aurora/*.txt`
saforem2 461bc7f
Merge pull request #52 from argonne-lcf/bugfix/tp_pp_dataloader
saforem2 ea0c3c7
fixed dftracer compatibility
zhenghh04 50e2729
hf cp conversion and inference scripts added
464a0d2
Merge pull request #53 from argonne-lcf/checkpoint_hf
saforem2 a0ac750
added requirements.txt
zhenghh04 de7f22f
Update utils.py
zhenghh04 3edba7f
Add `--train-range-to-skip` to `megatron/arguments.py`
saforem2 76a259b
Add logic for `--trin-range-to-skip` to `megatron/training.py`
saforem2 fd1ac6d
Update `ALCF/helpers.sh`
saforem2 6f27f5d
Update `train_aGPT_7B.sh`
saforem2 6df33ad
fix: `--override-opt_param-scheduler` if `OVERRIDE_CKPT_OPT_PARAM=1`
saforem2 73720c2
Merge pull request #56 from argonne-lcf/train-skip-range
saforem2 8bc5313
merge: Create `microsoft-main`
saforem2 a1ede68
Remove duplicate `--profile` arg
saforem2 6b32cff
debug: `sequence_parallel` issue in `RMSNorm` ??
saforem2 12f6f8e
fix check
zhenghh04 5ac877a
Update `megatron/training_log_alcf.py`
saforem2 b3e0f6f
Update `megatron/training.py`
saforem2 2113dbc
Update `megatron/utils.py`
saforem2 7f71572
Update `megatron/training_log.py`
saforem2 7cb9c11
Update `pretrain_gpt_alcf.py`
saforem2 e83de19
Update `megatron/training_log.py`
saforem2 29756d6
Warn if mismatch b/w iters in `megatron/checkpointing.py`
saforem2 1a7f03b
fix: `try/except` for non tensors in `megatron/training_log.py`
saforem2 828f6a9
fix: Correctly draw `grad_acc_steps` batches of data when skipping step
saforem2 295fcb3
Update `pretrain_gpt_alcf.py`
saforem2 cf80e6b
added sophia
09accde
Merge pull request #59 from mngom2/spike-skipper
saforem2 cef3fc7
Merge pull request #58 from argonne-lcf/spike-skipper
saforem2 fd94b37
merge: Resolve merge conflicts pulling in from Microsoft upstream
saforem2 9b5be12
merge: `argonne-lcf-microsoft-main` into `main`
saforem2 5394156
shuffle concate dataset index
zhenghh04 573b668
fixed bugs
zhenghh04 41ff059
Update `ALCF/helpers.sh`, `train_aGPT_7B.sh`
saforem2 89db92a
merge: `feature/profile` with data fix into `microsoft-main`
saforem2 9de83a9
Fix `shuffle_idx` in `megatron/data/gpt_dataset.py`
saforem2 d7a2594
Fix `shuffle_idx` in `megatron/data/gpt_dataset.py`
saforem2 3e33a6a
Update `ALCF/helpers.sh`, `train_aGPT_7B.sh`
saforem2 43cde2b
Update `pretrain_gpt_alcf.py`
saforem2 9f09733
Update `megatron/data/{blendable,gpt,indexed}_dataset.py`
saforem2 2b31b44
Update `ALCF/requirements/requirements.txt`
saforem2 5e9eed0
Update `megatron/utils.py`
saforem2 3dcb297
fixed bugs and added commandline option
zhenghh04 bec9b7a
Merge branch 'debug-logging' into feature/profile
saforem2 43fc2fe
fixed typo
zhenghh04 94d5337
Merge branch 'feature/profile' of github.com:argonne-lcf/Megatron-Dee…
zhenghh04 bb55e97
Merge pull request #67 from argonne-lcf/feature/profile
saforem2 d50239f
added support for blending samples across different files in the same…
zhenghh04 9b4f510
Merge pull request #64 from argonne-lcf/debug-logging
saforem2 324ef11
Merge branch 'alcf-hzheng-data-fix' into hzheng-data-fix
saforem2 45ff652
Discard changes to megatron/data/gpt_dataset.py
saforem2 52a406c
Consistent logging in `megatron/data/*.py`
saforem2 63b1901
Update `megatron/data/gpt_dataset.py`
saforem2 7ef26bf
Use `time.perf_counter` in `megatron/data/blendable_dataset.py`
saforem2 deb95cd
fix init issue for silently ignoring the deepspeed config (#452)
xylian86 68da2db
Update `ALCF/helpers.sh`
saforem2 ab3a8ec
Merge branch 'main' of https://github.com/microsoft/Megatron-DeepSpee…
saforem2 ed21bd9
Merge branch 'hzheng-data-fix' of https://github.com/argonne-lcf/Mega…
saforem2 6acc370
fix moe tflops (#445)
ranzhejiang 467279b
Merge 'upstream/main' into `hzeng-data-fix`
saforem2 9e015cc
Remove duplicate `gradient_accumulation_steps` in DS config
saforem2 58dc2d7
Update default EVAL args
saforem2 277d308
Catch eval metrics in `megatron/training.py`
saforem2 af4cba1
Save git branch to env in `train_aGPT_7B.sh`
saforem2 8a8472c
fixed print out bug
zhenghh04 dfd0643
Merge pull request #68 from argonne-lcf/feature/blending_corpus
saforem2 6cb727d
Fix `args.shuffle` in `megatron/data/gpt_dataset.py`
saforem2 5d10179
Update `--{shuffle,blend}-sample-in-corpus` arg in `ALCF/helpers.sh`
saforem2 160d6a6
fix: `GRAD_ACC_STEPS` when `NHOSTS == 256`
saforem2 40db8c2
Merge pull request #63 from argonne-lcf/hzheng-data-fix
saforem2 ce7d553
🚧 `ALCF/ds_to_universal.py`
saforem2 8e0bff8
docs: Add `ALCF/notes/checkpoints.md`
saforem2 bd8c246
feat: Enable `--use-flash-attn-builder` by default on Aurora
saforem2 26f2e71
Update python.yml
saforem2 48b3c81
Update python.yml
saforem2 0a997bb
Update python.yml
saforem2
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
#!/bin/bash --login | ||
| ||
# AWS NCCL OFI Plugin settings below | ||
export NCCL_CROSS_NIC=1 | ||
export NCCL_COLLNET_ENABLE=1 | ||
export NCCL_NET="AWS Libfabric" | ||
export LD_LIBRARY_PATH=/soft/libraries/aws-ofi-nccl/v1.9.1-aws/lib:$LD_LIBRARY_PATH | ||
export LD_LIBRARY_PATH=/soft/libraries/hwloc/lib/:$LD_LIBRARY_PATH | ||
export FI_CXI_DISABLE_HOST_REGISTER=1 | ||
export FI_MR_CACHE_MONITOR=userfaultfd | ||
export FI_CXI_DEFAULT_CQ_SIZE=131072 | ||
######################################################### | ||
# WARNING: !!! | ||
# - Currently, `export NCCL_NET_GDR_LEVEL=PHB` | ||
# causes a hang on Polaris. | ||
# so, we don't set it for the time being [2024-05-14]. | ||
# - Seems to work on Perlmutter ??? | ||
# | ||
# export NCCL_NET_GDR_LEVEL=PHB | ||
######################################################### |
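The warning in the comment block above amounts to a per-machine toggle. A minimal sketch of how that could be expressed, assuming an opt-in switch is acceptable: `ENABLE_NCCL_GDR_PHB` and `maybe_enable_nccl_gdr` are hypothetical names that do not appear in the PR, while `NCCL_NET_GDR_LEVEL=PHB` is the setting quoted in the comment.

```bash
#!/bin/bash
# Hypothetical opt-in switch (not part of the PR): export the setting only
# when the caller explicitly asks for it, e.g. on a system where it is known
# to work. By default nothing is changed.
maybe_enable_nccl_gdr() {
    if [ "${ENABLE_NCCL_GDR_PHB:-0}" = "1" ]; then
        export NCCL_NET_GDR_LEVEL=PHB
    fi
}

maybe_enable_nccl_gdr
```

With this shape, a job script would run `ENABLE_NCCL_GDR_PHB=1` before sourcing the settings on machines where the option has been tested, and leave it unset elsewhere.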
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
0.0018520780893211373 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0000_text_document algebraic-stack-train | ||
0.0017591050606817512 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0001_text_document algebraic-stack-train | ||
0.001459052794333798 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0002_text_document algebraic-stack-train | ||
0.0007405667281569194 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0003_text_document algebraic-stack-train | ||
0.00019420030110896795 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0004_text_document algebraic-stack-train | ||
0.0009008668715801845 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0005_text_document algebraic-stack-train | ||
0.00015115827957143057 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0006_text_document algebraic-stack-train | ||
0.0014552844319220648 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0007_text_document algebraic-stack-train | ||
0.0012469861325685161 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0008_text_document algebraic-stack-train | ||
0.00136412011372413 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0009_text_document algebraic-stack-train | ||
0.0007064279699221103 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0010_text_document algebraic-stack-train | ||
0.0008472240000687427 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0011_text_document algebraic-stack-train | ||
0.0001984375713341955 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0012_text_document algebraic-stack-train | ||
0.0005472773881697123 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0013_text_document algebraic-stack-train | ||
0.001815779629850992 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0014_text_document algebraic-stack-train | ||
0.0018313600689757324 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0015_text_document algebraic-stack-train |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
0.0002583902668716813 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0000_text_document arxiv | ||
0.0002646575141232155 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0001_text_document arxiv | ||
0.0003165521247456758 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0002_text_document arxiv | ||
0.0002920706460176214 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0003_text_document arxiv | ||
0.00028396813182810215 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0004_text_document arxiv | ||
0.00030445161883108107 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0005_text_document arxiv | ||
0.00031628781276576474 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0006_text_document arxiv | ||
0.0003083776568189157 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0007_text_document arxiv | ||
0.0003176359471472902 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0008_text_document arxiv | ||
0.0002536009369131698 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0009_text_document arxiv | ||
0.0003067491424681363 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0010_text_document arxiv | ||
0.0002597217257557784 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0011_text_document arxiv | ||
0.0003788556450109768 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0012_text_document arxiv | ||
0.0002796563272052598 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0013_text_document arxiv | ||
0.00033573826524290287 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0014_text_document arxiv | ||
0.00030523658022800287 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0015_text_document arxiv | ||
0.00032211552192240096 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0016_text_document arxiv | ||
0.0003329295675164247 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0017_text_document arxiv | ||
0.0003101982186639862 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0018_text_document arxiv | ||
0.00032361798234223355 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0019_text_document arxiv | ||
0.0003495541581652915 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0020_text_document arxiv | ||
0.0002821637448858042 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0021_text_document arxiv | ||
0.00030399523537629673 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0022_text_document arxiv | ||
0.0002955658968247219 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0023_text_document arxiv | ||
0.00028942158502924254 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0024_text_document arxiv | ||
0.00028769546171490733 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0025_text_document arxiv | ||
0.0002938111057234182 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0026_text_document arxiv | ||
0.0002711150403010948 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0027_text_document arxiv | ||
0.00031130095874747565 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0028_text_document arxiv | ||
0.0003002996118160777 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0029_text_document arxiv | ||
0.0003732757901604459 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0030_text_document arxiv | ||
0.00026784205751795894 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0031_text_document arxiv | ||
0.0002799626521661984 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0032_text_document arxiv | ||
0.00034334276069078164 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0033_text_document arxiv | ||
0.0003582469803674965 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0034_text_document arxiv | ||
0.00031094844818418623 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0035_text_document arxiv | ||
0.0002766228384977191 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0036_text_document arxiv | ||
0.00030297116159471485 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0037_text_document arxiv | ||
0.00027033888377464685 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0038_text_document arxiv | ||
0.00030090862368377933 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0039_text_document arxiv | ||
0.00028543875802490955 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0040_text_document arxiv | ||
0.00027559768459074204 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0041_text_document arxiv | ||
0.0003182185533962886 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0042_text_document arxiv | ||
0.0003311392971435837 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0043_text_document arxiv | ||
0.00028751652060804325 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0044_text_document arxiv | ||
0.000303466863212589 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0045_text_document arxiv | ||
0.00033400462801277524 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0046_text_document arxiv | ||
0.0002589234031777426 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0047_text_document arxiv | ||
0.0002913508598466723 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0048_text_document arxiv | ||
0.0002670572450004856 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0049_text_document arxiv | ||
0.00032027399105647656 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0050_text_document arxiv | ||
0.00032188376258379377 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0051_text_document arxiv | ||
0.0003161585784100882 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0052_text_document arxiv | ||
0.0003184249182974135 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0053_text_document arxiv | ||
0.00030381336664000807 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0054_text_document arxiv | ||
0.0003190437442184283 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0055_text_document arxiv | ||
0.0002537961798200545 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0056_text_document arxiv | ||
0.0003017817117223326 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0057_text_document arxiv | ||
0.00028685268513240224 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0058_text_document arxiv | ||
0.00031265179094451165 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0059_text_document arxiv | ||
0.00034708319096986816 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0060_text_document arxiv | ||
0.00026650837943080664 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0061_text_document arxiv | ||
0.00034588832248507335 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0062_text_document arxiv | ||
0.0002416982248399037 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0063_text_document arxiv | ||
0.0003089296918222243 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0064_text_document arxiv | ||
0.00029137184185700827 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0065_text_document arxiv | ||
0.00026464226846800774 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0066_text_document arxiv | ||
0.00030545397919456627 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0067_text_document arxiv | ||
0.0003206778460448875 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0068_text_document arxiv | ||
0.00030968971641110967 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0069_text_document arxiv | ||
0.00023325653928600864 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0070_text_document arxiv | ||
0.00030526899198338555 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0071_text_document arxiv | ||
0.00035376719076633584 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0072_text_document arxiv | ||
0.000290224385981026 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0073_text_document arxiv | ||
0.000294650083382008 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0074_text_document arxiv | ||
0.00028768858128616436 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0075_text_document arxiv | ||
0.00030856965235527843 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0076_text_document arxiv | ||
0.00030579942447879054 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0077_text_document arxiv | ||
0.0002863101084704357 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0078_text_document arxiv | ||
0.0002870032092492213 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0079_text_document arxiv | ||
0.000264182727569885 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0080_text_document arxiv | ||
0.0002974012367036449 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0081_text_document arxiv | ||
0.00032238412143059203 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0082_text_document arxiv | ||
0.00031683716893819036 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0083_text_document arxiv | ||
0.00031157434937617524 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0084_text_document arxiv | ||
0.0003411742735695989 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0085_text_document arxiv | ||
0.00026778444816570715 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0086_text_document arxiv | ||
0.0003037045797275201 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0087_text_document arxiv | ||
0.00027746114370081314 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0088_text_document arxiv | ||
0.00027148285946862043 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0089_text_document arxiv | ||
0.00028042950114678207 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0090_text_document arxiv | ||
0.0003235607816590721 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0091_text_document arxiv | ||
0.0003086692227306295 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0092_text_document arxiv | ||
0.00033990349455148105 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0093_text_document arxiv | ||
0.00030945053208470265 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0094_text_document arxiv | ||
0.00027309074552265303 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0095_text_document arxiv | ||
0.00028737393506316194 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0096_text_document arxiv | ||
0.0003098868328009879 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0097_text_document arxiv | ||
0.0002614229162588409 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0098_text_document arxiv | ||
0.0002884388407820923 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0099_text_document arxiv |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
0.0031025147279277244 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/books-0000_text_document books | ||
0.003102019887362634 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/books-0001_text_document books | ||
0.0009996745994661548 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/books-0002_text_document books |
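Each row of these data lists appears to hold three whitespace-separated fields: a weight, a dataset path prefix, and a corpus name. As a rough illustration of how such a file could be inspected, the sketch below totals the first field per corpus. The function name `sum_weights_by_corpus` is hypothetical, and the three-field layout is an assumption read off the rows above, not a documented format.

```bash
#!/bin/bash
# Sum the first column per corpus name for one or more data-list files.
# Assumed line format (inferred from the rows above, not documented here):
#   <weight> <path-prefix> <corpus>
# Lines that do not have exactly three fields are skipped.
sum_weights_by_corpus() {
    awk 'NF == 3 { total[$3] += $1 }
         END { for (c in total) printf "%s %.6f\n", c, total[c] }' "$@" | sort
}
```

Calling `sum_weights_by_corpus path/to/list.txt` prints one line per corpus with its summed weight, which makes it easy to compare the relative share of each corpus across files.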
🚨 issue (security): Hard-coded library path in environment variable.
The line contains a hard-coded path for the `LD_LIBRARY_PATH` environment variable. While this is not a secret, it is important to ensure that such paths do not expose sensitive directories or files. Consider using environment variables or configuration files to manage paths securely.
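One way to act on this suggestion, sketched under the assumption that a site-level override variable is acceptable. The names `AWS_OFI_NCCL_LIB`, `HWLOC_LIB`, and `prepend_ld_library_path` are hypothetical and not part of the PR; the default values are the two paths that appear in the diff.

```bash
#!/bin/bash
# Hypothetical override variables (not defined anywhere in this PR). If the
# caller has not set them, fall back to the paths hard-coded in the diff.
: "${AWS_OFI_NCCL_LIB:=/soft/libraries/aws-ofi-nccl/v1.9.1-aws/lib}"
: "${HWLOC_LIB:=/soft/libraries/hwloc/lib}"

# Prepend a directory to LD_LIBRARY_PATH only if it exists and is not
# already listed. Returns 1 (with a warning on stderr) if it is missing.
prepend_ld_library_path() {
    local dir="${1%/}"   # drop a single trailing slash for comparison
    if [ ! -d "${dir}" ]; then
        echo "warning: ${dir} not found; LD_LIBRARY_PATH unchanged" >&2
        return 1
    fi
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${dir}:"*) ;;  # already present, nothing to do
        *) export LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
    esac
}

prepend_ld_library_path "${AWS_OFI_NCCL_LIB}" || true
prepend_ld_library_path "${HWLOC_LIB}" || true
```

The existence check is a design choice rather than a requirement: it keeps a stale or machine-specific directory from silently ending up on the library search path when the script is sourced on a different system.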