This repository has been archived by the owner on May 19, 2023. It is now read-only.
Describe the bug
I found errors in several notebooks that use the CLX library when running them on the 21.10 stable release. The errors reproduce on both CentOS and Ubuntu. I am not sure whether they result from later updates to the codebase or from a bug that was never caught.
Steps/Code to reproduce bug
Steps to reproduce the behavior:
1. Go to RAPIDS Sample Notebooks and clone the 21.10 branch
2. Click on the CLX folder
3. Run all the cells of the notebooks to produce the examples illustrated below
Expected behavior
Several examples raise an error. Many examples omit details that would aid implementation. The code may be a few commits behind the 21.10 repo.
Environment details (please complete the following information):
Environment location: Docker
Linux Distro/Architecture: Ubuntu 20.04 amd64 and CentOS 8
Example # 2
anomalous_behavior_profiling_supervised
import xgboost as xgb
import cudf
#from cuml.preprocessing import train_test_split
from cuml.preprocessing.model_selection import train_test_split
from cuml import ForestInference
import sklearn.datasets
import cupy
df = cudf.read_json("./labelled_nv_smi.json")
Where
from cuml.preprocessing import train_test_split
Seems to be outdated and isn't recognized - I've changed to:
from cuml.preprocessing.model_selection import train_test_split
That revealed a second error: ./labelled_nv_smi.json cannot be found. I cannot run the notebook as intended without a sample JSON file or a correct file path to determine success.
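Because the location of `train_test_split` appears to have moved between cuML releases, a notebook could try several module paths in order instead of hard-coding one. This is a minimal sketch, not the notebooks' actual fix: `import_first` is a hypothetical helper, and the candidate paths in the usage comment are assumptions (`cuml.preprocessing.model_selection` comes from this report; `cuml.model_selection` is assumed to be the newer public location).

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    failures = []
    for module_path in module_paths:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed so the final error is informative.
            failures.append(f"{module_path}: {exc}")
    raise ImportError(f"could not import {name!r}; tried: " + "; ".join(failures))


# Intended use in the notebook (requires cuML, so left commented out here):
# train_test_split = import_first(
#     "train_test_split",
#     ["cuml.model_selection", "cuml.preprocessing.model_selection"],
# )
```

If none of the candidates resolve, the raised `ImportError` lists every path that was tried, which makes a version mismatch easier to report.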
Error thrown below:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-8680e3b7e66c> in <module>
----> 1 df = cudf.read_json("./labelled_nv_smi.json")
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/json.py in read_json(path_or_buf, engine, dtype, lines, compression, byte_range, *args, **kwargs)
95 compression=compression,
96 *args,
---> 97 **kwargs,
98 )
99 df = cudf.from_pandas(pd_value)
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
205 else:
206 kwargs[new_arg_name] = new_arg_value
--> 207 return func(*args, **kwargs)
208
209 return cast(F, wrapper)
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, encoding_errors, lines, chunksize, compression, nrows, storage_options)
612
613 with json_reader:
--> 614 return json_reader.read()
615
616
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in read(self)
746 obj = self._get_object_parser(self._combine_lines(data_lines))
747 else:
--> 748 obj = self._get_object_parser(self.data)
749 self.close()
750 return obj
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
768 obj = None
769 if typ == "frame":
--> 770 obj = FrameParser(json, **kwargs).parse()
771
772 if typ == "series" or obj is None:
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in parse(self)
883
884 else:
--> 885 self._parse_no_numpy()
886
887 if self.obj is None:
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
1138 if orient == "columns":
1139 self.obj = DataFrame(
-> 1140 loads(json, precise_float=self.precise_float), dtype=None
1141 )
1142 elif orient == "split":
ValueError: Expected object or value
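The `ValueError: Expected object or value` above appears consistent with a missing file: in pandas versions of that era, `read_json` treats a string that is not an existing path as literal JSON and then fails to parse it. A sketch of a guard that fails earlier with a clearer message is shown here; `require_file` is a hypothetical helper, not part of cuDF or CLX.

```python
from pathlib import Path


def require_file(path):
    """Return `path` as a resolved Path, or raise a descriptive FileNotFoundError."""
    candidate = Path(path).expanduser()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"Input file {str(candidate)!r} not found (working directory: {Path.cwd()}). "
            "Download the dataset or correct the path before running this cell."
        )
    return candidate.resolve()


# Intended use in the notebook (requires cuDF, so left commented out here):
# df = cudf.read_json(str(require_file("./labelled_nv_smi.json")))
```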
A second instance of the same issue
Predictive_Maintenance_Sequence_Classifier
import cudf;
from cuml.model_selection._split import train_test_split
#from cuml.preprocessing.model_selection import train_test_split;
from clx.analytics.binary_sequence_classifier import BinarySequenceClassifier;
import s3fs;
from os import path;
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
dflogs = cudf.read_csv("kernel.tsv", delimiter='\t', header=None, names=['label', 'log'])
Where
from cuml.preprocessing import train_test_split
Seems to be outdated and isn't recognized - I've changed to:
from cuml.preprocessing.model_selection import train_test_split
That revealed a second error: kernel.tsv cannot be found. I cannot run the notebook as intended without a sample TSV file or a correct file path to determine success.
Example # 3
cybert_example_training
padded_labels = [pad(x[:256], '[PAD]', 256) for x in subword_labels]
int_labels = [[label2id.get(l) for l in lab] for lab in padded_labels]
label_tensor = torch.tensor(int_labels).to('cuda')
Where I am not able to test unless Torch is compiled with CUDA enabled
Error thrown below:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-14-a9620f79a890> in <module>
1 padded_labels = [pad(x[:256], '[PAD]', 256) for x in subword_labels]
2 int_labels = [[label2id.get(l) for l in lab] for lab in padded_labels]
----> 3 label_tensor = torch.tensor(int_labels).to('cuda')
/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/cuda/__init__.py in _lazy_init()
164 "Cannot re-initialize CUDA in forked subprocess. " + msg)
165 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 166 raise AssertionError("Torch not compiled with CUDA enabled")
167 if _cudart is None:
168 raise AssertionError(
AssertionError: Torch not compiled with CUDA enabled
After discovering the above I tried to remedy by reinstalling pytorch with CUDA enabled using:
from os import path
import s3fs
try:
    import pytorch; print('pytorch Version:', pytorch.__version__)
except ModuleNotFoundError:
    !conda install pytorch torchvision torchaudio cudatoolkit=11.4 -c pytorch -c nvidia -y
    import pytorch; print('pytorch Version:', pytorch.__version__)
#conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.dataset import random_split
from torch.utils.dlpack import from_dlpack
try:
    import seqeval; #print('seqeval Version:', seqeval.__version__)
except ModuleNotFoundError:
    !conda install -c conda-forge seqeval -y
    import seqeval; #print('seqeval Version:', seqeval.__version__)
from seqeval.metrics import classification_report,accuracy_score,f1_score
from transformers import BertForTokenClassification
from tqdm import tqdm,trange
from collections import defaultdict
import pandas as pd
import numpy as np
import cupy
import cudf
Which failed.
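One detail worth checking in the cell above: PyTorch is imported as `torch`, so in a standard install `import pytorch` is expected to raise `ModuleNotFoundError` whether or not PyTorch is present, which would send the cell down the install branch every time. A minimal diagnostic sketch follows, assuming only the public `torch.cuda.is_available()` function and `torch.version.cuda` attribute; `describe_torch` is a hypothetical helper.

```python
import importlib
import importlib.util


def describe_torch():
    """Summarize whether PyTorch is importable and whether its build can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    torch = importlib.import_module("torch")
    if torch.version.cuda is None:
        # torch.version.cuda is None when the installed build has no CUDA support.
        return f"torch {torch.__version__} was built without CUDA support"
    if not torch.cuda.is_available():
        return (
            f"torch {torch.__version__} was built for CUDA {torch.version.cuda}, "
            "but no usable GPU was found"
        )
    return f"torch {torch.__version__} with CUDA {torch.version.cuda} is ready"


print(describe_torch())
```

Running this before and after a reinstall shows whether the environment actually picked up a CUDA-enabled build.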
A second instance of the same issue
cybert_log_parsing
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-2-2268e14fff44> in <module>
1 dir_path = "put/path/extracted/cic_ids2017/"
----> 2 datasets = os.listdir(dir_path)
FileNotFoundError: [Errno 2] No such file or directory: 'put/path/extracted/cic_ids2017/'
Desired outcome
CLX notebooks should run as published, so they can be replicated and implemented with less effort. Notebooks should be updated to reflect the commits made to the repositories during each release cycle. CLX functions and models should work as expected.
Request impacts
Our Clx notebooks are public and require accurate information - Medium Priority
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
Additional context
Examples of Discrepancies:
Example # 1
CLX_Workflow_Notebook2
Error thrown below:
I cannot run the workflow as intended without sample source data to determine success.
A second instance of the same issue
CLX_Workflow_Notebook3
I cannot run the workflow as intended without sample source data to determine success.
Identical Error thrown below:
Example # 2
anomalous_behavior_profiling_supervised
Where
Seems to be outdated and isn't recognized - I've changed to:
That revealed a second error: ./labelled_nv_smi.json cannot be found. I cannot run the notebook as intended without a sample JSON file or a correct file path to determine success.
Error thrown below:
A second instance of the same issue
Predictive_Maintenance_Sequence_Classifier
Where
Seems to be outdated and isn't recognized - I've changed to:
That revealed a second error: kernel.tsv cannot be found. I cannot run the notebook as intended without a sample TSV file or a correct file path to determine success.
Identical Error thrown below:
Example # 3
cybert_example_training
Where I am not able to test unless Torch is compiled with CUDA enabled
Error thrown below:
After discovering the above I tried to remedy by reinstalling pytorch with CUDA enabled using:
Which failed.
A second instance of the same issue
cybert_log_parsing
After discovering the below error I tried to remedy by reinstalling pytorch with CUDA enabled using the same installation process.
Identical Error thrown below:
A third instance of the same issue
CLX_Supervised_Asset_Classification
After discovering the below error I tried to remedy by reinstalling pytorch with CUDA enabled using the same installation process.
Identical Error thrown below:
Example # 4
DGA_Detection
Where I received a memory error.
Error thrown below:
A second instance of a similar issue
custream_n_graph
Which causes the Kernel to restart.
A third instance of a similar issue
Phishing_Detection_using_Bert_CLX
Which causes the Kernel to restart.
A fourth instance of a similar issue
pii_detection_training_example
Which causes the Kernel to restart.
Example # 5
LODA_anomaly_detection
Where wget is not recognized
Error thrown below:
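Where the `wget` command is missing from the container, the download could be done from Python's standard library instead. The sketch below assumes the notebook only needs to fetch a file to a local path; `fetch` is a hypothetical helper, and the URL in the usage comment is a placeholder, not the notebook's actual dataset location.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch(url, destination):
    """Download `url` to `destination` unless the file already exists; return the path."""
    destination = Path(destination)
    if not destination.exists():
        # Create the target directory first so the download does not fail on a missing folder.
        destination.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(destination))
    return destination


# Placeholder usage; substitute the dataset URL given in the notebook:
# fetch("https://example.com/dataset.csv", "data/dataset.csv")
```

Skipping the download when the file already exists keeps the cell safe to re-run.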
Example # 6
FLAIR_DNS_Log_Parsing
Where 'query_output1545120200000_1545163200000.tab' is not available
Error thrown below:
A second instance of the same issue
IDS_using_LODA
Error thrown below:
@taureandyernv @fondaing @efajardo-nv @bsuryadevara for awareness