[WIP] 21.10 Notebook Testing Report #457

Nicholas-7 · 2021-10-18T08:34:11Z

Describe the bug
I found errors in several CLX notebooks when running them against the 21.10 stable release. The errors reproduce on both CentOS and Ubuntu. I am not sure whether they result from recent updates to the codebase or from previously uncaught bugs.

Steps/Code to reproduce bug
Steps to reproduce the behavior:
1. Go to RAPIDS Sample Notebooks and clone the 21.10 branch
2. Click on the CLX folder
3. Run all the cells of the notebooks to produce the examples illustrated below

Expected behavior
Several examples raise errors. Many examples omit details that would aid implementation. The notebook code may be a few commits behind the 21.10 repo.

Environment details (please complete the following information):

Environment location: Docker
Linux Distro/Architecture: Ubuntu 20.04 amd64 and CentOS 8
GPU Model/Driver: GV100, 450.142.00
CUDA: 11.0
Method of Library install: Docker Install

Additional context
Examples of Discrepancies:

Example # 1
CLX_Workflow_Notebook2

workflow = SplunkAlertWorkflow(name="my-splunk-alert-workflow", source=source, destination=dest)
workflow.run_workflow()

Error thrown below:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-637a4e2a27ab> in <module>
      1 workflow = SplunkAlertWorkflow(name="my-splunk-alert-workflow", source=source, destination=dest)
----> 2 workflow.run_workflow()

/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/workflow/workflow.py in run_workflow(self)
    179                     self._io_reader.fetch_data()
    180                 )
--> 181                 enriched_dataframe = self.workflow(dataframe)
    182                 if enriched_dataframe and not enriched_dataframe.empty:
    183                     self._io_writer.write_data(enriched_dataframe)

<ipython-input-2-e6dcdb279a63> in workflow(self, dataframe)
      8         # We use a splunk notable parser to parse data raw Splunk notable data.
      9         snp = SplunkNotableParser()
---> 10         parsed_df = snp.parse(dataframe, raw_data_col_name)
     11 
     12         # Create alerts dataframe

/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/parsers/splunk_notable_parser.py in parse(self, dataframe, raw_column)
     48         """
     49         # Cleaning raw data to be consistent.
---> 50         dataframe[raw_column] = dataframe[raw_column].str.replace("\\\\", "")
     51         parsed_dataframe = self.parse_raw_event(dataframe, raw_column, self.event_regex)
     52         # Replace null values of all columns with empty.

/opt/conda/envs/rapids/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76 

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py in __getitem__(self, arg)
    681         """
    682         if _is_scalar_or_zero_d_array(arg) or isinstance(arg, tuple):
--> 683             return self._get_columns_by_label(arg, downcast=True)
    684 
    685         elif isinstance(arg, slice):

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py in _get_columns_by_label(self, labels, downcast)
   1574         If downcast is True, try and downcast from a DataFrame to a Series
   1575         """
-> 1576         new_data = super()._get_columns_by_label(labels, downcast)
   1577         if downcast:
   1578             if is_scalar(labels):

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/frame.py in _get_columns_by_label(self, labels, downcast)
    524 
    525         """
--> 526         return self._data.select_by_label(labels)
    527 
    528     def _get_columns_by_index(self, indices):

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column_accessor.py in select_by_label(self, key)
    344                 if any(isinstance(k, slice) for k in key):
    345                     return self._select_by_label_with_wildcard(key)
--> 346             return self._select_by_label_grouped(key)
    347 
    348     def select_by_index(self, index: Any) -> ColumnAccessor:

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column_accessor.py in _select_by_label_grouped(self, key)
    406 
    407     def _select_by_label_grouped(self, key: Any) -> ColumnAccessor:
--> 408         result = self._grouped_data[key]
    409         if isinstance(result, cudf.core.column.ColumnBase):
    410             return self.__class__({key: result})

KeyError: 'Raw'

Where I am not able to run the workflow as expected without test source data to determine success.

A second instance of the same issue
CLX_Workflow_Notebook3

workflow = SplunkAlertWorkflow(name="splunk_workflow", source=source, destination=dest,
                               threshold=2.0, raw_data_col_name="Raw")
workflow.run_workflow()

Where I am not able to run the workflow as expected without test source data to determine success.

Identical Error thrown below:

---------------------------------------------------------------------------
KeyError: 'Raw'
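
A possible guard for the KeyError above: a minimal sketch, assuming the source data simply lacks the column named by raw_data_col_name. The helper missing_columns is hypothetical and not part of CLX; it works on any object that exposes `.columns`, such as a cuDF or pandas DataFrame, and a stand-in object is used here so the snippet runs without a GPU.

```python
from types import SimpleNamespace


def missing_columns(frame, required):
    """Return the required column names that are absent from frame.columns."""
    present = set(frame.columns)
    return [name for name in required if name not in present]


# Stand-in for the DataFrame returned by the workflow's reader (hypothetical data).
frame = SimpleNamespace(columns=["time", "_raw"])

absent = missing_columns(frame, ["Raw"])
if absent:
    print(f"Source data is missing expected column(s): {absent}")
```

Running a check like this before snp.parse(...) would report the missing column by name instead of surfacing a KeyError from deep inside cuDF.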

Example # 2
anomalous_behavior_profiling_supervised

import xgboost as xgb
import cudf
#from cuml.preprocessing import train_test_split
from cuml.preprocessing.model_selection import train_test_split
from cuml import ForestInference
import sklearn.datasets
import cupy

df = cudf.read_json("./labelled_nv_smi.json")

Where

from cuml.preprocessing import train_test_split

Seems to be outdated and isn't recognized - I've changed to:

from cuml.preprocessing.model_selection import train_test_split

Which revealed a second error where ./labelled_nv_smi.json cannot be found. I am not able to run the notebook as expected without a test JSON file or a correct file path to determine success.

Error thrown below:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-8680e3b7e66c> in <module>
----> 1 df = cudf.read_json("./labelled_nv_smi.json")

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/json.py in read_json(path_or_buf, engine, dtype, lines, compression, byte_range, *args, **kwargs)
     95                 compression=compression,
     96                 *args,
---> 97                 **kwargs,
     98             )
     99         df = cudf.from_pandas(pd_value)

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    205                 else:
    206                     kwargs[new_arg_name] = new_arg_value
--> 207             return func(*args, **kwargs)
    208 
    209         return cast(F, wrapper)

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, encoding_errors, lines, chunksize, compression, nrows, storage_options)
    612 
    613     with json_reader:
--> 614         return json_reader.read()
    615 
    616 

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in read(self)
    746                 obj = self._get_object_parser(self._combine_lines(data_lines))
    747         else:
--> 748             obj = self._get_object_parser(self.data)
    749         self.close()
    750         return obj

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
    768         obj = None
    769         if typ == "frame":
--> 770             obj = FrameParser(json, **kwargs).parse()
    771 
    772         if typ == "series" or obj is None:

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in parse(self)
    883 
    884         else:
--> 885             self._parse_no_numpy()
    886 
    887         if self.obj is None:

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
   1138         if orient == "columns":
   1139             self.obj = DataFrame(
-> 1140                 loads(json, precise_float=self.precise_float), dtype=None
   1141             )
   1142         elif orient == "split":

ValueError: Expected object or value
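
The import change described above could be made tolerant of different cuML versions. This is a minimal sketch, not the notebooks' actual code: import_first is a hypothetical helper, and the candidate module paths are only the ones mentioned in this report. Which of them exists depends on the installed cuML version.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in module_paths that provides it."""
    errors = []
    for module_path in module_paths:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_path}: {exc}")
    raise ImportError(f"could not import {name!r}; tried:\n" + "\n".join(errors))


# Candidate locations taken from this report (not verified against every release).
CANDIDATES = [
    "cuml.model_selection._split",
    "cuml.preprocessing.model_selection",
    "cuml.preprocessing",
]

try:
    train_test_split = import_first("train_test_split", CANDIDATES)
except ImportError as exc:
    # Without cuML installed, this prints every location that was tried.
    print(exc)
```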

A second instance of the same issue
Predictive_Maintenance_Sequence_Classifier

import cudf;
from cuml.model_selection._split import train_test_split
#from cuml.preprocessing.model_selection import train_test_split;
from clx.analytics.binary_sequence_classifier import BinarySequenceClassifier;
import s3fs;
from os import path;
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score

dflogs = cudf.read_csv("kernel.tsv", delimiter='\t', header=None, names=['label', 'log'])

Where

from cuml.preprocessing import train_test_split

Seems to be outdated and isn't recognized - I've changed to:

from cuml.preprocessing.model_selection import train_test_split

Which revealed a second error where kernel.tsv cannot be found. I am not able to run the notebook as expected without a test TSV file or a correct file path to determine success.

Error thrown below:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-556afabf946c> in <module>
----> 1 dflogs = cudf.read_csv("kernel.tsv", delimiter='\t', header=None, names=['label', 'log'])

/opt/conda/envs/rapids/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76 

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/csv.py in read_csv(filepath_or_buffer, lineterminator, quotechar, quoting, doublequote, header, mangle_dupe_cols, usecols, sep, delimiter, delim_whitespace, skipinitialspace, names, dtype, skipfooter, skiprows, dayfirst, compression, thousands, decimal, true_values, false_values, nrows, byte_range, skip_blank_lines, parse_dates, comment, na_values, keep_default_na, na_filter, prefix, index_col, **kwargs)
    108         na_filter=na_filter,
    109         prefix=prefix,
--> 110         index_col=index_col,
    111     )
    112 

cudf/_lib/csv.pyx in cudf._lib.csv.read_csv()

FileNotFoundError: [Errno 2] No such file or directory: 'kernel.tsv'
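
Both missing-data failures above could be caught earlier with an explicit existence check. The sketch below assumes the notebooks expect the file relative to the current working directory; require_file is a hypothetical helper, not a cuDF or CLX function. In the JSON case the traceback shows cudf.read_json delegating to pandas, which appears to be why the missing file surfaced as "Expected object or value" rather than a FileNotFoundError.

```python
from pathlib import Path


def require_file(path):
    """Return path as a resolved Path, or raise FileNotFoundError with a clear hint."""
    candidate = Path(path).expanduser()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"Input file {str(candidate)!r} not found (cwd: {Path.cwd()}). "
            "Download the dataset or correct the path before running this cell."
        )
    return candidate.resolve()


try:
    data_path = require_file("kernel.tsv")
except FileNotFoundError as exc:
    print(exc)
```

Calling require_file("./labelled_nv_smi.json") before cudf.read_json would give the same explicit message for the first notebook.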

Example # 3
cybert_example_training

padded_labels = [pad(x[:256], '[PAD]', 256) for x in subword_labels]
int_labels = [[label2id.get(l) for l in lab] for lab in padded_labels]
label_tensor = torch.tensor(int_labels).to('cuda')

Where I am not able to test unless Torch is compiled with CUDA enabled

Error thrown below:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-14-a9620f79a890> in <module>
      1 padded_labels = [pad(x[:256], '[PAD]', 256) for x in subword_labels]
      2 int_labels = [[label2id.get(l) for l in lab] for lab in padded_labels]
----> 3 label_tensor = torch.tensor(int_labels).to('cuda')

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/cuda/__init__.py in _lazy_init()
    164                 "Cannot re-initialize CUDA in forked subprocess. " + msg)
    165         if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 166             raise AssertionError("Torch not compiled with CUDA enabled")
    167         if _cudart is None:
    168             raise AssertionError(

AssertionError: Torch not compiled with CUDA enabled

After discovering the above, I tried to remedy it by reinstalling PyTorch with CUDA enabled using:

from os import path
import s3fs

try:
        import pytorch; print('pytorch Version:', pytorch.__version__)  
except ModuleNotFoundError:
        !conda install pytorch torchvision torchaudio cudatoolkit=11.4 -c pytorch -c nvidia -y
        import pytorch; print('pytorch Version:', pytorch.__version__)
        
#conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.dataset import random_split
from torch.utils.dlpack import from_dlpack


try:
        import seqeval; #print('seqeval Version:', seqeval.__version__)  
except ModuleNotFoundError:
        !conda install -c conda-forge seqeval -y
        import seqeval; #print('seqeval Version:', seqeval.__version__)

from seqeval.metrics import classification_report,accuracy_score,f1_score
from transformers import BertForTokenClassification
from tqdm import tqdm,trange
from collections import defaultdict
import pandas as pd
import numpy as np
import cupy
import cudf

Which failed.
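
A minimal sketch for checking whether the installed PyTorch build can use CUDA before running the notebook. torch_cuda_status is a hypothetical helper; torch.version.cuda and torch.cuda.is_available() are standard PyTorch attributes. One thing that may explain the failed remedy: the conda package is named pytorch but the importable module is torch, so the `import pytorch` lines in the cell above would be expected to raise ModuleNotFoundError even after a successful install.

```python
import importlib
import importlib.util


def torch_cuda_status():
    """Summarise whether an importable torch build can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "cuda_build": None, "cuda_available": False}
    torch = importlib.import_module("torch")
    return {
        "installed": True,
        # torch.version.cuda is None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


status = torch_cuda_status()
print(status)
```

A result with "installed": True and "cuda_build": None would point to a CPU-only build, which matches the "Torch not compiled with CUDA enabled" assertion.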

A second instance of the same issue
cybert_log_parsing

cybert = Cybert()
cybert.load_model(MODEL_FILENAME, CONFIG_FILENAME)

After discovering the error below, I tried to remedy it by reinstalling PyTorch with CUDA enabled using the same installation process.

Identical Error thrown below:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-92030a3ca008> in <module>
      1 cybert = Cybert()
----> 2 cybert.load_model(MODEL_FILENAME, CONFIG_FILENAME)

/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/cybert.py in load_model(self, model_filepath, config_filepath)
     89             model_filepath, config=config_filepath,
     90         )
---> 91         self._model.cuda()
     92         self._model.eval()
     93         self._model = nn.DataParallel(self._model)

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in cuda(self, device)
    461             Module: self
    462         """
--> 463         return self._apply(lambda t: t.cuda(device))
    464 
    465     def cpu(self: T) -> T:

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    357     def _apply(self, fn):
    358         for module in self.children():
--> 359             module._apply(fn)
    360 
    361         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    357     def _apply(self, fn):
    358         for module in self.children():
--> 359             module._apply(fn)
    360 
    361         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    357     def _apply(self, fn):
    358         for module in self.children():
--> 359             module._apply(fn)
    360 
    361         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    379                 # `with torch.no_grad():`
    380                 with torch.no_grad():
--> 381                     param_applied = fn(param)
    382                 should_use_set_data = compute_should_use_set_data(param, param_applied)
    383                 if should_use_set_data:

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in <lambda>(t)
    461             Module: self
    462         """
--> 463         return self._apply(lambda t: t.cuda(device))
    464 
    465     def cpu(self: T) -> T:

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/cuda/__init__.py in _lazy_init()
    164                 "Cannot re-initialize CUDA in forked subprocess. " + msg)
    165         if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 166             raise AssertionError("Torch not compiled with CUDA enabled")
    167         if _cudart is None:
    168             raise AssertionError(

AssertionError: Torch not compiled with CUDA enabled

A third instance of the same issue
CLX_Supervised_Asset_Classification

cat_cols.remove("label")
ac.train_model(X_train, cat_cols, cont_cols, "label", batch_size, epochs, lr=0.01, wd=0.0)

After discovering the error below, I tried to remedy it by reinstalling PyTorch with CUDA enabled using the same installation process.

Identical Error thrown below:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-17-3cbe42caeeeb> in <module>
      1 cat_cols.remove("label")
----> 2 ac.train_model(X_train, cat_cols, cont_cols, "label", batch_size, epochs, lr=0.01, wd=0.0)

/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/analytics/asset_classification.py in train_model(self, train_gdf, cat_cols, cont_cols, label_col, batch_size, epochs, lr, wd)
     96 
     97         self._model = TabularModel(embedding_sizes, n_cont, out_sz, self._layers, self._drops, self._emb_drop, self._is_reg, self._is_multi, self._use_bn)
---> 98         self._to_device(self._model, self._device)
     99         self._config_optimizer()
    100         for i in range(epochs):

/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/analytics/asset_classification.py in _to_device(self, data, device)
    264         if isinstance(data, (list, tuple)):
    265             return [self._to_device(x, device) for x in data]
--> 266         return data.to(device, non_blocking=True)

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)
    610             return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
    611 
--> 612         return self._apply(convert)
    613 
    614     def register_backward_hook(

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    357     def _apply(self, fn):
    358         for module in self.children():
--> 359             module._apply(fn)
    360 
    361         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    357     def _apply(self, fn):
    358         for module in self.children():
--> 359             module._apply(fn)
    360 
    361         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    379                 # `with torch.no_grad():`
    380                 with torch.no_grad():
--> 381                     param_applied = fn(param)
    382                 should_use_set_data = compute_should_use_set_data(param, param_applied)
    383                 if should_use_set_data:

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in convert(t)
    608             if convert_to_format is not None and t.dim() == 4:
    609                 return t.to(device, dtype if t.is_floating_point() else None, non_blocking, memory_format=convert_to_format)
--> 610             return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
    611 
    612         return self._apply(convert)

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/cuda/__init__.py in _lazy_init()
    164                 "Cannot re-initialize CUDA in forked subprocess. " + msg)
    165         if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 166             raise AssertionError("Torch not compiled with CUDA enabled")
    167         if _cudart is None:
    168             raise AssertionError(

AssertionError: Torch not compiled with CUDA enabled

Example # 4
DGA_Detection

%%time
dd.train_model(train_data, labels, batch_size=BATCH_SIZE, epochs=EPOCHS, train_size=0.7)

Where I received a RuntimeError reporting that 'aten::empty.memory_format' is not available for the CUDA backend.

Error thrown below:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<timed eval> in <module>

/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in train_model(self, train_data, labels, batch_size, epochs, train_size, truncate)
    110                     types_tensor = self._create_types_tensor(df["type"])
    111                     df = df.drop(["type", "domain"], axis=1)
--> 112                     input, seq_lengths = self._create_variables(df)
    113                     model_result = self.model(input, seq_lengths)
    114                     loss = self._get_loss(model_result, types_tensor)

/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in _create_variables(self, df)
    182         df = df.drop("len", axis=1)
    183         seq_len_tensor = torch.LongTensor(seq_len_arr)
--> 184         seq_tensor = self._df2tensor(df)
    185         # Return variables
    186         # DataParallel requires everything to be a Variable

/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in _df2tensor(self, ascii_df)
    195         """
    196         dlpack_ascii_tensor = ascii_df.to_dlpack()
--> 197         seq_tensor = from_dlpack(dlpack_ascii_tensor).long()
    198         return seq_tensor
    199 

RuntimeError: Could not run 'aten::empty.memory_format' with arguments from the 'CUDA' backend. 'aten::empty.memory_format' is only available for these backends: [CPU, MkldnnCPU, SparseCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
MkldnnCPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/MkldnnCPUType.cpp:144 [kernel]
SparseCPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/SparseCPUType.cpp:239 [kernel]
BackendSelect: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/BackendSelectRegister.cpp:761 [kernel]
Named: registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradCPU: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradCUDA: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradXLA: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse1: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse2: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse3: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
Tracer: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/TraceType_4.cpp:9291 [kernel]
Autocast: fallthrough registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

A second instance of a similar issue
custream_n_graph

output = source.map(process_batch).map(pagerank).sink_to_list()

Which causes the Kernel to restart.

A third instance of a similar issue
Phishing_Detection_using_Bert_CLX

seq_classifier.train_model(X_train["email"], y_train, epochs=1)

Which causes the Kernel to restart.

A fourth instance of a similar issue
pii_detection_training_example

pii_detection_training_example

Which causes the Kernel to restart.

Example # 5
LODA_anomaly_detection

import cupy as cp 
import cudf, cuml 
import matplotlib.pylab as plt 
import cuml.metrics as mt
try:
        import wget; print('wget Version:', wget.__version__)  
except ModuleNotFoundError:
        !conda install -c conda-forge wget -y
        import wget; print('wget Version:', wget.__version__)
#import wget
import s3fs;
from os import path;
%matplotlib inline 

from clx.analytics.loda import Loda

Where wget is not recognized

Error thrown below:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-444183b1dcb0> in <module>
      5 try:
----> 6         import wget; print('wget Version:', wget.__version__)
      7 except ModuleNotFoundError:

ModuleNotFoundError: No module named 'wget'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-444183b1dcb0> in <module>
      7 except ModuleNotFoundError:
      8         get_ipython().system('conda install -c rapidsai wget -y')
----> 9         import wget; print('wget Version:', wget.__version__)
     10 #import wget
     11 import s3fs;

ModuleNotFoundError: No module named 'wget'
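
A minimal sketch for confirming which optional modules are importable before the notebook runs. module_available is a hypothetical helper built on the standard importlib.util.find_spec. Note that the cell above installs from conda-forge while the traceback shows the rapidsai channel, and it is an assumption on my part that the conda package named wget provides the command-line tool rather than the Python module, which would explain why the import still fails after the install.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


# Modules the LODA notebook imports outside the core RAPIDS stack.
for name in ("wget", "s3fs"):
    state = "available" if module_available(name) else "missing"
    print(f"{name}: {state}")
```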

Example # 6
FLAIR_DNS_Log_Parsing

data1 = cudf.read_csv('query_output1545120200000_1545163200000.tab', sep='\t',nrows=500000, quoting=3)

Where 'query_output1545120200000_1545163200000.tab' is not available

Error thrown below:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-a47cb0939166> in <module>
----> 1 data1 = cudf.read_csv('query_output1545120200000_1545163200000.tab', sep='\t',nrows=500000, quoting=3)

/opt/conda/envs/rapids/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76 

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/csv.py in read_csv(filepath_or_buffer, lineterminator, quotechar, quoting, doublequote, header, mangle_dupe_cols, usecols, sep, delimiter, delim_whitespace, skipinitialspace, names, dtype, skipfooter, skiprows, dayfirst, compression, thousands, decimal, true_values, false_values, nrows, byte_range, skip_blank_lines, parse_dates, comment, na_values, keep_default_na, na_filter, prefix, index_col, **kwargs)
    108         na_filter=na_filter,
    109         prefix=prefix,
--> 110         index_col=index_col,
    111     )
    112 

cudf/_lib/csv.pyx in cudf._lib.csv.read_csv()

FileNotFoundError: [Errno 2] No such file or directory: 'query_output1545120200000_1545163200000.tab'

A second instance of the same issue
IDS_using_LODA

dir_path = "put/path/extracted/cic_ids2017/"
datasets = os.listdir(dir_path)

Error thrown below:

---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-2268e14fff44> in <module>
      1 dir_path = "put/path/extracted/cic_ids2017/"
----> 2 datasets = os.listdir(dir_path)

FileNotFoundError: [Errno 2] No such file or directory: 'put/path/extracted/cic_ids2017/'
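
The placeholder path above has to be edited by hand before the notebook runs. One option is to read the location from an environment variable and fail with an explicit message. This is a minimal sketch: dataset_dir and the variable name CLX_DATA_DIR are hypothetical and not something the notebooks define.

```python
import os
from pathlib import Path


def dataset_dir(env_var="CLX_DATA_DIR", default="put/path/extracted/cic_ids2017/"):
    """Resolve the dataset directory from an environment variable, else the default."""
    path = Path(os.environ.get(env_var, default)).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(
            f"Dataset directory {str(path)!r} does not exist; "
            f"set {env_var} or edit the path in the notebook."
        )
    return path


try:
    datasets = sorted(os.listdir(dataset_dir()))
except FileNotFoundError as exc:
    print(exc)
```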

Desired outcome
CLX notebooks should run as published so they can be replicated and adapted with little effort. Notebooks should be updated to reflect the commits made to the repositories during each release cycle. CLX functions and models should work as expected.

Request impacts
Our CLX notebooks are public and require accurate information - Medium Priority

@taureandyernv @fondaing @efajardo-nv @bsuryadevara for awareness


github-actions · 2021-11-23T20:02:57Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions · 2022-02-21T21:02:52Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

Nicholas-7 added the bug Something isn't working label Oct 18, 2021

Nicholas-7 changed the title ~~[Draft] 21.10 Notebook Testing Report~~ [WIP] 21.10 Notebook Testing Report Oct 18, 2021

github-actions bot added the inactive-30d label Nov 23, 2021

github-actions bot added the inactive-90d label Feb 21, 2022
