revisions

SaniyaKhullar · May 1, 2024 · bb9eae8
1 parent 9b95b43
commit bb9eae8

Showing 14 changed files with 5,994 additions and 945 deletions.
diff --git a/README.md b/README.md
@@ -147,12 +147,15 @@ $$ -->
 | **edge_list**      | ***list*** <br> A list of lists corresponding to a prior network involving predictors (nodes) and relationships among them (edges): <br> [[source<sub>1</sub>, target<sub>1</sub>, weight<sub>1</sub>], ..., [source<sub>Z</sub>, target<sub>Z</sub>, weight<sub>Z</sub>]]. Here, weight<sub>1</sub>, ..., weight<sub>Z</sub> are optional. Nodes found in the `edge_list` are referred to as *network nodes* | 
 | **beta_net** | ***float, default = 1*** <br> Regularization parameter for network penalization: $\beta_{net} \geq 0$. | 
 | **alpha_lasso**  | ***float, default = 0.01*** <br> A numerical regularization parameter for the lasso term ($\alpha_{lasso} \geq 0$) needed if `model_type = LassoCV`. Larger values typically reduce the number of final predictors in the model. |
-| **default_edge_weight**  | ***float, default = 0.1*** <br>  Default edge weight ($w$) assigned to any edge with missing weight | 
-| **degree_threshold**  |  ***float, default = 0.5*** <br>  Edges with weight $w$ > degree_threshold are counted as 1 towards the node degree $d$ |
+| **default_edge_weight**  | ***float, default = 0.01*** <br>  Default edge weight ($w$) assigned to any edge with missing weight | 
+| **edge_vals_for_d**  | ***boolean, default = True*** <br>  If True, the node degree $d$ is calculated by summing the weights of the edges connected to the node | 
+| **w_transform_for_d**  |  ***string, default = "none"*** <br>  Other options are "sqrt", "square", "avg". Here, "none" means we add the weights of the edges that are connected to the node to get the node degree. These other options represent transformations that can be done on this sum to yield various other node degree values $d$ |
+| **degree_threshold**  |  ***float, default = 0.5*** <br>  If `edge_vals_for_d` is False, then edges with weight $w$ > degree_threshold are counted as 1 towards the node degree $d$ |
 | **gene_expression_nodes**      | ***list, default = []*** <br> A list of predictors (e.g. TFs) to use that typically is found as columns in the training gene expression data $X_{train}$. <br> Any `gene_expression_nodes` not found in the `edge_list` are added internally into the network prior `edge_list` using pairwise `default_edge_weight`. Specifying `gene_expression_nodes` is *optional* but may boost the speed of training and fitting NetREm models (by adjusting the network prior in the beginning). Thus, if the gene expression data ($X$) is available, it is recommended to input `gene_expression_nodes`. Otherwise, NetREm  automatically determines `gene_expression_nodes` when fitting the model with $X_{train}$ gene expression data (when *fit(X,y)* is called), but needs time to recalibrate the network prior based on $X_{train}$ nodes and value set for `overlapped_nodes_only`. |
 | **overlapped_nodes_only**      | ***boolean, default = False*** <br> This determines if NetREm should focus on common nodes found in *network nodes* (from `edge_list`) and gene expression data (based on `gene_expression_nodes`). Here, *network nodes* not found in the gene expression data will always be removed. The priority is given to `gene_expression_nodes` since those have gene expression values that are used by the regression. <br> • If `overlapped_nodes_only = False`, the predictors will come from `gene_expression_nodes`, even if those are not found in the network `edge_list`. Some predictors may lack relationships in the prior network. <br> • If `overlapped_nodes_only = True`, the predictors used will need to be a common node: *network node* also found in the `gene_expression_nodes`. <br> See [overlapped_nodes_only.pdf](https://github.com/SaniyaKhullar/NetREm/blob/main/user_guide/overlapped_nodes_only.pdf) for hands-on examples. | 
-| **standardize_X**      | ***boolean, default = True*** <br> This determines if NetREm should standardize $X$: subtracting the mean of $X$ and dividing by the standard deviation of $X$ using the training data. <br> | 
-| **center_y**      | ***boolean, default = True*** <br> This determines if NetREm should center $y$: subtracting the mean of $y$ based on the training data  <br> | 
+| **standardize_X**      | ***boolean, default = True*** <br> This determines if NetREm should standardize $X$, for each predictor column: subtracting the mean of $X$ and dividing by the standard deviation of $X$ using the training data. <br> | 
+| **standardize_y**      | ***boolean, default = True*** <br> This determines if NetREm should standardize $y$: subtracting the mean of $y$ and dividing by the standard deviation of $y$ using the training data. <br> | 
+| **center_y**      | ***boolean, default = False*** <br> This determines if NetREm should center $y$: subtracting the mean of $y$ based on the training data  <br> | 
 | **y_intercept** | ***boolean, default = 'False'*** <br> This is the `fit_intercept` parameter found in the [Lasso](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html) and [LassoCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html) classes in sklearn. <br> • If `y_intercept = True`, the model will be fit with a y-intercept term included. <br> • If `y_intercept = False`, the model will be fit with no y-intercept term. | 
 | **view_network**  |  ***boolean, default = False*** <br>  • If `view_network = True`, then NetREm outputs visualizations of the prior graph network. Recommended for small networks (instead of for dense hairballs) <br> If `view_network = False`, then NetREm saves time by not outputting visuals of the network.  |
 | **model_type** | ***{'Lasso', 'LassoCV'}, default = 'Lasso'*** <br> • Lasso: user specifies value of $\alpha_{lasso}$ <br> • LassoCV: NetREm performs cross-validation (CV) on training data to determine optimal $\alpha_{lasso}$  |
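
The interaction between `default_edge_weight`, `edge_vals_for_d`, `w_transform_for_d`, and `degree_threshold` described in the table rows above can be sketched in plain Python. This is only an illustration of the documented behavior, not NetREm's actual code: the function name `node_degrees` is hypothetical, "avg" is assumed to mean the weight sum divided by the number of edges touching the node, and the transform is assumed to apply only when `edge_vals_for_d` is True.

```python
import math


def node_degrees(edge_list, default_edge_weight=0.01, edge_vals_for_d=True,
                 w_transform_for_d="none", degree_threshold=0.5):
    """Hypothetical helper (not part of NetREm): node degree d per node.

    edge_list entries are [source, target] or [source, target, weight];
    a missing weight falls back to default_edge_weight.
    """
    totals = {}   # per-node running sum (of weights, or of 0/1 counts)
    n_edges = {}  # per-node number of incident edges, used for "avg"

    for edge in edge_list:
        source, target = edge[0], edge[1]
        weight = edge[2] if len(edge) > 2 else default_edge_weight
        if edge_vals_for_d:
            contribution = weight  # sum the raw edge weights
        else:
            # count the edge as 1 only if its weight exceeds the threshold
            contribution = 1.0 if weight > degree_threshold else 0.0
        for node in (source, target):
            totals[node] = totals.get(node, 0.0) + contribution
            n_edges[node] = n_edges.get(node, 0) + 1

    if not edge_vals_for_d:
        return totals  # assumption: no transform on thresholded counts

    # Assumed meaning of each w_transform_for_d option.
    if w_transform_for_d == "none":
        return totals
    if w_transform_for_d == "sqrt":
        return {node: math.sqrt(s) for node, s in totals.items()}
    if w_transform_for_d == "square":
        return {node: s ** 2 for node, s in totals.items()}
    if w_transform_for_d == "avg":
        return {node: s / n_edges[node] for node, s in totals.items()}
    raise ValueError(f"unknown w_transform_for_d: {w_transform_for_d!r}")


# The third edge has no weight, so it receives default_edge_weight.
edges = [["A", "B", 0.8], ["A", "C", 0.2], ["B", "C"]]
summed = node_degrees(edges)
averaged = node_degrees(edges, w_transform_for_d="avg")
counted = node_degrees(edges, edge_vals_for_d=False)
```

With these example edges, node `A` gets a summed degree of 0.8 + 0.2, while the thresholded variant counts only the 0.8 edge for `A` and `B` and leaves `C` at zero.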