-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add news.md support #33
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Suggested a refactor of merge_text_whitespaces
that hopefully cuts down on the different conditional paths through the
R/rd2markdown.R
Outdated
#' @param level optional level parameter. 2L by default | ||
#' @param child_level optional parameter that will be set as level to | ||
#' "child" tags parsed using `map_rd2markdown`. It allows proper rendering | ||
#' of sections and subsections. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need both a level
and a child_level
? I would probably lean toward getting rid of child_level
and just calling map_rd2markdown
with level = level + 1
in the one place it's used.
R/rd2markdown.R
Outdated
rd2markdown.description <- function(x, fragments = c(), ..., title = NULL, level = 2L, child_level = level) { | ||
out <- map_rd2markdown(x, ..., collapse = "", level = child_level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rd2markdown.description <- function(x, fragments = c(), ..., title = NULL, level = 2L, child_level = level) { | |
out <- map_rd2markdown(x, ..., collapse = "", level = child_level) | |
rd2markdown.description <- function(x, fragments = c(), ..., title = NULL, level = 2L) { | |
out <- map_rd2markdown(x, ..., collapse = "", level = level + 1L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now when I think about it I had not idea why I introduced this param in the first place. Probably was thinking about something different and overdone it. Removed
R/map_rd2markdown.R
Outdated
merge_text_whitespaces <- function(x) { | ||
ws <- vapply(x, function(y) grepl("^ *$", y), FUN.VALUE = logical(1)) | ||
ib <- vapply(x, function(y) is_block(y), FUN.VALUE = logical(1)) | ||
ws <- ws & !ib | ||
|
||
seqs <- rle(ws) | ||
values <- seqs$values | ||
lengths <- seqs$lengths | ||
|
||
merged <- list() | ||
i <- 1 | ||
while (i <= length(values)) { | ||
|
||
# Current value was already included | ||
if (lengths[i] == 0) { | ||
i <- i + 1 | ||
next() | ||
} | ||
# Get index corresponding to end of current's value subsequence | ||
ws_inds <- sum(lengths[seq(i)]) | ||
# Get subvector of the same value of ws | ||
x_sub <- x[seq(ws_inds - lengths[i] + 1, ws_inds)] | ||
|
||
if (!values[i]) { | ||
merged <- append(merged, x_sub) | ||
} else { | ||
merged_ws <- paste0(x_sub, collapse = "") | ||
if (isFALSE(values[i + 1]) && !is_block(x[[ws_inds + 1]])) { | ||
merged_ws <- paste0(merged_ws, x[[ws_inds + 1]], collapse = "") | ||
# We already include one of the next FALSE, therefore we reduce | ||
# corresponding lengths value by 1 and increment current one | ||
lengths[i + 1] <- lengths[i + 1] - 1 | ||
lengths[i] <- lengths[i] + 1 | ||
merged <- append(merged, list(merged_ws)) | ||
# If next element is a block, we will try to append to the previous. | ||
} else if (isFALSE(values[i - 1])) { | ||
# Previous value was FALSE, therefore we can use last value in | ||
# merged to append to unless it is block. | ||
if (!is_block(merged[[length(merged)]])) { | ||
merged_ws <- paste0(merged[[length(merged)]], merged_ws, collapse = "") | ||
merged[[length(merged)]] <- merged_ws | ||
} else { | ||
# Previous value was a block. Merge standalone space TEXT | ||
merged_ws <- paste0(x_sub, collapse = "") | ||
merged <- append(merged, list(merged_ws)) | ||
} | ||
} else { | ||
# Entire x consists of ws only | ||
merged_ws <- paste0(x_sub, collapse = "") | ||
merged <- append(merged, merged_ws) | ||
} | ||
} | ||
|
||
i <- i + 1 | ||
} | ||
|
||
merged | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels more complicated than I think it needs to be. Instead, we can figure out which bits need to be merged, then paste them together instead of scanning through with many different conditional behaviors.
Note: This isn't a drop-in replacement. Instead of [" ", "a", " ", "b", " "] => [" a", " b "]
as it is now, this will produce [" a ", "b "]
. This only affects the merge_text_whitespaces
tests, though.
merge_text_whitespaces <- function(x) { | |
ws <- vapply(x, function(y) grepl("^ *$", y), FUN.VALUE = logical(1)) | |
ib <- vapply(x, function(y) is_block(y), FUN.VALUE = logical(1)) | |
ws <- ws & !ib | |
seqs <- rle(ws) | |
values <- seqs$values | |
lengths <- seqs$lengths | |
merged <- list() | |
i <- 1 | |
while (i <= length(values)) { | |
# Current value was already included | |
if (lengths[i] == 0) { | |
i <- i + 1 | |
next() | |
} | |
# Get index corresponding to end of current's value subsequence | |
ws_inds <- sum(lengths[seq(i)]) | |
# Get subvector of the same value of ws | |
x_sub <- x[seq(ws_inds - lengths[i] + 1, ws_inds)] | |
if (!values[i]) { | |
merged <- append(merged, x_sub) | |
} else { | |
merged_ws <- paste0(x_sub, collapse = "") | |
if (isFALSE(values[i + 1]) && !is_block(x[[ws_inds + 1]])) { | |
merged_ws <- paste0(merged_ws, x[[ws_inds + 1]], collapse = "") | |
# We already include one of the next FALSE, therefore we reduce | |
# corresponding lengths value by 1 and increment current one | |
lengths[i + 1] <- lengths[i + 1] - 1 | |
lengths[i] <- lengths[i] + 1 | |
merged <- append(merged, list(merged_ws)) | |
# If next element is a block, we will try to append to the previous. | |
} else if (isFALSE(values[i - 1])) { | |
# Previous value was FALSE, therefore we can use last value in | |
# merged to append to unless it is block. | |
if (!is_block(merged[[length(merged)]])) { | |
merged_ws <- paste0(merged[[length(merged)]], merged_ws, collapse = "") | |
merged[[length(merged)]] <- merged_ws | |
} else { | |
# Previous value was a block. Merge standalone space TEXT | |
merged_ws <- paste0(x_sub, collapse = "") | |
merged <- append(merged, list(merged_ws)) | |
} | |
} else { | |
# Entire x consists of ws only | |
merged_ws <- paste0(x_sub, collapse = "") | |
merged <- append(merged, merged_ws) | |
} | |
} | |
i <- i + 1 | |
} | |
merged | |
} | |
merge_text_whitespaces <- function(x) { | |
if (length(x) == 0) return(x) | |
# assumes length x > 0 | |
spaces <- grepl("^ *$", x, perl = TRUE) | |
is_trailing <- cumsum(!spaces) > 1 # after at least 1 non-space character | |
blocks <- vlapply(x, is_block) | |
groups <- cumsum(!spaces & is_trailing | blocks) | |
# collapse non-block groups, maintain first-in-group's attributes | |
unname(lapply(split(x, groups), function(group) { | |
if (is_block(group[[1]])) { | |
group[[1]] | |
} else { | |
atts <- attributes(group[[1]]) | |
result <- paste(group, collapse = "") | |
attributes(result) <- atts | |
result | |
} | |
})) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is much simpler! I feel a little bit miserable now :P
There are a couple of bugs, however (for instance if the string starts with a block or pearl matching). I will fix them in your version and prompt re-review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have sligthly refactored your approach, by adding another split logic. The problem with blocks was that they are not only non-space characters but also characters to which we cannot append. That's why it would be really hard to follow your strategy to append always after non-block group. It was leading to skipping first non-space character after block and all space character after block resulting in edge cases like list(block(), " ", " ", block())
-> list(block(), block())
which is something we want to avoid. That's why there were so many edge cases in my original solution. Therefore I simply combined both solutions and removed the problem by doing additional split on each block. Now each sub list is treated (using your brilliant idea) as there were no blocks and everything is merged as expected. Then the blocks and list are pasted back do get the desired result. I think this is a good compromise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. This might keep the simplicity but resolve these cases:
merge_text_spaces <- function(x) {
if (length(x) == 0) return(x)
# assumes length x > 0
spaces <- grepl("^ *$", x)
blocks <- vlapply(x, is_block)
lag_blocks <- c(FALSE, blocks[-length(x)])
is_trailing_after_block <- (cumsum(!spaces) - cumsum(cumsum(!spaces) * blocks)) > 1
groups <- cumsum(!spaces & is_trailing_after_block | blocks | lag_blocks)
# collapse non-block groups, maintain first-in-group's attributes
unname(lapply(split(x, groups), function(group) {
if (is_block(group[[1]])) {
group[[1]]
} else {
atts <- attributes(group[[1]])
result <- paste(group, collapse = "")
attributes(result) <- atts
result
}
}))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_trailing_after_block <- (cumsum(!spaces) - cumsum(cumsum(!spaces) * blocks)) > 1
although it's a really smart neat solution, is it simpler? There is a lot of happening in this line and might be hard to unpack for someone reading this code in the future (and this is an open source). To be honest I'm leaning towards just splitting the list on each block and treating strings as if there were no blocks and merging afterwards. I think might be simpler to work around this in the future. What's your opinion on this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. I still prefer this style, but maybe we can break it down a bit so it's more clear. What do you think about this as a more annotated form of the same code:
# assumes length x > 0
spaces <- grepl("^ *$", x)
is_text <- !spaces
blocks <- vlapply(x, is_block)
lag_blocks <- c(FALSE, blocks[-length(x)])
# find cells within each block that follow at least one non-space element
block_text_start <- cumsum(is_text) * blocks
block_text_index <- cumsum(is_text) - cumsum(block_text_start)
is_block_first_text <- block_text_index <= 1
# determine which spans of elements need to be flattened
groups <- cumsum(is_text & !is_block_first_text | blocks | lag_blocks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks so much better. Let me test it against some of the edge-cases but just by the look of it it is promising.
@maksymiuks I tried looking into this issue with the excessive newlines between block regions, but I think it's an issue in the current code and not related to this PR. From what I can tell, when a This might just be more visible now that the subsections are handled. In the spirit of keeping this PR minimal and to get us unstuck for the |
This was a wild journey through white space management. After adding the subsection tag (5min) I discovered that sometimes we might get excessive spaces from the Rd parsing. In most cases, it is not a problem, but sometimes it could get appended to the section definition and this breaks the rendering. And then I went down that rabbit hole (countless hours) trying to fix it. My answer to that is general and I believe it will allow us to avoid many other bugs in the future. If there are TEXT tags with standalone spaces (
" "
) (not white spaces, just spaces, every other sign is fine) I merge them to the nearest meaningful TEXT tags. This way if they are excessive, they are properly pruned further by theclean_text_whitespace
function, which upon creation didn't account for possible standalone spaces, instead of being pasted to the output. I believe it solves this and many issues that might occur in the future spanning from these excessive spaces.Additionally, it is totally possible I overdone it and there is a simple solution, so please be critical, I really want a second pair of eyes on it. It was a good lesson anyway. I appreciate it might be hard to get a grasp of these changes without a proper context so wd can hop in on a call once you are bag @dgkf