Markdown Embedding
Add support for reading markdown files that contain embedded dftxt data within one or more fenced code blocks.
sernst committed Jan 24, 2024
1 parent 78c0f8c commit 993235a
Showing 16 changed files with 579 additions and 37 deletions.
24 changes: 24 additions & 0 deletions README.md
@@ -67,6 +67,16 @@ Clarissa Dalloway Mrs. Dalloway 1925
Toad The Wind & the Willow 1906
```

It's also possible to embed dftxt into Markdown files with fenced code blocks that use
the `df` or `dftxt` language identifier. Multiple fenced code blocks are collectively
extracted into the loaded DataFrame, so a table can be split across several blocks with
commentary written between them.

For examples of what that looks like, see:

- [Markdown with dftxt Example](./dftxt/tests/_io/_markdown/scenarios/multiple_frames/source.md)
- [Single DataFrame broken out across multiple blocks](./dftxt/tests/_io/_markdown/scenarios/single_frame/source.md)

## Benefits

The benefits of the dftxt DataFrame serialization format include:
@@ -194,6 +204,20 @@ Euro Zone Euro 0.924 0.951 0.846 0.877 0.893
# The values here are yearly average currency exchange rates converting into USD.
```

#### Embedded in Markdown

Markdown is a widely used format for human-readable documentation that also renders
nicely in IDEs and code collaboration tools. As such, dftxt supports embedding dftxt data
in Markdown as fenced code blocks (opened with triple backticks or triple tildes) whose
opening fence is followed by `df` or `dftxt`. A single Markdown file can hold multiple
DataFrames, and a single DataFrame can be broken up across several fenced code blocks so
that commentary can be placed between them where desirable.

For examples of what that looks like, see:

- [Markdown with dftxt Example](./dftxt/tests/_io/_markdown/scenarios/multiple_frames/source.md)
- [Single DataFrame broken out across multiple blocks](./dftxt/tests/_io/_markdown/scenarios/single_frame/source.md)

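As a rough illustration of the behavior described above (a simplified sketch, not dftxt's own parser or public API), the snippet below joins the bodies of every fenced block tagged `df` or `dftxt` in document order and skips everything else, including prose and fences with other language identifiers. The function name `collect_df_blocks` and the sample rows are hypothetical, and the sketch ignores block names and the `...` continuation marker that the implementation in this commit also recognizes.

```python
import re
import typing

# Hypothetical, simplified pattern (not dftxt's API): an opening ``` or ~~~ fence
# tagged df/dftxt at the start of a line, the block body, then a closing fence of
# the same kind on its own line.
_BLOCK = re.compile(
    r"^(?P<fence>```|~~~)(?:dftxt|df)\b[^\n]*\n(?P<body>.*?)^(?P=fence)[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def collect_df_blocks(markdown: str) -> str:
    """Join the bodies of all df/dftxt fenced blocks in document order."""
    bodies: typing.List[str] = [m.group("body") for m in _BLOCK.finditer(markdown)]
    return "".join(bodies)


# One table split across two blocks, with commentary in between.
doc = "\n".join(
    [
        "# Books",
        "",
        "```df",
        "name year",
        "Toad 1906",
        "```",
        "",
        "Commentary between blocks is ignored.",
        "",
        "```df",
        "Clarissa 1925",
        "```",
    ]
)
print(collect_df_blocks(doc))
```

Because the closing fence must repeat the opening one (`(?P=fence)`), a `~~~` block in this sketch can contain a literal line of triple backticks without ending early.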
### 3. Diff/Code Review Friendly

The benefits of the dftxt file format that make it human-friendly are also what make it
75 changes: 75 additions & 0 deletions dftxt/_io/_markdown.py
@@ -0,0 +1,75 @@
import re
import typing

_START_FENCE_PATTERN = re.compile(
    r"(^|\n)(?P<fence>(```|~~~))(?P<type>(dftxt|df))[ \t]*(?P<args>[^\n]*)\n"
)
_END_FENCE_PATTERN = re.compile(r"\n(```|~~~)")


def _parse_args(args: str) -> typing.Dict[str, str]:
    """Parse the arguments of a dftxt fence."""
    exploded = [a.strip() for a in re.split(r"\s+", args)]
    if not exploded:
        return {"name": "", "action": "append"}

    if exploded[0] == "...":
        return {"name": "", "action": "wrap"}

    name = exploded[0]
    if name.endswith("..."):
        name = name[:-3]
        action = "wrap"
    else:
        action = "append"
    return {"name": name, "action": action}


def _combine_frame_sections(frame_sections: typing.Dict[str, typing.List[str]]) -> str:
    """Combine the extracted frame sections into a dftxt file."""
    keys = list(frame_sections.keys())
    if not keys:
        return ""

    if len(keys) == 1 and keys[0] == "":
        return "{}\n".format("\n".join(frame_sections[""]).rstrip())

    frames: typing.List[str] = []
    for key in keys:
        header = "{}---".format("" if not frames else "\n")
        if key:
            header += f" {key} ---"
        frames.append(
            "{}\n\n{}".format(header, "\n".join(frame_sections[key]).rstrip("\n"))
        )

    return "{}\n".format("\n".join(frames).rstrip())


def extract(markdown: str) -> str:
    """Extract dftxt from a markdown file."""
    cleaned = markdown.replace("\r", "")
    offset = 0
    frame_sections: typing.Dict[str, typing.List[str]] = {}
    while offset < len(cleaned):
        opening_match = _START_FENCE_PATTERN.search(cleaned, offset)
        if opening_match is None:
            offset = len(cleaned)
            break

        ending_match = _END_FENCE_PATTERN.search(cleaned, opening_match.end())
        if ending_match is None:
            offset = len(cleaned)
            break

        offset = ending_match.end()

        args = _parse_args(opening_match.group("args"))
        section = cleaned[opening_match.end() : ending_match.start()]
        if args["action"] == "wrap":
            section = f"\n\n{section.strip()}"
        if args["name"] not in frame_sections:
            frame_sections[args["name"]] = []
        frame_sections[args["name"]].append(section)

    return _combine_frame_sections(frame_sections)
