- Performance improvements in extracting bold and italic nodes. (#133)
- Performance improvements in `__setitem__`/`__delitem__` and `pformat`/`plain_text` methods. (#131)
- Fixed a bug in `plain_text` causing `IndexError` when using a custom function to replace `templates`/`parser_functions`.
- Fixed a bug in `plain_text` not detecting images with multiple dots correctly. (#129)
- Fixed: Equal signs in extension tag attributes are no longer confused with name-value separator in arguments. (#128)
- Fixed a bug in `plain_text`. (#126)
- Fixed another bug in parsing tables that end without a `|}` mark. (#125)
- Fixed a bug in parsing tables that end without a `|}` mark. (#124)
- Fixed: regression in `plain_text` not being able to handle wikilinks containing only a fragment/anchor and no title.
- `plain_text` method now uses a more accurate image-detection algorithm.
- Fixed and improved handling of tables and images in `plain_text`. (#122)
- Added: `top_levels_only` argument to `get_sections`.
- Deprecated: Calling `get_sections` with positional arguments is now deprecated.
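A positional-argument deprecation like the one above is often implemented by accepting `*args`, warning, and mapping the value onto its keyword. The sketch below assumes that approach. `get_sections_sketch` is a hypothetical stand-in, not the library's code, and only the parameter names come from this changelog.

```python
import warnings


def get_sections_sketch(*args, include_subsections=True, top_levels_only=False):
    """Hypothetical stand-in showing a positional-argument deprecation."""
    if args:
        # Old-style positional call: warn, then map the value onto its keyword.
        warnings.warn(
            'positional arguments are deprecated; use keyword arguments',
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        include_subsections = args[0]
    return {
        'include_subsections': include_subsections,
        'top_levels_only': top_levels_only,
    }


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    old_style = get_sections_sketch(False)  # deprecated positional call
    new_style = get_sections_sketch(include_subsections=False)
```

Both calls return the same settings; only the positional one records a `DeprecationWarning`.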
- Fixed some bugs in `plain_text` method. (#119, #120)
- Fixed bug in `get_tags`. (#121)
- Fixed a bug in `WikiText.external_links` not detecting external links inserted via overwriting a template string. (#74)
- The following already deprecated functions/parameters are removed:
  - Setting `Parameter.default` to `None` is not possible anymore. Use `del Parameter.default` instead.
  - The default value for `preserve_spacing` parameter of `Template.set_arg` is now `False`. (It was deprecated to call this method without providing a value for `preserve_spacing`.)
  - The `pattern` parameter of `WikiList.sublists`, `WikiList.get_lists` and `WikiText.get_lists` cannot be `None` anymore. Use the default value instead.
  - `WikiText.lists` and `WikiText.tags` are removed. Use `get_lists` or `get_tags` instead.
- Fixed a bug in `plain_text()`/`remove_markup`, which could not handle tables with row/colspan. (#116)
- `plain_text()` will now include table captions.
- Fixed a syntax error for Python < 3.10.
- BREAKING CHANGE: dropping Python 3.6 support.
- Fixed error in getting `plain_text()` of emptied-out wikitext (#113)
- Deprecated: Calling `Template.set_arg()` without specifying a value for `preserve_spacing` parameter is deprecated. This is a temporary warning in preparation for changing the default value of this parameter from `True` to `False`. (#111)
- Fixed the `stacklevel` of warnings.
- New feature: `plain_text()` replaces wiki-tables with a TSV string. (#115)
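For readers unfamiliar with TSV, here is a minimal sketch of turning table rows into a tab-separated string with the standard library. The helper `rows_to_tsv` is hypothetical. The entry above does not state the library's exact output (quoting, trailing newline), so treat this only as an illustration of the format.

```python
import csv
import io


def rows_to_tsv(rows):
    """Serialize a list of rows (lists of cell strings) as TSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()


tsv = rows_to_tsv([['Name', 'Year'], ['Python', '1991']])
```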
- Fixed a bug in detecting reverse pipe tricks as wikilinks.
- Fixed a bug in `WikiText.external_links` causing external links within extension tags (e.g. ref tag) not to be detected when tag is inside a template/parser function/parameter. (#110)
- `WikiText.get_lists` now correctly detects lists with a missing level (#70)
- `WikiList.sublists` are now returned in sorted order.
- Fixed a bug in `WikiText.pformat` which used to cause `IndexError` on a parser function which had no argument, e.g. for `{{FULLPAGENAMEE}}`.
- Feat: `Table` objects now have `row_attrs` property.
- Fixed: Infinite loop on parsing tables containing `\r`. (This is just to prevent an infinite loop; CRLF line endings are not supported.)
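Because CRLF line endings are not supported, callers with Windows-style input may want to normalize it before parsing. A minimal sketch follows. `normalize_newlines` is a hypothetical helper, not part of the library.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    return text.replace('\r\n', '\n').replace('\r', '\n')


table_source = normalize_newlines('{|\r\n| a\r\n|}')
```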
- Fixed: Handle empty tables instead of raising IndexError. (#107)
- Fixed an issue in handling of / in tags. (#108)
- Fixed a false-positive detection of invalid external links. (#109)
- Fixed an issue in `Template.normal_name()` causing IndexError on empty/invalid template names, e.g. `{{Template:}}`. (#105)
- Fixed a bug in `plain_text`/`remove_markup` causing duplicate values when replacing nested templates.
- Feature: `replace_templates` and `replace_parser_functions` parameters of `plain_text`/`remove_markup` now accept a function mapping `Template` or `ParserFunction` objects to the desired replacement string. (#103)
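The idea of a parameter that takes either a flag or a replacement function can be sketched without the library. Everything below is a toy stand-in: `strip_templates`, `FLAT_TEMPLATE`, and `keep_name` are hypothetical, and the callable here receives a regex match, whereas the feature above passes `Template` or `ParserFunction` objects.

```python
import re

# Matches only flat, non-nested templates such as {{name|arg}}.
FLAT_TEMPLATE = re.compile(r'\{\{([^{}|]*)(?:\|[^{}]*)?\}\}')


def strip_templates(text, replace_templates=True):
    """Toy stand-in: drop templates, keep them, or replace them via a callable."""
    if replace_templates is False:
        return text  # leave templates untouched
    if callable(replace_templates):
        # The callable maps each match to its replacement string.
        return FLAT_TEMPLATE.sub(replace_templates, text)
    return FLAT_TEMPLATE.sub('', text)


def keep_name(match):
    """Replace a template with its bare name."""
    return match.group(1).strip()


replaced = strip_templates('a {{tl|x}} b', keep_name)
```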
- Fixed a bug in `Tag.parsed_contents` method. (#102)
- Fixed a bug in `plain_text` method. (#101)
- Fixed a bug in `pformat` and `plain_text` methods. (#100)
- BREAKING: dropped support for Python 3.5
- Fixed: bug in handling of external links with uppercase scheme. (#99)
- Fix missing table rows after comments (#98)
- Fixed: Template titles cannot include wikilinks
- Fixed: Detection of tags within WikiLinks (#96)
- Fixed a bug in `Template.set_arg` causing duplicate values. (#97)
- Fixed problem in detecting extension tags with uppercase letters in their names (#95)
- Fixed regex requirement for Python 3.5 on Windows platform.
- Fixed handling of external links within definition lists. (#91)
- Fixed a bug in `plain_text` method, not handling self-closing tags correctly.
- Fixed a bug that was causing the parser to hang when parsing complicated nested tags.
- Fixed the order of items in `WikiList.fullitems`. (#72)
- Fixed and improved a few edge cases in `Table.caption`. (pr #81)
- Fixed handling of external links within definition lists. (pr #83)
- Fixed a bug in parsing extension tags. (#90)
- MW variables are now recognized as parser functions, not templates. (#69)
- Fixed a bug in mutation of root element when a child was mutated. (#66)
- Fixed a bug that was causing templates like `{{NAMESPACE|2}}` to be detected as a parser function. It is a template if the first argument starts with a `:`.
- Fixed bugs in detecting attributes of table cells. (#71, #73)
- Fixed a bug in detecting header cells in tables. (#77)
- Fixed a bug in `get_tags` where extension tags without attributes were not returned. (#84)
- Fixed a bug in `get_tables` method where tables within tag extensions were not recognized (#85)
- Fixed a bug in detecting parser functions without parameters. `{{NAMESPACE}}` used to be detected as a template, but `{{NAMESPACE:MediaWiki}}` as a parser function. Now both are detected as parser functions.
- Fix a bug in detecting external links within extension tags. (#65)
- Fix a few bugs in `plain_text`/`remove_markup`. (#65)
- Detect unclosed comments, e.g. `<!-- a`.
- Fix parsing priority of tag extensions and comments. For example, `<ref>b<!--c</ref>d-->` used to be parsed with `<!--c</ref>d-->` as a comment, which was incorrect.
- Fixed a catastrophic backtracking issue in parsing nested extension tags. (#60)
- Fixed a bug in `Bold.text` and `Italic.text`, failing to parse objects containing `\n`. (#61)
- Fixed a bug in parsing tags containing the `<` character. (#58)
- Updated the list of known extension tags.
- Improved detection of nested tag extensions, e.g. a `<ref>` tag within `<references>`.
- Fixed a bug in `get_bolds_and_italics` causing it to return duplicate items in some situations. This was also causing an error in `plain_text` method. (#57)
- Fixed bug in matching header cells in `Table.cells`. (#53)
- Add `Cell.is_header` property.
- Fixed a bug in detection of `Table.caption` and `Table.caption_attrs`.
- Improve the performance of `get_bolds_and_italics(recursive=True, filter_cls=None)`.
- Fix a bug in `get_bolds_and_italics(recursive=False, filter_cls=None)` which was causing it to return recursive Bold items.
- Remove the deprecated parameters of `Template.normal_name()`.
- Fix a bug in `get_bolds_and_italics()` which was causing it to return only `Bold` items.
- Fix a bug in handling of comments in template names. (#54)
- Improve the handling of weird `colspan` and `rowspan` values in tables. (#53)
- Fix a syntax error in Python 3.5.
- BREAKING CHANGE:
  - Remove `replace_bolds`/`replace_italics` params from `remove_markup`/`plain_text` methods. Users can use the new `replace_bolds_and_italics` parameter. Removing only bolds or only italics is no longer possible.
- Add `get_bolds_and_italics` as a new method.
- Fixed bugs and rewrote the algorithm for finding `Bold` and `Italic` objects. (#51)
- Trying to mutate an overwritten/detached object will now raise `DeadIndexError` (a subclass of `TypeError`). Hopefully this will prevent some subtle late-appearing bugs.
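Because `DeadIndexError` is described as a subclass of `TypeError`, existing `except TypeError` handlers will also catch it. The sketch below shows that relationship with stand-in classes. Both `DeadIndexError` as defined here and `Node` are hypothetical re-creations for illustration, not the library's code.

```python
class DeadIndexError(TypeError):
    """Stand-in for the exception described above (not the library's class)."""


class Node:
    """Hypothetical object that becomes unusable once it is detached."""

    def __init__(self, text):
        self._text = text
        self._alive = True

    def detach(self):
        self._alive = False

    def set_text(self, value):
        if not self._alive:
            raise DeadIndexError('cannot mutate a detached object')
        self._text = value


node = Node('a')
node.detach()
caught = None
try:
    node.set_text('b')
except TypeError as error:  # a TypeError handler also catches DeadIndexError
    caught = error
```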
- Fix a bug in `plain_text` method.
- Fix a bug in detection of external links in parsable tag extensions. (#50)
- Fix a bug in handling of half-marked bold/italic, e.g. `'''bold\n`.
- Fix a bug in handling of half-marked bold/italic items, e.g. `'''bold text\n`.
- Improve handling of extension tags inside external links. (#49)
- Ignore invalid attributes that do not start with space characters. (#48)
- Improved how invalid attributes (in html tags, tables, etc.) are handled. (#47)
- Fixed a bug in handling `<pre>` tags. (#46)
- Fixed a bug in parsing tag attributes. (#44)
- Fixed handling of tags having different casings in start and end name, e.g. `<s></S>`.
- Fix handling of extension tags.
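Matching a start tag to an end tag regardless of letter case can be illustrated with a backreference under `re.IGNORECASE`. This is a simplified sketch for attribute-less tags only. `PAIRED_TAG` and `tag_contents` are hypothetical names, and the library's own parser is not shown here.

```python
import re

# Under re.IGNORECASE the backreference \1 also matches case-insensitively,
# so <s> pairs with </S>.
PAIRED_TAG = re.compile(r'<(\w+)>(.*?)</\1>', re.IGNORECASE | re.DOTALL)


def tag_contents(text):
    """Return (start-tag name, contents) for each simple tag pair."""
    return [(m.group(1), m.group(2)) for m in PAIRED_TAG.finditer(text)]


pairs = tag_contents('<s>struck</S>')
```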
- Fixed a bug in `get_bolds`/`get_italics` resulting in duplicate items in returned values. It was also causing a subtle issue in `plain_text`/`remove_markup`. (#42)
- Fixed detection of parameters containing single braces.
- Fix handling of external links containing wikilinks.
- Fixed a bug in `plain_text`/`remove_markup` causing unexpectedly empty objects. (#40)
- Fixed some other bugs in `plain_text`/`remove_markup` functions for:
  - images containing wikitext
  - tags containing bold/italic items
  - nested tags
- Fixed a bug in extracting sub-tags.
- Fixed a bug in Tag objects causing strange behaviour upon mutating a tag.
- Fixed a bug in `plain_text`/`remove_markup` functions, causing some objects that are expected to be removed to remain in the result. (#39)
- Fix syntax errors for python 3.5, 3.6, and 3.7.
- Fix a bug in getting the parser functions of a Template object.
- Fix a catastrophic backtracking issue for wikitexts containing html tags. (#37)
- Add `wikitextparser.remove_markup` function and `WikiText.plain_text` method.
- Improve detection of parameters and wikilinks.
- Add `get_bolds` and `get_italics` methods.
- `WikiLink.wikilinks`, `WikiList.get_lists()`, `Template.templates`, `Tag.get_tags()`, `ParserFunction.parser_functions`, and `Parameter.parameters` won't return objects equal to `self` anymore, only sub-elements will be returned.
- Improve handling of comments within wikilinks.
- `WikiLink.text.setter` no longer accepts None values. This was marked as deprecated since v0.25.0.
- Drop support for Python 3.4.
- Remove the deprecated `pprint` method. Users should use `pformat` instead.
- Allow a tuple of patterns in `get_lists` and `sublists` methods. The default `None` is now deprecated and a tuple is used instead.
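Accepting a tuple of patterns usually means combining them into a single alternation. The sketch below shows one way to do that. The function names and the default tuple `(r'\#', r'\*', '[:;]')` are assumptions chosen for illustration, not values confirmed by this changelog.

```python
import re

# Assumed example patterns: ordered lists, unordered lists, definition lists.
DEFAULT_PATTERNS = (r'\#', r'\*', '[:;]')


def list_item_regex(patterns=DEFAULT_PATTERNS):
    """Combine several start-of-item patterns into one compiled regex."""
    alternation = '|'.join(f'(?:{p})' for p in patterns)
    return re.compile(rf'^(?:{alternation})+', re.MULTILINE)


def item_markers(text, patterns=DEFAULT_PATTERNS):
    """Return the leading marker of every line that starts a list item."""
    return [m.group() for m in list_item_regex(patterns).finditer(text)]


markers = item_markers('# a\n## b\n* c\ntext\n; d')
```

Passing a shorter tuple, e.g. `(r'\*',)`, restricts the match to one list type.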
- Add a new parameter, `level`, for the `get_sections` method.
- Fixed a rare bug in handling lists and template arguments when there is newline or a pipe inside a starting or closing tag.
- `Section.title` will return None instead of `''` when the section does not have any title.
- Invoking the deleter of `Section.title` won't raise a RuntimeError anymore if the section does not have a title already.
- Add a deleter for `Section.title` property. (#32)
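A property deleter lets callers write `del obj.title` instead of assigning a sentinel. The sketch below shows the pattern, including a deleter that tolerates an already-missing title. `SectionSketch` is a hypothetical simplified class, not the library's `Section`.

```python
class SectionSketch:
    """Hypothetical, simplified section holding an optional title."""

    def __init__(self, title=None):
        self._title = title

    @property
    def title(self):
        return self._title  # None when the section has no title

    @title.setter
    def title(self, value):
        self._title = value

    @title.deleter
    def title(self):
        # Deleting an already-missing title is a no-op rather than an error.
        self._title = None


section = SectionSketch('Intro')
del section.title
del section.title  # second deletion does not raise
```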
- Fixed a bug in `WikiText.get_lists()` which was causing it to sometimes return items in an unordered fashion. (#31)
- Rename `WikiText.lists()` method to `WikiText.get_lists()` and deprecate the old name.
- Add `get_sections()` method with `include_subsections` parameter which allows getting a section without including its subsections. (#23)
- Fixed a bug in parsing wikilinks containing `[.*]` (#29)
- Fixed: wikilinks are not allowed to be preceded by `[` anymore.
- Rename `WikiText.tags()` method to `WikiText.get_tags()` and deprecate the old name.
- Fix a bug in detecting the end-tag of two consecutive same-name tags. (#27)
- Properly exclude the `test` package from the source distribution.
- Fix a regression in parsing some corner cases of nested templates. (#26)
- The previously deprecated `WikiText.__getitem__` now raises NotImplementedError.
- `WikiText.__call__`: Remove the deprecated support for `start is None`.
- Optimize a little and use more robust algorithms.
- Implemented a workaround for a catastrophic backtracking condition when parsing tables. (#22)
- Add `get_tables` as a new method to `WikiText` objects. It allows extracting tables in a non-recursive manner.
- The `nesting_level` property was only meaningful for tables, templates, and parser functions, so it has been removed from other types.
- Fix a bug in detecting nested tables. (#21)
- Fix a few bugs in detecting tables and template arguments.
- Changed the `comments` property of `Comment` objects to return an empty list.
- Changed the `external_links` property of `ExternalLink` objects to return an empty list.
- Fix a bug in setting `Section.contents` which only occurred when the title had trailing whitespace.
- Setting `Section.level` will not overwrite `Section.title` anymore.
- Define `WikiLink.title` property. It is similar to `WikiLink.target` but will not include the `#fragment`.
- Deprecate using None as the start value of `__call__`.
- Added fragment property to `WikiLink` class (#18)
- Added deleter method for `WikiLink.text` property.
- Deprecated: Setting `WikiLink.text` to `None`. Use `del WikiLink.text` instead.
- Added deleter method for `WikiLink.target` property.
- Added deleter method for `ExternalLink.text` property.
- Added deleter method for `Parameter.default` property.
- Deprecated: Setting `Parameter.default` to `None`. Use `del Parameter.default` instead.
- Defined `WikiText.__call__` to get a slice of wikitext as string.
- Deprecated `WikiText.__getitem__`. Use `WikiText.__call__` or `WikiText.string` instead.
- Fixed a bug in `Tag.parsed_contents`. (#19)
- Fixed a rarely occurring bug in detecting parameters with names consisting only of whitespace or underscores.
- Fixed a bug in detecting parser functions containing parameters.
- Fixed a bug in detecting table header cells that start with +, -, or }. (#17)
- Define deleter method for `WikiText.string` property and add `Template.del_arg` method. (#14)
- Improve the `lists` method of `Template` and `ParserFunction` classes. (#15)
- Fixed a bug in detection of multiline arguments. (#13)
- Deprecated `capital_links` parameter of `Template.normal_name`. Use `capitalize` instead (keyword-only argument).
- Deprecated passing the `code` parameter of `Template.normal_name` as a positional argument. It's now a keyword-only argument.
- Fixed a bug in `Section` objects that was causing them to return the properties of the whole page (#15).
- Removed the deprecated attribute access methods. The following deprecated methods, accessible on `Table` and `Tag` objects, have been removed: `.has`, `.get`, `.set`. Use `.has_attr`, `.get_attr`, `.set_attr` instead.
- Fixed a bug in `set_attr` method.
- Removed the deprecated `Table.getdata` method. Use `Table.data` instead.
- Removed the deprecated `Table.getrdata(row_num)` method. Use `Table.data(row=row_num)` instead.
- Removed the deprecated `Table.getcdata(col_num)` method. Use `Table.data(col=col_num)` instead.
- Removed the deprecated `Table.table_attrs` property. Use `Table.attrs` or other attribute-related methods instead.
- Fixed MemoryError caused by very long or unclosed comment tags (issue #12)
- Change the behaviour of external_links property to never return Templates or parser functions as part of the external link.
- Add support for literal IPv6 external links, e.g. https://[2001:db8:85a3:8d3:1319:8a2e:370:7348]:443/.
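For context on what a literal IPv6 link looks like once parsed, the standard library's `urlsplit` can split the example URL above. This only illustrates the bracketed-host syntax. It says nothing about how the library itself detects such links.

```python
from urllib.parse import urlsplit

url = 'https://[2001:db8:85a3:8d3:1319:8a2e:370:7348]:443/'
parts = urlsplit(url)

# The brackets delimit the address, so its colons are not mistaken
# for the host/port separator.
host, port = parts.hostname, parts.port
```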
- Fixed: Do not mistake the equal signs of section titles for template keyword arguments.
- Fixed invalid escape sequences for Python 3.6.
- Added `msg`, `msgnw`, `raw`, `safesubst`, and `subst` to known parser function identifiers.
- Fixed a bug in Table.data (issue #9)
- Fixed: A bug in processing `Section` objects.
- Fixed: A bug in `external_links` (the starting position must now be a word boundary; previously this condition was not checked)
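The word-boundary condition can be illustrated with a small regular expression. This is a simplified sketch, assuming only `http`/`https` schemes. `BARE_URL` and `bare_links` are hypothetical and far less complete than the library's own link detection.

```python
import re

# \b requires that the scheme is not glued to a preceding word character.
BARE_URL = re.compile(r'\bhttps?://[^\s\[\]<>"]+')


def bare_links(text):
    """Return URLs whose scheme starts at a word boundary."""
    return BARE_URL.findall(text)


links = bare_links('see http://example.com and xhttp://example.org')
```

Here `xhttp://example.org` is skipped because `x` and `h` are both word characters, so there is no boundary between them.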
- Fixed: A bug in `external_links` (external links within sub-templates are now detected correctly; previously they were ignored)
- Changed: The order of results, now everything is sorted by its starting position.
- Fixed: Bug in `ancestors` and `parent` methods
- Added: `parent` and `ancestors` methods
- Added: `__version__` to `__init__.py`
- Removed: Support for Python 3.3
- Fixed: Handling of comments and tags in section titles
- Changed: Add an underscore prefix to private internal modules names
- Changed: Moved test modules to a different directory
- Changed: Templates adjacent to external links are now treated as part of the link
- Fixed: A bug in handling tag extensions within parser functions
- Fixed: A minor bug in Template.set_arg
- Changed: ExternalLink.text: Return None if the link is not within brackets
- Fixed: Handling of comments and templates in external links