ftlRegex
ftlRegex is a convenient Fortran wrapper around the POSIX regular expression
functionality in the C standard library (aka regex.h). The default regular
expression flavor used by ftlRegex is POSIX Extended Regular Syntax.
Here is a short example that shows what ftlRegex can do for you:
type(ftlString) :: line
type(ftlRegex) :: regex
line = 'Element: mass=12 Z=6 symbol=C name=Carbon'
call regex%New('(\w+)\s*=\s*(\w+)')
line = regex%Replace(line, '\2<-\1', doGroupSub=.true.)
The ftlString line now holds:
Element: 12<-mass 6<-Z C<-symbol Carbon<-name
Quite a lot of work done in just one line of Fortran, isn't it?
Note that since ftlRegex internally uses the regular expression engine of
the C standard library, the supported regular expression elements depend on the
libc implementation. Everything in the POSIX standard should work with any
libc implementation, and most implementations support additional syntax on top
of that. If you want to write regular expressions that work on all platforms,
stick to what the POSIX standard requires.
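As an illustration of the portability point: the `\w` and `\s` shorthands used in the example above are extensions (glibc supports them, for instance) and are not part of POSIX. The sketch below spells the same pattern with POSIX bracket classes instead. It assumes the FTL is built and available, and uses only the `New` and `NumMatches` methods documented on this page; the program name is made up for the example.

```fortran
program portable_pattern
   use ftlRegexModule
   implicit none

   type(ftlRegex) :: regex
   integer :: n

   ! Same idea as '(\w+)\s*=\s*(\w+)', spelled with POSIX bracket classes
   call regex%New('([[:alnum:]_]+)[[:space:]]*=[[:space:]]*([[:alnum:]_]+)')

   ! Count the key=value pairs in a raw Fortran string
   n = regex%NumMatches('Element: mass=12 Z=6 symbol=C name=Carbon')
   write (*,*) n
end program portable_pattern
```

The bracket classes are more verbose, but they are required by POSIX for both basic and extended regular expressions.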
In addition to the ftlRegex type itself, the ftlRegexModule defines some
other types that are used as return types of the matching methods of the
ftlRegex type.
type, public :: ftlRegexMatch
logical :: matches = .false.
type(ftlString) :: text
integer :: begin = 0
integer :: end = 0
type(ftlRegexGroup), allocatable :: group(:)
end type
Here the matches member is .true. if a match was found. In that case, the
text that matches the regular expression is stored as an ftlString in the text
member variable. The position of the match in the original string is given by
the range [begin, end). Note that this (like all ranges used in the FTL) is a
half-open interval, meaning that begin is included and end is the first
excluded character. So the text member compares equal to string(begin:end-1),
if string is a raw Fortran string. The group member holds the contents of the
regular expression's capture groups, if the expression uses any. The
ftlRegexGroup type is defined as:
type, public :: ftlRegexGroup
type(ftlString) :: text
integer :: begin = 0
integer :: end = 0
end type
Here text is the text captured by the group, and begin and end delimit where
the captured group is found in the original string, again as a half-open
interval.
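The half-open convention can be checked with plain Fortran substrings, without the library. In the sketch below, `mbegin` and `mend` are hypothetical stand-ins for the `begin` and `end` members of a match; the values are filled in by hand for the match `mass=12` rather than produced by ftlRegex.

```fortran
program half_open_demo
   implicit none

   character(len=*), parameter :: string = 'mass=12 Z=6'
   integer :: mbegin, mend   ! stand-ins for match%begin and match%end

   ! The match 'mass=12' starts at character 1; character 8 (the blank)
   ! is the first one that is no longer part of it.
   mbegin = 1
   mend   = 8

   write (*,'(A)')  string(mbegin:mend-1)   ! prints mass=12
   write (*,'(I0)') mend - mbegin           ! prints 7, the length of the match
end program half_open_demo
```

One convenient consequence of the half-open form is that the length of a match is simply end - begin.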
Constructs a new ftlRegex, either from a pattern or as a copy of another ftlRegex:
Pattern constructor. Constructs an ftlRegex using either an ftlString (or alternatively a normal Fortran string) containing the regular expression pattern, and a number of optional logical arguments.
subroutine New(self, pattern, basic, icase, nosub, newline)
type(ftlRegex) , intent(inout) :: self
type(ftlString), intent(in) :: pattern
logical , intent(in) , optional :: basic, icase, nosub, newline
The optional logicals have the following meaning:
basic
Use basic POSIX regular expressions instead of the extended POSIX regexes that ftlRegex uses by default.
icase
Do not differentiate case. Subsequent searches using the ftlRegex will be case insensitive.
nosub
Do not report the position of matches or capturing groups. The resulting ftlRegex can only be used to test whether something matches, not where exactly. In return, testing for matches may be faster, depending on your libc implementation.
newline
Match-any-character operators don't match a newline. A nonmatching list ([^...]) not containing a newline does not match a newline.
Example usage:
type(ftlRegex) :: regex
type(ftlString) :: pattern
call regex%New('\s*=\s*') ! construction from raw Fortran string ...
pattern = 'TeSt'
call regex%New(pattern, icase=.true.) ! ... or from an ftlString pattern
Copy constructor. Constructs one regular expression as a copy of another.
subroutine New(self, other)
type(ftlRegex), intent(inout) :: self
type(ftlRegex), intent(in) :: other
Note that the constructors are also available as free functions named ftlRegex() that take the same parameters as the type-bound subroutines above and return an ftlRegex instance. This is sometimes useful if one wants to use a regular expression only once:
write (*,*) ('T12T' .matches. ftlRegex('T[0-9]+T')) ! prints T
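To make the effect of a constructor flag concrete, here is a small sketch that tests the same string against the same pattern with and without icase. It assumes the FTL is available and relies only on the free function ftlRegex() and the .matches. operator described on this page; the program and variable names are invented for the example.

```fortran
program flag_demo
   use ftlRegexModule
   implicit none

   logical :: sensitive, insensitive

   ! Default construction: case matters, so 'test' does not match 'TeSt'
   sensitive = 'test' .matches. ftlRegex('TeSt')

   ! With icase=.true. the same pattern ignores case
   insensitive = 'test' .matches. ftlRegex('TeSt', icase=.true.)

   write (*,*) sensitive, insensitive
end program flag_demo
```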
Destructs the regular expression. All used memory is deallocated.
subroutine Delete(self)
type(ftlRegex), intent(inout) :: self
It's not necessary to call Delete manually. It is used as the finalizer of the ftlRegex type and will be called automatically when an ftlRegex goes out of scope.
Copy assignment. Replaces the contents with a copy of the contents of other.
subroutine assignment(=)(self, other)
type(ftlRegex), intent(inout) :: self
type(ftlRegex), intent(in) :: other
This is exactly the same as using the copy constructor. (The assignment has only been implemented because intrinsic assignment would do the wrong thing and crash the program when the assigned regexes go out of scope.)
Compares two regular expressions for (in)equality.
logical function operator(==)(lhs, rhs)
type(ftlRegex), intent(in) :: lhs, rhs
logical function operator(/=)(lhs, rhs)
type(ftlRegex), intent(in) :: lhs, rhs
Two regular expressions are considered equal if both the pattern and the (optional) flags passed to their constructor are equal.
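The following sketch exercises copy assignment together with the comparison operators. It assumes the FTL is available and uses only what this page documents; the program and variable names are invented. Going by the rule above, all three results are expected to be true, since a and b share pattern and flags, c differs from a only in its icase flag, and d is a copy of c.

```fortran
program compare_regexes
   use ftlRegexModule
   implicit none

   type(ftlRegex) :: a, b, c, d
   logical :: samePattern, differentFlags, copyEqual

   call a%New('[0-9]+')
   call b%New('[0-9]+')
   call c%New('[0-9]+', icase=.true.)
   d = c   ! defined assignment, documented as equivalent to the copy constructor

   samePattern    = (a == b)   ! same pattern, same (default) flags
   differentFlags = (a /= c)   ! same pattern, but c was built with icase
   copyEqual      = (d == c)   ! a copy compares equal to its source

   write (*,*) samePattern, differentFlags, copyEqual
end program compare_regexes
```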
Checks whether a string (either ftlString or raw Fortran string) matches a regular expression.
logical function operator(.matches.)(lhs, rhs)
type(ftlString), intent(in) :: lhs
type(ftlRegex) , intent(in) :: rhs
Example usage:
type(ftlRegex) :: newsec
type(ftlString) :: line
integer :: unit, iostat, numSections
! open some file as unit
call newsec%New('^\s*SECTION\s*$', icase=.true., nosub=.true.)
numSections = 0
do while (.true.)
   call line%ReadLine(unit, iostat)
   if (is_iostat_end(iostat)) exit
   if (line .matches. newsec) numSections = numSections + 1
enddo
write (*,*) 'Found ', numSections, ' sections in file'
Returns the number of non-overlapping matches of regex in string (which can either be an ftlString or a raw Fortran string).
integer function NumMatches(self, string)
type(ftlRegex) , intent(in) :: self
type(ftlString), intent(in) :: string
Example usage:
type(ftlRegex) :: regex
call regex%New('[a-zA-Z]\s*=\s*[0-9]+')
write (*,*) regex%NumMatches('u=12 F=32 a=b x=7') ! prints 3
Returns an array of all non-overlapping matches of the regular expression in string (which can either be an ftlString or a raw Fortran string).
function Match(self, string)
type(ftlRegex) , intent(in) :: self
type(ftlString) , intent(in) :: string
type(ftlRegexMatch), allocatable :: matches(:)
If no matches are found, the returned array has a size of 0.
Example usage:
type(ftlString) :: line
type(ftlRegex) :: r
type(ftlRegexMatch), allocatable :: m(:)
line = 'keyword option1=value option2=othervalue'
call r%New('(\w+)\s*=\s*(\w+)')
m = r%Match(line)
! m(1)%text now holds 'option1=value'
! m(2)%text now holds 'option2=othervalue'
! m(:)%group is also populated with the contents of the capture groups.
! e.g. m(1)%group(2)%text holds 'value'
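A common follow-up is to walk over all matches and their capture groups. The sketch below does that using only the begin and end members, slicing the original raw Fortran string rather than printing the ftlString members, so it does not depend on any ftlString output facilities. It assumes the FTL is available; the program name and loop variables are invented for the example.

```fortran
program list_matches
   use ftlRegexModule
   implicit none

   character(len=*), parameter :: line = 'keyword option1=value option2=othervalue'
   type(ftlRegex) :: r
   type(ftlRegexMatch), allocatable :: m(:)
   integer :: i, j

   call r%New('(\w+)\s*=\s*(\w+)')
   m = r%Match(line)

   do i = 1, size(m)
      ! [begin, end) is half open, so the last matched character is end-1
      write (*,'(A)') line(m(i)%begin : m(i)%end-1)
      if (.not. allocated(m(i)%group)) cycle
      do j = 1, size(m(i)%group)
         ! each group carries its own position in the original string
         write (*,'(2X,A)') line(m(i)%group(j)%begin : m(i)%group(j)%end-1)
      enddo
   enddo
end program list_matches
```

Because an empty result has size 0, the outer loop simply does nothing when the pattern is not found.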
Returns an ftlRegexMatch for the first match of the regular expression in a string (which can either be an ftlString or a raw Fortran string).
type(ftlRegexMatch) function MatchFirst(self, string)
type(ftlRegex) , intent(in) :: self
type(ftlString), intent(in) :: string
If no match is found, the matches member variable of the returned ftlRegexMatch is set to .false..
Example usage:
type(ftlRegex) :: regex
type(ftlRegexMatch) :: match
call regex%New('[a-zA-Z]\s*=\s*[0-9]+')
match = regex%MatchFirst('u=12 F=32 a=b x=7')
! match%text now holds 'u=12'
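Since MatchFirst always returns an ftlRegexMatch, callers would typically test the matches member before touching the position members. A minimal sketch, assuming the FTL is available and using only the members documented above (the program name and the text constant are invented):

```fortran
program first_match
   use ftlRegexModule
   implicit none

   character(len=*), parameter :: text = 'u=12 F=32 a=b x=7'
   type(ftlRegex) :: regex
   type(ftlRegexMatch) :: match

   call regex%New('[a-zA-Z]\s*=\s*[0-9]+')
   match = regex%MatchFirst(text)

   if (match%matches) then
      ! begin is the first matched character, end-1 the last one
      write (*,'(A,I0,A,I0)') 'first match spans ', match%begin, ' to ', match%end - 1
   else
      write (*,'(A)') 'no match'
   endif
end program first_match
```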
Returns an ftlString where all matches of the regular expression in string have been replaced with sub. Note that both string and sub can be either ftlString or raw Fortran strings.
type(ftlString) function Replace(self, string, sub, doGroupSub)
class(ftlRegex), intent(in) :: self
type(ftlString), intent(in) :: string
type(ftlString), intent(in) :: sub
logical , intent(in), optional :: doGroupSub
If the optional argument doGroupSub is present and .true., the contents of the regular expression's capture groups can be used in the substitution string: \n will be replaced by the contents of the n'th capture group.
Example usage:
type(ftlString) :: line
type(ftlRegex) :: regex
line = 'Element: mass=12 Z=6 symbol=C name=Carbon'
call regex%New('(\w+)\s*=\s*(\w+)')
line = regex%Replace(line, '\2<-\1', doGroupSub=.true.)
! line now holds: 'Element: 12<-mass 6<-Z C<-symbol Carbon<-name'