
ftlRegex

Robert Rüger edited this page Jan 31, 2017 · 6 revisions

ftlRegex is a convenient Fortran wrapper around the POSIX regular expression functionality provided by the C library (regex.h). By default, ftlRegex uses POSIX Extended Regular Expression syntax.

Here is a short example that shows what ftlRegex can do for you:

type(ftlString) :: line
type(ftlRegex)  :: regex

line = 'Element: mass=12 Z=6 symbol=C name=Carbon'
call regex%New('(\w+)\s*=\s*(\w+)')

line = regex%Replace(line, '\2<-\1', doGroupSub=.true.)

The ftlString line now holds:

Element: 12<-mass 6<-Z C<-symbol Carbon<-name

Quite a lot of work done in just one line of Fortran, isn't it?

Note that ftlRegex internally uses the regular expression engine of the C library, so the set of supported regular expression elements depends on the libc implementation. Everything required by the POSIX standard should work with any libc, and most implementations support additional elements on top of that. For example, the \w and \s shorthands used on this page are common extensions rather than part of POSIX; the portable spellings are [[:alnum:]_] and [[:space:]]. If you want regular expressions that work on all platforms, stick to what the POSIX standard requires.

Derived types in ftlRegexModule

In addition to the ftlRegex type itself, the ftlRegexModule defines some other types that are used as return types of the matching methods of the ftlRegex type.

type, public :: ftlRegexMatch
   logical                          :: matches = .false.
   type(ftlString)                  :: text
   integer                          :: begin = 0
   integer                          :: end   = 0
   type(ftlRegexGroup), allocatable :: group(:)
end type

Here the matches member is .true. if a match was found. In that case, the text that matched the regular expression is stored as an ftlString in the text member. The position of the match in the original string is given by the range [begin, end). Note that this (like all ranges used in the FTL) is a half-open interval: begin is the first included character and end is the first excluded one. So the text member compares equal to string(begin:end-1) if string is a raw Fortran string. The group member holds the contents of the regular expression's capture groups, if the expression uses any. The ftlRegexGroup type is defined as:

type, public :: ftlRegexGroup
   type(ftlString) :: text
   integer         :: begin = 0
   integer         :: end   = 0
end type

Here text is the text captured by the group, and begin and end delimit where the captured group is found in the original string, again as a half-open interval.
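As a minimal sketch of how the half-open ranges can be used, the following program slices a raw Fortran string with the begin and end members of a match and of its capture groups. It only relies on the members and methods documented on this page; the program name and the sample string are made up for illustration, and the pattern uses POSIX character classes so that it does not depend on libc extensions.

```fortran
program match_ranges
   use ftlRegexModule
   implicit none

   ! Hypothetical sample input, chosen only for illustration.
   character(len=*), parameter :: str = 'Z = 6, mass = 12'
   type(ftlRegex)      :: regex
   type(ftlRegexMatch) :: m
   integer             :: i

   ! Two capture groups: a name and an integer value.
   call regex%New('([[:alpha:]]+)[[:space:]]*=[[:space:]]*([0-9]+)')
   m = regex%MatchFirst(str)

   if (m%matches) then
      ! end is the first excluded position, hence the -1 in every slice.
      write (*,*) 'match: ', str(m%begin:m%end-1)
      do i = 1, size(m%group)
         write (*,*) 'group ', i, ': ', str(m%group(i)%begin:m%group(i)%end-1)
      enddo
   endif
end program match_ranges
```

With the conventions described above, the first slice should be the same text as m%text, and the two group slices should be the name and the value of the first assignment in str.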

ftlRegex methods

Construction, destruction, assignment & comparison

ftlRegex%New()

Constructs a new ftlRegex, either from a pattern or as a copy of another ftlRegex:

  • Pattern constructor. Constructs an ftlRegex from an ftlString (or alternatively a normal Fortran string) containing the regular expression pattern, and a number of optional logical arguments.

    subroutine New(self, pattern, basic, icase, nosub, newline)
       type(ftlRegex) , intent(inout)           :: self
       type(ftlString), intent(in)              :: pattern
       logical        , intent(in)   , optional :: basic, icase, nosub, newline

    The optional logicals have the following meaning:

    basic

    Use basic POSIX regular expressions instead of the extended POSIX regexes that ftlRegex uses by default.

    icase

    Do not differentiate case. Subsequent searches using the ftlRegex will be case insensitive.

    nosub

    Do not report the position of matches or capture groups. The resulting ftlRegex can only be used to test whether something matches, not where. In return, testing for matches may be faster, depending on your libc implementation.

    newline

    Match-any-character operators don't match a newline. A nonmatching list ([^...]) not containing a newline does not match a newline.

    Example usage:

    type(ftlRegex)  :: regex
    type(ftlString) :: pattern
    
    call regex%New('\s*=\s*') ! construction from raw Fortran string ...
    
    pattern = 'TeSt'
    call regex%New(pattern, icase=.true.) ! ... or from an ftlString pattern

  • Copy constructor. Constructs one regular expression as a copy of another.

    subroutine New(self, other)
       type(ftlRegex), intent(inout) :: self
       type(ftlRegex), intent(in)    :: other

Note that the constructors are also available as free functions named ftlRegex() that take the same parameters as the type-bound subroutines above and return an ftlRegex instance. This is useful if a regular expression is needed only once:

write (*,*) ('T12T' .matches. ftlRegex('T[0-9]+T')) ! prints True
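The effect of the optional constructor flags can be sketched with a few one-line tests. This is only an illustration under the assumption of a POSIX-conforming libc; the program and variable names are made up, and the patterns are anchored with ^ and $ so that the result does not depend on where in the string a match is found.

```fortran
program regex_flags
   use ftlRegexModule
   implicit none

   type(ftlRegex) :: strict, relaxed, plus

   ! Default: extended syntax, case sensitive.
   call strict%New('^section$')
   ! Case insensitive, and without position reporting since we only test.
   call relaxed%New('^section$', icase=.true., nosub=.true.)
   ! In basic POSIX syntax an unescaped + is an ordinary character.
   call plus%New('^a+$', basic=.true.)

   write (*,*) ('SECTION' .matches. strict)   ! case differs
   write (*,*) ('SECTION' .matches. relaxed)  ! case is ignored
   write (*,*) ('aaa' .matches. plus)         ! + is not a repetition here
   write (*,*) ('a+' .matches. plus)          ! literal 'a+'
end program regex_flags
```

Under these assumptions the four lines should print F, T, F and T.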

ftlRegex%Delete()

Destructs the regular expression. All used memory is deallocated.

subroutine Delete(self)
   type(ftlRegex), intent(inout) :: self

It's not necessary to call Delete manually. It is used as the finalizer of the ftlRegex type and will be called automatically when an ftlRegex goes out of scope.

ftlRegex assignment(=)

Copy assignment. Replaces the contents with a copy of the contents of other.

subroutine assignment(=)(self, other)
   type(ftlRegex), intent(inout) :: self
   type(ftlRegex), intent(in)    :: other

This is exactly the same as using the copy constructor. (The assignment has only been implemented because intrinsic assignment would do the wrong thing and crash the program when the assigned regexes go out of scope.)

ftlRegex operator(==) ftlRegex

ftlRegex operator(/=) ftlRegex

Compares two regular expressions for (in)equality.

logical function operator(==)(lhs, rhs)
    type(ftlRegex), intent(in) :: lhs, rhs

logical function operator(/=)(lhs, rhs)
    type(ftlRegex), intent(in) :: lhs, rhs

Two regular expressions are considered equal if both the pattern and the (optional) flags passed to their constructors are equal.
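A short sketch of copy assignment and comparison, using only the operations described above (the program and variable names are illustrative):

```fortran
program regex_equality
   use ftlRegexModule
   implicit none

   type(ftlRegex) :: a, b, c

   call a%New('[0-9]+')
   call c%New('[0-9]+', icase=.true.)  ! same pattern as a, different flags

   b = a  ! copy assignment, equivalent to the copy constructor

   write (*,*) (a == b)  ! same pattern and same flags
   write (*,*) (a == c)  ! same pattern, but the flags differ
   write (*,*) (a /= c)
end program regex_equality
```

Given the definition of equality above, this should print T, F and T.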

Matching & match replacement

string operator(.matches.) ftlRegex

Checks whether a string (either ftlString or raw Fortran string) matches a regular expression.

logical function operator(.matches.)(lhs, rhs)
   type(ftlString), intent(in) :: lhs
   type(ftlRegex) , intent(in) :: rhs

Example usage:

type(ftlRegex)  :: newsec
type(ftlString) :: line
integer :: unit, iostat, numSections

! open some file as unit

call newsec%New('^\s*SECTION\s*$', icase=.true., nosub=.true.)

numSections = 0
do while (.true.)
   call line%ReadLine(unit, iostat)
   if (is_iostat_end(iostat)) exit
   if (line .matches. newsec) numSections = numSections + 1
enddo
write (*,*) 'Found ', numSections, ' sections in file'

ftlRegex%NumMatches()

Returns the number of non-overlapping matches of regex in string (which can either be an ftlString or a raw Fortran string).

integer function NumMatches(self, string)
   type(ftlRegex) , intent(in) :: self
   type(ftlString), intent(in) :: string

Example usage:

type(ftlRegex) :: regex
call regex%New('[a-zA-Z]\s*=\s*[0-9]+')
write (*,*) regex%NumMatches('u=12 F=32 a=b x=7') ! prints 3

ftlRegex%Match()

Returns an array of all non-overlapping matches of the regular expression in string (which can either be an ftlString or a raw Fortran string).

function Match(self, string) result(matches)
   type(ftlRegex)     , intent(in)  :: self
   type(ftlString)    , intent(in)  :: string
   type(ftlRegexMatch), allocatable :: matches(:)

If no matches are found, the returned array has a size of 0.

Example usage:

type(ftlString) :: line
type(ftlRegex) :: r
type(ftlRegexMatch), allocatable :: m(:)

line = 'keyword option1=value option2=othervalue'
call r%New('(\w+)\s*=\s*(\w+)')
m = r%Match(line)

! m(1)%text now holds 'option1=value'
! m(2)%text now holds 'option2=othervalue'
! m(:)%group is also populated with the contents of the capture groups.
! e.g. m(1)%group(2)%text holds 'value'
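The returned array can be processed with an ordinary loop. The sketch below, with made-up program and variable names, handles the empty case explicitly and prints each key/value pair by slicing the raw input string with the half-open ranges of the capture groups. The pattern uses the POSIX class [[:alnum:]] instead of \w to avoid relying on libc extensions.

```fortran
program list_matches
   use ftlRegexModule
   implicit none

   character(len=*), parameter :: line = 'keyword option1=value option2=othervalue'
   type(ftlRegex)                   :: r
   type(ftlRegexMatch), allocatable :: m(:)
   integer                          :: i

   call r%New('([[:alnum:]]+)=([[:alnum:]]+)')
   m = r%Match(line)

   if (size(m) == 0) then
      ! No match at all: Match returns a zero-sized array.
      write (*,*) 'no key=value pairs found'
   else
      do i = 1, size(m)
         ! group(1) is the key, group(2) the value; end is exclusive.
         write (*,*) line(m(i)%group(1)%begin:m(i)%group(1)%end-1), ' -> ', &
                     line(m(i)%group(2)%begin:m(i)%group(2)%end-1)
      enddo
   endif
end program list_matches
```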

ftlRegex%MatchFirst()

Returns an ftlRegexMatch for the first match of the regular expression in a string (which can either be an ftlString or a raw Fortran string).

type(ftlRegexMatch) function MatchFirst(self, string)
   type(ftlRegex) , intent(in) :: self
   type(ftlString), intent(in) :: string

If no match is found, the matches member variable of the returned ftlRegexMatch is set to .false..

Example usage:

type(ftlRegex) :: regex
type(ftlRegexMatch) :: match
call regex%New('[a-zA-Z]\s*=\s*[0-9]+')
match = regex%MatchFirst('u=12 F=32 a=b x=7')
! match%text now holds 'u=12'

ftlRegex%Replace()

Returns an ftlString where all matches of the regular expression in string have been replaced with sub. Note that both string and sub can be either ftlString or raw Fortran strings.

type(ftlString) function Replace(self, string, sub, doGroupSub)
   class(ftlRegex), intent(in)           :: self
   type(ftlString), intent(in)           :: string
   type(ftlString), intent(in)           :: sub
   logical        , intent(in), optional :: doGroupSub

If the optional argument doGroupSub is present and .true., the contents of the regular expression's capture groups can be used in the substitution string: \n will be replaced by the contents of the n'th capture group.

Example usage:

type(ftlString) :: line
type(ftlRegex)  :: regex

line = 'Element: mass=12 Z=6 symbol=C name=Carbon'
call regex%New('(\w+)\s*=\s*(\w+)')

line = regex%Replace(line, '\2<-\1', doGroupSub=.true.)

! line now holds: 'Element: 12<-mass 6<-Z C<-symbol Carbon<-name'
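Without doGroupSub, Replace can also be used for plain substitutions. The following sketch removes optional whitespace around an equals sign. To stay within the API documented on this page, it checks the result with the .matches. operator rather than printing the returned ftlString; the program name and the input string are made up for illustration.

```fortran
program replace_plain
   use ftlRegexModule
   implicit none

   type(ftlRegex) :: spaces

   ! An equals sign with optional surrounding whitespace (POSIX classes only).
   call spaces%New('[[:space:]]*=[[:space:]]*')

   ! Replace every such occurrence by a bare '=' and test the result.
   write (*,*) (spaces%Replace('a = 1', '=') .matches. ftlRegex('^a=1$'))
end program replace_plain
```

If the substitution works as described, this should print T.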