Application and script abstract classes #78

Open · wants to merge 63 commits into master

Conversation

jcflack commented Aug 12, 2017

Application and script abstract classes

The goal of this work is to widen the population that can easily write reliable new Amanda applications and scripts to support specialized backup needs. The topmost section of the Amanda wiki Development Tasks list presents several useful applications that could be added to Amanda, and that list has been unchanged for seven years.

Arguably, the slow pace of new application and script development relates to the level at which the current Application API and Script API are defined. Those documents spell out the arguments and file descriptors passed to an application or script process, and the concrete syntax of messages to be exchanged.

Specification at that level has the advantage of being language-agnostic: an Amanda application or script could in theory be written in any language, as long as it consumes and produces the needed arguments and messages properly. Its disadvantage is a level of detail far removed from the practical problem a would-be application or script writer wants to solve. It tends to narrow the potential population of Amanda application or script authors: rather than including any Amanda user with a problem to solve and an idea how to do so, it realistically demands some expertise in IPC protocols and enough Amanda internals knowledge to fill in the actual behavior behind some of the syntax. (The many amanda-hackers messages exchanged during this work illustrate some of the traps for the unwary.)

While language-agnostic in theory, applications and scripts for Amanda are likely to be written in Perl, given its heavy use in Amanda proper. Specifically, they are likely to be object-oriented Perl code that extends one of the base classes Amanda::Application or Amanda::Script. They do inherit a small amount of common functionality from their base classes, but not nearly as much as they could. Inheriting classes are still on their own, for example, to produce and consume the API-defined IPC messages and to ensure their syntax is valid.

Less-thin abstract base classes could encapsulate much more of that common work, and present a traditional, object-oriented API where new application or script code may override just a few key methods and rely on default behavior wherever customization is not needed. That can make applications and scripts much faster to develop, and easier to review for correctness. At the same time, it simplifies future evolution of the IPC messages, something that would be increasingly impractical if a growing set of scripts and applications all contain duplicated code with those details baked in, but is simple with methods inherited from one central place.

This work, therefore, introduces the new abstract classes Amanda::Application::Abstract and Amanda::Script::Abstract. To stay compatible with existing applications and scripts, this work makes no changes to the existing classes Amanda::Application and Amanda::Script. Those classes are simply inherited by these new base classes, which are implemented in pure Perl and provide a richer object-oriented API to applications and scripts that choose to inherit from them.

This layering also means that if it is ever desired to write an Amanda application or script in a scripting language other than Perl, these two pure-Perl classes are essentially what could be translated to provide an "Amanda application/script API binding" for that language.
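
To make the intended shape concrete, here is a rough sketch of what a minimal application built on the new base class might look like. The method names inner_estimate, inner_backup, and inner_restore are taken from the commit messages below, but their exact signatures, and the assumption that the base class stores the DLE device path in $self->{'device'}, are illustrative guesses rather than the definitive API.

  package Amanda::Application::amexample;
  use strict;
  use warnings;
  use base qw(Amanda::Application::Abstract);

  # Argument parsing, the 'support' subcommand, subcommand dispatch, and
  # the concrete IPC message syntax are all inherited from the base class;
  # only the data-moving behavior is supplied here.

  sub inner_estimate {
      my ( $self, $level ) = @_;
      return -s $self->{'device'};   # size of the single file being dumped
  }

  sub inner_backup {
      my ( $self, $fdout ) = @_;
      # copy the DLE's device to the dump stream on $fdout
  }

  sub inner_restore {
      my ( $self, $fdin, @objects ) = @_;
      # recreate the requested objects in the current directory from $fdin
  }

  1;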

Included sample applications / scripts

Included here are four new applications to show the simplicity of development: amooraw (merely amraw redone in OO style), amgrowingfile (useful for a single large file known to only monotonically grow; can do incremental levels), amgrowingzip (like amgrowingfile but for a ZIP archive), and amopaquetree (backup of a directory/file tree where restoration of individual files won't be needed, but with fine-grained incremental backup down to only changed regions within files, rather than entire files that may contain small changes).

The amopaquetree application is also one that might itself be used as a base class for a specialized application applying the same opaque-tree, fine-grained-increments approach to a specific database management system, for example.

Four new scripts are also provided: am389bak for taking a consistent snapshot of a 389 Directory Server instance before estimate or backup, amsvnmakehotcopy to do the same for a Subversion repository, amlvmsnapshot to allow backing up from a snapshot of a filesystem using LVM, and amlibvirtfsfreeze to freeze and thaw filesystems in a libvirt-supported guest VM, so the host filesystem can be snapshotted for backup at a moment when the guest image files are consistent.

These four scripts, especially, should be considered experimental or demo quality at this stage. They do little error checking of the external commands they execute (though they work fine when nothing goes wrong), and, to be useful in most real environments, they are likely to need short C-language setuid wrappers written for those few external commands; such wrappers are not included in this commit. A configurable, secure, general-purpose permission-granting wrapper would greatly simplify development of scripts like these, but a design for that remains future work.

Future work

Elevated permissions

As mentioned above, a general-purpose and securely configurable way to grant elevated permission to specific actions executed from selected applications or scripts (as opposed to having to write some C analog of runtar for each needed case) could be a challenging design problem, but solving it would greatly simplify application/script development for real environments.

I/O involving child processes

The example applications and scripts presented here do less than they ought in the way of capturing the standard and error output of the child processes they execute, interpreting it and responding appropriately, or passing suitably modified messages upstream to Amanda. Fully robust implementations would include that, ideally without adding so much complexity that it obscures the outlines of the code.

Doing such work at the low level of, say, Perl's open3 is too long-winded to be ideal. Experienced Amanda developers may prefer to work with the features of Amanda::MainLoop, already familiar from other parts of the guts of Amanda. For other potential application or script developers, who may not have deep Amanda hacking experience but will know Perl, the familiarity and clear, intuitive syntax of Perl's IPC::Run might be more appealing.

IPC::Run is a CPAN module that need not be present on every system with perl, and might not be a suitable dependency for Amanda core. Because of its functionality and appealing syntax, though, it could be something that specialized applications or scripts rely on when their authors prefer it. Such applications or scripts would need to avoid breaking make check, and to return a suitable error to amcheck, on systems where the module is not present. It would be straightforward to provide stub support in Amanda::Application::Abstract and Amanda::Script::Abstract to make that easier.
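
As a plain-Perl illustration of how an optional dependency like that can be probed without breaking compile-time syntax checks (the external command shown is a placeholder):

  # Probe at run time instead of a compile-time 'use IPC::Run', so that
  # perl -c (and make check) still succeed on hosts lacking the module;
  # the application can then surface the problem through selfcheck/amcheck
  # rather than dying outright.
  my $have_ipc_run = eval { require IPC::Run; 1 };

  if ($have_ipc_run) {
      IPC::Run::run( [ 'some-external-command', '--flag' ], '>', \my $output )
          or die "some-external-command failed";
  } else {
      # fall back, or report the missing module as a check error
  }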

Chapman Flack added 26 commits August 9, 2017 17:24
Provides an 'amooraw' application usable as an 'amraw' replacement
(restores into the current amrecover directory, however, instead of
unconditionally writing over the original path as amraw does).
Allows writing a DLE that refers to a single file that is known
to only grow; can back up and restore incremental levels, which are
simply the tail of the file from where it ended at the prior level.
If the file ever gets rewritten (not just appended to), it will be
important to remember to force a level 0 dump.
This application can back up and restore, incrementally, a DLE that is
a single ZIP file only added to "at the end" (in ZIP terms, which
really means overwriting the directory at the very end with new content,
followed by a new directory).
This application can back up and restore, incrementally, a directory
tree of files treated as an indivisible unit (say, something managed
by a DBMS or version-control system, where one is interested in
restoring the state of the whole tree as of a given backup point, but
individual files are of no interest except to the system that manages
them).

What is actually backed up is an rsync 'batch' stream; rsync can
generate an efficiently encoded stream for turning one tree into
another without needing internal knowledge of the data format.
(A level 0 backup is saved as a batch stream to turn an empty directory
into the tree being backed up.)
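
For illustration, the core of that idea reduces to a single rsync invocation; the flags shown are a plausible shape for what generate_rsync_batch does, not necessarily the exact set it uses:

  sub generate_batch_sketch {
      my ( $live_tree, $reference_dir, $batch_file ) = @_;
      # Write a batch file describing how to turn $reference_dir (an empty
      # directory for a level 0 dump, or the previously captured state for
      # an incremental) into the live tree, without touching $reference_dir.
      my @cmd = ( 'rsync', '--archive', "--only-write-batch=$batch_file",
                  "$live_tree/", "$reference_dir/" );
      system(@cmd) == 0
          or die 'rsync batch generation failed with status ' . ( $? >> 8 );
  }
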
This can be useful if, for example, you are backing up something that
lives on a VM and the default Amanda::Paths::localstatedir ends up on
the root filesystem, which might be a sparsely-provisioned or
copy-on-write image that you would rather avoid continually writing
into, so that it stays more or less reflective of the OS itself and
its updates.

If the VM is also provided another filesystem more appropriate for
continual day-to-day scribbling, this property can move the Amanda
local state there.
Prior states of the opaque tree can be stored efficiently using
--link-dest, so only changed files are added in a new state while
unchanged ones are made links into the state it is based on.
That way (assuming many files are untouched between backups),
state for n levels does not require n times the space of the tree
being backed up, and often a multiple only slightly greater than 1.

However, commonly the timestamps will be meaningless in the tree
being backed up. That happens if a package provides a tool for
generating a consistent snapshot of its files (svnadmin hotcopy
for Subversion, db2bak for an LDAP server, etc.) and if that tool
doesn't preserve timestamps in the tree copy that it writes.

In that case, every tree that amopaquetree is asked to back up
will have new timestamps on every file, even those that (in the
live application) have been untouched since the last backup.
If capture_rsync_state is trying to preserve timestamps, it will
fail to find any files it can link, and the state storage goes
back to requiring n times the tree size for n levels of state.

Therefore, avoid preserving timestamps in capture_rsync_state.

No corresponding change to generate_rsync_batch; it still
preserves times. Accordingly, each file in a restored tree will
end up stamped with the time of the latest dump it came from,
which is not unreasonable, and better than giving everything the
arbitrary new timestamp the application's snapshot tool may have
slapped on it.
Although adapted from amraw's example, for inner_restore to refer to
--device (in determining the directory to restore into) is *not* like
what other standard Amanda applications do; they restore right into
amrecover's current directory (or the one given with --directory, if any),
as you would expect when --device held a filesystem and the pathnames
backed up were relative to the root of that filesystem. To uphold the
principle of least astonishment, make this application restore the same way.

Add manpage.
As with amopaquetree, don't astonish amrecover's user by somehow
deriving the restored file name from the --device path.

Because these apps operate on a single file rather than a directory
tree, the --directory option isn't quite the thing; and the file name
isn't recoverable from the command line (only the fixed name / is emitted
into the index, and only the fixed name . gets passed by amrecover),
so provide a new property --filename instead. That allows the amrecover
user to control the filename with

  setproperty filename foo.bar

Add man pages.
For now, it only runs at pre-dle-estimate ... which means you can't be
clever and set estimate to "server, client" (which would otherwise be
desirable), because if the client estimate isn't run, the hotcopy won't
have been made.
This turns out to be necessary for XFS filesystems, where
a newly-created snapshot can't be mounted without the nouuid
option (because, unsurprisingly, it has the same uuid as its
origin), or without the norecovery option. That last is surprising,
because there is an xfs_freeze operation and, per documentation,
it is automatically called by the dm driver when an LVM snapshot
is created, so the snapshot should be of a quiescent filesystem,
but somehow the process results in a couple snapshot-related
entries in the journal, enough for mount to think recovery is needed.
If a domain running under libvirt is able to respond to the
virsh domfsfreeze/domfsthaw commands (say, it is a qemu domain
with the qemu-guest-agent running within), then this script can
be used to freeze one or more filesystems in the domain and then
thaw them again. The intent is for a DLE for backing up the host
node to mention this script twice (through two different amanda.conf
'script' definitions: one with the freezeorthaw property set to freeze
and 'order' set to a lower number than the amlvmsnapshot script, and
the other with freezeorthaw set to thaw and 'order' higher than that
of amlvmsnapshot).

The result should be that the running domain has its filesystem(s)
frozen, but only long enough for the host node to grab an LVM snapshot of
its own filesystem(s), then immediately thawing the domain
filesystem(s) ... all of this in pre-dle-estimate. The snapshot of
the host filesystem can then be used for estimate and backup, and
any guest domain filesystem image file(s) should be in a consistent
state.
In RHEL7, the qemu-guest-agent is able to freeze
selected filesystems, specified by their mountpoints
(though when it comes time to unfreeze, no mountpoints
can be specified, all frozen filesystems are unfrozen
at once). However, RHEL6's guest agent is not able to
freeze selected filesystems, only all of them at once.

So, the case of 'freeze' with no mountpoints specified
can't be rejected as an error; it could be necessary
for a RHEL6 VM.
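
A sketch of the argument handling that accommodates both cases (only the virsh invocation is shown; the real script's surrounding plumbing is omitted):

  sub freeze_domain_filesystems {
      my ( $domain, @mountpoints ) = @_;
      # With mountpoints given, freeze only those filesystems (RHEL7-era
      # guest agents support that); with none given, freeze them all,
      # which must remain legal because it is all a RHEL6 agent can do.
      my @cmd = ( 'virsh', 'domfsfreeze', $domain );
      push @cmd, map { ( '--mountpoint', $_ ) } @mountpoints;
      system(@cmd) == 0
          or die 'virsh domfsfreeze failed with status ' . ( $? >> 8 );
  }
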
When space permits, it seems ideal to run amlvmsnapshot
only at pre-dle-estimate to create the snapshot, and at
post-dle-backup to free it, thereby backing up exactly
what was estimated. But if there was not much volume-group
space left to allocate to a snapshot, or if Amanda's delay
in gathering all estimates and planning backups is long,
the snapshot could run out of space.
For that case, allow defining a dumptype that runs
amlvmsnapshot four times (pre/post-dle-estimate,
pre/post-dle-backup). In that case, the backup will be done
from a second snapshot taken later, so it won't be exactly
what was estimated (that's why it's called an estimate ;)
but the two snapshots will be shorter-lived and less likely
to exhaust available space.

Allow amlibvirtfsfreeze to run pre-dle-backup too. Add man pages.
To back up a 389 LDAP directory server instance, this script
can be used on pre-dle-estimate to run db2bak, which copies out
a consistent snapshot of the database files from the server instance
(whose name, the part of its directory name after the slapd- prefix,
has to be specified with the 'instance' property) into the directory
named by the DLE's 'device'. Then amopaquetree is great for dumping
that consistent snapshot.

It sounds easy but permissions complicate matters. If the Amanda user
isn't root and the 389 server runs as a different user, some kind of
setuid 'rundb2bak' wrapper is needed. In fact, it has to be setuid and
setgid AND copy both of those to the real ids before it execs db2bak,
which otherwise complains. That means the wrapper had better live on
a file system that supports ACLs, because with both user and group
having to be 389's, there'd be no other way to make it executable by
the Amanda user but not by everyone.

A couple other annoying things done by db2bak are also best handled
in the setuid wrapper. If the destination directory already exists,
db2bak moves it to the same name with .bak tacked on (there is no
documented option to not do that). By itself that's not so bad, but
if that .bak directory ALSO already exists, db2bak fails. So it has
to be removed every time, most easily in the same setuid wrapper, which
is able to do so.

A tidy way for the backup strategy to work is to have a default, or
inheritable, ACL on the parent directory of the destination (DLE
'device'), so that when db2bak writes the files there (running as
the 389 user), they get ACLs allowing the Amanda user to read them,
so amopaquetree then has no trouble dumping them.

That's another thing db2bak is able to break, by creating its files
and directories with explicit modes disallowing group access (which,
at least in the POSIX ACL world, has the effect of zeroing the ACL's
'mask' entry; the files all inherit the parent ACL giving access to
the Amanda user, but the mask blocks it anyway). There might be some
other way around that, but the setuid wrapper used here also just runs
through the resulting tree fixing the doggoned ACLs.

That wrapper's not included in this commit, out of a sense that it's
probably too specific to this site. For Amanda to really benefit from
easy development of scripts like this one, I think there also needs to
be some kind of generalization of runtar that can allow other things
to be run with privilege, subject to some simple client-host configuration
file limiting what can be run and for what DLEs. Future work....
A site that isn't using amgrowingzip may have no need for the
Perl module Archive::Zip, so make sure amgrowingzip doesn't
fail the syntax checks at make time if Archive::Zip isn't present.
A site that doesn't use amopaquetree may have no need for rsync.
Allow amopaquetree to clearly announce in selfcheck if a usable
rsync isn't present.
Introduce the abstract classes Amanda::Application::Abstract
and Amanda::Script::Abstract, with which applications and scripts
can be developed in a more OO style by simply overriding necessary
methods, instead of having to manage the exact form of messages to
and from the parent process at the level presented in the
Application API and Script API documents. Here those IPC details
are handled by the abstract classes, effectively providing a new,
object/method API for applications and scripts to use.

Such an approach also better insulates individual applications and
scripts from any future evolution of the message formats to and from
the parent process. Changes can be made to the abstract classes
instead of being duplicated in many applications or scripts.

Also provide four new applications to show the simplicity of
development: amooraw (merely amraw redone in OO style), amgrowingfile
(useful for a single large file known to only monotonically grow; can
do incremental levels), amgrowingzip (like amgrowingfile but for a
ZIP archive), and amopaquetree (backup of a directory/file tree where
restoration of individual files won't be needed, but with fine-grained
incremental backup down to only changed regions within files, rather
than entire files that may contain small changes).

Four new scripts are also provided: am389bak for taking a consistent
snapshot of a 389 Directory Server instance before estimate or backup,
amsvnmakehotcopy to do the same for a Subversion repository,
amlvmsnapshot to allow backing up from a snapshot of a filesystem
using LVM, and amlibvirtfsfreeze to freeze and thaw filesystems in
a libvirt-supported guest VM, so the host filesystem can be snapshotted
for backup at a moment when the guest image files are consistent.

These four scripts, especially, should be considered experimental or
demo quality at this stage. They do little error checking of external
commands they execute (though they work fine when nothing goes wrong),
and, to be useful in most real environments, they are likely to need
short C-language setuid wrappers to be written for those few external
commands, and such wrappers are not included in this commit.
A configurable, secure, general-purpose permission-granting wrapper
would greatly simplify development of scripts like these, but a design
for that remains future work.
In amanda-scripts(7), amzfs-snapshot is already (correctly) listed,
so remove it from amanda-applications(7); it's a script, not an app.
None for the scripts yet; those are more experimental.
This adds a third Amanda way of backing up ZFS, this one using
replication streams (preserving snapshot history, not just one
recent snapshot), and relying on some other schedule creating
regular snapshots; this application preserves those, without
creating its own.

Does not yet support 'send -nvP', which would be a faster and more
accurate estimating approach, nor 'send -c'. (Also, doesn't yet
take compressratio into account for non-nvP estimating.)
This whole estimating business is tedious compared to
using send -nvP in OpenZFS.
Support the OpenZFS 'zfs send -nvP' method of getting an estimated
send size. (Still needs to be tested on a box that supports -nvP.)

Discovered in passing that Amanda::Application::Abstract wasn't
declaring --calcsize as an estimate option if supports_calcsize()
was true ... and fixed a silent error, present from the start in
A::A::Abstract, that was caught by a handy warning from Perl while
testing on a different version.
Add a property UNCOMPRESSED that defaults to true, but can be set
to false if the platform supports compressed streams with
zfs send -c as in OpenZFS. (Note that Solaris 10 and 11 zfs send
has a -c option that means something else, unrelated to compression.)

UNCOMPRESSED=false, where possible, is a win both for space and for
CPU cycles, which will then not be used to uncompress stored data into
a bloated send stream.
Add support for the dedup, embed, large-block, and raw options,
which can simply be passed through to zfs send, without otherwise
changing logic here.
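
The pass-through amounts to a small mapping from property names to OpenZFS send flags; a sketch (how the parsed boolean properties are actually stored is an assumption here, so they are passed in as a plain hashref):

  sub zfs_send_passthrough_flags {
      my ( $props ) = @_;            # hashref of the boolean properties
      my %send_flag = (
          'dedup'       => '-D',     # deduplicated stream
          'embed'       => '-e',     # embed small blocks in the stream
          'large-block' => '-L',     # permit blocks larger than 128 KiB
          'raw'         => '-w',     # send encrypted datasets as stored
      );
      return map  { $send_flag{$_} }
             grep { $props->{$_} }
             sort keys %send_flag;
  }

  # e.g. my @cmd = ( 'zfs', 'send', zfs_send_passthrough_flags($props), ... );
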
This application adds a third Amanda way of approaching ZFS backup.
Where amzfs-snapshot makes its own snapshot of a single dataset and
lets you back it up with a traditional archiving tool, and
amzfs-sendrecv makes its own snapshot of a single dataset and
captures only that snapshot with zfs send, this new amzfs-holdsend
(a) does not make its own snapshot, but assumes you have some other
scheduled process taking snapshots, (b) captures all snapshots since
the last backup, not just the latest one, and (c) operates on a
subtree in the ZFS namespace (the dataset named by DISK or DEVICE
and its descendants), rather than a single dataset.

Admins now have three choices in how to use Amanda for ZFS backups,
and can choose one best suited to local needs.

jcflack commented Aug 30, 2017

Attachment: AppScriptWithAbstractClasses.pdf
Have just pushed one more new app, amzfs-holdsend, giving a third choice (in addition to amzfs-snapshot and amzfs-sendrecv) for how to approach ZFS backups.

Also attached here is a PDF file of the new generated docs for ease of review.

Chapman Flack added 3 commits September 1, 2017 10:48
Instead of making inner_backup responsible for calling write_local_state,
have command_backup do that automagically if RECORD is supported and
requested and a $self->{'localstate'} exists.

This is preparation for a future change in which command_backup will get
a confirmation from the server before writing the new state.

Discussion: https://marc.info/?l=amanda-hackers&m=150427714716446

Once it becomes possible that a negative confirmation from the server
prevents write_local_state from being called, there should be a
repair_local_state method an application can override to reclaim
any resources that were going to be referred to in the new state,
but would be leaked when that state is not saved.
That is, called as $class->supports(...) or, when called from an
instance method, blessed($self)->supports(...).

The question https://marc.info/?l=amanda-hackers&m=150410741108445
got me, at first, to explain this without realizing I had flubbed it
myself nine times. Nothing broke, as none of the existing supports...
methods dereference the implicit argument for anything, so I hadn't
noticed.
Chapman Flack added 27 commits September 16, 2017 21:36
As added upstream at adbcd7f, there is now a timestamp property
(and an implemented support subcommand to advertise it).

Update Amanda::Script::Abstract correspondingly.

Existing scripts were doing various custom property checks in new()
for simplicity; that won't work for 'support' because the properties
are not passed in that case. Therefore, a new method check_properties()
is the place for such checks; it is called by run() before do() in every
case except 'support'.

This is preparatory to a way for scripts to maintain invocation-specific
local state, but that is not in this commit.
Sync with upstream changes introducing timestamp property and 'support'
subcommand for scripts. Other changes from ongoing review.

In passing, fix the addition of Amanda::Script::Abstract to
perl/Makefile.am, which was not quite right in 2754e67.
The timestamp can be passed to applications too, not only to scripts.

To support dropping these Perl modules into earlier versions
of Amanda, don't advertise some more recent features in 'support'
unless corresponding Amanda::Feature constants are defined.
... including reporting the parsed options to the debug log.
Also fix a straggling out-of-date comment.
Checking can now be done with a sequence of check(condition, message)
(which will report any failed checks without interrupting execution),
followed at the end (in the command_... method) with a single bare
check(), which throws an exception to end execution if any of the
foregoing checks failed.

This should simplify using a single set of check methods both from
command_selfcheck (which simply ought to report as many issues as
possible) and from other commands (which ought to fail if anything
isn't right).
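
In use, the pattern looks roughly like this; the helper name and the specific conditions are invented for the example, and only check() itself is part of the proposed API:

  sub declare_checks {                   # shared by every subcommand
      my ( $self ) = @_;
      $self->check( defined $self->{'device'},
                    'a DLE device must be specified' );
      $self->check( -x '/usr/bin/rsync',
                    'a usable rsync is present' );
  }

  sub command_backup {
      my ( $self ) = @_;
      $self->declare_checks();
      $self->check();   # throws now if any check above failed;
                        # command_selfcheck would run declare_checks()
                        # alone, reporting every failure and continuing
      # ... proceed with the dump ...
  }
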
When estimating, if the requested level makes no sense (the prior one
isn't recorded), a DiscontiguousLevelError should be thrown, which will
be caught and turned into a -2 -2 report to the server (meaning not to
attempt that level at all). Anything else thrown from inner_estimate will
become a -1 -1 return, telling the server it may use its own estimate
if it has one. However, a critical error will be reported if there was
not at least one requested estimate level that succeeded.

Discussion: https://marc.info/?l=amanda-hackers&m=150515725916583
The exception's on_uncaught() will produce a special IPC message
to the parent process: sendbackup: retry delay s level n message m

Discussion: https://marc.info/?l=amanda-hackers&m=150428762720212
Add support for applications as well as scripts to receive
a timestamp.

Introduce exception objects to simplify control flow, based on
Amanda::Message objects to better integrate applications with that
convention also. For now, unique message codes or ranges for specific
applications have not been assigned. Methods transitionalError() and
transitionalGood() create objects with the generic 1 and 0 codes,
suitable until later patches add unique codes (if that is worth doing).

Introduce special exceptions that can be thrown from 'estimate' code
to indicate a requested level isn't possible (returning a -2 -2 estimate,
as discussed in https://marc.info/?l=amanda-hackers&m=150515725916583),
and from 'backup' code to force a retry at a different level, as discussed
for amgrowingfile in https://marc.info/?l=amanda-hackers&m=150428762720212.

Two tweaks in Amanda::Debug so the warn and die handlers do not fail
when $@ is an exception object rather than a plain string.
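
The die-handler side of that tweak boils down to tolerating a non-string $@; a generic sketch of the idea (not the actual Amanda::Debug patch):

  use Scalar::Util qw(blessed);

  $SIG{__DIE__} = sub {
      my ( $err ) = @_;
      # $err may now be an exception object (based on Amanda::Message)
      # rather than a plain string; stringify it explicitly so the handler
      # itself cannot blow up on a reference before logging the message.
      my $text = blessed($err) ? "$err" : $err;
      # ... hand $text to the usual debug-log path ...
  };
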
... and adjust existing scripts to use them.
Applications now accept --target (or, equivalently, the deprecated
--directory). For restoration, the property will be honored if present;
otherwise restoration will happen in the current working directory.
For other subcommands, it is honored if present, defaulting to --device.

Discussion: https://marc.info/?l=amanda-hackers&m=150471292725829
Related commit: 22ffc89

Add exception classes in Amanda::Script::Abstract similar to those
in Amanda::Application::Abstract, and use them in scripts.
The code to report exceptions that are not Amanda::...::Message instances
was missing the second parameter to print_to_server_and_die.
Report failure if the snapshot reached 100% allocation while the
backup was in progress, or a warning if it reached 90% or more,
so the admin knows to increase the allocated size, arrange for the
estimate/backup to happen faster, or use separate snapshots for the
estimate and the backup.
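
A sketch of the kind of probe involved (the lvs field name, and treating its output as a bare percentage, are assumptions for illustration; the reporting side is not shown):

  sub snapshot_fill_percent {
      my ( $vg, $snapshot_lv ) = @_;
      # Ask LVM how full the snapshot's copy-on-write area became.
      my $out = qx{lvs --noheadings --options data_percent $vg/$snapshot_lv};
      $out =~ s/^\s+|\s+$//g;
      return $out;       # >= 100 means the snapshot overflowed during the
                         # backup; >= 90 is close enough to warrant a warning
  }
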
Had been passing ERROR because of a confusing comment in
Amanda::Script_App, and that isn't a problem in Amanda >= 3.3.8
because, in those recent versions, the status passed to
print_to_server_and_die gets coerced to FAILURE anyway. But
passing FAILURE explicitly here makes it possible to drop these
modules into Amanda < 3.3.8 installations and still have proper
behavior.

Discussion: https://marc.info/?l=amanda-hackers&m=151256442622699
Just before freezing/snapshotting for a backup can be a natural time
to trim guest filesystems, keeping the backing image sizes in check
by letting unused blocks be returned to the host OS.
chassell commented:

I'll need to catch up and read more of this, but let me say it merges beautifully and seems to disrupt nothing at all except to add more files and functions and a few documentation changes.

I'm not sure it will be welcomed as a direct benefit, but it can be a very clear "addition" of commands.

Chapman Flack and others added 2 commits November 10, 2020 11:02
If there are domains mentioned for trim/freeze/thaw in
the DLE but some of them happen not to be running at
the time of the backup, those operations are (a) impossible
and (b) unnecessary, so skip them for those domains and let
the dump succeed.
Enhance the amlibvirtfsfreeze script to first check whether the
libvirt domain in question is running. If it is not, attempts to
trim/freeze/thaw its filesystems through the guest agent will fail,
but they should also be unneeded, as its filesystem image is then
quiescent. (That assumption may not hold if the VM was shut down
abruptly; detecting or handling that case is beyond the scope of
this patch.)

Therefore, the script may act as a successful no-op (with an
informational message to the debug log) if the domain is not running.