Task/rdmp 224 dqe update #2095

Open
wants to merge 42 commits into
base: develop
42 commits
2a516bf
add changelog
JFriel Sep 25, 2024
53596b0
Bugfix/rdmp 253 filter ordering (#2007)
JFriel Sep 26, 2024
c2ce5d3
Merge branch 'develop' of https://github.com/HicServices/RDMP into re…
JFriel Oct 10, 2024
6edf05b
Merge branch 'develop' of https://github.com/HicServices/RDMP into re…
JFriel Oct 22, 2024
80b03e7
Merge branch 'develop' of https://github.com/HicServices/RDMP into re…
JFriel Oct 24, 2024
23cd4c1
Merge branch 'develop' of https://github.com/HicServices/RDMP into re…
JFriel Oct 30, 2024
670a5f6
interim
JFriel Nov 4, 2024
5222afa
improved add
JFriel Nov 4, 2024
244e653
interim
JFriel Nov 5, 2024
e9fec7a
interim
JFriel Nov 6, 2024
b30c682
interim
JFriel Nov 7, 2024
26b9b3e
all not working
JFriel Nov 8, 2024
212736b
correct rows
JFriel Nov 8, 2024
f9f1670
add todo
JFriel Nov 8, 2024
cd08015
columns still not working
JFriel Nov 11, 2024
a0b7ad1
working periodicity state
JFriel Nov 14, 2024
14f5163
promising
JFriel Nov 15, 2024
7115f44
interim
JFriel Nov 18, 2024
9cc351b
attempt works
JFriel Nov 18, 2024
3ef391f
fix row state
JFriel Nov 18, 2024
558e89c
tidy up
JFriel Nov 19, 2024
9c75308
rethink periodicity
JFriel Nov 19, 2024
3c8fc5f
Merge branch 'develop' of https://github.com/HicServices/RDMP into ta…
JFriel Dec 16, 2024
41dbf8a
row state without all
JFriel Dec 16, 2024
ad1f077
working row states
JFriel Dec 16, 2024
dbd7ed4
working column state
JFriel Dec 17, 2024
e8f4f06
confirm rows and columns
JFriel Dec 17, 2024
252f16d
actually fix rows
JFriel Dec 17, 2024
b98a7f8
add start of periodicity
JFriel Dec 17, 2024
3278c4b
working periodicity
JFriel Dec 17, 2024
cd0df1b
interim
JFriel Dec 17, 2024
31d46aa
Merge branch 'develop' of https://github.com/HicServices/RDMP into ta…
JFriel Dec 19, 2024
0bb6067
add test
JFriel Dec 19, 2024
05f1721
add class coumentation
JFriel Dec 19, 2024
e0bb3d3
tidy up
JFriel Dec 19, 2024
b4ada9f
add note
JFriel Dec 19, 2024
2bace40
tidy up
JFriel Jan 6, 2025
2d3a7fa
tidy up
JFriel Jan 6, 2025
851416e
add docs
JFriel Jan 6, 2025
91dcf0d
Merge branch 'develop' into task/RDMP-224-DQE-update
JFriel Jan 6, 2025
1debdcb
codeql updates
JFriel Jan 6, 2025
fcac338
update test
JFriel Jan 6, 2025
@@ -386,7 +386,7 @@ private T Activate<T, T2>(T2 databaseObject, Image<Rgba32> tabImage)

uiInstance.SetDatabaseObject(this, databaseObject);

if (insertIndex is not null)
if (insertIndex is not null && _mainDockPanel.ActivePane is not null)
{
_mainDockPanel.ActivePane.SetContentIndex(floatable, (int)insertIndex);
}
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

- Build on and target .Net 9 rather than 8
- Add DQE Updater Mutilator for Data Loads. See [DQE Post Load Runner](./Documentation/DataLoadEngine/DQEPostLoadRunner.md)

## [8.4.2] - 2024-12-18

- Fix issue with MEF constructing Remote Table Attachers
12 changes: 12 additions & 0 deletions Documentation/DataLoadEngine/DQEPostLoadRunner.md
@@ -0,0 +1,12 @@
# DQE Post Load Runner

The DQE post-load runner can be used to automatically perform a DQE update once a data load completes.
The runner attempts to reuse any existing DQE results that are unaffected by the data load. This process can still be slow if the catalogue data is large or complex.

## Requirements
The DQE post-load runner requires an existing DQE result for the catalogue. Without one, the update fails.

## Configuration
The runner issues several queries against the database. The timeout applied to each of these commands is configurable via the `Timeout` option.
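
As a rough sketch of the configuration described above, the self-contained snippet below mirrors how this PR's `DQEPostLoadRunner.Mutilate` fills in `DqeOptions` (`Catalogue`, `DataLoadUpdateID`, `CommandTimeout`) and how `DqeRunner.Run` picks between a partial update and a full report. `DqeOptionsSketch`, `DqePostLoadSketch`, `BuildOptions` and `ChooseAction` are hypothetical stand-ins for illustration only, not RDMP types. The real classes live in `Rdmp.Core` and carry more members.

```csharp
using System;

// Hypothetical stand-in for Rdmp.Core.CommandLine.Options.DqeOptions;
// only the three properties touched by this change are mirrored here.
public sealed class DqeOptionsSketch
{
    public string Catalogue { get; set; }
    public string DataLoadUpdateID { get; set; }   // null means "run the full DQE"
    public int CommandTimeout { get; set; }        // handed to each internal SQL command
}

public static class DqePostLoadSketch
{
    // Mirrors the object initializer in DQEPostLoadRunner.Mutilate:
    // IDs travel as strings, the timeout comes from the mutilator's Timeout property.
    public static DqeOptionsSketch BuildOptions(int catalogueId, int dataLoadId, int timeout) =>
        new DqeOptionsSketch
        {
            Catalogue = catalogueId.ToString(),
            DataLoadUpdateID = dataLoadId.ToString(),
            CommandTimeout = timeout
        };

    // Mirrors the branch in DqeRunner.Run: a supplied data load ID selects
    // UpdateReport, otherwise the full GenerateReport path is taken.
    public static string ChooseAction(DqeOptionsSketch options)
    {
        if (options.DataLoadUpdateID is null)
            return "GenerateReport";

        // int.Parse throws FormatException on non-numeric input, as in the runner.
        var dataLoadId = int.Parse(options.DataLoadUpdateID);
        return $"UpdateReport(dataLoadID: {dataLoadId}, timeout: {options.CommandTimeout})";
    }
}

public static class Program
{
    public static void Main()
    {
        var options = DqePostLoadSketch.BuildOptions(catalogueId: 12, dataLoadId: 345, timeout: 50000);
        Console.WriteLine(DqePostLoadSketch.ChooseAction(options));
    }
}
```

The value `50000` matches the default declared on the mutilator's `Timeout` property in this PR. The catalogue and data load IDs are made-up numbers.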


415 changes: 415 additions & 0 deletions Rdmp.Core.Tests/DataQualityEngine/DQEPartialUpdateTests.cs

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions Rdmp.Core/CommandLine/Options/DqeOptions.cs
@@ -16,4 +16,10 @@ public class DqeOptions : RDMPCommandLineOptions
{
[Option('c', "Catalogue", HelpText = "ID of the Catalogue to run the DQE on", Required = true)]
public string Catalogue { get; set; }

[Option('d', "DataLoad", HelpText = "ID of the Data Load to run the DQE on. Adds new data to existing DQE results if they exist", Required = false)]
public string DataLoadUpdateID { get; set; }

[Option('t', "Timeout", HelpText = "How long (in seconds) each internal SQL command should run for before timing out")]
public int CommandTimeout { get; set; }
}
10 changes: 9 additions & 1 deletion Rdmp.Core/CommandLine/Runners/DqeRunner.cs
@@ -5,6 +5,7 @@
// You should have received a copy of the GNU General Public License along with RDMP. If not, see <https://www.gnu.org/licenses/>.

using System;
using Org.BouncyCastle.Security.Certificates;
using Rdmp.Core.CommandLine.Options;
using Rdmp.Core.Curation.Data;
using Rdmp.Core.DataFlowPipeline;
@@ -29,12 +30,19 @@ public override int Run(IRDMPPlatformRepositoryServiceLocator repositoryLocator,
ICheckNotifier checkNotifier, GracefulCancellationToken token)
{
var catalogue = GetObjectFromCommandLineString<Catalogue>(repositoryLocator, _options.Catalogue);
int? dataLoadID = null;
if (_options.DataLoadUpdateID != null)
dataLoadID = int.Parse(_options.DataLoadUpdateID);

var report = new CatalogueConstraintReport(catalogue, SpecialFieldNames.DataLoadRunID);

switch (_options.Command)
{
case CommandLineActivity.run:
report.GenerateReport(catalogue, listener, token.AbortToken);
if (dataLoadID is not null)
report.UpdateReport(catalogue, (int)dataLoadID, _options.CommandTimeout, listener, token.AbortToken);
else
report.GenerateReport(catalogue, listener, token.AbortToken);
return 0;

case CommandLineActivity.check:
2 changes: 1 addition & 1 deletion Rdmp.Core/Curation/Data/Aggregation/AggregateFilter.cs
@@ -208,4 +208,4 @@ public AggregateFilter ShallowClone(AggregateFilterContainer into)
CopyShallowValuesTo(clone);
return clone;
}
}
}
2 changes: 1 addition & 1 deletion Rdmp.Core/Curation/Data/ExtractionFilter.cs
@@ -187,4 +187,4 @@ public override void DeleteInDatabase()

base.DeleteInDatabase();
}
}
}
Original file line number Diff line number Diff line change
@@ -11,6 +11,7 @@
using Rdmp.Core.DataLoad.Engine.Attachers;
using Rdmp.Core.DataLoad.Engine.Job;
using Rdmp.Core.DataLoad.Engine.LoadExecution.Components.Arguments;
using Rdmp.Core.DataLoad.Modules.Attachers;
using Rdmp.Core.Repositories;
using Rdmp.Core.ReusableLibraryCode.Checks;
using Rdmp.Core.ReusableLibraryCode.Progress;
@@ -29,6 +30,8 @@ public class AttacherRuntimeTask : RuntimeTask, IMEFRuntimeTask
public AttacherRuntimeTask(IProcessTask task, RuntimeArgumentCollection args)
: base(task, args)
{

//RequestsExternalDatabaseCreation
//All attachers must be marked as mounting stages, and therefore we can pull out the RAW Server and Name
var mountingStageArgs = args.StageSpecificArguments;
if (mountingStageArgs.LoadStage != LoadStage.Mounting)
6 changes: 5 additions & 1 deletion Rdmp.Core/DataLoad/Modules/Mutilators/DQEPostLoadRunner.cs
@@ -27,6 +27,8 @@ namespace Rdmp.Core.DataLoad.Modules.Mutilators;
public class DQEPostLoadRunner : IMutilateDataTables
{

[DemandsInitialization("Timeout length for each query required to run the DQE update", defaultValue: 50000)]
public int Timeout { get; set; }
public void Check(ICheckNotifier notifier)
{
}
@@ -73,7 +75,9 @@ public ExitCodeType Mutilate(IDataLoadJob job)
DqeOptions options = new()
{
Catalogue = catalogue.ID.ToString(),
Command = CommandLineActivity.run
DataLoadUpdateID = job.DataLoadInfo.ID.ToString(),
Command = CommandLineActivity.run,
CommandTimeout = Timeout
};
var runner = RunnerFactory.CreateRunner(new ThrowImmediatelyActivator(job.RepositoryLocator), options);
runner.Run(job.RepositoryLocator, ThrowImmediatelyDataLoadEventListener.Quiet, new AcceptAllCheckNotifier(),
33 changes: 24 additions & 9 deletions Rdmp.Core/DataLoad/Triggers/DiffDatabaseDataFetcher.cs
@@ -10,6 +10,7 @@
using System.Text;
using FAnsi;
using FAnsi.Discovery;
using MongoDB.Driver;
using Rdmp.Core.Curation.Data;
using Rdmp.Core.Curation.Data.Spontaneous;
using Rdmp.Core.QueryBuilding;
@@ -113,7 +114,7 @@ public void FetchData(ICheckNotifier checkNotifier)
CheckResult.Success));

GetInsertData(server, database, checkNotifier);
GetUpdatetData(server, database, checkNotifier);
GetUpdatedData(server, database, checkNotifier);
}
catch (Exception e)
{
@@ -163,7 +164,7 @@ private void GetInsertData(DiscoveredServer server, DiscoveredDatabase database,
}


private void GetUpdatetData(DiscoveredServer server, DiscoveredDatabase database, ICheckNotifier checkNotifier)
private void GetUpdatedData(DiscoveredServer server, DiscoveredDatabase database, ICheckNotifier checkNotifier)
{
const string archive = "archive";
const string zzArchive = "zzarchivezz";
@@ -191,7 +192,8 @@ private void GetUpdatetData(DiscoveredServer server, DiscoveredDatabase database
--Records which appear in the archive
SELECT top {{0}}
{{6}},
{{7}}
{{7}},
{{8}}
FROM {{1}}
CROSS APPLY
(
@@ -200,7 +202,7 @@ SELECT TOP 1 {{2}}.*
WHERE
{{3}}
order by {syntaxHelper.EnsureWrapped(SpecialFieldNames.ValidFrom)} desc
) {{8}}
) {{9}}
where
{{1}}.{{4}} = {{5}}";
break;
@@ -214,13 +216,14 @@ SELECT TOP 1 {{2}}.*
/*Records which appear in the archive*/
SELECT
{{6}},
{{7}}
{{7}},
{{8}}
FROM
{{1}}
Join
{{2}} {{8}} on {whereStatement.Replace(archiveTableName, archive)}
{{2}} {{9}} on {whereStatement.Replace(archiveTableName, archive)}
AND
{{8}}.{{9}} = (select max({syntaxHelper.EnsureWrapped(SpecialFieldNames.ValidFrom)}) from {{2}} s where {whereStatement.Replace(archiveTableName, archive).Replace(tableName, "s")})
{{9}}.{{10}} = (select max({syntaxHelper.EnsureWrapped(SpecialFieldNames.ValidFrom)}) from {{2}} s where {whereStatement.Replace(archiveTableName, archive).Replace(tableName, "s")})
where
{{1}}.{{4}} = {{5}}

@@ -241,8 +244,9 @@ SELECT TOP 1 {{2}}.*
_dataLoadRunID, //{5}
GetSharedColumnsSQL(tableName), //{6}
GetSharedColumnsSQLWithColumnAliasPrefix(archive, zzArchive), //{7}
archive, //{8}
syntaxHelper.EnsureWrapped(SpecialFieldNames.ValidFrom)
GetHICSpecialColumns(archive, zzArchive),//{8}
archive, //{9}
syntaxHelper.EnsureWrapped(SpecialFieldNames.ValidFrom) //{10}
);

var dtComboTable = new DataTable();
@@ -253,11 +257,15 @@ SELECT TOP 1 {{2}}.*

//add the columns from the combo table to both views
foreach (DataColumn col in dtComboTable.Columns)
{
if (!col.ColumnName.StartsWith(zzArchive, StringComparison.InvariantCultureIgnoreCase))
{
Updates_New.Columns.Add(col.ColumnName, col.DataType);
Updates_Replaced.Columns.Add(col.ColumnName, col.DataType);
}
}
Updates_Replaced.Columns.Add(SpecialFieldNames.DataLoadRunID, typeof(int));
Updates_Replaced.Columns.Add(SpecialFieldNames.ValidFrom, typeof(DateTime));

foreach (DataRow fromRow in dtComboTable.Rows)
{
@@ -272,6 +280,13 @@ SELECT TOP 1 {{2}}.*
}
}

private string GetHICSpecialColumns(string tableName, string columnAliasPrefix = "")
{
return $@"{tableName}.{SpecialFieldNames.DataLoadRunID} as {columnAliasPrefix}{SpecialFieldNames.DataLoadRunID},
{tableName}.{SpecialFieldNames.ValidFrom} as {columnAliasPrefix}{SpecialFieldNames.ValidFrom}
";
}

private string GetSharedColumnsSQLWithColumnAliasPrefix(string tableName, string columnAliasPrefix)
{
var sb = new StringBuilder();
3 changes: 1 addition & 2 deletions Rdmp.Core/DataQualityEngine/Data/ColumnState.cs
@@ -145,8 +145,7 @@ public void Commit(Evaluation evaluation, string pivotCategory, DbConnection con
DatabaseCommandHelper.AddParameterWithValueToCommand("@PivotCategory", cmd, pivotCategory);
cmd.ExecuteNonQuery();
}



IsCommitted = true;
}
}
11 changes: 11 additions & 0 deletions Rdmp.Core/DataQualityEngine/Data/RowState.cs
@@ -25,6 +25,17 @@ public class RowState
public string PivotCategory { get; private set; }


public RowState(int dataLoadRunID, int correct, int missing, int wrong, int invalid,
string validatorXml, string pivotCategory)
{
Correct = correct;
Missing = missing;
Wrong = wrong;
Invalid = invalid;
ValidatorXML = validatorXml;
DataLoadRunID = dataLoadRunID;
// The pivotCategory argument was accepted but never stored.
PivotCategory = pivotCategory;
}

public RowState(DbDataReader r)
{
Correct = Convert.ToInt32(r["Correct"]);