ANNZ 2.0.4
**Modified the function, `CatFormat::addWgtKNNtoTree()`, and added
`CatFormat::asciiToFullTree_wgtKNN()`:** The purpose of the new
features is to add an output variable, denoted `inTrainFlag`, to the
output of the evaluation. The new output indicates whether the
corresponding object is "compatible" with objects from the training
dataset. The compatibility is estimated by comparing the density of
training-dataset objects in the vicinity of the evaluated object. If
the evaluated object belongs to an area of parameter-space which is
not represented in the training dataset, we get `inTrainFlag = 0`. In
this case, the output of the training is probably unreliable.
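Conceptually, the density comparison behind `inTrainFlag` can be sketched as follows. This is a minimal, illustrative Python/NumPy toy, not the actual C++/ROOT implementation in `CatFormat`; the function name, the median-based normalisation, and the default thresholds are assumptions made for illustration only:

```python
import numpy as np

def in_train_flag(train, evaluated, n_neighbors=20, max_rel_ratio=0.1):
    """Toy version of the inTrainFlag idea: compare the radius needed to
    enclose n_neighbors training objects around each evaluated object with
    the typical such radius inside the training sample itself (summarised
    here by the median). Sparse neighborhoods -> flag 0 (unreliable)."""
    def knn_radius(points, ref, k):
        # Euclidean distance from every point to every reference object
        d = np.linalg.norm(points[:, None, :] - ref[None, :, :], axis=2)
        d.sort(axis=1)
        return d[:, k]

    # radius enclosing the n_neighbors nearest training objects of each
    # evaluated object (index k-1, since indexing is zero-based)
    r_eval = knn_radius(evaluated, train, n_neighbors - 1)
    # typical radius among the training objects themselves
    # (index n_neighbors skips the zero self-distance)
    r_typ = np.median(knn_radius(train, train, n_neighbors))
    # a small density ratio means the region is poorly covered by training
    return (r_typ / np.maximum(r_eval, 1e-12) >= max_rel_ratio).astype(int)
```

A ratio close to 1 means the evaluated object sits in a region about as densely populated as the bulk of the training sample; in ANNZ itself the corresponding criterion is steered by `minNobjInVol_inTrain` and `maxRelRatioInRef_inTrain`.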
IftachSadeh committed Mar 18, 2015
1 parent d0ac2fc commit f3e0a0b
Showing 11 changed files with 478 additions and 203 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,12 @@

<!-- ## Master version -->

## ANNZ 2.0.4 (19/3/2015)

- **Modified the function, `CatFormat::addWgtKNNtoTree()`, and added `CatFormat::asciiToFullTree_wgtKNN()`:** The purpose of the new features is to add an output variable, denoted `inTrainFlag`, to the output of the evaluation. The new output indicates whether the corresponding object is "compatible" with objects from the training dataset. The compatibility is estimated by comparing the density of training-dataset objects in the vicinity of the evaluated object. If the evaluated object belongs to an area of parameter-space which is not represented in the training dataset, we get `inTrainFlag = 0`. In this case, the output of the training is probably unreliable.

- Other minor modifications.

## ANNZ 2.0.3 (25/2/2015)

- **Added *MultiClass* support to binned classification:** The new option is controlled by setting the `doMultiCls` flag. In this mode, multiple background samples can be trained simultaneously against the signal. In the context of binned classification, this means that each classification bin acts as an independent sample during the training.
8 changes: 7 additions & 1 deletion README.md
@@ -1,4 +1,4 @@
# ANNZ 2.0.3
# ANNZ 2.0.4

## Introduction
ANNZ uses both regression and classification techniques for estimation of single-value photo-z (or any regression problem) solutions and PDFs. In addition, it is suitable for classification problems, such as star/galaxy classification.
@@ -367,6 +367,12 @@ which in this example, adds the U-band magnitude and the error on the I-band mag

The directory, `output/test_randReg_quick/regres/eval/` (for the `scripts/annz_rndReg_quick.py` example), contains the output ascii and ROOT tree files, respectively, `ANNZ_randomReg_0000.csv` and `ANNZ_tree_randomReg_00002.root`. These have a format similar to the one described above.

In addition to the above-mentioned variables, the parameter `inTrainFlag` is included in the output, provided the user sets:
```python
glob.annz["addInTrainFlag"] = True
```
(See `scripts/annz_rndReg_advanced.py`.) This output indicates whether an evaluated object is "compatible" with corresponding objects from the training dataset. The compatibility is estimated by comparing the density of objects in the training dataset in the vicinity of the evaluated object. If the evaluated object belongs to an area of parameter-space which is not represented in the training dataset, we get `inTrainFlag = 0`. In this case, the output of the training is probably unreliable. The calculation is performed using a KNN approach, similar to the algorithm used for the `glob.annz["useWgtKNN"] = True` calculation.
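Downstream, the flag can be used to mask out unreliable estimates when reading the evaluation output. A minimal sketch, assuming a plain comma-separated file with a header row; the column names here are illustrative, so check the header of your own output file:

```python
import csv

def reliable_rows(csv_path):
    """Keep only evaluated objects flagged as compatible with the
    training sample (inTrainFlag > 0); the remaining rows carry
    estimates that are probably unreliable."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        return [row for row in reader if float(row["inTrainFlag"]) > 0]
```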

### Single regression

The outputs of single regression are similar to those of randomized regression. In this case, the *best* MLM is actually the only MLM, and no PDF solutions are created. For instance, using `scripts/annz_singleReg_quick.py`, the performance plots will be found at `output/test_singleReg_quick/regres/optim/eval/plots/` and the output ascii file would be found at `output/test_singleReg_quick/regres/optim/eval/ANNZ_singleReg_0000.csv`. The latter would nominally include the variables:
25 changes: 25 additions & 0 deletions examples/scripts/annz_rndReg_advanced.py
@@ -385,6 +385,31 @@
# (can be used to prevent multiple evaluation of different input files from overwriting each other)
glob.annz["evalDirPostfix"] = "nFile0"

# -----------------------------------------------------------------------------------------------------------
# addInTrainFlag, minNobjInVol_inTrain, maxRelRatioInRef_inTrain -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - addInTrainFlag - calculate for each object which is evaluated, if it is "close" in the
# input-parameter space to the training dataset. The result is written as part of the evaluation
# output, as an additional parameter named "inTrainFlag". The value of "inTrainFlag"
# is zero if the object is not "close" to the training objects (therefore probably has unreliable result).
# The calculation is performed using a KNN approach, similar to the algorithm used for
# the [glob.annz["useWgtKNN"] = True] calculation.
# - minNobjInVol_inTrain - The number of reference objects in the reference dataset which are used in the calculation.
# - maxRelRatioInRef_inTrain - A number in the range [0,1] - the minimal threshold of the relative difference between
#                              distances in the inTrainFlag calculation for accepting an object - should be a positive number smaller than 0.5.
# - ...._inTrain - The rest of the parameters ending with "_inTrain" have a similar role as
# their "_wgtKNN" counterparts, which are used with [glob.annz["useWgtKNN"] = True]. These are:
# - "outAsciiVars_inTrain", "weightInp_inTrain", "cutInp_inTrain",
# "cutRef_inTrain", "sampleFracInp_inTrain" and "sampleFracRef_inTrain"
# -----------------------------------------------------------------------------------------------------------
addInTrainFlag = False
if addInTrainFlag:
glob.annz["addInTrainFlag"] = True
glob.annz["minNobjInVol_inTrain"] = 100
glob.annz["maxRelRatioInRef_inTrain"] = 0.1
glob.annz["weightVarNames_inTrain"] = "MAG_U;MAG_G;MAG_R;MAG_I;MAG_Z"
# glob.annz["weightRef_inTrain"] = "(MAG_Z<20.5 && MAG_R<22 && MAG_U<24)" # cut the reference sample, just to have some difference...

# run ANNZ with the current settings
runANNZ()

1 change: 1 addition & 0 deletions include/CatFormat.hpp
@@ -55,6 +55,7 @@ class CatFormat : public BaseClass {
void asciiToSplitTree(TString inAsciiFiles, TString inAsciiVars);
void asciiToFullTree(TString inAsciiFiles, TString inAsciiVars, TString treeNamePostfix = "");
void asciiToSplitTree_wgtKNN(TString inAsciiFiles, TString inAsciiVars, TString inAsciiFiles_wgtKNN, TString inAsciiVars_wgtKNN);
void asciiToFullTree_wgtKNN(TString inAsciiFiles, TString inAsciiVars, TString treeNamePostfix);
void parseInputVars(VarMaps * var, TString inAsciiVars, vector <TString> & inVarNames, vector <TString> & inVarTypes);
bool inputLineToVars(TString line, VarMaps * var, vector <TString> & inVarNames, vector <TString> & inVarTypes);
void setSplitVars(VarMaps * var, TRandom * rnd, map <TString,int> & intMap);
8 changes: 4 additions & 4 deletions src/ANNZ_loopCls.cpp
@@ -296,7 +296,7 @@ void ANNZ::optimCls() {
TGraphErrors * grph = new TGraphErrors(int(graph_X.size()),&graph_X[0], &graph_Y[0],&graph_Xerr[0], &graph_Yerr[0]);

grph->SetName(TString::Format((TString)"compPure_%d"+"_clasOptimize"+typeName+"_%d",nCompPureMgNow,nPlotSbSepNow));
grph->SetTitle(TString::Format((TString)"#%d, S_{s/b} ("+getTagName(nMLMnow)+","+typeToNameMLM[typeMLM[nMLMnow]]+") = %1.2e",nSbSepIndexNow,sbSepFrac));
grph->SetTitle(TString::Format((TString)"ranked as #%d, S_{s/b} ("+getTagName(nMLMnow)+","+typeToNameMLM[typeMLM[nMLMnow]]+") = %1.2e",nSbSepIndexNow+1,sbSepFrac));
grph->GetXaxis()->SetTitle("Completeness"); grph->GetYaxis()->SetTitle("Purity");
compPureMgV[typeName][nCompPureMgNow]->Add(grph);
}
@@ -325,9 +325,9 @@

normFactor = his1M[sigBckName][nMLMnow]->Integral(); if(normFactor>0) his1M[sigBckName][nMLMnow]->Scale(1/normFactor,"width");

his1M[sigBckName][nMLMnow]->SetTitle( TString::Format( (TString)"#%d, "+sigBckTitle+" ("+MLMname+","
his1M[sigBckName][nMLMnow]->SetTitle( TString::Format( (TString)"ranked as #%d, "+sigBckTitle+" ("+MLMname+","
+typeToNameMLM[typeMLM[nMLMnow]]+") - S_{s/b} = %1.2e"
,nSbSepIndexNow,nSbSepValNow ) );
,nSbSepIndexNow+1,nSbSepValNow ) );
}
}

@@ -632,7 +632,7 @@ void ANNZ::doEvalCls() {

// create the chain for the loop
// -----------------------------------------------------------------------------------------------------------
TString inTreeName = (TString)glob->GetOptC("treeName")+"_eval";
TString inTreeName = (TString)glob->GetOptC("treeName")+glob->GetOptC("evalTreePostfix");
TString inFileName = (TString)glob->GetOptC("outDirNameFull")+inTreeName+"*.root";

// prepare the chain and input variables. Set cuts to match the TMVAs
2 changes: 1 addition & 1 deletion src/ANNZ_loopReg.cpp
@@ -1756,7 +1756,7 @@ void ANNZ::doEvalReg(TChain * inChain, TString outDirName, vector <TString> * s
// -----------------------------------------------------------------------------------------------------------
// create the chain for the loop, or assign the input chain
// -----------------------------------------------------------------------------------------------------------
TString inTreeName = (TString)treeName+"_eval";
TString inTreeName = (TString)treeName+glob->GetOptC("evalTreePostfix");
TString inFileName = (TString)outDirNameFull+inTreeName+"*.root";

// prepare the chain and input variables. Set cuts to match the TMVAs
4 changes: 2 additions & 2 deletions src/ANNZ_loopRegCls.cpp
@@ -700,7 +700,7 @@ void ANNZ::makeTreeRegClsOneMLM(int nMLMnow) {
if(trainCut != "") cutExprs += (TString)" && ("+trainCut+")";

int nEvtPass = aChainOut->Draw(drawExprs,cutExprs);
if(nEvtPass > 0) his_all = (TH1F*)gDirectory->Get(hisName);
if(nEvtPass > 0) his_all = (TH1F*)gDirectory->Get(hisName); his_all->BufferEmpty();
}
if(!his_all) continue;

@@ -719,7 +719,7 @@ void ANNZ::makeTreeRegClsOneMLM(int nMLMnow) {
int nEvtPass = aChainOut->Draw(drawExprs,cutExprs);

if(nEvtPass > 0) {
his1_sb->SetDirectory(0); // allowed only after the chain fills the histogram
his1_sb->SetDirectory(0); his1_sb->BufferEmpty(); // allowed only after the chain fills the histogram
if(nSigBckNow == 0) his1_sig = his1_sb;
else his1_bck = his1_sb;
}
2 changes: 1 addition & 1 deletion src/ANNZ_utils.cpp
@@ -690,7 +690,7 @@ void ANNZ::loadOptsMLM() {
aLOG(Log::DEBUG_1) <<coutWhiteOnBlack<<coutYellow<<" - starting ANNZ::loadOptsMLM() ... "<<coutDef<<endl;

int nMLMs = glob->GetOptI("nMLMs");
TString weightKNN = glob->GetOptC("baseName_weightKNN");
TString weightKNN = glob->GetOptC("baseName_wgtKNN");

inNamesVar.resize(nMLMs); inNamesErr.resize(nMLMs);

7 changes: 4 additions & 3 deletions src/CatFormat_asciiToTree.cpp
@@ -61,7 +61,7 @@ void CatFormat::asciiToFullTree(TString inAsciiFiles, TString inAsciiVars, TStri
TString treeName = glob->GetOptC("treeName")+treeNamePostfix;
TString origFileName = glob->GetOptC("origFileName");
TString indexName = glob->GetOptC("indexName");
TString weightName = glob->GetOptC("baseName_weightKNN");
TString weightName = glob->GetOptC("baseName_wgtKNN");

map <TString,int> intMap;
vector <TString> inFileNameV, inVarNames, inVarTypes;
@@ -200,7 +200,7 @@ void CatFormat::asciiToSplitTree(TString inAsciiFiles, TString inAsciiVars) {
TString indexName = glob->GetOptC("indexName");
TString splitName = glob->GetOptC("splitName");
TString testValidType = glob->GetOptC("testValidType");
TString weightName = glob->GetOptC("baseName_weightKNN");
TString weightName = glob->GetOptC("baseName_wgtKNN");
bool doPlots = glob->GetOptB("doPlots");
TString plotExt = glob->GetOptC("printPlotExtension");
TString outDirNameFull = glob->GetOptC("outDirNameFull");
@@ -427,7 +427,8 @@ void CatFormat::asciiToSplitTree(TString inAsciiFiles, TString inAsciiVars) {
TCanvas * tmpCnvs = new TCanvas("tmpCnvs","tmpCnvs");
aChain->Draw(drawExprs,""); DELNULL(tmpCnvs);

TH1 * his1 = (TH1F*)gDirectory->Get(hisName); his1->SetDirectory(0); his1->SetTitle(branchNameV[nBranchNow]); assert(dynamic_cast<TH1F*>(his1));
TH1 * his1 = (TH1F*)gDirectory->Get(hisName); assert(dynamic_cast<TH1F*>(his1));
his1->SetDirectory(0); his1->BufferEmpty(); his1->SetTitle(branchNameV[nBranchNow]);

outputs->optClear();
outputs->draw->NewOptC("drawOpt" , "HIST");
