Function Reference
AppendID(recordset,idfield,output)
recordset | A record set to process. |
idfield | The name of the field to be appended containing the id for each row. |
output | The name of the returned record set. |
Return: | AppendID returns a record set. |
Associate(recordset,count)
recordset | A record set to process. |
count | An integer expression defining the number of times items must occur to be considered equivalent. |
Associate(recordset,count).Apriori1
The Associate.Apriori1 attribute returns a record set showing which single items are most likely to appear, using an ‘old school’ approach that relies on brute force and speed.
Associate(recordset,count).Apriori2
The Associate.Apriori2 attribute returns a record set showing which pairs of items are most likely to appear together, using an ‘old school’ approach that relies on brute force and speed.
Associate(recordset,count).Apriori3
The Associate.Apriori3 attribute returns a record set showing which triplets of items are most likely to appear together, using an ‘old school’ approach that relies on brute force and speed.
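As an illustration of the brute-force idea behind the Apriori attributes, the Python sketch below counts every item set of a given size and keeps those that occur often enough. It is not the library's implementation: the function name `frequent_itemsets`, the basket data, and the reading of `count` as a minimum occurrence threshold are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, size, min_count):
    """Count every item set of `size` items and keep those seen at least min_count times.

    Hypothetical helper: size 1, 2 and 3 loosely mirror Apriori1, Apriori2 and Apriori3.
    """
    counts = Counter()
    for items in transactions:
        # Sorting gives each combination one canonical order, so (a, b) and (b, a) match.
        for combo in combinations(sorted(set(items)), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_count}

# Made-up example data: each inner list is one transaction.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
pairs = frequent_itemsets(baskets, size=2, min_count=2)
triplets = frequent_itemsets(baskets, size=3, min_count=2)
```

Here every pair occurs twice and survives the threshold, while the only triplet occurs once and is dropped.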
Associate(recordset,count).AprioriN(maxN[,minN])
maxN | An integer expression defining the maximum size of sets to return. |
minN | (Optional) An integer expression defining the minimum size of sets to return. Default: 2. |
Return: | AprioriN returns a record set. |
Associate(recordset,count).EclatN(maxN[,minN])
maxN | An integer expression defining the maximum size of sets to return. |
minN | (Optional) An integer expression defining the minimum size of sets to return. Default: 2. |
Return: | EclatN returns a record set. |
Associate(recordset,count).Rules(patterns)
patterns | A record set derived from an Apriori1, Apriori2, Apriori3, AprioriN or EclatN subroutine. |
Return: | Rules returns a record set. |
Perceptron(N[,Alpha])
N | An integer expression defining the number of passes over the data to make during the learning process. |
Alpha | (Optional) A REAL value for the learning rate. Default: 0.1. |
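To show how the two Perceptron parameters interact, here is a minimal Python sketch of the classic perceptron learning rule. It is an illustration only, not the library's code: the function names, the 0/1 label encoding, and the zero initial weights are assumptions. `passes` plays the role of N and `alpha` the role of Alpha.

```python
def train_perceptron(rows, labels, passes, alpha=0.1):
    """Classic perceptron rule: nudge the weights by alpha whenever a row is misclassified."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(passes):              # one pass = one sweep over all training rows
        for x, y in zip(rows, labels):
            activation = bias + sum(w * v for w, v in zip(weights, x))
            predicted = 1 if activation > 0 else 0
            error = y - predicted        # 0 when correct, +1 or -1 when wrong
            bias += alpha * error
            weights = [w + alpha * error * v for w, v in zip(weights, x)]
    return weights, bias

def predict(weights, bias, x):
    """Apply a trained model to one row of independent values."""
    return 1 if bias + sum(w * v for w, v in zip(weights, x)) > 0 else 0
```

More passes give the rule more chances to correct mistakes; for linearly separable data the updates eventually stop because every row is classified correctly.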
Logistic([ridge][,epsilon][,maxIter])
ridge | (Optional) A REAL value for the ridge term used to ensure existence of Inv(X'*X) even if some independent variables X are linearly dependent. Default: 0.0001. |
epsilon | (Optional) A REAL value for the parameter used to test convergence. Default: 0.000000001. |
maxIter | (Optional) An integer expression defining the maximum number of iterations. Default: 200. |
The Classifier Interface (NaiveBayes, Perceptron and Logistic) exports the following attributes and subroutines:
LearnC(independent,dependent)
independent | A record set containing independent values. |
dependent | A record set containing dependent values. |
LearnD(independent,dependent)
independent | A record set containing independent values. |
dependent | A record set containing dependent values. |
ClassifyC(independent,model)
independent | A record set containing independent values. |
model | A record set containing a model derived from the LearnC subroutine. |
ClassifyD(independent,model)
independent | A record set containing independent values. |
model | A record set containing a model derived from the LearnD subroutine. |
TestC(independent,dependent)
independent | A record set containing independent values. |
dependent | A record set containing dependent values. |
TestD(independent,dependent)
independent | A record set containing independent values. |
dependent | A record set containing dependent values. |
Compare(dependent,computed)
dependent | A record set containing dependent values. |
computed | A record set containing classification tags derived from a ClassifyC or ClassifyD subroutine. |
Compare(dependent,computed).Raw
The Compare.Raw attribute returns a detailed breakdown of every record in the test corpus including what the classification should have been and what it was.
Compare(dependent,computed).CrossAssignments
The Compare.CrossAssignments attribute returns, for each class that is misclassified, the class it is most likely to be misclassified as.
Compare(dependent,computed).PrecisionByClass
The Compare.PrecisionByClass attribute returns the precision broken down by the class that it should have been classified to.
Compare(dependent,computed).HeadLine
The Compare.HeadLine attribute returns the main precision number, which shows how often the classifier was correct.
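The Compare attributes can be pictured with a small Python sketch that works on two parallel lists, the expected class and the computed class for each record. The function names and the list-based inputs are assumptions for illustration; the library itself operates on record sets.

```python
from collections import Counter

def headline(expected, computed):
    """Overall fraction of records whose computed class matches the expected class."""
    hits = sum(1 for e, c in zip(expected, computed) if e == c)
    return hits / len(expected)

def by_expected_class(expected, computed):
    """Fraction correct, grouped by the class each record should have received."""
    totals = Counter(expected)
    hits = Counter(e for e, c in zip(expected, computed) if e == c)
    return {cls: hits[cls] / totals[cls] for cls in totals}

def cross_assignments(expected, computed):
    """Count each (expected, computed) pair that disagrees."""
    return Counter((e, c) for e, c in zip(expected, computed) if e != c)
```

The most common entry per expected class in `cross_assignments` answers the question "what is this class most often mistaken for".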
KMeans(documentset,centroidset[,niterations,nconverge,algorithm])
documentset | A record set of documents to process. |
centroidset | A record set of centroids to process. |
niterations | (Optional) An integer expression defining the maximum number of iterations before stopping. Default: 1. |
nconverge | (Optional) A REAL value for the minimum distance for non-convergence. Default: 0.0. |
algorithm | (Optional) The distance algorithm to use. Possible values: |
KMeans(documentset,centroidset[,niterations,nconverge,algorithm]).AllResults
The AllResults attribute returns a record set with the result of all iterations.
KMeans(documentset,centroidset[,niterations,nconverge,algorithm]).Convergence
The Convergence attribute returns the number of iterations that were performed.
KMeans(documentset,centroidset[,niterations,nconverge,algorithm]).Result()
The Result() subroutine returns the final locations of the centroids.
KMeans(documentset,centroidset[,niterations,nconverge,algorithm]).Result(n)
n | An integer expression defining the iteration to consider. |
Return: | Result(n) returns a record set. |
KMeans(documentset,centroidset[,niterations,nconverge,algorithm]).Delta(minN,maxN)
minN | An integer expression defining the minimum number of iterations. |
maxN | An integer expression defining the maximum number of iterations. |
Return: | Delta returns a record set. |
KMeans(documentset,centroidset[,niterations,nconverge,algorithm]).Delta(0)
The Delta(0) subroutine returns the total distance traveled by every centroid across each axis.
KMeans(documentset,centroidset[,niterations,nconverge,algorithm]).DistanceDelta(minN,maxN)
minN | An integer expression defining the minimum number of iterations. |
maxN | An integer expression defining the maximum number of iterations. |
Return: | DistanceDelta returns a record set. |
KMeans(documentset,centroidset[,niterations,nconverge,algorithm]).DistanceDelta(0)
The DistanceDelta(0) subroutine returns the straight-line distance traveled by each centroid.
KMeans(documentset,centroidset[,niterations,nconverge,algorithm]).DistanceDelta()
The DistanceDelta() subroutine returns the distance traveled by each centroid during the last iteration.
KMeans(documentset,centroidset[,niterations,nconverge,algorithm]).Allegiances()
The Allegiances() subroutine returns the table of allegiances (the centroid each entity is closest to) after convergence.
KMeans(documentset,centroidset[,niterations,nconverge,algorithm]).Allegiance(entityId,iterationN)
entityId | An integer expression defining the entity to find the allegiance for. |
iterationN | An integer expression defining the iteration to find the allegiance for. |
Return: | Allegiance returns a record set. |
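The relationship between niterations, nconverge, and the per-iteration results can be sketched in Python as follows. This is a generic k-means loop using Euclidean distance, not the library's implementation; the function name, the tuple-based points, and the interpretation of `n_converge` as "stop once no centroid moves farther than this" are assumptions.

```python
import math

def kmeans(points, centroids, n_iterations=1, n_converge=0.0):
    """Return the centroid positions after every iteration, starting with the initial ones."""
    history = [list(centroids)]
    for _ in range(n_iterations):
        current = history[-1]
        groups = [[] for _ in current]
        for p in points:
            # Allegiance: index of the centroid this point is closest to.
            nearest = min(range(len(current)), key=lambda i: math.dist(p, current[i]))
            groups[nearest].append(p)
        moved = []
        for centre, members in zip(current, groups):
            if members:
                # New position is the mean of the members along each axis.
                moved.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                moved.append(centre)     # a centroid with no members stays where it is
        history.append(moved)
        # Stop early once no centroid travelled farther than n_converge.
        if max(math.dist(a, b) for a, b in zip(current, moved)) <= n_converge:
            break
    return history
```

In this sketch `history` corresponds loosely to AllResults, `history[-1]` to Result(), `history[n]` to Result(n), and `len(history) - 1` to Convergence.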
AggloN(numericfield,n[,algorithm,method])
numericfield | A NumericField set of records to process. |
n | An integer expression defining the number of iterations. |
algorithm | (Optional) The distance algorithm to use. Possible values: |
method | (Optional) How to compute distance between clusters. Possible values: |
Return: | AggloN returns a record set. |
AggloN(numericfield,n[,algorithm,method]).Dendrogram
The Dendrogram attribute displays the output as a string representation of the tree diagram.
AggloN(numericfield,n[,algorithm,method]).Distances
The Distances attribute returns a record set of the remaining distances that would be used to further cluster the entities.
AggloN(numericfield,n[,algorithm,method]).Clusters
The Clusters attribute returns a record set with each entity and the id of the cluster that the entity was assigned to.
Distances(numericfield1,numericfield2[,algorithm])
numericfield1 | A set of NumericField records to process. |
numericfield2 | A set of NumericField records to process. |
algorithm | (Optional) The distance algorithm to use. Possible values: |
Return: | Distances returns a record set. |
Closest(distances)
distances | A dataset containing distances. |
Return: | Closest returns a record set. |
Correlate(numericfield)
numericfield | A set of NumericField records to process. |
Correlate(numericfield).Simple
The Simple attribute returns a record set containing the Pearson and Spearman correlation coefficients for every pair of fields.
Correlate(numericfield).Kendall
The Kendall attribute returns the Kendall Tau statistic for every pair of fields.
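For readers unfamiliar with the three statistics, the Python sketch below computes them for a single pair of fields given as two equal-length lists. It uses the textbook definitions (Spearman as Pearson on average ranks, Kendall as tau-a) and is not taken from the library; the function names are hypothetical, and the library's handling of ties may differ.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

def kendall_tau(xs, ys):
    """Tau-a: (concordant pairs - discordant pairs) / total pairs; ties count as neither."""
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        score += (product > 0) - (product < 0)
    n = len(xs)
    return score / (n * (n - 1) / 2)
```

Spearman and Kendall depend only on ordering, so any strictly increasing relationship scores 1 on both even when Pearson is below 1.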
ByRounding(numericfield[,scale,delta])
numericfield | A set of NumericField records to process. |
scale | (Optional) A REAL value for the factor to multiply to bring data into a desired range. Default: 1.0. |
delta | (Optional) A REAL value to add to rebase a range, cause truncation or rounding up. Default: 0.0. |
Return: | ByRounding returns a record set. |
ByBucketing(numericfield[,numgroups])
numericfield | A set of NumericField records to process. |
numgroups | (Optional) An integer expression defining the number of groups to discretize numericfield into. Default: 10. |
Return: | ByBucketing returns a record set. |
ByTiling(numericfield[,numgroups])
numericfield | A set of NumericField records to process. |
numgroups | (Optional) An integer expression defining the number of groups to discretize numericfield into. Default: 10. |
Return: | ByTiling returns a record set. |
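The three discretization routines can be contrasted with a short Python sketch. The reference does not spell out the formulas, so everything here is an assumption: rounding is taken as `round(value * scale + delta)`, bucketing as equal-width groups across the value range, and tiling as equal-sized groups by rank. The function names are hypothetical.

```python
def by_rounding(values, scale=1.0, delta=0.0):
    """Assumed rule: scale the value, shift it by delta, round to the nearest integer."""
    return [round(v * scale + delta) for v in values]

def by_bucketing(values, num_groups=10):
    """Assumed rule: split the range [min, max] into num_groups equal-width buckets."""
    low, high = min(values), max(values)
    width = (high - low) / num_groups
    if width == 0:
        return [1] * len(values)          # all values identical: a single bucket
    # The maximum would land in bucket num_groups + 1, so clamp it into the last bucket.
    return [min(int((v - low) / width) + 1, num_groups) for v in values]

def by_tiling(values, num_groups=10):
    """Assumed rule: sort by value and give each group the same number of records."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    tiles = [0] * len(values)
    for position, index in enumerate(order):
        tiles[index] = position * num_groups // len(values) + 1
    return tiles
```

Under these assumptions a skewed field behaves very differently in the two schemes: for `[1, 2, 3, 4, 100, 1000]` with two groups, bucketing puts only the largest value in group 2, while tiling puts three values in each group.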
Do(numericfield,instructionset)
numericfield | A set of NumericField records to process. |
instructionset | A set of r_Method records containing metadata instructions. |
Return: | Do returns a record set. |
GenData(nrecords,distribution[,nfield])
nrecords | An integer expression defining the number of records to generate. |
distribution | A record set containing a distribution to take a random variable from. |
nfield | (Optional) An integer expression defining the column to fill. Default: 1. |
Return: | GenData returns a record set. |
Uniform(low,high[,ranges])
low | A REAL value for the minimum value in the distribution. |
high | A REAL value for the maximum value in the distribution. |
ranges | (Optional) An integer expression defining the number of divisions to split the distribution into. Default: 10,000. |
Normal(mean,stdeviation[,ranges])
mean | A REAL value for the mean. |
stdeviation | A REAL value for the standard deviation. |
ranges | (Optional) An integer expression defining the number of divisions to split the distribution into. Default: 10,000. |
StudentT(v[,ranges])
v | An integer expression defining the degrees of freedom. |
ranges | (Optional) An integer expression defining the number of divisions to split the distribution into. Default: 10,000. |
Exponential(lambda[,ranges])
lambda | A REAL value for the rate parameter. |
ranges | (Optional) An integer expression defining the number of divisions to split the distribution into. Default: 10,000. |
Binomial(p[,ranges])
p | A REAL value for the success probability. |
ranges | (Optional) An integer expression defining the number of divisions to split the distribution into. Default: 100. |
NegBinomial(p,r[,ranges])
p | A REAL value for the success probability. |
r | An integer expression defining the number of failures. |
ranges | (Optional) An integer expression defining the number of divisions to split the distribution into. Default: 1,000. |
Poisson(lambda[,ranges])
lambda | A REAL value for the expected value. |
ranges | (Optional) An integer expression defining the number of divisions to split the distribution into. Default: 100. |
The Distribution Interface (Uniform, Normal, StudentT, Exponential, Binomial, NegBinomial and Poisson) exports the following attributes and subroutines:
Density(RH)
RH | A REAL value. |
Return: | Density returns a single REAL value. |
Cumulative(RH)
RH | A REAL value. |
Return: | Cumulative returns a single REAL value. |
DensityV()
The DensityV subroutine returns a record set providing the probability density function at each range point.
CumulativeV()
The CumulativeV subroutine returns a vector providing the cumulative distribution function at each range point.
Ntile(percent)
percent | A REAL percentage value. |
Return: | Ntile returns a single REAL value. |
InvDensity(delta)
delta | A REAL value. |
Return: | InvDensity returns a single REAL value. |
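To make the interface concrete, here is a Python sketch of a uniform distribution exposing density, cumulative, and n-tile lookups, plus vector forms evaluated at each range point. It is a sketch under stated assumptions, not the library's code: the class layout, the method names, and the choice of the midpoint of each division as the "range point" are all invented for the example. InvDensity is omitted because a flat density has no unique inverse.

```python
class Uniform:
    """Uniform distribution on [low, high], divided into `ranges` equal divisions."""

    def __init__(self, low, high, ranges=10000):
        self.low, self.high, self.ranges = low, high, ranges

    def density(self, x):
        """Probability density at x."""
        return 1.0 / (self.high - self.low) if self.low <= x <= self.high else 0.0

    def cumulative(self, x):
        """Probability that a draw is less than or equal to x."""
        if x <= self.low:
            return 0.0
        if x >= self.high:
            return 1.0
        return (x - self.low) / (self.high - self.low)

    def range_points(self):
        """Assumption: one point per division, taken at the division's midpoint."""
        step = (self.high - self.low) / self.ranges
        return [self.low + step * (i + 0.5) for i in range(self.ranges)]

    def density_v(self):
        return [(x, self.density(x)) for x in self.range_points()]

    def cumulative_v(self):
        return [(x, self.cumulative(x)) for x in self.range_points()]

    def ntile(self, percent):
        """Value below which `percent` percent of the distribution lies."""
        return self.low + (self.high - self.low) * percent / 100.0
```

A larger `ranges` value gives the vector forms a finer grid at the cost of more rows.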
FieldAggregates(numericfield)
numericfield | The name of the input field. |
FieldAggregates(numericfield).Simple
The Simple attribute returns the following aggregates:
- minimum
- maximum
- sum
- mean
- variance
- standard deviation
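The aggregates listed above can be computed for one column of values as in the Python sketch below. The function name and dictionary keys are hypothetical, and the use of the population variance (dividing by n rather than n - 1) is an assumption, since the reference does not say which form is used.

```python
import math

def simple_aggregates(values):
    """Minimum, maximum, sum, mean, variance and standard deviation of one column."""
    n = len(values)
    total = sum(values)
    mean = total / n
    # Assumption: population variance (divide by n).
    variance = sum((v - mean) ** 2 for v in values) / n
    return {
        "minimum": min(values),
        "maximum": max(values),
        "sum": total,
        "mean": mean,
        "variance": variance,
        "standard deviation": math.sqrt(variance),
    }
```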
FieldAggregates(numericfield).SimpleRanked
The SimpleRanked attribute assigns every record a rank, arbitrarily picking which duplicate value receives the lower rank.
FieldAggregates(numericfield).Ranked
The Ranked attribute assigns every record a rank.
FieldAggregates(numericfield).Medians
The Medians attribute calculates the median for each column.
FieldAggregates(numericfield).Modes
The Modes attribute calculates the mode for each column.
FieldAggregates(numericfield).NTiles(n)
n | An integer expression defining how many groups to split population into. |
Return: | NTiles(n) returns a record set. |
FieldAggregates(numericfield).NTileRanges(n)
n | An integer expression defining how many tiles to split population into. |
Return: | NTileRanges(n) returns a record set. |
FieldAggregates(numericfield).Buckets(n)
n | An integer expression defining how many buckets to split population into. |
Return: | Buckets(n) returns a record set. |
FieldAggregates(numericfield).BucketRanges(n)
n | An integer expression defining how many buckets to split population into. |
Return: | BucketRanges(n) returns a record set. |
FromField(numericfield,layout,output[,map])
numericfield | A set of NumericField records to process. |
layout | The name of the resulting layout of the returned set. |
output | The name of the resulting record set. |
map | (Optional) The mapping table that was created by the ToField routine. Default: ‘ ’ (left blank). |
Return: | FromField returns a record set. |
OLS(X,Y)
X | A record set containing independent variables. |
Y | A record set containing dependent variables. |
OLS(X,Y).Beta([control])
control | (Optional) The matrix decomposition method. Possible values: |
OLS(X,Y).Extrapolate(independent,beta)
independent | A record set containing independent variables. |
beta | A record set containing results derived from Beta. |
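The split between Beta (fit the coefficients) and Extrapolate (apply them to new independent values) can be sketched in Python for the simplest case of one independent variable. This uses the closed-form least-squares solution rather than a matrix decomposition, and the function names and tuple representation of beta are assumptions for illustration only.

```python
def ols_beta(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single independent variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def extrapolate(xs, beta):
    """Predict dependent values for new independent values from a fitted beta."""
    intercept, slope = beta
    return [intercept + slope * x for x in xs]
```

Keeping the two steps separate lets one fitted beta be reused on any number of new inputs.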
Poly(X,Y,maxN)
X | A NumericField record set containing independent variables. |
Y | A NumericField record set containing dependent variables. |
maxN | An integer expression defining the maximum number of polynomial components used. Default: 6. |
Poly(X,Y,maxN).Beta
The Beta attribute returns the unknown parameter value b used to predict values.
Poly(X,Y,maxN).Rsquared
The Rsquared attribute returns the coefficient of determination, a measure of goodness of fit.
Poly(X,Y,maxN).SubBeta(K,N)
K | An integer expression defining the minimum number of polynomial components used. |
N | An integer expression defining the maximum number of polynomial components used. |
Return: | SubBeta returns a record set. |
ToField(recordset,output[,idfield,datafields])
recordset | A set of records to process. |
output | The name of the resulting NumericField record set. |
idfield | (Optional) A field that contains the Record ID for each row. Default: If omitted, it is assumed to be the first field. |
datafields | (Optional) A STRING containing a comma-delimited list of the fields to be treated as axes. Default: If omitted, all numeric fields that are not the Record ID will be treated as axes. NOTE: idfield defaults to the first field in the table, so if that field is specified as an axis field, then the user should be sure to specify a value in the idfield parameter. |
Return: | ToField returns a record set. |
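The reshaping that ToField and FromField perform can be pictured with the Python sketch below, which turns wide rows into one cell per (record id, field number, value) and back again. The cell layout, the function names, and the dictionary used as the mapping table are assumptions made for the example; only the defaults for `id_field` and `data_fields` follow the parameter descriptions above.

```python
def to_field(rows, id_field=None, data_fields=None):
    """Turn wide rows (dicts) into (id, field number, value) cells plus a name-to-number map."""
    columns = list(rows[0].keys())
    if id_field is None:
        id_field = columns[0]                      # default: the first field is the id
    if data_fields is None:
        # Default: every numeric field that is not the record id becomes an axis.
        names = [c for c in columns
                 if c != id_field and isinstance(rows[0][c], (int, float))]
    else:
        names = [name.strip() for name in data_fields.split(",")]
    mapping = {name: number for number, name in enumerate(names, start=1)}
    cells = [(row[id_field], mapping[name], float(row[name]))
             for row in rows for name in names]
    return cells, mapping

def from_field(cells, mapping, id_field="id"):
    """Rebuild wide rows from cells, using the mapping produced by to_field."""
    names = {number: name for name, number in mapping.items()}
    rows = {}
    for record_id, number, value in cells:
        rows.setdefault(record_id, {id_field: record_id})[names[number]] = value
    return list(rows.values())
```

Non-numeric columns are skipped by default, which is why the id field must be named explicitly when the first column is itself meant to be an axis.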
Words(rawrecordset)
rawrecordset | A set of Raw records to process. |
Return: | Words returns a record set. |
Lexicon(words)
words | A set of WordElement records derived from a Words routine to process. |
Return: | Lexicon returns a record set. |
AllNGrams(words[,lexicon][,n])
words | A set of WordElement records derived from a Words routine to process. |
lexicon | (Optional) The output from the Lexicon routine. |
n | (Optional) An integer expression defining the maximum ngram size. Default: 3. |
Return: | AllNGrams returns a record set. |
Support(setofstrings,allngrams)
setofstrings | A set of strings. |
allngrams | The output from the AllNGrams routine. |
Return: | Support returns a single real value. |
Confidence(setofstrings1,setofstrings2,allngrams)
setofstrings1 | A set of strings. |
setofstrings2 | A set of strings. |
allngrams | The output from the AllNGrams routine. |
Return: | Confidence returns a single real value. |
Lift(setofstrings1,setofstrings2,allngrams)
setofstrings1 | A set of strings. |
setofstrings2 | A set of strings. |
allngrams | The output from the AllNGrams routine. |
Return: | Lift returns a single real value. |
Conviction(setofstrings1,setofstrings2,allngrams)
setofstrings1 | A set of strings. |
setofstrings2 | A set of strings. |
allngrams | The output from the AllNGrams routine. |
Return: | Conviction returns a single real value. |
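The four measures follow the standard association-rule definitions, sketched here in Python over documents represented as collections of words. This is an illustration rather than the library's code: the library works from the AllNGrams output, whereas this sketch takes the raw documents directly, and the function names are hypothetical.

```python
import math

def support(items, documents):
    """Fraction of documents that contain every item in `items`."""
    items = set(items)
    return sum(1 for doc in documents if items <= set(doc)) / len(documents)

def confidence(a, b, documents):
    """Of the documents containing a, the fraction that also contain b."""
    return support(set(a) | set(b), documents) / support(a, documents)

def lift(a, b, documents):
    """How much more often a and b co-occur than if they were independent."""
    joint = support(set(a) | set(b), documents)
    return joint / (support(a, documents) * support(b, documents))

def conviction(a, b, documents):
    """Ratio of how often the rule a -> b would fail under independence to how often it does fail."""
    conf = confidence(a, b, documents)
    if conf == 1.0:
        return math.inf                  # the rule never fails in this corpus
    return (1.0 - support(b, documents)) / (1.0 - conf)
```

A lift above 1 suggests the two sets appear together more often than chance would predict; a lift below 1 suggests they tend to exclude each other.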
Ngrams(allngrams)
allngrams | The output from the AllNGrams routine. |
Return: | Ngrams returns a record set. |
- number of documents in which the item appears
- the term frequency
- the inverse document frequency (IDF).
SubGrams(ngrams)
ngrams | The output from the Ngrams routine. |
Return: | SubGrams returns a record set. |
SplitCompare(ngrams)
ngrams | The output from the Ngrams routine. |
Return: | SplitCompare returns a record set. |
- initial unigram and the remainder
- the final unigram and the remainder
ShowPhrase(lexicon,string)
lexicon | A record set containing a lexicon derived from the Lexicon routine. |
string | A STRING of INTEGERs representing words in lexicon. |
Return: | ShowPhrase returns a record set. |
Enumerate(rawrecordset)
rawrecordset | A set of Raw records to process. |
Return: | Enumerate returns a record set. |
Clean(rawrecordset)
rawrecordset | A set of Raw records to process. |
Return: | Clean returns a record set. |
Split(recordset)
recordset | A record set containing output derived from the Clean routine. |
Return: | Split returns a record set. |
Lexicon(recordset)
recordset | A record set containing output derived from the Split routine. |
Return: | Lexicon returns a record set. |
ToO(recordset,lexicon)
recordset | A record set containing output derived from the Split routine. |
lexicon | A record set containing output derived from the Lexicon routine. |
Return: | ToO returns a record set. |
FromO(owordelement,lexicon)
owordelement | A record set containing output derived from the ToO routine. |
lexicon | A record set containing output derived from the Lexicon routine. |
Return: | FromO returns a record set. |
Trans(owordelement)
owordelement | A record set containing output derived from the ToO routine. |
Trans(owordelement).WordBag
The WordBag attribute turns every document in the dataset into a wordbag by removing and counting multiple occurrences of a word within a document.
Trans(owordelement).WordsCounted
The WordsCounted attribute returns a dataset with the number of times each word occurs and its tf-idf (Term Frequency – Inverse Document Frequency).
Trans(owordelement).TfIdf([lowthreshold][,lowdoccount])
lowthreshold | (Optional) A REAL value for the tf-idf value a word must be above to be kept. Default: 0.05. |
lowdoccount | (Optional) An integer expression defining the minimum number of documents a word must appear in to qualify as a keyword candidate. Default: 200. |
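How the two TfIdf thresholds filter words can be seen in the following Python sketch. The tf-idf variant used here (term count divided by document length, times the natural log of total documents over documents containing the word) is one common definition and is an assumption, as are the function name and the list-of-word-lists input; the library may weight terms differently.

```python
import math
from collections import Counter

def tf_idf(documents, low_threshold=0.05, low_doc_count=200):
    """Return one {word: score} dict per document, keeping only words that pass both filters."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears at least once.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    results = []
    for doc in documents:
        scores = {}
        for word, count in Counter(doc).items():
            if doc_freq[word] < low_doc_count:
                continue                          # too rare to be a keyword candidate
            score = (count / len(doc)) * math.log(n_docs / doc_freq[word])
            if score > low_threshold:             # must be above the threshold to be kept
                scores[word] = score
        results.append(scores)
    return results
```

With the default `low_doc_count` of 200 a small test corpus returns nothing, so pass a smaller value when experimenting. A word that appears in every document scores zero and never passes the threshold.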