Skip to content

Latest commit

 

History

History
451 lines (325 loc) · 9.4 KB

README.md

File metadata and controls

451 lines (325 loc) · 9.4 KB

MySQL Infusion UDF

Description

The MySQL Infusion UDF is a functionality enhancement for MySQL. It provides a variety of new string, math and aggregate functions.

Installation

Please make sure you have the following required software on your system before installation:

  • C and C++ compiler
  • MySQL server, client and development files

Installation of prerequisites

Ubuntu

apt-get install libmysqlclient-dev build-essential

Debian (uses MariaDB)

apt-get install default-libmysqlclient-dev build-essential

On CentOS / Fedora

yum install libtool
yum install mysql-community-devel

To install MySQL Infusion UDF:

git clone https://github.com/infusion/udf_infusion.git
cd udf_infusion
./configure
make
sudo make install
mysql <options> < load.sql

You can choose to install only given UDF functions with the --enable-functions option:

./configure --enable-functions="<list-of-functions>"

where <list-of-functions> is a list of function names separated by space.

To uninstall:

mysql <options> < unload.sql
make uninstall

Aggregate Functions

Get the median of a set

double median(double n);

mysql> SELECT median(weight) from t1;

Calculate the co-variance of two random variables

double covariance(double x, double y);

mysql> SELECT covariance(a, b) from t1;

Calculate the correlation of two random variables

double corr(double x, double y);

mysql> SELECT corr(a, b) from t1;

Get the first element of a group

string group_first(string s);

mysql> SELECT group_first(s) from t1;

Get the last element of a group

string group_last(string s);

mysql> SELECT group_last(s) from t1;

Get the maximum number of elements less than a certain partial sum

int lesspart(double m, double psum);

mysql> SELECT lesspart(m, 1000) from t1;

Get the maximum number of elements less than a certain percentage of the partial sum

int lesspartpct(double m, double pct);

mysql> SELECT lesspartpct(m, 0.4) from t1;

Get the maximum number of elements less than average

int lessavg(double m);

mysql> SELECT lessavg(double m) from t1;

Calculate continuous percentile. Returns the value at a relative position specified by the fraction, interpolating between input values if needed.

double percentile_cont(double x, double fraction);

mysql> SELECT percentile_cont(x, 0.5) from t1;

Calculate discrete percentile. Returns the first input value whose relative position is greater than or equal to the specified fraction.

double percentile_disc(double x, double fraction);

mysql> SELECT percentile_disc(x, 0.5) from t1;

Calculates the 3th statistical moment of a data set: skewness See: http://geography.uoregon.edu/geogr/topics/moments.htm

double skewness(double m);

mysql> SELECT skewness(double m) from t1;

Find statistical mode, i.e. the most frequent input value.

double stats_mode(double x);

mysql> SELECT stats_mode(double x) from t1;

Calculates the 4th statistical moment of a data set: kurtosis See: http://geography.uoregon.edu/geogr/topics/moments.htm

double kurtosis(double m);

mysql> SELECT kurtosis(double m) from t1;

Misc Functions

A running SUM() for int and double/real values. Same as @x:= @x + value, but much faster and cleaner to code. The names are built up of R(unning)-SUM-(Int|Double)

int rsumi(int col);
double rsumd(double col);


mysql> SELECT rsumi(int_col) FROM t1;
mysql> SELECT rsumd(double_col) FROM t1;

A 64 bit hash function for MySQL, implementing a FNV algorithm

int fnv(string str);

mysql> SELECT cast( fnv( str ) as unsigned ) from t1;

Unfortunately, the UDF API doesn't support unsigned int return values. If you don't like the cast approach above, just add a stored function on your own:

CREATE FUNCTION ufnv(str TEXT) RETURNS INT UNSIGNED
   RETURN cast( fnv( str ) as unsigned );

String Functions

All string functions operate on UTF8 strings.

Cuts a string if it's longer then a max value and appends "...". Words are not chopped as long as white spaces are present.

string cut(string str, int num[, string x='...']);

mysql> SELECT cut('This is the funny world of MySQL...', 15);
+------------------------------------------------+
| cut('This is the funny world of MySQL...', 15) |
+------------------------------------------------+
| This is the...                                 |
+------------------------------------------------+
1 row in set (0.00 sec)

The old name of this function was makeurl but I renamed it to slug, because this seems to be the more official term for this one. The Wordpress Codex says this: A slug is a few words that describe a post or a page. Slugs are usually a URL friendly version of the post title [...], but a slug can be anything you like. Slugs are meant to be used with permalinks as they help describe what the content at the URL is.

string slug(string str[, char separator='_']);

mysql> SELECT slug('Max Müller Straße!', '-');
+----------------------------+
| slug('Max Müller Straße!') |
+----------------------------+
| max-mueller-strasse        |
+----------------------------+
1 row in set (0.00 sec)

Generates a list of ngrams from a given string.

string ngram(string str[, int size=2]);

mysql> SELECT ngram('Lorem ipsum dolor');
+-------------------------------------------------------+
| ngram('Lorem ipsum dolor')                            |
+-------------------------------------------------------+
| _l lo or re em m_ _i ip ps su um m_ _d do ol lo or r_ |
+-------------------------------------------------------+
1 row in set (0.01 sec)

Math Functions

Round to the next multiple of a base.

int bround(int num, int base);

mysql> SELECT bround(13, 3);
+---------------+
| bround(13, 3) |
+---------------+
|            15 |
+---------------+
1 row in set (0.00 sec)

Round to the next power of 10. This breaks down 10^ceil(log(n) / log(10))

int xround(int num);

mysql> SELECT xround(55);
+------------+
| xround(55) |
+------------+
|        100 |
+------------+
1 row in set (0.00 sec)

Limits a number to a specified lower min- and/or a upper max value. Not used min/max can be ignored with NULL

double bound(double num, double min, double max);

mysql> SELECT bound(12, 0, 4);
+-----------------+
| bound(12, 0, 4) |
+-----------------+
|      4.00000000 |
+-----------------+
1 row in set (0.00 sec)

Calculates N over K

int noverk(int n, int k);

mysql> select noverk(49, 6);
+---------------+
| noverk(49, 6) |
+---------------+
|      13983816 |
+---------------+
1 row in set (0.00 sec)

Binary Functions

Check if a bit-flag is set in the number.

bool isbit(int mask, int n);

mysql> SELECT isbit(5, 2);
+-------------+
| isbit(5, 2) |
+-------------+
|           1 |
+-------------+
1 row in set (0.00 sec)

Set a bit-flag at the position to 1 (or optionally 0).

int setbit(int mask, int n[, bool init=1]);

mysql> SELECT setbit(8, 4, 1);
+-----------------+
| setbit(8, 4, 1) |
+-----------------+
|              24 |
+-----------------+
1 row in set (0.00 sec)

Toggle/invert a bit-flag at the given position.

int invbit(int mask, int n);

mysql> SELECT invbit(8, 2);
+--------------+
| invbit(8, 2) |
+--------------+
|           12 |
+--------------+
1 row in set (0.00 sec)

Rotate (=shift to the left and append shifted-out bits to the right) the bits within the number

int rotbit(int mask, int n);

mysql> select rotbit(13, 1);
+---------------+
| rotbit(13, 1) |
+---------------+
|            26 |
+---------------+
1 row in set (0.00 sec)

Get a smaller integer from a bigger integer. See: https://raw.org/article/store-small-numbers-in-big-numbers/

int getint(int mask, int x, int y);

mysql> SELECT GETINT(4283942, 4, 8);
+-----------------------+
| getint(4283942, 4, 8) |
+-----------------------+
|                     2 |
+-----------------------+
1 row in set (0.00 sec)

Set a smaller integer into a bigger integer. See: https://raw.org/article/store-small-numbers-in-big-numbers/

int setint(int mask, int x, int y, int m);

mysql> SELECT SETINT(4283942, 4, 8, 10);
+---------------------------+
| setint(4283942, 4, 8, 10) |
+---------------------------+
|                   4284070 |
+---------------------------+
1 row in set (0.00 sec)

Testing

udf_infusion contains a set of unit tests to verify the correctness of the provided UDF functions. Running them after installation is optional.

Prerequisites:

Note: The testing framework requires all UDF functions to be enabled during installation.

First, it is recommended you set connection details (incl. password) in ~/.my.cnf, e.g.:

[client]
user=<user>
password=<password>

Alternatively, you can set options to be passed to the MySQL client in the MYSQL_OPTIONS environment variable.

To prepare the testing environment (requires administrator rights in MySQL):

make test_prepare

This may take a while as sample data is generated and imported. Database udf_infusion_test is created and populated with generated data.

Run tests with:

make test

After completion, the temporary database can be dropped with test_clean:

make test_clean

License

Copyright (c) 2015, Robert Eisele Licensed under GPL Version 2 license.