Skip to content

Latest commit

 

History

History
1798 lines (1565 loc) · 47.1 KB

README.md

File metadata and controls

1798 lines (1565 loc) · 47.1 KB

php-mecab Documentation

Documentation for the package rsky/php-mecab.

Contents

Installation

(Please note that I am a Linux user and have only tested the Linux installation guide. The Mac and Windows installation guides have been pieced together from other sources.)

Ubuntu users can use the install script included in this repository to install mecab and php-mecab.
Download the script:

curl -O https://raw.githubusercontent.com/nihongodera/php-mecab-documentation/master/install-php-mecab.sh

Make the file executable:

chmod +x install-php-mecab.sh

Execute the script:

./install-php-mecab.sh

For information about what the script does, see here.

Install MeCab

Before installing php-mecab, you must install MeCab.

Linux

Linux users can more than likely find MeCab in their distro repositories. Simply install 'mecab' and the package 'mecab-ipadic-utf8'. Ubuntu users can do this with the following command.

sudo apt-get install mecab mecab-ipadic-utf8

If that doesn't work, you can download the source and build it yourself. Note that this will require the package 'build-essential'.
First pull in MeCab.

wget https://mecab.googlecode.com/files/mecab-0.996.tar.gz
tar zxfv mecab-0.996.tar.gz
cd mecab-0.996
./configure --with-charset=utf8 --enable-utf8-only

Then get the dictionary file.

wget https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
tar zxfv mecab-ipadic-2.7.0-20070801.tar.gz
cd mecab-ipadic-2.7.0-20070801
./configure --with-charset=utf8

Mac OS X

Both MeCab and the required dictionary (mecab-ipadic-utf8) are in MacPorts. If that doesn't work, try downloading the source and building it yourself. You can get the source and the dictionary from the following urls:
https://mecab.googlecode.com/files/mecab-0.996.tar.gz
https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz

I believe you can build these files with Xcode. Somebody correct me if I'm wrong.

Windows

Download the installer from this url: https://mecab.googlecode.com/files/mecab-0.996.exe

Install php-mecab

First, verify that you have MeCab on your computer by testing it in the command line. Type mecab and if you don't get an error, things are looking good. If you get an error that looks something like this param.cpp(69) [ifs] no such file or directory: /usr/local/lib/mecab/dic/ipadic/dicrc you need to find your dictionary file and pass it as a parameter. The directory is called 'ipadic-utf8' and needs to contain a file called 'unk.dic'.

mecab --dicdir=/path/to/dictionary/dic/ipadic/

Once you get mecab to start, type some Japanese and make sure you get an appropriate response.

~$ mecab
やった!
やっ    動詞,自立,*,*,五段・ラ行,連用タ接続,やる,ヤッ,ヤッ
た      助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
!      記号,一般,*,*,*,*,!,!,!
EOS

Linux

Install the following dependencies:
php5:
php5-dev libmecab-dev
build-essential

sudo apt-get install php5-dev libmecab-dev build-essential

php7:
php7.0-dev libmecab-dev
build-essential

sudo apt-get install php7.0-dev libmecab-dev build-essential

Download the php-mecab source.

wget https://github.com/rsky/php-mecab/archive/master.zip

You will need to find the package 'mecab-config'. It is usually located at /usr/bin/mecab-config, but check to make sure. Let's use 'locate' because its easy.

sudo updatedb
locate mecab-config

That should give you a path that looks something like /usr/bin/mecab-config.
We should now be ready to build our package. Put your mecab-config path after the --with-mecab-config option.

unzip master.zip
cd php-mecab-master/mecab
phpize
sudo ./configure --with-php-config=/usr/bin/php-config --with-mecab-config=/path/to/mecab-config
sudo make
sudo make install

Occasionally, configure will fail and throw the following error:

configure: error: wrong MeCab library version or lib not found. Check config.log for more information

This usually happens when mecab didn't install properly. To fix this, purge all mecab packages:

sudo apt-get --purge remove mecab mecab-ipadic-utf8 mecab-utils libmecab-dev

This will often not remove all the binaries so you may have to manually go into bin and remove them yourself.

sudo rm /usr/local/bin/mecab
sudo rm /usr/local/bin/mecab-config

Then, reinstall everything:

sudo apt-get install mecab mecab-ipadic-utf8 mecab-utils libmecab-dev

After completing this step, you should have a mecab.so. Go to /usr/lib/php5/ and find the package with a name that looks is similar to this: 20131226. Have a look in that file and mecab.so should be in there.

We now just need to enable the mod.
For php5:
Move to /etc/php5/mods-available/

cd /etc/php5/mods-available/

Next, create a new .ini file for mecab.

sudo touch mecab.ini
echo "extension=mecab.so" | sudo tee -a mecab.ini

And then we need to activate the module.

sudo php5enmod mecab

For php7:
Move to /etc/php/php7.0/mods-available/

cd /etc/php/php7.0/mods-available/

Next, create a new .ini file for mecab.

sudo touch mecab.ini
echo "extension=mecab.so" | sudo tee -a mecab.ini

And then we need to activate the module.

sudo phpenmod -v 7.0 mecab

Once this is done, you simply need to restart your web server.
For Apache:

sudo service apache2 restart

And for nginx:

sudo service nginx restart

You should be ready to go.

Mac OS X

Instructions should be the same as for Linux, but you may require the package xcode in order to properly compile the source code.

Windows

Installing php-mecab is the same as installing any other php extension. The following guide may be of use:
http://php.net/manual/en/install.windows.extensions.php

According to one of the php-mecab readme files:

The extension provides the VisualStudio V6 project file mecab.dsp. To compile the extension you open this file using VisualStudio, select the apropriate configuration for your installation (either "Release_TS" or "Debug_TS") and create "php_mecab.dll"

After successfull compilation you have to copy the newly created "php_mecab.dll" to the PHP extension directory (default: C:\PHP\extensions).

Top

Usage

php-mecab can be used functionally or as an object. I prefer the OOP approach, but I will try to cover both approaches in this guide. Note that as of version 0.6.0, the procedural functions will not work in php 7.

Initialization

MeCab sometimes requires a dictionary directory to be passed to it on initialization. The location of the directory seems to vary by system, so find 'ipadic-utf8' on your system and pass the full folder path. Often, there will be more than one 'ipadic-utf8' folders on a system. Make sure the one you use contains a file called 'unk.dic'. Without this, mecab will fail to initialize. Pass the the dictionary directory to MeCab with the console flag '-d' in an array.

The options passed to MeCab are the same as the options used in the command line program. Send them to the constructor in an array. Check the man page for MeCab for all available options.

Object Orientated

New up a MeCab\Tagger object. Version 0.6.0:

$mecab = new \MeCab\Tagger();   

Earlier versions:

$mecab = new \MeCab_Tagger();   

If it does't work, or you get an error, try passing the array containing the command line flag '-d' and a dictionary folder path to it as a parameter.

$mecab = new \MeCab\Tagger(['-d', '/path/to/dictionary/mecab/dic/ipadic-utf8']);   

The variable $mecab will be a MeCab\Tagger object.
Throughout this guide, when I refer to $mecab in the object orientated sections, it will be a Tagger object.

Functional

Use the function mecab_new() to get a mecab resource. As with the Object Orientated approach, you may or may not have to pass it a dictionary directory.

$mecab = mecab_new(['-d', '/path/to/dictionary/mecab/dic/ipadic-utf8']);

The $mecab variable will be a resource of type 'mecab'.
Throughout this guide, when I refer to $mecab in the functional sections, it will be a MeCab resource.

Top

Splitting Strings

Split methods only split a string into an array of morphemes. They provide no information about the morphemes.

Object Orientated

As of version 0.6.0, the split method is no longer on the Tagger object. The following only applies to previous versions.
The split() method is static and so does not require an instance of Tagger. It might, however, need the dictionary directory path to be passed as an argument in order to function.

$split = \Mecab_Tagger::split('眠いです');

Or if that doesnt work.

$split = \Mecab_Tagger::split('眠いです', '/path/to/dictionary/mecab/dic/ipadic-utf8');

print_r($split);

// Results
Array
(
    [0] => 眠い
    [1] => です
)

If you have an instance of MeCab\Tagger you can also call the method on the object. You will still need to pass the dictionary directory.

$split = $mecab->split('たこ焼きが食べたい');

print_r($split);

// Results
Array
(
    [0] => たこ焼き
    [1] => が
    [2] => 食べ
    [3] => たい
)

Functional

Use the funtion mecab_split(). It may or may not require the dictionary directory to be passed.

$split = mecab_split('パンダをいくらで買いますか');

Or....

$split = mecab_split('パンダをいくらで買いますか', '/path/to/dictionary/mecab/dic/ipadic-utf8');

print_r($split);

// Results
Array
(
    [0] => パンダ
    [1] => を
    [2] => いくら
    [3] => で
    [4] => 買い
    [5] => ます
    [6] => か
)

Top

Parsing Strings

MeCab will parse strings of Japanese text and return results in either string form or as a MeCab\Node. The MeCab\Node class seems a little awkward and difficult to deal with at first, but they give the user a lot of power and make parsing results a little easier.

Object Orientated

To parse a string and get results in string form, a couple options exist. The first is the parse() method.

$results = $mecab->parse('チョコレートがやめられない');

echo $results;

// Results
チョコレート    名詞,一般,*,*,*,*,チョコレート,チョコレート,チョコレート
が      助詞,格助詞,一般,*,*,*,が,ガ,ガ
やめ    動詞,自立,*,*,一段,未然形,やめる,ヤメ,ヤメ
られ    動詞,接尾,*,*,一段,未然形,られる,ラレ,ラレ
ない    助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ
EOS

You could also use the parseToString() method which produces the exact same results.

$results = $mecab->parseToString('チョコレートがやめられない');

echo $results;

// Results
チョコレート    名詞,一般,*,*,*,*,チョコレート,チョコレート,チョコレート
が      助詞,格助詞,一般,*,*,*,が,ガ,ガ
やめ    動詞,自立,*,*,一段,未然形,やめる,ヤメ,ヤメ
られ    動詞,接尾,*,*,一段,未然形,られる,ラレ,ラレ
ない    助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ
EOS

To get results in node form, use parseToNode().

$node = $mecab->parseToNode('ご飯作りたくない');

var_dump($node);

// Results
object(MeCab\Node) (0) {
}

Functional

To get results as a string, use the function mecab_sparse_tostr().

$node = mecab_sparse_tostr($mecab, 'パンダいらないよね');

echo $node;

// Results
パンダ  名詞,一般,*,*,*,*,パンダ,パンダ,パンダ
いら    動詞,自立,*,*,五段・ラ行,未然形,いる,イラ,イラ
ない    助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ
よ      助詞,終助詞,*,*,*,*,よ,ヨ,ヨ
ね      助詞,終助詞,*,*,*,*,ね,ネ,ネ
EOS

For node results, use mecab_sparse_tonode().

$node = mecab_sparse_tonode($mecab, 'これ長くなってる');

var_dump($node);

// Results
resource(5) of type (node)

Top

###Using Nodes Nodes make it easy to access the information MeCab provides and give users powerful ways to navigate through results.

The node returned from the parseToNode() methods discussed in the previous section is the first node in the series and only represents the first morpheme. In order to get information about the entire string, it is necessary to walk through all the nodes in the series. But before we tackle that, lets take a quick look at some of more useful methods we have at our disposal.

Object Orientated

  • getPrev(): Get the previous node in the series.
  • getNext(): Get the next node in the series.
  • getSurface(): Get the surface (the original morpheme) of the node.
  • getFeature(): Get the feature (the MeCab info) of the node.
  • getLength(): Get the length of the node's surface.
  • toArray(): Get all the node's elements as an associative array.

Functional

There are several other methods available, but these are the most useful at this point. For a full list of methods, see the Classes and Functions section of this guide.
So let's see how we can walk through the nodes and extract the information we need.

Object Orientated

You can go about this a couple ways. The first way simply walks through the nodes with a foreach loop.

$node = $mecab->parseToNode('カレーライスにしようかな');

foreach ($node as $n) {
    echo $n->getFeature() . "\n";
}

// Results
BOS/EOS,*,*,*,*,*,*,*,*
名詞,一般,*,*,*,*,カレーライス,カレーライス,カレーライス
助詞,格助詞,一般,*,*,*,に,ニ,ニ
動詞,自立,*,*,サ変・スル,未然ウ接続,する,シヨ,シヨ
助動詞,*,*,*,不変化型,基本形,う,ウ,ウ
助詞,副助詞/並立助詞/終助詞,*,*,*,*,か,カ,カ
助詞,終助詞,*,*,*,*,な,ナ,ナ
BOS/EOS,*,*,*,*,*,*,*,*

This isn't necessairly a bad way to do it, but it's a little too magical for my liking. If $node is the first node in the series (and it is, you can var_dump and verify this), it doesn't make sense to loop through each $node as $n where $node is a single node and $n is also a single node. Instead, I prefer to use MeCab\Node's methods to explicitly define what I am doing.

$node = $mecab->parseToNode('これの方がいい');

do {
    echo $node->getFeature() . "\n";
} while ($node = $node->getNext());

// Results
BOS/EOS,*,*,*,*,*,*,*,*
名詞,代名詞,一般,*,*,*,これ,コレ,コレ
助詞,連体化,*,*,*,*,の,ノ,ノ
名詞,非自立,一般,*,*,*,方,ホウ,ホー
助詞,格助詞,一般,*,*,*,が,ガ,ガ
形容詞,自立,*,*,形容詞・イイ,基本形,いい,イイ,イイ
BOS/EOS,*,*,*,*,*,*,*,*

We can extract the logic to a general purpose looping function.

function walkThroughNodes(\Mecab\Node $node, $callback)
{
    do {
        $callback($node);
    } while ($node = $node->getNext());
}

We can then pass our walkThroughNodes function a closure to tell it what to do with each node.

$node = $mecab->parseToNode('これの方がいい');

walkThroughNodes($node, function($node) {
    echo $node->getSurface() . "\n";
});

// Results

これ
の
方
が
いい

Now we have never have to worry about a basic walkthough again. We can simply pass our walkThroughNodes function a node and a callback.

Functional

As mentioned in the Object Orientated section above, we can simply walk through the nodes with a foreach loop, but I don't like that approach. Instead, lets use MeCab's nodes to our advantage.

$node = mecab_sparse_tonode($mecab, 'ビール飲みたい');

do {
    echo mecab_node_surface($node) . "\n";
} while ($node = mecab_node_next($node));

// Results

ビール
飲み
たい

Like we did in the Object Orientated section, lets extract this to a function that we can send a callback to.

function walkThroughNodes($node, $callback)
{
    do {
        $callback($node);
    } while ($node = mecab_node_next($node));
}

We can cuse our walkThroughNodes function like this.

$node = mecab_sparse_tonode($mecab, 'ビール飲みたい');

walkThroughNodes($node, function ($node) {
    echo mecab_node_surface($node) . "\n";
});

// Results

ビール
飲み
たい

Basic MeCab

Now that we can extract information from Japanese strings using MeCab and php-mecab, let's take a quick look at what this information means.

$mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']);

$string = $mecab->parseToString('行く');

echo $string;

// Results
行く    動詞,自立,*,*,五段・カ行促音便,基本形,行く,イク,イク
EOS

Commonly in MeCab you will see BOS and EOS. These mean 'Beginning of Sentence' and 'End of Sentence', respectively.
In output lines, there are generally two parts, the surface and the feature. The surface is the original morpheme and the feature is MeCab info. In our case, '行く' is the surface and '動詞,自立,,,五段・カ行促音便,基本形,行く,イク,イク' is the feature. Remember you can use nodes to easily extract this information.

The feature is a comma seperated string with nine sections.
Section 1: Main part of speech category
Section 2: Part of speech sub-category
Section 3: Part of speech sub-category
Section 4: Part of speech sub-category
Section 5: Inflection type
Section 6: Inflection form
Section 7: Lemma (the root word found in the dictionary)
Section 8: Reading
Section 9: Pronunciation

In our example:

print_r(explode(',', '動詞,自立,*,*,五段・カ行促音便,基本形,行く,イク,イク'));

// Results
    [0] => 動詞  // Main part of speech category
    [1] => 自立  // Part of speech sub-category
    [2] => *  // Part of speech sub-category (none)
    [3] => *  // Part of speech sub-category (none)
    [4] => 五段・カ行促音便  // Inflection type
    [5] => 基本形  // Inflection form
    [6] => 行く  // Lemma (the root word found in the dictionary)
    [7] => イク  // Reading
    [8] => イク  // Pronunciation

What you do with this information is up to you!

Top

Classes and Functions

Classes

MeCab\Tagger

Main class used to parse text.

Methods
version() [static]

Return Mecab version.

/**
 * @return      string
 */
split($string, $dic_dir, $user_dic, $filter, $persistent) [static]

Only on versions prior to 0.6.0.
Split string into array of morphemes. Usually requires the dictionary directory to be passed as a parameter.

/**
 * @param       string          $string         String to split.  
 * @param       string          $dic_dir        Path to dictionary directory. (Optional)    
 * @param       string          $user_dic       Path to user dictionary. (Optional)   
 * @param       callback        $filter         Filter function or method. (Optional)      
 * @param       boolean         $persistent     (Optional)    
 *
 * @return      array 
 */

Example

$mecab = new \Mecab_Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']);

$array = $mecab::split('行きます', '/var/lib/mecab/dic/ipadic-utf8');

print_r($array);

Array
(
 [0] => 行き
 [1] => ます
)   
__construct($arguments, $persistent)

Construct class instance.

/**
 * @param       array           $arguments      Command line arguments.        
 * @param       boolean         $persistent     (Optional)    
 *
 * @return      MeCab\Tagger 
 */
getPartial()

Get current partial parsing mode state.

/**
 * @return      boolean
 */
setPartial($bool)

Set partial parsing mode.

/**
 * @param       boolean         $bool           Partial parsing mode.
 */
getTheta()

Get current temparature parameter theta.

/**
 * @return      float
 */
setTheta($theta)

Set temparature parameter theta.

/**
 * @param       float/int       $theta          Temparature parameter theta.
 */
getLatticeLevel()

Get current lattice level.

/**
 * @return      int
 */
setLatticeLevel($level)

Set lattice level.

/**
 * @param       int             $level          Lattice level.
 */
getAllMorphs()

Get all-morphs output mode.

/**
 * @return      bool
 */
setAllMorphs($bool)

Set all-morphs output mode.

/**
 * @param       bool            $bool           All-morphs output mode.
 */
parse($string, $length, $output_length)

Parse string and output results as string.

/**
 * @param       string          $string         String to be parsed.
 * @param       int             $length         Length to be analyzed. (Optional)  
 * @param       int             $output_length  Maximum length of output. (Optional)
 *
 * @return      string
 */

Example

$mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']);

$string = $mecab->parse('行きます');

print_r($string);

行き    動詞,自立,*,*,五段・カ行促音便,連用形,行く,イキ,イキ
ます    助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
EOS
parseToString($string, $length, $output_length)

Parse string and output results as string.

/**
 * @param       string          $string         String to be parsed.
 * @param       int             $length         Length to be analyzed. (Optional)  
 * @param       int             $output_length  Maximum length of output. (Optional)
 *
 * @return      string
 */

Example

$mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']);

$string = $mecab->parseToString('行きます');

print_r($string);

行き    動詞,自立,*,*,五段・カ行促音便,連用形,行く,イキ,イキ
ます    助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
EOS
parseToNode($string, $length)

Parse string and output results as MeCab/Node.

/**
 * @param       string          $string         String to be parsed.
 * @param       int             $length         Length to be analyzed. (Optional)
 *
 * @return      MeCab/Node
 */

Example

$mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']);

$node = $mecab->parseToNode('行きます');

print_r($node->toArray());

Array
(
[surface] => 
[feature] => BOS/EOS,*,*,*,*,*,*,*,*
[id] => 0
[length] => 0
[rlength] => 0
[rcAttr] => 0
[lcAttr] => 0
[posid] => 0
[char_type] => 0
[stat] => 2
[isbest] => 1
[alpha] => 0
[beta] => 0
[prob] => 0
[wcost] => 0
[cost] => 0
)
parseNBest($n, $string, $length, $output_length)

Parse given sentence and output N-best results as string. This method causes seg faults for me.

/**
 * @param       int             $n              Number of results to obtain.
 * @param       string          $string         String to be parsed.
 * @param       int             $length         Length to be analyzed. (Optional)
 * @param       int             $output_length  Maximum length of output. (Optional)
 *
 * @return      string
 */
parseNBestInit($string, $length)

Initialize N-best enumeration with a sentence.

/**
 * @param       string          $string         String to be parsed.
 * @param       int             $length         Length to be analyzed. (Optional)

 * @return      boolean
 */
next($output_length)

Get the next result of N-Best as a string.

/**
 * @param       int             $output_length  Maximum length of output. (Optional)
 *
 * @return      string
 */
nextNode()

Get the next result of N-Best as a node.

/**
 * @return      MeCab\Node
 */
formatNode($node)

Format a node to a string.

/**
 * @param       MeCab\Node      $node           Node to be formatted.
 *
 * @return      string
 */
dictionaryInfo()

Return array of dictionary info.

/**
 * @return      array
 */

Top

Mecab/Node

Returned by parseToNode method on Mecab\Tagger.

Methods
getIterator()

Return MeCab\NodeIterator.

/**
 * @return      MeCab\NodeIterator
 */
setTraverse($mode)

Set the traverse mode.

/**
 * @param       long            $mode           Traverse mode.
 */
getPrev()

Get the previous node. Return NULL if none.

/**
 * @return      MeCab\Node
 */
getNext()

Get the next node. Return NULL if none.

/**
 * @return      MeCab\Node
 */
getENext()

Get the next node which has same end point as the given node. Return NULL if none.

/**
 * @return      MeCab\Node
 */
getBNext()

Get the next node which has same beginning point as the given node. Return NULL if none.

/**
 * @return      MeCab\Node
 */
getRPath()

Get the next node which has same end point as the given node. Return NULL if none.

/**
 * @return      MeCab\Path
 */
getLPath()

Get the next node which has same beginning point as the given node. Return NULL if none.

/**
 * @return      MeCab\Path
 */
getSurface()

Get the surface of the node.

/**
 * @return      string
 */
getFeature()

Get the feature of the node.

/**
 * @return      string
 */
getId()

Get the ID of the node.

/**
 * @return      int
 */
getLength()

Get the length of the node's surface.

/**
 * @return      int
 */
getRLength()

Get the length of the node's surface including it's leading whitespace.

/**
 * @return      int
 */
getRcAttr()

Get the ID of the right context.

/**
 * @return      int
 */
getLcAttr()

Get the ID of the left context.

/**
 * @return      int
 */
getPosId()

Get the ID of the part of speech.

/**
 * @return      int
 */
getCharType()

Get the type of character.

/**
 * @return      int
 */
getStat()

Get the status of the node.

/**
 * @return      int
 */

0: Normal, MECAB_NOR_NODE
1: Unknown, MECAB_UNK_NODE
2: Beginning of Sentence, MECAB_BOS_NODE
3: End of Sentence, MECAB_EOS_NODE

getAlpha()

Get the forward log probability.

/**
 * @return      float
 */
getBeta()

Get the backward probability log.

/**
 * @return      float
 */
getWCost()

Get the word arising cost.

/**
 * @return      int
 */
getCost()

Get the cumulative cost of the node.

/**
 * @return      int
 */
getProb()

Get the marginal probability of the node.

/**
 * @return      float
 */
isBest()

Determine whether the node is the best solution.

/**
 * @return      boolean
 */
toArray($dump_all)

Get all elements of the node as an associative array.

/**
 * @param       boolean         $dump_all       Dump all related nodes if true. (Optional)
 *
 * @return      array
 */
toString()

Get the formatted string of the node.

/**
 * @return      string
 */

Top

MeCab\Path

Returned by getRPath and getLPath methods on MeCab/Node class.

Methods
getRNext()

Get the rnext path. Return NULL if none.

/**
 * @return      MeCab/Path
 */
getLNext()

Get the lext path. Return NULL if none.

/**
 * @return      MeCab/Path
 */
getRNode()

Get the rnode. Return NULL if none.

/**
 * @return      MeCab/Node
 */
getLNode()

Get the lnode. Return NULL if none.

/**
 * @return      MeCab/Node
 */
getProb()

Get the marginal probability of the path.

/**
 * @return      float
 */
getCost()

Get the cumulative cost of the path.

/**
 * @return      int
 */

Top

MeCab\NodeIterator

Node iterator class.

Methods
current()

Return the current element.

/**
 * @return      MeCab\Node
 */
key()
/**
 * @return      int
 */
next()

Set pointer to next element.

rewind()

Set pointer to beginning.

valid()

Check if there is a current element after calls to rewind() or next().

/**
 * @return   boolean
 */

Top

Functions

mecab_version()

Return MeCab version. Return MeCab version.

/**
 * @return      string
 */
mecab_split($string, $dic_dir, $user_dic, $filter, $persistent)

Split string into array of morphemes.

/**
 * @param       string          $string         String to split.  
 * @param       string          $dic_dir        Path to dictionary directory. (Optional)    
 * @param       string          $user_dic       Path to user dictionary. (Optional)   
 * @param       callback        $filter         Filter function or method. (Optional)      
 * @param       boolean         $persistent     (Optional)    
 *
 * @return      array
 */
mecab_new($arguments, $persistent)

Create new MeCab resource.

/**
 * @param       array           $arguments      Command line arguments.        
 * @param       boolean         $persistent     (Optional)    
 *
 * @return      MeCab
 */
mecab_destroy($mecab)

Free the tagger.

/**
 * @param       MeCab           $mecab          MeCab resource.
 */
mecab_get_partial($mecab)

Get current partial parsing mode state.

/**
 * @param       MeCab           $mecab          MeCab resource.
 *
 * @return      boolean
 */
mecab_set_partial($mecab, $partial)

Set partial parsing mode.

/**
 * @param       MeCab           $mecab          MeCab resource.
 * @param       boolean         $bool           Partial parsing mode.
 */
mecab_get_theta($mecab)

Get current temparature parameter theta.

/**
 * @param       MeCab           $mecab          MeCab resource.
 *
 * @return      float
 */
mecab_set_theta($mecab, $theta)

Set temparature parameter theta.

/**
 * @param       MeCab           $mecab          MeCab resource.
 * @param       float/int       $theta          Temparature parameter theta.
 */
mecab_get_lattice_level($mecab)

Get current lattice level.

/**
 * @param       MeCab           $mecab          MeCab resource.
 *
 * @return      int
 */
mecab_set_lattice_level($mecab, $level)

Set lattice level.

/**
 * @param       MeCab           $mecab          MeCab resource.
 * @param       int             $level          Lattice level.
 */
mecab_get_all_morphs($mecab)

Get all-morphs output mode.

/**
 * @param       MeCab           $mecab          MeCab resource.
 *
 * @return      bool
 */
mecab_set_all_morphs($mecab, $bool)

Set all-morphs output mode.

/**
 * @param       MeCab           $mecab          MeCab resource.
 * @param       bool            $bool           All-morphs output mode.
 */
mecab_sparse_tostr($mecab, $string, $length, $output_length)

Parse string and output results as string.

/**
 * @param       MeCab           $mecab          MeCab resource.
 * @param       string          $string         String to be parsed.
 * @param       int             $length         Length to be analyzed. (Optional)  
 * @param       int             $output_length  Maximum length of output. (Optional)
 *
 * @return      string
 */
mecab_sparse_tonode($mecab, $string, $length)

Parse string and output results as MeCab/Node.

/**
 * @param       MeCab           $mecab          MeCab resource.
 * @param       string          $string         String to be parsed.
 * @param       int             $length         Length to be analyzed. (Optional)
 *
 * @return      MeCab/Node
 */
mecab_nbest_sparse_tostr($mecab, $n, $string, $length, $output_length)

Parse given sentence and output N-best results as string. This method causes seg faults for me.

/**
 * @param       MeCab           $mecab          MeCab resource.
 * @param       int             $n              Number of results to obtain.
 * @param       string          $string         String to be parsed.
 * @param       int             $length         Length to be analyzed. (Optional)
 * @param       int             $output_length  Maximum length of output. (Optional)
 *
 * @return      string
 */
mecab_nbest_init($mecab, $string, $length)

Initialize N-best enumeration with a sentence.

/**
 * @param       MeCab           $mecab          MeCab resource.
 * @param       string          $string         String to be parsed.
 * @param       int             $length         Length to be analyzed. (Optional)

 * @return      boolean
 */
mecab_nbest_next_tostr($mecab, $output_length)

Get the next result of N-Best as a string.

/**
 * @param       MeCab           $mecab          MeCab resource.
 * @param       int             $output_length  Maximum length of output. (Optional)
 *
 * @return      string
 */
mecab_nbest_next_tonode($mecab)

Get the next result of N-Best as a node.

/**
 * @param       MeCab           $mecab          MeCab resource.
 *
 * @return      MeCab\Node
 */
mecab_format_node($mecab, $node)

Format a node to a string.

/**
 * @param       MeCab           $mecab          MeCab resource.
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      string
 */
mecab_dictionary_info($mecab)

Return array of dictionary info.

/**
 * @return      array
 */
mecab_node_toarray($node, $dump_all)

Get all elements of the node as an associative array.

/**
 * @param       MeCab\Node      $node           Node of source string.
 * @param       boolean         $dump_all       Dump all related nodes if true. (Optional)
 *
 * @return      array
 */
mecab_node_tostring($node)

Get the formatted string of the node.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      string
 */
mecab_node_prev($node)

Get the previous node. Return NULL if none.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      MeCab\Node
 */
mecab_node_next($node)

Get the next node. Return NULL if none.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      MeCab\Node
 */
mecab_node_enext($node)

Get the next node which has same end point as the given node. Return NULL if none.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      MeCab\Node
 */
mecab_node_bnext($node)

Get the next node which has same beginning point as the given node. Return NULL if none.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      MeCab\Node
 */
mecab_node_rpath($node)

Get the next node which has same end point as the given node. Return NULL if none.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      MeCab\Path
 */
mecab_node_lpath($node)

Get the next node which has same beginning point as the given node. Return NULL if none.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      MeCab\Path
 */
mecab_node_surface($node)

Get the surface of the node.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      string
 */
mecab_node_feature($node)

Get the feature of the node.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      string
 */
mecab_node_id($node)

Get the ID of the node.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      int
 */
mecab_node_length($node)

Get the length of the node's surface.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      int
 */
mecab_node_rlength($node)

Get the length of the node's surface including it's leading whitespace.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      int
 */
mecab_node_rcattr($node)

Get the ID of the right context.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      int
 */
mecab_node_lcattr($node)

Get the ID of the left context.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      int
 */
mecab_node_posid($node)

Get the ID of the part of speech.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      int
 */
mecab_node_char_type($node)

Get the type of character.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      int
 */
mecab_node_stat($node)

Get the status of the node.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      int
 */

0: Normal, MECAB_NOR_NODE
1: Unknown, MECAB_UNK_NODE
2: Beginning of Sentence, MECAB_BOS_NODE
3: End of Sentence, MECAB_EOS_NODE

mecab_node_alpha($node)

Get the forward log probability.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      float
 */
mecab_node_beta($node)

Get the backward probability log.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      float
 */
mecab_node_wcost($node)

Get the word arising cost.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      int
 */
mecab_node_cost($node)

Get the cumulative cost of the node.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      int
 */
mecab_node_prob($node)

Get the marginal probability of the node.

/**
 * @param       MeCab\Node      $node           Node of source string.
 *
 * @return      float
 */
mecab_node_isbest($node)

Determine whether the node is the best solution.

/**
 * @param       MeCab\Node      $node           Node of source string.
 * 
 * @return      boolean
 */
mecab_path_rnext($path)

Get the rnext path. Return NULL if none.

/**
 * @param       MeCab\Path      $path           Path of source string.
 *
 * @return      MeCab\Path
 */
mecab_path_lnext($path)

Get the lext path. Return NULL if none.

/**
 * @param       MeCab\Path      $path           Path of source string.
 *
 * @return      MeCab\Path
 */
mecab_path_rnode($path)

Get the rnode. Return NULL if none.

/**
 * @param       MeCab\Path      $path           Path of source string.
 *
 * @return      MeCab\Node
 */
mecab_path_lnode($path)

Get the lnode. Return NULL if none.

/**
 * @param       MeCab\Path      $path           Path of source string.
 *
 * @return      MeCab\Node
 */
mecab_path_prob($path)

Get the marginal probability of the path.

/**
 * @param       MeCab\Path      $path           Path of source string.
 *
 * @return      float
 */
mecab_path_cost($path)

Get the cumulative cost of the path.

/**
 * @param       MeCab\Path      $path           Path of source string.
 *
 * @return      int
 */

Top

Other Resources

The University of the Ryukyus Department of Mechanical Systems Engineering maintains a php-mecab API documentation page that can be useful.
http://mechsys.tec.u-ryukyu.ac.jp/~oshiro/php_mecab_apis.html

The MeCab documentation is here on github, but its in Japanese only and is a little outdated.
http://taku910.github.io/mecab/

jordwest has translated parts the MeCab documentation into English here.
https://github.com/jordwest/mecab-docs-en

The MeCab api documentation is up on googlecode.
https://mecab.googlecode.com/svn/trunk/mecab/doc/doxygen/index.html

If you're using an IDE, fumikito has a gist that can help with php-mecab class recognition.
https://gist.github.com/fumikito/bb172b4cf5648c7f8451

If an app your using requires php-mecab and you'd like to use Travis CI, check out the example-travis.yml file and the accompanying travis-install-php.sh file in this repository.

Top

Contributing

Please help me to improve this guide. If you find errors or places where you feel this guide is lacking, please create an issue or make a pull request. Also, I would love to see this guide translated into other languages, especially Japanese. Any help with translations would be much appreciated.