Add Romanji plugin.
zachleigh committed Nov 1, 2015
1 parent 13e877f commit 3507ed3
Showing 20 changed files with 1,009 additions and 53 deletions.
65 changes: 58 additions & 7 deletions README.md
@@ -1,7 +1,7 @@
# Limelight
[![Build Status](https://travis-ci.org/nihongodera/limelight.svg?branch=master)](https://travis-ci.org/zachleigh/petrol)
[![Latest Stable Version](https://poser.pugx.org/nihongodera/limelight/version.svg)](//packagist.org/packages/zachleigh/petrol)
[![License](https://poser.pugx.org/nihongodera/limelight/license.svg)](//packagist.org/packages/zachleigh/petrol)
[![Build Status](https://travis-ci.org/nihongodera/limelight.svg?branch=master)](https://travis-ci.org/nihongodera/limelight)
[![Latest Stable Version](https://poser.pugx.org/nihongodera/limelight/version.svg)](//packagist.org/packages/nihongodera/limelight)
[![License](https://poser.pugx.org/nihongodera/limelight/license.svg)](//packagist.org/packages/nihongodera/limelight)
##### A php Japanese language analyzer and parser.
- Split Japanese text into individual, full words
- Find parts of speech for words
@@ -20,6 +20,7 @@
- [Doing Raw MeCab Queries](#doing-raw-mecab-queries)
- [Plugins](#plugins)
- [Furigana](#furigana)
- [Romanji](#romanji)
- [Making Plugins](#making-plugins)
- [Change Log](#change-log)
- [Sources, Contributions, and Contributing](#sources-contributions-and-contributing)
@@ -400,9 +401,9 @@ echo $wordObject->reading()->toHiragana()->get(); // Output: とうきょう
Convert a property to katakana with toKatakana().
```php
// $wordObject is おいしい
echo $wordObject->reading; // Output: おいしい
echo $wordObject->word; // Output: おいしい

echo $wordObject->reading()->toKatakana()->get(); // Output: オイシイ
echo $wordObject->word()->toKatakana()->get(); // Output: オイシイ
```

### Doing Raw MeCab Queries
@@ -432,6 +433,7 @@ $array = $limelight->mecabSplit('食べます');

## Plugins
- [Furigana](#furigana)
- [Romanji](#romanji)
- [Making Plugins](#making-plugins)

Plugins make it easy to use the information gained from Limelight and allow users to customize the program to improve performance and get only the results they need. To register a plugin, list it and the full namespace of the class in the 'plugin' array in config.php.
@@ -451,7 +453,7 @@ Any options that the plugin needs are also registerd in config.php in an array w
],
```

Plugins can put results on individual LimelightWord objects, on the LimelightResults object, or both. To access the plugin data, simply call the 'plugin()' method on either LimelightWord or LimelightResults and pass the name of the plugin as parameter.
Plugins can put results on individual LimelightWord objects, on the LimelightResults object, or both. There are a few ways to access plugin data. First, call the 'plugin()' method on either LimelightWord or LimelightResults and pass the name of the plugin as a parameter.
```php
$limelight = new Limelight();

@@ -463,6 +465,19 @@ echo $word->plugin('Furigana'); // Output: <ruby>東京<rt>とうきょう</rt><

echo $results->plugin('Furigana'); // Output: <ruby>東京<rt>とうきょう</rt></ruby>に<ruby>行<rt>い</rt></ruby>きます
```

Plugin data can also be accessed on LimelightWord objects in the same way as other properties, using either the property name or the property method call.
```php
$limelight = new Limelight();

$results = $limelight->parse('東京に行きます');

$word = $results->getByIndex(0);

echo $word->romanji; // Output: Toukyou

echo $word->romanji()->get(); // Output: Toukyou
```
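
The property-style and method-style access shown above relies on PHP's `__get()` and `__call()` magic methods. The following is a minimal, self-contained sketch of that idea, not the actual LimelightWord class: `DemoWord` and its constructor are hypothetical, and only the lookup by `ucfirst($name)` mirrors what this commit adds. Unlike the committed code, the sketch throws on an unknown method name instead of returning null.

```php
<?php

// Hypothetical stand-in for LimelightWord; only the plugin lookup is modelled.
class DemoWord
{
    private $word;
    private $pluginData = [];
    private $returnItem;

    public function __construct($word, array $pluginData)
    {
        $this->word = $word;
        $this->pluginData = $pluginData;
    }

    // Property access: real properties first, then plugin data.
    // Plugin names are stored capitalized, so 'romanji' maps to 'Romanji'.
    public function __get($name)
    {
        if (property_exists($this, $name)) {
            return $this->$name;
        }

        $key = ucfirst($name);

        return isset($this->pluginData[$key]) ? $this->pluginData[$key] : null;
    }

    // Method access: stash the plugin value so get() can return it.
    public function __call($name, $arguments)
    {
        $key = ucfirst($name);

        if (!isset($this->pluginData[$key])) {
            // Assumption of this sketch: fail loudly on unknown names.
            throw new BadMethodCallException("No plugin data for '{$name}'.");
        }

        $this->returnItem = $this->pluginData[$key];

        return $this;
    }

    public function get()
    {
        return $this->returnItem;
    }
}

$word = new DemoWord('東京', ['Romanji' => 'Toukyou']);

echo $word->romanji, "\n";          // property style
echo $word->romanji()->get(), "\n"; // method style
```

Both calls read the same stored value; the method form only exists so that plugin data can be chained with `get()` like the built-in properties.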

### Furigana

@@ -490,11 +505,42 @@ $results = $limelight->parse('東京に行きます');

$word = $results->getByIndex(0);

echo $word->plugin('Furigana'); // Output: <ruby>東京<rt>とうきょう</rt></ruby>に<ruby>行<rt>い</rt></ruby>きます
echo $word->furigana; // Output: <ruby>東京<rt>とうきょう</rt></ruby>

echo $results->plugin('Furigana'); // Output: <ruby>東京<rt>とうきょう</rt></ruby>に<ruby>行<rt>い</rt></ruby>きます
```

### Romanji

The Romanji plugin converts words from Japanese to romanji (Latin letters). Currently, only [traditional Hepburn](https://en.wikipedia.org/wiki/Hepburn_romanization) romanization is available, but other options are coming soon.

To get romanji for a string, parse the string and read the plugin result from the LimelightResults object.
```php
$limelight = new Limelight();

$results = $limelight->parse('東京に行きます');

echo $results->plugin('Romanji'); // Output: Toukyou ni ikimasu
```
Strings on the LimelightResults object are space separated.

Results can also be accessed on LimelightWord objects.
```php
$limelight = new Limelight();

$results = $limelight->parse('東京に行きます');

foreach ($results->getNext() as $word) {
    echo $word->romanji;
}

// Output
//
// Toukyouniikimasu
```

Proper nouns are capitalized.
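
To make the difference between the two outputs concrete, here is a small sketch that reproduces them without Limelight or MeCab. It is an illustration under stated assumptions, not the plugin's implementation: `kanaToRomaji()`, the tiny `$table`, and the hand-written `$words` array (readings plus a proper-noun flag) are all hypothetical and cover only this one sentence.

```php
<?php

// Hypothetical helper: transliterate hiragana using a lookup table,
// trying two-character combinations (such as きょ) before single kana.
function kanaToRomaji($kana, array $table)
{
    // Split into UTF-8 characters without needing the mbstring extension.
    $chars = preg_split('//u', $kana, -1, PREG_SPLIT_NO_EMPTY);
    $count = count($chars);
    $romaji = '';

    for ($i = 0; $i < $count; $i++) {
        $pair = ($i + 1 < $count) ? $chars[$i] . $chars[$i + 1] : null;

        if ($pair !== null && isset($table[$pair])) {
            $romaji .= $table[$pair];
            $i++; // the pair consumed two characters
        } elseif (isset($table[$chars[$i]])) {
            $romaji .= $table[$chars[$i]];
        } else {
            $romaji .= $chars[$i]; // leave unknown characters untouched
        }
    }

    return $romaji;
}

// Deliberately tiny table: just enough kana for the example sentence.
$table = [
    'きょ' => 'kyo',
    'と' => 'to', 'う' => 'u', 'に' => 'ni', 'い' => 'i',
    'き' => 'ki', 'ま' => 'ma', 'す' => 'su',
];

// Hand-written stand-in for parsed words (reading, proper-noun flag).
$words = [
    ['reading' => 'とうきょう', 'properNoun' => true],
    ['reading' => 'に', 'properNoun' => false],
    ['reading' => 'いきます', 'properNoun' => false],
];

$parts = [];

foreach ($words as $word) {
    $romaji = kanaToRomaji($word['reading'], $table);
    $parts[] = $word['properNoun'] ? ucfirst($romaji) : $romaji;
}

echo implode(' ', $parts), "\n"; // results-level string, space separated
echo implode('', $parts), "\n";  // echoing each word in a loop, no separator
```

The first line corresponds to reading the plugin from the results object, the second to echoing each word's value in turn.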

### Making Plugins

Making plugins for Limelight is simple. First, create a plugin class and have it extend Limelight\Plugins\Plugin. Limelight\Plugins\Plugin has one abstract method, handle(), which you must implement.
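
As a rough, self-contained illustration of that shape, the sketch below defines a stand-in base class and one plugin. Everything in it is an assumption made for the example: the stub `Plugin` class only imitates the constructor arguments and the `addToWord()` call visible in the Furigana plugin, the `CharCount` plugin is hypothetical, and plain `stdClass` objects stand in for LimelightWord.

```php
<?php

// Hypothetical stand-in for Limelight\Plugins\Plugin so the sketch runs alone.
abstract class Plugin
{
    protected $words;

    // Same argument order as the Furigana plugin's constructor;
    // this sketch keeps only the words and ignores the rest.
    public function __construct($text, $node, $tokens, $words)
    {
        $this->words = $words;
    }

    // The one method every plugin must implement.
    abstract public function handle();

    // Assumed helper: store a per-word result under the plugin's class name.
    protected function addToWord($wordObject, $value)
    {
        $wordObject->pluginData[get_class($this)] = $value;
    }
}

// Example plugin (hypothetical): counts the characters in each word.
class CharCount extends Plugin
{
    public function handle()
    {
        $total = 0;

        foreach ($this->words as $wordObject) {
            $chars = preg_split('//u', $wordObject->word, -1, PREG_SPLIT_NO_EMPTY);
            $count = count($chars);

            // Per-word result.
            $this->addToWord($wordObject, $count);

            $total += $count;
        }

        // The return value is the result for the whole text.
        return $total;
    }
}

// Build stand-in word objects for 東京 / に / 行きます.
$words = [];

foreach (['東京', 'に', '行きます'] as $text) {
    $word = new stdClass();
    $word->word = $text;
    $word->pluginData = [];
    $words[] = $word;
}

$plugin = new CharCount('東京に行きます', null, [], $words);
$total = $plugin->handle();

echo $total, "\n";
```

The pattern to note is that `handle()` does two jobs: it attaches a value to each word and returns a value for the text as a whole.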
@@ -547,6 +593,11 @@ A plugin template with some example code can be found in Limelight/Plugins.
[Top](#contents)

## Change Log

Nov. 1, 2015: Version 1.2.0
- Added Romanji plugin
- Improved plugin data accessibility
- Bug fixes

Oct. 30, 2015: Version 1.1.0
- Added plugin ability
17 changes: 17 additions & 0 deletions limelight_console
@@ -7,3 +7,20 @@ use Limelight\Limelight;
require __DIR__ . '/vendor/autoload.php';

$limelight = new Limelight();

$results = $limelight->parse('東京に行きます');

// $word = $results->getByIndex(0);

// $romanji = $word->reading()->toRomanji()->get();

// echo $romanji;

// var_dump($results);

foreach ($results->getNext() as $word) {
echo $word->romanji;
}

// echo $word->romanji;
//
16 changes: 16 additions & 0 deletions src/Classes/LimelightResults.php
@@ -92,6 +92,22 @@ public function getResultString()
return $string;
}

/**
* Get all lemmas combined as a string.
*
* @return string
*/
public function getLemmaString()
{
$string = '';

foreach ($this->words as $word) {
$string .= $word->lemma()->get();
}

return $string;
}

/**
* Get all words.
*
26 changes: 26 additions & 0 deletions src/Classes/LimelightWord.php
@@ -95,6 +95,25 @@ public function __get($name)
{
if (property_exists($this, $name)) {
return $this->$name;
} elseif (isset($this->pluginData[ucfirst($name)])) {
return $this->pluginData[ucfirst($name)];
}
}

/**
* Call methods for plugin items.
*
* @param string $name
* @param array $arguments
*
* @return $this
*/
public function __call($name, $arguments)
{
if (isset($this->pluginData[ucfirst($name)])) {
$this->returnItem = $this->pluginData[ucfirst($name)];

return $this;
}
}

@@ -250,6 +269,13 @@ public function toKatakana()
return $this;
}

/**
* Convert item to romanji (placeholder: no conversion is performed yet).
*
* @return $this
*/
public function toRomanji()
{
return $this;
}

/**
* Append value to end of property.
*
4 changes: 2 additions & 2 deletions src/Limelight.php
@@ -36,15 +36,15 @@ public function __construct()
*
* @return Limelight\Classes\LimelightResults
*/
public function parse($text)
public function parse($text, $runPlugins = true)
{
$tokenizer = new Tokenizer();

$tokenParser = new TokenParser();

$parser = new Parser($this->mecab, $tokenizer, $tokenParser);

return $parser->handle($text);
return $parser->handle($text, $runPlugins);
}

/**
10 changes: 6 additions & 4 deletions src/Parse/Parser.php
@@ -28,8 +28,9 @@ class Parser
/**
* Construct.
*
* @param Mecab $mecab
* @param string $text
* @param Mecab $mecab
* @param Tokenizer $tokenizer
* @param TokenParser $tokenParser
*/
public function __construct(Mecab $mecab, Tokenizer $tokenizer, TokenParser $tokenParser)
{
@@ -42,18 +43,19 @@ public function __construct(Mecab $mecab, Tokenizer $tokenizer, TokenParser $tok
* Handle the parse for given text.
*
* @param string $text
* @param boolean $runPlugins
*
* @return LimelightResults
*/
public function handle($text)
public function handle($text, $runPlugins)
{
$node = $this->mecab->parseToNode($text);

$tokens = $this->tokenizer->makeTokens($node);

$words = $this->tokenParser->parseTokens($tokens);

$pluginResults = $this->runPlugins($text, $node, $tokens, $words);
$pluginResults = ($runPlugins ? $this->runPlugins($text, $node, $tokens, $words) : null);

return new LimelightResults($text, $words, $pluginResults);
}
4 changes: 0 additions & 4 deletions src/Parse/Tokenizer.php
@@ -141,10 +141,6 @@ private function parseNode(Node $node)
$this->parsing = false;

return;

// $token['type'] = 'sentenceSplit';

// $token['literal'] = '';
} else {
$token['type'] = 'parsed';

@@ -1,6 +1,6 @@
<?php

namespace Limelight\Plugins\Plugins;
namespace Limelight\Plugins\Library\Furigana;

use Limelight\Plugins\Plugin;

@@ -35,9 +35,11 @@ public function __construct($text, $node, $tokens, $words)
*/
public function handle()
{
$furiganaWord = '';
$furiganaString = '';

foreach ($this->words as $wordObject) {
$furiganaWord = '';

$word = $wordObject->word;

$wordChars = $this->getChars($word);
@@ -57,11 +59,13 @@ public function handle()
$kanjiWithKana = $this->combineKanjiKana($kanji, $kana);

$furiganaWord .= $this->rebuildWord($wordChars, $kanjiWithKana, $katakanaChars);
}

$this->addToWord($wordObject, $furiganaWord);
$this->addToWord($wordObject, $furiganaWord);

$furiganaString .= $furiganaWord;
}

return $furiganaWord;
return $furiganaString;
}

/**
@@ -125,21 +129,7 @@ private function buildKanaArray(array $hiraganaChars, array $wordChars)

foreach ($hiraganaChars as $hiraganaChar) {
if ($this->countArrayValues($hiraganaChars, $hiraganaChar) !== 1 && !empty($wordKanaIntersect) && in_array($hiraganaChar, $wordKanaIntersect)) {
$reverseHiragana = array_reverse($hiraganaChars);

$reverseHiraganaCopy = $reverseHiragana;

$wordCopy = $wordChars;

foreach ($reverseHiragana as $key => $char) {
if (in_array($char, $wordCopy)) {
unset($wordCopy[array_search($char, $wordCopy)]);

unset($reverseHiraganaCopy[$key]);
}
}

return array_diff(array_reverse($reverseHiraganaCopy), $wordCopy);
return $this->reverseArrayCompile($wordChars, $hiraganaChars);
}
}

@@ -161,6 +151,31 @@ private function countArrayValues(array $array, $value)
return $counts[$value];
}

/**
* Find valid furigana by walking hiragana array in reverse.
*
* @param array $wordChars
* @param array $hiraganaChars
*
* @return array
*/
private function reverseArrayCompile(array $wordChars, array $hiraganaChars)
{
$reverseHiragana = array_reverse($hiraganaChars);

$reverseHiraganaCopy = $reverseHiragana;

foreach ($reverseHiragana as $key => $char) {
if (in_array($char, $wordChars)) {
unset($wordChars[array_search($char, $wordChars)]);

unset($reverseHiraganaCopy[$key]);
}
}

return array_diff(array_reverse($reverseHiraganaCopy), $wordChars);
}

/**
* Divide array into arrays of continuous keys.
*