Updated for windows and bash #5

Open. Wants to merge 9 commits into base: master
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
.idea
composer.lock
vendor
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,17 @@
## Changelog
### Version 2.0.0
#### Added
- Added PHP 8.0 type declarations to the class.
- Added a constructor to the main `Hunspell` class where `$dictionary`, `$encoding` and `$dictionary_path` can be set/overridden during initialization.
- Added `$dictionary_path` as a new argument where the path to the dictionary files may be specified (system default search locations are used otherwise). Corresponding `getDictionaryPath()` and `setDictionaryPath()` methods were added.
- Added stemming to the `findCommand` method via a new `bool $stem_mode` argument.
#### Removed
- Removed `findStemCommand` method.
- Removed unused exception classes.
- Removed `HunspellPHP\Exceptions` namespace.
- Removed composer.lock from repo.
#### Fixed
- Renamed `$language` to the more appropriate `$dictionary`, since that is what the property references.
- Moved `InvalidMatchTypeException` up one directory to the `\HunspellPHP` namespace.
- Fixed an issue where not all `$matches` values were returned from the command response, resulting in PHP warnings.
- Fixed the matcher regex, which did not extract the `-` type, resulting in PHP warnings and bad responses.
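
The effect of the matcher fix can be sketched in isolation. The two patterns below are copied from the old and new `Hunspell` class in this diff; the helper `matchResultLine()` and the sample result lines are hypothetical and only illustrate the ispell-style output the regex is applied to.

```php
<?php
// Patterns copied from the diff: the 2.0.0 pattern adds '-' to the type alternatives.
const MATCHER_V1 = '/(?P<type>\*|\+|&|#)\s?(?P<original>\w+)?\s?(?P<count>\d+)?\s?(?P<offset>\d+)?:?\s?(?P<misses>.*+)?/u';
const MATCHER_V2 = '/(?P<type>\*|\+|&|#|-)\s?(?P<original>\w+)?\s?(?P<count>\d+)?\s?(?P<offset>\d+)?:?\s?(?P<misses>.*+)?/u';

/**
 * Hypothetical helper (not part of the library): apply a matcher pattern to
 * one result line and return the named groups, or null when nothing matches.
 */
function matchResultLine(string $pattern, string $line): ?array
{
    $matches = [];
    if (preg_match($pattern, $line, $matches) !== 1) {
        return null; // no recognised type marker on this line
    }
    // Default every key so callers never read an undefined index.
    return [
        'type'     => $matches['type'],
        'original' => $matches['original'] ?? '',
        'count'    => $matches['count'] ?? null,
        'offset'   => $matches['offset'] ?? null,
        'misses'   => $matches['misses'] ?? '',
    ];
}

// Made-up sample lines: a "miss" with suggestions and a bare compound marker.
var_dump(matchResultLine(MATCHER_V2, '& wrod 3 0: word, rod, wood'));
var_dump(matchResultLine(MATCHER_V1, '-')); // old pattern: no match at all
var_dump(matchResultLine(MATCHER_V2, '-')); // new pattern: type is '-'
```

With the old pattern a `-` line produces no match, so every key read from the empty match array is undefined, which is consistent with the warnings described above.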
10 changes: 0 additions & 10 deletions README.MD

This file was deleted.

19 changes: 19 additions & 0 deletions README.md
@@ -0,0 +1,19 @@
# Hunspell PHP wrapper
Forked from [johnzuk/HunspellPHP](https://github.com/johnzuk/HunspellPHP)

### Version 2.0.0
Versions 2.0.0 and above require PHP ^8.0.0 and include an important fix to the result matcher regex. If you need this on an older version of PHP, I recommend that you fork 1.2 and update the `$matcher` property of the `Hunspell` class to the value set in the current version of the code.

[View Changelog](CHANGELOG.md)

### The reason for this fork
This project was initially forked because the shell commands used were written for a non-Bash shell. This fork's main purpose was to convert the shell commands to Bash-compatible syntax and add support for Windows PowerShell. As such, this fork will not work correctly outside of a Bash or PowerShell environment.

An additional change was made to the parsing of the return value, because the `PHP_EOL` value used in the original source was not working in my testing. It was changed to "\n", which resolved the issue.
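
A minimal sketch of newline-agnostic splitting, assuming the mismatch came from `PHP_EOL` being "\r\n" on Windows while the command output used "\n". The helper name `splitOutputLines()` is hypothetical and not part of this library.

```php
<?php
/**
 * Hypothetical helper: split command output into lines regardless of
 * whether it uses "\r\n", "\r" or "\n" line endings.
 */
function splitOutputLines(string $output): array
{
    $trimmed = trim($output);
    if ($trimmed === '') {
        return [];
    }
    // Try "\r\n" first so a Windows line ending is consumed as one separator.
    // A plain byte-level alternation is used so multi-byte UTF-8 characters
    // in the output are never split.
    $lines = preg_split('/\r\n|\r|\n/', $trimmed);
    return $lines === false ? [] : $lines;
}

var_dump(splitOutputLines("header\r\n*\n& wrod 3 0: word\n"));
```

Splitting this way avoids depending on which platform PHP runs on, at the cost of one regex call per response.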

Example
===================
```php
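// The default dictionary is now 'en_US', so pass the Polish one for this word.
$hunspell = new \HunspellPHP\Hunspell('pl_PL', 'pl_PL.utf-8');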
var_dump($hunspell->find('otwórz'));
```
11 changes: 6 additions & 5 deletions composer.json
@@ -1,20 +1,21 @@
{
"name": "hunspell-php/hunspell-php",
"name": "belniakmedia/hunspell-php",
"description": "Hunspell PHP wrapper",
"minimum-stability": "dev",
"version": "2.0.0",
"license": "MIT",
"authors": [
{
"name": "Janusz Żukowicz",
"email": "[email protected]"
"name": "Richard Kukiela",
"email": "[email protected]"
}
],
"require": {
"php" : ">=5.6"
"php" : ">=8.0"
},
"autoload": {
"psr-4": {
"HunspellPHP\\": "src/HunspellPHP"
}
}
}
}
20 changes: 0 additions & 20 deletions composer.lock

This file was deleted.

8 changes: 0 additions & 8 deletions src/HunspellPHP/Exception/InvalidResultException.php

This file was deleted.

8 changes: 0 additions & 8 deletions src/HunspellPHP/Exception/WordNotFoundException.php

This file was deleted.

140 changes: 77 additions & 63 deletions src/HunspellPHP/Hunspell.php
@@ -1,153 +1,169 @@
<?php
/** @noinspection PhpUnused */
namespace HunspellPHP;

use HunspellPHP\Exception\InvalidMatchTypeException;
use HunspellPHP\Exception\InvalidResultException;
use HunspellPHP\Exception\WordNotFoundException;

class Hunspell
{
const OK = '*';

const ROOT = '+';

const MISS = '&';

const NONE = '#';

const COMPOUND = '-';

const STATUSES_NAME = [
Hunspell::OK => 'OK',
Hunspell::ROOT => 'ROOT',
Hunspell::MISS => 'MISS',
Hunspell::NONE => 'NONE',
Hunspell::COMPOUND => 'COMPOUND',
Hunspell::COMPOUND => 'COMPOUND'
];

protected string $encoding;
protected string $dictionary;
protected string $dictionary_path;
protected string $matcher =
'/(?P<type>\*|\+|&|#|-)\s?(?P<original>\w+)?\s?(?P<count>\d+)?\s?(?P<offset>\d+)?:?\s?(?P<misses>.*+)?/u';

/**
* @var string
* @param string $dictionary Dictionary name e.g.: 'en_US' (default)
* @param string $encoding Encoding (includes language code) e.g.: 'en_US.utf-8' (default)
* @param ?string $dictionary_path Specify the directory of the dictionary file (optional)
*/
protected $language = "pl_PL";
public function __construct(
string $dictionary = 'en_US',
string $encoding = 'en_US.utf-8',
?string $dictionary_path = null
) {
$this->dictionary = $this->clear($dictionary);
$this->encoding = $this->clear($encoding);
// The property is typed as string, so map a null argument to '' (treated as "not set").
$this->dictionary_path = $dictionary_path ?? '';
}


/**
* @var string
* @return string
*/
protected $encoding = "pl_PL.utf-8";
public function getEncoding(): string
{
return $this->encoding;
}

/**
* @var string
* @return string
*/
protected $matcher =
"/(?P<type>\*|\+|&|#)\s?(?P<original>\w+)?\s?(?P<count>\d+)?\s?(?P<offset>\d+)?:?\s?(?P<misses>.*+)?/u";
public function getDictionary(): string
{
return $this->dictionary;
}

/**
* @return string
*/
public function getLanguage()
public function getDictionaryPath(): string
{
return $this->language;
return $this->dictionary_path;
}

/**
* @param string $language
* @param string $dictionary Language code e.g.: 'en_US'
*/
public function setLanguage($language)
public function setDictionary(string $dictionary): void
{
$this->language = $this->clear($language);
$this->dictionary = $this->clear($dictionary);
}

/**
* @return string
* @param string $dictionary_path The path to load the dictionary files from
*/
public function getEncoding()
public function setDictionaryPath(string $dictionary_path): void
{
return $this->encoding;
$this->dictionary_path = $dictionary_path;
}


/**
* @param string $encoding
* @param string $encoding Encoding value (includes language code) e.g.: 'en_US.utf-8'
*/
public function setEncoding($encoding)
public function setEncoding(string $encoding): void
{
$this->encoding = $this->clear($encoding);
}

/**
* @param $words
* @param string $words
* @return array
* @throws InvalidMatchTypeException
*/
public function find($words)
public function find(string $words): array
{
$matches = [];
$results = $this->preParse($this->findCommand($words), $words);

$response = [];
foreach ($results as $word => $result) {
$matches = [];
$match = preg_match($this->matcher, $result, $matches);

$matches = ['type' => null];
preg_match($this->matcher, $result, $matches);
$matches['input'] = $word;
$matches['type'] = $matches['type'] ?? null;
$matches['original'] = $matches['original'] ?? '';
$matches['misses'] = $matches['misses'] ?? [];
$matches['offset'] = $matches['offset'] ?? null;
$matches['count'] = $matches['count'] ?? null;
$response[] = $this->parse($matches);
}

return $response;
}

/**
* @param string $word word to find
* @return HunspellStemResponse
* @throws InvalidMatchTypeException
* @throws InvalidResultException
* @throws WordNotFoundException
*/
public function stem($word)
public function stem(string $word): HunspellStemResponse
{
$result = explode(PHP_EOL, $this->stemCommand($word));
$result = explode(PHP_EOL, $this->findCommand($word, true));
$result['input'] = $word;
$result = $this->stemParse($result);
return $result;
return $this->stemParse($result);
}

/**
* @param string $input
* @return mixed
*/
protected function clear($input)
{
return preg_replace('[^a-zA-Z0-9_\-.]', '', $input);
}

/**
* @return string
* @param string $input
*/
protected function findCommand($input)
protected function clear(string $input): string
{
return shell_exec(sprintf("LANG=%s; echo '%s' | hunspell -d %s", $this->encoding, $input, $this->language));
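// Delimit the pattern with slashes so the brackets form a character class
// instead of being consumed as bracket-style delimiters.
return (string)preg_replace('/[^a-zA-Z0-9_.\-]/', '', $input);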
}

/**
* @return string
* @param string $input
* @param bool $stem_mode
* @return string
*/
protected function stemCommand($input)
protected function findCommand(string $input, bool $stem_mode = false): string
{
return shell_exec(sprintf("LANG=%s; echo '%s' | hunspell -d %s -s", $this->encoding, $input, $this->language));
$stem_switch = $stem_mode ? ' -s' : '';
$dictionary = $this->dictionary_path
? rtrim($this->dictionary_path, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $this->dictionary
: $this->dictionary;
if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') {
return (string)shell_exec(sprintf("powershell \"set LANG='%s'; echo '%s' | hunspell -d %s%s\"", $this->encoding, $input, $dictionary, $stem_switch));
} else {
return (string)shell_exec(sprintf("export LANG='%s'; echo '%s' | hunspell -d %s%s", $this->encoding, $input, $dictionary, $stem_switch));
}
}
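
The Bash branch above interpolates `$input` directly inside single quotes. A hedged sketch of the same command built with PHP's `escapeshellarg()` is shown below; the function `buildHunspellCommand()` is hypothetical, it covers only the Unix-like branch, and it is not what this PR implements.

```php
<?php
/**
 * Hypothetical helper: build the Bash variant of the hunspell command with
 * every user-influenced value passed through escapeshellarg().
 * Caveat: escapeshellarg() may drop non-ASCII bytes when LC_CTYPE is not a
 * UTF-8 locale, so a UTF-8 locale should be set before calling it.
 */
function buildHunspellCommand(
    string $encoding,
    string $input,
    string $dictionary,
    bool $stem_mode = false
): string {
    $stem_switch = $stem_mode ? ' -s' : '';
    return sprintf(
        'export LANG=%s; echo %s | hunspell -d %s%s',
        escapeshellarg($encoding),
        escapeshellarg($input),
        escapeshellarg($dictionary),
        $stem_switch
    );
}

// An embedded single quote is escaped instead of terminating the argument.
echo buildHunspellCommand('en_US.utf-8', "it's", 'en_US'), "\n";
```

The quoting shown applies to Unix-like systems; on Windows `escapeshellarg()` quotes differently, so the PowerShell branch would need its own treatment.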

/**
* @param string $input
* @param string $words
* @return array
*/
protected function preParse($input, $words)
protected function preParse(string $input, string $words): array
{
$result = explode(PHP_EOL, trim($input));
unset($result[0]);
$words = array_map('trim', explode(" ", $words));
$result = explode("\n", trim($input));
array_shift($result);
Author comment:

I forgot to mention in the notes that I also changed the split character from " " to the regex \W, because my site chokes when people search for things like blue/black. I also added a guard so that, if something else breaks, no warning is thrown when arrays of two different sizes would be passed to array_combine.

$words = array_map('trim', preg_split('/\W/', $words));

if(sizeof($result) != sizeof($words)) {
return [];
}
return array_combine($words, $result);
}
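
The word splitting and size guard described in the comment can be sketched on their own. This is an illustration under assumptions, not the PR's code: both function names are hypothetical, and the pattern uses Unicode classes with the `u` flag (a substitution for the plain `\W` above) because without that flag `\W` matches the individual bytes of non-ASCII letters such as the `ó` in `otwórz`.

```php
<?php
/**
 * Hypothetical helper: split a search string into words on any run of
 * characters that are not letters, digits or underscores.
 */
function splitWords(string $words): array
{
    // PREG_SPLIT_NO_EMPTY drops empty tokens caused by leading, trailing
    // or repeated separators.
    $parts = preg_split('/[^\p{L}\p{N}_]+/u', $words, -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts;
}

/**
 * Hypothetical helper: pair each word with its result line, returning an
 * empty array when the counts differ instead of calling array_combine().
 */
function pairWordsWithResults(string $words, array $resultLines): array
{
    $tokens = splitWords($words);
    if (count($tokens) !== count($resultLines)) {
        return [];
    }
    return array_combine($tokens, $resultLines);
}

var_dump(splitWords('blue/black'));
var_dump(pairWordsWithResults('blue/black', ['*', '*']));
var_dump(pairWordsWithResults('blue/black', ['*']));
```

Note that `array_combine()` keeps only the last value for repeated keys, so a query containing the same word twice yields a single entry.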

@@ -156,7 +172,7 @@ protected function preParse($input, $words)
* @return HunspellResponse
* @throws InvalidMatchTypeException
*/
protected function parse(array $matches)
protected function parse(array $matches): HunspellResponse
{
if ($matches['type'] == Hunspell::OK || $matches['type'] == Hunspell::COMPOUND) {
return new HunspellResponse(
@@ -193,10 +209,8 @@ protected function parse(array $matches)
/**
* @param array $matches
* @return HunspellStemResponse
* @throws InvalidMatchTypeException
* @throws WordNotFoundException
*/
protected function stemParse(array $matches)
protected function stemParse(array $matches): HunspellStemResponse
{
$input = $matches['input'];
unset($matches['input']);