This repository has been archived by the owner on Jul 30, 2022. It is now read-only.

Updated Frisian stopwords. Should be checked but have been extracted from Fryske Akademy corpus.
martijndeb committed Aug 11, 2014
1 parent eed2fe7 commit 9c7b7c9
Showing 3 changed files with 9 additions and 3 deletions.
2 changes: 1 addition & 1 deletion src/applications/ExtractStopwords.hx
@@ -43,7 +43,7 @@ class ExtractStopwords

for ( word in list.keys() ) {

- Sys.println( word + "," + list.get( word ) );
+ Sys.println( list.get( word ) + "," + word );

}

Binary file modified src/applications/ExtractStopwords.n
Binary file not shown.
10 changes: 8 additions & 2 deletions src/linguistics/languages/Frisian.hx
@@ -17,10 +17,16 @@ class Frisian implements ILanguage {

var tokenizer:ITokenizer = Type.createInstance( basicTokenizer, [] );

- // TODO: Stopwords could be improved
+ // TODO: Stopwords could be improved and should be checked.
+ // Extracted from older texts using http://argyf.fryske-akademy.eu/files/tdb/
stopwords = tokenizer.tokenize(

- "de fan in it en yn nei is foar troch fan op as mei jo dizze net"
+ "'e 'k 't al as bij by dan dat de den der dij do doe doe't dwaen dy dêr dêr't ek " +
+ "en er fan foar folle gau gean gjin goed guod ha har hat haw het hie hiene him hinne " +
+ "hja hjir hoe hoe't hy ik in is it jild jimme jit jo jou kaam ken kenne kin kinne " +
+ "koe komme komt lang man mar mat mear mei moast moat my myn mynhear neat nei net no " +
+ "oan oare oars oer of oft om op ris sa se sei sels sil sjen soe syn ta te toe troch " +
+ "tsjin waard want wat weard wer wier wirde woe wol wy wêr wêr't yn âlde ôf út ût"

);

