Releases: bab2min/tomotopy

0.13.0

07 Aug 16:43
  • New features
    • The major features of the Topic Model Viewer, tomotopy.viewer.open_viewer(), are now ready.
    • Added tomotopy.LDAModel.get_hash(), which returns a 128-bit hash value of the model.
    • Added an ngram_list argument to tomotopy.utils.SimpleTokenizer.
  • Bug fixes
    • Fixed a bug that left spans inconsistent after Corpus.concat_ngrams was called.
    • Optimized the bottleneck in tomotopy.LDAModel.load() and tomotopy.LDAModel.save(), making them more than 10 times faster.

0.12.7

18 Dec 15:24
  • New features
    • Added Topic Model Viewer tomotopy.viewer.open_viewer()
    • Optimized the performance of tomotopy.utils.Corpus.process()
  • Bug fixes
    • Document.span now returns ranges in character units, not byte units.
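
The difference between character and byte units only shows up for non-ASCII text. The sketch below does not call tomotopy at all; it uses plain Python strings, and assumes UTF-8 encoding, to show how the two kinds of span diverge for the same token. The sample text and variable names are made up for illustration.

```python
text = "café latte"
token = "latte"

# Character-based span: offsets into the Python str.
char_start = text.index(token)
char_span = (char_start, char_start + len(token))

# Byte-based span: offsets into the UTF-8 encoding of the same text.
raw = text.encode("utf-8")
raw_token = token.encode("utf-8")
byte_start = raw.index(raw_token)
byte_span = (byte_start, byte_start + len(raw_token))

print(char_span)  # (5, 10)
print(byte_span)  # (6, 11), shifted by one because "é" takes two bytes

# Only the character-based span slices the original string correctly.
print(text[char_span[0]:char_span[1]])  # latte
```

With character-based ranges, a span can be used directly to slice the original string, which is presumably why the change was made.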

0.12.6

11 Dec 14:11
  • New features
    • Added some convenience features to tomotopy.LDAModel.train and tomotopy.LDAModel.set_word_prior.
    • LDAModel.train now has new arguments callback, callback_interval and show_progress to monitor the training progress.
    • LDAModel.set_word_prior can now accept a Dict[int, float] as its prior argument.

0.12.5

02 Aug 16:10
6ef712c
  • New features
    • Added support for Linux ARM64 architecture.

0.12.4

22 Jan 16:09
5982907
  • New features
    • Added support for macOS ARM64 architecture.
    • Added support for Python 3.11.
  • Bug fixes
    • Fixed an issue where tomotopy.Document.get_sub_topic_dist() raised a bad argument exception.
    • Fixed an issue where raising an exception sometimes caused a crash.

0.12.3

20 Jul 15:22

New features

  • Now, inserting an empty document using tomotopy.LDAModel.add_doc() just ignores it instead of raising an exception. If the newly added argument ignore_empty_words is set to False, an exception is raised as before. (#161)
  • tomotopy.HDPModel.purge_dead_topics() method is added to remove non-live topics from the model. (#152)

Bug fixes

  • Fixed an issue that prevented setting user-defined values for nuSq in tomotopy.SLDAModel (by @jucendrero). (#174)
  • Fixed an issue where tomotopy.utils.Coherence did not work for tomotopy.DTModel. (#164)
  • Fixed a frequent crash when calling make_doc() before calling train(). (#166)
  • Resolved the problem that the results of tomotopy.DMRModel and tomotopy.GDMRModel varied even when the seed was fixed. (#63)
  • The parameter optimization process of tomotopy.DMRModel and tomotopy.GDMRModel has been improved.
  • Fixed an occasional crash when calling tomotopy.PTModel.copy().

0.12.2

06 Sep 11:20
20a44b0
  • An issue where calling convert_to_lda of tomotopy.HDPModel with min_cf > 0, min_df > 0 or rm_top > 0 causes a crash has been fixed.
  • A new argument from_pseudo_doc has been added to tomotopy.Document.get_topics and tomotopy.Document.get_topic_dist.
    This argument is only valid for documents of PTModel; it controls the source used to compute the topic distribution.
  • The default value for argument p of tomotopy.PTModel has been changed. The new default value is k * 10.
  • Using documents generated by make_doc without calling infer no longer causes a crash; it just prints warning messages.
  • An issue where the internal C++ code failed to compile in a Clang C++17 environment has been fixed.

0.12.1

20 Jun 16:21
6f4cf0e
  • An issue where tomotopy.LDAModel.set_word_prior() causes a crash has been fixed.
  • Now tomotopy.LDAModel.perplexity and tomotopy.LDAModel.ll_per_word return accurate values when TermWeight is not ONE.
  • tomotopy.LDAModel.used_vocab_weighted_freq was added, which returns term-weighted frequencies of words.
  • Now tomotopy.LDAModel.summary() shows not only the entropy of words, but also the entropy of term-weighted words.

0.12.0

29 Apr 16:48
  • Now tomotopy.DMRModel and tomotopy.GDMRModel support multiple values of metadata (see https://github.com/bab2min/tomotopy/blob/main/examples/dmr_multi_label.py )
  • The performance of tomotopy.GDMRModel was improved.
  • A copy() method has been added for all topic models to do a deep copy.
  • An issue was fixed where words that are excluded from training (by min_cf, min_df) had incorrect topic ids. Now all excluded words have -1 as their topic id.
  • Now all exceptions and warnings generated by tomotopy follow standard Python types.
  • Compiler requirements have been raised to C++14.

0.11.1

27 Mar 15:56
  • A critical bug in asymmetric alphas was fixed. Due to this bug, version 0.11.0 has been removed from releases.