Commit

Merge branch 'develop'
ahyatt committed Sep 30, 2023
2 parents 40e854f + 21c5f8f commit 74d43f5
Showing 5 changed files with 60 additions and 14 deletions.
1 change: 1 addition & 0 deletions README.org
@@ -124,6 +124,7 @@ If you are using embedding and llm functionality, an example in which you use Op
:bind (([f11] . ekg-capture))
:init
(require 'ekg-embedding)
+(ekg-embedding-generate-on-save)
(require 'ekg-llm)
(require 'llm-openai) ;; The specific provider you are using must be loaded.
(let ((my-provider (make-llm-openai :key "my-openai-api-key")))
12 changes: 10 additions & 2 deletions doc/ekg.org
@@ -57,6 +57,9 @@ Clone the ekg library, from whatever branch you would like to use (=main= corres
(require 'ekg)
#+end_src
* Changelog
+** Version 0.4.1
+- Fix issues using default emacs in-buffer completion, and allowing completion in places we shouldn't.
+- Add =ekg-embedding-generate-on-save= and =ekg-embedding-disable-generate-on-save= to turn off generating embeddings for notes.
** Version 0.4
- Added ability to save in-progress notes.
- Added "magic tags", tags that cause elisp to be executed. See the [[#magic-tags][magic tags]] section for more detail.
@@ -412,10 +415,11 @@ Because inline commands exist, the complete text of a note should be retrieved w
* Extras
The ekg module can have any number of functionality additions. These may appear as other packages with other maintainers, but some are included as part of this package.
** Embeddings
-The embeddings functionality can be turned on by requiring the embeddings file, such as:
+The embeddings functionality can be turned on by requiring the embeddings file and enabling it, such as:

#+begin_src emacs-lisp
(require 'ekg-embedding)
+(ekg-embedding-generate-on-save)
#+end_src

This module contains functionality to explore similar notes and search using techniques associated with large language models. Embeddings let you do searches at a semantic level, based on an understood meaning that is separate from the words used. For example, if I have a note with a recipe for linguini, embeddings will let me see that it is similar to notes about spaghetti, and not similar to notes about cold fusion. Because the search is not based on words, but meaning derived from those words, notes that describe the same thing in two different languages should be very similar. In ekg these let you find notes similar to a current note, or in fact any buffer. You can also do a query via embeddings.
@@ -428,17 +432,21 @@ The embedding interfaces with your preferred LLM provider via the =llm= package.
(use-package ekg
:init
(require 'llm-openai) ;; the specific LLM provider must be required
+(require 'ekg-embedding)
+(ekg-embedding-generate-on-save)
(let ((my-provider (make-llm-openai :key "my-openai-api-key")))
(setq ekg-llm-provider my-provider
ekg-embedding-provider my-provider)))
#+end_src

The embedding provider should be kept the same as you continue using ekg, however if you do change it, you will need to call =ekg-embedding-generate-all= with a prefix argument (=C-u M-x ekg-embedding-generate-all=), which will regenerate all embeddings asynchronously. The embedding provider does not have to be the same as the LLM provider (if you also use the [[#llm][LLM]] add-on.) Also note that the provider will get the text of all your notes, so if that bothers you, do not use any provider on a server.

-Once you have this set up, and you have already called ~(require 'ekg-embedding)~ you can call =M-x ekg-embedding-generate-all=. This may take a long time as each
+Once you have this set up, and you have already called ~(require 'ekg-embedding)~ and ~(ekg-embedding-generate-on-save)~ you can call =M-x ekg-embedding-generate-all=. This may take a long time as each
embedding has to be generated separately with its own API call. Once you've done this, you can call, in =ekg-notes-mode=, =ekg-embedding-show-similar= to get a list of similar notes. You can also call =ekg-embedding-search= to perform a search over your notes using embeddings. In any buffer, you can call =ekg-embedding-show-similar-to-current-buffer= to see notes similar to whatever text is in the current buffer.

The variable =ekg-embedding-text-selector= has a value that is a function that will pre-process all text that is sent for embeddings. The default value is =ekg-embedding-text-selector-initial=, which will estimate the size of the tokens sent and limit the text to the first 8k tokens. Right now the function is tuned to the limits of Open AI's embedding framework, and a different function may be needed for other embedding APIs.
+
+If you would like to stop generating embeddings for notes in a session, you can call ~(ekg-embedding-disable-generate-on-save)~.
** Logseq
:PROPERTIES:
:CUSTOM_ID: logseq
21 changes: 19 additions & 2 deletions doc/ekg.texi
@@ -66,6 +66,7 @@ Installation
Changelog
+* Version 0.4.1: Version 041.
* Version 0.4: Version 04.
* Version 0.3.3: Version 033.
* Version 0.3.2: Version 032.
@@ -175,6 +176,7 @@ Clone the ekg library, from whatever branch you would like to use (@samp{main} c
@chapter Changelog

@menu
+* Version 0.4.1: Version 041.
* Version 0.4: Version 04.
* Version 0.3.3: Version 033.
* Version 0.3.2: Version 032.
@@ -184,6 +186,16 @@ Clone the ekg library, from whatever branch you would like to use (@samp{main} c
* Version 0.2: Version 02.
@end menu

+@node Version 041
+@section Version 0.4.1
+
+@itemize
+@item
+Fix issues using default emacs in-buffer completion, and allowing completion in places we shouldn't.
+@item
+Add @samp{ekg-embedding-generate-on-save} and @samp{ekg-embedding-disable-generate-on-save} to turn off generating embeddings for notes.
+@end itemize
+
@node Version 04
@section Version 0.4

@@ -728,10 +740,11 @@ The ekg module can have any number of functionality additions. These may appear
@node Embeddings
@section Embeddings

-The embeddings functionality can be turned on by requiring the embeddings file, such as:
+The embeddings functionality can be turned on by requiring the embeddings file and enabling it, such as:

@lisp
(require 'ekg-embedding)
+(ekg-embedding-generate-on-save)
@end lisp

This module contains functionality to explore similar notes and search using techniques associated with large language models. Embeddings let you do searches at a semantic level, based on an understood meaning that is separate from the words used. For example, if I have a note with a recipe for linguini, embeddings will let me see that it is similar to notes about spaghetti, and not similar to notes about cold fusion. Because the search is not based on words, but meaning derived from those words, notes that describe the same thing in two different languages should be very similar. In ekg these let you find notes similar to a current note, or in fact any buffer. You can also do a query via embeddings.
@@ -744,18 +757,22 @@ The embedding interfaces with your preferred LLM provider via the @samp{llm} pac
(use-package ekg
:init
(require 'llm-openai) ;; the specific LLM provider must be required
+(require 'ekg-embedding)
+(ekg-embedding-generate-on-save)
(let ((my-provider (make-llm-openai :key "my-openai-api-key")))
(setq ekg-llm-provider my-provider
ekg-embedding-provider my-provider)))
@end lisp

The embedding provider should be kept the same as you continue using ekg, however if you do change it, you will need to call @samp{ekg-embedding-generate-all} with a prefix argument (@samp{C-u M-x ekg-embedding-generate-all}), which will regenerate all embeddings asynchronously. The embedding provider does not have to be the same as the LLM provider (if you also use the @ref{LLM} add-on.) Also note that the provider will get the text of all your notes, so if that bothers you, do not use any provider on a server.

-Once you have this set up, and you have already called @code{(require 'ekg-embedding)} you can call @samp{M-x ekg-embedding-generate-all}. This may take a long time as each
+Once you have this set up, and you have already called @code{(require 'ekg-embedding)} and @code{(ekg-embedding-generate-on-save)} you can call @samp{M-x ekg-embedding-generate-all}. This may take a long time as each
embedding has to be generated separately with its own API call. Once you've done this, you can call, in @samp{ekg-notes-mode}, @samp{ekg-embedding-show-similar} to get a list of similar notes. You can also call @samp{ekg-embedding-search} to perform a search over your notes using embeddings. In any buffer, you can call @samp{ekg-embedding-show-similar-to-current-buffer} to see notes similar to whatever text is in the current buffer.

The variable @samp{ekg-embedding-text-selector} has a value that is a function that will pre-process all text that is sent for embeddings. The default value is @samp{ekg-embedding-text-selector-initial}, which will estimate the size of the tokens sent and limit the text to the first 8k tokens. Right now the function is tuned to the limits of Open AI's embedding framework, and a different function may be needed for other embedding APIs.

+If you would like to stop generating embeddings for notes in a session, you can call @code{(ekg-embedding-disable-generate-on-save)}.
+
@node Logseq
@section Logseq

21 changes: 17 additions & 4 deletions ekg-embedding.el
@@ -300,10 +300,23 @@ The results are in order of most similar to least similar."
ekg-notes-size)))
nil))

-(add-hook 'ekg-note-pre-save-hook #'ekg-embedding-generate-for-note-async)
-;; Generating embeddings from a note's tags has to be post-save, since it works
-;; by loading saved embeddings.
-(add-hook 'ekg-note-save-hook #'ekg-embedding-generate-for-note-tags-delayed)
+(defun ekg-embedding-generate-on-save ()
+"Enable embedding generation for new notes.
+If you have created notes without embeddings enabled, you should
+run `ekg-embedding-generate-all' to generate embeddings for all
+notes."
+(add-hook 'ekg-note-pre-save-hook #'ekg-embedding-generate-for-note-async)
+;; Generating embeddings from a note's tags has to be post-save, since it works
+;; by loading saved embeddings.
+(add-hook 'ekg-note-save-hook #'ekg-embedding-generate-for-note-tags-delayed))
+
+(defun ekg-embedding-disable-generate-on-save ()
+"Disable the embedding module for the Emacs session."
+(remove-hook 'ekg-note-pre-save-hook #'ekg-embedding-generate-for-note-async)
+(remove-hook 'ekg-note-save-hook #'ekg-embedding-generate-for-note-tags-delayed))
+
+;; Regardless of whether notes are generated on save, when notes are deleted we
+;; need to clean up the embeddings.
(add-hook 'ekg-note-delete-hook #'ekg-embedding-delete)

(provide 'ekg-embedding)
19 changes: 13 additions & 6 deletions ekg.el
@@ -1144,14 +1144,19 @@ The function is expected to behave as normal for a function in
"Completion function for all metadata at `completion-at-point-functions'.
If no completion function is found for the field type, don't
attempt the completion."
-(if-let (field (ekg--metadata-current-field))
+;; Only do something when we aren't in a read-only space.
+(when
+(or (null (ekg--metadata-current-field))
+;; + 2 for the colon and space
+(>= (current-column) (+ 2 (length (car (ekg--metadata-current-field))))))
+(if-let (field (ekg--metadata-current-field))
(when-let (completion-func (assoc (car field) ekg-capf-field-complete-funcs
-#'equal))
+#'equal))
(funcall (cdr completion-func)))
-;; There's no current field, but we're in the metadata, so let's complete
-;; the possible fields.
-(when (ekg--in-metadata-p)
-(ekg--field-name-complete))))
+;; There's no current field, but we're in the metadata, so let's complete
+;; the possible fields.
+(when (ekg--in-metadata-p)
+(ekg--field-name-complete)))))

(defun ekg--field-name-complete ()
"Completion function for metadata field names."
@@ -1183,6 +1188,8 @@ Argument FINISHED is non-nil if the user has chosen a completion."
(point)))
(start (save-excursion
(skip-chars-backward "^,\t\n:")
+;; We are at the right boundary, but now ignore whitespace.
+(skip-chars-forward "[ \t]")
(point))))
(list start end (completion-table-dynamic
(lambda (_) (ekg-tags)))
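Since this commit makes embedding generation opt-in through a pair of functions that add and remove hooks, the two can be wrapped in a single interactive toggle. The following is a minimal sketch and not part of the commit: `my/ekg-embedding-toggle-generate-on-save` is a hypothetical name, and the state check assumes that membership of `ekg-embedding-generate-for-note-async` in `ekg-note-pre-save-hook` (both taken from the ekg-embedding.el hunk) reliably reflects whether generation on save is enabled.

```emacs-lisp
;; Assumes ekg and ekg-embedding are installed and on the load path.
(require 'ekg-embedding)

(defun my/ekg-embedding-toggle-generate-on-save ()
  "Toggle embedding generation when notes are saved.
Hypothetical helper; it only calls the two functions added in this commit."
  (interactive)
  ;; Treat presence of the pre-save hook function as the \"enabled\" state.
  (if (and (boundp 'ekg-note-pre-save-hook)
           (memq #'ekg-embedding-generate-for-note-async
                 ekg-note-pre-save-hook))
      (progn
        (ekg-embedding-disable-generate-on-save)
        (message "ekg: embedding generation on save disabled"))
    (ekg-embedding-generate-on-save)
    (message "ekg: embedding generation on save enabled")))
```

Calling `M-x my/ekg-embedding-toggle-generate-on-save` would then switch between the two states for the current Emacs session. Notes saved while generation is disabled would still need `ekg-embedding-generate-all` later, as the docstring of `ekg-embedding-generate-on-save` points out.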
