[feature] decouple visualisation UI's topic numbering from their labels #267

Open
ed9w2in6 opened this issue Apr 24, 2024 · 2 comments
ed9w2in6 commented Apr 24, 2024

We have a whole family of issues that are only about the numbering of topics during visualisation.

They can all be resolved by decoupling the numbering from the labels, which also removes the need for the sort_topics and start_index options in the Python API.

I am not going into implementation details or a specification of outcomes here, but here are some ideas:

Outline

Python API side

We currently generate topic numbers in topic_top_term_df in _prepare.py. The numbering comes from enumerate with start_index, which the user supplies to the prepare method and which is then threaded through the _topic_info method.

topic_dfs = map(topic_top_term_df, enumerate(top_terms.T.iterrows(), start_index))

Sorting is orthogonal to this logic, so we can safely ignore it when changing this code:

if (sort_topics):
    topic_proportion = (topic_freq / topic_freq.sum()).sort_values(ascending=False)
else:
    topic_proportion = (topic_freq / topic_freq.sum())

The number generated from enumerate will ultimately be used to name the topic, stored as Category:

'Category': 'Topic%d' % new_topic_id,

I believe we should allow the user to supply a list of strings to use as topic names instead.
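A minimal sketch of how such a list could be resolved into labels, falling back to the current numbering when nothing is supplied. The function name resolve_topic_labels and the topic_names parameter are hypothetical and not part of the existing API; only start_index and the 'Topic%d' pattern come from the code quoted above.

```python
def resolve_topic_labels(n_topics, topic_names=None, start_index=1):
    """Return one display label per topic.

    `topic_names` is a hypothetical new parameter; when it is None the
    labels follow the existing 'Topic%d' numbering scheme.
    """
    if topic_names is None:
        # Same numbering that enumerate(..., start_index) produces today.
        return ["Topic%d" % i for i in range(start_index, start_index + n_topics)]

    labels = [str(name) for name in topic_names]
    if len(labels) != n_topics:
        raise ValueError(
            "expected %d topic names, got %d" % (n_topics, len(labels))
        )
    return labels
```

Keeping the fallback identical to the current output would mean existing callers see no change unless they pass names explicitly.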

If we change how Category is named, we also need to change sorted_terms, which rebuilds the name from the topic number:

class PreparedData(namedtuple('PreparedData', ['topic_coordinates', 'topic_info', 'token_table',
                                               'R', 'lambda_step', 'plot_opts', 'topic_order'])):
    def sorted_terms(self, topic=1, _lambda=1):
        """Returns a dataframe using _lambda to calculate term relevance of a given topic."""
        tdf = pd.DataFrame(self.topic_info[self.topic_info.Category == 'Topic' + str(topic)])
        if _lambda < 0 or _lambda > 1:

We would also have to make sure that none of the names is "Default", since we already use it as the default category:

default_term_info = pd.DataFrame({
    'saliency': saliency,
    'Term': vocab,
    'Freq': term_frequency,
    'Total': term_frequency,
    'Category': 'Default'})
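If user-supplied names were written into Category directly, a check along these lines would be needed. This is only a sketch: validate_topic_names is a hypothetical helper, and the duplicate check is an extra assumption (two topics sharing one Category value could no longer be told apart when filtering on it).

```python
RESERVED_CATEGORY = "Default"  # category already used for the overall term table


def validate_topic_names(topic_names):
    """Reject names that would collide in the Category column (hypothetical helper)."""
    names = [str(name) for name in topic_names]

    if RESERVED_CATEGORY in names:
        raise ValueError("topic name %r is reserved" % RESERVED_CATEGORY)

    # Assumption: Category values must be unique per topic, because rows are
    # selected by comparing Category against a single name.
    if len(set(names)) != len(names):
        raise ValueError("topic names must be unique")

    return names
```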

And that covers the topic_info data only; we would have to do the same for mdsData and token_table too.
Clearly a better way is to side-step all of this: accept the desired list of names and store it in the PreparedData namedtuple, leaving Category untouched.
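The side-step could look roughly like the sketch below: Category keeps its "TopicN" form and the names travel alongside it. PreparedDataSketch, the topic_names field and label_for are all hypothetical, the field list is trimmed for brevity, and topic numbers are assumed to be 1-based as in sorted_terms(topic=1).

```python
from collections import namedtuple


class PreparedDataSketch(namedtuple('PreparedDataSketch',
                                    ['topic_info', 'topic_order', 'topic_names'])):
    """Trimmed stand-in for PreparedData with an extra, hypothetical field."""

    def label_for(self, topic=1):
        """Return the display name for a 1-based topic number.

        Internal lookups can keep using 'Topic' + str(topic); only the
        text shown to the user comes from topic_names.
        """
        if not 1 <= topic <= len(self.topic_names):
            raise ValueError("unknown topic number: %r" % (topic,))
        return self.topic_names[topic - 1]


data = PreparedDataSketch(topic_info=None, topic_order=[2, 1],
                          topic_names=["Sports", "Politics"])
print(data.label_for(2))
```

Because the names are stored separately, topic_info, mdsData and token_table would not need any changes under this approach.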

Solution: side-step at the JS visualisation side

Currently, our visualisation logic makes a hard assumption that Category must be of the form "TopicN", where N is a number:

function reorder_bars(increase) {
    // grab the bar-chart data for this topic only:
    var dat2 = lamData.filter(function(d) {
        return d.Category == "Topic" + vis_state.topic;
    });

Therefore, again, the path of least friction is to side-step it by changing only the visualisation logic:

  1. RHS Table title
    .attr("y", -30)
    .attr("class", "bubble-tool") // set class so we can remove it when highlight_off is called
    .style("text-anchor", "middle")
    .style("font-size", "16px")
    .text("Top-" + R + " Most Relevant Terms for Topic " + topics + " (" + Freq + "% of tokens)");
  2. circle label
    .style("font-size", "11px")
    .style("fontWeight", 100)
    .text(function(d) {
        return d.topics;
    });

Change 2 is optional. So only 3 changes in total!


Summary: changes needed

  1. add a new parameter for topic names
  2. store it in PreparedData
  3. change the RHS Table title, and optionally the circle labels too
@msusol msusol self-assigned this Apr 24, 2024
msusol commented Apr 24, 2024

Are you creating a matching pull request?

@ed9w2in6
Contributor Author

@msusol Yes, though it is still WIP. Ideally we would clean up the code base, but I have no plans to do that.
My plan is just the quick hack mentioned above:

  1. add a new param to prepare, defaulting to None, with some logic to generate dummy topic names if it is None.
  2. store it in PreparedData
  3. change the visualisation accordingly:
    • RHS Table title
    • the circle labels too, if that looks good.
    • allow selecting a topic by topic name too, if not too difficult
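The visualisation-side pieces of that plan would live in the JS, but the logic can be sketched in Python for consistency with the API-side code. Both helpers below are hypothetical: build_topic_lookup shows one way selection by name could map back to the existing 1-based topic number, and table_title mirrors the RHS Table title string quoted earlier with the label substituted for "Topic N".

```python
def build_topic_lookup(topic_names):
    """Map each display name to its 1-based topic number (hypothetical helper)."""
    return {name: number for number, name in enumerate(topic_names, start=1)}


def table_title(R, label, freq_percent):
    """Compose the RHS table title using a label instead of 'Topic N'."""
    return "Top-%d Most Relevant Terms for %s (%s%% of tokens)" % (
        R, label, freq_percent)


lookup = build_topic_lookup(["Sports", "Politics"])
print(lookup["Politics"])
print(table_title(30, "Sports", 12.5))
```

Resolving a name to its number first would let the existing "Topic" + number filters stay as they are.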
