@msusol Yes, still WIP though. Ideally the code base would be cleaned up first, but I have no plans to do that.
My plan, as described in the issue, is just a quick hack:
add a new param to prepare, defaulting to None, with some logic to generate dummy topic names when it is None
store it in PreparedData
change the visualisation accordingly:
the RHS table title
the circle labels too, if that looks good
allow selecting a topic by topic name too, if not too difficult
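The first step of that plan could look roughly like the sketch below. It is only an illustration: the parameter name topic_names and the helper resolve_topic_names are hypothetical and not part of pyLDAvis, and the dummy names simply follow the "TopicN" pattern the visualisation currently expects.

```python
from typing import List, Optional, Sequence


def resolve_topic_names(n_topics: int,
                        topic_names: Optional[Sequence[str]] = None) -> List[str]:
    """Return one display name per topic (hypothetical helper, not pyLDAvis API)."""
    if topic_names is None:
        # Nothing supplied: fall back to dummy names "Topic1", "Topic2", ...
        return ["Topic%d" % i for i in range(1, n_topics + 1)]
    # Names supplied: keep them in the order given, coerced to strings
    return [str(name) for name in topic_names]
```

With the default of None, existing callers would see the same names as before, which is what makes this a low-risk hack.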
We have a whole family of issues that are just about the numbering of topics during visualisation:
/js/ldavis.v3.0.0.js) #266
They can all be resolved just by decoupling the numbering from the labels, which also removes the need for the sort_topics and start_index options in the python API.
Now I am not going into details on how to implement this or on the specification of outcomes, but here are some ideas:
Outline
python API side
We currently generate topic numbers at topic_top_term_df in _prepare.py. We use enumerate and start_index to generate the numbering; start_index is supplied by the user through the prepare method and smuggled through the _topic_info method.
pyLDAvis/pyLDAvis/_prepare.py, line 276 in 16800f3
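As a rough illustration of that coupling (a simplified sketch, not the actual code in _prepare.py; the function name number_topics is made up), the numbering boils down to enumerate with a start argument, and the label is derived directly from the number:

```python
def number_topics(topic_order, start_index=1):
    """Pair each topic, in display order, with its number and derived label."""
    # enumerate(..., start=start_index) is where the user-visible number comes from
    return [(original_id, number, "Topic%d" % number)
            for number, original_id in enumerate(topic_order, start=start_index)]


# Model topics 2, 0, 1 shown in that order and numbered from 1
print(number_topics([2, 0, 1]))
# Same topics numbered from 0: every label changes along with start_index
print(number_topics([2, 0, 1], start_index=0))
```

Because the label is computed from the number, any change to ordering or start_index changes what the user sees, which is the coupling this issue proposes to remove.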
Sorting is orthogonal to this logic, hence we can safely ignore it when changing such code:
pyLDAvis/pyLDAvis/_prepare.py, lines 413 to 416 in 16800f3
The number generated from enumerate will ultimately be used to name the topic, stored as Category:
pyLDAvis/pyLDAvis/_prepare.py, line 265 in 16800f3
I believe we should allow the user to supply a list of strings.
If we change this we need to change this too:
pyLDAvis/pyLDAvis/_prepare.py, lines 443 to 449 in 16800f3
and make sure none of them is named "Default", since we use it as the default:
pyLDAvis/pyLDAvis/_prepare.py, lines 237 to 242 in 16800f3
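A user-supplied list would need a few checks before it replaces the generated names. The sketch below is hypothetical: validate_topic_names and RESERVED_CATEGORY are illustrative names, not existing pyLDAvis code. The only fact taken from the code base is that "Default" is already used as a category; the uniqueness check is an extra assumption.

```python
RESERVED_CATEGORY = "Default"  # already used as the default category


def validate_topic_names(topic_names, n_topics):
    """Check a user-supplied list of topic names (illustrative only)."""
    names = list(topic_names)
    if len(names) != n_topics:
        raise ValueError("expected %d names, got %d" % (n_topics, len(names)))
    if not all(isinstance(name, str) for name in names):
        raise TypeError("topic names must be strings")
    if RESERVED_CATEGORY in names:
        raise ValueError('"%s" is reserved and cannot be a topic name'
                         % RESERVED_CATEGORY)
    if len(set(names)) != len(names):
        # Assumption: names double as category keys, so they must be unique
        raise ValueError("topic names must be unique")
    return names
```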
And that is for the topic_info data only; we have to do the same for mdsData and token_table too.
Clearly a better way is to side-step all of this: just supply the desired list of names and store it in the PreparedData namedtuple.
Solution: side step at JS visualisation side
Currently, our visualisation logic makes the hard assumption that Category must be in the form "TopicN", where N is a number:
pyLDAvis/pyLDAvis/js/ldavis.js, lines 697 to 701 in 16800f3
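To make the assumption concrete, here is a Python sketch of the kind of parsing a "TopicN" convention implies. It is not a translation of ldavis.js, and category_to_number is a hypothetical name. Any free-form label fails the pattern, which is why display labels are better carried separately from Category.

```python
import re

_TOPIC_PATTERN = re.compile(r"^Topic(\d+)$")


def category_to_number(category):
    """Recover N from a "TopicN" category, or None if it does not match."""
    match = _TOPIC_PATTERN.match(category)
    return int(match.group(1)) if match else None


print(category_to_number("Topic7"))       # 7
print(category_to_number("Sports news"))  # None
```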
Therefore, again, the path of lowest friction is to side-step it by changing only the visualisation logic:
pyLDAvis/pyLDAvis/js/ldavis.js, lines 982 to 987 in 16800f3
pyLDAvis/pyLDAvis/js/ldavis.js, lines 388 to 393 in 16800f3
In which 2 is optional. So only 3 changes in total!
Summary, changes needed
PreparedData
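One way to picture the PreparedData change is the sketch below. PreparedDataSketch is a simplified stand-in with only a few fields, not the real namedtuple, and topic_labels and label_for are hypothetical names for the proposed addition.

```python
from collections import namedtuple

# Simplified stand-in for PreparedData; the real namedtuple has more fields.
# topic_labels is the hypothetical new field proposed in this issue.
PreparedDataSketch = namedtuple(
    "PreparedDataSketch", ["topic_info", "topic_order", "topic_labels"])


def label_for(prepared, topic_number):
    """Look up the display label for a 1-based topic number."""
    return prepared.topic_labels[topic_number - 1]


prepared = PreparedDataSketch(
    topic_info=None,  # placeholder for this sketch
    topic_order=[2, 1, 3],
    topic_labels=["Sports", "Politics", "Weather"],
)
print(label_for(prepared, 2))  # Politics
```

Keeping the labels in their own field leaves Category and the numbering untouched, so only the code that renders text would need to read the new field.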