MLE-1053 rebase asapp fixes #4

Open
wants to merge 29 commits into
base: ASAPP-fixes

jweese-asapp

Attempt 2. Strictly a rebase of fixes onto FB master branch.

Celebio and others added 29 commits September 13, 2018 02:00
Summary: In the class Dictionary of fastText, merge the two methods named `computeSubwords`, by making the third argument an optional pointer.

Reviewed By: EdouardGrave

Differential Revision: D9766500

fbshipit-source-id: ab12c432b371cf5b924660e12e79a5d7cea708e2
Summary:
Hi everyone, and thanks for this wonderful library. I'm relatively new to it, and I found myself struggling a bit when trying to obtain reproducible results, e.g. in order to find the best parameters.
I found the perfect answer in a 2016 issue here on your repo (facebookresearch#116) and I thought it could be useful to add it to the FAQs.

I'm sending you two PRs:
- this one, in which I added the FAQ
- a second one, in which I modified the description in src/args.cc for the "thread" param

Of course feel free to choose which one to keep (or even to trash both of them).

Thanks!
Leonardo
Pull Request resolved: facebookresearch#633

Differential Revision: D9814563

Pulled By: EdouardGrave

fbshipit-source-id: 83e4b7a7163b9013aef144dedd9b4bd5945bafdf
Summary:
The first link at https://fasttext.cc/docs/en/pretrained-vectors.html doesn't work. This fixes it.
Pull Request resolved: facebookresearch#590

Reviewed By: piotr-bojanowski

Differential Revision: D9489391

Pulled By: EdouardGrave

fbshipit-source-id: f1e1f0fe6a52d3d12d7a3dbf608848d68daa6c3f
Summary: Conforming to Facebook C++ style https://our.intern.facebook.com/intern/wiki/CppStyle

Reviewed By: piotr-bojanowski

Differential Revision: D10126506

fbshipit-source-id: 8389b652697addf7176d5d8defddbcb22dab3526
Summary:
This diff adds a new command to fasttext that displays the precision/recall score for each individual label: `print-label-scores`
It gets the predicted labels above a given threshold and computes the scores.

For example, the question "vinegar softens the bite of raw onions ?" has two labels: "vinegar" and "onions". fastText is asked to predict the labels above a given threshold. If two such labels are predicted, "pickling" and "onions", we obtain:
"onions" will have a precision of 100%,
"pickling" a precision of 0%,
"onions" will have a recall of 100%,
"vinegar" will have a recall of 0%.

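The arithmetic of this example can be checked with a short sketch. It is plain Python rather than fastText's own code; the function name `per_label_scores` is hypothetical, and thresholding is assumed to have happened already, so `predicted` holds only the labels above the threshold.

```python
def per_label_scores(gold, predicted):
    """Return {label: (precision, recall)} for a single example.

    precision is None when the label was never predicted, and recall is
    None when the label is not a gold label (both ratios are undefined).
    """
    gold, predicted = set(gold), set(predicted)
    scores = {}
    for label in sorted(gold | predicted):
        hit = label in gold and label in predicted
        precision = (1.0 if hit else 0.0) if label in predicted else None
        recall = (1.0 if hit else 0.0) if label in gold else None
        scores[label] = (precision, recall)
    return scores


scores = per_label_scores(gold=["vinegar", "onions"],
                          predicted=["pickling", "onions"])
for label, (precision, recall) in scores.items():
    print(label, precision, recall)
```

With a single example each ratio is either 0 or 1, which is why the commit message reports only 0% and 100%.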
Reviewed By: EdouardGrave

Differential Revision: D9991570

fbshipit-source-id: 63cff90f57659d51f5aa1f10243d40e253445aa6
Summary: The Python binding for the `predict` function was broken by the previous diff. The issue was reported here: facebookresearch#670

Reviewed By: EdouardGrave

Differential Revision: D10868209

fbshipit-source-id: 77a2e38a74356973eedb28aa5fa348acd39c0aef
Summary: Pull Request resolved: facebookresearch#610

Reviewed By: EdouardGrave

Differential Revision: D12900420

Pulled By: Celebio

fbshipit-source-id: 7001549031dbdc904436ae2d2432470e0a5669ff
…tion issues on some platforms

Summary: the issue was reported here : facebookresearch#666

Reviewed By: EdouardGrave

Differential Revision: D12900614

fbshipit-source-id: 04303eb1442b0ab7956c5a4b56d9d57eeb004961
Summary:
Recently, a diff from Celebio added a new feature, "test-label", that calculates precision/recall/f1-score for every label. This is a very useful feature; however, it makes the FastText class overcomplicated.
I propose a refactoring of the model testing and metrics calculation code. It introduces a MetricsAccumulator class, which is responsible for collecting stats on a dataset and calculating the final metrics. Moving this functionality to a separate class simplifies the model testing code in the FastText class. The same FastText::test method can be used to compute both regular and per-label metrics. This design allows MetricsAccumulator to be extended to implement different types of metrics. As a result, it would be much easier to add other kinds of metrics in the future.
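As a rough illustration of the accumulator idea, here is a minimal Python sketch. It is not the C++ class from the pull request: the class name `MetricsAccumulatorSketch` and the methods `log`, `precision` and `recall` are hypothetical, chosen only to show how collecting counts can be separated from deriving metrics.

```python
from collections import Counter


class MetricsAccumulatorSketch:
    """Collects per-label counts over a dataset, then derives metrics."""

    def __init__(self):
        self.gold = Counter()            # times each label was a true label
        self.predicted = Counter()       # times each label was predicted
        self.predicted_gold = Counter()  # times a predicted label was correct

    def log(self, gold_labels, predicted_labels):
        """Record the gold and predicted labels of one example."""
        gold_labels, predicted_labels = set(gold_labels), set(predicted_labels)
        self.gold.update(gold_labels)
        self.predicted.update(predicted_labels)
        self.predicted_gold.update(gold_labels & predicted_labels)

    def precision(self, label=None):
        """Precision for one label, or over all labels when label is None."""
        return self._ratio(self.predicted_gold, self.predicted, label)

    def recall(self, label=None):
        """Recall for one label, or over all labels when label is None."""
        return self._ratio(self.predicted_gold, self.gold, label)

    @staticmethod
    def _ratio(numerator, denominator, label):
        if label is None:
            num, den = sum(numerator.values()), sum(denominator.values())
        else:
            num, den = numerator[label], denominator[label]
        return num / den if den else float("nan")


meter = MetricsAccumulatorSketch()
meter.log(["vinegar", "onions"], ["pickling", "onions"])
meter.log(["onions"], ["onions"])
print(meter.precision(), meter.recall("vinegar"))
```

Because the same counters serve both the global and the per-label queries, a test loop only needs to call `log` once per example, which is the simplification the proposal is after.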
Pull Request resolved: facebookresearch#672

Reviewed By: EdouardGrave

Differential Revision: D12901046

Pulled By: Celebio

fbshipit-source-id: 9dcf10de950e7fb9179c4400570d2fd7b9b1879c
…n in fasttext

Summary:
This diff is following up the pull-request diff `Refactor model testing and metrics code`:
- Merging classes LabelMetricsAccumulator and MetricsAccumulator into one : Meter
- putting back removed function signatures in fasttext.h and marking them as deprecated
- removal of f1 score from results (that will be added again later)
- simplifying main.cc thanks to the new api

Reviewed By: EdouardGrave

Differential Revision: D12903111

fbshipit-source-id: eb4116b207aad1713754c136e2a064e9517fdb57
Summary: The fasttext binary now has the `test-label` command, which displays the score for each individual label. This diff adds the corresponding Python binding.

Reviewed By: EdouardGrave

Differential Revision: D11589785

fbshipit-source-id: 809dd4a57750f05b68d6e576a58596b13fdc5d31
Summary: The coverage option allows compiling in coverage mode in order to get execution metrics.

Reviewed By: EdouardGrave

Differential Revision: D11659859

fbshipit-source-id: 0d831571e00fadf2002d6b074a89ff76fa7dcfe1
Summary: The "compute precision/recall for each label" commit removed the function multilinePredict from the Python bindings, which caused performance issues. This diff puts the function back, adapting it to the new API of the `predict` function in fasttext.h. The issue was reported here: facebookresearch#673

Reviewed By: EdouardGrave

Differential Revision: D12900565

fbshipit-source-id: 880cc428810e755021958e6427a5e6c4f2b43e79
Summary:
Currently circleci tests fail for two reasons:
- when the vm tries to install a specific version of pybind for testing fasttext version on pypi
- when the vm has a compiler/stl version that needs explicit includes for stdexcept

Reviewed By: EdouardGrave

Differential Revision: D12956910

fbshipit-source-id: 8272415b41d54d880a37777a81a316741a5b920f
Summary:
Please be aware that this pull request was automatically created using [gtf](https://github.com/schneiderl/gtf) - a typo fixing bot.

You should be able to merge this with no other problems.
In case the proposed changes do not make sense I would be glad to hear about it.
Pull Request resolved: facebookresearch#662

Reviewed By: piotr-bojanowski

Differential Revision: D12959092

Pulled By: Celebio

fbshipit-source-id: dcab01ffb1bad30e17f1ce9cad27d801edf66c99
Summary:
The issue was reported here : facebookresearch#678
gcc 4.8.5 does not seem to support `auto` as a lambda parameter type.

Reviewed By: piotr-bojanowski

Differential Revision: D13136421

fbshipit-source-id: e1770c80f78f1b6578b8750059fe8c9220265f24
Summary: Suggested by user willianpaixao : facebookresearch#674

Reviewed By: piotr-bojanowski

Differential Revision: D13136722

fbshipit-source-id: 4ea07342ed659d312280fd6d9087376e9c9a82d0
Summary:
This diff removes the print capabilities from fasttext and defines a new api.
- `predictLine` extracts predictions from exactly one line of the input stream.
- the deprecated `printLabelStats` is removed as [js bindings don't use it]( https://www.facebook.com/groups/1174547215919768/?multi_permalinks=2328051983902613&comment_id=2360179150689896 )
- `ngramVectors` is now deprecated by the addition of `getNgramVectors`. The `Vector` class remains copy-free, but move semantics have been added.
- `analogies` is now deprecated by `getAnalogies`. When called, the fastText class lazily precomputes word vectors.
- `findNN` is now deprecated by `getNN`. When called, the fastText class lazily precomputes word vectors.
- `trainThread` and `printInfo` functions are now private.
- `supervised`, `cbow`, `skipgram`, `selectEmbeddings`, `precomputeWordVectors` are now deprecated and will be private in the future.
- `saveVectors`, `saveOutput` and `saveModel` without arguments are now deprecated by their equivalent with filename as string argument.
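
The lazy precomputation mentioned for `getNN` and `getAnalogies` could look roughly like the sketch below. It is written in Python rather than fastText's C++, and the class `LazyNeighbors` and its method `get_nn` are hypothetical; the only point is that the normalized word vectors are built on the first query and reused afterwards.

```python
import math


class LazyNeighbors:
    """Toy nearest-neighbour index that normalizes vectors on first use."""

    def __init__(self, vectors):
        self._vectors = vectors   # {word: [float, ...]}, raw embeddings
        self._normalized = None   # filled in lazily by _precompute()

    def _precompute(self):
        if self._normalized is None:  # only the first call pays the cost
            self._normalized = {}
            for word, vec in self._vectors.items():
                norm = math.sqrt(sum(x * x for x in vec)) or 1.0
                self._normalized[word] = [x / norm for x in vec]
        return self._normalized

    def get_nn(self, word, k=10):
        """Return up to k (cosine similarity, word) pairs, best first."""
        table = self._precompute()
        query = table[word]
        scored = [
            (sum(a * b for a, b in zip(query, vec)), other)
            for other, vec in table.items()
            if other != word
        ]
        return sorted(scored, reverse=True)[:k]


index = LazyNeighbors({
    "king": [1.0, 0.0],
    "queen": [0.9, 0.1],
    "apple": [0.0, 1.0],
})
print(index.get_nn("king", k=1))
```

Callers never trigger the precomputation explicitly, which is consistent with `precomputeWordVectors` being deprecated and slated to become private.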

Reviewed By: EdouardGrave

Differential Revision: D13083799

fbshipit-source-id: f557ed7c141a90a6171045fe118ac16c195c824f
Summary: In some environments, `python setup.py install` fails to install pybind11. The solution is given by `pybind11`'s repository and consists of calling `pip install pybind11` via subprocess. The issue and the solution were reported here: facebookresearch#512
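
A minimal sketch of that kind of fallback is shown below. It is an assumption about the general approach, not the code added to fastText's `setup.py`: the helper names `pip_install_command` and `ensure_importable` are hypothetical, and the `run` parameter exists only so the subprocess call can be swapped out.

```python
import importlib
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_importable(module, package=None, run=subprocess.check_call):
    """Import `module`, installing `package` with pip first if it is missing."""
    try:
        return importlib.import_module(module)
    except ImportError:
        run(pip_install_command(package or module))
        importlib.invalidate_caches()  # let the import system see new files
        return importlib.import_module(module)
```

Calling `ensure_importable("pybind11")` before the build step would then install the package on demand. Using `sys.executable -m pip` rather than a bare `pip` keeps the install tied to the interpreter that is running the setup script.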

Reviewed By: EdouardGrave

Differential Revision: D13167381

fbshipit-source-id: 4ee7835a07e503d00728857242e085bc7de53c14
Summary: Argument names were missing from the fasttext.h file making it harder to read as an api. This commit adds their names. `loadVectors` function's argument is now a `const std::string &` instead of `std::string`.

Reviewed By: EdouardGrave

Differential Revision: D13180989

fbshipit-source-id: 81b63763047514ff13b60eb0cf7992601d33f188
Summary: The buffer vector should be normalized when added to the query vector.
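
To illustrate what this normalization means, the sketch below assumes the query is built by adding and subtracting several word vectors, as in an analogy query. The helper `add_normalized` is hypothetical and is not fastText's code.

```python
import math


def add_normalized(query, buffer, sign=1.0):
    """Add `sign * buffer / ||buffer||` to `query` in place and return it."""
    norm = math.sqrt(sum(x * x for x in buffer))
    if norm > 0.0:  # a zero vector contributes nothing
        for i, x in enumerate(buffer):
            query[i] += sign * x / norm
    return query


query = [0.0, 0.0]
add_normalized(query, [3.0, 4.0])          # contributes [0.6, 0.8]
add_normalized(query, [0.0, 10.0], -1.0)   # contributes [0.0, -1.0]
print(query)
```

Without the division by the norm, the second vector would dominate the query simply because it is longer, regardless of its direction.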

Reviewed By: EdouardGrave

Differential Revision: D13192638

fbshipit-source-id: faa46d339e7cc0d149ccc5826fa7197ccfd81635
Summary: The new option for the `loss` parameter computes the loss as the sum of the cross-entropies of each independent output unit.
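
As a worked illustration of such a loss, the sketch below treats every output unit as an independent binary classifier with a sigmoid activation and sums the per-unit binary cross-entropies. This is an assumption about the usual one-vs-all formulation, not fastText's implementation; `softplus` and `one_vs_all_loss` are hypothetical names.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def one_vs_all_loss(scores, target_labels):
    """Sum of binary cross-entropies, one per output unit.

    scores: raw (pre-sigmoid) score of each output unit.
    target_labels: indices of the units whose true label is 1.
    """
    targets = set(target_labels)
    loss = 0.0
    for index, score in enumerate(scores):
        # -log(sigmoid(s)) == softplus(-s); -log(1 - sigmoid(s)) == softplus(s)
        loss += softplus(-score) if index in targets else softplus(score)
    return loss


print(one_vs_all_loss([0.0, 0.0, 0.0], target_labels=[1]))
```

Unlike a softmax loss, the units do not compete with each other, so several labels can receive a high probability at the same time.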

Reviewed By: EdouardGrave

Differential Revision: D10853638

fbshipit-source-id: dc4c56e25c89c9da1a33bda1b29db781080794fd
Summary: The one-vs-all loss option is now available in Python.

Reviewed By: piotr-bojanowski

Differential Revision: D13232380

fbshipit-source-id: 08c7f500fd7206132d0905f33b79e2f3dc745db2
Summary: Pull Request resolved: facebookresearch#659

Reviewed By: piotr-bojanowski

Differential Revision: D13318793

Pulled By: Celebio

fbshipit-source-id: 3b8fd28172b0291b2df93a526de630b01df680e8
Summary: Re-licensing fastText to MIT

Reviewed By: piotr-bojanowski

Differential Revision: D13415080

fbshipit-source-id: 6708849531fe7559cde273a3024660bc8b3b3750
Summary:
Hi

It looks like in some cases the `language` is not defined, which results in 404 links.

This fix defaults the language to `en`
Pull Request resolved: facebookresearch#581

Reviewed By: piotr-bojanowski

Differential Revision: D13397407

Pulled By: Celebio

fbshipit-source-id: 5604039e9a4104ecadfbd8978ffe5a15317e5c56
@jweese-asapp jweese-asapp requested a review from a user January 16, 2019 16:24

8 participants