Commit
Merge pull request #115 from lenisha/master
Skills and samples updates
Showing
45 changed files
with
2,490 additions
and
295 deletions.
71 changes: 71 additions & 0 deletions
71
01 - Search Index Creation/01.1 - BuiltIn Skills/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
# Adding a Built-In Skill to the Skillset | ||
|
||
Add the Sentiment Analysis skill to the skillset and verify that sentiment values are generated and stored in the index. | ||
|
||
Use https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-sentiment-v3 as a reference for the skill's inputs and outputs. | ||
|
||
|
||
- Add the field `sentiment` to the index | ||
```json | ||
{ | ||
"name": "sentiment", | ||
"type": "Edm.String", | ||
"searchable": true, | ||
"sortable": true, | ||
"filterable": true, | ||
"facetable": true | ||
} | ||
``` | ||
|
||
- Add `#Microsoft.Skills.Text.V3.SentimentSkill` to the skillset | ||
```json | ||
{ | ||
"@odata.type": "#Microsoft.Skills.Text.V3.SentimentSkill", | ||
"name": "sentiment", | ||
"description": "", | ||
"context": "/document", | ||
"defaultLanguageCode": "en", | ||
"modelVersion": "", | ||
"includeOpinionMining": true, | ||
"inputs": [ | ||
{ | ||
"name": "text", | ||
"source": "/document/merged_text" | ||
} | ||
], | ||
"outputs": [ | ||
{ | ||
"name": "sentiment", | ||
"targetName": "sentiment" | ||
}, | ||
{ | ||
"name": "confidenceScores", | ||
"targetName": "confidenceScores" | ||
}, | ||
{ | ||
"name": "sentences", | ||
"targetName": "sentences" | ||
} | ||
] | ||
} | ||
``` | ||
|
||
- Update the indexer to add an output field mapping from the skill output to the index field | ||
|
||
```json | ||
{ | ||
"sourceFieldName": "/document/sentiment", | ||
"targetFieldName": "sentiment" | ||
} | ||
``` | ||
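The relationship between the skill outputs and the indexer mapping can be checked with a few lines of Python. This is a minimal sketch, not part of the workshop code: the helper names `enrichment_paths` and `unmapped_sources` are hypothetical, and it assumes each skill output is written to `<context>/<targetName>` in the enrichment tree. It only does an exact path match, so mappings that point at sub-paths would need extra handling.

```python
from __future__ import annotations


def enrichment_paths(skill: dict) -> set[str]:
    """Return the enrichment-tree paths a skill writes, as <context>/<targetName>."""
    context = skill.get("context", "/document").rstrip("/")
    return {f"{context}/{out.get('targetName', out['name'])}" for out in skill["outputs"]}


def unmapped_sources(skills: list[dict], mappings: list[dict]) -> list[str]:
    """List mapping sources that none of the given skills produce (hypothetical helper)."""
    produced: set[str] = set()
    for skill in skills:
        produced |= enrichment_paths(skill)
    return [m["sourceFieldName"] for m in mappings if m["sourceFieldName"] not in produced]


# Trimmed copy of the sentiment skill above: only the parts the check needs.
sentiment_skill = {
    "context": "/document",
    "outputs": [
        {"name": "sentiment", "targetName": "sentiment"},
        {"name": "confidenceScores", "targetName": "confidenceScores"},
        {"name": "sentences", "targetName": "sentences"},
    ],
}
output_field_mappings = [
    {"sourceFieldName": "/document/sentiment", "targetFieldName": "sentiment"}
]

# An empty list means every mapping source is produced by a skill.
missing = unmapped_sources([sentiment_skill], output_field_mappings)
print(missing)
```

If the skill's `context` or `targetName` changes, the mapping's `sourceFieldName` has to change with it; a check like this makes that dependency visible before the indexer runs.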
|
||
Refer to the Postman collection for more details. | ||
|
||
|
||
# Verify Index data | ||
|
||
- Search for all documents that contain the word `GitHub`, sorted by sentiment | ||
|
||
- Search all documents and show the sentiment and locations facets | ||
|
||
- Search for documents that have a location in Europe |
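The three verification queries could be expressed as request bodies for the Search Documents REST API (`POST /indexes/<index>/docs/search`). The following is a sketch under stated assumptions rather than the workshop's own queries: `search_body` is a hypothetical helper, and the `locations` field is assumed to be a filterable and facetable `Collection(Edm.String)` field in the index. Because `sentiment` is an `Edm.String` field, ordering by it sorts the labels alphabetically.

```python
import json


def search_body(search="*", **options):
    """Build a request body for POST /indexes/<index>/docs/search (hypothetical helper)."""
    body = {"search": search, "count": True}
    body.update(options)
    return body


# 1. Documents mentioning "GitHub", ordered by the sentiment label.
by_sentiment = search_body("GitHub", orderby="sentiment asc")

# 2. All documents, with facet counts for sentiment and locations.
with_facets = search_body("*", facets=["sentiment", "locations"])

# 3. Documents whose locations collection contains "Europe"
#    (assumes `locations` is a filterable collection of strings).
in_europe = search_body("*", filter="locations/any(loc: loc eq 'Europe')")

for body in (by_sentiment, with_facets, in_europe):
    print(json.dumps(body))
```

Each printed body can be pasted into a Postman request against the index's `docs/search` endpoint, with the service's API key in the `api-key` header.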
1,042 changes: 752 additions & 290 deletions
1,042
01 - Search Index Creation/Cognitive Search Pipeline APIs.postman_collection.json
Large diffs are not rendered by default.
103 changes: 103 additions & 0 deletions
103
03 - Data Science and Custom Skills/FormRecognizer Skill/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
|
||
# Form Recognizer Custom Skill | ||
|
||
Follow the MS Learn module [Build a Form Recognizer custom skill for Azure Cognitive Search](https://learn.microsoft.com/en-us/training/modules/build-form-recognizer-custom-skill-for-azure-cognitive-search/4-exercise-build-deploy) | ||
to create a Form Recognizer service and deploy the Azure Function using Cloud Shell. | ||
|
||
Integrate the Form Recognizer prebuilt invoice model into the Cognitive Search pipeline. | ||
|
||
# AnalyzeInvoice | ||
|
||
This custom skill extracts invoice-specific fields using a pre-trained Form Recognizer model. | ||
|
||
|
||
## Settings | ||
|
||
This Azure Function requires access to an [Azure Form Recognizer](https://azure.microsoft.com/en-us/services/cognitive-services/form-recognizer/) resource. The [prebuilt invoice model](https://docs.microsoft.com/azure/cognitive-services/form-recognizer/concept-invoices) is available in the 2.1 preview API. | ||
|
||
|
||
This function requires two settings: `FORMS_RECOGNIZER_ENDPOINT`, set to your custom Form Recognizer 2.1-preview endpoint, and `FORMS_RECOGNIZER_KEY`, set to a valid Azure Form Recognizer API key. | ||
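Azure Functions exposes application settings to Python code as environment variables, so the two settings can be read and validated at startup. This is a minimal sketch, not the function's actual code: `load_form_recognizer_settings` is a hypothetical helper, and the optional `env` parameter exists only so the check can be exercised without touching the real environment.

```python
import os

REQUIRED_SETTINGS = ("FORMS_RECOGNIZER_ENDPOINT", "FORMS_RECOGNIZER_KEY")


def load_form_recognizer_settings(env=os.environ):
    """Return (endpoint, key), failing early if either setting is missing or empty."""
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing application settings: " + ", ".join(missing))
    # Strip a trailing slash so later URL joins do not produce "//".
    endpoint = env["FORMS_RECOGNIZER_ENDPOINT"].rstrip("/")
    return endpoint, env["FORMS_RECOGNIZER_KEY"]


# Example with placeholder values (not a real endpoint or key).
endpoint, key = load_form_recognizer_settings({
    "FORMS_RECOGNIZER_ENDPOINT": "https://example.cognitiveservices.azure.com/",
    "FORMS_RECOGNIZER_KEY": "placeholder-key",
})
print(endpoint)
```

Failing at startup with the names of the missing settings tends to be easier to diagnose than an authentication error surfacing later inside an indexer run.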
|
||
|
||
|
||
## Sample Input: | ||
|
||
This sample data points to a sample file hosted on GitHub. When the skill is integrated in a skillset, the URL and SAS token are provided by Cognitive Search. | ||
|
||
```json | ||
{ | ||
"values": [ | ||
{ | ||
"recordId": "record1", | ||
"data": { | ||
"formUrl": "https://github.com/Azure-Samples/azure-search-power-skills/raw/master/SampleData/Invoice_4.pdf", | ||
"formSasToken": "?st=sasTokenThatWillBeGeneratedByCognitiveSearch" | ||
} | ||
} | ||
] | ||
} | ||
``` | ||
|
||
## Sample Output: | ||
|
||
```json | ||
{ | ||
"values": [ | ||
{ | ||
"recordId": "0", | ||
"data": { | ||
"invoices": [ | ||
{ | ||
"AmountDue": 63.0, | ||
"BillingAddress": "345 North St NY 98052", | ||
"BillingAddressRecipient": "Fabrikam, Inc.", | ||
"DueDate": "2018-05-31", | ||
"InvoiceDate": "2018-05-15", | ||
"InvoiceId": "1785443", | ||
"InvoiceTotal": 56.28, | ||
"VendorAddress": "4567 Main St Buffalo NY 90852", | ||
"SubTotal": 49.3, | ||
"TotalTax": 0.99 | ||
} | ||
] | ||
} | ||
} | ||
] | ||
} | ||
``` | ||
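The input and output samples follow the custom Web API skill envelope: a `values` array in which every output record carries the `recordId` of the input record it answers. The handling of that envelope can be sketched as follows. This is an illustration under assumptions, not the deployed function: `process_request` is a hypothetical helper, and `analyze` stands in for whatever call sends the document URL to the prebuilt invoice model and returns a list of invoices.

```python
def process_request(payload, analyze):
    """Map each input record to an output record with the same recordId."""
    results = []
    for record in payload.get("values", []):
        entry = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        try:
            data = record["data"]
            # The SAS token is appended to the blob URL so the document can be read.
            document_url = data["formUrl"] + data.get("formSasToken", "")
            entry["data"] = {"invoices": analyze(document_url)}
        except Exception as exc:
            # Report the failure for this record only; other records still succeed.
            entry["errors"].append({"message": str(exc)})
        results.append(entry)
    return {"values": results}


def fake_analyze(url):
    """Stub analyzer for the example; a real one would call Form Recognizer."""
    return [{"InvoiceId": "1785443", "source": url}]


request = {
    "values": [
        {
            "recordId": "record1",
            "data": {
                "formUrl": "https://example.com/Invoice_4.pdf",
                "formSasToken": "?st=placeholder",
            },
        }
    ]
}
response = process_request(request, fake_analyze)
print(response["values"][0]["recordId"])
```

Catching errors per record keeps one unreadable document from failing the whole batch, which matters once `batchSize` is raised above 1.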
|
||
## Sample Skillset Integration | ||
|
||
To use this skill in a Cognitive Search pipeline, add a skill definition to your skillset. | ||
Here's a sample skill definition for this example (update the inputs and outputs to reflect your particular scenario and skillset environment): | ||
|
||
```json | ||
{ | ||
"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill", | ||
"name": "formrecognizer", | ||
"description": "Extracts fields from a form using a pre-trained form recognition model", | ||
"uri": "[AzureFunctionEndpointUrl]/api/AnalyzeInvoice?code=[AzureFunctionDefaultHostKey]", | ||
"httpMethod": "POST", | ||
"timeout": "PT1M", | ||
"context": "/document", | ||
"batchSize": 1, | ||
"inputs": [ | ||
{ | ||
"name": "formUrl", | ||
"source": "/document/metadata_storage_path" | ||
}, | ||
{ | ||
"name": "formSasToken", | ||
"source": "/document/metadata_storage_sas_token" | ||
} | ||
], | ||
"outputs": [ | ||
{ | ||
"name": "invoices", | ||
"targetName": "invoices" | ||
} | ||
] | ||
} | ||
``` | ||
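The two bracketed placeholders in `uri` have to be replaced with the function app's base URL and host key. One way to fill them in is sketched below; `invoice_skill` is a hypothetical helper and the URL and key in the example are placeholders. The key is URL-encoded because function keys can contain characters such as `/` and `=` that are not safe in a query string.

```python
import json
from urllib.parse import quote


def invoice_skill(function_base_url, function_key):
    """Build the WebApiSkill definition with the placeholders filled in."""
    uri = (
        f"{function_base_url.rstrip('/')}/api/AnalyzeInvoice"
        f"?code={quote(function_key, safe='')}"
    )
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "name": "formrecognizer",
        "description": "Extracts fields from a form using a pre-trained form recognition model",
        "uri": uri,
        "httpMethod": "POST",
        "timeout": "PT1M",
        "context": "/document",
        "batchSize": 1,
        "inputs": [
            {"name": "formUrl", "source": "/document/metadata_storage_path"},
            {"name": "formSasToken", "source": "/document/metadata_storage_sas_token"},
        ],
        "outputs": [{"name": "invoices", "targetName": "invoices"}],
    }


# Placeholder values, not a real function app or key.
skill = invoice_skill("https://example-func.azurewebsites.net/", "a/b==")
print(json.dumps(skill, indent=2))
```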
|
||
Refer to the Postman collection for more details. |
Binary file added
BIN
+144 KB
03 - Data Science and Custom Skills/FormRecognizer Skill/SampleInvoices/Invoice_1.pdf
Binary file not shown.
Binary file added
BIN
+153 KB
03 - Data Science and Custom Skills/FormRecognizer Skill/SampleInvoices/Invoice_2.pdf
Binary file not shown.
Binary file added
BIN
+182 KB
03 - Data Science and Custom Skills/FormRecognizer Skill/SampleInvoices/Invoice_3.pdf
Binary file not shown.
Binary file added
BIN
+153 KB
03 - Data Science and Custom Skills/FormRecognizer Skill/SampleInvoices/Invoice_4.pdf
Binary file not shown.
Binary file added
BIN
+161 KB
03 - Data Science and Custom Skills/FormRecognizer Skill/SampleInvoices/Invoice_5.pdf
Binary file not shown.
5 changes: 5 additions & 0 deletions
5
03 - Data Science and Custom Skills/FormRecognizer Skill/customskill/.funcignore
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
.git* | ||
.vscode | ||
local.settings.json | ||
test | ||
.venv |
130 changes: 130 additions & 0 deletions
130
03 - Data Science and Custom Skills/FormRecognizer Skill/customskill/.gitignore
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
pip-wheel-metadata/ | ||
share/python-wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.nox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
.hypothesis/ | ||
.pytest_cache/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
local_settings.py | ||
db.sqlite3 | ||
|
||
# Flask stuff: | ||
instance/ | ||
.webassets-cache | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# IPython | ||
profile_default/ | ||
ipython_config.py | ||
|
||
# pyenv | ||
.python-version | ||
|
||
# pipenv | ||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. | ||
# However, in case of collaboration, if having platform-specific dependencies or dependencies | ||
# having no cross-platform support, pipenv may install dependencies that don’t work, or not | ||
# install all needed dependencies. | ||
#Pipfile.lock | ||
|
||
# celery beat schedule file | ||
celerybeat-schedule | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Environments | ||
.env | ||
.venv | ||
env/ | ||
venv/ | ||
ENV/ | ||
env.bak/ | ||
venv.bak/ | ||
|
||
# Spyder project settings | ||
.spyderproject | ||
.spyproject | ||
|
||
# Rope project settings | ||
.ropeproject | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ | ||
.dmypy.json | ||
dmypy.json | ||
|
||
# Pyre type checker | ||
.pyre/ | ||
|
||
# Azure Functions artifacts | ||
bin | ||
obj | ||
appsettings.json | ||
local.settings.json | ||
.python_packages |
6 changes: 6 additions & 0 deletions
6
03 - Data Science and Custom Skills/FormRecognizer Skill/customskill/.vscode/extensions.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
{ | ||
"recommendations": [ | ||
"ms-azuretools.vscode-azurefunctions", | ||
"ms-python.python" | ||
] | ||
} |
13 changes: 13 additions & 0 deletions
13
03 - Data Science and Custom Skills/FormRecognizer Skill/customskill/.vscode/launch.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
{ | ||
"version": "0.2.0", | ||
"configurations": [ | ||
|
||
{ | ||
"name": "Attach to Python Functions", | ||
"type": "python", | ||
"request": "attach", | ||
"port": 9091, | ||
"preLaunchTask": "func: host start" | ||
} | ||
] | ||
} |
8 changes: 8 additions & 0 deletions
8
03 - Data Science and Custom Skills/FormRecognizer Skill/customskill/.vscode/settings.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"azureFunctions.deploySubpath": ".", | ||
"azureFunctions.scmDoBuildDuringDeployment": true, | ||
"azureFunctions.pythonVenv": ".venv", | ||
"azureFunctions.projectLanguage": "Python", | ||
"azureFunctions.projectRuntime": "~2", | ||
"debug.internalConsoleOptions": "neverOpen" | ||
} |