Azure Search Cognitive Skill to extract technical and business skills from text
Deprecation Notice
We’ve launched a better version of this service with Azure Cognitive Serivces - Text Analytics in the new V3 of the Named Entity Recognition (NER) endpoint.
NER V3 Information
You can read more about this work and how to use it here:
- Calling the API - https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking?tabs=version-3
Skill
Type information - https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/named-entity-types?tabs=general#skill(NEW) Custom Entity Lookup Skill
Azure Cognitive Search recently introduced a new built-in Cognitive Skill that does essentially what this repository does. You provide a dictionary of terms you want to match and it will extract those for you from any text field in your search index. You can read more about that here: https://docs.microsoft.com/en-us/azure/search/cognitive-search-skill-custom-entity-lookup
Custom Cognitive Skill with Containers
This repo is no longer supported but you’re free to use the index and skill definitions provided to enable the personalized job recommendations scenario. If you are just looking to deploy a container as a custom skill, I highly recommend utilizing this more generic cookiecutter repository: https://github.com/microsoft/cookiecutter-spacy-fastapi. We will continue to support this project.
For more information on deploying Containers on Azure see:
The Skills Extractor is a Named Entity Recognition (NER) model that takes text as input, extracts skill entities from that text, then matches these skills to a knowledge base (in this sample a simple JSON file) containing metadata on each skill. It then returns a flat list of the skills identified.
A Cognitive Skill is a Feature of Azure Search designed to Augment data in a search index.
A Skill is a Technical Concept/Tool or a Business related/Personal attribute.
Example skills:
Machine Learning, Artificial Intelligence, PyTorch, Business, Advertising
For the current goals of the service, we are focused on technical skills. Technical skills are the abilities and knowledge needed to perform specific tasks. They are practical, and often relate to mechanical, information technology, mathematical, or scientific tasks.
The Taxonomies the API pulls from primarily consist of concepts and tools related to technology. For example, Programming Languages are considered a higher-level technical skill, and C# or Python are a sub of that larger skill.
P.S. Sorry for the confusing naming.
We pull skills and technologies from many open online sources and build Record Linkage models to conflate skills and categories across each source into a single Knowledge Graph.
Here is a list of our sources:
The original idea stemmed from a few organizational needs. Here are a few:
Before running this sample, you must have the following:
az --version
to find the version you have.If you’re unfamiliar with Azure Search Cognitive Skills you can read more about them here:
https://docs.microsoft.com/en-us/azure/search/cognitive-search-concept-intro
If you would like to create your own Custom Skill leveraging the NLP power of the Python Ecosystem you can use this cookiecutter project to bootstrap a containerized API to deploy in your own infrastructure.
https://github.com/Microsoft/cookiecutter-azure-search-cognitive-skill
We’re eager to improve, so please take a couple of minutes to answer some questions about your experience https://aka.ms/AA4xoy5.
For support, please contact: WWL_Skills_Service@microsoft.com
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct.
For more information see the Code of Conduct FAQ or
contact opencode@microsoft.com with any additional questions or comments.