A lightweight (small and dependency-free) Java 8 library for identifying and normalizing numbers and measurements in text.
It identifies and normalizes numbers whether they occur as decimal numbers (“5.8”), fractions (“1/4”), or English numerals
(“forty-two”), and it also contains functionality for detecting unit-of-measurement words. It was
developed as a stand-alone component of
BioMedICUS, a biomedical and clinical NLP engine created by
the NLP-IE Group at the University of Minnesota Institute for Health Informatics.
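To make the three input forms concrete, here is a minimal sketch of normalizing a single token to a `BigDecimal`. It is not taken from this library: the class `NumberNormalizerSketch` and its `normalize` method are hypothetical names, the word list is hard-coded, and it only handles non-negative values and English numerals below one hundred.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: normalizes one number token to a BigDecimal. Not the library's implementation. */
public final class NumberNormalizerSketch {
    private static final Map<String, Integer> UNITS = new HashMap<>(); // zero..nineteen
    private static final Map<String, Integer> TENS = new HashMap<>();  // twenty..ninety

    static {
        String[] units = {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
            "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
            "eighteen", "nineteen"};
        for (int i = 0; i < units.length; i++) {
            UNITS.put(units[i], i);
        }
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) {
            TENS.put(tens[i], (i + 2) * 10);
        }
    }

    /** Returns the value of the token, or null if it is not recognized as a number. */
    public static BigDecimal normalize(String token) {
        String t = token.trim().toLowerCase(Locale.ROOT);
        // Digit strings with an optional decimal part, such as "12" or "5.8"
        if (t.matches("\\d+(\\.\\d+)?")) {
            return new BigDecimal(t);
        }
        // Fractions such as "1/4", evaluated to 16 significant digits
        if (t.matches("\\d+/\\d+")) {
            String[] parts = t.split("/");
            BigDecimal denominator = new BigDecimal(parts[1]);
            if (denominator.signum() == 0) {
                return null; // a zero denominator is not a usable number
            }
            return new BigDecimal(parts[0]).divide(denominator, MathContext.DECIMAL64);
        }
        // English numerals below one hundred, such as "seven" or "forty-two"
        Integer value = wordValue(t);
        return value == null ? null : BigDecimal.valueOf(value);
    }

    private static Integer wordValue(String t) {
        if (UNITS.containsKey(t)) {
            return UNITS.get(t);
        }
        if (TENS.containsKey(t)) {
            return TENS.get(t);
        }
        // Hyphenated tens-plus-units forms: "forty-two" -> 40 + 2
        String[] parts = t.split("-");
        if (parts.length == 2 && TENS.containsKey(parts[0]) && UNITS.containsKey(parts[1])) {
            int unit = UNITS.get(parts[1]);
            if (unit > 0 && unit < 10) {
                return TENS.get(parts[0]) + unit;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"5.8", "1/4", "forty-two", "mg"}) {
            System.out.println(s + " -> " + normalize(s));
        }
    }
}
```

Running `main` prints `5.8 -> 5.8`, `1/4 -> 0.25`, `forty-two -> 42`, and `mg -> null`. Returning a single `BigDecimal` for all three forms keeps downstream comparisons uniform, which is one plausible reason to normalize at all.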
This project makes use of the 2017 SPECIALIST Lexicon number files. For more information about the
SPECIALIST Lexicon, see their
website and their
Terms & Conditions.
To use it in a Maven project, add the following dependency to your pom.xml:
```xml
<dependencies>
    <dependency>
        <groupId>edu.umn.biomedicus</groupId>
        <artifactId>biomedicus-measures</artifactId>
        <version>2.0.1</version>
    </dependency>
</dependencies>
```
Alternatively, download the .jar and add it to your project's classpath.
The API documentation for this project is available online. A basic example of detecting numbers in a sequence of tokens:
```java
// Obtain a detector factory and build a detector for numbers in token sequences
DetectorFactory factory = Numbers.createDetectorFactory();
CombinedNumberDetector detector = factory.createCombinedNumberDetector();

Iterable<Token> tokens = ...; // the tokens of your text
for (NumberResult numberResult : detector.findNumbers(tokens)) {
    // do something with the detected number
}
```
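Numbers written out in words often span several tokens (“three hundred forty-two”). As a rough illustration of what combining such tokens involves, here is a minimal sketch of one common technique, a running-total accumulator in which “hundred” multiplies the current group and “thousand”/“million” close it. This is an assumption for illustration, not how `CombinedNumberDetector` works: the class `NumeralCombinerSketch` is hypothetical, and it takes plain `String` tokens rather than the library's `Token` type.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical sketch: folds a run of English numeral tokens into one value. Not the library's code. */
public final class NumeralCombinerSketch {
    private static final Map<String, Long> SMALL = new HashMap<>();  // zero..nineteen and the tens
    private static final Map<String, Long> SCALES = new HashMap<>(); // multipliers that close a group

    static {
        String[] ones = {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
            "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
            "eighteen", "nineteen"};
        for (int i = 0; i < ones.length; i++) {
            SMALL.put(ones[i], (long) i);
        }
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) {
            SMALL.put(tens[i], (i + 2) * 10L);
        }
        SCALES.put("thousand", 1_000L);
        SCALES.put("million", 1_000_000L);
    }

    /** Returns the combined value, or empty if the run is empty or contains a non-numeral word. */
    public static OptionalLong combine(List<String> tokens) {
        if (tokens.isEmpty()) {
            return OptionalLong.empty();
        }
        long total = 0;   // sum of groups already closed by "thousand" or "million"
        long current = 0; // the group currently being built
        for (String token : tokens) {
            // Hyphenated forms like "forty-two" are split into their parts
            for (String word : token.toLowerCase(Locale.ROOT).split("-")) {
                if (SMALL.containsKey(word)) {
                    current += SMALL.get(word);
                } else if (word.equals("hundred")) {
                    current *= 100;
                } else if (SCALES.containsKey(word)) {
                    total += current * SCALES.get(word);
                    current = 0;
                } else if (!word.equals("and")) {
                    return OptionalLong.empty(); // not part of an English numeral
                }
            }
        }
        return OptionalLong.of(total + current);
    }

    public static void main(String[] args) {
        System.out.println(combine(Arrays.asList("three", "hundred", "forty-two")).getAsLong()); // 342
        System.out.println(combine(Arrays.asList("two", "thousand", "and", "five")).getAsLong()); // 2005
        System.out.println(combine(Arrays.asList("five", "mg")).isPresent()); // false
    }
}
```

The accumulator is simple but permissive: it accepts some ill-formed runs (for example “two two” yields 4), so a real detector would need extra grammar checks on token order.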
To report a bug or request an enhancement, open an issue on the project's GitHub Issues tab.
BioMedICUS has a Gitter chat and a
Google Group for contacting the
developers with questions, suggestions, or feedback.
BioMedICUS is developed by the
University of Minnesota Institute for Health Informatics NLP/IE Group
with assistance from the
Open Health Natural Language Processing (OHNLP) Consortium.
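Anyone is welcome and encouraged to contribute. If you discover a bug or think the project could
use an enhancement, follow these steps:

1. Fork the repository on GitHub and clone your fork.
2. Create a feature branch: `git checkout -b feature-name`
3. Commit your changes: `git commit -am 'Summary of changes'`
4. Push the branch to your fork: `git push origin feature-name`
5. Open a pull request.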