Unverified commit 3e11584d authored by Neel Kamath, committed by GitHub

Allow for a multi-model architecture (#16)

parent 42363774
......@@ -23,7 +23,7 @@ download-vectors:
test-server:
stage: test
image: docker/compose
script: docker-compose -f docker-compose.yml -f docker-compose.test.yml
script: SPACY_MODEL=en_core_web_sm docker-compose -f docker-compose.yml -f docker-compose.test.yml
up --build --abort-on-container-exit --exit-code-from app
test-spec:
......@@ -31,22 +31,6 @@ test-spec:
image: node
script: npx @stoplight/spectral lint docs/openapi.yaml
build-image:
stage: build
image: docker:dind
script:
- echo $CI_REGISTRY_PASSWORD | docker login -u $CI_REGISTRY_USER --password-stdin $CI_REGISTRY
- version=$(grep version docs/openapi.yaml -m 1)
- 'version=${version#*: }'
- version=$(echo $version | cut -d "'" -f 2)
- echo v$version > version.txt
- docker build -t $CI_REGISTRY_IMAGE -t $CI_REGISTRY_IMAGE:$(cat version.txt) .
- docker push $CI_REGISTRY_IMAGE
- docker push $CI_REGISTRY_IMAGE:$(cat version.txt)
artifacts:
paths: [version.txt]
only: [master]
build-docs:
stage: build
image: node
......@@ -58,14 +42,7 @@ build-docs:
docker-hub:
stage: deploy
image: docker:dind
script:
- echo $DOCKER_HUB_PASSWORD | docker login -u $DOCKER_HUB_USER --password-stdin https://index.docker.io/v1/
- docker pull $CI_REGISTRY_IMAGE
- docker pull $CI_REGISTRY_IMAGE:$(cat version.txt)
- docker tag $CI_REGISTRY_IMAGE $DOCKER_HUB_USER/spacy-server
- docker tag $CI_REGISTRY_IMAGE:$(cat version.txt) $DOCKER_HUB_USER/spacy-server:$(cat version.txt)
- docker push $DOCKER_HUB_USER/spacy-server
- docker push $DOCKER_HUB_USER/spacy-server:$(cat version.txt)
script: sh deploy.sh
only: [master]
pages:
......
......@@ -3,6 +3,8 @@ WORKDIR /app
ENV PYTHONUNBUFFERED 1
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
ARG SPACY_MODEL
RUN python -m spacy download $SPACY_MODEL
COPY main.py .
COPY s2v_old/ s2v_old/
EXPOSE 8000
......
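Since the model is now a build-time choice, one image is built per model; a minimal sketch of building and running such an image locally, mirroring the commands in the docs below (the model name is only an example):
```
docker build --build-arg SPACY_MODEL=en_core_web_sm -t spacy-server .
docker run --rm -e SPACY_MODEL=en_core_web_sm -p 8000:8000 spacy-server
```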
......@@ -2,7 +2,7 @@
[![Built with spaCy](https://img.shields.io/badge/built%20with-spaCy-09a3d5.svg)](https://spacy.io)
This project provides industrial-strength NLP via [spaCy](https://spacy.io/) and [sense2vec](https://github.com/explosion/sense2vec) over a containerized HTTP API.
This project provides industrial-strength NLP for multiple languages via [spaCy](https://spacy.io/) and [sense2vec](https://github.com/explosion/sense2vec) over a containerized HTTP API.
## Installation
......@@ -10,13 +10,16 @@ This project provides industrial-strength NLP via [spaCy](https://spacy.io/) and
Install [Docker](https://hub.docker.com/search/?type=edition&offering=community).
The container `EXPOSE`s port `8000`. To serve at `http://localhost:8000`, run `docker run --rm -p 8000:8000 neelkamath/spacy-server`.
You can find specific tags (for example, for a French model) on the [Docker Hub repository](https://hub.docker.com/repository/docker/neelkamath/spacy-server/tags?page=1).
You can find specific versions on the [Docker Hub repository](https://hub.docker.com/repository/docker/neelkamath/spacy-server/tags?page=1).
For example, to run an English model at `http://localhost:8000`, run:
```
docker run --rm -e SPACY_MODEL=en_core_web_sm -p 8000:8000 neelkamath/spacy-server:v1-en_core_web_sm
```
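Once the container is up, the endpoints accept JSON over plain HTTP. As a quick sketch, the sentence segmentation endpoint in `main.py` takes a `text` field (the output shape is inferred from the handler's return value):
```
curl -X POST http://localhost:8000/sentencizer \
  -H 'Content-Type: application/json' \
  -d '{"text": "First sentence. Second sentence."}'
# Roughly: {"sentences": ["First sentence.", "Second sentence."]}
```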
### Generating an SDK
You can generate a wrapper for the HTTP API using [OpenAPI Generator](https://openapi-generator.tech/) on the file `https://raw.githubusercontent.com/neelkamath/spacy-server/master/docs/openapi.yaml`.
You can generate a wrapper for the HTTP API using [OpenAPI Generator](https://openapi-generator.tech/) on the file [`https://raw.githubusercontent.com/neelkamath/spacy-server/master/docs/openapi.yaml`](https://raw.githubusercontent.com/neelkamath/spacy-server/master/docs/openapi.yaml).
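For instance, assuming the `@openapitools/openapi-generator-cli` npm wrapper (not part of this repository), a TypeScript client could be generated roughly like so:
```
npx @openapitools/openapi-generator-cli generate \
  -i https://raw.githubusercontent.com/neelkamath/spacy-server/master/docs/openapi.yaml \
  -g typescript-fetch \
  -o sdk/
```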
## [Usage](https://neelkamath.gitlab.io/spacy-server/)
......
version: '3.7'
services:
app:
command: sh setup.sh 'uvicorn main:app --host 0.0.0.0 --reload'
command: sh scripts/setup.sh 'uvicorn main:app --host 0.0.0.0 --reload'
ports: ['8000:8000']
\ No newline at end of file
version: '3.7'
services:
app:
command: sh setup.sh pytest
\ No newline at end of file
command: sh scripts/setup.sh pytest
\ No newline at end of file
......@@ -7,6 +7,8 @@ services:
app:
image: python:3.8
working_dir: /app
environment:
SPACY_MODEL:
volumes:
- type: bind
source: .
......
......@@ -2,28 +2,33 @@
## Server
Replace `<MODEL>` with the name of the [spaCy model](https://spacy.io/models) (e.g., `en_core_web_sm`, `fr_core_news_md`). The model must be compatible with the spaCy version specified in [requirements.txt](../requirements.txt).
### Development
```
docker-compose -p dev up --build
SPACY_MODEL=<MODEL> docker-compose -p dev up --build
```
The server will be running on `http://localhost:8000`, and has automatic reload enabled.
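A quick way to exercise the dev server, and to see the new model-capability errors, is to hit an endpoint directly; for example, the NER endpoint takes a `sections` array (a sketch; the full response structure is abridged in this diff):
```
curl -X POST http://localhost:8000/ner \
  -H 'Content-Type: application/json' \
  -d '{"sections": ["Apple was founded by Steve Jobs."]}'
# If the loaded model lacks an "ner" or "parser" pipe, the server replies with
# HTTP 400 and a detail message built from main.py's pipeline_error template.
```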
### Testing
Since any model will do, tests are written only against the `en_core_web_sm` model, chosen for its combination of speed, features, and accuracy.
```
docker-compose -p test -f docker-compose.yml -f docker-compose.test.yml \
SPACY_MODEL=en_core_web_sm docker-compose -p test -f docker-compose.yml -f docker-compose.test.yml \
up --build --abort-on-container-exit --exit-code-from app
```
### Production
```
docker build -t spacy-server .
docker build --build-arg SPACY_MODEL=<MODEL> -t spacy-server .
docker run --rm -e SPACY_MODEL=<MODEL> -p 8000:8000 spacy-server
```
The container `EXPOSE`s port `8000`. To serve at `http://localhost:8000`, run `docker run --rm -p 8000:8000 spacy-server`.
The container `EXPOSE`s port `8000`.
## Specification
......
"""Provides spaCy NLP over an HTTP API."""
"""Provides NLP via spaCy and sense2vec over an HTTP API."""
import os
import typing
import en_core_web_sm
import fastapi
import pydantic
import sense2vec
import spacy
import starlette.status
app = fastapi.FastAPI()
nlp = en_core_web_sm.load()
nlp.add_pipe(sense2vec.Sense2VecComponent(nlp.vocab).from_disk("s2v_old"))
# The model to load is selected at runtime via the SPACY_MODEL environment variable.
model = os.getenv('SPACY_MODEL')
# Template for the HTTP 400 detail returned when the loaded model lacks a pipeline component.
pipeline_error = 'The pretrained model ({})'.format(model) + " doesn't support {}."
nlp = spacy.load(model)
nlp.add_pipe(sense2vec.Sense2VecComponent(nlp.vocab).from_disk('s2v_old'))
class SectionsModel(pydantic.BaseModel):
......@@ -19,6 +22,11 @@ class SectionsModel(pydantic.BaseModel):
@app.post('/ner')
async def recognize_named_entities(request: SectionsModel):
if not nlp.has_pipe('ner') or not nlp.has_pipe('parser'):
raise fastapi.HTTPException(
status_code=400,
detail=pipeline_error.format('named entity recognition')
)
response = {'data': []}
for doc in nlp.pipe(request.sections, disable=['tagger']):
for sent in doc.sents:
......@@ -54,6 +62,12 @@ class TextModel(pydantic.BaseModel):
@app.post('/pos')
async def tag_parts_of_speech(request: TextModel):
if (not nlp.has_pipe('ner') or not nlp.has_pipe('parser')
or not nlp.has_pipe('tagger')):
raise fastapi.HTTPException(
status_code=400,
detail=pipeline_error.format('part-of-speech tagging')
)
data = []
for token in [build_token(token) for token in nlp(request.text)]:
text = token['sent']
......@@ -118,6 +132,11 @@ async def tokenize(request: TextModel):
@app.post('/sentencizer')
async def sentencize(request: TextModel):
if not nlp.has_pipe('parser'):
raise fastapi.HTTPException(
status_code=400,
detail=pipeline_error.format('sentence segmentation')
)
doc = nlp(request.text, disable=['tagger', 'ner'])
return {'sentences': [sent.text for sent in doc.sents]}
......
spacy>=2.2.3,<3
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz#egg=en_core_web_sm
sense2vec>=1.0.2,<2
# We must specify a particular version for spaCy and sense2vec because pretrained models are only compatible with
# particular versions.
spacy==2.2.3
sense2vec==1.0.2
fastapi==0.45.0
uvicorn==0.10.8
pytest>=4.6.7,<5
\ No newline at end of file
#!/usr/bin/env sh
# Builds and uploads every image (e.g., neelkamath/spacy-server:v1-en_core_web_sm) to Docker Hub.
# Get the HTTP API version.
version=$(grep version docs/openapi.yaml -m 1)
version=${version#*: }
version=v$(echo "$version" | cut -d "'" -f 2)
# Log in.
echo "$DOCKER_HUB_PASSWORD" | docker login -u "$DOCKER_HUB_USER" --password-stdin https://index.docker.io/v1/
# Build and upload the images.
while IFS='' read -r model || [ -n "$model" ]; do
docker build --build-arg SPACY_MODEL="$model" -t "$DOCKER_HUB_USER"/spacy-server:"$version"-"$model" .
docker push "$DOCKER_HUB_USER"/spacy-server:"$version"-"$model"
done <scripts/models.txt
en_core_web_sm
en_core_web_md
en_core_web_lg
en_vectors_web_lg
en_trf_bertbaseuncased_lg
en_trf_robertabase_lg
en_trf_distilbertbaseuncased_lg
en_trf_xlnetbasecased_lg
de_core_news_sm
de_core_news_md
de_trf_bertbasecased_lg
fr_core_news_sm
fr_core_news_md
es_core_news_sm
es_core_news_md
pt_core_news_sm
it_core_news_sm
nl_core_news_sm
el_core_news_sm
el_core_news_md
nb_core_news_sm
lt_core_news_sm
xx_ent_wiki_sm
\ No newline at end of file
#!/usr/bin/env bash
#!/usr/bin/env sh
# Executes a command in a virtual environment (e.g., <sh setup.sh 'uvicorn main:app --reload'>).
python -m venv venv
. venv/bin/activate
pip install -r requirements.txt
$1
\ No newline at end of file
python -m spacy download "$SPACY_MODEL"
$1
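Mirroring the `command` and `environment` entries in the compose files above, the wrapper would be invoked along these lines when run by hand (a sketch):
```
SPACY_MODEL=en_core_web_sm sh scripts/setup.sh 'uvicorn main:app --host 0.0.0.0 --reload'
```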