Unverified commit 3e11584d authored by Neel Kamath, committed by GitHub

Allow for a multi-model architecture (#16)

parent 42363774
......@@ -23,7 +23,7 @@ download-vectors:
test-server:
stage: test
image: docker/compose
script: docker-compose -f docker-compose.yml -f docker-compose.test.yml
script: SPACY_MODEL=en_core_web_sm docker-compose -f docker-compose.yml -f docker-compose.test.yml
up --build --abort-on-container-exit --exit-code-from app
test-spec:
......@@ -31,22 +31,6 @@ test-spec:
image: node
script: npx @stoplight/spectral lint docs/openapi.yaml
build-image:
stage: build
image: docker:dind
script:
- echo $CI_REGISTRY_PASSWORD | docker login -u $CI_REGISTRY_USER --password-stdin $CI_REGISTRY
- version=$(grep version docs/openapi.yaml -m 1)
- 'version=${version#*: }'
- version=$(echo $version | cut -d "'" -f 2)
- echo v$version > version.txt
- docker build -t $CI_REGISTRY_IMAGE -t $CI_REGISTRY_IMAGE:$(cat version.txt) .
- docker push $CI_REGISTRY_IMAGE
- docker push $CI_REGISTRY_IMAGE:$(cat version.txt)
artifacts:
paths: [version.txt]
only: [master]
build-docs:
stage: build
image: node
......@@ -58,14 +42,7 @@ build-docs:
docker-hub:
stage: deploy
image: docker:dind
script:
- echo $DOCKER_HUB_PASSWORD | docker login -u $DOCKER_HUB_USER --password-stdin https://index.docker.io/v1/
- docker pull $CI_REGISTRY_IMAGE
- docker pull $CI_REGISTRY_IMAGE:$(cat version.txt)
- docker tag $CI_REGISTRY_IMAGE $DOCKER_HUB_USER/spacy-server
- docker tag $CI_REGISTRY_IMAGE:$(cat version.txt) $DOCKER_HUB_USER/spacy-server:$(cat version.txt)
- docker push $DOCKER_HUB_USER/spacy-server
- docker push $DOCKER_HUB_USER/spacy-server:$(cat version.txt)
script: sh deploy.sh
only: [master]
pages:
......
......@@ -3,6 +3,8 @@ WORKDIR /app
ENV PYTHONUNBUFFERED 1
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
ARG SPACY_MODEL
RUN python -m spacy download $SPACY_MODEL
COPY main.py .
COPY s2v_old/ s2v_old/
EXPOSE 8000
......
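Since the model is now a build-time choice, one image is built per model; a minimal sketch of building and running such an image locally, mirroring the commands in the docs below (the model name is only an example):
```
docker build --build-arg SPACY_MODEL=en_core_web_sm -t spacy-server .
docker run --rm -e SPACY_MODEL=en_core_web_sm -p 8000:8000 spacy-server
```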
......@@ -2,7 +2,7 @@
[![Built with spaCy](https://img.shields.io/badge/built%20with-spaCy-09a3d5.svg)](https://spacy.io)
This project provides industrial-strength NLP via [spaCy](https://spacy.io/) and [sense2vec](https://github.com/explosion/sense2vec) over a containerized HTTP API.
This project provides industrial-strength NLP for multiple languages via [spaCy](https://spacy.io/) and [sense2vec](https://github.com/explosion/sense2vec) over a containerized HTTP API.
## Installation
......@@ -10,13 +10,16 @@ This project provides industrial-strength NLP via [spaCy](https://spacy.io/) and
Install [Docker](https://hub.docker.com/search/?type=edition&offering=community).
The container `EXPOSE`s port `8000`. To serve at `http://localhost:8000`, run `docker run --rm -p 8000:8000 neelkamath/spacy-server`.
You can find specific tags (for example, for a French model) on the [Docker Hub repository](https://hub.docker.com/repository/docker/neelkamath/spacy-server/tags?page=1).
You can find specific versions on the [Docker Hub repository](https://hub.docker.com/repository/docker/neelkamath/spacy-server/tags?page=1).
For example, to run an English model at `http://localhost:8000`, run:
```
docker run --rm -e SPACY_MODEL=en_core_web_sm -p 8000:8000 neelkamath/spacy-server:v1-en_core_web_sm
```
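Once the container is up, the endpoints accept JSON over plain HTTP. As a quick sketch, the sentence segmentation endpoint in `main.py` takes a `text` field (the output shape is inferred from the handler's return value):
```
curl -X POST http://localhost:8000/sentencizer \
  -H 'Content-Type: application/json' \
  -d '{"text": "First sentence. Second sentence."}'
# Roughly: {"sentences": ["First sentence.", "Second sentence."]}
```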
### Generating an SDK
You can generate a wrapper for the HTTP API using [OpenAPI Generator](https://openapi-generator.tech/) on the file `https://raw.githubusercontent.com/neelkamath/spacy-server/master/docs/openapi.yaml`.
You can generate a wrapper for the HTTP API using [OpenAPI Generator](https://openapi-generator.tech/) on the file [`https://raw.githubusercontent.com/neelkamath/spacy-server/master/docs/openapi.yaml`](https://raw.githubusercontent.com/neelkamath/spacy-server/master/docs/openapi.yaml).
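For instance, assuming the `@openapitools/openapi-generator-cli` npm wrapper (not part of this repository), a TypeScript client could be generated roughly like so:
```
npx @openapitools/openapi-generator-cli generate \
  -i https://raw.githubusercontent.com/neelkamath/spacy-server/master/docs/openapi.yaml \
  -g typescript-fetch \
  -o sdk/
```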
## [Usage](https://neelkamath.gitlab.io/spacy-server/)
......
version: '3.7'
services:
app:
command: sh setup.sh 'uvicorn main:app --host 0.0.0.0 --reload'
command: sh scripts/setup.sh 'uvicorn main:app --host 0.0.0.0 --reload'
ports: ['8000:8000']
\ No newline at end of file
version: '3.7'
services:
app:
command: sh setup.sh pytest
\ No newline at end of file
command: sh scripts/setup.sh pytest
\ No newline at end of file
......@@ -7,6 +7,8 @@ services:
app:
image: python:3.8
working_dir: /app
environment:
SPACY_MODEL:
volumes:
- type: bind
source: .
......
......@@ -2,28 +2,33 @@
## Server
Replace `<MODEL>` with the name of the [spaCy model](https://spacy.io/models) (e.g., `en_core_web_sm`, `fr_core_news_md`). The model must be compatible with the spaCy version specified in [requirements.txt](../requirements.txt).
### Development
```
docker-compose -p dev up --build
SPACY_MODEL=<MODEL> docker-compose -p dev up --build
```
The server will be running on `http://localhost:8000`, and has automatic reload enabled.
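A quick way to exercise the dev server, and to see the new model-capability errors, is to hit an endpoint directly; for example, the NER endpoint takes a `sections` array (a sketch; the full response structure is abridged in this diff):
```
curl -X POST http://localhost:8000/ner \
  -H 'Content-Type: application/json' \
  -d '{"sections": ["Apple was founded by Steve Jobs."]}'
# If the loaded model lacks an "ner" or "parser" pipe, the server replies with
# HTTP 400 and a detail message built from main.py's pipeline_error template.
```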
### Testing
Since any model will do, tests are written only against the `en_core_web_sm` model, chosen for its combination of speed, features, and accuracy.
```
docker-compose -p test -f docker-compose.yml -f docker-compose.test.yml \
SPACY_MODEL=en_core_web_sm docker-compose -p test -f docker-compose.yml -f docker-compose.test.yml \
up --build --abort-on-container-exit --exit-code-from app
```
### Production
```
docker build -t spacy-server .
docker build --build-arg SPACY_MODEL=<MODEL> -t spacy-server .
docker run --rm -e SPACY_MODEL=<MODEL> -p 8000:8000 spacy-server
```
The container `EXPOSE`s port `8000`. To serve at `http://localhost:8000`, run `docker run --rm -p 8000:8000 spacy-server`.
The container `EXPOSE`s port `8000`.
## Specification
......
"""Provides spaCy NLP over an HTTP API."""
"""Provides NLP via spaCy and sense2vec over an HTTP API."""
import os
import typing
import en_core_web_sm
import fastapi
import pydantic
import sense2vec
import spacy
import starlette.status
app = fastapi.FastAPI()
nlp = en_core_web_sm.load()
nlp.add_pipe(sense2vec.Sense2VecComponent(nlp.vocab).from_disk("s2v_old"))
# The model to load is selected at runtime via the SPACY_MODEL environment variable.
model = os.getenv('SPACY_MODEL')
# Template for the HTTP 400 detail returned when the loaded model lacks a pipeline component.
pipeline_error = 'The pretrained model ({})'.format(model) + " doesn't support {}."
nlp = spacy.load(model)
nlp.add_pipe(sense2vec.Sense2VecComponent(nlp.vocab).from_disk('s2v_old'))
class SectionsModel(pydantic.BaseModel):
......@@ -19,6 +22,11 @@ class SectionsModel(pydantic.BaseModel):
@app.post('/ner')
async def recognize_named_entities(request: SectionsModel):
if not nlp.has_pipe('ner') or not nlp.has_pipe('parser'):
raise fastapi.HTTPException(
status_code=400,
detail=pipeline_error.format('named entity recognition')
)
response = {'data': []}
for doc in nlp.pipe(request.sections, disable=['tagger']):
for sent in doc.sents:
......@@ -54,6 +62,12 @@ class TextModel(pydantic.BaseModel):
@app.post('/pos')
async def tag_parts_of_speech(request: TextModel):
if (not nlp.has_pipe('ner') or not nlp.has_pipe('parser')
or not nlp.has_pipe('tagger')):
raise fastapi.HTTPException(
status_code=400,
detail=pipeline_error.format('part-of-speech tagging')
)
data = []
for token in [build_token(token) for token in nlp(request.text)]:
text = token['sent']
......@@ -118,6 +132,11 @@ async def tokenize(request: TextModel):
@app.post('/sentencizer')
async def sentencize(request: TextModel):
if not nlp.has_pipe('parser'):
raise fastapi.HTTPException(
status_code=400,
detail=pipeline_error.format('sentence segmentation')
)
doc = nlp(request.text, disable=['tagger', 'ner'])
return {'sentences': [sent.text for sent in doc.sents]}
......
spacy>=2.2.3,<3
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz#egg=en_core_web_sm
sense2vec>=1.0.2,<2
# We must specify a particular version for spaCy and sense2vec because pretrained models are only compatible with
# particular versions.
spacy==2.2.3
sense2vec==1.0.2
fastapi==0.45.0
uvicorn==0.10.8
pytest>=4.6.7,<5
\ No newline at end of file
#!/usr/bin/env sh
# Builds and uploads every image (e.g., neelkamath/spacy-server:v1-en_core_web_sm) to Docker Hub.
# Get the HTTP API version.
version=$(grep version docs/openapi.yaml -m 1)
version=${version#*: }
version=v$(echo "$version" | cut -d "'" -f 2)
# Log in.
echo "$DOCKER_HUB_PASSWORD" | docker login -u "$DOCKER_HUB_USER" --password-stdin https://index.docker.io/v1/
# Build and upload the images.
while IFS='' read -r model || [ -n "$model" ]; do
docker build --build-arg SPACY_MODEL="$model" -t "$DOCKER_HUB_USER"/spacy-server:"$version"-"$model" .
docker push "$DOCKER_HUB_USER"/spacy-server:"$version"-"$model"
done <scripts/models.txt
en_core_web_sm
en_core_web_md
en_core_web_lg
en_vectors_web_lg
en_trf_bertbaseuncased_lg
en_trf_robertabase_lg
en_trf_distilbertbaseuncased_lg
en_trf_xlnetbasecased_lg
de_core_news_sm
de_core_news_md
de_trf_bertbasecased_lg
fr_core_news_sm
fr_core_news_md
es_core_news_sm
es_core_news_md
pt_core_news_sm
it_core_news_sm
nl_core_news_sm
el_core_news_sm
el_core_news_md
nb_core_news_sm
lt_core_news_sm
xx_ent_wiki_sm
\ No newline at end of file
#!/usr/bin/env bash
#!/usr/bin/env sh
# Executes a command in a virtual environment (e.g., <sh setup.sh 'uvicorn main:app --reload'>).
python -m venv venv
. venv/bin/activate
pip install -r requirements.txt
$1
\ No newline at end of file
python -m spacy download "$SPACY_MODEL"
$1
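Mirroring the `command` and `environment` entries in the compose files above, the wrapper would be invoked along these lines when run by hand (a sketch):
```
SPACY_MODEL=en_core_web_sm sh scripts/setup.sh 'uvicorn main:app --host 0.0.0.0 --reload'
```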