Unverified Commit 47b313d5 authored by Neel Kamath, committed by GitHub

Fix #21 (#22)

* Conditionally enable sense2vec for performance improvements

* Disable sense2vec in unrelated pipeline uses

* Test HTTP exceptions

* Update Docker image tagging convention

* Conditionally disable sense2vec
parent ca48e810
@@ -2,7 +2,7 @@
[![Built with spaCy](https://img.shields.io/badge/built%20with-spaCy-09a3d5.svg)](https://spacy.io)
-This project provides industrial-strength NLP for multiple languages via [spaCy](https://spacy.io/) and [sense2vec](https://github.com/explosion/sense2vec) over a containerized HTTP API.
+This project provides industrial-strength NLP via [spaCy](https://spacy.io/) and [sense2vec](https://github.com/explosion/sense2vec) over a containerized HTTP API.
## Installation
@@ -10,18 +10,11 @@
Install [Docker](https://hub.docker.com/search/?type=edition&offering=community).
You can find specific tags (e.g., for a French model) on the [Docker Hub repository](https://hub.docker.com/repository/docker/neelkamath/spacy-server/tags?page=1).
For example, to run an English model at `http://localhost:8000`, run:
```
docker run --rm -p 8000:8000 neelkamath/spacy-server:2-en_core_web_sm
```
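For a quick smoke test once the container is up, something like the following should work (a sketch; `/health_check` and `/ner` are endpoints defined in the OpenAPI spec):
```
curl http://localhost:8000/health_check
curl -X POST http://localhost:8000/ner \
    -H 'Content-Type: application/json' \
    -d '{"sections": ["Google is a big company."]}'
```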
### Generating an SDK
You can generate a wrapper for the HTTP API using [OpenAPI Generator](https://openapi-generator.tech/) on the file [`https://raw.githubusercontent.com/neelkamath/spacy-server/master/docs/openapi.yaml`](https://raw.githubusercontent.com/neelkamath/spacy-server/master/docs/openapi.yaml).
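For example, a Python client could be generated like so (a sketch assuming the npm distribution of OpenAPI Generator; any installation method works):
```
npm install -g @openapitools/openapi-generator-cli
openapi-generator-cli generate \
    -i https://raw.githubusercontent.com/neelkamath/spacy-server/master/docs/openapi.yaml \
    -g python -o spacy-server-client
```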
-## [Usage](https://neelkamath.gitlab.io/spacy-server/)
+## [Usage](https://hub.docker.com/r/neelkamath/spacy-server)
## [Contributing](docs/CONTRIBUTING.md)
-FROM python:3.8
+FROM python:3.8 AS base
WORKDIR /app
-ENV PYTHONUNBUFFERED 1
+ARG SPACY_MODEL
+ENV PYTHONUNBUFFERED=1 SENSE2VEC=0 SPACY_MODEL=$SPACY_MODEL
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
-ARG SPACY_MODEL
RUN python -m spacy download $SPACY_MODEL
COPY src/main.py .
-COPY src/s2v_old/ s2v_old/
EXPOSE 8000
HEALTHCHECK --timeout=2s --start-period=2s --retries=1 \
CMD curl -f http://localhost:8000/health_check
RUN useradd user
USER user
CMD ["uvicorn", "main:app", "--host", "0.0.0.0"]
+FROM base
+ENV SENSE2VEC 1
+COPY src/s2v_old/ src/s2v_old/
\ No newline at end of file
version: '3.7'
services:
  app:
-    command: sh scripts/setup.sh 'uvicorn src.main:app --host 0.0.0.0 --reload'
+    command: sh -c '. scripts/setup.sh && uvicorn src.main:app --host 0.0.0.0 --reload'
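+    # setup.sh is sourced (not run as a child script) so the virtual environment it activates stays active for uvicorn.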
    ports: ['8000:8000']
    environment:
      SPACY_MODEL:
+      SENSE2VEC:
\ No newline at end of file
version: '3.7'
services:
  app:
-    command: sh scripts/setup.sh pytest
+    command: sh -c '. scripts/setup.sh && pytest'
    environment:
+      SENSE2VEC: 1
# Since any model will do, tests have been written only for the en_core_web_sm model because of its combination of
# speed, features, and accuracy.
-# It is not possible to use a Docker volume to cache the dependencies because subsequent usage of the volume
-# occasionally gets corrupted for an unknown reason. Hence, a virtual environment is to be used instead. It is known
-# that virtual environments aren't needed in Docker because isolation is already provided; we use it as a cache instead.
+# A virtual environment caches dependencies instead of a Docker volume because the volume randomly gets corrupted.
version: '3.7'
services:
  app:
    image: python:3.8
    working_dir: /app
    environment:
      SPACY_MODEL:
    volumes:
      - type: bind
        source: .
@@ -9,6 +9,6 @@
1. Clone the repository using one of the following methods.
- SSH: `git clone git@github.com:neelkamath/spacy-server.git`
- HTTPS: `git clone https://github.com/neelkamath/spacy-server.git`
-1. Download the [pretrained vectors](https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2015_md.tar.gz). Extract it into the project's `src` directory.
+1. If you are not going to use sense2vec, skip this step. Otherwise, download the [pretrained vectors](https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2015_md.tar.gz) and extract the archive into the project's `src` directory, for example as shown below.
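    One way to do this (a sketch using curl and tar; it assumes the archive's root directory is `s2v_old`, the path the code expects):

    ```
    curl -LO https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2015_md.tar.gz
    tar -xzf s2v_reddit_2015_md.tar.gz -C src
    ```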
## [Developing](developing.md)
\ No newline at end of file
@@ -2,12 +2,12 @@
## Server
-Replace `<MODEL>` with the name of the [spaCy model](https://spacy.io/models) (e.g., `en_core_web_sm`, `fr_core_news_md`). The model must be compatible with the spaCy version specified in [requirements.txt](../requirements.txt).
+Replace `<MODEL>` with the name of the [spaCy model](https://spacy.io/models) (e.g., `en_core_web_sm`, `fr_core_news_md`). The model must be compatible with the spaCy version specified in [requirements.txt](../requirements.txt). Replace `<ENABLED>` with `1` or `0` to enable or disable sense2vec, respectively.
### Development
```
-SPACY_MODEL=<MODEL> docker-compose -p dev --project-directory . \
+SPACY_MODEL=<MODEL> SENSE2VEC=<ENABLED> docker-compose -p dev --project-directory . \
-f docker/docker-compose.yml -f docker/docker-compose.override.yml up --build
```
@@ -15,19 +15,29 @@
### Testing
-```
-docker-compose -p test --project-directory . -f docker/docker-compose.yml -f docker/docker-compose.test.yml \
-    up --build --abort-on-container-exit --exit-code-from app
-```
+- For noninteractive environments (e.g., CI pipelines), you can run all the tests with a single command:
+    ```
+    docker-compose -p test --project-directory . -f docker/docker-compose.yml -f docker/docker-compose.test.yml \
+        up --build --abort-on-container-exit --exit-code-from app
+    ```
+- For faster iteration (e.g., while developing), you can run the tests interactively. Changes to the source code are automatically mirrored in the container.
+    1. Run:
+        ```
+        docker-compose -p test --project-directory . -f docker/docker-compose.yml -f docker/docker-compose.test.yml \
+            run --service-ports app bash
+        ```
+    1. Run `. scripts/setup.sh` (rerun this every time you update `requirements.txt`).
+    1. Run the tests as many times as you want (e.g., `pytest`).
+    1. Once you're done testing, exit the container by running `exit`.
### Production
```
-docker build --build-arg SPACY_MODEL=<MODEL> -t spacy-server -f docker/Dockerfile .
-docker run --rm -e SPACY_MODEL=<MODEL> -p 8000:8000 spacy-server
+docker build <TARGET> --build-arg SPACY_MODEL=<MODEL> -t spacy-server -f docker/Dockerfile .
```
+Replace `<TARGET>` with `--target base` if you want to disable sense2vec, or with an empty string otherwise.
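For instance, to build an English image both ways (mirroring the two builds the Docker Hub upload script performs):
```
# Without sense2vec (smaller image, faster responses):
docker build --target base --build-arg SPACY_MODEL=en_core_web_sm -t spacy-server -f docker/Dockerfile .
# With sense2vec bundled:
docker build --build-arg SPACY_MODEL=en_core_web_sm -t spacy-server -f docker/Dockerfile .
```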
-The container `EXPOSE`s port `8000`.
+The container `EXPOSE`s port `8000`. Run using `docker run --rm -p 8000:8000 spacy-server`.
## Specification
@@ -61,7 +71,10 @@ Open `redoc-static.html` in your browser.
## Releases
-- Create a GitHub release (this will automatically create the git tag). If you bumped the version in `docs/openapi.yaml`, then create a new release. If you haven't bumped the version but have updated the HTTP API's functionality, delete the existing GitHub release and git tag, and create a new one. Otherwise, skip this step. The release's title should be the features included (e.g., `NER, POS tagging, sentencizer, tokenizer, and sense2vec`). The tag should be the HTTP API's version (e.g., `v1`). The release's body should be ```Download and open the release asset, `redoc-static.html`, in your browser to view the HTTP API documentation.```. Upload the asset named `redoc-static.html` which contains the HTTP API docs.
+- If you haven't updated the HTTP API's functionality, skip this step. Otherwise:
+    1. If you haven't bumped the version in the OpenAPI spec, delete the corresponding GitHub release and git tag.
+    1. Generate `redoc-static.html`: `npx redoc-cli bundle docs/openapi.yaml -o redoc-static.html --title 'spaCy Server'`
+    1. Create a GitHub release. The release's body should be ```Download and open the release asset, `redoc-static.html`, in your browser to view the HTTP API documentation.```. Upload `redoc-static.html` as an asset.
- If required, update the [Docker Hub repository](https://hub.docker.com/r/neelkamath/spacy-server)'s **Overview**.
- For every commit to the `master` branch in which the tests have passed, the following will automatically be done.
- The new images will be uploaded to Docker Hub.
openapi: 3.0.2
info:
  title: spaCy Server
-  version: '1'
+  version: '2'
  description: |
    Industrial-strength NLP via [spaCy](https://spacy.io) and [sense2vec](https://github.com/explosion/sense2vec). No
    knowledge of spaCy or sense2vec is required to use this service.
@@ -26,8 +26,8 @@ paths:
  /ner:
    post:
      tags: [nlp]
-      description: Named entity recognition. Similar phrases will also be provided via sense2vec. The pretrained model
-        must have the `ner` and `parser` pipeline components to use this endpoint.
+      description: Named entity recognition. The pretrained model must have the `ner` and `parser` pipeline components
+        to use this endpoint. If a sense2vec model was bundled with the service, similar phrases can also be provided.
      operationId: ner
      requestBody:
        required: true
@@ -39,7 +39,7 @@ paths:
              - Net income was $9.4 million compared to the prior year of $2.7 million. Google is a big company.
              - Revenue exceeded twelve billion dollars, with a loss of $1b.
            schema:
-              $ref: '#/components/schemas/Sections'
+              $ref: '#/components/schemas/NERRequest'
      responses:
        '200':
          description: Labeled text, with phrases similar to each entity
@@ -74,13 +74,21 @@ paths:
                      text_with_ws: Sundar Pichai
                  text: Google is headed by Sundar Pichai.
            schema:
-              $ref: '#/components/schemas/NamedEntities'
+              $ref: '#/components/schemas/NERResponse'
        '400':
          description: The pretrained model lacks the `ner` or `parser` pipeline components.
          content:
            application/json:
-              example:
+              examples:
+                invalid_model:
+                  summary: The spaCy model lacks the required pipeline components.
+                  value:
                    detail: The pretrained model (en_trf_bertbaseuncased_lg) doesn't support named entity recognition.
+                sense2vec_disabled:
+                  summary: Similar phrases via sense2vec were requested, but a sense2vec model wasn't bundled with
+                    the service.
+                  value:
+                    detail: There is no sense2vec model bundled with this service.
              schema:
                $ref: '#/components/schemas/InvalidModel'
  /pos:
@@ -225,18 +233,22 @@ paths:
          description: All systems are operational
components:
  schemas:
-    Sections:
+    NERRequest:
      type: object
      properties:
        sections:
          description:
-            Although you could pass the full text as a single array item, it would be faster to split large text
-            into multiple items. Each item needn't be semantically related.
+            Although you could pass the full text as a single array item, it would be faster to split large text into
+            multiple items. Each item needn't be semantically related.
          type: array
          items:
            type: string
+        sense2vec:
+          description: Whether to also compute similar phrases using sense2vec (significantly slower)
+          type: boolean
+          default: false
      required: [sections]
-    NamedEntities:
+    NERResponse:
      type: object
      properties:
        data:
@@ -268,7 +280,7 @@ components:
              description: The entity’s lemma.
            sense2vec:
              type: array
-              description: Phrases similar to the entity
+              description: Phrases similar to the entity (empty if sense2vec was disabled)
              items:
                type: object
                properties:
@@ -2,6 +2,7 @@
# particular versions.
spacy==2.2.3
sense2vec==1.0.2
fastapi==0.45.0
uvicorn==0.10.8
pytest>=4.6.7,<5
\ No newline at end of file
#!/usr/bin/env sh
-# Builds and uploads every image (e.g., neelkamath/spacy-server:v1-en_core_web_sm) to Docker Hub.
+# Builds and uploads every image (e.g., neelkamath/spacy-server:2-en_core_web_sm-sense2vec) to Docker Hub.
# Get the HTTP API version.
version=$(grep version docs/openapi.yaml -m 1)
version=${version#*: }
-version=v$(echo "$version" | cut -d "'" -f 2)
+version=$(echo "$version" | cut -d "'" -f 2)
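+# e.g., the spec line "version: '2'" yields "2", so tags look like 2-en_core_web_sm.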
# Log in.
echo "$DOCKER_HUB_PASSWORD" | docker login -u "$DOCKER_HUB_USER" --password-stdin https://index.docker.io/v1/
# Build and upload the images.
-while IFS='' read -r model || [ -n "$model" ]; do
-    tag="$DOCKER_HUB_USER"/spacy-server:"$version"-"$model"
-    docker build --build-arg SPACY_MODEL="$model" -t "$tag" -f docker/Dockerfile .
-    docker push "$tag"
-    docker rmi "$tag" # Delete the image to prevent the device (e.g., CI runner) from running out of space and crashing.
+while IFS='' read -r spacy_model || [ -n "$spacy_model" ]; do
+    base_tag="$DOCKER_HUB_USER"/spacy-server:"$version"-"$spacy_model"
+    sense2vec_tag="$base_tag"-sense2vec
+    docker build --target base --build-arg SPACY_MODEL="$spacy_model" -t "$base_tag" -f docker/Dockerfile .
+    docker build --build-arg SPACY_MODEL="$spacy_model" -t "$sense2vec_tag" -f docker/Dockerfile .
+    docker push "$base_tag"
+    docker push "$sense2vec_tag"
+    docker rmi "$base_tag" "$sense2vec_tag" # Prevent the device (e.g., CI runner) from running out of space and crashing.
done <scripts/models.txt
#!/usr/bin/env sh
-# Executes a command in a virtual environment (e.g., <sh setup.sh 'uvicorn main:app --reload'>).
+# Sets up the development environment.
python -m venv venv
. venv/bin/activate
pip install -r requirements.txt
python -m spacy download "$SPACY_MODEL"
-$1
@@ -11,34 +11,46 @@ import starlette.status
app = fastapi.FastAPI()
model = os.getenv('SPACY_MODEL')
-pipeline_error = 'The pretrained model ({})'.format(model) + " doesn't support {}."
+pipeline_error = 'The pretrained model ({})'.format(model) \
+    + " doesn't support {}."
nlp = spacy.load(model)
-nlp.add_pipe(sense2vec.Sense2VecComponent(nlp.vocab).from_disk('src/s2v_old'))
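+# Loading the sense2vec vectors is expensive, so the component is attached only
+# when it's explicitly enabled via the SENSE2VEC environment variable.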
+if os.getenv('SENSE2VEC') == '1':
+    nlp.add_pipe(
+        sense2vec.Sense2VecComponent(nlp.vocab).from_disk('src/s2v_old')
+    )

-class SectionsModel(pydantic.BaseModel):
+class NERRequest(pydantic.BaseModel):
    sections: typing.List[str]
+    sense2vec: bool = False
@app.post('/ner')
-async def recognize_named_entities(request: SectionsModel):
+async def recognize_named_entities(request: NERRequest):
    if not nlp.has_pipe('ner') or not nlp.has_pipe('parser'):
        raise fastapi.HTTPException(
            status_code=400,
            detail=pipeline_error.format('named entity recognition')
        )
+    if request.sense2vec and not nlp.has_pipe('sense2vec'):
+        raise fastapi.HTTPException(
+            status_code=400,
+            detail='There is no sense2vec model bundled with this service.'
+        )
    response = {'data': []}
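    # The tagger isn't needed for NER, so it's disabled for speed.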
    for doc in nlp.pipe(request.sections, disable=['tagger']):
        for sent in doc.sents:
-            entities = [build_entity(ent) for ent in sent.ents]
+            entities = [
+                build_entity(ent, request.sense2vec) for ent in sent.ents
+            ]
            data = {'text': sent.text, 'entities': entities}
            response['data'].append(data)
    return response
-def build_entity(ent):
+def build_entity(ent, use_sense2vec):
    similar = []
-    if ent._.in_s2v:
+    if use_sense2vec and ent._.in_s2v:
        for data in ent._.s2v_most_similar():
            similar.append(
                {'phrase': data[0][0], 'similarity': float(data[1])}
@@ -69,7 +81,8 @@ async def tag_parts_of_speech(request: TextModel):
            detail=pipeline_error.format('part-of-speech tagging')
        )
    data = []
-    for token in [build_token(token) for token in nlp(request.text)]:
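+    # sense2vec contributes nothing to POS tagging, so it's skipped when present.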
+    doc = nlp(request.text, disable=['sense2vec'])
+    for token in [build_token(token) for token in doc]:
        text = token['sent']
        del token['sent']
        if text in [obj['text'] for obj in data]:
@@ -126,7 +139,7 @@ def build_token(token):
@app.post('/tokenizer')
async def tokenize(request: TextModel):
-    doc = nlp(request.text, disable=['tagger', 'parser', 'ner'])
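+    # Only the tokenizer is needed; every other component is disabled for speed.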
+    doc = nlp(request.text, disable=['tagger', 'parser', 'ner', 'sense2vec'])
    return {'tokens': [token.text for token in doc]}
@@ -137,7 +150,7 @@ async def sentencize(request: TextModel):
            status_code=400,
            detail=pipeline_error.format('sentence segmentation')
        )
-    doc = nlp(request.text, disable=['tagger', 'ner'])
+    doc = nlp(request.text, disable=['tagger', 'ner', 'sense2vec'])
    return {'sentences': [sent.text for sent in doc.sents]}
{
"data": [
{
"text": "Net income was $9.4 million compared to the prior year of $2.7 million.",
"entities": [
{
"text": "$9.4 million",
"label": "MONEY",
"start_char": 15,
"end_char": 27,
"lemma": "$ 9.4 million",
"start": 3,
"end": 6,
"text_with_ws": "$9.4 million ",
"sense2vec": []
},
{
"text": "the prior year",
"label": "DATE",
"start_char": 40,
"end_char": 54,
"lemma": "the prior year",
"start": 8,
"end": 11,
"text_with_ws": "the prior year ",
"sense2vec": []
},
{
"text": "$2.7 million",
"label": "MONEY",
"start_char": 58,
"end_char": 70,
"lemma": "$ 2.7 million",
"start": 12,
"end": 15,
"text_with_ws": "$2.7 million",
"sense2vec": []
}
]
},
{
"text": "Google is a big company.",
"entities": [
{
"text": "Google",
"label": "ORG",
"start_char": 72,
"end_char": 78,
"lemma": "Google",
"start": 16,
"end": 17,
"text_with_ws": "Google ",
"sense2vec": []
}
]
},
{
"text": "Revenue exceeded twelve billion dollars, with a loss of $1b.",
"entities": [
{
"text": "twelve billion dollars",
"label": "MONEY",
"start_char": 17,
"end_char": 39,
"lemma": "twelve billion dollar",
"start": 2,
"end": 5,
"text_with_ws": "twelve billion dollars",
"sense2vec": []
},
{
"text": "1b",
"label": "MONEY",
"start_char": 57,
"end_char": 59,
"lemma": "1b",
"start": 11,
"end": 12,
"text_with_ws": "1b",
"sense2vec": []
}
]
}
]
}
\ No newline at end of file
{
"data": [
{
"text": "Net income was $9.4 million compared to the prior year of $2.7 million.",
"entities": [
{
"end": 6,
"end_char": 27,
"text": "$9.4 million",
"label": "MONEY",
"start_char": 15,
"end_char": 27,
"lemma": "$ 9.4 million",
"sense2vec": [],
"start": 3,
"start_char": 15,
"text": "$9.4 million",
"text_with_ws": "$9.4 million "
"end": 6,
"text_with_ws": "$9.4 million ",
"sense2vec": []
},
{
"end": 11,
"end_char": 54,
"text": "the prior year",
"label": "DATE",
"start_char": 40,
"end_char": 54,
"lemma": "the prior year",
"start": 8,
"end": 11,
"text_with_ws": "the prior year ",
"sense2vec": [
{
"phrase": "the previous year",
@@ -59,17 +64,17 @@
"phrase": "the entire year",
"similarity": 0.6915000081062317
}
-],
-"start": 8,
-"start_char": 40,
-"text": "the prior year",
-"text_with_ws": "the prior year "
+]
},
{
"end": 15,
"end_char": 70,
"text": "$2.7 million",
"label": "MONEY",
"start_char": 58,
"end_char": 70,
"lemma": "$ 2.7 million",
"start": 12,
"end": 15,
"text_with_ws": "$2.7 million",
"sense2vec": [
{
"phrase": "$1 million",
@@ -111,22 +116,22 @@
"phrase": "$2 million",
"similarity": 0.7371000051498413
}
-],
-"start": 12,
-"start_char": 58,
-"text": "$2.7 million",
-"text_with_ws": "$2.7 million"
+]
}
-],
-"text": "Net income was $9.4 million compared to the prior year of $2.7 million."
+]
},
{
"text": "Google is a big company.",
"entities": [
{
"end": 17,
"end_char": 78,
"text": "Google",
"label": "ORG",
"start_char": 72,
"end_char": 78,
"lemma": "Google",
"start": 16,
"end": 17,
"text_with_ws": "Google ",
"sense2vec": [
{
"phrase": " Google",
@@ -168,33 +173,33 @@
"phrase": "Yahoo",
"similarity": 0.8037999868392944
}
-],
-"start": 16,
-"start_char": 72,
-"text": "Google",
-"text_with_ws": "Google "
+]
}
-],
-"text": "Google is a big company."
+]
},
{
"text": "Revenue exceeded twelve billion dollars, with a loss of $1b.",
"entities": [
{
"end": 5,
"end_char": 39,
"text": "twelve billion dollars",
"label": "MONEY",
"start_char": 17,
"end_char": 39,
"lemma": "twelve billion dollar",
"sense2vec": [],
"start": 2,
"start_char": 17,
"text": "twelve billion dollars",
"text_with_ws": "twelve billion dollars"
"end": 5,
"text_with_ws": "twelve billion dollars",
"sense2vec": []
},
{
"end": 12,
"end_char": 59,
"text": "1b",
"label": "MONEY",
"start_char": 57,
"end_char": 59,
"lemma": "1b",
"start": 11,
"end": 12,
"text_with_ws": "1b",
"sense2vec": [
{
"phrase": "100m",
@@ -236,14 +241,9 @@
"phrase": "100B",
"similarity": 0.8209999799728394
}
-],
-"start": 11,
-"start_char": 57,
-"text": "1b",
-"text_with_ws": "1b"
+]
}
-],
-"text": "Revenue exceeded twelve billion dollars, with a loss of $1b."
+]
}
]
}
\ No newline at end of file
@@ -5,29 +5,51 @@ import starlette.testclient
client = starlette.testclient.TestClient(main.app)
-def test_ner():
-    body = {
+ner_body = {
    'sections': [
        'Net income was $9.4 million compared to the prior year of $2.7 '
        + 'million. Google is a big company.',
        'Revenue exceeded twelve billion dollars, with a loss of $1b.'
    ]
-    }
-    response = client.post('/ner', json=body)
+}
+ner_sense2vec_body = {**ner_body, 'sense2vec': True}
+def test_ner_sense2vec_enabled():
+    response = client.post('/ner', json=ner_sense2vec_body)
    assert response.status_code == 200
-    with open('src/outputs/ner.json') as f:
+    with open('src/outputs/ner/sense2vec_enabled.json') as f:
        assert response.json() == json.load(f)
+def test_ner_sense2vec_disabled():
+    response = client.post('/ner', json=ner_body)
+    with open('src/outputs/ner/sense2vec_disabled.json') as f:
+        assert response.json() == json.load(f)

+def test_ner_spacy_fail():
+    fail('/ner', ner_body, 'ner')

+def test_ner_sense2vec_fail():
+    fail('/ner', ner_sense2vec_body, 'sense2vec')

+pos_body = {'text': 'Apple is looking at buying U.K. startup for $1 billion'}
def test_pos():
-    text = {'text': 'Apple is looking at buying U.K. startup for $1 billion'}
-    response = client.post('/pos', json=text)
+    response = client.post('/pos', json=pos_body)
    assert response.status_code == 200
    with open('src/outputs/pos.json') as f:
        assert response.json() == json.load(f)

+def test_pos_fail():
+    fail('/pos', pos_body, 'parser')
def test_tokenizer():
    text = {'text': 'Apple is looking at buying U.K. startup for $1 billion'}
    response = client.post('/tokenizer', json=text)
@@ -36,16 +58,29 @@ def test_tokenizer():
        assert response.json() == json.load(f)

+sentencizer_body = {
+    'text': 'Apple is looking at buying U.K. startup for $1 billion. Another '
+    + 'sentence.'
+}
def test_sentencizer():
-    body = {
-        'text': 'Apple is looking at buying U.K. startup for $1 billion. '
-        + 'Another sentence.'
-    }
-    response = client.post('/sentencizer', json=body)
+    response = client.post('/sentencizer', json=sentencizer_body)
    assert response.status_code == 200
    with open('src/outputs/sentencizer.json') as f:
        assert response.json() == json.load(f)
+def test_sentencizer_fail():
+    fail('/sentencizer', sentencizer_body, 'parser')

def test_health_check():
    assert client.get('/health_check').status_code == 204
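+# Asserts that the endpoint responds with HTTP 400 when the given pipeline component is unavailable.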
+def fail(endpoint, body, pipe):
+    with main.nlp.disable_pipes(pipe):
+        response = client.post(endpoint, json=body)
+    assert response.status_code == 400
+    assert 'detail' in response.json()