feat: add readme #2

Merged · 1 commit · Apr 16, 2024
23 changes: 23 additions & 0 deletions README.md
@@ -1,3 +1,26 @@
# Lambda Layer for the Python NLTK package

Credit to https://github.com/customink/lambda-python-nltk-layer

Lambda layer that makes the popular NLTK Python package available to AWS Lambda functions.

Works with Lambda functions packaged either as Docker images or as Zip packages.
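
Regardless of packaging, the function code can import NLTK directly once the layer's data is visible via `NLTK_DATA`. A minimal handler sketch (not part of this repository; the handler name, event key, and response shape are assumptions):

```python
# Minimal sketch of a Lambda handler using NLTK via the layer's data.
# The event key "text" and the response shape are illustrative assumptions.
from nltk.tokenize import word_tokenize

def handler(event, context):
    text = event.get("text", "")
    return {"tokens": word_tokenize(text)}
```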

### Lambda functions packaged as Docker Images or OCI Images

To use the NLTK layer with Docker images, package your app (e.g. an HTTP API) in a Dockerfile and add one line that copies the NLTK data files to `/opt/nltk_data` inside your container.

Pre-built NLTK data is provided in the public ECR repository `public.ecr.aws/m5s2b0d4/nltk_lambda_layer`:

```dockerfile
COPY --from=public.ecr.aws/m5s2b0d4/nltk_lambda_layer:latest /nltk_data /opt/nltk_data
```

Then add one line to set the `NLTK_DATA` environment variable:

```dockerfile
ENV NLTK_DATA=/opt/nltk_data
```

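With the data in place and `NLTK_DATA` set, NLTK resolves its corpora from `/opt/nltk_data` at runtime with no `nltk.download` call. A quick sanity check (a hedged sketch, not part of this repository) that can be run inside the container:

```python
# Verify that NLTK sees the layer's data directory.
import nltk
from nltk.corpus import stopwords

print(nltk.data.path)                  # should include /opt/nltk_data
print(stopwords.words('english')[:5])  # loads from the copied corpus
```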

Below is a Dockerfile for [an example Python application](examples/Dockerfile).
12 changes: 12 additions & 0 deletions examples/Dockerfile
@@ -0,0 +1,12 @@
FROM public.ecr.aws/docker/library/python:3.12.0-slim-bullseye

COPY --from=public.ecr.aws/m5s2b0d4/nltk_lambda_layer:preview /nltk_data /opt/nltk_data

# LAMBDA_TASK_ROOT is predefined (as /var/task) only in the AWS Lambda base images,
# so define it here and use it as the working directory.
ENV LAMBDA_TASK_ROOT=/var/task
WORKDIR ${LAMBDA_TASK_ROOT}

# Install dependencies first, then copy the function code from your project folder.
COPY requirements.txt .
RUN pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}" -U --no-cache-dir
COPY . ${LAMBDA_TASK_ROOT}

ENV NLTK_DATA=/opt/nltk_data
CMD ["python", "main.py"]
21 changes: 21 additions & 0 deletions examples/main.py
@@ -0,0 +1,21 @@
# Requires the 'stopwords' corpus and the Punkt tokenizer data, which the layer
# provides under /opt/nltk_data (resolved through the NLTK_DATA environment variable).
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

example_sent = """This is a sample sentence,
showing off the stop words filtration."""

stop_words = set(stopwords.words('english'))

word_tokens = word_tokenize(example_sent)

# Lower-case each token before checking whether it is a stop word.
filtered_lowercase = [w for w in word_tokens if w.lower() not in stop_words]

# The same filter without lower-case conversion.
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)

print(word_tokens)
print(filtered_lowercase)
print(filtered_sentence)
1 change: 1 addition & 0 deletions examples/requirements.txt
@@ -0,0 +1 @@
nltk