Job Description
Description:
We are looking for a highly skilled NLP Data Scientist / Developer to design and implement natural language processing solutions for real-world problems. You will extract insights from unstructured text data, build language models, and deploy intelligent applications that understand and process human language. This role blends data science, machine learning, and software development, with Python and LLMs at its core.
Key Responsibilities:
- Develop and implement NLP pipelines to process, analyze, and extract insights from structured and unstructured text data.
- Build and fine-tune models for tasks such as text classification, named entity recognition, summarization, sentiment analysis, and topic modeling.
- Work with state-of-the-art language models and NLP tooling (e.g., BERT/DeBERTa, spaCy pipelines, LLM APIs) and apply transfer learning techniques.
- Clean, tokenize, and normalize large text corpora drawn from documents in various formats (PDF, HTML, etc.).
- Collaborate with cross-functional teams to integrate NLP features into software tools and customer-facing applications.
- Build REST APIs or services, using frameworks such as FastAPI or Flask, to serve models in production.
- Optimize performance, accuracy, and scalability of NLP systems.
- Document technical approaches, experiment results, and development procedures for internal and external stakeholders.
What We Offer:
- Competitive salary and benefits package
- Flexible remote work options
- Access to GPU resources and cloud infrastructure
- Opportunities to work on cutting-edge NLP problems
- A collaborative, forward-thinking AI/ML team
Requirements:
Required Qualifications:
- 2+ years of experience in NLP development using Python.
- Strong knowledge of NLP libraries such as spaCy and Transformers (Hugging Face).
- Solid understanding of text preprocessing, vectorization (TF-IDF, word embeddings), and classification techniques.
- Experience with machine learning libraries such as TensorFlow or PyTorch.
- Strong knowledge of hybrid approaches that combine LLMs/generative AI with traditional ML methods.
- Experience with PDF text extraction.
- Must currently hold, or be eligible to obtain, a Public Trust clearance.
Preferred Qualifications:
- Bachelor’s or Master’s degree in Data Science, Computational Linguistics, Machine Learning, Applied Mathematics, Statistics, Computer Science or a related field.
- Experience with large language models (LLMs) and prompt engineering.
- Knowledge of data privacy, redaction, and PII detection in text.
- Background in information retrieval or question-answering systems.
- Prior experience with government, legal, healthcare, or enterprise document processing.
- Experience working with cloud platforms (AWS, Azure, GCP) and containerization (Docker).
- Familiarity with REST APIs, FastAPI/Flask, and deploying models to production.
- Proficiency with version control (Git) and collaborative development workflows.