Data Scientist

HeiTech Services

Hyattsville, MD, USA

Published: 6/14/2022

Technology

Full Time

Job Description

Description:

We are looking for a highly skilled NLP Data Scientist / Developer to design and implement natural language processing solutions for real-world problems. You will extract insights from unstructured text data, build language models, and deploy intelligent applications that understand and process human language. This role blends data science, machine learning, and software development, with Python and LLMs at the core.

Key Responsibilities:

Develop and implement NLP pipelines to process, analyze, and extract insights from structured and unstructured text data.
Build and fine-tune models for text classification, named entity recognition, summarization, sentiment analysis, topic modeling, etc.
Work with state-of-the-art language models and NLP tooling (e.g., BERT/DeBERTa, spaCy, LLM APIs) and apply transfer learning techniques.
Clean, tokenize, and normalize large text corpora in various formats (PDFs, HTML, etc.).
Collaborate with cross-functional teams to integrate NLP features into software tools and customer-facing applications.
Create REST APIs or services to serve models in production using frameworks like FastAPI or Flask.
Optimize performance, accuracy, and scalability of NLP systems.
Document technical approaches, experiment results, and development procedures for internal and external stakeholders.

What We Offer:

Competitive salary and benefits package
Flexible remote work options
Access to GPU resources and cloud infrastructure
Opportunities to work on cutting-edge NLP problems
A collaborative, forward-thinking AI/ML team

Requirements:

Required Qualifications:

2+ years of experience with NLP development and Python packages.
Strong knowledge of NLP libraries such as spaCy and Transformers (Hugging Face).
Solid understanding of text preprocessing, vectorization (TF-IDF, word embeddings), and classification techniques.
Experience with machine learning libraries like TensorFlow/PyTorch.
Strong knowledge of hybrid models that combine LLMs/generative AI with traditional ML approaches.
Experience with PDF text extraction.
Must currently possess or be eligible to obtain a Public Trust clearance.

Preferred Qualifications:

Bachelor’s or Master’s degree in Data Science, Computational Linguistics, Machine Learning, Applied Mathematics, Statistics, Computer Science or a related field.
Experience with LLMs (Large Language Models) and prompt engineering.
Knowledge of data privacy, redaction, and PII detection in text.
Background in information retrieval or question-answering systems.
Prior work with government, legal, healthcare, or enterprise document processing is a plus.
Experience working with cloud platforms (AWS, Azure, GCP) and containerization (Docker).
Familiarity with REST APIs, FastAPI/Flask, and deploying models to production.
Proficiency with version control (Git) and collaborative development workflows.