Data Scientist

HeiTech Services
Location: Hyattsville, MD, USA
Published: 6/14/2022
Technology
Full Time

Job Description

We are looking for a highly skilled NLP Data Scientist / Developer to design and implement natural language processing solutions for real-world problems. You will extract insights from unstructured text data, build language models, and deploy intelligent applications that understand and process human language. This role blends data science, machine learning, and software development, with Python and LLMs at the core.


Key Responsibilities:

  • Develop and implement NLP pipelines to process, analyze, and extract insights from structured and unstructured text data.
  • Build and fine-tune models for text classification, named entity recognition, summarization, sentiment analysis, topic modeling, etc.
  • Work with state-of-the-art language models and NLP tooling (e.g., BERT/DeBERTa, spaCy, LLM APIs) and apply transfer learning techniques.
  • Clean, tokenize, and normalize large text corpora in various formats (PDFs, HTML, etc.).
  • Collaborate with cross-functional teams to integrate NLP features into software tools and customer-facing applications.
  • Create REST APIs or services to serve models in production using frameworks like FastAPI or Flask.
  • Optimize performance, accuracy, and scalability of NLP systems.
  • Document technical approaches, experiment results, and development procedures for internal and external stakeholders.


What We Offer:

  • Competitive salary and benefits package
  • Flexible remote work options
  • Access to GPU resources and cloud infrastructure
  • Opportunities to work on cutting-edge NLP problems
  • A collaborative, forward-thinking AI/ML team

Requirements:

Required Qualifications:

  • 2+ years of experience with NLP development and Python packages.
  • Strong knowledge of NLP libraries such as spaCy and Transformers (Hugging Face).
  • Solid understanding of text preprocessing, vectorization (TF-IDF, word embeddings), and classification techniques.
  • Experience with machine learning libraries like TensorFlow/PyTorch.
  • Strong knowledge of hybrid models that combine LLMs/generative AI with traditional ML approaches.
  • Experience with PDF text extraction.
  • Must currently possess or be eligible to obtain a Public Trust clearance.


Preferred Qualifications:

  • Bachelor’s or Master’s degree in Data Science, Computational Linguistics, Machine Learning, Applied Mathematics, Statistics, Computer Science or a related field.
  • Experience with LLMs (Large Language Models) and prompt engineering.
  • Knowledge of data privacy, redaction, and PII detection in text.
  • Background in information retrieval or question-answering systems.
  • Prior work with government, legal, healthcare, or enterprise document processing is a plus.
  • Experience working with cloud platforms (AWS, Azure, GCP) and containerization (Docker).
  • Familiarity with REST APIs, FastAPI/Flask, and deploying models to production.
  • Proficiency with version control (Git) and collaborative development workflows.