
BIOF 021 | R for Analysis of Text Data

May 4, 2021 to May 18, 2021

5:30pm - 8:00pm

Registration occurs on a first-come, first-served basis. The deadline for registration is one week before the first day of the course. If you are unable to register before the deadline, please email: or call 301-496-7977 to check space availability.

NIH Fellows or NIH community members who are being sponsored by their lab and are awaiting payment authorization can tentatively hold a seat using the “Reserve A Seat” option. FAES must receive payment within 7 business days of the reservation or 3 business days before the start of the workshop, whichever comes first. If payment is not received within this time frame, your reservation will be canceled.

Register Now

NIH Only: Reserve a Seat

This workshop will be held live on May 4, 11, and 18, 2021 (5:30pm - 8:00pm).

This workshop will provide an introduction to working with text data in R and explore various approaches to analyzing text data. The first session will cover principles for wrangling text data as well as some basic text mining applications. The subsequent two sessions will delve into specific techniques to enable automated analysis of text data. The workshop will include three two-hour sessions with the following learning objectives:

  •       After Session 1: Introduction to Working with Text, learners will be able to:
    •    Read text data into R and prepare it for analysis;
    •    Understand and select from various options in preparing text, such as stemming, lemmatization, term frequency weighting, term frequency-inverse document frequency weighting (tf-idf), and tokenization;
    •    Conduct simple text mining to explore content of a text corpus.
  •       After Session 2: Unsupervised Approaches to Text Analysis, learners will be able to:
    •    Describe how unsupervised approaches can be used to identify clusters of related documents;
    •    Process text data to prepare for unsupervised analysis;
    •    Build, train, and evaluate models for text clustering;
    •    Interpret outputs of clustering algorithms.
  •       After Session 3: Supervised Approaches to Text Analysis, learners will be able to:
    •    Describe how supervised approaches can be used to develop text-based models for multi-class classification;
    •    Process text data to prepare for supervised analysis;
    •    Build, train, and test models for text classification.
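
As a rough illustration of two of the Session 1 topics (tokenization and tf-idf weighting), the sketch below uses base R only. It is not taken from the workshop materials: the three toy documents, the variable names, and the particular tf-idf variant (raw term counts multiplied by the natural log of N / document frequency) are all assumptions made for this example, and other weighting conventions are common.

```r
# Toy corpus: three hypothetical documents (not course materials).
docs <- c(
  d1 = "Gene expression data in R",
  d2 = "Text data in R",
  d3 = "Clustering gene expression profiles"
)

# Tokenization: lowercase, split on anything that is not a letter,
# and drop empty strings.
tokens <- lapply(tolower(docs), function(x) {
  parts <- strsplit(x, "[^a-z]+")[[1]]
  parts[nzchar(parts)]
})
names(tokens) <- names(docs)

# Vocabulary: every distinct token in the corpus.
vocab <- sort(unique(unlist(tokens)))

# Term-frequency matrix (documents x terms), filled with raw counts.
tf <- matrix(0, nrow = length(docs), ncol = length(vocab),
             dimnames = list(names(docs), vocab))
for (d in names(tokens)) {
  counts <- table(tokens[[d]])
  tf[d, names(counts)] <- as.numeric(counts)
}

# Inverse document frequency: log(N / number of documents containing the term).
doc_freq <- colSums(tf > 0)
idf <- log(nrow(tf) / doc_freq)

# tf-idf: scale each term's column by its idf weight.
tfidf <- sweep(tf, 2, idf, `*`)

print(round(tfidf, 2))
```

Under this weighting, a term that appears in only one document (such as "text") receives a higher weight than a term shared by two documents (such as "data"), and a term that does not occur in a document has a weight of zero there.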


Prerequisite: Basic familiarity with R


Introductory coding courses and workshops

BIOF 017 | Introductory R Boot Camp
BIOF 020 | Python for Beginners
BIOF 043 | For True BeginRs
BIOF 101 | Introductory Coding Skills

Simultaneous access to two screens is highly recommended for the best learning experience. Examples include one computer with two screens, two computers, or one laptop and one tablet.

General Training Rate

Discounted Training Rate
$625.00 - NIH Community (Trainees, Employees, Contractors, Volunteers, etc.)
$695.00 - Academia, US Government (Non-NIH), US Military

Technology Fee

Although no grades are given for courses, each participant will receive Continuing Education Units (CEUs) based on the number of contact hours. One CEU is equal to ten contact hours. Upon completion of this course, each participant will receive a certificate showing completion of the workshop and 0.7 CEUs.

Refund Policy
100% tuition refund for registrations cancelled 14 or more calendar days prior to the start of the workshop.

50% tuition refund for registrations cancelled between 4 and 13 calendar days prior to the start of the workshop.

No refund will be issued for registrations cancelled 3 or fewer calendar days prior to the start of the workshop.

All cancellations must be received in writing via email to Ms. Carline Coote at

Cancellations received after 4:00 pm (ET) on business days, or on non-business days, are treated as received on the following business day.

All refund payments will be processed by the start of the initial workshop.