← Back

Problem 3.2 Segmenting Text

Introduction

This problem requires segmenting a continuous stream of characters into meaningful English words using a given dictionary, or report that such a segmentation does not exist.

Such programs are often required in text recognition systems where they are used to decode and validate continuous input from OCR (Optical Character Recognition) software or from speech-to-text systems to produce accurate and meaningful text.

Problem statement

Given a continuous stream of characters, determine if the stream can be segmented into a sentence consisting of valid English words. The sentence does not have to make grammatical sense. It simply needs to be a sequence of valid words according to a dictionary. A dictionary of valid words is provided for reference.

Input:

Output (in two different stages):

It is a good idea to solve this problem in two separate stages.

In stage 1:

Write a function that returns:

  • true if the stream can be segmented into valid English words.
  • false otherwise.

In stage 2:

Write a function that returns:

  • true and a string decoded of a space-delimited sentence of valid words if the stream can be segmented into valid English words (according to the dictionary).
  • false and a string decoded which is empty if the stream cannot be segmented into valid English words (according to the dictionary).

Further details

Some examples of inputs and outputs are as follows:

Example 1:

  • Input: stream = "thisisasimplevalidstring"
  • Output: true, decoded = "this is a simple valid string"

Example 2:

  • Input: stream = "uuydhllhfis"
  • Output: false, decoded = ""

Example 3:

  • Input: stream = "thisisanalmostavalidbutinvavidstring"
  • Output: false, decoded = ""

Example 4:

  • Input: stream = "thequickbrownfoxjumpsoverthelazydog"
  • Output: true, decoded = "the quick brown fox jumps over the lazy dog"

Implementation

The following code contains an empty function (to be implemented by you), and a main that tests it.