The CTAP system is designed to address the issues reviewed in the previous section. The goal is a system that supports complexity analysis in an easy-to-use, platform-independent, flexible, and extensible environment. The system consists of four major user modules (Corpus Manager, Feature Selector, Analysis Generator, and Result Visualizer) as well as an administrative Feature Importer module.
The Corpus Manager helps users manage the language materials to be analyzed. Users can create corpora to hold texts, folders to group corpora, and tags to label specific texts. The tags can then be used to filter and select target texts for analysis, and to group texts when visualizing results.
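The relationship between folders, corpora, texts, and tags can be illustrated with a minimal sketch. It is not CTAP's actual data model: the class names `Text`, `Corpus`, and `Folder`, the `select` method, and the sample essays are all hypothetical and serve only to show how tags could drive text selection.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Text:
    """A single text with free-form tags (hypothetical structure)."""
    title: str
    content: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Corpus:
    """A named collection of texts."""
    name: str
    texts: list[Text] = field(default_factory=list)

    def select(self, *required_tags: str) -> list[Text]:
        """Return the texts that carry every one of the required tags."""
        wanted = set(required_tags)
        return [t for t in self.texts if wanted <= t.tags]


@dataclass
class Folder:
    """A named group of corpora."""
    name: str
    corpora: list[Corpus] = field(default_factory=list)


# Invented sample data, for illustration only.
essays = Corpus("learner essays", [
    Text("essay-01", "I like cats.", {"A2", "narrative"}),
    Text("essay-02", "Although it rained, we went out.", {"B1", "narrative"}),
    Text("essay-03", "Taxes should be lower.", {"B1", "argumentative"}),
])
project = Folder("pilot study", [essays])

# Filter the corpus by one tag, then by a combination of tags.
b1_titles = [t.title for t in essays.select("B1")]
b1_narrative_titles = [t.title for t in essays.select("B1", "narrative")]
```

In this sketch a text is selected only when it carries all of the requested tags; other matching policies (for example, any of the tags) would be equally plausible.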
The Feature Selector enables users to group the complexity features they select into feature sets. This flexibility is realized with the Unstructured Information Management Architecture (UIMA) framework provided by the Apache Software Foundation. With UIMA, every complexity feature can be implemented as an Aggregate Analysis Engine (AAE) that chains together a series of primitive Analysis Engines (AEs). An AE may be a general-purpose NLP component, such as a sentence segmenter, parser, or POS tagger, or it may calculate complexity feature values from the analysis results of upstream AEs or components. This setup enables and encourages the reuse of AEs and analysis components, making collaborative development of complexity feature extractors easier and faster.
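The idea of chaining reusable engines can be sketched in a few lines. The sketch does not use the UIMA API: a plain dictionary stands in for UIMA's shared analysis structure, and the function names (`sentence_segmenter`, `tokenizer`, `aggregate`, `analyze`) and the two example features are assumptions made for illustration.

```python
import re


# Primitive engines: each reads from and writes to a shared analysis
# dictionary, which loosely plays the role of UIMA's shared analysis structure.
def sentence_segmenter(cas):
    parts = re.split(r"(?<=[.!?])\s+", cas["text"].strip())
    cas["sentences"] = [s for s in parts if s]


def tokenizer(cas):
    cas["tokens"] = re.findall(r"[A-Za-z']+", cas["text"])


# Feature engines: compute a complexity value from upstream annotations.
def mean_sentence_length(cas):
    value = len(cas["tokens"]) / len(cas["sentences"])
    cas["features"]["mean_sentence_length"] = value


def type_token_ratio(cas):
    types = {t.lower() for t in cas["tokens"]}
    cas["features"]["type_token_ratio"] = len(types) / len(cas["tokens"])


def aggregate(*engines):
    """Chain primitive engines into one aggregate engine."""
    def run(cas):
        for engine in engines:
            engine(cas)
    return run


# Two aggregate engines that reuse the same tokenizer component.
msl_engine = aggregate(sentence_segmenter, tokenizer, mean_sentence_length)
ttr_engine = aggregate(tokenizer, type_token_ratio)


def analyze(text, *feature_engines):
    """Run each selected feature engine over one text."""
    cas = {"text": text, "features": {}}
    for engine in feature_engines:
        engine(cas)
    return cas["features"]


features = analyze("The cat sat. The dog barked loudly.", msl_engine, ttr_engine)
```

Both aggregates reuse the same `tokenizer` component, which is the kind of reuse the design is meant to encourage. Note that this naive sketch runs the tokenizer once per aggregate; a real pipeline could avoid the repeated work.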
Users generate analyses in CTAP’s Analysis Generator. Each analysis extracts a set of features from a designated corpus. The results are then persisted to the system database and may be downloaded to the user’s local machine for further processing. The user can also explore them with CTAP’s Result Visualizer. Because the UIMA framework supports parallel computing, analyses can scale out to handle big data analysis needs.
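An analysis, understood as a feature set applied to a corpus, might be sketched as follows. The two toy extractors, the sample corpus, the long-format result rows, and the CSV export are all assumptions for illustration; the text does not specify CTAP's result schema or download format.

```python
import csv
import io


# Hypothetical feature extractors, each mapping a text to one value.
def word_count(text):
    return len(text.split())


def mean_word_length(text):
    words = text.split()
    return sum(len(w) for w in words) / len(words)


feature_set = {"word_count": word_count, "mean_word_length": mean_word_length}

# Invented sample corpus: text id -> content.
corpus = {
    "text-01": "A short text",
    "text-02": "Another slightly longer text here",
}


def run_analysis(corpus, feature_set):
    """Apply every feature in the set to every text in the corpus."""
    rows = []
    for text_id, content in corpus.items():
        for feature_name, extractor in feature_set.items():
            rows.append({
                "text_id": text_id,
                "feature": feature_name,
                "value": extractor(content),
            })
    return rows


def to_csv(rows):
    """Serialize result rows as CSV, as a stand-in for a downloadable file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["text_id", "feature", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = run_analysis(corpus, feature_set)
export = to_csv(rows)
```

One row per text and feature keeps the result format independent of which features a user happens to select, which is one plausible reason to prefer a long layout over a wide one.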
The Result Visualizer is a simple and intuitive module that plots analysis results so that the user can inspect preliminary findings. It supports basic plot manipulation and download.