Distil

Distil, a system developed by Uncharted Software based on research funded by the D3M program, is a mixed-initiative modeling workbench that enables subject matter experts to discover underlying dynamics of complex systems and generate data-driven models. To maximize the combinatorial power of human/machine intelligence, Distil incorporates semantic data discovery, enrichment, analytic model recommendation and automated visualization to facilitate understanding of data and models.

Through Distil, subject matter experts (SMEs) visually explore and understand heterogeneous data sources related to their analytic objectives, express those objectives using an intuitive visual vocabulary, and interact with, understand, curate and refine the resulting machine-inferred data models. Distil focuses on visually decomposing a question into quantifiable facets. Recommender services then compose these facets into user-tailorable analytic workflows by interfacing with model construction components.

Design philosophy

Distil has been designed to empower domain experts with the following principles:

Features

Data Exploration. Search for keywords, values or features in available datasets and models or upload a custom CSV/ZIP. Use natural language queries to describe goals. For the selected dataset, correct automatically inferred feature data types and choose the feature for which you want to build predictive models. Augment features to build complex timeseries or geocoordinate features.

Select Model Features. Choose features that may predict the selected target. Interactive highlighting reveals relationships between features. Automatic variable ranking sorts features by importance to the target. Data clustering groups data to reveal outliers and commonalities. Augment with other sources via automatically suggested joins. Exclude noisy or irrelevant data samples to get the best results.
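As a rough illustration of automatic variable ranking, the sketch below scores each numeric feature by the absolute Pearson correlation between its values and the target, then sorts features from strongest to weakest. This is an assumption made for illustration only: this page does not say which ranking method Distil uses, and the function names (pearson, rank_features) and the sample data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target):
    """Return (feature, score) pairs sorted by |correlation with target|, strongest first.

    Hypothetical stand-in for a variable-ranking step; not Distil's actual method.
    """
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up example data: three candidate features and a target column.
columns = {
    "area_sqft": [800, 1200, 1500, 2000, 2400],
    "num_rooms": [2, 3, 3, 5, 4],
    "lot_id": [7, 3, 9, 1, 5],
}
target = [160, 240, 300, 400, 480]

ranking = rank_features(columns, target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Correlation only captures linear, one-feature-at-a-time relationships, so a real ranking step would likely need a measure that also handles categorical features and nonlinear effects.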

Check Models. Review classification, regression and timeseries forecasting model results. Features are ranked by their contribution to the model as a whole and by their importance to individual predictions. Compare results with ground truth and select variables or predictions to understand how samples impact accuracy. Iteratively build models and apply them to new data or forecast beyond the dataset.

Remote Sensing. Train a classifier to assign labels to multi-spectral satellite images to solve problems such as land use. Perform interactive image similarity searches to generate a ranked list of images that match an input set. View results on a map, ranked by similarity or confidence.
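One common way to produce a ranked list of images that match an input set is to compare feature vectors. The sketch below averages the vectors of the input images into a single query vector and ranks every catalog image by cosine similarity to it. It is a minimal sketch under stated assumptions: this page does not describe how Distil computes similarity, and the three-element vectors, tile names and function names here are hypothetical placeholders for features that a real system would extract from the imagery.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vectors, catalog):
    """Rank catalog images by cosine similarity to the centroid of the query vectors.

    Hypothetical illustration; not Distil's actual similarity search.
    """
    dims = len(query_vectors[0])
    centroid = [
        sum(vec[i] for vec in query_vectors) / len(query_vectors)
        for i in range(dims)
    ]
    scored = [(image_id, cosine(centroid, vec)) for image_id, vec in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up feature vectors standing in for image embeddings.
catalog = {
    "tile_forest": [0.9, 0.1, 0.0],
    "tile_urban": [0.1, 0.9, 0.2],
    "tile_mixed": [0.6, 0.5, 0.1],
}
query = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]

results = rank_by_similarity(query, catalog)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

Averaging the input set into one centroid keeps the example short. It would blur an input set that mixes very different kinds of imagery, where scoring against each input image separately may work better.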

Demo

The following demo video shows how Distil’s remote sensing capabilities can be used to train a classifier that predicts land use.

Repositories

Distil was developed as part of DARPA’s D3M ecosystem, and consists of the following main repositories:

Contributors

Uncharted Software Inc.® is a leading provider of innovative visual analytics software solutions for Fortune 500 companies, federal government agencies, and third-party software firms. We believe that the right visualization can have a profound impact on people’s ability to explore, assimilate, understand, and create value from large amounts of data. Uncharted led the development of the Distil platform, providing the user-facing application and data server, as well as participating in the development of the machine learning components contributed by our partners.

KUNGFU.AI is a leading-edge AI professional services firm based in Austin, TX. We build robust, scalable state-of-the-art AI solutions and maintain the models in production in our clients' environments. Our key AI/ML capability areas include computer vision, natural language processing, and predictive analytics. As a subcontractor to Uncharted Software under the DARPA D3M program, KUNGFU has developed a robust set of machine learning primitives, with a particular focus on multivariate time series forecasting and remote sensing using multispectral satellite imagery.

Jataware is a research and development company focused on software engineering, data science, machine learning and high performance computing. We provide technology consulting services and digital solutions for a wide range of problem sets within government and commercial spaces. As a member of Uncharted's D3M team, Jataware has developed components for: computer vision tasks operating on standard image and multispectral satellite image data; time series, audio and text classification problems; and graph analytics.

Qntfy is a technology solutions provider bridging data science and human behavior. We make complex psychological and behavioral data accessible, scalable and actionable for both individuals and organizations. Qntfy and Uncharted worked together to build a data-driven platform to automatically find and tune a machine learning pipeline given an end user’s dataset, supporting tabular, time series and image domains.