Channel: KNIME news, usage, and development

Will They Blend? Experiments in Data & Tool Blending. Today: Teradata Aster meets KNIME Table. What is that chest pain?


In this blog series we’ll be experimenting with the most interesting blends of data and tools. Whether it’s mixing traditional sources with modern data lakes, open-source devops on the cloud with protected internal legacy tools, SQL with noSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT sensor data with idle chatting, we’re curious to find out: will they blend? Want to find out what happens when IBM Watson meets Google News, Hadoop Hive meets Excel, R meets Python, or MS Word meets MongoDB?

Follow us here and send us your ideas for the next data blending challenge you’d like to see at willtheyblend@knime.com.

Today: Teradata Aster meets KNIME Table. What is that chest pain?

Author: Kate Phillips, Data Scientist, Analytics Business Consulting Organization, Teradata

The Challenge

Today’s challenge is related to the healthcare industry. You know that little pain in the chest you sometimes feel and you do not know whether to run to the hospital or just wait until it goes away? Would it be possible to recognize as early as possible just how serious an indication of heart disease that little pain is?

The goal of this experiment is to build a model that predicts whether or not a particular patient with that chest pain does indeed have heart disease.

To investigate this topic, we will use open-source data obtained from the University of California Irvine Machine Learning Repository, which can be downloaded from http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/. Of the data sets in this repository, we will use the processed Switzerland, Cleveland, and VA data sets and the reprocessed Hungarian data set.

These data were collected from 920 cardiac patients: 725 men and 193 women aged between 28 and 77 years old; 294 from the Hungarian Institute of Cardiology, 123 from the University Hospitals in Zurich and Basel, Switzerland, 200 from the V.A. Medical Center in Long Beach, California, and 303 from the Cleveland Clinic in Ohio.

Each patient is represented through a number of demographic and anamnestic values, angina descriptive fields, and electrocardiographic measures (http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/heart-disease.names).

In the dataset, each patient's condition is classified into 5 levels according to the severity of his/her heart disease. We simplified this classification system by transforming it into a binary class system: 1 means heart disease was diagnosed, 0 means no heart disease was found.
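The binary transformation can be sketched in a few lines of Python. This is only an illustration, not the transformation used in the KNIME workflow. It assumes the five levels are coded as the integers 0 to 4, with 0 meaning no disease and 1 to 4 meaning increasing severity; the function name `to_binary_class` is made up for this example.

```python
def to_binary_class(severity: int) -> int:
    """Collapse a severity level into 0 (no heart disease) or 1 (heart disease).

    Assumption: levels are coded 0-4, where 0 means no disease was found.
    """
    if severity not in range(5):
        raise ValueError(f"expected a severity level from 0 to 4, got {severity!r}")
    return 1 if severity > 0 else 0


# One example value per severity level.
levels = [0, 1, 2, 3, 4]
binary = [to_binary_class(level) for level in levels]
print(binary)
```

Rejecting values outside 0 to 4 keeps missing or miscoded entries from silently ending up in the "disease" class.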

This is not the first time we have run this experiment. Not so long ago, we built a Naïve Bayes KNIME model on the same data to solve the same problem. Today we want to build a logistic regression model and see whether it improves on the performance of the Naïve Bayes model.

Original patient data are stored in a Teradata database. The predictions from the old Naïve Bayes model are stored in a KNIME Table.

Teradata Aster is a proprietary database system that may be in use at your company/organization. It is designed to enable multi-genre advanced data transformation on massive amounts of data. If your company/organization is a Teradata Aster customer, you can obtain the JDBC driver that interfaces with KNIME by contacting your company’s/organization’s Teradata Aster account executive.

Table format is a KNIME proprietary format that stores data efficiently, in terms of both size and retrieval speed, and completely, that is, including the metadata that describe their structure. This leads to smaller local files, faster reading, and minimal configuration settings. In fact, the Table Reader node, which reads such Table files, only needs the file path and retrieves all other necessary information from the metadata saved in the file itself. Files saved in KNIME Table format carry the extension “.table”.

Teradata Aster on one side, KNIME Table formatted file on the other side. The question, as usual, is: Will they blend? Let’s find out.

Topic. Predicting heart disease. Is this chest pain innocuous or serious?

Challenge. Blend data from a Teradata Aster system with data from a KNIME .table file. Build a predictive model to establish the presence or absence of heart disease.

Access Mode. Database Connector node with Teradata JDBC driver to retrieve data from the Teradata database. Table Reader node to read KNIME Table formatted files.
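Conceptually, blending the two sources means matching each patient record from the database with the old prediction stored for the same patient. In the workflow this is done with KNIME nodes rather than code, and the post does not say which column is used for the match. The Python sketch below therefore assumes a shared `patient_id` key; the column names and the toy rows are invented purely for illustration.

```python
def blend(patients, old_predictions, key="patient_id"):
    """Inner-join two lists of row dicts on a shared key column.

    Assumption: both sources carry the same identifier column (hypothetical).
    """
    by_key = {row[key]: row for row in old_predictions}
    blended = []
    for row in patients:
        match = by_key.get(row[key])
        if match is not None:
            # Merge the patient row with its stored prediction.
            blended.append({**row, **match})
    return blended


# Toy rows standing in for the database table (invented values).
patients = [
    {"patient_id": 1, "heart_disease": 1},
    {"patient_id": 2, "heart_disease": 0},
    {"patient_id": 3, "heart_disease": 1},
    {"patient_id": 4, "heart_disease": 0},
]

# Toy rows standing in for the .table file with old predictions (invented values).
old_predictions = [
    {"patient_id": 1, "nb_prediction": 1},
    {"patient_id": 2, "nb_prediction": 1},
    {"patient_id": 3, "nb_prediction": 1},
]

rows = blend(patients, old_predictions)

# Share of blended rows where the old prediction matches the diagnosis.
agreement = sum(r["heart_disease"] == r["nb_prediction"] for r in rows) / len(rows)
print(len(rows), round(agreement, 3))
```

An inner join is used here so that only patients present in both sources are compared; a patient without a stored prediction (id 4 above) is simply left out.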
