Note-2 Feature Engineering

What’s Feature Engineering

In machine learning applications and in data science, achieving better prediction or classification performance requires not only choosing the most suitable algorithm or model, but also using suitable features.

Definition from Wikipedia:

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.

In short, feature engineering means manually designing what the input x should be so that our models work well.
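As a small illustration of designing what the input x should be, here is a minimal Python sketch. The house-price task, the field names (`area_m2`, `rooms`, `built_year`, `sale_year`) and the function `construct_features` are all hypothetical, made up for this example; the point is only that domain knowledge turns raw fields into a feature vector.

```python
def construct_features(record: dict) -> list[float]:
    """Map one raw (hypothetical) house record to a hand-designed feature vector x."""
    area = float(record["area_m2"])
    rooms = int(record["rooms"])
    # Domain knowledge: the age of the house at sale matters more than the raw year.
    age_at_sale = record["sale_year"] - record["built_year"]
    # Domain knowledge: room size can matter as much as room count.
    area_per_room = area / rooms if rooms > 0 else 0.0
    return [area, float(rooms), float(age_at_sale), area_per_room]


raw = {"area_m2": 120, "rooms": 4, "built_year": 1995, "sale_year": 2020}
x = construct_features(raw)
print(x)  # [120.0, 4.0, 25.0, 30.0]
```

The model never sees `built_year` directly; it sees the derived age, which is the kind of choice feature engineering is about.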

Importance

The choice of features matters for our task:

  1. Better features give the model more flexibility.

  2. Suitable features let us use simpler models.

  3. Better features lead to better performance.
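To make points 2 and 3 concrete, here is a sketch using the classic XOR toy problem (my own choice of example, not from the reference): a linear perceptron cannot fit XOR with the raw inputs x1, x2, but it can once we add the hand-designed product feature x1 * x2. The helper names `predict`, `train_perceptron` and `accuracy` are hypothetical.

```python
def predict(w, b, x):
    """Linear threshold unit: 1 if w·x + b > 0, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def train_perceptron(X, y, epochs=100, lr=1.0):
    """Classic perceptron learning rule; stops early once an epoch has no mistakes."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            err = yi - predict(w, b, xi)
            if err != 0:
                mistakes += 1
                w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
                b += lr * err
        if mistakes == 0:
            break
    return w, b


def accuracy(X, y, w, b):
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


# XOR: the label is 1 exactly when the two inputs differ.
X_raw = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# Same data plus one hand-designed feature: the product x1 * x2.
X_eng = [[x1, x2, x1 * x2] for x1, x2 in X_raw]

w_raw, b_raw = train_perceptron(X_raw, y)
w_eng, b_eng = train_perceptron(X_eng, y)
print("raw features:       ", accuracy(X_raw, y, w_raw, b_raw))  # < 1.0, XOR is not linearly separable
print("engineered features:", accuracy(X_eng, y, w_eng, b_eng))  # 1.0
```

The model is the same simple linear classifier in both runs; only the input features change.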

Sub-questions of Feature Engineering

There are three main kinds of tasks in feature engineering:

  1. Feature Construction
    • Given a problem and a raw data set, constructing features from it using domain knowledge is what I call feature construction. In this process, we analyse the problem, convert it into a mathematical problem, and work out what data we need and how to tackle it.
  2. Feature Extraction
    • Extract features from the data set. For example, in document filtering or clustering tasks, we use the TF-IDF method to construct document/word vectors that capture the information in the documents. Similarly, in CNN applications, the kernels/filters in the convolutional layers extract features from images.
  3. Feature Selection
    • Choose the most suitable features to feed into our models, and ignore irrelevant ones.

These three tasks sometimes overlap, which can be confusing. This breakdown is simply the one that helps me understand them; feel free to use whichever framing helps you understand best.
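The TF-IDF example from the feature extraction task can be sketched in plain Python. This uses one common textbook variant (tf = term count / document length, idf = log(N / df)); libraries such as scikit-learn use smoothed and normalized variants, so their numbers will differ. The function name `tf_idf` and the toy documents are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where df(t) = number of documents containing t
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: each term counts at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = ["the cat sat", "the dog sat", "the cat ran"]
for vec in tf_idf(docs):
    print(vec)
```

A word like "the" that appears in every document gets weight 0, while "dog", which appears only in the second document, gets the highest weight there.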

How to Do It?

A data science pipeline basically follows these steps:

  1. understand the given task
  2. choose a data set
  3. pre-process the data set
  4. feature engineering (extract features)
  5. model the data
  6. analyse and evaluate
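The pipeline steps can be sketched end to end on a hypothetical toy data set. Everything here (the fruit data, the field names, and the nearest-centroid model) is an assumption chosen to keep the example short; step 1, understanding the task, happens in our heads rather than in code.

```python
import math

# Steps 1-2: the (hypothetical) task is to classify fruit as "apple" or "grape".
TRAIN = [
    {"weight_g": 150, "diameter_cm": 7.0, "label": "apple"},
    {"weight_g": 170, "diameter_cm": 7.5, "label": "apple"},
    {"weight_g": None, "diameter_cm": 7.2, "label": "apple"},  # missing value
    {"weight_g": 5, "diameter_cm": 1.8, "label": "grape"},
    {"weight_g": 6, "diameter_cm": 2.0, "label": "grape"},
]
TEST = [
    {"weight_g": 160, "diameter_cm": 7.3, "label": "apple"},
    {"weight_g": 4, "diameter_cm": 1.7, "label": "grape"},
]


def preprocess(rows):
    """Step 3: drop rows that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def extract_features(row):
    """Step 4: turn one cleaned row into a feature vector x."""
    return [row["weight_g"], row["diameter_cm"]]


def fit_centroids(X, y):
    """Step 5: nearest-centroid model, i.e. the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def evaluate(centroids, X, y):
    """Step 6: fraction of correct predictions on held-out data."""
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)


def run_pipeline(train_rows, test_rows):
    train, test = preprocess(train_rows), preprocess(test_rows)
    model = fit_centroids([extract_features(r) for r in train], [r["label"] for r in train])
    return evaluate(model, [extract_features(r) for r in test], [r["label"] for r in test])


print(run_pipeline(TRAIN, TEST))  # 1.0
```

Keeping each step in its own function makes it easy to swap in a different feature set or model without touching the rest of the pipeline.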

Feature engineering is one part of the work in a data science project.

Some ways to do feature engineering:

  • Brainstorm: come up with ideas for features that may be useful for the project
  • Design features
  • Choose features
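For the "choose features" step, one simple option is a filter method: keep only the features that are strongly correlated with the target. Below is a minimal sketch, assuming numeric features and a numeric target; the names `pearson` and `select_features`, the threshold of 0.5, and the data are all hypothetical. Pearson correlation only detects linear relationships, so a useful non-linear feature could be dropped.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(columns, target, threshold=0.5):
    """Keep features whose |correlation with the target| is at least threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


y = [1, 2, 3, 4, 5]
columns = {
    "size": [2, 4, 6, 8, 10],     # rises with the target
    "age": [10, 8, 6, 4, 2],      # falls with the target
    "noise": [3, 1, 4, 1, 5],     # weakly related
    "constant": [7, 7, 7, 7, 7],  # carries no information
}
print(select_features(columns, y))  # ['size', 'age']
```

Here "noise" (|r| ≈ 0.35) and "constant" (zero variance) are dropped, while "age" is kept because its correlation is strongly negative.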

(… TO BE CONTINUED)

Reference:

  1. Image and content ideas from this blog