Monday, September 29, 2014

Typical phases of a Data Analytics project

 

a. Discovery

What question needs to be answered, and does an adequate data set exist to support a meaningful analysis?

b. Data acquisition

Here you gather and stage the data in your environment. Always try to start with raw data, that is, data that has not been processed in any way. If you cannot get the raw data, you need to understand how the data was transformed and/or processed.

c. Data Cleansing

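Raw data will almost always need to be cleansed. Tidy Data best practices should always be applied: each variable forms a column, each observation forms a row, and each type of observational unit forms its own table.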
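As a rough illustration of what tidying can look like, the sketch below reshapes a "wide" table into a "long" one with one observation per row. The records, the column names (`patient`, `visit_1`, `systolic`) and the helper `to_tidy` are all made up for this example; it uses only plain Python rather than any particular data library.

```python
# Hypothetical "wide" records: one row per patient, one column per visit.
wide_rows = [
    {"patient": "A", "visit_1": 120, "visit_2": 118},
    {"patient": "B", "visit_1": 135, "visit_2": 131},
]


def to_tidy(rows, id_key, variable_name, value_name):
    """Reshape wide rows into long rows: one observation per row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every tidy row
            tidy.append({id_key: row[id_key], variable_name: key, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, id_key="patient", variable_name="visit", value_name="systolic")
for tidy_row in tidy_rows:
    print(tidy_row)
```

Each patient/visit pair now sits on its own row, which tends to make later filtering, grouping and plotting simpler.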

d. Data Munging

Rename data variables, split a variable into two or more, reshape the data, look for outliers and remove them if necessary, etc. During this phase, the data is also split into “training” and “test” data sets, if applicable.
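The munging steps above might be sketched as follows. Everything here is an assumption made for illustration: the records, the field names (`nm`, `val`), the 1.5 × IQR rule for flagging outliers, and the 25% test fraction. Whether an outlier should actually be removed is a judgment call that depends on your data.

```python
import random
import statistics

# Hypothetical raw records with terse variable names and one suspicious value.
raw = [
    {"nm": "Ann Lee", "val": 10.0}, {"nm": "Bob Ray", "val": 11.0},
    {"nm": "Cat Orr", "val": 12.0}, {"nm": "Dan Poe", "val": 13.0},
    {"nm": "Eve Fox", "val": 14.0}, {"nm": "Flo Kim", "val": 15.0},
    {"nm": "Gus Day", "val": 16.0}, {"nm": "Hal Roy", "val": 17.0},
    {"nm": "Ida Moe", "val": 500.0},
]


def rename_and_split(records):
    """Give variables descriptive names and split the name into two variables."""
    out = []
    for rec in records:
        first, last = rec["nm"].split(" ", 1)
        out.append({"first_name": first, "last_name": last, "value": rec["val"]})
    return out


def drop_outliers(records, key, k=1.5):
    """Keep records whose value lies within k * IQR of the quartiles."""
    q1, _, q3 = statistics.quantiles([r[key] for r in records], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r[key] <= high]


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy reproducibly and hold out a fraction as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


clean = drop_outliers(rename_and_split(raw), key="value")
train, test = train_test_split(clean)
print(len(clean), len(train), len(test))
```

Fixing the seed keeps the split reproducible, so the same records end up in the test set every time the script is rerun.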

e. Data Exploration and Model Planning

Produce initial exploratory graphs, look for correlations in multivariate data, etc. This is the step where you begin to understand which types of models will be used to answer the questions asked of the data.
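One simple way to look for correlations is to compute the Pearson correlation coefficient for every pair of numeric variables. The sketch below does this in plain Python; the column names and values are invented for the example, and Pearson's r only captures linear relationships, so it complements exploratory graphs rather than replacing them.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical multivariate data, one list per variable.
columns = {
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 64, 70],
    "noise": [3, 1, 4, 1, 5],
}

# Compare every pair of variables once.
names = list(columns)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: r = {pearson(columns[a], columns[b]):.2f}")
```

Pairs with a coefficient near 1 or -1 are candidates for a closer look, for example with a scatter plot, before deciding on a model.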

f. Data Analysis and Model Building

The training data is analyzed with the selected models, results are recorded, and patterns take shape. Final models are then applied (when applicable) to the “test” data.
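To make the train-then-test idea concrete, here is a minimal sketch that fits a straight line to training data by ordinary least squares and then measures its error on held-out test data. The numbers are invented, and a simple linear model is only an assumption for the example; the right model depends on what the exploration phase suggested.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and observed values."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the model only ever sees the training portion while fitting.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
test_x = [7, 8]
test_y = [14.1, 15.8]

model = fit_line(train_x, train_y)
print("training MSE:", mean_squared_error(model, train_x, train_y))
print("test MSE:", mean_squared_error(model, test_x, test_y))
```

A test error far larger than the training error is a hint that the model may be overfitting, which is exactly the kind of finding to record at this stage.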

g. Interpretation and scrutiny of results

During this phase, ask yourself whether the results make sense, whether the correct models were applied, and whether the test data set was complete. In short: “are these results valid?” Always question the outcome of your analysis.

h. Analysis write-up

Write up a clear and methodical report of the sequence of steps that culminated in the final analysis. You do not need to include every single step of your analysis; include only those steps that tell a complete but compact story. It is critical that the write-up is written for its intended audience. If the audience is non-technical, ensure that the message is written in language they can easily digest. If the audience is technical, such as data scientists, ensure that the details of your modeling are described. Know your audience and write to them in a way that ensures your work is understood.

i. Operationalize

Produce your report, scripts, code, and supporting technical documentation.  Run a pilot experiment and implement your models in a production environment.

Source:
