Data Analytics
Analytics is the science of analysis, in which statistics, data mining, computer technology, and similar tools are applied to data. Analysis is the process of breaking a complex object or data set down into simpler, more compact forms that are easier to understand.
It aims at discovering patterns of variation in the given data.
It helps in understanding the future from past data, along with the uncertainty related to the business.
It is a sophisticated process that uses statistical, mathematical, and economic models to predict the future and prescribe strategies.
The processes include:
- Gather Data
- Organize Data
- Analyze Data
Stages of Analytics:
- Descriptive Analytics (Information) ~ How many students dropped out last year?
- Diagnostic Analytics (Insight) ~ Why has the dropout rate increased in the last year?
- Predictive Analytics (Insight) ~ Which students are more likely to drop out?
- Prescriptive Analytics (Decision) ~ Which students should I target to keep them from dropping out?
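The first two stages can be sketched in a few lines of Python. This is only an illustration: the records, the `attendance_band` field, and both function names are hypothetical and do not come from any real data set.

```python
# Hypothetical student records: (year, attendance_band, dropped_out).
records = [
    (2022, "low", True),
    (2022, "high", False),
    (2022, "high", False),
    (2023, "low", True),
    (2023, "low", True),
    (2023, "high", False),
]

def dropout_count(rows, year):
    """Descriptive: how many students dropped out in a given year?"""
    return sum(1 for y, _, dropped in rows if y == year and dropped)

def dropout_rate_by_band(rows, year):
    """Diagnostic: break the dropout rate down by attendance band."""
    totals, dropped_totals = {}, {}
    for y, band, dropped in rows:
        if y != year:
            continue
        totals[band] = totals.get(band, 0) + 1
        dropped_totals[band] = dropped_totals.get(band, 0) + int(dropped)
    return {band: dropped_totals[band] / totals[band] for band in totals}

print(dropout_count(records, 2023))
print(dropout_rate_by_band(records, 2023))
```

Predictive and prescriptive analytics would build on the same records, for example by fitting a model to estimate each student's dropout risk and then ranking students for intervention.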
Popular Tools used in Analytics
- R
- Revolution R
- R Studio
- Tableau
- SAP HANA
- Weka
- KXEN
- SAS
Role of a data scientist:
- Be inquisitive: look at data and spot trends.
- Uncover stories hidden in the data that help create more useful insights and solve business problems.
- Work in sync with application developers to obtain relevant data for analysis.
- Make an analytical plan in such a way that the results satisfy the business needs.
- Come up with an effective data mining architecture and prepare suitable models.
- Respond to and resolve data mining performance issues.
- Generate reports that are affordable from a business perspective.
Data Analytics Methodology
- Discovery
- Data Preparation
- Model Planning
- Deliver Results
- Put into use
Problem Definition:
- What is the problem?
- What is it not?
- We have this problem because?
- We don't have a solution because?
Defining a problem:
- State the problem in a general way.
- Understand the nature of the problem.
- Survey the available literature.
- Hold discussions to develop ideas.
- Rephrase the problem into a working proposition.
Types of Data
- Qualitative Data
  - Data expressed as groups or categories
  - Descriptive data
  - e.g. Dividing a population into high, medium, and low height groups.
- Quantitative Data
  - Data expressed as numbers
  - Definitive data
  - e.g. The height of a person
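The two types are related: a quantitative measurement can be turned into a qualitative category by binning. The sketch below illustrates this with heights. The function name `height_group` and the cutoff values are assumptions chosen for the example, not values from the text.

```python
def height_group(height_cm, low_cutoff=160.0, high_cutoff=175.0):
    """Map a quantitative height (in cm) to a qualitative group.

    The cutoffs are illustrative assumptions only.
    """
    if height_cm < low_cutoff:
        return "low"
    if height_cm < high_cutoff:
        return "medium"
    return "high"

heights_cm = [152.0, 168.5, 181.2, 175.0]        # quantitative data
groups = [height_group(h) for h in heights_cm]   # qualitative data
print(groups)
```

Note that binning loses information: the groups can be derived from the heights, but the heights cannot be recovered from the groups.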
Summarizing Data
- Summarizing is the process of converting large amounts of raw data into a format that can be easily analyzed.
- Summaries differ by type of data and can be descriptive or graphical.
- Numeric Data - Descriptive
  - Mean
  - Median
  - Mode
- Numeric Data - Graphical
  - Box Plots
  - Histograms
- Categorical Data - Descriptive
  - Frequency distribution tables
- Categorical Data - Graphical
  - Bar Charts
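As a rough illustration of these summaries, the sketch below uses only Python's standard library. The sample values, the variable names, and the `text_bar_chart` helper are made up for the example. A plotting library would normally draw the charts; here the bar chart is approximated in plain text.

```python
from collections import Counter
from statistics import mean, median, mode, quantiles

# Hypothetical numeric data (e.g. test scores); not taken from the text.
scores = [2, 3, 3, 5, 7]

# Descriptive summary of numeric data.
numeric_summary = {
    "mean": mean(scores),
    "median": median(scores),
    "mode": mode(scores),
}

# Quartiles, minimum, and maximum are the values a box plot is drawn from.
q1, q2, q3 = quantiles(scores, n=4)
box_plot_inputs = {"min": min(scores), "q1": q1, "median": q2,
                   "q3": q3, "max": max(scores)}

# Hypothetical categorical data and its frequency distribution table.
grades = ["A", "B", "A", "C", "A", "B"]
frequency_table = Counter(grades)

def text_bar_chart(freq):
    """Render a frequency table as a crude text bar chart, one bar per category."""
    return [f"{label} {'#' * count}" for label, count in sorted(freq.items())]

print(numeric_summary)
print(box_plot_inputs)
for line in text_bar_chart(frequency_table):
    print(line)
```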
Data Collection
- Collect Relevant Data: the process of collecting relevant data that aids in solving the problem statement.
- Categorize the Data: the data collection process needs to be defined and systematic.
- Organize the Data: observations need to be recorded and organized for optimal usefulness.
Data Collection Methods
Data collection methods fall broadly into two categories: primary and secondary.
- Primary
  - Observation - Measuring the data and its various attributes
  - Experiment - Subjects are divided into groups
  - Surveys - Questions and interviews help in gathering feedback and in studying the characteristics of the population.
- Secondary
  - Data that has already been gathered before the study and is available as published facts and reports.
Data Dictionary
It includes details like:
- Number of records
- Name of each field
- Characteristics of each field
- Description of each field
- Relationship between different fields
It helps in analyzing the different data variables and the relationships between them.
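A very small data dictionary can be derived directly from the records. The sketch below is one possible layout, not a standard format: the sample records, the function name `build_data_dictionary`, and the keys of the returned dictionary are all assumptions for illustration. Field descriptions and relationships between fields usually have to be written by hand, so they are left out.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"student_id": 1, "height_cm": 152.0, "group": "low"},
    {"student_id": 2, "height_cm": 168.5, "group": "medium"},
    {"student_id": 3, "height_cm": None, "group": "medium"},
]

def build_data_dictionary(rows):
    """Collect the record count plus name and basic characteristics of each field."""
    fields = {}
    for row in rows:
        for name, value in row.items():
            info = fields.setdefault(
                name, {"types": set(), "missing": 0, "distinct": set()}
            )
            if value is None:
                info["missing"] += 1
            else:
                info["types"].add(type(value).__name__)
                info["distinct"].add(value)
    return {
        "record_count": len(rows),
        "fields": {
            name: {
                "types": sorted(info["types"]),
                "missing": info["missing"],
                "distinct_values": len(info["distinct"]),
            }
            for name, info in fields.items()
        },
    }

data_dictionary = build_data_dictionary(records)
print(data_dictionary)
```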
Outliers and their treatment
- An outlier is a point or observation that deviates significantly from the other observations.
- Outliers occur due to experimental errors or "special circumstances".
- Outlier detection tests are used to check for outliers.
- Outlier treatment
  - Retention
  - Exclusion
  - Other treatment methods
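The notes do not name a specific detection test. One commonly used rule is Tukey's interquartile range (IQR) fences, sketched below as an example. The sample data, the function names, and the multiplier `k=1.5` are assumptions for illustration. Quartile values depend on the method used to compute them, so other tools may place the fences slightly differently.

```python
from statistics import quantiles

def iqr_fences(values, k=1.5):
    """Return the (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, k=1.5):
    """Separate values inside the fences from those flagged as outliers."""
    low, high = iqr_fences(values, k)
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers

# Hypothetical measurements with one suspicious value.
data = [10, 12, 11, 13, 12, 11, 95]
kept, outliers = split_outliers(data)

print(outliers)  # candidates for review
print(kept)      # the data set under the "exclusion" treatment
```

Flagging a value does not decide its treatment. Under retention the full `data` list is analyzed as is. Under exclusion only `kept` is used. The choice depends on whether the outlier is an error or a genuine special circumstance.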