Machine Learning – Dimensionality Reduction Cognitive Class Exam Quiz Answers

Clear My Certification, January 12, 2024, Cognitive Class

Machine Learning – Dimensionality Reduction Cognitive Class Certification Answers

Module 1: Data Series Quiz Answers – Cognitive Class

Question 1: Which of the following techniques can be used to reduce the dimensions of the population?

Exploratory Data Analysis
Principal Component Analysis
Exploratory Factor Analysis
Cluster Analysis

Question 2: Cluster Analysis partitions the columns of the data, whereas principal component and exploratory factor analyses partition the rows of the data. True or false?

False
True

Question 3: Which of the following options are true? Select all that apply.

PCA explains the total variance
EFA explains the common variance
EFA identifies measures that are sufficiently similar to each other to justify combination
PCA captures latent constructs that are assumed to cause variance

Module 2: Data Refinement Quiz Answers – Cognitive Class

Question 1: Which of the following options is true?

A matrix of correlations describes all possible pairwise relationships
Eigenvalues are the principal components
Correlation does not explain the covariation between two vectors
Eigenvectors are a measure of total variance, as explained by the principal components

Question 2: PCA is a method to reduce your data to the fewest ‘principal components’ while maximizing the variance explained. True or false?

False
True

Question 3: Which of the following techniques was NOT covered in this lesson?

Parallel analysis
Percentage of Common Variance
Scree Test
Kaiser-Guttman Rule
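
The retention rules named in this module can be illustrated on synthetic data. The following is a minimal sketch, assuming NumPy is available; the data, the seed, and the function names (`correlation_eigenvalues`, `kaiser_guttman`, `parallel_analysis`) are made up for illustration and are not taken from the course. It applies the Kaiser-Guttman rule (keep components with eigenvalue above 1) and a simple form of parallel analysis (keep components whose eigenvalue exceeds the average eigenvalue from random data of the same shape).

```python
import numpy as np

def correlation_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X, largest first."""
    R = np.corrcoef(X, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def kaiser_guttman(eigenvalues):
    """Kaiser-Guttman rule: count components with eigenvalue > 1."""
    return int(np.sum(eigenvalues > 1.0))

def parallel_analysis(X, n_sims=50, seed=0):
    """Count leading components whose eigenvalue exceeds the mean
    eigenvalue obtained from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = correlation_eigenvalues(X)
    random_eigs = np.array([
        correlation_eigenvalues(rng.standard_normal((n, p)))
        for _ in range(n_sims)
    ])
    exceeds = observed > random_eigs.mean(axis=0)
    # Stop counting at the first component that fails the comparison.
    return int(p if exceeds.all() else np.argmin(exceeds))

# Synthetic example (an assumption): 6 variables driven by 2 latent factors.
rng = np.random.default_rng(42)
latent = rng.standard_normal((500, 2))
true_loadings = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                          [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
X = latent @ true_loadings.T + 0.5 * rng.standard_normal((500, 6))

eigs = correlation_eigenvalues(X)
n_kaiser = kaiser_guttman(eigs)
n_parallel = parallel_analysis(X)
print("eigenvalues:", np.round(eigs, 2))
print("Kaiser-Guttman:", n_kaiser, "| parallel analysis:", n_parallel)
```

A scree test would plot `eigs` against the component number and look for the elbow. The eigenvalues of a correlation matrix always sum to the number of variables, which is why 1 is the natural threshold for "explains more than a single variable".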

Module 3: Exploring Data Quiz Answers – Cognitive Class

Question 1: EFA is commonly used in which of the following applications? Select all that apply.

Customer satisfaction surveys
Personality tests
Performance evaluations
Image analysis

Question 2: Which of the following options is an example of an Oblique Rotation?

Regmax
Varimax
Softmax
Promax

Question 3: An Orthogonal Rotation assumes that factors are correlated with each other. True or false?

False
True
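
To make the idea of an orthogonal rotation concrete, here is a hedged sketch of a varimax rotation in NumPy. The function name `varimax`, its parameters, and the example loading matrix are assumptions made for illustration; this is a commonly used SVD-based iteration, not necessarily how any particular statistics package implements it.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix.

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        gradient = loadings.T @ (
            rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # closest orthogonal matrix to the gradient
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical loadings: a clean two-factor structure, deliberately
# mixed by a 30-degree rotation so that every variable loads on both factors.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation matrix is orthogonal, the factors stay uncorrelated and each variable's communality (its row sum of squared loadings) is unchanged; only the way that variance is distributed across factors changes.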

Machine Learning – Dimensionality Reduction Final Exam Answers – Cognitive Class

Question 1: Why might you use cluster analysis as an analytic strategy?

To identify higher-order dimensions
To identify outliers
To reduce the number of variables
To segment the market
None of the above

Question 2: Suppose you have 100,000 individuals in a dataset, and each individual varies along 60 dimensions. On average, the dimensions are correlated at r = .45. You want to group the variables together, so you decide to run principal component analysis. How many meaningful, higher-order components can you extract?

60
3
20
24
The answer cannot be determined

Question 3: What technique should you use to identify the dimensions that hang together?

Principal axis factoring
Confirmatory factor analysis
Exploratory factor analysis
Two of the above
None of the above

Question 4: What are loadings?

Covariance between the two factors
Correlations between each variable and its factor
Correlations between each variable and its component
Two of the above
None of the above

Question 5: When would you use PCA over EFA?

When you want to use an orthogonal rotation
When you are interested in explaining the total variance in a variance-covariance matrix
When you have too many variables
When you are interested in a latent construct
None of the above

Question 6: What is uniqueness?

A measure of replicability of the factor
The amount of variance not explained by the factor structure
The amount of variance explained by the factor structure
The amount of variance explained by the factor
None of the above

Question 7: Suppose you are looking to extract the major dimensions of a parrot’s personality. Which technique would you use?

Maximum likelihood
Principal component analysis
Cluster analysis
Factor analysis
None of the above

Question 8: Suppose you have 60 variables in a dataset, and you know that 2 components explain the data very well. How many components can you extract?

45
5
60
2
None of the above

Question 9: When would you use an orthogonal rotation?

When correlations between the variables are large
When you observe small correlations between the variables in the dataset
When you think that the factors are uncorrelated
All of the above
None of the above

Question 10: When would you use confirmatory factor analysis?

When you want to validate the factor solution
When you want to explain the variance in the matrix accounting for the measurement error
When you want to identify the factors
Two of the above
None of the above

Question 11: Which of the following is NOT a rule when deciding on the number of factors?

Newman-Frank Test
Percentage of common variance explained
Scree test
Kaiser-Guttman
None of the above

Question 12: What is one assumption of factor analysis?

A number of factors can be determined via the Scree test
Factor analysis will extract only unique factors
A latent variable causes the variance in observed variables
There is no measurement error
None of the above

Question 13: What is an eigenvector?

The proportion of the variance explained in the matrix
A higher-order dimension that subsumes all of the lower-order errors
A higher-order dimension that subsumes similar lower-order dimensions
A higher-order dimension that subsumes all lower-order dimensions
None of the above

Question 14: What is a promax rotation?

A rotation method that minimizes the square loadings on each factor
A rotation method that maximizes the variance explained
A rotation method that maximizes the square loadings on each factor
A rotation method that minimizes the variance explained
None of the above

Question 15: What is the cut-off point for the Common Variance Explained rule?

80% of variance explained
50% of variance explained
3 variables
1 unit
None of the above

Question 16: Why would you try to reduce dimensions?

Individuals need to be placed into groups
Variables are highly-correlated
Many variables are likely assessing the same thing
Two of the above
All of the above

Question 17: If you have 20 variables in a dataset, how many dimensions are there?

At most 20
At least 20
As many as the number of factors you can extract
Not enough information
None of the above

Question 18: What term describes the amount of variance of each variable explained by the factor structure?

Eigenvector
Commonality
Similarity
Communality
None of the above

Question 19: What package contains the necessary functions to perform PCA and EFA?

ggplot2
FA
psych
factAnalis
None of the above

Question 20: What is the best method for identifying the number of factors to extract?

Parallel Analysis
Scree test
Newman-Frank Test
Percentage of common variance explained
All of the above
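
Several exam questions turn on the terms loadings, communality, and uniqueness. The sketch below shows how these quantities relate, assuming NumPy and using principal components as a stand-in for the factor structure; an EFA fit would estimate them differently. The data, seed, and the choice of `k = 2` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 4 observed variables driven by 2 latent sources plus noise.
latent = rng.standard_normal((300, 2))
weights = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
X = latent @ weights.T + 0.4 * rng.standard_normal((300, 4))

# Standardize, then eigendecompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

k = 2  # number of components retained (an assumption for this example)
loadings = eigenvectors[:, :k] * np.sqrt(eigenvalues[:k])

# Check: loadings equal the correlations between variables and component scores.
scores = Z @ eigenvectors[:, :k]
correlations = np.array([
    [np.corrcoef(Z[:, j], scores[:, c])[0, 1] for c in range(k)]
    for j in range(Z.shape[1])
])

communality = np.sum(loadings ** 2, axis=1)  # variance explained per variable
uniqueness = 1.0 - communality               # variance left unexplained

print("loadings:\n", np.round(loadings, 3))
print("communality:", np.round(communality, 3))
print("uniqueness: ", np.round(uniqueness, 3))
```

With all four components retained, every communality would equal 1 and every uniqueness 0; dropping components is what leaves part of each variable unexplained.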

Introduction to Machine Learning – Dimensionality Reduction

Dimensionality reduction is a technique in machine learning and statistics that involves reducing the number of input variables or features in a dataset. The goal is to simplify the dataset while retaining its essential information. This can be particularly useful in scenarios where the original dataset has a large number of features, potentially leading to increased computational complexity, the curse of dimensionality, and overfitting. Two common methods for dimensionality reduction are Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE).

Principal Component Analysis (PCA):

Overview:
- PCA is a linear technique that transforms the original features into a new set of uncorrelated features called principal components.
- The first principal component captures the largest share of the variance in the data. Each subsequent component captures the largest remaining share while staying uncorrelated with the earlier ones.
Steps:
- Standardize the data (subtract the mean and divide by the standard deviation).
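- Compute the covariance matrix of the standardized data (this equals the correlation matrix of the original data).
- Calculate the eigenvectors and eigenvalues of that matrix. Each eigenvector is a direction in feature space, and its eigenvalue is the variance captured along that direction.
- Sort the eigenvectors by descending eigenvalue and keep the top k; these define the new feature space.
- Project the standardized data onto those k eigenvectors.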
Use Cases:
- Dimensionality reduction for visualization.
- Feature engineering to reduce the number of features while retaining most of the information.
- Noise reduction.
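
The PCA steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a production implementation: the function name `pca`, the synthetic data, and the mixing matrix are all made up for the example.

```python
import numpy as np

def pca(X, k):
    """Project X (samples x features) onto its top-k principal components."""
    # 1. Standardize each feature.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data.
    C = np.cov(Z, rowvar=False)
    # 3. Eigendecomposition (eigh, because C is symmetric).
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    # 4. Sort by descending eigenvalue and keep the top-k eigenvectors.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    components = eigenvectors[:, :k]
    # 5. Project the standardized data onto the new feature space.
    scores = Z @ components
    explained = eigenvalues[:k] / eigenvalues.sum()
    return scores, components, explained

# Synthetic data (an assumption): 5 features built from 2 underlying signals.
rng = np.random.default_rng(1)
signals = rng.standard_normal((200, 2))
mixing = np.array([[1.0, 0.8, 0.6, 0.0, 0.2],
                   [0.0, 0.3, 0.5, 1.0, 0.9]])
X = signals @ mixing + 0.1 * rng.standard_normal((200, 5))

scores, components, explained = pca(X, k=2)
print("explained variance ratio:", np.round(explained, 3))
```

In practice a library routine (for example one based on the singular value decomposition) is usually preferable for numerical stability, but the eigendecomposition route mirrors the steps as written.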

t-Distributed Stochastic Neighbor Embedding (t-SNE):

Overview:
- t-SNE is a non-linear technique for dimensionality reduction that focuses on preserving the pairwise similarities between data points.
- It is particularly effective at revealing the local structure of the data.
Steps:
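- Compute pairwise distances between data points in the high-dimensional space.
- Convert those distances into a probability distribution over pairs of points using a Gaussian kernel, so that nearby points receive high probability.
- Define a matching probability distribution over pairs of points in the low-dimensional space, using a heavy-tailed Student t kernel.
- Minimize the Kullback-Leibler divergence between the high-dimensional and low-dimensional distributions by adjusting the low-dimensional points with gradient descent.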
Use Cases:
- Visualization of high-dimensional data in two or three dimensions.
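- Visual inspection of possible groups of similar data points. t-SNE does not assign cluster labels itself, and the apparent sizes of clusters and the gaps between them in a t-SNE plot can be misleading.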
- Exploration of the local structure of the data.
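
The quantities in the t-SNE steps above can be written out directly. The sketch below, assuming NumPy, computes only the objective: the high-dimensional affinities, the low-dimensional affinities, and the divergence between them. It deliberately omits the optimization loop, and it uses one fixed Gaussian bandwidth (`sigma`) for all points, whereas real implementations calibrate a bandwidth per point from the perplexity setting. All function names and data here are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(A):
    """Squared Euclidean distances between all rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def high_dim_affinities(X, sigma=1.0):
    """Symmetric joint probabilities P from a Gaussian kernel.

    Simplification: a single fixed bandwidth instead of a perplexity search.
    """
    n = X.shape[0]
    logits = -pairwise_sq_dists(X) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)      # row i holds p(j | i)
    return (cond + cond.T) / (2.0 * n)

def low_dim_affinities(Y):
    """Joint probabilities Q from a Student t kernel (one degree of freedom)."""
    w = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(w, 0.0)
    return w / w.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the quantity t-SNE minimizes over the embedding."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Synthetic data (an assumption): two clusters in 10 dimensions.
rng = np.random.default_rng(0)
centers = np.zeros((2, 10))
centers[1, 0] = 5.0
labels = np.repeat([0, 1], 20)
X = centers[labels] + 0.5 * rng.standard_normal((40, 10))

P = high_dim_affinities(X)
Y_good = X[:, :2]                 # an embedding that keeps the clusters apart
Y_bad = rng.permutation(Y_good)   # same points, neighbors scrambled
kl_good = kl_divergence(P, low_dim_affinities(Y_good))
kl_bad = kl_divergence(P, low_dim_affinities(Y_bad))
print(f"KL good: {kl_good:.3f}  KL bad: {kl_bad:.3f}")
```

An embedding that keeps high-dimensional neighbors close together scores a lower divergence than one that scrambles them, which is what the gradient descent step exploits.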

Considerations for Dimensionality Reduction:

Loss of Information:
- Dimensionality reduction involves a trade-off between simplifying the dataset and losing some information. It’s important to assess the impact on model performance.
Choice of Method:
- The choice between linear methods like PCA and non-linear methods like t-SNE depends on the nature of the data and the goals of dimensionality reduction.
Parameter Tuning:
- Some methods, like t-SNE, have hyperparameters that need to be tuned (for t-SNE, notably the perplexity and the learning rate). Experimentation and validation are essential for finding settings that work for your data.
Data Scaling:
- Scaling or standardizing the data is often crucial, especially for methods like PCA, which are sensitive to the scale of the features: a feature measured in large units has a large variance and will dominate the leading components.
Application to Specific Problems:
- Different dimensionality reduction techniques may be more suitable for specific types of problems. Understanding the characteristics of your data and the requirements of your task is essential.
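
Two of the considerations above, data scaling and loss of information, can be checked numerically. The following is a small sketch assuming NumPy; the function name `explained_variance_ratio`, the synthetic features, and the 90% threshold are illustrative assumptions, not recommendations from the course.

```python
import numpy as np

def explained_variance_ratio(X, standardize):
    """Share of total variance captured by each principal component."""
    X = X - X.mean(axis=0)
    if standardize:
        X = X / X.std(axis=0, ddof=1)
    eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    return eigenvalues / eigenvalues.sum()

rng = np.random.default_rng(7)
n = 1000
# Three independent synthetic features on very different scales.
X = np.column_stack([
    rng.normal(0, 1, n),
    rng.normal(0, 1, n),
    rng.normal(0, 1000, n),   # e.g. a quantity recorded in much larger units
])

raw = explained_variance_ratio(X, standardize=False)
scaled = explained_variance_ratio(X, standardize=True)

# Information retained: how many components are needed to reach 90%?
cumulative = np.cumsum(scaled)
n_for_90 = int(np.searchsorted(cumulative, 0.90) + 1)

print("unscaled:", np.round(raw, 4))
print("scaled:  ", np.round(scaled, 4))
print("components needed for 90% of variance (scaled):", n_for_90)
```

Without standardization, the large-unit feature accounts for almost all of the variance and the first component simply mirrors it. After standardization the three independent features contribute about equally, so no component can be dropped without losing a substantial share of the variance.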

In summary, dimensionality reduction is a valuable technique in machine learning for handling high-dimensional datasets. The choice of method depends on the nature of the data, the desired outcome, and computational considerations. It’s important to experiment with different techniques and evaluate their impact on model performance in the context of your specific problem.