Analyzing Big Data in R using Apache Spark Cognitive Class Certification Answers
Module 1: Introduction to SparkR Quiz Answers – Cognitive Class
Question 1: What shells are available for running SparkR?
- Spark-shell
- SparkSQL shell
- SparkR shell
- RSpark shell
- None of the options is correct
Question 2: What is the entry point into SparkR?
- SRContext
- SparkContext
- RContext
- SQLContext
Question 3: When would you need to call sparkR.init?
- using the R shell
- using the SR-shell
- using the SparkR shell
- using the Spark-shell
Module 2: Data Manipulation in SparkR Quiz Answers – Cognitive Class
Question 1: DataFrames make use of Spark RDDs
- False
- True
Question 2: True or false? You need read.df to create DataFrames from data sources.
- True
- False
Question 3: What does the groupBy function output?
- An Aggregate Order object
- A Grouped Data object
- An Order By object
- A Group By object
Module 3: Machine Learning in SparkR Quiz Answers – Cognitive Class
Question 1: What is the goal of MLlib?
- Integration of machine learning into SparkSQL
- To make practical machine learning scalable and easy
- Visualization of Machine Learning in SparkR
- Provide a development workbench for machine learning
- All of the options are correct
Question 2: What would you use to create plots? Check all that apply.
- pandas
- Multiplot
- ggplot2
- matplotlib
- all of the above are correct
Question 3: Spark MLlib is a module of Apache Spark
- False
- True
Analyzing Big Data in R using Apache Spark Final Exam Answers – Cognitive Class
Question 1: Which of these are NOT characteristics of SparkR?
- it supports distributed machine learning
- it provides a distributed data frame implementation
- is a cluster computing framework
- a light-weight front end to use Apache Spark from R
- None of the options is correct
Question 2: True or false? The client connection to the Spark execution environment is created by the shell for users using Spark.
- True
- False
Question 3: Which of the following are not features of Spark SQL?
- performs extra optimizations
- works with RDDs
- is a distributed SQL engine
- is a Spark module for structured data processing
- None of the options is correct
Question 4: True or false? select returns a SparkR DataFrame.
- False
- True
Question 5: SparkR defines the following aggregation functions:
- sumDistinct
- sum
- count
- min
- All of the options are correct
Question 6: We can use SparkR sql function using the sqlContext as follows:
- head(sql(sqlContext, "SELECT * FROM cars WHERE cyl > 6"))
- SparkR:head(sql(sqlContext, "SELECT * FROM cars WHERE cyl > 6"))
- SparkR::head(sql(sqlContext, "SELECT * FROM cars WHERE cyl > 6"))
- SparkR(head(sql(sqlContext, "SELECT * FROM cars WHERE cyl > 6")))
- None of the options is correct
Question 7: Which of the following are pipeline components?
- Transformers
- Estimators
- Pipeline
- Parameter
- All of the options are correct
Question 8: Which of the following is NOT one of the steps in implementing a GLM in SparkR?
- Evaluate the model
- Train the model
- Implement model
- Prepare and load data
- All of the options are correct
Question 9: True or false? Spark MLlib is a module of SparkR that provides distributed machine learning algorithms.
- True
- False
Introduction to Analyzing Big Data in R using Apache Spark
Analyzing Big Data in R using Apache Spark typically involves leveraging the SparkR package, which provides an R API for Apache Spark. This allows R users to work with large-scale data processing and analytics on a Spark cluster. Here are the key steps to analyze Big Data in R using Apache Spark:
1. Install Apache Spark:
- Install Apache Spark on your cluster or local machine. Make sure to set up the necessary configurations.
2. Install SparkR:
- SparkR ships with the Apache Spark distribution (recent versions are not published to CRAN), so point R at the R library bundled with your Spark installation.
```r
# SparkR is bundled with Spark; add its R library to the search path
Sys.setenv(SPARK_HOME = "/path/to/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
```
3. Initialize SparkR:
- Load the SparkR library and start a Spark session in R.
```r
library(SparkR)
sparkR.session(master = "local[*]", appName = "SparkR Example")
```
4. Load Data into a Spark DataFrame:
- Use SparkR to load your data into a Spark DataFrame. Spark DataFrames are distributed collections of data organized into named columns.
```r
# Example: load data from a CSV file into a Spark DataFrame
df <- read.df("path/to/your/data.csv", source = "csv",
              header = "true", inferSchema = "true")
```
5. Data Exploration and Transformation:
- Use SparkR functions for data exploration and transformation. SparkR provides a variety of functions similar to those in base R for data manipulation.
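For example, a minimal sketch using the df DataFrame from step 4 (the mpg and cyl column names are assumptions for illustration):
```r
# Inspect the schema and a few rows
printSchema(df)
head(df)

# Column selection and row filtering (illustrative columns)
subset_df <- select(df, "mpg", "cyl")
fast_cars <- filter(df, df$mpg > 20)
head(fast_cars)
```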
6. Data Analysis and Machine Learning:
- Utilize SparkR’s machine learning functions for analysis. SparkR supports various machine learning algorithms and tools for regression, classification, clustering, and more.
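As a sketch, spark.glm fits a generalized linear model directly on a SparkR DataFrame (the mpg ~ wt formula assumes the illustrative columns above):
```r
# Fit a Gaussian GLM: mpg as a function of wt (columns are illustrative)
model <- spark.glm(df, mpg ~ wt, family = "gaussian")
summary(model)

# Score the data; predictions come back as a SparkR DataFrame
predictions <- predict(model, df)
head(predictions)
```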
7. Aggregation and Summary Statistics:
- Perform aggregation and compute summary statistics on your data using SparkR.
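A minimal sketch, again assuming the illustrative cyl and mpg columns (recall from Module 2 that groupBy returns a GroupedData object, which agg reduces to a DataFrame):
```r
# Grouped aggregation over the DataFrame
by_cyl <- agg(groupBy(df, "cyl"),
              n = count(df$cyl),
              avg_mpg = avg(df$mpg))
head(by_cyl)

# Quick summary statistics for numeric columns
head(summary(df))
```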
8. Visualizations:
- Generate visualizations using R’s plotting libraries or export data to be visualized in other tools.
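Because a Spark DataFrame is distributed, collect a manageable subset to a local data.frame before plotting, for example with ggplot2 (one of the plotting options from the Module 3 quiz); the columns are again illustrative:
```r
# Sample a fraction and bring it to the driver as a local data.frame
local_df <- collect(sample(df, withReplacement = FALSE, fraction = 0.1))

library(ggplot2)  # assumes ggplot2 is installed locally
ggplot(local_df, aes(x = wt, y = mpg)) +
  geom_point()
```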
9. Optimization and Parallel Processing:
- Leverage Spark’s distributed and parallel processing capabilities. SparkR automatically distributes computations across the Spark cluster.
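Typical knobs for this, sketched under the same assumptions:
```r
# Keep a frequently reused DataFrame in executor memory
cache(df)

# Match the number of partitions to the cluster's parallelism
df_repart <- repartition(df, numPartitions = 8)
```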
10. Integration with Spark Ecosystem:
- SparkR integrates seamlessly with other Spark components, such as Spark SQL, Spark Streaming, and MLlib. This allows you to leverage the full power of the Spark ecosystem.
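For example, a DataFrame can be registered as a temporary view and queried with Spark SQL; this is the Spark 2.x equivalent of the sqlContext call shown in the final-exam question above:
```r
# Register the DataFrame as a temporary view and query it with Spark SQL
createOrReplaceTempView(df, "cars")
heavy <- sql("SELECT * FROM cars WHERE cyl > 6")
head(heavy)
```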
11. Model Deployment:
- After building and evaluating models, deploy them for production use within your Spark environment.
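A sketch, assuming the GLM fitted in step 6: SparkR models can be persisted with write.ml and reloaded with read.ml for later scoring ("path/to/model" is a placeholder):
```r
# Persist the fitted model and reload it later for scoring
write.ml(model, "path/to/model")
reloaded <- read.ml("path/to/model")
head(predict(reloaded, df))

# Shut down the Spark session when finished
sparkR.session.stop()
```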