Data Science with Scala Cognitive Class Exam Quiz Answers

Clear My Certification January 15, 2024 Cognitive Class

Enroll Here: Data Science with Scala Cognitive Class Exam Quiz Answers

Data Science with Scala Cognitive Class Certification Answers

Module 1 – Basic Statistics and Data Types Quiz Answers – Cognitive Class

Question 1: You import MLlib’s vectors from?

org.apache.spark.mllib.TF
org.apache.spark.mllib.numpy
org.apache.spark.mllib.linalg
org.apache.spark.mllib.pandas

Question 2: Select the types of distributed Matrices:

Row Matrix
Indexed Row Matrix
Coordinate Matrix

Question 3: How would you calculate the mean of the following?

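import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
import org.apache.spark.rdd.RDD

val observations: RDD[Vector] = sc.parallelize(Array(
  Vectors.dense(1.0, 2.0),
  Vectors.dense(4.0, 5.0),
  Vectors.dense(7.0, 8.0)))

val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)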

summary.normL1
summary.numNonzeros
summary.mean
summary.normL2
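As a Spark-free sketch of what `summary.mean` holds, the column-wise average of the three observations can be computed with plain Scala arrays standing in for the `RDD[Vector]` (an assumption made so the snippet runs without a Spark cluster):

```scala
// Plain-Scala stand-in for the three dense vectors in the question.
val observations: Seq[Array[Double]] = Seq(
  Array(1.0, 2.0),
  Array(4.0, 5.0),
  Array(7.0, 8.0)
)

// Column-wise mean: sum each column, then divide by the number of rows.
def columnMeans(rows: Seq[Array[Double]]): Array[Double] = {
  val n = rows.length
  val dims = rows.head.length
  Array.tabulate(dims)(j => rows.map(_(j)).sum / n)
}

val means = columnMeans(observations)
println(means.mkString("(", ", ", ")"))
```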

Question 4: What task do the following lines of code perform?

import org.apache.spark.mllib.random.RandomRDDs._

val million = poissonRDD(sc, mean=1.0, size=1000000L, numPartitions=10)

calculate the variance
calculate the mean
generate random samples
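`poissonRDD` generates i.i.d. samples from a Poisson distribution with the given mean, spread over the requested number of partitions. The sketch below imitates that locally with Knuth's multiplication method, which is only an illustrative generator and not necessarily the one Spark uses internally; the helper name `poissonSample` is hypothetical:

```scala
import scala.util.Random

// Knuth's method: keep multiplying uniform draws until the product drops to e^(-lambda);
// the number of extra draws needed is a Poisson(lambda) sample.
def poissonSample(lambda: Double, rng: Random): Int = {
  val limit = math.exp(-lambda)
  var k = 0
  var p = rng.nextDouble()
  while (p > limit) {
    k += 1
    p *= rng.nextDouble()
  }
  k
}

val rng = new Random(42L)
val million = Array.fill(1000000)(poissonSample(1.0, rng))
val sampleMean = million.map(_.toDouble).sum / million.length
println(f"sample mean = $sampleMean%.3f")
```

With a million draws, the sample mean should land very close to the requested mean of 1.0.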

Question 5: MLlib uses the compressed sparse column (CSC) format for sparse matrices; as such, it only keeps the non-zero entries?

True
False
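To illustrate the compressed sparse column layout, here is a hypothetical helper `toCsc` that converts a small dense matrix into the three arrays (`colPtrs`, `rowIndices`, `values`) that `Matrices.sparse(numRows, numCols, colPtrs, rowIndices, values)` accepts. Only the non-zero entries are stored:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: convert a dense matrix (array of rows) into CSC arrays.
def toCsc(dense: Array[Array[Double]]): (Array[Int], Array[Int], Array[Double]) = {
  val numRows = dense.length
  val numCols = dense.head.length
  val colPtrs = Array.ofDim[Int](numCols + 1)
  val rowIndices = ArrayBuffer.empty[Int]
  val values = ArrayBuffer.empty[Double]
  for (j <- 0 until numCols) {
    // Walk column j top to bottom, recording only non-zero entries.
    for (i <- 0 until numRows if dense(i)(j) != 0.0) {
      rowIndices += i
      values += dense(i)(j)
    }
    // colPtrs(j + 1) marks where column j ends (and column j + 1 begins) in `values`.
    colPtrs(j + 1) = values.length
  }
  (colPtrs, rowIndices.toArray, values.toArray)
}

// 3x2 matrix ((9.0, 0.0), (0.0, 8.0), (0.0, 6.0)): three non-zeros out of six entries.
val (colPtrs, rowIndices, values) = toCsc(Array(
  Array(9.0, 0.0),
  Array(0.0, 8.0),
  Array(0.0, 6.0)))
println(s"colPtrs=${colPtrs.mkString(",")} rowIndices=${rowIndices.mkString(",")} values=${values.mkString(",")}")
```

`colPtrs` has `numCols + 1` entries, and entry `j` marks where column `j` starts in `values`; an empty column would show up as a repeated pointer value.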

Module 2 – Preparing Data Quiz Answers – Cognitive Class

Question 1: For a DataFrame object, the describe method calculates the:

count
mean
standard deviation
max
min
all of the above
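`describe()` reports count, mean, stddev, min and max for numeric columns. A minimal plain-Scala sketch of the same five statistics for a single column follows; like Spark's `stddev`, it uses the sample standard deviation (dividing by n - 1). The helper name is hypothetical:

```scala
// Sketch of the statistics DataFrame.describe() reports for one numeric column.
def describeColumn(xs: Seq[Double]): Map[String, Double] = {
  val n = xs.length
  val mean = xs.sum / n
  // Sample standard deviation: squared deviations divided by n - 1.
  val stddev = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / (n - 1))
  Map("count" -> n.toDouble, "mean" -> mean, "stddev" -> stddev, "min" -> xs.min, "max" -> xs.max)
}

val stats = describeColumn(Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0))
stats.foreach { case (name, value) => println(f"$name%-6s $value%.4f") }
```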

Question 2: Which line of code drops the rows that contain null values? Select the best answer.

val dfnan = df.withColumn("nanUniform", halfToNaN(df("uniform")))
dfnan.na.replace("uniform", Map(Double.NaN -> 0.0))
dfnan.na.drop(minNonNulls = 3)
dfnan.na.fill(0.0)
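The three `na` calls behave differently: `na.drop(minNonNulls = 3)` keeps only rows with at least three non-null, non-NaN values, `na.fill(0.0)` replaces nulls and NaNs in numeric columns, and `na.replace` swaps specific values. The sketch below mimics `drop` and `fill` in plain Scala, modelling SQL null as `None`; the `Row` alias and helper names are assumptions for illustration:

```scala
// Rows as maps from column name to a nullable Double (None plays the role of SQL null).
type Row = Map[String, Option[Double]]

// A value counts as present only if it is non-null and not NaN.
def isPresent(v: Option[Double]): Boolean = v.exists(d => !d.isNaN)

// Keep rows that have at least `minNonNulls` present values.
def dropRows(rows: Seq[Row], minNonNulls: Int): Seq[Row] =
  rows.filter(row => row.values.count(isPresent) >= minNonNulls)

// Replace every null or NaN with `value`.
def fillRows(rows: Seq[Row], value: Double): Seq[Row] =
  rows.map(row => row.map { case (k, v) => k -> (if (isPresent(v)) v else Some(value)) })

val rows: Seq[Row] = Seq(
  Map("id" -> Some(1.0), "uniform" -> Some(0.4), "nanUniform" -> Some(Double.NaN)),
  Map("id" -> Some(2.0), "uniform" -> Some(0.7), "nanUniform" -> Some(0.7)),
  Map("id" -> Some(3.0), "uniform" -> None, "nanUniform" -> None)
)

println(dropRows(rows, minNonNulls = 3))
println(fillRows(rows, 0.0))
```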

Question 3: What task do the following lines of code perform?

val lr = new LogisticRegression()

lr.setMaxIter(10).setRegParam(0.01)

val model1 = lr.fit(training)

Perform one-hot encoding
Train a linear regression model
Train a logistic regression model
Perform PCA on the data
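The snippet builds a `LogisticRegression` estimator with 10 iterations and a regularization parameter of 0.01, then fits it to `training`. To show what such a model learns, here is a hedged sketch of L2-regularized logistic regression on one feature using plain batch gradient descent. Spark itself uses quasi-Newton optimizers (L-BFGS/OWL-QN), so gradient descent is a swapped-in technique here, and `fitLogistic` and `stepSize` are hypothetical:

```scala
// Sigmoid link function mapping a score to a probability.
def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

// Batch gradient descent on the mean log-loss plus an L2 penalty on the weight
// (the intercept is not regularized). One feature, for readability.
def fitLogistic(xs: Seq[Double], ys: Seq[Double], maxIter: Int, regParam: Double,
                stepSize: Double = 0.5): (Double, Double) = {
  var w = 0.0
  var b = 0.0
  val n = xs.length
  for (_ <- 1 to maxIter) {
    val errors = xs.zip(ys).map { case (x, y) => sigmoid(w * x + b) - y }
    val gradW = xs.zip(errors).map { case (x, e) => x * e }.sum / n + regParam * w
    val gradB = errors.sum / n
    w -= stepSize * gradW
    b -= stepSize * gradB
  }
  (w, b)
}

val xs = Seq(-2.0, -1.0, 1.0, 2.0)
val ys = Seq(0.0, 0.0, 1.0, 1.0)
// Mirrors setMaxIter(10).setRegParam(0.01) from the question.
val (weight, intercept) = fitLogistic(xs, ys, maxIter = 10, regParam = 0.01)
val predictions = xs.map(x => if (sigmoid(weight * x + intercept) > 0.5) 1.0 else 0.0)
println(s"weight=$weight intercept=$intercept predictions=$predictions")
```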

Question 4: The StandardScalerModel transforms the data such that:

each feature has a max value of 1
each feature is orthogonal
each feature has unit standard deviation and zero mean
each feature has a min value of -1
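Standardization subtracts the mean and divides by the standard deviation. Note that Spark's `StandardScaler` defaults to `withStd = true` and `withMean = false`, so zero mean is only produced after `setWithMean(true)`. A plain-Scala sketch of both steps for a single feature, using the corrected sample standard deviation and (as an assumption of this sketch) scaling zero-variance features to 0.0:

```scala
// Standardize one feature column: optionally subtract the mean, then optionally divide
// by the corrected sample standard deviation (n - 1 in the denominator).
def standardize(xs: Seq[Double], withMean: Boolean = true, withStd: Boolean = true): Seq[Double] = {
  val n = xs.length
  val mean = xs.sum / n
  val std = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / (n - 1))
  // Assumption in this sketch: a zero-variance feature is scaled to 0.0.
  val scale = if (std != 0.0) 1.0 / std else 0.0
  xs.map { x =>
    val centered = if (withMean) x - mean else x
    if (withStd) centered * scale else centered
  }
}

// Spark's defaults would correspond to standardize(xs, withMean = false, withStd = true).
val scaled = standardize(Seq(1.0, 2.0, 3.0, 4.0, 5.0))
println(scaled.map(v => f"$v%.4f").mkString(", "))
```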

Module 3 – Feature Engineering Quiz Answers – Cognitive Class

Question 1: Spark ML works with?

tensors
vectors
dataframes
lists

Question 2: The function IndexToString() performs one-hot encoding?

True
False
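`IndexToString` is the inverse of `StringIndexer`: it maps numeric label indices back to the original strings, which is not one-hot encoding. A minimal sketch of the round trip, assuming the default frequency-descending order in which the most frequent label gets index 0 (the alphabetical tie-break and the helper names are this sketch's own choices):

```scala
// StringIndexer-style encoding: the most frequent label gets index 0.
def fitIndexer(labels: Seq[String]): Map[String, Int] =
  labels.groupBy(identity).toSeq
    .sortBy { case (label, occurrences) => (-occurrences.size, label) } // ties: alphabetical
    .map(_._1)
    .zipWithIndex
    .toMap

// IndexToString-style decoding: map indices back to the original labels.
def indexToString(index: Map[String, Int]): Int => String = {
  val reverse = index.map(_.swap)
  i => reverse(i)
}

val labels = Seq("b", "a", "c", "a", "a", "c")
val indexer = fitIndexer(labels)
val encoded = labels.map(indexer)
val decoded = encoded.map(indexToString(indexer))
println(s"encoded=$encoded decoded=$decoded")
```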

Question 3: Principal Component Analysis is primarily used for?

to convert categorical variables to integers
to predict discrete values
dimensionality reduction

Question 4: One important step prior to using PCA is?

normalizing your data
making sure every feature is not correlated
taking the log of your data
subtracting the mean
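Centering matters because PCA is computed from the covariance of mean-subtracted data, and normalizing keeps features measured on large scales from dominating. The sketch below does 2-D PCA by hand: it centers the points, forms the 2x2 sample covariance matrix, and takes its leading eigenvector in closed form. It assumes exactly two features and a non-zero covariance between them; in Spark you would reach for `org.apache.spark.ml.feature.PCA` with `setK` instead:

```scala
// Minimal 2-D PCA. Assumes the off-diagonal covariance b is non-zero
// (otherwise the closed-form eigenvector below degenerates).
def firstPrincipalComponent(points: Seq[(Double, Double)]): ((Double, Double), Seq[Double]) = {
  val n = points.length
  val mx = points.map(_._1).sum / n
  val my = points.map(_._2).sum / n
  // Step 1: subtract the mean of each feature.
  val centered = points.map { case (x, y) => (x - mx, y - my) }
  // Step 2: sample covariance matrix [[a, b], [b, c]].
  val a = centered.map { case (x, _) => x * x }.sum / (n - 1)
  val c = centered.map { case (_, y) => y * y }.sum / (n - 1)
  val b = centered.map { case (x, y) => x * y }.sum / (n - 1)
  // Step 3: largest eigenvalue and its eigenvector (b, lambda - a), normalized.
  val lambda = (a + c) / 2 + math.sqrt(((a - c) / 2) * ((a - c) / 2) + b * b)
  val norm = math.hypot(b, lambda - a)
  val pc = (b / norm, (lambda - a) / norm)
  // Step 4: project each centered point onto the component.
  val scores = centered.map { case (x, y) => x * pc._1 + y * pc._2 }
  (pc, scores)
}

val (pc, scores) = firstPrincipalComponent(Seq((1.0, 1.0), (2.0, 2.0), (3.0, 3.0)))
println(s"component=$pc scores=$scores")
```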

Module 4 – Fitting a Model Quiz Answers – Cognitive Class

Question 1: You can use decision trees for?

regression
classification
classification and regression
data normalization

Question 2: What does the following line of code do? val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))

split the data into training and testing data
train the model
use 70% of the data for testing
use 30% of the data for training
make a prediction
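`randomSplit` normalizes the weights and assigns each row to a split at random, so a 0.7/0.3 split gives roughly, not exactly, 70% training and 30% test data. A hedged local sketch of that idea; the generic helper below is hypothetical and does not reproduce Spark's exact sampling procedure:

```scala
import scala.util.Random

// Assign each element to a split with probability proportional to that split's weight.
def randomSplit[A](data: Seq[A], weights: Seq[Double], seed: Long): Seq[Seq[A]] = {
  val rng = new Random(seed)
  val total = weights.sum
  // Cumulative, normalized boundaries, e.g. (0.7, 0.3) -> (0.7, 1.0).
  val bounds = weights.scanLeft(0.0)(_ + _).tail.map(_ / total)
  val assigned = data.map { x =>
    val u = rng.nextDouble()
    val idx = bounds.indexWhere(bound => u < bound)
    (if (idx < 0) bounds.length - 1 else idx, x)
  }
  weights.indices.map(i => assigned.filter(_._1 == i).map(_._2))
}

val splits = randomSplit(1 to 1000, Seq(0.7, 0.3), seed = 42L)
val trainingData = splits(0)
val testData = splits(1)
println(s"training=${trainingData.size} test=${testData.size}")
```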

Question 3: In a RandomForestClassifier, .setNumTrees():

sets the max depth of trees
sets the minimum number of classes before a split
sets the number of trees

Question 4: Elastic net regularization uses?

L0-norm
L1-norm
L2-norm
a convex combination of the L1 norm and L2 norm
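Spark's documentation writes the elastic net penalty as α(λ‖w‖₁) + (1 − α)(λ/2)‖w‖₂², where λ is `regParam` and α is `elasticNetParam`; α = 0 gives pure L2 (ridge) and α = 1 gives pure L1 (lasso). A small sketch that evaluates this penalty for a weight vector (the helper name is hypothetical):

```scala
// Elastic net penalty: alpha * (lambda * ||w||_1) + (1 - alpha) * (lambda / 2 * ||w||_2^2)
def elasticNetPenalty(w: Seq[Double], regParam: Double, elasticNetParam: Double): Double = {
  val l1 = w.map(x => math.abs(x)).sum
  val l2Squared = w.map(x => x * x).sum
  elasticNetParam * (regParam * l1) + (1 - elasticNetParam) * (regParam / 2 * l2Squared)
}

val w = Seq(1.0, -2.0)
val ridge = elasticNetPenalty(w, regParam = 0.1, elasticNetParam = 0.0) // pure L2
val lasso = elasticNetPenalty(w, regParam = 0.1, elasticNetParam = 1.0) // pure L1
val mixed = elasticNetPenalty(w, regParam = 0.1, elasticNetParam = 0.5) // convex mix
println(s"ridge=$ridge lasso=$lasso mixed=$mixed")
```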

Module 5 – Pipeline and Grid Search Quiz Answers – Cognitive Class

Question 1: What task does the following code perform: withColumn("paperscore", data("A2") * 4 + data("A") * 3)?

add 4 columns to A2
add 3 columns to A1
add 4 to each element in column A2
assign a higher weight to A2 and A journals

Question 2: In an estimator?

there is no need to call the method fit
fit function is called
transform function is only called

Question 3: Which is not a valid type of Evaluator in MLlib?

Regression Evaluator
Multi-Class Classification Evaluator
Multi-Label Classification Evaluator
Binary Classification Evaluator
All are valid
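Each evaluator reduces predictions and labels to a single metric. For example, `RegressionEvaluator` defaults to root-mean-squared error (`"rmse"`); a plain-Scala sketch of that metric follows (the helper name is hypothetical):

```scala
// Root-mean-squared error between predictions and labels.
def rmse(predictions: Seq[Double], labels: Seq[Double]): Double = {
  require(predictions.length == labels.length, "predictions and labels must have the same length")
  val mse = predictions.zip(labels).map { case (p, y) => (p - y) * (p - y) }.sum / predictions.length
  math.sqrt(mse)
}

val err = rmse(Seq(1.0, 2.0, 3.0), Seq(1.0, 2.0, 5.0))
println(f"rmse = $err%.4f")
```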

Question 4: In the following lines of code, the last transform in the pipeline is a:

val rf = new RandomForestClassifier().setFeaturesCol("assembled").setLabelCol("status").setSeed(42)

import org.apache.spark.ml.Pipeline

val pipeline = new Pipeline().setStages(Array(value_band_indexer,category_indexer,label_indexer,assembler,rf))

principal component analysis
Vector Assembler
String Indexer
Random Forest Classifier

Data Science with Scala Final Exam Answers – Cognitive Class

Question 1: What is not true about labeled points?

They associate dense vectors with a corresponding label/response
They associate sparse vectors with a corresponding label/response
They are used in unsupervised machine learning algorithms
All are true
None are true

Question 2: Which is true about column pointers in sparse matrices?

They have the same number of values as the number of columns
They never repeat values
By themselves, they do not represent the specific physical location of a value in the matrix
All are true
None are true

Question 3: What is the name of the most basic type of distributed matrix?

Coordinate Matrix
Indexed Row Matrix
Simple Matrix
Row Matrix
Sparse Matrix

Question 4: A perfect correlation is represented by what value?

100
3
1
0
-1
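Pearson's correlation coefficient (the default method of `Statistics.corr`) ranges from -1 to 1, with 1 meaning a perfect positive linear relationship and -1 a perfect negative one. A minimal sketch:

```scala
// Pearson correlation: covariance divided by the product of the standard deviations.
def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
  val n = xs.length
  val mx = xs.sum / n
  val my = ys.sum / n
  val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
  val sx = math.sqrt(xs.map(x => (x - mx) * (x - mx)).sum)
  val sy = math.sqrt(ys.map(y => (y - my) * (y - my)).sum)
  cov / (sx * sy)
}

println(pearson(Seq(1.0, 2.0, 3.0), Seq(2.0, 4.0, 6.0)))
println(pearson(Seq(1.0, 2.0, 3.0), Seq(6.0, 4.0, 2.0)))
```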

Question 5: A MinMaxScaler is a transformer which:

Rescales each feature to a specific range
Takes no parameters
Makes zero values remain untransformed
All are true
None are true
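Spark documents `MinMaxScaler` as rescaling each feature value e to (e - E_min) / (E_max - E_min) * (max - min) + min, with `min` = 0.0 and `max` = 1.0 by default, and 0.5 * (max + min) used when E_max equals E_min. Because zeros generally move to non-zero values, the output is dense even for sparse input. A single-feature sketch of the formula (the helper name is hypothetical):

```scala
// MinMaxScaler-style rescaling of one feature to [min, max].
def minMaxScale(xs: Seq[Double], min: Double = 0.0, max: Double = 1.0): Seq[Double] = {
  val eMin = xs.min
  val eMax = xs.max
  if (eMax == eMin) xs.map(_ => 0.5 * (max + min)) // constant feature
  else xs.map(e => (e - eMin) / (eMax - eMin) * (max - min) + min)
}

println(minMaxScale(Seq(10.0, 20.0, 30.0)))
println(minMaxScale(Seq(0.0, 5.0, 10.0), min = -1.0, max = 1.0))
```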

Question 6: Which is not a supported Random Data Generation distribution?

Exponential
Uniform
Delta
Normal
Poisson

Question 7: Sampling without replacement means:

The expected size of the sample is the same as the RDD's size
The expected number of times each element is chosen is randomized
The expected size of the sample is unknown
The expected size of the sample is a fraction of the RDD's size
The expected number of times each element is chosen
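`sample(withReplacement = false, fraction, seed)` treats `fraction` as the probability that each element is kept, so each element appears at most once and the expected sample size is fraction × count. A hedged plain-Scala sketch of this Bernoulli-style sampling (the helper name is hypothetical):

```scala
import scala.util.Random

// Keep each element independently with probability `fraction`; no element can repeat.
def sampleWithoutReplacement[A](data: Seq[A], fraction: Double, seed: Long): Seq[A] = {
  val rng = new Random(seed)
  data.filter(_ => rng.nextDouble() < fraction)
}

val data = 1 to 10000
val sample = sampleWithoutReplacement(data, fraction = 0.1, seed = 7L)
println(s"sample size = ${sample.size} (expected about ${(0.1 * data.size).toInt})")
```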

Question 8: What are the supported types of hypothesis testing?

Kolmogorov-Smirnov test for equality of distribution
Pearson’s Chi-Squared Test for goodness of fit
Pearson’s Chi-Squared Test for independence
All are supported
None are supported
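As an example of the goodness-of-fit case, Pearson's chi-squared statistic sums (observed - expected)² / expected over the categories; `Statistics.chiSqTest` compares against a uniform distribution when no expected vector is supplied. The sketch computes only the statistic and the degrees of freedom, not the p-value:

```scala
// Pearson's chi-squared goodness-of-fit statistic.
def chiSquaredStatistic(observed: Seq[Double], expected: Seq[Double]): Double =
  observed.zip(expected).map { case (o, e) => (o - e) * (o - e) / e }.sum

// Uniform expectation: spread the total count evenly over the categories.
val observed = Seq(10.0, 20.0, 30.0)
val expected = Seq.fill(observed.length)(observed.sum / observed.length)
val stat = chiSquaredStatistic(observed, expected)
val degreesOfFreedom = observed.length - 1
println(s"statistic=$stat degreesOfFreedom=$degreesOfFreedom")
```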

Question 9: For Kernel Density Estimation, which kernel is supported by Spark?

KDEMultivariate
KDEUnivariate
KernelDensity
Gaussian
All are supported
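Spark's `KernelDensity` estimator uses Gaussian kernels with a configurable bandwidth. A plain-Scala sketch of the estimate f(x) = 1/(n·h) Σᵢ φ((x − xᵢ)/h), where φ is the standard normal density and h the bandwidth (the helper name and sample data are illustrative):

```scala
// Gaussian kernel density estimate at x for the given samples and bandwidth h.
def gaussianKde(samples: Seq[Double], bandwidth: Double)(x: Double): Double = {
  val norm = 1.0 / math.sqrt(2 * math.Pi)
  samples.map { xi =>
    val u = (x - xi) / bandwidth
    norm * math.exp(-0.5 * u * u)
  }.sum / (samples.length * bandwidth)
}

val sample = Seq(1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0)
val estimates = Seq(-1.0, 2.0, 5.0).map(x => gaussianKde(sample, bandwidth = 3.0)(x))
println(estimates)
```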

Question 10: Which DataFrames statistics method computes the pairwise frequency table of the given columns?

freqItems()
crosstab()
cov()
pairwiseFreq()
corr()
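`df.stat.crosstab(col1, col2)` builds a contingency table counting how often each pair of values co-occurs. A minimal sketch of the counting step over (col1, col2) pairs, with a simple tab-separated printout (the helper name is hypothetical):

```scala
// Pairwise frequency table: how many times each (col1, col2) pair appears.
def crosstab(pairs: Seq[(String, String)]): Map[(String, String), Int] =
  pairs.groupBy(identity).map { case (pair, occurrences) => pair -> occurrences.size }

val pairs = Seq(("a", "x"), ("a", "y"), ("a", "x"), ("b", "x"))
val table = crosstab(pairs)

// One output row per distinct col1 value, one count per distinct col2 value.
val colValues = pairs.map(_._2).distinct.sorted
for (r <- pairs.map(_._1).distinct.sorted) {
  val counts = colValues.map(c => table.getOrElse((r, c), 0))
  println((r +: counts.map(_.toString)).mkString("\t"))
}
```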

Question 11: Which is not true about the fill method for DataFrame NA functions?

It is used for replacing null values
It is used for replacing nil values
It is used for replacing NaN values
All are true
None are true

Question 12: Which transformer listed below is used for Natural Language Processing?

OneHotEncoder
ElementwiseProduct
Normalizer
StandardScaler
None are used for Natural Language Processing

Question 13: Which is true about the Mahalanobis Distance?

It is a scale-variant distance
It is a multi-dimensional generalization of measuring how many standard deviations a point is away from the median
It is measured along each Principal Component axis
It has units of distance
It does not take into account the correlations of the dataset
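The Mahalanobis distance is √((x − μ)ᵀ S⁻¹ (x − μ)) for mean μ and covariance matrix S, so it accounts for correlations between features and reduces to the Euclidean distance when S is the identity. A 2-D sketch with an explicit inverse of the 2x2 covariance matrix (the helper and its parameter layout are assumptions):

```scala
// Mahalanobis distance in 2-D for covariance S = [[a, b], [b, c]].
def mahalanobis2d(point: (Double, Double), mean: (Double, Double),
                  a: Double, b: Double, c: Double): Double = {
  val det = a * c - b * b
  require(det > 0, "covariance matrix must be positive definite")
  val dx = point._1 - mean._1
  val dy = point._2 - mean._2
  // S^-1 = (1 / det) * [[c, -b], [-b, a]], so the quadratic form expands as below.
  val q = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
  math.sqrt(q)
}

println(mahalanobis2d((3.0, 4.0), (0.0, 0.0), a = 1.0, b = 0.0, c = 1.0)) // identity: Euclidean
println(mahalanobis2d((2.0, 0.0), (0.0, 0.0), a = 4.0, b = 0.0, c = 1.0)) // x has std 2
println(mahalanobis2d((1.0, 1.0), (0.0, 0.0), a = 1.0, b = 0.5, c = 1.0)) // correlated features
```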

Question 14: Which is true about OneHotEncoder?

It creates a Sparse Vector
It must be told which column to create for its output
It must be told which column is its input
All are true
None are true
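Spark's `OneHotEncoder` maps a category index to a sparse binary vector and, by default (`dropLast = true`), omits the last category to avoid the linear dependence that a full set of indicator columns would introduce. A hypothetical helper returning the (size, indices, values) triple of such a sparse vector:

```scala
// One-hot encode a category index as a sparse vector (size, indices, values).
def oneHot(categoryIndex: Int, numCategories: Int,
           dropLast: Boolean = true): (Int, Array[Int], Array[Double]) = {
  val size = if (dropLast) numCategories - 1 else numCategories
  if (categoryIndex < size) (size, Array(categoryIndex), Array(1.0))
  else (size, Array.empty[Int], Array.empty[Double]) // the dropped category is all zeros
}

for (i <- 0 until 3) {
  val (size, indices, values) = oneHot(i, numCategories = 3)
  println(s"category $i -> ($size, [${indices.mkString(",")}], [${values.mkString(",")}])")
}
```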

Question 15: Principal Component Analysis is:

A dimension reduction technique
Never used for feature engineering
Used for supervised machine learning
All are true
None are true

Question 16: MLlib’s implementation of decision trees:

Partitions data by rows, allowing distributed training
Supports only multiclass classification
Does not support regressions
Supports only continuous features
None are true

Question 17: Which is not a tunable parameter of Spark ML decision trees?

maxMemoryInMB
minInfoGain
minDepth
maxBins
minInstancesPerNode
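Tree-growing parameters such as `maxDepth`, `maxBins`, `minInstancesPerNode` and `minInfoGain` control when a node is split; with `minInfoGain`, a candidate split is kept only if its information gain is large enough. A sketch of that gain using Gini impurity (one of the classification impurities Spark supports, alongside entropy); the helper names are hypothetical:

```scala
// Gini impurity of a set of class labels: 1 - sum_k p_k^2.
def gini(labels: Seq[Int]): Double = {
  val n = labels.length.toDouble
  1.0 - labels.groupBy(identity).values.map(g => math.pow(g.size / n, 2)).sum
}

// Information gain of splitting `parent` into `left` and `right`.
def infoGain(parent: Seq[Int], left: Seq[Int], right: Seq[Int]): Double = {
  val n = parent.length.toDouble
  gini(parent) - (left.length / n) * gini(left) - (right.length / n) * gini(right)
}

val parent = Seq(0, 0, 1, 1)
val perfectSplit = infoGain(parent, Seq(0, 0), Seq(1, 1))
val uselessSplit = infoGain(parent, Seq(0, 1), Seq(0, 1))
println(s"perfect=$perfectSplit useless=$uselessSplit")
```

A split whose gain falls below `minInfoGain` (for example the useless split above, with gain 0.0) would not be made.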

Question 18: Which is true about Random Forests?

They support non-categorical features
They combine many decision trees in order to reduce the risk of overfitting
They do not support regression
They only support binary classification
None are true

Question 19: When comparing Random Forests versus Gradient-Boosted Trees, what must you consider?

Parallelization abilities
Depth of Trees
How the number of trees affects the outcome
All of these
None of these

Question 20: Which is not a valid type of Evaluator in MLlib?

Multi-Class Classification Evaluator
Binary Classification Evaluator
Regression Evaluator
Multi-Label Classification Evaluator
All are valid

Introduction to Data Science with Scala

Scala is a versatile programming language that runs on the Java Virtual Machine (JVM), making it compatible with existing Java libraries and frameworks. When it comes to data science, Scala can be a powerful choice, especially when combined with popular data processing and machine learning libraries. Here are some key aspects of doing data science with Scala:

Apache Spark:
- Scala is the primary language for Apache Spark, a fast and distributed data processing engine. Spark provides APIs in Scala, Java, Python, and R, but Scala’s concise syntax and functional programming features make it a popular choice for Spark development.
- Spark allows you to perform distributed data processing, machine learning, graph processing, and more. You can use Spark’s DataFrame API or the lower-level RDD API for data manipulation.
Scala Libraries for Data Science:
- Breeze: Breeze is a numerical processing library for Scala that provides support for linear algebra, statistics, and signal processing. It can be used for various data manipulation and analysis tasks.
- Smile: Smile is a machine learning library for Scala and Java. It includes a variety of algorithms for classification, regression, clustering, and dimensionality reduction.
- Algebird: Algebird is a library for abstract algebra for Scala, and it can be useful for creating scalable and efficient data processing pipelines.
DataFrames and Datasets:
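- Spark’s DataFrame and Dataset APIs let Scala users work with tabular data through SQL-like operations, and Datasets add compile-time type checking via Scala case classes. Beyond Spark, Apache Flink has also offered Scala APIs for distributed stream and batch processing, and its Table API and SQL support make it easy to work with data using familiar SQL-like queries.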
Deep Learning with Scala:
- Libraries like Deeplearning4j provide support for deep learning in Scala. Deeplearning4j is designed for Java and Scala compatibility, making it suitable for JVM-based languages.
Interactive Data Analysis:
- Scala can be used interactively for data analysis using tools like Jupyter notebooks with the Scala kernel or Zeppelin notebooks. These notebooks allow you to combine code, visualizations, and narrative text in a single document.
Data Visualization:
- For data visualization, you can use Scala libraries like Breeze-viz or integrate with popular Java visualization libraries. Alternatively, you can export data to be visualized in other tools like Python-based Matplotlib or Plotly.
Data Management and Processing:
- Scala can be used for ETL (Extract, Transform, Load) tasks and data processing. Apache Kafka, a distributed streaming platform, has a Scala API that can be used for building real-time data pipelines.
Concurrency and Parallelism:
- Scala’s support for functional programming, together with concurrency tools such as the standard library’s Futures and actor libraries like Akka, makes it well-suited for handling parallel processing tasks in data science.
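As a small illustration of the concurrency point using only the standard library (actor libraries such as Akka are separate dependencies and are not shown), the sketch below sums chunks of a collection in parallel with `Future`s and then combines the partial results. The object and method names are hypothetical:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Kept in its own object so the Future bodies do not run inside a script or REPL
// wrapper's initializer, a known source of deadlocks when combined with Await.
object ChunkedSum {
  // Split the data into chunks, sum each chunk in its own Future,
  // then combine the partial sums once all Futures have completed.
  def parallelSum(data: Vector[Long], chunkSize: Int): Long = {
    val partialSums: List[Future[Long]] =
      data.grouped(chunkSize).toList.map(chunk => Future(chunk.sum))
    Await.result(Future.sequence(partialSums).map(_.sum), 10.seconds)
  }
}

val total = ChunkedSum.parallelSum((1L to 1000000L).toVector, chunkSize = 250000)
println(total)
```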

When working with Scala for data science, it’s essential to leverage its functional programming features, which can lead to more concise and expressive code. Additionally, familiarity with the Apache Spark ecosystem is crucial, as it is a prevalent framework for large-scale data processing and analytics using Scala.
