Apache Pig 101 Cognitive Class Exam Quiz Answers

Clear My Certification | January 12, 2024 | Cognitive Class

Apache Pig 101 Cognitive Class Certification Answers

Module 1: Pig Basics Quiz Answers – Cognitive Class

Question 1: What are the five ways to invoke Pig?

Script, Interactive Mode, Java Command, Interactive Local Mode, Interactive MapReduce Mode
Interactive External Mode, Interactive Mode, Script, Java Command, Interactive MapReduce Mode
Interactive Service Mode, Interactive Local Mode, Interactive External Mode, Interactive MapReduce Mode, Java Command
Interactive Local Mode, Interactive MapReduce Mode, Interactive External Mode, Interactive Mode, Script

Question 2: Bags are groups of tuples, tuples are groups of fields, and fields are composed of scalar data types. True or false?

True
False

Question 3: Which of the following statements is true?

Names of relations and fields, as well as keywords and operators, are case sensitive. However, function names are case insensitive.
Keywords and operator names are case sensitive.
Function names are case sensitive.
Names of relations are case sensitive, but names of fields are case insensitive.

Module 2: Pig Relational Operators Quiz Answers – Cognitive Class

Question 1: For the tuples (3,5,2) (5,2,1) (3,7,3) (3,6,1), using the GROUP operator on the third field produces the following: (2,{(3,5,2)}), (1,{(5,2,1),(3,6,1)}), (3,{(3,7,3)}). True or false? Disregard order when answering.

True
False
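
To check the grouping in Question 1 by hand, here is a minimal Python sketch that emulates what GROUP does: it collects whole tuples into a bag per key. The helper name `group_by_field` is made up for illustration and is not part of Pig.

```python
from collections import defaultdict

def group_by_field(tuples, index):
    """Emulate Pig's GROUP: return (key, bag) pairs, where each bag holds
    the complete input tuples that share the value at position `index`."""
    bags = defaultdict(list)
    for t in tuples:
        bags[t[index]].append(t)
    return list(bags.items())

rows = [(3, 5, 2), (5, 2, 1), (3, 7, 3), (3, 6, 1)]

# Roughly: B = GROUP A BY $2;  (the third field is position 2)
grouped = group_by_field(rows, 2)

for key, bag in grouped:
    print(key, bag)
```

Each key appears once, paired with every tuple that carries it in the third field. Order is not significant.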

Question 2: UNION, GROUP, and COGROUP can be used interchangeably without creating different outputs. True or false?

True
False

Question 3: Which operators can be used within a nested FOREACH block?

LIKE, COUNT, LIMIT, ORDER BY
COUNT, ORDER BY, AVG, DISTINCT
AVG, LIMIT, FILTER, LIKE
LIMIT, DISTINCT, ORDER BY, FILTER

Module 3: Pig Evaluation Functions Quiz Answers – Cognitive Class

Question 1: The COUNT operator does NOT require the use of the GROUP BY operator. True or false?

True
False

Question 2: The TOKENIZE() function splits a string and outputs a bag of words. True or false?

True
False
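
For intuition about Question 2, the behaviour of TOKENIZE() can be approximated in Python. This is only a sketch: the separator set below (space, double quote, comma, parentheses, star) is an assumption based on the Pig built-in function documentation, and the function name `tokenize` is illustrative.

```python
import re

# Assumed default word separators: space, double quote, comma,
# parentheses, and star.
_SEPARATORS = re.compile(r'[ ",()*]+')

def tokenize(text):
    """Return a 'bag' (here a list) of single-field tuples, one per word."""
    return [(word,) for word in _SEPARATORS.split(text) if word]

# Roughly: B = FOREACH A GENERATE TOKENIZE(line);
bag = tokenize("hello world, from (pig)")
print(bag)
```

The result is a bag of one-field tuples, which is why TOKENIZE is usually followed by FLATTEN when counting words.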

Question 3: The two types of UDFs are DEFINE and REGISTER. True or false?

True
False

Apache Pig 101 Final Exam Answers – Cognitive Class

Question 1: What is the primary purpose of Pig in the Hadoop architecture?

To provide logging support for Hadoop jobs
To support the execution of workflows consisting of a collection of actions
To provide a high-level programming language so that developers can simplify the task of writing MapReduce applications
To move data into HDFS

Question 2: When executing Pig in local mode, the process runs locally, but all of the data files are accessed via HDFS. True or false?

True
False

Question 3: Data can be loaded into Pig with or without defining a schema. True or false?

True
False

Question 4: In Pig, you can specify the delimiter used to load data by

doing nothing. Pig can automatically detect the delimiter used in your data file
adding a schema definition to your LOAD statement
adding ‘using PigStorage(delimiter)’ to your LOAD statement
All of the above

Question 5: Which of the following can be used to pass parameters into a Pig Script? Select all that apply.

Command line parameters
A parameter file
JSON
Web Services
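
To make Question 5 concrete, the sketch below imitates Pig's `$name` parameter substitution in Python using `string.Template`. The parameter file format (`name = value` lines with `#` comments), the file contents, the paths, and the rule that command-line values override file values are assumptions made for this illustration.

```python
from string import Template

def parse_param_file(text):
    """Read `name = value` lines, skipping blank lines and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params

# A one-line Pig script with two placeholders.
script = Template("A = LOAD '$input' USING PigStorage('$delim');")

# Hypothetical parameter file contents (as if passed with -param_file).
file_params = parse_param_file(
    "# hypothetical parameter file\ninput = /data/in.csv\ndelim = ,\n"
)

# As if passed on the command line with -param input=/data/other.csv.
cli_params = {"input": "/data/other.csv"}

# In this sketch, command-line values override values from the file.
resolved = script.substitute({**file_params, **cli_params})
print(resolved)
```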

Question 6: Which Pig Operator is used to save data into a file?

SAVE
LOAD
STORE
DUMP

Question 7: In Pig, all tuples in a relation must have the same number of fields. True or false?

True
False

Question 8: Which Pig relational operator is used to select tuples from a relation based on some criteria?

transform
filter
group
order by

Question 9: Which Pig relational operator is used to combine all the tuples in a relation that have the same key?

union
transform
filter
group
join

Question 10: Which Pig relational operator is used to combine two or more relations using one or more common field values?

union
transform
filter
group
join

Question 11: The Pig Tokenize evaluation operator splits a string and outputs a bag of words. True or false?

True
False

Question 12: When using the Pig Count evaluation operator, you must also use either the Group All or the Group By operator. True or false?

True
False

Question 13: Which of the following Pig operators can be used to review the logical, physical, and MapReduce execution plans?

Verbose
Dump
Store
Explain

Question 14: Which of the following is a valid Pig evaluation operator?

isempty
count_star
diff
count
All of the above

Question 15: You can extend Pig via user defined functions. True or false?

True
False

Introduction to Apache Pig 101

Apache Pig is a high-level platform and scripting language built on top of Hadoop for processing and analyzing large datasets. It simplifies the development of complex data transformations using a language called Pig Latin. Pig is an Apache project in the Hadoop ecosystem and is particularly useful for ETL (Extract, Transform, Load) tasks.

Here are some key concepts and features of Apache Pig:

1. Pig Latin:

Pig Latin is the language used to express data transformations in Apache Pig. It is a high-level scripting language with a simple, readable syntax for writing data processing workflows.

2. Data Flow Language:

Pig operates on the principle of data flow. Users express transformations using a series of operations, and Pig automatically translates these operations into a series of MapReduce jobs that can be executed on a Hadoop cluster.

3. Abstraction over MapReduce:

Pig abstracts away the complexity of writing low-level MapReduce code. Users can focus on expressing the data transformations they need without dealing directly with the intricacies of distributed computing.

4. Load and Store Functions:

Pig provides the LOAD and STORE operators, together with built-in load/store functions such as PigStorage, for reading data into a script and writing results back to HDFS or other storage systems.

5. Transformations:

Pig supports a wide range of transformations, including filtering (FILTER), grouping (GROUP), joining (JOIN), and aggregating (SUM, AVG). These operations make it easy to express complex data processing tasks.
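
The transformations above can be pictured as a small data flow. The Python sketch below emulates FILTER, GROUP, AVG, and an inner JOIN on in-memory lists, with a rough Pig Latin equivalent in each comment. The relations, field layout, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical relations: (name, dept, salary) and (dept, city).
employees = [("ann", "eng", 120), ("bob", "eng", 100),
             ("cy", "ops", 80), ("di", "ops", 40)]
depts = [("eng", "Austin"), ("ops", "Boston")]

# well_paid = FILTER employees BY salary >= 50;
well_paid = [e for e in employees if e[2] >= 50]

# grouped = GROUP well_paid BY dept;
by_dept = defaultdict(list)
for e in well_paid:
    by_dept[e[1]].append(e)

# avg_salary = FOREACH grouped GENERATE group, AVG(well_paid.salary);
avg_salary = [(dept, mean(e[2] for e in bag)) for dept, bag in by_dept.items()]

# joined = JOIN avg_salary BY $0, depts BY dept;
# An inner join keeps the fields of both sides in each output tuple.
joined = [a + d for a in avg_salary for d in depts if a[0] == d[0]]

for row in joined:
    print(row)
```

Each step consumes the relation produced by the previous one, which is the data flow style that Pig Latin scripts follow.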

6. User-Defined Functions (UDFs):

Users can extend Pig by writing their own User-Defined Functions (UDFs) in languages like Java or Python. UDFs allow for custom processing and integration with external libraries.
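
As an illustration, a Python UDF might look like the sketch below. It assumes the file is registered with Pig's `streaming_python` engine, where the `outputSchema` decorator is imported from `pig_util`; the fallback decorator exists only so the file also runs outside Pig. The function name `to_upper` and the schema string are hypothetical.

```python
try:
    # Assumed to be available when Pig runs this file as a
    # streaming_python UDF.
    from pig_util import outputSchema
except ImportError:
    # Stand-in so the file can also be run or unit-tested without Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

@outputSchema("word_upper:chararray")
def to_upper(word):
    """Hypothetical UDF: upper-case a chararray, passing nulls through."""
    return None if word is None else word.upper()

print(to_upper("pig"))
```

In a Pig script, such a file would typically be made available with a statement along the lines of `REGISTER 'my_udfs.py' USING streaming_python AS my_udfs;` and then called as `my_udfs.to_upper(name)`. The file name and alias here are placeholders.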

7. Schema On Read:

Pig uses a “schema on read” approach, meaning that data is interpreted and structured as it is loaded, allowing flexibility in handling various data formats.
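
A rough Python picture of schema on read: the same raw lines can be loaded with or without a schema, and types are applied only as each line is read. The `load` helper, its parameters, and the sample data are hypothetical; they loosely mirror `LOAD ... USING PigStorage(',') AS (name:chararray, age:int)`.

```python
def load(lines, delimiter="\t", schema=None):
    """Emulate LOAD ... USING PigStorage(delimiter) [AS schema].

    `schema` is a list of (field_name, cast) pairs. Without a schema the
    fields keep their raw form (Pig would treat them as bytearray); with
    one, each field is cast as the line is read.
    """
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if schema is None:
            yield tuple(fields)
        else:
            yield tuple(cast(value) for (_, cast), value in zip(schema, fields))

raw_lines = ["ann,34", "bob,27"]  # hypothetical comma-delimited input

# No schema: every field stays an untyped string.
untyped = list(load(raw_lines, delimiter=","))

# With a schema: the same bytes are interpreted as (chararray, int).
typed = list(load(raw_lines, delimiter=",", schema=[("name", str), ("age", int)]))

print(untyped)
print(typed)
```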

8. Multi-Query Execution:

Pig supports multi-query execution: it can plan a whole script, including several STORE statements, as one unit so that shared steps such as a common LOAD run only once, which can improve performance.

9. Optimization:

Pig has built-in optimization techniques to improve the performance of data processing tasks. It optimizes the execution plan and reduces the number of MapReduce jobs required.

Apache Pig is a powerful tool for expressing complex data transformations and processing large-scale datasets in a distributed environment. It abstracts away the complexities of MapReduce, making it accessible to users with less experience in distributed computing.