Moving Data into Hadoop Cognitive Class Certification Answers
Module 1 – Load Scenarios Quiz Answers – Cognitive Class
Question 1: What is Data at rest?
- Data that is being transferred over
- Data that is already in a file in some directory
- Data that hasn’t been used in a while
- Data that needs to be copied over
Question 2: Data can be moved using BigSQL Load. True or false?
- True
- False
Question 3: Which of the following does not relate to Flume?
- Pipe
- Sink
- Interceptors
- Source
Module 2 – Using Sqoop Quiz Answers – Cognitive Class
Question 1: Sqoop is designed to
- export data from HDFS to streaming software
- read and understand data from a relational database at a high level
- prevent “bad” data in a relational database from going into Hadoop
- transfer data between relational database systems and Hadoop
Question 2: Which of the following is NOT an argument for Sqoop?
- --update-key
- --split-from
- --target-dir
- --connect
Question 3: By default, Sqoop assumes that it’s working with space-separated fields and that each record is terminated by a newline. True or false?
- True
- False
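For reference, the arguments above typically appear together in a single import command. Below is a minimal sketch, assuming a MySQL source database; the host, credentials, table name, and paths are illustrative:

```
# Import the "orders" table from a MySQL database into HDFS.
# --connect takes a JDBC URL: driver type, hostname, port, and database name.
# -P prompts for the password; --target-dir is the HDFS output directory.
# --fields-terminated-by overrides the default comma field delimiter.
sqoop import \
  --connect jdbc:mysql://dbserver.example.com:3306/sales \
  --username sqoopuser -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --fields-terminated-by '\t'
```

Sqoop runs this as a MapReduce job, and the -m (or --num-mappers) option controls how many map tasks perform the transfer in parallel.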
Module 3 – Flume Overview Quiz Answers – Cognitive Class
Question 1: Avro is a remote procedure call and serialization framework, developed within a separate Apache project. True or false?
- True
- False
Question 2: Data sent through Flume
- may have different batching but must be in a constant stream
- may have different batching or a different reliability setup
- must be in a particular format
- has to be in a constant stream
Question 3: A single Avro source can receive data from multiple Avro sinks. True or false?
- True
- False
Module 4 – Using Flume Quiz Answers – Cognitive Class
Question 1: Which of the following is NOT a supplied Interceptor?
- Regex extractor
- Regex sinker
- HostType
- Static
Question 2: Channels are:
- where the data is staged after having been read in by a source and not yet written out by a sink
- where the data is staged after having been read in by a sink and not yet written out by a source
- where the data is staged after having been written in by a source and not yet read out by a sink
- where the data is staged after having been written in by a sink and not yet written out by a source
Question 3: One property for sources is selector.type. True or false?
- True
- False
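To make the source, channel, and sink relationship concrete, here is a minimal sketch of a single-agent Flume configuration file. The agent and component names, port, and capacity are illustrative, and a real deployment would usually end in an HDFS sink rather than a logger:

```
# agent1.conf - a minimal single-agent pipeline
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = snk1

# Source: read events from a netcat-style TCP port
agent1.sources.src1.type = netcat
agent1.sources.src1.bind = 0.0.0.0
agent1.sources.src1.port = 44444
# selector.type controls how the source fans events out to its channels
agent1.sources.src1.selector.type = replicating
agent1.sources.src1.channels = ch1

# Channel: stage events in memory between the source and the sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000

# Sink: write events to the console log
agent1.sinks.snk1.type = logger
agent1.sinks.snk1.channel = ch1

# Start with: flume-ng agent --conf conf --conf-file agent1.conf --name agent1
```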
Moving Data into Hadoop Final Exam Answers – Cognitive Class
Question 1: The HDFS copyFromLocal command can be used to
- capture streaming data that you want to store in Hadoop
- ensure that log files which are actively being used to capture logging from a web server are moved into Hadoop
- move data from a relational database or data warehouse into Hadoop
- None of the above
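For reference, a minimal copyFromLocal invocation looks like this (paths are illustrative); it simply copies a static local file into HDFS, which is why it is not suited to streaming sources, actively written log files, or relational databases:

```
hdfs dfs -copyFromLocal /tmp/sales_2023.csv /user/hadoop/staging/
```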
Question 2: What is the primary purpose of Sqoop in the Hadoop architecture?
- To “catch” logging data as it is written to log files and move it into Hadoop
- To schedule scripts that can be run periodically to collect data into Hadoop
- To import data from a relational database or data warehouse into Hadoop
- To move static files from the local file system into HDFS
- To stream data into Hadoop
Question 3: A Sqoop JDBC connection string must include
- the name of the database you wish to connect to
- the hostname of the database server
- the port that the database server is listening on
- the name of the JDBC driver to use for the connection
- All of the above
Question 4: Sqoop can be used to either import data from relational tables into Hadoop or export data from Hadoop to relational tables. True or false?
- True
- False
Question 5: When importing data via Sqoop, the imported data can include
- a collection of data from multiple tables via a join operation, as specified by a SQL query
- specific rows and columns from a specific table
- all of the data from a specific table
- All of the Above
Question 6: When importing data via Sqoop, the incoming data can be stored as
- Serialized Objects
- JSON
- XML
- None of the Above
Question 7: Sqoop uses MapReduce jobs to import and export data, and you can configure the number of Mappers used. True or false?
- True
- False
Question 8: What is the primary purpose of Flume in the Hadoop architecture?
- To “catch” logging data as it is written to log files and move it into Hadoop
- To schedule scripts that can be run periodically to collect data into Hadoop
- To import data from a relational database or data warehouse into Hadoop
- To move static files from the local file system into HDFS
- To stream data into Hadoop
Question 9: When you create the configuration file for a Flume agent, you must configure
- an Interceptor
- a Sink
- a Channel
- a Source
- All of the above
Question 10: When using Flume, a Source and a Sink are “wired together” using an Interceptor. True or false?
- True
- False
Question 11: Flume agents can run on multiple servers in the enterprise, and they can communicate with each other over the network to move data. True or false?
- True
- False
Question 12: Possible Flume channels include
- The implementation of your own channel
- File Storage
- Database Storage
- In Memory
- All of the Above
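As a rough illustration, the channel type is just a property on the agent: a memory channel trades durability for speed, while a file channel persists staged events to local disk (the directory paths here are illustrative):

```
# In-memory channel: fast, but staged events are lost if the agent stops
agent1.channels.ch1.type = memory

# File channel: staged events are persisted to local disk
agent1.channels.ch2.type = file
agent1.channels.ch2.checkpointDir = /var/flume/checkpoint
agent1.channels.ch2.dataDirs = /var/flume/data
```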
Question 13: Flume provides a number of source types including
- Elastic Search
- HBase
- Hive
- HDFS
- None of the Above
Question 14: Flume agent configuration is specified using
- CSV
- a text file, similar to the Java .properties format
- JSON
- XML, similar to Sqoop configuration
Question 15: To pass data from a Flume agent on one node to another, you can configure an Avro sink on the first node and an Avro source on the second. True or false?
- True
- False
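A minimal sketch of that Avro hand-off, assuming two agents named agentA and agentB running on different hosts (the hostnames, port, and channel names are illustrative):

```
# Node A: the sending agent ends its pipeline with an Avro sink
agentA.sinks.avroOut.type = avro
agentA.sinks.avroOut.hostname = nodeB.example.com
agentA.sinks.avroOut.port = 4141
agentA.sinks.avroOut.channel = chA

# Node B: the receiving agent begins its pipeline with an Avro source
agentB.sources.avroIn.type = avro
agentB.sources.avroIn.bind = 0.0.0.0
agentB.sources.avroIn.port = 4141
agentB.sources.avroIn.channels = chB
```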
Introduction to Moving Data into Hadoop
Moving data into Hadoop typically involves several steps, and the process may vary depending on the specific requirements and tools used in your environment. Hadoop is a distributed storage and processing framework that can handle large volumes of data across a cluster of commodity hardware. Here is a general guide on how data is typically moved into Hadoop:
- Understand the Data Requirements:
- Identify the data sources: Determine where your data is currently stored. This could be in various databases, log files, flat files, or other data storage systems.
- Define the data format: Understand the format of the data you want to move, such as structured (e.g., CSV, JSON, Avro) or unstructured (e.g., text logs).
- Choose the Right Ingestion Tool:
- The Hadoop ecosystem provides various tools for data ingestion. Some popular ones include:
- Apache Sqoop: Sqoop is a tool designed for efficiently transferring bulk data between Hadoop and structured data stores such as relational databases.
- Apache Flume: Flume is used for collecting, aggregating, and moving large amounts of streaming data into Hadoop.
- Apache Kafka: Kafka is a distributed event streaming platform that can be used for real-time data streaming and can feed data into Hadoop.
- Prepare Data for Ingestion:
- Ensure that the data is well-prepared and cleaned before ingestion. This may involve data cleaning, transformation, and enrichment to make it suitable for analysis in Hadoop.
- Configure and Execute Data Ingestion:
- Depending on the tool chosen, configure the ingestion process. For example:
- In Sqoop, you would configure the connection to the source database, specify the data to transfer, and define the target location in Hadoop.
- In Flume, you would configure sources (data producers), channels (buffers), and sinks (data consumers), and set up the flow for data movement.
- With Kafka, you would set up producers to publish data to topics, and consumers to subscribe to topics and pull data into Hadoop (one common wiring is sketched after this list).
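As a sketch of the Kafka case, one common pattern (not the only one) is a Flume agent whose source reads from a Kafka topic and whose sink writes to HDFS; the broker, topic, and HDFS path below are illustrative:

```
# agent1.conf - read events from a Kafka topic and land them in HDFS
agent1.sources  = kafkaSrc
agent1.channels = ch1
agent1.sinks    = hdfsSink

# Kafka source: subscribe to a topic on the given brokers
agent1.sources.kafkaSrc.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.kafkaSrc.kafka.bootstrap.servers = broker1.example.com:9092
agent1.sources.kafkaSrc.kafka.topics = web-logs
agent1.sources.kafkaSrc.channels = ch1

# Memory channel between source and sink
agent1.channels.ch1.type = memory

# HDFS sink: write events into date-partitioned directories
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/data/web-logs/%Y-%m-%d
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfsSink.channel = ch1
```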
- Monitor and Optimize:
- Monitor the data ingestion process to ensure that it is progressing as expected. Check for any errors, performance issues, or bottlenecks.
- Optimize the configuration and infrastructure based on monitoring results to improve performance.
- Data Validation:
- After the data is ingested, perform data validation to ensure that it was transferred accurately and completely. This step is crucial to maintaining data integrity.
- Data Storage and Processing in Hadoop:
- Once the data is in Hadoop, you can use the Hadoop Distributed File System (HDFS) for storage and leverage processing frameworks such as Apache Hive, Apache Spark, or MapReduce for analytics and data processing.