[2022] New Professional-Data-Engineer exam dumps Use Updated Google Exam [Q31-Q49]



Verified Professional-Data-Engineer Dumps Q&As - Professional-Data-Engineer Test Engine with Correct Answers

NEW QUESTION 31
You are running a pipeline in Cloud Dataflow that receives messages from a Cloud Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4 and has a maximum of 3 workers, instance type n1-standard-1. You notice that during peak periods, your pipeline is struggling to process records in a timely fashion, when all 3 workers are at maximum CPU utilization. Which two actions can you take to increase performance of your pipeline? (Choose two.)

  • A. Change the zone of your Cloud Dataflow pipeline to run in us-central1
  • B. Use a larger instance type for your Cloud Dataflow workers
  • C. Create a temporary table in Cloud Spanner that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Spanner to BigQuery
  • D. Increase the number of max workers
  • E. Create a temporary table in Cloud Bigtable that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Bigtable to BigQuery

Answer: B,D
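
Both remedies in options B and D are launch-time settings rather than code changes. This is a minimal sketch that assumes the Apache Beam Python SDK's `--max_num_workers` and `--worker_machine_type` pipeline options. The helper name, the project ID, the ceiling of 10 workers and the `n1-standard-4` machine type are hypothetical values chosen for illustration.

```python
from typing import List


def dataflow_scaling_args(project: str, region: str, max_workers: int,
                          machine_type: str) -> List[str]:
    """Launch options for a streaming Dataflow job with more headroom (hypothetical helper)."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",  # stay in europe-west4, close to the EU dataset
        "--streaming",
        f"--max_num_workers={max_workers}",       # scale out: raise the autoscaling ceiling
        f"--worker_machine_type={machine_type}",  # scale up: more vCPUs per worker
    ]


args = dataflow_scaling_args("my-project", "europe-west4", 10, "n1-standard-4")
print(" ".join(args))
```

If the Beam SDK is installed, such a list can be passed to `apache_beam.options.pipeline_options.PipelineOptions(args)`.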

 

NEW QUESTION 32
You are creating a new pipeline in Google Cloud to stream IoT data from Cloud Pub/Sub through Cloud Dataflow to BigQuery. While previewing the data, you notice that roughly 2% of the data appears to be corrupt.
You need to modify the Cloud Dataflow pipeline to filter out this corrupt data. What should you do?

  • A. Add a ParDo transform in Cloud Dataflow to discard corrupt elements.
  • B. Add a SideInput that returns a Boolean if the element is corrupt.
  • C. Add a Partition transform in Cloud Dataflow to separate valid data from corrupt data.
  • D. Add a GroupByKey transform in Cloud Dataflow to group all of the valid data together and discard the rest.

Answer: A
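
The idea behind a filtering ParDo can be sketched in plain Python. A `DoFn.process` method is a generator that may emit zero or more outputs per input element, so corrupt records are discarded simply by not yielding them. The class below does not subclass `apache_beam.DoFn`, so it runs without Beam installed; in a real pipeline it would subclass `beam.DoFn` and be applied with `beam.ParDo(...)`. The validity rule (JSON with numeric `device_id` and `temperature` fields) is a hypothetical example.

```python
import json
from typing import Iterator


class DropCorruptRecords:
    """Same shape as a Beam DoFn: process() yields only valid elements."""

    REQUIRED_NUMERIC_FIELDS = ("device_id", "temperature")  # hypothetical schema

    def process(self, element: bytes) -> Iterator[dict]:
        try:
            record = json.loads(element)
        except (UnicodeDecodeError, json.JSONDecodeError):
            return  # unparseable message: emit nothing, i.e. discard it
        if not isinstance(record, dict):
            return
        if all(isinstance(record.get(f), (int, float)) for f in self.REQUIRED_NUMERIC_FIELDS):
            yield record


messages = [
    b'{"device_id": 1, "temperature": 21.5}',
    b'{"device_id": 2, "temp',   # truncated payload
    b'{"device_id": 3}',         # missing reading
]
fn = DropCorruptRecords()
valid = [out for msg in messages for out in fn.process(msg)]
print(valid)  # [{'device_id': 1, 'temperature': 21.5}]
```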

 

NEW QUESTION 33
Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?

  • A. Retry the query with exponential backoff, up to a cap of 15 minutes.
  • B. Retry the query every second until it comes back online to minimize staleness of data.
  • C. Issue a command to restart the database servers.
  • D. Reduce the query frequency to once every hour until the database comes back online.

Answer: A

Explanation:
https://cloud.google.com/sql/docs/mysql/manage-connections#backoff
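
Option A can be sketched in plain Python as follows. The sketch assumes the database client raises `ConnectionError` when the database is unavailable (a hypothetical choice; a real client library raises its own exception types). Delays double from one second up to the 15-minute cap, and each wait is randomized ("full jitter") so that many frontend instances do not retry in lockstep; the jitter is an addition, not part of the question.

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")

MAX_BACKOFF_SECONDS = 15 * 60  # cap from option A: the normal 15-minute poll interval


def backoff_delays(base: float = 1.0, cap: float = MAX_BACKOFF_SECONDS) -> Iterator[float]:
    """Yield 1, 2, 4, 8, ... seconds, never exceeding `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def query_with_backoff(query: Callable[[], T], max_attempts: int = 20,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `query`, retrying on ConnectionError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return query()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            # Full jitter: wait a random time between 0 and the current backoff.
            sleep(random.uniform(0, next(delays)))
    raise AssertionError("unreachable")


# Usage: a hypothetical query that fails twice, then succeeds.
calls = {"n": 0}


def flaky_query() -> float:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("database unavailable")
    return 21.5


waits = []  # record the waits instead of actually sleeping
result = query_with_backoff(flaky_query, sleep=waits.append)
print(result)  # 21.5
```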

 

NEW QUESTION 34
An online retailer has built their current application on Google App Engine. A new initiative at the company mandates that they extend their application to allow their customers to transact directly via the application.
They need to manage their shopping transactions and analyze combined data from multiple datasets using a business intelligence (BI) tool. They want to use only a single database for this purpose. Which Google Cloud database should they choose?

  • A. Cloud Bigtable
  • B. Cloud SQL
  • C. Cloud Datastore
  • D. BigQuery

Answer: B

 

NEW QUESTION 35
Which of the following statements about the Wide & Deep Learning model are true? (Select 2 answers.)

  • A. The wide model is used for memorization, while the deep model is used for generalization.
  • B. A good use for the wide and deep model is a recommender system.
  • C. A good use for the wide and deep model is a small-scale linear regression problem.
  • D. The wide model is used for generalization, while the deep model is used for memorization.

Answer: A,B

Explanation:
Can we teach computers to learn like humans do, by combining the power of memorization and generalization? It's not an easy question to answer, but by jointly training a wide linear model (for memorization) alongside a deep neural network (for generalization), one can combine the strengths of both to bring us one step closer. At Google, we call it Wide & Deep Learning. It's useful for generic large-scale regression and classification problems with sparse inputs (categorical features with a large number of possible feature values), such as recommender systems, search, and ranking problems.
Reference: https://research.googleblog.com/2016/06/wide-deep-learning-better-together-with.html
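
To make the memorization/generalization split concrete, here is a toy forward pass in plain Python. It only shows how the two parts are combined: their logits are summed into a single sigmoid output, so one loss would train both parts jointly. Training is omitted, and the layer sizes, feature ids and averaging of embeddings are illustrative assumptions rather than the configuration described in the blog post.

```python
import math
import random
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class WideAndDeep:
    """Toy forward pass of a wide & deep binary classifier (no training loop)."""

    def __init__(self, vocab: int, n_crosses: int, dim: int = 4, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Wide part: one weight per crossed feature (exact co-occurrences).
        self.wide = [0.0] * n_crosses
        # Deep part: a dense embedding per sparse id plus one ReLU hidden layer.
        self.emb = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab)]
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.w2 = [rng.gauss(0, 0.1) for _ in range(hidden)]
        self.bias = 0.0

    def predict(self, ids: List[int], cross_ids: List[int]) -> float:
        # Memorization: a sparse linear score over crossed features.
        wide_logit = sum(self.wide[c] for c in cross_ids)
        # Generalization: average the embeddings, then a small MLP.
        dim = len(self.emb[0])
        avg = [sum(self.emb[i][d] for i in ids) / len(ids) for d in range(dim)]
        hidden = [max(0.0, sum(w * x for w, x in zip(row, avg))) for row in self.w1]
        deep_logit = sum(w * h for w, h in zip(self.w2, hidden))
        # Joint model: both logits feed one output unit.
        return sigmoid(wide_logit + deep_logit + self.bias)


model = WideAndDeep(vocab=100, n_crosses=1000)
p_before = model.predict(ids=[3, 17], cross_ids=[42])
model.wide[42] = 2.0  # pretend training memorized that cross 42 predicts a positive label
p_after = model.predict(ids=[3, 17], cross_ids=[42])
print(p_before < p_after)  # True
```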

 

NEW QUESTION 36
Does Dataflow process batch data pipelines or streaming data pipelines?

  • A. None of the above
  • B. Only Batch Data Pipelines
  • C. Both Batch and Streaming Data Pipelines
  • D. Only Streaming Data Pipelines

Answer: C

Explanation:
Dataflow is a unified processing model and can execute both streaming and batch data pipelines.
Reference: https://cloud.google.com/dataflow/

 

NEW QUESTION 37
You've migrated a Hadoop job from an on-premises cluster to Dataproc and Cloud Storage (GCS). Your Spark job is a complicated analytical workload that consists of many shuffling operations, and the initial data are Parquet files (200-400 MB each on average). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize it. Keep in mind that your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptible VMs (with only 2 non-preemptible workers) for this workload.
What should you do?

  • A. Switch from HDDs to SSDs, copy initial data from GCS to HDFS, run the Spark job and copy results back to GCS.
  • B. Switch to TFRecords formats (appr. 200MB per file) instead of parquet files.
  • C. Switch from HDDs to SSDs, override the preemptible VMs configuration to increase the boot disk size.
  • D. Increase the size of your parquet files to ensure them to be 1 GB minimum.

Answer: D

 

NEW QUESTION 38
You are migrating your data warehouse to BigQuery. You have migrated all of your data into tables in a dataset. Multiple users from your organization will be using the data. They should only see certain tables based on their team membership. How should you set user permissions?

  • A. Create SQL views for each team in the same dataset in which the data resides, and assign the users/groups data viewer access to the SQL views
  • B. Create authorized views for each team in datasets created for each team. Assign the authorized views data viewer access to the dataset in which the data resides. Assign the users/groups data viewer access to the datasets in which the authorized views reside
  • C. Assign the users/groups data viewer access at the table level for each table
  • D. Create authorized views for each team in the same dataset in which the data resides, and assign the users/groups data viewer access to the authorized views

Answer: C
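
With table-level access control, each table carries its own IAM policy. The sketch below only builds those policy documents in the standard `bindings`/`role`/`members` shape; the team groups and table names are hypothetical, and actually applying the policies through the BigQuery API or tooling is left out. Granting to groups rather than to individual users keeps the number of bindings small.

```python
import json
from typing import Dict, List, Set

# Hypothetical mapping from team groups to the tables each team may read.
TEAM_TABLES: Dict[str, List[str]] = {
    "group:sales-analysts@example.com": ["orders", "customers"],
    "group:finance@example.com": ["orders", "invoices"],
}


def table_policies(team_tables: Dict[str, List[str]]) -> Dict[str, dict]:
    """Build one IAM policy per table granting roles/bigquery.dataViewer to groups."""
    members_by_table: Dict[str, Set[str]] = {}
    for group, tables in team_tables.items():
        for table in tables:
            members_by_table.setdefault(table, set()).add(group)
    return {
        table: {"bindings": [{"role": "roles/bigquery.dataViewer",
                              "members": sorted(members)}]}
        for table, members in members_by_table.items()
    }


policies = table_policies(TEAM_TABLES)
print(json.dumps(policies["orders"], indent=2))
```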

 

NEW QUESTION 39
You have a Google Cloud Dataflow streaming pipeline running with a Google Cloud Pub/Sub subscription as the source. You need to make a code update that will make the new Cloud Dataflow pipeline incompatible with the current version. You do not want to lose any data when making this update. What should you do?

  • A. Update the current pipeline and provide the transform mapping JSON object.
  • B. Create a new pipeline that has a new Cloud Pub/Sub subscription and cancel the old pipeline.
  • C. Create a new pipeline that has the same Cloud Pub/Sub subscription and cancel the old pipeline.
  • D. Update the current pipeline and use the drain flag.

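Answer: D

Explanation:
A transform name mapping (--transformNameMapping) only helps when transforms have been renamed; a change that is otherwise incompatible fails the update compatibility check. To avoid data loss, drain the running job: it stops pulling new messages from the Pub/Sub subscription and finishes processing the data already in flight. Then launch the new version of the pipeline against the same subscription; messages that the drained job never acknowledged are still there.
https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline#preventing_compatibility_breaks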

 

NEW QUESTION 40
You are choosing a NoSQL database to handle telemetry data submitted from millions of Internet of Things (IoT) devices. The volume of data is growing at 100 TB per year, and each data entry has about 100 attributes. The data processing pipeline does not require atomicity, consistency, isolation, and durability (ACID). However, high availability and low latency are required.
You need to analyze the data by querying against individual fields. Which three databases meet your requirements? (Choose three.)

  • A. MongoDB
  • B. Redis
  • C. HBase
  • D. MySQL
  • E. HDFS with Hive
  • F. Cassandra

Answer: A,C,F

 

NEW QUESTION 41
As your organization expands its usage of GCP, many teams have started to create their own projects. Projects are further multiplied to accommodate different stages of deployments and target audiences. Each project requires unique access control configurations. The central IT team needs to have access to all projects.
Furthermore, data from Cloud Storage buckets and BigQuery datasets must be shared for use in other projects in an ad hoc way. You want to simplify access control management by minimizing the number of policies.
Which two steps should you take? (Choose two.)

  • A. Create distinct groups for various teams, and specify groups in Cloud IAM policies.
  • B. Only use service accounts when sharing data for Cloud Storage buckets and BigQuery datasets.
  • C. Introduce resource hierarchy to leverage access control policy inheritance.
  • D. For each Cloud Storage bucket or BigQuery dataset, decide which projects need access. Find all the active members who have access to these projects, and create a Cloud IAM policy to grant access to all these users.
  • E. Use Cloud Deployment Manager to automate access provision.

Answer: A,C
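
A simplified model of policy inheritance in the resource hierarchy: allow policies attached to an organization or folder apply to every project below it, so a single binding for the central IT group covers all projects, and grants go to groups instead of individual users. The hierarchy, groups and roles below are hypothetical, and deny policies and IAM conditions are ignored.

```python
from typing import Dict, Optional, Set

# Hypothetical hierarchy: each resource maps to its parent (None at the top).
PARENT: Dict[str, Optional[str]] = {
    "organizations/example": None,
    "folders/analytics": "organizations/example",
    "projects/analytics-dev": "folders/analytics",
    "projects/analytics-prod": "folders/analytics",
}

# Allow policies: resource -> role -> members. Groups keep the member lists short.
POLICIES: Dict[str, Dict[str, Set[str]]] = {
    "organizations/example": {"roles/viewer": {"group:central-it@example.com"}},
    "folders/analytics": {"roles/bigquery.dataViewer": {"group:analysts@example.com"}},
    "projects/analytics-prod": {"roles/bigquery.dataEditor": {"group:etl@example.com"}},
}


def effective_members(resource: str, role: str) -> Set[str]:
    """Union of the role's members on the resource and all of its ancestors."""
    members: Set[str] = set()
    node: Optional[str] = resource
    while node is not None:
        members |= POLICIES.get(node, {}).get(role, set())
        node = PARENT[node]
    return members


for project in ("projects/analytics-dev", "projects/analytics-prod"):
    print(project, sorted(effective_members(project, "roles/viewer")))
```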

 

NEW QUESTION 42
Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly.
How should you optimize the cluster for cost?

  • A. Use pre-emptible virtual machines (VMs) for the cluster
  • B. Migrate the workload to Google Cloud Dataflow
  • C. Use a higher-memory node so that the job runs faster
  • D. Use SSDs on the worker nodes so that the job can run faster

Answer: A

 

NEW QUESTION 43
A shipping company has live package-tracking data that is sent to an Apache Kafka stream in real time. This is then loaded into BigQuery. Analysts in your company want to query the tracking data in BigQuery to analyze geospatial trends in the lifecycle of a package. The table was originally created with ingest-date partitioning.
Over time, the query processing time has increased. You need to implement a change that would improve query performance in BigQuery. What should you do?

  • A. Implement clustering in BigQuery on the package-tracking ID column.
  • B. Tier older data onto Cloud Storage files, and leverage extended tables.
  • C. Implement clustering in BigQuery on the ingest date column.
  • D. Re-create the table using data partitioning on the package delivery date.

Answer: A
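
A toy model of why clustering on the package-tracking ID speeds up queries that filter on a package: clustered data is sorted by the clustering column into storage blocks, and blocks whose key range cannot match the filter can be skipped. The block size, row layout and min/max pruning below are simplifying assumptions, not BigQuery's actual storage format.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Block:
    min_id: str
    max_id: str
    rows: list


def build_blocks(rows: List[dict], block_size: int,
                 cluster_key: Optional[Callable[[dict], str]] = None) -> List[Block]:
    """Split rows into fixed-size blocks, optionally sorting by the clustering key first."""
    if cluster_key is not None:
        rows = sorted(rows, key=cluster_key)
    blocks = []
    for start in range(0, len(rows), block_size):
        chunk = rows[start:start + block_size]
        ids = [r["package_id"] for r in chunk]
        blocks.append(Block(min(ids), max(ids), chunk))
    return blocks


def blocks_scanned(blocks: List[Block], package_id: str) -> int:
    """Count blocks whose [min, max] range could contain the id; the rest are pruned."""
    return sum(1 for b in blocks if b.min_id <= package_id <= b.max_id)


# 1,000 tracking events for 100 packages, arriving interleaved in ingest order.
events = [{"package_id": f"PKG{i % 100:03d}", "seq": i} for i in range(1000)]
unclustered = build_blocks(events, block_size=100)
clustered = build_blocks(events, block_size=100, cluster_key=lambda r: r["package_id"])
print(blocks_scanned(unclustered, "PKG042"), blocks_scanned(clustered, "PKG042"))  # 10 1
```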

 

NEW QUESTION 44
You have some data, which is shown in the graphic below. The two dimensions are X and Y, and the shade of each dot represents its class. You want to classify this data accurately using a linear algorithm. To do this, you need to add a synthetic feature. What should the value of that feature be?

  • A. Y^2
  • B. cos(X)
  • C. X^2
  • D. X^2+Y^2

Answer: D
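
Since the graphic is not reproduced here, this sketch assumes the usual layout for this question: one class forms an inner circle and the other a surrounding ring. No straight line in the X-Y plane separates the two, but in the synthetic feature X^2+Y^2 (the squared distance from the origin) a single threshold does. The radii, the number of points and the threshold of 5 (chosen between 1 and 9) are hypothetical.

```python
import math
from typing import List, Tuple


def ring_data(n: int = 12) -> List[Tuple[float, float, int]]:
    """Hypothetical data: class 0 on a circle of radius 1, class 1 on radius 3."""
    points = []
    for k in range(n):
        angle = 2 * math.pi * k / n
        points.append((math.cos(angle), math.sin(angle), 0))
        points.append((3 * math.cos(angle), 3 * math.sin(angle), 1))
    return points


def radius_squared(x: float, y: float) -> float:
    # The synthetic feature X^2 + Y^2.
    return x * x + y * y


def linear_classifier(feature: float, threshold: float = 5.0) -> int:
    # A linear decision rule in the new feature: w * z + b > 0 with w = 1, b = -threshold.
    return int(feature > threshold)


data = ring_data()
correct = sum(linear_classifier(radius_squared(x, y)) == label for x, y, label in data)
print(correct / len(data))  # 1.0
```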

 

NEW QUESTION 45
You work for a large real estate firm and are preparing 6 TB of home sales data to be used for machine learning. You will use SQL to transform the data and use BigQuery ML to create a machine learning model. You plan to use the model for predictions against a raw dataset that has not been transformed. How should you set up your workflow in order to prevent skew at prediction time?

  • A. When creating your model, use BigQuery's TRANSFORM clause to define preprocessing steps. Before requesting predictions, use a saved query to transform your raw input data, and then use ML.EVALUATE.
  • B. When creating your model, use BigQuery's TRANSFORM clause to define preprocessing steps. At prediction time, use BigQuery's ML.EVALUATE clause without specifying any transformations on the raw input data.
  • C. Use a BigQuery view to define your preprocessing logic. When creating your model, use the view as your model training data. At prediction time, use BigQuery's ML.EVALUATE clause without specifying any transformations on the raw input data.
  • D. Preprocess all data using Dataflow. At prediction time, use BigQuery's ML.EVALUATE clause without specifying any further transformations on the input data.

Answer: B
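
A plain-Python analogy (not BigQuery ML itself) for how a TRANSFORM clause prevents training/serving skew: the model object keeps the preprocessing it fitted at training time, here standardizing a hypothetical square-footage column, and applies it inside `predict()`. Raw rows can therefore be passed in unchanged, while rows that were already transformed get the preprocessing applied a second time. The data and the one-variable least-squares model are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import List


class ModelWithTransform:
    """Analogy for a model created with a TRANSFORM clause."""

    def __init__(self, sqft: List[float], price: List[float]):
        # Preprocessing statistics are captured once, at training time.
        self.mu = mean(sqft)
        self.sigma = pstdev(sqft)
        xs = [self.transform(x) for x in sqft]
        x_bar, y_bar = mean(xs), mean(price)
        # Ordinary least squares on the standardized feature.
        self.slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, price))
                      / sum((x - x_bar) ** 2 for x in xs))
        self.intercept = y_bar - self.slope * x_bar

    def transform(self, raw_sqft: float) -> float:
        return (raw_sqft - self.mu) / self.sigma

    def predict(self, raw_sqft: float) -> float:
        # The stored transform is applied automatically to the input.
        return self.intercept + self.slope * self.transform(raw_sqft)


model = ModelWithTransform(sqft=[1000, 1500, 2000, 2500], price=[200, 300, 400, 500])
print(round(model.predict(1800), 2))                   # 360.0: raw input, transformed once
print(round(model.predict(model.transform(1800)), 2))  # transformed twice: skewed result
```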

 

NEW QUESTION 46
The marketing team at your organization provides regular updates of a segment of your customer dataset. The marketing team has given you a CSV with 1 million records that must be updated in BigQuery. When you use the UPDATE statement in BigQuery, you receive a quotaExceeded error. What should you do?

  • A. Import the new records from the CSV file into a new BigQuery table. Create a BigQuery job that merges the new records with the existing records and writes the results to a new BigQuery table.
  • B. Split the source CSV file into smaller CSV files in Cloud Storage to reduce the number of BigQuery UPDATE DML statements per BigQuery job.
  • C. Increase the BigQuery UPDATE DML statement limit in the Quota management section of the Google Cloud Platform Console.
  • D. Reduce the number of records updated each day to stay within the BigQuery UPDATE DML statement limit.

Answer: A
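
The staging-table approach can be sketched with Python's built-in `sqlite3` module standing in for BigQuery, an assumption made only so the example runs locally. In BigQuery the CSV would be loaded with a load job (which is not a DML statement), and the combine step could be a single MERGE or CREATE TABLE ... AS SELECT statement. Table names, column names and the two-row update file are hypothetical.

```python
import csv
import io
import sqlite3

existing = [(1, "bronze"), (2, "silver"), (3, "gold")]
marketing_csv = "customer_id,segment\n2,gold\n3,platinum\n"  # hypothetical update file

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", existing)

# Step 1: load the CSV as-is into its own staging table (stands in for a load job).
conn.execute("CREATE TABLE staging (customer_id INTEGER, segment TEXT)")
reader = csv.DictReader(io.StringIO(marketing_csv))
conn.executemany("INSERT INTO staging VALUES (:customer_id, :segment)", reader)

# Step 2: one set-based statement combines old and new rows into a new table.
conn.execute("""
    CREATE TABLE customers_merged AS
    SELECT c.customer_id, COALESCE(s.segment, c.segment) AS segment
    FROM customers AS c
    LEFT JOIN staging AS s ON s.customer_id = c.customer_id
""")
rows = conn.execute("SELECT * FROM customers_merged ORDER BY customer_id").fetchall()
print(rows)  # [(1, 'bronze'), (2, 'gold'), (3, 'platinum')]
```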

 

NEW QUESTION 47
You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics.
Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors with insufficient compute resources. How should you adjust the database design?

  • A. Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join.
  • B. Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.
  • C. Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.
  • D. Add capacity (memory and disk space) to the database server by a factor of 200.

Answer: A
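
A minimal sketch of the normalized design, using Python's `sqlite3` module as a stand-in database; the table names, column names and sample rows are hypothetical. Patients and visits live in separate tables, and a report joins them on `patient_id` (backed by an index on `visits.patient_id`) instead of self-joining one wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (
    patient_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE visits (
    visit_id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patients(patient_id),
    clinic TEXT NOT NULL,
    visit_date TEXT NOT NULL
);
CREATE INDEX visits_by_patient ON visits(patient_id);
""")
conn.executemany("INSERT INTO patients VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO visits (patient_id, clinic, visit_date) VALUES (?, ?, ?)",
    [(1, "North", "2022-01-03"), (1, "South", "2022-02-10"), (2, "North", "2022-01-05")],
)

# Report: visits per patient via a join between the two tables.
report = conn.execute("""
    SELECT p.name, COUNT(v.visit_id) AS visits
    FROM patients AS p
    JOIN visits AS v ON v.patient_id = p.patient_id
    GROUP BY p.patient_id
    ORDER BY p.name
""").fetchall()
print(report)  # [('Ana', 2), ('Ben', 1)]
```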


 

NEW QUESTION 48
You have data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy for this data that minimizes cost. How should you configure the BigQuery table?

  • A. Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time snapshot to recover the data.
  • B. Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.
  • C. Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-time snapshot to recover the data.
  • D. Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.

Answer: C

Explanation:
highly available = multi-regional:
https://cloud.google.com/bigquery/docs/locations
recovery strategy of this data that minimizes cost = point-in-time snapshot:
https://cloud.google.com/solutions/bigquery-data-warehouse#backup-and-recovery
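
Point-in-time recovery in BigQuery relies on time travel, which is queried with `FOR SYSTEM_TIME AS OF`. The helper below only builds such a query and refuses timestamps outside the window; it assumes BigQuery's default seven-day time-travel window, and the helper name and table name are hypothetical. The window is the main limitation of this strategy: older states of the table cannot be recovered this way.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TIME_TRAVEL_WINDOW = timedelta(days=7)  # assumed: BigQuery's default window


def point_in_time_query(table: str, as_of: datetime,
                        now: Optional[datetime] = None) -> str:
    """Return a query that reads `table` as it was at `as_of` (timezone-aware datetime)."""
    now = now or datetime.now(timezone.utc)
    if as_of > now:
        raise ValueError("as_of is in the future")
    if now - as_of > TIME_TRAVEL_WINDOW:
        raise ValueError("as_of is outside the time-travel window")
    ts = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM `{table}` FOR SYSTEM_TIME AS OF TIMESTAMP '{ts} UTC'"


now = datetime(2022, 6, 1, 12, 0, tzinfo=timezone.utc)
print(point_in_time_query("my-project.sales.orders", now - timedelta(hours=1), now))
# SELECT * FROM `my-project.sales.orders` FOR SYSTEM_TIME AS OF TIMESTAMP '2022-06-01 11:00:00 UTC'
```

Writing the result of such a query to a new table is one way to restore the earlier state.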

 

NEW QUESTION 49
......

Pass Your Professional-Data-Engineer Dumps as PDF Updated on 2022 With 270 Questions: https://www.free4torrent.com/Professional-Data-Engineer-braindumps-torrent.html

Google Professional-Data-Engineer Real Exam Questions and Answers FREE: https://drive.google.com/open?id=14Bqzr7PfvtZP2latZd2UWlIK8uCfjJYA