[Jan 20, 2024] Pass Google Cloud Certified Professional-Data-Engineer Exam With 270 Questions
Ultimate Guide to Prepare Free Google Professional-Data-Engineer Exam Questions and Answers
The Google Professional-Data-Engineer exam is a comprehensive assessment that requires extensive preparation and study. It consists of 50 multiple-choice questions to be answered within two hours. The exam fee is $200, and the exam can be taken online or at a testing center. It is available in English, Japanese, Spanish, and Portuguese.
The Google Professional-Data-Engineer certification is a highly respected credential that validates the knowledge and skills of professionals working in data engineering. It tests candidates' ability to design, build, operate, and manage data processing systems that are scalable, secure, and reliable. The exam is conducted by Google and is intended for individuals who have a good understanding of the Google Cloud Platform and data engineering best practices.
The Google Certified Professional Data Engineer exam is divided into multiple sections, each covering a specific area of data engineering. It is scored on a scale of 1000, with a passing score of 700 or higher. The exam is computer-based and can be taken at a testing center or online. It costs $200, and the resulting certification is valid for two years.
NEW QUESTION # 58
Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits well for the training data. However, when tested against new data, it performs poorly. What method can you employ to address this?
- A. Dropout Methods
- B. Dimensionality Reduction
- C. Serialization
- D. Threading
Answer: A
Explanation:
Reference https://medium.com/mlreview/a-simple-deep-learning-model-for-stock-price-prediction-using-tensorflow-30505541d877
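Dropout counters overfitting by randomly silencing units during training so that the network cannot rely on any single neuron. The following is a minimal pure-Python sketch of inverted dropout, not TensorFlow's implementation; the function name `dropout` and its parameters are illustrative assumptions.

```python
import random


def dropout(activations, rate, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with
    probability `rate` and rescale the survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time the layer passes its input through unchanged.
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical layer output: 10,000 units that all emit 1.0.
layer_output = [1.0] * 10_000
dropped = dropout(layer_output, rate=0.5, rng=random.Random(0))
zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
unchanged = dropout(layer_output, rate=0.5, training=False)
```

Because the surviving activations are rescaled, the expected activation stays roughly the same during training and inference, which is why nothing needs to change at prediction time.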
NEW QUESTION # 59
A data scientist has created a BigQuery ML model and asks you to create an ML pipeline to serve predictions.
You have a REST API application with the requirement to serve predictions for an individual user ID with latency under 100 milliseconds. You use the following query to generate predictions: SELECT predicted_label, user_id FROM ML.PREDICT (MODEL 'dataset.model', table user_features). How should you create the ML pipeline?
- A. Create a Cloud Dataflow pipeline using BigQueryIO to read results from the query. Grant the Dataflow Worker role to the application service account.
- B. Create an Authorized View with the provided query. Share the dataset that contains the view with the application service account.
- C. Add a WHERE clause to the query, and grant the BigQuery Data Viewer role to the application service account.
- D. Create a Cloud Dataflow pipeline using BigQueryIO to read predictions for all users from the query. Write the results to Cloud Bigtable using BigtableIO. Grant the Bigtable Reader role to the application service account so that the application can read predictions for individual users from Cloud Bigtable.
Answer: D
NEW QUESTION # 60
You want to optimize your queries for cost and performance. How should you structure your data?
- A. Partition table data by create_date, location_id, and device_version
- B. Cluster table data by create_date, location_id, and device_version
- C. Cluster table data by create_date; partition by location_id and device_version
- D. Partition table data by create_date; cluster table data by location_id and device_version
Answer: D
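Why partitioning by date and clustering by the remaining columns reduces the data scanned can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it is not how BigQuery stores data, and the helpers `build_table` and `query` are hypothetical names invented for the illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: 3 dates x 4 locations x 2 device versions.
rows = [
    {"create_date": d, "location_id": loc, "device_version": v}
    for d in ("2024-01-01", "2024-01-02", "2024-01-03")
    for loc in (4, 2, 3, 1)
    for v in ("v2", "v1")
]


def build_table(rows):
    """Group rows into one partition per create_date, then sort each
    partition by the clustering columns (location_id, device_version)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["create_date"]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: (r["location_id"], r["device_version"]))
    return dict(partitions)


def query(table, create_date, location_id):
    """Return (matching rows, number of rows scanned)."""
    # Partition pruning: only the partition for the requested date is read.
    partition = table.get(create_date, [])
    matches, scanned = [], 0
    for row in partition:
        if row["location_id"] > location_id:
            break  # clustered order: no later row can match
        scanned += 1
        if row["location_id"] == location_id:
            matches.append(row)
    return matches, scanned


table = build_table(rows)
matches, scanned = query(table, "2024-01-02", 2)
```

In this toy table the date filter skips the other two partitions entirely, and the sort order inside the partition lets the scan stop early, so far fewer rows are read than in a full scan.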
NEW QUESTION # 61
You work for a shipping company that uses handheld scanners to read shipping labels. Your company has strict data privacy standards that require scanners to only transmit recipients' personally identifiable information (PII) to analytics systems, which violates user privacy rules. You want to quickly build a scalable solution using cloud-native managed services to prevent exposure of PII to the analytics systems. What should you do?
- A. Use Stackdriver logging to analyze the data passed through the total pipeline to identify transactions that may contain sensitive information.
- B. Install a third-party data validation tool on Compute Engine virtual machines to check the incoming data for sensitive information.
- C. Create an authorized view in BigQuery to restrict access to tables with sensitive data.
- D. Build a Cloud Function that reads the topics and makes a call to the Cloud Data Loss Prevention API. Use the tagging and confidence levels to either pass or quarantine the data in a bucket for review.
Answer: D
NEW QUESTION # 62
Which of the following statements about the Wide & Deep Learning model are true? (Select 2 answers.)
- A. A good use for the wide and deep model is a small-scale linear regression problem.
- B. The wide model is used for generalization, while the deep model is used for memorization.
- C. A good use for the wide and deep model is a recommender system.
- D. The wide model is used for memorization, while the deep model is used for generalization.
Answer: C,D
Explanation:
Can we teach computers to learn like humans do, by combining the power of memorization and generalization? It's not an easy question to answer, but by jointly training a wide linear model (for memorization) alongside a deep neural network (for generalization), one can combine the strengths of both to bring us one step closer. At Google, we call it Wide & Deep Learning. It's useful for generic large-scale regression and classification problems with sparse inputs (categorical features with a large number of possible feature values), such as recommender systems, search, and ranking problems.
NEW QUESTION # 63
Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters, and received an area under the curve (AUC) of 0.87 on the validation set.
You want to increase the AUC of the model. What should you do?
- A. Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC
- B. Deploy the model and measure the real-world AUC; it's always higher because of generalization
- C. Perform hyperparameter tuning
- D. Train a classifier with deep neural networks, because neural networks would always beat SVMs
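Answer: C
Explanation:
AUC depends only on how the model ranks examples, so multiplying its predictions by a positive scaling factor (option A) leaves the AUC unchanged. Tuning the SVM's hyperparameters (for example, the kernel and regularization strength) changes the decision function itself and can improve the ranking.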
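One point worth checking when weighing these options is that AUC is a ranking metric. The sketch below computes AUC from its pairwise definition on made-up labels and scores (both are assumptions for illustration) and compares the result before and after scaling the predictions.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels and classifier scores.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.7, 0.2]

base_auc = auc(labels, scores)
# Scaling by a positive factor preserves the order of the scores.
scaled_auc = auc(labels, [3.0 * s for s in scores])
```

Both calls return the same value, because a positive scaling factor never changes which example is ranked above which.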
NEW QUESTION # 64
You architect a system to analyze seismic data. Your extract, transform, and load (ETL) process runs as a series of MapReduce jobs on an Apache Hadoop cluster. The ETL process takes days to process a data set because some steps are computationally expensive. Then you discover that a sensor calibration step has been omitted. How should you change your ETL process to carry out sensor calibration systematically in the future?
- A. Introduce a new MapReduce job to apply sensor calibration to raw data, and ensure all other MapReduce jobs are chained after this.
- B. Modify the transform MapReduce jobs to apply sensor calibration before they do anything else.
- C. Develop an algorithm through simulation to predict variance of data output from the last MapReduce job based on calibration factors, and apply the correction to all data.
- D. Add sensor calibration data to the output of the ETL process, and document that all users need to apply sensor calibration themselves.
Answer: B
NEW QUESTION # 65
You operate an IoT pipeline built around Apache Kafka that normally receives around 5000 messages per second. You want to use Google Cloud Platform to create an alert as soon as the moving average over 1 hour drops below 4000 messages per second. What should you do?
- A. Consume the stream of data in Cloud Dataflow using Kafka IO. Set a sliding time window of 1 hour every 5 minutes. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.
- B. Consume the stream of data in Cloud Dataflow using Kafka IO. Set a fixed time window of 1 hour. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.
- C. Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to Cloud Bigtable. Use Cloud Scheduler to run a script every hour that counts the number of rows created in Cloud Bigtable in the last hour. If that number falls below 4000, send an alert.
- D. Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to BigQuery. Use Cloud Scheduler to run a script every five minutes that counts the number of rows created in BigQuery in the last hour. If that number falls below 4000, send an alert.
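Answer: A
Explanation:
A moving average calls for a sliding window: a 1-hour window that advances every 5 minutes re-evaluates the average every 5 minutes, whereas a fixed 1-hour window (option B) reports only once per hour. Options C and D compare an hourly row count against 4000, which is a per-second threshold.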
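The difference between a sliding window and a fixed window can be sketched in plain Python. This is not Apache Beam or Dataflow code; the helpers `window_rates` and `alerts` are hypothetical, and the traffic is scaled down (5 messages per second dropping to 3, with a threshold of 4) to keep the example small.

```python
from bisect import bisect_left


def window_rates(timestamps, horizon_s, window_s=3600, period_s=300):
    """Return (window_end, messages_per_second) for every window of length
    window_s that ends on a multiple of period_s, up to horizon_s."""
    ordered = sorted(timestamps)
    rates = []
    for end in range(window_s, horizon_s + 1, period_s):
        start = end - window_s
        # Count the messages whose timestamp falls in [start, end).
        count = bisect_left(ordered, end) - bisect_left(ordered, start)
        rates.append((end, count / window_s))
    return rates


def alerts(rates, threshold):
    """Window end times whose average rate is below the threshold."""
    return [end for end, rate in rates if rate < threshold]


# Scaled-down traffic: 5 msg/s during the first hour, 3 msg/s in the second.
timestamps = [t for t in range(3600) for _ in range(5)]
timestamps += [t for t in range(3600, 7200) for _ in range(3)]

sliding = window_rates(timestamps, horizon_s=7200, period_s=300)
fixed = window_rates(timestamps, horizon_s=7200, period_s=3600)

sliding_alerts = alerts(sliding, threshold=4.0)
fixed_alerts = alerts(fixed, threshold=4.0)
```

With this data the sliding window raises its first alert well before the end of the second hour, while the fixed window cannot report the drop until the hour boundary.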
NEW QUESTION # 66
Flowlogistic Case Study
Company Overview
Flowlogistic is a leading logistics and supply chain provider. They help businesses throughout the world manage their resources and transport them to their final destination. The company has grown rapidly, expanding their offerings to include rail, truck, aircraft, and oceanic shipping.
Company Background
The company started as a regional trucking company, and then expanded into other logistics markets.
Because they have not updated their infrastructure, managing and tracking orders and shipments has become a bottleneck. To improve operations, Flowlogistic developed proprietary technology for tracking shipments in real time at the parcel level. However, they are unable to deploy it because their technology stack, based on Apache Kafka, cannot support the processing volume. In addition, Flowlogistic wants to further analyze their orders and shipments to determine how best to deploy their resources.
Solution Concept
Flowlogistic wants to implement two concepts using the cloud:
- Use their proprietary technology in a real-time inventory-tracking system that indicates the location of their loads
- Perform analytics on all their orders and shipment logs, which contain both structured and unstructured data, to determine how best to deploy resources and which markets to expand into. They also want to use predictive analytics to learn earlier when a shipment will be delayed.
Existing Technical Environment
Flowlogistic architecture resides in a single data center:
Databases
- 8 physical servers in 2 clusters
- SQL Server - user data, inventory, static data
- 3 physical servers
- Cassandra - metadata, tracking messages
- 10 Kafka servers - tracking message aggregation and batch insert
Application servers - customer front end, middleware for order/customs
- 60 virtual machines across 20 physical servers
- Tomcat - Java services
- Nginx - static content
- Batch servers
Storage appliances
- iSCSI for virtual machine (VM) hosts
- Fibre Channel storage area network (FC SAN) - SQL server storage
- Network-attached storage (NAS) - image storage, logs, backups
10 Apache Hadoop/Spark servers
- Core Data Lake
- Data analysis workloads
20 miscellaneous servers
- Jenkins, monitoring, bastion hosts
Business Requirements
Build a reliable and reproducible environment with scaled parity of production.
Aggregate data in a centralized Data Lake for analysis
Use historical data to perform predictive analytics on future shipments
Accurately track every shipment worldwide using proprietary technology
Improve business agility and speed of innovation through rapid provisioning of new resources
Analyze and optimize architecture for performance in the cloud
Migrate fully to the cloud if all other requirements are met
Technical Requirements
Handle both streaming and batch data
Migrate existing Hadoop workloads
Ensure architecture is scalable and elastic to meet the changing demands of the company.
Use managed services whenever possible
Encrypt data in flight and at rest
Connect a VPN between the production data center and cloud environment
CEO Statement
We have grown so quickly that our inability to upgrade our infrastructure is really hampering further growth and efficiency. We are efficient at moving shipments around the world, but we are inefficient at moving data around.
We need to organize our information so we can more easily understand where our customers are and what they are shipping.
CTO Statement
IT has never been a priority for us, so as our data has grown, we have not invested enough in our technology. I have a good staff to manage IT, but they are so busy managing our infrastructure that I cannot get them to do the things that really matter, such as organizing our data, building the analytics, and figuring out how to implement the CFO's tracking technology.
CFO Statement
Part of our competitive advantage is that we penalize ourselves for late shipments and deliveries. Knowing where our shipments are at all times has a direct correlation to our bottom line and profitability.
Additionally, I don't want to commit capital to building out a server environment.
Flowlogistic's management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query in real-time, and store the data reliably. Which combination of GCP products should you choose?
- A. Cloud Load Balancing, Cloud Dataflow, and Cloud Storage
- B. Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage
- C. Cloud Dataflow, Cloud SQL, and Cloud Storage
- D. Cloud Pub/Sub, Cloud Dataflow, and Local SSD
- E. Cloud Pub/Sub, Cloud SQL, and Cloud Storage
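Answer: B
Explanation:
Cloud Pub/Sub ingests messages from global sources, Cloud Dataflow processes them in real time, and Cloud Storage stores the data reliably. Option E has no processing component.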
NEW QUESTION # 67
Your organization has been collecting and analyzing data in Google BigQuery for 6 months. The majority of the data analyzed is placed in a time-partitioned table named events_partitioned. To reduce the cost of queries, your organization created a view called events, which queries only the last 14 days of data. The view is described in legacy SQL. Next month, existing applications will be connecting to BigQuery to read the events data via an ODBC connection. You need to ensure the applications can connect. Which two actions should you take? (Choose two.)
- A. Create a new view over events using standard SQL
- B. Create a Google Cloud Identity and Access Management (Cloud IAM) role for the ODBC connection and shared "events"
- C. Create a new view over events_partitioned using standard SQL
- D. Create a service account for the ODBC connection to use for authentication
- E. Create a new partitioned table using a standard SQL query
Answer: A,B
NEW QUESTION # 68
To run a TensorFlow training job on your own computer using Cloud Machine Learning Engine, what would your command start with?
- A. gcloud ml-engine local train
- B. gcloud ml-engine jobs submit training
- C. gcloud ml-engine jobs submit training local
- D. You can't run a TensorFlow program on your own computer using Cloud ML Engine.
Answer: A
Explanation:
gcloud ml-engine local train - run a Cloud ML Engine training job locally
This command runs the specified module in an environment similar to that of a live Cloud ML Engine Training Job.
This is especially useful in the case of testing distributed models, as it allows you to validate that you are properly interacting with the Cloud ML Engine cluster configuration.
NEW QUESTION # 69
You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:
* Executing the transformations on a schedule
* Enabling non-developer analysts to modify transformations
* Providing a graphical tool for designing transformations
What should you do?
- A. Use Apache Spark on Cloud Dataproc to infer the schema of the CSV file before creating a Dataframe. Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading into BigQuery
- B. Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis
- C. Help the analysts write a Cloud Dataflow pipeline in Python to perform the transformation. The Python code should be stored in a revision control system and modified as the incoming data's schema changes
- D. Load each month's CSV data into BigQuery, and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query
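Answer: B
Explanation:
Cloud Dataprep provides a graphical interface for building transformation recipes, lets non-developer analysts modify them, and supports scheduled execution. The other options require writing and maintaining Spark, Python, or SQL code.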
NEW QUESTION # 70
Cloud Bigtable is Google's ______ Big Data database service.
- A. mySQL
- B. NoSQL
- C. Relational
- D. SQL Server
Answer: B
Explanation:
Cloud Bigtable is Google's NoSQL Big Data database service. It is the same database that Google uses for services, such as Search, Analytics, Maps, and Gmail.
It is used for requirements that are low latency and high throughput including Internet of Things (IoT), user analytics, and financial data analysis.
NEW QUESTION # 71
You want to build a managed Hadoop system as your data lake. The data transformation process is composed of a series of Hadoop jobs executed in sequence. To accomplish the design of separating storage from compute, you decided to use the Cloud Storage connector to store all input data, output data, and intermediary data. However, you noticed that one Hadoop job runs very slowly with Cloud Dataproc, when compared with the on-premises bare-metal Hadoop environment (8-core nodes with 100-GB RAM). Analysis shows that this particular Hadoop job is disk I/O intensive. You want to resolve the issue. What should you do?
- A. Allocate more CPU cores of the virtual machine instances of the Hadoop cluster so that the networking bandwidth for each instance can scale up
- B. Allocate sufficient memory to the Hadoop cluster, so that the intermediary data of that particular Hadoop job can be held in memory
- C. Allocate additional network interface card (NIC), and configure link aggregation in the operating system to use the combined throughput when working with Cloud Storage
- D. Allocate sufficient persistent disk space to the Hadoop cluster, and store the intermediate data of that particular Hadoop job on native HDFS
Answer: B
NEW QUESTION # 72
Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)
- A. Supervised learning to determine which transactions are most likely to be fraudulent.
- B. Unsupervised learning to predict the location of a transaction.
- C. Clustering to divide the transactions into N categories based on feature similarity.
- D. Unsupervised learning to determine which transactions are most likely to be fraudulent.
- E. Supervised learning to predict the location of a transaction.
- F. Reinforcement learning to predict the location of a transaction.
Answer: C,D,E
NEW QUESTION # 73
You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once and must be ordered within windows of 1 hour.
How should you design the solution?
- A. Use Cloud Pub/Sub for message ingestion and Cloud Dataflow for streaming analysis.
- B. Use Apache Kafka for message ingestion and use Cloud Dataflow for streaming analysis.
- C. Use Apache Kafka for message ingestion and use Cloud Dataproc for streaming analysis.
- D. Use Cloud Pub/Sub for message ingestion and Cloud Dataproc for streaming analysis.
Answer: A