質問 1:What is the first of a Databricks Python notebook when viewed in a text editor?
A. //Databricks notebook source
B. % Databricks notebook source
C. %python
D. -- Databricks notebook source
正解:B
解説: (Topexam メンバーにのみ表示されます)
質問 2:Which statement describes the default execution mode for Databricks Auto Loader?
A. Cloud vendor-specific queue storage and notification services are configured to track newly arriving files; new files are incrementally and impotently into the target Delta Lake table.
B. New files are identified by listing the input directory; the target table is materialized by directory querying all valid files in the source directory.
C. Webhook trigger Databricks job to run anytime new data arrives in a source directory; new data automatically merged into target tables using rules inferred from the data.
D. New files are identified by listing the input directory; new files are incrementally and idempotently loaded into the target Delta Lake table.
正解:D
解説: (Topexam メンバーにのみ表示されます)
質問 3:To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-facing applications, production machine learning models, and ad hoc analytical queries.
The data engineering team has been made aware of new requirements from a customer-facing application, which is the only downstream workload they manage entirely. As a result, an aggregate table used by numerous teams across the organization will need to have a number of fields renamed, and additional fields will also be added.
Which of the solutions addresses the situation while minimally interrupting other teams in the organization without increasing the number of tables that need to be managed?
A. Replace the current table definition with a logical view defined with the query logic currently writing the aggregate table; create a new table to power the customer-facing application.
B. Add a table comment warning all users that the table schema and field names will be changing on a given date; overwrite the table in place to the specifications of the customer-facing application.
C. Send all users notice that the schema for the table will be changing; include in the communication the logic necessary to revert the new table schema to match historic queries.
D. Configure a new table with all the requisite fields and new names and use this as the source for the customer-facing application; create a view that maintains the original data schema and table name by aliasing select fields from the new table.
E. Create a new table with the required schema and new fields and use Delta Lake's deep clone functionality to sync up changes committed to one table to the corresponding table.
正解:D
解説: (Topexam メンバーにのみ表示されます)
質問 4:A junior developer complains that the code in their notebook isn't producing the correct results in the development environment. A shared screenshot reveals that while they're using a notebook versioned with Databricks Repos, they're using a personal branch that contains old logic. The desired branch named dev-2.3.9 is not available from the branch selection dropdown.
Which approach will allow this developer to review the current logic for this notebook?
A. Use Repos to make a pull request use the Databricks REST API to update the current branch to dev-2.3.9
B. Merge all changes back to the main branch in the remote Git repository and clone the repo again
C. Use Repos to checkout the dev-2.3.9 branch and auto-resolve conflicts with the current branch
D. Use Repos to pull changes from the remote Git repository and select the dev-2.3.9 branch.
E. Use Repos to merge the current branch and the dev-2.3.9 branch, then make a pull request to sync with the remote repository
正解:D
解説: (Topexam メンバーにのみ表示されます)
質問 5:The following code has been migrated to a Databricks notebook from a legacy workload:
The code executes successfully and provides the logically correct results, however, it takes over 20 minutes to extract and load around 1 GB of data.
Which statement is a possible explanation for this behavior?
A. %sh does not distribute file moving operations; the final line of code should be updated to use %fs instead.
B. %sh executes shell code on the driver node. The code does not take advantage of the worker nodes or Databricks optimized Spark.
C. Instead of cloning, the code should use %sh pip install so that the Python code can get executed in parallel across all nodes in a cluster.
D. %sh triggers a cluster restart to collect and install Git. Most of the latency is related to cluster startup time.
E. Python will always execute slower than Scala on Databricks. The run.py script should be refactored to Scala.
正解:B
解説: (Topexam メンバーにのみ表示されます)
質問 6:A team of data engineer are adding tables to a DLT pipeline that contain repetitive expectations for many of the same data quality checks.
One member of the team suggests reusing these data quality rules across all tables defined for this pipeline.
What approach would allow them to do this?
A. Use global Python variables to make expectations visible across DLT notebooks included in the same pipeline.
B. Maintain data quality rules in a separate Databricks notebook that each DLT notebook of file.
C. Add data quality constraints to tables in this pipeline using an external job with access to pipeline configuration files.
D. Maintain data quality rules in a Delta table outside of this pipeline's target schema, providing the schema name as a pipeline parameter.
正解:D
解説: (Topexam メンバーにのみ表示されます)
Databricks Databricks-Certified-Professional-Data-Engineer 認定試験の出題範囲:
トピック | 出題範囲 |
---|
トピック 1 | - Monitoring & Logging: This topic includes understanding the Spark UI, inspecting event timelines and metrics, drawing conclusions from various UIs, designing systems to control cost and latency SLAs for production streaming jobs, and deploying and monitoring both streaming and batch jobs.
|
トピック 2 | - Databricks Tooling: The Databricks Tooling topic encompasses the various features and functionalities of Delta Lake. This includes understanding the transaction log, Optimistic Concurrency Control, Delta clone, indexing optimizations, and strategies for partitioning data for optimal performance in the Databricks SQL service.
|
トピック 3 | - Data Processing: The topic covers understanding partition hints, partitioning data effectively, controlling part-file sizes, updating records, leveraging Structured Streaming and Delta Lake, implementing stream-static joins and deduplication. Additionally, it delves into utilizing Change Data Capture, and addressing performance issues related to small files.
|
トピック 4 | - Testing & Deployment: It discusses adapting notebook dependencies to use Python file dependencies, leveraging Wheels for imports, repairing and rerunning failed jobs, creating jobs based on common use cases, designing systems to control cost and latency SLAs, configuring the Databricks CLI, and using the REST API to clone a job, trigger a run, and export the run output.
|
参照:https://www.databricks.com/learn/certification/data-engineer-professional
TopExamは君にDatabricks-Certified-Professional-Data-Engineerの問題集を提供して、あなたの試験への復習にヘルプを提供して、君に難しい専門知識を楽に勉強させます。TopExamは君の試験への合格を期待しています。
安全的な支払方式を利用しています
Credit Cardは今まで全世界の一番安全の支払方式です。少数の手続きの費用かかる必要がありますとはいえ、保障があります。お客様の利益を保障するために、弊社のDatabricks-Certified-Professional-Data-Engineer問題集は全部Credit Cardで支払われることができます。
領収書について:社名入りの領収書が必要な場合、メールで社名に記入していただき送信してください。弊社はPDF版の領収書を提供いたします。
弊社のDatabricks Databricks-Certified-Professional-Data-Engineerを利用すれば試験に合格できます
弊社のDatabricks Databricks-Certified-Professional-Data-Engineerは専門家たちが長年の経験を通して最新のシラバスに従って研究し出した勉強資料です。弊社はDatabricks-Certified-Professional-Data-Engineer問題集の質問と答えが間違いないのを保証いたします。
この問題集は過去のデータから分析して作成されて、カバー率が高くて、受験者としてのあなたを助けて時間とお金を節約して試験に合格する通過率を高めます。我々の問題集は的中率が高くて、100%の合格率を保証します。我々の高質量のDatabricks Databricks-Certified-Professional-Data-Engineerを利用すれば、君は一回で試験に合格できます。
一年間の無料更新サービスを提供します
君が弊社のDatabricks Databricks-Certified-Professional-Data-Engineerをご購入になってから、我々の承諾する一年間の更新サービスが無料で得られています。弊社の専門家たちは毎日更新状態を検査していますから、この一年間、更新されたら、弊社は更新されたDatabricks Databricks-Certified-Professional-Data-Engineerをお客様のメールアドレスにお送りいたします。だから、お客様はいつもタイムリーに更新の通知を受けることができます。我々は購入した一年間でお客様がずっと最新版のDatabricks Databricks-Certified-Professional-Data-Engineerを持っていることを保証します。
弊社は失敗したら全額で返金することを承諾します
我々は弊社のDatabricks-Certified-Professional-Data-Engineer問題集に自信を持っていますから、試験に失敗したら返金する承諾をします。我々のDatabricks Databricks-Certified-Professional-Data-Engineerを利用して君は試験に合格できると信じています。もし試験に失敗したら、我々は君の支払ったお金を君に全額で返して、君の試験の失敗する経済損失を減少します。
弊社は無料Databricks Databricks-Certified-Professional-Data-Engineerサンプルを提供します
お客様は問題集を購入する時、問題集の質量を心配するかもしれませんが、我々はこのことを解決するために、お客様に無料Databricks-Certified-Professional-Data-Engineerサンプルを提供いたします。そうすると、お客様は購入する前にサンプルをダウンロードしてやってみることができます。君はこのDatabricks-Certified-Professional-Data-Engineer問題集は自分に適するかどうか判断して購入を決めることができます。
Databricks-Certified-Professional-Data-Engineer試験ツール:あなたの訓練に便利をもたらすために、あなたは自分のペースによって複数のパソコンで設置できます。