For this question, refer to the TerramEarth case study.
TerramEarth’s 20 million vehicles are scattered around the world. Based on each vehicle’s location, its telemetry data is stored in a Google Cloud Storage (GCS) regional bucket (US, Europe, or Asia). The CTO has asked you to run a report on the raw telemetry data to determine why vehicles are breaking down after 100K miles. You want to run this job on all the data.
What is the most cost-effective way to run this job?
A. Move all the data into 1 zone, then launch a Cloud Dataproc cluster to run the job.
B. Move all the data into 1 region, then launch a Cloud Dataproc cluster to run the job.
C. Launch a cluster in each region to preprocess and compress the raw data, then move the data into a multi-region bucket and use a Cloud Dataproc cluster to finish the job.
D. Launch a cluster in each region to preprocess and compress the raw data, then move the data into a regional bucket and use a Cloud Dataproc cluster to finish the job.
Answer: D
Explanation:
Multi-regional storage guarantees at least two geo-diverse replicas (at least 100 miles apart), which improves remote latency and availability.
More importantly, multi-regional storage heavily leverages edge caching and CDNs to deliver content to end users.
All of this redundancy and caching means multi-regional storage carries overhead to synchronize and ensure consistency between geo-diverse locations, so it is priced and optimized for write-once-read-many scenarios: frequently accessed ("hot") objects served around the world, such as website content, streaming video, gaming, or mobile applications. A one-time analytical report does not benefit from that. The cost-effective approach is therefore to preprocess and compress the raw telemetry in each region (reducing the volume of data that must cross regions), consolidate the compressed output into a single regional bucket, and run the final Cloud Dataproc job in that region.
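The Python sketch below is a minimal illustration of option D’s consolidation step, using the google-cloud-storage client library. The bucket names, the "compressed/" prefix, and the assumption that per-region Dataproc clusters have already written compressed output are hypothetical, not part of the case study:

from google.cloud import storage

# Hypothetical per-region source buckets holding the preprocessed,
# compressed output written by the per-region Dataproc clusters.
REGION_BUCKETS = {
    "us": "telemetry-us",
    "eu": "telemetry-eu",
    "asia": "telemetry-asia",
}
TARGET_BUCKET = "telemetry-consolidated-us"  # hypothetical single regional bucket

client = storage.Client()
target = client.bucket(TARGET_BUCKET)

for region, bucket_name in REGION_BUCKETS.items():
    source = client.bucket(bucket_name)
    # Copy only the compressed output; the raw telemetry never leaves
    # its home region, which is what keeps cross-region egress costs down.
    for blob in client.list_blobs(bucket_name, prefix="compressed/"):
        source.copy_blob(blob, target, new_name=f"{region}/{blob.name}")

Once the compressed data is consolidated, a single Cloud Dataproc cluster launched in the same region as the target bucket runs the final report, so the heavy read traffic stays regional.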
References: https://medium.com/google-cloud/google-cloud-storage-what-bucket-class-for-the-best-performance-5c847ac8f9f2