Cloud Dataproc is a managed Spark and Hadoop service that is well suited for batch processing. Spark has support for stream processing, and if you are migrating a Spark-based batch processing system, then using Cloud Dataproc may be the fastest way to support stream processing.
Cloud Dataflow supports both batch and stream processing by implementing a runner for Apache Beam, an open source model for defining data workflows. Cloud Dataflow has several key features that facilitate building data pipelines, such as supporting commonly used languages like Python, Java, and SQL; providing native support for exactly-once processing and event time; and implementing periodic checkpoints.
Choosing between the two will depend on details such as how the current batch processing is implemented and other implementation requirements, but typically for new development, Cloud Dataflow is the preferred option.
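To make the event-time concept concrete, the following is a minimal pure-Python sketch (not Beam code, and the telemetry events are hypothetical) of assigning events to fixed windows by the time they occurred at the source rather than the time they arrived:

```python
from collections import defaultdict

# Hypothetical events: (event_time_seconds, value). Event time is when
# the event occurred at the source; arrival order may differ, as it
# does here, yet windowing by event time still groups events correctly.
events = [(3, "a"), (12, "b"), (7, "c"), (21, "d")]

WINDOW_SIZE = 10  # seconds per fixed window

def assign_fixed_windows(events, window_size):
    """Group events into fixed windows keyed by event-time window start."""
    windows = defaultdict(list)
    for event_time, value in events:
        window_start = (event_time // window_size) * window_size
        windows[window_start].append(value)
    return dict(windows)

print(assign_fixed_windows(events, WINDOW_SIZE))
# {0: ['a', 'c'], 10: ['b'], 20: ['d']}
```

In Cloud Dataflow, this bookkeeping (plus watermarks, late-data handling, and checkpointing) is handled by the service rather than by application code.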
The case studies are available online here:
EHR Healthcare services.google.com/fh/files/blogs/master_case_study_ehr_healthcare.pdf
Helicopter Racing League services.google.com/fh/files/blogs/master_case_study_helicopter_racing_league.pdf
Mountkirk Games services.google.com/fh/files/blogs/master_case_study_mountkirk_games.pdf
TerramEarth services.google.com/fh/files/blogs/master_case_study_terramearth.pdf
The case studies are summarized in the following sections.
EHR Healthcare
In the EHR Healthcare case study, you will have to assess the needs of an electronic health records software company. The company has customers in multiple countries, and the business is growing. The company wants to scale to meet the needs of new business, provide for disaster recovery, and adopt agile software practices, such as frequent deployments.
Business and Technical Considerations
EHR Healthcare uses multiple colocation facilities, and the lease on one of those facilities is expiring soon.
Customers use applications that are containerized and running in Kubernetes. Both relational and NoSQL databases are in use. Users are managed with Microsoft Active Directory. Open source tools are used for monitoring, and although there are alerts in place, email notifications about alerts are often ignored.
Business requirements include onboarding new clients as soon as possible, maintaining a minimum of 99.9 percent availability for applications used by customers, improving observability into system performance, ensuring compliance with relevant regulations, and reducing administration costs.
Technical requirements include maintaining legacy interfaces, standardizing on how to manage containerized applications, providing for high-performance networking between on-premises systems and GCP, providing consistent logging, provisioning and scaling new environments, creating interfaces for ingesting data from new clients, and reducing latency in customer applications.
The company has experienced outages and struggles to manage multiple environments.
Architecture Considerations
From the details provided in the case study, we can quickly see several factors that will influence architecture decisions.
The company has customers in multiple countries, and reducing latency to customers is a priority. This calls for a multiregional deployment of services, which will also help address disaster recovery requirements. Depending on storage requirements, multiregional Cloud Storage may be needed. If a relational database is required to span regions, then Cloud Spanner may become part of the solution.
EHR Healthcare is already using Kubernetes, so Google Kubernetes Engine (GKE) will likely be used. Depending on the level of control they need over their clusters, they may be able to reduce operations costs by using GKE's Autopilot mode instead of Standard mode.
The company uses Microsoft Active Directory to manage identities, so you may want to use Cloud Identity with Active Directory as an identity provider (IdP) for federating identities.
To improve deployments across multiple environments, you should treat infrastructure as code using Cloud Deployment Manager or Terraform. Cloud Build, Cloud Source Repositories, and Artifact Registry are key to supporting an agile continuous integration/continuous delivery (CI/CD) practice.
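As one illustration of infrastructure as code, Cloud Deployment Manager accepts Python templates that export a generate_config() function returning the resources to deploy. The sketch below is illustrative only; the template properties and resource names are assumptions, not EHR Healthcare's actual configuration:

```python
# Sketch of a Cloud Deployment Manager Python template. Deployment
# Manager calls generate_config(context) and deploys the resources
# it returns; the same template can provision dev, staging, and
# production environments by varying the properties.

def generate_config(context):
    """Return a GKE cluster resource parameterized per environment."""
    env = context.properties["environment"]  # e.g., 'dev' or 'prod'
    return {
        "resources": [
            {
                "name": f"ehr-cluster-{env}",
                "type": "container.v1.cluster",
                "properties": {
                    "zone": context.properties["zone"],
                    "cluster": {
                        "initialNodeCount": context.properties["nodeCount"],
                    },
                },
            }
        ]
    }

# Local sanity check with a stand-in for the context object that
# Deployment Manager normally supplies:
class FakeContext:
    properties = {"environment": "dev", "zone": "us-central1-a", "nodeCount": 3}

config = generate_config(FakeContext())
print(config["resources"][0]["name"])  # ehr-cluster-dev
```

Keeping environment definitions in version-controlled templates like this is what makes provisioning and scaling new environments repeatable rather than a manual effort.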
Current logging and monitoring are insufficient given the problems with outages and ignored alert messages. Engineers may be experiencing alert fatigue caused by too many alerts that either are false positives or provide insufficient information to help resolve the incident. Cloud Monitoring and Cloud Logging will likely be included in a solution.
Helicopter Racing League
The Helicopter Racing League case study describes a global sports provider specializing in helicopter racing at regional and worldwide scales. The company streams races around the world and provides viewers with race predictions throughout each race.
Business and Technical Considerations
The company wants to increase its use of managed artificial intelligence (AI) and machine learning (ML) services and to serve content closer to racing fans.
The Helicopter Racing League runs its services in a public cloud provider. Initial video recording and editing are performed in the field and then uploaded to the cloud for additional processing on virtual machines. The company has truck-mounted mobile data centers deployed to race sites. An object storage system is used to store content. The deep learning framework TensorFlow is used for predictions, and it runs on VMs in the cloud.
The company is focused on expanding the use of predictive analytics and reducing latency to those watching the race. They are particularly interested in predictions about race results, mechanical failures, and crowd sentiment. They would also like to increase the telemetry data collected during races. Operational complexity should be minimized while still ensuring compliance with relevant regulations.
Specific technical requirements include increasing prediction accuracy, reducing latency for viewers, increasing post-editing video processing performance, and providing additional analytics and data mart services.
Architecture Considerations
The emphasis on AI and ML makes the Helicopter Racing League a candidate for Vertex AI services. Since they are using TensorFlow, model training performance may be improved by using GPUs or TPUs.
Improving the accuracy of predictive models will likely require additional data or larger ML models, possibly both. Cloud Pub/Sub is ideal for ingesting large volumes of telemetry data. Services can run in Kubernetes Engine with appropriate scaling configurations and a Google Cloud global load balancer. The Helicopter Racing League should also consider adopting MLOps practices, including automated CI/CD for ML pipelines using a service such as Vertex AI Pipelines.
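High-volume telemetry ingestion typically batches messages before publishing to amortize per-request overhead. Here is a minimal pure-Python sketch of that batching idea; publish_batch() is a hypothetical stand-in for a real Pub/Sub publisher client (which handles batching, retries, and delivery for you), and the telemetry records are invented:

```python
import json

# Hypothetical stand-in for a Pub/Sub publisher; records what a real
# client would publish to a topic.
published_batches = []

def publish_batch(batch):
    published_batches.append(list(batch))

def ingest(events, max_batch_size=100):
    """Buffer telemetry events and publish them in fixed-size batches."""
    batch = []
    for event in events:
        # Pub/Sub message payloads are bytes, so serialize each event.
        batch.append(json.dumps(event).encode("utf-8"))
        if len(batch) >= max_batch_size:
            publish_batch(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        publish_batch(batch)

telemetry = [{"heli_id": i, "rpm": 5000 + i} for i in range(250)]
ingest(telemetry, max_batch_size=100)
print([len(b) for b in published_batches])  # [100, 100, 50]
```

With Cloud Pub/Sub, downstream subscribers (for example, a Dataflow pipeline feeding the prediction models) can then consume this telemetry at their own pace.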
The league has racing fans across the globe, and latency is a key consideration, so Premium Tier networking should be used rather than the lower-performance Standard Tier. Cloud CDN can be used for high-performance edge caching of recorded content to meet latency requirements.
BigQuery