For example, consider an engineer on call who receives a notification that customer data is not being processed correctly by an application. In this case, a database is failing to complete a transaction because a disk is out of space, which causes the application writing to the database to block while the application repeatedly retries the transaction in rapid succession. The application stops reading from a message queue, which causes messages to accumulate until the maximum size of the queue is reached, at which point the message queue starts to drop data.
Once an incident begins, systems engineers and system administrators need information about the state of components and services. To reduce the time to recover, it is best to collect metrics and log events and then make them available to engineers at any time, especially during an incident response.
The incident might have been avoided if database administrators had created alerts on free disk space or if the application developer had chosen to handle retries using exponential backoff instead of simply retrying as fast as possible until the transaction succeeds. Alerting on the size of the message queue could have notified the operations team of a potential problem in time to make adjustments before data was dropped.
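As a rough illustration of the retry strategy mentioned above, the following Python sketch waits longer after each failed attempt instead of retrying immediately. The names `TransientError`, `backoff_delays`, and `retry_with_backoff`, along with the default delays and retry count, are hypothetical choices made for this example. A real application would catch whatever error its database client actually raises. The random jitter is an addition that is commonly paired with backoff so that many clients do not retry at the same moment.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for a failure that may succeed on retry."""


def backoff_delays(base=0.1, factor=2.0, cap=5.0, retries=5):
    """Return the un-jittered delay before each retry: base * factor**n, capped."""
    return [min(cap, base * factor ** n) for n in range(retries)]


def retry_with_backoff(operation, base=0.1, factor=2.0, cap=5.0, retries=5,
                       sleep=time.sleep, rng=random.random):
    """Call operation(); on TransientError, wait and try again.

    Makes one initial attempt plus up to `retries` retries. The error from
    the last attempt is allowed to propagate to the caller.
    """
    for delay in backoff_delays(base, factor, cap, retries):
        try:
            return operation()
        except TransientError:
            # Jitter: sleep a random fraction of the delay so that many
            # clients do not all retry in lockstep.
            sleep(delay * rng())
    return operation()  # final attempt; no more retries after this
```

With `base=1.0`, `factor=2.0`, and `cap=5.0`, the delays before jitter grow as 1, 2, 4, and then stay at 5 seconds, which gives a struggling database time to recover rather than adding load to it.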
Improving Compliance with Industry Regulations
Many businesses are subject to government and industry regulations. Regulations range from protecting the privacy of customer data to ensuring the integrity of business transactions and financial reporting. Major regulations include the following:
Health Insurance Portability and Accountability Act (HIPAA), a healthcare regulation
Children's Online Privacy Protection Act (COPPA), a privacy regulation
Sarbanes–Oxley Act (SOX), a financial reporting regulation
Payment Card Industry Data Security Standard (PCI DSS), a data protection standard for credit card processing
General Data Protection Regulation (GDPR), a European Union privacy protection regulation
Complying with privacy regulations usually requires controls on who can access and change protected data, where it is stored, and under what conditions data may be retained by a business. As an architect, you will have to develop schemes for controls that meet regulations. Fine-grained access controls may be used to further restrict who can update data. When granting access, follow security best practices, such as granting only the permissions needed to perform one's job and separating high-risk duties across multiple roles. For more on security best practices, see Chapter 7, “Designing for Security and Legal Compliance.”
Business requirements define the context in which architects make design decisions. On the Google Cloud Professional Architect exam, you must understand business requirements and how they constrain technical options and specify characteristics required in a technical solution.
Business Terms to Know
Capital Expenditure (Capex) Funds spent to acquire assets, such as computer equipment, vehicles, and land. Capital expenditures are used to purchase assets that will have a useful life of at least a few years. The other major type of expenditure is operational expenditures. Capital expenses are spread over multiple years, with only a portion of the capital expense impacting the bottom line for each of the years.
Compliance Implementing controls and practices to meet the requirements of regulations, including security, monitoring, and verification that controls meet requirements.
Digital Transformation Major changes in businesses as they adopt information technologies to develop new products, improve customer service, optimize operations, and make other major improvements enabled by technology. Brick-and-mortar retailers using mobile technologies to promote products and engage with customers is an example of digital transformation. Digital transformations usually include some cloud component.
Governance Procedures and practices used to ensure that policies and principles of organizational operations are followed. Governance is the responsibility of directors and executives within an organization.
Key Performance Indicator (KPI) A measure that provides information about how well a business or organization is achieving an important or key objective. For example, an online gaming company may have KPIs related to the number of new players acquired per week, total number of player hours, and operational costs per player.
Line of Business The parts of a business that deliver a particular class of products and services. For example, a bank may have consumer banking and business banking lines, while an equipment manufacturer may have industrial as well as agricultural lines of business. Different lines of business within a company will have some business and technical requirements in common as well as their own distinct needs.
Operational Expenditures (Opex) Expenses paid from the operating budget, not the capital budget.
Operating Budget A budget allocating funds to meet the costs of labor, supplies, and other expenses related to performing the day-to-day operations of a business. Contrast this to capital expenditure budgets, which are used for longer-term investments.
Service-Level Agreement (SLA) An agreement between a provider of a service and a customer using the service. SLAs define responsibilities for delivering a service and consequences when responsibilities are not met.
Service-Level Indicator (SLI) A metric that reflects how well a service-level objective is being met. Examples include latency, throughput, and error rate.
Service-Level Objective (SLO) An agreed-upon target for a measurable attribute of a service that is specified in a service-level agreement.
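To make the relationship between an SLI and an SLO concrete, here is a minimal Python sketch that computes an error-rate SLI and compares it to an SLO target. The function names, the request counts, and the 1 percent target are all hypothetical values chosen for illustration; they are not drawn from any particular SLA.

```python
def error_rate(total_requests, failed_requests):
    """SLI: fraction of requests that failed in the measurement window."""
    if total_requests == 0:
        return 0.0  # no traffic, so no observed errors
    return failed_requests / total_requests


def meets_slo(sli_value, slo_target):
    """SLO check: the measured error rate must not exceed the target."""
    return sli_value <= slo_target


# Hypothetical window: 10,000 requests, 25 failures, SLO target of 1% errors.
sli = error_rate(total_requests=10_000, failed_requests=25)
print(f"error rate: {sli:.2%}, within SLO: {meets_slo(sli, slo_target=0.01)}")
```

In this sketch the SLI is the measurement, the SLO is the target it is compared against, and the SLA is the agreement that would state what happens if the target is missed.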
Analyzing Technical Requirements
Technical requirements specify features of a system that relate to functional and nonfunctional performance. Functional features include providing ACID transactions in a database, which guarantees that transactions are atomic, consistent, isolated, and durable; ensuring at-least-once delivery in a messaging system; and encrypting data at rest. Nonfunctional features are the general features of a system, including scalability, reliability, observability, and maintainability.
Functional Requirements
The exam will require you to understand functional requirements related to computing, storage, and networking. The following are some examples of the kinds of issues you will be asked about on the exam.
Understanding Compute Requirements
Google Cloud has a variety of computing services, including Compute Engine, App Engine, Cloud Functions, Cloud Run, and Kubernetes Engine. As an architect, you should be able to determine when each of these platforms is the best option for a use case. For example, if there is a technical requirement to use a virtual machine running a particular hardened version of Linux, then Compute Engine is the best option. Sometimes, though, the choice is not so obvious.
If you want to run containers in a managed service on Google Cloud Platform (GCP), you could choose from App Engine Flexible, Cloud Run, or Kubernetes Engine. If you already have application code running in App Engine and you intend to run a small number of containers, then App Engine Flexible is a good option. If you plan to deploy and manage a large number of containers and want to use a service mesh like Anthos Service Mesh to secure and monitor microservices, Kubernetes Engine is a better option. If you are running stateless containers that do not require Kubernetes features such as namespaces or node allocation and management features, then Cloud Run is a good option.
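The rules of thumb above can be written down as a small decision function. The Python sketch below is only one way to encode them. The `ContainerWorkload` fields, the `large_fleet_threshold` of 50 containers, and the order in which the rules are checked are assumptions made for this example, since the text does not define what counts as a "large number" of containers or how to rank competing criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerWorkload:
    """Hypothetical description of a containerized workload."""
    already_on_app_engine: bool = False
    container_count: int = 1
    needs_service_mesh: bool = False
    needs_kubernetes_features: bool = False  # e.g., namespaces, node management
    stateless: bool = True


def suggest_service(workload: ContainerWorkload,
                    large_fleet_threshold: int = 50) -> Optional[str]:
    """Return a suggested managed container service, or None if no rule applies."""
    # Large fleets, a service mesh, or Kubernetes-specific features
    # point to Kubernetes Engine.
    if (workload.needs_service_mesh
            or workload.needs_kubernetes_features
            or workload.container_count >= large_fleet_threshold):
        return "Kubernetes Engine"
    # Existing App Engine code plus a small number of containers.
    if workload.already_on_app_engine:
        return "App Engine Flexible"
    # Stateless containers that need no Kubernetes features.
    if workload.stateless:
        return "Cloud Run"
    return None  # the rules of thumb above do not cover this case


print(suggest_service(ContainerWorkload(needs_service_mesh=True)))
print(suggest_service(ContainerWorkload(already_on_app_engine=True, container_count=3)))
print(suggest_service(ContainerWorkload()))
```

A function like this is not a substitute for judgment, but writing the criteria out makes it easier to see which requirement is driving a choice when several options appear to fit.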
Understanding