MySQL Conceptual Architecture

The MySQL RDBMS is a very popular database system that is being used all over the world for its performance, simplicity and robustness. It is a good assumption that the architecture for MySQL is similar to that of a general RDBMS architecture. There is no actual architectural diagram for the MySQL RDBMS provided by the designers. The conceptual architecture shows the information about what the system does and how this is broken into interacting parts.

The MySQL architecture was first broken down into three layers: an application layer, a logical layer, and a physical layer. The logical layer, which is focused upon in detail, was broken down into four components: a query processor, a transaction management system, a recovery management system, and a storage management system.

1. General RDBMS Architecture:

It was found in all of the consulted resources that all database systems, can be viewed as a Garlan & Shaw layered architecture at the highest level of abstraction. It has three main components as outlined in the following diagram:

1.1 Application Layer:

The application layer represents the interface for all the users of the system; it essentially provides the means by which the outside world can interact with the database server. In general, it has been found that users can be categorized into four main groups:

  1. Sophisticated users interact with the system without using any form of application; they form their requests directly with the use of a database query language.
  2. Specialized users are application programmers who write specialized database applications that do not fit into the traditional data-processing framework.
  3. Native users are unsophisticated users who interact with the system by invoking one of the permanent application programs that have been previously written.
  4. Database Administrators have complete control over the entire database system. They have a wide variety of responsibilities, including schema definition, the granting of access authorization, as well as the specification of integrity constraints.
All database systems provide extensive services for each of these groups; it is then left to the logical layer to appropriately process the varying requests.

1.2 Logical Layer:

The core functionality of the RDBMS is represented in the logical layer of the architecture; it is in this portion of the system that there are a wide variety of vendor specific implementations (see below figure). However, upon reading a number of general database management texts, it was found that general core logical functionality can indeed be abstracted. The following diagram is a high level abstraction of the modules found in any robust RDBMS.

It can be observed that there are four main modules present in the RDBMS; their implementations and interactions have been summarized based on general documentation from a variety of sources. It is interesting to note that since the major control flow is indeed downwards, the logical layer itself can be considered a Garlan & Shaw layered architecture. It should be noted that the specific implementations and interactions between the subsystems vary greatly from vendor to vendor; in the ensuing section, the MySQL subsystems of these components will be analyzed in detail.

1.3 Physical Layer:

The RDBMS is responsible for the storage of a variety of information, which is kept in secondary storage and accessed via the storage manager. The main types of data kept in the system are:

  1. Data files, which store the user data in the database
  2. Data dictionary, which stores metadata about the structure of the database
  3. Indices, which provide fast access to data items that hold particular values
  4. Statistical Data, which store statistical information about the data in the database; it is used by the query processor to select efficient ways to execute a query.
  5. Log Information, used to keep track of executed queries such that the recovery manager can use the information to successfully recover the database in the case of a system crash.

2. MySQL Architecture:

Once the research pertaining to general RDBMS architectures was complete, the specific MySQL documentation could then be examined in order to extract the vendor specific conceptual architecture. As mentioned above, however, the core of the functionality is found in the logical layer; the remainder of the section details the key subsystems in this layer. The conceptual architecture of MySQL is illustrated in figure below.

The architecture depicted in above figure is a view of the control flow of the MySQL system. It is an expansion of the simple architecture described in previous figure. It should be noted that the architecture described is a layered architecture as described by Garlan and Shaw. There exists a pipeline architecture, also described by Garlan and Shaw, represented in the Query Processing layer between the Embedded DML Precompiler and the Execution Engine.

2.1 MySQL Application Layer:

The MySQL application layer is where the clients and users interact with the MySQL RDBMS. There are three components in this layer as can be seen in the layered MySQL architecture diagram in above figure. These components illustrate the different kinds of users that can interact with the MySQL RDBMS, which are the administrators, clients and query users. The administrators use the administrative interface and utilities. In MySQL, some of these utilities are mysqladmin which performs tasks like shutting down the server and creating or dropping databases, isamchk and myisamchk which help to perform table analysis and optimization as well as crash recovery if the tables become damaged, and mysqldump for backing up the database or copying databases to another server. Clients communicate to the MySQL RDBMS through the interface or utilities. The client interface uses MySQL APIs for various different programming languages such as the C API, DBI API for Perl, PHP API, Java API, Python API, MySQL C++ API and Tcl. Query users interact with the MySQL RDBMS through a query interface that is mysql. mysql is a monitor (interactive program) that allows the query users to issue SQL statements to the server and view the results.

2.2 MySQL Logical Layer:

It was found that MySQL does indeed have a logical architecture that is virtually identical to the one depicted in Figure 2. The MySQL documentation gave an indication as to precisely how these modules could be further broken down into subsystems arranged in a layered hierarchy corresponding to the layered architecture in Garlan and Shaw. The following section details these subsystems and the interactions within them.

2.2.1 Query Processor:

The vast majority of interactions in the system occur when a user wishes to view or manipulate the underlying data in storage. These queries, which are specified using a data-manipulation language (ie SQL), are parsed and optimized by a query processor. This processor, depicted in Figure 3 above, can be represented as pipeline and filter architecture in the sense of Garlan and Shaw where the result of the previous component becomes an input or requirement to the next component. The component architect ure of the query processor will be explained below.

2.2.1.1 Embedded DML Precompiler:

When a request is received from a client in the application layer, it is the responsibility of the embedded DML (Data Manipulation Language) precompiler to extract the relevant SQL statements embedded in the client API commands, or to translate the client commands into the corresponding SQL statements. This is the first step in the actual processing of a client application written in a programming language such as C++ or Perl, before compiling the SQL query. The client request could come from commands executed from an application interface (API), or an application program. This is prevalent in all general RDBMS's. MySQL has this component in order to process the MySQL client application request into the format that MySQL understands.

2.2.1.2 DDL Compiler:

Requests to access the MySQL databases received from an administrator are processed by the DDL (Data Definition Language) compiler. The DDL compiler compiles the commands (which are SQL statements) to interact directly with the database. The administrator and administrative utilities do not expose an interface, and hence execute directly to the MySQL server. Therefore, the embedded DML precompiler does not process it, and this explains the need for a DDL compiler.

2.2.1.3 Query Parser:

After the relevant SQL query statements are obtained from deciphering the client request or the administrative request, the next step involves parsing the MySQL query. In this stage, the objective of the query parser is to create a parse tree structure based on the query so that it can be easily understood by the other components later in the pipeline.

2.2.1.4 Query Preprocessor:

The query parse tree, as obtained from the query parser, is then used by the query preprocessor to check the SQL syntax and check the semantics of the MySQL query to determine if the query is valid. If it is a valid query, then the query progresses down the pipeline. If not, then the query does not proceed and the client is notified of the query processing error.

2.2.1.5 Security/Integration Manager

Once the MySQL query is deemed to be valid, the MySQL server needs to check the access control list for the client. This is the role of the security integration manager which checks to see if the client has access to connecting to that particular MySQL database and whether he/she has table and record privileges. In this case, this prevents malicious users from accessing particular tables and records in the database and causing havoc in the process.

2.2.1.6 Query Optimizer

After determining that the client has the proper permissions to access the specific table in the database, the query is then subjected to optimization. MySQL uses the query optimizer for executing SQL queries as fast as possible. As a result, this is the reason why the performance of MySQL is fast compared to other RDBMS's. The task of the MySQL query optimizer is to analyze the processed query to see if it can take advantage of any optimizations that will allow it to process the query more quickly. MySQL query optimizer uses indexes whenever possible and uses the most restrictive index in order to first eliminate as many rows as possible as soon as possible. Queries can be processed more quickly if the most restrictive test can be done first.

2.2.1.7 Execution Engine

Once the MySQL query optimizer has optimized the MySQL query, the query can then be executed against the database. This is performed by the query execution engine, which then proceeds to execute the SQL statements and access the physical layer of the MySQL database from Figure 3. As well the database administrator can execute commands on the database to perform specific tasks such as repair, recovery, copying and backup, which it receives from the DDL compiler.

2.2.1.8 Scalability/Evolvability

The layered architecture of the logical layer of the MySQL RDBMS supports the evolvability of the system. If the underlying pipeline of the query processor changes, the other layers in the RDBMS are not affected. This is because the architecture has minimal sub-component interactions to the layers above and below it, as can be seen from the architecture diagram. The only sub-components in the query processor that interact with other layers is the embedded DML preprocessor, DDL compiler and query parser (which are at the beginning stages of the pipeline) and the execution engine (end of the pipeline). Hence, if the query preprocessor security/integration manager and/or query optimizer is replaced, this does not affect the outcome of the query processor.

2.2.2 Transaction Management

2.2.2.1 Transaction Manager

As of version MySQL 4.0.x, support was added for transactions in MySQL. A transaction is a single unit of work that has one or more MySQL commands in it. The transaction manager is responsible for making sure that the transaction is logged and executed atomically. It does so through the aid of the log manager and the concurrency-control manager. Moreover, the transaction manager is also responsible for resolving any deadlock situations that occur. This situation can occur when two transactions cannot continue because they each have some data that the other needs to proceed. Furthermore, the transaction manager is responsible for issuing the COMMIT and the ROLLBACK SQL commands.

The COMMIT command commits to performing a transaction. Thus, a transaction is incomplete until it is committed to. The ROLLBACK command is used when a crash occurs during the execution of a transaction. If a transaction were left incomplete, the ROLLBACK command would undo all changes made by that transaction. The result of executing this command is restoring the database to its last stable state.

2.2.2.2 Concurrency-Control Manager

The concurrency-control manager is responsible for making sure that transactions are executed separately and independently. It does so by acquiring locks, from the locking table that is stored in memory, on appropriate pieces of data in the database from the resource manager. Once the lock is acquired, only the operations in one transaction can manipulate the data. If a different transaction tries to manipulate the same locked data, the concurrency-control manager rejects the request until the first transaction is complete.

2.2.3 Recovery Management

2.2.3.1 Log Manager

The log manager is responsible for logging e very operation executed in the database. It does so by storing the log on disk through the buffer manager. The operations in the log are stored as MySQL commands. Thus, in the case of a system crash, executing every command in the log will bring back the database to its last stable state.

2.2.3.2 Recovery Manager

The recovery manager is responsible for restoring the database to its last stable state. It does so by using the log for the database, which is acquired from the buffer manager, and executing each operation in the log. Since the log manager logs all operations performed on the database (from the beginning of the database's life), executing each command in the log file would recover the database to its last stable state.

2.2.4 Storage Management

Storage is physically done on some type of secondary storage, however dynamic access of this medium is not practical. Thus, all work is done through a number of buffers. The buffers reside in main and virtual memory and are managed by a Buffer Manager. This manager works in conjunction with two other manager entities related to storage: the Resource Manager and the Storage Manager.

2.2.4.1 Storage Manager

At the lowest level exists the Storage Manager. The role of the Storage Manager is to mediate requests between the Buffer Manager and secondary storage. The Storage Manager makes requests through the underlying disk controller (and sometimes the operating system) to retrieve data from the physical disk and reports them back to the Buffer Manager.

2.2.4.2 Buffer Manager

The role of the Buffer Manager is to allocate memory resources for the use of viewing and manipulating data. The Buffer Manager takes in formatted requests and decides how much memory to allocate per buffer and how many buffers to allocate per request. All requests are made from the Resource Manager.

2.2.4.3 Resource Manager

The purpose of the Resource Manager is to accept requests from the execution engine, put them into table requests, and request the tables from the Buffer Manager. The Resource Manager receives references to data within memory from the Buffer Manager and returns this data to the upper layers.

2.2.5 Evolvability/Scalability

The goals of the Transaction Management subsystem and the Recovery Management subsystem seem to provide non-functional requirements such evolvability and scalability. For example, the different managers provide the necessary abstractions so that the implementation can change while leaving the interface the same, thereby ensuring that the system can evolve to contain better data structures or algorithms. Furthermore, these subsystems provide scalability by being able to handle several transactions from several different users concurrently, or by recovering crashes from several different databases without much effort.

Comments

Post a Comment

Popular posts from this blog

Simple Pie Graphs Using <canvas> element in HTML 5

Stored Procedures in MySQL

Simple Line Graphs Using <canvas> element in HTML 5