US8209587B1 - System and method for eliminating zeroing of disk drives in RAID arrays - Google Patents
System and method for eliminating zeroing of disk drives in RAID arrays
- Publication number: US8209587B1
- Application number: US11/734,314 (US73431407A)
- Authority: US - United States
- Prior art keywords: block, parity, data, stripe, disk
- Prior art date: 2007-04-12
- Publication date: Tue Jun 26 2012
- Legal status: Active, expires 2031-04-12 (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1076—Parity data used in redundant arrays of independent storages, e.g. in RAID systems
- G06F11/1096—Parity calculation or recalculation after configuration or reconfiguration of the system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1076—Parity data used in redundant arrays of independent storages, e.g. in RAID systems
- G06F11/1088—Reconstruction on already foreseen single or plurality of spare disks
Definitions
- Referring now to FIG. 11, it illustrates components residing on each disk 250 in a RAID group, such as RAID group 202 shown in FIG. 2, to execute a modified XPWrite command issued by RAID controller module 436.
- Each disk 250 in a RAID group 202 includes a controller 1100 , which implements a command executing module 1130 .
- command executing module 1130 is configured to read data from the data block indicated in the command and to request new data from storage system 200 .
- Module 1130 also receives the XOR of the new data blocks and computes new parity by XOR'ing the new data blocks with the data read into memory. Steps performed by module 1130 will be described in more detail in reference to FIG. 10.
- command executing module 1130 receives the command issued by RAID controller module 436 .
- the command includes a block number at which data will be written as well as the number of blocks that the data occupies.
- the command includes an Operation Code (opcode) indicating the type of the command.
- RAID controller module 436 XOR's unallocated data blocks with the parity block for that stripe.
- the process of removing unallocated data blocks from a parity block is performed during idle time.
- idle time refers to a period of time during which requests from clients 210 are expected to be low (such as, for example, during weekends, holidays, and off-peak hours).
- spare disks are used to replace failed disks.
- the reconstruction process can be performed intermittently with normal I/O operations.
- the process of data reconstruction involves reading data from all disks other than the failed disk, XOR'ing the data being read, and writing the result of the XOR operation to a spare disk, which becomes a reconstructing disk once it replaces the failed disk.
- multiple read operations and at least one write operation have to be performed.
- if a data block on one of the remaining disks is invalid, this data block does not need to be read and can be ignored for purposes of reconstructing data on the failed drive. Additionally, if the data block to be reconstructed is invalid, then it does not need to be reconstructed and no reads or writes are required for that stripe.
Abstract
Embodiments of the present invention disclose a technique for providing an indication whether data stored on a disk drive are invalid. As used herein, invalid data are data written prior to the disk drive being added to an array of disk drives, or data in a block that has become free and has been removed from the corresponding parity block of the stripe. Knowing that data on the disk drive were written prior to the drive being added to the existing array, or have since become invalid, allows a storage server to ignore the invalid data and not use them when computing parity (i.e., a data protection value computed as a result of a logical operation on data blocks in a stripe in the array of disk drives). This, in turn, eliminates the need to zero disk drives or to perform parity re-computation prior to using the disk drives.
Description
The present invention pertains to storage systems, and more particularly, to optimizing data write operations.
BACKGROUND

A storage server is a processing system adapted to store and retrieve data on behalf of one or more client processing systems ("clients") in response to input/output (I/O) client requests. A storage server can be used for many different purposes, such as to provide multiple users with access to shared data or to backup data.
One example of a storage server is a file server. A file server operates on behalf of one or more clients to store and manage shared files in a set of mass storage devices, such as magnetic or optical storage disks or tapes. The mass storage devices may be organized into one or more volumes of Redundant Array of Independent (or Inexpensive) Disks (RAID). Another example of a storage server is a device that provides clients with block-level access to stored data, rather than file-level access, or a device that provides clients with both file-level access and block-level access.
In a storage server, data gets corrupted or lost from time to time, for example, upon the failure of one of the mass storage devices. Therefore, it is common to implement techniques for protecting the stored data. Currently, these techniques involve calculating a data protection value (e.g., parity) and storing the parity in various locations. Parity may be computed as an exclusive-OR (XOR) of data blocks in a stripe spread across multiple disks in a disk array. In a single parity scheme, e.g. RAID-4 or RAID-5, an error can be corrected in any block in the stripe using a single parity block (also called “row parity”), which can be computed as a result of a logical operation on data blocks in a stripe. In a dual parity scheme, e.g. RAID Double Parity (RAID-DP), a technique invented by Network Appliance Inc. of Sunnyvale, Calif., errors resulting from a two-disk failure can be corrected using two parity blocks.
When one or more data blocks in a stripe are modified, a parity block has to be updated for that stripe. Two techniques are known for modifying data blocks and thus updating a parity block. According to a first method, known as “Read Modify Write” or “Parity By Subtraction Write” (also referred to herein as “a subtraction method”), data blocks that are being modified within a stripe and a parity block for the stripe are read into memory. Then an exclusive OR (XOR) operation is performed on the read data blocks and the parity block. The result of the XOR operation is XOR'ed with new data blocks to be written to compute new parity. Then, the result is written to the parity block in the stripe. When a large number of data blocks are modified, another method for writing data, known as “Write by Recalculation” (or as referred to herein, a “recalculation method”) is used. According to this method, data blocks within a stripe that are not to be modified are read into memory. To compute new parity, the read data blocks are XOR'ed with a result of the XOR operation on new data blocks. The result is written to a parity block in the stripe on a disk.
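To make the two update paths concrete, here is a minimal sketch of the parity arithmetic described above. The block representation (equal-length byte strings) and the helper names are assumptions made only for illustration; the patent does not prescribe any particular implementation.

    # Illustrative sketch of the two parity-update methods described above.
    # Blocks are modeled as equal-length byte strings; xor_blocks() and the
    # block layout are assumptions for illustration, not the patent's code.

    from functools import reduce

    def xor_blocks(*blocks: bytes) -> bytes:
        """Byte-wise XOR of equal-length blocks."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    def parity_by_subtraction(old_data, old_parity, new_data):
        """'Read Modify Write': XOR old data out of the parity, XOR new data in."""
        return xor_blocks(old_parity, *old_data, *new_data)

    def parity_by_recalculation(unmodified_data, new_data):
        """'Write by Recalculation': XOR the unmodified blocks with the new blocks."""
        return xor_blocks(*unmodified_data, *new_data)

    # Both methods yield the same parity for a stripe D0..D3:
    D = [bytes([i] * 8) for i in (1, 2, 3, 4)]
    parity = xor_blocks(*D)
    new_D0 = bytes([9] * 8)
    p_sub = parity_by_subtraction([D[0]], parity, [new_D0])
    p_rec = parity_by_recalculation(D[1:], [new_D0])
    assert p_sub == p_rec == xor_blocks(new_D0, *D[1:])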
Prior to using disk arrays for the first time or prior to adding a disk to an existing array, disk drives are zeroed (a zero value is written to the disks) to ensure that parity for each stripe is consistent with the data blocks in the stripes. A disk can be added to the existing array, for example, when it replaces another disk or when the number of disks is expanded in the array. The process of zeroing the disk though can be time-consuming and might require up to eight hours to complete. Moreover, this time increases as disk drive capacities are increasing.
An alternative to zeroing disk drives would be to read all data blocks in each stripe, calculate the parity and write the parity onto the disk. Any write requests received to the stripes that already had the parity calculated and written could be performed using either the recalculation method or the subtraction method. Any write requests to the stripes that had not had the parity calculated and written would be done using the recalculation method. An advantage of this technique is that the disks could be used immediately after the array was created. One of the shortcomings of this technique though is that reading all data drives, computing parity, and writing the computed parity to the disk would take much longer than it would take to zero the drives. In addition, even though the array can be used while this reparity process is running, the overhead of the reparity process will affect the performance of the system. Moreover, any writes that have to be done using a recalculation method may not be as efficient as they would have been had they been done by the subtraction method (which again requires parity being consistent with all the data blocks in a stripe).
Accordingly, what is needed is a mechanism that eliminates the need to zero disk drives prior to adding them to the system without adding overhead of parity recomputation to the entire array.
SUMMARY OF THE INVENTION

Embodiments of the present invention provide a method, system, and computer program product for providing an indication whether data stored on a disk drive are invalid. As used herein, invalid data are data written prior to the disk drive being added to an array of the disk drives, or data in a block that has become free (unallocated) and is removed from the parity block in the stripe. A disk drive can be added to an existing array, when, e.g., it replaces a disk in the array, the array is formed, or the array is expanded. Knowing that the disk drive stores data that were written prior to the drive being added to the existing array or is not needed allows a storage server to ignore the invalid data and not to use it when computing a parity block (i.e., a data protection value computed as a result of a logical operation on data blocks in a stripe in the array of disk drives). This, in turn, eliminates the need to zero disk drives or to perform parity re-computation prior to using the disk drives to ensure that the parity is consistent with the data in the corresponding stripe in the array.
According to one embodiment of the present invention, a unique disk drive ID is created for each disk drive when it is added to an array of disk drives. The disk drive ID is stored in a reserved area on the disk where other configuration information is stored. When a data block is written to a disk drive, a unique disk drive ID for that disk drive is written to a checksum block that is associated with the data block. A data block can be any chunk of data that a storage server is capable of recognizing and manipulating as a distinct entity. When a parity block is written to a disk, a value indicating which data blocks are included in the parity block for that stripe is written to a checksum block that is associated with the parity block. A unique disk drive ID for the drive that stores parity for a stripe is also written to the checksum block.
When reading a data block from a disk drive as part of a write operation, data in the checksum block associated with the data block is compared to a disk drive ID that was stored in the configuration area of the disk drive. If the two match, it indicates that the data block was written after the disk was added to the array and is valid. If the two do not match, it indicates that the data block was written prior to the disk drive being added to the array or has become invalid. As noted above, knowing that the data block was written prior to the disk being added to the array allows the storage server to ignore that data block and not to use it for purposes of parity computation even without writing a zero value to that data block. A person of ordinary skill in the art would understand that any other mechanism can be used to provide an indication whether a disk drive stores valid data.
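A minimal sketch of this validity test follows, assuming a simple in-memory representation of the checksum block; the ChecksumBlock structure and its field names are hypothetical and only illustrate the comparison described above.

    # Minimal sketch of the validity test described above. The checksum-block
    # layout and field names are assumptions made for illustration only.

    from dataclasses import dataclass

    @dataclass
    class ChecksumBlock:
        drive_id: bytes      # unique disk drive ID written alongside the data block
        checksum: int        # checksum over the associated data block

    def block_is_valid(checksum_block: ChecksumBlock, configured_drive_id: bytes) -> bool:
        """A data block is valid only if the drive ID stored alongside it matches
        the ID recorded in the drive's configuration area when it joined the array."""
        return checksum_block.drive_id == configured_drive_id

    # A block written before the drive joined the array carries a stale (or random)
    # ID, so it is ignored for parity computation instead of being zeroed first.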
The inventive technique of the present invention can be used to optimize write operations, to perform more efficient reconstruction of data when a disk fails, to reduce reliance on “sick” disks for disk copies, and to improve error recovery, as described herein.
The inventive techniques of the present invention reduce or eliminate the number of read operations to perform a write operation. For example, when a value indicating which data blocks are included in a particular parity block for that stripe is cached in memory, it can be used to determine which data blocks are valid even without reading the data blocks. When performing a write operation, invalid data blocks are ignored and not read. This, in turn, reduces the number of read operations to complete a write operation.
According to another embodiment of the present invention, a write operation can be done without performing any read operation, if none of the data blocks that are not being modified in the stripe are valid. In this situation, parity is computed for a stripe where new data are written by performing a logical operation on only the new data blocks.
According to another embodiment, when data blocks that are being overwritten are not valid, but other data blocks in a stripe are valid, to reduce the number of read operations, the storage server issues a command to the disk that stores parity. The command, when executed at a disk that stores parity for a corresponding stripe, reads the parity block into memory, requests new data from the storage server, and computes new parity using the new data. The result of the computation is written to the parity disk. As a result of the execution of the command by the parity disk, the storage server does not need to read the old parity block and to compute the new parity using new data.
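The disk-side behavior described above can be sketched as follows. This is not the actual SCSI XPWrite implementation; the controller class, method names, and callback are assumptions used only to illustrate that the parity drive itself combines its old parity with the XOR of the new data supplied by the storage server.

    # Hedged sketch of the parity-drive-side behavior described above; all names
    # are illustrative, not a real disk command API.

    def xor_blocks(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    class ParityDiskController:
        def __init__(self, blocks):
            self.blocks = blocks                     # block number -> bytes

        def execute_parity_update(self, block_no, fetch_new_data_xor):
            old_parity = self.blocks[block_no]       # read old parity into disk memory
            new_data_xor = fetch_new_data_xor()      # request XOR of new data from the server
            self.blocks[block_no] = xor_blocks(old_parity, new_data_xor)

    # The storage server never reads the old parity block over the interconnect;
    # it only supplies the XOR of the new data blocks.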
Furthermore, the present invention improves error recovery in an array in which one disk has a failure and another disk has a data block with a media error. To recover data from the data block that has a media error, the present invention determines whether data on the failed drive are valid, e.g., were written after the disk was added to the array. If the data on the failed drive are not valid, that data can be ignored when recovering the data block that has a media error.
Similarly, the inventive technique may reduce the number of read and write operations to reconstruct data from a failed drive by providing an indication whether data blocks on the failed drive were written prior to being added to the array or have become invalid. The inventive technique may reduce the number of read and write operations to copy data from a “sick” disk by providing an indication whether data stored on the sick disk are valid.
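As a rough sketch of how reconstruction could skip invalid blocks, assuming the per-stripe parity mask and the byte-string block model used in the other examples (all names here are illustrative, not the patent's code):

    from functools import reduce

    def xor_blocks(blocks):
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    def reconstruct_block(failed_pos, parity_mask, read_block, read_parity):
        """Return the reconstructed block for one stripe, or None if no work is needed."""
        if not (parity_mask >> failed_pos) & 1:
            return None                  # failed drive's block was invalid: nothing to rebuild
        surviving = [read_block(pos) for pos in range(parity_mask.bit_length())
                     if pos != failed_pos and (parity_mask >> pos) & 1]
        return xor_blocks(surviving + [read_parity()])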
Although the present invention is described in the context of hard disk drives (HDD) or disks, a person of ordinary skill in the art would understand that the invention can be implemented with optical storage disks, tapes, or other types of storage devices. Other aspects of the invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings, which illustrate the principles of the invention by way of example.
BRIEF DESCRIPTION OF THE DRAWINGS

One or more embodiments of the present invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
FIGS. 1A and 1B show arrangements of data blocks on disks according to RAID-4 and RAID-5, respectively;
FIG. 2 shows a network environment that includes a storage server according to an embodiment of the present invention;
FIG. 3 is a diagram showing architecture of the storage server shown in FIG. 2 according to an embodiment of the present invention;
FIG. 4 is a diagram showing a storage operating system of the storage server of FIG. 2 according to an embodiment of the present invention;
FIG. 5 is a flow diagram of a method for using disk drives to perform a write operation according to a subtraction method without zeroing disk drives prior to using them according to one embodiment of the present invention;
FIG. 6 is a flow diagram of a method for using disk drives to perform a write operation according to a recalculation method without zeroing disk drives prior to using them according to one embodiment of the present invention;
FIG. 7A is a flow diagram of a method for optimizing write operations using inventive techniques of the present invention according to one embodiment of the present invention;
FIG. 7B is a flow diagram of a method for optimizing write operations using inventive techniques of the present invention according to another embodiment of the present invention;
FIG. 8 is a flow diagram of a method for optimizing write operations using inventive techniques of the present invention according to yet another embodiment of the present invention;
FIG. 9 is a flow diagram of a method for optimizing write operations using inventive techniques of the present invention and a modified XPWrite command according to one embodiment of the present invention;
FIG. 10 is a flow diagram of a method performed by a module executed on a drive that stores parity for a stripe to execute a modified XPWrite command according to an embodiment of the present invention; and
FIG. 11 is a block diagram of the components of a disk configured to perform a write operation using a modified XPWrite command according to one embodiment of the present invention.
Embodiments of the present invention provide a method, system, and computer program product for providing an indication whether data stored on a disk drive are invalid. As used herein, “invalid data” are data written prior to the disk drive being added to an array of the disk drives or data in a block which has become free and which has been removed from the corresponding parity block in the stripe. Knowing that the disk drive stores data that were written prior to the drive being used in the existing array or have become invalid allows a storage server to ignore the invalid data and not to use it when computing a parity block (i.e., a data protection value computed as a result of a logical operation on data blocks in a stripe in the array of disk drives). This, in turn, eliminates the need to zero disk drives (by writing a zero value to the disks) or to perform parity re-computation prior to using the disk drives to ensure that the parity is consistent with the data in the corresponding stripe in the array.
As used herein, “old data” means data that is currently stored in data blocks; “new data” refers to data that is to be written to the disks. “Old parity” refers to parity computed before a write command is executed, that is, before new data are written to disks. “New parity” refers to parity computed after a write command is executed, that is, after new data are written to the disks. Throughout this description, the “Read Modify Write” method for writing data is referred to herein as a “subtraction method.” The “Parity By Recalculation Write” method for writing data is referred to herein as a “recalculation method.”
An “aggregate” is a logical container for a pool of storage, which combines one or more physical mass storage devices (e.g., disks) or parts thereof into a single logical storage object, which contains or provides storage for one or more other logical data sets (e.g., volumes), but at a higher level of abstraction. A “volume” is a set of stored data associated with a collection of mass storage devices, such as disks, which obtains its storage from (i.e., is contained within) an aggregate, and which is managed as an independent administrative unit.
A “checksum block” is a block of data that is associated with a data block when a data block is written to a disk. According to one embodiment, the checksum block can store up to 64 bytes of data.
Referring now to FIGS. 1A and 1B, they show arrangements of data blocks on storage devices according to RAID-4 and RAID-5, respectively. In FIGS. 1A and 1B, data sent to a storage server from a client(s) for storage as part of a write operation may first be divided up into fixed-size, e.g., four-kilobyte, blocks (e.g., D0, D1, etc.), which are then formed into groups that are stored as physical data blocks in a "stripe" (e.g., Stripe I, Stripe II, etc.) spread across multiple disks in an array. Row parity, e.g., an exclusive-OR (XOR) of the data in the stripe, is computed and may be stored in a parity protection block on disk D. The location of the row parity depends on the type of protection scheme or protocol implemented. FIG. 1A shows a RAID-4 scheme in which the row parity, e.g., P(0-2), P(3-5), P(6-8), and P(9-11), is stored on a dedicated disk (disk D). FIG. 1B shows a RAID-5 scheme in which the row parity is distributed across the disks in the array.
1. System Architecture
As noted, a mechanism for eliminating a process of zeroing disk drives prior to adding them to a RAID system can be implemented in a storage server, such as the one shown in FIG. 2. FIG. 2 shows a network environment that includes multiple clients 210, a storage server 200, and a collection of disks 250. Storage server 200 operates on behalf of clients 210 to store and manage shared files in a set of mass storage devices, such as magnetic or optical storage based disks or tapes. As previously described, storage server 200 can also be a device that provides clients 210 with block-level access to stored data, rather than file-level access, or a device which provides clients with both file-level access and block-level access.

Storage server 200 in FIG. 2 is coupled locally to a set of clients 210 over a connection system 230, such as a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN) or the Internet. The connection system 230 may be embodied as an Ethernet network or a Fibre Channel (FC) network. Each client 210 may communicate with the storage server 200 over the connection system 230 by exchanging discrete frames or packets of data according to pre-defined protocols, such as TCP/IP.

Each of the clients 210 may be, for example, a conventional personal computer (PC), workstation, or the like. The storage server 200 receives various requests from the clients 210 and responds to these requests. The storage server 200 may have a distributed architecture; for example, it may include a separate N- ("network") blade and D- (disk) blade (not shown). In such an embodiment, the N-blade is used to communicate with clients 210, while the D-blade includes the file system functionality and is used to communicate with the disks 250. The N-blade and D-blade communicate with each other using an internal protocol. Alternatively, the storage server 200 may have an integrated architecture, where the network and data components are all contained in a single box and can be tightly integrated. The storage server 200 further may be coupled through a switching fabric to other similar storage servers (not shown) which have their own local disk subsystems. In this way, all of the disks 250 can form a single storage pool, to which any client of any of the file servers has access.
Storage of information on the storage system 200 is preferably implemented as one or more storage volumes that comprise physical storage disk drives 250 defining an overall logical arrangement of disk space. The disk drives within a volume are typically organized as one or more RAID groups (such as a RAID group 202 shown in FIG. 2). The physical disks of each RAID group include those disks configured to store striped data (D) and those disks configured to store parity (P) for the data, in accordance with illustrative RAID-4 and RAID-5 level configurations. However, other RAID level configurations (e.g., RAID-DP) are also contemplated.

A disk drive within a RAID group has a reserved area that stores configuration information for that particular drive and a copy of the configuration information for the entire RAID group (in FIG. 2, the reserved area 204 is designated on each disk to store configuration information). The configuration information may include the type of the RAID group, the number of disk drives within the RAID group, the disk drive's position within the RAID group, as well as other information describing the RAID group and the disk drive. The configuration information can also be persisted to the memory (shown in FIG. 3) of the storage server 200.

According to an embodiment of the present invention, a unique identifier (ID) is created for each disk drive at the time when a RAID group is created or when a disk drive is added to the RAID group. The unique ID can be stored as part of the configuration information for that drive. In one implementation, to create a unique drive ID, an ID of a RAID group and a count can be used. Each RAID group, when created, is assigned a unique ID. When drives are assigned to the RAID group, each drive ID is created from this unique ID and a count for the drive starting, e.g., with 0. The count is then incremented and used for the next drive, and is saved in the reserved area on each disk drive with other configuration information for the RAID group. If a disk is later added to the RAID group, the count is incremented and used for the new drive and then saved back to the area on each drive that stores configuration information. This ensures that no two drives have the same drive ID.
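As a concrete illustration of the RAID-group-ID-plus-count scheme just described, the following sketch shows one way such IDs could be generated. The RaidGroupLabel structure and the ID format are assumptions made for this example, not the patent's on-disk layout.

    from dataclasses import dataclass

    @dataclass
    class RaidGroupLabel:
        group_id: int          # unique ID assigned when the RAID group is created
        drive_count: int = 0   # incremented each time a drive is added

    def assign_drive_id(label: RaidGroupLabel) -> str:
        """Create a drive ID that cannot repeat within the RAID group."""
        drive_id = f"{label.group_id:08x}-{label.drive_count}"
        label.drive_count += 1      # persisted back to every drive's reserved area
        return drive_id

    rg = RaidGroupLabel(group_id=0x1A2B3C4D)
    ids = [assign_drive_id(rg) for _ in range(3)]   # '1a2b3c4d-0', '1a2b3c4d-1', '1a2b3c4d-2'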
In another implementation, a storage system's ID can be used along with the current timestamp to create a unique disk drive ID. As will be described in more detail throughout the description of the present invention, the unique disk drive ID is used to determine whether a data block was written prior to the disk being added to a RAID group, or whether a data block has become free (unallocated) and has been removed from the parity block in the stripe. Furthermore, although the present invention is described in the context of creating a unique disk drive ID, a person of ordinary skill in the art would understand that other techniques that produce a similar result can be used to provide an indication whether a disk drive stores valid data.
FIG. 3 is a diagram showing the architecture of the storage server 200 according to an embodiment of the present invention. The storage server 200 includes one or more processor(s) 321 and memory 322 coupled to a bus system 323. The bus system 323 shown in FIG. 3 may represent any one or more separate physical buses and/or point-to-point connections, connected by appropriate bridges, adapters and/or controllers. The bus system 323, therefore, may include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus, a Hyper Transport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (sometimes referred to as "Firewire").

The processor(s) 321 are the central processing units (CPUs) of the storage server 200 and, thus, control the overall operation of the storage server 200. In certain embodiments, the processor(s) 321 accomplish this by executing software, such as that described in more detail herein, stored in memory 322. A processor 321 may include one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.

Memory 322 comprises storage locations that are addressable by the processor 321 and adapters for storing software program code and data structures associated with the present invention. The processor 321 and adapters may, in turn, comprise processing elements and/or logic circuitry configured to execute the software code and manipulate various data structures. Memory 322 can be a random access memory (RAM), a read-only memory (ROM), a flash memory, or the like, or a combination of such devices. Memory 322 stores, among other components, the storage operating system 324 of the storage server 200. It will be apparent to those skilled in the art that other processing and memory means, including various computer readable media, may be used for storing and executing program instructions pertaining to the invention described herein. Memory 322 also stores configuration information (also referred to herein as "labels") of the disk drives.
In one implementation, memory 322 also stores a mask of each drive. In one embodiment, a mask is written to a checksum block associated with a data block that stores parity for a stripe. This mask is referred to herein as a parity mask and is computed as a result of a logical operation on the masks of each disk drive storing data blocks included in the parity for a stripe where new data blocks are written. Thus, the mask that is written in the checksum block associated with a data block that stores parity for the stripe can be used to determine which data blocks are included in the parity for a particular stripe. A mask for each disk drive can be created by shifting a one bit by the position of the drive in the array. In one embodiment, a mask is a bitmask. For example, drive 0 would be assigned a binary mask of "0001b", while drive 1 would be assigned a binary mask of "0010b" (a one bit shifted left by one bit), and drive 2 would be assigned a mask of "0100b" (a one bit shifted left by two bits). A parity mask of, for example, "0101b" would indicate that the parity block includes the data for drives 0 and 2 ("0101b" is the result of logically OR'ing the mask of drive 0, "0001b", and the mask of drive 2, "0100b"). During a write operation, the parity mask is updated and written to a checksum block, which is associated with the parity block.

A person of ordinary skill in the art would understand that although the present invention is described in the context of using a mask to represent the data blocks of a stripe that are included in the parity for that stripe, other representations can be used, such as storing each drive's position in the RAID group. In this embodiment, a disk drive that stores parity for a stripe stores an array of values, each value representing the position of a drive storing data blocks included in the parity for that stripe. For example, a 16-element array can represent up to 16 disk drives in the disk array. When using a mask to represent the data blocks of a stripe that are included in the parity for that stripe, a 16-bit value can represent up to 16 disk drives in the disk array.
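The mask arithmetic described above can be sketched in a few lines; the integer bitmask representation matches the "0001b"/"0101b" example, but the function names are illustrative only.

    def drive_mask(position: int) -> int:
        """A one bit shifted left by the drive's position in the array."""
        return 1 << position

    def add_to_parity_mask(parity_mask: int, positions) -> int:
        """OR in the masks of the drives whose data blocks are included in parity."""
        for pos in positions:
            parity_mask |= drive_mask(pos)
        return parity_mask

    def included_in_parity(parity_mask: int, position: int) -> bool:
        """AND the drive's mask with the parity mask; a nonzero result means included."""
        return (parity_mask & drive_mask(position)) != 0

    mask = add_to_parity_mask(0, [0, 2])      # 0b0101, i.e. drives 0 and 2
    assert included_in_parity(mask, 0) and not included_in_parity(mask, 1)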
A portion of the memory 322 may be further organized as a "buffer cache" 350 for storing certain data structures as well as data blocks retrieved from disks 250 and to be written to the disks 250. A data block stored in buffer cache 350 is appended with a checksum block, which can store up to 64 bytes of data. As described herein, the checksum block stores a unique disk drive ID, which is inserted prior to the data block being written to the disk, to indicate that the data block was written after the disk was added to the existing array. In addition, the checksum block may store a checksum for the associated data block, which can be computed by combining all the bytes of a block (e.g., 4 KB) with a series of arithmetic or logical operations.
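One possible layout for such a checksum block is sketched below. The exact packing and the XOR-based checksum are assumptions chosen for illustration; the patent only requires that the checksum combine the bytes of the block with arithmetic or logical operations and that the block also carry the unique disk drive ID (and, for parity blocks, the parity mask).

    import struct
    from functools import reduce

    BLOCK_SIZE = 4096            # 4 KB data block
    CHECKSUM_BLOCK_SIZE = 64     # bytes available in the checksum block

    def compute_checksum(data_block: bytes) -> int:
        """Fold the 4 KB block into a 32-bit value with XOR (illustrative choice)."""
        words = struct.unpack(f"<{BLOCK_SIZE // 4}I", data_block)
        return reduce(lambda a, b: a ^ b, words)

    def build_checksum_block(data_block: bytes, drive_id: bytes, parity_mask: int = 0) -> bytes:
        """Pack drive ID, checksum, and (for parity blocks) the parity mask into 64 bytes."""
        payload = struct.pack("<16sIQ", drive_id.ljust(16, b"\0"),
                              compute_checksum(data_block), parity_mask)
        return payload.ljust(CHECKSUM_BLOCK_SIZE, b"\0")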
Also connected to the processor(s) 321 through the bus system 323 are a storage adapter 326 and a network adapter 327. The storage adapter 326 cooperates with the storage operating system 324 executing on the storage server 200 to access data from disks 250. The storage adapter 326 comprises a plurality of ports having input/output (I/O) interface circuitry that couples to the disks 250 (shown in FIG. 2) over an I/O interconnect arrangement, such as a conventional high-performance Fibre Channel (FC) link topology.

The network adapter 327 comprises a plurality of ports adapted to couple the storage server 200 to one or more clients 210 (shown in FIG. 2) over point-to-point links, wide area networks, virtual private networks implemented over a public network (Internet) or a shared local area network. The network adapter 327 thus may comprise the mechanical, electrical and signaling circuitry needed to connect the node to the network.
2. Storage Operating System
FIG. 4 shows an example of the storage operating system 324 of the storage server 200. In the illustrative embodiment, the storage operating system 324 can be the NetApp® Data ONTAP® storage operating system available from Network Appliance Inc., of Sunnyvale, Calif., that implements a Write Anywhere File Layout (WAFL®) file system. However, it is expressly contemplated that any appropriate storage operating system may be employed for use in accordance with the inventive principles described herein. As used herein, the term "storage operating system" generally refers to the computer-executable code operable on a computer to perform a storage function that manages data access. The storage operating system 324 can also be implemented as a microkernel, an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose storage operating system with configurable functionality, which is configured for storage applications as described herein.
To facilitate access to the disks 250, the storage operating system 324 implements a file system 430. The file system 430 logically organizes the information as a hierarchical structure of named directories and files on the disks. In one embodiment, the file system 430 is a write-anywhere file system that "virtualizes" the storage space provided by the disks 250. Each file may be implemented as a set of data blocks configured to store information, such as data, whereas a directory may be implemented as a specially formatted file in which names and links to other files and directories are stored.

The storage operating system 324 further includes a protocol module 432, a network access module 433, a RAID controller module 436, and a storage driver module 435.
The protocol module 432 implements one or more of various high-level network protocols, such as Network File System (NFS), Common Internet File System (CIFS), Hypertext Transfer Protocol (HTTP) and/or Transmission Control Protocol/Internet Protocol (TCP/IP), to decode incoming client requests or encode outgoing responses to the client requests in the appropriate protocol. The network access module 433 includes one or more drivers, which implement one or more lower-level protocols to communicate over the network, such as Ethernet. The protocol module 432 and the associated network access module 433 allow the storage server 200 to communicate over the connection system 230 (e.g., with clients 210), as shown in FIG. 2.
RAID controller module 436 (also referred to herein as a "storage module") manages storage devices, such as disks, in a RAID system comprising one or more RAID groups. RAID controller module 436 manages data storage and retrieval in response to access requests from clients 210, which may include requests to write data and/or to read data. In one embodiment, RAID controller module 436 can be a software module implemented on the storage server 200. In an alternative embodiment, RAID controller module 436 can be a separate enclosure implemented as hardware with its own controller or processor, separate from, though in communication with, those that execute file system 430. RAID controller module 436 issues internal I/O commands to disks 250, in response to client requests. These commands can be a command to write data at a physical block number on disk 250 or a command to read data from a physical block number on disk 250.
According to an embodiment of the present invention, when RAID controller module 436 receives a request to perform a write operation, the RAID controller module 436 is configured to insert (in memory), into a checksum block of the data block indicated in the request, a unique ID of the disk drive where the data block will be written. Storage adapters, such as, for example, storage adapter 326 (shown in FIG. 3), transfer the data block(s) and the unique disk drive ID to the disk.

RAID controller module 436 includes a comparator module 436a. Comparator module 436a is configured to compare the disk drive ID stored in memory (which was stored in the configuration area) and the disk drive ID from the checksum block. If the two match, it indicates that data were written to the read data block after the disk was added to the array. If the two do not match, the data block was written prior to the disk being added to the array, or the data block has become free and has been removed from the parity block.

A person of ordinary skill in the art would understand that other techniques can be used to determine whether the data blocks are invalid, and that using the unique disk drive ID is just one exemplary embodiment to implement the present invention.
Storage driver module 435 allows storage server 200 to communicate with the disks 250. The storage driver module 435 implements a lower-level storage device access protocol, such as Fibre Channel (FC) protocol, Small Computer Systems Interface (SCSI) protocol, Serial ATA (SATA), or Serial Attached SCSI (SAS).
Methods of Operation
Referring now to FIG. 5, it illustrates a flow diagram of the steps to perform a write operation while eliminating the need to zero disk drives prior to using the disk drives. A person of ordinary skill in the art would understand that the steps described below do not need to be executed in the described order except where expressly required.

Initially, when a disk drive is added to a RAID group, RAID controller module 436 creates a unique disk drive ID for that disk drive. As described herein, in one implementation, a unique disk drive ID can be created using a unique RAID group ID and a count of the number of disks already added to the RAID group. A person of ordinary skill in the art would understand that other mechanisms could be used to create a unique disk drive ID. The unique disk drive ID is different for each disk drive. RAID controller module 436 then stores the unique disk drive ID in the configuration area on the disk.

Storage server 200 receives a write request from client 210 to modify a data block(s) in a RAID array. Storage server 200 passes the request to the file system 430. File system 430 identifies a logical block number(s) at which new data blocks will be written and sends a request to RAID controller module 436 to write the data blocks to the disk at the logical block number. The request includes a pointer to buffer cache 350 that stores the data to be written, a disk drive number, and a block number on the disk where the data are to be written.

RAID controller module 436 determines whether the parity mask for the stripe is stored in memory 322. If the parity mask is not stored in memory, RAID controller module 436 determines whether to use a subtraction method or a recalculation method to perform the write operation without using the parity mask of the stripe. The determination of which method to use is made by comparing the number of read requests required for each method. If the recalculation method would require fewer reads, then it is used. Otherwise, the subtraction method is used.
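The selection rule amounts to comparing read counts, for example as in the following sketch; the exact counting (data drives only, one read for the old parity block on the subtraction path) is an assumption consistent with the description of the two methods above.

    def choose_write_method(stripe_width: int, num_blocks_to_write: int) -> str:
        """stripe_width counts data drives only (parity drive excluded)."""
        # Subtraction: read each block being overwritten plus the old parity block.
        reads_subtraction = num_blocks_to_write + 1
        # Recalculation: read every data block that is not being overwritten.
        reads_recalculation = stripe_width - num_blocks_to_write
        return "recalculation" if reads_recalculation < reads_subtraction else "subtraction"

    assert choose_write_method(stripe_width=8, num_blocks_to_write=2) == "subtraction"
    assert choose_write_method(stripe_width=8, num_blocks_to_write=7) == "recalculation"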
1. Read Modify Write (Subtraction) Method
In summary, when a subtraction method is used to perform a write operation, the present invention advantageously eliminates the need to zero disk drives or otherwise to recompute the parity by providing an indication whether any of the data blocks that will be modified are invalid (e.g., they were written prior to creating a RAID group or being added to the RAID group or the data blocks have become free and were removed from the parity block). If the data blocks that will be modified are invalid, they can be ignored and not used for parity computation, as if they were zeroed.
If the subtraction method is used, RAID controller module 436 reads, into memory 322, the data blocks that will be overwritten as well as the parity block for that stripe (at step 520). At step 530, RAID controller module 436 determines whether the parity block is valid. The parity block is valid if it was written after the RAID group was created and the parity mask stored in the checksum block is not "0" and thus includes at least one data block for that stripe. In one implementation, to determine whether the parity block was written after the RAID group was created, comparator module 436a compares the unique disk drive ID stored in the checksum block associated with the parity block with the unique disk drive ID kept in memory for the disk that stores parity. If the two match, it indicates that the parity block was written after the disk was added to the array, and thus stores valid data. At this point, the parity mask is equal to the mask read from the drive that stores parity for the stripe (step 540). This mask will be used to determine which data blocks in the stripe are included in the parity block for the stripe where new data will be written.

At step 550, data blocks from the disk drives which were read and which are included in the parity mask are XOR'ed with the parity block. To determine which disk drives are included in the parity mask for a stripe, the following algorithm can be used. RAID controller module 436 performs:
- 1. A logical AND operation on the mask of a disk drive to which data will be written and the parity mask;
- 2. If the result of the logical AND operation contains a logical bit "1", it indicates that data from that drive are included in the parity block for the stripe. For example, if the mask of drive 0 is "0001b" and the parity mask read from the drive that stores parity for the stripe is "0011b", then the AND operation on the two masks produces "0001b". Since the result of the logical operation contains a logical bit "1", a data block on drive 0 is included in the parity block for that stripe.
- 3. Otherwise, if the result of the logical AND operation does not contain a logical bit "1", it indicates that data from that drive are not included in the parity block for the stripe.

A person of ordinary skill in the art would understand that the above-described steps to determine whether a disk drive is included in the parity mask are just an exemplary way of implementing the present invention, and other techniques that produce a similar result can be used.

Once it is determined which of the disk drives being overwritten are included in the parity mask, new parity is computed by XOR'ing the following: (1) the old data blocks that are being overwritten and are included in the parity block, (2) the parity block, and (3) the new data blocks.
If at step 530 it is determined that the parity block for the stripe where data blocks will be modified is not valid (i.e., the unique drive ID stored in the configuration area for the drive that stores parity for the stripe does not match the corresponding data stored in the checksum block associated with the parity block), it indicates that the parity block was written prior to the disk drive being added to the existing array and no data blocks in that stripe are included in the parity block. If no data blocks in the stripe are included in the parity block, these data blocks are invalid and can be ignored for purposes of parity computation. At step 592, RAID controller module 436 assigns a zero value to the parity mask for the stripe.

At step 594, RAID controller module 436 computes new parity without using the old data blocks that are being overwritten, since these data blocks are invalid. In one implementation, the new parity is computed by XOR'ing the new data to compute a new parity block. Thus, the present invention eliminates the need to use invalid data blocks to compute parity, even without zeroing the invalid data blocks.

If the parity block is valid, a logical operation is performed on the new data blocks, the old data blocks that were read and are included in the parity, and the old parity within the stripe (steps 550 and 560).

Once new parity has been computed for the stripe, at step 570, RAID controller module 436 updates the parity mask for the stripe. The parity mask for a stripe is the result of a logical operation on the masks of the disk drives storing data blocks included in the parity block for that stripe. If none of the data blocks in the stripe to which new data are written were included in the parity block for that stripe, then the parity mask was equal to zero before the new write operation. In one implementation, the parity mask is updated by performing a logical OR operation on the masks of the disk drives where new data will be written. The mask of each data drive being written is logically OR'ed with the parity mask to produce a new parity mask. The new parity mask is inserted into a checksum block associated with the computed parity block.
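The logic of steps 520 through 594 can be summarized in a short sketch, using the same illustrative block and bitmask conventions as the earlier examples; this is a paraphrase, not the patent's code.

    from functools import reduce

    def xor_blocks(blocks):
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    def subtraction_write(parity_valid, old_parity, parity_mask, old_blocks, new_blocks):
        """old_blocks/new_blocks map drive position -> block for the drives being written."""
        if parity_valid:
            # Steps 550-560: XOR out only those old blocks the parity mask says are included.
            included_old = [blk for pos, blk in old_blocks.items() if (parity_mask >> pos) & 1]
            new_parity = xor_blocks([old_parity] + included_old + list(new_blocks.values()))
        else:
            # Steps 592-594: invalid parity means no block is included; ignore old data.
            parity_mask = 0
            new_parity = xor_blocks(list(new_blocks.values()))
        # Step 570: OR the masks of the drives being written into the parity mask.
        for pos in new_blocks:
            parity_mask |= 1 << pos
        return new_parity, parity_mask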
According to an embodiment of the present invention, the updated parity mask may be cached in memory 322 in buffer cache 350. On a subsequent write request, the cached parity mask can be used to determine which data blocks are valid without reading the data blocks or a parity block. This embodiment is described herein in reference to FIGS. 7A and 7B.

At step 580, RAID controller module 436 inserts, into a checksum block associated with a data block that will be written to a disk, a unique disk drive ID of that disk. Storage adapter 326 transfers the data block along with the unique disk drive ID from buffer cache 350 to the disk 250. Comparator module 436a later compares the unique disk drive ID with the unique disk drive ID which was stored in the configuration area on the disk and is kept in memory, to determine whether the data block is valid.

At step 590, RAID controller module 436 inserts, into a checksum block associated with the computed parity block, a unique disk drive ID of the drive that stores parity for the stripe where the parity block will be written, along with the new parity mask for the stripe. Storage adapter 326 transfers the parity block along with the unique disk drive ID and the parity mask from buffer cache 350 onto the parity disk 250.
2. Recalculation Write (Recalculation Method)
Assume that storage server 200 receives a request to modify a number of data blocks in a RAID array. If the number of data blocks in a stripe being overwritten plus one is greater than the number of data blocks in the stripe not being overwritten, RAID controller module 436 uses a recalculation method to perform the write operation. Briefly, according to this method, data blocks that are not being overwritten are read into memory and XOR'ed with an XOR of the new data blocks. The result of the XOR operation is written to the parity block for that stripe.
Referring now to FIG. 6, it illustrates a flow diagram of a method for using disk drives to perform a write operation according to a recalculation method without zeroing disk drives prior to using them, according to one embodiment of the present invention.
In summary, when a recalculation method is used to perform a write operation, the present invention advantageously eliminates the need to zero disk drives or otherwise to recompute the parity by determining whether any of the data blocks that will not be modified are valid. Invalid data blocks can be ignored and not used for parity computation as if they were zeroed.
At step 610, RAID controller module 436 reads into memory 322 the data blocks to which data are not to be written. At step 620, RAID controller module 436 determines whether any data blocks to which data are not to be written are valid. In one implementation, to determine whether a data block is valid, RAID controller module 436 reads the unique disk drive ID in the checksum block associated with the data block. Comparator module 436a compares the unique disk drive ID in the checksum block with the unique disk drive ID which was stored in the configuration area for that disk drive and is kept in memory. If the two match, it indicates that the data block is included in the parity block for that stripe and the data block was written after the disk drive was added to the disk array.

Optionally, RAID controller module 436 may check whether the parity block includes a data block(s) to which new data are not to be written. To this end, RAID controller module 436 may perform the following steps:
- Read the parity block along with the associated checksum block from the drive storing parity for the stripe, in addition to reading the data blocks and associated checksum blocks from the data drives which are not being written;
- Perform a logical AND operation on the mask of each disk drive to which data will not be written and the parity mask read from the drive that stores parity for the stripe;
- If the result of the logical AND operation contains a logical bit "1", it indicates that, for the stripe where new data will be written, the data block from that drive is included in the parity block.

At step 640, parity is updated for the stripe where data blocks are modified. To this end, RAID controller module 436 performs a logical XOR operation on the new data blocks and the data blocks read into memory that are valid and thus were included in the parity. A person of ordinary skill in the art would understand that other techniques that produce a similar result can be used to update the parity.
If at step 620, in the stripe where data blocks are modified, all data blocks are invalid, it indicates that these data blocks were written prior to creating a RAID group or prior to adding the disks to the RAID group, or these data blocks have become free (unallocated) and were removed from the parity block in the stripe. Therefore, these data blocks can be ignored for purposes of parity computation. At step 680, RAID controller module 436 assigns a zero value to the parity mask for the stripe. RAID controller module 436 computes new parity without using the invalid data blocks that will be modified (step 690). In one implementation, RAID controller module 436 computes parity by XOR'ing the new data.

At step 650, RAID controller module 436 updates the current mask of the parity mask. The current mask is the result of the logical operation performed on the masks of the disk drives that store data blocks included in the parity. If none of the data blocks are included in the parity block for the stripe where new data are written, the parity mask is assigned a zero value. In one implementation, to update the current mask, the current mask of the parity mask for the stripe is logically OR'ed with the masks of the disk drives to which new data will be written. The result may be first written onto a checksum block associated with the parity block in buffer cache 350.
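A corresponding sketch of the recalculation path (steps 610 through 690), under the same illustrative conventions as the earlier examples, is shown below; the is_valid callback stands in for the drive-ID comparison performed by comparator module 436a, and none of the names are the patent's own.

    from functools import reduce

    def xor_blocks(blocks):
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    def recalculation_write(unmodified_blocks, new_blocks, is_valid):
        """unmodified_blocks/new_blocks map drive position -> block."""
        # Steps 610-620: read the blocks not being written and keep only the valid ones.
        valid_unmodified = {pos: blk for pos, blk in unmodified_blocks.items()
                            if is_valid(pos, blk)}
        # Steps 640/690: invalid blocks are ignored as if they were zeroed.
        new_parity = xor_blocks(list(valid_unmodified.values()) + list(new_blocks.values()))
        # Steps 650/680: the new parity mask covers the valid retained blocks and the new blocks.
        parity_mask = 0
        for pos in list(valid_unmodified) + list(new_blocks):
            parity_mask |= 1 << pos
        return new_parity, parity_mask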
At step 660, RAID controller module 436 inserts, into a checksum block associated with a data block that will be written to a disk, a unique disk drive ID for that disk. Storage adapter 326 transfers the data block along with the unique disk drive ID from buffer cache 350 to the disk 250. The unique disk drive ID will later be compared by comparator module 436a with the unique disk drive ID stored in the configuration area for that disk and kept in memory.
At step 670, RAID controller module 436 inserts a unique disk drive ID of the drive that stores parity for the stripe into a checksum block associated with the computed parity block that will be written to that drive. Storage adapter 326 transfers the parity block along with the unique disk drive ID and the parity mask from buffer cache 350 to the drive 250 that stores parity for the stripe.
Thus, by providing an indication that a data block is invalid, the present invention eliminates the need to use invalid data blocks to compute parity and the need to zero those data blocks. The inventive technique for using disk drives in RAID arrays without zeroing the drives prior to use can be used to optimize write operations, to perform faster data reconstruction, to recover from media errors, and to perform faster disk copies, as described in greater detail herein.
Optimizing Write Operations Using Cached Parity Mask
The inventive technique of the present invention can be used to reduce or eliminate read operations when performing a write operation. For example, if the parity mask for a stripe is cached in memory, it can be used to decide whether data blocks in a stripe where new data are written are valid even without reading those data blocks.
Referring now to FIG. 7A, it illustrates a flow diagram of a method for optimizing write operations using inventive techniques of the present invention according to one embodiment of the present invention.
Initially, at step 710, a determination is made whether the parity mask for the stripe is cached in memory 322. If the mask is not cached in memory 322, a determination is made whether the number of read operations that need to be performed using a subtraction method exceeds the number of read operations that need to be performed using a recalculation method (step 715). If so, the write operation is performed using a recalculation method, as described in FIG. 6. Alternatively, if the number of read operations needed for a recalculation method is equal to or exceeds the number needed for the subtraction method, the write operation is performed using a subtraction method, as described in FIG. 5.
If the parity mask is cached in memory 322, the parity mask can be used to determine which data blocks are valid, e.g., included in a parity block in the stripe where new data will be written. This information is used to eliminate the need to read invalid data blocks to perform a write operation.
At step 720, a determination is made whether the number of read operations that would have to be performed using a recalculation method is less than the number that would have to be performed using a subtraction method. To this end, the cached parity mask for the stripe is used to determine which drives' data blocks are included in the parity for the stripe. In one implementation, the following steps can be performed by RAID controller module 436:
- Perform a logical AND operation on the mask of each disk drive and the parity mask for the stripe;
- If the result of the logical AND operation on the mask of a drive that will be written and the parity mask for the stripe does not contain a logical bit "1", it indicates that data from that drive are not included in the parity block and thus would not have to be read if a subtraction method is used;
- If the result of the logical AND operation on the mask of a disk drive that will not be written and the parity mask for the stripe does not contain a logical bit "1", it indicates that data from that drive are not included in the parity and thus will not have to be read if a recalculation method is used.
RAID controller module 436 uses this information to determine the number of read operations that would need to be performed by a subtraction method or a recalculation method to complete the write operation (a sketch of this read-count comparison follows). If, at step 720, the number of read operations for a recalculation method is greater than or equal to the number of read operations for a subtraction method, the write operation is performed using a subtraction method (as described in reference to FIG. 7B). Alternatively, the write operation is performed using a recalculation method.
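The read-count comparison can be sketched as follows; counting one read for the old parity block on the subtraction path is an assumption consistent with FIG. 7B, and the function names are illustrative:

```python
def count_reads(parity_mask: int,
                written_masks: list[int],
                unwritten_masks: list[int]) -> tuple[int, int]:
    """Return (subtraction_reads, recalculation_reads) for one stripe. Subtraction
    reads the old parity block plus every valid block being overwritten;
    recalculation reads every valid block that is not being overwritten."""
    subtraction = 1 + sum(1 for m in written_masks if parity_mask & m)
    recalculation = sum(1 for m in unwritten_masks if parity_mask & m)
    return subtraction, recalculation

def choose_method(parity_mask: int, written: list[int], unwritten: list[int]) -> str:
    sub, recalc = count_reads(parity_mask, written, unwritten)
    # Per the decision described above: use subtraction when recalculation would
    # need at least as many reads.
    return "subtraction" if recalc >= sub else "recalculation"
```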
To this end, at step 730, data blocks that meet the following two requirements are read into memory 322: (1) data blocks that are included in the parity block for the stripe where new data are written; and (2) data blocks to which new data are not to be written. Data blocks that are not included in the parity block are not read, since they are invalid and can be ignored for parity computation.
At step 740, the data blocks that were read are logically XOR'ed. At step 750, the new data blocks are logically XOR'ed in memory. To compute the new parity, the result of this XOR operation is XOR'ed with the result of the logical operation at step 740.
At step 760, the parity mask for the stripe is updated. In one implementation, the mask is updated by being OR'ed with the masks of the drives where new data are written. The mask is written into a checksum block associated with the computed parity block.
At step 770, RAID controller module 436 inserts, into a checksum block associated with the data block stored in buffer cache 350 that will be written to a disk, a unique disk drive ID for that disk. Storage adapter 326 transfers the data block along with the unique disk drive ID from buffer cache 350 onto the disk 250. The unique disk drive ID will later be compared by comparator module 436a with the unique disk drive ID that is stored in the configuration area on the disk and kept in memory.
At step 780, RAID controller module 436 inserts, into a checksum block associated with the computed parity block, a unique disk drive ID for the drive that stores parity for the stripe where the parity block will be written. Storage adapter 326 transfers the parity block along with the unique disk drive ID and the parity mask from buffer cache 350 onto the parity disk 250.
Thus, since invalid data blocks are not read to compute parity, the optimization technique of the present invention may eliminate or reduce the number of read operations needed to complete a write operation.
Referring now to FIG. 7B, it illustrates a flow diagram of a method for optimizing write operations using inventive techniques of the present invention according to another embodiment of the present invention. According to this embodiment, the write operation is performed using a subtraction method, since the number of read operations to be performed using the recalculation method is greater than the number to be performed using the subtraction method (step 782).
As described herein, when the parity mask for a stripe is cached in memory 322, it can be used to determine which data blocks are valid in the stripe where new data are written.
At step 784, RAID controller module 436 reads into memory 322 data blocks that meet the following two criteria: (1) they are being overwritten; and (2) they are included in the parity block in the stripe where new data are written. Data blocks that are not included in the parity block do not need to be read and can be ignored for parity computation.
At step 786, the parity block for the stripe where new data will be written is read into memory. The data blocks that were read into memory are then logically XOR'ed at step 788. At step 790, the new data blocks are XOR'ed in memory. To compute the new parity, the result of the XOR operation in step 790 is XOR'ed with the result of the operation in step 788.
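The subtraction-style computation of steps 786 through 790 can be sketched as follows (helper names are assumptions):

```python
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def subtraction_parity(old_parity: bytes,
                       valid_blocks_being_overwritten: list[bytes],
                       new_blocks: list[bytes]) -> bytes:
    """XOR the old parity with the valid blocks being overwritten (removing their
    contribution), then XOR in the new blocks that replace them."""
    return reduce(xor_blocks, valid_blocks_being_overwritten + new_blocks, old_parity)
```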
At step 792, the parity mask for the stripe is updated. In one implementation, the parity mask is OR'ed with the masks of the drives where new data will be written in the stripe. The mask is written into a checksum block associated with the computed parity block.
At step 794, RAID controller module 436 inserts, into a checksum block associated with the data block that will be written to a disk drive, a unique disk drive ID for that disk. Storage adapter 326 transfers the data block along with the unique disk drive ID from buffer cache 350 to the disk 250. The unique disk drive ID will later be compared by comparator module 436a with the unique disk drive ID that is stored in the configuration area on the disk and kept in memory.
At step 796, RAID controller module 436 inserts, into a checksum block associated with the computed parity block, a unique disk drive ID for the drive that stores parity for the stripe where the parity block will be written. Storage adapter 326 transfers the parity block along with the unique disk drive ID and the parity mask for the stripe from buffer cache 350 onto the parity disk 250.
Thus, this embodiment eliminates the need to read invalid data blocks to compute parity. This, in turn, reduces the number of read operations to perform a write operation.
Optimizing Write Operations Using Modified XPWrite Command
According to yet another optimization technique, a write operation can be performed without read operations at the storage server if none of the data blocks being overwritten are valid (i.e., their unique disk drive IDs do not match the unique disk drive IDs stored in the configuration information for the respective disk drives) but some other data blocks in the stripe that are not being overwritten are valid. To implement this embodiment, RAID controller module 436 issues a command to the disk drive that stores parity. In one embodiment, the command is a modified XPWrite command. The XPWrite command is provided by the SCSI standard, described in SCSI Block Commands, second generation, published draft of Sep. 22, 2006, from the T10 Technical Committee (see www.t10.org). The XPWrite command includes a logical block address (LBA) of the block where data are to be written as well as the number of blocks for writing the data. As will be described below, the XPWrite command is modified so that data stored in the checksum block associated with the new data block is not XOR'ed with the parity; rather, it is written directly to the drive that stores parity. Although the present invention is described in the context of the modified XPWrite command, a person of ordinary skill in the art would understand that any command that produces the same result can be used.
Referring now to FIG. 9, it illustrates a flow diagram of a method for optimizing write operations using a modified XPWrite command according to one embodiment of the present invention.
Initially, at step 910, it is determined that the parity mask for the stripe is cached in memory 322. The parity mask is used to determine which data blocks are valid (e.g., included in the parity block for the stripe where new data are to be written). At step 920, a determination is made whether the data blocks that will be modified are invalid while some other data blocks in the stripe are valid. In one embodiment, the determination is made by performing a logical AND operation on the mask of each data drive and the parity mask for the stripe. If the result of the logical operation does not contain a logical "1", it indicates that the data block is not included in the parity and thus does not need to be read for purposes of parity computation. If any of the data blocks being overwritten in the stripe are valid, or all of the data blocks in the stripe not being overwritten are invalid, the write operation is performed according to the steps recited in FIG. 7A.
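A minimal sketch of the applicability test of step 920, assuming one mask bit per data drive:

```python
def xpwrite_path_applies(parity_mask: int,
                         written_masks: list[int],
                         unwritten_masks: list[int]) -> bool:
    """The modified XPWrite path is used only when no block being overwritten is
    included in parity, but at least one block not being overwritten is."""
    none_overwritten_valid = all((parity_mask & m) == 0 for m in written_masks)
    some_other_valid = any((parity_mask & m) != 0 for m in unwritten_masks)
    return none_overwritten_valid and some_other_valid
```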
At step 930, the new data blocks are XOR'ed in memory 322. RAID controller module 436 inserts a unique disk drive ID of the drive that stores parity for the stripe into the checksum block associated with the parity block. Module 436 also inserts the parity mask for the stripe into that checksum block; the parity mask is updated to include the masks of the drives being written. RAID controller module 436 then issues a modified XPWrite command to the disk that stores parity (step 940). RAID controller module 436 writes, to the data drives, the new data blocks along with the checksum block, which includes the unique drive ID.
Referring now to FIG. 11, it illustrates components residing on each disk 250 in a RAID group, such as RAID group 202 shown in FIG. 2, to execute a modified XPWrite command issued by RAID controller module 436. Each disk 250 in a RAID group 202 includes a controller 1100, which implements a command executing module 1130. In response to receiving a modified XPWrite command, command executing module 1130 is configured to read data from the data block indicated in the command and to request new data from storage system 200. Module 1130 also receives the XOR of the new data blocks and computes new parity by XOR'ing it with the data read into memory. Steps performed by module 1130 will be described in more detail in reference to FIG. 10. In a preferred embodiment, command executing module 1130 is implemented as firmware embedded in a hardware device, such as controller 1100. In another embodiment, command executing module 1130 can be implemented as software stored in the disk memory 1120. In yet another embodiment, command executing module 1130 may reside in a controller between storage server 200 and the disk drive; the command executing module 1130 does not need to reside in the disk drive.
Referring to FIG. 10, it illustrates the steps performed by command executing module 1130 at the parity disk to execute a modified XPWrite command issued by RAID controller module 436 at the storage server 200.
At step 1010, command executing module 1130 receives the command issued by RAID controller module 436. The command includes the block number at which data will be written as well as the number of blocks that the data occupies. In addition, the command includes an Operation Code (opcode) indicating the type of the command.
Command executing module 1130 reads (step 1020), into disk memory 1120, the data block(s) indicated in the command. Command executing module 1130 then requests data from storage driver module 435 (step 1030). In one implementation, storage driver module 435 sends, to command executing module 1130, the XOR of the data blocks that will be written to the disk drives and the checksum block to be written to disk. Command executing module 1130 receives the result of the XOR operation and the checksum block (step 1040). Then, command executing module 1130 computes new parity by XOR'ing the result of the XOR operation of the previous step with the data read into disk memory 1120 (step 1050). At step 1060, command executing module 1130 writes the computed parity to the parity disk. At step 1070, module 1130 writes the checksum block to the parity disk. According to an embodiment of the present invention, the checksum block is written directly to the drive that stores parity for the stripe without being XOR'ed with the parity.
According to this embodiment, as a result of the execution of the modified XPWrite command by the parity disk, the need for RAID controller module 436 to read the old parity block and XOR it with the new data is eliminated. Instead, command executing module 1130 at the disk 250 reads the old parity block, XOR's it with the XOR of the new data, and writes the result of the operation to the drive that stores parity for the stripe.
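A hedged sketch of the parity-disk behavior described in FIG. 10 follows; the callables stand in for drive firmware services and are assumptions, not a real disk or SCSI API:

```python
from typing import Callable, Tuple

def execute_modified_xpwrite(lba: int,
                             nblocks: int,
                             read_parity: Callable[[int, int], bytes],
                             fetch_host_data: Callable[[], Tuple[bytes, bytes]],
                             write_parity: Callable[[int, bytes], None],
                             write_checksum: Callable[[int, bytes], None]) -> None:
    old_parity = read_parity(lba, nblocks)                # step 1020: read the old parity block(s)
    xor_of_new_data, checksum_block = fetch_host_data()   # steps 1030-1040: request data from the host
    new_parity = bytes(a ^ b for a, b in zip(old_parity, xor_of_new_data))  # step 1050
    write_parity(lba, new_parity)                         # step 1060: write the computed parity
    write_checksum(lba, checksum_block)                   # step 1070: written as-is, never XOR'ed
```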
Optimizing Write Operations by Removing Free Data Blocks from Parity
According to another embodiment of the present invention, write operations can be optimized if unallocated (or free) data blocks are removed from the parity block for their stripe. According to an embodiment of the present invention, file system 430 includes a write allocation module 442 (shown in FIG. 4) that performs write allocation of blocks in a volume in response to an event in the file system (e.g., dirtying of the blocks in a file). To allocate data blocks, the write allocation module 442 uses the block allocation data structures (one such structure 440 is shown in FIG. 4) to select free blocks (i.e., data blocks that are not allocated) to which to write the dirty blocks (i.e., data blocks that store data). In one embodiment, the block allocation data structure 440 can be implemented as a bitmap, in which a logical "1" indicates that a data block is allocated and a logical "0" indicates that a data block is not allocated, or vice versa.
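As an illustration of the bitmap convention (the opposite polarity is equally possible, as noted above):

```python
def is_allocated(bitmap: int, block_index: int) -> bool:
    """Assumes a 1 bit marks an allocated block."""
    return (bitmap >> block_index) & 1 == 1

# Example: blocks 0 and 3 allocated, blocks 1 and 2 free.
assert is_allocated(0b1001, 0) and not is_allocated(0b1001, 2)
```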
When performing write allocation, file system 430 traverses a small portion of each disk (corresponding to a few blocks in depth within each disk) to "lay down" a plurality of stripes per RAID group. When performing write allocation within the volume, the write allocation module 442 typically works down a RAID group, allocating all free blocks within the stripes it passes over. This is efficient from a RAID system point of view in that more blocks are written per stripe.
According to one technique, which is described in commonly-owned U.S. Pat. No. 7,822,921, issued on Oct. 26, 2010, entitled "SYSTEM AND METHOD FOR OPTIMIZING WRITE OPERATIONS IN STORAGE SYSTEMS," by James Taylor, the disclosure of which is incorporated by reference herein, a background process is executed to zero unallocated data blocks on the disks. A data structure is kept in memory to indicate which data blocks are zeroed. When zeroed data blocks are modified, the need to read those data blocks for parity computation is eliminated. As a result, the number of read operations can be reduced.
As previously described, zeroing data blocks can be a time-consuming process. According to an embodiment of the present invention, since data from unallocated data blocks are removed from the parity in the corresponding stripe, new parity can be computed without using the free data blocks and without zeroing those data blocks.
FIG. 8 illustrates the steps to implement this embodiment of the present invention.
As described herein, file system 430 maintains block allocation data structure 440, in which the file system 430 keeps track of allocated and unallocated data blocks. RAID controller module 436 periodically receives an indication from file system 430 of which data blocks have become unallocated or free (step 810).
At step 820, to remove unallocated data blocks from a parity block, RAID controller module 436 XOR's the unallocated data blocks with the parity block for that stripe. In one embodiment, the process of removing unallocated data blocks from a parity block is performed during idle time. As used herein, "idle time" refers to a period of time during which requests from clients 210 are expected to be low (such as, for example, during weekends, holidays, and off-peak hours).
In another embodiment, the file system 430 may remove unallocated data blocks from a parity block at the same time that data are written to other data blocks in the same stripe. RAID controller module 436 then updates the parity block by XOR'ing all data blocks in the stripe except for the unallocated data blocks.
At step 830, RAID controller module 436 clears the unique disk ID in the data block(s) being removed from parity. At step 840, the parity mask for the stripe is updated. In one implementation, to this end, a logical AND operation is performed on the current parity mask and the complement of the mask of the drive that stores the free data block. A complement is a logical function that converts each logical "1" bit to a logical "0" bit and each logical "0" bit to a logical "1" bit. For example, if the mask stored in the checksum block on the drive that stores parity for a stripe were "0111b" (indicating that the data blocks from drive 0, drive 1, and drive 2 are contained in the parity), and the data for drive 1 were removed from parity, then the mask "0111b" would be AND'ed with the complement of the mask for drive 1. The mask for drive 1 is "0010b", and its complement is "1101b". Thus, the result of the logical AND operation on "0111b" and "1101b" is "0101b". This removes the mask of the drive that stores the free data block from the parity mask.
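The same worked example, expressed as a short calculation:

```python
PARITY_MASK = 0b0111   # blocks from drive 0, drive 1, and drive 2 are in parity
DRIVE1_MASK = 0b0010   # mask of the drive whose block became free

new_mask = PARITY_MASK & (~DRIVE1_MASK & 0b1111)   # AND with the 4-bit complement "1101b"
assert new_mask == 0b0101                          # drive 1 removed from the parity mask
```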
This embodiment provides a number of advantages over the technique described in the referenced patent. First, this embodiment eliminates the need to zero unallocated data blocks. Instead, only the checksum block (which holds up to 64 bytes of data) associated with the unallocated data block(s) needs to be zeroed to clear the disk drive ID. Thus, if the size of a data block is 4 KB and a checksum block can store up to 64 bytes of data, the total number of bytes is 4160. With disk drives having 520 bytes per sector, 4160 bytes occupy eight sectors. Writing zeroes to the checksum block only is the equivalent of zeroing one sector rather than eight sectors, which results in the savings of not writing data to the remaining seven sectors.
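The sector arithmetic above, as a short check:

```python
DATA_BLOCK_BYTES = 4096   # 4 KB data block
CHECKSUM_BYTES = 64       # checksum block capacity
SECTOR_BYTES = 520        # bytes per sector on these drives

total_bytes = DATA_BLOCK_BYTES + CHECKSUM_BYTES   # 4160 bytes
sectors = -(-total_bytes // SECTOR_BYTES)         # ceiling division: 8 sectors
assert total_bytes == 4160 and sectors == 8
# Zeroing only the checksum block touches one sector instead of eight.
```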
Furthermore, this embodiment, unlike the embodiments described in the referenced patent, eliminates the need to maintain an additional data structure to keep track of zeroed data blocks. A similar effect is achieved without taking up additional resources by storing a unique disk drive ID on each disk drive and a mask and a unique disk drive ID on the disk that stores parity.
Error Recovery
A media error on a disk occurs when data cannot be read from a particular block or a number of blocks on that disk. Referring now to Table 1, it illustrates an array of disks storing data blocks. In this example, one stripe (Stripe 0) is shown for purposes of explanation only. A person of ordinary skill in the art would understand that the disk array shown in Table 1 may include more than one stripe. Disk 1 through Disk 4 store data. Disk 0 stores row parity. Thus, the exemplary disk array is implemented using a RAID-4 parity scheme.
TABLE 1: Disk Array (Stripe 0)
Disk 0 (Row Parity) | Disk 1 | Disk 2 | Disk 3 | Disk 4 |
---|---|---|---|---|
9 | 0 | 2 | 3 | 4 |
Assume that Disk 1 failed (i.e., the disk experienced an operational problem that renders it unable to respond to read-write operations) and that Disk 2 stores a data block that has a media error. In general, to recover data from the data block on Disk 2 that has a media error, RAID controller module 436 would have to perform the following steps:
- Read the data stored on Disk 1, Disk 3, and Disk 4 for Stripe 0;
- Read the parity from Disk 0 for Stripe 0;
- XOR the data stored on Disk 1, Disk 3, and Disk 4 for Stripe 0;
- XOR the result of the previous step with the parity for Stripe 0.
However, since Disk 1 failed and the data stored on Disk 1 cannot be provided, the data block that has a media error on Disk 2 cannot be recovered.
According to the technique described in the above-referenced commonly-owned U.S. Pat. No. 7,822,921, if Disk 1 were zeroed, the need to read Disk 1 to recover data from Disk 2 would be eliminated.
The present invention provides the same benefit of not reading data blocks from the failed disk drive but, unlike the referenced patent, does not require the data block on Disk 1 for that stripe to be zeroed. This is accomplished by determining whether the data block on the failed drive in the stripe where a media error was detected is valid. In one implementation, RAID controller module 436 performs a logical AND operation on the parity mask for the stripe and the mask of the failed drive. If the result of the logical operation does not contain a logical bit "1", it indicates that the data from the failed drive are not included in the parity block for that stripe: they were written prior to adding the drive to the RAID group or before the RAID group was formed, or were removed from parity after the data block became free. As such, the data from the failed drive can be ignored and need not be read to recover data from another data block in the stripe. This allows storage server 200 to recover data from the data block that has a media error in an array that has a failed drive, without zeroing the failed drive, in RAID-4 or RAID-5. Furthermore, this embodiment allows storage server 200 to recover data in a dual-parity RAID array with two failed drives and a third drive failure or a media error on a third disk.
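A hedged sketch of the recovery decision just described; the names and data structures are illustrative only:

```python
def recover_media_error_block(parity: bytes,
                              readable_blocks: dict[str, bytes],
                              drive_masks: dict[str, int],
                              parity_mask: int,
                              failed_drive: str) -> bytes | None:
    """readable_blocks holds the stripe's data blocks from the surviving drives other
    than the block being recovered. Returns the reconstructed block, or None when the
    failed drive's block is still included in parity and single-parity recovery is
    therefore impossible."""
    if parity_mask & drive_masks[failed_drive]:
        return None                            # the failed drive's data would be needed
    block = parity
    for drive, data in readable_blocks.items():
        if parity_mask & drive_masks[drive]:   # only blocks included in parity contribute
            block = bytes(a ^ b for a, b in zip(block, data))
    return block
```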
Data Reconstruction
Occasionally, disks experience an operational problem that either degrades a disk's read-write performance or causes a disk failure. Conventionally, spare disks are used to replace failed disks. Once the storage system replaces the failed disk with the spare disk, it begins a process of reconstructing the data from the failed disk. The reconstruction process can be performed intermittently with normal I/O operations. The process of data reconstruction involves reading data from all disks other than the failed disk, XOR'ing the data being read, and writing the result of the XOR operation to a spare disk, which becomes a reconstructing disk once it replaces the failed disk. Thus, to reconstruct data in a stripe, multiple read operations and at least one write operation have to be performed. Performing multiple read operations during the reconstruction process results in rotational latency, thereby increasing the overall time required to complete the reconstruction. In addition, performance of the storage system suffers during reconstruction since regular I/O operations are performed at the same time. Furthermore, if a second disk fails or has a media error during the reconstruction, it is impossible to reconstruct the data in RAID-4 or RAID-5. In a dual-parity RAID array with two failed drives and a third drive failure or a media error on a third disk, some or all data would not be recoverable.
According to an embodiment of the present invention, the number of I/O operations needed to reconstruct data from the failed drive can be reduced by not reading or reconstructing invalid data blocks, e.g., data blocks that were written prior to creating a RAID group or prior to adding the respective drives to existing RAID groups, or blocks that were removed from the parity after they became free. This can be done without zeroing these data blocks.
According to an embodiment of the present invention, instead of reading data blocks from all disks in a stripe, RAID controller module 436 determines whether a data block in the stripe is included in the parity block for that stripe. To this end, in one implementation, RAID controller module 436 performs a logical AND operation on the parity mask for the stripe and the mask of the disk drive that needs to be read for purposes of reconstructing data on the failed drive. If the result of the logical operation does not contain a logical bit "1", it indicates that data from that disk drive are not included in the parity block for that stripe: the data were written prior to adding the disk to the RAID array or prior to forming the array, or were removed from the parity for the stripe when the data block became free. As such, this data block does not need to be read and can be ignored for purposes of reconstructing data on the failed drive. Additionally, if the data block to be reconstructed is itself invalid, it does not need to be reconstructed and no reads or writes are required for that stripe.
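A minimal sketch of the per-stripe decision, with illustrative names:

```python
def drives_to_read_for_reconstruct(parity_mask: int,
                                   surviving_drive_masks: dict[str, int],
                                   failed_drive_mask: int) -> list[str]:
    """Which surviving drives must be read to rebuild the failed drive's block in a
    stripe. If the failed drive's block was never in parity, nothing needs to be
    read or written for that stripe."""
    if (parity_mask & failed_drive_mask) == 0:
        return []                              # invalid block: skip the stripe entirely
    return [d for d, m in surviving_drive_masks.items() if parity_mask & m]
```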
Copying Data from a Sick Disk
Occasionally, when a disk drive in an array reports multiple errors, it is common to copy data from that disk drive (also referred to herein as a "sick disk") onto a spare disk before the disk drive is failed. Conventionally, copying data from the sick disk onto a spare disk requires multiple read and write operations. Performance of the storage system suffers during the time the data are copied since regular I/O operations are performed at the same time.
According to an embodiment of the present invention, the number of read and write operations is reduced by determining which data blocks were written prior to creating the RAID group or prior to being added to the RAID group, or have become free and were removed from the parity for the stripe, and thus do not need to be copied. According to the present invention, data blocks that are not included in the parity block for the same stripe were written prior to adding their respective drives to the RAID array or have become free. To determine whether a data block is included in the parity, in one implementation, RAID controller module 436 performs a logical AND operation on the parity mask for the stripe and the mask of the disk drive that includes the data block that is about to be copied onto another disk. If the result of the logical operation does not contain a logical bit "1", it indicates that the data block is not included in the parity block for that stripe. Such data blocks are not copied. Hence, this reduces the number of I/O operations that would otherwise have to be performed to copy data blocks from a sick disk. It also reduces the reliance on the sick disk and allows the valid data to be copied onto a spare disk faster.
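A minimal sketch of the copy decision, with illustrative names:

```python
def stripes_to_copy(sick_drive_mask: int, parity_mask_per_stripe: list[int]) -> list[int]:
    """Indices of stripes whose block on the sick disk is included in parity and must
    therefore be copied to the spare; all other blocks on the sick disk are skipped."""
    return [i for i, pm in enumerate(parity_mask_per_stripe) if pm & sick_drive_mask]
```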
Thus, embodiments of the present invention provide a method, system, and computer program product for providing an indication of whether a disk stores data that were written prior to the disk being added to an existing array of disks or that have become free and were removed from the parity for a stripe. Knowing that a data block was written prior to the disk being added to the array, or has become free and was removed from parity, allows a storage server to ignore the invalid data and not use it for purposes of parity computation. This, in turn, eliminates the need to zero disks or to perform parity re-computation prior to using the disks to ensure that the parity is consistent with the data in the corresponding stripe in the array.
Although the present invention has been described, for purposes of explanation, with reference to specific exemplary embodiments, it will be understood that the invention is not limited to the embodiments described. A person of ordinary skill in the art would understand that the present invention can be practiced with modifications and alterations to those embodiments or can be practiced in other embodiments within the spirit and scope of the appended claims.
Although the present invention is described in terms of using RAID-4, it can be adapted by those of skill in the art to be used in other implementations, such as RAID-5, RAID-DP, etc. Furthermore, although the present invention is described using the file system, the RAID controller, and the disk driver system, those of skill in the art would understand that the invention can be practiced with other modules.
Moreover, non-dependent acts may be performed in parallel. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
Furthermore, the use of the phrase “one embodiment” throughout does not necessarily mean the same embodiment. Although these particular embodiments of the invention have been described, the invention should not be limited to these particular embodiments. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than a restrictive sense.
Unless specifically stated otherwise, it is to be appreciated that throughout the discussion, terms such as "processing," "computing," "calculating," "determining," "displaying," or the like refer to the actions and processes of a computer system or similar electronic computing device that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system.
The present invention can be implemented by an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a machine, such as a general-purpose computer selectively activated or reconfigured by a computer program (such as a collection of instructions for execution by a machine or processor, for example) stored in the computer. Such a computer program may be stored in a computer-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, magneto-optical disks, read-only memories, random access memories, EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus. Each of these media may be coupled to a computer system bus through use of an appropriate device for reading and/or writing the media.
Claims (15)
1. A method performed by a computer connected to a storage array having a plurality of storage devices, comprising:
determining by a processor of the computer whether a block in a stripe is to be included in a parity block for the stripe;
in response to determining that the block in the stripe is to be included in the parity block, determining whether the block is valid;
in response to determining that the block is invalid, ignoring the block to update the parity block for the stripe; and
in response to determining that the block is valid, updating the parity block using the block in the stripe.
2. The method of
claim 1, further comprising assigning a first mask to each storage device of the plurality of storage devices in the storage array.
3. The method of
claim 2, further comprising updating a second mask of a particular storage device to which new data are written that stores parity for the stripe in the array.
4. The method of
claim 1, wherein each storage device of the plurality of storage devices comprises a disk drive.
5. The method of
claim 1, further comprising:
inserting, into a checksum block of a new data block of the stripe, a unique storage device identification (ID) of a storage device of the plurality of storage devices where the new data block will be written; and
writing the new data block and an associated checksum block to the storage device identified by the unique storage device ID.
6. The method of
claim 1, further comprising:
inserting, into a checksum block associated with the parity block, a unique storage device identification (ID) of a storage device of the plurality of storage devices where the parity block will be written; and
writing the parity block and the checksum block to the storage device where the parity block will be written.
7. The method of
claim 1, wherein determining whether the block is valid comprises:
storing a unique storage device identification (ID) on a storage device of the plurality of storage devices in the storage array storing the block;
responsive to receiving an access request that comprises a write operation and associated data, reading a data block in the stripe that is not to be modified and an associated checksum block; and
comparing the unique storage device ID to the data stored in the checksum block.
8. A computer configured to perform input/output (I/O) operations for a storage array having a plurality of storage devices, comprising:
a processor configured to execute one or more processes, the one or more processes, when executed, configured to:
determine whether a block in a stripe is to be included in a parity block for the stripe;
determine, in response to determining that the block in the stripe is to be included in the parity block, whether the block is valid;
ignore, in response to determining that the block is invalid, the block to update the parity block for the stripe; and
update, in response to determining that the block is valid, the parity block using the block in the stripe.
9. The method of
claim 7, wherein the unique storage device ID is read from a memory of the computer.
10. The computer of
claim 8, further comprising a storage module coupled to the processor, the storage module configured to insert, in a checksum block associated with a data block that stores data in an access request, a unique identification (ID) of a storage device of the plurality of storage devices in the storage array where the data in the access request will be written.
11. The computer of
claim 8, further comprising a storage module coupled to the processor, the storage module configured to insert, in a checksum block associated with the parity block, a unique identification (ID) of a storage device of the plurality of storage devices where the parity block will be written.
12. The computer of
claim 8, further comprising a memory coupled to the processor for storing a mask of each storage device of the plurality of storage devices in the storage array.
13. The computer of
claim 8, wherein each storage device of the plurality of storage devices comprises a disk drive.
14. The computer of
claim 8, further comprising:
a storage module coupled to the processor and configured to store a unique storage device identification (ID) on a storage device of the plurality of storage devices in the storage array; and
responsive to receiving an access request that comprises a write operation and associated data, the storage module further configured to read a data block in the stripe that is not to be modified and an associated checksum block, and further configured to compare the unique storage device ID to the data stored in the checksum block.
15. A computer-readable storage medium stored with program instructions for execution by a processor, the computer-readable storage medium comprising:
program instructions that determine whether a block in a stripe is to be included in a parity block for the stripe;
program instructions that determine, in response to determining that the block in the stripe is to be included in the parity block, whether the block is valid;
program instructions that ignore, in response to determining that the block is invalid, the block to update the parity block for the stripe; and
program instructions that update, in response to determining that the block is valid, the parity block using the block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/734,314 US8209587B1 (en) | 2007-04-12 | 2007-04-12 | System and method for eliminating zeroing of disk drives in RAID arrays |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/734,314 US8209587B1 (en) | 2007-04-12 | 2007-04-12 | System and method for eliminating zeroing of disk drives in RAID arrays |
Publications (1)
Publication Number | Publication Date |
---|---|
US8209587B1 true US8209587B1 (en) | 2012-06-26 |
Family
ID=46273002
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/734,314 Active 2031-04-12 US8209587B1 (en) | 2007-04-12 | 2007-04-12 | System and method for eliminating zeroing of disk drives in RAID arrays |
Country Status (1)
Country | Link |
---|---|
US (1) | US8209587B1 (en) |
Cited By (30)
- * Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120290905A1 (en) * | 2011-05-13 | 2012-11-15 | Lsi Corporation | System and method for optimizing read-modify-write operations in a raid 6 volume |
US20130013857A1 (en) * | 2011-07-05 | 2013-01-10 | Dell Products, Lp | System and Method for Providing a RAID Plus Copy Model for a Storage Network |
US20130246707A1 (en) * | 2010-10-21 | 2013-09-19 | Oracle International Corporation | Two stage checksummed raid storage model |
CN103488434A (en) * | 2013-09-23 | 2014-01-01 | 浪潮电子信息产业股份有限公司 | Method for improving disk array reliability |
US20140181455A1 (en) * | 2012-12-20 | 2014-06-26 | Apple Inc. | Category based space allocation for multiple storage devices |
US20140337578A1 (en) * | 2011-03-01 | 2014-11-13 | Lsi Corporation | Redundant array of inexpensive disks (raid) system configured to reduce rebuild time and to prevent data sprawl |
US20150193168A1 (en) * | 2014-01-07 | 2015-07-09 | Netapp, Inc. | Clustered raid assimilation management |
US9304912B1 (en) * | 2012-01-06 | 2016-04-05 | Marvell International Ltd. | Systems and methods for building redundancy data in a RAID system |
EP2933733A4 (en) * | 2013-12-31 | 2016-05-11 | Huawei Tech Co Ltd | Data processing method and device in distributed file storage system |
US20170147431A1 (en) * | 2015-11-20 | 2017-05-25 | Qualcomm Incorporated | Protecting an ecc location when transmitting correction data across a memory link |
US9671960B2 (en) | 2014-09-12 | 2017-06-06 | Netapp, Inc. | Rate matching technique for balancing segment cleaning and I/O workload |
US9710317B2 (en) | 2015-03-30 | 2017-07-18 | Netapp, Inc. | Methods to identify, handle and recover from suspect SSDS in a clustered flash array |
US9720601B2 (en) | 2015-02-11 | 2017-08-01 | Netapp, Inc. | Load balancing technique for a storage array |
US9740566B2 (en) | 2015-07-31 | 2017-08-22 | Netapp, Inc. | Snapshot creation workflow |
US9762460B2 (en) | 2015-03-24 | 2017-09-12 | Netapp, Inc. | Providing continuous context for operational information of a storage system |
US9798728B2 (en) | 2014-07-24 | 2017-10-24 | Netapp, Inc. | System performing data deduplication using a dense tree data structure |
US9830220B1 (en) * | 2014-09-29 | 2017-11-28 | EMC IP Holding Company LLC | Enhanced error recovery for data storage drives |
US9836229B2 (en) | 2014-11-18 | 2017-12-05 | Netapp, Inc. | N-way merge technique for updating volume metadata in a storage I/O stack |
US10133511B2 (en) | 2014-09-12 | 2018-11-20 | Netapp, Inc | Optimized segment cleaning technique |
US10235059B2 (en) | 2015-12-01 | 2019-03-19 | Netapp, Inc. | Technique for maintaining consistent I/O processing throughput in a storage system |
CN110413205A (en) * | 2018-04-28 | 2019-11-05 | 伊姆西Ip控股有限责任公司 | Method, equipment and computer readable storage medium for being written to disk array |
US10496496B2 (en) * | 2014-10-29 | 2019-12-03 | Hewlett Packard Enterprise Development Lp | Data restoration using allocation maps |
US10911328B2 (en) | 2011-12-27 | 2021-02-02 | Netapp, Inc. | Quality of service policy based load adaption |
US10929022B2 (en) | 2016-04-25 | 2021-02-23 | Netapp. Inc. | Space savings reporting for storage system supporting snapshot and clones |
US10951488B2 (en) | 2011-12-27 | 2021-03-16 | Netapp, Inc. | Rule-based performance class access management for storage cluster performance guarantees |
US10997098B2 (en) | 2016-09-20 | 2021-05-04 | Netapp, Inc. | Quality of service policy sets |
US11016848B2 (en) | 2017-11-02 | 2021-05-25 | Seagate Technology Llc | Distributed data storage system with initialization-less parity |
US11379119B2 (en) | 2010-03-05 | 2022-07-05 | Netapp, Inc. | Writing data in a distributed data storage system |
US11386120B2 (en) | 2014-02-21 | 2022-07-12 | Netapp, Inc. | Data syncing in a distributed system |
US11625193B2 (en) | 2020-07-10 | 2023-04-11 | Samsung Electronics Co., Ltd. | RAID storage device, host, and RAID system |
Citations (84)
- * Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3876978A (en) | 1973-06-04 | 1975-04-08 | Ibm | Archival data protection |
US4092732A (en) | 1977-05-31 | 1978-05-30 | International Business Machines Corporation | System for recovering data stored in failed memory unit |
US4201976A (en) | 1977-12-23 | 1980-05-06 | International Business Machines Corporation | Plural channel error correcting methods and means using adaptive reallocation of redundant channels among groups of channels |
US4205324A (en) | 1977-12-23 | 1980-05-27 | International Business Machines Corporation | Methods and means for simultaneously correcting several channels in error in a parallel multi channel data system using continuously modifiable syndromes and selective generation of internal channel pointers |
US4375100A (en) | 1979-10-24 | 1983-02-22 | Matsushita Electric Industrial Company, Limited | Method and apparatus for encoding low redundancy check words from source data |
US4467421A (en) | 1979-10-18 | 1984-08-21 | Storage Technology Corporation | Virtual storage system and method |
US4517663A (en) | 1979-09-04 | 1985-05-14 | Fujitsu Fanuc Limited | Method of rewriting data in non-volatile memory, and system therefor |
US4547882A (en) | 1983-03-01 | 1985-10-15 | The Board Of Trustees Of The Leland Stanford Jr. University | Error detecting and correcting memories |
US4667326A (en) | 1984-12-20 | 1987-05-19 | Advanced Micro Devices, Inc. | Method and apparatus for error detection and correction in systems comprising floppy and/or hard disk drives |
US4688221A (en) | 1983-12-28 | 1987-08-18 | Hitachi, Ltd. | Error recovery method and apparatus |
US4722085A (en) | 1986-02-03 | 1988-01-26 | Unisys Corp. | High capacity disk storage system having unusually high fault tolerance level and bandpass |
US4755978A (en) | 1986-02-18 | 1988-07-05 | Sony Corporation | Disc player |
US4761785A (en) | 1986-06-12 | 1988-08-02 | International Business Machines Corporation | Parity spreading to enhance storage access |
US4775978A (en) | 1987-01-12 | 1988-10-04 | Magnetic Peripherals Inc. | Data error correction system |
US4796260A (en) | 1987-03-30 | 1989-01-03 | Scs Telecom, Inc. | Schilling-Manela forward error correction and detection code method and apparatus |
US4817035A (en) | 1984-03-16 | 1989-03-28 | Cii Honeywell Bull | Method of recording in a disk memory and disk memory system |
US4825403A (en) | 1983-05-16 | 1989-04-25 | Data General Corporation | Apparatus guaranteeing that a controller in a disk drive system receives at least some data from an invalid track sector |
US4837680A (en) | 1987-08-28 | 1989-06-06 | International Business Machines Corporation | Controlling asynchronously operating peripherals |
US4847842A (en) | 1987-11-19 | 1989-07-11 | Scs Telecom, Inc. | SM codec method and apparatus |
US4849976A (en) | 1987-08-03 | 1989-07-18 | Scs Telecom, Inc. | PASM and TASM forward error correction and detection code method and apparatus |
US4849974A (en) | 1987-08-03 | 1989-07-18 | Scs Telecom, Inc. | PASM and TASM forward error correction and detection code method and apparatus |
US4870643A (en) | 1987-11-06 | 1989-09-26 | Micropolis Corporation | Parallel drive array storage system |
US4899342A (en) | 1988-02-01 | 1990-02-06 | Thinking Machines Corporation | Method and apparatus for operating multi-unit array of memories |
US4989205A (en) | 1988-06-28 | 1991-01-29 | Storage Technology Corporation | Disk drive memory |
US4989206A (en) | 1988-06-28 | 1991-01-29 | Storage Technology Corporation | Disk drive memory |
US5077736A (en) | 1988-06-28 | 1991-12-31 | Storage Technology Corporation | Disk drive memory |
US5088081A (en) | 1990-03-28 | 1992-02-11 | Prime Computer, Inc. | Method and apparatus for improved disk access |
US5101492A (en) | 1989-11-03 | 1992-03-31 | Compaq Computer Corporation | Data redundancy and recovery protection |
US5128810A (en) | 1988-08-02 | 1992-07-07 | Cray Research, Inc. | Single disk emulation interface for an array of synchronous spindle disk drives |
US5148432A (en) | 1988-11-14 | 1992-09-15 | Array Technology Corporation | Arrayed disk drive system and method |
USRE34100E (en) | 1987-01-12 | 1992-10-13 | Seagate Technology, Inc. | Data error correction system |
US5163131A (en) | 1989-09-08 | 1992-11-10 | Auspex Systems, Inc. | Parallel i/o network file server architecture |
US5166936A (en) | 1990-07-20 | 1992-11-24 | Compaq Computer Corporation | Automatic hard disk bad sector remapping |
US5179704A (en) | 1991-03-13 | 1993-01-12 | Ncr Corporation | Method and apparatus for generating disk array interrupt signals |
US5202979A (en) | 1985-05-08 | 1993-04-13 | Thinking Machines Corporation | Storage system using multiple independently mechanically-driven storage units |
US5208813A (en) | 1990-10-23 | 1993-05-04 | Array Technology Corporation | On-line reconstruction of a failed redundant array system |
US5210860A (en) | 1990-07-20 | 1993-05-11 | Compaq Computer Corporation | Intelligent disk array controller |
US5218689A (en) | 1988-08-16 | 1993-06-08 | Cray Research, Inc. | Single disk emulation interface for an array of asynchronously operating disk drives |
US5233618A (en) | 1990-03-02 | 1993-08-03 | Micro Technology, Inc. | Data correcting applicable to redundant arrays of independent disks |
US5235601A (en) | 1990-12-21 | 1993-08-10 | Array Technology Corporation | On-line restoration of redundancy information in a redundant array system |
US5237658A (en) | 1991-10-01 | 1993-08-17 | Tandem Computers Incorporated | Linear and orthogonal expansion of array storage in multiprocessor computing systems |
US5257367A (en) | 1987-06-02 | 1993-10-26 | Cab-Tek, Inc. | Data storage system with asynchronous host operating system communication link |
US5271012A (en) | 1991-02-11 | 1993-12-14 | International Business Machines Corporation | Method and means for encoding and rebuilding data contents of up to two unavailable DASDs in an array of DASDs |
US5274799A (en) | 1991-01-04 | 1993-12-28 | Array Technology Corporation | Storage device array architecture with copyback cache |
US5305326A (en) | 1992-03-06 | 1994-04-19 | Data General Corporation | High availability disk arrays |
US5351246A (en) | 1991-06-21 | 1994-09-27 | International Business Machines Corporation | Method and means for coding and rebuilding that data contents of unavailable DASDs or rebuilding the contents of DASDs in error in the presence of reduced number of unavailable DASDs in a DASD array |
US5375128A (en) | 1990-10-18 | 1994-12-20 | Ibm Corporation (International Business Machines Corporation) | Fast updating of DASD arrays using selective shadow writing of parity and data blocks, tracks, or cylinders |
US5410667A (en) | 1992-04-17 | 1995-04-25 | Storage Technology Corporation | Data record copy system for a disk drive array data storage subsystem |
US5537567A (en) | 1994-03-14 | 1996-07-16 | International Business Machines Corporation | Parity block configuration in an array of storage devices |
US5579475A (en) | 1991-02-11 | 1996-11-26 | International Business Machines Corporation | Method and means for encoding and rebuilding the data contents of up to two unavailable DASDS in a DASD array using simple non-recursive diagonal and row parity |
US5623595A (en) | 1994-09-26 | 1997-04-22 | Oracle Corporation | Method and apparatus for transparent, real time reconstruction of corrupted data in a redundant array data storage system |
US5657468A (en) | 1995-08-17 | 1997-08-12 | Ambex Technologies, Inc. | Method and apparatus for improving performance in a reduntant array of independent disks |
US5805788A (en) | 1996-05-20 | 1998-09-08 | Cray Research, Inc. | Raid-5 parity generation and data reconstruction |
US5812753A (en) | 1995-10-13 | 1998-09-22 | Eccs, Inc. | Method for initializing or reconstructing data consistency within an array of storage elements |
US5819292A (en) | 1993-06-03 | 1998-10-06 | Network Appliance, Inc. | Method for maintaining consistent states of a file system and for creating user-accessible read-only copies of a file system |
US5862158A (en) | 1995-11-08 | 1999-01-19 | International Business Machines Corporation | Efficient method for providing fault tolerance against double device failures in multiple device systems |
US5884098A (en) | 1996-04-18 | 1999-03-16 | Emc Corporation | RAID controller system utilizing front end and back end caching systems including communication path connecting two caching systems and synchronizing allocation of blocks in caching systems |
US5948110A (en) | 1993-06-04 | 1999-09-07 | Network Appliance, Inc. | Method for providing parity in a raid sub-system using non-volatile memory |
US5950225A (en) | 1997-02-28 | 1999-09-07 | Network Appliance, Inc. | Fly-by XOR for generating parity for data gleaned from a bus |
US5963962A (en) | 1995-05-31 | 1999-10-05 | Network Appliance, Inc. | Write anywhere file-system layout |
US6038570A (en) | 1993-06-03 | 2000-03-14 | Network Appliance, Inc. | Method for allocating files in a file system integrated with a RAID disk sub-system |
US6092215A (en) | 1997-09-29 | 2000-07-18 | International Business Machines Corporation | System and method for reconstructing data in a storage array system |
US6138201A (en) | 1998-04-15 | 2000-10-24 | Sony Corporation | Redundant array of inexpensive tape drives using data compression and data allocation ratios |
US6138126A (en) | 1995-05-31 | 2000-10-24 | Network Appliance, Inc. | Method for allocating files in a file system integrated with a raid disk sub-system |
US6138125A (en) | 1998-03-31 | 2000-10-24 | Lsi Logic Corporation | Block coding method and system for failure recovery in disk arrays |
US6158017A (en) | 1997-07-15 | 2000-12-05 | Samsung Electronics Co., Ltd. | Method for storing parity and rebuilding data contents of failed disks in an external storage subsystem and apparatus thereof |
WO2001013236A1 (en) | 1999-08-17 | 2001-02-22 | Tricord Systems, Inc. | Object oriented fault tolerance |
US6223300B1 (en) | 1997-11-21 | 2001-04-24 | Alps Electric Co., Ltd. | Disk array apparatus |
US6247157B1 (en) | 1998-05-13 | 2001-06-12 | Intel Corporation | Method of encoding data signals for storage |
WO2002029539A2 (en) | 2000-10-02 | 2002-04-11 | Sun Microsystems, Inc. | A data storage subsystem including a storage disk array employing dynamic data striping |
US20020124137A1 (en) | 2001-01-29 | 2002-09-05 | Ulrich Thomas R. | Enhancing disk array performance via variable parity based load balancing |
US20020184556A1 (en) * | 2001-06-05 | 2002-12-05 | Ebrahim Hashemi | Data storage array employing block verification information to invoke initialization procedures |
US6532548B1 (en) | 1999-09-21 | 2003-03-11 | Storage Technology Corporation | System and method for handling temporary errors on a redundant array of independent tapes (RAIT) |
US6557123B1 (en) | 1999-08-02 | 2003-04-29 | Inostor Corporation | Data redundancy methods and apparatus |
US6571326B2 (en) | 2001-03-08 | 2003-05-27 | Intel Corporation | Space allocation for data in a nonvolatile memory |
US6581185B1 (en) | 2000-01-24 | 2003-06-17 | Storage Technology Corporation | Apparatus and method for reconstructing data using cross-parity stripes on storage media |
EP1324200A2 (en) | 2001-12-28 | 2003-07-02 | Network Appliance, Inc. | Row-diagonal parity technique for enabling efficient recovery from double failures in a storage array |
US6671772B1 (en) | 2000-09-20 | 2003-12-30 | Robert E. Cousins | Hierarchical file system structure for enhancing disk transfer efficiency |
US6779095B2 (en) | 2000-06-19 | 2004-08-17 | Storage Technology Corporation | Apparatus and method for instant copy of data using pointers to new and original data in a data location |
US6904498B2 (en) * | 2002-10-08 | 2005-06-07 | Netcell Corp. | Raid controller disk write mask |
US7073115B2 (en) | 2001-12-28 | 2006-07-04 | Network Appliance, Inc. | Correcting multiple block data loss in a storage array using a combination of a single diagonal parity group and multiple row parity groups |
US7328305B2 (en) | 2003-11-03 | 2008-02-05 | Network Appliance, Inc. | Dynamic parity distribution technique |
US20080109616A1 (en) | 2006-10-31 | 2008-05-08 | Taylor James A | System and method for optimizing write operations in storage systems |
US7454445B2 (en) | 2000-08-18 | 2008-11-18 | Network Appliance, Inc. | Write allocation based on storage system map and snapshot |
Patent Citations (91)
- * Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3876978A (en) | 1973-06-04 | 1975-04-08 | Ibm | Archival data protection |
US4092732A (en) | 1977-05-31 | 1978-05-30 | International Business Machines Corporation | System for recovering data stored in failed memory unit |
US4201976A (en) | 1977-12-23 | 1980-05-06 | International Business Machines Corporation | Plural channel error correcting methods and means using adaptive reallocation of redundant channels among groups of channels |
US4205324A (en) | 1977-12-23 | 1980-05-27 | International Business Machines Corporation | Methods and means for simultaneously correcting several channels in error in a parallel multi channel data system using continuously modifiable syndromes and selective generation of internal channel pointers |
US4517663A (en) | 1979-09-04 | 1985-05-14 | Fujitsu Fanuc Limited | Method of rewriting data in non-volatile memory, and system therefor |
US4467421A (en) | 1979-10-18 | 1984-08-21 | Storage Technology Corporation | Virtual storage system and method |
US4375100A (en) | 1979-10-24 | 1983-02-22 | Matsushita Electric Industrial Company, Limited | Method and apparatus for encoding low redundancy check words from source data |
US4547882A (en) | 1983-03-01 | 1985-10-15 | The Board Of Trustees Of The Leland Stanford Jr. University | Error detecting and correcting memories |
US4825403A (en) | 1983-05-16 | 1989-04-25 | Data General Corporation | Apparatus guaranteeing that a controller in a disk drive system receives at least some data from an invalid track sector |
US4688221A (en) | 1983-12-28 | 1987-08-18 | Hitachi, Ltd. | Error recovery method and apparatus |
US4849929A (en) | 1984-03-16 | 1989-07-18 | Cii Honeywell Bull (Societe Anonyme) | Method of recording in a disk memory and disk memory system |
US4817035A (en) | 1984-03-16 | 1989-03-28 | Cii Honeywell Bull | Method of recording in a disk memory and disk memory system |
US4667326A (en) | 1984-12-20 | 1987-05-19 | Advanced Micro Devices, Inc. | Method and apparatus for error detection and correction in systems comprising floppy and/or hard disk drives |
US5202979A (en) | 1985-05-08 | 1993-04-13 | Thinking Machines Corporation | Storage system using multiple independently mechanically-driven storage units |
US4722085A (en) | 1986-02-03 | 1988-01-26 | Unisys Corp. | High capacity disk storage system having unusually high fault tolerance level and bandpass |
US4755978A (en) | 1986-02-18 | 1988-07-05 | Sony Corporation | Disc player |
US4761785A (en) | 1986-06-12 | 1988-08-02 | International Business Machines Corporation | Parity spreading to enhance storage access |
US4761785B1 (en) | 1986-06-12 | 1996-03-12 | Ibm | Parity spreading to enhance storage access |
USRE34100E (en) | 1987-01-12 | 1992-10-13 | Seagate Technology, Inc. | Data error correction system |
US4775978A (en) | 1987-01-12 | 1988-10-04 | Magnetic Peripherals Inc. | Data error correction system |
US4796260A (en) | 1987-03-30 | 1989-01-03 | Scs Telecom, Inc. | Schilling-Manela forward error correction and detection code method and apparatus |
US5257367A (en) | 1987-06-02 | 1993-10-26 | Cab-Tek, Inc. | Data storage system with asynchronous host operating system communication link |
US4849974A (en) | 1987-08-03 | 1989-07-18 | Scs Telecom, Inc. | PASM and TASM forward error correction and detection code method and apparatus |
US4849976A (en) | 1987-08-03 | 1989-07-18 | Scs Telecom, Inc. | PASM and TASM forward error correction and detection code method and apparatus |
US4837680A (en) | 1987-08-28 | 1989-06-06 | International Business Machines Corporation | Controlling asynchronously operating peripherals |
US4870643A (en) | 1987-11-06 | 1989-09-26 | Micropolis Corporation | Parallel drive array storage system |
US4847842A (en) | 1987-11-19 | 1989-07-11 | Scs Telecom, Inc. | SM codec method and apparatus |
US4899342A (en) | 1988-02-01 | 1990-02-06 | Thinking Machines Corporation | Method and apparatus for operating multi-unit array of memories |
US4989205A (en) | 1988-06-28 | 1991-01-29 | Storage Technology Corporation | Disk drive memory |
US5077736A (en) | 1988-06-28 | 1991-12-31 | Storage Technology Corporation | Disk drive memory |
US4989206A (en) | 1988-06-28 | 1991-01-29 | Storage Technology Corporation | Disk drive memory |
US5128810A (en) | 1988-08-02 | 1992-07-07 | Cray Research, Inc. | Single disk emulation interface for an array of synchronous spindle disk drives |
US5218689A (en) | 1988-08-16 | 1993-06-08 | Cray Research, Inc. | Single disk emulation interface for an array of asynchronously operating disk drives |
US5148432A (en) | 1988-11-14 | 1992-09-15 | Array Technology Corporation | Arrayed disk drive system and method |
US5163131A (en) | 1989-09-08 | 1992-11-10 | Auspex Systems, Inc. | Parallel I/O network file server architecture |
US5101492A (en) | 1989-11-03 | 1992-03-31 | Compaq Computer Corporation | Data redundancy and recovery protection |
US5233618A (en) | 1990-03-02 | 1993-08-03 | Micro Technology, Inc. | Data correcting applicable to redundant arrays of independent disks |
US5088081A (en) | 1990-03-28 | 1992-02-11 | Prime Computer, Inc. | Method and apparatus for improved disk access |
US5166936A (en) | 1990-07-20 | 1992-11-24 | Compaq Computer Corporation | Automatic hard disk bad sector remapping |
US5210860A (en) | 1990-07-20 | 1993-05-11 | Compaq Computer Corporation | Intelligent disk array controller |
US5375128A (en) | 1990-10-18 | 1994-12-20 | Ibm Corporation (International Business Machines Corporation) | Fast updating of DASD arrays using selective shadow writing of parity and data blocks, tracks, or cylinders |
US5208813A (en) | 1990-10-23 | 1993-05-04 | Array Technology Corporation | On-line reconstruction of a failed redundant array system |
US5235601A (en) | 1990-12-21 | 1993-08-10 | Array Technology Corporation | On-line restoration of redundancy information in a redundant array system |
US5274799A (en) | 1991-01-04 | 1993-12-28 | Array Technology Corporation | Storage device array architecture with copyback cache |
US5271012A (en) | 1991-02-11 | 1993-12-14 | International Business Machines Corporation | Method and means for encoding and rebuilding data contents of up to two unavailable DASDs in an array of DASDs |
US5579475A (en) | 1991-02-11 | 1996-11-26 | International Business Machines Corporation | Method and means for encoding and rebuilding the data contents of up to two unavailable DASDS in a DASD array using simple non-recursive diagonal and row parity |
US5179704A (en) | 1991-03-13 | 1993-01-12 | Ncr Corporation | Method and apparatus for generating disk array interrupt signals |
US5351246A (en) | 1991-06-21 | 1994-09-27 | International Business Machines Corporation | Method and means for coding and rebuilding that data contents of unavailable DASDs or rebuilding the contents of DASDs in error in the presence of reduced number of unavailable DASDs in a DASD array |
US5237658A (en) | 1991-10-01 | 1993-08-17 | Tandem Computers Incorporated | Linear and orthogonal expansion of array storage in multiprocessor computing systems |
US5305326A (en) | 1992-03-06 | 1994-04-19 | Data General Corporation | High availability disk arrays |
US5410667A (en) | 1992-04-17 | 1995-04-25 | Storage Technology Corporation | Data record copy system for a disk drive array data storage subsystem |
US5819292A (en) | 1993-06-03 | 1998-10-06 | Network Appliance, Inc. | Method for maintaining consistent states of a file system and for creating user-accessible read-only copies of a file system |
US6289356B1 (en) | 1993-06-03 | 2001-09-11 | Network Appliance, Inc. | Write anywhere file-system layout |
US6038570A (en) | 1993-06-03 | 2000-03-14 | Network Appliance, Inc. | Method for allocating files in a file system integrated with a RAID disk sub-system |
US5948110A (en) | 1993-06-04 | 1999-09-07 | Network Appliance, Inc. | Method for providing parity in a raid sub-system using non-volatile memory |
US5537567A (en) | 1994-03-14 | 1996-07-16 | International Business Machines Corporation | Parity block configuration in an array of storage devices |
US5623595A (en) | 1994-09-26 | 1997-04-22 | Oracle Corporation | Method and apparatus for transparent, real time reconstruction of corrupted data in a redundant array data storage system |
US5963962A (en) | 1995-05-31 | 1999-10-05 | Network Appliance, Inc. | Write anywhere file-system layout |
US6138126A (en) | 1995-05-31 | 2000-10-24 | Network Appliance, Inc. | Method for allocating files in a file system integrated with a raid disk sub-system |
US5657468A (en) | 1995-08-17 | 1997-08-12 | Ambex Technologies, Inc. | Method and apparatus for improving performance in a redundant array of independent disks |
US5812753A (en) | 1995-10-13 | 1998-09-22 | Eccs, Inc. | Method for initializing or reconstructing data consistency within an array of storage elements |
US5862158A (en) | 1995-11-08 | 1999-01-19 | International Business Machines Corporation | Efficient method for providing fault tolerance against double device failures in multiple device systems |
US5884098A (en) | 1996-04-18 | 1999-03-16 | Emc Corporation | RAID controller system utilizing front end and back end caching systems including communication path connecting two caching systems and synchronizing allocation of blocks in caching systems |
US5805788A (en) | 1996-05-20 | 1998-09-08 | Cray Research, Inc. | Raid-5 parity generation and data reconstruction |
US5950225A (en) | 1997-02-28 | 1999-09-07 | Network Appliance, Inc. | Fly-by XOR for generating parity for data gleaned from a bus |
US6158017A (en) | 1997-07-15 | 2000-12-05 | Samsung Electronics Co., Ltd. | Method for storing parity and rebuilding data contents of failed disks in an external storage subsystem and apparatus thereof |
US6092215A (en) | 1997-09-29 | 2000-07-18 | International Business Machines Corporation | System and method for reconstructing data in a storage array system |
US6223300B1 (en) | 1997-11-21 | 2001-04-24 | Alps Electric Co., Ltd. | Disk array apparatus |
US6138125A (en) | 1998-03-31 | 2000-10-24 | Lsi Logic Corporation | Block coding method and system for failure recovery in disk arrays |
US6138201A (en) | 1998-04-15 | 2000-10-24 | Sony Corporation | Redundant array of inexpensive tape drives using data compression and data allocation ratios |
US6247157B1 (en) | 1998-05-13 | 2001-06-12 | Intel Corporation | Method of encoding data signals for storage |
US6557123B1 (en) | 1999-08-02 | 2003-04-29 | Inostor Corporation | Data redundancy methods and apparatus |
WO2001013236A1 (en) | 1999-08-17 | 2001-02-22 | Tricord Systems, Inc. | Object oriented fault tolerance |
US6742137B1 (en) | 1999-08-17 | 2004-05-25 | Adaptec, Inc. | Object oriented fault tolerance |
US6532548B1 (en) | 1999-09-21 | 2003-03-11 | Storage Technology Corporation | System and method for handling temporary errors on a redundant array of independent tapes (RAIT) |
US6581185B1 (en) | 2000-01-24 | 2003-06-17 | Storage Technology Corporation | Apparatus and method for reconstructing data using cross-parity stripes on storage media |
US6779095B2 (en) | 2000-06-19 | 2004-08-17 | Storage Technology Corporation | Apparatus and method for instant copy of data using pointers to new and original data in a data location |
US7454445B2 (en) | 2000-08-18 | 2008-11-18 | Network Appliance, Inc. | Write allocation based on storage system map and snapshot |
US6671772B1 (en) | 2000-09-20 | 2003-12-30 | Robert E. Cousins | Hierarchical file system structure for enhancing disk transfer efficiency |
WO2002029539A2 (en) | 2000-10-02 | 2002-04-11 | Sun Microsystems, Inc. | A data storage subsystem including a storage disk array employing dynamic data striping |
US20020124137A1 (en) | 2001-01-29 | 2002-09-05 | Ulrich Thomas R. | Enhancing disk array performance via variable parity based load balancing |
US6571326B2 (en) | 2001-03-08 | 2003-05-27 | Intel Corporation | Space allocation for data in a nonvolatile memory |
US20020184556A1 (en) * | 2001-06-05 | 2002-12-05 | Ebrahim Hashemi | Data storage array employing block verification information to invoke initialization procedures |
EP1324200A2 (en) | 2001-12-28 | 2003-07-02 | Network Appliance, Inc. | Row-diagonal parity technique for enabling efficient recovery from double failures in a storage array |
US6993701B2 (en) | 2001-12-28 | 2006-01-31 | Network Appliance, Inc. | Row-diagonal parity technique for enabling efficient recovery from double failures in a storage array |
US7073115B2 (en) | 2001-12-28 | 2006-07-04 | Network Appliance, Inc. | Correcting multiple block data loss in a storage array using a combination of a single diagonal parity group and multiple row parity groups |
US7203892B2 (en) | 2001-12-28 | 2007-04-10 | Network Appliance, Inc. | Row-diagonal parity technique for enabling efficient recovery from double failures in a storage array |
US7409625B2 (en) | 2001-12-28 | 2008-08-05 | Network Appliance, Inc. | Row-diagonal parity technique for enabling efficient recovery from double failures in a storage array |
US6904498B2 (en) * | 2002-10-08 | 2005-06-07 | Netcell Corp. | Raid controller disk write mask |
US7328305B2 (en) | 2003-11-03 | 2008-02-05 | Network Appliance, Inc. | Dynamic parity distribution technique |
US20080109616A1 (en) | 2006-10-31 | 2008-05-08 | Taylor James A | System and method for optimizing write operations in storage systems |
Non-Patent Citations (16)
* Cited by examiner, † Cited by third party

Title |
---|
Bultman, David L., High Performance SCSI Using Parallel Drive Technology, In Proc. BUSCON Conf., pp. 40-44, Anaheim, CA, Feb. 1988. |
Evans, The Tip of the Iceberg: RAMAC Virtual Array, Part I, Technical Support, Mar. 1997, pp. 1-4. |
Gibson, Garth A., et al., Coding Techniques for Handling Failures in Large Disk Arrays, Technical Report UCB/CSD88/477, Computer Science Division, University of California, Jul. 1988. |
Gibson, Garth A., et al., Failure Correction Techniques for Large Disk Arrays, In Proceedings Architectural Support for Programming Languages and Operating Systems, Boston, Apr. 1989, pp. 123-132. |
Gibson, Garth A., et al., Strategic Directions in Storage I/O Issues in Large-Scale Computing, ACM Computing Survey, 28(4):779-93, Dec. 1996. |
Hitz, David, TR3002 File System Design for an NFS File Server Appliance, Network Appliance, Inc. |
Menon, Jai, et al., Floating Parity and Data Disk Arrays, Journal of Parallel and Distributed Computing, Boston: Academic Press, Inc., vol. 17, Nos. 1 and 2, Jan./Feb. 1993, 13 pages. |
Menon, Jai, et al., Methods for Improved Update Performance of Disk Arrays, IBM Almaden Research Center, IEEE, Jan. 1992, 10 pages. |
Park, Arvin, et al., Providing Fault Tolerance in Parallel Secondary Storage Systems, Technical Report CS-TR-057-86, Princeton, Nov. 1986. |
Patel, Arvind M., Adaptive Cross-Parity (AXP) Code for a High-Density Magnetic Tape Subsystem, IBM Technical Disclosure Bulletin 29(6):546-562, Nov. 1985. |
Patterson, D., et al., A Case for Redundant Arrays of Inexpensive Disks (RAID), SIGMOD International Conference on Management of Data, Chicago, IL, USA, Jun. 1-3, 1988, SIGMOD Record (17)3:109-16 (Sep. 1988). |
Patterson, D., et al., A Case for Redundant Arrays of Inexpensive Disks (RAID), Technical Report, CSD-87-391, Computer Science Division, Electrical Engineering and Computer Sciences, University of California at Berkeley (1987). |
Patterson, David A., et al., Introduction to Redundant Arrays of Inexpensive Disks (RAID). In IEEE Spring 89 COMPCON, San Francisco, IEEE Computer Society Press, Feb. 27-Mar. 3, 1989, pp. 112-117. |
Schulze, Martin E., Considerations in the Design of a RAID Prototype, Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Aug. 25, 1988. |
Schulze, Martin., et al., How Reliable is a RAID?, Proceedings of COMPCON, 1989, pp. 118-123. |
Stonebraker, Michael, et al., The Design of XPRS, Proceedings of the 14th VLDB Conference, LA, CA (1988). |
Cited By (47)
* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11379119B2 (en) | 2010-03-05 | 2022-07-05 | Netapp, Inc. | Writing data in a distributed data storage system |
US9104342B2 (en) * | 2010-10-21 | 2015-08-11 | Oracle International Corporation | Two stage checksummed raid storage model |
US20130246707A1 (en) * | 2010-10-21 | 2013-09-19 | Oracle International Corporation | Two stage checksummed raid storage model |
US20140337578A1 (en) * | 2011-03-01 | 2014-11-13 | Lsi Corporation | Redundant array of inexpensive disks (raid) system configured to reduce rebuild time and to prevent data sprawl |
US8566686B2 (en) * | 2011-05-13 | 2013-10-22 | Lsi Corporation | System and method for optimizing read-modify-write operations in a RAID 6 volume |
US20120290905A1 (en) * | 2011-05-13 | 2012-11-15 | Lsi Corporation | System and method for optimizing read-modify-write operations in a raid 6 volume |
US9798615B2 (en) * | 2011-07-05 | 2017-10-24 | Dell Products, Lp | System and method for providing a RAID plus copy model for a storage network |
US20130013857A1 (en) * | 2011-07-05 | 2013-01-10 | Dell Products, Lp | System and Method for Providing a RAID Plus Copy Model for a Storage Network |
US12250129B2 (en) | 2011-12-27 | 2025-03-11 | Netapp, Inc. | Proportional quality of service based on client usage and system metrics |
US11212196B2 (en) | 2011-12-27 | 2021-12-28 | Netapp, Inc. | Proportional quality of service based on client impact on an overload condition |
US10951488B2 (en) | 2011-12-27 | 2021-03-16 | Netapp, Inc. | Rule-based performance class access management for storage cluster performance guarantees |
US10911328B2 (en) | 2011-12-27 | 2021-02-02 | Netapp, Inc. | Quality of service policy based load adaption |
US9304912B1 (en) * | 2012-01-06 | 2016-04-05 | Marvell International Ltd. | Systems and methods for building redundancy data in a RAID system |
US20140181455A1 (en) * | 2012-12-20 | 2014-06-26 | Apple Inc. | Category based space allocation for multiple storage devices |
CN103488434A (en) * | 2013-09-23 | 2014-01-01 | 浪潮电子信息产业股份有限公司 | Method for improving disk array reliability |
EP2933733A4 (en) * | 2013-12-31 | 2016-05-11 | Huawei Tech Co Ltd | Data processing method and device in distributed file storage system |
US10127233B2 (en) | 2013-12-31 | 2018-11-13 | Huawei Technologies Co., Ltd. | Data processing method and device in distributed file storage system |
AU2013409624B2 (en) * | 2013-12-31 | 2016-11-17 | Huawei Technologies Co., Ltd. | Data processing method and device in distributed file storage system |
US9170746B2 (en) * | 2014-01-07 | 2015-10-27 | Netapp, Inc. | Clustered raid assimilation management |
US9619351B2 (en) | 2014-01-07 | 2017-04-11 | Netapp, Inc. | Clustered RAID assimilation management |
US9367241B2 (en) | 2014-01-07 | 2016-06-14 | Netapp, Inc. | Clustered RAID assimilation management |
US20150193168A1 (en) * | 2014-01-07 | 2015-07-09 | Netapp, Inc. | Clustered raid assimilation management |
US11386120B2 (en) | 2014-02-21 | 2022-07-12 | Netapp, Inc. | Data syncing in a distributed system |
US9798728B2 (en) | 2014-07-24 | 2017-10-24 | Netapp, Inc. | System performing data deduplication using a dense tree data structure |
US9671960B2 (en) | 2014-09-12 | 2017-06-06 | Netapp, Inc. | Rate matching technique for balancing segment cleaning and I/O workload |
US10133511B2 (en) | 2014-09-12 | 2018-11-20 | Netapp, Inc | Optimized segment cleaning technique |
US10210082B2 (en) | 2014-09-12 | 2019-02-19 | Netapp, Inc. | Rate matching technique for balancing segment cleaning and I/O workload |
US9830220B1 (en) * | 2014-09-29 | 2017-11-28 | EMC IP Holding Company LLC | Enhanced error recovery for data storage drives |
US10496496B2 (en) * | 2014-10-29 | 2019-12-03 | Hewlett Packard Enterprise Development Lp | Data restoration using allocation maps |
US10365838B2 (en) | 2014-11-18 | 2019-07-30 | Netapp, Inc. | N-way merge technique for updating volume metadata in a storage I/O stack |
US9836229B2 (en) | 2014-11-18 | 2017-12-05 | Netapp, Inc. | N-way merge technique for updating volume metadata in a storage I/O stack |
US9720601B2 (en) | 2015-02-11 | 2017-08-01 | Netapp, Inc. | Load balancing technique for a storage array |
US9762460B2 (en) | 2015-03-24 | 2017-09-12 | Netapp, Inc. | Providing continuous context for operational information of a storage system |
US9710317B2 (en) | 2015-03-30 | 2017-07-18 | Netapp, Inc. | Methods to identify, handle and recover from suspect SSDS in a clustered flash array |
US9740566B2 (en) | 2015-07-31 | 2017-08-22 | Netapp, Inc. | Snapshot creation workflow |
CN108351820A (en) * | 2015-11-20 | 2018-07-31 | 高通股份有限公司 | Protecting an ECC location when transmitting correction data across a memory link |
US20170147431A1 (en) * | 2015-11-20 | 2017-05-25 | Qualcomm Incorporated | Protecting an ecc location when transmitting correction data across a memory link |
US10140175B2 (en) * | 2015-11-20 | 2018-11-27 | Qualcomm Incorporated | Protecting an ECC location when transmitting correction data across a memory link |
US10235059B2 (en) | 2015-12-01 | 2019-03-19 | Netapp, Inc. | Technique for maintaining consistent I/O processing throughput in a storage system |
US10929022B2 (en) | 2016-04-25 | 2021-02-23 | Netapp, Inc. | Space savings reporting for storage system supporting snapshot and clones |
US11327910B2 (en) | 2016-09-20 | 2022-05-10 | Netapp, Inc. | Quality of service policy sets |
US10997098B2 (en) | 2016-09-20 | 2021-05-04 | Netapp, Inc. | Quality of service policy sets |
US11886363B2 (en) | 2016-09-20 | 2024-01-30 | Netapp, Inc. | Quality of service policy sets |
US11016848B2 (en) | 2017-11-02 | 2021-05-25 | Seagate Technology Llc | Distributed data storage system with initialization-less parity |
CN110413205B (en) * | 2018-04-28 | 2023-07-07 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer readable storage medium for writing to disk array |
CN110413205A (en) * | 2018-04-28 | 2019-11-05 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer readable storage medium for writing to disk array |
US11625193B2 (en) | 2020-07-10 | 2023-04-11 | Samsung Electronics Co., Ltd. | RAID storage device, host, and RAID system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8209587B1 (en) | 2012-06-26 | System and method for eliminating zeroing of disk drives in RAID arrays |
US8156282B1 (en) | 2012-04-10 | System and method for optimizing write operations in storage systems |
US10606491B2 (en) | 2020-03-31 | Providing redundancy in a virtualized storage system for a computer system |
JP6294518B2 (en) | 2018-03-14 | Synchronous mirroring in non-volatile memory systems |
US7647526B1 (en) | 2010-01-12 | Reducing reconstruct input/output operations in storage systems |
US8560879B1 (en) | 2013-10-15 | Data recovery for failed memory device of memory device array |
US7984259B1 (en) | 2011-07-19 | Reducing load imbalance in a storage system |
US10365983B1 (en) | 2019-07-30 | Repairing raid systems at per-stripe granularity |
JP5302886B2 (en) | 2013-10-02 | System and method for reading block fingerprint and eliminating data duplication using block fingerprint |
US10691339B2 (en) | 2020-06-23 | Methods for reducing initialization duration and performance impact during configuration of storage drives |
US8732411B1 (en) | 2014-05-20 | Data de-duplication for information storage systems |
TWI451257B (en) | 2014-09-01 | Method and apparatus for protecting the integrity of cached data in a direct-attached storage (das) system |
CN110737395B (en) | 2023-09-29 | I/O management method, electronic device, and computer-readable storage medium |
US10579540B2 (en) | 2020-03-03 | Raid data migration through stripe swapping |
US7716519B2 (en) | 2010-05-11 | Method and system for repairing partially damaged blocks |
US6976146B1 (en) | 2005-12-13 | System and method for emulating block appended checksums on storage devices by sector stealing |
US11592988B2 (en) | 2023-02-28 | Utilizing a hybrid tier which mixes solid state device storage and hard disk drive storage |
CN116627856A (en) | 2023-08-22 | Method, device and equipment for realizing memory address mapping |
US9830094B2 (en) | 2017-11-28 | Dynamic transitioning of protection information in array systems |
US11409666B2 (en) | 2022-08-09 | Techniques for providing I/O hints using I/O flags |
CN103098034B (en) | 2016-11-30 | Apparatus and method for conditional and atomic storage operations |
CN115686366A (en) | 2023-02-03 | Write data caching acceleration method based on RAID |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2007-04-12 | AS | Assignment | Owner name: NETWORK APPLIANCE, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: TAYLOR, JAMES; GOEL, ATUL; LEONG, JAMES; SIGNING DATES FROM 20070403 TO 20070410; REEL/FRAME: 019150/0988 |
2011-12-05 | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
2012-05-09 | AS | Assignment | Owner name: NETAPP, INC., CALIFORNIA; Free format text: CHANGE OF NAME; ASSIGNOR: NETWORK APPLIANCE, INC.; REEL/FRAME: 028180/0846; Effective date: 20080310 |
2012-06-06 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
2015-12-28 | FPAY | Fee payment | Year of fee payment: 4 |
2019-12-26 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 8 |
2023-12-26 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 12 |
2024-06-26 | AS | Assignment | Owner name: NETAPP, INC., CALIFORNIA; Free format text: CHANGE OF NAME; ASSIGNOR: NETWORK APPLIANCE, INC.; REEL/FRAME: 067846/0174; Effective date: 20080317 |