CN108920192A - Method and device for implementing cache data consistency based on distributed finite directory - Google Patents
- Fri Nov 30 2018
Info
-
- Publication number: CN108920192A
- Application number: CN201810719059.4A
- Authority: CN (China)
- Prior art keywords: ddcu, host, pcache, request, private
- Prior art date: 2018-07-03
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/524—Deadlock detection or avoidance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
- G06F9/3869—Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/466—Transaction processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention discloses a method and device for implementing cache data consistency based on a distributed limited directory. A private cache PCache X responds to requests from its corresponding processing unit PE X and, when a request misses or data is written back, issues a data read-write coherence request transaction to the corresponding home node DDCU X. When the home node DDCU X encounters a resource conflict while processing the request transaction, it discards the message and enters flow-control mode; PCache X then retransmits the discarded messages one at a time under a credit discipline until, after the resource conflict is resolved, flow-control mode is exited and normal pipelined operation resumes. The invention solves the problem that directory self-replacement blocks the work pipeline and starves request transactions, provides a strict execution-order guarantee, ensures that data dependences are not destroyed in a distributed environment and that the data dependences among request transactions are kept fully intact, and guarantees that the request transactions issued by multiple cores are executed fairly and in coordination, thereby improving the reliability and scalability of many-core processors.
Description
Technical field
The present invention relates to many-core processor architectures, and in particular to a method and device for implementing cache data consistency based on a distributed limited directory, for solving the problem that, in a directory-based cache (Cache) coherence protocol whose directory-store capacity is limited, directory self-replacement blocks the work pipeline and causes request transactions to starve.
Background technique
Many-core microprocessors are commonly realized by homogeneous integration, that is, by integrating many mature, structurally identical, powerful general-purpose processor cores. Intel integrates 32 processor cores in the Skylake-EP Xeon E5 chip; the AMD Ryzen Threadripper processor integrates 16 processor cores; Phytium server chips integrate 16 to 64 processor cores. As shown in Figure 1, these processor chips integrate multi-level caches (Cache0 to Cache(n-1)) and memory controllers (Memory Controller Unit, MCU) (MCU0 to MCU(m-1)). Their storage hierarchies share a common feature: a private distributed Cache structure, with access to the shared outermost cache (Last Level Cache, LLCache) or the memory controllers MCU through an on-chip interconnection network (Network on Chip, NoC). The distributed private Caches achieve data consistency through a Cache coherence protocol; the common implementation is the directory-based Cache coherence protocol.
The finest-granularity address unit managed by a directory-based Cache coherence protocol is a single cache line (CacheLine), commonly 64 bytes in size. The physical address space of a modern many-core microprocessor is very large. Taking a 44-bit physical address space as an example, a complete directory store (Directory Control Unit, DCU) would need about 256G entries; under current process conditions the full directory information clearly cannot be stored on chip. The limited-directory Cache coherence protocol is the compromise in common use: only part of the directory information is stored in the on-chip directory controller. When directory store capacity runs out, a directory replacement operation is initiated: a directory entry that is valid but idle is evicted, and the old entry is given to the new request transaction. As shown in Figure 2, the DCU generates snoop-invalidate requests Snoop_Invalidate according to the evicted directory entry and sends them over the NoC to the corresponding private caches Cache0 and CacheX. The private caches Cache0 and CacheX invalidate the cache line CacheLine internally, generate snoop response Snoop_RSP_Data messages, and send them back to the DCU. The DCU writes any dirty copy returned in a snoop response back to the MCU via a write command message MCU_Write.
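For illustration only, the replacement flow of Figure 2 can be sketched in software as follows. This is a minimal model, not the patent's hardware: the type and function names (DirEntry, snoop_invalidate, mcu_write) are assumptions introduced here, and the stubs merely print the messages that the NoC and MCU would carry in a real system.

```cpp
#include <bitset>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr int kNumPCaches = 4;

// One limited-directory entry: a sharer bit vector plus the Vld/Bsy flags
// described later in connection with Fig. 5.
struct DirEntry {
    uint64_t line_addr = 0;
    std::bitset<kNumPCaches> sharers;  // bit i set => PCache i holds a copy
    bool vld = false;                  // at least one copy exists on chip
    bool bsy = false;                  // previous transaction still in flight
};

struct SnoopResponse {
    bool dirty = false;                // copy was modified in the private cache
    std::vector<uint8_t> data;
};

// Stand-ins for the NoC snoop path and the MCU write port.
SnoopResponse snoop_invalidate(int pcache_id, uint64_t line_addr) {
    std::printf("Snoop_Invalidate -> PCache%d, line 0x%llx\n",
                pcache_id, (unsigned long long)line_addr);
    return SnoopResponse{};  // a real system returns Snoop_RSP_Data here
}

void mcu_write(uint64_t line_addr, const std::vector<uint8_t>& data) {
    std::printf("MCU_Write line 0x%llx (%zu bytes)\n",
                (unsigned long long)line_addr, data.size());
}

// Evict one valid-but-idle entry so its slot can serve a new request.
void replace_directory_entry(DirEntry& victim) {
    for (int i = 0; i < kNumPCaches; ++i) {
        if (!victim.sharers.test(i)) continue;
        SnoopResponse rsp = snoop_invalidate(i, victim.line_addr);
        if (rsp.dirty) mcu_write(victim.line_addr, rsp.data);  // write dirty copy back
        victim.sharers.reset(i);
    }
    victim.vld = false;  // the slot is now free for the new request transaction
}

int main() {
    DirEntry e;
    e.line_addr = 0x1000;
    e.vld = true;
    e.sharers.set(0);
    e.sharers.set(2);  // copies in Cache0 and CacheX, as in Fig. 2
    replace_directory_entry(e);
}
```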
When the volume of directory replacement is high, the following situations occur. (1) Request-transaction starvation: directory replacement triggers snoop operations, and snoop invalidation takes a long time and occupies the DCU work pipeline throughout. With many replacements, blocking is easily caused and new request transactions, such as the Read and Write transactions in Fig. 2, are delayed; as the number of system nodes grows, request transactions easily starve. (2) Out-of-order execution of request transactions destroys data dependences: a capacity-limited directory controller multiplexes directory-store resources through self-replacement, and when many replacements occur the combinatorial complexity of the control scheduling algorithm increases sharply. When two request transactions with a data dependence arrive, a complex control scheduling algorithm easily causes them to execute out of order; the data dependence is destroyed, the correctness of program execution in the processor core is directly affected, and a large risk results. The above Cache coherence schemes based on capacity-limited directories thus easily cause request-transaction starvation and destroy data dependences, creating considerable design risk, and are an important factor limiting the reliability and scalability of many-core processors.
Summary of the invention
The technical problem to be solved by the present invention: in view of the above problems in the prior art, to provide a method and device for implementing cache data consistency based on a distributed limited directory, solving the problem that directory self-replacement blocks the work pipeline and starves request transactions, while providing a strict execution-order guarantee, ensuring that in a distributed environment data dependences are not destroyed and the data dependences among request transactions are kept fully intact, and guaranteeing that the request transactions issued by multiple cores are executed fairly and in coordination, thereby improving the reliability and scalability of many-core processors.
In order to solve the above-mentioned technical problem, the technical solution adopted by the present invention is:
A method for implementing cache data consistency based on a distributed limited directory, the implementation steps of which include:
1) The private cache PCache X responds to requests from its corresponding processing unit PE X, and when a request misses or data must be written back, issues a data read-write coherence request transaction to the corresponding home node DDCU X;
2) The home node DDCU X processes the data read-write coherence request transaction; when the data read-write coherence request transaction encounters a resource conflict, it is preferentially stored in retry buffer [X] to be handled later if retry buffer [X] is in the free state, otherwise the request transaction is discarded and a retry response message RetryAck is sent to the private cache PCache X;
3) Flow-control mode is entered between the private cache PCache X and the home node DDCU X; under flow-control mode, the home node DDCU X sends credit grant messages Credit_Grant to the private cache PCache X one at a time, and for each Credit_Grant sent, the private cache PCache X retransmits to the home node DDCU X one data read-write coherence request transaction previously discarded by the home node DDCU X, which the home node DDCU X then processes, until all data read-write coherence request transactions discarded by the home node DDCU X have been fully completed, whereupon flow-control mode is exited.
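For illustration, the conflict handling of step 2) above can be sketched as follows; this is a minimal software model under assumed types (Request, RetryBuffer, and handle_request are illustrative names, not identifiers from the patent):

```cpp
#include <cstdio>
#include <optional>

// Conflicts that can stall the DDCU work pipeline (the three kinds named in the text).
enum class Conflict { None, SelfReplacement, DirectoryFull, LineBusy };

struct Request { int src_pcache; unsigned long long line_addr; };

// Depth-1 retry buffer [X] plus its retryCounter, one pair per private cache.
struct RetryBuffer {
    std::optional<Request> slot;  // empty => free state
    int retry_counter = 0;
};

enum class Outcome { Accepted, Parked, DroppedRetryAck };

// Step 2: park the conflicting request if retry buffer [X] is free;
// otherwise drop it and answer with RetryAck (retryCounter += 1).
Outcome handle_request(RetryBuffer& rb, const Request& req, Conflict c) {
    if (c == Conflict::None) return Outcome::Accepted;  // flows down the pipeline
    if (!rb.slot.has_value()) {                         // free: store and defer
        rb.slot = req;
        return Outcome::Parked;
    }
    rb.retry_counter += 1;                              // busy: drop + RetryAck
    std::printf("RetryAck -> PCache%d\n", req.src_pcache);
    return Outcome::DroppedRetryAck;
}
```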
Preferably, when in step 1) the private cache PCache X responds to a request from its corresponding processing unit PE X: if the access request hits, the hit result is returned directly and processing exits; if the access request misses or data must be written back, the request message is marked as a dynamic credit request and stored in the normal queue waiting to be issued to the home node DDCU X, and at the moment a message in the normal queue is sent to the home node DDCU X, the private cache PCache also copies the message and deposits it at the tail of a local retransmit queue.
Preferably, step 2) further includes, when sending the retry response message RetryAck to the private cache PCache X, incrementing the retryCounter counter of retry buffer [X] by 1, and the detailed steps of step 3) include:
3.1) After the private cache PCache X receives the retry response message RetryAck, it enters flow-control mode with the home node DDCU X: it suspends the sending of messages in the normal queue whose destination node is the home node DDCU X, and increments the corresponding counter retryCounter [X] by 1; before counter retryCounter [X] returns to 0, PCache X is not allowed to send dynamic credit requests Dreq to the home node DDCU X, a dynamic credit request Dreq being a data read-write coherence request transaction sent outside flow-control mode;
3.2) When the previous retried request transaction in retry buffer [X] of the home node DDCU X successfully flows out, the queue state of retry buffer [X] transitions to the free state; if the counter retryCounter value of retry buffer [X] is greater than 1, the home node DDCU X first sets retry buffer [X] into reserved mode and jumps to the next step;
3.3) The home node DDCU X sends a credit grant message Credit_Grant to the corresponding private cache PCache X;
3.4) After the private cache PCache X receives the credit grant message Credit_Grant, it takes the earliest of the request transactions previously discarded by the DCU, in chronological order, out of the retransmit queue, marks that data read-write coherence request transaction as a static credit message Sreq, and sends it to the home node DDCU X again, while the counter retryCounter [X] corresponding to the home node DDCU X is decremented by 1; a static credit message Sreq is a data read-write coherence request transaction sent under flow-control mode;
3.5) After the home node DDCU X receives the data read-write coherence request transaction marked as a static credit message Sreq, it sends snoop messages to the remaining private caches PCache to maintain the coherence of the corresponding cache line CacheLine, or completes the coherence maintenance of the corresponding cache line CacheLine by interacting with the memory controller;
3.6) Judge whether the counter retryCounter [X] of the home node DDCU X is 0; if it is 0, the private cache PCache X and the home node DDCU X exit flow-control mode; otherwise, jump back to step 3.3).
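The private-cache side of steps 3.1) to 3.6) can be sketched as follows. This simplified model assumes the retransmit queue holds only the dropped requests; all names are illustrative:

```cpp
#include <deque>

struct Msg { unsigned long long line_addr; bool is_sreq = false; };

// Per-(PCache X, DDCU X) flow-control state on the private-cache side.
struct PCacheFlowCtl {
    std::deque<Msg> retransmit_queue;  // dropped requests, oldest first (simplified)
    int retry_counter = 0;             // retryCounter[X]; > 0 => flow-control mode
};

// Step 3.1: a RetryAck suspends Dreq traffic toward DDCU X.
void on_retry_ack(PCacheFlowCtl& fc) { fc.retry_counter += 1; }

bool may_send_dreq(const PCacheFlowCtl& fc) { return fc.retry_counter == 0; }

// Steps 3.3-3.6: each Credit_Grant releases exactly one static-credit
// retransmission (Sreq) of the oldest previously dropped transaction.
// Precondition: the retransmit queue is non-empty.
Msg on_credit_grant(PCacheFlowCtl& fc) {
    Msg m = fc.retransmit_queue.front();  // earliest dropped request
    fc.retransmit_queue.pop_front();
    m.is_sreq = true;                     // resent as a static credit message
    fc.retry_counter -= 1;                // reaching 0 exits flow-control mode
    return m;                             // hand to the NoC, destination DDCU X
}
```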
The present invention also provides a device for implementing cache data consistency based on a distributed limited directory, comprising a many-core microprocessor programmed to perform the steps of the above method of the present invention for implementing cache data consistency based on a distributed limited directory.
Compared with the prior art, the present invention has the following beneficial effects: the present invention improves the many-core processor architecture based on a directory Cache coherence protocol by designing and adding a credit flow-control mechanism to the distributed directory, solving the problem that directory self-replacement blocks the work pipeline and starves request transactions, while providing a strict execution-order guarantee, ensuring that in a distributed environment data dependences are not destroyed, that the data dependences among request transactions are kept fully intact, and that the request transactions issued by multiple cores are executed fairly and in coordination, thereby improving the reliability and scalability of many-core processors.
Detailed description of the invention
Fig. 1 is a schematic diagram of the topology of a prior-art many-core microprocessor.
Fig. 2 is a schematic diagram of the operating principle of a prior-art directory controller.
Fig. 3 is a schematic diagram of the basic flow of the method of the present invention.
Fig. 4 is a schematic diagram of the distributed structure of the DDCUs in an embodiment of the present invention.
Fig. 5 is a schematic diagram of the bit vector, Bsy flag bit, and valid flag bit of a DCU in an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of the private cache PCache and the directory controller DCU in an embodiment of the present invention.
Fig. 7 is the state transition diagram of a request transaction temporarily held in the retry buffer [X] queue in an embodiment of the present invention.
Fig. 8 is the message flow between a DDCU and a PCache under flow-control mode in an embodiment of the present invention.
Specific embodiment
As shown in Figure 3, the implementation steps of the method of this embodiment for implementing cache data consistency based on a distributed limited directory include:
1) The private cache PCache X responds to requests from its corresponding processing unit PE X, and when a request misses or data is written back, issues a data read-write coherence request transaction to the corresponding home node DDCU X;
2) The home node DDCU X processes the data read-write coherence request transaction; when the data read-write coherence request transaction encounters a resource conflict, it is preferentially stored in retry buffer [X] to be handled later if retry buffer [X] is in the free state, otherwise the data read-write coherence request transaction is discarded and a retry response message RetryAck is sent to the private cache PCache X;
3) Flow-control mode is entered between the private cache PCache X and the home node DDCU X; under flow-control mode, the home node DDCU X sends credit grant messages Credit_Grant to the private cache PCache X one at a time, and for each Credit_Grant sent, the private cache PCache X retransmits to the home node DDCU X one data read-write coherence request transaction previously discarded by the home node DDCU X, which the home node DDCU X then processes, until all data read-write coherence request transactions discarded by the home node DDCU X have been fully completed, whereupon flow-control mode is exited.
As shown in Figure 4, in this embodiment the serial number X denotes the association mapping among a processing unit PE, its private cache PCache, and the corresponding home node DDCU (Distributed Directory Control Unit): processing unit PE X corresponds to private cache PCache X, X ∈ 1 to n-1, and private cache PCache X corresponds to home node DDCU X, with communication over the network-on-chip. When the private cache (Private Cache, PCache) of a processing unit (Processor Element, PE) misses or writes data back, it issues a data read-write coherence request transaction; according to the address-space sharding mapping rules that establish the correspondence among processing unit PE, private cache PCache, and home DDCU, these data read-write coherence request transactions are forwarded to different home DDCUs. While the directory controller pipeline of an address-space shard is processing a request transaction, if a resource conflict is encountered (for example, the current directory entry is undergoing self-replacement, or directory store resources are exhausted, or the previous transaction on the current cache line CacheLine has not yet finished), the current transaction request is terminated and forwarded to the retry buffer queue. The retry buffer queue operates on a credit discipline and is responsible for completing flow control with the request source node. Once the resource conflict is resolved, the directory controller and the request source node exit flow-control mode and enter the normal pipelined operation mode. To realize the one-to-one correspondence among processing unit PE, private cache PCache, and home DDCU, this embodiment shards the physical address space, ensuring that each DCU manages the cache-block data consistency of a fixed address range, that the ranges managed by different DCUs do not overlap, and that for any cache line CacheLine in the physical address space the home DCU is complete and unique on the chip. Figure 3 shows the basic structure of a many-core processor with a Cache coherence protocol based on a distributed limited directory. In this structure, when a processor core unit (Processor Element, PE) accesses its private cache (Private Cache, PCache) and misses, the PCache issues a data read-write coherence request transaction to the DDCU of the shard to which the cache line CacheLine address belongs (the home DDCU). Each DDCU records the information of all cache lines CacheLine loaded into each PCache. Based on this information, the DDCU either generates new snoop retrieval command messages and issues them to the PCaches holding copies, forwarding the data returned by the snoops to the private cache of the original requesting node, or generates a new memory-access command message to the memory control unit and forwards the fetched data to the private cache of the original requesting node.
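The address-space sharding just described can be sketched as follows; the modulo-on-line-index mapping is an illustrative assumption, since the embodiment only requires a fixed mapping under which the home DCU of every cache line is complete and unique on the chip:

```cpp
#include <cstdint>

constexpr uint64_t kLineBytes = 64;  // cache line size from the description
constexpr int kNumDDCUs = 16;        // illustrative shard count

// Every physical address maps to exactly one home DDCU; shards never overlap.
int home_ddcu(uint64_t phys_addr) {
    uint64_t line_index = phys_addr / kLineBytes;
    return static_cast<int>(line_index % kNumDDCUs);
}
```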
The directory controller unit DCU, formed by all the home nodes DDCU X, is the key element for completing the coherence protocol: each home node DDCU X records the information of all cache lines CacheLine loaded into each private cache PCache. In general, each bit of the bit vector in a directory entry corresponds to one PCache copy of the cache line CacheLine; a bit-vector bit of 1 indicates that the corresponding PCache holds a copy of the cache line CacheLine. Fig. 5 shows a bit vector recording n copies: Bsy indicates the busy state, meaning that the previous transaction attached to the cache line CacheLine is still in progress and new request transactions cannot be handled for the time being; Vld indicates that the cache line CacheLine is in the valid state, meaning that at least one PCache in the many-core system holds a copy of the cache line CacheLine.
In this embodiment, when in step 1) the private cache PCache X responds to a request from its corresponding processing unit PE X: if the access request hits, the hit result is returned directly and processing exits; if the access request misses or data must be written back, the request message is marked as a dynamic credit request and stored in the normal queue waiting to be issued to the home node DDCU X, and at the moment a message in the normal queue is sent to the home node DDCU X, the private cache PCache also copies the message and deposits it at the tail of the local retransmit queue.
As shown in Fig. 6, in this embodiment two request-transaction queues are maintained in each private cache PCache: a normal queue and a retransmit queue. The retransmit queue works first-in first-out: whenever a new request transaction flows out of the normal queue to the on-chip interconnection network NoC, the transaction is also copied and deposited at the tail of the retransmit queue. Meanwhile, for a system containing M DCUs, a retry-counter group retryCounter [0:M-1] is also maintained in each private cache PCache X, ensuring a one-to-one correspondence with the DCUs. For a system containing N private caches PCache, each DCU maintains a retry buffer queue group [0:N-1], a retry-counter group retryCounter [0:N-1], and a credit-counter group creditCounter [0:N-1], ensuring a one-to-one correspondence with the PCaches. Each retry buffer queue has depth 1 and is used to temporarily hold a request transaction that has encountered a resource conflict on the DCU work pipeline (conflicts are of three kinds: the cache line CacheLine is undergoing self-replacement, directory store resources are exhausted, or Bsy is valid because the previous transaction on the cache line CacheLine has not yet finished).
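The queues and counter groups of Fig. 6 can be sketched as follows (a minimal model with assumed names and sizes):

```cpp
#include <array>
#include <deque>

constexpr int kNumDCUs = 8;  // M directory controllers in the system

struct RequestMsg { unsigned long long line_addr; int dest_dcu; };

// The two queues of Fig. 6 plus retryCounter[0:M-1], one counter per DCU.
struct PCacheQueues {
    std::deque<RequestMsg> normal_queue;
    std::deque<RequestMsg> retransmit_queue;    // first-in first-out
    std::array<int, kNumDCUs> retry_counter{};  // retryCounter group

    // Paths 1 and 2 of Fig. 6: when a message leaves the normal queue for
    // the NoC, a copy is deposited at the tail of the retransmit queue.
    // Precondition: the normal queue is non-empty.
    RequestMsg send_next() {
        RequestMsg m = normal_queue.front();
        normal_queue.pop_front();
        retransmit_queue.push_back(m);          // copy kept for possible resending
        return m;                               // handed to the NoC
    }
};
```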
In this embodiment, step 2) further includes, when sending the retry response message RetryAck to the private cache PCache X, incrementing the retryCounter counter of retry buffer [X] by 1, and the detailed steps of step 3) include:
3.1) After the private cache PCache X receives the retry response message RetryAck, it enters flow-control mode with the home node DDCU X: it suspends the sending of messages in the normal queue whose destination node is the home node DDCU X, and increments the corresponding counter retryCounter [X] by 1; before counter retryCounter [X] returns to 0, PCache X is not allowed to send dynamic credit requests Dreq to the home node DDCU X, a dynamic credit request Dreq being a data read-write coherence request transaction sent outside flow-control mode;
3.2) When the previous retried request transaction in retry buffer [X] of the home node DDCU X successfully flows out, the queue state of retry buffer [X] transitions to the free state; if the counter retryCounter value of retry buffer [X] is greater than 1, the home node DDCU X first sets retry buffer [X] into reserved mode and jumps to the next step;
3.3) The home node DDCU X sends a credit grant message Credit_Grant to the corresponding private cache PCache X;
3.4) After the private cache PCache X receives the credit grant message Credit_Grant, it takes the earliest of the request transactions previously discarded by the DCU, in chronological order, out of the retransmit queue, marks that data read-write coherence request transaction as a static credit message Sreq, and sends it to the home node DDCU X again, while the counter retryCounter [X] corresponding to the home node DDCU X is decremented by 1; a static credit message Sreq is a data read-write coherence request transaction sent under flow-control mode;
3.5) After the home node DDCU X receives the data read-write coherence request transaction marked as a static credit message Sreq, it sends snoop messages to the remaining private caches PCache to maintain the coherence of the corresponding cache line CacheLine, or completes the coherence maintenance of the corresponding cache line CacheLine by interacting with the memory controller;
3.6) Judge whether the counter retryCounter [X] of the home node DDCU X is 0; if it is 0, the private cache PCache X and the home node DDCU X exit flow-control mode; otherwise, jump back to step 3.3).
Taking a private cache PCache X and its corresponding home node DDCU X as an example, the directory controller works as follows:
S1) In the private cache PCache, request transaction messages generated by a Cache miss or a data write-back are marked as dynamic credit requests and stored in order in the normal queue, waiting to be sent to the NoC (path 1 in Fig. 6).
S2) In the private cache PCache, when a message of the normal queue is sent, a copy is deposited at the tail of the retransmit queue (path 2 in Fig. 6).
S3) In the home node DDCU X, when a request from the private cache PCache X is being processed, a request transaction that encounters a resource conflict (conflicts are of three kinds: the cache line CacheLine is undergoing self-replacement, directory store resources are exhausted, or Bsy is valid because the previous transaction on the cache line CacheLine has not yet finished) is deposited into the retry buffer [X] queue if retry buffer [X] is in the free state (path 3 in Fig. 6).
S4) In the home node DDCU X, in the above step S3), if retry buffer [X] is in the busy state, the request transaction message is discarded and a message with indication RetryAck is sent back to the private cache PCache X to inform it that the request transaction message has been dropped, while the retryCounter counter of retry buffer [X] is incremented by 1 (path 4 in Fig. 6).
S5) In the home node DDCU X, the request transaction deposited in retry buffer [X] is operated on according to the state-transition behavior shown in Fig. 7 (path 7 in Fig. 6).
S6) In the private cache PCache X, upon receiving the RetryAck message from DCU X, it enters flow-control mode, immediately suspends the sending of messages in the normal queue whose destination node is DCU X, and increments its counter retryCounter [X] by 1. Thereafter, before counter retryCounter [X] becomes 0, PCache X is not allowed to send dynamic credit requests Dreq to DCU X (path 5 in Fig. 6).
S7) In the home node DDCU X, according to the state transition diagram shown in Fig. 7, when the previous retried request transaction in retry buffer [X] successfully flows out, the queue state of retry buffer [X] transitions to the free state. At this time, if the retryCounter counter value of retry buffer [X] is greater than 1, the home node DDCU X first sets the queue into reserved mode, in which any dynamic credit request Dreq from PCache X that encounters a resource conflict is dropped and a message with indication RetryAck is sent to inform the private cache PCache X (path 4 in Fig. 6); next, the home node DDCU X sends a message with indication Credit_Grant to the private cache PCache X (path 8 in Fig. 6), and creditCounter [X] is incremented by 1.
S8) In the private cache PCache X, upon receiving a Credit_Grant message from the home node DDCU X, it learns that the home node DDCU X has reserved storage space for the request transactions dropped earlier and can accept their retransmission. The private cache PCache X takes the earliest of the request transactions previously dropped by the home node DDCU X, in chronological order, out of the retransmit queue, marks it as a static credit message Sreq, and sends it to DCU X again (path 6 in Fig. 6); at the same time the private cache PCache X decrements the retryCounter [X] counter corresponding to the home node DDCU X by 1. One Credit_Grant message corresponds to one retransmission. At this point, if the retryCounter [X] counter becomes 0, flow-control mode between the private cache PCache X and the home node DDCU X has been exited and the dynamic requests in the normal queue may be sent; if the retryCounter [X] counter is still greater than 0, the private cache PCache X continues to wait for the next Credit_Grant message.
S9) In the home node DDCU X, upon receiving a static request Sreq from the private cache PCache X: if the pipeline of the home node DDCU X encounters a resource conflict, the transaction is deposited into the already reserved retry buffer [X] (path 9 in Fig. 6) and later flows back into the work pipeline of the home node DDCU X according to the rule in step S5); if the pipeline of the home node DDCU X has no resource conflict and the request transaction completes smoothly, the corresponding creditCounter [X] is decremented by 1 and the reserved retry buffer [X] queue enters the free state. At this point, if counter retryCounter [X] is still greater than 0, the home node DDCU X again sets the queue into reserved mode and sends a Credit_Grant message to the private cache PCache X, waiting to process the next static request Sreq from the private cache PCache X; if counters retryCounter [X] and creditCounter [X] have both become 0, flow-control mode between the private cache PCache X and the home node DDCU X has been exited, and the home node DDCU X can process normal dynamic credit requests Dreq.
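The home-node side of steps S7) to S9) can be sketched as follows. This is a simplified model: the point at which the DDCU-side retryCounter [X] is decremented is not stated explicitly in the text, and the choice made here (one decrement per Credit_Grant sent) is marked as an assumption in the comments:

```cpp
#include <cstdio>

// Flow-control bookkeeping on the home-node side for one private cache X.
struct DdcuFlowCtl {
    bool buffer_reserved = false;  // reserved mode of retry buffer [X]
    int retry_counter = 0;         // retryCounter[X]: dropped requests to recover
    int credit_counter = 0;        // creditCounter[X]: grants awaiting an Sreq
};

void send_credit_grant(DdcuFlowCtl& fc) {
    fc.buffer_reserved = true;     // hold the depth-1 slot for the incoming Sreq
    fc.credit_counter += 1;        // per step S7: creditCounter[X] += 1
    fc.retry_counter -= 1;         // assumption: one drop accounted per grant; the
                                   // text states this decrement only on the PCache side
    std::printf("Credit_Grant -> PCache X\n");
}

// S7: the previously parked transaction drained and the buffer went free.
void on_retry_slot_drained(DdcuFlowCtl& fc) {
    if (fc.retry_counter > 1)      // condition as stated in step S7
        send_credit_grant(fc);
}

// S9: a static request Sreq completed without a new conflict.
void on_sreq_completed(DdcuFlowCtl& fc) {
    fc.credit_counter -= 1;
    fc.buffer_reserved = false;    // reserved retry buffer [X] returns to free
    if (fc.retry_counter > 0)      // more dropped requests still to collect
        send_credit_grant(fc);
    // retryCounter[X] == 0 && creditCounter[X] == 0 => flow-control mode exited
}
```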
As shown in Fig. 7, besides the free state, the queue states of retry buffer [X] include three busy states: the issued state, the wakeup state, and the sleep state. When the DCU pipeline detects a resource conflict and inserts a request transaction: if the insertion is caused by cache line CacheLine self-replacement or directory store resource exhaustion, the transaction in the queue enters the wakeup state directly; if the insertion is caused by the Bsy of the cache line CacheLine being valid, the transaction in the queue enters the sleep state directly, waits until the resource conflict is resolved and it is woken, and then enters the wakeup state. Only request transactions that have entered the wakeup state can be rescheduled into the DDCU work pipeline. Since the relationship between the DCU of an address-space shard and the private caches PCache is 1:N, the DCU scheduler can encounter the scenario of multiple retry queues waking up simultaneously; an issued state is therefore provided to indicate that a request transaction has been scheduled and selected (issue_select). According to the scheduling result: on success (issue_success) the queue enters the free state and can be used to store the request transaction of the next conflict; on failure (issue_fail) the queue re-enters the wakeup state and waits to be schedule-selected (issue_select) again. In this embodiment, the request transactions that a private cache PCache sends to a home DDCU are divided into two classes: dynamic credit requests (Dynamic Credited Request, Dreq) and static credit requests (Static Credited Request, Sreq). Before entering flow-control mode, a private cache PCache defaults to the dynamic credit model and sends request transactions to the home DDCU at an arbitrary rate; after entering flow-control mode, a private cache PCache may only send static credit request transactions to the home DDCU in a controlled manner. The home DDCU interacts with the private cache PCache through messages with indication RetryAck to make it enter flow-control mode, and through messages with indication Credit_Grant to make it retransmit, in static-credit fashion, the request transactions previously dropped by the DCU for processing.
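The Fig. 7 state machine can be sketched as follows (enum and function names are illustrative, not the patent's signal names):

```cpp
// Retry buffer [X] queue states: free plus the three busy states of Fig. 7.
enum class RetryBufState { Free, Wakeup, Sleep, Issued };

enum class InsertCause { SelfReplacement, DirectoryFull, LineBsy };

// Insertion: replacement/exhaustion conflicts wake immediately; a Bsy
// conflict sleeps until the prior transaction on the line finishes.
RetryBufState on_insert(InsertCause cause) {
    return (cause == InsertCause::LineBsy) ? RetryBufState::Sleep
                                           : RetryBufState::Wakeup;
}

// Bsy cleared: the sleeping transaction becomes eligible for scheduling.
RetryBufState on_bsy_cleared() { return RetryBufState::Wakeup; }

// issue_select moves a wakeup entry to issued; the scheduling result then
// decides between free (issue_success) and wakeup again (issue_fail).
RetryBufState on_issue_select() { return RetryBufState::Issued; }
RetryBufState on_issue_result(bool success) {
    return success ? RetryBufState::Free : RetryBufState::Wakeup;
}
```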
As shown in Figure 8, the work pipeline in the home node DDCU X drops the Dreq_read0 request and the Dreq_read1 request because of a resource conflict and informs the private cache PCache X of this by returning two RetryAck messages; thereafter the home node DDCU X and the private cache PCache X enter flow-control mode. After the resource conflict is resolved, a first Credit_Grant message is launched to the private cache PCache X in preparation for receiving the first static request, Sreq_read0. After the Sreq_read0 read transaction completes, the home node DDCU X sends another Credit_Grant message to the private cache PCache X in preparation for receiving the next static request, Sreq_read1. After the Sreq_read1 read transaction flows smoothly into the work pipeline of the home node DDCU X, the home node DDCU X and the private cache PCache X exit flow-control mode, and the private cache PCache X begins sending new dynamic credit request transactions, the Dreq_read2 request and the Dreq_read3 request, to the home node DDCU X.
In addition, this embodiment also provides a device for implementing cache data consistency based on a distributed limited directory, comprising a many-core microprocessor programmed to perform the steps of the method of this embodiment for implementing cache data consistency based on a distributed limited directory, which are not described again here.
The above is only a preferred embodiment of the present invention; the protection scope of the present invention is not limited to the above embodiment, and all technical solutions under the concept of the present invention fall within the protection scope of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications made without departing from the principles of the present invention should also be regarded as falling within the protection scope of the present invention.
Claims (4)
1. A method for implementing cache data consistency based on a distributed limited directory, characterized in that the implementation steps include:
1) The private cache PCache X responds to requests from its corresponding processing unit PE X, and when a request misses or data must be written back, issues a data read-write coherence request transaction to the corresponding home node DDCU X;
2) The home node DDCU X processes the data read-write coherence request transaction; when the data read-write coherence request transaction encounters a resource conflict, it is preferentially stored in retry buffer [X] to be handled later if retry buffer [X] is in the free state, otherwise the data read-write coherence request transaction is discarded and a retry response message RetryAck is sent to the private cache PCache X;
3) Flow-control mode is entered between the private cache PCache X and the home node DDCU X; under flow-control mode, when retry buffer [X] is in the free state, the home node DDCU X sends credit grant messages Credit_Grant to the private cache PCache X one at a time, and for each Credit_Grant sent, the private cache PCache X retransmits to the home node DDCU X one data read-write coherence request transaction previously discarded by the home node DDCU X, which the home node DDCU X then processes, until all data read-write coherence request transactions discarded by the home node DDCU X have been fully completed, whereupon flow-control mode is exited.
2. The method for implementing cache data consistency based on a distributed limited directory according to claim 1, characterized in that, when in step 1) the private cache PCache X responds to a request from its corresponding processing unit PE X: if the access request hits, the hit result is returned directly and processing exits; if the access request misses or data must be written back, the request message is marked as a dynamic credit request and stored in the normal queue waiting to be issued to the home node DDCU X, and at the moment a message in the normal queue is sent to the home node DDCU X, the private cache PCache also copies the message and deposits it at the tail of a local retransmit queue.
3. The method for implementing cache data consistency based on a distributed limited directory according to claim 2, characterized in that step 2) further includes, when sending the retry response message RetryAck to the private cache PCache X, incrementing the retryCounter counter of retry buffer [X] by 1, and that the detailed steps of step 3) include:
3.1) After the private cache PCache X receives the retry response message RetryAck, it enters flow-control mode with the home node DDCU X: it suspends the sending of messages in the normal queue whose destination node is the home node DDCU X, and increments the corresponding counter retryCounter [X] by 1; before counter retryCounter [X] returns to 0, PCache X is not allowed to send dynamic credit requests Dreq to the home node DDCU X, a dynamic credit request Dreq being a data read-write coherence request transaction sent outside flow-control mode;
3.2) When the previous retried request transaction in retry buffer [X] of the home node DDCU X successfully flows out, the queue state of retry buffer [X] transitions to the free state; if the counter retryCounter value of retry buffer [X] is greater than 1, the home node DDCU X first sets retry buffer [X] into reserved mode and jumps to the next step;
3.3) The home node DDCU X sends a credit grant message Credit_Grant to the corresponding private cache PCache X;
3.4) After the private cache PCache X receives the credit grant message Credit_Grant, it takes the earliest of the request transactions previously discarded by the DCU, in chronological order, out of the retransmit queue, marks that data read-write coherence request transaction as a static credit message Sreq, and sends it to the home node DDCU X again, while the counter retryCounter [X] corresponding to the home node DDCU X is decremented by 1; a static credit message Sreq is a data read-write coherence request transaction sent under flow-control mode;
3.5) After the home node DDCU X receives the data read-write coherence request transaction marked as a static credit message Sreq, it sends snoop messages to the remaining private caches PCache to maintain the coherence of the corresponding cache line CacheLine, or completes the coherence maintenance of the corresponding cache line CacheLine by interacting with the memory controller;
3.6) Judge whether the counter retryCounter [X] of the home node DDCU X is 0; if it is 0, the private cache PCache X and the home node DDCU X exit flow-control mode; otherwise, jump back to step 3.3).
4. A device for implementing cache data consistency based on a distributed limited directory, comprising a many-core microprocessor, characterized in that the many-core microprocessor is programmed to perform the steps of the method for implementing cache data consistency based on a distributed limited directory according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810719059.4A CN108920192B (en) | 2018-07-03 | 2018-07-03 | Method and device for realizing cache data consistency based on distributed limited directory |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810719059.4A CN108920192B (en) | 2018-07-03 | 2018-07-03 | Method and device for realizing cache data consistency based on distributed limited directory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108920192A true | 2018-11-30 |
CN108920192B CN108920192B (en) | 2021-07-30 |
Family
ID=64423600
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810719059.4A Active CN108920192B (en) | 2018-07-03 | 2018-07-03 | Method and device for realizing cache data consistency based on distributed limited directory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108920192B (en) |
Cited By (4)
* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN111651375A (en) * | 2020-05-22 | 2020-09-11 | 中国人民解放军国防科技大学 | Method and system for realizing data consistency of multi-processor cache based on distributed limited directory |
CN114153756A (en) * | 2021-12-03 | 2022-03-08 | 中国人民解放军国防科技大学 | A Configurable Micro-Operation Mechanism for Multicore Processor Directory Protocol |
CN114416798A (en) * | 2022-01-20 | 2022-04-29 | 上海金融期货信息技术有限公司 | Cache management method and device based on data dependency and consistency guarantee |
CN114546263A (en) * | 2022-01-23 | 2022-05-27 | 苏州浪潮智能科技有限公司 | Data storage method, system, device and medium |
Citations (2)
* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN101336419A (en) * | 2006-01-31 | 2008-12-31 | 富士通株式会社 | Memory access control device and memory access control method |
US20140244939A1 (en) * | 2013-02-28 | 2014-08-28 | Industry & Academic Cooperation Group Of Sejong University | Texture cache memory system of non-blocking for texture mapping pipeline and operation method of texture cache memory |
- 2018
- 2018-07-03 CN CN201810719059.4A patent/CN108920192B/en active Active
Patent Citations (2)
* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN101336419A (en) * | 2006-01-31 | 2008-12-31 | 富士通株式会社 | Memory access control device and memory access control method |
US20140244939A1 (en) * | 2013-02-28 | 2014-08-28 | Industry & Academic Cooperation Group Of Sejong University | Texture cache memory system of non-blocking for texture mapping pipeline and operation method of texture cache memory |
Non-Patent Citations (2)
* Cited by examiner, † Cited by third party

Title
---|
JOHN GIACOMONI et al.: "FastForward for Efficient Pipeline Parallelism: A Cache-Optimized Concurrent Lock-Free Queue", Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2008 *
WU Junjie (吴俊杰): "DOOC: a software-hardware cooperatively managed Cache that effectively eliminates thrashing" (DOOC:一种能够有效消除抖动的软硬件合作管理Cache), Journal of Computer Research and Development (计算机研究与发展) *
Cited By (7)
* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN111651375A (en) * | 2020-05-22 | 2020-09-11 | 中国人民解放军国防科技大学 | Method and system for realizing data consistency of multi-processor cache based on distributed limited directory |
CN114153756A (en) * | 2021-12-03 | 2022-03-08 | 中国人民解放军国防科技大学 | A Configurable Micro-Operation Mechanism for Multicore Processor Directory Protocol |
CN114153756B (en) * | 2021-12-03 | 2024-09-24 | 中国人民解放军国防科技大学 | Configurable micro-operation mechanism oriented to multi-core processor directory protocol |
CN114416798A (en) * | 2022-01-20 | 2022-04-29 | 上海金融期货信息技术有限公司 | Cache management method and device based on data dependency and consistency guarantee |
CN114416798B (en) * | 2022-01-20 | 2024-11-12 | 上海金融期货信息技术有限公司 | Cache management method and device based on data dependency and consistency guarantee |
CN114546263A (en) * | 2022-01-23 | 2022-05-27 | 苏州浪潮智能科技有限公司 | Data storage method, system, device and medium |
CN114546263B (en) * | 2022-01-23 | 2023-08-18 | 苏州浪潮智能科技有限公司 | Data storage method, system, equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN108920192B (en) | 2021-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6944983B2 (en) | 2021-10-06 | Hybrid memory management |
CN108885583B (en) | 2022-08-30 | Cache memory access |
JP6314355B2 (en) | 2018-04-25 | Memory management method and device |
CN108920192A (en) | 2018-11-30 | Method and device for implementing cache data consistency based on distributed finite directory |
JP5006348B2 (en) | 2012-08-22 | Multi-cache coordination for response output cache |
CN101604295B (en) | 2013-03-27 | Optimizing concurrent accesses in a directory-based coherency protocol |
AU2013217351B2 (en) | 2016-04-28 | Processor performance improvement for instruction sequences that include barrier instructions |
EP2686774B1 (en) | 2017-02-01 | Memory interface |
US20080082759A1 (en) | 2008-04-03 | Global address space management |
US20160275015A1 (en) | 2016-09-22 | Computing architecture with peripherals |
US8375171B2 (en) | 2013-02-12 | System and method for providing L2 cache conflict avoidance |
WO2004095291A2 (en) | 2004-11-04 | Cache allocation upon data placement in network interface |
CN105242872B (en) | 2018-06-12 | A kind of shared memory systems of Virtual cluster |
TW200809608A (en) | 2008-02-16 | Method and apparatus for tracking command order dependencies |
CN103455371B (en) | 2017-03-01 | The method and system of message communicating between for minor node in the tube core of optimization |
EP1021764A1 (en) | 2000-07-26 | I/o forwarding in a cache coherent shared disk computer system |
CN103399824A (en) | 2013-11-20 | Method and device for holding cache miss states of caches in processor of computer |
CN109992566A (en) | 2019-07-09 | A kind of file access method, device, equipment and readable storage medium storing program for executing |
US20080082622A1 (en) | 2008-04-03 | Communication in a cluster system |
KR102220468B1 (en) | 2021-02-25 | Preemptive cache post-recording with transaction support |
US10917198B2 (en) | 2021-02-09 | Transfer protocol in a data processing network |
US10664398B2 (en) | 2020-05-26 | Link-level cyclic redundancy check replay for non-blocking coherence flow |
US11449489B2 (en) | 2022-09-20 | Split transaction coherency protocol in a data processing system |
CN111651375A (en) | 2020-09-11 | Method and system for realizing data consistency of multi-processor cache based on distributed limited directory |
US20190042342A1 (en) | 2019-02-07 | Techniques for managing a hang condition in a data processing system with shared memory |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2018-11-30 | PB01 | Publication | |
2018-12-25 | SE01 | Entry into force of request for substantive examination | |
2021-07-30 | GR01 | Patent grant | |