phabricator.wikimedia.org

⚓ T194697 Multiblocks — Allow for multiple, simultaneous blocks with different expiration dates

️Mon May 14 2018

Community Tech are looking at this, following its high ranking in the 2023 Community Wishlist Survey.

Three people have said to me that we should consider just dropping the unique index constraint. But I'm still pretty keen on keeping the block conflict feature I introduced in 2006 (1d9922db6442dec82e4be07c45c2e8a3a2035602). In the multiblocks context, the feature means that if two admins try to block a user or IP address simultaneously, and the user has no prior blocks, one admin should succeed and the other should get a confirmation page with details of the first block. If we just remove the unique index, both admins would succeed in creating a block.

I can think of three approaches to supporting multiple blocks with block conflict prevention:

1. Gap locking with FOR UPDATE

The unique index is dropped, and we use gap locking to prevent insertion of rows with the same (or adjacent) addresses.

BEGIN;
SELECT COUNT(*) FROM ipblocks WHERE ipb_address='...' FOR UPDATE;
INSERT INTO ipblocks ...;
COMMIT;

The problem is that it will generate deadlocks, both in the conflict case and when adjacent IP addresses or users are blocked.

According to the MySQL manual, deadlocks are not meant to be a big deal. You can just reissue the transaction. But this conflicts with MediaWiki's current transaction system, in which there is a single transaction for the whole web request, with no control over it except via DeferredUpdates. This makes it difficult to write deadlock-tolerant code.

2. Denormalized summary table

The target information is duplicated into a summary table. The unique index is dropped from ipblocks, but the summary table has a unique index.

BEGIN;
INSERT INTO ipblocks_summary (ips_address, ips_count) VALUES (...);
INSERT INTO ipblocks (ipb_address, ...) VALUES (...);
COMMIT;

Secondary block insertion:

BEGIN;
UPDATE ipblocks_summary SET ips_count=2 WHERE ips_address='...', ips_count=1;
INSERT INTO ipblocks ...;
COMMIT;

3. Normalized schema

This is my idea from 2018 T194697#4490345. Although we could keep the ipblocks table and field names the same for easier migration. The plan would be to add the summary table as in option 2 (now called block_target), then add ipb_target, then drop the duplicated fields from ipblocks such as ipb_address.

BEGIN;
INSERT INTO block_target ...;
INSERT INTO ipblocks (ipb_target, ...) VALUES ($bt_id, ...);
COMMIT;

Secondary block insertion:

BEGIN;
UPDATE block_target SET bt_count=2 WHERE bt_id=$id..., bt_count=1;
INSERT INTO ipblocks (ipb_target, ...) VALUES ($id, ...);
COMMIT;

Migration options:

Use SCHEMA_COMPAT_xxx modes. Complex code, slow migration, zero impact on users.
Use a single feature flag. For each section in turn, switch the wikis in the section to read-only mode, update the schema, then switch the feature flag.
Cowboy olden days option. Just update the code and do the schema update simultaneously with a train deployment.

Considerations:

Code review for migration complexity.
Schema change time. The size of ipblocks varies, see T267818.

Bonus level: fixing polymorphic ipblocks

@daniel noted that user blocks and IP blocks are not really the same. In terms of schema design, ipblocks is a polymorphic table, since it is trying to represent at least two types of target (IP range and user). For an IP block, ipb_user is zero. for a user block, ipb_range_start is empty and ipb_address contains the denormalized username, requiring a special case in RenameuserSQL. Arguably single IP addresses and IP ranges are distinct kinds of blocks. There is a fixme in tables.json saying that, for efficiency, ipb_range_start/ipb_range_end should be empty for single-IP blocks. In the PHP code, they have distinct type constants Block::TYPE_IP and Block::TYPE_RANGE.

The conventional representation for polymorphic data, referring to e.g. Polymorphic Association – bad SQL Smell! by Tom Gillies, is to have separate tables for separate types, and to reverse the direction of the foreign key association. In such a design, the block_target table would just have a primary key, a count, and maybe a type enum, and we would have tables such as blocked_user with bu_target → block_target.bt_id.

It would be a far-reaching change to the conceptual model, ideally including a split of the DatabaseBlock class.