phabricator.wikimedia.org

⚓ T120867 Large amounts of unwanted files (mostly copyvios) uploaded via cross-wiki upload tool (A/B test of different upload interfaces)

  • ️Tue Dec 08 2015


@matmarex: thanks for your honest answer. I followed your attempts to make this thing work, and I respect the work you dedicated to it.

Let's do the same analysis as before, now for cross-wiki uploads from 08.03.2016 via Quarry (User:Gunnex/Cross-wiki uploads 08.03.2016), because I have been working intensively on that data since May 2016.

Here we have fewer uploads and only one day to check (which may also be influenced by (un)lucky circumstances), but the aim is just to find a "trend". I already posted some interim results here before, but we now have an almost stable data situation.

  • 1st run on 10.05.2016: 1.289** rows, 902 living (87 pending deletion), 387 deleted.
  • 2nd run on 12.05.2016: 1.309** rows, 846 living (180 pending deletion), 463 deleted.
  • 3rd run on 15.05.2016: 1.317** rows, 813 living (252 pending deletion), 504 deleted.
  • 4th run on 29.05.2016: 1.291** rows, 643 living (97 pending deletion), 648 deleted.
  • 5th run on 01.06.2016: 1.299** rows, 522 living (146 pending deletion), 777 deleted.
  • 6th run on 07.06.2016: 1.279** rows, 446 living (73 pending deletion), 833 deleted.
  • 7th run on 27.07.2016: 1.276** rows, 363 living (0 pending deletion), 913 deleted.

--> **) including some duplicate entries due to bug T140522

...reaching an overall bad ratio of 71,55 % (copyvios/PS/perm./source/etc.) for this period.
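The arithmetic behind that figure can be re-checked with a short sketch. The numbers are copied from the seven runs listed above; the names `RUNS` and `deleted_share` are illustrative only and are not part of the Quarry query.

```python
# (run date, total rows, living, pending deletion, deleted), copied from the runs above
RUNS = [
    ("10.05.2016", 1289, 902, 87, 387),
    ("12.05.2016", 1309, 846, 180, 463),
    ("15.05.2016", 1317, 813, 252, 504),
    ("29.05.2016", 1291, 643, 97, 648),
    ("01.06.2016", 1299, 522, 146, 777),
    ("07.06.2016", 1279, 446, 73, 833),
    ("27.07.2016", 1276, 363, 0, 913),
]


def deleted_share(rows, deleted):
    """Percentage of all listed uploads that have been deleted."""
    return 100.0 * deleted / rows


for date, rows, living, pending, deleted in RUNS:
    # Sanity check: every row is either still living or deleted.
    assert living + deleted == rows
    print(f"{date}: {deleted_share(rows, deleted):.2f} % deleted, {pending} still pending")

# The last run gives the overall bad ratio quoted in the text.
final_ratio = round(deleted_share(RUNS[-1][1], RUNS[-1][4]), 2)
```

The deleted share only moves upwards from run to run as pending deletions are processed, which is why the last run is used as the "almost stable" figure.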

The 913 files were deleted for the following reasons (per the "DeletionReason" column):

| DeletionReason | copyvios | no-permission | no-source | no-license | deletion requests* | others** | total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Files | 407 | 82 | 6 | 0 | 401 | 17 | 913 |

--> *) multiple issues (copyvios/permission/sources/project scope)
--> **) multiple issues (duplicates, attack images, user errors, project scope, etc.)

Regarding user registration we have:

Uploaded files, depending on registration date:

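| Registration date | 08.03.2016 | 07.03.2016 | 06.03.2016 | 05.03.2016 | 04.03.2016–01.01.2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010–2008 | total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Files | 778 | 114 | 11 | 1 | 239 | 63 | 18 | 20 | 3 | 19 | 10 | 1.276 |
| Deleted | 569 | 95 | 7 | 1 | 157 | 48 | 14 | 10 | 2 | 7 | 3 | 913 |
| Bad ratio (%) | 73,14 | 83,33 | 63,64 | 100,00 | 65,69 | 76,19 | 77,78 | 50,00 | 66,67 | 36,84 | 30,00 | |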

904 uploads (70,85 %) by fresh users registered 08.03. – 05.03.2016 account for 672 deleted files = 73,60 % of all deletions. Of these users' own uploads, 672 of 904 (74,34 %) were deleted, which means that roughly one in every 1,35 files uploaded by these users was "bad".
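The cohort figures can be re-derived from the registration table above. The sketch below keeps three percentages apart that are easy to mix up: the fresh users' share of all uploads, their share of all deletions, and the bad ratio of their own uploads. The variable names are illustrative only.

```python
# (registration date, files, deleted), copied from the registration table above
COHORTS = [
    ("08.03.2016", 778, 569),
    ("07.03.2016", 114, 95),
    ("06.03.2016", 11, 7),
    ("05.03.2016", 1, 1),
    ("04.03.2016-01.01.2016", 239, 157),
    ("2015", 63, 48),
    ("2014", 18, 14),
    ("2013", 20, 10),
    ("2012", 3, 2),
    ("2011", 19, 7),
    ("2010-2008", 10, 3),
]

# "Fresh" users: registered 05.03.-08.03.2016, i.e. the first four cohorts.
fresh = COHORTS[:4]

total_files = sum(files for _, files, _ in COHORTS)
total_deleted = sum(deleted for _, _, deleted in COHORTS)
fresh_files = sum(files for _, files, _ in fresh)
fresh_deleted = sum(deleted for _, _, deleted in fresh)

share_of_uploads = 100.0 * fresh_files / total_files        # fresh uploads / all uploads
share_of_deletions = 100.0 * fresh_deleted / total_deleted  # fresh deletions / all deletions
fresh_bad_ratio = 100.0 * fresh_deleted / fresh_files       # fresh deletions / fresh uploads

print(f"fresh users: {fresh_files} uploads, {fresh_deleted} deleted")
print(f"share of uploads:   {share_of_uploads:.2f} %")
print(f"share of deletions: {share_of_deletions:.2f} %")
print(f"own bad ratio:      {fresh_bad_ratio:.2f} %")
```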

Again, older accounts (which may be more familiar with Wikipedia's policies) also show high bad ratios, but here we have a smaller data base to compare.

As you can see via User:Gunnex/Cross-wiki uploads from pt.wikipedia.org, I am checking (with some backlog) especially cross-wiki uploads from my (ex-) home wiki: ptwiki. It was not surprising to find a bad ratio of around 85,00 % (10.2015 – 04.2016) for cross-wiki uploads from pt.wikipedia.org.

Ok, ptwiki is one of the "bad wikis" – but who else?
I managed to adapt the Quarry for 08.03.2016 with a new column indicating the origin of the cross-wiki upload, and did the same for 15.07.–18.07.2016 for comparison.

The numbers:
(ignoring wikis with fewer than 10 uploads)

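| wiki | 08.03.2016: Uploads | 08.03.2016: Deleted | 08.03.2016: Bad ratio (%) | 15.07.–18.07.2016: Uploads | 15.07.–18.07.2016: Deleted | 15.07.–18.07.2016: Pending deletion | 15.07.–18.07.2016: Bad ratio (%)* |
| --- | --- | --- | --- | --- | --- | --- | --- |
| arwiki | 16 | 14 | 87,50 | 50 | 17 | 25 | 84,00 |
| cswiki | 12 | 4 | 33,33 | 17 | 5 | 2 | 41,18 |
| dewiki | 56 | 29 | 51,79 | 118 | 33 | 43 | 64,41 |
| elwiki | 13 | 3 | 23,08 | 12 | 5 | 6 | 91,67 |
| enwiki | 549 | 415 | 75,59 | 700 | 260 | 209 | 67,00 |
| eswiki | 119 | 96 | 80,67 | 146 | 52 | 54 | 72,60 |
| frwiki | 95 | 58 | 61,05 | 84 | 23 | 33 | 66,67 |
| huwiki | 40 | 24 | 60,00 | 38 | 0 | 10 | 26,32 |
| itwiki | 38 | 31 | 81,59 | 60 | 17 | 22 | 65,00 |
| jawiki | 11 | 8 | 72,73 | 14 | 4 | 1 | 35,71 |
| mnwiki | 23 | 23 | 100,00 | na | na | na | na |
| nlwiki | 16 | 13 | 81,25 | 21 | 6 | 7 | 61,91 |
| plwiki | 3 | 1 | 33,33 | 52 | 14 | 18 | 61,54 |
| ptwiki | 33 | 28 | 84,85 | 77 | 28 | 29 | 74,03 |
| ruwiki | 58 | 46 | 79,31 | 105 | 25 | 37 | 59,05 |
| srwiki | 10 | 9 | 90,00 | 1 | 0 | 0 | 0,00 |
| svwiki | 10 | 5 | 50,00 | 12 | 7 | 4 | 91,67 |
| trwiki | 21 | 15 | 71,43 | 39 | 23 | 14 | 94,87 |
| ukwiki | 29 | 10 | 34,48 | 29 | 18 | 2 | 68,97 |
| zhwiki | 14 | 11 | 78,57 | 24 | 12 | 2 | 58,33 |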

--> *) taking into account also the pending deletion
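As a rough sketch of how the starred bad ratio is formed, the snippet below counts deleted plus pending-deletion files against uploads per wiki for 15.07.–18.07.2016 and ranks the wikis. The data is copied from the table above (mnwiki has no data for that period and is left out); `JULY_2016`, `MIN_UPLOADS` and `bad_ratio` are illustrative names, not part of the Quarry query.

```python
# (wiki, uploads, deleted, pending deletion) for 15.07.-18.07.2016, from the table above
JULY_2016 = [
    ("arwiki", 50, 17, 25), ("cswiki", 17, 5, 2), ("dewiki", 118, 33, 43),
    ("elwiki", 12, 5, 6), ("enwiki", 700, 260, 209), ("eswiki", 146, 52, 54),
    ("frwiki", 84, 23, 33), ("huwiki", 38, 0, 10), ("itwiki", 60, 17, 22),
    ("jawiki", 14, 4, 1), ("nlwiki", 21, 6, 7), ("plwiki", 52, 14, 18),
    ("ptwiki", 77, 28, 29), ("ruwiki", 105, 25, 37), ("srwiki", 1, 0, 0),
    ("svwiki", 12, 7, 4), ("trwiki", 39, 23, 14), ("ukwiki", 29, 18, 2),
    ("zhwiki", 24, 12, 2),
]

MIN_UPLOADS = 10  # assumption: apply the 10-upload cut-off to this period on its own


def bad_ratio(uploads, deleted, pending=0):
    """Percentage of uploads that are deleted or pending deletion."""
    return 100.0 * (deleted + pending) / uploads


# Rank wikis by bad ratio, highest first, skipping wikis below the cut-off.
ranking = sorted(
    ((wiki, round(bad_ratio(u, d, p), 2)) for wiki, u, d, p in JULY_2016 if u >= MIN_UPLOADS),
    key=lambda item: item[1],
    reverse=True,
)

for wiki, ratio in ranking:
    print(f"{wiki:8s} {ratio:6.2f} %")
```

Counting pending deletions as "bad" assumes that most of them will end in deletion, which the 08.03.2016 runs suggest but do not guarantee.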

Well, eswiki was already on my "watch-list" before, due to a cultural (let's say...) "spontaneity" comparable to ptwiki's (in other words: they don't care, and they may have heard about "copyrights" but they ignore them), and arwiki, for example, is probably a typical case of "I don't care + I never heard of 'copyrights'". On the other hand there is dewiki, which has the merit of probably being the most reliable wiki version, but which is also "equipped" with some users who fall into the group of "I don't care / I heard about it, but... / I don't know", as are other "big" wikis such as fr/it/nl/ru/etc.

Or in other words: all wikis are somehow "bad" and/or "not so good", some more, some less. And the bad ratios of cross-wiki uploads from enwiki, made by users around the world, are a quite representative cross-section of user behaviour worldwide (and they confirm the bad ratios mentioned above). So deactivating the cross-wiki upload tool only on some "critical" wikis (which could also be an option) most likely would not solve the whole mess (and... well... users may also switch to a wiki where the tool is still active, gaming the system).

So, citing myself:

So, in other words: the cross-wiki upload tool is, in the vast majority of cases, a perfect tool for users who want to quickly illustrate/promote/etc. something spontaneously on Wikipedia, ignoring any concerns about copyrights. Just grab it from the Internet. It is obvious that WMF is trying to establish a somehow social-media-like thing, imitating Facebook & Co... – which is going completely wrong.

And that's a global problem.

(...) Requesting wiki configuration changes says we need community consensus, so I suppose we should start an RFC at Commons:Village pump/Proposals?

Probably yes (but not by me).