phabricator.wikimedia.org

T173772 Create mechanism to update categories database in graph storage

Mon Aug 21 2017

As categories change, we need to update the contents of the graph database hosting them. For this, we need to figure out a mechanism for applying those updates.

Event Timeline

Current thinking is:

  • Every day, create an RDF description of the updated categories as a SPARQL Update file.
  • Load it into Blazegraph once it has been created.
  • This will be done for each wiki that has the functionality enabled.
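The daily step above could look roughly like the following minimal sketch, which turns a set of changed categories into one SPARQL Update document and wraps it in an HTTP request for Blazegraph. It assumes a simple data model in which each category is identified by its page URI and linked to its parent categories with a `mediawiki:isInCategory` predicate. The prefix IRI, the predicate and class names, the `CategoryChange` record and the endpoint URL are all illustrative assumptions, not the actual schema or deployment.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode
from urllib.request import Request

# Assumed vocabulary for illustration; the real categories RDF schema may differ.
PREFIXES = "PREFIX mediawiki: <https://www.mediawiki.org/ontology#>\n"


@dataclass
class CategoryChange:
    """Hypothetical record of one category's state after a day's edits."""
    uri: str                                          # category page URI
    parents: list[str] = field(default_factory=list)  # parent category URIs
    deleted: bool = False                             # category page no longer exists


def build_sparql_update(changes: list[CategoryChange]) -> str:
    """Build one SPARQL Update request that replaces each changed category's
    outgoing triples (incoming links from child categories are left alone)."""
    ops = []
    for ch in changes:
        # Remove everything the store currently says about this category...
        ops.append(f"DELETE WHERE {{ <{ch.uri}> ?p ?o }}")
        if ch.deleted:
            continue
        # ...then re-insert its type and current parent links.
        triples = f"<{ch.uri}> a mediawiki:Category . " + " ".join(
            f"<{ch.uri}> mediawiki:isInCategory <{p}> ." for p in ch.parents
        )
        ops.append(f"INSERT DATA {{ {triples} }}")
    # Operations in one SPARQL Update request are separated by ';'.
    return PREFIXES + " ;\n".join(ops)


def blazegraph_request(update: str, endpoint: str) -> Request:
    """Wrap the update in a form-encoded POST ('update=' parameter), one of the
    forms the SPARQL 1.1 Protocol defines for updates."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned `Request` to `urllib.request.urlopen` would send it; a placeholder endpoint such as `http://localhost:9999/bigdata/namespace/categories/sparql` stands in for whatever namespace is actually used. Rewriting each changed category wholesale (delete, then insert) avoids having to diff old and new parent lists, at the cost of touching a few more triples.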

Reality check:
enwiki appears to have had 73,662 category updates and 498 category creations on August 19th 2017, and other days show similar numbers. That seems a perfectly workable volume to process daily. Moreover, many of the updates hit the same categories: the real number of distinct categories updated on enwiki appears to be around 25K/day.
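Because many of the day's edits hit the same categories, the daily file only needs one entry per distinct category. A minimal sketch of that coalescing step, assuming a hypothetical change log of `(timestamp, category, action)` tuples where the most recent action for a category wins:

```python
from typing import Iterable


def coalesce(events: Iterable[tuple[int, str, str]]) -> dict[str, str]:
    """Collapse a day's change log to the latest action per category.

    events: hypothetical (timestamp, category_title, action) tuples, where
    action is "create", "update" or "delete". Events may arrive out of order.
    """
    latest: dict[str, tuple[int, str]] = {}
    for ts, cat, action in events:
        # Keep only the most recent event seen for each category.
        if cat not in latest or ts >= latest[cat][0]:
            latest[cat] = (ts, action)
    return {cat: action for cat, (_, action) in latest.items()}


events = [
    (1, "Physics", "update"),
    (2, "Physics", "update"),
    (3, "Chemistry", "create"),
    (4, "Physics", "delete"),
]
print(coalesce(events))  # {'Physics': 'delete', 'Chemistry': 'create'}
```

If the output file replaces each category's triples wholesale, "create" and "update" can be handled identically, so only the final state of each category matters. With the figures above, collapsing 73,662 events to roughly 25K distinct categories would shrink the daily update about threefold.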

On Commons, the numbers seem to be about 2-3x these for modifications and about 5x for creations. That still seems workable, and Commons is probably the upper bound of what we are going to see.
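The Commons estimate can be made concrete with a quick back-of-envelope calculation that scales the enwiki figures for August 19th 2017 by the factors quoted above (2-3x for modifications, 5x for creations). Spreading the total evenly over the day is a simplifying assumption, since real edit traffic is bursty, and these are raw event counts before any per-category deduplication.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# enwiki figures for August 19th 2017, from the comment above.
ENWIKI_UPDATES = 73_662
ENWIKI_CREATIONS = 498


def commons_estimate(update_factor: float, creation_factor: float = 5.0) -> dict[str, float]:
    """Scale enwiki's daily category changes to an estimated Commons load."""
    updates = ENWIKI_UPDATES * update_factor
    creations = ENWIKI_CREATIONS * creation_factor
    total = updates + creations
    return {
        "updates": updates,
        "creations": creations,
        "per_second": total / SECONDS_PER_DAY,  # average, assuming an even spread
    }


for factor in (2, 3):
    est = commons_estimate(factor)
    print(factor, est["updates"], est["creations"], round(est["per_second"], 2))
```

This gives roughly 150K to 223K raw change events per day, or about 1.7 to 2.6 per second on average, which supports the conclusion that a daily batch remains manageable.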

Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses.