mirbsd.org

MirBSD: Welcome at MirBSD

  • Fri Jul 26 2024

Updated 2024-10-04.

I haven’t made a secret of my stance towards ML/LLM, so-called “AI” or “GenAI” or, sometimes, even “AGI”. In fact, I require contributors to confirm that they have not leveraged AI to create the proposed changes; this wording was coined by the GotoSocial project, who state:

We will not accept changes (code or otherwise) created with the aid of "AI" tooling. "AI" models are trained at the expense of underpaid workers filtering inputs of abhorrent content, and does not respect the owners of input content. Ethically, it sucks.

(Note: I interpret this, and I’m fairly sure they do too, as meaning not only that the contribution itself must not consist of AI sludge, but also that everyone who contributed to it is forbidden from using AI tools while creating it.)

This is a harsh stance, but at this point it’s better than taking any chances. Besides the ethical problems, it’s ecologically worse than merely unsustainable. Don’t even try out an ML/LLM tool just to prove how much they suck: their abuse of resources (energy, even clean water!) is simply too horrid for that. See also this post and this thread.

And then there are the copyright and related aspects, not limited to ripping off creatives and damaging the minds of readers. I put interpretation guidelines on the webpage of The MirOS Licence stating that model output (a collage) is naturally a derivative work of the inputs and therefore subject to the licences of those inputs (“training data”); there are even legal statements saying so (for as long as the protected work is still recognisable in the collage). There’s a bit more to it, but I’m forgetting things while putting together this wlog entry, which I’ll probably update whenever I find new points.

Some organisations, such as the notorious “Creative Commons” or, shamefully, the OSI, have posted statements to the contrary, often resting on variants of “but it’s needed for AI to work!!!!!11” and/or USA-specific concepts (“fair use”), and generally on a wrong understanding of how ML/LLM actually works. I’ve put together a list of references and a set of instructions for how to interpret any Creative Commons licence I grant, deliberately deviating from those of the licence steward (as licensor, my intent weighs more). May you find help in this. Even the GEMA is following this, at least in intent.

Even business magazines have realised that the only real profit to be made in “AI” lies in selling hardware and data-centre hosting to those who fell for this latest hype: see the Business Insider, even Goldman Sachs (and another thread looking at the Goldman Sachs thing in more detail), Forbes, The Economist and others. It turns out that it’s not suited to tasks that are actually useful links in the chain of economic value added. And the IEEE found that it’s not good at new tasks, which is no surprise to those of us who know how these models function internally, e.g. from that great Explain Extended article I linked in the “put together” list above.

  • To justify costs, [AI] must be able to solve complex problems, which it isn’t designed to do […] not a matter of just some tweaks […] the tech is nowhere near where it needs to be in order to be useful for even basic tasks
  • Goldman Sachs argues that AI is so hyped right now that optimism about it casts a shadow over the entire stock market for the next decade
  • Experienced developers using Copilot are 20% less productive.
  • Only 5% of businesses are doing any kind of AI roll out and, of those, a lot are scaling back because the returns are nonexistent or negative.
  • Silicon Valley companies spent $50B on NVIDIA GPUs and that led to a $3B increase in revenue (not profit).

This leaves open the question: who is still endorsing “AI” despite all these facts, in the current climate, with the known drawbacks?

[…] the people who are most excited by AI in any given field are the people with the least talent in that field.

There.

Anyone who has seriously tried to use these “AI” tools already knows this.

Oh, and attackers might also like the new avenues this opens up for them.


Update 2024-10-04:
I’ve bookmarked quite a number of further helpful posts pointing out more of the same, and other, problems with “AI”; I’ll add them to this list of references and/or this article in a subsequent update soon. But there is one thing, triggered by the current Wikipedia issues, that I have to note right now:

I now require contributors to confirm that they have not leveraged AI to create the proposed changes and that, to the best of [my/our/their] knowledge, the submission is free of “AI” output, again with the same interpretation as above. This mostly concerns introducing AI sludge by proxy, such as transcribing things from a printout that might have been created using “AI”. It requires not only not doing this knowingly, but also that submitters apply due diligence in selecting their sources and discard AI sludge as far as they can recognise it.

Some time ago, NetBSD® already joined the list of projects that explicitly ban AI sludge as tainted. Unfortunately, Debian still tries to sit this out, as it is wont to do. OpenBSD is worse: an AI contribution was rejected, but only on quality grounds, with deraadt@ seemingly not taking its AI-sludge status into account. We’ll see how others decide…

Due to high demand, I’ve configured a Debian GNU/Linux VM, which I already operate for several other purposes and which already carried a mirror of the MirBSD CVS repository and downloads, to also mirror the website (via rsync over ssh) and to expose all of that as a publicly accessible web mirror, complete with SSL certificate and all. The server supports TLSv1.2 and TLSv1.3 but should also still work with TLSv1.0 and without SNI; of course, plain http also continues to work.

tl;dr: https://mbsd.evolvis.org/ with the usual URL paths.

Given that it’s not running on native MirBSD, there may be a few caveats. I’ve proxied the “give me entropy” CGIs to the main machine via https and made everything else work, but the diffs generated by CVSweb, at least, have a slightly different hunk distribution. The static content (i.e. all those *.htm files as well as the /MirOS/** downloads) is of course bitwise identical, and, as I’ve patched rsync on MirBSD to account for leap seconds but convert to POSIX time_t on the wire as expected, the timestamps should also be identical (unless I do manage to release some software during a leap second, which so far I haven’t, but the Time::Local tests managed to hit one precisely *sigh…*).

I expect that URL to remain stable even across planned future migrations of the machine to a different setup and, possibly, provider; this is why it got a separate, dedicated hostname.

TLSv1.2 support in MirBSD, I’m afraid, still has no ETA, given that I have other construction sites open, a day job, and so on.

I’m sorry to miss FOSDEM, but huge events during a pandemic should be avoided, and given that others do not wear masks, attending involves some danger. I’m sitting this out; maybe another time? I do miss it…

So, apparently, DNS names can only be up to 253 octets long in ASCII form: the 255-octet limit applies to the wire format, where the label length octets and the terminating root label need accounting for. Thanks, jschauma!
Consequently, I released version 0.7 of my rfc822 library and tool.
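The arithmetic behind the 253 can be sketched in a few lines of Python. In DNS wire format (RFC 1035) every label is preceded by one length octet and the name ends with a zero-length root label, all within 255 octets; for n labels that is Σlen + n + 1 octets, while the dotted text form without a trailing dot is Σlen + (n − 1), i.e. exactly two octets shorter. The helper names below are hypothetical and not the API of the rfc822 library; the sketch also only handles plain ASCII names (no IDNA).

```python
MAX_NAME_WIRE_OCTETS = 255  # RFC 1035, section 2.3.4
MAX_LABEL_OCTETS = 63       # RFC 1035, section 2.3.4


def split_labels(name: str) -> list[str]:
    """Split a dotted ASCII name into labels, ignoring one trailing root dot."""
    if name.endswith("."):
        name = name[:-1]
    return name.split(".") if name else []


def wire_length(name: str) -> int:
    """Octets the name occupies in wire format: one length octet per label,
    the label bytes themselves, plus the zero-length root label at the end."""
    return sum(1 + len(label.encode("ascii")) for label in split_labels(name)) + 1


def has_valid_length(name: str) -> bool:
    """Check label lengths (1..63 octets) and the overall 255-octet limit."""
    labels = split_labels(name)
    if any(not 1 <= len(label.encode("ascii")) <= MAX_LABEL_OCTETS
           for label in labels):
        return False
    return wire_length(name) <= MAX_NAME_WIRE_OCTETS
```

For example, three 63-octet labels plus a 61-octet one give a 253-octet dotted name that needs exactly 255 octets on the wire; lengthening the last label by a single octet pushes it over the limit.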

Debian 11 “bullseye” was also released today (it’s still the 14ᵗʰ for me…). I switched all my unstable “sid” systems to bullseye to avoid systemd’s UsrMove, support for which (per the Technical Committee) is mandatory in any subsequent release (gah!). Still, congratulations!

Due to RT’s porting efforts, I’m still not finished with the mksh things I wanted to do, but am continuing with other work. I’ll release a new sleep(1) soon (but maybe we can test it on many platforms first?), and I guess I’ll switch ed and jupp to mirtoconf as well when I find the time.

I also had fun with… ISO 3166, ccTLDs, etc. and wtf(1). I added lots of entries to the acronyms database and deduplicated it as well. Not the 1300+ gTLDs, though. They’re insane, and ICAN’t publishes neither which ones are still active nor their meanings (corresponding to those already present). Anyway, please enjoy! Submissions are, as usual, welcome ☺

My contribution to Free Sheet Music is also growing. I slightly reorganised the index (on the left side) of the main website: only selected subprojects are now shown there, but all of them, including the musical things, the Foundry etc., are listed on the page about subprojects, some with just a small link or placeholder, others with much more. I think there may be more to add… but this, and some hyperlinking (in all directions), could help.

Now off to sleep. Our cat is already sleeping again. Thankfully, this is (probably) the last really warm day.

After Garmin’s proprietary “opencaching.com” platform, which virtually nobody missed, and ignoring that Navicache has been little more than a zombie for quite a while, I regret that GPSgames.org is no more. They offered so much more than just geocaching: GeoDashing, GeoVexilla (I took part in both), Shutterspot (not for me), MinuteWar, GeoGolf and GeoPoker (which I never really got), although I guess GeoHashing is the closest thing to, at least, GeoDashing. A month later, it doesn’t look like it will ever be resurrected, even though this outage was unplanned; an archival had been scheduled for later, which would have been planned and kept the historic data intact. That’s quite a pity, as I had renewed my interest in them due to the pandemic.

This was the only platform that used a Free licence for its content (CC-BY-SA), even if, like all the others, it required a broader grant from contributors. Now only nōn-free platforms (like the OpenCaching network) are left; commercialisation seems to be the only thing keeping most of them alive. Pity.

Update 2022-10-22: navicache.com, which had already been mostly unusable due to bugs for years, is now gone too: whoever operated it let the domain expire. The log and cache database is most likely gone forever as well.