

ML engineer here. My intuition says you won't get better accuracy than with sentence-template matching, provided your matching rules are free of contradictions. The downside, of course, is that you need to remember (and teach others) the precise phrasing that triggers each intent. Refining the matching rules is probably a good task for a coding agent.
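To make the template idea concrete, here's a minimal sketch in Python. The intents, patterns, and slot names are made up for illustration; "contradiction-free" here just means no utterance matches two templates with different intents.

```python
import re

# Hypothetical template rules: each regex maps one phrasing to one intent.
# Named groups capture slots (here, the device being controlled).
TEMPLATES = [
    (re.compile(r"^(turn|switch) on (the )?(?P<device>\w+)$", re.I), "device_on"),
    (re.compile(r"^(turn|switch) off (the )?(?P<device>\w+)$", re.I), "device_off"),
    (re.compile(r"^what('s| is) the weather( like)?( today)?\??$", re.I), "weather_query"),
]

def match_intent(utterance: str):
    """Return (intent, slots) for the first matching template, else (None, {})."""
    for pattern, intent in TEMPLATES:
        m = pattern.match(utterance.strip())
        if m:
            return intent, {k: v for k, v in m.groupdict().items() if v}
    return None, {}

print(match_intent("turn on the lights"))   # matches the device_on template
print(match_intent("make me a sandwich"))   # no template matches
```

The rigidity is visible right away: "please turn on the lights" falls through unless you add another pattern, which is exactly the maintenance burden mentioned above.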
Back in the pre-LLM days, we used simpler statistical models for intent classification. They were far smaller and ran comfortably on CPU. Check out random forests or SVMs over bag-of-words features. You do need enough labeled examples to train them, though.
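A minimal sketch of that pre-LLM setup with scikit-learn: bag-of-words counts fed into a linear SVM. The training data here is a toy stand-in; a real system needs many more labeled examples per intent.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy labeled examples (made up for illustration).
train_texts = [
    "turn on the kitchen lights", "switch the lamp on",
    "turn off the lights", "switch the lamp off",
    "what's the weather today", "will it rain tomorrow",
]
train_intents = [
    "device_on", "device_on",
    "device_off", "device_off",
    "weather_query", "weather_query",
]

# CountVectorizer builds the bag-of-words; LinearSVC is the classifier.
clf = make_pipeline(CountVectorizer(), LinearSVC())
clf.fit(train_texts, train_intents)

# Classify one of the training utterances back.
print(clf.predict(["turn on the kitchen lights"])[0])
```

Swap `LinearSVC` for `RandomForestClassifier` and the rest stays identical; either way the whole model is kilobytes-to-megabytes and inference is effectively instant on CPU.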
With an LLM you can reframe the problem as getting the model to generate the right 'tool' call. Most intents are a form of relation extraction: there's an 'action' (verb) and one or more participants (subject, object, etc.). You could imagine a single tool definition (call it 'SpeakerIntent') that outputs the intent type (from an enum) as well as the arguments involved. Then you can link that to the final intent with some post-processing. There's a 270M version of Gemma 3 that's apparently not bad at tool calling.
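A sketch of what that single-tool framing could look like. The schema follows the common OpenAI-style function-tool format, but the intent names, fields, and post-processing mapping are all invented for illustration:

```python
# Hypothetical single tool the LLM is asked to call once per utterance.
SPEAKER_INTENT_TOOL = {
    "type": "function",
    "function": {
        "name": "SpeakerIntent",
        "description": "Extract the speaker's intent and its participants.",
        "parameters": {
            "type": "object",
            "properties": {
                "intent": {
                    "type": "string",
                    "enum": ["device_on", "device_off", "weather_query"],
                },
                "action": {"type": "string", "description": "the main verb"},
                "arguments": {
                    "type": "object",
                    "description": "participants, e.g. subject/object",
                },
            },
            "required": ["intent", "action"],
        },
    },
}

def postprocess(tool_call: dict) -> str:
    """Map the model's SpeakerIntent call to a final application intent.

    Example post-processing step: route device intents to one handler
    namespace and everything else to a general one.
    """
    intent = tool_call["intent"]
    if intent.startswith("device_"):
        return f"smart_home.{intent}"
    return f"general.{intent}"

print(postprocess({"intent": "device_on", "action": "turn",
                   "arguments": {"object": "lights"}}))
```

The enum constrains the model to your known intent set, while the free-form `arguments` object keeps the relation-extraction flavor: verb plus participants, which the post-processing step then resolves to a concrete handler.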









I’ve been using Vivaldi (Chromium-based) for about three years now. It’s customizable, has been generally solid, and has a couple of unique tab-management features. It doesn’t have built-in ad blocking afaik, but for that I use AdGuard desktop and route all my traffic through it, which filters out ads regardless of which browser I’m in. On iOS I can recommend Orion by Kagi. It’s the only other WebKit browser besides Safari, runs light, and has decent built-in ad blocking.