stochastictrebuchet

  • 0 Posts
  • 17 Comments
Joined 3 years ago
Cake day: June 12th, 2023

  • ML engineer here. My intuition says you won’t get better accuracy than with sentence template matching, provided your matching rules are free of contradictions. Of course, the downside is you need to remember (and teach others) the precise phrasing to trigger a certain intent. Refining your matching rules is probably a good task for a coding agent.
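
    A minimal sketch of what sentence template matching could look like, assuming made-up intent names (`turn_on`, `set_timer`) and made-up phrasings. It also shows the "free of contradictions" caveat: if one utterance matches templates from two different intents, it raises instead of guessing.

```python
import re

# Hypothetical templates: each intent maps to phrasings with {slot} placeholders.
TEMPLATES = {
    "turn_on": ["turn on the {device}", "switch on the {device}"],
    "set_timer": ["set a timer for {minutes} minutes"],
}


def compile_template(template):
    """Turn 'turn on the {device}' into an anchored regex with named groups."""
    # re.split with a capture group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        f"(?P<{part}>.+?)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(rf"^{pattern}$", re.IGNORECASE)


COMPILED = [
    (intent, compile_template(template))
    for intent, templates in TEMPLATES.items()
    for template in templates
]


def match_intent(utterance):
    """Return (intent, slots) for the single matching intent, or None."""
    text = utterance.strip().rstrip(".!?")
    hits = {}
    for intent, regex in COMPILED:
        match = regex.match(text)
        if match and intent not in hits:
            hits[intent] = match.groupdict()
    if len(hits) > 1:
        # Two intents claim the same sentence: the rule set contradicts itself.
        raise ValueError(f"contradictory templates matched: {sorted(hits)}")
    return next(iter(hits.items()), None)
```

    Anything that doesn't fit a template exactly falls through to `None`, which is the precision-over-recall trade-off described above.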

    Back in the pre-LLM days, we used simpler statistical models for intent classification. These were way smaller and could easily run on CPU. Check out random forests or SVMs that take bags of words as input. You do need enough labeled examples to train them on, though.
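
    To illustrate the bag-of-words idea without pulling in dependencies, here is a rough sketch that swaps in a multiclass perceptron for the random forest or SVM (in practice you'd more likely reach for an existing library implementation of those). The training sentences and intent labels are invented, and six examples are far too few for real use.

```python
from collections import Counter, defaultdict

# Hypothetical training data; a real system needs many more examples per intent.
EXAMPLES = [
    ("turn on the light", "turn_on"),
    ("switch the lamp on", "turn_on"),
    ("set a timer for ten minutes", "set_timer"),
    ("start a timer", "set_timer"),
    ("play some music", "play_music"),
    ("play my favourite song", "play_music"),
]


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (word order is discarded)."""
    return Counter(text.lower().split())


def train(examples, max_epochs=50):
    """Train a multiclass perceptron (stand-in for an SVM) and return a predict function."""
    labels = sorted({label for _, label in examples})
    weights = {label: defaultdict(float) for label in labels}

    def predict(text):
        bag = bag_of_words(text)

        def score(label):
            return sum(weights[label].get(word, 0.0) * count for word, count in bag.items())

        # Ties go to the first label in sorted order.
        return max(labels, key=score)

    for _ in range(max_epochs):
        mistakes = 0
        for text, gold in examples:
            guess = predict(text)
            if guess != gold:
                mistakes += 1
                # Pull the correct intent towards these words, push the wrong one away.
                for word, count in bag_of_words(text).items():
                    weights[gold][word] += count
                    weights[guess][word] -= count
        if mistakes == 0:
            break  # every training example is classified correctly
    return predict


predict = train(EXAMPLES)
```

    Unlike template matching, this generalizes to unseen phrasings that share words with the training data, at the cost of occasionally guessing wrong.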

    With an LLM you can reframe the problem as getting the model to generate the right ‘tool’ call. Most intents are a form of relation extraction: there’s an ‘action’ (verb) and one or more participants (subject, object, etc.). You could imagine a single tool definition (call it ‘SpeakerIntent’) that outputs the intent type (from an enum) as well as the arguments involved. Then you can link that to the final intent with some post-processing. There’s a 100M version of gemma3 that’s apparently not bad at tool calling.
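
    A sketch of what that single tool definition and the post-processing step might look like. The schema follows the JSON-Schema style that many tool-calling runtimes accept, but the exact wrapper format depends on the model and runtime; the intent names, argument roles, and `REQUIRED_ARGS` table are assumptions for illustration, and no model is actually called here.

```python
import json
from enum import Enum


class IntentType(str, Enum):
    # Hypothetical intents; replace with your own.
    TURN_ON = "turn_on"
    SET_TIMER = "set_timer"
    PLAY_MUSIC = "play_music"


# Tool definition handed to the model (shape is an assumption, see lead-in).
SPEAKER_INTENT_TOOL = {
    "name": "SpeakerIntent",
    "description": "Record what the speaker wants: the action and its participants.",
    "parameters": {
        "type": "object",
        "properties": {
            "intent_type": {
                "type": "string",
                "enum": [intent.value for intent in IntentType],
            },
            "arguments": {
                "type": "object",
                "description": "Participants of the action, keyed by role.",
                "additionalProperties": {"type": "string"},
            },
        },
        "required": ["intent_type", "arguments"],
    },
}

# Hypothetical: the argument roles each intent cannot do without.
REQUIRED_ARGS = {
    IntentType.TURN_ON: {"device"},
    IntentType.SET_TIMER: {"duration"},
    IntentType.PLAY_MUSIC: set(),
}


def resolve_intent(tool_call_json):
    """Post-process the model's raw tool-call arguments into (intent, args), or None."""
    try:
        call = json.loads(tool_call_json)
        intent = IntentType(call["intent_type"])  # rejects values outside the enum
        args = call["arguments"]
    except (KeyError, ValueError, TypeError):
        return None
    if not isinstance(args, dict):
        return None
    if REQUIRED_ARGS[intent] - set(args):
        return None  # a required participant is missing
    return intent.value, args
```

    Small models do emit malformed or out-of-enum calls now and then, so returning `None` and falling back to "sorry, I didn't get that" tends to be safer than acting on a half-valid call.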

  • Thanks for teaching me something new!

    So Chromium is based on Blink, which is LGPL – a less viral GPL. Hence, it can serve as a dependency in closed-source software.

    As to the shared heritage of these well-established projects – I don’t know how else to interpret it other than a testament to the complexity of building a decent browser engine.

    Btw, quick shout out to Orion, a rare WebKit browser by the makers of Kagi that’s apparently coming to Linux as well. I’m a monthly supporter. Even though I still mostly use Vivaldi, it’s been coming along really nicely. Proprietary software but idc. I appreciate their unspoken mission statement: pay or be the product. (No-one should be a product, obviously, but that’s capitalism.)


  • Don’t have time to fact-check, so I’m going to take your word for it. Interesting bit of knowledge! Honestly wouldn’t have thought that. How else are Chrome, Edge, Brave, Arc, Vivaldi and co getting away with building proprietary layers on top of a copyleft dependency?

    I’m no legal expert. All I know is that when I’m picking dependencies at work, if it’s copyleft, I leave it on the table. I love the spirit of GPL, but I don’t love the idea of failing an audit by potential investors because of avoidable liabilities.


  • I’m OOTL. Are these actual issues people have with the project?

    C++ might not be as memory-safe as Rust, but let’s not pretend a Rust code base wouldn’t be riddled with raw pointers.

    The BSD license tells me the team probably wants Ladybird to become not just a standalone browser but also a new competing base for others to build a browser on top of – a Chromium competitor. Even though BSD wouldn’t force downstream projects to contribute back upstream, they probably would, since that’s far less resource-intensive than maintaining a fork. (Source: me, who works on proprietary software, can’t use GPL stuff, but contributes back to my open-source dependencies.)


  • https://minilanguage.com/ is an interesting one to look at. There are exactly 1000 words in the total vocabulary. That’s Mini Mundo though. A second, smaller variant also exists: Mini Kore, with 100 words.

    I started learning it too soon after learning Toki Pona and lost steam. But I agree with the design principles. They stem from the observation that Toki Pona, as fun as it is, is just too damn ambiguous for anything non-superficial. All too often speakers need to clarify what they said by switching to a natural language. Even my own Toki notes become indecipherable after a few days.

    Toki Pona: fun, therapeutic mental exercise, made even better with sitelen pona. Feels like writing poetry. Never meant to be a useful language. Easy to learn, hard to use.

    Mini: useful as a language for general-purpose communication. Small, primarily Latinate vocabulary. Harder to learn, easier to use.



  • To the extent that the billboard never existed while the image implies it did – sure.

    I love the term ‘slop’. It’s one of my favorite new words along with ‘nontent’.

    But this, to me, isn’t that. I think of slop as ‘unrequested, unconvincing, lazy, and lifeless’. In short, ineffective and unwelcome.

    I feel like this meme gets the message across. It’s not great, but it’s not terrible. The AI tells are subtle enough: the multi-lane pileup in the background and some poor rendering of small text.

    Not sure why I felt the need to write this. Guess I’m of the opinion that just because something is AI-generated doesn’t mean it should be discounted immediately, unless it really feels like zero effort went into it. Have a nice day!