• 2 Posts
  • 120 Comments
Joined 2 years ago
Cake day: December 31st, 2023

  • A small GUI to automate generating some PDFs from some CSV files.

    There’s a small non-profit in my area helping people operate localized energy distribution (as producers and consumers). Each month, they receive a zip file containing, as CSV files, the raw kilowatt-hours produced and consumed by each participant over the past month. So far the non-profit has been manually importing these CSVs into LibreOffice to generate graphs and tables, then exporting the whole thing as an individualized PDF file for each participant. Now that they’re starting to help more than 2-3 operations, it’s become useful to try to automate that process.

    I’ve been writing it in Rust for a few reasons. First, I wanted to be sure cross-compilation would work, and at this point I’m more familiar with Rust than Go. Second, I recently read a blog post that evaluated Rust GUI solutions in terms of accessibility and IME compatibility on Windows. I started off looking for a “direct” PDF-writing library but eventually switched to using Typst to generate the PDFs from templates I write. Because Typst is written in Rust, I’ve been able to bundle its engine into the program in a pretty straightforward way.

    I’m currently working on allowing the import of multiple sets of data so that the generated PDFs can show line plots of the electricity production and consumption over several months.


    1. chunk_size := file_size / cpu_cores. Compile regex.

    2. spawn cpu_cores workers:
      2.a. worker #n starts at n * chunk_size bytes. If n > 0, skip bytes until a newline is encountered.
      2.b. worker starts feeding bytes from the file/chunk into the regex. When a match is found, write it to the output (stdout or file, whichever has better performance). When a newline is encountered, reset the regex state automaton.
      2.c. after having read chunk_size bytes, continue until encountering a newline to ensure the whole file is covered by the parallel search.

    Optionally, keep track of byte offsets and attach them to the found matches when outputting, to make it easier to eventually de-duplicate and/or navigate to a given match in the file.

    To avoid problems, have each worker output to a separate file, and only combine these output files when the workers are all finished.

    As others have said, it’s going to be hard to get more speedup than this, and you will ultimately be limited by your storage’s read speed and throughput if the whole file cannot fit into memory.
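The steps above can be sketched roughly as follows. This is a minimal illustration under a few assumptions, not a tuned implementation: the names `chunk_bounds`, `search_chunk` and `parallel_search` are made up for this example, matching is done line by line with Python's `re` module rather than by feeding bytes into a streaming automaton, and each worker returns its own list (standing in for a per-worker output file) that is only combined once all workers are done. It also uses a small variation on step 2.a: a worker seeks to one byte before its chunk, so a line that begins exactly on a chunk boundary is searched exactly once and no de-duplication is needed. Threads are used to keep the sketch short; for real CPU parallelism in Python you would likely swap in `ProcessPoolExecutor`.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(file_size, workers):
    """Split [0, file_size) into up to `workers` contiguous byte ranges."""
    chunk_size = max(1, file_size // workers)
    bounds = []
    for n in range(workers):
        start = n * chunk_size
        # The last worker also takes the remainder of the integer division.
        end = file_size if n == workers - 1 else min(file_size, start + chunk_size)
        if start < end:
            bounds.append((start, end))
    return bounds


def search_chunk(path, pattern, start, end):
    """Return (byte_offset, line) for each matching line that begins in [start, end)."""
    regex = re.compile(pattern)  # `pattern` must be a bytes pattern
    matches = []
    with open(path, "rb") as f:
        if start > 0:
            # Step 2.a: skip to the first line that begins inside this chunk.
            # Seeking to start - 1 keeps a line that begins exactly at `start`.
            f.seek(start - 1)
            f.readline()
        pos = f.tell()
        # Steps 2.b and 2.c: a line that begins before `end` is read in full,
        # even if it runs past the end of the chunk.
        while pos < end:
            line = f.readline()
            if not line:
                break
            if regex.search(line):
                matches.append((pos, line.rstrip(b"\r\n")))
            pos += len(line)
    return matches


def parallel_search(path, pattern, workers=os.cpu_count() or 1):
    """Search every chunk concurrently, then combine the per-worker results."""
    bounds = chunk_bounds(os.path.getsize(path), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(search_chunk, path, pattern, s, e) for s, e in bounds]
        # Collecting in chunk order yields matches sorted by byte offset.
        return [m for fut in futures for m in fut.result()]
```

A call such as `parallel_search("big.log", rb"ERROR \d+")` (file name and pattern are placeholders) returns a list of `(byte_offset, line)` pairs, which covers the optional byte-offset bookkeeping mentioned above.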


  • It’s been a while since I set up my runner, and I have it on my personal desktop (which is wayyyyyy beefier than the VPS I host my Forgejo instance on), but I’m pretty sure I was able to specify that only my user account can trigger actions to be run on this runner. What I’m getting at is that there is a decent amount of granularity for Forgejo action permissions; you should be able to find a balance that suits you between “no actions at all” and “anyone can run any code they desire on your server”.






  • Yup! YAML is defined as a “strict superset” of JSON (or at least, it was the last time I checked).

    It’s a lot like markdown and HTML; when you want to write something deeply structured and somewhat complex you can always drop back/down to the format with explicit closing delimiters and it just works™.



  • So which is it? Are developers 55% more productive, or are they losing 20% of their time to inefficiencies and burning out at record rates?

    The answer: executives are measuring—and reporting—what makes their stock price rise, not what’s actually happening on the ground.

    Or if you want to get slightly more conspiratorial: the execs are all buying shares in OpenAI, Nvidia, and the like - so now they’re more interested in ordering people to use LLM tools so that these stocks rise in price, even if it means sabotaging their own company.




  • I have the same preference for personal projects, but when I was working on a corporate team it was really useful to have the “run configs” for IntelliJ checked in so that each new team member didn’t need to set them up by themselves. Some of the setup needed to get the Python debugger properly connected to the project could get quite gnarly.