
  • Thanks for giving it a good read-through! If you move to NVMe SSDs, you may find some of your problems just go away. The difference can be enormous.

    I was reading something recently about databases and disk layouts designed for transactional business applications vs. ones designed for reporting, and one difference was that on disk the data was laid out either by row or by column.
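    As a toy illustration of that row vs. column difference (the record shape here is made up):

```csharp
using System;
using System.Linq;

// Row-oriented: each record's fields sit together on disk, which suits
// transactional work ("fetch order 2 and update it").
var rows = new[]
{
    new { Id = 1, Customer = "A", Amount = 10m },
    new { Id = 2, Customer = "B", Amount = 25m },
    new { Id = 3, Customer = "A", Amount = 40m },
};

// Column-oriented: each field's values sit together on disk, which suits
// reporting ("sum Amount over everything") because a scan only touches
// the one column it needs.
int[] ids = { 1, 2, 3 };
string[] customers = { "A", "B", "A" };
decimal[] amounts = { 10m, 25m, 40m };

decimal total = amounts.Sum(); // one contiguous array, no other fields read
Console.WriteLine(total);
```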




  • Yes? Maybe, depending on what you mean.

    Let’s say you’re doing a job that involves reading 1M records or so. Pagination means you grab N records at a time, say 1000, across multiple queries, rather than all at once.

    Reading your post again to try and get context, it looks like you’re identifying duplicates as part of a job.

    I don’t know how you’re determining a duplicate (structurally or otherwise), but since you’re running on HDDs, it might be faster to pull that information into RAM and then do the job in batches and update in batches. This also lets you write to the DB while doing CPU processing.

    BTW, your hard disks are going to be your bottleneck unless you’re reaching out over the internet, so your best bet is to move that data onto an NVMe SSD. That’ll blow any other suggestion I have out of the water.

    BUT! There are ways to help things out. I don’t know what language you’re working in; I’m a dotnet dev, so I can answer some things from that perspective.

    One thing you may want to do, especially if there’s other traffic on this server:

    • use WITH (NOLOCK) so that you’re not blocking other reads and writes on the tables you’re looking at (accepting dirty reads in exchange)
    • use pagination, either with windowing or LIMIT/SKIP to grab only a certain number of records at a time
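    Assuming SQL Server (NOLOCK is a T-SQL table hint), the two bullets combined might look something like this; the table and column names are hypothetical:

```csharp
// T-SQL sketch: NOLOCK table hint plus OFFSET/FETCH pagination.
string sql = @"
    SELECT Id, Payload
    FROM dbo.Records WITH (NOLOCK)   -- don't block other readers/writers
    ORDER BY Id                      -- OFFSET/FETCH requires an ORDER BY
    OFFSET @Offset ROWS
    FETCH NEXT @Limit ROWS ONLY;";
```

    Bind @Offset and @Limit as parameters and bump the offset by the limit on each batch.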

    Use a HashSet (this works well if you have record types) or some other equality method that’s property-based. Many Dictionary/HashSet types can take a custom equality comparer.
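    A sketch of what a property-based comparer might look like (YourType and its properties are placeholders; records already have value equality, so a custom comparer mostly matters when only *some* properties define a duplicate):

```csharp
using System;
using System.Collections.Generic;

// Placeholder record; substitute your real type.
public record YourType(int Id, string Name, decimal Amount);

public sealed class YourTypeEqualityComparer : IEqualityComparer<YourType>
{
    // Two records are "the same" if the properties you care about match.
    public bool Equals(YourType? x, YourType? y) =>
        x is not null && y is not null
        && x.Name == y.Name
        && x.Amount == y.Amount; // compare whatever defines a duplicate

    // Must be consistent with Equals: equal items => equal hash codes.
    public int GetHashCode(YourType obj) =>
        HashCode.Combine(obj.Name, obj.Amount);
}

// Usage: Add returns false for duplicates, so dedup falls out naturally.
var seen = new HashSet<YourType>(new YourTypeEqualityComparer());
```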

    So, what you can do is asynchronously read from the disk into memory and kick off some kind of processing job. If that job doesn’t also require the disk, you can start another read while you’re processing. Don’t do a write and a read at the same time, since you’re on HDDs.

    This might look something like:

    offset = 0, limit = 1000

    task = readBatchFromDb(offset, limit)

    // if you only care about the equality and not the data after use,
    // you can just store the hash codes
    data = new HashSet<YourType>(new YourTypeEqualityComparer())

    result = await task

    while (!result.IsEmpty) {
        offset = advance(offset)
        task = readBatchFromDb(offset, limit) // start the next read batch

        dataToWork = result.exclusion(data) // or something, to not rework any objects
        data.addRange(result)

        dataToWrite = doYourThing(dataToWork)

        // don't write while reading
        result = await task
        await writeToDb(dataToWrite) // so we never read and write at once; there's a lost optimization in doing no CPU work during the write
    }
    
    
    
    // Let's say you can set up a read or write queue to keep things busy

    abstract class IoJob {
        public sealed class ReadJob(your args) : IoJob
        {
            Task<Data> ReadTask { get; set; }
        }

        public sealed class WriteJob(write data) : IoJob
        {
            Task WriteTask { get; set; }
        }
    }
    
    
    
    Task<IoJob> executeJob(IoJob job) {
        return job switch {
            ReadJob rj => readBatchFromDb(rj.Offset, rj.Limit), // let's say this assigns the data to the ReadJob and returns it
            WriteJob wj => writeToDb(wj) // function should return the write job
        };
    }
    
    
    
    // data is the HashSet from the first sketch
    Queue<IoJob> jobs = new ();

    jobs.Enqueue(new ReadJob(offset, limit));
    jobs.Enqueue(new ReadJob(advance(offset), limit)); // get the second job ready to start

    job = jobs.Dequeue();
    do {
        // kick off the next job so the disk stays busy while we process this one
        if (jobs.TryPeek(out var next)) executeJob(next);

        if (job is ReadJob rj) {
            result = await rj.ReadTask;
            if (result.IsEmpty) continue;

            jobs.Enqueue(new ReadJob(next stuff));

            dataToWork = result.exclusion(data); // or something, to not rework any objects
            data.AddRange(result);

            dataToWrite = doYourThing(dataToWork);
            jobs.Enqueue(new WriteJob(dataToWrite));
        }
        else if (job is WriteJob wj) {
            await wj.WriteTask; // executeJob already started it; just wait for it to finish
        }
    } while (jobs.TryDequeue(out job));