How much firmware is initializing???

DudemanJenkins@lemmy.world · 4 months ago

How much firmware is initializing???

Kokesh@lemmy.world · 4 months ago

Especially server accessible only by SSH…

cypherix93@lemmy.world · 4 months ago

I can’t be bothered to walk down to the basement, so practically my server is also only accessible by SSH

Rusty Shackleford@programming.dev · 4 months ago

Especially after age 40 and a knee surgery… I’m tired boss! 😩

Lucy :3@feddit.org · 4 months ago

I’m 150+km away from my server, with literally everything on it lol

yhvr@lemm.ee · 4 months ago

I’m at college right now, which is a 3 hour drive away from my home, where a server of mine is. I just have to ask my parents to turn it back on when the power goes out or it gets borked. I access it solely through RustDesk and Cloudflare Tunnels SSH (it’s actually pretty cool, they have a web interface for it).

I have no car, so there’s really no way to access it in case something catastrophic happens. I have to rely on hopes, prayers, and the power of a probably outdated Pop!_OS install. Totally doesn’t stress me out. I’ll just say I like to live on the edge :^)

ironhydroxide@sh.itjust.works · 4 months ago

Set up a PiKVM as IPMI and you’ll have at least another layer of failure required to completely lose connectivity

yhvr@lemm.ee · 4 months ago

Hadn’t heard of pikvm before. Will keep that in mind, thanks!

Lucy :3@feddit.org · 4 months ago

Currently the server(s) are in my room, which is so messy my dad probably wouldn’t even enter it voluntarily. And in case grub/fstab/crypttab/etc. are messed up, which is probably the most common error, he probably couldn’t solve it by himself. Soon everything’s gonna live in its own little room in the basement, so it’s actually gonna be easier to access.

Neuromancer@lemm.ee · 4 months ago

In the old days some of the servers took an hour to reboot. That was stressful when you still couldn’t ping it after an hour.

NocturnalMorning@lemmy.world · 4 months ago

Don’t say stuff like that. You’re gonna give me a heart attack.

Neuromancer@lemm.ee · 4 months ago

The more disk you had, the longer it took, because it walked the SCSI bus, which took forever.

Since everything was remote, you’d have to call remote hands, and they weren’t technical. Also no cameras, since it was the ’90s.

Now when I restart a VM or container, I panic if it’s not back up in 10 minutes.

NocturnalMorning@lemmy.world · 4 months ago

I get annoyed if my pc isn’t restarted in 30 seconds now.

Neuromancer@lemm.ee · 4 months ago

I think mine takes like 2 minutes. It’s ten years old. I’ve been putting off upgrading due to the cost of video cards.

Thassodar@lemm.ee · 4 months ago

I got an M.2 drive last year after having a motherboard capable of it for 3-4 years, and naturally named it “Plash Speed”.

didnt1able@sh.itjust.works · 4 months ago

I will never not laugh at this video.

CanadaPlus@lemmy.sdf.org · 4 months ago

Why would you design a disk driver that way?

Neuromancer@lemm.ee · 4 months ago

It isn’t a disk driver since the OS is not loaded yet. It is the hardware identifying each disk in the SCSI chain. Not sure what else it was doing while walking the bus, but I know finding all the disks was the longest part.

CanadaPlus@lemmy.sdf.org · edit-2 4 months ago

Shoot, it did occur to me that might not technically be the right word.

Still, even if you’re an engineer in the late 80’s, it seems like it would be obvious you need a way for disks to announce themselves in O(1) time. Was it just a limitation of interoperability between vendors or something?

Neuromancer@lemm.ee · 4 months ago

I think it was just a limit of how quick everything ran back then. Also, this was an IBM system that was checked, double-checked, and triple-checked because it was a mission-critical system. IBM used to be known for quality hardware. Hard to imagine because they are such a crap company now but that was the equivalent of a google back then.

fuckwit_mcbumcrumble@lemmy.dbzer0.com · 4 months ago

I like how posting got fairly fast. Then we started putting absurd amounts of ram into servers so now they’re back to slow.

Like we have a high clock speed dual 32 core AMD server with 1TB of RAM that takes at least 5 minutes to do its RAM check. So every time you need to reboot you’re just sitting there twiddling your thumbs waiting anxiously.

Neuromancer@lemm.ee · 4 months ago

I will date myself. These machines had a lot of memory as well which added to the slow reboot. I think it was 16 gigs.

The r series for IBM took forever. The p series was faster but was still slow

trolololol@lemmy.world · 4 months ago

I’ll date myself. My first PC had 500MB of STORAGE

Neuromancer@lemm.ee · 4 months ago

My first pc had a tape drive.

trolololol@lemmy.world · 4 months ago

I had a friend with one of those while I had an Atari. The Atari game would come up within a minute, but the tape took like 15 min to start.

Neuromancer@lemm.ee · 4 months ago

Using a tape drive is crazy when you think about it. It was slow… This wasn’t the big tape cartridges. It was a standard audio tape. Not sure how much they could store, but it was all sequential.

trolololol@lemmy.world · 4 months ago

Never ask an engineer why lol

Source: am engineer

SaharaMaleikuhm@feddit.org · 4 months ago

Never update, never reboot. Clearly the safest method. Tried and true.

bamfic@lemmy.world · 4 months ago

Found the debian user!

naeap@sopuli.xyz · 4 months ago

Never touch a running system
Until you have an inviting hole in your system

Nevertheless, I’m panicking every time I update my server infrastructure…

xmunk@sh.itjust.works · 4 months ago

Initializing VPC…

Configuring VPC…

Constructing VPC…

Planning VPC…

VPC Configuration…

Step (31/12)…

Spooling up VPC…

VPC Configuration Finished…

Beginning Declaration of VPC…

Declaring Configuration of VPC…

Submitting Paperwork for VPC Registration with IANA…

Redefining Port 22 for official use as our private VPC…

Recompiling OpenSSH to use Port 125…

Resetting all open SSH connections…

Your VPC declaration has been configured!

Initializing Declared VPC…

nick@midwest.social · 4 months ago

Just had to restart our main MySQL instance today. Had to do it at 6am since that’s the lowest traffic point, and boy howdy this resonates.

2 solid minutes of the stack throwing 500 errors until the db was back up.

xmunk@sh.itjust.works · 4 months ago

If you have the bandwidth… it is absolutely worth it to invest in a maintenance mode for your system, just check some flat file on disk for a flag before loading up a router or anything and then, if it’s engaged, just send back a static html file with ye olde “under construction” picture.

dondelelcaro@lemmy.world · 4 months ago

Bonus points if your static site sends a 503 with a retry after header.
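A rough Python sketch of the flag-file idea with the 503 and Retry-After bonus points folded in. The flag path, the 120-second retry value, and the function names are all made up for illustration; a real stack would hook this in wherever requests first arrive.

```python
from pathlib import Path

# Hypothetical values: the path, retry interval, and page are illustrative only.
MAINTENANCE_FLAG = Path("/var/run/myapp/maintenance.flag")
RETRY_AFTER_SECONDS = 120
MAINTENANCE_PAGE = b"<html><body><h1>Under construction</h1></body></html>"


def maintenance_response(flag=MAINTENANCE_FLAG):
    """Return (status, headers, body) if the flag file exists, else None."""
    if not flag.exists():
        return None  # flag not set: carry on to the normal router
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # Tells well-behaved clients how long to wait before retrying.
        "Retry-After": str(RETRY_AFTER_SECONDS),
    }
    return 503, headers, MAINTENANCE_PAGE


def handle_request(route, flag=MAINTENANCE_FLAG):
    """Check the flag before loading the router, DB connections, etc."""
    early = maintenance_response(flag)
    if early is not None:
        return early
    return route()
```

Turning maintenance mode on is then just a `touch` of the flag file before the restart and an `rm` afterwards.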

nick@midwest.social · edit-2 4 months ago

That’s not really… possible at this point. We have thousands of customers (some very large ones, like A——n and G—-e and Wal___t) with tens or hundreds of millions of users, and even at lowest traffic periods do 60k+ queries per second.

This is the same MySQL instance I wrote about a while ago that hit the 16TiB table size limit (due to ext4 file system limitations) and caused a massive outage; worst I’ve been involved in during my 26 year career.

Every day I am shocked at our scale, considering my company is only like 90 engineers.

MystikIncarnate@lemmy.ca · 4 months ago

Is that the same database my user couldn’t connect to today?

mikyopii@programming.dev · 4 months ago

When you make a potentially system breaking change and forgot to make a snapshot of the VM beforehand…

MystikIncarnate@lemmy.ca · 4 months ago

There’s always backups… Right?

… Right?

WhyJiffie@sh.itjust.works · 4 months ago

oh there is. from 3 years ago, and some

Buddahriffic@lemmy.world · 4 months ago

Someone set up a script to automatically create daily backups to tape. Unfortunately, it’s still the first tape that was put in there 3.5 years ago; every backup since that tape filled up has failed. It might as well have failed silently, because everyone who received the email with the error message filtered them to a folder they generally ignored.

msage@programming.dev · 4 months ago

And no one ever tried to restore it.

Happened to me as well, after a year I learned incremental DB backups were wrongly offset by GMT diff, so we were losing hours every time. Fun.

Luckily we never needed them.

And now we have Postgres with WAL archiving and I sleep so much better.
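One cheap guard against the "newest backup is years old" situation is a freshness check that alerts somewhere people actually look. A minimal sketch, assuming backups land as files in a single directory; the function names and the 36-hour threshold are hypothetical, and this is no substitute for actually test-restoring.

```python
import time
from pathlib import Path


def newest_backup_age_s(backup_dir, now=None):
    """Age in seconds of the most recently modified file, or None if empty."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)


def backup_is_stale(backup_dir, max_age_s=36 * 3600, now=None):
    """True if there is no backup at all or the newest one is too old."""
    age = newest_backup_age_s(backup_dir, now)
    return age is None or age > max_age_s
```

Run from cron, a check like this fails loudly when the backup job itself has been failing quietly.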

umbrella@lemmy.ml · 4 months ago

this week i ran sudo shutdown now on our main service right at the end of the workday because i thought it was a local terminal.

not a bright move.

SavvyWolf@pawb.social · 4 months ago

There’s a package called molly-guard which will check to see if you are connected via ssh when you try to shut it down. If you are, it will ask you for the hostname of the system to make sure you’re shutting down the right one.

Very useful program to just throw onto servers.
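To show the idea (this is not molly-guard’s actual code), here is a small Python sketch of the same check: guess whether the session came in over SSH from the usual environment variables, and if so require the hostname before going ahead. The function names are invented for the example.

```python
import os
import socket


def is_ssh_session(env=None):
    """Guess whether we're in an SSH session from the usual env variables."""
    env = os.environ if env is None else env
    return any(env.get(name) for name in ("SSH_CONNECTION", "SSH_CLIENT", "SSH_TTY"))


def confirm_shutdown(env=None, ask=input, hostname=None):
    """Return True only if it looks safe to go ahead with the shutdown."""
    hostname = hostname or socket.gethostname()
    if not is_ssh_session(env):
        return True  # local terminal: no extra question
    typed = ask("Type the hostname of the machine to shut down: ")
    return typed.strip() == hostname
```

Having to type the hostname is the whole point: it forces you to look at which machine you are on before anything happens.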

umbrella@lemmy.ml · 4 months ago

nice. got it installed to test it out

trolololol@lemmy.world · 4 months ago

We got the Trojan in, let’s move move move!

Trainguyrom@reddthat.com · 4 months ago

I was making after hours config changes on a pair of mostly-but-not-entirely redundant Cisco L3 switches which basically controlled the entire network at that location. While updating the running configs I mixed up which ssh session was which switch and accidentally gave both switches the same IP address, and before I noticed the error I copied the running config to the startup config.

Due to other limitations and the fact that these changes were to fix DNS issues (and therefore I couldn’t rely on DNS to save me) I ended up sshing in by IP over and over until I got the right switch, then trying to make the change before my session died due to dropped packets from the mucked up network situation I had created. That easily added a couple of hours of cleanup to the maintenance I was doing.

naeap@sopuli.xyz · 4 months ago

Happens to everyone

Just having a multitude of terminals open with a mix of test environment and (just for comparison) an open connection to the production servers…

We were at a fair/exhibition once and on the first day people working on an actual customer project asked us, if they could compare with our code.
Obviously they flashed the wrong PLC and we were stuck dead at the first hours of the exhibition.
I still think that this place was cursed, as we also had to re-solder some connections of our robot multiple times, and the cherry on top was the system flash dying - where I had fucked up, because I had just finished everything late at night and didn’t make a complete backup of everything.
But it seems, if luck runs out, you lose on all fronts.

At least I was able to restore everything in 20mins. Which must be some kind of record.
But I was shaking so much from the stress that I couldn’t type efficiently anymore and was lucky to have a colleague who just calmly entered what I told him to, and with that we were able to get the showcase up and running again.

Well, at least the beer afterwards tasted like the liquid of the gods

mox@lemmy.sdf.org · 4 months ago

Oops.

Since you’re using sudo, I suggest setting different passwords on production, remote, and personal systems. That way, you’ll get a password error before a tired/distracted command executes in the wrong terminal.

umbrella@lemmy.ml · edit-2 4 months ago

i have different passwords but i type them so naturally it didnt even register.

“wrong password.”

“oh, i’m on the server, here’s the right password:”

“no wait”

LiveLM@lemmy.zip · 4 months ago

Best thing I did was change my shell prompt so I can easily tell when it isn’t my machine

umbrella@lemmy.ml · edit-2 4 months ago

you mean the user@machine:$ thing? how do you have yours?

Vilian@lemmy.ca · 4 months ago

Change the color too

LiveLM@lemmy.zip · 4 months ago

Correct!
I put a little Home icon on mine using NerdFonts.
If you are using ZSH or Fish you can do much more

Ignotum@lemmy.world · 4 months ago

I have more than once typed shutdown instead of reboot when working on a remote machine… always fun

RandomLegend [He/Him]@lemmy.dbzer0.com · 4 months ago

Make an alias so that when you type shutdown it does a restart, and if you want to shut down, make an alias that goes like

Yesireallywanttoshutdown

chatokun@lemmy.dbzer0.com · 4 months ago

Networking, we had a remote office in Europe (I’m in the US) and wanted to reset a phone. Phone was on port 10 of the Cisco switch, port 1 went to the firewall (not my design, already in place).

Helping my coworker, I tell her to shut port 10.

Shut port 1, enter.

Ok… office is offline and on another continent…

umbrella@lemmy.ml · 4 months ago

i have the horrible habit of using shutdown now because of my personal computers. a lot more fun.

sik0fewl@lemmy.ca · edit-2 4 months ago

Not sure if this will help you, but I always do shutdown and then think about whether I want to do -r or -h. I’m sure it won’t help 🙂

MystikIncarnate@lemmy.ca · 4 months ago

Ipmi is your friend.

NastyNative@mander.xyz · 4 months ago

Tbh there is nothing more taxing on my mental health than doing maintenance on our production servers.

WagnasT@lemmy.world · edit-2 4 months ago

when it was the wrong server and you’re hoping it comes back up before 5 minutes and nagios starts sending alerts

sep@lemmy.world · 4 months ago

I install molly-guard on important machines for this reason. So fast to do a reboot on the wrong ssh session

tiramichu@lemm.ee · 4 months ago

If a tree falls in the woods…

pedz@lemmy.ca · edit-2 4 months ago

I work with IBM i/AS400 servers and those are not exactly the quickest thing to “reboot” (technically an IPL). Especially the old ones. I have access to the HMC/console but even this sometimes takes several minutes (if not dozens) just to show what’s going on.

It’s always a bit stressful to see the codes passing one after the other and then it stops on one and seems to get stuck there for a while before continuing the IPL process. Maybe it’s applying PTFs (updates) or something, and you just have to wait while even the console is blank.

I’ve been monitoring those servers for years and I’m still sometimes wondering if it hung during the IPL or if it’s just doing its thing, because this part, even with codes, is not very verbose.

Fortunately it’s also very stable so it pretty much always comes back a few minutes after you start wondering why the hell it’s taking so long.

shoulderoforion@fedia.io · 4 months ago

… and you’re updating it remotely

umbrella@lemmy.ml · edit-2 4 months ago

… and you just changed a very important configuration file you intended to double check next

shoulderoforion@fedia.io · 4 months ago

and in that moment we all squint our eyes and become homer simpson

lnxtx (xe/xem/xyr)@feddit.nl · 4 months ago

Dell PowerEdge R620, I’m talking to you.

tooclose104@lemmy.ca · 4 months ago

When someone previously told a vrtx vm not to auto boot after power up and none of the remote access is working either… Both undocumented as well, of course. And your tired AF tech is statically configuring the wrong IP range on their laptop to manu because it’s been a long shutdown day and are also unfamiliar with the system in general (me). Good times, I figured it out though, but lots of sweating and swearing.

draughtcyclist@lemmy.world · 4 months ago

Y’all need high availability in your lives.

PenisDuckCuck9001@lemmynsfw.com · edit-2 4 months ago

That’s why you connect an arduino to the motherboard’s reset pin and load it with a program that resets the system if it doesn’t receive an ACK signal over the USB connection every 10 minutes.

Eventually though the networking and apache stop working after around 150 days, so you also have to make a script that resets the system after 30 minutes of not having network.

trolololol@lemmy.world · 4 months ago

Plot twist, reboot takes 11 minutes and you didn’t test for it
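The watchdog logic, including the 11-minute-reboot trap, can be modelled in a few lines. This is a Python sketch of the timing only, not Arduino firmware; the class name and the 15-minute boot grace period are assumptions added for the example.

```python
import time


class Watchdog:
    """Software model of the reset-pin watchdog: no ACK in time means reset."""

    def __init__(self, timeout_s=600, boot_grace_s=900, clock=time.monotonic):
        self.timeout_s = timeout_s        # normal ACK window (10 minutes)
        self.boot_grace_s = boot_grace_s  # longer window right after a reset
        self.clock = clock
        self.deadline = self.clock() + timeout_s

    def ack(self):
        """Host is alive: push the deadline out by one full timeout."""
        self.deadline = self.clock() + self.timeout_s

    def should_reset(self):
        """True once the deadline has passed without an ACK."""
        return self.clock() >= self.deadline

    def after_reset(self):
        """Just pulsed the reset pin: allow extra time for the reboot."""
        self.deadline = self.clock() + self.boot_grace_s
```

Without the separate grace period, a machine that needs 11 minutes to boot would get reset at minute 10 forever.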