Three Sysadmin Rules You Can’t (And Shouldn’t) Break

by Ramesh Natarajan on July 27, 2010

When I drafted this article, I came up with seven sysadmin habits. Out of those seven, three really stood out for me.

While habits are good, sometimes rules might even be better, especially in the sysadmin world, when handling a production environment.

Rule #1: Back Up Everything (and validate the backup regularly)

An experienced sysadmin knows that a production system will crash someday, no matter how proactive we are. The best way to be prepared for that situation is to have a valid backup.

If you don’t have a backup of your critical systems, you should start planning for it immediately. While planning for a backup, keep the following questions in mind:

  • What software (or custom script) will you use to take the backup?
  • Do you have enough disk space to keep the backups?
  • How often will you rotate the backups?
  • Apart from full backups, do you also need regular incremental backups?
  • How will you run the backup? For example, from crontab or some other scheduler?
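
The questions above can be sketched as a small shell script. This is a minimal illustration, not a recommended backup tool: the function name `backup_and_rotate`, the archive naming scheme, the default of seven retained archives, and the script path in the crontab comment are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch: archive a directory into a timestamped tarball and
# keep only the newest KEEP archives. All names and paths are illustrative.

backup_and_rotate() {
    src=$1            # directory to back up
    dest=$2           # directory that holds the archives (keep it outside src)
    keep=${3:-7}      # how many archives to retain (assumed default)

    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest" || return 1

    # Full backup of everything under src.
    tar -czf "$dest/backup-$stamp.tar.gz" -C "$src" . || return 1

    # The timestamp in the name sorts chronologically, so a reverse sort
    # lists the newest archive first; everything past the first $keep goes.
    ls -1 "$dest"/backup-*.tar.gz | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}

# Example crontab entry (hypothetical script path) that would run a wrapper
# script every night at 02:00:
#   0 2 * * * /usr/local/bin/nightly-backup.sh
```

The sketch only covers full backups and rotation. Incremental backups and disk-space checks would need more than this, and none of it replaces actually testing a restore.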

If you don’t have a backup of your critical systems, stop reading this article and get back to work. Start planning for your backup immediately.

A while back I read a study (I don’t remember who conducted it) which reported that only 70% of production applications are backed up, and that 30% of those backups are invalid or corrupted.

Assume that Sam takes backups of his critical applications regularly, but doesn’t validate them. Jack, on the other hand, doesn’t even bother to take any backup of his critical applications. It might sound like Sam, who has a backup, is in much better shape than Jack, who doesn’t. In my opinion, both Sam and Jack are in the same situation, as Sam never validated his backup to make sure it can be restored when there is a disaster.

If you are a sysadmin and don’t want to follow this golden rule #1 (or like to break it), you should seriously consider quitting your sysadmin job and becoming a developer. :-)

Rule #2: Master the Command Line (and avoid the UI if possible)

There is not a single task on a Unix / Linux server that you cannot perform from the command line. While there are some user interfaces available to make certain sysadmin tasks easier, you really don’t need them and should be using the command line all the time.

So, if you are a Linux sysadmin, you should master the command line.

On any system, if you want to be fluent and productive, you should master the command line. The main difference between a Windows sysadmin and a Linux sysadmin is GUI vs. command line: Windows sysadmins tend to be less comfortable with the command line, while a Linux sysadmin should be very comfortable with it.

Even when you have a UI for a certain task, you should still prefer the command line, because doing it from the command line helps you understand how that particular service works. In a lot of production server environments, sysadmins typically uninstall all GUI-related services and tools.

If you are a Unix / Linux sysadmin and don’t want to follow this rule, there is probably a deep desire inside you to become a Windows sysadmin. :-)

Rule #3: Automate Everything (and become lazy)

A lazy sysadmin is the best sysadmin.

There is not even a single sysadmin that I know of, who likes to break this rule. That might have something to do with the lazy part.

Take a few minutes to think about and list all the routine tasks that you do daily, weekly or monthly. Once you have that list, figure out how you can automate them. The best sysadmin typically doesn’t like to be busy. He would rather be relaxed and let the system do the job for him.
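
As one illustration of turning a routine check into a script, the sketch below warns when a filesystem crosses a usage threshold. The function names, the 90% default threshold, and the idea of running it from cron are assumptions for the example, not something this article prescribes.

```shell
#!/bin/sh
# Hypothetical sketch: a routine "is the disk filling up?" check that can be
# run from a scheduler instead of by hand.

# Print the usage percentage (digits only) of the filesystem holding PATH.
disk_usage_percent() {
    # df -P prints a header plus one line for the filesystem; field 5 is
    # "NN%". This assumes the device name contains no spaces.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if usage is below LIMIT; print a warning and return 1 otherwise.
# Return 2 if the usage figure could not be read.
check_disk() {
    path=$1
    limit=${2:-90}    # assumed default threshold, in percent
    usage=$(disk_usage_percent "$path")
    case $usage in
        ''|*[!0-9]*) return 2 ;;   # empty or non-numeric: give up
    esac
    if [ "$usage" -ge "$limit" ]; then
        echo "WARNING: $path is at ${usage}% (limit ${limit}%)"
        return 1
    fi
    return 0
}
```

Called as `check_disk / 90` from a scheduled job, it stays silent while things are fine and prints one line when they are not, which is the kind of output a scheduler can forward as a notification.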

Are there any other rules you think a sysadmin shouldn’t break? Leave a comment.

{ 28 comments… read them below or add one }

1 Felix Frank July 27, 2010 at 2:47 am

Good one, I like the tone and agree to most of what’s said.

Using the command line is advisable, even though one should be aware of the advantages GUIs may or may not have (and if they have any, leverage them!). The really strong suit of the CLI arises from rule 3 – automation is possible, powerful and makes your tasks safer, as the margin for human error shrinks.

2 Mike Hall July 27, 2010 at 3:31 am

The “Friday” rule.
Don’t schedule an outage for the last day of your work week.
- Use the last day of the work week to do or review preventive things that keep you from needing to be contacted during your time off.
- Most of the time “Murphy’s Law” seems to intervene on things scheduled for the last work day: you end up leaving later than originally planned and fail to explain issues with the outage to your co-workers. This combination normally leads to calls at home (or calls to return to work) on your days off.

3 Prasad Parolekar July 27, 2010 at 4:15 am

Really nice article.
and like the first line of 3rd point
“Lazy sysadmin is the best sysadmin!!!!!!!!!!”

4 Greg Rickson July 27, 2010 at 4:51 am

There is ALWAYS a better way of doing something … You just haven’t found it yet !!

5 Rich July 27, 2010 at 5:37 am

Rule #2, right after performing backups: Document everything. All your system configs, all your system inter-relationships, all your processes. This, in effect, is an extension of rule 1: back up everything. In the case of disaster recovery, backing up data is only useful if you can recreate the system environment the data was running on. Also, you are backing up another critical resource: you! If you were hit by a bus, could someone pick up where you left off?

6 Phil July 27, 2010 at 5:52 am

Never change the root shell, unless you have an alternative root account set up!

7 Francisco Fiesta July 27, 2010 at 5:53 am

I see that’s more or less the way I have been following. One question: How can the backed-up information be validated? Which method, script or program?

Thanks.

8 Slavko July 27, 2010 at 6:39 am

Using the shell is very close to automating tasks, because shell commands are simpler to automate than mouse clicks :-D

9 Cotamayor July 27, 2010 at 7:48 am

I agree with Francisco: how do you validate the data? I am new to sysadmin work and still learning, so a good pointer would be greatly appreciated.

10 komradebob July 27, 2010 at 9:03 am

The simplest solution to validate data is to restore from your backup media and compare that to the existing data. Most simplistic is to run a ‘sum’ on the files to compare. If the data is more dynamic, run a sum on the files in the backup before they get backed up and include that in the backup. Restore to a different directory and check it against what is restored.

To generate a checksum file:

find . -type f -print -exec sum '{}' \; > checksumfile

then back it up.

Restore it someplace, run the same command (but put the output in a different file!) and diff the two checksum files.
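
The restore-and-compare steps above can be wrapped into one script. This is a sketch under assumptions: it uses `cksum` rather than `sum` (an arbitrary swap; another checksum tool would work the same way), and the function names `make_manifest` and `verify_restore` are made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch: compare a restored directory tree against the
# original by checksumming every file in both and diffing the results.

# Print "checksum size ./path" for every file under DIR, sorted by path.
make_manifest() {
    ( cd "$1" && find . -type f -exec cksum '{}' + | sort -k 3 )
}

# Return 0 if both trees have identical manifests, 1 if they differ,
# 2 if either directory is missing or a temp file cannot be created.
verify_restore() {
    orig=$1
    restored=$2
    [ -d "$orig" ] && [ -d "$restored" ] || return 2

    a=$(mktemp) && b=$(mktemp) || return 2
    make_manifest "$orig" > "$a"
    make_manifest "$restored" > "$b"

    # diff prints any mismatching or missing files and sets the status.
    status=0
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Because the manifests use paths relative to each directory, the restored copy can live anywhere. If the original data changes between the backup and the check, the manifest would have to be generated at backup time and stored with the backup, as described above.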

11 carlos July 27, 2010 at 9:37 am

well, …. I think the author did not write “validate the data”, but validate the backup…

kind of confusing, ha…

In my opinion, one sysadmin must validate the media now and then, mostly periodically.

For example, you might have a backup of a particular application on tape, or you might hire an online service; it depends on your budget and/or your needs.

You can restore this backup to another server, now that prices are dropping, or if you upgrade your server, you can keep the “old” server just for this purpose.

Use the checksum methods wisely (md5sum, cksum, or whatever fits your needs). What I am saying is: make a checksum before the backup and another checksum in the new location, and compare.

Well, surely there are a lot of techniques to accomplish this.

At a glance this is only my “two cents”.

12 wintermoot July 27, 2010 at 2:41 pm

You did it wrong, 1st rule is RTFM.

13 BA July 27, 2010 at 9:47 pm

Backups (and disaster recovery plans) are rarely shown the respect they deserve. A failed backup could literally put your company out of business yet often they are handled by junior sysadmins with no supervision. I always tell new sysadmins they will never truly know the importance of backups until they have to look a user in the face while telling them their data is gone forever. Gone because you didn’t do your job properly.

14 Hamilton July 27, 2010 at 9:55 pm

Very niiiice article; I can say it is one of the best that I have read here on thegeekstuff. I’ll take notes and keep them very close to my PC. I’m not an administrator yet, but that’s no reason not to apply those rules to my habits right now, especially rules #2 and #3.

15 Kuric July 28, 2010 at 1:09 am

2 things… validating the backup media is very important.. I know a sysadmin for a large hotel in a well known tourist area on the east coast, who had been running backups for years and after a hurricane hit needed to do a restore only to find out that the tape drive had a bad head in it and none of the backups ever had anything written to them…

Also along the backup line: before making ANY changes to configuration files, back up the current config so that you can revert and start over if the change you make has an unexpected consequence.
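
A minimal sketch of that habit, assuming a hypothetical helper named `backup_config` and a timestamped `.bak` naming scheme (neither comes from the comment above):

```shell
#!/bin/sh
# Hypothetical sketch: copy a config file to a timestamped .bak next to it
# before editing, and print the name of the copy.
backup_config() {
    file=$1
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    copy="$file.$(date +%Y%m%d-%H%M%S).bak"
    # -p preserves mode and timestamps (and ownership where permitted).
    cp -p -- "$file" "$copy" && printf '%s\n' "$copy"
}

# Example with an illustrative path:
#   saved=$(backup_config /etc/ssh/sshd_config)
#   ...edit the file; if the change misbehaves, revert with:
#   cp -p -- "$saved" /etc/ssh/sshd_config
```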

Very good article …
-Mike

16 Paranoicster July 28, 2010 at 1:17 pm

Rule #4

merge into #3 – Get a date and let the computer do the work

17 Francisco Fiesta July 28, 2010 at 2:33 pm

Thanks a lot to all for your answers to my question. Very useful.
I’m not really a sysadmin, but at home I also have important information to keep safe and systems I wouldn’t like to reinstall from scratch again and again. So I take backing up seriously, making sure that my backups will work when I need them, and I also find it important to delegate as many repetitive tasks as I can to scripts or programs. Trying to become lazy involves learning scripting, more Linux shell and learning in general. So, I suppose laziness is the prize for knowledge.

18 MikeFM July 29, 2010 at 12:06 am

Redundant backups are important too. Expect that when your system fails your primary backup device will fail also. Unfortunately I’ve learned this the hard way. It’s best if your secondary backup is kept off site so if something happens that just flattens your entire block then you can drop the backup on a spare machine somewhere and be the hero when the company continues with minimal downtime instead of being out of business.

Our servers mostly are virtualized now so I even make backups of the entire VM and make backups on the OS level – if one backup stops functioning as expected for some reason then I have an alternate.

Employers tend to give admins crap for spending so much time and money on backups but when the shit hits the fan they are much happier that you were prepared.

19 gus3 July 29, 2010 at 12:08 am

Mike Hall has it wrong. The admins’ needs are totally disjoint from the end users’ needs. The end users need a reliable system; the admins need a system that isn’t unreliable. The end users need a system that won’t fail; the admins need to make sure that the system won’t fail, and so need the time to test the system in ways that might make the system fail.

I’ve been an admin, and I know that sometimes, the weekend is the best time for an outage. If a system will fail after an update, the weekend gives the most time to recover from the failure.

20 Michael July 29, 2010 at 2:15 am

Rule #4: Chaos theory (“butterfly effect”) is for real.
…or: If it works don’t change it!

As sysadmins we all know that even simple tasks such as unplugging a network printer will lead to an unpredictable series of events which will eventually end in disasters such as email servers not working.

Even though the two things are not connected in any way, disasters may pop up ;-)

21 RedRyder July 29, 2010 at 6:44 am

One good reason for rule #2: when you have to administer a server a couple of time zones away and all your network traffic goes through headquarters in the next state over, the difference between command line and GUI (xdmcp, vnc, etc.) can be hours of downtime for the customer.

Also on the topic of backup, at home at least, I find the best option is to store everything on my networked RAID drives. Every so often I just change out one of the drives using my spare and store it in the fireproof safe.

22 Andres Arenas July 29, 2010 at 6:46 am

I found the backup strategy one of the most challenging tasks of all. It is not as simple as backing up everything, or you will end up backing up all your users’ music, family and party pictures, and tons of crap. I agree that the most important part of the backup process is to test whether you can effectively restore the data; in the end that is the purpose of backing up.

I would recommend to check your plan taking into account the purpose of the backup:
[1] Disaster Recovery
[2] Archive or long term preservation of data.
The first strategy has the purpose of saving the most current data to get your systems up and running as soon as possible with the minimum data loss. Usually you don’t need old data for this purpose.
The second strategy is more complicated. Should you preserve all versions of your files? For how long? What data needs to be preserved and what data can be ignored (for preservation purposes)?

A final note: especially for archival purposes, it is important to back up with a tool/format that you can use in the future. Try to use standard tools and test whether you can still restore old data with your new shiny tape drive or backup software.

23 Go2Doug July 29, 2010 at 7:29 am

I have to disagree partially with rule #2, “master the command line”.

“Mastering” the command line implies that one should know by heart nearly all the command line commands and their associated options. Ever seen the book Linux in a Nutshell? There’s no way that somebody could memorize even half the commands in that book. Instead of using the command line for each and every task, I would advise learning it by heart for more common tasks.

By the way, in the *Nix world shouldn’t you be referring to the “shell prompt”? “Command line” is Windows jargon, isn’t it?

24 Keith Edward Brown July 29, 2010 at 8:34 am

Never, ever deploy version 1.0, or for that matter a brand spanking new product version that is significant to your daily operations, until it has been service packed (OS, backup, database, email server, etc.). Let the early adopters toil and suffer. Case in point, an IT services org deployed Exchange 2010 two weeks after its release. 4 weeks later there are still ongoing problems, including the pres/CEO not being able to open attachments on emails more recent than 6/22/10. Besides a product that is surely filled with bugs, the services org had only a few weeks of newsgroup postings for the sake of deploying and remediating this new service. And then consider how monumentally poor Backup Exec 12.x was for backing up Exchange 2007. Is there any reason to believe Backup Exec 2010 will be any better?

25 Ron S. July 29, 2010 at 9:54 am

Thou Shalt Not Maketh System Changes on Fridays
(Unless thou wishest to work weekends)

26 Ken August 1, 2010 at 9:44 pm

Just because you use the GUI doesn’t mean you aren’t comfortable with the cli.

27 Marco August 9, 2010 at 7:47 am

I could add:
*Practice any change on a non-critical environment before trying it on the production environment

On Rule #3 I could add to the description, that notifications are essential for ensuring the availability of the process

28 satheesh August 11, 2010 at 6:05 am

How can we automate the regular tasks? Is this done by writing scripts or by some other means? If anyone can explain, it would be very helpful to me and to others who are new in this field and want advice like this.
thank you
satheesh
