Nikto is a free (GPL) tool for scanning a web host for vulnerabilities. It checks for dangerous CGI files, outdated software versions, HTTP server options, and installed web servers and their options. It also scans open ports, which can be exploited by hackers. It can save reports in plain text, XML, HTML, NBE or CSV. Read More “Scanning your website for vulnerabilities with Nikto : Examples” »
Category Archives: System Administration
Protect your web applications: SQL Injection Attacks Basics
An SQL injection attack (SQLIA) is one of the most typical, easiest to carry out, and highest-rated vulnerabilities present in websites. Despite these attacks being easy to stop and being repeated again and again, unsuspecting webmasters easily fall prey to them through ignorance or a lack of vigilance about possible areas of attack. Read More “Protect your web applications: SQL Injection Attacks Basics” »
How NOT to lose your data: Starting Online backup
How many times have we lost data due to a corrupt hard disk, pen drive or CD/DVD? Many a time I have stored data on an external drive meant to be taken to another computer at another location, only to find on arrival that the data had somehow vanished. External hard disks are an option, but again, not fail-safe. A small 3-foot drop, and chances are that they are dead! Well, those days are gone. Now we have multiple options for storing data online, with the advent of multiple backup services. There is a huge range of free services like Dropbox (2GB free, recommended), Sugarsync (5GB free, recommended), UbuntuOne (5GB free, recommended), ADrive (50GB free, but with advertisements and only a web interface), filesanywhere (1GB free) and others.
If you need more space, there is a wide range of highly reliable (and cheap!) storage services, such as those offered by Google Storage Services and Amazon S3. These services require technical knowledge to use. For example, Amazon S3 is actually a web service, and third-party tools are required to use it with Windows. For non-technical users, it is better to go for the paid plans of Dropbox, Sugarsync, UbuntuOne or others.
There are tools available to back up an entire website to one of the above services. For example, this website undergoes an automated daily backup to Dropbox and Sugarsync (yes, multiple backups, just in case). Moreover, Amazon S3 also offers static website hosting, which is not susceptible to websites going down under heavy load.
Also, most of these services offer data sharing features, so we can share data across multiple computers, or send large files to others by exposing them via a URL.
Top free version control systems
The benefits of a version control system cannot be ignored in a shared software development environment. Among the many version control systems available, a few stand out in popularity due to speed, ease of use, or use in large environments like Linux kernel development. Listed below are a few of them, and any one of them should serve the purpose in most development scenarios.
Originally designed by Linus Torvalds for Linux kernel development, Git is a distributed version control system in which every working copy is a full-fledged repository. Its strong points include rapid branching and merging (with different branches holding different parts of the code), efficient support for large projects (the Linux kernel is hosted on Git), and a toolkit-based design (a collection of many small tools written in C or scripting languages to ease tasks). The name of a particular revision depends on the entire development history leading up to it, so changes to old versions are easily noticed. This provides an extra layer of security, called cryptographic authentication of history. Github and Google Code offer free Git hosting for open source projects. Github also offers paid private hosting, which suits small companies as it saves them hardware and hosting costs.
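To illustrate why a revision name that depends on the entire history makes tampering noticeable, here is a minimal hash-chain sketch in shell. It is not Git's actual object format (Git hashes structured commit, tree and blob objects); the function name chain_id and the sample messages are made up for illustration, and sha1sum from GNU coreutils is assumed to be installed.

```shell
#!/bin/sh
# Toy hash chain: each revision id is the SHA-1 of "<parent id> <message>".
# This only illustrates the idea; it is NOT Git's real object format.

# chain_id MESSAGE... : print the id of the last revision in the chain.
chain_id() {
    parent="root"
    for message in "$@"; do
        # The parent id is part of the hashed data, so every id depends
        # on all the revisions that came before it.
        parent=$(printf '%s %s' "$parent" "$message" | sha1sum | cut -d' ' -f1)
    done
    printf '%s\n' "$parent"
}

# Same last two messages, but the first (oldest) revision differs.
original=$(chain_id "add README" "fix typo" "add tests")
tampered=$(chain_id "add README (edited)" "fix typo" "add tests")

echo "original head: $original"
echo "tampered head: $tampered"
```

Editing the oldest revision changes the id of every later revision, so the two heads printed at the end differ even though the last two messages are identical.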

Apache Subversion (SVN) was developed as a mostly compatible successor to the well-known CVS. It is extensively used by the Apache Software Foundation, Ruby, PHP and MediaWiki. Its important features are atomic operations such as commit (which aim to maintain the integrity of the current revision), versioning of symbolic links, support for binary files, path-based authorization, and language bindings for popular languages like C#, Java and Python. SVN provides its own protocol (on default port 3690) for operations like checkout, commit and update. TortoiseSVN provides a great GUI that integrates with Windows Explorer to offer a nice interface for SVN operations. SourceForge and Google Code offer SVN hosting.

Sponsored by Canonical (who also sponsor Ubuntu), Bazaar is a version control tool written in Python. Bazaar supports working with or without a central server, and it is possible to use both methods at the same time on the same project. We can use Bazaar to create a branch from another system like SVN, work on it in Bazaar, and then merge the changes back into SVN (though support for Git is limited to read-only). Huge projects like Ubuntu, MySQL and Emacs use Bazaar, and free hosting is available on Launchpad and SourceForge. There are a lot of plugins and GUIs available for users.

Another distributed revision control tool written in Python, Mercurial focuses on high performance, decentralization, scalability and simplicity (it is much simpler than Git). It offers a great web GUI for managing files and revisions. TortoiseHg is a Windows extension for Mercurial. It also integrates well with Microsoft Visual Studio. Mercurial is used by important projects like Google's Go language, OpenJDK, Vim and OpenOffice. Codeplex, Google Code and SourceForge offer free code hosting using Mercurial.

CVS is one of the earliest version control systems, and is a mature product extensively used in the software development world. Though it does not offer the facilities of Git and SVN, it deserves a mention due to its wide usage in the corporate world. TortoiseCVS is a GUI for Windows which supports nearly all CVS operations. CVS also integrates well with IDEs like Eclipse and NetBeans. But due to its dormant development state, I would recommend using one of the above version control systems rather than CVS.
Using ack to search the source code: going beyond grep
ack (or ack-grep on Ubuntu) is a Perl-based search tool that replicates most of the functionality of the grep command, but goes a step further to position itself as an effective tool for searching source code files. Its main features include:
- Skipping CVS, SVN or Git directories, .bak files
- Searching only files of a specific language with the --type=TYPE option. ack supports a predefined list of languages and their file extensions. This can be overridden with the -a option to search all file types.
- It descends into directories to search the files, ignoring Subversion directories. More directories can be included or ignored using --[no]ignore-dir
- It supports grep options like -w, -A, -c etc.
Commonly used options include
-f : to print the file names to be searched
-w : search only a complete word
-G REGEX: Only paths matching REGEX are included in the search.
-H: Print the file name with each match
-h: suppress file names
-i: ignore case
--match REGEX: Used to specify the pattern explicitly. This is useful for performing multiple searches over the same set of files. For example:
# search for foo and bar in given files
ack-grep file1 t/file* --match foo
ack-grep file1 t/file* --match bar
-n: no descending into directories.
--sort-files: Sorts the found files lexically.
--type=TYPE, --type=noTYPE: Specify the files to include or exclude.
-v: invert match
Moreover, ack provides options to add file types, if required. This can be achieved by modifying the .ackrc file. The location of this file is specified by the ACKRC environment variable. If this file doesn’t exist, ack looks in the default location.
More help can be found with ‘ack --help’ or ‘ack-grep --help’, or by referring to the manual pages.
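For comparison, here is a rough sketch of how much has to be spelled out to get similar filtering from plain grep. It assumes GNU grep (for -r, --include and --exclude-dir), covers only the .pl and .pm extensions rather than ack's full Perl type, and uses a made-up function name (search_perl) and demo files.

```shell
#!/bin/sh
# Roughly what a whole-word, Perl-only ack search does, spelled out with
# GNU grep. Only .pl and .pm are covered here; ack's own type list is longer.

# search_perl PATTERN DIR : whole-word search in Perl files, skipping VCS dirs.
search_perl() {
    grep -rnw \
        --include='*.pl' --include='*.pm' \
        --exclude-dir=.git --exclude-dir=.svn --exclude-dir=CVS \
        -- "$1" "$2"
}

# Small demonstration tree in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/lib" "$demo/.svn"
echo 'my $foo = 1;'    > "$demo/lib/App.pm"
echo 'my $foobar = 2;' > "$demo/lib/Other.pm"   # not a whole-word match
echo 'foo'             > "$demo/.svn/entries"   # skipped: VCS directory
echo 'foo'             > "$demo/notes.txt"      # skipped: not a Perl file

search_perl foo "$demo"
rm -rf "$demo"
```

Only the line in lib/App.pm should be reported. With ack, the directory skipping and file-type filtering come built in, which is the convenience the feature list above describes.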
CSV to Excel: The smart way!!
Back in the days when I did not know Perl (which is not long ago), I wrote an elegant CSV to Excel converter in Java using POI to generate some reports. While POI is much more than a simple Excel writer, the coding time was comparatively high (more so because that solution supported many features like advanced designs etc.). Perl provides a quick solution if you need a script to generate reports from any data source. You may either modify the example below to use another data source, or write a wrapper script to get the data as CSV from SQL etc., and then call the Perl script.
The prerequisites for running this script are Perl itself and the Spreadsheet::WriteExcel module, which the script loads.
For those who know Perl, the script is simple enough. For those who don’t, please learn it. Either way, you can use this script in a very simple way.
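#!/usr/bin/perl
# Convert one or more CSV files into a single Excel workbook,
# with one worksheet per CSV file.
use strict;
use warnings;
use Spreadsheet::WriteExcel;

if (@ARGV != 2) {
    print "format: excelFileName commaSeparatedListOfCSVs\n";
    print "example: example.xls sheet1.csv,sheet2.csv\n";
    exit 1;
}

my ($excel_file, $csvs) = @ARGV;
my @csv_files = split /,/, $csvs;

my $workbook = Spreadsheet::WriteExcel->new($excel_file)
    or die "Could not create $excel_file: $!";

for my $csv (@csv_files) {
    print "processing $csv\n";

    # Name the worksheet after the CSV file, minus its extension.
    (my $name = $csv) =~ s/\.csv$//;
    my $sheet = $workbook->add_worksheet($name);

    open(my $fh, '<', $csv) or die "Could not open $csv: $!";
    my $row = 0;
    while (my $line = <$fh>) {
        chomp $line;    # keep the newline out of the last cell
        # Naive split: assumes no quoted fields that contain commas.
        my @entries = split /,/, $line;
        my $col = 0;
        for my $entry (@entries) {
            $sheet->write($row, $col, $entry);
            $col++;
        }
        $row++;
    }
    close($fh);
}

$workbook->close();
Here is a sample run: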
$ perl csvToExcel.pl test.xls test1.csv,test2.csv
processing test1.csv
processing test2.csv
And here is the generated Excel file… voila!! Time to code: 10 minutes.

Understanding init in Linux/Unix : With examples.
Init is the parent of all processes running in user space in Linux/Unix. At startup, init is responsible for starting all the non-operating-system services, creating the user environment, and presenting the user with the login screen. Likewise, at shutdown, it is responsible for terminating all processes in a controlled manner before the kernel executes its own shutdown.
The init process is located at /sbin/init on Linux, and has process ID (PID) 1. Processes managed by init are known as jobs. The configuration files of these jobs are usually located in /etc/init, unless overridden.
The init process has a run level associated with it. The run level determines which processes are executed at system startup. The run levels are stated below:
0 Halt
1 Single-user mode
2 Local Multiuser with Networking but without network service (like NFS)
3 Full Multiuser with Networking
4 Not Used
5 Full Multiuser with Networking and X Windows (GUI)
6 Reboot
For example, to reboot the system, simply run ‘init 6’ as root (run a sync before that, as shown below):
$ sync
$ sudo telinit 6
$ Connection to 10.0.0.4 closed by remote host.
Connection to 10.0.0.4 closed.
There is a special run level S, which is not really meant to be used directly, but rather by the scripts that are executed when entering run level 1. We can switch from one run level to another using telinit. The services which do not belong to the new run level are stopped, and the ones required are started. This is performed by the /etc/init.d/rc script, which is executed on a change of run level. This script examines the symlinks in the /etc/rc?.d directories: symlinks beginning with K are services to be stopped, and symlinks beginning with S are services to be started. For example:
$ ls -lrt /etc/rc5.d/
total 4
-rw-r--r-- 1 root root 677 2011-03-28 22:10 README
lrwxrwxrwx 1 root root 18 2011-04-28 14:23 S70pppd-dns -> ../init.d/pppd-dns
lrwxrwxrwx 1 root root 19 2011-04-28 14:23 S70dns-clean -> ../init.d/dns-clean
lrwxrwxrwx 1 root root 15 2011-04-28 14:23 S50saned -> ../init.d/saned
lrwxrwxrwx 1 root root 15 2011-04-28 14:23 S50rsync -> ../init.d/rsync
lrwxrwxrwx 1 root root 20 2011-04-28 14:23 S50pulseaudio -> ../init.d/pulseaudio
lrwxrwxrwx 1 root root 19 2011-04-28 14:23 S25bluetooth -> ../init.d/bluetooth
lrwxrwxrwx 1 root root 27 2011-04-28 14:23 S20speech-dispatcher -> ../init.d/speech-dispatcher
lrwxrwxrwx 1 root root 20 2011-04-28 14:23 S20kerneloops -> ../init.d/kerneloops
lrwxrwxrwx 1 root root 18 2011-04-28 14:23 S99rc.local -> ../init.d/rc.local
lrwxrwxrwx 1 root root 18 2011-04-28 14:23 S99ondemand -> ../init.d/ondemand
lrwxrwxrwx 1 root root 21 2011-04-28 14:23 S99grub-common -> ../init.d/grub-common
lrwxrwxrwx 1 root root 22 2011-04-28 14:23 S99acpi-support -> ../init.d/acpi-support
lrwxrwxrwx 1 root root 24 2011-04-28 14:23 S90binfmt-support -> ../init.d/binfmt-support
lrwxrwxrwx 1 root root 14 2011-08-14 16:21 S75sudo -> ../init.d/sudo
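The naming convention of these links can be sketched with a small shell function. This is only an illustration of how the names are read, not the actual /etc/init.d/rc logic; the function name describe_rc_link is made up, a two-digit order number is assumed (as in the listing above), and K20apache2 is a hypothetical link name added to show the stop case.

```shell
#!/bin/sh
# describe_rc_link NAME : explain what a SysV-style rc symlink name means.
# Names are assumed to follow <S|K><two-digit order><service>, e.g. S70pppd-dns.
describe_rc_link() {
    name=$1
    case "$name" in
        S[0-9][0-9]*) action="start" ;;
        K[0-9][0-9]*) action="stop" ;;
        *) echo "$name: not an rc link"; return 1 ;;
    esac
    rest=${name#?}             # strip the leading S or K
    service=${rest#??}         # strip the two digits, leaving the service name
    order=${rest%"$service"}   # what remains in front is the order number
    echo "$action $service (order $order)"
}

# Three names from the listing above, plus one hypothetical K link.
for link in S20kerneloops S50rsync S99rc.local K20apache2; do
    describe_rc_link "$link"
done
```

Links with lower order numbers are processed first, which is why rc.local, at 99, runs near the end of the start sequence.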
telinit: To change the run level, we use the telinit command:
Usage:
telinit [OPTION]... RUNLEVEL
For example:
$ sudo telinit 5
telinit may also be used to send basic commands to init, like Q or q to request that init reload its configuration.
To Shell Scripts and beyond
Shell scripts are good and very handy when it comes to Linux system management. A huge level of automation can be achieved with small shell scripts. One big advantage that I feel while using such scripts is that they are interpreted and not compiled (like Java). So a small change can quickly be reflected in the running system. Moreover, I do not have to write 10 lines of code to read a file line by line and perform some operation. Just do a cat, pipe the output to another command and we are good. Or better, to search for something in a file, use grep. These small advantages, when combined together, make shell scripts very fast to code and deploy. But it is not the complete story.
The main problem comes when we try to scale up the scripts to do complicated tasks. The scripts tend to become slower and slower as the number of commands involved increases. This happens because each command is a process in itself, each with its own startup time and execution time. Besides, shell scripts are prone to errors, with huge costs if we do something like rm -rf *. I have, in a short span of time, been excited by scripts, committed costly mistakes, and waited 30 minutes or more for a script to complete (which took just 5 minutes when ported to Java).
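The per-command process cost can be seen in a small sketch like the one below, which sums the second column of a comma-separated file in two ways. The function names and the generated test data are made up for illustration; actual timings depend on the machine, so none are claimed here.

```shell
#!/bin/sh
# Two ways to sum the second column of a comma-separated file.

# Slow pattern: starts a new 'cut' process for every line of input.
sum_with_loop() {
    total=0
    while IFS= read -r line; do
        value=$(printf '%s\n' "$line" | cut -d, -f2)
        total=$((total + value))
    done < "$1"
    echo "$total"
}

# Faster pattern: a single awk process reads the whole file.
sum_with_awk() {
    awk -F, '{ total += $2 } END { print total + 0 }' "$1"
}

# Generate 500 rows of sample data: row1,1 ... row500,500
data=$(mktemp)
i=1
while [ "$i" -le 500 ]; do
    echo "row$i,$i"
    i=$((i + 1))
done > "$data"

sum_with_loop "$data"
sum_with_awk "$data"
rm -f "$data"
```

Both functions print the same total, but the loop version launches one extra process per input line. Timing each call (for example with the time keyword in bash) on a larger file should make the gap visible.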
These things have forced me to look beyond shell scripts. I think such a time comes in the life of every Linux administrator, when shell scripts start looking more like a problem than a solution. I have experimented with Python, php, Groovy, Scala and Perl.
Coming from a Java background, I personally found Groovy and Scala much easier to learn, and I still use them. For Java developers, picking up one of these languages should not be a hassle. But for absolute beginners, I would say go with php, because learning php also gives an extra edge due to its extensive use in web development. Once you are into php (or decide to skip it altogether), please take the time to be amazed by the power of Python and Perl. These languages were built to replace those non-scalable shell scripts with scripts that are very fast and easily manageable.
Finally, what it comes down to is this: pick one and dive deep. Each of the languages mentioned above can perform these tasks as well as the others. Also, I am yet to try my hand at Ruby!
Experiences with MySQL Engines
MySQL supports many storage engines to suit different data and application requirements. At times, the requirement is to support a large number of read queries, while at other times it is to support transactions with more writes than reads. MySQL handles these different types of requirements with different types of storage engines, each with distinct strong and weak points. I have personally worked with the InnoDB, MyISAM, MEMORY and MERGE storage engines, and they are the ones I will cover. While most of it is good theory, and this article will cover the important points (benefits and problems) associated with each of the engines, I will also state my own experiences alongside.
MyISAM: This engine supports high read and write rates, and is best for heavy reads of large data. The data is stored in files on the disk. The important points to note about MyISAM are:
- MyISAM does not support transactions and foreign key constraints.
- It does not support row-level locking. This can be a pain, because the entire table gets locked during inserts and updates. Readers obtain shared locks while writers obtain exclusive locks. Hence, it is better to split one large read statement into multiple small read statements whenever possible.
- It comes with myisamchk, a handy command-line tool to check and repair corrupted tables.
- It also comes with a nice myisampack utility which compresses MyISAM tables. Though the compressed tables are no longer available for writes, reads become faster because fewer disk seeks are needed to get the records.
- Indexes are simpler when compared to InnoDB
- MyISAM has built-in full-text search.
My Experiences: I have found MyISAM to be a very robust engine; it easily holds 10 to 20 million records without any noticeable decrease in performance. Obviously, indexes play a huge role in the performance.
MyISAM is best used when there is a large number of reads compared to writes. This engine is not suitable for a huge number of writes.
InnoDB: InnoDB is the ACID-compliant storage engine of MySQL, and supports foreign key constraints. It boasts automatic crash recovery and high performance as strong weapons in its arsenal. Important points:
- Supports transactions.
- It has a row-level locking feature, which means that it can support multiple writes and reads at the same time.
- The index size tends to be large (because secondary indexes also contain the primary key), and hence more disk space is required compared to MyISAM.
- Altering highly indexed tables is a very expensive operation for the InnoDB engine because of the need to recreate indexes. But altering a table is something that is rare.
- Because of the complex index structure, inserts and updates are somewhat slower than in MyISAM (but this is easily compensated by support for multiple concurrent writes due to row-level locking).
My Experience: In my case, as long as the volume of data remained below 5 million, InnoDB was happy with inserts and selects, and very fast compared to MyISAM. But as the data grew, inserts became slow due to multiple indexes, and complex selects became slower.
MyISAM Merge Engine (MERGE): It is a combination of identical MyISAM tables into one virtual table. This is best used for logging and warehousing applications, and it works best together with the myisampack utility. Important points:
- It is easier to repair multiple small tables after a crash than one huge table.
- It can use the indexes of the original underlying tables, though primary key or unique key constraints hold no meaning for a merge table.
- The main problem comes when you want to alter the structure of the table. In that case, you need to alter all the underlying tables, otherwise the merge table breaks.
- Index reads are slower.
- The order of indexes in the MERGE table and its underlying tables should be the same.
My Experience: In my case, I had created one MyISAM table per month; the current month's table accepted data during inserts, while older tables were compressed with myisampack. The entire set was merged into a MERGE table to provide the complete data set and faster writes. This model operated flawlessly for a long time, but afterwards, the number of tables to manage for an alter command became a pain.
MEMORY Engine: This engine stores the data in memory, which makes it very fast for writes and reads, but it is not safe for sensitive data. A crash or restart of MySQL will cause loss of data. Points:
- MEMORY tables use hash indexes by default, which makes them very fast, and very useful for creating temporary tables.
- MEMORY engine tables cannot have TEXT or BLOB columns, making them unsuitable for tables with such requirements. The best usage of this engine is for storing temporary session information.
- There is a limit on the amount of data that can be stored in MEMORY engine tables. Hence, it becomes important to keep deleting older and obsolete data.
My Experience: I often use MEMORY tables to store data which can be recreated from other tables, for example, summary tables used while displaying logs to the user. Summary tables are very handy, and are populated using external jobs. Because the underlying data is safely stored in either the MyISAM or InnoDB engine, the risk of data loss is negligible. Because the data is stored in memory, and is preprocessed most of the time, lookups tend to be extremely fast.
Screen in Linux
Screen is yet another way to run processes in the background (besides nohup), and comes in very handy when running processes that take a lot of time. Let us work through an example of how to use the screen command. Install screen if required using apt-get, yum or your default package manager, and start screen using:
$ screen
Now, once started, you can open multiple windows of screen in the same terminal. For example, I can run a script which sleeps for 1 hour before printing something:
#!/bin/bash
sleep 3600
echo "done"
I run it as shown below:

The script waits and waits. In the same terminal, I can create a new screen, leaving this process running in the background, using the shortcut key ‘CTRL-a c’ (first press ctrl-a, and then c; this is called key binding, explained later). I am presented with another screen to work on, and I can run ‘top’ on it:
Once I am done using the top command, I can exit the screen with an exit command, or switch to another screen with ctrl-a,n (next) or ctrl-a,p (previous). But let’s assume that while my script was running on a remote server, my connection broke and I lost my screen. Here is where the screen command options come in handy. We can use ‘screen -ls’ to see which screens are available for us to reattach, and then reattach to the previously disconnected screen using ‘screen -r’ as shown below (in another terminal):

I return to my old ‘top’ command, and by pressing ctrl-a,n I can see the script running as well, patiently waiting for the sleep to end.
Screen comes to the rescue when we are on a bad connection, or when we are using more than one computer and need to share the same terminal.
Screen has a lot of options and shortcuts. A few of the most used options are:
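The same workflow can also be driven non-interactively. The sketch below wraps three common screen invocations in shell functions; the function names, the session name longjob, the script path ./wait.sh and the SCREEN_BIN variable are all made up for illustration (SCREEN_BIN is only a hook for swapping in another command, for example for a dry run). The -d -m, -S, -ls and -d -r options are standard screen options.

```shell
#!/bin/sh
# Thin wrappers around common screen invocations.
# SCREEN_BIN is a hypothetical override; it defaults to the real screen binary.
: "${SCREEN_BIN:=screen}"

# start_detached NAME COMMAND... : run COMMAND in a new detached session NAME.
start_detached() {
    name=$1
    shift
    "$SCREEN_BIN" -dmS "$name" "$@"
}

# list_sessions : show existing sessions; detached ones can be resumed.
list_sessions() {
    "$SCREEN_BIN" -ls
}

# reattach NAME : reattach to NAME, detaching it elsewhere first if necessary.
reattach() {
    "$SCREEN_BIN" -d -r "$1"
}
```

For example, start_detached longjob ./wait.sh would start the script in the background; after a dropped connection, list_sessions followed by reattach longjob from a new terminal should bring the session back.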
-d -r : Reattach a session and if necessary detach it first.
-ls and -list : does not start screen, but prints a list of pid.tty.host strings and creation timestamps identifying your screen sessions. Sessions marked `detached’ can be resumed with “screen -r”. Those marked `attached’ are running and have a controlling terminal. If the session runs in multiuser mode, it is marked `multi’. Sessions marked as `unreachable’ either live on a different host or are `dead’. An unreachable session is considered dead, when its name matches either the name of the local host, or the specified parameter, if any. See the -r flag for a description how to construct matches. Sessions marked as `dead’ should be thoroughly checked and removed. Ask your system administrator if you are not sure. Remove sessions with the -wipe option.
-x : Attach to a not detached screen session. (Multi display mode). Screen refuses to attach from within itself. But when cascading multiple screens, loops are not detected.
As mentioned above, screen also accepts shortcut keys to run commands (this is called key binding). They start with ctrl-a, followed by the command key shown below. To get this help screen, press ‘ctrl-a ?’


