The Muse: 2008

You've got hacked :)

Yup, ever since I opened up the SSH port, someone tries to get into the system about once a day. Here is a small piece of the log:

Oct 13 17:19:07 *******-server sshd[7250]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.117.255.223  user=nobody
Oct 13 17:19:09 *******-server sshd[7250]: Failed password for nobody from 58.117.255.223 port 35131 ssh2
Oct 13 17:19:17 *******-server sshd[7255]: Invalid user patrick from 58.117.255.223
Oct 13 17:19:17 *******-server sshd[7255]: pam_unix(sshd:auth): check pass; user unknown
Oct 13 17:19:17 *******-server sshd[7255]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.117.255.223
Oct 13 17:19:19 *******-server sshd[7255]: Failed password for invalid user patrick from 58.117.255.223 port 35703 ssh2
Oct 13 17:23:50 *******-server sshd[7325]: Invalid user jane from 58.117.255.223
Oct 13 17:23:50 *******-server sshd[7325]: pam_unix(sshd:auth): check pass; user unknown
Oct 13 17:23:50 *******-server sshd[7325]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.117.255.223
Oct 13 17:23:52 *******-server sshd[7325]: Failed password for invalid user jane from 58.117.255.223 port 52794 ssh2
Oct 13 17:23:59 *******-server sshd[7327]: Invalid user pamela from 58.117.255.223
Oct 13 17:23:59 *******-server sshd[7327]: pam_unix(sshd:auth): check pass; user unknown
Oct 13 17:23:59 *******-server sshd[7327]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.117.255.223

The address belongs to some kid in China (Beijing Education Information Network). The box has been safe so far.
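To see at a glance who is knocking, the failed attempts can be tallied per remote host. Below is a minimal sketch, assuming the log lines follow the "Failed password for ... from ... port ..." format shown above; the class and method names (FailedLoginCounter, countByHost) are made up for illustration and the sample lines use a placeholder host name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FailedLoginCounter {
    // Matches both "Failed password for nobody from ..." and the
    // "Failed password for invalid user patrick from ..." variant.
    private static final Pattern FAILED = Pattern.compile(
            "Failed password for (?:invalid user )?(\\S+) from (\\S+) port \\d+");

    /** Counts failed password attempts per remote host, in order of first appearance. */
    static Map<String, Integer> countByHost(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : logLines) {
            Matcher m = FAILED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines modelled on the log above; "host" is a placeholder.
        List<String> sample = Arrays.asList(
            "Oct 13 17:19:09 host sshd[7250]: Failed password for nobody from 58.117.255.223 port 35131 ssh2",
            "Oct 13 17:19:17 host sshd[7255]: Invalid user patrick from 58.117.255.223",
            "Oct 13 17:19:19 host sshd[7255]: Failed password for invalid user patrick from 58.117.255.223 port 35703 ssh2");
        System.out.println(countByHost(sample)); // prints {58.117.255.223=2}
    }
}
```

Only the "Failed password" lines are counted, so the accompanying pam_unix and "Invalid user" lines do not inflate the numbers.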

Finally, after successfully setting up OpenVPN with a Linux server and XP/Ubuntu clients (Vista is evil), we decided to create the final network, but were stalled by the usual problem of naming the machines (I wonder what I will do when I have to name my kids!!). I looked around for tips and found RFC 1178, which offers a few guidelines for choosing names.

Without much ado, the names helium, neon, argon and krypton clicked. Noble gases, I say...

Book progress

I've finished the first chapter and have been able to rope in a few friends to help me proofread and correct me along the way. I had never thought writing a book would be so much fun, but a lot of thought also goes into making it simple yet professional enough for anyone and everyone to understand.

There are many references to Wikipedia for definitions of certain terms. Those who don't trust Wikipedia can look up their favourite reliable resources for further explanation and review.

Another interesting problem I'm facing as I go ahead is how detailed the explanations should be, and how much prior knowledge I should assume while explaining a topic. Hopefully my friends will be able to point that out. If you are interested in participating in the proofreading, just drop me an email and I'll be glad to send you an invite.

My first E-Book: Kick Start a software start-up

OK, after all the configuration and working of the VM server with 4 virtual machines, I found it too much info to blog (lazy me..) with all the steps and configuration (I still have some 3 posts in draft, yet to be published). There were some random thoughts on how this process and infrastructure can be used by others (start-ups or teams on low budgets) for their benefit.

Since most of the blog postings are enlightenments that occurred to me once in a while, I've decided to consolidate all this information and jot it down as a step-by-step guide on how to use the VM infrastructure for small shops, or for small projects in large shops. The table of contents for this book will look something like this.

Table of Contents

Preface
Who should read this book
What are the technologies we'll cover
Introduction
Analyzing & Building the server
Software selection.
RAID & LVM
Setup & Configure your first VM
VPN Appliance
Creating more Appliances I: Webserver, Database
Creating more Appliances II: Version Control, Continuous Integration Servers
Timed VNC sessions (iTalc or vncthumbnailviewer)
Appendix A: Configure your Linksys router with DDNS

If there is anything else you would like me to consider, let me know; if it is practically possible, I'll try to include it in the book. I hope to get the book out in 2-3 months (provided I can give it one hour a day). People volunteering for proofreading are most welcome.

CollectionUtils (apache commons) Bad Boy....

We had a problem at work today, where we needed to find the difference between two collections, remove from the first the elements that are missing from the second, and add to the first the elements that exist only in the second.

This is related to hibernate add-remove-update for collections. Read here about it

Set A = {R, G, B}
Set Z = {R, G, Y}

The final set sent to the DB is {R, G, Y}. That looks simple, so we could just send the second set Z. The catch is that we need to keep the instance of A and finally send A back to the Hibernate layer, so all the manipulations have to happen on it.

Solving the problem mathematically, we need the intersection of A and Z, {R, G}, in union with the relative complement of A in Z, {Y}: {R, G} + {Y} = {R, G, Y}.

Using Java we can solve this using three approaches.

For Loop
CollectionUtils
JDK collections native (1.6)

For Loop

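private static void forLoop(List<Set<String>> all) {
    Set<String> db = all.get(0);
    Set<String> web = all.get(1);

    // Iterate over a snapshot so that db can be modified safely inside the loops
    Set<String> tempCollection = new HashSet<String>(db);

    // Remove what is no longer present on the web side
    for (String t : tempCollection) {
        if (!web.contains(t)) {
            db.remove(t);
        }
    }

    // Add what is new on the web side
    for (String t : web) {
        if (!tempCollection.contains(t)) {
            db.add(t);
        }
    }
}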

CollectionUtils

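private static void collectionUtils(List<Set<String>> all) {
    Set<String> db = all.get(0);
    Set<String> web = all.get(1);

    // CollectionUtils.retainAll returns a new collection and leaves its
    // arguments untouched, so the result has to be copied back into db.
    Collection<String> common = CollectionUtils.retainAll(db, web);
    db.clear();
    db.addAll(common);
    db.addAll(web);
}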

JDK Collection Native

private static void jdkNative(List<Set<String>> all) {
    Set<String> db = all.get(0);
    Set<String> web = all.get(1);

    db.retainAll(web); // keep only what both sets share
    db.addAll(web);    // then add what is new in web
}

Now we ran a performance test on all three, and were shocked to see the results.

forLoop: 18 ms
collectionUtils: 72 ms
jdkNative: 5 ms

This is what the web and db collection were made up of.

Create Collections

private static List<Set<String>> createCollections() {
    List<Set<String>> all = new ArrayList<Set<String>>();

    Set<String> db = new HashSet<String>();
    Set<String> web = new HashSet<String>();
    for (int i = 0; i < 10000; i++) {
        db.add(String.valueOf(i));
    }

    for (int i = 5000; i < 15000; i++) {
        web.add(String.valueOf(i));
    }

    all.add(db);
    all.add(web);

    return all;
}

and the main method

public static void main(String[] args) {
    long start = 0;
    long end = 0;

    List<Set<String>> all = createCollections();

    start = new Date().getTime();
    forLoop(all);
    end = new Date().getTime();
    System.out.println("TIME BY forLoop " + (end - start));

    start = new Date().getTime();
    jdkNative(all);
    end = new Date().getTime();
    System.out.println("TIME BY jdkNative " + (end - start));

    start = new Date().getTime();
    collectionUtils(all);
    end = new Date().getTime();
    System.out.println("TIME BY collectionUtils " + (end - start));
}

So avoid using the CollectionUtils. See what is more important, cleaner code or faster code ;)
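One caveat about these numbers: in the main method above, all three methods run against the same pair of sets, so once forLoop has run, db already equals web and the later methods no longer have any real changes to make. A minimal sketch of a fairer harness follows, assuming fresh sets for every run, a few warm-up iterations and System.nanoTime(). The class and method names (SyncBenchmark, range, timeOnce) are made up for illustration, and only the JDK-native variant is shown.

```java
import java.util.HashSet;
import java.util.Set;

final class SyncBenchmark {
    /** Builds the set of decimal strings from..to-1, mirroring createCollections(). */
    static Set<String> range(int from, int to) {
        Set<String> s = new HashSet<String>();
        for (int i = from; i < to; i++) {
            s.add(String.valueOf(i));
        }
        return s;
    }

    /** The JDK-native variant: db keeps its identity and ends up equal to web. */
    static void jdkNative(Set<String> db, Set<String> web) {
        db.retainAll(web);
        db.addAll(web);
    }

    /** Times one run against freshly built sets and returns the elapsed nanoseconds. */
    static long timeOnce() {
        Set<String> db = range(0, 10000);
        Set<String> web = range(5000, 15000);
        long start = System.nanoTime();
        jdkNative(db, web);
        long elapsed = System.nanoTime() - start;
        if (!db.equals(web)) {
            throw new IllegalStateException("db was not synchronised with web");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            timeOnce(); // warm-up runs, results discarded
        }
        System.out.println("jdkNative: " + timeOnce() / 1000 + " us");
    }
}
```

The same timeOnce pattern can be repeated for the other two variants, so that each one starts from identical, untouched input.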

iPhone & CFA

I've enrolled myself for the Level I CFA exam in December '08. I'm studying on and off, depending on my schedule and time availability. Of late I stumbled upon SenFinance, an interesting site full of Accounting and Quantitative tutorials for CFA Level I.

I downloaded them all and wanted to view them on my way to work (a 20 minute train ride is good for studying). So I got my hands on SUPER®, an amazing piece of software that converts any video or audio file to another format. Now I finally have the tutorials in mp4 format and can study on my way to work!

Setting up a PHP development environment on your local system

This is an old post I had written on Google Docs, before VMware Server was free for the general user. I thought it could still be relevant to the general developer for testing in a Linux environment. VMPlayer, I believe, is lighter than VMware Server.

There are 2 stages to setting up the development & testing environment for yourself. The preferred work environment is Linux. If you are already on Linux, you can safely skip Stage I and proceed directly to Stage II; just cross-check that you have the LAMP (Linux + Apache + MySQL + PHP) server installed.

Pre Installation Software Download Links

http://www.vmware.com/download/player/ [VMPlayer]
http://www.ubuntu.com/getubuntu/download [Ubuntu]

STAGE I: Setting up Ubuntu on VMWare.

Download the Ubuntu/Xubuntu/Kubuntu Desktop edition ISO, whichever you like... they are all the same, just with different desktop environments (GNOME/Xfce/KDE resp.). If you are comfortable with the command line, you can also install the Server edition, which has no X Window System and only a command line.
Download and install VMPlayer for your system.
Download qemu for your operating system and extract it into any directory you like.
Open your command line, go to the bin folder of qemu and type

qemu-img create -f vmdk ubuntu.vmdk 3G

Here we are creating a 3GB virtual hard drive in VMWare format. This file will now act as the hard drive on which VMWare installs the OS. The name of the file is ubuntu.vmdk; you can name it whatever you wish.
Now for the main vmx file (the VMWare configuration file). Change the file paths in it as per your system.

 #!/usr/bin/vmware
config.version = "8"
virtualHW.version = "4"
ide0:0.present = "TRUE"
ide0:0.filename = "ubuntu.vmdk"
# The amount of RAM you want to allot to the Operating system. For Desktop use 512 and server just 256.
memsize = "512"
MemAllowAutoScaleDown = "FALSE"
ide1:0.present = "TRUE"

#ide1:0.fileName = "auto detect"
#ide1:0.deviceType = "cdrom-raw"

ide1:0.fileName = "ubuntu-7.04-server-i386.iso"
ide1:0.deviceType = "cdrom-image"

ide1:0.autodetect = "TRUE"
floppy0.present = "FALSE"
ethernet0.present = "TRUE"
usb.present = "TRUE"
sound.present = "TRUE"
displayName = "Ubuntu LAMP Server"
guestOS = "ubuntu"
nvram = "ubuntu-server-three.nvram"
MemTrimRate = "-1"

ide0:0.redo = ""
ethernet0.addressType = "generated"
uuid.location = "56 4d ce 99 e0 d2 2b bf-73 47 ac 62 65 13 57 86"
uuid.bios = "56 4d ce 99 e0 d2 2b bf-73 47 ac 62 65 13 57 86"

tools.syncTime = "TRUE"
ide1:0.startConnected = "TRUE"

uuid.action = "create"

checkpoint.vmState = "ubuntu-lamp-server.vmss"

isolation.tools.hgfs.disable = "TRUE"
virtualHW.productCompatibility = "hosted"
tools.upgrade.policy = "manual"

tools.remindInstall = "TRUE"

usb.autoConnect.device0 = ""

After setting all the paths correctly, and with VMPlayer installed, just save the vmx file (call it ubuntu.vmx) and double-click on it.
If all the paths are set correctly, VMPlayer will boot from the virtual drive and show the Ubuntu installation menu. This is a million times easier than Windows.
At the end of the installation (server or desktop) the installer will ask whether you want to install the LAMP server. Select it and let it install.
Reboot the system, and you are good to go. You have successfully installed Ubuntu on VMWare.

STAGE II: Check the configuration setup of Apache/MySQL/PHP

Now we'll check whether PHP and Apache have been successfully installed on your system. Go to the /var/www folder; you should see an apache2-default folder there. If you see it, Apache seems to be installed. Just open the browser and type http://localhost and you should be able to see the apache2-default folder listed. If you receive a page-not-found error, Apache is not running or not installed properly.

If things look good, the next step is to check the PHP installation. Create a file named index.php and type the following into it.

 <?php phpinfo(); ?>

Place the file in /var/www and refresh http://localhost. That should give you loads of PHP info on the screen, in blue and purple, laid out in a huge table. If that happens you are good to go!!! Else something is wrong!!!

MySQL Check
Open the command line and type

mysql -uroot

If it opens up with a mysql> prompt then all is good; else something is wrong. Note that the MySQL root password is blank at this point.

If all sounds good, now go to Stage III

STAGE III: Applications to setup.

Download

JOOMLA
WORDPRESS

Extract them into /var/www and access them using http://localhost/joomla and http://localhost/wordpress respectively, provided you extracted them into folders with these names.

To set them up, open index.php in the browser or read the readme file, and you should be good to go.

The demise of MyGadgetBuilder

Hello all, my previous blog, MyGadgetBuilder.com has been officially shut down. I'll not be renewing the domain name any more, so some one can go ahead and claim it. It was a nice learning experience. Being a developer with some design skills, putting up the site was a piece of cake, but getting people to post on the site was kind of painful.

People had issues with copyright, data protection and all kinds of issues. Also things/people change as time passes by, everyone gets busy with their own profession and has little time for off-track exploration for their old love (electronics). I don't blame them, priorities do change; but took me some time to realize :) better late than never.

Well, all for the good. Henceforth MGB will point to this blog (till the end of the subscription). Instead, visit Instructables, a nice site that already existed along the lines of the world I was planning to build.

Cloud Computing on VMWare or Xen

What is Cloud Computing (CC)?
A lot has already been said and blogged about cloud computing. Just to give you a one liner about it;

It's the availability of system resources to allow scalability of the system as and when required by the application.

As and when required?
Consider the example of a build server running in our VM environment. During its peak, instrumenting code and running all kinds of code analysis, this server requires the power of a quad-core machine with 4GB RAM. So when you perform the hardware requirement analysis for this server, you consider all these factors and build a VM fulfilling these requirements.

Over time, your system usage reports show a small spike in resource usage every 3-4 hours (every commit build) and a peak lasting 2-2.5 hours every 24 hours (nightly builds). The rest of the time the system is pretty much idle, doing nothing while eating up its precious RAM allocation and dedicated CPUs.

Now if we push this setup into a CC environment, the system will use the bare minimum of resources when idle and grab all the available resources from the cluster during peak cycles. This allows us to put the otherwise unused resources to work during non-peak cycles.

VMWare has DRS (Distributed Resource Scheduler), which helps the VMs attain something close to cloud status by dynamically allocating resources.

Xen also has something similar, but is it available to a small start-up (read: free)? Yes, Nimbus@UC is something we can look at; I will dig into it further and see what we can achieve with it. So let us see how we can leverage the existing hardware to allow maximum virtual resource utilization.

Update: Read this article http://highscalability.com/eucalyptus-build-your-own-private-ec2-cloud

Virtualization: The Practical Implementation - I

I'm helping my friend Nitin at his firm Star4ce Technologies put up a virtualization server as their in-house development environment. To begin with we went ahead and ordered some hardware from newegg.

Here is a brief on the hardware we've picked;

Intel Core 2 Quad Q9300 Yorkfield 2.5GHz LGA 775 95W Quad-Core Processor
ASUS P5K-E LGA 775 Intel P35 ATX Intel Motherboard
(5) Seagate Barracuda 7200.11 ST3500320AS 500GB 7200 RPM SATA 3.0Gb/s Hard Drive
(2) CORSAIR 4GB (2 x 2GB) 240-Pin DDR2 800 (PC2 6400) SDRAM Dual Channel
LITE-ON 20X DVD±R DVD Burner with LightScribe SATA
Antec 850W ATX12V / EPS12V Power Supply
Antec P182 Gun Metal Black 0.8mm cold rolled steel ATX Mid Tower Computer Case
ZOTAC GeForce 7300GT 256MB 128-bit GDDR2 PCI Express x16
Logitech USB + PS/2 Cordless Standard Desktop EX110 Mouse Included

Now the bill comes to around $1,500.00 as of the day this article was written. Hopefully we shall receive the components by next week and go ahead and assemble the system.

A few after thoughts about the hardware choices

Intel Core 2 Quad Q9300
Built on a 45nm process, it consumes less power than its predecessors and can be overclocked to 3.2GHz.

ASUS P5K-E
It's not a high-end server board, but it can be used for a low-end server. We did look at a few other motherboards but opted for this one, as the others had SLI and were geared more towards gaming PCs.

Why no Server Motherboards
This is an interesting find we dug into. We considered server motherboards for the system, as they support 32GB of RAM quite easily, compared to the 8GB maximum of the board mentioned above. The price was not that different, approx $150.00 higher (with dual physical processors), but they take FB-RAM, which took its toll on the price of the system. FB-RAM runs from approx $250.00 for 4GB to $550.00 for 8GB. Yes, that's a whopping high number we are talking about.

So we are back to reality with a lower-end system; recall that the first article was aimed at start-ups and low-budget development teams.

Seagate Barracuda 7200.11 ST3500320AS
32MB cache, and I trust Seagate. 5 pieces: 1- OS and 4- RAID 01

CORSAIR 4GB (2 x 2GB) 240-Pin DDR2 800
Nice & reliable.

ZOTAC GeForce 7300GT 256MB
The motherboard has no onboard video; this was the lowest-priced and best option.

So we wait and watch for the parts to arrive. I'll be posting the progress, issues & accomplishments intermittently along the way.

UPDATE: The shipping was quick; the parts should be arriving this afternoon.

DATETIME vs TIMESTAMP vs DATE & TIME - II

OK, now the tests have run and the results are out. I know we are all excited to know them, and I'm equally eager to print them too!!

The test did a simple SELECT * from each of the tables.

  public void fetchAll() throws Exception {
  String SQL1 = "SELECT * FROM dateandtime";
  String SQL2 = "SELECT * FROM datetime";
  String SQL3 = "SELECT * FROM timestamps";

  long start = 0;
  long end = 0;

  System.out.println("ONE");
  start = new Date().getTime();
  selectQuery(SQL1);
  end = new Date().getTime();
  System.out.println(" SQL 1 - dateandtime " + (end - start));

  System.out.println("TWO");
  start = new Date().getTime();
  selectQuery(SQL2);
  end = new Date().getTime();
  System.out.println(" SQL 2 - datetime " + (end - start));

  System.out.println("THREE");
  start = new Date().getTime();
  selectQuery(SQL3);
  end = new Date().getTime();
  System.out.println(" SQL 3 - timestamps " + (end - start));
}

The time to fetch kept reducing with every subsequent call.

SQL 1 - dateandtime 4526 ms
SQL 2 - datetime 2852 ms
SQL 3 - timestamps 3577 ms

SQL 1 - dateandtime 4168 ms
SQL 2 - datetime 2467 ms
SQL 3 - timestamps 3073 ms

SQL 1 - dateandtime 4080 ms
SQL 2 - datetime 2346 ms
SQL 3 - timestamps 3130 ms

SQL 1 - dateandtime 3949 ms
SQL 2 - datetime 2419 ms
SQL 3 - timestamps 3043 ms

So looks like DATETIME wins in fetching speed.

Duplicate file finder

Disk space is cheap. From the 1.2GB hard drive in my first computer to a "spare" 500GB external hard drive, cheap data storage has come a long way. I have clicked a lot of photographs ever since I got my first digital camera, and I store a lot of these photos too (locally); since 2006 I have accumulated over 201,608 photos and some videos. My camera's photo number counter has reset twice!!

A few days back I decided to hand over my drive to my brother. Since he was leaving soon, I dumped all the stuff onto my two laptops and forgot about it. Recently my wife asked me to pick out some nice pics so we could print them. That's when I realized I had a lot of duplicate photos. The simplest idea was to find the duplicate file names and delete them, but that was not possible: since the counter had already reset twice, I technically had up to 3 different files with the same name, and at least 10,000 of them!!

MD5 to the rescue. Since all the photos and movies are binary files, MD5 seemed ideal to me...

MD5 digests have been widely used in the software world to provide some assurance that a transferred file has arrived intact. For example, file servers often provide a pre-computed MD5 checksum for the files, so that a user can compare the checksum of the downloaded file to it.

So I fired up Eclipse and churned out a program to scan my HDD, compare MD5 checksums and find all the duplicates.

The method below generates the MD5 checksum for any file

private static String generateMD5(String path) throws IOException {
    InputStream is = null;
    try {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        is = new FileInputStream(new File(path));
        byte[] buffer = new byte[8192];
        int read;

        // read() returns -1 at the end of the stream
        while ((read = is.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        byte[] md5sum = digest.digest();
        BigInteger bigInt = new BigInteger(1, md5sum);
        // Pad to 32 hex digits so that leading zeros are not dropped
        return String.format("%032x", bigInt);
    } catch (NoSuchAlgorithmException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // is stays null if the file could not be opened
        if (is != null) {
            is.close();
        }
    }
    return null;
}

Then go ahead and get a list of all the files on your system


missed the code
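The listing for this step did not make it into the post. Here is a minimal sketch of what it could look like, assuming that generateFileMap (the name the main method calls) takes a root directory and a list, walks the tree recursively and returns the same list filled with absolute file paths. This is a guess at the behaviour, not the original code, and the wrapper class FileLister is made up for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class FileLister {
    /**
     * Recursively appends the absolute path of every regular file under root
     * to filePaths and returns that same list.
     */
    static List<String> generateFileMap(String root, List<String> filePaths) {
        File[] children = new File(root).listFiles();
        if (children == null) {
            return filePaths; // root is not a directory, or is not readable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                generateFileMap(child.getAbsolutePath(), filePaths);
            } else if (child.isFile()) {
                filePaths.add(child.getAbsolutePath());
            }
        }
        return filePaths;
    }

    public static void main(String[] args) {
        String root = args.length > 0 ? args[0] : ".";
        for (String path : generateFileMap(root, new ArrayList<String>())) {
            System.out.println(path);
        }
    }
}
```

The recursion follows whatever File.isDirectory() reports, so symbolic links that loop back on themselves are not guarded against in this sketch.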

And finally run the main method.

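public static void main(String[] args) throws Exception {
    List<String> filePaths = new ArrayList<String>();
    File file = new File("/home/varun/workbench/duplicates.csv");
    FileWriter fw = new FileWriter(file);
    // Maps each checksum to the first path seen with that checksum
    SortedMap<String, String> duplicates = new TreeMap<String, String>();
    filePaths = generateFileMap("/mnt/datastorage/photos", filePaths);
    try {
        for (String path : filePaths) {
            String hash = generateMD5(path);
            if (hash == null) {
                continue; // the file could not be hashed, skip it
            }
            if (duplicates.containsKey(hash)) {
                fw.write(path + "," + duplicates.get(hash) + "\n");
            } else {
                duplicates.put(hash, path);
            }
        }
    } finally {
        fw.close(); // make sure the CSV is flushed to disk
    }
}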

So go ahead and run this; you can also extend it to generate a list of any kind of duplicate files. The average file size here is 1.5 - 2.0 MB.

This is how the output file looks when viewed in Open Office, Google Docs or Excel. It shows you which file is a duplicate of which other file.

/mnt/datastorage/Photos/2008/Photos/Halloween NYC 2007/DSC01958.JPG /mnt/datastorage/Photos/2008/Photos/SORT ME/DSC01958.JPG
/mnt/datastorage/Photos/2008/Photos/Halloween NYC 2007/DSC01959.JPG /mnt/datastorage/Photos/2008/Photos/SORT ME/DSC01959.JPG
/mnt/datastorage/Photos/2008/Photos/Halloween NYC 2007/DSC01960.JPG /mnt/datastorage/Photos/2008/Photos/SORT ME/DSC01960.JPG

UPDATE: Ran the program with SHA algorithm also and here are the comparison times.

Time for MD5 858,953ms (14.31 minutes)
Time for SHA 1,191,656ms (19.80 minutes)


DATETIME vs TIMESTAMP vs DATE & TIME

I'm starting off a new project and wanted to study some data retrieval optimizations. DATE & TIME are the two most decisive factors in how my app processes information. The aggregation, classification, sorting & grouping of data are based on DATE & TIME.

Daily reports
Weekly reports
Every day at 00:00 hours.
Every year on this date

So there is a huge amount of chronological processing. We might need to process the data by date alone, by time alone, or by both date & time. So was born the question: "What is the optimal way of storing the information: DATE & TIME, DATETIME or TIMESTAMP?" The initial study helped me find this.

From the MySQL manual...

Storage Requirements for Date and Time Types

Data Type	Storage Required
DATE	3 bytes
TIME	3 bytes
DATETIME	8 bytes
TIMESTAMP	4 bytes
YEAR	1 byte

The storage requirements shown in the table arise from the way that MySQL represents temporal values:

DATE: A three-byte integer packed as DD + MM×32 + YYYY×16×32
TIME: A three-byte integer packed as DD×24×3600 + HH×3600 + MM×60 + SS

DATETIME: Eight bytes:

A four-byte integer packed as YYYY×10000 + MM×100 + DD
A four-byte integer packed as HH×10000 + MM×100 + SS

TIMESTAMP: A four-byte integer representing seconds UTC since the epoch ('1970-01-01 00:00:00' UTC)
YEAR: A one-byte integer
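To make the packing formulas concrete, here is a small worked example that applies them to an arbitrary sample value, 2008-10-13 17:19:07. It only restates the arithmetic quoted from the manual above; the class and method names are made up for illustration and this is not how you would read or write MySQL's on-disk format.

```java
final class TemporalPacking {
    /** DATE: DD + MM*32 + YYYY*16*32. */
    static int packDate(int year, int month, int day) {
        return day + month * 32 + year * 16 * 32;
    }

    /** TIME: DD*24*3600 + HH*3600 + MM*60 + SS, where DD is the day part of the TIME value. */
    static int packTime(int days, int hours, int minutes, int seconds) {
        return days * 24 * 3600 + hours * 3600 + minutes * 60 + seconds;
    }

    /** DATETIME, date half: YYYY*10000 + MM*100 + DD. */
    static int packDateTimeDate(int year, int month, int day) {
        return year * 10000 + month * 100 + day;
    }

    /** DATETIME, time half: HH*10000 + MM*100 + SS. */
    static int packDateTimeTime(int hours, int minutes, int seconds) {
        return hours * 10000 + minutes * 100 + seconds;
    }

    public static void main(String[] args) {
        System.out.println(packDate(2008, 10, 13));         // 1028429
        System.out.println(packTime(0, 17, 19, 7));         // 62347
        System.out.println(packDateTimeDate(2008, 10, 13)); // 20081013
        System.out.println(packDateTimeTime(17, 19, 7));    // 171907
    }
}
```

The packed DATE value stays well below 16,777,216 (2 to the power 24), which is why three bytes are enough, while each DATETIME half is simply the digits of the date or time read as one decimal number.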

So in terms of data storage, DATETIME is 8 bytes, TIMESTAMP 4 bytes, DATE & TIME 6 bytes (3 each). Ideally TIMESTAMP is good enough, if it fits my needs.

8 bytes > 6 bytes > 4 bytes

Memory is getting cheaper by the day, so let's ignore this for the time being, we'll revisit the storage factor a bit later.

Since I have to fetch information and process it, I decided to run some tests in MySQL. Below is the schema of the database.

CREATE DATABASE datetest;

USE datetest;

DROP TABLE IF EXISTS dateandtime;

DROP TABLE IF EXISTS datetime;

DROP TABLE IF EXISTS timestamps;

CREATE TABLE dateandtime (
 timeonly TIME,
 dateonly DATE,
 counter    INTEGER,
 salary DECIMAL(10,2),
PRIMARY KEY (timeonly, dateonly));

CREATE TABLE datetime (
 dateandtime DATETIME,
 counter    INTEGER,
 salary DECIMAL(10,2),
PRIMARY KEY (dateandtime));

CREATE TABLE timestamps (
 timestamps TIMESTAMP,
 counter    INTEGER,
 salary DECIMAL(10,2),
PRIMARY KEY (timestamps));

I added approximately 100,000,000 records to each table, and then ran further tests on them. As of now I'm yet to write the test cases; after I'm done I'll put the files up.

Found another interesting post you might want to take a look at.
http://www.scribd.com/doc/2565263/The-top-20-design-tips-for-MySQL-Enterprise-data-architects

Bibliography

http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html

Agile environment using virtualization

(This document is aimed more towards Java & VMWare, but the same can be replicated for any other language & environment)

Abstract

You have a team of developers working on different, inter-dependent modules of a project or product. Each developer diligently writes unit & integration tests supporting their code. You want to set up an agile test environment to run the unit tests and a staging server for the integration tests, but purchasing hardware for multiple machines is a constraint.

The following article provides a guide on how virtualization tools like VMware can be used to set up a staging or QA environment (Agile) with a minimal investment of hardware & time.

Hardware Sizing

The hardware sizing performed here is not an exact calculation, but an educated guess at the approximate system requirement to run the application.

Application	RAM	CPU
Revision Control (SVN or CVS)	512MB - 1GB	Single
Continuous Integration (Cruise Control or Bamboo)	1GB - 2GB	Single
Database (MySQL*)	2GB - 4GB	Single-Dual
Webserver (Wiki, Bugzilla, JIRA)	1GB	Single

*MySQL on VMware has some serious performance issues for production environments. If your application is very database intensive, with thousands to millions of I/O operations per second, you might want to avoid virtualization altogether. For testing purposes (application sanity, NOT performance) a VM is decent.

Considering the above list of applications, and assuming that each physical core can support 2 virtual cores, we arrive at the configuration below.

Hardware configuration

Quad-core processor CPU
8GB of RAM
5 HDDs (1 OS + 4 HDD with RAID**)
Other basic required components

**Setting up RAID and choosing the type of configuration that suits your needs is a separate chapter altogether. We use mirrored and striped; there are loads of articles available on the internet on this topic.

The approximate cost of the above system is roughly $2,500.00 - $3,500.00 (as of the date of this article).
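The sizing table and the assumption of 2 virtual cores per physical core can be checked with a little arithmetic. Below is a minimal sketch that adds up the RAM and CPU ranges from the table; reading "Single" as 1 virtual CPU and "Single-Dual" as 1 to 2 is my interpretation, and the class and method names are made up for illustration.

```java
final class HardwareSizing {
    // One row per VM from the sizing table: {minRamMb, maxRamMb, minVcpus, maxVcpus}
    static final int[][] VMS = {
        {512, 1024, 1, 1},   // Revision control
        {1024, 2048, 1, 1},  // Continuous integration
        {2048, 4096, 1, 2},  // Database
        {1024, 1024, 1, 1},  // Webserver
    };

    /** Sums one column of the table across all VMs. */
    static int total(int column) {
        int sum = 0;
        for (int[] vm : VMS) {
            sum += vm[column];
        }
        return sum;
    }

    /** Physical cores needed when each core backs vcpusPerCore virtual cores, rounded up. */
    static int physicalCores(int vcpus, int vcpusPerCore) {
        return (vcpus + vcpusPerCore - 1) / vcpusPerCore;
    }

    public static void main(String[] args) {
        System.out.println("RAM: " + total(0) + " - " + total(1) + " MB");   // RAM: 4608 - 8192 MB
        System.out.println("vCPUs: " + total(2) + " - " + total(3));         // vCPUs: 4 - 5
        System.out.println("Physical cores: " + physicalCores(total(2), 2)
                + " - " + physicalCores(total(3), 2));                       // Physical cores: 2 - 3
    }
}
```

So a quad-core CPU covers the virtual CPU demand with room to spare, while 8GB of RAM matches the upper end of the table exactly, which would leave no headroom for the host OS if every VM were given its maximum.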

Set-up

You can install either Windows or Linux (we are on an Ubuntu system) as the host OS. Since you have 8GB of RAM on the physical machine, and 32-bit OSes are not capable of using more than 3-4GB, use a 64-bit distribution.

Download VMWare Server from [http://www.vmware.com/products/server], selecting the distribution type that matches your OS. The guest and the host OS can be completely different; there is no relation between them.
eg: You can install Windows as host and Linux as guest or vice-versa.

You can either create the virtual machines yourself using the VMware step-by-step wizard, or download 'appliances' created by others from the VMware site [http://www.vmware.com/appliances], which saves you the initial installation effort.

When you create a new VM, you are also asked for the default network connection type. NAT and Bridged are the most common options. Use NAT if you want the VM to talk within its own subnet only, or Bridged if you want other machines on the same subnet as the host to be able to access it.
eg: Your host machine is on 192.168.10.50
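The difference between the two modes shows up in the addresses the guests end up with. Here is a minimal sketch of a same-subnet check, assuming a /24 network around the host address above; the two guest addresses are hypothetical examples (one as a bridged guest might receive from the LAN, one from a private NAT network), and the class and method names are made up for illustration.

```java
final class SubnetCheck {
    /** Converts a dotted-quad IPv4 string such as 192.168.10.50 to a 32-bit int. */
    static int toInt(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True if both addresses share the same network prefix of the given length. */
    static boolean sameSubnet(String a, String b, int prefixLength) {
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return (toInt(a) & mask) == (toInt(b) & mask);
    }

    public static void main(String[] args) {
        String host = "192.168.10.50";
        String bridgedGuest = "192.168.10.51"; // hypothetical address handed out on the host's LAN
        String natGuest = "172.16.83.128";     // hypothetical address on a private NAT network

        System.out.println(sameSubnet(host, bridgedGuest, 24)); // true
        System.out.println(sameSubnet(host, natGuest, 24));     // false
    }
}
```

A guest whose address passes this check can be reached directly by the other machines on the host's network, which is the Bridged case described above.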

Tip: If you are installing the guests yourself, you can create one master guest with all the common applications configured (eg: an OpenSSH server for remote management, or Java, would be great to have on all the guests) and copy it; you'll have to change the host name of each copy after that.

Power up each of the VMs, install the applications for which it has been set up, and use it like any other machine on the network.

Your source files are versioned on the Revision Control Server
Your tests are run nightly on the Cruise Control Server
Your team documents the whole project and process on the Wiki, and files bugs in Bugzilla
Your MySQL server is used for running the DB & the supporting DBs for other apps (bugzilla, wiki)

These VM's need to be maintained like any other normal physical machine on the network.

Pitfalls

Putting all your eggs in one basket: if the main VMware server fails, all the "machines" (VMs) fail.
Since the server depends on the native OS it sits on, the capabilities of VMServer are restricted by that OS.
Solution: Use the commercial ESX Server or other alternatives

Alternatives

You can use Xen for virtualization too. I've heard pretty good things about it, with a few shortcomings & advantages over VMServer (that's a separate topic altogether).

There is an article dedicated to Virtual Machines on Wikipedia; you can refer to it and settle on the solution that suits your needs best.

Glossary

Virtualization - The virtual machine simulates enough hardware to allow an unmodified "guest" OS (one designed for the same CPU) to be run in isolation

Continuous Integration - Continuous integration describes a set of software engineering practices that speed up the delivery of software by decreasing integration times

Revision Control - Revision control (also known as version control (system) (VCS), source control or (source) code management (SCM)) is the management of multiple revisions of the same unit of information

Host OS - The operating system installed on the physical machine running VMware Server.

Guest OS - The operating system installed on the virtual machine.


About

Always mulling over a problem handed to me by clients, by situations or by myself, in quest of a better solution. And sometimes, I do hit upon one! It is these solutions and concepts that have dawned upon me that I would like to share, and see implemented by others in the real world.

I code for a living, click pictures as a hobby & rave about geeky gadgets. Hopefully soon I shall lay my hands upon the Nikon D-90, and finally revive my photoblog. Till then you'll have to make do with a hopelessly outdated version at http://varunphotography.blogspot.com
