
DWR + Lazy Loading

Since time immemorial we have all been aware of the famous lazy-load exceptions in Hibernate. Using a detached object does not exactly shield us from the issue either. We use DWR in our application and found that when marshalling (or unmarshalling if you insist) a detached object to JSON, DWR would erratically try to read an uninitialized (lazily loaded) property, causing LazyInitializationExceptions.

We did not want to use OpenSessionInViewFilter as it is evil (some day I'll consolidate all my rantings into one post). You can Google it and find a lot of information on the subject.

As a quick fix we did what anyone would first think of: "It's a demo, just eager load it!". Well, the demo was over, it was time to investigate a better solution, and we did find one.

It was no rocket science; the solution was in front of us all along, we just did not look hard enough. When using DWR, there are BeanConverters which are responsible for this marshalling process, and there is one for Hibernate called "hibernate3": http://directwebremoting.org/dwr/server/hibernate
The HibernateBeanConverter tries to avoid reading from un-initialized properties. (If you just want something that blindly reads everything then just use a plain BeanConverter).
Bingo: when exporting a detached Hibernate object through DWR, use the "hibernate3" converter and that should resolve the lazy-load issues.
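
For reference, here is a minimal dwr.xml sketch of how I understand the converter is declared (the com.example.domain package is a placeholder for your own domain classes):

<dwr>
  <allow>
    <!-- The hibernate3 converter skips uninitialized lazy properties
         instead of touching them and triggering lazy-load errors -->
    <convert converter="hibernate3" match="com.example.domain.*"/>
  </allow>
</dwr>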

By Contract or By Convention

Should we be using interfaces for everything and making everyone explicitly implement them, or should we trust everyone to follow the same convention?

By Contract

For use cases where a certain contract needs to be followed, we should use interfaces. This allows easier testing and loose coupling of the implementation, giving us the flexibility to change the implementation at runtime when asked for.

eg: UserService interface has a method fetchUser();

We can have two implementations of this service method, one using a DAO and the other a web service. The implementation stays loosely coupled while "the contract" guarantees a User is returned.
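
A minimal sketch of that contract (the class names, the User type and the UserDao are illustrative, not from the original use case):

// Illustrative domain types
class User {
    final String id;
    User(String id) { this.id = id; }
}

interface UserDao {
    User findById(String id);
}

// The contract: callers only ever see this interface
public interface UserService {
    User fetchUser(String userId);
}

// Implementation 1: backed by a DAO
class DaoUserService implements UserService {
    private final UserDao userDao;
    DaoUserService(UserDao userDao) { this.userDao = userDao; }
    public User fetchUser(String userId) {
        return userDao.findById(userId);
    }
}

// Implementation 2: backed by a remote web service (stubbed here)
class WebServiceUserService implements UserService {
    public User fetchUser(String userId) {
        // in reality: call the remote endpoint and map the response to a User
        return new User(userId);
    }
}

Callers depend only on UserService, so swapping the DAO-backed implementation for the web-service one requires no change on the calling side.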

By Convention

In certain cases we cannot enforce a contract, especially where the concerns span multiple layers of the app. This is when we need to rely on convention over contract. AOP is the perfect example of a case where the convention needs to be strictly followed.

Consider the service methods that fetch information for display on the front end: they need a specific security restriction associated with them, or a use case where we need to assign "READ_ONLY" rights on an object to certain users. To keep the code as decoupled as possible we use AOP, which applies these concerns to the methods without any actual references in the code.

For these concerns to be properly applied, we need to follow conventions.

eg: All DAOs should persist an object using the create() method only; that way we can add a concern that blocks access to all create() methods for a group of users. If someone decides to "be different" and follow their own convention, say naming the method persist(), this can open up a huge security hole in the application: the AOP concern will skip persist() as it does not follow the convention, as the sketch below shows. Such issues will not be caught unless there is a proper code review of the code base, or really strong test cases.
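
A minimal sketch of such a concern, assuming Spring AOP with AspectJ annotations (the com.example.dao package and the currentUserCanCreate() check are placeholders, not from the original post):

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

@Aspect
public class CreateAccessAspect {

    // Matches create(..) on every type under the conventional DAO package.
    // A rogue persist() method is invisible to this pointcut.
    @Before("execution(* com.example.dao..*.create(..))")
    public void blockUnauthorizedCreate(JoinPoint jp) {
        if (!currentUserCanCreate()) {
            throw new SecurityException("create() not allowed: " + jp.getSignature());
        }
    }

    private boolean currentUserCanCreate() {
        return false; // stub: look up the current user's rights here
    }
}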

If your module needs to follow a convention that cannot be enforced through a contract, please document it, so that the team can follow it correctly.

My first E-Book: Kick Start a software start-up

Ok, after all the configuration and work on the VM server with 4 virtual machines, I found it too much info to blog (lazy me..) with all the steps and configuration (I still have some 3 posts in draft yet to be published). I also had some random thoughts on how this process and infrastructure could be used by others (start-ups or low-budget shops) for their benefit.

Since most of the blog postings are enlightenments that occurred to me once in a while, I've decided to consolidate all this information and jot it down as a step-by-step guide on how to use the VM infrastructure for small shops, or for small projects in large shops. The table of contents for this book will look something like this.

Table of Contents
  • Preface
  • Who should read this book
  • What are the technologies we'll cover
  • Introduction
  • Analyzing & Building the server
  • Software selection
  • RAID & LVM
  • Setup & Configure your first VM
  • VPN Appliance
  • Creating more Appliances I: Webserver, Database
  • Creating more Appliances II: Version Control, Continuous Integration Servers
  • Timed VNC sessions (iTalc or vncthumbnailviewer)
  • Appendix A: Configure your Linksys router with DDNS

If there is anything else you would like me to consider, let me know; if practically possible I'll try to include it in the book. I hope to get the book out in 2-3 months (provided I can give it one hour a day). Volunteers for proof-reading are most welcome.

CollectionUtils (Apache Commons) Bad Boy....

We had a problem at work today: given two collections, we needed to remove from the first collection the elements that are not in the second, and add to the first the elements that exist only in the second.

This is related to how Hibernate handles add-remove-update for collections. Read about it here.

Set A = {R, G, B}
Set Z = {R, G, Y}

The final set going to the DB is {R, G, Y}. That looks simple enough: just send the second set Z. The catch, though, is that we need to maintain the instance of A and finally hand A back to the Hibernate layer, so all the manipulations have to happen on it.

Mathematically, we need the intersection of A and Z, which is {R, G}, unioned with the relative complement of A in Z, which is {Y}: {R, G} ∪ {Y} = {R, G, Y}.

In Java we can solve this using three approaches.
For Loop
private static void forLoop(List<Set<String>> all) {
    Set<String> db = all.get(0);
    Set<String> web = all.get(1);

    // Snapshot of db so we can iterate over it while mutating db itself
    Set<String> tempCollection = new HashSet<String>(db);

    // Drop everything from db that is not in web
    for (String t : tempCollection) {
        if (!web.contains(t)) {
            db.remove(t);
        }
    }

    // Add everything to db that exists only in web
    for (String t : web) {
        if (!tempCollection.contains(t)) {
            db.add(t);
        }
    }
}


CollectionUtils
private static void collectionUtils(List<Set<String>> all) {
    Set<String> db = all.get(0);
    Set<String> web = all.get(1);

    // CollectionUtils.retainAll returns a new collection and leaves its
    // arguments untouched, so the intersection must be copied back into db
    Collection<String> common = CollectionUtils.retainAll(db, web);
    db.clear();
    db.addAll(common);
    db.addAll(web);
}


JDK Collection Native
private static void jdkNative(List<Set<String>> all) {
    Set<String> db = all.get(0);
    Set<String> web = all.get(1);

    // Both calls mutate db in place: keep only the intersection,
    // then add the elements that exist only in web
    db.retainAll(web);
    db.addAll(web);
}

Now we ran a performance test on all three, and were shocked by the results.
  • forLoop: 18 ms
  • collectionUtils: 72 ms
  • jdkNative: 5 ms
This is what the web and db collections were made up of.

Create Collections
private static List<Set<String>> createCollections() {
    List<Set<String>> all = new ArrayList<Set<String>>();

    Set<String> db = new HashSet<String>();
    Set<String> web = new HashSet<String>();

    // db holds "0".."9999" and web holds "5000".."14999",
    // so the two sets overlap on "5000".."9999"
    for (int i = 0; i < 10000; i++) {
        db.add(String.valueOf(i));
    }

    for (int i = 5000; i < 15000; i++) {
        web.add(String.valueOf(i));
    }

    all.add(db);
    all.add(web);

    return all;
}

and the main method
public static void main(String[] args) {
    long start = 0;
    long end = 0;

    // Every method mutates the collections it is handed, so each run
    // gets a fresh copy to keep the comparison fair
    List<Set<String>> all = createCollections();

    start = System.currentTimeMillis();
    forLoop(all);
    end = System.currentTimeMillis();
    System.out.println("TIME BY forLoop " + (end - start));

    all = createCollections();
    start = System.currentTimeMillis();
    jdkNative(all);
    end = System.currentTimeMillis();
    System.out.println("TIME BY jdkNative " + (end - start));

    all = createCollections();
    start = System.currentTimeMillis();
    collectionUtils(all);
    end = System.currentTimeMillis();
    System.out.println("TIME BY collectionUtils " + (end - start));
}


So avoid CollectionUtils for this. Decide what is more important to you: cleaner code or faster code ;)

Duplicate file finder

Disk space is cheap; from the 1.2GB hard drive in my first computer to a "spare" 500GB external hard drive, cheap data storage has come a long way. I have clicked a lot of photographs ever since I got my first digital camera, and I store a lot of these photos too (locally); since 2006 I have accumulated over 201,608 photos and some videos. My camera's photo number counter has reset twice!!

A few days back I decided to hand over the drive to my brother since he was leaving soon, so I dumped all the stuff onto my two laptops and forgot about it. Of late my wife asked me to pick out nice pics so we could print them. That's when I realized I had a lot of duplicate photos. The simplest idea was to find duplicate file names and delete them, but that would not work: since the counter had already reset, I technically had 3 files with the same name, and at least 10,000 of them!!

MD5 to the rescue. Since all the photos and movies are binary files, MD5 seemed ideal to me...

MD5 digests have been widely used in the software world to provide some assurance that a transferred file has arrived intact. For example, file servers often provide a pre-computed MD5 checksum for the files, so that a user can compare the checksum of the downloaded file to it.

So I fired up Eclipse and churned out a program to scan my HDD, compare MD5 checksums and find all the duplicates.

The method below generates the MD5 checksum for any file

private static String generateMD5(String path) throws IOException {
    MessageDigest digest;
    InputStream is = null;
    try {
        digest = MessageDigest.getInstance("MD5");
        is = new FileInputStream(new File(path));
        byte[] buffer = new byte[8192];
        int read = 0;

        // Stream the file through the digest in 8KB chunks
        while ((read = is.read(buffer)) > 0) {
            digest.update(buffer, 0, read);
        }

        // Render the 128-bit digest as a hex string
        byte[] md5sum = digest.digest();
        BigInteger bigInt = new BigInteger(1, md5sum);
        return bigInt.toString(16);
    } catch (NoSuchAlgorithmException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (is != null) { // the open itself may have failed
            is.close();
        }
    }
    return null;
}
Then go ahead and get a list of all the files on your system. (I missed pasting that code.)
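
Assuming generateFileMap recursively walks a directory and collects absolute file paths, which is how main() below uses it, a minimal sketch could be:

private static List<String> generateFileMap(String dir, List<String> filePaths) {
    File[] entries = new File(dir).listFiles();
    if (entries == null) {
        return filePaths; // not a directory, or unreadable
    }
    for (File entry : entries) {
        if (entry.isDirectory()) {
            // recurse into sub-directories
            generateFileMap(entry.getAbsolutePath(), filePaths);
        } else {
            filePaths.add(entry.getAbsolutePath());
        }
    }
    return filePaths;
}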

And finally run the main method.

public static void main(String[] args) throws Exception {
    List<String> filePaths = new ArrayList<String>();
    File file = new File("/home/varun/workbench/duplicates.csv");
    FileWriter fw = new FileWriter(file);
    // maps each checksum to the first file seen with it
    SortedMap<String, String> duplicates = new TreeMap<String, String>();
    filePaths = generateFileMap("/mnt/datastorage/photos", filePaths);
    for (String path : filePaths) {
        String hash = generateMD5(path);
        if (duplicates.containsKey(hash)) {
            // checksum seen before: record the duplicate pair
            fw.write(path + "," + duplicates.get(hash) + "\n");
        } else {
            duplicates.put(hash, path);
        }
    }
    fw.close(); // flush the writer, or the CSV may come out empty
}

So go ahead and run this; you can also extend it to generate a list of any kind of duplicate files. (For the timings below, the average file size was 1.5 - 2.0 MB.)

The output file, when viewed in OpenOffice, Google Docs or Excel, looks like the sample below; each row shows which file is a duplicate of which other file.

/mnt/datastorage/Photos/2008/Photos/Halloween NYC 2007/DSC01958.JPG /mnt/datastorage/Photos/2008/Photos/SORT ME/DSC01958.JPG
/mnt/datastorage/Photos/2008/Photos/Halloween NYC 2007/DSC01959.JPG /mnt/datastorage/Photos/2008/Photos/SORT ME/DSC01959.JPG
/mnt/datastorage/Photos/2008/Photos/Halloween NYC 2007/DSC01960.JPG /mnt/datastorage/Photos/2008/Photos/SORT ME/DSC01960.JPG


UPDATE: I ran the program with the SHA algorithm as well; here are the comparison times.
  • Time for MD5: 858,953 ms (14.32 minutes)
  • Time for SHA: 1,191,656 ms (19.86 minutes)
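
The post doesn't show the SHA variant, but presumably the only change is the algorithm name passed to the digest lookup ("SHA" is the JDK's standard name for SHA-1):

digest = MessageDigest.getInstance("SHA"); // "SHA" resolves to SHA-1 in the JDK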