
I want to know more about Windows Identity Foundation today.  I’ve found lots of videos available online.  My favorite one is this:


Following the Cloudera Hadoop Course

Since I hit a roadblock the last time I tried to follow the Hadoop: The Definitive Guide book, I have not done much with it.  My time was spent learning to develop iOS apps for the iPad.  Today I went back to the Cloudera site and found a very nice course on the subject.  I like the format, and it explains everything very clearly and with illustrations, kind of like the Khan Academy for Hadoop.  🙂  I went through 2 classes tonight.

Practice Java Tonight

Tonight I forced myself to sit down and practice Java programming for 3 hours.  For me, the best way to learn is through hands-on experience.  I wrote a package that parses a text file, then sorts the words and counts the occurrences of each one.  I found out that the Java String.split method and the C# String.Split method behave differently.  The former takes a regular expression, while the latter takes an array of separator strings or an array of separator characters.  The C# Regex.Split method behaves like the Java split.
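Here is a minimal sketch of the Java side of that exercise, not my actual package.  It assumes Java 8 or later (for Map.merge), and the class name WordCounter and the sample sentence are made up for illustration.  It also shows the pitfall that comes from split taking a regular expression: a bare "." matches every character.

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCounter {
    // Counts word occurrences; String.split treats its argument as a regular expression.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();   // TreeMap keeps the words sorted
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue;   // a leading separator produces an empty first token
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("The cat and the hat."));   // {and=1, cat=1, hat=1, the=2}

        // "." is a regex wildcard, so every character is a separator and nothing is left.
        System.out.println("a.b.c".split(".").length);       // 0
        // Escaping the dot splits on the literal character.
        System.out.println("a.b.c".split("\\.").length);     // 3
    }
}
```

In C#, "a.b.c".Split('.') would give the three pieces directly, which is why the same-looking call surprised me in Java.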

Trying to Compile Hadoop Sample Code with Eclipse

I am having trouble building and running the sample code from Hadoop: The Definitive Guide.  The problem is that I am not familiar with how to set the CLASSPATH for the Java compiler.  I found this webcast from Cloudera:

So, I created a project and added the downloaded sample code.  Then I clicked on the Properties of the project:


Click on “Add External Jars…”

Browse to /usr/lib/hadoop/lib and select all the jar files


Add more jar files…


After that, all the jar files show up in my project folder, making it really cluttered.


At this point I gave up and called it a night.

The next night, after I was able to compile the WordCount example from Cloudera on the command line, I had an idea: what if I add the hadoop_mapreduce library to this project as well?  It turned out to work.  The ~ signs under the import statements initially stayed, then disappeared.  Oh, I also needed to switch to the J2SE-1.5 JRE library.  Eclipse asked me if I wanted to switch, so I am guessing the sample code uses the old JRE calls.  Now all I have left is the warning that Job() is deprecated.

Job job = new Job();

At this point, I am going to table it and learn from Cloudera or some other places such as Hortonworks.

In doing my research, I found out that you can use Eclipse to run Hadoop code in its IDE.  This posting lets you download the Eclipse Hadoop plug-in or build your own.  I think I am going to try it and test it out with the WordCount sample code.

Compiling Cloudera WordCount Hadoop Sample Code

Since I’ve been having trouble building the sample code from Hadoop: The Definitive Guide using Eclipse, I thought I’d go back to the Cloudera site and build its sample code, following the instructions from here:

However, the version of Hadoop has changed (I am using Hadoop 2.0.0-cdh4.5.0), so the classpath is different, too.  When I try compiling

$ mkdir wordcount_classes
$ javac -cp <classpath> -d wordcount_classes WordCount.java

where <classpath> is: /usr/lib/hadoop/*:/usr/lib/hadoop/client-0.20/*

I get errors:

package org.apache.hadoop.mapred does not exist
import org.apache.hadoop.mapred.*;
^
cannot find symbol
symbol : class MapReduceBase
location: class org.myorg.WordCount
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

Obviously, more jar files need to be included in the classpath.  After checking the installed directories, I found that this works:

javac -cp /usr/lib/hadoop/*:/usr/lib/hadoop-mapreduce/* -d wordcount_classes WordCount.java

After that, I can create the jar file, run the application, and examine the results, just as the instructions describe.

How to set the “From” attribute for a PhoneCall entity using Dynamics CRM 2013 SDK

While working on my Incident Management POC, I found that I need to set the phone call’s “From” attribute to a contact programmatically, and I had some trouble finding out how to do that.  The following is the definition of the “PhoneCall.From” property in C#:

public System.Collections.Generic.IEnumerable<UIM.ActivityParty> From

I searched online for code samples and couldn’t find one.  After a lot of trial and error, this is what works for me to set the “From” attribute of the PhoneCall entity identified by phonecallId to the contact identified by contactId:

// Load the phone call to update.
var myPhoneCall = myContext.PhoneCallSet.Where(p => p.Id == phonecallId).Single();

// "From" is a list of activity parties, so wrap the contact in one.
ActivityParty activityParty = new ActivityParty()
{
    ActivityId = new EntityReference(PhoneCall.EntityLogicalName, phonecallId),
    PartyId = new EntityReference(Contact.EntityLogicalName, contactId)
};
myPhoneCall.Id = phonecallId;
myPhoneCall.From = new List<UIM.ActivityParty>() { activityParty };

Trying to build a help desk service using Dynamics CRM 2013

The past week I’ve been consumed by my work around Microsoft Dynamics CRM 2013. Everybody was still on vacation and I stepped up to develop the POC. My hope is to learn something and get my developer’s mojo back.

I did get a lot better at writing code, especially LINQ, since that’s how I access the entities from the CRM SDK.

The routing scenario in the POC is like this:

  • A customer calls the help desk. He/she dials the support hotline and punches in selections to answer the questions asked by the IVR system. The selections eventually lead the call to a queue.
  • The system creates a phonecall record and a case record in the CRM. The phonecall’s regarding object is set to the case. Based on the info collected through the IVR, the system knows this much about the case:
    • support language
    • type of service request (questions, complaints, or other types)
    • country of support
    • product family
    • other attributes…
  • These selections create a “property bag” around the case. We can use it to match the preset “property bag” for the queues in CRM. If there is a match, the case gets assigned to the queue.
  • An agent belongs to a team. The team has its own property bag. When the team’s property bag matches that of a queue, the agents in the team get to see the queue, and all the open items in the queue are up for grabs by the agents.
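The matching step above can be sketched roughly like this.  This is not the code from my phonecall simulator and it does not touch the CRM SDK; it is plain Java (8 or later) with made-up names (PropertyBagRouter, the queue names, the property keys).  It also assumes that “match” means every property the queue requires has the same value in the case’s bag, which is only one possible rule.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertyBagRouter {
    // Assumed rule: a queue matches when each of its properties equals the case's value.
    static boolean matches(Map<String, String> queueBag, Map<String, String> caseBag) {
        for (Map.Entry<String, String> required : queueBag.entrySet()) {
            if (!required.getValue().equals(caseBag.get(required.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Returns the name of the first queue whose bag matches the case, or null if none does.
    static String route(Map<String, Map<String, String>> queues, Map<String, String> caseBag) {
        for (Map.Entry<String, Map<String, String>> queue : queues.entrySet()) {
            if (matches(queue.getValue(), caseBag)) {
                return queue.getKey();
            }
        }
        return null;
    }

    // Hypothetical sample data: one case and two queues.
    static String demo() {
        Map<String, String> caseBag = new HashMap<>();
        caseBag.put("language", "en");
        caseBag.put("country", "US");
        caseBag.put("productFamily", "printers");

        Map<String, String> frenchQueue = new HashMap<>();
        frenchQueue.put("language", "fr");

        Map<String, String> printerQueue = new HashMap<>();
        printerQueue.put("language", "en");
        printerQueue.put("productFamily", "printers");

        Map<String, Map<String, String>> queues = new LinkedHashMap<>();
        queues.put("FR Support", frenchQueue);
        queues.put("EN Printers", printerQueue);
        return route(queues, caseBag);
    }

    public static void main(String[] args) {
        System.out.println(demo());   // EN Printers
    }
}
```

The same matches method could be reused to decide which queues a team gets to see, by passing the queue’s bag and the team’s bag.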

I am not sure how a case gets routed in the out-of-box implementation of Dynamics CRM but so far I know:

  • A queue contains queue items.
  • A queue item has an n:1 relationship to any entity that’s queue-enabled: Appointment, CampaignActivity, CampaignResponse, Email, Fax, Incident, Letter, PhoneCall, RecurringAppointmentMaster, ServiceAppointment and Task.
  • The assignment of an Incident (case) to a Queue needs to be based on some kind of logic, but I am not finding OOB support for it in the CRM front-end. I’ve built my own phonecall simulator to do the property bag matching logic described above, but I would like to learn if there’s a way to do this without coding.

Other than the above, I’ve also explored using the Subject entity in CRM to represent the hierarchical reason for a user’s call. So, I have a top-level subject called “Reason for Contact” whose children are the major reasons for calling. Each child reason can have a more detailed classification in its own branch. It looks good in the CRM UI. However, if I use the same Subject entity for something else, like representing my product taxonomy, I want a way to configure the Incident form so that the “Reason for Contact” field shows only the branch under the “Reason for Contact” record and not the product tree. I wish I knew how to do this.
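I don’t know the CRM-side configuration for this, but the filtering I have in mind can be sketched in plain Java.  Everything here is hypothetical: the SubjectTree class, the Subject node type and the sample subjects are invented for illustration and are not part of the CRM SDK.

```java
import java.util.ArrayList;
import java.util.List;

public class SubjectTree {
    // Hypothetical stand-in for a CRM Subject record and its child subjects.
    static class Subject {
        final String title;
        final List<Subject> children = new ArrayList<>();

        Subject(String title) {
            this.title = title;
        }

        Subject add(Subject child) {
            children.add(child);
            return this;
        }
    }

    // Depth-first search for the subject with the given title; null if it is absent.
    static Subject find(Subject node, String title) {
        if (node.title.equals(title)) {
            return node;
        }
        for (Subject child : node.children) {
            Subject hit = find(child, title);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    // Adds the titles of all descendants of node in pre-order.
    static void collect(Subject node, List<String> out) {
        for (Subject child : node.children) {
            out.add(child.title);
            collect(child, out);
        }
    }

    // Titles under one branch only, e.g. everything below "Reason for Contact".
    static List<String> branchTitles(Subject root, String branchTitle) {
        List<String> out = new ArrayList<>();
        Subject branch = find(root, branchTitle);
        if (branch != null) {
            collect(branch, out);
        }
        return out;
    }

    // Made-up sample tree: a contact-reason branch next to a product branch.
    static Subject sample() {
        Subject reasons = new Subject("Reason for Contact")
                .add(new Subject("Billing").add(new Subject("Refund")))
                .add(new Subject("Technical Issue"));
        Subject products = new Subject("Products").add(new Subject("Laptops"));
        return new Subject("Subjects").add(reasons).add(products);
    }

    public static void main(String[] args) {
        System.out.println(branchTitles(sample(), "Reason for Contact"));
        // [Billing, Refund, Technical Issue]
    }
}
```

The product branch never shows up in the result, which is the behavior I would want from the Incident form.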

Getting Ready to Learn Hadoop

I had previously installed and configured Hadoop and got Hadoop: The Definitive Guide by Tom White.  Chapter 2 of the book talks about writing a map/reduce program to process the NCDC Weather Data.  So, first, I need to download the data files.

It turns out that this is not so straightforward.  The book’s companion website has only 2 gzip files, for 1900 and 1911, and each contains only 1 file.  The NCDC website has tons of data for download; some is free and some is not.  I went online to see whether anyone else had shared how they downloaded the files.  This site provides the answer that’s closest to what I need.

I created a shell script (hey, a chance to learn that, too!) and executed it to grab the data from 1990-2012.  It works, but little did I know how long it would take.  So far the script has been running for 1 full day, and it is still downloading the weather data from 1998.  The shell script is below:

for i in {1990..2012}
do
  cd ~/hadoop-book/data
  mkdir $i
  wget -r -np -R index.html*$i/
  cp$i/*.gz ~/hadoop-book/data/$i/
done

The script is supposed to skip the index.html* files but it still downloads them.  Never mind that.  At the end of the execution, I will have a hadoop-book/data folder with one folder per year from 1990-2012.  The folders will contain many, many small gzip files, one from each weather station.

I also downloaded the book’s source code.  It took me a long time to find the “Download” button on the right-hand side.  The Readme tells me that I need to download Maven.  It took me a little bit of research to find a site that tells me I need to edit a file.

The whole process I came up with is as follows:

  1. sudo -H gedit /etc/apt/sources.list

  2. Add the following lines to the sources.list file:

    deb precise main

    deb-src precise main

  3. sudo apt-get update && sudo apt-get install maven3

  4. sudo ln -s /usr/share/maven3/bin/mvn /usr/bin/mvn

Then I ran

mvn package -DskipTests -Dhadoop.version=1.1.1

Initially, it gave me an error saying my JAVA_HOME is not set correctly.  I fixed it with

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

I also modified my .bashrc file so I will have it for my next session.  After that, I ran the same mvn command and it started building everything.  I am not sure how it knows what to build, but I guess that’s what Maven is supposed to do.  In any case, it started working away and printed a lot of information on the screen.  This is a fragment of it:

[WARNING] Replacing pre-existing project main-artifact file: /home/mei/hadoop-book-master/ch13/target/ch13-3.0.jar
with assembly file: /home/mei/hadoop-book-master/ch13/target/../../hbase-examples.jar
[INFO] ------------------------------------------------------------------------
[INFO] Building Chapter 14: ZooKeeper 3.0
[INFO] ------------------------------------------------------------------------
[INFO] --- maven-enforcer-plugin:1.0.1:enforce (enforce-versions) @ ch14 ---
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ ch14 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mei/hadoop-book-master/ch14/src/main/resources
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ ch14 ---
[INFO] Compiling 10 source files to /home/mei/hadoop-book-master/ch14/target/classes
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ ch14 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mei/hadoop-book-master/ch14/src/test/resources
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ ch14 ---
[INFO] No sources to compile
[INFO] --- maven-surefire-plugin:2.5:test (default-test) @ ch14 ---
[INFO] Tests are skipped.
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ ch14 ---
[INFO] Building jar: /home/mei/hadoop-book-master/ch14/target/ch14-3.0.jar
[INFO] --- maven-assembly-plugin:2.2.1:single (make-assembly) @ ch14 ---
[INFO] Reading assembly descriptor: ../book/src/main/assembly/jar.xml
[INFO] DeleteGroup.class already added, skipping
[INFO] ListGroup.class already added, skipping
[INFO] ResilientActiveKeyValueStore.class already added, skipping
[INFO] CreateGroup.class already added, skipping
[INFO] ResilientConfigUpdater.class already added, skipping
[INFO] ConnectionWatcher.class already added, skipping
[INFO] JoinGroup.class already added, skipping
[INFO] ActiveKeyValueStore.class already added, skipping
[INFO] ConfigUpdater.class already added, skipping
[INFO] ConfigWatcher.class already added, skipping
[INFO] Building jar: /home/mei/hadoop-book-master/ch14/target/../../zookeeper-examples.jar
[INFO] DeleteGroup.class already added, skipping
[INFO] ListGroup.class already added, skipping
[INFO] ResilientActiveKeyValueStore.class already added, skipping
[INFO] CreateGroup.class already added, skipping
[INFO] ResilientConfigUpdater.class already added, skipping
[INFO] ConnectionWatcher.class already added, skipping
[INFO] JoinGroup.class already added, skipping
[INFO] ActiveKeyValueStore.class already added, skipping
[INFO] ConfigUpdater.class already added, skipping
[INFO] ConfigWatcher.class already added, skipping
[WARNING] Configuration options: 'appendAssemblyId' is set to false, and 'classifier' is missing.
Instead of attaching the assembly file: /home/mei/hadoop-book-master/ch14/target/../../zookeeper-examples.jar, it will become the file for main project artifact.

It built a lot of jar files:

[INFO] ------------------------------------------------------------------------
[INFO] Building Hadoop: The Definitive Guide, Example Code 3.0
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] Hadoop: The Definitive Guide, Project ............. SUCCESS [7.850s]
[INFO] Common Code ....................................... SUCCESS [1:02.842s]
[INFO] Chapter 2: MapReduce .............................. SUCCESS [1.007s]
[INFO] Chapter 3: The Hadoop Distributed Filesystem ...... SUCCESS [2.791s]
[INFO] Chapter 4: Hadoop I/O ............................. SUCCESS [1.962s]
[INFO] Chapter 4: Hadoop I/O (Avro) ...................... SUCCESS [27.794s]
[INFO] Chapter 5: Developing a MapReduce Application ..... SUCCESS [1.647s]
[INFO] Chapter 7: MapReduce Types and Formats ............ SUCCESS [1.690s]
[INFO] Chapter 8: MapReduce Features ..................... SUCCESS [1.943s]
[INFO] Chapter 11: Pig ................................... SUCCESS [12.575s]
[INFO] Chapter 12: Hive .................................. SUCCESS [28.504s]
[INFO] Chapter 13: HBase ................................. SUCCESS [29.176s]
[INFO] Chapter 14: ZooKeeper ............................. SUCCESS [0.618s]
[INFO] Chapter 15: Sqoop ................................. SUCCESS [16.455s]
[INFO] Chapter 16: Case Studies .......................... SUCCESS [0.949s]
[INFO] Hadoop Examples JAR ............................... SUCCESS [0.894s]
[INFO] Snippet testing ................................... SUCCESS [4.474s]
[INFO] Hadoop: The Definitive Guide, Example Code ........ SUCCESS [0.000s]
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:23.641s
[INFO] Finished at: Sun Dec 29 00:54:02 PST 2013
[INFO] Final Memory: 158M/562M
[INFO] ------------------------------------------------------------------------

Now I am supposed to be able to run it, but I don’t know how yet.  I am also waiting for the data files to finish downloading.

Back to Learning Microsoft Technologies

The last week was hectic.  At work, the team needs to create a POC project that uses Microsoft Dynamics CRM 2013 for case management.  Since Christmas is coming, almost everyone took off for the holidays.  Right before that, I volunteered to configure or create the CRM entities, based on the logical data model I’ve been working on for the last 5 months.

So, I downloaded the Dynamics CRM SDK and learned how to query the entities and perform CRUD operations using the early-bound object model (with code generated by CrmSvcUtil.exe).  I’ve also tried using the late-bound API to do the same.  On the configuration side, I’ve learned how to modify the existing entities by adding fields, relationships, and forms.  In addition, I’ve learned how to create new custom entities.

When it comes to the data, I used the Settings feature to first download a Data Import Template (an XML file).  The fields in the template are based on the form I’ve created for the entity.  I then opened the XML file in Excel and populated the template with real data rows.  I saved the file and used the Import Data wizard from the Settings feature to import the data rows into the CRM entity.  It then walked me through the steps where I can define the mappings between the Excel file columns and the target entity’s fields.  What’s interesting about it is that, for a “lookup” column, you can put in any field that uniquely identifies a record in the lookup entity.

My next step, after completing the configuration of entities in CRM, is to write a test harness or client app that performs simple case management activities.  The PM may not want me to do it, but I think I should just do it myself and learn something.  I am thinking of creating a LightSwitch app to connect to the OData services exposed by Dynamics CRM.  This article has good info:  There are lots of samples here.

I’ve noticed that LightSwitch lets you create an application based on the data endpoint.  The choices (for VS 2013) are a database, SharePoint, an OData Service, or a WCF RIA Service.  The last one piqued my interest, so I am going to try it.  I don’t remember whether I’ve heard about WCF RIA Services before or not.  This post has a good walkthrough that I am going to try soon:

I have not done any extensive application development since 2007 because my focus shifted to BI.  There was so much to learn, and I can say I am 80% on top of the Microsoft BI stack now.  With my renewed interest in coding, there is so much to catch up on.

MongoDB Driver Import Issue – Resolved

Tonight my younger daughter had an orchestra concert at her school.  I went and sat on the gym bench reading “Hadoop: The Definitive Guide”.  Even though I cannot do hands-on exercises there, seeing the actual Map/Reduce Java code gets me started.

After the concert I continued to work on my first Java MongoDB program.  Since the issue was that Java could not find the MongoDB Java driver, I re-downloaded the mongo-java-driver-2.11.3.jar file and placed it in the /usr/lib/mongodb folder (which I created).  I am not sure it’s the right folder, but I gave it a try.  Some bloggers suggested putting it in the JAVA_HOME folder.  That led me to discover that I didn’t have the variable set.  After some searching, I found that I have multiple versions (6 and 7) of the OpenJDK JRE in the /usr/lib/jvm folder.  So I set the variable in my .bashrc file for future use.

I found that someone in the forum had a similar question, and this is the answer that worked:

In project explorer right click on the folder/package where you want to import the JAR. Then choose “Import”. In the import wizard, select General > File system. Browse for the JAR and select it. Now you imported the JAR, but before you can use it, you need to add it to the classpath. Once again in project explorer right click on the project root folder, select Properties. In the properties menu select Java Build Path. In Java Build Path open the libraries tab and click on “Add JARs…” and browse for the JAR you just imported.

If you do not wish to actually import the package to your project but just use it, then you can skip the first part of this instruction and directly go to the “Libraries” tab in the Java Build Path. This time you should click on “Add external JARs…” and browse for the JAR in your file system.

I imported the jar file by right-clicking on the project icon in Package Explorer, selecting the Import option, then selecting General/File system and browsing to the folder.  After the selection, the list box on the right showed the jar files, but I checked the folder in the left list box instead.  I guess it means importing all the jar files in that folder.

I didn’t set the classpath but it worked.  I’ve added some code to add documents to the collection and to iterate through the documents in the collection and print out the objects.

I like how the driver supports DBObject, which serializes into JSON format.


BasicDBObject doc = new BasicDBObject("name", "MongoDB").
                              append("type", "database").
                              append("count", 1).
                              append("info", new BasicDBObject("x", 203).append("y", 102));


    // Iterate over every document in the collection and print it as JSON.
    for (DBObject obj : col.find()) {
        System.out.println(obj);
    }

The result is:

{ "_id" : { "$oid" : "52b146b8e4b04530a89967c2"} , "name" : "John" , "type" : "Singer" , "lastname" : "Lennon" , "info" : { "x" : 203 , "y" : 102}}

I guess they might have the same thing for C# and I just haven’t come across it.  So far I’ve only tried the LINQ driver, which does require you to define a POCO class to represent its schema.