Making Recommendations on MovieLens Data with Collaborative Filtering

I am doing some research on building recommendation engines.

Among the resources I collected, the book Building Recommendation Engines from Packt Publishing is a good one, with an overview of how to engineer a recommendation engine and with working samples.

One problem I found is in Chapter 5, BUILDING COLLABORATIVE FILTERING RECOMMENDATION ENGINES: the Python sample code that makes recommendations on the MovieLens 100k data produces RMSE values as high as 11, on ratings that range from 1 to 5.

A quick Google search turns up tutorials that do the same thing on the ml-100k data set with SVD or ALS and reach an RMSE below 1.0, commonly 0.9x.

In Chapter 7 of the book, BUILDING REAL-TIME RECOMMENDATION ENGINES WITH SPARK, the author uses Spark ALS and reaches an RMSE of 0.9x on the same data set.

It looks like the author did not put much effort into polishing Chapter 5's code sample.
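To make the numbers above concrete, here is a minimal sketch of how RMSE is computed. It is not the book's code, and the sample values are made up for illustration. It also shows why an RMSE of 11 is suspicious on a 1-to-5 scale: once every prediction is clipped into that range, no single error can exceed 4, so the RMSE cannot either. Whether unscaled predictions are what actually happens in Chapter 5 is only a guess on my side.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not actual:
        raise ValueError("sequences must not be empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def clip_rating(value, low=1.0, high=5.0):
    """Force a predicted rating into the valid MovieLens range."""
    return max(low, min(high, value))


# Made-up values for illustration only, not taken from the book or the data set.
actual = [5, 3, 4, 1]
unscaled = [14.2, 9.8, 12.5, 6.1]  # hypothetical predictions that left the 1-5 scale
clipped = [clip_rating(p) for p in unscaled]

print("RMSE with unscaled predictions:", rmse(actual, unscaled))
print("RMSE after clipping to 1-5:   ", rmse(actual, clipped))
```

Clipping does not make a model good, it only bounds the error, so an RMSE above 4 is a hint that the predictions and the ratings are not on the same scale.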

I would recommend the following articles and samples over Chapter 5 of this book:

Slow Windows Update Checking on Windows 7 SP1

For various reasons, I set up a virtual machine running Windows 7 Pro SP1.

The initial Windows Update check was very slow: it showed no progress for a couple of minutes while a service host process kept the CPU at 100%. Something was definitely wrong.

So I googled and found the solution here. The key is to install the patch from this Microsoft KB article: https://support.microsoft.com/en-us/kb/3102810.

After that, Windows Update offered 200+ patches, and I installed them.

Resizing the Hortonworks Sandbox Disk

I created a Hortonworks HDP Sandbox on Azure for testing.

The initial OS disk is about 48 GB, which is too small to hold my test files in HDFS.

I found a way to extend the OS volume, which is a simple way to increase the HDFS capacity.

References:

1: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006371

2: https://community.hortonworks.com/content/kbentry/1620/how-to-increase-sandbox-disk-space-virtualbox.html

Steps:

Basically, you create a new LVM partition and then add it to the existing OS LVM volume group.

  1. Attach a disk to the HDP virtual machine in the Azure portal.
  2. Create a new partition (/dev/sdc1) on that disk (/dev/sdc) and assign it partition id '8e', which marks it as an LVM partition.
  3. Use 'vgdisplay' to find the volume group name; it is 'vg_sandbox' on the Hortonworks sandbox.
  4. Use 'vgextend' to extend 'vg_sandbox' with the newly created '/dev/sdc1'.
  5. Use 'lvdisplay' to find the logical volume name; it is 'lv_root', and its path is '/dev/vg_sandbox/lv_root'.
  6. Use 'lvextend -L +xxxG /dev/vg_sandbox/lv_root' to extend the logical volume. xxxG is the free space in volume group 'vg_sandbox', which can be found using 'vgdisplay'.
  7. Use 'resize2fs /dev/vg_sandbox/lv_root' to resize the filesystem online. This step takes some time.
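As a rough sketch, the LVM steps from 'vgextend' onward can be generated by a small Python helper. This is a dry run that only prints the commands so they can be reviewed before being run as root. The function name is my own, the 100 in the example is a placeholder for the free space reported by 'vgdisplay', and the interactive partitioning step is left out on purpose.

```python
import shlex


def lvm_extend_commands(partition, volume_group, logical_volume, extra_gb):
    """Build the vgextend / lvextend / resize2fs commands as argument lists.

    Nothing is executed here; the caller decides what to do with the result.
    """
    if extra_gb <= 0:
        raise ValueError("extra_gb must be positive")
    lv_path = f"/dev/{volume_group}/{logical_volume}"
    return [
        # Add the new LVM partition to the existing volume group.
        ["vgextend", volume_group, partition],
        # Grow the logical volume by the requested number of gigabytes.
        ["lvextend", "-L", f"+{extra_gb}G", lv_path],
        # Grow the filesystem to fill the enlarged logical volume.
        ["resize2fs", lv_path],
    ]


# Names are the ones from the steps above; 100 is a placeholder size.
commands = lvm_extend_commands("/dev/sdc1", "vg_sandbox", "lv_root", 100)
for argv in commands:
    print(shlex.join(argv))
```

Keeping the commands as argument lists means they could later be passed to subprocess.run one by one, but printing them first is the safer habit for anything that touches disks.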

 

Syncing Calendar from Corporate Exchange to Outlook.com Account on Windows Phone

The requirement is simple: sync the calendar from my company email account, which runs on Exchange Server with Outlook as the client, to the Microsoft account on my Windows Phone.

This KB article states that with Outlook 2010 or 2007 one can use the Outlook Connector software to do this, which does not apply to me because I am using Outlook 2013. The article also gives a silly solution for Outlook 2013, which simply requires copying meetings manually between the two calendars.

Could I write a small application to do this automatically?

Change Google Font CDN URL to Speed Up Semantic UI Page Loading

Semantic UI is a great web UI framework that eases UI-related work.

Its default font is a Google font loaded from Google's CDN, which has been blocked in China for a long time. This can make the first page load very slow.

To fix this, one can follow the Semantic UI customization guide and change the font source there. A known issue has been filed on the Semantic UI GitHub site.

For those who don't want to customize it, there is a simple solution: edit semantic.css in the dist folder and change the first line from

@import 'https://fonts.googleapis.com/css?family=Lato:400,700,400italic,700italic&subset=latin';

to

@import 'http://fonts.useso.com/css?family=Lato:400,700,400italic,700italic&subset=latin';

fonts.useso.com is a CDN hosted in China that mirrors the Google one, which makes access from China faster.
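If the dist file gets regenerated often, the same edit can be scripted. The sketch below is only an illustration: the function name is made up, and it assumes the font import sits on a line that starts with @import, as in the snippet above. It rewrites the host on those lines and leaves everything else untouched.

```python
GOOGLE_FONTS = "https://fonts.googleapis.com/"
MIRROR = "http://fonts.useso.com/"  # mirror host taken from the post


def swap_font_cdn(css_text, old=GOOGLE_FONTS, new=MIRROR):
    """Point @import lines that use the Google Fonts host at the mirror."""
    lines = []
    for line in css_text.splitlines(keepends=True):
        # Only touch @import lines; other rules are copied through unchanged.
        if line.lstrip().startswith("@import") and old in line:
            line = line.replace(old, new)
        lines.append(line)
    return "".join(lines)


sample = (
    "@import 'https://fonts.googleapis.com/css?family=Lato:400,700,"
    "400italic,700italic&subset=latin';\n"
    "body { font-family: 'Lato'; }\n"
)
patched = swap_font_cdn(sample)
print(patched)
```

To apply it to a real file, read dist/semantic.css into a string, pass it through swap_font_cdn, and write the result back. Keep a copy of the original first.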

Ionic Framework with Visual Studio Cordova Tools

While playing with the Ionic Framework in Microsoft Visual Studio Cordova Tools CTP3, I found a series of articles on setting up a demo Ionic app with the MS Cordova Tools.

Here are the links:

  1. http://skylore.wordpress.com/2014/06/03/ionic-framework-inside-a-visual-studio-cordova-application-using-typescript-part-1/
  2. http://skylore.wordpress.com/2014/06/05/ionic-framework-inside-a-visual-studio-cordova-application-using-typescript-part-2/
  3. http://skylore.wordpress.com/2014/06/08/ionic-framework-inside-a-visual-studio-cordova-application-using-typescript-part-3/

And here is a brief summary:

  1. Create a TypeScript Blank Cordova App in VS 2013.
  2. Get the Ionic package from NuGet, then drag 'script/ionic-bundles.js' and 'content/ionic.css' into index.html to create the script and stylesheet references.
  3. Use the command 'ionic start myApp sidemenu' to create a project scaffold.
  4. Copy the 'www/templates' folder from the scaffold project to the VS project, at the same level as the existing 'scripts' folder.
  5. Create an 'app' folder inside the 'scripts' folder, move index.ts from /scripts into /scripts/app, and update the reference to index.js in index.html. Copy app.js and controller.js from www/js of the scaffold project to the /scripts/app folder.
  6. Add or adjust the references to the js/ts files in index.html, and add an ng-app directive.
  7. Install angularjs.TypeScript.DefinitelyTyped by right-clicking angular.js in the scripts folder and selecting "Search for TypeScript Typings".
  8. Create a services.ts file in the scripts/app folder, create a service, and register it with Angular.
  9. Create a controller.ts file and inject the service into the controller.
  10. Modify the template data binding to fetch data from the service.

If the build fails with "The certificate specified has expired" on the Windows platform, it is a bug in the Cordova tools. See: http://www.spritehand.com/2014/11/visual-studio-cordova-certificate.html

Installing RavenDB with NuGet

Installing RavenDB with NuGet fails with the following error:

Install-Package : Updating 'System.Spatial 5.2.0' to 'System.Spatial 5.0.2' failed. Unable to find a version of 'RavenDB.Database' that is compatible with 'System.Spatial 5.0.2'.

The fix is to install with the following commands:

Install-Package RavenDB.Database -DependencyVersion Highest

Install-Package RavenDB.Embedded -DependencyVersion Highest

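3G Network Disappears After Enabling the Wi-Fi Hotspot on the China Telecom HTC 8X Windows Phone

I recently bought a China Telecom HTC 8X Windows Phone for ¥1099 to replace the Nokia 800C I had used for two years, planning to try the upcoming Windows Phone 8.1.

Since Windows Phone 8.1 has not been officially pushed out yet, the phone is still on Windows Phone 8.0, and while using it I found a 3G network bug.

The symptom: after the WLAN mobile hotspot is turned on for the first time, the 3G connection disappears. Calls still work, but there is no internet access. Toggling airplane mode, switching the data connection off and on, and rebooting the phone do not fix it.

The symptom matches the one described here: after using the WLAN hotspot, 3G cannot connect. That thread also mentions a temporary workaround, which is to turn the WLAN mobile hotspot back on. The 3G connection then recovers, which suggests the bug is related to the hotspot feature.

The permanent fix is also in that thread, in the ninth reply: use HTC's own Connection Setup app (连接设置) and select "中国电信互联网设置CTNET" (China Telecom Internet Settings CTNET) as the carrier.

I am writing this down because others will surely run into it too.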

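Lin Dan's Style of Play: Thoughts After Watching the 2014 China Masters

Lin Dan's recent style looks like something earlier players have all done before, yet it does not feel quite the same. The way he wins now does not seem to have been seen before, and it leaves a very distinctive impression. So here is a summary:

  • Movement: an energy-saving, stay-on-the-floor style. Unless necessary he rarely jumps high off the ground, so he does not end up bouncing around the whole court and cramping at the end.
  • His handling of mid-court drives and of defending body shots and net kills is distinctive: he is unhurried mentally, reacts quickly, and his returns are firm (Chen Long is the same, so the national team has probably drilled this part of the game specifically).
  • His anticipation of the opponent's shot and his first step are both excellent. He takes the shuttle high and early, which makes all kinds of deception and control easier.
  • He controls the opponent with variation in shot and placement, not only through a consistent-looking preparation but also through hand technique plus holds that force the opponent into a second start.
  • Very sparing attacks. His ability to attack is still there. His overall fitness does not match 2008, but as long as he does not squander it, eight or ten attacking shots per match are enough. He also retains the ability to unleash two or three big shots in a row on crucial points. Opponents have to guard against this for the whole match, and usually cannot stop it anyway.
  • Safe shots, the essential weapon under the 21-point system. In long rallies his shot selection puts safety first. At the net he does not try to end the rally with one tight spinning net shot, but plays the net shot away from the tape. When the opponent is well positioned and there is no good opening, he simply plays a fast drop straight to the opponent's racket. He gives the opponent an "easy" shot first and waits to pounce on the next one, and he is not at all afraid of building the rally slowly over many shots. These net shots and drops are mixed with pushes, light smashes and clears, so the opponent does not dare commit to attacking them. Add his accurate reading of the shot that follows a safe shot and his sharpness in pouncing on it, and a shot that looks safe often hides a killer second or third shot.
  • Choice of contact point: he does not insist on the highest contact point either at the net or in the rear court. At the front he emphasizes starting early and getting to the shuttle early, using holds plus fakes to control the opponent. At the back, thanks to his ability to generate power with a compact swing, he can hit a variety of accurate lines from a fairly low contact point. He does not chase attacking effect from the rear court and does not need to spend energy on big backward steps and jumps to take the shuttle early, which saves a great deal of energy.
  • Accurate placement, varied trajectories (hard to see on TV broadcasts), and fast shots.
  • A defense that cannot be broken. A two- or three-shot attacking combination rarely finishes off Super Dan (炒鸡蛋). Opponents usually need to build over more shots to find the right chance to win the point in one strike. The problem is that, over those extra shots, the chance of Super Dan exploiting a gap or of the opponent making an error is far greater than the chance of finishing him off, which is very dangerous.
  • Control of match tempo. Since 2008 he has rarely played very fast matches, and since 2011 clear-to-clear rallies have been common in his matches. Speed-based players cannot get their speed going against him, or they force the pace for one game and are then held to ten or so points in the second.
  • Beating opponents on fitness. On one hand he distributes his energy well across the whole match, and in recent years it has been rare to see him short of energy and unable to raise his speed. On the other hand his tactics (mainly defense, forcing second starts, and long rallies) make the opponent spend more energy, so it is rare for the opponent to be in better physical shape than him late in a match. This has basically shut off the possibility of young players overwhelming him on fitness alone, unless he gets a few years older and really cannot keep up.
  • Using small tricks to catch his breath, which is also a way of managing his energy and focus. The most common one is the sweaty diving save followed by having the court wiped. Once he has his breath back, he immediately attacks a couple of times to steal two points while the opponent is off guard.
  • Rich match experience. Since his debut he has been among the (very) best players. When he was young the players he was challenging were also top players, and there were more of them then. Leaving aside the foreign ones, the ones at home alone, Guanzhu (馆主), Big Mouth (大嘴), a certain National Games champion, a certain Olympic champion and the like, could travel to the present and comfortably sweep the likes of Xiao Ming (小铭).
  • Mental strength. He is not intimidated on key points or in deciding games, and nobody can match him there. It comes from the years after he switched to the tai chi style, when he routinely ground out three-game matches. He has also won an especially large number of key points against a certain someone. For Super Dan, mental strength is no longer a question of high or low. The question simply does not exist.


Overall, his game is moving toward precision in every respect. I listed many points above, but my feeling is that the core guiding idea is to play more economically: spend less energy and accelerate less himself, control the tempo of the match through shot control, and use defense to make scoring harder and physically costlier for the opponent. You could call it a grind, control, defend and counterattack style.

Earlier styles were comparatively one-dimensional, usually pursuing a single simple goal: hit the spots more accurately, open up the lines wider, play faster than the opponent can follow, deceive the opponent outright with fakes, or end the rally with one forehand. That is a different path from Super Dan's tai chi game.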