Sunday, February 10, 2013

Getting YouCompleteMe to work in RHEL 6 (or Fedora)

No matter what IDE I use, so far I still give up and go back to vim. However, I do miss Eclipse for its autocomplete facility. The recent announcement of YouCompleteMe on Hacker News gave me a glimpse of hope; it is like clang complete from a while ago, but even sleeker.

But YouCompleteMe relies on a number of really new dependencies, and RHEL6, being an old snapshot of Linux, is a bad target candidate since these dependencies are not easily satisfied. Short of paying for support, it’s unlikely that these packages will ever be brought up to date. Not being a cheap-ass here, but since I really don’t like waiting, I’ll just have to roll my own :-)

Amazingly, these dependencies are so new that even vim needs to be rebuilt to pick up the recently introduced Python extensions, so there’s no avoiding some serious source compiling here!

Before we start, let me impart some words of advice. Firstly, even if you are gutsy enough to build your own binaries, never try to reinvent the wheel and build from pristine sources!

Secondly, installing software through a package manager (e.g. RPMs, .debs) is always superior to “sudo make install”. It ensures you’ll never have lingering dependencies, or wrong versions of libraries that got linked because the install script overwrote the default.

For Red Hat based OSes, since the guys at Fedora have already done the hard work of building up-to-date packages and applying patches and cleanups for you, starting from their work is your best bet to avoid the pain of compilation errors, or hours spent debugging other people’s code.

Pull a copy of the vim.spec file from Fedora Rawhide and adapt it to your liking before building an RPM. Obviously, F19 (as of this writing) has diverged since RHEL6 got snapshotted, so a little patching and backporting is unavoidable.
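
As a rough sketch of that workflow (not necessarily the exact steps I ran), rebuilding from a Fedora source RPM looks something like the following. The helper names run, prepare_srpm and build_spec and the DRY_RUN switch are made up for illustration; yum-builddep comes from yum-utils, rpmbuild from rpm-build, and I'm assuming the default ~/rpmbuild build tree.

```shell
# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1 (hypothetical helper): install the build dependencies and unpack
# the source RPM, which drops the spec file into ~/rpmbuild/SPECS.
# $1: path to the .src.rpm
prepare_srpm() {
    run sudo yum-builddep "$1" || return 1
    run rpm -ivh "$1"
}

# Step 2 (hypothetical helper): after hand-editing the spec, build the
# binary and source RPMs.
# $1: spec name without the .spec suffix, e.g. vim
build_spec() {
    run rpmbuild -ba "$HOME/rpmbuild/SPECS/$1.spec"
}
```

Setting DRY_RUN=1 before calling either function only prints the commands, which is handy for checking what would run before touching the system. The patching and backporting happens by hand between the two steps.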

Then, there’s the issue of Clang, for which version 3.2 is recommended. Even Fedora Rawhide only ships 3.1 at the moment, so I was rather surprised that YouCompleteMe requires such a new dependency. Still, it’s not a problem, just a little more hacking on llvm.spec.

Note: my Clang 3.2 build fails a single regression test during building, so I’ve disabled regression testing to allow the RPM to be built. While that may be OK for some other software, regression test failures in your compiler are a BAD thing, especially if you’re going to build the entire OS from scratch. But since we’re only using it as an annotation tool, I’m going to let it slide.

If building those 2 things hasn’t deterred you yet, you’ll still need to deal with the last headache: building a newer version of CMake, a dependency that YouCompleteMe requires in order to compile the final library. [ But why? :-( ]

Anyway, for people who want to skip the pain of building everything themselves, you can get my pre-built packages and all the vim/llvm RPM dependencies from my RPM repo (if you trust my work ;-) and save yourself some compiling hassles. They will probably work on Fedora as well, since dependencies are usually forward-compatible, but YMMV. Have fun!
Friday, September 14, 2012

Have you ever reused code?

The term 'Code Reuse' feels like a software developer's cliche that has since fallen into the list of unfashionable tech terms. Nevertheless the term still lingers on like a bad smell, never fully ready to die off. These days, code reuse feels more like the definition of a myth - a story everybody has heard of, but nobody has witnessed.

If you are ever geeky enough to have raised code reuse as a conversation piece, you'll probably notice that almost everybody has something good to say about it, from a vague feel-good sense of how good a thing it is, to how it may have profoundly changed a person's life (ok, I exaggerated about this one). But if you ask anybody for 5 good examples, I'm sure you'll be hard pressed to find anybody with a sensible answer. How about we start with yourself: when was the last time you reused your own code in a meaningful, substantive way?

These days, the only visible code reuse I know of is when I rely on code from a software library - often an external library written by somebody else. Be it a data structure, a fancy graphical widget, or complex mathematical computations, there is probably a library out there which will cater to your need. Writing from scratch is something you never seem to do anymore.

But relying on software libraries is just not my romanticised version of code reuse, the one that the object-oriented programming paradigm promised so long ago. Remember the textbook claims about writing your own well-abstracted objects, and how you'll be rewarded by reusing them in perpetuity? Personally, that lofty promise has certainly fallen short of my expectations, from when I was a starry-eyed kid coding in OOP for the first time, to the more experienced software developer I am today.

So what went wrong? Nothing actually.

Code that has a well-defined purpose, inputs and outputs, and that is used often, is easily defined and hence usually gets 'factorised' into code libraries. These libraries get battle-tested by many other developers over time, ironing out any residual kinks, as well as any lingering bugs. Over time, a well-used library makes more compelling sense to use than to roll your own, since it minimises the risk and uncertainty from newly introduced code.

So whatever is left for you to work on is likely to be the new and unique issues you are solving, making it naturally unfactorisable. And if certain portions of code do become apparent enough for you to find a commonality, that's perhaps when you'll refactor your own code to reuse these commonalities, although I suspect such situations are getting less likely. Maybe like me, you're feeling a little cheated as well.

Code reuse today is just a euphemism for relying on other people's code - well, it is still reuse, just not of your own code, unless you happen to be a software library writer. But chances are, you are not.

I might as well go one step further and declare that we never reuse our own code anymore - as a corollary to the famous Bikeshed Problem. Not all of us will gain sufficient experience in building our own nuclear reactor (or more efficient data structures and algorithms), so what remains is to focus on the colour of the bikeshed (or button placements within an HTML form), because that's the only thing left to do when other people have done all the heavy lifting for you. And that's how it should be; after all, didn't they tell us not to reinvent the wheel?

It's why any boy and his dog today can write an application with some knowledge of HTML, CSS and JavaScript - nobody needs to know how to code a rasteriser for transforming vectors into pixels, write their own graphics routines to display a button or an input, or write their own binary search tree in order to use a hashmap, since they don't have to - the first principles of software systems are all conveniently abstracted into libraries, frameworks, and easy APIs that they can use.

It is not a bad thing, but it is also no wonder that any arts major can simply write a web application and proclaim themselves to be a software developer these days. While I wouldn't mind them doing a webpage for me, I wouldn't go so far as to trust a lay-coder with anything of any algorithmic complexity.

On the flip side, it's never been better to be a software developer; we are more productive thanks to the assortment of libraries at our disposal, from the myriad software frameworks to the numerous tools that we utilise today - all of which have made software systems that would have been difficult to write in the past a relative breeze today.

As software development goes these days, we are indeed standing on the shoulders of giants.
Friday, September 07, 2012

Vim: When Copy and Paste doesn't work ...

I remember that copying and pasting to the clipboard used to work a long while ago, but I just couldn't remember what exactly I did in order to get it to work. Those were the days when I was still bothered enough to tweak things to get them running - these days I just want things to work, which many will arguably retort is asking for the impossible.

While Linux may be perceived as still a much less-accessible OS compared to the ones that you have to pay for, it actually works pretty well once you've gone past the learning curve. But from time-to-time, you can still be surprised with what you don't know. It can feel like a hassle sometimes, but on the flip-side, learning new things is what makes using it fun. (Sure many may disagree on that too - and if you are one of them, I'll save you the torture; shut your brain down and go back to surf your Facepage instead ;p)

I remembered having to recompile vim dog years ago, and started to look at what's missing in the standard vim-enhanced package; lo and behold, the version flag shows the following:

$ /usr/bin/vim --version
VIM - Vi IMproved 7.3 (2010 Aug 15, compiled Nov 16 2010 17:05:25)
Included patches: 1-56
Modified by 
Compiled by 
Huge version without GUI.  Features included (+) or not (-):
+arabic +autocmd -balloon_eval -browse ++builtin_terms +byte_offset +cindent 
-clientserver -clipboard +cmdline_compl +cmdline_hist +cmdline_info +comments 
+conceal +cryptv +cscope +cursorbind +cursorshape +dialog_con +diff +digraphs 
-dnd -ebcdic +emacs_tags +eval +ex_extra +extra_search +farsi +file_in_path 
+find_in_path +float +folding -footer +fork() +gettext -hangul_input +iconv 
+insert_expand +jumplist +keymap +langmap +libcall +linebreak +lispindent 
+listcmds +localmap -lua +menu +mksession +modify_fname +mouse -mouseshape 
+mouse_dec +mouse_gpm -mouse_jsbterm +mouse_netterm -mouse_sysmouse 
+mouse_xterm +multi_byte +multi_lang -mzscheme +netbeans_intg -osfiletype 
+path_extra +perl +persistent_undo +postscript +printer +profile +python 
-python3 +quickfix +reltime +rightleft +ruby +scrollbind +signs +smartindent 
-sniff +startuptime +statusline -sun_workshop +syntax +tag_binary 
+tag_old_static -tag_any_white -tcl +terminfo +termresponse +textobjects +title
 -toolbar +user_commands +vertsplit +virtualedit +visual +visualextra +viminfo 
+vreplace +wildignore +wildmenu +windows +writebackup -X11 -xfontset -xim -xsmp
 -xterm_clipboard -xterm_save 
   system vimrc file: "/etc/vimrc"
     user vimrc file: "$HOME/.vimrc"
      user exrc file: "$HOME/.exrc"
  fall-back for $VIM: "/etc"
 f-b for $VIMRUNTIME: "/usr/share/vim/vim73"
Compilation: gcc -c -I. -Iproto -DHAVE_CONFIG_H     -O2 -g -pipe -Wall  -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64  -D_FORTIFY_SOURCE=1     
Linking: gcc   -L.  -rdynamic -Wl,-export-dynamic  -Wl,--enable-new-dtags -Wl,-rpath,/usr/lib64/perl5/CORE   -L/usr/local/lib -Wl,--as-needed -o vim       -lm -lnsl  -lselinux  -lncurses -lacl -lattr -lgpm -ldl    -Wl,--enable-new-dtags -Wl,-rpath,/usr/lib64/perl5/CORE  -fstack-protector  -L/usr/lib64/perl5/CORE -lperl -lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc -L/usr/lib64/python2.7/config -lpython2.7 -lpthread -ldl -lutil -lm -Xlinker -export-dynamic   -lruby -lpthread -lrt -ldl -lcrypt -lm   

The xterm_clipboard feature isn't compiled into the standard text-mode vim (note the -xterm_clipboard above); that was the main reason that I had to recompile vim in the past!
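
If you don't fancy eyeballing that wall of flags, a small filter can do the looking for you. This is just a sketch: the function name vim_has_feature is made up, and it simply treats a whitespace-separated +name token in the version output as "compiled in".

```shell
# Read `vim --version` output on stdin and report whether a feature is
# compiled in, i.e. appears as the exact token +name.
# $1: feature name, e.g. xterm_clipboard
# Prints "yes" or "no".
vim_has_feature() {
    # Put every whitespace-separated token on its own line, then look for
    # an exact, fixed-string match of "+name".
    if tr -s '[:space:]' '\n' | grep -Fqx -e "+$1"; then
        echo yes
    else
        echo no
    fi
}
```

Use it as: /usr/bin/vim --version | vim_has_feature xterm_clipboard. For the build shown above that should print no, since the flag shows up as -xterm_clipboard.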

But these days, I'm lazy. I'd much rather not recompile and maintain my own packages if I don't have to, and it turns out that I'm in luck - a bit of digging showed that the vim-X11 package contains vimx, a version of vim that has the xterm_clipboard flag enabled. Happy days!

So just do:

$ sudo yum install vim-X11
$ alias vim=$(which vimx)

The alias command just makes it easier, given that I'm far more used to typing vim than vimx, so that I don't have to undo my habit :)

So how do you make use of the clipboard? Let's say you have mouse mode on (set mouse=a), and have selected some text using your mouse; in order to send it to the clipboard, do:

"+y

Note that the quote isn't a typo. To paste from the clipboard into vim, do:

"+p

Bonus trick: you can have your selection automatically sent to the X11 clipboard by adding this configuration:

set go+=a

Have fun! :D
Friday, August 31, 2012

Adding extra jar files to Ant path in Fedora/RHEL

The default RPM packaged version of 'Ant' that comes with Fedora/JPackage doesn't respect the $ANT_HOME environment variable the same way as a copy downloaded and installed directly from Apache itself.

These days, I have a little more to do with J2EE work, since J2EE applications make good samples for testing our JVM, so I'm having to pick up various build tools that I don't normally use, like Apache Ivy and Maven. Ivy works as an additional jar to supercharge Ant's capabilities, hence this post as a reminder to myself. There are essentially 2 ways of accomplishing the task:

1) Put "ivy.jar" into /usr/share/ant/lib by default in our custom development distro. This is a nice option for all developers, since we won't need to do anything extra for it to work. But the file isn't tracked by package management (i.e. it's not in an RPM), and developers shouldn't drop it into /usr/share/ant/lib just because we have local superuser rights, since this sort of thing should be managed by the sysadmin, automatically if possible.

2) Work around this situation by having a local override of the Ant configuration. Create the following directory structure in your $HOME/.ant directory, e.g.

[vincentliu@workstation08 ~]$ tree $HOME/.ant
|-- ant.conf
`-- lib
     `-- ivy.jar

In the ant.conf file, have the following lines:

[vincentliu@workstation08 ~]$ cat $HOME/.ant/ant.conf
# Need to override the existing $ANT_HOME path that JPackage customized
# to add in the ivy package as part of Ant's classlib

Copy the ivy.jar file into the $HOME/.ant/lib directory. These changes will allow you to use Apache Ivy natively without littering multiple copies of it per project.
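
The directory part of this boils down to a couple of commands. A minimal sketch, assuming you already have an ivy.jar downloaded somewhere; the function name install_user_ant_jar is made up for illustration, and it deliberately leaves ant.conf alone.

```shell
# Copy a jar into the per-user Ant library directory ($HOME/.ant/lib),
# creating the directory if it doesn't exist yet.
# $1: path to the jar file, e.g. a downloaded ivy.jar
install_user_ant_jar() {
    jar=$1
    if [ ! -f "$jar" ]; then
        echo "no such jar: $jar" >&2
        return 1
    fi
    mkdir -p "$HOME/.ant/lib" && cp "$jar" "$HOME/.ant/lib/"
}
```

Because everything lives under $HOME, no superuser rights are needed and nothing outside package management gets touched in /usr/share/ant/lib.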
Friday, August 24, 2012

Google Drive does not work if your network is slow

I solely use Google Docs, erm Drive, for working with documents these days. While it's named "Drive" now, I'll refer to it by its old incarnation, "Docs", as it's the document editor that I'm about to rant about here.

Google Docs as an editor, is simple to use, and is very accessible - there is no need to install any specific software for it, all you need to do is to open it up through the web browser. But the best thing I like about it, is that I can edit the document without a care and not have to worry about saving the document somewhere so that I can resume editing elsewhere later. Everything is available as long as I have access to the Internet.

Now, that's all fine and dandy, except if you have a "slow" connection. And when I say "slow", I don't mean the archaic 56kbps speeds of the days when people still dialled up through a modem connected to a copper phone line. Slow in Google's context apparently means anything at mobile broadband speeds (around 1 Mbps).

Google Docs had been working fine prior to the fairly recent change that introduced the "we'll save as you type" feature. The old Google Docs wasn't that bandwidth hungry, since it saved the document in coarser time blocks instead of the constant syncing that it does now.

With the recent changes, Google Docs appears to either suck up more bandwidth, or need lower latency than my humble mobile broadband dongle can deliver. Whatever I type in, after about 2 minutes of working on the document, Google Docs will just hang at "Saving..." and then produce an error screen.

This error is consistently reproducible, and it's not even a complex document we're talking about here - it's essentially a text file editable by vim that I copy and paste into sometimes. I don't get how Google gets this so wrong - we're talking about a document editor for a simple file, for god's sake, what kind of network requirements do you need in order to make it work?!
Saturday, March 17, 2012

Happy St. Patrick's!

It's been busy, but I haven't forgotten. Shout out to all my friends: Work has been busy, but life is chugging along. Will need to catch up with everyone soon. Till then, remember your friend in the Emerald Isle :D
Monday, July 11, 2011

The x86_64 Calling Convention

I suppose I can consider myself an 'old-school' developer now; even though I have been reading the AMD64 ABI documentation, I still haven't fully absorbed it yet, as evidenced by the two situations I ran into today where RTFM-ing would have saved me hours of GDB debugging pain.

I have been coding some assembly instructions to make C calls at runtime to a debugging routine, but the call always seems to end up mysteriously trampling the JIT-ed routines, making the VM take unexpected execution paths and causing some unlikely assertions to fire.

The situation is confounded by a number of issues:
  1. the code generated is dynamic, and therefore there are no debugging symbols associated with them compared to code typically generated by the assembler/compiler;
  2. there are different types of call-frames for a given method; 1 for a pre-compiled stub, 1 for a frame that's crossed-over from JIT-ed code to native code, and 1 for the JIT-ed code itself;
  3. when the eventual assertion does manifest, the code is already far away in the rabbit-hole from where the original problem manifested. And because some of the JIT-ed code actually makes a "JMP", unlike a "CALL", you can't actually figure out where the code originated from, since %rip is never saved on the call stack.
While situations 1 and 2 make debugging difficult by requiring you to keep a lot of contextual information in your head in order to figure out what's going on, situation 3 is just impossible to debug if the bug is non-deterministic in nature. For example, each compiled method in the VM generates a small assembly stub that stands in for the actual code to be executed; when the stub gets executed for the first time, it triggers the JIT compiler at runtime to compile the real method from its intermediate representation. The compiled method then replaces the stub, hence subsequent invocations will simply call the already JIT-generated method, thereby executing at native speed, just as you would get from compiled code.

To optimise for space, the stubs are made as small as possible (~20 bytes), and the common execution body shared by all stubs is factored into a common block. All stubs will eventually perform a global "JMP" instruction to this common block. In order to facilitate communication, all shared data between the stub and the common code block is passed on the thread stack, where the common offset to the method handle is agreed upon.

While the design is elegant, it is also impossible to debug when it breaks; the bug surfaces non-deterministically from time to time, and seems to suggest that the thread stack got corrupted or that the method handle isn't being passed correctly. Even when GDB is left running, by the time the assertion triggers, it's already after the fact, and therefore it is unable to trace back to the originating path.

I thought it might be a good idea to inject some debugging calls to trace the execution and stack pointer at runtime, so that I can figure out which stub was last called and the stack height when the call was made; the two pieces of information combined should give me sufficient hints on where the problem might lie. However, my injected code introduced two other issues that I had overlooked, which brings me back to the x86_64 ABI again; if you ever want to template any assembly instructions into your code that rely on an external library call, do keep these 2 points from the ABI specification in mind:
  1. Save ALL caller-saved registers, not just the ones that you are using.
  2. (§3.2.2) The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32) when control is transferred to the function entry point. The stack pointer, %rsp, always points to the end of the latest allocated stack frame.
I have to say that I had dismissed (1) since I've gotten used to the style of only documenting and saving the registers that were used; the convention was something that I had picked up from Peter Norton's 1992 book, "Assembly Language for the PC". For those who don't know, he's the "Norton" that Symantec's Norton Antivirus is named after. I still have the out-of-print book on my desk as a keepsake; it brings back memories of reading it and scribbling code on a piece of paper at my local library. Remarkably, that was how I learnt assembly, since I didn't have a computer back then. Thumbing through the book today, I still have an incredible respect for Peter's coding prowess. He had a way of organising his procedures so elegantly that they all fitted perfectly together from chapter to chapter.

Sorry, got sidetracked. So yes, point (1) - to save ALL registers; this is necessary because all caller-saved registers can actually be occupied by the JIT routines as input arguments to the callee; while this typically means the 6 defined registers (%rdi, %rsi, %rdx, %rcx, %r8, %r9) for general input (see §3.2.3), other registers can also be trashed upon a call return, so as a rule-of-thumb save everything, except the callee-saved registers (%rbx, %rbp, %r12 to %r15), which are guaranteed to be preserved.

Point (2) - I haven't observed a reproducible side effect from this; however, the failure points differ visibly between adhering to it and not, which shows up in the path that the JIT-ed code takes; therefore it pays to err on the side of caution. I seem to have observed some memory faults from not following this directive, but I can't ascertain this for a fact yet.
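
To make the arithmetic in point (2) concrete: a CALL pushes an 8-byte return address, so %rsp has to be a multiple of 16 just before the CALL for (%rsp + 8) to be a multiple of 16 at the callee's entry point. The sketch below only does that arithmetic in shell; the function names and the sample address are made up for illustration, and the 32-byte __m256 case is ignored.

```shell
# Bytes to subtract from %rsp right before a CALL so that %rsp becomes a
# multiple of 16 (the stack grows downwards, so we round down).
# $1: current value of %rsp (decimal or 0x-prefixed hex)
call_padding() {
    echo $(( $1 % 16 ))
}

# Value of %rsp after applying that padding, printed in hex.
# $1: current value of %rsp (decimal or 0x-prefixed hex)
aligned_rsp() {
    printf '0x%x\n' $(( $1 - $1 % 16 ))
}
```

For a %rsp of 0x7fffffffe4a8, call_padding prints 8 and aligned_rsp prints 0x7fffffffe4a0; after the CALL pushes its return address, %rsp is 0x7fffffffe498, and adding 8 gives back a multiple of 16 as the ABI asks for.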

Finally, a self-inflicted bug that I'd like to remind myself of: remember to subtract from %rsp if any memory has been written onto the thread stack; otherwise any function call may unknowingly overwrite it!

For all the trouble with debugging that I've gotten myself into, there is at least a silver lining to it: I had made the problem deterministic, or if it isn't the same problem, it is at least a similar class of problem that I can consistently reproduce, so that I can analyse its behaviour and learn from the mistakes I have been making. Because of the determinism, I was able to use GDB's reverse debugging feature to record the execution from the stub to the common code and gain a better understanding of how the generated code actually works. It's a really nifty feature, and I'm glad to have this as my first useful case of reverse debugging in practice.