
Introducing CORBA

What is CORBA?

The Wikipedia page says it all:

The Common Object Request Broker Architecture (CORBA) is a standard
defined by the Object Management Group (OMG) that enables software
components written in multiple computer languages and running on
multiple computers to work together (i.e., it supports multiple
platforms).

The standard was created twenty years ago, in 1991. The technology can be used in many programming languages: Python, Java, C++, C, etc.

Hence you can write software in your favorite programming language, exposing services that others will be able to call from a different language. (Nowadays you can do the same with web services. This article is about CORBA.)

All CORBA specifications are available on [OMGSPEC]; latest version is [CORBA3.1].

The [MCHALE] book is a very good way to learn CORBA.

How does it work?

CORBA is language agnostic because it relies on its own language: the Interface Definition Language (IDL for short). The IDL defines the contract between the client (software calling the service) and the server (software implementing the service). The service is exposed through a remote object.

IDL is used to define:

  • modules, which are called namespaces in C++ (or packages in Java)
  • structs, which are like C structs: they can contain data only (the data part of the contract)
  • interfaces, which contain method specifications (the service part of the contract)
  • exceptions that can be raised by a method
  • typedefs which are type aliases
  • unions, à la C
  • enums, enumerations
  • constants

IDL comes with existing types:

  • char, octet
  • string, wstring
  • short, long, long long (signed if unspecified, unsigned if prefixed with unsigned)
  • float, double, long double
  • arrays
  • sequences
  • any, which can hold a value of any existing or user-defined type

From this contract, the following code will be generated using a dedicated tool:

  • client-side code, called stubs
  • server-side code, called skeletons

So to call a CORBA remote service in a given programming language, we generate stubs from the IDL, then write code relying on these stubs. To provide a CORBA service in a given programming language, we generate skeletons from the IDL, then extend them with our implementation.

The dispatching and networking machinery is handled by the Object Request Broker (ORB for short). That's the most important part of the CORBA system: it's the component that hides away remoting aspects to make distant objects appear as if they were embedded within the local, client-side code.

Telecom Log Service example

The Telecom Log Service [TLOG] is an OMG specified service that defines two interfaces:

  • Log, to add, search and delete records
  • LogMgr, a Log container to lookup a given log, or list all contained logs

The specification details five sub-interfaces of the log service, grouped into three different approaches:

  • basic implementation. This is the service depicted below.
  • event-based implementation. The Event Service [EVENT] is another CORBA service specification. Combining both services means that software emitting events can be plugged into an Event Log without modification.
  • notify-based implementation. The Notification Service [NOTIF] is another CORBA service specification, built on top of the Event Service with added filtering facilities. Again, combining these two services provides easier integration with existing software.

Here is the UML representation of the key base interfaces (in yellow), the basic implementation (in blue) and important data structures (in green):

Here is the simplified IDL for these parts:

#include <TimeBase.idl>  // IDL can include each other
#pragma prefix "" // extra prefix to use in front of type names

// DsLogAdmin definition
// Because of the #pragma prefix "" above, the complete scoped name of
// this module is simply DsLogAdmin
module DsLogAdmin {
        // E X C E P T I O N --------------------------------------------------
        exception InvalidGrammar {};                    // empty exception
        exception InvalidConstraint {};                 // empty exception
        exception InvalidParam { string details; };     // one attribute
        exception LogFull { short n_records_written; }; // one attribute
        exception LogOffDuty {};                        // empty exception
        exception LogLocked {};                         // empty exception
        exception LogDisabled {};                       // empty exception
        exception InvalidLogFullAction {};              // empty exception
        exception LogIdAlreadyExists {};                // empty exception

        // T Y P E   A L I A S ------------------------------------------------
        typedef unsigned long LogId;
        typedef sequence<LogId> LogIdList; // LogIdList is a sequence of LogId
        typedef unsigned long long RecordId;
        typedef string Constraint;
        typedef unsigned short LogFullActionType; // what to do when log is full
        typedef TimeBase::TimeT TimeT; // TimeT type defined in TimeBase module
        typedef sequence<any> Anys;    // Anys is the name of a sequence of any

        // S T R U C T U R E S ------------------------------------------------
        struct NVPair {
                string name;
                any    value;
        }; // <-- Mandatory semi-colon
        typedef sequence<NVPair> NVList; // Give a name to a sequence of NVPair

        struct LogRecord {
                RecordId id;
                TimeT    time;
                NVList   attr_list;
                any      info;
        }; // <-- Mandatory semi-colon
        typedef sequence<LogRecord> RecordList;

        // I N T E R F A C E S ------------------------------------------------
        // Iterator interface specification
        interface Iterator {
                // get() method:
                // - two input parameters
                // - returning a list of LogRecord
                // - possibly raising InvalidParam exception
                RecordList get(in  unsigned long position,
                               in  unsigned long how_many)
                        raises (InvalidParam);
                void destroy();
        };  // <-- Mandatory semi-colon

        // Forward declaration
        interface LogMgr;

        interface Log {
                LogMgr my_factory();

                LogId id();

                // retrieve() method:
                // - two input parameters
                // - one output parameter
                // - returning a list of LogRecord
                // This method will actually return two objects: the
                // RecordList and the Iterator
                RecordList retrieve(in  TimeT from_time,
                                    in  long how_many,
                                    out Iterator i);

                RecordList query(in  string grammar,
                                 in  Constraint c,
                                 out Iterator i)
                        raises (InvalidGrammar, InvalidConstraint);

                unsigned long delete_records(in  string grammar,
                                             in  Constraint c)
                        raises (InvalidGrammar, InvalidConstraint);

                void write_records(in  Anys records)
                        raises (LogFull, LogOffDuty, LogLocked, LogDisabled);
        };  // <-- Mandatory semi-colon
        typedef sequence<Log> LogList; // Give name LogList to sequence of Log

        // interface BasicLog derives from interface Log
        interface BasicLog : Log {
                void destroy();
        };

        interface LogMgr {
                LogList list_logs();
                Log find_log(in LogId id);
                LogIdList list_logs_by_id();
        };

        // interface BasicLogFactory derives from interface LogMgr
        interface BasicLogFactory : LogMgr {
                BasicLog create(
                        in  LogFullActionType full_action,
                        in  unsigned long long max_size,
                        out LogId id)
                        raises (InvalidLogFullAction);

                BasicLog create_with_id(
                        in  LogId id,
                        in  LogFullActionType full_action,
                        in  unsigned long long max_size)
                        raises (LogIdAlreadyExists, InvalidLogFullAction);
        };
}; // <-- Mandatory semi-colon

The complete IDL file for the Basic Telecom Log Service is available here.


ACE+TAO Debian packaging moved to git

We recently converted the Debian ACE+TAO package repository from Subversion to git.

This was a long and interesting process; I learned a lot about git along the way. I had been using git for a while for other packages: BOUML, dwarves and GNU Smalltalk. But I did not really get it.

A preliminary study led by Pau[1] compared three conversion tools and showed that the last one, svn-all-fast-export[2], was giving the best-looking results.

The conversion

svn-all-fast-export requires physical access to the repository, so the Alioth SVN repository was copied to my machine as svn-pkg-ace/ before running the tool:

svn-all-fast-export --identity-map authors.txt --rules pkg-ace.rules svn-pkg-ace

Here's the content of the pkg-ace.rules configuration file that was used:

create repository pkg-ace
end repository

match /trunk/
  repository pkg-ace
  branch master
end match

match /(branches|tags)/([^/]+)/
  repository pkg-ace
  branch \2
end match

The author mapping file authors.txt being:

markos = Konstantinos Margaritis <email-hidden>
mbrudka-guest = Marek Brudka <email-hidden>
pgquiles-guest = Pau Garcia i Quiles <email-hidden>
tgg = Thomas Girard <email-hidden>
tgg-guest = Thomas Girard <email-hidden>

The tool's sample configuration file merged-branches-tags.rules recommends post-processing tags, which are just branches in SVN. That's why the configuration file above turns SVN tags into git branches.

The conversion was indeed fast: less than 1 minute.

Post-conversion observations

Invoking gitk --all in the converted repository revealed different kinds of issues:

  • svn tags as branches:

    Branches are marked with green rectangles, and tags with yellow arrows. What we have here (expected given our configuration of the tool) are branches (e.g. 5.4.7-5) corresponding to tags, and tags matching the SVN tagging commit (e.g. backups/5.4.7-5@224). We'll review and fix this.

  • merged code that did not appear as such:

    Branches that were not merged using svn merge look like they were not merged at all.

  • commits with wrong author:

    Before being in SVN, the repository was stored in CVS. When it was imported into SVN, no special attention was given to the commit author. Hence I got credited for changes I did not write.

  • obsolete branches:

The tool keeps all branches, including removed ones (with a tag at their tip), so that you can decide what to do with them.

  • missing merges:

    The branch 5.4.7-12 was never merged into the trunk!

Learning git

Based on the observations above, I realized my limited knowledge would not do to complete the conversion and clean the repository. There is a ton of documentation on git out there, and you can find a lot of links from the git documentation page. Here is what I used:

The Git Object Model

It's described with pictures here. You really need to understand this if you haven't already.

Once you do, you understand that git is built bottom-up: first the plumbing, then the porcelain. If you can't find the tool you need, it's easy to write it.

git fast-import

The Migrating to Git chapter explains how you can use the git fast-import tool to manually import anything into git.

I've used it to create tags with dates in the past, slightly changing the Custom Importer example in the book:

#!/usr/bin/env ruby
# retag.rb
# Small script to create an annotated tag, specifying committer as well as
# date, and tag comment.
# Based on Scott Chacon's "Custom Importer" example.
# Arguments:
#  $1 -- tag name
#  $2 -- sha-1 revision to tag
#  $3 -- committer in the form First Last <email>
#  $4 -- date to use in the form YYYY/MM/DD_HH:MM:SS
#  $5 -- tag comment

def help
  puts "Usage: retag <tag> <sha1sum> <committer> <date> <comment>"
  puts "Creates an annotated tag with name <tag> for commit <sha1sum>, using "
  puts "given <committer>, <date> and <comment>"
  puts "The output should be piped to git fast-import"
end

def to_date(datetime)
  (date, time) = datetime.split('_')
  (year, month, day) = date.split('/')
  (hour, minute, second) = time.split(':')
  return Time.local(year, month, day, hour, minute, second).to_i
end

def generate_tag(tag, sha1hash, committer, date, message)
  puts "tag #{tag}"
  puts "from #{sha1hash}"
  puts "tagger #{committer} #{date} +0000"
  print "data #{message.size}\n#{message}"
end

if ARGV.length != 5
  help
  exit 1
end
(tag, sha1sum, committer, date, message) = ARGV
generate_tag(tag, sha1sum, committer, to_date(date), message)

graft points

(graft means greffe in French)

Because of missing svn:mergeinfo properties, some changes appear unmerged. To fix this there are graft points: they override git's idea of a commit's parents.

To create a graft point, assuming 6a6d48814d0746fa4c9f6869bd8d5c3bc3af8242 is the commit you want to change, currently with a single parent 898ad49b61d4d8d5dc4072351037e2c8ade1ab68, but containing changes from commit 11cf74d4aa996ffed7c07157fe0780ec2224c73e:

me@mymachine$ echo 6a6d48814d0746fa4c9f6869bd8d5c3bc3af8242 11cf74d4aa996ffed7c07157fe0780ec2224c73e 898ad49b61d4d8d5dc4072351037e2c8ade1ab68 >> .git/info/grafts

git filter-branch

git filter-branch allows you to completely rewrite history of a git branch, changing or dropping commits while traversing the branch.

As an additional benefit, this tool uses graft points and makes them permanent. In other words: after running git filter-branch you can remove the .git/info/grafts file.

I've used it to rewrite the author of a given set of commits, using a hack on top of Chris Johnsen's script:



TARG_NAME="Raphael Bossek"
TARG_EMAIL="<email-hidden>"
export TARG_NAME TARG_EMAIL

filt='
    if test "$GIT_COMMIT" = 546db1966133737930350a098057c4d563b1acdf -o \
            "$GIT_COMMIT" = 23419dde50662852cfbd2edde9468beb29a9ddcc; then
        if test -n "$TARG_EMAIL"; then
            GIT_AUTHOR_EMAIL="$TARG_EMAIL"
            export GIT_AUTHOR_EMAIL
        else
            unset GIT_AUTHOR_EMAIL
        fi
        if test -n "$TARG_NAME"; then
            GIT_AUTHOR_NAME="$TARG_NAME"
            export GIT_AUTHOR_NAME
        else
            unset GIT_AUTHOR_NAME
        fi
    fi
'

# $force and $br are set earlier in Chris Johnsen's original script
git filter-branch $force --tag-name-filter cat --env-filter "$filt" -- $br

(Script edited here; there were many more commits written by Raphael.)


It's important to realize that the whole selected branch history is rewritten, so all object ids will change. You should not do this if you have already published your repository.

The --tag-name-filter cat argument ensures our tags are copied during the traversal; otherwise they would be untouched, and hence not available in the new history.


Once git filter-branch completes you get a new history, as well as a new original ref to ease comparison. It is highly recommended to check the result of the rewrite before removing original. To shrink the repository afterwards, git clone the rewritten repository using the file:// syntax -- the git-filter-branch manual page says it all.

Cleaning up the repo

To recap, here's how the ACE+TAO git repo was changed after conversion:

  1. Add graft points where needed.

  2. Clean tags and branches.

    Using git tag -d, git branch -d and the Ruby script above it was possible to recreate tags.

    During this I was also able to add missing tags, and remove some SVN errors I made -- like committing in a branch created under tags/.

  3. Remove obsolete branches.

  4. Merge missing pieces.

    There were just two missing debian/changelog entries.

    I did this before git filter-branch because I did not find a way to use the tool correctly with multiple heads.

  5. Fix commit author where needed.

    Using the shell script above Raphael is now correctly credited for his work.

That's it.

The ACE+TAO git repository for Debian packages is alive.

[2] available in Debian as svn-all-fast-export

Playing with C++0x -- lambdas

C++0x comes with a wealth of new features[2]...

Today I'm having a look at lambdas. Lambdas are available in g++ 4.5, which for now is only available in experimental.

What are lambdas?

You can think of a lambda as something akin to a struct with operator(). But it's more than that: it's closer to a closure, something Scheme fans are very familiar with: the closure code block captures its outer environment in its body.

Closures are a very powerful tool that is being retrofitted into many languages (for instance Java). They can be used as building blocks for many useful programming idioms such as continuations.

Enough for theory, let's have a look at this new beast.

What do they look like?

  [](int i) { return i + 1; };

This is the increment lambda. You can declare a lambda in any function or method. Written like this, the lambda compiles[1] but it's not very useful: it's an anonymous lambda that is neither stored nor used.

How to read this? The square brackets open the declaration of a lambda. Then you declare the lambda parameters and its body. Nothing surprising so far except for the declaration opening.

Using a lambda

To use it, add arguments; for instance:
  [](int i) { return i + 1; }(0);
would compute the value 1.

Storing a lambda

The type of a lambda is automatically deduced. You can hint its return type if you need to. To store it you need another new C++0x keyword, auto, which was borrowed back from C:
  auto inc = [](int i) { return i + 1; };
  std::cout << inc(0) << std::endl;
Notice that calling a named lambda is not different from calling a function. If you need to hint the lambda return type you can use ->:
  auto mult = [](int x, double y) -> double { return x * y; };

Capturing environment

You can use lambdas to capture the outer environment. A more complex example, which does not work:
  void f() {
    int x = 5;
    [](int w) { return w + x; }(0);
  }
This lambda declaration + invocation does not compile, because the x variable we're referring to is declared in the outer scope. To enable capture of the outer scope, we can rewrite the previous example like this:
  void f() {
    int x = 5;
    [=](int w) { return w + x; }(0);
  }
The = sign in the square brackets (which are really the capture clause) means we copy every variable from the outer scope. Hence we can access x from our anonymous lambda. Now what if you want to change x?
  void f() {
    int x = 5;
    [=](int w) { return w + ++x; }(0);
  }
This does not compile: the environment is read-only by default. If you really want to increment x then you need to write:
  void f() {
    int x = 5;
    [=](int w) mutable { return w + ++x; }(0);
  }
Yes, mutable. The value returned from the lambda is what you would expect. But guess what? x's value is not changed when the lambda returns. Indeed, we've copied outer scope variables by value. To really affect the outer scope you need to change the capture clause:
  void f() {
    int x = 5;
    [&](int w) { return w + ++x; }(0);
  }
We use the ampersand to capture the environment by reference, and thus we can change x's value. We could also be more explicit about what we want to capture:
struct A {
    int z;

    A() : z(0) {}

    void f() {
      int x = 5;
      [this, &x](int w) { z = w + ++x; }(0);
      std::cout << "x: " << x << ", z: " << z << std::endl;
    }
};
The lambda in f() explicitly lists the entities it captures: this, because we change this->z, and the local variable x, captured by reference so the increment is visible outside.

Complete example

It's easier to write functional-style code. As Sarah just said: 10 years from now everyone will realize how powerful functional programming is, then we'll be the masters of the universe!
#include <algorithm>
#include <iostream>
#include <vector>

int main(int argc, char* argv[]) {
  std::vector<int> v = { 1, 2, 3 };
  // Show then increment
  std::for_each(v.begin(), v.end(), [](int& e) { std::cout << e++ << std::endl; });
  // Show content now
  std::for_each(v.begin(), v.end(), [](int e) { std::cout << e << std::endl; });

  return 0;
}

[1] Use g++-4.5 --std=c++0x to activate C++0x mode
[2] More complete g++ C++0x feature list available here


Playing with C++0x -- tuple

C++0x is approaching, and it comes with a lot of new stuff, so I've decided to play with it.

Some C++0x features are already available in g++ 4.3. Others will be available in the 4.4 release -- see this page for a summary of C++0x language support in g++.

What is a tuple? It's a std::pair on steroids: it can contain any number of elements. For the impatient, the whole sample program is available from here. It can be compiled with: g++-4.3 --std=c++0x tuple.cpp -o tuple.

Now let's see how to create a tuple:

#include <string>
#include <tuple>

typedef std::tuple<int, double, std::string> my_tuple_t;
my_tuple_t a_tuple(1, 5.0);

Every element of the tuple gets initialized to its default value if it's not specified. Indeed displaying the third element:

std::cout << std::get<2>(a_tuple) << std::endl;
outputs nothing, because the string element was default-constructed to an empty string. The std::make_tuple function, like std::make_pair for std::pair, can be used to create a tuple using type deduction:
std::cout << std::get<1>(std::make_tuple(false, std::string("see"))) << std::endl;

A tuple element can be changed using the same get function:

std::get<2>(a_tuple) = "a string";

Of course, a tuple is type safe. Trying to put a string at the first element:

std::get<0>(a_tuple) = "another string";
does not compile. But there's more: you also get a compilation error when trying to access an element that does not exist. For example trying to display fourth element does not compile:
std::cout << std::get<3>(a_tuple) << std::endl;

What else can you do with tuples? You can test them for equality, or assign them:

my_tuple_t another_tuple = std::make_tuple(2, 10.0, std::string("nothing"));
std::cout << (a_tuple == another_tuple) << std::endl;
another_tuple = a_tuple;
std::cout << (a_tuple == another_tuple) << std::endl;

You can also retrieve multiple elements at once, possibly ignoring some of them:

std::string a_string;
std::tie(std::ignore, std::ignore, a_string) = another_tuple;

Last, you can concatenate two tuples into a new tuple:

std::tuple<int, double, std::string, bool, std::string> cat =
    std::tuple_cat(a_tuple, std::make_tuple(false, std::string("see")));

Seaside on Debian

GNU Smalltalk 3.1 rc1 includes Seaside, a web application framework based on continuations.

To use it, you'll need to install package gnu-smalltalk 3.1~rc1-2 from experimental. Then, from a terminal, launch gst and type[1]:

st> PackageLoader fileInPackage: 'Seaside'
st> PackageLoader fileInPackage: 'Seaside-Development'
st> PackageLoader fileInPackage: 'Seaside-Examples'
st> ObjectMemory snapshot: ''
Now you can start Seaside with gst-remote -I --daemon --start=Seaside, and point your browser at http://localhost:8080/seaside:
Seaside counter sample application

[1] taken from GNU Smalltalk info page

dwarves uploaded

We've just uploaded Dwarves.

It is a set of tools that use the DWARF information inserted into ELF binaries compiled with debugging symbols. See this paper for details on the tool set.

Included in the package:

  • pahole: finds alignment holes in structs and classes in languages such as C/C++, taking CPU cacheline alignment into account. Helps repack those structures to achieve more cache hits.
  • codiff: a diff-like tool to compare the effects that changes in source code have on the resulting binaries
  • pfunct: displays all sorts of information about functions, inlines, etc.
  • pdwtags: pretty-prints DWARF information
  • pglobal: lists global symbols
  • prefcnt: counts DWARF tags usage
  • syscse: system call sign extender
  • dtagnames: lists tag names

Until the package reaches unstable, you can grab them from here.


Fun with templates

At work we had to use two C libraries defining structures with different sizes but the same name. Function names were different though, so we believed no symbol clash would occur.

We thought if we used the headers from each library in a different compilation unit we'd be safe.

But we were wrong. The code compiled fine but crashed at runtime. Why? Because we used templates within these compilation units, and this generated weak symbols with the very same name. The linker then discarded duplicate symbols, and since the structure sizes were different we got a crash.

Let's reproduce this on a small example:

/* A.h */
#ifndef A_H
#define A_H

struct Data { int an_int; };

#endif /* A_H */

/* B.h */
#ifndef B_H
#define B_H

struct Data { char a_buffer[65]; };

#endif /* B_H */

/* UsingA.cpp */
#include "A.h"
#include <deque>

static std::deque<Data> a_datas;

std::size_t get_a_size() { return a_datas.size(); }

/* UsingB.cpp */
#include "B.h"
#include <deque>

static std::deque<Data> b_datas;

std::size_t get_b_size() { return b_datas.size(); }
/* Main.cpp */
int main(int argc, char* argv[]) {  return 0; }
We compile these files:
$ g++ -c -I . UsingA.cpp UsingB.cpp Main.cpp
$ g++ -o Main UsingA.o UsingB.o Main.o
and inspect what got generated in UsingA.o, choosing the call to size() to reduce the output:
$ nm -C UsingA.o | grep '::size()'
0000000000000000 W std::deque<Data, std::allocator<Data> >::size() const
Likewise in UsingB.o:
$ nm -C UsingB.o | grep '::size()'
0000000000000000 W std::deque<Data, std::allocator<Data> >::size() const
g++ has generated code for the deque we're using. That's how templates work: they get "expanded" in the compilation unit using them. So far so good.

Now if we look in Main:

$ nm -C Main | grep '::size()'
0000000000400a64 W std::deque<Data, std::allocator<Data> >::size() const
we see there's only a single symbol in the resulting binary.

Because the symbols are weak (see the W in nm output), g++ silently discards one version, say the version in UsingB.o. (If they were not weak, link would fail because of duplicate symbols. And every C++ program would fail to link.) Therefore any call on b_datas is likely to fail: the deque code called will not use the right definition of Data.

Lesson learned: avoid data structures with the same name.