Mzgtk2 Memory Management

On mzgtk2 memory management

Figure 1 – Sketch of mzgtk2 memory management

It’s always a challenge to connect two different worlds to one another. There are interesting problems to be solved and different ways of thinking to overcome. This is true in real life, and it also holds when scheme has to interact with a C(++) library.

Streamlining this interaction is a requirement for any binding of C code to scheme. Connecting the scheme memory model to that of a widget set may be one of the more challenging of these interactions. For mzgtk2, the garbage-collected memory model of mzscheme and the reference-counted memory model of Gtk+2.x need to be connected. This report presents a solution for connecting these two memory models.


Goal

The main goal of the memory management used by mzgtk2 is to minimize memory leaks, ideally reducing them to zero, and, at all costs, to prevent program crashes caused by premature garbage collection.


Gtk aspects related to memory management

There are a number of aspects to be aware of when connecting Gtk to mzscheme.

First of all, mzscheme is garbage collected. Essentially, this means that all memory that is no longer referenced from any variable in the mzscheme scope will be removed from memory. As a result, a scheme object that is associated with a Gtk object can be collected, leaving behind an orphaned Gtk object that will never be freed.

Another situation arises when a scheme closure is connected to a Gtk signal. The closure is used as a callback when the connected signal fires. It is often tempting, if not pragmatic, to use a lambda form as the callback closure. Gtk keeps using this lambda form long after it has been handed over, but once nothing in scheme references it anymore, the garbage collector is free to collect it, possibly long before the signal fires.

While the first situation results in memory leakage, the second most often leads to a crash, which is worse.

Code sample 1 presents debugging information for an interactive mzgtk2 session in which the Gtk C interface is used directly from scheme (i.e., not through the mzgtk2 object-oriented layer, which handles things differently). The sample shows that the Gtk entry box created in a scheme let form is collected from scheme memory (in this sample, the garbage collector was invoked directly after creating the entry box). Because the entry box has been added to the Gtk window with gtk-container-add, it is not removed from Gtk memory.


mzgtk2 memory management strategy

Memory management in mzgtk2 is largely handled during type coercion between mzscheme and Gtk. Type coercions are implemented using SWIG type maps.

Copying coercions

In general, type mapping is done using a copying typemap from mzscheme to Gtk types and back. With this kind of type mapping, no memory is shared between mzscheme and Gtk. These maps follow a few rules. For example, a const gchar* returned from a Gtk function is copied to mzscheme. A gchar* returned from a Gtk function is copied to mzscheme and then freed, because Gtk requires the caller to free non-const results. For a const gchar* argument, Gtk is expected to make its own internal copy of the string, so a UTF-8 mzscheme string can be handed directly to such a function.
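A minimal sketch of the two result rules, written in plain C with the standard library instead of GLib (malloc/free stand in for g_malloc/g_free). The SchemeString struct and the coerce_* names are hypothetical stand-ins, not mzgtk2 or mzscheme API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mzscheme string: it owns a private copy. */
typedef struct {
    char *utf8;
} SchemeString;

static SchemeString make_scheme_string(const char *s)
{
    size_t n = strlen(s) + 1;
    SchemeString r = { malloc(n) };
    assert(r.utf8 != NULL);
    memcpy(r.utf8, s, n);
    return r;
}

/* const gchar* result: Gtk keeps ownership, so the typemap only copies. */
static SchemeString coerce_const_result(const char *result)
{
    return make_scheme_string(result);
}

/* gchar* result: the caller owns it, so the typemap copies and then frees it
   (g_free() in real Gtk code). */
static SchemeString coerce_owned_result(char *result)
{
    SchemeString r = make_scheme_string(result);
    free(result);
    return r;
}
```

In both cases scheme ends up with its own copy and nothing is shared, which is what keeps this style of coercion free of leaks.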

Code sample 1 – garbage collection of a Gtk object
 > (define a (gtk-window-new 'toplevel))
 creating gobject 03093230 gtk-window-new first time refcount=2

 > (let ((e (gtk-entry-new))) (gtk-container-add a e))
 creating gobject 0309FD38 gtk-entry-new first time refcount=1

 > (collect-garbage)
 Finalizing gobject from scheme 0309FD38 refcount=2 function=gtk-entry-new
 Removing guard from gobject for function 0309FD38,2,gtk-entry-new
 added to freelist: obj=0309FD38,2 from gtk-entry-new

 > (gtk-widget-show-all a)
 objects on freelist:
  obj=0309FD38,2,gtk-entry-new
 performing unref obj=0309FD38,2,i=1,c=2,gtk-entry-new
 unref for object 0309FD38,1,c=1 has been

When a Gtk function returns a G(S)List*, it is coerced into a scheme list, after which the G(S)List* is normally freed.

This copying style of type coercion results in very stable program execution. Memory is allocated and freed within one and the same coercion, so no memory leaks are introduced.
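The list rule can be sketched the same way. This assumes a hypothetical Node type in place of GSList and a plain vector in place of a scheme list; only the list cells are freed, while the elements are handed on (in mzgtk2 each element would itself be coerced):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GSList cell; real code would use GSList and g_slist_free(). */
typedef struct Node {
    void *data;
    struct Node *next;
} Node;

/* Hypothetical stand-in for a scheme list: a vector of elements. */
typedef struct {
    void **items;
    size_t length;
} SchemeList;

static Node *list_prepend(Node *list, void *data)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->data = data;
    n->next = list;
    return n;
}

/* Copy the elements into a scheme-side list, then free the list cells.
   The elements themselves are not freed here. */
static SchemeList coerce_list_result(Node *list)
{
    SchemeList r = { NULL, 0 };
    for (Node *n = list; n != NULL; n = n->next)
        r.length++;
    if (r.length > 0) {
        r.items = malloc(r.length * sizeof *r.items);
        assert(r.items != NULL);
    }
    size_t i = 0;
    for (Node *n = list; n != NULL; n = n->next)
        r.items[i++] = n->data;   /* mzgtk2 would coerce each element here */
    while (list != NULL) {
        Node *next = list->next;
        free(list);
        list = next;
    }
    return r;
}
```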

Coercions for variables living in mzscheme memory

A second type of coercion is used when a Gtk C object has to be associated with a scheme object and it is certain that, when the scheme object goes out of scope, the associated C object can be collected along with it, because it is of no further use. This type of memory management is usually required for Gtk structure types, such as iterators, GdkColor, etc.

In this variant nothing is copied. Because it is certain that no Gtk object will ever keep a reference to the object's memory, it is safe to hold such objects in scheme memory. If they are collected, no Gtk object will notice.
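A minimal sketch of this variant, assuming a hypothetical SchemeColor wrapper that embeds a simplified GdkColor-like struct. The Gtk-like function only reads the struct during the call, so freeing the wrapper (as the collector would) is invisible to Gtk:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a small Gtk structure type such as GdkColor (field layout simplified). */
typedef struct {
    unsigned short red, green, blue;
} Color;

/* Hypothetical scheme-side wrapper: the C struct lives inside the scheme object,
   so it is allocated and collected together with it. Nothing is copied. */
typedef struct {
    Color value;
} SchemeColor;

static SchemeColor *scheme_make_color(unsigned short r, unsigned short g, unsigned short b)
{
    SchemeColor *s = malloc(sizeof *s);
    assert(s != NULL);
    s->value = (Color){ r, g, b };
    return s;
}

/* A Gtk-like function that only reads the struct during the call and keeps no pointer. */
static unsigned gtk_like_average(const Color *c)
{
    return (c->red + c->green + c->blue) / 3u;
}

/* Called when the garbage collector collects the wrapper; no Gtk object notices. */
static void scheme_collect_color(SchemeColor *s)
{
    free(s);
}
```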

Coercions for variables living in both mzscheme and Gtk memory

This last type of coercion is the most interesting and the most difficult one; the other two are more or less straightforward. Here the two worlds really have to be brought together, and what is needed is a heuristic that achieves the main goal stated earlier.

A practical heuristic

To work out a usable heuristic, we need an idea of which aspects of the “living in two worlds” problem must be accounted for:

  • Signals can be connected to GtkWidgets, which are derived from GObjects. Connected signals are called back from Gtk, and scheme closures are associated with them. The main problem is that these scheme closures must stay alive as long as their association with the signal exists.
  • Scheme objects must hold a reference to a GObject; however, the GObject's reference count must still be able to drop to 0 in the normal way, i.e., Gtk must be able to free a GObject that is no longer needed. So references to GObjects cannot be created carelessly.
  • On the other hand, it is better to have too many references than too few, because the latter often leads to program crashes.

Code sample 2 – a signal cycle
 (define w (gtk-window))

 (let ((e (gtk-entry))
       (b (gtk-button 'label "_Quit"
                      'closure (lambda (window)
                                  (-> window destroy) (gtk-main-quit))))
       (v (gtk-vbox 'widgets e b)))
    (-> w add v))

 (-> w show-all)
 (gtk-main)

The main assumption here is that the only possible cycle from scheme to Gtk and back to (another location in) scheme runs through callback closures connected to Gtk signals, as illustrated in Figure 1. In practice, this heuristic seems to be sufficient.

Such a signal cycle can be quite long. For example, the code in Code sample 2 results in the situation sketched in Figure 2.

Not handled by this heuristic

It is, of course, possible to attach data to a GObject that points to some scheme object, but mzgtk2 does not handle this; it should not be necessary to go that deep into the GObject structure. If scheme data (other than callback closures) needs to be associated with GObjects, this is possible using the associate-data member of the ROOS class gobject. This member function associates the data with the ROOS object, not directly with the GObject in C.


Implementation of mzgtk2 memory management

Once it is clear how to do it, implementing memory management for objects that exist both in the garbage-collected memory of scheme and in Gtk is not as difficult as one may think.

Connecting GObjects

First, callbacks like the one in Code sample 2 must not be collected by the scheme garbage collector as long as they live in the Gtk context. mzgtk2 achieves this by keeping a btree in scheme memory that references all callback closures connected to any GObject (actually, any GtkWidget or GtkObject). As long as a closure is in this btree, it will never be collected.
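The role of the btree can be sketched with a flat array in its place (an assumption for brevity; a btree gives logarithmic lookup). Every name here is hypothetical. A closure listed in the registry counts as reachable from a scheme root, and removing all entries for an object is what later makes its closures collectable:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registry standing in for mzgtk2's btree of callback closures. */
#define MAX_CALLBACKS 64

typedef struct {
    const void *object;        /* the GtkObject the signal is connected to */
    unsigned long handler_id;  /* id of the signal connection */
    void *closure;             /* the scheme closure (opaque here) */
} CallbackEntry;

static CallbackEntry registry[MAX_CALLBACKS];
static size_t registry_len = 0;

static void registry_add(const void *object, unsigned long handler_id, void *closure)
{
    assert(registry_len < MAX_CALLBACKS);
    registry[registry_len++] = (CallbackEntry){ object, handler_id, closure };
}

/* Drop every closure connected to `object`; returns how many were removed. */
static size_t registry_remove_object(const void *object)
{
    size_t kept = 0, removed = 0;
    for (size_t i = 0; i < registry_len; i++) {
        if (registry[i].object == object)
            removed++;
        else
            registry[kept++] = registry[i];
    }
    registry_len = kept;
    return removed;
}

/* True while the closure is rooted, i.e. protected from the collector. */
static int registry_holds(const void *closure)
{
    for (size_t i = 0; i < registry_len; i++)
        if (registry[i].closure == closure)
            return 1;
    return 0;
}
```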

Figure 2 – Sketch of a signal cycle from the code in code sample 2

Next, a GObject must stay referenced from scheme while some scheme object refers to it, and must be dereferenced once all those scheme objects have been cleaned up. This is done using a proxy, shown in Figure 1. The proxy connects the scheme objects and the GObject to one another: the scheme objects point to the proxy, and the GObject points back to the proxy.
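A sketch of the two links, assuming (as the "first time" remark in the log of Code sample 1 suggests, but does not confirm) that the GObject-to-proxy pointer is also used to find an existing proxy, so that all scheme objects for one GObject share a single proxy. MockGObject, MockProxy and wrap_gobject are hypothetical; reference counting is deliberately left out:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct MockProxy MockProxy;

/* Stand-in for a GObject; `guard` plays the role of the proxy pointer that
   Code sample 4 attaches with g_object_set_data_full(). */
typedef struct {
    MockProxy *guard;
} MockGObject;

/* The proxy: every scheme object for this GObject points here. */
struct MockProxy {
    MockGObject *gobject;   /* becomes NULL if Gtk finalizes the GObject first */
};

/* Return the proxy for g, creating it only the first time g reaches scheme.
   Reference counting is left out here; Code sample 3 decides that part. */
static MockProxy *wrap_gobject(MockGObject *g)
{
    if (g->guard != NULL)
        return g->guard;
    MockProxy *p = malloc(sizeof *p);
    assert(p != NULL);
    p->gobject = g;
    g->guard = p;
    return p;
}
```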

The code in Code sample 3 is used in mzgtk2 to keep objects referenced from scheme. This little piece of code makes some assumptions about Gtk behaviour.

First, if a GObject derives from GtkObject, it is floating when it is first created. The Gtk team implemented this ‘floating’ behaviour so that a GtkWidget can be added to a container, which then takes ownership of the widget while it is still floating. Instead of leaving the reference floating, mzgtk2 takes ownership immediately, so that a container adds its own reference to the GObject instead of taking ownership. Essentially, mzgtk2 becomes the owner of all GtkObjects, and the finalizers attached to the scheme memory allocations for GObjects take care of the (de)referencing.

If a GObject is not a GtkObject and its reference count equals 1, mzgtk2 again takes ownership, this time by not increasing the reference count. This is the trickiest part, because it assumes that every Gtk GObject that contains another GObject has increased the reference count of the contained GObject. Under this assumption, mzgtk2 can clean up all GObjects when their scheme proxies are collected: it only needs to dereference GObjects on finalization.

In all other cases, i.e., when the reference count is greater than 1 or a GtkObject is not floating (which is probably the same thing), mzgtk2 only increases the reference count of the associated GObject, which makes mzgtk2 a co-owner of the GObject.
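The three cases can be followed on a mock object (hypothetical, not the GObject API) that carries only the state Code sample 3 inspects. mock_sink assumes the Gtk 2 semantics of gtk_object_sink(): drop the floating reference if there is one. The resulting counts for a fresh entry (1) and a toplevel window (2) appear consistent with the refcounts printed in Code sample 1:

```c
#include <assert.h>
#include <stdbool.h>

/* Mock object: just enough state to follow the decision in Code sample 3. */
typedef struct {
    bool is_gtk_object;   /* GTK_IS_OBJECT(g) */
    bool floating;        /* GTK_OBJECT_FLOATING(g) */
    int ref_count;        /* gobject_ref_count(g) */
} MockObject;

static void mock_ref(MockObject *g)
{
    g->ref_count++;
}

/* Gtk 2 semantics of gtk_object_sink(): drop the floating reference, if any. */
static void mock_sink(MockObject *g)
{
    if (g->floating) {
        g->floating = false;
        assert(g->ref_count > 0);
        g->ref_count--;
    }
}

/* The ownership decision of Code sample 3, applied to the mock. */
static void mzgtk2_take_reference(MockObject *g)
{
    if (g->is_gtk_object) {
        mock_ref(g);
        if (g->floating)
            mock_sink(g);   /* net effect: the floating reference becomes mzgtk2's */
    } else if (g->ref_count > 1) {
        mock_ref(g);        /* other owners exist: share ownership */
    }
    /* else: plain GObject with ref_count == 1; mzgtk2 takes over that reference */
}
```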

The containment assumption falsified

The main assumption in the used heuristic for GObject handling is that Gtk does its reference counting well. I trust it generally does.

However, there are cases where the assumption does not hold: those in which some GObject contains another GObject and is also its first and only owner. In these cases, the reference count of the contained GObject is 1, yet mzgtk2 still takes ownership by not increasing the reference count. When the scheme proxy is later finalized, mzgtk2 drops a reference it never added, and the contained GObject is freed while its owner still uses it.

There are two solutions to this problem:

  1. Functions returning contained GObjects can be declared to return type CGObject. The CGObject typemap can then increase the reference count regardless of its current value.
  2. There could be a decision procedure to determine whether a GObject is contained. If so, the default typemapper can be made more intelligent. I doubt this is feasible.

Note that, for the second option, a decision procedure does exist for GtkObjects: contained GtkObjects won’t be floating.

This situation must be handled when the contained GObject is not explicitly added to the containing GObject but is created together with it, as is the case, e.g., with gtk-image-new-from-file.

I suspect that GtkImage may be the only GObject with this problem.
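The failure and the first solution can be reproduced on a hypothetical mock of a plain GObject. wrap_default follows the refcount-1 rule, wrap_contained models the CGObject typemap that always adds a reference, and scheme_finalize is the unref done when the proxy is collected:

```c
#include <assert.h>
#include <stdbool.h>

/* Mock plain GObject (not a GtkObject); illustration only. */
typedef struct {
    int ref_count;
    bool freed;
} MockGObject;

static void mock_unref(MockGObject *g)
{
    assert(g->ref_count > 0);
    if (--g->ref_count == 0)
        g->freed = true;
}

/* Default rule for plain GObjects: a refcount of 1 means "take ownership". */
static void wrap_default(MockGObject *g)
{
    if (g->ref_count > 1)
        g->ref_count++;
}

/* Solution 1: a CGObject-style typemap that always adds a reference. */
static void wrap_contained(MockGObject *g)
{
    g->ref_count++;
}

/* What happens when the scheme proxy is finalized. */
static void scheme_finalize(MockGObject *g)
{
    mock_unref(g);
}
```

With a contained object whose only reference belongs to its container, the default rule frees it prematurely, while the CGObject rule leaves the container's reference intact.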

Introducing a type directive for functions: mzgtk2_is_not_owner

There is a decision procedure to determine whether a GObject is contained, i.e., owned by some other GObject. But it can be shown that it would need at least something as powerful as a garbage collector, or something written specifically against the internals of Gtk/GObject, and I want neither.

So let’s introduce the mzgtk2_is_not_owner directive. With this type directive, the mzgtk2 typemapper can be told that a function returns a GObject that is already owned by some other object. Knowing this, the typemapper can always increase the reference count of this GObject, instead of taking over ownership when the reference count equals 1.

mzgtk2_is_not_owner is not the way to go; let’s do mzgtk2_is_first_owner

The directive from the previous section may not be very workable: if a function is missed, bugs that are difficult to track down will arise, because GObjects will be deallocated prematurely. It is better to turn it around.

So I introduce the mzgtk2_is_first_owner directive. This directive tells mzgtk2 that it will be the first owner of an object whose type derives directly from GObject. It is not needed for types derived from GtkObject, but it must be used for every newly allocated object of a type that derives from GObject.

With this typemap scheme, the code in code sample 3 changes to the code in code sample 5.

The proposed typemap scheme is in line with the goal of mzgtk2 memory management. GObjects may leak, because mzgtk2 may not take ownership of them where it should, but a GObject will never be deallocated prematurely (unless, of course, a function returning ‘TYPE’ is wrongly declared to return ‘mzgtk2_is_first_owner_TYPE’).
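A minimal sketch of how the ownership decision could look once the typemap knows about mzgtk2_is_first_owner. This is an assumption about the shape of the rule, not the author's code sample 5; the mock object and the `first_owner` flag (true when the return type was declared as mzgtk2_is_first_owner_TYPE) are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Mock object: only the state the ownership decision needs. */
typedef struct {
    bool is_gtk_object;
    bool floating;
    int ref_count;
} MockObject;

static void take_reference(MockObject *g, bool first_owner)
{
    if (g->is_gtk_object) {
        g->ref_count++;              /* g_object_ref() */
        if (g->floating) {           /* gtk_object_sink(): drop the floating ref */
            g->floating = false;
            assert(g->ref_count > 0);
            g->ref_count--;
        }
    } else if (!first_owner) {
        g->ref_count++;              /* possibly owned elsewhere: always share */
    }
    /* else: a freshly created GObject; mzgtk2 takes over its single reference */
}
```

Forgetting the directive on a constructor now only costs one extra reference (a leak), which is the trade-off the goal asks for.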

A procedure for adding new Gtk header files to mzgtk2

How can new Gtk header files be added to mzgtk2? The following steps need to be followed:

  1. First, add the header file to the set of header files, giving it an extension ‘.i.h’.
  2. Next put it in the Makefile of mzgtk2, along with the other header files.
  3. Add an %include for the new ‘.i.h’ header file in mzgtk2.i.h.
  4. If new types T[i] are wrapped and derived from GObject, put these types in ‘mzgtk2_typemaps.i.h’, using mzgtk2_gobject(T[i])
  5. Remove all type sections, except enumerations, by enclosing them with #ifndef SWIG … #endif /*swig*/
  6. Remove all private functions (starting with an underscore ‘_’), by enclosing them with #ifndef SWIG … #endif /*swig*/
  7. Examine all functions of the header file. If they create a new object of type ‘TYPE’ derived from GObject and hand it to the caller in any way, the new object must be of type ‘mzgtk2_is_first_owner_TYPE’.
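Steps 5 to 7 can be illustrated on a small, hypothetical header (MyThing and its functions are invented for the example). The typedef that gives mzgtk2_is_first_owner_MyThing a meaning in plain C is an assumption made only so that the fragment compiles; in mzgtk2, that name is interpreted by the SWIG typemaps:

```c
/* mything.i.h: hypothetical header prepared as in steps 5 to 7. */

/* Enumerations stay visible to SWIG (step 5). */
typedef enum {
    MY_THING_SMALL,
    MY_THING_LARGE
} MyThingSize;

typedef struct _MyThing MyThing;

#ifndef SWIG
/* Type section, hidden from SWIG (step 5). */
struct _MyThing {
    int size;
};
#endif /*swig*/

/* Assumption, only so that this fragment compiles as plain C. */
typedef MyThing mzgtk2_is_first_owner_MyThing;

/* Step 7: a constructor hands a new GObject to the caller. */
mzgtk2_is_first_owner_MyThing *my_thing_new(MyThingSize size);

/* A getter returning an object owned elsewhere keeps the plain type. */
MyThing *my_thing_get_parent(MyThing *thing);

#ifndef SWIG
/* Private function, hidden from SWIG (step 6). */
void _my_thing_update(MyThing *thing);
#endif /*swig*/
```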

Some implementation details

To work, the code in Code sample 3 needs to know whether the number of outstanding references to the GObject is greater than 1. mzgtk2 implements this with the function int gobject_ref_count(GObject *obj), which reads private data of the GObject. It could also have been implemented with a function that returns true if the number of outstanding references to a GObject equals 1 (see Toggle references), but the function as it stands is very helpful for debugging.

Code sample 3 – code for referencing gobjects
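 if (GTK_IS_OBJECT(g)) {
   if (GTK_OBJECT_FLOATING(g)) {
      /* fresh GtkObject: turn its floating reference into mzgtk2's own */
      g_object_ref(g);
      gtk_object_sink(g);
   }
   else {
      /* already owned, e.g. by a container: add mzgtk2's reference */
      g_object_ref(g);
   }
 }
 else if (gobject_ref_count(g)>1) {
   /* plain GObject with other owners: share ownership */
   g_object_ref(g);
 }
 /* else: plain GObject with refcount 1, mzgtk2 takes over that reference */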

Finalization from Gtk

The pointer from a GObject to the proxy is used when the GObject is finalized from Gtk before the scheme proxy is finalized. When the GObject is finalized, a notification function is called with the proxy as data. This notification function calls a scheme callback function that can be attached to the proxy (currently used to invalidate ROOS objects) and cleans up the proxy, so that, from the scheme point of view, it is clear that this GObject has been invalidated.
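A sketch of the Gtk-side notification, assuming a hypothetical proxy layout with a callback slot. gobject_gtk_finalized has the shape of a GDestroyNotify (one data pointer), which is what g_object_set_data_full() in Code sample 4 expects; count_invalidation is a made-up example callback standing in for the ROOS invalidation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical proxy layout; mzgtk2's real struct may differ. */
typedef struct {
    void *gobject;                          /* NULL once Gtk has finalized it */
    void (*on_invalidate)(void *user_data); /* scheme callback, e.g. invalidate ROOS objects */
    void *user_data;
} Proxy;

/* GDestroyNotify-shaped: called by Gtk with the proxy as data when the GObject dies. */
static void gobject_gtk_finalized(void *data)
{
    Proxy *p = data;
    assert(p->gobject != NULL);        /* a GObject is finalized only once */
    p->gobject = NULL;                 /* scheme must not touch the GObject anymore */
    if (p->on_invalidate != NULL)
        p->on_invalidate(p->user_data);
}

static int proxy_is_valid(const Proxy *p)
{
    return p->gobject != NULL;
}

/* Example callback: counts invalidations. */
static void count_invalidation(void *user_data)
{
    (*(int *)user_data)++;
}
```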

Finalization from scheme

From the scheme point of view, a finalizer is attached to the proxy. When no scheme object references the proxy anymore, the garbage collector collects it and, while doing so, calls this finalizer. The finalizer does a couple of small things. First, if a valid pointer to an associated GObject exists (i.e., the GObject has not been finalized from Gtk), it steals the pointer from the associated GObject to the proxy (after collection, the proxy is of no use to Gtk anymore). Next, it adds the associated GObject to a list of GObjects to be dereferenced the next time any GObject is referenced. This is done instead of dereferencing the GObject immediately from within garbage collection, because that causes crashes when the dereference results in finalization from Gtk.
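The deferred dereferencing can be sketched with a hypothetical mock and a fixed-size freelist. The finalizer only detaches and queues; drain_freelist stands for the unrefs done at the next GObject operation. The counts mirror the "added to freelist" and "performing unref" lines of Code sample 1 (2 references at collection time, 1 afterwards):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct MockProxy MockProxy;

typedef struct {
    int ref_count;
    bool finalized;
    MockProxy *guard;      /* back-pointer to the proxy (the "%mzgtk2_guard" data) */
} MockGObject;

struct MockProxy {
    MockGObject *gobject;  /* NULL if Gtk finalized the object first */
};

#define FREELIST_MAX 32
static MockGObject *freelist[FREELIST_MAX];
static size_t freelist_len = 0;

static void mock_unref(MockGObject *g)
{
    assert(g->ref_count > 0);
    if (--g->ref_count == 0)
        g->finalized = true;   /* in Gtk this may run arbitrary finalization code */
}

/* Runs inside the garbage collector: it only detaches the proxy and queues the unref. */
static void scheme_finalize_proxy(MockProxy *p)
{
    MockGObject *g = p->gobject;
    if (g == NULL)
        return;                /* Gtk already finalized it; nothing to release */
    g->guard = NULL;           /* steal the GObject -> proxy pointer */
    p->gobject = NULL;
    assert(freelist_len < FREELIST_MAX);
    freelist[freelist_len++] = g;
}

/* Called at the next operation on any GObject, outside the collector. */
static void drain_freelist(void)
{
    while (freelist_len > 0)
        mock_unref(freelist[--freelist_len]);
}
```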

Code sample 4 – finalizer handling
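 if (GTK_IS_OBJECT(g)) {
   /* cycle breaking: on "destroy", disconnect the widget's scheme callbacks */
   g_signal_connect(G_OBJECT(g),
                    "destroy",
                    G_CALLBACK(destroyHandler),g);
 }
 /* GObject -> proxy link; gobject_gtk_finalizer runs when g is finalized,
    unless the pointer has been stolen first */
 g_object_set_data_full(g,
                        "%mzgtk2_guard",
                        p,
                        gobject_gtk_finalizer);
 /* proxy finalizer; runs when the scheme garbage collector collects p */
 scheme_register_finalizer(p,
                           gobject_scheme_finalizer,
                           NULL,NULL,NULL);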

Breaking cycles

Figure 2 sketches a situation in which callback closures are used with widgets: a Gtk button refers back to a callback function. As mentioned before, the connected closure is referenced from a central btree in scheme to keep it from being collected. If no measures are taken, calling (-> w destroy) destroys the GtkWidgets. However, because the closure is kept in the central btree, the scheme objects associated with the closure will never be collected. These associated scheme objects include at least the widgets involved in the signal connection. Because of the reference counting used (on finalization of the scheme proxy, the GObject is dereferenced), the Gtk GObjects will never be finalized either, which results in a big memory leak.

To prevent this, a destroy handler is connected to the “destroy” signal of each GtkObject (i.e., GtkWidget); see Code sample 4. When a widget is destroyed, Gtk calls the associated destroy handler, which in turn breaks all signal connections to the widget. All references from the btree to the associated scheme closures are then removed, and the GtkWidget becomes collectable. Because it has been destroyed, it is of no use to the Gtk environment either, so when the associated scheme objects go out of scope in scheme, they will indeed be collected.
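The cycle and its breaking can be modelled with three hypothetical mock structs: a widget with a Gtk refcount, a proxy holding one of its references, and a closure that captures the proxy and is rooted by the registry. collect_garbage is a deliberately tiny stand-in for the collector that assumes the proxy is reachable only through rooted closures:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int ref_count;     /* Gtk-side reference count */
    bool destroyed;
    bool finalized;
} MockWidget;

typedef struct {
    MockWidget *widget;   /* proxy -> GObject, holding one reference */
    bool reachable;       /* still alive on the scheme side? */
} MockProxy;

typedef struct {
    MockProxy *captured;  /* the closure refers to the widget's proxy */
    bool in_registry;     /* rooted by the central callback registry */
} MockClosure;

static void widget_unref(MockWidget *w)
{
    assert(w->ref_count > 0);
    if (--w->ref_count == 0)
        w->finalized = true;
}

/* Destroy handler: disconnect the widget's signals, i.e. unroot its closures. */
static void on_destroy(MockWidget *w, MockClosure *closures, int n)
{
    w->destroyed = true;
    for (int i = 0; i < n; i++)
        if (closures[i].captured->widget == w)
            closures[i].in_registry = false;
}

/* Tiny collector: the proxy survives only while a rooted closure captures it. */
static void collect_garbage(MockProxy *p, MockClosure *closures, int n)
{
    bool rooted = false;
    for (int i = 0; i < n; i++)
        if (closures[i].in_registry && closures[i].captured == p)
            rooted = true;
    if (p->reachable && !rooted) {
        p->reachable = false;       /* the scheme finalizer runs ... */
        widget_unref(p->widget);    /* ... and releases the proxy's reference */
    }
}
```

Without the destroy handler the closure stays rooted, the proxy survives every collection, and the widget's refcount never reaches zero, which is the leak described above.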


Conclusion

This ends my little report on mzgtk2 memory management. More information can be found in the mzgtk2.c source code and in the SWIG type mappings in mzgtk2-typemaps.i.h in the mzgtk2 source tree. To summarize, the main issues in managing objects shared between scheme and Gtk memory are:

  1. Careful reference counting of GObjects and GtkObjects is needed.
  2. Finalization from both scheme and Gtk side is necessary.
  3. Cycles can be broken by hooking into the “destroy” signal of Gtk.

Toggle references

The implementation in mzgtk2 is based on versions of Gtk before 2.8. From version 2.8 on, the GObject library underlying Gtk provides a new function, g_object_add_toggle_ref(), which handles the case of GObjects whose reference count drops to one. This opens new possibilities for mzgtk2. I won’t adopt it yet, because I want to support Gtk versions before 2.8 too. See http://mail.gnome.org/archives/gtk-devel-list/2005-April/msg00095.html for more information on toggle references.
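mzgtk2 does not use toggle references, so the following is only a mock model (no GLib) of the documented semantics: the toggle callback fires with is_last_ref set to true when the toggle reference becomes the only one left, and with false when another reference is added again. The policy in on_toggle (hold the scheme wrapper strongly only while Gtk holds other references) is an assumption about how a binding might use it, not something this report describes:

```c
#include <assert.h>
#include <stdbool.h>

/* Mock of toggle-reference state transitions; not the real GObject API. */
typedef struct {
    int ref_count;
    bool has_toggle_ref;
    bool wrapper_held_strongly;   /* how the binding roots its scheme wrapper */
} MockGObject;

/* Plays the role of a GToggleNotify callback. */
static void on_toggle(MockGObject *g, bool is_last_ref)
{
    g->wrapper_held_strongly = !is_last_ref;
}

static void mock_ref(MockGObject *g)
{
    g->ref_count++;
    if (g->has_toggle_ref && g->ref_count == 2)
        on_toggle(g, false);      /* someone besides the toggle ref holds it again */
}

static void mock_unref(MockGObject *g)
{
    assert(g->ref_count > 0);
    g->ref_count--;
    if (g->has_toggle_ref && g->ref_count == 1)
        on_toggle(g, true);       /* only the toggle ref is left */
}

/* Like g_object_add_toggle_ref(): adds a reference; the initial state is strong. */
static void add_toggle_ref(MockGObject *g)
{
    assert(g->ref_count > 0);     /* a normal reference must already be held */
    g->ref_count++;
    g->has_toggle_ref = true;
    g->wrapper_held_strongly = true;
}
```

With such a callback, the "refcount equals 1" test of Code sample 3 would be replaced by a notification at exactly the moment the count drops to one.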


Info

File      :  mzgtk2-memory-management.pod
Part of   :  mzgtk2
Author(s) :  Hans Oesterholt
Copyright :  (c) 2005
License   :  LGPL
Generated :  2007-0-28 21:57:33