Re: Modifications to brop/monitor/abc

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Thu Nov 14 2002 - 09:13:14 EST

    Hi Djam, Truls, Kris, et al, 
    
    Djamel Ouerdane <ouerdane@nbi.dk> wrote concerning
    Re: Modifications to brop/monitor/abc [Thu, 14 Nov 2002 11:08:22 +0100 (CET)]
    ----------------------------------------------------------------------
    > Hi guys,
    > 
    > Can someone try with gcc 3.2 ?
    
    SIGSEGV has _nothing_ (absolutely NOTHING) to do with the compiler.
    SIGSEGVs occur as a result of code trying to access non-allocated
    memory blocks, i.e., badly written code! 
    
    Djam, you should really lay off your compiler-trip, and spend your
    time on something more productive, like putting tracking offsets into
    the DB :-) 
    
    Truls Martin Larsen <t.m.larsen@fys.uio.no> wrote concerning
    Re: Modifications to brop/monitor/abc [Thu, 14 Nov 2002 11:01:21 +0100]
    ----------------------------------------------------------------------
    > Hi Kris,
    > 
    > I have seen the seg fault myself, but also before root 3.03.09 and
    > gcc  3.04. I have also spent a lot of time trying to figure out why
    > these popups produce this behaviour, but without any luck. If I
    > should come  across any reason, I'll let you know. 
    
    This seems to indicate a problem in either the client code (BROP) or
    in the library code (ROOT). 
    
    > Kris Hagel wrote:
    > 
    > > Hello,
    > > I completed (I think) the migration to signal/slot in the online 
    > > monitor software. 
    
    Cool.  The signal/slot mechanism is far superior to the old
    `ProcessMessage' approach (GNOME-- and Gtk-- use the same approach
    for good reasons, and Qt has something that is at least conceptually
    the same, though not as flexible as the Gtk-- approach). 
    
    > >  This was necessitated by the fact that some of the  routines
    > > still using the old message sending methods were not  compiling
    > > after rootcint because of new "features" (I guess) of the  
    > > 3.04 compiler on the piis.  
    
    The new `features' of the GCC 3.x line of C++ compilers are that they
    are 99.99999% ISO/IEC standard compliant.  Hence, if code did not
    compile with GCC 3.x, it most probably means that it wasn't valid C++
    in the first place. 
    
    > > Whatever the reason, the signal/slot is a cleaner way to do
    > > business.  
    
    It's cleaner because the design is better.  Rather than dispatching
    directly to the objects, there's a broker sitting in between making
    sure that stuff that needs to be handled is handled.  Also, the code
    is cleaner as you no longer have to have 2 or 3 nested `switch'
    statements - instead you have small member functions for each `event'
    you want to handle. 
    
    Also you can do a lot more using the signal/slot approach.
    Unfortunately, ROOT's implementation is not type safe (neither at
    runtime nor at compile time), unlike libsigc++ as used by Gtk--.   
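    
    To make the idea concrete, here is a minimal sketch of a signal/slot
    broker in plain modern C++.  It is neither ROOT's mechanism nor the
    libsigc++ API; `Signal', `MonitorPad', `PopupHandler' and `wire' are
    hypothetical names made up for illustration.  It shows the two points
    above: the signal object sits between sender and receiver, and each
    event is handled by a small member function instead of nested
    `switch' statements.  In this sketch the argument types are checked
    at compile time. 

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal signal; not ROOT's TQObject and not libsigc++.
template <typename... Args>
class Signal {
public:
    using Slot = std::function<void(Args...)>;

    // Register a slot; its argument types are checked at compile time.
    void connect(Slot slot) { slots_.push_back(std::move(slot)); }

    // Call every connected slot with the given arguments.
    void emit(Args... args) const {
        for (const Slot& slot : slots_) slot(args...);
    }

private:
    std::vector<Slot> slots_;
};

// Hypothetical monitor pad that announces double clicks.
struct MonitorPad {
    Signal<int, int> doubleClicked;  // carries pixel coordinates x, y
};

// Hypothetical receiver: one small member function per event.
struct PopupHandler {
    int lastX = -1;
    int lastY = -1;
    int calls = 0;

    void onDoubleClick(int x, int y) {
        lastX = x;
        lastY = y;
        ++calls;
    }
};

// Connect a pad to a handler.  The handler must outlive the pad's signal,
// because the slot keeps a reference to it.
inline void wire(MonitorPad& pad, PopupHandler& handler) {
    pad.doubleClicked.connect(
        [&handler](int x, int y) { handler.onDoubleClick(x, y); });
}
```

    After `wire(pad, handler)', a call to `pad.doubleClicked.emit(x, y)'
    ends up in `handler.onDoubleClick(x, y)'.  Note that the slot holds a
    reference to the handler, so the lifetime issues discussed further
    down apply here too. 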
    
    > > There is one caveat.  I (or the 3.04 compiler; or root 3.03.09) have 
    > > manufactured a seg fault when a popup canvas is used by double 
    > > clicking on a monitor pad.  Everything in my tests appears to continue 
    > > working after the seg fault, but I have the general idea that all seg 
    > > faults are bad seg faults.  
    
    Indeed they are.  SIGSEGV indicates poorly written code.  Have you
    checked that all member pointers are initialised to 0 and that you
    check every pointer before dereferencing it?  If you delete temporary
    objects, do you make sure that all references to those objects are
    cleared as well?   
    
    Check whether the pop-up canvas gets a copy of or a reference to the
    contained objects, and if it gets a reference, make sure the objects
    live as long as the canvas is there.  Simply checking the pointer when
    drawing the pop-up canvas is not enough - the objects can be deleted
    later on in the event loop, outside of the canvas, leaving the canvas
    with an invalid pointer (SIGSEGV).  Hence, you should make damn well
    sure that the objects in the pop-up are removed if they are deleted.
    Also watch out for double deletes. 
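    
    One way to get that guarantee is to let the objects unregister
    themselves from the canvas when they die.  The following is only a
    sketch in plain C++ under that assumption; `Drawable' and
    `PopupCanvas' are hypothetical classes, not the BROP or ROOT ones,
    and the real code may need a different ownership scheme. 

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

class PopupCanvas;

// Hypothetical drawable object that a canvas refers to by pointer.
class Drawable {
public:
    Drawable() = default;
    Drawable(const Drawable&) = delete;
    Drawable& operator=(const Drawable&) = delete;
    ~Drawable();

private:
    friend class PopupCanvas;
    PopupCanvas* owner_ = nullptr;  // initialised to 0, set by add()
};

// Hypothetical pop-up canvas that holds references, not copies.
class PopupCanvas {
public:
    PopupCanvas() = default;
    PopupCanvas(const PopupCanvas&) = delete;
    PopupCanvas& operator=(const PopupCanvas&) = delete;

    // Canvas dies first: tell the objects to forget about it.
    ~PopupCanvas() {
        for (Drawable* d : items_) d->owner_ = nullptr;
    }

    void add(Drawable& d) {
        if (d.owner_ == this) return;        // already shown here
        if (d.owner_) d.owner_->remove(&d);  // move from another canvas
        d.owner_ = this;
        items_.push_back(&d);
    }

    void remove(Drawable* d) {
        items_.erase(std::remove(items_.begin(), items_.end(), d),
                     items_.end());
        d->owner_ = nullptr;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<Drawable*> items_;
};

// Object dies first: unregister, so the canvas never keeps a stale pointer.
inline Drawable::~Drawable() {
    if (owner_) owner_->remove(this);
}
```

    With this, whichever of the two is deleted first, the other one is
    left without a dangling pointer.  It does not protect against double
    deletes of the objects themselves. 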
    
    > > I did not manage to locate the problem and  rationalized that not
    > > many people in brahms besides me use that feature anyway, so I
    > > went ahead and committed in the code.  
    
    Normally, the one thing any developer can be sure of, is that if
    there's a feature in the application, it will be used, and most likely
    in a way the developer hasn't thought of. 
    
    > > But a question to experts (Christian I guess, but anyone else if
    > > they  know).  How do I find where a seg fault happens?  I tried
    > > with gdb and it says it is in InnerLoop and as far as I can tell
    > > it continues there after handling the signal (UNIX signal).   
    
    First off, you obviously need to compile BROP with debugging symbols
    (I guess you did) - but to debug GUI code, you should also compile
    ROOT with debugging symbols (pass `--with-build=debug' to the ROOT
    `./configure' script), as a lot of the signal/slot handling is taking
    place in the library code and it's really helpful to be able to track 
    through all of the signal/slot handling. 
    
    Second off, to avoid having some symbols stripped off your
    library/application, you should set the optimisation level to no
    higher than 1 (which is done per default on SMP machines) - otherwise,
    the optimiser may remove symbols that aren't really used from your code. 
    
    Having done all that, start your application with gdb: 
    
      shell> gdb <application> 
      (gdb) run <arguments to application> 
      ...
      Program received SIGSEGV in ... 
      (gdb) 
    
    When you get to this point, make a backtrace 
    
      (gdb) bt 
    
    This will give you a stack trace of the execution.  Here you can
    figure out how you got to whatever (member) function gave the
    SIGSEGV.   Now go up the stack until you hit the first non-trivial
    (member) function.  `non-trivial' means (member) functions other than
    the signal handler and related functions.  
    
      (gdb) up 
      ...
    
    Now start looking at the symbols in this piece of code.  That is,
    print the addresses of pointers and ladida, and check if they are
    valid. 
    
      (gdb) print foo
      ...
      (gdb) print *foo
    
    Now, this is the place where you're likely to run into trouble if
    you're using GCC 3.x.  The problem is, that the GDB shipped with Red
    Hat 7.x doesn't really understand the run time ABI of GCC 3.x, and so
    you may not be able to inspect the memory pointed to by pointers.
    This is why I recommend using GCC 2.96-RH until we switch to Red Hat
    8.x. 
    
    Djam, here's the only thing that has anything to do with the compiler:
    The ABI and debugging.  Fact is, that if you want to use GCC 3.x, you
    damn well better update your Binutils, GDB, Glibc, and God knows what
    other pieces of software: in essence, switch to Red Hat 8.0.   No one
    in their right mind would ever ship a compiler that creates SIGSEGV in
    client code at runtime - in fact, I think it would be hard to produce
    such a compiler. 
    
    <VMS-rant> 
    > > It  makes me miss VMS again as I remember "back in the good old
    > > days" when a program crashed and the computer told you exactly where
    > > it crashed and why.  
    </VMS-rant> 
    
    That would imply that VMS binary code _always_ contained debugging
    symbols, with the associated performance penalty - you can't seriously
    believe that kind of a system to be superior to UNIX.  It's the same
    `feature' you see on `modern' `operating systems' like Windoze:
    Binary code contains debugging symbols, and the signal handler starts
    a debugger that attaches to the process - that is just a plain waste
    of binary code and CPU cycles.   Debuggers do not justify poor
    design. 
    
    <rant>
    If you want the debugger to fire up automatically when the program
    receives a signal, you can write a signal-handler that does that, and
    install that as your signal handler.  The pseudo-code would be: 
    
      signal_handler() { 
         pid = get_pid_of_process_that_got_signal(); 
         name = get_process_name();
         execute_debugger_and_attach(name, pid); 
      }
    
    Compile the signal handler code into a shared library, say
    `mysighandler.so', always compile your code with `-g' and set the
    environment variable `LD_PRELOAD' to point at `mysighandler.so'.  Now,
    your signal handler will start gdb each time an application gets a
    signal. 
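    
    Fleshed out a bit, the pseudo-code could look roughly like the
    following.  This is a sketch only, assuming GCC (or Clang) on a POSIX
    system with `gdb' in the PATH; `attach_debugger' and
    `install_handler' are made-up names, and only SIGSEGV is handled
    here.  Strictly speaking `snprintf' is not async-signal-safe, which I
    accept in this sketch because the process is about to die anyway. 

```cpp
#include <csignal>
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical handler: starts gdb attached to the crashing process.
extern "C" void attach_debugger(int signo) {
    // Format our own pid as text for the gdb command line.
    char pid[32];
    std::snprintf(pid, sizeof pid, "%d", static_cast<int>(getpid()));

    pid_t child = fork();
    if (child == 0) {
        // Child: become gdb and attach to the parent via `-p <pid>'.
        execlp("gdb", "gdb", "-p", pid, static_cast<char*>(nullptr));
        _exit(127);  // only reached if gdb could not be started
    }
    if (child > 0) waitpid(child, nullptr, 0);  // wait for the gdb session

    // Restore the default action and re-raise, so the process still dies.
    std::signal(signo, SIG_DFL);
    std::raise(signo);
}

// Runs when the shared library is loaded (GCC/Clang extension).
__attribute__((constructor)) static void install_handler() {
    std::signal(SIGSEGV, attach_debugger);
}
```

    Build it with something like `g++ -g -shared -fPIC -o mysighandler.so
    mysighandler.cc' (the source file name is just an example).  On
    systems that restrict which processes may attach to one another, gdb
    may be refused the attach, so treat this as a starting point rather
    than a finished tool. 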
    </rant>
    
    > > Who can help me?   
    
    I hope the above will give you some ideas. 
    
    > > I spent a few hours coverting to signal/slot and 2 1/2 days chasing 
    > > this stupid seg fault and in the end have nothing to show for it.
    
    Ah, the joy of error handling.  That's what you get when you write
    programs that are slightly more complicated than a simple `hello
    world' demo :-) 
    
    Yours, 
    
     ___  |  Christian Holm Christensen 
      |_| |	 -------------------------------------------------------------
        | |	 Address: Sankt Hansgade 23, 1. th.  Phone:  (+45) 35 35 96 91
         _|	          DK-2200 Copenhagen N       Cell:   (+45) 24 61 85 91
        _|	          Denmark                    Office: (+45) 353  25 305
     ____|	 Email:   cholm@nbi.dk               Web:    www.nbi.dk/~cholm
     | |
    

