Re: Modifications to brop/monitor/abc

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Thu Nov 14 2002 - 09:13:14 EST

Next message: Djamel Ouerdane: "Re: Modifications to brop/monitor/abc"

Previous message: Claus O. E. Jorgensen: "C1 Gain Calib scripts."
In reply to: Truls Martin Larsen: "Re: Modifications to brop/monitor/abc"
Next in thread: Djamel Ouerdane: "Re: Modifications to brop/monitor/abc"
Reply: Djamel Ouerdane: "Re: Modifications to brop/monitor/abc"

Hi Djam, Truls, Kris, et al, 

Djamel Ouerdane <ouerdane@nbi.dk> wrote concerning
Re: Modifications to brop/monitor/abc [Thu, 14 Nov 2002 11:08:22 +0100 (CET)]
----------------------------------------------------------------------
> Hi guys,
> 
> Can someone try with gcc 3.2 ?

SIGSEGV has _nothing_ (absolutely NOTHING) to do with the compiler.
SIGSEGVs occur as a result of code trying to access non-allocated
memory blocks - i.e., badly written code! 

Djam, you should really lay off your compiler-trip, and spend your
time on something more productive, like putting tracking offsets into
the DB :-) 

Truls Martin Larsen <t.m.larsen@fys.uio.no> wrote concerning
Re: Modifications to brop/monitor/abc [Thu, 14 Nov 2002 11:01:21 +0100]
----------------------------------------------------------------------
> Hi Kris,
> 
> I have seen the seg fault myself, but also before root 3.03.09 and
> gcc  3.04. I have also spent a lot of time trying to figure out why
> these popups produce this behaviour, but without any luck. If I
> should come  across any reason, I'll let you know. 

This seems to indicate a problem in either the client code (BROP) or
in the library code (ROOT). 

> Kris Hagel wrote:
> 
> > Hello,
> > I completed (I think) the migration to signal/slot in the online 
> > monitor software. 

Cool.  The signal/slot mechanism is far superior to the old
`ProcessMessage' approach (GNOME-- and Gtk-- use the same approach for
good reasons, and Qt has something that is at least conceptually the
same, though not as flexible as the Gtk-- approach). 

> >  This was necessitated by the fact that some of the  routines
> > still using the old message sending methods were not  compiling
> > after rootcint because of new "features" (I guess) of the  
> > 3.04 compiler on the piis.  

The new `feature' of the GCC 3.x line of C++ compilers is that they
are 99.99999% ISO/IEC standard compliant.  Hence, if code does not
compile with GCC 3.x, it most probably means that it wasn't valid C++
in the first place. 

> > Whatever the reason, the signal/slot is a cleaner way to do
> > business.  

It's cleaner because the design is better.  Rather than dispatching
directly to the objects, there's a broker sitting in between making
sure that stuff that needs to be handled is handled.  Also, the code
is cleaner as you no longer need two or three nested `switch'
statements - instead you have small member functions for each `event'
you want to handle. 

Also, you can do a lot more using the signal/slot approach.
Unfortunately, ROOT's implementation is not type safe (at runtime or
compile time), unlike libsigc++, which Gtk-- uses.   
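
To make the broker idea concrete, here is a minimal sketch in plain
C++.  It is _not_ ROOT's signal/slot API and not libsigc++: the
`Signal' template, `MonitorPad', `PopupManager' and `DemoDoubleClick'
are all hypothetical names made up for illustration, and the sketch
assumes a C++11 compiler.  It only shows the shape of the design: the
sender emits, the broker forwards, and the receiver has one small
member function per event.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical broker: keeps the connected slots and forwards each
// emission to all of them.  The argument types are fixed at compile time.
template <typename... Args>
class Signal {
public:
  typedef std::function<void(Args...)> Slot;
  // Register a slot; it is called on every later Emit().
  void Connect(Slot slot) { fSlots.push_back(std::move(slot)); }
  // Call every connected slot, in connection order.
  void Emit(Args... args) const {
    for (const Slot& slot : fSlots) slot(args...);
  }
private:
  std::vector<Slot> fSlots;
};

// Hypothetical sender: a monitor pad that announces double clicks.
struct MonitorPad {
  Signal<int> DoubleClicked;  // argument: pad number
};

// Hypothetical receiver: one small member function per event,
// instead of a case label buried in nested `switch' statements.
struct PopupManager {
  std::vector<int> fOpened;  // pad numbers for which a pop-up was opened
  void OpenPopup(int pad) { fOpened.push_back(pad); }
};

// Usage: connect the slot, emit once, return the number of pop-ups.
inline std::size_t DemoDoubleClick() {
  MonitorPad pad;
  PopupManager popups;
  pad.DoubleClicked.Connect([&popups](int n) { popups.OpenPopup(n); });
  pad.DoubleClicked.Emit(3);
  assert(popups.fOpened.back() == 3);
  return popups.fOpened.size();
}
```

Because the argument list is part of the `Signal' type, connecting a
slot with the wrong signature fails at compile time, which is the kind
of type safety a string-based connection cannot give you.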

> > There is one caveat.  I (or the 3.04 compiler; or root 3.03.09) have 
> > manufactured a seg fault when a popup canvas is used by double 
> > clicking on a monitor pad.  Everything in my tests appears to continue 
> > working after the seg fault, but I have the general idea that all seg 
> > faults are bad seg faults.  

Indeed they are.  SIGSEGV indicates poorly written code.  Have you
checked that all member pointers are initialised to 0 and that you
check all memory before dereferencing it?  If you delete temporary
objects, do you make sure that all references to those objects are
removed as well?   

Check whether the pop-up canvas gets a copy of or a reference to the
contained objects, and if it gets a reference, make sure the objects
live as long as the canvas is there.  Simply checking the pointer when
drawing the pop-up canvas is not enough - the objects can be deleted
later on in the event loop, outside of the canvas, leaving the canvas
with an invalid pointer (SIGSEGV).  Hence, you should make damn well
sure that the objects in the pop-up are removed if they are deleted.
Also watch out for double deletes. 
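
One way to enforce that, sketched below under my own assumptions, is
to let each object unregister itself from the canvas in its
destructor.  `Drawable', `PopupCanvas' and `DemoDeleteWhileShown' are
hypothetical stand-ins, not BROP or ROOT classes, and the sketch
assumes a C++11 compiler; the real monitor code may need a different
hook.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class PopupCanvas;

// Hypothetical object that may be deleted while a pop-up still shows it.
class Drawable {
public:
  Drawable() : fCanvas(0) {}  // member pointer initialised to 0
  ~Drawable();                // unregisters from the canvas, see below
  void SetCanvas(PopupCanvas* canvas) { fCanvas = canvas; }
private:
  Drawable(const Drawable&) = delete;
  Drawable& operator=(const Drawable&) = delete;
  PopupCanvas* fCanvas;       // canvas currently showing us, or 0
};

// Hypothetical pop-up canvas holding references (not copies).
class PopupCanvas {
public:
  PopupCanvas() {}
  // Tell the remaining objects that the canvas is gone.
  ~PopupCanvas() { for (Drawable* d : fObjects) d->SetCanvas(0); }
  void Add(Drawable* d) {
    if (!d) return;           // check before dereferencing
    fObjects.push_back(d);
    d->SetCanvas(this);
  }
  void Remove(Drawable* d) {
    fObjects.erase(std::remove(fObjects.begin(), fObjects.end(), d),
                   fObjects.end());
  }
  std::size_t Size() const { return fObjects.size(); }
private:
  PopupCanvas(const PopupCanvas&) = delete;
  PopupCanvas& operator=(const PopupCanvas&) = delete;
  std::vector<Drawable*> fObjects;
};

// A dying object removes itself, so the canvas never keeps a stale pointer.
inline Drawable::~Drawable() { if (fCanvas) fCanvas->Remove(this); }

// Usage: delete an object while the canvas is alive; return what is left.
inline std::size_t DemoDeleteWhileShown() {
  PopupCanvas canvas;
  Drawable* histogram = new Drawable;
  canvas.Add(histogram);
  assert(canvas.Size() == 1);
  delete histogram;  // removes itself from the canvas
  histogram = 0;     // a second `delete histogram' is now harmless
  return canvas.Size();
}
```

The back pointer works in both directions: whichever of the two dies
first tells the other, so neither side is left holding an invalid
pointer.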

> > I did not manage to locate the problem and  rationalized that not
> > many people in brahms besides me use that feature anyway, so I
> > went ahead and committed in the code.  

Normally, the one thing any developer can be sure of, is that if
there's a feature in the application, it will be used, and most likely
in a way the developer hasn't thought of. 

> > But a question to experts (Christian I guess, but anyone else if
> > they  know).  How do I find where a seg fault happens?  I tried
> > with gdb and it says it is in InnerLoop and as far as I can tell
> > it continues there after handling the signal (UNIX signal).   

First off, you obviously need to compile BROP with debugging symbols
(I guess you did) - but to debug GUI code, you should also compile
ROOT with debugging symbols (pass `--with-build=debug' to the ROOT
`./configure' script), as a lot of the signal/slot handling is taking
place in the library code and it's really helpful to be able to track 
through all of the signal/slot handling. 

Second off, to avoid having some symbols stripped from your
library/application, you should set the optimisation level to no
higher than 1 (which is done per default on SMP machines) - otherwise,
the optimiser may remove symbols that aren't really used from your
code. 

Having done all that, start your application with gdb: 

  shell> gdb <application> 
  (gdb) run <arguments to application> 
  ...
  Program received SIGSEGV in ... 
  (gdb) 

When you get to this point, make a backtrace 

  (gdb) bt 

This will give you a stack trace of the execution.  Here you can
figure out how you got to whatever (member) function gave the
SIGSEGV.  Now go up the stack until you hit the first non-trivial
(member) function.  `Non-trivial' means (member) functions other than
the signal handler and related functions.  

  (gdb) up 
  ...

Now start looking at the symbols in this piece of code.  That is,
print the addresses of pointers and ladida, and check if they are
valid. 

  (gdb) print foo
  ...
  (gdb) print *foo

Now, this is the place where you're likely to run into trouble if
you're using GCC 3.x.  The problem is, that the GDB shipped with Red
Hat 7.x doesn't really understand the run time ABI of GCC 3.x, and so
you may not be able to inspect the memory pointed to by pointers.
This is why I recommend using GCC 2.96-RH until we switch to Red Hat
8.x. 

Djam, here's the only thing that has anything to do with the compiler:
The ABI and debugging.  Fact is, that if you want to use GCC 3.x, you
damn well better update your Binutils, GDB, Glibc, and God knows what
other pieces of software: in essence, switch to Red Hat 8.0.   No one
in their right mind would ever ship a compiler that creates SIGSEGV in
client code at runtime - in fact, I think it would be hard to produce
such a compiler. 

<VMS-rant> 
> > It  makes me miss VMS again as I remember "back in the good old
> > days" when a program crashed and the computer told you exactly where
> > it crashed and why.  
</VMS-rant> 

That would imply that VMS binary code _always_ contained debugging
symbols, with the associated performance penalty - you can't seriously
believe that kind of a system to be superior to UNIX.  It's the same
`feature' you see on `modern' `operating systems' like Windoze:
Binary code contains debugging symbols, and the signal handler starts
a debugger that attaches to the process - that is just a plain waste
of binary code and CPU cycles.   Debuggers do not justify poor
design. 

<rant>
If you want the debugger to fire up automatically when the program
receives a signal, you can write a signal-handler that does that, and
install that as your signal handler.  The pseudo-code would be: 

  signal_handler() { 
     pid = get_pid_of_process_that_got_signal(); 
     name = get_process_name();
     execute_debugger_and_attach(name, pid); 
  }

Compile the signal handler code into a shared library, say
`mysighandler.so', that installs the handler when it is loaded; always
compile your code with `-g', and set the environment variable
`LD_PRELOAD' to point at `mysighandler.so'.  Now, your signal handler
will start gdb each time an application gets a signal. 
</rant>
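
For what it's worth, the pseudo-code above could be fleshed out
roughly as follows.  This is a sketch under assumptions, not tested
against BROP: it handles SIGSEGV only, it assumes `gdb' is in the PATH
and accepts `-p <pid>', and the function names (`AttachDebugger',
`InstallDebuggerHandler', `DebuggerHandlerInstalled', `InstallOnLoad')
are my own.  The constructor attribute is a GCC/Clang extension.  Note
that some systems restrict which processes may attach to which (for
example, Linux with the Yama ptrace restrictions enabled), in which
case gdb may refuse to attach.

```cpp
#include <cassert>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Signal handler: start `gdb -p <pid>' in a child process and wait for it.
// Caveat: snprintf() is not formally async-signal-safe; tolerable in a
// debugging aid, not something to ship in production code.
extern "C" void AttachDebugger(int /*signo*/) {
  char pid[32];
  snprintf(pid, sizeof(pid), "%d", static_cast<int>(getpid()));
  pid_t child = fork();
  if (child == 0) {
    execlp("gdb", "gdb", "-p", pid, static_cast<char*>(0));
    _exit(127);  // only reached if gdb could not be started
  }
  if (child > 0) waitpid(child, 0, 0);  // block until gdb exits
}

// Install the handler for SIGSEGV.  SA_RESETHAND restores the default
// action on entry, so a repeated fault terminates the process as usual.
int InstallDebuggerHandler() {
  struct sigaction action;
  action.sa_handler = AttachDebugger;
  sigemptyset(&action.sa_mask);
  action.sa_flags = SA_RESETHAND;
  return sigaction(SIGSEGV, &action, 0);
}

// True if AttachDebugger is currently the SIGSEGV handler.
bool DebuggerHandlerInstalled() {
  struct sigaction current;
  if (sigaction(SIGSEGV, 0, &current) != 0) return false;
  return current.sa_handler == AttachDebugger;
}

// Runs when the shared library is loaded, which is what makes setting
// LD_PRELOAD sufficient: no change to the application is needed.
__attribute__((constructor)) static void InstallOnLoad() {
  int status = InstallDebuggerHandler();
  assert(status == 0);
  (void)status;  // silence the unused warning when NDEBUG is defined
}
```

Assuming the file is called `mysighandler.cxx' (a made-up name), it
would be built and used along these lines:

  shell> g++ -g -shared -fPIC -o mysighandler.so mysighandler.cxx
  shell> LD_PRELOAD=./mysighandler.so <application>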

> > Who can help me?   

I hope the above will give you some ideas. 

> > I spent a few hours coverting to signal/slot and 2 1/2 days chasing 
> > this stupid seg fault and in the end have nothing to show for it.

Ah, the joy of error handling.  That's what you get when you do
programs that are slightly more complicated than a simple `hello
world' demo :-) 

Yours, 

 ___  |  Christian Holm Christensen 
  |_| |	 -------------------------------------------------------------
    | |	 Address: Sankt Hansgade 23, 1. th.  Phone:  (+45) 35 35 96 91
     _|	          DK-2200 Copenhagen N       Cell:   (+45) 24 61 85 91
    _|	          Denmark                    Office: (+45) 353  25 305
 ____|	 Email:   cholm@nbi.dk               Web:    www.nbi.dk/~cholm
 | |

This archive was generated by hypermail 2.1.5 : Thu Nov 14 2002 - 09:14:00 EST