Re: What is the Bus error?

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Mon Sep 02 2002 - 07:40:11 EDT


    Hi Kris et al, 
    
    On Fri, 30 Aug 2002 15:46:07 -0500
    Kris Hagel <hagel@comp.tamu.edu> wrote
    concerning "Re: What is the Bus error?":
    > Me the guru the implication being on linux HA Ha.  
    
    Yeah, I didn't quite get that either :-) 
    
    > Anyway, Andrei was in  my office 30 seconds before he shot off the
    > mail and my response was  that I had never seen that on anything
    > except a Motorola VME processor at which it meant something quite
    > different than I assume this message here does (assuming this
    > message means anything at all).  
    
    Oh it does.  See the glibc info pages: 
    
      - Macro: int SIGBUS
         This signal is generated when an invalid pointer is dereferenced.
         Like `SIGSEGV', this signal is typically the result of
         dereferencing an uninitialized pointer.  The difference between
         the two is that `SIGSEGV' indicates an invalid access to valid
         memory, while `SIGBUS' indicates an access to an invalid address.
         In particular, `SIGBUS' signals often result from dereferencing a
         misaligned pointer, such as referring to a four-word integer at an
         address not divisible by four.  (Each kind of computer has its own
         requirements for address alignment.)
    
         The name of this signal is an abbreviation for "bus error".
    
    See also this [1], which I found by searching for "bus+error+linux" on
    Google.
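    To make the alignment point concrete: on CPUs with strict alignment
    requirements (SPARC, for example), reading an int through a pointer
    that is not suitably aligned typically raises SIGBUS, while x86
    quietly tolerates it.  Here is a minimal sketch of the portable
    alternative, assuming you need to pull an int out of a raw byte
    buffer: copy the bytes with memcpy rather than casting the pointer.
    The helpers read_int and write_int are hypothetical names, not from
    any library.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: read an int stored at an arbitrary byte offset.
// Dereferencing reinterpret_cast<const int*>(buf + offset) directly may
// raise SIGBUS on strict-alignment CPUs when offset is not a multiple of
// the int's alignment; std::memcpy has no alignment requirement.
int read_int(const unsigned char* buf, std::size_t offset)
{
  int value;
  std::memcpy(&value, buf + offset, sizeof value);
  return value;
}

// Hypothetical counterpart for writing, for the same reason.
void write_int(unsigned char* buf, std::size_t offset, int value)
{
  std::memcpy(buf + offset, &value, sizeof value);
}
```

    For example, after write_int(buf, 1, 123456) on a zeroed 16-byte
    buffer, read_int(buf, 1) gives back 123456 on any CPU.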
    
    > I have to admit, though, that I was not smart enough to suggest gdb.
    > I do feel,   however,  (and I told Andrei) that it smells like an
    > array overflow somewhere.  
    
    See, you were right.  If you'd checked the manual you'd know you were
    right :-) 
    
    > In that case gdb may or may not be of use  because jobs can run well
    > past the point where an array overflow occurs before it finds
    > something it doesn't like and give the random errors.  
    
    Not in the case of SIGSEGV and SIGBUS.  Dereferencing invalid
    memory always raises the signal immediately, at the offending
    instruction, and you'll see exactly where in GDB.
    
    The only times I've got a SIGBUS were from Netscape (bloat-ware - use
    Galeon) and from Objectivity.  In the latter case it was because I
    was working on a Red Hat 6.2 machine with glibc 2.1, while Objectivity
    was linked against glibc 2.0 (or something like that anyway).  It took
    me a week or so to figure that one out!  (I think Flemming, Kris,
    Konstantin and a few others remember my desperation.)  The symptom was
    that the program wouldn't even start - hey, it never even reached the
    main function.
    
    > Anyway, it is undoubtedly worth a try.
    
    When you do use GDB, use `backtrace' to see where the problem was, and
    go `up' the stack until you reach the frame in your own code.  Then
    `list' the code at that point, and start `print'ing the variables and
    addresses to see which variable had the problem.  Make sure you've
    initialised every data member in the constructor (CTOR), that you
    call all the needed Init member functions, and ladida.  (Oh, and the
    names of the commands are not even cryptic at all!)
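    A minimal sketch of the `initialise everything in the CTOR' advice,
    using a hypothetical class (TrackModule, fBuffer and Init are made-up
    names, not BRAT code): the pointer starts out as 0 rather than
    garbage, so using it before Init() has been called fails the same way
    every time, and GDB shows a null pointer instead of some random
    address.

```cpp
#include <cstddef>

// Hypothetical module class for illustration only (not BRAT code).
class TrackModule {
private:
  double*     fBuffer;  // allocated in Init(), not in the constructor
  std::size_t fSize;

  // Copying would make two objects delete the same buffer, so forbid it.
  TrackModule(const TrackModule&);
  TrackModule& operator=(const TrackModule&);

public:
  // Give every data member a defined value here; a pointer left
  // uninitialised would hold whatever garbage was in memory.
  TrackModule() : fBuffer(0), fSize(0) {}
  ~TrackModule() { delete [] fBuffer; }  // deleting a null pointer is a no-op

  void Init(std::size_t n) {
    double* buf = new double[n]();  // value-initialised to 0.0
    delete [] fBuffer;              // release any previous buffer
    fBuffer = buf;
    fSize   = n;
  }

  bool        IsInitialised() const { return fBuffer != 0; }
  std::size_t Size() const          { return fSize; }
};
```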
    
    > Gee do I miss VMS where one could specify /check=all + an
    > understandable debugger and things could be debugged in a
    > straightforward manner and the error message was related to what the
    > error was and not something semi-random.
    
    If you had bothered to read the manual, you'd know what the
    `cryptic' message meant.  It's true you cannot get bounds checking
    with GCC, and I very much doubt you can with almost any C/C++
    compiler.  With Fortran you can, as the language gives a different
    set of guarantees than C/C++.  Anyway, the way to make arrays in C++
    is _not_ to do
    
       int  a[10];             // plain C array, or
       int* b = new int[10];   // raw heap array
    
    but rather use some sort of (possibly templated) container, like
    
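      #include <cstddef>
      #include <stdexcept>
      #include <valarray>

      class IntArray {
      private:
        std::size_t _n;
        int*        _data;
        // Copying would make two objects delete the same buffer, so
        // forbid it (or write a proper copy constructor and assignment).
        IntArray(const IntArray&);
        IntArray& operator=(const IntArray&);
      public:
        explicit IntArray(std::size_t n, int initVal=0)
          : _n(n), _data(new int[n]) {
          for (std::size_t i = 0; i < _n; i++) _data[i] = initVal;
        }
        ~IntArray() { delete [] _data; }
        // size_t is unsigned, so a "negative" index wraps to a huge
        // value and is caught by the i >= _n test alone.
        int operator[](std::size_t i) const {
          if (i >= _n) throw std::out_of_range("index out of range");
          return _data[i];
        }
        int& operator[](std::size_t i) {
          if (i >= _n) throw std::out_of_range("index out of range");
          return _data[i];
        }
        // ... other members
      };

      // or, instead of the class above (valarray manages the memory
      // for you, but its operator[] does not check the index):
      typedef std::valarray<int> IntArray;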
    
    Also, plain C strings are best avoided for the same reason - use
    std::string objects instead.  Or, if you really want the language to
    take care of all this for you, then you should use Java.  I truly
    believe this is a language issue rather than a platform issue.
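    Note that the standard library already offers the checked access
    sketched in IntArray, if you ask for it: std::vector::at and
    std::string::at throw std::out_of_range for a bad index, whereas
    operator[] does no checking at all.  A small sketch; index_rejected
    is a hypothetical helper, used only to show the behaviour.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: true if at() rejects index i for container c.
// std::vector::at and std::string::at check the index and throw
// std::out_of_range; operator[] would just read past the end.
template <typename Container>
bool index_rejected(const Container& c, std::size_t i)
{
  try {
    (void)c.at(i);
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}
```

    So index_rejected(v, 10) is true for a 10-element vector v, and
    index_rejected(std::string("Slewpar4"), 8) is true as well.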
    
    An aside: Has anyone tried to compile BRAT with Intel's C++ compiler
    yet? 
    
    > Kris
    > 
    > P. S.  The last statement is for Christian as I think he might be 
    > getting lazy and I seldom fail to get him to write 10 pages of prose to 
    > respond to statements like that and I think he needs to do that to keep 
    > himself entertained over the weekend.
    
    Ha ha ha.  Well, it's Monday, so no 10 pages of `why VMS is probably
    the second worst OS (after ... well, I don't need to say which, do I?),
    and GNU/Linux is probably the second best after GNU/Hurd, and
    Fortran77/90/95 are at best a pain in the behind, Java is too slow, C
    is ugly, Intercal is funny, ML is cool if it could do more, C++ is
    bloody great, and _both_ vi and Emacs rule!'
    
    > Flemming Videbaek wrote:
    > 
    > >Hi,
    > >
    > >Why don't you ask your local guru (Kris)  to help you running the gdb on
    > >your job to see where it breaks
    > >this is really the only way. It probably made a core file.
    
    Which means you can do 
    
      gdb <program name> core 
    
    and go straight to the point of the signal. 
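    That assumes a core file was actually written.  On many systems the
    default core file size limit is 0, which suppresses it; `ulimit -c'
    raises the limit from the shell.  A sketch of doing the same from
    inside the program with the POSIX getrlimit/setrlimit calls, assuming
    a POSIX system such as GNU/Linux (the function name enable_core_dumps
    is hypothetical):

```cpp
#include <sys/resource.h>

// Hypothetical helper: raise the soft core-file size limit of this process
// to the hard limit, so that a crash (SIGSEGV, SIGBUS, ...) can leave a core
// file behind.  An unprivileged process may always raise its soft limit up
// to the hard limit.  Returns true on success.
bool enable_core_dumps()
{
  struct rlimit rl;
  if (getrlimit(RLIMIT_CORE, &rl) != 0)
    return false;
  rl.rlim_cur = rl.rlim_max;
  return setrlimit(RLIMIT_CORE, &rl) == 0;
}
```

    Where the core file ends up, and what it is called, depends on the
    system configuration; on Linux see /proc/sys/kernel/core_pattern.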
    
    On Fri, 30 Aug 2002 18:03:28 -0400 (EDT)
    Andrey Makeev <makeev_a@rcf2.rhic.bnl.gov> wrote
    concerning "Re: What is the Bus error?":
    > the GDB output gives:
    > 
    > (gdb) where
    ...
    > #9  0x40858000 in BrZdcSlewCalModule::Finish (this=0x8ae33b0) at
    > BrZdcSlewCalModule.cxx:329
    ...
    > Looks like trouble is at
    > 
    > BrZdcSlewCalModule::Finish (this=0x8ae33b0) at BrZdcSlewCalModule.cxx:329
    > 
    > But here is a copy from that module (with line numbers):
    > 
    > 326: fCalibration->SetComment ("Slewpar1", "Generated by BrZdcSlewCalModule: fit with a pol3 function");
    > 327: fCalibration->SetComment ("Slewpar2", "Generated by BrZdcSlewCalModule: fit with a pol3 function");
    > 328: fCalibration->SetComment ("Slewpar3", "Generated by BrZdcSlewCalModule: fit with a pol3 function");
    > 329: fCalibration->SetComment ("Slewpar4", "Generated by BrZdcSlewCalModule: fit with a pol3 function");
    > 330: fCalibration->SetComment ("Slewpar5", "Generated by BrZdcSlewCalModule: fit with a pol3 function");
    
    First off, check that you have indeed allocated memory for the
    parameter "Slewpar4" via a Use message to the fCalibration object. 
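    To illustrate why a missing Use message would bite you, here is a
    hypothetical, much simplified calibration class (Calibration, Use and
    SetComment below are for illustration only and are NOT the BRAT
    calibration API).  It refuses to comment a parameter that was never
    declared, where an unchecked version could end up writing through a
    pointer that was never set up:

```cpp
#include <map>
#include <string>

// Hypothetical, simplified calibration object: NOT the BRAT API.
// A parameter must be declared with Use() before it can get a comment.
class Calibration {
private:
  std::map<std::string, std::string> fComments;  // parameter -> comment

public:
  // Declare a parameter; creates an entry with an empty comment.
  void Use(const std::string& name) { fComments[name] = ""; }

  // Returns false, rather than touching missing storage, if the
  // parameter was never declared with Use().
  bool SetComment(const std::string& name, const std::string& comment) {
    std::map<std::string, std::string>::iterator it = fComments.find(name);
    if (it == fComments.end())
      return false;
    it->second = comment;
    return true;
  }
};
```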
    
    Second - and this is important, and I've raised the issue before:
    do not generate the comments automatically.  The comment field is
    there for genuinely useful information and _must_ be filled in by a
    human, based on an inspection of the quality of the calibrations.
    Otherwise the field has no meaning and should, if anything, be left
    empty (which it can't be :-).  I twisted Djamel's arm until he saw
    reason and added the functionality to enter comments, so take a look
    at the TOF calibration code.
    
    > and it doesn't give any clues why BE shows up, so m.b. Kris
    > is right, but I couldn't figure out at the moment any array
    > overflows in the code... It worked perfectly not long time
    > ago, and I haven't changed nothing in there.
    
    Did you usually link with the libNew.so library in ROOT?  If so, then
    you probably have an uninitialised pointer somewhere.  Are you using a
    newer compiler or something like that, compared to when it worked?
    
    Yours, 
    
     ____ |  Christian Holm Christensen 
      |_| |	 -------------------------------------------------------------
        | |	 Address: Sankt Hansgade 23, 1. th.  Phone:  (+45) 35 35 96 91
         _|	          DK-2200 Copenhagen N       Cell:   (+45) 24 61 85 91
        _|	          Denmark                    Office: (+45) 353  25 305
     ____|	 Email:   cholm@nbi.dk               Web:    www.nbi.dk/~cholm
     | |
    
    
    
    [1] http://www.uwsg.iu.edu/hypermail/linux/kernel/9902.1/0011.html
    


