Re: What is the Bus error?

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Mon Sep 02 2002 - 07:40:11 EDT

Next message: Christian Holm Christensen: "Re: More on OSX port"

Previous message: Christian Holm Christensen: "Re: status of Mac OSX port of brat"
In reply to: Kris Hagel: "Re: What is the Bus error?"
Next in thread: Andrey Makeev: "Re: What is the Bus error?"
Reply: Andrey Makeev: "Re: What is the Bus error?"

Hi Kris et al, 

On Fri, 30 Aug 2002 15:46:07 -0500
Kris Hagel <hagel@comp.tamu.edu> wrote
concerning "Re: What is the Bus error?":
> Me the guru the implication being on linux HA Ha.  

Yeah, I didn't quite get that either :-) 

> Anyway, Andrei was in  my office 30 seconds before he shot off the
> mail and my response was  that I had never seen that on anything
> except a Motorola VME processor at which it meant something quite
> different than I assume this message here does (assuming this
> message means anything at all).  

Oh it does.  See the glibc info pages: 

  - Macro: int SIGBUS
     This signal is generated when an invalid pointer is dereferenced.
     Like `SIGSEGV', this signal is typically the result of
     dereferencing an uninitialized pointer.  The difference between
     the two is that `SIGSEGV' indicates an invalid access to valid
     memory, while `SIGBUS' indicates an access to an invalid address.
     In particular, `SIGBUS' signals often result from dereferencing a
     misaligned pointer, such as referring to a four-word integer at an
     address not divisible by four.  (Each kind of computer has its own
     requirements for address alignment.)

     The name of this signal is an abbreviation for "bus error".

See also [1], which I found by searching for "bus+error+linux" on
Google. 
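
To make the "misaligned pointer" part of that description concrete, here is a minimal sketch that only tests addresses for alignment and never dereferences a misaligned one (that would be undefined behaviour, and on x86 it usually does not raise SIGBUS anyway).  The names isAligned, buffer, alignedCase and misalignedCase are made up for this illustration, and the code assumes a C++11 compiler:

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if the address p is a multiple of `alignment` bytes.
bool isAligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Demonstration buffer, itself aligned suitably for an int.
alignas(int) unsigned char buffer[2 * sizeof(int)];

// The start of the buffer is a valid address for an int.
bool alignedCase() { return isAligned(buffer, alignof(int)); }

// One byte further in is misaligned for an int whenever alignof(int) > 1.
bool misalignedCase() { return isAligned(buffer + 1, alignof(int)); }
```

Casting buffer + 1 to int* and reading through it is the kind of access the manual is talking about; whether the hardware actually traps on it depends on the machine.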

> I have to admit, though, that I was not smart enough to suggest gdb.
> I do feel,   however,  (and I told Andrei) that it smells like an
> array overflow somewhere.  

See, you were right.  If you'd checked the manual you'd know you were
right :-) 

> In that case gdb may or may not be of use  because jobs can run well
> past the point where an array overflow occurs before it finds
> something it doesn't like and give the random errors.  

Not in the case of SIGSEGV and SIGBUS.  Dereferencing invalid memory
always raises the signal immediately, and you'll see that in GDB. 

The only times I got a SIGBUS were from Netscape (bloat-ware - use
Galeon) and from Objectivity.  In the latter case it was because I
was working on a Red Hat 6.2 machine with glibc 2.1, and Objectivity
was linked using glibc 2.0 (or something like that anyway).  It took
me a week or so to figure that one out!  (I think Flemming, Kris,
Konstantin and a few others remember my desperation.)  The symptom was
that the program wouldn't even start - hey, it wouldn't even go to the
main function.  

> Anyway, it is undoubtly worth a try.

When you do use GDB, use `backtrace' to see where the problem was, and
go `up' until you hit it.  Then `list' the code at that point, and
start `print'ing the variables and addresses to see which variable had
the problem.  Make sure you've initialised everything in the CTOR, and
that you call all needed Init member functions, and ladida.  (Oh, and
the names of the commands are not even cryptic at all!). 
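
As a rough illustration of "initialise everything in the CTOR and call the Init member functions", consider the sketch below.  The class CalModule and its members are hypothetical and not taken from BRAT; the code assumes C++11.  The point is only that the pointer has a known value from the moment the object exists:

```cpp
#include <cstddef>

// Hypothetical module class, for illustration only (not BRAT code).
class CalModule {
public:
  // Every member gets a defined value here, pointers included.
  CalModule() : fNPar(0), fPar(nullptr) {}
  ~CalModule() { delete[] fPar; }

  // Copying is disabled so that fPar is never deleted twice.
  CalModule(const CalModule&) = delete;
  CalModule& operator=(const CalModule&) = delete;

  // Allocate the parameter array; must be called before Par() is used.
  void Init(std::size_t n) {
    delete[] fPar;
    fPar  = new double[n]();  // the trailing () zeroes the elements
    fNPar = n;
  }

  bool IsInitialised() const { return fPar != nullptr; }
  std::size_t NPar() const { return fNPar; }
  double Par(std::size_t i) const { return fPar[i]; }

private:
  std::size_t fNPar;  // number of parameters
  double*     fPar;   // owned array of fNPar parameters
};
```

With this, a forgotten Init shows up in GDB as a null fPar when you `print' it, rather than as some random address.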

> Gee do I miss VMS where one could specify /check=all + an
> understandable debugger and things could be debugged in a
> straightforward manner and the error message was related to what the
> error was and not something semi-random.

If you had bothered to read the manual, then you'd know what the
`cryptic' message meant.  It's true you can not get bounds checking with
GCC, and I doubt very much you can with almost any C/C++ compiler.  With
Fortran you can, as the language gives a different set of guarantees than
C/C++.  Anyway, the way to make arrays in C++ is _not_ to do 

   int  a[10];             // fixed-size array, no bounds checking
   int* b = new int[10];   // heap array, no bounds checking either

but rather use some sort of (possibly templated) container, like

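  #include <cstddef>    // size_t
  #include <stdexcept>  // std::out_of_range

  class IntArray  { 
  private: 
    size_t _n; 
    int*   _data; 
  public: 
    // Allocate n elements and set each one to initVal
    IntArray(size_t n, int initVal=0) : _n(n), _data(new int[n]) { 
      for (size_t i = 0; i < _n; i++) _data[i] = initVal;
    }
    ~IntArray() { delete [] _data; }
    // Checked read access; size_t is unsigned, so only the upper
    // bound needs testing
    int operator[](size_t i) const { 
      if (i >= _n) throw std::out_of_range("index out of range"); 
      return _data[i];
    }
    // Checked write access
    int& operator[](size_t i) { 
      if (i >= _n) throw std::out_of_range("index out of range"); 
      return _data[i];
    }
    ...
  };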

or a standard-library container:

  typedef std::valarray<int> IntArray;
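
One caveat on the standard containers: operator[] on std::valarray and std::vector does not check the index.  If you want the checked behaviour of the hand-written class from the standard library, std::vector::at() provides it.  A small sketch (the helper name throwsOutOfRange is made up for this example):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading element i of v is rejected as out of range.
bool throwsOutOfRange(const std::vector<int>& v, std::size_t i) {
  try {
    (void)v.at(i);  // at() checks the index; operator[] does not
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}
```

For a vector of 10 elements, index 9 is accepted and index 10 throws, so an overflow is reported at the point where it happens.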

Also, plain C strings are deprecated for the same reason - instead
use std::string objects.  Or, if you really want the language to take
care of all this for you, then you should use Java.  I truly believe
this is a language issue rather than a platform issue.
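
To show what std::string buys you over a fixed char buffer, here is a short sketch.  The function names makeLabel and hasCharAt are invented for this example.  The string grows as needed when you append to it, and at() gives checked character access:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Builds "name: comment"; the string grows as needed, so there is
// no fixed-size buffer to overflow as with strcpy/strcat.
std::string makeLabel(const std::string& name, const std::string& comment) {
  std::string label = name;
  label += ": ";
  label += comment;
  return label;
}

// Returns true if index i is a valid character position in s.
bool hasCharAt(const std::string& s, std::size_t i) {
  try {
    (void)s.at(i);  // at() throws std::out_of_range when i >= s.size()
    return true;
  } catch (const std::out_of_range&) {
    return false;
  }
}
```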

An aside: Has anyone tried to compile BRAT with Intel's C++ compiler
yet? 

> Kris
> 
> P. S.  The last statement is for Christian as I think he might be 
> getting lazy and I seldom fail to get him to write 10 pages of prose to 
> respond to statements like that and I think he needs to do that to keep 
> himself entertained over the weekend.

Ha ha ha.  Well, it's Monday, so no 10 pages of `why VMS is probably
the second worst OS (after ... well, I don't need to say what, do I?),
and GNU/Linux is probably the second best after GNU/Hurd, and
Fortran77/90/95 are at best a pain in the behind, Java is too slow, C
is ugly, Intercal is funny, ML is cool if it could do more, C++ is
bloody great, and _both_ vi and Emacs rule!' 

> Flemming Videbaek wrote:
> 
> >Hi,
> >
> >Why don't you ask your local guru (Kris)  to help you running the gdb on
> >your job to see where it breaks
> >this is really the only way. It probably made a core file.

Which means you can do 

  gdb <program name> core 

and go straight to the point of the signal. 

On Fri, 30 Aug 2002 18:03:28 -0400 (EDT)
Andrey Makeev <makeev_a@rcf2.rhic.bnl.gov> wrote
concerning "Re: What is the Bus error?":
> the GDB output gives:
> 
> (gdb) where
...
> #9  0x40858000 in BrZdcSlewCalModule::Finish (this=0x8ae33b0) at
> BrZdcSlewCalModule.cxx:329
...
> Looks like trouble is at
> 
> BrZdcSlewCalModule::Finish (this=0x8ae33b0) at BrZdcSlewCalModule.cxx:329
> 
> But here is a copy from that module (with line numbers):
> 
> 326: fCalibration->SetComment ("Slewpar1", "Generated by
> BrZdcSlewCalModule: fit with a pol3 function");
> 
> 327: fCalibration->SetComment ("Slewpar2", "Generated by
> BrZdcSlewCalModule: fit with a pol3 function");
> 
> 328: fCalibration->SetComment ("Slewpar3", "Generated by
> BrZdcSlewCalModule: fit with a pol3 function");
> 
> 329: fCalibration->SetComment ("Slewpar4", "Generated by
> BrZdcSlewCalModule: fit with a pol3 function");
> 
> 330: fCalibration->SetComment ("Slewpar5", "Generated by
> BrZdcSlewCalModule: fit with a pol3 function");

First off, check that you have indeed allocated memory for the
parameter "Slewpar4" via a Use message to the fCalibration object. 

Secondly - and this is important, and I've raised this issue before:
Do not make the comments automatically.  The comment field is there
for some genuinely useful information and _must_ be entered based on
inspection of the quality of the calibrations by a human.  Otherwise,
that field has no meaning and should if anything be left empty (which
it can't be :-).   I twisted Djamel's arm until he saw reason and
added the functionality to add comments, so take a look in the TOF
calibration code. 

> and it doesn't give any clues why BE shows up, so m.b. Kris
> is right, but I couldn't figure out at the moment any array
> overflows in the code... It worked perfectly not long time
> ago, and I haven't changed nothing in there.

Did you usually link with the libNew.so library in ROOT?  If so, then
you probably have an uninitialised pointer somewhere.  Are you using a
newer compiler or something like that? (that is compared to when it
worked). 

Yours, 

 ____ |  Christian Holm Christensen 
  |_| |	 -------------------------------------------------------------
    | |	 Address: Sankt Hansgade 23, 1. th.  Phone:  (+45) 35 35 96 91
     _|	          DK-2200 Copenhagen N       Cell:   (+45) 24 61 85 91
    _|	          Denmark                    Office: (+45) 353  25 305
 ____|	 Email:   cholm@nbi.dk               Web:    www.nbi.dk/~cholm
 | |

[1] http://www.uwsg.iu.edu/hypermail/linux/kernel/9902.1/0011.html


This archive was generated by hypermail 2b30 : Mon Sep 02 2002 - 07:45:30 EDT