Circular Dependencies and Symbol Declaration

One misconception I see quite frequently in online forums concerns the purpose of #import.

Duplicate Symbol Declaration

For those who come from a C or C++ background, we're used to writing code in header files like this:

  #ifndef __MyHeaderFile_h__
  #define __MyHeaderFile_h__
  // declarations go here
  #endif

We do this because of how preprocessing works during compilation. When GCC comes across a #include directive, it retrieves the named file, recursively processes it, and adds its declarations to those it has already seen, as if the file's contents had been pasted in at that point. When it finishes compiling (i.e., it has processed all the #include directives and translated everything), the compiler moves on to the next file and begins again.

The reason we do this #ifndef dance is the following scenario: Suppose we have a file called A.h. A.h includes two files: B.h and C.h. Both B.h and C.h include a single file, D.h. If D.h did not defensively guard its declarations, then when the compiler includes it for the second time (as it is processing C.h), it would see that it's trying to define types with the exact same names as ones it has already defined. Since a given type (a struct, say, or an Objective-C @interface) can only be defined once per compilation, the compiler will generate an error and abort.

This is why we put in the #ifndef stuff. The first time the compiler processes D.h it will say "Aha! __D_h__ is undefined, so I can continue processing this file!" We then immediately define __D_h__ as a way to indicate that D.h has already been processed. Then when the compiler tries to process it again, it says "__D_h__ is already defined, so I can skip this stuff". And thus we avoid duplicate symbol declaration.
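As a rough illustration, here is approximately what the compiler ends up seeing once D.h has been pulled in twice (once via B.h, once via C.h), written out as a single C file. The guard macro is the __D_h__ from the description above; struct Point and point_sum are made-up names for this sketch, not anything from a real header.

```c
/* --- first copy of D.h, arriving via B.h --- */
#ifndef __D_h__
#define __D_h__
struct Point { int x; int y; };   /* hypothetical declaration living in D.h */
#endif

/* --- second copy of D.h, arriving via C.h --- */
#ifndef __D_h__                   /* already defined, so everything up to #endif is skipped */
#define __D_h__
struct Point { int x; int y; };   /* without the guard this would be a redefinition error */
#endif

/* Whoever included A.h sees exactly one definition of struct Point. */
int point_sum(struct Point p) {
    return p.x + p.y;
}
```

Deleting the two #ifndef/#define/#endif triples turns this into the failing case: the second struct Point definition is no longer skipped and compilation stops there.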

Or we can just use #import and not have to do the #ifndef stuff. #import is the exact same thing as #include, except that it ensures that each file is only processed once, no matter how many times it's included in the file's dependencies.

Circular Dependencies

While #import is a wonderful thing (and I highly recommend that everyone use it where possible), it does not solve all of our subtle compilation problems. Consider this scenario: A.h #imports B.h, and B.h #imports A.h. We're using #import, so we know that each file will only get processed once. But which one should it start with? Let's start with A.h. We immediately come across the #import statement for B.h, so we pause on processing A.h and go start on B.h. Except that as we begin processing B.h, we come across the line to #import A.h, and so we go process that, which tells us to process B.h, which tells us to process A.h, ...

This is called a circular dependency. It is a fancy name for an infinite loop. Fortunately for us, the compiler is smart enough to notice that it has found a circular dependency and will abort compilation with an error. We solve this problem using a mechanism called Forward Declarations. Forward declarations are basically promises to the compiler that "this method/function/variable exists, but for right now, just know what it's called, and you'll find the actual definition later". The compiler then trusts that we'll define the symbol later, and continues on its merry way.

So in our example above, A.h would forward declare the existence of B, and B.h would forward declare the existence of A. This way the compiler can process A.h as if it had already processed B.h, and vice versa. The actual importing would happen (via a #import statement) in the implementation files, where the full definitions of A and B are actually needed.
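The same idea can be sketched in plain C, which also compiles as Objective-C. For classes, Objective-C spells the forward declaration "@class B;" where C says "struct B;". The fields and the two helper functions below are invented for illustration, and the three sections stand in for A.h, B.h, and an implementation file.

```c
#include <stddef.h>

/* --- what A.h would contain --- */
struct B;                       /* forward declaration: "B exists, details later" */
struct A {
    int somethingElse;
    struct B *partner;          /* a pointer is fine: B's layout isn't needed yet */
};

/* --- what B.h would contain --- */
struct A;                       /* harmless repeat, so B.h works even if seen first */
struct B {
    int someVariable;
    struct A *owner;
};

/* --- implementation file: both full definitions are visible here --- */
void link_pair(struct A *a, struct B *b) {
    a->partner = b;
    b->owner = a;
}

/* Returns the partner's someVariable, or -1 if no partner has been linked. */
int partner_value(const struct A *a) {
    return a->partner != NULL ? a->partner->someVariable : -1;
}
```

Neither "header" section includes the other, so there is no cycle to worry about. Only the code that reaches inside both types needs both full definitions.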

Conclusion

The differences between circular dependencies and duplicate symbol declarations are subtle, but they are not the same. I believe that it is really important for programmers to know the difference. Understanding that they are different can save us from badly organized code, bizarre workarounds to achieve compilation, and posting naïve comments online.

Comments

Slightly inaccurate

Your description of the circular dependency problem isn't quite right. The preprocessor is taking A.h and its dependencies and producing a temporary version of A.h that has everything all in a single file. When A.h imports B.h, it essentially copies everything from B.h into the intermediate version of A.h that we're producing. Then, when the preprocessor finds #import "A.h" in B.h, it will know that it's already been imported, so it won't go back and try to import it again. It will do exactly what you'd expect.

However, the preprocessed version will look something like this:

  // A.h

  @interface B : NSObject {
    int someVariable;
  }
  - (NSString *)makeStringFromA:(A *)objectA;   // Line [1]
  @end

  @interface A : NSObject {
    int somethingElse;
  }
  - (void)eatBforLunch:(B *)objectB;
  @end

Note that we've replaced all the #import statements with the code they represent. Since C (and its derivatives) processes files in a top-down manner, when the compiler reads the preprocessed version of this header, it will get to Line [1] and say, "Hey, wait, I don't know what an 'A' is. Is this a pointer? A long long? How much memory should I allocate for this argument?" And that's where it will die. Adding the @class declaration in B.h will mean that when the compiler deals with the preprocessed header, it will already know that A is a class, and therefore that "A *" is a pointer to an object. It doesn't need to know anything about the methods or size of A at that point, so simply knowing that it is dealing with a pointer is sufficient.
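A small C sketch of that last point, assuming a struct stands in for the class (the function names are made up): with only a forward declaration in view, the compiler can already size and pass a pointer, and it only needs the full definition once something looks inside.

```c
#include <stddef.h>

struct A;                                  /* forward declaration only; layout unknown */

/* Fine: a pointer to struct A has a known size, whatever A contains. */
size_t size_of_pointer_to_A(void) {
    return sizeof(struct A *);
}

/* Also fine: a pointer can be accepted and handed back without looking inside. */
const struct A *pass_through(const struct A *objectA) {
    return objectA;
}

/* sizeof(struct A) or objectA->somethingElse would NOT compile up here. */

struct A { int somethingElse; };           /* the full definition arrives later */

/* From this point on, the fields are accessible. */
int read_somethingElse(const struct A *objectA) {
    return objectA->somethingElse;
}
```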

-BJ Homer

Thanks for clarifying.

Thanks for clarifying. You're right, I was slightly off.