Circular Dependencies and Symbol Declaration
davedelong — Sun, 10/04/2009 - 13:11
One misconception I see quite frequently in online forums is over the purpose of #import.
Duplicate Symbol Declaration
For those who come from a C or C++ background, we're used to writing code in header files like this:
#ifndef __MyHeaderFile_h__
#define __MyHeaderFile_h__
// declarations go here
#endif
We do this because of how preprocessing works during compilation. When GCC comes across a #include directive, it retrieves the file, recursively processes it, and adds its list of symbol declarations to those it already has. When it finishes compiling the current file (i.e., it has processed all the #include directives and translated everything), the compiler moves on to the next file and begins again.
The reason we do this #ifndef dance is because of the following scenario: Suppose we have a file, called A.h. A.h includes two files: B.h and C.h. Both B.h and C.h include a single file, D.h. If D.h did not defensively declare its symbols, then when the compiler includes it for the second time (as it is processing C.h), it would see that it's trying to include functions and variables that have the exact same name as others that it has already included. Since names in C have to be unique, the compiler will generate an error and abort compilation.
This is why we put in the #ifndef stuff. The first time the compiler processes D.h it will say "Aha! __D_h__ is undefined, so I can continue processing this file!" We then immediately define __D_h__ as a way to indicate that D.h has already been processed. Then when the compiler tries to process it again, it says "__D_h__ is already defined, so I can skip this stuff". And thus we avoid duplicate symbol declaration.
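To make the D.h scenario concrete, here is a minimal single-file sketch of what the compiler effectively sees once D.h has been pulled in twice (once through B.h, once through C.h). Everything in it is hypothetical: the struct, the function names, and the guard name D_H, which is spelled without leading underscores only because names beginning with two underscores are reserved for the implementation in C.

```c
#include <assert.h>

/* ---- first copy of the hypothetical D.h (reached via B.h) ---- */
#ifndef D_H
#define D_H
struct point { int x; int y; };   /* a definition: may appear only once */
static int point_sum(struct point p) { return p.x + p.y; }
#endif

/* ---- second copy of D.h (reached via C.h) ---- */
#ifndef D_H                       /* D_H is already defined, so everything */
#define D_H                       /* up to the #endif is skipped           */
struct point { int x; int y; };
static int point_sum(struct point p) { return p.x + p.y; }
#endif

/* Hypothetical client code that uses what D.h declares. */
int demo_guard(void) {
    struct point p = { 2, 3 };
    int sum = point_sum(p);
    assert(sum == 5);             /* sanity check on the single surviving definition */
    return sum;
}
```

If the #ifndef/#define/#endif lines are deleted, the second copy is no longer skipped and the compiler rejects it as a redefinition of struct point and point_sum.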
Or we can just use #import and not have to do the #ifndef stuff. #import is the exact same thing as #include, except that it ensures that each file is only processed once, no matter how many times it's included in the file's dependencies.
Circular Dependencies
While #import is a wonderful thing (and I highly recommend that everyone use it where possible), it does not solve all of our subtle compilation problems. Consider this scenario: A.h #imports B.h, and B.h #imports A.h. We're using #import, so we know that each file will only get processed once. But which one should it start with? Let's start with A.h. We immediately come across the #import statement for B.h, so we pause on processing A.h and go start on B.h. Except that as we begin processing B.h, we come across the line to #import A.h, and so we go process that, which tells us to process B.h, which tells us to process A.h, ...
This is called a circular dependency. It is a fancy name for an infinite loop. Fortunately for us, the compiler is smart enough to notice that it has found a circular dependency and will abort compilation with an error. We solve this problem using a mechanism called Forward Declarations. Forward declarations are basically promises to the compiler that "this method/function/variable exists, but for right now, just know what it's called, and you'll find the actual definition later". The compiler then trusts that we'll define the symbol later, and continues on its merry way.
So in our example above, A.h would forward declare the existence of B, and B.h would forward declare the existence of A. This way the compiler can process A.h as if it had already processed B.h, and vice versa. The actual importing would happen (via a #import statement) wherever the symbols are actually defined.
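In plain C the same idea can be sketched with struct forward declarations. This is only an illustration under assumed names (struct A, struct B, partner_value and demo_forward are all hypothetical), with both "headers" pasted into one file so that it compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* ---- hypothetical A.h ---- */
struct B;                        /* forward declaration: "a struct B exists" */
struct A {
    int value;
    struct B *partner;           /* a pointer to an incomplete type is allowed */
};

/* ---- hypothetical B.h ---- */
struct A;                        /* forward declaration: "a struct A exists" */
struct B {
    int value;
    struct A *owner;
};

/* ---- hypothetical implementation file: both full definitions are visible ---- */
int partner_value(const struct A *a) {
    /* Dereferencing a->partner needs the complete definition of struct B. */
    return a->partner != NULL ? a->partner->value : -1;
}

int demo_forward(void) {
    struct A a = { 1, NULL };
    struct B b = { 42, &a };
    a.partner = &b;
    assert(b.owner == &a);
    return partner_value(&a);
}
```

Neither header needs to include the other: each one only stores a pointer, and a pointer has a known size even when the type it points to is still incomplete. Only the implementation, which looks inside both structs, needs both full definitions.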
Conclusion
The differences between circular dependencies and duplicate symbol declarations are subtle, but they are not the same. I believe that it is really important for programmers to know the difference. Understanding that they are different can save us from badly organized code, bizarre workarounds to achieve compilation, and posting naïve comments online.

Slightly inaccurate
Anonymous — Fri, 10/09/2009 - 14:14
Your description of the circular dependency problem isn't quite right. The preprocessor is taking A.h and its dependencies and producing a temporary version of A.h that has everything all in a single file. When A.h imports B.h, it essentially copies everything from B.h into the intermediate version of A.h that we're producing. Then, when the preprocessor finds #import "A.h" in B.h, it will know that it's already been imported, so it won't go back and try to import it again. It will do exactly what you'd expect.
However, the preprocessed version will look something like this:
Note that we've replaced all the #import statements with the code they represent. Since C (and C-derivatives) process files in a top-down manner, when the compiler reads the preprocessed version of this header, it will get to Line [1] and say "Hey, wait, I don't know what an 'A' is. Is this a pointer? A long long? How much memory should I allocate for argument?" And that's where it will die. Adding the @class declaration in B.h will mean that when the compiler deals with the preprocessed header, it will already know that "A" is a pointer to an object. It doesn't need to know anything about the methods or size of A at that point, so simply declaring it as a pointer type is sufficient.
-BJ Homer
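The flattening described in this comment can be sketched in plain C. This is not the commenter's listing; it is a C analog under assumed names (A, B, b_set_argument and demo_flattened are hypothetical), written in the order the compiler would read it if A.h were processed first and included B.h near its top.

```c
#include <assert.h>
#include <stddef.h>

/* ---- text pasted in from the hypothetical B.h ---- */
/* B.h's own include of A.h contributed nothing here, because A.h was
   already being processed when B.h asked for it. */
typedef struct A A;              /* forward declaration; without this line the
                                    declarations below that mention A fail to
                                    compile with an unknown-type error */
typedef struct B {
    A *argument;                 /* only a pointer, so A's size is not needed */
} B;
void b_set_argument(B *b, A *argument);

/* ---- remainder of A.h, which the compiler reaches only now ---- */
struct A {
    int value;
    B *partner;
};

/* ---- hypothetical implementation ---- */
void b_set_argument(B *b, A *argument) { b->argument = argument; }

int demo_flattened(void) {
    A a = { 7, NULL };
    B b = { NULL };
    b_set_argument(&b, &a);
    a.partner = &b;
    assert(b.argument == &a);
    return b.argument->value;
}
```

The typedef line plays the role that @class plays in the comment: it tells the compiler that A names a type before the full definition shows up further down the same flattened file.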
Thanks for clarifying.
davedelong — Fri, 10/09/2009 - 14:30
Thanks for clarifying. You're right, I was slightly off.