January 15, 2009

PHP Memory Management in Foreach

Many developers, even experienced ones, are confused by the way PHP handles arrays in foreach loops. In the standard (by-value) foreach loop, PHP iterates over a copy of the array rather than the array itself, and the copy is discarded as soon as the loop finishes. In a simple foreach loop this is completely transparent. For example:

$set = array("apple", "banana", "coconut");
foreach ( $set AS $item ) {
    echo "{$item}\n";
}

This outputs:

apple
banana
coconut

Even though a copy is created, you don’t notice it, because the original array isn’t used within the loop or after the loop finishes. However, when you try to modify the items inside the loop, you find that they are unchanged once the loop is done:

$set = array("apple", "banana", "coconut");
foreach ( $set AS $item ) {
    $item = strrev($item);
}
print_r($set);

This outputs:

Array
(
    [0] => apple
    [1] => banana
    [2] => coconut
)

There are no changes from the original, even though a new value was clearly assigned to $item. This is because $item only holds a copy of each element of the array the loop is working on, so assigning to it never touches $set. You can override this by taking $item by reference, like so:

$set = array("apple", "banana", "coconut");
foreach ( $set AS &$item ) {
    $item = strrev($item);
}
print_r($set);

This outputs:

Array
(
    [0] => elppa
    [1] => ananab
    [2] => tunococ
)

As you can see, when $item is taken by reference, the changes made to $item are applied to the elements of the original $set. Using $item by reference also prevents PHP from creating the array copy. To test this, first we’ll show a quick script demonstrating the copy:

$set = array("apple", "banana", "coconut");
foreach ( $set AS $item ) {
    $set[] = ucfirst($item);
}
print_r($set);

This outputs:

Array
(
    [0] => apple
    [1] => banana
    [2] => coconut
    [3] => Apple
    [4] => Banana
    [5] => Coconut
)

In this example, PHP copied $set and looped over the copy, but when $set was written to inside the loop, PHP appended the values to the original array, not to the copy. PHP uses the copy only to drive the loop and to assign $item. Because of this, the loop above executes exactly 3 times, and each pass appends one more value to the end of the original $set, leaving it with 6 elements without ever entering an infinite loop.

However, what if we had used $item by reference, as I mentioned before? A single character added to the above test:

$set = array("apple", "banana", "coconut");
foreach ( $set AS &$item ) {
    $set[] = ucfirst($item);
}
print_r($set);

Results in an infinite loop. Note that this really is an infinite loop; you’ll have to either kill the script yourself or wait for it to run out of memory. I added the following line to my script so PHP would run out of memory very quickly, and I suggest you do the same if you’re going to run these infinite-loop tests:

ini_set("memory_limit","1M");

So the infinite loop in this previous example shows why PHP was written to loop over a copy of the array. When the copy is used only by the loop construct itself, the array being iterated stays static throughout the loop, so changes you make to the original can’t make the loop run away.
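If you want to experiment with these runaway loops without waiting for the memory limit, one option is to count iterations and break out once the count passes a cap. The sketch below applies that to the by-reference append loop; the cap of 10 and the $iterations and $maxIterations names are arbitrary choices for illustration, and any version that keeps visiting the newly appended elements will hit the cap instead of running forever.

```php
<?php
$set = array("apple", "banana", "coconut");
$iterations = 0;    // illustrative counter
$maxIterations = 10; // arbitrary safety cap for experimenting

foreach ($set as &$item) {
    if (++$iterations > $maxIterations) {
        // The loop is still finding new elements: bail out instead of
        // growing the array until PHP runs out of memory.
        break;
    }
    $set[] = ucfirst($item);
}
unset($item); // drop the reference left over from the loop

echo "Iterations: {$iterations}, elements: " . count($set) . "\n";
```

On PHP 7 and later this reports 11 iterations and 13 elements: ten passes each append one value, and the eleventh pass lands on an appended element and trips the cap.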

But wait, there’s more. PHP doesn’t create a copy of the array if a reference is involved at all. We know that taking $item by reference causes the infinite loop scenario above, but if $set is referenced anywhere else in the script, even the by-value foreach form will break:

$set = array("apple", "banana", "coconut");
$a = &$set;
foreach ( $set AS $item ) {
    $set[] = ucfirst($item);
}

Results in an infinite loop, even though $item isn’t taken by reference. Using $a instead of $set gives identical results.
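This infinite loop was observed on PHP 5.2.5. PHP 7 reworked foreach so that a by-value loop always iterates over a snapshot of the array, even when the array is referenced elsewhere. A guarded version of the same test, sketched below with an arbitrary cap of 10 passes (the $iterations counter is just for illustration), shows which behavior your PHP has: on PHP 7 and later it should report 3 iterations and 6 elements, while a version that iterates the referenced array directly would hit the cap.

```php
<?php
$set = array("apple", "banana", "coconut");
$a = &$set;      // the reference that, on PHP 5.2.5, stops foreach from copying
$iterations = 0; // illustrative counter

foreach ($set as $item) {
    if (++$iterations > 10) {
        break; // safety cap so the test can't run away
    }
    $set[] = ucfirst($item);
}

echo "Iterations: {$iterations}, elements: " . count($set) . "\n";
```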

This is not to say that $item is implicitly used by reference if $set is referenced. See this example:

$set = array("apple", "banana", "coconut");
$a = &$set;
foreach ( $a AS $item ) {
    $item = ucfirst($item);
}
print_r($set);

This outputs:

Array
(
    [0] => apple
    [1] => banana
    [2] => coconut
)

$set is unchanged from the original values. Even though $set is referenced by $a and has not been copied, $item still receives only a copy of each element’s value, so modifications to $item are not passed back to $set. You still have to take $item by reference to make changes to the original array:

$set = array("apple", "banana", "coconut");
$a = &$set;
foreach ( $a AS &$item ) {
    $item = strrev($item);
}
print_r($set);

This outputs:

Array
(
    [0] => elppa
    [1] => ananab
    [2] => tunococ
)
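One caveat when taking $item by reference: after the loop ends, $item is still a reference to the last element of the array. If the same variable is reused later, for example as the value variable of another foreach, every assignment to it writes into that last element. The usual fix is to call unset() on the variable right after the loop. A minimal sketch of both cases (the $fixed and $value names are just for illustration):

```php
<?php
// Pitfall: reuse $item after a by-reference loop.
$set = array("apple", "banana", "coconut");
foreach ($set as &$item) {
    $item = strrev($item);
}
// $item still references $set[2] here.
foreach ($set as $item) {
    // Each pass copies the current element into $item, i.e. into $set[2].
}
print_r($set);

// Fix: break the reference as soon as the loop is done.
$fixed = array("apple", "banana", "coconut");
foreach ($fixed as &$value) {
    $value = strrev($value);
}
unset($value); // $value no longer points into $fixed
foreach ($fixed as $value) {
    // Assigning to $value now only touches the loop variable.
}
print_r($fixed);
```

The first print_r() shows elppa, ananab, ananab, because the second loop keeps overwriting the last element; the second one keeps tunococ intact.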

All of these examples also work with associative arrays using the foreach ( $set AS $key => $item ) syntax. $key can never be used by reference; it always comes from the array the loop construct is using, and it cannot be modified. So the tricks used to modify array items in place won’t work for modifying the keys. You can, however, create new keys in the array and unset the existing ones, like so:

$set = array("apple"=>"red","banana"=>"yellow","coconut"=>"brown");
foreach ( $set AS $key => $item ) {
    $set[ucfirst($key)] = $item;
    unset($set[$key]);
}
print_r($set);

This outputs:

Array
(
    [Apple] => red
    [Banana] => yellow
    [Coconut] => brown
)

However, as you may have already noticed, this array was copied before the loop began. If you use the array in a situation where it can’t be copied, such as when it is referenced elsewhere, you will run into problems:

$set = array("apple"=>"red","banana"=>"yellow","coconut"=>"brown");
$a = &$set;
foreach ( $set AS $key => $item ) {
    $set[ucfirst($key)] = $item;
    unset($set[$key]);
}
print_r($set);

This outputs:

Array
(
)

Because the array was referenced and not copied, you get unpredictable results when you alter the structure of the array, especially with unset(). If you remove the unset() call from this example, you are both modifying and looping over the original array, just like the infinite-loop examples earlier. It doesn’t continue forever this time, though, because each assignment specifies a key for $set: once the loop reaches the newly added keys, ucfirst() returns them unchanged, so no further elements are created:

$set = array("apple"=>"red","banana"=>"yellow","coconut"=>"brown");
$a = &$set;
foreach ( $set AS $key => $item ) {
    $set[ucfirst($key)] = $item;
}
print_r($set);

This outputs:

Array
(
    [apple] => red
    [banana] => yellow
    [coconut] => brown
    [Apple] => red
    [Banana] => yellow
    [Coconut] => brown
)

You can prove that it’s still possible to enter an infinite loop by adding a $set[] inside your loop:

$set = array("apple"=>"red","banana"=>"yellow","coconut"=>"brown");
$a = &$set;
foreach ( $set AS $key => $item ) {
    $set[ucfirst($key)] = $item;
    $set[] = $item;
}
print_r($set);

This results in an infinite loop.
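If the goal is only to rename the keys, one way to sidestep the copy-versus-reference question entirely is to build a new array instead of modifying the one being iterated. A minimal sketch using a plain foreach, plus an equivalent one-liner with the standard array_keys(), array_map() and array_combine() functions ($renamed and $renamedToo are illustrative names):

```php
<?php
$set = array("apple" => "red", "banana" => "yellow", "coconut" => "brown");

// Build the renamed array separately; $set is never written to while it is iterated.
$renamed = array();
foreach ($set as $key => $item) {
    $renamed[ucfirst($key)] = $item;
}

// The same transformation without an explicit loop.
$renamedToo = array_combine(array_map('ucfirst', array_keys($set)), $set);

print_r($renamed);
```

Because nothing is written to $set during the loop, the result is the same whether or not $set is referenced elsewhere.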

One interesting thing you can do with the $key => $item syntax when the array is copied is modify the original array structure without fear of causing loop issues:

$set = array("apple"=>"red","banana"=>"yellow","coconut"=>"brown");
foreach ( $set AS $key => $item ) {
    $set[] = ucfirst($item);
    unset($set[$key]);
}
print_r($set);

This outputs:

Array
(
    [0] => Red
    [1] => Yellow
    [2] => Brown
)

As you can see from this example, the array was copied for use by the loop construct. Uses of $set inside the loop still refer to the outer $set, so the unset() call and the $set[] append both work on the original, leaving us with a neatly capitalized version of the original values, re-indexed with numeric keys.
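To see that the loop really walks the original three entries while $set changes underneath it, a small sketch can record each key as it is visited. The $visited array is hypothetical instrumentation added only for this demonstration.

```php
<?php
$set = array("apple" => "red", "banana" => "yellow", "coconut" => "brown");
$visited = array(); // keys seen by the loop, for illustration only

foreach ($set as $key => $item) {
    $visited[] = $key;
    $set[] = ucfirst($item); // appends to the original $set
    unset($set[$key]);       // removes from the original $set
}

print_r($visited); // the three original keys, in order
print_r($set);
```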

This knowledge is useful for developers who are trying to plug memory holes in PHP applications. If you foreach through an array of objects that can be 50MB in size, PHP may create an entire copy of the structure in memory for no reason other than to power the loop. If your loop doesn’t modify the structure of the array or add to it at all, it may be much more efficient to add the “cheat” of $a = &$array; right before your loop to prevent PHP from making a copy.
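Before relying on this trick, it’s worth measuring on your own PHP version. A minimal sketch using memory_get_usage() compares the extra memory used by a read-only by-value loop with a loop that writes to the array; the 100,000-integer array from range() is an arbitrary stand-in for real data. On PHP 7 and later, where arrays are shared copy-on-write, the read-only loop should need almost no extra memory, while the write forces a full duplicate. Results on 5.2.x may differ.

```php
<?php
// Stand-in for a large data set (the size is an arbitrary choice for the demo).
$data = range(1, 100000);

// 1) Read-only by-value loop: the array is never written to.
$before = memory_get_usage();
$peak = $before;
foreach ($data as $value) {
    $peak = max($peak, memory_get_usage());
}
$readGrowth = $peak - $before;

// 2) The same loop, but writing to the array on the first pass.
$before = memory_get_usage();
$peak = $before;
foreach ($data as $i => $value) {
    if ($i === 0) {
        $data[0] = 0; // the write is what forces PHP to duplicate the array
    }
    $peak = max($peak, memory_get_usage());
}
$writeGrowth = $peak - $before;

echo "Read-only loop: +{$readGrowth} bytes\n";
echo "Loop that writes: +{$writeGrowth} bytes\n";
```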

This knowledge is also, hopefully, useful for programmers who can’t figure out why their arrays behave the way they do. Basically, if you don’t use references, the loop executes once for each member of the original array, regardless of what you do to the original.
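A minimal sketch of that rule of thumb: the loop below replaces $set with an empty array on its very first pass, yet because neither $set nor $item is a reference, it still runs once for each of the three original elements. The $iterations counter is only there for illustration.

```php
<?php
$set = array("apple", "banana", "coconut");
$iterations = 0; // illustrative counter

foreach ($set as $item) {
    $iterations++;
    $set = array(); // throw away the original array inside the loop
}

echo "Iterations: {$iterations}\n";
print_r($set);
```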

NOTE: These tests were performed on PHP version 5.2.5. Versions 5.2.0 and earlier behave differently, and PHP 7 changed foreach’s semantics as well. Run these tests yourself under controlled circumstances before relying on PHP to behave in any particular way.