Archive for January, 2009
PHP Memory Management in Foreach
Many developers, even experienced ones, are confused by the way PHP handles arrays in foreach loops. In the standard foreach loop, PHP makes a copy of the array that is used in the loop. The copy is discarded immediately after the loop finishes. This is transparent in the operation of a simple foreach loop. For example:
$set = array("apple", "banana", "coconut");
foreach ( $set AS $item ) {
echo "{$item}\n";
}
This outputs:
apple
banana
coconut
Even though the copy is created, the developer doesn’t notice, because the original array isn’t referenced within the loop or after the loop finishes. However, when you attempt to modify the items in a loop, you find that they are unmodified when you finish:
$set = array("apple", "banana", "coconut");
foreach ( $set AS $item ) {
$item = strrev ($item);
}
print_r($set);
This outputs:
Array
(
[0] => apple
[1] => banana
[2] => coconut
)
There are no changes from the original, even though you clearly assigned a value to $item. This is because you are operating on $item as it appears in the copy of $set being worked on. You can override this by grabbing $item by reference, like so:
$set = array("apple", "banana", "coconut");
foreach ( $set AS &$item ) {
$item = strrev($item);
}
print_r($set);
This outputs:
Array
(
[0] => elppa
[1] => ananab
[2] => tunococ
)
As you can see, when $item is operated on by-reference, the changes made to $item are made to the members of the original $set. Using $item by reference also prevents PHP from creating the array copy. To test this, first we’ll show a quick script demonstrating the copy:
$set = array("apple", "banana", "coconut");
foreach ( $set AS $item ) {
$set[] = ucfirst($item);
}
print_r($set);
This outputs:
Array
(
[0] => apple
[1] => banana
[2] => coconut
[3] => Apple
[4] => Banana
[5] => Coconut
)
In this example, PHP copied $set and used the copy to loop over, but when $set was used inside the loop, PHP added the values to the original array, not the copied array. Basically, PHP is only using the copied array for the execution of the loop and the assignment of $item. Because of this, the loop above only executes 3 times, and each time it appends another value to the end of the original $set. That leaves the original $set with 6 elements, and the loop never becomes infinite.
However, what if we had used $item by reference, as I mentioned before? A single character added to the above test:
$set = array("apple", "banana", "coconut");
foreach ( $set AS &$item ) {
$set[] = ucfirst($item);
}
print_r($set);
Results in an infinite loop. Note that this really is an infinite loop: you’ll have to either kill the script yourself or wait for it to run out of memory. I added the following line to my script so PHP would run out of memory very quickly, and I suggest you do the same if you’re going to run these infinite-loop tests:
ini_set("memory_limit","1M");
So in this previous example with the infinite loop, we see the reason why PHP was written to create a copy of the array to loop over. When a copy is created and used only by the structure of the loop construct itself, the array stays static throughout the execution of the loop, so you’ll never run into issues.
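If you would rather watch the difference without hanging your script, a guard counter can stop the loop after a fixed number of passes. This is only a sketch: $maxPasses and the counter names are made up for illustration, and the by-value half assumes nothing else holds a reference to the array.

```php
<?php
$maxPasses = 10; // hypothetical safety limit, not part of the original tests

// By value: the loop only ever sees the three original elements.
$set = array("apple", "banana", "coconut");
$valuePasses = 0;
foreach ($set as $item) {
    $set[] = ucfirst($item);
    if (++$valuePasses >= $maxPasses) {
        break;
    }
}

// By reference: the loop keeps walking into the elements it appends.
$refSet = array("apple", "banana", "coconut");
$refPasses = 0;
foreach ($refSet as &$refItem) {
    $refSet[] = ucfirst($refItem);
    if (++$refPasses >= $maxPasses) {
        break; // without this guard the loop would not end
    }
}
unset($refItem); // drop the leftover reference

echo "by value: {$valuePasses} passes, " . count($set) . " elements\n";
echo "by reference: {$refPasses} passes, " . count($refSet) . " elements\n";
```

The by-value loop should stop on its own after 3 passes with 6 elements, while the by-reference loop only stops because the guard breaks out of it.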
But wait, there’s more. PHP fails to create a copy of the array if a reference is used at all. We know that referencing $item will cause the infinite loop scenario above, but if $set is referenced anywhere else in the script, even the non-referencing foreach format will break:
$set = array("apple", "banana", "coconut");
$a = &$set;
foreach ( $set AS $item ) {
$set[] = ucfirst($item);
}
Results in an infinite loop, even though $item isn’t by reference. Using $a instead of $set gives identical results.
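Since this behaviour depends on the PHP version, a guarded probe is a safer way to check your own build than a real infinite loop. The sketch below is an assumption-laden test harness, not part of the original tests: $maxPasses is a made-up limit. As far as I know, PHP 7 and later always run a by-value foreach over a snapshot of the array, so there the probe should report 3 passes even with the extra reference.

```php
<?php
$maxPasses = 10; // hypothetical guard so the probe always terminates

$set = array("apple", "banana", "coconut");
$a = &$set; // the extra reference under test
$passes = 0;
foreach ($set as $item) {
    $set[] = ucfirst($item);
    if (++$passes >= $maxPasses) {
        break;
    }
}

if ($passes >= $maxPasses) {
    echo PHP_VERSION . ": foreach walked the live array (guard hit)\n";
} else {
    echo PHP_VERSION . ": foreach walked a snapshot ({$passes} passes)\n";
}
```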
This is not to say that $item is implicitly used by reference if $set is referenced. See this example:
$set = array("apple", "banana", "coconut");
$a = &$set;
foreach ( $a AS $item ) {
$item = ucfirst($item);
}
print_r($set);
This outputs:
Array
(
[0] => apple
[1] => banana
[2] => coconut
)
$set is unchanged from the original values because, even though $set is referenced by $a and has not been copied, $item still receives only a copy of each element’s value and will not pass modifications back to $set. You will still have to assign it by reference to make changes to the original array:
$set = array("apple", "banana", "coconut");
$a = &$set;
foreach ( $a AS &$item ) {
$item = strrev($item);
}
print_r($set);
This outputs:
Array
(
[0] => elppa
[1] => ananab
[2] => tunococ
)
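One side effect of the by-reference form is worth a quick sketch: after the loop ends, $item is still a reference to the last element, so a later assignment to it writes into the array. The variable names below ($kept, $safe, $entry) are invented for the illustration.

```php
<?php
// Without unset(): the leftover reference still points at the last element.
$kept = array("apple", "banana", "coconut");
foreach ($kept as &$item) {
    $item = strrev($item);
}
$item = "oops"; // overwrites $kept[2] through the reference
unset($item);

// With unset(): the reference is broken before the variable is reused.
$safe = array("apple", "banana", "coconut");
foreach ($safe as &$entry) {
    $entry = strrev($entry);
}
unset($entry);
$entry = "oops"; // ordinary variable now, $safe is untouched

print_r($kept); // last element is "oops"
print_r($safe); // last element is "tunococ"
```

Calling unset() on the loop variable right after a by-reference foreach is a cheap habit that avoids this.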
All of these examples also work with associative arrays using the foreach ( $set AS $key => $item ) syntax. $key can never be used by reference: it always comes from the array the loop construct is using, and cannot be modified. So the tricks used to modify array items in position won’t work for modifying the keys. You can, however, create new keys in the array and unset the existing ones, like so:
$set = array("apple"=>"red","banana"=>"yellow","coconut"=>"brown");
foreach ( $set AS $key => $item ) {
$set[ucfirst($key)] = $item;
unset($set[$key]);
}
print_r($set);
This outputs:
Array
(
[Apple] => red
[Banana] => yellow
[Coconut] => brown
)
However, as you may have already noticed, this array was copied before the loop began. If you were using the array in a situation where it couldn’t be copied, you will run into errors:
$set = array("apple"=>"red","banana"=>"yellow","coconut"=>"brown");
$a = &$set;
foreach ( $set AS $key => $item ) {
$set[ucfirst($key)] = $item;
unset($set[$key]);
}
print_r($set);
This outputs:
Array
(
)
Because the array was referenced and not copied, you get unpredictable results when attempting to alter the structure of the array, especially using unset(). Without the unset() call in this example, you operate on and loop through the original array, so you have the same infinite-loop-generating code as before. It doesn’t continue forever, though, because we’re specifying the key for $set: once the loop reaches the capitalized keys, ucfirst() returns the same key, and the assignment overwrites an existing element instead of appending a new one:
$set = array("apple"=>"red","banana"=>"yellow","coconut"=>"brown");
$a = &$set;
foreach ( $set AS $key => $item ) {
$set[ucfirst($key)] = $item;
}
print_r($set);
This outputs:
Array
(
[apple] => red
[banana] => yellow
[coconut] => brown
[Apple] => red
[Banana] => yellow
[Coconut] => brown
)
You can prove that it’s still possible to enter an infinite loop by adding a $set[] inside your loop:
$set = array("apple"=>"red","banana"=>"yellow","coconut"=>"brown");
$a = &$set;
foreach ( $set AS $key => $item ) {
$set[ucfirst($key)] = $item;
$set[] = $item;
}
print_r($set);
This results in an infinite loop.
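If you need to rename keys and can’t be sure whether the array will be copied, one option is to loop over an explicit snapshot of the keys instead of the array itself. This is a sketch rather than a tested recommendation for every PHP version; it relies only on array_keys() returning a new array, and it assumes none of the keys is already capitalized (otherwise the unset() would delete the element that was just written).

```php
<?php
$set = array("apple" => "red", "banana" => "yellow", "coconut" => "brown");
$a = &$set; // the reference that caused trouble above

// array_keys() builds a separate array, so the loop never walks $set itself.
foreach (array_keys($set) as $key) {
    $set[ucfirst($key)] = $set[$key]; // add the renamed entry
    unset($set[$key]);                // remove the old one
}

print_r($set); // keys are now Apple, Banana, Coconut
```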
One interesting thing you can do with the $key => $item syntax when the array is copied is modify the original array structure without fear of causing loop issues:
<?php
$set = array("apple"=>"red","banana"=>"yellow","coconut"=>"brown");
foreach ( $set AS $key => $item ) {
$set[] = ucfirst($item);
unset($set[$key]);
}
print_r($set);
This outputs:
Array
(
[0] => Red
[1] => Yellow
[2] => Brown
)
As you can see from this example, the array was copied for use in the loop construct. References to $set within the loop still refer to the outer version of $set, so the unset() call and the $set[] addition work on the original, leaving us with a nicely upper-cased version of the original, without keys.
This knowledge is useful for developers who are trying to plug memory holes in PHP applications. If you foreach through an array of objects that can be 50MB in size, you create an entire copy of the structure in memory for no reason other than to power the loop. If your loop doesn’t modify the structure of the array or add to it at all, it would be vastly more efficient to add the “cheat” of $a = &$array; right before your loop to prevent PHP from making a copy.
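Before restructuring code around this, it is worth measuring whether the copy actually costs anything on your build, because PHP copies arrays lazily (copy-on-write) in many situations. The following is a rough sketch using memory_get_usage(); the array size is arbitrary, and the number it prints will vary between PHP versions, so treat it as a measuring stick rather than a result.

```php
<?php
$big = range(1, 100000); // arbitrary test data

$before = memory_get_usage();
$peak = $before;
foreach ($big as $value) {
    // Track the highest memory use seen while the loop is running.
    $peak = max($peak, memory_get_usage());
}
$growth = $peak - $before;

echo PHP_VERSION . ": memory grew by {$growth} bytes during the loop\n";
```

Running the same script with and without $a = &$big; before the loop shows whether the “cheat” makes any difference on your version.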
This knowledge is also hopefully useful for programmers who cannot figure out why their arrays are behaving the way they are. Basically, if you don’t use references, the loop executes once for each member in the original array, regardless of what you do to the original.
NOTE: These tests were performed on PHP version 5.2.5. 5.2.0 and earlier perform differently. Run these tests yourself under controlled circumstances before relying on PHP to behave in any particular way.
PHP Type Conversions for Comparison
There has been some discussion recently among our dev team regarding PHP type conversion. I’ll give some of the problems we’ve run into and then try to shed some light on the inner workings of PHP when it does comparisons.
The first example may seem familiar to most seasoned developers, but when chained together it brings up an interesting point about PHP: The == operator isn’t transitive.
echo (null == 0 ? "YES" : "NO") . "\n"; //YES
echo ("null" == 0 ? "YES" : "NO") . "\n"; //YES
echo ("null" == null ? "YES" : "NO") . "\n"; //NO
As you can see, null == 0 and 0 == “null”, but null != “null”.
You may be familiar with the following kind of error. The erroneous code is usually similar to:
if ( $a = "Hello" && $b != "World" )
Seeded with $b = “World”, the statement assigned FALSE to $a. This is because there was a single = instead of == in $a = “Hello”, so PHP interpreted the whole condition as an assignment. The && operator binds more tightly than =, so the expression was evaluated as $a = (“Hello” && $b != “World”). Since $b was equal to “World”, $b != “World” returned FALSE. “Hello” was converted to TRUE for the &&, and the result of TRUE && FALSE, which is FALSE, was assigned to $a.
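A short script makes the grouping visible. The starting value “Goodbye” and the variable names $seededWorld, $seededEarth, $branchTaken and $intended are only there for the illustration.

```php
<?php
// Seed from the post: $b equals "World".
$b = "World";
$a = "Goodbye"; // hypothetical starting value
if ($a = "Hello" && $b != "World") {
    echo "not reached\n";
}
$seededWorld = $a;
var_dump($seededWorld); // bool(false)

// A different seed: the branch now runs although $a never held "Hello".
$b = "Earth";
$a = "Goodbye";
$branchTaken = false;
if ($a = "Hello" && $b != "World") {
    $branchTaken = true;
}
$seededEarth = $a;
var_dump($seededEarth, $branchTaken); // bool(true) bool(true)

// With == the original value of $a is compared, not replaced.
$a = "Goodbye";
$intended = ($a == "Hello" && $b != "World");
var_dump($intended); // bool(false)
```

In both buggy cases $a ends up holding a boolean, never the string “Hello”.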
PHP has a certain order of precedence for data types. It is defined loosely in the manual’s comparison operators page, but I will try to spell it out more explicitly here. There are 8 basic types of data in PHP. In order of precedence, they are:
- Boolean
- Object
- Array
- Floating Point Number
- Integer
- String
- Resource
- NULL
That is to say, if you compare any two data types on the list, the variable with the data type lower on the list will be converted to the upper variable’s data type, and then the comparison is applied. However, when we apply this hard and fast rule to the first example, we find it lacking. In reality, there are certain comparisons where the types are so far apart that PHP converts BOTH operands to a third data type. The first example actually works out like:
- null == 0. Both were converting to FALSE, so the comparison was succeeding.
- “null” == 0. “null” was converting to 0, so the comparison was succeeding.
- “null” == null. NULL was converting to "", which is not equal to “null”, so the comparison was failing.
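The same three steps can be spelled out with explicit casts, which behave the same way regardless of how == itself is implemented. This is just a sketch of the conversions described above, and the variable names are invented. One caveat I’m aware of: PHP 8 changed how a non-numeric string is compared with a number, so “null” == 0 itself gives NO there, even though the cast below still produces 0.

```php
<?php
// null == 0: both sides become booleans.
$nullVsZero = ((bool) null === (bool) 0);
var_dump($nullVsZero); // bool(true)

// "null" == 0: the string is read as a number and becomes 0.
$stringVsZero = ((int) "null" === 0);
var_dump($stringVsZero); // bool(true)

// "null" == null: NULL becomes the empty string.
$stringVsNull = ("null" === (string) null);
var_dump($stringVsNull); // bool(false)
```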
It’s much more easily represented as a table:
| | Boolean | Object | Array | Floating Point | Integer | String | Resource | NULL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Boolean | | Boolean: objects always resolve to true | Boolean: empty arrays are false, all others are true | Boolean: 0 resolves to false, all others are true | Boolean: 0 resolves to false, all others are true | Boolean: "" resolves to false, all others are true | Boolean: resources always resolve to true | Boolean: NULL is always false |
| Object | Boolean: objects always resolve to true | | Objects are always greater-than | Objects are always greater-than | Objects are always greater-than | Objects are always greater-than | Objects are always greater-than | Boolean: objects always resolve to true |
| Array | Boolean: empty arrays are false, all others are true | Objects are always greater-than | | Arrays are always greater-than | Arrays are always greater-than | Arrays are always greater-than | Arrays are always greater-than | Boolean: empty arrays are false, all others are true |
| Floating Point | Boolean: 0 resolves to false, all others are true | Objects are always greater-than | Arrays are always greater-than | | Floating Point | Floating Point | Floating Point | Boolean: 0 resolves to false, all others are true |
| Integer | Boolean: 0 resolves to false, all others are true | Objects are always greater-than | Arrays are always greater-than | Floating Point | | Floating Point | Integer | Boolean: 0 resolves to false, all others are true |
| String | Boolean: 0 resolves to false, all others are true | Objects are always greater-than | Arrays are always greater-than | Floating Point | Floating Point | | Floating Point | String: NULL is converted to "" |
| Resource | Boolean: resources always resolve to true | Objects are always greater-than | Arrays are always greater-than | Floating Point | Integer | Floating Point | | Boolean: resources always resolve to true |
| NULL | Boolean: NULL resolves to false | Boolean: objects always resolve to true (never == null) | Boolean: empty arrays are false, all others are true | Boolean: 0 resolves to false, all others are true | Boolean: 0 resolves to false, all others are true | String: NULL is converted to "" | Boolean: resources always resolve to true (never == null) | |
Where the table says that one type is “always greater-than”, no conversion is made, which means that those two data types will never == each other. However, in most of those situations data types are given specific return values for quantitative comparisons, such as greater-than and less-than. Note the specific case of NULL, where almost every instance of comparing to NULL results in both types being converted to Boolean.
Armed with this information, we are now capable of determining the outcome of almost any comparison in PHP. We know, for instance, that array() is greater than “Hello”, but “Hello” is less than 2. We know that a stdClass object is greater than an array, but that an object and any non-empty array are both equal to TRUE. There are plenty of places where PHP contradicts normal logic, because of the sometimes convoluted process involved in comparing different data types.
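A few of these are easy to check directly. The snippet below is a small sketch that sticks to comparisons I expect to behave the same across PHP versions; the $checks array and its labels are just for display. Note that an empty array() compares equal to FALSE, as the table says, so only non-empty arrays are equal to TRUE.

```php
<?php
$checks = array(
    'array() > "Hello"'      => array() > "Hello",
    'new stdClass() == true' => new stdClass() == true,
    'array() == true'        => array() == true,
    'array(1) == true'       => array(1) == true,
);

foreach ($checks as $label => $result) {
    echo $label . " : " . ($result ? "YES" : "NO") . "\n";
}
```

This should print YES, YES, NO, YES in that order.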
The fact that PHP sometimes internally converts two operands to a third, unrelated data type can be quite confusing. I hope, however, that the chart in this article will help you work out exactly what it’s doing.
Of course, as one of our lead developers is quick to point out, this whole discussion would be moot if everyone used ===.
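For comparison, here is the first example again with === in place of ==. Because === never converts types, all three lines print NO. The $strict array is only there to collect the results.

```php
<?php
$strict = array(
    'null === 0'      => (null === 0),
    '"null" === 0'    => ("null" === 0),
    '"null" === null' => ("null" === null),
);

foreach ($strict as $label => $result) {
    echo $label . " : " . ($result ? "YES" : "NO") . "\n"; // NO each time
}
```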