Managing mega Arrays in PHP


I'm data-mining millions of old log entries for someone, and I really want to use PHP for this so I can present the results and link them easily to the existing PHP system.

I run this code with PHP 5.4.4 in the Terminal (OS X 10.8):

// Settings
ini_set('error_reporting', E_ALL); // Shows all feedback from the parser for debugging
ini_set('max_execution_time', 0); // Changes the 30 seconds parser exit to infinite
ini_set('memory_limit', '512M'); // Sets the memory that may be used to 512MegaBytes


echo 'Start memory usage: '.(memory_get_usage(TRUE) / 1024)."\n";

$x = Array();
for ($i = 0; $i < 1e7; $i++) {
    $x[$i] = 1 * rand(0, 10);
    //unset($x[$i]);
}

echo 'End memory usage: '.(memory_get_usage(TRUE) / 1024)."\n";
echo 'Peak memory usage: '.(memory_get_peak_usage(TRUE) / 1024)."\n";

This is a simple test with ten million iterations. The memory consumption is really bad compared to using dictionaries in Python :(.

When I uncomment the unset() call to test the usage, it's instantly all unicorns and rainbows. So forcing the release of the memory seems to work well.

Is there any way I can still maintain 10-50 million array entries within that 512M memory limit?

I can't imagine what would happen if I ran some regex inside these kinds of loops, either.

This Question Has 2 Answers | Original Question | user1467267

Use SplFixedArray. To understand why, you really need to read "How big are PHP arrays (and values) really? (Hint: BIG!)". Compare a regular array:

$t = 1e6;
$x = array();
for($i = 0; $i < $t; $i ++) {
    $x[$i] = 1 * rand(0, 10);
}

Output (in KB)

Start memory usage: 256
End memory usage: 82688
Peak memory usage: 82688

and

$t = 1e6;
$x = new SplFixedArray($t);
for($i = 0; $i < $t; $i ++) {
    $x[$i] = 1 * rand(0, 10);
}

Output (in KB)

Start memory usage: 256
End memory usage: 35584
Peak memory usage: 35584
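
To reproduce this comparison in a single run, something like the following sketch could be used. The helper name `measureGrowthKb` and the smaller element count are assumptions made for illustration and are not part of the original answer. The exact numbers depend heavily on the PHP version, so no expected output is shown.

```php
<?php
// Hypothetical helper (not from the original post): builds a container via
// $build and returns how many KB memory_get_usage() grew while it was alive.
function measureGrowthKb(callable $build)
{
    $before = memory_get_usage();
    $data = $build();   // keep a reference so the memory stays allocated
    $after = memory_get_usage();
    unset($data);       // release it before the next measurement
    return ($after - $before) / 1024;
}

$t = 100000; // assumption: smaller than the 1e6 used above so the sketch runs quickly

// Regular PHP array, filled the same way as in the examples above.
$plainKb = measureGrowthKb(function () use ($t) {
    $x = array();
    for ($i = 0; $i < $t; $i++) {
        $x[$i] = rand(0, 10);
    }
    return $x;
});

// SplFixedArray with the size allocated up front.
$fixedKb = measureGrowthKb(function () use ($t) {
    $x = new SplFixedArray($t);
    for ($i = 0; $i < $t; $i++) {
        $x[$i] = rand(0, 10);
    }
    return $x;
});

printf("array():       %d KB\n", $plainKb);
printf("SplFixedArray: %d KB\n", $fixedKb);
```

This sketch calls memory_get_usage() without the TRUE argument, so it reports memory actually in use by the script rather than memory reserved from the system, which makes the difference between the two containers easier to isolate. The figures will therefore not match the outputs above exactly.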

Better still, I think you should consider an in-memory database like Redis.

If SplFixedArray doesn't work for you, I would strongly recommend using RabbitMQ -> http://www.rabbitmq.com/tutorials/tutorial-one-php.html

RabbitMQ is simpler to configure and use than people normally think, and it has a good library for PHP.

With RabbitMQ your script can be ten, twenty, or a hundred times faster (depending on the number of consumers you run), and because the data is processed message by message rather than held in one array, you can also handle any amount of data.

I have used RabbitMQ to import millions of rows to retrieve information about all cars registered in Denmark; imagine how big that can be.

