Massive Technical Interviews Tips: Loop order and Iliffe vector

Friday, August 25, 2017

Loop order and Iliffe vector

http://tengfei.tech/2016/06/11/How-does-looping-order-affect-the-performance-of-a-2D-array/

How does the order of the loops affect performance when iterating over a 2D array?

int[][] array = new int[100][100];
for(int i = 0 ; i < 100 ; i++ ){
    for(int j = 0 ; j < 100 ; j++ ){
        System.out.printlin(array[j][i]); // column first
        System.out.printlin(array[i][j]); // row first
    }
}

In computer programming, an Iliffe vector, also known as a display, is a data structure used to implement multi-dimensional arrays. An Iliffe vector for an n-dimensional array (where n ≥ 2) consists of a vector (or 1-dimensional array) of pointers to an (n − 1)-dimensional array. They are often used to avoid the need for expensive multiplication operations when performing address calculation on an array element. They can also be used to implement jagged arrays, such as triangular arrays, triangular matrices and other kinds of irregularly shaped arrays. The data structure is named after John K. Iliffe.

Their disadvantages include the need for multiple chained pointer indirections to access an element, and the extra work required to determine the next row in an n-dimensional array to allow an optimising compiler to prefetch it. Both of these are a source of delays on systems where the CPU is significantly faster than main memory.

The Iliffe vector for a 2-dimensional array is simply a vector of pointers to vectors of data, i.e., the Iliffe vector represents the columns of an array where each column element is a pointer to a row vector.

Multidimensional arrays in languages such as Java, Python (multidimensional lists), Ruby, Visual Basic .NET, Perl, PHP, JavaScript, Objective-C (when using NSArray, not a row-major C-style array), Swift, and Atlas Autocode are implemented as Iliffe vectors.

Iliffe vectors are contrasted with dope vectors in languages such as Fortran, which contain the stride factors and offset values for the subscripts in each dimension.

But why does this matter? All memory accesses are the same, right?

NO. Because of CPU caches.

A CPU cache is a hardware cache used by the Central Processing Unit (CPU) of a computer to reduce the average time to access data from the main memory. The cache is a smaller, faster memory which stores copies of the data from frequently used main memory locations.

Data from memory gets brought over to the CPU in little chunks (called ‘cache lines’), typically 64 bytes. If you have 4-byte integers, that means you’re getting 16 consecutive integers in a neat little bundle. It’s actually fairly slow to fetch these chunks of memory.

If you access the data in a consecutive memory order, every time you access a variable, it will directly read from CPU cache. 100*100/16 times fetches from memory.

But if you access the data in a nonconsecutive memory order, it will fetch the data from memory to CPU cache. 100*100 times fetches from memory.

It is very obvious that which way is more efficient.

http://www.wikiwand.com/en/Array_data_structure

Multidimensional arrays

For multi dimensional array, the element with indices i,j would have address B + c · i + d · j, where the coefficients c and d are the row and column address increments, respectively.

More generally, in a k-dimensional array, the address of an element with indices i1, i2, ..., ik is

B + c1 · i1 + c2 · i2 + ... + ck · ik.

For example: int a[2][3];

This means that array a has 2 rows and 3 columns, and the array is of integer type. Here we can store 6 elements they are stored linearly but starting from first row linear then continuing with second row. The above array will be stored as a11, a12, a13, a21, a22, a23.

Friday, August 25, 2017

Loop order and Iliffe vector

Multidimensional arrays

Labels

Popular Posts