Massive Technical Interviews Tips: C


Friday, December 18, 2015

Program memory allocation: the stack and the heap

http://blog.csdn.net/ns_code/article/details/21260229
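A program compiled from C/C++ divides the memory it uses into the following regions:
1. Stack: allocated and released automatically by the compiler. It holds function arguments, local variables, and similar values, and it operates much like the stack data structure.
2. Heap: normally allocated and released by the programmer. Memory that the programmer never releases may be reclaimed by the OS when the program ends. Note that this heap has nothing to do with the heap data structure; its allocation scheme is closer to a linked list.
3. Global (static) area: global variables and static variables are stored together. Initialized globals and statics share one region, and uninitialized globals and statics share an adjacent one. The system releases this area when the program ends.
4. String-literal (constant) area: constant strings are stored here. The system releases this area when the program ends.
5. Code area: holds the binary code of the function bodies.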


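char *str = "abc";
str = "bcd";    // fine: str itself is an ordinary variable (on the stack when declared inside a function)
str[2] = 'k';   // wrong: string literals live in the read-only constant area, so writing to one is undefined behavior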

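1. Example program

This one was written by a veteran programmer and is very detailed.
// main.cpp
#include <stdlib.h>
#include <string.h>

int a = 0;          // global initialized area
char *p1;           // global uninitialized area

int main()
{
    int b;                      // stack
    char s[] = "abc";           // stack
    char *p2;                   // stack
    const char *p3 = "123456";  // "123456\0" is in the constant area; p3 is on the stack
    static int c = 0;           // global (static) initialized area
    p1 = (char *)malloc(10);
    p2 = (char *)malloc(20);    // the 10-byte and 20-byte blocks are on the heap
    if (p1 != NULL)
        strcpy(p1, "123456");   // "123456\0" is in the constant area; the compiler may
                                // merge it with the literal that p3 points to
    free(p1);
    free(p2);
    return 0;
}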


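2. Heap and stack: theory
2.1 How memory is requested
Stack:
Allocated automatically by the system. For example, when a function declares a local variable int b; the system automatically makes room for b on the stack.
Heap:
The programmer must request the memory explicitly and state its size. In C this is done with the malloc function,
e.g. p1 = (char *)malloc(10);
and in C++ with the new operator,
e.g. p2 = new char[10];
Note, however, that p1 and p2 themselves are on the stack when they are local variables.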


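2.2 How the system responds to a request
Stack: as long as the remaining stack space exceeds the requested size, the system provides the memory; otherwise the request fails with a stack overflow.
Heap: the system (in practice usually the C/C++ runtime allocator, backed by the operating system) keeps a linked list of free memory blocks. When it receives a request from the program, it walks the list looking for the first heap node whose space is larger than the request, removes that node from the free list, and gives the node's space to the program. On most systems the size of the allocation is also recorded at the start of the block, so that a later delete (or free) can release the memory correctly. Because the node that was found is not necessarily exactly the requested size, the system automatically puts the unused remainder back on the free list.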

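2.3 Limits on the size of a request
Stack: on Windows the stack is a data structure that grows toward lower addresses and occupies one contiguous region of memory. In other words, the address of the top of the stack and its maximum capacity are fixed in advance by the system. On Windows the stack size is 2 MB (some sources say 1 MB, which is the default reserve of the Microsoft linker; in either case it is a constant fixed when the program is built). A request that exceeds the remaining stack space causes an overflow, so the space available from the stack is fairly small.
Heap: the heap is a data structure that grows toward higher addresses and is not one contiguous region. This is because the system tracks free memory with a linked list, which is naturally non-contiguous, and the list is walked from low addresses to high addresses. The size of the heap is limited by the virtual memory available on the system, so the heap offers space that is both more flexible and larger.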



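2.4 Allocation efficiency
The stack is allocated automatically by the system, which is fast, but the programmer has no control over it.
Heap memory is obtained with new (or malloc). Allocation is generally slower and prone to fragmentation, but the heap is the most convenient to use.
On Windows there is also VirtualAlloc, which allocates neither on the heap nor on the stack but reserves a block directly in the address space of the process. It is the least convenient to use, but it is fast and the most flexible.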


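2.5 What the stack and the heap contain
Stack: for a function call made by a typical x86 C compiler, the arguments are pushed first (right to left in most C compilers), then the address of the instruction that follows the call in the calling function (the return address), and then the local variables of the called function. Note that static variables are never pushed onto the stack.
When the call ends, the local variables are popped first, then the return address is popped so that execution resumes in the caller at the instruction after the call, and the arguments are removed from the stack.
Heap: the size of a block is usually recorded in a small header at the start of the block. What the rest of the block contains is up to the programmer.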

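2.6 Access efficiency

char s1[] = "aaaaaaaaaaaaaaa";
char *s2 = "bbbbbbbbbbbbbbbbb";
The characters of s1 are copied into the array at run time,
whereas the string that s2 points to is fixed at compile time.
Later accesses, however, are faster for the array on the stack than for a string reached through a pointer (for example, one on the heap).
For example: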
int main()
{
    char a = 1;
    char c[] = "1234567890";
    const char *p = "1234567890";
    a = c[1];   // read directly from the array on the stack
    a = p[1];   // load the pointer first, then read through it
    return 0;
}
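The corresponding assembly code:
10: a = c[1];
00401067 8A 4D F1    mov cl,byte ptr [ebp-0Fh]
0040106A 88 4D FC    mov byte ptr [ebp-4],cl
11: a = p[1];
0040106D 8B 55 EC    mov edx,dword ptr [ebp-14h]
00401070 8A 42 01    mov al,byte ptr [edx+1]
00401073 88 45 FC    mov byte ptr [ebp-4],al
The first access reads the array element straight into the register cl, whereas the second must first load the pointer value into edx and then read the character through edx, which costs one extra memory access.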


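2.7 Summary
The difference between the heap and the stack can be seen through the following analogy.
Using the stack is like eating at a restaurant: you just order (make the request), pay, and eat (use it), then leave when you are full, without worrying about preparation such as washing and chopping vegetables or cleanup such as washing dishes and scrubbing pots. Its advantage is speed, but it offers little freedom.
Using the heap is like cooking your favorite dishes yourself: it is more trouble, but the result suits your own taste and you have much more freedom.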

http://codecloud.net/heap-and-stack-2241.html

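Management

Heap resources are controlled by the programmer, so memory leaks are easy to cause.

Stack resources are managed automatically by the compiler and need no manual control.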

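System response

For the heap, the system keeps a linked list of free memory blocks. When it receives a request from the program, it walks the list looking for the first heap node whose space is larger than the request, removes that node from the free list, and gives the node's space to the program. (Most systems record the size of the allocation at the start of the block so that delete can release it correctly, and they put any unused remainder back on the free list.)

For the stack, as long as the remaining stack space exceeds the requested size the system provides the memory; otherwise the request fails with a stack overflow.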

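Size

The heap is limited by the virtual memory available on the system (in theory 4 GB on a 32-bit system), so heap space is flexible and comparatively large.

The stack size is fixed in advance by the operating system or the compiler.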

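Fragmentation

On the heap, frequent new/delete calls produce many fragments and make the program less efficient.

The stack is a last-in, first-out structure in which every push is matched by a pop, so it does not fragment.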

Growth direction

The heap grows upward, toward higher addresses.

The stack grows downward, toward lower addresses.

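Allocation method

The heap is always allocated dynamically (there is no statically allocated heap).

The stack supports both static and dynamic allocation. Static allocation is done by the compiler (for local variables, for example). Dynamic allocation is done with the alloca function, but stack memory allocated that way is still released automatically, so the programmer does not free it.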

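Allocation efficiency

The heap is provided by the C/C++ library through a complex mechanism, so it is much less efficient than the stack.

The stack is a data structure supported by the machine itself: the hardware dedicates a register to the stack address and provides special instructions for stack operations.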

Monday, June 30, 2014

Stack and Heap in C and C++

The Stack
It's a special region of your computer's memory that stores temporary variables created by each function (including the main() function). The stack is a "FILO" (first in, last out) data structure that is managed and optimized quite closely by the CPU. Every time a function declares a new variable, it is "pushed" onto the stack. Every time a function exits, all of the variables pushed onto the stack by that function are freed (that is, deleted). Once a stack variable is freed, that region of memory becomes available for other stack variables.

Variables allocated on the stack, or automatic variables, are stored directly in this memory. Access to this memory is very fast, and its allocation is worked out when the program is compiled.

1. Lives in RAM (random-access memory), but has direct support from the processor via its stack pointer.
2. The stack pointer is moved down to create new memory and moved up to release that memory.
3. An extremely fast and efficient way to allocate storage, second only to registers.
Every thread requires its own stack. The stacks are separate from one another, and each may grow independently.
very fast access
don't have to explicitly de-allocate variables
space is managed efficiently by CPU, memory will not become fragmented
local variables only
limit on stack size (OS-dependent)
variables cannot be resized

The place where arguments of a function call are stored
The place where registers of the calling function are saved
The place where local data of called function is allocated
The place where called function leaves result for calling function
Supports recursive function calls

The Heap
Variables allocated on the heap, or dynamic variables, have their memory allocated at run time (i.e., as the program is executing). Accessing this memory is a bit slower, but the heap size is limited only by the size of virtual memory. This memory remains allocated until the program explicitly frees it and, as a result, may be accessed outside the block in which it was allocated.

Heap grows toward stack
All threads share the same heap
Data structures may be passed from one thread to another.
variables can be accessed from anywhere in the program through pointers
no fixed limit on memory size (bounded only by available virtual memory)
(relatively) slower access
no guaranteed efficient use of space, memory may become fragmented over time as blocks of memory are allocated, then freed
you must manage memory (you're in charge of allocating and freeing variables)
variables can be resized using realloc()

Difference between the stack and the heap
Both the stack and the heap are stored in RAM.
Every thread has its own stack, but all threads in one application share one heap.
Variable allocation is fast on the stack, whereas on the heap it is slow.
Variables on the stack go out of scope automatically once they are no longer needed, so de-allocation on the stack is automatic. On the heap, C and C++ require manual de-allocation, whereas languages such as Java rely on garbage collection.
On the stack, variables can be accessed without pointers, which makes access fast. That is why the stack is used for local data, method arguments, the call stack, and other data that needs little memory.
Use the stack only when you know at compile time how much memory your data needs. The heap can be used without knowing the required amount of memory in advance.
The stack is used for static memory allocation (sizes fixed at compile time) and the heap for dynamic memory allocation.
The stack is thread-specific and the heap is application-specific.
A thread's stack is freed when the thread terminates, while heap memory that is never freed explicitly is reclaimed only when the application terminates.

Stack Overflow and Heap Overflow (OutOfMemory)

Can an object be stored on the stack instead of the heap?
Yes, an object can be stored on the stack. If you create an object inside a function without using the “new” operator, the object is created and stored on the stack, not on the heap. For example, given a C++ class called Member, the declaration Member m; inside a function creates the object m on the stack.

Can the stack grow in size? Can the heap grow in size?
The stack is set to a fixed size and cannot grow past that size (although some languages have extensions that do allow this). So, if there is not enough room on the stack for the memory being assigned to it, a stack overflow occurs. This often happens when function calls are nested very deeply or when a recursive call never terminates.

If the current size of the heap is too small to accommodate new memory, then more memory can be added to the heap by the operating system.

What can go wrong with the stack and the heap?
If the stack runs out of memory, then this is called a stack overflow – and could cause the program to crash.
The heap can suffer from fragmentation, which occurs when the available memory on the heap is scattered in noncontiguous (disconnected) blocks because used blocks sit between the unused ones. With excessive fragmentation, a new allocation may fail even though enough memory is free in total, because no single free block is large enough for the request.
Running out of heap space is generally reported as 'out of memory'.

References
http://www.quora.com/Objective-C-programming-language/What-is-the-difference-between-the-stack-and-the-heap
http://www.programmerinterview.com/index.php/data-structures/difference-between-stack-and-heap/
http://timmurphy.org/2010/08/11/the-difference-between-stack-and-heap-memory-allocation/comment-page-1/

Memory Layout of C Programs | GeeksforGeeks

A typical memory representation of a C program consists of the following sections.

1. Text Segment:
A text segment, also known as a code segment or simply as text, is one of the sections of a program in an object file or in memory, and it contains the executable instructions.

As a memory region, a text segment may be placed below the heap or stack in order to prevent heap and stack overflows from overwriting it.

Usually, the text segment is sharable so that only a single copy needs to be in memory for frequently executed programs. Also, the text segment is often read-only, to prevent a program from accidentally modifying its instructions.

2. Initialized Data Segment:
The initialized data segment, usually called simply the data segment, is a portion of the virtual address space of a program that contains the global variables and static variables initialized by the programmer.

Note that the data segment as a whole is not read-only, since the values of the variables can be altered at run time.

This segment can be further classified into an initialized read-only area and an initialized read-write area.

3. Uninitialized Data Segment:
The uninitialized data segment, often called the BSS segment, starts at the end of the data segment and contains all global variables and static variables that are initialized to zero or have no explicit initialization in the source code.

4. Stack:
The stack area traditionally adjoined the heap area and grew the opposite direction; when the stack pointer met the heap pointer, free memory was exhausted.

The stack area contains the program stack, a LIFO structure, typically located in the higher parts of memory. A “stack pointer” register tracks the top of the stack; it is adjusted each time a value is “pushed” onto the stack. The set of values pushed for one function call is termed a “stack frame”; a stack frame consists at minimum of a return address.

The stack is where automatic variables are stored, along with information that is saved each time a function is called. Each time a function is called, the address to return to and certain information about the caller’s environment, such as some of the machine registers, are saved on the stack. The newly called function then allocates room on the stack for its automatic and temporary variables. This is how recursive functions in C can work. Each time a recursive function calls itself, a new stack frame is used, so one set of variables doesn’t interfere with the variables from another instance of the function.

5. Heap:
Heap is the segment where dynamic memory allocation usually takes place.

The heap area begins at the end of the BSS segment and grows to larger addresses from there. The Heap area is shared by all shared libraries and dynamically loaded modules in a process.

Read full article from Memory Layout of C Programs | GeeksforGeeks

Thursday, June 19, 2014

G-Facts from geeksforgeeks

In the C language, sizeof( ) is an operator. Though it looks like a function, it is a unary operator.

To find the IP address(es) of a website's host name, nslookup can be used at the shell or command prompt (cmd.exe). It is available on both Linux and Windows.

In C, function parameters are always passed by value. Pass-by-reference is simulated in C by explicitly passing pointer values.
In ISO C, you can define main either to take no arguments, or to take two arguments that represent the command line arguments to the program, like this:

int main (int argc, char *argv[])
Other platform-dependent formats are also allowed by the C and C++ standards; for example, Unix (though not POSIX.1) and Microsoft Visual C++ accept a third argument giving the program’s environment, which is otherwise accessible through getenv in stdlib.h.

Bootstrapping (compilers)
http://en.wikipedia.org/wiki/Bootstrapping_(compilers)
In computer science, bootstrapping is the process of writing a compiler (or assembler) in the target programming language which it is intended to compile. Applying this technique leads to a self-hosting compiler.
Many compilers for many programming languages are bootstrapped, including compilers for BASIC, ALGOL, C, Pascal, PL/I, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Java, Python, Scala, Nimrod, Eiffel, and more.

“Pointer arithmetic and array indexing [that] are equivalent in C, pointers and arrays are different” – Wayne Throop

G-Fact 8
To uniquely construct a Binary Tree, Inorder together with either Postorder or Preorder must be given. However, either a Postorder or a Preorder traversal alone is sufficient to uniquely construct a Binary Search Tree with distinct keys: sorting the given Preorder or Postorder traversal yields the Inorder traversal, which supplies the second traversal needed to construct the tree.

The number of structurally different Binary Trees with n nodes is the Catalan number C_n = (2n)! / ((n+1)! * n!)
http://mathworld.wolfram.com/BinaryTree.html
The numbers of binary trees with n = 1, 2, 3, 4, 5, ... nodes are 1, 2, 5, 14, 42, ... (Sloane's A000108), which are the Catalan numbers C_n.

Enumeration constants (enum values) are always of type int in C, whereas they are distinct types in C++ and may have size different from that of int.

In C, the struct keyword must be used when declaring structure variables, but it is optional in C++.
struct node {
    int x;
    node *next; // Error in C, struct must be there. Works in C++
};

Predict the output of the following program.

#include <stdio.h>
int main()
{
    int x = 012;
    printf("%d", x);
    getchar();
    return 0;
}
The program prints 10. Putting a 0 before an integer constant makes it an octal number and putting 0x (or 0X) makes it a hexadecimal number. It is easy to put a 0 by accident, or as a habit. The mistake is very common with beginners.

Read full article from G-Facts from geeksforgeeks