c - Memory Barriers in Single Producer Single Consumer Queue -
i've spent last few weeks doing lot of reading on memory models, compiler reordering, cpu reordering, memory barriers, , lock-free programming , think i've driven myself confusion now. i've written single producer single consumer queue , trying figure out need memory barriers things work , if operations need atomic. single producer single consumer queue follows:
typedef struct queue_node_t {     int data;     struct queue_node_t *next; } queue_node_t;  // empty queue looks this: // head tail //   |    | // dummy_node  // queue: insert @ tail, remove head //    head           tail //     |              | // dummy_node -> 1 -> 2 -> null  typedef struct queue_t {     queue_node_t *head; // consumer consumes head     queue_node_t *tail; // producer adds @ tail } queue_t;  queue_node_t *alloc_node(int data) {     queue_node_t *new_node = (queue_node_t *)malloc(sizeof(queue_node_t));     new_node->data = data;     new_node->next = null;     return new_node; }  queue_t *create_queue() {     queue_t *new_queue = (queue_t *)malloc(sizeof(queue_t));     queue_node_t *dummy_node = alloc_node(0);     dummy_node->next = null;     new_queue->head = dummy_node;     new_queue->tail = dummy_node;     // 1. need kind of barrier make sure if     // thread didn't call performs queue operation     // , happens run on different cpu queue structure     // observed it? i.e. head , tail     // initialized     return new_queue; }  // enqueue modifies tail void enqueue(queue_t *the_queue, int data) {     queue_node_t *new_node = alloc_node(data);     // insert @ tail     new_node->next = null;      // let save off existing tail     queue_node_t *old_tail = the_queue->tail;      // make new node new tail     the_queue->tail = new_node;      // 2. store/store barrier needed here?      // link in new node last concurrent dequeue doesn't see     // node until we're done     // don't know needs atomic need have     // release semantics isn't visible until prior writes done     old_tail->next = the_queue->tail;     return; }  // dequeue modifies head bool dequeue(queue_t *the_queue, int *item) {     // 3. need barrier here make sure if enqueue happened     // can observe it? i.e., if enqueue called on      // empty queue thread 0 on cpu0 , dequeue called     // thread 1 on cpu1     // dequeue oldest item (fifo) @ head     if (the_queue->head->next == null) {         return false;     }     *item = the_queue->head->next->data;     queue_node_t *old_head = the_queue->head;     the_queue->head = the_queue->head->next;     free(old_head);     return true; } here questions corresponding comments in code above:
- in create_queue()need kind of barrier before return? i'm wondering if call function thread 0 running on cpu0 , use pointer returned in thread 1 happens run on cpu1 possible thread 1 seesqueue_tstructure isn't initialized?
- do need barrier in enqueue()make sure new node isn't linked in queue until of new node's fields initialized?
- do need barrier in dequeue()? feel correct without 1 may need 1 if want make sure see completed enqueue.
update: tried make clear comments in code head of queue points dummy node. common technique makes producer ever needs access tail , consumer ever accesses head. empty queue contain dummy node , dequeue() returns node after head, if there one. nodes dequeued dummy node advances , previous "dummy" freed.
first of all, depends specific hardware-architecture, os, language, etc.
1.) no. because need additional barrier pass pointer other thread anyway
2.) yes, old_tail->next = the_queue->tail needs executed after the_queue->tail = new_node
3.) have no effect, since there nothing before barrier, theoretically need barrier after old_tail->next = the_queue->tail in enqueue(). compiler wont reorder outside of function, cpu maybe that. (very unlikely, not 100% sure)
ot: since doing micro-optimization, add padding cache
typedef struct queue_t {     queue_node_t *head; // consumer consumes head     char cache_pad[64]; // head , tail shouldnt in same cache-line(->64 byte)     queue_node_t *tail; // producer adds @ tail } queue_t; and if have enough memory waste,
typedef struct queue_node_t {     int data;     struct queue_node_t *next;     char cache_pad[56]; // sizeof(queue_node_t) == 64; 32bit } queue_node_t; 
Comments
Post a Comment