c - Memory Barriers in Single Producer Single Consumer Queue


I've spent the last few weeks doing a lot of reading on memory models, compiler reordering, CPU reordering, memory barriers, and lock-free programming, and I think I've driven myself into confusion. I've written a single producer single consumer queue and am trying to figure out where I need memory barriers for things to work and whether any operations need to be atomic. My single producer single consumer queue is as follows:

#include <stdbool.h>
#include <stdlib.h>

typedef struct queue_node_t {
    int data;
    struct queue_node_t *next;
} queue_node_t;

// An empty queue looks like this:
//  head tail
//    |    |
//   dummy_node
//
// The queue: insert at tail, remove from head
//    head           tail
//     |              |
// dummy_node -> 1 -> 2 -> NULL

typedef struct queue_t {
    queue_node_t *head; // consumer consumes from head
    queue_node_t *tail; // producer adds at tail
} queue_t;

queue_node_t *alloc_node(int data) {
    queue_node_t *new_node = (queue_node_t *)malloc(sizeof(queue_node_t));
    new_node->data = data;
    new_node->next = NULL;
    return new_node;
}

queue_t *create_queue() {
    queue_t *new_queue = (queue_t *)malloc(sizeof(queue_t));
    queue_node_t *dummy_node = alloc_node(0);
    dummy_node->next = NULL;
    new_queue->head = dummy_node;
    new_queue->tail = dummy_node;
    // 1. Do I need some kind of barrier here to make sure that if a thread
    //    that didn't call this performs a queue operation, and happens to
    //    run on a different CPU, the queue structure is observed fully
    //    initialized? i.e. head and tail are initialized.
    return new_queue;
}

// enqueue modifies tail
void enqueue(queue_t *the_queue, int data) {
    queue_node_t *new_node = alloc_node(data);
    // insert at tail
    new_node->next = NULL;

    // save off the existing tail
    queue_node_t *old_tail = the_queue->tail;

    // make the new node the new tail
    the_queue->tail = new_node;

    // 2. Is a store/store barrier needed here?

    // Link in the new node last so a concurrent dequeue doesn't see the
    // node until we're done. I don't know what needs to be atomic, but this
    // needs to have release semantics so it isn't visible until the prior
    // writes are done.
    old_tail->next = the_queue->tail;
    return;
}

// dequeue modifies head
bool dequeue(queue_t *the_queue, int *item) {
    // 3. Do I need a barrier here to make sure that if an enqueue happened
    //    I can observe it? i.e., if enqueue is called on an empty queue by
    //    thread 0 on cpu0 and dequeue is called by thread 1 on cpu1.
    // Dequeue the oldest item (FIFO) at head.
    if (the_queue->head->next == NULL) {
        return false;
    }
    *item = the_queue->head->next->data;
    queue_node_t *old_head = the_queue->head;
    the_queue->head = the_queue->head->next;
    free(old_head);
    return true;
}

Here are my questions, corresponding to the comments in the code above:

  1. In create_queue(), do I need some kind of barrier before the return? I'm wondering: if I call this function from thread 0 running on cpu0, and then use the returned pointer in thread 1, which happens to run on cpu1, is it possible that thread 1 sees a queue_t structure that isn't fully initialized?
  2. Do I need a barrier in enqueue() to make sure the new node isn't linked into the queue until all of the new node's fields are initialized?
  3. Do I need a barrier in dequeue()? I feel like it is correct without one, but I may need one if I want to make sure I see a completed enqueue.

Update: I tried to make it clear with the comments in the code that the head of the queue always points to a dummy node. This is a common technique that means the producer only ever needs to access tail and the consumer only ever accesses head. An empty queue contains just the dummy node, and dequeue() returns the node after head, if there is one. As nodes are dequeued, the dummy node advances and the previous "dummy" is freed.
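For reference, here is a minimal single-threaded usage sketch (using the functions above). It doesn't exercise any of the concurrency questions; it just illustrates the dummy node advancing as items are dequeued:

#include <stdio.h>

int main(void) {
    queue_t *q = create_queue();   // head == tail == dummy node

    enqueue(q, 1);                 // dummy -> 1
    enqueue(q, 2);                 // dummy -> 1 -> 2

    int item;
    while (dequeue(q, &item)) {    // prints 1 then 2; the node that held
        printf("%d\n", item);      // each value becomes the new dummy and
    }                              // the previous dummy is freed

    // queue is empty again: head == tail == (the node that used to hold 2)
    return 0;
}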

First of all, this depends on the specific hardware architecture, OS, language, etc.

1.) No, because you need an additional synchronization point to pass the pointer to the other thread anyway.
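As a concrete sketch of that (assuming POSIX threads, and reusing the types and functions from the question): pthread_create() already synchronizes memory between the creating thread and the new thread, so the new thread sees the fully initialized queue_t without any extra barrier in create_queue():

#include <pthread.h>
#include <stdio.h>

void *consumer_thread(void *arg) {
    queue_t *q = (queue_t *)arg;   // guaranteed to see head/tail (and the
    int item;                      // node enqueued below) fully initialized,
    if (dequeue(q, &item)) {       // because pthread_create() synchronizes
        printf("%d\n", item);      // memory with the creating thread
    }
    return NULL;
}

int main(void) {
    queue_t *q = create_queue();   // writes head and tail
    enqueue(q, 42);                // also happens before thread creation

    pthread_t tid;
    // The pthread_create() call itself is the "additional barrier":
    // everything written before it is visible inside consumer_thread().
    pthread_create(&tid, NULL, consumer_thread, q);
    pthread_join(tid, NULL);
    return 0;
}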

2.) Yes, old_tail->next = the_queue->tail needs to be executed after the_queue->tail = new_node.
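To make that ordering explicit, here is a minimal sketch using a C11 <stdatomic.h> fence (assuming a C11 compiler; strictly speaking the linking store and the consumer's matching load should also be atomic accesses to avoid a formal data race):

#include <stdatomic.h>

// enqueue with the store/store barrier from point 2 made explicit
void enqueue_fenced(queue_t *the_queue, int data) {
    queue_node_t *new_node = alloc_node(data);   // writes data and next

    queue_node_t *old_tail = the_queue->tail;    // only the producer touches tail
    the_queue->tail = new_node;

    // Release fence: keeps the node initialization and the tail update
    // from being reordered past the linking store below.
    atomic_thread_fence(memory_order_release);

    old_tail->next = new_node;                   // publish the node last
}

On x86 this fence emits no instruction; it only stops the compiler from reordering the stores.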

3.) It would have no effect, since there is nothing before the barrier to order; theoretically you need a barrier after old_tail->next = the_queue->tail in enqueue(). The compiler won't reorder outside of the function, but the CPU maybe could. (Very unlikely, but I'm not 100% sure.)
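For completeness, a matching consumer-side sketch (same caveats as above) would put an acquire fence after reading the link rather than at the top of dequeue(); it pairs with the release fence in the enqueue sketch, so that once the link is observed the node's contents are observed too. This pairing is what the "release semantics" comment in the question is reaching for:

#include <stdatomic.h>

bool dequeue_fenced(queue_t *the_queue, int *item) {
    queue_node_t *next = the_queue->head->next;   // read the link first
    if (next == NULL) {
        return false;
    }

    // Acquire fence: orders the reads of the node's fields below after the
    // link read above, pairing with the release fence in enqueue_fenced().
    atomic_thread_fence(memory_order_acquire);

    *item = next->data;
    queue_node_t *old_head = the_queue->head;
    the_queue->head = next;
    free(old_head);
    return true;
}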

OT: Since you are doing micro-optimization anyway, you could add padding for the cache:

typedef struct queue_t {
    queue_node_t *head; // consumer consumes from head
    char cache_pad[64]; // head and tail shouldn't be in the same cache line (-> 64 bytes)
    queue_node_t *tail; // producer adds at tail
} queue_t;

And if you have enough memory to waste:

typedef struct queue_node_t {
    int data;
    struct queue_node_t *next;
    char cache_pad[56]; // sizeof(queue_node_t) == 64 on 32-bit
} queue_node_t;
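An alternative sketch, if C11 is available: alignas from <stdalign.h> expresses the same intent without hand-counting padding bytes. Whether an extended alignment of 64 is supported is implementation-defined, and 64 is still just an assumption about the cache-line size, as above:

#include <stdalign.h>

typedef struct queue_t {
    alignas(64) queue_node_t *head; // consumer side on its own cache line
    alignas(64) queue_node_t *tail; // producer side on its own cache line
} queue_t;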
