Use ompi_datatype_get_true_extent() instead of ompi_datatype_get_extent() to obtain the local and remote lower bounds. For derived types such as subarray, lb can differ from the offset of the first byte of actual data, so true_lb is the correct offset for RDMA operations.
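
A minimal sketch of why the two lower bounds differ, assuming a 2-D subarray stored in C (row-major) order. The names subarray_bounds_t and subarray_bounds_2d are hypothetical helpers written for illustration; they are not Open MPI code. The sketch relies on the MPI rule that a subarray type has a lower bound of 0 and an extent covering the full array, while its true lower bound is the offset of the first selected element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the bounds of a 2-D subarray datatype.
 * Illustration only; this is not an Open MPI structure. */
typedef struct {
    ptrdiff_t lb;          /* lower bound: always 0 for a subarray type */
    ptrdiff_t extent;      /* covers the full parent array */
    ptrdiff_t true_lb;     /* byte offset of the first selected element */
    ptrdiff_t true_extent; /* bytes from first to last selected element */
} subarray_bounds_t;

/* Compute the bounds of a row-major 2-D subarray, in bytes. */
subarray_bounds_t subarray_bounds_2d(const int sizes[2],
                                     const int subsizes[2],
                                     const int starts[2],
                                     size_t elem_size)
{
    /* The selection must be non-empty and lie inside the parent array. */
    assert(subsizes[0] >= 1 && subsizes[1] >= 1);
    assert(starts[0] >= 0 && starts[0] + subsizes[0] <= sizes[0]);
    assert(starts[1] >= 0 && starts[1] + subsizes[1] <= sizes[1]);

    ptrdiff_t es = (ptrdiff_t)elem_size;

    /* Linear element index of the first and last selected elements. */
    ptrdiff_t first = (ptrdiff_t)starts[0] * sizes[1] + starts[1];
    ptrdiff_t last  = (ptrdiff_t)(starts[0] + subsizes[0] - 1) * sizes[1]
                    + (starts[1] + subsizes[1] - 1);

    subarray_bounds_t b;
    b.lb          = 0;
    b.extent      = (ptrdiff_t)sizes[0] * sizes[1] * es;
    b.true_lb     = first * es;
    b.true_extent = (last - first + 1) * es;
    return b;
}
```

For a 4x6 array of 4-byte elements with a 2x3 selection starting at (1, 2), lb is 0 but the data begins 32 bytes into the buffer. An RDMA offset derived from lb would therefore address the wrong bytes.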