# Mon, 15 Aug 2022 15:28:26 -0400 Since C with python supposedly gives some best practices, let's go through the process here from the starting doc to the published module. The module in question will be a bare minimum quaternion class. This will give me a chance to benchmark it vs other implementations of quaternions in C and python. I'll start with the official docs here, instead of a tutorial and write it as I go.

Python Extension - SergeNotes Edition

- A module "spam" should be associated with the C file "spammodule.c" by convention - define PY_SSIZE_T_CLEAN before include Python.h to make string types use Py_ssize_t instead of ints (early rabbit hole imo) - Python.h must be included before any other header, and it pulls in stdio.h, string.h, errno.h and stdlib.h. All your basic food groups. - In Python.h, all user-visible symbols have prefix of Py or PY A freestanding function is of the form
static PyObject * mymodule_func(PyObject *self, PyObject *args);
, where the first parameter points to module object and the second parameter contains a tuple object of arguments to the function ! Note that if instead of a freestanding function we were defining a member function, the first parameter points to the object instance instead of the module object Ok, starting with part 2 this document gets all over the place, so the following is a reordering of it that makes sense to me:

Initializing the Module

We're going to introduce some functions and types from Python.h and then explain how we use them to initialize a module:
/* macro to declare a function as PyObject * return type, declares any special linkage declarations
   required by the platform, and for C++ declares the function as extern "C". The declared function
   is called when a python program imports its module for the first time */
PyMODINIT_FUNC


/* create a new module object, based on the populated struct PyModuleDef*/
PyObject *PyModule_Create(PyModuleDef *def);
To create a module, we must define a function that returns a PyObject* that represents our created module. We can define this special function with the macro PyMODINIT_FUNC in the first part of the declaration, e.g.
PyMODINIT_FUNC PyInit_mymodule(void){...}
We know we want to return what PyModule_Create makes, so let's see how PyModuleDef is defined (from pep-489)
typedef struct {
    int slot;
    void *value;
} PyModuleDef_Slot;

typedef struct PyModuleDef {
    PyModuleDef_Base m_base; /* index of lookup function, should always be PyModuleDef_HEAD_INIT */
    const char* m_name; /* name of module */
    const char* m_doc; /* docstring for the module */
    Py_ssize_t m_size; /* size for keeping modules safe to use in sub-interpreters
		          (aka otherwise it'll be static), use -1 to keep it static and global state */
    PyMethodDef *m_methods; /* pointer to a table of module-level functions, can be NULL if none*/
    PyModuleDef_Slot *m_slots;
    traverseproc m_traverse;
    inquiry m_clear;
    freefunc m_free;
} PyModuleDef;
The examples in the documentation don't initialize past m_methods, so I've only added comments for those. What we need to know now is the definition of PyMethodDef, which according to the stable ABI is:
typedef struct PyMethodDef {
    const char* ml_name; /* name of the method */
    PyCFunction ml_meth; /* pointer to the C implementation */
    int ml_flags; /* flag bits indicating how the call should be constructed */
    const char* ml_doc; /* points to the contents of the docstring */
};
Putting all this together, a simple module initialization would look like
static PyMethodDef modmethods[] = {
    {"myfunc", my_c_function, METH_VARARGS,
     "run a special function made by yours truly"},
    ...
    {NULL, NULL, 0, NULL} /* array terminator */
};

static struct PyModule mymodule = {
    PyModuleDef_HEAD_INIT,
    "mymodule",
    PyDoc_STR("sorry.. this module is MINE!!"),
    -1,
    
};

PyMODINIT_FUNC PyInit_mymodule(void) {
    return PyModule_Create(&mymodule);
}
Note that the third param is METH_VARARGS, which is the typical and most flexible calling convention (the methods have type PyCFunction and expect two PyObject* values), but there's a lot more options, like METH_FASTCALL may be faster but only supports positional arguments (its signature is like
PyObject *fn(PyObject *self, PyObject *argv, int argc);
or METH_CLASS for the first parameter to be passed a class object rather than an instance of it. Ok, so now how do we actually make a useful function? Well with our signature
static PyObject * mymodule_func(PyObject *self, PyObject *args);
, let's figure out the parameters. We expect the second argument to be a tuple, which we can parse with
/*Parse the parameters of a function that takes only positional parameters into local variables.
Returns true on success; on failure, it returns false and raises the appropriate exception. */
int PyArg_ParseTuple(PyObject *args, const char *format, ...)
PyArg_ParseTuple is basically like vsscanf, the first argument is our tuple(effectively va_args), the second argument is a format string for the types we expect, and the rest of the arguments are pointers to where we stored the correctly deduced results. The full list of conversion specifiers are here , but I'll specify some here that I'll care about: d = double f = float D = Py_complex structure o = python object So we could define a max function like:
static PyObject * mymodule_max(PyObject *self, PyObject *args){
    double a, b;
    PyArg_ParseTuple(args, "dd", &a, &b);
    return PyFloat_FromDouble(fmax(a,b));
}
Since we don't want to define functions in the context of a Quaternion class rather than a module, I have to somehow create a quaternion object. To create a new extension type, we need to create a new type object. We do this by populating a
PyTypeObject
and adding it to the created module. Two functions will help us:
/* Add an object to module as name 
   0 on success, -1 on error */
int PyModule_AddObject(PyObject *module, const char *name, PyObject *value);

/* Finalize a type object. Must be called on all type objects to finish their initialization
   Returns 0 on success, -1 on failure */
int PyType_Ready(PyTypeObject *type);
The PyTypeObject has member fields for pretty much every class operator you can think of and more, so there's more slots than I want to cover here. I'll instead post a modified example from the docs:
static PyTypeObject MyCustomType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "mymodule.MadeByME",
    .tp_doc = PyDoc_STR("a docstring goes here"),
    .tp_basicsize = sizeof(MyCustomObject), /* a struct to represent our type */
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT,
    .tp_new = PyType_GenericNew,
};

Looking at the docs, here are other members I think might want to use for the quaternion type. The members I put in the struct are not correctly ordered, so use designated initializes (like they do) to avoid going insane populating all these.
struct PyTypeObject {
...
    /* basics */
    allocfunc tp_alloc; /* allocate memory but not initialize */
    initproc tp_init; /* __init__, or, initialize memory */
    destructor tp_finalize; /*__del__, destructor*/
    
    /* comparison function */
    richcmpfunc tp_richcompare; 
    /* richcmpfunc := PyObject*(*richcmpfunc)(PyObject*, PyObject*, int) */

    /*for subscripting*/
    objobjargproc mp_ass_subscript; /* for assignment? */
	binaryfunc mp_subscript;
	
    /* __repr__ and __str__ */
    reprfunc tp_repr;
    reprfunc tp_str;
    /* reprfunc := PyObject*(*reprfunc)(PyObject*)

    /* method members: */
    PyMethodDef tp_methods[]; /* same format as our earlier module-scoped function list */
    PyMemberDef tp_members[]; /* declare attributes of the type and correspond them to our C struct */
    PyGetSetDef tg_getset[]; /* property-like access for a type, alternative to using tp_members */

    /* attribute getters and setters, alternative to working with tp_members */
    getattrofunc tp_getattro; /*PyObject *(*getattrofunc)(PyObject *self, PyObject *attr); */
    setattrofunc tp_setattro; /*int (*setattrofunc)(PyObject *self, PyObject *attr, PyObject *value);
    /* note there are deprecated fields tp_getaddr, tp_setattr */

    /* and... number-methods */
    PyNumberMethods *tp_as_number; /* pointer to another struct... */
...
}
It turns out PyNumberMethods has too many member as well...
typedef struct{
...
/* binaryfunc's are defined as the following type:
       PyObject* (*binaryfunc)(PyObject*,PyObject*)  */

    binaryfunc nb_add;
    binaryfunc nb_subtract;
    binaryfunc nb_multiply;
    binaryfunc nb_true_divide;

    unaryfunc nb_negative;
    unaryfunc nb_float; /*__float__*/
    unaryfunc nb_int; /*__int__*/
    unaryfunc nb_bool; /*__bool__*/

    objobjargproc mp_subscript; /* also for subscripting (why?)*/
...
} PyNumberMethods;

Wow, that's a lot. But first off we know that we're going to be defining the same function signatures as we did for module scoped functions before, except our first parameter is a pointer to the the object rather than the module. An important point (that took looking at module implementations to figure out), is that the parameters to method operators, like binaryfunc, is that the pointers can be cast to our struct type. And what's more, in addition to casting the input PyObjects as our lovely struct type, we can also cast it into a PyTypeObject using the PyTYPE macro (e.g.
PyTYPE(passedPyObj)->tp_repr(passedPyObj)
). This implies that the metadata associated with our type exists in some convenient offset around the pointer, so that we can switch between treating it as a PyObject and our C struct type. This convenience basically allows us to adapt any existing C Abstract Data Types(ADT) into a python type by passing the type's operator functions into the struct's methods. A better approach might be to write wrapper functions around an ADT's methods to allow for data checking and fancier python functionality. In this tutorial, I will be working with the worst approach, which is defining the ADT's functionality in the python handler functions and mixing ADT logic with python functions and macros, thus preventing me from testing the ADT independently and using it for other C codebases. The last piece to be function is understanding how to deal with PyObjects. The concept is simple: PyObjects correlate to our objects in the interpreter, and since everything is a PyObject, we need to learn some functions to work this type. I've scrounged up the following from the tutorial and docs:
// the following 3 create python objects whose memory is already managed for us
/* Create a Unicode object from a null-terminated string. */
PyObject* PyUnicode_FromString(const char *u);
/* similar to scanf, but with different conversion-specifiers, no float types >:( */
PyObject* PyUnicode_FromFormat(const char *format, ...); 

/* Allocate a new Python object using its C structure type and its Python object type,
   size of memory allocated is determined from tp_basicsize */
PyObject* PyObject_New(TYPE, PyTypeObject *type); 

/* Increment the reference count for object o, if o is NULL nothing happens */
void Py_XINCREF(PyObject *o);
void Py_XDECREF(PyObject *o); // opposite of XINCREF
Ok. Let's assemble all our info so far and give this a shot with a Quaternion type that can just add and show its members. I've added some additional notes in the comments for things I was testing out while writing it.
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <structmember.h>

// The struct we are building our type around
typedef struct {
    PyObject_HEAD
    double w;
    double x;
    double y;
    double z;
} Quaternion;

/* --- Arithmetic Method Section  --- */
static PyObject* Quaternion_add(Quaternion *self, Quaternion *other);

static PyNumberMethods Quaternion_numbermethods = {
    .nb_add = (binaryfunc)Quaternion_add,
};


// here we use offsetof, as defined in stdint.h(!)
static PyMemberDef Custom_members[] = {
    {"w", T_DOUBLE, offsetof(Quaternion, w), 0, "real component"},
    {"x", T_DOUBLE, offsetof(Quaternion, x), 0, "coefficient to i"},
    {"y", T_DOUBLE, offsetof(Quaternion, y), 0, "coefficient to j"},
    {"z", T_DOUBLE, offsetof(Quaternion, z), 0, "coefficient to k"},
    {NULL}, /* array terminator */
};

/* --- Type Method Section --- */

static void Quaternion_dealloc(Quaternion *self){
    // since we're using python's default allocation, we must use its deallocation
    // note that this is casting self as a PyTypeObject so we can access its
    // "free memory" method
    Py_TYPE(self)->tp_free((PyObject *)self);

    // an equivalent call for allocationg memory would be PyTYPE(self)->tp_alloc(ptr-to-PyTypeObject,0); (I think)
}

static PyObject* Quaternion_repr(Quaternion *self)
{
    static char repr[128];
    snprintf(repr,   128, "w:%f, x:%f, y:%f, z:%f", self->w, self->x, self->y, self->z);
    return PyUnicode_FromString(repr);
}

static PyTypeObject QuaternionType = {
    PyVarObject_HEAD_INIT(NULL, 0)  /*object's reference count I think, mandatory boilerplate*/
    .tp_name = "simplequat.SimpleQuat",
    .tp_doc = PyDoc_STR("SimpleQuat object"),
    .tp_basicsize = sizeof(Quaternion),
    .tp_flags = Py_TPFLAGS_DEFAULT, /* bitmask of what "special fields" exist, just use default */
    .tp_itemsize = 0, /* only for variable sized objects */
    .tp_members = Custom_members,
    .tp_repr = (reprfunc)Quaternion_repr,
    .tp_as_number = &Quaternion_numbermethods,

    /* create new instance using type's tp_alloc slot */
    .tp_new = PyType_GenericNew,

     /* ".tp_alloc = PyType_GenericAlloc" is the default allocation, which uses the basicsize we
     * set earlier to create this memory and use Python's default memory allocation to
     * allocate a new instance and initialize all its contents to NULL, we can think of this
     * field as optional*/

    /* however, the dealloc must be defined (unless our struct is empty)*/
    .tp_dealloc = (destructor)Quaternion_dealloc,

};


/* --- Python Module Definition --- */
static PyModuleDef simplequat = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "simplequat",
    .m_doc = PyDoc_STR("A quaternion module with no error checking that only supports addition"),
    .m_size = -1
};

/* --- Module Initializer --- */
PyMODINIT_FUNC PyInit_simplequat(void){
    // add our definition of the type
    if (PyType_Ready(&QuaternionType) < 0) return NULL;

    // init module
    PyObject *module = PyModule_Create(&simplequat);
    if (module == NULL) return NULL;

    Py_INCREF(&QuaternionType);
    // Add the type to the module's dictionary
    if (PyModule_AddObject(module, "SimpleQuat", (PyObject*)&QuaternionType) < 0) {
        // trigger destructors on both objects
        Py_DECREF(&QuaternionType);
        Py_DECREF(module);
        return NULL;
    }

    return module;
}

static PyObject* Quaternion_add(Quaternion *self, Quaternion *other) {
    Quaternion *ret = (Quaternion*)PyObject_New(Quaternion, &QuaternionType);
    ret->w = self->w + other->w;
    ret->x = self->x + other->x;
    ret->y = self->y + other->y;
    ret->z = self->z + other->z;
    return (PyObject*)ret;
}
With the following setup.py file (yoinked roughly from the tutorial), we can build our module:
from distutils.core import setup, Extension
setup(name="simplequat", version="1.0",
      ext_modules=[Extension("simplequat", ["simplequatmodule.c"])])
python3 setup.py build
running build
running build_ext
building 'simplequat' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c simplequatmodule.c -o build/temp.linux-x86_64-3.8/simplequatmodule.o
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/simplequatmodule.o -o build/lib.linux-x86_64-3.8/simplequat.cpython-38-x86_64-linux-gnu.so
Now the output module should be somehere in a subdirectory. For me it was
build/lib.linux-x86_64-3.8
. Change to this directory and run python.
Python 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import simplequat
>>> A = simplequat.SimpleQuat()
>>> A
w:0.000000, x:0.000000, y:0.000000, z:0.000000
>>> B = simplequat.SimpleQuat()
>>> B
w:0.000000, x:0.000000, y:0.000000, z:0.000000
>>> A.w = A.y = 5
>>> A
w:5.000000, x:0.000000, y:5.000000, z:0.000000
>>> B.w, B.x, B.y, B.z = 0.1, 0.2, 0.3, 0.4
>>> B
w:0.100000, x:0.200000, y:0.300000, z:0.400000
>>> A+B
w:5.100000, x:0.200000, y:5.300000, z:0.400000
>>>
And wow, after all that terse documentation we can finally implement a thing in C. At the moment I'm more relieved than happy, which means lets grind some more.

Module Design

A big program is made up of many small modules. Assuming that these modules have been tested thoroughly, only the application-specific code will contain bugs. In the gray area between library-oriented code and application-oriented code, the smallest steps we can meaningfully isolate and break down functionality would represent each module of our development. Completing a module (aka guaranteeing its behavior) before moving onto its dependents inductively builds ourselves the stability we ~need for a large project, and it does this despite the ever ambiguious proximity to core business logic. Thus taking this demo to a well implemented python package is to take the earlier hint about the python coupling and: (a) Separate the C into a well implemented module (b) Simplify the python as a module with respect to its downstreams

C Module Design

The main interface to a C "module" is on the level of headers and shared object -- symbols are exposed in a shared object for other pieces of code to link against and use or manipulate. A header provides an interface to these symbols. The implementation should be hidden at both of these levels. In C++, a common pattern is to declare a class with methods and members and implement them in the a .cc/.cpp file, but any changes to the underlying class's "private" members and other implementation-specific details cause a change in the interface and a break in the ABI and the contract between all previous clients. In C++ we can work around this with a combination of the Pimpl idiom and static methods for implementation-specific idioms, but you're fighting against the encouraged OO syntax of the language. C is the opposite approach, we choose what we want to expose and poke holes in our header file and exposed symbols to access these, but are otherwise explicit about it. What do we want to expose from a quaternion API? First off, the type.
struct Quaternion {
    double w;
    double x;
    double y;
    double z;
};
Then some way to create our object. It is idiomatic C to provide a make and free pair of functions.
// create an uninitialized quaternion
struct Quaternion* make_Quaternion(void);

// free a quaternion created with make_Quaternion()
void free_Quaternion(struct Quaternion *q);

// initialize quaternion to 1+0i+0j+0k, aka 1
void init_Quaternion(struct Quaternion *q);
This is less important since our type is fairly simple and we expose its definition, but if we didn't (aka if it were an abstract data type) then there would be no other way for the user to create instances of it. In general it lets us simplify the creation logic for the type if it were to grow and get more complex. Now we want to create our functionality. Let's cover all basic interesting operations.
/* copy b to a, returns pointer to a*/
struct Quaternion *copy_Quaternion(struct Quaternion *a, const struct Quaternion *b);

/* get norm of quaternion a*/
double norm_Quaternion(const struct Quaternion *a);

/* make a = -a, returns pointer to a */
struct Quaternion *negate_Quaternion(struct Quaternion *a);

/* add b to a, returns pointer to a*/
struct Quaternion *add_Quaternion(struct Quaternion *a, const struct Quaternion *b);

/* subtract b from a, returns pointer to a*/
struct Quaternion *sub_Quaternion(struct Quaternion *a, const struct Quaternion *b);

/* multiply a by b, storing the result in a, returns pointer to a */
struct Quaternion *mul_Quaternion(struct Quaternion *a, const struct Quaternion *b);

/* scale a by s, returns pointer to a*/
struct Quaternion *scale_Quaternion(struct Quaternion *a, double s);

/* make a its conjugate, returns pointer to a*/
struct Quaternion *conj_Quaternion(struct Quaternion *a);

/* invert  a, returns pointer to a*/
struct Quaternion *inv_Quaternion(struct Quaternion *a);

/* rotate quaternion a by b, returns pointer to a */
struct Quaternion *rot_Quaternion(struct Quaternion *a, const struct Quaternion *b);
I've defined the previous operations to return a copy to parameter a, the modified quaternion. This is to mainly allow for chaining operations, but we can also use it to return NULL on errors, or even make copy_Quaternion more convenient by having it return a newly created quaternion if the user passes NULL in parameter a. We could make the decision to return copies of a quaternion on each operation, or other convenient python-like behavior. An important consideration here is that we never allocate stack memory for the user. If we expose the struct definition, they can allocate from the stack on their own and call pointer interface functions. However, if we didn't want to provide the definition of a quaternion, if it were an abtract data type, we wouldn't be able to define the function's symbols to returning non-pointer type in the header since its size wouldn't be known at the interface level. Thus the best situation for expressiveness and accessibility is to give an interface for working with the pointer types in both abstract and non-abstract cases. If you put the last three code excerpts into a header called
Quaternion.h
, then you've just defined the main interface to your implementation. If you don't know about quaternions and want to try implementing these functions yourself, the best primer I've found is by Moti Ben-Ari, which I've mirrored here. After writing an implementation (even if they don't work or even do anything) in Quaternion.c, you can check the defined symbols by running
gcc -c Quaternion.c
and then
nm Quaternion.o
. Use
man nm
, to figure out what the letters next to the functions you defined mean, and the other letters next to the functions you didn't define mean. Mine looks like this:
U _GLOBAL_OFFSET_TABLE_
00000000000001d8 T add_Quaternion
0000000000000040 b b_temp.3632
0000000000000020 b bc_temp.3633
0000000000000552 T conj_Quaternion
000000000000009b T copy_Quaternion
                 U free
000000000000002c T free_Quaternion
000000000000004b T init_Quaternion
00000000000005be T inv_Quaternion
0000000000000000 T make_Quaternion
                 U malloc
00000000000002f0 T mul_Quaternion
0000000000000150 T negate_Quaternion
00000000000000d7 T norm_Quaternion
000000000000065c T rot_Quaternion
00000000000004e1 T scale_Quaternion
                 U sqrt
0000000000000264 T sub_Quaternion
0000000000000000 b temp.3617
This level of symbol definition should be considered a complimentary interface to the header one that we just defined. Note that unlike C++ symbols, no type information is conveyed at this level; only the location of the function pointers. The header is what informs us of their invocation. Our header may seem fairly bare, but don't assume this is just because this is a tutorial; since quaternions are mainly used for rotation mathematics, we could offer an interface for converting euler angles bidirectionally to quaternions or even matrices. However, consider the consumers to this library, they may have their own matrix or rotational type. We don't know the exact conversion formats the user will want to work with in either C or Python. Implementing conversions in the module at this step would be premature and likely cause more work later. The nice thing about a minimally designed module is that, if needed, we could add in this functionality with no breaking changes. On the other hand building "fast" now might feel good, but without a concrete set of known requirements, we will likely discover much of the functionality either wasn't needed or, worse, was close-but-not-exactly-what-was-needed, and will either be wasted work or will actively require additional work to tear down and clean up. Hence simpler is better even if the final destination is the same. So, let's take what we got and get onto verifying it as a stable module.

Testing A C Module

Once we have the interface and implementation, we need to evaluate our code. Two metrics that can be done in similar fashion, in a similar step are correctedness and speed. Let's use cmocka. It is available on ubuntu as the package
libcmocka-dev
. We can find some assertion statements in its API docs, in particular I will be
using assert_float_equal(...)
,
assert_true(...)
and
assert_memory_equal(...)
I don't want to go into a cmocka tutorial, but here's the basic structure you want to use for your test file (let's call it
test.c
)
// headers required by cmocka
#include <stddef.h>
#include <setjmp.h>
#include <stdarg.h>

#include <cmocka.h>

#include "Quaternion.h"

static void test_creation(void **state) {
	// test creation functions here
}

static void test_operations(void **state) {
	struct Quaternion q {.w = 0.1, .x = 0.2, .y=0.3, .z=0.4};
	// test creation functions here
}

int main()
{
	const struct CMUnitTest tests[] = {
		cmocka_unit_test(test_creation),
		cmocka_unit_test(test_operations),
		// can keep adding more
	};
}
You can use this to fill out even a partial implementatino of the Quaternion.c functionality. Once you have some logic in the tests, even if it doesn't work, build it all with:
gcc -o test test.c Quaternion.c -lcmocka
and run
./test
. With all my correctness tests checked into test.c, my output looks like the following:
[==========] Running 3 test(s).
[ RUN      ] test_creation
[       OK ] test_creation
[ RUN      ] test_easy_operations
[       OK ] test_easy_operations
[ RUN      ] test_hard_operations
[       OK ] test_hard_operations
[==========] 3 test(s) run.
[  PASSED  ] 3 test(s).
If you don't want to implement the unit tests still want to implement your own Quaternion module, you can download my finished test here and use it to check your implementation. Also, I got sick of using a gcc command to build, so I wrote a simple makefile:
all: test.o Quaternion.o
	gcc -o test test.o Quaternion.o -lm -lcmocka

test.o: test.c Quaternion.h
	gcc -c test.c

Quaternion.o: Quaternion.c Quaternion.h
	gcc -c Quaternion.c
In addition to making building more convenient, you can see the clear separation between modules. If you had multiple modules in a project, you could have multiple tests depending on each module independently to ensure only its symbols were required to run it. Again, check out the
nm
output for each .so you produce to get an idea of the separation between components. Finally, let's implement a simple speed benchmark just so we have a number. Let's call the file bench.c Since the logic for a test is pretty straightforward and repetitive, I'll use macros to factor out redundant boilerplate. The important part is that we use CLOCK_THREAD_CPUTIME_ID to get accurate timing of used processor time.
#include <time.h>
#include <stdlib.h>
#include <stdio.h>

#include "Quaternion.h"


#define RANDQUAT(q) \
    q.w = rand(); \
    q.x = rand(); \
    q.y = rand(); \
    q.z = rand();

#define RANDQARR(N,id) struct Quaternion id[N];\
    for (int i=0; i<N; ++i){ \
        RANDQUAT(id[i]); \
    }

#define BENCH_BINARY_OP(fn,N) \
    RANDQARR(N, bi##fn##a) \
    RANDQARR(N, bi##fn##b) \
    struct timespec fn##ts1, fn##ts2; \
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &fn##ts1); \
    for (int i=0; i<N; ++i) \
        fn(&bi##fn##a[i], &bi##fn##b[i]); \
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &fn##ts2); \
    double time##fn = (fn##ts2.tv_sec - fn##ts1.tv_sec) * 1e9; \
    time##fn += (fn##ts2.tv_nsec - fn##ts1.tv_nsec); \
    printf("average for %s : %fns\n", #fn, time##fn / N);

#define BENCH_UNARY_OP(fn,N) \
    RANDQARR(N, un##fn) \
    struct timespec fn##ts1, fn##ts2; \
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &fn##ts1); \
    for (int i=0; i<N; ++i) \
        fn(&un##fn[i]); \
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &fn##ts2); \
    double time##fn = (fn##ts2.tv_sec - fn##ts1.tv_sec) * 1e9; \
    time##fn += (fn##ts2.tv_nsec - fn##ts1.tv_nsec); \
    printf("average for %s : %fns\n", #fn, time##fn / N);

#define SAMPLE_SIZE 10000

int main(void){
    srand(1);

    BENCH_UNARY_OP(init_Quaternion, SAMPLE_SIZE);
    BENCH_UNARY_OP(norm_Quaternion, SAMPLE_SIZE);
    BENCH_UNARY_OP(negate_Quaternion, SAMPLE_SIZE);
    BENCH_UNARY_OP(inv_Quaternion, SAMPLE_SIZE);
    BENCH_UNARY_OP(conj_Quaternion, SAMPLE_SIZE);

    BENCH_BINARY_OP(copy_Quaternion, SAMPLE_SIZE);
    BENCH_BINARY_OP(add_Quaternion, SAMPLE_SIZE);
    BENCH_BINARY_OP(sub_Quaternion, SAMPLE_SIZE);
    BENCH_BINARY_OP(mul_Quaternion, SAMPLE_SIZE);
    BENCH_BINARY_OP(rot_Quaternion, SAMPLE_SIZE);
}
From compiling this with my code, I got:
average for init_Quaternion : 4.500000ns
average for norm_Quaternion : 4.820000ns
average for negate_Quaternion : 4.820000ns
average for inv_Quaternion : 14.090000ns
average for conj_Quaternion : 4.100000ns
average for copy_Quaternion : 5.610000ns
average for add_Quaternion : 5.800000ns
average for sub_Quaternion : 8.490000ns
average for mul_Quaternion : 13.910000ns
average for rot_Quaternion : 42.050000ns
Which gives us a basic understanding of our library's performance. We've proven reliability, gotten a sense of performance. We now have a C building block to use from some client.

Python Extension Module Design

From the python side, what we want to do is adapt the C interface to a good Python interface. Essentially the interface to an existing C type will not match the expectations of a python interface, and if it does, it's at the expense of the C side. In this case, we are simply going to define new functions using our library that handle a few conditions, similar to how we would in python, and leverage the C library when possible, and avoid it when it doesn't make sense. Think of this as a waiter and a kitchen, the kitchen knows how to cook at scale and fast but the waiter is trying to make the experience comfortable for the user, and when we're writing stuff for python we're solidly in cozy waiter-land! Now that we're working with these sources, our old simplequatmodule.c is mostly full of rubbish. Make a new file by renaming it to CQuatModule.c, remove the struct, remove all the declarations. In-fact, remove everything but the PyTypeObject and module definition, and rename the module and the type name to CQuat. Let's define new functions for the python handlers:
typedef struct {
    PyObject_HEAD;
    struct Quaternion q;
} Quaternion;

/* --- Arithmetic Operator Section  --- */
static PyObject* Quaternion_add(Quaternion *self, Quaternion *other);
static PyObject* Quaternion_subtract(Quaternion *self, Quaternion *other);
static PyObject* Quaternion_multiply(Quaternion *self, PyObject *other);
static PyObject* Quaternion_negate(Quaternion *self);

/* --- PyMethodDef Section --- */
static PyObject* Quaternion_norm(Quaternion *self);
static PyObject* Quaternion_conjugate(Quaternion *self);
static PyObject* Quaternion_invert(Quaternion *self);
static PyObject* Quaternion_rotate(Quaternion *self, PyObject *other);

/* --- Mapping Methods Section --- */
static PyObject* Quaternion_subscript(Quaternion *self, PyObject *key);
static int Quaternion_subscript_assign(Quaternion *self, PyObject *key, PyObject *v);

/* --- Type Method Section --- */
static int Quaternion_init(Quaternion *self, PyObject *args, PyObject *kwds);

The most jarring thing you might notice, is that instead of defining the C structure with the members inline with PyObject_HEAD, we add an instance to our object in there. This is important because if the object changes, we would have to match our implementation. And more importantly, this would also work with abstract types that don't expose their interface. (and if you reproduced the members you'd have to do some hacky casting to make it work). Note that this will change our offsetof members from e.g. w to q.w. It's also in writing that struct that I realize the "casting" of the PyObject works because PyObject_HEAD just puts the PyObject struct as the first element of our struct. We're just skipping past it when we access the elements! Note that you could design your C structure and functions in such a way to pass them directly to these handlers, but then we wouldn't be able to do anything python-y inside of them. Similar to the math operators we defined in the simpler module version, we're populating a PyMappingMethod struct so that our type can be accessed and assigned via subscript operator, e.g.
q[1] = 2.3
. The populated struct looks like this:
/* --- populate mapping methods --- */
static PyMappingMethods Quaternion_mappers = {
    .mp_subscript = (binaryfunc)Quaternion_subscript,
    .mp_ass_subscript = (objobjargproc)Quaternion_subscript_assign,
};
and will be assigned to a member of PyTypeObject called "tp_as_mapping", which matches the motif set by ".tpasnumber". In addition to the operators we defined in the arithmetic and mapping section, we now want to define regular methods. This is done in a PyMethodDef. I define that as such:
/*--- populate Regular Methods --- */
static PyMethodDef Quaternion_methods[] = {
    {"norm",        (PyCFunction)Quaternion_norm,         METH_NOARGS,    "Compute norm"},
    {"conjugate",   (PyCFunction)Quaternion_conjugate,    METH_NOARGS,    "Compute conjugate"},
    {"inverse",     (PyCFunction)Quaternion_invert,       METH_NOARGS,    "Compute inverse"},
    {"rotate",      (PyCFunction)Quaternion_rotate,      METH_VARARGS,   "Rotate this object by the input quaternion"},
    {NULL},
};
Similar to the PyMemberDef array, we terminate the array with a NULL pointer. Also note that the functions are either without arguments or with variable arguments. Recall from before we could use other flags to make these invocations faster. Most of these definitions will be fairly straightforward, of the form:
static PyObject* Quaternion_add(Quaternion *self, Quaternion *other) {
    Quaternion *ret = (Quaternion*)PyObject_New(Quaternion, &QuaternionType);
    copy_Quaternion(&ret->q,&self->q);
    add_Quaternion(&ret->q, &other->q);
    return (PyObject*)ret;
}
but recall, the rotate accepts an argument tuple, so the form:
static PyObject* Quaternion_rotate(Quaternion *self, Quaternion *other) {
    Quaternion *ret = (Quaternion*)PyObject_New(Quaternion, &QuaternionType);
    copy_Quaternion(&ret->q, &self->q);
    rot_Quaternion(&ret->q, &((Quaternion*)quat)->q);
    return (PyObject*)ret;
}
Would only work if the user passed in a single Quaternion as other. Any other arguments, or multiple arguments would cause disasterous failure and some general error message. For this and other operations, we will have to deduce the the argument's type and convert them. For example, depending on how we want to define the
__init__
function, we may want to treat one or up to 4 numeric values as a valid initialization of a Quaternion. Similarly, the subscript operator requires a key. How do we manage this type amgibuity? Here are some functions of interest:
// -- from the argument parsing API

/* Parse the parameters of a function that takes only positional parameters into local variables.
Returns true on success; on failure, it returns false and raises the appropriate exception. */
int PyArg_ParseTuple(PyObject *args, const char *format, ...)

/* Parse the parameters of a function that takes both positional and keyword parameters into local variables. The keywords argument is a NULL-terminated array of keyword parameter names. Empty names denote positional-only parameters. Returns true on success; on failure, it returns false and raises the appropriate exception. */
 int PyArg_ParseTupleAndKeywords(PyObject *args, PyObject *kw, const char *format, char *keywords[], ...)

// -- from the concrete object layer API

/* Return true if its argument is a PyLongObject or a subtype of PyLongObject. This function always succeeds. */
int PyLong_Check(PyObject *p);

/* Return true if its argument is a PyFloatObject or a subtype of PyFloatObject. This function always succeeds. */
int PyFloat_Check(PyObject *p);

/* Return a C double representation of the contents of pyfloat.
If pyfloat is not a Python floating point object but has a __float__() method,
this method will first be called to convert pyfloat into a float.
If __float__() is not defined then it falls back to __index__().
This method returns -1.0 upon failure, so one should call PyErr_Occurred() to check for errors. */
double PyFloat_FromDouble(PyObject *pyfloat);

/*Return a C long representation of obj. If obj is not an instance of PyLongObject,
first call its __index__() or __int__() method (if present) to convert it to a PyLongObject.
Raise OverflowError if the value of obj is out of range for a long.*/
long PyLong_AsLong(PyObject *obj);
Parsetuple is yet another similar-to-scanf function, with conversion specifiers found here, we can use this with our creation and multiplication function as such:
static int Quaternion_init(Quaternion *self, PyObject *args, PyObject * kwds) {
    static char *kwlist[] = {"w","x","y","z",NULL};
    // "d" means double, and "|" indicates the following arguments are optional
    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|dddd", kwlist,
                                     &self->q.w,
                                     &self->q.x,
                                     &self->q.y,
                                     &self->q.z))
        return -1;

    return 0;
}

static PyObject* Quaternion_multiply(Quaternion *self, PyObject *other) {
    Quaternion *ret = (Quaternion*)PyObject_New(Quaternion, &QuaternionType);
    copy_Quaternion(&ret->q,&self->q);
    // are we multiplying by a numeric type? if so, scale
    if (PyLong_Check(other) || PyFloat_Check(other))
        scale_Quaternion(&ret->q, PyFloat_AsDouble(other));
    else
        mul_Quaternion(&ret->q, &((Quaternion*)other)->q);

    return (PyObject*)ret;
}
and the reworked rotation function now looks like:
static PyObject* Quaternion_rotate(Quaternion *self, PyObject *arg) {
    PyObject *quat = NULL;
    if (!PyArg_ParseTuple(arg, "O!", &QuaternionType, &quat))
        return NULL;

    Quaternion *ret = (Quaternion*)PyObject_New(Quaternion, &QuaternionType);
    copy_Quaternion(&ret->q, &self->q);
    rot_Quaternion(&ret->q, &((Quaternion*)quat)->q);
    return (PyObject*)ret;
}
Note that the "O" conversion specifier would normally just require a
PyObject**
to be passed in so an "object" could be found and assigned to it, but the "O!" form actually allows us to restrict the type, which we do by first passign a pointer to that type's PyTypeObject. Note that each basic type has its PyTypeObject accessible as a name in Python.h. Finally, the part that they tried to feed me first but was too boring to go through, which ended up turning this whole post into a tutorial: error handling! The docs say we need to keep in mind: - When a function fails, it should set an exception condition and return an error value (e.g. NULL) - Exception information is stored in 3 member of the interpreter's thread state. If there is no exception, then they would all be NULL This is a massive section and there's a lot of details like defining your own exception classes, but here's what we're going to use:
/*This is the most common way to set the error indicator. The first argument specifies the exception type; 
it is normally one of the standard exceptions, e.g. PyExc_RuntimeError. You need not increment its reference count.
The second argument is an error message; it is decoded from 'utf-8’. */
void PyErr_SetString(PyObject *type, const char *message);
Since the rotation type is already handling strict checking of the input type as a Quaternion, we will add more explicit failure case to multiplication.
static PyObject* Quaternion_multiply(Quaternion *self, PyObject *other) {
    Quaternion *ret = (Quaternion*)PyObject_New(Quaternion, &QuaternionType);
    copy_Quaternion(&ret->q,&self->q);
    // are we multiplying by a numeric type? if so, scale
    if (PyLong_Check(other) || PyFloat_Check(other))
        scale_Quaternion(&ret->q, PyFloat_AsDouble(other));
    else if (PyObject_IsInstance(other, (PyObject*)&QuaternionType))
        mul_Quaternion(&ret->q, &((Quaternion*)other)->q);
    else
    {
        PyErr_SetString(PyExc_TypeError, "cannot multiply by non numeric, non quaternion type");
        return NULL;
    }

    return (PyObject*)ret;
}
Putting this all together, our QuaternionType definition now looks like:
static PyTypeObject QuaternionType = {
    PyVarObject_HEAD_INIT(NULL, 0)  /*object's reference count I think, mandatory boilerplate*/
    .tp_name = "CQuat.Quaternion",
    .tp_doc = PyDoc_STR("Quaternion object"),
    .tp_basicsize = sizeof(Quaternion),
    .tp_flags = Py_TPFLAGS_DEFAULT, /* bitmask of what "special fields" exist, just use default */
    .tp_itemsize = 0, /* only for variable sized objects */
    .tp_init = (initproc)Quaternion_init,
    .tp_members = Custom_members,
    .tp_methods = Quaternion_methods,
    .tp_as_number = &Quaternion_operators,
    .tp_as_mapping = &Quaternion_mappers,
    .tp_repr = (reprfunc)Quaternion_repr,

    /* create new instance using type's tp_alloc slot */
    .tp_new = PyType_GenericNew,

     /* ".tp_alloc = PyType_GenericAlloc" is the default allocation, which uses the basicsize we
     * set earlier to create this memory and use Python's default memory allocation to
     * allocate a new instance and initialize all its contents to NULL, we can think of this
     * field as optional*/

    /* however, the dealloc must be defined (unless our struct is empty)*/
    .tp_dealloc = (destructor)Quaternion_dealloc,

};
You can then change your setup.py to look like:
from distutils.core import setup, Extension

quat_module = Extension('CQuat',
        sources=['CQuatModule.c','Quaternion.c'])

setup(name="CQuat",
    version="1.0",
    description = 'A Python Quaternion Extension',
      ext_modules=[quat_module])
and build and test out our swanky operations with the generated shared object

Taking Stock

Now that we've got the python module, let's do a little benchmark for comparison. I don't care as much about correctness since I know the underlying building block is well secured, so any changes will be done more for the way we present it than verifying its functionality. In-fact, if I were to write tests, it would be to verify that the operators are implemented correctly rather than the library is mathematically sound. Here is a benchmark that roughly matches the benchmark from earlier:
from CQuat import CQuat
from random import random
import time


def rand_CQuat():
    q = CQuat()
    q.w = random()*20000
    q.x = random()*20000
    q.y = random()*20000
    q.z = random()*20000
    return q

def time_binary_op(func, N):
    ops = [(rand_CQuat(),rand_CQuat()) for i in range(0,N)]

    t1 = time.time()
    for i in range(0,N):
        func(ops[i][0], ops[i][1])

    t2 = time.time()
    print(f"%s avg time %s" % (func.__name__, (t2 - t1)*(1e9/N)))

def time_unary_op(func, N):
    ops = [rand_CQuat() for i in range(0,N)]

    t1 = time.time()
    for i in range(0,N):
        func(ops[i])

    t2 = time.time()
    print(f"%s avg time %s" % (func.__name__, (t2 - t1)*(1e9/N)))

if __name__ == '__main__':
    # generate input quaternions
    N = 10000
    time_unary_op(CQuat.norm,N)
    time_unary_op(CQuat.__neg__,N)
    time_unary_op(CQuat.inverse,N)
    time_unary_op(CQuat.conjugate,N)


    time_binary_op(CQuat.__add__,N)
    time_binary_op(CQuat.__sub__,N)
    time_binary_op(CQuat.__mul__,N)
    time_binary_op(CQuat.rotate,N)
and now when we run it, we get...
norm avg time 29.18243408203125
__neg__ avg time 47.11151123046875
inverse avg time 47.63603210449219
conjugate avg time 38.19465637207031
__add__ avg time 74.24354553222656
__sub__ avg time 67.78240203857422
__mul__ avg time 85.25848388671875
rotate avg time 93.12629699707031
Wow, much slower! But that's in line with what we expect from our sczhmoozy waiter. We now have a well defined module. We've quantified the C and Python portions and they seems good enough for government work. You can share the source with people that want to install the package, or just share the .so file if you want to be secretive about it. Here's also where you could start writing additional utilities (probably in python) to do those conversion functions we talked about earlier. It could even be a utility file specifically for numpy or opencv types. At this level it makes sense. Could pyquaternion be better? Absolutely. We could start inlining assembly and putting in different optimization flags or allow for a trade-off with a less precise floating type or even make the precision flexible. The code could be more verbose and we could probably make many operators more concise. Even the C interface could be more generous about converting types to quaternions for us. Will I keep working on this gradually and keep improving this library bit by bit? Absolutely not. A program only needs to be clear enough, a program only needs to be fast enough, it doesn't need to be as eloquent or as speedy as possible, but it's your job to think of what "enough" is. From the perspective of learning about modules, writing better C, writing this post, communicating all of it to you, and being within my time budget, there is no more perfect version of CQuat than what you've been reading. Thanks for reading, I didn't know any of this before starting. A full implementation of the project is found on my gitlab.