Awesome
c-iterators
A demonstration of implementing, and using, a "type safe", extensible, and lazy iterator interface in pure C99. The iterable is generic on the input side, but not output side - functions taking an Iterable
don't need to know the concrete data structure backing up the Iterable
, but the type of value the Iterator
yields must be concrete and exact, no void*
. (Well, you can still make it void*
if you want - but I wouldn't suggest it.)
This doesn't mean you can't have an iterable of generic elements though. More on that can be found in Iterable of Generic Elements and The Typeclass Pattern document.
The only files you need to implement the Iterator
typeclass, for your own types, are- typeclass.h
, maybe.h
, and iterator.h
. The usages of these files, as well as extra utilities operating on iterables are shown in examples/
. examples/iterutils
also demonstrates the implementation of take
and map
utilites.
More info about the file structure can be found in the Architecture document.
You can find the generated docs here.
If you're looking for functional, lazy abstractions on top of base iterators, check out Lazy Abstractions and c-iterplus.
A small taste
Here's a snippet where the Iterator
typeclass has been implemented for a "fibonacci struct". Giving you an iterable representing the infinite Fibonacci sequence.
Iterable(uint32_t) it = get_fibitr(); /* Create an infinite fibonacci sequence iterable */
Iterable(uint32_t) it10 = take_from(it, 10, uint32_t); /* Iterable of the first 10 items in the sequence */
/* Print the first 10 items */
foreach (uint32_t, n, it10) {
printf("%" PRIu32 " ", n);
}
puts("");
The fibonacci struct for which Iterator
has been implemented looks like-
typedef struct fibonacci
{
uint32_t curr;
uint32_t next;
} Fibonacci;
The get_fibitr
macro does nothing but just initialize that struct with curr = 0
, and next = 1
, turn it into an Iterable
and return it.
The take_from
macro is explained in Lazy Abstractions. You may be familiar with take
already though.
This entire construct is lazy. No extra iteration is performed. The only iteration that happens here is in the explicit foreach
loop. Neither get_fibitr
nor take_from
does eager generation. In fact, you could even do take
and map
at the same time, and both would be evaluated together - in a singular iteration.
/* A function that increments and returns the given integer */
static uint32_t incr(uint32_t x) { return x + 1; }
...
Iterable(uint32_t) it = get_fibitr(); /* Create an infinite fibonacci sequence iterable */
Iterable(uint32_t) it10 = take_from(it, 10, uint32_t); /* Iterable of the first 10 items in the sequence */
Iterable(uint32_t) incrit10 = map_over(it10, incr, uint32_t, uint32_t); /* Map the `incr` function over it10 */
How cool is that? You can see the map
utility in action at map_over.c. Its implementation is explained in Lazy Abstractions
Highlights
-
Pure C99 support, no non standard extensions used
-
Type safety (through the usage of abstracted macros that monomorphize based on the type given)
Though you may have to actually turn on the warnings, preferably all warnings.
-
Lazy-ness (the iterables are all lazily consumed, iterator utilites can also be chained lazily to evaluate all at once on demand)
-
Extensible (
Iterable
uses dynamic dispatch, allowing library functions to use it in a public API) -
Functions working on
Iterables
can returnIterables
- allowing them to be chained together lazily. This means you can have amap
that returns anIterable
, pass it through afilter
that also returns anIterable
- and both the map and filter will be evaluated in one singular iteration, on demand. Or you couldtake
from anIterable
and thenmap
on it, there won't be 2 iterations to do this, just one. (see iterutils and Advanced Usage)
Building
Although you don't really need to build anything per se, since the primary files (maybe.h
, typeclass.h
, and iterator.h
) are just headers that you can include in your project, you may still use the provided CmakeLists.txt
(CMake 3.15 or higher) to build an executable running all the examples. The built executable should be present in the examples/
directory inside the build directory.
UNIX
mkdir build
cd build
cmake -G "Unix Makefiles" ..
This will create the required make
config inside build/
. Now you can run make
to build the executable.
Windows
Visual Studio (2017 or higher)
You must have CMake integration for Visual Studio installed.
Open this project in Visual Studio (2017 or higher) and hit Build -> Build All
from the toolbar. This will build the executable and put it into a directory named out/
(by default). You can also directly run the built executable by choosing the "Startup Item" near the debug configuration.
MinGW/Cygwin
Same as the unix way.
A word on terminology
I like to use the word Iterator
to refer to the typeclass, or the interface, or trait - whatever you wanna call it. Iterable
, on the other hand, I use to refer to the Iterator
instance, the concrete type that functions can take in, the type that holds the self
member. This terminology is just for clarity - I don't really mind if you use the 2 words interchangably.
Usage
In general, there are some contraints to implementing Iterator
for a type-
-
The type must be a pointer
-
The type must be able to hold iteration information about itself - specifically, how much of itself has been iterated already and what's the next element.
This is because the
next
function of the typeclass needs to just take in this type and figure out what to return. -
The element that will be yielded from the iterator instance of this type, must have an alphanumeric type name. If the element is a pointer,
typedef
it into some alphanumeric type name. -
A
Maybe(T)
for the correspondingT
(type that theIterator
will yield) must exist.
Of course, you'll also need to have the Iterator(T)
and Maybe(T)
for a certain T
(element the Iterable
will yield) defined before you can implement Iterator(T)
for anything. Remember to define those using DefineIteratorOf
and DefineMaybe
respectively.
The examples define the common Maybe
and Iterator
types in func_iter.h, this file is then included by most other files.
Iterating through an Iterable
To iterate through an Iterable
, you call the Iterable
's next
function and pass it the self
member of the iterator.
This function's return value is of type Maybe
. In case you're unfamiliar- a Maybe
type represents the presence or absence of a value. As a C programmer, think of how a pointer can either point to a valid object, or be NULL
- indicating the absence of any valid object to point to.
When the iteration ends, a Nothing
value is returned. Otherwise, a Just
value is returned. Use the is_nothing
or is_just
macros to check whether or not a certain Maybe
struct contains a value. If is_nothing
returns false, or is_just
returns true - you can use from_just
or from_just_
to extract the actual value from the Maybe
struct.
Quick note: Difference between from_just
and from_just_
from_just_
is simply the "unsafe" version of from_just
. from_just_
assumes the passed maybe actually has a Just
value and simply returns it, if the passed maybe did not have a value (i.e was Nothing
), the behavior is undefined (the value can be indeterminate, but if the Maybe
struct was built using the provided Nothing
macro, the value will actually be zero initialized). from_just
on the other hand, will abort the program with an error if the passed maybe did not have a value (i.e was Nothing
). It is totally safe to use from_just_
after you've checked is_nothing
or is_just
accordingly, though.
When you call next
on an Iterable
, you consume the iterator. This is a mutating process - the next time you call next
, it'll return the next element - until it is fully consumed. Once that happens, calling next
on it will keep returning Nothing
values.
Note: This behavior is actually dependent on the next
function implementation that the user provides when making their own data stucture an Iterable
. An erroneous next
function will not have defined behavior.
So, you can iterate through the entire iterable (therefore consuming it) by doing-
Iterable(int) it = ...; /* Acquire the iterable */
for (Maybe(int) res = it.tc->next(it.self); is_just(res); res = it.tc->next(it.self)) {
int x = from_just_(res);
/* do stuff with x */
}
/* Iterable returned `Nothing`, iteration finished - iterable has been fully consumed */
Seems like a repeating pattern for any given type (here the type is int
), right? You can implement a macro instead-
#define UNIQVAR(x) CONCAT(CONCAT(x, _4x2_), __LINE__) /* "Unique" variable name */
/* Iterate through given `it` iterable that contains elements of type `T` - store each element in `x` */
#define foreach(T, x, it) \
Maybe(T) UNIQVAR(res) = (it).tc->next((it).self); \
for (T x = from_just_(UNIQVAR(res)); is_just(UNIQVAR(res)); \
UNIQVAR(res) = (it).tc->next((it).self), x = from_just_(UNIQVAR(res)))
(You can find this macro in iterable_utils.h)
Using this macro instead of the manual loop, the above snippet could look like-
Iterable(int) it = ...; /* Acquire the iterable */
foreach (int, x, it) {
/* do stuff with x */
}
Much cleaner!
Some basic functions that work on an Iterable
You can find these in iterable_utils.c
sum_intit
- Sum all values in an Iterable(int)
int sum_intit(Iterable(int) it)
{
int sum = 0;
foreach (int, x, it) {
sum += x;
}
return sum;
}
print_strit
- Print all values in an Iterable(string)
/* Note: `string` is just `typedef`-ed `char*` */
void print_strit(Iterable(string) it)
{
foreach (string, s, it) {
printf("%s ", s);
}
puts("");
}
revlist_from_intit
- Build a singly linked list from an iterable (list is built in reverse order)
This one is much more useful and practical, but slightly less basic.
IntList revlist_from_intit(Iterable(int) it)
{
IntList list = Nil;
foreach (int, val, it) {
list = Cons(val, list);
}
return list;
}
This allows you to build a linked list from any Iterable
, and since Iterable
is lazy - and can be implemented for any type, you not only have a generic function to build a list but you also essentially skip an extra iteration.
Note: Cons
is just an alias to prepend_intnode
, which is a function that prepends values to a singly linked list of ints. Nil
is an alias to NULL
.
Expected behavior of next
When you're implementing Iterator
for your desired type, the next function implementation you provide must follow some rules (outside of the context of the type system). These are as following-
- The function must return
Nothing
at the end of iteration, all returns before this must beJust
. - Once
Nothing
has been returned (i.e iteration has ended) - any extra calls to thenext
function must keep returningNothing
.
Implementing Iterator
For Arrays
We'll start with implementing Iterator
for good ol' arrays. You can find the code for the implementation part in array_iterable.h and array_iterable.c.
We need to keep track of the index and the size of the array to implement Iterator
for it. So we need a struct
like-
struct arriter
{
size_t i;
size_t const size;
T const* const arr;
}
A struct that keeps track of the i
(the current index), the size
(the length of the array), and the array itself. It forbids modifications to size
and arr
.
Notice the T
, this isn't an actual type. The snippet just demonstrates that each arriter
will need to hold its own type of array. Of course, you could cheat and use void*
- but let's keep it type safe here.
We'll call this struct concept ArrIter
. This concept is similar to rust's IntoIter
.
An ArrIter
where T = int
(i.e array of ints) would look like-
struct intarriter
{
size_t i;
size_t const size;
int const* const arr;
}
You can generalize this into a macro-
#define ArrIter(ElmntType) ElmntType##ArrIter
#define DefineArrIterOf(T) \
typedef struct \
{ \
size_t i; \
size_t const size; \
T const* const arr; \
} ArrIter(T)
The macro will define the ArrIter
struct based on the given array element type. It'll also use that type name to name the struct itself. You can later refer to this struct using ArrIter(T)
. Let's define one for int
-
DefineArrIterOf(int);
Now, we need to implement Iterator
for ArrIter(T)
. To do that, we use the impl_iterator
macro provided by iterator.h
. But before that, we need the next
function that can work on ArrIter
-
static Maybe(int) intarrnxt(ArrIter(int) * self)
{
int const* const arr = self->arr;
return self->i < self->size ? Just(arr[self->i++], int) : Nothing(int);
}
To implement Iterator
for a type ItrbleType
, that yields elements of type T
- you need a next
function of the signature- Maybe(T) (*)(ItrbleType self)
. Here, ItrbleType
is ArrIter(int)*
and T
is int
.
Now that we have the next
implementation for this struct, we can implement the typeclass.
impl_iterator(ArrIter(int)*, int, prep_intarr_itr, intarrnxt)
This defines a function named prep_intarr_itr
(feel free to name it whatever you want), that takes in a ArrIter(int)*
and returns an Iterable
. Any function can now take in this Iterable
and iterate through it using the same interface without caring about the internals.
You now need to include the declaration of prep_intarr_itr
in a corresponding header file, so it's exposed to the outside world. You can also make this handy macro-
#define arr_into_iter(srcarr, sz, T) \
prep_##T ##arr_itr(&(ArrIter(T)) { .i = 0, .size = sz, .arr = srcarr })
This will take in an array (src
), its size (sz
), and its element type name (T
) - and call the prep function, automatically creating and passing in the ArrIter
struct. Note that this heavily relies on consistent naming of the function previously defined by impl_iterator
. This assumes that such functions will be named prep_:T:arr_iter
, so for T = int
, the function should be named prep_intarr_itr
. You can maintain this convention strictly by having another macro to name the function during definition-
#define prep_arriter_of(T) prep_##T ##arr_itr
And use this name during implementing-
impl_iterator(ArrIter(int)*, int, prep_arriter_of(int), intarrnxt)
And also when referring to it-
#define arr_into_iter(srcarr, sz, T) \
prep_arriter_of(T)(&(ArrIter(T)) { .i = 0, .size = sz, .arr = srcarr })
Consistency is key to safety!
For Linked Lists
Now, we'll implement Iterator
for a singly linked list. You can find the code for the implementation part in list_iterable.h and list_iterable.c.
Just so we're on the same page, the linked list impl looks like-
typedef struct int_node
{
int val;
struct int_node* next;
} IntNode, *IntList;
typedef IntNode const* ConstIntList;
It has the following functions to work with it-
/* Create and prepend an IntNode to given IntList and return the new list */
IntList prepend_intnode(int val, IntList list);
/* Print the given int list */
void print_intlist(ConstIntList head);
/* Free the given IntList */
IntList free_intlist(IntList head);
Just like we did for array, we will need a struct to keep track of the state of iteration for the list. We'll call this ListIter
-
#define ListIter(T) T##ListIter
#define DefineListIterOf(T) \
typedef struct \
{ \
T curr; \
} ListIter(T)
This is much the same construct as an ArrIter
. We have a ListIter
for a particular type of list. The T
here refers to the type of the list struct itself. Let's define a ListIter
for our linked list built using IntNode
s. We'll use a IntNode const*
- to make sure the iterable has no way to modify the actual contents of the node. Since T
has to be alphanumeric - we typedef-ed it into ConstIntList
-
DefineListIterOf(ConstIntList);
Now, we will implement Iterator
for ListIter(T)
. Just like last time, we use impl_iterator
. Here's the next
function we'll use for our ListIter(ConstIntList)
impl-
static Maybe(int) intlistnxt(ListIter(ConstIntList) * self)
{
IntNode const* node = self->curr;
if (node == Nil) {
return Nothing(int);
}
self->curr = node->next;
return Just(node->val, int);
}
Note: Nil
is an alias to NULL
.
Now that we have the next
implementation for this struct, we can implement the typeclass-
impl_iterator(ListIter(ConstIntList) *, int, prep_listiter_of(ConstIntList), intlistnxt)
Where prep_listiter_of
is a similar construct to the previously encountered prep_arriter_of
, defined as a macro-
#define prep_listiter_of(T) prep_##T ##_itr
Finally, we can have a nice helper macro to convert a list to an iterable-
#define list_into_iter(head, T) prep_listiter_of(T)(&(ListIter(T)){.curr = head})
Examples
- Using an array's iterator instance
- Using a list's iterator instance
- Using an iterable to build a list
- Using an iterator to represent the infinite fibonacci sequence
- Mapping over an iterable
Things to keep in mind
-
Mutation is inherent to iterators. During every iteration, the state of the structure backing up the iterable is altered. Once an iterator has been fully consumed, it can no longer be iterated over - it'll just keep returning
Nothing
. You may already be used to this behavior if you're using a non-pure language with built in iterators though. -
If you're making a custom iterable utility that is backed up by another iterable (see
map
,take
) - the source iterable is also consumed when you iterate over the wrapper. This is demonstrated, and taken advantage of, in the fibonacci example. -
Be very careful about lifetimes when you're using the very convenient macros showcased in the examples! All the macros that create and return an
Iterable
take the address of a compound literal. Compound literals are local to the scope and hence the address is valid for the lifetime of that scope. Don't use theIterable
outside that scope. As a rule of thumb, always declare and initialize theIterable
s in the same line (using the macros). -
If you need to return an
Iterable
from a function - you should make sure itsself
member's lifetime doesn't end upon returning. Sinceself
is a pointer, the data it is pointing to may have any storage duration. If you're responsible for filling thisself
member - make sure you pay attention to its lifetime.As mentioned previously, the utility macros, used in the examples to build
Iterable
s, use compound literals - whose lifetimes end once the enclosing scope ends.Iterable
s built in this way are not suitable to be returned (or used) outside of their enclosing scope. -
The
tc
member of the typeclass contains a pointer to a struct withstatic
storage duration - so this pointer is totally reusable in any scope.
Semantics
maybe.h
The Maybe
struct is quite simple. This is what it generally looks like (without a concrete type)-
typedef enum
{
MaybeTag_Nothing = 0, /**< `Nothing` tag - indicates absence of a value. */
MaybeTag_Just /**< `Just` tag - indicates presence of a value. */
} MaybeTag;
struct maybe_t
{
MaybeTag tag;
/* Don't access this member manually */
T val;
};
It's a tagged "union", nothing special. The Just
tag is used to indicate the presence of a value in val
, and Nothing
is used to indicate the absence of a value in val
.
The DefineMaybe
macro is used to define an actual concrete Maybe
struct for the given type, the type is also used to name the Maybe
struct.
#define Maybe(T) Maybe##T
The name of the Maybe
struct containing a value of type T
is just Maybe
and the type name concatenated together. This is why T
must be alphanumeric.
There's one more thing DefineMaybe
does, it defines a static inline
function-
static inline T T_from_just(Maybe(T) maybex)
{
if (is_just(maybex)) {
return maybex.val;
} else {
fputs("Attempted to extract Just value from Nothing", stderr);
abort();
}
}
This is the function the from_just
macro calls. The function ensures type safety - but most importantly, it ensures that referring to maybex
doesn't execute it multiple times. In case the user passes an expression with side effects (that returns a Maybe(T)
) to from_just
.
So, a call like DefineMaybe(int)
translates to-
typedef struct
{
MaybeTag tag;
/* Don't access this member manually */
int val;
} Maybeint;
static inline int int_from_just(Maybeint maybex)
{
if (is_just(maybex)) {
return maybex.val;
} else {
fputs("Attempted to extract Just value from Nothing", stderr);
abort();
}
}
Alongside the utilities to define and refer to a Maybe
of a certain type - there's also type constructors and value extractors, as well as macros to check if the Maybe
is a Just
or Nothing
.
-
Just
takes in a value, and the type of said value (alphanumeric, same one used duringMaybe
definition) and constructs aMaybe(T)
. This is "type safe",v
must actually be of typeT
, otherwise there will be, at worst a warning, and at best an explicit error.Just(1, int)
translates to((Maybe(int)){.tag = MaybeTag_Just, .val = (1)})
. Just the compound literal to build the struct. Type safety comes from the fact that theval
member forMaybe(int)
is of typeint
. -
Nothing
takes in a type (alphanumeric, same one used duringMaybe
definition) and constructs aMaybe(T)
tagged withNothing
.Nothing(int)
translates to((Maybe(int)){0})
. I decided to zero initialize the struct since I've set theNothing
tag to0
explicitly. However, it'd be totally valid to only set the tag toNothing
and leave theval
member indeterminate. Since you shouldn't accessval
if tag isNothing
anyway. -
from_just
andfrom_just_
have previously been mentioned briefly.from_just
takes in aMaybe
struct, and theT
(type theMaybe
contains) and calls theT##_from_just
function above. The function checks if theMaybe
is indeedJust
, and returns the value. If it isNothing
, however, the program aborts.from_just_
directly accesses and returns theval
member of the givenMaybe
struct, it does not take in theT
parameter, since it doesn't need to. Only use this after you've made sure theMaybe
struct is aJust
. Otherwise the behavior is undefined. Though in practical terms, if theMaybe
struct was built using theNothing
macro,val
would just be zero initialized. This should not be relied on however. -
is_just
andis_nothing
are self explanatory, they compare thetag
toMaybeTag_Just
andMaybeTag_Nothing
respectively.
typeclass.h
This file provides utility macros to define a typeclass and its instance.
A typeclass is just a struct containing a bunch of function pointers.
A typeclass instance contains a pointer to the typeclass struct in its tc
member. It also contains the self
member of type void*
. This is the concrete type that will be used by functions.
-
The
typeclass
macro takes in a semicolon separated list of function pointer members and puts them into a struct. It doesn't name the struct, that's upto the caller.typedef typeclass( size_t (*const from_enum)(void* self); void* (*const to_enum)(size_t x) ) Enum;
translates to
typedef struct { size_t (*const from_enum)(void* self); void* (*const to_enum)(size_t x); } Enum;
-
The
typeclass_instance
macro takes in the type name of the previously defined typeclass and defines the typeclass instance struct. It also doesn't name the struct.typedef typeclass_instance(Enum) Enumerable;
translates to
typedef struct { void* self; Enum const* tc; } Enumerable;
iterator.h
This uses the typeclass utilities mentioned previously to define the Iterator
typeclass and its instance, Iterable
. The naming of each of these defined structs is dependent on the type of the element the iterator yields.
#define Iterator(T) T##Iterator
#define Iterable(T) T##Iterable
The DefineIteratorOf
macro takes in the typename of the element the Iterator
will yield, and uses that to define the iterator and iterable with appropriate names. This is why the type names must be alphanumeric.
DefineIteratorOf(int);
translates to
typedef typeclass(Maybe(int) (*const next)(void* self)) intIterator;
typedef typeclass_instance(Iterator(int)) intIterable;
Two structs, of names Iterator(int)
(i.e intIterator
) and Iterable(int)
(i.e intIterable
), respectively.
Now, we need a function to implement Iterator
for our own type. That's where the impl_iterator
macro comes in. This is its signature-
#define impl_iterator(IterType, ElmntType, Name, next_f) ...
It defines a function, which turns the custom type (for which the impl is for) into an Iterable
.
It takes in-
-
IterType
, the custom typeIterator
is being implemented for -
ElmntType
, the type thisIterator
will yield.An
Iterator(ElmntType)
(andIterable(ElmntType)
) should already exist, obviously.Must be alphanumeric just like everywhere else.
A
Maybe(ElmntType)
must also exist. -
Name
to give to the function being defined. -
next_f
, the next function implementation for thisIterType
.Must be of type
Maybe(ElmntType) (*)(IterType)
. It should take in a value ofIterType
, and return aMaybe(ElmntType)
.
Generally, you need to include the declaration of this function in a header file yourself.
In the array_iterable.c example. The impl_iterator(ArrIter(int)*, int, prep_arriter_of(int), intarrnxt)
translates to-
Iterable(int) prep_intarr_itr(ArrIter(int)* x)
{
Maybe(int) (*const next_)(ArrIter(int)* self) = (intarrnxt);
(void)next_;
static Iterator(int) const tc = {.next = (Maybe(int) (*const)(void*))intarrnxt};
return (Iterable(int)){.tc = &tc, .self = x};
}
Note the first 2 lines - those make sure the passed next
function impl has the exact correct type (no void*
shenanigans). The 2 lines will not generate any extra code when compiled with a decent compiler - they are just there to present warnings/errors when the type of the passed function is implicitly wrong.
Advanced Usage
Lazy Abstractions
This lazy iterator interface demonstrated, lets you achieve strong abstractions that are also lazy. 2 such abstractions, take
and map
are demonstrated in fibonacci.c and map_over.c respectively. If you'd like even more abstractions, you can find them in c-iterplus.
Many of these abstractions follow the same basic pattern. Have a custom struct that wraps a given iterable - have some extra context in that struct to operate on the elements of said iterable, and a next
function implementation that simply iterates through the src iterable and does some necessary action based on the stored context. No extra iteration is done, the extra functionality is simply applied on top of the source iterable's next
function on demand.
This allows take
to simply wrap the given iterable inside a struct with some context and turn that struct into its iterable implementation. No iteration needs to happen in this process, it's a completely lazy process.
The take
utility
Let's look at how the take
-like utility is implemented. You can find the implementation in iterutils.
The struct we'll use to implement this utility looks like-
struct
{
size_t i;
size_t const limit;
Iterable(T) const src;
}
It stores the source iterable (src
), the iteration step (i
), and the limit to stop the iteration at (limit
). We'll call this struct IterTake
, just like Iterator
and Iterable
, the exact name will depend on the element type.
#define IterTake(ElmntType) IterTake##ElmntType
#define DefineIterTake(ElmntType) \
typedef struct \
{ \
size_t i; \
size_t const limit; \
Iterable(ElmntType) const src; \
} IterTake(ElmntType)
Now we need the next
function impl to implement Iterator
for this IterTake
struct. That function should look like-
static Maybe(T) IterTake(T)_nxt(IterTake(T) * self)
{
if (self->i < self->limit) {
++(self->i);
Iterable(T) srcit = self->src;
return srcit.tc->next(srcit.self);
}
return Nothing(T);
}
It simply iterates through the source iterable but stops if it reaches the limit specified in the IterTake
struct. Now, impl_iterator
can be used to implement Iterator
for the IterTake(T)
-
impl_iterator(IterTake(T)*, T, prep_IterTake(T)_itr, IterTake(T)_nxt)
And that's it! Now an IterTake
can be converted into an Iterable
. But how about we also make an abstraction to turn a given Iterable
into another Iterable
with the IterTake
applied directly?
#define take_from(it, n, T) prep_IterTake(T)_itr(&(IterTake(T)){.i = 0, .limit = n, .src = it})
take_from
can be used to take n
elements of type T
from an iterable, it
.
You'll notice that in the previous snippets, the pre processor token concatenation for the function names isn't going to quite work - that was just for simplification, in reality IterTake(T)_nxt
is written as CONCAT(IterTake(T), _nxt)
where CONCAT
is-
#define CONCAT_(A, B) A##B
#define CONCAT(A, B) CONCAT_(A, B)
While we're at it, let's also make a macro to name the prep_
function consistently, just like we did for arrays and lists-
/* Name of the function that wraps an IterTake(ElmntType) for given ElmntType into an iterable */
#define prep_itertake_of(ElmntType) CONCAT(CONCAT(prep_, IterTake(ElmntType)), _itr)
We can use that in the take_from
macro instead, as well as in the function name in the impl_iterator
macro.
Combining all of that, gives you-
/*
Define the iterator implementation function for an IterTake struct
The function is named `prep_itertake_of(ElmntType)`
*/
#define define_itertake_func(ElmntType) \
static Maybe(ElmntType) CONCAT(IterTake(ElmntType), _nxt)(IterTake(ElmntType) * self) \
{ \
if (self->i < self->limit) { \
++(self->i); \
Iterable(ElmntType) srcit = self->src; \
return srcit.tc->next(srcit.self); \
} \
return Nothing(ElmntType); \
} \
impl_iterator(IterTake(ElmntType)*, ElmntType, prep_itertake_of(ElmntType), CONCAT(IterTake(ElmntType), _nxt))
You need to call this macro with a concrete type (a type for which an Iterator
, and Iterable
, already exist) inside a C source file. In the examples, this macro is called inside iterable_utils.c - for defining IterTake
for a couple of types. The declrations of the prep_
function is then included in the iterable_utils.h header file.
That's the implementation, let's use it now!
Iterable(uint32_t) it = get_fibitr(); /* Create an infinite fibonacci sequence iterable */
Iterable(uint32_t) it10 = take_from(it, 10, uint32_t); /* Iterable of the first 10 items in the sequence */
/* Print the first 10 items */
foreach (uint32_t, n, it10) {
printf("%" PRIu32 " ", n);
}
puts("");
You can find this code in fibonacci.c. get_fibitr
simply returns an infinite Iterable(uint32_t)
representing the fibonacci sequence. take
shines in its ability to operate on infinite iterators. In the above snippet, we take the first 10 elements of it
(the fibonacci sequence) and store the built iterable in it10
.
Since this is a lazy process - it10
is actually just backed by it
. When you consume from it10
, it
is also being consumed from.
This means that you can call take_from
on it
again and try to get 10 more items, and this time you'll get the next 10 items. Cool!
If the Iterable
is implemented correctly (i.e the next
function behaves as expected), Using take_from
on an already consumed Iterable
does not have any traps - it simply gives you back an already consumed Iterable
.
Even if you use take_from
with a n
value greater than the length of the source Iterable
, the returned Iterable
will simply be fully consumed before reaching n
- no surprises.
The map
utility
The map
implementation will follow a very similar pattern. We need a struct to hold the mapping function, a next
function impl to use that mapping function while iterating over the source iterable, and the Iterator
implementation for this struct.
However, we now need to think about the mapping function's type. It's argument should be the same type as that of the source iterable's elements - but what about its return type? The return type can be anything. So each one of these special map structs, need to be parametrize on not only the element type, but also the return type of the mapping function.
#define IterMap(ElmntType, FnRetType) IterMap##ElmntType##FnRetType
#define DefineIterMap(ElmntType, FnRetType) \
typedef struct \
{ \
FnRetType (*const mapfn)(ElmntType x); \
Iterable(ElmntType) const src; \
} IterMap(ElmntType, FnRetType)
As usual, type safety is a priority.
The next
function impl should look like-
static Maybe(FnRetType) CONCAT(IterMap(ElmntType, FnRetType), _nxt)(IterMap(ElmntType, FnRetType) * self)
{
Iterable(ElmntType) const srcit = self->src;
Maybe(ElmntType) res = srcit.tc->next(srcit.self);
if (is_nothing(res)) {
return Nothing(FnRetType);
}
return Just(self->mapfn(from_just_(res)), FnRetType);
}
It simply consumes the source iterable, applies the mapping function on each element and returns the result.
Now, we can use impl_iterator
-
/* Name of the function that wraps an IterMap(ElmntType, FnRetType) for given ElmntType and FnRetType into an iterable */
#define prep_itermap_of(ElmntType, FnRetType) CONCAT(CONCAT(prep_, IterMap(ElmntType, FnRetType)), _itr)
impl_iterator(IterMap(ElmntType, FnRetType)*, FnRetType, prep_itermap_of(ElmntType, FnRetType), CONCAT(IterMap(ElmntType, FnRetType), _nxt))
Combine all of that together, and you get a similar macro as last time-
/*
Define the iterator implementation function for an IterMap struct
Also define a function with the given `Name` - which takes in an iterable and a function to map over said iterable,
wraps said iterable and function in an `IterMap` struct and wraps that around its `Iterable` impl
*/
#define define_itermap_func(ElmntType, FnRetType) \
static Maybe(FnRetType) CONCAT(IterMap(ElmntType, FnRetType), _nxt)(IterMap(ElmntType, FnRetType) * self) \
{ \
Iterable(ElmntType) const srcit = self->src; \
Maybe(ElmntType) res = srcit.tc->next(srcit.self); \
if (is_nothing(res)) { \
return Nothing(FnRetType); \
} \
return Just(self->mapfn(from_just_(res)), FnRetType); \
} \
impl_iterator(IterMap(ElmntType, FnRetType)*, FnRetType, prep_itermap_of(ElmntType, FnRetType), \
CONCAT(IterMap(ElmntType, FnRetType), _nxt))
The key difference is that this works on ElmntType
and FnRetType
, as opposed to just ElmntType
. You can call this macro to define the functions in a source file, and include the declarations in a header file. In the examples, this is done in iterable_utils.c and iterable_utils.h respectively.
We should also have a convenient macro in the same style as take_from
-
/* Map the function `fn` of type `FnRetType (*)(ElmntType)` over `it` to make a new iterable */
#define map_over(it, fn, ElmntType, FnRetType) \
prep_itermap_of(ElmntType, FnRetType)(&(IterMap(ElmntType, FnRetType)){.mapfn = fn, .src = it})
And that's all there is to implementing, let's use it!
int arr[] = {1, 2, 3};
/* Turn the array into an Iterable */
Iterable(int) arrit = arr_into_iter(arr, sizeof(arr) / sizeof(*arr), int);
/* Map an increment function over the iterable */
Iterable(int) mappedit = map_over(arrit, incr, int, int);
/* Print the iterable */
foreach (int, x, mappedit) {
printf("%d ", x);
}
puts("");
where incr
is-
static int incr(int x) { return x + 1; }
You can find this code in map_over.c. The above snippet maps the incr
function over the Iterable(int)
. Once again, this is a lazy process - no iteration is done by map_over
. The iteration, as well as the mapping function application, is only done in the foreach
.
A quick glance at implementing filter
Implementing filter
would also be just as simple as the previous examples. Though no concrete implementation is included in this repo, the pattern is really the exact same. Here's what the next
function impl would be-
static Maybe(ElmntType) CONCAT(IterFilter(ElmntType), _nxt)(IterFilter(ElmntType) * self)
{
Iterable(ElmntType) const srcit = self->src;
while (1) {
Maybe(ElmntType) res = srcit.tc->next(srcit.self);
if (is_nothing(res) || self->filterfn(from_just_(res))) {
return res;
}
}
}
where IterFilter(T)
would be-
struct
{
bool (*const filterfn)(T x);
Iterable(T) const src;
}
Iterable of Generic Elements
In the beginning of this README, while introducing this Iterator
interface, I talked about how an Iterator
is only generic on the input side, not on the output side. The element the Iterator
yields must be a concrete type - which separates Iterator(int)
and Iterator(string)
, and forbids you from using them interchangably.
But what if you wanted generic elements without giving up type safety? Well, you'd need some sort of constraint - I should be able to sum both an Iterable(int)
and an Iterable(float)
. Ignoring the bounds issues, it is agreeable that there are multiple types that can perform a general action.
Emphasis on "general action". Typeclasses (or interfaces) allow you to have a generic constraint based around the ability to do some action(s). What if, we had an Iterator
that yielded typeclass instances?
typedef typeclass(char* (*const show)(void* self)) Show;
typedef typeclass_instance(Show) Showable;
This is the Show
typeclass. It represents the ability of a type to be converted into a string (that may then be printed).
If we had an Iterator(Showable)
- we could turn each element into the strings representing them, doesn't really matter what type the actual data is, as long as it implements Show
. In the same way, you could have a Num
typeclass for arithmetic operations.
The pattern for defining and implementing such typeclasses in a type safe way, is the same as the pattern used to define and implement Iterator
. The typeclass and typeclass_instance struct, and an impl_iterator
macro that takes in some necessary info about the type for which the typeclass is being implemented, as well as the function implementations, type checks the function impls as a no-op, and returns the typeclass instance.
You can find examples for this usage, as well as more info in the Typeclass Pattern document.
Motivation
I needed some way to implement generic interfaces for a C library I'm working on. The library functions just needed a type that could do a certain thing, without caring about which exact type it could be backed by. It'd be the library function's responsibility to make the type do the action if and when needed.
Of course, I could segment a good bunch of these functions so one part of them do everything up until the point where the generic action needs to be performed, return control to the user, and let them continue to the next part of the library function by performing the action on their concrete type themselves and passing the result to the next part of the function. But this felt rather unintuitive, especially when the functions were already pretty small and performed extra computation which would be stored in the local scope.
I instead made the functions so that they take in a struct containing a void*
for the generic type and a function pointer for the required action, of the correct type. The library function wouldn't use this void*
directly (since that'd be unsafe) - but it could just pass it to the function pointer and call it.
After some experimenting with that approach to make it more extensible and general, with a heavy dosage of inspiration from haskell (and also from rust) - this is where it ended up. An Iterator
isn't the only typeclass I needed to implement, but it had the most potential for a demonstration - so here it is.