Awesome
C-Iterplus
Functional abstractions for working with iterators, as demonstrated in c-iterators.
Finding the longest common prefix of an array of strings
#include "itplus.h"
#include "sugar.h"
#include <string.h>
static bool pair_is_equal(Pair(char, char) x) { return fst(x) == snd(x); }
static char fst_chr(Pair(char, char) chrpair) { return fst(chrpair); }
/* Using iterators to find common prefix - with C11 sugar */
static char* cmn_prefx(string s1, string s2)
{
size_t len = 0;
char* const prefx = collect(
map(takewhile(
zip(chrarr_to_iter(s1, strlen(s1)), chrarr_to_iter(s2, strlen(s2))),
pair_is_equal),
fst_chr),
&len);
prefx[len] = '\0';
return prefx;
}
/* Accumulator function wrapping `cmn_prefx` - accumulator type is a heap allocated string */
static inline char* acc_cmn_prefx(char* acc, string s)
{
char* const res = cmn_prefx(acc, s);
free(acc);
return res;
}
int main(void)
{
string arr[] = {"flower", "flow", "flight"};
size_t arrlen = sizeof(arr) / sizeof(*arr);
char* const lngest_prefx = fold(strarr_to_iter(arr + 1, arrlen - 1), strdup_(arr[0]), acc_cmn_prefx);
puts(lngest_prefx); /* "fl" */
free(lngest_prefx);
return 0;
}
Output-
fl
Just like the polymorphic iterators, most of the abstractions are also implemented using The Typeclass Pattern. A polymorphism pattern based around actions, similar to Haskell typeclasses, Java/C# interfaces, or Rust traits. This pattern aims for-
- Extensibility (through dynamic dispatch)
- Minimal runtime cost (the only real cost being the dynamic dispatch)
- No dynamic memory allocations
- Type safety is a first class concern
The term "Iterplus" is shamelessly stolen borrowed from the original iterplus.
Information about file structure and file contents is described in ARCHITECTURE. You can find the generated docs and general information here.
Highlights
- Wrtten in pure, standard C (C99 for the actual utilities, C11 for extra syntactic sugar)
- Every utility that returns an iterable is lazy; nothing is evaluated until the iterable is explicitly iterated over
- No hidden memory allocations
- Type safety through the typeclass pattern
- It's kinda fun to play around with functional abstractions, I guess
Available Abstractions
take
- defined in itplus_take.hdrop
- defined in itplus_take.htake_while
- defined in itplus_takewhile.hdrop_while
- defined in itplus_dropwhile.hmap
- defined in itplus_map.hfilter
- defined in itplus_filter.hfilter_map
- defined in itplus_filtermap.hreduce
- defined in itplus_reduce.hfold
- defined in itplus_fold.henumerate
- defined in itplus_enumerate.hzip
- defined in itplus_zip.hcollect
- defined in itplus_collect.h
You can also implement your own abstractions using the same pattern. Refer to Semantics.
Usage
Just include itplus.h
and get to using it!
Before anything, you should familiarize yourself with the basics of these iterators. Discussed in c-iterators. You just need to know how to iterate through iterables though. TL;DR- you use the foreach
macro to iterate over an iterable.
Refer to tests or samples to look at how to implement the utilities. In general the pattern goes like this-
-
Define the iterplus structs required to define the specific utilities. Include it in a header.
For
tests/
, these are in common.h.For
samples/
, these are in impls.h. -
Define the specific iterplus utility function using the corresponding
define_
macro.For
tests/
, these are in impls.c.For
samples/
, these are in impls.c. -
Include the function signature of this defined in a header.
For
tests/
, these are in impls.h.For
samples/
, these are in impls.h.
There's also the IterPlus
, DeclIterPlus
, and DefnIterPlus
macro to define/declare all iterplus utility structs, and utilities themselves at once, for a specific type.
Most of these functions take in a pointer to an iterplus struct, that is filled with necessary information - and turns it into an iterable. For example, the define_itertake_func
macro defines a function that takes in an IterTake
struct, containing a specific type of iterable and the limit
value (as well as i
set to 0), and returns an Iterable
that has, at most limit
amount of elements. All of these elements are consumed from the iterable put into the IterTake
struct you passed to the function. The lifetime of the returned iterable is the same as the lifetime of the IterTake
struct pointed to by the given pointer.
Though I should mention, these abstractions are just for fun. I wanted to see what I could do with c-iterators. I think the the typeclass based polymorphism pattern is really useful. Even the regular lightweight iterators are certainly useful. But with the lack of closures, no way to capture types, and somewhat "meh" static dispatch support - there's a LOT of boilerplate needed to get you up on your feet even after including itplus.h
. It's pretty fun once you actually define all of this boilerplate - but that's not a great thing.
Semantics and Explanation
Most abstractions are documented in their own files, so you can view them in docs. The general idea behind every abstraction is further explained in the "Lazy Abstractions" part of the c-iterators README.
C11 Syntax Sugar
With _Generic
you can make macros to statically dispatch to the iterplus utility functions. Essentially making the usage be expression oriented, and certainly much better to use. But once again, more boilerplate, heh.
The main skeleton of these can be found in sugar.h) and sugar.c.
In general, you could implement generic selection with a macro take
, that takes an iterable and the amount of items to "take" from it. To do this, however, you need
-
A function that can put that iterable into an
IterTake
struct, as well as the given limit. -
A way to pass this
IterTake
struct into the function defined usingdefine_itertake_func
.As mentioned before, most of the functions defined using the
define_
macros, take in a pointer to an iterplus struct (in this caseIterTake
), and return anIterable
according to theIterTake
struct'sIterator
implementation.
All of this should ideally avoid heap allocation. This means you can't have a function that creates a local IterTake
struct, since its lifetime ends when the function ends, so the returned Iterable
would be pointing to invalid memory. You need to somehow make the IterTake
struct in the same scope.
Suppose the take
utility has been implemented for an Iterable(int)
-
define_itertake_func(int, takeint_to_itr)
i.e the function that turns an IterTake(int)*
to the corresponding Iterable(int)
, is named takeint_to_itr
. The signature of this function is- Iterable(int) takeint_to_itr(IterTake(int)* x);
You would use the take
utility on an Iterable(int)
, it
like so-
Iterable(int) it = ...;
Iterable(int) first_10 = takeint_to_itr(&(IterTake(int)){ .limit = 10, .src = it });
Compound literals allow you to create the struct and pass a pointer to it into the function, all in the same scope, in the same line.
This is pretty neat, but now you need to make this generic, something like-
Iterable(int) first_10 = take(it, 10);
where takeint_to_itr
is statically dispatched to by figuring out the type of it
.
But _Generic
takes actual expressions as values in its assoc list. This means you can't do-
#define take(it, n) _Generic((it), Iterable(int): takeint_to_itr)(&(IterTake(_Generic((it), Iterable(int): int))){ .limit = (n), .src = (it) })
Since int
is not a valid expression, it's a type! Making _Generic((it), Iterable(int): int)
a compile error.
You also can't do-
#define take(it, n) _Generic((it), Iterable(int): takeint_to_itr)(_Generic((it), Iterable(int): &(IterTake(int)){ .limit = (n), .src = (it) }))
Because every expression in the assoc list of _Generic
must be correct, whether or not they're actually chosen. it
of type Iterable(int)
would not be a valid assignment to .src
to IterTake(char)
for example. Even though that branch wouldn't be chosen anyway. This is a rather unfortunate and silly part of the spec.
Instead, you'll need a function, that does the following
- Takes in a pointer to an iterplus struct allocated in the same scope as the macro caller
- Takes in the iterable itself
- Takes in some extra context arguments if required
- Puts the iterable and the extra context specific arguments into the given struct
For the take
utility, this function looks like (for Iterable(int)
)-
Iterable(int) prep_tkint(IterTake(int) * tk, Iterable(int) x)
{
tk->src = x;
return takeint_to_itr(tk);
}
Now you can make the take
macro using _Generic
selection-
#define take(it, n) _Generic((it), Iterable(int): prep_tkint)(_Generic((it), Iterable(int): &(IterTake(int)){ .limit = (n)}), (it))
And you can add in more types as needed to the assoc list, provided you have their specific prep_tkint
-like wrapper function.
This is the core idea, it's expanded upon using a tiny bit of metaprogramming in sugar.h.
Pitfalls, Memory, Lifetime, and Ownership
-
When you use one iterable to build another iterable through the utilities, no implicit cloning is done. The new iterable is just consuming from the original iterable, possibly with some added context. This is pretty normal for iterators - but should be kept in mind.
Unless you're chaining abstractions together, don't feed the same iterator into multiple utilities. Iterating through one of those utilities will iterate from the other one as well. This is why, in snippets like this one,
get_fibitr
is called multiple times to acquire a new iterable. Otherwise, the utilitychain
would be chaining together 2 of the same iterables.... essentially achieving nothing.Instead, make a new iterator. Making iterators should be very inexpensive.
-
Consumers of iterables should not be expected to free the elements extracted from the iterable. Iterables should never clone data from their source. The iterator implementation for an array of strings should yield the exact same pointers contained within the array. No implicit cloning of the strings. The memory is owned by the array, and it is the responsibility of that array's owner to free the strings in it. The strings extracted from the iterable should not be freed.
This is not a forced requirement, but it is certainly encouraged to have a standard memory model. Every iterplus utility follows this principle. Nothing will ever be cloned, the values from the sources are simply re-extracted. It's the responsibility of the original source to manage memory.
This works great when chaining abstractions as well. Convert a array of pointers to an iterable,
zip
with another array of pointers (doesn't have to be the same type of pointers),filter
it by apredicate
function,drop
the firstN
elements,enumerate
them to get a final iterable. Now, when you iterate over this resulting iterable, every element is still the same elements from the original arrays. No need to manage the memory for these, leave that to whoever is handling the arrays themselves.A problem may arise, however, when you use utilities such as
map
, orfilter_map
. Here, it's upto the user about what they return. You couldmap
over an iterable to make another iterable where each element is actually heap allocated. In which case, you do need to free the elements extracted from the iterable. This should not be a problem however, since it's completely transparent. If you mapped with a function that heap allocates - well you know the elements will need to be freed. The function is provided by you.But what happens if you need to feed this iterable into a general API function, an API function that adheres to the standard principle of not freeing the iterable's elements? In general, you should use the
collect
utility to collect the iterable's data into an array. Now the array contains the heap allocated data that you can free later. Convert this back to an iterable and feed it into the API function.
Examples
You can find the implementations and examples within tests and samples.