Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What would you include in a tier-0 lib? #3

Open
capr opened this issue Jan 26, 2019 · 16 comments
Open

What would you include in a tier-0 lib? #3

capr opened this issue Jan 26, 2019 · 16 comments

Comments

@capr
Copy link
Member

capr commented Jan 26, 2019

I'll list a few things roughly in the order of how much I use them in my code. I've only written about 7K of Terra code total so far, so someone with more LOC written would probably have a better list and a different order, so FWIW:

  • the ternary operator macro: but what should we name it? the operator itself would be ideal (i.e. overloading ':' in the language when the first operand is a bool)... lacking that, I would prefer something as short as possible. I personally use iif (from sql, basic), but cond could work too (form lisp). Other suggestions?

  • print() and assert() -- my default debugging tools, use them all the time (tostring comes as a bonus, basically as a typed version of snprintf).

  • clock() -- monotonic clock, indispensable for profiling and benchmarking -- basically terralib.currenttimeinseconds but for the Terra runtime and implement it with more accurate implementations (clock_gettime(CLOCK_MONOTONIC) for Linux, mach_absolute_time for OSX)

  • typed versions of malloc, calloc, memset and memmove -- those could mimic ffi.new(), ffi.copy() and ffi.fill() semantics, so I would call them new(), copy() and fill() respectivelly. I would also add alloc() for malloc, since new() calls calloc which you may or may not want. I didn't test this but it would be interesting to know if Terra actually generates LLVM intrinsics for memmove and memset, that could be useful.

  • inc() and dec() macros -- to ease the pain of not having ++ and --

These are all in my low module, which begs the question -- what should we name this module? :)

@aiverson
Copy link
Collaborator

What exactly do you mean by tire-0? Did you mean to write tier-0? And what exactly does that mean to you?

@capr capr changed the title What would you include in a tire-0 lib? What would you include in a tier-0 lib? Jan 26, 2019
@capr
Copy link
Member Author

capr commented Jan 26, 2019

Yes, bad spelling.

@capr
Copy link
Member Author

capr commented Jan 26, 2019

It means the most basic tools that could almost be considered language extensions and which should really be built-ins but aren't, like the ones listed above, but without which the code becomes tedious and redundant i.e. writing if then else instead of an iif expression or a.b.c = a.b.c + 1 instead of inc(a.b.c) or memset(t, 0, sizeof(T)) instead of fill(t), etc.

@aiverson
Copy link
Collaborator

We need better debugging tools. The ability to do breakpoint debugging is really useful, but I don't see anything that currently gives us that ability. Building better debugging tools is a longer term project for later that probably requires more discussion. For a tier zero standard library, I have a few things that I definitely want in it.

IO facilities

printf and scanf from the C stdlib are workable, but they aren't great. They have security problems, the need to parse the format string at runtime is inefficient, they can't handle anything very sophisticated, and they aren't easy to metaprogram. I think writing a set of formatting and parsing IO primitives that have their format in a structure that can be efficiently metaprogrammed and resolved at compile time, that are more flexible for simple but nontrivial formats. For example CSV has quoted strings with escape characters. CSV should be dead simple to parse with stdlib tools, but C scanf can't handle that at all, which requires a manual FSA parser to deal with CSV. Making a parser do that should be simple and fast. And in Terra, it should be metaprogrammable. Same for formatting it out. I have some sample code relevant to this.

Iterators, transducers, and streams

Flexible and powerful filter-map-reduce operators should be available. I have workable implementations of these. Iterators over the basic collections should be available and tools to build new iterators should as well. I have a basic implementation of this. Streams for potentially asynchronous data and events are also useful and should be usable with the same API. I do not have code for this. These parts may not be appropriate for tier 0.

Memory management

Using malloc and free is acceptable if we want to only run code in environments with libc. WASM, which I am interested in using, does not have malloc and free. Microcontrollers running baremetal would not always have malloc and free. It may also be useful to have support for alternative allocators to do things like FIFO allocation, fixed size allocation, or other optimization scenarios. This needs to be architected in to the standard library so that it becomes the default way of doing things and because the standard library needs to support pluggable allocators for places where it allocates memory. Not all of this is in scope for tier zero, and all the alternative allocators are certainly not.

@aiverson
Copy link
Collaborator

I would rather use cond than iif as the name for the ternary operator. We could actually borrow Lisp's semantics there a bit more and let it have any odd number of arguments to handle the equivalent of if elseif elseif elseif else chains.

@aiverson
Copy link
Collaborator

aiverson commented Jan 26, 2019

A function to synthesize comparison metamethods out of some minimum spanning set of comparison metamethods would be useful too. If we could come up with a reasonably general comparator synthesis tool, that would be useful, and I think I have an idea for making that.

@capr
Copy link
Member Author

capr commented Jan 26, 2019

@aiverson can you post here some links with the stuff that you already have implementations for so we can take a look? thanks!

@capr
Copy link
Member Author

capr commented Jan 27, 2019

@aiverson can you make a top 5 or a top 10 list of your most used functions across a variety of problem domains? That's what I would start with for a tier-0 lib.

@aiverson
Copy link
Collaborator

Off the top of my head, some of my most commonly used functions and datastructures are, in no particular order:

  • ArrayList/Vector
  • printf/scanf and friends and equivalents
  • sort an array
  • binary search a sorted array
  • priority queue
  • hashmap
  • string manipulation (find, substring, split, etc.)

Repetitive code snippets that are amenable to metaprogramming

  • comparators for a struct based on a precedence order of fields in the struct
  • building a full set of comparison metamethods for a total ordering of a type.

@aiverson
Copy link
Collaborator

aiverson commented Feb 1, 2019

Here is an implementation of some basic filter-map-reduce capabilities.
https://github.com/aiverson/constellation/blob/master/src/fmr.t

It would need to be cleaned up a bit and extended to integrate with the rest of the standard library better before being integrated into the standardlib, but it should be a good start.

There are also some performance optimizations that I have thought of since writing that that I can implement.

@capr
Copy link
Member Author

capr commented Feb 1, 2019

That's great for a functional module, I'm just not sure we're on the same page here with regards to what tier-0 means and why it's needed. Also, IMHO a functional library would be quite limited in a language that doesn't support closures. For functional programming maybe adding closures to Terra would be a good research project (even in limited form like GCC's nested functions).

@aiverson
Copy link
Collaborator

aiverson commented Feb 2, 2019

Something that optimizes and simplifies loops, one the most commonly written parts of any program, seems like something that should be part of the very first bits of the standard library. Recursions schemes can wait because most people (including myself) don't have a good idea of how do design with them. Filter-map-reduce and code that can be replaced by them though is something that seems about as commonly used as the inc, dec, or cond macros you mentioned and code that can be replaced by them. Loops that can be made out of an FMR construct are common, and my library synthesizes efficient code for them while separating concerns clearly and making the programming less error prone. They certainly aren't that much less common than incrementing a variable, and are much more common than even a priority queue. This sounds like what you were describing as a tier zero library.

@capr
Copy link
Member Author

capr commented Feb 2, 2019

Well, we'll just have to agree to disagree then :) Like I said, the probability of two people to see eye-to-eye on such matters is pretty low. That being said, I'm willing to be proven wrong so here's my null-hypothesis: if you have terra code where you solved an actual problem and used this style a lot, I'd be very glad to learn something and upgrade my style.

@aiverson
Copy link
Collaborator

aiverson commented Feb 3, 2019

Good point. I'll work on this in the background and bring it up again when I have a nice compact example of doing something practical with it that I think demonstrates the value well. In the mean time, we should talk about the rest of the first section of the standard library to be implemented.

I think the first thing should be a collection of basic container types. Ideally, we would have some idea of allocators from the word go, but that is an architectural consideration that may need to be done wrong before we can figure out how to do it right.

Parsing and formatting is something we should work on as well. My desiderata for parsing and formatting are: metaprogrammable easily (it should be clean and reasonably legible to define a function that takes a struct definition and synthesizes parsers or formatters for a specific representation), secure (no (accidental?) pointer overrun or format string injection vulnerabilities like in C), efficient (the runtime shouldn't need to parse the format string in addition to parsing or serializing the actual target. Ideally there would be an option to choose between optimizing for code size and code speed), flexible (the formatter and parser should allow reading and writing fron memory buffers, streams, files, sockets, strings, etc. in a smart and extensible manner), and expressive (It should be possible to parse a lot of things without too much difficulty.) The obvious solution to straightforward parsing is a Parsing Expression Grammar. LPEG provides a pretty good example for this. If we implement something very similar to LPEG that compiles into terra native code and include some convenience functions that produce patterns like the C scanf specifiers for convenience, and maybe some terra specific capture operations and semantics, that should be a suitable solution for the parsing half of that problem. The C printf/scanf pair has an appealing symmetry between the format strings for reading and writing, and ideally our solution would preserve that. I have some ideas, but I would like to hear other ideas to brainstorm.

@TWarszawski
Copy link

I just wanted to add a comment that as a user (through Regent), I would have found a vector/arraylist and hashmap useful. I ended up implementing my own.

@aiverson
Copy link
Collaborator

aiverson commented Mar 8, 2019

There is already a basic vector/arraylist included in the core library. We are definitely going to have an improved arraylist and a hashmap in the library among other things. A nice variety of container types seems like a very important feature of a standard library, I've got some plans to make it as good as I possibly can. If there is anything special about your implementation or usecase, please let us know about that so we can use it to inform the design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants