System call restrictions beyond individual processes
When I think of designing a program to be secure on a Linux machine or something similar, it always comes down to the same few things. First, don't do anything stupid with user input. Next, only let it see the files it absolutely needs to see. Last, don't let it have any more permissions than it absolutely needs to have.
Being paranoid about user input is a topic for another post, so I won't get into that here. Limiting which files are visible is what chroot and jails are for. If you extend this to also walling off which other processes can be seen, then containers might be a good idea, too. It's the last one where it gets tricky.
Restricting permissions might mean having a specific user account and having the program run as that user. Then, inside the jail (or chroot, or whatever), you have only the bare minimum in terms of writable spaces, if any at all. This is all well and good, but unless you also do some very interesting things, that process can still use a whole bunch of syscalls. If compromised, it might be able to do things you never intended for it to do.
So then there are things like SELinux where you can crank down the range of things which may be performed by a process. This seems to be about as fine-grained as it can get: individual processes. What I've been wondering about is having different levels of access even within the same program.
Programs are already frequently split into logical separations along various lines: classes and instances of them, functions within a given scope, and different object files. All of them represent some kind of grouping which might make sense to use as a border for security domains.
For example, let's say I have a function which opens my log file for writing, a second function which creates a TCP socket for listening, and a third which calls accept() on that socket when new connections arrive. Other than those three places, I never create file descriptors. If there were some way to say "no other place may use anything which creates a file descriptor", then that might be interesting. For one thing, even if you could get my program to run arbitrary code, there's no guarantee it would be able to do what you wanted.
For such an attack to work, it would have to use the restricted syscalls only in those few contexts where they were allowed. It seems like that would reduce the attack surface for any given privilege from the entirety of a program to just those places where it has been switched on.
There is a catch here, naturally. I have no idea how you would tell the system which areas are allowed to do X without also having the possibility of someone else coming along later and adding to that list. The finer points of how binaries really run are not my thing, but just generically speaking, I wonder if something like this would be possible.
Let's assume you have your program code, and it's fixed. It's set up in memory that is not writable at any point. Then you could have regions of that code which are set up when execution begins, and in those regions, you say "X is allowed here". There might be a "default" setting which also says "X is otherwise not allowed".
Then, if there's a call to X and your program counter (or whatever applies for this kind of stuff) isn't in one of those special regions, you deny it.
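Here is a rough sketch of that lookup, written as a Python simulation rather than anything a real kernel does. The addresses, the region table, and the names REGIONS, DEFAULT_ALLOWED and syscall_permitted are all made up for illustration. The only point is the shape of the check: find the region containing the program counter, and fall back to a default-deny set if there isn't one.

```python
from bisect import bisect_right
from typing import NamedTuple


class Region(NamedTuple):
    start: int          # first address in the region (inclusive)
    end: int            # one past the last address (exclusive)
    allowed: frozenset  # syscall names permitted while the PC is in here


# Hypothetical layout, set up once when execution begins and never changed.
REGIONS = sorted(
    [
        Region(0x1000, 0x1080, frozenset({"open"})),
        Region(0x2000, 0x2100, frozenset({"socket", "bind", "listen"})),
        Region(0x2100, 0x2140, frozenset({"accept"})),
    ],
    key=lambda r: r.start,
)
STARTS = [r.start for r in REGIONS]

# The "default" setting: outside any listed region, nothing is allowed.
DEFAULT_ALLOWED = frozenset()


def syscall_permitted(pc: int, syscall: str) -> bool:
    """Return True if `syscall` may be issued with the program counter at `pc`."""
    i = bisect_right(STARTS, pc) - 1  # last region starting at or before pc
    if i >= 0 and REGIONS[i].start <= pc < REGIONS[i].end:
        return syscall in REGIONS[i].allowed
    return syscall in DEFAULT_ALLOWED
```

With this table, syscall_permitted(0x1010, "open") is allowed, while the same request from 0x3000 (outside every region) is denied.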
Could this scheme be exploited? Sure, why not. If you grant open() permission to a function which takes anything as a parameter, then you've just defeated the purpose, since some evil code would just call it to get things done. On the other hand, if you have a second function which calls the first with a specific argument, you could bless that function instead, and have it carry that permission through to the actual open() call.
In this sense, it's not just where your program counter is, but also where it has been. It seems it would be necessary to check the call stack to see if privilege can be determined from one of those places.
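To make the call stack idea concrete, here is a small Python sketch. It is only an analogy: it walks Python's own frames with inspect instead of inspecting a native stack, and blessed, guarded_open, open_log, evil and LOG_PATH are hypothetical names I made up for it. The generic wrapper that takes any path is not blessed; the narrow function which calls it with one fixed argument is.

```python
import inspect
import os
import tempfile

# Code objects of the functions which are allowed to end up in open().
BLESSED = set()


def blessed(func):
    """Mark a function as carrying the open() permission down the stack."""
    BLESSED.add(func.__code__)
    return func


def guarded_open(path, flags):
    """Generic open wrapper: works only if a blessed caller is on the stack."""
    # Note: inspect.currentframe() relies on interpreter frame support (CPython).
    frame = inspect.currentframe().f_back
    while frame is not None:
        if frame.f_code in BLESSED:
            return os.open(path, flags, 0o600)
        frame = frame.f_back
    raise PermissionError("open() not permitted from this call stack")


# Hypothetical log location, chosen only so the example can run anywhere.
LOG_PATH = os.path.join(tempfile.gettempdir(), "stack_permission_example.log")


@blessed
def open_log():
    # Takes no parameters, so a caller cannot redirect it to another file.
    return guarded_open(LOG_PATH, os.O_WRONLY | os.O_CREAT | os.O_APPEND)


def evil():
    # Not blessed: the check fails before os.open() is ever reached.
    return guarded_open("/etc/passwd", os.O_RDONLY)
```

Calling open_log() returns a file descriptor, while evil() raises PermissionError. Injected code could still call open_log() itself, but all that gets it is the log file.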
I also wonder about using this the other way: checking the call stack to look for functions which add "taint". Maybe you have something which performs a read() from the network. You might want to say "any time this appears in the stack, disallow syscalls X, Y, and Z".
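The taint direction can be sketched the same way. Again, this is a Python stand-in that walks interpreter frames, and taints, check_syscall, spawn_helper and handle_network_input are hypothetical names. A function which handles network input lists the syscalls it forbids, and anything running beneath it on the stack gets refused.

```python
import inspect

# Code object -> frozenset of syscall names forbidden while it is on the stack.
TAINT = {}


def taints(*syscalls):
    """Decorator: while the decorated function is on the stack, deny these."""
    def mark(func):
        TAINT[func.__code__] = frozenset(syscalls)
        return func
    return mark


def check_syscall(name):
    """Raise PermissionError if any caller on the stack taints `name`."""
    frame = inspect.currentframe().f_back
    while frame is not None:
        if name in TAINT.get(frame.f_code, ()):
            raise PermissionError(
                f"{name} denied: {frame.f_code.co_name} is on the stack"
            )
        frame = frame.f_back


def spawn_helper():
    check_syscall("execve")
    return "spawned"  # placeholder for the real work


@taints("execve", "open", "connect")
def handle_network_input(data):
    # Stands in for code which has just done a read() from the network.
    return spawn_helper()
```

Called directly, spawn_helper() succeeds. Reached through handle_network_input(), it raises PermissionError. This sketch only covers taint while the function is still on the stack; it says nothing about data which outlives the call.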
My final thought about this has to do with the metadata sometimes placed in comments within code. Sometimes you say "requires X" or "expects Y" or "provides Z". With this, you might also need a section to say "is allowed to call syscall_foo()" in order to populate the permissions. Then again, it might be even more interesting to have that happen separately from the source code. Just look up the addresses of your functions later and add the permissions when signing your binary, or something like that.
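The "separate from the source" variant might look something like the following Python sketch. Everything here is invented for illustration: the symbol table, the manifest, and build_permission_table are hypothetical, and a real toolchain would pull symbols out of the binary rather than from a dict. It just resolves a list of "function F may call syscall S" entries into address ranges which could then be attached at signing time.

```python
# Hypothetical symbol table, as might be extracted from a built binary:
# function name -> (start address, size in bytes).
SYMBOLS = {
    "open_log": (0x1000, 0x80),
    "make_listener": (0x2000, 0x100),
    "accept_client": (0x2100, 0x40),
    "parse_request": (0x3000, 0x400),
}

# Hypothetical manifest kept apart from the source code:
# function name -> syscalls it is allowed to make.
MANIFEST = {
    "open_log": ["open"],
    "make_listener": ["socket", "bind", "listen"],
    "accept_client": ["accept"],
}


def build_permission_table(symbols, manifest):
    """Resolve manifest entries into sorted (start, end, name, syscalls) rows."""
    table = []
    for name, syscalls in manifest.items():
        if name not in symbols:
            # Refuse to guess: a typo here should not silently grant nothing.
            raise KeyError(f"manifest names unknown function {name!r}")
        addr, size = symbols[name]
        table.append((addr, addr + size, name, tuple(sorted(syscalls))))
    return sorted(table)
```

Functions which never appear in the manifest, like parse_request here, get no row at all and so fall under whatever the default is.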
This idea isn't entirely baked, but it's something to think about.