Provide an option not to have the whole database in memory #4

Open
pisto opened this issue Feb 15, 2015 · 6 comments
Comments

@pisto
Contributor

pisto commented Feb 15, 2015

Suggestions for the right way to do this? I think that file:seek() would work.

@daurnimator
Owner

You need to have much of it in memory to find the data section separator. See https://github.com/daurnimator/mmdblua/blob/master/mmdb.lua#L20
Searching backwards is 'hard'.

After that, ipv6_find_ipv4_start will need to traverse a reasonable amount of the file...

I'm not sure if it's worth putting the work in for this?
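
On the separator search mentioned above, here is a minimal sketch of one way to avoid reading the whole file. It is not the linked code. It assumes the separator in question is the MaxMind DB metadata marker (`\xAB\xCD\xEFMaxMind.com`) and relies on the format spec capping the metadata section at 128 KiB, so only the tail of the file is read and searched forwards for the last occurrence. The name `find_last_marker` is hypothetical.

```lua
-- Assumption: the separator is the MaxMind DB metadata marker, and it lies
-- within the last 128 KiB of the file (the spec's cap on the metadata section).
local MARKER = "\171\205\239MaxMind.com"
local TAIL = 128 * 1024

-- Returns the 1-based file offset of the last marker, or nil if none is found.
local function find_last_marker(f, marker, tail)
  marker, tail = marker or MARKER, tail or TAIL
  local size = f:seek("end")
  local start = math.max(0, size - tail)
  f:seek("set", start)
  local chunk = f:read(size - start) or ""
  -- Walk forwards with plain-text find, remembering the last hit.
  local last, init = nil, 1
  while true do
    local s = chunk:find(marker, init, true)
    if not s then break end
    last, init = s, s + 1
  end
  if not last then return nil end
  return start + last
end
```

This keeps at most 128 KiB in memory for the search; the search tree traversal would still need its own seek-and-read or mapping strategy.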

@pisto
Contributor Author

pisto commented Feb 15, 2015

No, it's not high priority. But it would be nice, as right now my server itself takes about 3MB and the database 30MB. With the other geoip bindings there was an option to map the file in memory, so multiple instances of the same program would allocate the memory only once. I'll look around if there's a decent lua library that can do this with arbitrary files and expose them as strings.

@daurnimator
Owner

> With the other geoip bindings there was an option to map the file in memory

Yeah; they'll probably be using mmap().
You could do this in LuaJIT via the FFI, but I'd rather not bring that dependency in.

> I'll look around if there's a decent lua library that can do this with arbitrary files and expose them as strings.

That won't be possible; Lua strings are interned, so creating one always copies the bytes into memory owned by the VM.

@pisto
Contributor Author

pisto commented Feb 15, 2015

I know; I meant exposing the mmapped region in some way that is semantically equivalent to a string.

@daurnimator
Owner

Looking through again, there aren't that many string methods in use:

  • that first :find to get the data section separator
  • :byte to read a few bytes at a time
  • :sub to extract a substring
  • casting to an ffi const char* for speed improvements on the above.

You should be able to replace most of these with :seek and :read. I'd be willing to accept a pull request that adds this.
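
The :seek/:read replacement described above could look roughly like the sketch below: a file-backed object that mimics `:sub` and `:byte` for positive, 1-based indices. The names `new_reader` and `reader` are hypothetical and not part of mmdblua, and negative indices are deliberately not handled.

```lua
-- Hypothetical file-backed stand-in for the string methods listed above.
local reader = {}
reader.__index = reader

local function new_reader(path)
  local f, err = io.open(path, "rb")
  if not f then return nil, err end
  local size = f:seek("end")
  return setmetatable({ file = f, size = size }, reader)
end

-- Like string.sub, for positive, 1-based, inclusive indices only.
function reader:sub(i, j)
  j = math.min(j or self.size, self.size)
  if i < 1 then i = 1 end
  if i > j then return "" end
  self.file:seek("set", i - 1)
  return self.file:read(j - i + 1) or ""
end

-- Like string.byte: returns the byte values at positions i..j (default j = i).
function reader:byte(i, j)
  i = i or 1
  return self:sub(i, j or i):byte(1, -1)
end

function reader:close()
  self.file:close()
end
```

Every `:byte` or `:sub` call here costs a seek plus a read, so a real implementation would probably want to buffer around the last position.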

@daurnimator
Owner

To quote myself from #5 (comment)

> Sorry, but after reflecting on this a while, I realised that seeking a file here is not the correct answer: it will result in unnecessary slowness due to seek syscall overhead.
>
> What I suggest instead is an ffi-only optimisation that uses an mmap call (perhaps via ljsyscall).
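
A rough sketch of what such an ffi-only mmap path might look like. It calls libc directly through the LuaJIT FFI rather than going through ljsyscall, and it assumes 64-bit Linux or macOS (where `off_t` is as wide as `long` and the constants below hold). The names `mmap_file` and `mapped` are hypothetical; under plain Lua the function just returns nil so a caller could fall back to reading the file.

```lua
-- Needs LuaJIT; under plain Lua the require fails and mmap_file returns nil.
local has_ffi, ffi = pcall(require, "ffi")

if has_ffi then
  -- Assumption: 64-bit Linux or macOS, where off_t is as wide as long.
  ffi.cdef[[
    int open(const char *path, int flags, ...);
    int close(int fd);
    long lseek(int fd, long offset, int whence);
    void *mmap(void *addr, size_t length, int prot, int flags, int fd, long offset);
    int munmap(void *addr, size_t length);
  ]]
end

-- Constant values as found on Linux and macOS (assumption for other systems).
local O_RDONLY, SEEK_END, PROT_READ, MAP_SHARED = 0, 2, 1, 1

local mapped = {}
mapped.__index = mapped

local function mmap_file(path)
  if not has_ffi then return nil, "LuaJIT ffi is not available" end
  local fd = ffi.C.open(path, O_RDONLY)
  if fd < 0 then return nil, "open failed" end
  local size = tonumber(ffi.C.lseek(fd, 0, SEEK_END))
  if size <= 0 then ffi.C.close(fd) return nil, "empty or unseekable file" end
  local addr = ffi.C.mmap(nil, size, PROT_READ, MAP_SHARED, fd, 0)
  ffi.C.close(fd) -- the mapping stays valid after the descriptor is closed
  if addr == ffi.cast("void *", -1) then return nil, "mmap failed" end
  -- Unmap when the pointer object is garbage collected.
  local ptr = ffi.gc(ffi.cast("const uint8_t *", addr), function()
    ffi.C.munmap(addr, size)
  end)
  return setmetatable({ ptr = ptr, size = size }, mapped)
end

-- Single byte at 1-based position i, read straight from the mapping.
function mapped:byte(i)
  if i < 1 or i > self.size then return nil end
  return self.ptr[i - 1]
end

-- Copies bytes i..j out of the mapping into a new Lua string.
function mapped:sub(i, j)
  j = math.min(j or self.size, self.size)
  if i < 1 then i = 1 end
  if i > j then return "" end
  return ffi.string(self.ptr + (i - 1), j - i + 1)
end
```

Because the mapping is read-only and file-backed, several processes mapping the same database should end up sharing the same page-cache pages, which is the memory saving asked for earlier in the thread.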
