3D scanning

The starting point of an HRTF simulation is a good 3D mesh of both ears and a reasonable mesh of the head (+ shoulders). The mesh is crucial – a bad 3D mesh guarantees flawed results. Note that even at 20 kHz the wavelength of sound is still about 1.7 cm, so while mesh detail is very important, some millimeter-level details are not essential.
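The wavelength figure follows from wavelength = speed of sound / frequency. Here is a minimal sketch of that arithmetic, assuming a speed of sound of 343 m/s (air at roughly 20 °C; this value is an assumption and is not stated in the tutorial):

```python
# Wavelength of sound in air: wavelength = c / f.
# SPEED_OF_SOUND_M_S is an assumed value for air at roughly 20 degrees C.
SPEED_OF_SOUND_M_S = 343.0


def wavelength_cm(frequency_hz: float) -> float:
    """Return the wavelength in centimetres for a frequency in hertz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_SOUND_M_S / frequency_hz * 100.0


if __name__ == "__main__":
    # Print the wavelength for a few audible frequencies.
    for f in (1_000, 5_000, 20_000):
        print(f"{f} Hz -> {wavelength_cm(f):.2f} cm")
```

Comparing the result at 20 kHz with the scanner resolution gives a feeling for which mesh details can matter acoustically.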

------------------------------------------------------------	Part of the Complete Beginner’s tutorial	------------------------------------------------------------
Previous <<< Installation of Mesh2HRTF	last updated on 2023-02-18	Next >>> 3D mesh clean-up and merging

Video version of this tutorial:

Note: An example 3D mesh is available for simulation. If you would like to test Mesh2HRTF without 3D scanning, download the example high-quality 3D scan of a Kemar dummy and jump straight to: https://youtu.be/wRvSooD0n0E?t=641

3D scanning alternatives

Excellent research comparing both the quality of various 3D scanning systems and the impact of detail loss on the resulting HRTF simulation was published by Dinakaran, M., Brinkmann, F., Harder, S., Pelzer, R., Grosche, P., Paulsen, R. R., & Weinzierl, S. (2018) (free Poster download or full Paper). Note that this study was based on a static dummy – real people breathe and are never entirely still. Based on the published research and some practical experimentation, here are a few suggestions for practical 3D scanning of live people:

iPhone with Face-ID sensor – many iOS devices have a Face-ID feature with a structured-light 3D sensor (this sensor has been independently evaluated for use as a 3D scanner - paper1, paper2, paper3). Scanning with the iPhone TrueDepth sensor requires repeated attempts, care and practice, but can produce sufficiently detailed results for as little as 10 Euros for a suitable app (assuming you have access to a compatible iOS device).
- iPhone Face-ID method currently (2022) produces the best results among near-free 3D scanners.
- There are non-Apple devices (both standalone and integrated into smartphones) with structured-light sensors similar to or better than those in iPhones, but the difficult task is to find good 3D scanning software for those devices. Plus the cost of alternative devices often exceeds the cost of a second-hand iPhone with suitable hardware.
- LIDAR sensors on some iOS devices and other smartphones (as of 2022) have too low a resolution to be useful for head scanning. The same issue applies to other long-range 3D scanners - not all 3D sensors are suitable for scanning small details at close range.
Photogrammetry tools can be the last-resort, entry-level alternative. macOS has a quite good native software API (usable if you compile it yourself or find a 3rd party app that uses it), and there are many other computer software alternatives and even phone apps. This method is not recommended, but with very good lighting conditions, visual reference markers on the skin and enough high-quality source photos from all possible angles it could work well enough. Use at your own risk.
- Note that most photogrammetry workflows can not measure absolute object dimensions, therefore the 3D data usually has to be scaled based on some known reference in the scene.
Professional human scanning equipment or services. If you can book an appointment at a clinic with a good 3D body scanner, it may be the most elegant method to get a good head scan. Such scanners are used for skin health monitoring but are more commonly found in high-end plastic surgery clinics. For example, such a medical scanner was evaluated in the study mentioned above with good results. The greatest advantage of professional photogrammetry systems may be instant capture – all the data is captured in one moment, so it is never distorted by subject movement. Alternatively some companies offer 3D capture as a service for various computer graphics needs – in those cases scan quality is less predictable, but could be excellent.
- Note: any fixed-camera system will likely be inaccurate on some of the deeper ear pinna folds. Photogrammetry requires multiple points of view for every 3D point, but some folds of the ear are always obstructed and possibly not even a single camera has a direct line of sight (the more cameras, the better).
  - These tight areas are difficult for many 3D scanners (for example the Apple iPhone Face-ID camera), but according to the existing research this inaccuracy may have only a minor practical effect.
Dedicated 3D scanners. Most 3D scanners are considered professional equipment and are priced accordingly. There are even specialized ear scanners - read more below. Meanwhile cheaper, hobby-level 3D scanners may not perform better than an iPhone Face-ID camera. The biggest risk with 3D scanners that are not marketed towards healthcare is that they could be unsuitable for scanning breathing and shivering human subjects. Notice that the highest precision 3D scanners also come with high precision mounting platforms or referencing systems to keep perfect alignment with the measured object. Humans move many millimeters even when trying to be still, and that can completely confuse the software of an otherwise flawless 3D scanner.
- Very good option: companies that make custom-molded earplugs and in-ear headphone adapters may have a special ear scanner like this one: Otoscan 3D (YouTube link). Most importantly, the service should be easily available to private individuals for a reasonable price. Such a scanner would ensure a high quality 3D scan of the ears, but the rest of the head would need to be scanned by other means, and the high quality ear scans would require more-complex-than-usual merging with the rest of the head. Just make sure that: A - you will receive the 3D scan data in .stl format after the scanning session; B - the company will scan the ears all the way to the outer edges, not just the inner parts that are typically sufficient for making earplugs; C - the ear canal is scanned as deep as safely possible.
- In practice, the best scanners for scanning ears are hand-held professional models, because only they allow scanning ears from all the necessary angles. Exact approach may vary by scanner and underlying technology, but most professional hand-held 3D scanners should work great for Mesh2HRTF purposes.
Other - there are a few more methods to get the 3D data, for example computed tomography, but those are hardly relevant for Mesh2HRTF.
- A very promising potential method is to use raw depth map data ("3D photos" with point cloud data) from any suitable 3D imaging sensor (for example Intel RealSense sensors) and merge it into a 3D scan with the help of computer software. For example this tutorial for Meshlab shows how multiple point clouds can form a 3D scan out of individual depth maps. This method can be used even if no suitable 3D scanning software comes with the sensor, but it requires either very high sensor accuracy or a large number of depth maps to average out the noise in the data.
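The photogrammetry note above says that scans without absolute dimensions have to be scaled from a known reference in the scene. Here is a minimal sketch of that step, assuming the mesh is available as a plain list of (x, y, z) vertices and that two points on the reference object have been picked by hand. The function names and the data layout are illustrative and not part of Mesh2HRTF or any scanning app:

```python
import math


def scale_factor(p_a, p_b, true_length_mm):
    """Factor that maps the scan's arbitrary units to millimetres.

    p_a and p_b are two points picked on a reference object whose real
    length (true_length_mm) was measured, for example with calipers.
    """
    measured = math.dist(p_a, p_b)
    if measured == 0:
        raise ValueError("reference points coincide")
    return true_length_mm / measured


def scale_vertices(vertices, factor):
    """Return a new vertex list uniformly scaled about the origin."""
    return [(x * factor, y * factor, z * factor) for x, y, z in vertices]


if __name__ == "__main__":
    # Hypothetical example: a 50 mm reference measures 2.0 units in the scan.
    k = scale_factor((0.0, 0.0, 0.0), (0.0, 0.0, 2.0), 50.0)
    print(k, scale_vertices([(1.0, 2.0, 3.0)], k))
```

A longer reference object makes the factor less sensitive to small errors in picking the two points.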

What needs to be scanned for Mesh2HRTF

Before the 3D scan, try to optimize the scan conditions:

Cover or remove hair as much as possible to reveal ears and skin surface for scanning. Hair is mostly transparent for sound waves and hair shape can distort the actual 3D head boundary that determines ITD (interaural time differences) and other HRTF aspects. Ears should be visible from all directions (only some compromises are acceptable for the surface behind the pinna). If feasible – shave; cut the hair shorter; tie long hair together and out of the way. Ideally use a tight swimming hat or a wig cap to compress and organize hair. The remaining hair will require more 3D data clean-up and reduce the accuracy of the head boundary.
Take off glasses – glasses reflect light, are transparent and have small details that will not scan well with most methods. Plus glasses, like other accessories, are not permanent - a person may replace their glasses with another design.
Do not use reflective makeup – in fact a matte surface is optimal for most 3D scanners (clean skin is good as well).
Check what is needed to get the best results from the specific scanner technology. This tutorial only covers scanning with iPhone Face-ID sensor or similar structured light scanner. Other scanners can be quite different and for example may require use of reference markers.

The 3D scan should include:

Left and Right ear meshes in the best possible detail (preferably better than 1 mm resolution and data accuracy). Note that Mesh2HRTF will simulate each ear as it is. Simplifications that assume both ears are identical cause some fidelity loss above 5 kHz and are only justified when saving simulation and 3D mesh pre-processing time/cost is the priority.
The head can be much less detailed (2 mm or even rougher resolution is still good). Many 3D scanners, including iPhone, accumulate measurement errors and lose finer details when they scan large objects such as a head from all sides. Therefore it is often necessary to merge high quality ear data into a rougher head mesh.
Shoulders are optional - you can choose to simulate just the head geometry or to also include shoulders in the 3D mesh. Shoulders can provide reflection surface for vertical sounds to bounce-off on their way into the ears. There are pros & cons for including shoulders in the simulation, so there is no right or wrong answer at this point. (For shoulders there is no value in any geometric details beyond a rough outline and distance from the ears). In real life shoulders move wildly in relation to the ears, so there is no need for precision.
Multiple takes. An ideal scanner would make one “flash” and output a perfect, reliable 3D mesh (such body scanning systems do exist, only the mesh accuracy is not guaranteed to be great). But most 3D scanning methods involve moving a sensor around the head and ears to get a perspective on all surfaces. This process takes time, during which the subject breathes and moves, facial muscles contract, and positional tracking may not be perfect either. Therefore even with a metrology-grade 3D scanner it is wise to plan for multiple scanning attempts and afterwards pick the best take for further clean-up.
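The resolution targets in the list above (about 1 mm for the ears, about 2 mm for the head) can be roughly checked on an exported mesh by looking at its edge lengths. The sketch below assumes an indexed triangle mesh (a vertex list in millimetres plus triangles as index triples); the function names are illustrative. Edge length is only a proxy for resolution and says nothing about data accuracy:

```python
import math


def edge_lengths(vertices, triangles):
    """Lengths of all unique edges of an indexed triangle mesh."""
    edges = set()
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((min(i, j), max(i, j)))  # undirected edge key
    return [math.dist(vertices[i], vertices[j]) for i, j in sorted(edges)]


def resolution_report(vertices, triangles, target_mm):
    """Summarise edge lengths and compare the mean against target_mm."""
    lengths = edge_lengths(vertices, triangles)
    if not lengths:
        raise ValueError("mesh has no triangles")
    mean = sum(lengths) / len(lengths)
    return {
        "mean_mm": mean,
        "max_mm": max(lengths),
        "meets_target": mean <= target_mm,
    }


if __name__ == "__main__":
    # Tiny hypothetical mesh: one right triangle with 1 mm legs.
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    print(resolution_report(verts, [(0, 1, 2)], target_mm=2.0))
```

Running the report separately on the ear region and on the rest of the head matches the two different targets given above.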

Suitable Apple devices

This 3D scanning tutorial is applicable for Apple devices with a Face-ID sensor (which is used for face recognition).

Note that smartphone LIDARs are useless for Mesh2HRTF purposes. Therefore iPhone Pro models have no advantage over non-Pro variants, because LIDARs used in smartphones have too low resolution for scanning people.

iPhone 13 is a problem! Unfortunately iPhone 13 introduced a more compact and much lower quality Face-ID camera, which still works but may not be good enough for Mesh2HRTF 3D scanning purposes. The situation may change: with newer hardware and newer software, both iPhone 13 and newer iOS devices may become perfectly suitable for our scanning needs.

Based on existing research into Apple TrueDepth sensor accuracy (paper1, paper2, paper3) the following devices are suitable for the task as of 2022:

iPhone X
iPhone XS
iPhone XR
iPhone 11
iPhone 12
iPad Pro models with Face-ID camera introduced in 2021 or earlier (works as well as iPhones, but there may be potential issues with some iPad models).

In terms of 3D scan quality iPhone X or XR will work just as well as iPhone 12. The only difference is that for longer scans, older devices may start lagging due to memory overload. In fact iPhone XR is tested to work well enough, but iPhone 11 will probably never run into slowdowns at all. Thankfully these iPhones are widespread globally, so if you do not have a fitting iPhone model, you can try to borrow one or, in the worst case, buy a used phone for this purpose.

Apps for scanning with an iPhone

Apple devices with a Face-ID sensor provide a cheap and usefully-high quality 3D scanning tool to anyone who has access to such a device. Apple does not provide any 1st party tool for 3D scanning with the Face-ID sensor, therefore you need to find a suitable app that provides the necessary functionality:

The app must use the Face-ID sensor, which is on the same side as the screen. There are many alternative 3D scanning methods and apps that can rely on LIDAR, photogrammetry or even an external sensor. Some apps may support multiple methods as well.
Check the export formats and payment model. Ideally the app should have a one-time purchase that enables unlimited scans with export to STL or another generic 3D mesh format. You will need over 10 scans per person, so a limit on the number of scans is not acceptable.
Apps are not the same – because Apple does not provide a comprehensive 3D scanning library for the Face-ID sensor, each app implements its own algorithms to make the scanning robust and precise. If one app is performing poorly, you may find a better one.

At the moment (in 2022) one app that works reasonably well, has reasonable pricing and all the necessary functionality is Heges 3D v1.6. This commercial app has been evaluated in detail (paper1 - note, older Heges 3D version) and was used in the making of this tutorial, but Mesh2HRTF has no affiliation with the developers of this app. As other alternatives appear and get tested, this tutorial could be updated to include links to additional apps.

Note that some iPhone 3D scanning apps have a “screen sharing to a 2nd device” feature. Because the Face-ID sensor is on the display side, it can be hard to scan larger objects without an additional screen. BUT for Mesh2HRTF purposes, it is possible to comfortably scan a person by using the phone as a “selfie” camera. See video tutorial for an example.

Settings for Heges 3D

Scanning in Heges 3D works fine with 0.5 mm precision. It is possible to adjust scanning range as well, but usually in 0.5 mm mode the range will be short enough by default.

In the iOS Settings, search for Heges to open the app specific settings. Here it is important to "Enable Infinite Scanning". Also check that units are set to “mm”.

For an example see the video at 05:25.

3D Scanning process

The steps described here are focused on iPhone, but may be useful for other 3D scanning methods as well. During the making of this tutorial, all source scans were made with commercial app Heges 3D v1.6 running on a regular iPhone 11 or iPhone XR.

Geometric references:

While many hand-held 3D scanners are supposed to just work on any object, scanning ears may prove hard even for professional scanners. Therefore it is a good idea to include some geometric references in the scene that help the 3D scanner to orient itself relative to the human.

Geometric reference requirements:

Reference should always be well visible from most angles. To catch the inner details of the ears, 3D scanner may need to be placed at unusual angles towards the head and that can lead to loss of tracking. Reference geometry that sticks out and is always in the field of view can help.
Reference should be regular and simple. Regular shapes (straight lines, sharp edges) not only help 3D scanners to orient in space but can also be used as a repeatable quality check to measure how accurate the 3D data from a specific take, app or device is.
- Note that if a scanner triangulates its position by round reference markers - for the best results the markers can be glued to the extra geometric reference.
Reference should not interfere with the scanning - extra references should not cover the ears and have minimal contact area with the head.
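The repeatable quality check mentioned in the requirements above can be as simple as comparing one known dimension of the reference object with the same dimension measured in each scan. Here is a minimal sketch, assuming two points on the reference have been picked in each take (in mm) and the real length was measured beforehand. The names and the sample values are hypothetical:

```python
import math


def reference_error_mm(p_a, p_b, known_length_mm):
    """Signed error of a reference length measured in a scan (mm)."""
    return math.dist(p_a, p_b) - known_length_mm


def rank_takes(takes, known_length_mm):
    """Sort take names from smallest to largest absolute reference error.

    takes maps a take name to the pair of points picked on the reference.
    """
    return sorted(
        takes,
        key=lambda name: abs(reference_error_mm(*takes[name], known_length_mm)),
    )


if __name__ == "__main__":
    # Hypothetical picks on a 100 mm reference bar in two takes.
    picks = {
        "take1": ((0.0, 0.0, 0.0), (100.8, 0.0, 0.0)),
        "take2": ((0.0, 0.0, 0.0), (100.2, 0.0, 0.0)),
    }
    print(rank_takes(picks, 100.0))
```

A small error on the reference does not prove that the ears were captured well, but a large error is a good reason to discard a take.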

Here are some examples that can work for placing on a head:

A frame made out of a toy construction set. This is a very good example that satisfies all requirements and is easily adjustable in size. Clearly not everyone has suitable construction set, but a similar structure can be even nailed together using any available wood material. And there are other ways to make a similar, adjustable frame around the head, including 3D printing.

Lego Duplo bricks held in place by a thin rubber string. The shiny Lego bricks may not be ideal as an object to scan, but this is one approach to consider.

How to scan

The main principle that needs to be understood is: 3D scanners have limited range. It means that:

If the subject sits on a chair without leaning back, it is possible to scan the head without catching the chair, room or furniture in the scan. This avoids problems caused by relative movement between static objects and the human subject, because people always move due to breathing and other natural reasons.
It is possible to scan another person while holding the phone in a "selfie" position. The person who is scanning will be in the line of sight but outside of the 3D scanning range. This is a very useful trick when using the iPhone Face-ID camera.

The recommended scanning procedure with Face-ID sensor includes:

in all cases:
- Subject should sit with a straight back on a chair placed so that it is possible to walk 360deg around the subject.
- Optionally: Use geometric reference that is visible on both sides of the head.
Base head scan – rough scan of head & shoulders in looking straight pose – 3-5 takes to get the base mesh with straight neck and overall head shape where the detailed ear scans will be merged in later:
- Start scanning from the BACK. After scanning 360deg around the subject, any accumulated errors are less significant on the back of the head (sound localization from the back is never very precise and due to the hair, back of the head may still require some sculpting in pre-processing).
- Focus on getting un-distorted data on the relative position of the ears, shape of the face and nose outline (notice that real faces are not actually symmetrical).
- Go around the person to get full coverage, but do not worry about finest details. This will be the relatively low detail base mesh where detailed meshes of the ears will be merged in pre-processing.
- Be reasonably quick – the longer you scan, the higher the likelihood that something goes wrong.
- Do include shoulders in the scan, but missing data in shoulder region is not a real issue – shoulders can be filled in with very rough geometry in mesh pre-processing.
Detail scan Left/Right – this is the scan that defines the final ear mesh for simulation. 3-5 takes to get all the details around the ear and one half of the head:
- Suggested sequence is to start from the back of the ear and move along the side to capture all details in one good pass. When there are no holes left, try to spend a LOT of time scanning the problematic creases in the pinna.
  - As scanner resolution is limited, we need to collect a lot of data to convince the algorithm that deep, narrow creases are actually there. (see illustration for an example of an area that needs to be scanned for especially long time)
- Minimize accumulated error - the most important parts should be scanned last or should stay in the scanner's field of view till the end of the scan.
- Avoid holes - after each scan check that there are no complex holes or defects in the mesh. It is often better to re-scan the subject than to try to afterwards fix complex holes in the mesh.
  - Simple hole can be automatically filled in by a smooth continuation of the surrounding boundaries.
  - Complex hole is impossible for an algorithm to guess and requires artistic sculpting to reconstruct or the areas will never match the reality.
- Do not scan neck and cheeks more than necessary (single, fast pass) because those areas can deform and introduce errors if they are scanned repeatedly.
Optional: Quick face scan – follow up the “base scan” with about 3 takes of very short scans to get un-distorted quality reference:
- The goal is to grab only the key dimensions of face, eyes, nose and position of the ears. No worries about holes and other parts that are missing. This data can serve as cross check to see if the Base head scan was distorted or accurate enough.
- The subject should not breathe during the fast scan. The scan should only take a few seconds to avoid breathing related movement.
  - Ask subject to take a deep breath and exhale. When you hear the exhaling sound, wait a second to “settle down” and start the scan after (remember to announce the start). Stop the scan as soon as possible after getting the minimum of needed data.
Optional: Quick ear scan Left/Right – another 3 fast takes for Left and then for Right side scan to get the ear profile.
- Again the objective is to get un-distorted reference while the subject is not breathing.
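The "avoid holes" check in the procedure above can be partly automated after export: in a triangle mesh, an edge that belongs to only one triangle lies on a hole or on an open border. The sketch below assumes triangles given as vertex index triples (illustrative data layout, not a Mesh2HRTF function). It cannot tell a simple hole from a complex one, and a head scan normally has at least one open border at the neck or shoulders:

```python
from collections import Counter


def boundary_edges(triangles):
    """Edges used by exactly one triangle; a closed surface has none."""
    counts = Counter()
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            counts[(min(i, j), max(i, j))] += 1
    return sorted(edge for edge, n in counts.items() if n == 1)


def count_boundary_loops(triangles):
    """Count connected groups of boundary edges (holes or open borders)."""
    neighbours = {}
    for i, j in boundary_edges(triangles):
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    seen, loops = set(), 0
    for start in neighbours:
        if start in seen:
            continue
        loops += 1  # new, not yet visited group of boundary vertices
        stack = [start]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            stack.extend(neighbours[v] - seen)
    return loops


if __name__ == "__main__":
    closed = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]  # tetrahedron
    print(count_boundary_loops(closed), count_boundary_loops(closed[:3]))
```

Comparing the loop count between takes is a quick way to spot the take with the fewest defects before inspecting the meshes visually.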

In total the data-set for one person should contain about 9-24 STL files before clean-up and pre-processing:

3-5 Base head scans
3-5 Detail scans Left
3-5 Detail scans Right
3 Quick face scans
3 Quick ear scans Left
3 Quick ear scans Right
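The 9-24 range can be verified by adding up the takes in the list above: the minimum counts only the three required scan types, while the maximum also includes the optional quick scans. A minimal sketch (the dictionary layout is just an illustration):

```python
# (min_takes, max_takes, optional) per scan type, as listed above.
SCAN_PLAN = {
    "Base head scan": (3, 5, False),
    "Detail scan Left": (3, 5, False),
    "Detail scan Right": (3, 5, False),
    "Quick face scan": (3, 3, True),
    "Quick ear scan Left": (3, 3, True),
    "Quick ear scan Right": (3, 3, True),
}


def file_count_range(plan):
    """Return (fewest, most) STL files for one person.

    The minimum skips optional scans, the maximum includes every scan type.
    """
    fewest = sum(lo for lo, _hi, optional in plan.values() if not optional)
    most = sum(hi for _lo, hi, _optional in plan.values())
    return fewest, most


if __name__ == "__main__":
    print(file_count_range(SCAN_PLAN))
```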

One dataset can require over 10 GB of storage on the device. You can rename each scan file in the app. To copy data to a PC, each scan must first be “exported to STL”. There are several methods to transfer files, but connecting the device via iTunes should work for everyone.
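After copying the files to a PC, a quick look at the bounding box of each STL helps to confirm that the export is complete and in millimetres. The sketch below reads the standard binary STL layout (80-byte header, a 32-bit triangle count, then 50 bytes per triangle). It assumes the app exports binary rather than ASCII STL, which is not confirmed here, and the function names are illustrative:

```python
import struct


def stl_bounding_box(path):
    """Return (mins, maxs) of all vertex coordinates in a binary STL file."""
    mins = [float("inf")] * 3
    maxs = [float("-inf")] * 3
    with open(path, "rb") as f:
        f.seek(80)  # skip the 80-byte header
        (n_triangles,) = struct.unpack("<I", f.read(4))
        if n_triangles == 0:
            raise ValueError("STL file contains no triangles")
        for _ in range(n_triangles):
            record = f.read(50)  # normal + 3 vertices + 2-byte attribute
            if len(record) < 50:
                raise ValueError("truncated STL file")
            values = struct.unpack("<12fH", record)
            for vertex in range(3):  # vertices start after the normal
                for axis in range(3):
                    coord = values[3 + 3 * vertex + axis]
                    mins[axis] = min(mins[axis], coord)
                    maxs[axis] = max(maxs[axis], coord)
    return tuple(mins), tuple(maxs)


def stl_extents(path):
    """Return the size of the bounding box along x, y and z."""
    mins, maxs = stl_bounding_box(path)
    return tuple(hi - lo for lo, hi in zip(mins, maxs))
```

An adult head is roughly 15 to 25 cm in each direction, so extents in the hundreds suggest millimetres, while extents well below 1 suggest the file is in metres.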

The reason for multiple takes (repeating the same scan several times) is that the 1st take usually gets corrupted due to operator errors or the subject not yet being steady and calm. As for the other 2-3 takes, there is always a risk that something goes wrong even with a "perfect 3D scanner". You will be happy to have several takes to choose from for pre-processing.

More Tips

Face-ID sensor scanning even with good software is highly dependent on technique and practice. These tips are based on https://hege.sh/faq “How do I scan?” and also include ear scanning specific experience as well as information from other sources:

Do not scan yourself. It is virtually impossible to get a high quality scan of your own head – another person scanning you avoids subject movement and produces a lot better results.
Keep the subject still and move the device around the subject (people are never 100% still, but it works well enough).
Practice handling the scanner! The person handling the scanner should follow the tips and practice finding a good sequence and grip to perform the scan as well as possible. Results will improve with practice.
Allow tracking to recover. If the scanning software says it lost tracking, try smoothly returning to the last position where tracking worked. Usually the algorithm will recover and continue.
Do not bump or touch the subject during scanning. Simple concept, but not so easy in practice. Ensure there is enough space to move around the subject for all angles.
Announce to the subject when you start and when you have finished scanning. It helps.
The person being scanned should keep their eyes closed and be as still as possible. In practice people will move a bit, but hopefully the scanner algorithm manages to ignore that data and resume scanning when the subject returns to the initial position.
Use the physical Volume Up-Down buttons to start and stop the scanning in iPhone scanning apps.
Do the scanning indoors – infrared light in sunlight can interfere with the sensor.
Heges 3D scanning method does not use visible light and therefore scanning can be done in any light conditions. Tracking does not use any 2D features – no use placing any 2D markers, points, lines or otherwise painting the skin.

For more details see the video tutorial.

Next tutorial step >>> 3D mesh clean-up and merging
