{
  "metadata": {
    "asset_id": "0",
    "logical_subdocument_id": "0",
    "semantic_modality": "DOCUMENT",
    "s3_bucket": "bedrock-bda-us-east-1-564c8cc1-3078-49bc-92f6-ae8cf1d0a548",
    "s3_key": "62c75269_f45f_4525_b91c_38386a5423ab_whp415.pdf",
    "number_of_pages": 14,
    "start_page_index": 0,
    "end_page_index": 13
  },
  "document": {
    "description": "This is a research white paper about live music in immersive virtual spaces",
    "summary": "This document presents approaches the BBC has been trialling for delivering live events into virtual immersive spaces. It discusses the business, operational and audience requirements, technical approaches using volumetric capture and 2D/2.5D capture, and reports results from trials including technical aspects and audience feedback. The document was originally published at the IBC 2024 conference.",
    "statistics": {
      "element_count": 103,
      "table_count": 0,
      "figure_count": 15
    }
  },
  "pages": [
    {
      "id": "898bf1a0-cc63-4ed4-a1bb-4ce7547e5138",
      "page_index": 0,
      "representation": {
        "markdown": "# B B C\n\n# Research & Development White Paper\n\n# WHP 415\n\n*September 2024*\n\n**Live Music in Immersive Virtual Spaces**\n\nG. A. Thomas, F. M. Rivera, L. Kelso, B. Weir, P. Rich, O. Moolan-Feroze\n\n*BRITISH BROADCASTING CORPORATION*"
      },
      "statistics": {
        "element_count": 7,
        "table_count": 0,
        "figure_count": 0
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2480,
        "rectified_image_height_pixels": 3507,
        "corners": [
          [
            -5.1841300685942596e-9,
            -4.696901387736597e-8
          ],
          [
            1.0000286471459174,
            -0.000010205604325069605
          ],
          [
            1,
            1.0000040376835615
          ],
          [
            -0.00001696898271479914,
            0.9999999303847662
          ]
        ]
      }
    },
    {
      "id": "060e8521-79fa-497e-aac8-12c50689dcab",
      "page_index": 1,
      "representation": {
        "markdown": "White Papers are distributed freely on request. Authorisation of the Head of ARA/Group or Head of Standards is required for publication.\n\n© BBC 2024. All rights reserved. Except as provided below, no part of this document may be reproduced in any material form (including photocopying or storing it in any medium by electronic means) without the prior written permission of BBC except in accordance with the provisions of the (UK) Copyright, Designs and Patents Act 1988.\n\nThe BBC grants permission to individuals and organisations to make copies of the entire document (including this copyright notice) for their own internal use. No copies of this document may be published, distributed or made available to third parties whether by paper, electronic or other means without the BBC's prior written permission. Where necessary, third parties should be directed to the relevant page on BBC's website at http://www.bbc.co.uk/rd/pubs/whp for a copy of this document."
      },
      "statistics": {
        "element_count": 4,
        "table_count": 0,
        "figure_count": 0
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2479,
        "rectified_image_height_pixels": 3499,
        "corners": [
          [
            0.0003972010987420236,
            0.001355373522886565
          ],
          [
            1.000000196887601,
            0.0013970644820066201
          ],
          [
            1.0001097648374495,
            0.999350629098945
          ],
          [
            0.0004460555411154224,
            0.9992422381798902
          ]
        ]
      }
    },
    {
      "id": "eb7248e3-fcef-4d2b-a77e-59952a35f14a",
      "page_index": 2,
      "representation": {
        "markdown": "# WHP 415\n\n# Live Music in Immersive Virtual Spaces\n\n**G. A. Thomas, F. M. Rivera, L. Kelso, B. Weir, P. Rich, O. Moolan-Feroze**\n\n## Abstract\n\nGame-like environments that offer live multiplayer capability are becoming a major form of entertainment. A large community, estimated to be 3.2M in the UK, also spend time in these environments for social & experiential reasons, rather than gaming. Music artists are using these spaces to present virtual concerts, drawing in big crowds and revenue. Broadcasters, as well as major music labels, are starting to look at how to harness games-like media to deliver live music events.\n\nThis paper presents approaches the BBC has been trialling for delivering live events into virtual immersive spaces. Trials are being run to explore the use of low-latency volumetric capture technology of the artists, to allow virtual attendees, through their avatars, to interact with the performer and each other. Other trials are looking at the capture of performers in larger spaces such as stages at a festival, relying on 2D/2.5D video approaches. Results from these trials are reported, including technical aspects and audience feedback.\n\nThis document was originally published at the IBC 2024 conference.\n\n**Additional key words:** metaverse, MAX-R"
      },
      "statistics": {
        "element_count": 9,
        "table_count": 0,
        "figure_count": 0
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2480,
        "rectified_image_height_pixels": 3507,
        "corners": [
          [
            -0.0000060923386275047255,
            -6.349061402718749e-9
          ],
          [
            1.000002165763609,
            -0.0000014508392429178449
          ],
          [
            1.0000000984438004,
            1.0000008353828058
          ],
          [
            1.0702548582142505e-9,
            1
          ]
        ]
      }
    },
    {
      "id": "45b802bb-11d9-4d60-b43b-8e1553dc1e8c",
      "page_index": 3,
      "representation": {
        "markdown": "![Title: IBC2024\nThe image shows the logo for IBC2024, which appears to be an event or organization related to live music in virtual immersive spaces.](./50bcf0fc-302f-49de-a7c6-da7049a38f3c.png)\n\nib\n\nIBC2024\n\n# LIVE MUSIC IN VIRTUAL IMMERSIVE SPACES\n\nG. A. Thomas¹, F. M. Rivera¹, L. Kelso¹, B. Weir, P. Rich, O. Moolan-Feroze²\n\n¹BBC R&D, UK, ²Condense Reality, UK\n\n## ABSTRACT\n\nGame-like environments that offer live multiplayer capability are becoming a major form of entertainment. A large community, estimated to be 3.2M in the UK, also spend time in these environments for social & experiential reasons, rather than gaming. Music artists are using these spaces to present virtual concerts, drawing in big crowds and revenue. Broadcasters, as well as major music labels, are starting to look at how to harness games-like media to deliver live music events.\n\nThis paper presents approaches the BBC has been trialling for delivering live events into virtual immersive spaces. Trials are being run to explore the use of low-latency volumetric capture technology of the artists, to allow virtual attendees, through their avatars, to interact with the performer and each other. Other trials are looking at the capture of performers in larger spaces such as stages at a festival, relying on 2D/2.5D video approaches. Results from these trials are reported, including technical aspects and audience feedback.\n\n## INTRODUCTION\n\nGame-like environments that offer live multiplayer capability are becoming a major form of entertainment. A large community, estimated to be 3.2M in the UK, also spend time in these environments for social & experiential reasons, rather than gaming. Music artists are using these spaces to present virtual concerts, drawing in big crowds and revenue. Ariana Grande's 2021 'Rift Tour' in Fortnite garnered 78M views across 5 events. In 2020, Travis Scott's 5 events in Fortnite generated an estimated $12M in merchandise sales. 
At the same time, audiences for TV and radio are reducing. Broadcasters, as well as major music labels, are starting to look at how to harness games-like media to deliver live music events. Such approaches offer the potential for audiences to engage in a more interactive way with the performers and each other.\n\nThere are many challenges around delivering a live concert into these virtual immersive spaces. The Fortnite concerts relied on pre-generated animated avatars of the performer, making live interaction with the audience unfeasible. Such an approach would also significantly add to the production cost and make a 'simulcast' of a broadcast event difficult or impossible.\n\nThis paper presents approaches the BBC has been trialling for delivering live events into virtual immersive spaces."
      },
      "statistics": {
        "element_count": 10,
        "table_count": 0,
        "figure_count": 1
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2481,
        "rectified_image_height_pixels": 3506,
        "corners": [
          [
            -0.000006765202258393343,
            4.534692643900947e-8
          ],
          [
            1.00000669148025,
            -0.00000686238371622066
          ],
          [
            0.9999999015958787,
            1.00000905256168
          ],
          [
            -4.94394662719804e-8,
            0.9999999303649102
          ]
        ]
      }
    },
    {
      "id": "6a261118-a1b9-4974-bd6c-d15bc9870051",
      "page_index": 4,
      "representation": {
        "markdown": "![Title: IBC2024\nThe image appears to be a logo for an event or organization called &quot;IBC2024&quot;. The logo consists of the letters &quot;ib&quot; in a red circle, with the text &quot;IBC2024&quot; written below it.](./00ee54d8-0ddc-4966-89a8-30fd89a3d074.png)\n\n# BUSINESS, OPERATIONAL AND AUDIENCE REQUIREMENTS\n\nPre-animated performers rendered as 3D avatars in virtual music experiences enable fans to experience the music in shared spaces and interact with others. However, initial surveys we conducted before embarking on any user trials indicate that potential audience members and live music artists would like the audience to be able to interact with the performer in real-time through modes such as sing-along, cheering, dancing, and expression of emotions \\[Rivera et al. (1)\\]. This was later borne out through audience surveys of the trial events described later in this paper. Video-based capture of performers (as distinct from motion capture) was also favoured from a business perspective, as this makes it easier to simulcast a performance being staged for a live audience or TV show, where the performer would not want constraints on what they can wear.\n\nWe took a holistic approach for this preliminary research to elicit a requirements wish list for live immersive music events. We considered an audience perspective, the perspective of music artists, and that of extended reality (XR) practitioners with experience in the field. We treated the initial findings as points to consider, but still open to further investigation, on which subsequent sections of this paper will elaborate.\n\nOur survey \\[Rivera et al. (1)\\] had a total of 45 responses from potential audience members, and 15 responses from workshop participants. 
These indicated that the ability to interact with each other and with the music artist, the agency to choose a point-of-view, adapt the audio mix, re-watch the performance, explore the environment, and take away digital artefacts, were unanimously popular features. Proximity-based audio chats were the most desired mode of interaction with friends, with text chats the next most popular. However, text chats were the most desired mode of interaction for conversing with new people, rather than audio. Personalisation of avatars was also a high priority, with the added option for gender-neutral avatars. The strongest incentive to join a live immersive music experience (apart from the music) was to socialise with friends.\n\nThe importance of supporting text-based chat was also reiterated in interviews with 4 established XR practitioners. Amongst other findings it was identified that a low latency real-time experience, supported by a text chat platform (either built into the platform or using an external service, such as Discord) was considered key to helping foster a sense of engagement for an audience attending a virtual event. This was also borne out in feedback from pilot live events as described later in the paper.\n\nWe also interviewed 5 music artists (with substantial experience playing live gigs) to explore expectations from a creative performer's perspective. The artists prioritised good quality sound, with spatial audio and realistic room acoustics generally preferred, although some preferred sound to reflect their own intended sound mix. Audience reaction through real-time low latency feedback was considered essential to the performers. If playing to a live real-world crowd and simultaneously to the virtual audience, it was deemed important that their view of the virtual world did not distract from the live performance (for example having to look at a small monitor, or text messages). 
The concept of a large LED wall displaying the virtual world was appealing to provide more of a seamless audience. The potential to reach new audiences, and to express themselves in more immersive and creative ways was highly appealing. Some of the artists preferred being represented as realistically as possible (through volumetric capture for instance), while others were enticed by the possibilities of having more creative representations (such as custom avatars)."
      },
      "statistics": {
        "element_count": 7,
        "table_count": 0,
        "figure_count": 1
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2481,
        "rectified_image_height_pixels": 3506,
        "corners": [
          [
            -0.000009694200118719114,
            6.01071673174454e-8
          ],
          [
            1.0000041329730955,
            -0.000002535438580524561
          ],
          [
            1,
            1.000001323066707
          ],
          [
            -2.00380297931957e-8,
            1.0000001392701796
          ]
        ]
      }
    },
    {
      "id": "3b7df286-73b3-487b-83a8-f58f6f6a0715",
      "page_index": 5,
      "representation": {
        "markdown": "![Title: IBC2024\nThe image appears to be a logo for an event or organization called &quot;IBC2024&quot;.](./a1816763-4606-4fa7-9e0e-94418057a28d.png)\n\n# TECHNICAL APPROACHES\n\n## Volumetric Capture\n\nVolumetric capture describes a group of technologies that are able to represent real world 3D objects and events in a way that can be played back and rendered from a floating \"free viewpoint\" camera. Typically, this type of playback is enabled via 3D game engines which allow events that have been volumetrically captured to appear alongside traditional 3D graphics assets.\n\nThe volumetric capture system used in the experiments takes a similar form to that presented by Orts-Escolano et al. (2). On the recording end, a set of 10 calibrated depth and RGB sensors capture raw image frames. These frames are passed through a \"fusion\" process that generates a 3D model, which is represented as a triangle mesh and UV mapped texture. The fusion process implements real-time hole filling and occlusion prediction, which compensate for noisy and incomplete information coming in from the cameras. During operation, the system records everything that occurs within a 3D \"capture area\". Events that fall outside of this area will not be represented in the output model. The capture area is limited to 4 m × 4 m × 3 m to ensure real-time playback. There are also limitations on the amount of surface area within the scene, both in terms of processing times and bit-rate limitations. Once the data is fused, the model is delivered to a cloud-based distribution system which compresses the data to support a number of different bit rates, and generates segments in a manner similar to an HLS/DASH system. The segments are then served out via a manifest to client devices over a CDN.\n\nAs a means of injecting live content into virtual spaces, volumetric capture has a number of benefits. 
Compared to methods such as motion capture, it provides a much more authentic representation of the performance, allowing anything within the capture volume to be captured and represented in-game. It does not suffer from the uncanny valley effect that can result from the armature of the motion capture not being able to accurately represent the performer's movements. When compared to 2D and 2.5D video, volumetric video enables full free viewpoint representation, as well as the opportunity to shadow cast and relight the content which allows better integration into the virtual scene.\n\nHowever, volumetric capture does present challenges. A calibrated camera rig needs to be set up surrounding the performer. The calibration can take time, and it is possible to disrupt the calibration during an event if the cameras are knocked. Additionally, depending on the technology employed within the cameras, certain materials and surfaces cannot be properly captured. Some of this can be solved algorithmically, although in practice, it is simplest to put some limitations on the artist's costumes and props. The interaction between the cameras and the lighting can cause additional problems. To be able to relight the content within the game engine requires flat lighting in the capture area. This causes difficulties when recording content during in-person events where existing event lighting is in place.\n\n## 2D / 2.5D Capture\n\nA simpler approach that may suffice in situations where a fully free viewpoint is not required is to use one or more conventional 2D video streams (sometimes referred to as video billboards), or so-called 2.5D video (where depth data may be used to help extend the range of viewpoints) \\[Grau et al. (3)\\]. Such approaches may be particularly suitable when the range of viewpoints is naturally limited, such as for an audience viewing a stage from the front. The lack of true 3D may also be less apparent in browser-based VR use cases (which are our"
      },
      "statistics": {
        "element_count": 9,
        "table_count": 0,
        "figure_count": 1
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2481,
        "rectified_image_height_pixels": 3506,
        "corners": [
          [
            -0.000004296428278082563,
            -4.2331616862539435e-8
          ],
          [
            0.9999975398969669,
            -0.000006490549592208808
          ],
          [
            0.9999976383010882,
            1.0000031335790431
          ],
          [
            3.6067523091133564e-8,
            1
          ]
        ]
      }
    },
    {
      "id": "56925c7c-37f3-4b73-add4-37255f3f87f1",
      "page_index": 6,
      "representation": {
        "markdown": "![Title: IBC2024\nThe image shows the logo for IBC2024, which appears to be an event or organization. The logo consists of the letters &quot;ib&quot; in a red circle, with the text &quot;IBC2024&quot; written below.](./913ec5f3-0dea-4431-abd2-19d6b7b65313.png)\n\ncurrent focus), where the user has no stereoscopic depth cues, rather than applications using a VR headset. There is also anecdotal evidence that flat images placed within a 3D environment can trick the brain into seeing them as true 3D, through the successful use of Pepper's Ghost illusions for so-called holographic performers on stage \\[Grow (4)\\].\n\nAdding depth data to images, or simply using an alpha mask to define the foreground area, can help make a performer captured in a single video stream look better-integrated into a 3D environment. However, often the stage background or lighting/haze effects form an important part of a live performance, so for our initial work we have focused on what can be achieved with one or more conventional video streams without any sophisticated processing, to provide a baseline for representing a live music stage performance without needing to impose constraints on the performance itself.\n\nFor this experiment, we built a 3D \"nightclub-style\" environment in Unreal Engine that included a replica of the real-world stage on which we had recorded a number of musical performances. We then tested the user responses to the following scenarios:\n\n- 1) A single view of a performer taken from a camera in front of the stage, placed on a 3D plane covering the virtual set stage boundaries.\n- 2) A view of the performer from one of three different camera positions, placed on a 3D plane covering the virtual set stage boundaries. 
The camera feed displayed on that plane was chosen on a frame-by-frame basis according to whichever camera was closest to the player viewpoint, the idea being to switch to the \"best\" perspective on the musician as the user moves about.\n- 3) As #2, but with the addition of a strobe-light effect from virtual lights in the 3D scene that triggered whenever the displayed camera view changed. Early experimentation showed that this could distract the user from noticing the camera view transition.\n\nThe video from each camera was packed into one of four quadrants in a single 3840x2160 UHD video encoded as mp4 at 20 Mbps. That meant that the resolution of a single camera feed was 1920x1080. This resolution was used for each of the three scenarios above. Since we were using at most three cameras, the fourth quadrant can hold a \"broadcast-style\" cut that can be used as a texture for a virtual \"big screen\" in the virtual nightclub (note though that we did not add the big screen for this experiment). Although we would typically stream the video live into the rendering application, for this experiment the videos were added as files into the build to ensure consistent display quality for the user.\n\nThe three cameras were positioned to record the performance as shown in the photo below. The height off the ground for each camera was chosen to match the typical viewing height of the user's camera in the 3D scene, and the distance from the stage was chosen to match the typical viewing distance of the user from the stage in the 3D scene.\n\nEach camera view was framed such that its image encompassed the entire stage. A physically accurate model of the stage was created and placed into the 3D world, and a plane (onto which the texture of the video from the cameras was drawn) was positioned to cover the stage corners. 
In order to ensure that the stage corners in the video were correctly placed at the corners of the video plane, a transform matrix for each camera was calculated that mapped the UV values for the stage corners in each camera image to the corners of the video plane."
      },
      "statistics": {
        "element_count": 8,
        "table_count": 0,
        "figure_count": 1
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2481,
        "rectified_image_height_pixels": 3506,
        "corners": [
          [
            -0.000006679286343202818,
            -2.4538530229128095e-8
          ],
          [
            1,
            -0.0000028320990876850914
          ],
          [
            1.0000044281854594,
            1.000008565116051
          ],
          [
            4.256075365522915e-8,
            0.9999999303649102
          ]
        ]
      }
    },
    {
      "id": "2264da87-a211-4a15-8d10-dc5cb966d124",
      "page_index": 7,
      "representation": {
        "markdown": "![Title: IBC2024\nThe image shows the logo for IBC2024, which appears to be a conference or event related to the media and broadcasting industry.](./de8e06d6-3354-44e1-8130-760e016b95fa.png)\n\n![Title: Capture of a Performance in a Virtual Nightclub\nThe image shows a capture of a performance taking place on a stage, with central and left-hand cameras visible. The camera feed from the left-hand camera is embedded in a virtual nightclub scene, creating an immersive experience for the audience.](./1a610f4f-51d3-4476-ba7d-420ea8eca2bd.png)\n\n![Title: Capture of a Performance in a Virtual Nightclub\nThe image shows a capture of a performance, with central and left-hand cameras visible on the left side. The camera feed from the left-hand camera is embedded in a virtual nightclub scene on the right side, displaying a colorful and dynamic virtual environment.](./098356bc-a881-4b54-b3e7-e8d52346832e.png)\n\nFigure 1 - Capture of a performance, with central and left-hand cameras visible (left); camera feed from left-hand camera embedded in virtual night club (right).\n\n## Delivery and Rendering\n\nFor the user to experience the performance in a game-like environment, they need to be able to run what is effectively a 3D multiplayer networked game that receives a live stream of the captured data. There are several approaches that can be taken for this.\n\nPixel streaming, browser-based rendering and using a downloadable executable each have a number of benefits and drawbacks. Pixel streaming is the most easily accessible technology for users. Typically, only a browser is required, and the bandwidth requirements are similar to those required for typical video streaming. This is because the heavy rendering tasks are performed remotely, and only the rendered output is delivered to the client. The main drawback of pixel streaming is the cost and limitations on availability of cloud rendering resources. These limitations can severely constrain the overall reach of the event. From a user's perspective, pixel streaming has a few drawbacks. First, due to additional latency incurred by sending control inputs to the cloud renderer, game play can feel laggy and unresponsive. Furthermore, due to compression artefacts, the visual presentation is not always great. 
These artefacts can include blocking, as well as colour space reductions.\n\nBrowser-based rendering has a number of advantages. The user only requires a modern web browser to access the event. As the content is rendered locally there is none of the input lag and compression artefacts caused by pixel streaming. Another benefit of browser rendering is that it can better integrate with web-based platforms, as no switch between applications is required when moving from the platform to the performance. The main drawback of browser-based rendering is the limitations that browsers put on access to the user's machine for security purposes. This can limit the feature-set that can be delivered to users. In the case of volumetric video, the rendering process can also require very modern browser features such as WebCodecs, which require users to be running the most recent version of their browser.\n\nRendering via a downloaded application will provide the best user experience, as the application can make full use of the user's system, with no limitations on the rendering feature set. However, as the application will need to be downloaded and installed on the user's system, this method is likely the most limited in its accessibility. Additional limitations can be imposed depending on the application distribution method. Distributing via digital platforms such as Steam or the App Store can simplify the download and install process, as well as provide users with additional assurances around the security of the application. However, these digital platforms can put limitations on monetisation options, as well as imposing significant requirements on submission."
      },
      "statistics": {
        "element_count": 8,
        "table_count": 0,
        "figure_count": 2
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2481,
        "rectified_image_height_pixels": 3506,
        "corners": [
          [
            3.314474364367255e-8,
            -1.6212599633975592e-8
          ],
          [
            0.9999962606433898,
            -0.0000025846698431375704
          ],
          [
            1.0000107260492241,
            1.000022143958571
          ],
          [
            -0.0000042631071254593,
            1
          ]
        ]
      }
    },
    {
      "id": "8e4c0df7-044b-492f-b2c7-bd53e6c1e860",
      "page_index": 8,
      "representation": {
        "markdown": "![Title: IBC2024\nThe image appears to be a logo for an event or organization called &quot;IBC2024&quot;.](./468b118b-3409-4978-b0a6-848ca4ff3baf.png)\n\n# RESULTS TO DATE\n\n## Volumetric capture\n\nWe have so far conducted four trials using volumetric capture. The first three of these relied on a pixel streaming solution and the fourth on browser-based rendering. All were accessible via the browser and did not require any additional downloading of supporting applications. Each trial was conducted as part of a live music session broadcast as part of a radio show and was therefore produced in collaboration with a studio production team. The artists involved each performed three songs and were captured in the volumetric rig which had been constructed within the recording studio space.\n\nFor the duration of these live performances, the audience were given the option of joining the immersive experience which included the ability to choose an avatar and then navigate a virtual venue using keyboard and mouse controls. The fourth trial also included the ability to join via a browser on a phone and included touch control. A volumetric capture of the live performer was visible on a virtual stage alongside fellow avatars that had joined the experience. Technical limitations meant that only 30 user avatars could be rendered in a common space, so the audience was divided into identical 'rooms', each holding up to 30 people plus the performer. Each performer had access to a screen as they performed which gave a fixed perspective of the virtual venue meaning they could see avatars of audience members and their movements in one of the rooms with a 10-second delay. These trials were publicised on the radio show with joining details offered on air. 
A 'common stream' was also produced from a virtual viewpoint controlled by the production team, which was made available as conventional streaming video for those unable to join virtually.\n\nAround 150 unique users joined each trial as avatars and feedback was captured for the third trial via three online focus groups of differing ages (15-16, 17-19 and 20-24). Overall, responders found the experience intriguing and unique but their interest in attending further events was predominantly driven by the choice of artist performing. In particular, it was felt that the experience lacked social features that would better emulate the communal feeling of a concert experience. On the basis of this feedback, and our initial audience research, a Discord server was used as part of the fourth trial which allowed for interaction between participants and the radio DJ. This, in particular, was a feature that the audience seemed to value.\n\nOur initial findings from these trials showed that the livestreaming of volumetric capture into a 3D game engine environment allows for elements of interaction with a virtual audience that are not possible via other mediums. The level of interaction from the performer was very dependent on their comfort with the technology and being briefed sufficiently, though dance moves and emotes played an important role in making this interaction feel two-way and dynamic between attendees and performer. While we are yet to test in-experience voice or text communication, Discord proved a powerful way for users to interact with each other, build anticipation and share their experience of the live experience together. Re-posting some of these messages into the in-app messaging solution also made the experience feel more alive and dynamic for those not actively participating in the Discord chat. 
Scaling interaction was challenging because of the 30-person room limit in our trials: some participants found themselves in less-trafficked rooms, and only one room was visible to the performer."
      },
      "statistics": {
        "element_count": 7,
        "table_count": 0,
        "figure_count": 1
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2481,
        "rectified_image_height_pixels": 3506,
        "corners": [
          [
            -0.000004394857174612158,
            2.761334867170957e-8
          ],
          [
            1.000002460103033,
            -0.0000081588070535755
          ],
          [
            1,
            1.0000059886177268
          ],
          [
            -5.18581869661754e-8,
            1
          ]
        ]
      }
    },
    {
      "id": "9c06d452-0eaa-477e-a91c-8edb2dbce995",
      "page_index": 9,
      "representation": {
        "markdown": "![Title: IBC2024\nThe image shows the logo for IBC2024, which appears to be a conference or event related to the broadcasting or media industry.](./4723f8c1-df97-49b7-899e-b0c7ce220f71.png)\n\nib\n\nIBC2024\n\n![Title: Volumetric Capture and Delivery System\nThe image shows a capture of a performance, with a monitor on the left displaying the view of the audience, and a volumetric render of the captured performer being watched by audience avatars on the right. The image is accompanied by text describing the technical details of the volumetric capture and delivery system, including the bitrate and latency metrics measured during the recording and the compression settings used for the &quot;low bitrate&quot; and &quot;high bitrate&quot; streams.](./53724fd6-8624-4e9f-b204-d0b89ad77f7e.png)\n\nFigure 2 - Capture of a performance, with monitor to show view of audience (left); volumetric render of captured performer being watched by audience avatars (right).\n\nThe volumetric capture and delivery system is able to measure bitrate and latency metrics while recording. During the browser-based rendering event we compressed the volumetric content at two bitrates. The \"low bitrate\" was produced to be consumed by browsers and the \"high bitrate\" to be consumed by the team producing the common stream, where there were no bandwidth limitations. Of the 3 types of streamed content (audio, mesh geometry, and texture), for the \"low bitrate\" stream the audio was \\~1mbit/s, the mesh geometry \\~8mbit/s and the texture data \\~10mbit/s. This resulted in a total bandwidth requirement of \\~20mbit/s. For the \"high bitrate\" stream, the equivalent values are \\~1mbit/s, \\~24mbit/s and \\~20mbit/s for a total of 45mbit/s.\n\nFor the events which were handled via pixel streaming, we ran with a single bitrate. This was set quite high as there were no bandwidth limitations on the cloud infrastructure handling the pixel streaming. 
The values were \\~1 Mbit/s for audio, \\~30 Mbit/s for mesh geometry and \\~35 Mbit/s for the texture data.\n\nThe latency of the pipeline was measured in a number of places. The total in-rig latency, which measures the on-site processing time, was around 100 ms. Once delivered to the cloud, the compression added \\~1000 ms and the segment processing and delivery to the CDN incurred a further \\~3500 ms of latency. In total this adds up to \\~4600 ms of latency. This gives plenty of headroom to allow the clients to play back at a consistent 10-second latency. Although this sounds high, it can largely be mitigated by briefing the performer to not expect instant reactions from the attendees. Interactions between the attendees themselves had a much lower latency.\n\n## 2D / 2.5D Capture\n\nWe conducted a multi-day test shoot with four music acts at Production Park studios, to trial our technology in a setting resembling a live concert/festival. We live streamed into our virtual nightclub in Unreal using our approach of three locked-off cameras (on the left, middle and right-hand sides of the stage).\n\nWe evaluated a number of streaming options for delivering live video into an Unreal application and settled on the Millicast plugin \\[Dolby (5)\\]. This provided the desired functionality, a managed streaming service, and compatibility with existing applications built by our partners in the MAX-R project in which this work was carried out. Millicast provides support for HD resolution video with low latency for live video. This meant that as we were"
      },
      "statistics": {
        "element_count": 8,
        "table_count": 0,
        "figure_count": 2
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2481,
        "rectified_image_height_pixels": 3506,
        "corners": [
          [
            1.0222402253086464e-8,
            4.355105573706716e-8
          ],
          [
            0.9999950797939339,
            -0.00002958728622111741
          ],
          [
            0.9999774654562172,
            1.0000060582528165
          ],
          [
            -0.00004249789250180491,
            1.0000001392701796
          ]
        ]
      }
    },
    {
      "id": "d7c5481a-2e83-4010-9fee-4b1869d4c8ff",
      "page_index": 10,
      "representation": {
        "markdown": "![Title: IBC2024\nThe image shows the logo for IBC2024, which appears to be a conference or event related to broadcasting or media technology.](./6595373b-5647-4a5b-aa63-83c144cfa5bd.png)\n\nib\n\nIBC2024\n\nusing a quad video incorporating multiple camera views then each camera view was reduced to 960x540 pixels. However, for the user trials for testing camera layouts, we used 4K video files rather than a live stream, with each camera view ending up as 1920x1080 pixels. A fork of the OBS-Studio video streaming software was chosen to provide the Web- RTC video stream required by Millicast.\n\nStreaming the multi-cam setup to our game environment enabled users joining as avatars to see the most relevant point of view for the position they moved to. Although this approach provides a more limited range of viewpoints in the 3D spaces than volumetric capture, it enables capturing performances with challenging staging conditions that could interfere with volumetric capture. For example, our test shoot included theatrical flashing stage lighting, and dynamic graphics displayed on an LED wall backdrop, with some haze. Dynamic virtual stage lighting was triggered by lighting in the videos of the performers to facilitate a better visual connection between the real performance and virtual space. Two of the music acts also made use of the entire stage area which was significantly larger than the volumetric capture approach discussed above can easily handle.\n\nWe conducted an online study using this set up, to explore potential preferences for the following viewing conditions for the performances:\n\n- C1: Single centre camera view only. This provides seamless viewing within the virtual venue, but skewed perspectives to either side of the stage\n- C2: Multi-cam view with 3 cameras. 
This provides more accurate viewing perspectives from the sides, and thus potentially more sense of depth, but (currently) at the cost of a visual glitch when the camera plane updates in response to the avatar's position in the room.\n- C3: Multi-cam view as in C2, but with flashing stage light distractions at the point of camera change, as the player avatar's viewpoint changes from one camera to another.\n\nConsistent viewing angles and duration for each condition were provided by the point-of-view from a player avatar following a specified path around the virtual venue. We used video clips of the experience recorded in the game engine for each condition for three of our artists, to open the study to those who are not familiar with 3D navigation in games. Participants were instructed to view each video and indicate whether the appearance of the musical performance was satisfactory. We also asked whether the lighting in the scene enhanced the viewing experience, since we had used lights to distract from camera changes. We had also received indications from some pilot test participants that lighting might impact perceived levels of realism and engagement. In a third question, we asked about interest in joining an interactive experience based on the examples shown. Figure 3 shows some example views from the study."
      },
      "statistics": {
        "element_count": 6,
        "table_count": 0,
        "figure_count": 1
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2481,
        "rectified_image_height_pixels": 3506,
        "corners": [
          [
            -0.0000109709033510561,
            -0.0000011337821675210562
          ],
          [
            1.0000061994596432,
            -7.117399016126387e-8
          ],
          [
            1.0000000984041213,
            1.000000487445629
          ],
          [
            4.7380392262201605e-9,
            1.0000001392701796
          ]
        ]
      }
    },
    {
      "id": "c41e3e96-e87b-47e5-aad4-972590ca6dd5",
      "page_index": 11,
      "representation": {
        "markdown": "![Title: IBC2024\nThe image shows the logo for IBC2024, which appears to be an event or organization. The logo consists of the letters &quot;ib&quot; in a red circle, with the text &quot;IBC2024&quot; below.](./bf3b89d3-6936-48dd-afc6-b24981912f8b.png)\n\nib\n\nIBC2024\n\n![Title: Example views from the study\nThe image shows multiple views of a stage setup with various lighting and camera configurations. The top row (a, b, c) shows single camera views from either side of the stage and the center. The middle row (d, e, f) shows multi-cam views, and the bottom row (g, h, i) shows the multi-cam views when the distraction lights are triggered, with the player avatar point of view changing when moving to the side.](./e958bf35-13e1-48e8-a167-341065879225.png)\n\na)\nb)\nc)\n\nd)\ne)\nf)\n\ng)\nh)\ni)\n\nFigure 3 - Example views from the study. The top row (a), (b), (c) show single camera views (artist KDYN) from either side of stage and centre. The middle row (d), (e), (f) show multi-cam views (artist TWST). And the bottom row (g), (h), (i), show the multi-cam views (artist Badliana) when the distraction lights are triggered (g), (i) as the player avatar point of view changes when moving to the side.\n\nAs shown in Figure 4, results from our 75 respondents reflect that when considering C3 (multi-cam with distractions) 55.2% of respondents agreed/strongly agreed with our three propositions regarding satisfaction of viewing angles of the performer, the lighting enhancing the experience, and interest in joining an experience based on the example. 27.7% disagreed/strongly disagreed, with the remaining 17% neutral. C2 produced similar results with 53.6% positive (agreed/strongly agreed), 28.3% more negative (disagreed /strongly disagreed), and the remaining 18% were neutral. C1 (the single-cam) option had similar neutral responses of 18%, with 50.4% agreeing/strongly agreeing, and 31.5% disagreeing/strongly disagreeing."
      },
      "statistics": {
        "element_count": 3,
        "table_count": 0,
        "figure_count": 2
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2481,
        "rectified_image_height_pixels": 3496,
        "corners": [
          [
            4.983679631688166e-8,
            0.0015570155838456618
          ],
          [
            1.0000000984041213,
            0.0015606967231850588
          ],
          [
            1.0000141701934704,
            0.9988377207153808
          ],
          [
            -0.000008402148269924701,
            0.9988428040769395
          ]
        ]
      }
    },
    {
      "id": "ad639e7f-1e97-48b7-b40f-a5fd5f2ac167",
      "page_index": 12,
      "representation": {
        "markdown": "![Title: IBC2024\nThe image shows the logo for &quot;IBC2024&quot;, which appears to be an event or organization.](./fa7b0ed1-5b34-43e1-b844-d8db8d03ad2e.png)\n\nib\n\nIBC2024\n\n![Title: Overall Study Results: Agreement Levels Across Three Camera Conditions\nThe chart displays agreement/disagreement levels for three different camera conditions: multi-cam with distractions, multi-cam, and single-cam. Each condition shows percentage distributions across five response categories ranging from &quot;Strongly Disagree&quot; to &quot;Strongly Agree&quot;. Across all three conditions, the &quot;Agree&quot; category received the highest percentage (34.1-37.9%), followed by &quot;Disagree&quot; (20.1-22.8%). The &quot;Neither Agree Nor Disagree&quot; category consistently showed around 17-18% responses. &quot;Strongly Agree&quot; and &quot;Strongly Disagree&quot; categories received the lowest percentages, with &quot;Strongly Disagree&quot; ranging from 7.6-8.7% and &quot;Strongly Agree&quot; ranging from 16.0-17.3%.\n&lt;csv_data&gt;\nCondition,Strongly Disagree,Disagree,Neither Agree Nor Disagree,Agree,Strongly Agree\nMulti-cam with distractions,7.6,20.1,17.0,37.9,17.3\nMulti-cam,7.7,20.6,18.0,37.6,16.0\nSingle-cam,8.7,22.8,18.0,34.1,16.3\n&lt;/csv_data&gt;](./5281a84a-4c9a-456d-bec4-a5cbae538367.png)\n\nMulti-cam\n with distractions\n7.6%\n20.1%\n17%\n37.9%\n17.3%\n\nMulti-cam\n7.7%\n20.6%\n18%\n37.6%\n16.0%\n\nSingle-cam\n8.7%\n22.8%\n18%\n34.1%\n16.3%\n\nStrongly\nNeither Agree\nStrongly\n Disagree\nDisagree\nNor Disagree\nAgree\nAgree\n\nFigure 4 - Overall study results, showing percentage of respondents&#x27; agreement/disagreement levels to our propositions across our three conditions\n\nWe also asked respondents at the end of the study to prioritise up to 3 factors about the experience that they would like to have improved. 
The most popular option was to make the 3D performer seem more a part of the 3D world (31%), with some supporting comments suggesting integration of lighting from the real world into the virtual world (e.g. \"match colours of performance with in 3D space light colour\"). A further 13% would like the performer to fill up more of the stage, and 9% would like more music genres.\n\nWe deem the small differences between C1 (single cam) and the multi-cam conditions worth further exploration in an interactive version of the study, where participants can freely explore the space and their own viewing angles. The qualitative feedback also indicates that the relationship between the lighting in the virtual space and in the real-world performance is worth further exploration.\n\n## Audience interactivity\n\nAs mentioned previously, audience experience and interactivity featured prominently in audience feedback from our surveys and trials. There were calls for more distinctive storytelling in the event, including countdowns to the event beginning, more dynamism/sense of progression during the event and clearer follow-on journeys at the end. Customisable avatars were a popular addition, though there were issues for some with the controls, especially amongst those less familiar with game environments.\n\nIn terms of audience participation, an ability to communicate between users was seen as key, with this working particularly well when the radio DJ hosting the broadcast was able to seamlessly link the immersive experience, radio show and Discord with shout-outs and interactions.\n\n## CONCLUSION\n\nThis paper has presented a summary of work in progress to evaluate technical approaches and audience feedback for presentation of live music events in a game-like multi-user shared virtual space. Further developments and trials are currently taking place, and we expect to have new results to present by September 2024."
      },
      "statistics": {
        "element_count": 9,
        "table_count": 0,
        "figure_count": 2
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2480,
        "rectified_image_height_pixels": 3506,
        "corners": [
          [
            -0.000006050491575654693,
            1.9056451040985784e-8
          ],
          [
            0.999973430887243,
            -0.000010278398947789203
          ],
          [
            0.9999811064087062,
            1.000000974891258
          ],
          [
            0.000004268446933449588,
            0.9999997910947305
          ]
        ]
      }
    },
    {
      "id": "38a926a5-9790-46ad-bfe9-6068ef8d969b",
      "page_index": 13,
      "representation": {
        "markdown": "![Title: IBC2024\nThe image shows the logo for IBC2024, which appears to be an event or organization. The logo consists of the letters &quot;ib&quot; in a red circular background.](./1aa5bd55-86cf-432e-beb7-9cbff0ad1520.png)\n\nib\n\nIBC2024\n\nFrom the results to date, it is clear that good visual representation is important, and this includes the appearance and lighting of the virtual venue as well as the appearance of the artist themselves. There are also many other aspects that need to be considered in order to provide a good experience for users, including a clear narrative for the event in terms of how it starts and ends, and support for audience participation including communication with each other and with the artist.\n\nOver the coming months we intend to advance the volumetric capture system in two ways. First, we will look at improving the compression rates of our volumetric video. By better leveraging the temporal consistency of the data, we believe we can maintain existing visual quality at lower bandwidth, and increase the accessibility of the events. Second, we are looking at moving some of our existing on-site processing to the cloud. This will allow us to increase the reliability of the capture system by running multiple redundant pipelines, as well as relieve some of the resource constraints we have due to running on-premises hardware.\n\nWe also plan to further study the 2D video approach, in particular for larger events, with experiments planned using content from a music festival.\n\n## REFERENCES\n\n- 1. Rivera, F., Thomas, G. et al. 2023. D2.2 Report on Scenario Use-Cases for Pipelines using Virtual and XR Production (see Annex 1). MAX-R project public deliverable available at https://www.max-r.eu/documents\n- 2. Orts-Escolano, S. Et al. 2016. Holoportation: Virtual 3D Teleportation in Real-time. UIST '16: Proceedings of the 29th Annual Symposium on User Interface Software and Technology, pp. 741-754. 
https://dl.acm.org/doi/abs/10.1145/2984511.2984517\n- 3. Grau, O., Price, M., Thomas, G. 2002. Use of 3-D Techniques for Virtual Production. BBC R&D White Paper WHP033. https://www.bbc.co.uk/rd/publications/whitepaper033\n- 4. Grow, K. 2019. Live After Death: Inside Music's Booming New Hologram Touring Industry. Rolling Stone Magazine, Sept. 2019. https://www.rollingstone.com/music/music-features/hologram-tours-roy-orbison-frank-zappa-whitney-houston-873399.\n- 5. Dolby. 2024. Dolby Real-time Streaming Player Plugin for Unreal Engine. https://docs.dolby.io/streaming-apis/docs/unreal-player-plugin\n\n## ACKNOWLEDGEMENTS\n\nThe authors would like to thank the music artists who took part in these trials, including Badliana, KDYN, TWST and Sam Tompkins. They would also like to thank staff at Production Park, UK and students at the Academy of Live Technology for help with the 2D test shoot. Some of the work reported here was carried out as a part of the MAX-R project, which is co-funded by Innovate UK and the European Union's Horizon Europe Research & Innovation Programme under Grant Agreement No. 101070072."
      },
      "statistics": {
        "element_count": 8,
        "table_count": 0,
        "figure_count": 1
      },
      "asset_metadata": {
        "rectified_image_width_pixels": 2481,
        "rectified_image_height_pixels": 3506,
        "corners": [
          [
            -0.000003152156788487917,
            -1.7689299435395677e-8
          ],
          [
            1.0000046249937022,
            -0.00001059162828389128
          ],
          [
            0.9999999015958787,
            1.0000020194176056
          ],
          [
            5.607223627148125e-8,
            1
          ]
        ]
      }
    }
  ],
  "elements": [
    {
      "type": "TEXT",
      "id": "edf825e1-d179-4921-b123-e520b065ecac",
      "reading_order": 0,
      "page_indices": [
        0
      ],
      "representation": {
        "markdown": "# B B C"
      },
      "sub_type": "HEADER"
    },
    {
      "type": "TEXT",
      "id": "b5412709-ed93-4830-b11b-e55bd6929288",
      "reading_order": 1,
      "page_indices": [
        0
      ],
      "representation": {
        "markdown": "# Research & Development White Paper"
      },
      "sub_type": "HEADER"
    },
    {
      "type": "TEXT",
      "id": "c5278e9f-eaf2-488c-8021-3315a96eecc1",
      "reading_order": 2,
      "page_indices": [
        0
      ],
      "representation": {
        "markdown": "# WHP 415"
      },
      "sub_type": "HEADER"
    },
    {
      "type": "TEXT",
      "id": "8118d5aa-768f-4db4-9a8a-409337ba9d86",
      "reading_order": 3,
      "page_indices": [
        0
      ],
      "representation": {
        "markdown": "*September 2024*"
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "3dbe20b5-052a-4ed9-b383-7267b7250d2d",
      "reading_order": 4,
      "page_indices": [
        0
      ],
      "representation": {
        "markdown": "**Live Music in Immersive Virtual Spaces**"
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "bc3e678e-de58-4564-9545-29e409c5a56a",
      "reading_order": 5,
      "page_indices": [
        0
      ],
      "representation": {
        "markdown": "G. A. Thomas, F. M. Rivera, L. Kelso, B. Weir, P. Rich, O. Moolan-Feroze"
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "1d80b700-1d37-4933-8cee-6240ab26e902",
      "reading_order": 6,
      "page_indices": [
        0
      ],
      "representation": {
        "markdown": "*BRITISH BROADCASTING CORPORATION*"
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "f6c04391-c336-4fe0-b9ee-15ee87f25f02",
      "reading_order": 0,
      "page_indices": [
        1
      ],
      "representation": {
        "markdown": "White Papers are distributed freely on request. Authorisation of the Head of ARA/Group or Head of Standards is required for publication."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "22e7385e-8308-4c53-9355-eea9234a34e3",
      "reading_order": 1,
      "page_indices": [
        1
      ],
      "representation": {
        "markdown": "© BBC 2024. All rights reserved. Except as provided below, no part of this document may be reproduced in any material form (including photocopying or storing it in any medium by electronic means) without the prior written permission of BBC except in accordance with the provisions of the (UK) Copyright, Designs and Patents Act 1988."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "a88ffb32-066b-414d-88a9-79879153e5dc",
      "reading_order": 2,
      "page_indices": [
        1
      ],
      "representation": {
        "markdown": "The BBC grants permission to individuals and organisations to make copies of the entire document (including this copyright notice) for their own internal use. No copies of this document may be published, distributed or made available to third parties whether by paper, electronic or other means without the BBC's prior written permission. Where necessary, third parties should be directed to the relevant page on BBC's website at http://www.bbc.co.uk/rd/pubs/whp for a copy of this document."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "7ed37c21-c748-4927-b7e9-8a653471c2e9",
      "reading_order": 3,
      "page_indices": [
        1
      ],
      "representation": {
        "markdown": "2"
      },
      "sub_type": "PAGE_NUMBER"
    },
    {
      "type": "TEXT",
      "id": "76040b59-8b3d-47ea-b62e-a639a26ec59e",
      "reading_order": 0,
      "page_indices": [
        2
      ],
      "representation": {
        "markdown": "# WHP 415"
      },
      "sub_type": "HEADER"
    },
    {
      "type": "TEXT",
      "id": "74a97bfe-da78-4238-b4da-b9fb3fb47992",
      "reading_order": 1,
      "page_indices": [
        2
      ],
      "representation": {
        "markdown": "# Live Music in Immersive Virtual Spaces"
      },
      "sub_type": "TITLE"
    },
    {
      "type": "TEXT",
      "id": "489a85f9-9b18-4cf0-8339-747bb02a0601",
      "reading_order": 2,
      "page_indices": [
        2
      ],
      "representation": {
        "markdown": "**G. A. Thomas, F. M. Rivera, L. Kelso, B. Weir, P. Rich, O. Moolan-Feroze**"
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "fccd167e-d20a-4f7c-82a6-2001720dd2f9",
      "reading_order": 3,
      "page_indices": [
        2
      ],
      "representation": {
        "markdown": "## Abstract"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "3b2633cb-3eb5-48be-9261-eb619482421c",
      "reading_order": 4,
      "page_indices": [
        2
      ],
      "representation": {
        "markdown": "Game-like environments that offer live multiplayer capability are becoming a major form of entertainment. A large community, estimated to be 3.2M in the UK, also spend time in these environments for social & experiential reasons, rather than gaming. Music artists are using these spaces to present virtual concerts, drawing in big crowds and revenue. Broadcasters, as well as major music labels, are starting to look at how to harness games-like media to deliver live music events."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "f08c54fc-1501-47ba-afc3-bcd7992aaef2",
      "reading_order": 5,
      "page_indices": [
        2
      ],
      "representation": {
        "markdown": "This paper presents approaches the BBC has been trialling for delivering live events into virtual immersive spaces. Trials are being run to explore the use of low-latency volumetric capture technology of the artists, to allow virtual attendees, through their avatars, to interact with the performer and each other. Other trials are looking at the capture of performers in larger spaces such as stages at a festival, relying on 2/2.5D video approaches. Results from these trials are reported, including technical aspects and audience feedback."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "fd985b66-1e76-4cd8-9591-f7f71ebd010e",
      "reading_order": 6,
      "page_indices": [
        2
      ],
      "representation": {
        "markdown": "This document was originally published at the IBC 2024 conference."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "021e9f6e-e1b5-4dbd-a3d0-7954fd89e354",
      "reading_order": 7,
      "page_indices": [
        2
      ],
      "representation": {
        "markdown": "**Additional key words:** metaverse, MAX-R"
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "95dc2945-af09-430f-a6cd-a19746eb1bed",
      "reading_order": 8,
      "page_indices": [
        2
      ],
      "representation": {
        "markdown": "4"
      },
      "sub_type": "PAGE_NUMBER"
    },
    {
      "type": "FIGURE",
      "id": "50bcf0fc-302f-49de-a7c6-da7049a38f3c",
      "reading_order": 0,
      "sub_type": "LOGO",
      "title": "",
      "representation": {
        "markdown": "![Title: IBC2024\nThe image shows the logo for IBC2024, which appears to be an event or organization related to live music in virtual immersive spaces.](./50bcf0fc-302f-49de-a7c6-da7049a38f3c.png)\n\nib\n\nIBC2024"
      },
      "summary": "Title: IBC2024\nThe image shows the logo for IBC2024, which appears to be an event or organization related to live music in virtual immersive spaces."
    },
    {
      "type": "TEXT",
      "id": "45e1d810-deb4-4b4f-ac39-957778a16da6",
      "reading_order": 1,
      "page_indices": [
        3
      ],
      "representation": {
        "markdown": "# LIVE MUSIC IN VIRTUAL IMMERSIVE SPACES"
      },
      "sub_type": "TITLE"
    },
    {
      "type": "TEXT",
      "id": "27ee844d-0ce6-4eee-a066-d1d88cda2364",
      "reading_order": 2,
      "page_indices": [
        3
      ],
      "representation": {
        "markdown": "G. A. Thomas¹, F. M. Rivera¹, L. Kelso¹, B. Weir P. Rich O. Moolan-Feroze² ¹BBC R&D, UK, 2 Condense Reality, UK"
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "fc3d12ec-4e96-4cf5-8839-e1cec6412008",
      "reading_order": 3,
      "page_indices": [
        3
      ],
      "representation": {
        "markdown": "## ABSTRACT"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "0cd0fcbd-1c9d-408d-86b4-d43cd0089039",
      "reading_order": 4,
      "page_indices": [
        3
      ],
      "representation": {
        "markdown": "Game-like environments that offer live multiplayer capability are becoming a major form of entertainment. A large community, estimated to be 3.2M in the UK, also spend time in these environments for social & experiential reasons, rather than gaming. Music artists are using these spaces to present virtual concerts, drawing in big crowds and revenue. Broadcasters, as well as major music labels, are starting to look at how to harness games-like media to deliver live music events."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "84f5fe00-e2e3-4964-88fc-01fe4038fe44",
      "reading_order": 5,
      "page_indices": [
        3
      ],
      "representation": {
        "markdown": "This paper presents approaches the BBC has been trialling for delivering live events into virtual immersive spaces. Trials are being run to explore the use of low-latency volumetric capture technology of the artists, to allow virtual attendees, through their avatars, to interact with the performer and each other. Other trials are looking at the capture of performers in larger spaces such as stages at a festival, relying on 2/2.5D video approaches. Results from these trials are reported, including technical aspects and audience feedback."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "f47c7c9f-21b8-44d8-9200-7d31c083b70f",
      "reading_order": 6,
      "page_indices": [
        3
      ],
      "representation": {
        "markdown": "## INTRODUCTION"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "b5d7d632-f58a-48e0-8606-21f29939782b",
      "reading_order": 7,
      "page_indices": [
        3
      ],
      "representation": {
        "markdown": "Game-like environments that offer live multiplayer capability are becoming a major form of entertainment. A large community, estimated to be 3.2M in the UK, also spend time in these environments for social & experiential reasons, rather than gaming. Music artists are using these spaces to present virtual concerts, drawing in big crowds and revenue. Ariana Grande's 2021 'Rift Tour' in Fortnite, garnered 78M views across 5 events. In 2020, Travis Scott's 5 events in Fortnite generated an estimated $12M in merchandise sales. At the same time, audiences for TV and radio are reducing. Broadcasters, as well as major music labels, are starting to look at how to harness games-like media to deliver live music events. Such approaches offer the potential for audiences to engage in a more interactive way with the performers and each other."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "d1d597d0-a9d1-4fbb-8548-22e8de907937",
      "reading_order": 8,
      "page_indices": [
        3
      ],
      "representation": {
        "markdown": "There are many challenges around delivering a live concert into these virtual immersive spaces. The Fortnight concerts relied on pre-generated animated avatars of the performer, making live interaction with the audience unfeasible. Such an approach would also significantly add to the production cost and make a 'simulcast' of a broadcast event difficult or impossible."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "90053f76-7657-475b-abd8-97ac4d4cf88c",
      "reading_order": 9,
      "page_indices": [
        3
      ],
      "representation": {
        "markdown": "This paper presents approaches the BBC has been trialling for delivering live events into virtual immersive spaces."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "FIGURE",
      "id": "00ee54d8-0ddc-4966-89a8-30fd89a3d074",
      "reading_order": 0,
      "sub_type": "LOGO",
      "title": "",
      "representation": {
        "markdown": "![Title: IBC2024\nThe image appears to be a logo for an event or organization called &quot;IBC2024&quot;. The logo consists of the letters &quot;ib&quot; in a red circle, with the text &quot;IBC2024&quot; written below it.](./00ee54d8-0ddc-4966-89a8-30fd89a3d074.png)\n\nib\n\nIBC2024"
      },
      "summary": "Title: IBC2024\nThe image appears to be a logo for an event or organization called \"IBC2024\". The logo consists of the letters \"ib\" in a red circle, with the text \"IBC2024\" written below it."
    },
    {
      "type": "TEXT",
      "id": "54e62ca5-6f79-454e-87a3-de826ef643e3",
      "reading_order": 1,
      "page_indices": [
        4
      ],
      "representation": {
        "markdown": "# BUSINESS, OPERATIONAL AND AUDIENCE REQUIREMENTS"
      },
      "sub_type": "TITLE"
    },
    {
      "type": "TEXT",
      "id": "f56c1d3a-c8c0-43f7-92ef-8c21db4071d0",
      "reading_order": 2,
      "page_indices": [
        4
      ],
      "representation": {
        "markdown": "Pre-animated performers rendered as 3D avatars in virtual music experiences enable fans to experience the music in shared spaces and interact with others. However, initial surveys we conducted before embarking on any user trials indicate that potential audience members and live music artists would like the audience to be able to interact with the performer in real-time through modes such as sing-along, cheering, dancing, and expression of emotions \\[Rivera et al. (1)\\]. This was later borne out through audience surveys of the trial events described later in this paper. Video-based capture of performers (as distinct from motion capture) was also favoured from a business perspective, as this makes it easier to simulcast a performance being staged for a live audience or TV show, where the performer would not want constrains on what they can wear."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "fe7282d9-2e68-4cae-b744-7541d69b6bca",
      "reading_order": 3,
      "page_indices": [
        4
      ],
      "representation": {
        "markdown": "We took a holistic approach for this preliminary research to elicit a requirements wish list for live immersive music events. We considered an audience perspective, the perspective of music artists, and that of extended reality (XR) practitioners with experience in the field. We treated the initial findings as points to consider, but still open to further investigation, on which subsequent sections of this paper will elaborate."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "e8746c04-834e-4ea5-91d6-3c0522e12d69",
      "reading_order": 4,
      "page_indices": [
        4
      ],
      "representation": {
        "markdown": "Our survey \\[Rivera et al. (1)\\] had a total of 45 responses from potential audience members, and 15 responses from workshop participants. These indicated that the ability to interact with each other and with the music artist, the agency to choose a point-of-view, adapt the audio mix, re-watch the performance, explore the environment, and take away digital artefacts, were unanimously popular features. Proximity-based audio chats were the most desired mode of interaction with friends, with text chats the next most popular. However, text chats were the most desired mode of interaction for conversing with new people, rather than audio. Personalisation of avatars was also a high priority, with the added option for gender-neutral avatars. The strongest incentives to join a live immersive music experience (apart from the music) was to socialise with friends."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "c92422c9-ab00-45bd-8da9-8cef979ab504",
      "reading_order": 5,
      "page_indices": [
        4
      ],
      "representation": {
        "markdown": "The importance of supporting text-based chat was also reiterated in interviews with 4 established XR practitioners. Amongst other findings it was identified that a low latency real- time experience, supported by a text chat platform (either built into the platform or using an external service, such as Discord) was considered key to helping foster a sense of engagement for an audience attending a virtual event. This was also borne out in feedback from pilot live events as described later in the paper."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "17f2756f-2676-41aa-959e-afe84017a933",
      "reading_order": 6,
      "page_indices": [
        4
      ],
      "representation": {
        "markdown": "We also interviewed 5 music artists (with substantial experience playing live gigs) to explore expectations from a creative performer's perspective. The artists prioritised good quality sound, with spatial audio and realistic room acoustics generally preferred, although some preferred sound to reflect their own intended sound mix. Audience reaction through real- time low latency feedback was considered essential to the performers. If playing to a live real-world crowd and simultaneously to the virtual audience, it was deemed important that their view of the virtual world did not distract from the live performance (for example having to look at a small monitor, or text messages). The concept of a large LED wall displaying the virtual world was appealing to provide more of a seamless audience. The potential to reach new audiences, and to express themselves in more immersive and creative ways was highly appealing. Some of the artists preferred being represented as realistically as possible (through volumetric capture for instance), while others were enticed by the possibilities of having more creative representations (such as custom avatars)."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "84ecd07f-4941-42ad-a0d6-abf9581273de",
      "reading_order": 1,
      "page_indices": [
        5
      ],
      "representation": {
        "markdown": "# TECHNICAL APPROACHES"
      },
      "sub_type": "TITLE"
    },
    {
      "type": "TEXT",
      "id": "1ef2b0bf-6295-4790-b409-421837938ebc",
      "reading_order": 2,
      "page_indices": [
        5
      ],
      "representation": {
        "markdown": "## Volumetric Capture"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "727fb618-4251-4532-bf95-343ed8c53eb7",
      "reading_order": 3,
      "page_indices": [
        5
      ],
      "representation": {
        "markdown": "Volumetric capture describes a group of technologies that are able to represent real world 3D objects and events in a way that can be played back and rendered from a floating \"free viewpoint\" camera. Typically, this type of playback is enabled via 3D game engines which allow events that have been volumetrically captured to appear alongside traditional 3D graphics assets."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "854d8ac4-7a45-41e7-892e-059efdc1ec10",
      "reading_order": 4,
      "page_indices": [
        5
      ],
      "representation": {
        "markdown": "The volumetric capture system used in the experiments takes a similar form to that presented by Orts-Escolano et al. (2). On the recording end, a set of 10 calibrated depth and RGB sensors capture raw image frames. These frames are passed through a \"fusion\" process that generates a 3D model, which is represented as a triangle mesh and UV mapped texture. The fusion process implements a real-time hole filling and an occlusion prediction process which compensates for noisy and incomplete information coming in from the cameras. During operation, the system records everything that occurs within a 3D \"capture area\". Events that fall outside of this area will not be represented in the output model. The capture area is limited to 4mx4mx3m to ensure real-time playback. There are also limitations on the amount of surface area within the scene, both in terms of processing times and bit-rate limitations. Once the data is fused, the model is delivered to a cloud-based distribution system which compresses the data to support a number of different bit rates, and will generate segments similar to an HLS/DASH system. The segments are then served out via a manifest to client devices over a CDN."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "f3be2d71-df26-4874-83ea-d8169ec669b4",
      "reading_order": 5,
      "page_indices": [
        5
      ],
      "representation": {
        "markdown": "As a means of injecting live content into virtual spaces, volumetric capture has a number of benefits. Compared to methods such as motion capture, it provides a much more authentic representation of the performance, allowing anything within the capture volume to be captured and represented in-game. It does not suffer from the uncanny valley effect that can result from the armature of the motion capture not being able to accurately represent the performer's movements. When compared to 2D and 2.5D video, volumetric video enables full free viewpoint representation, as well as the opportunity to shadow cast and relight the content which allows better integration into the virtual scene."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "f7bdc6b9-69a1-4886-92f8-510b24013b45",
      "reading_order": 6,
      "page_indices": [
        5
      ],
      "representation": {
        "markdown": "However, volumetric capture does present challenges. A calibrated camera rig needs to be set up surrounding the performer. The calibration can take time, and it is possible to disrupt the calibration during an event if the cameras are knocked. Additionally, depending on the technology employed within the cameras, certain materials and surfaces are unable to be properly captured. Some of this can be solved algorithmically, although in practice, it is simplest to put some limitations on the artist's costumes and props. The interaction between the cameras and the lighting can cause additional problems. To be able to relight the content within the game engine requires flat lighting in the capture area. This causes difficulties when recording content during in-person events where existing event lighting is in place."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "5361b4e1-169e-4719-b6ac-9c1841e23a71",
      "reading_order": 7,
      "page_indices": [
        5
      ],
      "representation": {
        "markdown": "## 2D / 2.5D Capture"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "7ceafcbf-8df4-4853-8883-0fbd5c575bea",
      "reading_order": 8,
      "page_indices": [
        5
      ],
      "representation": {
        "markdown": "A simpler approach that may suffice in situations where a fully-free viewpoint is not required, is to use one or more conventional 2D video streams (sometimes referred to as video billboards), or so-called 2.5D video (where depth data may be used to help extend the range of viewpoints) \\[Grau et al. (3)\\]. Such approaches may be particularly suitable when the range of viewpoints is naturally limited, such as for an audience viewing a stage from the front. The lack of true 3D may also be less apparent in browser-based VR use cases (which are our"
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "c2b2046f-864b-4ec8-80ae-f96a27c46430",
      "reading_order": 1,
      "page_indices": [
        6
      ],
      "representation": {
        "markdown": "current focus), where the user has no stereoscopic depth cues, rather than applications using a VR headset. There is also anecdotal evidence that flat images placed within a 3D environment can trick the brain into seeing them as true 3D, through the successful use of Pepper's Ghost illusions for so-called holographic performers on stage \\[Grow (4)\\]."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "c1b17f6c-300c-4287-ab0c-1df5ae0aca76",
      "reading_order": 2,
      "page_indices": [
        6
      ],
      "representation": {
        "markdown": "Adding depth data to images, or simply using an alpha mask to define the foreground area, can help make a performer captured in a single video stream look better-integrated into a 3D environment. However, often the stage background or lighting/haze effects form an important part of a live performance, so for our initial work we have focused on what can be achieved with one or more conventional video streams without any sophisticated processing, to provide a baseline for representing a live music stage performance without needing to impose constraints on the performance itself."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "0f3727e0-3472-4083-afb3-442d5ad0bb5a",
      "reading_order": 3,
      "page_indices": [
        6
      ],
      "representation": {
        "markdown": "For this experiment, we built a 3D \"nightclub-style\" environment in Unreal Engine that included a replica of the real-world stage on which we had recorded a number of musical performances. We then tested the user responses to the following scenarios:"
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "00ae2246-5133-40f2-8c04-655364d6dcc8",
      "reading_order": 4,
      "page_indices": [
        6
      ],
      "representation": {
        "markdown": "- 1) A single view of a performer taken from a camera in front of the stage, placed on a 3D plane covering the virtual set stage boundaries.\n- 2) A view of the performer from one of three different camera positions, placed on a 3D plane covering the virtual set stage boundaries. The camera feed displayed on that plane was chosen on a frame-by-frame basis according to whichever camera was closest to the player viewpoint, the idea being to switch to the \"best\" perspective on the musician as the user moves about.\n- 3) As #2, but with the addition of a strobe-light effect from virtual lights in the 3D scene that triggered whenever the displayed camera view changed. Early experimentation showed that this could distract the user from noticing the camera view transition."
      },
      "sub_type": "LIST"
    },
    {
      "type": "TEXT",
      "id": "a915e15c-1554-4f82-9b58-5c08dd5d491a",
      "reading_order": 5,
      "page_indices": [
        6
      ],
      "representation": {
        "markdown": "The video from each camera was packed into one of four quadrants in a single 3840x2160 UHD video encoded as mp4 at 20 Mbps. That meant that the resolution of a single camera feed was 1920x1080. This resolution was used for each of the three scenarios above. Since we were using at most three cameras, the fourth quadrant can hold a \"broadcast-style\" cut that can be used as a texture for a virtual \"big screen\" in the virtual nightclub (note though that we did not add the big screen for this experiment). Although we would typically stream the video live into the rendering application, for this experiment the videos were added as files into the build to ensure consistent display quality for the user."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "82fc8318-0229-455a-979f-aecabe3ab91c",
      "reading_order": 6,
      "page_indices": [
        6
      ],
      "representation": {
        "markdown": "The three cameras were positioned to record the performance as shown in the photo below. The height off the ground for each camera was chosen to match the typical viewing height of the user's camera in the 3D scene, and the distance from the stage was chosen to match the typical viewing distance of the user from the stage in the 3D scene."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "53305e72-b1b6-4a8a-a19b-763ca4ecd1eb",
      "reading_order": 7,
      "page_indices": [
        6
      ],
      "representation": {
        "markdown": "Each camera view was framed such that its image encompassed the entire stage. A physically accurate model of the stage was created and placed into the 3D world, and a plane (onto which the texture of the video from the cameras was drawn) was positioned to cover the stage corners. In order to ensure that the stage corners in the video were correctly placed at the corners of the video plane, a transform matrix for each camera was calculated that mapped the UV values for the stage corners in each camera image to the corners of the video plane."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "FIGURE",
      "id": "1a610f4f-51d3-4476-ba7d-420ea8eca2bd",
      "reading_order": 1,
      "sub_type": "IMAGE",
      "title": "Figure 1 - Capture of a performance, with central and left-hand cameras visible (left); camera feed from left-hand camera embedded in virtual night club (right).",
      "representation": {
        "markdown": "![Title: Capture of a Performance in a Virtual Nightclub\nThe image shows a capture of a performance taking place on a stage, with central and left-hand cameras visible. The camera feed from the left-hand camera is embedded in a virtual nightclub scene, creating an immersive experience for the audience.](./1a610f4f-51d3-4476-ba7d-420ea8eca2bd.png)\n\nFigure 1 - Capture of a performance, with central and left-hand cameras visible (left); camera feed from left-hand camera embedded in virtual night club (right).\nma/R\n\nFigure 1 - Capture of a performance, with central and left-hand cameras visible (left); camera feed from left-hand camera embedded in virtual night club (right).\n\nFigure 1 - Capture of a performance, with central and left-hand cameras visible (left); camera feed from left-hand camera embedded in virtual night club (right)."
      },
      "summary": "Title: Capture of a Performance in a Virtual Nightclub\nThe image shows a capture of a performance taking place on a stage, with central and left-hand cameras visible. The camera feed from the left-hand camera is embedded in a virtual nightclub scene, creating an immersive experience for the audience."
    },
    {
      "type": "FIGURE",
      "id": "098356bc-a881-4b54-b3e7-e8d52346832e",
      "reading_order": 2,
      "sub_type": "IMAGE",
      "title": "Figure 1 - Capture of a performance, with central and left-hand cameras visible (left); camera feed from left-hand camera embedded in virtual night club (right).",
      "representation": {
        "markdown": "![Title: Capture of a Performance in a Virtual Nightclub\nThe image shows a capture of a performance, with central and left-hand cameras visible on the left side. The camera feed from the left-hand camera is embedded in a virtual nightclub scene on the right side, displaying a colorful and dynamic virtual environment.](./098356bc-a881-4b54-b3e7-e8d52346832e.png)\n\nFigure 1 - Capture of a performance, with central and left-hand cameras visible (left); camera feed from left-hand camera embedded in virtual night club (right).\nma/R\n\nFigure 1 - Capture of a performance, with central and left-hand cameras visible (left); camera feed from left-hand camera embedded in virtual night club (right).\n\nFigure 1 - Capture of a performance, with central and left-hand cameras visible (left); camera feed from left-hand camera embedded in virtual night club (right)."
      },
      "summary": "Title: Capture of a Performance in a Virtual Nightclub\nThe image shows a capture of a performance, with central and left-hand cameras visible on the left side. The camera feed from the left-hand camera is embedded in a virtual nightclub scene on the right side, displaying a colorful and dynamic virtual environment."
    },
    {
      "type": "TEXT",
      "id": "33b57fdc-60b6-45fb-9388-1e757fc45301",
      "reading_order": 3,
      "page_indices": [
        7
      ],
      "representation": {
        "markdown": "## Delivery and Rendering"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "ba755fe1-9697-4eda-abfd-e66494b73fea",
      "reading_order": 4,
      "page_indices": [
        7
      ],
      "representation": {
        "markdown": "For the user to experience the performance in a game-like environment, they need to be able to run what is effectively a 3D multiplayer networked game that receives a live stream of the captured data. There are several approaches that can be taken for this."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "d26b3291-1270-427f-b8e1-9275630cf697",
      "reading_order": 5,
      "page_indices": [
        7
      ],
      "representation": {
        "markdown": "Pixel streaming, browser-based rendering and using a downloadable executable have a number of benefits and drawbacks. Pixel-streaming represents the most easily accessible technology for users. Typically, only a browser is required, and the bandwidth requirements are similar to those required for typical video streaming. This is because the heavy rendering tasks are performed remotely, and only the rendered output is delivered to the client. The main drawback of pixel streaming is the cost and limitations on availability of cloud rendering resources. These limitations can severely constrain the overall reach of the event. From a user's perspective, pixel streaming has a few drawbacks. First, due to additional latency incurred by sending control inputs to the cloud renderer, game play can feel laggy and unresponsive. Furthermore, due to compression artefacts, the visual presentation is not always great. These artefacts can include blocking, as well as colour space reductions."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "582f1f2a-5832-4850-bf55-cfec051a6ab7",
      "reading_order": 6,
      "page_indices": [
        7
      ],
      "representation": {
        "markdown": "Browser-based rendering has a number of advantages. The user only requires a modern web browser to access the event. As the content is rendered locally there is none of the input lag and compression artefacts caused by pixel streaming. Another benefit of browser rendering is that it can better integrate with web-based platforms, as no switch between applications is required when moving from the platform to the performance. The main drawback from browser-based rendering is the limitations that browsers put on access to the user's machine for security purposes. This can limit the feature-set that can be delivered to users. In the case of volumetric video, the rendering process can also require very modern browser features such as web codecs, which require users to be running the most recent version of their browser."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "4f377e55-d80c-4ed4-80c6-18f7d3ba722d",
      "reading_order": 7,
      "page_indices": [
        7
      ],
      "representation": {
        "markdown": "Rendering via a downloaded application will provide the best user experience, as the application can make full use of the user's system, with no limitations on the rendering feature set. However, as the application will need to be downloaded and installed on the user's system, this method is likely the most limited in its accessibility. Additional limitations can be imposed depending on the application distribution method. Distributing via digital platforms such as Steam or the App Store can simplify the download and install process, as well as provide users with additional assurances around the security of the application. However, these digital platforms can put limitations on monetisation options, as well as imposing significant requirements on submission."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "b360715d-6a41-4088-9ba9-9661271fd381",
      "reading_order": 1,
      "page_indices": [
        8
      ],
      "representation": {
        "markdown": "# RESULTS To DATE"
      },
      "sub_type": "TITLE"
    },
    {
      "type": "TEXT",
      "id": "6b1946f1-0c95-4233-bbd3-01f7dbad1a2a",
      "reading_order": 2,
      "page_indices": [
        8
      ],
      "representation": {
        "markdown": "## Volumetric capture"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "adf55810-4d77-4e08-95a5-9138856d4296",
      "reading_order": 3,
      "page_indices": [
        8
      ],
      "representation": {
        "markdown": "We have so far conducted four trials using volumetric capture. The first three of these relied on a pixel streaming solution and the fourth on browser-based rendering. All were accessible via the browser and did not require any additional downloading of supporting applications. Each trial was conducted as part of a live music session broadcast as part of a radio show and was therefore produced in collaboration with a studio production team. The artists involved each performed three songs and were captured in the volumetric rig which had been constructed within the recording studio space."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "9df8c803-4cf3-47aa-b3db-173d15b64cf4",
      "reading_order": 4,
      "page_indices": [
        8
      ],
      "representation": {
        "markdown": "For the duration of these live performances, the audience were given the option of joining the immersive experience which included the ability to choose an avatar and then navigate a virtual venue using keyboard and mouse controls. The fourth trial also included the ability to join via a browser on a phone and included touch control. A volumetric capture of the live performer was visible on a virtual stage alongside fellow avatars that had joined the experience. Technical limitations meant that only 30 user avatars could be rendered in a common space, so the audience was divided into identical 'rooms', each holding up to 30 people plus the performer. Each performer had access to a screen as they performed which gave a fixed perspective of the virtual venue meaning they could see avatars of audience members and their movements in one of the rooms with a 10-second delay. These trials were publicised on the radio show with joining details offered on air. A 'common stream' was also produced from a virtual viewpoint controlled by the production team, which was made available as conventional streaming video for those unable to join virtually."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "66f56728-013d-4504-ada1-dcd741d948b2",
      "reading_order": 5,
      "page_indices": [
        8
      ],
      "representation": {
        "markdown": "Around 150 unique users joined each trial as avatars and feedback was captured for the third trial via three online focus groups of differing ages (15-16, 17-19 and 20-24). Overall, responders found the experience intriguing and unique but their interest to attend further events was predominantly driven by the choice of artist performing. In particular, it was felt that the experience lacked social features that would better emulate the communal feeling of a concert experience. On the basis of this feedback, and our initial audience research, a Discord server was used as part of the fourth trial which allowed for interaction between participants and the radio DJ. This - in particular - was a feature that the audience seemed to value."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "24ce7aa0-1c49-443f-87c4-7cfd487c8558",
      "reading_order": 6,
      "page_indices": [
        8
      ],
      "representation": {
        "markdown": "Our initial findings from these trials showed that the livestreaming of volumetric capture into a 3D game engine environment allows for elements of interaction with a virtual audience which is not possible via other mediums. The level of interaction from the performer was very dependent on their comfort with the technology and being briefed sufficiently, though dance moves and emotes played an important role in making this interaction feel two-way and dynamic between attendees and performer. While we are yet to test in-experience voice or text communication, Discord proved a powerful way for users to interact with each other, build anticipation and share their experience of the live experience together. Re-posting some of these messages into the in-app messaging solution also made the experience feel more alive and dynamic for those not actively participating in the Discord chat. Scaled interaction was made challenging as a result of the 30-person room limitation applicable within our trials meaning some participants found themselves in less trafficked rooms, and only one room was visible to the performer."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "FIGURE",
      "id": "4723f8c1-df97-49b7-899e-b0c7ce220f71",
      "reading_order": 0,
      "sub_type": "LOGO",
      "title": "",
      "representation": {
        "markdown": "![Title: IBC2024\nThe image shows the logo for IBC2024, which appears to be a conference or event related to the broadcasting or media industry.](./4723f8c1-df97-49b7-899e-b0c7ce220f71.png)\n\nib\n\nIBC2024"
      },
      "summary": "Title: IBC2024\nThe image shows the logo for IBC2024, which appears to be a conference or event related to the broadcasting or media industry."
    },
    {
      "type": "FIGURE",
      "id": "53724fd6-8624-4e9f-b204-d0b89ad77f7e",
      "reading_order": 1,
      "sub_type": "IMAGE",
      "title": "Figure 2 - Capture of a performance, with monitor to show view of audience (left); volumetric render of captured performer being watched by audience avatars (right).",
      "representation": {
        "markdown": "![Title: Volumetric Capture and Delivery System\nThe image shows a capture of a performance, with a monitor on the left displaying the view of the audience, and a volumetric render of the captured performer being watched by audience avatars on the right. The image is accompanied by text describing the technical details of the volumetric capture and delivery system, including the bitrate and latency metrics measured during the recording and the compression settings used for the &quot;low bitrate&quot; and &quot;high bitrate&quot; streams.](./53724fd6-8624-4e9f-b204-d0b89ad77f7e.png)\n\nFigure 2 - Capture of a performance, with monitor to show view of audience (left); volumetric render of captured performer being watched by audience avatars (right)."
      },
      "summary": "Title: Volumetric Capture and Delivery System\nThe image shows a capture of a performance, with a monitor on the left displaying the view of the audience, and a volumetric render of the captured performer being watched by audience avatars on the right. The image is accompanied by text describing the technical details of the volumetric capture and delivery system, including the bitrate and latency metrics measured during the recording and the compression settings used for the \"low bitrate\" and \"high bitrate\" streams."
    },
    {
      "type": "TEXT",
      "id": "e888370c-2812-4995-93de-0215e3a387e0",
      "reading_order": 2,
      "page_indices": [
        9
      ],
      "representation": {
        "markdown": "The volumetric capture and delivery system is able to measure bitrate and latency metrics while recording. During the browser-based rendering event we compressed the volumetric content at two bitrates. The \"low bitrate\" was produced to be consumed by browsers and the \"high bitrate\" to be consumed by the team producing the common stream, where there were no bandwidth limitations. Of the 3 types of streamed content (audio, mesh geometry, and texture), for the \"low bitrate\" stream the audio was \\~1mbit/s, the mesh geometry \\~8mbit/s and the texture data \\~10mbit/s. This resulted in a total bandwidth requirement of \\~20mbit/s. For the \"high bitrate\" stream, the equivalent values are \\~1mbit/s, \\~24mbit/s and \\~20mbit/s for a total of 45mbit/s."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "8a504212-1426-451c-8f94-c73af55fa9fd",
      "reading_order": 3,
      "page_indices": [
        9
      ],
      "representation": {
        "markdown": "For the events which were handled via pixel streaming, we ran with a single bitrate. This was set quite high as there were no bandwidth limitations on the cloud infrastructure handling the pixel streaming. The values were \\~1mbit/s for audio, \\~30mbit/s for mesh geometry and \\~35mbit/s for the texture data."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "d4293ca5-b189-4871-b446-12cce98c6736",
      "reading_order": 4,
      "page_indices": [
        9
      ],
      "representation": {
        "markdown": "The latency of the pipeline was measured in a number of places. The total in-rig latency - which measures the on-site processing time - was around 100ms. Once delivered to the cloud, the compression added around \\~1000ms and the segment processing and delivery to the CDN incurred a further \\~3500ms of latency. In total this adds up to \\~4600ms of latency. This gives plenty of head room to allow the clients to playback at a consistent 10- second latency. Although this sounds high, it can largely be mitigated by briefing the performer to not expect instant reactions from the attendees. Interactions between the attendees themselves had a much lower latency."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "7a847fde-93d7-4b79-9ded-90aafd7ccda9",
      "reading_order": 5,
      "page_indices": [
        9
      ],
      "representation": {
        "markdown": "## 2D / 2.5D Capture"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "255663ec-95ec-4a7a-aa38-fa5514cb05e6",
      "reading_order": 6,
      "page_indices": [
        9
      ],
      "representation": {
        "markdown": "We conducted a multi-day test shoot with four music acts at Production Park studios, to trial our technology in a setting resembling a live concert/festival. We live streamed into our virtual nightclub in Unreal using our three locked-off cameras approach (on the left, middle and right-hand sides of the stage)."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "dbe1f8f8-84d3-4a81-bb87-639d1f79e5d1",
      "reading_order": 7,
      "page_indices": [
        9
      ],
      "representation": {
        "markdown": "We evaluated a number of streaming options for delivering live video into an Unreal application and settled on the Millicast plugin \\[Dolby (5)\\]. This provided the desired functionality, a managed streaming service, and compatibility with existing applications built by our partners in the MAX-R project in which this work was carried out. Millicast provides support for HD resolution video with low latency for live video. This meant that as we were"
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "aa5aa846-7494-4833-bcf8-2204aa13e104",
      "reading_order": 1,
      "page_indices": [
        10
      ],
      "representation": {
        "markdown": "using a quad video incorporating multiple camera views then each camera view was reduced to 960x540 pixels. However, for the user trials for testing camera layouts, we used 4K video files rather than a live stream, with each camera view ending up as 1920x1080 pixels. A fork of the OBS-Studio video streaming software was chosen to provide the Web- RTC video stream required by Millicast."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "79cd3c8c-1e03-42ab-90c1-ec976f53120f",
      "reading_order": 2,
      "page_indices": [
        10
      ],
      "representation": {
        "markdown": "Streaming the multi-cam setup to our game environment enabled users joining as avatars to see the most relevant point of view for the position they moved to. Although this approach provides a more limited range of viewpoints in the 3D spaces than volumetric capture, it enables capturing performances with challenging staging conditions that could interfere with volumetric capture. For example, our test shoot included theatrical flashing stage lighting, and dynamic graphics displayed on an LED wall backdrop, with some haze. Dynamic virtual stage lighting was triggered by lighting in the videos of the performers to facilitate a better visual connection between the real performance and virtual space. Two of the music acts also made use of the entire stage area which was significantly larger than the volumetric capture approach discussed above can easily handle."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "233a1e9b-4f1b-4a33-be20-123021e35b9b",
      "reading_order": 3,
      "page_indices": [
        10
      ],
      "representation": {
        "markdown": "We conducted an online study using this set up, to explore potential preferences for the following viewing conditions for the performances:"
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "b851fc80-9c3a-492a-b8ba-0ec8ceb2a3f7",
      "reading_order": 4,
      "page_indices": [
        10
      ],
      "representation": {
        "markdown": "- C1: Single centre camera view only. This provides seamless viewing within the virtual venue, but skewed perspectives to either side of the stage\n- C2: Multi-cam view with 3 cameras. This provides more accurate viewing perspectives from the sides, and thus potentially more sense of depth, but (currently) at the cost of a visual glitch when the camera plane updates in response to the avatar's position in the room.\n- C3: Multi-cam view as in C2, but with flashing stage light distractions the point of camera change as the player avatar's viewpoint changes at from one camera to another."
      },
      "sub_type": "LIST"
    },
    {
      "type": "TEXT",
      "id": "5b6bdb32-ab1c-49a9-b6fd-589a3049cefd",
      "reading_order": 5,
      "page_indices": [
        10
      ],
      "representation": {
        "markdown": "Consistent viewing angles and duration for each condition were provided by the point-of- view from a player avatar following a specified path around the virtual venue. We used video clips of the experience recorded in the gaming engine for each condition for 3 of our artists to open the study to those who are not familiar with 3D navigation in games. Participants were instructed to view each video and respond to whether the appearance of the musical performance was satisfactory. We also asked whether the lighting in the scene enhanced the viewing experience since we had used lights to distract from camera changes. We had also received indications from some pilot test participants that lighting might impact perceived levels of realism, and engagement. In a third questions, we asked about interest in joining an interactive experience based on the examples shown. Figure 3 shows some example views from the study."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "FIGURE",
      "id": "e958bf35-13e1-48e8-a167-341065879225",
      "reading_order": 1,
      "sub_type": "IMAGE",
      "title": "Figure 3 - Example views from the study. The top row (a), (b), (c) show single camera views (artist KDYN) from either side of stage and centre. The middle row (d), (e), (f) show multi-cam views (artist TWST). And the bottom row (g), (h), (i), show the multi-cam views (artist Badliana) when the distraction lights are triggered (g), (i) as the player avatar point of view changes when moving to the side.",
      "representation": {
        "markdown": "![Title: Example views from the study\nThe image shows multiple views of a stage setup with various lighting and camera configurations. The top row (a, b, c) shows single camera views from either side of the stage and the center. The middle row (d, e, f) shows multi-cam views, and the bottom row (g, h, i) shows the multi-cam views when the distraction lights are triggered, with the player avatar point of view changing when moving to the side.](./e958bf35-13e1-48e8-a167-341065879225.png)\n\na)\nb)\nc)\n\nd)\ne)\nf)\n\ng)\nh)\ni)\n\nFigure 3 - Example views from the study. The top row (a), (b), (c) show single camera views (artist KDYN) from either side of stage and centre. The middle row (d), (e), (f) show multi-cam views (artist TWST). And the bottom row (g), (h), (i), show the multi-cam views (artist Badliana) when the distraction lights are triggered (g), (i) as the player avatar point of view changes when moving to the side."
      },
      "summary": "Title: Example views from the study\nThe image shows multiple views of a stage setup with various lighting and camera configurations. The top row (a, b, c) shows single camera views from either side of the stage and the center. The middle row (d, e, f) shows multi-cam views, and the bottom row (g, h, i) shows the multi-cam views when the distraction lights are triggered, with the player avatar point of view changing when moving to the side."
    },
    {
      "type": "TEXT",
      "id": "33a4235c-9259-463f-af83-6d5f19509715",
      "reading_order": 2,
      "page_indices": [
        11
      ],
      "representation": {
        "markdown": "As shown in Figure 4, results from our 75 respondents reflect that when considering C3 (multi-cam with distractions) 55.2% of respondents agreed/strongly agreed with our three propositions regarding satisfaction of viewing angles of the performer, the lighting enhancing the experience, and interest in joining an experience based on the example. 27.7% disagreed/strongly disagreed, with the remaining 17% neutral. C2 produced similar results with 53.6% positive (agreed/strongly agreed), 28.3% more negative (disagreed /strongly disagreed), and the remaining 18% were neutral. C1 (the single-cam) option had similar neutral responses of 18%, with 50.4% agreeing/strongly agreeing, and 31.5% disagreeing/strongly disagreeing."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "FIGURE",
      "id": "5281a84a-4c9a-456d-bec4-a5cbae538367",
      "reading_order": 1,
      "sub_type": "CHART",
      "title": "Figure 4 - Overall study results, showing percentage of respondents' agreement/disagreement levels to our propositions across our three conditions",
      "representation": {
        "markdown": "![Title: Overall Study Results: Agreement Levels Across Three Camera Conditions\nThe chart displays agreement/disagreement levels for three different camera conditions: multi-cam with distractions, multi-cam, and single-cam. Each condition shows percentage distributions across five response categories ranging from &quot;Strongly Disagree&quot; to &quot;Strongly Agree&quot;. Across all three conditions, the &quot;Agree&quot; category received the highest percentage (34.1-37.9%), followed by &quot;Disagree&quot; (20.1-22.8%). The &quot;Neither Agree Nor Disagree&quot; category consistently showed around 17-18% responses. &quot;Strongly Agree&quot; and &quot;Strongly Disagree&quot; categories received the lowest percentages, with &quot;Strongly Disagree&quot; ranging from 7.6-8.7% and &quot;Strongly Agree&quot; ranging from 16.0-17.3%.\n&lt;csv_data&gt;\nCondition,Strongly Disagree,Disagree,Neither Agree Nor Disagree,Agree,Strongly Agree\nMulti-cam with distractions,7.6,20.1,17.0,37.9,17.3\nMulti-cam,7.7,20.6,18.0,37.6,16.0\nSingle-cam,8.7,22.8,18.0,34.1,16.3\n&lt;/csv_data&gt;](./5281a84a-4c9a-456d-bec4-a5cbae538367.png)\n\nMulti-cam\n with distractions\n7.6%\n20.1%\n17%\n37.9%\n17.3%\n\nMulti-cam\n7.7%\n20.6%\n18%\n37.6%\n16.0%\n\nSingle-cam\n8.7%\n22.8%\n18%\n34.1%\n16.3%\n\nStrongly\nNeither Agree\nStrongly\n Disagree\nDisagree\nNor Disagree\nAgree\nAgree\n\nFigure 4 - Overall study results, showing percentage of respondents&#x27; agreement/disagreement levels to our propositions across our three conditions"
      },
      "summary": "Title: Overall Study Results: Agreement Levels Across Three Camera Conditions\nThe chart displays agreement/disagreement levels for three different camera conditions: multi-cam with distractions, multi-cam, and single-cam. Each condition shows percentage distributions across five response categories ranging from \"Strongly Disagree\" to \"Strongly Agree\". Across all three conditions, the \"Agree\" category received the highest percentage (34.1-37.9%), followed by \"Disagree\" (20.1-22.8%). The \"Neither Agree Nor Disagree\" category consistently showed around 17-18% responses. \"Strongly Agree\" and \"Strongly Disagree\" categories received the lowest percentages, with \"Strongly Disagree\" ranging from 7.6-8.7% and \"Strongly Agree\" ranging from 16.0-17.3%.\n<csv_data>\nCondition,Strongly Disagree,Disagree,Neither Agree Nor Disagree,Agree,Strongly Agree\nMulti-cam with distractions,7.6,20.1,17.0,37.9,17.3\nMulti-cam,7.7,20.6,18.0,37.6,16.0\nSingle-cam,8.7,22.8,18.0,34.1,16.3\n</csv_data>"
    },
    {
      "type": "TEXT",
      "id": "8c84f22d-e293-416d-8b33-cebc074f5352",
      "reading_order": 2,
      "page_indices": [
        12
      ],
      "representation": {
        "markdown": "We also asked respondents at the end of the study to prioritise up to 3 factors about the experience that they would like to have improved. The most popular options were to make the 3D performer seem more a part of the 3D world (31%), with some supporting comments suggesting for integration of lighting from real-world to virtual world (e.g. \"match colours of performance with in 3D space light colour\"). 13% would also like to have the performer fill up more of the stage, and 9% would like to have more music genres."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "4f2ff7bd-e86a-43cf-89f3-aa0682619520",
      "reading_order": 3,
      "page_indices": [
        12
      ],
      "representation": {
        "markdown": "We deem the small differences between C1 (single cam) and the multi-cam conditions worth further exploration in an interactive version of the study, where participants can freely explore the space and their own viewing angles. The qualitative feedback also indicates that the relationship between the lighting in the virtual space and in the real-world performance is worth further exploration."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "b055a78f-af8c-4276-a622-fd5982c8d0d7",
      "reading_order": 4,
      "page_indices": [
        12
      ],
      "representation": {
        "markdown": "## Audience interactivity"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "1fab0c7f-761a-436f-a6ba-094732d8aeff",
      "reading_order": 5,
      "page_indices": [
        12
      ],
      "representation": {
        "markdown": "As mentioned previously, audience experience and interactivity came out loud and clear in audience feedback from our surveys and trials. There were calls for more distinctive storytelling in the event, including countdowns to the event beginning, more dynamism/sense of progression during the event and clearer follow-on journeys at the end. Customisable avatars were a popular addition, though there were issues for some with the controls, especially amongst those less familiar with game environments."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "74532354-98e6-4955-aa6f-896500bc58c0",
      "reading_order": 6,
      "page_indices": [
        12
      ],
      "representation": {
        "markdown": "In terms of audience participation, an ability to communicate between users was seen as key, with this working particularly well when the radio DJ hosting the broadcast was able to seamlessly link the immersive experience, radio show and Discord with shout outs and interactions."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "c27467dd-dcb6-498c-8bfc-33a001c24c9c",
      "reading_order": 7,
      "page_indices": [
        12
      ],
      "representation": {
        "markdown": "## CONCLUSION"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "0b436d78-32b8-4d97-b0cb-433e16224fb9",
      "reading_order": 8,
      "page_indices": [
        12
      ],
      "representation": {
        "markdown": "This paper has presented a summary of work in progress to evaluate technical approaches and audience feedback for presentation of live music events in a game-like multi-user shared virtual space. Further developments and trials are currently taking place, and we expect to have new results to present by September 2024."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "199db3c4-f5ef-4bfa-8e42-d20907871f7f",
      "reading_order": 1,
      "page_indices": [
        13
      ],
      "representation": {
        "markdown": "From the results to date, it is clear that good visual representation is important, and this includes the appearance and lighting of the virtual venue as well as the appearance of the artist themselves. There are also many other aspects that need to be considered in order to provide a good experience for users, including a clear narrative for the event in terms of how it starts and ends, and support for audience participation including communication with each other and with the artist."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "eda7801f-0e46-46c4-b584-895df53c540b",
      "reading_order": 2,
      "page_indices": [
        13
      ],
      "representation": {
        "markdown": "Over the coming months we intend to advance the volumetric capture system in two ways. First, we will look at improving the compression rates of our volumetric video. By better leveraging the temporal consistency of the data, we believe we can maintain existing visual quality at lower bandwidth, and increase the accessibility of the events. Second, we are looking at moving some of our existing on-site processing to the cloud. This will allow us to increase the reliability of the capture system by running multiple redundant pipelines, as well as relieve some of the resource constraints we have due to running on-premises hardware."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "c74029eb-53d8-4936-9e77-effc887b1c48",
      "reading_order": 3,
      "page_indices": [
        13
      ],
      "representation": {
        "markdown": "We also plan to further study the 2D video approach, in particular for larger events, with experiments planned using content from a music festival."
      },
      "sub_type": "PARAGRAPH"
    },
    {
      "type": "TEXT",
      "id": "1356dd5e-ee48-49c5-a8bb-1493be4a2e0c",
      "reading_order": 4,
      "page_indices": [
        13
      ],
      "representation": {
        "markdown": "## REFERENCES"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "19f5cdb5-1363-463b-a8f2-ce0eef4a9c05",
      "reading_order": 5,
      "page_indices": [
        13
      ],
      "representation": {
        "markdown": "- 1. Rivera, F., Thomas, G. et al. 2023. D2.2 Report on Scenario Use-Cases for Pipelines using Virtual and XR Production (see Annex 1). MAX-R project public deliverable available at https://www.max-r.eu/documents\n- 2. Orts-Escolano, S. Et al. 2016. Holoportation: Virtual 3D Teleportation in Real-time. UIST '16: Proceedings of the 29th Annual Symposium on User Interface Software and Technology, pp. 741-754. https://dl.acm.org/doi/abs/10.1145/2984511.2984517\n- 3. Grau, O., Price, M., Thomas, G. 2002. Use of 3-D Techniques for Virtual Production. BBC R&D White Paper WHP033. https://www.bbc.co.uk/rd/publications/whitepaper033\n- 4. Grow, K. 2019. Live After Death: Inside Music's Booming New Hologram Touring Industry. Rolling Stone Magazine, Sept. 2019. https://www.rollingstone.com/music/music- features/hologram-tours-roy-orbison-frank-zappa-whitney-houston-873399.\n- 5. Dolby. 2024. Dolby Real-time Streaming Player Plugin for Unreal Engine. https://docs.dolby.io/streaming-apis/docs/unreal-player-plugin"
      },
      "sub_type": "LIST"
    },
    {
      "type": "TEXT",
      "id": "5a936329-7fb5-419e-ad47-1363be2f22a9",
      "reading_order": 6,
      "page_indices": [
        13
      ],
      "representation": {
        "markdown": "## ACKNOWLEDGEMENTS"
      },
      "sub_type": "SECTION_HEADER"
    },
    {
      "type": "TEXT",
      "id": "c9485c8c-6bb2-4150-b531-378d61046223",
      "reading_order": 7,
      "page_indices": [
        13
      ],
      "representation": {
        "markdown": "The authors would like to thank the music artists who took part in these trials, including Badliana, KDYN, TWST and Sam Tompkins. They would also like to thank staff at Production Park, UK and students at the Academy of Live Technology for help with the 2D test shoot. Some of the work reported here was carried out as a part of the MAX-R project, which is co-funded by Innovate UK and the European Union's Horizon Europe Research & Innovation Programme under Grant Agreement No. 101070072."
      },
      "sub_type": "PARAGRAPH"
    }
  ]
}