Expert Group on
Shared Tools


October 2020


VTL-Insee


Franck Cotton (franck.cotton@insee.fr)
Nicolas Laval (nicolas.laval@insee.fr)

Slides

Context

Insee's strategy


  • Align on international standards (even imperfect ones)
  • Active metadata all along the statistical process
  • Embrace open data and open source

As a result


  • Complete overhaul of our survey data acquisition system
  • Work started recently on the dissemination phase

Why VTL?


  • After focusing on information models
  • Switching to process models and meta-information
  • Need to represent processes at the logical level
  • VTL strategic choice in the ESS

Guidelines

Priorities (descending order)


  • Edit code (highlighting, suggestion...)
  • Validation from simple to complex
  • Transformations from simple to complex

Development principles


  • Grammar-based
  • Open-source
  • Driven by use cases

Tools

Overview


  • VTL Tools: JavaScript editor and engine
  • Trevas: Java engine
  • Darkr: R package around Trevas
  • Utility tools (tree visualizer...)

Focus on Trevas (1)


Inspired by Java VTL (SSB):
  • Java VTL 1.1 engine
  • Focused mainly on transformations

Focus on Trevas (2)


Trevas features:

  • Java VTL 2.0 (latest) engine
  • Java JSR 223 conformance
  • Mapper implementations to bind VTL Dataset
  • Good coverage of validation
  • Work starting on transformations
  • Based on Antlr and Apache Spark

Use cases

Overview


  • Crabe: allocation of survey zones to field agents
  • Data validation for households surveys
  • Control of administrative data (multiple uses)

Crabe PoC


  • Reusing Eurostat editor in React app to edit VTL script
  • Calling Trevas engine in Java API:
    • input: script & data
    • output: transformed data
poc-crabe

Thoughts


  • Eurostat should offer editor as a standard tool for the ESS
  • Make VTL an ESS standard
  • Create an ESS VTL users group
  • Improve standardization process (version control, issues, reviews, etc.)
  • Optimize grammar
  • Agree on a common test suite (compatibility kit...)

Thank you




Questions?