Data Source Transformation

Once a decision has been made to include a data source in the product, it must be transformed to fit the WellLine data model. Three important ideas to keep in mind:

  1. Events have two required fields: if they are not included, the event will produce an error when uploaded

  2. Each event must be on its own row, in JSONL format, if uploading through the WellLine command line interface (as this walkthrough does)

  3. Any method can be used to transform data

Other sections of the WellLine documentation provide detailed information on each field of the data model, including definitions, formatting, and examples of fully-formed events.

The following steps use Alteryx as the transformation / ETL application. All tools described below exist as "template containers" in Alteryx and can be sent to interested users of the product upon request.

Keep in mind that these steps are simply one example of processing a single data source.

Transformation is performed in the following steps:

  1. Choose source data to transform (see Data Sources Determination)

  2. Perform cleanup on the source data

  3. Create id field

  4. Turn coding hierarchy (if available) into code and subcode values

  5. Choose structured terminology (e.g. assets, names) to turn into entities

  6. Choose structured numeric data (e.g. depths, tests) to turn into measures

  7. Populate as many WellLine event model fields as the source data permits

  8. Ensure each row from the event model population contains required fields

  9. Output fully-formed JSONL events

  10. Use chosen entities to create entity values + output them

  11. Use code/subcode values to create eventTypeGroup and eventType values + output them

  12. Use chosen numeric data to create measures values + output them

Transformed data can then be uploaded (see Transformed Data Upload), after which point it will display in the WellLine Knowledge Application and be accessible by the WellLine TimeLine Service GraphQL API.

Transformation to Events

While this walkthrough uses only one input, a single type of event you are producing may come from multiple inputs, e.g. multiple tables from a single database. In this case, it may be helpful to use one template/pipeline for each distinct type of input.

A scenario for illustration:

  1. A database contains four tables: Wells, Observations, Problems, and Lessons_Learned

    • Wells has referential information that needs to be joined to the other tables

    • Observations has daily reports about work performed at a wellsite

    • Problems has a correlation to Observations when there was non-productive time

    • Lessons_Learned has a correlation to Problems when an investigation was performed

  2. A determination is made: observations, problems, and lessons learned would ideally be displayed as individual events in WellLine

  3. The necessary joins are made between Wells and the other tables, to incorporate needed information (e.g. well name)

  4. To create separate outputs for the remaining tables, each table is run through its own distinct series of tools

  5. In the output step, each is output with a unique name, e.g. "Observation_001.jsonl", "Problem_001.jsonl", "LessonLearned_001.jsonl"

Transformation - Events - Data Cleanup

A logical first step is to perform some data cleanup. This is needed to ensure, for example, that newlines are removed from unstructured text.

The second step is to ensure uniqueness and expected completeness in the data, so as to avoid duplication in the output. This means performing some quality-assurance-style checks on the input.

Example of Data Cleanup, Unique, and Filter Tools in Alteryx, with notes on their intended use.

The idea of "essential data" in the Filter Tool is specific to each source and generally includes one or more portions of the source that should pass a check (e.g. "not null"). The filter is also useful for making sure each input contains information required for inclusion in WellLine, e.g. a date.
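
The same cleanup can be performed with any tool. As one illustration, here is a minimal Python/pandas sketch (not part of the WellLine templates; the file name and the columns REPORT_TEXT and REPORT_DATE are hypothetical) that removes newlines from unstructured text, drops duplicate rows, and filters out rows missing essential data:

import pandas as pd

# Hypothetical source extract; file and column names are illustrative only.
df = pd.read_csv("source_extract.csv")

# Remove newlines/carriage returns from unstructured text so each event can
# later be written as a single JSONL row.
df["REPORT_TEXT"] = (
    df["REPORT_TEXT"]
    .astype(str)
    .str.replace(r"[\r\n]+", " ", regex=True)
    .str.strip()
)

# Ensure uniqueness: drop exact duplicate rows.
df = df.drop_duplicates()

# Filter on "essential data": keep only rows that contain the information
# required for WellLine inclusion, e.g. a date and some text.
df = df[df["REPORT_DATE"].notna() & (df["REPORT_TEXT"] != "")]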

Transformation - Events - ID creation

The id field is used by several future tools and containers, and is a unique identifier for each event.

The Formula Tool used to create the ID field.

This formula creates one field, id: an MD5 hash generated from a combination of source columns.

The combination of input data used to create the id should be unique.

If duplicate id values are generated, they will either cause errors during upload or they will overwrite previously-uploaded data.
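
If you are transforming data with a script instead of Alteryx, the same idea is a few lines of Python. This is a sketch only; which columns to combine (here a well identifier, a report datetime, and a row sequence) is an assumption and should be chosen so that the combination is unique in your source:

import hashlib

def make_event_id(*parts):
    """Create a deterministic id by MD5-hashing a combination of source values."""
    combined = "|".join(str(p) for p in parts)
    return hashlib.md5(combined.encode("utf-8")).hexdigest()

# Hypothetical example: well identifier + report datetime + row sequence.
event_id = make_event_id("NO 15/9-F-1 C", "2008-03-01T06:00", 42)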

Transformation - Metadata - Event Type Creation

Fields created here are used in the Event Types and Event Type Groups section, below, and also in the creation of the typeId field (see Populating the Data Model).

The creation of values in this formula is not strictly required. However, for the Event Filters in the WellLine Knowledge Application to function correctly, coding hierarchy values must exist in the "metadata" files.

This portion of the data processing greatly simplifies the creation of eventTypeGroup and eventType values, which are used in their corresponding "metadata" files.

The Formula Tool used to create event classification fields.

This formula creates two fields: code and subcode.

code

Values should be the highest level of your organization's coding hierarchy for the given source system.

Example where source system had field MAIN ACTIVITY:

If the field is populated, the value is normalized and cleaned; otherwise a default value is used.
subcode

Values should be the lowest level of your organization's coding hierarchy for the given source system.

Example where source system had field SUB_ACTIVITY:

If the field is populated, the value is normalized and cleaned; otherwise a default value is used.

For coding hierarchies larger than two levels, contact WellLine support for advice and best practices.
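
For readers not using Alteryx, the normalize-and-default behaviour described above can be sketched in Python. The field names mirror the MAIN ACTIVITY / SUB_ACTIVITY example; the exact normalization rules (trimming, collapsing whitespace, lowercasing) and the default values are assumptions:

def normalize(value, default):
    """Trim, collapse whitespace, and lowercase a coding value, or fall back to a default."""
    if value is None or str(value).strip() == "":
        return default
    return " ".join(str(value).split()).lower()

row = {"MAIN ACTIVITY": " Drilling ", "SUB_ACTIVITY": None}

code = normalize(row.get("MAIN ACTIVITY"), default="unknown activity")       # "drilling"
subcode = normalize(row.get("SUB_ACTIVITY"), default="unknown sub-activity") # default used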

Transformation - Metadata - Measures

Fields created here are used in the Measures section, below.

The creation of values in this formula is not strictly required. However, for measures to appear properly in the Histogram measures dropdown of WellLine's Knowledge Application, a representation of each measure must exist in the "metadata" files.

This portion of data processing greatly simplifies the creation of measure "metadata" values, which are needed in the Measures section, below.

The Formula Tool used to create measure information fields.

Measures can come from many areas of a source system. Common measures include:

  • Depths, including measured (MD) and total vertical (TVD)

  • Production metrics, including oil and natural gas

  • Pressure tests

  • Hole sizes

These fields are used to create the metadata required for proper measure display and querying. The later Populating the Data Model step creates their numerical representations.

Notice the specific formatting of the field, which is necessary for future processing.

The name of each field created in this Formula Tool is important: it becomes the value under which the measure is shown in the Histogram measures dropdown and referenced when querying. For example, the value "depthMD" (from the above screenshot) shows up in the WellLine interface as follows:

Example of a measure value 'depthMD' as shown in the Histogram measure dropdown
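
As a rough Python illustration of the kind of information gathered per measure at this stage (the dictionary layout below is an assumption that simply mirrors the measureTypes "metadata" fields shown later; only the field/measure names come from the examples in this walkthrough):

# The key (e.g. "depthMD") is the field name created in the Formula Tool; it is
# what later appears in the Histogram measures dropdown and as the measure typeId.
measure_helpers = {
    "depthMD": {"name": "Depth MD", "units": "m", "description": "Measured depth, in meters"},
    "depthTVD": {"name": "Depth TVD", "units": "m", "description": "Total vertical depth, in meters"},
}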

Transformation - Events - Entity Creation

Fields created here are used in the Transformation - Entities section, and also in the creation of referenceEntityIds and subjectEntityIds fields (see Populating the Data Model).

The creation of values in this formula is not strictly required. However, for entities to appear properly in the WellLine Knowledge Application's autocomplete search results as well as ConnectionViewer, a unique representation of each entity must exist in the system.

This portion of data processing greatly simplifies the creation of entity typeId and name values, which are needed in the Transformation - Entities section.

The Formula Tool used to create entity "helper" fields.

Entities can come from many areas of source systems. Common entities include:

  • Well names or IDs

  • Well sections

  • Rig names

  • Formation names

  • Fluid additives (e.g. types of mud)

The most important thing to remember when creating fields in this tool: the name of the field will be the typeId of the entity, while the value in the field will be the name of the entity. For more information, see Transformation - Entities.

Entities are grouped by their typeId values. These typeId values appear as dropdown options in ConnectionViewer.

Here are some examples of fields in this tool: Well, Rig, and Mud Type.

Well
After Entity processing, this field will appear as "well.<well name>"

These values appear in ConnectionViewer like the following:

"Well" is the group with 20 unique entities in it; "NO 15/9-F-1 C" is an entity found in 73 events.

Notice that the name of the field, "well", becomes the ConnectionViewer group, "Well". The individual [WELL_ID] values then appear as entities in this group.

Rig
After Entity processing, this field will appear as "rig.<rig name>"

These values appear in ConnectionViewer like the following:

"Rig" is the group with 3 unique entities in it; "BYFORD DOLPHIN" is an entity found in 1 event.

Notice that the name of the field, "rig", becomes the ConnectionViewer group, "Rig". The individual [RIG] values then appear as entities in this group.

Mud Type
After Entity processing, this field will appear as "mudType.<mud type name>"

These values appear in ConnectionViewer like the following:

"MudType" is the group with 10 unique entities in it; "Hpwbm"is an entity found in 51 events.

Notice that the name of the field, "mudType", becomes the ConnectionViewer group, "MudType". The individual [mudType] values then appear as entities in this group.
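
The convention can be summarized in a short Python sketch (source column names are hypothetical): the helper field's name becomes the entity typeId, its value becomes the entity name, and the two are later combined as "<typeId>.<name>":

row = {"WELL_ID": "NO 15/9-F-1 C", "RIG": "BYFORD DOLPHIN", "MUD_TYPE": "Hpwbm"}

# Field name -> entity typeId; field value -> entity name.
entity_helpers = {
    "well": row.get("WELL_ID"),
    "rig": row.get("RIG"),
    "mudType": row.get("MUD_TYPE"),
}

# After Entity processing these become ids such as "well.NO 15/9-F-1 C",
# "rig.BYFORD DOLPHIN", and "mudType.Hpwbm".
entity_ids = [f"{type_id}.{name}" for type_id, name in entity_helpers.items() if name]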

Transformation - Events - Populating the Data Model

The next step is encompassed within a single tool which allows for the creation of new fields/columns of data. This Formula Tool will end up containing properly-formatted WellLine data model fields populated by the source data and sometimes populated by "helper" fields from previous sections.

The Formula Tool used to create fields adhering to the WellLine data model.

The new data being created consists of transformations of one or more existing source data columns into a new field that fits the naming conventions and formatting requirements of a single data model field.

Below are some examples: startedOn, content, typeId, subjectEntityIds, referenceEntityIds, and measures.

startedOn

The startedOn field stores the start datetime for the event, and is a required field. Here, the original start datetime is converted to required ISO formatting:

Notice the JSON formatting
content

The content field stores free text and is a required field. This field is mined by WellLine's entity extraction processing, and it displays in the EventStream.

Here, two source data fields containing unstructured text are combined, and characters are removed which would cause improper JSONL formatting:
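
A rough Python equivalent of the same step (the source column names are hypothetical); note that json.dumps takes care of escaping quotes and other characters that would otherwise break the JSONL row:

import json

row = {"ACTIVITY_TEXT": "Ran 9 5/8\" casing.\nNo problems.", "REMARKS": "Waited on weather."}

# Combine the two unstructured text fields and strip newlines.
content = " ".join(
    part.replace("\r", " ").replace("\n", " ").strip()
    for part in (row.get("ACTIVITY_TEXT", ""), row.get("REMARKS", ""))
    if part
)

print(json.dumps({"content": content}))  # quotes and special characters are escaped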

typeId

The typeId field references the same data found in the eventType "metadata" file. If previous steps in this walkthrough are completed, then code and subcode should now be available and should be inserted exactly as follows:

Notice the JSON formatting
subjectEntityIds

The subjectEntityIds field is used to store the primary subject(s) of the event. This is generally the well ID, well name, or other concrete well-related entities.

If previous steps in the walkthrough have been followed, then any value here should have been created in the previous Entity Creation step:

Notice the JSON formatting

Take note of the formatting: the values will end up as "well.<well name>".

referenceEntityIds

The referenceEntityIds field is used to store entities which serve as "references" to the event. These are entities such as rig names, mud additives, personnel names, and more.

All entities created in the Entity Creation step that are not contained in subjectEntityIds should be given their own IIF statement here:

Notice that JSON formatting surrounds both IIF statements

Take note of the formatting: the values will end up as "rig.<rig name>" and "mudType.<mud type name>".
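
Outside Alteryx, each IIF statement is simply a conditional append. A minimal sketch with hypothetical column names:

row = {"RIG": "BYFORD DOLPHIN", "MUD_TYPE": None}

referenceEntityIds = []
if row.get("RIG"):        # equivalent of the first IIF statement
    referenceEntityIds.append(f"rig.{row['RIG']}")
if row.get("MUD_TYPE"):   # equivalent of the second IIF statement
    referenceEntityIds.append(f"mudType.{row['MUD_TYPE']}")
# -> ["rig.BYFORD DOLPHIN"]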

measures

The measures field is used to store the measures of the event. This is generally depths, production metrics, or other physical quantities.

If previous steps in the walkthrough have been followed, then any value here should have associated "metadata" info created in the previous Metadata - Measures step:

Notice that JSON formatting surrounds both IIF statements

The 'typeId' values for each measure should match the field names provided in the Metadata - Measures step.

For detailed information on every field in the data model, see: Data Model.
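
To tie the fields together, here is a hedged end-to-end sketch of one event row assembled and serialized in Python. The field names follow the data model fields discussed above, but the source columns, the date format, and the per-measure layout are assumptions; always check your output against Examples of Complete Events:

import hashlib
import json
from datetime import datetime

row = {  # hypothetical source columns
    "WELL_ID": "NO 15/9-F-1 C",
    "RIG": "BYFORD DOLPHIN",
    "START": "01.03.2008 06:00",
    "TEXT": "Drilled 8 1/2\" hole to 3120.5 m MD.",
    "MAIN_ACTIVITY": "Activity",
    "SUB_ACTIVITY": "Mud Sampling",
    "DEPTH_MD": 3120.5,
}

event = {
    "id": hashlib.md5(f"{row['WELL_ID']}|{row['START']}".encode("utf-8")).hexdigest(),
    "startedOn": datetime.strptime(row["START"], "%d.%m.%Y %H:%M").isoformat(),  # ISO formatting
    "content": row["TEXT"].replace("\n", " ").strip(),
    # typeId should match an eventType id, e.g. lowercase "<code> - <subcode>".
    "typeId": f"{row['MAIN_ACTIVITY'].lower()} - {row['SUB_ACTIVITY'].lower()}",
    "subjectEntityIds": [f"well.{row['WELL_ID']}"],
    "referenceEntityIds": [f"rig.{row['RIG']}"],
    # The per-measure layout (typeId + value) is an assumption; the typeId must
    # match the field names from the Metadata - Measures step (e.g. "depthMD").
    "measures": [{"typeId": "depthMD", "value": row["DEPTH_MD"]}],
}

print(json.dumps(event))  # one fully-formed JSONL row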

Transformation - Events - Data Completeness Check

The final steps before output are to ensure each row output from the previous processing is unique and contains data in required fields. This is done with the following tools:

A Unique and Filter Tool inserted just before JSONL output.

The id field should be unique for every row, so a simple unique filter on this field should ensure uniqueness.

  • There are many reasons why rows may be duplicates at this point, including one-to-many joins and the combination of previously-separated input data. However, if many "duplicates" are being discovered at this stage, it may be worth double-checking the id field creation logic (from the Events - ID creation step) to ensure it is producing unique values as intended.

Transformation - Events - File Output

The final output of data is as one or more JSONL files. For fastest upload speeds we recommend limiting files to 10,000 rows.

This is done with the following tools:

Select Tool should limit output to just those rows that have been properly-formatted.

There is no "JSONL" extension output option in Alteryx, so instead the files are output with the below parameters. This is made possible by the formatting performed when Populating the Data Model.

A Delimiters value of \0 is equal to "no delimiters". The extension of the file should be "jsonl".

For examples of properly-formatted output rows, see Examples of Complete Events.

Remember that any process can be used to transform source data into events. As long as the final output matches the data model requirements, it will successfully upload and display in the WellLine interface.
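
For reference, a small Python sketch of the same output step: writing fully-formed event dictionaries as JSONL and splitting the output into files of at most 10,000 rows (the file name prefix simply mirrors the earlier "Observation_001.jsonl" example):

import json

def write_jsonl(events, prefix="Observation", chunk_size=10_000):
    """Write events as JSONL, limiting each file to chunk_size rows for faster upload."""
    for i in range(0, len(events), chunk_size):
        path = f"{prefix}_{i // chunk_size + 1:03d}.jsonl"
        with open(path, "w", encoding="utf-8") as f:
            for event in events[i:i + chunk_size]:
                f.write(json.dumps(event) + "\n")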

Transformation - Entities

Entities are another output of data processing and are necessary for the correct creation of the product's underlying Knowledge Graph. The graph is created naturally and automatically in the WellLine infrastructure by the combination of events and entities.

In Alteryx processing, all Entity Creation formula tool(s) should be connected to this container, which then sends them through a process which creates a separate "Entities.jsonl" file.

The desired output of this process is a single JSONL file that contains a unique set of entities from the transformation processing. Here is an example of rows from a properly-formatted output file:

'typeId' becomes a ConnectionViewer dropdown group; 'name' becomes a value in the dropdown.

The beginning of this process is as follows:

Then, a series of tools are used to transpose and deduplicate the existing data.

Data is transposed so each entity is its own row; empty rows are then filtered out and duplicates removed.

The data is then formatted to fit the data model for entities, where each entity can be represented as simply as:

{"typeId": "STRING", "name": "VALUE"}

The typeId value is used to group entities within ConnectionViewer; the name value becomes a selection within its group.

The tools which achieve this are:

Only 'typeId' and 'name' fields are required.

The Output tool parameters are the same as they were for events.
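
If you are producing entities with a script rather than the Alteryx container, the transpose/deduplicate/output steps reduce to a few lines of Python (the input rows below are hypothetical helper-field values):

import json

# Entity helper fields collected per event row: field name -> entity name.
rows = [
    {"well": "NO 15/9-F-1 C", "rig": "BYFORD DOLPHIN", "mudType": "Hpwbm"},
    {"well": "NO 15/9-F-1 C", "rig": None, "mudType": "Hpwbm"},
]

# Transpose to one (typeId, name) pair per entity, drop empties, deduplicate.
entities = {(type_id, name) for row in rows for type_id, name in row.items() if name}

with open("Entities.jsonl", "w", encoding="utf-8") as f:
    for type_id, name in sorted(entities):
        f.write(json.dumps({"typeId": type_id, "name": name}) + "\n")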

Event Types and Event Type Groups

The ability for WellLine to display hierarchical event types (in the Event Filter panel) relies upon "metadata" files which contain a JSONL definition of the hierarchy.

Data which populates the eventType and eventTypeGroup "metadata" files is gathered in the Transformation - Metadata - Event Type Creation section, then processed in the Alteryx container titled "Event hierarchy processing".

The output of this container can be uploaded with the WellLine CLI tool, specifically the eventTypeGroups and eventTypes commands.

To begin, each Transformation template/pipeline that produces an output should be combined:

In this example, two output pipelines are routed to a union and then grouped.

Event Type Groups

The code values are used to create eventTypeGroup values, which generally are the highest level of a coding system hierarchy. The following set of tools creates these values:

'code' values have one-to-many mappings, so a unique set is transformed with the Formula tool.

There are four fields created in this Formula tool:

  • id

    • The lowercase code value

  • name

    • The title case code value

  • description

    • A brief or general description of the value

  • groupId

    • For eventTypeGroups this is generally the value "all"

The output file from this processing looks like the following:

{"id": "activity", "name": "Activity", "description": "An event group from a public dataset.", "groupId": "all"}
{"id": "completion", "name": "Completion", "description": "An event group from a public dataset.", "groupId": "all"}

All eventTypeGroup values should eventually point to a groupId of "all", as in the above examples.

Event Types

The subcode values are used to create eventType values, which generally are the lowest level of a coding system hierarchy. The following set of tools creates these values:

Unique 'code' and 'subcode' combinations are transformed with the Formula tool.

There are four fields created in this Formula tool:

  • id

    • To ensure uniqueness, this is generally a combination of lowercase code and lowercase subcode values

    • NOTE: event model typeId values (created in the Event Type Creation step, above) must match one of these id values

  • name

    • The title case subcode value

  • description

    • A brief or general description of the value

  • groupId

    • The code value associated with this subcode

The output of this processing looks like the following:

{"id": "activity - mud sampling", "name": "Mud Sampling", "description": "An event from a public dataset.", "groupId": "activity"}
{"id": "completion - bop/wellhead equipment", "name": "Bop/Wellhead Equipment", "description": "An event from a public dataset.", "groupId": "completion"}

All eventType values must point to a defined eventTypeGroup value.

Finally, each path is output as a separate file, one for eventTypeGroups and one for eventTypes:
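
The same two outputs can be sketched in Python: build the unique groups from code values and the types from code/subcode pairs, then write each set to its own JSONL file. The file names and descriptions below are placeholders; the id, name, and groupId conventions follow the examples above:

import json

# Unique code/subcode pairs gathered from the transformation pipelines.
pairs = {("Activity", "Mud Sampling"), ("Completion", "Bop/Wellhead Equipment")}

with open("eventTypeGroups.jsonl", "w", encoding="utf-8") as f:
    for code in sorted({code for code, _ in pairs}):
        f.write(json.dumps({
            "id": code.lower(),
            "name": code.title(),
            "description": "An event group from a public dataset.",  # placeholder
            "groupId": "all",
        }) + "\n")

with open("eventTypes.jsonl", "w", encoding="utf-8") as f:
    for code, subcode in sorted(pairs):
        f.write(json.dumps({
            "id": f"{code.lower()} - {subcode.lower()}",
            "name": subcode.title(),
            "description": "An event from a public dataset.",  # placeholder
            "groupId": code.lower(),
        }) + "\n")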

Measures

The ability for WellLine to display time series data (overlaid in the Histogram) relies upon "metadata" files which contain a JSONL definition of the measures.

Data which populates the measures "metadata" files is gathered in the Transformation - Metadata - Measures section, then processed in the Alteryx container titled "measureTypes file creation".

The output of this container can be uploaded with the WellLine CLI tool, specifically the measureTypes command.

The desired output of this process is a single JSONL file that contains a set of information about each measure from the transformation processing. Here is an example of rows from a properly-formatted output file:

{"id": "depthTVD", "name": "Depth TVD", "units": "m", "description": "Total vertical depth, in meters"}
{"id": "depthMD", "name": "Depth MD", "units": "m", "description": "Measured depth, in meters"}

This container uses a nearly identical set of tools to those found in the Transformation - Entities steps. Please review that section for more details.

Tips

Useful tips to remember when performing transformations:

  • Each event must have a populated content, startedOn, and endedOn field.

    • By that same logic: all other fields are optional.

  • "Helper" fields created in several Formula Tools are useful for organizational purposes, and more importantly they allow for the simple creation of several non-event outputs (entities and "metadata")

  • Confused about referenceEntityIds vs subjectEntityIds, or any other fields? Ask the WellLine team for advice! You can contact product support by using the chat support in the bottom-right corner of your screen.

  • Unsure if a specific portion of source data can "fit" into the data model? Ask a SME in your organization, review the WellLine data model documentation (including examples) for ideas, or simply give it a try and see if the output makes sense and adds value to the system.

  • Data must be output as JSONL if you are planning to upload with the WellLine CLI tool.

  • If you are using Alteryx as your transformation tool and have been given the WellLine Tool Templates (available upon request), each field within the Formula tool (shown in Populating the Data Model) contains comments with helpful descriptions.

  • For the best upload performance, we recommend limiting output JSONL files to 10,000 rows per file.