Data Standardizer provides implementations of various internationally recognised standards in data processing, covering topics ranging from languages to currencies and geographical entities. With strongly-typed enumerations for each standard (where applicable) or other targeted data types, you can represent these elements in your code such that errors with invalid values are minimised.
Supported target platforms include modern .NET and .NET Standard. Data Standardizer can be used in new application software, and is also an option for older codebases that are being upgraded gradually or that may remain on older frameworks indefinitely.
If you derive a commercial benefit from Data Standardizer or feel it otherwise adds value to your project, please consider supporting the project by becoming a GitHub sponsor. Data Standardizer is maintained and enhanced by @matthew25187 in his personal time and is made available for free for all to use.
Data Standardizer is available as a series of packages from NuGet.org that can be linked to your existing projects. Available packages include:
| Package | Description |
|---|---|
| DataStandardizer.BCP47 | Supports IETF BCP 47 language tags. |
| DataStandardizer.Chronology | Provides support for the TZ Database. |
| DataStandardizer.Core | Common types used to implement standards in the other packages. You should not need to link to this package directly. |
| DataStandardizer.File.CSV | Provides implementations of standards-based file formats. |
| DataStandardizer.ISO15924 | Supports ISO 15924, Codes for the representation of names of scripts. |
| DataStandardizer.ISO3166 | Supports ISO 3166, Codes for the representation of names of countries and their subdivisions parts 1 & 2. |
| DataStandardizer.ISO4217 | Supports ISO 4217, Codes for the representation of currencies and funds. |
| DataStandardizer.ISO639 | Supports ISO 639, Codes for the representation of names of languages parts 1, 2, 3 & 5. |
| DataStandardizer.Money | Provides types for the handling of monetary values. |
| DataStandardizer.UNM49 | Supports UN M49 or the Standard Country or Area Codes for Statistical Use (Series M, No. 49). |
To use a particular standard in your application, find the corresponding package from the above list and add it as a dependency to your project. Instructions for doing so will depend on what development tooling you are using.
- Visual Studio: see Install and manage packages in Visual Studio using the NuGet Package Manager
- .NET CLI: see Install and manage NuGet packages with the dotnet CLI
- Visual Studio Code: see NuGet in Visual Studio Code
Depending on which .NET platform you are targeting, the above packages also depend on various other system and third-party packages. These are included as dependencies where required and should be resolved automatically, but if you use a proxy for your package server, you may need to make sure these other packages are also available through it.
The repository includes a number of PowerShell scripts with names starting with Generate. These scripts re-generate the enums that implement each corresponding standard. To run them, you need a PowerShell prompt and access to the official flat-file data sources provided by the relevant standards body or designated maintainer. Some scripts may also require a minimum version of PowerShell.
Other scripts and YAML files are included to support the infrastructure (IaC) used by the Data Standardizer project for functions such as pipelines, package hosting, etc. These files are not intended to be used by the end-user.
The most recently produced release version (shown above) does not necessarily correspond with the latest package version published to NuGet or any other publicly available source.
The Data Standardizer repository makes use of two "main" branches. They are:
| Name | Description |
|---|---|
| master | Top-level branch from which all package release builds are produced. The develop branch will be merged into this branch when a new release is done. |
| develop | Default branch and the branch from which preview package builds are produced. Changes are marshalled on this branch before being included in a release build. |
Other branches that may be created from time to time are not relevant to non-contributors.
To compile the source code, first you will need to clone the repository to your local machine. You can find instructions for doing so here.
With the source code, you can then open a command prompt, change the current directory to the repository root folder, and use the following command to compile the entire solution:
```shell
dotnet build DataStandardizer.sln
```
You can also work with the source code in IDEs such as Visual Studio or Visual Studio Code. In these cases, open the DataStandardizer.sln solution file to access the source code.
There are also solution filter files (*.slnf) for each of the projects (packages) in the repository root folder alongside the main solution file. These files narrow the scope of projects included to only those needed to build and test a single package. You can also build these solution filters if so desired, and even open them in your IDE if you only want to work with the code for one package. They are included mainly because they are used by the CI pipelines to enable the building and testing of each package individually.
The included tests are based on the xUnit test framework. To run them, you need a test runner that works with xUnit. The test projects include a default test runner dependency, so you can run the tests from the command line. With a command prompt open in the repository root folder (as described above), you can run all tests in the solution:
```shell
dotnet test DataStandardizer.sln
```
Visual Studio includes the Test Explorer that enables you to discover available tests and execute those tests by various categorizations. Find out more about Test Explorer here. Testing is also supported in Visual Studio Code with use of the C# Dev Kit (learn more here).
Though each package contains many types, typically there will be only a few that you will end up using directly in your application. Listed here are the main types you are most likely to include in your source code.
Please refer to the project documentation for more information.
| Type | Description |
|---|---|
| Bcp47LanguageTag | Represents an IETF language tag. May be created by using the provided static factory methods or by using the language tag builder. |
| Bcp47LanguageTagBuilder | Can be used to construct a language tag using a fluent-style syntax. |
| SubtagRegistry | Represents a copy of the IANA Subtag Registry. May be loaded by various means, but the source must be in the original "record-jar" format as described in RFC 5646. Used to create language tags based on the subtag registry (which defines most valid tags and subtags) as opposed to creating a language tag based just on the rules defined by RFC 5646. |
| SubtagRegistryFileDateRecord | Represents a "File-Date" record from the subtag registry. |
| SubtagRegistrySubtagRecord | Represents a "Subtag" record from the subtag registry. |
| SubtagRegistryTagRecord | Represents a "Tag" record from the subtag registry. |
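To give a feel for the "record-jar" layout that the subtag registry source must follow, here is a minimal sketch of a parser for it. It is not the SubtagRegistry implementation: the `RecordJar` class and its `Parse` method are hypothetical names introduced for illustration, and the sample text is an abbreviated, illustrative excerpt. It assumes the layout described in RFC 5646, where records are separated by `%%` lines, each field is a `Name: value` line, and a line starting with whitespace continues the previous field.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Abbreviated, illustrative sample in the record-jar layout.
const string sample =
    "File-Date: 2024-01-01\n" +
    "%%\n" +
    "Type: language\n" +
    "Subtag: ia\n" +
    "Description: Interlingua (International Auxiliary Language\n" +
    "  Association)\n" +
    "Added: 2005-10-16\n" +
    "%%\n" +
    "Type: region\n" +
    "Subtag: AU\n" +
    "Description: Australia\n" +
    "Added: 2005-10-16\n";

var records = RecordJar.Parse(new StringReader(sample));
Console.WriteLine($"Records: {records.Count}");
foreach (var field in records[1])
{
    Console.WriteLine($"{field.Key} = {field.Value}");
}

// Hypothetical helper, not part of Data Standardizer.
static class RecordJar
{
    // Field names can repeat within a record (e.g. Description),
    // so each record is kept as an ordered list of name/value pairs.
    public static List<List<KeyValuePair<string, string>>> Parse(TextReader reader)
    {
        var records = new List<List<KeyValuePair<string, string>>>();
        var current = new List<KeyValuePair<string, string>>();
        string? line;
        while ((line = reader.ReadLine()) is not null)
        {
            if (line.Trim() == "%%")
            {
                // Record separator: close the current record and start a new one.
                if (current.Count > 0) records.Add(current);
                current = new List<KeyValuePair<string, string>>();
            }
            else if (line.Length > 0 && char.IsWhiteSpace(line[0]) && current.Count > 0)
            {
                // Folded line: append it to the value of the previous field.
                var last = current[^1];
                current[^1] = new KeyValuePair<string, string>(
                    last.Key, last.Value + " " + line.Trim());
            }
            else
            {
                int colon = line.IndexOf(':');
                if (colon <= 0) continue; // skip blank or malformed lines
                current.Add(new KeyValuePair<string, string>(
                    line[..colon].Trim(), line[(colon + 1)..].Trim()));
            }
        }
        if (current.Count > 0) records.Add(current);
        return records;
    }
}
```

In this sketch the first record holds the File-Date field and the remaining records hold subtags, which mirrors the split between the File-Date and Subtag/Tag record types listed above.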
| Type | Description |
|---|---|
| TzDataTimezone | An enum containing the timezones defined by the TZ Database. |
| Type | Description |
|---|---|
| CsvFieldMappingAttribute | Declares the mapping of a property to a CSV field. |
| CsvFileHeaderLine | Represents a header line from a CSV file. |
| CsvFileOptions | Options for configuring the behaviour of a CSV reader or writer. |
| CsvFileReader | Reads a CSV file sourced from a Stream, TextReader or file. |
| CsvFileRecordLine | Represents a record line from a CSV file. |
| CsvFileWriter | Writes a CSV file to a Stream, TextWriter or file. |
| Type | Description |
|---|---|
| Iso15924Script | An enum containing script codes from ISO 15924. Includes both the four-letter alpha codes and three-digit numeric codes from the standard as the name and value of the members, respectively. |
| Type | Description |
|---|---|
| Iso3166Part1Alpha2Country | An enum containing the country codes from ISO 3166-1 Alpha-2. Includes both the two-letter alpha codes and numeric codes from the standard as the name and value of the members, respectively. |
| Iso3166Part1Alpha3Country | An enum containing the country codes from ISO 3166-1 Alpha-3. Includes both the three-letter alpha codes and numeric codes from the standard as the name and value of the members, respectively. |
| Iso3166Part2Subdivision | An enum containing the subdivision codes from ISO 3166-2. Given the hierarchical nature of these codes, this implementation uses a nested structure to access the codes so that each group of subdivision codes is grouped under a nested type named after the country code of the country the subdivision codes belong to. |
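The nested arrangement described for subdivision codes can be pictured with the sketch below. `SampleSubdivision` is a hypothetical stand-in written for this example, with only a few members, and the actual shape of Iso3166Part2Subdivision may differ. The point is only that the country code acts as a qualifier for the subdivision codes beneath it.

```csharp
using System;

// The country code qualifies the subdivision code, much like "AU-NSW" in ISO 3166-2.
var state = SampleSubdivision.AU.NSW;
var province = SampleSubdivision.CA.ON;

Console.WriteLine($"{state.GetType().Name}-{state}");       // AU-NSW
Console.WriteLine($"{province.GetType().Name}-{province}"); // CA-ON

// Hypothetical stand-in: one nested enum per country code.
static class SampleSubdivision
{
    public enum AU { ACT, NSW, NT, QLD, SA, TAS, VIC, WA }
    public enum CA { AB, BC, MB, NB, NL, NS, NT, NU, ON, PE, QC, SK, YT }
}
```

Because each country has its own nested type, codes that repeat across countries (NT appears under both AU and CA here) do not collide.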
| Type | Description |
|---|---|
| Iso4217CurrencyCurrent | An enum containing active currency codes from ISO 4217. Includes both the three-letter alpha codes and numeric codes from the standard as the name and value of each member, respectively. |
| Iso4217CurrencyHistoric | An enum containing retired currency codes from ISO 4217. Includes both the three-letter alpha codes and numeric codes from the standard as the name and value of each member, respectively. |
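Several of the enums above use the alpha code as the member name and the numeric code as the member value. The sketch below shows what that pattern allows, using only standard .NET enum APIs. `SampleCurrency` is a hypothetical four-member stand-in defined here so the example is self-contained; it is not the library's Iso4217CurrencyCurrent type.

```csharp
using System;

// Alpha code (member name) from text input.
if (Enum.TryParse<SampleCurrency>("EUR", ignoreCase: true, out var currency))
{
    Console.WriteLine($"{currency} has numeric code {(int)currency}");
}

// Numeric code (member value) back to a member.
int numeric = 392;
if (Enum.IsDefined(typeof(SampleCurrency), numeric))
{
    Console.WriteLine($"{numeric} is {(SampleCurrency)numeric}");
}

// Numeric codes are three digits in the standard, so pad when formatting.
Console.WriteLine(((int)SampleCurrency.AUD).ToString("D3")); // 036

// An alpha code that is not a member fails to parse.
Console.WriteLine(Enum.TryParse<SampleCurrency>("ZZZ", out _)); // False

// Hypothetical stand-in: name = alpha code, value = numeric code.
enum SampleCurrency
{
    AUD = 36,
    EUR = 978,
    JPY = 392,
    USD = 840,
}
```

One caveat with this approach: `Enum.TryParse` also accepts numeric strings, including numbers with no matching member, so pairing it with `Enum.IsDefined` is a reasonable precaution when parsing untrusted input.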
| Type | Description |
|---|---|
| Iso639Part1Language | An enum containing the alpha-2 language codes from ISO 639-1. |
| Iso639Part2BLanguage | An enum containing the bibliographic alpha-3 language codes from ISO 639-2. |
| Iso639Part2TLanguage | An enum containing the terminological alpha-3 language codes from ISO 639-2. |
| Iso639Part3Language | An enum containing the alpha-3 language codes from ISO 639-3. |
| Iso639Part5LanguageFamily | An enum containing the alpha-3 language family codes from ISO 639-5. |
| Type | Description |
|---|---|
| Money | A data type for handling a monetary value comprising an amount and a currency code. Optionally supports user-specified rounding that is applied on conversion to a decimal value. |
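The idea of storing an exact amount and applying rounding only on conversion to a decimal can be sketched as follows. `SampleMoney` and all of its members are hypothetical and written for this example; the Money type in DataStandardizer.Money may expose a different constructor and conversion API, so refer to the project documentation for the real signatures.

```csharp
using System;

var price = new SampleMoney(10.005m, "USD", decimals: 2, rounding: MidpointRounding.AwayFromZero);

decimal stored = price.Amount;        // 10.005, kept exactly as given
decimal converted = price.ToDecimal(); // 10.01, rounding applied on conversion

Console.WriteLine($"{price.CurrencyCode}: stored {stored}, converted {converted}");

// Hypothetical stand-in for a money type with optional rounding.
sealed class SampleMoney
{
    private readonly int? _decimals;
    private readonly MidpointRounding _rounding;

    public SampleMoney(
        decimal amount,
        string currencyCode,
        int? decimals = null,
        MidpointRounding rounding = MidpointRounding.ToEven)
    {
        Amount = amount;
        CurrencyCode = currencyCode;
        _decimals = decimals;
        _rounding = rounding;
    }

    public decimal Amount { get; }
    public string CurrencyCode { get; }

    // Rounding is deferred until the value leaves the money type.
    public decimal ToDecimal() =>
        _decimals is int d ? Math.Round(Amount, d, _rounding) : Amount;
}
```

Deferring the rounding keeps intermediate values exact, so the choice of rounding mode affects only the final converted figure.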
| Type | Description |
|---|---|
| UnM49AreaByAlpha2CountryCode | An enum containing the numeric M49 codes from standard UN M49. Because of technical requirements on the naming of members, each code is keyed on its corresponding ISO 3166-1 alpha-2 code. |
| UnM49AreaByAlpha3CountryCode | An enum containing the numeric M49 codes from standard UN M49. Because of technical requirements on the naming of members, each code is keyed on its corresponding ISO 3166-1 alpha-3 code. |
N.B. Because of the way the source data is arranged, the above enums only directly include members representing M49 codes that have a corresponding alpha-2 or alpha-3 code from ISO 3166-1. Additional M49 codes representing supra-national regions or other areas are included as metadata on these enum members and can be retrieved using the provided extension methods.
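One common way to attach metadata to enum members and read it back through an extension method is a custom attribute plus reflection, sketched below. This is an assumption about the general technique, not a description of how Data Standardizer implements it: `SampleArea`, `ParentRegionsAttribute` and `GetParentRegionCodes` are hypothetical names, and the library's own extension methods will differ.

```csharp
using System;
using System.Reflection;

// Supra-national M49 codes for Australia: 053 (Australia and New Zealand),
// 009 (Oceania), 001 (World).
Console.WriteLine(string.Join(", ", SampleArea.AU.GetParentRegionCodes()));

// Hypothetical attribute carrying the region codes a member belongs to.
[AttributeUsage(AttributeTargets.Field)]
sealed class ParentRegionsAttribute : Attribute
{
    public ParentRegionsAttribute(params int[] codes) => Codes = codes;
    public int[] Codes { get; }
}

// Hypothetical stand-in: name = ISO 3166-1 alpha-2 code, value = M49 code.
enum SampleArea
{
    [ParentRegions(53, 9, 1)] AU = 36,
    [ParentRegions(154, 150, 1)] GB = 826,
}

static class SampleAreaExtensions
{
    // Reads the attribute from the enum member's backing field.
    public static int[] GetParentRegionCodes(this SampleArea area)
    {
        FieldInfo? field = typeof(SampleArea).GetField(area.ToString());
        var attribute = field?.GetCustomAttribute<ParentRegionsAttribute>();
        return attribute?.Codes ?? Array.Empty<int>();
    }
}
```

With this arrangement the enum stays limited to codes that have an ISO 3166-1 name, while the regional codes remain reachable from each member.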
The DataStandardizer.File.CSV package includes built-in support for working with CSV (Comma-Separated Values) files, a common format for structured data exchange. This functionality is designed to be lightweight, flexible, and compatible with legacy .NET applications via support for .NET Standard 1.x and 2.0.
- Read and write CSV files with customizable delimiters
- Normalize inconsistent CSV structures for downstream processing
- Handle headers, quoted fields, and edge cases gracefully
- Designed for extensibility and integration into broader data workflows
```csharp
using System.IO;
// A using directive for the namespace containing the CSV types is also
// required; see the project documentation for its name.

var inputPath = "data.csv";
var outputPath = "normalized.csv";

var csvInputOptions = new CsvFileOptions
{
    TerminatorLineBreak = "\n" // source file has non-standard line breaks
};
var csvOutputOptions = csvInputOptions with
{
    TerminatorLineBreak = "\r\n", // write lines to the output file with standard line breaks
    QuoteHandling = CsvFieldQuoteHandling.Required // quote field values only when needed
};

using (var input = File.OpenRead(inputPath))
using (var csvReader = new CsvFileReader<CsvFileRecordLine>(input, csvInputOptions))
using (var output = File.Create(outputPath))
using (var csvWriter = new CsvFileWriter<CsvFileRecordLine>(output, csvOutputOptions))
{
    // Copy each record line from the input to the output until the reader is exhausted.
    var line = csvReader.ReadLine();
    while (line is not null)
    {
        csvWriter.WriteLine(line);
        line = csvReader.ReadLine();
    }
}
```
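To illustrate what "quote field values only when needed" usually means for CSV, here is a minimal sketch based on the RFC 4180 rules: a field is quoted if it contains the delimiter, a double quote, or a line break, and embedded double quotes are doubled. `CsvQuoting.QuoteIfNeeded` is a hypothetical helper written for this example. It is an assumption that the library's Required setting follows the same rules, so treat this as background rather than a description of its behaviour.

```csharp
using System;

Console.WriteLine(CsvQuoting.QuoteIfNeeded("plain"));        // plain
Console.WriteLine(CsvQuoting.QuoteIfNeeded("a,b"));          // "a,b"
Console.WriteLine(CsvQuoting.QuoteIfNeeded("say \"hi\""));   // "say ""hi"""
Console.WriteLine(CsvQuoting.QuoteIfNeeded("a;b", ';'));     // "a;b"

// Hypothetical helper, not part of Data Standardizer.
static class CsvQuoting
{
    public static string QuoteIfNeeded(string value, char delimiter = ',')
    {
        // Quoting is needed only when the value could be misread by a parser.
        bool needsQuotes = value.IndexOfAny(new[] { delimiter, '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
        {
            return value;
        }

        // Escape embedded quotes by doubling them, then wrap the field.
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```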
CSV support is provided by the DataStandardizer.File.CSV NuGet package. You can install it via:
```shell
dotnet add package DataStandardizer.File.CSV
```
For more advanced usage and configuration options, see the project documentation.