@ruihmicrosoft ruihmicrosoft commented Aug 7, 2019

This commit adds two unit tests that mimic the reducers used in email search scenarios today. They are typical examples because they cover the two most common data transformations: taking N rows and outputting 1 row, and taking N rows and outputting M rows. We use the two unit tests to demonstrate the feasibility of migrating the existing reducers used by email search to HDI. Note that this is not a performance test yet.

  1. The first reducer takes N rows and outputs 1 row: it computes the top N contacts per user. It first groups by the user id and then collapses each group into one row by concatenating the records. A UDF is used to apply some string operations.
  2. The second reducer takes N rows and outputs M rows. Given a successful search query with flattened folder and item lists, the reducer explodes the input dataframe by pairing the corresponding folder and item ids. If the folder and item lists contain multiple records, the reducer generates multiple rows as output.
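For readers unfamiliar with the two shapes, here is a minimal sketch of both transformations in plain Python, not the .NET for Spark code in this PR. The column layout, the `score` ranking field, the `;` separator, the lowercase/trim string operation, and both function names are assumptions made for illustration only.

```python
from collections import defaultdict


def top_contacts_per_user(rows, n=2, sep=";"):
    """N rows in, 1 row out per user (hypothetical layout: user_id, contact, score)."""
    groups = defaultdict(list)
    for user_id, contact, score in rows:
        groups[user_id].append((contact, score))

    out = []
    for user_id, contacts in groups.items():
        # Rank by score, highest first, and keep the top n (assumed criterion).
        ranked = sorted(contacts, key=lambda c: c[1], reverse=True)[:n]
        # Stand-in for the UDF's string operations: trim and lowercase each name,
        # then concatenate the group into a single field.
        out.append((user_id, sep.join(name.strip().lower() for name, _ in ranked)))
    return out


def explode_folder_item_pairs(rows, sep=";"):
    """N rows in, M rows out (hypothetical layout: query_id, folder_ids, item_ids)."""
    out = []
    for query_id, folder_ids, item_ids in rows:
        # Pair the folder and item ids by position, one output row per pair.
        for folder_id, item_id in zip(folder_ids.split(sep), item_ids.split(sep)):
            out.append((query_id, folder_id, item_id))
    return out
```

In a Spark dataframe the first shape would typically be a group-by followed by an aggregation, and the second an explode over paired arrays; the sketch only shows the row-count behaviour of each reducer.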



dnfclas commented Aug 7, 2019

CLA assistant check
All CLA requirements met.

@ruihmicrosoft ruihmicrosoft changed the title This commit adds two unit tests which mimic the reducers used in emai… Email search scenario tests of using dotnet spark as reducers Aug 7, 2019
@imback82
Contributor

@ruihmicrosoft, we have been adding examples with some documentation under the example folder: #319, #320. If you want to move this PR forward, could you please convert it to an example rather than an E2E test? Thanks.

Base automatically changed from master to main March 18, 2021 16:48