Email search scenario tests of using dotnet spark as reducers #205

Open: 2 commits to be merged into base: main

@ruihmicrosoft ruihmicrosoft commented Aug 7, 2019

This commit adds two unit tests that mimic the reducers used in email search scenarios today. They are typical examples because they cover the two most common data transformations: taking N rows and outputting 1 row, and taking N rows and outputting M rows. We use the two unit tests to demonstrate the feasibility of migrating the existing email search reducers to HDI. Note that this is not a performance test yet.

  1. The first reducer computes the top N contacts per user. It groups the rows by user id and then collapses each group into one row by concatenating its records. A UDF is used to apply some string operations.
  2. The second reducer takes N rows and outputs M rows. Given a successful search query with flattened folder and item lists, the reducer explodes the input DataFrame by pairing each folder id with its corresponding item id. If the folder and item lists contain multiple records, the reducer generates multiple output rows.
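
As a rough illustration of the two row shapes described above, here is a minimal sketch in plain Python rather than the .NET for Apache Spark code in this PR. The input schemas, the `;` and `,` separators, the `name:count` output format, and the function names `top_contacts_per_user` and `explode_folder_item_pairs` are all assumptions made up for the sketch; the actual tests may differ.

```python
from collections import defaultdict


def top_contacts_per_user(rows, n):
    """N rows in, 1 row out per user: keep each user's n most frequent contacts.

    rows: iterable of (user_id, contact, count) tuples (hypothetical schema).
    Returns a dict mapping user_id to one concatenated string.
    """
    groups = defaultdict(list)
    for user_id, contact, count in rows:
        groups[user_id].append((contact, count))

    result = {}
    for user_id, contacts in groups.items():
        # Sort by descending count, then by name so ties are deterministic.
        top = sorted(contacts, key=lambda c: (-c[1], c[0]))[:n]
        # The strip/lower formatting stands in for the UDF's string operations.
        result[user_id] = ";".join(
            f"{name.strip().lower()}:{count}" for name, count in top
        )
    return result


def explode_folder_item_pairs(rows):
    """N rows in, M rows out: pair the i-th folder id with the i-th item id.

    rows: iterable of (query_id, folder_ids, item_ids), where both id lists
    are comma-separated strings (hypothetical schema).
    """
    out = []
    for query_id, folder_ids, item_ids in rows:
        folders = folder_ids.split(",")
        items = item_ids.split(",")
        # zip stops at the shorter list, so unmatched ids are dropped.
        for folder, item in zip(folders, items):
            out.append((query_id, folder, item))
    return out


if __name__ == "__main__":
    contacts = [
        ("u1", "Alice", 5),
        ("u1", "Bob", 9),
        ("u1", "Carol", 2),
        ("u2", "Dave", 1),
    ]
    print(top_contacts_per_user(contacts, 2))

    searches = [("q1", "f1,f2", "i1,i2"), ("q2", "f3", "i3")]
    print(explode_folder_item_pairs(searches))
```

In the sketch, the first function returns exactly one entry per user regardless of how many input rows that user has, while the second returns one row per folder/item pair, so a query with two pairs yields two rows.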


dnfclas commented Aug 7, 2019

CLA assistant check
All CLA requirements met.

@ruihmicrosoft changed the title from "This commit adds two unit tests which mimic the reducers used in emai…" to "Email search scenario tests of using dotnet spark as reducers" on Aug 7, 2019
@imback82 (Contributor) commented

@ruihmicrosoft, we have been adding examples with documentation under the example folder: #319, #320. If you want to move this PR forward, could you please convert it to an example rather than an E2E test? Thanks.

Base automatically changed from master to main March 18, 2021 16:48
3 participants