MohammadAlavi1986

The fields in the 2018 daily attendance file are ordered as follows:

School DBN,Date,Enrolled,Absent,Present,Released
01M015,20180905,172,19,153,0
01M015,20180906,171,17,154,0
01M015,20180907,172,14,158,0

Unlike the other CSV files, this one lists Absent before Present, so the two columns end up swapped when read by position. This explains the absurdly high (over 90%) absenteeism.
With the field order fixed, the list of schools with the most absenteeism becomes:

+--------+------------------+------------------+------------------+
|schoolId|      avg_enrolled|        avg_absent|                 %|
+--------+------------------+------------------+------------------+
|  10X476|179.33333333333334| 79.28439587128112| 44.21062966800062|
|  17K646|             240.0|105.67567567567568|44.031531531531535|
|  79Q607|             214.0|  92.5546218487395| 43.24982329380351|
|  79K957|             177.0| 74.23145161290323| 41.93867322762894|
|  79K665|            1128.0| 458.4076086956522|40.638972402096826|
+--------+------------------+------------------+------------------+
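As an illustration only (this is plain Python, not the code in this pull request), one way to avoid depending on column position is to look fields up by header name. The sketch below assumes every file carries the header row shown above; the function name `absenteeism_by_school` and the reordered sample are hypothetical. The `%` column in the table appears to be `avg_absent / avg_enrolled * 100`, and the sketch computes it the same way.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the 2018 file (Absent comes before Present).
SAMPLE_2018 = """School DBN,Date,Enrolled,Absent,Present,Released
01M015,20180905,172,19,153,0
01M015,20180906,171,17,154,0
01M015,20180907,172,14,158,0
"""

# Hypothetical file with the other column order (Present before Absent).
SAMPLE_OTHER = """School DBN,Date,Enrolled,Present,Absent,Released
01M015,20180905,172,153,19,0
01M015,20180906,171,154,17,0
01M015,20180907,172,158,14,0
"""


def absenteeism_by_school(csv_file):
    """Return {school: (avg_enrolled, avg_absent, percent_absent)}.

    Fields are read by header name, so the column order does not matter.
    """
    # Per school: [sum of Enrolled, sum of Absent, number of rows]
    totals = defaultdict(lambda: [0, 0, 0])
    for row in csv.DictReader(csv_file):
        entry = totals[row["School DBN"]]
        entry[0] += int(row["Enrolled"])
        entry[1] += int(row["Absent"])
        entry[2] += 1

    result = {}
    for school, (enrolled, absent, count) in totals.items():
        avg_enrolled = enrolled / count
        avg_absent = absent / count
        result[school] = (avg_enrolled, avg_absent, 100 * avg_absent / avg_enrolled)
    return result


stats_2018 = absenteeism_by_school(io.StringIO(SAMPLE_2018))
stats_other = absenteeism_by_school(io.StringIO(SAMPLE_OTHER))
```

Both samples give the same figures for `01M015`, because each value is taken from the column whose header matches rather than from a fixed index.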
