Commit graph

115 commits

Author SHA1 Message Date
Nelson Jovel
e5e969b968 feat: add parent survey gauges 2024-09-16 15:24:43 -07:00
Nelson Jovel
3ecc68edd0 chore: correct parsing for 'not sped' and 'lep not first year' 2024-06-26 12:02:22 -07:00
Nelson Jovel
ed3ac25a7b chore: During cleaning, stop execution if grade column isn't found. Also stop execution if a duplicate header is found. Turn off spec for duplicate header check 2024-05-23 12:52:13 -07:00
Nelson Jovel
b1f942133b update parsing rules from glossary 2024-05-15 10:15:16 -07:00
Nelson Jovel
ea2feb138b add disaggregation glossary 2024-05-15 10:15:16 -07:00
Nelson Jovel
927fae1afd chore: add a test for categorizing sped values of 1 and 0 as 'Special Education' and 'Not Special Education' 2024-05-07 19:38:08 -07:00
Nelson Jovel
8bb6f5e8f0 Add ell, income and sped parsing rules for SIS data. Add tests for the
new inputs.
2024-05-07 18:46:53 -07:00
Nelson Jovel
33da0859b9 Split academic year into seasons if the academic year's range is
initialized with a season, i.e. "2024-25 Fall". Update scrapers for
admin data, enrollment and staffing to use the new range standard
correctly. Update the loaders for admin data, enrollment and staffing
so that they populate all seasons in a given year. So admin data for
2024-25 gets loaded into "2024-25 Fall" and "2024-25 Spring". Add tests
for the new range format. Set the default cutoff for the start of the Spring season to the last Sunday in February
2024-04-25 09:21:04 -07:00
5789ebf564 Faster admin data loader + rename School.school_hash 2024-04-22 14:46:37 -04:00
Nelson Jovel
289b04bc69 match an additional format for dates. Supported dates are now '1/10/2022 14:21:45', '2022-1-10T14:21:45' and '2022-1-10 14:21:45' 2024-03-01 09:30:23 -08:00
Nelson Jovel
d6735d449d feat: Support two date formats: ISO 8601 and the standard US date format
used in google sheets
2024-02-27 11:55:47 -08:00
Nelson Jovel
9696a2b2fa fix: fix failing test 2024-02-23 11:54:32 -08:00
Nelson Jovel
0a32fb50ff fix: no longer support 'form' in filename when cleaning. Only look for 'part X' and add that to the filename if it exists 2024-02-22 10:55:48 -08:00
Nelson Jovel
ed07114a91 fix: fix failing tests 2024-02-22 10:41:42 -08:00
Nelson Jovel
880b438eb4 chore: reenable test spec that tests data loader for races 2023-12-20 12:39:44 -08:00
Nelson Jovel
36e21515c3 chore: refactor Race out of survey_item_values 2023-12-20 12:25:23 -08:00
Nelson Jovel
e7fb009425 chore: refactor Gender out of survey_item_values row 2023-12-20 11:08:23 -08:00
Nelson Jovel
41d942c214 chore: Make sure 'hispanic' column only gets applied when using SIS race information 2023-12-12 10:53:07 -08:00
Nelson Jovel
f028e6c884 feat: if the filename includes the words 'form' or 'part' add that to the resulting cleaned filename 2023-12-11 15:39:20 -08:00
Nelson Jovel
d90a83e510 fix: instead of looking for 'asian' at the start of a word, look for it
after a word boundary.  This means it still doesn't get confused with
caucasian and it's more flexible when asian appears inside other text
such as 'Caucasian and Asian and Black'
2023-12-08 14:16:50 -08:00
Nelson Jovel
3f44613085 chore: various fixes for race and gender categorization during cleaning.
Also add tests for race and gender categorization
2023-12-08 13:12:19 -08:00
Nelson Jovel
b7e670bb60 Lower threshold for the number of valid student responses from 17 to 11 2023-12-06 14:15:19 -08:00
Nelson Jovel
6e05909423 chore: fix categorization of gender 2023-12-06 14:12:27 -08:00
Nelson Jovel
e325f38c43 Convert gender and race text into qualtrics codes during cleaning. Abide by 'prefer not to disclose' for self reported race. Give priority to self reported data but use SIS information as backup 2023-12-06 14:10:16 -08:00
Nelson Jovel
305ddf2b1a chore: add test for checking duplicate headers during cleaning process 2023-12-06 14:10:08 -08:00
Nelson Jovel
b63c327d33 chore: when searching for dese id, split up the pattern to be more explicit about the order in which to search the columns that might have the dese ID we're looking for. 2023-11-06 13:15:50 -08:00
rebuilt
1a707eb6bc feat: load student responses in the same pass as loading the survey responses
chore: remove student loader since loading students is now done with the survey response loader
2023-11-02 09:52:39 -07:00
rebuilt
e3fbbabce5 feat: We no longer trust the progress number that gets exported from qualtrics. Instead, during the cleaning process, perform a manual count of the number of responses to filter out rows that don't meet the minimum threshold. 2023-10-27 15:12:24 -07:00
rebuilt
83661540b7 chore: upgrade to rails 7.1.
upgrade rspec

fix failing tests

upgrade devise
2023-10-11 10:58:52 -07:00
rebuilt
48e795fcfb feat: add special education disaggregation 2023-10-06 11:41:52 -07:00
rebuilt
060d7aa55a Add disaggregation by ELL 2023-09-29 19:29:23 -07:00
rebuilt
abea2cb8fa feat: support multiple columns for race and gender information 2023-08-25 15:37:20 -07:00
rebuilt
714b90b3eb fix: ensure cleaner outputs columns for all survey items. Before the fix, if a survey item variant (ending in -1, i.e. s-tint-q1-1) did not have a matching survey item s-tint-q1, the resulting csv would not include that column 2023-08-23 15:30:47 -07:00
rebuilt
a785c69c44 Add Overall Response Rate 2023-08-09 15:13:58 -07:00
rebuilt
4afa030141 chore: remove precalculated race scores. Calculate race scores on every reload 2023-08-07 16:02:59 -07:00
rebuilt
cec48e55d3 chore: remove outdated admin data loader file. We now use Dese::Loader to load school level data 2023-07-21 12:16:59 -07:00
rebuilt
5c7729beeb feat: if admin data value is above 5, round down to 5 2023-07-21 12:14:46 -07:00
rebuilt
4f035f6a63 feat: Add income table to the database. Add seeder for income. Add a reference to income from survey item response. Update the loader to import income data from the survey response csv. Refactor analyze controller to extract presenter. Add corresponding specs. Add income graph to analyze page 2023-07-07 09:14:36 -07:00
rebuilt
d72f8d31e0 fix: There was an n+1 problem where we looked up the list of schools for
every row. Now we query the list of schools just once per file
2023-06-26 11:36:03 -07:00
rebuilt
e8aa75bf66 feat: update survey_item_response table to include recorded date and import recorded date when loading responses 2023-06-23 11:26:53 -07:00
Nelson Jovel
0a2c5e02c5 feat: add ability to merge disaggregation data with raw survey data to
produce a cleaned csv with merged income disaggregation columns
2023-06-20 12:22:24 -07:00
rebuilt
411c632c25 chore: remove errant comment 2023-06-12 16:06:57 -07:00
rebuilt
30285efd69 It's possible for admin data likert score values to be above 5. If that happens, we
cap the likert score at 5.   This was happening already at the scraper
level but it's also now being done by the admin data loader for safety.
Also make sure to just update admin data instead of deleting and
reloading all values each load. Add tests to confirm this behavior
2023-06-03 17:14:41 -07:00
rebuilt
9aeb5f92af Missing progress or duration information does not result in a row removed in the cleaning process 2023-06-02 15:23:21 -07:00
rebuilt
e3ae12b425 update response_date to recorded_date 2023-05-31 16:57:47 -07:00
rebuilt
8ef8cfce58 Adjust valid duration threshold of short form items 2023-05-26 18:30:44 -07:00
rebuilt
4509c157fa Add automated data cleaning. Modify SurveyItemValues class to use regex
instead of hard coded values.  Produce a clean csv and a csv with all
the removed values and columns with reason for removal. Add script for
running cleaning for each project
2023-05-16 13:38:29 -07:00
rebuilt
3f2a7dff50 Fix problem with dese scraper lumping in 2021-22 data as 2022-23 data.
Deleted unused csvs.  Turned off puts statements in admin loader.
Remove old, now unused admin data loader class.
2023-04-27 15:43:17 -07:00
rebuilt
128748addd Update logic for calculating student response rate. Remove references
to survey table.  We no longer check or keep track of the survey type.
Instead we look in the database to see if a survey item has at least 10
responses.  If it does, that survey item was presented to the respondent
and we count it and all of its responses when calculating the response rate.

Remove response rate timestamp from caching logic because we no longer
add the response rate to the database. All response rates are calculated
on the fly

Update three_b_two scraper to use teacher only numbers

swap over to using https://profiles.doe.mass.edu/statereport/gradesubjectstaffing.aspx as the source of staffing information
2023-04-18 13:59:29 -07:00
rebuilt
b250ebe415 Memoize schools in SurveyItemValues and academic_years in AcademicYear
for performance improvement
2023-03-28 03:38:52 -07:00