Things I wish I knew before starting a relational database
Why relational#
Checks on data integrity mean you can hand your project over to the maintainers
Lower barrier to trying out ideas on old data
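As a rough illustration of what an integrity check buys you, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL. The table names (`session`, `event`) and the `fid` values are made up for the example; the point is only that a child row cannot reference a parent that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical parent table: one row per recording session.
conn.execute("CREATE TABLE session (fid TEXT PRIMARY KEY)")
# Hypothetical child table: every event must belong to an existing session.
conn.execute(
    "CREATE TABLE event ("
    " fid TEXT NOT NULL REFERENCES session(fid),"
    " event_id INTEGER NOT NULL,"
    " PRIMARY KEY (fid, event_id))"
)

conn.execute("INSERT INTO session VALUES ('rec_001')")
conn.execute("INSERT INTO event VALUES ('rec_001', 0)")


def try_insert_orphan():
    """Try to insert an event whose session does not exist."""
    try:
        conn.execute("INSERT INTO event VALUES ('rec_999', 0)")
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"
    return "accepted"


orphan_result = try_insert_orphan()
n_events = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
print(orphan_result, n_events)
```

The orphan event is refused by the database itself, so whoever inherits the project does not have to trust that every script remembered to check.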
Staying sane#
Separate phases of analysis: ingestion/condensing, chunking, computation, organization, plotting
Write two schemas at a time: the one you’re thinking about now and its child
First, populate a few recording sessions, not all datasets
Once the current schema and its child seem to work fine, get them populating and move on to the next node. Work on the whole analysis simultaneously - no need to wait for every step to finish. You can start the next one as soon as a few keys have populated.
My data entry stage is during recording when I make the filename
Keep Datetime as primary key for restrictions on populate calls (Not so sure anymore)
Take baby steps at the top, cram many at the bottom - the bottom has high overhead due to multiplication of keys (e.g. events), whereas dropping top nodes means recalculating all children
For populate calls, restrict by a list of fids if you suspect that some are not populating due to errors
Xarray populate must not be parallelized or dropped - dj doesn’t check it
If not all items are showing up at the end of populate, repeat with reserve_jobs=False
Start with reserve_jobs=False for schemas that require little CPU
If you work on children and parents simultaneously, parents may not show up for a while. Wait until MySQL processes them first.
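To make the restriction and key-multiplication points concrete, here is a plain-Python toy model. It is not DataJoint code: `keys_to_populate` is a hypothetical helper that only mimics the idea of "parent keys not yet computed, optionally restricted to a list of fids". In DataJoint the restriction would instead be passed to `populate`. The session names and the events-per-session figure are invented.

```python
def keys_to_populate(parent_keys, populated_keys, restrict_fids=None):
    """Return parent keys that still need computing.

    parent_keys / populated_keys are lists of dicts such as {"fid": "rec_001"}.
    restrict_fids, if given, limits the result to those fids.
    """
    todo = [key for key in parent_keys if key not in populated_keys]
    if restrict_fids is not None:
        allowed = set(restrict_fids)
        todo = [key for key in todo if key["fid"] in allowed]
    return todo


# Three hypothetical recording sessions, one of which is already done.
sessions = [{"fid": f"rec_{i:03d}"} for i in range(1, 4)]
done = [{"fid": "rec_001"}]

# Restrict to the two sessions we suspect; only the unpopulated one remains.
todo = keys_to_populate(sessions, done, restrict_fids=["rec_001", "rec_002"])
print(todo)

# Key multiplication: a child keyed by (fid, event_id) has one key per event,
# so the bottom of the pipeline grows much faster than the top.
events_per_session = 500  # assumed figure, for illustration only
n_event_keys = len(sessions) * events_per_session
print(n_event_keys)
```

Starting with a few sessions keeps both numbers small while the schema is still changing.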
Gotchas#
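Fractional primary keys must be stored as decimal, not float. MySQL requires precision >= scale in decimal(M,D), so write e.g. decimal(30,10) rather than decimal(10,30)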
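One likely reason for this gotcha is that binary floats do not compare exactly, and a primary key has to match exactly to be found again. A small sketch with Python's standard `decimal` module shows the difference; the values are arbitrary.

```python
from decimal import Decimal

# A float key computed two ways does not match itself.
float_key = 0.1 + 0.2
float_matches = float_key == 0.3

# The same arithmetic in decimal is exact, so the lookup key matches.
decimal_key = Decimal("0.1") + Decimal("0.2")
decimal_matches = decimal_key == Decimal("0.3")

# Used as dictionary keys (a stand-in for a primary-key lookup):
rows = {Decimal("0.3"): "row for key 0.3"}
found = rows.get(decimal_key)

print(float_matches, decimal_matches, found)
```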