{"id":966,"date":"2013-08-30T15:47:58","date_gmt":"2013-08-30T19:47:58","guid":{"rendered":"http:\/\/blog.bitsofgenius.com\/?p=966"},"modified":"2013-08-30T16:16:13","modified_gmt":"2013-08-30T20:16:13","slug":"the-5-core-system-requirements-to-protect-your-sleep","status":"publish","type":"post","link":"https:\/\/blog.bitsofgenius.com\/?p=966","title":{"rendered":"The 5 Core System Requirements To Protect Your Sleep"},"content":{"rendered":"<p>I have been a software developer for over 30 years now. \u00a0That includes time as a kid playing with technology, time as a student, and both professional and non-professional work. \u00a0Over that time I have encountered many bad designs, and some good ones, in systems and software. \u00a0Like everyone on the engineering or operations side of a technology company, I have also lost a lot of sleep from time to time, whether to midnight emergencies or to worry about keeping a system stable during peak usage.<\/p>\n<p>I&#8217;ve distilled my overall experience into a list of what I consider the fundamental requirements of a system (enterprise or, more simply, multi-server) to<\/p>\n<ol>\n<li>Ensure it is survivable (someone other than the original developer can keep it running)<\/li>\n<li>Keep its ability to evolve over time<\/li>\n<li>Minimize the technical debt created by its design<\/li>\n<li>Minimize disjointed interaction, and maximize discovery and the effectiveness of community effort.<\/li>\n<\/ol>\n<p>I am focusing here on the big picture: the system as a whole. \u00a0Best practices for coding style, library use, etc. are only implementations of these suggestions, and those are up to you as a business to decide. \u00a0These requirements are all agnostic to any particular platform or technology.<\/p>\n<p><strong><em>#1: Keep configurations, to the fullest extent possible, out of local .config\/.xml\/.ini\/.cfg files, registries, and other application\/service\/server-specific locations. 
\u00a0Centralize them, with a good user interface that configuration specialists can use to review and adjust everything. \u00a0<\/em><\/strong><\/p>\n<p>This avoids the nightmare of relocating hardware or services and then missing some of the configuration settings that had to change because of the migration. \u00a0There is nothing worse than a series of inconsistent tools, services and applications written by several different developers over time, each with its own philosophy or practice for storing configuration settings.<\/p>\n<p>Enforce this requirement: it gives your system a chance at longevity, preserves its scalability, and drastically reduces technical debt.<\/p>\n<p><strong><em>#2: Log what your application is doing, not only at the level of detail set in its configuration, but also to the RIGHT location. \u00a0The RIGHT location offers broad, centralized access with search and filtering abilities.<\/em><\/strong><\/p>\n<p>This avoids the nightmare of being unable to troubleshoot a problem properly because the details of an error, or of a step in a process, were never recorded. \u00a0Personally, I am a strong advocate of chatty log entries. \u00a0In my experience, deep-detail logging is useful for the most recent 96 to 120 hours of operation. \u00a0After that, the log entries can be pruned down to a summary level for process accounting.<\/p>\n<p>By keeping the details for 96 to 120 hours (four to five days), someone can return from a long weekend and still have enough information to troubleshoot a problem that occurred while they were away. \u00a0The most important details to log are the specific step being performed, error and stack trace information, and breadcrumbs (specific file names, URIs, or other resources). 
\u00a0Logging should also record some information about the server itself (at a minimum, its system name), so the reader knows where the process ran.<\/p>\n<p>If the logging system is well-defined and well-exposed, it becomes an excellent foundation for two other systems of great value: metrics and alerting.<\/p>\n<p><strong><em>#3: Have a way to uniquely identify your application or component, both within the collection of applications and services on a server and within a sea of servers in an infrastructure.<\/em><\/strong><\/p>\n<p>This applies partly to logging and partly to configuration. \u00a0It is also intended to apply to contracts, permissions, service enabling, etc., which relate to the business side of things. \u00a0Unix and Linux engineers love deep paths or dotted identifiers (a name like billing.invoicing.exporter, for example). \u00a0While it doesn&#8217;t have to be this technique, you get the idea.<\/p>\n<p><strong><em>#4: Have a place to validate the business relationship with a customer consuming the process.<\/em><\/strong><\/p>\n<p>For software engineers and system designers, one of the critical considerations is giving the business administration the ability to control what is available to a client. \u00a0If accounting has not been able to get a client to pay, how is the service for that client disabled? \u00a0If the system has no built-in way to check this, separate from the configuration it uses to operate itself, then some form of &#8220;workaround&#8221; develops that technically disables the service rather than denying the client access to it.<\/p>\n<p>It is not uncommon to see a client&#8217;s service disabled (as part of a bulk action) so that technical maintenance can be done, and then mistakenly re-enabled for that client afterward, even though there was a business reason to keep it off. 
\u00a0Without separate switches for a technical disable and a business disable, it&#8217;s easy to cross the streams and get confused.<\/p>\n<p><strong><em>#5: Implement Flight Tracking<\/em><\/strong><\/p>\n<p>While logging tells you what a process has done (right or wrong), it will not tell you whether a process ran when it should not have, did not run when it should have, or &#8230; just stopped processing. \u00a0The last case (stopped processing) can be deduced from a log, but only if someone goes looking for it.<\/p>\n<p>Flight Tracking is a concept borrowed from the aviation industry. \u00a0When a pilot plans to fly his aircraft from point A to point B, he files a flight plan with his intended departure time, route, and destination. \u00a0The pilot can cancel the flight plan before departure, or change it as needed. \u00a0But the purpose of the flight plan is to confirm that the pilot and his plane are where he said they would be, and to react if the aircraft is overdue and out of communication for a period of time, or never departed as scheduled.<\/p>\n<p>This is a good practice in an enterprise system too. \u00a0A specific process should report its launch to a central location, and then send periodic updates saying that it is still running and processing. \u00a0Finally, it should report its completion. \u00a0A monitoring layer can then compare each process&#8217;s flight status against its flight plan, and report processes that have stopped providing updates (hung or crashed) or that did not launch as scheduled. \u00a0This is an important feature in a system with defined SLAs.<\/p>\n<p>There are a number of passive monitors available (Nagios, etc.), but there are times when a passive monitor reports an application or service as running while it is actually doing nothing. \u00a0Building active flight status reporting into the application code itself gives you a higher level of confidence. 
\u00a0Think of it as an aircraft on autopilot. \u00a0Even if the pilot has passed out at the controls, the plane will look fine on radar for a while (passive monitoring). \u00a0Only direct radio contact with the pilot can confirm that the flight is going as intended.<\/p>\n<p>* * * * *<\/p>\n<p>There are a slew of other issues that need to be addressed in design, but these items are the core of protecting your sleep (and sometimes, even your sanity). \u00a0Together, these 5 core principles establish a standard, broad-based view of a system that keeps everyday operation as simple as possible, and keeps the developer focused on developing.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have been a software developer for over 30 years now. \u00a0That includes time as a kid playing with technology, time as a student, and professional and non-professional work. \u00a0Over that time I have encountered a lot of bad designs, and some good designs in systems and software. 
\u00a0I also have, like every person in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19,28,20,13,17],"tags":[],"class_list":["post-966","post","type-post","status-publish","format-standard","hentry","category-non-technicalthoughts","category-just-on-my-mind","category-technologynetworking","category-technologythoughts","category-tips-and-tricks"],"_links":{"self":[{"href":"https:\/\/blog.bitsofgenius.com\/index.php?rest_route=\/wp\/v2\/posts\/966","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.bitsofgenius.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.bitsofgenius.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.bitsofgenius.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.bitsofgenius.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=966"}],"version-history":[{"count":8,"href":"https:\/\/blog.bitsofgenius.com\/index.php?rest_route=\/wp\/v2\/posts\/966\/revisions"}],"predecessor-version":[{"id":983,"href":"https:\/\/blog.bitsofgenius.com\/index.php?rest_route=\/wp\/v2\/posts\/966\/revisions\/983"}],"wp:attachment":[{"href":"https:\/\/blog.bitsofgenius.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=966"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.bitsofgenius.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=966"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.bitsofgenius.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=966"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}