Common Tick-Borne Diseases (TBDS) — Global Lyme Alliance


While Lyme disease is the most common vector-borne illness, ticks may carry multiple infectious organisms (co-infections) along with Lyme. Here are some common tick-borne diseases. Future research may uncover additional co-infections.

ANAPLASMOSIS – Caused by the bacterium Anaplasma phagocytophilum. The illness was previously known as human granulocytic ehrlichiosis (HGE) and has more recently been called human granulocytic anaplasmosis (HGA). One to two weeks after a bite, watch for fever, headache, chills, abdominal pain and muscle aches.

BABESIOSIS – A malaria-like, parasitic infection of red blood cells. Most cases are caused by the parasite Babesia microti, though other strains of Babesia are occasionally responsible. Symptoms can range from mild to life-threatening and include high fever, sweats, nausea, headaches and fatigue.

BARTONELLA HENSELAE – CAT-SCRATCH DISEASE – Bacterial disease starting with a red mark that can become swollen and discolored or even look like “stretchmarks”/striations. Symptoms include swollen lymph nodes (especially under ears) often with conjunctivitis, heart or spleen problems, bone lesions, hepatitis, other eye problems and encephalitis (causing seizures and coma). There is indirect clinical evidence that this is a group of “Bartonella-like organisms” that can co-infect a Lyme patient. Identification of these organisms awaits further scientific study.

BORRELIA MIYAMOTOI – One of the newer tick-borne infections, it is distantly related to the bacteria that cause Lyme disease. Patients with this infection are most likely to have relapsing fever, chills, headache, fatigue, body and joint pain. Left untreated, it can morph into a disease that causes cardiac, neurologic and arthritic problems.

BOURBON VIRUS – An extremely rare new virus reported in two people in Kansas and Oklahoma. The Oklahoma patient made a full recovery, while the first and only other case in Bourbon, Kansas was deadly. Symptoms include fever, fatigue, rash, muscle and joint pain.

COLORADO TICK FEVER – Viral disease characterized by a high fever and sometimes a faint rash. After a 2-3 day remission, symptoms recur, accompanied by a drop in white blood cells. Complications may include encephalitis, heart problems and severe bleeding.

EHRLICHIOSIS – Rickettsial infections (HME and HGE forms) of white blood cells. A rash may occur. Severe illness may have neurologic complications. Delayed treatment can result in death.

HEARTLAND VIRUS – Studies suggest that the Heartland virus, first identified in Missouri, may be spread through the bite of an infected Lone Star tick. To date, the virus has been found in Missouri, Tennessee and Oklahoma. Symptoms include easy bruising, diarrhea, fever, headaches, fatigue, appetite loss and muscle pain. There are no known treatments for it.

MYCOPLASMA – Although not necessarily transmitted along with the Borrelia bacterium via an infected tick, Mycoplasma fermentans organisms are often present in Lyme disease patients. Smaller than bacteria, they invade all cells throughout the body and disrupt the immune system, causing severe fatigue, joint pain, nausea and neuropsychiatric problems.

POWASSAN ENCEPHALITIS – Viral brain infection causing seizures, aphasia, muscle weakness, dementia and death. There is no effective treatment, and onset of illness is fairly rapid.

RELAPSING FEVER – Multi-system bacterial infection with symptoms similar to Lyme disease. Characterized by repeating bouts of fever lasting 2-9 days, alternating with periods of no fever.

ROCKY MOUNTAIN SPOTTED FEVER – Caused by the bacterium Rickettsia rickettsii, it is transmitted by the American dog tick, Rocky Mountain wood tick and brown dog tick. It can cause a reddish-to-black rash resembling measles, though in some patients a rash never develops. It can be a severe or even fatal illness if not treated in the first few days of symptoms.

SOUTHERN TICK-ASSOCIATED RASH ILLNESS (STARI) – Also known as Masters disease, this disease is transmitted by the Lone Star tick. Early STARI symptoms are similar to the symptoms of early Lyme disease, including a circular rash. The rash may be accompanied by fatigue, headache, fever and joint pains.

TICK PARALYSIS – Loss of motor function and increasing paralysis caused by a toxic reaction to saliva from female ticks.

TULAREMIA – A bacterial infection (sometimes called rabbit fever), the hallmark of which is the development of an ulcer at the site of infection. The illness can manifest in various symptom complexes, including spiking fevers, inflamed lymph nodes and eyes, pneumonia and weight loss.

How to Take Care of Cold Sores at Home

If you’ve ever had a cold sore, you know the signs. It starts with the tingling, then the edge of your lip or the corner of your mouth begins to burn. Then the outbreak: An ugly red sore appears. A few days later it breaks open and crusts over.

Cold sores, or fever blisters, are a bother in more ways than one. They’re not only painful, they can ruin your smile. When you feel one budding, you want to get rid of it, fast.

But you probably don’t need a doctor. There are things that you can do at home to soothe the pain and make cold sores look nicer as they heal.


Cold sores are caused by a common virus called herpes simplex. Most people get exposed to the virus when they’re babies or children. There’s no cure for it. Once you’ve been exposed to it, it’s always in your system, even if it doesn’t often cause cold sores or other symptoms.

Herpes simplex is spread by close contact. If you kiss someone with a cold sore, or you touch his face and then touch your own face, you can catch the virus. You can also get herpes simplex by sharing lip balm, a fork, a mug or a razor with someone who has it. You’re most likely to get the virus from someone who has an active cold sore, but it’s also possible to contract it from someone who doesn’t have a sore or blister showing.

The virus also can spread to the eyes or the genitals. This can happen, for example, if you rub your eyes after getting saliva from an infected person on your hands, or if you receive oral sex from someone who has cold sores.

When you’re first exposed to the virus, you’re likely to get a cold sore. After a week or two, it’ll go away on its own. Then the virus goes dormant in your body. You may never have another cold sore outbreak again, but many people do.

Some things that make an outbreak more likely are:

  • A cold or other illness
  • A fever
  • Stress
  • Too much sun
  • Your period


How to treat cold sores

There are many things you can do at home to soothe the sting of a cold sore, such as:

Ice. You can numb the pain if you apply a cold compress to the sore. Don’t put ice directly on your skin — that could damage it.

Pain relievers. When a cold sore really stings, you may get some relief from an over-the-counter painkiller like acetaminophen.

Over-the-counter creams. There are products available at the drugstore that can help reduce the pain of a cold sore or help keep the skin soft while it heals.

Aloe vera gel. The same gel used for sunburn may help a cold sore to heal. Lab research has shown the gel may help fight viruses, including herpes simplex.

Avoid triggers. This means that if you know a hot, sunny day at the beach or a lot of stress makes you break out in cold sores, try to stay out of those situations when you can. You may be able to stop it in its tracks, or at least keep it from getting worse.

Don’t touch. If you pick your cold sore, you may spread the virus to another part of your body. That will just make your outbreak worse. Keep your hands away from your mouth, and wash your hands often, especially when you touch your face.


American Academy of Dermatology: “Herpes simplex: Signs and symptoms,” “Herpes simplex: Who gets and causes,” “Herpes simplex: Diagnosis and treatment,” “Herpes simplex: Tips for managing.”

American Academy of Family Physicians: “Mouth Problems.”

Office on Women’s Health, U.S. Department of Health and Human Services: “Oral health fact sheet.”

American Academy of Otolaryngology-Head and Neck Surgery: “Mouth Sores.”

Journal of Dentistry (Shiraz): “Assessment of Anti HSV-1 Activity of Aloe Vera Gel Extract: an In Vitro Study.”

Is Your Excessive Sweating Caused by a Medical Problem?

Sweating may be a symptom of thyroid problems, diabetes, or infection.

Do you sweat more than other people? Does a five-minute workout on the treadmill leave you sopping wet? Do you wipe your hand before every handshake?

At the very least, excessive sweating is a hassle. But sometimes heavy sweating is a sign of a medical condition.

“It’s not always easy for the average person to know the difference,” says Benjamin Barankin, MD, a dermatologist in Toronto and a member of the American Academy of Dermatology.

Excessive sweating, or hyperhidrosis, can be a warning sign of thyroid problems, diabetes or infection. Excessive sweating is also more common in people who are overweight or out of shape.

The good news is that most cases of excessive sweating are harmless. If you are worried about how much you sweat, here’s information to help you decide if you should see a doctor for a medical diagnosis.

What Is Excessive Sweating?

If you just sweat more than other people when it’s hot or you’re exerting yourself, that’s not usually a sign of trouble. Sweating is a normal reaction when your body’s working harder and needs to cool itself down.

“There are natural variations in how people sweat, just as there are variations in other bodily functions,” says Dee Anna Glaser, MD, vice chair of the dermatology department at St. Louis University and president of the International Hyperhidrosis Society. “Some people start sweating more easily than others.”

True excessive sweating goes beyond the normal physical need to sweat. If you have hyperhidrosis, you may sweat heavily for no reason — when it’s not appropriate to the circumstances.

“Let’s say that the temperature is mild, and you’re not anxious, and you don’t have a fever, and you’re just watching a movie with your family,” says Glaser. “If you’re sitting there sweating profusely, that’s not normal.”

Barankin says that there are two basic types of excessive sweating: localized hyperhidrosis and generalized hyperhidrosis.

Localized Sweating: Primary Focal Hyperhidrosis

The most common cause of excessive sweating is called primary focal hyperhidrosis. This form of hyperhidrosis affects about 1% to 3% of the population, and usually starts in childhood or adolescence.


Primary focal hyperhidrosis does not cause illness. Basically, you just sweat excessively. Although it is a medical condition, it’s not a sign of disease or a drug interaction. People who have it are otherwise healthy.

The symptoms of primary focal hyperhidrosis are fairly specific. It’s called focal or localized because it only affects specific parts of the body, such as the underarms, groin, head, face, hands, or feet. Symptoms also tend to be symmetrical, occurring on both sides equally.

Why does it happen? Experts aren’t sure, but primary focal hyperhidrosis seems to stem from a minor malfunction in the nervous system. There’s some evidence that it could run in families.

While primary focal hyperhidrosis isn’t medically risky, it can cause problems in your life. “Primary focal hyperhidrosis can really interfere with your quality of life,” Glaser says.

Some people are merely inconvenienced by excessive sweating. Others are so embarrassed that they limit their social and work life in harmful ways.

Generalized Sweating: Secondary General Hyperhidrosis

This less common form of hyperhidrosis causes sweating all over the body — not just on the hands or feet. Secondary general hyperhidrosis is also more serious medically. It’s called secondary because it’s being caused by something else, such as an underlying health condition.

One telltale sign of secondary hyperhidrosis is excessive generalized sweating at night.

What can trigger secondary general hyperhidrosis? There are many possibilities, including a number of different medical conditions and diseases, such as thyroid problems, diabetes, and infection.

What about anxiety? People who are anxious — or have actual anxiety disorders — may sweat more than others. But experts say that anxious sweating isn’t the same as hyperhidrosis. (In some people, however, the two conditions can occur at the same time.)

Medications can also cause general excessive sweating. Medications that can cause sweating include:

  • Some psychiatric drugs
  • Some blood pressure medications
  • Some medicines for dry mouth
  • Some antibiotics
  • Some supplements

Excessive Sweating: Signs You Should See the Doctor

Should you see a doctor about your excessive sweating? Yes, if you have these symptoms:


Night sweats: if you’re waking up in a cold sweat or you find your pillowcase and sheets are damp in the morning.

Generalized sweating: if you’re sweating all over your body, and not just from your head, face, underarms, groin, hands, or feet.

Asymmetrical sweating: if you notice that you’re only sweating from one side of your body, like one armpit.

Sudden changes: if your sweating has suddenly gotten worse.

Late onset: if you develop excessive sweating when you’re middle-aged or older. The more common primary focal hyperhidrosis usually starts in teenagers and young adults.

Symptoms after medication changes: if an outbreak of excessive sweating started up after you began a new drug.

Sweating accompanied by other symptoms, like fatigue, insomnia, increased thirst, increased urination, or cough.

Even if you don’t have those symptoms, if excessive sweating is bothering you or interfering with your life, talk to your doctor. Remember to bring along a list of all the drugs you take, including over-the-counter drugs and supplements. Your doctor may want to check your medications and run some tests.

Treating Excessive Sweating

While there is no cure for primary focal hyperhidrosis, there are ways to help control the symptoms. They include:

  • Antiperspirants. Special over-the-counter or prescription sprays, lotions, and roll-ons can help control symptoms.
  • Iontophoresis. This treatment uses low-level electrical impulses to temporarily disable the sweat glands.
  • Medications. Some drugs can stop the sweat glands from kicking into action.
  • Botox. Injections of Botox can temporarily stop the nerves from triggering excessive sweating. It is approved for treatment of excessive underarm sweating.
  • Surgery. One approach is to cut a nerve in the chest that triggers excessive sweating. Another is to surgically remove some of the sweat glands.

Secondary hyperhidrosis can often be treated too, although the right approach depends on the condition causing it.

For instance, hyperhidrosis caused by an overactive thyroid may be resolved by treating the thyroid with medication or surgery. Excessive sweating caused by diabetes may disappear once glucose levels are under control. If a medication is causing your excessive sweating, your doctor may be able to prescribe a different drug.


Sometimes, the underlying cause of hyperhidrosis can’t be cured. Or you might really need a medicine that’s causing excessive sweating as a side effect.

However, if that’s the case, there are still things you can do, Glaser says.

“We try to just treat the symptom even when we can’t cure the underlying disease,” says Glaser. She says that many of the same treatments for primary focal hyperhidrosis work quite well in these cases. They include topical treatments, oral drugs, and Botox.

Getting Help for Excessive Sweating

Experts say that excessive sweating is something that people don’t take seriously enough. Many ignore their symptoms for months, years, and sometimes decades. That’s a bad idea for a couple of reasons.

First of all, it could have grave health consequences. “Excessive sweating can be a sign of a serious underlying health condition,” says Glaser. “Getting it diagnosed and treated sooner rather than later could really make a difference.”

Second, even when excessive sweating isn’t a sign of a more serious medical problem, getting expert help can be crucial.

“A lot of people don’t realize the impact that their symptoms are having,” says Glaser. In high school, they cover themselves up in layers and avoid school dances. As adults, they shy away from dating or socializing after work. Over time, they set up barriers between themselves and other people. But with treatment, that can all change.

“We have treatments that really work,” Glaser says. “They could make a huge improvement in your work life, your personal life, and your self-esteem.”

Barankin agrees. “For many people with hyperhidrosis, treatment is life-altering,” he tells WebMD. “They’re so grateful. They’re probably the happiest patients I see.”


Benjamin Barankin, MD, dermatologist, Toronto; member, American Academy of Dermatology.

Dee Anna Glaser, MD, vice chair, department of dermatology, St. Louis University; president, International Hyperhidrosis Society.

American Academy of Dermatology web site: “Hyperhidrosis.”

eMedicine web site: “Hyperhidrosis.”

International Hyperhidrosis Society web site: “Understanding Hyperhidrosis,” “Diseases and Conditions that Can Cause Hyperhidrosis,” “Common Drugs/Medications Known to Cause Hyperhidrosis.” web site: “Hyperhidrosis.”

The Society of Thoracic Surgeons web site: “Hyperhidrosis.”




An Introduction to Metrics, Monitoring, and Alerting


Understanding the state of your infrastructure and systems is essential for ensuring the reliability and stability of your services. Information about the health and performance of your deployments not only helps your team react to issues, it also gives them the security to make changes with confidence. One of the best ways to gain this insight is with a robust monitoring system that gathers metrics, visualizes data, and alerts operators when things appear to be broken.

In this guide, we will discuss what metrics, monitoring, and alerting are. We will talk about why they are important, what types of opportunities they provide, and the type of data you may wish to track. We will be introducing some key terminology along the way and will end with a short glossary of some other terms you might come across while exploring this space.

What Are Metrics, Monitoring and Alerting?

Metrics, monitoring, and alerting are all interrelated concepts that together form the basis of a monitoring system. They provide visibility into the health of your systems, help you understand trends in usage or behavior, and show the impact of changes you make. If the metrics fall outside of your expected ranges, these systems can send notifications to prompt an operator to take a look, and can then assist in surfacing information to help identify the possible causes.

In this section, we’ll take a look at these individual concepts and how they fit together.

What Are Metrics and Why Do We Collect Them?

Metrics represent the raw measurements of resource usage or behavior that can be observed and collected throughout your systems. These might be low-level usage summaries provided by the operating system, or they can be higher-level types of data tied to the specific functionality or work of a component, like requests served per second or membership in a pool of web servers. Some metrics are presented in relation to a total capacity, while others are represented as a rate that indicates the “busyness” of a component.
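As a rough illustration of those two presentations, the sketch below turns raw readings into a percentage of capacity and a per-second rate. The function names and the sample numbers are hypothetical and chosen only for this example.

```python
def utilization_percent(used: float, capacity: float) -> float:
    """Express a usage reading relative to total capacity (0-100)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity


def rate_per_second(count_start: int, count_end: int, seconds: float) -> float:
    """Turn two readings of an ever-increasing counter into a rate."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (count_end - count_start) / seconds


# Hypothetical readings: 42 GB used of an 80 GB disk, and a request
# counter that moved from 10,000 to 10,600 over a 60-second window.
disk_pct = utilization_percent(42, 80)
requests_rate = rate_per_second(10_000, 10_600, 60)
```

The first value describes how full a resource is, while the second describes how busy a component is over a time window.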

Often, the easiest metrics to begin with are those already exposed by your operating system to represent the usage of underlying physical resources. Data about disk space, CPU load, swap usage, etc. are already available, provide value immediately, and can be forwarded to a monitoring system without much additional work. Many web servers, database servers, and other software also provide their own metrics which can be passed forward as well.

For other components, especially your own applications, you may have to add code or interfaces to expose the metrics you care about. Collecting and exposing metrics is sometimes known as adding instrumentation to your services.
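One possible way to add instrumentation to your own code is a small decorator that counts calls and errors and accumulates time spent. This is a minimal sketch, not a specific library's API: the `instrumented` decorator, the `METRICS` dictionary, and the `checkout` function are all hypothetical names.

```python
import time
from collections import defaultdict
from functools import wraps

# In-process metric store. A real service would forward these values
# to a monitoring system instead of keeping them in a dictionary.
METRICS = defaultdict(float)


def instrumented(name):
    """Hypothetical decorator that records calls, errors, and time spent."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            METRICS[f"{name}.calls"] += 1
            try:
                return func(*args, **kwargs)
            except Exception:
                METRICS[f"{name}.errors"] += 1
                raise
            finally:
                # Runs on both success and failure.
                METRICS[f"{name}.seconds"] += time.perf_counter() - start
        return wrapper
    return decorator


@instrumented("checkout")
def checkout(cart_total):
    """Hypothetical application function being measured."""
    if cart_total < 0:
        raise ValueError("cart total cannot be negative")
    return f"charged {cart_total}"
```

After a few calls to `checkout`, `METRICS` holds the call count, error count, and cumulative duration that a collector could then ship elsewhere.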

Metrics are useful because they provide insight into the behavior and health of your systems, especially when analyzed in aggregate. They represent the raw material used by your monitoring system to build a holistic view of your environment, automate responses to changes, and alert human beings when required. Metrics are the basic values used to understand historic trends, correlate diverse factors, and measure changes in your performance, consumption, or error rates.

What is Monitoring?

While metrics represent the data in your system, monitoring is the process of collecting, aggregating, and analyzing those values to improve awareness of your components’ characteristics and behavior. The data from various parts of your environment are collected into a monitoring system that is responsible for storage, aggregation, visualization, and initiating automated responses when the values meet specific requirements.

In general, the difference between metrics and monitoring mirrors the difference between data and information. Data is composed of raw, unprocessed facts, while information is produced by analyzing and organizing data to build context that provides value. Monitoring takes metrics data, aggregates it, and presents it in various ways that allow humans to extract insights from the collection of individual pieces.

Monitoring systems fulfill many related functions. Their first responsibility is to accept and store incoming and historical data. While values representing the current point in time are useful, it is almost always more helpful to view those numbers in relation to past values to provide context around changes and trends. This means that a monitoring system should be capable of managing data over periods of time, which may involve sampling or aggregating older data.

Secondly, monitoring systems typically provide visualizations of data. While metrics can be displayed and understood as individual values or tables, humans are much better at recognizing trends and understanding how components fit together when information is organized in a visually meaningful way. Monitoring systems usually represent the components they measure with configurable graphs and dashboards. This makes it possible to understand the interaction of complex variables or changes within a system by glancing at a display.

An additional function that monitoring systems provide is organizing and correlating data from various inputs. For the metrics to be useful, administrators need to be able to recognize patterns between different resources and across groups of servers. For example, if an application experiences a spike in error rates, an administrator should be able to use the monitoring system to discover if that event coincides with the capacity exhaustion of a related resource.
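The correlation described here can be sketched as a comparison of two time series on a shared timeline. The series names, sample values, and thresholds below are invented for illustration; a monitoring system would normally do this through its query or graphing features.

```python
def breaches(series, threshold):
    """Return the set of timestamps whose value exceeds the threshold."""
    return {timestamp for timestamp, value in series if value > threshold}


# Hypothetical per-minute samples as (minute, value) pairs.
error_rate_pct = [(1, 0.5), (2, 0.7), (3, 9.0), (4, 11.0), (5, 0.6)]
db_connections_used_pct = [(1, 60), (2, 85), (3, 100), (4, 100), (5, 70)]

error_spikes = breaches(error_rate_pct, 5.0)
pool_exhausted = breaches(db_connections_used_pct, 99)

# Minutes where the error spike coincides with connection pool exhaustion.
coinciding = sorted(error_spikes & pool_exhausted)
```

Overlap in time does not prove a cause, but it tells an administrator where to look first.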

Finally, monitoring systems are typically used as a platform for defining and activating alerts, which we will talk about next.

What is Alerting?

Alerting is the responsive component of a monitoring system that performs actions based on changes in metric values. Alert definitions are composed of two components: a metrics-based condition or threshold, and an action to perform when the values fall outside of the acceptable conditions.
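The two-part structure of an alert definition might be modeled as follows. This is only a sketch under assumed names (`AlertRule`, `disk_rule`, and the 90% threshold are hypothetical), and real monitoring tools express the same idea in their own configuration formats.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    """A hypothetical alert definition: a condition plus an action."""
    name: str
    condition: Callable[[float], bool]    # True when the value is unacceptable
    action: Callable[[str, float], None]  # what to do when the rule fires

    def evaluate(self, value: float) -> bool:
        fired = self.condition(value)
        if fired:
            self.action(self.name, value)
        return fired


notifications = []  # stands in for email, chat, or paging integrations

disk_rule = AlertRule(
    name="disk_usage_high",
    condition=lambda percent: percent > 90,
    action=lambda name, value: notifications.append(f"{name}: {value}%"),
)

disk_rule.evaluate(75)  # within range, so the action does not run
disk_rule.evaluate(95)  # outside range, so the action runs
```

Keeping the condition and the action separate makes it possible to reuse one notification action across many conditions.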

While monitoring systems are incredibly useful for active interpretation and investigation, one of the primary benefits of a complete monitoring system is letting administrators disengage from the system. Alerts allow you to define situations that make sense to actively manage, while relying on the passive monitoring of the software to watch for changing conditions.

While notifying responsible parties is the most common action for alerting, some programmatic responses can be triggered based on threshold violations as well. For instance, an alert that indicates that you need more CPU to process the current load can be responded to with a script that auto-scales that layer of your application. While this isn’t strictly an alert since it doesn’t result in a notification, the same monitoring system mechanism can often be used to kick off these processes as well.

However, the main purpose of alerting is still to bring human attention to bear on the current status of your systems. Automating responses is an important mechanism for ensuring that notifications are only triggered for situations that require consideration from a knowledgeable human being. The alert itself should contain information about what is wrong and where to go to find additional information. The individual responding to the alert can then use the monitoring system and associated tooling like log files to investigate the cause of the problem and implement a mitigation strategy.

Infrastructure of even moderate complexity requires distinctions in alert severity so that the responsible teams or individuals can be notified using methods appropriate to the scale of the problem. For instance, rising utilization of storage might warrant a work ticket or email, while an increase in client-facing error rates or unresponsiveness might require sending a page to on-call staff.
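Severity-based routing could look something like the sketch below. The metric names, thresholds, and channel names are assumptions made up for this example and would differ in any real deployment.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    WARNING = "warning"
    CRITICAL = "critical"


# Hypothetical mapping from severity to notification method.
ROUTES = {
    Severity.WARNING: "ticket",
    Severity.CRITICAL: "page",
}

# Hypothetical thresholds: (limit, severity when the limit is exceeded).
THRESHOLDS = {
    "storage_used_percent": (80, Severity.WARNING),
    "client_error_rate_percent": (5, Severity.CRITICAL),
}


def route_alert(metric: str, value: float) -> Optional[str]:
    """Return the channel to notify, or None if the value is acceptable."""
    limit, severity = THRESHOLDS[metric]
    if value <= limit:
        return None
    return ROUTES[severity]
```

With these assumed settings, `route_alert("storage_used_percent", 85)` yields a ticket, while a client-facing error rate above its limit yields a page.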

What Type of Information Is Important to Track?

The types of values you monitor and the information you track will probably change as your infrastructure evolves. Since systems usually function hierarchically, with more complex layers building on top of more primitive infrastructure, it can be useful to think about the metrics available at these different levels when planning your monitoring strategy.

Host-Based Metrics

Towards the bottom of the hierarchy of primitive metrics are host-based indicators. These would be anything involved in evaluating the health or performance of an individual machine, disregarding for the moment its application stacks and services. These mainly consist of usage or performance figures for the operating system or hardware, like the disk space, CPU load, and swap usage mentioned earlier.

These can give you a sense of factors that may impact a single computer’s ability to remain stable or perform work.
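A few host-level readings of this kind can be gathered with Python's standard library alone, as in the sketch below. The function name and dictionary keys are hypothetical, and `os.getloadavg()` is only available on Unix-like systems, so the load average is treated as optional.

```python
import os
import shutil


def collect_host_metrics(path="/"):
    """Gather a few host-level readings using only the standard library."""
    usage = shutil.disk_usage(path)
    metrics = {
        "disk_used_percent": 100.0 * usage.used / usage.total if usage.total else 0.0,
        "cpu_count": os.cpu_count(),
    }
    # The load average is not available on every platform.
    if hasattr(os, "getloadavg"):
        try:
            metrics["load_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return metrics


host_metrics = collect_host_metrics()
```

A collection agent would typically run something like this on an interval and forward the results to the monitoring system.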

Application Metrics

The next category of metrics you may want to look at are application metrics. These are metrics concerned with units of processing or work that depend on the host-level resources, like services or applications. The specific types of metrics to look at depend on what the service is providing, what dependencies it has, and what other components it interacts with. Metrics at this level are indicators of the health, performance, or load of an application:

  • Error and success rates
  • Service failures and restarts
  • Performance and latency of responses
  • Resource usage

These indicators help determine whether an application is functioning correctly and with efficiency.
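To make two of these indicators concrete, the sketch below derives an error rate and a latency percentile from a small request log. The log entries are invented, the helper names are hypothetical, and the percentile uses the simple nearest-rank method, which is one of several common definitions.

```python
def error_rate(statuses):
    """Fraction of responses with a 5xx status code."""
    if not statuses:
        return 0.0
    return sum(1 for status in statuses if status >= 500) / len(statuses)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an int, 1..100)."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling division
    return ordered[max(rank, 1) - 1]


# Hypothetical request log: (HTTP status, latency in milliseconds).
requests_log = [(200, 12), (200, 15), (500, 230), (200, 18), (200, 14),
                (404, 9), (200, 16), (503, 410), (200, 13), (200, 17)]

statuses = [status for status, _ in requests_log]
latencies = [ms for _, ms in requests_log]

app_error_rate = error_rate(statuses)
p90_latency = percentile(latencies, 90)
```

Percentiles are often more informative than averages for latency, because a handful of slow responses can hide behind a healthy-looking mean.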

Network and Connectivity Metrics

For most types of infrastructure, network and connectivity indicators will be another dataset worth exploring. These are important gauges of outward-facing availability, but are also essential in ensuring that services are accessible to other machines for any systems that span more than one machine. Like the other metrics we’ve discussed so far, networks should be checked for their overall functional correctness and their ability to deliver necessary performance by looking at:

  • Connectivity
  • Error rates and packet loss
  • Latency
  • Bandwidth utilization

Monitoring your networking layer can help you improve the availability and responsiveness of both your internal and external services.
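A basic connectivity and latency probe can be sketched with a single TCP connection attempt, alongside a packet loss calculation from sent and received counts. Both helper names are hypothetical, and a real check would usually be repeated and averaged rather than run once.

```python
import socket
import time


def tcp_check(host, port, timeout=2.0):
    """Return (reachable, latency_seconds) for one TCP connection attempt."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False, None


def packet_loss_percent(sent, received):
    """Percentage of probes that did not receive a reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent
```

For example, `tcp_check("db.internal.example", 5432)` (a made-up host name) would report whether that port accepts connections and how long the handshake took.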

Server Pool Metrics

When dealing with horizontally scaled infrastructure, another layer of infrastructure you will need to add metrics for is pools of servers. While metrics about individual servers are useful, at scale a service is better represented as the ability of a collection of machines to perform work and respond adequately to requests. This type of metric is in many ways just a higher level extrapolation of application and server metrics, but the resources in this case are homogeneous servers instead of machine-level components. Some data you might want to track are:

  • Pooled resource usage
  • Scaling adjustment indicators
  • Degraded instances

Collecting data that summarizes the health of collections of servers is important for understanding the actual capabilities of your system to handle load and respond to changes.
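A pool-level summary of this kind might be computed from per-server readings as shown below. The server names, the shape of the input dictionary, and the 90% CPU cutoff for a degraded instance are all assumptions made for the example.

```python
def summarize_pool(servers, degraded_cpu=90.0):
    """Summarize a pool from per-server readings.

    `servers` maps a server name to {"healthy": bool, "cpu_percent": float}.
    """
    healthy = [s for s in servers.values() if s["healthy"]]
    degraded = sorted(
        name for name, s in servers.items()
        if not s["healthy"] or s["cpu_percent"] >= degraded_cpu
    )
    avg_cpu = (
        sum(s["cpu_percent"] for s in healthy) / len(healthy) if healthy else 0.0
    )
    return {
        "size": len(servers),
        "healthy": len(healthy),
        "avg_cpu_percent": avg_cpu,
        "degraded": degraded,
    }


# Hypothetical pool of four web servers.
pool = {
    "web-1": {"healthy": True, "cpu_percent": 40.0},
    "web-2": {"healthy": True, "cpu_percent": 95.0},
    "web-3": {"healthy": False, "cpu_percent": 0.0},
    "web-4": {"healthy": True, "cpu_percent": 45.0},
}
pool_summary = summarize_pool(pool)
```

The summary treats the pool as the unit of capacity: one unhealthy server matters less than the share of the pool that can still do work.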

External Dependency Metrics

Other metrics you may wish to add to your system are those related to external dependencies. Often, services provide status pages or an API to discover service outages, but tracking these within your own systems—as well as your actual interactions with the service—can help you identify problems with your providers that may affect your operations. Some items that might be applicable to track at this level are:

  • Service status and availability
  • Success and error rates
  • Run rate and operational costs
  • Resource exhaustion

There are many other types of metrics that can be helpful to collect. Conceptualizing the most important information at varying levels of focus can help you identify indicators that are most useful for predicting or identifying problems. Keep in mind that the most valuable metrics on higher levels are likely to be resources provided by lower layers.

Factors That Affect What You Choose to Monitor

For peace of mind, in an ideal world you would track everything related to your systems from the beginning in case an item may one day be relevant to you. However, there are many reasons why this might not be possible or even desirable.

A few factors that can affect what you choose to collect and act on are:

  • Resources available for tracking: Depending on your human resources, infrastructure, and budget, you will have to limit the scope of what you keep track of to what you can afford to implement and reasonably manage.
  • The complexity and purpose of your application: The complexity of your application or systems can have a large impact on what you choose to track. Items that might be mission critical for some software might not be important at all in others.
  • The deployment environment: While robust monitoring is most important for production systems, staging and testing systems also benefit from monitoring, though there may be differences in severity, granularity, and the overall metrics measured.
  • The likelihood of the metric being useful: One of the most important factors affecting whether something is measured is its potential to help in the future. Each additional metric tracked increases the complexity of the system and takes up resources. The necessity of data can change over time as well, requiring reevaluation at regular intervals.
  • How essential stability is: Simply put, stability and uptime might not be priorities for certain types of personal or early stage projects.

The factors that influence your decisions will depend on your available resources, the maturity of your project, and the level of service you require.

Important Qualities of a Metrics, Monitoring, and Alerting System

While each monitoring application or service will have its strengths and weaknesses, the best options often share some important qualities. A few of the more important characteristics to look for when evaluating monitoring systems are below.

Independent from Most Other Infrastructure

One of the most basic requirements of an adequate monitoring system is to be external to other services. While it’s sometimes useful to group services together, a monitoring system’s core responsibilities, its helpfulness in diagnosing problems, and its relationship to the watched systems mean that it’s important for your monitoring system to be independently accessible. Your monitoring system will inevitably have some effect on the systems it monitors, but you should aim to keep this minimal to reduce the impact your tracking has on performance and to increase the reliability of your monitoring in the event of other system problems.

Reliable and Trustworthy

Another basic requirement is reliability. As a monitoring system is responsible for gathering, storing, and providing access to high value information, it is important that you can trust it to operate correctly on a daily basis. Dropped metrics, service outages, and unreliable alerting can all have an immediate harmful impact on your ability to manage your infrastructure effectively. This applies not only to the core software reliability, but also to the configuration you enable, since mistakes like inaccurate alerting can lead to a loss of trust in the system.

Easy to Use Summary and Detail Views

The ability to display high-level summaries and ask for greater detail on demand is an important feature to ensure that the metrics data is useful and consumable by human operators. Designing dashboards that present the most commonly viewed data in an immediately intelligible manner can help users understand system state at a glance. Many different dashboard views can be created for different job functions or areas of interest.

Equally important is the ability to drill down from within summary displays to surface the information most pertinent to the current task. Dynamically adjusting the scale of graphs, toggling off unnecessary metrics, and overlaying information from multiple systems are essential to making the tool useful interactively for investigations or root cause analysis.

Effective Strategy for Maintaining Historical Data

A monitoring system is most useful when it has a rich history of data that can help establish trends, patterns, and consistencies over long timelines. While ideally, all information would be retained indefinitely in its original granularity, cost and resource constraints can sometimes make it necessary to store older data at a reduced resolution. Monitoring systems with the flexibility to work with data both at full granularity and in a sampled format provide a wider range of options for how to handle an ever increasing amount of data.
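As a rough illustration of storing older data at a reduced resolution, the sketch below averages raw data points into fixed-width time buckets. The function name `downsample`, the `(timestamp, value)` tuple layout, and the sample readings are assumptions made for this example, not part of any particular monitoring product.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the bucket it falls into.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    # One averaged point per bucket, ordered by bucket start time.
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical readings collected every 10 seconds.
raw = [(0, 10.0), (10, 20.0), (20, 30.0), (30, 40.0),
       (40, 50.0), (50, 60.0), (60, 70.0)]

# Reduce the series to one point per minute.
per_minute = downsample(raw, 60)
print(per_minute)
```

Averaging is only one possible reduction. Because it smooths out short spikes, a real system might also keep the minimum and maximum of each bucket.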

A related feature that is helpful is the ability to easily import existing data sets. If reducing the information density of your historic metrics is not an attractive option, offloading older data to a long-term storage solution might be a better alternative. In this case, you don’t need to maintain older data within the system, but you need to be able to reload it in bulk when you wish to analyze or use it.

Able to Correlate Factors from Different Sources

The monitoring system is responsible for providing a holistic view of your entire infrastructure, so it needs to be able to display related information, even if it comes from different systems or has different characteristics. Administrators should be able to glue together information from disparate parts of their systems at will to understand potential interactions and overall status across the entire infrastructure. Ensuring that time synchronization is configured across your systems is a prerequisite to being able to correlate data from different systems reliably.
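One simple way to picture this kind of correlation is to join two series on shared time buckets. The sketch below assumes both sources report `(timestamp, value)` pairs from synchronized clocks. The `align` function and the sample series are hypothetical and exist only for illustration.

```python
def align(series_a, series_b, bucket_seconds):
    """Join two (timestamp, value) series on shared time buckets."""
    def bucketed(series):
        # Map each point to the start of its bucket. If several points land
        # in the same bucket, the last one seen wins.
        return {ts - (ts % bucket_seconds): value for ts, value in series}

    a = bucketed(series_a)
    b = bucketed(series_b)
    # Keep only the buckets for which both sources reported a value.
    return [(ts, a[ts], b[ts]) for ts in sorted(a.keys() & b.keys())]


# Hypothetical metrics from two different systems, sampled at slightly
# different moments.
web_latency_ms = [(1, 120), (11, 450), (21, 130)]
db_connections = [(3, 40), (12, 95), (33, 42)]

for row in align(web_latency_ms, db_connections, 10):
    print(row)
```

If the clocks on the two systems drift apart, points can land in the wrong bucket, which is why time synchronization matters before attempting this kind of join.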

Easy to Start Tracking New Metrics or Infrastructure

In order for your monitoring system to be an accurate representation of your systems, you need to be able to make adjustments as the machines and infrastructure change. A minimal amount of friction when adding additional machines will help you do so. Equally important is the ability to easily remove decommissioned machines without destroying the collected data associated with them. The system should make these operations as simple as possible to encourage setting up monitoring as part of the instance provisioning or retirement process.

A related ability that is important is the ease with which the monitoring system can be set up to track entirely new metrics. This depends on the way that metrics are defined in the core monitoring configuration as well as the variety and quality of mechanisms available to send metric data to the system. Defining new metrics is usually more complex than adding additional machines, but reducing the complexity of adding or adjusting metrics will help your team respond to changing requirements in an appropriate time frame.

Flexible and Powerful Alerting

One of the most important aspects of a monitoring system to evaluate is its alerting capabilities. Aside from very strict reliability requirements, the alerting system needs to be flexible enough to notify operators through multiple mediums and powerful enough to be able to compose thoughtful, actionable notification triggers. Many systems defer the responsibility of actually delivering notifications to other parties by offering integrations with existing paging services or messenger applications. This minimizes the responsibility of the alerting functionality and usually provides more flexible options since the plugin just needs to consume an external API.

The part that the monitoring system cannot defer, however, is defining the alerting parameters. Alerts are defined based on values falling outside of acceptable ranges, but the definitions can require some nuance in order to avoid over-alerting. For instance, momentary spikes are often not a concern, but sustained elevated load may require operator attention. Being able to clearly define the parameters for an alert is a requirement for composing a robust, trustworthy set of alert conditions.
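The difference between a momentary spike and sustained load can be sketched as a condition that fires only when a value stays above a threshold for a minimum duration. The function `sustained_breach`, its parameters, and the sample load values are assumptions for this example. Real alerting systems express the same idea in their own rule syntax.

```python
def sustained_breach(points, threshold, min_duration):
    """Return True if values stay above threshold for at least min_duration seconds.

    points is a time-ordered list of (timestamp, value) pairs.
    """
    breach_start = None
    for timestamp, value in points:
        if value > threshold:
            if breach_start is None:
                # First point of a new run above the threshold.
                breach_start = timestamp
            if timestamp - breach_start >= min_duration:
                return True
        else:
            # The value recovered, so any earlier run no longer counts.
            breach_start = None
    return False


# Hypothetical load readings (fraction of capacity) taken every 10 seconds.
spike = [(0, 0.50), (10, 0.95), (20, 0.40), (30, 0.50)]
sustained = [(0, 0.50), (10, 0.95), (20, 0.97), (30, 0.96), (40, 0.99)]

print(sustained_breach(spike, threshold=0.9, min_duration=30))
print(sustained_breach(sustained, threshold=0.9, min_duration=30))
```

With these inputs, the single spike does not trigger the condition, while the run of elevated readings does.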

Additional Terminology

As you explore the monitoring ecosystem, you’ll start to encounter a set of shared terminology that is frequently used to discuss characteristics of monitoring systems, the data being handled, and different trade-offs that require consideration. While in no way exhaustive, the list below can help introduce you to some of the terms you’re most likely to come across.

  • Observability: Although not strictly defined, observability is a general term used to describe processes and techniques related to increasing awareness and visibility into systems. This can include monitoring, metrics, visualization, tracing, and log analysis.
  • Resource: In the context of monitoring and software systems, a resource is any exhaustible or limited dependency. What is considered a resource can vary greatly based on the part of the system being discussed.
  • Latency: Latency is a measure of the time it takes to complete an action. Depending on the component, this can be a measure of processing, response, or travel time.
  • Throughput: Throughput represents the maximum rate of processing or traversal that a system can handle. This can be dependent on software or hardware design. Often there is an important distinction between theoretical throughput and practical observed throughput.
  • Performance: Performance is a general measure of how efficiently a system is completing work. Performance is an umbrella term that often encompasses work factors like throughput, latency, or resource consumption.
  • Saturation: Saturation is a measure of the amount of capacity being used. Full saturation indicates that 100% of the capacity is currently in use.
  • Visualization: Visualization is the process of presenting metrics data in a format that allows for quick, intuitive interpretation through graphs or charts.
  • Log aggregation: Log aggregation is the act of compiling, organizing, and indexing log files to allow for easier management, searching, and analysis. While separate from monitoring, aggregated logs can be used in conjunction with the monitoring system to identify causes and investigate failures.
  • Data point: A data point is a single measurement of a single metric.
  • Data set: A data set is a collection of data points for a metric.
  • Units: Units are the context for a measured value. A unit defines the magnitude, scope, or quantity of a measurement to understand extent and allow comparison.
  • Percentage Units: Percentage units are measurements that are taken as a part of a finite whole. A percentage unit indicates how much a value is out of the total possible amount.
  • Rate Units: Rate units indicate the magnitude of a metric over a constant period of time.
  • Time series: Time series data is a series of data points that represent changes over time. Most metrics are best represented by a time series because single data points often represent a value at a specific time and the resulting series of points is used to show changes over time.
  • Sampling rate: Sampling rate is a measurement of how often a representative data point is collected in lieu of continuous collection. A higher sampling rate more accurately represents the measured behavior, but requires more resources to handle the extra data points.
  • Resolution: Resolution refers to the density of data points that make up a data set. Collections with higher resolutions over the same time frame indicate a higher sampling rate and a more granular view of the same behavior.
  • Instrumentation: Instrumentation is the ability to track the behavior and performance of software. This is accomplished by adding code and configuration to software to output data that can then be consumed by a monitoring system.
  • The observer effect: The observer effect is the impact of the monitoring system itself on the phenomena being observed. Since monitoring takes up resources, the act of measuring behavior and performance will alter the values produced. Monitoring systems seek to avoid adding unnecessary overhead to minimize this impact.
  • Over-monitoring: Over-monitoring occurs when the quantity of metrics and alerts configured is inversely related to their usefulness. Over-monitoring can cause stress on the infrastructure, make it difficult to find relevant data, and cause teams to lose trust in their monitoring and alerting systems.
  • Alert fatigue: Alert fatigue is the human response of desensitization that results from frequent, unreliable, or improperly prioritized alerts. Alert fatigue can cause operators to ignore severe problems and is usually an indication that alert conditions need to be reevaluated.
  • Threshold: When alerting, a threshold is the boundary between acceptable and unacceptable values which triggers an alert if exceeded. Often alerts are configured to trigger when a value exceeds the threshold for a certain period of time, in order to avoid sending an alert for temporary spikes.
  • Quantile: A quantile is a dividing point used to separate a dataset into distinct groups based on their values. Quantiles are used to put values into “buckets” that represent segments of a population of data. Often, this is used to separate common values from outliers to better understand what constitutes representative and extreme cases.
  • Trend: A trend is the general direction that a set of values is indicating. Trends are more reliable than single values in determining the general state of the component being tracked.
  • White-box monitoring: White-box monitoring is a term used to describe monitoring that relies on access to the internal state of the components being measured. White-box monitoring can provide a detailed understanding of system state and is helpful for identifying causes of problems.
  • Black-box monitoring: Black-box monitoring is monitoring that observes the external state of a system or component by looking only at its inputs, outputs, and behavior. This type of monitoring can closely align with a user’s experience of a system, but is less useful for finding the cause of problems.
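
To make the quantile entry above more concrete, the sketch below computes percentiles with the nearest-rank method, which is one of several common definitions. The function name `percentile` and the latency values are assumptions chosen for illustration.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p percent
    of the data at or below it. p is a number in the range (0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # Rank is 1-based, so subtract one to index into the sorted list.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with one outlier.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 250]

print("median:", percentile(latencies_ms, 50))
print("p90:", percentile(latencies_ms, 90))
print("p99:", percentile(latencies_ms, 99))
```

Here the median and 90th percentile describe the typical request, while the 99th percentile surfaces the outlier that an average would blur into every other value.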


Gathering metrics, monitoring components, and configuring alerts is an essential part of setting up and managing production infrastructure. Being able to tell what is happening within your systems, what resources need attention, and what is causing a slowdown or outage is invaluable. While designing and implementing your monitoring setup can be a challenge, it is an investment that can help your team to prioritize their work, delegate the responsibility of oversight to an automated system, and understand the impact of your infrastructure and software on your stability and performance.
