Functional safety of the FlexRay bus communication protocol standard

Electronic systems have been used in automobiles for decades and have greatly improved vehicle safety, energy saving and environmental performance. As these systems have developed, many of them need to share and exchange information, and to save wiring they are built into distributed embedded systems that rely on communication. Currently, 90% of such systems worldwide are based on the CAN bus. FlexRay is the de facto standard for the next generation of in-vehicle communication protocols, so its functional safety is critical.

1 IEC61508 functional safety requirements

Vehicle control systems are currently moving to x-by-wire technology, such as steer-by-wire and brake-by-wire. The ultimate goal of by-wire systems is to eliminate mechanical backups, because doing so reduces cost, increases design flexibility, broadens the range of applications and creates the conditions for new functions. However, eliminating the mechanical backup greatly increases the reliability required of the electronic system. A car is a moving object in a changing environment, and a malfunction can injure its occupants and others. Without mechanical backups, the requirement on the electronic system rises from today's fail-silent to fail-operational.

For industrial applications, functional safety requirements are set internationally by IEC61508, which is mainly concerned with the safety of the controlled equipment and its control system. Although it also applies to automobiles, a car has not only these functional safety problems but also concerns about the safety of the whole vehicle system when its functions change, so the automotive industry is developing the corresponding standard ISO26262. Automotive functional safety is divided into 4 levels (ASILs). The highest, ASIL D, corresponds to a failure probability of <10⁻⁸/h, equivalent to SIL3 of IEC61508. From practical experience, the failure probability allocated to communication is <10⁻¹⁰/h. A description of this can be found in the references.

The range of safety-critical applications is now expanding, and some systems that were not counted before now are. For example, the seat adjustment subsystem of the pre-crash safety system (presafe), the lighting control subsystem of the brake assist system, and the telematic subsystem that automatically calls for assistance after a collision will all be regarded as safety-critical.

1.1 Communication failures that cause system safety risks

Communication failures take five forms. The first is value-domain errors. The second is time-domain errors, which is where industrial (real-time) communication differs from general-purpose use: a message that cannot be delivered before its deadline loses its practical value. For example, if sensor messages related to airbag deployment are not delivered within a few ms, safety problems result. The third occurs in multicast or broadcast communication: data consistency errors (Byzantine errors), in which different nodes receive inconsistent results. These cause system-level failures, and any handling strategy must consider all affected nodes at the same time. The fourth is a system crash, caused not only by hardware failure but also by interference or software, such as a babbling idiot node blocking communication. The fifth is frame loss and other short-term failures, such as a recoverable bus-off, an equivalent bus-off state induced by a bug, and clique errors.
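The first three failure types can be illustrated with a minimal receiver-side sketch. The enum, the range limits, the deadline and the node names are all hypothetical. The point it shows is that value-domain errors, time-domain errors and frame loss can be checked locally by one receiver, whereas a Byzantine error only becomes visible once several nodes compare what they received.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Optional


class CommFailure(Enum):
    VALUE_DOMAIN = auto()   # value outside its plausible range
    TIME_DOMAIN = auto()    # delivered after its deadline
    BYZANTINE = auto()      # receivers disagree about the same message
    CRASH = auto()          # e.g. babbling idiot blocking the bus (not detected here)
    FRAME_LOSS = auto()     # frame missing in its slot


@dataclass
class Received:
    value: float
    arrival_ms: float


def local_check(msg: Optional[Received], lo: float, hi: float,
                deadline_ms: float) -> List[CommFailure]:
    """Failures that a single receiver can detect on its own."""
    if msg is None:
        return [CommFailure.FRAME_LOSS]
    failures = []
    if not lo <= msg.value <= hi:
        failures.append(CommFailure.VALUE_DOMAIN)
    if msg.arrival_ms > deadline_ms:
        failures.append(CommFailure.TIME_DOMAIN)
    return failures


def byzantine(views: Dict[str, float]) -> bool:
    """True if the nodes received different values for the same message.

    Only detectable after the nodes exchange what they received.
    """
    return len(set(views.values())) > 1
```

A crash such as a babbling idiot is deliberately not covered: it shows up at the bus level, which is what FlexRay's bus guardian is meant to address.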

1.2 Allowable failure rate of communication

In analyzing the impact of communication failures on system safety, the reference gives a method that derives the duration of communication failures from the possible duration of transient interference, and from that the system failure rate under an assumed communication failure rate. In its example, road sections where the electric field exceeds 100 V/m may cause communication failures, the failure rate is approximately 5×10⁻³, the vehicle speed is 90 km/h, and the estimated possible failure time is about 74 s. The communication cycle is 6 ms, and losing frames in 7 consecutive cycles is regarded as a system failure. Under these conditions the system failure rate is 1.6409×10⁻¹⁰, which is considered to meet the SIL4 requirement. The method is useful, but it rests on too many assumptions. The bit error rate varies over a wide interval; the frame length affects the failure probability of a single transmission; the interference duration is itself assumed; and whether losing 7 consecutive frames is acceptable depends on the application: at 90 km/h, a 42 ms loss of control corresponds to about 1 m of travel, and for a braking system the consequences could be entirely different. The analysis also allocates the whole SIL4 budget to communication and treats the failure rate related to the CPU and software as negligible; with software growing ever larger, this assumption is unreasonable. Moreover, other forms of communication failure should be considered when determining the system failure rate. For example, the time until cliques collide depends on the relative clock drift: the more accurate the clocks, the longer it takes for the conflict to appear, and the longer the failure lasts. In the reference, after a clique was created artificially, it took 300 ms for the conflict to be detected, far longer than the 42 ms above.
For this reason, general discussions of system safety usually stipulate separately that the communication failure rate be 1/100 of the failure rate of the corresponding safety level.
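The arithmetic behind the 42 ms and 1 m figures can be reproduced in a few lines. This is a sketch of the unit conversions only, not of the reference's failure-rate model; the function names are illustrative, and the 300 ms case uses the clique-detection time quoted from the reference.

```python
def outage_ms(cycle_ms: float, lost_cycles: int) -> float:
    """Gap before a system failure is declared: consecutive lost cycles x cycle length."""
    return cycle_ms * lost_cycles


def distance_m(speed_kmh: float, duration_ms: float) -> float:
    """Distance covered at constant speed during a communication outage."""
    return speed_kmh / 3.6 * duration_ms / 1000.0


gap = outage_ms(6, 7)              # 42 ms
d_gap = distance_m(90, gap)        # about 1.05 m
d_clique = distance_m(90, 300)     # about 7.5 m for the 300 ms clique case
```

The comparison makes the point of the paragraph concrete: an undetected clique lasts roughly seven times longer than the outage the SIL4 calculation tolerates.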

1.3 Factors affecting the communication failure rate

The achievable functional safety level depends on the coverage of fault detection. If some faults go undetected (because they are not recognized or their detection is not feasible), the corresponding failure scenarios are not accounted for, and the assigned safety level is wrong.

The reference introduces the concept of the safe failure fraction (SFF). Failures are divided into dangerous failures and safe failures, and each is further divided into detected and undetected failures. SFF is the fraction of all failures that are either safe or dangerous but detected. Diagnostic coverage (DC) is the fraction of dangerous failures that are detected. It follows that SFF is a linear function of DC. The SIL rating of IEC61508 is tied to SFF: with a hardware fault tolerance of one, SIL3 corresponds to an SFF of 90% to 99%. DC therefore also determines the achievable SIL. According to the article, transient faults are two orders of magnitude more likely than hardware failures, so it can be roughly inferred that the diagnostic coverage of transient faults should reach 90% to 99%. Since communication faults can cause dangerous failures, diagnostic coverage becomes an important criterion for evaluating communication protocols.
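The two ratios can be written directly from their definitions. With λ_S for safe failures, λ_DD and λ_DU for detected and undetected dangerous failures, and λ_D = λ_DD + λ_DU, substituting λ_DD = DC·λ_D gives SFF = (λ_S + DC·λ_D) / (λ_S + λ_D), which is linear in DC for fixed λ_S and λ_D. The failure rates used in the check are invented for illustration.

```python
def sff(lam_s: float, lam_dd: float, lam_du: float) -> float:
    """Safe failure fraction: (safe + dangerous detected) / all failures."""
    return (lam_s + lam_dd) / (lam_s + lam_dd + lam_du)


def dc(lam_dd: float, lam_du: float) -> float:
    """Diagnostic coverage: dangerous detected / all dangerous failures."""
    return lam_dd / (lam_dd + lam_du)


def sff_from_dc(lam_s: float, lam_d: float, coverage: float) -> float:
    """SFF written as a linear function of DC for fixed lam_s and lam_d."""
    return (lam_s + coverage * lam_d) / (lam_s + lam_d)


# Hypothetical rates (any consistent unit, e.g. failures per 10^9 h):
# 50 safe, 45 dangerous detected, 5 dangerous undetected.
coverage = dc(45.0, 5.0)                    # 0.9
fraction = sff(50.0, 45.0, 5.0)             # 0.95
same = sff_from_dc(50.0, 50.0, coverage)    # 0.95 via the linear form
```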

In communication, the CRC can miss errors, which is an obvious gap in diagnostic coverage: the non-covered fraction equals the residual rate of erroneous frames that go undetected, as with undetected corrupted CAN frames.
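A minimal sketch of why CRC coverage can never be complete: a CRC is linear, so any error pattern that is a multiple of the generator polynomial leaves the checksum unchanged. The generator 0x5D6DCB and initial value 0xFEDCBA are the values commonly cited for the FlexRay 24-bit frame CRC on channel A, but treat them as assumptions to verify against the specification; the bit framing is simplified, so this does not reproduce a real FlexRay frame CRC. The helper names are hypothetical.

```python
import random

POLY = 0x5D6DCB   # assumed FlexRay frame CRC generator (x^24 term implicit)
WIDTH = 24
INIT = 0xFEDCBA   # assumed channel A initial value


def crc_bits(bits, poly=POLY, width=WIDTH, init=INIT):
    """Bitwise MSB-first CRC over a list of 0/1 bits."""
    reg = init
    mask = (1 << width) - 1
    for b in bits:
        feedback = ((reg >> (width - 1)) & 1) ^ b
        reg = (reg << 1) & mask
        if feedback:
            reg ^= poly
    return reg


def generator_pattern(poly=POLY, width=WIDTH):
    """Bits of the full generator g(x) = x^width + poly, MSB first."""
    g = (1 << width) | poly
    return [(g >> i) & 1 for i in range(width, -1, -1)]


def flip(bits, pattern, offset):
    """XOR an error pattern into a copy of the message at the given offset."""
    out = list(bits)
    for i, p in enumerate(pattern):
        out[offset + i] ^= p
    return out


random.seed(0)
message = [random.randint(0, 1) for _ in range(200)]
reference = crc_bits(message)
# Error pattern g(x) * x^k: the CRC is unchanged, so the error goes undetected.
undetected = crc_bits(flip(message, generator_pattern(), 37)) == reference
# A single-bit error is never a multiple of g(x), so it is always detected.
detected = crc_bits(flip(message, [1], 100)) != reference
```

Such multi-bit patterns are rare at low bit error rates, which is why the residual error rate is small but not zero.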

Value-domain errors, time-domain errors and frame loss in communication are dangerous failures that can be diagnosed (these are the main subject of this analysis). Masquerade errors, Byzantine errors and the like should be counted as undetected dangerous failures. When a clique error occurs, both frame loss and Byzantine errors may result. The equivalent bus-off failure of CAN is likewise a dangerous failure caused by a gap in diagnostic coverage. The share of these undiagnosed failures in all dangerous failures is still hard to calculate, because the failure probability model is hard to determine. Qualitatively, however, diagnostic coverage (and thus the SIL) can only be raised by adding diagnosis of masquerade, Byzantine and clique errors.

2 Introduction to FlexRay

Because by-wire technology improves vehicle handling, lowers production and running costs, and improves safety, energy saving, environmental protection and comfort, it has become an important part of the progress of whole-vehicle technology. However, eliminating the mechanical or hydraulic backup greatly raises the reliability requirements on the control devices and their communication, which places stricter demands on communication bandwidth and determinism. The CAN bus cannot meet this bandwidth requirement and is insufficiently deterministic, so FlexRay was developed. According to the standard, FlexRay supports bus, star and tree topologies. It provides a two-channel controller structure that can be configured either for redundant communication or for independent operation of each channel, giving great flexibility. Each channel runs at up to 10 Mb/s. FlexRay is a time-triggered protocol synchronized by a distributed clock. The schedule is defined by cycle / static slot / minislot: a cycle contains a fixed, configured number of static slots and minislots, all static slots have the same configured duration, and so do all minislots. A node can occupy several static slots in one cycle. Static slots can also be multiplexed, meaning the same static slot can be assigned to different nodes in different cycles. The data field of a FlexRay frame can be up to 254 bytes. The header carries control information such as the frame identifier and payload length and is protected by its own 11-bit header CRC; the trailer carries a 24-bit CRC covering the whole frame. FlexRay also includes a bus guardian to protect against time-domain errors.
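Slot multiplexing can be pictured as a lookup from (slot, cycle) to the sending node. The sketch below uses a base cycle and a repetition factor over the 64-value FlexRay cycle counter (0 to 63), similar to how cycle multiplexing is usually configured; the class, the node names and the slot numbers are hypothetical. It also checks that no (slot, cycle) pair is claimed twice, one of the many constraints that make cycle design difficult.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CYCLE_COUNT = 64  # FlexRay cycle counter runs from 0 to 63


@dataclass(frozen=True)
class SlotAssignment:
    slot: int          # static slot ID
    base_cycle: int    # first cycle in which the node sends
    repetition: int    # sends every `repetition` cycles (should divide 64)
    node: str


def sends(a: SlotAssignment, cycle: int) -> bool:
    return cycle % a.repetition == a.base_cycle


def owner(schedule: List[SlotAssignment], slot: int, cycle: int) -> Optional[str]:
    """Node that transmits in `slot` during `cycle`, or None if the slot is empty."""
    for a in schedule:
        if a.slot == slot and sends(a, cycle):
            return a.node
    return None


def conflicts(schedule: List[SlotAssignment]) -> List[Tuple[int, int]]:
    """(slot, cycle) pairs claimed by more than one assignment."""
    users = {}
    for cycle in range(CYCLE_COUNT):
        for a in schedule:
            if sends(a, cycle):
                users.setdefault((a.slot, cycle), []).append(a.node)
    return sorted(k for k, v in users.items() if len(v) > 1)


# Hypothetical schedule: slot 1 every cycle; slot 2 alternates between B and C.
SCHEDULE = [
    SlotAssignment(1, 0, 1, "A"),
    SlotAssignment(2, 0, 2, "B"),
    SlotAssignment(2, 1, 2, "C"),
]
```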

As for the shortcomings of FlexRay, the reference mentions that the physical-layer wiring is difficult and affects signal integrity; an active star makes this easier but raises cost. The cycle design has many constraints and is difficult. The configuration of synchronization and startup nodes is tied to fault tolerance, which is a challenge. Because resources are limited, upgrades and evolution are difficult (which undermines the composability usually claimed as an advantage of time-triggered protocols; author's note). The reference also describes how separate clock-synchronized cliques can form in FlexRay: every node is still communicating, but there is no effective communication between the two groups, which is a fault condition. The proposed remedy is to use 3 coldstart nodes and 3 sync nodes, but this conflicts with the requirement for fault-tolerant clock synchronization. Another remedy, filling the schedule completely to prevent cliques from forming, conflicts with leaving room for future upgrades. In short, there is no complete solution. Finally, the protocol clocks may all drift in the same direction, and the resulting offset from the application clock can leave a frame's data not ready in time, or overwritten before it is sent, so that a frame is missed. Although FlexRay is designed for high dependability, errors that occur in transmission are left to the application layer to handle, which brings new problems. This article analyzes what happens if they are not handled.
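The drift problem can be estimated roughly. If two clocks are not corrected against each other (an application clock versus the FlexRay cycle, or two cliques that each synchronize only internally), their offset grows at the difference of their drift rates, and 1 ppm is 1 µs of offset per second. The sketch assumes constant drift rates; the function name, the margin and the drift values are hypothetical.

```python
import math


def time_to_exhaust_margin_s(margin_us: float, drift_a_ppm: float,
                             drift_b_ppm: float) -> float:
    """Seconds until two uncorrected clocks drift apart by margin_us.

    The offset grows linearly at |drift_a - drift_b| ppm, i.e. microseconds per second.
    """
    relative_ppm = abs(drift_a_ppm - drift_b_ppm)
    if relative_ppm == 0:
        return math.inf  # identical rates never drift apart
    return margin_us / relative_ppm


# Hypothetical: 10 us of slack, clocks at +50 ppm and -50 ppm.
t = time_to_exhaust_margin_s(10.0, 50.0, -50.0)   # 0.1 s
```

The inverse relationship matches the observation in section 1.2: better clocks postpone the collision, so the fault stays latent for longer.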

3 Functional safety levels for Audi and BMW FlexRay bus applications

BMW and Audi were the first to use the FlexRay bus in volume production. Their detailed usage could not be found, but the reference gives some of the parameters, which allow a preliminary analysis.

3.1 Audi parameters

Audi uses a 5 ms cycle with 62 static slots; each static slot carries a frame with a 42-byte payload, and the static segment lasts 4.03 ms. Eight ECUs transmit a total of 220 protocol data units (PDUs), which are packed into frames and finally transmitted in 27 slots. According to the period distribution provided, there are 8 messages with a 5 ms period, 1 with a 10 ms period, 7 with a 20 ms period and 6 with a 40 ms period. The remaining messages with longer periods are ignored for now.
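The frame rate and the static slot length follow directly from these parameters. The sketch counts only the listed periodic messages and counts each frame once even if it is sent on both channels; the names are illustrative.

```python
# Periodic messages from the Audi example: period (ms) -> number of messages
PERIODS_MS = {5: 8, 10: 1, 20: 7, 40: 6}


def frames_per_second(periods_ms):
    """Sum of count / period over all message groups."""
    return sum(count * 1000.0 / period for period, count in periods_ms.items())


def static_slot_us(static_segment_ms, n_slots):
    """Duration of one static slot in microseconds."""
    return static_segment_ms * 1000.0 / n_slots


fps = frames_per_second(PERIODS_MS)      # 1600 + 100 + 350 + 150 = 2200 frames/s
per_hour = fps * 3600.0                  # 7.92e6 frames/h
slot = static_slot_us(4.03, 62)          # about 65 us per static slot
```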

For this payload, the encoded frame length works out to about 500 bits. Assuming a bit error rate of ber = 1×10⁻⁷ (quite good for copper), the frame error rate is fer ≈ 500 × 1×10⁻⁷ = 5×10⁻⁵ per frame.
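The 500-bit figure can be reconstructed from the FlexRay frame format as we understand it: a 5-byte header, the payload and a 3-byte CRC trailer, with a 2-bit byte start sequence sent before every byte on the wire. The transmission start, frame start and frame end sequences are ignored, so the true length is slightly longer. The constant and function names are illustrative, and bit errors are assumed independent.

```python
HEADER_BYTES = 5    # 40-bit header, including the 11-bit header CRC
TRAILER_BYTES = 3   # 24-bit frame CRC
BSS_BITS = 2        # byte start sequence sent before every byte


def encoded_frame_bits(payload_bytes: int) -> int:
    """Bits on the wire for header + payload + trailer.

    Ignores the transmission start, frame start and frame end sequences.
    """
    n_bytes = HEADER_BYTES + payload_bytes + TRAILER_BYTES
    return n_bytes * (8 + BSS_BITS)


def frame_error_rate(ber: float, bits: int) -> float:
    """Probability that at least one bit is corrupted (independent bit errors)."""
    return 1.0 - (1.0 - ber) ** bits


bits = encoded_frame_bits(42)          # 50 bytes x 10 bits = 500
fer = frame_error_rate(1e-7, bits)     # about 5e-5, matching ber * bits
```

At 10 Mb/s, 500 bits take 50 µs, which fits within the roughly 65 µs static slot given by 4.03 ms / 62 slots.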

From the periods, the 5 ms, 10 ms, 20 ms and 40 ms messages contribute 8 × 200 + 1 × 100 + 7 × 50 + 6 × 25 = 2200 frames per second, so the number of frames per hour is n = 2200 × 3600 = 7.92×10⁶ frames/h. Assuming each frame is transmitted on both channels simultaneously, the probability that both copies fail is fer² = 2.5×10⁻⁹ per frame. The probability that all frames are transmitted successfully within 1 hour is P = (1 − fer²)ⁿ.

The probability of one or more errors in one hour is 1 − P ≈ fer² × n = 2.5×10⁻⁹ × 7.92×10⁶ /h ≈ 1.98×10⁻²/h. The SIL2 requirement on the system failure probability is 10⁻⁷/h, which leaves 10⁻⁹/h for communication, so there is a huge gap.
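The hourly estimate can be checked numerically, computing the exact value 1 − (1 − fer²)ⁿ alongside the linear approximation. Like the text, the sketch assumes errors on the two channels are independent, which common-cause interference hitting both channels would violate. The communication budget uses the 1/100 allocation from section 1.2; the function names are illustrative.

```python
import math


def p_any_dual_loss_per_hour(fer: float, frames_per_hour: float,
                             channels: int = 2) -> float:
    """Probability of at least one frame lost on every channel within an hour.

    Computes 1 - (1 - fer**channels)**n stably; assumes the channels fail
    independently of each other.
    """
    p_frame = fer ** channels
    return -math.expm1(frames_per_hour * math.log1p(-p_frame))


def comm_budget(system_rate_per_h: float, share: float = 0.01) -> float:
    """Failure rate left for communication under the 1/100 allocation rule."""
    return system_rate_per_h * share


fer = 5e-5
n = 7.92e6
exact = p_any_dual_loss_per_hour(fer, n)   # about 1.96e-2
approx = fer ** 2 * n                       # about 1.98e-2
budget = comm_budget(1e-7)                  # 1e-9 per hour, the SIL2 figure used in the text
```

The exact and approximate values agree to within about 1% because fer² × n is small, and both exceed the 10⁻⁹/h budget by roughly seven orders of magnitude.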
