Abstract
Federated learning is a new distributed learning framework that allows multiple participants to train a shared model collaboratively without exposing their private training data. However, this novel learning mechanism may still suffer unprecedented security and privacy threats from various attackers. This paper focuses on the security and privacy challenges facing federated learning. First, it introduces the basic concepts of federated learning and its threat model, which helps in understanding the attacks it faces. Second, it summarizes three types of attacks launched by malicious insiders and analyzes the security and privacy vulnerabilities of the federated learning architecture. Next, it surveys state-of-the-art defense schemes based on differential privacy, homomorphic cryptosystems, and secure multi-party aggregation. Finally, by summarizing and comparing these solutions, it discusses future research directions in this field.
Machine learning methods have demonstrated clear advantages in most recognition-related fields.
To address this problem, federated learning was proposed.
Despite the obvious advantages described above, federated learning still faces security and privacy problems, for three main reasons.
Existing research has shown that in federated learning, both the security of the learned model and users' private information are exposed to a series of passive and active attacks.
Federated learning is a distributed learning framework first proposed by Google.
Since federated learning offers significant advantages over traditional machine learning in the model training phase, this paper focuses its analysis on the main characteristics of that phase. A complete federated learning training round mainly includes the following steps: (1) the server distributes the current global model to the selected participants; (2) each participant trains the model on its local data; (3) the participants upload their local model updates to the server; (4) the server aggregates the received updates into a new global model.
In the federated learning process, the above steps are executed iteratively to optimize the current global model, and the iteration stops when the global model parameters satisfy the convergence condition. In practice, mini-batch stochastic gradient descent (MSGD) is well suited for building federated optimization algorithms, and its local training strategy is
$$w_{t+1}^{k} = w_{t} - \eta\,\nabla F_k\!\left(w_{t};\, b\right) \quad (1)$$

where $w_t$ denotes the local model parameters distributed by the server to the users in communication round $t$, $b$ denotes the local training batch size, and $\eta$ denotes the learning rate.
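The local update rule of Eq. (1) can be sketched as plain mini-batch SGD. The linear least-squares model, and all names here (`local_update`, `batch_size`), are illustrative assumptions rather than the implementation of any cited system.

```python
import numpy as np

def local_update(w_t, X, y, batch_size=32, lr=0.1, epochs=1, seed=0):
    """One client's local training pass of Eq. (1): w <- w - lr * grad,
    computed on mini-batches of size `batch_size` from the local data."""
    rng = np.random.default_rng(seed)
    w = w_t.copy()
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = order[start:start + batch_size]
            # gradient of the mean squared error on this mini-batch
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w
```

The returned parameters play the role of $w_{t+1}^{k}$, which the client would then upload to the server.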
For example, in Google's FedSGD algorithm, the server updates the global model by averaging the local updates received in each round:
$$w_{t+1} = \frac{1}{m}\sum_{k=1}^{m} w_{t+1}^{k} \quad (2)$$

where $w_{t+1}$ denotes the server-side global model parameters in round $t+1$, $m$ denotes the number of participants selected in this round, and $w_{t+1}^{k}$ denotes the local model update the server receives from user $k$.
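A minimal sketch of one communication round under this plain (unweighted) averaging rule; the function names `fed_avg` and `run_round` are assumptions for illustration.

```python
import numpy as np

def fed_avg(local_updates):
    """Server-side step of Eq. (2): the new global model is the plain
    average of the m local parameter vectors received this round."""
    return np.mean(np.stack(local_updates), axis=0)

def run_round(w_global, clients, local_train):
    """One communication round: distribute w_global, collect each
    client's locally trained parameters, then aggregate them."""
    updates = [local_train(w_global, data) for data in clients]
    return fed_avg(updates)
```

Real deployments often weight each update by the client's sample count instead of averaging uniformly; the structure of the round is the same.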
In the federated learning scenario, attacks can be launched not only by an untrusted server but also by malicious users.
Whether in a passive or an active attack scenario, the attacker's main goal is to undermine the basic properties of the learned model, chiefly confidentiality, integrity, and availability.
A malicious user can simulate other users' training samples from the global model parameters, for instance by deploying an additional generative adversarial network (GAN). At the same time, a malicious user fully controls the local training process and can therefore modify the model hyperparameters (e.g., batch size, number of epochs, and learning rate) or the local model updates (e.g., the local training results). An untrusted server, in turn, may infer unintended information (e.g., gradient changes and ground-truth labels) from the parameters uploaded by each participant. Furthermore, an untrusted server can collude with a subset of malicious users to steal fine-grained private information from other users (e.g., a specific user's gradients).
At present, attacks on federated learning stem mainly from insiders participating in the learning process and from its distinctive training strategy. First, because the local model parameters implicitly encode information about a user, the user's sensitive information may be leaked to the server once these parameters are uploaded for federated averaging. Second, because the global model parameters are shared, a user's private information may also be leaked to other participants. Finally, because the server cannot access users' local data, it cannot determine whether an uploaded local update was generated by correctly executing the learning protocol, so forged local updates are hard to detect. The main attack types in federated learning are summarized below.
Poisoning attacks have already been studied extensively in the training phase of centralized learning scenarios.
In a label-flipping attack, a malicious user flips the labels of samples to embed pre-crafted attack points into the training data, causing the trained model to deviate from the intended decision boundary.

Fig. 1 Poisoning attack with label-flipping in federated learning
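The poisoning step itself is trivial for the attacker: a one-line relabeling of its local data before training. This sketch assumes integer class ids; `src` and `dst` are placeholders.

```python
def flip_labels(labels, src, dst):
    """Label-flipping poisoning sketch: relabel every local sample of
    class `src` as class `dst` before training, so the resulting model
    tends to misclassify `src` inputs as `dst`."""
    return [dst if y == src else y for y in labels]
```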
Because the attacker participates in the whole federated learning process as an ordinary user, its acquisition of knowledge such as the model structure and global parameters is regarded as normal behavior. Meanwhile, because the entire training process is executed locally, the server cannot supervise it, so it is very difficult to inspect the attacker's training samples.
$$\widetilde{w}_{t+1}^{adv} = \lambda\, w_{t+1}^{adv} \quad (3)$$

where $w_{t+1}^{adv}$ denotes the attacker's poisoned update parameters and $\lambda$ denotes the scaling factor.
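A toy numeric demonstration of why the scaling in Eq. (3) matters: averaging over m participants dilutes an unscaled poisoned update by 1/m, whereas boosting it by m lets it survive aggregation intact. All names here are illustrative assumptions.

```python
import numpy as np

def aggregate(deltas):
    """Server's plain average of the submitted model updates."""
    return np.mean(np.stack(deltas), axis=0)

def boosted(poisoned_delta, m):
    """Eq. (3) sketch: scale the poisoned update by the number of
    participants m so that averaging does not wash it out."""
    return m * poisoned_delta
```

With two honest zero updates and one boosted poisoned update among m = 3 submissions, the aggregate equals the poisoned update exactly; without boosting, only a third of it remains.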
Unlike label-flipping attacks, a backdoor attack requires the attacker to train the target deep neural network (DNN) model on carefully crafted training data containing specific hidden patterns. These patterns, called "backdoor triggers", can cause the learned model to produce predictions at inference time that diverge drastically from the ground truth.

Fig. 2 An explanation of backdoor attack
Under the federated learning framework, an attacker can train its local model on backdoored data and submit a scaled version of the training result to amplify the backdoor's influence on the global model.
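The data-poisoning half of such an attack can be sketched as stamping a small trigger pattern into each image and relabeling it with the attacker's target class. The patch size, position, and pixel value below are illustrative assumptions.

```python
import numpy as np

def add_trigger(images, target_label, trigger_value=1.0):
    """Backdoor sketch: stamp a 3x3 pixel pattern (the 'trigger') into
    the bottom-right corner of each image and relabel all of them with
    the attacker's target class."""
    poisoned = images.copy()
    poisoned[:, -3:, -3:] = trigger_value
    labels = np.full(len(images), target_label)
    return poisoned, labels
```

At inference time, any input carrying the same trigger pattern is pushed toward `target_label`, while clean inputs are largely unaffected.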
The federated learning framework is highly vulnerable to such poisoning attacks for the following reasons: (1) a federated learning system contains a large number of participants and is likely to include one or more malicious users; (2) because users' local data and training processes are invisible to the server, the reliability of a particular user's uploaded update cannot be verified; (3) because local updates generated by different users may differ greatly, anomaly detection over user updates becomes extremely challenging.
Hitaj et al. proposed a user-side GAN-based attack on collaborative learning, in which a malicious participant trains a generative adversarial network locally to reconstruct class representatives of other participants' private training data.
As shown in Fig. 3, the attacker's GAN is trained with the standard minimax objective

$$\min_G \max_D V(D, G) = \mathbb{E}_{x\sim p_{data}(x)}\!\left[\log D(x)\right] + \mathbb{E}_{z\sim p_z(z)}\!\left[\log\!\left(1 - D(G(z))\right)\right] \quad (4)$$

where $p_{data}(x)$ denotes the distribution of the original data and $p_z(z)$ denotes the distribution of the random noise vector. The generated samples are mislabeled and fed into the local model to update the global model parameters. In this way, the victim is forced to train locally on more samples in order to distinguish correct from incorrect training samples, which greatly helps to improve the discriminator iteratively. The power of the user-side GAN attack lies in the fact that the attacker can covertly complete all malicious actions without compromising any entity, and can masquerade as a normal participant while smoothly executing the established protocol.
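In practice the expectations in Eq. (4) are estimated per batch from the discriminator's scores on real and generated samples. A minimal sketch of that estimate (the function name is an assumption):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of the minimax value in Eq. (4):
    mean log D(x) over a real batch plus mean log(1 - D(G(z)))
    over a generated batch. Scores must lie in (0, 1)."""
    d_real = np.asarray(d_real)
    d_fake = np.asarray(d_fake)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
```

A discriminator that separates real from generated samples well (high scores on real, low on fake) attains a higher value than a confused one scoring 0.5 everywhere.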

Fig. 3 User-side GAN-based attack in federated learning
Nevertheless, the user-side GAN attack exposes several limitations. First, the mislabeled training samples not only corrupt the global model but also break the balance of the victim's data sampling. Second, because the adversarial influence of the mislabeled samples is greatly weakened after model averaging, the attack becomes less effective in federated learning. Third, because the attacker can only obtain the aggregated and averaged model updates through the central server, the attack is likewise limited in the class-level sample reconstruction stage.
To overcome these limitations, Wang et al. introduced a server-side GAN-based attack for inferring user-level samples.

Fig. 4 Server-side GAN-based attack in federated learning
As described above, the federated learning mechanism requires every participant to train the global model on its local dataset and upload the resulting gradients. In this setting, if the system contains an untrusted and knowledgeable server, users' private data cannot be protected. Such a server can obtain abundant auxiliary knowledge about each participant's local training (e.g., the model structure, user identities, and gradients) and is fully capable of extracting users' private information. For example, Aono et al. designed a server-side inference attack that can recover information about a user's training inputs from the uploaded gradients.
Recently, Melis et al. showed that the model updates exchanged during collaborative learning leak unintended information about participants' training data, enabling membership inference and property inference against other participants.
First, the impact of the privacy threats described above is enormous, because an attacker only needs to masquerade as an ordinary entity, join the federated learning system, and secretly carry out malicious activities. Second, for the two GAN-based attacks, it is difficult to distinguish the generated samples from training inputs of the same class; the generated samples are merely visually similar to the target training data rather than exact reconstructions. Third, in the user-side GAN attack, the attacker can force the targeted user to release more sensitive information by uploading crafted model updates. However, this attack is only effective when all training samples of the target class belong to the targeted user.
This section briefly introduces several feasible strategies for building a secure and privacy-preserving federated learning environment from three perspectives: differential privacy (DP), homomorphic cryptosystems (HC), and secure multi-party aggregation (SMA). The solutions covered in this section are summarized below.
Shen et al. proposed an indirectly collaborative deep learning framework to defend against poisoning attacks in collaborative deep learning systems.
Adversarial training is a proactive defense technique: already during the training phase, the model anticipates every permutation of the adversary's attacks, making the machine learning model robust to known adversarial attacks.
DP is a privacy-preserving technique widely used in industry and academia. Its core idea is to add calibrated noise to private sensitive attributes so that each user's privacy is protected. Meanwhile, compared with the privacy protection gained, the loss in statistical data quality caused by the noise added for each user is negligible. In federated learning, DP is introduced to add noise to the parameters uploaded by participants in order to prevent reverse data retrieval.
DP mechanisms have been widely applied in data publishing systems, mainly by adding random noise (e.g., Laplace or Gaussian noise) to the dataset to hide the true results of data queries. In the context of deep learning, DP can serve as a local privacy solution to keep user gradients private. Abadi et al. proposed differentially private SGD (DP-SGD), which clips per-example gradients and adds Gaussian noise to their sum, and introduced the moments accountant to track the cumulative privacy loss during training.
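A Gaussian-mechanism sketch in the spirit of DP-SGD: clip each per-example gradient, sum, add noise, and average. The parameter names and the noise multiplier here are illustrative assumptions, not a calibrated privacy budget.

```python
import numpy as np

def privatize(per_example_grads, clip_norm=1.0, noise_mult=1.1, seed=0):
    """Clip each per-example gradient to L2 norm `clip_norm`, sum the
    clipped gradients, add N(0, sigma^2) noise with
    sigma = noise_mult * clip_norm, then average over the batch."""
    rng = np.random.default_rng(seed)
    clipped = [g / max(1.0, np.linalg.norm(g) / clip_norm)
               for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)
```

Clipping bounds each example's contribution (its sensitivity), which is what makes the added Gaussian noise yield a differential-privacy guarantee.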
The concept of secure multi-party computation (SMC) emerged to guarantee the security of the input data when multiple participants jointly compute a model or function. Communication among the participants is secured and protected by cryptographic methods. Recently, SMC has also been used to protect the updates uploaded by clients. Unlike traditional SMC, a federated learning algorithm only needs to encrypt the parameters rather than a large volume of inputs, which greatly improves computational efficiency. This property makes SMC a preferred technique in federated learning environments.
Aono et al. applied additively homomorphic encryption to protect the gradients uploaded in collaborative deep learning, allowing the server to aggregate encrypted gradients without learning any individual update.
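The additive homomorphism that makes this possible can be illustrated with a toy Paillier cryptosystem: the server multiplies ciphertexts of (integer-quantized) gradients and never sees a plaintext, while only the key holder can decrypt the sum. The primes below are far too small for real use, where 2048-bit keys and a vetted library would be required; all function names are assumptions.

```python
import math, random

def keygen(p=293, q=433):
    """Toy Paillier keypair (tiny primes, for illustration only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)              # valid because we use g = n + 1
    return (n,), (n, lam, mu)         # (public key, private key)

def encrypt(pub, m):
    (n,) = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(priv, c):
    n, lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add_encrypted(pub, c1, c2):
    """Additive homomorphism: Enc(a) * Enc(b) mod n^2 = Enc(a + b)."""
    (n,) = pub
    return c1 * c2 % (n * n)
```

Real gradients are floats, so they must first be encoded as fixed-point integers before encryption.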
Bonawitz et al. proposed a practical secure aggregation protocol in which users' updates are masked with pairwise random values that cancel out during summation, so the server learns only the aggregate; the protocol also tolerates users dropping out mid-round.
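The pairwise-masking core of such a protocol can be sketched as follows. A seeded PRNG stands in for the Diffie-Hellman key agreement of the real protocol, and dropout recovery is omitted; names and parameters are illustrative assumptions.

```python
import numpy as np

def masked_update(update, uid, peers, round_seed=0, scale=100.0):
    """Each pair of users derives a shared random mask; the lower id
    adds it and the higher id subtracts it, so every mask cancels
    when the server sums all submissions."""
    out = update.astype(float).copy()
    for peer in peers:
        pair_seed = (min(uid, peer), max(uid, peer), round_seed)
        mask = np.random.default_rng(pair_seed).uniform(
            -scale, scale, size=update.shape)
        out += mask if uid < peer else -mask
    return out
```

Each individual submission looks random to the server, yet the sum over all users equals the sum of the raw updates.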
Overall, security and privacy threats can be controlled in two ways: integrating security mechanisms or changing the learning strategy. For example, combining DP, HC, or SMA methods with federated learning can guarantee the security and privacy of users' training data. However, the negative effects these mechanisms introduce into the federated learning process must be considered, such as the privacy overhead of DP, the computational complexity of cryptosystems, and the communication overhead of multi-party aggregation. In future work, a hiding mechanism could be designed to prevent attackers from estimating other participants' learning results, since privacy leakage depends on high model accuracy for the target class. In addition, authentication and local-training-integrity mechanisms should be studied in depth to verify the trustworthiness of each participant.
Regarding the exploration of federated learning security and privacy and future research directions, this paper proposes, for the poisoning attack category, a GAN-based data generation method that further enables proactive user-side poisoning.
This paper has focused on the security and privacy challenges in federated learning, revealing that the distinctive practice of sharing model parameters between participants and the server can bring unprecedented security and privacy challenges. It introduced three types of attacks launched by insiders and explained why these attacks can succeed, then summarized recent defensive countermeasures in this area, providing new research directions for building a secure and privacy-preserving federated learning system.
References
Ribeiro M, Grolinger K, Capretz M A M. MLaaS: Machine learning as a service[C]//Proceedings of 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA). [S.l.]: IEEE, 2015: 896-902.
Ge Z, Song Z, Ding S X, et al. Data mining and analytics in the process industry: The role of machine learning[J]. IEEE Access, 2017, 5: 20590-20616.
Waring J, Lindvall C, Umeton R. Automated machine learning: Review of the state-of-the-art and opportunities for healthcare[J]. Artificial Intelligence in Medicine, 2020, 104: 101822.
Lopez K L, Gagne C, Gardner M A. Demand-side management using deep learning for smart charging of electric vehicles[J]. IEEE Transactions on Smart Grid, 2018, 10(3): 2683-2691.
Lin W Y, Hu Y H, Tsai C F. Machine learning in financial crisis prediction: A survey[J]. IEEE Transactions on Systems Man and Cybernetics, 2012, 42(4): 421-436.
Zhou L, Pan S, Wang J, et al. Machine learning on big data: Opportunities and challenges[J]. Neurocomputing, 2017, 237(10): 350-361.
Papernot N, Mcdaniel P, Sinha A, et al. Towards the science of security and privacy in machine learning[J]. arXiv, 2016, 16: 11-19.
Yang Q, Liu Y, Chen T, et al. Federated machine learning: Concept and applications[J]. ACM Transactions on Intelligent Systems, 2019, 10(2): 12.1-12.19.
Barreno M, Nelson B, Joseph A D, et al. The security of machine learning[J]. Machine Learning, 2010, 81(2): 121-148.
Hunt T, Song C, Shokri R, et al. Privacy-preserving machine learning as a service[J]. Proceedings on Privacy Enhancing Technologies, 2018(3): 123-142.
Shokri R, Shmatikov V. Privacy-preserving deep learning[C]//Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. Denver, CO, USA: ACM, 2015: 1310-1321.
Mcmahan H B, Moore E, Ramage D, et al. Communication-efficient learning of deep networks from decentralized data[C]//Proceedings of AISTATS. Fort Lauderdale, USA: JMLR, 2017: 1-10.
Konečný J, Mcmahan H B, Yu F X, et al. Federated learning: Strategies for improving communication efficiency[J]. arXiv, 2016, 16: 1-10.
Nishio T, Yonetani R. Client selection for federated learning with heterogeneous resources in mobile edge[C]//Proceedings of ICC. Shanghai, China: IEEE, 2019: 1-7.
Li T, Sahu A K, Talwalkar A, et al. Federated learning: Challenges, methods, and future directions[J]. IEEE Signal Processing Magazine, 2020, 37(3): 50-60.
Wang S, Tuor T, Salonidis T, et al. When edge meets learning: Adaptive control for resource-constrained distributed machine learning[C]//Proceedings of INFOCOM. Paris, France: IEEE, 2019: 63-71.
Tran N H, Bao W, Zomaya A, et al. Federated learning over wireless networks: Optimization model design and analysis[C]//Proceedings of INFOCOM. Paris, France: IEEE, 2019: 1387-1395.
Jagielski M, Oprea A, Biggio B, et al. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning[C]//Proceedings of 2018 IEEE Symposium on Security and Privacy (SP). San Francisco, CA, USA: IEEE, 2018: 19-35.
Wang B, Yao Y, Shan S, et al. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks[C]//Proceedings of 2019 IEEE Symposium on Security and Privacy (SP). San Francisco, CA, USA: IEEE, 2019: 707-723.
Yuan X, He P, Zhu Q, et al. Adversarial examples: Attacks and defenses for deep learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(9): 2805-2824.
Hitaj B, Ateniese G, Perez-Cruz F. Deep models under the GAN: Information leakage from collaborative deep learning[C]//Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. Dallas, TX, USA: ACM, 2017: 603-618.
Wang Z, Song M, Zhang Z, et al. Beyond inferring class representatives: User-level privacy leakage from federated learning[C]//Proceedings of IEEE INFOCOM Conference on Computer Communications. Paris, France: IEEE, 2019: 2512-2520.
Yang K, Jiang T, Shi Y, et al. Federated learning via over-the-air computation[J]. IEEE Transactions on Wireless Communications, 2020, 19(3): 2022-2035.
Wang X, Han Y, Wang C, et al. In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning[J]. IEEE Network, 2019, 33(5): 156-165.
Kim H, Park J, Bennis M, et al. Blockchained on-device federated learning[J]. IEEE Communications Letters, 2019, 24(6): 1279-1283.
Samarakoon S, Bennis M, Saady W, et al. Distributed federated learning for ultra-reliable low-latency vehicular communications[J]. IEEE Transactions on Communications, 2019, 68(2): 1146-1159.
Truex S, Baracaldo N, Anwar A, et al. A hybrid approach to privacy-preserving federated learning[C]//Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security. London, UK: ACM, 2019: 1-11.
Robin C G, Tassilo K, Moin N. Differentially private federated learning: A client level perspective[C]//Proceedings of NIPS. Long Beach, CA, USA: MIT Press, 2017.
Xu G, Li H, Liu S, et al. VerifyNet: Secure and verifiable federated learning[J]. IEEE Transactions on Information Forensics and Security, 2019, 15: 911-926.
Lim W Y B, Luong N C, Hoang D T, et al. Federated learning in mobile edge networks: A comprehensive survey[J]. IEEE Communications Surveys & Tutorials, 2020, 22(3): 2031-2063.
Lu Y, Huang X, Dai Y, et al. Blockchain and federated learning for privacy-preserved data sharing in industrial IoT[J]. IEEE Transactions on Industrial Informatics, 2019, 16(6): 4177-4186.
Le T P, Aono Y, Hayashi T. Privacy-preserving deep learning: Revisited and enhanced[C]//Proceedings of International Conference on Applications and Techniques in Information Security. Singapore: [s.n.], 2017: 100-110.
Melis L, Song C, de Cristofaro E, et al. Exploiting unintended feature leakage in collaborative learning[C]//Proceedings of 2019 IEEE Symposium on Security and Privacy (SP). San Francisco, CA, USA: IEEE, 2019: 691-706.
Xiao H, Biggio B, Nelson B, et al. Support vector machines under adversarial label contamination[J]. Neurocomputing, 2015, 160: 53-62.
Zhang J, Chen J, Wu D, et al. Poisoning attack in federated learning using generative adversarial nets[C]//Proceedings of IEEE Trustcom. Rotorua, New Zealand: IEEE, 2019: 374-380.
Bhagoji A N, Chakraborty S, Mittal P, et al. Analyzing federated learning through an adversarial lens[C]//Proceedings of ICML. Long Beach, California, USA: ACM, 2019: 634-643.
Zhao Y, Chen J, Zhang J, et al. PDGAN: A novel poisoning defense method in federated learning using generative adversarial network[C]//Proceedings of ICA3PP. Melbourne, VIC, Australia: IEEE, 2019: 595-609.
Bagdasaryan E, Veit A, Hua Y, et al. How to backdoor federated learning[C]//Proceedings of AISTATS. Palermo, Sicily, Italy: JMLR, 2020: 2938-2948.
Xie C, Huang K, Chen P Y, et al. DBA: Distributed backdoor attacks against federated learning[C]//Proceedings of ICLR. Addis Ababa: IEEE, 2020.
Nasr M, Shokri R, Houmansadr A. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning[C]//Proceedings of IEEE Symposium on Security and Privacy (SP). San Francisco, CA, USA: IEEE, 2019: 739-753.
Truex S, Liu L, Gursoy M E, et al. Demystifying membership inference attacks in machine learning as a service[J]. IEEE Transactions on Services Computing, 2019, 1: 1.
Shen S, Tople S, Saxena P. Auror: Defending against poisoning attacks in collaborative deep learning systems[C]//Proceedings of the 32nd Annual Conference on Computer Security Applications. Los Angeles, CA, USA: IEEE, 2016: 508-519.
Abadi M, Chu A, Goodfellow I, et al. Deep learning with differential privacy[C]//Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. Vienna, Austria: ACM, 2016: 308-318.
Aono Y, Hayashi T, Wang L, et al. Privacy-preserving deep learning via additively homomorphic encryption[J]. IEEE Transactions on Information Forensics and Security, 2017, 13(5): 1333-1345.
Bonawitz K, Ivanov V, Kreuter B, et al. Practical secure aggregation for privacy-preserving machine learning[C]//Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. Dallas, TX, USA: ACM, 2017: 1175-1191.
Cao D, Chang S, Lin Z, et al. Understanding distributed poisoning attack in federated learning[C]//Proceedings of the 25th International Conference on Parallel and Distributed Systems (ICPADS). Tianjin, China: IEEE, 2019: 233-239.
Tramèr F, Kurakin A, Papernot N, et al. Ensemble adversarial training: Attacks and defenses[EB/OL]. (2017-05-07)[2020-03-10]. https://arxiv.org/abs/1705.07204.
Cretu G F, Stavrou A, Locasto M E, et al. Casting out demons: Sanitizing training data for anomaly sensors[C]//Proceedings of the 2008 IEEE Symposium on Security and Privacy. Oakland, California, USA: IEEE, 2008: 81-95.
Zhang J, Zhao Y, Wu J, et al. LVPDA: A lightweight and verifiable privacy-preserving data aggregation scheme for edge-enabled IoT[J]. IEEE Internet of Things Journal, 2020, 7(5): 4016-4027.
Zhang J, Zhao Y, Wang J, et al. FedMEC: Improving efficiency of differentially private federated learning via mobile edge computing[J]. Mobile Networks and Applications, 2020, 1: 13.
Zhang J, Chen B, Yu S, et al. PEFL: A privacy-enhanced federated learning scheme for big data analytics[C]//Proceedings of 2019 IEEE Global Communications Conference (GLOBECOM). Waikoloa, HI, USA: IEEE, 2019: 1-6.
Bonawitz K, Ivanov V, Kreuter B, et al. Practical secure aggregation for federated learning on user-held data[C]//Proceedings of NIPS. Barcelona, Spain: MIT Press, 2016.
Sattler F, Wiedemann S, Müller K R, et al. Robust and communication-efficient federated learning from non-iid data[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 31(9): 3400-3413.
Mowla N I, Tran N H, Doh I, et al. Federated learning-based cognitive detection of jamming attack in flying AD-HOC network[J]. IEEE Access, 2020, 8: 4338-4350.
Huang X, Ding Y, Jiang Z L, et al. DP-FL: A novel differentially private federated learning framework for the unbalanced data[J]. World Wide Web, 2020, 23: 2529-2545.
Zhao R, Yin Y, Shi Y, et al. Intelligent intrusion detection based on federated learning aided long short-term memory[J]. Physical Communication, 2020, 42: 101157.
Kaissis G A, Makowski M R, Daniel R, et al. Secure, privacy-preserving and federated machine learning in medical imaging[J]. Nature Machine Intelligence, 2020, 2: 305-311.
Chen H, Li H, Xu G, et al. Achieving privacy-preserving federated learning with irrelevant updates over e-health applications[C]//Proceedings of ICC 2020 - 2020 IEEE International Conference on Communications (ICC). Dublin, Ireland: IEEE, 2020.