Summary of the paper “Harnessing the Power of LLM to Support Binary Taint Analysis”
arXiv, 2023
Paper citation: Liu, Puzhuo, Chengnian Sun, Yaowen Zheng, Xuan Feng, Chuan Qin, Yuncheng Wang, Zhi Li, and Limin Sun. “Harnessing the power of LLM to support binary taint analysis.” arXiv preprint arXiv:2310.08275 (2023).

Brief Summary:
The paper uses an LLM (almost exclusively) to do taint analysis on binaries.
Main idea: identifying sources, sinks, and propagation flow rules is hard, and LLMs are better suited to this task.
Approach:
- LLM task: identify sinks with an LLM (GPT-4). Give the LLM the names of all external method calls (APIs), presumably one at a time, and ask whether each one can be a sink for sensitive information.
- LLM task: identify sources using the same approach as for the sinks.
- Analysis task: identify the line of code that calls a source or sink.
- Analysis task: from each sink, statically build the call graph backward to recover the call trace. Include a caller X only if there’s a data flow from X’s parameters to the callee’s parameters.
- Analysis task: mark all sources on all extracted call graph nodes.
- Analysis task: generate call chain from source to sink…
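
To make the analysis tasks above concrete, here is a minimal Python sketch of one possible reading of them. It is not the paper's implementation. The call graph is a plain list of (caller, callee) pairs, and `param_flow` is a hypothetical precomputed set of edges where the caller's parameters flow into the callee's parameters; in a real tool that fact would come from a data flow analysis over the binary. All function names (`serve`, `handle`, `run_cmd`, `log_banner`) are made up for illustration, and the source and sink sets stand in for the LLM's answers.

```python
from collections import defaultdict


def backward_traces(calls, param_flow, sink):
    """Walk the call graph backward from `sink` and return caller traces.

    A caller X of a callee is kept only if (X, callee) is in `param_flow`,
    i.e. X's parameters flow into the callee's parameters (assumed input).
    """
    callers_of = defaultdict(list)
    for caller, callee in calls:
        callers_of[callee].append(caller)

    traces = []

    def walk(callee, trace):
        # Keep only callers with a parameter-to-parameter flow; skip cycles.
        kept = [c for c in callers_of[callee]
                if (c, callee) in param_flow and c not in trace]
        if not kept:
            if len(trace) > 1:  # at least one caller above the sink
                traces.append(trace)
            return
        for caller in kept:
            walk(caller, [caller] + trace)

    walk(sink, [sink])
    return traces


def source_to_sink_chains(calls, param_flow, sources, sinks):
    """Mark source calls on the trace nodes and emit source-to-sink chains."""
    callees_of = defaultdict(list)
    for caller, callee in calls:
        callees_of[caller].append(callee)

    chains = []
    for sink in sorted(sinks):
        for trace in backward_traces(calls, param_flow, sink):
            # trace[:-1] are the functions on the trace; trace[-1] is the sink.
            for i, func in enumerate(trace[:-1]):
                for callee in callees_of[func]:
                    chain = [callee] + trace[i:]
                    if callee in sources and chain not in chains:
                        chains.append(chain)
    return chains


# Hypothetical toy program: names and edges are illustrative only.
CALLS = [
    ("main", "serve"),
    ("serve", "recv"),
    ("serve", "handle"),
    ("handle", "run_cmd"),
    ("run_cmd", "system"),
    ("log_banner", "system"),  # calls the sink with a constant argument
]
PARAM_FLOW = {("serve", "handle"), ("handle", "run_cmd"), ("run_cmd", "system")}

if __name__ == "__main__":
    for chain in source_to_sink_chains(CALLS, PARAM_FLOW, {"recv"}, {"system"}):
        print(" -> ".join(chain))
```

In this toy input, `log_banner` is dropped because none of its parameters reach the sink, and the walk stops at `serve` because `main` passes nothing of its own down. Whether the function that directly contains the sink call should always be kept, regardless of parameter flow, is a design choice this sketch does not settle.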