Twitter is an excellent source of data for NLP research, as it offers a tremendous amount of useful textual information. Because tweets are written informally, nonstandard words and the mixing of multiple languages within a single tweet, known as code-mixing, are common in Twitter data. Several studies have addressed nonstandard words or code-mixing, but to the best of our knowledge, no study has addressed these problems for Indonesian-English code-mixed data. In this study, we created a pipeline to normalize Indonesian-English code-mixed data, comprising four modules: tokenization, language identification, lexical normalization, and translation. To initiate the task of normalizing code-mixed data, especially for the Indonesian-English pair, we also created 501 Indonesian-English code-mixed gold-standard corpora, including gold standards for each of the four modules in our pipeline.