Software and data culture for AI system integration

要導入AI需要軟體+數據資料的文化

台灣業界長期並不將軟體工程與資料科學視為顯學,比較擅長晶片製造和硬體工廠生產,然而在新一波的AI浪潮,台灣能做產業轉型跟上時代的腳步嗎?公司上層領導者的思維邏輯能不能變通、持續終身學習,將會決定台灣能否追上世界趨勢。AI本質是軟體,它分析的是資料。

Taiwan's industry has long undervalued software engineering and data science; its strengths are chip manufacturing and hardware production. With a new AI wave arriving, can Taiwanese companies transform themselves and keep pace? Whether senior leaders can adapt their thinking and keep learning will decide whether Taiwan catches up with the global trend. At its core, AI is software, and what it analyzes is data.
sdc_pic1_ml_vs_manufacturing
Fig 1. Machine Learning vs. Manufacturing (image source)

資料科學(機器學習研究)是奠基在 ML Engineer 與 Data Engineer 之上,還不包含最普遍的 Software Engineer。Data Engineer是整個的基礎,負責把撒在外面的IoT devices、網路平台、App使用者操作紀錄給完整的存到一個Data Lake。台灣人工智慧學校最近所做的AI人才概況調查,資料品質不佳、取得不易是最常遇到的難題。之前Jason在eBay工作時所待的Analytics Platform部門有200多人在圍繞著維護Data平台,在處理資料、清資料、做ETL、A/B testing、開發ML模型…等。

Data science (machine learning research) is built on top of the work of ML engineers and data engineers, and that does not even count the most common role, the software engineer. Data engineers are the foundation of the whole stack: they take the user activity records scattered across IoT devices, web platforms, and apps and store them completely in a data lake. In a recent survey of AI talent by Taiwan AI Academy, poor data quality and hard-to-obtain data were the most frequently reported problems. When Jason worked at eBay, his department, Analytics Platform, had more than 200 people maintaining the data platform: processing and cleaning data, running ETL jobs and A/B tests, developing ML models, and so on.

sdc_pic2_ml_researcher
Fig 2. Machine Learning Skills Pyramid (image source)
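
The data engineering work described above (collecting records from several sources, cleaning them, and storing them in a data lake) can be sketched in a few lines. This is only an illustrative sketch: the event fields, the sample records, and the in-memory dictionary standing in for a data lake are all assumptions, not a description of any real platform.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events; the field names and values are made up for illustration.
RAW_EVENTS = [
    {"source": "app", "user_id": "u1", "action": "click", "ts": "2019-01-05T10:00:00"},
    {"source": "app", "user_id": "u1", "action": "click", "ts": "2019-01-05T10:00:00"},  # duplicate
    {"source": "web", "user_id": "", "action": "view", "ts": "2019-01-05T10:01:00"},  # missing user
    {"source": "iot", "user_id": "u2", "action": "temp", "ts": "not-a-date"},  # bad timestamp
    {"source": "web", "user_id": "u3", "action": "view", "ts": "2019-01-05T10:02:00"},
]


def clean(events):
    """Transform step: drop incomplete, malformed, and duplicate records."""
    seen = set()
    for event in events:
        if not event.get("user_id"):
            continue  # unusable without a user id
        try:
            ts = datetime.fromisoformat(event["ts"])
        except ValueError:
            continue  # timestamp cannot be parsed
        key = (event["source"], event["user_id"], event["action"], ts)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        yield {**event, "ts": ts}


def load(events):
    """Load step: group cleaned records by source (a stand-in for a data lake)."""
    lake = defaultdict(list)
    for event in events:
        lake[event["source"]].append(event)
    return dict(lake)


DATA_LAKE = load(clean(RAW_EVENTS))
print({source: len(rows) for source, rows in DATA_LAKE.items()})
```

Even in this toy case, three of the five raw records are discarded, which hints at why data quality is the most commonly reported difficulty.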

所以在人工智慧之前,不妨先想想「工人智慧」可以做到什麼:比如影像監視、圖片物體辨識、語音辨識、自動駕駛,請人類來做,可以提供有價值的服務嗎?如果提供的價值能夠讓服務品質升級,就可以考慮透過AI來自動化。

Before turning to artificial intelligence, it is worth asking what "human intelligence" (people doing the work by hand) could achieve. Take video surveillance, object recognition in images, speech recognition, or driving: if humans did the job, would it provide a valuable service? If that value is enough to raise service quality, it is worth considering automating the work with AI.

台灣每年大量的資訊科系畢業生卻都以投入硬體產業鏈居多,直到2018年國際科技公司如Google, Microsoft, Amazon, IBM, Line, Oath Yahoo開始招聘AI工程師,希望這能帶來一些變化。在AI這個新領域裡台灣有多年Software+Data經驗的人不多,所以剛投入AI的工程師很多時候還會花時間打滾摸索,跌跌撞撞,踩洞,管理者即使有AI意識,也聘用了AI engineer/researcher,在遇到進度緩慢、效果不彰、成果不如預期時,往往也不知道該如何改善,就以計劃失敗收場。

Taiwan produces a large number of computer science graduates every year, but most of them go into the hardware supply chain. In 2018, international technology companies such as Google, Microsoft, Amazon, IBM, Line, and Oath (Yahoo) began recruiting AI engineers; hopefully this will bring some change. Few people in Taiwan have many years of combined software and data experience in this new field, so engineers who are new to AI often spend time fumbling and stumbling into pitfalls. Even managers who are aware of AI and have hired AI engineers or researchers often do not know how to improve things when progress is slow, results are weak, or outcomes fall short of expectations, and the project ends in failure.

有人擔心台灣是小國,沒有什麼資料優勢,如果你有看全球資料增長的速度,很多資料是最近才開始蒐集的,而且呈現指數型成長,過去即使有蒐集,也不夠乾淨來分析,所以別說資料不夠了。

Some people worry that Taiwan is a small country with no data advantage. But look at how fast data is growing worldwide: much of it has only begun to be collected recently, and the volume is growing exponentially. Data that was collected earlier is often not clean enough to analyze anyway. So a lack of data is no excuse.

AI所帶來的服務是能幫企業內部提高價值,如電信業者想透過AI來減少Google Play上電信帳單代收的呆帳。銀行招募AI人才想透過NLP做客戶意見分析,民眾跟銀行的互動已經在網銀、行動銀行App所接觸的時間比去實體分行還多,透過Line與客戶互動,當越來越多的面對客戶管道所蒐集到的資料,會彙整到一個分析平台,來做使用者使用方式分析,來推薦商品或更清楚知道客戶在使用哪些功能,透過ML分析預測使用者的行為模式來帶來更好的服務品質。

The services AI enables can raise value inside an enterprise. For example, a telecom operator may want to use AI to reduce bad debt from carrier billing for Google Play purchases. Banks are hiring AI talent to analyze customer feedback with NLP. People now spend more time interacting with their bank through online banking and mobile banking apps than in physical branches, and banks also interact with customers through Line. As more customer-facing channels collect data, that data is consolidated into a single analytics platform for usage analysis, so the bank can recommend products and see more clearly which features customers use. Using ML to analyze and predict user behavior patterns leads to better service quality.

傳統的模型或是網路上開源的模型只能把performance帶到一個程度,如需要突破,必須要有ML的思維,自己建模型。AI其實一開始設計出的軟體performance不會是很好,需要不斷的調適,透過feature extraction、model selection、parameter tuning來提高performance。以全球知名的ImageNet比賽,在2015超越人類判讀,從此以後人類再也追不上了。AlphaGo下圍棋來說,一開始也不是比人好,但是經過ML researcher/ML engineer/Data Engineer的合作,終究會有突破的一天。

Traditional models and open-source models found online can only take performance so far. To break through, a team needs an ML mindset and must build its own models. AI software usually does not perform well when first designed; it needs continual adjustment, with feature extraction, model selection, and parameter tuning used to raise performance. In the well-known ImageNet Challenge, AI surpassed human-level classification error in 2015, and humans have not caught up since. AlphaGo, likewise, was not better than human players at first, but through the collaboration of ML researchers, ML engineers, and data engineers, the breakthrough eventually came.

sdc_pic3_imagenet-challenge
Fig 3. ImageNet Challenge Trend (image source)
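
The parameter tuning mentioned above can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the method of any system named in this article: it uses a hand-written k-nearest-neighbors classifier, made-up data points, and a hypothetical `tune` helper that scores each candidate value of `k` on held-out data and keeps the best one.

```python
from collections import Counter

# Toy labelled points (feature_1, feature_2, label); all values are made up.
TRAIN = [
    (1.0, 1.2, "ok"), (0.8, 1.0, "ok"), (1.1, 0.9, "ok"), (1.3, 1.1, "ok"),
    (3.0, 3.2, "defect"), (3.1, 2.9, "defect"), (2.8, 3.0, "defect"),
    (1.2, 1.3, "defect"),  # a deliberately noisy label inside the "ok" cluster
]
VALIDATION = [
    (0.9, 1.1, "ok"), (1.2, 1.0, "ok"), (1.25, 1.3, "ok"),
    (2.9, 3.1, "defect"), (3.2, 3.0, "defect"),
]


def predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training points."""
    by_distance = sorted(
        train, key=lambda row: (row[0] - point[0]) ** 2 + (row[1] - point[1]) ** 2
    )
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Fraction of held-out points whose predicted label matches the true label."""
    hits = sum(predict(train, (x, y), k) == label for x, y, label in held_out)
    return hits / len(held_out)


def tune(train, held_out, candidates):
    """Parameter tuning: score every candidate k on held-out data, keep the best."""
    scores = {k: accuracy(train, held_out, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


BEST_K, SCORES = tune(TRAIN, VALIDATION, candidates=[1, 3, 5])
print(BEST_K, SCORES)
```

With `k=1` the model copies the noisy training label and misclassifies a nearby validation point, while larger values of `k` vote it down. Real projects repeat this kind of loop over features, model families, and many parameters, which is part of why the first version of an AI system rarely performs well.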

台灣一些晶片製造商做出sensor,如果有好的系統整合功力,把蒐集資料的data pipeline做好軟硬整合,是很有機會把產品/服務賣到全世界,把資料彙整到data center或是雲平台,然後搭配ML algorithm和data platform,可以做出解決特有的應用場景,相信可以幫助台灣把製造業的強項做到軟硬整合的end-to-end total solution,來解決客戶痛點。比較好的例子像是這兩家新創題目圍繞著監視攝影機保全,Umbo CV (盾心科技)(B2B)是賣給其他企業,Deep Sentinel(B2C)是賣給終端消費者。

Some Taiwanese chip makers produce sensors. With strong system integration skills, and with software and hardware well integrated in the data collection pipeline, they have a real chance of selling products and services worldwide. Data can be consolidated in a data center or cloud platform and combined with ML algorithms and a data platform to address specific application scenarios. This can help Taiwan turn its manufacturing strength into end-to-end total solutions that integrate software and hardware and solve customers' pain points. Two startups in surveillance-camera security are good examples: Umbo CV (B2B) sells to other businesses, and Deep Sentinel (B2C) sells to end consumers.

智慧醫療在醫學影像上的幫助可以減輕醫生的負擔,不然看片還是由資深的醫生來看,如果把資深醫生的經驗讓AI學會,其實可以讓醫療品質提升,又減少人力,之前有報導史丹佛的研究醫學影像與阿茲海默症,甚至可以透過AI提早幾年給偵測出來。工廠可以埋IoT蒐集廠房數據,邁向工業4.0,透過AI做瑕疵檢測減少人力、透過AI做維護預測來降低停機的風險。

In smart healthcare, AI support for medical imaging can lighten doctors' workload. Today images are still read by senior doctors. If AI can learn from their experience, the quality of care can rise while the staffing required falls. It has been reported that Stanford researchers studying medical imaging and Alzheimer's disease could detect the disease years earlier with AI. Factories can install IoT devices to collect plant data and move toward Industry 4.0, using AI-based defect inspection to reduce labor and AI-based predictive maintenance to lower the risk of downtime.

台灣軟體人才很優秀,不然不會吸引國際科技大廠來招員工,如果搭配好的軟硬整合系統架構把AI系統設計出來,其實還是很有機會在這波AI浪潮上,趕上世界的趨勢潮流。一個實際例子Google Play在2018年票選的最受歡迎App就有來自台灣不超過10人的小型開發團隊,台灣開發者軟體實力堅強。(參考報導)
這時候就看公司高層經理人能不能看到軟體工程師可以為公司帶來價值,而好好重用。

Taiwan's software talent is excellent; otherwise international technology companies would not come here to recruit. With a well-designed architecture that integrates software and hardware into an AI system, Taiwan still has a good chance of catching up with the global trend in this AI wave. As a concrete example, the most popular apps voted on Google Play in 2018 included one from a small Taiwanese development team of no more than 10 people, which shows how strong Taiwan's developers are. What remains to be seen is whether senior managers can recognize the value software engineers bring to a company and put them in roles that matter.

在機器學習裡有分training phase/ prediction phase,台灣很多做硬體embedded system,以硬體思維會很高興哇可以有edge端運算,這樣也算帶到AI,然而要給預測辨識提高價值的如準確率辨識率,是需要靠大數據平台來做訓練,且需要不斷花時間調試。在資料的學習階段時還是需要軟體大數據,學習好後的模型就可以壓縮佈署到edge端。舉語音辨識相關的應用,可能是智慧音箱或是即時翻譯器,需要蒐集大量的詞彙用語來學習,台灣當地特有的用字遣詞與大陸地方就會不一樣,資料有地區性,隨著時間,新的詞彙可能會生出來,這都需要靠軟體做不斷的學習更新,然後再佈署上edge端。

Machine learning has a training phase and a prediction phase. Many Taiwanese companies build embedded hardware systems, and from a hardware mindset it is exciting that computation can run at the edge and count as AI. However, what makes prediction and recognition valuable, such as accuracy and recognition rate, depends on training on a big data platform and on continual tuning. The learning phase still needs software and big data; once trained, the model can be compressed and deployed to the edge. Take speech recognition applications such as smart speakers or real-time translators: they need a large vocabulary of words and phrases to learn from. Word choice in Taiwan differs from that in mainland China, so data is regional, and new vocabulary appears over time. All of this requires software to keep learning and updating before the model is redeployed to the edge.

sdc_pic4_training_prediction
Fig 4. Training and Prediction System (image source)
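
The split between the training phase and the prediction phase can be made concrete with a minimal sketch. Everything here is an assumption for illustration: a one-variable linear model stands in for a real speech or vision model, `compress` uses simple int8 quantization as one possible compression scheme, and `edge_predict` stands in for code running on an embedded device.

```python
import struct


def train(samples):
    """Training phase (server side): fit y = w*x + b by ordinary least squares."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    var = sum((x - mean_x) ** 2 for x, _ in samples)
    w = cov / var
    return w, mean_y - w * mean_x


def compress(weights):
    """Quantize float weights to int8, plus one float32 scale, for shipping."""
    scale = max(abs(v) for v in weights) / 127 or 1.0
    quantized = [round(v / scale) for v in weights]
    return struct.pack(f"<f{len(quantized)}b", scale, *quantized)


def edge_predict(blob, x):
    """Prediction phase (edge side): unpack the small model and apply it."""
    count = len(blob) - 4  # 4 bytes of scale, then one byte per weight
    scale, *quantized = struct.unpack(f"<f{count}b", blob)
    w, b = (q * scale for q in quantized)
    return w * x + b


# Made-up training data that follows y = 2x + 1.
SAMPLES = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
MODEL_BLOB = compress(train(SAMPLES))
print(len(MODEL_BLOB), edge_predict(MODEL_BLOB, 4.0))
```

The edge side only needs the few bytes in `MODEL_BLOB` and a multiply-add, while the data and the fitting stay on the server. When new data arrives (new vocabulary, in the speech example), `train` and `compress` run again and a fresh blob is shipped to the devices.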

製造業著重在硬體功能必須兜得起來、每個元件有按照規格做好該做的事情;軟體數據服務則必須 end-to-end 整個系統一起考量,否則很容易會落得 garbage in garbage out 的下場。軟體服務的價值在自己找到痛點並解決。未來會是以軟體提供服務價值為主導的趨勢,Netflix、Spotify、愛奇異、雲端服務也是以每月每月的計費,這就像是我們熟悉的水電費、手機網路費。

Manufacturing focuses on making hardware functions fit together, with each component doing its job according to spec. Software and data services, by contrast, must be considered end to end as a whole system; otherwise they easily end up as garbage in, garbage out. The value of a software service lies in finding a pain point and solving it. The trend ahead will be led by software that delivers value as a service. Netflix, Spotify, iQiyi, and cloud services all bill month by month, much like the familiar water, electricity, and mobile data bills.

AI燒錢很兇,而且不容易招募AI工程師,應該說市場上有AI經驗的很少,企業因此花錢派工程師、經理去上AI的課,訓練起來,專案跑了一陣子還是有可能做不出東西,達不到想要的功效,因為AI要做成功有太多東西都要做對,才會有好的結果,如果企業想導入AI,建議找有多年經驗的專業AI架構師幫忙評估規劃整個計畫,來降低風險。

AI burns cash quickly, and AI engineers are hard to recruit; more precisely, few people on the market have AI experience. Companies therefore pay to send engineers and managers to AI courses. Even after that training, a project can run for a while and still produce nothing or miss the intended results, because an AI project succeeds only when many things are done right. An enterprise that wants to adopt AI would do well to have a professional AI architect with many years of experience assess and plan the whole project to reduce risk.

以無人自駕車當例子,它是非常燒錢的一項研發,需要蒐集很多影像資料,透過ML做出很多種複雜的場景判斷,美國科技大廠已經投入很多金錢、人力、資源去開發,做是可以做,但就是看公司高層的領導決策層想投入多少資源透過AI去解決什麼樣的問題,預期投資下去的錢何時能回收。投資不要只想賺快錢,太短視近利了。

Take self-driving cars as an example. It is an extremely expensive R&D effort that requires collecting large amounts of image data and using ML to judge many kinds of complex scenes. Large US technology companies have already invested a great deal of money, people, and resources in it. It can be done, but it comes down to how many resources senior decision makers are willing to commit, which problems they want AI to solve, and when they expect to recoup the investment. Investors should not chase only quick money; that is too short-sighted.

參考文章(Reference)

[1]https://www.facebook.com/ai4quant/posts/385788271986151
[2]https://data.leafwind.tw/build-software-engineering-and-data-culture-before-doing-ai-6e345986f872