Welcome to Osaka(n)

I recently spent 3 months in Osaka, travelling and working. What might have been a distant tourist experience became much more intimate and inspiring. My secret? A coworking space named Osakan Space.

Started by local entrepreneur Ms. Osaki (Osaki-san), it is a casual coworking space housing more than 100 members. It offers a range of equipment and amenities, including high-speed WiFi, 3D printers, couches, quiet spaces, staff assistance, and even its very own Google Glass. While most members are in the tech industry, it is not exclusive. Photographers, researchers, and even manga artists can be found in Osakan Space, providing a diverse and creative atmosphere. There is even a resident ninja!

An Osaka native, Osaki-san has dedicated herself to promoting local startups and businesses. Beyond being a coworking space, Osakan Space also hosts local startup events, including Startup Weekend and Shoot from Osaka(n). She has made great efforts to foster a culture beyond mere acquaintanceship. Besides sharing work spaces, most members often spend lunch and off-work hours together. I was invited to events including table tennis meets, takoyaki parties, and oden and nabe dinners. Along with daily self-introductions and status updates, we celebrated each other's successes and supported each other through challenges.

While Osaki-san is undoubtedly the heart of Osakan Space, the members share her spirit and actualize her vision. You can easily find successful and aspiring entrepreneurs sitting next to you. Many local startups, including MyISBN and Mimimiru, were born from Osakan Space. Members are also welcome to organize their own relevant events. During my stay, I attended a Scalar study group and a tech demo day organized by the members. Surrounded by passionate entrepreneurs, it is hard not to be inspired.

A few months ago, I wrote about my Krash family in Boston. I am happy to share that I recently found myself in another family in Osaka.

How to Mine the Web

UPDATE: I forgot to mention that PiCloud has been bought by Dropbox (Congrats!) at the end of 2013. Fortunately MultyVac will rise from its ashes. How well MultyVac supports web mining has yet to be seen.

Recently, I've done a number of projects utilizing web mining techniques. Also known as web scraping, web mining is a set of methods for extracting useful information from websites, be it content, connections, or something else. With the stagnation of the semantic web and the often-lacking API endpoints, web mining has become a popular way to retrieve and consume publicly available data in a scalable fashion.

A web scraper typically has 3 components: HTTP Client, Content Parsing, and Data Consumption. As I have been using mostly Python for my recent work, many of my recommended libraries will be in Python.

First Step

The first step in mining the web for information is to search for an API. It’s surprising how trigger-happy developers can get with web scraping. If an API already exists for the needed data, don’t scrape it! Web mining often requires a lot of work to start and maintain, as any HTML change may throw off the script. The best way to scrape the web is not to do it.

HTTP Client

The HTTP client is responsible for communicating with the internet. It needs to be able to make basic HTTP requests (e.g. GET and POST) and output the response. A good example is GNU wget, which retrieves the raw response at the URI of interest. Extra points go to libraries and tools that can understand and format the response. My favorite is requests, which saves developers time on common routines, including status code parsing and header extraction.

Often, this client may also disguise its requests as if they came from a popular browser, in order to better mirror the traditional user experience. Alternatively, a pool of IP addresses may be used through cloud computing services such as AWS and PiCloud, in order to distribute the load. If the data you need is populated through an AJAX call, use tools such as Fiddler to examine the particular HTTP request that populated the data of interest. If the same response can be replicated by manipulating the GET arguments (i.e. the stuff following ? in the URI), you can make fewer requests to get the same information. Depending on the website, AJAX responses are often JSON objects, making them easier to parse and traverse.
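To make the two ideas above concrete (a browser-like header and manipulating the GET arguments), here is a minimal sketch using only Python's standard library. The endpoint URL, the parameter names (page, per_page), and the User-Agent string are all made up for illustration; the request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical AJAX endpoint, for illustration only.
BASE_URL = "https://example.com/api/items"


def build_request(page, per_page=100):
    """Build (but do not send) a GET request with browser-like headers."""
    # Encode the GET arguments, i.e. the part after "?" in the URI.
    query = urlencode({"page": page, "per_page": per_page})
    headers = {
        # An example browser-style User-Agent string (an assumption).
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        # Ask for JSON, which is what many AJAX endpoints return.
        "Accept": "application/json",
    }
    return Request(f"{BASE_URL}?{query}", headers=headers)


req = build_request(page=2)
print(req.full_url)
print(req.get_header("User-agent"))
```

Asking for a larger per_page value, where a site supports something like it, is one way a single request can replace many. Passing the request to urllib.request.urlopen would actually send it; a library such as requests wraps the same steps more conveniently.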

Content Parsing

Once the content is retrieved, we need to traverse the data structure and identify the data of interest. If the response is in JSON format, there are many standard libraries for parsing it into a memory object. If the response is HTML, or even a JavaScript snippet, however, the task becomes a bigger challenge. Broadly speaking, there are two solutions: structural traversal and pattern matching.

We can traverse the document tree (HTML is structured much like XML) with tools that support XPath expressions or, if you are using Python, with BeautifulSoup. By identifying a specific position or unique trait, such parsers can quickly traverse to the node containing the target data. This method is sensitive to UI updates, as the node order and attributes may change. If the website doesn't update often, this may be a good and easy-to-follow technique.
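As a small sketch of structural traversal, the example below uses the limited XPath support in Python's standard xml.etree.ElementTree. The sample page, its id and class names, and the function name are invented for the example. Note that ElementTree only accepts well-formed markup; real-world HTML often is not, which is where a more forgiving parser such as BeautifulSoup earns its keep.

```python
import xml.etree.ElementTree as ET

# A small, well-formed sample page (made up for this example).
PAGE = """
<html>
  <body>
    <table id="products">
      <tr><td class="name">Widget</td><td class="price">$9.99</td></tr>
      <tr><td class="name">Gadget</td><td class="price">$24.50</td></tr>
    </table>
  </body>
</html>
"""


def extract_price_cells(markup):
    """Return the text of every price cell inside the products table."""
    root = ET.fromstring(markup.strip())
    # Walk to the table by its unique id, then to cells by their class.
    path = ".//table[@id='products']/tr/td[@class='price']"
    return [cell.text for cell in root.findall(path)]


print(extract_price_cells(PAGE))
```

If the site renames the class or wraps the rows in another element, the path stops matching, which is exactly the sensitivity to UI updates described above.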

An alternative is to treat the whole response as raw text and apply text pattern techniques. Yes, I am talking about Regular Expressions (regex). Sometimes there is nothing uniquely identifying the node of interest; yet, the raw data itself is unique. For example, a price tag may appear in a random row of a table, but is always prefixed by the dollar sign ($). In situations such as this, regex is a powerful technique to quickly extract the needed data, while ignoring the structure of the HTML. Regex is also surprisingly effective when the target data is embedded as JavaScript within script tags, instead of in the HTML.
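Here is a minimal sketch of the price example, assuming prices look like $9.99 or $1,299.00. The pattern, the sample text, and the function name are illustrative only; a real site may format prices differently.

```python
import re

# A dollar sign, then digits with optional thousands separators and cents.
PRICE_RE = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d{2})?)")

# Made-up response: prices sit in different table columns and inside a script tag.
SAMPLE = """
<tr><td>Widget</td><td>In stock</td><td>$9.99</td></tr>
<tr><td>$1,299.00</td><td>Laptop</td></tr>
<script>var deal = {"name": "Gadget", "price": "$24.50"};</script>
"""


def find_prices(text):
    """Return every dollar amount in the text as a float, ignoring markup."""
    return [float(match.replace(",", "")) for match in PRICE_RE.findall(text)]


print(find_prices(SAMPLE))
```

The pattern never looks at the tags, so it keeps working when the table layout changes, but it will also pick up any other dollar amount on the page.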

While these are two very different techniques, they are not mutually exclusive. I have often traversed to a set of relevant HTML nodes using BeautifulSoup, then used regex to extract the actual data of interest.

Data Consumption

An important step many developers forget, or do not emphasize enough, is actually consuming the data. While it is the last step of importing data from the web, it should be one of the first to be considered when designing the solution. How will the data ultimately be used? What is the nature of the data, and how can it be made easier to manipulate later? We need to think about these questions before we set out to build a web mining solution. If you are dealing with text data, you may want to apply text analytics techniques afterwards, requiring you to clean up the text and strip any HTML tags or encoded characters. If you are importing data tables, you may need to properly cast the values in each row into useful data types (e.g. Int32, Boolean, String, etc). Ask any database engineer, and they can give you a much more thorough list of the dangers of improperly processed data.
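The two clean-up tasks mentioned above can be sketched with the standard library alone. The row layout (name, quantity, in-stock flag) and the function names are assumptions made for the example, and the tag-stripping regex is deliberately crude.

```python
import html
import re

# Crude tag matcher; good enough for simple snippets, not for arbitrary HTML.
TAG_RE = re.compile(r"<[^>]+>")


def clean_text(raw):
    """Strip HTML tags, decode entities, and collapse whitespace."""
    # Remove tags first, so that an encoded "&lt;" is not mistaken for a tag.
    no_tags = TAG_RE.sub(" ", raw)
    decoded = html.unescape(no_tags)
    # split() with no arguments also splits on non-breaking spaces.
    return " ".join(decoded.split())


def cast_row(row):
    """Cast a scraped row of strings into (str, int, bool)."""
    name, quantity, in_stock = row
    return (clean_text(name), int(quantity), in_stock.strip().lower() == "yes")


print(clean_text("<p>Fish &amp; chips,&nbsp;<b>fresh</b></p>"))
print(cast_row(["<b>Widget</b>", " 12 ", "Yes"]))
```

Casting early means a malformed value (say, "N/A" in the quantity column) fails loudly at import time rather than quietly corrupting later analysis.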


Finally, it is important to have an attitude of gratitude when mining information from the web. Often, the information we seek is not the primary purpose of the target website, so its quality, support, or even access is not guaranteed. If we can avoid overstepping our boundaries with traffic load and usage, the web community as a whole may be more inclined to share its data.

How to Krash Boston

It is amazing how impactful a well-run community can be to its members. Think of the early days of Facebook and Hacker News, and the Reddit of today. I was fortunate enough to experience such a community recently, one that is driven by the goal of supporting and empowering entrepreneurs. This community, whose maturity belies its youth, is Krash. My name is Ken Hu, a data scientist, text analytics consultant, and aspiring entrepreneur; and I’m a Krasher.

I joined Krash in September 2013 (known as CrashPad at the time), stayed at one of its co-living spaces for 2 months, and continued to be involved for another 2 months, until I left the cities in which it currently operates (Boston, New York, and soon-to-come DC). I won’t try to recite its format and mission; you can look them up on its website.

Having initially found it on Craigslist as a sublet house for entrepreneurs, I was unprepared for the depth of experience I would have. And it wasn’t simple to get in either; I went through at least 2 interviews with its staff. Much effort was made to ensure the quality of its tenants.

Krashers

As with any good community, Krash’s true value comes from its members. The diversity of Krashers is great, spanning the US and the world. I met many residents from countries including Mexico, France, the UK, Ukraine, Germany, Canada, and Poland. Many of them continue on to Silicon Valley, while others stay in the Northeast or return to their home countries.

Beyond geographical and cultural diversity, the residents have a wide range of backgrounds as well. They range from fresh graduates, startup professionals, and postgraduate students to aspiring entrepreneurs and, like myself, jaded founders. It is that balance of experience and aspiration, ideas and skills, that differentiates Krash from many homogeneous startup communities I’ve observed. At Krash, I’ve seen collaborations made, lessons shared, and passions rekindled as its members engage with each other.

Engagements

Beyond the ad-hoc interactions between members, the Krash staff also make great efforts to stimulate intra- and inter-community engagement. Two of their regular events are Taste of E-ship and Sunday Family Dinner. The former invites experienced and successful entrepreneurs from the local community for a quick interview and a chance to get involved with Krash members. You can find many of the past flavors on Twitter with the hashtag #TasteOfEship. The guest speakers I’ve seen have all been very passionate and eager to share their experiences and offer feedback. I was fortunate enough to find someone I can consider a mentor, someone I was able to meet occasionally one-on-one and ask questions specific to my situation. I am extremely grateful for that opportunity.

The other big regular event is Sunday Family Dinner, which, you guessed it, is a huge dinner every Sunday night involving all Krashers in the location. This is an opportunity for introspection within the community, celebrating recent wins and announcing upcoming opportunities. It is exhilarating and inspiring to see others in the community progress and succeed. Their efforts and excitement kept my own aspirations and interests going through a tough time. Oh, and it’s a free dinner.

Beyond the regular events, there are also many opportunities to engage the rest of the local startup community. Look to your local community manager (in my case, Colin), who will have an ear on the local startup events. From TechCrunch parties to local WebInno events, Colin notified us of and led us to many opportunities to meet other startup communities in the area. There are so many events within the startup and tech community in Boston that I could practically have free pizza every weeknight. I have no doubt New York is similar, if not more so.

At its core, Krash is a community of current/aspiring entrepreneurs. It strengthens its value through the interactions and successes of its members. Like any community, it is up to its members to fully utilize that value. Sure, one can simply use Krash as a short-term living solution. However, so much of the potential of being a Krasher is missed that way. I highly recommend that you check out the Krash website and make your own call.

Notes from Open Source Business Models, with Jim Whitehurst

I meant to write and post this earlier this week. Unfortunately, I ended up being too busy/tired from preparing my upcoming travels.

It's been two weeks since the original event. It was very rewarding. What originally drove me to the event was the topic of business models for open source technologies. While I am uncertain of my future ventures, I am very interested in contributing to the open source community. Here are some notes I jotted down from Jim's talk:

Open Source Business Models, by Jim Whitehurst

Jim Whitehurst is the President and CEO of Red Hat. Coming from the airline industry, he took the torch from a line of successful CEOs and scaled the company beyond its Linux business. His talk focused mainly on successful business models around open source software. There was a hint of evangelism for open sourcing technology in his message, but he’s certainly preaching to the choir with me.

User-driven innovation is the strength behind open source technologies. What makes open source projects so powerful and lets them innovate so quickly is that most of the contributors are users of the technology. Dogfooding is a powerful driver for pushing the technology forward. Leaders of an open source business must always keep that in mind.

Value and defensibility are the keys behind a successful business model. Often, the key innovation from a startup is how it changes consumer behavior by creating a new way to purchase an existing value. For example, Uber changed the way we purchase rides; just as Amazon changed the way we shop for, well, anything.

Enterprise users require the stability of long software life cycles, thus creating an opportunity. A key hurdle for open source technology adoption at large enterprises is instability from the rapid innovation. Bridging that gap, however, is where many open source technology companies create monetizable value.

The contributor community is essential for any successful open source project. To continue promoting the rapid, user-driven innovation that sets open source technology apart, creating and managing the community is critical. For a project-leading business, this may include supporting features and fulfilling requests that do not generate monetary value. They must not be overlooked, however. If the community fails, so does the lure of the project, and thus the business.

Lead the open source community by being the largest contributor. Besides supporting the community, it is also important to properly lead the project. A business can lead an open source project by becoming its largest contributor. But do balance the business requirements and the community requirements. It’s not a proper community if your engineers are the only ones contributing.

Product road maps are crucial to enterprise users, creating another opportunity. Knowing, and perhaps influencing, where a critical open source technology is headed is vital for enterprise users. It allows them to plan internal development cycles in conjunction. Thus, the influence a business gains by being a project leader is a tangible value that can also be monetized.

Full stack compatibility and stability can be guaranteed through certification, while only innovating one component. Thinking along the lines of the stability requirements of enterprise clients, an open source technology business can implement compatibility certifications for third-party products across the vertical. It allows the enterprise clients to have trust in the full vertical, while only requiring the business to build a particular layer.

A scalable business must have a scalable business model that extends beyond the initial product. Consider that a key innovation behind a successful startup is its business model. Then for the startup to scale, so must its business model. A scalable business model can allow the business to extend not only beyond its core product, but also into other verticals. For example, Google’s key business model is targeted ads. By staying true to it, Google was able to support many seemingly unrelated products, including Gmail and YouTube, that ultimately strengthen its portfolio.

Enterprise clients buy business solutions; the technology is only a piece of the puzzle. This is a key lesson for me, recalling my experiences in B2B sales. Ultimately, the technology isn’t nearly as important as the product owner would like to think. Any B2B deal must be driven by a business need and provide a business solution.


I am very new to building businesses around open source projects, but I do believe in the cause. If you have tips and tricks to share, please let me know!

Notes from Startup Secrets, with Michael Skok

I attended an event in Boston last week. It covered two key topics: Startup Secrets, a startup framework, and open source business models. The event was structured with 2 talks followed by 2 panels. Each key topic gets one talk and one panel. Overall, it was a great experience, though a bit tiring (3 hour sessions with no breaks). The talks were both really beefy in content and insights. Despite running on backup power, I managed to jot down some notes to share.

Startup Secrets, by Michael Skok

Startup Secrets is a framework developed by Michael Skok, a partner at North Bridge Venture. It covers a wide range of considerations and focuses to improve the chance of success. Michael gave an overview of the framework along with clips from the courses he taught in conjunction.

Every pitch is a story. People are much better at remembering narratives than facts. A pitch is a lot more memorable if it's a story, be it a user story or a founder story.

VCs invest in value propositions, not ideas. Investors look for the value proposition underneath the cool “what if” ideas. It reflects the why of user adoption beyond "for shits and giggles".

The perfect problem is unworkable, unavoidable, urgent, and underserved. "If you build it, they will come" doesn't always apply, but it is more likely to if the right customer problem is solved. Michael outlined some guidelines to identify the right problems. First, the existing solutions should be obviously broken. The target customers also cannot ignore it nor postpone it. Furthermore, the market should not be saturated with options. If you are the only shop open with turkeys on Thanksgiving eve, people will find you to satisfy their poultry cravings.

Opportunities are built around discontinuous innovation, defensible technology, and disruptive business models. To ensure that the business cannot be easily replicated, there must be a significant gap between the status quo and the product. The technology driving the product may not need to be groundbreaking, but it should be costly to replicate. The true innovation behind many successful startups, including Google and Uber, is their business model, allowing them to deliver the same value in new ways to their customers.

Technology adoption requires tipping the pain-versus-gain scale. People are creatures of habit, whether they are deciding what detergent to get or what business software to buy. Thus, when a startup seeks to introduce new technology into the target customers’ lives, there is an obvious cost. This pain of adoption must be offset by the benefits of the new technology, be it money, time, or convenience. This concept is also explored by many marketers trying to alter consumers’ purchasing habits.

Start with a mission, build a roadmap to reach it, then create a culture to execute it. This is a topic that is dear to me, so it’s great to hear it echoed by veteran entrepreneurs. Everything stems from the company mission. Along with the roadmap to fulfill it, the mission is an invaluable tool in converting early investors and customers. Finally, the key to executing the roadmap and fulfilling the mission is the company culture. It is the vehicle to hire the right people, build the right team, and ensure the team will deliver without constant management.

Key hiring question: What are you passionate about? Knowing the passions of a candidate helps determine their alignment with the company mission and place them into the proper role.

The perfect hire excels in passion, intelligence, integrity, and initiative. ‘nough said.

A great business model should be repeatable, scalable, valuable, and predictable. This is certainly the key theme of the event. Many startups became successful because they constantly improve their business model, protecting them against their competitors. For a business to scale, its customer acquisition needs to be reproducible from client to client, and even from product to product. A predictable business model will make the startup more attractive to investors and allow the leaders to plan for future growth, relying on past successes.


Well that was quite a lot of content, and we are only half way done! I will share my notes on the talk by Jim Whitehurst, the CEO of Red Hat, in the near future.

Here's an Idea: Show-and-tell Coding

In my last post, I shared my incentives for writing better code. Rather than obsessing over best practices and coding styles, make it harder for lazy people to write bad code.

So here’s an idea: instead of pair programming or rigorous code review, how about a show-and-tell, a.k.a. knowledge transfer, on a monthly basis? Each developer would share the “coolest” work they have done in the past month and give a short presentation about it.

There are a few reasons why I am proposing this practice, most of which are about incentivizing better code from lazy developers. Because developers need to regularly teach their code to their peers, it incentivizes them to write more readable code. Instead of pairwise knowledge transfer, the show-and-tell would spread the know-how among the whole team, minimizing the need for documentation (that people are unlikely to read). Developers are also more likely to be conscious of the code they write, to better select the interesting pieces they want to present. Finally, since it is once a month, it takes less time than pair programming or code reviews.

But what if a developer cannot find something they are proud to present? Then, we have a problem. Not necessarily with the developer, mind you, but with the team as a whole. Is the work so uninteresting that the developer cannot be passionate about it? By now, I (would like to) believe that most leaders, e.g. managers, are aware of how suboptimal unmotivated developers are. I also (would like to) believe most leaders recognize that interesting, meaningful work motivates employees much more than salary, bonuses, or free Red Bull.

What do you think? Have you seen similar practices in action before? This has been a thought exercise, so I’d love to hear some feedback.

Incentives for Better Code

I am lazy. I would rather spend hours catching up on Studio C skits than sit at my desk working on code. Note that I did not say “writing” code, but “working on” code. Despite popular belief, especially among those who have never worked as a developer, we don’t spend a lot of time developing or writing code. Instead, we fix bugs, update configs, create environments, destroy environments, run inductions, transfer knowledge, etc. And I am too lazy to do any of them.

I have had the opportunity to work in various teams, including test, operations, and development, over my years as a professional developer. Suffice it to say, I have read code produced by many different “developers”, from analysts to academics, from beginners to seasoned developers. In the coding realm, there is no shortage of conversations on code quality, producing guidelines that include test coverage, entropy, coupling, etc. I am too lazy to remember all of that jargon.

Instead, I tend to let my laziness drive me. If I write better code, it would reduce the time I have to work on code.

Any team of substantial size, i.e. 2, would have to deal with the problems of knowledge transfer. A lot of practices are created to reduce the friction of this, including pair programming, documentation, etc. While they do work, I prefer writing more readable code: Writing functions and classes that are short and concise; properly referencing external resources; commenting when something complicated cannot be further simplified. Instead of writing a book of documentation that no one will read, I can just tell them to rtfc.

Fixing bugs sucks. I am not terrible at it, but I certainly don’t enjoy it. To combat the creation of bugs, we write tests. However, that’s a lot of work too. With fail-fast languages such as Python, it’s already easy to find the line of the error. So instead, I find myself writing tests only for complex functions, and keeping the architecture as simple and independent as possible. If something goes wrong, I can quickly and easily find the problem.

I don’t believe writing better code is hard. We only need the right incentives: not lines of code, not test counts. We only need to think beyond the present moment and imagine all the painful time we will have if we don’t write better code.

Why I Left the Startup I've Founded and Built for 2 Years

It was the best of times, it was the worst of times.

The startup I’ve spent two years building is finally starting to get attention. Opportunities are coming to us. The team, correction, most of the team is coming together, churning out product improvements and customer values efficiently and effectively.

Two years ago, I left my full-time job, dreaming of manifesting my vision of the ideal company, one that treats its members like family and delivers real value to its customers. I was a solo founder for the first year, sending cold emails, developing the product, and networking at conventions around the country. I learned a great deal about the industry, the business, and how to get people to believe in me and my startup.

Fast forward 20 months to September 2013. I’ve decided to leave the company I founded, built, and grew into a revenue-positive entity without any outside funding. It certainly was not an easy decision, giving one’s child away. I suspect that I will always wonder what would have happened if I had stuck around. And the sentiment is not mine alone. Nearly all those I’ve shared the story with have expressed sympathy for the missed opportunity and wasted hard work.

So the (potentially) million dollar question is, why did I do it? What would possess a solo founder to leave his own company?

The answer is simpler than one may think: Mission. Well, mission, with a lot of mistakes along the way. The company mission is everything. It is what rallies the team when times are tough. It is what gets others excited enough to vouch for a startup to their boss, colleagues, or on their podcast. It is what startup founders have to preach again and again, until it is ingrained in every business decision they make. Without it, a startup is no different from the “business as usual” companies, without a personality or a real goal.

My most important job, as the founder, is to protect the mission. And I have failed. Giving in to the pressure of immediate revenue, I allowed our mission to be compromised to appeal to anyone with money. And that was the beginning of the downward slope. I delegated the role of communicating our mission and value, then retreated to my comfort zone of technology. I wanted to sit comfortably while others raked in the dough. That was my biggest mistake.

All core founders are business founders, even if they are technical. My retreat into the technical role was initially beneficial, quickly improving the quality of the product and fixing the bugs that have been put off. However, without the core founder(s) to lead the charge, the representation and direction of a startup can sway easily and quickly on the front line. The direction of my startup drifted further and further away from the original mission without me there. And once I got back in front of the customer, media, and business, I had to fight for its soul.

Screw benefit of the doubt, especially if there is equity involved. When I returned to the business side, aiming to realign the business with its original mission, I started a long and treacherous war. The misalignment had settled in and the lack of direction had already become the norm. In retrospect, I wondered if things would have been different with another team member. I realized that I had recklessly given the candidates the benefit of the doubt. It is true that in the pay-it-forward startup community, we all must trust strangers to allow them to help. But recruitment is different. If I could, I would tell the me of 20 months ago not to give any benefit of the doubt. Unless I was wowed and didn’t mind begging the candidate to join, the answer should have been “no”.

In the end, I was too tired to continue the battles. The company is no longer serving the mission I had initially set forth. The internal battles have become crippling. Thus, I have decided to step aside, allowing others to raise their own banner and rally the troops under a new company mission. For all I know, the new mission can be extremely lucrative. As for me, I have no interest in that new mission; I am not in it for the money. Despite the lost efforts of the last 2 years, I have learned much from the ordeal. I prefer to count my losses, and look for a new banner I can rally behind.

For the record: No, I have no idea how the post turned into a giant medieval warfare analogy.

Why We Search

This is supposedly where I quote a dictionary definition of “search”. But I’m sure most of you can Google. (Actually, I highly recommend Wiktionary.) Instead, I hope to discuss what search today means to different users, or at different times to the same users.

Search functionality has permeated all aspects of UX. It was once said that if a page cannot be found on Google, it does not exist. I’d take that hyperbole further: if a piece of content cannot be found by the intranet search, it does not exist. Users today are fully spoiled by Google and expect the same level of quality from all intranet searches. To better serve a search-happy audience, we must first understand how and why they use search. Fortunately, the same knowledge that SEO analysts use to game search engines can be used to improve our own search engine.

User Intents

The conversations on search intent boomed in 2007 along with the interest in SEO. Marketers realized that to get better AdWords conversions, the queries they bid on must match the users' intentions. Traditionally, SEO specialists consider 3 types of user intent: Navigational, Informational, and Transactional. However, for information retrieval scientists, that list grows in both breadth and depth.

Navigational

Navigational intent is the most straightforward, but perhaps the most unanticipated. Users with this intent already know exactly what they seek. In fact, they may even know the URL. Search queries with this intent may look like "coke", "vw", "disney", or even "tesco.com". Because these users tend to be confident that they gave the search engine everything it needs, their tolerance for bad results is lower.

Informational

Informational intent is one of the hardest to serve. These users are looking to learn about a topic of interest. This can be background information behind the latest news, that actor in that movie you liked, or what 2 to the 11th power is (2048). This is an area in which Google triumphs over many competitors, with its introduction of semantic search improvements.

Due to the broadness of the intent, researchers introduced many dimensions to better classify search queries. These additional features include Directed vs Undirected, Open vs Closed, Advice, Locate, and more. Search News Central has a nice blog post that describes the subject.

Transactional

The primary interest of SEO marketers, a user with Transactional intent starts with a purchase in mind. These queries may range from "movie times at Night Hawk" to "cheap halloween costume". Marketers may dig deeper to determine the user's place in the buying cycle. For many marketplace websites, optimizations in this area tend to be lacking, costing them potential conversions.

Exploratory

Exploratory intent is often considered a specific type of Informational intent, i.e. Undirected, Listed results. However, I want to highlight this intent due to the rise of multimedia content. For media-focused websites, e.g. YouTube and SoundCloud, users are much more likely to exhibit exploratory intent. That is, they search "rock" and expect results that can be classified as a rock song or video. For this intent, relying solely on the textual context of the media content will result in many false positives. There is much exciting growth to come with improvements in audio and video analysis.


To better serve users, we must go beyond what they search for and understand why they search. For my next post, I want to dig a little deeper to see how well modern search engines are satisfying these user intents.

Lycos and Me

No, I wasn't a loyal user of Lycos. The title just sounds better that way.

With the advent of Google, online search has become an integral part of many people’s online experience. Search technology, a.k.a. information retrieval, has also had a profound effect on my career, from my first peek under the hood to founding a startup on a related technology. Recently, I had the opportunity to think a lot about search technology: what it is, how people use it, and where it may go in the future. Because it is a relatively young yet crucial technology, I've decided to split the bubbles of my thoughts into multiple posts, starting with its influences on me.


It's hard now to believe that search used to be such a fragmented market. Bouncing between AskJeeves, Yahoo, and Lycos, I was actually a late adopter of Google. Perhaps the queries that interested me then had yet to be heavily spammed. Perhaps I got used to picking out spam from legit results through the URLs. In any case, my undergraduate search engine experience was somewhat bland, with the exception of doing a class project for a meta-search company.

It wasn't until nearly 2 years after university, when I was doing my Master's, that I dug into the engine behind the internet giant. By then, I already had years of web app, SaaS, and web security work experience under my belt. This time, I was fascinated. These technologies empower hundreds of millions of users to find their needles in the haystack. Moreover, these techniques can connect pockets of knowledge with each other, scoring similarities between concepts and entities on the web.

Once that connection is made, it’s hard to ignore again. Search engine technology can go beyond search: not only indexing the web, but indexing concepts. This realization led me into the realm of recommender systems. I started to explore various research papers on how to identify and score similarity between two entities, rather than just between queries and webpages. Eventually, a couple of graduate school friends and I decided to found Magoou, a personalized online magazine. Magoou eventually dissolved due to commitment issues. Around this time, Flipboard had just raised a huge seed round and was making a scene in New York. The ideas behind Magoou, however, eventually led me to another project.

After I got my postgraduate degree, I joined a local growth-stage flight-search startup, Skyscanner. (Quick shout out to Skyscanner, which was recently valued at $800 million. Great job!) During my time there, I got to see how a non-webpage search company operates. Taking inspiration from Magoou, I created a travel destination recommendation engine as a side project. The concept is simple: if two travel guides share similar expressions, then the destinations they describe are also very similar. I was also involved in an internal hackathon to build a flight query parser, utilizing my familiarity with text content from information retrieval technologies. This experience ultimately led me to found my startup in text analytics for social content.
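
To make the "similar guides, similar destinations" idea concrete, here is a minimal sketch in Python. It assumes a plain bag-of-words cosine similarity, which is only one way to measure shared expressions and is not necessarily what the original side project used. The guide snippets and the names `tokenize`, `cosine_similarity`, and `most_similar` are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty guide is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(destination, guides):
    """Return the other destination whose guide text is closest."""
    target_text = guides[destination]
    candidates = [name for name in guides if name != destination]
    return max(candidates,
               key=lambda name: cosine_similarity(target_text, guides[name]))


# Hypothetical guide snippets, invented for this example.
guides = {
    "Bali": "tropical beaches surfing temples and rice terraces",
    "Phuket": "tropical beaches island hopping and night markets",
    "Zurich": "alpine lakes banking district and chocolate shops",
}

print(most_similar("Bali", guides))
```

With these toy snippets, Bali's guide shares more words with Phuket's than with Zurich's, so Phuket is recommended. A real engine would likely want to down-weight common words like "and" (for example with TF-IDF), but the core idea stays the same.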

After this long loop through information retrieval, recommender systems, and text analytics, I found myself back at the starting point, but with a fresh perspective on the challenges and potential of search technology. I hope my brain dump in the next few posts can spark some conversations and encourage others to explore this exciting field.